Puppeteer Stealth Page Fetcher Info

When scraping the web with Selenium or headless Chrome, some websites detect that they are being visited by an automated browser and block it. There are ready-to-use services that protect websites from scraping, Distil for example.

This happens because the browser environment in automation mode differs from the normal one. There are ways to change the environment so that it looks like a normal, non-automated browser; you can read more about them in the following articles:

At the moment, the most modern and widely used browser automation API is Puppeteer for headless Chrome. It also has a lot of useful options that make it possible to change the browser environment. Selenium, on the other hand, is less configurable, has fewer settings to tweak, and is in general not a good fit for web scraping or web automation if we want to hide the fact that the browser is automated.

Puppeteer also has ready-to-use plugins that help it mimic a normal, non-automated browser, such as this one: https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth .

I’ve built a simple Puppeteer script (with puppeteer-extra-plugin-stealth enabled) that can be called from the command line with a webpage URL and prints the page’s HTML DOM to the console. It’s as easy as:

$ node page_fetcher.js --url "https://www.google.com/"

# <html>...</html>
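
Since the script prints the HTML to stdout, the output can be redirected to a file from a shell script. Below is a minimal sketch of that idea. The names `fetch_page` and `FETCHER` are hypothetical, and it assumes that page_fetcher.js sits in the current directory and exits with a non-zero status when a fetch fails (check the repository to confirm this).

```shell
#!/usr/bin/env bash

# Command used to fetch a page. Overridable through the environment,
# which makes it possible to try the function without Node installed.
FETCHER="${FETCHER:-node page_fetcher.js}"

# Fetch a URL and save the printed HTML to a file.
# Usage: fetch_page "https://www.google.com/" google.html
fetch_page() {
  local url="$1"
  local out_file="$2"

  # $FETCHER is intentionally unquoted so that "node page_fetcher.js"
  # is split into a command and its argument.
  if $FETCHER --url "$url" > "$out_file"; then
    echo "Saved $url to $out_file"
  else
    # Assumption: a failed fetch is reported through the exit status.
    echo "Failed to fetch $url" >&2
    rm -f "$out_file"
    return 1
  fi
}
```

Removing the output file on failure keeps a half-written page from being mistaken for a successful fetch later on.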

You can also call it from any other programming language. Here is an example for Ruby:

html = `node page_fetcher.js --url "https://www.google.com/"`

Multiple options are supported, such as providing a user agent, a proxy, or a custom delay, taking a screenshot, or running in visible mode (for debugging). Check it out here: https://github.com/vifreefly/Puppeteer-Stealth-Page-Fetcher .

First thing to do after creating a new VPS server: set up a deploy user and enable SSH-key authentication only

0) If you received a root user password after creating the VPS server, it’s recommended to change it. To do so, log in to the server as root and run the $ passwd command. Save the new root password somewhere so you don’t forget it.

1) Create a deploy user on the remote server:

# on the server, as a root user

$ adduser deploy
$ adduser deploy sudo

2) Make sure that you can log in to the server as the deploy user without a password prompt:
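
One common way to get there is to copy a public SSH key to the server with ssh-copy-id. The sketch below is an assumption about how this step might look, not necessarily the exact procedure from the full post. The function name `setup_key_login` and the key path are hypothetical, and the commands are meant to run on your local machine.

```shell
#!/usr/bin/env bash

# Set up key-based login for the deploy user.
# Usage (on your local machine): setup_key_login SERVER_IP
setup_key_login() {
  local server="$1"
  local key="$HOME/.ssh/id_ed25519"   # assumed key location

  # Generate a key pair only if one does not exist yet
  [ -f "$key" ] || ssh-keygen -t ed25519 -f "$key"

  # Append the public key to ~/.ssh/authorized_keys of the deploy user
  # (asks for the deploy user's password one last time)
  ssh-copy-id -i "$key.pub" "deploy@$server"

  # Verify: password authentication is disabled for this attempt,
  # so the login can only succeed through the key
  ssh -o PasswordAuthentication=no "deploy@$server" 'echo "Key login works"'
}
```

If the last command prints "Key login works" without asking for the deploy user's password, key authentication is in place.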


Did you know that you can use Bundler without a Gemfile?

Did you know that you can use Bundler inside a single Ruby script (without a Gemfile) and have it automatically install the required dependencies?

# example.rb

require 'bundler/inline'

gemfile do
  source 'https://rubygems.org'
  gem 'rest-client'
  gem 'nokogiri'
end

###

body = RestClient.get("https://www.reddit.com/r/ruby/").body

puts "Posts from r/ruby front page:"
Nokogiri::HTML(body).xpath("//div[contains(@class, 'scrollerItem')]//h2").each do |h2|
  puts h2.text.strip
end


VPN on Linux Ubuntu Desktop: user-friendly way

Which is better: a VPN provider or your own private VPN running on a VPS server? Even if setting up the server manually (buying a VPS and installing and configuring OpenVPN there) is not a problem for you, almost every VPN provider offers multiple IP locations. In one click you can switch your location from USA/New York to Europe/Amsterdam and so on. With your own VPN, the IP is always the same.

The sad thing is that many VPN providers don’t have a client for Linux. Here are the ones that do:

Right after an easy registration (you only have to provide an account email and password; a credit card is not required), TunnelBear gives you 512 MB for free.

Let’s use it and see how to set up a VPN on Ubuntu Desktop (18.04-18.10) in a few user-friendly steps:


Increase readability of your bash scripts using functions

You may find it obvious, but there are tons of very badly written bash scripts out there.

People often forget that Bash is actually a programming language. And just like JavaScript, Python, Ruby, Go and many other languages, Bash has functions.

Let’s look at a simple bash function that prints its argument as a green string:

logger() {
  local GREEN="\033[1;32m"
  local NC="\033[00m"

  echo -e "${GREEN}Logger: $1 ${NC}"
}
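
To illustrate how functions can make a script easier to read, here is a small sketch that reuses `logger` and splits the work into named steps. The steps `prepare_workdir` and `write_report` and the default directory are hypothetical and only serve as an illustration.

```shell
#!/usr/bin/env bash

logger() {
  local GREEN="\033[1;32m"
  local NC="\033[00m"

  echo -e "${GREEN}Logger: $1 ${NC}"
}

# Hypothetical step: create a working directory if it does not exist yet.
prepare_workdir() {
  local dir="$1"

  logger "Preparing $dir"
  mkdir -p "$dir"
}

# Hypothetical step: write the current date into a report file.
write_report() {
  local dir="$1"

  logger "Writing report"
  date > "$dir/report.txt"
}

# The script's flow is readable at a glance: one line per step.
main() {
  local workdir="${1:-/tmp/demo_workdir}"

  prepare_workdir "$workdir"
  write_report "$workdir"
  logger "Done"
}
```

A script built this way would typically end with a single `main "$@"` line, so the top-level flow lives in one place and each step can be read or changed on its own.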
