For scraping, I need good-quality proxies for only 1–2 hours a day, while my web crawlers are running. I don't need them 24 hours a day.
So it makes sense to use AWS EC2 t2.nano/t2.micro instances as proxy servers and to create and destroy them on demand through an HTTP API.
Benefits of this approach:
- If you haven't used up your AWS Free Tier yet, you get 750 hours of t2.micro usage and 15 GB of outbound traffic per month for free (note that t2.nano is not free-tier eligible). The 750 hours are a pool shared by all your instances, so you can run several of them at the same time (say, 20). For example, 20 proxies running for one hour a day use only 600 hours per month (20 x 1 x 30 = 600).
- Once you exceed the 15 GB/month free-tier limit on outbound traffic, AWS charges only $0.09 per additional GB, which is pretty cheap.
- Each new instance (or an instance that was stopped and started again) gets a random public IPv4 address from AWS's pool, so every run gives us fresh proxy IPs.
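To see how far the free tier stretches, the arithmetic above can be sketched in a few lines of Python. This is a rough estimate, assuming the free-tier figures quoted above (750 t2.micro hours and 15 GB outbound per month), $0.09/GB beyond that, and a 30-day month; AWS changes its free-tier terms and prices, so check the current ones.

```python
# Rough free-tier check for on-demand proxy instances.
# Assumptions: 750 free t2.micro hours/month, 15 GB free outbound traffic,
# $0.09 per extra GB, 30-day month (figures quoted in the text above).
FREE_HOURS = 750
FREE_OUTBOUND_GB = 15
PRICE_PER_GB = 0.09


def instance_hours(proxies: int, hours_per_day: float, days: int = 30) -> float:
    """Total instance-hours: every proxy is its own EC2 instance."""
    return proxies * hours_per_day * days


def fits_free_hours(proxies: int, hours_per_day: float, days: int = 30) -> bool:
    """True if the whole fleet stays within the shared 750-hour pool."""
    return instance_hours(proxies, hours_per_day, days) <= FREE_HOURS


def traffic_overage_cost(outbound_gb: float) -> float:
    """Only outbound traffic above the free 15 GB is billed."""
    return max(0.0, outbound_gb - FREE_OUTBOUND_GB) * PRICE_PER_GB


print(instance_hours(20, 1))               # 600
print(fits_free_hours(20, 1))              # True
print(round(traffic_overage_cost(25), 2))  # 0.9
```

With one hour a day, up to 25 proxies fit in the 750 free hours (25 x 1 x 30 = 750); a 26th pushes usage to 780 hours.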
Even if the free tier isn't available to you, an On-Demand t2.nano instance costs only $0.0058 per hour, and outbound traffic costs $0.09 per GB.
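Without the free tier, the monthly bill for the same setup can be estimated the same way. This is a minimal sketch, assuming the On-Demand Linux t2.nano price and the outbound traffic price quoted above; real prices vary by region and change over time.

```python
# Monthly cost estimate when the free tier no longer applies.
# Assumptions: $0.0058 per t2.nano hour and $0.09 per GB outbound,
# the prices quoted in the text; 30-day month.
T2_NANO_PER_HOUR = 0.0058
OUTBOUND_PER_GB = 0.09


def monthly_cost(proxies: int, hours_per_day: float,
                 outbound_gb: float, days: int = 30) -> float:
    """Instance-hours times the hourly rate, plus all outbound traffic."""
    hours = proxies * hours_per_day * days
    return round(hours * T2_NANO_PER_HOUR + outbound_gb * OUTBOUND_PER_GB, 2)


# 20 proxies, 1 hour/day, 30 GB outbound:
# 600 h * $0.0058 = $3.48, plus 30 GB * $0.09 = $2.70
print(monthly_cost(20, 1, 30))  # 6.18
```

The estimate leaves out EBS storage for the instances' root volumes, which is billed too but stays small for instances that live only an hour or two.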
To automate creating EC2 instances and installing the proxy server software, I used the following tools:
- Terraform to create the instances, install software on them, and destroy them automatically
- Goproxy as the proxy server: simple but powerful, with a one-line installation and zero configuration. It supports HTTP(S) and SOCKS5 proxies with optional authorization out of the box
- The Ruby Sinatra gem for an HTTP API to manage proxy instances (optional)
- Ubuntu Server 16.04 as the operating system
- systemd to run the goproxy process as a system daemon (service)
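The create/destroy cycle that Terraform handles can be driven from a small script (or from the optional HTTP API). Below is a minimal Python sketch, not the repo's actual code. It assumes a Terraform configuration that takes a hypothetical `instances_count` variable and exposes a hypothetical `proxy_ips` list output; `apply`, `destroy`, `output -json`, `-var` and `-auto-approve` are standard Terraform CLI commands and flags.

```python
import json
import subprocess
from typing import List

# Hypothetical names: the Terraform variable and output below are assumptions
# for illustration; the real repo's .tf files may name them differently.
COUNT_VAR = "instances_count"
IPS_OUTPUT = "proxy_ips"


def apply_cmd(count: int) -> List[str]:
    """Build the command that brings the proxy fleet to `count` instances."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return ["terraform", "apply", "-auto-approve", "-var", f"{COUNT_VAR}={count}"]


def destroy_cmd() -> List[str]:
    """Build the command that tears down every resource in the workspace."""
    return ["terraform", "destroy", "-auto-approve"]


def run(cmd: List[str], workdir: str) -> None:
    """Run a Terraform command in the directory holding the .tf files."""
    subprocess.run(cmd, cwd=workdir, check=True)


def proxy_ips(workdir: str) -> List[str]:
    """Read the public IPs from a Terraform output (assumed to be a list)."""
    result = subprocess.run(
        ["terraform", "output", "-json", IPS_OUTPUT],
        cwd=workdir, check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

For example, `run(apply_cmd(20), "ec2_proxies")` before a crawl, `proxy_ips("ec2_proxies")` to hand the fresh IPs to the crawler, and `run(destroy_cmd(), "ec2_proxies")` once it finishes; `"ec2_proxies"` stands for wherever the .tf files live. Passing the count as a variable keeps one configuration for any fleet size.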
To give it a try, check out the project repo on GitHub: https://github.com/vifreefly/ec2_proxies