Web scraping specialists deal with proxy servers and anti-bot defenses every day. One of those protections is IP rate limiting, a primary anti-scraping mechanism.
Let's learn more about this protection method and the most effective ways of bypassing it.
What is IP rate limiting?
Rate limiting is a strategy for limiting network traffic. It defines how often someone can repeat an action within a specific timeframe – for instance, trying to view content.
Imagine that someone is constantly trying to pull information from your website, and you can see it in your web server logs. Of course, the first idea is to ban this IP address, but what if the scraper uses a dynamic IP, or a whole group of users sits behind that IP address?
At this point, it is far better to limit the number of requests a user can perform instead of banning them outright. It would not harm real users but would drastically slow down a web scraper.
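The idea above can be sketched as a minimal fixed-window counter (a hypothetical illustration, not tied to any specific library): count requests per IP and reject anything over the threshold within the current window.

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Allow at most `max_requests` per IP within each `window` seconds."""

    def __init__(self, max_requests, window):
        self.max_requests = max_requests
        self.window = window
        # ip -> [window_start, request_count]
        self.counters = defaultdict(lambda: [0.0, 0])

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        window_start, count = self.counters[ip]
        if now - window_start >= self.window:
            # A new window has started: reset the counter
            self.counters[ip] = [now, 1]
            return True
        if count < self.max_requests:
            self.counters[ip][1] = count + 1
            return True
        return False  # over the limit: the caller should respond with HTTP 429

limiter = FixedWindowLimiter(max_requests=3, window=60.0)
print([limiter.allow("203.0.113.7", now=t) for t in (0, 1, 2, 3)])
# → [True, True, True, False] — the 4th request in the same window is rejected
```

A real server would key on the client IP from the request and return a `429 Too Many Requests` status when `allow` fails.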
Rate limiting libraries
IP rate-limiting techniques can be implemented on different levels of a web resource, but to better understand the nature of this protection method, I have prepared a list of the most common open-source libraries for implementing IP restrictions.
Most of those libraries are just implementations of the same algorithm, so feel free to pick the most familiar programming language to check the sources.
Let's check them out:
Python
ratelimit - a simple Python library with a rate-limiting decorator for ease of use
NodeJS
express-slow-down and express-rate-limit - middleware libraries for the most popular NodeJS framework
Java
bucket4j - a rate-limiting library based on the token-bucket algorithm
PHP
rate-limit - a basic PHP rate limiter with support for external storage
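Most of these libraries revolve around the same core idea. The token-bucket algorithm that bucket4j implements, for instance, can be sketched in a few lines of Python (a simplified illustration, not the library's actual code): tokens refill at a steady rate up to a burst capacity, and each request spends one token.

```python
class TokenBucket:
    """Refill `rate` tokens per second up to `capacity`; each request spends one token."""

    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)  # start with a full bucket
        self.last_refill = 0.0

    def try_consume(self, now):
        # Refill proportionally to the elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=2, rate=1.0)  # 2-token burst, 1 token/second refill
print([bucket.try_consume(t) for t in (0.0, 0.1, 0.2, 1.2)])
# → [True, True, False, True] — the burst is spent, then a token refills by t=1.2
```

The appeal of this algorithm is that it permits short bursts while still enforcing a long-term average rate.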
Also, check out open-source rate-limiting implementations for web servers and load balancers:
HAProxy
HAProxy Rate Limiting: Four Examples
NGINX
Rate Limiting with NGINX and NGINX Plus
Traefik
Forget about getting blocked
Now that we know how this protection works, let's explore the methods of getting around it.
Free Proxy server
Using a free proxy server is the most cost-efficient way to overcome rate-limiting protection. Still, as we concluded in our previous article Best Free Proxy Scraping Tools, it is insanely unstable and suitable only for educational purposes or small web scraping projects.
For a quick recap, the main free proxy server disadvantages are slow speed, significant latency, short availability time, and limited sources to grab free proxies for web scraping.
If the points above haven't changed your mind about free proxies and you still want to use them, check out our free proxies for web scraping.
Paid Proxy server
Another great option is to use a paid proxy server.
There are a lot of different proxy providers and proxy types, so you can easily find a suitable one in a few top search results.
Proxy providers typically offer their services in one of the following ways:
- Lend an IP address - you rent a specific IP address to use as a proxy for a specified timeframe. Traffic consumption is usually not billed in this case (unlimited).
- Lend an IP address pool - you get access to a dedicated collection of proxies, and the provider rotates them for you so that each new request is performed from a new IP. Traffic is usually billable with this method.
- Access to a shared IP address pool - almost the same as the previous option, but the provided IP addresses are shared with other customers. The main benefit of this approach is more affordable pricing.
Also, different proxy providers may apply various limitations and billing techniques:
- Pay per IP - the provider lets you use specific IP addresses as proxies and bills per number of provided IP addresses. Traffic is not limited.
- Pay per GB - the provider lets you utilize a pool of IP addresses, and only the amount of traffic counts.
- Pay per thread - the provider lets you use a pool of IP addresses, traffic is not billed, but the number of simultaneous connections via the proxy is limited.
The general idea behind using paid proxies (and proxies in general) to avoid IP rate limiting is to obtain the most extensive rotating proxy pool for the best price. Spreading requests across many different IP addresses lets you retrieve data from a protected website without tripping its limits.
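On the client side, rotating through a proxy pool can be as simple as cycling over the list of endpoints. Here is a sketch using only Python's standard library; the proxy addresses below are placeholders, and a paid provider would supply the real pool:

```python
import itertools
import urllib.request

# Placeholder proxy endpoints (hypothetical addresses for illustration)
PROXY_POOL = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]

proxy_cycle = itertools.cycle(PROXY_POOL)

def fetch_via_next_proxy(url):
    """Send each request through the next proxy in the pool."""
    proxy = next(proxy_cycle)
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
    return opener.open(url, timeout=10).read()
```

Many paid providers expose a single rotating endpoint instead, in which case the rotation happens on their side and the client code needs only one proxy address.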
VPN
A VPN is another technology that hides your real IP address. Real users often rely on it to access geographically restricted web resources and to protect private traffic from snooping.
VPN technology aims to create a tunnel between a specific computer and a target network so that this computer will act as a part of this network. For example, a user from France can create a VPN tunnel to a US network and proceed with requesting data from the web as a US user.
This method can also be used to avoid IP restrictions. Still, VPN services usually don't provide a significant diversity of IP addresses, which means an IP address obtained via VPN may hit rate limits quite fast.
Web Scraping API
The most significant part of this article: the Web Scraping API.
It is an exceptional service for overcoming rate limiting. A Web Scraping API handles different IP types, IP rotation, proxy pool size, and headless browser usage for you. The only thing needed to avoid rate limiting is to register and start making requests via the API.
Usage is as simple as the following cURL example:

```shell
curl --request GET \
  --url 'https://api.scrapingant.com/v1/general?url=https%3A%2F%2Fhttpbin.org%2Fip' \
  --header 'x-api-key: <YOUR_SCRAPINGANT_API_TOKEN>'
```
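The same call can be reproduced in Python with only the standard library. The endpoint and header come straight from the cURL example above; the token placeholder should be replaced with your own key:

```python
import urllib.parse
import urllib.request

API_KEY = "<YOUR_SCRAPINGANT_API_TOKEN>"  # placeholder, as in the cURL example
target = "https://httpbin.org/ip"

# Build the same URL the cURL example calls, with the target URL percent-encoded
url = "https://api.scrapingant.com/v1/general?url=" + urllib.parse.quote(target, safe="")
request = urllib.request.Request(url, headers={"x-api-key": API_KEY})

# Uncomment to send the request once you have a real token:
# print(urllib.request.urlopen(request).read().decode())
```

Percent-encoding the target URL is the key detail: without it, the query string of the target would be misread as parameters of the API call itself.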
To obtain your API token, log in to the ScrapingAnt dashboard. It's free for personal use.
Each execution of this cURL request will display the requester's IP address, and each time the response will be different, as ScrapingAnt uses more than 60,000 IP addresses across the world.
And the best part of ScrapingAnt's Web Scraping API is that it is free for personal use (10,000 requests per month).
Conclusion
There are a lot of different techniques for overcoming IP rate limiting, and the choice of each particular one depends on project requirements. Still, I would like to highlight paid proxies and the web scraping API. Both provide reliable quality along with IP address diversity, which is crucial for avoiding IP rate limits.
Also, check out our related resources:
- Free and Publicly Available Proxies - our list of scraped and checked proxy servers
- Best Open Source Tools for Proxy Scraping - how to automate free proxy harvesting
- Residential VS Datacenter Proxies for Web Scraping - what is the difference between residential and datacenter proxies
- Three Reasons You Might Reconsider Getting a Free Proxy Server for Web Scraping - why using a free proxy server for web scraping is not the best idea
Happy Web Scraping, and don't forget to benchmark proxy providers before choosing one 🎯