To request the target site through a proxy server, we only need to pass the --proxy-server launch argument with the proxy address to puppeteer.launch.
To check that the proxy is actually used, open httpbin in the proxied browser. It should respond with JSON that contains the proxy server's address, so the same request can be reused later for testing other proxy IP addresses.
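As a minimal sketch of this setup (not necessarily the exact code this article was written around), the snippet below launches headless Chrome with --proxy-server and asks httpbin which IP address it sees. The proxy address is a placeholder from a documentation-only range, the helper names buildLaunchArgs and fetchIpThroughProxy are made up for illustration, and https://httpbin.org/ip is assumed as the test endpoint.

```javascript
// Sketch: route all traffic of one headless Chrome instance through a proxy.
// PROXY_URL is a placeholder (documentation-only address), not a working proxy.
const PROXY_URL = 'http://203.0.113.10:8080';

// Build the Chrome launch arguments for a given proxy address.
function buildLaunchArgs(proxyUrl) {
  return [`--proxy-server=${proxyUrl}`];
}

// Launch the browser behind the proxy and return the IP address httpbin reports.
async function fetchIpThroughProxy(proxyUrl) {
  // Loaded lazily so the helper above works without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ args: buildLaunchArgs(proxyUrl) });
  try {
    const page = await browser.newPage();
    const response = await page.goto('https://httpbin.org/ip');
    // httpbin replies with JSON of the form {"origin": "<caller IP>"}.
    const data = await response.json();
    return data.origin;
  } finally {
    // Always close the browser, even if the request through the proxy fails.
    await browser.close();
  }
}
```

Calling fetchIpThroughProxy(PROXY_URL).then(console.log) with a live proxy should print the proxy's IP address rather than your own.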
Pretty simple, isn't it? The only downside of this approach is that the defined proxy server is used for every request from the moment the browser starts. To change the proxy server, the browser has to be relaunched with
puppeteer.launch and a new proxy IP address.
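To make the cost of that relaunch concrete, here is a rough sketch that starts a fresh browser for every URL and binds each one to the next proxy in a pool. The pool entries are placeholders, and createRotator and visitWithFreshProxy are hypothetical helper names, so treat this as an illustration rather than a recommended design.

```javascript
// Placeholder pool (documentation-only addresses); replace with your own proxies.
const PROXY_POOL = [
  'http://203.0.113.10:8080',
  'http://203.0.113.11:8080',
];

// Returns a function that yields the pool entries in round-robin order.
function createRotator(pool) {
  let index = 0;
  return () => pool[index++ % pool.length];
}

// Visit each URL with a freshly launched browser bound to the next proxy.
async function visitWithFreshProxy(urls, nextProxy) {
  const puppeteer = require('puppeteer'); // loaded lazily
  const results = [];
  for (const url of urls) {
    const proxy = nextProxy();
    // A full browser start per URL: this is the expensive part.
    const browser = await puppeteer.launch({ args: [`--proxy-server=${proxy}`] });
    try {
      const page = await browser.newPage();
      const response = await page.goto(url);
      results.push({ url, proxy, status: response ? response.status() : null });
    } finally {
      await browser.close();
    }
  }
  return results;
}
```

A call such as visitWithFreshProxy(['https://httpbin.org/ip'], createRotator(PROXY_POOL)) works, but every URL pays for a complete browser startup and shutdown.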
To avoid bans while web scraping, you need to use different proxies and rotate them. If you implement your own IP pool, you'll have to relaunch headless Chrome with new proxy server settings each time. So how can proxy rotation be implemented for each browser request?
The answer is pretty simple: you can intercept each request with your own proxy rotation tool! Such a tool handles proxy rotation for the browser, so you save precious time while web scraping.
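One possible shape for such a tool is a small local proxy that the browser talks to, which forwards each incoming request through a randomly chosen upstream proxy. The sketch below assumes the proxy-chain npm package and its Server class with a prepareRequestFunction hook. The upstream addresses are placeholders, and pickUpstream, startRotatingProxy and launchBehindRotatingProxy are hypothetical names.

```javascript
// Placeholder upstream proxies (documentation-only addresses), not working servers.
const UPSTREAM_PROXIES = [
  'http://203.0.113.10:8080',
  'http://203.0.113.11:8080',
  'http://203.0.113.12:8080',
];

// Pick a random upstream; `random` is injectable so the choice can be tested.
function pickUpstream(proxies, random = Math.random) {
  return proxies[Math.floor(random() * proxies.length)];
}

// Start a local proxy that chooses an upstream for every request it handles.
async function startRotatingProxy(port = 8000) {
  const ProxyChain = require('proxy-chain'); // assumed dependency, loaded lazily
  const server = new ProxyChain.Server({
    port,
    prepareRequestFunction: () => ({
      upstreamProxyUrl: pickUpstream(UPSTREAM_PROXIES),
    }),
  });
  await server.listen();
  return server;
}

// Launch one browser that sends all of its traffic to the local rotating proxy.
async function launchBehindRotatingProxy(port = 8000) {
  const puppeteer = require('puppeteer'); // loaded lazily
  const server = await startRotatingProxy(port);
  const browser = await puppeteer.launch({
    args: [`--proxy-server=http://127.0.0.1:${port}`],
  });
  return { browser, server };
}
```

With this layout the browser is launched once, and the rotation logic lives entirely in the local proxy, so the selection strategy can change without touching the scraping code.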
Keep in mind that free proxies go stale quickly, so any addresses hard-coded in an example may already be dead by the time you read this article. You can find the freshest proxies on our Free proxy page.
The only disadvantage of this method is that you have to maintain a bigger codebase and dive deep into networking, proxy management, and maintenance.
To simplify the web scraper and leave more room for scraping at scale, you might want to get rid of the infrastructure pain and focus on what you really want to achieve: extracting the data.
With ScrapingAnt Web Scraping API, you can forget about the complications of IP rotation, and its internal anti-scraping avoidance mechanisms help you stay undetected by Cloudflare. You can use it for free: follow here to sign in and get your API token.