6 Puppeteer Tricks to Avoid Detection and Make Web Scraping Easier

Oleg Kulyk

Co-Founder @ ScrapingAnt

As you know, Puppeteer is a high-level API for controlling headless Chrome, and it's probably one of the most popular web scraping tools around. The only problem is that an average web developer can be overwhelmed by the sheer number of settings involved in a proper web scraping setup.

I want to share 6 handy and fairly obvious tricks that should help web developers increase their scrapers' success rate, improve performance, and avoid bans.

How to use a proxy in Playwright

Oleg Kulyk

Co-Founder @ ScrapingAnt

Playwright is a high-level API for controlling and automating headless Chrome (Chromium), Firefox, and WebKit. It can be thought of as an extended Puppeteer, since it allows more browser types to be used for automating the testing and scraping of modern web apps. The Playwright API is available in JavaScript and TypeScript, Python, C#, and Java. In this article, we are going to show how to set up a proxy in Playwright for all the supported browsers.
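
As a rough illustration of what such a setup can look like, here is a minimal sketch that passes a `proxy` object to Playwright's `launch()` call. The helper names (`buildLaunchOptions`, `openWithProxy`) and the proxy address are hypothetical placeholders, and the sketch assumes the `playwright` package is installed.

```javascript
// Build Playwright launch options that route traffic through a proxy.
// The server address and credentials are placeholders, not real endpoints.
function buildLaunchOptions(server, username, password) {
  const proxy = { server };
  // Credentials are optional; add them only when both are provided.
  if (username && password) {
    proxy.username = username;
    proxy.password = password;
  }
  return { headless: true, proxy };
}

// Launch one of the bundled engines ('chromium', 'firefox' or 'webkit')
// with the given options and return the page HTML.
// Requires `npm install playwright`; the module is loaded lazily so that
// buildLaunchOptions can be used without it.
async function openWithProxy(browserName, url, options) {
  const playwright = require('playwright');
  const browser = await playwright[browserName].launch(options);
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await page.content();
  } finally {
    await browser.close();
  }
}
```

Because the same options object is accepted by all three engines, switching browsers only means changing the `browserName` argument, for example `openWithProxy('firefox', 'https://example.com', buildLaunchOptions('http://proxy.example.com:8080'))`.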

How to use rotating proxies with Puppeteer

Oleg Kulyk

Co-Founder @ ScrapingAnt

Puppeteer is a high-level API for controlling headless Chrome. Most things that you can do manually in the browser can be done with Puppeteer, so it quickly became one of the most popular web scraping tools in Node.js and Python. Many developers use it to extract data from single-page applications (SPAs), as it allows executing client-side JavaScript. In this article, we are going to show how to set up a proxy in Puppeteer and how to spin up your own rotating proxy server.
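
To give a sense of the idea, the sketch below rotates proxies in the simplest possible way: it picks the next address from a list on every browser launch and passes it to Chrome through the `--proxy-server` flag. This is a simplification rather than a standalone rotating proxy server, and the helper names and proxy addresses are hypothetical placeholders. It assumes the `puppeteer` package is installed.

```javascript
// Round-robin picker over a list of proxy URLs (placeholder addresses).
function createProxyRotator(proxies) {
  if (proxies.length === 0) {
    throw new Error('proxy list is empty');
  }
  let index = 0;
  return () => {
    const proxy = proxies[index];
    index = (index + 1) % proxies.length; // wrap around at the end
    return proxy;
  };
}

// Chrome takes the proxy address as a command-line flag.
function buildPuppeteerArgs(proxyUrl) {
  return [`--proxy-server=${proxyUrl}`];
}

// Launch a fresh browser behind the next proxy and return the page HTML.
// Requires `npm install puppeteer`; loaded lazily so the helpers above
// can be used without it.
async function fetchThroughNextProxy(nextProxy, url) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ args: buildPuppeteerArgs(nextProxy()) });
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await page.content();
  } finally {
    await browser.close();
  }
}
```

A rotator is created once, for example `const nextProxy = createProxyRotator(['http://10.0.0.1:8080', 'http://10.0.0.2:8080'])`, and then shared across calls to `fetchThroughNextProxy`. Launching a new browser per request is slow, which is one reason a dedicated rotating proxy server can be the better design.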

How to use Microsoft Edge with Playwright

Oleg Kulyk

Co-Founder @ ScrapingAnt

Scraping a website with a browser that real visitors actually use has a real benefit: the scraper is less likely to be banned because of its fingerprint or behavioral pattern. Playwright already provides full support for Chromium, Firefox, and WebKit out of the box, with no need to install the browsers manually. However, since most users run Google Chrome or Microsoft Edge rather than the open-source Chromium build, in some scenarios it's safer to use those browsers to emulate a more realistic environment.
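
One way to do this in Playwright is the `channel` launch option, which tells the Chromium driver to start a branded browser already installed on the machine. The sketch below is a minimal illustration: the helper names are hypothetical, the channel list is deliberately limited to two stable channels, and it assumes both the `playwright` package and the chosen browser are installed.

```javascript
// Stable branded channels used in this sketch (a subset, chosen here
// for illustration only).
const BRANDED_CHANNELS = ['chrome', 'msedge'];

// Build launch options for a branded browser instead of bundled Chromium.
function buildChannelOptions(channel) {
  if (!BRANDED_CHANNELS.includes(channel)) {
    throw new Error(`unsupported channel in this sketch: ${channel}`);
  }
  return { channel, headless: true };
}

// Open a page in the branded browser and return its user agent string.
// Requires `npm install playwright` and the browser itself to be installed.
async function readUserAgent(channel, url) {
  const { chromium } = require('playwright');
  const browser = await chromium.launch(buildChannelOptions(channel));
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await page.evaluate(() => navigator.userAgent);
  } finally {
    await browser.close();
  }
}
```

Calling `readUserAgent('msedge', 'https://example.com')` is a quick way to check which browser the target site actually sees.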

HTML Parsing Libraries - C#

Oleg Kulyk

Co-Founder @ ScrapingAnt

Websites are written in HTML, which means that each web page is a structured document. Sometimes the goal is to extract data from these pages while preserving that structure. Websites don't always provide their data in convenient formats such as CSV or JSON, so the only way to get at it is to parse the HTML page.