Scraping a website with a real, widely used browser has a clear benefit: it makes it less likely that the scraper gets banned because of its fingerprint or behavioral pattern. Playwright supports Chromium, Firefox, and WebKit out of the box without any manual browser installation. However, most users browse with Google Chrome or Microsoft Edge rather than the open-source Chromium, so in some scenarios it's safer to use those browsers to emulate a more realistic environment.
On macOS, browsers are installed in the /Applications directory together with their binaries. On Linux, they are commonly installed in the /usr/bin directory; you'll find some examples below. On Windows, they are typically installed in the C:\Program Files (x86)\ directory, although per-user builds such as Chrome Canary live under the user's AppData\Local folder.
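To pick a binary from these default locations programmatically, one option is to probe a few candidate paths under the platform's install root and take the first one that exists. The sketch below is a minimal Node.js example built on our own assumptions: INSTALL_ROOTS and findBrowser are hypothetical names, the candidate file names are whatever you pass in, and the check only confirms that a file exists, not that it is a browser build Playwright can drive.

```javascript
const fs = require('fs');
const path = require('path');

// Default install roots described above, keyed by Node's process.platform values.
// Assumption: per-user builds (e.g. Chrome Canary on Windows) are NOT covered here.
const INSTALL_ROOTS = {
  darwin: '/Applications',
  linux: '/usr/bin',
  win32: 'C:\\Program Files (x86)',
};

// Return the first candidate (relative to the platform's install root) that exists,
// or null if none does. `exists` defaults to fs.existsSync and can be swapped for tests.
function findBrowser(platform, relativeCandidates, exists = fs.existsSync) {
  const root = INSTALL_ROOTS[platform];
  if (!root) return null;
  // Join with the separator of the target platform rather than the host's.
  const p = platform === 'win32' ? path.win32 : path.posix;
  for (const rel of relativeCandidates) {
    const full = p.join(root, rel);
    if (exists(full)) return full;
  }
  return null;
}
```

For example, findBrowser(process.platform, ['google-chrome']) on a Linux machine returns '/usr/bin/google-chrome' if that file is present, and null otherwise.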
Here are example locations of Canary and Nightly builds on macOS, Windows, and Linux:
/Applications/Microsoft Edge Canary.app/Contents/MacOS/Microsoft Edge Canary - Microsoft Edge Canary on macOS
/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary - Google Chrome Canary on macOS
/usr/bin/google-chrome-unstable - Google Chrome Dev (unstable channel) on Ubuntu
C:\Users\<username>\AppData\Local\Google\Chrome SxS\Application\chrome.exe - Google Chrome Canary on Windows
/Applications/Brave Browser Nightly.app/Contents/MacOS/Brave Browser Nightly - Brave Nightly on macOS
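The locations above can be kept in a small lookup table keyed by browser and platform, so the scraper picks the right binary wherever it runs. This is a minimal sketch: PRERELEASE_PATHS and prereleasePath are hypothetical names, only the combinations from the list are included (everything else returns null), and on Windows we assume the %LOCALAPPDATA% environment variable points to C:\Users\<username>\AppData\Local, which is the Windows default.

```javascript
const path = require('path');

// Pre-release build locations from the list above, keyed by browser and by
// Node's process.platform value. Each entry is a function of the environment.
const PRERELEASE_PATHS = {
  chrome: {
    darwin: () => '/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary',
    linux: () => '/usr/bin/google-chrome-unstable',
    // Assumption: LOCALAPPDATA resolves to C:\Users\<username>\AppData\Local.
    win32: (env) => (env.LOCALAPPDATA
      ? path.win32.join(env.LOCALAPPDATA, 'Google', 'Chrome SxS', 'Application', 'chrome.exe')
      : null),
  },
  edge: {
    darwin: () => '/Applications/Microsoft Edge Canary.app/Contents/MacOS/Microsoft Edge Canary',
  },
  brave: {
    darwin: () => '/Applications/Brave Browser Nightly.app/Contents/MacOS/Brave Browser Nightly',
  },
};

// Resolve the executable path for a browser on a platform, or null if unknown.
function prereleasePath(browser, platform = process.platform, env = process.env) {
  const resolver = PRERELEASE_PATHS[browser]?.[platform];
  return resolver ? resolver(env) : null;
}
```

Returning null instead of throwing lets the caller fall back to another browser, or to Playwright's bundled Chromium, when no pre-release build is known for the current platform.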
To launch the selected browser from code, pass its path as the executablePath option of chromium.launch(). Because the browser is launched with the headless: false option, you'll be able to watch it start. We also use the playwright-core package, which installs only the library and does not download any browsers, since we don't need them in this case.
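A minimal sketch of such a launch with playwright-core might look like this. It assumes Node.js with the package installed (npm install playwright-core); buildLaunchOptions and scrapeTitle are hypothetical helper names, and the executable path is whichever binary you located on your machine.

```javascript
// Build the options passed to chromium.launch(). Kept separate so the
// options can be inspected without launching a browser.
function buildLaunchOptions(executablePath) {
  if (!executablePath) {
    throw new Error('executablePath is required');
  }
  return {
    executablePath, // use the installed Chrome/Edge/Brave binary, not the bundled Chromium
    headless: false, // show the browser window so you can watch it start
  };
}

// Open a page in the chosen browser and return its title.
async function scrapeTitle(executablePath, url) {
  // playwright-core ships only the library and never downloads browsers.
  // Required lazily so this file still loads where the package is absent.
  const { chromium } = require('playwright-core');
  const browser = await chromium.launch(buildLaunchOptions(executablePath));
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await page.title();
  } finally {
    await browser.close(); // always release the browser process
  }
}
```

For example, scrapeTitle('/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary', 'https://example.com').then(console.log) opens Chrome Canary on macOS, prints the page title, and closes the browser.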
For Google Chrome, only Canary builds are suitable for use with Playwright. To get one, download it from the official website.
For Microsoft Edge, we also need a recent Canary build. Since October 2020, Edge has been available on Linux as well. The browser can be downloaded from the official website.
Brave does not follow the official Chromium release schedule, so its latest versions do not match the latest Chromium versions. There is no guarantee that all Playwright functionality will work out of the box. If you still want to try it, you can download the Nightly version from the official Brave website.
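Because Brave's versions don't track Chromium's, one cheap safeguard is to read the version of the launched browser and compare its major number with the Chromium major your Playwright release expects. This sketch assumes browser.version() returns a dotted string such as 89.0.4349.0 that reflects the underlying Chromium version; majorVersion and isLikelyCompatible are hypothetical helpers, and the minimum major is a value you would look up yourself in the Playwright release notes. Passing the check does not guarantee that every feature works.

```javascript
// Extract the major version from a dotted string such as "89.0.4349.0".
// Returns null when the string does not start with a number.
function majorVersion(version) {
  const major = Number.parseInt(String(version).split('.')[0], 10);
  return Number.isNaN(major) ? null : major;
}

// Hypothetical compatibility gate: true only when the browser's major version
// is at least the Chromium major your Playwright release was built against.
function isLikelyCompatible(version, minMajor) {
  const major = majorVersion(version);
  return major !== null && major >= minMajor;
}
```

After const browser = await chromium.launch(...), you might call isLikelyCompatible(browser.version(), minMajor) and log a warning when it returns false, rather than failing later on an unsupported feature.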
In this article, we've looked at a straightforward way to connect Chromium-based browsers with Playwright. To avoid getting blocked, it's a good way to blur the browser fingerprint beyond what common techniques such as the stealth plugin achieve on their own. For advanced usage and documentation on Playwright features, see the official website, playwright.dev.
Happy web scraping, and don't forget to pass cookies along when extracting data!