Skip to main content

10 posts tagged with "playwright"

View All Tags

· 16 min read
Oleg Kulyk

Web Scraping with Java

Java is one of the most popular and high demanded programming languages nowadays. It allows creating highly-scalable and reliable services as well as multi-threaded data extraction solutions. Let's check out the main concepts of web scraping with Java and review the most popular libraries to setup your data extraction flow.

· 6 min read
Oleg Kulyk

How to download a file with Playwright?

In this article, we will share several ideas on how to download files with Playwright. Automating file downloads can sometimes be confusing. You need to handle a download location, download multiple files simultaneously, support streaming, and even more. Unfortunately, not all the cases are well documented. Let's go through several examples and take a deep dive into Playwright's APIs used for file download.

· 10 min read
Oleg Kulyk

Scrape a Dynamic Website with Python

Internet extends fast and modern websites pretty often use dynamic content load mechanisms to provide the best user experience. Still, on the other hand, it becomes harder to extract data from such web pages, as it requires the execution of internal Javascript in the page context while scraping. Let's review several conventional techniques that allow data extraction from dynamic websites using Python.

· 12 min read
Oleg Kulyk

Web Scraping with Javascript

Javascript (JS) becomes more popular as a programming language for web scraping. The whole domain becomes more demanded, and more technical specialists try to start data mining with a handy scripting language. Let's check out the main concepts of web scraping with Javascript and review the most popular libraries to improve data extraction flow.

· 4 min read
Oleg Kulyk

How to use a proxy in Playwright?

Playwright is a high-level API to control and automate headless Chrome (Chromium), Firefox and Webkit. It can be considered as an extended Puppeteer, as it allows using more browser types to automate modern web apps testing and scraping. Playwright API can be used in JavaScript & TypeScript, Python, C# and, Java. In this article, we are going to show how to set up a proxy in Playwright for all the supported browsers.

· 4 min read
Oleg Kulyk

How to use Microsoft Edge with Playwright

Web scraping a website with the actually supported or other browsers has a real benefit in ensuring that the scraper will not be banned by the fingerprint or the behavioral pattern. Playwright already provides full support for Chromium, Firefox, and WebKit out of the box without installing the browsers manually, but since most of the users out there use Google Chrome or Microsoft Edge instead of the open-source Chromium variant, in some scenarios, it's safer to use them to emulate a more realistic browser environment.