242 posts tagged with "web scraping"
How to Collect Data from TikTok
There has been a lot of news about TikTok being sold to US companies, and with the service possibly shutting down, the question of how to scrape TikTok data has become more pressing.
Web browser automation with Python and Playwright
In this article, we'd like to share the current state of Playwright integration with Python, along with several helpful code snippets that illustrate the main techniques.
HTML Parsing Libraries - JavaScript
HTML is a simply structured markup language, and anyone who writes a web scraper has to deal with HTML parsing. The goal of this article is to help you find the right tool for HTML processing. We are not going to cover libraries for more specific tasks, such as article extractors, product extractors, or web scrapers.
Open Source Javascript Web Scraping
In this article, I’d like to list some of the most popular open-source JavaScript projects that can be useful for web scraping. The list includes both libraries and standalone niche scrapers that target a particular site (Amazon, iTunes, Instagram, Google Play, etc.).
Scraping with millions of browsers or Puppeteer Cluster
In this article, we’d like to introduce an awesome open-source Web Scraping solution for running a pool of Chromium instances using Puppeteer.
How to run Playwright on AWS Lambda
In this article, I’d like to share a quick guide on how to run Playwright inside AWS Lambda. There are plenty of similar guides for Puppeteer, but only a few cover its successor from Microsoft.
Top Popular JavaScript Libraries for Web Scraping in 2024
We’d like to continue our series of posts on the Top 5 Popular Libraries for Web Scraping in 2024 with a new programming language: JavaScript.
JavaScript is a well-known, widely used language with strong community support. It can be used for both client-side and server-side web scraping scripts, which makes it well suited for writing your own scrapers and crawlers.
Most of the advantages of these libraries can also be obtained through a web scraping API, and some of the libraries can be used together with one.
So let’s check them out.
Top 5 Popular Python Libraries for Web Scraping in 2024
It is a well-known fact that Python is one of the most popular programming languages for data mining and web scraping. There are tons of libraries and niche scrapers in the community, but we’d like to share the five most popular of them.
Most of the advantages of these libraries can also be obtained through a web scraping API, and some of the libraries can be used together with one.
AngularJS site scraping. The easy deal with Puppeteer and Headless Chrome.
AngularJS is a fairly common framework for building modern Single Page Applications, but how well can sites built on it be scraped? Let’s find out.