249 posts tagged with "data extraction"

Best Free Proxy Scraping Tools

November 23, 2021 · 7 min read

Co-Founder @ ScrapingAnt

Best open source proxy scrapers

A quality proxy server is key to a successful web scraper. A large, varied pool of high-quality IPs makes it possible to collect data from many websites without worrying about being blocked.

Many websites publish free proxy lists, so can the process of getting IP addresses from them be automated? Are free proxies good enough for web scraping? Let's find out.

How to make POST, PUT and DELETE requests using Puppeteer?

October 25, 2021 · 5 min read

Oleg Kulyk

Co-Founder @ ScrapingAnt

Puppeteer make POST request

Making POST, PUT, and DELETE requests is a crucial web scraping and web testing technique. However, Puppeteer's API does not expose this functionality as a separate function.

Let's look at a workaround for this situation and create a helper function to fill the gap.
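One workaround often used with Puppeteer is request interception: the outgoing navigation request is rewritten with a different method and body before it leaves the browser. The sketch below assumes that approach, which may differ from the helper in the full article. `buildOverrides` and `RequestOverrides` are hypothetical names, and the JSON body encoding is an assumption.

```typescript
// Hypothetical shape mirroring the overrides object accepted by
// Puppeteer's HTTPRequest.continue() (method, postData, headers).
interface RequestOverrides {
  method: string;
  postData?: string;
  headers: Record<string, string>;
}

// Builds the overrides for a POST, PUT, or DELETE request.
// Assumption: the body, when present, is sent as JSON.
function buildOverrides(
  method: "POST" | "PUT" | "DELETE",
  body?: unknown,
  baseHeaders: Record<string, string> = {}
): RequestOverrides {
  const overrides: RequestOverrides = { method, headers: { ...baseHeaders } };
  if (body !== undefined) {
    overrides.postData = JSON.stringify(body);
    overrides.headers["content-type"] = "application/json";
  }
  return overrides;
}

// With Puppeteer installed, the wiring might look like this (not executed here):
//   await page.setRequestInterception(true);
//   page.once("request", (req) =>
//     req.continue(buildOverrides("POST", { id: 1 }, req.headers())));
//   await page.goto(url);
```

Keeping the override construction in a pure function makes it easy to check without launching a browser.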

How to get all text from a webpage using Puppeteer?

September 5, 2021 · 5 min read

Oleg Kulyk

Co-Founder @ ScrapingAnt

Puppeteer extracts text from webpage

While communicating with our web scraping API users, we've found that many of them extract the whole text of a web page for further data manipulation.

This is interesting because the approach simplifies data extraction: you just pick a particular row from the text or apply a RegExp.
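As a minimal sketch of that idea, assuming the page text has already been extracted into a string, the two helpers below pick a row by pattern and pull a value out with a RegExp. The function names, the `Price:` label, and the sample text are all hypothetical and only illustrate the technique.

```typescript
// Returns the first trimmed line of the page text that matches the pattern.
// Note: pass a pattern without the "g" flag, since test() is stateful with it.
function findRow(pageText: string, pattern: RegExp): string | undefined {
  return pageText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .find((line) => pattern.test(line));
}

// Extracts a numeric price that follows a "Price:" label (hypothetical format).
function extractPrice(pageText: string): number | undefined {
  const match = /Price:\s*\$?(\d+(?:\.\d+)?)/i.exec(pageText);
  return match ? Number(match[1]) : undefined;
}

// Illustrative text, standing in for what a whole-page text extraction returns.
const sampleText = "Acme Widget\n  Price: $19.99\nIn stock: yes";

console.log(findRow(sampleText, /^Price:/));
console.log(extractPrice(sampleText));
```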

How to download images with NodeJS?

August 26, 2021 · 5 min read

Oleg Kulyk

Co-Founder @ ScrapingAnt

Working with images in NodeJS extends your web scraping capabilities, from downloading an image from a URL to retrieving photo attributes like EXIF. How do you download an image and obtain that data?

This article is a part of the series on image downloading with different programming languages. Check out the other articles in the series:

How to parse HTML in .NET

July 18, 2021 · 8 min read

Oleg Kulyk

Co-Founder @ ScrapingAnt

HTML parsing is a vital part of web scraping, as it converts web page content into meaningful, structured data. Still, because HTML is a tree-structured format, it requires a proper parsing tool: it can't be properly traversed using Regex.

This article covers the most popular .NET libraries for HTML parsing, along with their strengths and weaknesses.

Block resources with Playwright

June 21, 2021 · 5 min read

Oleg Kulyk

Co-Founder @ ScrapingAnt

This article will show how to block specific resources (HTTP requests, CSS, video, images) from loading in Playwright. Playwright is Puppeteer's successor with the ability to control Chromium, Firefox, and WebKit. I'd call it the second most widely used web scraping and automation tool with headless browser support.
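A minimal sketch of the blocking decision, assuming the common approach of routing every request and aborting the unwanted ones. `BLOCKED_TYPES` and `shouldBlock` are hypothetical names, and the chosen set of types is only an example. The strings follow the resource type values Playwright reports, such as `image`, `stylesheet`, and `media`.

```typescript
// Resource types to block (an illustrative selection, adjust as needed).
const BLOCKED_TYPES: ReadonlySet<string> = new Set([
  "image",
  "stylesheet",
  "media",
  "font",
]);

// Decides whether a request of the given resource type should be aborted.
function shouldBlock(
  resourceType: string,
  blocked: ReadonlySet<string> = BLOCKED_TYPES
): boolean {
  return blocked.has(resourceType);
}

// With Playwright installed, the wiring might look like this (not executed here):
//   await page.route("**/*", (route) =>
//     shouldBlock(route.request().resourceType())
//       ? route.abort()
//       : route.continue());
```

Separating the decision from the route handler keeps the blocking rules testable without starting a browser.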

Web Scraping with Java

June 13, 2021 · 16 min read

Oleg Kulyk

Co-Founder @ ScrapingAnt

Java is one of the most popular and in-demand programming languages today. It allows you to create highly scalable and reliable services as well as multi-threaded data extraction solutions. Let's check out the main concepts of web scraping with Java and review the most popular libraries to set up your data extraction flow.

How to submit a form with Playwright?

May 30, 2021 · 5 min read

Oleg Kulyk

Co-Founder @ ScrapingAnt

In this article, we'll take a look at how to submit forms using Playwright. This knowledge can be useful while scraping the web, as it lets you retrieve information from a target web page that requires parameters to be submitted first.

Looking for a Puppeteer guide? Check out: How to submit a form with Puppeteer?

How to download a file with Playwright?

May 26, 2021 · 6 min read

Oleg Kulyk

Co-Founder @ ScrapingAnt

In this article, we will share several ideas on how to download files with Playwright. Automating file downloads can sometimes be confusing. You need to handle a download location, download multiple files simultaneously, support streaming, and more. Unfortunately, not all of these cases are well documented. Let's go through several examples and take a deep dive into the Playwright APIs used for file download.

This guide is a part of the series on web scraping and file downloading with different web drivers and programming languages. Check out the other articles in the series:

Benefits of Web Scraping for Hospitality

May 19, 2021 · 8 min read

ScrapingAnt Team

ScrapingAnt

We all want our business to succeed. If you are in the hospitality business, you want to hit your targets and surpass them. You want to beat your competitors with anything that will keep you on top, or at least keep you running. You can achieve this in many different ways. Lately, the most modern way to put your hospitality business out in front is web scraping.