37 posts tagged with "web scraping"

How to parse HTML in .NET

Oleg Kulyk

Co-Founder @ ScrapingAnt

How to parse HTML in .NET

HTML parsing is a vital part of web scraping, as it converts web page content into meaningful, structured data. Still, because HTML is a tree-structured format, it requires a proper parsing tool: it can't be properly traversed using regular expressions.

This article reviews the most popular .NET libraries for HTML parsing, along with their strengths and weaknesses.
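
To illustrate why a regular expression struggles with nested markup, here is a minimal sketch. It uses Python's standard-library `html.parser` purely for illustration, not any of the .NET libraries the article covers. The sample markup and the `DivTextCollector` class are hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup with a <div> nested inside another <div>.
SAMPLE = '<div class="card"><div class="title">Widget</div><p>In stock</p></div>'

# A naive non-greedy regex stops at the first closing </div>,
# so the content of the outer element is cut short.
naive = re.search(r'<div class="card">(.*?)</div>', SAMPLE).group(1)


class DivTextCollector(HTMLParser):
    """Collects text inside <div class="card"> elements, tracking nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of open <div>s inside (and including) the card
        self.chunks = []  # text fragments found inside the card

    def handle_starttag(self, tag, attrs):
        # Enter the card, or go one level deeper if we are already inside it.
        if tag == "div" and (self.depth > 0 or ("class", "card") in attrs):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


collector = DivTextCollector()
collector.feed(SAMPLE)
collector.close()

print(naive)             # truncated fragment of the outer element
print(collector.chunks)  # text from the whole card
```

The parser keeps a depth counter, which is exactly the piece of state a single regular expression cannot maintain for arbitrarily nested tags.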

Web Scraping with Java

Oleg Kulyk

Co-Founder @ ScrapingAnt

Web Scraping with Java

Java is one of the most popular and in-demand programming languages today. It lets you build highly scalable, reliable services as well as multi-threaded data extraction solutions. Let's check out the main concepts of web scraping with Java and review the most popular libraries to set up your data extraction flow.

How to download a file with Playwright?

Oleg Kulyk

Co-Founder @ ScrapingAnt

How to download a file with Playwright?

In this article, we will share several ideas on how to download files with Playwright. Automating file downloads can sometimes be confusing: you need to handle the download location, download multiple files simultaneously, support streaming, and more. Unfortunately, not all of these cases are well documented. Let's go through several examples and take a deep dive into the Playwright APIs used for file downloads.
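
As a rough sketch of the "download location" concern, the snippet below uses Playwright's Python sync API (`expect_download`, `suggested_filename`, `save_as`). The helper `unique_destination`, the function `download_file`, and its parameters are hypothetical names introduced here, and the article itself may use a different language or approach. Playwright is imported lazily, so the path helper can be used without it installed.

```python
from pathlib import Path


def unique_destination(download_dir: Path, suggested_name: str) -> Path:
    """Return a path inside download_dir that does not overwrite an existing file."""
    # Keep only the final path component so the name cannot escape the directory.
    safe_name = Path(suggested_name).name or "download"
    stem, suffix = Path(safe_name).stem, Path(safe_name).suffix
    candidate = download_dir / safe_name
    counter = 1
    while candidate.exists():
        candidate = download_dir / f"{stem} ({counter}){suffix}"
        counter += 1
    return candidate


def download_file(page_url: str, link_selector: str, download_dir: Path) -> Path:
    """Click a link that triggers a download and save the file to download_dir."""
    # Assumption: `pip install playwright` and `playwright install chromium` were run.
    from playwright.sync_api import sync_playwright

    download_dir.mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(accept_downloads=True)
        page.goto(page_url)
        # Start waiting for the download before triggering it.
        with page.expect_download() as download_info:
            page.click(link_selector)
        download = download_info.value
        destination = unique_destination(download_dir, download.suggested_filename)
        download.save_as(destination)
        browser.close()
    return destination
```

Calling `download_file("https://example.com/files", "a#report", Path("downloads"))` would save the file under `downloads/`; the URL and selector here are placeholders.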

Benefits of Web Scraping for Hospitality

ScrapingAnt Team

ScrapingAnt

Benefits of Web Scraping for Hospitality

We all want our business to succeed. If you are in the hospitality business, you want to hit your targets and surpass them. You want to beat your competitors by any means that will keep you on top or at least keep you running. You can achieve this in many different ways. One of the most modern methods of putting your hospitality business out in front is web scraping.

Uses of Web Scraping for Price Monitoring

ScrapingAnt Team

ScrapingAnt

Uses of Web Scraping for Price Monitoring

Consumers nowadays are constantly looking for discounts and special offers and comparing prices across online businesses. As a business owner, you should therefore stay alert and track how prices fluctuate among your competitors. Staying up to date on pricing lets you offer your customers better deals, which in turn helps you retain existing customers and reach new ones.

Residential vs Datacenter Proxies in Web Scraping

ScrapingAnt Team

ScrapingAnt

Residential vs Datacenter Proxies in Web Scraping

Web scraping software helps a business base its advertising strategy on gathered information and make informed decisions. However, it can operate efficiently and safely only with a reliable proxy. In fact, proxies are a significant part of any decent web scraping project. Adding proxies to your scraping programs offers various advantages, but choosing the best proxy for your project can be a difficult task.

Web Scraping for Data Scientists

ScrapingAnt Team

ScrapingAnt

Web Scraping for Data Scientists

Data is all around us, and scientists train themselves to question everything. Scientists usually spend hours studying data in their specific field to facilitate learning, understanding, and innovation.

However, to procure the volume of data necessary, scientists often need help from computer programs and AI technology. Many times, the correct technology for this job is a web scraping tool.

This article explains how data scientists use web scraping, covers the basics of web scraping, and shows how ScrapingAnt can help you get the information you need.

Web Scraping with Deno

Oleg Kulyk

Co-Founder @ ScrapingAnt

Web Scraping with Deno

Dynamic languages are helpful tools for web scraping. Scripting allows users to rapidly tie together complex systems or libraries and express ideas without dealing with memory management or build systems.

JavaScript is the most widely used dynamic language, running on every device with a web browser, and Node.js has proved to be a very successful JavaScript runtime. However, design mistakes made Node.js hard to evolve without breaking its existing user base, so Deno was created to address those problems. Let's find out how to scrape the web, including dynamic websites, with Deno.

Scrape a Dynamic Website with Python

Oleg Kulyk

Co-Founder @ ScrapingAnt

Scrape a Dynamic Website with Python

The Internet is growing fast, and modern websites often load content dynamically to provide the best user experience. On the other hand, this makes it harder to extract data from such pages, because the scraper has to execute the page's JavaScript in the page context. Let's review several conventional techniques for extracting data from dynamic websites using Python.

Web Scraping with Javascript (NodeJS)

Oleg Kulyk

Co-Founder @ ScrapingAnt

Web Scraping with Javascript

JavaScript (JS) is becoming more popular as a programming language for web scraping. The whole domain is in growing demand, and more technical specialists are trying to get started with data mining using a handy scripting language. Let's check out the main concepts of web scraping with JavaScript and review the most popular libraries to improve your data extraction flow.

6 Puppeteer Tricks to Avoid Detection and Make Web Scraping Easier

Oleg Kulyk

Co-Founder @ ScrapingAnt

6 Puppeteer Tricks to Avoid Detection and Make Web Scraping Easier

As you know, Puppeteer is a high-level API to control headless Chrome, and it's probably one of the most popular web scraping tools on the Internet. The only problem is that an average web developer might be overwhelmed by the number of settings needed for a proper web scraping setup.

I want to share 6 handy and fairly obvious tricks that should help web developers increase their scrapers' success rate, improve performance, and avoid bans.

How to use a proxy in Playwright

Oleg Kulyk

Co-Founder @ ScrapingAnt

How to use a proxy in Playwright?

Playwright is a high-level API to control and automate headless Chrome (Chromium), Firefox, and WebKit. It can be considered an extended Puppeteer, as it supports more browser types for automating the testing and scraping of modern web apps. The Playwright API can be used from JavaScript and TypeScript, Python, C#, and Java. In this article, we are going to show how to set up a proxy in Playwright for all the supported browsers.
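
A minimal sketch of the idea, assuming Playwright's Python sync API, where `launch()` accepts a `proxy` dictionary with `server`, `username`, and `password` keys. The helpers `build_proxy_settings` and `page_title_via_proxy` and the proxy address are hypothetical, and the article may present this in another language.

```python
from urllib.parse import unquote, urlsplit


def build_proxy_settings(proxy_url: str) -> dict:
    """Split a proxy URL into the dictionary shape Playwright's launch() expects."""
    parts = urlsplit(proxy_url)
    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port:
        server += f":{parts.port}"
    settings = {"server": server}
    # Credentials are optional; percent-encoded characters are decoded.
    if parts.username:
        settings["username"] = unquote(parts.username)
    if parts.password:
        settings["password"] = unquote(parts.password)
    return settings


def page_title_via_proxy(url: str, proxy_url: str, browser_name: str = "chromium") -> str:
    """Open url through the proxy and return the page title."""
    # Assumption: `pip install playwright` and `playwright install` were run.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # browser_name is one of "chromium", "firefox", or "webkit".
        browser_type = getattr(p, browser_name)
        browser = browser_type.launch(proxy=build_proxy_settings(proxy_url))
        page = browser.new_page()
        page.goto(url)
        title = page.title()
        browser.close()
    return title
```

The same settings dictionary is passed regardless of which of the three browser types is launched, which is why the browser is selected by name here.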

How to use rotating proxies with Puppeteer

Oleg Kulyk

Co-Founder @ ScrapingAnt

How to use rotating proxies with Puppeteer?

Puppeteer is a high-level API to control headless Chrome. Most things that you can do manually in the browser can be done using Puppeteer, so it quickly became one of the most popular web scraping tools in Node.js and Python. Many developers use it to extract data from single-page applications (SPAs), as it can execute client-side JavaScript. In this article, we are going to show how to set up a proxy in Puppeteer and how to spin up your own rotating proxy server.
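
The rotation idea itself can be sketched independently of Puppeteer. The example below is not the article's setup: it swaps in Python's standard-library `urllib` and a simple round-robin pool instead of a browser and a standalone proxy server. The `ProxyRotator` class and the proxy addresses (taken from a documentation-only IP range) are hypothetical.

```python
import itertools
import urllib.request


class ProxyRotator:
    """Hands out proxies from a fixed pool in round-robin order."""

    def __init__(self, proxies):
        proxies = list(proxies)
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)

    def next_proxy(self) -> str:
        """Return the next proxy URL, wrapping around at the end of the pool."""
        return next(self._cycle)

    def opener(self) -> urllib.request.OpenerDirector:
        """Build an opener that routes HTTP and HTTPS traffic through the next proxy."""
        proxy = self.next_proxy()
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        return urllib.request.build_opener(handler)


# Placeholder addresses; replace them with real proxy endpoints.
rotator = ProxyRotator([
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
])

# Each request would get a fresh opener, and therefore the next proxy in the pool:
# response = rotator.opener().open("https://example.com", timeout=10)
```

Round-robin is the simplest rotation policy; a production pool would likely also drop proxies that fail or get banned.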

How to use Microsoft Edge with Playwright

Oleg Kulyk

Co-Founder @ ScrapingAnt

How to use Microsoft Edge with Playwright

Scraping a website with a browser that real users actually run helps ensure that the scraper is not banned because of its fingerprint or behavioral pattern. Playwright already provides full support for Chromium, Firefox, and WebKit out of the box, without installing the browsers manually. However, since most users run Google Chrome or Microsoft Edge rather than the open-source Chromium build, in some scenarios it's safer to use those browsers to emulate a more realistic environment.
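
As a hedged sketch, Playwright's Python API lets the Chromium browser type launch a branded browser through the `channel` option (`"msedge"` for Edge, `"chrome"` for Chrome), provided that browser is installed on the machine. The `launch_kwargs` and `user_agent_of` helpers are hypothetical names used only for this illustration.

```python
# Maps a friendly browser name to Playwright's `channel` value.
CHANNELS = {"chrome": "chrome", "edge": "msedge"}


def launch_kwargs(browser: str, headless: bool = True) -> dict:
    """Build keyword arguments for chromium.launch() for the requested browser."""
    kwargs = {"headless": headless}
    if browser == "chromium":
        return kwargs  # bundled Chromium needs no channel
    if browser not in CHANNELS:
        raise ValueError(f"unsupported browser: {browser!r}")
    kwargs["channel"] = CHANNELS[browser]
    return kwargs


def user_agent_of(browser: str = "edge") -> str:
    """Launch the requested browser and return the user agent it reports."""
    # Assumption: `pip install playwright` was run and the chosen browser
    # (Edge or Chrome) is already installed on this machine.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        instance = p.chromium.launch(**launch_kwargs(browser))
        page = instance.new_page()
        user_agent = page.evaluate("navigator.userAgent")
        instance.close()
    return user_agent
```

Comparing the result of `user_agent_of("edge")` with `user_agent_of("chromium")` is one quick way to see how the reported browser identity differs.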

HTML Parsing Libraries - C#

Oleg Kulyk

Co-Founder @ ScrapingAnt

HTML Parsing Libraries - C#

Websites are written in HTML, which means that each web page is a structured document. Sometimes the goal is to extract data from these pages while preserving that structure. Websites don’t always provide their data in convenient formats such as CSV or JSON, so the only way to get it is to parse the HTML page.

HTML Parsing Libraries - JavaScript

Oleg Kulyk

Co-Founder @ ScrapingAnt

HTML Parsing Libraries - JavaScript

HTML is a simple structured markup language, and anyone who writes a web scraper has to deal with HTML parsing. The goal of this article is to help you find the right tool for HTML processing. We are not going to present libraries for more specific tasks, such as article extractors, product extractors, or web scrapers.

Top Popular JavaScript Libraries for Web Scraping in 2020

Oleg Kulyk

Co-Founder @ ScrapingAnt

Top 5 Popular Javascript Libraries for Web Scraping in 2020

We’d like to continue our series of posts on the Top 5 Popular Libraries for Web Scraping in 2020 with a new programming language: JavaScript.

JS is a well-known language with wide adoption and strong community support. It can be used for both client-side and server-side web scraping scripts, which makes it well suited to writing your scrapers and crawlers.

Most of the advantages of these libraries can also be obtained through a web scraping API, and some of the libraries can be used together with one.

So let’s check them out.

Top 5 Popular Python Libraries for Web Scraping in 2020

Oleg Kulyk

Co-Founder @ ScrapingAnt

Top 5 Popular Python Libraries for Web Scraping in 2020

It is a well-known fact that Python is one of the most popular programming languages for data mining and Web Scraping. There are tons of libraries and niche scrapers around the community, but we’d like to share the 5 most popular of them.

Most of the advantages of these libraries can also be obtained through a web scraping API, and some of the libraries can be used together with one.

Amazon Product Scraping. Relatively Easy.

Oleg Kulyk

Co-Founder @ ScrapingAnt

Amazon product scraping

In this article, I’d like to share my experience with Amazon product scraping. The well-known Amazon marketplace offers the best deals for thousands of product types from thousands of sellers. The potential amount of data to scrape is enormous, and it can be used for:

  • Market price comparison
  • Price change tracking
  • Analyzing product reviews
  • Copyright check
  • Finding the best products for selling or dropshipping
  • A lot of data science and machine learning stuff

What is Web Scraping? A Special Guide.

Oleg Kulyk

Co-Founder @ ScrapingAnt

What is web scraping?

Web scraping, also known as web data extraction, is the process of retrieving or “scraping” data from a website. Data displayed by most websites can only be viewed using a web browser. Most websites do not provide an option to save the data that they display to your local storage or to your own website. This is where web scraping software like ScrapingAnt comes in handy.