Skip to main content

Top Popular JavaScript Libraries for Web Scraping in 2020

Oleg Kulyk

Oleg Kulyk

Co-Founder @ ScrapingAnt

Top 5 Popular Javascript Libraries for Web Scraping in 2020

We’d like to continue the sequence of our posts about Top 5 Popular Libraries for Web Scraping in 2020 with a new programming language - JavaScript.

JS is a quite well-known language with a great spread and community support. It can be used for both client and server web scraping scripting that makes it pretty suitable for writing your scrapers and crawlers.

Most of these libraries' advantages can be received by using our API and some of these libraries can be used in stack with it.

So let’s check them out.

The 5 Top JavaScript Web Scraping Libraries in 2020#

1. Axios#

Axios is a promise-based HTTP client for the browser and Node.js. But why exactly this library? There are a lot of libraries that can be used instead of a well-known request: got, superagent, node-fetch. But Axios is a suitable solution not only for Node.js but for client usage too.

Simplicity of usage is shown below:

const axios = require('axios');
// Make a request for a user with a given ID
axios.get('/user?ID=12345')
.then(function (response) {
// handle success
console.log(response);
})
.catch(function (error) {
// handle error
console.log(error);
})
.finally(function () {
// always executed
});

Promises are cool, isn’t it?

To get this library you can use one of the preferable ways:

Using npm:

npm install axios

Using bower:

bower install axios

Using yarn:

yarn add axios

GitHub repository: https://github.com/axios/axios

2. Cheerio#

Cheerio implements a subset of core jQuery. In simple words - you can just swap your jQuery and Cheerio environments for web scraping. And guess what? It has the same benefit that Axios has - you can use it from client and Node.js as well.

For the sample of usage, you can check another of our articles: Amazon Scraping. Relatively easy.

Also, check out the docs:

3. Selenium#

Selenium is a popular Web Driver that has a lot of wrappers for most programming languages. Quality Assurance engineers, automation specialists, developers, data scientists - all of them at least once have used this perfect tool. For Web Scraping it’s like a swiss knife - no additional libraries needed. Any action can be performed with a browser like a real user: page opening, button click, form filling, Captcha resolving and much more.

Selenium may be installed via npm with:

npm install selenium-webdriver

And the usage is a simple too:

const {Builder, By, Key, until} = require('selenium-webdriver');
(async function example() {
let driver = await new Builder().forBrowser('firefox').build();
try {
await driver.get('http://www.google.com/');
await driver.findElement(By.name('q'));
await driver.sendKeys('webdriver', Key.RETURN);
await driver.wait(until.titleIs('webdriver - Google Search'), 1000);
} finally {
await driver.quit();
}
})();

More info can be found via the documentation:

4. Puppeteer#

There are a lot of things we can say about Puppeteer: it’s a reliable and production-ready library with great community support. Basically Puppeteer is a Node.js library that offers a simple and efficient API and enables you to control Google’s Chrome or Chromium browser. So you can run a particular site's JavaScript (as well as with Selenium) and scrape single-page applications based on Vue.js, React.js, Angular, etc.

We have a great example of using Puppeteer for scraping Angular-based website, you can check it here: AngularJS site scraping. Easy deal?

Also, we’d like to suggest you check out a great curated list of awesome Puppeteer resources: https://github.com/transitive-bullshit/awesome-puppeteer

As well, useful official resources:

5. Playwright#

Not as well-known a library as Puppeteer, but can be named as Puppeteer 2, since Playwright is a library maintained by former Puppeteer contributors. Unlike Puppeteer it supports Chrome, Chromium, Webkit and Firefox backend.

To install it just run the following command:

npm install playwright

To be sure, that the API is pretty much the same, you can take a look at the example below:

const playwright = require('playwright');
(async () => {
for (const browserType of ['chromium', 'firefox', 'webkit']) {
const browser = await playwright[browserType].launch();
const context = await browser.newContext();
const page = await context.newPage();
await page.goto('http://whatsmyuseragent.org/');
await page.screenshot({ path: `example-${browserType}.png` });
await browser.close();
}
})();

Conclusion#

It’s always up to you to decide what to use for your particular web scraping case, but it’s also pretty obvious that the amount of data on the Internet increases exponentially and data mining becomes a crucial instrument for your business growth.

But remember, instead of choosing a fancy tool that may not be of much use, you should focus on finding out a tool that suits your requirements best.