- Axios: Promise-based HTTP client for the browser and Node.js.
Features: XMLHttpRequests from the browser, HTTP requests from Node.js, Promise API, request and response interception, request and response transformation, automatic transforms for JSON data.
- Got: Human-friendly and powerful HTTP request library for Node.js.
Features: HTTP/2 support, Promise API, Stream API, Pagination API, Cookies (out-of-box), Progress events.
- Superagent: Small progressive client-side HTTP request library, and Node.js module with the same API, supporting many high-level HTTP client features.
Features: HTTP/2 support, Promise API, Stream API, Request cancelation, Follows redirects, Retries on failure, Progress events.
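To make a few of the features listed above more concrete (Promise API, request interception, automatic JSON transforms, retries on failure), here is a minimal sketch built only on Node's built-in `fetch` (assuming Node 18 or newer). The names `applyRequestInterceptors`, `withRetries`, and `requestJson` are hypothetical helpers invented for illustration; they are not the API of Axios, Got, or Superagent, which implement these features far more thoroughly.

```javascript
// Sketch only: hypothetical helpers on top of Node's built-in fetch (Node 18+).
// None of these names belong to Axios, Got, or Superagent.

// Run each request interceptor in order; each one receives a config object
// and returns a (possibly modified) config object.
function applyRequestInterceptors(config, interceptors) {
  return interceptors.reduce((current, intercept) => intercept(current), config);
}

// Retry an async operation up to `retries` extra times before giving up.
async function withRetries(operation, retries = 2) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}

// Send a request, reject on non-2xx statuses, and parse the JSON body automatically.
async function requestJson(url, { interceptors = [], retries = 2, ...options } = {}) {
  const config = applyRequestInterceptors({ url, headers: {}, ...options }, interceptors);
  return withRetries(async () => {
    const { url: target, ...init } = config;
    const response = await fetch(target, init);
    if (!response.ok) {
      throw new Error(`Request failed with status ${response.status}`);
    }
    return response.json();
  }, retries);
}

// Example usage (not executed here; the URL is a placeholder):
// const addUserAgent = (config) => ({
//   ...config,
//   headers: { ...config.headers, 'User-Agent': 'my-scraper/1.0' },
// });
// requestJson('https://example.com/api/items', { interceptors: [addUserAgent] })
//   .then(console.log)
//   .catch(console.error);
```

In a real project you would normally reach for one of the libraries above instead of maintaining helpers like these, since they also cover edge cases such as timeouts, redirects, and backoff between retries.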
DOM manipulation and HTML parsing
- Cheerio: Fast, flexible & lean implementation of core jQuery designed specifically for the server.
- htmlparser2: A forgiving HTML/XML/RSS parser. The parser can handle streams and provides a callback interface.
This module started as a fork of the htmlparser module. The main difference is that htmlparser2 is intended to be used only with Node.js (it can run on other platforms as well). htmlparser2 was rewritten multiple times and, while it maintains an API that's compatible with htmlparser in most cases, the projects don't share any code anymore.
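To illustrate what a "callback interface" means for a parser, here is a toy sketch in plain JavaScript. It is not htmlparser2's implementation or API: `parseWithCallbacks`, `extractLinks`, and the handler names `onOpenTag`, `onText`, and `onCloseTag` are hypothetical, and the regex-based scanning ignores comments, scripts, unquoted attributes, and malformed markup that a real parser handles.

```javascript
// Sketch only: a toy tokenizer that reports tags and text through callbacks.
// All names here are hypothetical and unrelated to htmlparser2's real API.
function parseWithCallbacks(html, handlers = {}) {
  const tagPattern = /<(\/?)([a-zA-Z][\w-]*)([^>]*)>/g;
  let lastIndex = 0;
  let match;
  while ((match = tagPattern.exec(html)) !== null) {
    // Report any non-blank text between the previous tag and this one.
    const text = html.slice(lastIndex, match.index);
    if (text.trim() && handlers.onText) handlers.onText(text);

    const [, closing, name, rawAttributes] = match;
    if (closing) {
      if (handlers.onCloseTag) handlers.onCloseTag(name.toLowerCase());
    } else {
      // Collect double-quoted attributes such as href="/page".
      const attributes = {};
      const attributePattern = /([\w-]+)\s*=\s*"([^"]*)"/g;
      let attribute;
      while ((attribute = attributePattern.exec(rawAttributes)) !== null) {
        attributes[attribute[1]] = attribute[2];
      }
      if (handlers.onOpenTag) handlers.onOpenTag(name.toLowerCase(), attributes);
    }
    lastIndex = tagPattern.lastIndex;
  }
  // Report any non-blank text after the last tag.
  const tail = html.slice(lastIndex);
  if (tail.trim() && handlers.onText) handlers.onText(tail);
}

// Collect every link target from an HTML fragment using only the open-tag callback.
function extractLinks(html) {
  const links = [];
  parseWithCallbacks(html, {
    onOpenTag(name, attributes) {
      if (name === 'a' && attributes.href) links.push(attributes.href);
    },
  });
  return links;
}
```

The appeal of this style is that the caller reacts to each event as it arrives and never needs the whole document tree in memory, which is why it pairs well with streamed input.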
- Puppeteer: Puppeteer is a NodeJS library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol. Puppeteer runs headless by default, but can be configured to run full (non-headless) Chrome or Chromium.
- Awesome resources for Puppeteer: https://github.com/transitive-bullshit/awesome-puppeteer
- Selenium: Selenium is an umbrella project encapsulating a variety of tools and libraries enabling web browser automation. Selenium specifically provides an infrastructure for the W3C WebDriver specification — a platform and language-neutral coding interface compatible with all major web browsers.
- Playwright: Playwright is a Node library to automate Chromium, Firefox, and WebKit with a single API. Playwright is built to enable cross-browser web automation that is ever-green, capable, reliable, and fast.
- amazon-scraper: Useful tool to scrape product information from Amazon.
- app-store-scraper: Node.js module to scrape application data from the iTunes/Mac App Store.
- instagram-scraper: Since Instagram has removed the option to load public data through its API, this actor should help replace this functionality.
- google-play-scraper: Node.js module to scrape application data from the Google Play store.
- scrapedin: Scraper for LinkedIn full profile data. Unlike other scrapers, it's working in 2020 with their new website.
- tiktok-scraper: Scrape and download useful information from TikTok.
And these are only the most interesting ones. Feel free to browse GitHub to find the one that suits you best!
Also, our web scraping API is language-agnostic, so you can try it even if you're not very familiar with JS or Python.