Working with images in NodeJS extends your web scraping capabilities, from downloading an image by its URL to retrieving photo attributes like EXIF data. How do you download an image and get at its data?
Let's walk through several methods for downloading images in NodeJS.
Download an image using https
Our goal is to create a function that downloads and saves an image. The function takes two parameters: `url` specifies the remote image path (a URL or a path on the server), and `filepath` is the local path where the image should be saved. The function body starts out empty, and each of the methods below fills it in.
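A minimal sketch of that empty starting point. The name `downloadImage` is our own choice; any name works as long as every implementation keeps the same two parameters.

```javascript
// Shared signature for every implementation in this article.
// url      - remote image path (an absolute URL)
// filepath - local path where the image will be saved
function downloadImage(url, filepath) {
  // The body is filled in by each download method.
}
```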
We'll keep this signature across all the download methods, so the function body can be swapped out without changing how callers use it. A stable signature also makes unit testing easier and keeps the code clean.
The vanilla version needs only Node's built-in modules: the `https.get` function downloads the file from the server, while `fs` streaming saves it to the defined path.
The `https` module handles encrypted HTTPS requests, since I assume most of the Internet is secured with SSL. For a plain-HTTP server, replace `https` with `http`; no extra code is needed.
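A minimal sketch of the vanilla version, keeping the two-parameter signature described above (the name `downloadImage` is our assumption). One small addition of ours: instead of hard-coding `https`, it picks `http` or `https` from the URL scheme, which covers the plain-HTTP case mentioned above without a second function.

```javascript
const fs = require('fs');
const http = require('http');
const https = require('https');

// Streams the resource at `url` into the file at `filepath`.
// It reports nothing back: no success signal, no error, no completion event.
function downloadImage(url, filepath) {
  // Choose the client from the URL scheme: https for SSL-secured URLs, http otherwise.
  const client = url.startsWith('https:') ? https : http;
  client.get(url, (res) => {
    // Pipe the response body straight into a file write stream.
    res.pipe(fs.createWriteStream(filepath));
  });
}
```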
Still, this function needs some extra work: it doesn't report success or failure, so we can't tell when the download has finished. Let's fix that by promisifying it.
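A promisified sketch, still using only built-in modules. It wraps the request in a `Promise` that resolves with `filepath` once the write stream closes, and rejects on a request error, a file-system error, or any status code other than 200. Treating everything except 200 as a failure (so redirects are not followed) is a simplifying assumption of this sketch, as is the scheme-based choice of `http`/`https`.

```javascript
const fs = require('fs');
const http = require('http');
const https = require('https');

// Resolves with `filepath` once the file is fully written.
// Rejects on a network error, a non-200 status code, or a file-system error.
function downloadImage(url, filepath) {
  const client = url.startsWith('https:') ? https : http;
  return new Promise((resolve, reject) => {
    client
      .get(url, (res) => {
        if (res.statusCode !== 200) {
          // Consume the response data to free up the socket, then report the failure.
          res.resume();
          reject(new Error(`Request failed with status code ${res.statusCode}`));
          return;
        }
        res
          .pipe(fs.createWriteStream(filepath))
          .on('error', reject) // file-system errors (e.g. missing directory)
          .once('close', () => resolve(filepath)); // file fully written and closed
      })
      .on('error', reject); // network-level errors (DNS, refused connection, ...)
  });
}
```

A caller can then chain on the result, e.g. `downloadImage('https://example.com/cat.jpg', './cat.jpg').then(console.log).catch(console.error)`, where the URL is only a placeholder.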
Voila! The function now returns a promise, which lets us track when the download completes and whether it succeeded.
Let's move forward and check out another popular option.
Download an image using axios
To install axios, run `npm install axios` or the equivalent command of your favorite package manager.
Then we can replace the function's internals and get the same functionality, this time with an async/await flavor.
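A sketch of the same function with axios and async/await. The `responseType: 'stream'` option asks axios for the body as a readable stream, which is piped into the file just like before. Since axios rejects non-2xx responses by default, the manual status check is no longer needed. The `require` sits inside the function only so the snippet loads before axios is installed; a top-level `require` is the more usual style.

```javascript
const fs = require('fs');

// Same signature, now built on axios with async/await.
async function downloadImage(url, filepath) {
  // Loaded here only so this sketch parses before `npm install axios`.
  const axios = require('axios');
  // responseType 'stream' makes response.data a readable stream instead of a buffered body.
  // axios rejects on non-2xx status codes by default, so no manual check is needed.
  const response = await axios({ url, method: 'GET', responseType: 'stream' });
  return new Promise((resolve, reject) => {
    response.data
      .pipe(fs.createWriteStream(filepath))
      .on('error', reject) // file-system errors
      .once('close', () => resolve(filepath)); // file fully written and closed
  });
}
```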
As mentioned before, we can change the entire function body while keeping its behavior the same.
Download an image using image-downloader
image-downloader is a Node module for downloading an image to disk from a given URL.
Install it with `npm install image-downloader`.
A purpose-built library like this lets you solve a specific task with very little code. To demonstrate, let's rewrite our function to use the module.
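A sketch of the function on top of image-downloader, assuming its promise-based `download.image({ url, dest })` call, which resolves with an object holding the saved `filename`. Mapping that result to `filename` is our addition, so this version resolves with the saved path like the earlier ones. As with axios, the `require` is inside the function only so the snippet loads before the package is installed.

```javascript
const path = require('path');

// Same two-parameter signature; image-downloader handles the request and the file writing.
function downloadImage(url, filepath) {
  // Loaded here only so this sketch parses before `npm install image-downloader`.
  const download = require('image-downloader');
  // An absolute `dest` avoids ambiguity about what a relative path is resolved against.
  return download
    .image({ url, dest: path.resolve(filepath) })
    .then(({ filename }) => filename); // resolve with the saved path, like the other versions
}
```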
Pretty terse, isn't it?
As always, each of these methods has its pros and cons, and having several options lets you pick the one that fits your project best. My one recommendation: avoid bloating the codebase with many libraries and stick to a single HTTP client.
Happy Web Scraping, and don't forget to enable GZIP compression in your HTTP client to save proxy traffic 💰