
How to make POST, PUT and DELETE requests using Puppeteer?

Oleg Kulyk


Co-Founder @ ScrapingAnt


Making POST, PUT, and DELETE requests is a crucial web scraping and web testing technique. Still, this functionality is not included in Puppeteer's API as a separate function.

Let's check out a workaround for this situation and create a helper function to fill the gap.

Why do we need POST while web scraping?

The POST request is one of several available HTTP request methods. By design, this method is used to send data to a web server for processing and possible storage.

Sending form data during login or registration is one of the main uses of POST requests. More generally, POST is one of the primary ways to send any data to a web server.

Earlier, a common implementation pattern for login or registration was to send the form data with the required authorization parameters via a POST request and receive the protected content in the response (along with cookies, to avoid re-entering the authentication and authorization data).
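Such classic login forms usually submit their fields as an `application/x-www-form-urlencoded` body. A minimal sketch of building that body string with Node's built-in `URLSearchParams` (the `username`/`password` field names here are hypothetical — use whatever the target form actually submits):

```javascript
// Build an application/x-www-form-urlencoded POST body with Node's
// built-in URLSearchParams. Field names below are hypothetical.
const form = new URLSearchParams({
    username: 'john.doe',
    password: 's3cret',
});

// This string is what would go into the request's postData, paired with
// a 'content-type: application/x-www-form-urlencoded' header.
const postData = form.toString();

console.log(postData); // username=john.doe&password=s3cret
```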

Nowadays, SPAs (Single Page Applications) also use POST requests to send data to an API, but such requests usually return only the data needed to update the web page, not the whole page.

Thus, many sites use POST requests for client-server communication, which means web scraping requires the ability to send POST requests.

Unfortunately, Puppeteer's developers haven't introduced a native way of making requests other than GET, but it's not a big deal for us to create a workaround.

Interception of the initial request

The idea behind our approach is quite simple: we need to change the request type while opening the page, so we can send POST data along with opening a page.

To do that, we have to intercept the request using a page.on('request') handler.

We're going to use HTTPBin, which can help us test our solution.

Let's check out a simple JS snippet that just opens HTTPBin's POST endpoint:

const puppeteer = require('puppeteer');

const TARGET_URL = 'https://httpbin.org/post';

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto(TARGET_URL);
    console.log(await page.content());
    await browser.close();
})();

The result is definitely not what we're trying to achieve:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"><html><head><title>405 Method Not Allowed</title>
</head><body><h1>Method Not Allowed</h1>
<p>The method is not allowed for the requested URL.</p>
</body></html>

So, let's add a request interception:

const puppeteer = require('puppeteer');

const TARGET_URL = 'https://httpbin.org/post';
const POST_JSON = { hello: 'I like ScrapingAnt' };

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.setRequestInterception(true);
    page.once('request', request => {
        request.continue({ method: 'POST', postData: JSON.stringify(POST_JSON), headers: request.headers() });
    });
    await page.goto(TARGET_URL);
    console.log(await page.content());
    await browser.close();
})();

This time our POST request has been successfully executed:

<html><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">{
  "args": {},
  "data": "{\"hello\":\"I like ScrapingAnt\"}",
  "files": {},
  "form": {},
  "headers": {
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US",
    "Content-Length": "30",
    "Host": "httpbin.org",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/93.0.4577.0 Safari/537.36",
    "X-Amzn-Trace-Id": "Root=1-61757548-7608d72817d01768524a3298"
  },
  "json": {
    "hello": "I like ScrapingAnt"
  },
  "origin": "7.133.7.133",
  "url": "https://httpbin.org/post"
}
</pre></body></html>

Unfortunately, this code is only a proof-of-concept, not a complete solution, because any request made by the browser will be converted to a POST request.
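One simple way to scope the override is to compare each intercepted request's URL with the target and pass everything else through unchanged. A minimal sketch (the makeInterceptHandler name is ours for illustration; the request argument is assumed to follow Puppeteer's HTTPRequest interface, with url() and continue()):

```javascript
// Apply the method/body override only to the request for targetUrl and
// continue every other request (images, scripts, XHR) unchanged.
// `request` is expected to expose Puppeteer-style url() and continue().
function makeInterceptHandler(targetUrl, overrides) {
    return request => {
        if (request.url() === targetUrl) {
            return request.continue(overrides);
        }
        return request.continue();
    };
}

// With Puppeteer this would be wired up roughly as:
//   await page.setRequestInterception(true);
//   page.on('request', makeInterceptHandler(TARGET_URL, { method: 'POST', postData }));
```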

Let's improve our code and make it production-ready.

Puppeteer's extension

The next big step in our Puppeteer HTTP requests journey is to create something reusable, so we can avoid code duplication when making non-GET requests.

To achieve that, let's create a function gotoExtended:

async function gotoExtended(page, request) {
    const { url, method, headers, postData } = request;

    if (method !== 'GET' || postData || headers) {
        let wasCalled = false;
        await page.setRequestInterception(true);
        const interceptRequestHandler = async (request) => {
            try {
                if (wasCalled) {
                    return await request.continue();
                }
                wasCalled = true;
                const requestParams = {};
                if (method !== 'GET') requestParams.method = method;
                if (postData) requestParams.postData = postData;
                if (headers) requestParams.headers = headers;
                await request.continue(requestParams);
                await page.setRequestInterception(false);
            } catch (error) {
                console.debug('Error while request interception', { error });
            }
        };
        page.on('request', interceptRequestHandler);
    }

    return page.goto(url);
}

Using this function is straightforward:

const puppeteer = require('puppeteer');

const TARGET_URL = 'https://httpbin.org/post';
const POST_JSON = { hello: 'I like ScrapingAnt' };
const headers = { header_1: 'custom_header' };

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await gotoExtended(page, { url: TARGET_URL, method: 'POST', postData: JSON.stringify(POST_JSON), headers });
    console.log(await page.content());
    await browser.close();
})();

This helper function also works with any available HTTP method and custom headers.
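For instance, PUT and DELETE are just { url, method: 'PUT', postData, headers } and { url, method: 'DELETE' } descriptors passed to gotoExtended. The overrides it forwards to request.continue() can be sketched as a standalone function (buildRequestOverrides is our own name, introduced here for illustration; it mirrors the if-chain inside gotoExtended):

```javascript
// Sketch of the override object gotoExtended derives for request.continue().
// buildRequestOverrides is a hypothetical helper, not part of Puppeteer.
function buildRequestOverrides({ method = 'GET', postData, headers } = {}) {
    const overrides = {};
    if (method !== 'GET') overrides.method = method;
    if (postData) overrides.postData = postData;
    if (headers) overrides.headers = headers;
    return overrides;
}

// A PUT that updates a resource and a bare DELETE would translate to:
const putOverrides = buildRequestOverrides({
    method: 'PUT',
    postData: JSON.stringify({ hello: 'updated value' }),
    headers: { 'content-type': 'application/json' },
});
const deleteOverrides = buildRequestOverrides({ method: 'DELETE' });

console.log(putOverrides);
console.log(deleteOverrides); // { method: 'DELETE' }
```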

Conclusion

Being able to send any request type with Puppeteer lets web scraping specialists extend their capabilities and improve their data crawlers' performance by skipping unnecessary steps like form filling.

As usual, we recommend extending your web scraping knowledge with our other articles.

Happy Web Scraping, and don't forget to cover your code with unit-tests 👷
