How to run Playwright on AWS Lambda

In this article, I’d like to share a quick guide on how to run Playwright inside AWS Lambda. There are plenty of similar guides for Puppeteer, but only a few cover its successor from Microsoft.

Playwright 101

Before getting into the AWS Lambda part, let’s briefly go over what we are trying to achieve with Playwright. To get the content of a given URL with Playwright, we have to go through four steps:

Launch a new browser
Open a new page
Navigate to the given URL
Get the page content

Here’s what that looks like:

const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch();
  const context = await browser.newContext();
  const page = await context.newPage();
  await page.goto('https://scrapingant.com/');
  const result = await page.content();
  console.log(result);
  await browser.close();
})();

AWS Lambda 101

AWS Lambda functions are pieces of code (Node.js in our case) that can be invoked from a frontend website or any other code via an HTTP or SDK request. They give us the power of a backend server without having to create and maintain a full-blown API.

AWS Lambda beginner’s guide: https://aws.amazon.com/lambda/getting-started/

An extensive guide to AWS Lambda with SAM: https://itnext.io/creating-aws-lambda-applications-with-sam-dd13258c16dd

A minimal AWS Lambda function looks like this:

exports.handler = async (event, context) => {
  // do your stuff here
}

For example, if we’d like to implement a scraper as an AWS Lambda function:

exports.handler = async (event, context) => {
    const params = JSON.parse(event.body);
    const pageToScrape = params.pageToScrape;
    // scrape the page at pageToScrape here
}

Putting Playwright and AWS Lambda together

You may be wondering why we can’t just use the Playwright library as is. The problem lies in the Chromium binaries bundled with the library: they need to be compiled specifically for the Lambda environment, and Microsoft does not support this out of the box.

So where can we find the right binaries?

The chrome-aws-lambda project provides a Chromium binary for AWS Lambda and Google Cloud Functions.

There are two ways to connect it to Playwright:

1. Lazy and simple headless Chrome running

The following library and npm package bundles support for both Playwright and the Chromium binaries from chrome-aws-lambda: https://github.com/JupiterOne/playwright-aws-lambda

It can be installed via npm:

npm install playwright-core playwright-aws-lambda --save

And our final code will look like this:

const playwright = require('playwright-aws-lambda');

exports.handler = async (event, context) => {
  const params = JSON.parse(event.body);
  const pageToScrape = params.pageToScrape;

  let result = null;
  let browser = null;

  try {
    // Assign to the outer variables so that `finally` can close the browser
    browser = await playwright.launchChromium();
    const browserContext = await browser.newContext();

    const page = await browserContext.newPage();
    await page.goto(pageToScrape);
    result = await page.content();
    console.log(result);
  } finally {
    // Always release the browser, even if navigation fails
    if (browser !== null) {
      await browser.close();
    }
  }

  return result;
};

The downside is that you can’t update the Playwright or Chromium version without waiting for, or contributing to, a new release of the playwright-aws-lambda library.
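To try a handler like the ones in this article before deploying it, you can call it with a hand-built event. Here is a minimal sketch; makeScrapeEvent is a hypothetical helper, and it only fills in event.body, the one field these handlers read. A real Lambda event carries many more fields.

```javascript
// Hypothetical helper: builds the minimal event shape that the handlers in
// this article read, namely a JSON string in event.body.
const makeScrapeEvent = (pageToScrape) => ({
  body: JSON.stringify({ pageToScrape }),
});

const event = makeScrapeEvent('https://scrapingant.com/');
console.log(event.body); // prints {"pageToScrape":"https://scrapingant.com/"}

// Usage, assuming the handler is saved as index.js (an assumed file name):
// require('./index').handler(event).then(console.log);
```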

2. Flexible and maintainable

We can connect Playwright and chrome-aws-lambda ourselves.

Install the dependencies:

npm install chrome-aws-lambda playwright-core --save

Then pass the Chromium executable path to Playwright in the following way:

const { chromium } = require('playwright-core');
const awsChromium = require('chrome-aws-lambda');

//.....

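const browser = await chromium.launch({
  // Chromium flags that chrome-aws-lambda provides for the Lambda environment
  args: awsChromium.args,
  // Lambda has no display, so the browser must run headless
  headless: true,
  // executablePath is a Promise in chrome-aws-lambda, so it has to be awaited
  executablePath: await awsChromium.executablePath,
});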

//.....

This way, we can change the Playwright and chrome-aws-lambda versions independently. It can be tricky, though, because not all versions are cross-compatible, so treat this as a starting point for your own experiments.
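When experimenting with different version pairs, it can help to keep the launch wiring in one small function. The sketch below is one possible shape, not an official API: buildLaunchOptions is a hypothetical helper that takes values already read from chrome-aws-lambda (args, and the resolved executablePath) and returns an options object for chromium.launch.

```javascript
// Hypothetical helper: turns values exposed by chrome-aws-lambda (already
// resolved by the caller) into Playwright launch options.
const buildLaunchOptions = ({ args = [], headless = true, executablePath } = {}) => {
  const options = { args, headless };
  // Only set executablePath when one was resolved, so that the key is
  // omitted entirely otherwise.
  if (executablePath) {
    options.executablePath = executablePath;
  }
  return options;
};

// Usage inside an async handler (chromium comes from playwright-core,
// awsChromium from chrome-aws-lambda):
// const browser = await chromium.launch(buildLaunchOptions({
//   args: awsChromium.args,
//   executablePath: await awsChromium.executablePath,
// }));
```

Keeping this in one place means a version bump that changes the recommended flags only touches a single function.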

Conclusion

With Playwright you get the latest browser API features from the former Puppeteer team, but community support is still limited, so you may have to resolve some issues on your own.

To learn more about Playwright, visit the official GitHub repo: https://github.com/microsoft/playwright

Alternatively, you can use our web scraping API to skip all these difficulties and just enjoy your data mining experience.
