How to run Playwright on AWS Lambda

In this article, I’d like to share a quick guide on how to run Playwright inside AWS Lambda. There are plenty of similar guides for Puppeteer, but only a few for its successor from Microsoft.

Playwright 101

Before getting into the AWS Lambda portion of this, let’s briefly go over what we are trying to achieve with Playwright. To get the content of a given URL with Playwright, we have to go through four steps:

  • Launch a new browser
  • Open a new page
  • Navigate to the given URL
  • Get the page content

Here’s what that looks like:


const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch();
  const context = await browser.newContext();
  const page = await context.newPage();
  await page.goto('http://whatsmyuseragent.org/');
  const result = await page.content();
  console.log(result);
  await browser.close();
})();

AWS Lambda 101

AWS Lambda functions are, in our case, Node.js functions that can be called from a frontend website or any other code via an HTTP or SDK request. They give us the power of a backend server without having to create and maintain a full-blown API.

AWS Lambda beginner’s guide: https://aws.amazon.com/lambda/getting-started/

An extensive guide to AWS Lambda with SAM: https://itnext.io/creating-aws-lambda-applications-with-sam-dd13258c16dd

A minimal AWS Lambda function looks like this:


exports.handler = async (event, context) => {
  // do your stuff here
}

For example, a scraper implemented as an AWS Lambda function could read the page to scrape from the request body:


exports.handler = async (event, context) => {
    const params = JSON.parse(event.body);
    const pageToScrape = params.pageToScrape;
}

Putting it all together

You may wonder why we can’t just use the Playwright library as is.
The problem is the Chromium binaries bundled with the library: Chromium has to be compiled specifically for the AWS Lambda environment, and Microsoft does not provide such a build by default.

So where to find the right binaries?

The following repository helps us to resolve this problem: https://github.com/alixaxel/chrome-aws-lambda

It’s a Chromium Binary for AWS Lambda and Google Cloud Functions.

There are two ways to connect it to Playwright:

1. Lazy and simple

The following library and npm package supports both Playwright and the Chromium binaries from chrome-aws-lambda: https://github.com/JupiterOne/playwright-aws-lambda

It can be installed with npm:


npm install playwright-core playwright-aws-lambda --save

The final code looks like this:


const playwright = require('playwright-aws-lambda');

exports.handler = async (event, context) => {
  const params = JSON.parse(event.body);
  const pageToScrape = params.pageToScrape;

  let result = null;
  let browser = null;

  try {
    // Assign to the outer variables (no `const` here), otherwise `finally`
    // would only ever see `browser === null` and never close the browser
    browser = await playwright.launchChromium();
    const browserContext = await browser.newContext();

    const page = await browserContext.newPage();
    await page.goto(pageToScrape);
    result = await page.content();
    console.log(result);
  } finally {
    // Always close the browser, even if scraping failed
    if (browser !== null) {
      await browser.close();
    }
  }

  return result;
};


The downside is that you can’t update the Playwright or Chromium version yourself: you have to wait for a new release of the playwright-aws-lambda library or contribute to it.

2. Flexible and maintainable

We can connect Playwright and chrome-aws-lambda ourselves.

Install the dependencies:


npm install chrome-aws-lambda playwright-core --save

Then pass the Chromium executable path to Playwright in the following way:


const { chromium } = require('playwright-core');
const awsChromium = require('chrome-aws-lambda');

.....

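const browser = await chromium.launch({
  args: awsChromium.args,
  // executablePath is an async getter, so it has to be awaited
  executablePath: await awsChromium.executablePath,
  // true inside AWS Lambda, where no display is available
  headless: awsChromium.headless,
});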

.....

This way, we can change the Playwright and chrome-aws-lambda versions independently. It may be a bit tricky, because not all versions are compatible with each other, so treat this as a starting point for your own experiments.
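Since the two options differ only in how the browser is launched, one possible way to keep the handler independent of that choice is to pass the launch function in. This is a sketch rather than a recommended structure: the `createScrapeHandler` name and the `{ statusCode, body }` response shape are hypothetical, and the commented-out wiring assumes the packages named above are installed.

```javascript
// Hypothetical factory: builds a Lambda handler around any function that
// resolves to a Playwright-compatible browser object.
function createScrapeHandler(launchBrowser) {
  return async (event, context) => {
    const { pageToScrape } = JSON.parse(event.body);
    let browser = null;

    try {
      browser = await launchBrowser();
      const browserContext = await browser.newContext();
      const page = await browserContext.newPage();
      await page.goto(pageToScrape);
      return { statusCode: 200, body: await page.content() };
    } finally {
      // Runs on success and on failure, so the browser is always closed
      if (browser !== null) {
        await browser.close();
      }
    }
  };
}

// Option 1 wiring (assumes playwright-aws-lambda is installed):
//   const playwright = require('playwright-aws-lambda');
//   exports.handler = createScrapeHandler(() => playwright.launchChromium());
//
// Option 2 wiring (assumes playwright-core and chrome-aws-lambda are installed):
//   const { chromium } = require('playwright-core');
//   const awsChromium = require('chrome-aws-lambda');
//   exports.handler = createScrapeHandler(async () => chromium.launch({
//     args: awsChromium.args,
//     executablePath: await awsChromium.executablePath,
//     headless: awsChromium.headless,
//   }));
```

A side effect of this design is that the handler can be exercised locally with a stub browser object, without deploying anything or downloading Chromium.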

Conclusion

With Playwright you get the latest browser API features from the former Puppeteer team, but the community is still fairly small, so you may have to resolve some issues on your own.

To learn more about Playwright, visit the official GitHub repo: https://github.com/microsoft/playwright

Alternatively, you can use our API to skip all these difficulties and simply enjoy your data mining experience.
