How to run Playwright on AWS Lambda
Oleg Kulyk
Co-Founder @ ScrapingAnt

In this article, I’d like to share a quick guide on how to run Playwright inside AWS Lambda. There are plenty of similar guides for Puppeteer, but only a few cover its successor from Microsoft.
# Playwright 101

Before getting into the AWS Lambda part, let’s briefly go over what we are trying to achieve with Playwright. To get the content of a given URL with Playwright, we have to go through four steps:
- Launch a new browser
- Open a new page
- Navigate to the given URL
- Get the page content
Here’s what that looks like:
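The four steps above can be sketched roughly as follows. This is a minimal sketch rather than a definitive implementation: the helper name `getPageContent` is hypothetical, and it assumes the `playwright` package is installed. The browser type is passed in as a parameter, so the same function also works with `firefox` or `webkit`.

```javascript
// Fetches the rendered HTML of a URL with any Playwright browser type
// (chromium, firefox or webkit). `getPageContent` is an illustrative name.
async function getPageContent(browserType, url) {
  // 1. Launch a new browser
  const browser = await browserType.launch();
  try {
    // 2. Open a new page
    const page = await browser.newPage();
    // 3. Navigate to the given URL
    await page.goto(url);
    // 4. Get the page content (serialized HTML)
    return await page.content();
  } finally {
    // Always release the browser, even if navigation fails
    await browser.close();
  }
}
```

Assuming Playwright is installed locally, it could be called as `getPageContent(require('playwright').chromium, 'https://example.com').then(console.log)`.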
# AWS Lambda 101

Lambda functions (NodeJS ones, in our case) can be called from a frontend website or any other code via an HTTP or SDK request. They give us the power of a backend server without having to create and maintain a full-blown API.
AWS Lambda beginner’s guide: https://aws.amazon.com/lambda/getting-started/
An extensive guide to AWS Lambda with SAM: https://itnext.io/creating-aws-lambda-applications-with-sam-dd13258c16dd
A sample of a single AWS Lambda function is shown below:
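As an illustration, a single Lambda function in NodeJS might look like the sketch below. The `event.name` field and the `helloHandler` name are assumptions made for the example, and the `statusCode`/`body` response shape assumes the function sits behind an HTTP integration such as API Gateway.

```javascript
// A minimal Lambda handler: receives an event, returns an HTTP-style response.
// `event.name` is a hypothetical input field used only for this example.
const helloHandler = async (event) => {
  const name = (event && event.name) || 'world';
  return {
    statusCode: 200,
    body: JSON.stringify({ message: `Hello, ${name}!` }),
  };
};

// Lambda looks the function up by its exported name, e.g. "index.handler"
exports.handler = helloHandler;
```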
For example, if we’d like to implement an AWS Lambda scraper:
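A naive scraper could combine the Playwright steps with a Lambda handler along these lines. This is a sketch under stated assumptions: the `event.url` field and the `makeScraperHandler` helper are hypothetical, and the wiring at the bottom uses the stock `playwright` package, which runs locally but whose bundled Chromium is not built for the Lambda environment.

```javascript
// Builds a Lambda handler that returns the HTML of `event.url`.
// `launchBrowser` is any async function resolving to a Playwright Browser.
const makeScraperHandler = (launchBrowser) => async (event) => {
  const url = event && event.url; // hypothetical input field
  if (!url) {
    return { statusCode: 400, body: 'Missing "url" in the event' };
  }
  const browser = await launchBrowser();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return { statusCode: 200, body: await page.content() };
  } finally {
    // Close the browser so the invocation does not leak processes
    await browser.close();
  }
};

// Wiring with the stock Playwright package; it is required lazily,
// only when the handler is actually invoked.
exports.handler = makeScraperHandler(() => require('playwright').chromium.launch());
```

Keeping the browser launch behind a function makes it easy to swap in a Lambda-compatible Chromium later without touching the scraping logic.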
# Putting Playwright and AWS Lambda together

You might wonder why we don’t just use the Playwright library as is. The problem is the Chromium binary bundled with the library: it has to be compiled specifically for the Lambda environment, and Microsoft does not provide such a build by default.
So where can we find the right binaries?
In chrome-aws-lambda, a Chromium binary built for AWS Lambda and Google Cloud Functions.
To connect it to Playwright, we have two options:
# 1. Lazy and simple headless Chrome running

The following library and NPM package provides both Playwright support and the Chromium binaries from chrome-aws-lambda: https://github.com/JupiterOne/playwright-aws-lambda

It can be installed via npm as playwright-aws-lambda, together with playwright-core.
Our final code will look like this:
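A sketch of such a handler, based on the `launchChromium()` function exposed by playwright-aws-lambda, is shown here. The `event.url` field, the fallback URL and the `createHandler` wrapper are assumptions for illustration; the wrapper exists only so the launcher can be replaced when testing.

```javascript
// `launch` defaults to playwright-aws-lambda's launchChromium(); it is a
// parameter only so that the handler can be exercised without a real browser.
const createHandler = (
  launch = () => require('playwright-aws-lambda').launchChromium()
) => async (event) => {
  let browser = null;
  try {
    browser = await launch();
    const context = await browser.newContext();
    const page = await context.newPage();
    // `event.url` is a hypothetical input field with a placeholder fallback
    await page.goto((event && event.url) || 'https://example.com');
    return { statusCode: 200, body: await page.content() };
  } finally {
    // Close the browser even when launching or navigation throws
    if (browser) {
      await browser.close();
    }
  }
};

exports.handler = createHandler();
```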
However, you won’t be able to update the Playwright or Chromium version without waiting for, or contributing to, the playwright-aws-lambda library.
# 2. Flexible and maintainable

We can connect Playwright and chrome-aws-lambda ourselves.
First, install the dependencies, playwright-core and chrome-aws-lambda, via npm.
Then pass the Chromium executable path to Playwright in the following way:
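One way to wire the two packages together is sketched below. It assumes `playwright-core` and `chrome-aws-lambda` are installed in mutually compatible versions; the helper names `buildLaunchOptions` and `launchLambdaChromium` are hypothetical.

```javascript
// Maps the values exposed by chrome-aws-lambda onto Playwright launch options.
// `chromium` is the object exported by require('chrome-aws-lambda').
async function buildLaunchOptions(chromium) {
  return {
    args: chromium.args,
    // executablePath is a promise in chrome-aws-lambda, so it has to be awaited
    executablePath: await chromium.executablePath,
    headless: chromium.headless,
  };
}

// Launches Playwright's Chromium driver against the Lambda-compatible binary.
// Both packages are required lazily, when the browser is actually needed.
async function launchLambdaChromium() {
  const chromium = require('chrome-aws-lambda');
  const playwright = require('playwright-core');
  return playwright.chromium.launch(await buildLaunchOptions(chromium));
}
```

The browser returned by `launchLambdaChromium()` can then be used exactly like one launched by the regular Playwright package.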
This way, we can change both the Playwright and chrome-aws-lambda versions. It may be a bit tricky, though, because not all versions are cross-compatible, so treat it as a starting point for your further experiments.
# Conclusion

With Playwright you get the latest browser API features from the former Puppeteer team, but community support is still limited, so you may have to resolve some issues on your own.
To learn more about Playwright, visit the official GitHub repo: https://github.com/microsoft/playwright
You can also use our web scraping API to skip all these difficulties and just enjoy your data mining experience.