How to Scrape Google Trends Data using Python

Satyam Tripathi · 11 min read

Google Trends tracks the popularity of search topics over time by collecting data from billions of searches. It's a valuable tool for analyzing trends, behaviors, and public interest. However, scraping Google Trends data can be challenging due to dynamic content and a complex DOM structure.

In this article, we'll walk you through a simple, step-by-step process on how to extract and clean Google Trends data using Python.

Why Scrape Google Trends Data?

Google Trends data can be an invaluable resource for gaining insight into keyword popularity, market behavior, and competitive trends. Scraping it can give you an edge in several areas, from SEO to business strategy:

  1. Keyword Research and SEO: Google Trends is a goldmine for keyword research. By analyzing trending search terms, SEO professionals adjust their content strategies to match what people are searching for. This helps boost website traffic and keeps content relevant by identifying emerging trends early on.
  2. Market Research and Consumer Insights: Businesses can use Google Trends data to better understand customer preferences and behavior. For example, e-commerce companies use this data to identify regional trends or seasonal demand, helping them optimize product launches and marketing campaigns.
  3. Competitive Analysis: Scraping Google Trends data can help businesses monitor competitors by analyzing brand-related search terms. This allows companies to see how they measure up in the market and discover winning strategies that can improve their own performance.
  4. Societal and Cultural Shifts: Journalists and researchers can tap into Google Trends to track public interest in specific events or topics. This data is especially useful for analyzing how major events, like elections or global news stories, influence public opinion.

Choosing the Right Library

A popular tool for scraping Google Trends is Pytrends, an unofficial API for Google Trends in Python. It provides a simple way to automate the download of Google Trends reports. However, it has certain limitations.

Since Pytrends uses unofficial methods to retrieve data, it may occasionally face rate limiting or provide incomplete data, leading to discrepancies between the results shown on the Google Trends website and the data fetched through the API.
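For context, here is roughly what a minimal Pytrends request looks like. This sketch follows the library's documented interface (TrendReq, build_payload, interest_over_time) and is shown only for comparison with the Playwright approach we take below; the figures it returns may not match the Google Trends website exactly:

from pytrends.request import TrendReq

# Connect to Google Trends through Pytrends' unofficial endpoint
pytrends = TrendReq(hl="en-GB", tz=330)

# Keyword "office chair", last 12 months, India
pytrends.build_payload(["office chair"], timeframe="today 12-m", geo="IN")

# Pandas structures for interest over time and related queries
interest_over_time = pytrends.interest_over_time()
related_queries = pytrends.related_queries()
print(interest_over_time.head())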

To overcome these limitations, you can use headless browsers like Playwright or Selenium. For this tutorial, we’ll focus on Playwright.

Choosing the Right Scraping Method

At first, I tried scraping Google Trends by interacting with the page directly through Playwright. This involved inspecting the Document Object Model (DOM), identifying the correct CSS or XPath selectors, and targeting specific elements to extract the data. While this method works, it is both time-consuming and prone to breaking whenever Google Trends updates its page structure.

A more reliable solution is to use the download buttons built into Google Trends, which export each widget's data as a CSV file. You can simply automate clicking these buttons with Playwright and save the files, without extracting individual elements from the page. Automating the download this way is generally more robust than scraping the DOM directly.

Below, you can see the download buttons that we will target using Playwright to automate the CSV file download.

Image 1: Export button on the related topics and related queries widgets

Image 2: Export button on the interest by subregion widget

Image 3: Export button on the interest over time widget

Formulating the URL

Let’s say you want to explore the trending keywords for an office chair over the past 12 months in India. After applying the filters (location: India, time: past 12 months), Google Trends will display related topics and queries.

Image: Google Trends results after applying the location and time filters

For instance, one of the rising search terms is “da urban merlion office chair”, which shows +950% growth, indicating a strong surge in interest.

Image: rising related queries for “office chair”

Now, if you switch to the Top section, you’ll see the most commonly searched queries related to “office chair”. For example, "chair for office" is the most popular related query with a relative interest score of 100 over the last 12 months.

Image: top related queries for “office chair”

These are the basics of Google Trends data. You can refine your search using different filters such as location, time range, categories, and search type. For instance, if you want to see trending terms on YouTube, simply switch to the YouTube search filter.

Image: switching to the YouTube search filter

Here’s the final URL after applying some filters:

https://trends.google.com/trends/explore?geo=IN&q=office%20chair&hl=en-GB
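If you prefer to build this URL in code, a small helper like the one below does the job. Treat the date and gprop parameter names as assumptions inferred from URLs the Google Trends UI produces, and verify them against a URL copied from your own browser:

from urllib.parse import urlencode, quote

def build_trends_url(keyword, geo="IN", hl="en-GB", timeframe=None, gprop=None):
    # Base "explore" endpoint used throughout this tutorial
    params = {"geo": geo, "q": keyword, "hl": hl}
    if timeframe:
        params["date"] = timeframe  # e.g. "today 12-m" (assumed parameter name)
    if gprop:
        params["gprop"] = gprop     # e.g. "youtube" for YouTube search (assumed)
    return "https://trends.google.com/trends/explore?" + urlencode(params, quote_via=quote)

print(build_trends_url("office chair"))
# https://trends.google.com/trends/explore?geo=IN&q=office%20chair&hl=en-GB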

Next, let's dive into building a Google Trends scraper using Playwright.

Setting Up the Environment

Before we begin, ensure that your development environment is properly set up:

  1. Install Python: Download and install the latest version from the official Python website.
  2. Choose an IDE: You can use PyCharm, Visual Studio Code, or Jupyter Notebook.

Next, create a new project. It's a good practice to use a virtual environment, as it helps manage dependencies and keeps your projects isolated. Navigate to your project directory and run the following command to create a virtual environment:

python -m venv google_trends

Activate the virtual environment:

  • On Windows, run:
    google_trends\Scripts\activate
  • On macOS and Linux, run:
    source google_trends/bin/activate

With the virtual environment activated, let’s install the required libraries. Install Playwright and its browser binaries with these commands:

pip install playwright
playwright install
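If you only plan to use Chromium, which is the browser this tutorial launches, you can limit the second command to that single browser binary:

playwright install chromium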

Setting Up Directories to Store Downloaded and Cleaned Files

Before you download the data, let’s create directories to organize the files. We’ll create one directory for the raw CSV files and another for the cleaned versions.

import os

# Save the downloads
download_dir = os.path.join(os.getcwd(), "downloads")
os.makedirs(download_dir, exist_ok=True)

# Save the cleaned files
cleaned_dir = os.path.join(os.getcwd(), "cleaned")
os.makedirs(cleaned_dir, exist_ok=True)

Here, we use Python’s os module to create the directories. The os.makedirs() function ensures that the directories are created if they don't already exist. Once this is run, you’ll see two new folders—downloads and cleaned—in your working directory.

Downloading the Google Trends Data

Now that you have the directories set up, let's automate the data download from Google Trends.

import asyncio
from playwright.async_api import async_playwright

async def download_google_trends_data():
    print("Starting Google Trends data download...")
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(accept_downloads=True)

        page = await context.new_page()
        url = "https://trends.google.com/trends/explore?geo=IN&q=office%20chair&hl=en-GB"

        # Retry the page load when Google responds with 429 (too many requests)
        max_retries = int(os.getenv("MAX_RETRIES", 5))
        for attempt in range(max_retries):
            await asyncio.sleep(5)
            response = await page.goto(url)
            if response.status == 429:
                print(f"Retry {attempt + 1}/{max_retries} - Received 429 status code")
                await asyncio.sleep(5 + 2**attempt)  # exponential backoff
                continue
            elif response.status != 200:
                print(f"Error: Failed to load page (status {response.status})")
                await browser.close()
                return
            print("Page loaded successfully")
            break
        else:
            # Every attempt hit the rate limit
            print("Error: Page could not be loaded after all retries")
            await browser.close()
            return

        await page.wait_for_load_state("networkidle", timeout=60000)
        await page.wait_for_selector("button.export", state="visible", timeout=20000)

        # Keep only the export buttons that are actually visible on the page
        download_buttons = await page.query_selector_all("button.export")
        download_buttons = [
            button for button in download_buttons if await button.is_visible()
        ]

        if len(download_buttons) < 4:
            print(
                f"Expected 4 download buttons, but found {len(download_buttons)}. "
                "Please check if all buttons are loaded properly."
            )
            await browser.close()
            return

        file_names = [
            "Interest_Over_Time.csv",
            "Interest_By_SubRegion.csv",
            "Related_Topics.csv",
            "Related_Queries.csv",
        ]

        # Click each export button and save the resulting CSV file
        for idx, button in enumerate(download_buttons[:4]):
            print(f"Downloading: {file_names[idx]}")
            await button.scroll_into_view_if_needed()
            async with page.expect_download() as download_info:
                await button.click()
            download = await download_info.value
            await download.save_as(os.path.join(download_dir, file_names[idx]))
            print(f"Downloaded {file_names[idx]}")

        await browser.close()
        print("All files downloaded successfully")

This function navigates to the Google Trends page and downloads four CSV files: Interest Over Time, Interest By Subregion, Related Topics, and Related Queries.

If you’re new to Playwright and want a step-by-step guide, I’ve created a 4-part Playwright series that’s beginner-friendly and covers everything you need to know.

Cleaning the Related Topics and Queries Data

After downloading the data, you need to clean it to remove inconsistencies and irrelevant entries. Google Trends includes both 'Top' and 'Rising' categories for related topics and queries, but exports them into a single CSV file. To make analysis easier, we'll split each file into separate CSVs per category.

import pandas as pd
import csv

def clean_related_data(file_path, output_top_path, output_rising_path, columns):
    if not os.path.exists(file_path):
        print(f"Error: File not found: {file_path}")
        return
    print(f"Cleaning {os.path.basename(file_path)}...")

    # Read the raw CSV into a list of rows
    with open(file_path, "r", encoding="utf-8-sig") as file:
        lines = list(csv.reader(file))

    # Find where the TOP and RISING sections begin
    top_start = next(i for i, line in enumerate(lines) if line and line[0] == "TOP") + 1
    rising_start = (
        next(i for i, line in enumerate(lines) if line and line[0] == "RISING") + 1
    )

    top_data = lines[top_start : rising_start - 1]
    rising_data = lines[rising_start:]

    top_df = pd.DataFrame(top_data, columns=columns)
    rising_df = pd.DataFrame(rising_data, columns=columns)

    # Drop the trailing row of each section
    top_df = top_df[:-1]
    rising_df = rising_df[:-1]

    top_df.to_csv(output_top_path, index=False, encoding="utf-8-sig")
    rising_df.to_csv(output_rising_path, index=False, encoding="utf-8-sig")
    print("Cleaned data saved for TOP and RISING queries")

This function separates the "Top" and "Rising" sections from the downloaded files and saves them into two separate CSV files.

Cleaning the Interest by SubRegion Data

Let’s clean the data that shows user interest by geographic region.

def clean_interest_by_subregion_data(file_path, output_path):
    if not os.path.exists(file_path):
        print(f"Error: File not found: {file_path}")
        return
    print(f"Cleaning {os.path.basename(file_path)}...")

    # Read the raw CSV into a list of rows
    with open(file_path, "r", encoding="utf-8-sig") as file:
        lines = list(csv.reader(file))

    # Skip the header rows and keep the region/interest pairs
    region_data = pd.DataFrame(lines[3:], columns=["Region", "Interest"])

    # Drop the trailing row
    region_data = region_data[:-1]

    region_data.to_csv(output_path, index=False, encoding="utf-8-sig")
    print(f"Cleaned data saved: {output_path}")

This function processes the "Interest By Subregion" CSV file, skipping its header rows, dropping the trailing row, and saving the region and interest values as a clean two-column CSV.

Cleaning the Interest Over Time Data

Finally, let’s clean the data that tracks search interest over time.

def clean_interest_over_time_data(file_path, output_path):
    if not os.path.exists(file_path):
        print(f"Error: File not found: {file_path}")
        return
    print(f"Cleaning data from: {file_path}")

    # Skip the header rows that Google Trends adds above the data
    df = pd.read_csv(file_path, skiprows=2)

    if df.shape[1] >= 2:
        if df.shape[1] != 2:
            print(f"Warning: Found {df.shape[1]} columns, using first two")
        cleaned_df = df.iloc[:, [0, 1]]
        cleaned_df.columns = ["Week", "Search Interest"]

        # Drop the trailing row
        cleaned_df = cleaned_df[:-1]

        cleaned_df.to_csv(output_path, index=False, encoding="utf-8-sig")
        print(f"Cleaned data saved: {os.path.basename(output_path)}")
    else:
        print("Error: File has unexpected number of columns")

This function ensures that the "Interest Over Time" CSV is formatted correctly by selecting the appropriate columns and saving them to a new file.

Main Function

The main function first triggers the download_google_trends_data() function to download the CSV files, then processes and cleans each file using the corresponding cleaning functions.

if __name__ == "__main__":
    asyncio.run(download_google_trends_data())

    clean_related_data(
        os.path.join(download_dir, "Related_Topics.csv"),
        os.path.join(cleaned_dir, "cleaned_top_topics.csv"),
        os.path.join(cleaned_dir, "cleaned_rising_topics.csv"),
        ["Topics", "Interest"],
    )

    clean_related_data(
        os.path.join(download_dir, "Related_Queries.csv"),
        os.path.join(cleaned_dir, "cleaned_top_queries.csv"),
        os.path.join(cleaned_dir, "cleaned_rising_queries.csv"),
        ["Query", "Interest"],
    )

    clean_interest_by_subregion_data(
        os.path.join(download_dir, "Interest_By_SubRegion.csv"),
        os.path.join(cleaned_dir, "cleaned_region_data.csv"),
    )

    clean_interest_over_time_data(
        os.path.join(download_dir, "Interest_Over_Time.csv"),
        os.path.join(cleaned_dir, "cleaned_interest_over_time.csv"),
    )

    print("Data download and cleaning completed!")

Running the Code

You can find the complete code in my GitHub Gist: Google Trends Scraper. Simply copy and paste the code to your environment and run it. When executed, two folders—downloads and cleaned—will be automatically created.

Note that when running the code, you might encounter a 429 (Too Many Requests) status code, which produces a page like the one shown below. Our code handles this by automatically retrying the request with a short exponential backoff.

Image: Google Trends page returned with a 429 status code

Once the code runs successfully, you’ll see four files in the downloads folder:

Image: the four downloaded CSV files

In the cleaned folder, you’ll find six cleaned versions of these files, organized and ready for use:

Image: the six cleaned CSV files

Here’s an example of a downloaded file. As you can see, it contains unstructured data and irrelevant entries, with both Top and Rising data combined:

Image: a raw downloaded CSV with Top and Rising data combined

Now, take a look at one of the cleaned files. The data is well-organized, more readable, and easy to use for further analysis:

Image: a cleaned CSV file, ready for analysis

That’s it!

Challenges and Considerations

While Google Trends provides valuable data, scraping it directly can be difficult due to Google’s advanced bot detection systems. Even with methods like rotating IPs, adding random delays, and mimicking user behavior, these techniques aren't foolproof, especially if you're scraping at scale across multiple URLs.

A more efficient solution is to use ScrapingAnt, a web scraping API that manages everything for you—IP rotation, delays, and browser emulation—ensuring that you can scrape large amounts of Google Trends data without getting blocked.

To start using the ScrapingAnt API, you only need two things: the URL you’d like to scrape and the API key, which can be obtained from your ScrapingAnt dashboard after signing up for a free test account.

Image: the ScrapingAnt dashboard

To integrate the ScrapingAnt API into your Python project, install the Python client, scrapingant-client:

pip install scrapingant-client

You can also explore more on the GitHub project page.

The ScrapingAnt API client is straightforward to use, supporting various input and output formats as described on the Request and Response Format page. Below is a simple example demonstrating its usage:

from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token="YOUR_SCRAPINGANT_API_KEY")

response = client.general_request(
    "https://www.amazon.com/Dowinx-Headrest-Ergonomic-Computer-Footrest/dp/B0CVWXK632/"
)
print(response.content)

Here's our result:

Image: HTML returned by the ScrapingAnt API

This shows how ScrapingAnt simplifies the web scraping process by handling the complexities for you.
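You can point the same general_request call at the Google Trends URL we formulated earlier. The snippet below is a minimal sketch that only fetches the page HTML through ScrapingAnt's proxies; extracting the trend figures from that HTML is still up to you:

from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token="YOUR_SCRAPINGANT_API_KEY")

# Fetch the Google Trends explore page through ScrapingAnt
response = client.general_request(
    "https://trends.google.com/trends/explore?geo=IN&q=office%20chair&hl=en-GB"
)
print(response.content[:500])  # first part of the returned HTML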

Wrapping Up

Google Trends data is an invaluable resource for gaining insights into keyword popularity, market trends, and competitor activity. While the Pytrends Python library is a popular choice for accessing this data, it has its limitations, so we opted for Playwright to download the data directly.

The downloaded data was unstructured and contained irrelevant entries, making it difficult to analyze. To address this, we implemented a thorough data-cleaning process to ensure the final dataset was clear, organized, and ready for meaningful insights.

Scraping at scale presents challenges like bot detection and rate limiting, but using tools like ScrapingAnt simplifies the process by handling these issues automatically. Get started today with 10,000 free API credits 🚀
