
Scrape Google Search Results With Python (2024)

Scrape Google Search Results Using Python

The Python programming language was first released in 1991 by Guido van Rossum, with an emphasis on code readability and clear, concise syntax.

Python has become very popular in the web scraping community thanks to advantages such as readability and scalability. This makes it a strong alternative to other programming languages and a good fit for web scraping tasks.

This blog post will not only show how to scrape Google but also explain why Python is a good choice for extracting data from Google and what the benefits of collecting information from Google are.

We are going to use HTTPX to fetch the raw HTML and Beautiful Soup (BS4) to parse it.

By the end of this article, you will have a basic understanding of scraping Google Search Results with Python. You can also leverage this knowledge for future web scraping projects with other programming languages.

What are Google Search Results?

Google Search Results are the listings displayed on the search engine results page for a query entered in the search bar. These results often include organic search results, knowledge graphs, “People Also Ask” sections, news articles, and other relevant content, depending on the user’s query. Recently, Google has also added search results powered by generative AI, a significant change in how results are displayed: they give users a summarized answer that combines several sources.

Why Python for Scraping Google?

Python is a robust and powerful language that places great importance on code readability and clarity. This enables beginners to learn and implement scraping scripts quickly and easily. It also has a large and active community of developers who can help when you run into a problem in your code.

Another advantage of using Python is that it offers a wide range of frameworks and libraries specifically designed for scraping data from the web, including Scrapy, BeautifulSoup, Playwright, and Selenium.

Overall, Python offers numerous advantages like high performance, scalability, and various other scraping resources. This makes it an excellent choice not only for extracting data from Google but also for other web scraping tasks.

Read More: The Top Preferred Languages For Web Scraping

Scraping Google Search Results Using Python

In this blog post, we will create a basic Python script to scrape the first ten Google Search Results, including their title, descriptions, and links.

Set-Up

For those users who have not installed Python on their devices, please consider these videos:

  1. How to install Python on Windows?
  2. How to install Python on MacOS?

If you don’t want to watch videos, you can install Python directly from the official Python website.

Installing Libraries

Now, let’s install the necessary libraries for this project in our folder.

  1. Beautiful Soup: A third-party library for parsing the HTML extracted from websites.
  2. HTTPX: A fully featured HTTP client for Python, used here to fetch data from websites.

If you don’t want to read their documentation, install these two libraries by running the below commands.

pip install httpx
pip install beautifulsoup4

Process

So, we have completed the setup of our Python project for scraping Google. Let us first import the libraries we will use further in this tutorial. 

import httpx
import asyncio
from bs4 import BeautifulSoup

Then, we will define an asynchronous function that will scrape the organic results from the Google search results page.

async def get_organic_data():
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4703.0 Safari/537.36"
    }
    async with httpx.AsyncClient() as client:
        response = await client.get("https://www.google.com/search?q=python+tutorial&gl=us&hl=en", headers=headers)

After defining the function, we set the User-Agent header in the headers variable so that our scraping bot mimics an organic user.

User-Agent is a request header that identifies the software making the request, such as the browser, its version, and the operating system.

Refer to this guide if you want to learn more about headers: Web Scraping With Python

Then, we used an asynchronous context manager to create an HTTP client. Finally, we used this client to make an HTTP GET request on our target URL with the specified headers by using the await keyword to wait for the response from the server.
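The target URL in this tutorial is hard-coded. As a minimal sketch of how it could be built for any query, the helper below uses the standard library's urlencode. The function name build_search_url and its argument names are hypothetical; q, gl, and hl are the parameters already present in the URL above.

```python
from urllib.parse import urlencode

GOOGLE_SEARCH_URL = "https://www.google.com/search"


def build_search_url(query: str, country: str = "us", language: str = "en") -> str:
    """Return a Google search URL for the given query.

    build_search_url is a hypothetical helper; q, gl and hl mirror the
    parameters of the hard-coded URL used in this tutorial.
    """
    params = {"q": query, "gl": country, "hl": language}
    # urlencode escapes reserved characters and turns spaces into "+"
    return f"{GOOGLE_SEARCH_URL}?{urlencode(params)}"


print(build_search_url("python tutorial"))
```

The returned string can be passed to client.get() in place of the literal URL.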

Now, within the context manager, we will create a BeautifulSoup object to parse and navigate through the HTML.

    async with httpx.AsyncClient() as client:
        response = await client.get("https://www.google.com/search?q=python+tutorial&gl=us&hl=en", headers=headers)
        soup = BeautifulSoup(response.content, "html.parser")

After creating the Beautiful Soup object, we will locate the tags for the required elements from the HTML.

Inspecting the HTML

If you inspect the webpage, you will see that every organic result sits inside a div container with the class g.

So, we will loop over every div tag with the class g to get the required information from the HTML.

        organic_results = []

        for el in soup.select(".g"):

Then, we will locate the tags for the title, description, and link.

Extracting the HTML

If you inspect the HTML further, you will find that the title is in an h3 tag, the link matches the CSS selector .yuRUbf > a, and the description matches the CSS selector .VwiC3b.
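To see how these selectors behave without sending a request to Google, here is a small sketch that runs them against a hand-written HTML fragment. The fragment is invented for illustration and only imitates the class names mentioned above; Google's real markup is larger and changes over time.

```python
from bs4 import BeautifulSoup

# Hand-written fragment that imitates one organic result; not real Google output
SAMPLE_HTML = """
<div class="g">
  <div class="yuRUbf"><a href="https://example.com/"><h3>Example title</h3></a></div>
  <div class="VwiC3b">Example description.</div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
result = soup.select_one(".g")

# The same three selectors the scraper uses for each result container
title = result.select_one("h3").text
link = result.select_one(".yuRUbf > a")["href"]
description = result.select_one(".VwiC3b").text

print(title, link, description)
```

Note that select_one returns None when nothing matches, so a selector that no longer fits Google's markup shows up as an AttributeError or TypeError on the following attribute access.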

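        organic_results = []

        for el in soup.select(".g"):
            title = el.select_one("h3")
            link = el.select_one(".yuRUbf > a")
            description = el.select_one(".VwiC3b")

            # select_one returns None when nothing matches, so skip
            # containers that are missing any of the three elements
            if title is None or link is None or description is None:
                continue

            organic_results.append({
                "title": title.text,
                "link": link["href"],
                "description": description.text,
                "rank": len(organic_results) + 1
            })

        print(organic_results)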


asyncio.run(get_organic_data())

Run this code in your terminal. You will be able to get the required data from Google.

[
  {
    "title": "The Python Tutorial \u2014 Python 3.11.3 documentation",
    "link": "https://docs.python.org/3/tutorial/",
    "description": "This tutorial introduces the reader informally to the basic concepts and features of the Python language and system. It helps to have a Python interpreter\u00a0...",
    "rank": 1
  },
  {
    "title": "Python Tutorial",
    "link": "https://www.w3schools.com/python/",
    "description": "Learn by examples! This tutorial supplements all explanations with clarifying examples. See All Python Examples. Python Quiz. Test your Python skills with a\u00a0...",
    "rank": 2
  },
  .....
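The listing above is JSON-formatted, whereas calling print() on a list of dictionaries shows Python's own representation. One way to get output in this shape is to serialize the list with the standard json module. This is a sketch, and the sample data below is a shortened copy of the first result rather than fresh scraper output.

```python
import json

# Sample data shaped like the scraper's organic_results list
organic_results = [
    {
        "title": "The Python Tutorial \u2014 Python 3.11.3 documentation",
        "link": "https://docs.python.org/3/tutorial/",
        "rank": 1,
    }
]

# indent=2 pretty-prints; the default ensure_ascii=True escapes
# non-ASCII characters as \uXXXX sequences, as in the listing above
formatted = json.dumps(organic_results, indent=2)
print(formatted)
```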

Congratulations 🎉🎉!! You have successfully made a Python script to scrape Google Search Results.

However, this method is not suitable for scraping Google at a large scale, as a high volume of requests can get your IP blocked by Google, possibly permanently. Instead, you can try this Google Scraper API to scrape data from Google without getting blocked.
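If you do send more than a handful of requests yourself, spacing them out is a common courtesy. The sketch below computes exponential backoff delays with random jitter. The function names and default values are hypothetical, and waiting between requests reduces load on the server but does not guarantee that you will avoid a block.

```python
import random


def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Exponential delays in seconds: base, 2*base, 4*base, ... limited to cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def with_jitter(delay: float) -> float:
    """Randomize a delay so requests do not fire at a fixed rhythm."""
    return random.uniform(0.5 * delay, delay)


print(backoff_delays(5))
```

Inside the asynchronous scraper, each value could be passed through with_jitter() and then to asyncio.sleep() before retrying a failed request.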

Using Google Search API to Scrape Search Results

Serpdog provides an easy and streamlined way to scrape Google Search Results with its SERP APIs, and it handles proxies and CAPTCHAs for you so that scraping runs smoothly. It also returns plenty of extra data beyond the organic results, at some of the most affordable pricing in the industry.

Read More: Top 10 SERP API

Serpdog: Google SERP API

You will also get 1000 free API credits upon signing up.

After registering on our website, you will get an API key. Paste your API key into the code below, and you will be able to scrape Google Search Results with Python quickly and easily.

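import requests

# Pass the query with a plain space: requests URL-encodes the parameters,
# so a literal "+" would be sent as %2B instead of a space
payload = {'api_key': 'APIKEY', 'q': 'python tutorial', 'gl': 'us'}
resp = requests.get('https://api.serpdog.io/search', params=payload)
print(resp.text)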

Conclusion:

Overall, Python is an excellent language and offers a wide range of functionality for web scraping. However, certain limitations exist when working with Python, such as a slower response rate when scraping Google, the Global Interpreter Lock (GIL), which limits how much work multiple threads can do in parallel, and the risk of having your IP blocked by Google due to a high number of requests.

It is advisable to implement an ethical strategy when dealing with Google. Alternatively, developers can integrate a Google Scraper API into their software to avoid blockage.

In this tutorial, we learned to scrape Google Search Results using Python. Feel free to message me anything you need clarification on. Follow me on Twitter. Thanks for reading!

Additional Resources

  1. Web Scraping With JavaScript and Node JS
  2. Web Scraping Amazon
  3. Scrape Bing
  4. Scrape Zillow Properties
  5. Scrape LinkedIn Jobs