Web Scraping Google With Node JS

In this post, we will learn to scrape Google Search Results with Node JS using some of the in-demand web scraping and web parsing libraries present in Node JS.

This article will be helpful to beginners who want to make their career in web scraping, data extraction, or mining in Javascript. Web Scraping can open many opportunities to developers around the world. As we say, “Data is the new oil.”

So, here we end the introduction and get started with our long tutorial on scraping Google Search Results with Node JS.

Before we start with the tutorial, let me explain the headers and their importance in scraping.

What are HTTP Headers?

Headers are an important part of an HTTP request or response that provides additional meta-information about the request or response.
Header names are case-insensitive. Each header is written as a text string in which the name and the value are separated by a single colon.
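To make the name/value format concrete, here is a minimal sketch that parses raw header lines into a lookup object. The function name parseHeaders and the sample lines are only illustrative; lowercasing the names is one simple way to respect their case-insensitivity.

```javascript
// Parse raw "Name: value" header lines into an object keyed by lowercase name.
// parseHeaders is a hypothetical helper written for this illustration.
function parseHeaders(rawLines) {
  const headers = {};
  for (const line of rawLines) {
    const idx = line.indexOf(":"); // split on the first colon only
    if (idx === -1) continue; // skip lines that are not headers
    const name = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    headers[name] = value;
  }
  return headers;
}

const parsed = parseHeaders([
  "Content-Type: text/html",
  "DATE: Fri, 16 Sep 2022 19:49:16 GMT",
]);

console.log(parsed["content-type"]); // text/html
console.log(parsed["date"]); // Fri, 16 Sep 2022 19:49:16 GMT
```

Splitting on the first colon only matters here because values such as dates contain colons of their own.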

Headers play an important role in web scraping. When website owners learn that data is being extracted from their website, they usually implement tools and strategies to protect it from being scraped by bots.

Scrapers with non-optimized headers fail to scrape these types of websites. When you pass the correct headers, your bot mimics a real user and is more likely to scrape quality data from the website. Scrapers with optimized headers can therefore help keep your IPs from being blocked by these websites.

Headers can be classified into four different categories:

  • Request Headers
  • Response Headers
  • Representation Headers
  • Payload Headers

HTTP Request Header

These are the headers sent by the client while fetching data from the server. Like other headers, they consist of key-value pairs in a text string format. The server can use the information in the request headers to identify the sender of the request.

An HTTP request header contains information like:

  • The browser version of the request sender.
  • Requested page URL
  • Platform from which the request is sent.

HTTP Response Header

The headers sent back by the server after it receives the request from the user are known as Response Headers. They contain information like the date and time and the type of file sent back by the server. They also include information about the server that generated the response.

The following examples show some of the response headers:

  • content-length: 27098
  • content-type: text/html
  • date: Fri, 16 Sep 2022 19:49:16 GMT
  • server: nginx
  • cache-control: max-age=21584

HTTP Representation Header

The representation header describes the type of resource sent in an HTTP message body. The data transferred can be in any format, such as JSON, XML, HTML, etc. These headers tell the client about the data format they received.
The following examples show some of the representation headers:

  • content-encoding: gzip
  • content-length: 27098
  • content-type: text/html

HTTP Payload Headers

Payload headers are trickier to understand, so let us first define the payload itself.

What is Payload?
The payload is the message content, i.e., the data the recipient receives when a message is transferred between a client and a server.
Payload headers are the HTTP headers that carry information about this payload data. They describe the content length and range of the message, any encoding applied while transferring the message, etc.

The following examples show some of the payload headers:

  • content-length: 27098
  • content-range: bytes 200-1000/67589
  • trailer: Expires
  • transfer-encoding: chunked

The content-range header indicates where a partial message belongs in the full message body. The transfer-encoding header specifies the form of encoding used to transfer the payload body safely to the user.
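As a rough illustration of how a content-range value locates a partial message, the sketch below splits it into its start offset, end offset, and total size. The helper name parseContentRange is an assumption made for this example, and it only handles the "bytes start-end/total" form.

```javascript
// Split a Content-Range value such as "bytes 200-1000/67589" into numbers.
// parseContentRange is a hypothetical helper, not part of any library.
function parseContentRange(value) {
  const match = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(value.trim());
  if (!match) return null; // unsupported or malformed value
  const start = Number(match[1]);
  const end = Number(match[2]);
  return {
    start,
    end,
    total: match[3] === "*" ? null : Number(match[3]), // "*" means unknown size
    length: end - start + 1, // both offsets are inclusive
  };
}

const range = parseContentRange("bytes 200-1000/67589");
console.log(range);
// { start: 200, end: 1000, total: 67589, length: 801 }
```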

User Agent


The User-Agent header is used to identify the application, operating system, vendor, and version of the requesting user agent. It helps us mimic a real user, which in turn helps keep our IP from being blocked by Google. It is one of the main headers we use while scraping Google Search Results.

It looks like this:

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36
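To show which parts of that string carry the platform and browser version, here is a small sketch that pulls them out with regular expressions. The function name describeUserAgent is hypothetical, and the patterns only cover Chrome-style strings like the one above; real User-Agent parsing has many more cases.

```javascript
// Extract the platform and Chrome version from a Chrome-style User-Agent.
// describeUserAgent is an illustrative helper, not a complete parser.
function describeUserAgent(ua) {
  const platform = /\(([^)]+)\)/.exec(ua); // first parenthesised group
  const chrome = /Chrome\/([\d.]+)/.exec(ua); // version after "Chrome/"
  return {
    platform: platform ? platform[1] : null,
    chromeVersion: chrome ? chrome[1] : null,
  };
}

const ua =
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36";
const info = describeUserAgent(ua);

console.log(info.platform); // X11; Linux x86_64
console.log(info.chromeVersion); // 51.0.2704.103
```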

Libraries for Scraping Google in Node JS


The top 5 Libraries for scraping Google in Node JS are:

  1. Unirest
  2. Axios
  3. Cheerio
  4. Puppeteer
  5. Playwright

Unirest

In this section, we will learn about the Node JS library Unirest which will help us scrape Google Search Results. We will discuss the need for this library and the advantages associated with it.

Unirest is a lightweight HTTP library available in many languages, including Java, Python, PHP, .NET, etc. Unirest JS is currently maintained by Kong. It is also one of the more popular JavaScript libraries for web scraping, and it lets us make all types of HTTP requests to fetch the data on the requested page.

Let us take an example of how we can scrape Google search results using unirest:

npm i unirest

Then we will request our target URL:

const unirest = require("unirest");

function getData() {
    const url = "https://www.google.com/search?q=javascript&gl=us&hl=en";

    // Send a browser-like User-Agent so the request looks like a real user
    const header = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36 Viewer/96.9.4688.89"
    };

    return unirest
        .get(url)
        .headers(header)
        .then((response) => {
            // response.body holds the raw HTML of the results page
            console.log(response.body);
        });
}

getData();

Step-by-step explanation after header declaration:

  1. get() is used to make a GET request to our target URL.
  2. headers() is used to pass HTTP request headers along with the request.

This block of code will return an HTML file and will look like this:

Unreadable, right? Don’t worry. We will be discussing a web parsing library in a bit.
As we know, Google can block our request if we request with the same User Agent each time. So, if you want to rotate User-Agents on each request, let us define a function that will return random User-Agent strings from the User-Agent array.

const selectRandom = () => {
    const userAgents = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36",
    ];
    // Pick a random index into the User-Agent array
    const randomNumber = Math.floor(Math.random() * userAgents.length);
    return userAgents[randomNumber];
};

const user_agent = selectRandom();
const header = {
    "User-Agent": user_agent
};

This logic will ensure we don’t have to use the same User-Agents each time.
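A purely random pick can still return the same string twice in a row. As an alternative sketch, the rotator below shuffles the list once and then cycles through it, so a User-Agent only repeats after every other one has been used. The name createUserAgentRotator is an assumption for this example, and the three strings are taken from the array above.

```javascript
// Build a function that hands out User-Agents in a shuffled, repeating cycle.
// createUserAgentRotator is a hypothetical helper for illustration.
function createUserAgentRotator(userAgents) {
  if (userAgents.length === 0) {
    throw new Error("userAgents must not be empty");
  }
  // Fisher-Yates shuffle on a copy, so the caller's array is left untouched
  const pool = [...userAgents];
  for (let i = pool.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  let index = 0;
  return () => {
    const ua = pool[index];
    index = (index + 1) % pool.length; // wrap around after the last entry
    return ua;
  };
}

const nextUserAgent = createUserAgentRotator([
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
  "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36",
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
]);

// Build a fresh header object for each request
const rotatedHeader = { "User-Agent": nextUserAgent() };
console.log(rotatedHeader);
```

Calling nextUserAgent() before every request gives each request the next entry in the cycle.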

Advantages of using Unirest:

  1. It has proxy support.
  2. It supports all HTTP request methods(GET, POST, DELETE, etc.).
  3. It supports form downloads.
  4. It supports TLS/SSL protocol.
  5. It supports HTTP authentication.

Axios


Axios is a promise-based HTTP client for Node JS and browsers, and one of the most popular JavaScript HTTP libraries. It makes XMLHttpRequests from the browser and HTTP requests from Node JS. It also has client-side support for protecting against CSRF.

Let us take an example of how we can use Axios for scraping Google:

npm i axios

The below block of code will return the same HTML file we saw in the Unirest section.

const axios = require("axios");

const headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36 Viewer/96.9.4688.89"
};

// Request headers go inside the config object, under the "headers" key
axios.get("https://www.google.com/search?q=javascript&gl=us&hl=en", { headers })
    .then((response) => {
        // Axios exposes the response payload as response.data
        console.log(response.data);
    })
    .catch((e) => {
        console.log(e);
    });
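Both requests so far hard-code the search URL. As a small sketch, the helper below builds it from a query string instead, using the same q, gl (country), and hl (language) parameters that appear in the URLs above. The function name buildSearchUrl and its defaults are assumptions for this example.

```javascript
// Build a Google search URL from a query, with country (gl) and language (hl).
// buildSearchUrl is a hypothetical helper; URLSearchParams handles the encoding.
function buildSearchUrl(query, { country = "us", language = "en" } = {}) {
  const params = new URLSearchParams({ q: query, gl: country, hl: language });
  return `https://www.google.com/search?${params.toString()}`;
}

console.log(buildSearchUrl("javascript"));
// https://www.google.com/search?q=javascript&gl=us&hl=en

console.log(buildSearchUrl("life insurance"));
// https://www.google.com/search?q=life+insurance&gl=us&hl=en
```

The returned string can be passed to unirest.get() or axios.get() in place of the literal URL.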

Advantages of using Axios:

  1. It can support old browsers also, indicating wider browser support.
  2. It supports response timeout.
  3. It can support multiple requests at the same time.
  4. It can intercept HTTP requests.
  5. Most important for developers, it has brilliant community support.

Cheerio


Cheerio is a web parsing library that can parse any HTML and XML document. It implements a subset of jQuery, so its syntax is quite similar to jQuery.

Manipulating and rendering the markup can be done very fast with the help of Cheerio. It doesn’t produce a visual rendering, apply CSS, load external resources, or execute Javascript.

Let us take a small example of how we can use Cheerio to parse the Google ads search results.
You can install Cheerio by running the below command in your terminal.

npm i cheerio

Now, we will prepare our parser by finding the CSS selectors using the SelectorGadget extension. Watch the tutorial on the selector gadget website if you want to learn how to use it.

Let us first scrape the HTML with the help of unirest and make a Cheerio instance for parsing the HTML.

const cheerio = require("cheerio");
    const unirest = require("unirest");
    
    const getData = async() => {
    try
    {
    const url = "https://www.google.com/search?q=life+insurance";
    
    const response = await unirest
    .get(url)
    .headers({
        "User-Agent":
        "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"
    })
    
    const $ = cheerio.load(response.body)

In the last line, we created a constant and loaded the scraped HTML into it. Each ad result sits inside a container with the class .uEierd, under the #tads block.
We will scrape the ad’s title, snippet, link, displayed link, and site links.

Inspecting an ad with SelectorGadget gives the selector .v0nnCb span for the title, .lyLwlc for the snippet, and .qzEoUe for the displayed link.

And if you inspect the title, you will find the selector for the link to be a.sVXRqc.
After collecting all the selectors, our code will look like this:

let ads = [];
    $("#tads .uEierd").each((i,el) => {
    ads[i] = {
    title: $(el).find(".v0nnCb span").text(),
    snippet: $(el).find(".lyLwlc").text(),
    displayed_link: $(el).find(".qzEoUe").text(),
    link: $(el).find("a.sVXRqc").attr("href"),
    }
    })

Now, let us find the selectors for the site links.

If we follow the same process for the sitelink titles, snippets, and links, our code will look like this. Note that this fragment runs inside the .each() callback above, where el and i are defined:

let sitelinks = [];  
    if($(el).find(".UBEOKe").length)
    {
    $(el).find(".MhgNwc").each((i,el) => {
    sitelinks.push({
        title: $(el).find("h3").text(),
        link: $(el).find("a").attr("href"),
        snippet: $(el).find(".lyLwlc").text()
    })
    })
    ads[i].sitelinks = sitelinks
    }

Complete Code:

const cheerio = require("cheerio");
const unirest = require("unirest");

const getData = async () => {
    try {
        const url = "https://www.google.com/search?q=life+insurance";

        const response = await unirest
            .get(url)
            .headers({
                "User-Agent":
                    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"
            });

        // Load the scraped HTML into Cheerio for parsing
        const $ = cheerio.load(response.body);

        let ads = [];

        // Each ad sits in a .uEierd container under #tads
        $("#tads .uEierd").each((i, el) => {
            let sitelinks = [];
            ads[i] = {
                title: $(el).find(".v0nnCb span").text(),
                snippet: $(el).find(".lyLwlc").text(),
                displayed_link: $(el).find(".qzEoUe").text(),
                link: $(el).find("a.sVXRqc").attr("href"),
            };

            // Collect sitelinks only when the ad has a sitelinks block
            if ($(el).find(".UBEOKe").length) {
                $(el).find(".MhgNwc").each((i, el) => {
                    sitelinks.push({
                        title: $(el).find("h3").text(),
                        link: $(el).find("a").attr("href"),
                        snippet: $(el).find(".lyLwlc").text()
                    });
                });
                ads[i].sitelinks = sitelinks;
            }
        });
        console.log(ads);
    } catch (e) {
        console.log(e);
    }
};

getData();

You can see how easy it is to use Cheerio JS for parsing HTML. Similarly, we can use Cheerio with other web scraping libraries like Axios, Puppeteer, Playwright, etc.

If you want to learn more about scraping websites with Cheerio, you can consider my blogs where I have used Cheerio as a web parser:

  1. Scrape Google Maps Reviews
  2. Scrape Google Search Organic Results

Advantages of using Cheerio:

  1. Cheerio implements a subset of jQuery. It removes the DOM inconsistencies and browser cruft from jQuery, leaving a clean API.
  2. Cheerio JS is fast because it doesn’t produce a visual rendering, apply CSS, load external resources, or execute JavaScript. For the same reason, it cannot see content that single-page applications render with JavaScript.
  3. It can parse nearly any HTML and XML document.

Headless Browsers

Gone are the days when websites were built with only HTML and CSS. Nowadays, interaction on modern websites can be handled entirely by JavaScript, especially in SPAs (single-page applications) built on frameworks like React, Next, and Angular, which rely heavily on JavaScript for rendering dynamic content.

But when doing web scraping, the content we require is sometimes rendered by Javascript, which is not accessible from the HTML response we get from the server.

And that’s where the headless browser comes into play. Let’s discuss a couple of Javascript libraries that use headless browsers for web automation and scraping.

Puppeteer


Puppeteer is a Google-designed Node JS library that provides a high-level API for controlling Chrome or Chromium browsers.

Here are some features associated with Puppeteer JS:

  1. It can be used to crawl single-page applications and can generate pre-rendered content, i.e., server-side rendering.
  2. It works in the background and performs actions as directed by the API.
  3. It can generate screenshots of web pages.
  4. It can generate PDFs of web pages.

Let us take an example of how we can scrape Google Books Results using Puppeteer JS. We will scrape the book title, image, description, and writer.

First, install Puppeteer by running the below command in your project terminal:

npm i puppeteer

Now, let us create a web crawler by launching Puppeteer in non-headless mode.

const url = "https://www.google.com/search?q=merchant+of+venice&gl=us&tbm=bks";

// headless: false opens a visible browser window
const browser = await puppeteer.launch({
    headless: false,
    args: ["--disable-setuid-sandbox", "--no-sandbox"],
});
const page = await browser.newPage();
await page.setExtraHTTPHeaders({
    "User-Agent":
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36 Agency/97.8.6287.88",
});
await page.goto(url, { waitUntil: "domcontentloaded" });

What each line of code says:

  1. puppeteer.launch() – This launches the Chrome browser in non-headless mode.
  2. browser.newPage() – This opens a new tab in the browser.
  3. page.setExtraHTTPHeaders() – This sets extra headers that are sent with every request the page makes.
  4. page.goto() – This navigates the tab to our target URL.

Now, let us find the CSS selector for the book title.

Inspecting a result shows that the CSS selector for the title is .DKV0Md.
We will paste this into our code:

let books_results = [];
books_results = await page.evaluate(() => {
    return Array.from(document.querySelectorAll(".Yr5TG")).map((el) => {
        return {    
            title: el.querySelector(".DKV0Md")?.textContent
        }
      })
    });

Here I have used the page.evaluate() function to run code in the page’s context and return the result.

Then I selected the parent element of the title, which is also the parent of the other things we want to scrape (image, writer, description, etc., as stated above), using the document.querySelectorAll() method.

Finally, we selected the title from the elements inside each parent container with the help of querySelector(). The textContent property lets us grab the text inside the selected element.

We will select the other elements in the same way as we selected the title. Now, let us find the selector for the writer.

books_results = await page.evaluate(() => {
    return Array.from(document.querySelectorAll(".Yr5TG")).map((el) => {
     return {    
      title: el.querySelector(".DKV0Md")?.textContent,
      writers: el.querySelector(".N96wpd")?.textContent,
      }
     })
    });

Let us find the tag for our description as well.

let books_results = [];
books_results = await page.evaluate(() => {
        return Array.from(document.querySelectorAll(".Yr5TG")).map((el) => {
            return {    
                title: el.querySelector(".DKV0Md")?.textContent,
                writers: el.querySelector(".N96wpd")?.textContent,
                description: el.querySelector(".cmlJmd")?.textContent,
            }
        })
    });

And finally for the image:

let books_results = [];
books_results = await page.evaluate(() => {
     return Array.from(document.querySelectorAll(".Yr5TG")).map((el) => {
        return {    
            title: el.querySelector(".DKV0Md")?.textContent,
            writers: el.querySelector(".N96wpd")?.textContent,
            description: el.querySelector(".cmlJmd")?.textContent,
            thumbnail: el.querySelector("img")?.getAttribute("src"),
        }
    })
  });
  console.log(books_results);
  await browser.close();

We don’t need to find the tag for the image as it is the only image in the container. So we just used the “img” element for reference. Don’t forget to close the browser.
Now, let us run our program to check the results.

The long URL you see as a thumbnail value is nothing but a base64 image URL. So, we got the results we wanted.
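If you want the image itself rather than the long string, a base64 data URL can be decoded with Node's built-in Buffer. The sketch below is only an illustration: decodeDataUri is a hypothetical helper, and the short text payload stands in for a real thumbnail value.

```javascript
// Decode a base64 data URL into its MIME type and raw bytes.
// decodeDataUri is a hypothetical helper written for this example.
function decodeDataUri(dataUri) {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUri);
  if (!match) return null; // not a base64 data URL (e.g. a normal https link)
  return {
    mimeType: match[1],
    bytes: Buffer.from(match[2], "base64"),
  };
}

// "aGVsbG8=" is base64 for "hello"; a real thumbnail would be image data
const decoded = decodeDataUri("data:text/plain;base64,aGVsbG8=");
console.log(decoded.mimeType); // text/plain
console.log(decoded.bytes.toString("utf8")); // hello
```

For an actual thumbnail, the bytes buffer could be written to disk with fs.writeFileSync() under a file extension that matches the MIME type.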

Complete Code:

const puppeteer = require("puppeteer");

const getBooksData = async () => {
    const url = "https://www.google.com/search?q=merchant+of+venice&gl=us&tbm=bks";

    // headless: true runs without a window; set it to false to watch the browser
    const browser = await puppeteer.launch({
        headless: true,
        args: ["--disable-setuid-sandbox", "--no-sandbox"],
    });
    const page = await browser.newPage();
    await page.setExtraHTTPHeaders({
        "User-Agent":
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36 Agency/97.8.6287.88",
    });
    await page.goto(url, { waitUntil: "domcontentloaded" });

    // Runs inside the page: map every .Yr5TG result container to an object
    const books_results = await page.evaluate(() => {
        return Array.from(document.querySelectorAll(".Yr5TG")).map((el) => {
            return {
                title: el.querySelector(".DKV0Md")?.textContent,
                writers: el.querySelector(".N96wpd")?.textContent,
                description: el.querySelector(".cmlJmd")?.textContent,
                thumbnail: el.querySelector("img")?.getAttribute("src"),
            };
        });
    });

    console.log(books_results);
    await browser.close();
};

getBooksData();

So, we now have a basic understanding of Puppeteer JS. Let’s discuss its advantages.

Advantages of using Puppeteer:

  1. We can scroll the page in Puppeteer JS.
  2. We can click on elements like buttons and links.
  3. We can take screenshots of the web page.
  4. We can navigate between the web pages.
  5. We can also execute JavaScript on the page with the help of Puppeteer JS.

Playwright JS


Playwright JS is a test automation framework used by developers around the world to automate web browsers. It was developed by the same team that previously worked on Puppeteer JS. You will find the syntax of Playwright JS similar to that of Puppeteer JS, and many API methods are nearly identical, but the two libraries have some differences.

Playwright v/s Puppeteer JS:

  1. Playwright supports multiple languages, such as JavaScript, Python, Java, and C# (.NET), while Puppeteer only supports JavaScript.
  2. The Playwright JS is still a new library with limited community support, unlike Puppeteer JS, which has good community support.
  3. Playwright supports browsers like Chromium, Firefox, and Webkit, while Puppeteer’s main focus is Chrome and Chromium, with limited support for Firefox.

Let us take an example of how we can use Playwright JS to scrape Top Stories from Google Search Results. First, install Playwright by running the below command in your terminal:

npm i playwright

Now, let’s create our scraper by launching the Chromium browser at our target URL.

const browser = await playwright['chromium'].launch({ headless: false, args: ['--no-sandbox'] });
    const context = await browser.newContext();
    const page = await context.newPage();
    await page.goto("https://www.google.com/search?q=india&gl=us&hl=en");

Step-by-step explanation:

  1. The first step is to launch the Chromium browser in non-headless mode.
  2. The second step creates a new browser context. It won’t share cookies/cache with other browser contexts.
  3. The third step opens a new tab in the browser.
  4. In the fourth step, we navigate to our target URL.

Now, let us search for the tags for these single stories.

Every single story sits under the .WlydOe class. The page.$$() method finds all elements matching the specified selector within the page and returns an array containing these elements. In the complete code below, its result is stored in single_stories.

Look for the selectors of the title, date, and thumbnail with the same approach we used in the Puppeteer section. After finding them, push the data into our top_stories array and close the browser.

let top_stories = [];
    for(let single_story of single_stories)
    {
        top_stories.push({
        title: await single_story.$eval(".mCBkyc", el => el.textContent.replace('\n','')),
        link: await single_story.getAttribute("href"),
        date: await single_story.$eval(".eGGgIf", el => el.textContent),
        thumbnail: await single_story.$eval("img", el => el.getAttribute("src"))
        })
    }
    console.log(top_stories)
    await browser.close();

The $eval() method finds the specified element inside each parent element from the single_stories array and runs the given function on it. The textContent property returns the text inside the specified element, and getAttribute() returns the value of the specified attribute of the element.

Here is the complete code:

const playwright = require("playwright");

const getTopStories = async () => {
    try {
        const browser = await playwright['chromium'].launch({ headless: false, args: ['--no-sandbox'] });
        const context = await browser.newContext();
        const page = await context.newPage();
        await page.goto("https://www.google.com/search?q=football&gl=us&hl=en");

        // Every story card matches the .WlydOe selector
        const single_stories = await page.$$(".WlydOe");
        let top_stories = [];
        for (let single_story of single_stories) {
            top_stories.push({
                title: await single_story.$eval(".mCBkyc", el => el.textContent.replace('\n','')),
                link: await single_story.getAttribute("href"),
                date: await single_story.$eval(".eGGgIf", el => el.textContent),
                thumbnail: await single_story.$eval("img", el => el.getAttribute("src"))
            });
        }
        console.log(top_stories);
        await browser.close();
    } catch (e) {
        console.log(e);
    }
};

getTopStories();

Advantages of using Playwright:

  1. It enables auto-wait for elements before performing any tasks.
  2. It allows you to test your web applications in mobile browsers.
  3. It is among the faster browser automation libraries when it comes to web scraping.
  4. It covers all modern web browsers like Chrome, Edge, Safari, and Firefox.

Recap

The above sections taught us to scrape and parse Google Search Results with various JavaScript libraries. We also saw how to combine Unirest with Cheerio to extract data from Google; Axios can be paired with Cheerio in the same way. Scraping millions of Google pages, however, won’t work without proxies and a way to handle CAPTCHAs.

But, wait! You can still use our Google Search API, which handles proxies and CAPTCHAs for you and lets you scrape millions of Google Search Results without any hindrance.

You also need a large pool of User-Agents to make millions of requests to Google. If you send the same User-Agent with every request, your proxies will get blocked. Serpdog solves this problem too, as our Google Search API uses a large pool of User-Agents to scrape Google Search Results successfully.

Moreover, Serpdog provides its users with 100 free credits on the first sign-up.

Here are some articles if you want to know more about how to scrape Google:

  1. Scrape Google Shopping Results
  2. Web Scraping Google News Results
  3. Scrape Google Shopping Product Results

Other Libraries

In this section, we will discuss some of the alternatives to the above-discussed libraries.

Nightmare JS


Nightmare JS is a web automation library designed to automate browsing tasks on websites that don’t offer APIs.

Nightmare JS is mostly used by developers for UI testing and crawling. It can also mimic user actions (like goto, type, and click) with an API that feels synchronous for each block of scripting.

Let us take an example of how we can use Nightmare JS to scrape Google Search Twitter Results.

Install the Nightmare JS by running this command:

npm i nightmare

Each Twitter result sits under the .dHOsHb class. So, this makes our code look like this:

const Nightmare = require("nightmare")
const nightmare = Nightmare()
nightmare.goto("https://www.google.com/search?q=cristiano+ronaldo&gl=us")
    .wait(".dHOsHb")
    .evaluate(() => {
        let twitter_results = [];
        const results = document.querySelectorAll(".dHOsHb")
        results.forEach((result) => {
            let row = {
                "tweet": result.innerText,
            }
            twitter_results.push(row)
        })
        return twitter_results;
    })
    .end()
    .then((result) => {
    result.forEach((r) => {
        console.log(r.tweet);
    })
    })
    .catch((error) => {
        console.log(error)
    })

Step-by-step explanation:

  1. After importing the library, we created an instance of Nightmare JS named nightmare.
  2. Then we used goto() to navigate to our target URL.
  3. In the next step, we used wait() to wait for the selector of the Twitter results. You can also pass a time value as a parameter to wait for a specific period.
  4. Then we used evaluate(), which runs a function in the page context; in our case, it calls querySelectorAll().
  5. Inside that function, we used forEach() to iterate over the results and push the text content of each element into the twitter_results array.
  6. At last, we called end() to stop the crawler and printed the scraped values returned from evaluate().

Node Fetch


Node Fetch is a lightweight module that brings the Fetch API to Node JS; in other words, it lets you use fetch() in Node JS.

Features:

  1. It uses native promises and async functions.
  2. It is consistent with window.fetch API.
  3. It uses native Node streams for the body, on both request and response.

To use Node Fetch, run this command in your project terminal. The example below uses require(), so it needs the 2.x line of node-fetch, since version 3 is ESM-only:

npm i node-fetch@2

Let us take a simple example to request our target URL:

const fetch = require("node-fetch");

const getData = async () => {
    const response = await fetch("https://google.com/search?q=web+scraping&gl=us", {
        headers: {
            "User-Agent":
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36 Agency/97.8.6287.88"
        }
    });
    // Read the response body as text (the raw HTML)
    const body = await response.text();
    console.log(body);
};

getData();

Osmosis

Osmosis is an HTML/XML parser for Node JS.

Features:

  1. It supports CSS 3.0 and XPath 1.0 selector hybrids.
  2. It is a very fast parser with a small memory footprint.
  3. No large dependencies like jQuery or Cheerio JS.

Advantages of using Osmosis:

  1. It supports fast searching.
  2. It supports single or multiple proxies and handles their failures.
  3. It supports form submission and session cookies.
  4. It supports retries and redirect limits.

Using Serpdog for scraping Google

There are various benefits of using Serpdog’s Google Search API:

  1. You don’t have to deal with complex HTML data anymore.
  2. Serpdog’s Google Search API uses a massive amount of residential proxies, which not only increases the success rate of scraping Google but also helps you to avoid frequent IP bans.
  3. Serpdog’s powerful API infrastructure can handle millions of Google requests without any problems.

To use our services for free, you have to first sign up on our website.

Serpdog – Google Search API

After registering on our website, you will get an API Key which you can use to scrape Google using our API.

const axios = require('axios');

axios.get('https://api.serpdog.io/search?api_key=APIKEY&q=coffee&gl=us')
  .then(response => {
    console.log(response.data);
  })
  .catch(error => {
    console.log(error);
  });

By embedding your API Key in the above code, you will be able to scrape Google at a blazingly fast speed.
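Rather than pasting the key into the URL by hand, you could assemble the request URL from its parts. The sketch below uses the endpoint and the api_key, q, and gl parameters from the request above; the helper name buildSerpdogUrl and the SERPDOG_API_KEY environment variable are assumptions made for this illustration.

```javascript
// Assemble the Serpdog search URL from an API key and a query.
// buildSerpdogUrl is a hypothetical helper; the endpoint and parameter
// names are copied from the axios example above.
function buildSerpdogUrl(apiKey, query, country = "us") {
  const url = new URL("https://api.serpdog.io/search");
  url.searchParams.set("api_key", apiKey);
  url.searchParams.set("q", query);
  url.searchParams.set("gl", country);
  return url.toString();
}

// SERPDOG_API_KEY is an assumed environment variable name;
// "APIKEY" is the same placeholder used in the example above.
const apiKey = process.env.SERPDOG_API_KEY || "APIKEY";
console.log(buildSerpdogUrl(apiKey, "coffee"));
```

Reading the key from the environment keeps it out of source control, and the resulting string can be passed straight to axios.get().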

Conclusion

In this tutorial, we discussed eight JavaScript libraries that can be used for web scraping Google Search Results. We also worked through some examples of scraping search results. Each of these libraries has unique features and advantages; some are relatively new, and some have been updated and adapted to developer needs over time. You should now be able to choose a library according to the circumstances.

If you have any questions about the tutorial, please feel free to ask me.
If you think I have not covered some topics in the tutorial, please feel free to message me.

Additional Resources

  1. Scrape Google Autocomplete Results
  2. Web Scraping Google Images
  3. Scrape Google Scholar Results

Frequently Asked Questions

This tutorial is designed for beginners to develop a basic understanding of scraping Google search results. If anyone wants to learn more, I have already written several blogs on scraping Google, which you can find on Serpdog’s Blog page. These resources will provide you with an intermediate to advanced understanding of scraping Google.

Web scraping Google is relatively straightforward! Even a developer with a decent knowledge base can kickstart their career in web scraping with the right tools.

Yes, all publicly available data on the internet is legal to scrape.