Scraping Google Search Results With Java

Java is one of the oldest and most popular programming languages. Its popularity is evident from how widely it is used to build apps for Android, which runs on billions of devices.

Java also has strong built-in support for multithreading, which makes it suitable for a wide range of tasks. One of these tasks is web scraping.

Web scraping is the process of extracting data from websites or other sources and storing it in the format you need. It is used for tasks such as data mining, price monitoring, lead generation, and SEO. Businesses can use the scraped data to make informed decisions and gain insights into their target market.

In this blog, we will learn how to scrape Google search results using Java and its libraries. 

Why Java for Scraping Google?

Java is relatively easy for beginners to understand. It also has a large community, which can help you resolve errors you run into while programming your scraper.

You can get help by asking questions in large programming communities on Reddit and Discord.

Java's high performance also makes it a good choice for scraping Google.

Scraping Google Search Results Using SERP API in Java

In this blog, we will write a Java program to scrape the first 10 Google search results. The output will include the title, link, description (snippet), and position of each result. This data can be used for SEO, media monitoring, ad verification, and other purposes.

We will be using two methods to extract the search results:

  1. Using Serpdog’s Google SERP API
  2. Using Java Scraping and Parsing Libraries

Requirements:

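Many Java libraries can be used for web scraping, but in this tutorial, we will use:

  1. Jsoup: a Java library for fetching, parsing, and extracting data from HTML. You can add it to your project with Maven or Gradle using the group ID org.jsoup and the artifact ID jsoup.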

Set-Up:

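Create a project folder and save your code in a file with the .java extension. Java requires the file name to match the public class name, so the two programs in this post go in SerpDogAPIClient.java and GoogleScraper.java. If you have not installed Java yet, you can follow one of these guides: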

  1. How to install Java on Windows?
  2. How to install Java on MacOS?

Getting API Credentials From Serpdog’s Google SERP API

Hold on! Before getting started, we need an API key from Serpdog to access the search results.

Register on Serpdog's website and copy your API key from the dashboard.

Setting Up Our Code for Scraping Search Results

Our first step is to import the classes from Java's standard library that we need to make an HTTP GET request to the Serpdog API, read the response data, and handle it.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

Now, we will create a class and construct the URL for the request.

public class SerpDogAPIClient {
    private static final String API_KEY = "YOUR_API_KEY";

    public static void main(String[] args) {
        try {
            String apiUrl = "https://api.serpdog.io/search?api_key=" + API_KEY + "&q=top+tech+companies+in+new+york&gl=us";
            URL url = new URL(apiUrl);

Next, we will open a connection to our target URL.

            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("GET");

The last line sets the request method to GET because we are retrieving data from the API.

Once the request has been sent to the API endpoint, we retrieve the status code so that we can check whether it is 200 (OK).

            int responseCode = connection.getResponseCode();

If it is, we read the response body line by line and print it.

            if (responseCode == HttpURLConnection.HTTP_OK) {
                BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
                StringBuilder response = new StringBuilder();
                String line;
                while ((line = reader.readLine()) != null) {
                    response.append(line);
                }
                reader.close();

                // Handle the JSON response here using a JSON parsing library
                String jsonResponse = response.toString();
                System.out.println(jsonResponse);
            } 

Otherwise, we throw an exception containing the response code, which the catch block then prints.

            else {
                throw new RuntimeException("Error in API Call. Response code: " + responseCode);
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}

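Compile and run this Java program (for example, with javac SerpDogAPIClient.java followed by java SerpDogAPIClient), and you will get the search data as JSON. Here is an excerpt of the organic results: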

  "organic_results": [
    {
      "title": "Top Tech Companies in NYC, NY 2024",
      "link": "https://www.builtinnyc.com/companies",
      "displayed_link": "https://www.builtinnyc.com › companies",
      "favicon": "",
      "source": "Built In NYC",
      "snippet": "See the complete list of NYC, NY technology companies, many of which are hiring now. See company benefits, info, interviews and more at Built In NYC.",
      "highlighted_keywords": [
        "NYC",
        "NY",
        "companies",
        "NYC"
      ],
      "inline_sitelinks": [
        {
          "title": "Top Companies Hiring...",
          "link": "https://www.builtinnyc.com/companies/hiring/remote"
        },
        {
          "title": "Justworks",
          "link": "https://www.builtinnyc.com/company/justworks"
        },
        {
          "title": "Spring Health",
          "link": "https://www.builtinnyc.com/company/spring-health"
        },
        {
          "title": "Citadel",
          "link": "https://www.builtinnyc.com/company/citadel"
        }
      ],
      "rank": 1
    },

In the above response, we got the title, link, snippet, and other relevant information about each search result. If you run the program yourself, you will also find other SERP features in the extracted data, such as People Also Ask, related queries, and menu items.

So, this is how you can use Serpdog’s Google SERP API with Java to get the Google Search Results.
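On Java 11 or newer, the same request can also be written with the java.net.http.HttpClient API, which is the modern replacement for HttpURLConnection. The sketch below is our own alternative, not code from Serpdog's documentation: it assumes the same endpoint and the same api_key, q, and gl parameters used above, and it encodes the query with URLEncoder instead of typing the + signs by hand. The class and method names (SerpdogHttpClientExample, buildSearchUrl) are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

// Hypothetical example class: an HttpClient-based alternative to the program above.
class SerpdogHttpClientExample {

    // Builds the Serpdog search URL and URL-encodes every parameter value.
    // Assumes the same endpoint and parameters (api_key, q, gl) used earlier in this post.
    static String buildSearchUrl(String apiKey, String query, String country) {
        return "https://api.serpdog.io/search"
                + "?api_key=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8)
                + "&q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&gl=" + URLEncoder.encode(country, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String url = buildSearchUrl("YOUR_API_KEY", "top tech companies in new york", "us");

        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();

        // Send the request and read the whole response body as a String
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

        if (response.statusCode() == 200) {
            System.out.println(response.body()); // raw JSON, as in the earlier example
        } else {
            throw new IllegalStateException("Error in API call. Response code: " + response.statusCode());
        }
    }
}
```

Because the query is encoded for you, characters such as & or + in a search term will not break the URL.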

Scraping Google Search Results Using Java And JSoup

In this section, we will use Jsoup in Java to create a custom web scraper and learn the basics of how SERP APIs extract search results data.

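Our target is the Google results page for the query "Java" in the United States:

https://www.google.com/search?q=Java&gl=us

Here, q is the search query and gl is the country code. You can choose any query and location in the URL.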
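If you would rather build the URL from a query and a country code than edit it by hand, a minimal sketch using only the standard library could look like this. It only sets the q and gl parameters described above, needs Java 10+ for the Charset overload of URLEncoder.encode, and the class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds a Google search URL from a query and a country code.
class GoogleUrlBuilder {

    static String buildGoogleUrl(String query, String countryCode) {
        // URLEncoder turns spaces into '+' and escapes reserved characters such as '&'
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        String gl = URLEncoder.encode(countryCode, StandardCharsets.UTF_8);
        return "https://www.google.com/search?q=" + q + "&gl=" + gl;
    }

    public static void main(String[] args) {
        // prints https://www.google.com/search?q=java&gl=us
        System.out.println(buildGoogleUrl("java", "us"));
        // prints https://www.google.com/search?q=web+scraping+with+java&gl=ca
        System.out.println(buildGoogleUrl("web scraping with java", "ca"));
    }
}
```

You could then pass the returned string to Jsoup.connect() in place of the hard-coded googleUrl in the scraper that follows.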

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;
    
    import java.io.IOException;
    
    public class GoogleScraper {
        public static void main(String[] args) throws IOException {
    
            String googleUrl = "https://www.google.com/search?q=java&gl=us";
    
            // Connect to the Google search page
            Document doc = Jsoup.connect(googleUrl).userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36").get();
            // doc is a Document object representing the parsed HTML DOM

Step-by-step explanation:

  1. First, we imported the required classes from the Jsoup library and the java.io package.
  2. After declaring the class and the main method, we initialized our Google URL.
  3. Next, we connected to the Google search page with Jsoup's connect() method, passing the URL and a User-Agent string. The get() method then sends the request and returns the parsed HTML as a Document object, which we store in doc.

The User-Agent header identifies the application, operating system, vendor, and version of the client making the request. Sending a real browser's User-Agent makes our request look like a visit from a regular user rather than a script.

This gives us the raw HTML of the results page. Next, we will parse this HTML with Jsoup's select() method.
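Jsoup's userAgent() method, used in the scraper above, sets the User-Agent request header. To show what that looks like at the HTTP level, here is a sketch of the same idea with the JDK's own HttpRequest builder (Java 11+). It only builds the request and reads the header back without sending anything, and the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical example: attaching a browser-like User-Agent header to a request.
class UserAgentHeaderExample {

    static final String CHROME_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36";

    static HttpRequest buildRequest(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent) // the same header Jsoup's userAgent() sets
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("https://www.google.com/search?q=java&gl=us", CHROME_UA);
        // Print the User-Agent value that would be sent with this request
        System.out.println(request.headers().firstValue("User-Agent").orElse("(none)"));
    }
}
```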


If you inspect the HTML of the results page in your browser's developer tools, you will see that every organic result is contained inside a div element with the class name g.

We will now select all the divs with the class name g.

Elements results = doc.select("div.g");

select() takes a CSS selector and returns every matching element in the document as an Elements collection.
Next, we will loop over these selected divs and extract the fields we need from each one.

            int position = 0;
            for (Element result : results) {
                // Extract the title, link, and snippet of the result
                String title = result.select("h3").text();
                String link = result.select(".yuRUbf > a").attr("href");
                String snippet = result.select(".VwiC3b").text();
                if (title.isEmpty()) {
                    continue; // skip blocks that are not regular organic results
                }
                position++;
                System.out.println("Title: " + title);
                System.out.println("Link: " + link);
                System.out.println("Snippet: " + snippet);
                System.out.println("Position: " + position);
                System.out.println();
            }
        }
    }

You can find the tags for the title, snippet, and link inside each div.g element. Let us inspect the HTML to find them.


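Inspecting a result shows that the title is in an h3 tag, the link is in the anchor matched by .yuRUbf > a, and the snippet is in the element with the class .VwiC3b. Google changes these class names from time to time, so if your output comes back empty, inspect the page again and update the selectors.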

After running the code successfully, your results should look something like this:

    Title: Java | Oracle
    Link: https://www.java.com/
    Snippet: Get Java for desktop applications. Download Java · What is Java? Uninstall help. Happy Java User. Are you a software developer looking for JDK downloads?
    Position: 1

    Title: Java Downloads | Oracle
    Link: https://www.oracle.com/java/technologies/downloads/
    Snippet: The JDK includes tools for developing and testing programs written in the Java programming language and running on the Java platform. Linux; macOS; Windows.
    Position: 2

    Title: Java - Wikipedia
    Link: https://en.wikipedia.org/wiki/Java
    Snippet: Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of ...
    Position: 3

However, if you send many requests this way, Google may quickly block your IP or start showing CAPTCHAs. You can reduce this risk to some extent by using a random User-Agent for each request. Here is how you can do this:

Initialize an array of User Agents.

String[] userAgents = {
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36"
};

Then choose a random index between 0 (inclusive) and the length of the array (exclusive).

int rnd = (int)(Math.random()*userAgents.length);

Then you can pass it when you scrape the HTML.

Document doc = Jsoup.connect(googleUrl).userAgent(userAgents[rnd]).get();
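The same rotation can be wrapped in a small reusable helper. This sketch swaps Math.random() for ThreadLocalRandom: both pick a uniformly random index, but ThreadLocalRandom avoids contention on a shared generator if you later scrape from several threads. The class name is hypothetical, and the list is a shortened copy of the array above.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper that returns a randomly chosen User-Agent for each request.
class UserAgentRotator {

    static final List<String> USER_AGENTS = List.of(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36",
            "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36");

    // nextInt(bound) returns a value in [0, bound), so the index is always valid
    static String randomUserAgent() {
        int index = ThreadLocalRandom.current().nextInt(USER_AGENTS.size());
        return USER_AGENTS.get(index);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println(randomUserAgent());
        }
    }
}
```

In the scraper, the call would then become Jsoup.connect(googleUrl).userAgent(UserAgentRotator.randomUserAgent()).get().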

So, this is how you can prepare a basic script to scrape Google Search Results with Java.
If you are looking for a more streamlined and maintenance-free solution, then you might consider our Google SERP API for scraping search results.

Advantages of Scraping Google Search Results

Collecting data from Google provides you with many benefits: 

SERP Monitoring – It can be used to monitor your website's rankings on Google, which can help you increase its visibility in the market.

Scalable – It allows you to collect large amounts of data, which can be used for purposes like lead generation and market trend analysis.

Price Monitoring – This can be used to gather the pricing of the products sold by your competitors or online retailers to remain competitive in the market.

Lead Generation – It can be used to gather the email addresses of your potential customers.

Access to real-time data – It provides you with access to the most up-to-date data as Google Search Results keep updating frequently.

Inexpensive – It can be much more cost-effective than the official API, which many businesses find too expensive.

Problems with the Official Google Search API

There are a few reasons why businesses don't use the official Google Search API:

Not Affordable: The API is priced at $5 per 1,000 requests, which adds up quickly at scale and may not be feasible for businesses on a tight budget.

Limited Access: The API provides only a limited amount of data. That is why many people turn to scrapers, which extract the HTML directly from the results page and give them complete control over the results.

Complex Setup: The Google Search API can be complex to set up for users without coding experience.

Conclusion

In this tutorial, we learned two ways to scrape Google search results with Java: using Serpdog's Google SERP API and building a custom scraper with Jsoup. Please do not hesitate to message me if I missed something. If you would like us to handle your custom scraping projects, feel free to contact us.

Follow me on Twitter. Thanks for reading!
