...

How to Scrape LinkedIn Profiles From Google

In this tutorial, we will learn to scrape LinkedIn profiles from Google Search Results.

Requirements:

Web Parsing with CSS selectors

Hunting through raw HTML for the right tags is both difficult and time-consuming. The CSS Selector Gadget makes this easier by helping you pick the right CSS selector for the elements you need.
Here is the link to the tutorial, which teaches you how to use the gadget to choose the best CSS selectors for your needs.

User Agents

The User-Agent request header identifies the application, operating system, vendor, and version of the requesting client. Sending a browser-like User-Agent helps a request to Google look like a visit from a real user.
You can also rotate User-Agents; read more about this in the article: How to fake and rotate User Agents using Python 3.

If you want to further safeguard your IP from being blocked by Google, you can try these 10 Tips to avoid getting Blocked while Scraping Google.

Install Libraries

Before we begin, install these libraries so we can move forward and prepare our scraper.

  1. Unirest
  2. Cheerio

You can install them by running the commands below in your project terminal:

npm i unirest
npm i cheerio

Target:


We will target employees working at Amazon in the USA.

Process:

In this section, we will build a scraper that extracts the LinkedIn profiles of employees working at Amazon.

First, we will make a GET request to the target URL to fetch the HTML data.

const url = "https://www.google.com/search?q=site:linkedin.com/in+'amazon.com'&gl=us";
const response = await unirest
    .get(url)
    .header({
        "User-Agent":
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36",
    });

const $ = cheerio.load(response.body);

Step-by-step explanation:

  1. First, we made a GET request to our target URL with unirest.
  2. Next, we passed the User-Agent header along with the request.
  3. Then we loaded the response body into a Cheerio instance, which lets us query the HTML with CSS selectors.
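
If you want to target other companies, the URL can be assembled from its parts instead of being hard-coded. Below is a minimal sketch using Node's built-in URL class. The function name buildSearchUrl and its parameters are hypothetical, and the optional start offset for paging through results is an assumption that the tutorial itself does not use.

```javascript
// Hypothetical helper: builds the Google search URL used in this tutorial.
// companyDomain, country and start are illustrative parameter names.
function buildSearchUrl(companyDomain, country = "us", start = 0) {
  const url = new URL("https://www.google.com/search");
  // Same query as above: restrict results to LinkedIn profile pages.
  url.searchParams.set("q", `site:linkedin.com/in '${companyDomain}'`);
  url.searchParams.set("gl", country);
  // Assumption: "start" offsets the results (10 would be the second page).
  if (start > 0) {
    url.searchParams.set("start", String(start));
  }
  return url.toString();
}

console.log(buildSearchUrl("amazon.com"));
```

The searchParams API takes care of URL-encoding the colon, slash, spaces, and quotes in the query.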

You can also keep an array of User-Agents and pick a different one for each request, which lowers the chance of Google blocking your requests.

const selectRandom = () => {
    const userAgents = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36",
    ];
    const randomNumber = Math.floor(Math.random() * userAgents.length);
    return userAgents[randomNumber];
};
const user_agent = selectRandom();
const header = {
    "User-Agent": user_agent,
};
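
Random selection can pick the same User-Agent several times in a row. As an alternative, here is a small sketch of round-robin rotation, where every call returns the next User-Agent in the list. The name createUserAgentRotator is hypothetical, and this is a swapped-in technique rather than the approach used in the rest of this tutorial.

```javascript
// Hypothetical helper: returns a function that cycles through the given
// User-Agent strings in order and wraps around at the end of the list.
function createUserAgentRotator(userAgents) {
  if (!Array.isArray(userAgents) || userAgents.length === 0) {
    throw new Error("userAgents must be a non-empty array");
  }
  let index = 0;
  return () => {
    const userAgent = userAgents[index];
    index = (index + 1) % userAgents.length;
    // Return a headers object that can be passed to the request.
    return { "User-Agent": userAgent };
  };
}

const nextHeader = createUserAgentRotator([
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36",
]);
console.log(nextHeader());
console.log(nextHeader());
```

Each call to nextHeader() gives a headers object for one request, so consecutive requests never share the same User-Agent as long as the list has at least two entries.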

Now, we will prepare our parser and extract the data we need from the response.


As you can see, the title sits in the element with the class .DKV0Md. It contains the employee's name and their position in the company.

let title = [];
$(".g").each((i, el) => {
    // Read the result title and strip the trailing " - LinkedIn" or " | LinkedIn".
    title[i] = $(el)
        .find(".DKV0Md")
        .text()
        .replace(" - LinkedIn", "")
        .replace(" | LinkedIn", "");
});
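
Once the suffix is removed, the title can be split further. The sketch below assumes titles follow the pattern "Name - Position - Company", which is what the sample results in this tutorial look like; titles in other shapes will simply leave some fields empty. The function name parseTitle is hypothetical.

```javascript
// Hypothetical helper: splits a result title into name, position and company.
// Assumes the "Name - Position - Company" pattern; missing parts become "".
function parseTitle(rawTitle) {
  // Drop a trailing " - LinkedIn" or " | LinkedIn" if it is still present.
  const cleaned = rawTitle.replace(/\s+[-|]\s+LinkedIn\s*$/, "").trim();
  const [name = "", position = "", ...rest] = cleaned
    .split(" - ")
    .map((part) => part.trim());
  // Anything after the second separator is treated as the company.
  return { name, position, company: rest.join(" - ") };
}

console.log(parseTitle("Werner Vogels - VP & CTO - Amazon.com | LinkedIn"));
```

Splitting on " - " with surrounding spaces keeps hyphenated names intact, since those do not have spaces around the hyphen.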

Now, we will find the location the employee works from.


From the above image, we found that the location sits in the element with the class .WZ8Tjf, which makes our code look like this:

let title = [], location = [];
$(".g").each((i, el) => {
    // Read the result title and strip the trailing " - LinkedIn" or " | LinkedIn".
    title[i] = $(el)
        .find(".DKV0Md")
        .text()
        .replace(" - LinkedIn", "")
        .replace(" | LinkedIn", "");
    // The location line sits in the .WZ8Tjf element.
    location[i] = $(el).find(".WZ8Tjf").text();
});
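
The text in .WZ8Tjf usually holds more than the place: in the sample results of this tutorial it reads like "Seattle, Washington, United States · VP & CTO · Amazon.com". Here is a small sketch that keeps only the part before the first "·" separator. The function name parseLocation is hypothetical, and the separator is an assumption based on those sample results.

```javascript
// Hypothetical helper: keeps only the place from the location line.
// Assumes the parts are separated by a middle dot, as in the sample output.
function parseLocation(rawLocation) {
  const [place = ""] = rawLocation.split("·");
  return place.trim();
}

console.log(parseLocation("Seattle, Washington, United States · VP & CTO · Amazon.com"));
```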

Now, we will parse the description.


From the above image, we find that the description sits in the element with the class .lEBKkf. Similarly, the link is the href attribute of the first anchor tag in each result.

let title = [], location = [], about = [], link = [];
$(".g").each((i, el) => {
    // Read the result title and strip the trailing " - LinkedIn" or " | LinkedIn".
    title[i] = $(el)
        .find(".DKV0Md")
        .text()
        .replace(" - LinkedIn", "")
        .replace(" | LinkedIn", "");
    location[i] = $(el).find(".WZ8Tjf").text();
    about[i] = $(el).find(".lEBKkf").text();
    // attr("href") returns the href of the first anchor in the result.
    link[i] = $(el).find("a").attr("href");
});
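
The first anchor in a result is not guaranteed to point at a LinkedIn profile, so it can help to validate the link before storing it. The sketch below uses Node's built-in URL class. The function name extractProfileUrl is hypothetical, and the handling of "/url?q=..." redirect links is an assumption about how Google sometimes wraps result links, not something shown in this tutorial.

```javascript
// Hypothetical helper: returns a clean LinkedIn profile URL, or null when
// the href does not point at a linkedin.com/in/ page.
function extractProfileUrl(href) {
  if (!href) return null;
  let candidate = href;
  // Assumption: some responses wrap the target as "/url?q=<target>&...".
  if (href.startsWith("/url?")) {
    candidate = new URLSearchParams(href.slice("/url?".length)).get("q") || "";
  }
  let parsed;
  try {
    parsed = new URL(candidate);
  } catch {
    return null; // Not an absolute URL.
  }
  const isLinkedIn =
    parsed.hostname === "linkedin.com" || parsed.hostname.endsWith(".linkedin.com");
  if (!isLinkedIn || !parsed.pathname.startsWith("/in/")) return null;
  // Drop query string and fragment so the same profile always compares equal.
  return `https://${parsed.hostname}${parsed.pathname}`;
}

console.log(extractProfileUrl("https://www.linkedin.com/in/wernervogels"));
```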

Our parser is now complete. We will push this data into the employees_data array and print it in the terminal.

let employees_data = [];
for (let i = 0; i < title.length; i++) {
    employees_data.push({
        title: title[i],
        location: location[i],
        about: about[i],
        link: link[i],
    });
}
console.log(employees_data);
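
The loop above also pushes entries for results where the title or the link could not be found. As a sketch of one way to handle that, the helper below combines the four arrays and drops incomplete entries. The function name buildEmployees is hypothetical, and treating a missing title or link as "skip this result" is an assumption about what you want to keep.

```javascript
// Hypothetical helper: zips the four parallel arrays into objects and
// drops results that have no title or no link.
function buildEmployees(title, location, about, link) {
  return title
    .map((t, i) => ({
      title: t || "",
      location: location[i] || "",
      about: about[i] || "",
      link: link[i] || "",
    }))
    .filter((employee) => employee.title.trim() !== "" && employee.link !== "");
}

const employees = buildEmployees(
  ["Werner Vogels - VP & CTO - Amazon.com", ""],
  ["Seattle, Washington, United States · VP & CTO · Amazon.com", ""],
  ["I believe in democratising business creation ...", ""],
  ["https://www.linkedin.com/in/wernervogels", undefined]
);
console.log(employees);
```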

This is what our results look like:

[
  {
    title: 'Werner Vogels - VP & CTO - Amazon.com',
    location: 'Seattle, Washington, United States · VP & CTO · Amazon.com',
    about: 'I believe in democratising business creation, simplifying operation and driving innovation by providing a low cost, scalable and reliable infrastructure that ...',
    link: 'https://www.linkedin.com/in/wernervogels'
  },
  {
    title: 'Jolene Bacca - Executive Assistant - Amazon.com',
    location: 'Seattle, Washington, United States · Executive Assistant · Amazon.com',
    about: 'I am very capable at efficiently solving difficult and complex problems affecting people within my various departments and other senior level personnel. The key ...',
    link: 'https://www.linkedin.com/in/jolene-bacca-9497147'
  },
  {
    title: 'J.R. Harris - Operations Manager - Amazon.com',
    location: 'McCordsville, Indiana, United States · Operations Manager · Amazon.com',
    about: 'Manufacturing management leading hourly and salary associates within a moderate volume and highly customized environment. Experienced with ISO 9001, ...',
    link: 'https://www.linkedin.com/in/jrharris1'
  },
  ……                     

Here is the complete code:

const cheerio = require("cheerio");
const unirest = require("unirest");

const getData = async () => {
    try {
        const url = "https://www.google.com/search?q=site:linkedin.com/in+'amazon.com'&gl=us";

        // Pick a random User-Agent for this request.
        const selectRandom = () => {
            const userAgents = [
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
                "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36",
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36",
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36",
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36",
            ];
            const randomNumber = Math.floor(Math.random() * userAgents.length);
            return userAgents[randomNumber];
        };
        const user_agent = selectRandom();
        const header = {
            "User-Agent": user_agent,
        };

        // Fetch the search results page and load it into Cheerio.
        const response = await unirest.get(url).header(header);
        const $ = cheerio.load(response.body);

        // Extract the title, location, description and link of each result.
        let title = [], location = [], about = [], link = [];
        $(".g").each((i, el) => {
            title[i] = $(el)
                .find(".DKV0Md")
                .text()
                .replace(" - LinkedIn", "")
                .replace(" | LinkedIn", "");
            location[i] = $(el).find(".WZ8Tjf").text();
            about[i] = $(el).find(".lEBKkf").text();
            link[i] = $(el).find("a").attr("href");
        });

        // Combine the parallel arrays into one array of objects.
        let employees_data = [];
        for (let i = 0; i < title.length; i++) {
            employees_data.push({
                title: title[i],
                location: location[i],
                about: about[i],
                link: link[i],
            });
        }
        console.log(employees_data);
    } catch (e) {
        console.log(e);
    }
};

getData();

This data can be useful for lead generation: you can message the founders and CEOs of your target companies and tell them about your product. If you are starting your own lead generation company like snov.io or findthatlead.com, you can also scrape these results to build employee lists and update them from time to time.
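
For the "update them from time to time" part, one simple approach is to merge every new scrape into the list you already have, using the profile link as the key. The sketch below is one possible way to do it. The function name mergeByLink is hypothetical, and it assumes each entry has the same shape as the objects in employees_data.

```javascript
// Hypothetical helper: merges freshly scraped entries into an existing list.
// Entries with the same link are replaced by the fresh version; entries
// without a link are skipped because they cannot be matched.
function mergeByLink(existing, fresh) {
  const byLink = new Map();
  for (const employee of [...existing, ...fresh]) {
    if (employee.link) {
      byLink.set(employee.link, employee);
    }
  }
  return [...byLink.values()];
}

const merged = mergeByLink(
  [{ title: "Old title", link: "https://www.linkedin.com/in/wernervogels" }],
  [
    { title: "Werner Vogels - VP & CTO - Amazon.com", link: "https://www.linkedin.com/in/wernervogels" },
    { title: "J.R. Harris - Operations Manager - Amazon.com", link: "https://www.linkedin.com/in/jrharris1" },
  ]
);
console.log(merged);
```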

Scraping LinkedIn Profiles From Google Search API

If you don’t want to code and maintain the scraper in the long run, you can try our Google SERP API. Serpdog’s Google Search API also supports advanced result features such as knowledge graphs and answer boxes.

Scraping Google also requires solving captchas and maintaining a large pool of User-Agents and proxies. Serpdog handles all of these on your behalf for a smooth scraping experience.
Our users also get 100 free requests on their first sign-up.

const axios = require('axios');

axios.get('https://api.serpdog.io/search?api_key=APIKEY&q=coffee&gl=us')
    .then(response => {
        console.log(response.data);
    })
    .catch(error => {
        console.log(error);
    });
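
The example above searches for "coffee". To run the LinkedIn query from this tutorial through the same endpoint, the request URL could be built as in the sketch below. It only uses the api_key, q, and gl parameters that appear in the example above; the function name buildSerpdogUrl is hypothetical, and no assumption is made here about the shape of the response.

```javascript
// Hypothetical helper: builds a request URL for the endpoint shown above.
// Only the api_key, q and gl parameters from that example are used.
function buildSerpdogUrl(apiKey, query, country = "us") {
  const url = new URL("https://api.serpdog.io/search");
  url.searchParams.set("api_key", apiKey);
  url.searchParams.set("q", query);
  url.searchParams.set("gl", country);
  return url.toString();
}

// "APIKEY" is a placeholder, as in the example above.
console.log(buildSerpdogUrl("APIKEY", "site:linkedin.com/in 'amazon.com'"));
```

The resulting string can be passed to axios.get() in place of the hard-coded URL.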

Conclusion:

In this tutorial, we learned to scrape LinkedIn profiles from Google Search results using Node.js. Feel free to message me if I missed something. Follow me on Twitter. Thanks for reading!
