...

Scrape Google Play Store Data

In this tutorial, we are going to scrape Google Play Store App Data using Node JS. We will extract basic information such as the app's rating and review count, user reviews, sample images, and description.

Requirements:

Web Parsing with CSS selectors

Searching for tags in raw HTML files is both difficult and time-consuming. It is better to use the CSS Selectors Gadget, which helps you come up with the right CSS selector for the element you need and makes your web scraping journey easier.
Here is the link to the tutorial, which will teach you how to use this gadget to select the best CSS selectors for your needs.

User Agents

The User-Agent header identifies the application, operating system, vendor, and version of the requesting user agent. Sending a browser-like User-Agent helps a request to Google look like a visit from a real user.
You can also rotate User Agents, read more about this in this article: How to fake and rotate User Agents using Python 3.
If you want to further safeguard your IP from being blocked by Google, you can try these 10 Tips to avoid getting Blocked while Scraping Google.
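As a minimal sketch of rotation in Node JS, the snippet below picks a random User-Agent string for each request. The entries in the list and the helper name getRandomUserAgent are illustrative assumptions and not part of any library; in practice you would maintain a larger, up-to-date list.

```javascript
// Illustrative pool of desktop User-Agent strings (assumed values; keep your own list current).
const USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
];

// Hypothetical helper: returns one entry of the pool, chosen uniformly at random.
const getRandomUserAgent = () =>
    USER_AGENTS[Math.floor(Math.random() * USER_AGENTS.length)];

// Build the headers object for a single request.
const headers = { "User-Agent": getRandomUserAgent() };
console.log(headers);
```

The resulting headers object can be passed to the request in place of a hard-coded User-Agent.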

Install Libraries

To start scraping Google Play Store App Data we need to install some NPM libraries so that we can move forward.

  1. Unirest
  2. Cheerio

So before starting, make sure that you have set up your Node JS project and installed both packages, Unirest JS and Cheerio JS. You can install them from the links above, or by running npm i unirest cheerio in the project directory.

Target:

Process:

Let’s begin the process of scraping the Play Store App Data. We will use Unirest JS to fetch the raw HTML and Cheerio JS to parse it.
Open the link below in your browser so that we can start selecting the HTML tags for the required elements.

https://play.google.com/store/apps/details?id=com.whatsapp            
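The app is identified by the id query parameter (com.whatsapp here), so the same script can target other apps by changing that value. Here is a small sketch, assuming a hypothetical helper named buildPlayStoreUrl. The optional hl (language) and gl (country) parameters are assumptions about what the Play Store accepts and can be left out.

```javascript
// Hypothetical helper: builds the details-page URL for a given package id.
const buildPlayStoreUrl = (appId, { hl, gl } = {}) => {
    const url = new URL("https://play.google.com/store/apps/details");
    url.searchParams.set("id", appId);
    if (hl) url.searchParams.set("hl", hl); // interface language, e.g. "en" (assumed parameter)
    if (gl) url.searchParams.set("gl", gl); // country, e.g. "US" (assumed parameter)
    return url.toString();
};

console.log(buildPlayStoreUrl("com.whatsapp"));
// https://play.google.com/store/apps/details?id=com.whatsapp
```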

Let us make a GET request using Unirest JS on the target URL.

    const unirest = require("unirest");
    const cheerio = require("cheerio");

    const getGooglePlayData = async () => {
        let url = "https://play.google.com/store/apps/details?id=com.whatsapp";

        let response = await unirest
            .get(url)
            .headers({
                "User-Agent":
                    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
            });
        const $ = cheerio.load(response.body);

Step-by-step explanation:

  1. In the first and second lines, we imported the Unirest and Cheerio libraries and stored them in constants.
  2. In the next line, we declared an async function, getGooglePlayData, to get the Google Play data.
  3. After that, we declared a variable for the target URL.
  4. Next, we made a GET request to the URL with the help of Unirest, passing a headers object that contains the User-Agent.
  5. In the last line, we loaded the response body into Cheerio and stored the resulting instance in the constant $.
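Before parsing, it can help to confirm that the request actually returned a page, since a blocked or failed request would otherwise produce empty results without any error. Here is a minimal sketch, assuming the response object exposes a numeric status next to the body used in the code above. The helper name ensureHtml is hypothetical.

```javascript
// Hypothetical helper: throws if the response does not look like a successful HTML page.
const ensureHtml = (response) => {
    if (!response || response.status !== 200) {
        throw new Error(`Request failed with status ${response ? response.status : "unknown"}`);
    }
    if (typeof response.body !== "string" || response.body.length === 0) {
        throw new Error("Response body is empty");
    }
    return response.body;
};

// Usage with a stand-in object shaped like the response (no network call is made here).
const html = ensureHtml({ status: 200, body: "<html><body>ok</body></html>" });
console.log(html.length);
```

In the scraper, the returned string would be what gets passed to cheerio.load.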

Now, we will prepare our parser by finding the selectors with the help of the CSS selector gadget, as stated above in the Requirements section.

In the above image, the class name for the title is xwcR9d. So, its parser would look like this:

let app_info = {};
app_info.title = $(".xwcR9d").text();

In the above code, we declared an object app_info for storing the basic information about the app. Then we extracted the title from the HTML with the help of the Cheerio instance $.

Let us now scrape the user reviews as well.

These reviews are inside the container with the class name EGFGHd. So, after parsing the reviews, our code looks like this:

    app_info.user_reviews = [];

    $(".EGFGHd").each((i, el) => {
        app_info.user_reviews.push({
            name: $(el).find(".X5PpBb").text(),
            date: $(el).find(".bp9Aid").text(),
            description: $(el).find(".h3YV2d").text(),
            thumbnail: $(el).find("img").attr("src"),
        });
    });

First, we declared an array user_reviews inside our app_info object. Then we looped over each selected container and scraped the reviewer's name, the review date, the review text, and the user thumbnail.

Similarly, we can scrape the other parts of the page by selecting the elements with the help of the selector gadget. After completing the selection process, our parser will look like this:

    let app_info = {};

    app_info.title = $(".xwcR9d").text();
    app_info.company = $(".auoIOc").text();
    app_info.app_thumbnail = $(".arM4bb").attr("src");
    app_info.rating = parseFloat($(".jILTFe").text().replace("star", ""));
    app_info.reviews = $(".EHUI5b").text().split(" ")[0];
    app_info.downloads = $(".wVqUob:nth-child(2) .ClM7O").text();
    app_info.rated_for = $(".wVqUob~ .wVqUob+ .wVqUob .g1rdde").text().replace("Rated for ", "").replace("info", "")
    app_info.description = $(".bARER").text();
    app_info.developer_info = {};
    app_info.developer_info.website = $(".VVmwY:nth-child(1) .pSEeg").text();
    app_info.developer_info.email = $(".VVmwY:nth-child(2) .pSEeg").text();
    app_info.developer_info.address = $(".VVmwY:nth-child(3) .pSEeg").text();
    app_info.developer_info.privacy_policy = $(".VVmwY:nth-child(4) .pSEeg").text();

    app_info.user_reviews = [];

    $(".EGFGHd").each((i,el) => {
        app_info.user_reviews.push({
            name: $(el).find(".X5PpBb").text(),
            date: $(el).find(".bp9Aid").text(),
            description: $(el).find(".h3YV2d").text(),
            thumbnail: $(el).find("img").attr("src")
        })
    })

    app_info.images_results = [];

    $(".aoJE7e .Atcj9b").each((i,el) => {
        app_info.images_results.push({
            src: $(el).find("img").attr("src")
        })
    })                      
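Class names such as xwcR9d are generated by Google and may change without notice. When a selector stops matching, Cheerio's .text() returns an empty string and .attr() returns undefined rather than throwing, so the breakage is easy to miss. Here is a small sketch of a sanity check, assuming a hypothetical helper findEmptyFields that lists the top-level fields that came back empty.

```javascript
// Hypothetical helper: returns the names of top-level fields that look unscraped.
const findEmptyFields = (info) =>
    Object.keys(info).filter((key) => {
        const value = info[key];
        if (value === undefined || value === null || value === "") return true;
        if (typeof value === "number" && Number.isNaN(value)) return true; // e.g. parseFloat("")
        if (Array.isArray(value) && value.length === 0) return true;
        return false;
    });

// Stand-in result where two selectors no longer match.
const sample = { title: "WhatsApp Messenger", company: "", rating: NaN, user_reviews: [{}] };
console.log(findEmptyFields(sample)); // [ 'company', 'rating' ]
```

A non-empty list is a hint that the corresponding selectors should be checked against the live page again.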

Now, our results should look like this:

    {
        title: 'WhatsApp Messenger',
        company: 'WhatsApp LLC',
        app_thumbnail: 'https://play-lh.googleusercontent.com/bYtqbOcTYOlgc6gqZ2rwb8lptHuwlNE75zYJu6Bn076-hTmvd96HH-6v7S0YUAAJXoJN=w240-h480-rw',
        rating: 4.1,
        reviews: '172M',
        downloads: '5B+',
        rated_for: '3+',
        description: 'WhatsApp from Meta is a FREE messaging and video calling app. It’s used by over 2B people in more than 180 countries. It’s simple, reliable, and private, so you can easily keep in touch with your friends and family. WhatsApp works across mobile and desktop even on slow connections, with no subscription fees*.Private messaging across the worldYour personal messages and calls to friends and family are end-to-end encrypted. No one outside of your chats, not even WhatsApp, can read or listen to them.Simple and secure connections, right awayAll you need is your phone number, no user names or logins. You can quickly view your contacts who are on WhatsApp and start messaging.High quality voice and video callsMake secure video and voice calls with up to 8 people for free*. Your calls work across mobile devices using your phone’s Internet service, even on slow connections.Group chats to keep you in contactStay in touch with your friends and family. End-to-end encrypted group chats let you share messages, photos, videos and documents across mobile and desktop.Stay connected in real timeShare your location with only those in your individual or group chat, and stop sharing at any time. Or record a voice message to connect quickly.Share daily moments through StatusStatus allows you to share text, photos, video and GIF updates that disappear after 24 hours. You can choose to share status posts with all your contacts or just selected ones.*Data charges may apply. Contact your provider for details.---------------------------------------------------------If you have any feedback or questions, please go to WhatsApp > Settings > Help > Contact Us',
        developer_info: {
            website: 'http://www.whatsapp.com/',
            email: 'android@support.whatsapp.com',
            address: '1601 Willow Road\nMenlo Park, CA 94025',
            privacy_policy: 'http://www.whatsapp.com/legal/#Privacy'
        },
        user_reviews: [
            {
            name: 'Vedant Jain',
            date: 'December 13, 2022',
            description: 'After the recent update I am not able to view the photos which are set to view only once. In addition to that I am not able to backup my chats to Google drive. It always stops at 96%, even though there is enough space on my drive. The customer support just sends automated messages and is of no help. Really disappointed with the customer support',
            thumbnail: 'https://play-lh.googleusercontent.com/a/AEdFTp5OsFV7faP9ETEAvpvI_GxXgW-7bodH2UIukBqM=s32-rw-mo'
            },
            {
            name: 'Stage Hermit',
            date: 'December 12, 2022',
            description: "Had a great experience so far ... However, after the recent update, > Whattsapp calls are taking way too long to connect. ( Both incoming and outgoing ). Haven't checked on video calls though.. > Messaging is working pretty fine. > Also ..while trying to attach a image from within the chat window, ( especially which has been taken as a screenshot ) .. it isn't able to locate the image. Have to go to the gallery and use the Share option on the image. I am using a OnePlus 10 pro",
            thumbnail: 'https://play-lh.googleusercontent.com/a/AEdFTp4ce67C2FB5MGWdSPWgjcej33T0kYNbwYdujreG=s32-rw-mo'
            },
            {
            name: 'David Raju',
            date: 'November 26, 2022',
            description: "I had a very good experience of using this application, and it's so useful for the person to have a conversation with the person who is far from us. just one suggestion you must add up the music in the story uploading so the people who wants to add music with photo or a video can be possible eaasily. Must do this change and update it soon, people will so happy to use it. Thank you very much.As a user of whatsapp , I have seen one of the new features I.e. poll selection. This feature is not so us",
            thumbnail: 'https://play-lh.googleusercontent.com/a-/AD5-WCn-xwvCn4_ZrjAw4-T7El7pBo6z8MxxJmd9DK6x=s32-rw'
            }
        ],
        images_results: [
            {
            src: 'https://play-lh.googleusercontent.com/tNuMAclO_TrRn5RbiSo2iU2ySljFaHjCIWoMUSoemUcl4FjTyVO0PpJZL_zTrYf7v_4=w526-h296-rw'
            },
            {
            src: 'https://play-lh.googleusercontent.com/ijfSGQUCqeCmCQX0w_HjdSWkiYZoFk5JZ5CsxmGI-qT1VPT8V3wGohMBpWZOAp2o7A=w526-h296-rw'
            },
            {
            src: 'https://play-lh.googleusercontent.com/Ck5x7vPWfgXoLvkGqVs5INzV3dzHMYYy4Jr6YVpXDTR-00p_V_kpGABtfXCp9qx10cs=w526-h296-rw'
            },
            {
            src: 'https://play-lh.googleusercontent.com/ef3mz9xoDiwk08KB7B6oN0uSqJkxy8yMBwdOl9TGc3rSsOLdYBQlRZqMCduJjJyeBQ=w526-h296-rw'
            },
            {
            src: 'https://play-lh.googleusercontent.com/8InPqYGQ-28qwt_mLmm6R3VzbMcf3ZSJNUxO_OJosyLRqPHeStZFtjKskgDvHkanfRUJ=w526-h296-rw'
            }
        ]
    }
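The reviews and downloads values are abbreviated strings such as '172M' and '5B+'. If numeric values are needed, they can be expanded. Here is a sketch, assuming the suffixes K, M, and B (only M and B appear in the output above) and a hypothetical helper name parseCount.

```javascript
// Assumed suffix table: thousand, million, billion.
const MULTIPLIERS = { K: 1e3, M: 1e6, B: 1e9 };

// Hypothetical helper: converts "172M" or "5B+" to a number, or returns null if the text does not match.
const parseCount = (text) => {
    const match = /^(\d+(?:\.\d+)?)([KMB])?\+?$/.exec(String(text).trim());
    if (!match) return null;
    const multiplier = match[2] ? MULTIPLIERS[match[2]] : 1;
    return Math.round(parseFloat(match[1]) * multiplier);
};

console.log(parseCount("172M")); // 172000000
console.log(parseCount("5B+"));  // 5000000000
```

Note that the expanded numbers are only as precise as the abbreviated strings shown on the page.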

Here is the complete code:

    const unirest = require("unirest");
    const cheerio = require("cheerio");
    
    const getGooglePlayData = async() => {
        
        let url = "https://play.google.com/store/apps/details?id=com.whatsapp"
        
        let response = await unirest
        .get(url)
        .headers({
            "User-Agent":
            "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
            })
        const $ = cheerio.load(response.body)
        
        let app_info = {};
    
        app_info.title = $(".xwcR9d").text();
        app_info.company = $(".auoIOc").text();
        app_info.app_thumbnail = $(".arM4bb").attr("src");
        app_info.rating = parseFloat($(".jILTFe").text().replace("star", ""));
        app_info.reviews = $(".EHUI5b").text().split(" ")[0];
        app_info.downloads = $(".wVqUob:nth-child(2) .ClM7O").text();
        app_info.rated_for = $(".wVqUob~ .wVqUob+ .wVqUob .g1rdde").text().replace("Rated for ", "").replace("info", "")
        app_info.description = $(".bARER").text();
        app_info.developer_info = {};
        app_info.developer_info.website = $(".VVmwY:nth-child(1) .pSEeg").text();
        app_info.developer_info.email = $(".VVmwY:nth-child(2) .pSEeg").text();
        app_info.developer_info.address = $(".VVmwY:nth-child(3) .pSEeg").text();
        app_info.developer_info.privacy_policy = $(".VVmwY:nth-child(4) .pSEeg").text();
    
        app_info.user_reviews = [];
    
        $(".EGFGHd").each((i,el) => {
            app_info.user_reviews.push({
                name: $(el).find(".X5PpBb").text(),
                date: $(el).find(".bp9Aid").text(),
                description: $(el).find(".h3YV2d").text(),
                thumbnail: $(el).find("img").attr("src")
            })
        })
    
        app_info.images_results = [];
    
        $(".aoJE7e .Atcj9b").each((i,el) => {
            app_info.images_results.push({
                src: $(el).find("img").attr("src")
            })
        })
        
    
        console.log(app_info);
    };
    
    getGooglePlayData(); 
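To keep the scraped data instead of only printing it, the object can be written to a JSON file with Node's built-in fs module. Here is a minimal sketch. The helper name saveAsJson and the file name are assumptions, and a small stand-in object is used in place of the real app_info.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: writes the object as pretty-printed JSON and returns the path.
const saveAsJson = (data, filePath) => {
    fs.writeFileSync(filePath, JSON.stringify(data, null, 2), "utf8");
    return filePath;
};

// Stand-in for the scraped app_info object.
const sampleInfo = { title: "WhatsApp Messenger", rating: 4.1 };

// The temp directory is used here only to keep the example side-effect free.
const outputPath = saveAsJson(sampleInfo, path.join(os.tmpdir(), "app_info.json"));
console.log(`Saved to ${outputPath}`);
```

In the scraper, the call would replace or follow console.log(app_info) inside getGooglePlayData.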

Conclusion:

In this tutorial, we learned to scrape Google Play Store App Data with Node JS. Feel free to message me if I missed something. Follow me on Twitter. Thanks for reading!

Additional Resources

  1. Web Scraping Google With Node JS – A Complete Guide
  2. Scrape Google Play Apps Results
  3. Scrape Google Organic Search Results
  4. Scrape Google Shopping Results
  5. Scrape Google Maps Reviews