
Web Scraping Google Maps Places

In this tutorial, we are going to scrape Google Maps Places Results with Node JS using Puppeteer JS.


Let’s start Scraping Google Maps Places:

Scraping Google Maps Places results is quite straightforward. You only need to run a Puppeteer script that launches an automated browser, navigates to the target URL, and extracts the required information from the page, such as the place title, rating, reviews, address, and website.

But before we begin our project, we have to complete some requirements.

Web Parsing with CSS selectors

Hunting for the right tags in raw HTML is both difficult and time-consuming. It is better to use SelectorGadget to pick the right tags and make your web scraping journey easier.

This gadget can help you come up with the perfect CSS selector for your needs. Here is the link to the tutorial, which will teach you to use this gadget to select the best CSS selectors for your use case.
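
Once SelectorGadget hands you a candidate selector, it is worth sanity-checking it in the browser's DevTools console before wiring it into the scraper. A minimal check, using the place-title class this tutorial relies on later (Google rotates these class names, so verify them against the live page):

const matches = document.querySelectorAll(".fontHeadlineLarge");
console.log(matches.length);                  // expect 1 for the place title
console.log(matches[0]?.textContent.trim());  // e.g. "Blacklist Coffee Roasters"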

User Agents

User-Agent is used to identify the application, operating system, vendor, and version of the requesting user agent, which helps our requests pass as visits from a real user rather than a bot.
You can also rotate User Agents to avoid blocking to a certain extent.
Read more: How to fake and rotate User Agents using Python 3.
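
In Puppeteer, a User Agent can be set per page with page.setUserAgent() before navigating. Here is a minimal sketch of rotating through a small pool; the UA strings are illustrative examples only, so substitute current ones in a real scraper:

const userAgents = [
    // Illustrative desktop User-Agent strings; keep these current in practice.
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
];

// Pick one at random and apply it to the page before page.goto().
const randomUA = userAgents[Math.floor(Math.random() * userAgents.length)];
await page.setUserAgent(randomUA);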

If you want to further safeguard your IP from being blocked by Google, you can try these 10 Tips to avoid getting Blocked while Scraping Websites.

Install Libraries

Before we begin, install this library so we can move forward and prepare our scraper.

  1. Puppeteer JS

Or you can type the below command in your project terminal to install the library:

npm i puppeteer
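
Note that installing Puppeteer also downloads a compatible build of Chromium, so the first install can take a little while.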

Process

Web Scraping Google Maps Places Results

So, we have installed the library required for this project and can now prepare our scraper. Open the below URL in your browser, and you will see the results as shown in the above image.

https://www.google.com/maps/place/Blacklist+Coffee+Roasters/@-31.9473,115.8073705,14z/data=!4m13!1m7!3m6!1s0x0:0xf79bec80595c6aa8!2sBlacklist+Coffee+Roasters!8m2!3d-31.9472988!4d115.8248801!10e1!3m4!1s0x0:0xf79bec80595c6aa8!8m2!3d-31.9472988!4d115.8248801                     

First, we will create the driver function, which will launch the browser and navigate to the target URL.

const getMapsPlacesData = async () => {
    try {
        const url = "https://www.google.com/maps/place/Blacklist+Coffee+Roasters/@-31.9473,115.8073705,14z/data=!4m13!1m7!3m6!1s0x0:0xf79bec80595c6aa8!2sBlacklist+Coffee+Roasters!8m2!3d-31.9472988!4d115.8248801!10e1!3m4!1s0x0:0xf79bec80595c6aa8!8m2!3d-31.9472988!4d115.8248801";

        // Launch a visible (non-headless) Chromium instance.
        const browser = await puppeteer.launch({
            headless: false,
            args: ["--disable-setuid-sandbox", "--no-sandbox"],
        });
        // Grab the default tab that Puppeteer opens on launch.
        const [page] = await browser.pages();

        await page.goto(url, { waitUntil: "domcontentloaded", timeout: 60000 });
        await page.waitForTimeout(3000);

        const data = await extractData(page);
        console.log(data);

        await browser.close();
    } catch (e) {
        console.log(e);
    }
};

Step-by-step explanation:

  1. puppeteer.launch() – This launches the Chromium browser with the options we have set in our code. In our case, we launch the browser in non-headless mode.
  2. browser.pages() – This returns the pages already open in the browser; destructuring the array gives us the default tab.
  3. page.goto() – This navigates the page to the specified target URL.
  4. page.waitForTimeout() – This makes the page wait for 3 seconds before further operations.
  5. extractData() – In this step, we call our function to extract the data we need from the page.
  6. console.log(data) – We print the scraped data in the terminal.
  7. await browser.close() – Finally, we close the browser window. This step is essential because a browser left open keeps consuming CPU and memory.
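
One refinement worth mentioning: a fixed waitForTimeout() is fine for a demo, but waiting for a concrete element is usually more reliable (and waitForTimeout() has been deprecated in recent Puppeteer releases). A minimal sketch, assuming the place title is a reasonable readiness signal:

// Instead of a fixed 3-second sleep, wait until the place title is rendered.
await page.waitForSelector(".fontHeadlineLarge", { timeout: 10000 });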

Now, let us prepare our parser and extract the required data.

const extractData = async (page) => {
    let items = await page.evaluate(() => {
    return {
        title: document.querySelector(".fontHeadlineLarge")?.textContent,
        rating: document.querySelector(".F7nice")?.textContent,
        reviews: document.querySelector(".mmu3tf .DkEaL")?.textContent,
        type: document.querySelector(".u6ijk")?.textContent,
        service_options: document.querySelector(".E0DTEd")?.textContent.replaceAll("·", ""),
        address: document.querySelector("button[data-tooltip='Copy address']")?.textContent.trim(),
        website: document.querySelector("a[data-tooltip='Open website']")?.textContent.trim(),
        pluscode: document.querySelector("button[data-tooltip='Copy plus code']")?.textContent.trim(),
        timings: Array.from(document.querySelectorAll(".OqCZI tr")).map((el) => {
            return {
                [el.querySelector("td:first-child")?.textContent.trim()]: el.querySelector("td:nth-child(2) li.G8aQO")?.textContent,
            };
        }),
        popularTimes: {
            graphResults: Array.from(document.querySelectorAll(".C7xf8b > div")).map((el, i) => {
                // Google renders one histogram column per day, starting from Sunday,
                // so the element index maps directly to the day name.
                const days = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"];
                return {
                    [days[i]]: Array.from(el.querySelectorAll(".dpoVLd")).map((bar) => {
                        // aria-label looks like "0% busy at 6 AM."
                        const label = bar.getAttribute("aria-label") ?? "";
                        const time = label.split("at")[1]?.trim();
                        const busy_percentage = label.split("busy")[0].trim();
                        return {
                            time,
                            busy_percentage,
                        };
                    }),
                };
            }),
        },
        photos: Array.from(document.querySelectorAll(".dryRY .ofKBgf")).map((el) => {
            return {
                title: el.getAttribute("aria-label"),
                thumbnail: el.querySelector("img")?.getAttribute("src"),
            }
        }),
        question_and_answers: {
            question: document.querySelector(".Py6Qke")?.textContent,
            answer: document.querySelector(".l79Qmc")?.textContent
        },
        user_ratings: Array.from(document.querySelectorAll(".ExlQHd tr")).map((el) => {
            return {
                [el.getAttribute("aria-label")?.split(",")[0].trim()]: el.getAttribute("aria-label")?.split(",")[1].trim(),
            };
        }),
        user_reviews: Array.from(document.querySelectorAll(".tBizfc")).map((el) => {
            return {
                description: el.textContent.replace(/"/g, "").trim(),
                user_link: el.querySelector("a")?.getAttribute("href")
            }
        }),
        mentions: Array.from(document.querySelectorAll(".KNfEk+ div .L6Bbsd")).map((el) => {
            return {
                query: el.querySelector(".uEubGf")?.textContent,
                mentioned: el.querySelector(".fontBodySmall")?.textContent + " times"
            }
        }),
        most_relevant: Array.from(document.querySelectorAll(".jJc9Ad")).map((el) => {
        return {
            user: {
            name: el.querySelector(".d4r55")?.textContent,
            thumbnail: el.querySelector(".NBa7we")?.getAttribute("src"),
            local_guide: el.querySelector(".RfnDt span:nth-child(1)")?.textContent.length ? true : false,
            reviews: el.querySelector(".RfnDt span:nth-child(2)")?.textContent.replace(".", "").trim(),
            link: el.querySelector(".WEBjve")?.getAttribute("href")
            },
            rating: el.querySelector(".kvMYJc")?.getAttribute("aria-label"),
            date: el.querySelector(".rsqaWe")?.textContent,
            review: el.querySelector(".MyEned .wiI7pd")?.textContent,
            images: Array.from(el.querySelectorAll(".KtCyie button")).length ? Array.from(el.querySelectorAll(".KtCyie button")).map((el) => {
                return {
                    thumbnail: getComputedStyle(el).backgroundImage.split('")')[0].replace('url("', ""),
                };
            })
                : "",
        }
        })
        }
    });
    return items;
    }                                                  

Step-by-step explanation:

  1. document.querySelectorAll() – It returns all the elements that match the specified CSS selector. In our case, for example, every timings row matched by ".OqCZI tr".
  2. getAttribute() – This returns the attribute value of the specified element.
  3. textContent – It returns the text content inside the selected HTML element.
  4. split() – Used to split a string into substrings with the help of a specified separator and return them as an array (see the short example after this list).
  5. trim() – Removes the spaces from the beginning and end of the string.
  6. replaceAll() – Replaces every occurrence of the specified pattern in the string.
  7. map() – It calls a callback function on each element of the array and returns an array that contains the results.
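
To make the string handling concrete, here is how a busyness bar's aria-label (which looked like "0% busy at 6 AM." at the time of writing) is taken apart with split() and trim(), and how a computed property name builds the per-day objects:

// Sample aria-label value copied from a busyness bar (subject to change).
const label = "0% busy at 6 AM.";
const time = label.split("at")[1]?.trim();             // "6 AM."
const busy_percentage = label.split("busy")[0].trim(); // "0%"

// Computed property name, as used for the timings and ratings objects.
const day = "Sunday";
console.log({ [day]: { time, busy_percentage } });
// { Sunday: { time: '6 AM.', busy_percentage: '0%' } }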

Here is the complete code:

const puppeteer = require("puppeteer");

    const extractData = async (page) => {
    let items = await page.evaluate(() => {
    return {
        title: document.querySelector(".fontHeadlineLarge")?.textContent,
        rating: document.querySelector(".F7nice")?.textContent,
        reviews: document.querySelector(".mmu3tf .DkEaL")?.textContent,
        type: document.querySelector(".u6ijk")?.textContent,
        service_options: document.querySelector(".E0DTEd")?.textContent.replaceAll("·", ""),
        address: document.querySelector("button[data-tooltip='Copy address']")?.textContent.trim(),
        website: document.querySelector("a[data-tooltip='Open website']")?.textContent.trim(),
        pluscode: document.querySelector("button[data-tooltip='Copy plus code']")?.textContent.trim(),
        timings: Array.from(document.querySelectorAll(".OqCZI tr")).map((el) => {
            return {
                [el.querySelector("td:first-child")?.textContent.trim()]: el.querySelector("td:nth-child(2) li.G8aQO")?.textContent,
            };
        }),
        popularTimes: {
            graphResults: Array.from(document.querySelectorAll(".C7xf8b > div")).map((el, i) => {
                // Google renders one histogram column per day, starting from Sunday,
                // so the element index maps directly to the day name.
                const days = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"];
                return {
                    [days[i]]: Array.from(el.querySelectorAll(".dpoVLd")).map((bar) => {
                        // aria-label looks like "0% busy at 6 AM."
                        const label = bar.getAttribute("aria-label") ?? "";
                        const time = label.split("at")[1]?.trim();
                        const busy_percentage = label.split("busy")[0].trim();
                        return {
                            time,
                            busy_percentage,
                        };
                    }),
                };
            }),
        },
        photos: Array.from(document.querySelectorAll(".dryRY .ofKBgf")).map((el) => {
            return {
                title: el.getAttribute("aria-label"),
                thumbnail: el.querySelector("img")?.getAttribute("src"),
            }
        }),
        question_and_answers: {
            question: document.querySelector(".Py6Qke")?.textContent,
            answer: document.querySelector(".l79Qmc")?.textContent
        },
        user_ratings: Array.from(document.querySelectorAll(".ExlQHd tr")).map((el) => {
            return {
                [el.getAttribute("aria-label")?.split(",")[0].trim()]: el.getAttribute("aria-label")?.split(",")[1].trim(),
            };
        }),
        user_reviews: Array.from(document.querySelectorAll(".tBizfc")).map((el) => {
            return {
                description: el.textContent.replace(/"/g, "").trim(),
                user_link: el.querySelector("a")?.getAttribute("href")
            }
        }),
        mentions: Array.from(document.querySelectorAll(".KNfEk+ div .L6Bbsd")).map((el) => {
            return {
                query: el.querySelector(".uEubGf")?.textContent,
                mentioned: el.querySelector(".fontBodySmall")?.textContent + " times"
            }
        }),
        most_relevant: Array.from(document.querySelectorAll(".jJc9Ad")).map((el) => {
            return {
                user: {
                    name: el.querySelector(".d4r55")?.textContent,
                    thumbnail: el.querySelector(".NBa7we")?.getAttribute("src"),
                    local_guide: el.querySelector(".RfnDt span:nth-child(1)")?.textContent.length ? true : false,
                    reviews: el.querySelector(".RfnDt span:nth-child(2)")?.textContent.replace(".", "").trim(),
                    link: el.querySelector(".WEBjve")?.getAttribute("href")
                },
                rating: el.querySelector(".kvMYJc")?.getAttribute("aria-label"),
                date: el.querySelector(".rsqaWe")?.textContent,
                review: el.querySelector(".MyEned .wiI7pd")?.textContent,
                images: Array.from(el.querySelectorAll(".KtCyie button")).length ? Array.from(el.querySelectorAll(".KtCyie button")).map((el) => {
                    return {
                        thumbnail: getComputedStyle(el).backgroundImage.split('")')[0].replace('url("', ""),
                    };
                })
                    : "",
            }
        })
    }
    });
    return items;
    }

    const getMapsPlacesData = async () => {
    try {
    const url = "https://www.google.com/maps/place/Blacklist+Coffee+Roasters/@-31.9473,115.8073705,14z/data=!4m13!1m7!3m6!1s0x0:0xf79bec80595c6aa8!2sBlacklist+Coffee+Roasters!8m2!3d-31.9472988!4d115.8248801!10e1!3m4!1s0x0:0xf79bec80595c6aa8!8m2!3d-31.9472988!4d115.8248801";

    const browser = await puppeteer.launch({
        headless: false,
        args: ["--disable-setuid-sandbox", "--no-sandbox"],
    });
    const [page] = await browser.pages();

    await page.goto(url, { waitUntil: "domcontentloaded", timeout: 60000 });
    await page.waitForTimeout(3000);

    const data = await extractData(page);
    console.log(data)

    await browser.close();
    }
    catch (e) {
    console.log(e);
    }
    }

    getMapsPlacesData();                
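
Save the code in a file (for example scraper.js, a name chosen here purely for illustration) and run it from the project terminal:

node scraper.js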

Our result should look like this 👇🏻:

{
    title: ' Blacklist Coffee Roasters  ',
    rating: '4.8116 reviews',
    reviews: '116 reviews',
    type: 'Coffee shop',
    service_options: '    Dine-in    Takeaway    Delivery  ',
    address: '439D Hay St, Subiaco WA 6008, Australia',
    website: 'blacklistcoffee.com.au',
    pluscode: '3R3F+3X Subiaco, Western Australia, Australia',
    timings: [
        { Saturday: '7am-2pm' },
        { Sunday: '8am-2pm' },
        { Monday: '7am-2pm' },
        { Tuesday: '7am-2pm' },
        { Wednesday: '7am-2pm' },
        { Thursday: '7am-2pm' },
        { Friday: '7am-2pm' }
    ],
    popularTimes: {
        graphResults: [
        [Object], [Object],
        [Object], [Object],
        [Object], [Object],
        [Object]
        ]
    },
    photos: [
        {
        title: 'All',
        thumbnail: 'https://lh5.googleusercontent.com/p/AF1QipOQrrdy6N2Z7Xp9zbS-BE0LqVqJPXyHAYPW76zD=w224-h298-k-no'
        },
        {
        title: 'Latest · 10 days ago',
        thumbnail: 'https://lh5.googleusercontent.com/p/AF1QipPAGd8tNSNaBLdx7XGTtL4o48xOK4kLgMjFGHh-=w448-h298-k-no'
        },
        {
        title: 'Food & drink',
        thumbnail: 'https://lh5.googleusercontent.com/p/AF1QipOIfWpjgc7syqDrvU72Cg_ey4JhsDWU-v1kcmpS=w447-h298-k-no'
        },
        {
        title: 'Vibe',
        thumbnail: 'https://lh5.googleusercontent.com/p/AF1QipMDFZ_xthQMPS9nQcrbCLGYcawrzmnYQE9dDDjN=w224-h298-k-no'
        },
        {
        title: 'Latte',
        thumbnail: 'https://lh5.googleusercontent.com/p/AF1QipPlwUXR7bfyPYQz1CjtoUJljds1na3T-POExbZK=w397-h298-k-no'
        },
        {
        title: 'Coffee',
        thumbnail: 'https://lh5.googleusercontent.com/p/AF1QipOOfsk6V1Dc7Ew8NbGHQpUYU2XN8Ua_58nJHuPN=w224-h298-k-no'
        },
        {
        title: 'By owner',
        thumbnail: 'https://lh5.googleusercontent.com/p/AF1QipMqKXEiGXy-YjmB_mTurbqgi31mdn8EWRRsYwAI=w446-h298-k-no'
        },
        {
        title: 'Street View & 360°',
        thumbnail: 'https://streetviewpixels-pa.googleapis.com/v1/thumbnail?panoid=OkXbXBk_L_BvCTTYcRC2Cw&cb_client=maps_sv.tactile.gps&w=224&h=298&yaw=156.95456&pitch=0&thumbfov=100'
        },
        {
        title: 'Videos',
        thumbnail: 'https://lh5.googleusercontent.com/p/AF1QipP4Jq3MimzxDXc9oh_hGQAQkZDpxDOh5m9FRpEd=w224-h298-k-no'
        }
    ],
    question_and_answers: {
        question: 'Will they grind purchased beans in store?',
        answer: 'Hi Alex, we can grind any beans you purchase from us :)'
    },
    user_ratings: [
        { '5 stars': '103 reviews' },
        { '4 stars': '8 reviews' },
        { '3 stars': '2 reviews' },
        { '2 stars': '0 reviews' },
        { '1 stars': '3 reviews' }
    ],
    user_reviews: [
        {
        description: "They also sell coffee equipment at standard, non-inflated prices (I've checked).",
        user_link: 'https://www.google.com/maps/contrib/101811065862344097095?hl=en-IN'
        },
        {
        description: 'Very serious about their coffee 👌 Deserves more attention this place!',
        user_link: 'https://www.google.com/maps/contrib/109201574522158862622?hl=en-IN'
        },
        {
        description: 'Ordered a distilled coffee and a mocha plus a cookie.',
        user_link: 'https://www.google.com/maps/contrib/111095785064544767742?hl=en-IN'
        }
    ],
    mentions: [
        { query: 'beans', mentioned: '10 times' },
        { query: 'coffee tasting', mentioned: '9 times' },
        { query: 'barista', mentioned: '7 times' },
        { query: 'milk', mentioned: '3 times' }
    ],
    most_relevant: [
        {
        user: [Object],
        rating: ' 5 stars ',
        date: '2 weeks ago',
        review: "A regular coffee stop. If you're into brews and love a tasting, here's one to go to. Definitely recommend this place. Love the vibe of the cafe; the interior. Ideal to chill here in the morning. Parking is easy to find which is great especially in Subiaco.",
        images: [Array]
        },
        {
        user: [Object],
        rating: ' 5 stars ',
        date: 'a year ago',
        review: 'Excellent coffee, lactose-free milk available. We came for the $14 coffee tasting and it was really good! You can taste coffees in filter style, espresso, or with milk (latte etc). The staff are lovely, Bree was so nice! Apparently they rotate the coffees for tasting every 2-3 weeks, so will definitely be back for another tasting.',
        images: [Array]
        },
        {
        user: [Object],
        rating: ' 5 stars ',
        date: '5 months ago',
        review: 'Come here not only for the great black coffee but ALL the staff here are super welcoming and lovely to speak to. …',
        images: [Array]
        }
    ]
   }                                                                         

Conclusion:

In this tutorial, we learned to scrape Google Maps Places results using Node JS. Feel free to message me if I missed something. Follow me on Twitter. Thanks for reading!

Additional Resources

  1. Web Scraping Google With Node JS – A Complete Guide
  2. Web Scraping Google Without Getting Blocked
  3. Scrape Google Organic Search Results
  4. Scrape Google Maps Reviews
  5. Web Scraping Google Maps