Web Scraping With Node JS – An Ultimate Guide

JavaScript has become one of the most preferred languages for web scraping. Its ability to extract data from SPAs (Single Page Applications) has boosted its popularity. With libraries like Puppeteer and Cheerio, developers can easily automate their scraping tasks.

In this blog, we are going to discuss various web scraping libraries in JavaScript, weigh their advantages and disadvantages, and determine the best among them. At the end, we will also discuss why JavaScript is a great choice for web scraping tasks.

Web Scraping With Node JS


Before starting to learn web scraping with Node JS, let us learn some basics of web scraping.

What is Web Scraping?


Web Scraping is the process of extracting data from one or more websites. It involves making HTTP requests to a website's server to access the raw HTML of a particular webpage, and then converting that HTML into the desired format.
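As a minimal sketch of this workflow (assuming Node 18+, where the Fetch API is available globally), we can request a page and pull a value out of the raw HTML:

```javascript
// Fetch the raw HTML of a page and extract its <title> tag.
// Uses the global fetch available in Node 18+; no packages required.
const getTitle = async (url) => {
  const response = await fetch(url);
  const html = await response.text();

  // A real scraper would use a parser like Cheerio;
  // a regex is enough to illustrate the idea.
  const match = html.match(/<title>(.*?)<\/title>/s);
  return match ? match[1].trim() : null;
};

getTitle("https://example.com").then((title) => {
  console.log(title); // prints the page's <title> text
});
```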

There are various uses of Web Scraping:

SEO – Web Scraping can be used to scrape Google Search Results for various objectives like SERP monitoring, keyword tracking, and rank tracking.

News Monitoring – Web Scraping can enable access to a large number of articles from various media agencies which can be used to keep track of current news and global events around the world.

Lead Generation – Web Scraping helps to extract the contact details of a person who can be your potential customer.

Price Comparison – Web Scraping can be used to gather product pricing from multiple online retailers for price comparison. For example – you can extract the pricing of a particular product from Amazon.com and Walmart.com, and then you can compare the pricing to go with the inexpensive retailer.
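The comparison step itself is simple once the prices are collected; here is a toy sketch using made-up prices (not data scraped from real retailers):

```javascript
// Toy price comparison; the retailers and prices below are
// hypothetical placeholders, not scraped values.
const prices = [
  { retailer: "Amazon", price: 549.99 },
  { retailer: "Walmart", price: 529.0 },
];

// Pick the entry with the lowest price.
const cheapest = prices.reduce((best, p) => (p.price < best.price ? p : best));
console.log(cheapest.retailer); // "Walmart"
```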

Best Web Scraping Libraries in Node JS

The best web scraping libraries present in Node JS are:

  1. Unirest
  2. Axios
  3. SuperAgent
  4. Cheerio
  5. Puppeteer
  6. Playwright
  7. Nightmare

Let us start discussing these various web scraping libraries one by one.


HTTP Client Libraries

HTTP client libraries interact with website servers by sending requests and retrieving responses. In the following sections, we will discuss several libraries that can be used for making HTTP requests.


Unirest

Unirest is a lightweight HTTP request library available in multiple languages, built and maintained by Kong. It supports various HTTP methods like GET, POST, DELETE, and HEAD, which can be easily added to your applications, making it a good choice for simple use cases.

Unirest is one of the most popular JavaScript libraries for extracting valuable data available on the internet.

Let us take an example of how we can do it. Before starting, I am assuming that you have already set up your Node JS project with a working directory.

First, install Unirest JS by running the following command in your project terminal.

npm i unirest 

Now, using Unirest we will request the target URL to extract the raw HTML data.

    const unirest = require("unirest");

    const getData = async () => {
      const response = await unirest.get("https://www.reddit.com/r/programming.json");
      console.log(response.body); // raw response body (JSON for this URL)
    };

    getData();

This is how you can create a basic scraper with Unirest.


Advantages of Unirest

  1. All HTTP methods are supported, including GET, POST, DELETE, etc.
  2. It is very fast for web scraping tasks and can handle a large amount of load without any problem.
  3. It allows file transfer over a server in a much simpler way.


Axios

Axios is a promise-based HTTP client for both Node JS and browsers. Axios is widely used among the developer community because of its wide range of methods, simplicity, and active maintenance. It also supports features like request cancellation, automatic transforms for JSON data, etc.

Install the Axios library by running the following command in your terminal.

npm i axios

Making an HTTP request with Axios is quite simple.

    const axios = require("axios");

    const getData = async () => {
      const response = await axios.get("https://books.toscrape.com/");
      console.log(response.data); // HTML
    };

    getData();


Advantages of Axios

  1. It can intercept HTTP requests and modify them.
  2. It has large community support and is actively maintained, making it a reliable option for making HTTP requests.
  3. It can transform request and response data.


SuperAgent

SuperAgent is another lightweight HTTP client library for both Node JS and the browser. It supports many high-level HTTP client features. It offers an API similar to Axios and supports both promise and async/await syntax for handling responses.

You can install SuperAgent by running the following command.

npm i superagent

You can make an HTTP request using async/await with SuperAgent like this:

    const superagent = require("superagent");

    const getData = async () => {
      const response = await superagent.get("https://books.toscrape.com/");
      console.log(response.text); // HTML
    };

    getData();


Advantages of SuperAgent

  1. SuperAgent can be easily extended via various plugins.
  2. It works in both the browser and Node.


Disadvantages of SuperAgent

  1. Fewer features compared to other HTTP client libraries like Axios.
  2. Its documentation is not very detailed.

Web Parsing Libraries

Web parsing of HTML

Web Parsing Libraries are used to filter out the required data from the raw HTML or XML document. There are various web parsing libraries present in JavaScript including Cheerio, JSONPath, html-parse-stringify2, etc. In the following section, we will discuss Cheerio, the most popular web parsing library in JavaScript.


Cheerio

Cheerio is a lightweight web parsing library based on the powerful API of jQuery that can be used to parse and extract data from HTML and XML documents.

Cheerio is blazingly fast at parsing, manipulating, and rendering HTML because it works with a simple, consistent DOM model. It is not a web browser: it can't produce visual rendering, apply CSS, or execute JavaScript. For scraping SPAs (Single Page Applications), we need full browser automation tools like Puppeteer and Playwright, which we will discuss in a bit.

Let us scrape the title of the book in the below image. 

Inspecting the book title

First, we will install the Cheerio library.

npm i cheerio

Then, we can extract the title by running the below code.

    const unirest = require("unirest");
    const cheerio = require("cheerio");

    const getData = async () => {
      const response = await unirest.get("https://books.toscrape.com/catalogue/sharp-objects_997/index.html");
      const $ = cheerio.load(response.body);
      console.log("Book Title: " + $("h1").text()); // "Book Title: Sharp Objects"
    };

    getData();

The process is quite similar to what we did in the Unirest section, with one small difference: in the above code, we load the extracted HTML into a Cheerio instance and then use the CSS selector of the title to extract the required data.


Advantages of Cheerio

  1. It is faster than most other web parsing libraries.
  2. Cheerio has a very simple, jQuery-like syntax which allows developers to scrape web pages easily.
  3. Cheerio can be integrated with HTTP client libraries like Unirest and Axios, which makes an excellent combo for scraping a website.


Disadvantages of Cheerio

  1. It cannot execute JavaScript.

Headless Browsers

Web development has become more advanced, and developers now use JavaScript frameworks to render dynamic content on their websites. But content rendered by JavaScript is not accessible when scraping with a simple HTTP GET request, which can only retrieve the static part of the HTML. The only way to scrape such dynamic content is with a headless browser.

Browsers that can operate without any Graphical User Interface are known as Headless Browsers. They can be controlled programmatically to perform various tasks like submitting forms, clicking buttons, infinite scrolling, etc.

Let us discuss the libraries that can help us scrape dynamically rendered content.


Puppeteer

Puppeteer is a Node JS library developed by Google that provides a high-level API to control Chrome or Chromium browsers.

Features associated with Puppeteer JS:

  1. Puppeteer can be used to have better control over Chrome.
  2. It can generate screenshots and PDFs of web pages.
  3. It can be used to scrape web pages that use JavaScript to load content dynamically.

Let us scrape all the book titles and their links from this website.
But first, we will install the Puppeteer library.

npm i puppeteer

Now, we will prepare a script to scrape the required information. 

Inspecting the books pricing

Write the below code in your js file.

    const puppeteer = require("puppeteer");

    const getData = async () => {
      const browser = await puppeteer.launch({
        headless: false,
      });
      const page = await browser.newPage();
      await page.goto("https://books.toscrape.com/index.html", {
        waitUntil: "domcontentloaded",
      });

Step-by-step explanation:

  1. First, we launched the browser with the headless mode set to false, which allows us to see exactly what is happening.
  2. Then, we created a new page in the headless browser.
  3. After that, we navigated to our target URL and waited until the HTML completely loaded.

Now, we will parse the HTML.

      let data = await page.evaluate(() => {
        return Array.from(document.querySelectorAll("article h3")).map((el) => {
          return {
            title: el.querySelector("a").getAttribute("title"),
            link: el.querySelector("a").getAttribute("href"),
          };
        });
      });

The page.evaluate() method executes JavaScript within the current page context. document.querySelectorAll() selects all the elements that match the article h3 selector, while document.querySelector() works the same way but returns only the first matching element.

Great! Now, we will print the data and close the browser.

      console.log(data);
      await browser.close();
    };

    getData();

This will give you 20 titles and links to the books present on the web page.


Advantages of Puppeteer

  1. We can perform various activities on the web page, like clicking buttons and links, navigating between pages, scrolling the page, etc.
  2. It can be used to take screenshots of web pages.
  3. The evaluate() function in Puppeteer helps you execute JavaScript in the page context.
  4. You don't need an external driver to run your scripts.


Disadvantages of Puppeteer

  1. It requires very high CPU usage to run.
  2. It currently supports only Chrome and Chromium browsers.


Playwright

Playwright is a test automation framework for automating web browsers like Chrome, Firefox, and Safari, with an API similar to Puppeteer's. It was developed by the same team that worked on Puppeteer. Like Puppeteer, Playwright can run in both headless and headed modes, making it suitable for a wide range of uses, from automating tasks to web scraping and crawling.

Major Differences between Playwright and Puppeteer

  1. Playwright is compatible with Chrome, Firefox, and Safari, while Puppeteer only supports Chrome and Chromium.
  2. Playwright provides a wide range of options to control the browser in headless mode.
  3. Puppeteer is limited to JavaScript, while Playwright also supports languages like C#/.NET, Java, and Python.

Let us install Playwright now.

npm i playwright

We will now prepare a basic script to scrape the prices and stock availability from the same website which we used in the Puppeteer section. 

Inspecting the stock availability

The syntax is quite similar to Puppeteer.

    const playwright = require("playwright");

    const getData = async () => {
      const browser = await playwright.chromium.launch({ headless: false });
      const context = await browser.newContext();
      const page = await context.newPage();
      await page.goto("https://books.toscrape.com/index.html");

The newContext() will create a new browser context.
Now, we will prepare our parser.

      let articles = await page.$$("article");

      let data = [];
      for (let article of articles) {
        data.push({
          price: await article.$eval("p.price_color", (el) => el.textContent),
          availability: await article.$eval("p.availability", (el) => el.textContent),
        });
      }

Then, we will close our browser.

      console.log(data);
      await browser.close();
    };

    getData();


Advantages of Playwright

  1. It supports multiple languages like Python, Java, .NET, and JavaScript.
  2. It is faster than most other browser automation libraries.
  3. It supports multiple browsers like Chrome, Firefox, and Safari through a single API.
  4. Its documentation is well-written, which makes it easy for developers to learn and use.

Nightmare JS

Nightmare is a high-level web automation library designed to automate browsing, web scraping, and other relevant tasks. It uses Electron (similar to PhantomJS, but roughly twice as fast) as its headless browser, making it efficient and easy to use. It is predominantly used for UI testing and crawling.

It can be utilized for mimicking user actions such as navigating to a website, clicking a button or a link, typing, etc., with an API that provides a smooth experience for each script block.
Install Nightmare JS by running the following command.

npm i nightmare

Now, we will search for the results of “Serpdog” on duckduckgo.com.

    const Nightmare = require("nightmare");
    const nightmare = Nightmare();

    nightmare
      .goto("https://duckduckgo.com")
      .type("#search_form_input_homepage", "Serpdog")
      .click("#search_button_homepage")
      .wait(".nrn-react-div")
      .evaluate(() => {
        return Array.from(document.querySelectorAll(".nrn-react-div")).map((el) => {
          return {
            title: el.querySelector("h2").innerText.replace("\n", ""),
            link: el.querySelector("h2 a").href,
          };
        });
      })
      .end()
      .then((data) => {
        console.log(data);
      })
      .catch((error) => {
        console.error("Search failed:", error);
      });

In the above code, we first declared an instance of Nightmare. Then, we navigated to the DuckDuckGo search page.

Next, we used the type() method to type Serpdog into the search field and submitted the form by clicking the search button on the homepage with the click() method. We make our scraper wait until the search results have loaded, and then extract the results present on the web page with the help of their CSS selectors.


Advantages of Nightmare JS

  1. It is faster than Puppeteer.
  2. Fewer resources are needed to run it.


Disadvantages of Nightmare JS

  1. It doesn't have community support as good as Puppeteer's. Also, some unresolved issues exist in Electron, which can allow a malicious website to execute code on your computer.

Other libraries

In this section, we will discuss some alternatives to the previously discussed libraries.

Node Fetch

Node Fetch is a lightweight library that brings the Fetch API to Node JS, allowing efficient HTTP requests in the Node JS environment.


Advantages of Node Fetch

  1. It allows the use of promises and async functions.
  2. It implements the Fetch API functionality in Node JS.
  3. A simple API that is regularly maintained and easy to use and understand.

You can install Node Fetch by running the following command.

npm i node-fetch

Here is how you can use Node Fetch for web scraping.

    const fetch = require("node-fetch");

    const getData = async () => {
      const response = await fetch("https://en.wikipedia.org/wiki/JavaScript");
      const body = await response.text();
      console.log(body); // HTML
    };

    getData();



Osmosis

Osmosis is a web scraping library used for parsing and extracting data from HTML and XML documents.


Advantages of Osmosis

  1. It has no large dependencies like jQuery or Cheerio.
  2. It has a clean, promise-like interface.
  3. Fast parsing and a small memory footprint.

Features of Osmosis

  1. It supports retries and redirect limits.
  2. It supports single and multiple proxies.
  3. It supports form submission, session cookies, etc.

Is Node JS good for web scraping?

Node JS has various powerful libraries like Axios and Puppeteer, which make it a preferred choice for data extraction. The ease of extracting data from dynamically rendered content also makes it a strong option for web scraping tasks.

Also, the community support available for Node JS on platforms like StackOverflow, Discord, and Reddit is great and has answers to almost every problem you may face in your web scraping journey.

Let us discuss some advantages of using Node JS for web scraping:

Highly Scalable – Node JS can handle huge chunks of data without any problem, a beneficial feature for web scraping. This makes it a highly scalable choice for web scraping purposes.

Simple Syntax – Node JS has a simple syntax making it easy to learn for beginners.

Vast Community Support – Node JS has a huge community of active developers who can help you out of a problem or provide guidance to solve your issues, which will help you progress in your web scraping journey.


Conclusion

In this tutorial, we learned about various Node JS libraries that can be used for scraping, along with their advantages and disadvantages.

If you think we can complete your web scraping tasks and help you collect data, feel free to contact us.

I hope this tutorial gave you a complete overview of web scraping with Node JS and JavaScript. Please do not hesitate to message me if I missed something. Follow me on Twitter. Thanks for reading!

Additional Resources

I have prepared a complete list of blogs on scraping Google with Node JS, which can give you an idea of how to gather data from advanced websites like Google.

  1. Scrape Google Maps Reviews
  2. Scrape Google Shopping Results
  3. Scrape Google Scholar Results
  4. Web Scraping Google News Results
