How to Scrape Webpages Using Node.js and Cheerio

Web scraping is the process of extracting useful information from web pages. Node.js is a popular JavaScript runtime that works well for web scraping. In this tutorial, we will learn how to use Node.js and Cheerio for web scraping.

Prerequisites

Before we start, make sure you have the following installed on your machine:

  • Node.js
  • npm
  • A text editor

Step 1: Installing Required Packages

We will be using the following packages for our web scraping project:

  • Request: to make HTTP requests (note: this package is deprecated but still works for simple scraping)
  • Cheerio: to parse HTML with a jQuery-like API
  • fs: to write data to files (built into Node.js, no installation required)

To install the two external packages, run the following command in your terminal:

npm install request cheerio

Step 2: Making HTTP Requests

Now that we have installed the required packages, let's make our first HTTP request using the Request package.

const request = require('request');

// Fetch the page; the callback receives the raw HTML as a string
request('https://www.example.com', (error, response, html) => {
  if (!error && response.statusCode === 200) {
    console.log(html);
  }
});

This code will make a GET request to https://www.example.com and log the HTML response to the console.
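As mentioned above, the Request package has been deprecated and is no longer maintained, although it still works for simple cases like this. If you would rather avoid the dependency, here is a minimal sketch of the same GET request using Node's built-in https module (no extra install needed):

const https = require('https');

https.get('https://www.example.com', (response) => {
  let html = '';
  // The response arrives in chunks; accumulate them into one string
  response.on('data', (chunk) => { html += chunk; });
  response.on('end', () => console.log(html));
}).on('error', (err) => console.error(err));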

Step 3: Parsing HTML

Now that we have the HTML response, we need to parse it to extract the data we need. This is where the Cheerio package comes in.

const cheerio = require('cheerio');

// html is the string received in the request callback from Step 2
const $ = cheerio.load(html);

// Loop over every <h1> element and print its text content
$('h1').each((i, el) => {
  console.log($(el).text());
});

This code loads the HTML into Cheerio and uses the each method to loop through all h1 elements, logging their text content to the console.
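Cheerio supports most jQuery-style selectors and methods, so you can read attributes as well as text. Here is a small standalone sketch; the inline HTML string and the nav class are just placeholders for illustration:

const cheerio = require('cheerio');

// Placeholder markup standing in for a fetched page
const html = '<a class="nav" href="/about">About</a><a class="nav" href="/contact">Contact</a>';
const $ = cheerio.load(html);

$('a.nav').each((i, el) => {
  // text() reads the element's text, attr() reads a named attribute
  console.log($(el).text(), '->', $(el).attr('href'));
});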

Step 4: Writing Data to Files

Finally, we can write the extracted data to a file using the fs package.

const fs = require('fs');

// $('h1').text() returns the concatenated text of every <h1> on the page
fs.writeFile('data.txt', $('h1').text(), (err) => {
  if (err) throw err;
  console.log('Data saved to file');
});

This code will write the text content of all h1 elements to a file called data.txt.
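If you want structured output instead of a flat text file, one option is to collect the values into an array and save them as JSON. A quick sketch, continuing from the $ object loaded in Step 3:

const fs = require('fs');

// Gather each heading's text into an array
const headings = [];
$('h1').each((i, el) => {
  headings.push($(el).text());
});

// Pretty-print the array as JSON and save it
fs.writeFile('data.json', JSON.stringify(headings, null, 2), (err) => {
  if (err) throw err;
  console.log('Data saved to data.json');
});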

Other Node.js Scraping Libraries

  1. Scrape-it: a simple yet powerful Node.js scraping library.
  2. Website Scraper: another feature-rich, free and open-source JavaScript scraping library.
  3. Puppeteer: a Node.js library which provides a high-level API to control Chrome/Chromium over the DevTools Protocol. Puppeteer runs in headless mode by default, but can be configured to run in full ("headful") Chrome/Chromium; this makes it a good fit for pages rendered by JavaScript (see the sketch after this list).
  4. Revenant: a headless browser for Node.js powered by PhantomJS, based on the PhantomJS-Node bridge.
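
Unlike Request and Cheerio, Puppeteer drives a real browser, so it can scrape content that only appears after JavaScript runs. A minimal sketch, assuming Puppeteer has been installed with npm install puppeteer:

const puppeteer = require('puppeteer');

(async () => {
  // Launch headless Chromium, render the page, and read the h1
  // text from the live DOM
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.example.com');
  const heading = await page.$eval('h1', (el) => el.textContent);
  console.log(heading);
  await browser.close();
})();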

Conclusion

In this tutorial, we learned how to use Node.js for web scraping. We covered how to make HTTP requests, parse HTML using Cheerio, and write data to files using fs. With this knowledge, you can start scraping data from websites of your choice; just be sure to respect each site's terms of service before you do.







