Web scraping with Puppeteer

Web scraping is a technique for extracting data from websites so that you can save it to a local file or a database. In this tutorial, we will use the Puppeteer library for Node.js to scrape a web page.

Puppeteer is a Node.js library that provides a high-level API for controlling headless Chrome or Chromium. It allows you to open a website, interact with it, and scrape the data you need.

To use Puppeteer in your Node.js project, you need to first install it:

npm install puppeteer

Once Puppeteer is installed, you can use it in your project by requiring it in your code:

const puppeteer = require('puppeteer');

Next, we will launch a browser with Puppeteer. The launch options control, among other things, whether the browser runs in headless mode (without a visible window) and the size of the viewport. Because the remaining snippets use await, they need to run inside an async function.

const browser = await puppeteer.launch({
  headless: false, // set to true to run in headless mode
  defaultViewport: {
    width: 1200,
    height: 800
  }
});

Once the browser is open, we can open a new page and navigate to the website we want to scrape.

const page = await browser.newPage();
await page.goto('https://example.com/');
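
Pages that load content with JavaScript may not be ready the moment navigation starts. As a minimal sketch, assuming a hypothetical helper named openPage and an arbitrary 30-second timeout, the waitUntil and timeout options of page.goto() can be combined with a check of the response status:

```javascript
// Hypothetical helper: opens a page and waits for the network to go quiet.
// `browser` is the object returned by puppeteer.launch().
async function openPage(browser, url) {
  const page = await browser.newPage();
  try {
    const response = await page.goto(url, {
      waitUntil: 'networkidle2', // done when at most 2 connections stay open for 500 ms
      timeout: 30000,            // give up after 30 seconds (assumed value)
    });
    // goto() can resolve with null, so check that a response exists first
    if (response && !response.ok()) {
      throw new Error(`Request for ${url} failed with status ${response.status()}`);
    }
    return page;
  } catch (err) {
    // Do not leave a half-loaded tab open if navigation failed
    await page.close();
    throw err;
  }
}
```

Inside an async function this could be used as const page = await openPage(browser, 'https://example.com/'); in place of the two lines above.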

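Now that the page is loaded, we can use Puppeteer to interact with it and scrape the data we need. For example, the page.evaluate() method runs a JavaScript function in the context of the page and returns its result to Node.js. The returned value must be serializable, so return plain strings, numbers, arrays, or objects rather than DOM elements.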

const data = await page.evaluate(() => {
  // select the element(s) you want to scrape
  const elements = document.querySelectorAll('.element-class');

  // extract the data you need from the element(s)
  return Array.from(elements).map(element => element.innerText);
});

In the code above, we are selecting all elements with the class element-class and extracting the inner text of each element. The data is then returned as an array.
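
When you need more than the text, the callback can return plain objects instead of strings. The following is a sketch rather than the only way to do it: it uses page.$$eval(), which runs querySelectorAll in the page and passes the matched elements to the callback as an array. The names toRecords and scrapeLinks are hypothetical, chosen for illustration.

```javascript
// Runs inside the browser page, so it must not reference variables
// from the surrounding Node.js scope.
function toRecords(elements) {
  return elements.map(element => ({
    text: element.innerText.trim(),
    href: element.href || null, // elements without an href yield null
  }));
}

// Hypothetical helper: `page` is a Puppeteer Page that has already navigated.
async function scrapeLinks(page, selector) {
  return page.$$eval(selector, toRecords);
}
```

Called as await scrapeLinks(page, '.element-class'), this would return one object per matched element, each with a text and an href property.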

You can then use the data as you need, such as saving it to a file or database.
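
As one possible way to save the result to a file, the array can be written out as JSON with Node's built-in fs module. The helper name saveAsJson and the output path are assumptions for illustration.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: writes the scraped array to a JSON file,
// creating the target directory if it does not exist yet.
function saveAsJson(data, filePath) {
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  fs.writeFileSync(filePath, JSON.stringify(data, null, 2), 'utf8');
  return filePath;
}
```

Calling saveAsJson(data, 'output/data.json') after the scraping step would write the array to output/data.json, indented by two spaces.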

When you are finished with the scraping, be sure to close the browser to free up resources.

await browser.close();
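
If an error is thrown before browser.close() runs, the browser process can be left running. One way to guard against that, sketched here under the assumption that the steps above are wrapped in a hypothetical function named scrape, is a try/finally block. The Puppeteer module is passed in as a parameter so the function can be exercised without launching a real browser.

```javascript
// Hypothetical wrapper around the steps in this tutorial.
// `puppeteer` is the module returned by require('puppeteer').
async function scrape(puppeteer, url, selector) {
  const browser = await puppeteer.launch(); // headless by default
  try {
    const page = await browser.newPage();
    await page.goto(url);
    // The selector is passed as an argument because the callback runs
    // in the page and cannot see variables from the Node.js scope.
    return await page.evaluate(sel => {
      const elements = document.querySelectorAll(sel);
      return Array.from(elements).map(element => element.innerText);
    }, selector);
  } finally {
    // Runs whether the scraping succeeded or threw
    await browser.close();
  }
}
```

It could be called as scrape(require('puppeteer'), 'https://example.com/', '.element-class').then(console.log).catch(console.error), which also takes care of the requirement that await appears inside an async function.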

That's it! You now know how to launch a browser with Puppeteer, open a page, extract data from it, and close the browser when you are done.