Skip to content

stacksapien/scalable-nodejs-web-crawler

Repository files navigation

Scalable-NodeJS-Web-Crawler

A NodeJS based web-Crawler which can scale on the go!

  • Support both Static & Dynamic Page Crawling

Pre-requisite:

  • Linux (Ubuntu)
  • Redis
  • Nodejs

Installation:

  • Install NodeJS by executing the below command in root directory of project:
     $ cd init-scripts/ $ sudo bash install-nodejs.sh
  • Install Redis
     $ sudo bash install-redis.sh
  • Install project dependencies. In root directory of the project execute the following command:
     $ npm install

Usage:

 $ node index.js "<url>" "path-to-store-url" $ node index.js "https://stacksapien.com" "./temp"
  • In Above Example, Files like valid-urls.txt, external-urls.txt & invalid-urls.txt will be generated in temp folder of your git project directory.

Releases

No releases published

Packages

No packages published