- Node.js (v18+)
- npm (v9+)

1. Clone the repository:

       git clone https://github.com/jitsmaster/web-crawler-mcp.git
       cd web-crawler-mcp

2. Install dependencies:

       npm install

3. Build the project:

       npm run build

Create a `.env` file with the following environment variables:

    CRAWL_LINKS=false
    MAX_DEPTH=3
    REQUEST_DELAY=1000
    TIMEOUT=5000
    MAX_CONCURRENT=5

Start the MCP server:

    npm start

Add the following to your MCP settings file:

    {
      "mcpServers": {
        "web-crawler": {
          "command": "node",
          "args": ["/path/to/web-crawler/build/index.js"],
          "env": {
            "CRAWL_LINKS": "false",
            "MAX_DEPTH": "3",
            "REQUEST_DELAY": "1000",
            "TIMEOUT": "5000",
            "MAX_CONCURRENT": "5"
          }
        }
      }
    }

The server provides a `crawl` tool that can be accessed through MCP. Example usage:

    {
      "url": "https://example.com",
      "depth": 1
    }

| Environment Variable | Default | Description |
|---|---|---|
| CRAWL_LINKS | false | Whether to follow links |
| MAX_DEPTH | 3 | Maximum crawl depth |
| REQUEST_DELAY | 1000 | Delay between requests (ms) |
| TIMEOUT | 5000 | Request timeout (ms) |
| MAX_CONCURRENT | 5 | Maximum concurrent requests |
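
Environment variables always arrive as strings, so the values in the table need to be parsed before use. The following is a minimal sketch of one way to do that, falling back to the documented defaults when a variable is unset or invalid. The `CrawlerConfig` interface, its field names, and the `loadConfig` and `intFromEnv` helpers are hypothetical and are not taken from the project's source; the actual server may parse its settings differently.

```typescript
// Hypothetical config shape; the field names are illustrative only.
interface CrawlerConfig {
  crawlLinks: boolean;
  maxDepth: number;
  requestDelayMs: number;
  timeoutMs: number;
  maxConcurrent: number;
}

type Env = Record<string, string | undefined>;

// Parse a non-negative integer, using the fallback when the
// variable is unset, empty, or not a valid non-negative integer.
function intFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  return Number.isInteger(value) && value >= 0 ? value : fallback;
}

// Build a typed config from an environment-like object,
// applying the defaults listed in the table above.
function loadConfig(env: Env): CrawlerConfig {
  return {
    crawlLinks: (env.CRAWL_LINKS ?? "false").trim().toLowerCase() === "true",
    maxDepth: intFromEnv(env, "MAX_DEPTH", 3),
    requestDelayMs: intFromEnv(env, "REQUEST_DELAY", 1000),
    timeoutMs: intFromEnv(env, "TIMEOUT", 5000),
    maxConcurrent: intFromEnv(env, "MAX_CONCURRENT", 5),
  };
}

// Example: override two settings and keep the remaining defaults.
const config = loadConfig({ CRAWL_LINKS: "true", MAX_DEPTH: "2" });
console.log(config);
```

In a Node.js process, `loadConfig(process.env)` would read the real environment. Falling back silently on invalid values is a design choice made for this sketch; a stricter loader could throw instead so that a typo in `.env` is caught at startup.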