adheizal/sitefetch

sitefetch

Fetch an entire site and save it as a text file (to be used with AI models).

Install

One-off usage (choose one of the following):

bunx sitefetch
npx sitefetch
pnpx sitefetch

Install globally (choose one of the following):

bun i -g sitefetch
npm i -g sitefetch
pnpm i -g sitefetch

Usage

sitefetch https://egoist.dev -o site.txt

# or with higher concurrency
sitefetch https://egoist.dev -o site.txt --concurrency 10

Match specific pages

Use the -m, --match flag to specify the pages you want to fetch:

sitefetch https://vite.dev -m "/blog/**" -m "/guide/**"

The match pattern is tested against the pathname of each target page. Matching is powered by micromatch, so you can use all of its supported matching features.
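
To make "tested against the pathname" concrete, here is a minimal sketch. It does not use micromatch: `globToRegExp` and `matchesAny` are hypothetical helpers written for this example, and they only understand `*` and `**`. Real micromatch supports far more (braces, negation, extglobs) and may treat edge cases differently.

```typescript
// Simplified stand-in for micromatch, for illustration only.
// "*" matches within one path segment, "**" matches across "/" boundaries.
function globToRegExp(pattern: string): RegExp {
  let source = "";
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern.charAt(i);
    if (ch === "*") {
      if (pattern.charAt(i + 1) === "*") {
        source += ".*"; // "**": any characters, including "/"
        i++; // skip the second "*"
      } else {
        source += "[^/]*"; // "*": any characters except "/"
      }
    } else {
      // Escape characters that are special in regular expressions.
      source += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${source}$`);
}

// Only the pathname is tested; host and query string are ignored.
function matchesAny(url: string, patterns: string[]): boolean {
  const { pathname } = new URL(url);
  return patterns.some((p) => globToRegExp(p).test(pathname));
}

const patterns = ["/blog/**", "/guide/**"];
console.log(matchesAny("https://vite.dev/guide/features", patterns)); // true
console.log(matchesAny("https://vite.dev/config/", patterns)); // false
```

Because only the pathname is compared, patterns should start with `/` and should not include the host name.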

Content selector

We use mozilla/readability to extract readable content from each web page. On some pages it might return irrelevant content; in that case, you can specify a CSS selector so we know where to find the readable content:

sitefetch https://vite.dev --content-selector ".content" 

Cookie Support

To crawl protected websites that require authentication, you can use the --cookies-file flag to provide cookies from your browser:

sitefetch https://example.com --cookies-file cookies.txt

Exporting Cookies from Browser

  1. Install a browser extension that exports cookies in Netscape format.

  2. Log in to the protected site in your browser

  3. Export cookies using the extension:

    • Click the extension icon
    • Select "Export" → "Netscape format"
    • Save as cookies.txt
  4. Use with sitefetch:

    sitefetch https://protected-site.com --cookies-file cookies.txt -o output.txt

Security Notice: Cookies contain authentication credentials. Never commit cookies.txt to version control or share it publicly. Delete it after use.
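
For reference, a Netscape-format cookies.txt is a plain-text file with one cookie per line and seven tab-separated fields: domain, include-subdomains flag, path, secure flag, expiry (Unix seconds), name, and value. The sketch below shows how such a file could be parsed and turned into a `Cookie` header. It is an illustration under those assumptions, not sitefetch's actual implementation; `parseNetscapeCookies` and `cookieHeaderFor` are hypothetical names, and the domain and path checks are simplified compared to what browsers do.

```typescript
interface NetscapeCookie {
  domain: string;
  includeSubdomains: boolean;
  path: string;
  secure: boolean;
  expires: number; // Unix seconds; 0 is conventionally a session cookie
  name: string;
  value: string;
}

function parseNetscapeCookies(text: string): NetscapeCookie[] {
  const cookies: NetscapeCookie[] = [];
  for (const rawLine of text.split(/\r?\n/)) {
    let line = rawLine;
    if (line.startsWith("#HttpOnly_")) {
      // Some exporters mark HttpOnly cookies with this prefix.
      line = line.slice("#HttpOnly_".length);
    } else if (line.startsWith("#") || line.trim() === "") {
      continue; // comment or blank line
    }
    const fields = line.split("\t");
    if (fields.length !== 7) continue; // skip malformed lines
    const [domain, flag, path, secure, expires, name, value] = fields;
    cookies.push({
      domain,
      includeSubdomains: flag === "TRUE",
      path,
      secure: secure === "TRUE",
      expires: Number(expires),
      name,
      value,
    });
  }
  return cookies;
}

// Build a Cookie header for one URL (simplified matching rules).
function cookieHeaderFor(
  url: string,
  cookies: NetscapeCookie[],
  nowSeconds: number = Math.floor(Date.now() / 1000),
): string {
  const { hostname, pathname, protocol } = new URL(url);
  return cookies
    .filter((c) => {
      const domain = c.domain.replace(/^\./, "");
      const domainOk =
        hostname === domain ||
        (c.includeSubdomains && hostname.endsWith("." + domain));
      const pathOk = pathname.startsWith(c.path); // prefix match only
      const secureOk = !c.secure || protocol === "https:";
      const fresh = c.expires === 0 || c.expires > nowSeconds;
      return domainOk && pathOk && secureOk && fresh;
    })
    .map((c) => `${c.name}=${c.value}`)
    .join("; ");
}
```

If a crawl of a protected site returns login pages instead of content, checking that the exported file contains unexpired cookies for the right domain is a reasonable first step.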

Plug

If you like this, please check out my LLM chat app: https://chatwise.app

API

import { fetchSite } from "sitefetch"

await fetchSite("https://egoist.dev", {
  //...options
})

See types.ts for the available options.

License

MIT.
