Fetch an entire site and save it as a text file (to be used with AI models).
One-off usage (choose one of the following):

    bunx sitefetch
    npx sitefetch
    pnpx sitefetch

Install globally (choose one of the following):

    bun i -g sitefetch
    npm i -g sitefetch
    pnpm i -g sitefetch

Usage:

    sitefetch https://egoist.dev -o site.txt

    # or, with higher concurrency
    sitefetch https://egoist.dev -o site.txt --concurrency 10

Use the -m, --match flag to specify the pages you want to fetch:

    sitefetch https://vite.dev -m "/blog/**" -m "/guide/**"

The match pattern is tested against the pathname of each target page. Matching is powered by micromatch; see its documentation for all the supported matching features.
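To illustrate what "tested against the pathname" means, here is a minimal sketch. It is a simplified stand-in, not micromatch and not sitefetch's actual code: `globToRegExp` and `shouldFetch` are hypothetical helpers that only understand `*` and `**`, and the URLs are made up for the example. Real micromatch supports many more features and may differ in edge cases.

```typescript
// Simplified glob matcher for illustration only; NOT micromatch.
// Supports "**" (any characters, including "/") and "*" (any characters except "/").
function globToRegExp(pattern: string): RegExp {
  let source = ""
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i]
    if (ch === "*") {
      if (pattern[i + 1] === "*") {
        source += ".*" // "**" may cross "/" boundaries
        i++
      } else {
        source += "[^/]*" // "*" stays within one path segment
      }
    } else {
      // Escape regex metacharacters so they match literally.
      source += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    }
  }
  return new RegExp(`^${source}$`)
}

// Hypothetical helper: only the pathname is tested, so the host and
// query string play no part in the decision.
function shouldFetch(pageUrl: string, patterns: string[]): boolean {
  const { pathname } = new URL(pageUrl)
  return patterns.some((p) => globToRegExp(p).test(pathname))
}

const patterns = ["/blog/**", "/guide/**"]
console.log(shouldFetch("https://vite.dev/blog/some-post", patterns)) // true
console.log(shouldFetch("https://vite.dev/guide/features?x=1", patterns)) // true
console.log(shouldFetch("https://vite.dev/config/", patterns)) // false
```

Because only the pathname is compared, patterns should start with a leading slash and should not include the domain.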
We use mozilla/readability to extract readable content from each web page. On some pages it may return irrelevant content; in that case, you can specify a CSS selector so we know where to find the readable content:

    sitefetch https://vite.dev --content-selector ".content"

To crawl protected websites that require authentication, use the --cookies-file flag to provide cookies from your browser:

    sitefetch https://example.com --cookies-file cookies.txt

- Install a browser extension to export cookies in Netscape format:
  - Chrome: Get cookies.txt LOCALLY
  - Firefox: cookies.txt
  - Edge: Get cookies.txt LOCALLY
- Log in to the protected site in your browser.
- Export cookies using the extension:
  - Click the extension icon
  - Select "Export" → "Netscape format"
  - Save as cookies.txt
- Use with sitefetch:

      sitefetch https://protected-site.com --cookies-file cookies.txt -o output.txt

Security Notice: Cookies contain authentication credentials. Never commit cookies.txt to version control or share it publicly. Delete it after use.
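
For readers curious about what a Netscape-format cookies.txt contains, the sketch below parses one into a `Cookie` request header. It assumes the conventional layout of seven tab-separated fields per line (domain, subdomain flag, path, secure flag, expiry, name, value), and treats the `#HttpOnly_` prefix used by curl-style exports as part of a valid cookie line. `parseNetscapeCookies` and `toCookieHeader` are hypothetical names; this is not how sitefetch itself necessarily reads the file.

```typescript
interface NetscapeCookie {
  domain: string
  includeSubdomains: boolean
  path: string
  secure: boolean
  expires: number // Unix timestamp in seconds; 0 usually means a session cookie
  name: string
  value: string
}

// Hypothetical parser for the conventional 7-field, tab-separated layout.
function parseNetscapeCookies(text: string): NetscapeCookie[] {
  const cookies: NetscapeCookie[] = []
  for (const rawLine of text.split(/\r?\n/)) {
    let line = rawLine
    if (line.startsWith("#HttpOnly_")) {
      // curl-style exports mark HttpOnly cookies with this prefix.
      line = line.slice("#HttpOnly_".length)
    } else if (line.startsWith("#") || line.trim() === "") {
      continue // skip comments and blank lines
    }
    const fields = line.split("\t")
    if (fields.length !== 7) continue // skip malformed lines
    const [domain, sub, path, secure, expires, name, value] = fields
    cookies.push({
      domain,
      includeSubdomains: sub === "TRUE",
      path,
      secure: secure === "TRUE",
      expires: Number(expires),
      name,
      value,
    })
  }
  return cookies
}

// Join name=value pairs the way a Cookie request header expects.
function toCookieHeader(cookies: NetscapeCookie[]): string {
  return cookies.map((c) => `${c.name}=${c.value}`).join("; ")
}

// Made-up sample data for illustration.
const sample = [
  "# Netscape HTTP Cookie File",
  ".example.com\tTRUE\t/\tTRUE\t1893456000\tsession\tabc123",
  "#HttpOnly_example.com\tFALSE\t/\tFALSE\t0\ttheme\tdark",
].join("\n")

console.log(toCookieHeader(parseNetscapeCookies(sample)))
// session=abc123; theme=dark
```

The values in this file are live credentials, which is why the security notice above applies.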
If you like this, please check out my LLM chat app: https://chatwise.app
Programmatic usage:

    import { fetchSite } from "sitefetch"

    await fetchSite("https://egoist.dev", {
      //...options
    })

See types.ts for the available options.
License: MIT.
