sitefetch

Fetch an entire site and save it as a text file (to be used with AI models).

Install

One-off usage (choose one of the following):

bunx sitefetch
npx sitefetch
pnpx sitefetch

Install globally (choose one of the following):

bun i -g sitefetch
npm i -g sitefetch
pnpm i -g sitefetch

Usage

sitefetch https://egoist.dev -o site.txt

# or better concurrency
sitefetch https://egoist.dev -o site.txt --concurrency 10

Match specific pages

Use the -m, --match flag to specify the pages you want to fetch:

sitefetch https://vite.dev -m "/blog/**" -m "/guide/**"

The match pattern is tested against the pathname of each target page. Matching is powered by micromatch; see its documentation for all the supported matching features.
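
To make the matching rule concrete, here is a minimal TypeScript sketch. It does not call micromatch: `globToRegExp` and `matchesAny` are hypothetical helpers that handle only `*` and `**`, a simplified stand-in for the much richer syntax micromatch supports. What it illustrates is that a pattern is compared with the URL's pathname, not with the full URL.

```typescript
// Simplified stand-in for micromatch (hypothetical helper, not sitefetch code).
// "*" matches within one path segment, "**" matches across segments.
function globToRegExp(pattern: string): RegExp {
  const source = pattern
    .split("**")
    .map((part) =>
      part
        .split("*")
        // Escape regex metacharacters in the literal pieces.
        .map((literal) => literal.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
        .join("[^/]*"),
    )
    .join(".*");
  return new RegExp(`^${source}$`);
}

// Returns true when the page's pathname matches at least one pattern.
function matchesAny(pageUrl: string, patterns: string[]): boolean {
  const { pathname } = new URL(pageUrl); // e.g. "/blog/announcing-vite6"
  return patterns.some((pattern) => globToRegExp(pattern).test(pathname));
}

const patterns = ["/blog/**", "/guide/**"];
console.log(matchesAny("https://vite.dev/blog/announcing-vite6", patterns)); // true
console.log(matchesAny("https://vite.dev/config/", patterns)); // false
```

Because only the pathname is tested, patterns should start with `/` and should not include the origin (`https://vite.dev`).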

Content selector

We use mozilla/readability to extract readable content from each web page. On some pages it may return irrelevant content; in that case, specify a CSS selector so sitefetch knows where to find the readable content:

sitefetch https://vite.dev --content-selector ".content"

Cookie Support

To crawl protected websites that require authentication, you can use the --cookies-file flag to provide cookies from your browser:

sitefetch https://example.com --cookies-file cookies.txt

Exporting Cookies from Browser

1. Install a browser extension to export cookies in Netscape format:
   - Chrome: Get cookies.txt LOCALLY
   - Firefox: cookies.txt
   - Edge: Get cookies.txt LOCALLY
2. Log in to the protected site in your browser.
3. Export cookies using the extension:
   - Click the extension icon
   - Select "Export" → "Netscape format"
   - Save as cookies.txt
4. Use with sitefetch:

sitefetch https://protected-site.com --cookies-file cookies.txt -o output.txt

Security Notice: Cookies contain authentication credentials. Never commit cookies.txt to version control or share it publicly. Delete it after use.
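
For reference, a Netscape-format cookies.txt holds one cookie per line as seven tab-separated fields. The TypeScript sketch below parses such a file and builds a `Cookie` header from it. It only illustrates the file format; how sitefetch itself reads the file is not shown here, and `parseNetscapeCookies`, `toCookieHeader`, and the sample values are hypothetical.

```typescript
// Fields per line: domain, includeSubdomains, path, secure,
// expires (Unix seconds), name, value.
interface NetscapeCookie {
  domain: string;
  includeSubdomains: boolean;
  path: string;
  secure: boolean;
  expires: number;
  name: string;
  value: string;
}

function parseNetscapeCookies(text: string): NetscapeCookie[] {
  const cookies: NetscapeCookie[] = [];
  for (const rawLine of text.split(/\r?\n/)) {
    // Some exporters mark HttpOnly cookies with a "#HttpOnly_" prefix.
    const line = rawLine.startsWith("#HttpOnly_")
      ? rawLine.slice("#HttpOnly_".length)
      : rawLine;
    if (line.trim() === "" || line.startsWith("#")) continue; // blank or comment
    const fields = line.split("\t");
    if (fields.length !== 7) continue; // skip malformed lines
    const [domain, sub, path, secure, expires, name, value] = fields;
    cookies.push({
      domain,
      includeSubdomains: sub === "TRUE",
      path,
      secure: secure === "TRUE",
      expires: Number(expires),
      name,
      value,
    });
  }
  return cookies;
}

// Joins cookies into the value of an HTTP "Cookie" request header.
function toCookieHeader(cookies: NetscapeCookie[]): string {
  return cookies.map((c) => `${c.name}=${c.value}`).join("; ");
}

// Made-up sample data, not real credentials.
const sample = [
  "# Netscape HTTP Cookie File",
  ".example.com\tTRUE\t/\tTRUE\t1767225600\tsession\tabc123",
  "#HttpOnly_example.com\tFALSE\t/\tFALSE\t0\ttheme\tdark",
].join("\n");

console.log(toCookieHeader(parseNetscapeCookies(sample)));
// session=abc123; theme=dark
```

The `value` field is the credential itself, which is why the file should be treated like a password.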

Plug

If you like this, please check out my LLM chat app: https://chatwise.app

API

import { fetchSite } from "sitefetch"

await fetchSite("https://egoist.dev", {
  //...options
})

Check out options in types.ts.

License

MIT.
