BookSmart is a local-first book analysis pipeline for long books that do not fit into a single model context window. It uses hierarchical summarization with LangChain map-reduce chains, a local llama.cpp-compatible chat endpoint, a local embedding endpoint, Chroma for retrieval, and Chainlit for the UI.
BookSmart provides:

- Whole-book summaries.
- Chapter summaries.
- Question answering over a processed book.
- Question answering scoped by chapter.
- Nearby chunk summaries for a query.
- A larger local section view assembled around the most relevant chunk.
The processing pipeline:

- Split a plain-text book into chapters when chapter headings are present.
- Split large chapters into context-safe chunks.
- Run LangChain map-reduce summarization over the chunks.
- Persist chunk summaries, chapter summaries, and a whole-book summary.
- Build a local Chroma index from the source chunks.
- Answer questions using global summary + chapter summaries + retrieved chunks + a larger local section.
- Re-ingesting or uploading a book rebuilds its artifacts so stale outputs are not reused.
- Continuing with an existing processed book reuses the persisted artifacts already on disk.
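The first two pipeline steps can be sketched roughly as follows. The heading pattern and the character-window chunker here are illustrative assumptions, not BookSmart's actual implementation:

```python
import re

# Assumed heading pattern; the real pipeline's chapter detection may differ.
CHAPTER_RE = re.compile(r"^(chapter|part)\s+\S+", re.IGNORECASE | re.MULTILINE)

def split_chapters(text: str) -> list[str]:
    """Split on chapter headings; fall back to the whole book if none are found."""
    starts = [m.start() for m in CHAPTER_RE.finditer(text)]
    if not starts:
        return [text]
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]

def chunk(text: str, size: int, overlap: int) -> list[str]:
    """Cut a chapter into context-safe, overlapping character windows."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Each chunk then becomes one "map" document for the map-reduce summarization chain, so `size` should track the chat model's context window.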
Copy `booksmart.example.toml` to `booksmart.toml`, or set environment variables directly.
Default endpoints:
- Chat LLM: http://localhost:8001/v1
- Embeddings: http://localhost:8002/v1
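Both endpoints speak the OpenAI wire format, so the requests BookSmart issues can be sketched as plain JSON payloads. The builder functions below are illustrative, not BookSmart code; the model name and temperature are placeholder assumptions:

```python
import json

CHAT_BASE = "http://localhost:8001/v1"   # default llama.cpp chat endpoint
EMBED_BASE = "http://localhost:8002/v1"  # default llama.cpp embedding endpoint

def chat_request(model: str, prompt: str) -> tuple[str, bytes]:
    """Build the URL and JSON body for an OpenAI-style chat completion call."""
    url = f"{CHAT_BASE}/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,  # illustrative value
    }).encode()
    return url, body

def embed_request(model: str, texts: list[str]) -> tuple[str, bytes]:
    """Build the URL and JSON body for an OpenAI-style embeddings call."""
    url = f"{EMBED_BASE}/embeddings"
    body = json.dumps({"model": model, "input": texts}).encode()
    return url, body
```

Any HTTP client (or the `openai` SDK pointed at these base URLs) can send the resulting requests.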
Config values:
`data_dir`, `llm_base_url`, `embedding_base_url`, `api_key`, `model`, `embedding_model`, `temperature`, `chunk_size_chars`, `chunk_overlap_chars`, `embedding_chunk_size_chars`, `embedding_chunk_overlap_chars`, `chat_llm_base_url`, `chat_model`, `retrieval_k`, `section_char_budget`, `reduce_max_tokens`
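A `booksmart.toml` covering these keys might look like the following. Apart from the two documented default endpoint URLs, every value here is an illustrative example, not a project default:

```toml
# Illustrative values only; tune for your models and hardware.
data_dir = "data"
llm_base_url = "http://localhost:8001/v1"
embedding_base_url = "http://localhost:8002/v1"
api_key = "not-needed-for-local"
model = "local-chat-model"
embedding_model = "local-embedding-model"
temperature = 0.2
chunk_size_chars = 8000
chunk_overlap_chars = 400
embedding_chunk_size_chars = 1500
embedding_chunk_overlap_chars = 150
chat_llm_base_url = "http://localhost:8001/v1"
chat_model = "local-chat-model"
retrieval_k = 6
section_char_budget = 12000
reduce_max_tokens = 1024
```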
Install dependencies: `uv sync`

Ingest a book from a text file: `uv run booksmart ingest /path/to/book.txt`

List processed books: `uv run booksmart list`

Show a whole-book summary: `uv run booksmart summary <book-slug>`

Ask a question: `uv run booksmart ask <book-slug> "What drives the protagonist's decision near the end?"`

Run the UI locally: `uv run chainlit run chainlit_app.py`

Useful commands inside the chat:
- `/books`
- `/use <book-slug>`
- `/ingest <path-to-book.txt>`
- `/upload`
- `/summary`
- `/chapter <number-or-title>`
- `/nearby <question>`
- `/section <question>`
Each processed book is stored under data/books/<slug>/ with:
- `source.txt`
- `manifest.json`
- `chapters.json`
- `chunks.json`
- `map_summaries.json`
- `chapter_summaries.json`
- `global_summary.md`
- `chroma/`
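The layout can be navigated with a small helper like the one below. The slug rule shown is an assumption for illustration; BookSmart's actual slugging may differ:

```python
import re
from pathlib import Path

DATA_DIR = Path("data")

def slugify(title: str) -> str:
    """Lowercase, replace non-alphanumeric runs with hyphens (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def artifact_paths(title: str) -> dict[str, Path]:
    """Map each documented artifact to its expected path under data/books/<slug>/."""
    root = DATA_DIR / "books" / slugify(title)
    names = ["source.txt", "manifest.json", "chapters.json", "chunks.json",
             "map_summaries.json", "chapter_summaries.json",
             "global_summary.md", "chroma"]
    return {n: root / n for n in names}
```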
- Input support is intentionally focused on plain text for v1.
- The implementation assumes OpenAI-compatible llama.cpp endpoints.
- The section context budget is character-based, which is a pragmatic approximation for local models.
- Summarization and embeddings now use separate chunk budgets. Keep `chunk_size_chars` large enough for efficient map-reduce summarization, and keep `embedding_chunk_size_chars` small enough for your embedding endpoint's token limit.
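Because the budgets are character-based while model limits are token-based, a rough chars-per-token heuristic helps pick safe values. The ~4 chars/token figure below is a common rule of thumb for English text, not a measured constant:

```python
CHARS_PER_TOKEN = 4  # rough heuristic for English prose; an assumption

def approx_tokens(n_chars: int) -> int:
    """Estimate the token count of a chunk from its character count."""
    return -(-n_chars // CHARS_PER_TOKEN)  # ceiling division

def max_safe_chunk_chars(token_limit: int, margin: float = 0.8) -> int:
    """Largest embedding_chunk_size_chars likely to fit a token limit,
    keeping 20% headroom by default for tokenizer variance."""
    return int(token_limit * margin * CHARS_PER_TOKEN)
```

For example, an embedding model with a 512-token limit suggests an `embedding_chunk_size_chars` of roughly 1600 or less.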