nuvoqueonline/TreeDex
TreeDex

Tree-based, vectorless document RAG framework.

Index any document into a navigable tree structure, then retrieve relevant sections using any LLM. No vector databases, no embeddings — just structured tree retrieval.

Available for both Python and Node.js — same API, same index format, fully cross-compatible.

Open In Colab · PyPI · npm · License: MIT · Python 3.10+ · Node 18+


How It Works


  1. Load — Extract pages from any supported format
  2. Index — LLM analyzes page groups and extracts hierarchical structure
  3. Build — Flat sections become a tree with page ranges and embedded text
  4. Query — LLM selects relevant tree nodes for your question
  5. Return — Get context text, source pages, and reasoning
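The retrieval steps above can be sketched in a few lines of plain Python. This is a toy illustration, not TreeDex's internals: the node layout and the `pick_nodes` selector are hypothetical stand-ins (a real backend would send the tree outline plus the question to an LLM and parse the node IDs it returns; here a naive keyword match plays that role).

```python
# Toy sketch of tree-based retrieval. The node shape and the fake
# "LLM" selector are hypothetical illustrations, not TreeDex code.
tree = [
    {"node_id": "0001", "title": "Introduction", "pages": (1, 4),
     "text": "Overview of the argument..."},
    {"node_id": "0002", "title": "Main Argument", "pages": (5, 8),
     "text": "The central claim is..."},
    {"node_id": "0003", "title": "Case Study", "pages": (12, 15),
     "text": "Supporting evidence..."},
]

def pick_nodes(question, tree):
    # Stand-in for the LLM selection step: pick nodes whose title
    # shares a word with the question.
    q_words = question.lower().split()
    return [n["node_id"] for n in tree
            if any(w in n["title"].lower() for w in q_words)]

def retrieve(question, tree):
    ids = pick_nodes(question, tree)
    hits = [n for n in tree if n["node_id"] in ids]
    return {
        "node_ids": ids,
        "context": "\n\n".join(n["text"] for n in hits),
        "page_ranges": [n["pages"] for n in hits],
    }

result = retrieve("What is the main argument?", tree)
print(result["node_ids"])     # ['0002']
print(result["page_ranges"])  # [(5, 8)]
```

The point of the sketch: no vectors are ever computed — selection happens over the tree's titles/structure, and page attribution falls out of the chosen nodes for free.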

Why TreeDex instead of Vector DB?

(Diagram: TreeDex vs Vector DB)


Supported LLM Providers


TreeDex works with every major AI provider out of the box. Pick what works for you:

One-liner backends (zero config)

| Backend | Provider | Default Model | Python Deps | Node.js Deps |
|---|---|---|---|---|
| GeminiLLM | Google | gemini-2.0-flash | google-generativeai | @google/generative-ai |
| OpenAILLM | OpenAI | gpt-4o | openai | openai |
| ClaudeLLM | Anthropic | claude-sonnet-4-20250514 | anthropic | @anthropic-ai/sdk |
| MistralLLM | Mistral AI | mistral-large-latest | mistralai | @mistralai/mistralai |
| CohereLLM | Cohere | command-r-plus | cohere | cohere-ai |
| GroqLLM | Groq | llama-3.3-70b-versatile | groq | groq-sdk |
| TogetherLLM | Together AI | Llama-3-70b-chat-hf | None | None (fetch) |
| FireworksLLM | Fireworks | llama-v3p1-70b-instruct | None | None (fetch) |
| OpenRouterLLM | OpenRouter | claude-sonnet-4 | None | None (fetch) |
| DeepSeekLLM | DeepSeek | deepseek-chat | None | None (fetch) |
| CerebrasLLM | Cerebras | llama-3.3-70b | None | None (fetch) |
| SambanovaLLM | SambaNova | Llama-3.1-70B-Instruct | None | None (fetch) |
| HuggingFaceLLM | HuggingFace | Mistral-7B-Instruct | None | None (fetch) |
| OllamaLLM | Ollama (local) | llama3 | None | None (fetch) |

Universal backends

| Backend | Use case | Dependencies |
|---|---|---|
| OpenAICompatibleLLM | Any OpenAI-compatible endpoint (URL + key) | None |
| LiteLLM | 100+ providers via litellm library (Python only) | litellm |
| FunctionLLM | Wrap any function | None |
| BaseLLM | Subclass to build your own | None |

Quick Start

Install

**Python**

```bash
pip install treedex

# With optional LLM SDK
pip install treedex[gemini]
pip install treedex[openai]
pip install treedex[claude]
pip install treedex[all]
```

**Node.js**

```bash
npm install treedex

# With optional LLM SDK
npm install treedex openai
npm install treedex @google/generative-ai
npm install treedex @anthropic-ai/sdk
```

Pick your LLM and go

**Python**

```python
from treedex import TreeDex, GeminiLLM

llm = GeminiLLM(api_key="YOUR_KEY")
index = TreeDex.from_file("doc.pdf", llm=llm)

result = index.query("What is the main argument?")
print(result.context)
print(result.pages_str)  # "pages 5-8, 12-15"
```

**Node.js / TypeScript**

```typescript
import { TreeDex, GeminiLLM } from "treedex";

const llm = new GeminiLLM("YOUR_KEY");
const index = await TreeDex.fromFile("doc.pdf", llm);

const result = await index.query("What is the main argument?");
console.log(result.context);
console.log(result.pagesStr); // "pages 5-8, 12-15"
```

All providers work the same way

**Python**

```python
from treedex import *

# Google Gemini
llm = GeminiLLM(api_key="YOUR_KEY")

# OpenAI
llm = OpenAILLM(api_key="sk-...")

# Claude
llm = ClaudeLLM(api_key="sk-ant-...")

# Groq (fast inference)
llm = GroqLLM(api_key="gsk_...")

# Together AI
llm = TogetherLLM(api_key="...")

# DeepSeek
llm = DeepSeekLLM(api_key="...")

# OpenRouter (access any model)
llm = OpenRouterLLM(api_key="...")

# Local Ollama
llm = OllamaLLM(model="llama3")

# Any OpenAI-compatible endpoint
llm = OpenAICompatibleLLM(
    base_url="https://your-api.com/v1",
    api_key="...",
    model="model-name",
)
```

**Node.js / TypeScript**

```typescript
import { /* any backend */ } from "treedex";

// Google Gemini
const llm = new GeminiLLM("YOUR_KEY");

// OpenAI
const llm = new OpenAILLM("sk-...");

// Claude
const llm = new ClaudeLLM("sk-ant-...");

// Groq (fast inference)
const llm = new GroqLLM("gsk_...");

// Together AI
const llm = new TogetherLLM("...");

// DeepSeek
const llm = new DeepSeekLLM("...");

// OpenRouter (access any model)
const llm = new OpenRouterLLM("...");

// Local Ollama
const llm = new OllamaLLM("llama3");

// Any OpenAI-compatible endpoint
const llm = new OpenAICompatibleLLM({
  baseUrl: "https://your-api.com/v1",
  apiKey: "...",
  model: "model-name",
});
```

Wrap any function

**Python**

```python
from treedex import FunctionLLM

llm = FunctionLLM(lambda p: my_api(p))
```

**Node.js / TypeScript**

```typescript
import { FunctionLLM } from "treedex";

const llm = new FunctionLLM((p) => myApi(p));
```

Build your own backend

**Python**

```python
from treedex import BaseLLM

class MyLLM(BaseLLM):
    def generate(self, prompt: str) -> str:
        return my_api_call(prompt)
```

**Node.js / TypeScript**

```typescript
import { BaseLLM } from "treedex";

class MyLLM extends BaseLLM {
  async generate(prompt: string): Promise<string> {
    return await myApiCall(prompt);
  }
}
```

Agentic RAG — get direct answers

Standard mode returns raw context. Agentic mode goes one step further — it retrieves the relevant sections, then generates a direct answer.

**Python**

```python
# Standard: returns context + page ranges
result = index.query("What is X?")
print(result.context)

# Agentic: returns a direct answer
result = index.query("What is X?", agentic=True)
print(result.answer)     # LLM-generated answer
print(result.pages_str)  # source pages
```

**Node.js / TypeScript**

```typescript
// Standard: returns context + page ranges
const result = await index.query("What is X?");
console.log(result.context);

// Agentic: returns a direct answer
const agenticResult = await index.query("What is X?", { agentic: true });
console.log(agenticResult.answer);   // LLM-generated answer
console.log(agenticResult.pagesStr); // source pages
```

Swap LLM at query time

```python
# Build index with one LLM
index = TreeDex.from_file("doc.pdf", llm=gemini_llm)

# Query with a different one — same index, different brain
result = index.query("...", llm=groq_llm)
```

Save and load indexes

Indexes are saved as JSON. An index created in Python loads in Node.js and vice versa.

**Python**

```python
# Save
index.save("my_index.json")

# Load
index = TreeDex.load("my_index.json", llm=llm)
```

**Node.js / TypeScript**

```typescript
// Save
await index.save("my_index.json");

// Load
const index = await TreeDex.load("my_index.json", llm);
```

Supported Document Formats

| Format | Loader | Python Deps | Node.js Deps |
|---|---|---|---|
| PDF | PDFLoader | pymupdf | pdfjs-dist (included) |
| TXT / MD | TextLoader | None | None |
| HTML | HTMLLoader | None (stdlib) | htmlparser2 (optional, has fallback) |
| DOCX | DOCXLoader | python-docx | mammoth (optional) |

Use auto_loader(path) / autoLoader(path) for automatic format detection.
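Automatic detection of this kind typically comes down to a file-extension dispatch over the format table above. A minimal self-contained sketch (the dispatch dict and `auto_loader_name` helper are illustrative, not TreeDex's actual implementation):

```python
from pathlib import Path

# Sketch of extension-based loader selection, mirroring the
# format table above. Illustrative only, not TreeDex source.
LOADERS = {
    ".pdf": "PDFLoader",
    ".txt": "TextLoader",
    ".md": "TextLoader",
    ".html": "HTMLLoader",
    ".htm": "HTMLLoader",
    ".docx": "DOCXLoader",
}

def auto_loader_name(path: str) -> str:
    ext = Path(path).suffix.lower()
    if ext not in LOADERS:
        raise ValueError(f"Unsupported format: {ext}")
    return LOADERS[ext]

print(auto_loader_name("report.PDF"))  # PDFLoader
print(auto_loader_name("notes.md"))    # TextLoader
```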


API Reference

TreeDex

| Method | Python | Node.js |
|---|---|---|
| Build from file | TreeDex.from_file(path, llm) | await TreeDex.fromFile(path, llm) |
| Build from pages | TreeDex.from_pages(pages, llm) | await TreeDex.fromPages(pages, llm) |
| Create from tree | TreeDex.from_tree(tree, pages) | TreeDex.fromTree(tree, pages) |
| Query | index.query(question) | await index.query(question) |
| Agentic query | index.query(q, agentic=True) | await index.query(q, { agentic: true }) |
| Save | index.save(path) | await index.save(path) |
| Load | TreeDex.load(path, llm) | await TreeDex.load(path, llm) |
| Show tree | index.show_tree() | index.showTree() |
| Stats | index.stats() | index.stats() |
| Find large | index.find_large_sections() | index.findLargeSections() |

QueryResult

| Property | Python | Node.js | Description |
|---|---|---|---|
| Context | .context | .context | Concatenated text from relevant sections |
| Node IDs | .node_ids | .nodeIds | IDs of selected tree nodes |
| Page ranges | .page_ranges | .pageRanges | [(start, end), ...] page ranges |
| Pages string | .pages_str | .pagesStr | Human-readable: "pages 5-8, 12-15" |
| Reasoning | .reasoning | .reasoning | LLM's explanation for selection |
| Answer | .answer | .answer | LLM-generated answer (agentic mode only) |
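The human-readable pages string is just a rendering of the page-range pairs. A few-line sketch of that conversion (an assumption about the formatting logic, not the library's code):

```python
def pages_str(page_ranges):
    # Render [(start, end), ...] as the human-readable form used by
    # QueryResult, e.g. "pages 5-8, 12-15". A single-page range
    # collapses to one number. Illustrative sketch, not TreeDex code.
    parts = [f"{a}-{b}" if a != b else f"{a}" for a, b in page_ranges]
    return "pages " + ", ".join(parts)

print(pages_str([(5, 8), (12, 15)]))  # pages 5-8, 12-15
print(pages_str([(3, 3)]))            # pages 3
```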

Cross-language Index Compatibility

TreeDex uses the same JSON index format in both Python and Node.js. All field names use snake_case in the JSON:

```json
{
  "version": "1.0",
  "framework": "TreeDex",
  "tree": [
    { "structure": "1", "title": "...", "node_id": "0001", ... }
  ],
  "pages": [
    { "page_num": 0, "text": "...", "token_count": 123 }
  ]
}
```

Build an index with Python, query it from Node.js (or vice versa).
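Because the on-disk format is plain JSON, the cross-language contract is easy to see with nothing but the standard library. A minimal round-trip sketch (the title/text values below are placeholders shaped like the schema above):

```python
import json

# Minimal index following the snake_case schema shown above.
# Field values are placeholders, not real document content.
index = {
    "version": "1.0",
    "framework": "TreeDex",
    "tree": [{"structure": "1", "title": "Intro", "node_id": "0001"}],
    "pages": [{"page_num": 0, "text": "Hello", "token_count": 123}],
}

with open("my_index.json", "w") as f:
    json.dump(index, f, indent=2)

with open("my_index.json") as f:
    loaded = json.load(f)

# Any JSON-capable runtime (Python, Node.js, ...) reads the same file.
assert loaded == index
print(loaded["tree"][0]["node_id"])  # 0001
```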


Benchmarks

TreeDex vs Vector DB vs Naive Chunking


Real benchmark on the same document (NCERT Electromagnetic Waves, 14 pages, 10 queries). All three methods retrieve from the same content — only the indexing and retrieval approach differs. Auto-generated by CI on every push.

(Charts: TreeDex stats and benchmark results)

| Feature | TreeDex | Vector RAG | Naive Chunking |
|---|---|---|---|
| Page Attribution | Exact source pages | Approximate | None |
| Structure Preserved | Full tree hierarchy | None | None |
| Index Format | Human-readable JSON | Opaque vectors | Text chunks |
| Embedding Model | Not needed | Required | Not needed |
| Infrastructure | None (JSON file) | Vector DB required | None |
| Core Dependencies | 2 | 5-8+ | 2-5 |

Run your own benchmarks:

```bash
# Python
python benchmarks/run_benchmark.py

# Node.js
npx tsx benchmarks/node/run-benchmark.ts
```

Architecture



Project Structure

```
treedex/
├── treedex/           # Python package
│   ├── core.py
│   ├── llm_backends.py
│   ├── loaders.py
│   ├── pdf_parser.py
│   ├── tree_builder.py
│   ├── tree_utils.py
│   └── prompts.py
├── src/               # TypeScript source
│   ├── index.ts
│   ├── core.ts
│   ├── llm-backends.ts
│   ├── loaders.ts
│   ├── pdf-parser.ts
│   ├── tree-builder.ts
│   ├── tree-utils.ts
│   ├── prompts.ts
│   └── types.ts
├── tests/             # Python tests (pytest)
├── test/              # Node.js tests (vitest)
├── examples/          # Python examples
├── examples/node/     # Node.js examples
├── benchmarks/        # Python benchmarks
├── benchmarks/node/   # Node.js benchmarks
├── pyproject.toml     # Python package config
├── package.json       # npm package config
├── tsconfig.json      # TypeScript config
└── tsup.config.ts     # Build config (ESM + CJS)
```

Running Tests

**Python**

```bash
pip install -e ".[dev]"
pytest
pytest --cov=treedex
pytest tests/test_core.py -v
```

**Node.js**

```bash
npm install
npm test
npm run test:watch
npm run typecheck
```

Examples

Python

```bash
python examples/quickstart.py path/to/document.pdf
python examples/multi_provider.py
python examples/custom_llm.py
python examples/save_load.py path/to/document.pdf
```

Node.js

```bash
npx tsx examples/node/quickstart.ts path/to/document.pdf
npx tsx examples/node/multi-provider.ts
npx tsx examples/node/custom-llm.ts
npx tsx examples/node/save-load.ts path/to/document.pdf
```

Contributing

```bash
git clone https://github.com/mithun50/TreeDex.git
cd TreeDex

# Python development
pip install -e ".[dev]"
pytest

# Node.js development
npm install
npm run build
npm test
```

License

MIT License — Mithun Gowda B
