A Python implementation of Recursive Language Models (RLM), built on DSPy, for handling unbounded context lengths.
Based on the paper by Alex Zhang and Omar Khattab (MIT, 2025)
Based on the source code by ysz
RLM enables language models to process extremely long contexts (100k+ tokens) by:
- Storing context as a Python variable instead of in the prompt
- Allowing the LM to recursively explore and partition the context
- Avoiding "context rot" (performance degradation on long contexts)
Instead of this:

```python
llm.complete(prompt="Summarize this", context=huge_document)  # Context rot!
```

RLM does this:

```python
rlm = RLM(model="gpt-5-mini")
result = rlm.completion(
    query="Summarize this",
    context=huge_document  # Stored as a variable, not in the prompt
)
```

The LM can then peek, search, and recursively process the context adaptively.
Note: This package is not yet published to PyPI. Install from source:
```bash
# Clone the repository
git clone https://github.com/codecrack3/recursive-llm.git
cd recursive-llm

# Install in editable mode
pip install -e .

# Or install with dev dependencies
pip install -e ".[dev]"
```

Future: Once published to PyPI, you'll be able to install with `pip install recursive-llm`.
- Python 3.9 or higher
- An API key for your chosen LLM provider (OpenAI, Anthropic, etc.)
- Or a local model setup (Ollama, llama.cpp, etc.)
```python
from rlm import RLM

# Initialize with any LLM (auto-selects the best backend)
rlm = RLM(model="gpt-4o-mini")

# Process a long context
result = rlm.completion(
    query="What are the main themes in this document?",
    context=long_document
)
print(result)
```

```python
from rlm import RLM

# RLM uses DSPy for LLM orchestration with an E2B cloud sandbox
rlm = RLM(
    model="gpt-4o-mini",
    sandbox='e2b'  # E2B cloud sandbox (or 'auto' to auto-detect)
)
result = rlm.completion(query, context)
```

Set your API key via environment variable or pass it directly:
```bash
export OPENAI_API_KEY="sk-..."  # or ANTHROPIC_API_KEY, etc.
```

Or pass it directly in code:
rlm = RLM(model="gpt-5-mini", api_key="sk-...")Works with 100+ LLM providers via DSPy and OpenRouter:
```python
# OpenAI
rlm = RLM(model="gpt-4o-mini")
rlm = RLM(model="gpt-4o")

# Anthropic
rlm = RLM(model="claude-sonnet-4")
rlm = RLM(model="claude-sonnet-4-20250514")

# OpenRouter (100+ models with a single API key)
rlm = RLM(model="openrouter/anthropic/claude-3.5-sonnet")
rlm = RLM(model="openrouter/openai/gpt-4o-mini")
rlm = RLM(model="openrouter/google/gemini-pro")
rlm = RLM(model="openrouter/meta-llama/llama-3.1-70b")

# Ollama (local)
rlm = RLM(model="ollama/llama3.2")
rlm = RLM(model="ollama/mistral")

# llama.cpp (local)
rlm = RLM(
    model="openai/local",
    api_base="http://localhost:8000/v1"
)

# Azure OpenAI
rlm = RLM(model="azure/gpt-4-deployment")

# And many more...
```

Use a cheaper model for recursive calls:
rlm = RLM( model="gpt-5", # Root LM (main decisions) recursive_model="gpt-5-mini" # Recursive calls (cheaper) )For better performance with parallel recursive calls:
```python
import asyncio

async def main():
    rlm = RLM(model="gpt-5-mini")
    result = await rlm.acompletion(query, context)
    print(result)

asyncio.run(main())
```

```python
rlm = RLM(
    model="gpt-5-mini",
    max_depth=5,        # Maximum recursion depth
    max_iterations=20,  # Maximum REPL iterations
    temperature=0.7,    # LLM parameters
    timeout=60
)
```

- Context is stored as a variable in a Python REPL environment
- Root LM gets only the query plus instructions
- LM can explore context using Python code:
```python
# Peek at the context
context[:1000]

# Search with regex
import re
re.findall(r'pattern', context)

# Recursive processing
recursive_llm("extract dates", context[1000:2000])
```
- Returns the final answer via a `FINAL(answer)` statement
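Concretely, a run might unfold like this hypothetical sequence of LM-generated REPL snippets; the regex pattern and slice offsets are purely illustrative, while `context`, `recursive_llm`, and `FINAL` are the REPL environment names described above:

```python
# Iteration 1: peek at the start of the context to learn its structure
context[:500]

# Iteration 2: locate the relevant sections
import re
[m.start() for m in re.finditer(r'Chapter \d+', context)]

# Iteration 3: delegate one slice to a recursive call
summary = recursive_llm("Summarize this chapter", context[12000:25000])

# Final iteration: emit the answer and stop the loop
FINAL(summary)
```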
Visualize recursive LLM calls with interactive NetworkX graphs:
```python
from rlm import RLM

# Enable graph tracking
rlm = RLM(
    model="gpt-4o-mini",
    enable_graph_tracking=True,
    graph_output_path="./rlm_graph.html"
)

result = rlm.completion(query="Analyze this", context=document)
# Graph automatically saved to ./rlm_graph.html
```

The interactive HTML visualization shows:
- Hierarchical structure: See the complete call tree
- Node details: Input/output for each recursive call
- REPL iterations: Code generated and executed at each step
- Performance metrics: Iterations and LLM calls per node
- Error tracking: Which nodes encountered issues
Programmatic access:
```python
import networkx as nx

# Get the graph object
graph = rlm.get_graph()
print(f"Total nodes: {graph.number_of_nodes()}")

# Analyze the graph structure
for node_id, node_data in graph.nodes(data=True):
    print(f"Depth {node_data['depth']}: {node_data['iterations']} iterations")

# Save to a different location
rlm.save_graph("./analysis/custom_graph.html")
```

Learn more: See docs/GRAPH_TRACKING.md for full documentation.
Track and inspect all LLM calls for debugging prompts and responses:
```python
# Enable history tracking
rlm = RLM(
    model="gpt-4o-mini",
    enable_history=True,                       # Enable LLM call history
    history_output_path="./logs/history.json"  # Auto-save to JSON (optional)
)

result = rlm.completion(query="Your query", context=document)

# Print a history summary (shows model, messages, outputs)
rlm.print_history(detailed=False)

# Print detailed history with full prompts/responses
rlm.print_history(detailed=True, max_length=2000)

# Get the raw history for programmatic access
history = rlm.get_history()
print(f"Total LLM calls: {len(history)}")

# Save history to JSON manually
rlm.save_history("./my_history.json", pretty=True)

# Clear history for a new run
rlm.clear_history()
```

Use cases:
- 🐛 Debug prompts: See exactly what's being sent to the LLM (shows full messages/inputs)
- 📊 Analyze responses: Inspect the raw outputs from each call
- 🔧 Optimize prompts: Iterate on prompt engineering
- 📈 Monitor usage: Track token usage and costs (exports to JSON)
- 💾 Export logs: Auto-save or manually export history as JSON
- 🔄 Combine with graph tracking: Visualize + inspect LLM calls
Tip: Combine `enable_history=True` and `enable_graph_tracking=True` for comprehensive debugging!
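A minimal sketch combining the options shown above (the document path is illustrative):

```python
from rlm import RLM

document = open("report.txt").read()  # any long document

rlm = RLM(
    model="gpt-4o-mini",
    enable_history=True,
    history_output_path="./logs/history.json",
    enable_graph_tracking=True,
    graph_output_path="./rlm_graph.html"
)

result = rlm.completion(query="Analyze this", context=document)
rlm.print_history(detailed=False)  # inspect every LLM call
# ./rlm_graph.html now visualizes the same run
```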
See the examples/ directory for complete working examples:
- `basic_usage.py` - Simple completion with OpenAI
- `dspy_usage.py` - DSPy backend with E2B sandbox
- `openrouter_usage.py` - OpenRouter multi-model access
- `e2b_usage.py` - E2B cloud sandbox features
- `ollama_local.py` - Using Ollama locally
- `two_models.py` - Cost optimization with two models
- `long_document.py` - Processing 50k+ token documents
- `data_extraction.py` - Extract structured data from text
- `multi_file.py` - Process multiple documents
- `custom_config.py` - Advanced configuration
- `graph_tracking.py` - NetworkX visualization of recursive calls
Run an example:
```bash
# Set your API key first
export OPENAI_API_KEY="sk-..."

# Run an example
python examples/basic_usage.py
```

On the OOLONG benchmark (132k tokens):
- GPT-5: baseline
- RLM(GPT-5-Mini): 33% better than GPT-5 at similar cost
Tested with GPT-5-Mini on structured data queries (counting, filtering) across 5 different test cases:
60k token contexts:
- RLM: 80% accurate (4/5 correct)
- Direct OpenAI: 0% accurate (0/5 correct, all returned approximations)
RLM wins on accuracy: both complete the requests, but only RLM returns correct answers.
150k+ token contexts:
- Direct OpenAI: Fails (rate limit errors)
- RLM: Works (processes 1M+ tokens successfully)
Token efficiency: RLM uses ~2-3k tokens per query vs 95k+ for the direct approach, since the context is stored as a variable instead of being sent in prompts.
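A rough way to see the gap, assuming the `tiktoken` package for counting (not a project dependency; the file name and counts are illustrative):

```python
import tiktoken  # assumption: tiktoken used only to illustrate token counts

enc = tiktoken.get_encoding("cl100k_base")
context = open("big_document.txt").read()

# The direct approach must send the whole document in the prompt
print(len(enc.encode(context)))           # e.g. 95,000+ tokens

# RLM sends only the query plus REPL instructions each iteration;
# the document stays behind as a Python variable in the sandbox
print(len(enc.encode("Summarize this")))  # a handful of tokens
```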
```bash
# Clone the repository
git clone https://github.com/codecrack3/recursive-llm.git
cd recursive-llm

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run tests with coverage
pytest tests/ -v --cov=src/rlm --cov-report=term-missing

# Type checking
mypy src/rlm

# Linting
ruff check src/rlm

# Format code
black src/rlm tests examples
```

RLM uses DSPy for LLM orchestration with flexible sandbox options:
```
RLM (DSPy Backend)
├── Custom RLMModule (DSPy module for REPL pattern)
├── E2B Sandbox (cloud code execution)
└── RestrictedPython Fallback (local execution)
```

Key Features:
- Programmatic LLM orchestration via DSPy
- Better prompt optimization and composability
- Supports E2B cloud sandboxes for enhanced security
- Custom `RLMModule` optimized for the recursive REPL pattern
- RestrictedPython fallback for local execution
```python
# Auto-selects E2B if an API key is set, otherwise RestrictedPython
rlm = RLM(model="gpt-4o-mini")

# Use the E2B cloud sandbox (requires E2B_API_KEY)
rlm = RLM(model="gpt-4o-mini", sandbox='e2b')

# Use RestrictedPython (no API key needed, runs locally)
rlm = RLM(model="gpt-4o-mini", sandbox='restricted')
```

Configure sandbox preferences via environment variables:
```bash
# Sandbox selection
export RLM_SANDBOX=e2b  # or 'restricted', 'auto'

# E2B API key (get one from https://e2b.dev)
export E2B_API_KEY=your-key-here
```

- DSPy Only: Removed the legacy LiteLLM backend for a simpler codebase
- Cleaner API: No more backend selection - DSPy is the only backend
- Better Maintainability: Reduced complexity and dependencies
- Programmatic LLM Orchestration: Use DSPy for better prompt engineering
- Custom RLMModule: Purpose-built DSPy module for recursive REPL pattern
- Automatic Optimization: DSPy's optimization capabilities (optional)
- Cloud Execution: Secure sandboxed code execution in isolated containers
- Enhanced Security: Better isolation than local RestrictedPython
- Package Installation: Install Python packages on-the-fly if needed
- Auto-Fallback: Gracefully falls back to RestrictedPython if E2B unavailable
The LiteLLM backend was removed in v0.2.0. If you were using the `backend` parameter, simply remove it:
Before (v0.1.0):
from rlm import RLM rlm = RLM(model="gpt-4o-mini", backend='dspy') result = rlm.completion(query, context)After (v0.2.0):
from rlm import RLM rlm = RLM(model="gpt-4o-mini") # backend parameter removed result = rlm.completion(query, context)Breaking Changes:
- Removed the `backend` parameter (only DSPy is supported now)
- Removed the `RLMLiteLLM` class
- Removed the `Backend` type from exports
- Removed the `litellm` dependency
- Get an API key from https://e2b.dev
- Add it to your `.env` file: `E2B_API_KEY=your-key-here`
- RLM will automatically use E2B when available (see the sketch below)
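A minimal sketch, assuming the `.env` file is loaded with the `python-dotenv` package (not a dependency of this project; the document path is illustrative):

```python
from dotenv import load_dotenv  # assumption: python-dotenv is installed
from rlm import RLM

load_dotenv()  # reads E2B_API_KEY (and your LLM key) from .env

document = open("report.txt").read()  # any long document

# With E2B_API_KEY set, auto-detection selects the E2B sandbox
rlm = RLM(model="gpt-4o-mini")
result = rlm.completion(query="Summarize this", context=document)
print(result)
```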
OpenRouter provides access to 100+ models through a single API key:
- Sign up at https://openrouter.ai
- Get your API key at https://openrouter.ai/keys
- Add it to your `.env` file: `OPENROUTER_API_KEY=your-key-here`
Usage:
```python
# Anthropic Claude via OpenRouter
rlm = RLM(model="openrouter/anthropic/claude-3.5-sonnet")

# OpenAI GPT via OpenRouter
rlm = RLM(model="openrouter/openai/gpt-4o-mini")

# Google Gemini via OpenRouter
rlm = RLM(model="openrouter/google/gemini-pro")

# Meta Llama via OpenRouter
rlm = RLM(model="openrouter/meta-llama/llama-3.1-70b-instruct")
```

Benefits:
- ✅ Access 100+ models with single API key
- ✅ No rate limits on most models
- ✅ Competitive pricing
- ✅ Automatic fallback if model unavailable
- ✅ Easy model switching for testing
Cost optimization:
```python
# Use a premium model for the root, an economical one for recursion
rlm = RLM(
    model="openrouter/anthropic/claude-3.5-sonnet",
    recursive_model="openrouter/anthropic/claude-3-haiku"
)
```

See the full model list: https://openrouter.ai/models
- REPL execution is sequential (no parallel code execution yet)
- No prefix caching (future enhancement)
- Recursion depth is limited (configurable via `max_depth`)
- No streaming support yet
- E2B requires API key for cloud sandboxes (free tier available)
- Increase the `max_iterations` parameter (see the example below)
- Simplify your query
- Check if the model is getting stuck in a loop
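For example (the value is illustrative; the configuration example above used 20):

```python
rlm = RLM(model="gpt-4o-mini", max_iterations=40)  # allow more REPL iterations
```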
- Set the appropriate environment variable (e.g., `OPENAI_API_KEY`)
- Or pass the `api_key` parameter to the RLM constructor
- Check model name format for your provider
- See DSPy docs: https://dspy-docs.vercel.app/
- Make sure Ollama is running: `ollama serve`
- Pull a model first: `ollama pull llama3.2`
- Use the model format: `ollama/model-name`
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new features
- Ensure all tests pass (`pytest tests/`)
- Follow the code style (use `black` and `ruff`)
- Submit a pull request
This implementation is based on the RLM paper by Alex Zhang and Omar Khattab.
To cite this implementation:
```bibtex
@software{rlm_python_dspy,
  title  = {recursive-llm-dspy: Using Python and DSPy's Recursive Language Model implementation to handle unbounded context lengths},
  author = {codecrack3},
  year   = {2025},
  url    = {https://github.com/codecrack3/recursive-llm}
}

@software{rlm_python,
  title  = {recursive-llm: Python Implementation of Recursive Language Models},
  author = {ysz},
  year   = {2025},
  url    = {https://github.com/ysz/recursive-llm}
}

@software{rlm_minimal,
  title  = {Recursive Language Models (minimal version)},
  author = {alexzhang13},
  year   = {2025},
  url    = {https://github.com/alexzhang13/rlm}
}
```

To cite the original paper:
```bibtex
@misc{zhang2025rlm,
  title  = {Recursive Language Models},
  author = {Zhang, Alex and Khattab, Omar},
  year   = {2025},
  month  = {October},
  url    = {https://alexzhang13.github.io/blog/2025/rlm/}
}
```

MIT License - see the LICENSE file for details.
Based on the Recursive Language Models paper by Alex Zhang and Omar Khattab from MIT CSAIL.
Built using:
- DSPy for LLM orchestration
- E2B for cloud code execution
- RestrictedPython for safe local code execution
- Paper: https://alexzhang13.github.io/blog/2025/rlm/
- DSPy Docs: https://dspy-docs.vercel.app/
- E2B Docs: https://e2b.dev/docs
- Issues: https://github.com/codecrack3/recursive-llm/issues
