A powerful tool to index Markdown documentation from GitHub repositories and ask questions about it, with AI-powered answers and source citations.
- 📚 Index `.md` and `.mdx` files from any GitHub repository
- 🔍 Semantic search with vector embeddings
- 💬 AI-powered question answering with source citations
- 🎯 Multiple LLM providers (OpenAI, Anthropic)
- 🚀 Fast retrieval with ChromaDB
- 🌐 Web UI and CLI interface
- Clone this repository
- Install dependencies:

```bash
pip install -r requirements.txt
```

- Copy `.env.example` to `.env`:

```bash
cp .env.example .env
```

- Edit `.env` and add your API keys
Index a repository:

```bash
python -m deepwiki index <github_repo_url>
```

Example:

```bash
python -m deepwiki index https://github.com/anthropics/anthropic-sdk-python
```

Ask a question:

```bash
python -m deepwiki ask "How do I use streaming with the SDK?"
```

Start the web server:

```bash
python -m deepwiki serve
```

Then open http://localhost:8000 in your browser.

List indexed repositories:

```bash
python -m deepwiki list
```

Clear the index:

```bash
python -m deepwiki clear
```

- Crawling: Fetches all `.md` and `.mdx` files from the specified GitHub repository
- Chunking: Splits documents into manageable chunks with overlap
- Embedding: Generates vector embeddings using OpenAI or similar models
- Indexing: Stores embeddings in ChromaDB for fast retrieval
- Question Answering: Retrieves relevant chunks and uses LLM to generate answers with citations
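The chunking step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation; the function name and defaults are hypothetical, mirroring the `CHUNK_SIZE` and `CHUNK_OVERLAP` settings:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks, each overlapping the previous one.

    Hypothetical sketch of sliding-window chunking; defaults mirror the
    CHUNK_SIZE and CHUNK_OVERLAP settings in .env.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break  # last chunk reached the end of the text
        start = end - overlap  # step back so adjacent chunks share context
    return chunks
```

Each chunk repeats the last `overlap` characters of the previous one, so a sentence that straddles a chunk boundary still appears intact in at least one chunk — which keeps retrieval from missing answers that happen to fall on a boundary.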
```
deepwiki/
├── __init__.py
├── __main__.py   # CLI entry point
├── config.py     # Configuration management
├── crawler.py    # GitHub repository crawler
├── indexer.py    # Document chunking and indexing
├── retriever.py  # Search and retrieval
├── qa.py         # Question answering with citations
└── api.py        # FastAPI web server
```

Edit `.env` to customize:
- `LLM_PROVIDER`: Choose `openai` or `anthropic`
- `EMBEDDING_MODEL`: Embedding model to use
- `CHUNK_SIZE`: Size of text chunks for indexing
- `CHUNK_OVERLAP`: Overlap between chunks
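A `.env` might look like the following (the API-key variable names are illustrative assumptions; check `.env.example` for the exact names the project expects):

```shell
# Provider and model selection
LLM_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small

# Chunking parameters
CHUNK_SIZE=1000
CHUNK_OVERLAP=200

# API keys (names illustrative; see .env.example)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=...
GITHUB_TOKEN=...
```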
- Python 3.8+
- OpenAI API key or Anthropic API key
- GitHub token (optional, for higher rate limits)
MIT