Modular, extensible local-first code indexer designed to enhance Claude Code and other LLMs with deep code understanding capabilities. Built on the Model Context Protocol (MCP) for seamless integration with AI assistants.
**Version:** 1.0.0 (MVP Release)
**Core Features:** Stable - local indexing, symbol/text search, 48-language support
**Optional Features:** Semantic search (requires Voyage AI), index sync (beta)
**Performance:** Sub-100ms queries, <10s indexing for cached repositories
New to Code-Index-MCP? Check out our Getting Started Guide for a quick walkthrough.
- **Local-First Architecture**: All indexing happens locally for speed and privacy
- **Local Index Storage**: All indexes stored at `.indexes/` (relative to the MCP server)
- **Plugin-Based Design**: Easily extensible with language-specific plugins
- **48-Language Support**: Complete tree-sitter integration with semantic search
- **Real-Time Updates**: File system monitoring for instant index updates
- **Semantic Search**: AI-powered code search with Voyage AI embeddings
- **Rich Code Intelligence**: Symbol resolution, type inference, dependency tracking
- **Enhanced Performance**: Sub-100ms queries with timeout protection and BM25 bypass
- **Git Synchronization**: Automatic index updates tracking repository changes
- **Portable Index Management**: Zero-cost index sharing via GitHub Artifacts
- **Automatic Index Sync**: Pull indexes on clone, push on changes
- **Smart Result Reranking**: Multi-strategy reranking for improved relevance
- **Security-Aware Export**: Automatic filtering of sensitive files from shared indexes
- **Hybrid Search**: BM25 + semantic search with configurable fusion
- **Index Everything Locally**: Search `.env` files and secrets on your own machine
- **Smart Filtering on Share**: `.gitignore` and `.mcp-index-ignore` patterns applied only during export
- **Multi-Language Indexing**: Index entire repositories with mixed languages
The Code-Index-MCP follows a modular, plugin-based architecture designed for extensibility and performance:
#### System Context (Level 1)
- Developer interacts with Claude Code or other LLMs
- MCP protocol provides standardized tool interface
- Local-first processing with optional cloud features
- Performance SLAs: <100ms symbol lookup, <500ms search
#### Container Architecture (Level 2)
```
┌─────────────────┐     ┌──────────────┐     ┌─────────────┐
│   API Gateway   │────▶│  Dispatcher  │────▶│   Plugins   │
│    (FastAPI)    │     │              │     │ (Language)  │
└─────────────────┘     └──────────────┘     └─────────────┘
         │                      │                    │
         ▼                      ▼                    ▼
┌─────────────────┐     ┌──────────────┐     ┌─────────────┐
│   Local Index   │     │ File Watcher │     │  Embedding  │
│  (SQLite+FTS5)  │     │  (Watchdog)  │     │   Service   │
└─────────────────┘     └──────────────┘     └─────────────┘
```
#### Component Details (Level 3)
- Gateway Controller: RESTful API endpoints
- Dispatcher Core: Plugin routing and lifecycle
- Plugin Base: Standard interface for all plugins
- Language Plugins: Specialized parsers and analyzers
- Index Manager: SQLite with FTS5 for fast searches
- Watcher Service: Real-time file monitoring
The project follows a clean, organized structure. See docs/PROJECT_STRUCTURE.md for detailed layout.
Key directories:
- `mcp_server/` - Core MCP server implementation
- `scripts/` - Development and utility scripts
- `tests/` - Comprehensive test suite with fixtures
- `docs/` - Documentation and guides
- `architecture/` - System design and diagrams
- `docker/` - Docker configurations and compose files
- `data/` - Database files and indexes
- `logs/` - Application and test logs
- `reports/` - Generated performance reports and analysis
- `analysis_archive/` - Historical analysis and archived research
Production-Ready Features:
- Dynamic Plugin Loading: Languages are loaded on-demand for optimal performance
- Tree-sitter Parsing: Accurate AST-based symbol extraction with language-specific queries
- Query Caching: Improved performance with cached tree-sitter queries
- Semantic Search: Optional AI-powered code search (when Qdrant is available)
- Cross-Language Search: Find symbols and patterns across all supported languages
Language Categories:
| Category | Languages | Features |
|---|---|---|
| Dedicated Plugins | Python, JavaScript, TypeScript, C, C++, Dart, HTML/CSS | Enhanced analysis, framework support |
| Systems Languages | Go, Rust, C, C++, Zig, Nim, D, V | Memory safety, performance analysis |
| JVM Languages | Java, Kotlin, Scala, Clojure | Package analysis, build tool integration |
| Web Technologies | JavaScript, TypeScript, HTML, CSS, SCSS, PHP | Framework detection, bundler support |
| Scripting Languages | Python, Ruby, Perl, Lua, R, Julia | Dynamic typing, REPL integration |
| Functional Languages | Haskell, Elixir, Erlang, F#, OCaml | Pattern matching, type inference |
| Mobile Development | Swift, Kotlin, Dart, Objective-C | Platform-specific APIs |
| Infrastructure | Dockerfile, Bash, PowerShell, Makefile, CMake | Build automation, CI/CD |
| Data Formats | JSON, YAML, TOML, XML, GraphQL, SQL | Schema validation, query optimization |
| Documentation | Markdown, LaTeX, reStructuredText | Cross-references, formatting |
Implementation Status: Production-Ready - All languages supported via the enhanced dispatcher with:
- ✅ Dynamic plugin loading (lazy initialization)
- ✅ Robust error handling and fallback mechanisms
- ✅ Path resolution for complex project structures
- ✅ Graceful degradation when external services are unavailable
```bash
# Auto-configures MCP for your environment
./scripts/setup-mcp-json.sh

# Or interactive mode
./scripts/setup-mcp-json.sh --interactive
```

This automatically detects your environment and creates the appropriate `.mcp.json` configuration.
**Quick Install (Docker):**

```bash
# Install MCP Index with Docker
curl -sSL https://raw.githubusercontent.com/ViperJuice/Code-Index-MCP/main/scripts/install-mcp-docker.sh | bash

# Index your current directory
docker run -it -v $(pwd):/workspace ghcr.io/code-index-mcp/mcp-index:minimal
```

**With Semantic Search:**

```bash
# Set your API key (get one at https://voyageai.com)
export VOYAGE_AI_API_KEY=your-key

# Run with semantic search
docker run -it -v $(pwd):/workspace -e VOYAGE_AI_API_KEY ghcr.io/code-index-mcp/mcp-index:standard
```

**Windows:**

```powershell
# PowerShell
.\scripts\setup-mcp-json.ps1

# Or manually with Docker Desktop
docker run -it -v ${PWD}:/workspace ghcr.io/code-index-mcp/mcp-index:minimal
```

**macOS:**

```bash
# Install Docker Desktop or use Homebrew
brew install --cask docker

# Run setup
./scripts/setup-mcp-json.sh
```

**Linux:**

```bash
# Install Docker (no Desktop needed)
curl -fsSL https://get.docker.com | sh

# Run setup
./scripts/setup-mcp-json.sh
```

**WSL:**

```bash
# With Docker Desktop integration
./scripts/setup-mcp-json.sh  # Auto-detects WSL+Docker

# Without Docker Desktop
cp .mcp.json.templates/native.json .mcp.json
pip install -e .
```

**Dev Containers:**

```bash
# For VS Code/Cursor dev containers
# Option 1: Use native Python (already in container)
cp .mcp.json.templates/native.json .mcp.json

# Option 2: Use Docker sidecar (avoids dependency conflicts)
docker-compose -f docker/compose/development/docker-compose.mcp-sidecar.yml up -d
cp .mcp.json.templates/docker-sidecar.json .mcp.json
```

The setup script creates the appropriate `.mcp.json` for your environment. Manual examples:
**Native configuration:**

```json
{
  "mcpServers": {
    "code-index-native": {
      "command": "python",
      "args": ["scripts/cli/mcp_server_cli.py"],
      "cwd": "${workspace}"
    }
  }
}
```

**Docker configuration:**

```json
{
  "mcpServers": {
    "code-index-docker": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-v", "${workspace}:/workspace",
        "ghcr.io/code-index-mcp/mcp-index:minimal"
      ]
    }
  }
}
```

| Feature | Minimal | Standard | Full | Cost |
|---|---|---|---|---|
| Code Search | ✅ | ✅ | ✅ | Free |
| 48 Languages | ✅ | ✅ | ✅ | Free |
| Semantic Search | ❌ | ✅ | ✅ | ~$0.05/1M tokens |
| GitHub Sync | ❌ | ✅ | ✅ | Free |
| Monitoring | ❌ | ❌ | ✅ | Free |
- Python 3.12+
- Git
```bash
# Install the package
pip install index-it-mcp

# Or install with dev tools for testing
pip install index-it-mcp[dev]
```

```bash
# Clone the repository
git clone https://github.com/ViperJuice/Code-Index-MCP.git
cd Code-Index-MCP

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in editable mode
pip install -e .
```

```bash
# Authenticate GitHub artifact access once
gh auth login

# Check repo/artifact readiness before starting work
mcp-index preflight

# Pull the latest published index baseline for this repo
mcp-index artifact pull --latest

# Reconcile only your local drift after restore
mcp-index artifact sync

# The restored files live locally for MCP runtime use:
# - code_index.db
# - .index_metadata.json
# - vector_index.qdrant/
```

```bash
# Check index status
mcp-index index status

# Start the API server
mcp-index serve

# Or choose a custom port explicitly
mcp-index serve --port 9123

# Test the API
curl http://localhost:8000/status
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{"query": "def parse"}'
```

Create a `.env` file for configuration:
```bash
# Semantic profile setup
VOYAGE_AI_API_KEY=your_api_key_here
# Use 127.0.0.1 for local inference, or a Tailscale/SSH tunnel IP for remote GPUs
OPENAI_API_BASE=http://127.0.0.1:8001/v1
QDRANT_PATH=vector_index.qdrant

# Server settings
MCP_SERVER_HOST=0.0.0.0
MCP_SERVER_PORT=8000
MCP_LOG_LEVEL=INFO

# Workspace settings
MCP_WORKSPACE_ROOT=.
MCP_MAX_FILE_SIZE=10485760  # 10MB

# GitHub Artifact Sync (privacy settings)
MCP_ARTIFACT_SYNC=false  # Set to true to enable
AUTO_UPLOAD=false        # Auto-upload on changes
AUTO_DOWNLOAD=true       # Auto-download on clone
```

Published artifacts now carry the full lexical baseline plus two semantic profiles:

- `commercial_high` using `voyage-code-3`
- `oss_high` using `Qwen/Qwen3-Embedding-8B`
Those profiles are stored in separate Qdrant collections inside the artifact so consumers can pull one baseline and use either profile locally.
**Pro Tip: Remote Inference for the Open-Source Profile.** If your local machine lacks the GPU power to run the `oss_high` embedding model (e.g., via vLLM or Ollama), you can run inference on a remote machine and point the MCP server at it:
- **Tailscale/VPN**: Set `OPENAI_API_BASE=http://<tailnet-ip>:8001/v1`
- **SSH Tunnel**: Run `ssh -L 8001:localhost:8001 user@remote-gpu-machine`, and the default `127.0.0.1:8001` configuration will tunnel directly to your inference server.
The generated index files are not meant to live in git history. The repo tracks the code, workflow, and configuration needed to build/publish them; GitHub artifacts distribute the actual runtime baseline that MCP restores locally.
For open source users, the recommended multi-repo model is local-first on one machine:
```bash
# Register each local repository once
mcp-index repository register /path/to/repo-a
mcp-index repository register /path/to/repo-b

# Inspect repository-level readiness details
mcp-index repository list -v

# Inspect all registered repositories and their local artifact/runtime readiness
mcp-index artifact workspace-status

# Refresh readiness after restoring or rebuilding local indexes
mcp-index artifact reconcile-workspace

# Prepare per-repo local artifact payloads without requiring remote publication
mcp-index artifact publish-workspace
```

Recommended pattern:

- register each repo checkout once with `mcp-index repository register <path>`
- keep each repo checkout self-contained
- restore or rebuild local runtime files per repo as needed
- use `mcp-index repository list -v` and workspace status/reconcile to see which repos are ready or stale
- only use GitHub artifact publication when you actually want to share a repo baseline
Control how your code index is shared:
```jsonc
// .mcp-index.json
{
  "github_artifacts": {
    "enabled": false,        // Disable sync entirely
    "auto_upload": false,    // Manual upload only
    "auto_download": true,   // Still get team indexes
    "exclude_patterns": [    // Additional exclusions
      "internal/*",
      "proprietary/*"
    ]
  }
}
```

Privacy Features:
- Indexes filtered by .gitignore automatically
- Additional patterns via .mcp-index-ignore
- Audit logs show what was excluded
- Sync disabled by default in Docker minimal version
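The export-time filtering described above is essentially glob-pattern matching against file paths. A minimal sketch of how such a filter could work (the pattern list and helper name are illustrative, not the project's actual implementation):

```python
import fnmatch

# Hypothetical exclusion patterns mirroring what a secure export might filter
SENSITIVE_PATTERNS = ["*.env", "*.key", "*.pem", "secrets/*", ".env*"]

def is_excluded(path: str, patterns=SENSITIVE_PATTERNS) -> bool:
    """Return True if a file path matches any exclusion pattern,
    checking both the full path and the bare file name."""
    name = path.split("/")[-1]
    return any(
        fnmatch.fnmatch(path, p) or fnmatch.fnmatch(name, p)
        for p in patterns
    )
```

Patterns from `.gitignore` and `.mcp-index-ignore` would be merged into the same list before export.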
The system includes multiple reranking strategies to improve search relevance:
```python
# Configure reranking in your searches
from mcp_server.indexer.reranker import RerankConfig, TFIDFReranker

config = RerankConfig(
    enabled=True,
    reranker=TFIDFReranker(),  # Or CohereReranker(), CrossEncoderReranker()
    top_k=20
)

# Search with reranking
results = await search_engine.search(query, rerank_config=config)
```

Available Rerankers:
- TF-IDF: Fast, local reranking using term frequency
- Cohere: Cloud-based neural reranking (requires API key)
- Cross-Encoder: Local transformer-based reranking
- Hybrid: Combines multiple rerankers with fallback
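To illustrate the TF-IDF strategy, here is a self-contained reranker sketch. This is not the project's `TFIDFReranker`, just a rough picture of the scoring it performs: candidate documents are reordered by term-frequency-weighted overlap with the query, scaled by inverse document frequency over the candidate set.

```python
import math
from collections import Counter

def tfidf_rerank(query: str, documents: list[str]) -> list[int]:
    """Return document indices sorted by TF-IDF relevance to the query."""
    docs = [d.lower().split() for d in documents]
    terms = set(query.lower().split())
    n = len(docs)
    # Smoothed inverse document frequency over the candidate set
    idf = {t: math.log((1 + n) / (1 + sum(t in d for d in docs))) + 1 for t in terms}

    def score(doc: list[str]) -> float:
        if not doc:
            return 0.0
        tf = Counter(doc)
        return sum((tf[t] / len(doc)) * idf[t] for t in terms if t in tf)

    scores = [score(d) for d in docs]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)
```

Because it needs no model weights or network calls, this style of reranking stays fast enough to run on every query.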
Prevent accidental sharing of sensitive files:
```bash
# Analyze current index for security issues
python scripts/utilities/analyze_gitignore_security.py

# Create secure index export (filters gitignored files)
python scripts/utilities/secure_index_export.py

# The secure export will:
# - Exclude all gitignored files
# - Remove sensitive patterns (*.env, *.key, etc.)
# - Create audit logs of excluded files
```

Combines traditional full-text search with semantic search:
```bash
# The system automatically uses hybrid search when available
# Configure weights in settings:
HYBRID_SEARCH_BM25_WEIGHT=0.3
HYBRID_SEARCH_SEMANTIC_WEIGHT=0.5
HYBRID_SEARCH_FUZZY_WEIGHT=0.2
```

The enhanced dispatcher includes timeout protection and automatic fallback:
```python
from mcp_server.dispatcher.dispatcher_enhanced import EnhancedDispatcher
from mcp_server.storage.sqlite_store import SQLiteStore

store = SQLiteStore(".indexes/YOUR_REPO_ID/current.db")
dispatcher = EnhancedDispatcher(
    sqlite_store=store,
    semantic_search_enabled=True,  # Enable if Qdrant available
    lazy_load=True,                # Load plugins on-demand
    use_plugin_factory=True        # Use dynamic plugin loading
)

# Search with automatic optimization
results = list(dispatcher.search("your query", limit=10))
```

For maximum performance with BM25-only search:
```python
from mcp_server.dispatcher.simple_dispatcher import create_simple_dispatcher

# Ultra-fast BM25 search without plugin overhead
dispatcher = create_simple_dispatcher(".indexes/YOUR_REPO_ID/current.db")
results = list(dispatcher.search("your query", limit=10))
```

Configure dispatcher behavior via environment variables:
```bash
# Dispatcher settings
MCP_DISPATCHER_TIMEOUT=5          # Plugin loading timeout (seconds)
MCP_USE_SIMPLE_DISPATCHER=false   # Use simple dispatcher
MCP_PLUGIN_LAZY_LOAD=true         # Load plugins on-demand

# Performance tuning
MCP_BM25_BYPASS_ENABLED=true      # Enable direct BM25 bypass
MCP_MAX_PLUGIN_MEMORY=1024        # Max memory for plugins (MB)
```

All indexes are now stored centrally at `.indexes/` (relative to the MCP project) for better organization and to prevent accidental commits:
```
.indexes/
├── {repo_hash}/                      # Unique hash for each repository
│   ├── main_abc123.db                # Index for main branch at commit abc123
│   ├── main_abc123.metadata.json
│   └── current.db -> main_abc123.db  # Symlink to active index
└── qdrant/                           # Semantic search embeddings
    └── main.qdrant/                  # Centralized Qdrant database
```

Benefits:
- Indexes never accidentally committed to git
- Reusable across multiple clones of same repository
- Clear separation between code and indexes
- Automatic discovery based on git remote
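The layout above keys each repository's indexes by a hash, with discovery based on the git remote. The project's actual hashing scheme is not specified here, so the following is only a plausible sketch: normalize the remote URL so clone-time variations collapse to one identity, then derive a short stable digest.

```python
import hashlib
from pathlib import Path

def index_dir_for_remote(remote_url: str, root: str = ".indexes") -> Path:
    """Derive a stable per-repository index directory from a git remote URL.
    The normalization rules and 12-character truncation are illustrative
    assumptions, not the project's actual scheme."""
    normalized = remote_url.strip().rstrip("/").removesuffix(".git").lower()
    repo_hash = hashlib.sha256(normalized.encode()).hexdigest()[:12]
    return Path(root) / repo_hash
```

Any clone of the same repository then resolves to the same directory, which is what makes indexes reusable across checkouts.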
Migration: For existing repositories with local indexes:
```bash
python scripts/move_indexes_to_central.py
```

This project uses GitHub Actions Artifacts for efficient index sharing, so most users start from a published index baseline instead of rebuilding locally.
```bash
# First time setup - pull latest indexes
mcp-index artifact pull --latest

# After pull, reconcile only your branch/worktree drift
mcp-index artifact sync

# Share your indexes with the team
mcp-index artifact push

# Check sync status
mcp-index artifact sync

# Optional: Install git hooks for automatic sync
mcp-index hooks install
# Now indexes upload automatically on git push
# and download automatically on git pull
```

Enable portable index management in any repository with zero GitHub compute costs:
```bash
# One-line install
curl -sSL https://raw.githubusercontent.com/ViperJuice/Code-Index-MCP/main/scripts/install-mcp.sh | bash

# Or via npm
npm install -g mcp-index-kit
mcp-index init
```

- **Zero-Cost Architecture:**
  - All indexing happens on developer machines
  - Indexes stored as GitHub Artifacts (free for public repos)
  - Automatic download on clone, upload on push
  - No GitHub Actions compute required

- **Portable Design:**
  - Single command setup for any repository
  - Auto-detected by MCP servers and tools
  - Works with all 48 supported languages
  - Enable/disable per repository

- **Usage:**

  ```bash
  # Initialize in your repo
  cd your-repo
  mcp-index init

  # Build index locally
  mcp-index build

  # Push to GitHub Artifacts
  mcp-index push

  # Pull latest index
  mcp-index pull

  # Auto sync
  mcp-index sync
  ```
To enable semantic search capabilities, you need a Voyage AI API key. Get one from https://www.voyageai.com/.
Method 1: Claude Code Configuration (Recommended)
Create or edit .mcp.json in your project root:
```json
{
  "mcpServers": {
    "code-index-mcp": {
      "command": "uvicorn",
      "args": ["mcp_server.gateway:app", "--host", "0.0.0.0", "--port", "8000"],
      "env": {
        "VOYAGE_AI_API_KEY": "your-voyage-ai-api-key-here",
        "SEMANTIC_SEARCH_ENABLED": "true"
      }
    }
  }
}
```

Method 2: Claude Code CLI
```bash
claude mcp add code-index-mcp -e VOYAGE_AI_API_KEY=your_key -e SEMANTIC_SEARCH_ENABLED=true -- uvicorn mcp_server.gateway:app
```

Method 3: Environment Variables
```bash
export VOYAGE_AI_API_KEY=your_key
export SEMANTIC_SEARCH_ENABLED=true
```

Method 4: .env File
Create a .env file in your project root:
```bash
VOYAGE_AI_API_KEY=your_key
SEMANTIC_SEARCH_ENABLED=true
```

Check Configuration
Verify your semantic search setup:
```bash
mcp-index index check-semantic
```

Edit `.mcp-index.json` in your repository:
```json
{
  "enabled": true,
  "auto_download": true,
  "artifact_retention_days": 30,
  "github_artifacts": {
    "enabled": true,
    "max_size_mb": 100
  }
}
```

See mcp-index-kit for full documentation.
```bash
mcp-index artifact info 12345
```
#### Index Management

```bash
# Check index status
mcp-index index status

# Check compatibility
mcp-index index check-compatibility

# Rebuild indexes locally only if artifact sync cannot catch up
mcp-index index rebuild

# Create backup
mcp-index index backup my_backup

# Restore from backup
mcp-index index restore my_backup
```

- Pull Requests: Validates developer-provided indexes (no rebuilding)
- Merges to Main: Promotes validated indexes to artifacts
- Cost-Efficient: Uses free GitHub Actions Artifacts storage
- Auto-Cleanup: Old artifacts cleaned up after 30 days
- GitHub Actions Artifacts: FREE for public repos, included in private repo quotas
- Retention: 7 days for PR artifacts, 30 days for main branch
- Size Limits: 500MB per artifact (compressed)
- Automatic Compression: ~70% size reduction with tar.gz
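The tar.gz compression step is standard-library territory; a sketch of how an index payload could be bundled before upload (the helper name is illustrative, not the project's actual packer):

```python
import tarfile
from pathlib import Path

def pack_index(index_dir: str, out_path: str) -> int:
    """Bundle every file under index_dir into a gzip-compressed tarball
    and return the archive size in bytes."""
    with tarfile.open(out_path, "w:gz") as tar:
        for path in Path(index_dir).rglob("*"):
            if path.is_file():
                # Store paths relative to the index root so restore is portable
                tar.add(path, arcname=path.relative_to(index_dir))
    return Path(out_path).stat().st_size
```

SQLite index files are highly repetitive, which is why gzip routinely reaches the ~70% reduction quoted above.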
1. **Clone Repository**

   ```bash
   git clone https://github.com/yourusername/Code-Index-MCP.git
   cd Code-Index-MCP
   ```

2. **Get Latest Indexes**

   ```bash
   gh auth login
   mcp-index artifact pull --latest
   ```

   - This downloads the current full GitHub artifact snapshot.
   - `mcp-index artifact sync` then reconciles only your local branch/worktree drift when incremental catch-up is appropriate.

3. **Make Your Changes**

   - Edit code as normal
   - Indexes update automatically via file watcher

4. **Share Updates**

   ```bash
   # Your indexes are already updated locally
   mcp-index artifact push
   ```
The system tracks embedding model versions to ensure compatibility:
- Current model: `voyage-code-3` (1024 dimensions)
- Distance metric: cosine similarity
- Auto-detection: the system checks compatibility before download

Multi-profile semantic config can be provided in either:

- `SEMANTIC_PROFILES_JSON` (environment variable), or
- `code-index-mcp.profiles.yaml` (repository root).
- GitHub artifact pulls are full snapshot downloads, not partial remote patch fetches.
- The current compressed artifact is modest enough that full downloads stay simpler than a remote delta protocol.
- Efficiency comes from local incremental indexing after restore:
  - pull the latest full artifact
  - compare the restored artifact commit to local `HEAD`
  - let the watcher or local incremental reindexing reconcile added, modified, deleted, and renamed files
- Branch-specific remote artifacts are optional. The default strategy is to use the latest `main` artifact as the base and reconcile branch drift locally.
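The reconcile step above amounts to classifying the output of `git diff --name-status <artifact_commit> HEAD` into buckets the indexer can replay. A sketch of that classification (the helper name is illustrative):

```python
def parse_drift(diff_output: str) -> dict:
    """Classify `git diff --name-status` output into the buckets an
    incremental reindex needs: added, modified, deleted, renamed."""
    drift = {"added": [], "modified": [], "deleted": [], "renamed": []}
    for line in diff_output.splitlines():
        if not line.strip():
            continue
        parts = line.split("\t")
        status = parts[0]
        if status.startswith("R"):          # e.g. "R100\told.py\tnew.py"
            drift["renamed"].append((parts[1], parts[2]))
        elif status == "A":
            drift["added"].append(parts[1])
        elif status == "M":
            drift["modified"].append(parts[1])
        elif status == "D":
            drift["deleted"].append(parts[1])
    return drift
```

Added and modified files get (re)indexed, deleted files are purged from the index, and renames are handled as a delete plus an add under the new path.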
Run onboarding with automatic local Qdrant startup:
```bash
mcp-index setup semantic
```

Settings precedence (highest to lowest):

1. CLI flags (for a single command run)
2. Environment variables / `.env`
3. `code-index-mcp.profiles.yaml`
4. `SEMANTIC_PROFILES_JSON`
5. Built-in defaults
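A layered precedence chain like this is typically resolved by walking the layers in priority order and returning the first hit. A minimal sketch (the function name and layer arguments are illustrative, not the project's actual config code):

```python
def resolve_setting(key, cli_args=None, env=None, yaml_cfg=None,
                    json_cfg=None, defaults=None):
    """Resolve one setting by walking the documented precedence chain:
    CLI flags > environment/.env > profiles.yaml >
    SEMANTIC_PROFILES_JSON > built-in defaults."""
    for layer in (cli_args, env, yaml_cfg, json_cfg, defaults):
        if layer and key in layer:
            return layer[key]
    return None
```

A CLI flag thus overrides every other source for that one run, while a value absent everywhere falls through to `None`.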
Common controls:
```bash
# Preflight checks only
mcp-index setup semantic --dry-run

# Strict mode: fail command if semantic stack isn't ready
mcp-index setup semantic --strict

# Override local embedding endpoint
mcp-index setup semantic --openai-api-base http://127.0.0.1:8001/v1
```

Plugin loading is auto-optimized by default using fast repository language detection:

- `MCP_AUTO_DETECT_LANGUAGES=true`
- `MCP_LANGUAGE_DETECT_MAX_FILES=5000`
- `MCP_LANGUAGE_DETECT_MIN_FILES=2`
For startup-sensitive environments, enable:

- `MCP_FAST_STARTUP=true` (uses lazy plugin loading and skips file watcher startup)
When `MCP_AUTO_DETECT_LANGUAGES=true`, auto-detection takes precedence over `plugins.yaml`. Set `MCP_AUTO_DETECT_LANGUAGES=false` to force `plugins.yaml` language selection.
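Fast language detection of this kind usually means scanning a bounded sample of file extensions and keeping only languages that clear a minimum count. A sketch under those assumptions (the extension map and helper are illustrative; the real detector covers all 48 languages):

```python
from collections import Counter
from pathlib import Path

# Hypothetical extension-to-language map for illustration only
EXT_LANGUAGES = {".py": "python", ".js": "javascript", ".ts": "typescript",
                 ".go": "go", ".rs": "rust", ".c": "c", ".cpp": "cpp"}

def detect_languages(root: str, max_files: int = 5000, min_files: int = 2) -> set[str]:
    """Scan up to max_files paths and keep languages seen at least min_files
    times, mirroring MCP_LANGUAGE_DETECT_MAX_FILES / _MIN_FILES."""
    counts = Counter()
    for i, path in enumerate(Path(root).rglob("*")):
        if i >= max_files:
            break
        lang = EXT_LANGUAGES.get(path.suffix.lower())
        if lang:
            counts[lang] += 1
    return {lang for lang, n in counts.items() if n >= min_files}
```

The file cap keeps startup bounded on huge monorepos, and the minimum count filters out stray one-off files that don't justify loading a plugin.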
For a dual-profile setup (Voyage + local vLLM/Qwen), set:

- `VOYAGE_API_KEY` (or `VOYAGE_AI_API_KEY`)
- `OPENAI_API_BASE` (for example `http://127.0.0.1:8000/v1`)
- `OPENAI_API_KEY` (a placeholder is accepted for local vLLM setups)
If you use a different embedding model, the system will detect incompatibility and rebuild locally with your configuration.
1. **Create plugin structure**

   ```bash
   mkdir -p mcp_server/plugins/my_language_plugin
   cd mcp_server/plugins/my_language_plugin
   touch __init__.py plugin.py
   ```

2. **Implement the plugin interface**

   ```python
   from typing import Dict, List

   from mcp_server.plugin_base import PluginBase

   class MyLanguagePlugin(PluginBase):
       def __init__(self):
           self.tree_sitter_language = "my_language"

       def index(self, file_path: str) -> Dict:
           # Parse and index the file
           pass

       def getDefinition(self, symbol: str, context: Dict) -> Dict:
           # Find symbol definition
           pass

       def getReferences(self, symbol: str, context: Dict) -> List[Dict]:
           # Find symbol references
           pass
   ```

3. **Register the plugin**

   ```python
   # In dispatcher.py
   from .plugins.my_language_plugin import MyLanguagePlugin

   self.plugins['my_language'] = MyLanguagePlugin()
   ```
```bash
# Run all tests
pytest

# Run specific test
pytest test_python_plugin.py

# Run with coverage
pytest --cov=mcp_server --cov-report=html
```

```bash
# View C4 architecture diagrams
docker run --rm -p 8080:8080 \
  -v "$(pwd)/architecture":/usr/local/structurizr \
  structurizr/lite
# Open http://localhost:8080 in your browser
```

Get symbol definition:

```
GET /symbol?symbol_name=parseFile&file_path=/path/to/file.py
```

Query parameters:

- `symbol_name` (required): Name of the symbol to find
- `file_path` (optional): Specific file to search in
Search for code patterns:

```
GET /search?query=async+def.*parse&file_extensions=.py,.js
```

Query parameters:

- `query` (required): Search pattern (regex supported)
- `file_extensions` (optional): Comma-separated list of extensions
All API responses follow a consistent JSON structure:
Success Response:
```json
{
  "status": "success",
  "data": { ... },
  "timestamp": "2024-01-01T00:00:00Z"
}
```

Error Response:

```json
{
  "status": "error",
  "error": "Error message",
  "code": "ERROR_CODE",
  "timestamp": "2024-01-01T00:00:00Z"
}
```

The project includes multiple Docker configurations for different environments:
Development (Default):

```bash
# Uses docker-compose.yml + Dockerfile
docker-compose up -d

# - SQLite database
# - Uvicorn development server
# - Volume mounts for code changes
# - Debug logging enabled
```

Production:

```bash
# Uses docker-compose.production.yml + Dockerfile.production
docker-compose -f docker-compose.production.yml up -d

# - PostgreSQL database
# - Gunicorn + Uvicorn workers
# - Multi-stage optimized builds
# - Security hardening (non-root user)
# - Production logging
```

Enhanced Development:

```bash
# Uses both compose files with development overrides
docker-compose -f docker-compose.yml -f docker-compose.dev.yml up -d

# - Development base + enhanced debugging
# - Source code volume mounting
# - Read-write code access
```

Important: By default, `docker-compose restart` uses the DEVELOPMENT configuration:

- `docker-compose restart` → uses `docker-compose.yml` (Development)
- `docker-compose -f docker-compose.production.yml restart` → uses Production
For production environments, we provide:
- Multi-stage Docker builds with security hardening
- PostgreSQL database with async support
- Redis caching for performance optimization
- Qdrant vector database for semantic search
- Prometheus + Grafana monitoring stack
- Kubernetes manifests in the `k8s/` directory
- nginx reverse proxy configuration
See our Deployment Guide for detailed instructions including:
- Kubernetes deployment configurations
- Auto-scaling setup
- Database optimization
- Security best practices
- Monitoring and observability
- Minimum: 2GB RAM, 2 CPU cores, 10GB storage
- Recommended: 8GB RAM, 4 CPU cores, 50GB SSD storage
- Large codebases: 16GB+ RAM, 8+ CPU cores, 100GB+ SSD storage
For quick setup, download pre-built indexes from our GitHub releases:
```bash
# List available releases
python scripts/download-release.py --list

# Download latest release
python scripts/download-release.py --latest

# Download specific version
python scripts/download-release.py --tag v2024.01.15 --output ./my-index
```

Maintainers can create new releases with pre-built indexes:
```bash
# Create a new release (as draft)
python scripts/create-release.py --version 1.0.0

# Create and publish immediately
python scripts/create-release.py --version 1.0.0 --publish
```

The project includes Git hooks for automatic index synchronization:
- Pre-push: Uploads index changes to GitHub artifacts
- Post-merge: Downloads compatible indexes after pulling
Install hooks with: `mcp-index hooks install`
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Add tests (aim for 90%+ coverage)
- Update documentation
- Submit a pull request
- Follow PEP 8 for Python code
- Use type hints for all functions
- Write descriptive docstrings
- Keep functions small and focused
| Operation | Performance Target | Current Status |
|---|---|---|
| Symbol Lookup | <100ms (p95) | ✅ Achieved - all queries < 100ms |
| Code Search | <500ms (p95) | ✅ Achieved - BM25 search < 50ms |
| File Indexing | 10K files/min | ✅ Achieved - 152K files indexed |
Latest rerun indexed the full Code-Index-MCP repository using:
- Qwen embeddings via Fireworks (`fireworks/qwen3-embedding-8b`)
- Voyage embeddings (`voyage-code-3`)
- Local Qwen embeddings (`Qwen/Qwen3-Embedding-8B` via a tailnet endpoint)
- BM25, fuzzy, classic FTS, and hybrid retrieval
| Metric | Result |
|---|---|
| Files indexed | 1282 |
| Semantic-skipped files (oversized) | 1 |
| MCP query suite Top-1 pass rate | 33.3% |
| MCP query suite Top-3 pass rate | 64.4% |
| Native tools pass rate (rg/grep/glob) | 44.4% |
| Semantic Qwen latency p50 (single query, top-5) | 238.5 ms |
| Semantic Voyage latency p50 (single query, top-5) | 116.68 ms |
| Semantic Local Qwen latency p50 (single query, top-5) | 49.22 ms |
| BM25 latency p50 (single query, top-5) | 46.45 ms |
| Fuzzy latency p50 (single query, top-5) | 10.11 ms |
| Hybrid semantic-intent latency p50 (single query, top-5) | 5093.14 ms |
Known gaps from this rerun:
- `classic` mode still surfaces benchmark/doc artifacts for the semantic-preflight probe.
- `hybrid` mode still has high tail latency and low Top-1 on intent/persistence probes.
- Top-3 improved materially, but Top-1 remains a ranking problem.
Benchmark artifacts:
- `docs/benchmarks/e2e_retrieval_validation_fullrepo_fireworks_qwen_voyage_local_iter5_rerun.json`
- `docs/benchmarks/mcp_vs_native_benchmark_fullrepo_fireworks_qwen_voyage_local_iter5_rerun.json`
- `docs/benchmarks/fullrepo-benchmark-analysis.md`
The system follows C4 model architecture patterns:
- Workspace Definition: 100% implemented (architecture/workspace.dsl) - Validated with CLI tools
- System Context (L1): Claude Code integration with MCP sub-agent support fully operational
- Container Level (L2): 8 main containers including enhanced MCP server and user documentation
- Component Level (L3): Plugin system with 48 languages, memory management, and cross-repo coordination
- Code Level (L4): 43 PlantUML diagrams documenting all system components and flows
For detailed architectural documentation, see the architecture/ directory.
See ROADMAP.md for detailed development plans and current progress.
Current Status: v1.0.0 MVP Release
- ✅ **Core Indexing**: SQLite + FTS5 for fast local search
- ✅ **Multi-Language**: 48 languages via tree-sitter integration
- ✅ **MCP Protocol**: Full compatibility with Claude Code and other MCP clients
- ✅ **Performance**: Sub-100ms queries with BM25 optimization
- 🔄 **Index Sync**: Beta support via GitHub Artifacts
- 🔄 **Semantic Search**: Optional feature requiring a Voyage AI API key
Recent Improvements:
- **Dispatcher Optimization**: Timeout protection and BM25 bypass for reliability
- **Hybrid Search**: BM25 + semantic search with graceful degradation
- **Result Ranking**: Improved relevance with score normalization
- **CLI Tools**: Full-featured `mcp-index` command for index management
Performance optimization features are implemented and available:
- **Enable caching**: Redis caching is implemented and configurable via environment variables
- **Adjust batch size**: Configurable via the `INDEXING_BATCH_SIZE` environment variable
- **Use SSD storage**: Improves indexing speed significantly
- **Limit file size**: Configurable via the `INDEXING_MAX_FILE_SIZE` environment variable
- **Parallel processing**: Multi-worker indexing configurable via `INDEXING_MAX_WORKERS`
- Local-first: All processing happens locally by default
- Path validation: Prevents directory traversal attacks
- Input sanitization: All queries are sanitized
- Secret detection: Automatic redaction of detected secrets
- Plugin isolation: Plugins run in restricted environments
⚠️ Semantic Summary Risks: If you enable LLM-generated semantic summaries (lazy or comprehensive), be aware of prompt injection vulnerabilities. Malicious actors could place hidden instructions in code comments (e.g., in an open-source dependency) that the summarizer LLM might execute. Always review generated index metadata if summarizing untrusted code.
This project is licensed under the MIT License - see the LICENSE file for details.
- Tree-sitter for language parsing
- Jedi for Python analysis
- FastAPI for the API framework
- Voyage AI for embeddings
- Anthropic for the MCP protocol
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Built with ❤️ for the developer community