An MCP (Model Context Protocol) server that provides search and retrieval tools for Apache Spark documentation. This server enables AI assistants like Claude to search and read Spark documentation directly.
- Full-text search using SQLite FTS5 with BM25 ranking and Porter stemming
- Section filtering to narrow search results by documentation category
- Sparse checkout for efficient cloning of only the docs directory from apache/spark
- Docker support for portable deployment across projects
- STDIO transport for seamless MCP client integration
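
The full-text search and section filtering features above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the project's actual schema: the table name `docs`, its columns, and the helper names `build_index` and `search` are assumptions made for the example, and it requires an SQLite build with FTS5 enabled.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS5 index (hypothetical schema, for illustration)."""
    conn = sqlite3.connect(":memory:")
    # The 'porter' tokenizer wraps 'unicode61' and stems terms at index and
    # query time; path and section are stored but not full-text indexed.
    conn.execute(
        "CREATE VIRTUAL TABLE docs USING fts5("
        "path UNINDEXED, section UNINDEXED, title, body, "
        "tokenize = 'porter unicode61')"
    )
    conn.executemany("INSERT INTO docs VALUES (?, ?, ?, ?)", rows)
    return conn


def search(conn, query, section=None, limit=10):
    """Return (path, title, score) tuples, best match first."""
    sql = "SELECT path, title, bm25(docs) FROM docs WHERE docs MATCH ?"
    params = [query]
    if section is not None:
        # Optional section filter on the unindexed column.
        sql += " AND section = ?"
        params.append(section)
    # FTS5's built-in rank column defaults to bm25(); lower is better.
    sql += " ORDER BY rank LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()


rows = [
    ("sql-ref/joins.md", "sql-ref", "Joins",
     "Spark SQL supports inner and outer joins."),
    ("streaming/guide.md", "streaming", "Streaming",
     "Structured streaming processes data incrementally."),
]
conn = build_index(rows)
results = search(conn, "joining")
```

Because of stemming, the query `joining` matches the page that only contains `joins`. Passing `section="streaming"` to the same query would return no rows, since the only match lives under `sql-ref`.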

Using Docker:

    # Build the Docker image (includes pre-indexed documentation)
    make docker-build

    # Test the server
    make docker-run

For local development without Docker:

    # Initialise the environment
    make init

    # Build the documentation index
    make index

    # Run the server
    make run

Add to your `.mcp.json` or global settings:

    {
      "mcpServers": {
        "spark-documentation": {
          "command": "docker",
          "args": ["run", "-i", "--rm", "martoc/mcp-spark-documentation:latest"]
        }
      }
    }

For a locally built Docker image:

    {
      "mcpServers": {
        "spark-documentation": {
          "command": "docker",
          "args": ["run", "-i", "--rm", "mcp-spark-documentation"]
        }
      }
    }

For local development without Docker:

    {
      "mcpServers": {
        "spark-documentation": {
          "command": "uv",
          "args": ["run", "mcp-spark-documentation"],
          "cwd": "/path/to/mcp-spark-documentation"
        }
      }
    }

| Tool | Description |
|---|---|
| `search_documentation` | Search Spark documentation by keyword query with optional section filtering |
| `read_documentation` | Retrieve the full content of a specific documentation page |

`search_documentation` searches Apache Spark documentation using full-text search with stemming support.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `query` | string | Yes | - | Search terms (supports stemming) |
| `section` | string | No | None | Filter by section (e.g., `sql-ref`, `streaming`, `mllib`) |
| `limit` | integer | No | 10 | Maximum results (1-50) |

Common sections: `sql-ref`, `api`, `streaming`, `mllib`, `graphx`, `structured-streaming`, `configuration`, `tuning`
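
As a rough illustration of how a client might invoke this tool, the sketch below builds the JSON-RPC 2.0 `tools/call` message that MCP uses for tool invocation. The helper name `build_search_request` and the client-side validation are assumptions for the example; the server's own validation may differ, and a real client must complete the MCP `initialize` handshake before sending tool calls.

```python
import json


def build_search_request(query, section=None, limit=10, request_id=1):
    """Build a tools/call request for search_documentation (hypothetical helper)."""
    if not query.strip():
        raise ValueError("query must not be empty")
    # Mirror the documented range for limit (1-50) on the client side.
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")

    arguments = {"query": query, "limit": limit}
    if section is not None:
        arguments["section"] = section

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search_documentation", "arguments": arguments},
    }


request = build_search_request("window functions", section="sql-ref", limit=5)
# Over the STDIO transport, each message is sent as one line of JSON.
wire_message = json.dumps(request) + "\n"
```

In practice an MCP client library handles this framing, so the sketch is mainly useful for understanding what travels over the transport.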

`read_documentation` retrieves the full content of a documentation page.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `path` | string | Yes | Relative path to document (from search results) |
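
Since `path` is a relative path taken from search results, a server handling it would typically confirm that it stays inside the documentation directory. The sketch below shows one way to do that with `pathlib`; the function `resolve_doc_path` and the docs root used here are hypothetical and are not taken from this project's implementation.

```python
from pathlib import Path


def resolve_doc_path(docs_root: Path, relative_path: str) -> Path:
    """Resolve a relative document path, rejecting anything outside docs_root."""
    root = docs_root.resolve()
    candidate = (root / relative_path).resolve()
    # After resolving ".." segments (and absolute inputs), the candidate
    # must still sit somewhere below the docs root.
    if root not in candidate.parents:
        raise ValueError(f"path escapes documentation root: {relative_path}")
    return candidate


docs_root = Path("/tmp/spark-docs")  # hypothetical location of the docs checkout
resolved = resolve_doc_path(docs_root, "sql-ref/joins.md")
```

A request for `../secrets.txt` or an absolute path such as `/etc/passwd` raises `ValueError` instead of resolving to a file outside the checkout.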

Indexing commands:

    # Build/rebuild the documentation index
    uv run spark-docs-index index
    uv run spark-docs-index index --rebuild
    uv run spark-docs-index index --branch master

    # Show index statistics
    uv run spark-docs-index stats

Development commands:

    make init        # Initialise development environment
    make build       # Run full build (lint, typecheck, test)
    make test        # Run tests with coverage
    make format      # Format code
    make lint        # Run linter
    make typecheck   # Run type checker

- USAGE.md - Detailed usage instructions
- CODESTYLE.md - Code style guidelines
- CLAUDE.md - Claude Code instructions

This project is licensed under the MIT Licence. See the LICENSE file for details.