martoc/mcp-spark-documentation

License: MIT | Python 3.12 | MCP

MCP Spark Documentation Server

An MCP (Model Context Protocol) server that provides search and retrieval tools for Apache Spark documentation. This server enables AI assistants like Claude to search and read Spark documentation directly.

Features

  • Full-text search using SQLite FTS5 with BM25 ranking and Porter stemming
  • Section filtering to narrow search results by documentation category
  • Sparse checkout for efficient cloning of only the docs directory from apache/spark
  • Docker support for portable deployment across projects
  • STDIO transport for seamless MCP client integration

Quick Start

Using Docker (Recommended)

    # Build the Docker image (includes pre-indexed documentation)
    make docker-build

    # Test the server
    make docker-run

Using uv (Local Development)

    # Initialise the environment
    make init

    # Build the documentation index
    make index

    # Run the server
    make run

Configuration

Claude Code / Claude Desktop

Add to your .mcp.json or global settings:

    {
      "mcpServers": {
        "spark-documentation": {
          "command": "docker",
          "args": ["run", "-i", "--rm", "martoc/mcp-spark-documentation:latest"]
        }
      }
    }

For a locally built Docker image:

    {
      "mcpServers": {
        "spark-documentation": {
          "command": "docker",
          "args": ["run", "-i", "--rm", "mcp-spark-documentation"]
        }
      }
    }

For local development without Docker:

    {
      "mcpServers": {
        "spark-documentation": {
          "command": "uv",
          "args": ["run", "mcp-spark-documentation"],
          "cwd": "/path/to/mcp-spark-documentation"
        }
      }
    }
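If a .mcp.json file already lists other servers, the entry can be merged in rather than pasted by hand. The following is a small sketch of that idea. The helper `add_server` is hypothetical and not part of this project; it only reads and writes the `mcpServers` structure shown above.

```python
import json
from pathlib import Path


def add_server(config_path, name, command, args, cwd=None):
    """Add or replace one entry under "mcpServers", keeping other entries intact.

    Hypothetical helper for illustration; it is not shipped with this project.
    """
    path = Path(config_path)
    # Start from the existing file if there is one, otherwise from an empty config.
    config = json.loads(path.read_text()) if path.exists() else {}
    entry = {"command": command, "args": list(args)}
    if cwd is not None:
        entry["cwd"] = cwd  # only needed for the local uv setup
    config.setdefault("mcpServers", {})[name] = entry
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    add_server(
        ".mcp.json",
        "spark-documentation",
        "docker",
        ["run", "-i", "--rm", "martoc/mcp-spark-documentation:latest"],
    )
```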

MCP Tools

Tool                 | Description
search_documentation | Search Spark documentation by keyword query with optional section filtering
read_documentation   | Retrieve the full content of a specific documentation page

search_documentation

Search Apache Spark documentation using full-text search with stemming support.

Parameter | Type    | Required | Default | Description
query     | string  | Yes      | -       | Search terms (supports stemming)
section   | string  | No       | None    | Filter by section (e.g., sql-ref, streaming, mllib)
limit     | integer | No       | 10      | Maximum results (1-50)

Common Sections: sql-ref, api, streaming, mllib, graphx, structured-streaming, configuration, tuning
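MCP clients normally build tool calls for you, but it can help to see what such a call looks like on the wire. The sketch below builds the JSON-RPC `tools/call` request for `search_documentation` using the parameters from the table above. The function name `build_search_request` is hypothetical. The sketch rejects an out-of-range `limit`, which is an assumption: the README does not say whether the server rejects or clamps such values.

```python
import json


def build_search_request(query, section=None, limit=10, request_id=1):
    """Build a JSON-RPC 2.0 "tools/call" message for search_documentation.

    Hypothetical helper; validation here mirrors the documented parameter
    table and may differ from what the server itself enforces.
    """
    if not query or not query.strip():
        raise ValueError("query is required")
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")
    arguments = {"query": query, "limit": limit}
    if section is not None:
        arguments["section"] = section  # optional filter, e.g. "sql-ref"
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search_documentation", "arguments": arguments},
    }


if __name__ == "__main__":
    request = build_search_request("window functions", section="sql-ref", limit=5)
    # Over the STDIO transport each message is sent as one line of JSON.
    print(json.dumps(request))
```

A real session also needs the MCP initialisation handshake before any tool call, which a client such as Claude Code performs automatically.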

read_documentation

Retrieve the full content of a documentation page.

Parameter | Type   | Required | Description
path      | string | Yes      | Relative path to document (from search results)

CLI Commands

    # Build/rebuild the documentation index
    uv run spark-docs-index index
    uv run spark-docs-index index --rebuild
    uv run spark-docs-index index --branch master

    # Show index statistics
    uv run spark-docs-index stats
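The indexer fetches only the docs directory of apache/spark through a sparse checkout, and `--branch` selects which branch to index. The exact Git invocation is not shown in this README, so the following is one plausible sketch using standard Git commands (Git 2.25 or later). The function names, the repository URL default, and the clone options are assumptions.

```python
import subprocess


def sparse_checkout_commands(
    dest,
    branch="master",
    repo_url="https://github.com/apache/spark.git",  # assumed upstream URL
    subdir="docs",
):
    """Return the git commands for a shallow, sparse clone of one directory."""
    return [
        # Shallow, blob-less clone that does not populate the working tree yet.
        ["git", "clone", "--depth", "1", "--filter=blob:none",
         "--no-checkout", "--branch", branch, repo_url, dest],
        # Limit the working tree to the requested directory.
        ["git", "-C", dest, "sparse-checkout", "set", subdir],
        # Populate the working tree; only files under subdir are downloaded.
        ["git", "-C", dest, "checkout", branch],
    ]


def run_sparse_checkout(dest, branch="master"):
    """Run the commands in order, stopping at the first failure."""
    for command in sparse_checkout_commands(dest, branch=branch):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    for command in sparse_checkout_commands("spark-src"):
        print(" ".join(command))
```

Keeping command construction separate from execution makes the sequence easy to inspect or log before anything touches the network.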

Development

    make init        # Initialise development environment
    make build       # Run full build (lint, typecheck, test)
    make test        # Run tests with coverage
    make format      # Format code
    make lint        # Run linter
    make typecheck   # Run type checker

Documentation

Licence

This project is licensed under the MIT Licence - see the LICENSE file for details.
