Published: Feb 13, 2026 by martoc
We’ve all been there. It’s 2am, you’re knee-deep in a ClickHouse query that refuses to cooperate, the Helm chart values file looks like it was written in ancient Sumerian, and your AI assistant confidently suggests a Karpenter configuration option that hasn’t existed since version 0.27. The AI is brilliant, creative, and occasionally a magnificent liar when it comes to niche technical documentation.
What if your AI assistant could actually read the docs before answering? Not the docs from two years ago baked into its training data, but the real, current, authoritative documentation? That’s precisely what the Model Context Protocol (MCP) enables, and it’s why I’ve built six open-source MCP servers for the tools I use every day.
What is MCP?
The Model Context Protocol is an open standard introduced by Anthropic in November 2024 that standardises how AI systems integrate with external data sources and tools. Think of it as a USB-C port for AI: one universal connector instead of a drawer full of proprietary cables.
At its core, MCP uses a client-server architecture. AI applications (clients) connect to data sources and tools (servers) through a standard protocol built on JSON-RPC. When an agent needs data, context, or to perform an action, it sends a standardised request to an MCP server. The server retrieves the information or executes the action, then sends back a standardised response. Simple, elegant, and remarkably powerful.
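To make that exchange concrete, here is a minimal sketch of the JSON-RPC 2.0 messages behind a single tool invocation, using only Python's standard library. The `tools/call` method, the `params.name`/`params.arguments` fields, and the `content`/`isError` result shape follow the MCP specification. The helper names and the example arguments are hypothetical.

```python
import json
from typing import Any


def make_tool_call(request_id: int, tool: str, arguments: dict[str, Any]) -> str:
    """Serialise an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


def extract_text(response_json: str) -> str:
    """Join the text items of a tools/call result, raising on either kind of failure."""
    response = json.loads(response_json)
    if "error" in response:
        # Protocol-level failure reported by JSON-RPC itself (e.g. unknown method).
        raise RuntimeError(response["error"].get("message", "unknown error"))
    result = response["result"]
    texts = [item["text"] for item in result.get("content", []) if item.get("type") == "text"]
    if result.get("isError"):
        # Tool-level failure: the server ran the tool, and the tool reported an error.
        raise RuntimeError("\n".join(texts))
    return "\n".join(texts)


# Hypothetical arguments; the real servers may name their parameters differently.
request = make_tool_call(1, "search_documentation", {"query": "NodePool spot instances"})
reply = '{"jsonrpc": "2.0", "id": 1, "result": {"content": [{"type": "text", "text": "concepts/nodepools.md"}], "isError": false}}'
print(request)
print(extract_text(reply))
```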
MCP defines three core primitives:
| Primitive | Description | Controlled By |
|---|---|---|
| Tools | Executable functions that AI applications can invoke | Model |
| Resources | Data sources that provide contextual information | Application |
| Prompts | Reusable templates for structuring interactions | User |
The protocol supports multiple transport mechanisms. STDIO runs the server as a local subprocess, ideal for command-line tools. Streamable HTTP enables remote deployment with support for multiple client connections. The older SSE transport has been deprecated in favour of Streamable HTTP.
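Over STDIO the framing is deliberately simple: the MCP specification has each JSON-RPC message written as a single line on the server's stdin or stdout, with no embedded newlines. The sketch below shows that framing, with an in-memory buffer standing in for the pipe. The function names are illustrative, and a real server would normally let its SDK handle this.

```python
import io
import json
from typing import Any, Iterator, TextIO


def write_message(stream: TextIO, message: dict[str, Any]) -> None:
    """Write one JSON-RPC message as a single newline-terminated line."""
    # json.dumps without indent never emits a raw newline (newlines inside
    # strings are escaped as \n), so every message occupies exactly one line.
    stream.write(json.dumps(message) + "\n")
    stream.flush()


def read_messages(stream: TextIO) -> Iterator[dict[str, Any]]:
    """Yield one decoded message per non-blank line of the stream."""
    for line in stream:
        if line.strip():
            yield json.loads(line)


# An in-memory buffer stands in for the pipe between client and server.
pipe = io.StringIO()
write_message(pipe, {"jsonrpc": "2.0", "method": "notifications/initialized"})
write_message(pipe, {"jsonrpc": "2.0", "id": 2, "method": "tools/list"})
pipe.seek(0)
for message in read_messages(pipe):
    print(message["method"])
```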
Why MCP Matters
The adoption numbers speak for themselves. MCP has grown from an internal Anthropic experiment to an industry standard with over 1,000 community-built servers. OpenAI adopted MCP in March 2025, Google Cloud followed suit, and Microsoft Azure incorporated MCP into the Azure AI Agent Service. Organisations implementing MCP report 40-60% faster agent deployment times.
But beyond the industry momentum, there are practical reasons why MCP is a game-changer for developers:
- No more hallucinated documentation: Your AI assistant queries real, indexed documentation instead of relying on training data that may be outdated or incomplete
- Build once, use everywhere: A single MCP server works across Claude Desktop, Claude Code, Cursor, VS Code, and any MCP-compatible client
- Context efficiency: Documentation is fetched on demand, so only the pages relevant to a question enter the context window instead of entire manuals being pasted in upfront
- Composability: Chain multiple MCP servers together, giving your AI access to documentation, databases, APIs, and custom tools simultaneously
Best Practices for Building MCP Servers
Having built six documentation MCP servers, I’ve learned a few things about what makes a good one. Here are the practices that have worked well.
Keep Tools Focused and Minimal
Each MCP server should expose a small number of well-defined tools. All my servers expose exactly two tools:
- search_documentation - Full-text search with optional filtering
- read_documentation - Retrieve complete document content
This follows the principle of least surprise. The AI model doesn’t need to reason about which of fifteen tools to call; it searches, finds what it needs, and reads the full document.
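To illustrate how small that surface is, here is a sketch of how the two tools could be advertised in a `tools/list` response. `name`, `description`, and `inputSchema` (a JSON Schema object) are the fields MCP uses to describe a tool. The argument names `query`, `limit`, `source`, and `path` are assumptions for illustration, not taken from the actual servers.

```python
from typing import Any

# Hypothetical definitions for the two tools; argument names are assumptions.
TOOLS: list[dict[str, Any]] = [
    {
        "name": "search_documentation",
        "description": "Full-text search over the indexed documentation.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Free-text search terms"},
                "limit": {"type": "integer", "minimum": 1, "maximum": 50},
                "source": {"type": "string", "description": "Optional filter"},
            },
            "required": ["query"],
        },
    },
    {
        "name": "read_documentation",
        "description": "Return the complete content of one document.",
        "inputSchema": {
            "type": "object",
            "properties": {"path": {"type": "string", "description": "Document path from a search hit"}},
            "required": ["path"],
        },
    },
]
TOOLS_BY_NAME = {tool["name"]: tool for tool in TOOLS}


def missing_arguments(tool_name: str, arguments: dict[str, Any]) -> list[str]:
    """List required arguments absent from a call (KeyError for an unknown tool)."""
    required = TOOLS_BY_NAME[tool_name]["inputSchema"].get("required", [])
    return [name for name in required if name not in arguments]


print(missing_arguments("read_documentation", {}))  # ['path']
```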
Pre-Index Documentation at Build Time
Rather than cloning repositories and indexing at runtime, build Docker images with pre-indexed documentation. This means instant startup, consistent performance, and no network dependency at query time. Every one of my servers uses a multi-stage Dockerfile that indexes documentation during the build phase.
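The build-time/query-time split can be sketched with Python's built-in `sqlite3`. A build step (which a multi-stage Dockerfile would run in an earlier stage) writes the index to a file, and the server only ever opens that file read-only. The plain table and the function names are simplifications for illustration; the servers described here store documents in an FTS5 table.

```python
import sqlite3
import tempfile
from pathlib import Path


def build_index(db_path: Path, documents: dict[str, str]) -> None:
    """Build step: write every document into a fresh SQLite file baked into the image."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE docs (path TEXT PRIMARY KEY, content TEXT NOT NULL)")
        conn.executemany("INSERT INTO docs (path, content) VALUES (?, ?)", documents.items())
        conn.commit()
    finally:
        conn.close()


def open_readonly(db_path: Path) -> sqlite3.Connection:
    """Runtime step: open the pre-built index; nothing is cloned or indexed at startup."""
    # mode=ro rejects writes and fails fast if the index was not baked into the image.
    return sqlite3.connect(f"{db_path.resolve().as_uri()}?mode=ro", uri=True)


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "docs.db"
    build_index(db, {"intro.md": "Helm is the package manager for Kubernetes."})
    conn = open_readonly(db)
    print(conn.execute("SELECT content FROM docs WHERE path = ?", ("intro.md",)).fetchone()[0])
    conn.close()
```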
Use Efficient Search with BM25 Ranking
SQLite FTS5 with BM25 ranking is a remarkably capable search engine for this use case. With weighted fields (title: 5.0, description: 2.0, content: 1.0) and Porter stemming, search results are relevant and fast (typically under 100ms). The key is proper query sanitisation to prevent FTS5 syntax errors from user input.
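A minimal sketch of that setup with Python's built-in `sqlite3` module, assuming the bundled SQLite was compiled with FTS5 (standard CPython builds generally are). The column weights are the ones quoted above. The table layout, the function names, and the sanitisation strategy (quote every word and let FTS5 AND the terms together) are assumptions: one reasonable way to stop user input from being parsed as FTS5 syntax.

```python
import re
import sqlite3

# Column order matters: the bm25() weights below are applied in declaration order.
SCHEMA = """
CREATE VIRTUAL TABLE docs USING fts5(
    title, description, content,
    path UNINDEXED,
    tokenize = 'porter unicode61'
)
"""


def sanitise_query(raw: str) -> str:
    """Turn free text into a safe FTS5 query by quoting each alphanumeric word."""
    # Dropping punctuation removes operators such as ( ) * : ^ and stray quotes;
    # quoting each word stops AND/OR/NOT/NEAR from being read as operators.
    # Space-separated strings are implicitly AND-ed by FTS5.
    return " ".join(f'"{word}"' for word in re.findall(r"[^\W_]+", raw))


def search(conn: sqlite3.Connection, raw_query: str, limit: int = 10) -> list[tuple[str, str, float]]:
    """Return (path, title, score) rows, best first; lower bm25 scores are better."""
    query = sanitise_query(raw_query)
    if not query:
        return []
    return conn.execute(
        """
        SELECT path, title, bm25(docs, 5.0, 2.0, 1.0) AS score
        FROM docs WHERE docs MATCH ?
        ORDER BY score LIMIT ?
        """,
        (query, limit),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO docs (title, description, content, path) VALUES (?, ?, ?, ?)",
    [
        ("Chart Template Guide", "Writing templates", "Templates render charts.", "chart_template_guide.md"),
        ("Installing Helm", "Install the CLI", "Charts are covered elsewhere.", "install.md"),
    ],
)
# The unbalanced quote and parenthesis would be an FTS5 syntax error if passed raw.
print(search(conn, 'chart "templates ('))
```

Porter stemming is what lets `templates` match `Template` in the title here; only the first document contains both stems.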
Sparse Git Checkout for Efficiency
Most documentation repositories are enormous. Apache Spark’s repository is over 1GB, but the docs directory is only around 10MB. Using sparse checkout with shallow cloning (--depth 1) dramatically reduces both clone time and disk usage.
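Here is a sketch of that clone step, driven from Python with argument lists rather than a shell string. `--depth 1`, `git clone --sparse`, and `git sparse-checkout set` are standard Git features (the sparse options need Git 2.25 or newer). `--filter=blob:none` is an optional partial-clone flag included here as an assumption; it defers downloading file contents outside the checked-out directories. The function names are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path


def sparse_clone_commands(repo_url: str, dest: Path, docs_dirs: list[str]) -> list[list[str]]:
    """Git commands for a shallow, sparse clone that materialises only docs_dirs."""
    return [
        # --depth 1: latest commit only; --sparse: check out top-level files only;
        # --filter=blob:none (optional): fetch file contents lazily, on checkout.
        ["git", "clone", "--depth", "1", "--filter=blob:none", "--sparse", repo_url, str(dest)],
        # Expand the working tree to just the documentation directories.
        ["git", "-C", str(dest), "sparse-checkout", "set", *docs_dirs],
    ]


def sparse_clone(repo_url: str, dest: Path, docs_dirs: list[str]) -> None:
    for cmd in sparse_clone_commands(repo_url, dest, docs_dirs):
        # List arguments (no shell=True): the URL and paths are never shell-parsed.
        subprocess.run(cmd, check=True)


# Show the commands for Spark's docs/ directory instead of running them.
for cmd in sparse_clone_commands("https://github.com/apache/spark.git", Path("spark"), ["docs"]):
    print(shlex.join(cmd))
```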
Security Considerations
MCP servers can execute commands and perform API calls, which means security cannot be an afterthought. The MCP specification includes security best practices that should be followed:
- Use parameterised SQL queries to prevent injection
- Validate and cap all user inputs (for example, limit search results to between 1 and 50)
- Use subprocess list arguments rather than shell strings
- Follow least-privilege principles
A notable incident in September 2025 involved an unofficial MCP server that silently added BCC fields to emails, leaking content to an attacker. Always review MCP servers you install, especially those from untrusted sources.
My Open-Source MCP Documentation Servers
I’ve built a suite of six MCP servers, all following the same architecture and quality standards. Each one indexes official documentation from the upstream project and makes it searchable through Claude.
Common Architecture
All servers share a consistent design:
MCP Client (Claude Desktop / Claude Code / Cursor)
|
| STDIO Transport (JSON-RPC)
|
FastMCP Server (2 tools)
|
DocumentDatabase (SQLite FTS5, BM25 ranking)
|
Indexer (sparse git checkout)
|
Parser (RST / Markdown / MDX)
Each server is built with Python 3.12+, uses FastMCP as the framework, and includes comprehensive test suites with pytest. They all ship as Docker images on Docker Hub for easy deployment.
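The layers in the diagram above can be sketched as plain Python: a `Document` record, parsers keyed by file extension, and an indexer that walks the checked-out tree. Everything here (the field names, the naive Markdown parser, the dispatch table) is a hypothetical skeleton rather than the servers' actual code, and the FTS5 storage step is left out.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class Document:
    """One parsed page, ready to be inserted into the search index."""
    path: str
    title: str
    content: str


# A parser maps (relative path, raw file text) to a Document.
Parser = Callable[[str, str], Document]


def parse_markdown(path: str, text: str) -> Document:
    """Naive placeholder: the first '# ' heading is the title, else the file stem."""
    headings = (line[2:].strip() for line in text.splitlines() if line.startswith("# "))
    return Document(path=path, title=next(headings, Path(path).stem), content=text)


def index_tree(root: Path, parsers: dict[str, Parser]) -> list[Document]:
    """Walk the checkout and hand each file to the parser registered for its extension."""
    documents = []
    for file in sorted(root.rglob("*")):
        parser = parsers.get(file.suffix.lower())
        if parser is not None and file.is_file():
            rel = file.relative_to(root).as_posix()
            documents.append(parser(rel, file.read_text(encoding="utf-8")))
    return documents


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "guide.md").write_text("# Chart Guide\nBody text", encoding="utf-8")
    (root / "notes.txt").write_text("no parser registered for .txt", encoding="utf-8")
    print([(doc.path, doc.title) for doc in index_tree(root, {".md": parse_markdown})])
```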
The Servers
mcp-airflow-documentation
Indexes Apache Airflow core documentation (RST format) and the Python Client documentation (Markdown format) into a single unified database.
- Sources: Airflow Core (~300 docs) + Python Client (~100 docs)
- Parsers: RST (docutils with visitor pattern) + Markdown (frontmatter)
- Unique feature: Source-aware filtering - search across both sources or filter by airflow-core or airflow-python-client
- Repository: github.com/martoc/mcp-airflow-documentation
- Docker: martoc/mcp-airflow-documentation:latest
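Frontmatter handling is the small extra step the Markdown side needs. A minimal standard-library sketch follows. It understands only flat `key: value` lines, which is an assumption about the metadata; a production parser would more likely use a YAML library. The regex and function name are hypothetical.

```python
import re

# A leading block fenced by '---' lines: group 1 is the metadata, group 2 the body.
FRONTMATTER = re.compile(r"\A---\s*\n(.*?)\n---\s*\n?(.*)\Z", re.DOTALL)


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split leading '---' metadata from a Markdown body.

    Only flat `key: value` lines are understood; nested YAML would need a real
    YAML parser. Text without frontmatter comes back unchanged with empty metadata.
    """
    match = FRONTMATTER.match(text)
    if not match:
        return {}, text
    meta: dict[str, str] = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            # Strip surrounding whitespace and simple single/double quotes.
            meta[key.strip()] = value.strip().strip("'\"")
    return meta, match.group(2)


print(split_frontmatter("---\ntitle: DAG API\n---\n# Body"))
```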
mcp-clickhouse-documentation
Indexes ClickHouse documentation including the knowledge base articles. Handles Docusaurus MDX files with JSX components.
- Sources: ClickHouse docs + knowledge base (~850 docs)
- Parser: MDX with JSX component cleaning (regex-based, no Node.js dependency)
- Unique feature: Multi-directory indexing covering both docs/ and knowledgebase/
- Repository: github.com/martoc/mcp-clickhouse-documentation
- Docker: martoc/mcp-clickhouse-documentation:latest
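Regex-based MDX cleaning can be as simple as removing `import ... from '...'` lines and capitalised JSX component tags while keeping the text between paired tags. The patterns below are an illustrative sketch, not the server's actual rules. They assume components start with an uppercase letter (the JSX convention) and that attributes contain no `>` characters.

```python
import re

# Illustrative patterns only; they cover common Docusaurus cases, not the full MDX grammar.
# ES module imports such as: import Tabs from '@theme/Tabs';
IMPORT_LINE = re.compile(r"""^import[ \t].+?[ \t]from[ \t]+['"][^'"\n]+['"];?[ \t]*$""", re.MULTILINE)
# Self-closing components such as <CodeBlock language="sql" /> have no text to keep.
SELF_CLOSING = re.compile(r"<[A-Z][\w.]*\b[^>]*/>")
# Opening/closing component tags such as <Tabs> ... </Tabs>: drop the tags, keep the children.
COMPONENT_TAG = re.compile(r"</?[A-Z][\w.]*\b[^>]*>")
BLANK_RUNS = re.compile(r"\n{3,}")


def clean_mdx(text: str) -> str:
    """Strip MDX import lines and JSX component tags, collapsing leftover blank lines."""
    text = IMPORT_LINE.sub("", text)
    text = SELF_CLOSING.sub("", text)
    text = COMPONENT_TAG.sub("", text)
    return BLANK_RUNS.sub("\n\n", text).strip()


sample = "import Tabs from '@theme/Tabs';\n\n<Tabs>\n<TabItem value=\"sql\">\nSELECT 1\n</TabItem>\n</Tabs>"
print(clean_mdx(sample))  # SELECT 1
```

Lowercase HTML such as `<br/>` and Python `import os` lines inside code samples are left alone, because neither matches the capitalised-component or `from '...'` patterns.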
mcp-cloudcustodian-documentation
Indexes Cloud Custodian documentation with section-based filtering by cloud provider.
- Sources: Cloud Custodian docs (~195 docs)
- Parser: RST with docutils visitor pattern
- Unique feature: Section filtering by cloud provider (AWS, Azure, GCP, Kubernetes, OCI)
- Repository: github.com/martoc/mcp-cloudcustodian-documentation
- Docker: martoc/mcp-cloudcustodian-documentation:latest
mcp-helm-documentation
Indexes Helm documentation from the helm-www repository.
- Sources: Helm docs (~130 docs)
- Parser: Markdown with YAML frontmatter and MDX cleaning
- Unique feature: Rich section structure including chart template guide and best practices
- Repository: github.com/martoc/mcp-helm-documentation
- Docker: martoc/mcp-helm-documentation:latest
mcp-karpenter-documentation
Indexes AWS Karpenter documentation from the karpenter-provider-aws repository.
- Sources: Karpenter website docs (~181 docs)
- Parser: Markdown with Hugo shortcode cleaning
- Unique feature: Hugo-specific content handling and karpenter.sh URL generation
- Repository: github.com/martoc/mcp-karpenter-documentation
- Docker: martoc/mcp-karpenter-documentation:latest
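Both pieces can be sketched with the standard library: a regex that drops Hugo shortcode tags in either delimiter style (`{{< ... >}}` and `{{% ... %}}`) while keeping the content between paired tags, and a path-to-URL mapping. The `website/content/en` content root and the URL scheme are assumptions about the repository layout, and the names are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Hugo shortcodes in both delimiter styles, e.g. {{< tabpane >}} and {{% alert %}}.
SHORTCODE = re.compile(r"\{\{[<%].*?[%>]\}\}", re.DOTALL)

BASE_URL = "https://karpenter.sh"
CONTENT_ROOT = PurePosixPath("website/content/en")  # assumed repository layout


def strip_shortcodes(text: str) -> str:
    """Remove shortcode tags; text between opening and closing tags is kept."""
    return SHORTCODE.sub("", text)


def page_url(repo_path: str) -> str:
    """Map a content file to its public URL (assumes Hugo-style pretty URLs)."""
    rel = PurePosixPath(repo_path).relative_to(CONTENT_ROOT)
    parts = list(rel.parent.parts)
    if rel.stem != "_index":  # section index pages live at their directory's URL
        parts.append(rel.stem)
    return f"{BASE_URL}/{'/'.join(parts)}/" if parts else f"{BASE_URL}/"


print(page_url("website/content/en/docs/concepts/nodepools.md"))
```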
mcp-spark-documentation
Indexes Apache Spark documentation with section-based filtering.
- Sources: Spark docs (sql-ref, streaming, mllib, graphx, and more)
- Parser: Markdown with Jekyll liquid tag cleaning
- Unique feature: Extensive section coverage including SQL reference, ML library, and graph processing
- Repository: github.com/martoc/mcp-spark-documentation
- Docker: martoc/mcp-spark-documentation:latest
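Spark's Jekyll sources mix Markdown with Liquid tags such as `{% highlight scala %}` and `{% endhighlight %}`, plus variables like `{{site.SPARK_VERSION}}`. A minimal sketch of the cleanup drops `{% ... %}` tags (keeping the code between them) and substitutes known `site` variables from a caller-supplied mapping. The mapping and function name are hypothetical.

```python
import re

LIQUID_TAG = re.compile(r"\{%.*?%\}")  # {% highlight python %}, {% endhighlight %}, ...
LIQUID_VAR = re.compile(r"\{\{\s*site\.(\w+)\s*\}\}")  # {{site.SPARK_VERSION}}


def clean_liquid(text: str, site: dict[str, str]) -> str:
    """Drop Liquid tags and fill in known site variables; unknown ones stay as-is."""
    text = LIQUID_VAR.sub(lambda m: site.get(m.group(1), m.group(0)), text)
    return LIQUID_TAG.sub("", text)


page = "{% highlight python %}\ndf = spark.read.json(path)\n{% endhighlight %}"
print(clean_liquid(page, {}))
```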
Getting Started
Setting up any of these servers takes about 30 seconds. Add the following to your Claude Desktop configuration (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
{
"mcpServers": {
"airflow-docs": {
"command": "docker",
"args": ["run", "-i", "--rm", "martoc/mcp-airflow-documentation:latest"]
},
"clickhouse-docs": {
"command": "docker",
"args": ["run", "-i", "--rm", "martoc/mcp-clickhouse-documentation:latest"]
},
"cloudcustodian-docs": {
"command": "docker",
"args": ["run", "-i", "--rm", "martoc/mcp-cloudcustodian-documentation:latest"]
},
"helm-docs": {
"command": "docker",
"args": ["run", "-i", "--rm", "martoc/mcp-helm-documentation:latest"]
},
"karpenter-docs": {
"command": "docker",
"args": ["run", "-i", "--rm", "martoc/mcp-karpenter-documentation:latest"]
},
"spark-docs": {
"command": "docker",
"args": ["run", "-i", "--rm", "martoc/mcp-spark-documentation:latest"]
}
}
}
For Claude Code, add a .mcp.json file to your project root:
{
"mcpServers": {
"helm-docs": {
"command": "docker",
"args": ["run", "-i", "--rm", "martoc/mcp-helm-documentation:latest"]
}
}
}
Once configured, you can simply ask Claude questions and it will automatically search the relevant documentation. For example:
“How do I configure a Karpenter NodePool with spot instances?”
Claude will call search_documentation on the Karpenter MCP server, find the relevant pages, call read_documentation to get the full content, and provide an answer grounded in the actual current documentation.
Contributing
All six servers are open-source under the MIT licence. Contributions are welcome, whether that’s reporting issues, suggesting new features, or submitting pull requests. The common architecture means that once you understand one server, contributing to any of them is straightforward.
If you’d like to build your own documentation MCP server, the pattern is well-established:
- Create a parser for your documentation format (RST, Markdown, MDX, or HTML)
- Build an indexer that clones and processes the upstream repository
- Store documents in SQLite FTS5 with BM25 ranking
- Expose search_documentation and read_documentation tools via FastMCP
- Package it in a Docker image with pre-indexed documentation
The future of AI-assisted development isn’t about AI that knows everything. It’s about AI that knows where to look and how to read. MCP makes that future practical today.