Chunker Server¶

Overview¶

The Chunker MCP Server splits text into chunks using one of five strategies: recursive, semantic, sentence-based, fixed-size, and markdown-aware. Each strategy exposes configurable options such as chunk size and overlap. The server ships in two implementations, the original MCP server and a FastMCP server; the FastMCP version adds stronger type safety and automatic input validation.

Key Features¶

Multiple Chunking Strategies: Recursive, semantic, sentence-based, fixed-size, markdown-aware
Markdown Support: Intelligent markdown chunking respecting header structure
Configurable Parameters: Chunk size, overlap, separators, and more
Text Analysis: Analyzes text and recommends a suitable chunking strategy
Library Integration: Supports LangChain text splitters, NLTK, and spaCy
FastMCP Implementation: Modern decorator-based tool definitions with automatic validation

Quick Start¶

Installation¶

# Basic installation with core functionality
make install

# With NLP libraries (NLTK and spaCy)
make install-nlp

# With LangChain support
make install-langchain

# Full installation (recommended - includes all features)
make install-full

Running the Server¶

# FastMCP server (recommended)
make dev-fastmcp

# Original MCP server
make dev

# HTTP bridge for REST API access
make serve-http-fastmcp  # FastMCP version
make serve-http          # Original version

Available Tools¶

chunk_text¶

Universal text chunking with multiple strategies.

Parameters:

text (required): Text to chunk
chunk_size: Maximum chunk size (default: 1000, range: 100-100000)
chunk_overlap: Overlap between chunks (default: 200)
chunking_strategy: Strategy to use, one of "recursive", "semantic", "sentence", or "fixed_size"
separators: Custom separators for splitting
preserve_structure: Preserve document structure when possible
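
The documented ranges can be checked on the client side before a request is sent. The sketch below is a hypothetical helper (`build_chunk_text_args` is not part of the server). It assumes the overlap must be non-negative and smaller than the chunk size, which the parameter list above does not state.

```python
from typing import Any

# Strategy names accepted by chunk_text, as listed above.
STRATEGIES = {"recursive", "semantic", "sentence", "fixed_size"}


def build_chunk_text_args(
    text: str,
    chunking_strategy: str,
    chunk_size: int = 1000,
    chunk_overlap: int = 200,
) -> dict[str, Any]:
    """Hypothetical helper: validate arguments before calling chunk_text."""
    if not text:
        raise ValueError("text is required")
    if not 100 <= chunk_size <= 100000:
        raise ValueError("chunk_size must be in the range 100-100000")
    # Assumption: overlap is non-negative and smaller than the chunk size.
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and < chunk_size")
    if chunking_strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {chunking_strategy}")
    return {
        "text": text,
        "chunk_size": chunk_size,
        "chunk_overlap": chunk_overlap,
        "chunking_strategy": chunking_strategy,
    }
```

The returned dictionary can be passed as the arguments of a `chunk_text` tool call.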

chunk_markdown¶

Markdown-aware chunking that respects header structure.

Parameters:

text (required): Markdown text to chunk
headers_to_split_on: Headers to use as boundaries (default: ["#", "##", "###"])
chunk_size: Maximum chunk size (default: 1000)
chunk_overlap: Overlap between chunks (default: 100)

semantic_chunk¶

Content-aware chunking based on semantic boundaries.

Parameters:

text (required): Text to chunk
min_chunk_size: Minimum chunk size (default: 200)
max_chunk_size: Maximum chunk size (default: 2000)
similarity_threshold: Threshold for semantic grouping (default: 0.8)

sentence_chunk¶

Sentence-based chunking with configurable grouping.

Parameters:

text (required): Text to chunk
sentences_per_chunk: Sentences per chunk (default: 5, range: 1-50)
overlap_sentences: Overlapping sentences (default: 1, range: 0-10)
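
To illustrate how the two parameters interact, here is a minimal sketch of sentence grouping with overlap. It is not the server's implementation: it uses a naive regex splitter rather than NLTK or spaCy, and `sentence_chunks` is a hypothetical name.

```python
import re


def sentence_chunks(
    text: str,
    sentences_per_chunk: int = 5,
    overlap_sentences: int = 1,
) -> list[str]:
    """Group sentences into chunks, repeating `overlap_sentences` between neighbours."""
    if sentences_per_chunk < 1:
        raise ValueError("sentences_per_chunk must be at least 1")
    if not 0 <= overlap_sentences < sentences_per_chunk:
        raise ValueError("overlap_sentences must be >= 0 and < sentences_per_chunk")
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    step = sentences_per_chunk - overlap_sentences
    chunks: list[str] = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + sentences_per_chunk]))
        # Stop once the final sentence has been included.
        if start + sentences_per_chunk >= len(sentences):
            break
    return chunks
```

With `sentences_per_chunk=3` and `overlap_sentences=1`, each chunk starts two sentences after the previous one, so the last sentence of one chunk reappears as the first sentence of the next.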

fixed_size_chunk¶

Fixed-size chunking with word boundary preservation.

Parameters:

text (required): Text to chunk
chunk_size: Fixed chunk size (default: 1000)
overlap: Overlap between chunks (default: 0)
split_on_word_boundary: Avoid breaking words (default: true)
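
The effect of `split_on_word_boundary` can be sketched as follows. This is an illustration under assumptions, not the server's code: sizes are taken to be character counts, and the hypothetical `fixed_size_chunks` backs up to the last space inside the window when a cut would land mid-word.

```python
def fixed_size_chunks(
    text: str,
    chunk_size: int = 1000,
    overlap: int = 0,
    split_on_word_boundary: bool = True,
) -> list[str]:
    """Cut text into windows of at most `chunk_size` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if split_on_word_boundary and end < len(text) and not text[end].isspace():
            # The cut would land inside a word: back up to the last space, if any.
            cut = text.rfind(" ", start, end)
            if cut > start:
                end = cut
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end >= len(text):
            break
        # Step forward while keeping `overlap` characters; always make progress.
        start = max(end - overlap, start + 1)
    return chunks
```

Chunks can be shorter than `chunk_size` when boundaries are respected, so "fixed size" is best read as an upper bound in this sketch.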

analyze_text¶

Analyze text characteristics and get chunking recommendations.

Parameters:

text (required): Text to analyze

Returns:

Text statistics (length, word count, paragraph count)
Structure detection (markdown headers, lists, etc.)
Recommended chunking strategies with parameters
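
As a rough illustration of the kind of statistics listed above, the following sketch computes them locally with the standard library. The function name and the dictionary keys are hypothetical; the actual field names returned by `analyze_text` may differ.

```python
import re


def basic_text_stats(text: str) -> dict[str, object]:
    """Compute simple statistics in the spirit of what analyze_text reports."""
    # Paragraphs are assumed to be separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return {
        "length": len(text),
        "word_count": len(text.split()),
        "paragraph_count": len(paragraphs),
        # ATX-style headers: one to six '#' characters followed by a space.
        "has_markdown_headers": bool(
            re.search(r"^#{1,6}[ \t]+\S", text, flags=re.MULTILINE)
        ),
        # Bulleted ('-', '*', '+') or numbered ('1.') list items.
        "has_lists": bool(
            re.search(r"^[ \t]*(?:[-*+]|\d+\.)[ \t]+\S", text, flags=re.MULTILINE)
        ),
    }
```

A client could use such flags to choose between `chunk_markdown` and `chunk_text` before calling the server.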

get_strategies¶

Get information about available chunking strategies and libraries.

Returns:

Available strategies and their descriptions
Best use cases for each strategy
Library availability status

Configuration¶

MCP Client Configuration¶

FastMCP Server (Recommended)¶

{
  "mcpServers": {
    "chunker": {
      "command": "python",
      "args": ["-m", "chunker_server.server_fastmcp"]
    }
  }
}

Original Server¶

{
  "mcpServers": {
    "chunker": {
      "command": "python",
      "args": ["-m", "chunker_server.server"]
    }
  }
}

Examples¶

Basic Text Chunking¶

{
  "text": "Your long text here...",
  "chunk_size": 1000,
  "chunk_overlap": 200,
  "chunking_strategy": "recursive"
}

Markdown Documentation Processing¶

{
  "text": "# API Reference\n\n## Authentication\n\n...",
  "headers_to_split_on": ["#", "##"],
  "chunk_size": 2000,
  "chunk_overlap": 100
}

Semantic Chunking for Articles¶

{
  "text": "Article content with multiple paragraphs...",
  "min_chunk_size": 500,
  "max_chunk_size": 3000,
  "similarity_threshold": 0.7
}

Preparing Text for Embeddings¶

{
  "text": "Text to be embedded...",
  "chunk_size": 512,
  "chunk_overlap": 50,
  "chunking_strategy": "recursive"
}
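
When sizing chunks for an embedding model, it can help to estimate how many chunks a document will produce. The sketch below assumes sizes are measured in characters and that chunks advance by a fixed stride of `chunk_size - chunk_overlap`. Strategies that split on separators will deviate from this, so treat the result as an approximation. `estimate_chunk_count` is a hypothetical helper.

```python
import math


def estimate_chunk_count(text_length: int, chunk_size: int, chunk_overlap: int) -> int:
    """Estimate the number of chunks for a fixed-stride sliding window."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    if text_length <= 0:
        return 0
    if text_length <= chunk_size:
        return 1
    # Each chunk after the first contributes (chunk_size - chunk_overlap) new characters.
    stride = chunk_size - chunk_overlap
    return math.ceil((text_length - chunk_overlap) / stride)
```

For the settings above (`chunk_size` 512, `chunk_overlap` 50), a 1000-character text yields an estimate of 3 chunks, starting at offsets 0, 462, and 924.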

Integration¶

With MCP Gateway¶

To integrate with MCP Gateway, expose the server over HTTP:

# Start the chunker server via HTTP
make serve-http-fastmcp

# Register with MCP Gateway
curl -X POST http://localhost:8000/gateways \
  -H "Content-Type: application/json" \
  -d '{
    "name": "chunker-server",
    "url": "http://localhost:9000",
    "description": "Text chunking server"
  }'
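
The same registration can be issued from Python. This is a minimal sketch using only the standard library. It mirrors the curl command above and assumes the gateway accepts the request without further headers; the function names are hypothetical.

```python
import json
import urllib.request


def build_registration_request(
    gateway_url: str = "http://localhost:8000/gateways",
    server_url: str = "http://localhost:9000",
) -> urllib.request.Request:
    """Build the POST request that registers the chunker server."""
    payload = {
        "name": "chunker-server",
        "url": server_url,
        "description": "Text chunking server",
    }
    return urllib.request.Request(
        gateway_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def register() -> str:
    """Send the request; requires a running gateway and chunker server."""
    with urllib.request.urlopen(build_registration_request()) as response:
        return response.read().decode("utf-8")
```

Building the request separately from sending it makes the payload easy to inspect or log before anything goes over the network.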

Programmatic Usage¶

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def chunk_text():
    server_params = StdioServerParameters(
        command="python",
        args=["-m", "chunker_server.server_fastmcp"]
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Initialize the client
            await session.initialize()

            # List available tools and print their names
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            # Call chunk_text tool
            result = await session.call_tool("chunk_text", {
                "text": "Your text here...",
                "chunk_size": 1000,
                "chunking_strategy": "recursive"
            })

            print(result.content[0].text)

asyncio.run(chunk_text())

Response Format¶

The chunking tools return a JSON response with:

success: Boolean indicating success/failure
strategy: The chunking strategy used
chunks: Array of text chunks
chunk_count: Number of chunks created
Additional strategy-specific metadata (for example, total_length and average_chunk_size in the response below)

Example Response:

{
  "success": true,
  "strategy": "recursive",
  "chunks": [
    "First chunk of text...",
    "Second chunk of text..."
  ],
  "chunk_count": 2,
  "total_length": 2000,
  "average_chunk_size": 1000
}
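
On the client side, the response text can be parsed and sanity-checked before use. The sketch below relies only on the four common fields listed above. `ChunkResult` and `parse_chunk_response` are hypothetical names, and the shape of an error response is not documented here, so failures are reported generically.

```python
import json
from dataclasses import dataclass


@dataclass
class ChunkResult:
    success: bool
    strategy: str
    chunks: list[str]
    chunk_count: int


def parse_chunk_response(raw: str) -> ChunkResult:
    """Parse a chunking response and verify that chunk_count matches the chunks."""
    data = json.loads(raw)
    if not data.get("success"):
        # The error payload format is an unknown here, so include it verbatim.
        raise RuntimeError(f"chunking failed: {data}")
    result = ChunkResult(
        success=True,
        strategy=data["strategy"],
        chunks=list(data["chunks"]),
        chunk_count=data["chunk_count"],
    )
    if result.chunk_count != len(result.chunks):
        raise ValueError("chunk_count does not match the number of chunks")
    return result
```

In the programmatic client shown earlier, `result.content[0].text` would be the string passed to `parse_chunk_response`.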

Chunking Strategies Guide¶

Recursive Chunking¶

Best for: General text, mixed content
How it works: Hierarchically splits using multiple separators
Use cases: Books, articles, documentation
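
The hierarchical idea can be sketched in a few lines. This is an illustration rather than the server's implementation: it omits overlap, and the separator order (paragraph break, line break, space) is an assumption. Pieces are packed greedily into a chunk, and any piece that is still too large is split again with the next, finer separator.

```python
def recursive_split(
    text: str,
    chunk_size: int,
    separators: tuple[str, ...] = ("\n\n", "\n", " "),
) -> list[str]:
    """Split on the coarsest separator first, recursing into finer ones as needed."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to hard cuts.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too large: split it with the finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks
```

Because coarse separators are tried first, paragraphs stay intact whenever they fit, and words are only broken when no separator produces a small enough piece.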

Markdown Chunking¶

Best for: Markdown documents, structured content
How it works: Splits on markdown headers, preserves structure
Use cases: Technical documentation, READMEs, wiki pages
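
A minimal sketch of header-based splitting is shown below. It is not the server's code: it ignores `chunk_size` and overlap, does not treat fenced code blocks specially, and the returned `header`/`content` keys are hypothetical. Headers whose level is not in `headers_to_split_on` stay inside the current section.

```python
import re


def split_on_headers(
    text: str,
    headers_to_split_on: tuple[str, ...] = ("#", "##", "###"),
) -> list[dict[str, str]]:
    """Start a new section at each header whose marker is in `headers_to_split_on`."""
    sections: list[dict[str, str]] = []
    header = ""
    body: list[str] = []

    def flush() -> None:
        content = "\n".join(body).strip()
        if header or content:
            sections.append({"header": header, "content": content})

    for line in text.splitlines():
        match = re.match(r"^(#{1,6})[ \t]+(.*)$", line)
        if match and match.group(1) in headers_to_split_on:
            flush()  # close the previous section before starting a new one
            header, body = match.group(2).strip(), []
        else:
            body.append(line)
    flush()
    return sections
```

Restricting `headers_to_split_on` to `("#", "##")` keeps `###` subsections together with their parent section, which tends to produce fewer, larger chunks.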

Semantic Chunking¶

Best for: Articles, essays, narrative text
How it works: Groups content by semantic boundaries
Use cases: Research papers, blog posts, news articles
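
To make the role of a similarity threshold concrete, the sketch below merges adjacent paragraphs when they look alike. It is a stand-in, not the server's method: it scores similarity with bag-of-words cosine similarity instead of embeddings, ignores `min_chunk_size`, and uses hypothetical function names.

```python
import math
import re
from collections import Counter


def _bag(text: str) -> Counter:
    """Lower-cased word counts for a piece of text."""
    return Counter(re.findall(r"\w+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def group_paragraphs(
    text: str,
    max_chunk_size: int = 2000,
    similarity_threshold: float = 0.8,
) -> list[str]:
    """Merge each paragraph into the previous chunk when similar enough and it fits."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    for para in paragraphs:
        if chunks:
            prev = chunks[-1]
            similar = _cosine(_bag(prev), _bag(para)) >= similarity_threshold
            fits = len(prev) + 2 + len(para) <= max_chunk_size
            if similar and fits:
                chunks[-1] = prev + "\n\n" + para
                continue
        chunks.append(para)
    return chunks
```

Lowering the threshold merges more paragraphs into each chunk, while raising it produces more, smaller chunks. Word-overlap scores run lower than typical embedding similarities, so a threshold tuned for this sketch would not carry over to the server.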

Sentence Chunking¶

Best for: Precise sentence-level processing
How it works: Groups sentences with optional overlap
Use cases: Translation, summarization, sentence analysis

Fixed-Size Chunking¶

Best for: Uniform chunk sizes, simple splitting
How it works: Splits at fixed character counts
Use cases: Token limits, consistent processing windows