Pandoc Server¶

Overview¶

The Pandoc MCP Server exposes document conversion through the pandoc command-line tool. This Go-based server converts text between 30+ document formats and supports standalone documents, table of contents generation, and custom metadata. It acts as a bridge between the Model Context Protocol (MCP) and pandoc's format conversion.

Key Features¶

Convert between 30+ document formats: Supports markdown, HTML, LaTeX, PDF, DOCX, EPUB, and many more
Standalone document generation: Create complete, self-contained documents
Table of contents support: Automatically generate TOCs for supported formats
Custom metadata handling: Add titles, authors, and other metadata to documents
Format discovery tools: List available input and output formats
Health monitoring: Check pandoc installation and version information

Quick Start¶

Prerequisites¶

Pandoc must be installed on the system:

# Ubuntu/Debian
sudo apt install pandoc

# macOS
brew install pandoc

# Windows: Download from pandoc.org

# Verify installation
pandoc --version

Go 1.23 or later is required to build from source.

Installation¶

From Source¶

# Clone the repository
git clone <repository-url>
cd pandoc-server

# Install dependencies
go mod download

# Build the server
make build

Using Docker¶

# Build the Docker image
docker build -t pandoc-server .

# Run the container
docker run -i pandoc-server

Running the Server¶

# Run the built server
./dist/pandoc-server

# Or with MCP Gateway for HTTP/SSE access
python3 -m mcpgateway.translate --stdio "./dist/pandoc-server" --port 9000

Available Tools¶

pandoc¶

Convert text from one format to another using pandoc.

Parameters:

from (required): Input format (e.g., markdown, html, latex, rst, docx, epub)
to (required): Output format (e.g., html, markdown, latex, pdf, docx, plain)
input (required): The text content to convert
standalone: Produce a standalone document (default: false)
title: Document title for standalone documents
metadata: Additional metadata as key=value pairs (the examples in this document separate multiple pairs with commas)
toc: Include table of contents (default: false)

list-formats¶

List available pandoc input and output formats.

Parameters:

type: Format type to list - 'input', 'output', or 'all' (default: 'all')

health¶

Check if pandoc is installed and return version information.

Returns:

Pandoc installation status
Version information
Available features and extensions

Configuration¶

MCP Client Configuration¶

{
  "mcpServers": {
    "pandoc-server": {
      "command": "./dist/pandoc-server"
    }
  }
}

Via MCP Gateway¶

{
  "mcpServers": {
    "pandoc-server": {
      "command": "python3",
      "args": ["-m", "mcpgateway.translate", "--stdio", "./dist/pandoc-server", "--port", "9000"]
    }
  }
}

Examples¶

Convert Markdown to HTML¶

{
  "tool": "pandoc",
  "arguments": {
    "from": "markdown",
    "to": "html",
    "input": "# Hello World\n\nThis is **bold** text.",
    "standalone": true,
    "title": "My Document"
  }
}

Response:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
  <meta charset="utf-8" />
  <title>My Document</title>
</head>
<body>
<h1 id="hello-world">Hello World</h1>
<p>This is <strong>bold</strong> text.</p>
</body>
</html>

Convert HTML to Markdown¶

{
  "tool": "pandoc",
  "arguments": {
    "from": "html",
    "to": "markdown",
    "input": "<h1>Title</h1><p>This is a <em>paragraph</em> with <strong>formatting</strong>.</p>"
  }
}

Create LaTeX Document with TOC¶

{
  "tool": "pandoc",
  "arguments": {
    "from": "markdown",
    "to": "latex",
    "input": "# Introduction\n\nThis is the introduction.\n\n# Main Content\n\nThis is the main content.\n\n# Conclusion\n\nThis is the conclusion.",
    "standalone": true,
    "title": "Research Paper",
    "toc": true,
    "metadata": "author=John Doe,date=2024-01-15"
  }
}

Convert DOCX to Plain Text¶

{
  "tool": "pandoc",
  "arguments": {
    "from": "docx",
    "to": "plain",
    "input": "<base64-encoded-docx-content>"
  }
}

List Available Formats¶

{
  "tool": "list-formats",
  "arguments": {
    "type": "input"
  }
}

Response:

{
  "success": true,
  "format_type": "input",
  "formats": [
    "markdown",
    "html",
    "latex",
    "rst",
    "docx",
    "epub",
    "json",
    "csv",
    "mediawiki",
    "org"
  ]
}

Check Pandoc Health¶

{
  "tool": "health",
  "arguments": {}
}

Response:

{
  "success": true,
  "pandoc_installed": true,
  "version": "2.19.2",
  "features": ["pdf-engine", "lua-filters", "bibliography"],
  "status": "healthy"
}

Integration¶

With MCP Gateway¶

The Pandoc server can be exposed over HTTP and SSE through MCP Gateway:

# Start pandoc server via MCP Gateway
python3 -m mcpgateway.translate --stdio "./dist/pandoc-server" --port 9000

# Register with MCP Gateway
curl -X POST http://localhost:8000/gateways \
  -H "Content-Type: application/json" \
  -d '{
    "name": "pandoc-server",
    "url": "http://localhost:9000",
    "description": "Document conversion server using Pandoc"
  }'


Programmatic Usage¶

// Example Go client usage
package main

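import (
    "context"
    "fmt"
    "log"

    // Placeholder import path: substitute the MCP Go client library you use.
    // The alias matches the mcp.NewStdioClient call below.
    mcp "github.com/your-org/mcp-go-client"
)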

func main() {
    client, err := mcp.NewStdioClient("./dist/pandoc-server")
    if err != nil {
        log.Fatal(err)
    }
    defer client.Close()

    // Convert markdown to HTML
    result, err := client.CallTool(context.Background(), "pandoc", map[string]any{
        "from":       "markdown",
        "to":         "html",
        "input":      "# Hello\n\nWorld!",
        "standalone": true,
    })
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println(result)
}

Supported Formats¶

Pandoc supports numerous input and output formats. Use the list-formats tool to see all available formats on your system.

Common Input Formats¶

markdown: Pandoc's extended Markdown
html: HTML documents
latex: LaTeX documents
rst: reStructuredText
docx: Microsoft Word documents
epub: EPUB e-books
json: Pandoc JSON format
csv: Comma-separated values
mediawiki: MediaWiki markup
org: Emacs Org mode

Common Output Formats¶

html: HTML documents
markdown: Markdown format
latex: LaTeX documents
pdf: PDF documents (requires a PDF engine; pandoc uses LaTeX's pdflatex by default)
docx: Microsoft Word documents
epub: EPUB e-books
plain: Plain text
json: Pandoc JSON format
asciidoc: AsciiDoc format
rst: reStructuredText

Advanced Features¶

Metadata Handling¶

{
  "tool": "pandoc",
  "arguments": {
    "from": "markdown",
    "to": "html",
    "input": "# Document\n\nContent here.",
    "standalone": true,
    "title": "My Article",
    "metadata": "author=John Doe,date=2024-01-15,keywords=documentation pandoc"
  }
}

Table of Contents Generation¶

{
  "tool": "pandoc",
  "arguments": {
    "from": "markdown",
    "to": "html",
    "input": "# Chapter 1\n\n## Section 1.1\n\n### Subsection 1.1.1\n\n# Chapter 2\n\n## Section 2.1",
    "standalone": true,
    "toc": true,
    "title": "Technical Manual"
  }
}

Batch Processing¶

// Example batch conversion (fragment: assumes the client from Programmatic Usage)
documents := []struct {
    input  string
    format string
}{
    {"# Doc 1\n\nContent 1", "html"},
    {"# Doc 2\n\nContent 2", "latex"},
    {"# Doc 3\n\nContent 3", "docx"},
}

for i, doc := range documents {
    result, err := client.CallTool(context.Background(), "pandoc", map[string]any{
        "from":       "markdown",
        "to":         doc.format,
        "input":      doc.input,
        "standalone": true,
        "title":      fmt.Sprintf("Document %d", i+1),
    })
    if err != nil {
        log.Printf("Error converting doc %d: %v", i+1, err)
        continue
    }

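    // Process result; logging it here also keeps the variable in use,
    // since Go rejects declared-but-unused variables.
    log.Printf("Converted doc %d: %v", i+1, result)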
}

Use Cases¶

Documentation Workflows¶

Convert Markdown documentation to HTML for web publishing
Generate PDF versions of documentation from Markdown sources
Transform reStructuredText to various output formats

Content Publishing¶

Convert blog posts between different markup formats
Generate e-books (EPUB) from Markdown sources
Create presentation slides from Markdown

Academic Writing¶

Convert between LaTeX and Word formats for collaboration
Generate bibliographies and citations
Create formatted academic papers

Report Generation¶

Convert data reports to multiple output formats
Generate executive summaries in different formats
Create standardized document templates

Migration Projects¶

Convert legacy document formats to modern alternatives
Batch process document archives
Standardize document formats across organizations

Error Handling¶

The server reports errors for the following conditions:

Missing Pandoc Installation: Clear error messages with installation guidance
Unsupported Format Combinations: Validation of input/output format compatibility
Invalid Input Content: Proper error reporting for malformed documents
Conversion Failures: Detailed pandoc error messages
Resource Limits: Handling of large documents and memory constraints

Development¶

Building from Source¶

# Format code
make fmt

# Run tests
make test

# Tidy dependencies
make tidy

# Build binary
make build

Testing¶

# Run all tests
make test

# Run all packages with verbose output
go test -v ./...

# Test with coverage
go test -cover ./...

Docker Development¶

# Build development image
docker build -t pandoc-server:dev .

# Run tests in container
docker run --rm pandoc-server:dev make test

# Interactive development shell
docker run --rm -it pandoc-server:dev /bin/sh

Performance Considerations¶

Pandoc Startup Overhead: Each conversion spawns a new pandoc process
Large Documents: Memory usage scales with document size
Complex Formats: PDF generation requires a PDF engine (pandoc defaults to pdflatex, which needs a LaTeX installation) and is slower than text-to-text conversion
Concurrent Requests: The server can handle multiple simultaneous conversions
Caching: Consider implementing caching for frequently converted content
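
The caching suggestion above could be sketched as an in-memory map keyed by a hash of the conversion inputs. This is an illustrative design only: `cacheKey`, `Cache`, and `NewCache` are hypothetical names, and a real key would also need to cover options such as standalone, toc, and metadata.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// cacheKey hashes the formats and input into a fixed-length key.
// Length prefixes keep ("ab", "c") distinct from ("a", "bc").
func cacheKey(from, to, input string) string {
	h := sha256.New()
	fmt.Fprintf(h, "%d:%s|%d:%s|%s", len(from), from, len(to), to, input)
	return hex.EncodeToString(h.Sum(nil))
}

// Cache is a minimal concurrency-safe store with no eviction.
type Cache struct {
	mu    sync.Mutex
	items map[string]string
}

// NewCache returns an empty cache.
func NewCache() *Cache {
	return &Cache{items: make(map[string]string)}
}

// Get returns the cached output for key, if present.
func (c *Cache) Get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.items[key]
	return v, ok
}

// Put stores a conversion result under key.
func (c *Cache) Put(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = value
}

func main() {
	cache := NewCache()
	key := cacheKey("markdown", "html", "# Hello")
	if _, ok := cache.Get(key); !ok {
		// A real server would run pandoc here; the value is a stand-in.
		cache.Put(key, "<h1>Hello</h1>")
	}
	out, _ := cache.Get(key)
	fmt.Println(out)
}
```

Because the map never evicts entries, a production version would likely add a size bound or expiry.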

Security Considerations¶

Input Validation: The server validates input formats and content
Process Isolation: Each pandoc conversion runs in a separate process
Resource Limits: Consider implementing timeouts for long-running conversions
File System Access: Pandoc may access local files for includes and templates

Limitations¶

Format Support: Available formats depend on pandoc installation and features
Binary Content: Some formats require special handling for binary content
Template Dependencies: Custom templates and includes may require additional setup
PDF Generation: PDF output requires a PDF engine; pandoc's default, pdflatex, needs a LaTeX installation
Large Files: Very large documents may hit memory or processing limits