Pandoc Server¶
Overview¶
The Pandoc MCP Server exposes pandoc's document conversion through the Model Context Protocol (MCP). This Go-based server converts text between 30+ document formats, with support for standalone documents, table of contents generation, and custom metadata. It acts as a bridge between MCP clients and the pandoc command-line tool.
Key Features¶
- Convert between 30+ document formats: Supports markdown, HTML, LaTeX, PDF, DOCX, EPUB, and many more
- Standalone document generation: Create complete, self-contained documents
- Table of contents support: Automatically generate TOCs for supported formats
- Custom metadata handling: Add titles, authors, and other metadata to documents
- Format discovery tools: List available input and output formats
- Health monitoring: Check pandoc installation and version information
Quick Start¶
Prerequisites¶
Pandoc must be installed on the system:
# Ubuntu/Debian
sudo apt install pandoc
# macOS
brew install pandoc
# Windows: Download from pandoc.org
# Verify installation
pandoc --version
Building from source also requires Go 1.23 or later.
Installation¶
From Source¶
# Clone the repository
git clone <repository-url>
cd pandoc-server
# Install dependencies
go mod download
# Build the server
make build
Using Docker¶
# Build the Docker image
docker build -t pandoc-server .
# Run the container
docker run -i pandoc-server
Running the Server¶
# Run the built server
./dist/pandoc-server
# Or with MCP Gateway for HTTP/SSE access
python3 -m mcpgateway.translate --stdio "./dist/pandoc-server" --port 9000
Available Tools¶
pandoc¶
Convert text from one format to another using pandoc.
Parameters:
- from (required): Input format (e.g., markdown, html, latex, rst, docx, epub)
- to (required): Output format (e.g., html, markdown, latex, pdf, docx, plain)
- input (required): The text content to convert
- standalone: Produce a standalone document (default: false)
- title: Document title for standalone documents
- metadata: Additional metadata as comma-separated key=value pairs (e.g., author=John Doe,date=2024-01-15)
- toc: Include table of contents (default: false)
list-formats¶
List available pandoc input and output formats.
Parameters:
- type: Format type to list - 'input', 'output', or 'all' (default: 'all')
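One plausible way to implement this tool is to call pandoc's own listing flags, `--list-input-formats` and `--list-output-formats`, which print one format per line. The sketch below assumes that approach; `listFlags` and `listFormats` are hypothetical helper names, and the server may work differently.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// listFlags maps the documented `type` values to pandoc's listing flags.
func listFlags(formatType string) ([]string, error) {
	switch formatType {
	case "input":
		return []string{"--list-input-formats"}, nil
	case "output":
		return []string{"--list-output-formats"}, nil
	case "", "all":
		return []string{"--list-input-formats", "--list-output-formats"}, nil
	}
	return nil, fmt.Errorf("unknown format type %q", formatType)
}

// listFormats runs pandoc once per flag and collects the printed formats.
func listFormats(formatType string) ([]string, error) {
	flags, err := listFlags(formatType)
	if err != nil {
		return nil, err
	}
	var formats []string
	for _, flag := range flags {
		out, err := exec.Command("pandoc", flag).Output()
		if err != nil {
			return nil, fmt.Errorf("pandoc %s: %w", flag, err)
		}
		formats = append(formats, strings.Fields(string(out))...)
	}
	return formats, nil
}

func main() {
	formats, err := listFormats("input")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(formats), "input formats available")
}
```

With `all`, formats that pandoc can both read and write appear twice in the combined slice; deduplicate if a single list is wanted.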
health¶
Check if pandoc is installed and return version information.
Returns:
- Pandoc installation status
- Version information
- Available features and extensions
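A health check of this kind can be sketched as a PATH lookup followed by parsing the first line of `pandoc --version`, which has the form `pandoc 2.19.2`. The helper names `parseVersion` and `checkHealth` are assumptions for illustration; the server's actual check may report more detail.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// parseVersion extracts the version number from the first line of
// `pandoc --version` output, e.g. "pandoc 2.19.2".
func parseVersion(output string) string {
	firstLine, _, _ := strings.Cut(output, "\n")
	fields := strings.Fields(firstLine)
	if len(fields) < 2 {
		return ""
	}
	return fields[1]
}

// checkHealth reports whether pandoc is on PATH and, if so, its version.
func checkHealth() (installed bool, version string) {
	path, err := exec.LookPath("pandoc")
	if err != nil {
		return false, ""
	}
	out, err := exec.Command(path, "--version").Output()
	if err != nil {
		return true, ""
	}
	return true, parseVersion(string(out))
}

func main() {
	installed, version := checkHealth()
	fmt.Printf("pandoc_installed=%t version=%q\n", installed, version)
}
```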
Configuration¶
MCP Client Configuration¶
Via MCP Gateway¶
{
"mcpServers": {
"pandoc-server": {
"command": "python3",
"args": ["-m", "mcpgateway.translate", "--stdio", "./dist/pandoc-server", "--port", "9000"]
}
}
}
Examples¶
Convert Markdown to HTML¶
{
"tool": "pandoc",
"arguments": {
"from": "markdown",
"to": "html",
"input": "# Hello World\n\nThis is **bold** text.",
"standalone": true,
"title": "My Document"
}
}
Response:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
<meta charset="utf-8" />
<title>My Document</title>
</head>
<body>
<h1 id="hello-world">Hello World</h1>
<p>This is <strong>bold</strong> text.</p>
</body>
</html>
Convert HTML to Markdown¶
{
"tool": "pandoc",
"arguments": {
"from": "html",
"to": "markdown",
"input": "<h1>Title</h1><p>This is a <em>paragraph</em> with <strong>formatting</strong>.</p>"
}
}
Create LaTeX Document with TOC¶
{
"tool": "pandoc",
"arguments": {
"from": "markdown",
"to": "latex",
"input": "# Introduction\n\nThis is the introduction.\n\n# Main Content\n\nThis is the main content.\n\n# Conclusion\n\nThis is the conclusion.",
"standalone": true,
"title": "Research Paper",
"toc": true,
"metadata": "author=John Doe,date=2024-01-15"
}
}
Convert DOCX to Plain Text¶
{
"tool": "pandoc",
"arguments": {
"from": "docx",
"to": "plain",
"input": "<base64-encoded-docx-content>"
}
}
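Because DOCX is a binary format, the example above carries the file as base64 text. The following sketch shows one way a client could prepare that request. It assumes the server accepts standard (padded) base64, which the source does not specify, and `report.docx` is a placeholder path.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"os"
)

// encodeInput returns standard base64 text for binary document bytes.
func encodeInput(data []byte) string {
	return base64.StdEncoding.EncodeToString(data)
}

func main() {
	// "report.docx" is a placeholder path for illustration.
	data, err := os.ReadFile("report.docx")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read failed:", err)
		os.Exit(1)
	}
	request := map[string]any{
		"tool": "pandoc",
		"arguments": map[string]any{
			"from":  "docx",
			"to":    "plain",
			"input": encodeInput(data),
		},
	}
	payload, err := json.MarshalIndent(request, "", "  ")
	if err != nil {
		fmt.Fprintln(os.Stderr, "encode failed:", err)
		os.Exit(1)
	}
	fmt.Println(string(payload))
}
```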
List Available Formats¶
{
"tool": "list-formats",
"arguments": {
"type": "input"
}
}
Response:
{
"success": true,
"format_type": "input",
"formats": [
"markdown",
"html",
"latex",
"rst",
"docx",
"epub",
"json",
"csv",
"mediawiki",
"org"
]
}
Check Pandoc Health¶
{
"tool": "health",
"arguments": {}
}
Response:
{
"success": true,
"pandoc_installed": true,
"version": "2.19.2",
"features": ["pdf-engine", "lua-filters", "bibliography"],
"status": "healthy"
}
Integration¶
With MCP Gateway¶
The Pandoc server can be exposed over HTTP and SSE through MCP Gateway:
# Start pandoc server via MCP Gateway
python3 -m mcpgateway.translate --stdio "./dist/pandoc-server" --port 9000
# Register with MCP Gateway
curl -X POST http://localhost:8000/gateways \
-H "Content-Type: application/json" \
-d '{
"name": "pandoc-server",
"url": "http://localhost:9000",
"description": "Document conversion server using Pandoc"
}'
Programmatic Usage¶
// Example Go client usage.
// The import path is a placeholder; substitute the MCP client library you use.
package main

import (
	"context"
	"fmt"
	"log"

	// Alias the package so the mcp.NewStdioClient call below resolves
	mcp "github.com/your-org/mcp-go-client"
)

func main() {
	// Launch the server as a child process and talk to it over stdio
	client, err := mcp.NewStdioClient("./dist/pandoc-server")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Convert markdown to HTML
	result, err := client.CallTool(context.Background(), "pandoc", map[string]any{
		"from":       "markdown",
		"to":         "html",
		"input":      "# Hello\n\nWorld!",
		"standalone": true,
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(result)
}
Supported Formats¶
Pandoc supports numerous input and output formats. Use the list-formats tool to see all available formats on your system.
Common Input Formats¶
- markdown: Pandoc's extended Markdown
- html: HTML documents
- latex: LaTeX documents
- rst: reStructuredText
- docx: Microsoft Word documents
- epub: EPUB e-books
- json: Pandoc JSON format
- csv: Comma-separated values
- mediawiki: MediaWiki markup
- org: Emacs Org mode
Common Output Formats¶
- html: HTML documents
- markdown: Markdown format
- latex: LaTeX documents
- pdf: PDF documents (requires LaTeX)
- docx: Microsoft Word documents
- epub: EPUB e-books
- plain: Plain text
- json: Pandoc JSON format
- asciidoc: AsciiDoc format
- rst: reStructuredText
Advanced Features¶
Metadata Handling¶
{
"tool": "pandoc",
"arguments": {
"from": "markdown",
"to": "html",
"input": "# Document\n\nContent here.",
"standalone": true,
"title": "My Article",
"metadata": "author=John Doe,date=2024-01-15,keywords=documentation pandoc"
}
}
Table of Contents Generation¶
{
"tool": "pandoc",
"arguments": {
"from": "markdown",
"to": "html",
"input": "# Chapter 1\n\n## Section 1.1\n\n### Subsection 1.1.1\n\n# Chapter 2\n\n## Section 2.1",
"standalone": true,
"toc": true,
"title": "Technical Manual"
}
}
Batch Processing¶
// Example batch conversion (reuses the client from the Programmatic Usage example)
documents := []struct {
	input  string
	format string
}{
	{"# Doc 1\n\nContent 1", "html"},
	{"# Doc 2\n\nContent 2", "latex"},
	{"# Doc 3\n\nContent 3", "docx"},
}
for i, doc := range documents {
	result, err := client.CallTool(context.Background(), "pandoc", map[string]any{
		"from":       "markdown",
		"to":         doc.format,
		"input":      doc.input,
		"standalone": true,
		"title":      fmt.Sprintf("Document %d", i+1),
	})
	if err != nil {
		// Log the failure and move on to the next document
		log.Printf("Error converting doc %d: %v", i+1, err)
		continue
	}
	// Process the result; printing stands in for real handling
	fmt.Printf("Document %d converted to %s: %v\n", i+1, doc.format, result)
}
Use Cases¶
Documentation Workflows¶
- Convert Markdown documentation to HTML for web publishing
- Generate PDF versions of documentation from Markdown sources
- Transform reStructuredText to various output formats
Content Publishing¶
- Convert blog posts between different markup formats
- Generate e-books (EPUB) from Markdown sources
- Create presentation slides from Markdown
Academic Writing¶
- Convert between LaTeX and Word formats for collaboration
- Generate bibliographies and citations
- Create formatted academic papers
Report Generation¶
- Convert data reports to multiple output formats
- Generate executive summaries in different formats
- Create standardized document templates
Migration Projects¶
- Convert legacy document formats to modern alternatives
- Batch process document archives
- Standardize document formats across organizations
Error Handling¶
The server reports errors for the following conditions:
- Missing Pandoc Installation: Clear error messages with installation guidance
- Unsupported Format Combinations: Validation of input/output format compatibility
- Invalid Input Content: Proper error reporting for malformed documents
- Conversion Failures: Detailed pandoc error messages
- Resource Limits: Handling of large documents and memory constraints
Development¶
Building from Source¶
# Format code
make fmt
# Run tests
make test
# Tidy dependencies
make tidy
# Build binary
make build
Testing¶
# Run all tests
make test
# Run tests with verbose output (add -run <TestName> to select specific tests)
go test -v ./...
# Test with coverage
go test -cover ./...
Docker Development¶
# Build development image
docker build -t pandoc-server:dev .
# Run tests in container
docker run --rm pandoc-server:dev make test
# Interactive development shell
docker run --rm -it pandoc-server:dev /bin/sh
Performance Considerations¶
- Pandoc Startup Overhead: Each conversion spawns a new pandoc process
- Large Documents: Memory usage scales with document size
- Complex Formats: PDF generation requires LaTeX installation and is slower
- Concurrent Requests: The server can handle multiple simultaneous conversions
- Caching: Consider implementing caching for frequently converted content
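As an illustration of the caching suggestion, the sketch below keys an in-memory cache on a SHA-256 hash of everything that affects the output, so that repeated conversions skip the pandoc process spawn. All names here (`cacheKey`, `Cache`, `GetOrConvert`) are hypothetical, and the cache is unbounded; a production version would need eviction.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// cacheKey hashes everything that affects the output of a conversion.
func cacheKey(from, to, options, input string) string {
	h := sha256.New()
	for _, part := range []string{from, to, options, input} {
		// The length prefix keeps ("ab", "c") distinct from ("a", "bc").
		fmt.Fprintf(h, "%d:%s", len(part), part)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// Cache is a concurrency-safe in-memory map from key to converted output.
type Cache struct {
	mu      sync.Mutex
	entries map[string]string
}

func NewCache() *Cache { return &Cache{entries: make(map[string]string)} }

// GetOrConvert returns a cached result, or calls convert and stores its output.
func (c *Cache) GetOrConvert(key string, convert func() (string, error)) (string, error) {
	c.mu.Lock()
	out, ok := c.entries[key]
	c.mu.Unlock()
	if ok {
		return out, nil
	}
	out, err := convert()
	if err != nil {
		return "", err
	}
	c.mu.Lock()
	c.entries[key] = out
	c.mu.Unlock()
	return out, nil
}

func main() {
	cache := NewCache()
	calls := 0
	convert := func() (string, error) {
		calls++ // stands in for spawning pandoc
		return "<h1>Hello</h1>", nil
	}
	key := cacheKey("markdown", "html", "standalone=false", "# Hello")
	cache.GetOrConvert(key, convert)
	cache.GetOrConvert(key, convert)
	fmt.Println("conversions run:", calls) // prints "conversions run: 1"
}
```

The lock is released while `convert` runs, so two concurrent requests for the same uncached key may both convert; that trades a little duplicate work for not blocking unrelated requests.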
Security Considerations¶
- Input Validation: The server validates input formats and content
- Process Isolation: Each pandoc conversion runs in a separate process
- Resource Limits: Consider implementing timeouts for long-running conversions
- File System Access: Pandoc may access local files for includes and templates
Limitations¶
- Format Support: Available formats depend on pandoc installation and features
- Binary Content: Some formats require special handling for binary content
- Template Dependencies: Custom templates and includes may require additional setup
- PDF Generation: Requires LaTeX installation for PDF output
- Large Files: Very large documents may hit memory or processing limits