LibreOffice Server

Overview

The LibreOffice MCP Server converts documents by running LibreOffice in headless mode. It converts between formats such as PDF, DOCX, ODT, and HTML, and also provides batch conversion, text extraction, document metadata lookup, and PDF merging.

Key Features

  • Document Conversion: Convert between multiple formats (PDF, DOCX, ODT, HTML, TXT, etc.)
  • Batch Processing: Convert multiple documents at once
  • Text Extraction: Extract text content from documents
  • Document Merging: Merge PDF documents (requires pdftk)
  • Document Analysis: Get document information and metadata
  • Format Support: Wide range of input and output formats via LibreOffice

Quick Start

Prerequisites

LibreOffice must be installed:

# Ubuntu/Debian
sudo apt install libreoffice

# macOS
brew install --cask libreoffice

# Windows: Download from libreoffice.org

Optional - for PDF merging:

# Ubuntu/Debian
sudo apt install pdftk

# macOS
brew install pdftk-java
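
Before starting the server, it can help to confirm that the prerequisites are reachable. The sketch below probes PATH for the executables; the names `soffice` and `libreoffice` are the common launcher names, but which one exists (and whether it is on PATH at all, particularly on macOS and Windows) depends on how LibreOffice was installed. The helper `find_tools` is hypothetical and not part of the server.

```python
import shutil
from typing import Callable, Dict, Optional, Sequence

# Executable names to probe. Assumption: LibreOffice is exposed as
# `soffice` or `libreoffice`; adjust for your platform if needed.
TOOLS: Dict[str, Sequence[str]] = {
    "libreoffice": ("soffice", "libreoffice"),
    "pdftk": ("pdftk",),
}


def find_tools(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, Optional[str]]:
    """Return the first resolved path for each tool, or None if not on PATH."""
    found: Dict[str, Optional[str]] = {}
    for tool, names in TOOLS.items():
        found[tool] = next((path for path in map(which, names) if path), None)
    return found


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool}: {path or 'not found on PATH'}")
```

A `None` result only means the executable was not found on PATH; the application may still be installed elsewhere.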

Installation

# Install in development mode
make dev-install

# Or install normally
make install

Running the Server

# Stdio mode (for Claude Desktop, IDEs)
make dev

# HTTP mode (via MCP Gateway)
make serve-http

Available Tools

convert_document

Convert a single document to another format.

Parameters:

  • input_file (required): Path to input document
  • output_format (required): Target format (pdf, docx, odt, html, txt, etc.)
  • output_dir: Output directory (default: same as input file)
  • output_filename: Custom output filename
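
For orientation, the parameters above map closely onto LibreOffice's own headless command line (`--headless`, `--convert-to`, `--outdir`). The following is a minimal sketch of how such a command could be assembled; it is not the server's actual implementation, and the function name `build_convert_command` is hypothetical.

```python
from pathlib import Path
from typing import List


def build_convert_command(
    input_file: str, output_format: str, output_dir: str = ""
) -> List[str]:
    """Build a headless LibreOffice conversion command (illustrative only)."""
    src = Path(input_file)
    # Mirror the documented default: write next to the input file.
    out_dir = Path(output_dir) if output_dir else src.parent
    return [
        "soffice",          # assumption: launcher name on PATH
        "--headless",
        "--convert-to", output_format,
        "--outdir", str(out_dir),
        str(src),
    ]


if __name__ == "__main__":
    print(" ".join(build_convert_command("report.docx", "pdf", "./converted")))
```

LibreOffice names the output after the input file's base name, so a custom `output_filename` would presumably be applied by renaming the result afterwards.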

convert_batch

Convert multiple documents to the same format.

Parameters:

  • input_files (required): List of input file paths
  • output_format (required): Target format for all files
  • output_dir: Output directory (default: "./converted")

merge_documents

Merge multiple documents (PDF merging requires pdftk).

Parameters:

  • input_files (required): List of document paths to merge
  • output_file (required): Path for merged document
  • format: Output format (default: "pdf")
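
Since PDF merging depends on pdftk, it may be useful to see what a pdftk merge looks like. pdftk concatenates files with its `cat` operation (`pdftk a.pdf b.pdf cat output merged.pdf`). The sketch below only builds that argument list; whether the server invokes pdftk in exactly this way is an assumption, and `build_merge_command` is a hypothetical helper.

```python
from typing import List, Sequence


def build_merge_command(input_files: Sequence[str], output_file: str) -> List[str]:
    """Build a pdftk command that concatenates PDFs in the given order."""
    if len(input_files) < 2:
        raise ValueError("need at least two PDF files to merge")
    if output_file in input_files:
        # Writing the result over one of the inputs would corrupt the merge.
        raise ValueError("output_file must not be one of the input files")
    return ["pdftk", *input_files, "cat", "output", output_file]


if __name__ == "__main__":
    print(" ".join(build_merge_command(["chapter1.pdf", "chapter2.pdf"], "book.pdf")))
```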

extract_text

Extract text content from documents.

Parameters:

  • input_file (required): Path to input document
  • output_file: Path for extracted text file
  • preserve_formatting: Keep basic formatting (default: false)

get_document_info

Get document metadata and statistics.

Parameters:

  • input_file (required): Path to document

list_supported_formats

List all supported input/output formats.

Returns:

  • Available input formats
  • Available output formats
  • Format descriptions and capabilities

Configuration

MCP Client Configuration

{
  "mcpServers": {
    "libreoffice-server": {
      "command": "python",
      "args": ["-m", "libreoffice_server.server_fastmcp"],
      "cwd": "/path/to/libreoffice_server"
    }
  }
}

Examples

Convert DOCX to PDF

{
  "input_file": "presentation.docx",
  "output_format": "pdf",
  "output_dir": "./converted",
  "output_filename": "presentation_final.pdf"
}

Batch Convert Multiple Documents

{
  "input_files": ["doc1.docx", "doc2.odt", "doc3.rtf"],
  "output_format": "pdf",
  "output_dir": "./batch_output"
}

Extract Text from PDF

{
  "input_file": "document.pdf",
  "output_file": "extracted_text.txt",
  "preserve_formatting": true
}

Merge PDF Documents

{
  "input_files": ["chapter1.pdf", "chapter2.pdf", "chapter3.pdf"],
  "output_file": "complete_book.pdf",
  "format": "pdf"
}

Get Document Information

{
  "input_file": "./report.docx"
}

Response:

{
  "success": true,
  "file_info": {
    "filename": "report.docx",
    "size": 245760,
    "format": "Microsoft Word Document",
    "created": "2024-01-15T10:30:00",
    "modified": "2024-01-15T14:20:00"
  },
  "document_info": {
    "title": "Monthly Report",
    "author": "John Doe",
    "subject": "Sales Analysis",
    "page_count": 12,
    "word_count": 2350
  },
  "conversion_capabilities": ["pdf", "odt", "html", "txt", "rtf"]
}
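
A response shaped like the one above can be condensed into a one-line summary on the client side. This is a small sketch that assumes the field names shown in the example and assumes `size` is reported in bytes (the unit is not stated); `summarize_info` is a hypothetical helper.

```python
def summarize_info(response: dict) -> str:
    """Condense a get_document_info response into a single line."""
    if not response.get("success"):
        return "document info unavailable"
    file_info = response.get("file_info", {})
    doc_info = response.get("document_info", {})
    size_kib = file_info.get("size", 0) / 1024  # assumption: size is in bytes
    return (
        f"{file_info.get('filename', '?')}: {size_kib:.1f} KiB, "
        f"{doc_info.get('page_count', '?')} pages, "
        f"{doc_info.get('word_count', '?')} words"
    )


if __name__ == "__main__":
    example = {
        "success": True,
        "file_info": {"filename": "report.docx", "size": 245760},
        "document_info": {"page_count": 12, "word_count": 2350},
    }
    print(summarize_info(example))
```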

List Supported Formats

{}

Response:

{
  "success": true,
  "input_formats": [
    {"extension": "docx", "description": "Microsoft Word Document"},
    {"extension": "odt", "description": "OpenDocument Text"},
    {"extension": "pdf", "description": "Portable Document Format"},
    {"extension": "html", "description": "HyperText Markup Language"}
  ],
  "output_formats": [
    {"extension": "pdf", "description": "Portable Document Format"},
    {"extension": "docx", "description": "Microsoft Word Document"},
    {"extension": "odt", "description": "OpenDocument Text"},
    {"extension": "html", "description": "HyperText Markup Language"}
  ]
}
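
A client can use this response to reject an unsupported conversion before submitting it. The sketch below assumes the response has already been decoded into a dict with the `input_formats` and `output_formats` keys shown above; `can_convert` is a hypothetical helper, and it only checks that each extension appears in the respective list, not that the specific pairing is supported.

```python
def can_convert(formats: dict, source_ext: str, target_ext: str) -> bool:
    """Check both extensions against a list_supported_formats response."""
    inputs = {f["extension"] for f in formats.get("input_formats", [])}
    outputs = {f["extension"] for f in formats.get("output_formats", [])}
    source = source_ext.lower().lstrip(".")
    target = target_ext.lower().lstrip(".")
    return source in inputs and target in outputs


if __name__ == "__main__":
    sample = {
        "input_formats": [{"extension": "docx"}, {"extension": "odt"}],
        "output_formats": [{"extension": "pdf"}, {"extension": "html"}],
    }
    print(can_convert(sample, ".docx", "pdf"))
```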

Integration

With MCP Gateway

# Start the LibreOffice server via HTTP
make serve-http

# Register with MCP Gateway
curl -X POST http://localhost:8000/gateways \
  -H "Content-Type: application/json" \
  -d '{
    "name": "libreoffice-server",
    "url": "http://localhost:9000",
    "description": "Document conversion server using LibreOffice"
  }'

Programmatic Usage

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def convert_documents():
    server_params = StdioServerParameters(
        command="python",
        args=["-m", "libreoffice_server.server_fastmcp"]
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Convert single document
            result = await session.call_tool("convert_document", {
                "input_file": "./document.docx",
                "output_format": "pdf",
                "output_dir": "./converted"
            })

            # Batch convert
            batch_result = await session.call_tool("convert_batch", {
                "input_files": ["file1.docx", "file2.odt"],
                "output_format": "pdf",
                "output_dir": "./batch_converted"
            })

asyncio.run(convert_documents())
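
The example above discards the values returned by `call_tool`. In the MCP Python SDK, a tool result carries an `isError` flag and a list of content items, where text items expose a `text` attribute. The sketch below assumes this server returns its JSON payload as a text item, which is not confirmed here; `parse_tool_result` is a hypothetical helper and uses duck typing so it can be tried without a running server.

```python
import json
from types import SimpleNamespace
from typing import Any, Optional


def parse_tool_result(result: Any) -> Optional[dict]:
    """Return the first text content item decoded as a JSON object, or None."""
    if getattr(result, "isError", False):
        return None
    for item in getattr(result, "content", None) or []:
        text = getattr(item, "text", None)
        if text is None:
            continue  # skip non-text content such as images
        try:
            payload = json.loads(text)
        except json.JSONDecodeError:
            continue
        if isinstance(payload, dict):
            return payload
    return None


if __name__ == "__main__":
    # Stand-in object with the same attributes as a tool result.
    fake = SimpleNamespace(
        isError=False,
        content=[SimpleNamespace(type="text", text='{"success": true}')],
    )
    print(parse_tool_result(fake))
```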

Supported Formats

Input Formats

  • Documents: DOC, DOCX, ODT, RTF, TXT, HTML, HTM, PDF
  • Spreadsheets: XLS, XLSX, ODS, CSV
  • Presentations: PPT, PPTX, ODP

Output Formats

  • Documents: PDF, DOCX, ODT, HTML, TXT, RTF
  • Spreadsheets: XLSX, ODS, CSV
  • Presentations: PPTX, ODP
  • Images: PNG, JPG, SVG (for presentations)

Advanced Features

Batch Processing with Custom Output

# Convert multiple files with custom naming.
# `session` is an initialized ClientSession, as in Programmatic Usage.
async def convert_with_custom_names(session):
    files_to_convert = [
        {"input": "report_q1.docx", "output_name": "Q1_Report_Final.pdf"},
        {"input": "report_q2.docx", "output_name": "Q2_Report_Final.pdf"},
        {"input": "report_q3.docx", "output_name": "Q3_Report_Final.pdf"}
    ]

    # convert_batch has no per-file name, so call convert_document per file
    for file_info in files_to_convert:
        await session.call_tool("convert_document", {
            "input_file": file_info["input"],
            "output_format": "pdf",
            "output_filename": file_info["output_name"]
        })

Document Pipeline Processing

# Multi-step document processing.
# `session` is an initialized ClientSession, as in Programmatic Usage.
async def process_document_pipeline(session, input_file):
    # Step 1: Get document info
    info = await session.call_tool("get_document_info", {
        "input_file": input_file
    })

    # Step 2: Extract text for analysis
    await session.call_tool("extract_text", {
        "input_file": input_file,
        "output_file": f"{input_file}_text.txt"
    })

    # Step 3: Convert to PDF for archival
    await session.call_tool("convert_document", {
        "input_file": input_file,
        "output_format": "pdf",
        "output_dir": "./archive"
    })

    # Step 4: Convert to HTML for web display
    await session.call_tool("convert_document", {
        "input_file": input_file,
        "output_format": "html",
        "output_dir": "./web"
    })

    # Return the metadata gathered in step 1 to the caller
    return info

Document Merging Workflow

from pathlib import Path

# Merge multiple documents into a single PDF.
# `session` is an initialized ClientSession, as in Programmatic Usage.
async def merge_chapters(session):
    chapters = ["intro.docx", "chapter1.docx", "chapter2.docx", "conclusion.docx"]

    # First convert all to PDF
    pdf_files = []
    for chapter in chapters:
        await session.call_tool("convert_document", {
            "input_file": chapter,
            "output_format": "pdf",
            "output_dir": "./temp_pdfs"
        })
        # The converted file is expected to keep the input's base name
        pdf_files.append(f"./temp_pdfs/{Path(chapter).with_suffix('.pdf').name}")

    # Then merge all PDFs, in chapter order
    await session.call_tool("merge_documents", {
        "input_files": pdf_files,
        "output_file": "./final_book.pdf"
    })

Use Cases

Document Workflow Automation

  • Convert incoming documents to standardized formats
  • Batch process document archives
  • Create PDF versions for legal compliance

Content Management Systems

  • Convert user uploads to web-friendly formats
  • Generate multiple format versions for different platforms
  • Extract text for search indexing

Publishing Workflows

  • Convert manuscripts between formats
  • Generate print and digital versions
  • Merge chapters into complete publications

Business Process Automation

  • Convert reports to PDF for distribution
  • Extract data from documents for processing
  • Standardize document formats across organization

Digital Archive Management

  • Convert legacy documents to modern formats
  • Create searchable text versions
  • Generate preservation-quality PDFs

Performance Considerations

  • LibreOffice startup overhead affects single conversions
  • Batch processing is more efficient for multiple files
  • Large documents may require increased timeout values
  • Complex formatting may not be perfectly preserved
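
How the server's own timeout is configured is not described here, but a client can bound its wait for a slow conversion. The following sketch wraps a tool call in `asyncio.wait_for`; `call_with_timeout` is a hypothetical helper, and `session` is any object with an async `call_tool(name, arguments)` method, such as an initialized `ClientSession`.

```python
import asyncio
from typing import Any, Dict


async def call_with_timeout(
    session: Any, tool: str, arguments: Dict[str, Any], timeout_s: float = 120.0
) -> Any:
    """Await a tool call, raising TimeoutError if it exceeds timeout_s."""
    try:
        return await asyncio.wait_for(
            session.call_tool(tool, arguments), timeout=timeout_s
        )
    except asyncio.TimeoutError:
        raise TimeoutError(f"{tool} did not finish within {timeout_s} s") from None


class _SlowSession:
    """Stand-in session used only to demonstrate the helper."""

    async def call_tool(self, name: str, arguments: Dict[str, Any]) -> Dict[str, str]:
        await asyncio.sleep(0.05)
        return {"tool": name}


if __name__ == "__main__":
    print(asyncio.run(call_with_timeout(_SlowSession(), "convert_document", {}, 1.0)))
```

This only stops the client from waiting; the server may still finish the conversion in the background.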

Error Handling

The server detects and reports errors in the following categories:

  • LibreOffice Installation: Detection and guidance for missing LibreOffice
  • Format Support: Clear messages for unsupported format combinations
  • File Access: Permission and file existence errors
  • Conversion Failures: Detailed error messages from LibreOffice
  • Resource Limits: Handling of large files and memory constraints
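
Some of these errors can be caught on the client before a request is sent. The sketch below checks that the input file exists, is readable, and has an extension from the input formats listed under Supported Formats. `precheck_input` is a hypothetical helper; the server's own checks and messages may differ.

```python
import os
from pathlib import Path
from typing import List

# Input extensions taken from the Supported Formats section.
INPUT_EXTENSIONS = {
    "doc", "docx", "odt", "rtf", "txt", "html", "htm", "pdf",
    "xls", "xlsx", "ods", "csv",
    "ppt", "pptx", "odp",
}


def precheck_input(input_file: str) -> List[str]:
    """Return a list of problems found; an empty list means no obvious issue."""
    problems: List[str] = []
    path = Path(input_file)
    if not path.is_file():
        problems.append(f"file not found: {input_file}")
    elif not os.access(path, os.R_OK):
        problems.append(f"file not readable: {input_file}")
    if path.suffix.lower().lstrip(".") not in INPUT_EXTENSIONS:
        problems.append(f"unsupported input extension: {path.suffix or '(none)'}")
    return problems


if __name__ == "__main__":
    for problem in precheck_input("./report.docx"):
        print(problem)
```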

Limitations

  • LibreOffice conversion quality depends on the version installed
  • Some complex formatting may not be preserved during conversion
  • PDF merging requires additional tools like pdftk
  • Large files may take longer to process
  • Some proprietary formats may have limited support