Observability

ContextForge includes production-grade OpenTelemetry instrumentation for distributed tracing, enabling you to monitor performance, debug issues, and understand request flows across your gateway instances.

Overview

The observability implementation is vendor-agnostic at the transport/export layer and works with any OTLP-compatible backend. ContextForge also supports optional Langfuse-oriented span enrichment when you point OTLP at a Langfuse ingestion endpoint, or when you explicitly enable that schema with OTEL_EMIT_LANGFUSE_ATTRIBUTES=true.
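For example, pointing the OTLP/HTTP exporter at Langfuse Cloud could look like the sketch below (the keys are placeholders; see the Configuration Reference later in this page for the full set of variables):

# Sketch: export traces to Langfuse Cloud over OTLP/HTTP (placeholder keys)
export OTEL_ENABLE_OBSERVABILITY=true
export OTEL_TRACES_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export LANGFUSE_OTEL_ENDPOINT=https://cloud.langfuse.com/api/public/otel/v1/traces
export LANGFUSE_PUBLIC_KEY=pk-lf-...
export LANGFUSE_SECRET_KEY=sk-lf-...
export OTEL_EMIT_LANGFUSE_ATTRIBUTES=true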

Langfuse - LLM Observability and Analytics

Best for: LLM trace visualization, prompt management, evaluations, and cost tracking

Langfuse is an open-source LLM observability platform that receives traces via standard OTLP/HTTP. It provides a comprehensive suite of tools for monitoring, evaluating, and improving LLM applications.

Key Features:

  • Trace Visualization: End-to-end request traces with latency breakdown and error analysis
  • Prompt Management: Version, test, and deploy prompts with A/B testing
  • Evaluations: Score traces with custom or built-in evaluators
  • Cost Tracking: Token usage and cost analytics per model and user
  • Datasets: Create test datasets from production traces for regression testing
  • OpenTelemetry Native: Standard OTLP/HTTP ingestion (v3.22.0+)

Ideal Use Cases:

  • Monitoring LLM-powered tool invocations and prompt rendering
  • Tracking costs and token usage across gateway operations
  • Building evaluation pipelines for response quality
  • Managing and versioning prompt templates

Transport: Uses OTLP over HTTP (gRPC not supported)

See the Langfuse Integration Guide for setup instructions.

Arize Phoenix - AI/LLM-Focused Observability

Best for: AI applications, LLM debugging, and prompt optimization

Arize Phoenix is an open-source AI observability platform specifically designed for LLM applications and AI systems. Built on OpenTelemetry, it provides specialized features for understanding AI application behavior.

Key Features:

  • LLM Tracing: Purpose-built for tracking LLM calls, token usage, and model performance
  • Evaluation Tools: Built-in evaluators for response quality, retrieval accuracy, and hallucination detection
  • Prompt Playground: Interactive environment for optimizing prompts with version control and experimentation
  • Framework Support: Native integrations with LlamaIndex, LangChain, Haystack, DSPy, and major LLM providers
  • Cost Tracking: Monitor token usage and API costs across different providers
  • Zero Lock-in: Fully open source and self-hostable with no feature gates

Ideal Use Cases:

  • Debugging RAG (Retrieval-Augmented Generation) pipelines
  • Optimizing prompt templates and model parameters
  • Tracking LLM performance metrics and costs
  • Evaluating response quality and accuracy

Transport: Uses OTLP (OpenTelemetry Protocol) via gRPC

Jaeger - Production-Grade Distributed Tracing

Best for: Microservices architectures, production deployments, and general-purpose tracing

Jaeger is a mature, battle-tested distributed tracing platform originally created by Uber and now a CNCF graduated project. It excels at monitoring complex distributed systems.

Key Features:

  • Proven Scale: Handles high-volume production environments with battle-tested reliability
  • Full Observability: Comprehensive request flow visualization across microservices
  • Multiple Storage Backends: Supports Elasticsearch, Cassandra, and other scalable storage solutions
  • Kubernetes Native: Deep integration with cloud-native environments
  • Advanced Sampling: Configurable sampling strategies (constant, probabilistic, rate-limiting)
  • OpenTelemetry Integration: Full OTLP support in v2 with native OpenTelemetry data model

Ideal Use Cases:

  • Large-scale microservices deployments
  • Performance bottleneck identification
  • Service dependency mapping
  • Root cause analysis of distributed failures

Transport: Supports both native Jaeger protocol and OTLP

Grafana Tempo - Cost-Efficient High-Scale Tracing

Best for: High-volume environments, cost-conscious deployments, and Grafana ecosystem users

Grafana Tempo is a high-scale, low-cost distributed tracing backend that eliminates expensive indexing by leveraging object storage (S3, GCS, Azure Blob).

Key Features:

  • Extreme Cost Efficiency: Uses cheap object storage instead of expensive databases (Elasticsearch/Cassandra)
  • Massive Scale: Designed to handle 100% trace capture without prohibitive costs
  • No Heavy Indexing: Lightweight bloom filters instead of full-text indexes reduce storage costs dramatically
  • TraceQL Query Language: Powerful query language inspired by PromQL and LogQL
  • Grafana Ecosystem Integration: Deep integration with Grafana, Prometheus, Loki, and Mimir
  • Multi-Protocol Support: Compatible with Jaeger, Zipkin, OpenCensus, and OpenTelemetry

Ideal Use Cases:

  • High-volume microservices (170k+ spans/second proven in production)
  • Cost-sensitive environments requiring long trace retention
  • Organizations already using Grafana stack
  • Teams wanting 100% trace capture without sampling

Transport: Uses OTLP (OpenTelemetry Protocol) via gRPC

Other Compatible Backends

The OTLP exporter also works with commercial APM platforms and other tracing backends:

  • Datadog APM - Full-featured commercial observability platform
  • New Relic - Cloud-based application performance monitoring
  • Honeycomb - Observability for production systems
  • Zipkin - Lightweight distributed tracing (legacy but widely supported)

What Gets Traced

  • Tool invocations - Full lifecycle with arguments, results, and timing
  • Prompt rendering - Template processing and message generation
  • Resource fetching - URI resolution, caching, and content retrieval
  • Gateway federation - Cross-gateway requests and health checks
  • Plugin execution - Pre/post hooks if plugins are enabled
  • Errors and exceptions - Full stack traces and error context

Quick Start

1. Install Dependencies

The observability packages are included in the Docker containers by default. For local development:

# Install base gateway with core observability support
pip install mcp-contextforge-gateway[observability]

Important: Only one exporter backend should be installed at a time - they are mutually exclusive. Install the specific backend package based on your chosen observability platform (see backend-specific instructions in the "Start Your Backend" section below).
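If you are unsure which exporter packages are already present, a quick check such as the following can help (a sketch; adjust for your environment):

# List any OpenTelemetry exporter packages currently installed
pip list | grep opentelemetry-exporter

# Remove an exporter you no longer need before installing another, for example:
pip uninstall opentelemetry-exporter-jaeger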

2. Configure Environment

Set these environment variables (or add to .env):

# Enable observability (default: false)
export OTEL_ENABLE_OBSERVABILITY=true

# Service identification
export OTEL_SERVICE_NAME=mcp-gateway
export OTEL_SERVICE_VERSION=1.0.0-RC-3
export OTEL_DEPLOYMENT_ENVIRONMENT=development

# Choose your backend (otlp, jaeger, zipkin, console, none)
export OTEL_TRACES_EXPORTER=otlp

# OTLP Configuration (for Phoenix, Tempo, etc.)
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_INSECURE=true

3. Start Your Backend

Choose your preferred observability backend:

Phoenix (AI/LLM Focus)

# Install Phoenix exporter backend
pip install opentelemetry-exporter-otlp-proto-grpc

# Start Phoenix
docker run -d \
  --name phoenix \
  -p 6006:6006 \
  -p 4317:4317 \
  arizephoenix/phoenix:latest

# Configure environment
export OTEL_TRACES_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_SERVICE_NAME=mcp-gateway

# View UI at http://localhost:6006

Jaeger

# Install Jaeger exporter backend
pip install opentelemetry-exporter-jaeger

# Start Jaeger
docker run -d \
  --name jaeger \
  -p 16686:16686 \
  -p 14268:14268 \
  jaegertracing/all-in-one

# Configure environment
export OTEL_TRACES_EXPORTER=jaeger
export OTEL_EXPORTER_JAEGER_ENDPOINT=http://localhost:14268/api/traces
export OTEL_SERVICE_NAME=mcp-gateway

# View UI at http://localhost:16686

Zipkin

# Install Zipkin exporter backend
pip install opentelemetry-exporter-zipkin

# Start Zipkin
docker run -d \
  --name zipkin \
  -p 9411:9411 \
  openzipkin/zipkin

# Configure environment
export OTEL_TRACES_EXPORTER=zipkin
export OTEL_EXPORTER_ZIPKIN_ENDPOINT=http://localhost:9411/api/v2/spans
export OTEL_SERVICE_NAME=mcp-gateway

# View UI at http://localhost:9411

Grafana Tempo

# Install OTLP exporter backend (Tempo uses OTLP)
pip install opentelemetry-exporter-otlp-proto-grpc

# Start Tempo
docker run -d \
  --name tempo \
  -p 4317:4317 \
  -p 3200:3200 \
  grafana/tempo:latest

# Configure environment (uses OTLP)
export OTEL_TRACES_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_SERVICE_NAME=mcp-gateway

Console (Development)

# For debugging - prints traces to stdout
export OTEL_TRACES_EXPORTER=console
export OTEL_SERVICE_NAME=mcp-gateway

4. Run ContextForge

# Start the gateway (observability is disabled by default)
mcpgateway

# Or with Docker
docker run -e OTEL_ENABLE_OBSERVABILITY=true \
           -e OTEL_EXPORTER_OTLP_ENDPOINT=http://host.docker.internal:4317 \
           ghcr.io/ibm/mcp-context-forge:1.0.0-RC-3

Configuration Reference

Core Settings

| Variable | Description | Default | Options |
|----------|-------------|---------|---------|
| OTEL_ENABLE_OBSERVABILITY | Master switch | false | true, false |
| OTEL_SERVICE_NAME | Service identifier | mcp-gateway | Any string |
| OTEL_SERVICE_VERSION | Service version | 1.0.0-RC-3 | Any string |
| DEPLOYMENT_ENV / ENVIRONMENT | Environment tag | development | development, staging, production |
| OTEL_TRACES_EXPORTER | Export backend | otlp | otlp, jaeger, zipkin, console, none |
| OTEL_RESOURCE_ATTRIBUTES | Custom attributes | - | key=value,key2=value2 |
| OTEL_COPY_RESOURCE_ATTRS_TO_SPANS | Copy selected resource attrs onto spans | false | true, false |
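The resource-attribute settings are not covered in the Quick Start; as a sketch, custom attributes can be attached like this (the keys and values are illustrative):

# Attach custom resource attributes to every exported trace
export OTEL_RESOURCE_ATTRIBUTES=team=platform,region=us-east-1

# Optionally copy selected resource attributes onto individual spans
export OTEL_COPY_RESOURCE_ATTRS_TO_SPANS=true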

OTLP Configuration

| Variable | Description | Default | Example |
|----------|-------------|---------|---------|
| OTEL_EXPORTER_OTLP_ENDPOINT | Collector endpoint | - | http://localhost:4317 |
| OTEL_EXPORTER_OTLP_PROTOCOL | Protocol | grpc | grpc, http/protobuf |
| OTEL_EXPORTER_OTLP_HEADERS | Auth headers | - | api-key=secret,x-auth=token |
| LANGFUSE_OTEL_ENDPOINT | Optional Langfuse OTLP/HTTP endpoint override | - | https://cloud.langfuse.com/api/public/otel/v1/traces |
| LANGFUSE_PUBLIC_KEY | Langfuse project public key for derived OTLP auth | - | pk-lf-... |
| LANGFUSE_SECRET_KEY | Langfuse project secret key for derived OTLP auth | - | sk-lf-... |
| LANGFUSE_OTEL_AUTH | Optional base64-encoded pk:sk auth override | - | base64 string |
| OTEL_EXPORTER_OTLP_INSECURE | Skip TLS verification | true | true, false |
| OTEL_EMIT_LANGFUSE_ATTRIBUTES | Force-enable or disable Langfuse-specific span attributes | auto | true, false |
| OTEL_CAPTURE_IDENTITY_ATTRIBUTES | Force-enable or disable user/team identity enrichment | auto | true, false |
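If you prefer to set LANGFUSE_OTEL_AUTH directly rather than supplying the key pair, the value is the base64 encoding of the public:secret pair; for example (placeholder keys):

# Derive the base64 auth value from placeholder Langfuse keys
export LANGFUSE_OTEL_AUTH=$(printf 'pk-lf-...:sk-lf-...' | base64)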

Alternative Backends

| Variable | Description | Default |
|----------|-------------|---------|
| OTEL_EXPORTER_JAEGER_ENDPOINT | Jaeger collector | http://localhost:14268/api/traces |
| OTEL_EXPORTER_JAEGER_USER | Jaeger collector username | - |
| OTEL_EXPORTER_JAEGER_PASSWORD | Jaeger collector password | - |
| OTEL_EXPORTER_ZIPKIN_ENDPOINT | Zipkin collector | http://localhost:9411/api/v2/spans |

Payload Capture and Redaction

| Variable | Description | Default |
|----------|-------------|---------|
| OTEL_REDACT_FIELDS | Comma-separated field names redacted from structured trace payloads and free-text error messages | password,secret,token,... |
| OTEL_MAX_TRACE_PAYLOAD_SIZE | Maximum serialized trace payload size in characters | 32768 |
| OTEL_CAPTURE_INPUT_SPANS | Comma-separated allowlist of span names that may capture observation input payloads | empty |
| OTEL_CAPTURE_OUTPUT_SPANS | Comma-separated allowlist of span names that may capture observation output payloads | empty |

Input/output capture is allowlist-based: ContextForge does not capture these payloads unless the relevant span names are explicitly listed. Structured payloads are redacted by field name, and exported string values are also scrubbed for sensitive URLs, key=value secret patterns, and embedded Bearer/Basic credentials. The local Langfuse compose overlay sets a dev-friendly input allowlist of tool.invoke,prompt.render,llm.proxy,a2a.invoke; production deployments should choose that list deliberately.
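As a sketch, a conservative production configuration might capture inputs only for tool invocations and keep output capture disabled (the redaction field list is illustrative):

# Capture observation inputs only for tool invocations
export OTEL_CAPTURE_INPUT_SPANS=tool.invoke

# Keep the output allowlist empty (no output payloads captured)
export OTEL_CAPTURE_OUTPUT_SPANS=

# Redact these field names from captured payloads
export OTEL_REDACT_FIELDS=password,secret,token,api_key,authorization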

Performance Tuning

| Variable | Description | Default |
|----------|-------------|---------|
| OTEL_TRACES_SAMPLER | Sampling strategy | parentbased_traceidratio |
| OTEL_TRACES_SAMPLER_ARG | Sample rate (0.0-1.0) | 0.1 (10%) |
| OTEL_BSP_MAX_QUEUE_SIZE | Max queued spans | 2048 |
| OTEL_BSP_MAX_EXPORT_BATCH_SIZE | Batch size | 512 |
| OTEL_BSP_SCHEDULE_DELAY | Export interval (ms) | 5000 |
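For example, a higher-throughput deployment might raise the queue and batch limits while keeping a low sample rate (these values are illustrative starting points, not recommendations):

# Sample 5% of traces, use larger batches, export more frequently
export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.05
export OTEL_BSP_MAX_QUEUE_SIZE=4096
export OTEL_BSP_MAX_EXPORT_BATCH_SIZE=1024
export OTEL_BSP_SCHEDULE_DELAY=2000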

Understanding Traces

Span Attributes

Each span includes standard attributes:

  • Operation name - e.g., tool.invoke, prompt.render, resource.read
  • Service info - Service name, version, environment
  • User context - User ID, tenant ID, request ID
  • Timing - Start time, duration, end time
  • Status - Success/error status with error details

Tool Invocation Spans

{
  "name": "tool.invoke",
  "attributes": {
    "tool.name": "github_search",
    "tool.id": "550e8400-e29b-41d4-a716",
    "tool.integration_type": "REST",
    "arguments_count": 3,
    "success": true,
    "duration.ms": 234.5,
    "http.status_code": 200
  }
}

Error Tracking

Failed operations include:

  • error: true
  • error.type: Exception class name
  • error.message: Error description
  • Sanitized exception event metadata (exception.type, exception.message)
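Putting these together, a failed tool invocation span might look roughly like this (the values are illustrative):

{
  "name": "tool.invoke",
  "attributes": {
    "tool.name": "github_search",
    "tool.integration_type": "REST",
    "success": false,
    "error": true,
    "error.type": "TimeoutError",
    "error.message": "Request timed out after 30s",
    "duration.ms": 812.3
  }
}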

Production Deployment

Docker Compose

Use the provided compose files:

# Start ContextForge with Phoenix observability
docker-compose -f docker-compose.yml \
               -f docker-compose.with-phoenix.yml up -d

Kubernetes

Add environment variables to your deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-gateway
spec:
  template:
    spec:
      containers:
      - name: gateway
        image: ghcr.io/ibm/mcp-context-forge:1.0.0-RC-3
        env:
        - name: OTEL_ENABLE_OBSERVABILITY
          value: "true"
        - name: OTEL_TRACES_EXPORTER
          value: "otlp"
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://otel-collector:4317"
        - name: OTEL_SERVICE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['app.kubernetes.io/name']

Sampling Strategies

For production, adjust sampling to balance visibility and performance:

# Sample 1% of traces
export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.01

# Always sample errors (coming in future update)
# export OTEL_TRACES_SAMPLER=parentbased_always_on_errors

Testing Your Setup

Generate Test Traces

Use the trace generator helper to verify your observability backend is working:

# Activate virtual environment if needed
. .venv/bin/activate

# Run the trace generator
python tests/integration/helpers/trace_generator.py

This will send sample traces for:

  • Tool invocations
  • Prompt rendering
  • Resource fetching
  • Gateway federation
  • Complex workflows with nested spans

Troubleshooting

No Traces Appearing

  1. Check that observability is enabled:

     echo $OTEL_ENABLE_OBSERVABILITY  # Should be "true"

  2. Verify the endpoint is reachable:

     curl -v http://localhost:4317  # Should connect

  3. Use the console exporter for debugging:

     export OTEL_TRACES_EXPORTER=console
     mcpgateway  # Traces will print to stdout

High Memory Usage

Reduce batch size and queue limits:

export OTEL_BSP_MAX_QUEUE_SIZE=512
export OTEL_BSP_MAX_EXPORT_BATCH_SIZE=128

Missing Spans

Check sampling rate:

# Temporarily disable sampling
export OTEL_TRACES_SAMPLER=always_on

Performance Impact

  • When disabled: Zero overhead (no-op context managers)
  • When enabled: ~0.1-0.5ms per span
  • Memory: ~50MB for typical workload
  • Network: Batched exports every 5 seconds

Next Steps