OpenTelemetry Integration¶
ContextForge integrates OpenTelemetry (OTEL) for distributed tracing, providing comprehensive observability across MCP operations, tool invocations, and plugin execution.
Overview¶
The OTEL integration provides:
- W3C Trace Context Propagation: Automatic propagation of trace context via `traceparent` headers
- W3C Baggage Support: Optional extraction of allowlisted HTTP headers into OTEL baggage
- Request-Root Spans: Every HTTP request creates a root span in the observability middleware
- MCP Client Spans: Detailed tracing of MCP protocol operations (initialize, request, response)
- Plugin Hook Spans: Visibility into plugin execution lifecycle
- Session Pool Awareness: Non-pooled sessions propagate trace context and baggage; pooled sessions skip per-request propagation to prevent context pollution
Architecture¶
Span Hierarchy¶
http.request (root span)
├── mcp.client.call
│ ├── mcp.client.initialize
│ ├── mcp.client.request
│ └── mcp.client.response
├── plugin.hook.prompt_pre_fetch
├── plugin.hook.tool_pre_invoke
└── plugin.hook.tool_post_invoke
Span attributes may also include request baggage dimensions, for example:
- `baggage.tenant.id`
- `baggage.user.id`
- `baggage.request.id`
Trace Context Flow¶
- Inbound Request: Extract the `traceparent` header from the incoming HTTP request
- Optional Baggage Extraction: Convert configured request headers into OTEL baggage
- Root Span: Create request-root span with extracted trace ID and baggage attributes
- Child Spans: All operations inherit trace context automatically
- Outbound Requests: Inject `traceparent` and optional `baggage` headers into MCP client calls
- Upstream Propagation: Upstream MCP servers can attach their spans to the trace
Configuration¶
Environment Variables¶
# Enable OTEL tracing
OTEL_ENABLE_OBSERVABILITY=true
# Exporter configuration
OTEL_EXPORTER_TYPE=otlp # otlp, jaeger, zipkin, console
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
OTEL_EXPORTER_OTLP_PROTOCOL=grpc # grpc or http
# Service identification
OTEL_SERVICE_NAME=mcp-gateway
OTEL_SERVICE_VERSION=1.0.0
# Resource attributes (comma-separated key=value pairs)
OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production,service.namespace=mcp
# Batch processor tuning
OTEL_BSP_MAX_QUEUE_SIZE=2048
OTEL_BSP_MAX_EXPORT_BATCH_SIZE=512
OTEL_BSP_SCHEDULE_DELAY=5000
# Copy resource attributes to span attributes (for Arize compatibility)
OTEL_COPY_RESOURCE_ATTRS_TO_SPANS=false
# Optional baggage extraction from inbound HTTP headers
OTEL_BAGGAGE_ENABLED=false
OTEL_BAGGAGE_HEADER_MAPPINGS='[
{"header_name": "X-Tenant-ID", "baggage_key": "tenant.id"},
{"header_name": "X-User-ID", "baggage_key": "user.id"}
]'
OTEL_BAGGAGE_PROPAGATE_TO_EXTERNAL=false
OTEL_BAGGAGE_MAX_ITEMS=32
OTEL_BAGGAGE_MAX_SIZE_BYTES=8192
OTEL_BAGGAGE_LOG_REJECTED=true
OTEL_BAGGAGE_LOG_SANITIZATION=true
Langfuse Integration¶
For Langfuse observability, use the OTLP endpoint:
OTEL_EXPORTER_TYPE=otlp
OTEL_EXPORTER_OTLP_ENDPOINT=https://cloud.langfuse.com
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Bearer sk-lf-...
W3C Trace Context Propagation¶
Inbound Propagation¶
The observability middleware automatically extracts W3C trace context from incoming requests. The middleware:
1. Parses the `traceparent` header
2. Extracts trace-id and parent-span-id
3. Creates a new span as a child of the external trace
4. Stores trace context in request state
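The parsing step can be sketched with the standard library alone. This is a minimal illustration of the W3C `traceparent` layout (version, trace-id, parent-id, flags), not the gateway's actual middleware; `TraceParent` and `parse_traceparent` are hypothetical names introduced for this example.

```python
import re
from typing import NamedTuple, Optional

# W3C traceparent: version "-" trace-id "-" parent-id "-" trace-flags (lowercase hex)
_TRACEPARENT_RE = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_span_id: str
    sampled: bool


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is malformed."""
    match = _TRACEPARENT_RE.fullmatch(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version "ff" and all-zero trace or parent IDs are invalid per the spec.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    # The lowest bit of trace-flags is the "sampled" flag.
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))
```

Returning `None` for malformed input lets the caller fall back to starting a fresh trace instead of failing the request.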
Outbound Propagation¶
When making MCP client calls, trace context is automatically injected. This ensures:
- Upstream MCP servers receive the `traceparent` header
- Distributed traces span multiple services
- End-to-end visibility across the call chain
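As a rough sketch of what injection amounts to on the wire, the snippet below builds a `traceparent` value for a new outbound span and adds it to a copy of the base headers. The helper `with_traceparent` is hypothetical and is not the gateway's own injection function; in practice the span ID would come from the active OTEL span rather than from `secrets`.

```python
import secrets
from typing import Dict, Mapping


def with_traceparent(base_headers: Mapping[str, str], trace_id: str, sampled: bool = True) -> Dict[str, str]:
    """Return a copy of base_headers carrying a traceparent for a new outbound span."""
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex chars, used as parent-id
    flags = "01" if sampled else "00"
    headers = dict(base_headers)  # copy so shared base headers are never mutated
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"
    return headers
```

Copying the headers per request matters here: mutating a shared dictionary would let one request's trace context leak into another.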
W3C Baggage Support¶
Purpose¶
W3C baggage carries low-cardinality request context alongside tracing information. In ContextForge this is intended for metadata such as tenant, user, or request IDs that improve trace filtering and cross-service diagnosis.
Inbound Header Extraction¶
When baggage support is enabled, the gateway can map specific inbound HTTP headers to baggage keys.
Security-Enhanced Baggage Processing¶
Dual Processing Model¶
ContextForge implements a fail-closed security model for baggage processing with two distinct input channels:
1. Header Extraction (Gatekeeper)¶
Inbound HTTP headers are converted to baggage using an explicit allowlist:
OTEL_BAGGAGE_HEADER_MAPPINGS='[
{"header_name": "X-Tenant-ID", "baggage_key": "tenant.id"},
{"header_name": "X-User-ID", "baggage_key": "user.id"}
]'
Security Properties:
- Only explicitly configured headers are processed
- Case-insensitive header matching prevents bypass
- Values are sanitized (control characters removed)
- Size limits prevent resource exhaustion
- Undefined headers are logged and rejected
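A minimal sketch of allowlist-based extraction, assuming the mapping JSON shown above. The function name `extract_baggage` is hypothetical and the real gateway logic may differ, for example in how it logs rejected headers.

```python
import json
from typing import Dict, Mapping

MAPPINGS_JSON = (
    '[{"header_name": "X-Tenant-ID", "baggage_key": "tenant.id"},'
    ' {"header_name": "X-User-ID", "baggage_key": "user.id"}]'
)


def extract_baggage(headers: Mapping[str, str], mappings_json: str) -> Dict[str, str]:
    """Map allowlisted request headers to baggage keys; everything else is ignored."""
    # Lowercase configured names once so lookups are case-insensitive.
    allowlist = {m["header_name"].lower(): m["baggage_key"] for m in json.loads(mappings_json)}
    baggage: Dict[str, str] = {}
    for name, value in headers.items():
        key = allowlist.get(name.lower())
        if key is None:
            continue  # fail closed: unmapped headers never become baggage
        # Drop ASCII control characters (below space, plus DEL) before use.
        cleaned = "".join(ch for ch in value if ch >= " " and ch != "\x7f").strip()
        if cleaned:
            baggage[key] = cleaned
    return baggage
```

Because the lookup starts from the allowlist rather than from a denylist, a header the operator never configured cannot reach a span.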
2. Inbound Baggage Header (Security Enhancement)¶
The W3C baggage header from upstream callers is also processed, but with strict filtering:
GET /mcp/sse HTTP/1.1
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
baggage: tenant.id=tenant-123,user.id=user-456,malicious.key=attack
Security Properties:
- Only baggage keys matching configured baggage_key values are accepted
- Unauthorized keys (e.g., malicious.key) are filtered out
- Values undergo the same sanitization as header-extracted baggage
- The same size and item limits apply
- Fail-closed: unknown keys are rejected, not propagated
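The filtering of an inbound `baggage` header can be sketched as follows. This assumes the simple `key=value` list form shown in the request above, with optional `;property` suffixes ignored; `filter_inbound_baggage` is a hypothetical name, and sanitization and limits are left out to keep the example short.

```python
from typing import Dict, Set
from urllib.parse import unquote


def filter_inbound_baggage(header_value: str, allowed_keys: Set[str]) -> Dict[str, str]:
    """Keep only baggage members whose key is in the configured allowlist."""
    accepted: Dict[str, str] = {}
    for member in header_value.split(","):
        # Each list member is key=value, optionally followed by ;properties.
        pair = member.split(";", 1)[0]
        key, sep, value = pair.partition("=")
        key = key.strip()
        if not sep or key not in allowed_keys:
            continue  # fail closed: malformed or unknown members are dropped
        accepted[key] = unquote(value.strip())  # baggage values are percent-encoded
    return accepted
```

Applied to the example request, only `tenant.id` and `user.id` survive; `malicious.key` is dropped before it can reach a span.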
Configuration Approach¶
Production-Ready Example¶
# Enable baggage with security controls
OTEL_BAGGAGE_ENABLED=true
# Define allowlist: only these headers → baggage keys
OTEL_BAGGAGE_HEADER_MAPPINGS='[
{"header_name": "X-Tenant-ID", "baggage_key": "tenant.id"},
{"header_name": "X-User-ID", "baggage_key": "user.id"},
{"header_name": "X-Request-ID", "baggage_key": "request.id"},
{"header_name": "X-Correlation-ID", "baggage_key": "correlation.id"}
]'
# Security: disable downstream propagation by default
OTEL_BAGGAGE_PROPAGATE_TO_EXTERNAL=false
# Resource limits
OTEL_BAGGAGE_MAX_ITEMS=32
OTEL_BAGGAGE_MAX_SIZE_BYTES=8192
# Audit logging
OTEL_BAGGAGE_LOG_REJECTED=true
OTEL_BAGGAGE_LOG_SANITIZATION=true
Multi-Tenant Example¶
# Multi-tenant SaaS with user context
OTEL_BAGGAGE_HEADER_MAPPINGS='[
{"header_name": "X-Tenant-ID", "baggage_key": "tenant.id"},
{"header_name": "X-Organization-ID", "baggage_key": "org.id"},
{"header_name": "X-User-ID", "baggage_key": "user.id"},
{"header_name": "X-User-Email", "baggage_key": "user.email"},
{"header_name": "X-Session-ID", "baggage_key": "session.id"}
]'
Distributed Tracing Example¶
# Enable downstream propagation for distributed tracing
OTEL_BAGGAGE_PROPAGATE_TO_EXTERNAL=true
# Minimal context for cross-service correlation
OTEL_BAGGAGE_HEADER_MAPPINGS='[
{"header_name": "X-Request-ID", "baggage_key": "request.id"},
{"header_name": "X-Trace-ID", "baggage_key": "trace.id"}
]'
Security Implications of PROPAGATE_TO_EXTERNAL¶
When Disabled (Default - Recommended)¶
Behavior:
- Baggage is recorded on internal spans only
- Downstream MCP servers do NOT receive the baggage header
- Trace context (traceparent) still propagates
- Prevents leaking tenant/user metadata to external services
Use When:
- Downstream services are untrusted or third-party
- Baggage contains sensitive tenant/user identifiers
- You want observability without metadata exposure
- Compliance requires data minimization
When Enabled (Opt-In)¶
Behavior:
- Baggage is sent to downstream MCP servers via the baggage header
- Enables end-to-end correlation across service boundaries
- Downstream services can attach baggage to their spans
Use When:
- All downstream services are trusted and internal
- Cross-service correlation is required
- Downstream services need tenant/user context
- You control the entire service mesh
Security Considerations:
- Review what metadata is being propagated
- Ensure downstream services sanitize baggage
- Consider data residency and compliance requirements
- Monitor for baggage size explosion
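The difference between the two modes comes down to whether a `baggage` header is added to outbound requests. The sketch below illustrates that decision under the behavior described above; `build_outbound_headers` is a hypothetical helper, not a gateway API.

```python
from typing import Dict, Mapping
from urllib.parse import quote


def build_outbound_headers(
    base_headers: Mapping[str, str],
    traceparent: str,
    baggage: Mapping[str, str],
    propagate_to_external: bool,
) -> Dict[str, str]:
    """Always send traceparent; send baggage only when propagation is opted in."""
    headers = dict(base_headers)
    headers["traceparent"] = traceparent  # trace context propagates in both modes
    if propagate_to_external and baggage:
        # Percent-encode values so commas or semicolons cannot break the list syntax.
        headers["baggage"] = ",".join(f"{k}={quote(v, safe='')}" for k, v in baggage.items())
    return headers
```

With the flag off, downstream services can still join the trace but never see tenant or user metadata.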
Validation and Constraints¶
Header Name Validation¶
# Valid header names (RFC 7230)
X-Tenant-ID ✅
X-User-ID ✅
X-Request-ID ✅
# Invalid header names
X-Tenant@ID ❌ (special characters)
1-Tenant-ID ❌ (starts with number)
X Tenant ID ❌ (contains spaces)
Baggage Key Validation¶
# Valid baggage keys (W3C spec)
tenant.id ✅
user.id ✅
request-id ✅
user_email ✅
# Invalid baggage keys
tenant@id ❌ (special characters)
1tenant.id ❌ (starts with number)
tenant id ❌ (contains spaces)
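The valid and invalid examples above are consistent with simple pattern checks. The two regular expressions below are inferred from those examples only and are an assumption; the gateway's actual validation rules may be stricter or looser.

```python
import re

# Assumed rules, inferred from the examples: start with a letter, then a limited charset.
HEADER_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]*")
BAGGAGE_KEY_RE = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def is_valid_header_name(name: str) -> bool:
    """True if the whole name matches the assumed header-name pattern."""
    return HEADER_NAME_RE.fullmatch(name) is not None


def is_valid_baggage_key(key: str) -> bool:
    """True if the whole key matches the assumed baggage-key pattern."""
    return BAGGAGE_KEY_RE.fullmatch(key) is not None
```

Using `fullmatch` rather than `match` avoids accepting a valid prefix followed by disallowed characters.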
Size Limits¶
| Limit | Default | Purpose |
|---|---|---|
| Max Items | 32 | Prevent cardinality explosion |
| Max Size | 8192 bytes | Prevent resource exhaustion |
| Max Key Length | 256 chars | W3C spec compliance |
| Max Value Length | 4096 chars | Prevent header bloat |
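One way these limits could be applied is shown below. The constants mirror the defaults in the table, but the policy is an assumption made for illustration: oversized entries are skipped, and processing stops once the item or total-size budget is reached. The gateway may instead reject the whole set.

```python
from typing import Dict, Mapping

MAX_ITEMS = 32
MAX_SIZE_BYTES = 8192
MAX_KEY_LENGTH = 256
MAX_VALUE_LENGTH = 4096


def enforce_limits(candidates: Mapping[str, str]) -> Dict[str, str]:
    """Return the subset of candidates that fits within the configured limits."""
    accepted: Dict[str, str] = {}
    total_bytes = 0
    for key, value in candidates.items():
        if len(key) > MAX_KEY_LENGTH or len(value) > MAX_VALUE_LENGTH:
            continue  # skip individual entries that are too long
        size = len(key.encode("utf-8")) + len(value.encode("utf-8"))
        if len(accepted) >= MAX_ITEMS or total_bytes + size > MAX_SIZE_BYTES:
            break  # item or size budget exhausted
        accepted[key] = value
        total_bytes += size
    return accepted
```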
Sanitization Process¶
All baggage values undergo sanitization:
# Control characters removed
"value\x00\x01\x02" → "value"
# Whitespace normalized
"value  with   spaces" → "value with spaces"
# Empty after sanitization → rejected
"\x00\x01\x02" → (rejected)
Monitoring and Auditing¶
Enable logging to track security events:
OTEL_BAGGAGE_LOG_REJECTED=true # Log rejected headers/keys
OTEL_BAGGAGE_LOG_SANITIZATION=true # Log sanitized values
Logged Events:
- Rejected undefined headers (not in allowlist)
- Rejected unauthorized baggage keys (not in allowlist)
- Values sanitized (control characters removed)
- Size limit violations
- Item limit violations
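The interaction between sanitization and the two logging flags might look like the sketch below. The logger name, log messages, and the function `sanitize_and_audit` are all hypothetical; only the behavior (strip control characters, normalize whitespace, reject empty results, log when enabled) follows the description in this section.

```python
import logging
import re
from typing import Optional

logger = logging.getLogger("baggage.audit")  # hypothetical logger name

# Control characters other than ordinary whitespace such as tab or newline.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0e-\x1f\x7f]")


def sanitize_and_audit(
    key: str, value: str, log_sanitization: bool = True, log_rejected: bool = True
) -> Optional[str]:
    """Sanitize one baggage value, logging when it is altered or rejected."""
    cleaned = _CONTROL_CHARS.sub("", value)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()  # collapse whitespace runs
    if not cleaned:
        if log_rejected:
            logger.warning("baggage rejected: key=%s empty after sanitization", key)
        return None
    if cleaned != value and log_sanitization:
        # Log the key only, so the raw value never reaches the log.
        logger.info("baggage sanitized: key=%s", key)
    return cleaned
```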
Best Practices¶
- Minimize Baggage Keys: Only include essential correlation metadata
- Disable External Propagation: Keep `PROPAGATE_TO_EXTERNAL=false` unless required
- Use Low-Cardinality Values: Avoid high-cardinality data (e.g., timestamps, UUIDs in values)
- Enable Audit Logging: Monitor rejected headers and sanitization events
- Review Regularly: Audit configured mappings and remove unused entries
- Test Limits: Verify size and item limits match your use case
- Document Mappings: Maintain documentation of header → baggage key mappings
For example, given this inbound request with mapped headers:
GET /mcp/sse HTTP/1.1
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
X-Tenant-ID: tenant-123
X-User-ID: user-456
With this configuration:
OTEL_BAGGAGE_ENABLED=true
OTEL_BAGGAGE_HEADER_MAPPINGS='[
{"header_name": "X-Tenant-ID", "baggage_key": "tenant.id"},
{"header_name": "X-User-ID", "baggage_key": "user.id"}
]'
the request context will carry baggage similar to `tenant.id=tenant-123,user.id=user-456`.
The baggage middleware runs before the request-root tracing middleware, so the root request span and child spans can all see the same baggage values.
Outbound Propagation¶
Outbound propagation of baggage is opt-in via `OTEL_BAGGAGE_PROPAGATE_TO_EXTERNAL=true`.
When enabled, outbound MCP client requests include a W3C baggage header in addition to traceparent. When disabled, baggage remains internal to the gateway and is recorded only on spans.
Existing Inbound baggage Header¶
The gateway also parses inbound W3C baggage headers from upstream callers, but it does not trust them blindly. Incoming baggage is filtered to the configured baggage-key allowlist and is subject to sanitization and size limits before it is merged into the active request context.
Baggage Configuration¶
| Variable | Default | Purpose |
|---|---|---|
| `OTEL_BAGGAGE_ENABLED` | `false` | Enables baggage extraction and span enrichment |
| `OTEL_BAGGAGE_HEADER_MAPPINGS` | `[]` | JSON array of header_name to baggage_key mappings |
| `OTEL_BAGGAGE_PROPAGATE_TO_EXTERNAL` | `false` | Sends baggage to downstream services when enabled |
| `OTEL_BAGGAGE_MAX_ITEMS` | `32` | Maximum accepted baggage items per request |
| `OTEL_BAGGAGE_MAX_SIZE_BYTES` | `8192` | Maximum combined baggage size from untrusted request input |
| `OTEL_BAGGAGE_LOG_REJECTED` | `true` | Logs rejected headers and baggage keys |
| `OTEL_BAGGAGE_LOG_SANITIZATION` | `true` | Logs when values are altered by sanitization |
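To make the defaults concrete, here is a sketch of reading these variables into a typed settings object. `BaggageSettings`, `load_baggage_settings`, and the set of accepted truthy strings are assumptions for illustration; the gateway's own settings loader is not shown in this document.

```python
import json
import os
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class BaggageSettings:
    enabled: bool
    header_mappings: Dict[str, str]  # lowercased header name -> baggage key
    propagate_to_external: bool
    max_items: int
    max_size_bytes: int
    log_rejected: bool
    log_sanitization: bool


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a boolean variable, falling back to the documented default."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_baggage_settings(env: Mapping[str, str] = os.environ) -> BaggageSettings:
    """Build settings from an environment mapping using the table's defaults."""
    entries = json.loads(env.get("OTEL_BAGGAGE_HEADER_MAPPINGS", "[]"))
    return BaggageSettings(
        enabled=_flag(env, "OTEL_BAGGAGE_ENABLED", False),
        header_mappings={e["header_name"].lower(): e["baggage_key"] for e in entries},
        propagate_to_external=_flag(env, "OTEL_BAGGAGE_PROPAGATE_TO_EXTERNAL", False),
        max_items=int(env.get("OTEL_BAGGAGE_MAX_ITEMS", "32")),
        max_size_bytes=int(env.get("OTEL_BAGGAGE_MAX_SIZE_BYTES", "8192")),
        log_rejected=_flag(env, "OTEL_BAGGAGE_LOG_REJECTED", True),
        log_sanitization=_flag(env, "OTEL_BAGGAGE_LOG_SANITIZATION", True),
    )
```

Passing the environment as a parameter keeps the loader easy to test with a plain dictionary.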
Session Pooling with Tracing¶
Design Decision and Trade-off¶
Current Behavior:
# Session pool enabled only when tracing context is inactive
if settings.mcp_session_pool_enabled and not otel_context_active():
    # Use base headers without trace context injection
    async with pool.session(url=server_url, headers=headers) as pooled:
        # Pool provides 10-20x latency improvement
        # But per-request trace context and baggage do NOT propagate upstream
        ...  # use the pooled session here
Why Trace Headers Are Not Injected¶
The MCP SDK pins headers at transport creation time. If we inject per-request trace headers (traceparent, X-Correlation-ID) or baggage before pooling:
- Trace Corruption: The first request's trace context gets pinned to the transport
- Context Leakage: Later unrelated requests reuse the same trace ID
- Broken Distributed Tracing: Upstream servers see wrong parent spans
- Correlation ID Leakage: Different requests appear correlated when they're not
- Baggage Leakage: Tenant or request metadata from one request bleeds into another
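The failure mode is easy to reproduce with a toy model. The classes below are stand-ins written for this illustration and do not reflect the MCP SDK's real transport or pool APIs; they only mimic the one property that matters, namely that headers are fixed when the transport is created.

```python
from typing import Dict


class PinnedTransport:
    """Toy stand-in for a transport that fixes its headers at creation time."""

    def __init__(self, url: str, headers: Dict[str, str]) -> None:
        self.url = url
        self.headers = dict(headers)  # pinned: later requests cannot change these

    def send(self, body: str) -> Dict[str, str]:
        # Return the headers that would go on the wire for this request.
        return dict(self.headers)


class ToyPool:
    """Reuses one transport per URL, as a connection pool would."""

    def __init__(self) -> None:
        self._transports: Dict[str, PinnedTransport] = {}

    def session(self, url: str, headers: Dict[str, str]) -> PinnedTransport:
        # Headers are only honored the first time a URL is seen.
        if url not in self._transports:
            self._transports[url] = PinnedTransport(url, headers)
        return self._transports[url]


pool = ToyPool()
first = pool.session("http://upstream/mcp", {"traceparent": "00-" + "a" * 32 + "-" + "1" * 16 + "-01"})
second = pool.session("http://upstream/mcp", {"traceparent": "00-" + "b" * 32 + "-" + "2" * 16 + "-01"})
leaked = second.send("tools/list")["traceparent"]
```

Here `leaked` still carries the first request's trace ID (`aaaa...`), which is exactly the corruption the gateway avoids by keeping per-request headers out of pooled transports.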
The Trade-off¶
| Aspect | Pooled Sessions | Non-Pooled Sessions |
|---|---|---|
| Latency | 10-20x faster (reuse connection) | Slower (new connection each time) |
| Trace Propagation | ❌ No upstream propagation | ✅ Full W3C trace context |
| Baggage Propagation | ❌ No upstream propagation | ✅ Optional W3C baggage propagation |
| Correlation IDs | ❌ Not sent to upstream | ✅ Sent per-request |
| Use Case | High-throughput, internal tracing | Distributed tracing across services |
When to Use Each¶
Use Session Pooling (default):
- High request volume to the same MCP servers
- Internal observability is sufficient
- 10-20x latency improvement is critical
- Upstream servers don't need trace context or baggage
Disable Session Pooling (for distributed tracing):
- Need end-to-end distributed tracing
- Upstream MCP servers participate in traces
- Need downstream baggage propagation
- Correlation IDs must reach upstream
- Latency is an acceptable trade-off
Implementation Details¶
The session pool:
- Reuses transports with pinned headers (base headers only)
- Does NOT inject per-request trace headers or baggage
- Provides 10-20x latency improvement
- Maintains internal trace context within the gateway
- Does not propagate trace context or baggage to upstream servers
Security Considerations¶
Sanitization¶
All sensitive data is sanitized before adding to OTEL spans:
# Query string sanitization
"url.query": sanitize_trace_text(str(request.url.query))
# Exception message sanitization
sanitized_error = sanitize_for_log(sanitize_trace_text(str(e)))
"exception.message": sanitized_error
This prevents:
- Leaking credentials in query parameters
- Exposing sensitive error details
- Bypassing existing sanitization flows
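For intuition, query-string sanitization of this kind often means redacting values of parameters with sensitive-looking names. The sketch below shows that general idea only; it is not the implementation of `sanitize_trace_text`, and both `redact_query` and the `SENSITIVE_PARAMS` list are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical list of parameter names whose values should never reach a span.
SENSITIVE_PARAMS = {"token", "api_key", "password", "secret", "access_token"}


def redact_query(query: str) -> str:
    """Replace the values of sensitive query parameters with a fixed marker."""
    pairs = parse_qsl(query, keep_blank_values=True)
    return urlencode([(k, "REDACTED" if k.lower() in SENSITIVE_PARAMS else v) for k, v in pairs])
```

Redacting by parameter name keeps the shape of the URL visible in traces while removing the secret itself.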
Baggage Hardening¶
Header-to-baggage conversion uses a fail-closed model:
- Only explicitly configured headers are processed
- Only configured baggage keys are accepted from inbound `baggage` headers
- Values are sanitized before use and before downstream propagation
- Untrusted request input is constrained by item-count and size limits
- Downstream propagation is disabled by default
This reduces the risk of:
- High-cardinality baggage exploding span dimensions
- Propagating attacker-controlled metadata to downstream services
- Cross-request leakage through pooled transports
- Using observability channels to exfiltrate sensitive request content
Data Minimization¶
Only essential attributes are exported:
- HTTP method, path, status code
- Tool names and IDs (not arguments)
- Timing information
- Error types (not full stack traces in production)
Span Naming Conventions¶
All spans follow the <domain>.<operation> pattern:
| Domain | Operations | Example |
|---|---|---|
| `http` | request | `http.request` |
| `mcp.client` | call, initialize, request, response | `mcp.client.call` |
| `tool` | invoke, list | `tool.invoke` |
| `prompt` | render, list | `prompt.render` |
| `resource` | invoke, list | `resource.invoke` |
| `plugin.hook` | prompt_pre_fetch, tool_pre_invoke, etc. | `plugin.hook.tool_pre_invoke` |
Semantic Attributes¶
Standard Attributes¶
Following OpenTelemetry semantic conventions:
{
    "http.method": "POST",
    "http.route": "/tools/invoke",
    "http.status_code": 200,
    "network.protocol.name": "mcp",
    "server.address": "localhost",
    "server.port": 8000,
    "url.path": "/mcp/sse",
    "url.full": "http://localhost:8000/mcp/sse",
}
Custom Attributes¶
ContextForge-specific attributes use the contextforge. prefix:
{
    "contextforge.tool.id": "tool-123",
    "contextforge.gateway_id": "gateway-456",
    "contextforge.runtime": "python",
    "contextforge.transport": "sse",
    "contextforge.user.email": "user@example.com",
    "contextforge.team.id": "team-789",
    "baggage.tenant.id": "tenant-123",
    "baggage.user.id": "user-456",
}
Plugin Server Tracing¶
External plugin servers can enable OTEL tracing:
# In plugin server environment
OTEL_ENABLE_OBSERVABILITY=true
OTEL_SERVICE_NAME=my-plugin-server
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
Important: The OTEL_SERVICE_NAME must be set before importing mcpgateway.observability, as the tracer is initialized at import time.
Performance Impact¶
Overhead¶
- Minimal: ~1-2ms per request for span creation
- Batch Export: Spans are batched and exported asynchronously
- Configurable: Adjust batch size and delay via environment variables
Optimization¶
# Increase batch size for high-throughput scenarios
OTEL_BSP_MAX_EXPORT_BATCH_SIZE=1024
OTEL_BSP_SCHEDULE_DELAY=10000 # 10 seconds
# Increase queue size to prevent drops
OTEL_BSP_MAX_QUEUE_SIZE=4096
Troubleshooting¶
No Traces Appearing¶
- Check OTEL is enabled: `OTEL_ENABLE_OBSERVABILITY=true`
- Verify exporter endpoint: Test connectivity to the OTLP endpoint
- Check service name: Ensure `OTEL_SERVICE_NAME` is set correctly
- Review logs: Look for the "OpenTelemetry initialized" message
Broken Trace Context¶
- Verify header injection: Check that `inject_trace_context_headers()` is called
- Session pool headers: Pooled sessions do not inject per-request trace headers; disable session pooling if upstream spans must join the trace
- Upstream support: Verify the upstream MCP server supports W3C trace context
Performance Issues¶
- Reduce batch delay: Lower `OTEL_BSP_SCHEDULE_DELAY` for faster export
- Increase batch size: Raise `OTEL_BSP_MAX_EXPORT_BATCH_SIZE` to reduce export frequency
- Check exporter: Ensure the OTLP endpoint is responsive
Examples¶
Basic Tracing¶
from mcpgateway.observability import create_span, set_span_attribute
with create_span("custom.operation", {"custom.attr": "value"}):
    # Your code here
    set_span_attribute("result.count", 42)
Distributed Tracing¶
# Service A (ContextForge)
headers = inject_trace_context_headers(base_headers)
response = await httpx_client.post(upstream_url, headers=headers)
# Service B (Upstream MCP Server)
# Automatically extracts traceparent and attaches to trace
Plugin Hook Tracing¶
# Automatic tracing in plugin framework
async def tool_pre_invoke(self, payload, context):
    # This hook execution is automatically traced
    # Span name: plugin.hook.tool_pre_invoke
    return PluginResult(continue_processing=True)
Related Documentation¶
User Guides¶
- Observability Overview - Choosing the right observability approach
- OpenTelemetry Integration - User-facing OTEL setup guide
- Internal Observability - Built-in database-backed tracing
- Prometheus Metrics - Time-series monitoring
- Langfuse Integration - LLM observability platform
- Phoenix Integration - AI/LLM-focused observability
Technical References¶
- OTEL Span Attributes Reference - Complete list of span attributes used in ContextForge
- OpenTelemetry Specification
- W3C Trace Context
- W3C Baggage
- Semantic Conventions
- Langfuse OTEL Integration