Observability
MCP Gateway provides comprehensive observability through two complementary systems:
- Internal Observability - Built-in database-backed tracing with Admin UI dashboards
- OpenTelemetry - Standard distributed tracing to external backends (Phoenix, Jaeger, Tempo)
Documentation
- OpenTelemetry Overview - External observability with OTLP backends
- Internal Observability - Built-in tracing, metrics, and Admin UI dashboards
- Phoenix Integration - AI/LLM-focused observability with Arize Phoenix
Quick Start
Internal Observability (Built-in)
# Enable internal observability
export OBSERVABILITY_ENABLED=true
# Run MCP Gateway
mcpgateway
# View dashboards at http://localhost:4444/admin/observability
OpenTelemetry (External)
# OpenTelemetry is enabled by default; setting it explicitly is optional
export OTEL_ENABLE_OBSERVABILITY=true
export OTEL_TRACES_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
# Start Phoenix for AI/LLM observability
docker run -p 6006:6006 -p 4317:4317 arizephoenix/phoenix:latest
# Run MCP Gateway
mcpgateway
# View traces at http://localhost:6006
Prometheus metrics (important)
Note: metrics exposure is wired from mcpgateway/main.py, but the HTTP handler itself is registered by the metrics module. main.py imports setup_metrics from mcpgateway.services.metrics and calls setup_metrics(app). That function instruments the FastAPI app and registers the Prometheus scrape endpoint through the Prometheus Instrumentator. The endpoint available to Prometheus scrapers is:
- GET /metrics/prometheus
The route is created by Instrumentator.expose inside mcpgateway/services/metrics.py, not by a hand-written GET handler in main.py. It is registered with include_in_schema=True, so it appears in the OpenAPI / Swagger docs, and gzip compression is enabled by default (should_gzip=True) for the exposition handler.
Env vars / settings that control metrics
- ENABLE_METRICS (env): set to true (default) to enable instrumentation; set to false to disable it.
- METRICS_EXCLUDED_HANDLERS (env / settings): comma-separated regexes for endpoints to exclude from instrumentation (useful for SSE/WS or per-request high-cardinality paths). The implementation reads settings.METRICS_EXCLUDED_HANDLERS and compiles the patterns.
- METRICS_CUSTOM_LABELS (env / settings): comma-separated key=value pairs used as static labels on the app_info gauge (low-cardinality values only). When present, a Prometheus app_info gauge is created and set to 1 with those labels.
- Additional settings in mcpgateway/config.py: METRICS_NAMESPACE, METRICS_SUBSYSTEM. Note: these config fields exist, but the current metrics module does not wire them into the instrumentator by default. They are available for future use or for consumption by custom collectors.
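The two list-valued settings above can be parsed in a straightforward way, sketched below. This is a minimal stdlib illustration, not the code in mcpgateway/services/metrics.py. The helper names (parse_excluded_handlers, is_excluded, parse_custom_labels, app_info_line) are hypothetical, and the sketch assumes patterns are applied with re.search. Under that assumption, anchor a pattern with ^ or $ when you mean the start or end of a path.

```python
from __future__ import annotations

import re

# Prometheus label names must match this pattern.
_LABEL_NAME = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def parse_excluded_handlers(raw: str) -> list[re.Pattern[str]]:
    """Split a comma-separated METRICS_EXCLUDED_HANDLERS value and compile each regex."""
    return [re.compile(p.strip()) for p in raw.split(",") if p.strip()]


def is_excluded(path: str, patterns: list[re.Pattern[str]]) -> bool:
    # Assumption (hypothetical): a path is skipped if any pattern matches anywhere in it.
    return any(p.search(path) for p in patterns)


def parse_custom_labels(raw: str) -> dict[str, str]:
    """Turn 'env=prod,region=eu' into {'env': 'prod', 'region': 'eu'}."""
    labels: dict[str, str] = {}
    for pair in raw.split(","):
        if not pair.strip():
            continue  # tolerate empty entries such as a trailing comma
        key, sep, value = pair.partition("=")
        key = key.strip()
        if not sep or not _LABEL_NAME.fullmatch(key):
            raise ValueError(f"expected key=value with a valid label name, got {pair!r}")
        labels[key] = value.strip()
    return labels


def app_info_line(labels: dict[str, str]) -> str:
    """Render the static app_info gauge (value 1) as one exposition-format line."""

    def esc(v: str) -> str:
        # Escape backslash, double quote and newline as the text format requires.
        return v.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    body = ",".join(f'{k}="{esc(v)}"' for k, v in sorted(labels.items()))
    return f"app_info{{{body}}} 1.0"
```

Under these rules, METRICS_EXCLUDED_HANDLERS=^/sse,/ws$ would skip /sse and any path ending in /ws. METRICS_CUSTOM_LABELS=env=prod,region=eu would yield a single app_info series whose labels never change. That constant label set is why only low-cardinality values belong there.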
Enable / verify locally
- Ensure ENABLE_METRICS=true is set in your shell or .env.
- Start the gateway (development). By default the app listens on port 4444, so the Prometheus endpoint is:
  http://localhost:4444/metrics/prometheus
- Quick check (print the first lines of the exposition text):
  curl -s http://localhost:4444/metrics/prometheus | head -n 20
- If metrics are disabled, the endpoint returns a small JSON 503 response.
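The same check can be scripted from Python, which also exercises the gzip path and the 503 case described above. This is a minimal stdlib sketch. It assumes the default development URL http://localhost:4444/metrics/prometheus, and the helper names (fetch_metrics, decode_body, first_lines) are hypothetical, not part of the gateway. The request advertises gzip support, and the body is decompressed only when the response carries Content-Encoding: gzip.

```python
from __future__ import annotations

import gzip
import urllib.error
import urllib.request

# Assumption: local development gateway on the default port 4444.
METRICS_URL = "http://localhost:4444/metrics/prometheus"


def decode_body(raw: bytes, content_encoding: str | None) -> str:
    """Return exposition text, gunzipping only if the response says it is gzipped."""
    if content_encoding and "gzip" in content_encoding.lower():
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")


def first_lines(text: str, n: int = 20) -> list[str]:
    """Mimic `head -n N` on the exposition text."""
    return text.splitlines()[:n]


def fetch_metrics(url: str = METRICS_URL, timeout: float = 5.0) -> str:
    """Scrape the endpoint once, advertising gzip support in the request."""
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return decode_body(resp.read(), resp.headers.get("Content-Encoding"))
    except urllib.error.HTTPError as err:
        if err.code == 503:
            # Metrics disabled: surface the small JSON body the gateway returns.
            detail = err.read().decode("utf-8", errors="replace")
            raise RuntimeError(f"metrics disabled (HTTP 503): {detail}") from err
        raise
```

With the gateway running, print("\n".join(first_lines(fetch_metrics()))) gives roughly the same view as the curl check.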
Prometheus scrape job example
Add the job below to your prometheus.yml for local testing:
scrape_configs:
  - job_name: 'mcp-gateway'
    metrics_path: /metrics/prometheus
    static_configs:
      - targets: ['localhost:4444']
If Prometheus runs in Docker, adjust the target host accordingly: use host networking, the container host's IP, or host.docker.internal:4444 on Docker Desktop. See docs/manage/scale.md in the repo for examples of deploying Prometheus in Kubernetes.
Grafana and dashboards
- Use Grafana to import dashboards for Kubernetes, PostgreSQL and Redis (IDs are suggested elsewhere in the repo). For MCP Gateway app metrics, create panels for:
  - Request rate: rate(http_requests_total[1m])
  - Error rate: rate(http_requests_total{status=~"5.."}[5m])
  - P99 latency: histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
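The P99 panel relies on histogram_quantile, which estimates a quantile from cumulative bucket counts instead of raw samples. The sketch below is a simplified Python re-implementation of the linear interpolation PromQL applies to classic histograms. It is offered to show why the result is bounded by bucket edges, not as Prometheus' actual code. It skips Prometheus' repair of non-monotonic buckets and does not cover native histograms.

```python
from __future__ import annotations

import bisect
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Simplified sketch of the interpolation PromQL's histogram_quantile()
    applies to classic histograms; the last bucket must be +Inf.
    """
    if q < 0:
        return -math.inf
    if q > 1:
        return math.inf
    buckets = sorted(buckets)  # order by upper bound ("le")
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan
    total = buckets[-1][1]  # the +Inf bucket holds every observation
    if total == 0:
        return math.nan
    rank = q * total
    counts = [count for _, count in buckets]
    b = bisect.bisect_left(counts, rank)  # first bucket whose count >= rank
    if b == len(buckets) - 1:
        # Rank lands in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[b]
    if b == 0 and upper <= 0:
        return upper
    lower = 0.0
    if b > 0:
        lower = buckets[b - 1][0]
        count -= buckets[b - 1][1]
        rank -= buckets[b - 1][1]
    if count == 0:
        return lower  # only reachable for q == 0 with an empty first bucket
    # Assume observations are spread evenly inside the chosen bucket.
    return lower + (upper - lower) * (rank / count)
```

Take buckets le=0.1:50, le=0.5:90, le=1:99, le=+Inf:100. The 0.7 quantile falls inside the 0.1 to 0.5 bucket and interpolates to 0.3 s. That figure comes from the bucket layout, not from a measured request, so coarse buckets give coarse percentiles.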
Common pitfalls: short guidance
- High-cardinality labels
  - Never add per-request identifiers (user IDs, full URIs, request IDs) as Prometheus labels. They multiply the number of time series and can exhaust Prometheus memory.
  - Use METRICS_CUSTOM_LABELS only for low-cardinality labels (env, region).
- Compression (gzip) vs CPU
  - The metrics exposer in mcpgateway.services.metrics enables gzip by default for the /metrics/prometheus endpoint. Compressing the payload reduces network usage but costs CPU at scrape time. On CPU-constrained nodes, consider increasing the scrape interval (e.g. from 15s to 30s) or disabling gzip at the instrumentator layer.
- Duplicate collectors during reloads/tests
  - Instrumentation registers collectors on the global Prometheus registry. When the app is reloaded in the same process (tests, interactive sessions), you may see a "collector already registered" error. Restart the process or clear the registry in test fixtures.
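The high-cardinality pitfall becomes concrete when you count series directly. Each distinct combination of label values is stored as its own time series, so a label holding raw paths grows with every new ID. The sketch below compares raw paths with templated ones. The ID-detection rule (numeric or UUID-shaped segments) and the helper names are hypothetical illustrations, not how the gateway or its instrumentator assigns labels.

```python
from __future__ import annotations

import re
from collections.abc import Iterable

# Hypothetical rule: purely numeric or UUID-shaped path segments are IDs.
_ID_SEGMENT = re.compile(r"\d+|[0-9a-fA-F-]{32,36}")


def template_path(path: str) -> str:
    """Collapse ID-like segments, e.g. /tools/42 -> /tools/{id}."""
    return "/".join("{id}" if _ID_SEGMENT.fullmatch(seg) else seg for seg in path.split("/"))


def count_series(label_sets: Iterable[tuple[str, ...]]) -> int:
    # Every distinct combination of label values is stored as its own time series.
    return len(set(label_sets))


def compare(requests: Iterable[tuple[str, str, str]]) -> tuple[int, int]:
    """Return (series labelled by raw path, series labelled by templated path)."""
    reqs = list(requests)
    raw = count_series((method, path, status) for method, path, status in reqs)
    templated = count_series(
        (method, template_path(path), status) for method, path, status in reqs
    )
    return raw, templated
```

Consider 1,000 requests to /tools/0 through /tools/999 plus one to /health. Labelled by raw path, they produce 1,001 series; labelled by template, they produce 2. In the gateway itself, the remedy remains excluding such paths with METRICS_EXCLUDED_HANDLERS and keeping per-request detail in traces.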
Quick checklist
- ENABLE_METRICS=true
- /metrics/prometheus is reachable
- Scrape job added to Prometheus
- High-cardinality paths excluded with METRICS_EXCLUDED_HANDLERS
- Tracing (OTel) used for high-cardinality debugging information
Where to look in the code
- mcpgateway/main.py: wiring. It imports setup_metrics from mcpgateway.services.metrics and calls setup_metrics(app), which instruments the app at startup. The actual HTTP handler for /metrics/prometheus is registered by the Instrumentator inside mcpgateway/services/metrics.py.
- mcpgateway/services/metrics.py: instrumentation implementation and env vars.
- mcpgateway/config.py: settings defaults and names used by the app.