This diagram showcases the performance-optimized architecture of MCP Gateway (ContextForge), highlighting Rust-powered components, async patterns, and scaling capabilities.
Architecture Overview

┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                  KUBERNETES ORCHESTRATION LAYER                                  │
│ ┌──────────────────────────────────────────────────────────────────────────────────────────────┐ │
│ │                               Horizontal Pod Autoscaler (HPA)                                │ │
│ │                   ┌──────────────────────────┬───────────────────────────┐                   │ │
│ │                   │ CPU Target: 70%          │ Memory Target: 80%        │                   │ │
│ │                   │ Min Replicas: 3          │ Max Replicas: 50          │                   │ │
│ │                   └──────────────────────────┴───────────────────────────┘                   │ │
│ └──────────────────────────────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────────────────────────┘
                                                 │
                                                 ▼
┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                        EDGE / PROXY LAYER                                        │
│ ┌──────────────────────────────────────────────────────────────────────────────────────────────┐ │
│ │                                     NGINX Caching Proxy                                      │ │
│ │ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ ┌────────────────────────┐ │ │
│ │ │ Brotli/Gzip/Zstd  │ │ Static Cache 1GB  │ │ API Cache 512MB   │ │ Rate Limiting 3000r/s  │ │ │
│ │ │ Compression       │ │ 30-day TTL        │ │ 5-min TTL         │ │ Burst: 3000 requests   │ │ │
│ │ │ 30-70% savings    │ │ X-Cache-Status    │ │ Schema Cache      │ │ Conn Limit: 3000/IP    │ │ │
│ │ └───────────────────┘ └───────────────────┘ └───────────────────┘ └────────────────────────┘ │ │
│ │ ┌──────────────────────────────────────────────────────────────────────────────────────────┐ │ │
│ │ │    worker_processes: auto │ worker_connections: 8192 │ keepalive: 512 │ backlog: 4096    │ │ │
│ │ └──────────────────────────────────────────────────────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────────────────────────┘
                                                 │
                 ┌───────────────────────────────┼───────────────────────────────┐
                 │                               │                               │
                 ▼                               ▼                               ▼
┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
│                           GATEWAY APPLICATION LAYER (Replicated Pods)                            │
│                                                                                                  │
│  ┌────────────────────────────┐  ┌────────────────────────────┐  ┌────────────────────────────┐  │
│  │       Gateway Pod 1        │  │       Gateway Pod 2        │  │       Gateway Pod N        │  │
│  │                            │  │                            │  │                            │  │
│  │  ┌──────────────────────┐  │  │  ┌──────────────────────┐  │  │  ┌──────────────────────┐  │  │
│  │  │  HTTP SERVER LAYER   │  │  │  │  HTTP SERVER LAYER   │  │  │  │  HTTP SERVER LAYER   │  │  │
│  │  │ ┌──────────────────┐ │  │  │  │ ┌──────────────────┐ │  │  │  │ ┌──────────────────┐ │  │  │
│  │  │ │     GRANIAN      │ │  │  │  │ │     GRANIAN      │ │  │  │  │ │     GRANIAN      │ │  │  │
│  │  │ │   (Rust HTTP)    │ │  │  │  │ │   (Rust HTTP)    │ │  │  │  │ │   (Rust HTTP)    │ │  │  │
│  │  │ │   +20-50% perf   │ │  │  │  │ │   +20-50% perf   │ │  │  │  │ │   +20-50% perf   │ │  │  │
│  │  │ └──────────────────┘ │  │  │  │ └──────────────────┘ │  │  │  │ └──────────────────┘ │  │  │
│  │  │  16 workers          │  │  │  │  16 workers          │  │  │  │  16 workers          │  │  │
│  │  │  backlog: 4096       │  │  │  │  backlog: 4096       │  │  │  │  backlog: 4096       │  │  │
│  │  │  backpressure: 64    │  │  │  │  backpressure: 64    │  │  │  │  backpressure: 64    │  │  │
│  │  └──────────────────────┘  │  │  └──────────────────────┘  │  │  └──────────────────────┘  │  │
│  │             │              │  │             │              │  │             │              │  │
│  │             ▼              │  │             ▼              │  │             ▼              │  │
│  │  ┌──────────────────────┐  │  │  ┌──────────────────────┐  │  │  ┌──────────────────────┐  │  │
│  │  │    ASYNC RUNTIME     │  │  │  │    ASYNC RUNTIME     │  │  │  │    ASYNC RUNTIME     │  │  │
│  │  │ ┌──────────────────┐ │  │  │  │ ┌──────────────────┐ │  │  │  │ ┌──────────────────┐ │  │  │
│  │  │ │      UVLOOP      │ │  │  │  │ │      UVLOOP      │ │  │  │  │ │      UVLOOP      │ │  │  │
│  │  │ │  (Cython/libuv)  │ │  │  │  │ │  (Cython/libuv)  │ │  │  │  │ │  (Cython/libuv)  │ │  │  │
│  │  │ │   2-4x faster    │ │  │  │  │ │   2-4x faster    │ │  │  │  │ │   2-4x faster    │ │  │  │
│  │  │ └──────────────────┘ │  │  │  │ └──────────────────┘ │  │  │  │ └──────────────────┘ │  │  │
│  │  │  1000+ concurrent    │  │  │  │  1000+ concurrent    │  │  │  │  1000+ concurrent    │  │  │
│  │  │  requests/worker     │  │  │  │  requests/worker     │  │  │  │  requests/worker     │  │  │
│  │  └──────────────────────┘  │  │  └──────────────────────┘  │  │  └──────────────────────┘  │  │
│  │             │              │  │             │              │  │             │              │  │
│  └─────────────┬──────────────┘  └─────────────┬──────────────┘  └─────────────┬──────────────┘  │
└────────────────┼───────────────────────────────┼───────────────────────────────┼─────────────────┘
                 │                               │                               │
                 └───────────────────────────────┼───────────────────────────────┘
                                                 │
                                                 ▼
┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                     RUST-POWERED COMPONENTS                                      │
│                                                                                                  │
│ ┌──────────────────────────────────────────────────────────────────────────────────────────────┐ │
│ │                                     FASTAPI APPLICATION                                      │ │
│ │                                                                                              │ │
│ │  ┌──────────────────────────┐   ┌──────────────────────────┐   ┌──────────────────────────┐  │ │
│ │  │ ┌──────────────────────┐ │   │ ┌──────────────────────┐ │   │ ┌──────────────────────┐ │  │ │
│ │  │ │     PYDANTIC V2      │ │   │ │        ORJSON        │ │   │ │       HIREDIS        │ │  │ │
│ │  │ │     (Rust Core)      │ │   │ │     (Rust JSON)      │ │   │ │      (C Parser)      │ │  │ │
│ │  │ └──────────────────────┘ │   │ └──────────────────────┘ │   │ └──────────────────────┘ │  │ │
│ │  │ • 5-50x faster           │   │ • 3x faster              │   │ • Up to 83x faster       │  │ │
│ │  │   validation             │   │   serialization          │   │   Redis parsing          │  │ │
│ │  │ • GIL bypass             │   │ • Native types           │   │ • Large response         │  │ │
│ │  │ • 5,463 lines            │   │ • ORJSONResponse         │   │   optimization           │  │ │
│ │  │   of schemas             │   │ • SSE streaming          │   │ • Auto fallback          │  │ │
│ │  └──────────────────────────┘   └──────────────────────────┘   └──────────────────────────┘  │ │
│ │                                                                                              │ │
│ │ ┌──────────────────────────────────────────────────────────────────────────────────────────┐ │ │
│ │ │                        MULTI-LEVEL CACHING (80-95% DB reduction)                         │ │ │
│ │ │  ┌──────────────┐ ┌──────────────┐ ┌────────────────┐ ┌───────────────┐ ┌──────────────┐ │ │ │
│ │ │  │ JWT Cache    │ │ Auth Cache   │ │ Registry Cache │ │ Admin Stats   │ │ GlobalConfig │ │ │ │
│ │ │  │ TTL: 30s     │ │ TTL: 60s     │ │ TTL: 15-20s    │ │ TTL: 30-60s   │ │ TTL: 60s     │ │ │ │
│ │ │  │ <1ms auth    │ │ 0-1 queries  │ │ 95%+ hit rate  │ │ Dashboard opt │ │ 42K→0 qry    │ │ │ │
│ │ │  └──────────────┘ └──────────────┘ └────────────────┘ └───────────────┘ └──────────────┘ │ │ │
│ │ └──────────────────────────────────────────────────────────────────────────────────────────┘ │ │
│ │                                                                                              │ │
│ │ ┌──────────────────────────────────────────────────────────────────────────────────────────┐ │ │
│ │ │                                PERFORMANCE OPTIMIZATIONS                                 │ │ │
│ │ │ • Precompiled regex validators              • Lazy f-string logging                      │ │ │
│ │ │ • Cached Jinja templates                    • Cached JSONPath parsing                    │ │ │
│ │ │ • Cached jq filter compilation              • Cached JSON Schema validators              │ │ │
│ │ │ • has_hooks_for optimization                • Buffered metrics writes                    │ │ │
│ │ │ • Bulk UPDATE for token cleanup             • SQL-based metrics aggregation              │ │ │
│ │ └──────────────────────────────────────────────────────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────────────────────────┘
                                                 │
                 ┌───────────────────────────────┼───────────────────────────────┐
                 │                               │                               │
                 ▼                               ▼                               ▼
┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                            DATA LAYER                                            │
│                                                                                                  │
│  ┌────────────────────────────────────────────┐  ┌────────────────────────────────────────────┐  │
│  │             CONNECTION POOLING             │  │             DISTRIBUTED CACHE              │  │
│  │                                            │  │                                            │  │
│  │  ┌──────────────────────────────────────┐  │  │  ┌──────────────────────────────────────┐  │  │
│  │  │              PGBOUNCER               │  │  │  │                REDIS                 │  │  │
│  │  │        Connection Multiplexer        │  │  │  │           High-Performance           │  │  │
│  │  │  ┌────────────────────────────────┐  │  │  │  │  ┌────────────────────────────────┐  │  │  │
│  │  │  │ MAX_CLIENT_CONN: 5000          │  │  │  │  │  │   ┌────────────────────────┐   │  │  │  │
│  │  │  │ DEFAULT_POOL_SIZE: 450         │  │  │  │  │  │   │    HIREDIS C PARSER    │   │  │  │  │
│  │  │  │ MAX_DB_CONNECTIONS: 550        │  │  │  │  │  │   │    Up to 83x faster    │   │  │  │  │
│  │  │  │ POOL_MODE: transaction         │  │  │  │  │  │   └────────────────────────┘   │  │  │  │
│  │  │  │ 8x connection reduction        │  │  │  │  │  │ maxmemory: 1GB                 │  │  │  │
│  │  │  └────────────────────────────────┘  │  │  │  │  │ maxclients: 10000              │  │  │  │
│  │  └──────────────────────────────────────┘  │  │  │  │ tcp-backlog: 2048              │  │  │  │
│  │                     │                      │  │  │  │ allkeys-lru eviction           │  │  │  │
│  │                     ▼                      │  │  │  └────────────────────────────────┘  │  │  │
│  │  ┌──────────────────────────────────────┐  │  │  │                                      │  │  │
│  │  │            POSTGRESQL 18             │  │  │  │ Session storage: TTL 3600s           │  │  │
│  │  │         Production Database          │  │  │  │ Message cache: TTL 600s              │  │  │
│  │  │  ┌────────────────────────────────┐  │  │  │  │ Federation cache                     │  │  │
│  │  │  │   ┌────────────────────────┐   │  │  │  │  │ Leader election                      │  │  │
│  │  │  │   │       PSYCOPG V3       │   │  │  │  │  └──────────────────────────────────────┘  │  │
│  │  │  │   │    (Modern Driver)     │   │  │  │  └────────────────────────────────────────────┘  │
│  │  │  │   └────────────────────────┘   │  │  │                                                  │
│  │  │  │ • Auto-prepared stmts          │  │  │                                                  │
│  │  │  │ • COPY protocol (5-10x)        │  │  │                                                  │
│  │  │  │ • Pipeline mode (2-5x)         │  │  │                                                  │
│  │  │  │ • Native async I/O             │  │  │                                                  │
│  │  │  └────────────────────────────────┘  │  │                                                  │
│  │  │ max_connections: 700                 │  │                                                  │
│  │  │ shared_buffers: 512MB                │  │                                                  │
│  │  │ effective_cache_size: 1536MB         │  │                                                  │
│  │  │ synchronous_commit: off              │  │                                                  │
│  │  │ idle_in_transaction: 30s             │  │                                                  │
│  │  │ SSD optimized (random_page: 1.1)     │  │                                                  │
│  │  └──────────────────────────────────────┘  │                                                  │
│  └────────────────────────────────────────────┘                                                  │
└──────────────────────────────────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                    OBSERVABILITY & MONITORING                                    │
│ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ ┌────────────────────────────┐ │
│ │ Prometheus        │ │ Grafana           │ │ Loki              │ │ Exporters                  │ │
│ │ Metrics Store     │ │ Dashboards        │ │ Log Aggregation   │ │ PostgreSQL | Redis | Nginx │ │
│ │ 7-day retention   │ │ Visualization     │ │ LogQL             │ │ PgBouncer | cAdvisor       │ │
│ └───────────────────┘ └───────────────────┘ └───────────────────┘ └────────────────────────────┘ │
│ ┌──────────────────────────────────────────────────────────────────────────────────────────────┐ │
│ │                                  OpenTelemetry Integration                                   │ │
│ │             OTEL Traces → OTLP Exporter → Collector │ Service Name: mcp-gateway              │ │
│ └──────────────────────────────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────────────────────────┘
Rust-Powered Components (GIL Bypass)

| Component | Technology | Performance Gain | Use Case |
|---|---|---|---|
| Pydantic v2 | Rust core (pydantic-core) | 5-50x faster validation | Request/response schemas (5,463 lines) |
| orjson | Rust JSON library | 3x faster serialization | All JSON encoding/decoding |
| Granian | Rust HTTP server | +20-50% throughput | HTTP request handling |
| hiredis | C-based Redis parser | Up to 83x faster | Large Redis response parsing |
| uvloop | Cython/libuv event loop | 2-4x faster async I/O | Async event loop |
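
The orjson row, together with the "auto fallback" idea noted for hiredis, suggests a serialization helper that prefers the Rust library and degrades to the standard library when it is missing. The following is a minimal sketch of that pattern; the helper name `dumps_bytes` is hypothetical and is not taken from the gateway's code.

```python
import json

try:
    import orjson  # Rust-backed JSON library; optional in this sketch
except ImportError:  # fall back to the standard library
    orjson = None


def dumps_bytes(obj) -> bytes:
    """Serialize obj to UTF-8 JSON bytes, preferring orjson when installed."""
    if orjson is not None:
        return orjson.dumps(obj)  # orjson.dumps returns bytes directly
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")


if __name__ == "__main__":
    print(dumps_bytes({"tool": "echo", "ok": True}))
```

Both branches return compact UTF-8 bytes, so callers do not need to know which library was used.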

| Optimization | Before | After | Improvement |
|---|---|---|---|
| psycopg v3 prepared statements | Parsed each query | Auto-prepared | 2-3x faster repeated queries |
| COPY protocol | INSERT statements | Binary COPY | 5-10x faster bulk inserts |
| Pipeline mode | Sequential queries | Pipelined | 2-5x batch improvements |
| PgBouncer pooling | 1600+ connections | 200 connections | 8x connection reduction |

| Cache Layer | Hit Rate | Latency Reduction | DB Query Reduction |
|---|---|---|---|
| JWT Cache | >80% | 5-12ms → <1ms | Per-request auth overhead |
| Auth Cache | >90% | 8-15ms → 1-3ms | 3-4 → 0-1 queries/request |
| Registry Cache | 95%+ | Variable | 50-200 → 0-1 queries |
| GlobalConfig Cache | 99%+ | 1ms → 0.00001ms | 42K+ queries eliminated |
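
Each cache layer in the table follows the same shape: serve a stored value until its TTL expires, and only then go back to the database. A minimal in-process sketch of that idea is shown below. The class name `TTLCache`, the key, and the loader are all hypothetical, and the real caches may be Redis-backed or structured differently.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = loader()  # stands in for a database query
        self._store[key] = (now + self.ttl, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=60)  # 60s mirrors the GlobalConfig TTL
    for _ in range(5):
        cache.get_or_load("global_config", lambda: {"example": True})
    print(cache.hits, cache.misses)  # 4 1
```

With a 60-second TTL, every request inside the window after the first is served without touching the loader, which is how a hot key can reach the high hit rates listed above.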

Compression & Bandwidth

| Algorithm | Compression Ratio | Best For |
|---|---|---|
| Brotli | 15-25% smaller than Gzip | Production, CDNs |
| Zstd | Very good, fastest | High-throughput APIs |
| Gzip | Good, universal | Legacy compatibility |

Scaling Capacity

| Configuration | Capacity |
|---|---|
| Single pod (16 workers) | ~1,600 RPS |
| 3 pods (default) | ~4,800 RPS |
| 10 pods (HPA scaled) | ~16,000 RPS |
| 50 pods (max) | ~80,000 RPS |
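
The capacity figures scale linearly from the single-pod number, and the replica count is driven by the HPA targets shown in the diagram. The sketch below reproduces that arithmetic. It uses the core Kubernetes HPA rule (desired = ceil(current × metric / target)) and deliberately ignores tolerance and stabilization windows; the function names are hypothetical.

```python
import math

RPS_PER_POD = 1_600                    # single-pod figure from the table above
MIN_REPLICAS, MAX_REPLICAS = 3, 50     # HPA bounds from the diagram


def estimated_capacity(pods: int) -> int:
    """Linear estimate; assumes no shared bottleneck (database, Redis, proxy)."""
    return pods * RPS_PER_POD


def desired_replicas(current: int, cpu_percent: float, target_percent: float = 70.0) -> int:
    """Core HPA rule: ceil(current * metric / target), clamped to the bounds."""
    raw = math.ceil(current * cpu_percent / target_percent)
    return max(MIN_REPLICAS, min(MAX_REPLICAS, raw))


if __name__ == "__main__":
    for pods in (1, 3, 10, 50):
        print(pods, estimated_capacity(pods))
    print(desired_replicas(current=3, cpu_percent=140.0))  # 6
```

The linear estimate is an upper bound: in practice the data layer limits (PgBouncer pool size, PostgreSQL `max_connections`) are likely to cap throughput before 50 pods are fully used.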

| Issue # | Feature | Impact |
|---|---|---|
| #1695 | Granian HTTP server migration | +20-50% throughput |
| #1696, #1692 | orjson throughout codebase | 3x JSON performance |
| #1699 | uvicorn[standard] with uvloop/httptools | 15-30% faster async |
| #1702 | hiredis Redis parser | Up to 83x Redis parsing |
| #1740 | psycopg v3 migration | Auto-prepared, COPY, pipeline |
| #1750, #1753 | PgBouncer connection pooling | 8x connection reduction |
| #1715 | GlobalConfig in-memory cache | 42K queries eliminated |
| #1773 | get_user_teams() caching | Reduced idle-in-transaction |
| #1809-1814 | Schema/template/filter caching | Compilation overhead removed |
| #1816, #1819, #1830 | Precompiled regex patterns | CPU reduction in hot paths |
| #1828, #1837 | SSE/logging micro-optimizations | Reduced allocation overhead |
| #1844 | Monitoring profile | Production observability |
| #2025 | Startup resilience (exponential backoff) | Prevents crash-loop CPU storms |
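
The "precompiled regex" and "schema/template/filter caching" rows share one idea: pay the compilation cost once and reuse the compiled object on every request. The sketch below illustrates both variants with the standard library only. The pattern and the function names are hypothetical examples, and `re.compile` merely stands in for more expensive compilers such as jq filters or JSON Schema validators.

```python
import re
from functools import lru_cache

# Compiled once at import time rather than on every request.
# The pattern itself is a hypothetical example, not the gateway's validator.
TOOL_NAME_RE = re.compile(r"[A-Za-z0-9_.-]{1,64}")


def is_valid_tool_name(name: str) -> bool:
    """Validate against the module-level precompiled pattern."""
    return TOOL_NAME_RE.fullmatch(name) is not None


@lru_cache(maxsize=256)
def compile_filter(expression: str) -> "re.Pattern[str]":
    """Cache compiled objects for expressions that are only known at runtime."""
    return re.compile(expression)
```

Module-level compilation suits patterns known ahead of time; the `lru_cache` variant suits user-supplied expressions, with `maxsize` bounding memory use.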

Startup Resilience

The gateway implements exponential backoff with jitter for database and Redis connection retries at startup. This prevents CPU-intensive crash-respawn loops when dependencies are temporarily unavailable.

Problem Solved

Without exponential backoff, a dependency outage would cause:

Worker starts → Connection fails after 3 attempts (6s) → Worker crashes
                              ↓
Granian respawns worker immediately → Worker starts → Connection fails → Crashes
                              ↓
Tight crash-respawn loop → 500%+ CPU consumption → System destabilization

Solution: Exponential Backoff with Jitter

┌─────────────────────────────────────────────────────────────────────────────┐
│                      EXPONENTIAL BACKOFF RETRY PATTERN                      │
│                                                                             │
│   Attempt 1     Attempt 2     Attempt 3     Attempt 4     Attempt 5+        │
│       │             │             │             │             │             │
│       ▼             ▼             ▼             ▼             ▼             │
│    ┌─────┐       ┌─────┐       ┌─────┐       ┌─────┐       ┌─────┐          │
│    │ 2s  │       │ 4s  │       │ 8s  │       │ 16s │       │ 30s │ (capped) │
│    └─────┘       └─────┘       └─────┘       └─────┘       └─────┘          │
│       │             │             │             │             │             │
│       └─────────────┴─────────────┼─────────────┴─────────────┘             │
│                                   │                                         │
│                                   ▼                                         │
│                           ±25% Random Jitter                                │
│                       (prevents thundering herd)                            │
│                                                                             │
│  Formula: sleep = min(base × 2^(attempt-1), 30s) × (1 ± 0.25 × random)      │
└─────────────────────────────────────────────────────────────────────────────┘
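
The formula in the diagram can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the gateway's implementation: the names `backoff_delay` and `connect_with_retry` are hypothetical, and `ConnectionError` stands in for whatever exception the real database or Redis client raises.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 30.0,
                  jitter: float = 0.25,
                  rng: Callable[[], float] = random.random) -> float:
    """Delay in seconds to wait after failed attempt number `attempt` (1-based)."""
    raw = min(base * 2 ** (attempt - 1), cap)
    # rng() is in [0, 1); map it to a factor in [1 - jitter, 1 + jitter).
    return raw * (1 + jitter * (2 * rng() - 1))


def connect_with_retry(connect: Callable[[], T], max_retries: int = 30,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `connect` until it succeeds or max_retries attempts have failed."""
    for attempt in range(1, max_retries + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_retries:
                raise  # give up; the caller decides whether the worker exits
            sleep(backoff_delay(attempt))
    raise ValueError("max_retries must be at least 1")
```

Because the worker sleeps between attempts instead of exiting, the process supervisor never sees a crash to respawn, and the jitter keeps workers that started together from retrying in lockstep.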

| Metric | Before | After |
|---|---|---|
| Retry attempts | 3 (~6 seconds) | 30 (~5 minutes) |
| CPU during outage | ~500% (crash loop) | ~0% (sleeping) |
| Recovery pattern | Thundering herd | Staggered with jitter |
| System stability | Cascading failures | Graceful degradation |

Configuration

# Database Startup Resilience
DB_MAX_RETRIES = 30 # Max attempts before worker exits (default: 30)
DB_RETRY_INTERVAL_MS = 2000 # Base interval in ms (doubles each attempt)
# Redis Startup Resilience
REDIS_MAX_RETRIES = 30 # Max attempts before worker exits (default: 30)
REDIS_RETRY_INTERVAL_MS = 2000 # Base interval in ms (doubles each attempt)
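
One way an application might read these settings is sketched below, assuming they are supplied as environment variables with the documented defaults. The `RetrySettings` class and `load_retry_settings` function are hypothetical names for illustration; the gateway's own settings loader may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RetrySettings:
    max_retries: int
    base_interval_s: float


def load_retry_settings(prefix: str, env: Mapping[str, str] = os.environ) -> RetrySettings:
    """Read <PREFIX>_MAX_RETRIES and <PREFIX>_RETRY_INTERVAL_MS with the documented defaults."""
    max_retries = int(env.get(f"{prefix}_MAX_RETRIES", "30"))
    interval_ms = int(env.get(f"{prefix}_RETRY_INTERVAL_MS", "2000"))
    if max_retries < 1 or interval_ms < 0:
        raise ValueError(f"invalid retry settings for {prefix}")
    return RetrySettings(max_retries, interval_ms / 1000)


if __name__ == "__main__":
    print(load_retry_settings("DB"))
    print(load_retry_settings("REDIS"))
```

Validating the values at load time turns a misconfiguration such as a zero retry count into an immediate, readable error rather than a silent no-retry startup.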

Retry Progression Example

With default settings (2s base interval):

| Attempt | Base Delay | With Jitter (±25%) | Cumulative Time |
|---|---|---|---|
| 1 | 2s | 1.5s - 2.5s | ~2s |
| 2 | 4s | 3s - 5s | ~6s |
| 3 | 8s | 6s - 10s | ~14s |
| 4 | 16s | 12s - 20s | ~30s |
| 5+ | 30s (cap) | 22.5s - 37.5s | ~60s+ |
After 30 retries: approximately 5 minutes total wait time before the worker gives up, providing ample time for dependencies to recover during maintenance windows or transient outages.
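
The progression can be checked by summing the un-jittered delays from the formula. The short sketch below does that with the default 2s base and 30s cap; `base_delays` is a hypothetical helper name.

```python
from itertools import accumulate


def base_delays(attempts: int, base: float = 2.0, cap: float = 30.0) -> list:
    """Un-jittered delay after each attempt: min(base * 2^(n-1), cap)."""
    return [min(base * 2 ** (n - 1), cap) for n in range(1, attempts + 1)]


if __name__ == "__main__":
    delays = base_delays(5)
    for n, (delay, total) in enumerate(zip(delays, accumulate(delays)), start=1):
        print(f"attempt {n}: delay {delay:.0f}s, cumulative {total:.0f}s")
```

For the first five attempts this prints cumulative times of 2s, 6s, 14s, 30s, and 60s. Summing the same formula over all 30 attempts gives 810s (about 13.5 minutes), so the five-minute figure quoted here may assume a different cap or attempt count and is worth verifying against the deployed settings.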

Future: Python 3.14 Free-Threading (GIL Removal)

Current Architecture (Python 3.11-3.13):
┌─────────────────────────────────────────────────────────────┐
│  16 Worker Processes × 1 GIL each = True Parallelism        │
│  Memory: 256MB base + (16 × 200MB) = ~3.5GB                 │
└─────────────────────────────────────────────────────────────┘
Future Architecture (Python 3.14+):
┌─────────────────────────────────────────────────────────────┐
│  2 Worker Processes × 32 Threads = True Parallelism         │
│  Memory: ~1GB (shared memory, reduced IPC overhead)         │
│  Performance: Near-linear scaling with CPU cores            │
└─────────────────────────────────────────────────────────────┘
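
Whether the threaded layout is viable depends on the interpreter actually running without the GIL. The sketch below shows one way to check at runtime, and reproduces the memory arithmetic from the first box. The function names are hypothetical; `sys._is_gil_enabled()` is available from CPython 3.13, so the code falls back to assuming the GIL is present on older versions.

```python
import sys
import sysconfig


def gil_enabled() -> bool:
    """True when the running interpreter still uses the GIL."""
    checker = getattr(sys, "_is_gil_enabled", None)  # added in CPython 3.13
    return True if checker is None else bool(checker())


def free_threaded_build() -> bool:
    """True when CPython was compiled with free-threading support."""
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def process_model_memory_mb(workers: int, per_worker_mb: int = 200, base_mb: int = 256) -> int:
    """Memory estimate for the multi-process layout: base + workers * per-worker."""
    return base_mb + workers * per_worker_mb


if __name__ == "__main__":
    print("GIL enabled:", gil_enabled())
    print("Free-threaded build:", free_threaded_build())
    print("16 workers:", process_model_memory_mb(16), "MB")  # 3456 MB
```

A free-threaded build can still re-enable the GIL at runtime, for example when an extension module does not declare free-threading support, which is why the sketch checks both the build flag and the live state.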

Quick Reference Commands

# Start full performance stack
docker-compose up -d
# Access via caching proxy (production)
curl http://localhost:8080/health
# Start with monitoring
docker-compose --profile monitoring up -d
# View cache hit rates
curl -sI http://localhost:8080/tools | grep -i X-Cache-Status
# Run load test
hey -n 10000 -c 200 -H "Authorization: Bearer $TOKEN" http://localhost:8080/
# Check HPA status (Kubernetes)
kubectl get hpa -n mcp-gateway