# ContextForge High-Performance Architecture
This diagram showcases the performance-optimized architecture of ContextForge, highlighting Rust-powered components, async patterns, and scaling capabilities.
## Architecture Overview
```text
KUBERNETES ORCHESTRATION LAYER
  Horizontal Pod Autoscaler (HPA)
    CPU target: 70%   Memory target: 80%
    Min replicas: 3   Max replicas: 50
          |
          v
EDGE / PROXY LAYER -- NGINX caching proxy
  Compression: Brotli/Gzip/Zstd (30-70% bandwidth savings)
  Static cache: 1 GB, 30-day TTL (X-Cache-Status header)
  API cache: 512 MB, 5-min TTL (plus schema cache)
  Rate limiting: 3000 r/s, burst 3000, connection limit 3000/IP
  Tuning: worker_processes auto | worker_connections 8192 | keepalive 512 | backlog 4096
          |
          v
GATEWAY APPLICATION LAYER (replicated pods 1..N, identical)
  HTTP server: Granian (Rust HTTP server, +20-50% throughput)
    16 workers | backlog 4096 | backpressure 64
  Async runtime: uvloop (Cython/libuv, 2-4x faster async I/O)
    1000+ concurrent requests per worker
          |
          v
RUST-POWERED COMPONENTS (FastAPI application)
  Pydantic v2 (Rust core): 5-50x faster validation, GIL bypass, 5,463 lines of schemas
  orjson (Rust JSON): 3x faster serialization, native types, ORJSONResponse, SSE streaming
  hiredis (C parser): up to 83x faster Redis parsing, large-response optimization, auto fallback
  Multi-level caching (80-95% DB query reduction):
    JWT cache        TTL 30s        <1 ms auth
    Auth cache       TTL 120-300s   0-1 queries/request
    Registry cache   TTL 20-300s    95%+ hit rate
    Admin stats      TTL 30-120s    dashboard optimization
    GlobalConfig     TTL 60s        42K -> 0 queries
  Performance optimizations:
    precompiled regex validators | lazy f-string logging
    cached Jinja templates | cached JSONPath parsing
    cached jq filter compilation | cached JSON Schema validators
    has_hooks_for optimization | buffered metrics writes
    bulk UPDATE for token cleanup | SQL-based metrics aggregation
          |
          v
DATA LAYER
  PgBouncer (connection multiplexer):
    MAX_CLIENT_CONN 5000 | DEFAULT_POOL_SIZE 450 | MAX_DB_CONNECTIONS 550
    POOL_MODE transaction | 8x connection reduction
  PostgreSQL 18 via psycopg v3 (auto-prepared statements, COPY protocol 5-10x,
  pipeline mode 2-5x, native async I/O):
    max_connections 700 | shared_buffers 512MB | effective_cache_size 1536MB
    synchronous_commit off | idle_in_transaction 30s | SSD tuned (random_page_cost 1.1)
  Redis (hiredis C parser, up to 83x faster):
    maxmemory 1GB | maxclients 10000 | tcp-backlog 2048 | allkeys-lru eviction
    session storage TTL 3600s | message cache TTL 600s | federation cache | leader election

OBSERVABILITY & MONITORING
  Prometheus (metrics store, 7-day retention) | Grafana (dashboards) | Loki (log aggregation, LogQL)
  Exporters: PostgreSQL | Redis | Nginx | PgBouncer | cAdvisor
  OpenTelemetry: OTEL traces -> OTLP exporter -> collector (service name: mcp-gateway)
```
## MCP Streamable HTTP Request Paths
ContextForge now has two materially different MCP request paths, depending on the Rust runtime mode.
### Mode summary

| Mode | Public `/mcp` ingress | Session/runtime ownership |
|---|---|---|
| `off` | Python | Python |
| `shadow` | Python | Python (Rust sidecar present internally only) |
| `edge` | Rust | Mixed: Rust ingress, Python still backs more MCP internals |
| `full` | Rust | Rust ingress plus Rust session/event/resume/live-stream/affinity cores |
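The mode table can be read as a small lookup. The sketch below is illustrative only: the mode names are real, but the mapping helper is not part of the ContextForge codebase.

```python
# Illustrative mapping of Rust runtime mode to ownership, derived from the
# mode summary table above. Not the gateway's actual API.
OWNERSHIP = {
    "off":    {"ingress": "python", "session": "python"},
    "shadow": {"ingress": "python", "session": "python"},  # Rust sidecar internal only
    "edge":   {"ingress": "rust",   "session": "mixed"},
    "full":   {"ingress": "rust",   "session": "rust"},
}

def public_mcp_owner(mode: str) -> str:
    """Return which runtime terminates public /mcp traffic for a given mode."""
    return OWNERSHIP[mode]["ingress"]
```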
### Python-owned public path (off, shadow)

```text
Client Request
  -> NGINX
  -> Python gateway middleware/auth/token scoping
  -> Python MCP session manager + handlers
  -> upstream MCP server
```
### Rust-owned public path (edge, full)

```text
Client Request
  -> NGINX
  -> Rust public MCP listener
  -> trusted Python auth endpoint
  -> Rust MCP routing/session/runtime logic
  -> upstream MCP server or narrow Python internal route
```
Important current behavior:
- Python remains authoritative for JWT auth, token scoping, and RBAC in all modes.
- `edge` and `full` remove the old public Python ingress hop by routing NGINX directly to Rust.
- `full` also moves MCP session, event-store, resume, live-stream, and affinity/owner-worker logic into Rust.
- `shadow` is the safety-first fallback mode: the Rust sidecar is running, but the public `/mcp` endpoint stays mounted on Python.
## Performance Characteristics by Layer

| Layer | Typical Latency | Scaling Bottleneck | Key Tunable |
|---|---|---|---|
| NGINX | <1ms | Not a bottleneck | `keepalive`, `worker_connections` |
| Python auth/control path | 5-15ms | Auth DB/cache queries | `AUTH_CACHE_*`, `AUTH_CACHE_BATCH_QUERIES` |
| Rust public ingress (`edge`, `full`) | low single-digit ms | Syscall/network overhead | keepalive, upstream reuse, request shaping |
| Python MCP session manager (`off`, `shadow`) | 2-5ms | JSON-RPC parsing, context vars | `JSON_RESPONSE_ENABLED` |
| RBAC check | 1-5ms | Permission DB queries | Role cache TTL (5 min internal) |
| `tools/list` / resources / prompts | 5-10ms | DB and compatibility paths | Cache TTLs, Rust specialized handlers |
| `tools/call` (upstream) | 10-200ms | Upstream server + network | Upstream session reuse, direct execution, RMCP client reuse |
## Feature Flags and Middleware Overhead
Every enabled feature registers middleware, routers, or background tasks that consume resources even when not actively used. ContextForge has ~90 feature flags; each disabled feature removes its middleware and background tasks from the request path.
The most impactful features to disable when not needed are: admin UI, A2A protocol, LLM chat, catalog, observability, audit trail, and database-backed structured logging. See the disable unused features section in the tuning guide for deployment profiles.
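The pattern behind these flags can be sketched as conditional registration: a disabled feature never registers its middleware, so it adds zero per-request overhead. The flag names below are hypothetical stand-ins, not ContextForge's actual settings.

```python
import os

# Hypothetical flag names for illustration only; ContextForge's real
# settings differ. The point is the pattern: check the flag once at
# startup, and skip registering middleware/routers when disabled.
FEATURES = {
    "MCPGATEWAY_ADMIN_UI_ENABLED": "admin_ui",
    "MCPGATEWAY_A2A_ENABLED": "a2a",
    "MCPGATEWAY_AUDIT_ENABLED": "audit_trail",
}

def enabled_features(env=os.environ):
    """Return feature names whose flag is truthy in the environment."""
    truthy = {"1", "true", "yes", "on"}
    return [name for flag, name in FEATURES.items()
            if env.get(flag, "false").lower() in truthy]
```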
## Key Architectural Insight

The important transport distinction is no longer only `/rpc` versus `/mcp`; it is now also Python-owned MCP versus Rust-owned public MCP ingress:

- `/rpc` still benefits heavily from Redis-backed caches and does not follow the streamable HTTP MCP session path.
- Python MCP (`off`, `shadow`) still pays the full Python middleware, session-manager, and handler cost on the public path.
- Rust MCP (`edge`, `full`) removes the public Python ingress hop and moves progressively more MCP session/runtime work into Rust, but Python auth/RBAC remains part of the control plane.
This means that scaling MCP throughput now depends on two different concerns:
- shrinking Python auth/control work that still happens for Rust MCP traffic
- minimizing per-request transport and upstream costs on the Rust side
## Component Performance Impact Summary

### Rust-Powered Components (GIL Bypass)
| Component | Technology | Performance Gain | Use Case |
|---|---|---|---|
| Pydantic v2 | Rust core (pydantic-core) | 5-50x faster validation | Request/response schemas (5,463 lines) |
| orjson | Rust JSON library | 3x faster serialization | All JSON encoding/decoding |
| Granian | Rust HTTP server | +20-50% throughput | HTTP request handling |
| hiredis | C-based Redis parser | Up to 83x faster | Large Redis response parsing |
| uvloop | Cython/libuv event loop | 2-4x faster async I/O | Async event loop |
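As a minimal illustration of the orjson row, here is a serializer that prefers orjson and falls back to the stdlib. This is a sketch, not the gateway's actual `ORJSONResponse` wiring.

```python
# orjson returns bytes and serializes natively; falling back to the
# stdlib keeps the code importable where orjson is not installed.
try:
    import orjson

    def dumps(obj) -> bytes:
        return orjson.dumps(obj)
except ImportError:  # pure-Python fallback, noticeably slower
    import json

    def dumps(obj) -> bytes:
        return json.dumps(obj, separators=(",", ":")).encode()
```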
### Database Performance
| Optimization | Before | After | Improvement |
|---|---|---|---|
| psycopg v3 prepared statements | Parsed each query | Auto-prepared | 2-3x faster repeated queries |
| COPY protocol | INSERT statements | Binary COPY | 5-10x faster bulk inserts |
| Pipeline mode | Sequential queries | Pipelined | 2-5x batch improvements |
| PgBouncer pooling | 1600+ connections | 200 connections | 8x connection reduction |
### Caching Performance
| Cache Layer | Hit Rate | TTL (Configurable) | DB Query Reduction |
|---|---|---|---|
| JWT Cache | >80% | 30s | Per-request HMAC verification cached |
| Auth Cache | >90% | 120-300s (max) | 3-4 → 0-1 queries/request (user, team, role, revocation) |
| Registry Cache | 95%+ | 20-300s | 50-200 → 0-1 queries (tools, servers, prompts, resources) |
| GlobalConfig Cache | 99%+ | 60s | 42K+ queries eliminated (passthrough header config) |
| MCP Session Pool | Varies | 300s pool TTL | 10-20x latency improvement for repeated upstream calls |
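All of these layers share the same shape: a value plus an expiry deadline. A minimal in-process TTL cache, as a sketch rather than ContextForge's implementation, with an injectable clock for testing:

```python
import time

class TTLCache:
    """Tiny in-process TTL cache; each layer above is this pattern plus
    invalidation hooks. `clock` is injectable so expiry is testable."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:  # expired: evict and report a miss
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)
```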
### Compression & Bandwidth
| Algorithm | Compression Ratio | Best For |
|---|---|---|
| Brotli | 15-25% smaller than Gzip | Production, CDNs |
| Zstd | Very good, fastest | High-throughput APIs |
| Gzip | Good, universal | Legacy compatibility |
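The savings are easy to reproduce with the standard library. This demo uses gzip only (the `brotli` package is optional and omitted here), and real ratios depend on payload shape; a repetitive JSON payload like a large `tools/list` response compresses especially well.

```python
import gzip
import json

# A repetitive JSON payload, loosely shaped like a tools/list response.
payload = json.dumps([
    {"name": f"tool-{i}", "description": "example tool",
     "inputSchema": {"type": "object"}}
    for i in range(200)
]).encode()

compressed = gzip.compress(payload, compresslevel=6)
ratio = len(compressed) / len(payload)  # lower is better
```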
### Scaling Capacity
Capacity varies by workload type. MCP Streamable HTTP requests are more resource-intensive per request than REST API calls due to additional middleware, auth, and upstream proxy overhead.
| Configuration | REST API (/rpc) | MCP Streamable HTTP (/mcp) |
|---|---|---|
| Single pod (16-24 workers) | ~1,600 RPS | ~250-400 RPS |
| 3 pods (default) | ~4,800 RPS | ~750-800 RPS |
| 10 pods (HPA scaled) | ~16,000 RPS | ~2,500-3,000 RPS |
MCP throughput is lower because each request includes auth/RBAC database queries that the /rpc endpoint caches in Redis. With session pool enabled (MCP_SESSION_POOL_ENABLED=true), upstream MCP server latency is amortized across pooled connections, providing ~10% throughput improvement.
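For back-of-envelope capacity planning from the table above: the per-pod figures are the documented approximations, and the 70% headroom default plus the 300 RPS MCP midpoint are assumptions for this sketch, not ContextForge settings.

```python
import math

# Documented per-pod approximations; "mcp" uses the midpoint of 250-400
# (an assumption, not a measured value).
PER_POD_RPS = {"rest": 1600, "mcp": 300}

def pods_needed(target_rps: int, workload: str, headroom: float = 0.7) -> int:
    """Pods required to serve target_rps while running each pod at
    `headroom` of its approximate capacity."""
    return math.ceil(target_rps / (PER_POD_RPS[workload] * headroom))
```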
## Key Performance Features by Issue
| Issue # | Feature | Impact |
|---|---|---|
| #1695 | Granian HTTP server migration | +20-50% throughput |
| #1696, #1692 | orjson throughout codebase | 3x JSON performance |
| #1699 | uvicorn[standard] with uvloop/httptools | 15-30% faster async |
| #1702 | hiredis Redis parser | Up to 83x Redis parsing |
| #1740 | psycopg v3 migration | Auto-prepared, COPY, pipeline |
| #1750, #1753 | PgBouncer connection pooling | 8x connection reduction |
| #1715 | GlobalConfig in-memory cache | 42K queries eliminated |
| #1773 | get_user_teams() caching | Reduced idle-in-transaction |
| #1809-1814 | Schema/template/filter caching | Compilation overhead removed |
| #1816, #1819, #1830 | Precompiled regex patterns | CPU reduction in hot paths |
| #1828, #1837 | SSE/logging micro-optimizations | Reduced allocation overhead |
| #1844 | Monitoring profile | Production observability |
| #2025 | Startup resilience (exponential backoff) | Prevents crash-loop CPU storms |
## Startup Resilience
The gateway implements exponential backoff with jitter for database and Redis connection retries at startup. This prevents CPU-intensive crash-respawn loops when dependencies are temporarily unavailable.
### Problem Solved

Without exponential backoff, a dependency outage would cause:

```text
Worker starts -> connection fails after 3 attempts (6s) -> worker crashes
       |
       v
Granian respawns the worker immediately -> worker starts -> connection fails -> crashes
       |
       v
Tight crash-respawn loop -> 500%+ CPU consumption -> system destabilization
```
### Solution: Exponential Backoff with Jitter

```text
Attempt:    1     2     3     4     5+
Delay:      2s    4s    8s    16s   30s (capped)

Each delay is then multiplied by ±25% random jitter to prevent a
thundering herd of simultaneous reconnects.

Formula: sleep = min(base × 2^(attempt-1), 30s) × (1 ± 0.25 × random)
```
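The formula translates directly to code. This is a sketch mirroring the documented behavior (2s base, 30s cap, ±25% jitter), not a quote of the gateway's implementation; the `rng` parameter is injectable so the jitter bounds are testable.

```python
import random

def backoff_delay(attempt: int, base: float = 2.0, cap: float = 30.0,
                  jitter: float = 0.25, rng=random.random) -> float:
    """Delay before retry `attempt` (1-based): exponential growth capped
    at `cap`, then +/-25% random jitter to stagger reconnects."""
    delay = min(base * 2 ** (attempt - 1), cap)
    return delay * (1 + jitter * (2 * rng() - 1))  # uniform in [1-j, 1+j]
```

With `rng` pinned to its extremes, this reproduces the bounds in the retry progression table below (e.g. attempt 3: 6s-10s).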
### Performance Impact
| Metric | Before | After |
|---|---|---|
| Retry attempts | 3 (~6 seconds) | 30 (~13 minutes of cumulative delay) |
| CPU during outage | ~500% (crash loop) | ~0% (sleeping) |
| Recovery pattern | Thundering herd | Staggered with jitter |
| System stability | Cascading failures | Graceful degradation |
### Configuration

```bash
# Database startup resilience
DB_MAX_RETRIES=30             # Max attempts before worker exits (default: 30)
DB_RETRY_INTERVAL_MS=2000     # Base interval in ms (doubles each attempt)

# Redis startup resilience
REDIS_MAX_RETRIES=30          # Max attempts before worker exits (default: 30)
REDIS_RETRY_INTERVAL_MS=2000  # Base interval in ms (doubles each attempt)
```
### Retry Progression Example
With default settings (2s base interval):
| Attempt | Base Delay | With Jitter (Β±25%) | Cumulative Time |
|---|---|---|---|
| 1 | 2s | 1.5s - 2.5s | ~2s |
| 2 | 4s | 3s - 5s | ~6s |
| 3 | 8s | 6s - 10s | ~14s |
| 4 | 16s | 12s - 20s | ~30s |
| 5+ | 30s (cap) | 22.5s - 37.5s | ~60s+ |
After 30 retries, the delays sum to roughly 13 minutes (2 + 4 + 8 + 16s for attempts 1-4, then about 26 × 30s ≈ 810s total) before the worker gives up, providing ample time for dependencies to recover during maintenance windows or transient outages.
## Future: Python 3.14 Free-Threading (GIL Removal)

Current architecture (Python 3.11-3.13):

```text
16 worker processes × 1 GIL each = true parallelism via processes
Memory: 256MB base + (16 × 200MB) ≈ 3.5GB
```

Future architecture (Python 3.14+):

```text
2 worker processes × 32 threads = true parallelism via free threading
Memory: ~1GB (shared memory, reduced IPC overhead)
Performance: near-linear scaling with CPU cores
```
## See Also
- Gateway Tuning Guide - Environment variables, MCP transport settings, session pool, connection pool tuning
- Performance Profiling Guide - py-spy, memray, PostgreSQL profiling, MCP bottleneck triage
- Database Performance Guide - N+1 detection, query logging, DB vs transport bottleneck triage
## Quick Reference Commands

```bash
# Start the full performance stack
docker-compose up -d

# Access via the caching proxy (production)
curl http://localhost:8080/health

# Start with monitoring
docker-compose --profile monitoring up -d

# View cache hit rates
curl -I http://localhost:8080/tools | grep X-Cache-Status

# Run a load test
hey -n 10000 -c 200 -H "Authorization: Bearer $TOKEN" http://localhost:8080/

# Check HPA status (Kubernetes)
kubectl get hpa -n mcp-gateway
```