ContextForge High-Performance Architecture
This diagram showcases the performance-optimized architecture of ContextForge, highlighting Rust-powered components, async patterns, and scaling capabilities.
Architecture Overview
```
┌─ KUBERNETES ORCHESTRATION LAYER ───────────────────────────────────────────
│  Horizontal Pod Autoscaler (HPA)
│    CPU target: 70%  │  Memory target: 80%
│    Min replicas: 3  │  Max replicas: 50
└────────────────────────────────────────────────────────────────────────────
                                      │
                                      ▼
┌─ EDGE / PROXY LAYER: NGINX CACHING PROXY ──────────────────────────────────
│  Brotli/Gzip/Zstd compression (30-70% bandwidth savings)
│  Static cache: 1GB, 30-day TTL (X-Cache-Status header)
│  API cache: 512MB, 5-min TTL, schema cache
│  Rate limiting: 3000 r/s │ burst: 3000 requests │ conn limit: 3000/IP
│  worker_processes: auto │ worker_connections: 8192
│  keepalive: 512 │ backlog: 4096
└────────────────────────────────────────────────────────────────────────────
                                      │
                                      ▼
┌─ GATEWAY APPLICATION LAYER (REPLICATED PODS 1..N) ─────────────────────────
│  Each pod:
│
│  HTTP server layer: GRANIAN (Rust HTTP server, +20-50% throughput)
│    16 workers │ backlog: 4096 │ backpressure: 64
│                                      │
│                                      ▼
│  Async runtime: UVLOOP (Cython/libuv event loop, 2-4x faster)
│    1000+ concurrent requests per worker
└────────────────────────────────────────────────────────────────────────────
                                      │
                                      ▼
┌─ RUST-POWERED COMPONENTS (FASTAPI APPLICATION) ────────────────────────────
│  PYDANTIC V2 (Rust core): 5-50x faster validation, GIL bypass,
│    5,463 lines of schemas
│  ORJSON (Rust JSON): 3x faster serialization, native types,
│    ORJSONResponse, SSE streaming
│  HIREDIS (C parser): up to 83x faster Redis parsing,
│    large-response optimization, auto fallback
│
│  MULTI-LEVEL CACHING (80-95% DB query reduction)
│    JWT cache         TTL 30s        <1ms auth
│    Auth cache        TTL 120-300s   0-1 queries/request
│    Registry cache    TTL 20-300s    95%+ hit rate
│    Admin stats       TTL 30-120s    dashboard optimization
│    GlobalConfig      TTL 60s        42K → 0 queries
│
│  PERFORMANCE OPTIMIZATIONS
│    • Precompiled regex validators    • Lazy f-string logging
│    • Cached Jinja templates          • Cached JSONPath parsing
│    • Cached jq filter compilation    • Cached JSON Schema validators
│    • has_hooks_for optimization      • Buffered metrics writes
│    • Bulk UPDATE for token cleanup   • SQL-based metrics aggregation
└────────────────────────────────────────────────────────────────────────────
                                      │
                                      ▼
┌─ DATA LAYER ───────────────────────────────────────────────────────────────
│  CONNECTION POOLING: PGBOUNCER (connection multiplexer)
│    MAX_CLIENT_CONN: 5000 │ DEFAULT_POOL_SIZE: 450
│    MAX_DB_CONNECTIONS: 550 │ POOL_MODE: transaction
│    8x connection reduction
│                                      │
│                                      ▼
│  POSTGRESQL 18 (production database) via PSYCOPG V3 (modern driver)
│    • Auto-prepared statements   • COPY protocol (5-10x)
│    • Pipeline mode (2-5x)       • Native async I/O
│    max_connections: 700 │ shared_buffers: 512MB
│    effective_cache_size: 1536MB │ synchronous_commit: off
│    idle_in_transaction: 30s │ SSD optimized (random_page_cost: 1.1)
│
│  DISTRIBUTED CACHE: REDIS (hiredis C parser, up to 83x faster)
│    maxmemory: 1GB │ maxclients: 10000 │ tcp-backlog: 2048
│    allkeys-lru eviction
│    Session storage: TTL 3600s │ Message cache: TTL 600s
│    Federation cache │ Leader election
└────────────────────────────────────────────────────────────────────────────
┌─ OBSERVABILITY & MONITORING ───────────────────────────────────────────────
│  Prometheus (metrics store, 7-day retention) │ Grafana (dashboards)
│  Loki (log aggregation, LogQL)
│  Exporters: PostgreSQL │ Redis │ Nginx │ PgBouncer │ cAdvisor
│  OpenTelemetry: OTEL traces → OTLP exporter → collector
│    (service name: mcp-gateway)
└────────────────────────────────────────────────────────────────────────────
```
MCP Streamable HTTP Request Path
Every MCP request to /servers/{server_id}/mcp passes through these layers:
```
Client request (JSON-RPC over HTTP POST)
        │
        ▼
┌─ NGINX (EDGE / PROXY) ────────────────────────
│  • least_conn load balancing
│  • keepalive 512 per worker
│  • No caching for /mcp (POST requests)
└───────────────────────────────────────────────
        │
        ▼
┌─ GATEWAY MIDDLEWARE STACK ────────────────────
│  1. SecurityHeaders, CORS
│  2. MCPPathRewrite + Auth
│     • JWT verification (HMAC)
│     • Token revocation check (DB/cache)
│     • User lookup (DB/cache)
│     • Team resolution (DB/cache)
│  3. Token scoping (Layer 1 auth)
│  4. Request logging
└───────────────────────────────────────────────
        │
        ▼
┌─ MCP SDK SESSIONMANAGER ──────────────────────
│  • JSON-RPC envelope parsing
│  • Session tracking (stateless by default)
│  • Context variable propagation
│  • Handler method routing
└───────────────────────────────────────────────
        │
        ├── tools/list
        ├── tools/call
        ├── resources/list
        ├── prompts/list
        └── ping
        │
        ▼
┌─ MCP HANDLER ─────────────────────────────────
│  • RBAC permission check (Layer 2 auth)
│  • Server/tool lookup (DB query)
│  • For tools/call: upstream proxy
│    via MCP Session Pool (if enabled)
└───────────────────────────────────────────────
        │
        ▼
┌─ UPSTREAM MCP SERVER ─────────────────────────
│  (fast_test_server, fast_time, plugins)
│  • Executes tool logic
│  • Returns JSON-RPC result
└───────────────────────────────────────────────
```
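The envelope the SessionManager parses is plain JSON-RPC 2.0. A minimal sketch of a tools/call request body (the tool name and arguments here are hypothetical placeholders, not actual ContextForge tools):

```python
import json

# A minimal JSON-RPC 2.0 envelope for a tools/call request, of the kind
# the SessionManager parses. Tool name and arguments are illustrative.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_current_time",        # hypothetical tool name
        "arguments": {"timezone": "UTC"},  # tool-specific arguments
    },
}

body = json.dumps(request)
print(body)
```

POSTing a body like this to /servers/{server_id}/mcp traverses every layer shown above.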
Performance Characteristics by Layer
| Layer | Typical Latency | Scaling Bottleneck | Key Tunable |
|---|---|---|---|
| nginx | <1ms | Not a bottleneck | keepalive, worker_connections |
| Middleware + Auth | 5-15ms | Auth DB queries | AUTH_CACHE_*_TTL, AUTH_CACHE_BATCH_QUERIES |
| MCP SDK SessionManager | 2-5ms | JSON-RPC parsing, context vars | JSON_RESPONSE_ENABLED |
| RBAC check | 1-5ms | Permission DB queries | Role cache TTL (5 min internal) |
| tools/list (DB) | 5-10ms | Sequential table scans | REGISTRY_CACHE_TOOLS_TTL |
| tools/call (upstream) | 10-200ms | Upstream server + network | MCP_SESSION_POOL_ENABLED |
Feature Flags and Middleware Overhead
Every enabled feature registers middleware, routers, or background tasks that consume resources even when not actively used. ContextForge has ~90 feature flags; each disabled feature removes its middleware and background tasks from the request path.
The most impactful features to disable when not needed are: admin UI, A2A protocol, LLM chat, catalog, observability, audit trail, and database-backed structured logging. See the disable unused features section in the tuning guide for deployment profiles.
Key Architectural Insight
The /rpc endpoint and the /servers/{id}/mcp endpoint serve the same logical operations (tools/list, tools/call) but follow different code paths:
- /rpc: Uses Redis-backed caching (registry cache, tool lookup cache) for most lookups. Under load, Redis absorbs the read pressure, keeping PgBouncer/PostgreSQL near idle.
- /mcp: Routes through the MCP SDK session manager, which executes its own handler functions. These handlers query the database via SQLAlchemy for server resolution, tool lookup, and RBAC checks. The auth cache (Redis-backed, TTL up to 300s) mitigates some of this, but RBAC and server/tool lookups still hit the database.
This means that scaling MCP throughput depends heavily on reducing per-request database queries in the MCP transport handlers.
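The difference can be sketched as a read-through cache (the names below are illustrative, not the actual ContextForge API): on the /rpc path, only the first lookup in each TTL window touches the database, while everything else is served from the cache.

```python
import time

# Sketch of the read-through pattern the /rpc path benefits from:
# check the cache first, fall back to the database only on a miss,
# then store the result with a TTL. Names are illustrative.
class ReadThroughCache:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}
        self.db_queries = 0  # counts misses that actually hit the database

    def _load_from_db(self, key: str) -> object:
        # Stand-in for a SQLAlchemy lookup.
        self.db_queries += 1
        return {"tool": key}

    def get(self, key: str) -> object:
        now = time.monotonic()
        entry = self._store.get(key)
        if entry and entry[0] > now:
            return entry[1]              # cache hit: zero DB queries
        value = self._load_from_db(key)  # cache miss: one DB query
        self._store[key] = (now + self.ttl, value)
        return value

cache = ReadThroughCache(ttl_seconds=300)
for _ in range(100):
    cache.get("fast_time")
print(cache.db_queries)  # 1 -- 99 of 100 lookups never reached the DB
```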
Component Performance Impact Summary
Rust-Powered Components (GIL Bypass)
| Component | Technology | Performance Gain | Use Case |
|---|---|---|---|
| Pydantic v2 | Rust core (pydantic-core) | 5-50x faster validation | Request/response schemas (5,463 lines) |
| orjson | Rust JSON library | 3x faster serialization | All JSON encoding/decoding |
| Granian | Rust HTTP server | +20-50% throughput | HTTP request handling |
| hiredis | C-based Redis parser | Up to 83x faster | Large Redis response parsing |
| uvloop | Cython/libuv event loop | 2-4x faster async I/O | Async event loop |
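The orjson row can be made concrete. The sketch below is a hedged illustration (not the gateway's actual serialization module): it prefers orjson's Rust serializer when installed and falls back to the standard library, mirroring the auto-fallback approach the table notes for hiredis.

```python
import json
from typing import Any

# Prefer orjson when available; fall back to the stdlib. orjson.dumps
# returns bytes directly, so the fallback encodes to bytes as well.
try:
    import orjson

    def dump_json(obj: Any) -> bytes:
        return orjson.dumps(obj)
except ImportError:
    def dump_json(obj: Any) -> bytes:
        return json.dumps(obj, separators=(",", ":")).encode()

payload = {"jsonrpc": "2.0", "id": 1, "result": {"ok": True}}
print(dump_json(payload))
```

In a FastAPI application the same effect is typically achieved by setting ORJSONResponse as the default response class.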
Database Performance
| Optimization | Before | After | Improvement |
|---|---|---|---|
| psycopg v3 prepared statements | Parsed each query | Auto-prepared | 2-3x faster repeated queries |
| COPY protocol | INSERT statements | Binary COPY | 5-10x faster bulk inserts |
| Pipeline mode | Sequential queries | Pipelined | 2-5x batch improvements |
| PgBouncer pooling | 1600+ connections | 200 connections | 8x connection reduction |
Caching Performance
| Cache Layer | Hit Rate | TTL (Configurable) | DB Query Reduction |
|---|---|---|---|
| JWT Cache | >80% | 30s | Per-request HMAC verification cached |
| Auth Cache | >90% | 120-300s (max) | 3-4 β 0-1 queries/request (user, team, role, revocation) |
| Registry Cache | 95%+ | 20-300s | 50-200 β 0-1 queries (tools, servers, prompts, resources) |
| GlobalConfig Cache | 99%+ | 60s | 42K+ queries eliminated (passthrough header config) |
| MCP Session Pool | Varies | 300s pool TTL | 10-20x latency improvement for repeated upstream calls |
Compression & Bandwidth
| Algorithm | Compression Ratio | Best For |
|---|---|---|
| Brotli | 15-25% smaller than Gzip | Production, CDNs |
| Zstd | Very good, fastest | High-throughput APIs |
| Gzip | Good, universal | Legacy compatibility |
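To get a feel for the savings, the standard library alone demonstrates the effect on repetitive JSON such as a tools listing (the payload here is synthetic; Brotli and Zstd need third-party packages and compress this kind of data further still):

```python
import gzip
import json

# Compress a synthetic, repetitive JSON payload with stdlib gzip to
# illustrate the bandwidth savings the table describes.
tools = [{"name": f"tool_{i}", "description": "does something useful"}
         for i in range(200)]
raw = json.dumps(tools).encode()
compressed = gzip.compress(raw)

savings = 1 - len(compressed) / len(raw)
print(f"{len(raw)} -> {len(compressed)} bytes ({savings:.0%} smaller)")
```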
Scaling Capacity
Capacity varies by workload type. MCP Streamable HTTP requests are more resource-intensive per request than REST API calls due to additional middleware, auth, and upstream proxy overhead.
| Configuration | REST API (/rpc) | MCP Streamable HTTP (/mcp) |
|---|---|---|
| Single pod (16-24 workers) | ~1,600 RPS | ~250-400 RPS |
| 3 pods (default) | ~4,800 RPS | ~750-800 RPS |
| 10 pods (HPA scaled) | ~16,000 RPS | ~2,500-3,000 RPS |
MCP throughput is lower because each request includes auth/RBAC database queries that the /rpc endpoint caches in Redis. With session pool enabled (MCP_SESSION_POOL_ENABLED=true), upstream MCP server latency is amortized across pooled connections, providing ~10% throughput improvement.
Key Performance Features by Issue
| Issue # | Feature | Impact |
|---|---|---|
| #1695 | Granian HTTP server migration | +20-50% throughput |
| #1696, #1692 | orjson throughout codebase | 3x JSON performance |
| #1699 | uvicorn[standard] with uvloop/httptools | 15-30% faster async |
| #1702 | hiredis Redis parser | Up to 83x Redis parsing |
| #1740 | psycopg v3 migration | Auto-prepared, COPY, pipeline |
| #1750, #1753 | PgBouncer connection pooling | 8x connection reduction |
| #1715 | GlobalConfig in-memory cache | 42K queries eliminated |
| #1773 | get_user_teams() caching | Reduced idle-in-transaction |
| #1809-1814 | Schema/template/filter caching | Compilation overhead removed |
| #1816, #1819, #1830 | Precompiled regex patterns | CPU reduction in hot paths |
| #1828, #1837 | SSE/logging micro-optimizations | Reduced allocation overhead |
| #1844 | Monitoring profile | Production observability |
| #2025 | Startup resilience (exponential backoff) | Prevents crash-loop CPU storms |
Startup Resilience
The gateway implements exponential backoff with jitter for database and Redis connection retries at startup. This prevents CPU-intensive crash-respawn loops when dependencies are temporarily unavailable.
Problem Solved
Without exponential backoff, a dependency outage would cause:
```
Worker starts → connection fails after 3 attempts (6s) → worker crashes
        ↓
Granian respawns the worker immediately → worker starts → connection fails → crashes
        ↓
Tight crash-respawn loop → 500%+ CPU consumption → system destabilization
```
Solution: Exponential Backoff with Jitter
```
EXPONENTIAL BACKOFF RETRY PATTERN

  Attempt 1  → wait  2s
  Attempt 2  → wait  4s
  Attempt 3  → wait  8s
  Attempt 4  → wait 16s
  Attempt 5+ → wait 30s (capped)

  Each delay receives ±25% random jitter (prevents thundering herd)

  Formula: sleep = min(base × 2^(attempt-1), 30s) × (1 ± 0.25 × random)
```
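The formula transcribes directly into Python. This is a minimal sketch (function and constant names are illustrative, not the gateway's actual internals):

```python
import random

MAX_DELAY_S = 30.0  # cap on any single backoff delay
JITTER = 0.25       # ±25% random jitter

def retry_delay(attempt: int, base_s: float = 2.0) -> float:
    """sleep = min(base * 2^(attempt-1), 30s) * (1 ± 0.25 * random)"""
    capped = min(base_s * 2 ** (attempt - 1), MAX_DELAY_S)
    return capped * (1 + JITTER * random.uniform(-1, 1))

for attempt in range(1, 6):
    print(f"attempt {attempt}: {retry_delay(attempt):.1f}s")
```

With the default 2s base this reproduces the 2s/4s/8s/16s/30s progression shown in the table below, before jitter is applied.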
Performance Impact
| Metric | Before | After |
|---|---|---|
| Retry attempts | 3 (~6 seconds) | 30 (~13 minutes) |
| CPU during outage | ~500% (crash loop) | ~0% (sleeping) |
| Recovery pattern | Thundering herd | Staggered with jitter |
| System stability | Cascading failures | Graceful degradation |
Configuration
```bash
# Database startup resilience
DB_MAX_RETRIES=30          # Max attempts before worker exits (default: 30)
DB_RETRY_INTERVAL_MS=2000  # Base interval in ms (doubles each attempt)

# Redis startup resilience
REDIS_MAX_RETRIES=30          # Max attempts before worker exits (default: 30)
REDIS_RETRY_INTERVAL_MS=2000  # Base interval in ms (doubles each attempt)
```
Retry Progression Example
With default settings (2s base interval):
| Attempt | Base Delay | With Jitter (Β±25%) | Cumulative Time |
|---|---|---|---|
| 1 | 2s | 1.5s - 2.5s | ~2s |
| 2 | 4s | 3s - 5s | ~6s |
| 3 | 8s | 6s - 10s | ~14s |
| 4 | 16s | 12s - 20s | ~30s |
| 5+ | 30s (cap) | 22.5s - 37.5s | ~60s+ |
After 30 retries: roughly 13-14 minutes of cumulative wait time (per the progression above: ~60s for the first five attempts, then 30s per attempt) before the worker gives up, providing ample time for dependencies to recover during maintenance windows or transient outages.
Future: Python 3.14 Free-Threading (GIL Removal)
Current architecture (Python 3.11-3.13):

```
16 worker processes × 1 GIL each = true parallelism across processes
Memory: 256MB base + (16 × 200MB) ≈ 3.5GB
```

Future architecture (Python 3.14+):

```
2 worker processes × 32 threads = true parallelism with free threading
Memory: ~1GB (shared memory, reduced IPC overhead)
Performance: near-linear scaling with CPU cores
```
See Also
- Gateway Tuning Guide - Environment variables, MCP transport settings, session pool, connection pool tuning
- Performance Profiling Guide - py-spy, memray, PostgreSQL profiling, MCP bottleneck triage
- Database Performance Guide - N+1 detection, query logging, DB vs transport bottleneck triage
Quick Reference Commands
```bash
# Start full performance stack
docker-compose up -d

# Access via caching proxy (production)
curl http://localhost:8080/health

# Start with monitoring
docker-compose --profile monitoring up -d

# View cache hit rates
curl -I http://localhost:8080/tools | grep X-Cache-Status

# Run load test
hey -n 10000 -c 200 -H "Authorization: Bearer $TOKEN" http://localhost:8080/

# Check HPA status (Kubernetes)
kubectl get hpa -n mcp-gateway
```