ADR-032: MCP Session Pool for Connection Reuse
- Status: Accepted
- Date: 2025-01-05
- Deciders: Platform Team
Introduction: Understanding Connection Reuse
The Connection Overhead Problem
When a client makes an HTTP request, several steps must occur before any application data is exchanged:
┌──────────────────────────────────────────────────────────────────────────────┐
│                        Traditional HTTP Request Flow                         │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│ Client                                                       Server          │
│  │                                                              │            │
│  │────────── TCP SYN ──────────────────────────────────────────►│  ①         │
│  │◄───────── TCP SYN-ACK ───────────────────────────────────────│            │
│  │────────── TCP ACK ──────────────────────────────────────────►│            │
│  │                                                              │            │
│  │────────── TLS ClientHello ──────────────────────────────────►│  ②         │
│  │◄───────── TLS ServerHello + Certificate ─────────────────────│            │
│  │────────── TLS Key Exchange ─────────────────────────────────►│            │
│  │◄───────── TLS Finished ──────────────────────────────────────│            │
│  │                                                              │            │
│  │────────── HTTP Request ─────────────────────────────────────►│  ③         │
│  │◄───────── HTTP Response ─────────────────────────────────────│            │
│  │                                                              │            │
│  │────────── TCP FIN ──────────────────────────────────────────►│  ④         │
│  │                                                              │            │
│                                                                              │
│  ① TCP Handshake: ~1-3ms (local) to ~50-150ms (cross-region)                 │
│  ② TLS Handshake: ~5-15ms (additional round trips + crypto)                  │
│  ③ HTTP Exchange: ~1-5ms (actual request/response)                           │
│  ④ Connection Close                                                          │
│                                                                              │
│  Total overhead per request: 10-170ms (mostly handshakes!)                   │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
HTTP Persistent Connections (Keep-Alive)
HTTP/1.1 persistent connections solve this by reusing TCP connections:
┌──────────────────────────────────────────────────────────────────────────────┐
│                             HTTP Keep-Alive Flow                             │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│ Client                                                       Server          │
│  │                                                              │            │
│  │────────── TCP + TLS Handshakes ─────────────────────────────►│  Once      │
│  │                                                              │            │
│  │────────── HTTP Request 1 ───────────────────────────────────►│            │
│  │◄───────── HTTP Response 1 ───────────────────────────────────│            │
│  │                                                              │            │
│  │────────── HTTP Request 2 ───────────────────────────────────►│  Reuse     │
│  │◄───────── HTTP Response 2 ───────────────────────────────────│            │
│  │                                                              │            │
│  │────────── HTTP Request 3 ───────────────────────────────────►│  Reuse     │
│  │◄───────── HTTP Response 3 ───────────────────────────────────│            │
│  │                                                              │            │
│                                                                              │
│  First request: 10-170ms (includes handshakes)                               │
│  Subsequent: 1-5ms (just HTTP exchange)  ← 10-50x faster!                    │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
MCP Protocol: An Additional Layer
The Model Context Protocol (MCP) adds its own session initialization on top of HTTP:
┌──────────────────────────────────────────────────────────────────────────────┐
│                          MCP Session Initialization                          │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│ MCP Client                                                 MCP Server        │
│  │                                                              │            │
│  │ ┌──────────────────────────────────────────────────────────┐ │            │
│  │ │ TCP + TLS (reused via HTTP Keep-Alive in httpx client)   │ │            │
│  │ └──────────────────────────────────────────────────────────┘ │            │
│  │                                                              │            │
│  │────────── initialize (JSON-RPC) ────────────────────────────►│  ①         │
│  │   {                                                          │            │
│  │     "method": "initialize",                                  │            │
│  │     "params": {                                              │            │
│  │       "protocolVersion": "2025-03-26",                       │            │
│  │       "capabilities": {...},                                 │            │
│  │       "clientInfo": {"name": "gateway", ...}                 │            │
│  │     }                                                        │            │
│  │   }                                                          │            │
│  │                                                              │            │
│  │◄───────── InitializeResult ──────────────────────────────────│  ②         │
│  │   {                                                          │            │
│  │     "protocolVersion": "2025-03-26",                         │            │
│  │     "capabilities": {...},                                   │            │
│  │     "serverInfo": {"name": "my-mcp-server", ...}             │            │
│  │   }                                                          │            │
│  │   Header: mcp-session-id: "abc123"                           │            │
│  │                                                              │            │
│  │────────── initialized (notification) ───────────────────────►│  ③         │
│  │                                                              │            │
│  │ ┌──────────────────────────────────────────────────────────┐ │            │
│  │ │ Session established - can now call tools, read resources │ │            │
│  │ └──────────────────────────────────────────────────────────┘ │            │
│  │                                                              │            │
│  │────────── tools/call ───────────────────────────────────────►│  ④         │
│  │◄───────── CallToolResult ────────────────────────────────────│            │
│  │                                                              │            │
│                                                                              │
│  ① Client sends initialize with protocol version and capabilities            │
│  ② Server responds with its capabilities and assigns mcp-session-id          │
│  ③ Client confirms with initialized notification                             │
│  ④ Now tool calls, resource reads, etc. can proceed                          │
│                                                                              │
│  MCP initialization overhead: ~10-15ms (2-3 round trips)                     │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
The mcp-session-id header is critical - it identifies this session for all subsequent requests. The MCP SDK's ClientSession class manages this state internally.
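The handshake messages described above can be sketched as plain JSON-RPC payloads. This is only an illustration of the message shapes, not the MCP SDK's API: the helper names, the request id, and the `version` field in `clientInfo` are assumptions made for the example, and in practice `ClientSession` builds and sends these messages for you.

```python
import json


def build_initialize_request(request_id: int, client_name: str) -> dict:
    """Step 1: the client announces its protocol version and capabilities."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,  # hypothetical id; any unique value works
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            # "version" is a placeholder value for this sketch
            "clientInfo": {"name": client_name, "version": "0.0.0"},
        },
    }


def build_initialized_notification() -> dict:
    """Step 3: a notification carries no id, so no response is expected."""
    return {"jsonrpc": "2.0", "method": "notifications/initialized"}


def session_headers(session_id: str) -> dict:
    """Header the client echoes on every request after step 2."""
    return {"mcp-session-id": session_id}


if __name__ == "__main__":
    print(json.dumps(build_initialize_request(1, "gateway"), indent=2))
    print(json.dumps(build_initialized_notification()))
    print(session_headers("abc123"))
```

A pooled session skips both messages on reuse because the server already associates the stored `mcp-session-id` with a negotiated session.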
The Full Picture: Why Session Pooling Matters
Without session pooling, every tool call pays the full cost:
┌──────────────────────────────────────────────────────────────────────────────┐
│              WITHOUT Session Pooling (Current MCP SDK Default)               │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Tool Call 1:                                                                │
│    TCP Handshake ───────────────────────────────────── ~2ms                  │
│    TLS Handshake ───────────────────────────────────── ~5ms                  │
│    MCP Initialize ──────────────────────────────────── ~10ms                 │
│    Tool Execution ──────────────────────────────────── ~2ms                  │
│    Close ───────────────────────────────────────────── ~1ms                  │
│                                                 Total: ~20ms                 │
│                                                                              │
│  Tool Call 2:                                                                │
│    TCP Handshake ───────────────────────────────────── ~2ms                  │
│    TLS Handshake ───────────────────────────────────── ~5ms                  │
│    MCP Initialize ──────────────────────────────────── ~10ms                 │
│    Tool Execution ──────────────────────────────────── ~2ms                  │
│    Close ───────────────────────────────────────────── ~1ms                  │
│                                                 Total: ~20ms                 │
│                                                                              │
│  Tool Call 3: ~20ms                                                          │
│  Tool Call 4: ~20ms                                                          │
│  ...                                                                         │
│                                                                              │
│  10 tool calls = 200ms total                                                 │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────────────┐
│                  WITH Session Pooling (This Implementation)                  │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Tool Call 1 (Pool Miss - creates new session):                              │
│    TCP Handshake ───────────────────────────────────── ~2ms                  │
│    TLS Handshake ───────────────────────────────────── ~5ms                  │
│    MCP Initialize ──────────────────────────────────── ~10ms                 │
│    Tool Execution ──────────────────────────────────── ~2ms                  │
│    Return to pool (not closed!) ────────────────────── ~0ms                  │
│                                                 Total: ~19ms                 │
│                                                                              │
│  Tool Call 2 (Pool Hit - reuses session):                                    │
│    Acquire from pool ───────────────────────────────── ~0.1ms                │
│    Tool Execution ──────────────────────────────────── ~2ms                  │
│    Return to pool ──────────────────────────────────── ~0.1ms                │
│                                                 Total: ~2ms  ← 10x faster!   │
│                                                                              │
│  Tool Call 3: ~2ms (pool hit)                                                │
│  Tool Call 4: ~2ms (pool hit)                                                │
│  ...                                                                         │
│                                                                              │
│  10 tool calls = 19ms + 9×2ms = 37ms total (vs 200ms = 5.4x faster!)         │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
Comparison: HTTP Keep-Alive vs MCP Session Pooling
| Layer | What's Reused | Overhead Saved | Who Manages It |
|---|---|---|---|
| HTTP Keep-Alive | TCP + TLS connection | ~5-15ms | httpx client |
| MCP Session Pool | TCP + TLS + MCP session | ~15-25ms | This implementation |
HTTP Keep-Alive is already used by the httpx client internally. MCP Session Pooling adds MCP-level session reuse on top, saving the initialize → initialized handshake (~10-15ms) on every pool hit.
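The per-call figures quoted in this introduction (2ms TCP, 5ms TLS, 10ms MCP initialize, 2ms execution, 1ms close) can be checked with a few lines of arithmetic. The sketch below assumes every call after the first is a pool hit and ignores the ~0.1ms acquire/release cost; the function name is made up for the example.

```python
TCP_MS, TLS_MS, INIT_MS, EXEC_MS, CLOSE_MS = 2.0, 5.0, 10.0, 2.0, 1.0


def total_latency_ms(calls: int, pooled: bool) -> float:
    """Total latency for `calls` sequential tool calls against one server."""
    setup = TCP_MS + TLS_MS + INIT_MS
    if calls == 0:
        return 0.0
    if not pooled:
        # Every call pays setup, execution, and close.
        return calls * (setup + EXEC_MS + CLOSE_MS)
    # First call is a pool miss (setup + execution, session is kept open);
    # the remaining calls only pay for execution.
    return (setup + EXEC_MS) + (calls - 1) * EXEC_MS


if __name__ == "__main__":
    without = total_latency_ms(10, pooled=False)
    with_pool = total_latency_ms(10, pooled=True)
    print(without, with_pool, round(without / with_pool, 1))
```

The speedup for a burst grows with the number of calls per session, which is why the headline "10x" figure applies to individual pool hits rather than to short bursts.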
Context
Every MCP tool call previously required establishing a new MCP session:
- Create HTTP/SSE transport (httpx may reuse TCP via keep-alive)
- Initialize MCP session (protocol handshake with capability negotiation)
- Execute the tool call
- Close MCP session
This per-request session overhead added 15-25ms latency to every tool invocation, which becomes significant under high load or in latency-sensitive applications.
Problem Statement
- Latency: MCP session initialization dominates tool call time for fast operations
- Resource Usage: Repeated protocol handshakes increase CPU usage
- Scalability: Session churn limits throughput under load
- State Loss: Each session starts fresh (no caching of tool lists, etc.)
Requirements
- Reduce tool call latency by reusing MCP sessions
- Maintain session isolation between users/tenants
- Support different transport types (SSE, StreamableHTTP)
- Handle session failures gracefully
- Prevent unbounded resource growth
Decision
Implement a session pool that maintains persistent MCP ClientSession objects keyed by (URL, identity_hash, transport_type).
Architecture Overview
┌──────────────────────────────────────────────────────────────────────────────┐
│                        MCP Gateway with Session Pool                         │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌─────────────┐     ┌────────────────────────────────────────────────────┐  │
│  │   User A    │────►│                                                    │  │
│  │  (token X)  │     │                                                    │  │
│  └─────────────┘     │                    MCP Gateway                     │  │
│                      │                                                    │  │
│  ┌─────────────┐     │  ┌──────────────────────────────────────────────┐  │  │
│  │   User B    │────►│  │                 Session Pool                 │  │  │
│  │  (token Y)  │     │  │                                              │  │  │
│  └─────────────┘     │  │  Pool Key = (URL, identity_hash, transport)  │  │  │
│                      │  │                                              │  │  │
│  ┌─────────────┐     │  │  ┌────────────────────────────────────────┐  │  │  │
│  │   User C    │────►│  │  │ Key: (mcp-server:8080, sha(X), http)   │  │  │  │
│  │  (token X)  │     │  │  │ Sessions: [S1, S2, S3] ────────────────┼──┼──┼──► MCP Server A
│  └─────────────┘     │  │  └────────────────────────────────────────┘  │  │  │
│                      │  │                                              │  │  │
│                      │  │  ┌────────────────────────────────────────┐  │  │  │
│                      │  │  │ Key: (mcp-server:8080, sha(Y), http)   │  │  │  │
│                      │  │  │ Sessions: [S4, S5] ────────────────────┼──┼──┼──► MCP Server A
│                      │  │  └────────────────────────────────────────┘  │  │  │
│                      │  │                                              │  │  │
│                      │  │  ┌────────────────────────────────────────┐  │  │  │
│                      │  │  │ Key: (other-mcp:9000, sha(X), sse)     │  │  │  │
│                      │  │  │ Sessions: [S6] ────────────────────────┼──┼──┼──► MCP Server B
│                      │  │  └────────────────────────────────────────┘  │  │  │
│                      │  │                                              │  │  │
│                      │  └──────────────────────────────────────────────┘  │  │
│                      │                                                    │  │
│                      └────────────────────────────────────────────────────┘  │
│                                                                              │
│  Note: User A and User C have the same token (X), so they share sessions     │
│        User B has different token (Y), so gets isolated sessions             │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
Key Design Decisions
1. Identity-Based Isolation
Sessions are isolated by a composite key: (URL, identity_hash, transport_type).
Here identity_hash is derived from the authentication headers:
- Authorization
- X-Tenant-ID
- X-User-ID
- X-API-Key
- Cookie
This ensures different users/tenants never share sessions, preventing data leakage.
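One way to derive such a key is to hash only the identity-bearing headers. The sketch below is an assumption about how this could be done, not the pool's actual code: the SHA-256 choice, the canonical `name:value` encoding, and the function names are illustrative, while the header list and the "anonymous" fallback come from this document.

```python
import hashlib
from typing import Mapping, Optional, Tuple

# Headers that contribute to identity, compared case-insensitively.
IDENTITY_HEADERS = ("authorization", "x-tenant-id", "x-user-id", "x-api-key", "cookie")


def identity_hash(headers: Optional[Mapping[str, str]]) -> str:
    """Hash the identity headers; unrelated headers do not affect the result."""
    lowered = {k.lower(): v for k, v in (headers or {}).items()}
    parts = [f"{name}:{lowered[name]}" for name in IDENTITY_HEADERS if name in lowered]
    if not parts:
        return "anonymous"  # no identity headers present
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


def pool_key(url: str, headers: Optional[Mapping[str, str]], transport_type: str) -> Tuple[str, str, str]:
    """Composite key: (URL, identity_hash, transport_type)."""
    return (url, identity_hash(headers), transport_type)


if __name__ == "__main__":
    url = "http://mcp-server:8080"
    user_a = pool_key(url, {"Authorization": "Bearer X"}, "http")
    user_b = pool_key(url, {"Authorization": "Bearer Y"}, "http")
    user_c = pool_key(url, {"authorization": "Bearer X", "Accept": "*/*"}, "http")
    print(user_a == user_c, user_a == user_b)
```

Fixing the header order before hashing keeps the key stable regardless of how the caller's header dict happens to be ordered.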
2. Transport Type Isolation
Sessions are also isolated by transport type (SSE vs StreamableHTTP) because:
- Different transports have different connection semantics
- Mixing transports could cause protocol errors
- Allows independent tuning per transport
3. Session Lifecycle
┌─────────────┐     acquire()      ┌─────────────┐
│    Pool     │ ─────────────────► │   Active    │
│   (Idle)    │                    │  (In Use)   │
└─────────────┘                    └─────────────┘
       ▲                                  │
       │            release()             │
       ├──────────────────────────────────┘
       │
       │ (TTL expired or unhealthy)
       ▼
┌─────────────┐
│   Closed    │
└─────────────┘
4. Health Checking Strategy
Sessions are validated:
- On acquire: If idle > health_check_interval (default 60s), call list_tools() to verify health
- On release: If age > TTL, close instead of returning to pool
- Background: Stale sessions are reaped during acquire operations
This balances freshness with performance overhead.
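The acquire/release rules above can be sketched as a small pool. This is a simplified illustration under stated assumptions, not the gateway's `MCPSessionPool`: the class name, the injected `create` and `is_healthy` callbacks (standing in for session setup and `list_tools()`), and the injectable clock are all hypothetical, and a real pool would also close dropped sessions and cap sessions per key.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List, Tuple

PoolKey = Tuple[str, str, str]  # (url, identity_hash, transport_type)


@dataclass
class PooledSession:
    key: PoolKey
    session: object
    created_at: float
    last_used: float


class TinySessionPool:
    def __init__(
        self,
        create: Callable[[PoolKey], Awaitable[object]],
        is_healthy: Callable[[object], Awaitable[bool]],
        ttl: float = 300.0,
        health_check_interval: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._create, self._is_healthy = create, is_healthy
        self._ttl, self._interval, self._clock = ttl, health_check_interval, clock
        self._idle: Dict[PoolKey, List[PooledSession]] = {}
        self.hits = 0
        self.misses = 0

    async def acquire(self, key: PoolKey) -> PooledSession:
        idle = self._idle.setdefault(key, [])
        while idle:
            pooled = idle.pop()
            now = self._clock()
            if now - pooled.created_at > self._ttl:
                continue  # reap expired session found during acquire
            # Only sessions idle longer than the interval pay for a health check.
            if now - pooled.last_used > self._interval and not await self._is_healthy(pooled.session):
                continue  # unhealthy: drop and try the next idle session
            self.hits += 1
            return pooled
        self.misses += 1
        now = self._clock()
        return PooledSession(key, await self._create(key), now, now)

    def release(self, pooled: PooledSession) -> None:
        now = self._clock()
        if now - pooled.created_at > self._ttl:
            return  # past TTL: do not return to the pool
        pooled.last_used = now
        self._idle.setdefault(pooled.key, []).append(pooled)
```

Because the health check only runs for sessions that have been idle longer than the interval, a busy pool rarely pays for it, while a quiet pool verifies sessions before handing them out.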
5. Circuit Breaker Pattern
Failed endpoints are temporarily blocked:
- After threshold consecutive failures (default 5), circuit opens
- Requests fail fast for reset_seconds (default 60s)
- Prevents cascade failures when an MCP server is down
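A minimal per-URL circuit breaker with these defaults might look like the following. The class and method names are hypothetical and the clock is injectable only to make the behavior easy to test; the pool's real implementation may track state differently.

```python
import time
from typing import Callable, Dict


class UrlCircuitBreaker:
    def __init__(
        self,
        threshold: int = 5,
        reset_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = threshold
        self._reset_seconds = reset_seconds
        self._clock = clock
        self._failures: Dict[str, int] = {}      # consecutive failures per URL
        self._open_until: Dict[str, float] = {}  # time at which an open circuit closes

    def allow(self, url: str) -> bool:
        """Return False while the circuit for this URL is open (fail fast)."""
        open_until = self._open_until.get(url)
        if open_until is None:
            return True
        if self._clock() >= open_until:
            # Reset window elapsed: close the circuit and allow a fresh attempt.
            del self._open_until[url]
            self._failures[url] = 0
            return True
        return False

    def record_failure(self, url: str) -> None:
        count = self._failures.get(url, 0) + 1
        self._failures[url] = count
        if count >= self._threshold:
            self._open_until[url] = self._clock() + self._reset_seconds

    def record_success(self, url: str) -> None:
        # Any success breaks the run of consecutive failures.
        self._failures[url] = 0
```

A caller would check `allow(url)` before attempting to create a session and report the outcome with `record_failure` or `record_success`.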
6. Timeout Configuration
The pool uses separate timeouts for different operations:
| Setting | Default | Purpose |
|---|---|---|
| health_check_interval | 60s | Gateway health check frequency |
| mcp_session_pool_health_check_interval | 60s | Session staleness threshold |
| mcp_session_pool_transport_timeout | 30s | Transport timeout for all HTTP operations |
Configuration behavior:
- Pool health check interval uses min(health_check_interval, mcp_session_pool_health_check_interval)
- Pool transport timeout uses mcp_session_pool_transport_timeout (default 30s to match MCP SDK)
The transport timeout applies to all HTTP operations (connect, read, write) on pooled sessions. If your tools require longer execution times, increase this value accordingly.
7. Optional Explicit Health Verification
Gateway health checks can optionally perform explicit RPC verification via feature flag:
# Disabled by default for performance (pool's internal staleness check is sufficient)
MCP_SESSION_POOL_EXPLICIT_HEALTH_RPC=false
When enabled, health checks call list_tools() even on fresh sessions:
# gateway_service.py
async with pool.session(url, headers, transport_type) as pooled:
    if settings.mcp_session_pool_explicit_health_rpc:
        await asyncio.wait_for(
            pooled.session.list_tools(),
            timeout=settings.health_check_timeout,
        )
Trade-off:
- Disabled (default): Pool's internal staleness check (idle > health_check_interval) handles health. Best performance (~1-2ms per check).
- Enabled: Every health check performs explicit RPC. Stricter verification at ~5ms latency cost per check.
Implementation
File: mcpgateway/services/mcp_session_pool.py
class MCPSessionPool:
    """Pool of MCP ClientSessions keyed by (URL, identity, transport)."""

    async def acquire(
        self,
        url: str,
        headers: Optional[Dict[str, str]] = None,
        transport_type: TransportType = TransportType.STREAMABLE_HTTP,
        httpx_client_factory: Optional[HttpxClientFactory] = None,
        timeout: Optional[float] = None,
    ) -> PooledSession:
        """Acquire a session, creating if needed."""

    async def release(self, pooled: PooledSession) -> None:
        """Return session to pool for reuse."""

    @asynccontextmanager
    async def session(self, url, headers, transport_type, ...) -> AsyncIterator[PooledSession]:
        """Context manager for acquire/release lifecycle."""
Usage in Services:
# tool_service.py, resource_service.py, gateway_service.py
async with pool.session(
    url=server_url,
    headers=auth_headers,
    transport_type=TransportType.SSE,
    httpx_client_factory=factory,
) as pooled:
    result = await pooled.session.call_tool(tool_name, arguments)
Performance Characteristics
Latency Improvement
| Scenario | Before (per-call) | After (pooled) | Improvement |
|---|---|---|---|
| Pool Hit | 20ms | 1-2ms | 10-20x |
| Pool Miss | 20ms | 20ms | Same |
| Health Check | N/A | +5ms | Occasional |
Real-World Metrics Example
From production deployment:
{
  "hits": 2977,
  "misses": 10,
  "hit_rate": 0.9967,
  "pool_key_count": 2,
  "anonymous_identity_count": 2997,
  "circuit_breaker_trips": 0
}
99.67% of requests reused existing sessions, which corresponds to a roughly 10x latency reduction for those calls.
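The hit rate follows directly from the hit and miss counters. The short sketch below recomputes it and, as a rough estimate, blends the pool-hit and pool-miss latencies from the latency table (2ms and 20ms are taken as representative values; the function names are made up for the example).

```python
def hit_rate(hits: int, misses: int) -> float:
    """Fraction of acquisitions served from the pool."""
    total = hits + misses
    return hits / total if total else 0.0


def expected_latency_ms(rate: float, hit_ms: float = 2.0, miss_ms: float = 20.0) -> float:
    """Average per-call latency for a given hit rate."""
    return rate * hit_ms + (1.0 - rate) * miss_ms


if __name__ == "__main__":
    rate = hit_rate(2977, 10)
    print(round(rate, 4), round(expected_latency_ms(rate), 2))
```

At this hit rate the average call costs only slightly more than a pure pool hit, because misses are rare enough to contribute very little to the mean.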
Resource Usage
- Memory: ~1KB per pooled session
- Connections: Bounded by max_per_key × unique_identities × urls
- Default: 10 sessions per (URL, identity, transport)
Idle Pool Eviction
Empty pool keys are evicted after idle_pool_eviction_seconds (default 600s) to prevent unbounded growth with rotating tokens.
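Eviction of empty keys could be implemented along these lines. The data layout (one dict of idle sessions per key plus a dict of last-use timestamps) and the function name are assumptions for the sketch; only the 600s default comes from this document.

```python
from typing import Dict, Hashable, List


def evict_idle_keys(
    idle: Dict[Hashable, list],
    last_used: Dict[Hashable, float],
    now: float,
    idle_pool_eviction_seconds: float = 600.0,
) -> List[Hashable]:
    """Remove pool keys that hold no sessions and have not been used recently."""
    evicted = [
        key
        for key, sessions in idle.items()
        if not sessions and now - last_used.get(key, now) > idle_pool_eviction_seconds
    ]
    for key in evicted:
        del idle[key]
        last_used.pop(key, None)
    return evicted


if __name__ == "__main__":
    idle = {"old-token": [], "busy": ["session"], "recent": []}
    last_used = {"old-token": 0.0, "busy": 0.0, "recent": 500.0}
    print(evict_idle_keys(idle, last_used, now=700.0))
```

With rotating tokens every new token creates a new key, so without this sweep the key map would grow for as long as the gateway runs.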
Consequences
Positive
- 10-20x latency reduction for repeated tool calls from same user
- Reduced server load through connection reuse
- Improved throughput under high concurrency
- Graceful degradation via circuit breaker
- Session isolation prevents cross-user data leakage
- Configurable - all parameters tunable via environment variables
Negative
- Memory overhead for maintaining idle sessions
- Complexity - more moving parts than per-call connections
- Stale sessions possible if health check interval is too long
- Header pinning - session reuses original auth headers (by design)
Neutral
- Requires graceful shutdown to close pool (close_mcp_session_pool())
- Metrics available via /admin/mcp-pool/metrics endpoint
- Falls back to per-call sessions when pool unavailable (e.g., in tests)
Configuration
Environment variables:
# Enable/disable pool (default: false - enable explicitly after testing)
MCP_SESSION_POOL_ENABLED=true # Recommended for production
# Max sessions per (URL, identity, transport) - default: 10
MCP_SESSION_POOL_MAX_PER_KEY=10
# Session TTL before forced close - default: 300s
MCP_SESSION_POOL_TTL=300.0
# Idle time before health check - default: 60s
# Auto-aligned with min(HEALTH_CHECK_INTERVAL, MCP_SESSION_POOL_HEALTH_CHECK_INTERVAL)
MCP_SESSION_POOL_HEALTH_CHECK_INTERVAL=60.0
# Transport timeout for all HTTP operations (connect, read, write) - default: 30s
# Increase for deployments with long-running tool calls
MCP_SESSION_POOL_TRANSPORT_TIMEOUT=30.0
# Timeout waiting for session slot - default: 30s
MCP_SESSION_POOL_ACQUIRE_TIMEOUT=30.0
# Timeout creating new session - default: 30s
MCP_SESSION_POOL_CREATE_TIMEOUT=30.0
# Circuit breaker failures threshold - default: 5
MCP_SESSION_POOL_CIRCUIT_BREAKER_THRESHOLD=5
# Circuit breaker reset time - default: 60s
MCP_SESSION_POOL_CIRCUIT_BREAKER_RESET=60.0
# Evict idle pool keys after - default: 600s
MCP_SESSION_POOL_IDLE_EVICTION=600.0
# Force explicit RPC (list_tools) on gateway health checks - default: false
# Off by default for performance; pool's internal staleness check is sufficient.
# Enable for stricter health verification at ~5ms latency cost per check.
MCP_SESSION_POOL_EXPLICIT_HEALTH_RPC=false
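For illustration, the variables above could be read into a typed settings object as follows. This is a standalone sketch, not the code in mcpgateway/config.py: the `PoolSettings` class, the loader, and the accepted boolean spellings are assumptions; the variable names and defaults are the ones listed above.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping


@dataclass(frozen=True)
class PoolSettings:
    enabled: bool = False
    max_per_key: int = 10
    ttl: float = 300.0
    health_check_interval: float = 60.0
    transport_timeout: float = 30.0
    acquire_timeout: float = 30.0
    create_timeout: float = 30.0
    circuit_breaker_threshold: int = 5
    circuit_breaker_reset: float = 60.0
    idle_eviction: float = 600.0
    explicit_health_rpc: bool = False


def _as_bool(raw: str) -> bool:
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_pool_settings(env: Mapping[str, str] = os.environ) -> PoolSettings:
    """Read MCP_SESSION_POOL_* variables, falling back to the defaults."""
    values = {}
    for f in fields(PoolSettings):
        raw = env.get("MCP_SESSION_POOL_" + f.name.upper())
        if raw is None:
            continue
        kind = type(f.default)  # bool, int, or float, taken from the default
        values[f.name] = _as_bool(raw) if kind is bool else kind(raw)
    return PoolSettings(**values)


def effective_health_check_interval(gateway_interval: float, pool_interval: float) -> float:
    """The pool uses the smaller of the gateway and pool intervals."""
    return min(gateway_interval, pool_interval)
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests.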
Design Considerations
Why Not Share Sessions Across Users?
Security: MCP sessions may contain user-specific state (authentication context, rate limits, permissions). Sharing sessions could leak data between users.
Why Identity Hash Instead of Full Headers?
- Privacy: Full headers may contain secrets
- Efficiency: Hash comparison is O(1)
- Stability: Irrelevant header changes don't fragment pools
Why Not Refresh Headers on Reuse?
The MCP protocol establishes auth during initialize(). Changing headers mid-session would require protocol renegotiation, defeating the purpose of pooling.
For rotating tokens, use identity_extractor to extract stable identity (e.g., user ID from JWT claims), ensuring the same user always gets the same pool.
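As an illustration, an extractor that returns the `sub` claim of a bearer JWT might look like this. The callback signature (headers in, optional identity string out) is an assumption, since this document does not specify it, and the sketch deliberately does not verify the token signature: it assumes the gateway has already authenticated the request.

```python
import base64
import json
from typing import Mapping, Optional


def jwt_subject_extractor(headers: Mapping[str, str]) -> Optional[str]:
    """Return the JWT 'sub' claim, or None if no usable bearer token is present."""
    auth = next((v for k, v in headers.items() if k.lower() == "authorization"), "")
    scheme, _, token = auth.partition(" ")
    if scheme.lower() != "bearer" or token.count(".") != 2:
        return None
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    try:
        claims = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
        return None
    sub = claims.get("sub") if isinstance(claims, dict) else None
    return str(sub) if sub is not None else None
```

Two tokens issued to the same user with different expiry times then map to the same identity, so token rotation no longer fragments the pool. The trade-off is the one noted under Token Rotation Handling: the pooled session keeps using the credentials it was created with.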
Known Limitations
1. Request-Scoped Headers Are Pinned
The MCP SDK pins headers at transport creation time. Per-request headers (like X-Correlation-ID) passed to pooled sessions become "sticky" and are reused for all subsequent requests on that session.
Impact: Distributed tracing may attribute multiple requests to the same correlation ID if they share a pooled session.
Mitigation: The gateway strips X-Correlation-ID from headers before pooling. If you need per-request headers downstream, use non-pooled sessions or contribute MCP SDK support for per-request headers.
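The mitigation amounts to filtering request-scoped headers out before the headers reach the pool. A minimal sketch, with a hypothetical helper name and a header set containing only the one header named in this document:

```python
from typing import Dict, Mapping

# Headers that must not be pinned to a pooled session (compared case-insensitively).
REQUEST_SCOPED_HEADERS = frozenset({"x-correlation-id"})


def strip_request_scoped_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of headers without per-request values."""
    return {k: v for k, v in headers.items() if k.lower() not in REQUEST_SCOPED_HEADERS}


if __name__ == "__main__":
    incoming = {"Authorization": "Bearer X", "X-Correlation-ID": "req-42"}
    print(strip_request_scoped_headers(incoming))
```

Returning a copy leaves the caller's header dict untouched, so the correlation ID is still available for the gateway's own logging.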
2. identity_extractor Requires Code Changes
The identity_extractor callback is supported in pool code but cannot be enabled via environment variables. Operators who need custom identity extraction (e.g., extracting user ID from JWT claims) must modify the initialization code in main.py.
3. Circuit Breaker Is URL-Scoped
The circuit breaker tracks failures per URL, not per identity. If one tenant causes repeated session creation failures, the circuit opens for all tenants accessing that URL.
Scope: Only session creation failures (connection refused, SSL errors) trip the circuit. Tool call failures do not affect the circuit breaker.
4. TLS Configuration Not in Pool Key
Pool keys do not include TLS/CA context. If the same URL is accessed with different CA bundles (unusual deployment pattern), the first session's TLS configuration may be reused.
Security Considerations
Session Isolation Model
Sessions are isolated by a composite key: (URL, identity_hash, transport_type). The identity hash is derived from authentication headers (Authorization, X-Tenant-ID, X-User-ID, X-API-Key, Cookie).
Key security properties:
- Different users with different credentials get different pool keys and therefore different sessions
- Different MCP server URLs always get different sessions
- Identity is validated at the gateway level; upstream MCP servers validate only mcp-session-id
Anonymous Pooling Risk
When no identity headers are present, identity collapses to "anonymous", causing all such requests to share sessions. This is acceptable only if:
- The gateway requires authentication (default), preventing truly anonymous requests
- Upstream MCP servers are stateless and don't maintain per-session context
If MCP servers maintain per-session state, anonymous pooling can leak data between users.
Recommended configuration: Ensure AUTH_REQUIRED=true and identity headers are present via passthrough or gateway authentication.
Shared Credentials Scenario
With shared service credentials (OAuth Client Credentials, static API keys), all users share the same Authorization header and therefore the same session. This is intentional for machine-to-machine auth where the MCP server has no per-user concept.
Risk: Only if the upstream MCP server maintains per-user state. For truly stateless servers, this is safe and provides maximum connection reuse.
Token Rotation Handling
With default configuration, Authorization is part of the identity hash. Token rotation produces a new pool key and therefore a new session. Stale tokens are not reused.
Exception: If identity_extractor is enabled (requires code changes) or Authorization is removed from identity headers, rotating tokens may reuse sessions with stale credentials until TTL expiration.
Alternatives Considered
| Alternative | Why Not |
|---|---|
| HTTP/2 multiplexing only | Saves TCP/TLS but not MCP initialize overhead |
| Global session pool | Security risk from cross-user session sharing |
| No pooling | Unacceptable latency for high-throughput use cases |
| Connection-only pool | MCP session state includes more than just connection |
References
- HTTP Persistent Connection (Wikipedia)
- MCP Protocol Specification
- mcpgateway/services/mcp_session_pool.py - Implementation
- mcpgateway/config.py - Configuration settings
- mcpgateway/admin.py - Metrics endpoint (/admin/mcp-pool/metrics)
- tests/unit/mcpgateway/services/test_mcp_session_pool.py - Unit tests
Status
Implemented and disabled by default for safety. Enable explicitly after testing:
MCP_SESSION_POOL_ENABLED=true
Provides 10-20x latency improvement for tool calls with session reuse.