ADR-032: MCP Session Pool for Connection Reuse

  • Status: Accepted
  • Date: 2025-01-05
  • Deciders: Platform Team

Introduction: Understanding Connection Reuse

The Connection Overhead Problem

When a client makes an HTTP request, several steps must occur before any application data is exchanged:

┌────────────────────────────────────────────────────────────────────────────┐
│                    Traditional HTTP Request Flow                           │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│  Client                                                          Server    │
│    │                                                               │       │
│    │─────────── TCP SYN ──────────────────────────────────────────►│ ①     │
│    │◄────────── TCP SYN-ACK ───────────────────────────────────────│       │
│    │─────────── TCP ACK ──────────────────────────────────────────►│       │
│    │                                                               │       │
│    │─────────── TLS ClientHello ──────────────────────────────────►│ ②     │
│    │◄────────── TLS ServerHello + Certificate ─────────────────────│       │
│    │─────────── TLS Key Exchange ─────────────────────────────────►│       │
│    │◄────────── TLS Finished ──────────────────────────────────────│       │
│    │                                                               │       │
│    │═══════════ HTTP Request ═════════════════════════════════════►│ ③     │
│    │◄══════════ HTTP Response ═════════════════════════════════════│       │
│    │                                                               │       │
│    │─────────── TCP FIN ──────────────────────────────────────────►│ ④     │
│    │                                                               │       │
│                                                                            │
│  ① TCP Handshake:  ~1-3ms (local) to ~50-150ms (cross-region)              │
│  ② TLS Handshake:  ~5-15ms (additional round trips + crypto)               │
│  ③ HTTP Exchange:  ~1-5ms (actual request/response)                        │
│  ④ Connection Close                                                        │
│                                                                            │
│  Total overhead per request: 10-170ms (mostly handshakes!)                 │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘

HTTP Persistent Connections (Keep-Alive)

HTTP/1.1 persistent connections solve this by reusing TCP connections:

┌────────────────────────────────────────────────────────────────────────────┐
│                    HTTP Keep-Alive Flow                                    │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│  Client                                                          Server    │
│    │                                                               │       │
│    │─────────── TCP + TLS Handshakes ─────────────────────────────►│ Once  │
│    │                                                               │       │
│    │═══════════ HTTP Request 1 ═══════════════════════════════════►│       │
│    │◄══════════ HTTP Response 1 ═══════════════════════════════════│       │
│    │                                                               │       │
│    │═══════════ HTTP Request 2 ═══════════════════════════════════►│ Reuse │
│    │◄══════════ HTTP Response 2 ═══════════════════════════════════│       │
│    │                                                               │       │
│    │═══════════ HTTP Request 3 ═══════════════════════════════════►│ Reuse │
│    │◄══════════ HTTP Response 3 ═══════════════════════════════════│       │
│    │                                                               │       │
│                                                                            │
│  First request:  10-170ms (includes handshakes)                            │
│  Subsequent:     1-5ms (just HTTP exchange) ← 10-50x faster!               │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘

MCP Protocol: An Additional Layer

The Model Context Protocol (MCP) adds its own session initialization on top of HTTP:

┌────────────────────────────────────────────────────────────────────────────┐
│                    MCP Session Initialization                              │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│  MCP Client                                                    MCP Server  │
│    │                                                               │       │
│    │ ┌──────────────────────────────────────────────────────────┐  │       │
│    │ │ TCP + TLS (reused via HTTP Keep-Alive in httpx client)   │  │       │
│    │ └──────────────────────────────────────────────────────────┘  │       │
│    │                                                               │       │
│    │═══════════ initialize (JSON-RPC) ════════════════════════════►│ ①     │
│    │            {                                                  │       │
│    │              "method": "initialize",                          │       │
│    │              "params": {                                      │       │
│    │                "protocolVersion": "2025-03-26",               │       │
│    │                "capabilities": {...},                         │       │
│    │                "clientInfo": {"name": "gateway", ...}         │       │
│    │              }                                                │       │
│    │            }                                                  │       │
│    │                                                               │       │
│    │◄══════════ InitializeResult ══════════════════════════════════│ ②     │
│    │            {                                                  │       │
│    │              "protocolVersion": "2025-03-26",                 │       │
│    │              "capabilities": {...},                           │       │
│    │              "serverInfo": {"name": "my-mcp-server", ...}     │       │
│    │            }                                                  │       │
│    │            Header: mcp-session-id: "abc123"                   │       │
│    │                                                               │       │
│    │═══════════ initialized (notification) ═══════════════════════►│ ③     │
│    │                                                               │       │
│    │ ┌──────────────────────────────────────────────────────────┐  │       │
│    │ │ Session established - can now call tools, read resources │  │       │
│    │ └──────────────────────────────────────────────────────────┘  │       │
│    │                                                               │       │
│    │═══════════ tools/call ═══════════════════════════════════════►│ ④     │
│    │◄══════════ CallToolResult ════════════════════════════════════│       │
│    │                                                               │       │
│                                                                            │
│  ① Client sends initialize with protocol version and capabilities          │
│  ② Server responds with its capabilities and assigns mcp-session-id        │
│  ③ Client confirms with initialized notification                           │
│  ④ Now tool calls, resource reads, etc. can proceed                        │
│                                                                            │
│  MCP initialization overhead: ~10-15ms (2-3 round trips)                   │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘

The mcp-session-id header is critical - it identifies this session for all subsequent requests. The MCP SDK's ClientSession class manages this state internally.
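The handshake messages in the diagram can be written out as plain JSON-RPC 2.0 payloads. The sketch below is illustrative only: the helper names, the request id, and the clientInfo version string are assumptions made for this example, and in practice the MCP SDK's ClientSession builds and sends these messages itself.

```python
import json
from typing import Any, Dict


def build_initialize_request(request_id: int, client_name: str) -> Dict[str, Any]:
    """Step 1: the client announces its protocol version and capabilities."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            # The version value is a placeholder for this sketch.
            "clientInfo": {"name": client_name, "version": "0.0.0"},
        },
    }


def build_initialized_notification() -> Dict[str, Any]:
    """Step 3: a notification carries no "id", so the server sends no reply."""
    return {"jsonrpc": "2.0", "method": "notifications/initialized"}


def with_session_id(headers: Dict[str, str], session_id: str) -> Dict[str, str]:
    """Attach the server-assigned session ID to the headers of later requests."""
    return {**headers, "mcp-session-id": session_id}


request = build_initialize_request(request_id=1, client_name="gateway")
wire_payload = json.dumps(request)  # the body that would be sent to the server
```

Every request after step 2 would carry the header produced by with_session_id, which is the state a pooled session keeps alive between tool calls.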

The Full Picture: Why Session Pooling Matters

Without session pooling, every tool call pays the full cost:

┌────────────────────────────────────────────────────────────────────────────┐
│              WITHOUT Session Pooling (Current MCP SDK Default)             │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│  Tool Call 1:                                                              │
│    TCP Handshake ───────────────────────────────── ~2ms                    │
│    TLS Handshake ───────────────────────────────── ~5ms                    │
│    MCP Initialize ──────────────────────────────── ~10ms                   │
│    Tool Execution ──────────────────────────────── ~2ms                    │
│    Close ───────────────────────────────────────── ~1ms                    │
│                                             Total: ~20ms                   │
│                                                                            │
│  Tool Call 2:                                                              │
│    TCP Handshake ───────────────────────────────── ~2ms                    │
│    TLS Handshake ───────────────────────────────── ~5ms                    │
│    MCP Initialize ──────────────────────────────── ~10ms                   │
│    Tool Execution ──────────────────────────────── ~2ms                    │
│    Close ───────────────────────────────────────── ~1ms                    │
│                                             Total: ~20ms                   │
│                                                                            │
│  Tool Call 3:  ~20ms                                                       │
│  Tool Call 4:  ~20ms                                                       │
│  ...                                                                       │
│                                                                            │
│  10 tool calls = 200ms total                                               │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────────────────────┐
│              WITH Session Pooling (This Implementation)                    │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│  Tool Call 1 (Pool Miss - creates new session):                            │
│    TCP Handshake ───────────────────────────────── ~2ms                    │
│    TLS Handshake ───────────────────────────────── ~5ms                    │
│    MCP Initialize ──────────────────────────────── ~10ms                   │
│    Tool Execution ──────────────────────────────── ~2ms                    │
│    Return to pool (not closed!) ────────────────── ~0ms                    │
│                                             Total: ~19ms                   │
│                                                                            │
│  Tool Call 2 (Pool Hit - reuses session):                                  │
│    Acquire from pool ───────────────────────────── ~0.1ms                  │
│    Tool Execution ──────────────────────────────── ~2ms                    │
│    Return to pool ──────────────────────────────── ~0.1ms                  │
│                                             Total: ~2ms  ← 10x faster!     │
│                                                                            │
│  Tool Call 3:  ~2ms (pool hit)                                             │
│  Tool Call 4:  ~2ms (pool hit)                                             │
│  ...                                                                       │
│                                                                            │
│  10 tool calls = 19ms + 9×2ms = 37ms total (vs 200ms = 5.4x faster!)       │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘

Comparison: HTTP Keep-Alive vs MCP Session Pooling

| Layer | What's Reused | Overhead Saved | Who Manages It |
|-------|---------------|----------------|----------------|
| HTTP Keep-Alive | TCP + TLS connection | ~5-15ms | httpx client |
| MCP Session Pool | TCP + TLS + MCP session | ~15-25ms | This implementation |

HTTP Keep-Alive is already used by the httpx client internally. MCP Session Pooling adds MCP-level session reuse on top, saving the initialize → initialized handshake (~10-15ms) on every call.
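The totals in the two timing diagrams can be reproduced with a few lines of arithmetic. This is a worked calculation using the illustrative per-step costs from those diagrams, not a measurement; the function names are made up for the example.

```python
# Per-step costs in milliseconds, taken from the timing diagrams.
TCP_MS, TLS_MS, MCP_INIT_MS, TOOL_MS, CLOSE_MS = 2, 5, 10, 2, 1


def total_without_pool(calls: int) -> int:
    """Every call pays both handshakes, MCP initialize, execution, and close."""
    return calls * (TCP_MS + TLS_MS + MCP_INIT_MS + TOOL_MS + CLOSE_MS)


def total_with_pool(calls: int) -> int:
    """The first call is a pool miss; later calls only pay tool execution.

    The ~0.1ms acquire/release cost is rounded away, as in the diagram.
    """
    if calls == 0:
        return 0
    first_call = TCP_MS + TLS_MS + MCP_INIT_MS + TOOL_MS
    return first_call + (calls - 1) * TOOL_MS


without = total_without_pool(10)       # 200
pooled = total_with_pool(10)           # 37
speedup = round(without / pooled, 1)   # 5.4
```

The speedup approaches the per-call ratio (20ms / 2ms = 10x) as the number of calls grows, because the single pool miss is amortized over more hits.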

Context

Every MCP tool call previously required establishing a new MCP session:

  1. Create HTTP/SSE transport (httpx may reuse TCP via keep-alive)
  2. Initialize MCP session (protocol handshake with capability negotiation)
  3. Execute the tool call
  4. Close MCP session

This per-request session overhead added 15-25ms latency to every tool invocation, which becomes significant under high load or in latency-sensitive applications.

Problem Statement

  • Latency: MCP session initialization dominates tool call time for fast operations
  • Resource Usage: Repeated protocol handshakes increase CPU usage
  • Scalability: Session churn limits throughput under load
  • State Loss: Each session starts fresh (no caching of tool lists, etc.)

Requirements

  1. Reduce tool call latency by reusing MCP sessions
  2. Maintain session isolation between users/tenants
  3. Support different transport types (SSE, StreamableHTTP)
  4. Handle session failures gracefully
  5. Prevent unbounded resource growth

Decision

Implement a session pool that maintains persistent MCP ClientSession objects keyed by (URL, identity_hash, transport_type).

Architecture Overview

┌────────────────────────────────────────────────────────────────────────────┐
│                         MCP Gateway with Session Pool                      │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│  User A (token X) ──┐                                                      │
│  User B (token Y) ──┼──► MCP Gateway ──► Session Pool                      │
│  User C (token X) ──┘                                                      │
│                                                                            │
│  Pool Key = (URL, identity_hash, transport)                                │
│                                                                            │
│  ┌────────────────────────────────────────┐                                │
│  │ Key: (mcp-server:8080, sha(X), http)   │                                │
│  │ Sessions: [S1, S2, S3]                 │──► MCP Server A                │
│  └────────────────────────────────────────┘                                │
│                                                                            │
│  ┌────────────────────────────────────────┐                                │
│  │ Key: (mcp-server:8080, sha(Y), http)   │                                │
│  │ Sessions: [S4, S5]                     │──► MCP Server A                │
│  └────────────────────────────────────────┘                                │
│                                                                            │
│  ┌────────────────────────────────────────┐                                │
│  │ Key: (other-mcp:9000, sha(X), sse)     │                                │
│  │ Sessions: [S6]                         │──► MCP Server B                │
│  └────────────────────────────────────────┘                                │
│                                                                            │
│  Note: User A and User C have the same token (X), so they share sessions   │
│        User B has different token (Y), so gets isolated sessions           │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘

Key Design Decisions

1. Identity-Based Isolation

Sessions are isolated by a composite key:

pool_key = (url, identity_hash, transport_type)

Where identity_hash is derived from authentication headers:

  • Authorization
  • X-Tenant-ID
  • X-User-ID
  • X-API-Key
  • Cookie

This ensures different users/tenants never share sessions, preventing data leakage.
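A minimal sketch of how such a composite key could be derived is shown below. The ADR does not spell out the hashing scheme, so the use of SHA-256, the lowercase header normalization, and the function names are assumptions for illustration; only the header list and the "anonymous" fallback (described under Security Considerations) come from this document.

```python
import hashlib
from typing import Dict, Optional, Tuple

# Headers that contribute to identity, compared case-insensitively.
IDENTITY_HEADERS = ("authorization", "x-tenant-id", "x-user-id", "x-api-key", "cookie")


def identity_hash(headers: Optional[Dict[str, str]]) -> str:
    """Hash only the identity headers; all other headers are ignored."""
    lowered = {name.lower(): value for name, value in (headers or {}).items()}
    parts = [f"{name}={lowered[name]}" for name in IDENTITY_HEADERS if name in lowered]
    if not parts:
        # No identity headers at all: every such request shares one identity.
        return "anonymous"
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


def pool_key(
    url: str, headers: Optional[Dict[str, str]], transport_type: str
) -> Tuple[str, str, str]:
    """Build the (url, identity_hash, transport_type) composite key."""
    return (url, identity_hash(headers), transport_type)
```

With this scheme, two requests carrying the same Authorization value map to the same key even if unrelated headers differ, while a different token yields a different key and therefore a separate set of sessions.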

2. Transport Type Isolation

Sessions are also isolated by transport type (SSE vs StreamableHTTP) because:

  • Different transports have different connection semantics
  • Mixing transports could cause protocol errors
  • Allows independent tuning per transport

3. Session Lifecycle

┌─────────────┐     acquire()      ┌─────────────┐
│  Pool       │ ─────────────────► │  Active     │
│  (Idle)     │                    │  (In Use)   │
└─────────────┘                    └─────────────┘
       ▲                                  │
       │         release()                │
       └──────────────────────────────────┘
                     │
                     │ (TTL expired or unhealthy)
                     ▼
              ┌─────────────┐
              │  Closed     │
              └─────────────┘

4. Health Checking Strategy

Sessions are validated:

  • On acquire: If idle > health_check_interval (default 60s), call list_tools() to verify health
  • On release: If age > TTL, close instead of returning to pool
  • Background: Stale sessions are reaped during acquire operations

This balances freshness with performance overhead.
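The acquire-time and release-time rules reduce to two timestamp comparisons. The sketch below assumes each pooled session records when it was created and when it was last used; the SessionTimes type and the function names are hypothetical, while the 60s and 300s defaults come from this document.

```python
from dataclasses import dataclass


@dataclass
class SessionTimes:
    """Timestamps (in seconds) assumed to be tracked per pooled session."""
    created_at: float
    last_used: float


def needs_health_check(
    session: SessionTimes, now: float, health_check_interval: float = 60.0
) -> bool:
    """On acquire: verify the session only if it has been idle too long."""
    return (now - session.last_used) > health_check_interval


def should_close_on_release(
    session: SessionTimes, now: float, ttl: float = 300.0
) -> bool:
    """On release: close instead of pooling once the session exceeds its TTL."""
    return (now - session.created_at) > ttl
```

A recently used session skips the list_tools() round trip entirely, which is why pool hits stay close to the bare tool execution time.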

5. Circuit Breaker Pattern

Failed endpoints are temporarily blocked:

  • After threshold consecutive failures (default 5), circuit opens
  • Requests fail fast for reset_seconds (default 60s)
  • Prevents cascade failures when an MCP server is down
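The behavior described above can be sketched as a small per-URL state machine. This is not the pool's actual code: the class and method names are hypothetical, and the injectable clock exists only to make the example testable. The defaults (5 failures, 60s reset) are the ones stated in this document.

```python
import time
from typing import Callable, Dict


class CircuitBreaker:
    """Per-URL breaker: opens after consecutive failures, closes after a delay."""

    def __init__(
        self,
        threshold: int = 5,
        reset_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = threshold
        self._reset_seconds = reset_seconds
        self._clock = clock
        self._failures: Dict[str, int] = {}
        self._open_until: Dict[str, float] = {}

    def is_open(self, url: str) -> bool:
        """True while requests to this URL should fail fast."""
        until = self._open_until.get(url)
        if until is None:
            return False
        if self._clock() >= until:
            # Reset window elapsed: close the circuit and start counting afresh.
            del self._open_until[url]
            self._failures[url] = 0
            return False
        return True

    def record_failure(self, url: str) -> None:
        """Count a session creation failure; open the circuit at the threshold."""
        count = self._failures.get(url, 0) + 1
        self._failures[url] = count
        if count >= self._threshold:
            self._open_until[url] = self._clock() + self._reset_seconds

    def record_success(self, url: str) -> None:
        """Any success breaks the run of consecutive failures."""
        self._failures[url] = 0
```

A caller would check is_open(url) before attempting to create a session and raise immediately when it returns True, rather than waiting for another connection timeout.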

6. Timeout Configuration

The pool uses separate timeouts for different operations:

| Setting | Default | Purpose |
|---------|---------|---------|
| health_check_interval | 60s | Gateway health check frequency |
| mcp_session_pool_health_check_interval | 60s | Session staleness threshold |
| mcp_session_pool_transport_timeout | 30s | Transport timeout for all HTTP operations |

Configuration behavior:

  • Pool health check interval uses min(health_check_interval, mcp_session_pool_health_check_interval)
  • Pool transport timeout uses mcp_session_pool_transport_timeout (default 30s to match MCP SDK)

The transport timeout applies to all HTTP operations (connect, read, write) on pooled sessions. If your tools require longer execution times, increase this value accordingly.
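As a small illustration of the interval rule, the pool's effective staleness threshold is simply the stricter of the two configured values. The function name below is hypothetical; the min() rule and the 60s defaults are taken from the settings described in this section.

```python
def effective_pool_health_check_interval(
    health_check_interval: float = 60.0,
    mcp_session_pool_health_check_interval: float = 60.0,
) -> float:
    """Return the smaller of the gateway-wide and pool-specific intervals."""
    return min(health_check_interval, mcp_session_pool_health_check_interval)


# Lowering the gateway interval to 30s also tightens the pool's threshold.
tightened = effective_pool_health_check_interval(health_check_interval=30.0)
```

One consequence is that raising only one of the two settings has no effect on the pool; both must be raised to lengthen the staleness threshold.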

7. Optional Explicit Health Verification

Gateway health checks can optionally perform explicit RPC verification via feature flag:

# Disabled by default for performance (pool's internal staleness check is sufficient)
MCP_SESSION_POOL_EXPLICIT_HEALTH_RPC=false

When enabled, health checks call list_tools() even on fresh sessions:

# gateway_service.py
async with pool.session(url, headers, transport_type) as pooled:
    if settings.mcp_session_pool_explicit_health_rpc:
        await asyncio.wait_for(
            pooled.session.list_tools(),
            timeout=settings.health_check_timeout,
        )

Trade-off:

  • Disabled (default): Pool's internal staleness check (idle > health_check_interval) handles health. Best performance (~1-2ms per check).
  • Enabled: Every health check performs explicit RPC. Stricter verification at ~5ms latency cost per check.

Implementation

File: mcpgateway/services/mcp_session_pool.py

class MCPSessionPool:
    """Pool of MCP ClientSessions keyed by (URL, identity, transport)."""

    async def acquire(
        self,
        url: str,
        headers: Optional[Dict[str, str]] = None,
        transport_type: TransportType = TransportType.STREAMABLE_HTTP,
        httpx_client_factory: Optional[HttpxClientFactory] = None,
        timeout: Optional[float] = None,
    ) -> PooledSession:
        """Acquire a session, creating if needed."""

    async def release(self, pooled: PooledSession) -> None:
        """Return session to pool for reuse."""

    @asynccontextmanager
    async def session(self, url, headers, transport_type, ...) -> AsyncIterator[PooledSession]:
        """Context manager for acquire/release lifecycle."""

Usage in Services:

# tool_service.py, resource_service.py, gateway_service.py
async with pool.session(
    url=server_url,
    headers=auth_headers,
    transport_type=TransportType.SSE,
    httpx_client_factory=factory,
) as pooled:
    result = await pooled.session.call_tool(tool_name, arguments)
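To make the acquire/release lifecycle concrete, here is a toy pool that runs without a network. It is a deliberately simplified sketch, not MCPSessionPool: FakeSession stands in for an initialized ClientSession, the key is passed in directly, and there is no max-per-key limit, TTL, health check, or circuit breaker.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator, Dict, List, Tuple

PoolKey = Tuple[str, str, str]  # (url, identity_hash, transport_type)


class FakeSession:
    """Stand-in for an initialized MCP ClientSession (no real I/O)."""

    def __init__(self, key: PoolKey) -> None:
        self.key = key

    async def call_tool(self, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        return {"tool": name, "arguments": arguments}


class ToyPool:
    """Keeps idle sessions per key and counts pool hits and misses."""

    def __init__(self) -> None:
        self._idle: Dict[PoolKey, List[FakeSession]] = {}
        self.hits = 0
        self.misses = 0

    async def acquire(self, key: PoolKey) -> FakeSession:
        idle = self._idle.get(key)
        if idle:
            self.hits += 1
            return idle.pop()
        self.misses += 1
        # A real pool would connect and run the MCP initialize handshake here.
        return FakeSession(key)

    async def release(self, session: FakeSession) -> None:
        self._idle.setdefault(session.key, []).append(session)

    @asynccontextmanager
    async def session(self, key: PoolKey) -> AsyncIterator[FakeSession]:
        pooled = await self.acquire(key)
        try:
            yield pooled
        finally:
            # Simplification: a real pool would likely discard failed sessions.
            await self.release(pooled)


async def demo() -> Tuple[int, int]:
    pool = ToyPool()
    key = ("http://mcp-server:8080", "anonymous", "streamablehttp")
    for _ in range(3):
        async with pool.session(key) as pooled:
            await pooled.call_tool("echo", {"text": "hi"})
    return pool.hits, pool.misses
```

Running asyncio.run(demo()) performs three sequential calls on one key: the first is a miss that creates the session and the next two are hits that reuse it.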

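Performance Characteristics

Latency Improvement

| Scenario | Before (per-call) | After (pooled) | Improvement |
|----------|-------------------|----------------|-------------|
| Pool Hit | 20ms | 1-2ms | 10-20x |
| Pool Miss | 20ms | 20ms | Same |
| Health Check | N/A | +5ms | Occasional |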

Real-World Metrics Example

From production deployment:

{
    "hits": 2977,
    "misses": 10,
    "hit_rate": 0.9967,
    "pool_key_count": 2,
    "anonymous_identity_count": 2997,
    "circuit_breaker_trips": 0
}

99.67% of requests reused existing sessions → 10x latency reduction for those calls.
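The hit rate follows directly from the hit and miss counters, and it can be combined with the per-call figures from the latency table to estimate a blended average. The calculation below is a sketch: the function names are made up, and the 2ms and 20ms inputs are the illustrative table values rather than measurements from that deployment.

```python
def hit_rate(hits: int, misses: int) -> float:
    """Fraction of acquisitions served by an existing pooled session."""
    total = hits + misses
    return hits / total if total else 0.0


def expected_latency_ms(rate: float, hit_ms: float = 2.0, miss_ms: float = 20.0) -> float:
    """Average per-call latency for a given hit rate."""
    return rate * hit_ms + (1.0 - rate) * miss_ms


rate = hit_rate(hits=2977, misses=10)   # rounds to 0.9967
average_ms = expected_latency_ms(rate)  # roughly 2.06
```

At this hit rate the rare misses add only about 0.06ms to the average, so the blended latency is essentially the pool-hit latency.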

Resource Usage

  • Memory: ~1KB per pooled session
  • Connections: Bounded by max_per_key × unique_identities × urls
  • Default: 10 sessions per (URL, identity, transport)

Idle Pool Eviction

Empty pool keys are evicted after idle_pool_eviction_seconds (default 600s) to prevent unbounded growth with rotating tokens.
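One way to picture the eviction rule is as a sweep over pool keys that removes those which are both empty and unused for longer than the threshold. The data layout (two dictionaries keyed by pool key) and the function name are assumptions for this sketch; only the 600s default comes from this document.

```python
from typing import Dict, List, Tuple

PoolKey = Tuple[str, str, str]  # (url, identity_hash, transport_type)


def evict_idle_pool_keys(
    idle_sessions: Dict[PoolKey, List[object]],
    last_used: Dict[PoolKey, float],
    now: float,
    idle_pool_eviction_seconds: float = 600.0,
) -> List[PoolKey]:
    """Remove keys that hold no sessions and have been unused past the limit."""
    evicted: List[PoolKey] = []
    for key in list(idle_sessions):  # copy the keys: entries are deleted below
        is_empty = not idle_sessions[key]
        idle_for = now - last_used.get(key, now)
        if is_empty and idle_for > idle_pool_eviction_seconds:
            del idle_sessions[key]
            last_used.pop(key, None)
            evicted.append(key)
    return evicted
```

With rotating tokens, each new token creates a new key; a sweep like this keeps the number of tracked keys proportional to recently active identities rather than to every token ever seen.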

Consequences

Positive

  • 10-20x latency reduction for repeated tool calls from same user
  • Reduced server load through connection reuse
  • Improved throughput under high concurrency
  • Graceful degradation via circuit breaker
  • Session isolation prevents cross-user data leakage
  • Configurable - all parameters tunable via environment variables

Negative

  • Memory overhead for maintaining idle sessions
  • Complexity - more moving parts than per-call connections
  • Stale sessions possible if health check interval is too long
  • Header pinning - session reuses original auth headers (by design)

Neutral

  • Requires graceful shutdown to close pool (close_mcp_session_pool())
  • Metrics available via /admin/mcp-pool/metrics endpoint
  • Falls back to per-call sessions when pool unavailable (e.g., in tests)

Configuration

Environment variables:

# Enable/disable pool (default: false - enable explicitly after testing)
MCP_SESSION_POOL_ENABLED=true  # Recommended for production

# Max sessions per (URL, identity, transport) - default: 10
MCP_SESSION_POOL_MAX_PER_KEY=10

# Session TTL before forced close - default: 300s
MCP_SESSION_POOL_TTL=300.0

# Idle time before health check - default: 60s
# Auto-aligned with min(HEALTH_CHECK_INTERVAL, MCP_SESSION_POOL_HEALTH_CHECK_INTERVAL)
MCP_SESSION_POOL_HEALTH_CHECK_INTERVAL=60.0

# Transport timeout for all HTTP operations (connect, read, write) - default: 30s
# Increase for deployments with long-running tool calls
MCP_SESSION_POOL_TRANSPORT_TIMEOUT=30.0

# Timeout waiting for session slot - default: 30s
MCP_SESSION_POOL_ACQUIRE_TIMEOUT=30.0

# Timeout creating new session - default: 30s
MCP_SESSION_POOL_CREATE_TIMEOUT=30.0

# Circuit breaker failures threshold - default: 5
MCP_SESSION_POOL_CIRCUIT_BREAKER_THRESHOLD=5

# Circuit breaker reset time - default: 60s
MCP_SESSION_POOL_CIRCUIT_BREAKER_RESET=60.0

# Evict idle pool keys after - default: 600s
MCP_SESSION_POOL_IDLE_EVICTION=600.0

# Force explicit RPC (list_tools) on gateway health checks - default: false
# Off by default for performance; pool's internal staleness check is sufficient.
# Enable for stricter health verification at ~5ms latency cost per check.
MCP_SESSION_POOL_EXPLICIT_HEALTH_RPC=false
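For readers who want to see how these variables map to typed values, the sketch below parses them with the defaults listed above. The PoolSettings class and load_pool_settings function are hypothetical and use only the standard library; the gateway's real settings loader may be structured differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; everything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class PoolSettings:
    enabled: bool
    max_per_key: int
    ttl: float
    health_check_interval: float
    transport_timeout: float
    acquire_timeout: float
    create_timeout: float
    circuit_breaker_threshold: int
    circuit_breaker_reset: float
    idle_eviction: float
    explicit_health_rpc: bool


def load_pool_settings(env: Mapping[str, str] = os.environ) -> PoolSettings:
    """Read MCP_SESSION_POOL_* variables, falling back to documented defaults."""

    def get(name: str, default: str) -> str:
        return env.get(f"MCP_SESSION_POOL_{name}", default)

    return PoolSettings(
        enabled=_as_bool(get("ENABLED", "false")),
        max_per_key=int(get("MAX_PER_KEY", "10")),
        ttl=float(get("TTL", "300.0")),
        health_check_interval=float(get("HEALTH_CHECK_INTERVAL", "60.0")),
        transport_timeout=float(get("TRANSPORT_TIMEOUT", "30.0")),
        acquire_timeout=float(get("ACQUIRE_TIMEOUT", "30.0")),
        create_timeout=float(get("CREATE_TIMEOUT", "30.0")),
        circuit_breaker_threshold=int(get("CIRCUIT_BREAKER_THRESHOLD", "5")),
        circuit_breaker_reset=float(get("CIRCUIT_BREAKER_RESET", "60.0")),
        idle_eviction=float(get("IDLE_EVICTION", "600.0")),
        explicit_health_rpc=_as_bool(get("EXPLICIT_HEALTH_RPC", "false")),
    )
```

Passing a plain dictionary instead of os.environ makes the defaults easy to check in a test, for example load_pool_settings({}) yields a disabled pool with 10 sessions per key.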

Design Considerations

Why Not Share Sessions Across Users?

Security: MCP sessions may contain user-specific state (authentication context, rate limits, permissions). Sharing sessions could leak data between users.

Why Identity Hash Instead of Full Headers?

  1. Privacy: Full headers may contain secrets
  2. Efficiency: Hash comparison is O(1)
  3. Stability: Irrelevant header changes don't fragment pools

Why Not Refresh Headers on Reuse?

The MCP protocol establishes auth during initialize(). Changing headers mid-session would require protocol renegotiation, defeating the purpose of pooling.

For rotating tokens, use identity_extractor to extract stable identity (e.g., user ID from JWT claims), ensuring the same user always gets the same pool.
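An extractor of this kind might look like the sketch below, which pulls the "sub" claim out of a bearer JWT. The callback signature (headers in, optional identity string out) is an assumption, since this ADR does not show the identity_extractor interface. The payload is decoded without verifying the signature, so this is only reasonable when the gateway has already authenticated the token.

```python
import base64
import json
from typing import Any, Dict, Optional


def extract_identity_from_jwt(headers: Dict[str, str]) -> Optional[str]:
    """Return the JWT "sub" claim as a stable identity, or None if unavailable.

    WARNING: no signature verification is performed here.
    """
    auth = next((v for k, v in headers.items() if k.lower() == "authorization"), "")
    scheme, _, token = auth.partition(" ")
    if scheme.lower() != "bearer" or token.count(".") != 2:
        return None
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore stripped padding
    try:
        claims: Any = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
        return None
    if not isinstance(claims, dict) or "sub" not in claims:
        return None
    return str(claims["sub"])


def unsigned_demo_token(claims: Dict[str, Any]) -> str:
    """Build an unsigned token for the demo below (not a usable credential)."""

    def encode(obj: Dict[str, Any]) -> str:
        raw = json.dumps(obj).encode("utf-8")
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

    header = encode({"alg": "none"})
    return f"{header}.{encode(claims)}.signature"


# Two tokens for the same user that differ only in expiry map to one identity.
old = {"Authorization": "Bearer " + unsigned_demo_token({"sub": "user-42", "exp": 1})}
new = {"Authorization": "Bearer " + unsigned_demo_token({"sub": "user-42", "exp": 2})}
same_identity = extract_identity_from_jwt(old) == extract_identity_from_jwt(new)
```

The trade-off noted under Token Rotation Handling applies: once identity is decoupled from the raw token, a pooled session may keep using the original token until its TTL expires.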

Known Limitations

1. Request-Scoped Headers Are Pinned

The MCP SDK pins headers at transport creation time. Per-request headers (like X-Correlation-ID) passed to pooled sessions become "sticky" and are reused for all subsequent requests on that session.

Impact: Distributed tracing may attribute multiple requests to the same correlation ID if they share a pooled session.

Mitigation: The gateway strips X-Correlation-ID from headers before pooling. If you need per-request headers downstream, use non-pooled sessions or contribute MCP SDK support for per-request headers.
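The stripping step amounts to filtering request-scoped header names out of the dictionary before it reaches the pool. The helper below is a sketch with a hypothetical name; X-Correlation-ID is the only header this document says is stripped, so the set contains just that entry.

```python
from typing import Dict

# Header names (lowercase) that must not be pinned to a pooled session.
REQUEST_SCOPED_HEADERS = frozenset({"x-correlation-id"})


def strip_request_scoped_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the headers without per-request entries."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in REQUEST_SCOPED_HEADERS
    }


pinned = strip_request_scoped_headers(
    {"Authorization": "Bearer X", "X-Correlation-ID": "req-123"}
)
# pinned == {"Authorization": "Bearer X"}
```

The function returns a new dictionary, so the caller's original headers (including the correlation ID) remain available for logging on the gateway side.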

2. identity_extractor Requires Code Changes

The identity_extractor callback is supported in pool code but cannot be enabled via environment variables. Operators who need custom identity extraction (e.g., extracting user ID from JWT claims) must modify the initialization code in main.py.

3. Circuit Breaker Is URL-Scoped

The circuit breaker tracks failures per URL, not per identity. If one tenant causes repeated session creation failures, the circuit opens for all tenants accessing that URL.

Scope: Only session creation failures (connection refused, SSL errors) trip the circuit. Tool call failures do not affect the circuit breaker.

4. TLS Configuration Not in Pool Key

Pool keys do not include TLS/CA context. If the same URL is accessed with different CA bundles (unusual deployment pattern), the first session's TLS configuration may be reused.

Security Considerations

Session Isolation Model

Sessions are isolated by a composite key: (URL, identity_hash, transport_type). The identity hash is derived from authentication headers (Authorization, X-Tenant-ID, X-User-ID, X-API-Key, Cookie).

Key security properties:

  • Different users with different credentials get different pool keys → different sessions
  • Different MCP server URLs always get different sessions
  • Identity is validated at the gateway level; upstream MCP servers validate only mcp-session-id

Anonymous Pooling Risk

When no identity headers are present, identity collapses to "anonymous", causing all such requests to share sessions. This is acceptable only if:

  1. The gateway requires authentication (default), preventing truly anonymous requests
  2. Upstream MCP servers are stateless and don't maintain per-session context

If MCP servers maintain per-session state, anonymous pooling can leak data between users.

Recommended configuration: Ensure AUTH_REQUIRED=true and identity headers are present via passthrough or gateway authentication.

Shared Credentials Scenario

With shared service credentials (OAuth Client Credentials, static API keys), all users share the same Authorization header and therefore the same session. This is intentional for machine-to-machine auth where the MCP server has no per-user concept.

Risk: Only if the upstream MCP server maintains per-user state. For truly stateless servers, this is safe and provides maximum connection reuse.

Token Rotation Handling

With default configuration, Authorization is part of the identity hash. Token rotation produces a new pool key and therefore a new session. Stale tokens are not reused.

Exception: If identity_extractor is enabled (requires code changes) or Authorization is removed from identity headers, rotating tokens may reuse sessions with stale credentials until TTL expiration.

Alternatives Considered

| Alternative | Why Not |
|-------------|---------|
| HTTP/2 multiplexing only | Saves TCP/TLS but not MCP initialize overhead |
| Global session pool | Security risk from cross-user session sharing |
| No pooling | Unacceptable latency for high-throughput use cases |
| Connection-only pool | MCP session state includes more than just connection |

Status

Implemented and disabled by default for safety. Enable explicitly after testing:

MCP_SESSION_POOL_ENABLED=true

Provides 10-20x latency improvement for tool calls with session reuse.