Performance Profiling Guide¶
This guide covers tools and techniques for profiling MCP Gateway performance under load. Use these methods to identify bottlenecks, optimize queries, and diagnose production issues.
Quick Reference¶
| Tool | Purpose | When to Use |
|---|---|---|
| Locust | Load testing | Simulate concurrent users |
| PostgreSQL EXPLAIN | Query analysis | Find slow/inefficient queries |
| pg_stat_activity | Connection monitoring | Debug idle transactions |
| pg_stat_user_tables | Table scan stats | Find full table scans |
| py-spy | Python CPU profiling | Find CPU hotspots |
| memray | Python memory profiling | Find memory leaks and allocation hotspots |
| docker stats | Resource monitoring | Track CPU/memory usage |
| Redis CLI | Cache analysis | Check hit rates |
Load Testing with Locust¶
Starting a Load Test¶
# Start Locust web UI
make load-test-ui
# Open browser to http://localhost:8089
# Configure users (e.g., 3000) and spawn rate (e.g., 100/s)
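The web UI can also be skipped entirely using Locust's standard CLI flags. A minimal headless sketch (the locustfile path and target host below are assumptions, adjust them to your setup):
# Run headless: 3000 users, spawned at 100/s, for 10 minutes
locust -f tests/loadtest/locustfile.py --headless -u 3000 -r 100 --run-time 10m --host http://localhost:4444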
Monitoring Locust Stats via API¶
# Get current stats as JSON
curl -s http://localhost:8089/stats/requests | python3 -c "
import sys, json
data = json.load(sys.stdin)
print('=== TOP SLOWEST ENDPOINTS ===')
stats = sorted(data.get('stats', []), key=lambda x: x.get('avg_response_time', 0), reverse=True)[:10]
print(f\"{'Endpoint':<45} {'Reqs':>8} {'Avg':>8} {'P95':>8} {'P99':>8}\")
print('-' * 85)
for s in stats:
    name = s.get('name', '')[:43]
    p95 = s.get('response_time_percentile_0.95', 0)
    p99 = s.get('response_time_percentile_0.99', 0)
    print(f\"{name:<45} {s.get('num_requests', 0):>8} {s.get('avg_response_time', 0):>8.0f} {p95:>8.0f} {p99:>8.0f}\")
print()
print(f\"RPS: {data.get('total_rps', 0):.1f}, Users: {data.get('user_count', 0)}, Failures: {data.get('total_fail_count', 0)}\")
"
Checking for Errors¶
curl -s http://localhost:8089/stats/requests | python3 -c "
import sys, json
data = json.load(sys.stdin)
print('=== ERRORS ===')
for e in data.get('errors', []):
    print(f\" {e.get('name')}: {e.get('occurrences')} - {e.get('error')[:80]}\")
"
PostgreSQL Profiling¶
EXPLAIN ANALYZE¶
Use EXPLAIN ANALYZE to understand query execution plans and find slow queries:
docker exec mcp-context-forge-postgres-1 psql -U postgres -d mcp -c "
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT COUNT(*), AVG(response_time)
FROM tool_metrics
WHERE timestamp >= NOW() - INTERVAL '7 days';
"
Key metrics to watch:
| Metric | Good | Bad |
|---|---|---|
| Seq Scan | On small tables (<1000 rows) | On large tables |
| Index Scan | Present on filtered queries | Missing when expected |
| Rows Removed by Filter: 0 | Rows already narrowed by an index | Filter matches all rows (non-selective) |
| Shared Buffers Hit | High ratio | Low ratio (disk I/O) |
Example: Detecting Non-Selective Filters
Parallel Seq Scan on tool_metrics
Filter: (timestamp >= (now() - '7 days'::interval))
Rows Removed by Filter: 0 <-- ALL rows match = index not useful
This indicates the filter matches 100% of rows, so PostgreSQL chooses a sequential scan over an index scan.
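To confirm that selectivity (rather than a missing index) is the cause, compare the plan for a narrower window; if an index on timestamp exists and the filter is selective, the planner should switch to an index scan:
docker exec mcp-context-forge-postgres-1 psql -U postgres -d mcp -c "
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*), AVG(response_time)
FROM tool_metrics
WHERE timestamp >= NOW() - INTERVAL '1 hour';
"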
Table Scan Statistics¶
Monitor which tables are being scanned excessively:
docker exec mcp-context-forge-postgres-1 psql -U postgres -d mcp -c "
SELECT
relname as table_name,
pg_size_pretty(pg_total_relation_size(relid)) as total_size,
n_live_tup as live_rows,
seq_scan,
seq_tup_read,
idx_scan,
CASE WHEN seq_scan > 0 THEN seq_tup_read / seq_scan ELSE 0 END as avg_rows_per_seq_scan
FROM pg_stat_user_tables
ORDER BY seq_tup_read DESC
LIMIT 15;
"
Warning signs:
- seq_tup_read in billions = excessive full table scans
- avg_rows_per_seq_scan equals live_rows = scanning the entire table each time
- High seq_scan count with large tables = missing index or non-selective filter
Connection State Analysis¶
Check for idle-in-transaction connections (a sign of long-running requests or connection leaks):
docker exec mcp-context-forge-postgres-1 psql -U postgres -d mcp -c "
SELECT
state,
COUNT(*) as count,
MAX(EXTRACT(EPOCH FROM (NOW() - state_change)))::int as max_age_seconds
FROM pg_stat_activity
WHERE datname = 'mcp'
GROUP BY state
ORDER BY count DESC;
"
Healthy state:
state | count | max_age_seconds
--------------------+-------+-----------------
idle | 70 | 200
active | 5 | 0
idle in transaction | 3 | 1
Unhealthy state (connection exhaustion risk):
state | count | max_age_seconds
--------------------+-------+-----------------
idle in transaction | 60 | 120 <-- Problem!
idle | 38 | 500
active | 2 | 0
Finding Stuck Queries¶
docker exec mcp-context-forge-postgres-1 psql -U postgres -d mcp -c "
SELECT
pid,
state,
EXTRACT(EPOCH FROM (NOW() - state_change))::numeric(8,2) as idle_seconds,
LEFT(query, 100) as query_snippet
FROM pg_stat_activity
WHERE datname = 'mcp' AND state = 'idle in transaction'
ORDER BY state_change
LIMIT 15;
"
Reset Statistics¶
To get fresh statistics for a specific test:
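# Reset statistics counters for the current database
docker exec mcp-context-forge-postgres-1 psql -U postgres -d mcp -c "SELECT pg_stat_reset();"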
Python Profiling with py-spy¶
py-spy is a sampling profiler for Python that can attach to running processes without code changes.
Installing py-spy¶
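# Install from PyPI (prebuilt wheels are available for common platforms)
pip install py-spy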
Profiling a Running Container¶
# Find the Python process ID
docker exec mcp-context-forge-gateway-1 ps aux | grep python
# Run py-spy from host (requires root)
sudo py-spy top --pid $(docker inspect --format '{{.State.Pid}}' mcp-context-forge-gateway-1)
# Generate a flamegraph
sudo py-spy record -o profile.svg --pid $(docker inspect --format '{{.State.Pid}}' mcp-context-forge-gateway-1) --duration 30
Profiling Locally¶
# Profile the development server
py-spy top -- python -m mcpgateway
# Generate flamegraph
py-spy record -o flamegraph.svg -- python -m mcpgateway
Interpreting Flamegraphs¶
- Wide bars = functions consuming the most CPU time
- Deep stacks = many nested function calls
- Look for: Template rendering, JSON serialization, database queries
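For a quick point-in-time view of what each thread is doing without recording a full profile, py-spy can dump the current stacks:
# Print the current Python stack of every thread in a running worker
sudo py-spy dump --pid $(docker inspect --format '{{.State.Pid}}' mcp-context-forge-gateway-1)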
Memory Profiling with memray¶
memray is a memory profiler for Python that tracks allocations in Python code, native extension modules, and the Python interpreter itself. It's ideal for finding memory leaks, high-water marks, and allocation hotspots.
Installing memray¶
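# Install from PyPI (Linux and macOS)
pip install memray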
Profiling Locally¶
# Run your application with memray tracking
memray run -o output.bin python -m mcpgateway
# Or run a specific script
memray run -o output.bin python script.py
Attaching to a Running Process¶
memray can attach to an already-running Python process to capture memory allocations:
# Find the Python process ID inside the container
docker exec mcp-context-forge-gateway-1 ps aux | grep python
# Attach memray to a running process (requires ptrace permissions)
# Option 1: Run memray inside the container
docker exec -it mcp-context-forge-gateway-1 memray attach <PID> -o /tmp/profile.bin
# Option 2: If using privileged container or with SYS_PTRACE capability
docker exec mcp-context-forge-gateway-1 memray attach --aggregate <PID> -o /tmp/profile.bin
# After capturing, copy the profile out
docker cp mcp-context-forge-gateway-1:/tmp/profile.bin ./profile.bin
Note: memray attach requires ptrace permissions. You may need to run the container with --cap-add=SYS_PTRACE or in privileged mode for profiling.
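For example, when starting the container with docker run, the capability can be granted like this (image name and port are placeholders for your own setup):
# Sketch only - reuse your existing run/compose options and add the ptrace capability
docker run --cap-add=SYS_PTRACE -p 4444:4444 <your-gateway-image>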
Generating Reports¶
memray provides multiple output formats:
# Interactive flamegraph (opens in browser)
memray flamegraph output.bin -o flamegraph.html
# Table view (terminal-friendly)
memray table output.bin
# Tree view (call hierarchy)
memray tree output.bin
# Summary statistics
memray stats output.bin
# Identify memory leaks
memray summary output.bin
Live Mode (Real-time Monitoring)¶
For development, use live mode to see allocations in real-time:
# Run with live TUI
memray run --live python -m mcpgateway
# Attach to running process with live mode
memray attach --live <PID>
Container Profiling Workflow¶
Complete workflow for profiling a gateway container:
# 1. Install memray in the container (if not already installed)
docker exec mcp-context-forge-gateway-1 pip install memray
# 2. Find worker PIDs
docker exec mcp-context-forge-gateway-1 ps aux | grep "mcpgateway work" | head -5
# 3. Attach to one worker (e.g., PID 123) for 60 seconds
docker exec mcp-context-forge-gateway-1 timeout 60 memray attach 123 -o /tmp/worker_profile.bin || true
# 4. Copy profile to host
docker cp mcp-context-forge-gateway-1:/tmp/worker_profile.bin ./worker_profile.bin
# 5. Generate reports
memray flamegraph worker_profile.bin -o memory_flamegraph.html
memray stats worker_profile.bin
memray table worker_profile.bin | head -50
Container Profiling Limitations:
- memray attach requires gdb or lldb, which may not be available in minimal containers
- Python version must match between memray and the target process (e.g., memray compiled for Python 3.13 won't work with Python 3.12 containers)
- Requires ptrace permissions (--cap-add=SYS_PTRACE or privileged mode)
- For production containers without pip, consider:
    - Building a debug image with memray pre-installed
    - Using memray run locally to reproduce the issue
    - Using py-spy for CPU profiling (works cross-version and is more portable)
Interpreting memray Output¶
Flamegraph:
- Width = amount of memory allocated by that call stack
- Color: Red = Python code, Green = C extensions, Blue = Python internals
- Click on frames to zoom in

Table view columns:
- Total memory = all memory allocated by this function and its callees
- Own memory = memory allocated directly by this function
- Allocations = number of allocation calls

Common patterns to look for:
- Large allocations in template rendering (Jinja2)
- JSON serialization of large datasets
- ORM model instantiation (SQLAlchemy)
- Response buffering in ASGI middleware
- Caches growing unbounded
Example high-memory patterns:
# Pattern: Large list comprehensions in API responses
mcpgateway/main.py:handle_rpc Total: 500MB Own: 450MB Allocations: 10000
# Pattern: Template rendering accumulating data
jinja2/environment.py:render Total: 200MB Own: 50MB Allocations: 5000
py-spy vs memray¶
| Aspect | py-spy | memray |
|---|---|---|
| Focus | CPU time | Memory allocation |
| Overhead | Very low (~1%) | Medium (10-30%) |
| Attach support | Yes | Yes |
| Native code | No | Yes |
| Use when | High CPU usage | OOM errors, memory leaks |
Use py-spy when CPU is the bottleneck. Use memray when memory usage is high or you're seeing OOM kills.
Container Resource Monitoring¶
Real-time Stats¶
# Watch all containers
docker stats
# Filter to specific containers
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}" \
mcp-context-forge-gateway-1 \
mcp-context-forge-postgres-1 \
mcp-context-forge-redis-1
Snapshot Stats¶
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}" \
| grep -E "gateway|postgres|redis|nginx"
Healthy resource usage:
| Container | CPU | Memory |
|---|---|---|
| gateway (each) | <400% | <4GB |
| postgres | <150% | <1GB |
| redis | <20% | <100MB |
Redis Cache Analysis¶
Check Hit Rate¶
docker exec mcp-context-forge-redis-1 redis-cli info stats | grep -E "keyspace|ops_per_sec|hits|misses"
Calculate hit rate:
docker exec mcp-context-forge-redis-1 redis-cli info stats | python3 -c "
import sys
stats = {}
for line in sys.stdin:
    if ':' in line:
        k, v = line.strip().split(':')
        stats[k] = int(v) if v.isdigit() else v
hits = stats.get('keyspace_hits', 0)
misses = stats.get('keyspace_misses', 0)
total = hits + misses
hit_rate = (hits / total * 100) if total > 0 else 0
print(f'Hits: {hits}, Misses: {misses}, Hit Rate: {hit_rate:.1f}%')
"
Good hit rate: >90% for cached data
Check Key Counts¶
docker exec mcp-context-forge-redis-1 redis-cli dbsize
# List keys by pattern
docker exec mcp-context-forge-redis-1 redis-cli keys "mcpgw:*" | head -20
Tool lookup cache keys (invoke hot path):
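The exact key layout is deployment-specific; as a sketch, reuse the mcpgw: prefix shown above with SCAN, which is safer than KEYS on a busy instance (the tools: sub-prefix here is a guess):
docker exec mcp-context-forge-redis-1 redis-cli --scan --pattern "mcpgw:tools:*" | head -20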
Gateway Log Analysis¶
Check for Errors¶
docker logs mcp-context-forge-gateway-1 2>&1 | grep -iE "error|exception|timeout|warning" | tail -30
Count Error Types¶
docker logs mcp-context-forge-gateway-1 2>&1 | grep -i "error" | \
sed 's/.*\(Error[^:]*\).*/\1/' | sort | uniq -c | sort -rn | head -10
Check for Idle Transaction Timeouts¶
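A sketch, assuming idle_in_transaction_session_timeout is enabled in PostgreSQL (it terminates offending sessions and logs an idle-in-transaction timeout error; exact wording is version-dependent):
# Look for sessions terminated by the idle-in-transaction timeout
docker logs mcp-context-forge-postgres-1 2>&1 | grep -i "idle-in-transaction timeout" | tail -10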
Complete Profiling Session Example¶
Here's a workflow for diagnosing performance issues under load:
# 1. Reset PostgreSQL statistics
docker exec mcp-context-forge-postgres-1 psql -U postgres -d mcp -c "SELECT pg_stat_reset();"
# 2. Start load test
make load-test-ui
# Configure 3000 users in browser, start test
# 3. Take samples every 30 seconds
for i in {1..5}; do
echo "=== SAMPLE $i ==="
# Locust stats
curl -s http://localhost:8089/stats/requests | python3 -c "
import sys, json
d = json.load(sys.stdin)
admin = next((s for s in d.get('stats', []) if s.get('name') == '/admin/'), {})
print(f\"RPS: {d.get('total_rps', 0):.0f}, /admin/ avg: {admin.get('avg_response_time', 0):.0f}ms\")
"
# Connection states
docker exec mcp-context-forge-postgres-1 psql -U postgres -d mcp -c "
SELECT state, COUNT(*) FROM pg_stat_activity WHERE datname='mcp' GROUP BY state;
"
# Container CPU
docker stats --no-stream --format "{{.Name}}: {{.CPUPerc}}" | grep gateway
sleep 30
done
# 4. Final analysis
docker exec mcp-context-forge-postgres-1 psql -U postgres -d mcp -c "
SELECT relname, seq_scan, seq_tup_read, idx_scan
FROM pg_stat_user_tables
ORDER BY seq_tup_read DESC LIMIT 10;
"
Common Performance Issues¶
Issue: High Sequential Scan Count¶
Symptom: seq_tup_read in billions
Causes:
- Missing index
- Non-selective filter (e.g., a 7-day filter that matches all recent data)
- Short cache TTL causing repeated queries

Solutions (a sketch follows this list):
- Add a covering index
- Increase the cache TTL
- Add a materialized view for aggregations
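A minimal sketch of the index and materialized-view ideas, assuming the tool_metrics schema shown earlier (object names are illustrative):
docker exec mcp-context-forge-postgres-1 psql -U postgres -d mcp -c "
-- Covering index for time-windowed aggregations (illustrative name)
CREATE INDEX IF NOT EXISTS idx_tool_metrics_ts_rt ON tool_metrics (timestamp, response_time);
-- Pre-aggregated daily metrics; refresh on a schedule (illustrative name)
CREATE MATERIALIZED VIEW IF NOT EXISTS tool_metrics_daily AS
SELECT date_trunc('day', timestamp) AS day,
       COUNT(*) AS calls,
       AVG(response_time) AS avg_response_time
FROM tool_metrics
GROUP BY 1;
"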
Issue: Many Idle-in-Transaction Connections¶
Symptom: 50+ connections in idle in transaction state
Causes:
- N+1 query patterns
- Long-running requests holding transactions
- Missing connection pool limits

Solutions (a sketch for the timeout follows this list):
- Use batch queries instead of loops
- Set idle_in_transaction_session_timeout
- Optimize slow queries
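A sketch for the timeout setting (the value is an example; choose one longer than your slowest legitimate transaction):
docker exec mcp-context-forge-postgres-1 psql -U postgres -d mcp -c "
ALTER SYSTEM SET idle_in_transaction_session_timeout = '60s';
SELECT pg_reload_conf();
"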
Issue: Health Check Endpoints Holding PgBouncer Connections¶
Symptom: SELECT 1 queries stuck in idle in transaction state for minutes. Diagnose by grouping idle-in-transaction sessions by query text:
SELECT left(query, 50), count(*), avg(EXTRACT(EPOCH FROM (NOW() - state_change)))::int as avg_age
FROM pg_stat_activity
WHERE state = 'idle in transaction' AND datname = 'mcp'
GROUP BY left(query, 50);
query | count | avg_age
----------------------+-------+---------
SELECT 1 | 45 | 139
Causes:
- PgBouncer in transaction mode holds backend connections until COMMIT/ROLLBACK
- Health endpoints using Depends(get_db) rely on dependency cleanup, which may not execute on timeout/cancellation
- async def endpoints calling blocking SQLAlchemy code on the event loop thread
- Cross-thread session usage when mixing asyncio.to_thread with Depends(get_db)
Solutions:
- Use dedicated sessions instead of Depends(get_db) - Health endpoints should create and manage their own sessions to avoid double-commit and cross-thread issues:
@app.get("/health")
def healthcheck(): # Sync function - FastAPI runs in threadpool
"""Health check with dedicated session."""
db = SessionLocal()
try:
db.execute(text("SELECT 1"))
db.commit() # Explicitly release PgBouncer connection
return {"status": "healthy"}
except Exception as e:
try:
db.rollback()
except Exception:
try:
db.invalidate() # Remove broken connection from pool
except Exception:
pass
return {"status": "unhealthy", "error": str(e)}
finally:
db.close()
- Use sync functions for simple blocking operations - FastAPI automatically runs def (sync) route handlers in a threadpool:
# BAD: async def with blocking calls stalls the event loop
@app.get("/health")
async def healthcheck():
    db.execute(text("SELECT 1"))  # Blocks event loop!

# GOOD: sync def runs in a threadpool automatically
@app.get("/health")
def healthcheck():
    db.execute(text("SELECT 1"))  # Runs in threadpool
- For async endpoints, create sessions inside asyncio.to_thread - All DB operations must happen in the same thread:
@app.get("/ready")
async def readiness_check():
def _check_db() -> str | None:
# Session created IN the worker thread
db = SessionLocal()
try:
db.execute(text("SELECT 1"))
db.commit()
return None
except Exception as e:
try:
db.rollback()
except Exception:
try:
db.invalidate()
except Exception:
pass
return str(e)
finally:
db.close()
error = await asyncio.to_thread(_check_db)
if error:
return {"status": "not ready", "error": error}
return {"status": "ready"}
- Mirror the get_db cleanup pattern - Use rollback → invalidate → close:
except Exception as e:
    try:
        db.rollback()
    except Exception:
        try:
            db.invalidate()  # Remove broken connection from pool
        except Exception:
            pass  # nosec B110 - Best effort cleanup
Why not use Depends(get_db)?
- get_db commits after yield, causing a double commit if the endpoint also commits
- With asyncio.to_thread, the session is created in one thread but used in another
- Health endpoints should test actual DB connectivity, not be mockable via dependency_overrides
Issue: High Gateway CPU¶
Symptom: Gateway at 600%+ CPU
Causes:
- Template rendering overhead
- JSON serialization of large responses
- Pydantic validation overhead

Solutions:
- Enable response caching
- Paginate large result sets
- Use orjson for serialization (enabled by default)
See Also¶
- Database Performance Guide - N+1 detection and query logging
- Performance Testing - Load testing with hey
- Scaling Guide - Production scaling configuration
- Issue #1906 - Metrics cache optimization