Load Testing Hints

Quick reference for running containerized load tests with docker compose and Locust.


Starting the Testing Stack

# Default: starts gateway, nginx, fast_test_server, Locust (web UI at :8089)
make testing-up

All load testing services run inside Docker on the mcpnet network. Locust targets http://nginx:80 by default.


Environment Variable Reference

Locust Configuration

Override these when calling make testing-up or docker compose --profile testing up:

| Variable | Default | Description |
|---|---|---|
| LOCUST_LOCUSTFILE | locustfile.py | Which locustfile to run (any file in tests/loadtest/) |
| LOCUST_MODE | master | master for web UI, headless for CLI-only |
| LOCUST_USERS | 100 | Number of concurrent simulated users |
| LOCUST_SPAWN_RATE | 10 | Users spawned per second during ramp-up |
| LOCUST_RUN_TIME | 5m | Test duration (headless mode only), e.g. 30s, 5m, 1h |
| LOCUST_EXPECT_WORKERS | 1 | Number of distributed workers the master expects |

Examples:

# Run the echo delay locustfile with web UI
LOCUST_LOCUSTFILE=locustfile_echo_delay.py make testing-up

# Headless run with 500 users for 2 minutes
LOCUST_LOCUSTFILE=locustfile_echo_delay.py LOCUST_MODE=headless \
  LOCUST_USERS=500 LOCUST_SPAWN_RATE=50 LOCUST_RUN_TIME=120s \
  make testing-up

# Use the high-throughput locustfile
LOCUST_LOCUSTFILE=locustfile_highthroughput.py make testing-up

# Scale to 4 Locust workers for higher concurrency
TESTING_LOCUST_WORKERS=4 make testing-up
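
As a rough planning aid, the ramp-up phase implied by LOCUST_USERS and LOCUST_SPAWN_RATE can be estimated before starting a run. The sketch below is illustrative only: parse_duration and ramp_profile are hypothetical helpers, not part of the repository. It assumes the LOCUST_RUN_TIME clock starts when spawning begins, so ramp-up eats into the measured window.

```python
# Hypothetical helpers for estimating ramp-up; not part of the test stack.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text):
    """Parse a single-unit duration such as '30s', '5m' or '1h' into seconds."""
    value, unit = text[:-1], text[-1]
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(value) * _UNIT_SECONDS[unit]


def ramp_profile(users, spawn_rate, run_time):
    """Return (ramp_seconds, steady_seconds) for a headless run.

    Assumes the run-time clock includes the ramp-up phase.
    """
    ramp_s = users / spawn_rate
    total_s = parse_duration(run_time)
    return ramp_s, max(total_s - ramp_s, 0.0)


# The headless example above: 500 users at 50 users/s for 120s.
ramp_s, steady_s = ramp_profile(500, 50, "120s")
print(f"ramp-up {ramp_s:.0f}s, steady state {steady_s:.0f}s")
```

If the steady-state window comes out short relative to the ramp, raising LOCUST_RUN_TIME or LOCUST_SPAWN_RATE gives more representative numbers.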

Gateway Scaling

| Variable | Default | Description |
|---|---|---|
| GATEWAY_REPLICAS | 3 | Number of gateway container instances |
| GATEWAY_CPU_LIMIT | 8 | CPU limit per replica |
| GATEWAY_MEM_LIMIT | 8G | Memory limit per replica |
| GATEWAY_CPU_RESERVATION | 4 | CPU reservation per replica |
| GATEWAY_MEM_RESERVATION | 4G | Memory reservation per replica |
| GUNICORN_WORKERS | 24 | Gunicorn worker processes per replica |

Examples:

# 6 small replicas with 5 workers each (30 total workers)
GATEWAY_REPLICAS=6 GATEWAY_CPU_LIMIT=1 GATEWAY_MEM_LIMIT=2G \
  GATEWAY_CPU_RESERVATION=0.5 GATEWAY_MEM_RESERVATION=1G \
  GUNICORN_WORKERS=5 make testing-up

# Single large replica for debugging
GATEWAY_REPLICAS=1 GUNICORN_WORKERS=4 make testing-up
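
Because every setting in the table applies per replica, the totals a host must provide are the per-replica values multiplied by GATEWAY_REPLICAS. The following sketch makes that arithmetic explicit. stack_footprint is a hypothetical helper, and memory is passed as a plain number of gigabytes rather than parsed from strings like 8G.

```python
# Hypothetical helper: totals implied by the per-replica gateway settings.
def stack_footprint(replicas, workers_per_replica, cpu_limit, mem_limit_gb):
    """Return total Gunicorn workers, CPU limit and memory limit across replicas."""
    return {
        "workers": replicas * workers_per_replica,
        "cpus": replicas * cpu_limit,
        "mem_gb": replicas * mem_limit_gb,
    }


# Defaults from the table: 3 replicas, 24 workers, 8 CPUs, 8G each.
print(stack_footprint(3, 24, 8, 8))
# The "6 small replicas" example: 5 workers, 1 CPU, 2G each.
print(stack_footprint(6, 5, 1, 2))
```

With the defaults the stack may claim up to 24 CPUs and 24 GB of memory, so the smaller configuration is the safer starting point on a laptop or a shared CI runner.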

Gateway Configuration

| Variable | Default | Description |
|---|---|---|
| DATABASE_URL | postgresql+psycopg://...@pgbouncer:6432/mcp | Database connection string |
| POSTGRES_PASSWORD | mysecretpassword | PostgreSQL password (used in default DATABASE_URL) |
| MCP_SESSION_POOL_ENABLED | true | Enable MCP client session pooling |

Examples:

# Bypass PgBouncer and connect directly to PostgreSQL
DATABASE_URL='postgresql+psycopg://postgres:mysecretpassword@postgres:5432/mcp' make testing-up

# Disable session pooling (uses a fresh connection per tool call; slower but more reliable)
MCP_SESSION_POOL_ENABLED=false make testing-up

# Combine: small replicas + direct Postgres + echo delay test + no pool
GATEWAY_REPLICAS=6 GUNICORN_WORKERS=5 MCP_SESSION_POOL_ENABLED=false \
  DATABASE_URL='postgresql+psycopg://postgres:mysecretpassword@postgres:5432/mcp' \
  LOCUST_LOCUSTFILE=locustfile_echo_delay.py \
  make testing-up

Echo Delay Test Configuration

These are read by locustfile_echo_delay.py inside the Locust container:

| Variable | Default | Description |
|---|---|---|
| ECHO_DELAY_MS | 500 | Milliseconds the echo tool waits before responding |
| ECHO_DELAY_SERVER_ID | (fixed UUID) | Virtual server ID to target (matches register_fast_test) |

The echo delay test sends MCP tools/call requests through the gateway's Streamable HTTP endpoint (/servers/{id}/mcp), measuring how efficiently the gateway handles I/O-bound backends.
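
To make the request shape concrete, here is a minimal sketch of how a locustfile might read these variables and build a JSON-RPC 2.0 tools/call body for the /servers/{id}/mcp endpoint. It is not the code in locustfile_echo_delay.py. The function name, the tool name echo, the argument names message and delay_ms, and the placeholder server ID are all assumptions made for illustration.

```python
import json
import os

# Placeholder only: the real default is a fixed UUID defined by the project.
PLACEHOLDER_SERVER_ID = "<server-uuid>"


def build_echo_request(env=None, base_url="http://nginx:80"):
    """Return (url, json_body) for one MCP tools/call request."""
    env = os.environ if env is None else env
    delay_ms = int(env.get("ECHO_DELAY_MS", "500"))
    server_id = env.get("ECHO_DELAY_SERVER_ID", PLACEHOLDER_SERVER_ID)

    url = f"{base_url}/servers/{server_id}/mcp"
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "echo",  # hypothetical tool name
            # hypothetical argument names
            "arguments": {"message": "hello", "delay_ms": delay_ms},
        },
    }
    return url, json.dumps(payload)


url, body = build_echo_request({"ECHO_DELAY_MS": "250", "ECHO_DELAY_SERVER_ID": "abc"})
print(url)
print(body)
```

A real Streamable HTTP client would also perform the MCP initialize handshake and send the appropriate headers before calling tools; those steps are omitted here.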


Available Locustfiles

| File | Description |
|---|---|
| locustfile.py | Main comprehensive test with 20+ user classes |
| locustfile_echo_delay.py | Streamable HTTP echo with configurable delay |
| locustfile_baseline.py | Component baselines (REST, MCP, PostgreSQL, Redis) |
| locustfile_highthroughput.py | Optimized for maximum RPS |
| locustfile_slow_time_server.py | Resilience testing against slow backends |
| locustfile_spin_detector.py | CPU spin loop detection (spike/drop pattern) |
| locustfile_agentgateway_mcp_server_time.py | External MCP server testing |

Typical Workflows

Measure gateway overhead

Compare direct server performance vs. going through the gateway:

# 1. Baseline: hit the fast_test_server REST API directly
hey -n 10000 -c 200 -m POST -T 'application/json' \
    -d '{"message":"hello"}' http://localhost:8880/api/echo

# 2. Through gateway: Streamable HTTP MCP path
LOCUST_LOCUSTFILE=locustfile_echo_delay.py ECHO_DELAY_MS=0 make testing-up

Measure throughput with slow backends

# 500 ms backend delay (default ECHO_DELAY_MS): theoretical max with 200 users is ~400 RPS (200 users / 0.5 s)
LOCUST_LOCUSTFILE=locustfile_echo_delay.py make testing-up
# Open http://localhost:8089, set 200 users, observe actual RPS
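
The ~400 RPS ceiling follows from each simulated user having at most one request in flight: a user that waits 0.5 s per call completes at most 2 calls per second. A minimal sketch of that calculation follows. max_rps is a hypothetical helper, and the optional wait_ms models any think time between requests, which is assumed to be zero here.

```python
# Hypothetical helper: closed-loop throughput ceiling for a delayed backend.
def max_rps(users, latency_ms, wait_ms=0.0):
    """Upper bound on requests/s when each user issues one request at a time."""
    cycle_s = (latency_ms + wait_ms) / 1000.0
    if cycle_s <= 0:
        raise ValueError("latency_ms + wait_ms must be positive")
    return users / cycle_s


print(max_rps(200, 500))   # 400.0
print(max_rps(2000, 500))  # 4000.0
```

The gap between this ceiling and the RPS that Locust reports is a rough measure of time added by the gateway and the rest of the stack.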

Stress test with many replicas

LOCUST_LOCUSTFILE=locustfile_echo_delay.py \
  LOCUST_USERS=2000 LOCUST_SPAWN_RATE=100 \
  TESTING_LOCUST_WORKERS=4 \
  GATEWAY_REPLICAS=6 GUNICORN_WORKERS=5 \
  make testing-up

Headless CI run

LOCUST_LOCUSTFILE=locustfile_echo_delay.py \
  LOCUST_MODE=headless LOCUST_USERS=100 LOCUST_RUN_TIME=60s \
  make testing-up
# Reports saved to reports/locust_report.html and reports/locust_*.csv
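
In CI it is often useful to fail the job when the error rate is too high. The sketch below computes a failure ratio from the text of a Locust stats CSV. It assumes the file is the one Locust usually writes as locust_stats.csv and that it has Name, Request Count, and Failure Count columns plus an Aggregated summary row, so check the header of your own report before relying on it. failure_ratio and the 1% threshold are illustrative choices, not part of the project.

```python
import csv
import io


def failure_ratio(stats_csv_text):
    """Return failures / requests from the 'Aggregated' row of a Locust stats CSV."""
    reader = csv.DictReader(io.StringIO(stats_csv_text))
    for row in reader:
        if row["Name"] == "Aggregated":
            requests = int(row["Request Count"])
            failures = int(row["Failure Count"])
            return failures / requests if requests else 0.0
    raise ValueError("no Aggregated row found")


# Tiny inline sample with only the columns this sketch reads.
sample = (
    "Type,Name,Request Count,Failure Count\n"
    "POST,/servers/x/mcp,100,2\n"
    ",Aggregated,100,2\n"
)
ratio = failure_ratio(sample)
print(f"failure ratio: {ratio:.2%}")
if ratio > 0.01:  # illustrative threshold
    print("error rate above 1%; a CI job could exit non-zero here")
```

In a pipeline, the same function can be fed the contents of the stats file under reports/ after make testing-up finishes.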

Stopping the Stack

make testing-down

Host Tuning

For 500+ concurrent users, tune the host OS first. See Performance Testing for TCP, file descriptor, and memory settings, or run:

sudo scripts/tune-loadtest.sh