# Performance Testing
Use this guide to benchmark MCP Gateway under load, validate performance improvements, and identify bottlenecks before production deployment.
## ⚙️ Tooling: hey

`hey` is a CLI-based HTTP load generator. Install it with:
```bash
brew install hey                           # macOS
sudo apt install hey                       # Debian/Ubuntu
go install github.com/rakyll/hey@latest    # From source
```
## 🎯 Establishing a Baseline
Before benchmarking the full MCP Gateway stack, run tests against the MCP server directly (if applicable) to establish baseline latency and throughput. This helps isolate issues related to gateway overhead, authentication, or network I/O.
If your backend service exposes a direct HTTP interface or gRPC gateway, target it with hey using the same payload and concurrency settings.
```bash
hey -n 5000 -c 100 \
  -m POST \
  -T application/json \
  -D tests/hey/payload.json \
  http://localhost:5000/your-backend-endpoint
```
Compare the 95th/99th percentile latencies and error rates with and without the gateway in front (a scripted comparison sketch follows the list below). Any significant increase can guide you toward:
- Bottlenecks in auth middleware
- Overhead from JSON-RPC wrapping/unwrapping
- Improper worker/thread config in Gunicorn
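One way to make that comparison concrete is to capture per-request latencies from both runs with hey's CSV output (`-o csv`) and diff the tail percentiles. A minimal sketch, assuming placeholder file names and the `response-time` column that hey's CSV output emits (adjust if your hey version differs):

```python
import csv
import statistics


def tail_latencies(path: str) -> tuple[float, float]:
    """Return (p95, p99) in seconds from a `hey -o csv` latency dump."""
    with open(path, newline="") as f:
        times = [float(row["response-time"]) for row in csv.DictReader(f)]
    # quantiles(n=100) yields 99 cut points: index 94 -> p95, index 98 -> p99
    q = statistics.quantiles(times, n=100)
    return q[94], q[98]


# Placeholder file names: save each run with e.g. `hey ... -o csv > baseline.csv`
base_p95, base_p99 = tail_latencies("baseline.csv")
gw_p95, gw_p99 = tail_latencies("gateway.csv")
print(f"p95 overhead: {(gw_p95 - base_p95) * 1000:+.1f} ms")
print(f"p99 overhead: {(gw_p99 - base_p99) * 1000:+.1f} ms")
```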
## 📁 Scripted Load Tests: `tests/hey/hey.sh`
A wrapper script exists at `tests/hey/hey.sh`. This script provides:

- Strict error handling (`set -euo pipefail`)
- Helpful CLI interface (`-n`, `-c`, `-d`, etc.)
- Required dependency checks
- Optional dry-run mode
- Timestamped logging
Example usage:
```bash
./hey.sh -n 10000 -c 200 \
  -X POST \
  -T application/json \
  -H "Authorization: Bearer $JWT" \
  -d payload.json \
  -u http://localhost:4444/rpc
```
The `payload.json` file is expected to be a valid JSON-RPC request payload.
Sample payload (`tests/hey/payload.json`):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "convert_time",
  "params": {
    "source_timezone": "Europe/Berlin",
    "target_timezone": "Europe/Dublin",
    "time": "09:00"
  }
}
```
Logs are saved automatically (e.g. `hey-20250610_120000.log`).
## 📊 Interpreting Results
When the test completes, look at:
| Metric | Interpretation |
|---|---|
| Requests/sec (RPS) | Raw throughput capability |
| 95th/99th percentile | Tail latency: tune timeouts, workers, or DB pooling |
| Non-2xx responses | Failures under load - common with CPU/memory starvation |
## 🧪 Tips & Best Practices
- Always test against a realistic endpoint (e.g. `POST /rpc` with auth and payload).
- Use the same JWT and payload structure your clients would.
- Run from a dedicated machine to avoid local CPU load skewing the results.
- Use `make run` or `make serve` to launch the app for local testing.
For runtime tuning details, see Gateway Tuning Guide.
## 🚀 JSON Serialization Performance: orjson
MCP Gateway uses orjson for high-performance JSON serialization, providing 5-6x faster serialization and 1.5-2x faster deserialization compared to Python's standard library json module.
### Why orjson?
orjson is a fast, correct JSON library for Python implemented in Rust. It provides:
- 5-6x faster serialization than stdlib json
- 1.5-2x faster deserialization than stdlib json
- 7% smaller output (more compact JSON)
- Native type support: datetime, UUID, numpy arrays, Pydantic models (see the example after this list)
- RFC 8259 compliance: strict JSON specification adherence
- Zero configuration: drop-in replacement, works automatically
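A quick illustration of the native type handling; note that `orjson.dumps` returns `bytes` rather than `str`:

```python
import uuid
from datetime import datetime, timezone

import orjson

record = {
    "id": uuid.uuid4(),                     # UUID handled natively
    "created": datetime.now(timezone.utc),  # rendered as RFC 3339
    "enabled": True,
}

data = orjson.dumps(record)    # returns bytes, not str
restored = orjson.loads(data)  # accepts bytes or str
print(data.decode())
```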
### Performance Benchmarks
Run the benchmark script to measure JSON serialization performance on your system.
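If you just want a quick sanity check without the repository script, a micro-benchmark along these lines reproduces the comparison (the payload shape and iteration counts here are illustrative, not necessarily those behind the table below):

```python
import json
import timeit

import orjson

# Illustrative payload: a list of small records
items = [{"id": i, "name": f"tool-{i}", "enabled": True} for i in range(1_000)]

N = 1_000
t_std = timeit.timeit(lambda: json.dumps(items), number=N)
t_orj = timeit.timeit(lambda: orjson.dumps(items), number=N)

print(f"stdlib json: {t_std / N * 1e6:8.2f} µs per call")
print(f"orjson:      {t_orj / N * 1e6:8.2f} µs per call")
print(f"speedup:     {t_std / t_orj:.1f}x")
```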
Sample Results:
| Payload Size | stdlib json | orjson | Speedup |
|---|---|---|---|
| 10 items | 10.32 μs | 1.43 μs | 623% |
| 100 items | 91.00 μs | 13.82 μs | 558% |
| 1,000 items | 893.53 μs | 135.00 μs | 562% |
| 5,000 items | 4.44 ms | 682.14 μs | 551% |
Key Findings:
- ✅ Serialization: 5-6x faster (550-623% speedup)
- ✅ Deserialization: 1.5-2x faster (55-115% speedup)
- ✅ Output size: 7% smaller (more compact JSON)
- ✅ Performance scales: advantage increases with payload size
### Where Performance Matters Most
orjson provides the biggest impact for:
- Large list endpoints: `GET /tools`, `GET /servers`, `GET /gateways` (100+ items)
- Bulk export operations: Exporting 1000+ entities to JSON
- High-throughput APIs: Services handling >1000 req/s
- Real-time streaming: SSE and WebSocket with frequent JSON events
- Federation sync: Tool catalog exchange between gateways
- Admin UI data loading: Large tables with many records
### Implementation Details
MCP Gateway configures orjson as the default JSON response class for all FastAPI endpoints:
```python
from fastapi import FastAPI

from mcpgateway.utils.orjson_response import ORJSONResponse

app = FastAPI(
    default_response_class=ORJSONResponse,  # Use orjson for all responses
    # ... other config
)
```
Options enabled:

- `OPT_NON_STR_KEYS`: Allow non-string dict keys (integers, UUIDs)
- `OPT_SERIALIZE_NUMPY`: Support numpy arrays if numpy is installed
Datetime serialization:

- Uses RFC 3339 format (ISO 8601 with timezone)
- Naive datetimes treated as UTC
- Example: `2025-01-19T12:00:00+00:00`
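For reference, a response class wired up this way generally looks like the sketch below. This is a minimal illustration of the pattern, not necessarily the exact `mcpgateway.utils.orjson_response` source:

```python
from typing import Any

import orjson
from starlette.responses import JSONResponse


class ORJSONResponse(JSONResponse):
    """JSONResponse variant that renders with orjson."""

    media_type = "application/json"

    def render(self, content: Any) -> bytes:
        # OPT_NON_STR_KEYS allows int/UUID dict keys;
        # OPT_SERIALIZE_NUMPY serializes numpy arrays when numpy is present
        return orjson.dumps(
            content,
            option=orjson.OPT_NON_STR_KEYS | orjson.OPT_SERIALIZE_NUMPY,
        )
```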
### Testing orjson Integration
All JSON serialization is automatically handled by orjson. No client changes required.
Verify orjson is active:
```bash
# Start the development server
make dev

# Check that responses are using orjson (compact, fast)
curl -s http://localhost:8000/health | jq .

# Measure response time for a large endpoint
time curl -s http://localhost:8000/tools > /dev/null
```
Unit tests:
```bash
# Run orjson-specific tests
pytest tests/unit/mcpgateway/utils/test_orjson_response.py -v

# Verify 100% code coverage
pytest tests/unit/mcpgateway/utils/test_orjson_response.py --cov=mcpgateway.utils.orjson_response --cov-report=term-missing
```
### Performance Impact
Based on benchmark results, orjson provides:
| Metric | Improvement |
|---|---|
| Serialization speed | 5-6x faster |
| Deserialization speed | 1.5-2x faster |
| Output size | 7% smaller |
| API throughput | 15-30% higher RPS |
| CPU usage | 10-20% lower |
| Response latency (p95) | 20-40% faster |
Production benefits:

- Higher requests/second capacity
- Lower CPU utilization per request
- Faster page loads for Admin UI
- Reduced bandwidth usage (smaller JSON)
- Better tail latency (p95, p99)
## 🔬 Combining Performance Optimizations
For maximum performance, combine multiple optimizations:
- orjson serialization (5-6x faster JSON) → Automatic
- Response compression (30-70% bandwidth reduction) → See compression docs
- Redis caching (avoid repeated serialization) → Optional
- Connection pooling (reuse DB connections) → Automatic
- Async I/O (non-blocking operations) → Automatic with FastAPI
These optimizations are complementary and provide cumulative benefits.
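As a sketch of how the first two combine in one FastAPI app: `GZipMiddleware` is Starlette's built-in compression middleware, and whether MCP Gateway enables compression exactly this way is an assumption (see the compression docs):

```python
from fastapi import FastAPI
from starlette.middleware.gzip import GZipMiddleware

from mcpgateway.utils.orjson_response import ORJSONResponse

# orjson covers serialization speed; gzip covers bytes on the wire
app = FastAPI(default_response_class=ORJSONResponse)

# Assumption: compression wired via Starlette's built-in middleware;
# minimum_size skips responses too small to benefit from gzip
app.add_middleware(GZipMiddleware, minimum_size=1024)
```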