ADR-0025: Add Granian as Alternative HTTP Server¶
- Status: Accepted
- Date: 2025-12-21
- Deciders: Core Engineering Team
Context¶
MCP Gateway uses Gunicorn with Uvicorn workers as its production HTTP server stack. This provides good performance with the uvicorn[standard] extras (ADR-0024) and is battle-tested. However, a Rust-based alternative called Granian offers potential benefits:
- Native HTTP/2 support (without requiring a reverse proxy)
- Native WebSocket support
- Native mTLS support
- Lower memory footprint in synthetic benchmarks (the database-bound load test below measured higher usage)
- Simpler process model
Granian is a Rust-based HTTP server for Python applications that implements the ASGI, RSGI, and WSGI interfaces. It is built on:
- Hyper: Rust's HTTP library
- Tokio: Async runtime
- PyO3: Python bindings
Decision¶
We will add Granian as an alternative HTTP server option while keeping Gunicorn + Uvicorn as the default.
Key points:
- Gunicorn + Uvicorn remains the default server (stability, maturity)
- Granian is available as an alternative for users who need its features
- Both servers are production-ready and fully supported
- Users can switch via the HTTP_SERVER environment variable
Usage:
# Using Gunicorn + Uvicorn (default)
make serve
# Using Granian (alternative)
make serve-granian
# Granian with HTTP/2 + TLS
make serve-granian-http2
Granian vs Gunicorn+Uvicorn¶
| Feature | Gunicorn + Uvicorn | Granian |
|---|---|---|
| Language | Python + C (uvloop) | Rust |
| HTTP/2 | Requires reverse proxy | Native |
| WebSocket | Via Uvicorn | Native |
| mTLS | Requires configuration | Native |
| Process Model | Master + Workers | Workers only |
| Hot Reload | Via watchfiles | Built-in |
| Memory | Higher in synthetic benchmarks (Python overhead) | Lower in synthetic benchmarks; higher in the database-bound load test below |
| Maturity | Very mature | Newer (production-ready) |
Performance Characteristics¶
Based on community benchmarks:
| Metric | Gunicorn+Uvicorn | Granian | Notes |
|---|---|---|---|
| Simple JSON | Baseline | +20-50% | Varies by workload |
| High concurrency | Good | Better | Less context switching |
| Memory per worker | ~80MB | ~40MB | Rust efficiency |
| Startup time | Slower | Faster | No preload needed |
Note: Actual performance varies by workload. Always benchmark with your specific use case.
Real-World Performance Comparison (Database-Bound Workload)¶
Profiling under a load test with 2,500 concurrent users against a PostgreSQL backend:
| Metric | Gunicorn+Uvicorn | Granian | Notes |
|---|---|---|---|
| Memory per replica | ~2.7 GiB | ~4.0 GiB | Gunicorn 32% less |
| CPU per replica | ~740% | ~680% | Granian 8% less |
| Throughput (RPS) | ~2000 | ~2000 | Same (DB bottleneck) |
| Backpressure | ❌ None (queues unbounded) | ✅ Native (rejects excess) | Granian safer under overload |
| 503 under overload | No (timeouts instead) | Yes (clean rejection) | Granian fails gracefully |
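The percentages in the Notes column are relative differences measured against the larger value in each row. A minimal sketch of that arithmetic, using the approximate figures from the table (the helper name `percent_less` is illustrative, not part of the project):

```python
def percent_less(smaller: float, larger: float) -> float:
    """Return how much smaller `smaller` is than `larger`, as a percentage of `larger`."""
    return (larger - smaller) / larger * 100


# Approximate per-replica figures from the table above.
gunicorn_mem_gib, granian_mem_gib = 2.7, 4.0
gunicorn_cpu_pct, granian_cpu_pct = 740.0, 680.0

mem_saving = percent_less(gunicorn_mem_gib, granian_mem_gib)  # Gunicorn vs Granian memory
cpu_saving = percent_less(granian_cpu_pct, gunicorn_cpu_pct)  # Granian vs Gunicorn CPU

print(f"Gunicorn uses about {mem_saving:.1f}% less memory")
print(f"Granian uses about {cpu_saving:.1f}% less CPU")
```

Seen from the other direction, Granian used roughly 48% more memory than Gunicorn in this test (4.0 / 2.7 ≈ 1.48), so the baseline matters when quoting either number.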
Key Insight: When the bottleneck is the database (not HTTP parsing), both servers achieve similar throughput. The difference is in resource usage and overload behavior:
- Gunicorn uses less memory but has no admission control. Under extreme load it queues requests indefinitely, which can cause OOM or cascading timeouts.
- Granian uses more memory but provides native backpressure. Under extreme load it rejects excess requests with an immediate 503, protecting system stability.
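The difference between unbounded queuing and admission control can be sketched conceptually as follows. This is not Granian's implementation (which lives in its Rust runtime); `AdmissionGate` and `simulate` are hypothetical names used only to illustrate rejecting work once a fixed in-flight limit is reached.

```python
import asyncio


class AdmissionGate:
    """Reject work beyond a fixed number of in-flight requests instead of queuing it."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.in_flight = 0

    async def handle(self, work) -> int:
        # At capacity: fail fast with 503 rather than waiting in an unbounded queue.
        if self.in_flight >= self.limit:
            return 503
        self.in_flight += 1
        try:
            await work()
            return 200
        finally:
            self.in_flight -= 1


async def simulate(limit: int, requests: int) -> list[int]:
    gate = AdmissionGate(limit)

    async def slow_handler() -> None:
        await asyncio.sleep(0.01)  # stand-in for a database call

    # Launch all requests at once to mimic a load spike.
    return await asyncio.gather(*(gate.handle(slow_handler) for _ in range(requests)))


statuses = asyncio.run(simulate(limit=4, requests=10))
print(statuses.count(200), "served,", statuses.count(503), "rejected")
```

With a limit of 4 and a burst of 10 simultaneous requests, the first 4 are served and the remaining 6 are rejected immediately. A server without such a gate would accept all 10 and let them wait, which is the behavior attributed to Gunicorn above.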
Why Granian Uses More Memory:
- Multi-threaded Rust runtime (Tokio) overhead
- Larger HTTP buffers (512KB per connection default)
- Backpressure queues holding pending requests
- PyO3 Python-Rust bindings overhead
Recommendation by Use Case:
| Scenario | Recommendation |
|---|---|
| Memory-constrained environment | Gunicorn (32% less RAM) |
| Load spike protection needed | Granian (native backpressure) |
| Predictable traffic patterns | Either (similar performance) |
| Unpredictable/bursty traffic | Granian (graceful degradation) |
Consequences¶
Positive¶
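- Higher throughput potential - Rust-based server can handle more RPS when HTTP handling, rather than the database, is the bottleneck
- Native HTTP/2 - No reverse proxy needed for HTTP/2 support
- Lower memory footprint in synthetic benchmarks - Not observed in the database-bound load test above, where Granian used more memory
- Simpler deployment - Workers-only process model, no master/worker complexity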
- Choice - Users can pick the server that best fits their needs
- Future-proof - Rust ecosystem continues to improve
Negative¶
- Newer project - Less battle-tested than Gunicorn (though production-ready)
- Binary dependency - Requires Rust compilation or prebuilt wheels
- Different tuning - Different configuration parameters to learn
- Optional dependency - Adds ~20MB to container if installed
Neutral¶
- Same application code - ASGI interface is identical
- Same application configuration - Application environment variables work the same way; only server tuning differs
- Coexistence - Both servers can be installed simultaneously
Configuration¶
Environment Variables¶
| Variable | Default | Description |
|---|---|---|
| GRANIAN_WORKERS | auto (CPU cores, max 16) | Number of worker processes |
| GRANIAN_RUNTIME_MODE | auto (mt if >8 workers, else st) | Runtime mode: mt (multi-threaded), st (single-threaded) |
| GRANIAN_RUNTIME_THREADS | 1 | Runtime threads per worker |
| GRANIAN_BLOCKING_THREADS | 1 | Blocking threads per worker (must be 1 for ASGI) |
| GRANIAN_HTTP | auto | HTTP version: auto, 1, 2 |
| GRANIAN_LOOP | uvloop | Event loop: uvloop, asyncio, rloop |
| GRANIAN_BACKLOG | 2048 | OS socket backlog for pending connections |
| GRANIAN_BACKPRESSURE | 512 | Max concurrent requests per worker before 503 rejection |
| GRANIAN_HTTP1_BUFFER_SIZE | 524288 | HTTP/1 buffer size in bytes (512KB) |
| GRANIAN_RESPAWN_FAILED | true | Respawn failed workers automatically |
| GRANIAN_DEV_MODE | false | Enable hot reload (requires granian[reload]) |
| GRANIAN_LOG_LEVEL | info | Log level: debug, info, warning, error |
| DISABLE_ACCESS_LOG | true | Disable access logging for performance |
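The "auto" defaults in the table can be restated in code. The sketch below is a Python illustration of those rules only; it is not the project's startup script, and the function name `resolve_granian_settings` and the returned dictionary keys are assumptions made for the example.

```python
import os
from typing import Mapping, Optional


def resolve_granian_settings(
    env: Mapping[str, str], cpu_count: Optional[int] = None
) -> dict:
    """Resolve a few GRANIAN_* variables using the defaults listed in the table."""
    cores = cpu_count or os.cpu_count() or 1

    # GRANIAN_WORKERS: auto means one worker per CPU core, capped at 16.
    raw_workers = env.get("GRANIAN_WORKERS", "auto")
    workers = min(cores, 16) if raw_workers == "auto" else int(raw_workers)

    # GRANIAN_RUNTIME_MODE: auto picks mt above 8 workers, otherwise st.
    mode = env.get("GRANIAN_RUNTIME_MODE", "auto")
    if mode == "auto":
        mode = "mt" if workers > 8 else "st"

    return {
        "workers": workers,
        "runtime_mode": mode,
        "backlog": int(env.get("GRANIAN_BACKLOG", "2048")),
        "backpressure": int(env.get("GRANIAN_BACKPRESSURE", "512")),
        "http": env.get("GRANIAN_HTTP", "auto"),
        "loop": env.get("GRANIAN_LOOP", "uvloop"),
    }


settings = resolve_granian_settings({"GRANIAN_BACKPRESSURE": "64"}, cpu_count=32)
print(settings)
```

On a 32-core host with only GRANIAN_BACKPRESSURE set, this resolves to 16 workers in mt mode; on a 4-core host the same rules give 4 workers in st mode.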
Backpressure capacity calculation:
Total capacity = GRANIAN_WORKERS × GRANIAN_BACKPRESSURE
Example: 16 workers × 64 backpressure = 1024 concurrent requests
- Requests 1-1024: Processed normally
- Request 1025+: Immediate 503 Service Unavailable (no queuing)
For production with high concurrency, use GRANIAN_BACKPRESSURE=64 with GRANIAN_WORKERS=16 for 1024 total capacity.
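The capacity formula above is simple enough to check in a few lines, and inverting it is handy when sizing from a target. A minimal sketch (both helper names are illustrative):

```python
def total_capacity(workers: int, backpressure: int) -> int:
    """Total concurrent requests accepted before 503s: workers x per-worker limit."""
    return workers * backpressure


def backpressure_for(target_capacity: int, workers: int) -> int:
    """Smallest per-worker limit that reaches at least target_capacity overall."""
    return -(-target_capacity // workers)  # ceiling division


recommended = total_capacity(workers=16, backpressure=64)
with_default = total_capacity(workers=16, backpressure=512)
per_worker = backpressure_for(target_capacity=1024, workers=16)

print(recommended, with_default, per_worker)
```

Note that the table default of 512 per worker yields 8192 concurrent requests across 16 workers, eight times the 1024 used in the example, so the default is considerably more permissive than the recommended production setting.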
Docker Compose¶
# docker-compose.yml - set HTTP_SERVER to switch servers
services:
  gateway:
    environment:
      - HTTP_SERVER=granian # Rust-based with native backpressure
      # - HTTP_SERVER=gunicorn # Python-based, stable (default)
      # Granian backpressure configuration (16 × 64 = 1024 concurrent)
      - GRANIAN_WORKERS=16
      - GRANIAN_BACKLOG=4096
      - GRANIAN_BACKPRESSURE=64
      # - GRANIAN_WORKERS=8
      # - GRANIAN_HTTP=2 # Enable HTTP/2
Container Targets¶
# Run container with Gunicorn (default)
make container-run
make container-run-gunicorn
make container-run-gunicorn-ssl
# Run container with Granian (alternative)
make container-run-granian
make container-run-granian-ssl
# Or pass HTTP_SERVER directly
docker run mcpgateway/mcpgateway # Gunicorn (default)
docker run -e HTTP_SERVER=granian mcpgateway/mcpgateway # Granian
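The selection driven by HTTP_SERVER can be pictured as a small dispatch. The actual entrypoint is a shell script (docker-entrypoint.sh); the Python sketch below only restates the selection logic. The function name `server_command` and the application import path `mcpgateway.main:app` are assumptions for illustration, and the real scripts pass additional tuning options.

```python
from typing import List, Mapping

# Assumption: the ASGI app import path; the real project may use a different one.
APP_PATH = "mcpgateway.main:app"


def server_command(env: Mapping[str, str]) -> List[str]:
    """Pick the server command from HTTP_SERVER, defaulting to Gunicorn."""
    server = env.get("HTTP_SERVER", "gunicorn").strip().lower()
    if server == "granian":
        return ["granian", "--interface", "asgi", APP_PATH]
    if server == "gunicorn":
        return ["gunicorn", "--worker-class", "uvicorn.workers.UvicornWorker", APP_PATH]
    # Fail loudly on typos instead of silently starting the wrong server.
    raise ValueError(f"Unsupported HTTP_SERVER value: {server!r}")


print(server_command({}))
print(server_command({"HTTP_SERVER": "granian"}))
```

Rejecting unknown values, rather than falling back to the default, is a design choice in this sketch: a misspelled HTTP_SERVER would otherwise run Gunicorn without the backpressure behavior the operator expected.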
Switching Servers¶
To switch from Gunicorn to Granian:
- Install Granian: pip install "mcpgateway[granian]"
- Test locally: make serve-granian
- Benchmark both servers with your workload
- If Granian performs better, set HTTP_SERVER=granian in your deployment
Files Changed¶
| File | Change |
|---|---|
| pyproject.toml | Added granian optional dependency |
| run-granian.sh | Startup script with performance optimizations |
| docker-entrypoint.sh | Entrypoint that switches between servers |
| Makefile | Added serve, container-run-granian, container-run-gunicorn targets |
| docker-compose.yml | Added HTTP_SERVER environment variable |
| Containerfile* | Include both servers, use docker-entrypoint.sh |
When to Use Each Server¶
Use Gunicorn + Uvicorn when:¶
- You need maximum stability and battle-tested components
- Your team is familiar with Gunicorn configuration
- You're running behind a reverse proxy that handles HTTP/2
- You need gevent/eventlet worker classes
Use Granian when:¶
- You want native HTTP/2 without a reverse proxy
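- You're optimizing for memory efficiency and your own measurements confirm a saving (the database-bound load test above showed Granian using more memory)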
- You want the simplest possible deployment
- You're comfortable with newer technology
- Your benchmarks show better performance
Recommendation¶
For most users: Stick with Gunicorn (default)
Gunicorn + Uvicorn is battle-tested, well-documented, and provides excellent performance for most workloads. The uvicorn[standard] extras (ADR-0024) already provide significant optimizations.
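Consider Granian when:
- You need native HTTP/2 without a reverse proxy
- You need native backpressure to shed load during traffic spikes
- Memory usage is a primary concern and your own measurements show Granian using less (the database-bound load test above showed the opposite)
- Your benchmarks show measurable improvement
- You're comfortable with newer technology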
Note on Python 3.12+: Granian's Rust task implementation is not available on Python 3.12+, which limits some performance benefits. Both servers use uvloop for async I/O, so the main difference is HTTP parsing.
Status¶
This decision has been implemented. Both servers are available:
- Gunicorn + Uvicorn: Default (make serve)
- Granian: Alternative (make serve-granian or HTTP_SERVER=granian)
References¶
- GitHub Issue: #1695
- Related ADR: ADR-0024 (uvicorn[standard])
- Granian GitHub: emmett-framework/granian
- Granian Documentation: https://granian.dev/
- Hyper (Rust HTTP): https://hyper.rs/
- Tokio (Rust async): https://tokio.rs/