# Rust MCP Runtime
The Rust MCP runtime is an optional sidecar/runtime path for ContextForge's streamable HTTP MCP traffic. It is designed to move the public MCP hot path out of Python incrementally while keeping Python authoritative for authentication, token scoping, and RBAC.
It is also the first concrete precedent for the broader Modular Runtime Architecture: a protocol-specific runtime that can move out of the Python process while the core platform remains the shared policy and control plane. The generalized implementor-facing contract for future modules is documented in the Modular Runtime Specification.
This page describes the current architecture and the supported rollout modes.
## Mode Model
The user-facing control is `RUST_MCP_MODE`:
| Mode | Public /mcp ingress | Rust session/event/resume/live-stream cores | Intended use |
|---|---|---|---|
| `off` | Python | No | Baseline Python MCP path |
| `shadow` | Python | No public Rust ownership | Safety-first rollback/comparison mode with Rust sidecar present |
| `edge` | Rust | No | Direct public Rust ingress with Python still backing more MCP internals |
| `full` | Rust | Yes | Fastest public Rust path with Rust-owned MCP session/runtime cores |
Use the testing stack wrappers to bring these up locally.
## Runtime Mode Override
The boot env vars `RUST_MCP_MODE` and `RUST_A2A_MODE` still select the initial mode. When the boot mode is `edge`, an authorized admin can flip the public `/mcp` ingress (and the registered-A2A invocation path) between `shadow` and `edge` at runtime, without a restart.
Why only `edge`? The repo's safety invariant for routing public traffic to Rust requires BOTH `experimental_rust_mcp_runtime_enabled` and `experimental_rust_mcp_session_auth_reuse_enabled` (and the analogous `experimental_rust_a2a_runtime_delegate_enabled` for A2A). Only an `edge` boot sets both flags. A `shadow`-booted deployment has the Rust sidecar present but session-auth-reuse disabled, so an override to `edge` cannot safely route public traffic; the API rejects such PATCHes with 409. Operators who want runtime flippability must boot with `edge`; from there they can flip to `shadow` (forces the Python path) and back to `edge` (restores the default) freely.
| Method | Path | Body | Permission |
|---|---|---|---|
| GET | `/admin/runtime/mcp-mode` | (none) | `admin.system_config` |
| PATCH | `/admin/runtime/mcp-mode` | `{"mode": "shadow" \| "edge"}` | `admin.system_config` |
| GET | `/admin/runtime/a2a-mode` | (none) | `admin.system_config` |
| PATCH | `/admin/runtime/a2a-mode` | `{"mode": "shadow" \| "edge"}` | `admin.system_config` |
Behavior:
- The override lives in process memory. A restart re-reads `RUST_MCP_MODE`/`RUST_A2A_MODE`; there is no new persistence surface in Postgres.
- Drain semantics are natural: in-flight requests complete on their original transport, and only newly-accepted requests follow the flip.
- Runtime flip gating is per-target-mode:
    - `mode=shadow` is accepted from any boot that has a dispatcher mounted (`shadow` or `edge`). This is the escape hatch: it lets admins clear a stale `override=edge` that landed via a Redis hint inherited from a prior edge-boot deploy, without needing to flush Redis by hand.
    - `mode=edge` is accepted only from `boot_mode=edge`, which is the only configuration where the session-auth-reuse / delegate-enabled safety invariant is met.
    - `off` and `full` boot modes return `409` for every PATCH: `off` has no Rust sidecar, and `full` mounts a plain Rust proxy with no dispatcher, so an override can't take effect.
- The coordinator's boot reconciliation (and the live pub/sub listener) discards messages whose mode cannot safely take effect on the current deployment. The reason surfaces via `/health` under `mcp_runtime.boot_reconcile_status` as one of:
    - `incompatible_no_dispatcher`: any hint on a `boot=off` deploy (no Rust sidecar, no mechanism to honor the override).
    - `incompatible_boot_full`: any hint on a `boot=full` deploy (plain Rust proxy mounted with no dispatcher; the override would strand).
    - `incompatible_safety_flag`: an `edge` hint on a `boot=shadow` deploy (the session-auth-reuse / delegate-enabled safety invariant is unmet).

  The Redis hint key is intentionally NOT deleted on discard: a future compatible-boot pod must still be able to read it, and stale hints expire on their own via the 24h TTL set at publish time. An operator who wants to clear immediately can `DEL contextforge:runtime:mode_state:{runtime}`.
- The same compatibility check runs on every live pub/sub message: a remote pod's flip that the local deployment can't safely honor is discarded with a WARN log (no `INCOMPATIBLE_*` state is recorded for live discards because they don't represent boot state; only a log line).
- Each successful flip writes a `runtime_config` audit-trail entry via the existing `SecurityLogger.log_data_access` pathway. Audit-write failures caused by transient DB issues do not roll back the flip; the response body reports `audit_persisted: false` so the caller knows about the audit gap.
- When Redis is attached but the version counter cannot be safely allocated (`INCR` fails or returns a value not greater than the local floor), the PATCH returns `503 Service Unavailable`. Falling back to a local version could collide with a concurrent PATCH on a peer pod and silently lose one of the two flips at peer dedup time.
- The PATCH response includes:
    - `publish_status`: `"propagated" | "local-only" | "failed" | "superseded"`, so the caller knows whether peers received the flip.
    - `audit_persisted`: bool.
- The currently effective mode and propagation status surface on `/health` under `mcp_runtime` and `a2a_runtime` (`boot_mode`, `effective_mode`, `override_active`, `cluster_propagation`, `last_change`).
- `cluster_propagation` is one of:
    - `"redis"`: the coordinator is publishing/subscribing successfully.
    - `"disabled"`: Redis is intentionally not configured for this deployment.
    - `"degraded"`: Redis is configured but the coordinator failed to attach; operators should treat this as alertable (e.g. fail readiness or page).
### Reverse-proxy deployments: important caveat
The override updates the Python gateway's in-process state (and propagates across pods via Redis), but the public /mcp ingress is sometimes terminated by an upstream reverse proxy (typically nginx) before reaching the Python gateway. In that topology a runtime flip is observable but not behavior-changing at the public ingress until the proxy is reconfigured.
Two deployment shapes:

1. Single-process / no proxy (FastAPI on `:4444` is the only public ingress). Runtime flips are end-to-end functional for `boot_mode=edge`; `boot_mode=shadow` accepts `mode=shadow` PATCHes as a clearance escape hatch but cannot be promoted to `mode=edge` (see the 409 contract above). The `/mcp` mount is the per-request `MCPStreamableHTTPModeDispatcher`, which reads the override on every request. Boot modes `off` and `full` are not flippable at all.

2. Reverse proxy in front (e.g. nginx routing public `GET/POST/DELETE /mcp` either to `gateway:4444` for Python or directly to the Rust public listener at `gateway:8787` for edge/full). The nginx config is decided at deploy time from `RUST_MCP_MODE`; the proxy is not aware of the runtime override. Symptoms:
    - `edge` -> `shadow` flip: the gateway's Python path is now ready to serve `/mcp`, but nginx is still routing to `:8787`. Public traffic continues to land on Rust until nginx is updated.
    - `shadow` -> `edge` flip: FastAPI on `:4444` is now configured to proxy to Rust, but nginx is still routing all public `/mcp` to `:4444`. Traffic does not reach the Rust public listener directly. (Note: because the Python proxy still forwards to the Rust sidecar over the internal UDS/loopback URL, requests still execute against Rust, but the latency benefit of bypassing Python is not realized.)
Until the proxy can follow the override, treat the API as a single-pod control surface: useful for CI / compliance harnesses that talk straight to FastAPI, and for incident-rollback scenarios where you also have a mechanism to update the proxy. For production reverse-proxy deployments, plan to either:
- Update the proxy config alongside the API call (e.g. write the new mode to a shared store the proxy consults, or run a configuration-management step that rewrites nginx and reloads it), or
- Use the API only for the single-pod / single-process scenarios above.
Tracking issue for an nginx-side mechanism (e.g. an OpenResty Lua module that consults the Redis hint key) is #4278.
### Cluster-wide propagation
When `REDIS_URL` is configured, every pod runs a `RuntimeStateCoordinator` that subscribes to the `contextforge:runtime:mode` pub/sub channel. A successful PATCH on any pod publishes a versioned message; every other pod applies it under monotonic versioning (last writer wins; ties are impossible because the version is allocated via `INCR` on a per-runtime Redis counter).
A short-lived hint key per runtime (`contextforge:runtime:mode_state:mcp` and `contextforge:runtime:mode_state:a2a`, TTL 24h) lets a freshly started pod reconcile to the cluster's current desired override on boot.
When Redis is unavailable, the coordinator degrades to per-pod scope; the endpoint still works on the pod that received the PATCH, and the response payload reports `cluster_propagation: "disabled"` so operators know to flip each pod individually.
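The apply-side of the monotonic-versioning rule can be sketched as follows. This is a simplified Python model: the real coordinator allocates versions via the Redis `INCR` counter described above, while this sketch only models how a pod deduplicates and orders incoming flips.

```python
# Simplified model of the coordinator's monotonic-version dedup:
# a pod applies a published flip only if its version exceeds the
# highest version the pod has already applied for that runtime.

class ModeState:
    def __init__(self) -> None:
        self.version = 0            # highest version applied so far
        self.effective_mode = None  # None until a flip has been applied

    def apply(self, version: int, mode: str) -> bool:
        """Apply a published flip; return True if it took effect."""
        if version <= self.version:
            return False            # stale or duplicate message: discard
        self.version = version
        self.effective_mode = mode
        return True

pod = ModeState()
assert pod.apply(1, "shadow") is True
assert pod.apply(1, "shadow") is False   # duplicate redelivery is ignored
assert pod.apply(3, "edge") is True      # later writer wins
assert pod.apply(2, "shadow") is False   # out-of-order stale flip discarded
assert pod.effective_mode == "edge"
```

Because the version is allocated centrally, two concurrent PATCHes on different pods can never carry the same version, which is why the gateway prefers a 503 over falling back to a locally invented version.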
To revert the cluster to env-var defaults, delete the hint keys and restart the pods:
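For example (assuming `redis-cli` access to the deployment's Redis; the key names come from the hint-key description above, and the restart step is deployment-specific):

```shell
# Remove the per-runtime hint keys so freshly restarted pods fall back
# to the RUST_MCP_MODE / RUST_A2A_MODE env-var defaults.
redis-cli DEL contextforge:runtime:mode_state:mcp \
              contextforge:runtime:mode_state:a2a

# Then restart the pods (e.g. a rolling restart of the gateway deployment).
```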
### Quick-start examples
```shell
# Inspect current mode (any pod)
curl -H "Authorization: Bearer $JWT" .../admin/runtime/mcp-mode

# Flip the public /mcp ingress back to Python (incident rollback)
curl -X PATCH -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{"mode": "shadow"}' \
  .../admin/runtime/mcp-mode

# Flip A2A delegate back to Rust
curl -X PATCH -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{"mode": "edge"}' \
  .../admin/runtime/a2a-mode
```
## Request Flows
### `off` and `shadow`
In off and shadow, the public MCP path remains Python-owned:
```
client
  -> nginx
  -> Python gateway transport / auth / token scoping / RBAC
  -> Python MCP handlers
  -> upstream MCP server
```
shadow differs from off only in that the Rust sidecar is present and can be used for internal validation and comparison; it does not own the public MCP transport.
### `edge` and `full`
In edge and full, nginx routes public GET/POST/DELETE /mcp directly to the Rust runtime:
```
client
  -> nginx
  -> Rust public listener
  -> trusted Python auth endpoint (internal)
  -> Rust MCP routing / execution / session logic
  -> upstream MCP server or narrow Python internal endpoint
```
Important details:
- Direct public Rust ingress is enabled by the dedicated public listener set up from `RUST_MCP_MODE=edge|full`.
- Rust authenticates public traffic through the trusted Python internal endpoint `POST /_internal/mcp/authenticate`.
- Rust strips forwarded/proxy-chain headers on the trusted Rust -> Python hop, so Python evaluates the request as an internal runtime dispatch rather than as an external client IP.
## Responsibility Split
The current split is intentionally conservative:
| Concern | Python | Rust |
|---|---|---|
| JWT authentication | Yes | Via trusted internal Python auth |
| Token scoping / team visibility | Yes | Consumes authenticated context |
| RBAC | Yes | Enforces Python-authenticated result |
| Public MCP HTTP edge | `off`, `shadow` | `edge`, `full` |
| Session registry | Python in `off`, `shadow` | Rust in `full` |
| Event store / replay / resume | Python in `off`, `shadow`, `edge` | Rust in `full` |
| Live `GET /mcp` SSE edge | Python in `off`, `shadow`, `edge` | Rust in `full` |
| Affinity / owner-worker forwarding | Python in `off`, `shadow`, `edge` | Rust in `full` |
| Direct `tools/call` execution | Python fallback still exists | Rust hot path when eligible |
The important architectural point is that Rust does not currently replace the full security model. Python remains the authority for auth and RBAC while Rust owns progressively more of the public MCP transport and session/runtime work.
## Session/Auth Reuse Model
To reduce repeated auth overhead on session-bound MCP traffic, Rust can reuse authenticated context for an established MCP session. This is not a global per-user cache. It is bound to the MCP session and validated against the original authenticated context.
Key invariants:
- A session belongs to exactly one authenticated caller context.
- A different caller cannot reuse the same `mcp-session-id`.
- A changed auth binding on the same session is denied rather than reused.
- Replay/resume and delete operations preserve the same ownership checks.
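A minimal sketch of the binding check behind these invariants. This is illustrative only: the real implementation lives in the Rust runtime and binds a richer authenticated context than the bare caller string used here.

```python
# Illustrative session -> auth-binding registry enforcing the invariants above.
# Class and method names are assumptions for illustration.

class SessionAuthReuse:
    def __init__(self) -> None:
        self._bindings: dict[str, str] = {}  # mcp-session-id -> auth binding

    def register(self, session_id: str, auth_binding: str) -> None:
        """Bind a newly established session to its authenticated caller."""
        self._bindings[session_id] = auth_binding

    def reuse(self, session_id: str, auth_binding: str) -> bool:
        """Allow reuse only for the exact caller context that owns the session."""
        owner = self._bindings.get(session_id)
        if owner is None:
            return False                     # unknown session: no reuse
        return owner == auth_binding         # changed binding => deny, not reuse

reg = SessionAuthReuse()
reg.register("sess-1", "user-a")
assert reg.reuse("sess-1", "user-a") is True    # same caller may reuse
assert reg.reuse("sess-1", "user-b") is False   # different caller is denied
assert reg.reuse("sess-2", "user-a") is False   # unknown session is denied
```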
This model is validated by the dedicated isolation suite (`make test-mcp-session-isolation`).
See the detailed threat model and test matrix in `crates/mcp_runtime/TESTING-DESIGN.md` in the repository.
## Verification
After bringing up the stack, verify the active mode through /health:
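For example (the host placeholder and the `jq` filter are illustrative; the field names are the ones the runtime-override section reports under `mcp_runtime`):

```shell
# Inspect the effective runtime state reported by the gateway.
curl -s .../health \
  | jq '.mcp_runtime | {boot_mode, effective_mode, override_active, cluster_propagation}'
```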
Representative full-Rust headers:
```
x-contextforge-mcp-runtime-mode: rust-managed
x-contextforge-mcp-transport-mounted: rust
x-contextforge-mcp-session-core-mode: rust
x-contextforge-mcp-event-store-mode: rust
x-contextforge-mcp-resume-core-mode: rust
x-contextforge-mcp-live-stream-core-mode: rust
x-contextforge-mcp-affinity-core-mode: rust
x-contextforge-mcp-session-auth-reuse-mode: rust
```
Representative shadow-mode headers:
```
x-contextforge-mcp-runtime-mode: rust-managed
x-contextforge-mcp-transport-mounted: python
x-contextforge-mcp-session-core-mode: python
x-contextforge-mcp-event-store-mode: python
x-contextforge-mcp-resume-core-mode: python
x-contextforge-mcp-live-stream-core-mode: python
x-contextforge-mcp-affinity-core-mode: python
x-contextforge-mcp-session-auth-reuse-mode: python
```
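For automated verification, a small sketch that checks the per-core headers agree with each other can distinguish a clean `full` or `shadow` rollout from a partial one. Illustrative only; it inspects a captured header map rather than making a request.

```python
# Check that all per-core mode headers report the same owner.
CORE_HEADERS = [
    "x-contextforge-mcp-session-core-mode",
    "x-contextforge-mcp-event-store-mode",
    "x-contextforge-mcp-resume-core-mode",
    "x-contextforge-mcp-live-stream-core-mode",
    "x-contextforge-mcp-affinity-core-mode",
    "x-contextforge-mcp-session-auth-reuse-mode",
]

def core_owner(headers: dict[str, str]) -> str:
    """Return 'rust' or 'python' if all core headers agree, else 'mixed'."""
    owners = {headers.get(h, "missing") for h in CORE_HEADERS}
    return owners.pop() if len(owners) == 1 else "mixed"

full_mode = {h: "rust" for h in CORE_HEADERS}
shadow_mode = {h: "python" for h in CORE_HEADERS}
assert core_owner(full_mode) == "rust"
assert core_owner(shadow_mode) == "python"
assert core_owner({**full_mode, CORE_HEADERS[0]: "python"}) == "mixed"
```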
## Plugin Execution and `tools/call` Flow
The Rust runtime does not execute plugin code directly. All plugin execution happens in Python, with results communicated to Rust over internal HTTP RPC endpoints.
### Internal RPC Endpoints
Rust derives internal endpoint URLs from its `--backend-rpc-url` configuration. The following endpoints exist on the Python side:
| Endpoint | Purpose |
|---|---|
| `POST /_internal/mcp/authenticate` | JWT validation, token scoping, RBAC context |
| `POST /_internal/mcp/tools/call/resolve` | Build execution plan; runs pre-invoke plugin hooks |
| `POST /_internal/mcp/tools/call` | Full Python fallback execution with all plugins |
| `POST /_internal/mcp/tools/call/metric` | Record tool execution timing and success/failure |
These are trusted internal endpoints, not exposed to external clients.
### `tools/call` Request Flow (`edge` and `full` modes)
When a tools/call request arrives at the Rust runtime in edge or full mode, it follows a two-phase resolve-then-execute model:
```
client
  -> nginx
  -> Rust public listener
  -> Rust: POST /_internal/mcp/tools/call/resolve (Python)
       -> Python: auth + RBAC + tool lookup
       -> Python: pre-invoke plugin hooks (if registered)
       -> Python: returns execution plan to Rust
  -> Rust: eligible?
       YES -> Rust applies modified args + headers from plan
           -> Rust calls upstream MCP server directly
           -> Rust: POST /_internal/mcp/tools/call/metric (Python)
       NO  -> Rust: POST /_internal/mcp/tools/call (Python)
           -> Python: full invoke_tool() with pre + post-invoke plugins
           -> Python calls upstream MCP server
```
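The two-phase model above can be sketched in a few lines. Illustrative Python only: the real hot path is Rust, and the `rpc_*` parameters stand in for the internal HTTP RPCs; their shapes are assumptions for illustration.

```python
# Sketch of the resolve-then-execute dispatch. The rpc_* callables stand in
# for the internal RPCs (/resolve, /tools/call, /metric); call_upstream stands
# in for Rust's direct upstream MCP call.

def handle_tools_call(request, rpc_resolve, rpc_fallback, call_upstream, rpc_metric):
    plan = rpc_resolve(request)              # Python: auth + RBAC + pre-invoke hooks
    if not plan["eligible"]:
        return rpc_fallback(request)         # full Python path with all plugins
    merged = {**request["args"], **plan.get("modifiedArgs", {})}
    result = call_upstream(merged, plan.get("headers", {}))  # Rust direct execution
    rpc_metric(result)                       # record timing/success on the Python side
    return result

# Tiny demonstration with stubbed RPCs:
calls = []
result = handle_tools_call(
    {"args": {"x": 1}},
    rpc_resolve=lambda r: {"eligible": True, "modifiedArgs": {"y": 2}},
    rpc_fallback=lambda r: "python-fallback",
    call_upstream=lambda args, headers: {"args": args},
    rpc_metric=lambda res: calls.append("metric"),
)
assert result == {"args": {"x": 1, "y": 2}}   # hook-modified args applied by "Rust"
assert calls == ["metric"]                    # metric RPC recorded after direct call

fallback = handle_tools_call(
    {"args": {}},
    rpc_resolve=lambda r: {"eligible": False, "fallbackReason": "post-invoke-hooks-configured"},
    rpc_fallback=lambda r: "python-fallback",
    call_upstream=None,
    rpc_metric=None,
)
assert fallback == "python-fallback"          # ineligible plans never touch upstream
```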
### Plugin Handling by Mode
| Mode | Pre-invoke hooks | Post-invoke hooks | Tool execution |
|---|---|---|---|
| `off` | Python (normal path) | Python (normal path) | Python |
| `shadow` | Python (normal path) | Python (normal path) | Python |
| `edge` | Python (via `/resolve` RPC) | Python (fallback only) | Rust direct when eligible, Python fallback otherwise |
| `full` | Python (via `/resolve` RPC) | Python (fallback only) | Rust direct when eligible, Python fallback otherwise |
Key behaviors:
- Pre-invoke hooks always run in Python. In `edge`/`full`, they execute during the `/resolve` call. Their output (modified arguments and injected headers) is returned in the execution plan for Rust to apply.
- Post-invoke hooks cannot run after Rust direct execution, so their presence forces an immediate fallback to the full Python path (`eligible: false`, `fallbackReason: post-invoke-hooks-configured`).
- Plan caching is disabled when pre-invoke hooks executed, because hook results may depend on per-call context (e.g. connection IDs, rotated credentials).
### Direct Execution Eligibility
A tool is eligible for Rust direct execution only when all of the following are true:
- No post-invoke plugin hooks are registered
- No active observability trace
- Tool integration type is `MCP`
- Transport is `streamablehttp`
- No JSONPath filter configured on the tool
- No custom CA certificate on the gateway
- Gateway URL is present
- Gateway is not in `direct_proxy` mode
- OAuth grant type is not `authorization_code` (or token retrieval succeeds)
- Tool resolves unambiguously to a single enabled, reachable tool
When any condition fails, `prepare_rust_mcp_tool_execution()` returns `eligible: false` with a `fallbackReason` string, and Rust forwards the full request to the Python `/_internal/mcp/tools/call` endpoint.
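The checklist can be sketched as a predicate. Illustrative only: the field names on `tool`/`gateway`/`ctx` are assumptions, every fallback-reason string except `post-invoke-hooks-configured` is a placeholder, and the real `prepare_rust_mcp_tool_execution()` returns a richer plan. The final checklist item (unambiguous tool resolution) happens during lookup, before a predicate like this would run, so it is omitted here.

```python
# Illustrative eligibility predicate mirroring the checklist above.
# Returns (eligible, fallback_reason); field and reason names are assumptions.

def direct_execution_eligible(tool: dict, gateway: dict, ctx: dict):
    checks = [
        (not tool["post_invoke_hooks"], "post-invoke-hooks-configured"),
        (not ctx["active_trace"], "observability-trace-active"),
        (tool["integration_type"] == "MCP", "non-mcp-integration"),
        (tool["transport"] == "streamablehttp", "non-streamablehttp-transport"),
        (tool["jsonpath_filter"] is None, "jsonpath-filter-configured"),
        (gateway["custom_ca_cert"] is None, "custom-ca-cert"),
        (gateway["url"] is not None, "missing-gateway-url"),
        (not gateway["direct_proxy"], "direct-proxy-mode"),
        (tool["oauth_grant_type"] != "authorization_code"
         or tool.get("token_ok", False), "authorization-code-grant"),
    ]
    for ok, reason in checks:
        if not ok:
            return False, reason   # first failing condition forces Python fallback
    return True, None

tool_ok = {"post_invoke_hooks": [], "integration_type": "MCP",
           "transport": "streamablehttp", "jsonpath_filter": None,
           "oauth_grant_type": "client_credentials"}
gw_ok = {"custom_ca_cert": None, "url": "http://upstream", "direct_proxy": False}
ctx_ok = {"active_trace": False}
assert direct_execution_eligible(tool_ok, gw_ok, ctx_ok) == (True, None)
assert direct_execution_eligible({**tool_ok, "post_invoke_hooks": ["audit"]},
                                 gw_ok, ctx_ok) == (False, "post-invoke-hooks-configured")
```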
## Validation and Benchmark Workflow
Recommended stack-backed validation:
```shell
make testing-rebuild-rust-full
make test-mcp-protocol-e2e
make test-mcp-rbac
make test-mcp-session-isolation
cargo test --release --manifest-path crates/mcp_runtime/Cargo.toml
```
Recommended benchmark wrappers:
```shell
make benchmark-mcp-mixed
make benchmark-mcp-tools
make benchmark-mcp-mixed-300
make benchmark-mcp-tools-300
```
For Rust-local profiling and crate-level lint/test helpers, see `crates/mcp_runtime/README.md` in the repository.