
# Rust MCP Runtime

The Rust MCP runtime is an optional sidecar/runtime path for ContextForge's streamable HTTP MCP traffic. It is designed to move the public MCP hot path out of Python incrementally while keeping Python authoritative for authentication, token scoping, and RBAC.

It is also the first concrete precedent for the broader Modular Runtime Architecture: a protocol-specific runtime that can move out of the Python process while the core platform remains the shared policy and control plane. The generalized implementor-facing contract for future modules is documented in the Modular Runtime Specification.

This page describes the current architecture and the supported rollout modes.

## Mode Model

The user-facing control is RUST_MCP_MODE:

| Mode | Public `/mcp` ingress | Rust session/event/resume/live-stream cores | Intended use |
|------|-----------------------|---------------------------------------------|--------------|
| `off` | Python | No | Baseline Python MCP path |
| `shadow` | Python | No public Rust ownership | Safety-first rollback/comparison mode with Rust sidecar present |
| `edge` | Rust | No | Direct public Rust ingress with Python still backing more MCP internals |
| `full` | Rust | Yes | Fastest public Rust path with Rust-owned MCP session/runtime cores |

Use the testing stack wrappers to bring these up locally:

```shell
make testing-rebuild-rust-shadow
make testing-rebuild-rust
make testing-rebuild-rust-full
```

## Runtime Mode Override

The boot env vars RUST_MCP_MODE and RUST_A2A_MODE still pick the initial mode. When the boot mode is edge an authorized admin can flip the public /mcp ingress (and the registered-A2A invocation path) between shadow and edge at runtime, without a restart.

Why only edge? The repo's safety invariant for routing public traffic to Rust requires BOTH `experimental_rust_mcp_runtime_enabled` and `experimental_rust_mcp_session_auth_reuse_enabled` (and the analogous `experimental_rust_a2a_runtime_delegate_enabled` for A2A). Only edge boot sets both flags. A shadow-booted deployment has the Rust sidecar present but session-auth-reuse disabled, so an override to edge cannot safely route public traffic; the API rejects such PATCHes with 409. Operators who want runtime flippability must boot with edge; from there they can flip to shadow (forces the Python path) and back to edge (restores the default) freely.
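As a minimal Python sketch, the 409 safety check above reduces to a conjunction of the two experimental flags (the function name is hypothetical; only the flag names come from this page):

```python
# Illustrative sketch of the safety invariant described above.
# The function name is hypothetical; the flag names are the documented ones.

def can_route_public_mcp_to_rust(flags: dict) -> bool:
    """Public /mcp may target Rust only when BOTH experimental flags are set."""
    return bool(
        flags.get("experimental_rust_mcp_runtime_enabled", False)
        and flags.get("experimental_rust_mcp_session_auth_reuse_enabled", False)
    )

# edge boot sets both flags, so an edge-booted pod may serve mode=edge:
assert can_route_public_mcp_to_rust({
    "experimental_rust_mcp_runtime_enabled": True,
    "experimental_rust_mcp_session_auth_reuse_enabled": True,
})

# shadow boot leaves session-auth-reuse off, so promotion to edge is refused:
assert not can_route_public_mcp_to_rust({
    "experimental_rust_mcp_runtime_enabled": True,
    "experimental_rust_mcp_session_auth_reuse_enabled": False,
})
```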

| Method | Path | Body | Permission |
|--------|------|------|------------|
| GET | `/admin/runtime/mcp-mode` | (none) | `admin.system_config` |
| PATCH | `/admin/runtime/mcp-mode` | `{"mode": "shadow" \| "edge"}` | `admin.system_config` |
| GET | `/admin/runtime/a2a-mode` | (none) | `admin.system_config` |
| PATCH | `/admin/runtime/a2a-mode` | `{"mode": "shadow" \| "edge"}` | `admin.system_config` |

Behavior:

- The override lives in process memory. A restart re-reads `RUST_MCP_MODE` / `RUST_A2A_MODE`; there is no new persistence surface in Postgres.
- Drain semantics are natural: in-flight requests complete on their original transport and only newly accepted requests follow the flip.
- Runtime flip gating is per-target-mode:
    - `mode=shadow` is accepted from any boot that has a dispatcher mounted (shadow or edge). This is the escape hatch: it lets admins clear a stale `override=edge` that landed via a Redis hint inherited from a prior edge-boot deploy, without flushing Redis by hand.
    - `mode=edge` is accepted only from `boot_mode=edge`, the only configuration where the session-auth-reuse / delegate-enabled safety invariant is met.
    - `off` and `full` boot modes return 409 for every PATCH: off has no Rust sidecar, and full mounts a plain Rust proxy with no dispatcher, so an override cannot take effect.
- The coordinator's boot reconciliation (and the live pub/sub listener) discards messages whose mode cannot safely take effect on the current deployment. The reason surfaces via `/health` under `mcp_runtime.boot_reconcile_status` as one of:
    - `incompatible_no_dispatcher`: any hint on a `boot=off` deploy (no Rust sidecar, so no mechanism to honor the override).
    - `incompatible_boot_full`: any hint on a `boot=full` deploy (plain Rust proxy mounted with no dispatcher; the override would strand).
    - `incompatible_safety_flag`: an edge hint on a `boot=shadow` deploy (the session-auth-reuse / delegate-enabled safety invariant is unmet).

The Redis hint key is intentionally NOT deleted on discard: a future compatible-boot pod must still be able to read it, and stale hints expire on their own via the 24h TTL set at publish time. An operator who wants to clear it immediately can `DEL contextforge:runtime:mode_state:{runtime}`.

The same compatibility check runs on every live pub/sub message: a remote pod's flip that the local deployment can't safely honor is discarded with a WARN log. (No `INCOMPATIBLE_*` state is recorded for live discards, because they don't represent boot state; they only produce a log line.)

- Each successful flip writes a `runtime_config` audit trail entry via the existing `SecurityLogger.log_data_access` pathway. Audit-write failures caused by transient DB issues do not roll back the flip; the response body reports `audit_persisted: false` so the caller knows about the audit gap.
- When Redis is attached but the version counter cannot be safely allocated (INCR fails or returns a value not greater than the local floor), the PATCH returns 503 Service Unavailable. Falling back to a local version could collide with a concurrent PATCH on a peer pod and silently lose one of the two flips at peer dedup time.
- The PATCH response includes:
    - `publish_status`: `"propagated" | "local-only" | "failed" | "superseded"`, so the caller knows whether peers received the flip.
    - `audit_persisted`: bool.
- The currently effective mode and propagation status surface on `/health` under `mcp_runtime` and `a2a_runtime` (`boot_mode`, `effective_mode`, `override_active`, `cluster_propagation`, `last_change`).
- `cluster_propagation` is one of:
    - `"redis"`: the coordinator is publishing/subscribing successfully.
    - `"disabled"`: Redis is intentionally not configured for this deployment.
    - `"degraded"`: Redis is configured but the coordinator failed to attach; operators should treat this as alertable (e.g. fail readiness or page).
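Pulling the gating rules together, the accept/reject contract for a runtime flip can be sketched as a small decision function (names and the status-code mapping are illustrative, not the gateway's actual handler):

```python
# Hypothetical sketch of the per-target-mode gating contract for
# PATCH /admin/runtime/{mcp,a2a}-mode, as documented on this page.

def gate_runtime_flip(boot_mode: str, target_mode: str) -> int:
    """Return the HTTP status the PATCH contract implies: 200 accept, 409 reject."""
    if boot_mode in ("off", "full"):
        return 409  # no dispatcher that could honor an override
    if target_mode == "shadow":
        return 200  # escape hatch: any dispatcher-mounted boot may force Python
    if target_mode == "edge":
        # The session-auth-reuse / delegate-enabled invariant is met only on edge boot.
        return 200 if boot_mode == "edge" else 409
    return 409      # unknown target mode

assert gate_runtime_flip("edge", "shadow") == 200    # incident rollback
assert gate_runtime_flip("edge", "edge") == 200      # restore default
assert gate_runtime_flip("shadow", "shadow") == 200  # clear a stale hint
assert gate_runtime_flip("shadow", "edge") == 409    # invariant unmet
assert gate_runtime_flip("full", "shadow") == 409    # plain proxy, no dispatcher
```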

### Reverse-proxy deployments: important caveat

The override updates the Python gateway's in-process state (and propagates across pods via Redis), but the public /mcp ingress is sometimes terminated by an upstream reverse proxy (typically nginx) before reaching the Python gateway. In that topology a runtime flip is observable but not behavior-changing at the public ingress until the proxy is reconfigured.

Two deployment shapes:

1. Single-process / no proxy (FastAPI on :4444 is the only public ingress). Runtime flips are end-to-end functional for `boot_mode=edge`; `boot_mode=shadow` accepts `mode=shadow` PATCHes as a clearance escape hatch but cannot be promoted to `mode=edge` (see the 409 contract above). The `/mcp` mount is the per-request `MCPStreamableHTTPModeDispatcher`, which reads the override on every request. Boot modes `off` and `full` are not flippable at all.

2. Reverse-proxy in front (e.g. nginx routing public GET/POST/DELETE `/mcp` either to gateway:4444 for Python or directly to the Rust public listener at gateway:8787 for edge/full). The nginx config is decided at deploy time from `RUST_MCP_MODE`. The proxy is not aware of the runtime override. Symptoms:
    - edge → shadow flip: the gateway's Python path is now ready to serve `/mcp`, but nginx is still routing to :8787. Public traffic continues to land on Rust until nginx is updated.
    - shadow → edge flip: FastAPI on :4444 is now configured to proxy to Rust, but nginx is still routing all public `/mcp` to :4444. Traffic does not reach the Rust public listener directly. (Note: because the Python proxy still forwards to the Rust sidecar over the internal UDS/loopback URL, requests still execute against Rust, but the latency benefit of bypassing Python is not realized.)

Until the proxy can follow the override, treat the API as a single-pod control surface: useful for CI / compliance harnesses that talk straight to FastAPI, and for incident-rollback scenarios where you also have a mechanism to update the proxy. For production reverse-proxy deployments, plan to either:

- Update the proxy config alongside the API call (e.g. write the new mode to a shared store the proxy consults, or run a configuration-management step that rewrites nginx and reloads it), or
- Use the API only for the single-pod / single-process scenarios above.

Tracking issue for an nginx-side mechanism (e.g. an OpenResty lua module that consults the Redis hint key) is #4278.

### Cluster-wide propagation

When REDIS_URL is configured, every pod runs a RuntimeStateCoordinator that subscribes to the contextforge:runtime:mode pub/sub channel. A successful PATCH on any pod publishes a versioned message; every other pod applies it under monotonic versioning (last writer wins, ties impossible because the version is allocated via INCR on a per-runtime Redis counter).
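The monotonic last-writer-wins rule can be sketched as follows (hypothetical class; the real coordinator is the `RuntimeStateCoordinator`, with versions allocated via Redis INCR rather than supplied by the caller):

```python
# Illustrative last-writer-wins application of versioned mode messages,
# mirroring the monotonic-versioning rule described above.

class ModeState:
    def __init__(self) -> None:
        self.version = 0  # local floor; real versions come from a Redis INCR counter
        self.mode = None

    def apply(self, version: int, mode: str) -> bool:
        """Apply a pub/sub message only if its version exceeds the local floor."""
        if version <= self.version:
            return False  # stale or duplicate message: discard
        self.version = version
        self.mode = mode
        return True

state = ModeState()
assert state.apply(3, "shadow") is True   # first message wins
assert state.apply(2, "edge") is False    # out-of-order stale flip discarded
assert state.mode == "shadow"
assert state.apply(4, "edge") is True     # newer version supersedes
```

Because every version comes from INCR on a per-runtime counter, two concurrent PATCHes on different pods can never carry the same version, so the `version <= floor` discard is unambiguous.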

A short-lived hint key per runtime (contextforge:runtime:mode_state:mcp and contextforge:runtime:mode_state:a2a, TTL 24h) lets a freshly started pod reconcile to the cluster's current desired override on boot.

When Redis is unavailable, the coordinator degrades to per-pod scope; the endpoint still works on the pod that received the PATCH and the response payload reports cluster_propagation: "disabled" so operators know to flip each pod individually.

To revert the cluster to env-var defaults, delete the hint keys and restart the pods:

```shell
redis-cli DEL contextforge:runtime:mode_state:mcp contextforge:runtime:mode_state:a2a
```

### Quick-start examples

```shell
# Inspect current mode (any pod)
curl -H "Authorization: Bearer $JWT" .../admin/runtime/mcp-mode

# Flip the public /mcp ingress back to Python (incident rollback)
curl -X PATCH -H "Authorization: Bearer $JWT" \
     -H "Content-Type: application/json" \
     -d '{"mode": "shadow"}' \
     .../admin/runtime/mcp-mode

# Flip the A2A delegate back to Rust
curl -X PATCH -H "Authorization: Bearer $JWT" \
     -H "Content-Type: application/json" \
     -d '{"mode": "edge"}' \
     .../admin/runtime/a2a-mode
```

## Request Flows

### off and shadow

In off and shadow, the public MCP path remains Python-owned:

```
client
  -> nginx
  -> Python gateway transport/auth/token scoping/RBAC
  -> Python MCP handlers
  -> upstream MCP server
```

shadow differs from off only in that the Rust sidecar is present and can be used for internal validation and comparison; it does not own the public MCP transport.

### edge and full

In edge and full, nginx routes public GET/POST/DELETE /mcp directly to the Rust runtime:

```
client
  -> nginx
  -> Rust public listener
  -> trusted Python auth endpoint (internal)
  -> Rust MCP routing/execution/session logic
  -> upstream MCP server or narrow Python internal endpoint
```

Important details:

- Direct public Rust ingress is enabled by the dedicated public listener set up from `RUST_MCP_MODE=edge|full`.
- Rust authenticates public traffic through the trusted Python internal endpoint `POST /_internal/mcp/authenticate`.
- Rust strips forwarded/proxy-chain headers on the trusted Rust -> Python hop so Python evaluates the request as an internal runtime dispatch rather than as an external client IP.
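The header-stripping step can be illustrated with a small sketch (the header list is illustrative, not an exhaustive copy of what the runtime strips):

```python
# Sketch of sanitizing the trusted Rust -> Python hop: drop forwarded/proxy-chain
# headers so Python sees an internal runtime dispatch, not an external client IP.
# The set below is an assumed example, not the runtime's authoritative list.

UNTRUSTED_PROXY_HEADERS = {
    "forwarded", "x-forwarded-for", "x-forwarded-proto",
    "x-forwarded-host", "x-real-ip",
}

def sanitize_internal_hop(headers: dict) -> dict:
    """Return a copy of the headers with proxy-chain metadata removed."""
    return {k: v for k, v in headers.items()
            if k.lower() not in UNTRUSTED_PROXY_HEADERS}

cleaned = sanitize_internal_hop({
    "X-Forwarded-For": "203.0.113.7",   # external client chain: must not survive
    "mcp-session-id": "abc123",          # session binding: must survive
    "Authorization": "Bearer ...",       # auth context: must survive
})
assert "X-Forwarded-For" not in cleaned
assert cleaned["mcp-session-id"] == "abc123"
```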

## Responsibility Split

The current split is intentionally conservative:

| Concern | Python | Rust |
|---------|--------|------|
| JWT authentication | Yes | Via trusted internal Python auth |
| Token scoping / team visibility | Yes | Consumes authenticated context |
| RBAC | Yes | Enforces Python-authenticated result |
| Public MCP HTTP edge | `off`, `shadow` | `edge`, `full` |
| Session registry | Python in `off`, `shadow` | Rust in `full` |
| Event store / replay / resume | Python in `off`, `shadow`, `edge` | Rust in `full` |
| Live `GET /mcp` SSE edge | Python in `off`, `shadow`, `edge` | Rust in `full` |
| Affinity / owner-worker forwarding | Python in `off`, `shadow`, `edge` | Rust in `full` |
| Direct `tools/call` execution | Python fallback still exists | Rust hot path when eligible |

The important architectural point is that Rust does not currently replace the full security model. Python remains the authority for auth and RBAC while Rust owns progressively more of the public MCP transport and session/runtime work.

## Session/Auth Reuse Model

To reduce repeated auth overhead on session-bound MCP traffic, Rust can reuse authenticated context for an established MCP session. This is not a global per-user cache. It is bound to the MCP session and validated against the original authenticated context.

Key invariants:

- A session belongs to exactly one authenticated caller context.
- A different caller cannot reuse the same `mcp-session-id`.
- A changed auth binding on the same session is denied rather than reused.
- Replay/resume and delete operations preserve the same ownership checks.
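The binding invariant amounts to a first-writer-owns check per session. A minimal sketch, assuming a hypothetical in-memory store (the real implementation lives in the Rust runtime and is validated against the original authenticated context):

```python
# Minimal illustration of the session/auth binding invariant: a session is
# bound to one caller context, and any mismatch is denied rather than reused.
# Class and fingerprint names are hypothetical.

class SessionAuthBindings:
    def __init__(self) -> None:
        self._owner = {}  # mcp-session-id -> auth fingerprint of the first caller

    def check(self, session_id: str, auth_fingerprint: str) -> bool:
        """Bind on first use; afterwards only the original caller may reuse."""
        owner = self._owner.setdefault(session_id, auth_fingerprint)
        return owner == auth_fingerprint

bindings = SessionAuthBindings()
assert bindings.check("sess-1", "alice") is True     # first caller binds the session
assert bindings.check("sess-1", "alice") is True     # same caller may reuse it
assert bindings.check("sess-1", "mallory") is False  # different caller is denied
```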

This model is validated by the dedicated isolation suite:

```shell
make test-mcp-session-isolation
```

See the detailed threat model and test matrix in crates/mcp_runtime/TESTING-DESIGN.md in the repository.

## Verification

After bringing up the stack, verify the active mode through /health:

```shell
curl -sD - http://localhost:8080/health -o /dev/null | rg 'x-contextforge-mcp-'
```

Representative full-Rust headers:

```
x-contextforge-mcp-runtime-mode: rust-managed
x-contextforge-mcp-transport-mounted: rust
x-contextforge-mcp-session-core-mode: rust
x-contextforge-mcp-event-store-mode: rust
x-contextforge-mcp-resume-core-mode: rust
x-contextforge-mcp-live-stream-core-mode: rust
x-contextforge-mcp-affinity-core-mode: rust
x-contextforge-mcp-session-auth-reuse-mode: rust
```

Representative shadow-mode headers:

```
x-contextforge-mcp-runtime-mode: rust-managed
x-contextforge-mcp-transport-mounted: python
x-contextforge-mcp-session-core-mode: python
x-contextforge-mcp-event-store-mode: python
x-contextforge-mcp-resume-core-mode: python
x-contextforge-mcp-live-stream-core-mode: python
x-contextforge-mcp-affinity-core-mode: python
x-contextforge-mcp-session-auth-reuse-mode: python
```

## Plugin Execution and tools/call Flow

The Rust runtime does not execute plugin code directly. All plugin execution happens in Python, with results communicated to Rust over internal HTTP RPC endpoints.

### Internal RPC Endpoints

Rust derives internal endpoint URLs from its --backend-rpc-url configuration. The following endpoints exist on the Python side:

| Endpoint | Purpose |
|----------|---------|
| `POST /_internal/mcp/authenticate` | JWT validation, token scoping, RBAC context |
| `POST /_internal/mcp/tools/call/resolve` | Build execution plan; runs pre-invoke plugin hooks |
| `POST /_internal/mcp/tools/call` | Full Python fallback execution with all plugins |
| `POST /_internal/mcp/tools/call/metric` | Record tool execution timing and success/failure |

These are trusted internal endpoints, not exposed to external clients.

### tools/call Request Flow (edge and full modes)

When a tools/call request arrives at the Rust runtime in edge or full mode, it follows a two-phase resolve-then-execute model:

```
client
  -> nginx
  -> Rust public listener
  -> Rust: POST /_internal/mcp/tools/call/resolve (Python)
     -> Python: auth + RBAC + tool lookup
     -> Python: pre-invoke plugin hooks (if registered)
     -> Python: returns execution plan to Rust
  -> Rust: eligible?
     YES -> Rust applies modified args + headers from plan
            -> Rust calls upstream MCP server directly
            -> Rust: POST /_internal/mcp/tools/call/metric (Python)
     NO  -> Rust: POST /_internal/mcp/tools/call (Python)
            -> Python: full invoke_tool() with pre + post-invoke plugins
            -> Python calls upstream MCP server
```
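The resolve-then-execute split above can be reduced to a schematic Python sketch; the callables here are stand-ins for the internal RPC endpoints, not the runtime's actual API:

```python
# Schematic two-phase dispatch: resolve in Python, then execute either on the
# Rust hot path or via the full Python fallback. All names are illustrative.

def handle_tools_call(request, resolve, execute_direct, execute_python_fallback):
    """resolve() models POST /_internal/mcp/tools/call/resolve; the returned
    plan says whether Rust may call the upstream MCP server directly."""
    plan = resolve(request)  # Python: auth + RBAC + tool lookup + pre-invoke hooks
    if plan["eligible"]:
        # Apply modified arguments from the plan, then take the Rust hot path
        # (followed by a /metric RPC in the real flow).
        args = {**request["args"], **plan.get("modified_args", {})}
        return execute_direct({**request, "args": args})
    # Ineligible: forward to the full Python path with pre + post-invoke plugins.
    return execute_python_fallback(request)

result = handle_tools_call(
    {"tool": "echo", "args": {"x": 1}},
    resolve=lambda req: {"eligible": True, "modified_args": {}},
    execute_direct=lambda req: "rust-direct",
    execute_python_fallback=lambda req: "python-fallback",
)
assert result == "rust-direct"
```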

### Plugin Handling by Mode

| Mode | Pre-invoke hooks | Post-invoke hooks | Tool execution |
|------|------------------|-------------------|----------------|
| `off` | Python (normal path) | Python (normal path) | Python |
| `shadow` | Python (normal path) | Python (normal path) | Python |
| `edge` | Python (via `/resolve` RPC) | Python (fallback only) | Rust direct when eligible, Python fallback otherwise |
| `full` | Python (via `/resolve` RPC) | Python (fallback only) | Rust direct when eligible, Python fallback otherwise |

Key behaviors:

- Pre-invoke hooks always run in Python. In edge/full they execute during the `/resolve` call, and their output (modified arguments and injected headers) is returned in the execution plan for Rust to apply.
- Post-invoke hooks cannot run after Rust direct execution, so their presence forces an immediate fallback to the full Python path (`eligible: false`, `fallbackReason: post-invoke-hooks-configured`).
- Plan caching is disabled when pre-invoke hooks executed, because hook results may depend on per-call context (e.g. connection IDs, rotated credentials).

### Direct Execution Eligibility

A tool is eligible for Rust direct execution only when all of the following are true:

- No post-invoke plugin hooks are registered
- No active observability trace
- Tool integration type is MCP
- Transport is `streamablehttp`
- No JSONPath filter configured on the tool
- No custom CA certificate on the gateway
- Gateway URL is present
- Gateway is not in `direct_proxy` mode
- OAuth grant type is not `authorization_code` (or token retrieval succeeds)
- Tool resolves unambiguously to a single enabled, reachable tool

When any condition fails, prepare_rust_mcp_tool_execution() returns eligible: false with a fallbackReason string, and Rust forwards the full request to the Python /_internal/mcp/tools/call endpoint.
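A subset of the checklist can be sketched as a predicate returning `(eligible, fallback_reason)`. Field names and all reason strings except `post-invoke-hooks-configured` are hypothetical; the real logic lives in Python's `prepare_rust_mcp_tool_execution()`:

```python
# Hedged sketch of the eligibility checklist above. Only the
# "post-invoke-hooks-configured" reason string is documented; the rest of the
# field/reason names are illustrative assumptions.

def direct_execution_eligibility(tool: dict) -> tuple:
    """Return (eligible, fallback_reason); the first failed check wins."""
    checks = [
        (tool.get("post_invoke_hooks"), "post-invoke-hooks-configured"),
        (tool.get("active_trace"), "observability-trace-active"),
        (tool.get("integration_type") != "MCP", "non-mcp-integration"),
        (tool.get("transport") != "streamablehttp", "unsupported-transport"),
        (tool.get("jsonpath_filter"), "jsonpath-filter-configured"),
        (tool.get("custom_ca"), "custom-ca-configured"),
        (not tool.get("gateway_url"), "gateway-url-missing"),
        (tool.get("direct_proxy"), "gateway-direct-proxy"),
    ]
    for failed, reason in checks:
        if failed:
            return False, reason
    return True, None

ok, reason = direct_execution_eligibility({
    "integration_type": "MCP",
    "transport": "streamablehttp",
    "gateway_url": "http://upstream:9000/mcp",
})
assert ok and reason is None
```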

## Validation and Benchmark Workflow

Recommended stack-backed validation:

```shell
make testing-rebuild-rust-full
make test-mcp-protocol-e2e
make test-mcp-rbac
make test-mcp-session-isolation
cargo test --release --manifest-path crates/mcp_runtime/Cargo.toml
```

Recommended benchmark wrappers:

```shell
make benchmark-mcp-mixed
make benchmark-mcp-tools
make benchmark-mcp-mixed-300
make benchmark-mcp-tools-300
```

For Rust-local profiling and crate-level lint/test helpers, see crates/mcp_runtime/README.md in the repository.