ADR-033: Tool Lookup Cache for invoke_tool

  • Status: Accepted
  • Date: 2025-01-20
  • Deciders: Platform Team

Context

Load testing exposed a hot-path bottleneck in ToolService.invoke_tool: every tool invocation performs a DB lookup for the tool (and its gateway), even when the same tool is invoked repeatedly. This created:

  • High database QPS proportional to tool invocations
  • Connection pool saturation during slow upstream calls
  • Elevated p95/p99 latency in high-concurrency tests

Existing registry/admin caches do not cover single-tool lookups, and invoke_tool cannot reuse registry list caches.

Related issues: #1940 (Tool lookup caching)

Decision

Introduce a two-tier tool lookup cache keyed by tool name, with:

  • L1 in-memory LRU + TTL per worker
  • Optional Redis L2 for multi-worker deployments
  • Negative caching for missing/inactive/offline tools
  • Explicit invalidation on tool and gateway mutations
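
The L1 tier described above can be illustrated with a minimal sketch. It is not the code in mcpgateway/cache/tool_lookup_cache.py: the class name LruTtlCache, its methods, and the payload shape with a "status" field are assumptions made for this example. It shows LRU eviction combined with per-entry expiry, and how a negative entry is just a payload stored with a shorter TTL.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional, Tuple


class LruTtlCache:
    """Hypothetical per-worker L1 cache: LRU eviction plus per-entry expiry."""

    def __init__(self, maxsize: int, clock: Callable[[], float] = time.monotonic) -> None:
        self._maxsize = maxsize
        self._clock = clock
        # key -> (expires_at, payload); dict order doubles as recency order
        self._entries: "OrderedDict[str, Tuple[float, Any]]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        item = self._entries.get(key)
        if item is None:
            return None
        expires_at, payload = item
        if self._clock() >= expires_at:
            # Expired entries are dropped lazily on read
            del self._entries[key]
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return payload

    def set(self, key: str, payload: Any, ttl: float) -> None:
        self._entries[key] = (self._clock() + ttl, payload)
        self._entries.move_to_end(key)
        while len(self._entries) > self._maxsize:
            self._entries.popitem(last=False)  # evict least recently used

    def invalidate(self, key: str) -> None:
        self._entries.pop(key, None)


cache = LruTtlCache(maxsize=10000)
# Positive entry with the regular TTL; negative entry with the short TTL
cache.set("my_tool", {"status": "active", "tool": {"name": "my_tool"}}, ttl=60)
cache.set("ghost_tool", {"status": "missing"}, ttl=10)
```

Storing negative results as ordinary entries keeps the read path uniform: the caller inspects the status field instead of distinguishing "not cached" from "known to be missing".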

Changes Made

  1. New module: mcpgateway/cache/tool_lookup_cache.py
       • L1 LRU + TTL cache with size limit
       • Redis L2 cache with shared keyspace when CACHE_TYPE=redis
       • Negative cache entries (missing, inactive, offline)
       • Gateway-scoped invalidation using a Redis set of tool names

  2. Invoke path integration
       • ToolService.invoke_tool() now checks cache before querying the DB
       • Cache payload includes tool + gateway fields needed for invocation
       • Negative cache entries short-circuit missing/inactive/offline tool calls

  3. Cache invalidation
       • Tool create/update/delete/toggle invalidates tool lookup cache
       • Gateway update/toggle/delete invalidates all tools for that gateway

  4. Configuration (Tool Lookup Cache)
       • TOOL_LOOKUP_CACHE_ENABLED (default: true)
       • TOOL_LOOKUP_CACHE_TTL_SECONDS (default: 60)
       • TOOL_LOOKUP_CACHE_NEGATIVE_TTL_SECONDS (default: 10)
       • TOOL_LOOKUP_CACHE_L1_MAXSIZE (default: 10000)
       • TOOL_LOOKUP_CACHE_L2_ENABLED (default: true, only when CACHE_TYPE=redis)
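
The invoke-path integration and invalidation listed above can be sketched as a read-through lookup. This is an illustration under stated assumptions, not the ToolService code: resolve_tool, invalidate_tool, ToolUnavailableError, the load_from_db callback, and the row fields "enabled" and "reachable" are all hypothetical names, and a plain dict stands in for the real L1/L2 cache.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

TTL_SECONDS = 60.0           # default of TOOL_LOOKUP_CACHE_TTL_SECONDS
NEGATIVE_TTL_SECONDS = 10.0  # default of TOOL_LOOKUP_CACHE_NEGATIVE_TTL_SECONDS

Row = Dict[str, Any]


class ToolUnavailableError(Exception):
    """Hypothetical error raised for missing, inactive, or offline tools."""


# name -> (expires_at, payload); stands in for the real cache tiers
_cache: Dict[str, Tuple[float, Row]] = {}


def _classify(row: Optional[Row]) -> Row:
    """Turn a DB row (or None) into a cacheable payload with a status field."""
    if row is None:
        return {"status": "missing"}
    if not row.get("enabled", True):    # hypothetical field name
        return {"status": "inactive"}
    if not row.get("reachable", True):  # hypothetical field name
        return {"status": "offline"}
    return {"status": "active", "tool": row}


def resolve_tool(
    name: str,
    load_from_db: Callable[[str], Optional[Row]],
    now: Callable[[], float] = time.monotonic,
) -> Row:
    """Return the tool row, touching the DB only on a miss or expired entry."""
    entry = _cache.get(name)
    if entry is not None and now() < entry[0]:
        payload = entry[1]  # cache hit: no DB access
    else:
        payload = _classify(load_from_db(name))  # cache miss: one DB lookup
        ttl = TTL_SECONDS if payload["status"] == "active" else NEGATIVE_TTL_SECONDS
        _cache[name] = (now() + ttl, payload)
    if payload["status"] != "active":
        # Negative entries short-circuit before any upstream call is attempted
        raise ToolUnavailableError(f"tool {name!r} is {payload['status']}")
    return payload["tool"]


def invalidate_tool(name: str) -> None:
    """Call after tool create/update/delete/toggle so the next lookup hits the DB."""
    _cache.pop(name, None)
```

In this sketch a repeated call for the same tool within the TTL never reaches load_from_db, and a call for an unknown tool reaches it at most once per negative TTL window.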

Cache Key Scheme

{prefix}tool_lookup:{tool_name}            → tool + gateway payload
{prefix}tool_lookup:gateway:{gateway_id}   → set of tool names (for invalidation)

Default prefix: mcpgw: (for example, mcpgw:tool_lookup:my_tool)
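
To make the key scheme concrete, the sketch below builds both key forms and shows how a per-gateway set can drive invalidation. The function names and the JSON payload encoding are assumptions for illustration. The client argument is expected to expose the set, sadd, smembers, and delete commands as the redis-py client does; no particular connection setup is implied.

```python
import json
from typing import Any, Dict, Optional

PREFIX = "mcpgw:"  # default prefix from the scheme above


def tool_key(tool_name: str, prefix: str = PREFIX) -> str:
    return f"{prefix}tool_lookup:{tool_name}"


def gateway_key(gateway_id: str, prefix: str = PREFIX) -> str:
    return f"{prefix}tool_lookup:gateway:{gateway_id}"


def store_tool(client: Any, tool_name: str, gateway_id: Optional[str],
               payload: Dict[str, Any], ttl_seconds: int) -> None:
    """Write the payload with an expiry and index the tool under its gateway."""
    client.set(tool_key(tool_name), json.dumps(payload), ex=ttl_seconds)
    if gateway_id is not None:
        client.sadd(gateway_key(gateway_id), tool_name)


def invalidate_gateway(client: Any, gateway_id: str) -> int:
    """Delete every cached tool indexed under the gateway, plus the index set."""
    gkey = gateway_key(gateway_id)
    names = client.smembers(gkey)
    # Redis clients may return bytes unless configured to decode responses
    keys = [tool_key(n.decode() if isinstance(n, bytes) else n) for n in names]
    client.delete(gkey, *keys)
    return len(keys)


print(tool_key("my_tool"))
```

The index set is what lets a gateway update remove only its own tools without scanning the whole keyspace.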

Performance Optimizations

Before (Baseline)

Metric                          | Value
DB lookups per tool invocation  | 1
Invoke latency (cache miss)     | DB-bound

After (With Caching)

Metric                          | Cache Hit | Cache Miss
DB lookups per tool invocation  | 0         | 1
Invoke latency                  | ~1-3ms    | DB-bound

Expected improvement: 80-95% reduction in DB traffic for repeated tool invocations.
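
As a rough way to reason about this figure, consider an idealized model that is an assumption of this example and not a measurement: a single worker, L1 only, and evenly spaced invocations of one tool. Each TTL window then costs one miss followed by hits, so the reduction in DB lookups depends only on how many invocations fall inside one window.

```python
def db_lookup_reduction(invocations_per_ttl_window: int) -> float:
    """Fraction of DB lookups avoided when one miss is amortized over a TTL window.

    Idealized model: evenly spaced calls to a single tool on one worker.
    """
    if invocations_per_ttl_window < 1:
        raise ValueError("need at least one invocation per window")
    return 1.0 - 1.0 / invocations_per_ttl_window


for n in (5, 20, 100):
    print(n, round(db_lookup_reduction(n), 3))
```

Under this model, 5 invocations per 60-second window give an 80% reduction and 20 give 95%. Tools invoked less than once per TTL gain nothing from L1, and real traffic is burstier than the model assumes.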

Consequences

Positive

  • Removes hot-path DB lookups for repeat tool invocations
  • Reduces connection pool pressure under high concurrency
  • Redis L2 provides cross-worker cache reuse

Negative

  • Cache staleness window (up to TTL) after tool/gateway updates
  • Additional memory use in each worker for L1 cache
  • Requires careful invalidation on tool/gateway mutations

Neutral

  • L2 is optional and only enabled when CACHE_TYPE=redis
  • Negative cache TTL is short to avoid long-lived false negatives
  • No schema changes required

Alternatives Considered

Option                       | Why Not
Rely on registry list cache  | Not usable for single tool lookups
Cache at router layer only   | Still requires DB lookup per invocation
Longer TTLs                  | Too stale for active tool updates
Materialized view            | Overkill and DB-specific

Compatibility Notes

  • Enabled by default, can be disabled via env vars
  • Works with existing cache backend configuration
  • No API or schema changes required
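
The sketch below shows one way the documented environment variables and their defaults could be read, including the rule that L2 applies only when CACHE_TYPE=redis. The dataclass, the load_settings function, and the boolean parsing rules are assumptions for illustration; the project's own settings layer may parse these values differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    # Assumed parsing rule; accepted spellings may differ in the real settings code
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class ToolLookupCacheSettings:
    enabled: bool
    ttl_seconds: int
    negative_ttl_seconds: int
    l1_maxsize: int
    l2_enabled: bool


def load_settings(env: Mapping[str, str] = os.environ) -> ToolLookupCacheSettings:
    """Read the tool lookup cache variables, falling back to the documented defaults."""
    l2_requested = _as_bool(env.get("TOOL_LOOKUP_CACHE_L2_ENABLED", "true"))
    return ToolLookupCacheSettings(
        enabled=_as_bool(env.get("TOOL_LOOKUP_CACHE_ENABLED", "true")),
        ttl_seconds=int(env.get("TOOL_LOOKUP_CACHE_TTL_SECONDS", "60")),
        negative_ttl_seconds=int(env.get("TOOL_LOOKUP_CACHE_NEGATIVE_TTL_SECONDS", "10")),
        l1_maxsize=int(env.get("TOOL_LOOKUP_CACHE_L1_MAXSIZE", "10000")),
        # L2 only takes effect when the cache backend is Redis
        l2_enabled=l2_requested and env.get("CACHE_TYPE", "") == "redis",
    )


# Disabling the cache entirely through the environment
print(load_settings({"TOOL_LOOKUP_CACHE_ENABLED": "false"}).enabled)
```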

References

  • GitHub Issue #1940: Tool lookup caching
  • ADR-007: Pluggable cache backend
  • ADR-029: Registry and Admin Stats Caching
  • mcpgateway/cache/tool_lookup_cache.py - Implementation
  • mcpgateway/services/tool_service.py - Integration

Status

Implemented and enabled by default. Monitor Redis keyspace for tool_lookup keys.
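
One possible way to monitor the keyspace is to count keys with an incremental scan. The helper below is a sketch: the function name and the returned dictionary are hypothetical, and the client is assumed to provide scan_iter(match=..., count=...) as the redis-py client does. It separates per-tool entries from the per-gateway index sets, since both share the tool_lookup: prefix.

```python
from typing import Any, Dict


def count_tool_lookup_keys(client: Any, prefix: str = "mcpgw:") -> Dict[str, int]:
    """Count tool payload keys and gateway index sets without blocking Redis."""
    gateway_marker = f"{prefix}tool_lookup:gateway:"
    counts = {"tool_entries": 0, "gateway_sets": 0}
    # SCAN iterates in batches, unlike KEYS, which walks the keyspace in one call
    for key in client.scan_iter(match=f"{prefix}tool_lookup:*", count=500):
        text = key.decode() if isinstance(key, bytes) else key
        if text.startswith(gateway_marker):
            counts["gateway_sets"] += 1
        else:
            counts["tool_entries"] += 1
    return counts
```

With redis-py installed, this could be called as count_tool_lookup_keys(redis.Redis.from_url(url)), where url points at the deployment's Redis instance.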