OAuth Troubleshooting Guide
This guide helps troubleshoot OAuth 2.0 authentication problems when registering MCP servers, particularly the "Invalid or expired state parameter" error.
Understanding OAuth State Parameter
The OAuth state parameter serves two critical purposes:
- CSRF Protection: Prevents cross-site request forgery attacks via HMAC signature
- Session Binding: Links the authorization request to the callback with embedded gateway ID and user context
How State Works in MCP Gateway
+---------+  1. GET /oauth/authorize/{gateway_id}  +-------------+
|  User   | -------------------------------------> | MCP Gateway |
+---------+                                        +------+------+
                                                          |
                                                          | 2. Generate:
                                                          |    - PKCE code_verifier/challenge
                                                          |    - State (JSON + HMAC signature)
                                                          |    - Store in Redis/DB/Memory
                                                          v
+---------+  3. Redirect to provider with          +-------------+
|  User   | <------------------------------------- | MCP Gateway |
+----+----+     state + code_challenge             +-------------+
     |
     | 4. User authenticates at OAuth provider
     v
+-------------+
|    OAuth    |
|  Provider   |
+------+------+
       |
       | 5. Redirect to /oauth/callback?code=xxx&state=xxx
       v
+-------------+  6. Validate state:                +-------------+
| MCP Gateway | ---------------------------------> | State Store |
|             |  - Check signature (HMAC-SHA256)   |             |
|             |  - Check expiration (5 min TTL)    |             |
|             |  - Check not already used          |             |
|             |  - Retrieve code_verifier          |             |
+------+------+                                    +-------------+
       |
       | 7. Exchange code + code_verifier for tokens
       v
+-------------+
|    OAuth    |
|  Provider   |
+-------------+
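Step 2 of the flow generates a PKCE code_verifier and code_challenge. The following is a minimal sketch of the S256 method from RFC 7636, not the gateway's own code; the function names `make_pkce_pair` and `challenge_matches` are hypothetical.

```python
import base64
import hashlib
import secrets


def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge) using the S256 method."""
    # 32 random bytes encode to a 43-character URL-safe verifier.
    code_verifier = secrets.token_urlsafe(32)
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    # The challenge is the base64url-encoded SHA-256 digest without '=' padding.
    code_challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return code_verifier, code_challenge


def challenge_matches(code_verifier: str, code_challenge: str) -> bool:
    """Recompute the challenge from the verifier, as a provider does at token exchange."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    expected = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return secrets.compare_digest(expected, code_challenge)


verifier, challenge = make_pkce_pair()
```

The challenge travels with the redirect in step 3, while the verifier stays in the state store until step 7. This is why losing the stored state also breaks the token exchange.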
State Structure
The state parameter is a base64-encoded JSON payload with an HMAC signature:
# From mcpgateway/services/oauth_manager.py:627
state_data = {
"gateway_id": gateway_id,
"app_user_email": app_user_email,
"nonce": secrets.token_urlsafe(16),
"timestamp": datetime.now(timezone.utc).isoformat()
}
# Signed with HMAC-SHA256 using AUTH_ENCRYPTION_SECRET
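To make the signing step concrete, here is a minimal sketch of signing and verifying such a payload with HMAC-SHA256. The `payload.signature` wire layout, the function names, and the example secret are assumptions made for illustration; the gateway's actual encoding may differ.

```python
import base64
import hashlib
import hmac
import json
import secrets
from datetime import datetime, timezone


def _tag(payload: str, secret: str) -> str:
    """Hex HMAC-SHA256 of the encoded payload."""
    return hmac.new(secret.encode("utf-8"), payload.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_state(state_data: dict, secret: str) -> str:
    """Encode state_data as base64url JSON and append the HMAC tag (assumed layout)."""
    raw = json.dumps(state_data, sort_keys=True).encode("utf-8")
    payload = base64.urlsafe_b64encode(raw).decode("ascii")
    return f"{payload}.{_tag(payload, secret)}"


def verify_state(state: str, secret: str) -> dict:
    """Return the decoded payload, or raise ValueError if the signature does not match."""
    payload, _, tag = state.rpartition(".")
    expected = _tag(payload, secret)
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(expected.encode("utf-8"), tag.encode("utf-8")):
        raise ValueError("Invalid state signature")
    return json.loads(base64.urlsafe_b64decode(payload))


state = sign_state(
    {
        "gateway_id": "abc123",
        "app_user_email": "user@example.com",
        "nonce": secrets.token_urlsafe(16),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    },
    secret="example-secret",
)
decoded = verify_state(state, secret="example-secret")
```

Verifying the same state with a different secret raises `ValueError`, which mirrors what happens when AUTH_ENCRYPTION_SECRET changes between authorization and callback.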
OAuth Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /oauth/authorize/{gateway_id} | GET | Initiates OAuth flow, redirects to provider |
| /oauth/callback | GET | Handles OAuth callback, exchanges code for tokens |
| /oauth/status/{gateway_id} | GET | Returns OAuth configuration status |
| /oauth/fetch-tools/{gateway_id} | POST | Fetches tools from MCP server after OAuth completion |
| /oauth/registered-clients | GET | Lists all DCR-registered OAuth clients |
| /oauth/registered-clients/{gateway_id} | GET | Gets registered client for specific gateway |
| /oauth/registered-clients/{client_id} | DELETE | Deletes a registered OAuth client |
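The /oauth/callback endpoint receives `code` and `state` as query parameters. When a callback URL copied from the browser looks suspicious, a small helper like the sketch below can show which parameters actually arrived. The name `parse_callback` is hypothetical, and the `error` parameter is the standard OAuth 2.0 error response field rather than anything specific to the gateway.

```python
from urllib.parse import parse_qs, urlsplit


def parse_callback(url: str) -> tuple[str, str]:
    """Return (code, state) from a callback URL, or raise ValueError."""
    query = parse_qs(urlsplit(url).query)
    if "error" in query:
        # OAuth 2.0 providers report failures through an "error" query parameter.
        raise ValueError(f"provider returned error: {query['error'][0]}")
    missing = [name for name in ("code", "state") if name not in query]
    if missing:
        raise ValueError(f"callback is missing: {', '.join(missing)}")
    return query["code"][0], query["state"][0]


code, state = parse_callback("http://localhost:4444/oauth/callback?code=xxx&state=abc.def")
```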
Error Messages and Their Causes
"Invalid or expired state parameter - possible replay attack"
Location: mcpgateway/services/oauth_manager.py:544
Causes:
- State not found in storage (Redis/DB/Memory)
- State already consumed (single-use protection)
- State expired (TTL exceeded)
"Invalid state signature - possible CSRF attack"
Location: mcpgateway/services/oauth_manager.py:562
Causes:
- State was tampered with
- AUTH_ENCRYPTION_SECRET changed between authorization and callback
- Corrupted state parameter in URL
"State parameter gateway mismatch"
Location: mcpgateway/services/oauth_manager.py:572
Causes:
- Callback received for different gateway than initiated
- Gateway ID changed in state parameter
- Misconfigured redirect URI
"State has expired for gateway {gateway_id}"
Location: mcpgateway/services/oauth_manager.py:746, 782, 818
Cause: OAuth flow took longer than 5 minutes (STATE_TTL_SECONDS = 300)
"State was already used for gateway {gateway_id} - possible replay attack"
Location: mcpgateway/services/oauth_manager.py:751, 789, 824
Cause: User refreshed callback page or callback was triggered twice
"State not found in [Redis|database|memory] for gateway {gateway_id}"
Location: mcpgateway/services/oauth_manager.py:727, 772, 809
Causes:
- Server restart (in-memory storage lost)
- Different worker handled callback (multi-worker with in-memory)
- State never stored (storage failure during authorization)
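For quick triage, the messages above can be turned into a lookup table. This is only a convenience sketch: `LIKELY_CAUSES` and `diagnose` are hypothetical names, and the causes are copied from the lists in this section.

```python
# Message fragment -> likely causes, as listed in this guide.
LIKELY_CAUSES = {
    "Invalid or expired state parameter": [
        "State not found in storage (Redis/DB/Memory)",
        "State already consumed (single-use protection)",
        "State expired (TTL exceeded)",
    ],
    "Invalid state signature": [
        "State was tampered with",
        "AUTH_ENCRYPTION_SECRET changed between authorization and callback",
        "Corrupted state parameter in URL",
    ],
    "State parameter gateway mismatch": [
        "Callback received for different gateway than initiated",
        "Gateway ID changed in state parameter",
        "Misconfigured redirect URI",
    ],
    "State has expired": [
        "OAuth flow took longer than 5 minutes (STATE_TTL_SECONDS = 300)",
    ],
    "State was already used": [
        "User refreshed callback page or callback was triggered twice",
    ],
    "State not found in": [
        "Server restart (in-memory storage lost)",
        "Different worker handled callback (multi-worker with in-memory)",
        "State never stored (storage failure during authorization)",
    ],
}


def diagnose(message: str) -> list[str]:
    """Return the likely causes for a log or error message, or an empty list."""
    for fragment, causes in LIKELY_CAUSES.items():
        if fragment in message:
            return causes
    return []


causes = diagnose("State not found in memory for gateway abc123")
```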
Common Causes and Fixes
1. Multi-Worker Deployment with In-Memory Storage
Symptom: Intermittent failures - sometimes works, sometimes doesn't.
Cause: In-memory storage doesn't share state across workers.
Note: The default CACHE_TYPE is database, which supports multi-worker deployments. This issue only occurs if you explicitly set CACHE_TYPE=memory.
# From mcpgateway/services/oauth_manager.py:35-37
_oauth_states: Dict[str, Dict[str, Any]] = {} # Per-process only when CACHE_TYPE=memory!
Diagnosis:
# Check your cache type configuration
grep CACHE_TYPE .env
# Default is "database" - which is safe for multi-worker
# If CACHE_TYPE=memory, check worker count
ps aux | grep -E "(gunicorn|uvicorn)" | grep -v grep
# Multiple workers + CACHE_TYPE=memory = problem
Fix:
# Option A: Use default database storage (recommended - already the default)
CACHE_TYPE=database
# Option B: Use Redis for better performance
CACHE_TYPE=redis
REDIS_URL=redis://localhost:6379
# Option C: Single worker (development only, if using memory)
# gunicorn --workers 1 ...
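The intermittent failures are easier to see in a toy model. The sketch below simulates two workers that each hold a private dictionary, as happens with CACHE_TYPE=memory. The `Worker` class is a hypothetical stand-in and does not reproduce the gateway's storage code.

```python
from typing import Any, Dict


class Worker:
    """Stand-in for one gateway process using CACHE_TYPE=memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Each process has its own dictionary; nothing is shared between workers.
        self._oauth_states: Dict[str, Dict[str, Any]] = {}

    def store_state(self, state: str, gateway_id: str) -> None:
        self._oauth_states[state] = {"gateway_id": gateway_id}

    def validate_state(self, state: str) -> bool:
        # pop() removes the entry, so a state can only be validated once.
        return self._oauth_states.pop(state, None) is not None


worker_a = Worker("a")
worker_b = Worker("b")

# The authorization request lands on worker A...
worker_a.store_state("state-123", "abc123")
# ...but the callback is routed to worker B, which has never seen the state.
callback_on_b = worker_b.validate_state("state-123")
# The same callback on worker A would have succeeded.
callback_on_a = worker_a.validate_state("state-123")
```

Whether a given flow succeeds depends only on which worker receives the callback, which matches the "sometimes works, sometimes doesn't" symptom.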
2. State TTL Expired (5 Minutes)
Symptom: Always fails if user takes too long to authenticate.
Cause: State expires after 300 seconds.
Diagnosis:
- Check timestamps in debug logs
- Note time between "Stored OAuth state" and callback
Fix:
- Complete OAuth flow within 5 minutes
- For slow identity providers, consider increasing TTL (requires code change)
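When comparing log timestamps by hand, the arithmetic can be sketched as follows. `is_expired` is a hypothetical helper; the 300-second value comes from this guide, and whether the gateway treats exactly 300 seconds as expired is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

STATE_TTL_SECONDS = 300  # value quoted in this guide


def is_expired(timestamp_iso: str, now: Optional[datetime] = None) -> bool:
    """Return True if more than STATE_TTL_SECONDS have passed since the timestamp."""
    created = datetime.fromisoformat(timestamp_iso)
    current = now or datetime.now(timezone.utc)
    return (current - created).total_seconds() > STATE_TTL_SECONDS


stored_at = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
on_time = is_expired(stored_at.isoformat(), now=stored_at + timedelta(minutes=4))
too_late = is_expired(stored_at.isoformat(), now=stored_at + timedelta(minutes=6))
```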
3. State Already Consumed (Replay Protection)
Symptom: First attempt works, refresh/retry fails.
Cause: State is single-use and deleted after validation.
# From mcpgateway/services/oauth_manager.py:724-725
# Get and delete state atomically (single-use)
state_json = await redis.getdel(state_key)
Diagnosis:
# Check logs for duplicate callbacks
grep "State not found" logs/mcpgateway.log
grep "already been used" logs/mcpgateway.log
Fix:
- Don't refresh the callback page
- Start a new OAuth flow if needed (click "Authorize" again in Admin UI)
4. Server Restart During OAuth Flow
Symptom: Always fails after server restart.
Cause: In-memory states are lost on restart.
Fix:
- Use Redis or database storage for persistence
- Restart servers during low-traffic periods
5. Gateway ID Mismatch
Symptom: State validation fails with "gateway mismatch" in logs.
Cause: The callback returned to a different gateway than expected.
# From mcpgateway/services/oauth_manager.py:572
raise OAuthError("State parameter gateway mismatch")
Diagnosis: Search the gateway logs for "gateway mismatch" to confirm that this check is the one failing.
Fix:
- Verify the OAuth redirect URI matches the gateway configuration
- Check that gateway IDs are consistent
- Ensure redirect_uri in OAuth provider matches {BASE_URL}/oauth/callback
6. Load Balancer Without Sticky Sessions
Symptom: Fails randomly in load-balanced environments.
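Cause: A different backend handles the callback than the one that handled the authorization request.
Fix:
Use shared state storage (CACHE_TYPE=database or CACHE_TYPE=redis) so that any backend can validate the callback.
Or configure sticky sessions in your load balancer (not recommended).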
7. AUTH_ENCRYPTION_SECRET Changed
Symptom: "Invalid state signature" error after config change.
Cause: HMAC signature is validated using AUTH_ENCRYPTION_SECRET.
Fix:
- Don't change AUTH_ENCRYPTION_SECRET during active OAuth flows
- Clear all pending states after changing the secret
- Ensure consistent secret across all workers/instances
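One way to check that every instance uses the same secret without printing it is to compare short fingerprints. This is a sketch of an operational aid, not a gateway feature: `secret_fingerprint`, `secrets_consistent`, and the label `b"secret-fingerprint"` are hypothetical, and a fingerprint of a weak secret can still be brute-forced, so treat the output as sensitive.

```python
import hashlib
import hmac


def secret_fingerprint(secret: str) -> str:
    """Return a short, non-reversible fingerprint of a secret for comparison."""
    digest = hmac.new(secret.encode("utf-8"), b"secret-fingerprint", hashlib.sha256).hexdigest()
    return digest[:12]


def secrets_consistent(secrets_by_instance: dict[str, str]) -> bool:
    """Return True when every instance reports the same fingerprint."""
    fingerprints = {secret_fingerprint(value) for value in secrets_by_instance.values()}
    return len(fingerprints) <= 1


matching = secrets_consistent({"instance-1": "s3cret", "instance-2": "s3cret"})
drifted = secrets_consistent({"instance-1": "s3cret", "instance-2": "old-s3cret"})
```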
8. PKCE Code Verifier Lost
Symptom: Token exchange fails after successful callback.
Cause: PKCE code_verifier not found during token exchange (stored with state).
Fix: Same as state storage issues - ensure Redis/database storage is working.
Admin UI OAuth Flow
Starting OAuth Authorization
- Navigate to Admin Panel > Gateways
- Find gateway with auth_type = oauth
- Click Authorize button (only visible for OAuth gateways)
- Browser redirects to /oauth/authorize/{gateway_id}
- User authenticates at OAuth provider
- Callback redirects to success page
- Click Fetch Tools to import MCP tools
Debugging Steps
Step 1: Enable Debug Logging
# Set in .env (or export in the shell), then restart the gateway
LOG_LEVEL=DEBUG
Step 2: Check State Storage Type
# Verify current configuration
grep -E "^CACHE_TYPE" .env
# Valid values:
# CACHE_TYPE=database (default - recommended, supports multi-worker)
# CACHE_TYPE=redis (best for high-traffic production)
# CACHE_TYPE=memory (single-worker only - NOT recommended)
# CACHE_TYPE=none (disabled - OAuth will not work)
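The same check can be scripted. The sketch below reads CACHE_TYPE from .env-style text and flags the risky combinations described in this guide; `read_cache_type` and `storage_warning` are hypothetical helpers, and the parser handles only plain `KEY=value` lines.

```python
from typing import Optional


def read_cache_type(env_text: str, default: str = "database") -> str:
    """Return the CACHE_TYPE value from .env-style text, or the documented default."""
    value = default
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == "CACHE_TYPE":
            # A later assignment overrides an earlier one.
            value = val.strip().strip("'\"").lower()
    return value


def storage_warning(cache_type: str, workers: int) -> Optional[str]:
    """Return a warning for configurations this guide calls out, else None."""
    if cache_type == "none":
        return "CACHE_TYPE=none disables state storage; OAuth will not work"
    if cache_type == "memory" and workers > 1:
        return "CACHE_TYPE=memory does not share state across workers"
    return None


cache_type = read_cache_type("# local settings\nCACHE_TYPE=memory\nLOG_LEVEL=DEBUG\n")
warning = storage_warning(cache_type, workers=4)
```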
Step 3: Monitor State Operations
Watch for these log messages:
# Successful state storage
grep "Stored OAuth state" logs/mcpgateway.log
# State validation
grep -E "(validated|not found|expired|already been used)" logs/mcpgateway.log
# Signature validation
grep -E "(Invalid state signature|CSRF)" logs/mcpgateway.log
Healthy flow:
DEBUG - Stored OAuth state in Redis for gateway abc123
DEBUG - Successfully validated OAuth state from Redis for gateway abc123
INFO - Completed OAuth flow for gateway abc123, user user@example.com
Problematic flow (example: CACHE_TYPE=memory with multiple workers):
DEBUG - Stored OAuth state in memory for gateway abc123
WARNING - State not found in memory for gateway abc123
ERROR - OAuth callback failed: Invalid or expired state parameter - possible replay attack
Problematic flow (example: state expired):
DEBUG - Stored OAuth state in database for gateway abc123
WARNING - State has expired for gateway abc123
ERROR - OAuth callback failed: Invalid or expired state parameter - possible replay attack
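For longer logs, the sequences above can be summarized per gateway. This sketch assumes the message wording shown in these examples; `summarize` and the event labels are hypothetical, and lines without a "for gateway <id>" fragment are skipped.

```python
import re
from collections import defaultdict

GATEWAY_RE = re.compile(r"for gateway ([^\s,]+)")

# Message fragment -> short event label (wording taken from the examples above).
EVENTS = [
    ("Stored OAuth state", "stored"),
    ("Successfully validated OAuth state", "validated"),
    ("State not found", "not_found"),
    ("State has expired", "expired"),
    ("Completed OAuth flow", "completed"),
]


def summarize(log_lines: list[str]) -> dict[str, list[str]]:
    """Group recognized OAuth state events by gateway ID, in log order."""
    events: dict[str, list[str]] = defaultdict(list)
    for line in log_lines:
        match = GATEWAY_RE.search(line)
        if not match:
            continue
        for fragment, label in EVENTS:
            if fragment in line:
                events[match.group(1)].append(label)
                break
    return dict(events)


healthy = summarize([
    "DEBUG - Stored OAuth state in Redis for gateway abc123",
    "DEBUG - Successfully validated OAuth state from Redis for gateway abc123",
    "INFO - Completed OAuth flow for gateway abc123, user user@example.com",
])
broken = summarize([
    "DEBUG - Stored OAuth state in memory for gateway abc123",
    "WARNING - State not found in memory for gateway abc123",
    "ERROR - OAuth callback failed: Invalid or expired state parameter - possible replay attack",
])
```

A gateway whose sequence shows "stored" followed by "not_found" points at storage that is not shared or not persistent.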
Step 4: Verify Redis Connectivity (if using Redis)
# Test Redis connection
redis-cli ping
# Should return: PONG
# Check OAuth states in Redis
redis-cli keys "oauth:state:*"
# Inspect a specific state (TTL and content)
redis-cli ttl "oauth:state:<gateway_id>:<state>"
redis-cli get "oauth:state:<gateway_id>:<state>"
Step 5: Check Database States (if using database)
-- SQLite: Check for stored states (oauth_states table)
SELECT id, gateway_id, substr(state, 1, 50) as state_preview,
code_verifier IS NOT NULL as has_pkce,
expires_at, used, created_at
FROM oauth_states
ORDER BY created_at DESC
LIMIT 10;
-- SQLite: Check for expired states (should be cleaned up automatically)
SELECT COUNT(*) as expired_count
FROM oauth_states
WHERE expires_at < datetime('now');
-- Check for used states (should be deleted after use)
SELECT COUNT(*) as used_count
FROM oauth_states
WHERE used = 1;
-- View stored tokens after successful OAuth
SELECT id, gateway_id, user_id, app_user_email,
expires_at, created_at
FROM oauth_tokens
ORDER BY created_at DESC
LIMIT 10;
-- PostgreSQL alternatives:
-- Use NOW() instead of datetime('now')
-- Use true instead of 1 for boolean
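The counting queries can also be run from Python with the standard sqlite3 module. The sketch below uses a throwaway in-memory database with a simplified oauth_states table built from the columns queried above. The column types and the `YYYY-MM-DD HH:MM:SS` timestamp format are assumptions, so verify them against your real schema before pointing this at mcp.db.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed storage format for this sketch


def count_stale_states(conn: sqlite3.Connection, now: datetime) -> dict[str, int]:
    """Count expired and already-used rows in oauth_states."""
    cutoff = now.strftime(TIME_FORMAT)
    expired = conn.execute(
        "SELECT COUNT(*) FROM oauth_states WHERE expires_at < ?", (cutoff,)
    ).fetchone()[0]
    used = conn.execute("SELECT COUNT(*) FROM oauth_states WHERE used = 1").fetchone()[0]
    return {"expired": expired, "used": used}


# Demo data: one expired state, one used state, one still pending.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE oauth_states (id INTEGER PRIMARY KEY, gateway_id TEXT, state TEXT, "
    "code_verifier TEXT, expires_at TEXT, used INTEGER, created_at TEXT)"
)
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
rows = [
    ("abc123", "state-expired", now - timedelta(minutes=10), 0),
    ("abc123", "state-used", now + timedelta(minutes=4), 1),
    ("abc123", "state-pending", now + timedelta(minutes=4), 0),
]
for gateway_id, state, expires_at, used in rows:
    conn.execute(
        "INSERT INTO oauth_states (gateway_id, state, expires_at, used) VALUES (?, ?, ?, ?)",
        (gateway_id, state, expires_at.strftime(TIME_FORMAT), used),
    )
stale = count_stale_states(conn, now)
```

Non-zero counts that keep growing suggest that automatic cleanup is not running.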
Step 6: Trace a Single OAuth Flow
# Enable debug logging first
export LOG_LEVEL=DEBUG
# Start the gateway
make dev
# In another terminal, tail logs
tail -f logs/mcpgateway.log | grep -E "(OAuth|state|gateway)"
# Start OAuth flow in browser and watch the logs
Configuration Reference
Recommended Production Configuration
# .env
# Use Redis for distributed state storage (recommended)
CACHE_TYPE=redis
REDIS_URL=redis://localhost:6379
# Redis connection settings
REDIS_MAX_RETRIES=3
REDIS_RETRY_INTERVAL_MS=2000
# OAuth settings
OAUTH_REQUEST_TIMEOUT=30
OAUTH_MAX_RETRIES=3
OAUTH_DEFAULT_TIMEOUT=3600
# PKCE settings
OAUTH_DISCOVERY_ENABLED=true
OAUTH_PREFERRED_CODE_CHALLENGE_METHOD=S256
# Secret for signing state (CRITICAL - must be consistent across instances)
AUTH_ENCRYPTION_SECRET=your-secret-key-here
# Enable debug logging temporarily for troubleshooting
LOG_LEVEL=DEBUG
Dynamic Client Registration (DCR)
If using DCR (RFC 7591) for automatic client registration:
# Enable DCR
DCR_ENABLED=true
DCR_AUTO_REGISTER_ON_MISSING_CREDENTIALS=true
# Default scopes for registered clients
DCR_DEFAULT_SCOPES=["mcp:read"]
# Token endpoint auth method
DCR_TOKEN_ENDPOINT_AUTH_METHOD=client_secret_basic
# Metadata cache TTL
DCR_METADATA_CACHE_TTL=3600
State Storage Comparison
| Storage Type | CACHE_TYPE | Multi-Worker | Persistent | Performance | Use Case |
|---|---|---|---|---|---|
| Database | database (default) | Yes | Yes | Good | Default, most deployments |
| Redis | redis | Yes | Optional | Best | High-traffic production |
| In-Memory | memory | No | No | Fast | Single-worker dev only |
Environment Variables
| Variable | Default | Description |
|---|---|---|
| CACHE_TYPE | database | database, redis, memory, or none |
| REDIS_URL | redis://localhost:6379/0 | Redis connection string (when CACHE_TYPE=redis) |
| DATABASE_URL | sqlite:///./mcp.db | Database for state storage |
| AUTH_ENCRYPTION_SECRET | my-test-salt | Secret for HMAC signing states (change in production!) |
| OAUTH_REQUEST_TIMEOUT | 30 | Timeout for OAuth requests (seconds) |
| OAUTH_MAX_RETRIES | 3 | Max retries for token requests |
| OAUTH_DEFAULT_TIMEOUT | 3600 | Default token expiration (seconds) |
| LOG_LEVEL | INFO | Set to DEBUG for troubleshooting |
Quick Fixes
Fix 1: Verify Database Storage is Active (Default)
# Database storage is the default - verify it's set correctly
grep CACHE_TYPE .env
# If not set or set incorrectly, ensure it's using database:
# CACHE_TYPE=database
# Run migrations to ensure oauth_states table exists
alembic upgrade head
# Restart MCP Gateway
make dev
Fix 2: Switch to Redis Storage (For High-Traffic Production)
# Install Redis if needed
# Ubuntu/Debian
sudo apt install redis-server
sudo systemctl start redis
# macOS
brew install redis
brew services start redis
# Update .env
sed -i 's/CACHE_TYPE=.*/CACHE_TYPE=redis/' .env
# Or add if not present:
echo "CACHE_TYPE=redis" >> .env
echo "REDIS_URL=redis://localhost:6379" >> .env
# Restart MCP Gateway
make dev
Fix 3: Single Worker Mode (Only if using CACHE_TYPE=memory)
# Only needed if you're using CACHE_TYPE=memory
# (Not recommended - switch to database or redis instead)
gunicorn mcpgateway.main:app --workers 1 --bind 0.0.0.0:4444
# Or using make
WORKERS=1 make serve
Fix 4: Clear Stale States
# Redis - clear all OAuth states
redis-cli keys "oauth:state:*" | xargs -r redis-cli del
# Database - clear expired and stale states
sqlite3 mcp.db "DELETE FROM oauth_states WHERE expires_at < datetime('now') OR used = 1;"
# PostgreSQL
psql -c "DELETE FROM oauth_states WHERE expires_at < NOW() OR used = true;"
Verifying the Fix
After applying fixes, verify OAuth works:
- Clear any stale states (see Fix 4 above)
- Restart the gateway (for example with make dev, as in Fix 1)
- Test OAuth flow:
  - Go to Admin Panel > Gateways
  - Click "Authorize" on an OAuth-configured gateway
  - Complete authentication within 5 minutes
  - Verify success page shows "OAuth Authorization Successful"
  - Click "Fetch Tools" to verify token works
- Check logs for success: look for the "Completed OAuth flow" message shown in the healthy flow under Step 3
- Verify tokens stored: query the oauth_tokens table as shown in Step 5
Troubleshooting DCR Issues
If using Dynamic Client Registration:
# Check DCR is enabled
grep DCR .env
# Look for DCR-related logs
grep -E "(DCR|Dynamic Client)" logs/mcpgateway.log
# Check registered clients
curl -s http://localhost:4444/oauth/registered-clients | jq
# Common DCR errors:
# - "DCR failed" - AS doesn't support RFC 7591
# - "No issuer configured" - Gateway missing issuer URL
# - "Metadata discovery failed" - AS metadata endpoint unreachable
Related Documentation
- OAuth Integration - Main OAuth setup guide
- Configuration Reference - All environment variables
- Scaling Guide - Multi-worker and Redis setup
- Securing MCP Gateway - Security best practices