ADR-0015: Configurable Well-Known URI Handler¶
- Status: Accepted
- Date: 2025-08-17
- Deciders: Core Engineering Team
- Issues: #540
- Related: Security infrastructure for standardized web discovery
Context¶
The MCP Gateway needed to support standardized well-known URIs as defined by RFC 8615 to enable proper web service discovery, security contact information, and crawler management. Well-known URIs are standardized endpoints that web services expose for automated discovery and security contact purposes.
The implementation needed to address: - robots.txt for search engine crawler management (typically private API = disable crawling) - security.txt for security contact information per RFC 9116 - Custom well-known files for additional service policies (AI usage, DNT policy, etc.) - Security-first defaults appropriate for private API gateway deployment - Configuration flexibility for different deployment scenarios - Admin monitoring of well-known configuration status
Requirements¶
- Support standard well-known URIs (robots.txt, security.txt)
- Allow custom well-known files via configuration
- Default to private API security posture (no crawling)
- RFC 9116 compliant security.txt with automatic validation
- Configurable cache headers for performance
- Admin endpoint for configuration monitoring
- Environment-based configuration via standard patterns
Decision¶
We implemented a flexible /.well-known/*
endpoint handler with the following design:
1. Router-Based Implementation¶
Created mcpgateway/routers/well_known.py
with a dedicated FastAPI router:
@router.get("/.well-known/{filename:path}", include_in_schema=False)
async def get_well_known_file(filename: str, response: Response, request: Request):
"""Serve well-known URI files with configurable content and security defaults."""
Design decisions: - Router isolation: Separate router for clean organization and testing - Dynamic routing: Single endpoint handles all well-known URIs - Security-first: Disabled by default, explicit enable required - Schema exclusion: Not included in OpenAPI docs (reduces attack surface)
2. Configuration-Driven Content¶
Extended mcpgateway/config.py
with well-known URI settings:
# Well-Known URI Configuration
well_known_enabled: bool = True
well_known_robots_txt: str = """User-agent: *
Disallow: /
# MCP Gateway is a private API gateway
# Public crawling is disabled by default"""
well_known_security_txt: str = ""
well_known_security_txt_enabled: bool = False
well_known_custom_files: str = "{}" # JSON format
well_known_cache_max_age: int = 3600 # 1 hour
Design decisions: - Private API defaults: robots.txt blocks all crawlers by default - Explicit security.txt: Only enabled when content is provided - JSON custom files: Flexible format for additional well-known files - Configurable caching: Performance optimization with sensible defaults
3. RFC 9116 Security.txt Compliance¶
Implemented automatic security.txt validation and enhancement:
def validate_security_txt(content: str) -> Optional[str]:
"""Validate security.txt format and add required headers."""
# Add Expires field if missing (6 months from now)
# Add header comments for clarity
# Preserve existing valid content
Design decisions: - Auto-expires: Adds Expires header if missing (RFC requirement) - Header comments: Adds generation timestamp and description - Validation: Ensures RFC 9116 compliance - Preservation: Maintains existing valid fields
4. Well-Known Registry¶
Implemented a registry for known well-known URIs with metadata:
WELL_KNOWN_REGISTRY = {
"robots.txt": {
"content_type": "text/plain",
"description": "Robot exclusion standard",
"rfc": "RFC 9309"
},
"security.txt": {
"content_type": "text/plain",
"description": "Security contact information",
"rfc": "RFC 9116"
},
# ... additional standard URIs
}
Design decisions: - Helpful errors: Provides descriptive 404 messages for known but unconfigured files - Content-Type mapping: Ensures correct MIME types - Documentation: Links to relevant RFCs and standards - Extensibility: Easy to add new standard well-known URIs
5. Admin Monitoring¶
Added /admin/well-known
endpoint for configuration visibility:
@router.get("/admin/well-known", response_model=dict)
async def get_well_known_status(user: str = Depends(require_auth)):
"""Returns configuration status and available well-known files."""
Design decisions: - Authentication required: Admin endpoint requires JWT authentication - Configuration visibility: Shows enabled files and cache settings - Supported files list: Displays all known well-known URI types - Status monitoring: Helps administrators verify configuration
Implementation Architecture¶
File Structure¶
mcpgateway/
βββ config.py # Well-known configuration settings
βββ routers/
β βββ well_known.py # /.well-known/* endpoint handler
βββ main.py # Router integration
tests/unit/mcpgateway/
βββ test_well_known.py # Comprehensive test coverage
.env.example # Configuration documentation
Request Flow¶
- Request:
GET /.well-known/robots.txt
- Authentication: No auth required for well-known URIs (public by design)
- Validation: Check if well-known endpoints are enabled
- Routing: Match filename to configured content or registry
- Headers: Add cache control and specific headers (X-Robots-Tag for robots.txt)
- Response: Return PlainTextResponse with appropriate headers
Security Considerations¶
- No authentication: Well-known URIs are public by design per RFC 8615
- Content validation: security.txt content validated against RFC 9116
- Path traversal protection: Filename normalization prevents directory traversal
- Cache headers: Appropriate cache settings reduce server load
- Information disclosure: Default robots.txt reveals minimal information
Consequences¶
β Benefits¶
- Standards Compliance: Implements RFC 8615 (well-known URIs) and RFC 9116 (security.txt)
- Security Contact: Enables security researchers to find contact information
- Crawler Management: Proper robots.txt prevents unwanted search engine indexing
- Flexibility: Custom well-known files support organization-specific policies
- Performance: Configurable caching reduces server load for frequently accessed files
- Monitoring: Admin endpoint provides configuration visibility
- Private API Focused: Defaults appropriate for API gateway deployment
β Trade-offs¶
- Information Disclosure: Well-known URIs are public and may reveal service information
- Cache Headers: Public cache headers may not be appropriate for all deployments
- Configuration Complexity: Additional environment variables to manage
- Static Content: Well-known files are static and can't include dynamic information
π Maintenance¶
- security.txt Updates: Requires periodic updates to contact information and expiration
- RFC Compliance: Monitor RFC updates for security.txt format changes
- Custom File Management: Organizations need to maintain custom well-known content
- Cache Tuning: May need cache duration adjustments based on usage patterns
Configuration Examples¶
Basic Private API (Default)¶
WELL_KNOWN_ENABLED=true
# robots.txt blocks all crawlers (default)
# security.txt disabled (default)
Public API with Security Contact¶
WELL_KNOWN_ENABLED=true
WELL_KNOWN_SECURITY_TXT="Contact: mailto:security@example.com\nContact: https://example.com/security\nPreferred-Languages: en"
WELL_KNOWN_ROBOTS_TXT="User-agent: *\nAllow: /health\nAllow: /docs\nDisallow: /"
Custom Policies¶
WELL_KNOWN_CUSTOM_FILES={"ai.txt": "AI Usage: Tool orchestration only", "dnt-policy.txt": "We honor Do Not Track headers"}
Alternatives Considered¶
Alternative | Why Not Chosen |
---|---|
Static file serving | No environment-based configuration, harder to manage |
Database-stored content | Overly complex for static content, harder to configure |
Middleware-based handler | Less organized than router-based approach |
Always-enabled endpoints | Security risk, should be explicitly enabled |
No security.txt validation | Would allow non-compliant security.txt files |
Wildcard well-known handler | Security risk, explicit file support is safer |
Testing Strategy¶
Implemented comprehensive test coverage: - Default robots.txt: Validates security-first defaults - security.txt validation: Tests RFC 9116 compliance and auto-enhancement - Custom files: Verifies JSON configuration parsing and serving - 404 handling: Tests unknown files and helpful error messages - Path normalization: Ensures path traversal protection - Registry functionality: Validates well-known URI metadata
Future Enhancements¶
Potential improvements for future iterations: - Dynamic content: Template variables (e.g., {{DOMAIN}}
, {{CONTACT_EMAIL}}
) - File upload API: Admin interface for uploading well-known files - GPG signing: Digital signature support for security.txt - Rate limiting: Specific limits for well-known endpoints - Internationalization: Multi-language support for policy files - A/B testing: Different content based on user agent or other criteria
Security Impact¶
Positive Security Impact¶
- Security contact: Enables responsible disclosure by security researchers
- Crawler control: Prevents unwanted indexing of private API endpoints
- Standards compliance: Follows established web security practices
- Information control: Explicit control over what information is disclosed
Security Considerations¶
- Information disclosure: Well-known URIs are intentionally public
- Content validation: Prevents serving malicious content through validation
- Cache control: Public caching may not be appropriate for all environments
- Admin endpoint: Configuration status requires authentication
Status¶
This well-known URI handler implementation is accepted and implemented as of version 0.7.0, providing standards-compliant web service discovery while maintaining security-first defaults appropriate for private API gateway deployments.