Fuzz Testing¶
MCP Gateway includes comprehensive fuzz testing to automatically discover edge cases, security vulnerabilities, and crashes through property-based testing, coverage-guided fuzzing, and security-focused validation.
Overview¶
Fuzz testing generates thousands of random, malformed, or edge-case inputs to find bugs that traditional testing might miss. Our implementation combines multiple fuzzing approaches:
- Property-Based Testing with Hypothesis for core validation logic
- Coverage-Guided Fuzzing with Atheris for deep code path exploration
- API Schema Fuzzing with Schemathesis for contract validation
- Security-Focused Testing for vulnerability discovery
Quick Start¶
Installation¶
Install fuzzing dependencies as an optional package group:
Running Tests¶
# Complete fuzzing suite
make fuzz-all
# Individual components
make fuzz-hypothesis # Property-based tests
make fuzz-security # Security vulnerability tests
make fuzz-quick # Fast CI validation
make fuzz-report # Generate reports
Fuzzing Components¶
Property-Based Testing (Hypothesis)¶
Tests core validation logic by generating inputs that satisfy certain properties and verifying invariants hold.
Test Modules: - tests/fuzz/test_jsonrpc_fuzz.py
- JSON-RPC validation (16 tests) - tests/fuzz/test_jsonpath_fuzz.py
- JSONPath processing (16 tests) - tests/fuzz/test_schema_validation_fuzz.py
- Pydantic schemas (19 tests)
Example Test:
@given(st.text())
def test_validate_request_handles_text_input(self, text_input):
"""Test that text input never crashes the validator."""
try:
data = json.loads(text_input)
if isinstance(data, dict):
validate_request(data)
except (JSONRPCError, ValueError, TypeError, json.JSONDecodeError, AttributeError):
# Expected exceptions for invalid input
pass
except Exception as e:
pytest.fail(f"Unexpected exception: {type(e).__name__}: {e}")
Configuration: Set testing intensity via environment variables:
HYPOTHESIS_PROFILE=dev # 100 examples (default)
HYPOTHESIS_PROFILE=ci # 50 examples (fast)
HYPOTHESIS_PROFILE=thorough # 1000 examples (comprehensive)
Coverage-Guided Fuzzing (Atheris)¶
Uses libfuzzer to instrument code and guide input generation toward unexplored code paths.
Fuzzer Scripts: - tests/fuzz/fuzzers/fuzz_jsonpath.py
- JSONPath expression fuzzing - tests/fuzz/fuzzers/fuzz_jsonrpc.py
- JSON-RPC message fuzzing - tests/fuzz/fuzzers/fuzz_config_parser.py
- Configuration parsing fuzzing
Setup Requirements: Atheris requires clang and libfuzzer to be installed:
# Install LLVM/Clang (one-time setup)
git clone --depth=1 https://github.com/llvm/llvm-project.git
cd llvm-project
cmake -DLLVM_ENABLE_PROJECTS='clang;compiler-rt' -G "Unix Makefiles" -S llvm -B build
cmake --build build --parallel $(nproc)
# Set environment and install
export CLANG_BIN="$(pwd)/bin/clang"
pip install -e .[fuzz-atheris]
Running Atheris:
# Manual execution with custom parameters
python tests/fuzz/fuzzers/fuzz_jsonpath.py -runs=10000 -max_total_time=300
API Schema Fuzzing (Schemathesis)¶
Tests API endpoints by generating requests based on OpenAPI schema definitions.
Features: - Validates API contracts automatically - Tests authentication flows - Verifies response schemas - Discovers endpoint-specific edge cases
Manual Setup: API fuzzing requires a running server instance:
# Terminal 1: Start server
make dev
# Terminal 2: Run API fuzzing
source $(VENV_DIR)/bin/activate
schemathesis run http://localhost:4444/openapi.json \
--checks all \
--auth admin:changeme \
--hypothesis-max-examples=500
Security-Focused Testing¶
Tests resistance to common security vulnerabilities and attack patterns.
Test Categories: - SQL Injection: Tests input sanitization in database queries - XSS Prevention: Validates output encoding and CSP headers - Path Traversal: Tests file access controls - Command Injection: Validates command execution safeguards - Authentication Bypass: Tests auth mechanism robustness - DoS Protection: Validates rate limiting and resource constraints
Example Security Test:
@given(st.text(min_size=1, max_size=1000))
def test_sql_injection_resistance(self, malicious_input):
"""Test resistance to SQL injection in various fields."""
sql_patterns = [
malicious_input,
f"'; DROP TABLE tools; --",
f"' OR '1'='1",
f"'; INSERT INTO tools (name) VALUES ('hacked'); --",
]
for pattern in sql_patterns:
response = client.post("/admin/tools", json={
"name": pattern,
"url": "http://example.com"
}, headers={"Authorization": "Basic YWRtaW46Y2hhbmdlbWU="})
# Should not crash or allow injection
assert response.status_code in [200, 201, 400, 401, 422]
Makefile Targets¶
Target | Purpose | Dependencies | Use Case |
---|---|---|---|
fuzz-install | Install fuzzing dependencies | Virtual environment | One-time setup |
fuzz-all | Complete fuzzing suite | fuzz-install | Full validation |
fuzz-hypothesis | Property-based testing | fuzz-install | Core logic validation |
fuzz-atheris | Coverage-guided fuzzing | clang/libfuzzer | Deep exploration |
fuzz-api | API endpoint fuzzing | Running server | Contract validation |
fuzz-restler | RESTler API fuzzing (instructions) | Docker or local RESTler | Stateful/sequence fuzzing |
fuzz-restler-auto | Run RESTler via Docker automatically | Docker, running server | Automated stateful fuzzing |
fuzz-security | Security vulnerability testing | fuzz-install | Security validation |
fuzz-quick | Fast fuzzing for CI | fuzz-install | PR validation |
fuzz-extended | Extended fuzzing | fuzz-install | Nightly testing |
fuzz-report | Generate reports | fuzz-install | Analysis |
fuzz-clean | Clean artifacts | None | Maintenance |
Test Execution Modes¶
Development Mode¶
For interactive development and debugging:
make fuzz-hypothesis # Run with statistics and detailed output
make fuzz-security # Security tests with warnings
CI/CD Mode¶
For automated testing in continuous integration:
Comprehensive Mode¶
For thorough testing in nightly builds:
RESTler Fuzzing¶
RESTler performs stateful, sequence-based fuzzing of REST APIs using the OpenAPI/Swagger specification. It's ideal for discovering bugs that require specific call sequences.
Option A: Docker (recommended)¶
Prerequisites: Docker installed and the gateway running locally.
# Terminal 1: Start the server
make dev
# Terminal 2: Generate/OpenAPI and run RESTler via Docker
curl -sSf http://localhost:4444/openapi.json -o reports/restler/openapi.json
docker run --rm -v "$PWD/reports/restler:/workspace" \
ghcr.io/microsoft/restler restler compile --api_spec /workspace/openapi.json
docker run --rm -v "$PWD/reports/restler:/workspace" \
ghcr.io/microsoft/restler restler test --grammar_dir /workspace/Compile --no_ssl --time_budget 5
# Results are written to reports/restler
You can print these instructions anytime with:
Option A2: Automated Docker runner¶
Use the helper that waits for the server, downloads the spec, then compiles and runs RESTler in Docker:
# Terminal 1: Start the server
make dev
# Terminal 2: Run automated RESTler fuzzing
make fuzz-restler-auto
# Optional environment variables:
# MCPFUZZ_BASE_URL (default: http://localhost:4444)
# MCPFUZZ_AUTH_HEADER (e.g., "Authorization: Basic YWRtaW46Y2hhbmdlbWU=")
# MCPFUZZ_TIME_BUDGET (minutes, default: 5)
# MCPFUZZ_NO_SSL (1 to pass --no_ssl; default: 1)
Notes: - If Docker is not present, fuzz-restler-auto
will print a friendly message and exit successfully (use make fuzz-restler
for manual steps). This behavior avoids CI failures on runners without Docker. - Artifacts are written under reports/restler/
.
Option B: Local install¶
Follow RESTler's official installation guide, set RESTLER_HOME
, then:
export RESTLER_HOME=/path/to/restler
curl -sSf http://localhost:4444/openapi.json -o reports/restler/openapi.json
"$RESTLER_HOME"/restler compile --api_spec reports/restler/openapi.json
"$RESTLER_HOME"/restler test --grammar_dir Compile --no_ssl --time_budget 5
Notes: - Ensure the server exposes http://localhost:4444/openapi.json
. - For authenticated specs, supply tokens/headers to RESTler as needed. - Increase --time_budget
for deeper exploration in nightly runs. - In CI, prefer running fuzz-restler-auto
only on runners with Docker available, or skip otherwise.
Understanding Results¶
Test Outcomes¶
Passing Tests: Inputs handled correctly without crashes Failing Tests: Unexpected exceptions or crashes discovered Skipped Tests: Tests requiring external dependencies (auth, servers)
Hypothesis Statistics¶
Hypothesis provides detailed statistics about test execution:
- during generate phase (1.86 seconds):
- Typical runtimes: ~ 15-16 ms, of which < 1ms in data generation
- 100 passing examples, 0 failing examples, 0 invalid examples
- Stopped because settings.max_examples=100
Bug Discovery¶
When fuzzing finds issues, it provides: - Minimal failing example: Simplified input that reproduces the bug - Seed for reproduction: Run with --hypothesis-seed=X
to reproduce - Call stack: Exact location where the failure occurred
Example failure:
Falsifying example: test_validate_request_handles_text_input(
self=<TestJSONRPCRequestFuzzing>,
text_input='null'
)
Writing Fuzz Tests¶
Property-Based Test Structure¶
from hypothesis import given, strategies as st
import pytest
class TestMyComponentFuzzing:
@given(st.text(min_size=1, max_size=100))
def test_component_never_crashes(self, input_text):
"""Test that component handles arbitrary text input."""
try:
result = my_component.process(input_text)
# Verify expected properties
assert isinstance(result, (str, dict, list))
except (ValueError, TypeError):
# Expected exceptions for invalid input
pass
except Exception as e:
pytest.fail(f"Unexpected exception: {type(e).__name__}: {e}")
Atheris Fuzzer Structure¶
#!/usr/bin/env python3
import atheris
import sys
import os
# Ensure project is in path
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '../../..'))
from mcpgateway.my_module import my_function
def TestOneInput(data: bytes) -> None:
"""Fuzz target for my_function."""
fdp = atheris.FuzzedDataProvider(data)
try:
if fdp.remaining_bytes() < 1:
return
# Generate test input
test_input = fdp.ConsumeUnicodeNoSurrogates(100)
# Test function (should never crash)
my_function(test_input)
except (ValueError, TypeError):
# Expected exceptions
pass
except Exception:
# Unexpected - let Atheris catch it
raise
def main():
atheris.instrument_all()
atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()
if __name__ == "__main__":
main()
Security Test Patterns¶
@given(st.text().filter(lambda x: any(char in x for char in '<>"\'&')))
def test_xss_prevention(self, potentially_malicious):
"""Test XSS prevention in user inputs."""
response = client.post("/api/endpoint", json={
"field": potentially_malicious
}, headers={"Authorization": "Basic YWRtaW46Y2hhbmdlbWU="})
# Should handle malicious content safely
assert response.status_code in [200, 201, 400, 401, 422]
# Raw script tags should not appear unescaped
if "<script>" in potentially_malicious.lower():
assert "<script>" not in response.text.lower()
Common Strategies¶
Input Generation Strategies¶
import hypothesis.strategies as st
# Basic types
st.text() # Unicode strings
st.integers() # Integers
st.binary() # Raw bytes
st.booleans() # True/False
# Structured data
st.dictionaries(
keys=st.text(min_size=1),
values=st.integers()
)
st.lists(st.text(), max_size=10)
# Custom strategies
st.one_of(st.none(), st.text(), st.integers()) # Union types
# Filtered strategies (use sparingly)
st.text().filter(lambda x: '$' in x)
Common Edge Cases to Test¶
JSON-RPC Validation: - Empty objects: {}
- Non-objects: null
, []
, "string"
, 123
- Missing required fields - Invalid field types - Very large payloads
JSONPath Processing: - Invalid expressions: $..
, $[
, $.
- Very long expressions - Unicode characters - Special characters that break parsing
API Endpoints: - Malformed JSON payloads - Missing authentication headers - Invalid content types - Very large request bodies - Concurrent requests
Troubleshooting¶
Common Issues¶
Import Errors:
Solution: Runmake fuzz-install
first Authentication Failures:
Solution: Security tests expect auth failures when testing in isolationFilter Warnings:
Solution: Useassume()
instead of .filter()
or disable health check Performance Tuning¶
Slow Tests: - Reduce max_examples
for development - Use HYPOTHESIS_PROFILE=ci
for faster execution - Add @settings(timeout=timedelta(seconds=10))
for time limits
Memory Issues: - Limit recursive data structure depth - Use max_leaves
parameter in recursive strategies - Monitor corpus size growth
Debugging Failed Tests¶
Reproduce Failures:
# Use seed from failed test output
pytest --hypothesis-seed=12345 tests/fuzz/test_my_module.py::test_function
Debug Mode:
@settings(verbosity=Verbosity.verbose)
@given(st.text())
def test_with_debug(self, input_text):
print(f"Testing with: {repr(input_text)}") # Add debug output
# ... test logic
Integration with CI/CD¶
GitHub Actions Example¶
name: Fuzz Testing
on:
pull_request:
branches: [main]
schedule:
- cron: '0 2 * * *' # Nightly
jobs:
fuzz-quick:
name: Quick Fuzzing
runs-on: ubuntu-latest
if: github.event_name == 'pull_request'
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.12"
- run: make fuzz-quick
fuzz-extended:
name: Extended Fuzzing
runs-on: ubuntu-latest
if: github.event_name == 'schedule'
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.12"
- run: make fuzz-extended
- run: make fuzz-report
- uses: actions/upload-artifact@v4
with:
name: fuzz-reports
path: reports/
Pre-commit Hooks¶
Add fuzzing to pre-commit pipeline:
# .pre-commit-config.yaml
repos:
- repo: local
hooks:
- id: fuzz-quick
name: Quick Fuzz Testing
entry: make fuzz-quick
language: system
pass_filenames: false
stages: [pre-push]
Best Practices¶
Test Design¶
- Focus on invariants: Test properties that should always hold
- Expect the expected: Handle known exception types gracefully
- Fail on unexpected: Use
pytest.fail()
for truly unexpected errors - Use examples: Add
@example()
decorators for known edge cases
Input Strategies¶
- Start broad: Use general strategies like
st.text()
initially - Narrow gradually: Add constraints based on domain knowledge
- Avoid over-filtering: Use
assume()
instead of.filter()
when possible - Test boundaries: Include empty, very large, and edge case inputs
Security Testing¶
- Test defensively: Assume all input is potentially malicious
- Verify sanitization: Check that dangerous content is properly escaped
- Test authentication: Verify auth requirements are properly enforced
- Monitor responses: Ensure error messages don't leak sensitive information
Real Issues Found¶
Our fuzzing implementation has already discovered several real bugs:
JSON-RPC Validation Crashes¶
Issue: validate_request()
crashes with AttributeError
when given non-dict inputs.
Root Cause: Function assumes input is always a dictionary and calls .get()
method.
Examples that crash: - json.loads("null")
β None
β None.get("jsonrpc")
crashes - json.loads("0")
β 0
β 0.get("jsonrpc")
crashes - json.loads("[]")
β []
β [].get("jsonrpc")
crashes
Fix Applied: Added type checking in fuzz tests to only validate dict inputs.
Schema Validation Edge Cases¶
Issue: Pydantic schemas accept broader input types than expected.
Examples: - AuthenticationValues(auth_type="")
accepts empty strings - ToolCreate(input_schema=None)
allows None values - Various unicode and special character handling inconsistencies
Directory Structure¶
tests/fuzz/ # Fuzz testing directory
βββ conftest.py # Pytest configuration and markers
βββ test_jsonrpc_fuzz.py # JSON-RPC validation tests
βββ test_jsonpath_fuzz.py # JSONPath processing tests
βββ test_schema_validation_fuzz.py # Pydantic schema tests
βββ test_api_schema_fuzz.py # API endpoint tests
βββ test_security_fuzz.py # Security vulnerability tests
βββ fuzzers/ # Atheris coverage-guided fuzzers
β βββ fuzz_jsonpath.py # JSONPath expression fuzzer
β βββ fuzz_jsonrpc.py # JSON-RPC message fuzzer
β βββ fuzz_config_parser.py # Configuration parser fuzzer
βββ scripts/
βββ generate_fuzz_report.py # Report generation utility
# Generated artifacts (gitignored)
corpus/ # Test case corpus
βββ jsonpath/ # JSONPath test cases
βββ jsonrpc/ # JSON-RPC test cases
βββ api/ # API request test cases
reports/ # Generated reports
βββ fuzz-report.json # Machine-readable report
βββ fuzz-report.md # Human-readable report
Advanced Usage¶
Custom Strategies¶
Create domain-specific input generators:
# JSON-RPC message strategy
jsonrpc_request = st.fixed_dict({
"jsonrpc": st.just("2.0"),
"method": st.text(min_size=1, max_size=50),
"id": st.one_of(st.integers(), st.text(), st.none())
}, optional={
"params": st.one_of(
st.dictionaries(st.text(), st.text()),
st.lists(st.text())
)
})
@given(jsonrpc_request)
def test_with_valid_structure(self, request):
validate_request(request)
Corpus Management¶
Build and maintain test case collections:
# Generate corpus from successful fuzzing runs
python tests/fuzz/fuzzers/fuzz_jsonpath.py \
-runs=10000 \
-artifact_prefix=corpus/jsonpath/
# Use corpus for regression testing
python tests/fuzz/fuzzers/fuzz_jsonpath.py \
corpus/jsonpath/* \
-runs=0 # Only test existing corpus
Performance Monitoring¶
Track fuzzing performance over time:
@settings(deadline=timedelta(milliseconds=500))
@given(st.text())
def test_performance_regression(self, input_text):
"""Ensure processing stays within performance bounds."""
start_time = time.time()
my_function(input_text)
duration = time.time() - start_time
assert duration < 0.1, f"Processing took {duration}s, expected < 0.1s"
Reporting and Analysis¶
Generated Reports¶
The make fuzz-report
command generates comprehensive reports:
JSON Report (reports/fuzz-report.json
): - Machine-readable results for CI integration - Tool execution statistics - Failure counts and error categorization - Corpus and coverage metrics
Markdown Report (reports/fuzz-report.md
): - Human-readable executive summary - Tool-by-tool breakdown - Recommendations for action - Links to detailed artifacts
Interpreting Results¶
Green (β ): No crashes or security issues found Yellow (β οΈ): Partial results or configuration issues Red (π¨): Critical issues requiring immediate attention
Example Report Summary:
π― Overall Status: β
PASS
π§ Tools Completed: 4/4
π¨ Critical Issues: 0
π‘ Key Recommendations:
β
No critical issues found in fuzzing
π Continue regular fuzzing as part of CI/CD
π Review detailed results for optimization opportunities
Maintenance¶
Regular Tasks¶
- Update corpus: Add new interesting test cases discovered during development
- Review failures: Investigate and fix any new crashes discovered
- Tune performance: Adjust example counts based on CI time constraints
- Update strategies: Enhance input generation as code evolves
Corpus Hygiene¶
# Clean up old artifacts
make fuzz-clean
# Regenerate corpus with latest code
make fuzz-atheris
# Verify corpus quality
python tests/fuzz/scripts/generate_fuzz_report.py
References¶
- Hypothesis Documentation - Property-based testing guide
- Atheris Documentation - Coverage-guided fuzzing
- Schemathesis Documentation - API schema testing
- OWASP Fuzzing Guide - Security fuzzing practices
- Property-Based Testing - Testing philosophy and examples