Model Mesh Tool - Complete Guide
A comprehensive guide to the Model Mesh Tool, a configurable small-model mesh MCP tool that routes prompts to specialized models based on task type.
Table of Contents
- Overview
- Concept
- Architecture
- Provider Communication
- Task-to-Model Mapping
- Configuration
- Usage Examples
- Response Format
- Troubleshooting
Overview
The Model Mesh Tool implements a small-model mesh architecture where different models are configured for specific tasks:
- Guardian tasks → Content safety and moderation models (e.g., ibm/granite3.3-guardian:8b)
- Vision tasks → Vision-specialized models (e.g., ibm/granite3.2-vision)
- Speech recognition tasks → Speech models (e.g., whisper)
- Text tasks → General-purpose text models (e.g., llama2)
Models are accessed via configurable providers (LiteLLM by default, Ollama as alternative).
Features
- ✅ Task-based model routing
- ✅ JSON prompt configuration file support
- ✅ Direct prompt input support
- ✅ Configurable model providers (LiteLLM, Ollama)
- ✅ Provider adapter pattern for extensibility
- ✅ Configurable temperature and max_tokens
- ✅ Model override capability
- ✅ Template variable substitution
- ✅ Graceful error handling (warns but doesn't fail if models unavailable)
Prerequisites
Install providers:
```bash
pip install litellm   # For LiteLLM provider (default)
pip install ollama    # For Ollama provider (optional)
```

Install and run Ollama:
```bash
# Install Ollama (if not already installed)
# Visit: https://ollama.ai/

# Start the Ollama server
ollama serve

# Pull the required models
ollama pull ibm/granite3.3-guardian:8b
ollama pull ibm/granite3.2-vision
```
Concept
What is a Model Mesh?
A Model Mesh is an architectural pattern that routes different types of tasks to specialized models optimized for those specific tasks. Instead of using a single large general-purpose model for everything, a model mesh uses multiple smaller, specialized models, each optimized for a particular domain.
Why Use a Model Mesh?
Specialization: Each model is optimized for its specific task type
- Guardian models are trained for content safety and moderation
- Vision models are optimized for image understanding
- Text models handle language tasks efficiently
- Speech models excel at audio processing
Cost Efficiency: Smaller specialized models are often more cost-effective than large general-purpose models
- Run models locally (e.g., via Ollama)
- Use smaller models that are faster and cheaper
- Only load/use models when needed
Performance: Specialized models often perform better at their specific tasks
- Guardian models provide better safety assessments
- Vision models have better image understanding
- Task-specific optimizations improve accuracy
Flexibility: Mix and match models based on your needs
- Use different providers for different models
- Configure models independently
- Easy to add or remove models
Model Mesh vs. Single Model
Single Model Approach:
User Request → Large General Model → Response

- One model handles all tasks
- Higher resource usage
- May not be optimal for specialized tasks
- Single point of failure
Model Mesh Approach:
User Request → Task Router → Specialized Model → Response
↓
(guardian/vision/text/speech)

- Multiple specialized models
- Optimal for each task type
- Better resource utilization
- Graceful degradation (if one model fails, others still work)
Key Concepts
Task-Based Routing
The core concept is task-based routing: when a request comes in, the system identifies the task type (guardian, vision, text, speech) and routes it to the appropriate specialized model.
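As a rough illustration (not the tool's internal code), routing amounts to a lookup keyed by a normalized task name; the TASK_MODELS mapping and route_task helper below are hypothetical:

```python
# Minimal sketch of task-based routing; names are illustrative, not the tool's internals.
TASK_MODELS = {
    "guardian": "ibm/granite3.3-guardian:8b",
    "vision": "ibm/granite3.2-vision",
    "speech": "whisper",
    "text": "llama2",
}

def route_task(task: str) -> str:
    """Return the model configured for a task type (case-insensitive lookup)."""
    model = TASK_MODELS.get(task.lower())
    if model is None:
        raise ValueError(f"No model configured for task '{task}'")
    return model

print(route_task("Vision"))   # -> ibm/granite3.2-vision
```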
Provider Abstraction
Models can be accessed through different providers (LiteLLM, Ollama, etc.), allowing flexibility in how models are accessed while maintaining a consistent interface.
Configuration-Driven
The model mesh is configuration-driven: you define which models handle which tasks through a simple JSON configuration, making it easy to change models without code changes.
Graceful Degradation
The system is designed for graceful degradation: if a model is unavailable, the system warns but continues to work with available models, rather than failing completely.
Architecture
System Architecture
The Model Mesh Tool follows a layered architecture with clear separation of concerns:
┌─────────────────────────────────────────────────────────┐
│ User Request │
│ (task, prompt, options) │
└────────────────────┬────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ ModelMeshTool.run() │
│ - Validates input parameters │
│ - Extracts task type │
│ - Gets prompt (direct or from template) │
└────────────────────┬────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Task Type Detection & Routing │
│ - Identifies task: guardian/vision/text/speech │
│ - Looks up model configuration │
│ - Handles model override if provided │
└────────────────────┬────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Model Configuration Lookup │
│ - Retrieves model name from model_config │
│ - Determines provider (litellm/ollama) │
│ - Gets provider-specific options │
└────────────────────┬────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Provider Adapter Selection │
│ - Gets or creates provider adapter │
│ - Caches adapters for reuse │
│ - Handles provider fallback if needed │
└────────────────────┬────────────────────────────────────┘
↓
┌────────────┴────────────┐
↓ ↓
┌───────────────┐ ┌───────────────┐
│ LiteLLM │ │ Ollama │
│ Adapter │ │ Adapter │
│ │ │ │
│ - Formats │ │ - Direct │
│ model name │ │ API calls │
│ - Calls │ │ - Supports │
│ litellm │ │ think=True │
│ API │ │ - AsyncClient │
└───────┬───────┘ └───────┬───────┘
↓ ↓
┌─────────────────────────────────────────┐
│ Ollama API / Model Backend │
│ - Local Ollama server │
│ - Remote model APIs │
└────────────────────┬────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Specialized Model │
│ - Guardian: ibm/granite3.3-guardian:8b │
│ - Vision: ibm/granite3.2-vision / llava │
│ - Text: llama2 / other text models │
│ - Speech: whisper │
└────────────────────┬────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Response Processing │
│ - Extracts model response │
│ - Verifies model used │
│ - Adds metadata (capability, usage, etc.) │
│ - Formats as ToolResult │
└────────────────────┬────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Response │
│ - Model output │
│ - Capability information │
│ - Usage statistics │
└─────────────────────────────────────────────────────────┘

Component Architecture
1. ModelMeshTool (Main Orchestrator)
- Responsibility: Routes requests to appropriate models
- Key Methods:
  - run(): Main entry point for tool execution
  - _get_model_config_for_task(): Retrieves model configuration
  - _get_prompt(): Handles prompt templates and variables
  - _validate_model_configs(): Validates configuration at startup
2. Provider Adapter Pattern
The Model Mesh Tool uses a provider adapter pattern for flexibility:
┌─────────────────────────────────────┐
│ ModelProviderAdapter (Abstract) │
│ - chat() │
│ - is_available() │
│ - get_provider_name() │
└──────────────┬──────────────────────┘
│
┌───────┴────────┐
↓ ↓
┌──────────────┐ ┌──────────────┐
│ LiteLLM │ │ Ollama │
│ Adapter │ │ Adapter │
│ │ │ │
│ - Uses │ │ - Uses │
│ litellm │ │ ollama- │
│ library │ │ python │
│ - Formats: │ │ - Direct │
│ ollama/ │ │ model name │
│ model_name │ │ - Supports │
│ │ │ think=True │
└──────────────┘  └──────────────┘

Components:
- Base Adapter Interface: ModelProviderAdapter - Abstract base class defining the interface
- LiteLLM Adapter: LiteLLMAdapter - Uses the LiteLLM library for unified model access
- Ollama Adapter: OllamaAdapter - Uses the ollama-python library directly for Ollama-specific features
- Provider Factory: ModelProviderFactory - Creates and manages adapter instances
Benefits:
- Easy addition of new providers (OpenAI, Anthropic, etc.)
- Provider-specific feature access (e.g., think=True for Ollama Guardian models)
- Graceful fallback between providers
- Consistent interface across all providers
- Adapter caching for performance
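A minimal sketch of the pattern is shown below. The class names (ModelProviderAdapter, LiteLLMAdapter, OllamaAdapter) and method names (chat, is_available, get_provider_name) come from the components listed above, but the bodies are illustrative assumptions, not the tool's source:

```python
from abc import ABC, abstractmethod


class ModelProviderAdapter(ABC):
    """Common interface every provider adapter implements (sketch)."""

    @abstractmethod
    async def chat(self, model: str, messages: list, **options): ...

    @abstractmethod
    def is_available(self) -> bool: ...

    @abstractmethod
    def get_provider_name(self) -> str: ...


class LiteLLMAdapter(ModelProviderAdapter):
    def get_provider_name(self) -> str:
        return "litellm"

    def is_available(self) -> bool:
        try:
            import litellm  # noqa: F401
            return True
        except ImportError:
            return False

    async def chat(self, model: str, messages: list, **options):
        import litellm
        # LiteLLM addresses Ollama-hosted models as "ollama/<model_name>"
        return await litellm.acompletion(model=f"ollama/{model}", messages=messages, **options)


class OllamaAdapter(ModelProviderAdapter):
    def __init__(self, base_url: str = "http://localhost:11434"):
        self.base_url = base_url

    def get_provider_name(self) -> str:
        return "ollama"

    def is_available(self) -> bool:
        try:
            import ollama  # noqa: F401
            return True
        except ImportError:
            return False

    async def chat(self, model: str, messages: list, think: bool = False, options: dict | None = None):
        from ollama import AsyncClient
        # Mirrors the call shown in "Provider Communication": think is a top-level
        # argument; sampling settings such as temperature go inside `options`.
        return await AsyncClient(host=self.base_url).chat(
            model=model, messages=messages, think=think, options=options or {}
        )
```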
3. Configuration Management
┌─────────────────────────────────────┐
│ Configuration Sources │
│ │
│ 1. model_config (dict) │
│ - Task → Model mapping │
│ - Provider selection │
│ - Model-specific options │
│ │
│ 2. prompt_config_path (JSON file) │
│ - Prompt templates │
│ - Variable substitution │
│ - Task type associations │
│ │
│ 3. Runtime parameters │
│ - temperature, max_tokens │
│ - model_override │
└─────────────────────────────────────┘

4. Request Flow Details
Step 1: Input Validation
- Validates task parameter (required)
- Ensures either prompt or prompt_key is provided
- Validates temperature and max_tokens ranges
Step 2: Prompt Resolution
- If prompt is provided: use it directly
- If prompt_key is provided: load the template from the JSON config
- Substitute variables if prompt_variables is provided (see the sketch below)
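A hedged sketch of what prompt resolution can look like, assuming Python-style {placeholder} templates as shown later in the Prompt Configuration File section; the resolve_prompt helper is hypothetical:

```python
import json

def resolve_prompt(prompt: str | None, prompt_key: str | None,
                   prompt_variables: dict | None, config_path: str) -> str:
    """Sketch of prompt resolution: a direct prompt wins, else load and fill a template."""
    if prompt is not None:
        return prompt
    with open(config_path) as f:
        templates = json.load(f)
    template = templates[prompt_key]["template"]
    # Fill {placeholders} such as {content} with the supplied variables
    return template.format(**(prompt_variables or {}))

# Example:
# resolve_prompt(None, "guardian_content_safety",
#                {"content": "Hello"}, "./example/model_mesh_prompts.json")
# -> "Analyze the following content for safety: Hello"
```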
Step 3: Model Selection
- Look up the task in model_config
- Handle model_override if provided
- Extract the model name, provider, and options
Step 4: Provider Adapter
- Get or create adapter for provider
- Cache adapter for reuse
- Handle provider fallback if needed
Step 5: Model Call
- Prepare provider-specific parameters
- Call the adapter's chat() method
- Handle provider-specific features (e.g., think=True)
Step 6: Response Processing
- Extract model response
- Verify model used matches requested
- Add metadata (capability, usage, etc.)
- Format as ToolResult
Error Handling Architecture
The system implements multi-level error handling:
- Initialization Level: Validates configuration, warns about unavailable providers/models
- Runtime Level: Graceful error responses instead of exceptions
- Provider Level: Handles connection errors, model not found, etc.
- Response Level: Includes error details and suggestions in response
This ensures the tool continues to work even if some models are unavailable.
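As a rough sketch of the runtime-level behaviour (an assumption about shape, not the tool's actual implementation), an unavailable model can be caught and reported as a structured error result instead of raising:

```python
# Sketch of graceful runtime error handling (assumed shape, not the tool's code)
async def call_model_safely(adapter, model: str, messages: list, **options) -> dict:
    try:
        response = await adapter.chat(model=model, messages=messages, **options)
        return {"status": "success", "model": model, "response": response}
    except Exception as exc:  # e.g. connection refused, model not pulled
        return {
            "status": "error",
            "model": model,
            "error": str(exc),
            "suggestion": f"Check that the provider is running and that '{model}' "
                          f"is available (e.g. `ollama pull {model}`).",
        }
```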
Provider Communication
Provider Selection
The tool supports two providers:
1. LiteLLM Provider (Default)
- Library: litellm
- Format: ollama/model_name
- Usage: litellm.acompletion(model="ollama/llava", ...)
- Pros: Unified interface, supports many providers
- Cons: May not support all provider-specific features
2. Ollama-Python Provider (Direct)
- Library: ollama (from ollama-python)
- Format: Direct model name (e.g., ibm/granite3.3-guardian:8b)
- Usage: AsyncClient().chat(model="ibm/granite3.3-guardian:8b", think=True, ...)
- Pros: Full access to Ollama-specific features (think, streaming, etc.)
- Cons: Ollama-specific only
How Communication Works
When Provider is "ollama"
When provider: "ollama" is specified, the tool uses the ollama-python library directly via AsyncClient:
# Internal implementation
from ollama import AsyncClient
client = AsyncClient(host=base_url)
response = await client.chat(
model="ibm/granite3.3-guardian:8b",
think=True, # From config options
messages=[
{
"role": "user",
"content": prompt
}
],
options={
"temperature": 0 # From config options
}
)

When Provider is "litellm"
When provider: "litellm" is specified (or default), the tool uses LiteLLM:
# Internal implementation
import litellm
response = await litellm.acompletion(
model="ollama/llava",
messages=[
{
"role": "user",
"content": prompt
}
],
temperature=temperature,
max_tokens=max_tokens
)

Options Priority
Provider-specific options are merged in priority order:
- Runtime arguments (highest priority) - from the tool.run() call
- Model config options - from model_config[task]["options"]
- Default values (lowest priority) - tool defaults
Example:
# Config
{
"guardian": {
"options": {"temperature": 0, "think": True}
}
}
# Runtime call
await tool.run({
"task": "guardian",
"temperature": 0.1 # Overrides config
})
# Final options: {"temperature": 0.1, "think": True}
# (temperature from runtime, think from config)

Task-to-Model Mapping
Mapping Architecture
User Request
↓
tool.run({"task": "vision", ...})
↓
Task Parameter Extraction ("vision")
↓
Lookup in model_config Dictionary
↓
model_config = {
"vision": {...}, ← Matches task="vision"
"guardian": {...},
"text": {...}
}
↓
Get Model Configuration
↓
Route to Appropriate Model & Provider

How the Mapping Works
The model_config dictionary maps task names to model configurations:
model_config = {
"vision": { # Task name
"model": "ibm/granite3.2-vision", # Model to use
"provider": "ollama",
"options": {...}
},
"guardian": { # Task name
"model": "ibm/granite3.3-guardian:8b", # Model to use
"provider": "ollama",
"options": {
"think": True,
"temperature": 0
}
},
"text": { # Task name
"model": "llama2", # Model to use
"provider": "litellm"
}
}

Routing Logic
When you call the tool:
await tool.run({
"task": "vision", # ← This determines which model to use
"prompt": "..."
})

The tool:
- Extracts the task parameter: "vision"
- Looks up model_config.get("vision") → finds the model configuration
- Routes to that model with the specified provider and options
Task Name Convention
Task names are case-insensitive and can be any string. Common conventions:
- Task type: "vision", "speech", "text", "guardian"
- Model-specific: "granite-vision", "granite-guardian"
- Use case: "image-analysis", "safety-check", "transcription"
# All of these work the same way:
await tool.run({"task": "vision", ...})
await tool.run({"task": "Vision", ...}) # Case-insensitive
await tool.run({"task": "VISION", ...})  # Case-insensitive

Model Override
You can override the model selection at runtime:
# Config has: "vision" → "llava"
# Runtime override:
await tool.run({
"task": "vision",
"model_override": "ibm/granite3.2-vision", # Overrides config
"prompt": "..."
})

Configuration
Model Configuration Structure
Simple Configuration (Backward Compatible)
model_config = {
"vision": "llava",
"speech": "whisper",
"text": "llama2"
}

This uses LiteLLM by default with the format ollama/model_name.
Enhanced Configuration (Provider-Based)
{
"guardian": {
"model": "ibm/granite3.3-guardian:8b",
"provider": "ollama",
"base_url": "http://localhost:11434",
"options": {
"think": true,
"temperature": 0
}
},
"vision": {
"model": "ibm/granite3.2-vision",
"provider": "ollama",
"base_url": "http://localhost:11434"
},
"text": {
"model": "llama2",
"provider": "litellm",
"base_url": "http://localhost:11434"
}
}

Basic Setup
from mcp_composer.core.tools.model_mesh_tool import ModelMeshTool
model_mesh_tool = ModelMeshTool({
"name": "model_mesh",
"prompt_config_path": "./example/model_mesh_prompts.json",
"model_config": {
"guardian": {
"model": "ibm/granite3.3-guardian:8b",
"provider": "ollama",
"options": {
"think": True,
"temperature": 0
}
},
"vision": {
"model": "ibm/granite3.2-vision",
"provider": "ollama"
}
},
"base_url": "http://localhost:11434",
"default_provider": "litellm"
})

Prompt Configuration File
Create a JSON file with prompt templates:
{
"guardian_content_safety": {
"template": "Analyze the following content for safety: {content}",
"description": "Template for content safety assessment",
"task_type": "guardian"
},
"vision_analysis": {
"template": "Analyze the following image: {image_description}. Focus on: {focus_areas}",
"description": "Template for vision analysis",
"task_type": "vision"
}
}

Configuration Fields
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | No | Tool name (default: "model_mesh") |
| prompt_config_path | string | No | Path to JSON file with prompt templates |
| model_config | dict | Yes | Dictionary mapping task types to model configurations |
| base_url | string | No | Base URL for model API (default: http://localhost:11434) |
| default_provider | string | No | Default provider to use (default: "litellm") |
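Based on the defaults in the table above, a minimal setup might look like the following; only model_config is strictly required (this is a sketch, the exact constructor behaviour may differ):

```python
from mcp_composer.core.tools.model_mesh_tool import ModelMeshTool

# Minimal setup: only model_config is required; name, base_url and default_provider
# fall back to "model_mesh", http://localhost:11434 and "litellm" respectively.
minimal_tool = ModelMeshTool({
    "model_config": {
        "text": "llama2",    # simple string form → default provider (litellm)
        "vision": "llava",
    }
})
```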
Model Config Structure
Each task in model_config can be:
Simple string (backward compatible):
```python
"vision": "llava"  # Uses default provider (litellm)
```

Dictionary with provider:

```python
"guardian": {
    "model": "ibm/granite3.3-guardian:8b",   # Required
    "provider": "ollama",                    # Required: "litellm" or "ollama"
    "base_url": "http://localhost:11434",    # Optional
    "options": {                             # Optional, model-specific options
        "think": True,
        "temperature": 0
    }
}
```
Usage Examples
Using Prompt Templates from JSON
result = await model_mesh_tool.run({
"task": "guardian",
"prompt_key": "guardian_content_safety",
"prompt_variables": {
"content": "Hello, this is a test message"
}
})

Using Direct Prompts
result = await model_mesh_tool.run({
"task": "guardian",
"prompt": "Analyze this content for safety: 'Hello world'",
"temperature": 0
})

Overriding Model Selection
result = await model_mesh_tool.run({
"task": "vision",
"prompt": "Describe this image.",
"model_override": "llava:13b" # Use specific model variant
})

Complete Example: Guardian Model
import asyncio
from mcp_composer.core.tools.model_mesh_tool import ModelMeshTool
async def main():
# Configure with Guardian model using ollama provider
tool = ModelMeshTool({
"name": "model_mesh",
"model_config": {
"guardian": {
"model": "ibm/granite3.3-guardian:8b",
"provider": "ollama",
"base_url": "http://localhost:11434",
"options": {
"think": True,
"temperature": 0
}
}
}
})
# Use the tool
result = await tool.run({
"task": "guardian",
"prompt": "hello world"
})
# Print result
print(result.content[0].text)
if __name__ == "__main__":
    asyncio.run(main())

This will internally call:
from ollama import AsyncClient
client = AsyncClient(host="http://localhost:11434")
response = await client.chat(
model="ibm/granite3.3-guardian:8b",
think=True,
messages=[{"role": "user", "content": "hello world"}],
options={"temperature": 0}
)

Integration with MCP Composer
from mcp_composer import MCPComposer
from mcp_composer.core.tools.model_mesh_tool import ModelMeshTool
composer = MCPComposer("my-composer")
composer.disable_composer_tool() # Optional
model_mesh_tool = ModelMeshTool({
"name": "model_mesh",
"prompt_config_path": "./config/prompts.json",
"model_config": {
"guardian": {
"model": "ibm/granite3.3-guardian:8b",
"provider": "ollama",
"options": {"think": True, "temperature": 0}
},
"vision": {
"model": "ibm/granite3.2-vision",
"provider": "ollama"
}
}
})
composer.add_tool(model_mesh_tool)
await composer.setup_member_servers()

Tool Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| task | string | Yes | Task type (e.g., 'guardian', 'vision', 'text', 'speech'). Determines which model will be used. |
| prompt | string | No* | Direct prompt text |
| prompt_key | string | No* | Key to look up in prompt config file |
| prompt_variables | object | No | Variables for template substitution |
| model_override | string | No | Override model selection |
| temperature | number | No | Generation temperature (0.0-2.0, default: 0.7) |
| max_tokens | integer | No | Max tokens to generate (default: 1000) |
*Either prompt or prompt_key must be provided.
Task Parameter Details
The task parameter is critical - it determines which model and capability is used:
- 'guardian': Use for content safety, toxicity detection, policy compliance, risk assessment, and content moderation. Examples: 'check if content is safe', 'is this toxic?', 'does this comply with policy?', 'assess risk level', 'should I moderate this?'
- 'vision': Use for image analysis, object detection, and visual content analysis. Examples: 'analyze this image', 'what objects are in this scene?'
- 'text': Use for text summarization, question answering, and general text processing. Examples: 'summarize this text', 'answer this question based on context'
- 'speech': Use for audio transcription and speech sentiment analysis. Examples: 'transcribe this audio', 'analyze speech sentiment'
IMPORTANT: For content safety, toxicity checks, moderation, or policy compliance, ALWAYS use task='guardian', NOT task='text'.
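For example, a toxicity check should go through the guardian task rather than text; following the Usage Examples above, such a call might look like:

```python
# Content-safety check: route through the guardian task, not the text task
result = await model_mesh_tool.run({
    "task": "guardian",   # NOT "text"
    "prompt": "Is the following message toxic? 'You are an idiot.'",
    "temperature": 0
})
```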
Response Format
The tool returns a JSON response with comprehensive information:
{
"status": "success",
"task": "guardian",
"model": "ibm/granite3.3-guardian:8b",
"provider": "ollama",
"prompt": "Analyze this content for safety: 'Hello world'",
"response": "Model response text...",
"capability": {
"task_type": "guardian",
"description": "Content Safety & Moderation - Analyzes content for safety, toxicity, policy compliance, and risk assessment",
"prompt_template": "guardian_content_safety",
"model_used": "ibm/granite3.3-guardian:8b",
"provider_used": "ollama"
},
"usage": {
"prompt_tokens": 50,
"completion_tokens": 100,
"total_tokens": 150
},
"system_prompt": "...",
"guardrails": [...]
}

Response Fields
- status: "success" or "error"
- task: The task type that was used
- model: The model name that was used/attempted
- provider: The provider that was used/attempted
- capability: Object containing:
  - task_type: The task type
  - description: Human-readable description of the capability
  - prompt_template: Which prompt template was used (or "direct_prompt")
  - model_used / model_attempted: The model that was used or attempted
  - provider_used / provider_attempted: The provider that was used or attempted
- response: The model's response text (if successful)
- usage: Token usage information (if available)
- error: Error message (if status is "error")
- suggestion: Helpful suggestions for fixing errors
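A hedged sketch of how a caller might consume this payload, assuming the JSON is returned as the text of the first content item (as in the Guardian example above):

```python
import json

# Assumes the tool result carries the JSON payload as text (as in the Guardian example)
payload = json.loads(result.content[0].text)

if payload["status"] == "success":
    print(f"[{payload['model']} via {payload['provider']}] {payload['response']}")
    usage = payload.get("usage", {})
    print(f"Tokens used: {usage.get('total_tokens', 'n/a')}")
else:
    print(f"Error: {payload['error']}")
    print(f"Suggestion: {payload.get('suggestion', '-')}")
```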
Troubleshooting
Model Not Found
If you get an error about a model not being found:
- Ensure Ollama is running: ollama serve
- Check available models: ollama list
- Pull the required model: ollama pull <model_name>
Note: The tool will show warnings but continue to work with available models. If a model is unavailable, you'll get a graceful error message with suggestions.
Provider Not Available
Install required packages:
```bash
pip install litellm  # For LiteLLM provider
pip install ollama   # For Ollama provider
```

Check provider availability:

```python
from mcp_composer.core.tools.model_providers import ModelProviderFactory

print(ModelProviderFactory.is_provider_available("litellm"))
print(ModelProviderFactory.is_provider_available("ollama"))
```
Prompt Config Not Found
- Ensure the path to prompt_config_path is correct (relative or absolute)
- Check that the JSON file is valid
Wrong Task Type Selected
If Claude is selecting the wrong task type (e.g., using "text" for toxicity checks):
- Check tool description - The tool description should clearly indicate that safety/moderation tasks use "guardian"
- Verify configuration - Ensure guardian model is configured correctly
- Review prompts - Use natural language that clearly indicates the task type
Graceful Error Handling
The tool is designed to handle unavailable models gracefully:
- During initialization: Shows warnings but doesn't fail
- During execution: Returns helpful error messages instead of crashing
- Model validation: Checks model configurations and warns about issues
- Provider fallback: Attempts to fall back to default provider if configured provider is unavailable
Benefits of Provider-Based Architecture
- Flexibility: Choose the best provider for each model
- Feature Access: Use provider-specific features (e.g., think=True for Guardian)
- Performance: Direct provider access can be more efficient
- Backward Compatible: Simple string configs still work
- Extensible: Easy to add new providers (OpenAI, Anthropic, etc.)
- Graceful Degradation: Works even if some models are unavailable
Future Enhancements
- Support for more providers (OpenAI, Anthropic, etc.)
- Provider-specific authentication
- Connection pooling per provider
- Provider health checks
- Automatic failover between providers
Best Practices
- Use descriptive task names: "vision", "guardian", "text" are clearer than "task1", "task2"
- Group related models: Keep related models in the same tool configuration
- Document your mappings: Add comments explaining which model handles which task
- Use consistent naming: Follow a naming convention (e.g., all lowercase, kebab-case)
- Configure providers appropriately: Use the Ollama provider for models that need Ollama-specific features (like think=True)
- Handle errors gracefully: The tool will warn about unavailable models - check logs for warnings
Complete Flow Diagram
┌─────────────────────────────────────────────────────────┐
│ User Call: tool.run({"task": "guardian", ...}) │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Extract task parameter: "guardian" │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Lookup in model_config: │
│ model_config.get("guardian") │
│ → {"model": "ibm/granite3.3-guardian:8b", ...} │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Get model configuration: │
│ - model: "ibm/granite3.3-guardian:8b" │
│ - provider: "ollama" │
│ - options: {"think": True, "temperature": 0} │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Get provider adapter via ModelProviderFactory │
│ → Creates OllamaAdapter │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Call adapter.chat() │
│ → AsyncClient().chat( │
│ model="ibm/granite3.3-guardian:8b", │
│ think=True, │
│ options={"temperature": 0} │
│ ) │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Format response with capability information │
│ - model_used │
│ - provider_used │
│ - capability description │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Return response │
└─────────────────────────────────────────────────────────┘

Key Points Summary
- Task-Based Routing: The task parameter determines which model configuration to use
- Provider Adapters: Uses the adapter pattern for flexible provider support
- Graceful Degradation: Works even if some models are unavailable (shows warnings)
- Capability Information: Response includes which model and capability was used
- Options Merging: Runtime options override config options
- Backward Compatible: Simple string configs still work (defaults to LiteLLM)
- Natural Language: Tool description guides Claude to use correct task types
