# Model Configuration Guide

This guide provides detailed information on configuring language models for use with the Flexo Framework. Model configurations are specified in the `models_config` section of `src/configs/agent.yaml`.
## Table of Contents

- [Configuration Structure](#configuration-structure)
- [Supported Model Providers](#supported-model-providers)
- [Configuration Parameters](#configuration-parameters)
- [Environment Variables](#environment-variables)
- [Model Selection in Agent Configuration](#model-selection-in-agent-configuration)
- [Advanced Configurations](#advanced-configurations)
- [Examples](#examples)
## Configuration Structure

The model configuration in `agent.yaml` follows this basic structure:

```yaml
models_config:
  model_name:
    vendor: provider_name
    model_id: model_identifier
    # Additional parameters...
```
Each model entry consists of:

- A unique `model_name`, which will be referenced in your agent configuration
- A `vendor` identifier that determines which adapter to use
- A `model_id` that specifies the actual model to use from the vendor
- Additional parameters specific to the provider or model
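For instance, a minimal sketch of an entry (the entry name `demo_model` is hypothetical) needs only the first three pieces; any of the optional generation parameters described under Common Parameters can be added alongside them:

```yaml
models_config:
  demo_model:         # arbitrary name, referenced elsewhere in agent.yaml
    vendor: openai    # selects the OpenAI adapter
    model_id: gpt-4o  # model identifier understood by the vendor
```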
## Supported Model Providers

### Cloud Providers

The cloud vendors covered in this guide are OpenAI (`openai`), Anthropic (`anthropic`), xAI (`xai`), Mistral AI (`mistral-ai`), and IBM watsonx (`watsonx-granite`, `watsonx-llama`). Parameter examples for each follow below.

### Self-Hosted Options

The `openai-compat` adapter types (`openai-compat`, `openai-compat-granite`, `openai-compat-llama`) support any API that implements OpenAI-compatible endpoints, such as vLLM or Ollama; see the OpenAI-Compatible example below and the local deployment examples at the end of this guide.
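Because these adapters only assume the OpenAI wire format, several named entries can point at the same local endpoint. The following is a hedged sketch; the entry names and model names are hypothetical and assume an Ollama-style server that hosts multiple models:

```yaml
models_config:
  local_granite:
    vendor: openai-compat-granite
    model_id: granite3.1-dense          # hypothetical name exposed by the local server
    base_url: http://localhost:11434/v1
    api_key: ollama                     # local servers typically ignore the key but require one
  local_llama:
    vendor: openai-compat-llama
    model_id: llama3.1                  # hypothetical
    base_url: http://localhost:11434/v1
    api_key: ollama
```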
## Configuration Parameters

### Common Parameters

| Parameter | Description |
|-----------|-------------|
| `vendor` | Adapter to use for the provider (e.g., `openai`, `anthropic`, `watsonx-granite`) |
| `model_id` | Identifier of the model as known to the provider |
| `temperature` | Sampling temperature; higher values produce more varied output |
| `max_tokens` | Maximum number of tokens to generate |
| `top_p` | Nucleus-sampling probability mass |
| `base_url` | Override for the API endpoint (required for self-hosted deployments) |
| `api_key` | API key; local servers usually accept a placeholder value |

### Provider-Specific Parameter Examples
#### OpenAI

```yaml
gpt-4o:
  vendor: openai
  model_id: gpt-4-turbo
  temperature: 0.7
  max_tokens: 4096
  top_p: 1.0
  presence_penalty: 0.0   # OpenAI-specific
  frequency_penalty: 0.0  # OpenAI-specific
```
#### Anthropic

```yaml
claude-35:
  vendor: anthropic
  model_id: claude-3-opus-20240229
  temperature: 0.5
  max_tokens: 4096
  top_p: 0.9
  top_k: 50  # Anthropic-specific
```
#### xAI

```yaml
grok-2:
  vendor: xai
  model_id: grok-2-latest
  temperature: 0.7
  max_tokens: 4096
  top_p: 0.95
  base_url: https://api.x.ai/v1  # Optional override
```
#### IBM watsonx

```yaml
granite-8b:
  vendor: watsonx-granite
  model_id: ibm/granite-3-8b-instruct
  temperature: 0.5
  max_tokens: 1024
  time_limit: 60  # watsonx-specific
```
#### Mistral AI

```yaml
mistral-large:
  vendor: mistral-ai
  model_id: mistral-large-latest
  temperature: 0.7
  max_tokens: 4096
  safe_prompt: true  # Mistral-specific
```
#### OpenAI-Compatible

```yaml
local-model:
  vendor: openai-compat  # or openai-compat-granite, openai-compat-llama, etc.
  model_id: your-model-name
  base_url: http://localhost:8000/v1
  api_key: dummy-key
  temperature: 0.7
  max_tokens: 2048
```
## Environment Variables

API authentication for the cloud providers is handled through environment variables; each provider reads its key from its own variable rather than from `agent.yaml`.
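As an illustrative sketch only (the variable names below follow each vendor's usual convention and are assumptions rather than confirmed Flexo settings), the keys could be passed to a containerized deployment like this:

```yaml
# docker-compose.yml (sketch): set only the variables for the providers you use
services:
  flexo:
    image: flexo:latest                            # hypothetical image tag
    environment:
      OPENAI_API_KEY: ${OPENAI_API_KEY}            # OpenAI
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}      # Anthropic
      XAI_API_KEY: ${XAI_API_KEY}                  # xAI
      MISTRAL_API_KEY: ${MISTRAL_API_KEY}          # Mistral AI
      WATSONX_API_KEY: ${WATSONX_API_KEY}          # IBM watsonx (name assumed)
      WATSONX_PROJECT_ID: ${WATSONX_PROJECT_ID}    # IBM watsonx project (name assumed)
```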
## Model Selection in Agent Configuration

In your `agent.yaml` file, you specify which model to use:

```yaml
models_config:
  main_chat_model:
    vendor: openai
    model_id: gpt-4o
    # Parameters...
```

You can also configure the agent to use different models for different purposes:

```yaml
models_config:
  main_chat_model:
    vendor: anthropic
    model_id: claude-3-5-sonnet-latest
    # Primary model parameters...
  watsonx_granite:
    vendor: watsonx-granite
    model_id: ibm/granite-3-8b-instruct
    # Additional model instance parameters
```
## Advanced Configurations

### Vendor-Specific Prompt Builders

Flexo automatically selects the appropriate prompt builder based on the configured vendor.

The `detection_mode` in your agent configuration affects how tool calls are detected:

- `vendor`: uses the provider's native tool-calling capabilities (recommended for cloud providers)
- `manual`: uses Flexo's pattern-matching for tool calls (useful for local models)

```yaml
detection_mode: vendor  # or 'manual'
use_vendor_chat_completions: true  # or false
```
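As a hedged sketch (assuming, as the snippet above suggests, that `detection_mode` and `use_vendor_chat_completions` sit at the top level of `agent.yaml` alongside `models_config`), a self-hosted model might be paired with manual tool-call detection like this:

```yaml
models_config:
  main_chat_model:
    vendor: openai-compat-llama
    model_id: meta-llama/Llama-3.2-8B-Instruct
    base_url: http://localhost:8000/v1
    api_key: dummy-key
    temperature: 0.7
    max_tokens: 2048

detection_mode: manual                # pattern-matching tool-call detection
use_vendor_chat_completions: false    # assumed pairing for self-hosted models
```

For cloud providers, the `vendor` / `true` combination shown earlier is the recommended counterpart.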
## Examples

### OpenAI GPT-4

```yaml
gpt-4:
  vendor: openai
  model_id: gpt-4o
  temperature: 0.7
  max_tokens: 4096
  presence_penalty: 0.0
  frequency_penalty: 0.0
```
### Anthropic Claude

```yaml
claude:
  vendor: anthropic
  model_id: claude-3-5-sonnet-latest
  temperature: 0.7
  max_tokens: 4096
  top_p: 0.9
```
### xAI

```yaml
grok:
  vendor: xai
  model_id: grok-2-latest
  temperature: 0.7
  max_tokens: 4096
```
### Mistral AI

```yaml
mistral:
  vendor: mistral-ai
  model_id: mistral-large-latest
  temperature: 0.7
  max_tokens: 4096
```
### IBM watsonx (Llama)

```yaml
watsonx-llama:
  vendor: watsonx-llama
  model_id: meta-llama/llama-3-405b-instruct
  decoding_method: greedy
  max_tokens: 4000
  temperature: 0.7
```
### IBM watsonx (Granite)

```yaml
watsonx-granite:
  vendor: watsonx-granite
  model_id: ibm/granite-3-8b-instruct
  decoding_method: greedy
  max_tokens: 4000
  temperature: 0.7
```
### vLLM (Local Deployment)

```yaml
vllm-local:
  vendor: openai-compat-llama
  model_id: meta-llama/Llama-3.2-8B-Instruct
  base_url: http://localhost:8000/v1
  api_key: dummy-key
  temperature: 0.7
  max_tokens: 2048
```
### Ollama (Local Deployment)

```yaml
ollama-local:
  vendor: openai-compat-granite
  model_id: granite31  # or any model name available in Ollama
  base_url: http://localhost:11434/v1
  api_key: ollama
  temperature: 0.7
  max_tokens: 2048
```