Model Configuration Guide

This guide provides detailed information on configuring language models for use with the Flexo Framework. Model configurations are specified in the models_config section of src/configs/agent.yaml.

Table of Contents

  • Configuration Structure
  • Supported Model Providers
  • Configuration Parameters
  • Environment Variables
  • Model Selection in Agent Configuration
  • Advanced Configurations
  • Examples

Configuration Structure

The model configuration in the models_config section of agent.yaml follows this basic structure:

models_config:
  model_name:
    vendor: provider_name
    model_id: model_identifier
    # Additional parameters...

Each model entry consists of:

  • A unique model_name, which is referenced in your agent configuration
  • A vendor identifier that determines which adapter to use
  • A model_id that specifies the actual model to use from the vendor
  • Additional parameters specific to the provider or model
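
For example, a minimal working entry might look like this (the entry name my_gpt4o is an illustrative label, not a required name):

models_config:
  my_gpt4o:
    vendor: openai
    model_id: gpt-4o
    temperature: 0.7
    max_tokens: 1024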

Supported Model Providers

Cloud Providers

| Provider | Vendor Key | API Endpoints | Environment Variables |
|----------|------------|---------------|-----------------------|
| OpenAI | openai | /chat/completions | OPENAI_API_KEY |
| Anthropic | anthropic | /messages | ANTHROPIC_API_KEY |
| xAI | xai | /chat/completions | XAI_API_KEY |
| Mistral AI | mistral-ai | /chat/completions | MISTRAL_API_KEY |
| IBM WatsonX | watsonx-llama, watsonx-granite, watsonx-mistral | /text/chat_stream, /text/generation_stream | WATSONX_API_KEY, WATSONX_PROJECT_ID |

Self-Hosted Options

The openai-compat adapter types support any API that implements OpenAI-compatible endpoints:

| Implementation | Vendor Key | Default Base URL |
|----------------|------------|------------------|
| vLLM | openai-compat | http://localhost:8000/v1 |
| Ollama | openai-compat | http://localhost:11434/v1 |
| LLaMA.cpp | openai-compat | http://localhost:8080/v1 |
| LM Studio | openai-compat | http://localhost:1234/v1 |
| LocalAI | openai-compat | http://localhost:8080/v1 |
| Text Generation WebUI | openai-compat | http://localhost:5000/v1 |
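
For instance, pointing Flexo at an LM Studio server only requires overriding base_url (a sketch; the entry name and model_id are placeholders for whatever model the server has loaded):

lmstudio_local:
  vendor: openai-compat
  model_id: your-model-name
  base_url: http://localhost:1234/v1
  api_key: lm-studio  # many local servers accept any placeholder key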

Configuration Parameters

Common Parameters

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| model_id | string | Identifier for the specific model to use | Required |
| vendor | string | Provider identifier (see tables above) | Required |
| temperature | float | Controls randomness in generation (0.0-1.0) | 0.7 |
| max_tokens | integer | Maximum tokens to generate | 1024 |
| top_p | float | Alternative to temperature for nucleus sampling (0.0-1.0) | 1.0 |
| stop | array | Sequences that stop generation when produced | [] |
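
As a sketch of how these combine in practice (the entry name and stop strings below are illustrative, not defaults):

summary_model:
  vendor: openai
  model_id: gpt-4o
  temperature: 0.2       # lower values give more deterministic output
  max_tokens: 512
  top_p: 1.0             # leave at 1.0 when steering with temperature
  stop: ["###", "END"]   # generation halts when either string is produced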

Provider-Specific Parameter Examples

OpenAI

gpt-4o:
  vendor: openai
  model_id: gpt-4o
  temperature: 0.7
  max_tokens: 4096
  top_p: 1.0
  presence_penalty: 0.0  # OpenAI-specific
  frequency_penalty: 0.0  # OpenAI-specific

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| presence_penalty | float | Penalty for token presence (-2.0 to 2.0) | 0.0 |
| frequency_penalty | float | Penalty for token frequency (-2.0 to 2.0) | 0.0 |

Anthropic

claude-35:
  vendor: anthropic
  model_id: claude-3-5-sonnet-latest
  temperature: 0.5
  max_tokens: 4096
  top_p: 0.9
  top_k: 50  # Anthropic-specific

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| top_k | integer | Limits token selection to the top K options | 50 |

xAI

grok-2:
  vendor: xai
  model_id: grok-2-latest
  temperature: 0.7
  max_tokens: 4096
  top_p: 0.95
  base_url: https://api.x.ai/v1  # Optional override

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| base_url | string | API endpoint URL | https://api.x.ai/v1 |

IBM WatsonX

granite-8b:
  vendor: watsonx-granite
  model_id: ibm/granite-3-8b-instruct
  temperature: 0.5
  max_tokens: 1024
  time_limit: 60  # WatsonX-specific

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| time_limit | integer | Maximum generation time in seconds | 60 |

Mistral AI

mistral-large:
  vendor: mistral-ai
  model_id: mistral-large-latest
  temperature: 0.7
  max_tokens: 4096
  safe_prompt: true  # Mistral-specific

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| safe_prompt | boolean | Enable content filtering | true |

OpenAI-Compatible

local-model:
  vendor: openai-compat  # or openai-compat-granite, openai-compat-llama, etc.
  model_id: your-model-name
  base_url: http://localhost:8000/v1
  api_key: dummy-key
  temperature: 0.7
  max_tokens: 2048

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| base_url | string | The API endpoint URL | Required |
| api_key | string | API key to send; many local servers accept any placeholder value | Required |

Environment Variables

The following environment variables are used for API authentication:

| Provider | Environment Variable | Required |
|----------|----------------------|----------|
| OpenAI | OPENAI_API_KEY | Yes |
| Anthropic | ANTHROPIC_API_KEY | Yes |
| xAI | XAI_API_KEY | Yes |
| Mistral AI | MISTRAL_API_KEY | Yes |
| IBM WatsonX | WATSONX_API_KEY, WATSONX_PROJECT_ID | Yes |
| OpenAI-Compatible | Varies (can be set in config) | Depends |

Model Selection in Agent Configuration

In your agent.yaml file, you specify which model to use:

models_config:
  main_chat_model:
    vendor: openai
    model_id: gpt-4o
    # Parameters...

You can also configure the agent to use different models for different purposes:

models_config:
  main_chat_model:
    vendor: anthropic
    model_id: claude-3-5-sonnet-latest
    # Primary model parameters...

  watsonx_granite:
    vendor: watsonx-granite
    model_id: ibm/granite-3-8b-instruct
    # Additional model instance parameters

Advanced Configurations

Vendor-Specific Prompt Builders

Flexo automatically selects the appropriate prompt builder based on the vendor:

| Vendor | Prompt Builder |
|--------|----------------|
| openai | OpenAIPromptBuilder |
| anthropic | AnthropicPromptBuilder |
| mistral-ai | MistralAIPromptBuilder |
| watsonx-granite | WatsonXGranitePromptBuilder |
| watsonx-llama | WatsonXLlamaPromptBuilder |
| watsonx-mistral | WatsonXMistralPromptBuilder |
| openai-compat-granite | OpenAICompatGranitePromptBuilder |
| openai-compat-llama | OpenAICompatLlamaPromptBuilder |
| xai | XAIPromptBuilder |
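
Because selection keys off the vendor value alone, routing a self-hosted Granite model through the Granite prompt builder is just a matter of choosing the matching vendor key. A sketch, reusing the Ollama defaults from above (the entry and model names are illustrative):

granite_local:
  vendor: openai-compat-granite  # selects OpenAICompatGranitePromptBuilder
  model_id: granite31
  base_url: http://localhost:11434/v1
  api_key: ollama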

Tool Detection Modes

The detection_mode in your agent configuration affects how tool calls are detected:

  • vendor: Uses the provider's native tool-calling capabilities (recommended for cloud providers)
  • manual: Uses Flexo's pattern-matching for tool calls (useful for local models)

detection_mode: vendor  # or 'manual'
use_vendor_chat_completions: true  # or false
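
For instance, a self-hosted deployment would typically pair manual detection with an OpenAI-compatible model. A sketch, assuming these keys sit at the top level of agent.yaml as in the snippet above:

detection_mode: manual              # parse tool calls from generated text
use_vendor_chat_completions: false  # use Flexo's own prompt construction
models_config:
  main_chat_model:
    vendor: openai-compat-llama
    model_id: meta-llama/Llama-3.1-8B-Instruct
    base_url: http://localhost:8000/v1
    api_key: dummy-key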

Examples

OpenAI GPT-4o

gpt-4o:
  vendor: openai
  model_id: gpt-4o
  temperature: 0.7
  max_tokens: 4096
  presence_penalty: 0.0
  frequency_penalty: 0.0

Anthropic Claude

claude:
  vendor: anthropic
  model_id: claude-3-5-sonnet-latest
  temperature: 0.7
  max_tokens: 4096
  top_p: 0.9

xAI

grok:
  vendor: xai
  model_id: grok-2-latest
  temperature: 0.7
  max_tokens: 4096

Mistral AI

mistral:
  vendor: mistral-ai
  model_id: mistral-large-latest
  temperature: 0.7
  max_tokens: 4096

IBM WatsonX (Llama)

watsonx-llama:
  vendor: watsonx-llama
  model_id: meta-llama/llama-3-405b-instruct
  decoding_method: greedy
  max_tokens: 4000
  temperature: 0.7

IBM WatsonX (Granite)

watsonx-granite:
  vendor: watsonx-granite
  model_id: ibm/granite-3-8b-instruct
  decoding_method: greedy
  max_tokens: 4000
  temperature: 0.7

vLLM (Local Deployment)

vllm-local:
  vendor: openai-compat-llama
  model_id: meta-llama/Llama-3.1-8B-Instruct
  base_url: http://localhost:8000/v1
  api_key: dummy-key
  temperature: 0.7
  max_tokens: 2048

Ollama (Local Deployment)

ollama-local:
  vendor: openai-compat-granite
  model_id: granite31  # or any model name in Ollama
  base_url: http://localhost:11434/v1
  api_key: ollama
  temperature: 0.7
  max_tokens: 2048