# Welcome to Flexo
The Flexo Agent Library is a powerful and flexible codebase that enables users to configure, customize, and deploy a generative AI agent. Designed for adaptability, the library can be tailored to a wide range of use cases, from conversational AI to specialized automation.
## Why Flexo?
- Simplified Deployment: Deploy anywhere with comprehensive platform guides
- Production Ready: Built for scalability and reliability
- Extensible: Add custom tools and capabilities
- Well Documented: Clear guides for every step
## Key Features
- Configurable Agent: YAML-based configuration for custom behaviors
- Tool Integration: Execute Python functions and REST API calls (see the sketch after this list)
- Streaming Support: Real-time streaming with pattern detection
- Production Ready: Containerized deployment support with logging
- FastAPI Backend: Modern async API with comprehensive docs
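
To give a flavor of what a custom Python tool can look like, here is a minimal sketch. The `WeatherTool` class and its `execute` signature are hypothetical stand-ins, not Flexo's actual tool interface; consult the Code Reference below for the real contract.

```python
from dataclasses import dataclass

# Hypothetical tool shape standing in for Flexo's tool interface;
# see the Code Reference section for the actual base class and registry.
@dataclass
class WeatherTool:
    name: str = "get_weather"
    description: str = "Look up current weather for a city."

    def execute(self, city: str) -> dict:
        # A real tool would call a weather API here; this stub keeps
        # the example self-contained and runnable.
        return {"city": city, "temp_f": 72, "conditions": "sunny"}

tool = WeatherTool()
print(tool.execute("Austin"))
```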
## Supported LLM Providers

### ☁️ Cloud Providers

- OpenAI: GPT-powered models
- watsonx.ai: Enterprise AI solutions
- Anthropic: Claude family models
- xAI: Grok and beyond
- Mistral AI: Efficient open models
### 🖥️ Local & Self-Hosted Options

- vLLM: High-throughput serving
- Ollama: Easy local LLMs
- llama.cpp: Optimized C++ runtime
- LM Studio: User-friendly interface
- LocalAI: Self-hosted versatility
### ⚙️ Unified Configuration Interface
Switch providers effortlessly with Flexo's adapter layer. Customize your LLM settings in one place:
```yaml
gpt-4o:
  provider: "openai"    # Choose your provider
  model: "gpt-4o"       # Select specific model
  temperature: 0.7
  max_tokens: 4000      # Additional model-specific parameters
```
Need more details? Check our comprehensive Model Configuration Guide for provider-specific settings and optimization tips.
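
To illustrate how such an entry might be consumed at runtime, here is a minimal sketch using PyYAML. The `models.yaml` path and the `load_model_config` helper are illustrative only, not part of Flexo's API:

```python
import yaml  # pip install pyyaml

def load_model_config(path: str, name: str) -> dict:
    """Read a YAML model registry and return the settings for one entry."""
    with open(path, "r", encoding="utf-8") as fh:
        registry = yaml.safe_load(fh)
    return registry[name]

# Example: pick the provider and parameters for the gpt-4o entry above.
cfg = load_model_config("models.yaml", "gpt-4o")
print(cfg["provider"], cfg["model"], cfg.get("temperature"))
```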
## Quick Start Guide

### 1. Local Development
Start developing with Flexo locally: clone the repository, configure your models, and run the FastAPI server.
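
Once the server is running, you can exercise it from any HTTP client. A minimal sketch using `requests`; the host, port, route, and payload shape are assumptions based on the OpenAI-style Chat Completions Router shown in the architecture below, so check your deployment for the actual endpoint:

```python
import requests

# Assumes a local Flexo server with an OpenAI-style chat completions route;
# localhost:8000 and the path below are illustrative.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "What's the weather in Austin?"}],
        "stream": False,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```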
### 2. Production Deployment
Deploy Flexo to your preferred platform:
| Platform | Best For | Guide |
|---|---|---|
| IBM Code Engine | Serverless, pay-per-use | Deploy → |
| AWS Fargate | AWS integration | Deploy → |
| OpenShift | Enterprise, hybrid cloud | Deploy → |
| Kubernetes | Custom infrastructure | Deploy → |
## Documentation

- Deployment Guides
- Code Reference
## System Architecture
```mermaid
graph TB
    Client[Client] --> API[FastAPI Server]

    subgraph API["API Layer"]
        Router[Chat Completions Router]
        SSE[SSE Models]
        Validation[Request Validation]
    end

    subgraph Agent["Agent Layer"]
        ChatAgent[Streaming Chat Agent]
        State[State Management]
        Config[Configuration]
    end

    subgraph LLM["LLM Layer"]
        Factory[LLM Factory]
        Detection[Tool Detection]
        Builders[Prompt Builders]
        subgraph Adapters["LLM Adapters"]
            WatsonX
            OpenAI
            Anthropic
            Mistral
        end
    end

    subgraph Tools["Tools Layer"]
        Registry[Tool Registry]
        REST[REST Tools]
        NonREST["Non-REST Tools: Python etc"]
        ExampleTools["Included Example Tools: RAG + Weather"]
    end

    subgraph Database["Database Layer"]
        ES[Elasticsearch]
        Milvus[Milvus]
    end

    API --> Agent
    Agent --> LLM
    Agent --> Tools
    Tools --> Database

    style API stroke:#ff69b4,stroke-width:4px
    style Agent stroke:#4169e1,stroke-width:4px
    style LLM stroke:#228b22,stroke-width:4px
    style Tools stroke:#cd853f,stroke-width:4px
    style Database stroke:#4682b4,stroke-width:4px
    style Router stroke:#ff69b4,stroke-width:2px
    style SSE stroke:#ff69b4,stroke-width:2px
    style Validation stroke:#ff69b4,stroke-width:2px
    style ChatAgent stroke:#4169e1,stroke-width:2px
    style State stroke:#4169e1,stroke-width:2px
    style Config stroke:#4169e1,stroke-width:2px
    style Factory stroke:#228b22,stroke-width:2px
    style Detection stroke:#228b22,stroke-width:2px
    style Builders stroke:#228b22,stroke-width:2px
    style Adapters stroke:#228b22,stroke-width:2px
    style WatsonX stroke:#228b22,stroke-width:2px
    style OpenAI stroke:#228b22,stroke-width:2px
    style Anthropic stroke:#228b22,stroke-width:2px
    style Mistral stroke:#228b22,stroke-width:2px
    style Registry stroke:#cd853f,stroke-width:2px
    style ExampleTools stroke:#cd853f,stroke-width:2px
    style REST stroke:#cd853f,stroke-width:2px
    style NonREST stroke:#cd853f,stroke-width:2px
    style ES stroke:#4682b4,stroke-width:2px
    style Milvus stroke:#4682b4,stroke-width:2px
    style Client stroke:#333,stroke-width:2px
```
## Chat Agent State Flow

```mermaid
stateDiagram-v2
    [*] --> STREAMING: Initialize Agent
    STREAMING --> TOOL_DETECTION: Tool Call Found
    STREAMING --> COMPLETING: Generation Done
    TOOL_DETECTION --> EXECUTING_TOOLS: Process Tool Call
    EXECUTING_TOOLS --> INTERMEDIATE: Tool Execution Complete
    INTERMEDIATE --> STREAMING: Continue Generation
    INTERMEDIATE --> COMPLETING: Max Iterations
    COMPLETING --> [*]: End Response

    note right of STREAMING
        Main generation state
        Handles LLM responses
    end note

    note right of EXECUTING_TOOLS
        Concurrent tool execution
        Error handling
    end note
```
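
To make the flow concrete, here is a minimal Python sketch of the same loop. The state names mirror the diagram, while `generate`, `detect_tool_call`, and `run_tools` are placeholder callables, not Flexo's actual internals:

```python
from enum import Enum, auto

class AgentState(Enum):
    STREAMING = auto()
    TOOL_DETECTION = auto()
    EXECUTING_TOOLS = auto()
    INTERMEDIATE = auto()
    COMPLETING = auto()

def run_agent(generate, detect_tool_call, run_tools, max_iterations: int = 3):
    """Drive the state machine from the diagram above (illustrative only)."""
    state, iterations = AgentState.STREAMING, 0
    while True:
        if state is AgentState.STREAMING:
            chunk = generate()
            state = (AgentState.TOOL_DETECTION if detect_tool_call(chunk)
                     else AgentState.COMPLETING)
        elif state is AgentState.TOOL_DETECTION:
            state = AgentState.EXECUTING_TOOLS
        elif state is AgentState.EXECUTING_TOOLS:
            run_tools()  # tools may run concurrently; errors are handled here
            state = AgentState.INTERMEDIATE
        elif state is AgentState.INTERMEDIATE:
            iterations += 1
            state = (AgentState.COMPLETING if iterations >= max_iterations
                     else AgentState.STREAMING)
        else:  # COMPLETING ends the response
            return

# Toy usage: one tool round, then finish.
calls = iter([True, False])
run_agent(generate=lambda: "hello",
          detect_tool_call=lambda _: next(calls),
          run_tools=lambda: None)
```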
## Contributing
See our Contributing Guide for details.
## Security
For security concerns, please review our Security Policy.