Architecture Overview¶
ai4rag is designed as a modular RAG optimization engine with clear separation of concerns. This document provides a high-level overview of the system architecture.
Design Principles¶
Provider Agnostic¶
ai4rag is agnostic to LLM and vector database providers. It integrates with various backends through:
- Llama Stack: Primary integration for models and vector stores
- Pluggable Components: Foundation models, embeddings, and vector stores are abstracted
Template-Based Approach¶
The core concept is RAG Templates vs RAG Patterns:
- RAG Template: A RAG implementation with uninitialized parameters
- RAG Pattern: A RAG implementation with optimal parameter values (output of optimization)
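The template/pattern distinction can be sketched in plain Python. The class and field names below are illustrative, not ai4rag's actual API: a template leaves its parameters open, and instantiating it with concrete values yields a pattern.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RAGPattern:
    # A fully specified RAG implementation (output of optimization).
    chunk_size: int
    chunk_overlap: int
    embedding_model: str

@dataclass
class RAGTemplate:
    # A RAG implementation with uninitialized parameters.
    chunk_size: Optional[int] = None       # chosen later by the optimizer
    chunk_overlap: Optional[int] = None
    embedding_model: Optional[str] = None

    def instantiate(self, **params) -> RAGPattern:
        merged = {**self.__dict__, **params}
        return RAGPattern(**merged)

template = RAGTemplate()
pattern = template.instantiate(
    chunk_size=512, chunk_overlap=64, embedding_model="all-MiniLM-L6-v2"
)
```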
Optimization-Driven¶
Hyperparameter optimization is the central workflow:
- Define a search space of possible configurations
- Use HPO algorithms to explore configurations
- Evaluate each configuration using metrics
- Return the best-performing configuration
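The four steps above can be sketched as a minimal search loop. This is an illustration only, with exhaustive grid search standing in for the real HPO algorithms and a toy scoring function standing in for metric computation:

```python
from itertools import product

# Hypothetical search space with two parameters.
search_space = {
    "chunk_size": [256, 512, 1024],
    "number_of_chunks": [3, 5, 10],
}

def evaluate(config: dict) -> float:
    # Stand-in for running the RAG pipeline and computing metrics;
    # here, configurations near chunk_size=512 score highest.
    return 1.0 / (abs(config["chunk_size"] - 512) + config["number_of_chunks"])

best_config, best_score = None, float("-inf")
for values in product(*search_space.values()):
    config = dict(zip(search_space.keys(), values))
    score = evaluate(config)
    if score > best_score:
        best_config, best_score = config, score
# best_config now holds the best-performing configuration found.
```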
High-Level Architecture¶
```mermaid
graph TB
    A[Documents]
    B[Benchmark Data]
    C[Search Space Definition]
    D[Experiment Engine]
    E[HPO Optimizer]
    subgraph X[RAG Pattern]
        G[Chunking]
        H[Embedding]
        I[Vector Store]
        J[Retrieval]
        K[Generation]
    end
    M[Evaluation & Metrics Computation]
    N[Best RAG Pattern]
    O[Results Artifacts]
    P[Events Callbacks]
    A --> D
    B --> D
    C --> D
    E <--> D
    D --> X
    G --> H
    H --> I
    I --> J
    J --> K
    X --> M
    M --> E
    D --> N
    D --> O
    D --> P
```

Component Layers¶
1. Core Layer¶
Experiment Engine (ai4rag/core/experiment/)
- Orchestrates the entire optimization workflow
- Manages experiment lifecycle (setup, execution, teardown)
- Coordinates between HPO, RAG components, and evaluation
- Produces results and artifacts
HPO Optimizers (ai4rag/core/hpo/)
- Implements optimization algorithms
- Explores search space intelligently
- Suggests next configurations to evaluate
- Tracks optimization progress
2. Search Space Layer¶
Search Space (ai4rag/search_space/)
- Defines parameter ranges and types
- Validates configurations against rules
- Provides parameter sampling
- Enforces constraints (e.g., chunk_size > chunk_overlap)
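A constraint check like the one above can be sketched in a few lines. The function below is hypothetical; the real validation rules live in ai4rag/search_space/:

```python
def is_valid(config: dict) -> bool:
    # A configuration is only evaluated if its chunks overlap by less
    # than their full size, i.e. chunk_size > chunk_overlap.
    return config["chunk_size"] > config["chunk_overlap"]

valid = is_valid({"chunk_size": 512, "chunk_overlap": 64})     # True
invalid = is_valid({"chunk_size": 128, "chunk_overlap": 256})  # False
```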
3. RAG Components Layer¶
Chunking (ai4rag/rag/chunking/)
- Splits documents into chunks
- Configurable chunk size and overlap
- Uses LangChain text splitters
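The effect of chunk_size and chunk_overlap can be illustrated with a simplified character-level splitter. This is a stand-in for the LangChain text splitters ai4rag actually uses, which additionally honour separators such as paragraph breaks:

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    # Slide a window of chunk_size characters, advancing by
    # (chunk_size - chunk_overlap) so consecutive chunks share text.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("abcdefghij" * 100, chunk_size=400, chunk_overlap=100)
# Each chunk's last 100 characters repeat as the next chunk's first 100.
```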
Embedding (ai4rag/rag/embedding/)
- Generates text embeddings
- Integrates with Llama Stack embedding models
- Handles batching and error recovery
Vector Stores (ai4rag/rag/vector_store/)
- Stores and retrieves document embeddings
- Supports Milvus (via Llama Stack) and ChromaDB
- Provides similarity search capabilities
Retrieval (ai4rag/rag/retrieval/)
- Retrieves relevant chunks for queries
- Supports simple and window-based retrieval
- Configurable top-k parameter
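Window-based retrieval can be sketched as follows; the function and parameter names are illustrative, not ai4rag's API. After a similarity search returns the top-k chunk indices, each hit is expanded to include its neighbours, so the generator sees surrounding context:

```python
def expand_with_window(hit_indices: list[int], window_size: int,
                       total_chunks: int) -> list[int]:
    # Expand each retrieved chunk index to cover window_size
    # neighbouring chunks on either side, clipped to document bounds.
    expanded: set[int] = set()
    for i in hit_indices:
        expanded.update(range(max(0, i - window_size),
                              min(total_chunks, i + window_size + 1)))
    return sorted(expanded)

context_indices = expand_with_window([5, 20], window_size=1, total_chunks=100)
# → [4, 5, 6, 19, 20, 21]
```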
Foundation Models (ai4rag/rag/foundation_models/)
- Generates answers using LLMs
- Integrates with Llama Stack models
- Formats prompts with retrieved context
Templates (ai4rag/rag/template/)
- Complete RAG implementations
- Combines all RAG components
- Provides end-to-end RAG pipeline
4. Evaluation Layer¶
Evaluator (ai4rag/evaluator/)
- Wraps the unitxt evaluation library
- Computes metrics: faithfulness, answer_correctness, and context_correctness
- Compares generated answers to ground truth
- Returns scores for optimization
5. Utilities Layer¶
Event Handlers (ai4rag/utils/event_handler/)
- Provides hooks for experiment events
- Allows custom tracking and monitoring
- Useful for production integrations
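A minimal handler might simply record events as they arrive. The class below is a hypothetical sketch; the real hook interface is BaseEventHandler in ai4rag/utils/event_handler/, and the event names shown are made up for illustration:

```python
class LoggingEventHandler:
    """Sketch of an event handler that records experiment events."""

    def __init__(self) -> None:
        self.events: list[tuple[str, dict]] = []

    def on_event(self, name: str, payload: dict) -> None:
        # Record every event; a production handler might instead
        # forward these to an experiment-tracking system.
        self.events.append((name, payload))

handler = LoggingEventHandler()
handler.on_event("trial_started", {"trial": 1})
handler.on_event("trial_finished", {"trial": 1, "score": 0.82})
```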
Validators (ai4rag/utils/)
- Validates input data formats
- Checks configuration consistency
- Provides error messages
Data Flow¶
Indexing Phase¶
```text
Documents
↓
Chunking (chunk_size, chunk_overlap)
↓
Embedding Model (embedding_model)
↓
Vector Store (ls_milvus or chroma)
```
Query Phase (Per Configuration)¶
```text
Question (from benchmark)
↓
Embedding Model
↓
Vector Store Search (number_of_chunks, retrieval_method, window_size)
↓
Retrieved Chunks
↓
Foundation Model (with context + question)
↓
Generated Answer
```
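The final step, combining context and question into a prompt for the foundation model, can be sketched as below. The template string is illustrative only; ai4rag's actual prompt formatting may differ:

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    # Join the retrieved chunks into a context block, then append
    # the user's question for the foundation model to answer.
    context = "\n\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

prompt = build_prompt(
    "What does the experiment engine do?",
    ["The Experiment Engine orchestrates the optimization workflow."],
)
```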
Evaluation Phase¶
```text
Generated Answer
Ground Truth Answer
Retrieved Documents
Ground Truth Documents
↓
Unitxt Evaluator
↓
Metrics (faithfulness, answer_correctness, context_correctness)
↓
Aggregated Score
↓
HPO Optimizer (feedback for next configuration)
```
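The aggregation step can be sketched as an unweighted mean of the three metric scores. This is an assumption for illustration; the real aggregation strategy may weight metrics differently:

```python
def aggregate(metrics: dict[str, float]) -> float:
    # Collapse per-metric scores into a single objective value
    # that the HPO optimizer can maximize.
    return sum(metrics.values()) / len(metrics)

objective = aggregate({
    "faithfulness": 0.9,
    "answer_correctness": 0.8,
    "context_correctness": 0.7,
})
# → 0.8
```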
Extension Points¶
ai4rag is designed for extensibility:
- Custom Foundation Models: Implement BaseFoundationModel
- Custom Embedding Models: Implement BaseEmbeddingModel
- Custom Vector Stores: Implement BaseVectorStore
- Custom HPO Algorithms: Extend base optimizer classes
- Custom Event Handlers: Implement BaseEventHandler
- Custom Metrics: Integrate additional evaluators
Technology Stack¶
- Python: 3.12 & 3.13
- Llama Stack: Model and vector store integration
- LangChain: Document chunking and processing
- Unitxt: Evaluation metrics
- Pandas: Results management
- Pydantic: Data validation
Next Steps¶
- Core Components - Detailed component documentation
- RAG Components - RAG pipeline details
- Data Flow - In-depth workflow analysis