Core Components¶
This page provides detailed architecture documentation for the core components that drive ai4rag's optimization engine.
Architecture Overview¶
The core layer consists of three primary components working together to execute RAG hyperparameter optimization:
graph TB
A[AI4RAGExperiment<br/>Central Orchestrator]
B[HPO Optimizer<br/>GAM or Random]
C[Search Space Manager<br/>Parameter Definitions & Rules]
D[Documents]
E[Benchmark Data]
F[RAG Components]
G[Evaluator]
D --> A
E --> A
C --> A
A <--> B
B -- suggests config --> A
A -- evaluates config --> F
F -- results --> G
G -- score --> A
A -- score --> B

Experiment Engine¶
AI4RAGExperiment¶
The AI4RAGExperiment class is the central orchestrator for the entire optimization workflow. It coordinates between the optimizer, RAG components, evaluation, and result management.
Key Responsibilities:
- Manage experiment lifecycle (initialization, execution, cleanup)
- Coordinate HPO optimizer with RAG pattern evaluation
- Handle vector store collection reuse and caching
- Execute Models Pre-Selection (MPS) when needed
- Stream results via event handlers
Core Workflow:
experiment = AI4RAGExperiment(
    documents=documents,
    benchmark_data=benchmark_data,
    search_space=search_space,
    vector_store_type="ls_milvus",
    optimizer_settings=GAMOptSettings(max_evals=20),
    event_handler=my_handler,
    client=llama_stack_client,
)
experiment.search() # Execute optimization
search() Method¶
The search() method is the main entry point that executes the optimization loop:
- Models Pre-Selection (MPS) (optional):
  - Triggered when len(foundation_models) > n_mps_foundation_models or len(embedding_models) > n_mps_embedding_models
  - Evaluates all model combinations on a small sample (default: 5 benchmark questions)
  - Selects top N models based on the optimization metric
  - Reduces the search space by eliminating poorly-performing models early
  - Can be skipped with the skip_mps=True kwarg
- Optimizer Initialization:
  - Creates optimizer instance (default: GAMOptimizer)
  - Passes an objective_function wrapper that calls run_single_evaluation()
  - Optionally warm-starts with known_observations
- Optimization Loop:
  - Optimizer suggests the next configuration
  - run_single_evaluation() evaluates the configuration
  - Score returned to the optimizer
  - Loop continues until max_evals is reached
- Error Handling:
  - Failed iterations captured via ExperimentExceptionHandler
  - Continues optimization despite individual failures
  - Final error summary available if all iterations fail
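The optimization loop above can be sketched as follows. This is a minimal illustration, not the actual ai4rag API: the toy objective and the helper names are stand-ins for the real optimizer and run_single_evaluation().

```python
# Minimal sketch of the search() optimization loop, assuming a toy
# objective; names here are illustrative, not the real ai4rag API.
def run_single_evaluation(config: dict) -> float:
    """Toy objective: pretend larger chunks score better."""
    return config["chunk_size"] / 2048

def search(candidates: list[dict], max_evals: int = 4) -> dict:
    evaluations = []
    for config in candidates[:max_evals]:      # loop until max_evals reached
        score = run_single_evaluation(config)  # evaluate suggested config
        evaluations.append({**config, "score": score})  # score goes back to the optimizer
    return max(evaluations, key=lambda e: e["score"])

candidates = [{"chunk_size": s} for s in (128, 512, 1024, 2048)]
best = search(candidates, max_evals=4)
```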
run_single_evaluation() Method¶
This method evaluates a single RAG configuration and returns its optimization score:
sequenceDiagram
participant Optimizer
participant Exp as AI4RAGExperiment
participant VS as VectorStore
participant RAG as SimpleRAG
participant Eval as Evaluator
participant EH as EventHandler
Optimizer->>Exp: run_single_evaluation(config)
activate Exp
Note over Exp: Check results cache
alt Config cached
Exp-->>Optimizer: return cached score
else Config not cached
Note over Exp: Check if collection can be reused
Exp->>VS: create or reuse collection
activate VS
VS-->>Exp: indexing complete
deactivate VS
Exp->>RAG: create SimpleRAG instance
activate RAG
Note over RAG: Parallel query execution
RAG-->>Exp: inference responses
deactivate RAG
Exp->>Eval: evaluate responses
activate Eval
Eval-->>Exp: metric scores
deactivate Eval
Exp->>EH: stream finished pattern
Note over Exp: Cache evaluation result
Exp-->>Optimizer: return optimization score
end
deactivate Exp

Key Features:
- Results Caching: Checks if identical indexing_params + rag_params have been evaluated before
- Collection Reuse: Reuses vector store collections when indexing_params match (chunking + embedding)
- Parallel Query Execution: Uses ThreadPoolExecutor (max 10 threads) for concurrent question processing
- Event Streaming: Notifies event handler at each step (chunking, embedding, retrieval, generation, evaluation)
- Error Handling: Wraps errors in domain-specific exceptions (IndexingError, GenerationError, EvaluationError)
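The results-caching behavior can be sketched like this. The tuple-based cache key and the helper names are assumptions for illustration, not the library's internals.

```python
# Sketch of a results cache keyed on (indexing_params, rag_params);
# the frozen-tuple key scheme is an assumption for illustration.
def cache_key(indexing_params: dict, rag_params: dict) -> tuple:
    return (
        tuple(sorted(indexing_params.items())),
        tuple(sorted(rag_params.items())),
    )

results_cache: dict[tuple, float] = {}

def evaluate_with_cache(indexing_params: dict, rag_params: dict, evaluate) -> float:
    key = cache_key(indexing_params, rag_params)
    if key in results_cache:        # identical config seen before
        return results_cache[key]
    score = evaluate(indexing_params, rag_params)
    results_cache[key] = score      # cache for future optimizer suggestions
    return score

calls = []
def evaluate(idx, rag):
    calls.append(1)
    return 0.7

s1 = evaluate_with_cache({"chunk_size": 512}, {"number_of_chunks": 3}, evaluate)
s2 = evaluate_with_cache({"chunk_size": 512}, {"number_of_chunks": 3}, evaluate)
```

The second call returns the cached score without re-running the evaluation.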
Parameters Extracted:
- Chunking params: chunking_method, chunk_size, chunk_overlap
- Embedding params: model_id, distance_metric, embedding_dimension, context_length
- Retrieval params: retrieval_method, window_size, number_of_chunks, search_mode, ranker_*
- Generation params: model_id, context_template_text, user_message_text, system_message_text
Models Pre-Selection (MPS)¶
The ModelsPreSelector reduces the search space by evaluating all model combinations on a small benchmark sample:
def run_pre_selection(
    self,
    foundation_models: list[BaseFoundationModel],
    embedding_models: list[BaseEmbeddingModel],
    n_records: int = 5,
    random_seed: int = 17,
) -> dict[str, list]:
Process:
- Sample n_records questions from benchmark data (with random_seed for reproducibility)
- Evaluate every (foundation_model, embedding_model) combination
- Rank by optimization_metric performance
- Select top n_mps_foundation_models and n_mps_embedding_models
- Update the search space with the selected models
Default Thresholds:
- n_mps_foundation_models = 3
- n_mps_embedding_models = 2
When to Use:
- Large model search spaces (e.g., 10 foundation models × 5 embedding models = 50 combinations)
- Limited evaluation budget
- Want to eliminate clearly underperforming models early
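The pre-selection process can be sketched as below. The scoring function is a toy stand-in for the real benchmark evaluation, and ranking models by their best score across partners is an assumption about the selection criterion.

```python
# Sketch of Models Pre-Selection: score every (foundation, embedding)
# pair on a small sample and keep the top performers. score_fn is a
# toy stand-in for the real benchmark evaluation.
def run_pre_selection(
    foundation_models: list[str],
    embedding_models: list[str],
    score_fn,
    n_top_foundation: int = 3,
    n_top_embedding: int = 2,
) -> dict[str, list]:
    # Evaluate every model combination on the sampled questions
    scores = {
        (fm, em): score_fn(fm, em)
        for fm in foundation_models
        for em in embedding_models
    }
    # Rank each model by its best score across all partner models
    def ranked(models, pick):
        return sorted(
            models,
            key=lambda m: max(s for k, s in scores.items() if pick(k) == m),
            reverse=True,
        )
    return {
        "foundation_models": ranked(foundation_models, lambda k: k[0])[:n_top_foundation],
        "embedding_models": ranked(embedding_models, lambda k: k[1])[:n_top_embedding],
    }

# Toy scores: longer model names score higher
def score_fn(fm, em):
    return len(fm) + len(em)

selected = run_pre_selection(["fm_aa", "fm_a", "fm"], ["em_b", "em"], score_fn,
                             n_top_foundation=2, n_top_embedding=1)
```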
HPO Optimizers¶
All optimizers implement the BaseOptimizer interface and share common functionality:
BaseOptimizer Interface¶
class BaseOptimizer(ABC):
    def __init__(
        self,
        objective_function: Callable[[dict], float],
        search_space: SearchSpace,
        settings: OptimizerSettings,
    ):
        self.objective_function = objective_function
        self._search_space = search_space
        self.settings = settings

    @abstractmethod
    def search(self) -> dict[str, Any]:
        """Return best configuration with score."""
Common settings are supplied through an OptimizerSettings dataclass; each optimizer defines its own subclass (e.g., GAMOptSettings).
GAMOptimizer¶
The Generalized Additive Models (GAM) optimizer is the recommended algorithm for ai4rag. It uses a two-phase approach:
Phase 1: Random Exploration
- Randomly evaluates n_random_nodes configurations from the search space
- Builds an initial understanding of the objective function landscape
- Excludes already-evaluated combinations (important for warm-start)
Phase 2: GAM-Guided Search
- Trains a LinearGAM model on evaluated configurations
- Predicts scores for all remaining (unevaluated) configurations
- Selects the top evals_per_trial configurations with the highest predicted scores
- Evaluates selected configurations and updates the training data
- Repeats until max_evals is reached
Settings:
@dataclass
class GAMOptSettings(OptimizerSettings):
    max_evals: int = 20       # Total evaluation budget
    n_random_nodes: int = 4   # Initial random evaluations
    evals_per_trial: int = 1  # Evaluations per GAM iteration
    random_state: int = 64    # Random seed for reproducibility
Algorithm Details:
def search(self) -> dict[str, Any]:
    # Phase 1: Random exploration
    self.evaluate_initial_random_nodes()  # Evaluate n_random_nodes
    # Phase 2: GAM-guided search
    iterations_limit = ceil((max_evals - len(evaluations)) / evals_per_trial)
    for _ in range(iterations_limit):
        self._run_iteration()  # Train GAM, predict, evaluate best
    # Return best configuration
    return max(evaluations, key=lambda x: x["score"])
GAM Iteration Details:
- Encode categorical parameters using LabelEncoder (one encoder per parameter)
- Train LinearGAM on (encoded_params, scores) from successful evaluations
- Predict scores for all remaining unevaluated configurations
- Select the top N configurations with the highest predictions (N = evals_per_trial)
- Evaluate selected configurations via objective_function
- Update evaluations and _evaluated_combinations
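One surrogate-guided iteration can be sketched in pure Python as below. A mean-score-per-value lookup stands in for the LinearGAM surrogate, and the integer encoding mimics what a per-parameter LabelEncoder produces; both are simplifications for illustration.

```python
# Sketch of one surrogate-guided iteration: encode categorical values
# as integers, "fit" a toy surrogate on evaluated configs, and pick
# the candidate with the highest predicted score. The per-value mean
# lookup is a stand-in for LinearGAM.
def encode(values):
    # One mapping per parameter, mirroring a LabelEncoder
    mapping = {v: i for i, v in enumerate(sorted(set(map(str, values))))}
    return [mapping[str(v)] for v in values]

evaluated = [
    {"chunk_size": 512, "score": 0.60},
    {"chunk_size": 1024, "score": 0.75},
]
candidates = [{"chunk_size": 512}, {"chunk_size": 1024}]

# "Train" the surrogate: score per encoded value
codes = encode([e["chunk_size"] for e in evaluated])
predictions = {code: e["score"] for code, e in zip(codes, evaluated)}

# Predict for candidates and select the best one
cand_codes = encode([c["chunk_size"] for c in candidates])
best_candidate = max(zip(candidates, cand_codes), key=lambda p: predictions[p[1]])[0]
```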
When to Use:
- Medium to large search spaces (10+ combinations)
- Want intelligent exploration beyond random search
- Have evaluation budget for >4 evaluations
Warm-Starting:
known_observations = [
    {"foundation_model": model_a, "chunk_size": 512, ..., "score": 0.72},
    {"foundation_model": model_b, "chunk_size": 1024, ..., "score": 0.68},
]
optimizer = GAMOptimizer(
    objective_function=objective_fn,
    search_space=search_space,
    settings=GAMOptSettings(max_evals=20, n_random_nodes=2),
    known_observations=known_observations,
)
- Pre-populates evaluations with known results
- Counts toward the n_random_nodes target (may skip the random phase entirely if enough observations are known)
- Excludes known combinations from random sampling candidates
RandomOptimizer¶
A simple baseline optimizer that performs pure random search:
Algorithm:
- Shuffle all search space combinations
- Evaluate the first max_evals combinations
- Return the best score
When to Use:
- Very small search spaces (<10 combinations)
- Benchmarking against smarter algorithms
- Sanity checking search space definitions
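The three-step algorithm above amounts to the following sketch; the function name and signature are illustrative, not the RandomOptimizer API.

```python
# Minimal random search: shuffle, evaluate the first max_evals
# combinations, return the best. Names are illustrative.
import random

def random_search(combinations: list[dict], objective, max_evals: int, seed: int = 0) -> dict:
    rng = random.Random(seed)
    shuffled = combinations[:]
    rng.shuffle(shuffled)                 # shuffle all combinations
    evaluated = [
        {**c, "score": objective(c)}
        for c in shuffled[:max_evals]     # evaluate the first max_evals
    ]
    return max(evaluated, key=lambda e: e["score"])  # return the best

combos = [{"chunk_size": s} for s in (128, 256, 512, 1024)]
best = random_search(combos, lambda c: c["chunk_size"] / 1024, max_evals=4)
```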
Search Space Manager¶
The search space system defines the hyperparameter search space and validates configurations against rules.
Parameter Class¶
Represents a single optimizable parameter with type-specific constraints:
@dataclass(frozen=True)
class Parameter(Generic[HashableT]):
    name: str
    param_type: Literal["B", "I", "R", "C"]  # Boolean, Integer, Real, Categorical
    v_min: Optional[int | float] = None      # For I/R types
    v_max: Optional[int | float] = None      # For I/R types
    values: Optional[Sequence[HashableT]] = None  # For B/C types
Parameter Types:
| Type | Description | Required Fields | Example |
|---|---|---|---|
| C (Categorical) | Fixed set of values | values | values=["simple", "window"] |
| I (Integer) | Integer range | v_min, v_max | v_min=128, v_max=2048 |
| R (Real/Float) | Float range | v_min, v_max | v_min=0.0, v_max=1.0 |
| B (Boolean) | True/False | values | values=[True, False] |
Examples:
# Categorical parameter
Parameter(
    name=AI4RAGParamNames.RETRIEVAL_METHOD,
    param_type="C",
    values=["simple", "window"],
)

# Integer parameter
Parameter(
    name=AI4RAGParamNames.CHUNK_SIZE,
    param_type="I",
    v_min=128,
    v_max=2048,
)

# Real parameter
Parameter(
    name=AI4RAGParamNames.RANKER_ALPHA,
    param_type="R",
    v_min=0.0,
    v_max=1.0,
)

# Categorical with complex objects (models)
Parameter(
    name=AI4RAGParamNames.FOUNDATION_MODEL,
    param_type="C",
    values=[LSFoundationModel(...), LSFoundationModel(...)],
)
Hashability: Parameters are immutable (frozen=True) and hashable, enabling efficient comparison and caching. Complex categorical values (like model objects) are hashed for comparison.
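A minimal stand-in shows why frozen=True matters here: equal parameters hash identically, so they can be deduplicated in sets and used as cache keys. The Param class below is a simplified sketch, not the library's Parameter.

```python
# Frozen dataclasses are hashable, enabling comparison and caching;
# this Param class is a simplified stand-in for ai4rag's Parameter.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Param:
    name: str
    param_type: str
    values: Optional[tuple] = None  # tuples, not lists, keep instances hashable

p1 = Param("retrieval_method", "C", ("simple", "window"))
p2 = Param("retrieval_method", "C", ("simple", "window"))
unique = {p1, p2}  # equal params collapse to one set entry
```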
SearchSpace Base Class¶
Generic search space manager with rule-based filtering:
class SearchSpace:
    def __init__(
        self,
        params: list[Parameter] | None = None,
        rules: list[RuleFunction] | None = None,
    ):
        self.params = params or []
        self._rules = rules
Key Properties:
- combinations: All valid parameter combinations after applying rules
- max_combinations: Total count of valid combinations
Rule Application:
Rules are functions that filter invalid parameter combinations:
RuleFunction: TypeAlias = Callable[[dict[str, Any]], bool]

def _rule_chunk_size_bigger_than_chunk_overlap(combination: dict) -> bool:
    chunk_size = combination.get("chunk_size")
    chunk_overlap = combination.get("chunk_overlap")
    return chunk_size > 2 * chunk_overlap
Rules are applied sequentially to all combinations; any combination failing any rule is removed.
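Combination generation plus rule filtering can be sketched as a Cartesian product followed by sequential rule application. The build_combinations helper is an assumption for illustration; only the RuleFunction signature comes from the document.

```python
# Sketch: Cartesian product of parameter values, then sequential rule
# filtering. build_combinations is illustrative, not the library API.
from itertools import product
from typing import Any, Callable

RuleFunction = Callable[[dict[str, Any]], bool]

def _rule_chunk_size_bigger_than_chunk_overlap(combination: dict) -> bool:
    return combination["chunk_size"] > 2 * combination["chunk_overlap"]

def build_combinations(params: dict[str, list], rules: list[RuleFunction]) -> list[dict]:
    names = list(params)
    combos = [dict(zip(names, values)) for values in product(*params.values())]
    # Any combination failing any rule is removed
    return [c for c in combos if all(rule(c) for rule in rules)]

combos = build_combinations(
    {"chunk_size": [128, 512], "chunk_overlap": [64, 128]},
    [_rule_chunk_size_bigger_than_chunk_overlap],
)
```

Here only the chunk_size=512 combinations survive, since 128 is not greater than twice either overlap value.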
AI4RAGSearchSpace¶
Specialized search space for RAG optimization with built-in validation rules:
class AI4RAGSearchSpace(SearchSpace):
    def __init__(
        self,
        params: list[Parameter] | None = None,
        rules: list[RuleFunction] | None = None,
        vector_store_type: str = "ls_milvus",
    ):
Built-in Validation Rules:

Base Rules (always applied):
- Chunk size > 2 × chunk overlap
  - Ensures sufficient non-overlapping content
  - Skipped when chunk_size == 0 (structural-only splitting)
- Window size ↔ retrieval method consistency
  - window_size == 0 requires retrieval_method == "simple"
  - window_size > 0 requires retrieval_method == "window"
- Chunk size within embedding context length
  - Estimates token count: estimated_tokens = chunk_size / 3.6
  - Verifies estimated_tokens <= embedding_model.params.context_length
  - Conservative ratio prevents runtime failures

Hybrid Search Rules (only for vector_store_type != "chroma"):
- Search mode ↔ ranker parameter consistency
  - When search_mode == "vector": all ranker params must be sentinels ("", 0, 1)
  - When search_mode == "hybrid": ranker_strategy must be non-empty
- Ranker K for RRF only
  - ranker_k > 0 only when ranker_strategy == "rrf"
  - ranker_k == 0 (sentinel) for all other strategies
- Ranker alpha for weighted only
  - ranker_alpha != 1 only when ranker_strategy == "weighted"
  - ranker_alpha == 1 (sentinel, meaning 100% dense) for all other strategies
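The hybrid-search sentinel rules can be expressed as rule functions. The parameter names follow the document; the function bodies are a sketch of the stated constraints, not the library's implementation.

```python
# Sketch of the hybrid-search sentinel rules as RuleFunctions;
# bodies are illustrative, derived from the constraints above.
def _rule_search_mode_ranker_consistency(c: dict) -> bool:
    if c["search_mode"] == "vector":
        # Vector-only search: ranker params must be sentinels ("", 0, 1)
        return c["ranker_strategy"] == "" and c["ranker_k"] == 0 and c["ranker_alpha"] == 1
    # Hybrid search requires a non-empty ranker strategy
    return c["ranker_strategy"] != ""

def _rule_ranker_k_rrf_only(c: dict) -> bool:
    # ranker_k > 0 exactly when the strategy is "rrf"
    return (c["ranker_k"] > 0) == (c["ranker_strategy"] == "rrf")

ok = _rule_search_mode_ranker_consistency(
    {"search_mode": "hybrid", "ranker_strategy": "rrf", "ranker_k": 60, "ranker_alpha": 1}
)
bad = _rule_ranker_k_rrf_only({"ranker_strategy": "weighted", "ranker_k": 60})
```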
Required Parameters:
User must provide at minimum:
- foundation_model (Categorical)
- embedding_model (Categorical)
All other parameters have defaults that can be overridden.
Custom Rules:
Add your own validation logic:
def _rule_small_chunks_for_qa(combination: dict) -> bool:
    """Prefer smaller chunks for Q&A tasks."""
    if combination.get("chunk_size", 0) > 1024:
        return False
    return True

search_space = AI4RAGSearchSpace(
    params=[...],
    rules=[_rule_small_chunks_for_qa],
)
Combination Generation:
# Example: 2 models × 2 chunk sizes × 2 retrievals = 8 combinations
params = [
    Parameter(name="foundation_model", param_type="C", values=[model_a, model_b]),
    Parameter(name="chunk_size", param_type="C", values=[512, 1024]),
    Parameter(name="retrieval_method", param_type="C", values=["simple", "window"]),
]
search_space = AI4RAGSearchSpace(params=params)

# Generate Cartesian product, then filter via rules
combinations = search_space.combinations
# Returns: list of dicts, each dict is one valid configuration
Component Interaction Flow¶
sequenceDiagram
participant User
participant Exp as AI4RAGExperiment
participant MPS as ModelsPreSelector
participant GamOpt as GAMOptimizer
participant SS as AI4RAGSearchSpace
User->>Exp: search()
activate Exp
alt Models Pre-Selection needed
Exp->>MPS: run_pre_selection()
activate MPS
Note over MPS: Evaluate all models on sample
MPS-->>Exp: selected models
deactivate MPS
Exp->>SS: update search space
end
Exp->>GamOpt: initialize optimizer
activate GamOpt
GamOpt->>SS: get combinations
activate SS
SS-->>GamOpt: valid configurations
deactivate SS
Note over GamOpt: Phase 1 - Random Exploration
loop n_random_nodes times
GamOpt->>Exp: evaluate config
Note over Exp: run_single_evaluation
Exp-->>GamOpt: score
end
Note over GamOpt: Phase 2 - GAM-Guided Search
loop Until max_evals reached
Note over GamOpt: Train GAM and predict scores
GamOpt->>Exp: evaluate best predicted config
Note over Exp: run_single_evaluation
Exp-->>GamOpt: score
end
GamOpt-->>Exp: best configuration
deactivate GamOpt
Exp->>User: results via event_handler
deactivate Exp

Extension Points¶
The core components are designed for extensibility:
Custom Optimizer¶
Implement BaseOptimizer for custom optimization algorithms:
class CustomOptimizer(BaseOptimizer):
    def search(self) -> dict[str, Any]:
        # Your algorithm here
        pass
Use it via search() kwargs.
Custom Search Space Rules¶
Add domain-specific validation:
def _rule_custom_constraint(combination: dict) -> bool:
    # Your validation logic
    return True

search_space = AI4RAGSearchSpace(
    params=params,
    rules=[_rule_custom_constraint],
)
Custom Exception Handling¶
The ExperimentExceptionHandler can be extended to customize error handling behavior for failed iterations.
Performance Considerations¶
Collection Reuse:
- Vector store collections are reused when indexing_params match
- Avoids re-embedding and re-indexing documents for identical chunking/embedding configs
- Significant speedup when the search space varies only retrieval/generation params
Results Caching:
- Identical (indexing_params, rag_params) combinations return cached scores
- Prevents redundant evaluation when the optimizer suggests the same config twice
Parallel Query Execution:
- ThreadPoolExecutor with max 10 threads for concurrent question processing
- Balances throughput with server load
- Configurable via max_threads in query_rag()
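The parallel-query pattern looks roughly like the following sketch; answer_question stands in for the real retrieval-plus-generation step.

```python
# Sketch of parallel query execution with ThreadPoolExecutor;
# answer_question is a stand-in for retrieval + generation.
from concurrent.futures import ThreadPoolExecutor

def answer_question(question: str) -> str:
    # The real pipeline performs retrieval and generation here
    return f"answer to: {question}"

questions = [f"q{i}" for i in range(5)]
with ThreadPoolExecutor(max_workers=10) as pool:  # max 10 threads
    answers = list(pool.map(answer_question, questions))  # order preserved
```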
Batch Embedding:
- LSEmbeddingModel processes documents in batches of 2048 chunks
- Prevents API request size limits
- Applied in both indexing and query phases
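The batching itself is simple list slicing, sketched below; the helper name is illustrative.

```python
# Sketch of batch embedding: split chunks into fixed-size batches
# (2048 per the text) before each embedding API call.
def batched(items: list, batch_size: int = 2048) -> list[list]:
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

chunks = [f"chunk-{i}" for i in range(5000)]
batches = batched(chunks, batch_size=2048)  # 2048 + 2048 + 904 chunks
```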
GAM Efficiency:
- Uses sklearn.preprocessing.LabelEncoder for categorical parameter encoding
- Encodes once per search, reused for all iterations
- LinearGAM training is fast even for large search spaces (hundreds of combinations)
Best Practices¶
- Start with MPS: Let Models Pre-Selection eliminate poor models early when evaluating many model combinations
- Use GAMOptimizer: Better than random search for search spaces with >10 combinations
- Warm-start when possible: Reuse previous experiment results via known_observations
- Define focused search spaces: Constrain parameter ranges based on domain knowledge to reduce search space size
- Monitor via event handlers: Implement a custom BaseEventHandler to track progress and debug issues
- Leverage collection reuse: Structure the search space to maximize indexing parameter reuse (e.g., vary only retrieval params)
Next Steps¶
- Data Flow - Detailed workflow analysis
- RAG Components - RAG pipeline component details
- Architecture Overview - High-level design