Quick Start

This guide walks you through running your first RAG optimization experiment with ai4rag. For simplicity, this quick start uses an OGX server. You can also run the experiment with independently deployed models, as long as each model is passed to the experiment through an appropriate model wrapper.

Data loading

To run an experiment, you need to provide documents as DoclingDocument instances (from the docling-core library). For development purposes, you can use the FileStore implementation from dev_utils. Note that dev_utils is available only when you clone the repository, because it is not part of the ai4rag package.

Prerequisites

Before starting, ensure you have:

Installed ai4rag (Installation Guide)
A running OGX server with models configured, or other deployed models that can be used for the experiment
Environment variables (e.g. BASE_URL, APIKEY) set so that the client can communicate with the OGX server or your deployed models

Step-by-Step Guide with OGX

1. Prepare OGX Client

Create a client instance to connect to your OGX server:

import os
from dotenv import load_dotenv, find_dotenv
from ogx_client import OgxClient

load_dotenv(find_dotenv())

client = OgxClient(
    base_url=os.getenv("BASE_URL"),
    api_key=os.getenv("APIKEY")
)
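`os.getenv` returns `None` when a variable is unset. If BASE_URL or APIKEY is missing, the client is still created, and the problem only shows up later as a connection or authentication error. You can catch this misconfiguration earlier with a small fail-fast check before you create the client. The helper name `require_env` is hypothetical and is not part of ai4rag or ogx_client.

```python
import os
from typing import Dict, Iterable


def require_env(names: Iterable[str]) -> Dict[str, str]:
    """Return the requested environment variables, failing fast if any is unset or empty.

    Hypothetical helper for illustration; it is not part of ai4rag or ogx_client.
    """
    values = {name: os.environ.get(name, "") for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return values


# Usage (after load_dotenv has populated os.environ):
# settings = require_env(["BASE_URL", "APIKEY"])
# client = OgxClient(base_url=settings["BASE_URL"], api_key=settings["APIKEY"])
```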

2. Prepare Knowledge Base Documents

Load your knowledge base documents from a local directory:

from pathlib import Path
from dev_utils.file_store import FileStore

# Path to your documents folder
documents_path = Path("path/to/your/documents")

# Load documents (supports PDF, HTML, TXT, MD, etc.)
documents = FileStore(documents_path).load_as_documents()

print(f"Loaded {len(documents)} documents")

Document Format

Documents must include a document_id in their metadata. FileStore handles this automatically.
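The benchmark example in the next step refers to documents by file name (for example doc_001.pdf). This suggests that the document_id is derived from the file name. The sketch below assumes that convention and lists the IDs a folder would produce, which helps when you write correct_answer_document_ids. Both the function name and the suffix set are hypothetical, and FileStore's actual rules may differ.

```python
from pathlib import Path
from typing import Iterable, List

# Hypothetical set of suffixes; FileStore's supported formats may differ.
DEFAULT_SUFFIXES = {".pdf", ".html", ".txt", ".md"}


def list_document_ids(directory: Path, suffixes: Iterable[str] = DEFAULT_SUFFIXES) -> List[str]:
    """Return sorted file names (assumed to act as document_id) of supported files.

    Only files directly inside `directory` are considered; subfolders are skipped.
    """
    allowed = {suffix.lower() for suffix in suffixes}
    return sorted(
        path.name
        for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() in allowed
    )


# Usage: print(list_document_ids(Path("path/to/your/documents")))
```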

3. Prepare Benchmark Data

Create a benchmark_data.json file with questions, ground-truth answers, and the IDs of the documents that contain those answers:

[
  {
    "question": "What is the main purpose of ai4rag?",
    "correct_answers": [
      "ai4rag optimizes RAG templates using hyperparameter optimization",
      "ai4rag finds optimal RAG configurations"
    ],
    "correct_answer_document_ids": ["doc_001.pdf", "doc_002.pdf"]
  },
  {
    "question": "Which vector databases are supported?",
    "correct_answers": [
      "Milvus and ChromaDB are supported."
    ],
    "correct_answer_document_ids": ["doc_005.txt"]
  }
]
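Malformed entries are easier to fix before the experiment starts. The sketch below uses the standard json module to check the three fields from the example above. The function name and the rules it applies (a non-empty question and non-empty lists of strings) are assumptions, not ai4rag's own validation. read_benchmark_from_json may apply different checks.

```python
import json
from typing import Any, List

# Fields assumed to hold non-empty lists of strings (based on the example above).
REQUIRED_LIST_FIELDS = ("correct_answers", "correct_answer_document_ids")


def validate_benchmark(raw_json: str) -> List[str]:
    """Return problems found in benchmark JSON; an empty list means it looks well-formed."""
    data: Any = json.loads(raw_json)
    if not isinstance(data, list):
        return ["top-level value must be a list of entries"]
    problems: List[str] = []
    for index, entry in enumerate(data):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: must be an object")
            continue
        question = entry.get("question")
        if not isinstance(question, str) or not question.strip():
            problems.append(f"entry {index}: 'question' must be a non-empty string")
        for field in REQUIRED_LIST_FIELDS:
            value = entry.get(field)
            if not isinstance(value, list) or not value or not all(isinstance(v, str) for v in value):
                problems.append(f"entry {index}: '{field}' must be a non-empty list of strings")
    return problems


# Usage: print(validate_benchmark(Path("path/to/benchmark_data.json").read_text()))
```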

Load the benchmark data:

from pathlib import Path
from dev_utils.utils import read_benchmark_from_json

benchmark_data_path = Path("path/to/benchmark_data.json")
benchmark_data = read_benchmark_from_json(benchmark_data_path)

Benchmark Quality

High-quality benchmark data is crucial for meaningful optimization. Make sure that every question can be answered from your knowledge base and that the reference answers are accurate.
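One concrete quality check is that every ID in correct_answer_document_ids refers to a document that is actually in your knowledge base. A typo there means the referenced document can never be matched. The minimal sketch below assumes that document IDs are plain strings you have already collected, for example the file names in your documents folder. It works on the parsed JSON list (e.g. from json.load) rather than on the object returned by read_benchmark_from_json, whose type is not described here. The function name is hypothetical.

```python
from typing import Any, Dict, Iterable, List, Mapping, Sequence


def find_unknown_document_ids(
    benchmark: Sequence[Mapping[str, Any]], known_ids: Iterable[str]
) -> Dict[int, List[str]]:
    """Map each benchmark entry index to the cited document IDs that are not in known_ids."""
    known = set(known_ids)
    unknown: Dict[int, List[str]] = {}
    for index, entry in enumerate(benchmark):
        cited = entry.get("correct_answer_document_ids", [])
        missing = [doc_id for doc_id in cited if doc_id not in known]
        if missing:
            unknown[index] = missing
    return unknown
```

An empty result means that every cited document was found.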

4. Define Search Space

Specify which parameters to optimize and their possible values:

from ai4rag.search_space.src.parameter import Parameter
from ai4rag.search_space.src.search_space import AI4RAGSearchSpace
from ai4rag.rag.foundation_models.ogx import OGXFoundationModel
from ai4rag.rag.embedding.ogx import OGXEmbeddingModel

search_space = AI4RAGSearchSpace(
    params=[
        # Foundation model for generation
        Parameter(
            name="foundation_model",
            param_type="C",
            values=[
                OGXFoundationModel(
                    model_id="ollama/llama3.2:3b",
                    client=client
                )
            ],
        ),
        # Embedding model
        Parameter(
            name="embedding_model",
            param_type="C",
            values=[
                OGXEmbeddingModel(
                    model_id="ollama/nomic-embed-text:latest",
                    client=client,
                    params={
                        "embedding_dimension": 768,
                        "context_length": 8192
                    },
                )
            ],
        ),
        # Chunking parameters
        Parameter(
            name="chunk_size",
            param_type="C",
            values=[200, 400, 800, 1000],
        ),
        Parameter(
            name="chunk_overlap",
            param_type="C",
            values=[0, 50, 100, 200],
        ),
        # Retrieval parameters
        Parameter(
            name="retrieval_method",
            param_type="C",
            values=["simple", "window"],
        ),
        Parameter(
            name="number_of_chunks",
            param_type="C",
            values=[3, 5, 7, 10],
        ),
    ]
)
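To build intuition for chunk_size and chunk_overlap, here is a simple sliding-window chunker. It measures sizes in characters purely for illustration. ai4rag's chunker may count tokens or split on document structure instead, so treat this as a conceptual sketch rather than the library's implementation.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of chunk_size characters; neighbours share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# prints ['abcd', 'cdef', 'efgh', 'ghij']
```

This sketch rejects an overlap equal to or larger than the chunk size, because the window would never advance. The search space above allows chunk_size=200 together with chunk_overlap=200, so check how ai4rag treats such combinations.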

5. Configure Optimizer

Set up the hyperparameter optimization algorithm:

from ai4rag.core.hpo.gam_opt import GAMOptSettings

optimizer_settings = GAMOptSettings(
    max_evals=10,      # Total number of configurations to evaluate
    n_random_nodes=4   # Number of random explorations before using GAM
)
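The search space from step 4 contains 1 × 1 × 4 × 4 × 2 × 4 = 128 combinations. With max_evals=10, the optimizer therefore evaluates under 8% of them, which is why the choice of configurations to try matters. The sketch below computes that count from a plain dictionary of value lists. The dictionary is a hypothetical stand-in for AI4RAGSearchSpace, and the count ignores any constraints the library may apply between parameters.

```python
from math import prod
from typing import Mapping, Sequence


def grid_size(space: Mapping[str, Sequence[object]]) -> int:
    """Number of distinct configurations in a fully categorical search space."""
    return prod(len(values) for values in space.values())


# Value lists mirroring step 4; model IDs stand in for the model wrapper objects.
quick_start_space = {
    "foundation_model": ["ollama/llama3.2:3b"],
    "embedding_model": ["ollama/nomic-embed-text:latest"],
    "chunk_size": [200, 400, 800, 1000],
    "chunk_overlap": [0, 50, 100, 200],
    "retrieval_method": ["simple", "window"],
    "number_of_chunks": [3, 5, 7, 10],
}

max_evals = 10
total = grid_size(quick_start_space)
print(f"{total} configurations, {max_evals / total:.1%} evaluated")
# prints 128 configurations, 7.8% evaluated
```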

Optimization Strategy

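Random phase (n_random_nodes): evaluates randomly chosen configurations to explore the search space and reduce the risk of getting stuck in a local optimum. A larger value gives broader exploration but, for a fixed max_evals, leaves fewer evaluations for the GAM phase.
GAM phase: uses a GAM-based model, fitted to the results observed so far, to suggest promising configurations for the remaining evaluations.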
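The two phases can be illustrated with a small, self-contained loop. This sketch shows the general pattern only. The surrogate is a crude additive model with one averaged effect per parameter value, a simplified stand-in for a GAM and not ai4rag's GAMOpt implementation. The search space and the toy objective are hypothetical. The max_evals and n_random_nodes arguments follow the meaning of the GAMOptSettings fields shown above.

```python
import itertools
import random
from statistics import mean
from typing import Any, Callable, Dict, List, Mapping, Sequence, Tuple

# A configuration is a hashable tuple of (parameter name, value) pairs.
Config = Tuple[Tuple[str, Any], ...]


def fit_additive_surrogate(history: List[Tuple[Config, float]]) -> Callable[[Config], float]:
    """Crude additive model: score ~ overall mean + one effect per (parameter, value).

    Each effect is the average residual of evaluated configurations using that value;
    a real GAM fits its terms jointly, so this is only a stand-in.
    """
    overall = mean(score for _, score in history)
    residuals: Dict[Tuple[str, Any], List[float]] = {}
    for config, score in history:
        for item in config:
            residuals.setdefault(item, []).append(score - overall)
    effects = {item: mean(values) for item, values in residuals.items()}

    def predict(config: Config) -> float:
        # Values not evaluated so far contribute a neutral effect of 0.0.
        return overall + sum(effects.get(item, 0.0) for item in config)

    return predict


def two_phase_search(
    space: Mapping[str, Sequence[Any]],
    objective: Callable[[Dict[str, Any]], float],
    max_evals: int,
    n_random_nodes: int,
    seed: int = 0,
) -> List[Tuple[Config, float]]:
    """Evaluate random configurations first, then let the surrogate choose the rest."""
    rng = random.Random(seed)
    names = sorted(space)
    candidates: List[Config] = [
        tuple(zip(names, values)) for values in itertools.product(*(space[n] for n in names))
    ]
    rng.shuffle(candidates)
    budget = max(0, min(max_evals, len(candidates)))
    # At least one random evaluation is needed before the surrogate can be fitted.
    n_random = min(max(n_random_nodes, 1), budget)
    history = [(config, objective(dict(config))) for config in candidates[:n_random]]
    remaining = candidates[n_random:]

    # Model-guided phase: refit the surrogate and evaluate its top prediction.
    while len(history) < budget:
        predict = fit_additive_surrogate(history)
        best = max(remaining, key=predict)
        remaining.remove(best)
        history.append((best, objective(dict(best))))
    return history


space = {"chunk_size": [200, 400, 800, 1000], "number_of_chunks": [3, 5, 7, 10]}


def toy_score(config: Dict[str, Any]) -> float:
    # Hypothetical objective standing in for a real RAG quality metric.
    return config["chunk_size"] / 1000 - 0.1 * abs(config["number_of_chunks"] - 5)


history = two_phase_search(space, toy_score, max_evals=10, n_random_nodes=4)
best_config, best_score = max(history, key=lambda pair: pair[1])
print(dict(best_config), round(best_score, 3))
```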

6. Run the Experiment

Create and run the optimization experiment:

from ai4rag.core.experiment.experiment import AI4RAGExperiment
from ai4rag.utils.event_handler import LocalEventHandler

experiment = AI4RAGExperiment(
    client=client,
    documents=documents,
    benchmark_data=benchmark_data,
    search_space=search_space,
    vector_store_type="ogx",  # "ogx" for OGX, or "chroma" for in-memory
    ogx_vector_io_provider_id="milvus",  # Matches your OGX server config
    optimizer_settings=optimizer_settings,
    event_handler=LocalEventHandler(output_path="<path_to_store_results>"),  # Tracks progress
)

# Run optimization
experiment.search()

best_pattern = experiment.results.get_best_evaluations(k=1)[0]

print(best_pattern.rag_pattern.generate("What is the main purpose of ai4rag?"))

7. Review Results

After the experiment completes, check the directory passed as output_path to LocalEventHandler for:

JSON files: detailed results for each evaluated configuration
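To inspect these files programmatically, you can use a minimal sketch that relies only on the standard library. It loads every JSON file under the output directory into a dictionary keyed by relative path. This guide does not describe the file names or the structure inside each file, so the sketch makes no assumptions about them. load_results is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_results(output_path: Path) -> Dict[str, Any]:
    """Load every *.json file under output_path (recursively), keyed by relative path."""
    results: Dict[str, Any] = {}
    for json_file in sorted(output_path.rglob("*.json")):
        with json_file.open(encoding="utf-8") as handle:
            results[str(json_file.relative_to(output_path))] = json.load(handle)
    return results


# Usage: for name, content in load_results(Path("./results")).items(): print(name)
```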

Complete Example

Here's the full code in one place:

import os
from pathlib import Path
from dotenv import load_dotenv
from ogx_client import OgxClient

from ai4rag.core.experiment.experiment import AI4RAGExperiment
from ai4rag.search_space.src.parameter import Parameter
from ai4rag.search_space.src.search_space import AI4RAGSearchSpace
from ai4rag.rag.foundation_models.ogx import OGXFoundationModel
from ai4rag.rag.embedding.ogx import OGXEmbeddingModel
from ai4rag.core.hpo.gam_opt import GAMOptSettings
from ai4rag.utils.event_handler import LocalEventHandler

from dev_utils.file_store import FileStore
from dev_utils.utils import read_benchmark_from_json

# 1. Setup client
load_dotenv()
client = OgxClient(
    base_url=os.getenv("BASE_URL"),
    api_key=os.getenv("APIKEY")
)

# 2. Load documents
documents = FileStore(Path("./knowledge_base")).load_as_documents()

# 3. Load benchmark data
benchmark_data = read_benchmark_from_json(Path("./benchmark_data.json"))

# 4. Define search space
search_space = AI4RAGSearchSpace(
    params=[
        Parameter(
            name="foundation_model",
            param_type="C",
            values=[OGXFoundationModel(model_id="ollama/llama3.2:3b", client=client)],
        ),
        Parameter(
            name="embedding_model",
            param_type="C",
            values=[
                OGXEmbeddingModel(
                    model_id="ollama/nomic-embed-text:latest",
                    client=client,
                    params={"embedding_dimension": 768, "context_length": 8192},
                )
            ],
        ),
        Parameter(name="chunk_size", param_type="I", values=[200, 400, 800]),
        Parameter(name="chunk_overlap", param_type="I", values=[0, 50, 100]),
        Parameter(name="retrieval_method", param_type="C", values=["simple", "window"]),
        Parameter(name="number_of_chunks", param_type="I", values=[3, 5, 7]),
    ]
)

# 5. Configure optimizer
optimizer_settings = GAMOptSettings(max_evals=10, n_random_nodes=4)

# 6. Run experiment
experiment = AI4RAGExperiment(
    client=client,
    documents=documents,
    benchmark_data=benchmark_data,
    search_space=search_space,
    vector_store_type="ogx",
    ogx_vector_io_provider_id="milvus",
    optimizer_settings=optimizer_settings,
    event_handler=LocalEventHandler(output_path="./results"),
)

experiment.search()

best_pattern = experiment.results.get_best_evaluations(k=1)[0]
print(f"Optimization complete! Best pattern: {best_pattern}")

Next Steps

Learn about search spaces - Customize parameter ranges
Explore optimizers - Fine-tune optimization strategies
Understand evaluation - Metrics and scoring
Custom event handlers - Track experiments in production