
Quick Start

This guide walks you through running your first RAG optimization experiment with ai4rag. For simplicity, the quick start uses an OGX server, but you can also run the experiment with independently deployed models, as long as each model is passed to the experiment through an appropriate wrapper.


Data loading

To run the experiment, you need to provide documents as DoclingDocument instances (from the docling-core library). For development purposes, you can use the FileStore implementation from dev_utils. Note that dev_utils is available only when you clone the repository, because it is not part of the ai4rag package.


Prerequisites

Before starting, ensure you have:

  • Installed ai4rag (Installation Guide)
  • A running OGX server with models configured, or other deployed models that you can use for the experiment
  • Environment variables (e.g. BASE_URL, APIKEY) set so that you can connect to the OGX server or your deployed models
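If BASE_URL or APIKEY is not set, os.getenv returns None and the client receives None for that argument, so the resulting error may be harder to trace. A small check before creating the client makes the failure explicit. This is a minimal sketch; `require_env` is a hypothetical helper, not part of ai4rag, ogx_client, or python-dotenv.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the requested environment variables, or raise if any is missing or empty.

    Hypothetical helper for this guide; not part of ai4rag or ogx_client.
    """
    values = {name: os.environ.get(name, "") for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return values
```

Call it after load_dotenv(...) and before constructing OgxClient, for example `settings = require_env("BASE_URL", "APIKEY")`, then pass `settings["BASE_URL"]` and `settings["APIKEY"]` to the client.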

Step-by-Step Guide with OGX

1. Prepare OGX Client

Create a client instance to connect to your OGX server:

import os
from dotenv import load_dotenv, find_dotenv
from ogx_client import OgxClient

load_dotenv(find_dotenv())

client = OgxClient(
    base_url=os.getenv("BASE_URL"),
    api_key=os.getenv("APIKEY")
)

2. Prepare Knowledge Base Documents

Load your knowledge base documents from a local directory:

from pathlib import Path
from dev_utils.file_store import FileStore

# Path to your documents folder
documents_path = Path("path/to/your/documents")

# Load documents (supports PDF, HTML, TXT, MD, etc.)
documents = FileStore(documents_path).load_as_documents()

print(f"Loaded {len(documents)} documents")

Document Format

Documents must include a document_id in their metadata. FileStore handles this automatically.
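This guide does not say how FileStore derives document_id. The benchmark example in step 3 refers to documents by file name (for example doc_001.pdf), so one plausible assumption is that the ID is the file name. Under that assumption, the sketch below lists the IDs you could reference in the benchmark file. `list_candidate_document_ids` is a hypothetical helper, and the extension set mirrors the FileStore comment above rather than FileStore's actual list.

```python
from pathlib import Path

# Extensions named in the FileStore comment above; FileStore's real list may differ.
SUPPORTED_SUFFIXES = {".pdf", ".html", ".txt", ".md"}


def list_candidate_document_ids(folder: Path) -> list[str]:
    """Return sorted names of top-level files in `folder` with a supported extension.

    Assumes (hypothetically) that document_id equals the file name.
    """
    return sorted(
        path.name
        for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES
    )
```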


3. Prepare Benchmark Data

Create a benchmark_data.json file containing questions, ground-truth answers, and the IDs of the documents that contain those answers:

[
  {
    "question": "What is the main purpose of ai4rag?",
    "correct_answers": [
      "ai4rag optimizes RAG templates using hyperparameter optimization",
      "ai4rag finds optimal RAG configurations"
    ],
    "correct_answer_document_ids": ["doc_001.pdf", "doc_002.pdf"]
  },
  {
    "question": "Which vector databases are supported?",
    "correct_answers": [
      "Milvus and ChromaDB are supported."
    ],
    "correct_answer_document_ids": ["doc_005.txt"]
  }
]

Load the benchmark data:

from dev_utils.utils import read_benchmark_from_json

benchmark_data_path = Path("path/to/benchmark_data.json")
benchmark_data = read_benchmark_from_json(benchmark_data_path)
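read_benchmark_from_json comes from dev_utils, so its exact validation rules are not documented here. Checking the raw JSON against the format shown above can catch problems such as a misspelled key before a long experiment starts. The sketch below uses only the standard library and treats all three fields as required, which is an assumption based on the example; `validate_benchmark_records` and `check_benchmark_file` are hypothetical helpers.

```python
import json
from pathlib import Path


def validate_benchmark_records(records: object) -> list[str]:
    """Return human-readable problems found in benchmark records (empty list means OK)."""
    if not isinstance(records, list):
        return ["top-level value must be a JSON array"]
    problems: list[str] = []
    for i, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"entry {i}: must be a JSON object")
            continue
        question = record.get("question")
        if not isinstance(question, str) or not question.strip():
            problems.append(f"entry {i}: 'question' must be a non-empty string")
        # Both list fields are assumed to be required, non-empty lists of strings.
        for key in ("correct_answers", "correct_answer_document_ids"):
            value = record.get(key)
            if (
                not isinstance(value, list)
                or not value
                or not all(isinstance(item, str) and item for item in value)
            ):
                problems.append(f"entry {i}: '{key}' must be a non-empty list of strings")
    return problems


def check_benchmark_file(path: Path) -> list[str]:
    """Load a benchmark JSON file and validate it against the format shown above."""
    with path.open(encoding="utf-8") as handle:
        return validate_benchmark_records(json.load(handle))
```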

Benchmark Quality

High-quality benchmark data is crucial for meaningful optimization. Make sure the questions can be answered from your knowledge base, the answers are accurate, and every ID in correct_answer_document_ids refers to a document you loaded.
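One concrete quality check is that every ID in correct_answer_document_ids refers to a document that was actually loaded. The sketch below assumes you have the benchmark records as plain dictionaries and a set of known document IDs (for example, file names, if that is how your IDs are assigned); `find_unknown_document_ids` is a hypothetical helper.

```python
def find_unknown_document_ids(
    benchmark_records: list[dict], known_document_ids: set[str]
) -> dict[int, list[str]]:
    """Map each benchmark entry index to referenced IDs that are not in the knowledge base."""
    unknown: dict[int, list[str]] = {}
    for i, record in enumerate(benchmark_records):
        missing = [
            doc_id
            for doc_id in record.get("correct_answer_document_ids", [])
            if doc_id not in known_document_ids
        ]
        if missing:
            unknown[i] = missing
    return unknown


# Records mirroring the benchmark example above; the known IDs are illustrative.
records = [
    {"question": "What is the main purpose of ai4rag?",
     "correct_answer_document_ids": ["doc_001.pdf", "doc_002.pdf"]},
    {"question": "Which vector databases are supported?",
     "correct_answer_document_ids": ["doc_005.txt"]},
]
known = {"doc_001.pdf", "doc_002.pdf"}
print(find_unknown_document_ids(records, known))  # {1: ['doc_005.txt']}
```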


4. Define Search Space

Specify which parameters to optimize and their possible values:

from ai4rag.search_space.src.parameter import Parameter
from ai4rag.search_space.src.search_space import AI4RAGSearchSpace
from ai4rag.rag.foundation_models.ogx import OGXFoundationModel
from ai4rag.rag.embedding.ogx import OGXEmbeddingModel

search_space = AI4RAGSearchSpace(
    params=[
        # Foundation model for generation
        Parameter(
            name="foundation_model",
            param_type="C",
            values=[
                OGXFoundationModel(
                    model_id="ollama/llama3.2:3b",
                    client=client
                )
            ],
        ),
        # Embedding model
        Parameter(
            name="embedding_model",
            param_type="C",
            values=[
                OGXEmbeddingModel(
                    model_id="ollama/nomic-embed-text:latest",
                    client=client,
                    params={
                        "embedding_dimension": 768,
                        "context_length": 8192
                    },
                )
            ],
        ),
        # Chunking parameters
        Parameter(
            name="chunk_size",
            param_type="C",
            values=[200, 400, 800, 1000],
        ),
        Parameter(
            name="chunk_overlap",
            param_type="C",
            values=[0, 50, 100, 200],
        ),
        # Retrieval parameters
        Parameter(
            name="retrieval_method",
            param_type="C",
            values=["simple", "window"],
        ),
        Parameter(
            name="number_of_chunks",
            param_type="C",
            values=[3, 5, 7, 10],
        ),
    ]
)
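Because the parameter values are combined, the number of possible configurations grows multiplicatively. A quick count, using plain Python lists that mirror the values above rather than ai4rag objects, shows how small a fraction of the grid the max_evals=10 budget from the next step covers. It also flags chunk_size/chunk_overlap pairs where the overlap is not smaller than the chunk size, since such chunks add little or no new text; whether ai4rag skips those combinations is not stated in this guide.

```python
import math
from itertools import product

# Candidate values copied from the search space above; each model parameter has one value.
search_space_values = {
    "foundation_model": ["ollama/llama3.2:3b"],
    "embedding_model": ["ollama/nomic-embed-text:latest"],
    "chunk_size": [200, 400, 800, 1000],
    "chunk_overlap": [0, 50, 100, 200],
    "retrieval_method": ["simple", "window"],
    "number_of_chunks": [3, 5, 7, 10],
}

# Size of the full Cartesian grid of configurations.
grid_size = math.prod(len(values) for values in search_space_values.values())

# Flag (chunk_size, chunk_overlap) pairs where the overlap is at least the chunk size.
suspicious = [
    (size, overlap)
    for size, overlap in product(
        search_space_values["chunk_size"], search_space_values["chunk_overlap"]
    )
    if overlap >= size
]

max_evals = 10
print(grid_size)                       # 128
print(f"{max_evals / grid_size:.1%}")  # 7.8%
print(suspicious)                      # [(200, 200)]
```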

5. Configure Optimizer

Set up the hyperparameter optimization algorithm:

from ai4rag.core.hpo.gam_opt import GAMOptSettings

optimizer_settings = GAMOptSettings(
    max_evals=10,      # Total number of configurations to evaluate
    n_random_nodes=4   # Number of random explorations before using GAM
)

Optimization Strategy

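  • Random phase (n_random_nodes): Evaluates randomly selected configurations to explore the search space broadly and reduce the risk of getting stuck in a local optimum. A larger value gives broader exploration but leaves fewer of the max_evals evaluations for the GAM phase.
  • GAM phase: Uses a model built from the configurations evaluated so far to suggest promising configurations.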
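The two-phase strategy can be illustrated with a toy loop: sample n_random_nodes distinct configurations at random, then repeatedly evaluate the unseen configuration that a surrogate model rates highest until max_evals configurations have been scored. This is not ai4rag's GAM optimizer. The surrogate here is a simple nearest-neighbour average (by number of differing parameters), the grid is reduced, and `toy_score` is a hypothetical stand-in for a real RAG evaluation.

```python
import random
from itertools import product

# Reduced grid of (chunk_size, chunk_overlap, retrieval_method, number_of_chunks).
GRID = list(product([200, 400, 800, 1000], [0, 50, 100], ["simple", "window"], [3, 5, 7, 10]))


def toy_score(config: tuple) -> float:
    """Hypothetical stand-in for a RAG evaluation metric (higher is better)."""
    chunk_size, chunk_overlap, method, n_chunks = config
    score = 1.0 - abs(chunk_size - 400) / 1000 - abs(n_chunks - 5) / 20
    return score + (0.05 if method == "window" else 0.0) + chunk_overlap / 2000


def surrogate(config: tuple, history: dict) -> float:
    """Predict a score as the mean score of evaluated configs differing in the fewest parameters."""
    distances = {seen: sum(a != b for a, b in zip(config, seen)) for seen in history}
    nearest = min(distances.values())
    neighbour_scores = [history[seen] for seen, d in distances.items() if d == nearest]
    return sum(neighbour_scores) / len(neighbour_scores)


def random_then_guided(max_evals: int, n_random_nodes: int, seed: int = 0) -> dict:
    """Return {config: score} in evaluation order: random phase first, then guided phase."""
    if not 0 < n_random_nodes <= max_evals <= len(GRID):
        raise ValueError("need 0 < n_random_nodes <= max_evals <= grid size")
    rng = random.Random(seed)
    history: dict = {}
    # Random phase: evaluate n_random_nodes distinct configurations chosen uniformly.
    for config in rng.sample(GRID, n_random_nodes):
        history[config] = toy_score(config)
    # Guided phase: evaluate the unseen configuration the surrogate rates highest.
    while len(history) < max_evals:
        candidates = [c for c in GRID if c not in history]
        best = max(candidates, key=lambda c: surrogate(c, history))
        history[best] = toy_score(best)
    return history


history = random_then_guided(max_evals=10, n_random_nodes=4)
best_config = max(history, key=history.get)
print(best_config, round(history[best_config], 3))
```

A nearest-neighbour surrogate keeps the sketch dependency-free. A GAM instead models the score as a sum of per-parameter effects, which lets it generalize to configurations that are not close to any evaluated one.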

6. Run the Experiment

Create and run the optimization experiment:

from ai4rag.core.experiment.experiment import AI4RAGExperiment
from ai4rag.utils.event_handler import LocalEventHandler

experiment = AI4RAGExperiment(
    client=client,
    documents=documents,
    benchmark_data=benchmark_data,
    search_space=search_space,
    vector_store_type="ogx",  # "ogx" for OGX, or "chroma" for in-memory
    ogx_vector_io_provider_id="milvus",  # Matches your OGX server config
    optimizer_settings=optimizer_settings,
    event_handler=LocalEventHandler(output_path="<path_to_store_results>"),  # Tracks progress
)

# Run optimization
experiment.search()

# Retrieve the highest-ranked evaluation and query its RAG pattern
best_pattern = experiment.results.get_best_evaluations(k=1)[0]

print(best_pattern.rag_pattern.generate("What is the main purpose of ai4rag?"))

7. Review Results

After the experiment completes, check the output_path directory you passed to LocalEventHandler for:

  • JSON files: Detailed results for each evaluated configuration
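The guide does not describe the layout or schema of the result files, so a safe first step is to load whatever JSON files LocalEventHandler wrote and inspect them. This sketch searches output_path recursively because the directory structure is not documented; `load_result_files` is a hypothetical helper.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_result_files(output_path: str | Path) -> dict[str, object]:
    """Load every JSON file under output_path, keyed by its path relative to output_path.

    The files' internal structure is not documented here, so parsed content is returned as-is.
    """
    root = Path(output_path)
    if not root.is_dir():
        raise FileNotFoundError(f"No results directory at {root}")
    results: dict[str, object] = {}
    for json_file in sorted(root.rglob("*.json")):
        with json_file.open(encoding="utf-8") as handle:
            results[json_file.relative_to(root).as_posix()] = json.load(handle)
    return results
```

For example, `load_result_files("./results")` returns a dictionary mapping each file's relative path to its parsed content, which you can then print or explore interactively.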

Complete Example

Here's the full code in one place:

import os
from pathlib import Path
from dotenv import load_dotenv
from ogx_client import OgxClient

from ai4rag.core.experiment.experiment import AI4RAGExperiment
from ai4rag.search_space.src.parameter import Parameter
from ai4rag.search_space.src.search_space import AI4RAGSearchSpace
from ai4rag.rag.foundation_models.ogx import OGXFoundationModel
from ai4rag.rag.embedding.ogx import OGXEmbeddingModel
from ai4rag.core.hpo.gam_opt import GAMOptSettings
from ai4rag.utils.event_handler import LocalEventHandler

from dev_utils.file_store import FileStore
from dev_utils.utils import read_benchmark_from_json

# 1. Setup client
load_dotenv()
client = OgxClient(
    base_url=os.getenv("BASE_URL"),
    api_key=os.getenv("APIKEY")
)

# 2. Load documents
documents = FileStore(Path("./knowledge_base")).load_as_documents()

# 3. Load benchmark data
benchmark_data = read_benchmark_from_json(Path("./benchmark_data.json"))

# 4. Define search space
search_space = AI4RAGSearchSpace(
    params=[
        Parameter(
            name="foundation_model",
            param_type="C",
            values=[OGXFoundationModel(model_id="ollama/llama3.2:3b", client=client)],
        ),
        Parameter(
            name="embedding_model",
            param_type="C",
            values=[
                OGXEmbeddingModel(
                    model_id="ollama/nomic-embed-text:latest",
                    client=client,
                    params={"embedding_dimension": 768, "context_length": 8192},
                )
            ],
        ),
        Parameter(name="chunk_size", param_type="I", values=[200, 400, 800]),
        Parameter(name="chunk_overlap", param_type="I", values=[0, 50, 100]),
        Parameter(name="retrieval_method", param_type="C", values=["simple", "window"]),
        Parameter(name="number_of_chunks", param_type="I", values=[3, 5, 7]),
    ]
)

# 5. Configure optimizer
optimizer_settings = GAMOptSettings(max_evals=10, n_random_nodes=4)

# 6. Run experiment
experiment = AI4RAGExperiment(
    client=client,
    documents=documents,
    benchmark_data=benchmark_data,
    search_space=search_space,
    vector_store_type="ogx",
    ogx_vector_io_provider_id="milvus",
    optimizer_settings=optimizer_settings,
    event_handler=LocalEventHandler(output_path="./results"),
)

experiment.search()

# Retrieve the best configuration, as in step 6
best_pattern = experiment.results.get_best_evaluations(k=1)[0]
print(f"Optimization complete! Best pattern: {best_pattern}")

Next Steps