Working with AutoAI RAG class and rag_optimizer

The AutoAI experiment class creates experiments and schedules training. All experiment results are stored automatically in the Cloud Object Storage (COS) that you specify. AutoAI can then fetch the results and return them to you for further use.

Configure rag_optimizer with one data source

To initialize an AutoAI object, you need your watsonx.ai credentials (API key and URL) and either a project_id or a space_id.

Hint

You can copy the project_id from the Project’s Manage tab (Project -> Manage -> General -> Details).

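from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.experiment import AutoAI

# Placeholders: use your own watsonx.ai URL and API key
wx_credentials = Credentials(url='<WATSONX_URL>', api_key='<API_KEY>')

experiment = AutoAI(wx_credentials,
    space_id='76g53e0-0b32-4a0e-9152-3d50324855ddb'
)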

rag_optimizer = experiment.rag_optimizer(
    name='AutoAI RAG test',
    description='Sample description',
    foundation_models=[
        "meta-llama/llama-3-70b-instruct",
        "ibm/granite-13b-chat-v2"
        ],
    embedding_models=[
        "ibm/slate-125m-english-rtrvr",
        "intfloat/multilingual-e5-large"
        ],
    max_number_of_rag_patterns=5,
    optimization_metrics=[AutoAI.RAGMetrics.ANSWER_CORRECTNESS],
)
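Credentials and the project or space ID are usually kept out of source code. The sketch below is one way to collect them from environment variables before creating the AutoAI object. The variable names (WATSONX_URL, WATSONX_APIKEY, WATSONX_PROJECT_ID, WATSONX_SPACE_ID) and the helper load_autoai_settings are hypothetical, and rejecting a configuration that sets both IDs is an assumption based on the "either a project_id or a space_id" requirement.

```python
import os


def load_autoai_settings(env=os.environ):
    """Collect watsonx.ai connection settings from environment variables.

    Hypothetical variable names: WATSONX_URL, WATSONX_APIKEY,
    WATSONX_PROJECT_ID and WATSONX_SPACE_ID.
    Returns (credentials_kwargs, scope_kwargs).
    """
    url = env.get("WATSONX_URL")
    api_key = env.get("WATSONX_APIKEY")
    if not url or not api_key:
        raise ValueError("WATSONX_URL and WATSONX_APIKEY must both be set")

    project_id = env.get("WATSONX_PROJECT_ID")
    space_id = env.get("WATSONX_SPACE_ID")
    # Assumption: exactly one scope is expected, a project or a space.
    if bool(project_id) == bool(space_id):
        raise ValueError(
            "Set exactly one of WATSONX_PROJECT_ID or WATSONX_SPACE_ID"
        )

    scope = {"project_id": project_id} if project_id else {"space_id": space_id}
    return {"url": url, "api_key": api_key}, scope
```

With the returned values you could then write experiment = AutoAI(Credentials(**creds), **scope). Passing env as a parameter keeps the helper easy to test with a plain dictionary.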

Get configuration parameters

To see the current configuration parameters, call the get_params() method.

config_parameters = rag_optimizer.get_params()
print(config_parameters)
{
    'name': 'AutoAI RAG test',
    'description': 'Sample description',
    'embedding_models': [
        'ibm/slate-125m-english-rtrvr',
        'intfloat/multilingual-e5-large'
        ],
    'foundation_models': [
        'meta-llama/llama-3-70b-instruct',
        'ibm/granite-13b-chat-v2'
        ],
    'max_number_of_rag_patterns': 5,
    'optimization_metrics': ['answer_correctness']
}
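Because get_params() returns a plain dictionary, you can check it against the values you meant to pass. A minimal sketch with a hypothetical helper, param_mismatches; it compares plain values, so an enum such as AutoAI.RAGMetrics.ANSWER_CORRECTNESS is written here as the string 'answer_correctness' that appears in the output above.

```python
def param_mismatches(requested, recorded):
    """Compare requested settings with the dictionary returned by get_params().

    Returns {key: (requested_value, recorded_value)} for every requested key
    whose recorded value differs; a key absent from `recorded` shows None.
    """
    mismatches = {}
    for key, wanted in requested.items():
        actual = recorded.get(key)
        if actual != wanted:
            mismatches[key] = (wanted, actual)
    return mismatches


requested = {
    "max_number_of_rag_patterns": 5,
    "optimization_metrics": ["answer_correctness"],
}
# In practice: recorded = rag_optimizer.get_params()
recorded = {
    "name": "AutoAI RAG test",
    "max_number_of_rag_patterns": 5,
    "optimization_metrics": ["answer_correctness"],
}
print(param_mismatches(requested, recorded))  # {}
```

An empty result means every requested setting was recorded as intended; extra keys in the recorded configuration are ignored.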

Run rag_optimizer

To schedule an AutoAI RAG experiment, call the run() method. This triggers the training and optimization process on watsonx.ai. The run() method can run synchronously (background_mode=False) or asynchronously (background_mode=True). If you don’t want to wait for the run to finish, use the asynchronous mode, which immediately returns only the run details.

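# input_data_connection, test_data_connection and results_connection are
# DataConnection objects for the documents, the test data and the results location

# Asynchronous: returns the run details immediately
run_details = rag_optimizer.run(
    input_data_references=[input_data_connection],
    test_data_references=[test_data_connection],
    results_reference=results_connection,
    background_mode=True
)

# OR

# Synchronous: waits until the run finishes
run_details = rag_optimizer.run(
    input_data_references=[input_data_connection],
    test_data_references=[test_data_connection],
    results_reference=results_connection,
    background_mode=False
)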

Get the run status and run details

If you use the run() method asynchronously, you can monitor the run details and status using the following two methods:

status = rag_optimizer.get_run_status()
print(status)

'running'

# OR

'completed'

run_details = rag_optimizer.get_run_details()
print(run_details)
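When the run is started with background_mode=True, a small polling loop can wait for it to finish. This sketch assumes that get_run_status() returns strings such as 'running' and 'completed', as shown above; treating 'failed' and 'canceled' as terminal states is an assumption, so adjust terminal_states to the statuses your service actually reports. The helper name wait_for_run is hypothetical.

```python
import time


def wait_for_run(get_status, poll_interval=30.0, timeout=None,
                 terminal_states=("completed", "failed", "canceled"),
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a terminal state.

    Raises TimeoutError if `timeout` seconds pass first (None waits forever).
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = None if timeout is None else clock() + timeout
    while True:
        status = get_status()
        if status in terminal_states:
            return status
        if deadline is not None and clock() >= deadline:
            raise TimeoutError(f"run still '{status}' after {timeout} seconds")
        sleep(poll_interval)
```

With the optimizer from this page, a call could look like final_status = wait_for_run(rag_optimizer.get_run_status, poll_interval=60, timeout=3600), followed by rag_optimizer.get_run_details() once the status is 'completed'.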

RAG Optimizer summary

The summary() method returns a ranking of all computed patterns, sorted by the scoring metric supplied when configuring the optimizer (the optimization_metrics parameter). You can also pass the scoring parameter, either as a single metric name or as a list of names. The output is a pandas.DataFrame with pattern names, computation timestamps, machine learning metrics, and the number of enhancements implemented in each pattern.

rag_optimizer.summary()
rag_optimizer.summary(scoring='answer_correctness')
rag_optimizer.summary(scoring=['answer_correctness', 'context_correctness'])

# Result:
#                  mean_answer_correctness  ...  ci_high_faithfulness
# Pattern_Name
# Pattern3                        0.79165   ...                0.5102
# Pattern1                        0.72915   ...                0.4839
# Pattern2                        0.64585   ...                0.8333
# Pattern4                        0.64585   ...                0.5312
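The ordering that summary() applies can be illustrated in plain Python. This sketch ranks the mean_answer_correctness values from the example output above, assuming that a higher score is better; rank_patterns is a hypothetical helper. On the real DataFrame, sort_values('mean_answer_correctness', ascending=False) sorts the same way, although ties may come out in a different order unless you request a stable sort.

```python
def rank_patterns(scores, metric):
    """Return pattern names ordered from best to worst on `metric`.

    `scores` maps pattern name -> {metric name: value}. Higher is assumed
    to be better; ties keep their input order because sorted() is stable,
    even with reverse=True.
    """
    return sorted(scores, key=lambda name: scores[name][metric], reverse=True)


# Values taken from the example summary output
scores = {
    "Pattern1": {"mean_answer_correctness": 0.72915},
    "Pattern2": {"mean_answer_correctness": 0.64585},
    "Pattern3": {"mean_answer_correctness": 0.79165},
    "Pattern4": {"mean_answer_correctness": 0.64585},
}
print(rank_patterns(scores, "mean_answer_correctness"))
```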

Get pattern details

To see the pattern details, use the get_pattern_details() method. If you leave pattern_name empty, the method returns the details of the best computed pattern.

pattern_params = rag_optimizer.get_pattern_details(pattern_name='Pattern3')
print(pattern_params)
{
    'composition_steps': [
        'chunking',
        'embeddings',
        'vector_store',
        'retrieval',
        'generation'
    ],
    'location': {
        'evaluation_results': '4r55b555-63a6-4cc9-3d00-3d2y762b4vg/Pattern3/evaluation_results.json',
        'indexing_notebook': '4r55b555-63a6-4cc9-3d00-3d2y762b4vg/Pattern3/indexing_notebook.ipynb',
        'inference_notebook': '4r55b555-63a6-4cc9-3d00-3d2y762b4vg/Pattern3/inference_notebook.ipynb'
    },
    'name': 'Pattern3',
    'settings': {
        'chunking': {
            'chunk_size': 512,
            'method': 'recursive'
        },
        'embeddings': {
            'model_id': 'ibm/slate-125m-english-rtrvr',
            'truncate_input_tokens': 512,
            'truncate_strategy': 'left'
        },
        'generation': {
            'context_template_text': '[document]: {document}\n',
            'model_id': 'meta-llama/llama-3-70b-instruct',
            'parameters': {
                'max_new_tokens': 500
            },
            'prompt_template_text': '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don’t know the answer to a question, please don’t share false information.\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n{reference_documents}\n[conversation]: {question}. Answer with no more than 150 words.  If you cannot base your answer on the given document, please state that you do not have an answer.<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>'
        },
        'retrieval': {
            'method': 'simple',
            'number_of_chunks': 5,
            'window_size': 0
        },
        'vector_store': {
            'distance_metric': 'cosine',
            'index_name': 'autoai_rag_20240807124701'
        }
    }
}
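The generation settings combine two templates: context_template_text formats each retrieved chunk, and prompt_template_text holds the {reference_documents} and {question} placeholders. The sketch below assumes that the formatted chunks are concatenated into {reference_documents}; it illustrates the idea and is not the library's own prompt-building code. build_prompt is a hypothetical helper, and str.format would fail on a template that contains other literal braces.

```python
def build_prompt(prompt_template, context_template, chunks, question):
    """Fill a pattern's templates with retrieved chunks and a question.

    Assumption: each chunk is rendered with context_template_text and the
    rendered chunks are concatenated into the {reference_documents} slot.
    """
    reference_documents = "".join(
        context_template.format(document=chunk) for chunk in chunks
    )
    return prompt_template.format(
        reference_documents=reference_documents, question=question
    )


# Shortened templates in the same shape as the pattern details above
demo = build_prompt(
    "{reference_documents}[conversation]: {question}.",
    "[document]: {document}\n",
    ["Chunk one.", "Chunk two."],
    "What is AutoAI RAG",
)
print(demo)
```

With the real details you would pass pattern_params['settings']['generation']['prompt_template_text'] and ['context_template_text'], plus up to number_of_chunks retrieved chunks.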

Get pattern

Use the get_pattern() method to load a specific pattern. If you leave pattern_name empty, the method returns the best computed pattern.

pattern = rag_optimizer.get_pattern(pattern_name='Pattern3')
print(type(pattern))
<class 'ibm_watsonx_ai.foundation_models.extensions.rag.pattern.pattern.RAGPattern'>

Get inference and indexing notebooks

To download the inference notebook of a specified pattern from the service, use the get_inference_notebook() method. If you leave pattern_name empty, the method downloads the notebook of the best computed pattern.

rag_optimizer.get_inference_notebook(pattern_name='Pattern3')

To download the indexing notebook of a specified pattern from the service, use the get_indexing_notebook() method. If you leave pattern_name empty, the method downloads the notebook of the best computed pattern.

rag_optimizer.get_indexing_notebook(pattern_name='Pattern3')

Get logs

To download the logs of an AutoAI RAG job, use the get_logs() method.

rag_optimizer.get_logs()

Get evaluation results

To download the evaluation results of an AutoAI RAG job, use the get_evaluation_results() method. If you leave pattern_name empty, the method downloads the evaluation results of the best computed pattern.

rag_optimizer.get_evaluation_results(pattern_name="Pattern1")

Online deployment

To deploy a pattern as an inference_function or an inference_service, use the following methods.

pattern = rag_optimizer.get_pattern()

deployment_details = pattern.inference_function.deploy(
    name="inference_function deployment",
    space_id="<SPACE_ID>",
)

Note

Deploying an inference_service is supported since version 1.2.0.

pattern = rag_optimizer.get_pattern()

deployment_details = pattern.inference_service.deploy(
    name="inference_service deployment",
    space_id="<SPACE_ID>",
)