Prompt Evaluator¶
- class ibm_watsonx_gov.prompt_evaluator.prompt_evaluator.PromptEvaluator(credentials: Credentials | None = None)¶
Bases:
object
PromptEvaluator is a class that sets up a prompt template and evaluates the risks associated with it.
Example
# Create the prompt evaluator
evaluator = PromptEvaluator(
    credentials=Credentials(api_key="")
)

# Create the prompt setup
prompt_setup = PromptSetup(
    task_type=TaskType.RAG,
    question_field="question",
    context_fields=["context1"],
    label_column="answer",
)

# Create the prompt template
prompt_template = PromptTemplate(
    name="test",
    description="description",
    input_variables=["question", "context1"],
    input_text="Answer the below question from the given context only and do not use the knowledge outside the context. Context: {context1} Question: {question} Answer:",
    model_id="google/flan-ul2",
    task_ids=[TaskType.RAG.value]
)

# Provide the development container details
development_container = ProjectContainer(
    container_id="3acf420f-526a-4007-abe7-78a03435aac2",
    monitors=[
        GenerativeAIQualityMonitor(),
    ]
)

# Evaluate the risk based on the provided dataset
evaluator.evaluate_risk(
    prompt_setup=prompt_setup,
    prompt_template=prompt_template,
    containers=[development_container],
    environments=[EvaluationStage.DEVELOPMENT],
    input_file_path="./rag_dataset.csv",
)

# Show the evaluation result
evaluator.get_monitor_metrics(
    monitor=BaseMonitor(monitor_name="generative_ai_quality"),
    environment=EvaluationStage.DEVELOPMENT,
    show_table=True,
)
evaluator.get_dataset_records(
    dataset_type="gen_ai_quality_metrics",
    environment=EvaluationStage.DEVELOPMENT,
    show_table=True,
)
- e2e_prompt_evaluation(config: dict[str, any], input_file_path: str = None)¶
Method to set up and evaluate the prompt template end to end with a simplified interface.
Examples
# Create the prompt evaluator
evaluator = PromptEvaluator(
    credentials=Credentials(api_key="")
)

# Detached prompt configuration example
detached_prompt_config = {
    "prompt_setup": {
        "problem_type": TaskType.RAG.value,
        "context_fields": ["context1"],
    },
    "development_project_id": "3acf420f-526a-4007-abe7-78a03435aac2",
    "detached_prompt_template": {
        "name": "detached prompt experiment",
        "model_id": "ibm/granite-3-2-8b-instruct",
        "input_text": "Answer the below question from the given context only and do not use the knowledge outside the context. Context: {context1} Question: {question} Answer:",
        "input_variables": ["question", "context1"],
        "detached_model_url": "https://us-south.ml.cloud.ibm.com/ml/v1/deployments/insurance_test_deployment/text/generation?version=2021-05-01",
        "task_ids": [TaskType.RAG.value],
    },
}

# Prompt configuration example
prompt_config = {
    "prompt_setup": {
        "problem_type": TaskType.RAG.value,
        "context_fields": ["context1"],
    },
    "development_project_id": "3acf420f-526a-4007-abe7-78a03435aac2",
    "prompt_template": {
        "name": "prompt experiment",
        "model_id": "ibm/granite-3-2-8b-instruct",
        "input_text": "Answer the below question from the given context only and do not use the knowledge outside the context. Context: {context1} Question: {question} Answer:",
        "input_variables": ["question", "context1"],
        "task_ids": [TaskType.RAG.value],
    },
    # Optional AI use case configuration
    "ai_usecase_id": "b1504848-3cf9-4ab9-9d46-d688e34a0295",
    "catalog_id": "7bca9a52-7c90-4fb4-b3ef-3194e25a8452",  # same as inventory_id
    "approach_id": "80b3a883-015f-498a-86f3-55ba74b5374b",
    "approach_version": "0.0.2",
}

# Evaluate the risk based on the provided dataset
evaluator.e2e_prompt_evaluation(
    config=prompt_config,  # or detached_prompt_config
    input_file_path="./rag_dataset.csv",
)

# Show the evaluation result
evaluator.get_monitor_metrics(
    monitor=BaseMonitor(monitor_name="generative_ai_quality"),
    environment=EvaluationStage.DEVELOPMENT,
    show_table=True,
)
evaluator.get_dataset_records(
    dataset_type="gen_ai_quality_metrics",
    environment=EvaluationStage.DEVELOPMENT,
    show_table=True,
)
- Parameters:
config (dict[str, any]) – The configuration dictionary.
input_file_path (str, optional) – Path to the input data to evaluate. This can be a local file or a link to a file. The prompt template evaluation is skipped if this argument is not set.
- evaluate_risk(prompt_setup: PromptSetup, containers: list[ProjectContainer | SpaceContainer], input_file_path: str, prompt_template: PromptTemplate | DetachedPromptTemplate = None, prompt_template_id: str = None, environments: list[EvaluationStage] = [EvaluationStage.DEVELOPMENT])¶
Evaluate the risk of a given input file for a list of containers. Note that either prompt_template or prompt_template_id must be provided; see the example after the parameter list.
- Parameters:
prompt_setup (PromptSetup) – The prompt setup configuration.
prompt_template (PromptTemplate | DetachedPromptTemplate, optional) – The prompt template to use for evaluation.
prompt_template_id (str, optional) – The prompt template id to use for evaluation.
containers (list[ProjectContainer | SpaceContainer]) – The container details.
input_file_path (str) – The path to the input file to evaluate.
environments (list[EvaluationStage], optional) – The list of evaluation stages to do the evaluation in. Defaults to [EvaluationStage.DEVELOPMENT].
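Example
A minimal sketch of the prompt_template_id variant, reusing the evaluator, prompt_setup, and development_container objects from the class-level example above; the template ID shown is a placeholder, not a real asset ID.
# Evaluate an existing prompt template asset instead of passing a PromptTemplate object
evaluator.evaluate_risk(
    prompt_setup=prompt_setup,
    prompt_template_id="<prompt-template-asset-id>",  # placeholder ID for illustration
    containers=[development_container],
    environments=[EvaluationStage.DEVELOPMENT],
    input_file_path="./rag_dataset.csv",
)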
- get_dataset_records(dataset_type: str, environment: EvaluationStage = EvaluationStage.DEVELOPMENT, show_table: bool = False) dict[str, any] ¶
Retrieve dataset records for a given dataset type and environment.
- Parameters:
dataset_type (str) – The type of dataset to retrieve records for.
environment (EvaluationStage, optional) – The environment to retrieve records from. Defaults to EvaluationStage.DEVELOPMENT.
show_table (bool, optional) – Whether to display the dataset records as a table. Defaults to False.
- Returns:
A dictionary containing the dataset records.
- Return type:
dict[str, any]
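Example
A short usage sketch, assuming the evaluator from the class-level example has already run an evaluation; the keys of the returned dictionary depend on the dataset type, so the inspection below is illustrative only.
# Fetch the generative AI quality records from the development stage without printing a table
records = evaluator.get_dataset_records(
    dataset_type="gen_ai_quality_metrics",
    environment=EvaluationStage.DEVELOPMENT,
)
# The result is a plain dictionary, so it can be inspected or persisted as needed
print(list(records.keys()))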
- get_monitor_metrics(monitor: BaseMonitor, environment: EvaluationStage = EvaluationStage.DEVELOPMENT, show_table: bool = False)¶
Get monitor metrics for a given monitor in a specific environment.
- Parameters:
monitor (BaseMonitor) – The monitor to get the metrics for.
environment (EvaluationStage, optional) – The monitor environment. Defaults to EvaluationStage.DEVELOPMENT.
show_table (bool, optional) – Flag to print the result table. Defaults to False.
- Returns:
A dictionary containing the monitor metrics.
- Return type:
dict[str, any]
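Example
A short usage sketch, reusing the evaluator and monitor name from the class-level example; the structure of the returned dictionary is not documented here, so the inspection below is illustrative only.
# Retrieve the generative AI quality metrics computed in the development stage
metrics = evaluator.get_monitor_metrics(
    monitor=BaseMonitor(monitor_name="generative_ai_quality"),
    environment=EvaluationStage.DEVELOPMENT,
    show_table=False,  # return the dictionary without printing a table
)
print(list(metrics.keys()))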
- get_prompt_template_id(environment: EvaluationStage = EvaluationStage.DEVELOPMENT) str ¶
Retrieves the prompt template ID based on the specified environment.
- Parameters:
environment (EvaluationStage, optional) – The environment for which to retrieve the prompt template ID. Defaults to EvaluationStage.DEVELOPMENT.
- Returns:
The prompt template ID corresponding to the specified environment.
- Return type:
str
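Example
A short sketch, assuming an evaluation has already been run in the development stage so that a prompt template asset exists for it.
# Look up the prompt template ID created for the development stage
prompt_template_id = evaluator.get_prompt_template_id(
    environment=EvaluationStage.DEVELOPMENT,
)
# The ID can be reused later, for example with evaluate_risk(prompt_template_id=...)
print(prompt_template_id)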