Prompt Evaluator

class ibm_watsonx_gov.prompt_evaluator.prompt_evaluator.PromptEvaluator(credentials: Credentials | None = None)

Bases: object

PromptEvaluator is a class that sets up a prompt template and evaluates the risks associated with it.

Example

# Create the prompt evaluator
evaluator = PromptEvaluator(
    credentials=Credentials(api_key="")
)

# Create the prompt setup
prompt_setup = PromptSetup(
    task_type=TaskType.RAG,
    question_field="question",
    context_fields=["context1"],
    label_column="answer",
)

# Create the prompt template
prompt_template = PromptTemplate(
    name="test",
    description="description",
    input_variables=["question", "context1"],
    input_text="Answer the below question from the given context only and do not use the knowledge outside the context. Context: {context1} Question: {question} Answer:",
    model_id="google/flan-ul2",
    task_ids=[TaskType.RAG.value]
)

# Provide the development container details
development_container = ProjectContainer(
    container_id="3acf420f-526a-4007-abe7-78a03435aac2",
    monitors=[
        GenerativeAIQualityMonitor(),
    ]
)

# Evaluate the risk based on the provided dataset
evaluator.evaluate_risk(
    prompt_setup=prompt_setup,
    prompt_template=prompt_template,
    containers=[development_container],
    environments=[EvaluationStage.DEVELOPMENT],
    input_file_path="./rag_dataset.csv",
)

# Show the evaluation result
evaluator.get_monitor_metrics(
    monitor=BaseMonitor(monitor_name="generative_ai_quality"),
    environment=EvaluationStage.DEVELOPMENT,
    show_table=True,
)

evaluator.get_dataset_records(
    dataset_type="gen_ai_quality_metrics",
    environment=EvaluationStage.DEVELOPMENT,
    show_table=True,
)

e2e_prompt_evaluation(config: dict[str, any], input_file_path: str = None)

Method to set up and evaluate the prompt template end to end with a simplified interface.

Examples

# Create the prompt evaluator
evaluator = PromptEvaluator(
    credentials=Credentials(api_key="")
)

# detached prompt configuration example
detached_prompt_config = {
    "prompt_setup": {
        "problem_type": TaskType.RAG.value,
        "context_fields": ["context1"],
    },
    "development_project_id": "3acf420f-526a-4007-abe7-78a03435aac2",
    "detached_prompt_template": {
        "name": "detached prompt experiment",
        "model_id": "ibm/granite-3-2-8b-instruct",
        "input_text": "Answer the below question from the given context only and do not use the knowledge outside the context. Context: {context1} Question: {question} Answer:",
        "input_variables": ["question", "context1"],
        "detached_model_url": "https://us-south.ml.cloud.ibm.com/ml/v1/deployments/insurance_test_deployment/text/generation?version=2021-05-01",
        "task_ids": [TaskType.RAG.value],
    },
}

# prompt configuration example
prompt_config = {
    "prompt_setup": {
        "problem_type": TaskType.RAG.value,
        "context_fields": ["context1"],
    },
    "development_project_id": "3acf420f-526a-4007-abe7-78a03435aac2",
    "prompt_template": {
        "name": "prompt experiment",
        "model_id": "ibm/granite-3-2-8b-instruct",
        "input_text": "Answer the below question from the given context only and do not use the knowledge outside the context. Context: {context1} Question: {question} Answer:",
        "input_variables": ["question", "context1"],
        "task_ids": [TaskType.RAG.value],
    },
    # Optional AI use case configuration
    "ai_usecase_id": "b1504848-3cf9-4ab9-9d46-d688e34a0295",
    "catalog_id": "7bca9a52-7c90-4fb4-b3ef-3194e25a8452", // same as inventory_id
    "approach_id": "80b3a883-015f-498a-86f3-55ba74b5374b",
    "approach_version": "0.0.2",
}

# Evaluate the risk based on the provided dataset
evaluator.e2e_prompt_evaluation(
    config=prompt_config,  # or detached_prompt_config
    input_file_path="./rag_dataset.csv",
)

# Show the evaluation result
evaluator.get_monitor_metrics(
    monitor=BaseMonitor(monitor_name="generative_ai_quality"),
    environment=EvaluationStage.DEVELOPMENT,
    show_table=True,
)

evaluator.get_dataset_records(
    dataset_type="gen_ai_quality_metrics",
    environment=EvaluationStage.DEVELOPMENT,
    show_table=True,
)

Parameters:
  • config (dict[str, any]) – The configuration dictionary, as shown in the examples above.

  • input_file_path (str, optional) – Path to the input file to evaluate. This can be a local file or a link to a file. The prompt template evaluation will be skipped if this argument is not set.

evaluate_risk(prompt_setup: PromptSetup, containers: list[ProjectContainer | SpaceContainer], input_file_path: str, prompt_template: PromptTemplate | DetachedPromptTemplate = None, prompt_template_id: str = None, environments: list[EvaluationStage] = [EvaluationStage.DEVELOPMENT])

Evaluate the risk for a given input file across a list of containers. Note that either prompt_template or prompt_template_id must be provided.

Parameters:
  • prompt_setup (PromptSetup) – The prompt setup details.

  • prompt_template (PromptTemplate | DetachedPromptTemplate, optional) – The prompt template to use for evaluation.

  • prompt_template_id (str, optional) – The prompt template id to use for evaluation.

  • containers (list[ProjectContainer | SpaceContainer]) – The container details.

  • input_file_path (str) – The path to the input file to evaluate.

  • environments (list[EvaluationStage], optional) – The list of evaluation stages in which to run the evaluation. Defaults to [EvaluationStage.DEVELOPMENT].
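
Example (a minimal sketch, reusing the evaluator, prompt_setup, and development_container objects from the class-level example above; the prompt template ID is a placeholder):

# Evaluate an existing prompt template by ID instead of passing a PromptTemplate object
evaluator.evaluate_risk(
    prompt_setup=prompt_setup,
    prompt_template_id="<prompt-template-id>",  # placeholder ID
    containers=[development_container],
    environments=[EvaluationStage.DEVELOPMENT],
    input_file_path="./rag_dataset.csv",
)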

get_dataset_records(dataset_type: str, environment: EvaluationStage = EvaluationStage.DEVELOPMENT, show_table: bool = False) → dict[str, any]

Retrieve dataset records for a given dataset type and environment.

Parameters:
  • dataset_type (str) – The type of dataset to retrieve records for.

  • environment (EvaluationStage, optional) – The environment to retrieve records from. Defaults to EvaluationStage.DEVELOPMENT.

  • show_table (bool, optional) – Whether to display the dataset records as a table. Defaults to False.

Returns:

A dictionary containing the dataset records.

Return type:

dict[str, any]
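
Example (a minimal sketch, reusing the evaluator and the dataset type from the class-level example above; the keys of the returned dictionary are not documented here):

# Retrieve the generative AI quality records without printing the table
records = evaluator.get_dataset_records(
    dataset_type="gen_ai_quality_metrics",
    environment=EvaluationStage.DEVELOPMENT,
)

# Inspect the returned dictionary
print(records.keys())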

get_monitor_metrics(monitor: BaseMonitor, environment: EvaluationStage = EvaluationStage.DEVELOPMENT, show_table: bool = False)

Get monitor metrics for a given monitor in a specific environment.

Parameters:
  • monitor (BaseMonitor) – The monitor to get the metrics for.

  • environment (EvaluationStage, optional) – The monitor environment. Defaults to EvaluationStage.DEVELOPMENT.

  • show_table (bool, optional) – Whether to print the result table. Defaults to False.

Returns:

A dictionary containing the monitor metrics.

Return type:

dict[str, any]
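
Example (a minimal sketch, reusing the evaluator from the class-level example above):

# Fetch the generative AI quality metrics as a dictionary instead of printing the table
metrics = evaluator.get_monitor_metrics(
    monitor=BaseMonitor(monitor_name="generative_ai_quality"),
    environment=EvaluationStage.DEVELOPMENT,
)
print(metrics)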

get_prompt_template_id(environment: EvaluationStage = EvaluationStage.DEVELOPMENT) → str

Retrieves the prompt template ID based on the specified environment.

Parameters:

environment (EvaluationStage, optional) – The environment for which to retrieve the prompt template ID. Defaults to EvaluationStage.DEVELOPMENT.

Returns:

The prompt template ID corresponding to the specified environment.

Return type:

str
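
Example (a minimal sketch, assuming evaluate_risk has already run in the development stage as in the class-level example above):

# Look up the ID of the prompt template created during the development-stage evaluation
prompt_template_id = evaluator.get_prompt_template_id(
    environment=EvaluationStage.DEVELOPMENT
)
print(prompt_template_id)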