Use cases
aisteer360.evaluation.use_cases.base
Base class for all use cases. Provides a framework for loading evaluation data, applying metrics, and running standardized evaluations across different types of tasks. Subclasses must implement the generate() and evaluate() methods to define task-specific evaluation logic.
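To make the subclassing contract concrete, here is a minimal sketch; the class name and docstrings are illustrative assumptions, not part of the library:

```python
from aisteer360.evaluation.use_cases.base import UseCase


class SentimentUseCase(UseCase):  # hypothetical subclass name
    """Illustrative use case: generate completions and score them with the configured metrics."""

    def generate(self, model_or_pipeline, tokenizer, gen_kwargs=None, runtime_overrides=None):
        # Task-specific generation logic (see the method sketches below).
        ...

    def evaluate(self, generations):
        # Task-specific scoring via self.evaluation_metrics (see the method sketches below).
        ...
```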
UseCase
Bases: ABC
Base use case class.
Source code in aisteer360/evaluation/use_cases/base.py
evaluation_data = [json.loads(line) for line in f] if path.suffix == '.jsonl' else json.load(f)
instance-attribute
Evaluation examples loaded from the provided file: parsed line by line for .jsonl files, or as a single JSON document otherwise.
evaluation_metrics = evaluation_metrics
instance-attribute
Metric instances supplied at construction and applied during evaluation.
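The loading of evaluation_data shown above is equivalent to the following standalone snippet (the file name is illustrative); evaluation_metrics is simply the list of metric instances passed in at construction:

```python
import json
from pathlib import Path

path = Path("data/eval_examples.jsonl")  # illustrative path
with open(path, "r", encoding="utf-8") as f:
    if path.suffix == ".jsonl":
        evaluation_data = [json.loads(line) for line in f]  # one JSON object per line
    else:
        evaluation_data = json.load(f)  # a single JSON document (e.g. a list of examples)
```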
evaluate(generations)
abstractmethod
Required evaluation logic for the model's generations via evaluation_metrics.
Source code in aisteer360/evaluation/use_cases/base.py
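A possible override, assuming each metric in self.evaluation_metrics is callable on the list of generations and exposes a name attribute; the actual metric interface is defined by the metrics package and may differ:

```python
def evaluate(self, generations: list[str]) -> dict:
    """Apply every configured metric to the generations and collect scores (illustrative)."""
    scores = {}
    for metric in self.evaluation_metrics:
        # Assumption: metrics are callable and carry a `name`; adapt to the real metric API.
        scores[metric.name] = metric(generations)
    return scores
```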
export(profiles, save_dir)
Optional formatting and export of evaluation profiles.
Source code in aisteer360/evaluation/use_cases/base.py
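One way to implement the optional export, assuming profiles maps a profile name to a JSON-serializable result; the file layout below is an assumption, not the library's format:

```python
import json
from pathlib import Path


def export(self, profiles: dict, save_dir: str) -> None:
    """Write one JSON file per evaluation profile into save_dir (illustrative layout)."""
    out_dir = Path(save_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, profile in profiles.items():
        with open(out_dir / f"{name}.json", "w", encoding="utf-8") as f:
            json.dump(profile, f, indent=2, default=str)
```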
generate(model_or_pipeline, tokenizer, gen_kwargs=None, runtime_overrides=None)
abstractmethod
Required generation logic for the current use case.
Source code in aisteer360/evaluation/use_cases/base.py
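A sketch of a generate override for the case where model_or_pipeline is a Hugging Face causal language model; the "prompt" field is an assumed key in each evaluation example, and runtime_overrides is ignored here:

```python
def generate(self, model_or_pipeline, tokenizer, gen_kwargs=None, runtime_overrides=None):
    """Produce one completion per evaluation example (illustrative Hugging Face-style loop)."""
    gen_kwargs = gen_kwargs or {}
    generations = []
    for example in self.evaluation_data:
        prompt = example["prompt"]  # assumption: each example carries a "prompt" field
        inputs = tokenizer(prompt, return_tensors="pt").to(model_or_pipeline.device)
        output_ids = model_or_pipeline.generate(**inputs, **gen_kwargs)
        generations.append(tokenizer.decode(output_ids[0], skip_special_tokens=True))
    return generations
```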
validate_evaluation_data(evaluation_data)
Optional validation of the evaluation dataset.
Source code in aisteer360/evaluation/use_cases/base.py
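A possible validation hook that fails fast when examples are missing expected fields; the required keys are assumptions for an illustrative task:

```python
def validate_evaluation_data(self, evaluation_data: list[dict]) -> None:
    """Check that every evaluation example carries the fields this use case expects (illustrative)."""
    required_keys = {"prompt", "reference"}  # assumed fields for a hypothetical task
    for i, example in enumerate(evaluation_data):
        missing = required_keys - example.keys()
        if missing:
            raise ValueError(f"Evaluation example {i} is missing keys: {sorted(missing)}")
```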