Notebooks
Notebooks cover basic implementations of each control in our toolkit (including examples of how to implement methods from wrappers), as well as implementations of benchmarks.
Controls
-
Input control
Input control methods adapt the input (prompt) before the model is called. Current notebooks cover:
-
Structural control
Structural control methods adapt the model's weights/architecture. Current notebooks cover:
-
State control
State control methods influence the model's internal states (activations, attention, etc.) at inference time. Current notebooks cover:
-
Output control
Output control methods influence the model's behavior via the generate() method. Current notebooks cover:
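To make the four control categories above concrete, here is a minimal toy sketch of where each one intervenes. It does not use the toolkit's API: `ToyModel` and the four `*_control` functions are hypothetical names invented for illustration, and the "model" is a trivial stand-in whose internal state is just a number.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyModel:
    """Hypothetical stand-in for a language model (not the toolkit's API)."""
    weight: float = 1.0  # stands in for the model's parameters
    state_hooks: List[Callable[[float], float]] = field(default_factory=list)

    def forward(self, prompt: str) -> float:
        hidden = self.weight * len(prompt)  # toy "internal state"
        for hook in self.state_hooks:       # state control acts here
            hidden = hook(hidden)
        return hidden

    def generate(self, prompt: str) -> str:
        return f"score={self.forward(prompt):.1f}"


def input_control(prompt: str) -> str:
    """Input control: adapt the prompt before the model is called."""
    return "Answer concisely. " + prompt


def structural_control(model: ToyModel) -> ToyModel:
    """Structural control: return a model with changed weights."""
    return ToyModel(weight=model.weight * 0.5,
                    state_hooks=list(model.state_hooks))


def state_control(model: ToyModel) -> ToyModel:
    """State control: register a hook that edits internal state at inference."""
    model.state_hooks.append(lambda hidden: hidden + 10.0)
    return model


def output_control(model: ToyModel, prompt: str) -> str:
    """Output control: wrap generate() and alter what it returns."""
    return model.generate(prompt).upper()
```

In this sketch each function touches exactly one stage (prompt, weights, internal state, or the `generate()` call), which is the distinction the four categories draw. Real methods are considerably more involved.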
Benchmarks
-
Commonsense MCQA
This benchmark evaluates how well a steered model (under FewShot and LoRA) performs compared to a base model on answering commonsense multiple-choice questions.
-
Instruction following
This benchmark evaluates a steered model's ability to follow instructions. We compare the performance of the baseline model to the steered model under PASTA, DeAL, and ThinkingIntervention.
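Both benchmarks above compare a baseline model against a steered one. As a rough illustration of that comparison for the multiple-choice case, the sketch below scores two answer functions on the same items and reports the difference. It is not the toolkit's benchmark code: `MCQAItem`, `accuracy`, and `compare` are hypothetical names, and each answer function is assumed to take a question and its options and return the chosen option label.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MCQAItem:
    """Hypothetical item format: question, option labels, gold label."""
    question: str
    options: List[str]
    answer: str


AnswerFn = Callable[[str, List[str]], str]


def accuracy(answer_fn: AnswerFn, items: List[MCQAItem]) -> float:
    """Fraction of items where answer_fn returns the gold option label."""
    if not items:
        return 0.0
    correct = sum(
        answer_fn(item.question, item.options) == item.answer
        for item in items
    )
    return correct / len(items)


def compare(baseline_fn: AnswerFn, steered_fn: AnswerFn,
            items: List[MCQAItem]) -> Dict[str, float]:
    """Score both answer functions on the same items and report the gap."""
    base = accuracy(baseline_fn, items)
    steered = accuracy(steered_fn, items)
    return {"baseline": base, "steered": steered, "delta": steered - base}
```

In practice `baseline_fn` and `steered_fn` would wrap calls to the base and steered models. Evaluating both on an identical item list keeps the comparison like-for-like.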