Notebooks

The notebooks demonstrate a basic implementation of each control in our toolkit, including examples of how to use methods provided through wrappers, and walk through each benchmark.

Controls

Input control

Input control methods adapt the input (prompt) before the model is called. Current notebooks cover:

FewShot
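
As a rough illustration of what adapting the prompt before the model call can look like, here is a minimal few-shot sketch. It is not the toolkit's FewShot implementation: the function name `build_few_shot_prompt`, the `Q:`/`A:` template, and the example pairs are all assumptions made for this illustration.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Prepend worked question/answer pairs to the query (hypothetical helper)."""
    # One block per demonstration, in a simple Q/A template (assumed format).
    blocks = [f"Q: {question}\nA: {answer}" for question, answer in examples]
    # The final block leaves the answer open for the model to complete.
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)


# Made-up demonstrations, used only to show the shape of the steered prompt.
examples = [
    ("What do people use to cut paper?", "scissors"),
    ("Where would you store milk to keep it cold?", "refrigerator"),
]
prompt = build_few_shot_prompt(examples, "What do you wear on your feet in the rain?")
print(prompt)
```

The model itself is untouched here; only the text it receives changes, which is what distinguishes input control from the other control types.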

Structural control

Structural control methods adapt the model's weights/architecture. Current notebooks cover:

MergeKit wrapper

TRL wrapper
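
To make "adapting the model's weights" concrete, the sketch below linearly interpolates two sets of parameters, which is one simple form of model merging. It is a toy under stated assumptions and does not use the MergeKit or TRL APIs: weights are plain Python lists keyed by parameter name, and `linear_merge` and `alpha` are names invented for this example.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def linear_merge(a: Weights, b: Weights, alpha: float = 0.5) -> Weights:
    """Interpolate two weight sets: (1 - alpha) * a + alpha * b (hypothetical helper)."""
    if a.keys() != b.keys():
        raise ValueError("both models must share the same parameter names")
    merged: Weights = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise interpolation between the two parameter vectors.
        merged[name] = [(1 - alpha) * x + alpha * y for x, y in zip(a[name], b[name])]
    return merged


# Toy parameters standing in for two fine-tuned checkpoints.
model_a = {"layer.weight": [0.0, 2.0]}
model_b = {"layer.weight": [2.0, 4.0]}
merged = linear_merge(model_a, model_b, alpha=0.5)
print(merged)
```

Unlike input control, the result is a new set of weights, so the steering persists across every later call to the model.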

State control

State control methods influence the model's internal states (activations, attention, etc.) at inference time. Current notebooks cover:

CAST

PASTA
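
The idea of intervening on internal states can be sketched without a real model: the toy below adds a steering vector to a hidden state, but only when that state points in roughly the same direction as a condition vector. It is a simplified illustration of conditional activation steering, not the toolkit's CAST or PASTA code; `steer_hidden_state`, `threshold`, and `strength` are assumed names, and in practice such logic would typically run inside a hook on a model layer.

```python
import math
from typing import List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def steer_hidden_state(
    hidden: List[float],
    steering: List[float],
    condition: List[float],
    threshold: float = 0.5,
    strength: float = 1.0,
) -> List[float]:
    """Add `strength * steering` to `hidden` only if the condition fires (hypothetical helper)."""
    if cosine(hidden, condition) < threshold:
        return list(hidden)  # condition not met: leave the activation unchanged
    return [h + strength * s for h, s in zip(hidden, steering)]


condition = [1.0, 0.0]
steering = [0.0, 2.0]
steered = steer_hidden_state([1.0, 0.0], steering, condition)
untouched = steer_hidden_state([0.0, 1.0], steering, condition)
print(steered, untouched)
```

The weights and the prompt stay fixed; only the intermediate values computed during the forward pass are changed.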

Output control

Output control methods influence the model's behavior via the generate() method. Current notebooks cover:

DeAL

RAD

SASA

ThinkingIntervention
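
One way to picture steering at decoding time is to rescore candidate next tokens before one is chosen. The sketch below does this for a single greedy step by adding a weighted reward to each logit. It is a loose illustration of reward-guided decoding in general, not the toolkit's DeAL, RAD, or SASA implementation; `pick_next_token`, `beta`, and the toy reward are assumptions.

```python
from typing import Callable, Dict


def pick_next_token(
    logits: Dict[str, float],
    reward: Callable[[str], float],
    beta: float = 1.0,
) -> str:
    """Greedy choice over logits shifted by `beta * reward(token)` (hypothetical helper)."""
    adjusted = {token: score + beta * reward(token) for token, score in logits.items()}
    return max(adjusted, key=adjusted.get)


def politeness_reward(token: str) -> float:
    """Toy reward: favor one specific token."""
    return 1.0 if token == "polite" else 0.0


# Made-up logits for two candidate tokens.
logits = {"rude": 2.0, "polite": 1.5}
unsteered = pick_next_token(logits, politeness_reward, beta=0.0)
steered = pick_next_token(logits, politeness_reward, beta=1.0)
print(unsteered, steered)
```

With `beta=0.0` the choice reduces to ordinary greedy decoding, so `beta` acts as the knob for how strongly the reward overrides the model's own preferences.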

Benchmarks

Commonsense MCQA

This benchmark compares a steered model (under FewShot and LoRA) against a base model on answering commonsense multiple-choice questions.

See the benchmark
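
For orientation, a base-versus-steered comparison on multiple-choice questions usually comes down to accuracy over the same question set. The sketch below shows that computation only; it is not the benchmark's code, the `accuracy` helper is an assumed name, and the answer letters are invented placeholders rather than results.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of predictions that match the gold answers (hypothetical helper)."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


# Placeholder data for illustration only; these are not benchmark results.
gold = ["A", "C", "B", "D"]
base_predictions = ["A", "B", "B", "A"]
steered_predictions = ["A", "C", "B", "A"]

base_acc = accuracy(base_predictions, gold)
steered_acc = accuracy(steered_predictions, gold)
print(f"base={base_acc:.2f} steered={steered_acc:.2f}")
```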

Instruction following

This benchmark evaluates a steered model's ability to follow instructions. We compare the performance of the baseline model with that of the model steered under PASTA, DeAL, and ThinkingIntervention.

See the benchmark
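
Instruction following is often scored with simple programmatic checks on each response. As a hedged sketch of that idea, the snippet below checks a single assumed instruction ("answer in at most N words") and reports the share of responses that comply. The benchmark's actual instructions and metrics may differ; `follows_word_limit` and `compliance_rate` are names made up for this example.

```python
from typing import Sequence


def follows_word_limit(response: str, max_words: int) -> bool:
    """True if the response has at most `max_words` whitespace-separated words."""
    return len(response.split()) <= max_words


def compliance_rate(responses: Sequence[str], max_words: int) -> float:
    """Share of responses that respect the word limit (hypothetical metric)."""
    if not responses:
        return 0.0
    return sum(follows_word_limit(r, max_words) for r in responses) / len(responses)


# Invented responses, used only to exercise the check.
responses = ["yes", "this answer is far too long"]
rate = compliance_rate(responses, max_words=3)
print(rate)
```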