Examples

We have prepared a collection of example notebooks for each of the controls implemented in our toolkit (including methods implemented via wrappers), as well as demonstrations of more extensive benchmarks.

Controls

  • Input control

    Input control methods adapt the input (prompt) before the model is called. Current notebooks cover:

    FewShot
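As a concrete illustration of input control, the sketch below builds a few-shot prompt by prepending worked examples to the query before the model is called. The function and its names are illustrative only, not the toolkit's actual API:

```python
# Hypothetical sketch of few-shot (in-context) prompting: the control
# rewrites the prompt before the model sees it. Illustrative names only.

def build_few_shot_prompt(examples, query, instruction=""):
    """Prepend worked (question, answer) examples so the model can imitate them."""
    parts = [instruction] if instruction else []
    for question, answer in examples:
        parts.append(f"Q: {question}\nA: {answer}")
    parts.append(f"Q: {query}\nA:")
    return "\n\n".join(parts)

demos = [
    ("What is 2 + 2?", "4"),
    ("What is 3 + 5?", "8"),
]
prompt = build_few_shot_prompt(
    demos, "What is 7 + 6?", instruction="Answer the arithmetic question."
)
```

The resulting string is passed to the model in place of the raw query; the model itself is untouched.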

  • Structural control

    Structural control methods adapt the model's weights and/or architecture. Current notebooks cover:

    MergeKit wrapper

    TRL wrapper
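To give a flavor of what a structural control does, the sketch below linearly interpolates two models' weights, the simplest form of model merging. This is an illustrative plain-PyTorch example, not MergeKit's API:

```python
# Illustrative sketch (not MergeKit's API): linear weight interpolation,
# the basic operation that model-merging methods build on.
import torch
import torch.nn as nn

def interpolate_state_dicts(sd_a, sd_b, alpha=0.5):
    """Return a state dict equal to (1 - alpha) * sd_a + alpha * sd_b."""
    return {k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}

# Two "models" with identical architecture but different weights.
model_a, model_b = nn.Linear(4, 4), nn.Linear(4, 4)

merged = nn.Linear(4, 4)
merged.load_state_dict(
    interpolate_state_dicts(model_a.state_dict(), model_b.state_dict(), alpha=0.5)
)
```

Unlike input or output controls, the change is baked into the weights: every subsequent call uses the merged model with no per-call overhead.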

  • State control

    State control methods influence the model's internal states (activations, attention weights, etc.) at inference time. Current notebooks cover:

    ActAdd

    CAA

    CAST

    ITI

    PASTA
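An ActAdd-style state control can be sketched with a PyTorch forward hook that adds a fixed steering vector to a layer's output at inference time. The layer, vector, and strength below are illustrative placeholders, not the toolkit's API:

```python
# Illustrative activation-addition sketch (hypothetical, not the toolkit's
# API): a forward hook adds a steering vector to one layer's output.
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(8, 8)            # stand-in for one transformer sub-layer
steering_vector = torch.randn(8)   # direction in activation space
strength = 4.0                     # steering strength (the swept parameter)

def add_steering(module, inputs, output):
    # Returning a tensor from a forward hook replaces the module's output.
    return output + strength * steering_vector

handle = layer.register_forward_hook(add_steering)
x = torch.randn(1, 8)
steered = layer(x)
handle.remove()                    # detach the hook to restore default behavior
unsteered = layer(x)
```

The weights are never modified; the intervention exists only while the hook is registered, which is what makes state controls cheap to toggle and sweep.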

  • Output control

    Output control methods influence the model's behavior via the generate() method. Current notebooks cover:

    DeAL

    RAD

    SASA

    ThinkingIntervention
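The idea behind output control can be sketched by editing next-token logits before a token is chosen, which is the kind of intervention generate()-time controls perform (Hugging Face's generate() exposes the same hook point via logits processors). The helper below is a hypothetical illustration, not the toolkit's API:

```python
# Hypothetical sketch of output control: steer generation by editing the
# next-token logits before the token is selected. Illustrative names only.
import torch

def ban_tokens(logits, banned_ids):
    """Set banned token logits to -inf so those tokens can never be chosen."""
    logits = logits.clone()
    logits[banned_ids] = float("-inf")
    return logits

vocab_size = 10
logits = torch.zeros(vocab_size)
logits[3] = 5.0   # the unconstrained argmax
logits[7] = 4.0   # runner-up

constrained = ban_tokens(logits, banned_ids=[3])
next_token = int(constrained.argmax())   # token 7 once token 3 is banned
```

Because the edit happens per decoding step, output controls can enforce constraints the model's weights and prompt cannot guarantee on their own.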

Benchmarks

  • Instruction following

    This notebook studies the effect of post-hoc attention steering (PASTA) on a model's ability to follow instructions. We sweep over the steering strength and investigate the trade-off between a model's instruction-following ability and general response quality.

    See the benchmark

  • Commonsense MCQA

    This notebook benchmarks steering methods on the CommonsenseQA dataset, comparing few-shot prompting against a LoRA adapter trained with DPO. We sweep over the number of few-shot examples and study how accuracy scales relative to the fine-tuned baseline across two models.

    See the benchmark

  • Composite steering for truthfulness

    One of the primary features of the toolkit is the ability to compose multiple steering methods into a single model operation. This notebook composes a state control (PASTA) with an output control (DeAL), with the goal of improving the model's truthfulness (as measured on TruthfulQA) without significantly degrading informativeness. We sweep over the joint parameter space of the controls and compare each individual control's performance, measured via the trade-off between truthfulness and informativeness, to that of the composition.

    See the benchmark
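The kind of composition described above can be sketched as function composition over a generation callable: each control wraps the callable, and stacking the wrappers yields one steered model operation. Every name below is hypothetical, not the toolkit's API:

```python
# Hypothetical sketch of composing controls (illustrative names only):
# each control wraps a text-generation callable, so composition is just
# nesting the wrappers.

def with_prompt_prefix(generate, prefix):
    """Input-style control: rewrite the prompt before calling the model."""
    return lambda prompt: generate(prefix + prompt)

def with_postprocess(generate, fn):
    """Output-style control: rewrite the model's response after generation."""
    return lambda prompt: fn(generate(prompt))

# Stand-in for a real model's generate call.
base_model = lambda prompt: f"<response to: {prompt}>"

steered = with_postprocess(
    with_prompt_prefix(base_model, "Be truthful. "),
    str.upper,
)
out = steered("Is the earth flat?")
```

Because each wrapper leaves the callable's signature unchanged, controls can be added, removed, or reordered independently, which is what makes sweeping the joint parameter space tractable.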