ACPBench Hard consists of generative questions covering the following 8 reasoning tasks:
Action Applicability
Progression
Atom Reachability
Action Reachability
Validation
Justification
Landmarks
Next Action
A domain-wise analysis of GPT-4o's performance on all ACPBench-Hard tasks.
@misc{kokel2025acphard,
title={ACPBench Hard: Unrestrained Reasoning about Action, Change, and Planning},
author={Harsha Kokel and Michael Katz and Kavitha Srinivas and Shirin Sohrabi},
year={2025},
eprint={2503.24378},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2503.24378},
}