ACPBench-Hard: Unrestrained Reasoning About Action, Change, And Planning

ACPBench-Hard consists of generative questions covering the following 8 reasoning tasks.

Action Applicability (app) : The first task deals with identifying which actions are applicable in a state.
Progression (prog) : The next task evaluates an LLM's ability to understand how the world state changes when an action is applied.
Atom Reachability (reach) : The reachability task evaluates if a specific fact can eventually become true by taking (possibly multiple consecutive) actions in the given state.
Action Reachability (areach) : The action reachability task is closely related to atom reachability: it checks whether there is a reachable state in which the action is applicable.
Validation (val) : The validation task aims at checking whether the specified sequence of actions π is a plan.
Justification (just) : The Justification task asks to simplify the plan by removing one or two consecutive actions and to produce the resulting simplified plan.
Landmarks (land) : The Landmarks task tests an LLM’s ability to identify subgoals that are necessary to achieve the goal.
Next Action (nexta) : An additional task that does not appear in the original ACPBench is the next action task. This generative question asks for the next action that takes us towards the goal. The task is closely related to optimal planning, since optimal plans can be produced by iteratively obtaining such actions.
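
To make the applicability, progression, validation and atom reachability tasks concrete, here is a minimal sketch. It assumes a STRIPS-style encoding in which a state is the set of atoms that are currently true and each action has precondition, add and delete sets. The toy robot-and-ball domain, the `Action` type and every function name below are illustrative assumptions, not part of the benchmark or its tooling.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    """Hypothetical STRIPS-style action (an assumption for illustration)."""
    name: str
    pre: frozenset     # atoms that must hold before the action
    add: frozenset     # atoms the action makes true
    delete: frozenset  # atoms the action makes false


def applicable(state, actions):
    """app: actions whose preconditions all hold in `state`."""
    return [a for a in actions if a.pre <= state]


def progress(state, action):
    """prog: the state that results from applying `action` in `state`."""
    return (state - action.delete) | action.add


def is_plan(state, plan, goal):
    """val: each step is applicable in turn and the goal holds at the end."""
    for action in plan:
        if not action.pre <= state:
            return False
        state = progress(state, action)
    return goal <= state


def reachable(state, actions, atom):
    """reach: breadth-first search for any reachable state containing `atom`."""
    seen, queue = {state}, deque([state])
    while queue:
        current = queue.popleft()
        if atom in current:
            return True
        for action in applicable(current, actions):
            successor = progress(current, action)
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return False


# Toy domain (hypothetical): a robot carries a ball from room a to room b.
move_ab = Action("move-a-b", frozenset({"robot-at-a"}),
                 frozenset({"robot-at-b"}), frozenset({"robot-at-a"}))
move_ba = Action("move-b-a", frozenset({"robot-at-b"}),
                 frozenset({"robot-at-a"}), frozenset({"robot-at-b"}))
pick_a = Action("pick-a", frozenset({"robot-at-a", "ball-at-a", "hand-empty"}),
                frozenset({"holding"}), frozenset({"ball-at-a", "hand-empty"}))
drop_b = Action("drop-b", frozenset({"robot-at-b", "holding"}),
                frozenset({"ball-at-b", "hand-empty"}), frozenset({"holding"}))

ACTIONS = [move_ab, move_ba, pick_a, drop_b]
INIT = frozenset({"robot-at-a", "ball-at-a", "hand-empty"})
GOAL = frozenset({"ball-at-b"})

print([a.name for a in applicable(INIT, ACTIONS)])
print(sorted(progress(INIT, pick_a)))
print(is_plan(INIT, [pick_a, move_ab, drop_b], GOAL))
print(reachable(INIT, ACTIONS, "ball-at-b"))
```

Action reachability can be phrased in the same terms by searching for a state in which the action's preconditions hold rather than for a single atom. The exhaustive search used here is only practical for tiny state spaces such as this one.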

ACPBench-Hard