A blog post published on the Berkeley AI Research (BAIR) site on April 20, 2026 introduces GRASP, a gradient-based planning method for learned dynamics models — commonly called world models — that the authors say is designed to remain stable at longer planning horizons. Source: http://bair.berkeley.edu/blog/2026/04/20/grasp/

According to the BAIR post, GRASP combines three ingredients: lifting a planned trajectory into "virtual states" so that optimization can be parallelized across time steps, injecting stochasticity directly into the state iterates to encourage exploration, and reshaping gradients so that action variables receive usable signal while gradients through high-dimensional vision encoders — which the authors describe as brittle — are avoided.

What the authors say the method addresses

The post frames the motivation in terms of a gap between predictive capability and control. Large learned world models, the authors write, "can predict long sequences of future observations in high-dimensional visual spaces and generalize across tasks," and at scale "start to look less like task-specific predictors and more like general-purpose simulators." But the BAIR post argues that a capable predictor is not the same as a usable planner.

Long-horizon planning with such models, according to the post, remains fragile: "optimization becomes ill-conditioned, non-greedy structure creates bad local minima, and high-dimensional latent spaces introduce subtle failure modes." The authors identify long horizons as the regime where these issues compound.

How GRASP is structured

The BAIR post defines a world model, for the purposes of this work, as a learned model that, given a current state and a sequence of actions, predicts future states — where states can include images, latent vectors, or proprioceptive signals. Planning, in this framing, means searching over action sequences to optimize some objective under the learned dynamics.
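This framing of planning — search over action sequences under learned dynamics — is commonly implemented as "shooting": unrolling the model forward and backpropagating the objective through the whole rollout. The following is a minimal sketch of that baseline formulation under a toy linear model; the function names, the dynamics, and all hyperparameters are illustrative assumptions, not details from the BAIR post.

```python
import numpy as np

# Toy linear "world model": s' = A s + B a. Illustrative only -- real world
# models are learned networks over images or latent vectors.
def dynamics(s, a, A, B):
    return A @ s + B @ a

def plan_shooting(s0, goal, A, B, horizon=10, iters=500, lr=0.1):
    """Classic shooting: gradient-descend the action sequence through the
    full sequential unroll. This is the baseline formulation that the
    collocation-style lifting described below is contrasted with."""
    actions = np.zeros((horizon, B.shape[1]))
    for _ in range(iters):
        # Forward pass: unroll the model over the horizon.
        traj = [s0]
        for a in actions:
            traj.append(dynamics(traj[-1], a, A, B))
        # Backward pass: propagate d||s_H - goal||^2 back step by step.
        grad_s = 2 * (traj[-1] - goal)
        for t in reversed(range(horizon)):
            actions[t] -= lr * (B.T @ grad_s)   # chain rule into a_t
            grad_s = A.T @ grad_s               # back one step through A
    return actions

A, B = np.eye(2), 0.1 * np.eye(2)
s0, goal = np.zeros(2), np.array([1.0, 1.0])
actions = plan_shooting(s0, goal, A, B)
s = s0
for a in actions:
    s = dynamics(s, a, A, B)   # replaying the plan approaches the goal
```

The sequential backward loop is the structural feature GRASP's first component is described as removing: with a long horizon, every gradient must pass through every intermediate step of the unroll.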

The post presents GRASP as a gradient-based planner for learned dynamics models, built around three design choices that the authors say make long-horizon planning practical.

The first component, according to the post, is a collocation-style reformulation in which the trajectory is represented by a set of virtual state variables alongside the action sequence, allowing gradient updates to be computed in parallel across time rather than propagated sequentially through a long unrolled rollout. The second component adds noise to the state iterates during optimization, which the authors describe as a mechanism for exploring the loss landscape rather than settling into local minima produced by the non-greedy structure of long-horizon objectives.
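A rough sketch of how these two components fit together, with scalar states and a toy model standing in for a learned world model: the trajectory is held as free "virtual state" variables, dynamics consistency is enforced by a penalty, every time step's gradient is computed in parallel, and noise is added directly to the state iterates. The residual penalty, step sizes, and noise schedule here are assumptions for illustration, not details given in the post.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(s, a):
    # Toy scalar dynamics standing in for a learned world model.
    return s + 0.1 * a

def step(states, actions, s0, goal, lr=0.05, noise=0.0):
    """One parallel update of a collocation-style objective,
       sum_t (s_t - f(s_{t-1}, a_t))^2 + (s_H - goal)^2,
    where the states s_1..s_H are free "virtual" variables rather than
    the result of a sequential rollout."""
    prev = np.concatenate([[s0], states[:-1]])
    resid = states - f(prev, actions)         # consistency residuals, all t at once
    grad_s = 2 * resid
    grad_s[:-1] -= 2 * resid[1:]              # coupling from the next step's residual
    grad_s[-1] += 2 * (states[-1] - goal)     # terminal goal cost
    grad_a = -2 * 0.1 * resid                 # df/da = 0.1 for this toy f
    # Stochasticity injected directly into the state iterates, per the
    # post's description of the exploration mechanism.
    states = states - lr * grad_s + noise * rng.normal(size=states.shape)
    actions = actions - lr * grad_a
    return states, actions

H, s0, goal = 5, 0.0, 1.0
states, actions = np.zeros(H), np.zeros(H)
for i in range(3000):
    anneal = 0.01 * (1 - i / 3000)            # decay the exploration noise
    states, actions = step(states, actions, s0, goal, noise=anneal)
```

Because each residual touches only adjacent time steps, the updates for all steps can be computed at once, which is the parallelism the post attributes to the virtual-state lifting.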

The third component concerns which gradients are used. The post states that gradients flowing through high-dimensional visual state representations can be unreliable, and that GRASP instead routes optimization signal primarily to the action variables. The BAIR post includes animated demonstrations on two tasks — a navigation environment the authors label BallNav, and a Push-T manipulation setup — showing planned trajectories produced by the method. Readers interested in other recent methodological work on training signal quality may compare BAIR's framing with prior DeepBrief coverage of preference training data for AI reasoning.
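The gradient-routing idea can be illustrated in isolation: the per-step prediction error is converted into an update on the action alone, with both states held fixed as constants (a stop-gradient), so no signal ever has to flow back through a brittle encoder. The linear model and step size below are illustrative assumptions, not details from the post.

```python
# Hedged sketch of gradient routing: the one-step prediction error is
# turned into an update on the action only, while the (possibly
# vision-derived) states are treated as constants -- i.e. nothing is
# differentiated through a state encoder. Toy linear model, illustrative.

def f(s, a):
    return s + 0.1 * a          # stand-in for the learned dynamics

DF_DA = 0.1                      # the toy model's action Jacobian

def action_only_grad(s, s_next, a):
    resid = s_next - f(s, a)     # prediction error at this transition
    # d(resid^2)/da with s and s_next held fixed (stop-gradient on states):
    return -2.0 * DF_DA * resid

# Recover the action explaining a transition from s=0 to s_next=0.5.
s, s_next, a = 0.0, 0.5, 0.0
for _ in range(2000):
    a -= 0.5 * action_only_grad(s, s_next, a)
# a approaches 5.0, since f(0, 5) = 0.5
```

Here the optimization signal reaches the action through the model's action sensitivity alone; the states enter only as fixed targets, mirroring the post's description of avoiding gradients through high-dimensional visual representations.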

Authors and framing

The BAIR post lists the collaborators on the work as Mike Rabbat, Aditi Krishnapriyan, Yann LeCun, and Amir Bar, with an asterisk indicating equal advisorship. The post is written in the first person by an author whose name is not reproduced in the excerpt available to DeepBrief; attribution of specific claims in this article is therefore made to "the BAIR post" or "the authors" rather than to a named individual writer.

The post does not, in the portion available, report quantitative benchmark numbers, baseline comparisons, or ablation results. It is framed as a blog post describing the problem and the proposed approach rather than as a standalone empirical report, and the BAIR post refers to the underlying work as a research project by the listed collaborators. DeepBrief has not independently located a corresponding arXiv preprint or peer-reviewed publication at the time of writing, and treats the blog post as the primary source of record for the claims described here.

Context within the research beat

Planning with learned dynamics models is a long-running research direction that sits between model-based reinforcement learning and the broader category of generative simulators. DeepBrief has previously covered adjacent questions about how learned models behave under evaluation pressure, including work comparing large language models with lightweight graph parsers on relation extraction and Stanford's AI Index analysis of model performance gaps.

The BAIR post does not claim that GRASP resolves all failure modes of long-horizon planning with world models. It describes the method as making gradient-based planning more robust in the settings the authors studied, and presents the three design choices — virtual-state lifting, state-iterate stochasticity, and gradient reshaping — as the specific changes responsible for that robustness. As of publication, the BAIR blog post is the only source DeepBrief has identified for the GRASP method.