NVIDIA launches Cosmos Policy, redefining robotic control with unified world models

Cosmos Policy turns robots into systems that plan, predict, and act with unprecedented precision

NVIDIA has presented Cosmos Policy, a robotic control policy that extends the Cosmos family of World Foundation Models (WFMs) to manipulation, planning, and physical task execution. The approach directly adapts the Cosmos Predict-2 model, originally trained to predict how scenes evolve over time, without adding external control modules or parallel architectures. This simplifies training and lets the robot inherit the physical understanding the model has already learned.

Cosmos Policy's main technical differentiator lies in how it represents actions, physical states, and success metrics. Instead of using separate networks for perception and control, it treats everything as "latent frames," similar to video frames. This unified representation passes through the same diffusion process used in video generation, so the model can reuse its prior knowledge of dynamics, gravity, and environment interaction when deciding how to act.

With this structure, a single model performs three critical functions simultaneously: predicting action sequences for visuomotor control, anticipating future environment states, and estimating expected returns for planning. Cosmos Policy can operate both as a direct policy, generating actions in real time, and as a model-based planning policy, evaluating multiple future scenarios before deciding, which is essential for long and complex tasks.

On the LIBERO and RoboCasa benchmarks, considered references in robotic manipulation, Cosmos Policy achieved state-of-the-art results. On LIBERO, it reached an average success rate of 98.5%, outperforming diffusion policies trained from scratch and vision-language-action (VLA) models. On RoboCasa, it reached a 67.1% success rate with just 50 demonstrations per task, showing significantly higher data efficiency than competing alternatives. In real-world tests using the bimanual ALOHA robot, the model demonstrated the ability to execute long-horizon tasks directly from visual observations.
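To make the two operating modes described above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not NVIDIA's implementation: `toy_world_model`, `Rollout`, `direct_policy`, and `planning_policy` are hypothetical names, the toy dynamics stand in for the diffusion model, and picking the highest-value candidate is just one simple way to "evaluate multiple future scenarios before deciding."

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Rollout:
    """The three outputs a unified model produces together (hypothetical container)."""
    actions: List[float]       # proposed action chunk
    future_state: List[float]  # predicted future environment state
    value: float               # estimated expected return


# Hypothetical signature for one sampling pass of the model.
ModelFn = Callable[[Sequence[float], random.Random], Rollout]


def toy_world_model(observation: Sequence[float], rng: random.Random) -> Rollout:
    """Toy stand-in for the real model, used only for illustration.

    Assumed dynamics: each action is added to the matching state dimension.
    Assumed value: higher the closer the predicted state is to the origin.
    """
    actions = [rng.uniform(-1.0, 1.0) for _ in observation]
    future_state = [s + a for s, a in zip(observation, actions)]
    value = -sum(x * x for x in future_state)
    return Rollout(actions, future_state, value)


def direct_policy(observation: Sequence[float], model: ModelFn,
                  rng: random.Random) -> Rollout:
    """Direct mode: draw a single sample and act on it immediately."""
    return model(observation, rng)


def planning_policy(observation: Sequence[float], model: ModelFn,
                    rng: random.Random, num_candidates: int = 16) -> Rollout:
    """Planning mode: draw several candidates and keep the highest-value one."""
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    candidates = [model(observation, rng) for _ in range(num_candidates)]
    return max(candidates, key=lambda rollout: rollout.value)


if __name__ == "__main__":
    obs = [0.5, -0.3]
    direct = direct_policy(obs, toy_world_model, random.Random(0))
    planned = planning_policy(obs, toy_world_model, random.Random(0))
    print("direct value: ", direct.value)
    print("planned value:", planned.value)
```

With the same seed, the first planning candidate is the same sample the direct policy would have used, so the planned rollout's value is never lower than the direct one. The trade-off is that planning calls the model `num_candidates` times per decision.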
When combined with model-based planning, Cosmos Policy showed an average 12.5% gain in task completion rate, reinforcing the practical impact of integrating prediction, value, and action in a single system.

Beyond the technical advance, NVIDIA announced the Cosmos Cookoff, an open hackathon to encourage developers to explore and expand Physical AI applications with the Cosmos models.