Meta on Wednesday introduced V-JEPA 2, its next-generation “world model” designed to help AI agents understand and predict motion in 3D environments, with applications in robotics and self-driving cars. An evolution of last year’s V-JEPA, the model was trained on over one million hours of video, and Meta claims it runs 30 times faster than competing models such as NVIDIA’s Cosmos.
By building an abstract digital twin of physical spaces, V-JEPA 2 lets machines anticipate how a scene will unfold, such as how gravity influences an object’s trajectory, without requiring vast amounts of labeled data. For example, the model can predict that a ball rolling off a table will fall, and it understands that an object hidden from view hasn’t simply disappeared. Early tests demonstrate smoother navigation for delivery robots and more reliable scene interpretation for autonomous vehicles.
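For readers curious about the mechanics, the core JEPA idea is that the model predicts what happens next in a learned embedding space rather than pixel by pixel, which is what lets it learn from raw video without labels. The PyTorch sketch below is a heavily simplified illustration of that training loop, not Meta’s actual architecture; every module name, layer size, and the toy data are assumptions for demonstration only.

```python
# Minimal sketch of the JEPA idea behind V-JEPA 2: predict future video
# states in a learned embedding space instead of in pixel space.
# All sizes and module names are illustrative assumptions, not Meta's design.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps a (flattened) video clip to a compact latent vector --
    a stand-in for V-JEPA's video transformer encoder."""
    def __init__(self, frame_dim: int, latent_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(frame_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.net(frames)

class Predictor(nn.Module):
    """Predicts the latent of the *next* clip from the current one --
    the 'world model' step that anticipates how a scene unfolds."""
    def __init__(self, latent_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, z_now: torch.Tensor) -> torch.Tensor:
        return self.net(z_now)

frame_dim, latent_dim = 1024, 64
encoder, predictor = Encoder(frame_dim, latent_dim), Predictor(latent_dim)
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(predictor.parameters()), lr=1e-4
)

# One self-supervised step on random stand-in data: no human labels are
# needed, because the target is just the encoding of what actually came next.
clip_now, clip_next = torch.randn(8, frame_dim), torch.randn(8, frame_dim)
z_pred = predictor(encoder(clip_now))
with torch.no_grad():          # don't backpropagate through the target latents
    z_target = encoder(clip_next)
loss = nn.functional.mse_loss(z_pred, z_target)

opt.zero_grad()
loss.backward()
opt.step()
```

Note that the real system adds machinery this sketch omits, such as masked prediction and a slowly updated target encoder, to keep the latent space from collapsing to a trivial solution; the point here is only the shape of the objective: compare predicted and actual futures in representation space, not in pixels.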
“We believe world models will usher a new era for robotics, enabling real-world AI agents to help with chores and physical tasks without needing astronomical amounts of robotic training data,” explained Meta’s chief AI scientist Yann LeCun in a video.
Meta also announced three open benchmarks (IntPhys 2, MVPBench, and CausalVQA) for the research community to evaluate how well video-based models reason about the physical world, underscoring its commitment to advancing “physical AI.”
Quick Take
Meta’s V-JEPA 2 marks a pivotal shift from purely software-based AI to embodied “physical intelligence,” empowering robots and vehicles to reason about real-world dynamics. By open-sourcing benchmarks, Meta invites wider community validation, accelerating progress. If widely adopted, these world models could drastically cut data requirements and unlock new levels of autonomy across industries.