AGIBOT Unveils Genie Envisioner 2.0, Advancing World Models into Scalable “World Simulators” for Embodied AI

AGIBOT today announced the release of Genie Envisioner 2.0 (GE 2-Sim), marking a significant step forward in the evolution of world models: from World Action Models to fully interactive World Simulators.


Visit the project homepage here: https://ge-sim-v2.github.io

 

The new system introduces what AGIBOT describes as a “physical evolution engine” for embodied AI: a model-based environment where robots can be trained, evaluated, and optimized at scale, without relying solely on costly real-world trial and error.

 

From Understanding the World to Learning Within It

In 2025, AGIBOT introduced the industry’s first action-driven world model open-source platform, Genie Envisioner, enabling robots to understand the world through integrated modeling of vision, language, and action.

 

With Genie Envisioner 2.0, the paradigm shifts further: from enabling robots to understand the world, to enabling them to learn within a world generated by models.

 

This transition reflects a broader shift in embodied AI: from representing the world to simulating the world itself. As world models evolve into stable, high-fidelity environments that respond to actions in physically consistent ways, they unlock the ability to train robots at scale in synthetic environments.

 

AGIBOT believes this marks a critical inflection point toward achieving a true scaling law in embodied intelligence.

 

From World Action Model to World Simulator

At the core of this evolution is AGIBOT’s continued development of the World Action Model (WAM) framework, which extends traditional world models by explicitly incorporating actions as a first-class variable.

 

Rather than modeling only state, WAM captures the full loop of:

 State → Action → State Evolution

 

This enables world models to serve as a foundational layer for both policy learning and action generation.
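The State → Action → State loop can be sketched in a few lines of Python. Everything here is illustrative: the linear transition function is a hypothetical stand-in for a learned world model, not AGIBOT's implementation.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class WorldActionModel:
    """Toy world-action model: next state = transition(state, action)."""
    transition: Callable[[float, float], float]

    def rollout(self, state: float, actions: list[float]) -> list[float]:
        """Unroll the State -> Action -> State loop over an action sequence."""
        states = [state]
        for a in actions:
            state = self.transition(state, a)  # action is a first-class input
            states.append(state)
        return states

# Hypothetical damped linear dynamics standing in for a learned model.
wam = WorldActionModel(transition=lambda s, a: 0.9 * s + a)
trajectory = wam.rollout(state=1.0, actions=[0.5, 0.5, 0.5])
```

The point of the sketch is the interface: because actions condition every transition, the same model can score candidate actions (policy learning) or be searched over for actions that reach a goal (action generation).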



 

Building on this foundation, AGIBOT has progressively developed a series of systems:

 EnerVerse: Extends embodied environments into a computable 4D world model

 Genie Envisioner Act (GE-Act): Bridges world representation and action trajectory generation

 Act2Goal: Enables long-horizon, goal-driven control

 

While these advances allowed world models to support policy learning, real-world deployment exposed key limitations: high reliance on physical environments, costly evaluation, and data scalability constraints.

 

This led to a fundamental realization:

The next breakthrough lies not in stronger representation, but in transforming world models into fully functional simulators.

 

 

Making the World Runnable: Toward Interactive Simulation

To enable this transition, AGIBOT introduces a set of new capabilities that push world models toward interactive simulation:

 EnerVerse-AC: Introduces action-conditioned world modeling for future prediction

 Genie Envisioner Sim (GE-Sim): A neural simulator for closed-loop policy evaluation

 EWMBench: A comprehensive benchmark evaluating simulation fidelity, action correctness, and semantic alignment
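As a rough illustration of what closed-loop evaluation inside a neural simulator involves, a policy can be rolled out entirely against a learned step function and scored on task success. GE-Sim's actual interface is not published here, so every name, the 1-D task, and the success threshold below are hypothetical.

```python
import random

def evaluate_policy(policy, simulator_step, init_state, horizon, goal,
                    episodes=10, seed=0):
    """Closed-loop evaluation: the policy's action feeds the simulator,
    whose predicted next state feeds the policy again."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(episodes):
        state = init_state + rng.uniform(-0.1, 0.1)  # randomized start
        for _ in range(horizon):
            action = policy(state, goal)
            state = simulator_step(state, action)  # model, not the real world
        if abs(state - goal) < 0.05:
            successes += 1
    return successes / episodes

# Hypothetical 1-D task: move toward the goal under simple dynamics.
policy = lambda s, g: 0.5 * (g - s)
step = lambda s, a: s + a
rate = evaluate_policy(policy, step, init_state=0.0, horizon=20, goal=1.0)
```

Closed-loop matters because errors compound: a policy evaluated only on single-step predictions can look fine while drifting badly over a full episode, which is exactly what this loop exposes.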

 

At the same time, AGIBOT establishes a new data and training paradigm:

 Real2Edit2Real: Real-world data becomes editable and extensible, significantly increasing scale and diversity

 Fidelity-Aware Data Composition: Combines real and generated data to balance realism and generalization

Together, these advancements transform world models from representation systems into environment-level infrastructure.
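A minimal sketch of fidelity-aware data composition, assuming each generated sample carries a scalar fidelity score: filter generated data against a floor, then mix it with real data at a fixed ratio. The scoring function, floor, and ratio are all illustrative, not AGIBOT's published recipe.

```python
import random

def compose_batch(real, generated, fidelity, real_ratio=0.5,
                  fidelity_floor=0.7, batch_size=8, seed=0):
    """Mix real samples with generated samples whose fidelity score
    passes a floor. `fidelity` maps a generated sample to [0, 1]."""
    rng = random.Random(seed)
    usable = [g for g in generated if fidelity(g) >= fidelity_floor]
    n_real = int(batch_size * real_ratio)
    batch = rng.choices(real, k=n_real) + \
            rng.choices(usable, k=batch_size - n_real)
    rng.shuffle(batch)
    return batch

# Hypothetical samples tagged with their origin and a fidelity score.
real = [("real", i) for i in range(5)]
generated = [("gen", 0.9), ("gen", 0.5), ("gen", 0.8)]
batch = compose_batch(real, generated, fidelity=lambda g: g[1])
```

The design trade-off the sketch encodes: raising `fidelity_floor` buys realism at the cost of diversity, while raising the generated share buys scale at the cost of potential model bias.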



 

Genie Envisioner 2.0: A “Physical Evolution Engine”

Genie Envisioner 2.0 represents the culmination of this evolution—a system that is no longer just generative, but operational.

 

Key capabilities include:

Action-driven world dynamics

The system responds directly to robot actions, generating high-fidelity environmental changes that follow physical and semantic constraints. The world becomes a process shaped by interaction, rather than a static representation.

 

Long-horizon temporal modeling

Supports minute-level stable simulation, enabling continuous generation of full task sequences rather than fragmented clips.
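One common way to get minute-level rollouts from a model that generates fixed-length clips is chunked autoregressive generation, re-conditioning each chunk on the tail of the previous one. The sketch below assumes that scheme; the predictor is a hypothetical counter, not a video model, and nothing here is AGIBOT's actual method.

```python
def generate_long_horizon(predict_chunk, context, total_steps, chunk_size=16):
    """Chunked autoregressive rollout: generate the sequence in fixed-size
    chunks, conditioning each chunk on the tail of what exists so far."""
    frames = list(context)
    while len(frames) - len(context) < total_steps:
        chunk = predict_chunk(frames[-len(context):])  # condition on tail
        remaining = total_steps - (len(frames) - len(context))
        frames.extend(chunk[:remaining])  # trim the final chunk to fit
    return frames[len(context):]

# Hypothetical predictor: each chunk continues a counter from the last frame.
pred = lambda ctx: [ctx[-1] + i + 1 for i in range(16)]
seq = generate_long_horizon(pred, context=[0], total_steps=40)
```

The hard part in practice is exactly what the announcement claims to address: keeping the chunks consistent with each other so the stitched sequence behaves like one continuous task rather than fragmented clips.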

 

Embodied spatial consistency

Unifies multi-view perception, cross-view 3D consistency, and robot proprioception into a single representation—transforming perception from images into a fully interactive embodied world.

Built-in evaluation and reward modeling

A native General Reward Model enables self-evaluation and optimization based on textual feedback, supporting RL in World Model without human-designed rewards.
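To make reward-from-textual-feedback concrete, here is a toy keyword scorer standing in for a learned General Reward Model. The real model would be a trained network judging full rollouts; this function and its word lists are purely illustrative.

```python
def text_reward(feedback: str) -> float:
    """Toy stand-in for a learned reward model: map free-form textual
    feedback about a rollout to a scalar reward for RL."""
    positive = {"success", "grasped", "completed", "aligned"}
    negative = {"dropped", "collision", "failed", "missed"}
    words = set(feedback.lower().replace(".", " ").split())
    return len(words & positive) - len(words & negative)
```

The appeal of this pattern is that the reward signal comes from describing outcomes rather than from hand-engineering a numeric reward function per task, which is what "without human-designed rewards" refers to.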

 

Toward real-time interaction

With improved inference efficiency, GE 2-Sim approaches real-time operation, enabling:

 

 Eval in World Model

 RL in World Model

 Teleoperation in World Model

 

This marks the transition of world models from offline tools to interactive system environments.

 


 

A Paradigm Shift: When Models Become Worlds

As these capabilities converge, embodied AI is undergoing a fundamental transformation:

 

From “using models to understand the world”

To “learning and making decisions within model-generated worlds.”

 

On one side, the integration of WAM and Vision-Language-Action (VLA) models enables a shift from reactive control to generative, predictive decision-making.

 

On the other, World Simulators allow robots to explore, iterate, and optimize at scale—no longer limited by real-world data availability, but by the fidelity of simulation itself.

 

When these two trajectories converge, robots move beyond replicating human demonstrations to continuously exploring, adapting, and evolving within model-generated environments.

 

Toward a New Foundation for Embodied Intelligence

AGIBOT envisions world models evolving from tools for understanding, to platforms for learning, and ultimately to infrastructure that drives continuous evolution.

 

When models become worlds, reality is no longer the only training ground.

When worlds can be constructed, learning can be scaled.

And when evolution happens within models, the boundaries of embodied AI can be fundamentally redefined.