AGIBOT’s Genie Envisioner-Sim 2.0 Ranks No. 1 on WorldArena Benchmark, Demonstrating Superior Capabilities in Embodied AI

SHANGHAI, CHINA - May 29, 2026 - AGIBOT today announced that its self-developed world model, Genie Envisioner-Sim 2.0 (GE 2.0), has ranked first on the latest WorldArena Track 1 leaderboard, a benchmark focused on world model perception and action response.

 

WorldArena is an embodied AI benchmark that evaluates the capabilities of world models in understanding and responding to dynamic environments. Track 1, which focuses on world model perception and action response, measures one of the most fundamental capabilities in embodied AI: the ability to understand, predict, and respond to changing physical scenarios. As embodied AI moves closer to real-world deployment, this capability is becoming an important indicator of a humanoid robot’s intelligence and generalization potential.


[Figure: GenieEnvisioner-Sim2.0-2B tops the latest WorldArena Track 1 leaderboard]

 

In the WorldArena evaluation, the AGIBOT team used its native world model architecture without designing a task-specific system for the benchmark. The model was fine-tuned only on the leaderboard data, yet still achieved the top overall ranking. This result highlights the strong general adaptability of GE 2.0 and validates AGIBOT’s long-term approach to building a robust technical foundation for embodied world models.


“World models are becoming a critical foundation for embodied intelligence because they allow robots to learn, evaluate, and improve in simulated environments before entering the physical world,” said Dr. Yao Maoqing, Partner, Senior Vice President, and President of the Embodied AI Business Unit at AGIBOT. “GE 2.0’s performance on WorldArena reflects our belief that the next stage of humanoid robotics will be shaped not only by hardware capability, but by the ability to build reliable, scalable intelligence systems that can generalize across tasks and environments.”


According to AGIBOT’s technical report, GE 2.0 represents a major step forward from the previous generation. Rather than serving only as a perception-and-prediction model, it has evolved into a more complete and practical world simulator. The model can support virtual environments where robot policies can be tested, iterated, and improved through closed-loop simulation, reducing the cost and risk of trial-and-error in the real world while enabling more efficient transfer to physical deployment.

 

GE 2.0 introduces a more complete capability matrix across the world simulation pipeline. It covers key functions including long-horizon generation, multi-view generation, proprioceptive state generation, pseudo real-time inference, and reward judgment. Together, these capabilities form a closed technical loop for world simulation, policy evaluation, and data feedback.


[Figure: GE 2.0’s world simulation capability matrix]

 

One of the model’s key advances is its long-horizon reasoning and generation capability. In long-sequence simulation tasks, GE 2.0 demonstrated strong temporal stability, with visual quality degrading significantly more slowly than industry baseline models. Even when generating continuous video segments of 40 to 50 seconds, GE 2.0 maintained stronger quality than the baseline model achieved within its first 10 seconds.


[Figure: GE 2.0 maintains strong quality in long-horizon generation]

 

GE 2.0 also demonstrates strong reliability in closed-loop evaluation. As a world simulator, its core value lies in whether simulation results can accurately reflect outcomes in the physical world. AGIBOT validated the model across multiple closed-loop tasks and found strong correlation between simulation results and real-world performance.
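One simple way to quantify this kind of sim-to-real agreement is a Pearson correlation between success rates measured in simulation and on real hardware. The sketch below is illustrative only: the `pearson` helper, the variable names, and the numbers are placeholders chosen for this example, not AGIBOT's evaluation code or reported results, and the technical report may use a different statistic.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical success rates for four policy checkpoints.
# These values are made up for illustration; they are NOT AGIBOT results.
sim_success = [0.20, 0.45, 0.60, 0.80]   # measured inside the world simulator
real_success = [0.25, 0.40, 0.65, 0.75]  # measured on the physical robot

r = pearson(sim_success, real_success)
print(f"Pearson r between simulated and real success rates: {r:.3f}")
```

A value of r close to 1 would mean that policies ranking higher in simulation also tend to rank higher on the real robot, which is the property a policy evaluator needs.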

 

[Figure: Closed-loop evaluation validates GE 2.0’s reliability as a policy evaluator]

 

The team further conducted case-by-case rollout comparisons and used confusion matrix analysis to provide quantitative evidence beyond high-level success-rate alignment. These results support GE 2.0’s reliability as a policy evaluator, enabling robot policies to be tested, refined, and improved in simulation before being transferred to real-world deployment.
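As a rough illustration of what a case-by-case confusion matrix captures, the sketch below pairs each rollout's real-world outcome with its simulated outcome and counts the four possible combinations. The function names, the boolean success encoding, and the sample outcomes are assumptions made for this example; they do not come from the technical report.

```python
from collections import Counter


def confusion_matrix(real, sim):
    """Count paired outcomes for rollouts run both on hardware and in simulation.

    `real` and `sim` are equal-length lists of booleans (True = task succeeded).
    """
    if len(real) != len(sim):
        raise ValueError("real and sim must have the same length")
    counts = Counter(zip(real, sim))  # keys are (real_outcome, sim_outcome)
    return {
        "both_success": counts[(True, True)],
        "sim_only_success": counts[(False, True)],   # simulator too optimistic
        "real_only_success": counts[(True, False)],  # simulator too pessimistic
        "both_failure": counts[(False, False)],
    }


def agreement_rate(cm):
    """Fraction of rollouts where simulation and reality agree."""
    total = sum(cm.values())
    return (cm["both_success"] + cm["both_failure"]) / total if total else 0.0


# Hypothetical per-rollout outcomes (illustrative only, not AGIBOT data).
real_outcomes = [True, True, False, False, True, False]
sim_outcomes = [True, True, False, True, True, False]

cm = confusion_matrix(real_outcomes, sim_outcomes)
print(cm)
print(f"agreement rate: {agreement_rate(cm):.2f}")
```

Unlike an aggregate success rate, the off-diagonal counts show whether disagreements come from the simulator being too optimistic or too pessimistic on individual rollouts.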

 

GE 2.0 also enables a data feedback mechanism powered by a reward model. During closed-loop rollout evaluation, the reward model can automatically screen and filter rollout data, selectively feeding effective, high-quality data generated by the world model back into the policy model. Experiments show that this mechanism delivered significant performance gains for the policy model across multiple tasks, demonstrating GE 2.0’s potential not only as a simulator, but also as an engine for continuous policy improvement.
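The filtering step described above can be sketched as a threshold on reward scores. Everything in this example is hypothetical: the `Rollout` type, the `toy_reward_model` stand-in, the placeholder scores, and the 0.7 threshold are assumptions for illustration and do not reflect GE 2.0's actual reward model or interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rollout:
    """A simulated rollout produced by the world model (hypothetical structure)."""
    rollout_id: str
    task: str
    reward: float = 0.0


def filter_rollouts(
    rollouts: List[Rollout],
    score_fn: Callable[[Rollout], float],
    threshold: float,
) -> List[Rollout]:
    """Score each rollout and keep only those at or above the threshold."""
    kept = []
    for rollout in rollouts:
        rollout.reward = score_fn(rollout)
        if rollout.reward >= threshold:
            kept.append(rollout)
    return kept


# Stand-in for a learned reward model: a lookup of placeholder scores.
# A real reward model would score the generated video and robot states.
PLACEHOLDER_SCORES = {"r1": 0.92, "r2": 0.35, "r3": 0.71, "r4": 0.10}


def toy_reward_model(rollout: Rollout) -> float:
    return PLACEHOLDER_SCORES[rollout.rollout_id]


candidates = [
    Rollout("r1", "pick_cup"),
    Rollout("r2", "pick_cup"),
    Rollout("r3", "fold_towel"),
    Rollout("r4", "fold_towel"),
]

# Only the retained rollouts would be fed back as training data for the policy.
training_data = filter_rollouts(candidates, toy_reward_model, threshold=0.7)
print([r.rollout_id for r in training_data])
```

The threshold trades data volume against data quality: a higher value keeps fewer rollouts but reduces the chance of training the policy on failed or implausible simulations.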

 

[Figure: GE 2.0 feeds high-quality rollout data back into the policy model]

 

As 2026 marks what AGIBOT has described as the beginning of the deployment phase for embodied AI, humanoid robots are moving from laboratory demonstrations toward real-world, large-scale applications. This shift places higher demands on underlying algorithms, especially the ability to evaluate, adapt, and improve before deployment in complex physical environments.

 

AGIBOT remains focused on advancing foundational embodied AI technologies while connecting them with practical industrial value. The WorldArena result and the technical report together demonstrate the potential of the Genie Envisioner technology pathway. Looking ahead, AGIBOT will continue to iterate its world simulator system, strengthen the closed loop between world models and robot policies, and support the scalable deployment of humanoid robots in real-world applications.

 

The GE 2.0 technical report and project resources are available at:

Project page: https://ge-sim-v2.github.io

arXiv: https://arxiv.org/abs/2605.27491

GitHub: https://github.com/AgibotTech/GE-Sim-V2