AGIBOT Open-Sources ‘AGIBOT WORLD 2026’ Dataset to Accelerate Embodied AI Development

As robotics research moves beyond controlled lab settings into real-world environments, the demand for large-scale, high-quality data has become increasingly critical. Today, AGIBOT announced the open-source release of AGIBOT WORLD 2026 (agibot-world.com), a heterogeneous dataset designed to systematically support five key research pathways in embodied intelligence.

 

The dataset features structured, high-quality, and precisely annotated real-world robot data, providing developers and researchers with a robust foundation for training next-generation embodied AI systems.


[GIF: Real-world scenarios]

 

Pioneering Free-Form Data Collection Strategy

AGIBOT WORLD 2026 spans a wide range of real-world environments, including commercial spaces, homes, and everyday scenarios—capturing the complexity, variability, and unpredictability that robots must handle in practice.

 

Unlike conventional datasets built on repetitive and scripted demonstrations, AGIBOT introduces a free-form data collection strategy, where teleoperators dynamically perform tasks based on real-time conditions.

 

This approach significantly enhances diversity within each episode and improves generalization across multiple dimensions, including object categories, initial configurations, and task execution sequences. The system leverages a flexible wheeled base, articulated head and waist movements, and lift-pitch capabilities to enable efficient, natural, and highly transferable data collection.

 

In parallel, AGIBOT constructs 1:1 digital twin environments in simulation, with all corresponding simulation data released alongside the real-world dataset.


[GIF: Free-form data collection ensures comprehensive generalization]


Bridging the Gap Between Data and Real Robot Behavior

A fundamental question in embodied AI remains: Does the data truly reflect how a robot operates as an integrated system?

 

To address this, AGIBOT introduces several key innovations:

- Whole-Body Control (WBC): Enables coordinated control of arms, waist, and hands, allowing robots to perform tasks more fluidly as a unified system rather than through isolated motions

- First-person beyond-visual-range teleoperation: Aligns the operator’s perception with the robot’s, enabling more intuitive, continuous, and transferable control

- Force-controlled data collection: Incorporates contact dynamics and force feedback, capturing not only motion trajectories but also real physical interactions

 

Together, these capabilities ensure that the dataset more accurately represents real-world robot behavior.
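To make the force-controlled collection idea concrete, here is a minimal Python sketch of a teleoperation logging loop that records commanded joint positions, measured joint positions, and wrist force/torque readings in one timestamped record. The robot accessors, field names, and sampling rate are illustrative assumptions, not the actual AGIBOT G2 API.

```python
import time
from dataclasses import dataclass
from typing import List

@dataclass
class TeleopSample:
    t: float                   # wall-clock timestamp (s)
    joint_cmd: List[float]     # commanded whole-body joint positions (rad)
    joint_pos: List[float]     # measured joint positions (rad)
    wrist_wrench: List[float]  # measured force/torque [Fx, Fy, Fz, Tx, Ty, Tz]

def record_episode(robot, duration_s: float = 10.0, rate_hz: float = 100.0) -> List[TeleopSample]:
    """Capture a force-annotated teleoperation episode (illustrative sketch only)."""
    samples, period = [], 1.0 / rate_hz
    t_end = time.time() + duration_s
    while time.time() < t_end:
        samples.append(TeleopSample(
            t=time.time(),
            joint_cmd=robot.last_command(),      # assumed accessor, not a real SDK call
            joint_pos=robot.joint_positions(),   # assumed accessor
            wrist_wrench=robot.wrist_wrench(),   # assumed force/torque sensor read
        ))
        time.sleep(period)
    return samples
```

The point of the sketch is simply that each sample carries physical interaction data (the wrench) alongside the motion commands, rather than trajectories alone.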


[GIF: Whole-body control]


Industrial-Grade Hardware and Data Pipeline

The dataset is collected on AGIBOT’s G2 hardware platform, which integrates high-performance joint actuators, multi-modal sensors, and a domain controller to support precise force control and scalable development.

 

Equipped with Swift Picker and AGIBOT OmniHand, the platform captures synchronized multi-modal data—including RGB(D), tactile signals, LiDAR point clouds, IMU data, and full-body joint states—within a unified pipeline.
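As a rough illustration of what one synchronized timestep from such a pipeline could look like, the sketch below defines a single multi-modal frame. The field names, shapes, and types are assumptions based on the modalities listed above, not the published AGIBOT WORLD 2026 schema.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class MultiModalFrame:
    """One synchronized timestep (all shapes and fields are illustrative assumptions)."""
    timestamp: float               # shared clock across all sensors (s)
    rgb: np.ndarray                # (H, W, 3) uint8 camera image
    depth: Optional[np.ndarray]    # (H, W) float32 depth map, when RGB-D is available
    tactile: np.ndarray            # tactile readings from the dexterous hand
    lidar_points: np.ndarray       # (N, 3) float32 point cloud
    imu: np.ndarray                # (6,) accelerometer + gyroscope readings
    joint_states: np.ndarray       # full-body joint positions and velocities
```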

 

Each data episode undergoes rigorous cleaning and validation through AGIBOT’s industrial-grade data processing system, ensuring readiness for large-scale model training and research applications.
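The details of AGIBOT’s processing system are not public, but the kind of automated check such a cleaning step might perform can be sketched as follows, reusing the hypothetical MultiModalFrame record from the previous example; the thresholds are placeholder assumptions.

```python
from typing import List

def validate_episode(frames: List[MultiModalFrame],
                     expected_rate_hz: float = 30.0,
                     max_gap_factor: float = 2.0) -> bool:
    """Illustrative sanity checks; not AGIBOT's actual validation pipeline."""
    if len(frames) < 2:
        return False
    expected_dt = 1.0 / expected_rate_hz
    for prev, cur in zip(frames, frames[1:]):
        dt = cur.timestamp - prev.timestamp
        if dt <= 0:                              # timestamps must be strictly increasing
            return False
        if dt > max_gap_factor * expected_dt:    # flag dropped frames / large gaps
            return False
    return True
```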




Phase 1 Release: Imitation Learning

AGIBOT WORLD 2026 will be released in five phases, each aligned with a core research direction in embodied intelligence.

 

The first release focuses on imitation learning, a key paradigm that enables robots to acquire complex physical skills from expert demonstrations.
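For readers less familiar with the paradigm, the minimal PyTorch sketch below shows the core of behavior cloning, the simplest form of imitation learning: a policy network is regressed onto expert actions from demonstration data. The dimensions and the `demo_dataset` object (yielding observation/expert-action pairs) are placeholders, not the released data loader.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

# Placeholder dimensions; a real setup would use the dataset's observation and action spaces.
obs_dim, act_dim = 64, 14
policy = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                       nn.Linear(256, 256), nn.ReLU(),
                       nn.Linear(256, act_dim))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

def train_bc(demo_dataset, epochs: int = 10) -> None:
    """Behavior cloning: minimize the error between predicted and expert actions."""
    loader = DataLoader(demo_dataset, batch_size=256, shuffle=True)
    for _ in range(epochs):
        for obs, expert_action in loader:
            pred = policy(obs)
            loss = nn.functional.mse_loss(pred, expert_action)  # regress onto demonstrations
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```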

 

This phase includes hundreds of hours of real-world data collected primarily in commercial and service environments. Each episode combines:

- Task-level descriptions (segment-level instructions)

- Action sequences (step-by-step execution)

- Atomic skill labels (e.g., pull, place)

- Object annotations (2D bounding boxes and attributes such as name and color)

Importantly, error-recovery trajectories are also retained and annotated.

 

This hierarchical annotation framework—spanning from high-level tasks to low-level actions—provides the fidelity and corrective priors needed to train more robust and adaptive embodied agents.
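To make the hierarchy concrete, here is a hypothetical episode record expressed as a Python dictionary, spanning the task-level instruction, segment actions with atomic skill labels, object annotations, and an error-recovery flag. The field names and values are illustrative, not the dataset’s actual on-disk format.

```python
# Hypothetical annotation record mirroring the hierarchy described above;
# the actual AGIBOT WORLD 2026 file format may differ.
episode_annotation = {
    "task_description": "Restock the beverage shelf in the store",
    "segments": [
        {
            "instruction": "Take a bottle from the cart and place it on the shelf",
            "actions": [
                {"skill": "pull",  "start_frame": 120, "end_frame": 310},
                {"skill": "place", "start_frame": 311, "end_frame": 480},
            ],
            "objects": [
                {"name": "bottle", "color": "green", "bbox_2d": [412, 188, 506, 395]},
            ],
            "is_error_recovery": False,   # recovery trajectories are retained and flagged
        },
    ],
}
```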


[GIF: Hierarchical annotation framework]


A Long-Term Commitment to the Embodied AI Ecosystem

AGIBOT is among a small group of startups taking a long-term, infrastructure-driven approach to embodied intelligence.

 

Recognizing early that high-quality data is foundational to unlocking the next generation of robotic capabilities, the company has consistently open-sourced million-scale real-world and simulation datasets.

 

The effort reflects the company’s broader goal in embodied intelligence: to democratize access to high-quality robot data. Through the continued evolution of the AGIBOT WORLD ecosystem, AGIBOT aims to contribute to the global robotics community and accelerate the transition of embodied AI from research labs into real-world applications.