Dual-System vision-language-action (VLA) models were introduced to address several challenges faced by conventional VLA models: the difficulty of achieving efficient real-time performance, the high cost of pre-training, and the complexity of end-to-end fine-tuning on embodied data caused by domain shift and catastrophic forgetting.
The development and architectural details of Dual-System VLA models are discussed in the paper.
This repository will be continuously updated, and we warmly welcome contributions from the community. If you have papers, projects, or resources that are not yet included, please feel free to submit them via a pull request or open an issue for discussion.
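For readers new to the idea, the minimal sketch below illustrates the generic dual-system pattern: a slow "System 2" module (standing in for a large VLM) refreshes a compact latent plan at low frequency, while a fast "System 1" policy consumes that latent plan together with the current robot state at control frequency. All module names, dimensions, and rates here (`System2Planner`, `System1Policy`, `replan_every`, etc.) are illustrative placeholders under assumed interfaces, not the API of any specific model listed below.

```python
import torch
import torch.nn as nn

class System2Planner(nn.Module):
    """Slow system: stands in for a large VLM that digests the instruction and
    an image and emits a compact latent plan at a low rate (e.g. ~1-5 Hz)."""
    def __init__(self, obs_dim=512, text_dim=256, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim + text_dim, 512), nn.ReLU(),
            nn.Linear(512, latent_dim),
        )

    def forward(self, image_feat, text_feat):
        return self.encoder(torch.cat([image_feat, text_feat], dim=-1))

class System1Policy(nn.Module):
    """Fast system: a lightweight action head that runs at control rate
    (e.g. ~30-100 Hz), conditioned on the latest latent plan."""
    def __init__(self, proprio_dim=16, latent_dim=64, action_dim=7):
        super().__init__()
        self.policy = nn.Sequential(
            nn.Linear(proprio_dim + latent_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, proprio, latent_plan):
        return self.policy(torch.cat([proprio, latent_plan], dim=-1))

# Asynchronous control loop: System 2 refreshes the plan every `replan_every`
# control steps, while System 1 produces an action at every step.
planner, policy = System2Planner(), System1Policy()
image_feat, text_feat = torch.randn(1, 512), torch.randn(1, 256)  # placeholder features
latent_plan = planner(image_feat, text_feat)
replan_every = 30
for step in range(90):
    if step % replan_every == 0:
        latent_plan = planner(image_feat, text_feat)  # slow, low-frequency replanning
    proprio = torch.randn(1, 16)                      # current robot state (placeholder)
    action = policy(proprio, latent_plan.detach())    # fast, high-frequency action
```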
CALVIN ABC→D (success rate, %, on chains of 1–5 consecutive tasks; Avg. Len. = average number of consecutive tasks completed)

Method | 1 | 2 | 3 | 4 | 5 | Avg. Len. |
---|---|---|---|---|---|---|
Single-System | | | | | | |
OpenVLA | 91.3 | 77.8 | 62.0 | 52.1 | 43.5 | 3.27 |
UniVLA | 95.5 | 85.8 | 75.4 | 66.9 | 56.5 | 3.80 |
Seer | 94.4 | 87.2 | 79.9 | 72.2 | 64.3 | 3.98 |
Dual-System | | | | | | |
LCB | 73.6 | 50.2 | 28.5 | 16.0 | 9.9 | 1.78 |
RationalVLA | 74.3 | 58.3 | 42.3 | 30.0 | 20.7 | 2.26 |
Robodual | 94.4 | 82.7 | 72.1 | 62.4 | 54.4 | 3.66 |
OpenHelix | 97.1 | 91.4 | 82.8 | 72.6 | 64.1 | 4.08 |

LIBERO (success rate, %)

Method | LIBERO-Spatial | LIBERO-Object | LIBERO-Goal | LIBERO-Long | Avg. |
---|---|---|---|---|---|
Single-System | | | | | |
OpenVLA | 84.7 | 88.4 | 79.2 | 53.7 | 76.5 |
π0 | 96.8 | 98.8 | 95.8 | 85.2 | 94.2 |
OpenVLA-OFT | 97.6 | 98.4 | 97.9 | 94.5 | 97.1 |
GR00T N1 | 94.4 | 97.6 | 90.6 | 93.9 | 93.9 |
UniVLA | 96.5 | 96.8 | 95.6 | 92.0 | 95.2 |
Seer | - | - | - | 87.7 | - |
Dual-System | | | | | |
DexVLA | 97.2 | 99.1 | 95.6 | - | - |
Hume | 98.6 | 99.8 | 99.4 | 98.6 | 98.6 |

Title | Venue | Date | Code |
---|---|---|---|
Helix: A Vision-Language-Action Model for Generalist Humanoid Control | - | - | Project |

Title | Venue | Date | Code |
---|---|---|---|
OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning | arXiv | 2025-05-17 | |
NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks | arXiv | 2025-04-28 | |
π0.5: a Vision-Language-Action Model with Open-World Generalization | arXiv | 2025-04-22 | Project |
π0: A Vision-Language-Action Flow Model for General Robot Control | arXiv | 2024-10-31 | Project |
PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation | NeurIPS 2024 | 2024-10-14 | |
MResT: Multi-Resolution Sensing for Real-Time Control with Vision-Language Models | CoRL 2023 | 2024-01-25 | |

Title | Venue | Date | Code |
---|---|---|---|
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots | arXiv | 2025-03-18 | Project |