Stars
We release Evo-RL, an open-source real-world offline RL system on SO-101 and AgileX PiPER for easier reproduction.
ConLA: Contrastive Latent Action Learning from Human Videos for Robotic Manipulation
The official implementation of VLA-Pruner: Temporal-Aware Dual-Level Visual Token Pruning for Efficient Vision-Language-Action Inference.
This website collects state-of-the-art (SOTA) VLA results.
Evo-1: Lightweight Vision-Language-Action Model with Preserved Semantic Alignment
U-Arm: Lerobot-Everything-Cross-Embodiment-Teleoperation
Evo-0: Vision-Language-Action Model with Implicit Spatial Understanding.
The open-source CapCut alternative
🔥🔥First-ever hour-scale video understanding models
RoboFAC: A Comprehensive Framework for Robotic Failure Analysis and Correction
STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?
🔥🔥MLVU: Multi-task Long Video Understanding Benchmark
