Evo-1: Lightweight Vision-Language-Action Model with Preserved Semantic Alignment [CVPR 2026]

📄 Paper

🤗 HuggingFace Models


📦 Dataset

🌍 Website

📰 News

  • πŸ—“οΈ 2026-02-20 β€” Evo-1 is accepted by CVPR 2026 πŸŽ‰
  • πŸ—“οΈ 2025-12-15 β€” Added Evo-1 inference code in Aloha dual arm (Implemented by community user @meijie-jesse)
  • πŸ—“οΈ 2025-11-15 β€” Added Evo-1 inference in the LeRobot framework for SO100/SO101
  • πŸ—“οΈ 2025-11-10 β€” Released inference script in xarm6
  • πŸ—“οΈ 2025-11-06 β€” Released Meta-World & LIBERO evaluation scripts
  • πŸ—“οΈ 2025-11-06 β€” Uploaded model weights to HuggingFace
  • πŸ—“οΈ 2025-11-06 β€” Released official code

✅ To-Do List

  • ✅ Release the inference script for xArm6
  • ✅ Add Evo-1 to the LeRobot framework for SO100/SO101
  • ⬜ Release instructions for deploying Evo-1 on Jetson Orin
  • ⬜ Release results for all 50 RoboTwin tasks
  • ⬜ Release the RoboTwin evaluation script

βš™οΈ Installation

Prepare the environment for Evo-1

```bash
# Clone this repo
git clone https://github.com/MINT-SJTU/Evo-1.git
cd Evo-1/

# Create a Conda environment
conda create -n Evo1 python=3.10 -y
conda activate Evo1

# Install requirements
cd Evo_1
pip install -r requirements.txt

# Install FlashAttention. You may need to reduce MAX_JOBS to suit your machine.
# (!!! This is a critical step: skipping it may cause a lower success rate or unstable robot motion !!!)
MAX_JOBS=64 pip install -v flash-attn --no-build-isolation
```

Simulation Benchmark

🧪 Meta-World Benchmark

1️⃣ Prepare the environment for Meta-World

```bash
conda create -n metaworld python=3.10 -y
conda activate metaworld
pip install mujoco
pip install metaworld
pip install websockets
pip install opencv-python
pip install packaging
pip install huggingface_hub
```

2️⃣ Model Preparation

📥 2.1 Download Model Weights

```bash
hf download MINT-SJTU/Evo1_MetaWorld --local-dir /path/to/save/checkpoint/
```

✏️ 2.2 Modify config

Modify the checkpoint directory: Evo1_server.py#L149
(Optional) Modify the server port: Evo1_server.py#L152
(Optional) Modify the client port: mt50_evo1_client_prompt.py#L40

3️⃣ Run Meta-World Evaluation

```bash
# Terminal 1
conda activate Evo1
cd Evo_1
python scripts/Evo1_server.py
```

```bash
# Terminal 2
conda activate metaworld
cd MetaWorld_evaluation
python mt50_evo1_client_prompt.py
```
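Under the hood, the client and server exchange JSON messages over a WebSocket. The following is a minimal sketch of that payload round trip, with dummy values, the policy call stubbed out, and field names mirroring the observation dict shown in the inference section of this README:

```python
import json

# Observation payload sent by the client (field names follow the repo's
# inference example; the values here are dummy placeholders).
obs = {
    "image": [[[0.0] * 3] * 2],         # stand-in for 448x448 RGB images
    "image_mask": [1, 1, 0],            # which camera views are valid
    "state": [0.0] * 24,                # robot proprioceptive state
    "action_mask": [[1] * 24],          # which action dimensions are valid
    "prompt": "pick up the red block",  # task instruction
}

# Client side: serialize the dict and send it over the WebSocket.
message = json.dumps(obs)

# Server side: decode the observation, run the policy (stubbed here),
# and reply with a JSON-encoded action chunk.
received = json.loads(message)
action_chunk = [[0.0] * 24 for _ in range(50)]  # horizon 50, 24 dims per step
reply = json.dumps(action_chunk)

# Client side: decode the action chunk for execution.
actions = json.loads(reply)
print(len(actions), len(actions[0]))  # prints: 50 24
```

The same encode/decode pattern applies to the LIBERO client below; only the environment side differs.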

🧪 LIBERO Benchmark

1️⃣ Prepare the environment for LIBERO

```bash
conda create -n libero python=3.8.13 -y
conda activate libero
cd LIBERO_evaluation/
git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git
cd LIBERO
pip install -r requirements.txt
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -e .
pip install websockets
pip install huggingface_hub
```

2️⃣ Model Preparation

📥 2.1 Download Model Weights

```bash
hf download MINT-SJTU/Evo1_LIBERO --local-dir /path/to/save/checkpoint/
```

✏️ 2.2 Modify config

Modify the checkpoint directory: Evo1_server.py#L149
Modify the checkpoint name: libero_client_4tasks.py#L24
(Optional) Modify the server port: Evo1_server.py#L152
(Optional) Modify the client port: libero_client_4tasks.py#L23

3️⃣ Run LIBERO Evaluation

```bash
# Terminal 1
conda activate Evo1
cd Evo_1
python scripts/Evo1_server.py
```

```bash
# Terminal 2
conda activate libero
cd LIBERO_evaluation
python libero_client_4tasks.py
```

🧠 Training on Your Own Dataset

We support the LeRobot v2.1 dataset format; please convert your data to this format first.

We use the MetaWorld dataset as an example here.

📥 1. Download Dataset

```bash
mkdir Evo1_training_dataset/
cd Evo1_training_dataset/
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/datasets/MINT-SJTU/Evo1_MetaWorld_Dataset
cd Evo1_MetaWorld_Dataset/
git lfs pull
```

✏️ 2. Modify config

✏️ 2.1 Modify config.yaml

You need to modify config.yaml, which sets the dataset path and the camera mapping.
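As a rough illustration, the relevant part of config.yaml might look like the following. The keys and camera names here are hypothetical, so consult the repo's actual config.yaml for the real schema:

```yaml
# Hypothetical sketch only; check the repo's actual config.yaml for the
# real keys and camera names.
dataset:
  path: /path/to/Evo1_training_dataset/Evo1_MetaWorld_Dataset
  camera_mapping:
    base: observation.images.base    # dataset key for the base camera
    wrist: observation.images.wrist  # dataset key for the wrist camera
```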

✏️ 2.2 Set the cache path

You need to change cache_dir.

Set the cache path so that on later runs the dataset can be loaded from cached .pkl files, which is much faster.
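The cache follows the usual build-once, load-later pattern. Below is a generic sketch of that pattern, not the repo's actual loader (the function and file names are illustrative):

```python
import os
import pickle

def load_with_cache(cache_dir, name, build_fn):
    """Load `name` from a .pkl cache if present, otherwise build and cache it."""
    os.makedirs(cache_dir, exist_ok=True)
    cache_path = os.path.join(cache_dir, f"{name}.pkl")
    if os.path.exists(cache_path):
        # Fast path: deserialize the previously processed dataset.
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # Slow path: process the raw data once, then write it to the cache.
    data = build_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(data, f)
    return data

# Example: the second run reads from the cache instead of rebuilding.
samples = load_with_cache("/tmp/evo1_cache", "metaworld", lambda: list(range(5)))
```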

🚀 3. Start Training

We use a two-stage training paradigm.

🚀 3.1 Set up DeepSpeed

```bash
accelerate config
```

You can check this setup guide.


🚀 3.2 Stage 1

In stage 1, we train only the integration module and the action expert.

If you are training with multiple GPUs, set --num_processes to the number of GPUs.
Change --run_name, --save_dir, and --resume_path based on your own configuration.

```bash
conda activate Evo1
cd Evo_1/
accelerate launch --num_processes 1 --num_machines 1 --deepspeed_config_file ds_config.json scripts/train.py \
    --run_name Evo1_metaworld_stage1 \
    --action_head flowmatching \
    --use_augmentation \
    --lr 1e-5 \
    --dropout 0.2 \
    --weight_decay 1e-3 \
    --batch_size 16 \
    --image_size 448 \
    --max_steps 5000 \
    --log_interval 10 \
    --ckpt_interval 2500 \
    --warmup_steps 1000 \
    --grad_clip_norm 1.0 \
    --num_layers 8 \
    --horizon 50 \
    --finetune_action_head \
    --disable_wandb \
    --vlm_name OpenGVLab/InternVL3-1B \
    --dataset_config_path dataset/config.yaml \
    --per_action_dim 24 \
    --state_dim 24 \
    --save_dir /your/path/checkpoints/stage1
```

🚀 3.3 Stage 2

We perform full-scale training in stage 2.

```bash
conda activate Evo1
cd Evo_1/
accelerate launch --num_processes 1 --num_machines 1 --deepspeed_config_file ds_config.json scripts/train.py \
    --run_name Evo1_metaworld_stage2 \
    --action_head flowmatching \
    --use_augmentation \
    --lr 1e-5 \
    --dropout 0.2 \
    --weight_decay 1e-3 \
    --batch_size 16 \
    --image_size 448 \
    --max_steps 80000 \
    --log_interval 10 \
    --ckpt_interval 2500 \
    --warmup_steps 1000 \
    --grad_clip_norm 1.0 \
    --num_layers 8 \
    --horizon 50 \
    --finetune_vlm \
    --finetune_action_head \
    --disable_wandb \
    --vlm_name OpenGVLab/InternVL3-1B \
    --dataset_config_path dataset/config.yaml \
    --per_action_dim 24 \
    --state_dim 24 \
    --save_dir /your/path/checkpoints/stage2 \
    --resume --resume_pretrain \
    --resume_path /your/path/checkpoints/stage1/step_5000
```

🚀 3.4 (Optional) Resume

If you want to resume training, use the following command (stage 2 as an example):

```bash
accelerate launch --num_processes 1 --num_machines 1 --deepspeed_config_file ds_config.json scripts/train.py \
    --run_name Your_own_name \
    --action_head flowmatching \
    --use_augmentation \
    --lr 1e-5 \
    --dropout 0.2 \
    --weight_decay 1e-3 \
    --batch_size 16 \
    --image_size 448 \
    --max_steps 80000 \
    --log_interval 10 \
    --ckpt_interval 2500 \
    --warmup_steps 1000 \
    --grad_clip_norm 1.0 \
    --num_layers 8 \
    --horizon 50 \
    --finetune_vlm \
    --finetune_action_head \
    --disable_wandb \
    --vlm_name OpenGVLab/InternVL3-1B \
    --dataset_config_path dataset/config.yaml \
    --per_action_dim 24 \
    --state_dim 24 \
    --save_dir /your/path/to/save/the/checkpoints/ \
    --resume \
    --resume_path /the/checkpoint/path/you/want/to/resume/from/step_20000
```

🦾 4. Inference in Your Own Embodiment

We provide an example inference client script, Evo1_client_xarm6, for the xArm6.

The key is to construct an observation dict and pass it to the server.

```python
obs = {
    # You need to resize each image to 448x448 before sending it in obs
    "image": [base_proc.tolist(), wrist_proc.tolist(), dummy_proc.tolist()],
    # This shows which images are valid
    "image_mask": [int(i) for i in [1, 1, 0]],
    # This is the state of the robot
    "state": state.astype(float).tolist(),
    # This is the action mask that shows which actions are valid
    "action_mask": [[int(i) for i in action_mask[0]]],
    # This is the instruction of the task
    "prompt": task_instruction,
}
try:
    # Send the observation to the server
    await ws.send(json.dumps(obs))
    result = await ws.recv()
    # Get the action chunk
    action_chunk = torch.tensor(json.loads(result))
except Exception as e:
    print(f"❌ Inference Error: {e}")
    await asyncio.sleep(0.5)
    continue
```
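With the training flags used above (--horizon 50, --per_action_dim 24), the returned chunk has one row per timestep and one column per action dimension. Below is a minimal sketch of consuming it, using plain Python lists in place of the torch tensor; the `execute` function and the replanning interval are hypothetical:

```python
# Stand-in for the decoded server reply: one row per timestep, one column
# per action dimension (horizon 50 x 24 dims with the training flags above).
action_chunk = [[0.1 * t] * 24 for t in range(50)]  # dummy values

def execute(action):
    # Hypothetical placeholder for your robot's low-level control call.
    return len(action)

# Execute only a prefix of the chunk, then query the policy again;
# replanning before the full horizon is consumed is a common deployment choice.
replan_every = 25
executed = 0
for action in action_chunk[:replan_every]:
    assert execute(action) == 24  # each step carries 24 action dimensions
    executed += 1
```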

🤖 5. Inference with LeRobot SO100/SO101

We add our policy under /so100_evo1/lerobot-main/src/lerobot/policies/evo1/.

🔧 5.1 Environment Setup for Collecting LeRobot v2.1 Data

The environment for data collection is different from the environment used for evaluation, because collecting demonstrations requires compatibility with the LeRobot v2.1 dataset format.

```bash
# Create and activate the conda environment for data collection
conda create -y -n lerobot python=3.10
conda activate lerobot

# Clone the LeRobot repository
git clone https://github.com/huggingface/lerobot.git
cd lerobot

# Check out the version compatible with the v2.1 data format
git checkout v0.3.2
pip install -e .
pip install -e ".[feetech]"
```

🔧 5.2 Environment Setup for Evaluation

```bash
# Prepare the environment for Evo1_SO100
cd Evo_1/so100_evo1/
conda create -n Evo1_SO100 python=3.10
conda activate Evo1_SO100

# Install FlashAttention
wget https://ghproxy.net/https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.7cxx11abiTRUE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.8.3+cu12torch2.7cxx11abiTRUE-cp310-cp310-linux_x86_64.whl

# Install LeRobot
conda install ffmpeg -c conda-forge
cd lerobot-main
pip install -e .
pip install -e ".[feetech]"
cd Evo_1/so100_evo1/

# Set your own LEROBOT_HOME, which includes the calibration file of the SO100
export HF_LEROBOT_HOME="Address of your own LEROBOT_HOME"
pip install transformers accelerate
pip install timm
```

✏️ 5.3 Checkpoint Modification

After you have trained your model, you need to modify the checkpoint files to make them compatible with LeRobot SO100.

5.3.1 Change the name of the config file

Rename the original file "config.json" to "model_config.json".

5.3.2 Change camera name and image shape

Create a new config.json based on model_config.json.

We provide an example in SO100_example_checkpoint

```bash
hf download MINT-SJTU/Evo1_SO100 --local-dir /path/to/save/checkpoint/
```

The key is to change the camera names and image shapes, and rewrite config.json to satisfy the LeRobot framework.
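Steps 5.3.1 and 5.3.2 can be scripted. The sketch below renames config.json and writes a new one; the input_features layout, camera names, and shapes are assumptions, so check SO100_example_checkpoint for the exact schema the framework expects:

```python
import json
import os
import shutil

def adapt_checkpoint(ckpt_dir, camera_shapes):
    """Rename config.json to model_config.json, then write a new config.json
    with camera entries. The input_features layout below is an assumption;
    check SO100_example_checkpoint for the schema LeRobot actually expects."""
    src = os.path.join(ckpt_dir, "config.json")
    dst = os.path.join(ckpt_dir, "model_config.json")
    if os.path.exists(src) and not os.path.exists(dst):
        shutil.move(src, dst)  # 5.3.1: rename the original config file
    with open(dst) as f:
        cfg = json.load(f)
    # 5.3.2: set camera names and image shapes to match your LeRobot recording.
    cfg["input_features"] = {
        name: {"type": "VISUAL", "shape": list(shape)}
        for name, shape in camera_shapes.items()
    }
    with open(os.path.join(ckpt_dir, "config.json"), "w") as f:
        json.dump(cfg, f, indent=2)
```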

🚀 5.4 Run LeRobot SO100/SO101

```bash
# Run the command
cd Evo-1/so100_evo1
lerobot-record \
    --robot.type=so100_follower \
    --robot.port=/dev/ttyACMXXXXXXX \
    --robot.id=your_so100_follower_arm_id \
    --robot.cameras="{ front: {type: opencv, index_or_path: 2, width: 640, height: 480, fps: 30}, wrist: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30} }" \
    --display_data=true \
    --dataset.repo_id=${HF_USER}/eval_evo1 \
    --dataset.single_task="prompt of your task" \
    --policy.path=/path/of/your/checkpoint/

# Command example
lerobot-record \
    --robot.type=so100_follower \
    --robot.port=/dev/ttyACM1 \
    --robot.id=new_follower_arm \
    --robot.cameras="{ front_env: {type: opencv, index_or_path: 2, width: 640, height: 480, fps: 30}, side_env: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30} }" \
    --display_data=true \
    --dataset.repo_id=yinxinyuchen/eval_evo1 \
    --dataset.single_task="Grab the green cube and put the cube in the green box" \
    --policy.path=/home/dell/step_20000/
```

For reference, we also provide a recording that demonstrates how to evaluate Evo-1 on SO100/SO101. If you already have a trained checkpoint, please refer to the following links:
YouTube
bilibili

📚 Citation

```bibtex
@article{lin2025evo,
  title={Evo-1: Lightweight Vision-Language-Action Model with Preserved Semantic Alignment},
  author={Lin, Tao and Zhong, Yilei and Du, Yuxin and Zhang, Jingjing and Liu, Jiting and Chen, Yinxinyu and Gu, Encheng and Liu, Ziyan and Cai, Hongyi and Zou, Yanwen and others},
  journal={arXiv preprint arXiv:2511.04555},
  year={2025}
}
```

📬 Contact

If you encounter any issues or have suggestions,
please open an issue or start a discussion on GitHub.
We sincerely welcome your feedback and contributions.

You can also scan the QR code below to connect with me or join our chat group on WeChat:
