- 2026-02-20: Evo-1 is accepted by CVPR 2026 🎉
- 2025-12-15: Added Evo-1 inference code for the Aloha dual arm (implemented by community user @meijie-jesse)
- 2025-11-15: Added Evo-1 inference in the LeRobot framework for SO100/SO101
- 2025-11-10: Released the inference script for xArm6
- 2025-11-06: Released the Meta-World & LIBERO evaluation scripts
- 2025-11-06: Uploaded model weights to Hugging Face
- 2025-11-06: Released the official code
- [x] Release the inference script for xArm6
- [x] Add Evo-1 to the LeRobot framework for SO100/SO101
- [ ] Release instructions for deploying Evo-1 on Jetson Orin
- [ ] Release results of all 50 RoboTwin tasks
- [ ] Release the RoboTwin evaluation script
## Prepare the environment for Evo-1
```bash
# Clone this repo
git clone https://github.com/MINT-SJTU/Evo-1.git
cd Evo-1/

# Create a Conda environment
conda create -n Evo1 python=3.10 -y
conda activate Evo1

# Install requirements
cd Evo_1
pip install -r requirements.txt

# You may need to reduce MAX_JOBS to suit your computer
# (!!! This is a critical step: skipping it may cause a lower success rate or unstable robot motion !!!)
MAX_JOBS=64 pip install -v flash-attn --no-build-isolation
```

## Meta-World evaluation

Prepare the Meta-World environment:

```bash
conda create -n metaworld python=3.10 -y
conda activate metaworld
pip install mujoco
pip install metaworld
pip install websockets
pip install opencv-python
pip install packaging
pip install huggingface_hub
```

Download the checkpoint:

```bash
hf download MINT-SJTU/Evo1_MetaWorld --local-dir /path/to/save/checkpoint/
```

Modify checkpoint dir: Evo1_server.py#L149
(Optional) Modify server port: Evo1_server.py#L152
(Optional) Modify client port: mt50_evo1_client_prompt.py#L40
Run the server and the client in two terminals:

```bash
# Terminal 1
conda activate Evo1
cd Evo_1
python scripts/Evo1_server.py
```

```bash
# Terminal 2
conda activate metaworld
cd MetaWorld_evaluation
python mt50_evo1_client_prompt.py
```

## LIBERO evaluation

Prepare the LIBERO environment:

```bash
conda create -n libero python=3.8.13 -y
conda activate libero
cd LIBERO_evaluation/
git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git
cd LIBERO
pip install -r requirements.txt
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -e .
pip install websockets
pip install huggingface_hub
```

Download the checkpoint:

```bash
hf download MINT-SJTU/Evo1_LIBERO --local-dir /path/to/save/checkpoint/
```

Modify checkpoint dir: Evo1_server.py#L149
Modify ckpt name: libero_client_4tasks.py#L24
(Optional) Modify server port: Evo1_server.py#L152
(Optional) Modify client port: libero_client_4tasks.py#L23
Run the server and the client in two terminals:

```bash
# Terminal 1
conda activate Evo1
cd Evo_1
python scripts/Evo1_server.py
```

```bash
# Terminal 2
conda activate libero
cd LIBERO_evaluation
python libero_client_4tasks.py
```

We support the LeRobot v2.1 data format; please convert your data to this format.
We use the MetaWorld dataset here as an example.
```bash
mkdir Evo1_training_dataset/
cd Evo1_training_dataset/
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/datasets/MINT-SJTU/Evo1_MetaWorld_Dataset
cd Evo1_MetaWorld_Dataset/
git lfs pull
```

You need to modify `config.yaml`.
This is used to set the dataset path and the camera mapping.
You also need to change the `cache_dir`: set the cache path so the dataset can be loaded from `.pkl` files on subsequent runs for faster loading.
We use a two-stage training paradigm.
Configure Accelerate first:

```bash
accelerate config
```

You can check this setup guide.
We only train the integration module and action expert in stage 1.
If you are training with multiple GPUs, set `--num_processes` to the number of GPUs.
You need to change `--run_name`, `--save_dir`, and `--resume_path` based on your own configuration.
```bash
conda activate Evo1
cd Evo_1/
accelerate launch --num_processes 1 --num_machines 1 --deepspeed_config_file ds_config.json scripts/train.py \
  --run_name Evo1_metaworld_stage1 \
  --action_head flowmatching \
  --use_augmentation \
  --lr 1e-5 \
  --dropout 0.2 \
  --weight_decay 1e-3 \
  --batch_size 16 \
  --image_size 448 \
  --max_steps 5000 \
  --log_interval 10 \
  --ckpt_interval 2500 \
  --warmup_steps 1000 \
  --grad_clip_norm 1.0 \
  --num_layers 8 \
  --horizon 50 \
  --finetune_action_head \
  --disable_wandb \
  --vlm_name OpenGVLab/InternVL3-1B \
  --dataset_config_path dataset/config.yaml \
  --per_action_dim 24 \
  --state_dim 24 \
  --save_dir /your/path/checkpoints/stage1
```

We perform full-scale training in stage 2.
```bash
conda activate Evo1
cd Evo_1/
accelerate launch --num_processes 1 --num_machines 1 --deepspeed_config_file ds_config.json scripts/train.py \
  --run_name Evo1_metaworld_stage2 \
  --action_head flowmatching \
  --use_augmentation \
  --lr 1e-5 \
  --dropout 0.2 \
  --weight_decay 1e-3 \
  --batch_size 16 \
  --image_size 448 \
  --max_steps 80000 \
  --log_interval 10 \
  --ckpt_interval 2500 \
  --warmup_steps 1000 \
  --grad_clip_norm 1.0 \
  --num_layers 8 \
  --horizon 50 \
  --finetune_vlm \
  --finetune_action_head \
  --disable_wandb \
  --vlm_name OpenGVLab/InternVL3-1B \
  --dataset_config_path dataset/config.yaml \
  --per_action_dim 24 \
  --state_dim 24 \
  --save_dir /your/path/checkpoints/stage2 \
  --resume --resume_pretrain \
  --resume_path /your/path/checkpoints/stage1/step_5000
```

If you want to resume the training process, you can use the following command (we use stage 2 as an example):
```bash
accelerate launch --num_processes 1 --num_machines 1 --deepspeed_config_file ds_config.json scripts/train.py \
  --run_name Your_own_name \
  --action_head flowmatching \
  --use_augmentation \
  --lr 1e-5 \
  --dropout 0.2 \
  --weight_decay 1e-3 \
  --batch_size 16 \
  --image_size 448 \
  --max_steps 80000 \
  --log_interval 10 \
  --ckpt_interval 2500 \
  --warmup_steps 1000 \
  --grad_clip_norm 1.0 \
  --num_layers 8 \
  --horizon 50 \
  --finetune_vlm \
  --finetune_action_head \
  --disable_wandb \
  --vlm_name OpenGVLab/InternVL3-1B \
  --dataset_config_path dataset/config.yaml \
  --per_action_dim 24 \
  --state_dim 24 \
  --save_dir /your/path/to/save/the/checkpoints/ \
  --resume \
  --resume_path /the/checkpoint/path/you/want/to/resume/from/step_20000
```

We provide an example inference client script, Evo1_client_xarm6, for xArm6.
The key is to construct an observation dict and pass it to the server.
```python
obs = {
    # Resize each image to 448x448 before sending it in obs
    "image": [base_proc.tolist(), wrist_proc.tolist(), dummy_proc.tolist()],
    # Indicates which images are valid
    "image_mask": [int(i) for i in [1, 1, 0]],
    # The state of the robot
    "state": state.astype(float).tolist(),
    # The action mask, indicating which action dimensions are valid
    "action_mask": [[int(i) for i in action_mask[0]]],
    # The instruction of the task
    "prompt": task_instruction,
}
try:
    # Send the observation to the server
    await ws.send(json.dumps(obs))
    result = await ws.recv()
    # Get the action chunk
    action_chunk = torch.tensor(json.loads(result))
except Exception as e:
    print(f"Inference Error: {e}")
    await asyncio.sleep(0.5)
    continue
```

We add our policy in /so100_evo1/lerobot-main/src/lerobot/policies/evo1/.
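To make the observation construction above concrete, here is a minimal, self-contained sketch of assembling and serializing such a dict. The helper name `build_obs` and the placeholder image values are illustrative assumptions, not the repository's API; the real client fills the fields from camera frames and robot state as shown in the snippet above.

```python
import json

def build_obs(base_img, wrist_img, state, action_mask, prompt):
    """Assemble the observation dict sent to the Evo-1 server.

    Images are plain nested lists (e.g. 448x448x3). The third camera
    slot is unused here, so its image_mask entry is 0.
    """
    dummy_img = [[[0, 0, 0]]]  # placeholder for the unused third camera
    return {
        "image": [base_img, wrist_img, dummy_img],
        "image_mask": [1, 1, 0],                  # which images are valid
        "state": [float(x) for x in state],       # robot state vector
        "action_mask": [[int(i) for i in action_mask]],
        "prompt": prompt,                         # task instruction
    }

# The dict must survive a JSON round trip, since it travels over a websocket:
obs = build_obs([[[0, 0, 0]]], [[[0, 0, 0]]], [0.0] * 24, [1] * 24,
                "pick up the cube")
payload = json.dumps(obs)
assert json.loads(payload) == obs
```

Keeping everything as plain lists, ints, floats, and strings (no NumPy arrays or tensors) is what makes the payload JSON-serializable on the wire.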
The environment for data collection is different from the environment used for evaluation, because collecting demonstrations requires compatibility with the LeRobot v2.1 dataset format.
```bash
# Create and activate the conda environment for data collection
conda create -y -n lerobot python=3.10
conda activate lerobot

# Clone the LeRobot repository
git clone https://github.com/huggingface/lerobot.git
cd lerobot

# Check out the version compatible with the v2.1 data format
git checkout v0.3.2
pip install -e .
pip install -e ".[feetech]"
```

```bash
# Prepare the environment for Evo1_SO100
cd Evo_1/so100_evo1/
conda create -n Evo1_SO100 python=3.10
conda activate Evo1_SO100

# Install FlashAttention
wget https://ghproxy.net/https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.7cxx11abiTRUE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.8.3+cu12torch2.7cxx11abiTRUE-cp310-cp310-linux_x86_64.whl

# Install LeRobot
conda install ffmpeg -c conda-forge
cd lerobot-main
pip install -e .
pip install -e ".[feetech]"
cd Evo_1/so100_evo1/

# Set your own LEROBOT_HOME, which includes the calibration file of the SO100
export HF_LEROBOT_HOME="address of your own LEROBOT_HOME"
pip install transformers accelerate
pip install timm
```

After you have trained your model, you need to modify the checkpoint files to make them compatible with LeRobot SO100:
- Rename the original file `config.json` to `model_config.json`.
- Create a new `config.json` based on `model_config.json`.
- We provide an example in SO100_example_checkpoint.
```bash
hf download MINT-SJTU/Evo1_SO100 --local-dir /path/to/save/checkpoint/
```

The key is to change the camera names and image shape, and to rewrite `config.json` to satisfy the LeRobot framework.
```bash
# Run the command
cd Evo-1/so100_evo1
lerobot-record \
  --robot.type=so100_follower \
  --robot.port=/dev/ttyACMXXXXXXX \
  --robot.id=your_so100_follower_arm_id \
  --robot.cameras="{ front: {type: opencv, index_or_path: 2, width: 640, height: 480, fps: 30}, wrist: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30} }" \
  --display_data=true \
  --dataset.repo_id=${HF_USER}/eval_evo1 \
  --dataset.single_task="prompt of your task" \
  --policy.path=/path/of/your/checkpoint/
```

```bash
# Command example
lerobot-record \
  --robot.type=so100_follower \
  --robot.port=/dev/ttyACM1 \
  --robot.id=new_follower_arm \
  --robot.cameras="{ front_env: {type: opencv, index_or_path: 2, width: 640, height: 480, fps: 30}, side_env: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30} }" \
  --display_data=true \
  --dataset.repo_id=yinxinyuchen/eval_evo1 \
  --dataset.single_task="Grab the green cube and put the cube in the green box" \
  --policy.path=/home/dell/step_20000/
```

For reference, we also provide a video recording that demonstrates how to evaluate Evo-1 on SO100/SO101. If you already have a trained checkpoint, please refer to the following links:
YouTube
bilibili
```bibtex
@article{lin2025evo,
  title={Evo-1: Lightweight Vision-Language-Action Model with Preserved Semantic Alignment},
  author={Lin, Tao and Zhong, Yilei and Du, Yuxin and Zhang, Jingjing and Liu, Jiting and Chen, Yinxinyu and Gu, Encheng and Liu, Ziyan and Cai, Hongyi and Zou, Yanwen and others},
  journal={arXiv preprint arXiv:2511.04555},
  year={2025}
}
```

If you encounter any issues or have suggestions,
please open an issue or start a discussion on GitHub.
We sincerely welcome your feedback and contributions.
You can also scan the QR codes below to connect with me or join the WeChat chat group:

