SJTU & Evo-Tech
Architecture Overview
- Open real-world RL on two platforms: we build and release full real-world RL pipelines on SO101 and AgileX (PiPER/PiPER-X).
- Open code, models, and datasets for reproducibility: we continuously release runnable offline RL assets so more people can reproduce results and apply them to real-world tasks.
- Open algorithm and community co-evolution: we reproduce existing real-world RL methods, propose new methods, and keep publishing data/benchmarks to grow a collaborative open-source community.
- [2026-03-07] Added AgileX (PiPER/PiPER-X) support for real-world RL.
- [2026-02-26] First SO101 real-world RL baseline and reproducible CLI workflow are released.
| Getting Started | Training Pipeline | Project Info |
|---|---|---|
| ⚡ Quick Start | 4) Value Function Training | Model & Dataset |
| 1) Installation | 5) Value Inference | Community Channels |
| 2) Hardware Setup | 6) Policy Training | Affiliations |
| 3) Data Collection | 7) Closed-loop Rollout and Next Round | Citation / License |
Value Visual Results
Success Case
Failure Case
Policy Rollout Visual Results
Human-in-the-Loop Visual Results
LeRobot-aligned foundation: we use LeRobot as the base of this codebase because its inference and data-collection logic are highly aligned with real-world RL workflows.
git clone https://github.com/MINT-SJTU/Evo-RL.git cd Evo-RL conda create -y -n evo-rl python=3.10 conda activate evo-rl pip install -e .For setup details and platform-specific dependencies, follow the official LeRobot configuration guide.
For SO-series setup, please follow the official tutorial in detail and complete all installation and configuration steps there before continuing. The examples below use SO101 as the reference configuration.
Recommended path strategy:
- Robot serial: use
/dev/serial/by-id/(stable across reboots). - Cameras: prefer
/dev/v4l/by-id/; if IDs are not unique, use/dev/v4l/by-path/. - In examples below: robot ports use
by-id, camera paths useby-path.
You can inspect available stable paths with:
ls -l /dev/serial/by-id/ ls -l /dev/v4l/by-id/ ls -l /dev/v4l/by-path/For single-arm users, no major changes are required. After setup, run the command below to verify your system is ready for the next stage:
lerobot-teleoperate \ --robot.type=so101_follower \ --robot.port=/dev/serial/by-id/<SO101_FOLLOWER_PORT> \ --robot.id=my_so101_follower \ --teleop.type=so101_leader \ --teleop.port=/dev/serial/by-id/<SO101_LEADER_PORT> \ --teleop.id=my_so101_leaderFor dual-arm users, we recommend mirroring the mechanical parts corresponding to servos 4/5/6 on the left leader and left follower arms, which usually provides a more natural bimanual operation feel.
Before running the dual-arm command, make sure calibration files exist under ~/.cache/huggingface/lerobot/calibration/ like:
calibration/ ├── robots │ └── so_follower │ ├── bi_so101_follower_left.json │ └── bi_so101_follower_right.json └── teleoperators └── so_leader ├── bi_so101_leader_left.json └── bi_so101_leader_right.json This layout is slightly different from single-arm setup.
Then run this command to verify dual-arm setup:
lerobot-teleoperate \ --robot.type=bi_so_follower \ --robot.left_arm_config.port=/dev/serial/by-id/<LEFT_FOLLOWER_PORT> \ --robot.right_arm_config.port=/dev/serial/by-id/<RIGHT_FOLLOWER_PORT> \ --robot.id=bi_so101_follower \ --teleop.type=bi_so_leader \ --teleop.left_arm_config.port=/dev/serial/by-id/<LEFT_LEADER_PORT> \ --teleop.right_arm_config.port=/dev/serial/by-id/<RIGHT_LEADER_PORT> \ --teleop.id=bi_so101_leaderBefore data collection, validate camera mapping first.
Check whether each camera supports your target setting (for example, 640x480 @ 30):
v4l2-ctl -d /dev/v4l/by-path/<CAM_PATH> --list-formats-extSingle-arm camera check (example):
lerobot-teleoperate \ --robot.type=so101_follower \ --robot.port=/dev/serial/by-id/<SO101_FOLLOWER_PORT> \ --robot.id=my_so101_follower \ --robot.cameras='{ front: {type: opencv, index_or_path: "/dev/v4l/by-path/<FRONT_CAM>", width: 640, height: 480, fps: 30}}' \ --teleop.type=so101_leader \ --teleop.port=/dev/serial/by-id/<SO101_LEADER_PORT> \ --teleop.id=my_so101_leader \ --display_data=trueDual-arm camera check (example):
lerobot-teleoperate \ --robot.type=bi_so_follower \ --robot.left_arm_config.port=/dev/serial/by-id/<LEFT_FOLLOWER_PORT> \ --robot.right_arm_config.port=/dev/serial/by-id/<RIGHT_FOLLOWER_PORT> \ --robot.id=my_bi_so101_follower \ --robot.left_arm_config.cameras='{ wrist: {type: opencv, index_or_path: "/dev/v4l/by-path/<LEFT_WRIST_CAM_PATH>", width: 640, height: 480, fps: 30}}' \ --robot.right_arm_config.cameras='{ wrist: {type: opencv, index_or_path: "/dev/v4l/by-path/<RIGHT_WRIST_CAM_PATH>", width: 640, height: 480, fps: 30}, front: {type: opencv, index_or_path: "/dev/v4l/by-path/<FRONT_CAM_PATH>", width: 640, height: 480, fps: 30}}' \ --teleop.type=bi_so_leader \ --teleop.left_arm_config.port=/dev/serial/by-id/<LEFT_LEADER_PORT> \ --teleop.right_arm_config.port=/dev/serial/by-id/<RIGHT_LEADER_PORT> \ --teleop.id=my_bi_so101_leader \ --display_data=trueFor dual-arm camera mapping, it is fine to attach front under either the left-arm or right-arm camera config. If you use more camera views, place them under either the left or right arm camera config as well.
If needed, you can also use temporary device paths (for example /dev/ttyACM* and /dev/video*) during initial debugging.
PiPER arms in master/teaching mode cannot receive external control commands, so all arms must be configured to follower/motion-output mode (0xFC), and firmware must be version 1.8.5 or above.
For PiPER-series robots, make sure Git LFS assets are pulled before running teleoperation:
git lfs pull --include="src/lerobot/assets/piper_description/**,src/lerobot/assets/piper_x_description/**" --exclude="*" git lfs checkout src/lerobot/assets/piper_description src/lerobot/assets/piper_x_descriptionFor PiPER setup, PiPER uses CAN interfaces instead of serial ports. So first run lerobot-setup-can to confirm CAN interfaces are available:
lerobot-setup-can --mode=setup --interfaces=<LEFT_FOLLOWER_CAN_PORT>,<LEFT_LEADER_CAN_PORT>,<RIGHT_FOLLOWER_CAN_PORT>,<RIGHT_LEADER_CAN_PORT>For single-arm users, run the command below to verify the system is ready:
lerobot-teleoperate \ --robot.type=piperx_follower \ --robot.port=<FOLLOWER_CAN_PORT> \ --robot.id=my_piperx_follower \ --robot.require_calibration=false \ --teleop.type=piperx_leader \ --teleop.port=<LEADER_CAN_PORT> \ --teleop.id=my_piperx_leader \ --teleop.require_calibration=falseFor bimanual users, run this command to verify dual-arm teleoperation:
lerobot-teleoperate \ --robot.type=bi_piperx_follower \ --robot.id=my_bi_piperx_follower \ --robot.left_arm_config.port=<LEFT_FOLLOWER_CAN_PORT> \ --robot.right_arm_config.port=<RIGHT_FOLLOWER_CAN_PORT> \ --robot.left_arm_config.require_calibration=false \ --robot.right_arm_config.require_calibration=false \ --teleop.type=bi_piperx_leader \ --teleop.id=my_bi_piperx_leader \ --teleop.left_arm_config.port=<LEFT_LEADER_CAN_PORT> \ --teleop.right_arm_config.port=<RIGHT_LEADER_CAN_PORT> \ --teleop.left_arm_config.require_calibration=false \ --teleop.right_arm_config.require_calibration=falseFor PiPER (non-X), replace bi_piperx_follower/bi_piperx_leader with bi_piper_follower/bi_piper_leader.
Collect rollout data with lerobot-human-inloop-record.
Bimanual template:
lerobot-human-inloop-record \ --robot.type=bi_so_follower \ --robot.left_arm_config.port=/dev/serial/by-id/<LEFT_FOLLOWER_PORT> \ --robot.right_arm_config.port=/dev/serial/by-id/<RIGHT_FOLLOWER_PORT> \ --robot.id=my_bi_so101_follower \ --robot.left_arm_config.cameras='{ wrist: {type: opencv, index_or_path: "/dev/v4l/by-path/<LEFT_WRIST_CAM_PATH>", width: 640, height: 480, fps: 30, fourcc: "MJPG"}}' \ --robot.right_arm_config.cameras='{ wrist: {type: opencv, index_or_path: "/dev/v4l/by-path/<RIGHT_WRIST_CAM_PATH>", width: 640, height: 480, fps: 30, fourcc: "MJPG"}, front: {type: intelrealsense, serial_number_or_name: "<REALSENSE_SN>", width: 640, height: 480, fps: 30, warmup_s: 2}}' \ --teleop.type=bi_so_leader \ --teleop.left_arm_config.port=/dev/serial/by-id/<LEFT_LEADER_PORT> \ --teleop.right_arm_config.port=/dev/serial/by-id/<RIGHT_LEADER_PORT> \ --teleop.id=my_bi_so101_leader \ --dataset.repo_id=<HF_USERNAME_OR_ORG>/<DATASET_NAME> \ --dataset.single_task="<YOUR_TASK_DESCRIPTION>" \ --dataset.num_episodes=<NUM_EPISODES> \ --dataset.episode_time_s=<EPISODE_SECONDS> \ --dataset.reset_time_s=<RESET_SECONDS> \ --dataset.push_to_hub=true \ --display_data=trueRecommendation: use fourcc: "MJPG" for OpenCV and warmup_s for RealSense. In this example front uses RealSense, but you can switch it to OpenCV with the same structure.
Bimanual template (left/right, PiPER-X example):
lerobot-human-inloop-record \ --robot.type=bi_piperx_follower \ --robot.id=my_bi_piperx_follower \ --robot.left_arm_config.port=<LEFT_FOLLOWER_CAN_PORT> \ --robot.right_arm_config.port=<RIGHT_FOLLOWER_CAN_PORT> \ --robot.left_arm_config.require_calibration=false \ --robot.right_arm_config.require_calibration=false \ --teleop.type=bi_piperx_leader \ --teleop.id=my_bi_piperx_leader \ --teleop.left_arm_config.port=<LEFT_LEADER_CAN_PORT> \ --teleop.right_arm_config.port=<RIGHT_LEADER_CAN_PORT> \ --teleop.left_arm_config.require_calibration=false \ --teleop.right_arm_config.require_calibration=false \ --dataset.repo_id=<HF_USERNAME_OR_ORG>/<DATASET_NAME> \ --dataset.single_task="<YOUR_TASK_DESCRIPTION>" \ --dataset.num_episodes=<NUM_EPISODES> \ --dataset.episode_time_s=<EPISODE_SECONDS> \ --dataset.reset_time_s=<RESET_SECONDS> \ --dataset.push_to_hub=true \ --display_data=trueHotkeys:
i: toggle intervention mode (policy <-> teleop takeover)s: mark success and end current episodef: mark failure and end current episodeRight Arrow: end the current loop earlyLeft Arrow: end early and re-record the current episodeEsc: stop the recording session
Quick quality check:
lerobot-dataset-report --dataset <HF_USERNAME_OR_ORG>/<DATASET_NAME>This prints: dataset meta, totals, episode-length stats/histogram, success/intervention metrics, task list, and full feature schema.
Train the value function on the current dataset. Current default: Pi*0.6 (--value.type=pistar06).
Single-GPU template:
lerobot-value-train \ --dataset.repo_id=<HF_USERNAME_OR_ORG>/<DATASET_NAME> \ --value.type=pistar06 \ --value.dtype=bfloat16 \ --value.push_to_hub=true \ --value.repo_id=<HF_USERNAME_OR_ORG>/<VALUE_MODEL_REPO> \ --batch_size=64 \ --output_dir=outputs/value_train/<RUN_NAME> \ --job_name=<RUN_NAME> \ --wandb.enable=trueMulti-GPU template:
CUDA_VISIBLE_DEVICES=<GPU_ID_LIST> accelerate launch \ --multi_gpu \ --num_processes=<NUM_GPUS> \ --mixed_precision=bf16 \ $(which lerobot-value-train) \ --batch_size=32/<NUM_GPUS> \ <VALUE_TRAIN_ARGS>To plug in a different value function, minimal path in this repo:
- Add
src/lerobot/values/<your_value>/configuration_<your_value>.pywith@PreTrainedConfig.register_subclass("<your_value>"). - Add
src/lerobot/values/<your_value>/modeling_<your_value>.pywith<YourValue>Policy(PreTrainedPolicy)(implement at leastforward,predict_value, andbuild_training_raw_batch_hookforlerobot-value-train). - Add
src/lerobot/values/<your_value>/processor_<your_value>.pywithmake_<your_value>_pre_post_processors(...). - Remove/replace the current
pistar06-only type checks insrc/lerobot/configs/value_train.pyandsrc/lerobot/scripts/lerobot_value_infer.py.
Infer value signals and write value/advantage/indicator back to the dataset:
value: estimated return-to-go of the current frame.advantage: relative improvement signal (higher means better-than-baseline trajectory quality).indicator: binarized training tag derived from advantage.
Single-GPU template:
lerobot-value-infer \ --dataset.repo_id=<HF_USERNAME_OR_ORG>/<DATASET_NAME> \ --inference.checkpoint_path=outputs/value_train/<RUN_NAME> \ --runtime.device=cuda \ --runtime.batch_size=64 \ --acp.enable=true \ --acp.n_step=50 \ --acp.positive_ratio=0.3 \ --acp.value_field=complementary_info.value_<TAG> \ --acp.advantage_field=complementary_info.advantage_<TAG> \ --acp.indicator_field=complementary_info.acp_indicator_<TAG> \ --output_dir=outputs/value_infer/<RUN_NAME> \ --job_name=<RUN_NAME>.inferMulti-GPU template:
CUDA_VISIBLE_DEVICES=<GPU_ID_LIST> accelerate launch \ --multi_gpu \ --num_processes=<NUM_GPUS> \ --mixed_precision=bf16 \ $(which lerobot-value-infer) \ <VALUE_INFER_ARGS>Parameter notes:
--acp.n_step: n-step advantage horizon. --acp.positive_ratio: positive label ratio after advantage binarization (e.g., 0.3 = top 30% per task).Expected new columns:
complementary_info.value_<TAG> complementary_info.advantage_<TAG> complementary_info.acp_indicator_<TAG>These columns are written back to the original dataset specified by --dataset.repo_id.
Train the policy with advantage-conditioned tags. Policy requirement: it must support text/task input, because Advantage-Conditioned tags are injected into task text.
Single-GPU template:
lerobot-train \ --dataset.repo_id=<HF_USERNAME_OR_ORG>/<DATASET_NAME> \ --policy.type=<POLICY_TYPE> \ --policy.pretrained_path=<POLICY_PRETRAINED_PATH> \ --policy.device=cuda \ --policy.dtype=bfloat16 \ --batch_size=32 \ --steps=30000 \ --acp.enable=true \ --acp.indicator_field=complementary_info.acp_indicator_<TAG> \ --acp.indicator_dropout_prob=0.3 \ --output_dir=outputs/train/<RUN_NAME> \ --job_name=<RUN_NAME> \ --wandb.enable=true \ --policy.push_to_hub=true \ --policy.repo_id=<HF_USERNAME_OR_ORG>/<POLICY_REPO>--acp.indicator_dropout_prob controls tag drop rate in task text; 0.3 helps learn both tagged and untagged conditions.
Important checks:
--acp.indicator_fieldmust exist in the dataset and be binary (0/1).
Multi-GPU template:
CUDA_VISIBLE_DEVICES=<GPU_ID_LIST> accelerate launch \ --multi_gpu \ --num_processes=<NUM_GPUS> \ --mixed_precision=bf16 \ $(which lerobot-train) \ --batch_size=32/<NUM_GPUS> \ <POLICY_TRAIN_ARGS>Deploy the trained policy in human-in-loop mode and collect the next dataset round:
lerobot-human-inloop-record \ --robot.type=bi_so_follower \ --robot.left_arm_config.port=/dev/serial/by-id/<LEFT_FOLLOWER_PORT> \ --robot.right_arm_config.port=/dev/serial/by-id/<RIGHT_FOLLOWER_PORT> \ --robot.id=my_bi_so101_follower \ --robot.left_arm_config.cameras='{ wrist: {type: opencv, index_or_path: "/dev/v4l/by-path/<LEFT_WRIST_CAM_PATH>", width: 640, height: 480, fps: 30, fourcc: "MJPG"}}' \ --robot.right_arm_config.cameras='{ wrist: {type: opencv, index_or_path: "/dev/v4l/by-path/<RIGHT_WRIST_CAM_PATH>", width: 640, height: 480, fps: 30, fourcc: "MJPG"}, front: {type: intelrealsense, serial_number_or_name: "<REALSENSE_SN>", width: 640, height: 480, fps: 30, warmup_s: 2}}' \ --teleop.type=bi_so_leader \ --teleop.left_arm_config.port=/dev/serial/by-id/<LEFT_LEADER_PORT> \ --teleop.right_arm_config.port=/dev/serial/by-id/<RIGHT_LEADER_PORT> \ --teleop.id=my_bi_so101_leader \ --dataset.repo_id=<HF_USERNAME_OR_ORG>/<DATASET_NAME_NEXT_ROUND> \ --dataset.single_task="<YOUR_TASK_DESCRIPTION>" \ --dataset.num_episodes=<NUM_EPISODES> \ --dataset.episode_time_s=<EPISODE_SECONDS> \ --dataset.reset_time_s=<RESET_SECONDS> \ --dataset.push_to_hub=true \ --display_data=true \ --policy.path=<POLICY_CHECKPOINT_OR_HUB_ID> \ --resume=trueDataset continuation options:
- Append in place: keep
--resume=trueand continue recording into the same dataset. - Merge multiple rounds: use the official dataset editor to merge separate datasets.
lerobot-edit-dataset \ --repo_id=<HF_USERNAME_OR_ORG>/<MERGED_DATASET_NAME> \ --operation.type=merge \ --operation.repo_ids="['<HF_USERNAME_OR_ORG>/<DATASET_ROUND_1>','<HF_USERNAME_OR_ORG>/<DATASET_ROUND_2>']"Additional data attributes vs default lerobot-record behavior:
complementary_info.policy_action: policy output action at each step.complementary_info.is_intervention: whether current step is in intervention.complementary_info.state: intervention state-machine state.complementary_info.collector_policy_id: step-level action source ID (humanor policy ID).- Episode metadata
episode_success: success/failure label saved per episode.
Iterative training loop (abstract):
[Multi-task demonstration data pool] | v [Offline RL pretraining for a vision-language-action policy] | v [Task-specific initialization / fine-tuning from demonstrations] | v |---- Iteration k = 1..K -------------------------------------| | 1) Deploy current policy π_k and collect new rollout data | | 2) Merge into data pool: D <- D U new_data | | 3) Train value function on D | | 4) Infer advantage and binarize into indicator tags | | 5) Train advantage-conditioned policy to get π_{k+1} | |-------------------------------------------------------------| | v [Stronger policy with improved success rate and throughput] - Hugging Face model release: coming soon
- Hugging Face dataset release: coming soon
- Once published, this section will pin canonical repos and exact version tags.
- WeChat official post: Coming Soon
- Documentation:
docs/README.md - GitHub Issues: Create an issue
- Email: business@evomind-tech.com
- WeChat group QR code:
@misc{evorl2026, title = {Evo-RL: Towards Iterative Policy Improvement in Real-World Offline RL}, author = {Evo-RL Contributors}, year = {2026}, howpublished = {\url{https://github.com/MINT-SJTU/Evo-RL}} }Apache-2.0. See LICENSE.









