HDMI is a framework that enables humanoid robots to acquire diverse whole-body interaction skills directly from monocular RGB videos of human demonstrations. This repository contains the official training code for HDMI: Learning Interactive Humanoid Whole-Body Control from Human Videos.
Set up the environment, then install IsaacSim, IsaacLab, and HDMI:
```bash
# 1) Conda env
conda create -n hdmi python=3.10 -y
conda activate hdmi

# 2) IsaacSim
pip install "isaacsim[all,extscache]==4.5.0" --extra-index-url https://pypi.nvidia.com isaacsim

# test
isaacsim

# 3) IsaacLab
cd ..
git clone git@github.com:isaac-sim/IsaacLab.git
cd IsaacLab
git checkout v2.2.0
./isaaclab.sh -i none

# 4) HDMI
cd ..
git clone https://github.com/LeCAR-Lab/HDMI
cd HDMI
pip install -e .
```

This codebase is designed to be a flexible, high-performance RL framework for Isaac Sim, built from composable MDP components, modular RL algorithms, and Hydra-driven configs. It relies on tensordict/torchrl for efficient data flow.
- `active_adaptation/envs/` — unified base env with composable, modular MDP components: Documentation →
- `active_adaptation/learning/` — single-file PPO implementations: Documentation →
- `scripts/` — training, evaluation, and visualization entry points: Documentation →
- `cfg/` — Hydra configs for tasks, algorithms, and app launch settings
- `data/` — motion assets and samples referenced by configs
HDMI-specific code is primarily in `active_adaptation/envs/mdp/commands/hdmi/` (commands, observations, rewards) and `active_adaptation/learning/ppo_roa.py` (PPO with residual action distillation).
The training scripts load motion data from `motion.npz` (see `active_adaptation/utils/motion.py`). The expected data format is:

- Body states: `pos`, `quat`, `lin_vel`, `ang_vel` → `[T, B, 3/4]`
- Joint states: `pos`, `vel` → `[T, J]`

`T` = time steps, `B` = bodies (including appended objects), `J` = joints. Body/joint ordering is defined in the accompanying `meta.json`.
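As a sanity check, the shapes above can be sketched with NumPy. This is a minimal, hypothetical example: the array sizes and the npz key names (`body_pos`, `joint_pos`, etc.) are assumptions for illustration, so check `active_adaptation/utils/motion.py` for the exact keys your checkout expects.

```python
import numpy as np

T, B, J = 100, 25, 29  # time steps, bodies, joints (example sizes)

# Hypothetical key names -- see active_adaptation/utils/motion.py for the real ones.
data = {
    "body_pos":     np.zeros((T, B, 3), dtype=np.float32),
    "body_quat":    np.tile(np.array([1, 0, 0, 0], dtype=np.float32), (T, B, 1)),
    "body_lin_vel": np.zeros((T, B, 3), dtype=np.float32),
    "body_ang_vel": np.zeros((T, B, 3), dtype=np.float32),
    "joint_pos":    np.zeros((T, J), dtype=np.float32),
    "joint_vel":    np.zeros((T, J), dtype=np.float32),
}
np.savez("motion.npz", **data)

# Round-trip check: shapes follow the [T, B, 3/4] and [T, J] layout above.
loaded = np.load("motion.npz")
assert loaded["body_quat"].shape == (T, B, 4)
assert loaded["joint_pos"].shape == (T, J)
```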
To turn HOI/video data into this format:

- Convert human motion to robot motion via GVHMR → GMR/LocoMujoco to obtain robot body/joint states.
- Extract the object trajectory (position, orientation, velocities).
- Append the object name to `meta.json`, then concatenate the object body states (`pos`, `quat`, `lin_vel`, `ang_vel`) to the robot body states so shapes become `[T, B_robot + B_object, 3/4]`.
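The concatenation in the last step can be sketched as follows. Variable names and sizes are illustrative assumptions, and the zero arrays stand in for the actual retargeted robot states and extracted object trajectory; the same pattern applies to `lin_vel` and `ang_vel`.

```python
import numpy as np

T, B_robot = 100, 25  # example sizes
B_object = 1          # one rigid object appended after the robot bodies

# Placeholder robot body states from the retargeting pipeline (GVHMR -> GMR/LocoMujoco).
robot_pos  = np.zeros((T, B_robot, 3), dtype=np.float32)
robot_quat = np.tile(np.array([1, 0, 0, 0], dtype=np.float32), (T, B_robot, 1))

# Placeholder extracted object trajectory: [T, 3] position, [T, 4] orientation.
obj_pos  = np.zeros((T, 3), dtype=np.float32)
obj_quat = np.tile(np.array([1, 0, 0, 0], dtype=np.float32), (T, 1))

# Concatenate along the body axis so shapes become [T, B_robot + B_object, 3/4];
# repeat for lin_vel / ang_vel, and add the object name to meta.json.
pos  = np.concatenate([robot_pos,  obj_pos[:, None, :]],  axis=1)
quat = np.concatenate([robot_quat, obj_quat[:, None, :]], axis=1)
assert pos.shape  == (T, B_robot + B_object, 3)
assert quat.shape == (T, B_robot + B_object, 4)
```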
Visualize motions in Isaac Sim with `+task.command.replay_motion=true`:

```bash
python scripts/train.py algo=ppo_roa_train task=G1/hdmi/move_suitcase +task.command.replay_motion=true
```

Or visualize a `motion.npz` in MuJoCo:
```bash
# one terminal
python scripts/vis/mujoco_mocap_viewer.py

# another terminal
python scripts/vis/motion_data_publisher.py <path-to-motion-folder>
```

Teacher policy
```bash
# train teacher
python scripts/train.py algo=ppo_roa_train task=G1/hdmi/move_suitcase

# evaluate teacher
python scripts/play.py algo=ppo_roa_train task=G1/hdmi/move_suitcase checkpoint_path=run:<teacher-wandb_run_path>
```

Student policy
```bash
# train student
python scripts/train.py algo=ppo_roa_finetune task=G1/hdmi/move_suitcase checkpoint_path=run:<teacher-wandb_run_path>

# evaluate student
python scripts/play.py algo=ppo_roa_finetune task=G1/hdmi/move_suitcase checkpoint_path=run:<student-wandb_run_path>
```

To export trained policies, add `export_policy=true` to the play command.
Please see github.com/EGalahad/sim2real for details.
If you find our work useful for your research, please consider citing us:
```bibtex
@misc{weng2025hdmilearninginteractivehumanoid,
  title={HDMI: Learning Interactive Humanoid Whole-Body Control from Human Videos},
  author={Haoyang Weng and Yitang Li and Nikhil Sobanbabu and Zihan Wang and Zhengyi Luo and Tairan He and Deva Ramanan and Guanya Shi},
  year={2025},
  eprint={2509.16757},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2509.16757},
}
```