Fork of NVlabs/FoundationPose adapted for the SimToolReal perception pipeline. This repo provides three user-facing scripts for recording RGB-D videos, extracting 6D object poses, and running real-time pose tracking with ROS.
```bash
conda create -n foundationpose python=3.9 -y
conda activate foundationpose
```

CUDA 11.8 requires GCC <= 11. Install both the CUDA toolkit and a compatible GCC via conda:
```bash
# CUDA 11.8 toolkit (must match PyTorch CUDA version)
conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit -y
# GCC 11 (required by CUDA 11.8 nvcc)
conda install -c conda-forge gcc_linux-64=11 gxx_linux-64=11 -y
```

Then install the Python dependencies:

```bash
pip install -r requirements.txt
```

PyTorch3D must be compiled from source. Set `CUDA_HOME`, `CC`, and `CXX` to use the conda-installed toolchain:
```bash
export CUDA_HOME=$CONDA_PREFIX
export CC=$CONDA_PREFIX/bin/x86_64-conda-linux-gnu-gcc
export CXX=$CONDA_PREFIX/bin/x86_64-conda-linux-gnu-g++
pip install --no-build-isolation "git+https://github.com/facebookresearch/pytorch3d.git"
```

This takes several minutes to compile CUDA kernels.
Install nvdiffrast the same way:

```bash
export CUDA_HOME=$CONDA_PREFIX
export CC=$CONDA_PREFIX/bin/x86_64-conda-linux-gnu-gcc
export CXX=$CONDA_PREFIX/bin/x86_64-conda-linux-gnu-g++
pip install --no-build-isolation git+https://github.com/NVlabs/nvdiffrast
```

Then build the C++ extensions:

```bash
bash build_all_conda.sh
```

This builds mycpp (pose clustering used by FoundationPose).
Download the FoundationPose pretrained weights from the original repo and place them in weights/:
```
weights/
├── 2023-10-28-18-33-37/   # Scorer model
└── 2024-01-11-20-02-45/   # Refiner model
```

Required for `record_video.py` and `live_tracking_with_ros.py`:
- Install the ZED SDK from stereolabs.com
- Then install the Python API:

```bash
pip install pyzed
```
Required for live_tracking_with_ros.py. Install via RoboStack into the same conda environment:
```bash
conda config --env --add channels robostack-staging
conda config --env --add channels conda-forge
conda config --env --set channel_priority flexible
conda install ros-noetic-desktop
conda install ros-noetic-geometry-msgs ros-noetic-std-msgs
```

After installation, ROS is automatically sourced when you activate the conda environment.
Verify the installation:

```bash
cd /path/to/FoundationPose
python -c "from Utils import *; from estimater import *; from generate_mask import generate_binary_mask_box; import mycpp; print('All imports OK')"
```

Record an RGB-D video from a ZED stereo camera.
```bash
python record_video.py \
    --save_dir recordings/ \
    --serial_number 15107 \
    --fps 30
```

Press Ctrl+C to stop recording. Output directory structure:
```
recordings/<timestamp>/
├── rgb/        # RGB frames as PNGs
├── depth/      # Depth frames as 16-bit PNGs (mm)
├── cam_K.txt   # 3x3 camera intrinsics
└── rgb.mp4     # RGB video
```

Options: `--width` (960), `--height` (540), `--exposure` (25), `--gain` (40), `--camera_upsidedown`
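For offline processing of a recording, a pixel can be lifted to a 3D point in the camera frame using the intrinsics from `cam_K.txt` and the millimeter depth values. A minimal sketch — the `K` matrix and depth value below are made-up placeholders, not an actual ZED calibration:

```python
import numpy as np

# Placeholder stand-ins for cam_K.txt and a pixel from a depth/ frame;
# the real files are produced by record_video.py.
K = np.array([[700.0,   0.0, 480.0],
              [  0.0, 700.0, 270.0],
              [  0.0,   0.0,   1.0]])  # 3x3 intrinsics, as in cam_K.txt
depth_mm = 1500  # 16-bit depth PNG value at pixel (u, v), in millimeters

def backproject(u, v, depth_mm, K):
    """Lift a pixel + depth (mm) to a 3D point (meters) in the camera frame."""
    z = depth_mm / 1000.0                 # depth PNGs store millimeters
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.array([x, y, z])

p = backproject(480, 270, depth_mm, K)  # principal point -> x = y = 0
```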
Extract 6D object poses from a recorded RGB-D video and an object mesh.
```bash
python extract_poses.py \
    --video_dir recordings/<timestamp>/ \
    --mesh_path /path/to/object.obj \
    --calibration calibration/T_RC_example.txt \
    --output_path poses.json
```

On the first frame, an interactive window opens where you click 4 corners of a bounding box around the object for SAM-based segmentation. FoundationPose then tracks the object through all frames. To skip interactive selection and use a pre-existing mask, pass `--mask_path <mask.png>`.
Output format (poses.json):
```json
{
  "poses_cam":   [ [x, y, z, qx, qy, qz, qw], ... ],
  "poses_robot": [ [x, y, z, qx, qy, qz, qw], ... ]
}
```

Options: `--mask_path <mask.png>` (skip interactive SAM), `--est_refine_iter` (5), `--track_refine_iter` (2), `--debug` (0/1/2)
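For downstream use, each `[x, y, z, qx, qy, qz, qw]` entry can be converted to a 4x4 homogeneous matrix. A minimal sketch, assuming unit quaternions in `xyzw` order as the format above suggests:

```python
import numpy as np

def pose7_to_matrix(pose):
    """Convert [x, y, z, qx, qy, qz, qw] (as in poses.json) to a 4x4 matrix."""
    x, y, z, qx, qy, qz, qw = pose
    # Unit quaternion (xyzw) -> rotation matrix
    R = np.array([
        [1 - 2*(qy*qy + qz*qz), 2*(qx*qy - qz*qw),     2*(qx*qz + qy*qw)],
        [2*(qx*qy + qz*qw),     1 - 2*(qx*qx + qz*qz), 2*(qy*qz - qx*qw)],
        [2*(qx*qz - qy*qw),     2*(qy*qz + qx*qw),     1 - 2*(qx*qx + qy*qy)],
    ])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = [x, y, z]
    return T

# Usage with a poses.json file:
# import json
# poses = json.load(open("poses.json"))
# T_cam = pose7_to_matrix(poses["poses_cam"][0])
```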
Run real-time 6D pose tracking from a ZED camera and publish poses to ROS topics.
```bash
python live_tracking_with_ros.py \
    --mesh_path /path/to/object.obj \
    --calibration calibration/T_RC_example.txt
```

Published ROS topics:
- `camera_frame/current_object_pose` (`PoseStamped`) -- pose in camera frame
- `robot_frame/current_object_pose` (`PoseStamped`) -- pose in robot frame (via `T_RC`)
Options: `--serial_number` (15107), `--fps` (40), `--camera_upsidedown`, `--width` (960), `--height` (540), `--debug` (0/1)
Camera intrinsics are automatically read from the ZED SDK for live scripts (record_video.py, live_tracking_with_ros.py). For offline processing (extract_poses.py), they are loaded from cam_K.txt in the video directory.
The `--calibration` argument accepts a 4x4 homogeneous transform `T_RC` (robot-from-camera) as a `.npy` or `.txt` file. This transform converts poses from camera frame to robot frame: `pose_robot = T_RC @ pose_cam`.
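A minimal sketch of that conversion. The calibration and pose values below are made-up placeholders; a real `T_RC` would be loaded from your calibration file (e.g. with `np.loadtxt` for the `.txt` form):

```python
import numpy as np

# Hypothetical robot-from-camera calibration: camera mounted 0.5 m
# in front of the robot base, axes aligned. In practice:
#   T_RC = np.loadtxt("calibration/T_RC_example.txt")
T_RC = np.eye(4)
T_RC[:3, 3] = [0.5, 0.0, 0.0]

# Hypothetical object pose in the camera frame (4x4 homogeneous matrix).
pose_cam = np.eye(4)
pose_cam[:3, 3] = [0.0, 0.0, 1.0]  # 1 m in front of the camera

# Camera frame -> robot frame, exactly as described above.
pose_robot = T_RC @ pose_cam
```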
An example calibration file for our lab setup is provided at calibration/T_RC_example.txt. You must replace this with your own calibration for your camera/robot setup.
Object meshes should be .obj files with units in meters. The mesh origin defines the object coordinate frame for the estimated poses.
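A common failure mode is a mesh exported in millimeters instead of meters; checking the bounding-box extent of the vertices catches this quickly. A minimal sketch — the `mesh_extent_m` helper is illustrative, not part of this repo:

```python
import numpy as np

def mesh_extent_m(obj_path):
    """Return the axis-aligned bounding-box extent of an .obj mesh's vertices.
    A hand-sized object in meters should come out around 0.05-0.5 per axis;
    values in the hundreds suggest the mesh was exported in millimeters."""
    verts = []
    with open(obj_path) as f:
        for line in f:
            if line.startswith("v "):  # vertex line: "v x y z"
                verts.append([float(c) for c in line.split()[1:4]])
    verts = np.array(verts)
    return verts.max(axis=0) - verts.min(axis=0)
```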
- `unsupported GNU version! gcc versions later than 11 are not supported`: Make sure you installed GCC 11 via conda (step 2) and set the `CC`/`CXX` env vars before compiling PyTorch3D/nvdiffrast.
- `The detected CUDA version (12.x) mismatches the version that was used to compile PyTorch (11.8)`: Set `CUDA_HOME=$CONDA_PREFIX` so the build uses the conda-installed CUDA 11.8 toolkit, not the system CUDA.
- `ModuleNotFoundError: No module named 'torch'` during PyTorch3D build: Use the `--no-build-isolation` flag with pip.
- `Disabling PyTorch because PyTorch >= 2.1 is required`: This is a cosmetic warning from `transformers`. SAM mask generation still works correctly with PyTorch 2.0.