This is the official PyTorch implementation of LR2PPO. The ECCV 2024 paper is available on arXiv.
Introduction video: YouTube
- Download dataset: Hugging Face Hub
- Optional: original MovieNet dataset from the official website
- Pre-processed datasets (`datasets_trad`) available: Google Drive
- Optional preparation:
  - Follow the dataset generation guide: `datasets_trad/README.md`
  - Access the source datasets:
• MSLR-Web10K: Microsoft Research
• MQ2008: LETOR 4.0
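MSLR-Web10K and MQ2008 distribute their raw files in the LETOR text format: each line holds a relevance label, a `qid:` field and `index:value` feature pairs, and MQ2008 lines also end with a `#` comment. The minimal sketch below groups such lines by query for quick inspection. The helpers `parse_letor_line` and `group_by_query` are hypothetical; this is not the repository's dataset-generation code, which is described in `datasets_trad/README.md`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One parsed row: (relevance label, {feature index: feature value})
Row = Tuple[int, Dict[int, float]]


def parse_letor_line(line: str) -> Tuple[str, Row]:
    """Parse '<label> qid:<id> <idx>:<val> ... [# comment]' into (qid, row)."""
    body = line.split("#", 1)[0].split()  # drop the optional trailing comment
    label = int(body[0])
    qid = body[1].split(":", 1)[1]
    features = {}
    for token in body[2:]:
        idx, val = token.split(":", 1)
        features[int(idx)] = float(val)
    return qid, (label, features)


def group_by_query(lines: Iterable[str]) -> Dict[str, List[Row]]:
    """Collect parsed rows per query id, skipping blank lines."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for line in lines:
        if line.strip():
            qid, row = parse_letor_line(line)
            groups[qid].append(row)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "2 qid:10 1:0.5 2:0.1",
        "0 qid:10 1:0.0 2:0.3 #docid = GX000-00-0000000",
        "1 qid:11 1:0.7 2:0.9",
    ]
    for qid, rows in group_by_query(sample).items():
        print(qid, [label for label, _ in rows])
```

Grouping by `qid` mirrors how ranking data is consumed: labels are only comparable among documents of the same query.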
Download the required pre-trained weights for both benchmarks: `roberta_base_en_model` and `vit_base_patch16_224_model`.
- Source: Google Drive or the models' official repositories
- Save in: `./pretrained_models/`
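Before training, it can help to confirm that both weights landed in `./pretrained_models/`. The sketch below is a hypothetical pre-flight check (`missing_weights` is not part of the repository). Because the README does not specify file names or extensions inside the directory, it assumes that any entry whose name starts with the weight name counts as present.

```python
from pathlib import Path
from typing import List

# Weight names listed in this README. Exact file names/extensions are an
# assumption: any directory entry starting with the name is treated as a match.
REQUIRED_WEIGHTS = ["roberta_base_en_model", "vit_base_patch16_224_model"]


def missing_weights(root: str = "./pretrained_models/") -> List[str]:
    """Return the required weight names that have no matching entry under root."""
    directory = Path(root)
    present = [p.name for p in directory.iterdir()] if directory.is_dir() else []
    return [w for w in REQUIRED_WEIGHTS if not any(n.startswith(w) for n in present)]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing pre-trained weights:", ", ".join(missing))
    else:
        print("All pre-trained weights found.")
```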
```bash
pip3 install -r requirements.txt
```

Hardware requirement: 4 GPUs.
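A quick way to check the 4-GPU requirement on a machine is to count the CUDA devices PyTorch can see. This is a minimal sketch with hypothetical helper names (`visible_gpu_count`, `has_enough_gpus`); it only uses `torch.cuda.device_count()` and reports 0 when PyTorch is not installed.

```python
from typing import Optional

REQUIRED_GPUS = 4  # hardware requirement stated in this README


def visible_gpu_count() -> int:
    """Count CUDA devices visible to PyTorch; 0 if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count()


def has_enough_gpus(required: int = REQUIRED_GPUS, available: Optional[int] = None) -> bool:
    """True if at least `required` GPUs are available (queried when not given)."""
    if available is None:
        available = visible_gpu_count()
    return available >= required


if __name__ == "__main__":
    n = visible_gpu_count()
    status = "OK" if has_enough_gpus(available=n) else "insufficient"
    print(f"{n} GPU(s) visible, {REQUIRED_GPUS} required: {status}")
```

Note that `CUDA_VISIBLE_DEVICES` limits which devices PyTorch reports, so the count reflects what the training scripts would actually see.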
Multimodal benchmark:

```bash
# Stage 1: Base Model
sh pointwise.sh <your_stage1>
# Stage 2: Reward Model
sh reward_pair_dataloader.sh <your_stage2>
# Stage 3: LR2PPO
sh ppo.sh <your_stage3>
# Evaluation
sh ppo_eval.sh <your_eval>
```

Traditional benchmarks (MSLR-Web10K, MQ2008):

```bash
# Stage 1: Base Model
sh pointwise_trad.sh <your_stage1>
# Stage 2: Reward Model
sh reward_trad.sh <your_stage2>
# Stage 3: LR2PPO
sh ppo_trad.sh <your_stage3>
# Evaluation
sh ppo_eval_trad.sh <your_eval>
```

- Download: Google Drive
- Download: Google Drive
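The training commands listed for each benchmark run as a fixed sequence (base model, reward model, LR2PPO, evaluation). The sketch below wraps that sequence in a small driver, assuming each stage needs the previous one to finish successfully. Only the script names come from this README; the benchmark keys (`multimodal`, `trad`) and helper names are hypothetical, and what each `<your_stageN>` argument should contain is defined by the scripts themselves.

```python
import subprocess
from typing import Dict, List

# Stage order and script names as listed in this README.
STAGES = ["stage1", "stage2", "stage3", "eval"]
SCRIPTS = {
    "multimodal": {"stage1": "pointwise.sh", "stage2": "reward_pair_dataloader.sh",
                   "stage3": "ppo.sh", "eval": "ppo_eval.sh"},
    "trad": {"stage1": "pointwise_trad.sh", "stage2": "reward_trad.sh",
             "stage3": "ppo_trad.sh", "eval": "ppo_eval_trad.sh"},
}


def build_commands(benchmark: str, args: Dict[str, str]) -> List[List[str]]:
    """Build one `sh <script> <arg>` command per stage, in order."""
    scripts = SCRIPTS[benchmark]
    return [["sh", scripts[stage], args[stage]] for stage in STAGES]


def run_pipeline(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; unless dry_run, execute it and stop at the first failure."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    # Placeholder arguments copied from the README; replace them before a real run.
    placeholder_args = {"stage1": "<your_stage1>", "stage2": "<your_stage2>",
                        "stage3": "<your_stage3>", "eval": "<your_eval>"}
    run_pipeline(build_commands("trad", placeholder_args), dry_run=True)
```

Defaulting to `dry_run=True` prints the exact sequence first, which makes it easy to check the arguments before committing 4 GPUs to a long run.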
See LICENSE for details.
Code components borrowed from:
- TencentPretrain
- PaLM-rlhf-pytorch
- benchmarks (Transfer Task)
We are grateful for these excellent works and repositories.
If you find our work helpful in your research, please consider citing it:
```bibtex
@inproceedings{guo2024multimodal,
  title={Multimodal Label Relevance Ranking via Reinforcement Learning},
  author={Guo, Taian and Zhang, Taolin and Wu, Haoqian and Li, Hanjun and Qiao, Ruizhi and Sun, Xing},
  booktitle={European Conference on Computer Vision},
  pages={391--408},
  year={2024},
  organization={Springer}
}
```