Multimodal Label Relevance Ranking via Reinforcement Learning (ECCV2024)

This is the official PyTorch implementation of LR²PPO. The ECCV2024 paper is available at arXiv.
Introduction video: YouTube

Getting Started

Data Preparation

For LRMovieNet Benchmark

Download dataset: HuggingFace Hub
Optional: Original MovieNet dataset Official Website

For MSLR-Web10K → MQ2008 Transfer Task

Pre-processed datasets (datasets_trad) available: Google Drive
Optional preparation:
- Follow dataset generation guide: datasets_trad/README.md
- Access source datasets:
  • MSLR-Web10K: Microsoft Research
  • MQ2008: LETOR 4.0

Initialization Weights

Download required weights for both benchmarks:

roberta_base_en_model and vit_base_patch16_224_model
Source: Google Drive or the models' official repositories
Save in: ./pretrained_models/
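
Before training, it can help to confirm that both weights are where the scripts expect them. The following is a minimal sketch, not part of the repository: the helper name `missing_weights` is hypothetical, and because the README does not say whether each weight is a single file or a directory (or whether it carries an extension), the check only looks for any entry under `./pretrained_models/` whose name starts with the expected model name.

```python
from pathlib import Path

# Model names are taken from the README. Their exact on-disk form
# (file vs. directory, extension) is an assumption left open here.
EXPECTED_WEIGHTS = ("roberta_base_en_model", "vit_base_patch16_224_model")


def missing_weights(root="./pretrained_models", expected=EXPECTED_WEIGHTS):
    """Return the expected weight names with no matching entry under root."""
    root_path = Path(root)
    if not root_path.is_dir():
        # Nothing can be present if the directory itself is absent.
        return list(expected)
    present = [entry.name for entry in root_path.iterdir()]
    return [
        name for name in expected
        if not any(entry.startswith(name) for entry in present)
    ]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing under ./pretrained_models/:", ", ".join(missing))
    else:
        print("All initialization weights found.")
```

Running the script from the repository root prints which of the two names, if any, still need to be downloaded.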

Prerequisites

pip3 install -r requirements.txt

Hardware Requirement: 4 GPUs
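
A quick way to compare the machine against the hardware requirement is to count the devices reported by `nvidia-smi -L`, which prints one `GPU <index>: ...` line per device. This is a sketch under assumptions: it presumes NVIDIA GPUs with `nvidia-smi` on the `PATH`, the function names are hypothetical, and the README does not say whether the scripts can be adapted to fewer GPUs.

```python
import shutil
import subprocess

REQUIRED_GPUS = 4  # from the README's hardware requirement


def count_gpus_from_listing(listing):
    """Count devices in `nvidia-smi -L` style output (one 'GPU <n>: ...' line each)."""
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))


def visible_gpu_count():
    """Return the GPU count reported by nvidia-smi, or 0 if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return 0
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        return 0
    return count_gpus_from_listing(result.stdout)


if __name__ == "__main__":
    found = visible_gpu_count()
    if found < REQUIRED_GPUS:
        print(f"Found {found} GPU(s); the README lists {REQUIRED_GPUS}.")
    else:
        print(f"Found {found} GPU(s).")
```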

Usage Instructions

For LRMovieNet Benchmark

# Stage 1: Base Model
sh pointwise.sh <your_stage1>
# Stage 2: Reward Model
sh reward_pair_dataloader.sh <your_stage2>
# Stage 3: LR²PPO
sh ppo.sh <your_stage3>
# Evaluation
sh ppo_eval.sh <your_eval>
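
The LRMovieNet stages above can also be chained from one driver that stops as soon as a stage fails. This is a minimal sketch, assuming the stages are meant to run in the listed order and that each script takes a single argument. The script names come from this README, while `build_commands`, `run_pipeline`, and the `dry_run` flag are hypothetical helpers, and the `<your_...>` arguments remain placeholders to fill in.

```python
import subprocess

# Script names come from the README; arguments are supplied by the caller.
LRMOVIENET_STAGES = [
    ("Stage 1: Base Model", "pointwise.sh"),
    ("Stage 2: Reward Model", "reward_pair_dataloader.sh"),
    ("Stage 3: LR2PPO", "ppo.sh"),
    ("Evaluation", "ppo_eval.sh"),
]


def build_commands(stages, run_args):
    """Pair each stage script with its argument, preserving stage order."""
    if len(stages) != len(run_args):
        raise ValueError("one argument is needed per stage")
    return [
        (label, ["sh", script, arg])
        for (label, script), arg in zip(stages, run_args)
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; unless dry_run, execute it and stop on failure."""
    for label, cmd in commands:
        print(f"[{label}] {' '.join(cmd)}")
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later stages never run after a failed one.
            subprocess.run(cmd, check=True)
```

Calling `run_pipeline(build_commands(LRMOVIENET_STAGES, [...]))` with four arguments prints the commands without running them; pass `dry_run=False` to execute them from the repository root.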

For MSLR-Web10K → MQ2008 Transfer Task

# Stage 1: Base Model
sh pointwise_trad.sh <your_stage1>
# Stage 2: Reward Model
sh reward_trad.sh <your_stage2>
# Stage 3: LR²PPO
sh ppo_trad.sh <your_stage3>
# Evaluation
sh ppo_eval_trad.sh <your_eval>

Model Checkpoints

LRMovieNet Benchmark

Download: Google Drive

MSLR-Web10K → MQ2008 Transfer

Download: Google Drive

License

See LICENSE for details.

Acknowledgments

Code components borrowed from:

We are grateful for these excellent works and repositories.

Citation

If you find our work helpful in your research, please consider citing it.

@inproceedings{guo2024multimodal,
  title={Multimodal Label Relevance Ranking via Reinforcement Learning},
  author={Guo, Taian and Zhang, Taolin and Wu, Haoqian and Li, Hanjun and Qiao, Ruizhi and Sun, Xing},
  booktitle={European Conference on Computer Vision},
  pages={391--408},
  year={2024},
  organization={Springer}
}

