# MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models via Reinforcement Learning
## Setup

Run the setup script to configure the environment:

```bash
bash setup.sh
```

This script will:
- Create the conda environment `medvlm-r1`
- Install the necessary dependencies
- Configure the open-r1-multimodal framework
## Demo

Use the Jupyter notebook to quickly try out the model:

```bash
jupyter notebook demo.ipynb
```

The demo includes:
- Model loading
- Medical image VQA examples
- An inference process demonstration
## Example Output

The model generates a structured reasoning process:

```
<think> The image is a magnetic resonance imaging (MRI) scan of a knee joint. The scan shows a chondral abnormality, which is a type of cartilage damage. This is evident from the irregular shape and the presence of a defect in the cartilage. </think> <answer>A</answer>
```

## Dataset

Download the HuatuoGPT-Vision dataset via the Hugging Face CLI:
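The `<think>`/`<answer>` tags can be pulled out of a response with a small regex helper. This is an illustrative sketch of our own (the helper name and parsing rules are not from the repository's code):

```python
import re

def parse_medvlm_output(text: str) -> dict:
    """Extract the reasoning trace and final answer from a
    MedVLM-R1-style response: <think> ... </think> <answer>X</answer>."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return {
        "reasoning": think.group(1).strip() if think else None,
        "answer": answer.group(1).strip() if answer else None,
    }

output = ("<think> The image is a magnetic resonance imaging (MRI) scan "
          "of a knee joint. </think> <answer>A</answer>")
parsed = parse_medvlm_output(output)
print(parsed["answer"])  # -> A
```

Falling back to `None` when a tag is missing makes downstream scoring robust to malformed generations.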
```bash
# 1) Install the Hugging Face CLI (if not already installed)
pip install -U "huggingface_hub[cli]"

# 2) (Optional) Log in if the dataset requires authentication
# huggingface-cli login

# 3) Download the dataset to a local directory
# Replace <TARGET_DIR> with your local path, e.g., /data/datasets/PubMedVision
hf download FreedomIntelligence/PubMedVision \
  --repo-type dataset \
  --local-dir <TARGET_DIR> \
  --local-dir-use-symlinks False \
  --include "*"

# After the download, set <DATASET_PATH_ROOT>=<TARGET_DIR> in your scripts
```

The dataset contains:
- MRI, CT, X-ray medical images
- Corresponding visual question-answer pairs
- Multi-modal medical reasoning tasks
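Once downloaded, individual VQA records can be inspected with plain Python. The field names below are purely illustrative mock data (check the actual JSON schema of the downloaded files before relying on any key names):

```python
import json

# A mock sample in the spirit of a multiple-choice medical VQA record;
# the real PubMedVision schema may use different field names.
sample = {
    "image": "images/mri_knee_0001.png",
    "question": "What abnormality is shown in this MRI scan?",
    "options": {"A": "Chondral abnormality", "B": "Meniscal tear",
                "C": "Ligament rupture", "D": "No abnormality"},
    "answer": "A",
}

# Round-trip through JSON exactly as a dataset file would store it
record = json.loads(json.dumps(sample))
print(record["answer"], record["options"][record["answer"]])
```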
## Training

Run the training script:

```bash
bash train_script.sh
```

Note: please update the following placeholders in the script:

- `<DATASET_NAME>`: Dataset name
- `<GPU_NUM>`: Number of GPUs
- `<LOG_PATH>`: Log output path
- `<HF_CACHE_DIR>`: Hugging Face cache directory
- `<WANDB_ENTITY>`: Weights & Biases entity
- `<WANDB_PROJECT>`: Weights & Biases project name
- `<OUTPUT_DIR_ROOT>`: Output directory root path
- `<MODEL_REPO_OR_DIR>`: Model path
- `<DATASET_PATH_ROOT>`: Dataset root path
- `<MASTER_ADDR>`: Master node address
- `<MASTER_PORT>`: Master node port
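A missed placeholder is an easy way to crash a long training launch. A quick sanity check like the one below (our own helper, not part of the repository) flags any unreplaced `<...>` tokens before you run the script:

```python
import re

def find_placeholders(script_text: str) -> list:
    """Return any unreplaced <PLACEHOLDER> tokens left in a script."""
    return re.findall(r"<[A-Z_]+>", script_text)

# Example on an in-memory snippet; in practice read the real file, e.g.:
#   text = open("train_script.sh").read()
text = "torchrun --nproc_per_node=<GPU_NUM> --master_addr=127.0.0.1"
missing = find_placeholders(text)
print(missing)  # -> ['<GPU_NUM>']
```

If the list is non-empty, edit the script before launching.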
## Testing

Run the testing script:

```bash
bash test_script.sh
```

Note: please update the following placeholders in the script:

- `<HF_CACHE_DIR>`: Hugging Face cache directory
- `<CUDA_DEVICES>`: CUDA devices
- `<MODEL_REPO_OR_DIR>`: Model path
- `<DATASET_PATH_ROOT>`: Dataset root path
- `<OUTPUT_DIR>`: Output directory
The testing script supports the following parameters:

- `MODALITY`: Modality type (`MRI`, `CT`, `Ultrasound`, `Xray`, `Dermoscopy`, `Microscopy`, `Fundus`)
- `PROMPT_TYPE`: Prompt type (`simple` or `complex`)
- `BSZ`: Batch size
- `MAX_NEW_TOKENS`: Maximum number of new tokens to generate
- `DO_SAMPLE`: Whether to sample during decoding
- `TEMPERATURE`: Sampling temperature
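For multiple-choice VQA, evaluation reduces to comparing the letter inside the `<answer>` tag with the ground truth. The sketch below illustrates that idea only; the repository's actual evaluation code lives in `src/eval/test_qwen2vl_med.py`:

```python
import re

def extract_choice(response: str):
    """Pull the choice letter out of an <answer>X</answer> tag,
    returning None if the tag is missing or malformed."""
    m = re.search(r"<answer>\s*([A-D])\s*</answer>", response)
    return m.group(1) if m else None

predictions = ["<think>...</think> <answer>A</answer>",
               "<think>...</think> <answer>C</answer>",
               "no valid tag here"]
ground_truth = ["A", "B", "C"]

correct = sum(extract_choice(p) == g for p, g in zip(predictions, ground_truth))
accuracy = correct / len(ground_truth)
print(f"accuracy = {accuracy:.2f}")  # -> accuracy = 0.33
```

Treating a missing tag as incorrect (rather than skipping it) penalizes outputs that fail to follow the required format.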
## Project Structure

```
r1-v-med/
├── demo.ipynb                        # Demo notebook
├── setup.sh                          # Setup script
├── train_script.sh                   # Training script
├── test_script.sh                    # Testing script
├── MRI_CT_XRAY_300each_dataset.json  # Test dataset
├── images/                           # Example images
│   ├── successful_cases/             # Successful cases
│   └── failure_cases/                # Failure cases
└── src/
    ├── eval/                         # Evaluation code
    │   └── test_qwen2vl_med.py       # Testing script
    ├── distill_r1/                   # R1 distillation code
    └── open-r1-multimodal/           # Underlying framework
        └── src/open_r1/
            ├── grpo.py               # GRPO training code
            └── trainer/
                └── grpo_trainer.py   # GRPO trainer
```

## Citation

If you find our work helpful, please cite:
```bibtex
@article{pan2025medvlm,
  title={MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning},
  author={Pan, Jiazhen and Liu, Che and Wu, Junde and Liu, Fenglin and Zhu, Jiayuan and Li, Hongwei Bran and Chen, Chen and Ouyang, Cheng and Rueckert, Daniel},
  journal={arXiv preprint arXiv:2502.19634},
  year={2025}
}
```

## Acknowledgements

Our code is based on the following open-source projects:
- open-r1-multimodal: https://github.com/EvolvingLMMs-Lab/
- R1-V: https://github.com/StarsfieldAI/R1-V
Thanks to these excellent open-source projects for providing a solid foundation for our research.
## License

This project is licensed under the Apache 2.0 License.