We train MMDuet2 in the SFT phase with a modified version of ms-swift. The model (Qwen2.5-VL) and the data are loaded with qwen-vl-utils and transformers.
conda create --name mmduet2_sft python=3.10
conda activate mmduet2_sft
pip install -r requirements.txt

git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
git checkout v3.2.0
pip install -e .
cd ..
# replace some code files that we modified from ms-swift
cp -ri ms-swift-replace-code/* ms-swift

Follow the instructions in MMDuet2-data to prepare the dataset, and move the sft subfolder to ./data/annotations.
bash ./scripts/train.sh
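
Before launching the training script, it can help to confirm that the setup steps took effect. The following is a minimal sketch, not part of the MMDuet2 repository: the helper names `check_dir` and `check_swift_tag` are hypothetical, and it assumes the commands are run from the repository root, that ms-swift was cloned into ./ms-swift, and that the prepared annotations live somewhere under ./data/annotations.

```shell
#!/usr/bin/env bash
# Illustrative pre-flight check (hypothetical helpers, not shipped with MMDuet2).

# check_dir DIR: succeed if DIR exists and contains at least one entry.
check_dir() {
    local dir="$1"
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
        echo "ok: $dir"
    else
        echo "missing or empty: $dir" >&2
        return 1
    fi
}

# check_swift_tag REPO TAG: succeed if REPO's HEAD sits exactly on TAG.
check_swift_tag() {
    local repo="$1" tag="$2" current
    # Empty string if REPO is not a git checkout or HEAD is not on a tag.
    current="$(git -C "$repo" describe --tags --exact-match 2>/dev/null || true)"
    if [ "$current" = "$tag" ]; then
        echo "ok: $repo is at $tag"
    else
        echo "unexpected checkout in $repo: '${current:-unknown}' (wanted $tag)" >&2
        return 1
    fi
}

# Report on the paths this README mentions; problems are printed, not fatal.
check_dir ./data/annotations || true
check_swift_tag ./ms-swift v3.2.0 || true
```

To make the checks blocking, one option is to chain them in front of the training command, for example `check_dir ./data/annotations && check_swift_tag ./ms-swift v3.2.0 && bash ./scripts/train.sh`.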