
Feat: add model support for penguinvl #1257

Open
taintaintainu wants to merge 2 commits into EvolvingLMMs-Lab:main from taintaintainu:feat/add-model-penguinvl

Conversation

@taintaintainu
Contributor

Summary

  • Add a new simple-model integration for Penguin-VL exposed as --model penguinvl in lmms-eval.
  • Register penguinvl in the model registry and add an example launch script for multi-benchmark evaluation.

In scope

  • Add lmms_eval/models/simple/penguinvl.py, register the model ID, provide examples/models/penguin_vl.sh, and add penguinvl prompt overrides for mmmu_pro_standard and mmmu_pro_vision.
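Model registration in this kind of harness typically follows a decorator-based registry that maps the CLI model ID (here, `penguinvl`) to its implementing class. A minimal, self-contained sketch of that pattern is below; note that `MODEL_REGISTRY`, `register_model`, and `get_model` are illustrative placeholder names, not the actual lmms-eval API.

```python
# Sketch of a decorator-based model registry (illustrative only; these
# names are placeholders, not the actual lmms-eval internals).
MODEL_REGISTRY = {}

def register_model(name):
    """Map a CLI model ID (e.g. --model penguinvl) to its class."""
    def decorator(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator

@register_model("penguinvl")
class PenguinVL:
    # Default checkpoint taken from the validation runs in this PR.
    def __init__(self, pretrained="tencent/Penguin-VL-8B", **kwargs):
        self.pretrained = pretrained

def get_model(name):
    """Resolve a registered model ID back to its class."""
    return MODEL_REGISTRY[name]
```

With this pattern, `--model penguinvl` resolves to the class via the registry lookup, and `--model_args` key/value pairs are forwarded to the constructor.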

Out of scope

  • No new benchmark/task is introduced, and no metric/aggregation logic or dataset definitions are changed outside the Penguin-VL-specific prompt configuration.

Validation

  • accelerate launch --num_processes=8 --main_process_port=12346 -m lmms_eval --model penguinvl --model_args=pretrained=tencent/Penguin-VL-8B,attn_implementation=flash_attention_2,dtype=bfloat16 --tasks "ai2d,mmmu_pro_standard,ocrbench" --batch_size 1 --log_samples --log_samples_suffix penguinvl --verbosity DEBUG --output_path ./logs/
    Sample size: N = 3088 + 1730 + 1000. Key metrics: ai2d exact_match = 0.8491; mmmu_pro_standard mmmu_acc = 0.32139; ocrbench_accuracy = 0.8430. Result: pass.
  • accelerate launch --num_processes=8 --main_process_port=12346 -m lmms_eval --model penguinvl --model_args=pretrained=tencent/Penguin-VL-8B,attn_implementation=flash_attention_2,dtype=bfloat16 --tasks "videomme,longvideobench_val_v" --batch_size 1 --log_samples --log_samples_suffix penguinvl --verbosity DEBUG --output_path ./logs/
    Sample size: N = 2700 + 1337. Key metrics: videomme_perception_score = 66.30; longvideobench_val_v lvb_acc = 0.64996. Result: pass.
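The example launch script added under examples/models/penguin_vl.sh presumably wraps an invocation like the first validation run above; a sketch along those lines (flags and task list copied from that run, though the actual script contents may differ) is:

```shell
#!/usr/bin/env bash
# Sketch of a multi-benchmark launch script for Penguin-VL, mirroring the
# first validation command in this PR; the shipped examples/models/penguin_vl.sh
# may differ in its exact task list and paths.
accelerate launch --num_processes=8 --main_process_port=12346 -m lmms_eval \
    --model penguinvl \
    --model_args=pretrained=tencent/Penguin-VL-8B,attn_implementation=flash_attention_2,dtype=bfloat16 \
    --tasks "ai2d,mmmu_pro_standard,ocrbench" \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix penguinvl \
    --verbosity DEBUG \
    --output_path ./logs/
```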

Risk / Compatibility

  • Runtime compatibility depends on the upstream Penguin-VL Hugging Face implementation; this integration was evaluated with transformers==4.51.3 and attn_implementation=flash_attention_2.

Type of Change

  • [ ] Bug fix (non-breaking change)
  • [ ] New feature
  • [ ] New benchmark/task
  • [x] New model integration
  • [ ] Breaking change
  • [ ] Documentation update
  • [ ] Refactoring (no functional changes)