Paper link: https://arxiv.org/abs/2406.08702
- Download the dataset from https://huggingface.co/datasets/klee972/VLind-Bench.
- The directory structure should be as follows.
```
├── data
│   ├── data.json
│   ├── counterfactual
│   └── factual
└── eval
    ├── ctx_cfq
    ├── gpt4o_eval.py
    ├── instructblip_eval.py
    ├── score_pipeline.py
    └── score.sh
```
- Run `gpt4o_eval.py` or `instructblip_eval.py` to generate model predictions.
- Run `score.sh` to compute the pipeline scores and accuracies.
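Before running the evaluation scripts, it can help to confirm that the checkout matches the expected layout. The following is a minimal sketch and is not part of the repository. `EXPECTED`, `EVAL_DIR`, `missing_entries`, and `download_dataset` are hypothetical names. The entry list is copied from the directory tree, and `download_dataset` assumes that the files in the `klee972/VLind-Bench` dataset repo map directly onto `data/`. It uses `huggingface_hub.snapshot_download`, which must be installed separately.

```python
"""Sanity-check the VLind-Bench directory layout (hypothetical helper script)."""
from pathlib import Path

# Assumption: names copied from the README tree; adjust if your checkout differs.
EVAL_DIR = "eval"
EXPECTED = [
    "data/data.json",
    "data/counterfactual",
    "data/factual",
    f"{EVAL_DIR}/ctx_cfq",
    f"{EVAL_DIR}/gpt4o_eval.py",
    f"{EVAL_DIR}/instructblip_eval.py",
    f"{EVAL_DIR}/score_pipeline.py",
    f"{EVAL_DIR}/score.sh",
]


def download_dataset(local_dir: str = "data") -> str:
    """Fetch the dataset snapshot from the Hugging Face Hub.

    Assumption: the dataset repo's files map directly onto data/.
    Requires `pip install huggingface_hub`; imported lazily so the
    layout check below works without it.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="klee972/VLind-Bench",
        repo_type="dataset",
        local_dir=local_dir,
    )


def missing_entries(root: str = ".") -> list[str]:
    """Return the expected relative paths that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("Directory layout looks complete.")
```

Run it from the repository root. An empty result from `missing_entries()` means every entry in the tree is present. Otherwise, the script lists the paths that still need to be downloaded or moved.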