This directory provides the scripts for evaluating models on OddGridBench.
To ensure a fair comparison, please follow the same evaluation pipeline and output format as provided in this repository.
The evaluation process mainly includes the following steps:
- Load the dataset
- Run model inference
- Extract answers from model outputs
- Compute final accuracy
We provide example scripts for each step.
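As a rough illustration of the last two steps, the sketch below extracts an option letter from a model's free-form output and computes accuracy. The regex pattern, option range, and helper names are assumptions for illustration, not the repository's actual implementation.

```python
import re

def extract_answer(output):
    # Prefer an explicit "Answer: X" pattern; otherwise fall back
    # to the last standalone option letter in the output.
    match = re.search(r"[Aa]nswer\s*[:\-]?\s*\(?([A-E])\)?", output)
    if match:
        return match.group(1).upper()
    letters = re.findall(r"\b([A-E])\b", output)
    return letters[-1].upper() if letters else None

def accuracy(predictions, ground_truths):
    # Unparseable outputs (None) count as incorrect.
    correct = sum(p == g for p, g in zip(predictions, ground_truths))
    return correct / len(ground_truths)
```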
You can load the OddGridBench dataset using HuggingFace:

```python
from datasets import load_dataset

dataset = load_dataset("wwwtttjjj/OddGridBench")
```

Alternatively, you can manually download the datasets and place them under the `datasets/` directory with the following structure:

```
datasets/
├── OddGridBench
├── MNIST
├── SCC
├── MVTec-AD
└── VisA
```

Please ensure that the dataset folders follow the same structure as above before running the evaluation scripts.
Before running the evaluation, please configure the model and dataset settings in `configs.py`:
- `model_dir`: the local path to the model weights
- `model_name`: the model name used for evaluation
- `data_type`: the dataset used for evaluation
Supported datasets include:
- OddGridBench
- MNIST
- SCC
- MVTec-AD
- VisA
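For reference, a filled-in `configs.py` might look like the following; the path and model name below are placeholders, not values shipped with the repository.

```python
# configs.py -- example settings (values are placeholders)
model_dir = "/path/to/model/weights"   # local path to the model weights
model_name = "my-vlm-7b"               # model name used for evaluation
data_type = "OddGridBench"             # one of: OddGridBench, MNIST, SCC, MVTec-AD, VisA
```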
After configuration, run the following script to perform model inference:

```bash
python vlm_infer.py
```
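Conceptually, an inference script like this queries the model on each sample and saves the raw outputs so that answer extraction and scoring can run as separate steps. The sketch below is a simplified illustration; the `generate_answer` helper, the sample field names, and the JSONL output path are all hypothetical.

```python
import json

def run_inference(dataset, generate_answer, out_path="predictions.jsonl"):
    # Record one JSON line per sample with the model's raw output,
    # leaving answer parsing and accuracy computation to later steps.
    with open(out_path, "w") as f:
        for sample in dataset:
            # "image"/"question"/"id" are assumed field names for illustration.
            output = generate_answer(sample["image"], sample["question"])
            f.write(json.dumps({"id": sample["id"], "output": output}) + "\n")
```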