Holistic Evaluation of Multimodal LLMs on Spatial Intelligence
EASI conceptualizes a comprehensive taxonomy of spatial tasks that unifies existing benchmarks, together with a standardized protocol for the fair evaluation of state-of-the-art proprietary and open-source models.
Key features include:
- Supports the evaluation of state-of-the-art Spatial Intelligence models.
- Systematically collects and integrates evolving Spatial Intelligence benchmarks.
- Proposes a standardized testing protocol to ensure fair evaluation and enable cross-benchmark comparisons.
🌟 [2025-11-21] EASI v0.1.1 is released. Major updates include:
- Expanded model support: added 9 new Spatial Intelligence models, increasing the total from 7 → 16:
  - SenseNova-SI 1.1 Series
  - SpaceR: SpaceR-7B
  - VST Series: VST-3B-SFT, VST-7B-SFT
  - Cambrian-S Series: Cambrian-S-0.5B, Cambrian-S-1.5B, Cambrian-S-3B, Cambrian-S-7B
- Expanded benchmark support: added 1 new image-and-video benchmark, increasing the total from 6 → 7.
🌟 [2025-11-07] EASI v0.1.0 is released. Major updates include:
- Supports 7 recent Spatial Intelligence models:
- SenseNova-SI Family: SenseNova-SI-InternVL3-8B, SenseNova-SI-InternVL3-2B
- MindCube Family: MindCube-3B-RawQA-SFT, MindCube-3B-Aug-CGMap-FFR-Out-SFT, MindCube-3B-Plain-CGMap-FFR-Out-SFT
- SpatialLadder: SpatialLadder-3B
- SpatialMLLM: SpatialMLLM-4B
- Supports 6 recent Spatial Intelligence benchmarks:
- 4 image-based benchmarks: MindCube, ViewSpatial, EmbSpatial, and MMSI (no circular evaluation)
- 2 image-and-video benchmarks: VSI-Bench and SITE-Bench
- Introduces a standardized testing protocol, as outlined in the EASI paper
```bash
git clone --recursive https://github.com/EvolvingLMMs-Lab/EASI.git
cd EASI
pip install -e ./VLMEvalKit
```

VLM Configuration: All VLMs are configured in vlmeval/config.py. During evaluation, use the model name specified in supported_VLM in vlmeval/config.py to select the VLM. Make sure you can successfully run inference with the VLM before starting the evaluation:

```bash
vlmutil check {MODEL_NAME}
```
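For example, to find a valid model name and sanity-check it before a full run, here is a minimal sketch; it assumes supported_VLM in vlmeval/config.py is a mapping keyed by model name (as referenced above), and picks a model from the v0.1.1 release notes:

```bash
# Print all model names registered in supported_VLM
# (assumes supported_VLM is a dict keyed by model name).
python -c "from vlmeval.config import supported_VLM; print('\n'.join(sorted(supported_VLM)))"

# Verify that a chosen model loads and can run inference.
vlmutil check SenseNova-SI-1.1-InternVL3-8B
```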
Benchmark Configuration: The full list of supported benchmarks can be found in the official VLMEvalKit documentation: VLMEvalKit Supported Benchmarks (Feishu). For the EASI Leaderboard, the benchmarks listed above are currently supported.
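If you prefer to query the benchmark list locally rather than the Feishu document, a minimal sketch follows; the SUPPORTED_DATASETS name is an assumption based on upstream VLMEvalKit and may differ in your checkout:

```bash
# Print dataset identifiers known to the local VLMEvalKit install
# (SUPPORTED_DATASETS is assumed from upstream VLMEvalKit; adjust if your version differs).
python -c "from vlmeval.dataset import SUPPORTED_DATASETS; print('\n'.join(sorted(SUPPORTED_DATASETS)))"
```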
General command
```bash
python run.py --data {BENCHMARK_NAME} --model {MODEL_NAME} --verbose --reuse
```

See run.py for the full list of arguments.
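For larger evaluations, upstream VLMEvalKit also supports launching run.py with torchrun for data-parallel inference; a sketch, assuming an 8-GPU node and that this fork retains upstream's torchrun launch support:

```bash
# Data-parallel evaluation with one process per GPU
# (torchrun launching is assumed from upstream VLMEvalKit).
torchrun --nproc-per-node=8 run.py \
    --data {BENCHMARK_NAME} \
    --model {MODEL_NAME} \
    --verbose --reuse
```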
Example
Evaluate SenseNova-SI-1.1-InternVL3-8B on MindCubeBench_tiny_raw_qa:
```bash
python run.py --data MindCubeBench_tiny_raw_qa \
    --model SenseNova-SI-1.1-InternVL3-8B \
    --verbose --reuse
```

To submit your evaluation results to our EASI Leaderboard:
- Go to the EASI Leaderboard page.
- Click 🚀 Submit here! to open the submission form.
- Follow the instructions to fill in the submission form, and submit your results.
```bibtex
@article{easi2025,
  title={Holistic Evaluation of Multimodal LLMs on Spatial Intelligence},
  author={Cai, Zhongang and Wang, Yubo and Sun, Qingping and Wang, Ruisi and Gu, Chenyang and Yin, Wanqi and Lin, Zhiqian and Yang, Zhitao and Wei, Chen and Shi, Xuanke and Deng, Kewang and Han, Xiaoyang and Chen, Zukai and Li, Jiaqi and Fan, Xiangyu and Deng, Hanming and Lu, Lewei and Li, Bo and Liu, Ziwei and Wang, Quan and Lin, Dahua and Yang, Lei},
  journal={arXiv preprint arXiv:2508.13142},
  year={2025}
}
```