MapTab is a comprehensive benchmark designed to evaluate the map understanding and spatial reasoning capabilities of Vision-Language Models (VLMs). The benchmark focuses on two core tasks: route planning and map-based question answering, using both metro maps and travel maps.
Project Page: https://ziqiao-shang.github.io/MapTab-Leaderboard/
- Overview
- Task Description
- Dataset
- Version Requirements
- Quick Start
- Supported Models
- Project Structure
- Evaluation Metrics
- Citation
MapTab evaluates VLMs on their ability to:
- Understand map visualizations - Parse and interpret visual map information
- Process tabular data - Understand structured information in tables (JSON/CSV)
- Perform spatial reasoning - Plan routes and answer spatial questions
- Follow complex constraints - Handle multi-constraint route planning tasks
The benchmark includes various route planning subtasks with different input modalities and constraint levels:
| Subtask | Input Modality | Description |
|---|---|---|
shortest_path_only_map | Map Image | Route planning using only visual map |
shortest_path_only_tab | Table (JSON) | Route planning using only tabular data |
shortest_path_only_csv | Table (CSV) | Route planning using only CSV data (ablation) |
shortest_path_map_and_tab_no_constraint | Map + Table | Combined input without constraints |
shortest_path_map_and_csv | Map + CSV | Combined with CSV format (ablation) |
shortest_path_map_and_tab_with_constraint_1 | Map + Table | With constraint type 1 |
shortest_path_map_and_tab_with_constraint_2 | Map + Table | With constraint type 2 |
shortest_path_map_and_tab_with_constraint_3 | Map + Table | With constraint type 3 |
shortest_path_map_and_tab_with_constraint_4 | Map + Table | With constraint type 4 |
shortest_path_map_and_tab_with_constraint_1_2_3_4 | Map + Table | With all four constraints |
shortest_path_map_and_tab_with_constraint_1_2_4 | Map + Table | With constraints 1, 2, and 4 |
shortest_path_map_and_tab_with_constraint_1_3_4 | Map + Table | With constraints 1, 3, and 4 |
shortest_path_map_and_tab_with_constraint_2_3_4 | Map + Table | With constraints 2, 3, and 4 |
only_vertex2 | Map + Table | Special vertex subset task |
shortest_path_csv_vertex2 | Map + CSV | CSV format with vertex subset (ablation) |
shortest_path_map_and_tab_csv_constraint_1_2_3_4 | Map + CSV | CSV format with all constraints (ablation) |
QA tasks evaluate map comprehension across different aspects:
| Subtask ID | Task Type | Description |
|---|---|---|
| 1 | 1_qa_only_pic_global | Global questions using only map image |
| 2 | 2_qa_only_pic_part | Local/partial questions using only map image |
| 3 | 3_qa_only_pic_spatial_judge | Spatial judgment using only map image |
| 4 | 4_qa_edge_tab_global | Global edge questions with table |
| 5 | 5_qa_edge_tab_part | Local edge questions with table |
| 6 | 6_qa_edge_tab_spatial_judge | Spatial edge judgment with table |
| 7 | 7_qa_vertex_tab_global | Global vertex questions with table |
| 8 | 8_qa_vertex_tab_part | Local vertex questions with table |
| 9 | 9_qa_vertex_tab_spatial_judge | Spatial vertex judgment with table |
| 10 | 10_qa_pic_and_tab_global | Global questions with map and table |
| 11 | 11_qa_pic_and_tab_part | Local questions with map and table |
| 12 | 12_qa_pic_and_tab_spatial_judge | Spatial judgment with map and table |
The dataset includes two map types:
- MetroMap: Synthetic metro/subway network maps
- TravelMap: Travel route maps with geographic information
The current release includes the following files under both metromap/ and travelmap/:
- β
data/- Route Planning (RP) task test query set - β
qa_data/- Question Answering (QA) task query set - β
images/- Map images - β
prompts/- Prompt templates for both RP and QA tasks - β
tabulars/-Edge_tabandVertex_tabfiles
Note: In the current release, only the RP task test query set is available. QA task queries and RP task training queries will be released in future updates.
Files in these five folders (
data/,qa_data/,images/,prompts/,tabulars/) can be downloaded from Hugging Face: https://huggingface.co/datasets/szq-nju/MapTab
- Python >= 3.8
- PyTorch >= 2.0
- OpenAI SDK (for API-based models)
- vLLM (for local model inference)
- NumPy
export WORKSPACE_DIR="/path/to/MapTab" export API_KEY="your-api-key" # For API-based models# Generate RP task results bash scripts/generate_rp.sh # Generate QA task results bash scripts/generate_qa.sh# Evaluate RP task results bash scripts/evaluate_rp.sh # Evaluate QA task results bash scripts/evaluate_qa.shThe framework has been tested with the following model identifiers in src/generate_lib/utils.py:
Qwen3-VL-8B-InstructQwen3-VL-8B-ThinkingQwen3-VL-30B-A3B-ThinkingQwen2.5-VL-7B-InstructKimi-VL-A3B-Thinking-2506Kimi-VL-A3B-InstructPhi-4-multimodal-instructPhi-3.5-vision-instructGlyphQwen3-VL-2B-Instructllava-v1.6-mistral-7b-hfInternVL3_5-30B-A3BInternVL3_5-8BOvis2.5-9B
qwen3-vl-32b-instructqwen3-vl-8b-instructqwen3-vl-30b-a3b-instructqwen3-vl-32b-thinkingqwen3-vl-8b-thinkingqwen3-vl-plusqwen3-maxgpt-4.1gpt-5gpt-4odoubao-seed-1-6-251015kimi-latestGLM-4.1V-9B-ThinkingGLM-4.6Vstep3Qwen3-VL-30B-A3B-Instructgemini-1.5-pro-001gemini-1.0-pro-vision-001gemini-1.5-flash-001gemini-1.5-pro-exp-0801gemini-3-flash-preview
Note: For API-based models, we only provide one Aliyun Bailian integration example in
src/generate_lib/qwen_api.py. Since API platforms vary, please implement other provider interfaces by following this example.
MapTab/ βββ src/ β βββ generate.py β βββ evaluate_planning.py β βββ evaluate_qa.py β βββ metromap_utils.py β βββ travelmap_utils.py β βββ generate_lib/ β βββ qwen_api.py β βββ utils.py β βββ vllm_LLMengine.py βββ scripts/ β βββ generate_rp.sh β βββ generate_qa.sh β βββ evaluate_rp.sh β βββ evaluate_qa.sh βββ metromap/ β βββ data/ β β βββ test_set/ β βββ images/ β βββ prompts/ β βββ tabulars/ βββ travelmap/ β βββ data/ β β βββ test_set/ β βββ images/ β βββ prompts/ β βββ tabulars/ βββ results/ βββ results_evaluate/ | Metric | Description |
|---|---|
| all_acc | Exact match accuracy (complete route correctness) |
| part_acc | Partial accuracy (proportion of correct route segments) |
| Difficulty_score | Difficulty-weighted score based on map and query complexity |
| Metric | Description |
|---|---|
| accuracy | Proportion of correct numeric answers |
If you use MapTab in your research, please cite:
@article{shang2026maptab, title={MapTab: Can MLLMs Master Constrained Route Planning?}, author={Shang, Ziqiao and Ge, Lingyue and Chen, Yang and Tian, Shi-Yu and Huang, Zhenyu and Fu, Wenbo and Li, Yu-Feng and Guo, Lan-Zhe}, journal={arXiv preprint arXiv:2602.18600}, year={2026} }