A curated collection of 100+ papers on Intelligent Remote Sensing Agents
Important
We welcome community contributions to keep this list up-to-date!
- Add missing papers via Pull Request
- Propose new or refined categories
- Report broken links or outdated entries
- Reach out via Contact for any discussion
If you find this survey or repository useful in your research, please cite our paper:
@article{tang2025intelligent,
  title={Intelligent Remote Sensing Agents: A Survey},
  author={Tang, Jiaqi and Yan, Yingying and Wang, Qianzhou and Xia, Yuyang and Geng, Botong and Chen, Jianmin and Ma, Ke and Zhai, Youyang and He, Qingfeng and Shao, Weigeng and Sun, Yunjin and Dai, Junwei and Chen, Chuxi and Xu, Xiaogang and Yao, Kelu and Zhang, Lei and Wei, Wei and Chen, Qifeng and Plaza, Antonio and Zhang, Yanning},
  year={2026},
  url={https://github.com/PolyX-Research/Awesome-Remote-Sensing-Agents}
}

- [2026.03.20] The survey is now available on GitHub.
- [2026.03.20] We release the Awesome-Remote-Sensing-Agents repository.
- News
- Contents
- Papers: Ecological Monitoring · Emergency Response · Geological Exploration · Marine Supervision · Precision Agriculture · Urban Governance · Others
- Datasets & Benchmarks
- How to Contribute
- License
- Star History
- Contact
| Badge | Meaning |
|---|---|
| | Preprint on arXiv |
| | Published at a conference or journal |
| | Code repository available |
| | Application domain |
| | Agent design category (planning, memory, tool use, etc.) |
Ecological Monitoring
| Title | Application & Tags | Links |
|---|---|---|
| REMSA: An LLM Agent for Foundation Model Selection in Remote Sensing | | Paper · GitHub |
| ForestGPT and Beyond: A Trustworthy Domain-Specific Large Language Model Paving the Way to Forestry 5.0 | | Paper |
| GANDALF: A LLM-based Approach to Map Bark Beetle Outbreaks in Semantic Stories of Sentinel-2 Images | | Paper |
| CLEAR: Climate Policy Retrieval and Summarization Using LLMs | | Paper |
| DA4DTE: An Agentic System for Enhancing the Accessibility of Digital Twins of Earth | | Paper |
| EarthLink: A Self-Evolving AI Agent for Climate Science | | Paper |
| A Self-Evolving AI Agent System for Climate Science | | Paper |
| Towards LLM Agents for Earth Observation | | Paper · GitHub · Model |
| Accelerating Earth Science Discovery via Multi-Agent LLM Systems | | Paper |
| GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing | | Paper |
| CangLing-KnowFlow: A Unified Knowledge-and-Flow-fused Agent for Remote Sensing Applications | | Paper |
| Google Earth AI and Gemini for Climate and Environmental Analysis | | Paper |
| REO-VLM: Transforming VLM to Meet Regression Challenges in Earth Observation | | |
| H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model | | Paper · GitHub |
| LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model | | Paper · GitHub |
| RS-LLaVA: A Large Vision-Language Model for Joint Captioning and Question Answering in Remote Sensing Imagery | | Paper · GitHub |
| EarthGPT: A Universal Multimodal Large Language Model for Multisensor Image Comprehension in Remote Sensing Domain | | Paper · GitHub |
| Transfer Learning in Environmental Remote Sensing | | Paper |
| TREE-GPT: Modular Large Language Model Expert System for Forest Remote Sensing Image Understanding and Interactive Analysis | | Paper |
| An Agent-Based Model to Represent Space-Time Propagation of Forest-Fire Smoke | | Paper |
| High-Resolution Mapping of Global Surface Water and Its Long-Term Changes | | |
Emergency Response
| Title | Application & Tags | Links |
|---|---|---|
| FIRE-VLM: A Vision-Language-Driven Reinforcement Learning Framework for UAV Wildfire Tracking | | Paper |
| UAV-CodeAgents: Scalable UAV Mission Planning via Multi-Agent ReAct and Vision-Language Reasoning | | Paper |
| Geospatial Artificial Intelligence for Satellite-based Flood Extent Mapping | | Paper |
| A RAG-Based Multi-Agent LLM System for Natural Hazard Resilience and Adaptation | | Paper · GitHub |
| Earth AI: Unlocking Geospatial Insights with Foundation Models and Cross-Modal Reasoning | | Paper · GitHub |
| Empowering LLM Agents with Geospatial Awareness: Toward Grounded Reasoning for Wildfire Response | | Paper |
| A Conceptual High Level Multiagent System for Wildfire Management | | Paper |
| RescueADI: Adaptive Disaster Interpretation in Remote Sensing Images With Autonomous Agents | | Paper |
| LLM-Enhanced Disaster Geolocalization Using Implicit Geoinformation from Multimodal Data: A Case Study of Hurricane Harvey | | Paper |
| Knowledge-Guided Large Language Models for Enhancing Agent-Based Wildfire Spatial Simulation | | Paper · Dataset |
| Large-Language-Model-Driven Agents for Fire Evacuation Simulation in a Cellular Automata Environment | | Paper |
| From Perceptions to Decisions: Wildfire Evacuation Decision Prediction with Behavioral Theory-informed LLMs | | Paper |
| ESCAPE: Evacuation Simulation Using Cognitive Agent-Based Modeling on Possible Earthquake in GAMA Platform for the Case of Kalayaan Residence Hall | | Paper |
| Description of Wildfires Spreading and Extinguishing with the Aid of Agent-Based Models | | Paper |
Geological Exploration
| Title | Application & Tags | Links |
|---|---|---|
| PEACE: Empowering Geologic Map Holistic Understanding with MLLMs | | Paper · GitHub |
| STA-CoT: Structured Target-Centric Agentic Chain-of-Thought for Consistent Multi-Image Geological Reasoning | | Paper · Dataset |
| Automating Geospatial Vision Tasks with a Large Language Model Agent | | Paper · GitHub |
| A Vision-Language Foundation Model-Based Multi-Modal Retrieval-Augmented Generation Framework for Remote Sensing Lithological Recognition | | Paper |
| HI-MAFE: Hyperspectral Image Multi-Agent Deep Reinforcement Learning Feature Extraction | | Paper |
| MineAgent: Towards Remote-Sensing Mineral Exploration with Multimodal Large Language Models | | Paper |
Marine Supervision
| Title | Application & Tags | Links |
|---|---|---|
| OceanAI: A Conversational Platform for Accurate, Transparent, Near-Real-Time Oceanographic Insights | | Paper |
| Large Language Model-Based Decision-Making for COLREGs and the Control of Autonomous Surface Vehicles | | Paper |
| Autonomous Vehicle Maneuvering Using Vision–LLM Models for Marine Surface Vehicles | | Paper |
| OceanGPT: A Large Language Model for Ocean Science Tasks | | Paper |
| WaterGPT: Training a Large Language Model to Become a Hydrology Expert | | Paper |
Precision Agriculture
| Title | Application & Tags | Links |
|---|---|---|
| AgriGPT: A Large Language Model Ecosystem for Agriculture | | Paper |
| ChatLeafDisease: A Chain-of-Thought Prompting Approach for Crop Disease Classification Using Large Language Models | | Paper |
| RS-MoE: A Vision–Language Model With Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering | | |
| Identifying Potential Rural Residential Areas for Land Consolidation Using a Data-Driven Agent-Based Model | | Paper |
| Planning to "Hear the Farmer's Voice": an Agent-Based Modelling Approach to Agricultural Land Use Planning | | Paper |
| A Framework for Data-Driven Agent-Based Modelling of Agricultural Land Use | | Paper |
| An Agent-Based Model to Simulate the Cultivation Pattern Change of Farmer Households in the North China Plain | | Paper |
Urban Governance
| Title | Application & Tags | Links |
|---|---|---|
| MMUEChange: A generalized LLM agent framework for intelligent multi-modal urban environment change analysis | | Paper |
| AgentSense: LLMs Empower Generalizable and Explainable Web-Based Participatory Urban Sensing | | Paper |
| SoPerModel: Leveraging Social Perception for Multi-Agent Trajectory Prediction | | Paper |
| AirSpatialBot: A Spatially Aware Aerial Agent for Fine-Grained Vehicle Attribute Recognition and Retrieval | | Paper · GitHub |
| LLM Agent Framework for Intelligent Change Analysis in Urban Environment Using Remote Sensing Imagery | | Paper |
| UrbanLLaVA: A Multi-Modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding | | Paper · GitHub |
| Automating Traffic Model Enhancement with AI Research Agent | | Paper · GitHub |
| Roads: Robust Prompt-Driven Multi-Class Anomaly Detection Under Domain Shift | | |
| A Spatiotemporal–Semantic Coupling Intelligent Q&A Method for Land Use Approval Based on Knowledge Graphs and Intelligent Agents | | Paper |
| NavAgent: Multi-scale Urban Street View Fusion For UAV Embodied Vision-and-Language Navigation | | Paper |
| GenAI-Powered Multi-Agent Paradigm for Smart Urban Mobility: Opportunities and Challenges for Integrating Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) with Intelligent Transportation Systems | | Paper |
| AgentMove: Predicting Human Mobility Anywhere Using Large Language Model Based Agentic Framework | | Paper |
| Perceive, Reflect, and Plan: Designing LLM Agent for Goal-Directed City Navigation Without Instructions | | Paper |
| UrbanKGent: A Unified Large Language Model Agent Framework for Urban Knowledge Graph Construction | | Paper · GitHub |
| Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation | | Paper |
| Guide-LLM: An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments | | Paper |
| OpenCity: A Scalable Platform to Simulate Urban Activities with Massive LLM Agents | | Paper · GitHub |
| TopoSense: Agent-Driven Topological Graph Extraction from Remote Sensing Image | | Paper |
| GeoChat: Grounded Large Vision-Language Model for Remote Sensing | | Paper · GitHub |
| 3D Question Answering for City Scene Understanding | | Paper · GitHub · Dataset |
| VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View | | Paper · GitHub |
| EmbodiedCity: Embodied Aerial Agent for City-Level Visual Language Navigation Using Large Language Model | | Paper |
| AirVista: Empowering UAVs With 3D Spatial Reasoning Abilities Through A Multimodal Large Language Model Agent | | Paper |
| LLMLight: Large Language Models as Traffic Signal Control Agents | | Paper |
| GeoGPT: Understanding and Processing Geospatial Tasks through An Autonomous GPT | | Paper |
| Optimum landfill site selection by a hybrid multi-criteria and multi-Agent decision-making method in a temperate and humid climate: BWM-GIS-FAHP-GT | | Paper · GitHub |
| Urban Change Detection for Multispectral Earth Observation Using Convolutional Neural Networks | | |
Others
| Title | Application & Tags | Links |
|---|---|---|
| VICoT-Agent: A Vision-Interleaved Chain-of-Thought Framework for Interpretable Multimodal Reasoning and Scalable Remote Sensing Analysis | | Paper |
| Designing Domain-Specific Agents via Hierarchical Task Abstraction Mechanism | | Paper · GitHub |
| GeoFlow: Agentic Workflow Automation for Geospatial Tasks | | Paper · GitHub · Dataset |
| RingMo-Agent: A Unified Remote Sensing Foundation Model for Multi-Platform and Multi-Modal Reasoning | | Paper |
| An energy-efficient learning solution for the Agile Earth Observation Satellite Scheduling Problem | | Paper |
| Multi-Agent Geospatial Copilots for Remote Sensing Workflows | | Paper |
| Co-LLaVA: Efficient Remote Sensing Visual Question Answering via Model Collaboration | | Paper · GitHub |
| GIS Copilot: Towards an Autonomous GIS Agent for Spatial Analysis | | Paper · GitHub · Dataset |
| Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework | | Paper |
| Chain-of-Programming (CoP): Empowering Large Language Models for Geospatial Code Generation Task | | Paper |
| RS-Agent: Automating Remote Sensing Tasks through Intelligent Agent | | Paper · GitHub |
| Change-Agent: Towards Interactive Comprehensive Remote Sensing Change Interpretation and Analysis | | Paper · GitHub |
| Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models | | Paper · GitHub |
| GeoLLM-Engine: A Realistic Environment for Building Geospatial Copilots | | Paper |
Training and evaluating remote sensing agents requires resources that go beyond static image-label pairs. Agents must integrate visual perception with reasoning, planning, and tool execution. We organize datasets and benchmarks into three tiers: Perception, Reasoning, and Decision-Making.
Datasets
| Category | Name | Size | Description |
|---|---|---|---|
| Perception | UC Merced Land Use | 2.1K images | Land-use classification |
| Perception | AID | 10K images | Standard scene classification |
| Perception | xView | ~1M instances | Object detection in overhead imagery |
| Perception | DOTA | 188K instances | Oriented object detection |
| Perception | iSAID | 655K instances | Instance segmentation |
| Perception | xBD | 850K instances | Building damage assessment |
| Perception | EuroSAT | 27K images | Land use/land cover classification |
| Perception | LEVIR-CD | 31K instances | Building change detection |
| Perception | Topo-boundary | 25K images | Road topology extraction |
| Perception | STAR | 210K instances | Scene graph generation |
| Perception | SSGD | 3.1K instances | Spatial scene graph retrieval |
| Perception | GID | 150 images | Land-cover classification (GF-2) |
| Perception | FBP | 5B pixels | Country-scale semantic segmentation |
| Perception | WUSU | 68K instances | Urban semantic understanding |
| Perception | RealScene-ISTD | 739 images | Infrared UAV small-target detection |
| Reasoning | ETH/UCY & nuScenes | 9K+1.4M | Trajectory prediction |
| Reasoning | AirSpatial | 206K instructions | Embodied spatial reasoning |
| Reasoning | LEVIR-MCI | 10K pairs | Semantic change understanding |
| Reasoning | GeoChat | 318K instances | Multimodal instruction following |
| Reasoning | SkyEye-968k | 968K samples | Multi-task instruction tuning |
| Reasoning | RSICap | 2.6K pairs | High-precision vision-language alignment |
| Reasoning | UData | 353K instances | Cross-modal urban reasoning |
| Reasoning | EarthVQA | 208K QA pairs | Relational VQA |
| Reasoning | RS-VL3M | 3M pairs | Vision-language pretraining |
| Decision-Making | RS-Agent | 18 tasks | Expert-guided tool invocation |
| Decision-Making | RescueADI | 13.4K interactions | Adaptive disaster response |
| Decision-Making | AEOS-Bench | 16.4K scenarios | Constellation scheduling |
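
For readers who want to work with the three-tier organization programmatically, the following is a minimal sketch of one way to hold a few of the rows above in memory and group them by tier. The names `DatasetEntry`, `CATALOG`, and `group_by_tier` are hypothetical and are not part of this repository; the rows are copied from the Datasets table, and the sizes are kept as the free-form strings shown there.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    """One row of the Datasets table (hypothetical helper type)."""
    tier: str          # "Perception", "Reasoning", or "Decision-Making"
    name: str
    size: str          # kept as text, e.g. "2.1K images"
    description: str


# A small sample of rows copied from the table; not the full list.
CATALOG = [
    DatasetEntry("Perception", "UC Merced Land Use", "2.1K images", "Land-use classification"),
    DatasetEntry("Perception", "DOTA", "188K instances", "Oriented object detection"),
    DatasetEntry("Reasoning", "EarthVQA", "208K QA pairs", "Relational VQA"),
    DatasetEntry("Reasoning", "GeoChat", "318K instances", "Multimodal instruction following"),
    DatasetEntry("Decision-Making", "RS-Agent", "18 tasks", "Expert-guided tool invocation"),
    DatasetEntry("Decision-Making", "AEOS-Bench", "16.4K scenarios", "Constellation scheduling"),
]


def group_by_tier(entries):
    """Map each tier to the dataset names listed under it, in input order."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.tier].append(entry.name)
    return dict(grouped)


if __name__ == "__main__":
    for tier, names in group_by_tier(CATALOG).items():
        print(f"{tier}: {', '.join(names)}")
```

Keeping the tier as a plain string mirrors the Category column; an enum would be stricter if the set of tiers is expected to stay fixed.
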
Benchmarks
| Category | Name | Feature | Scale |
|---|---|---|---|
| Perception | AgMTR | Few-shot segmentation | 5 test classes |
| Perception | TopoSense | Graph extraction | 2,685 images |
| Perception | TREE-GPT | Interactive forest RS | 3 tiles |
| Perception | STAR | Scene graph generation | 1,273 images |
| Perception | Univ-1652 | Geo-localization | 1,652 buildings |
| Perception | SSGD | Relationship retrieval | 3,130 samples |
| Perception | EarthView | Large-scale pretraining | 15 terapixels |
| Reasoning | AirSpatial-Bench | Spatial retrieval tasks | 1,773 pairs |
| Reasoning | RSVQA | Remote sensing QA | 77K–1M questions |
| Reasoning | XLRS-Bench | Ultra-high-res reasoning | 45,942 questions |
| Reasoning | RescueADI | Disaster interpretation | 998 tasks |
| Reasoning | RSICap | Image description | 936 QA pairs |
| Reasoning | UrbanLLaVA | Spatial reasoning | - |
| Reasoning | EarthVQA | Relational VQA | 1,809 images |
| Reasoning | City-3DQA | 3D city understanding | 61K pairs |
| Decision-Making | AEOS-Bench | Constellation scheduling | 16,410 scenarios |
| Decision-Making | ThinkGeo | Tool-augmented tasks | 486 tasks (1,773 steps) |
| Decision-Making | RoadMind | Disaster response | 3 cities |
| Decision-Making | RS-Agent | Intent disambiguation & tool use | 18 tasks |
| Decision-Making | GeoBenchX | Geospatial reasoning & execution | 202 tasks |
| Decision-Making | ShapefileGPT | Geospatial workflow orchestration | 42 tasks |
| Decision-Making | GIS Copilot | Agent-assisted GIS decisions | 110 tasks |
The curated list and associated code in this repository are licensed under CC BY-NC 4.0. You are free to share and adapt the material for non-commercial purposes with appropriate attribution.
The survey paper (paper/) is All Rights Reserved; copyright belongs to the authors. You may read and cite the paper, but redistribution or modification of the paper itself is not permitted without explicit written permission.
If you have any questions, suggestions, or would like to collaborate, feel free to reach out:
- Jiaqi Tang (Project Lead): jtang092@connect.ust.hk
- Wei Wei (Corresponding Author): weiweinwpu@nwpu.edu.cn
- Qifeng Chen (Corresponding Author): cqf@ust.hk