EMI-Group/evonas

EvoNAS

Introduction

Modern computer vision tasks require a delicate balance between predictive accuracy and real-time efficiency, but the substantial inference cost of large vision models (LVMs) notably restricts their deployment on resource-constrained edge devices. EvoNAS addresses this by introducing a highly efficient, multi-objective evolutionary architecture search framework. To overcome the severe representation collapse and ranking inconsistency typical in conventional weight-sharing paradigms, EvoNAS utilizes a hybrid Vision State Space and Vision Transformer (VSS-ViT) supernet optimized via the Progressive Supernet Training (PST) strategy. This is further stabilized by a novel Cross-Architecture Dual-Domain Knowledge Distillation (CA-DDKD) approach, which aligns features in both spatial and frequency domains using DCT constraints to lock in high-frequency geometric priors. Evaluated through a hardware-isolated Distributed Multi-Model Parallel Evaluation (DMMPE) engine that eliminates computational noise, the resulting EvoNets establish Pareto-optimal trade-offs, demonstrating robust generalizability from 2D dense prediction to high-fidelity 3D rendering tasks like 3D Gaussian Splatting.


Key Features

  • 🔬 Hybrid VSS-ViT Search Space: Combines linear-time Vision State Space (VSS) modules for local geometric feature capture with Vision Transformer (ViT) modules for global semantic reasoning.
  • 📈 Progressive Supernet Training (PST): A curriculum-learning strategy that expands from maximum-capacity configurations to compact variants, ensuring a smooth fitness landscape and stable supernet convergence.
  • 🧠 CA-DDKD Strategy: Cross-Architecture Dual-Domain Knowledge Distillation using DCT constraints to mitigate representation collapse and preserve high-frequency geometric priors across both spatial and frequency domains.
  • 🚀 DMMPE Framework: A hardware-isolated distributed evaluation engine with GPU resource pooling and asynchronous scheduling, eliminating latency jitter during parallel architecture evaluation.
  • 🌐 Universal Geometric Transferability: Generalizes across COCO, ADE20K, KITTI/NYU v2, and 3D Gaussian Splatting without task-specific design changes.
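
For intuition, the dual-domain alignment behind CA-DDKD can be sketched as a toy loss that penalizes student–teacher feature mismatch in both the spatial domain and the DCT frequency domain. This is a minimal NumPy illustration only; the function name, the plain-MSE choices, and the equal term weights are assumptions, not the repository's implementation:

```python
import numpy as np
from scipy.fft import dctn

def dual_domain_kd_loss(student_feat, teacher_feat, w_spatial=1.0, w_freq=1.0):
    """Toy CA-DDKD-style loss: align features in the spatial domain (MSE)
    and in the frequency domain via a 2-D DCT over each feature map.
    Weights and MSE choices here are illustrative assumptions."""
    # Spatial-domain alignment: mean-squared error between feature maps.
    spatial = np.mean((student_feat - teacher_feat) ** 2)
    # Frequency-domain alignment: compare orthonormal 2-D DCT coefficients,
    # which keeps high-frequency components explicitly in the objective.
    s_dct = dctn(student_feat, axes=(-2, -1), norm="ortho")
    t_dct = dctn(teacher_feat, axes=(-2, -1), norm="ortho")
    freq = np.mean((s_dct - t_dct) ** 2)
    return w_spatial * spatial + w_freq * freq

# Identical features give zero loss; any mismatch is penalized in both domains.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))  # (channels, H, W)
print(dual_domain_kd_loss(feat, feat))   # 0.0
```

Because the DCT is orthonormal, the frequency term adds no scale distortion; it simply re-expresses the feature mismatch in a basis where high-frequency geometric detail is an explicit coordinate.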

Results

🔬 This project accompanies a paper currently under review. Quantitative results and pre-trained model weights will be released upon acceptance.


Installation

Warning

Mamba SSM requires a prebuilt CUDA wheel that must match your exact Python, CUDA, and PyTorch versions. Download the appropriate .whl from the Mamba releases page before proceeding.

```sh
# 1. Create environment
conda create -n EvoNAS python=3.10 -y && conda activate EvoNAS

# 2. Install PyTorch (CUDA 11.8)
conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 \
    pytorch-cuda=11.8 -c pytorch -c nvidia -y

# 3. Install Mamba SSM (replace with your downloaded wheel path)
pip install /path/to/mamba_ssm-2.2.4+cu11torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

# 4. Install Spatial-Mamba kernels (source: https://github.com/EdwardChasel/Spatial-Mamba)
cd kernels/selective_scan && pip install .
cd ../dwconv2d && pip install .

# 5. Install remaining dependencies
pip install timm==0.4.12 fvcore tensorboardX mmcv==2.2.0 \
    numpy==2.0.1 scipy==1.15.2 pymoo==0.6.1.3 \
    ptflops==0.7.4 pandas Cython==3.0.12

# 6. Task-specific toolkits (install as needed)
pip install mmdet==3.3.0                       # Object Detection
pip install mmsegmentation==1.2.2 ftfy==6.3.1  # Semantic Segmentation
```

Data Preparation

Download the datasets and organize them as follows. Pre-defined train/test split files are provided in data_splits/.

```
data/
├── NYU_Depth_V2/
│   ├── sync/               ← training RGB-D frames
│   └── test/               ← test images
├── KITTI/
│   ├── raw/                ← raw KITTI sequences
│   └── depth/              ← ground-truth depth maps
├── coco/
│   ├── train2017/
│   ├── val2017/
│   └── annotations/
└── ADEChallengeData2016/   ← ADE20K
    ├── images/
    └── annotations/
```

Note

Update --data_path and --gt_path in the relevant config files under configs/ to point to your local data directories before running any scripts.


Quickstart

Note

Stage 1 requires the ImageNet-1k pretrained supernet weights (vssd_supernet_imagenet_1k.pth). Download link coming soon. Place the file in the project root before fine-tuning.

EvoNAS follows a four-stage pipeline: (1) pretrain the supernet on ImageNet-1k, (2) fine-tune it on the target dataset using the PST strategy, (3) run the multi-objective evolutionary search, and (4) retrain the discovered subnet.
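
The PST expansion in stage 2 can be pictured as a sampling schedule whose lower bounds relax step by step from the maximal configuration toward compact variants. The following is a minimal sketch under assumed depth/width ranges and an 8-step schedule; the real search dimensions live in the configs, not here:

```python
import random

def pst_sampler(step, total_steps=8, max_depth=12, min_depth=4,
                max_width=768, min_width=192, rng=random):
    """Progressive Supernet Training sketch: early steps sample only
    near-maximum-capacity sub-networks; later steps widen the admissible
    range toward compact variants, smoothing the fitness landscape."""
    frac = step / (total_steps - 1)  # 0.0 at the first step, 1.0 at the last
    depth_lo = round(max_depth - frac * (max_depth - min_depth))
    width_lo = round(max_width - frac * (max_width - min_width))
    depth = rng.randint(depth_lo, max_depth)  # sample within the current range
    width = rng.randint(width_lo, max_width)
    return depth, width

# Step 0 can only yield the maximal architecture; step 7 spans the full space.
print(pst_sampler(0))  # (12, 768)
```

The point of the schedule is that every compact variant is only ever trained after its larger supersets have stabilized, which is what keeps subnet rankings consistent.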

KITTI (example)

```sh
sh scripts/whole_run_kitti.sh                                    # PST fine-tuning (all 8 steps)
python MambaDepthNAS/search.py configs/search/search_kitti.txt   # evolutionary search
python MambaDepthNAS/retrain.py configs/retrain_kitti.txt        # retrain
```
Commands for all supported tasks

NYU Depth v2

```sh
sh scripts/whole_run_nyu.sh
python MambaDepthNAS/search.py configs/search/search_nyu.txt
python MambaDepthNAS/retrain.py configs/retrain_nyu.txt
```

COCO Object Detection

```sh
sh scripts/detection/supernet_steptrain.sh
python DetectionNAS/search.py DetectionNAS/configs/01_search/search_coco.txt
python DetectionNAS/retrain.py DetectionNAS/configs/02_retrain/retrain_supernet_base.txt
```

ADE20K Semantic Segmentation

```sh
sh scripts/segment/supernet_steptrain_ade20k.sh
python SegmentNAS/search.py SegmentNAS/configs/01_search/search_ade20k.txt
python SegmentNAS/retrain.py SegmentNAS/configs/02_retrain/retrain_supernet_base.txt
```
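
At the heart of the search step (the repo pins pymoo==0.6.1.3 for NSGA-II/III) is non-dominated selection over objectives such as prediction error and inference latency. A plain-NumPy sketch of Pareto-front extraction, with made-up candidate values standing in for real measurements:

```python
import numpy as np

def pareto_front(objectives):
    """Return a boolean mask of non-dominated rows, where every column is
    to be minimized (e.g. [prediction error, latency] per architecture).
    A row is dominated if another row is <= everywhere and < somewhere."""
    F = np.asarray(objectives, dtype=float)
    n = len(F)
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            if i != j and np.all(F[j] <= F[i]) and np.any(F[j] < F[i]):
                mask[i] = False  # candidate i is dominated by candidate j
                break
    return mask

# Hypothetical (error, latency-ms) pairs for four candidate subnets.
cands = [(0.20, 9.0), (0.25, 5.0), (0.30, 4.0), (0.28, 6.0)]
print(pareto_front(cands))  # [ True  True  True False]
```

Here the last candidate is dominated (the second one is both more accurate and faster); the surviving set is exactly the accuracy–latency trade-off curve the search reports.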

Project Structure

```
EvoNAS/
├── MambaDepthNAS/       # Monocular depth estimation module
│   ├── train.py         # PST supernet fine-tuning
│   ├── search.py        # NSGA-II/III evolutionary search
│   ├── retrain.py       # Subnet retraining
│   ├── networks/        # VSS-ViT encoder + decoder variants
│   └── distillation/    # CA-DDKD (spatial + frequency KD)
├── DetectionNAS/        # Object detection (MMDetection)
├── SegmentNAS/          # Semantic segmentation (MMSegmentation)
├── configs/             # PST, search, and retrain configs
├── scripts/             # End-to-end shell scripts per dataset
├── data_splits/         # Official train/test split file lists
└── tools/               # Visualization (evolution curve, HV, det/seg)
```

Acknowledgements

EvoNAS builds on a strong ecosystem of open-source tools. We are grateful to the teams behind PyTorch, Mamba SSM, Spatial-Mamba, MMDetection, MMSegmentation, pymoo, and timm for making this work possible.


License

EvoNAS is licensed under the GNU General Public License 3.0 (GPL-3.0). See the LICENSE file for the full terms and conditions.


⭐ If you find this project helpful, please consider giving it a star.

About

EvoNAS is a framework for neural architecture search, implemented with PyTorch. It supports supernet training, evolutionary multi-objective optimization, and seamless integration with modern computer vision training pipelines, including depth estimation, object detection, and semantic segmentation.
