State-space language models such as Mamba match Transformer quality while permitting linear-complexity inference, yet they still comprise billions of parameters that hinder deployment. Existing one-shot pruning methods are tailored to attention blocks and fail to account for the time-shared and discretized state-transition matrix at the heart of the selective state-space module (SSM). In this paper, we introduce SparseSSM, the first training-free pruning method that extends the classic optimal brain surgeon (OBS) framework to state-space architectures. Our layer-wise algorithm (i) derives an approximate second-order saliency score that aggregates Hessian-trace information across time steps, (ii) incorporates a component sensitivity analysis to guide feed-forward network (FFN) pruning, which also sheds light on where redundancy resides in the Mamba architecture, and (iii) extends readily to semi-structured and structured sparsity. Empirically, we prune 50% of SSM weights without fine-tuning and observe no zero-shot accuracy loss, setting the current state of the art for pruning Mamba-based LLMs.
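As a rough illustration of the idea (a minimal conceptual sketch, not the code in this repository), an OBS-style saliency for a time-shared weight can be accumulated over the time steps seen during calibration; the function and argument names below are placeholders:

```python
# Minimal conceptual sketch, NOT the repository's implementation: an OBS-style
# saliency score aggregated across time steps for a time-shared SSM weight.
# `calib_inputs` (one activation batch per time step) and `damp` are assumed
# names; the layer-wise Hessian construction follows the SparseGPT/OBS recipe.
import torch

def aggregated_obs_saliency(W, calib_inputs, damp=1e-2):
    """W: (out, in) time-shared weight; calib_inputs: list of (n, in) activations, one per time step."""
    d_in = W.shape[1]
    saliency = torch.zeros_like(W)
    for X_t in calib_inputs:
        H = X_t.T @ X_t / X_t.shape[0]                     # layer-wise Hessian proxy at step t
        H += damp * H.diagonal().mean() * torch.eye(d_in)  # damping for invertibility
        h_inv_diag = torch.linalg.inv(H).diagonal()        # [H^{-1}]_qq
        saliency += W.pow(2) / h_inv_diag                  # classic OBS score w_q^2 / [H^{-1}]_qq
    return saliency  # mask out the lowest-scoring weights to reach the target sparsity
```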
```bash
git clone https://github.com/CFinTech/SparseSSM
cd SparseSSM
pip install -r requirements.txt
```

The data for calibration can be downloaded here.
To prune the SSM module, you can run the following command:
```bash
CUDA_VISIBLE_DEVICES=${your_gpu_id} python main.py \
    path/to/your/model wikitext2 \
    --experiment_name your_experiment_name \
    --method "sparsessm_dev" \
    --save path/to/pruned_model \
    --sparsity 0.5 \
    --nsamples 64 \
    --minlayer 0 \
    --maxlayer 100 \
    --prune_A True \
    --do_prune \
    --eval_zero_shot \
    --log_wandb
```

*Illustration of SparseSSM. The first row depicts the evolution of the diagonal parameter matrix.*
*Performance analysis for one-shot unstructured pruning of SSM modules in Mamba models.*
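To sanity-check a pruning run, one can inspect the fraction of zeroed entries in the saved weights (a minimal sketch, assuming the pruned checkpoint under `--save` is an ordinary PyTorch state dict; the file name below is illustrative):

```python
# Minimal sanity-check sketch, assuming the pruned checkpoint written to the
# `--save` directory is a standard PyTorch state dict (file name illustrative).
import torch

state = torch.load("path/to/pruned_model/pytorch_model.bin", map_location="cpu")
for name, tensor in state.items():
    if tensor.dim() < 2:
        continue                                  # skip biases / scalars
    sparsity = (tensor == 0).float().mean().item()
    if sparsity > 0.05:                           # report only tensors that were actually pruned
        print(f"{name}: {sparsity:.1%} zeros")
```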
- This source code builds on the widely used PyTorch reimplementation of SparseGPT and on mamba-minimal.
- We use Mamba checkpoints to test our method.
- The README file is inspired by LLM-pruner.
If you find this work useful for your research, please consider citing our paper:
```bibtex
@article{tuo2025sparsessm,
  title={SparseSSM: Efficient Selective Structured State Space Models Can Be Pruned in One-Shot},
  author={Kaiwen Tuo and Huan Wang},
  journal={arXiv preprint arXiv:2506.09613},
  year={2025}
}
```

