State-space language models such as Mamba match Transformer quality while permitting linear-complexity inference, yet they still comprise billions of parameters that hinder deployment. Existing one-shot pruning methods are tailored to attention blocks and fail to account for the time-shared and discretized state-transition matrix at the heart of the selective state-space module (SSM). In this paper, we introduce SparseSSM, the first training-free pruning method that extends the classic optimal brain surgeon (OBS) framework to state-space architectures. Our layer-wise algorithm (i) derives an approximate second-order saliency score that aggregates Hessian-trace information across time steps, (ii) incorporates a component sensitivity analysis to guide feed-forward network (FFN) pruning, which also sheds light on where redundancy resides in the Mamba architecture, and (iii) extends readily to semi-structured and structured sparsity. Empirically, we prune 50% of SSM weights without fine-tuning and observe no zero-shot accuracy loss, setting the current state of the art for pruning Mamba-based LLMs.
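As a rough illustration of the idea (a minimal conceptual sketch, not the code in this repository), an OBS-style saliency for a time-shared weight can be accumulated over the time steps seen during calibration; the function and argument names below are placeholders:

```python
# Minimal conceptual sketch, NOT the repository's implementation: an OBS-style
# saliency score aggregated across time steps for a time-shared SSM weight.
# `calib_inputs` (one activation batch per time step) and `damp` are assumed
# names; the layer-wise Hessian construction follows the SparseGPT/OBS recipe.
import torch

def aggregated_obs_saliency(W, calib_inputs, damp=1e-2):
    """W: (out, in) time-shared weight; calib_inputs: list of (n, in) activations, one per time step."""
    d_in = W.shape[1]
    saliency = torch.zeros_like(W)
    for X_t in calib_inputs:
        H = X_t.T @ X_t / X_t.shape[0]                     # layer-wise Hessian proxy at step t
        H += damp * H.diagonal().mean() * torch.eye(d_in)  # damping for invertibility
        h_inv_diag = torch.linalg.inv(H).diagonal()        # [H^{-1}]_qq
        saliency += W.pow(2) / h_inv_diag                  # classic OBS score w_q^2 / [H^{-1}]_qq
    return saliency  # mask out the lowest-scoring weights to reach the target sparsity
```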
```bash
git clone https://github.com/CFinTech/SparseSSM
cd SparseSSM
pip install -r requirements.txt
```

The data for calibration can be downloaded here.
To prune the SSM module, you can run the following command:
```bash
CUDA_VISIBLE_DEVICES=${your_gpu_id} python main.py \
    path/to/your/model wikitext2 \
    --experiment_name your_experiment_name \
    --method "sparsessm_dev" \
    --save path/to/pruned_model \
    --sparsity 0.5 \
    --nsamples 64 \
    --minlayer 0 \
    --maxlayer 100 \
    --prune_A True \
    --do_prune \
    --eval_zero_shot \
    --log_wandb
```

*Illustration of SparseSSM. The first row depicts the evolution of the diagonal parameter matrix.*
*Performance analysis for one-shot unstructured pruning of SSM modules in Mamba models.*
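To sanity-check a pruning run, one can inspect the fraction of zeroed entries in the saved weights (a minimal sketch, assuming the pruned checkpoint under `--save` is an ordinary PyTorch state dict; the file name below is illustrative):

```python
# Minimal sanity-check sketch, assuming the pruned checkpoint written to the
# `--save` directory is a standard PyTorch state dict (file name illustrative).
import torch

state = torch.load("path/to/pruned_model/pytorch_model.bin", map_location="cpu")
for name, tensor in state.items():
    if tensor.dim() < 2:
        continue                                  # skip biases / scalars
    sparsity = (tensor == 0).float().mean().item()
    if sparsity > 0.05:                           # report only tensors that were actually pruned
        print(f"{name}: {sparsity:.1%} zeros")
```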
- This source code builds on the widely used PyTorch reimplementation of SparseGPT and on mamba-minimal.
- We use Mamba checkpoints to test our method.
- The README file is inspired by LLM-pruner.
If you find this work useful for your research, please consider citing our paper:
```bibtex
@article{tuo2025sparsessm,
  title={SparseSSM: Efficient Selective Structured State Space Models Can Be Pruned in One-Shot},
  author={Kaiwen Tuo and Huan Wang},
  journal={arXiv preprint arXiv:2506.09613},
  year={2025}
}
```

