Fun-ASR vLLM Acceleration

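This repository provides an accelerated implementation of Fun-ASR using vLLM. By leveraging vLLM's efficient attention mechanisms and memory management, it speeds up Fun-ASR inference while keeping accuracy on par with the baseline. In the benchmark reported under Performance, decoding about one hour of audio at batch size 16 takes 26.3 s with vLLM versus 45.4 s with the Hugging Face PyTorch implementation.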

Environment Setup 🐍

To get started, clone the repository and install the required dependencies:

    git clone https://github.com/yuekaizhang/Fun-ASR-vllm.git
    cd Fun-ASR-vllm
    apt-get install -y ffmpeg
    uv pip install -r requirements.txt
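After installation, a quick sanity check can confirm that the pieces used later in this README are visible to Python. The following is a minimal sketch, not part of the repository: the helper name `check_environment` is hypothetical, and the assumption that `torch` is pulled in by `requirements.txt` is inferred from the PyTorch baseline mentioned below rather than stated by the source.

```python
import importlib.util
import shutil


def check_environment(modules=("torch", "vllm"), binaries=("ffmpeg",)):
    """Return a dict mapping each requirement name to True if it was found.

    The default module list is an assumption for illustration; adjust it to
    match what requirements.txt actually installs.
    """
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module is not installed
        status[name] = importlib.util.find_spec(name) is not None
    for name in binaries:
        # shutil.which returns None when the executable is not on PATH
        status[name] = shutil.which(name) is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

This only checks that the modules and the `ffmpeg` binary can be located; it does not verify versions or GPU availability.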

Features 📝

vLLM-based decoding
Batched inference (batch size > 1)
SenseVoice encoder acceleration
Integration with NVIDIA Triton Inference Server

Usage 🛠️

Python API Inference

You can run inference directly using the Python API:

    from model import FunASRNano
    from vllm import LLM, SamplingParams


    def main():
        model_dir = "FunAudioLLM/Fun-ASR-Nano-2512"

        # Load the base model
        m, kwargs = FunASRNano.from_pretrained(model=model_dir, device="cuda:0")
        m.eval()

        # Initialize vLLM
        vllm = LLM(
            model="yuekai/Fun-ASR-Nano-2512-vllm",
            enable_prompt_embeds=True,
            gpu_memory_utilization=0.4,
        )
        sampling_params = SamplingParams(
            top_p=0.001,
            max_tokens=500,
        )

        # Attach vLLM to the model
        m.vllm = vllm
        m.vllm_sampling_params = sampling_params

        # Run inference
        wav_path = f"{kwargs['model_path']}/example/zh.mp3"
        res = m.inference(data_in=[wav_path], **kwargs)
        print(res)
        text = res[0][0]["text"]
        print(text)


    if __name__ == "__main__":
        main()

Running Benchmarks

To evaluate performance on a dataset (e.g., SpeechIO):

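    dataset_name="yuekai/speechio"
    subset_name="SPEECHIO_ASR_ZH00007"
    split_name="test"

    # The log directory is named after the subset only. Writing
    # "$dataset_name_$subset_name" would expand the unset variable
    # "dataset_name_" to an empty string, so the braces below make the
    # resulting path explicit.
    uv run python \
        infer.py \
        --model_dir FunAudioLLM/Fun-ASR-Nano-2512 \
        --huggingface_dataset "$dataset_name" \
        --subset_name "$subset_name" \
        --split_name "$split_name" \
        --batch_size 16 \
        --log_dir "./logs_vllm_${subset_name}" \
        --vllm_model_dir yuekai/Fun-ASR-Nano-2512-vllm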
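The benchmark results below report character error rate (CER). As a rough illustration of what that metric measures, here is a minimal sketch of corpus-level CER based on Levenshtein distance. It is not the scoring code used by `infer.py`; the function names are hypothetical, and the only normalization applied is whitespace removal, whereas a real evaluation typically also normalizes punctuation and numerals.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (two-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at zero cost
            ))
        prev = curr
    return prev[-1]


def cer(refs, hyps):
    """Corpus-level CER: total character edits / total reference characters."""
    # Assumption for this sketch: whitespace is ignored, nothing else is normalized.
    pairs = [(r.replace(" ", ""), h.replace(" ", "")) for r, h in zip(refs, hyps)]
    total_chars = sum(len(r) for r, _ in pairs)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in pairs)
    return total_edits / total_chars


if __name__ == "__main__":
    # One substituted character out of six reference characters
    print(f"{cer(['今天天气很好'], ['今天天汽很好']):.2%}")
```

Summing edits and reference lengths over the whole corpus, rather than averaging per-utterance rates, keeps short utterances from dominating the result.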

Performance 🚀

We compared the standard Hugging Face PyTorch implementation against our vLLM-accelerated version.

Benchmark Details:

Dataset: SPEECHIO_ASR_ZH00007 (approx. 1 hour of audio)
Hardware: Single NVIDIA H20 GPU

Mode	Decoding Time (s)	RTF	RTFx	CER	Note
Hugging Face PyTorch	218.2	0.06	16.5	7.02%	batch_size=1
Hugging Face PyTorch	45.4	0.013	79.3	8.53%	batch_size=16
vLLM (Qwen3-0.6B)	145.6	0.04	24.7	6.99%	batch_size=1
vLLM (Qwen3-0.6B)	26.3	0.007	136.9	7.03%	batch_size=16

Note: RTF (real-time factor) is decoding time divided by audio duration; lower is better. RTFx is its inverse (audio duration divided by decoding time); higher is better.
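The two metrics can be recomputed from the decoding times in the table. The sketch below assumes the audio duration is exactly 3600 seconds, since the source only says "approx. 1 hour", so the recomputed values may differ from the reported ones in the last digit. The function names are chosen for illustration.

```python
def rtf(decoding_seconds, audio_seconds):
    """Real-time factor: seconds of compute per second of audio (lower is better)."""
    return decoding_seconds / audio_seconds


def rtfx(decoding_seconds, audio_seconds):
    """Inverse real-time factor: seconds of audio per second of compute (higher is better)."""
    return audio_seconds / decoding_seconds


# Assumption: the dataset is treated as exactly one hour of audio.
AUDIO_SECONDS = 3600.0

# Decoding times taken from the benchmark table above
DECODING_SECONDS = {
    "Hugging Face PyTorch, batch_size=1": 218.2,
    "Hugging Face PyTorch, batch_size=16": 45.4,
    "vLLM, batch_size=1": 145.6,
    "vLLM, batch_size=16": 26.3,
}

if __name__ == "__main__":
    for mode, secs in DECODING_SECONDS.items():
        print(f"{mode}: RTF={rtf(secs, AUDIO_SECONDS):.3f}, "
              f"RTFx={rtfx(secs, AUDIO_SECONDS):.1f}")
```

Dividing two decoding times gives the relative speedup directly: with the figures above, vLLM is roughly 1.5 times faster than the baseline at batch size 1 and roughly 1.7 times faster at batch size 16.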
