Fast speech recognition with NVIDIA's Parakeet models via ONNX Runtime.
Note: CoreML is unstable with this model. On Apple hardware, use the WebGPU EP (it uses Metal under the hood; despite the name, WebGPU is a native GPU standard, not web-only) or CPU. Even CPU alone is significantly faster on my Mac M3 (16GB) than Whisper with Metal! :-)
CTC (English-only):

```rust
use parakeet_rs::{Parakeet, Transcriber, TimestampMode};

let mut parakeet = Parakeet::from_pretrained(".", None)?;

// Load and transcribe audio (see examples/raw.rs for full example)
let result = parakeet.transcribe_samples(audio, 16000, 1, Some(TimestampMode::Words))?;
println!("{}", result.text);

// Token-level timestamps
for token in result.tokens {
    println!("[{:.3}s - {:.3}s] {}", token.start, token.end, token.text);
}
```

TDT (Multilingual): 25 languages with auto-detection
```rust
use parakeet_rs::{ParakeetTDT, Transcriber, TimestampMode};

let mut parakeet = ParakeetTDT::from_pretrained("./tdt", None)?;
let result = parakeet.transcribe_samples(audio, 16000, 1, Some(TimestampMode::Sentences))?;
println!("{}", result.text);

// Token-level timestamps
for token in result.tokens {
    println!("[{:.3}s - {:.3}s] {}", token.start, token.end, token.text);
}
```

EOU (Streaming): Real-time ASR with end-of-utterance detection
```rust
use parakeet_rs::ParakeetEOU;

let mut parakeet = ParakeetEOU::from_pretrained("./eou", None)?;

// Prepare your audio (Vec<f32>, 16kHz mono, normalized)
let audio: Vec<f32> = /* your audio samples */;

// Process in 160ms chunks for streaming
const CHUNK_SIZE: usize = 2560; // 160ms at 16kHz
for chunk in audio.chunks(CHUNK_SIZE) {
    let text = parakeet.transcribe(chunk, false)?;
    print!("{}", text);
}
```

Nemotron (Streaming): Cache-aware streaming ASR with punctuation
```rust
use parakeet_rs::Nemotron;

let mut model = Nemotron::from_pretrained("./nemotron", None)?;

// Process in 560ms chunks for streaming
const CHUNK_SIZE: usize = 8960; // 560ms at 16kHz
for chunk in audio.chunks(CHUNK_SIZE) {
    let text = model.transcribe_chunk(chunk)?;
    print!("{}", text);
}
```

Multitalker (Streaming Multi-Speaker ASR): Speaker-attributed transcription
```toml
parakeet-rs = { version = "0.3", features = ["multitalker"] }
```

```rust
use parakeet_rs::MultitalkerASR;

let mut model = MultitalkerASR::from_pretrained(
    "./multitalker",   // encoder, decoder, tokenizer
    "sortformer.onnx", // Sortformer v2 for diarization
    None,
)?;

for chunk in audio.chunks(17920) { // ~1.12s at 16kHz
    let results = model.transcribe_chunk(chunk)?;
    for r in &results {
        println!("[Speaker {}] {}", r.speaker_id, r.text);
    }
}
```

See examples/multitalker.rs for full usage with latency modes.
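The chunk sizes used by the streaming examples above are simply duration × sample rate. A tiny helper (illustrative only, not part of the crate) makes the relationship explicit:

```rust
/// Number of samples in a chunk of `ms` milliseconds at `sample_rate` Hz.
fn chunk_samples(ms: u32, sample_rate: u32) -> usize {
    (ms as u64 * sample_rate as u64 / 1000) as usize
}

fn main() {
    assert_eq!(chunk_samples(160, 16_000), 2_560);    // EOU: 160ms
    assert_eq!(chunk_samples(560, 16_000), 8_960);    // Nemotron: 560ms
    assert_eq!(chunk_samples(1_120, 16_000), 17_920); // Multitalker: ~1.12s
    println!("chunk sizes check out");
}
```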
Sortformer v2 & v2.1 (Speaker Diarization): Streaming 4-speaker diarization
```toml
parakeet-rs = { version = "0.3", features = ["sortformer"] }
```

```rust
use parakeet_rs::sortformer::{Sortformer, DiarizationConfig};

let mut sortformer = Sortformer::with_config(
    "diar_streaming_sortformer_4spk-v2.onnx", // or v2.1.onnx
    None,
    DiarizationConfig::callhome(), // or dihard3(), custom()
)?;

let segments = sortformer.diarize(audio, 16000, 1)?;
for seg in segments {
    println!("Speaker {} [{:.2}s - {:.2}s]",
        seg.speaker_id,
        seg.start as f64 / 16_000.0,
        seg.end as f64 / 16_000.0);
}

// For streaming/real-time use, diarize_chunk() preserves state across calls:
let segments = sortformer.diarize_chunk(&audio_chunk_16k_mono)?;
```

See examples/diarization.rs for combining with TDT transcription.
See examples/streaming_diarization.rs for diarize_chunk usage example.
See scripts/export_diar_sortformer.py for exporting the model with custom streaming parameters.
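One way to combine diarization segments with token-level timestamps from transcription is to assign each token to the speaker whose segment overlaps it the most. The sketch below uses simplified stand-in structs, not the crate's actual types; see examples/diarization.rs for the real thing:

```rust
// Simplified stand-ins for illustration; not parakeet_rs types.
struct Token { start: f64, end: f64, text: String }
struct Segment { speaker_id: usize, start: f64, end: f64 }

/// Assign a token to the speaker whose segment overlaps it the most (if any).
fn speaker_for(token: &Token, segments: &[Segment]) -> Option<usize> {
    segments
        .iter()
        .map(|s| {
            // Overlap between [token.start, token.end] and [s.start, s.end]
            let overlap = (token.end.min(s.end) - token.start.max(s.start)).max(0.0);
            (s.speaker_id, overlap)
        })
        .filter(|&(_, overlap)| overlap > 0.0)
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
        .map(|(id, _)| id)
}

fn main() {
    let segments = vec![
        Segment { speaker_id: 0, start: 0.0, end: 2.0 },
        Segment { speaker_id: 1, start: 2.0, end: 4.0 },
    ];
    let token = Token { start: 2.5, end: 3.0, text: "hello".into() };
    // Token falls entirely inside speaker 1's segment
    println!("[Speaker {:?}] {}", speaker_for(&token, &segments), token.text);
}
```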
CTC: Download from HuggingFace: model.onnx, model.onnx_data, tokenizer.json
TDT: Download from HuggingFace: encoder-model.onnx, encoder-model.onnx.data, decoder_joint-model.onnx, vocab.txt
EOU: Download from HuggingFace: encoder.onnx, decoder_joint.onnx, tokenizer.json
Nemotron: Download from HuggingFace: encoder.onnx, encoder.onnx.data, decoder_joint.onnx, tokenizer.model (int8 / int4)
Multitalker: Download from HuggingFace: encoder.int8.onnx, decoder_joint.int8.onnx, tokenizer.model (also needs a Sortformer model for diarization)
Diarization (Sortformer v2 & v2.1): Download from HuggingFace: diar_streaming_sortformer_4spk-v2.onnx or v2.1.onnx.
Quantized versions available (int8). All files must be in the same directory.
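All of these files can be fetched with the `huggingface-cli download` command into a single directory (the repo id below is a placeholder; substitute the actual HuggingFace repo for your chosen model, and the file list shown matches the CTC variant):

```shell
# Placeholder repo id: replace with the real HuggingFace repo for your model.
huggingface-cli download <your-hf-repo-id> \
  model.onnx model.onnx_data tokenizer.json \
  --local-dir .
```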
GPU support (automatically falls back to CPU on failure):
```toml
# or tensorrt, webgpu, directml, migraphx, or other ort-supported EPs (check cargo features)
parakeet-rs = { version = "0.3", features = ["cuda"] }
```

```rust
use parakeet_rs::{Parakeet, ExecutionConfig, ExecutionProvider};

let config = ExecutionConfig::new().with_execution_provider(ExecutionProvider::Cuda);
let mut parakeet = Parakeet::from_pretrained(".", Some(config))?;
```

Advanced session configuration via ort SessionBuilder:
```rust
let config = ExecutionConfig::new()
    .with_custom_configure(|builder| builder.with_memory_pattern(false));
```

- CTC: English with punctuation & capitalization
- TDT: Multilingual (auto lang detection)
- EOU: Streaming ASR with end-of-utterance detection
- Nemotron: Cache-aware streaming ASR (600M params, EN only)
- Multitalker: Streaming multi-speaker ASR with speaker-kernel injection (ONNX int8)
- Sortformer v2 & v2.1: Streaming speaker diarization (up to 4 speakers). Note: the v2.1 model is downloaded the same way.
- Token-level timestamps (CTC, TDT)
- Audio: 16kHz mono WAV (16-bit PCM or 32-bit float)
- CTC/TDT models have a ~4-5 minute audio length limit. For longer files, use the streaming models or split the audio into chunks.
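As a sketch of the expected input format, 16-bit PCM samples can be normalized to `f32` and split into fixed-size chunks like this (pure-std illustration; a crate such as `hound` is a common choice for actually reading WAV files, but is not shown here):

```rust
/// Normalize 16-bit PCM samples to f32 in [-1.0, 1.0).
fn pcm16_to_f32(samples: &[i16]) -> Vec<f32> {
    samples.iter().map(|&s| s as f32 / 32768.0).collect()
}

fn main() {
    // Stand-in for samples read from a 16kHz mono, 16-bit PCM WAV file.
    let pcm: Vec<i16> = vec![0, 16384, -16384, 32767, -32768];
    let audio = pcm16_to_f32(&pcm);

    // 160ms at 16kHz = 2560 samples per chunk (as in the EOU example above).
    for chunk in audio.chunks(2560) {
        println!("chunk of {} samples", chunk.len());
    }
}
```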
Code: MIT OR Apache-2.0
FYI: The Parakeet ONNX models (downloaded separately from HuggingFace) are developed by NVIDIA. This library does not distribute the models.