A fast, offline audio transcription tool using whisper.cpp. Transcribe audio files to text without internet or cloud services.
brew tap rodnoy/orangenote brew install orangenote-cligit clone https://github.com/rodnoy/orangenote-cli.git cd orangenote-cli cargo build --release --features whisper sudo ln -sf $(pwd)/target/release/orangenote-cli /usr/local/bin/Download the latest release for your platform from GitHub Releases:
orangenote-cli-*-aarch64-apple-darwin.tar.gz— macOS Apple Silicon (M1/M2/M3/M4)orangenote-cli-*-x86_64-apple-darwin.tar.gz— macOS Intelorangenote-cli-*-x86_64-unknown-linux-gnu.tar.gz— Linux x86_64
# Example: macOS Apple Silicon tar -xzf orangenote-cli-v0.2.0-aarch64-apple-darwin.tar.gz sudo mv orangenote-cli /usr/local/bin/- 🎙️ Audio Transcription — Convert speech to text accurately
- 🌍 Multi-language Support — Auto-detect or specify language
- 📁 Multiple Formats — MP3, WAV, FLAC, M4A, OGG, WMA
- ⚡ Fast Processing — Multi-threaded, optimized for performance
- 💾 Output Formats — JSON, SRT, VTT, TXT, TSV
- 🔌 Modular Design — Easy integration with other applications
- 📦 Offline-First — All processing happens locally
- ✂️ Audio Chunking — Split long files for better transcription quality
After installation, download a Whisper model:
# Download base model (140MB, good balance) orangenote-cli model download base # Or medium for better accuracy (1.5GB) orangenote-cli model download medium# Transcribe with default settings orangenote-cli transcribe input.mp3 # With specific language orangenote-cli transcribe input.mp3 --language ru # With different model (tiny, base, small, medium, large) orangenote-cli transcribe input.mp3 --model small # Save to file orangenote-cli transcribe input.mp3 --output result.json # As SRT subtitles orangenote-cli transcribe video.m4a --format srt -o subs.srt # Long podcast with chunking (recommended for >30 min) orangenote-cli transcribe podcast.mp3 --chunk-size 5 --model mediumorangenote-cli transcribe <INPUT> [OPTIONS]Options:
| Option | Description | Default |
|---|---|---|
-m, --model | Whisper model (tiny, base, small, medium, large) | base |
-l, --language | Language code (en, ru, fr, etc.) or auto-detect | auto |
-f, --format | Output format (json, srt, vtt, txt, tsv) | json |
-o, --output | Output file (stdout if not specified) | - |
-t, --threads | Processing threads | 4 |
--translate | Translate to English | false |
--chunk-size | Chunk size in minutes (0 = disabled) | 0 |
--chunk-overlap | Overlap between chunks in seconds | 5 |
# List available models orangenote-cli model list # Download a model orangenote-cli model download medium # Check model status orangenote-cli model status # Remove a model orangenote-cli model remove baseorangenote-cli infoWhen transcribing long audio files (podcasts, lectures, meetings), whisper.cpp may produce poor results — repeating noise labels like "[Music]" or hallucinating text. The solution is to split audio into smaller chunks.
┌────────────────────────────────────────────────────────────┐ │ Original Audio File │ └────────────────────────────────────────────────────────────┘ ↓ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ Chunk 1 │ │ Chunk 2 │ │ Chunk 3 │ │ 0:00-5:00 │ │ 4:55-9:55 │ │ 9:50-12:00 │ └──────────────┘ └──────────────┘ └──────────────┘ ↓ ↓ ↓ Transcribe Transcribe Transcribe ↓ ↓ ↓ ┌────────────────────────────────────────────────────────────┐ │ Merged Result (duplicates removed) │ └────────────────────────────────────────────────────────────┘ # Transcribe a 1-hour podcast with 5-minute chunks orangenote-cli transcribe podcast.mp3 \ --model medium \ --language ru \ --chunk-size 5 # With custom overlap (10 seconds) orangenote-cli transcribe lecture.mp3 \ --model small \ --chunk-size 10 \ --chunk-overlap 10 # Long meeting recording orangenote-cli transcribe meeting.wav \ --model medium \ --language en \ --chunk-size 5 \ --output meeting_transcript.json| Audio Duration | Recommended Chunk Size | Model |
|---|---|---|
| < 10 minutes | No chunking needed | base/small |
| 10-30 minutes | 5 minutes | small/medium |
| 30-60 minutes | 5-10 minutes | medium |
| > 1 hour | 5 minutes | medium/large |
🎵 Processing audio... 📦 Using chunked transcription (5 min chunks, 5s overlap) Processing chunk 1/12... Processing chunk 2/12... ... Processing chunk 12/12... ✓ Transcription complete! Detected language: ru Segments: 156 Average confidence: 94.32% | Model | Size | Speed | Accuracy | RAM | Best For |
|---|---|---|---|---|---|
| tiny | 39M | ⭐⭐⭐⭐⭐ | ⭐ | ~500MB | Quick tests |
| base | 140M | ⭐⭐⭐⭐ | ⭐⭐ | ~1GB | Short clips |
| small | 466M | ⭐⭐⭐ | ⭐⭐⭐ | ~2GB | General use |
| medium | 1.5G | ⭐⭐ | ⭐⭐⭐⭐ | ~5GB | Quality transcription |
| large | 2.9G | ⭐ | ⭐⭐⭐⭐⭐ | ~10GB | Maximum accuracy |
Recommendation: Use small or medium model with --language flag for best results.
- JSON — Full structured output with timestamps and confidence scores
- SRT — SubRip format for video players
- VTT — WebVTT format for web videos
- TXT — Plain text (text only, no timestamps)
- TSV — Tab-separated for spreadsheets
orangenote-cli transcribe podcast.mp3 \ --model medium \ --language ru \ --chunk-size 5 \ --output transcript.jsonorangenote-cli transcribe movie.m4a \ --model small \ --format srt \ --output subtitles.srtorangenote-cli transcribe interview.wav \ --model large \ --language en \ --threads 8 \ --chunk-size 5 \ --output interview.jsonorangenote-cli transcribe french_audio.mp3 \ --model medium \ --language fr \ --translate \ --output english_translation.txtfor file in *.mp3; do orangenote-cli transcribe "$file" \ --model small \ --chunk-size 5 \ --output "${file%.mp3}.json" doneEnable detailed logging:
# Verbose mode (debug level) orangenote-cli transcribe input.mp3 --verbose # Specific log level orangenote-cli transcribe input.mp3 --log-level debugUse chunking with a smaller chunk size:
orangenote-cli transcribe audio.mp3 --model medium --chunk-size 5- Use a larger model (
mediumorlarge) - Specify the language explicitly with
--language - Enable chunking for long files
Check the path exists:
ls -la your_file.mp3Convert the file first (requires ffmpeg):
ffmpeg -i input.ogg -ar 16000 -ac 1 output.wavUse a smaller model:
orangenote-cli transcribe audio.mp3 --model tiny- Increase threads:
--threads 8 - Use a smaller model:
--model small - Use chunking to process in parallel (future feature)
Current Version: 0.2.0
- Basic CLI structure
- Argument parsing and validation
- Whisper backend integration
- Model management (download, list, remove)
- Multiple output formats (JSON, SRT, VTT, TXT, TSV)
- Audio chunking for long files
- Duplicate segment removal
- Confidence-based segment selection
- Diarization support (speaker identification)
- Result caching
- REST API
- Desktop UI (Tauri)
- Parallel chunk processing
Full documentation is available in Russian: doc/README_RU.md
# Update Rust rustup update # Install development tools rustup component add rustfmt clippy# Debug build cargo build --features whisper # Release build cargo build --release --features whisper # Run tests cargo test --features whisper# Format code cargo fmt # Lint cargo clippy --features whisper # Run all tests cargo test --features whisperorangenote-cli/ ├── src/ │ ├── lib.rs # Library exports │ ├── bin/ │ │ └── orangenote-cli.rs # Main CLI binary │ └── infrastructure/ │ ├── audio/ │ │ ├── chunk.rs # Audio chunking │ │ ├── decoder.rs # Audio decoding │ │ └── processor.rs # Audio processing │ └── transcription/ │ └── whisper/ │ ├── transcriber.rs # Main transcription engine │ ├── merger.rs # Result merging │ ├── context.rs # Whisper context wrapper │ └── model_manager.rs # Model management ├── vendor/ │ └── whisper.cpp/ # Whisper.cpp submodule ├── doc/ # Documentation ├── Cargo.toml # Project manifest └── build.rs # Build script for whisper.cpp Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
MIT License - see LICENSE file for details
- whisper.cpp — Underlying transcription engine
- OpenAI Whisper — Original model and paper
Made with ❤️ by the OrangeNote Team