OrangeNote CLI

A fast, offline audio transcription tool using whisper.cpp. Transcribe audio files to text without internet or cloud services.

Installation

macOS (Homebrew) — Recommended

brew tap rodnoy/orangenote brew install orangenote-cli

From Source

git clone https://github.com/rodnoy/orangenote-cli.git cd orangenote-cli cargo build --release --features whisper sudo ln -sf $(pwd)/target/release/orangenote-cli /usr/local/bin/

Download Binary

Download the latest release for your platform from GitHub Releases:

orangenote-cli-*-aarch64-apple-darwin.tar.gz — macOS Apple Silicon (M1/M2/M3/M4)
orangenote-cli-*-x86_64-apple-darwin.tar.gz — macOS Intel
orangenote-cli-*-x86_64-unknown-linux-gnu.tar.gz — Linux x86_64

# Example: macOS Apple Silicon tar -xzf orangenote-cli-v0.2.0-aarch64-apple-darwin.tar.gz sudo mv orangenote-cli /usr/local/bin/

Features

🎙️ Audio Transcription — Convert speech to text accurately
🌍 Multi-language Support — Auto-detect or specify language
📁 Multiple Formats — MP3, WAV, FLAC, M4A, OGG, WMA
⚡ Fast Processing — Multi-threaded, optimized for performance
💾 Output Formats — JSON, SRT, VTT, TXT, TSV
🔌 Modular Design — Easy integration with other applications
📦 Offline-First — All processing happens locally
✂️ Audio Chunking — Split long files for better transcription quality

Quick Start

After installation, download a Whisper model:

# Download base model (140MB, good balance) orangenote-cli model download base # Or medium for better accuracy (1.5GB) orangenote-cli model download medium

Basic Usage

# Transcribe with default settings orangenote-cli transcribe input.mp3 # With specific language orangenote-cli transcribe input.mp3 --language ru # With different model (tiny, base, small, medium, large) orangenote-cli transcribe input.mp3 --model small # Save to file orangenote-cli transcribe input.mp3 --output result.json # As SRT subtitles orangenote-cli transcribe video.m4a --format srt -o subs.srt # Long podcast with chunking (recommended for >30 min) orangenote-cli transcribe podcast.mp3 --chunk-size 5 --model medium

Commands

Transcribe

orangenote-cli transcribe <INPUT> [OPTIONS]

Options:

Option	Description	Default
`-m, --model`	Whisper model (tiny, base, small, medium, large)	base
`-l, --language`	Language code (en, ru, fr, etc.) or auto-detect	auto
`-f, --format`	Output format (json, srt, vtt, txt, tsv)	json
`-o, --output`	Output file (stdout if not specified)	-
`-t, --threads`	Processing threads	4
`--translate`	Translate to English	false
`--chunk-size`	Chunk size in minutes (0 = disabled)	0
`--chunk-overlap`	Overlap between chunks in seconds	5

Model Management

# List available models orangenote-cli model list # Download a model orangenote-cli model download medium # Check model status orangenote-cli model status # Remove a model orangenote-cli model remove base

System Info

orangenote-cli info

Audio Chunking for Long Files

When transcribing long audio files (podcasts, lectures, meetings), whisper.cpp may produce poor results — repeating noise labels like "[Music]" or hallucinating text. The solution is to split audio into smaller chunks.

How It Works

┌────────────────────────────────────────────────────────────┐ │ Original Audio File │ └────────────────────────────────────────────────────────────┘ ↓ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ Chunk 1 │ │ Chunk 2 │ │ Chunk 3 │ │ 0:00-5:00 │ │ 4:55-9:55 │ │ 9:50-12:00 │ └──────────────┘ └──────────────┘ └──────────────┘ ↓ ↓ ↓ Transcribe Transcribe Transcribe ↓ ↓ ↓ ┌────────────────────────────────────────────────────────────┐ │ Merged Result (duplicates removed) │ └────────────────────────────────────────────────────────────┘

Usage

# Transcribe a 1-hour podcast with 5-minute chunks orangenote-cli transcribe podcast.mp3 \ --model medium \ --language ru \ --chunk-size 5 # With custom overlap (10 seconds) orangenote-cli transcribe lecture.mp3 \ --model small \ --chunk-size 10 \ --chunk-overlap 10 # Long meeting recording orangenote-cli transcribe meeting.wav \ --model medium \ --language en \ --chunk-size 5 \ --output meeting_transcript.json

Recommendations

Audio Duration	Recommended Chunk Size	Model
< 10 minutes	No chunking needed	base/small
10-30 minutes	5 minutes	small/medium
30-60 minutes	5-10 minutes	medium
> 1 hour	5 minutes	medium/large

Output Example

🎵 Processing audio... 📦 Using chunked transcription (5 min chunks, 5s overlap) Processing chunk 1/12... Processing chunk 2/12... ... Processing chunk 12/12... ✓ Transcription complete! Detected language: ru Segments: 156 Average confidence: 94.32%

Model Comparison

Model	Size	Speed	Accuracy	RAM	Best For
tiny	39M	⭐⭐⭐⭐⭐	⭐	~500MB	Quick tests
base	140M	⭐⭐⭐⭐	⭐⭐	~1GB	Short clips
small	466M	⭐⭐⭐	⭐⭐⭐	~2GB	General use
medium	1.5G	⭐⭐	⭐⭐⭐⭐	~5GB	Quality transcription
large	2.9G	⭐	⭐⭐⭐⭐⭐	~10GB	Maximum accuracy

Recommendation: Use small or medium model with --language flag for best results.

Output Formats

JSON — Full structured output with timestamps and confidence scores
SRT — SubRip format for video players
VTT — WebVTT format for web videos
TXT — Plain text (text only, no timestamps)
TSV — Tab-separated for spreadsheets

Examples

Transcribe Russian podcast

orangenote-cli transcribe podcast.mp3 \ --model medium \ --language ru \ --chunk-size 5 \ --output transcript.json

Create video subtitles

orangenote-cli transcribe movie.m4a \ --model small \ --format srt \ --output subtitles.srt

High-accuracy English transcription

orangenote-cli transcribe interview.wav \ --model large \ --language en \ --threads 8 \ --chunk-size 5 \ --output interview.json

Translate French to English

orangenote-cli transcribe french_audio.mp3 \ --model medium \ --language fr \ --translate \ --output english_translation.txt

Batch processing

for file in *.mp3; do orangenote-cli transcribe "$file" \ --model small \ --chunk-size 5 \ --output "${file%.mp3}.json" done

Logging

Enable detailed logging:

# Verbose mode (debug level) orangenote-cli transcribe input.mp3 --verbose # Specific log level orangenote-cli transcribe input.mp3 --log-level debug

Troubleshooting

"[Music]" or repeated noise labels

Use chunking with a smaller chunk size:

orangenote-cli transcribe audio.mp3 --model medium --chunk-size 5

Poor transcription quality

Use a larger model (medium or large)
Specify the language explicitly with --language
Enable chunking for long files

File not found

Check the path exists:

ls -la your_file.mp3

Unsupported format

Convert the file first (requires ffmpeg):

ffmpeg -i input.ogg -ar 16000 -ac 1 output.wav

Out of memory

Use a smaller model:

orangenote-cli transcribe audio.mp3 --model tiny

Slow performance

Increase threads: --threads 8
Use a smaller model: --model small
Use chunking to process in parallel (future feature)

Project Status

Current Version: 0.2.0

Implemented Features

Basic CLI structure
Argument parsing and validation
Whisper backend integration
Model management (download, list, remove)
Multiple output formats (JSON, SRT, VTT, TXT, TSV)
Audio chunking for long files
Duplicate segment removal
Confidence-based segment selection

Roadmap

Diarization support (speaker identification)
Result caching
REST API
Desktop UI (Tauri)
Parallel chunk processing

Documentation

Full documentation is available in Russian: doc/README_RU.md

Development

Requirements

# Update Rust rustup update # Install development tools rustup component add rustfmt clippy

Build

# Debug build cargo build --features whisper # Release build cargo build --release --features whisper # Run tests cargo test --features whisper

Code Quality

# Format code cargo fmt # Lint cargo clippy --features whisper # Run all tests cargo test --features whisper

Project Structure

orangenote-cli/ ├── src/ │ ├── lib.rs # Library exports │ ├── bin/ │ │ └── orangenote-cli.rs # Main CLI binary │ └── infrastructure/ │ ├── audio/ │ │ ├── chunk.rs # Audio chunking │ │ ├── decoder.rs # Audio decoding │ │ └── processor.rs # Audio processing │ └── transcription/ │ └── whisper/ │ ├── transcriber.rs # Main transcription engine │ ├── merger.rs # Result merging │ ├── context.rs # Whisper context wrapper │ └── model_manager.rs # Model management ├── vendor/ │ └── whisper.cpp/ # Whisper.cpp submodule ├── doc/ # Documentation ├── Cargo.toml # Project manifest └── build.rs # Build script for whisper.cpp

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

MIT License - see LICENSE file for details

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
.github/workflows		.github/workflows
.kilocode/rules-orchestrator		.kilocode/rules-orchestrator
docs		docs
scripts		scripts
src		src
vendor		vendor
.gitignore		.gitignore
.gitmodules		.gitmodules
CHANGELOG.md		CHANGELOG.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
GETTING_STARTED.md		GETTING_STARTED.md
GIT_COMMIT_GUIDE.md		GIT_COMMIT_GUIDE.md
INDEX.md		INDEX.md
LICENSE		LICENSE
PROGRESS.md		PROGRESS.md
README.md		README.md
ROADMAP.md		ROADMAP.md
STEP_A2.md		STEP_A2.md
STEP_A3.md		STEP_A3.md
STEP_A3_CHECKLIST.md		STEP_A3_CHECKLIST.md
STEP_A3_COMPLETION_REPORT.md		STEP_A3_COMPLETION_REPORT.md
STEP_A3_FILES.txt		STEP_A3_FILES.txt
STEP_A3_SUMMARY.md		STEP_A3_SUMMARY.md
STEP_A4.md		STEP_A4.md
STEP_A4_COMMIT_GUIDE.md		STEP_A4_COMMIT_GUIDE.md
STEP_A5.md		STEP_A5.md
STEP_A5_COMMIT_GUIDE.md		STEP_A5_COMMIT_GUIDE.md
STEP_A5_FINAL_REPORT.md		STEP_A5_FINAL_REPORT.md
STEP_A5_QUICK_REFERENCE.md		STEP_A5_QUICK_REFERENCE.md
STEP_A5_SUMMARY.md		STEP_A5_SUMMARY.md
SUMMARY.md		SUMMARY.md
WHISPER_BUILD_INSTRUCTIONS.md		WHISPER_BUILD_INSTRUCTIONS.md
WHISPER_SETUP_GUIDE.md		WHISPER_SETUP_GUIDE.md
build.rs		build.rs
setup-whisper.sh		setup-whisper.sh

Folders and files

Latest commit

History

Repository files navigation

OrangeNote CLI

Installation

macOS (Homebrew) — Recommended

From Source

Download Binary

Features

Quick Start

Basic Usage

Commands

Transcribe

Model Management

System Info

Audio Chunking for Long Files

How It Works

Usage

Recommendations

Output Example

Model Comparison

Output Formats

Examples

Transcribe Russian podcast

Create video subtitles

High-accuracy English transcription

Translate French to English

Batch processing

Logging

Troubleshooting

"[Music]" or repeated noise labels

Poor transcription quality

File not found

Unsupported format

Out of memory

Slow performance

Project Status

Implemented Features

Roadmap

Documentation

Development

Requirements

Build

Code Quality

Project Structure

Contributing

License

See Also

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages