Voice-to-prompt pipeline: wake word detection → streaming STT → Claude refinement.
```mermaid
flowchart LR
    subgraph Input
        M[🎙️ Microphone]
    end
    subgraph Pipeline
        W[Wake Word\nDetection]
        T[Streaming\nTranscription]
        C[Claude\nRefinement]
    end
    subgraph Output
        CB[📋 Clipboard]
    end
    M --> W --> T --> C --> CB
```

- Wake word activation — Say "Hey Jarvis" to start (hands-free)
- Real-time streaming — See transcription as you speak
- Best-in-class accuracy — Parakeet TDT 0.6B v3 (#1 on HuggingFace Open ASR Leaderboard)
- Auto endpoint detection — Recording stops automatically when you stop speaking
- Claude refinement — Transform raw dictation into polished LLM prompts
- GPU-accelerated — <100ms inference on RTX 4090
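The end-to-end flow above can be sketched as a chain of async stages. Everything below is an illustrative stub: the function names and the string-based "audio" are stand-ins, not barkdown's real API.

```python
import asyncio

async def mic(samples):
    # Stand-in for a live microphone stream
    for s in samples:
        yield s

async def detect_wake_word(chunks):
    # Consume chunks until the (stubbed) wake phrase appears
    async for chunk in chunks:
        if chunk == "hey jarvis":
            return

async def transcribe(chunks):
    # Collect the remaining chunks into a raw transcript
    return " ".join([c async for c in chunks])

async def refine(text):
    # Stand-in for the Claude refinement call
    return f"Refined: {text}"

async def run():
    stream = mic(["noise", "hey jarvis", "write", "a", "haiku"])
    await detect_wake_word(stream)   # blocks until activation
    raw = await transcribe(stream)   # resumes where the wake stage stopped
    return await refine(raw)         # polished prompt, ready for the clipboard

print(asyncio.run(run()))  # Refined: write a haiku
```

Note that the wake-word and transcription stages share one stream: breaking out of `async for` leaves the generator suspended, so the next stage picks up exactly where the previous one stopped.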
The project uses a Nix flake with devenv for reproducible development environments with CUDA support.
Prerequisites:
- Nix with flakes enabled
- direnv with nix-direnv
- NVIDIA GPU with drivers installed
```bash
# Clone and enter the directory
git clone https://github.com/lokaltog/barkdown.git
cd barkdown

# Allow direnv (loads the flake automatically)
direnv allow

# The environment provides:
# - Python 3.13 with uv
# - CUDA 12.x + cuDNN
# - PortAudio (connects to PulseAudio/PipeWire)
# - Audio tools (ffmpeg, sox)
```

On first load, direnv activates the flake and displays:
```text
🎤 Parakeet ASR Development Environment
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Python: Python 3.13.x
CUDA:   12.x.x
cuDNN:  9.x.x
GPU:    NVIDIA GeForce RTX 4090
```

```bash
# Requires Python 3.13+, CUDA 12.x, and system audio libraries
pip install -e .

# or with uv
uv sync
```

```bash
barkdown                # Default loop mode
barkdown --auto-refine  # Skip confirmation, auto-send to Claude
barkdown --studio       # Studio preset (aggressive VAD)
barkdown --long-form    # Long recording preset
barkdown --no-loop      # Single transcription, then exit
```

```mermaid
flowchart TB
    subgraph TUI["Terminal UI (Textual)"]
        APP[BarkdownApp]
        PC[PipelineController]
        SP[Spectrogram]
        HP[History Panel]
        PP[Preview Panel]
    end
    subgraph Pipeline["Pipeline (Async)"]
        ORCH[PipelineOrchestrator]
        WWD[WakeWordDetector\nOpenWakeWord/CPU]
        STR[StreamingTranscriber\nParakeet TDT/GPU]
        REF[ClaudeRefiner\nclaude-agent-sdk]
        VAD[Silero VAD]
    end
    subgraph Persistence
        HIST[(history.json)]
    end
    APP --> PC
    PC <-->|events| ORCH
    ORCH --> WWD
    ORCH --> STR
    ORCH --> REF
    STR --> VAD
    APP --> SP
    APP --> HP
    APP --> PP
    HP <--> HIST
```

| Component | Technology | Purpose |
|---|---|---|
| Wake Word | OpenWakeWord (CPU) | Detects "Hey Jarvis" activation phrase |
| ASR | NVIDIA Parakeet TDT 0.6B (GPU) | Streaming speech-to-text |
| VAD | Silero VAD | Voice activity detection for auto-stop |
| Refinement | Claude via claude-agent-sdk | Transforms dictation into LLM prompts |
| TUI | Textual | Async terminal interface |
| Persistence | JSON | History storage at ~/.local/share/barkdown/history.json |
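History persistence is a plain JSON file; appending a record can be sketched as below. The entry schema (`raw`/`refined` keys) is hypothetical, not barkdown's actual format.

```python
import json
from pathlib import Path

# Default location from the table above
HISTORY_PATH = Path.home() / ".local/share/barkdown/history.json"

def append_history(raw: str, refined: str, path: Path = HISTORY_PATH) -> None:
    # Read the existing list (or start fresh), append, and rewrite atomically enough
    # for a single-user TUI; a real implementation might lock or use a temp file.
    path.parent.mkdir(parents=True, exist_ok=True)
    entries = json.loads(path.read_text()) if path.exists() else []
    entries.append({"raw": raw, "refined": refined})
    path.write_text(json.dumps(entries, indent=2))
```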
The pipeline operates as an event-driven state machine:
```mermaid
stateDiagram-v2
    [*] --> IDLE
    IDLE --> LOADING: StartListening
    LOADING --> LISTENING_WAKE: ModelsLoaded
    LISTENING_WAKE --> RECORDING: WakeWordDetected
    LISTENING_WAKE --> IDLE: AbortOperation
    RECORDING --> TRANSCRIBING: SilenceDetected
    RECORDING --> IDLE: AbortOperation
    TRANSCRIBING --> AWAITING_CONFIRM: !auto_refine
    TRANSCRIBING --> REFINING: auto_refine
    TRANSCRIBING --> IDLE: AbortOperation
    AWAITING_CONFIRM --> REFINING: ConfirmRefinement(yes)
    AWAITING_CONFIRM --> COMPLETE: ConfirmRefinement(no)
    AWAITING_CONFIRM --> IDLE: AbortOperation
    REFINING --> COMPLETE: RefinementDone
    REFINING --> IDLE: AbortOperation
    COMPLETE --> LISTENING_WAKE: ContinueLoop
    COMPLETE --> IDLE: !loop_mode
    COMPLETE --> REFINING: RetryRefinement
    ERROR --> IDLE: ResetFromError

    note right of LISTENING_WAKE: CPU-only\n(OpenWakeWord)
    note right of RECORDING: VAD monitors\nfor silence
    note right of TRANSCRIBING: GPU inference\n(Parakeet)
    note right of REFINING: Claude API call
```

| State | Description |
|---|---|
| IDLE | Waiting for user to initiate |
| LOADING | Loading ML models (ASR, wake word) |
| LISTENING_WAKE | Listening for wake word on CPU |
| RECORDING | Recording audio while VAD monitors for silence |
| TRANSCRIBING | Processing audio through Parakeet ASR |
| AWAITING_CONFIRM | Showing transcription, waiting for user to confirm refinement |
| REFINING | Claude is transforming raw text into a polished prompt |
| COMPLETE | Output ready, copied to clipboard |
| ERROR | Error state; can reset to IDLE |
User Actions:
- `StartListening` — Begin the pipeline
- `AbortOperation` — Cancel the current operation (works from any interruptible state)
- `ConfirmRefinement` — Confirm or skip Claude refinement
- `RetryRefinement` — Re-run refinement from the COMPLETE state
- `ResetFromError` — Clear the error and return to IDLE
Internal Events:
- `ModelsLoaded` — Models ready, proceed to listening
- `WakeWordDetected` — Wake phrase detected, start recording
- `SilenceDetected` — Silence threshold reached, stop recording
- `TranscriptionComplete` — ASR finished
- `RefinementDone` — Claude refinement complete
- `ContinueLoop` — Continue to the next iteration (loop mode)
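The states and events above can be modeled as a simple transition table. This is an illustrative toy, not barkdown's implementation: guarded transitions (`auto_refine`, `loop_mode`) are folded into distinct event names for simplicity.

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    LOADING = auto()
    LISTENING_WAKE = auto()
    RECORDING = auto()
    TRANSCRIBING = auto()
    AWAITING_CONFIRM = auto()
    REFINING = auto()
    COMPLETE = auto()
    ERROR = auto()

S = State
TRANSITIONS: dict[tuple[State, str], State] = {
    (S.IDLE, "StartListening"): S.LOADING,
    (S.LOADING, "ModelsLoaded"): S.LISTENING_WAKE,
    (S.LISTENING_WAKE, "WakeWordDetected"): S.RECORDING,
    (S.RECORDING, "SilenceDetected"): S.TRANSCRIBING,
    # Assuming auto_refine is off; with it on, this would go to REFINING
    (S.TRANSCRIBING, "TranscriptionComplete"): S.AWAITING_CONFIRM,
    (S.AWAITING_CONFIRM, "ConfirmRefinement:yes"): S.REFINING,
    (S.AWAITING_CONFIRM, "ConfirmRefinement:no"): S.COMPLETE,
    (S.REFINING, "RefinementDone"): S.COMPLETE,
    (S.COMPLETE, "ContinueLoop"): S.LISTENING_WAKE,
    (S.COMPLETE, "RetryRefinement"): S.REFINING,
    (S.ERROR, "ResetFromError"): S.IDLE,
}
# AbortOperation returns to IDLE from any interruptible state
for s in (S.LISTENING_WAKE, S.RECORDING, S.TRANSCRIBING,
          S.AWAITING_CONFIRM, S.REFINING):
    TRANSITIONS[(s, "AbortOperation")] = S.IDLE

def step(state: State, event: str) -> State:
    # Unknown (state, event) pairs are rejected rather than silently ignored
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.name} on {event}") from None
```

Keeping transitions in an explicit table makes illegal moves (say, refining before transcription has finished) fail loudly instead of corrupting pipeline state.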
```mermaid
sequenceDiagram
    autonumber
    participant User
    participant TUI as BarkdownApp
    participant Orch as PipelineOrchestrator
    participant Wake as WakeWordDetector
    participant ASR as StreamingTranscriber
    participant Claude as ClaudeRefiner
    participant Clip as Clipboard

    User->>TUI: Press Enter
    TUI->>Orch: StartListening
    Orch->>Orch: Load models (if needed)
    Orch->>Wake: Start detection
    User->>Wake: "Hey Jarvis"
    Wake-->>Orch: WakeWordDetected
    Orch->>ASR: Start streaming
    loop While speaking
        User->>ASR: Audio chunks
        ASR-->>TUI: Partial transcription
        ASR->>ASR: VAD monitoring
    end
    ASR-->>Orch: SilenceDetected
    Orch->>ASR: Finalize
    ASR-->>Orch: TranscriptionComplete
    alt auto_refine = false
        Orch-->>TUI: AWAITING_CONFIRM
        TUI->>User: Show transcription
        User->>TUI: Confirm (Y)
        TUI->>Orch: ConfirmRefinement(yes)
    end
    Orch->>Claude: Refine raw text
    Claude-->>Orch: RefinementDone
    Orch-->>TUI: COMPLETE
    TUI->>Clip: Copy refined text
    TUI->>User: Show result
```

Configuration via `~/.config/barkdown/config.toml` or environment variables (`BARKDOWN_*`):
```toml
# Audio
sample_rate = 16000
channels = 1

# Wake word
wake_word_model = "hey_jarvis"
wake_word_threshold = 0.5

# ASR
parakeet_model = "nvidia/parakeet-tdt-0.6b-v3"

# VAD / Recording
vad_threshold = 0.5
silence_threshold_sec = 1.5
max_recording_sec = 60.0

# Claude
claude_model = "claude-opus-4-5-20251101"
auto_refine = false

# Audio processing
normalize_audio = false
strip_silence = true
```

Studio (`--studio`) — For high-quality studio equipment:
- Higher VAD threshold (0.7)
- Faster endpoint detection (1.0s silence)
- LUFS normalization enabled
Long-form (`--long-form`) — For meetings and lectures:
- Lower VAD threshold (0.3)
- Longer silence tolerance (3.0s)
- 1 hour max recording
- Auto-chunking for memory efficiency
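The presets above amount to overriding a few recording defaults. A sketch using a frozen dataclass follows; the dataclass itself is illustrative, but the values come from the config and preset sections above.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class RecordingConfig:
    # Defaults match the config.toml sample above
    vad_threshold: float = 0.5
    silence_threshold_sec: float = 1.5
    max_recording_sec: float = 60.0
    normalize_audio: bool = False

DEFAULT = RecordingConfig()

# --studio: stricter VAD, faster endpointing, normalization on
STUDIO = replace(DEFAULT, vad_threshold=0.7, silence_threshold_sec=1.0,
                 normalize_audio=True)

# --long-form: permissive VAD, long silences tolerated, 1 hour cap
LONG_FORM = replace(DEFAULT, vad_threshold=0.3, silence_threshold_sec=3.0,
                    max_recording_sec=3600.0)
```

`replace` keeps presets as deltas over one set of defaults, so a new default value automatically flows into every preset that doesn't override it.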
| Key | Action |
|---|---|
| Enter | Start listening / Confirm |
| Y | Confirm refinement |
| N | Skip refinement |
| Escape | Abort current operation |
| R | Retry refinement |
| C | Copy refined text |
| Shift+C | Copy raw transcription |
| ? | Show help |
| Q | Quit |
Hardware:
- NVIDIA GPU with CUDA 12.x support
- ~4GB VRAM (Parakeet TDT 0.6B)
- Microphone
Software (handled by Nix flake):
- Python 3.13+
- CUDA 12.x + cuDNN
- PulseAudio/PipeWire + PortAudio
API Keys:
- `ANTHROPIC_API_KEY` — for Claude refinement
```bash
# direnv auto-loads the environment on cd
cd barkdown

# Install Python dependencies (auto-runs via devenv)
uv sync --extra dev

# Run the app
barkdown

# Run tests (use cuda-uv for GPU tests)
cuda-uv run pytest

# Type checking
pyright

# Linting
ruff check .
```

Ensure CUDA 12.x, cuDNN, and PortAudio are installed system-wide, then:
```bash
uv sync --extra dev
uv run barkdown
```

- Async-first — All pipeline components use async/await, with thread executors for blocking I/O
- Process isolation — The pipeline runs in a subprocess, because PyTorch/CUDA calls can block the asyncio event loop even when wrapped in executors
- Event-driven — State machine with explicit transitions and events
- Lazy loading — Heavy dependencies (NeMo, OpenWakeWord) imported on first use for fast startup
- Non-blocking TUI — Textual message passing keeps UI responsive
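The lazy-loading and executor principles above can be illustrated with a stub model; in the real app the deferred work would be the NeMo/torch import and GPU inference, not the toy below.

```python
import asyncio
import time
from functools import cache

@cache
def load_model():
    # Heavy initialization deferred until first call, cached afterwards,
    # so startup stays fast when transcription is never triggered
    time.sleep(0.05)  # stand-in for a slow model load
    return lambda audio: audio.upper()

async def transcribe(audio: str) -> str:
    loop = asyncio.get_running_loop()
    # Both the (lazy) load and the blocking inference run off the event loop
    model = await loop.run_in_executor(None, load_model)
    return await loop.run_in_executor(None, model, audio)

print(asyncio.run(transcribe("hello")))  # HELLO
```

`functools.cache` gives the one-time load for free, and `run_in_executor` keeps the Textual event loop responsive while the blocking call runs on a worker thread.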
MIT