# Barkdown

Voice-to-prompt pipeline: wake word detection → streaming STT → Claude refinement.

```mermaid
flowchart LR
    subgraph Input
        M[🎙️ Microphone]
    end
    subgraph Pipeline
        W[Wake Word\nDetection]
        T[Streaming\nTranscription]
        C[Claude\nRefinement]
    end
    subgraph Output
        CB[📋 Clipboard]
    end
    M --> W --> T --> C --> CB
```

## Features

- **Wake word activation** — Say "Hey Jarvis" to start (hands-free)
- **Real-time streaming** — See transcription as you speak
- **Best-in-class accuracy** — Parakeet TDT 0.6B v3 (#1 on HuggingFace Open ASR Leaderboard)
- **Auto endpoint detection** — Recording stops automatically when you stop speaking
- **Claude refinement** — Transform raw dictation into polished LLM prompts
- **GPU-accelerated** — <100ms inference on RTX 4090

## Installation

### NixOS / Nix Flakes (Recommended)

The project uses a Nix flake with devenv for reproducible development environments with CUDA support.

**Prerequisites:**

- Nix with flakes enabled
- direnv with nix-direnv
- NVIDIA GPU with drivers installed
```sh
# Clone and enter the directory
git clone https://github.com/lokaltog/barkdown.git
cd barkdown

# Allow direnv (loads the flake automatically)
direnv allow

# The environment provides:
# - Python 3.13 with uv
# - CUDA 12.x + cuDNN
# - PortAudio (connects to PulseAudio/PipeWire)
# - Audio tools (ffmpeg, sox)
```

On first load, direnv activates the flake and displays:

```
🎤 Parakeet ASR Development Environment
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Python: Python 3.13.x
CUDA:   12.x.x
cuDNN:  9.x.x
GPU:    NVIDIA GeForce RTX 4090
```

### Manual Installation

```sh
# Requires Python 3.13+, CUDA 12.x, and system audio libraries
pip install -e .

# or with uv
uv sync
```

## Quick Start

```sh
barkdown                # Default loop mode
barkdown --auto-refine  # Skip confirmation, auto-send to Claude
barkdown --studio       # Studio preset (aggressive VAD)
barkdown --long-form    # Long recording preset
barkdown --no-loop      # Single transcription then exit
```

## Architecture

```mermaid
flowchart TB
    subgraph TUI["Terminal UI (Textual)"]
        APP[BarkdownApp]
        PC[PipelineController]
        SP[Spectrogram]
        HP[History Panel]
        PP[Preview Panel]
    end
    subgraph Pipeline["Pipeline (Async)"]
        ORCH[PipelineOrchestrator]
        WWD[WakeWordDetector\nOpenWakeWord/CPU]
        STR[StreamingTranscriber\nParakeet TDT/GPU]
        REF[ClaudeRefiner\nclaude-agent-sdk]
        VAD[Silero VAD]
    end
    subgraph Persistence
        HIST[(history.json)]
    end
    APP --> PC
    PC <-->|events| ORCH
    ORCH --> WWD
    ORCH --> STR
    ORCH --> REF
    STR --> VAD
    APP --> SP
    APP --> HP
    APP --> PP
    HP <--> HIST
```

### Key Components

| Component | Technology | Purpose |
|-----------|------------|---------|
| Wake Word | OpenWakeWord (CPU) | Detects "Hey Jarvis" activation phrase |
| ASR | NVIDIA Parakeet TDT 0.6B (GPU) | Streaming speech-to-text |
| VAD | Silero VAD | Voice activity detection for auto-stop |
| Refinement | Claude via claude-agent-sdk | Transforms dictation into LLM prompts |
| Persistence | JSON | History storage at `~/.local/share/barkdown/history.json` |
| TUI | Textual | Async terminal interface |
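Since the history store is plain JSON, persistence needs nothing beyond the standard library. A minimal sketch of how appending to such a file can work — the entry fields and the `append_entry` helper are illustrative assumptions, not the project's actual schema:

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class HistoryEntry:
    timestamp: str
    raw: str       # raw transcription from the ASR
    refined: str   # Claude-refined prompt


def append_entry(path: Path, entry: HistoryEntry) -> None:
    """Append one entry to the JSON history file, creating it if missing."""
    path.parent.mkdir(parents=True, exist_ok=True)
    entries = json.loads(path.read_text()) if path.exists() else []
    entries.append(asdict(entry))
    path.write_text(json.dumps(entries, indent=2))
```

A read-modify-write like this is fine for a single-user TUI; a concurrent app would want file locking or an append-only format instead.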

## State Machine

The pipeline operates as an event-driven state machine:

```mermaid
stateDiagram-v2
    [*] --> IDLE
    IDLE --> LOADING: StartListening
    LOADING --> LISTENING_WAKE: ModelsLoaded
    LISTENING_WAKE --> RECORDING: WakeWordDetected
    LISTENING_WAKE --> IDLE: AbortOperation
    RECORDING --> TRANSCRIBING: SilenceDetected
    RECORDING --> IDLE: AbortOperation
    TRANSCRIBING --> AWAITING_CONFIRM: !auto_refine
    TRANSCRIBING --> REFINING: auto_refine
    TRANSCRIBING --> IDLE: AbortOperation
    AWAITING_CONFIRM --> REFINING: ConfirmRefinement(yes)
    AWAITING_CONFIRM --> COMPLETE: ConfirmRefinement(no)
    AWAITING_CONFIRM --> IDLE: AbortOperation
    REFINING --> COMPLETE: RefinementDone
    REFINING --> IDLE: AbortOperation
    COMPLETE --> LISTENING_WAKE: ContinueLoop
    COMPLETE --> IDLE: !loop_mode
    COMPLETE --> REFINING: RetryRefinement
    ERROR --> IDLE: ResetFromError
    note right of LISTENING_WAKE: CPU-only\n(OpenWakeWord)
    note right of RECORDING: VAD monitors\nfor silence
    note right of TRANSCRIBING: GPU inference\n(Parakeet)
    note right of REFINING: Claude API call
```

### States

| State | Description |
|-------|-------------|
| IDLE | Waiting for user to initiate |
| LOADING | Loading ML models (ASR, wake word) |
| LISTENING_WAKE | Listening for wake word on CPU |
| RECORDING | Recording audio, VAD monitors for silence |
| TRANSCRIBING | Processing audio through Parakeet ASR |
| AWAITING_CONFIRM | Showing transcription, waiting for user to confirm refinement |
| REFINING | Claude is transforming raw text into a polished prompt |
| COMPLETE | Output ready, copied to clipboard |
| ERROR | Error state, can reset to IDLE |

### Events

**User Actions:**

- `StartListening` — Begin the pipeline
- `AbortOperation` — Cancel the current operation (works from any interruptible state)
- `ConfirmRefinement` — Confirm or skip Claude refinement
- `RetryRefinement` — Re-run refinement from the COMPLETE state
- `ResetFromError` — Clear the error and return to IDLE

**Internal Events:**

- `ModelsLoaded` — Models ready, proceed to listening
- `WakeWordDetected` — Wake phrase detected, start recording
- `SilenceDetected` — Silence threshold reached, stop recording
- `TranscriptionComplete` — ASR finished
- `RefinementDone` — Claude refinement complete
- `ContinueLoop` — Continue to the next iteration (loop mode)
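The transitions above boil down to a plain lookup table. A minimal sketch — state and event names follow the diagram, but the `step` dispatcher and its unknown-event behavior are illustrative assumptions:

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    LOADING = auto()
    LISTENING_WAKE = auto()
    RECORDING = auto()
    TRANSCRIBING = auto()
    AWAITING_CONFIRM = auto()
    REFINING = auto()
    COMPLETE = auto()
    ERROR = auto()


# (state, event) -> next state; events are plain strings here for brevity.
# TranscriptionComplete is routed as if auto_refine were false; with
# auto_refine enabled it would go straight to REFINING.
TRANSITIONS: dict[tuple[State, str], State] = {
    (State.IDLE, "StartListening"): State.LOADING,
    (State.LOADING, "ModelsLoaded"): State.LISTENING_WAKE,
    (State.LISTENING_WAKE, "WakeWordDetected"): State.RECORDING,
    (State.RECORDING, "SilenceDetected"): State.TRANSCRIBING,
    (State.TRANSCRIBING, "TranscriptionComplete"): State.AWAITING_CONFIRM,
    (State.AWAITING_CONFIRM, "ConfirmRefinement:yes"): State.REFINING,
    (State.AWAITING_CONFIRM, "ConfirmRefinement:no"): State.COMPLETE,
    (State.REFINING, "RefinementDone"): State.COMPLETE,
    (State.COMPLETE, "ContinueLoop"): State.LISTENING_WAKE,
    (State.COMPLETE, "RetryRefinement"): State.REFINING,
    (State.ERROR, "ResetFromError"): State.IDLE,
}

# AbortOperation returns to IDLE from any interruptible state
INTERRUPTIBLE = {State.LISTENING_WAKE, State.RECORDING,
                 State.TRANSCRIBING, State.AWAITING_CONFIRM, State.REFINING}


def step(state: State, event: str) -> State:
    """Apply one event; unknown (state, event) pairs fall through to ERROR."""
    if event == "AbortOperation" and state in INTERRUPTIBLE:
        return State.IDLE
    return TRANSITIONS.get((state, event), State.ERROR)
```

Keeping transitions in data rather than nested `if` branches makes the machine easy to test exhaustively and to render back into a diagram.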

## Data Flow

```mermaid
sequenceDiagram
    autonumber
    participant User
    participant TUI as BarkdownApp
    participant Orch as PipelineOrchestrator
    participant Wake as WakeWordDetector
    participant ASR as StreamingTranscriber
    participant Claude as ClaudeRefiner
    participant Clip as Clipboard
    User->>TUI: Press Enter
    TUI->>Orch: StartListening
    Orch->>Orch: Load models (if needed)
    Orch->>Wake: Start detection
    User->>Wake: "Hey Jarvis"
    Wake-->>Orch: WakeWordDetected
    Orch->>ASR: Start streaming
    loop While speaking
        User->>ASR: Audio chunks
        ASR-->>TUI: Partial transcription
        ASR->>ASR: VAD monitoring
    end
    ASR-->>Orch: SilenceDetected
    Orch->>ASR: Finalize
    ASR-->>Orch: TranscriptionComplete
    alt auto_refine = false
        Orch-->>TUI: AWAITING_CONFIRM
        TUI->>User: Show transcription
        User->>TUI: Confirm (Y)
        TUI->>Orch: ConfirmRefinement(yes)
    end
    Orch->>Claude: Refine raw text
    Claude-->>Orch: RefinementDone
    Orch-->>TUI: COMPLETE
    TUI->>Clip: Copy refined text
    TUI->>User: Show result
```
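The TUI/orchestrator exchange in this diagram is ultimately message passing over queues. A stripped-down asyncio illustration — the coroutine names, event strings, and two-queue shape are hypothetical stand-ins, not the project's actual API:

```python
import asyncio


async def orchestrator(events: asyncio.Queue, ui: asyncio.Queue) -> None:
    """Consume user events, emit pipeline status updates to the UI."""
    while True:
        event = await events.get()
        if event == "StartListening":
            await ui.put("LISTENING_WAKE")
        elif event == "Quit":
            await ui.put("DONE")
            return


async def main() -> list[str]:
    events: asyncio.Queue = asyncio.Queue()
    ui: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(orchestrator(events, ui))
    await events.put("StartListening")
    await events.put("Quit")
    await task
    # Drain the UI queue to see what the orchestrator reported
    updates = []
    while not ui.empty():
        updates.append(ui.get_nowait())
    return updates
```

Because each side only ever awaits its own queue, neither the UI nor the pipeline can block the other — the same property the real Textual message-passing setup relies on.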

## Configuration

Configuration is read from `~/.config/barkdown/config.toml` or from environment variables prefixed with `BARKDOWN_`:

```toml
# Audio
sample_rate = 16000
channels = 1

# Wake word
wake_word_model = "hey_jarvis"
wake_word_threshold = 0.5

# ASR
parakeet_model = "nvidia/parakeet-tdt-0.6b-v3"

# VAD / Recording
vad_threshold = 0.5
silence_threshold_sec = 1.5
max_recording_sec = 60.0

# Claude
claude_model = "claude-opus-4-5-20251101"
auto_refine = false

# Audio processing
normalize_audio = false
strip_silence = true
```

## Presets

**Studio** (`--studio`) — For high-quality studio equipment:

- Higher VAD threshold (0.7)
- Faster endpoint detection (1.0s silence)
- LUFS normalization enabled

**Long-form** (`--long-form`) — For meetings and lectures:

- Lower VAD threshold (0.3)
- Longer silence tolerance (3.0s)
- 1 hour max recording
- Auto-chunking for memory efficiency
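One way to model such presets is a frozen base-config dataclass with per-preset overrides — the field names follow the config file above, but the dataclass structure itself is an assumption:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RecordingConfig:
    vad_threshold: float = 0.5
    silence_threshold_sec: float = 1.5
    max_recording_sec: float = 60.0
    normalize_audio: bool = False


# Presets only state what differs from the defaults
STUDIO = replace(RecordingConfig(), vad_threshold=0.7,
                 silence_threshold_sec=1.0, normalize_audio=True)
LONG_FORM = replace(RecordingConfig(), vad_threshold=0.3,
                    silence_threshold_sec=3.0, max_recording_sec=3600.0)
```

Using `dataclasses.replace` keeps each preset to the handful of fields it actually changes, so the defaults remain the single source of truth.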

## Keyboard Shortcuts

| Key | Action |
|-----|--------|
| Enter | Start listening / Confirm |
| Y | Confirm refinement |
| N | Skip refinement |
| Escape | Abort current operation |
| R | Retry refinement |
| C | Copy refined text |
| Shift+C | Copy raw transcription |
| ? | Show help |
| Q | Quit |

## Requirements

**Hardware:**

- NVIDIA GPU with CUDA 12.x support
- ~4GB VRAM (Parakeet TDT 0.6B)
- Microphone

**Software** (handled by the Nix flake):

- Python 3.13+
- CUDA 12.x + cuDNN
- PulseAudio/PipeWire + PortAudio

**API Keys:**

- `ANTHROPIC_API_KEY` for Claude refinement

## Development

### With Nix (recommended)

```sh
# direnv auto-loads the environment on cd
cd barkdown

# Install Python dependencies (auto-runs via devenv)
uv sync --extra dev

# Run the app
barkdown

# Run tests (use cuda-uv for GPU tests)
cuda-uv run pytest

# Type checking
pyright

# Linting
ruff check .
```

### Without Nix

Ensure CUDA 12.x, cuDNN, and PortAudio are installed system-wide, then:

```sh
uv sync --extra dev
uv run barkdown
```

## Design Principles

- **Async-first** — All pipeline components use async/await, with thread executors for blocking I/O
- **Process isolation** — The pipeline runs in a subprocess, because PyTorch/CUDA can block the asyncio event loop even when wrapped in executors
- **Event-driven** — A state machine with explicit transitions and events
- **Lazy loading** — Heavy dependencies (NeMo, OpenWakeWord) are imported on first use for fast startup
- **Non-blocking TUI** — Textual message passing keeps the UI responsive
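Lazy loading can be sketched with a cached import helper — a generic illustration using a stdlib module as a stand-in for NeMo/OpenWakeWord, so the sketch stays runnable without GPU dependencies:

```python
import importlib
from functools import cache


@cache
def heavy_module(name: str):
    """Import a heavy dependency on first use; later calls hit the cache."""
    return importlib.import_module(name)


def transcribe_stub(text: str) -> str:
    # In the real app this would be something like heavy_module("nemo");
    # json stands in here so the example needs no ML stack installed.
    json = heavy_module("json")
    return json.dumps({"raw": text})
```

The payoff is startup time: the process launches instantly, and the multi-second model imports only happen when the pipeline first needs them.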

## License

MIT

## About

Stop typing prompts like a caveman. Barkdown is a voice-to-prompt toolkit that listens to your rambling, transcribes it in real time, and refines it into something Claude won't judge you for.
