Linux Speech Tools

Professional text-to-speech and voice input tools for Linux, offering multi-engine TTS, voice recording, and support across major distributions.

🚀 Quick Installation

curl -fsSL https://raw.githubusercontent.com/pablopda/linux-speech-tools/main/installer.sh | bash

✨ Features

🎙️ Multi-Engine Text-to-Speech

Edge TTS: High-quality cloud-based synthesis, with regional voices for 22 Latin American countries
Kokoro TTS: Offline neural voice synthesis
Festival TTS: Local fallback engine
Graceful fallbacks: Automatic engine switching for maximum reliability
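The fallback order above can be sketched in shell roughly as follows. This is only an illustration under stated assumptions, not the project's actual selection logic: the function name `pick_engine` is hypothetical, `kokoro-tts` is an assumed command name, and a real implementation would probably also check network availability before choosing Edge TTS.

```shell
# Hypothetical sketch: print the first candidate command found on PATH.
# The candidate names are passed in by the caller; none are hard-coded here.
pick_engine() {
    local candidate
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1  # no candidate is installed
}

# Example: prefer cloud synthesis, then offline neural, then Festival.
# "kokoro-tts" is an assumed executable name used for illustration only.
engine=$(pick_engine edge-tts kokoro-tts festival) || engine="none"
echo "Selected engine: $engine"
```

Passing the candidates as arguments keeps the preference order in one visible place and makes the function easy to exercise with any command names.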

🗣️ Voice Input & Recording

Toggle recording: Press once to start, again to stop (default mode)
Speech-to-text: Powered by OpenAI Whisper for accurate transcription
Auto-clipboard: Transcription automatically copied to clipboard
GNOME integration: Global hotkey (Ctrl+Alt+V) for system-wide voice input
Smart detection: Terminal vs GUI application handling
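The "press once to start, again to stop" behaviour can be modelled with a PID file. The sketch below is an assumption about how such a toggle might work, not a copy of `toggle-speech.sh`: the state-file location and the function name `toggle_recording` are hypothetical, and `sleep` stands in for a real recorder process.

```shell
# Hypothetical state file; the real tools may keep their state elsewhere.
STATE_FILE="${STATE_FILE:-${TMPDIR:-/tmp}/speech-toggle-demo.pid}"

toggle_recording() {
    if [ -f "$STATE_FILE" ]; then
        # Second press: stop the background process and clear the state.
        kill "$(cat "$STATE_FILE")" 2>/dev/null || true
        rm -f "$STATE_FILE"
        echo "stopped"
    else
        # First press: start a background process and remember its PID.
        # "sleep 3600" is a stand-in for an actual audio recorder.
        # Output is redirected so the caller's pipe is not held open.
        sleep 3600 >/dev/null 2>&1 &
        echo "$!" > "$STATE_FILE"
        echo "started"
    fi
}
```

Calling `toggle_recording` twice in a row prints `started` and then `stopped`, and leaves no state file behind.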

🎵 Enhanced Audio Streaming ⭐ NEW

Continuous playback: Eliminates gaps between audio chunks
Professional quality: Broadcast-level smooth TTS streaming
Smart concatenation: Uses ffmpeg/sox for seamless audio joining
Multiple modes: Continuous, buffered, and original streaming options
Drop-in replacement: Enhanced versions of existing commands
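One way to join audio chunks without gaps is ffmpeg's concat demuxer, sketched below. This is one possible approach, shown under assumptions: the function names `write_concat_list` and `join_chunks` are hypothetical, and the project's scripts may instead use sox or a different ffmpeg invocation.

```shell
# Write a concat-demuxer list ("file '<path>'" per line) for the given chunks.
# Limitation of this sketch: paths containing single quotes are not escaped.
write_concat_list() {
    local chunk
    for chunk in "$@"; do
        printf "file '%s'\n" "$chunk"
    done
}

# Join chunks into one file without re-encoding. Requires ffmpeg, and the
# chunks must share codec and parameters. Pass absolute paths, because list
# entries are resolved relative to the directory of the list file.
join_chunks() {
    local output list status
    output=$1
    shift
    list=$(mktemp)
    write_concat_list "$@" > "$list"
    status=0
    ffmpeg -loglevel error -y -f concat -safe 0 -i "$list" -c copy "$output" || status=$?
    rm -f "$list"
    return "$status"
}
```

A call such as `join_chunks /tmp/article.mp3 /tmp/chunk1.mp3 /tmp/chunk2.mp3` would then produce a single file that a player can read in one pass.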

🎮 GNOME Media Controls ⭐ LATEST

Desktop media controls: Play/pause/stop from notification panel
Real-time progress: Visual progress tracking for reading sessions
Native integration: Professional media player experience for TTS
Document information: Display source title and reading status
Notification controls: Never lose control of long reading sessions

🖥️ Command-Line Tools

say - Text-to-speech with file output support
say-local - Local TTS using Festival/Kokoro
say-read - Read URLs, PDFs, and documents with TTS
say-read-es - Spanish language content reader
talk2claude - Voice input with transcription

🐧 Cross-Platform Linux Support

Ubuntu 20.04, 22.04
Debian 11, 12
Fedora 38, 39
Automatic dependency detection and installation
XDG-compliant configuration management
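XDG-compliant lookup usually means honouring `XDG_CONFIG_HOME` and falling back to `~/.config` when it is unset. A minimal sketch, assuming the `speech-tools/config` path used in the Configuration section; the function name `config_path` is hypothetical:

```shell
# Print the expected config file location following the XDG convention:
# $XDG_CONFIG_HOME if set and non-empty, otherwise $HOME/.config.
config_path() {
    printf '%s\n' "${XDG_CONFIG_HOME:-$HOME/.config}/speech-tools/config"
}

echo "Config file: $(config_path)"
```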

📖 Usage Examples

Basic Text-to-Speech

# Simple speech
say "Hello from Linux Speech Tools!"

# Spanish voice
say -v es-ES-AlvaroNeural "¡Hola mundo!"

# Save to file
say -o greeting.mp3 "Welcome to our application"

# Show available options
say --help

🎤 Voice Input

GNOME Integration (Recommended):

# Install GNOME integration
./install-gnome-integration.sh

# Use system-wide hotkey: Ctrl+Alt+V
# Press once → Start recording
# Press again → Stop and transcribe

Command Line:

# Toggle mode (default)
./toggle-speech.sh toggle    # Start/stop recording
./toggle-speech.sh start     # Start only
./toggle-speech.sh stop      # Stop only

# Fixed duration mode
./simple-speech.sh 5         # 5-second recording

# Original talk2claude (advanced)
talk2claude                  # 8-second recording
talk2claude start            # Background recording
talk2claude stop             # Stop and transcribe

📖 Content Reading

🎵 Enhanced: Continuous Streaming (NEW)

# Smooth, gap-free audio streaming
./say-read-continuous https://example.com/article

# Professional-quality playback for long content
./say-read-smooth --buffered https://en.wikipedia.org/wiki/Linux

# Interactive demo showing improvement
./demo-audio-streaming.sh

🎮 GNOME Media Controls (LATEST)

# Reading with desktop media controls
./say-read-gnome https://www.bbc.com/news/technology

# Control playback from notification panel:
# ⏸️ Pause - Click to pause reading
# ▶️ Resume - Click to resume reading
# ⏹️ Stop - Click to stop completely

# Setup GNOME integration (first time)
./say-read-gnome --setup

# Interactive demo and testing
./demo-gnome-media-integration.sh

📚 Standard Reading

# Read web articles
say-read https://example.com/article

# Read PDF documents
say-read document.pdf

# Read with Spanish voice
say-read-es https://elpais.com/tecnologia/

🔧 Installation Methods

Option 1: One-Command Install (Recommended)

curl -fsSL https://raw.githubusercontent.com/pablopda/linux-speech-tools/main/installer.sh | bash

Option 2: Manual Installation

git clone https://github.com/pablopda/linux-speech-tools.git
cd linux-speech-tools
./installer.sh

Option 3: Package Installation

Download packages from Releases:

Ubuntu/Debian:

wget https://github.com/pablopda/linux-speech-tools/releases/download/v1.0.0/linux-speech-tools_1.0.0.deb
sudo dpkg -i linux-speech-tools_1.0.0.deb

Fedora/RHEL:

wget https://github.com/pablopda/linux-speech-tools/releases/download/v1.0.0/linux-speech-tools-1.0.0-1.noarch.rpm
sudo rpm -i linux-speech-tools-1.0.0-1.noarch.rpm

⚙️ Configuration

Voice Configuration

Create ~/.config/speech-tools/config:

# Default voice for Edge TTS
EDGE_VOICE=en-US-EmmaMultilingualNeural

# Voice input settings
ASR_LANG=en
WHISPER_MODEL=large-v3
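Because the file uses plain `KEY=value` lines, a script could load it by sourcing it after setting fallback values. The sketch below assumes that mechanism; the function name `load_speech_config` is hypothetical, and the fallback values simply mirror the example above rather than the tools' documented defaults.

```shell
# Hypothetical loader: set fallback values, then let the config file override them.
load_speech_config() {
    local config_file
    config_file=$1

    # Fallbacks copied from the example config; the real defaults may differ.
    EDGE_VOICE="en-US-EmmaMultilingualNeural"
    ASR_LANG="en"
    WHISPER_MODEL="large-v3"

    # Source the file only if it exists and is readable.
    if [ -r "$config_file" ]; then
        . "$config_file"
    fi
}

load_speech_config "$HOME/.config/speech-tools/config"
echo "Voice: $EDGE_VOICE, ASR language: $ASR_LANG, Whisper model: $WHISPER_MODEL"
```

Sourcing executes the file as shell code, so this approach only suits config files the user controls.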

Available Voices

# List Edge TTS voices
edge-tts --list-voices | grep -E "(Male|Female)"

# Test different voices
say -v en-GB-SoniaNeural "British English"
say -v es-MX-DaliaNeural "Mexican Spanish"
say -v pt-BR-AntonioNeural "Brazilian Portuguese"

🔍 Troubleshooting

Audio Issues

# Test audio output
say "Audio test"

# Check audio devices
pactl list short sinks

# Install audio dependencies
sudo apt install pulseaudio-utils   # Ubuntu/Debian
sudo dnf install pulseaudio-utils   # Fedora

Dependency Issues

# Install Python dependencies manually
pip3 install edge-tts pyaudio speechrecognition

# Install system dependencies
sudo apt install python3-pip ffmpeg espeak-ng portaudio19-dev   # Ubuntu/Debian
sudo dnf install python3-pip ffmpeg espeak-ng portaudio-devel   # Fedora

Permission Issues

# Make scripts executable
chmod +x ~/.local/bin/{say,say-local,talk2claude}

# Add to PATH if needed
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
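Running the `echo ... >> ~/.bashrc` line more than once appends duplicate entries. A small check like the following can tell whether the directory is already on `PATH`; the helper name `path_contains` is hypothetical and not part of the project.

```shell
# Return success if the given directory is already an entry in PATH.
# Wrapping both sides in colons makes the match exact per entry.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if path_contains "$HOME/.local/bin"; then
    echo "$HOME/.local/bin is already on PATH"
else
    echo "$HOME/.local/bin is missing from PATH"
fi
```

The append to `~/.bashrc` can then be guarded with `path_contains "$HOME/.local/bin" || ...` so it only runs when needed.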

🧪 Development

Running Tests

# Run full test suite
python3 tests/test_speech_tools.py

# Quick validation
./scripts/quick-release-check.sh

# Comprehensive validation
./scripts/pre-release-check.sh

Creating Releases

# Patch release (1.0.0 -> 1.0.1)
./release.sh patch

# Minor release (1.0.0 -> 1.1.0)
./release.sh minor

# Preview release
./release.sh patch --dry-run
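The version arithmetic behind `patch` and `minor` can be illustrated with a small function. This is a sketch of standard semantic-version bumping, not the contents of `release.sh`; the function name `bump_version` and the `major` case are assumptions added for completeness.

```shell
# Bump one part of a MAJOR.MINOR.PATCH version string and print the result.
bump_version() {
    local version part major minor patch
    version=$1
    part=$2

    # Split "1.2.3" into its three numeric fields.
    major=${version%%.*}
    patch=${version##*.}
    minor=${version#*.}
    minor=${minor%.*}

    case "$part" in
        major) major=$((major + 1)); minor=0; patch=0 ;;
        minor) minor=$((minor + 1)); patch=0 ;;
        patch) patch=$((patch + 1)) ;;
        *) echo "unknown part: $part" >&2; return 1 ;;
    esac

    printf '%s.%s.%s\n' "$major" "$minor" "$patch"
}

bump_version 1.0.0 patch   # prints 1.0.1
bump_version 1.0.0 minor   # prints 1.1.0
```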

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Quick Start for Contributors

git clone https://github.com/pablopda/linux-speech-tools.git
cd linux-speech-tools

# Install development dependencies
./installer.sh

# Run tests
python3 tests/test_speech_tools.py

# Submit changes
git checkout -b feature/your-feature
# Make changes
./scripts/quick-release-check.sh
git commit -m "Add your feature"
git push origin feature/your-feature
# Create pull request

📋 Requirements

System Requirements

OS: Linux (Ubuntu 20.04+, Debian 11+, Fedora 38+)
Python: 3.7+
Audio: PulseAudio or ALSA
Network: Internet connection for Edge TTS

Dependencies

python3-pip
ffmpeg
espeak-ng
portaudio19-dev (Ubuntu/Debian) or portaudio-devel (Fedora)

All dependencies are automatically installed by the installer script.

📚 Documentation

Installation Guide
API Documentation (coming soon)
Voice Configuration Guide (coming soon)
Troubleshooting Guide (coming soon)

📊 Project Status

✅ Production Ready: Comprehensive testing across multiple distributions
✅ Actively Maintained: Regular updates and improvements
✅ Community Driven: Open to contributions and feature requests
✅ Professional Quality: Enterprise-grade CI/CD and release automation

🔗 Links

Repository: https://github.com/pablopda/linux-speech-tools
Releases: https://github.com/pablopda/linux-speech-tools/releases
Issues: https://github.com/pablopda/linux-speech-tools/issues
Discussions: https://github.com/pablopda/linux-speech-tools/discussions

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

OpenAI Whisper for speech recognition
Microsoft Edge TTS for cloud synthesis
Kokoro ONNX for offline synthesis
Festival Speech Synthesis System
The open-source Linux community

Made with ❤️ for the Linux community

Professional speech tools that just work. 🐧🎙️
