A unified interface for browser speech synthesis and Eleven Labs voices.
```bash
# Using npm
npm install speech-provider

# Using yarn
yarn add speech-provider

# Using bun
bun add speech-provider
```

Full API documentation is available at https://osteele.github.io/speech-provider/.
```typescript
import { getVoiceProvider } from 'speech-provider';

// Use browser voices only
const browserProvider = getVoiceProvider({});

// Use Eleven Labs voices if API key is available
const elevenLabsProvider = getVoiceProvider({ elevenLabsApiKey: 'your-api-key' });

// Use Eleven Labs with custom cache duration
const provider = getVoiceProvider({
  elevenLabsApiKey: 'your-api-key',
  cacheMaxAge: 86400 // Cache for 1 day
});

// Get voices for a specific language
// (top-level await requires an ES module context)
const voices = await provider.getVoices({ lang: 'en-US', minVoices: 1 });

// Get default voice for a language
const defaultVoice = await provider.getDefaultVoice({ lang: 'en-US' });

// Create and play an utterance
if (defaultVoice) {
  const utterance = defaultVoice.createUtterance('Hello, world!');
  utterance.onstart = () => console.log('Started speaking');
  utterance.onend = () => console.log('Finished speaking');
  utterance.start();
}
```

- Unified interface for both browser speech synthesis and Eleven Labs voices
- Automatic fallback to browser voices when Eleven Labs API key is not provided
- Typesafe API with TypeScript support
- Simple voice selection by language
- Event listeners for speech start and end events
- Automatic caching of Eleven Labs API responses to reduce API calls
- Configurable cache duration for Eleven Labs responses
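
The automatic fallback listed above can be pictured with a small, self-contained sketch. This is not the package's implementation: `SketchProvider`, `SketchOptions`, and `chooseProvider` are hypothetical names, and treating an empty or whitespace-only key the same as a missing key is an assumption made for illustration.

```typescript
// Hypothetical stand-ins; the real package exports its own VoiceProvider type.
interface SketchProvider {
  name: string;
}

interface SketchOptions {
  elevenLabsApiKey?: string | null;
}

// Pick the Eleven Labs provider only when a non-empty key is supplied;
// otherwise fall back to browser speech synthesis.
// Assumption: an empty or whitespace-only key counts as "no key".
function chooseProvider(options: SketchOptions): SketchProvider {
  const key = options.elevenLabsApiKey?.trim();
  if (key) {
    return { name: 'Eleven Labs' };
  }
  return { name: 'Browser' };
}

console.log(chooseProvider({}).name); // Browser
console.log(chooseProvider({ elevenLabsApiKey: 'your-api-key' }).name); // Eleven Labs
```

Because the decision depends only on the options object, calling code can stay the same whether or not a key is configured.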
This package is used in Mandarin Sentence Practice, a web application for practicing Mandarin Chinese with listening and translation exercises. The app uses this package to provide high-quality text-to-speech for Mandarin sentences, with automatic fallback to browser voices when Eleven Labs is not available.
Creates a voice provider based on the available API keys. Falls back to browser speech synthesis if no API keys are provided.
```typescript
function getVoiceProvider(options: {
  elevenLabsApiKey?: string | null;
  cacheMaxAge?: number; // Cache duration in seconds (default: 1 hour)
}): VoiceProvider;
```

Creates an Eleven Labs voice provider with optional configuration.
```typescript
function createElevenLabsVoiceProvider(
  apiKey: string,
  options?: {
    validateResponses?: boolean;
    printVoiceProperties?: boolean;
    cacheMaxAge?: number; // Cache duration in seconds (default: 1 hour)
  }
): VoiceProvider;
```

The library implements automatic caching for Eleven Labs API responses:
- Browser voices are cached automatically by the browser's speech synthesis engine
- Eleven Labs responses are cached using IndexedDB with a default duration of 1 hour
- Cache duration can be configured when creating the provider
- Cached responses are automatically invalidated after the specified duration
- Cache can be disabled by setting `cacheMaxAge: null` in the provider options
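
The expiry rules above can be sketched roughly as follows. This is an illustration only, not the library's code: it swaps IndexedDB for an in-memory `Map` to stay short, and `TtlCache`, `CacheEntry`, and the injectable `now` clock are hypothetical names introduced here.

```typescript
// In-memory stand-in for the IndexedDB-backed cache (assumption for brevity).
interface CacheEntry<T> {
  value: T;
  storedAt: number; // milliseconds since the epoch
}

class TtlCache<T> {
  private entries = new Map<string, CacheEntry<T>>();
  private maxAgeSeconds: number | null;
  private now: () => number;

  // maxAgeSeconds mirrors cacheMaxAge: a duration in seconds,
  // where null or 0 disables caching. The default is 1 hour.
  constructor(maxAgeSeconds: number | null = 3600, now: () => number = Date.now) {
    this.maxAgeSeconds = maxAgeSeconds;
    this.now = now;
  }

  get(key: string): T | undefined {
    if (!this.maxAgeSeconds) return undefined; // caching disabled
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    const ageMs = this.now() - entry.storedAt;
    if (ageMs > this.maxAgeSeconds * 1000) {
      this.entries.delete(key); // expired: invalidate the entry
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    if (!this.maxAgeSeconds) return; // caching disabled: store nothing
    this.entries.set(key, { value, storedAt: this.now() });
  }
}

// Usage with a fake clock so expiry is easy to observe.
let fakeNow = 0;
const voiceCache = new TtlCache<string[]>(3600, () => fakeNow);
voiceCache.set('en-US', ['voice-a', 'voice-b']);
console.log(voiceCache.get('en-US')); // the cached array
fakeNow = 3_600_001; // just past one hour, in milliseconds
console.log(voiceCache.get('en-US')); // undefined
```

Passing the clock in as a parameter is a design choice for the sketch: it lets expiry be checked without waiting for real time to pass.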
Examples of cache configuration:
```typescript
import { getVoiceProvider } from 'speech-provider';

// Use default 1-hour cache
const defaultCacheProvider = getVoiceProvider({ elevenLabsApiKey: 'your-api-key' });

// Cache for 1 day
const dailyCacheProvider = getVoiceProvider({
  elevenLabsApiKey: 'your-api-key',
  cacheMaxAge: 86400 // 24 hours in seconds
});

// Cache for 1 week
const weeklyCacheProvider = getVoiceProvider({
  elevenLabsApiKey: 'your-api-key',
  cacheMaxAge: 604800 // 7 days in seconds
});

// Disable caching (preferred approach)
const uncachedProvider = getVoiceProvider({
  elevenLabsApiKey: 'your-api-key',
  cacheMaxAge: null
});

// Alternative way to disable caching
const zeroCacheProvider = getVoiceProvider({
  elevenLabsApiKey: 'your-api-key',
  cacheMaxAge: 0
});
```

```typescript
interface VoiceProvider {
  name: string;
  getVoices({ lang, minVoices }: { lang: string; minVoices: number }): Promise<Voice[]>;
  getDefaultVoice({ lang }: { lang: string }): Promise<Voice | null>;
}

interface Voice {
  name: string;
  id: string;
  lang: string;
  provider: VoiceProvider;
  description: string | null;
  createUtterance(text: string): Utterance;
}

interface Utterance {
  start(): void;
  stop(): void;
  set onstart(callback: () => void);
  set onend(callback: () => void);
}
```

The browser speech synthesis provider (`BrowserVoiceProvider`) is supported in all modern browsers:
- Chrome/Edge: Full support (voices load asynchronously)
- Firefox: Full support
- Safari: Full support (iOS and macOS)
- Opera: Full support
Note: Voice availability and quality vary by browser and operating system. Chrome and Edge typically offer the best selection of voices.
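
Since some browsers load voices asynchronously, code that talks to the Web Speech API directly usually has to wait for the `voiceschanged` event. The sketch below shows one way to do that. It is not taken from this package: `loadVoices`, `SynthLike`, `VoiceLike`, and the 2-second timeout are assumptions for illustration, and the timeout exists because not every browser fires the event reliably.

```typescript
// Minimal structural types so the sketch also type-checks outside the browser.
interface VoiceLike {
  name: string;
  lang: string;
}

interface SynthLike {
  getVoices(): VoiceLike[];
  addEventListener(type: 'voiceschanged', listener: () => void): void;
  removeEventListener(type: 'voiceschanged', listener: () => void): void;
}

// Resolve with the voice list, waiting for 'voiceschanged' if it is empty at first.
// After timeoutMs, resolve with whatever is available (possibly an empty array).
function loadVoices(synth: SynthLike, timeoutMs = 2000): Promise<VoiceLike[]> {
  return new Promise((resolve) => {
    const initial = synth.getVoices();
    if (initial.length > 0) {
      resolve(initial); // voices were already loaded
      return;
    }
    const finish = () => {
      clearTimeout(timer);
      synth.removeEventListener('voiceschanged', finish);
      resolve(synth.getVoices());
    };
    const timer = setTimeout(finish, timeoutMs);
    synth.addEventListener('voiceschanged', finish);
  });
}

// In a browser this would be called as: loadVoices(window.speechSynthesis)
```

Taking the synthesizer as a parameter keeps the function usable with a fake object in tests and avoids touching `window` at import time.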
The ElevenLabs provider (`ElevenLabsVoiceProvider`) requires:
- IndexedDB: For caching API responses (supported in all modern browsers)
- Fetch API: For making API requests (supported in all modern browsers)
- Audio API: For playing synthesized speech (supported in all modern browsers)
- Modern browser with ES2022 support
- IndexedDB support (for ElevenLabs caching)
- No Internet Explorer support
The library is designed for client-side use. When used in SSR environments:
- Browser voice provider gracefully handles the absence of `window.speechSynthesis`
- Returns empty arrays when browser APIs are unavailable
- Safe to import in SSR frameworks (Next.js, Nuxt, etc.) but should only be used client-side
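
The SSR behavior described above amounts to checking for the browser API before using it. A minimal sketch of that guard follows. It is not the package's source: `getBrowserVoicesSafely`, `SpeechSynthesisLike`, and `BrowserVoiceLike` are hypothetical names used only for this illustration.

```typescript
interface BrowserVoiceLike {
  name: string;
  lang: string;
}

interface SpeechSynthesisLike {
  getVoices(): BrowserVoiceLike[];
}

interface GlobalScopeLike {
  speechSynthesis?: SpeechSynthesisLike;
}

// Return the browser's voices, or an empty array when speech synthesis
// is missing (for example during server-side rendering).
function getBrowserVoicesSafely(
  scope: GlobalScopeLike = globalThis as unknown as GlobalScopeLike,
): BrowserVoiceLike[] {
  const synth = scope.speechSynthesis;
  if (!synth) return []; // no speech synthesis here: degrade to "no voices"
  return synth.getVoices();
}

// A scope without speechSynthesis, as on a server, yields no voices.
console.log(getBrowserVoicesSafely({}).length); // 0
```

Reading the global inside the function, rather than at module load, is what keeps a module like this safe to import on the server.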
Contributions are welcome! Please read the CONTRIBUTING.md guide for details on our code of conduct and the process for submitting pull requests.
See CHANGELOG.md for a list of changes and version history.
Copyright 2025 by Oliver Steele
Available under the MIT License