Pipecat TTS Cache is a lightweight caching layer for the Pipecat ecosystem. It transparently wraps existing TTS services to eliminate API costs for repeated phrases and reduce response latency to <5ms.
See it in action: Watch the Demo Video
- **Ultra-Low Latency** – Delivers cached audio in ~0.1ms (Memory) or ~1-5ms (Redis).
- **Cost Reduction** – Stop paying your TTS provider for common phrases like "Hello," "One moment," or "I didn't catch that."
- **Universal Compatibility** – Works as a mixin with all Pipecat TTS services (Cartesia, ElevenLabs, Deepgram, Google, etc.).
- **Smart Interruption** – Automatically clears pending cache tasks and resets state when users interrupt the bot.
- **Precision Alignment** – Preserves word-level timestamps for accurate lip-syncing and subtitles, even on cached replays.
```shell
# Standard installation (Memory backend only)
pip install pipecat-tts-cache

# Production installation (with Redis support)
pip install "pipecat-tts-cache[redis]"
```

The caching layer intelligently handles different TTS architectures to ensure smooth playback regardless of the provider.
| Service Type | Caching Strategy | Supported Providers (Examples) |
|---|---|---|
| `AudioContextWordTTS` | **Batch Caching** – Splits audio at word boundaries and caches individual sentences. | Cartesia, Rime |
| `WordTTSService` | **Full Caching w/ Timestamps** – Caches the full response and preserves alignment data. | ElevenLabs, Hume |
| `TTSService` | **Standard Caching** – Caches the full audio response (no alignment data). | Google, OpenAI, Deepgram (HTTP) |
| `InterruptibleTTS` | **Sentence Caching** – Caches single-sentence responses only. | Sarvam, Deepgram (WebSocket) |
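To illustrate how strategy selection by service type can work, here is a minimal sketch. The class names mirror the table above, but `pick_strategy`, `STRATEGIES`, and the dispatch logic are illustrative assumptions, not the library's actual internals (which also handle `InterruptibleTTS`):

```python
# Illustrative sketch: choosing a caching strategy by inspecting the
# service's class hierarchy, most specific base class first.

class TTSService: ...
class WordTTSService(TTSService): ...
class AudioContextWordTTS(WordTTSService): ...

# Ordered from most specific to least specific base class.
STRATEGIES = [
    (AudioContextWordTTS, "batch"),           # split at word boundaries
    (WordTTSService, "full_with_timestamps"), # keep alignment data
    (TTSService, "standard"),                 # full audio, no alignment
]

def pick_strategy(service: TTSService) -> str:
    """Return the first strategy whose base class matches the service."""
    for base, name in STRATEGIES:
        if isinstance(service, base):
            return name
    raise TypeError(f"Unsupported service type: {type(service).__name__}")

print(pick_strategy(AudioContextWordTTS()))  # -> batch
print(pick_strategy(WordTTSService()))       # -> full_with_timestamps
```

Ordering the checks from most specific to least specific matters: since `AudioContextWordTTS` is also a `TTSService`, checking the generic base first would shadow the specialized strategies.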
The MemoryCacheBackend is perfect for local development or single-process bots. It uses an LRU (Least Recently Used) eviction policy.
```python
from pipecat_tts_cache import TTSCacheMixin, MemoryCacheBackend
from pipecat.services.google.tts import GoogleHttpTTSService

# 1. Create a cached class using the Mixin
class CachedGoogleTTS(TTSCacheMixin, GoogleHttpTTSService):
    pass

# 2. Initialize with the memory backend
tts = CachedGoogleTTS(
    voice_id="en-US-Chirp3-HD-Charon",
    cache_backend=MemoryCacheBackend(max_size=1000),
    cache_ttl=86400,  # Cache for 24 hours
)
```

For production deployments, use `RedisCacheBackend`. This allows the cache to persist across restarts and be shared among multiple bot instances.
```python
from pipecat_tts_cache.backends import RedisCacheBackend

tts = CachedGoogleTTS(
    voice_id="en-US-Chirp3-HD-Charon",
    cache_backend=RedisCacheBackend(
        redis_url="redis://localhost:6379/0",
        key_prefix="pipecat:tts:",
    ),
    cache_ttl=604800,  # Cache for 1 week
)
```

The system uses a frame-interception architecture to integrate seamlessly with the Pipecat pipeline:
- **Deterministic Key Generation**: Before requesting audio, a unique key is generated from the normalized text, voice ID, model, speed, and pitch. Sensitive data (e.g. API keys) is excluded.
- **Cache Check (`run_tts`)**:
  - **Hit**: The system immediately pushes the cached audio frames and timestamps to the pipeline.
  - **Miss**: The system calls the parent TTS service.
- **Collection (`push_frame`)**: As the parent service generates audio, the mixin intercepts the frames, aggregates them, and stores them in the backend for future use.
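The key-generation step can be sketched as follows. This is a minimal illustration: `make_cache_key` and its normalization rules are assumptions for demonstration, not the library's exact implementation.

```python
import hashlib
import json

def make_cache_key(text: str, voice_id: str, model: str,
                   speed: float = 1.0, pitch: float = 0.0) -> str:
    """Hash only the parameters that affect the generated audio;
    never include API keys or other secrets in the key material."""
    normalized = " ".join(text.split()).lower()  # collapse whitespace, casefold
    payload = json.dumps(
        {"text": normalized, "voice": voice_id, "model": model,
         "speed": speed, "pitch": pitch},
        sort_keys=True,  # stable field ordering -> same key every time
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# The same phrase always maps to the same key, regardless of
# incidental whitespace or casing:
assert make_cache_key("Hello  there", "v1", "m1") == \
       make_cache_key("hello there", "v1", "m1")
```

Sorting the JSON keys and normalizing the text is what makes the key deterministic: any two requests that would produce identical audio hash to the same cache entry.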
When an InterruptionFrame is received, the cache mixin immediately:
- Clears all pending cache write tasks.
- Resets the internal batch state.
- Ensures no partial or cut-off audio is committed to the pipeline.
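The interruption steps above can be sketched with plain `asyncio` task bookkeeping. The class and method names here (`CacheWriteState`, `handle_interruption`) are hypothetical, chosen only to illustrate the cancel-and-reset pattern:

```python
import asyncio

class CacheWriteState:
    """Illustrative sketch: track pending cache writes so they can be
    cancelled, and batch state so it can be reset, on interruption."""

    def __init__(self):
        self._pending: set = set()   # in-flight cache write tasks
        self._batch: list = []       # audio frames collected so far

    def schedule_write(self, coro) -> asyncio.Task:
        task = asyncio.ensure_future(coro)
        self._pending.add(task)
        task.add_done_callback(self._pending.discard)
        return task

    def handle_interruption(self):
        # 1. Cancel all in-flight cache writes.
        for task in self._pending:
            task.cancel()
        self._pending.clear()
        # 2. Reset batch state so no partial audio is ever committed.
        self._batch.clear()

async def main():
    state = CacheWriteState()
    state.schedule_write(asyncio.sleep(10))  # stand-in for a slow cache write
    state._batch.append(b"partial-audio")
    state.handle_interruption()
    await asyncio.sleep(0)  # let cancellation propagate
    assert not state._pending and not state._batch

asyncio.run(main())
```

Cancelling before clearing matters: dropping the task references without cancelling would let a half-finished write still commit truncated audio to the backend.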
You can monitor cache performance or clear entries programmatically.
```python
# Check performance
stats = await tts.get_cache_stats()
print(f"Hit Rate: {stats['hit_rate']:.1%}")
print(f"Total Saved Calls: {stats['hits']}")

# Maintenance
await tts.clear_cache()                      # Clear all
await tts.clear_cache(namespace="user_123")  # Clear a specific namespace
```

| Metric | Direct API | Memory Cache | Redis Cache |
|---|---|---|---|
| Latency | 200ms - 1500ms | ~0.1ms | ~2ms |
| Cost | $ per character | $0 | $0 |
| Consistency | Variable | Deterministic | Deterministic |
```shell
# Install with example dependencies
pip install "pipecat-tts-cache[examples]"

# Optional: Install with Redis support
pip install "pipecat-tts-cache[examples,redis]"

# Set environment variables
export DEEPGRAM_API_KEY=your_key
export CARTESIA_API_KEY=your_key
export GOOGLE_API_KEY=your_key

# Optional: For the Redis backend
export USE_REDIS_CACHE=true
export REDIS_URL=redis://localhost:6379/0
```

```shell
# Start the bot server
python examples/basic_caching.py --host 0.0.0.0 --port 7860
# Connect via Daily Bots or your Daily room
```

```shell
# Run with the local WebRTC transport
python examples/basic_caching.py -t webrtc --host localhost --port 8765
```

| Pipecat Version | Status |
|---|---|
| v0.0.91+ | ✅ Tested |
➡️ Reach out via mail
➡️ Connect on LinkedIn
