@mehtarac mehtarac commented Dec 2, 2025

Description

This PR introduces bidirectional streaming to the Strands SDK, enabling real-time voice and audio conversations with AI models over persistent streaming connections.

Overview

Bidirectional streaming moves beyond traditional request-response patterns by maintaining long-running conversations where users can interrupt, provide continuous input, and receive real-time audio responses. This implementation is marked as experimental as we refine the API based on user feedback and evolving model capabilities.

Key Features:

  • Real-time audio I/O streaming with PyAudio integration
  • Automatic interruption detection that clears audio buffers when users speak
  • Concurrent tool execution during active conversations
  • Multi-modal input support for text, audio, and images
  • Provider-agnostic event system with strongly-typed, JSON-serializable events
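
The interruption behavior listed above can be pictured with a small sketch. This is not the SDK's implementation: `PlaybackBuffer`, `handle_event`, and the `"audio"` / `"interruption"` event type strings are hypothetical names chosen for illustration. The idea is only that queued speaker audio is dropped as soon as an interruption event arrives, so stale output is never played over the user.

```python
from collections import deque
from typing import Optional


class PlaybackBuffer:
    """Hypothetical speaker-side buffer (not the SDK's BidiAudioIO)."""

    def __init__(self) -> None:
        self._chunks: deque = deque()

    def push(self, chunk: bytes) -> None:
        # Queue one chunk of model audio for playback.
        self._chunks.append(chunk)

    def pop(self) -> Optional[bytes]:
        # Return the next chunk to play, or None when nothing is queued.
        return self._chunks.popleft() if self._chunks else None

    def clear(self) -> int:
        # Drop everything that has not been played yet; report how many chunks.
        dropped = len(self._chunks)
        self._chunks.clear()
        return dropped


def handle_event(event: dict, buffer: PlaybackBuffer) -> None:
    # Assumed event shapes: {"type": "audio", "audio": bytes} and {"type": "interruption"}.
    if event["type"] == "audio":
        buffer.push(event["audio"])
    elif event["type"] == "interruption":
        buffer.clear()


buffer = PlaybackBuffer()
for event in [
    {"type": "audio", "audio": b"\x00\x01"},
    {"type": "audio", "audio": b"\x02\x03"},
    {"type": "interruption"},  # user starts speaking: queued audio is discarded
    {"type": "audio", "audio": b"\x04\x05"},
]:
    handle_event(event, buffer)
```

After the loop, only the chunk that arrived after the interruption is still queued.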

Implementation Details

Core Components:

  • BidiAgent - Main agent class with start(), send(), receive(), stop() lifecycle methods
  • _BidiAgentLoop - Event processing engine handling model events and tool execution with connection restart logic
  • BidiModel - Model interface for bidirectional model providers
  • BidiInput/BidiOutput - Pluggable I/O channel abstractions
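
To make the start(), send(), receive(), stop() lifecycle concrete, here is a minimal sketch built on a stand-in class. `EchoAgent` is hypothetical and only borrows the four method names from BidiAgent; the real signatures, event shapes, and whether receive() is an async generator are assumptions, not something this description confirms.

```python
import asyncio


class EchoAgent:
    """Hypothetical stand-in that mirrors the four lifecycle method names."""

    def __init__(self) -> None:
        self._events: asyncio.Queue = asyncio.Queue()
        self._running = False

    async def start(self) -> None:
        # A real agent would open the model connection here.
        self._running = True

    async def send(self, text: str) -> None:
        if not self._running:
            raise RuntimeError("call start() before send()")
        # Echo the input back as an output event.
        await self._events.put({"type": "text", "text": f"echo: {text}"})

    async def receive(self):
        # Async generator: yields events until stop() enqueues the sentinel.
        while True:
            event = await self._events.get()
            if event is None:
                return
            yield event

    async def stop(self) -> None:
        self._running = False
        await self._events.put(None)


async def converse() -> list:
    agent = EchoAgent()
    await agent.start()
    received = []
    try:
        await agent.send("hello")
        await agent.send("world")
        async for event in agent.receive():
            received.append(event["text"])
            if len(received) == 2:
                break
    finally:
        # Always release the connection, even if the loop above fails.
        await agent.stop()
    return received


result = asyncio.run(converse())
```

Wrapping the conversation in try/finally keeps stop() paired with start(), which matters for long-lived streaming connections.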

Model Providers:

  • BidiNovaSonicModel - AWS Bedrock Nova Sonic with complex event sequencing
  • BidiGeminiLiveModel - Google Gemini Live using official SDK
  • BidiOpenAIRealtimeModel - OpenAI Realtime API via WebSocket

I/O Handlers:

  • BidiAudioIO - PyAudio-based microphone/speaker handling with buffering
  • BidiTextIO - Terminal-based text input/output
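
One way to think about pluggable I/O channels is as small objects that only need a read or write coroutine. The sketch below is an assumption about the shape of such an abstraction, not the BidiInput/BidiOutput interface itself: `InputChannel`, `OutputChannel`, `ScriptedInput`, `CollectingOutput`, and `pump` are all hypothetical names.

```python
import asyncio
from typing import Optional, Protocol


class InputChannel(Protocol):
    async def read(self) -> Optional[dict]: ...


class OutputChannel(Protocol):
    async def write(self, event: dict) -> None: ...


class ScriptedInput:
    """Hypothetical input that replays fixed text lines, then signals the end."""

    def __init__(self, lines: list) -> None:
        self._lines = list(lines)

    async def read(self) -> Optional[dict]:
        if not self._lines:
            return None  # None marks end of input in this sketch
        return {"type": "text", "text": self._lines.pop(0)}


class CollectingOutput:
    """Hypothetical output that records every event it is given."""

    def __init__(self) -> None:
        self.events: list = []

    async def write(self, event: dict) -> None:
        self.events.append(event)


async def pump(source: InputChannel, sinks: list) -> None:
    # Forward every event from one input to all outputs until the input ends.
    while (event := await source.read()) is not None:
        await asyncio.gather(*(sink.write(event) for sink in sinks))


first, second = CollectingOutput(), CollectingOutput()
asyncio.run(pump(ScriptedInput(["hi", "there"]), [first, second]))
```

Because the channels are structural (Protocol-based), a microphone, a terminal, or a WebSocket client could each satisfy the same contract without sharing a base class.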

Usage Example

import asyncio

from strands.experimental.bidi import BidiAgent
from strands.experimental.bidi.models import BidiNovaSonicModel
from strands.experimental.bidi.io import BidiAudioIO, BidiTextIO
from strands_tools import calculator


async def main():
    model = BidiNovaSonicModel()
    agent = BidiAgent(model=model, tools=[calculator])

    audio_io = BidiAudioIO()
    text_io = BidiTextIO()

    await agent.run(
        inputs=[audio_io.input()],
        outputs=[audio_io.output(), text_io.output()]
    )


asyncio.run(main())

This is a new experimental feature under strands.experimental.bidi.

Related Issues

#217

Documentation PR

Type of Change

New feature

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

  • [ ] I ran hatch run prepare
  • I ran hatch run bidi:prepare. This isolates the bidirectional streaming environment, which requires Python 3.12+.

Checklist

  • [x] I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

mehtarac and others added 30 commits November 6, 2025 06:23
- Remove adapter from constructor
- Implement BidirectionlIO interface
- Add adapter to the run() method
feat: (Agent): Finalize Bidirectional Agent class
Move test scripts into dedicated directory so tests directory only has unit tests and integ tests
Rename bidirectional components
Fix main branch. Temporarily rename loop to original name
Changes:
- Keep main's architecture: BidirectionalConnection + start/stop functions
- Apply our event renames: BidiTextInputEvent, BidiAudioInputEvent, etc.
- Update agent to use BidiAgent, BidiModel, BidiNovaSonicModel
- Update tests to use new class names
- Fix imports across codebase
Tests status:
- ✅ 14/14 type tests passing
- ⚠️ Integration tests running but failing (models need update to check 'type' field instead of isinstance)
Known issue: Models use isinstance() checks which don't work with TypedDict. Need to update models to check content.get('type') field instead.
The agent's send() method was passing plain dicts directly to models, but models expect TypedEvent instances for isinstance() checks to work.
Added dict-to-TypedEvent conversion logic that was lost in merge:
- Checks event 'type' field in dict
- Reconstructs appropriate TypedEvent (BidiTextInputEvent, BidiAudioInputEvent, etc.)
- Maintains backward compatibility with WebSocket/dict-based clients
Tests:
- ✅ 14/14 type tests passing
- ✅ 2/2 integration tests passing (nova_sonic, openai)
Updated test imports and usages:
- GeminiLiveModel → BidiGeminiLiveModel
- NovaSonicModel → BidiNovaSonicModel
- OpenAIRealtimeModel → BidiOpenAIRealtimeModel
Note: 21 model tests still failing because they call .connect() but models now use .start(). This is a pre-existing issue that needs separate fix - tests need API update.
Updated all test calls from old API to new API:
- .connect() → .start()
- .close() → .stop()
- Updated error message expectations to match actual errors
All tests now passing:
- ✅ 47/47 bidirectional streaming tests passing
- ✅ 14/14 type tests
- ✅ 33/33 model tests
- ✅ 2/2 integration tests
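
The commit notes above mention two related points: isinstance() does not work against a TypedDict, and plain dicts are converted to typed events by inspecting their 'type' field. The sketch below illustrates both under stated assumptions. `SketchEvent`, `TextInputEvent`, `AudioInputEvent`, `to_typed_event`, and the `"text_input"` / `"audio_input"` type strings are hypothetical and are not the SDK's BidiTextInputEvent or BidiAudioInputEvent classes.

```python
import json
from typing import TypedDict


class TextInputDict(TypedDict):
    type: str
    text: str


def isinstance_check_fails() -> bool:
    # Python raises TypeError for isinstance() against a TypedDict class.
    try:
        isinstance({"type": "text_input", "text": "hi"}, TextInputDict)
    except TypeError:
        return True
    return False


class SketchEvent(dict):
    """Hypothetical base: a dict subclass, so isinstance() and json.dumps both work."""


class TextInputEvent(SketchEvent):
    def __init__(self, text: str) -> None:
        super().__init__(type="text_input", text=text)


class AudioInputEvent(SketchEvent):
    def __init__(self, audio: str) -> None:
        super().__init__(type="audio_input", audio=audio)


def to_typed_event(event: dict) -> SketchEvent:
    # Already typed: pass through unchanged.
    if isinstance(event, SketchEvent):
        return event
    # Plain dict (for example from a WebSocket client): dispatch on the 'type' field.
    kind = event.get("type")
    if kind == "text_input":
        return TextInputEvent(text=event["text"])
    if kind == "audio_input":
        return AudioInputEvent(audio=event["audio"])
    raise ValueError(f"unknown event type: {kind!r}")


converted = to_typed_event({"type": "text_input", "text": "hello"})
round_tripped = json.loads(json.dumps(converted))
```

Subclassing dict is only one way to get events that are both checkable with isinstance() and JSON-serializable; the approach used in this PR may differ.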
@github-actions github-actions bot removed the size/xl label Dec 2, 2025
@mehtarac mehtarac marked this pull request as ready for review December 2, 2025 17:52
@pgrayy pgrayy enabled auto-merge (squash) December 3, 2025 04:45
@pgrayy pgrayy disabled auto-merge December 3, 2025 04:45