Description
🔴 Required Information
Describe the Bug:
The session resumption reconnection loop in base_llm_flow.py's run_live() never iterates. The while True: loop at line 481 and the handle-injection code at lines 484–496 are unreachable because both exception handlers (lines 595 and 600) re-raise unconditionally. Additionally, goAway messages from the server are silently dropped at the connection layer, preventing proactive reconnection before the server terminates the connection.
The net effect is that session resumption handles are received and stored (via _receive_from_model → invocation_context.live_session_resumption_handle) but never used. After the ~10-minute connection lifetime expires, run_live() terminates with an error instead of reconnecting with the saved handle.
Steps to Reproduce:
- Configure a live agent with RunConfig(session_resumption=types.SessionResumptionConfig(transparent=True))
- Start a run_live() session with a live-only model (e.g. gemini-2.5-flash-live)
- Verify that SessionResumptionUpdate messages are received (visible at DEBUG level: "Update session resumption handle: ...")
- Trigger any WebSocket disconnect, either:
  - Wait ~10 minutes for the server's connection lifetime limit, or
  - Simulate a network interruption
- Observe that run_live() raises an error instead of reconnecting with the saved handle
- The log message "Attempting to reconnect (Attempt N)..." (line 485) is never emitted
Expected Behavior:
When the WebSocket connection closes (after goAway or session timeout):
- The while True: loop should catch the exception and continue
- On the next iteration, the saved live_session_resumption_handle should be injected into the setup config (the code at lines 484–496 already does this)
- A new connection should be opened with the handle, restoring the session without replaying history
- run_live() should continue yielding events seamlessly
Observed Behavior:
The connection dies and run_live() raises an exception. The reconnection loop never gets a second iteration. There are two disconnect paths, both broken:
Path A: Disconnect during receive (most common):
The genai SDK's _receive() (live.py:538) catches ConnectionClosed from the websockets library and converts it to APIError:
    # google/genai/live.py:538-545
    except ConnectionClosed as e:
        if e.rcvd:
            code = e.rcvd.code
            reason = e.rcvd.reason
        else:
            code = 1006
            reason = ...
        errors.APIError.raise_error(code, reason, None)

This APIError propagates through _receive_from_model and hits except Exception at line 600, which re-raises. The except (ConnectionClosed, ConnectionClosedOK) at line 595 never fires on this path because the exception is APIError, not ConnectionClosed.
Path B: Disconnect during send:
The genai SDK's send methods (send_client_content, send_tool_response, etc.) call self._ws.send() without catching ConnectionClosed. The raw ConnectionClosed propagates through the send_task's finally cleanup and hits except (ConnectionClosed, ConnectionClosedOK) at line 595, which also re-raises.
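The two paths can be reproduced in isolation. The sketch below is a simulation only: ConnectionClosed, ConnectionClosedOK, and APIError are local stand-ins (not the real websockets or genai classes), and sdk_receive, sdk_send, and run_live_current are hypothetical names that mirror the handler shape described above.

```python
# Stand-ins for illustration only; NOT the real websockets/genai classes.
class ConnectionClosed(Exception):
    """Stand-in for websockets' ConnectionClosed."""


class ConnectionClosedOK(ConnectionClosed):
    """Stand-in for the clean-close subclass (code 1000)."""


class APIError(Exception):
    """Stand-in for the genai SDK's APIError."""

    def __init__(self, code, reason):
        super().__init__(f"{code} {reason}")
        self.code = code


def sdk_receive():
    """Path A: the SDK converts ConnectionClosed into APIError."""
    try:
        raise ConnectionClosedOK("connection closed")
    except ConnectionClosed:
        raise APIError(1000, None)


def sdk_send():
    """Path B: send does not catch, so the raw ConnectionClosed escapes."""
    raise ConnectionClosed("connection closed")


def run_live_current(io_call):
    """Mirrors the handler shape described above: both handlers re-raise."""
    iterations = 0
    handler = None
    try:
        while True:
            iterations += 1
            try:
                io_call()
            except (ConnectionClosed, ConnectionClosedOK):
                handler = "connection_closed"
                raise  # leaves the while loop
            except Exception:
                handler = "generic"
                raise  # leaves the while loop
    except Exception:
        pass  # in the real flow, the caller of run_live() sees this error
    return iterations, handler


path_a = run_live_current(sdk_receive)
path_b = run_live_current(sdk_send)
print(path_a)  # (1, 'generic')
print(path_b)  # (1, 'connection_closed')
```

In both cases the loop body runs exactly once, which matches the observation that the "Attempting to reconnect" message is never logged.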
Separately, goAway is unhandled:
gemini_llm_connection.py's receive() method processes message.server_content, message.tool_call, message.usage_metadata, and message.session_resumption_update but has no if message.go_away: check. The LiveServerGoAway event (sent ~60s before server termination) is silently dropped, preventing proactive reconnection.
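A rough sketch of what surfacing goAway could look like, using stand-in types rather than the real ADK or genai classes. FakeServerMessage, FlowEvent, and the "go_away" event kind are hypothetical, and modelling go_away as a number of seconds remaining is an assumption made only for this illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class FakeServerMessage:
    """Hypothetical stand-in for a live server message."""
    server_content: Optional[str] = None
    session_resumption_update: Optional[str] = None
    go_away: Optional[float] = None  # assumed: seconds left before termination


@dataclass
class FlowEvent:
    """Hypothetical event type handed to the flow layer."""
    kind: str
    payload: object


def receive(messages: Iterable[FakeServerMessage]) -> Iterator[FlowEvent]:
    for message in messages:
        if message.server_content is not None:
            yield FlowEvent("content", message.server_content)
        if message.session_resumption_update is not None:
            yield FlowEvent("resumption_handle", message.session_resumption_update)
        # The branch reported as missing: surface go_away instead of dropping it.
        if message.go_away is not None:
            yield FlowEvent("go_away", message.go_away)


events = list(receive([
    FakeServerMessage(server_content="hello"),
    FakeServerMessage(session_resumption_update="handle-1"),
    FakeServerMessage(go_away=60.0),
]))
kinds = [e.kind for e in events]
print(kinds)  # ['content', 'resumption_handle', 'go_away']
```

With an event like this, the flow layer could choose to reconnect with the saved handle before the server closes the connection, rather than reacting to the disconnect afterwards.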
Environment Details:
- ADK Library Version: 1.27.2
- Desktop OS: macOS (Darwin 24.6.0)
- Python Version: 3.12
Model Information:
- Are you using LiteLLM: No
- Which model is being used: gemini-live-2.5-flash-native-audio
🟡 Optional Information
Regression:
No, the reconnection plumbing was added in PR #2270 (fixing #2179) but the exception handling was never updated to actually loop. The handle injection code at lines 484–496 has been unreachable since it was introduced.
Logs:
When the connection times out, the following is logged:
    ERROR An unexpected error occurred in live flow: 1000 None. connection closed

The "Attempting to reconnect" log message at line 485 is never reached.
Additional Context:
The code structure suggests the intent was for clean WebSocket closes to end the _receive_from_model generator naturally (the comment at line 596 says "when the session timeout, it will just close and not throw exception"), allowing the while True: loop to iterate. However, the genai SDK's _receive() converts ALL ConnectionClosed exceptions (including ConnectionClosedOK / code 1000) into APIError, so the "clean end" path never occurs.
_receive_from_model does have an except ConnectionClosedOK: pass at line 744, which would enable the clean-end path. However, APIError is not a ConnectionClosedOK, so that handler does not fire for connection closes that come through _receive().
Suggested fix:
There are three changes needed:
- Fix the reconnection loop. In run_live(), catch the genai SDK's APIError wrapping ConnectionClosed and continue instead of raise when a session resumption handle is available:

      except (ConnectionClosed, ConnectionClosedOK) as e:
          if invocation_context.live_session_resumption_handle:
              logger.info('Connection closed (%s), reconnecting with session handle.', e)
              continue
          raise
      except Exception as e:
          if (
              invocation_context.live_session_resumption_handle
              and isinstance(e, APIError)
          ):
              logger.info('Connection lost (%s), reconnecting with session handle.', e)
              continue
          raise

- Skip send_history on reconnection. When reconnecting with a handle, the server already has the session context. Add a guard at line 503:

      if llm_request.contents and not invocation_context.live_session_resumption_handle:
          await llm_connection.send_history(llm_request.contents)

- Handle goAway. In gemini_llm_connection.py's receive(), surface message.go_away so the flow layer can proactively reconnect before the server terminates the connection.
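The first two changes can be exercised together in a small simulation. This is a sketch under stated assumptions, not the ADK code: APIError is a local stand-in, and FakeContext, FakeServer, and run_live_fixed are hypothetical names. The fake server drops the first connection after issuing a handle and lets the second one end normally.

```python
class APIError(Exception):
    """Stand-in for the genai SDK's APIError (illustration only)."""


class FakeContext:
    """Hypothetical stand-in for the invocation context."""

    def __init__(self):
        self.live_session_resumption_handle = None


class FakeServer:
    """Drops connection 1 after issuing a handle; connection 2 ends cleanly."""

    def __init__(self):
        self.handles_seen = []
        self.history_sends = 0

    def connect(self, handle):
        self.handles_seen.append(handle)
        return len(self.handles_seen)  # connection number

    def receive(self, conn_no, ctx):
        if conn_no == 1:
            ctx.live_session_resumption_handle = "handle-1"
            yield "event-a"
            raise APIError("1000 None. connection closed")
        yield "event-b"


def run_live_fixed(server, ctx, history):
    while True:
        conn_no = server.connect(ctx.live_session_resumption_handle)
        try:
            # Change 2: only replay history when no handle is available.
            if history and not ctx.live_session_resumption_handle:
                server.history_sends += 1
            yield from server.receive(conn_no, ctx)
            return  # stream ended normally (simulation-only exit)
        except APIError:
            # Change 1: reconnect instead of re-raising when a handle exists.
            if ctx.live_session_resumption_handle:
                continue
            raise


server = FakeServer()
ctx = FakeContext()
events = list(run_live_fixed(server, ctx, ["earlier turn"]))
print(events)                # ['event-a', 'event-b']
print(server.handles_seen)   # [None, 'handle-1']
print(server.history_sends)  # 1
```

The second connection is opened with the saved handle and history is sent only once. A real implementation would probably also want a cap on reconnection attempts, since this loop retries indefinitely while a handle is present.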
Related issues / PRs:
- #2179 "Session resumption not working" (closed, partially fixed by #2270)
- PR #2270 "feat: Implement Live Session Resumption": added handle injection plumbing (but didn't fix exception handling)
- PR #4358 "feat: expose live_session_resumption_update in events for cross-connection resumption": exposes live_session_resumption_update in events (open)
How often has this issue occurred?:
- Always (100%): the reconnection loop has never successfully iterated.