Description
I've run into this annoying bug on a couple of occasions, so I decided it was worth a deep dive. I used DeepWiki to investigate and compile this report.
Summary
Sometimes Agent Zero generates malformed JSON for response tool calls within a session, causing responses to not display in the web UI. The issue persists within the affected session but doesn't occur in new sessions. Missing JSON characters (like closing braces) can appear in subsequent prompts, indicating buffer accumulation.
Steps to Reproduce
- Start a chat session with Agent Zero
- Engage in a conversation that triggers multiple response tool calls
- The following message for the agent might appear in the (Docker) logs:
  You have misformatted your message. Follow system prompt instructions on JSON message formatting precisely.
- Observe that eventually a response tool call will have malformed JSON (missing closing brace)
- The response will not display in the web UI
- Send a follow-up message - the missing brace may appear before your message, but the previous message will not render.
- Start a new session - responses work correctly again
Expected Behavior
- All response tool calls should generate valid JSON
- Responses should display properly in the web UI
- Session state should not accumulate malformed JSON fragments
Actual Behavior
- Response tool calls occasionally generate incomplete JSON that persists in later responses from the agent
- Malformed responses don't display in UI
- Missing JSON characters persist and appear in later prompts
- Issue is isolated to the affected session
Technical Details
Where the Issue Occurs
The problem most likely manifests in the streaming response pipeline:
- LLM most likely generates complete, well-formed JSON, but it gets cut off (last lines of the response tool JSON); the missing closing braces get streamed later, prefixing the user's next prompt
- Stream processing fails to handle incomplete data - `handle_response_stream()` in `agent.py` processes chunks but doesn't validate completeness
- JSON parsing silently fails - `DirtyJson.parse_string()` attempts to parse but can't fix structural issues
- LiveResponse extension rejects malformed data - The extension validates JSON structure and returns early if invalid
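A completeness check for the stream could look roughly like the sketch below. It is not Agent Zero code: `is_json_complete` is a hypothetical helper that tracks brace and bracket nesting while skipping string contents. Looking only at the last character is not enough, because a response truncated right after an inner `}` still ends with a brace.

```python
def is_json_complete(text: str) -> bool:
    """Return True if every { and [ in text is closed and an object was opened.

    Hypothetical helper: characters inside string literals (including
    escaped quotes) are ignored when counting nesting.
    """
    stack = []          # expected closing characters, innermost last
    in_string = False
    escaped = False
    seen_object = False

    for ch in text.strip():
        if in_string:
            if escaped:
                escaped = False        # this character was escaped, skip it
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            stack.append("}")
            seen_object = True
        elif ch == "[":
            stack.append("]")
        elif ch in "}]":
            if not stack or stack.pop() != ch:
                return False           # stray or mismatched closer
    return seen_object and not stack and not in_string


complete = '{"tool_name": "response", "tool_args": {"text": "hi"}}'
truncated = '{"tool_name": "response", "tool_args": {"text": "hi"}'
print(is_json_complete(complete))   # True
print(is_json_complete(truncated))  # False, although it ends with "}"
```

This only checks structure, so it would complement a real parser rather than replace it.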
Session State Corruption
The issue persists within sessions due to state accumulation:
- Loop data maintains temporary params across iterations
- Streaming buffers may retain partial JSON fragments
- WebSocket connection state could be accumulating data
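To illustrate the accumulation hypothesis (which I have not confirmed against the Agent Zero source), here is a minimal sketch. `StreamBuffer` and `run_request` are hypothetical names, not classes or functions from the project. The sketch shows how a late fragment from a truncated response ends up in front of the next response unless the buffer is reset per request.

```python
class StreamBuffer:
    """Hypothetical per-session accumulator for streamed text chunks."""

    def __init__(self) -> None:
        self._chunks: list[str] = []

    def append(self, chunk: str) -> None:
        self._chunks.append(chunk)

    def text(self) -> str:
        return "".join(self._chunks)

    def reset(self) -> None:
        """Drop everything collected so far."""
        self._chunks.clear()


def run_request(buffer: StreamBuffer, chunks: list[str], reset_first: bool) -> str:
    """Feed one response's chunks into the buffer and return the joined text."""
    if reset_first:
        buffer.reset()
    for chunk in chunks:
        buffer.append(chunk)
    return buffer.text()


buf = StreamBuffer()
buf.append("}")  # late fragment left over from an earlier, truncated response
leaked = run_request(buf, ['{"b": ', "2}"], reset_first=False)  # '}{"b": 2}'
clean = run_request(buf, ['{"b": ', "2}"], reset_first=True)    # '{"b": 2}'
```

The `leaked` value matches the symptom described above: the stray brace shows up in front of the next message.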
Probable Root Causes
- Token limits - LLM may hit max_tokens and cut off JSON mid-generation
- Insufficient validation - No check for JSON completeness before processing
- Buffer not cleared - Incomplete fragments persist in session state
- Exception handling - `handle_response_stream()` silently swallows exceptions
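As a rough idea of what non-silent handling could look like, the sketch below logs a warning instead of using a bare `pass`. It makes several assumptions: `parse_stream_or_log` and the logger name are hypothetical, and the standard library's `json.loads` stands in for `DirtyJson.parse_string()`, which is more lenient.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger("response_stream")  # hypothetical logger name


def parse_stream_or_log(stream: str) -> Optional[dict]:
    """Parse a streamed response; log failures instead of ignoring them.

    json.loads is a stand-in for DirtyJson.parse_string() here.
    """
    if len(stream) < 25:
        return None  # too short to bother, same cutoff as the cited handler
    try:
        parsed = json.loads(stream)
    except json.JSONDecodeError as exc:
        # Keep the parser message and the tail of the payload for debugging.
        logger.warning(
            "response stream is not valid JSON (%s); tail=%r",
            exc.msg,
            stream[-40:],
        )
        return None
    return parsed if isinstance(parsed, dict) else None
```

If the handler is invoked on partial streams, most intermediate calls will fail by design, so in practice the warning would probably be reserved for the final chunk or rate-limited.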
Workarounds
- Clear Chat - Resets session state but loses conversation history
- Start New Session - Creates fresh state with clean buffers
- Disable streaming - May prevent partial JSON generation (not tested)
Suggested Fixes
- Add JSON completeness validation in `handle_response_stream()`: `if not stream.strip().endswith('}'): raise ValueError("Incomplete JSON")`
- Implement proper buffer clearing between requests
- Increase token limits or add retry logic for incomplete responses
- Add logging when JSON parsing fails to aid debugging
- Validate tool JSON structure before sending to LiveResponse extension
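The retry idea could be sketched as follows. This is only an illustration under assumptions: `call_with_retry` is a hypothetical wrapper, `generate` is a hypothetical stand-in for the model call, and `json.loads` is used for validation instead of the project's own parser.

```python
import json
from typing import Callable, Optional


def call_with_retry(
    generate: Callable[[int], str],
    max_attempts: int = 3,
) -> Optional[dict]:
    """Call generate(attempt) until it returns a parseable JSON object.

    generate is a hypothetical stand-in for the model call; a real caller
    might raise the token limit or ask the model to continue on later attempts.
    """
    for attempt in range(1, max_attempts + 1):
        raw = generate(attempt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # truncated or malformed output: try again
        if isinstance(parsed, dict):
            return parsed
    return None  # give up; the caller decides how to surface the failure


def fake_model(attempt: int) -> str:
    """Return truncated JSON on the first attempt, complete JSON afterwards."""
    full = '{"tool_name": "response", "tool_args": {"text": "hi"}}'
    return full[:-1] if attempt == 1 else full


result = call_with_retry(fake_model)
print(result)  # {'tool_name': 'response', 'tool_args': {'text': 'hi'}}
```

Returning `None` after the last attempt keeps the failure explicit, so the UI could show an error instead of rendering nothing.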
Environment
- Docker container
- Agent Zero version: [current]
- Model: [any model with streaming support]
- Browser: [any]
Additional Context
The memory consolidation warning ("LLM consolidation analysis failed") is a separate issue and doesn't affect response display. The core problem is in the streaming response system's handling of incomplete JSON within session state.
Notes
This issue highlights the need for more robust state management in the streaming system. The session-specific nature suggests that the core JSON parsing logic works correctly, and that the problem is an accumulation of corrupted state within individual sessions over time.
Citations
File: agent.py (L397-399)

    self.context.streaming_agent = self  # mark self as current streamer
    self.loop_data.iteration += 1
    self.loop_data.params_temporary = {}  # clear temporary params

File: agent.py (L957-972)

    async def handle_response_stream(self, stream: str):
        await self.handle_intervention()
        try:
            if len(stream) < 25:
                return  # no reason to try
            response = DirtyJson.parse_string(stream)
            if isinstance(response, dict):
                await self.call_extensions(
                    "response_stream",
                    loop_data=self.loop_data,
                    text=stream,
                    parsed=response,
                )
        except Exception as e:
            pass

File: docs/developer/websockets.md (L52-69)
## Connection Lifecycle

1. **Lazy Connect** – `/js/websocket.js` connects only when a consumer uses the client API (e.g., `emit`, `request`, `on`). Consumers may still explicitly `await websocket.connect()` to block UI until the socket is ready.
2. **Handshake** – Socket.IO connects using the existing Flask session cookie and a CSRF token provided via the Socket.IO `auth` payload (`csrf_token`). The token is obtained from `GET /csrf_token` (see `/js/api.js#getCsrfToken()`), which also sets the runtime-scoped cookie `csrf_token_{runtime_id}`. The server validates an **Origin allowlist** (RFC 6455 / OWASP CSWSH baseline) and then checks handler requirements (`requires_auth`, `requires_csrf`) before accepting.
3. **Lifecycle Hooks** – After acceptance, `WebSocketHandler.on_connect(sid)` fires for every registered handler. Use it for initial emits, state bookkeeping, or session tracking.
4. **Normal Operation** – Client emits events. Manager routes them to the appropriate handlers, gathers results, and wraps outbound deliveries in the mandatory envelope.
5. **Disconnection & Buffering** – If a tab goes away without a graceful disconnect, fire-and-forget events accumulate (max 100). On reconnect, the manager flushes the buffer via `emit_to`. Request flows respond with explicit `CONNECTION_NOT_FOUND` errors.
6. **Reconnection Attempts** – Socket.IO handles reconnect attempts; the manager continues to buffer fire-and-forget events (up to 1 hour) for temporarily disconnected SIDs and flushes them on reconnect.

### State Sync (Replacing `/poll`)

Agent Zero can also push poll-shaped state snapshots over the WebSocket bus, replacing the legacy 4Hz `/poll` loop while preserving the existing UI update contract.

- **Handshake**: the frontend sync store (`/components/sync/sync-store.js`) calls `websocket.request("state_request", { context, log_from, notifications_from, timezone })` to establish per-tab cursors and a `seq_base`.
- **Push**: the server emits `state_push` events containing `{ runtime_epoch, seq, snapshot }`, where `snapshot` is exactly the `/poll` payload shape built by `python/helpers/state_snapshot.py`.
- **Coalescing**: the backend `StateMonitor` coalesces dirties per SID (25ms window) so streaming updates stay smooth without unbounded trailing-edge debounce.
- **Degraded fallback**: if the WebSocket handshake/push path is unhealthy, the UI enters `DEGRADED` and uses `/poll` as a fallback; while degraded, push snapshots are ignored to avoid racey double-writes.