feat: LLM chatbot with litellm tool-calling and SSE streaming #3543
JiwaniZakir wants to merge 1 commit into intelowlproject:develop from
Conversation
New Django app (api_app/chatbot) implementing a conversational threat intelligence assistant powered by litellm and self-hosted LLMs (Ollama).

Backend:
- Explicit tool-calling loop (agent.py) — no framework, ~120 lines of transparent Python. Streams SSE events for token-by-token rendering.
- 5 tools mapped to IntelOwl REST API: search_jobs, get_job_report, get_analyzer_config, search_observables, create_scan — all scoped to the requesting user's auth token.
- ChatSession/ChatMessage models with conversation persistence.
- Async SSE streaming endpoint via Django StreamingHttpResponse.
- CHATBOT_ENABLED=False feature flag (disabled by default).

Frontend:
- Floating chat widget (React + Reactstrap) with SSE stream reader.
- Tool call indicators during execution.

Infrastructure:
- Ollama Docker Compose override (docker/chatbot.override.yml).
- litellm as sole new dependency — supports Ollama, OpenAI, Anthropic via config change.

Tests:
- Agent loop tests with mocked litellm (text response, tool calls, max rounds guard, API error handling).
- Tool execution tests with mocked httpx (all 5 tools + error cases).

Related to intelowlproject#3435
Pull request overview
Adds a new api_app/chatbot/ Django app plus a React chat widget to provide an LLM-powered “threat intelligence assistant” that streams responses via SSE and uses LiteLLM tool-calling to query/trigger IntelOwl actions.
Changes:
- Backend: introduces a tool-calling agent loop (litellm.acompletion) + async SSE endpoint + session/message persistence models.
- Frontend: adds a floating chat widget and SSE stream reader; wires new chatbot session API URL.
- Infra/tests: adds LiteLLM dependency, optional Ollama compose override, and unit tests for agent/tools.
Reviewed changes
Copilot reviewed 18 out of 20 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| tests/api_app/chatbot/test_tools.py | Adds unit tests for tool dispatch (mocked httpx) |
| tests/api_app/chatbot/test_agent.py | Adds unit tests for the streaming tool-calling agent loop |
| tests/api_app/chatbot/__init__.py | Test package init |
| requirements/project-requirements.txt | Adds litellm (and a duplicate httpx pin) |
| intel_owl/settings/chatbot.py | Adds chatbot feature-flag and model/provider settings |
| intel_owl/settings/__init__.py | Registers chatbot app + imports chatbot settings |
| frontend/src/layouts/AppMain.jsx | Renders the chat widget on authenticated pages |
| frontend/src/constants/apiURLs.js | Adds chatbot sessions base URL constant |
| frontend/src/components/chat/ChatWidget.jsx | Implements the floating widget + fetch-based SSE reader |
| docker/chatbot.override.yml | Adds optional Ollama service |
| api_app/urls.py | Includes chatbot URL routes |
| api_app/chatbot/views.py | Adds ChatSession ViewSet + async streaming send_message endpoint |
| api_app/chatbot/urls.py | Defines session CRUD routes + streaming message route |
| api_app/chatbot/tools.py | Implements tool schemas + REST calls to IntelOwl endpoints |
| api_app/chatbot/serializers.py | Serializers for sessions/messages and send-message payload |
| api_app/chatbot/prompts.py | Builds system prompt with safety instructions |
| api_app/chatbot/models.py | Adds ChatSession/ChatMessage models |
| api_app/chatbot/apps.py | Registers Django app config |
| api_app/chatbot/agent.py | Implements the tool-calling loop and SSE event emission |
| api_app/chatbot/__init__.py | App package init |
```python
# Truncate large reports to fit context window.
result_str = json.dumps(result)
if len(result_str) > MAX_TOOL_RESULT_CHARS:
    return json.loads(_truncate(json.dumps(result, indent=None)))
```
get_job_report truncation path builds a truncated JSON string and then calls json.loads(...) on it. Since _truncate() slices arbitrary characters and appends a suffix, the result is not valid JSON and will reliably raise a JSONDecodeError for large reports, causing tool execution to fail when reports exceed the limit.
Suggested change:
```diff
-return json.loads(_truncate(json.dumps(result, indent=None)))
+# Avoid parsing truncated JSON; instead, return a safely truncated
+# string representation wrapped in a small, structured object.
+truncated = _truncate(result_str, MAX_TOOL_RESULT_CHARS)
+return {
+    "truncated": True,
+    "report": truncated,
+}
```
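The failure mode is easy to reproduce. The sketch below uses a hypothetical stand-in for the PR's `_truncate()` helper (its exact signature is not shown in the diff), but the principle holds for any character-level slice of serialized JSON:

```python
import json

def _truncate(s, limit=50, suffix="...[truncated]"):
    # Hypothetical stand-in for the PR's _truncate(): slice and append a marker.
    return s[:limit] + suffix

result = {"report": {"analyzer": "x" * 100, "status": "ok"}}
result_str = json.dumps(result)

truncated = _truncate(result_str)
parsed = None
try:
    # Slicing cut the string mid-value, so this is no longer valid JSON.
    parsed = json.loads(truncated)
except json.JSONDecodeError:
    pass

# Returning a structured wrapper avoids re-parsing the broken string entirely.
safe = {"truncated": True, "report": truncated}
```

Since the wrapper is built from Python objects (not re-parsed text), it serializes cleanly when handed back to the LLM as a tool result.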
```python
response = StreamingHttpResponse(
    event_stream(),
    content_type="text/event-stream",
)
```
This endpoint is implemented as an async view and returns a StreamingHttpResponse whose iterable is event_stream() (an async generator). In the default deployment, nginx forwards non-/ws HTTP traffic to uWSGI (WSGI), where Django cannot iterate async generators for streaming responses; this is likely to error or not stream incrementally. Consider routing this path to the ASGI server (daphne) or reworking it to be synchronous for uWSGI deployments.
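For the synchronous-rework option, the key change is making the iterable a plain generator, which a WSGI worker can drive. A minimal sketch, assuming an `sse_event` helper and the event names described in this PR (the function names here are illustrative, not the PR's exact code):

```python
import json

def sse_event(event_type: str, data: dict) -> str:
    # Serialize one Server-Sent Events frame: a "data:" line plus a blank-line terminator.
    return f"data: {json.dumps({'type': event_type, **data})}\n\n"

def event_stream(tokens):
    # A plain (synchronous) generator that a WSGI worker like uWSGI can iterate;
    # an async generator, by contrast, requires an ASGI server such as daphne.
    for tok in tokens:
        yield sse_event("token", {"content": tok})
    yield sse_event("done", {})
```

Wrapping this in `StreamingHttpResponse(event_stream(...), content_type="text/event-stream")` would then stream incrementally under uWSGI, at the cost of blocking a worker for the duration of the response.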
```
beautifulsoup4==4.14.2
# chatbot LLM abstraction layer
litellm==1.67.2
httpx==0.28.1
```
requirements/project-requirements.txt already pins httpx[http2]==0.28.1; adding a second httpx==0.28.1 entry is redundant and can cause resolver ambiguity about extras. Remove the duplicate and keep the single pinned httpx[http2] requirement (or adjust the existing one if you want to change extras globally).
Suggested change:
```diff
-httpx==0.28.1
```
```yaml
      condition: service_started

  ollama:
    image: ollama/ollama:latest
```
Using a floating :latest tag makes environments non-reproducible and can break unexpectedly when the upstream image changes. Pin the Ollama image to a known-good version (or digest), and document the expected version in the override file.
Suggested change:
```diff
-    image: ollama/ollama:latest
+    # Pin Ollama image to a known-good version; update this tag intentionally when upgrading.
+    image: ollama/ollama:0.5.7
```
```python
# Manual token auth for the async view.
token_key = _get_token_from_request(request)
if not token_key:
    return JsonResponse(
        {"detail": "Authentication credentials were not provided."},
        status=status.HTTP_401_UNAUTHORIZED,
    )

from durin.models import AuthToken

try:
    auth_token_obj = await sync_to_async(
        AuthToken.objects.select_related("user").get
    )(token=token_key)
    user = auth_token_obj.user
except AuthToken.DoesNotExist:
    return JsonResponse(
        {"detail": "Invalid token."},
        status=status.HTTP_401_UNAUTHORIZED,
    )
```
The async send_message view only authenticates via an Authorization: Token ... header. The React client calls this endpoint using cookies (credentials: "include") and does not send an Authorization header, so requests will consistently return 401 here. Either support session authentication in this view (e.g., accept authenticated request.user) or update the frontend to retrieve/store an API access token and include it in the Authorization header.
Suggested replacement:
```python
# Determine authenticated user.
user = getattr(request, "user", None)
if not (user and getattr(user, "is_authenticated", False)):
    # Fall back to manual token auth for the async view when session auth
    # is not available.
    token_key = _get_token_from_request(request)
    if not token_key:
        return JsonResponse(
            {"detail": "Authentication credentials were not provided."},
            status=status.HTTP_401_UNAUTHORIZED,
        )

    from durin.models import AuthToken

    try:
        auth_token_obj = await sync_to_async(
            AuthToken.objects.select_related("user").get
        )(token=token_key)
        user = auth_token_obj.user
    except AuthToken.DoesNotExist:
        return JsonResponse(
            {"detail": "Invalid token."},
            status=status.HTTP_401_UNAUTHORIZED,
        )
```
```python
class ChatSession(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    user = models.ForeignKey(
        settings.AUTH_USER_MODEL,
        on_delete=models.CASCADE,
```
This PR introduces new Django models for the chatbot app, but no schema migration is included. Without an initial migration under api_app/chatbot/migrations/, deployments and CI runs that apply migrations will fail and the feature can't be enabled.
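Generating the missing initial migration is a one-command fix, assuming the app label is `chatbot` as registered in the app config:

```shell
python manage.py makemigrations chatbot
# Commit the generated file under api_app/chatbot/migrations/
# so deployments and CI can apply it.
```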
```javascript
const chunk = decoder.decode(value);
const lines = chunk.split("\n");

for (const line of lines) {
  if (!line.startsWith("data: ")) continue;
```
The SSE parsing logic splits each received chunk by \n and ignores JSON parse errors. Because network chunk boundaries can split an SSE data: line (or the JSON payload) across reads, this will drop partial data and lose events/tokens. Buffer incomplete lines between reads (carry the remainder to the next chunk) and only parse once a full data: ... line is assembled.
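The buffering fix is language-agnostic: append each chunk to a carry-over buffer, split on newlines, and keep the last (possibly incomplete) piece for the next read. A minimal sketch of the algorithm, written in Python for brevity (the widget itself is JavaScript, and the function name here is illustrative):

```python
import json

def parse_sse_chunks(chunks):
    """Yield parsed SSE data payloads, buffering lines split across chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        lines = buffer.split("\n")
        # The last element may be an incomplete line; carry it to the next read.
        buffer = lines.pop()
        for line in lines:
            if line.startswith("data: "):
                yield json.loads(line[len("data: "):])
    # Flush a final complete line if the stream ended without a trailing newline.
    if buffer.startswith("data: "):
        yield json.loads(buffer[len("data: "):])

# A "data:" line split across two network reads is still parsed correctly:
events = list(parse_sse_chunks(['data: {"tok', 'en": "hi"}\n\n']))
```

The same carry-over pattern translates directly to the widget's `getReader()` loop: keep the remainder string in component-local state between `read()` calls.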
```javascript
const resp = await fetch(`${CHATBOT_SESSIONS_URI}/${sid}/messages`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  credentials: "include",
  body: JSON.stringify({ message: text }),
```
These requests only send cookies (credentials: "include") but the backend streaming endpoint currently requires an Authorization: Token ... header (Durin token). As-is, the widget cannot authenticate to /chatbot/sessions/{sid}/messages. Either add the Authorization header here (by retrieving an API access token) or align the backend to accept session auth for the web UI.
```python
resp = AsyncMock()
resp.status_code = status_code
resp.json.return_value = json_data
resp.raise_for_status = AsyncMock()
if status_code >= 400:
```
_mock_response builds an AsyncMock for an httpx response, but in execute_tool both resp.raise_for_status() and resp.json() are synchronous methods. Mocking them as async produces coroutine objects (and can prevent exceptions from being raised), making these tests inaccurate or failing. Prefer a MagicMock response with sync .json() / .raise_for_status() behavior.
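The coroutine problem is easy to demonstrate with the stdlib mocks alone (`RuntimeError` below stands in for `httpx.HTTPStatusError`, to keep the sketch dependency-free):

```python
import inspect
from unittest.mock import AsyncMock, MagicMock

# With AsyncMock, calling a method returns a coroutine, not the configured value:
async_resp = AsyncMock()
async_resp.json.return_value = {"ok": True}
value = async_resp.json()
assert inspect.iscoroutine(value)  # not the dict the test expects
value.close()  # avoid an un-awaited-coroutine warning

# A MagicMock gives synchronous .json() / .raise_for_status(), matching
# httpx.Response's synchronous methods:
sync_resp = MagicMock()
sync_resp.status_code = 500
sync_resp.json.return_value = {"detail": "error"}
sync_resp.raise_for_status.side_effect = RuntimeError("HTTP 500")  # stand-in exception
```

With the `MagicMock` variant, `execute_tool`'s `resp.raise_for_status()` actually raises and `resp.json()` returns real data, so the error-path tests exercise the code they claim to.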
```python
class ChatSessionViewSet(viewsets.ModelViewSet):
    """CRUD operations for chat sessions."""

    serializer_class = ChatSessionSerializer
    permission_classes = [IsAuthenticated]
```
CHATBOT_ENABLED is enforced in send_message, but the DRF ChatSessionViewSet has no equivalent check. When the feature flag is off, clients can still create/list/delete sessions via /chatbot/sessions, which undermines the intended default-disabled behavior. Consider blocking these viewset actions (e.g., return 404/403) when CHATBOT_ENABLED is false.
Summary
Adds an LLM-powered threat intelligence chatbot to IntelOwl as a new Django app (api_app/chatbot/). Analysts can query IntelOwl's data in natural language — search jobs, retrieve reports, look up analyzers, search observables, and trigger scans — all through a streaming chat interface.

Architecture: litellm + explicit tool-calling loop in Django async views. One new dependency, zero new services. No LangChain, no FastAPI, no RAG — just a clean ~120-line agent loop that calls litellm and dispatches tools against IntelOwl's own REST API.
Related to #3435
What's Included
Backend (
api_app/chatbot/)agent.py— Core tool-calling loop. Streamslitellm.acompletionchunks, accumulates tool calls from deltas, executes them, appends results, and loops until the LLM responds with text or hits a safety limit (10 rounds max). Yields typed SSE events (token,tool_call,tool_result,done,error).tools.py— 5 tools mapped to IntelOwl endpoints, each using the requesting user's auth token:search_jobsGET /api/jobsget_job_reportGET /api/jobs/{id}get_analyzer_configGET /api/analyzersearch_observablesGET /api/analyzablecreate_scanPOST /api/analyze_observableviews.py—ChatSessionViewSet(standard DRF CRUD) +send_messageasync view returningStreamingHttpResponsewith SSE. Manual Durin token auth on the async endpoint since DRF decorators don't support async views natively.models.py—ChatSession(UUID PK, user FK) andChatMessage(role, content, tool metadata).prompts.py— System prompt builder injecting user context and security instructions (never fabricate results, confirm before scanning, ignore instructions in analysis data).Frontend (
frontend/src/components/chat/)fetch+response.body.getReader()— no EventSource library needed.Infrastructure
docker/chatbot.override.yml— Optional Ollama service. Not in the default stack — opt-in via compose override.intel_owl/settings/chatbot.py—CHATBOT_ENABLED=Falseby default. Model switching is a config change:CHATBOT_MODEL=ollama/llama3.1for local,CHATBOT_MODEL=gpt-4ofor cloud.litellmadded torequirements/project-requirements.txt— sole new dependency. Handles Ollama/OpenAI/Anthropic translation.Tests (
tests/api_app/chatbot/)test_agent.py— Mockedlitellm.acompletion: text response, single tool call round-trip, max rounds guard, LLM API error handling.test_tools.py— Mockedhttpx: all 5 tools against fake API responses, auth token propagation, unknown tool error handling.Why This Architecture
Why litellm, not LangChain/LangGraph:
LangGraph is a state machine framework for multi-agent orchestration. IntelOwl needs one agent calling five tools. The tool-calling loop is 120 lines of readable Python — not a compiled graph you debug through framework internals. litellm provides model abstraction (100+ providers) without owning the application architecture. One dependency vs five-plus.
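To make "120 lines of readable Python" concrete, here is a condensed, hypothetical sketch of such a loop. All names, message shapes, and the fake LLM are illustrative, not the PR's actual code; the real agent.py additionally streams deltas and yields SSE events:

```python
import json

MAX_TOOL_ROUNDS = 10  # safety limit, as described above

def run_agent(llm, execute_tool, messages):
    """Minimal tool-calling loop: call the LLM, run any requested tools,
    feed results back, and stop on a plain-text answer or the round limit."""
    for _ in range(MAX_TOOL_ROUNDS):
        reply = llm(messages)  # e.g. a litellm completion wrapper
        tool_calls = reply.get("tool_calls")
        if not tool_calls:
            return reply["content"]  # final text answer
        messages.append({"role": "assistant", "tool_calls": tool_calls})
        for call in tool_calls:
            result = execute_tool(call["name"], call["arguments"])
            messages.append({"role": "tool", "content": json.dumps(result)})
    return "Stopped: too many tool rounds."

def fake_llm(messages):
    # Round 1: request a tool; round 2 (after a tool result exists): answer.
    if any(m["role"] == "tool" for m in messages):
        return {"content": "1 job found."}
    return {"tool_calls": [{"name": "search_jobs", "arguments": {"q": "evil.com"}}]}

answer = run_agent(
    fake_llm,
    lambda name, args: {"count": 1},
    [{"role": "user", "content": "any jobs for evil.com?"}],
)
```

The whole control flow is visible in one screen, which is the debuggability argument being made against a framework-managed graph.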
Why Django async views, not FastAPI:
Django has native async support since 4.1 and StreamingHttpResponse since 4.2. A separate FastAPI service means two processes, two auth systems, inter-service latency, more Docker complexity — for zero capability that Django doesn't already have. Every IntelOwl maintainer already knows Django.
IntelOwl's data is structured and queryable via REST API. The LLM queries it through tools and gets exact, current results. RAG would embed reports into a vector store, adding a Celery embedding pipeline, a vector DB service, and stale-data risk — for no accuracy gain over GET /api/jobs?observable_name=malware.exe.
Unidirectional (client receives tokens), works through all proxies without special config, Django supports it natively without Channels, standard HTTP auth semantics.
Security
- create_scan tool description instructs the LLM to confirm with the user before execution.
- CHATBOT_ENABLED=False default — disabled until explicitly opted in.
- MAX_TOOL_ROUNDS=10 prevents infinite tool-calling loops.

How to Test Locally
Checklist
- api_app/chatbot registered in INSTALLED_APPS