# Event-based video intelligence with 98% cost reduction
Multi-source video processing SDK with intelligent frame selection, motion tracking, and VLM-powered analysis. Built for production use with RTSP, ONVIF, UDP, WebRTC, and more coming soon.
> **Note:** `pip install vlm-sdk` installs the SDK components (connectors, preprocessors, providers). The FastAPI service depends on additional packages; install them separately if you plan to run the API.
- **Event-based processing**: Only analyze frames with motion/activity (98% cost reduction vs frame-by-frame)
- **Multi-source connectors**: RTSP, ONVIF, UDP, WebRTC, File
- **RT-DETR + ByteTrack**: Real-time object detection and motion tracking
- **Provider-agnostic VLM**: Gemini, Qwen, ObserveeVLM (small VLM coming soon), selected via env config
- **Advanced analysis**: Timestamps, object detection, bounding boxes, range queries
- **FastAPI REST API**: Industry-standard multi-stream video intelligence
- **Server-Sent Events (SSE)**: Real-time event streaming
- **Authentication**: API key-based auth with rate limiting
- **Monitoring**: Health checks, metrics, stream management
- **Configurable**: Environment-based provider selection
```bash
# Install from PyPI
pip install vlm-sdk

# Or install from source
git clone https://github.com/observee-ai/vlm-sdk.git
cd vlm-sdk
pip install -e .
```

```python
from vlm.preprocessors import DetectorPreprocessor
from vlm.connectors import RTSPConnector
from vlm.providers.gemini import GeminiVideoService
import asyncio

# Initialize components
connector = RTSPConnector("rtsp://camera.local/stream1")
preprocessor = DetectorPreprocessor({
    "confidence_threshold": 0.6,
    "interesting_objects": ["person", "car"],
    "min_event_duration": 2.0,  # Only events longer than 2 seconds
})
gemini = GeminiVideoService(api_key="your-gemini-key")

# Process stream
async def process():
    for frame in connector.stream_frames():
        result = preprocessor.process_frame(frame.data, frame.timestamp)
        if result['status'] == 'completed':
            # Event detected! Analyze with VLM
            upload = await gemini.upload_file(result['clip_path'])
            analysis = await gemini.query_video_with_file(
                upload['name'],
                "Describe the activity in this video"
            )
            print(f"Analysis: {analysis['response']}")

asyncio.run(process())
```
`DetectorPreprocessor` accepts a configuration dictionary (matching the keys documented in `vlm/preprocessors/detector/core.py`). Use `interesting_objects` to control tracked classes and `min_event_duration` for event length thresholds. Configuration keys such as `confidence_threshold`, `interesting_objects`, and `min_event_duration` must be provided via the config dict (not as individual keyword arguments).
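As a concrete illustration of the dict-based configuration, a small helper (hypothetical, not part of the SDK) can keep the documented keys in one place; the defaults shown here are illustrative, not the SDK's:

```python
def detector_config(confidence_threshold=0.5,
                    interesting_objects=None,
                    min_event_duration=1.0):
    """Build a DetectorPreprocessor config dict from the documented keys.

    Only confidence_threshold, interesting_objects, and min_event_duration
    are documented above; defaults are illustrative.
    """
    return {
        "confidence_threshold": confidence_threshold,
        "interesting_objects": interesting_objects or ["person"],
        "min_event_duration": min_event_duration,
    }

config = detector_config(confidence_threshold=0.6,
                         interesting_objects=["person", "car"],
                         min_event_duration=2.0)
# Pass the dict itself (not keyword arguments) to the preprocessor:
# preprocessor = DetectorPreprocessor(config)
```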
```bash
# Set environment variables
export ADMIN_API_KEY=your-secret-key
export GEMINI_API_KEY=your-gemini-key
export VLM_PROVIDER=gemini  # or openai, anthropic

# Install SDK (from repo checkout)
pip install -e .

# Install API dependencies (required for running api.main)
pip install fastapi "uvicorn[standard]" pydantic python-dotenv
# or install everything we ship in Docker
pip install -r requirements.txt

# Run server
python -m api.main
# Server starts at http://localhost:8000
```

> **Note:** To accept WebRTC publishers, run MediaMTX alongside the API using the provided `mediamtx.yml` (see docs/apiguide.md for commands).
```bash
# Pull the public image (linux/amd64)
docker pull observee/vlm-sdk:latest

# Run the API (set your API keys as needed)
docker run --rm -p 8000:8000 \
  -e ADMIN_API_KEY=your-secret-key \
  -e GEMINI_API_KEY=your-gemini-key \
  observee/vlm-sdk:latest
```

Create a stream:
```bash
curl -X POST http://localhost:8000/v1/streams/create \
  -H "X-Admin-API-Key: your-secret-key" \
  -H "X-VLM-API-Key: your-gemini-key" \
  -H "Content-Type: application/json" \
  -d '{
    "source_type": "rtsp",
    "source_url": "rtsp://camera.local/stream1",
    "config": {
      "username": "admin",
      "password": "password",
      "profile": "security",
      "min_duration": 2.0
    },
    "analysis": {
      "enabled": true,
      "mode": "basic",
      "prompt": "Describe any activity or movement"
    }
  }'
```

Listen to events (SSE):
```bash
curl -N http://localhost:8000/v1/streams/{stream_id}/events \
  -H "X-Admin-API-Key: your-secret-key"
```

```bash
# Required
ADMIN_API_KEY=your-admin-key           # API authentication

# VLM Provider (choose one)
VLM_PROVIDER=gemini                    # gemini, openai, or anthropic
GEMINI_API_KEY=your-gemini-key         # If using Gemini
OPENAI_API_KEY=your-openai-key         # If using OpenAI
ANTHROPIC_API_KEY=your-anthropic-key   # If using Claude

# Optional: Rate Limiting
RATE_LIMIT_REQUESTS=100                # Requests per window
RATE_LIMIT_WINDOW=60                   # Time window (seconds)
```

**Basic** - Simple video description
```json
{
  "analysis": {
    "mode": "basic",
    "prompt": "Describe the activity"
  }
}
```

**Timestamps** - Find specific moments
```json
{
  "analysis": {
    "mode": "timestamps",
    "find_timestamps": {
      "query": "when does someone wave",
      "find_all": true,
      "confidence_threshold": 0.7
    }
  }
}
```

| Connector | Description | Config |
|---|---|---|
| RTSP | IP camera streams | username, password, transport (tcp/udp) |
| ONVIF | Auto-discovery + PTZ | username, password, profile_index |
| UDP | UDP video receiver | host, port, buffer_size |
| WebRTC | Browser streams | signaling_url, ice_servers |
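The table above can be sketched as a small config-validation helper (connector names and keys taken from the table; the SDK's actual constructors may accept more):

```python
# Allowed config keys per connector, taken from the connector table above.
CONNECTOR_CONFIG_KEYS = {
    "rtsp":   {"username", "password", "transport"},
    "onvif":  {"username", "password", "profile_index"},
    "udp":    {"host", "port", "buffer_size"},
    "webrtc": {"signaling_url", "ice_servers"},
}

def unknown_keys(source_type: str, config: dict) -> set:
    """Return any config keys not documented for the given connector."""
    return set(config) - CONNECTOR_CONFIG_KEYS[source_type]

# "port" is not an RTSP key in the table -- it belongs to the UDP connector.
print(unknown_keys("rtsp", {"username": "admin", "password": "pw", "port": 554}))
# → {'port'}
```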
```
POST   /v1/streams/create            Create stream
GET    /v1/streams/{id}/events       SSE event stream
GET    /v1/streams/{id}              Get status
DELETE /v1/streams/{id}              Stop stream
GET    /v1/streams                   List all streams
GET    /v1/streams/discover/onvif    Discover cameras
GET    /v1/streams/health            Health check
```

```
┌──────────────┐
│  Connector   │  (RTSP/ONVIF/UDP/WebRTC)
└──────┬───────┘
       │ Frames
       ▼
┌──────────────┐
│   RT-DETR    │  (Object detection + motion tracking)
└──────┬───────┘
       │ Events (only motion/activity)
       ▼
┌──────────────┐
│ Event Buffer │  (Collects frames during events)
└──────┬───────┘
       │ Complete Events
       ├────────────────┐
       ▼                ▼
┌────────────┐   ┌────────────┐
│  Storage   │   │    VLM     │  (Gemini/Qwen/ObserveeVLM)
└────────────┘   └─────┬──────┘
                       ▼
              ┌────────────────┐
              │ SSE / Webhooks │
              └────────────────┘
```

**Key Innovation:** Event-based processing analyzes only frames with detected motion/activity, reducing VLM API calls by 98% compared to frame-by-frame analysis.
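The SSE endpoint emits events as `data:` lines in the standard `text/event-stream` format; a minimal client-side parser might look like this (the payload field names in the sample are hypothetical, and a real client would read the stream incrementally rather than from a string):

```python
import json

def parse_sse(raw: str) -> list:
    """Parse Server-Sent Events text into a list of JSON payloads.

    Assumes each event carries a single `data:` line with a JSON body,
    as emitted by /v1/streams/{stream_id}/events. Events are separated
    by a blank line per the SSE format.
    """
    events = []
    for block in raw.split("\n\n"):
        for line in block.splitlines():
            if line.startswith("data:"):
                events.append(json.loads(line[len("data:"):].strip()))
    return events

# Hypothetical sample payloads for illustration only.
sample = (
    'data: {"event": "motion_start", "stream_id": "abc"}\n\n'
    'data: {"event": "motion_end", "stream_id": "abc"}\n\n'
)
for evt in parse_sse(sample):
    print(evt["event"])
# → motion_start
# → motion_end
```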
```
vlm-sdk/
├── vlm/              # Core SDK components
├── api/              # FastAPI service (routers, services, models)
├── examples/         # Sample scripts for RTSP/UDP/WebRTC usage
├── docs/             # Additional documentation
├── mediamtx/         # MediaMTX config for WebRTC/RTSP bridging
├── output/           # Example generated clips (safe to remove)
├── pyproject.toml    # SDK packaging metadata
├── requirements.txt  # Full dependency list for API/Docker
├── Dockerfile        # Reference container for the API
└── README.md
```

```bash
# Clone repository
git clone https://github.com/observee-ai/vlm-sdk.git
cd vlm-sdk

# Install with dev dependencies
pip install -e ".[dev]"

# Include API stack if you plan to run the server locally
pip install -r requirements.txt

# Run tests
pytest tests/

# Format code
black vlm/ api/
ruff check vlm/ api/

# Run API server (development)
uvicorn api.main:app --reload
```

- **Security & Surveillance**: 24/7 perimeter monitoring with motion alerts
- **Retail Analytics**: Customer counting, queue analysis, behavior tracking
- **Traffic Monitoring**: Vehicle counting, flow analysis, incident detection
- **Smart Home**: Activity monitoring, intrusion detection
- **Industrial**: Safety compliance, equipment monitoring
| Approach | Frames/Hour | VLM API Calls | Cost Reduction |
|---|---|---|---|
| Frame-by-frame | 54,000 (15 FPS) | 54,000 | Baseline |
| Event-based (VLMS) | 54,000 | ~1,000 | ~98% |
Example: a 1-hour, 15 FPS stream with 5-10 motion events.
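The figures in the table follow directly from the event-based design; a quick back-of-the-envelope check:

```python
fps = 15
frames_per_hour = fps * 60 * 60   # 54,000 frames at 15 FPS
event_frames = 1_000              # ~1,000 frames across 5-10 motion events
reduction = 1 - event_frames / frames_per_hour

print(frames_per_hour)            # → 54000
print(f"{reduction:.1%}")         # → 98.1%
```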
Contributions welcome! Please see CONTRIBUTING.md for guidelines.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Apache-2.0: a permissive license suitable for commercial and open-source use.
See LICENSE for the complete text. Commercial support is available on request.
- Ultralytics RT-DETR: Object detection and tracking
- FastAPI: Modern Python web framework
- Google Gemini: Video understanding API
- Qwen API: Alternative video understanding API
- ByteTrack: Multi-object tracking algorithm
Built with ❤️ for efficient video intelligence in SF