VLMS - Video Intelligence SDK

Event-based video intelligence with 98% cost reduction

Multi-source video processing SDK with intelligent frame selection, motion tracking, and VLM-powered analysis. Built for production use with RTSP, ONVIF, UDP, and WebRTC connectors, with more coming soon.

Note: pip install vlm-sdk installs the SDK components (connectors, preprocessors, providers). The FastAPI service depends on additional packages; install them separately if you plan to run the API.

Python 3.10+ · License: Apache-2.0


🌟 Features

Core SDK (vlm)

  • 🎯 Event-based processing: Only analyze frames with motion/activity (98% cost reduction vs frame-by-frame)
  • πŸ“Ή Multi-source connectors: RTSP, ONVIF, UDP, WebRTC, File
  • πŸ€– RT-DETR + ByteTrack: Real-time object detection and motion tracking
  • 🧠 Provider-agnostic VLM: Gemini, Qwen, and ObserveeVLM (small VLM coming soon), selected via environment configuration
  • 🎨 Advanced analysis: Timestamps, object detection, bounding boxes, range queries

Production API (api)

  • ⚑ FastAPI REST API: Industry-standard multi-stream video intelligence
  • πŸ“‘ Server-Sent Events (SSE): Real-time event streaming
  • πŸ” Authentication: API key-based auth with rate limiting
  • πŸ“Š Monitoring: Health checks, metrics, stream management
  • πŸ”§ Configurable: Environment-based provider selection

πŸš€ Quick Start

Installation

```shell
# Install from PyPI
pip install vlm-sdk

# Or install from source
git clone https://github.com/observee-ai/vlm-sdk.git
cd vlm-sdk
pip install -e .
```

SDK Usage

```python
import asyncio

from vlm.preprocessors import DetectorPreprocessor
from vlm.connectors import RTSPConnector
from vlm.providers.gemini import GeminiVideoService

# Initialize components
connector = RTSPConnector("rtsp://camera.local/stream1")
preprocessor = DetectorPreprocessor({
    "confidence_threshold": 0.6,
    "interesting_objects": ["person", "car"],
    "min_event_duration": 2.0,  # Only events longer than 2 seconds
})
gemini = GeminiVideoService(api_key="your-gemini-key")

# Process stream
async def process():
    for frame in connector.stream_frames():
        result = preprocessor.process_frame(frame.data, frame.timestamp)
        if result["status"] == "completed":
            # Event detected! Analyze with VLM
            upload = await gemini.upload_file(result["clip_path"])
            analysis = await gemini.query_video_with_file(
                upload["name"],
                "Describe the activity in this video",
            )
            print(f"Analysis: {analysis['response']}")

asyncio.run(process())
```

DetectorPreprocessor accepts a single configuration dictionary (matching the keys documented in vlm/preprocessors/detector/core.py). Keys such as confidence_threshold, interesting_objects, and min_event_duration must be passed inside that dict, not as individual keyword arguments: interesting_objects controls which classes are tracked, and min_event_duration sets the minimum event length.

API Server

```shell
# Set environment variables
export ADMIN_API_KEY=your-secret-key
export GEMINI_API_KEY=your-gemini-key
export VLM_PROVIDER=gemini  # or openai, anthropic

# Install SDK (from repo checkout)
pip install -e .

# Install API dependencies (required for running api.main)
pip install fastapi "uvicorn[standard]" pydantic python-dotenv
# or install everything we ship in Docker
pip install -r requirements.txt

# Run server
python -m api.main
# Server starts at http://localhost:8000
```

Note: To accept WebRTC publishers, run MediaMTX alongside the API using the provided mediamtx.yml (see docs/apiguide.md for commands).

Docker Image

```shell
# Pull the public image (linux/amd64)
docker pull observee/vlm-sdk:latest

# Run the API (set your API keys as needed)
docker run --rm -p 8000:8000 \
  -e ADMIN_API_KEY=your-secret-key \
  -e GEMINI_API_KEY=your-gemini-key \
  observee/vlm-sdk:latest
```

Create a stream:

```shell
curl -X POST http://localhost:8000/v1/streams/create \
  -H "X-Admin-API-Key: your-secret-key" \
  -H "X-VLM-API-Key: your-gemini-key" \
  -H "Content-Type: application/json" \
  -d '{
    "source_type": "rtsp",
    "source_url": "rtsp://camera.local/stream1",
    "config": {
      "username": "admin",
      "password": "password",
      "profile": "security",
      "min_duration": 2.0
    },
    "analysis": {
      "enabled": true,
      "mode": "basic",
      "prompt": "Describe any activity or movement"
    }
  }'
```

Listen to events (SSE):

```shell
curl -N http://localhost:8000/v1/streams/{stream_id}/events \
  -H "X-Admin-API-Key: your-secret-key"
```

πŸ“– Documentation

Environment Variables

```shell
# Required
ADMIN_API_KEY=your-admin-key           # API authentication

# VLM Provider (choose one)
VLM_PROVIDER=gemini                    # gemini, openai, or anthropic
GEMINI_API_KEY=your-gemini-key         # If using Gemini
OPENAI_API_KEY=your-openai-key         # If using OpenAI
ANTHROPIC_API_KEY=your-anthropic-key   # If using Claude

# Optional: Rate Limiting
RATE_LIMIT_REQUESTS=100                # Requests per window
RATE_LIMIT_WINDOW=60                   # Time window (seconds)
```
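The provider-selection pattern these variables imply can be sketched as follows (the mapping and function are illustrative, not the service's actual code):

```python
# Which environment variable holds the key for each provider (illustrative).
PROVIDER_KEY_VARS = {
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}

def resolve_vlm_key(env: dict[str, str]) -> str:
    """Pick the API key matching VLM_PROVIDER, failing loudly if missing."""
    provider = env.get("VLM_PROVIDER", "gemini")
    var = PROVIDER_KEY_VARS.get(provider)
    if var is None:
        raise ValueError(f"Unknown VLM_PROVIDER: {provider!r}")
    key = env.get(var)
    if not key:
        raise ValueError(f"{var} must be set when VLM_PROVIDER={provider}")
    return key

# Usage: resolve_vlm_key(dict(os.environ))
```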

Analysis Modes

Basic - Simple video description

```json
{
  "analysis": {
    "mode": "basic",
    "prompt": "Describe the activity"
  }
}
```

Timestamps - Find specific moments

```json
{
  "analysis": {
    "mode": "timestamps",
    "find_timestamps": {
      "query": "when does someone wave",
      "find_all": true,
      "confidence_threshold": 0.7
    }
  }
}
```
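Building this payload from Python might look like the sketch below; the helper function is our own, but the field names mirror the JSON above:

```python
import json

def build_timestamps_payload(query: str, find_all: bool = True,
                             confidence_threshold: float = 0.7) -> dict:
    """Assemble the `analysis` block for timestamps mode."""
    return {
        "analysis": {
            "mode": "timestamps",
            "find_timestamps": {
                "query": query,
                "find_all": find_all,
                "confidence_threshold": confidence_threshold,
            },
        }
    }

payload = build_timestamps_payload("when does someone wave")
# Merge with source fields and POST to /v1/streams/create, e.g.:
# body = {"source_type": "rtsp",
#         "source_url": "rtsp://camera.local/stream1", **payload}
print(json.dumps(payload))
```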

Supported Connectors

| Connector | Description | Config |
|-----------|-------------|--------|
| RTSP | IP camera streams | `username`, `password`, `transport` (tcp/udp) |
| ONVIF | Auto-discovery + PTZ | `username`, `password`, `profile_index` |
| UDP | UDP video receiver | `host`, `port`, `buffer_size` |
| WebRTC | Browser streams | `signaling_url`, `ice_servers` |

API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /v1/streams/create | Create stream |
| GET | /v1/streams/{id}/events | SSE event stream |
| GET | /v1/streams/{id} | Get status |
| DELETE | /v1/streams/{id} | Stop stream |
| GET | /v1/streams | List all streams |
| GET | /v1/streams/discover/onvif | Discover cameras |
| GET | /v1/streams/health | Health check |

πŸ—οΈ Architecture

```
┌─────────────┐
│  Connector  │  (RTSP/ONVIF/UDP/WebRTC)
└──────┬──────┘
       │ Frames
       ▼
┌─────────────┐
│   RT-DETR   │  (Object detection + motion tracking)
└──────┬──────┘
       │ Events (only motion/activity)
       ▼
┌─────────────┐
│ Event Buffer│  (Collects frames during events)
└──────┬──────┘
       │ Complete Events
       ├─────────────────┐
       │                 │
       ▼                 ▼
┌───────────┐      ┌──────────┐
│  Storage  │      │   VLM    │  (Gemini/Qwen/ObserveeVLM)
└───────────┘      └────┬─────┘
                        │
                        ▼
               ┌───────────────┐
               │ SSE / Webhooks│
               └───────────────┘
```

Key Innovation: Event-based processing analyzes only frames with detected motion/activity, reducing VLM API calls by 98% compared to frame-by-frame analysis.


πŸ“¦ Repository Layout

```
vlm-sdk/
├── vlm/              # Core SDK components
├── api/              # FastAPI service (routers, services, models)
├── examples/         # Sample scripts for RTSP/UDP/WebRTC usage
├── docs/             # Additional documentation
├── mediamtx/         # MediaMTX config for WebRTC/RTSP bridging
├── output/           # Example generated clips (safe to remove)
├── pyproject.toml    # SDK packaging metadata
├── requirements.txt  # Full dependency list for API/Docker
├── Dockerfile        # Reference container for the API
└── README.md
```

πŸ”§ Development

```shell
# Clone repository
git clone https://github.com/observee-ai/vlm-sdk.git
cd vlm-sdk

# Install with dev dependencies
pip install -e ".[dev]"

# Include API stack if you plan to run the server locally
pip install -r requirements.txt

# Run tests
pytest tests/

# Format code
black vlm/ api/
ruff check vlm/ api/

# Run API server (development)
uvicorn api.main:app --reload
```

🎯 Use Cases

  • 🏒 Security & Surveillance: 24/7 perimeter monitoring with motion alerts
  • πŸͺ Retail Analytics: Customer counting, queue analysis, behavior tracking
  • πŸš— Traffic Monitoring: Vehicle counting, flow analysis, incident detection
  • 🏠 Smart Home: Activity monitoring, intrusion detection
  • 🏭 Industrial: Safety compliance, equipment monitoring

πŸ“Š Cost Comparison

| Approach | Frames/Hour | VLM API Calls | Cost Reduction |
|----------|-------------|---------------|----------------|
| Frame-by-frame | 54,000 (15 FPS) | 54,000 | Baseline |
| Event-based (VLMS) | 54,000 | ~1,000 | 98% ✅ |

Example: 1-hour 15 FPS stream with 5-10 motion events
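The arithmetic behind the table is straightforward; the event-frame count below is the example's assumption (a handful of motion events yielding roughly 1,000 analyzed frames):

```python
FPS = 15
SECONDS_PER_HOUR = 3600
frames_per_hour = FPS * SECONDS_PER_HOUR   # 54,000 frames

# Frame-by-frame: one VLM call per frame.
baseline_calls = frames_per_hour

# Event-based: only frames inside motion events are analyzed (~1,000 assumed).
event_calls = 1_000

reduction = 1 - event_calls / baseline_calls
print(f"{frames_per_hour} frames/hour, {event_calls} calls "
      f"-> {reduction:.1%} fewer VLM calls")
```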


🀝 Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

πŸ“„ License

Apache-2.0 – Permissive license suitable for commercial and open-source use.

See LICENSE for the complete text. Commercial support is available on request.


πŸ™ Acknowledgments

  • Ultralytics RT-DETR: Object detection and tracking
  • FastAPI: Modern Python web framework
  • Google Gemini: Video understanding API
  • Qwen API: Alternative Video Understanding API
  • ByteTrack: Multi-object tracking algorithm

Built with ❀️ for efficient video intelligence in SF
