A comprehensive Remotely Operated Vehicle (ROV) system with real-time object detection, tracking, and control capabilities. This project integrates embedded systems (ESP8266, ESP32S3), computer vision (YOLOv8), and a modern web interface for complete ROV operation.
- Overview
- System Architecture
- Features
- Project Structure
- Hardware Requirements
- Software Requirements
- Installation
- Configuration
- Usage
- API Documentation
- Troubleshooting
- Contributing
- License
This project implements a complete ROV control and monitoring system that combines:
- Embedded Control: ESP8266-based motor and servo control
- Video Streaming: ESP32S3 camera module for live video feed
- Object Detection: Real-time YOLOv8 inference with TensorRT acceleration
- Object Tracking: Multi-object tracking using Norfair with Kalman filtering
- Web Interface: React-based control dashboard with real-time visualization
- Data Logging: Automatic detection logging with session management
The system is designed for real-time operation with low latency, making it suitable for applications requiring immediate feedback and control.
```
┌──────────────────────────────────────────────────────────────┐
│                   React Frontend (Web UI)                    │
│   - Control Interface  - Detection Charts  - Camera Feed     │
└──────────────────────────┬───────────────────────────────────┘
                           │ HTTP/WebSocket
┌──────────────────────────┴───────────────────────────────────┐
│               FastAPI Backend (rov_backend.py)               │
│   - Command Routing  - WebSocket Bridge  - Log Management    │
└─────────┬───────────────────────────────┬────────────────────┘
          │ WebSocket                     │ HTTP
┌─────────┴─────────┐          ┌──────────┴───────────────┐
│   ESP8266 Motor   │          │  ESP32S3 Camera Module   │
│   Controller      │          │  (Video Stream Server)   │
└───────────────────┘          └──────────┬───────────────┘
                                          │ MJPEG Stream
                               ┌──────────┴───────────────┐
                               │     Object Detection     │
                               │   (camera_detector.py)   │
                               │   - YOLOv8 TensorRT      │
                               │   - Norfair Tracking     │
                               │   - Detection Logging    │
                               └──────────────────────────┘
```

- Control Flow: User → React UI → FastAPI → ESP8266 → Motors/Servos
- Video Flow: ESP32S3 → MJPEG Stream → Object Detection → Annotated Video
- Data Flow: Object Detection → Log File → FastAPI → React UI (Charts)
- Real-time Joystick Control: 8-directional movement with adjustable speed
- Path Planning: Visual grid-based path planner with automatic execution
- Pan/Tilt Camera Control: Interactive control for camera positioning
- Movement Settings: Configurable forward/backward and turn speeds/durations
- Button Controls: Direct forward, backward, left, right, and stop commands
- Real-time Object Detection: YOLOv8 model with TensorRT acceleration
- Multi-Object Tracking: Persistent tracking across frames using Norfair
- Detection Logging: Automatic logging of detected objects with timestamps
- Session Management: Organize detections into measurement sessions
- Visualization: Pie charts showing detection statistics by object type
- Line Crossing Detection: Tracks objects crossing defined vertical boundaries
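The line-crossing feature can be illustrated with a minimal, framework-free sketch. The function name and signature below are illustrative, not the actual implementation in `camera_detector.py`:

```python
def crossed_line(prev_x, curr_x, line_x):
    """Return the crossing direction ('left_to_right' or 'right_to_left')
    or None if the tracked object did not cross the vertical boundary
    at x = line_x between two frames."""
    if prev_x < line_x <= curr_x:
        return "left_to_right"
    if curr_x <= line_x < prev_x:
        return "right_to_left"
    return None

# Compare one tracked object's x position across consecutive frames:
print(crossed_line(300, 340, 320))  # left_to_right
print(crossed_line(340, 300, 320))  # right_to_left
print(crossed_line(100, 110, 320))  # None
```

In practice the previous position comes from the tracker (persistent IDs from Norfair make this per-object comparison possible across frames).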
- Draggable UI Cards: Customizable dashboard layout
- Live Camera Feed: MJPEG stream display with configurable URL
- Real-time Statistics: FPS, latency, and detection counts
- WebSocket Telemetry: Real-time status updates from ROV
- Responsive Design: Works on desktop and mobile devices
```
ROV-Real-Time-Object-Detection/
│
├── ARDUINO/                          # Embedded firmware
│   ├── ESP8266/                      # Motor and servo controller
│   │   └── sketch_apr2a/
│   │       └── sketch_apr2a.ino      # Main control firmware
│   │
│   └── XIAO ESP32S3/                 # Camera module
│       └── CameraWebServer/
│           ├── CameraWebServer.ino   # Camera server firmware
│           ├── app_httpd.cpp         # HTTP server implementation
│           ├── camera_pins.h         # Camera pin definitions
│           └── partitions.csv        # ESP32 partition table
│
├── Object detection/                 # Computer vision module
│   ├── camera_detector.py            # Main detection script
│   ├── yolo12n.engine                # TensorRT model (generated)
│   ├── detections_log.txt            # Detection log file
│   └── package.json                  # Node dependencies (for charts)
│
├── REACT+API/                        # Web application
│   ├── rov_backend.py                # FastAPI backend server
│   └── rov_frontend/                 # React frontend
│       ├── src/
│       │   ├── App.js                # Main application component
│       │   ├── DetectionPieChart.jsx # Detection visualization
│       │   ├── Animations/           # UI animation components
│       │   └── Backgrounds/          # Background effects
│       ├── public/                   # Static assets
│       └── package.json              # Frontend dependencies
│
└── LICENSE                           # GPL v3 License
```

For detailed information about each component, see:
- ARDUINO/README.md - Embedded firmware documentation
- Object detection/README.md - Detection system documentation
- REACT+API/README.md - Web application documentation
- ESP8266 Development Board (e.g., NodeMCU, Wemos D1 Mini)
- Motor Driver (L298N or similar)
- 2x DC Motors for movement
- 2x Servo Motors for pan/tilt camera mount
- Power Supply (7-12V for motors, 5V for ESP8266)
- ESP32S3 Development Board (XIAO ESP32S3 or similar)
- Camera Module compatible with ESP32 (OV2640, OV3660, or OV5640)
- PSRAM (recommended for better performance)
- Computer with:
- NVIDIA GPU (for TensorRT acceleration)
- CUDA Toolkit 11.0+
- Python 3.8+
- Node.js 16+ (for frontend)
- Python 3.8 or higher
- OpenCV (cv2)
- Ultralytics YOLO
- TensorRT
- NumPy
- CuPy (for GPU acceleration)
- Numba
- Norfair (for object tracking)
- FastAPI
- WebSockets
- Uvicorn
- Node.js 16+ and npm
- React 18+
- Material-UI (MUI)
- Recharts
- Axios
- Arduino IDE 1.8+ or PlatformIO
- ESP8266 Board Support Package
- ESP32 Board Support Package
- Required Libraries:
- WebSocketsServer (for ESP8266)
- ArduinoJson
- Servo
```bash
git clone <repository-url>
cd ROV-Real-Time-Object-Detection

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install opencv-python ultralytics numpy cupy numba norfair fastapi websockets uvicorn
```

```bash
cd REACT+API/rov_frontend
npm install
```

See ARDUINO/README.md for detailed instructions on flashing the ESP8266 and ESP32S3 firmware.
The system uses WiFi in Access Point (AP) mode. Configure the following:
ESP32S3 Camera (Access Point):
- SSID: `ESP32-CAM` (default)
- Password: `123456789` (default)
- IP: `192.168.4.1` (default)

ESP8266 Motor Controller:
- Connects to the ESP32-CAM network
- Static IP: `192.168.4.2` (or .3, .4, .5 for multiple ROVs)
- WebSocket Port: `81`
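With the static IPs and WebSocket port above, the backend address for each ROV follows a predictable pattern. A tiny illustrative helper (not part of `rov_backend.py`, which may construct these differently):

```python
def rov_ws_urls(ips, port=81):
    """Build the WebSocket URL for each ROV motor controller."""
    return [f"ws://{ip}:{port}/" for ip in ips]

CAR_IPS = ["192.168.4.2", "192.168.4.3", "192.168.4.4", "192.168.4.5"]
print(rov_ws_urls(CAR_IPS)[0])  # ws://192.168.4.2:81/
```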
Edit `Object detection/camera_detector.py`:

```python
VIDEO_STREAM_SOURCE = "http://192.168.4.1:81/stream"  # Camera stream URL
MODEL_PATH = "yolo12n.engine"                         # TensorRT model path
MODEL_INPUT_SIZE = 320                                # Input image size
DISPLAY = True                                        # Show video window
```

Edit `REACT+API/rov_backend.py`:
```python
CAR_IPS = ["192.168.4.2", "192.168.4.3", "192.168.4.4", "192.168.4.5"]  # ROV IPs
CAR_PORT = 81                         # WebSocket port
LOG_FILE_PATH = "detections_log.txt"  # Log file path
```

Edit `REACT+API/rov_frontend/src/App.js`:
```javascript
const API_URL = 'http://localhost:8000'; // Backend API URL
```

- Start the Backend Server:
```bash
cd REACT+API
python rov_backend.py
# Or with uvicorn:
uvicorn rov_backend:app --host 0.0.0.0 --port 8000
```

- Start the Frontend:
```bash
cd REACT+API/rov_frontend
npm start
```

- Start Object Detection:
```bash
cd "Object detection"
python camera_detector.py
```

- Access the Web Interface:
- Open a browser to `http://localhost:3000`
- The ROV controller interface will load
- Joystick Control: Use the joystick card to control movement in real-time
- Path Planning:
- Click dots on the grid to create a path
- Click "Start" to execute the path automatically
- Pan/Tilt: Drag the pointer in the pan/tilt box to adjust camera angle
- Movement Settings: Adjust speed and duration sliders for fine control
- Detection Chart: View pie chart of detected object types
- Session Management: Start new measurement sessions with labels
- Log Viewing: Detection logs are automatically updated in real-time
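The path-planning step above turns grid clicks into a sequence of movement commands. The sketch below is purely illustrative of that mapping: it assumes 4-directional steps and ignores the ROV's heading, which the real planner in `App.js` likely tracks:

```python
def path_to_moves(waypoints):
    """Map consecutive (col, row) grid waypoints to movement commands.

    Illustrative only: each step must be one cell in a cardinal
    direction; 'forward' means decreasing row (up on the grid).
    """
    steps = {(0, -1): "forward", (0, 1): "backward",
             (1, 0): "right", (-1, 0): "left"}
    moves = []
    for (c0, r0), (c1, r1) in zip(waypoints, waypoints[1:]):
        moves.append(steps[(c1 - c0, r1 - r0)])
    return moves

print(path_to_moves([(0, 2), (0, 1), (1, 1)]))  # ['forward', 'right']
```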
Send a movement command to the ROV.
Request Body:
```json
{
  "left": 150,    // Left motor speed (-255 to 255)
  "right": -150,  // Right motor speed (-255 to 255, typically inverted)
  "pan": 90,      // Pan angle (0-180)
  "tilt": 90      // Tilt angle (0-180)
}
```

Response:
```json
{ "ok": true }
```

Real-time bidirectional communication with the ROV.
Messages: JSON strings with status updates from ROV.
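Before sending the move payload shown above, values should stay within the ranges the firmware expects. A hedged sketch of building such a payload (the helper name is hypothetical, not part of the backend):

```python
def _clamp(value, lo, hi):
    """Clamp an integer value into [lo, hi]."""
    return max(lo, min(hi, int(value)))

def make_move_command(left, right, pan=90, tilt=90):
    """Build the move-endpoint payload, clamping motor speeds to
    -255..255 and servo angles to 0..180."""
    return {
        "left":  _clamp(left,  -255, 255),
        "right": _clamp(right, -255, 255),
        "pan":   _clamp(pan,   0, 180),
        "tilt":  _clamp(tilt,  0, 180),
    }

print(make_move_command(300, -300, pan=200))
# {'left': 255, 'right': -255, 'pan': 180, 'tilt': 90}
```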
Start a new detection logging session.
Response:
```json
{ "ok": true, "start_pos": 1234 }
```

Get new log entries since the last session start.
Response:
```json
{
  "ok": true,
  "entries": "2024-01-01 12:00:00.123 | ID: 1 | class: person | x: 100 | y: 200\n..."
}
```

End the current logging session.
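Entries returned by the log endpoint use a pipe-delimited format; a minimal stdlib parser sketch (the helper name is illustrative):

```python
def parse_log_entry(line):
    """Parse one pipe-delimited detection log line into a dict, e.g.
    '2024-01-01 12:00:00.123 | ID: 1 | class: person | x: 100 | y: 200'."""
    timestamp, *fields = [part.strip() for part in line.split("|")]
    entry = {"timestamp": timestamp}
    for field in fields:
        key, value = field.split(":", 1)
        entry[key.strip().lower()] = value.strip()
    return entry

e = parse_log_entry(
    "2024-01-01 12:00:00.123 | ID: 1 | class: person | x: 100 | y: 200")
print(e["class"], e["x"])  # person 100
```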
Start a new measurement session with optional label.
Request Body:
```json
{ "label": "Test Run 1" }
```

Response:
```json
{ "ok": true, "session_id": "20240101120000" }
```

- Verify ESP32S3 is powered and connected
- Check WiFi connection to the `ESP32-CAM` network
- Verify the camera stream URL in the detection script
- Check camera module connections
- Verify ESP8266 is connected to WiFi network
- Check WebSocket connection in backend logs
- Verify motor driver connections
- Check power supply voltage
- Verify NVIDIA GPU and CUDA are installed
- Check TensorRT model file exists
- Verify camera stream is accessible
- Check GPU memory availability
- Verify backend server is running on port 8000
- Check CORS settings in backend
- Verify API_URL in frontend code
- Check browser console for errors
- Reduce Model Input Size: Lower `MODEL_INPUT_SIZE` for faster inference
- Adjust Confidence Threshold: Modify the `conf` parameter in the YOLO predict call
- Disable Display: Set `DISPLAY = False` to reduce CPU usage
- Optimize Network: Use a wired connection for lower latency
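To check the effect of these optimizations, the FPS figure shown in the UI can be estimated over a rolling window of frame timestamps. A stdlib-only sketch (the class name is illustrative, not the actual implementation in `camera_detector.py`):

```python
from collections import deque

class FPSMeter:
    """Rolling FPS estimate over the last `window` frame timestamps."""
    def __init__(self, window=30):
        self.times = deque(maxlen=window)

    def tick(self, timestamp):
        """Record a frame timestamp (seconds) and return current FPS."""
        self.times.append(timestamp)
        if len(self.times) < 2:
            return 0.0
        span = self.times[-1] - self.times[0]
        return (len(self.times) - 1) / span if span > 0 else 0.0

meter = FPSMeter()
fps = 0.0
for i in range(10):
    fps = meter.tick(i * 0.05)  # simulate a frame every 50 ms
print(round(fps))  # 20
```

In the real loop the argument would be `time.perf_counter()` captured once per processed frame.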
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
- Ultralytics YOLO for object detection
- Norfair for object tracking
- FastAPI for the backend framework