A standalone demo combining object detection (YOLOv8) and face recognition (the face_recognition library), for testing and experimentation.
- 🎯 Object Detection: Detects 80 object classes (people, animals, everyday objects)
- 🤖 Face Recognition: Identifies specific people by name
- 📹 Live Webcam: Real-time detection and recognition
- 🖼️ Image Testing: Test on static images
- 📊 Detailed Analysis: Scene summaries with confidence scores
- 🎨 Visual Feedback: Color-coded bounding boxes
```bash
chmod +x install_vision.sh
./install_vision.sh
```

Or install the dependencies manually:

```bash
# Update pip
pip install --upgrade pip

# Install dependencies
pip install opencv-python
pip install cmake
pip install dlib
pip install face_recognition
pip install ultralytics
pip install numpy
```

Note: On first run, YOLOv8 will automatically download the model weights (~6 MB).
```bash
python vision_demo.py
```

Select option 1 for webcam or option 2 to test an image.
1. Create a directory for known faces:

   ```bash
   mkdir known_faces
   ```

2. Add photos of people you want to recognize:

   ```bash
   # Photo filenames become the person's name
   cp /path/to/victor_photo.jpg known_faces/Victor.jpg
   cp /path/to/jane_photo.jpg known_faces/Jane.jpg
   ```

3. Run the demo:

   ```bash
   python vision_demo.py
   ```
Tips for face photos:
- Front-facing, well-lit photos work best
- One person per photo
- Support for multiple photos per person (to improve accuracy) is coming soon
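The filename-to-name convention from step 2 can be sketched in plain Python. Note that `names_from_photos` is a hypothetical helper for illustration, not part of the demo's actual API:

```python
from pathlib import Path

def names_from_photos(folder):
    """Return {person_name: photo_path}, using each filename stem as the name."""
    exts = {'.jpg', '.jpeg', '.png'}
    return {p.stem: p for p in Path(folder).iterdir()
            if p.suffix.lower() in exts}
```

For example, `names_from_photos('known_faces')` would map `Victor` to `known_faces/Victor.jpg`, which is then the name shown on screen.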
```bash
python vision_demo.py
# Choose option 1
```

Controls:
- SPACE - Analyze current frame and show detailed results
- s - Save current frame to disk
- q - Quit
What you'll see:
- Live webcam feed with FPS counter
- When you press SPACE:
- Full scene analysis in terminal
- Detection results with bounding boxes
- Natural language description
```bash
python vision_demo.py
# Choose option 2
# Enter path to image
```

Analyzes a static image and displays the results.
Color-coded bounding boxes:
- 🟢 GREEN - Identified person (with name and confidence)
- 🟡 YELLOW - Unidentified person
- 🔵 CYAN - Animals (cat, dog, bird, etc.)
- 🔷 BLUE - Objects (chair, laptop, ball, etc.)
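The color scheme above can be expressed as a small lookup. This is an illustrative sketch, not the demo's exact code; the BGR tuples (OpenCV channel order) are assumptions:

```python
# BGR tuples (OpenCV channel order); exact values are assumptions
GREEN = (0, 255, 0)
YELLOW = (0, 255, 255)
CYAN = (255, 255, 0)
BLUE = (255, 0, 0)

ANIMALS = {'cat', 'dog', 'bird', 'horse', 'cow', 'sheep',
           'bear', 'zebra', 'giraffe', 'elephant'}

def box_color(class_name, identified=False):
    """Pick a bounding-box color following the scheme above."""
    if class_name == 'person':
        return GREEN if identified else YELLOW
    if class_name in ANIMALS:
        return CYAN
    return BLUE
```

The returned tuple can be passed straight to `cv2.rectangle` as its `color` argument.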
Example:
```
============================================================
                      SCENE ANALYSIS
============================================================
I can see: Victor, a cat, a laptop, 2 cups

🟢 Identified People:
  • Victor (face: 94.3%, detection: 0.95)

🔵 Animals:
  • Cat (confidence: 0.89)

🔷 Objects:
  • Laptop (confidence: 0.92)
  • Cup (confidence: 0.87)
  • Cup (confidence: 0.85)
```

- person, cat, dog, bird, horse, cow, sheep, bear, zebra, giraffe, elephant
- chair, couch, table, bed, tv, laptop, mouse, keyboard
- cell phone, book, clock, vase, scissors
- bottle, cup, fork, knife, spoon
- car, bicycle, motorcycle, airplane, bus, train, truck
- traffic light, fire hydrant, stop sign, parking meter, bench
- backpack, umbrella, handbag, tie, suitcase
- frisbee, skis, snowboard, sports ball, kite
- baseball bat, baseball glove, skateboard, surfboard
- tennis racket, wine glass
- ...and more, for 80 classes in total
Typical frame rates on CPU:
- Object detection only: ~20-30 FPS
- Object detection + face recognition: ~5-10 FPS
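To check throughput on your own machine, you can time the pipeline directly. Here `measure_fps` is a hypothetical helper that wraps any per-frame callable; it is not part of the demo:

```python
import time

def measure_fps(process_frame, frames=30):
    """Average frames-per-second over a fixed number of calls."""
    t0 = time.perf_counter()
    for _ in range(frames):
        process_frame()
    return frames / (time.perf_counter() - t0)
```

For example, `measure_fps(lambda: demo.process_frame(frame))` (with a captured test frame) gives a rough FPS figure without the webcam loop's display overhead.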
GPU acceleration:
- YOLOv8 automatically uses CUDA if available
- For face_recognition GPU support, dlib must be compiled with CUDA
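A quick way to verify both points, assuming YOLOv8 runs on PyTorch (which ultralytics uses under the hood); this sketch degrades gracefully when either library is missing:

```python
def gpu_support():
    """Report whether CUDA is usable by YOLOv8 (via PyTorch) and by dlib."""
    info = {}
    try:
        import torch  # ultralytics runs YOLOv8 on top of PyTorch
        info['yolo_cuda'] = torch.cuda.is_available()
    except ImportError:
        info['yolo_cuda'] = None  # PyTorch not installed
    try:
        import dlib  # DLIB_USE_CUDA is True only for a CUDA-enabled build
        info['dlib_cuda'] = bool(getattr(dlib, 'DLIB_USE_CUDA', False))
    except ImportError:
        info['dlib_cuda'] = None  # dlib not installed
    return info
```

If `dlib_cuda` is `False`, face recognition falls back to the CPU even on a CUDA machine, which explains the FPS gap above.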
- Check webcam is connected and not in use by another program
- Try a different camera index: edit `cap = cv2.VideoCapture(1)` in the code
- Ensure face is clearly visible and well-lit
- Try a different photo with better face visibility
- Face should be reasonably large in frame
Ubuntu/Debian:
```bash
sudo apt-get install build-essential cmake
sudo apt-get install libopenblas-dev liblapack-dev
sudo apt-get install libx11-dev libgtk-3-dev
pip install dlib
```

If the YOLOv8 weights fail to download automatically, download them manually from https://github.com/ultralytics/assets/releases and place them in `~/.cache/ultralytics/`.
```
VisionDemo
├── __init__()          # Initialize models
├── add_known_face()    # Add person to database
├── detect_objects()    # Run YOLO detection
├── identify_face()     # Run face recognition
├── process_frame()     # Complete pipeline
├── draw_detections()   # Visualize results
├── describe_scene()    # Natural language output
├── run_webcam()        # Live demo
└── test_image()        # Static image demo
```

```python
# In detect_objects(), filter by confidence
if confidence < 0.5:  # Adjust threshold
    continue
```

```python
# Faster but less accurate
self.yolo = YOLO('yolov8n.pt')  # nano (current)

# More accurate but slower
self.yolo = YOLO('yolov8s.pt')  # small
self.yolo = YOLO('yolov8m.pt')  # medium
self.yolo = YOLO('yolov8l.pt')  # large
self.yolo = YOLO('yolov8x.pt')  # extra large
```

```python
# In identify_face(), change tolerance
matches = face_recognition.compare_faces(
    [known_encoding], face_encodings[0],
    tolerance=0.6  # Lower = stricter (default 0.6)
)
```

```python
# Add to process_frame() to ignore certain objects
if det['class'] in ['chair', 'table']:  # Objects to ignore
    continue
```

Once you're comfortable with how this works, these are the key components to integrate:
- VisionDemo.process_frame() - Main processing pipeline
- VisionDemo.describe_scene() - Natural language generation
- VisionDemo.add_known_face() - Building the face database
The structured scene dictionary can be passed directly into Iris's context.
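As a sketch, that dictionary might look like the following. The field names here are illustrative, mirroring the example output above; the real `process_frame()` schema may differ:

```python
# Hypothetical shape of the structured scene dictionary; the actual
# process_frame() output may use different field names.
scene = {
    'description': 'I can see: Victor, a cat, a laptop, 2 cups',
    'people': [
        {'name': 'Victor', 'face_confidence': 0.943, 'detection_confidence': 0.95},
    ],
    'animals': [{'class': 'cat', 'confidence': 0.89}],
    'objects': [
        {'class': 'laptop', 'confidence': 0.92},
        {'class': 'cup', 'confidence': 0.87},
        {'class': 'cup', 'confidence': 0.85},
    ],
}
```

Because it is plain data, it can be serialized (e.g. with `json.dumps`) or embedded directly into a prompt or context window.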
- Test with your webcam: See what objects are detected
- Add your face: Create known_faces/Victor.jpg
- Experiment with different scenes: Try different objects, lighting
- Understand the output: See how confidence scores work
- Customize: Adjust thresholds and filters for your needs
- Integrate: Once comfortable, add to Iris's vision system
This demo uses:
- YOLOv8: AGPL-3.0
- face_recognition: MIT
- dlib: Boost Software License
For commercial use, review each library's licensing requirements.