
vshortt73/vision_syststems


Vision System Demo

Standalone demo combining Object Detection (YOLOv8) and Face Recognition (face_recognition library) for testing and experimentation.

Features

  • 🎯 Object Detection: Detects 80 object classes (people, animals, everyday objects)
  • 👤 Face Recognition: Identifies specific people by name
  • 📹 Live Webcam: Real-time detection and recognition
  • 🖼️ Image Testing: Test on static images
  • 📊 Detailed Analysis: Scene summaries with confidence scores
  • 🎨 Visual Feedback: Color-coded bounding boxes

Installation

Method 1: Automatic (Recommended)

chmod +x install_vision.sh
./install_vision.sh

Method 2: Manual

# Update pip
pip install --upgrade pip

# Install dependencies
pip install opencv-python
pip install cmake
pip install dlib
pip install face_recognition
pip install ultralytics
pip install numpy

Note: On first run, YOLOv8 will automatically download the model weights (~6MB).

Quick Start

Basic Usage (No Face Recognition)

python vision_demo.py

Select option 1 for webcam or option 2 to test an image.

With Face Recognition

  1. Create a directory for known faces:

    mkdir known_faces
  2. Add photos of people you want to recognize:

    # Photo filenames become the person's name
    cp /path/to/victor_photo.jpg known_faces/Victor.jpg
    cp /path/to/jane_photo.jpg known_faces/Jane.jpg
  3. Run the demo:

    python vision_demo.py

Tips for face photos:

  • Front-facing, well-lit photos work best
  • One person per photo
  • Multiple photos per person improve accuracy (support coming soon)
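Under this filename convention, loading the known-face database is just a directory scan. A minimal sketch (the helper name `names_from_photos` is illustrative, not part of the demo):

```python
from pathlib import Path

def names_from_photos(directory):
    """Map each image in a known_faces/ directory to a person name.

    Mirrors the convention above: the file stem becomes the name,
    e.g. Victor.jpg -> "Victor". Non-image files are skipped.
    """
    exts = {".jpg", ".jpeg", ".png"}
    return sorted(p.stem for p in Path(directory).iterdir()
                  if p.suffix.lower() in exts)
```

In the real pipeline, each discovered photo would then be passed to `add_known_face()` to compute and store its encoding.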

Usage

Webcam Mode

python vision_demo.py
# Choose option 1

Controls:

  • SPACE - Analyze current frame and show detailed results
  • s - Save current frame to disk
  • q - Quit

What you'll see:

  • Live webcam feed with FPS counter
  • When you press SPACE:
    • Full scene analysis in terminal
    • Detection results with bounding boxes
    • Natural language description

Image Testing Mode

python vision_demo.py
# Choose option 2
# Enter path to image

Analyzes a static image and displays results.

Understanding the Output

Visual Output

Color-coded bounding boxes:

  • 🟢 GREEN - Identified person (with name and confidence)
  • 🟡 YELLOW - Unidentified person
  • 🔵 CYAN - Animals (cat, dog, bird, etc.)
  • 🔷 BLUE - Objects (chair, laptop, ball, etc.)
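A minimal sketch of how such a color scheme might be wired up for OpenCV drawing (the `kind` labels below are illustrative, not the demo's actual keys; note OpenCV expects BGR tuples, not RGB):

```python
# BGR color tuples matching the box scheme above (OpenCV uses BGR order)
COLORS = {
    "identified_person": (0, 255, 0),    # green
    "unknown_person":    (0, 255, 255),  # yellow
    "animal":            (255, 255, 0),  # cyan
    "object":            (255, 0, 0),    # blue
}

def box_color(kind):
    """Pick a bounding-box color for a detection kind; default to object blue."""
    return COLORS.get(kind, COLORS["object"])
```

Each tuple can be passed straight to `cv2.rectangle(frame, pt1, pt2, box_color(kind), 2)`.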

Terminal Output

Example:

============================================================
SCENE ANALYSIS
============================================================
I can see: Victor, a cat, a laptop, 2 cups

🟢 Identified People:
  • Victor (face: 94.3%, detection: 0.95)

🔵 Animals:
  • Cat (confidence: 0.89)

🔷 Objects:
  • Laptop (confidence: 0.92)
  • Cup (confidence: 0.87)
  • Cup (confidence: 0.85)
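The "I can see" summary line can be built with a simple counting pass over the detections. A sketch in the style of `describe_scene()` (naive pluralization, function name assumed, not the demo's exact code):

```python
from collections import Counter

def summarize(names, objects):
    """Compose a one-line scene summary.

    names   -- identified people, e.g. ["Victor"]
    objects -- detected class labels, e.g. ["cat", "laptop", "cup", "cup"]
    """
    parts = list(names)
    for label, n in Counter(objects).items():
        # Counter preserves first-seen order; pluralize naively with "s"
        parts.append(f"{n} {label}s" if n > 1 else f"a {label}")
    return "I can see: " + ", ".join(parts)
```

For the detections in the example above this produces the same summary line.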

Detectable Objects

People & Animals

  • person, cat, dog, bird, horse, cow, sheep, bear, zebra, giraffe, elephant

Common Objects

  • chair, couch, table, bed, tv, laptop, mouse, keyboard
  • cell phone, book, clock, vase, scissors
  • bottle, cup, fork, knife, spoon
  • car, bicycle, motorcycle, airplane, bus, train, truck
  • traffic light, fire hydrant, stop sign, parking meter, bench
  • backpack, umbrella, handbag, tie, suitcase
  • frisbee, skis, snowboard, sports ball, kite
  • baseball bat, baseball glove, skateboard, surfboard
  • tennis racket, wine glass
  • ... and more, for 80 classes in total


Performance

Typical frame rates on CPU:

  • Object detection only: ~20-30 FPS
  • Object detection + face recognition: ~5-10 FPS
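The FPS counter shown on the webcam overlay can be implemented as an exponential moving average over frame intervals. A sketch (not the demo's exact code):

```python
import time

class FPSMeter:
    """Smoothed frames-per-second counter; call tick() once per frame."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha   # smoothing factor: higher reacts faster
        self.fps = 0.0
        self._last = None

    def tick(self):
        now = time.perf_counter()
        if self._last is not None:
            inst = 1.0 / (now - self._last)  # instantaneous FPS
            # Seed with the first measurement, then smooth
            self.fps = inst if self.fps == 0 else (
                self.alpha * inst + (1 - self.alpha) * self.fps)
        self._last = now
        return self.fps
```

Calling `meter.tick()` at the top of the capture loop and drawing the returned value with `cv2.putText` gives the overlay counter.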

GPU acceleration:

  • YOLOv8 automatically uses CUDA if available
  • For face_recognition GPU support, dlib must be compiled with CUDA
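To check which backends actually see a GPU on your machine, you can query each library directly. A small diagnostic sketch (the helper name is mine; it degrades gracefully if a library is not installed):

```python
def cuda_report():
    """Report CUDA availability per backend: True/False, or None if
    the library is not installed."""
    report = {}
    try:
        import torch  # ultralytics/YOLOv8 runs on PyTorch
        report["yolo_cuda"] = torch.cuda.is_available()
    except ImportError:
        report["yolo_cuda"] = None
    try:
        import dlib   # face_recognition runs on dlib
        report["dlib_cuda"] = bool(dlib.DLIB_USE_CUDA)
    except ImportError:
        report["dlib_cuda"] = None
    return report
```

If `dlib_cuda` is `False`, dlib was built without CUDA and would need to be recompiled against it.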

Troubleshooting

"Could not open webcam"

  • Check webcam is connected and not in use by another program
  • Try different camera index: edit cap = cv2.VideoCapture(1) in the code

"No face found in image"

  • Ensure face is clearly visible and well-lit
  • Try a different photo with better face visibility
  • Face should be reasonably large in frame

dlib installation fails

Ubuntu/Debian:

sudo apt-get install build-essential cmake
sudo apt-get install libopenblas-dev liblapack-dev
sudo apt-get install libx11-dev libgtk-3-dev
pip install dlib

YOLOv8 model won't download

Manually download from: https://github.com/ultralytics/assets/releases
Place in: ~/.cache/ultralytics/

Code Structure

VisionDemo
├── __init__()          # Initialize models
├── add_known_face()    # Add person to database
├── detect_objects()    # Run YOLO detection
├── identify_face()     # Run face recognition
├── process_frame()     # Complete pipeline
├── draw_detections()   # Visualize results
├── describe_scene()    # Natural language output
├── run_webcam()        # Live demo
└── test_image()        # Static image demo

Customization

Adjust detection confidence

# In detect_objects(), filter by confidence
if confidence < 0.5:  # Adjust threshold
    continue

Change YOLO model size

# Faster but less accurate
self.yolo = YOLO('yolov8n.pt')  # nano (current)

# More accurate but slower
self.yolo = YOLO('yolov8s.pt')  # small
self.yolo = YOLO('yolov8m.pt')  # medium
self.yolo = YOLO('yolov8l.pt')  # large
self.yolo = YOLO('yolov8x.pt')  # extra large

Adjust face recognition tolerance

# In identify_face(), change tolerance
matches = face_recognition.compare_faces(
    [known_encoding],
    face_encodings[0],
    tolerance=0.6  # Lower = stricter (default 0.6)
)

Filter object categories

# Add to process_frame() to ignore certain objects
if det['class'] in ['chair', 'table']:  # Objects to ignore
    continue

Integration with Iris

Once you're comfortable with how this works, key components to integrate:

  1. VisionDemo.process_frame() - Main processing pipeline
  2. VisionDemo.describe_scene() - Natural language generation
  3. VisionDemo.add_known_face() - Building the face database

The structured scene dictionary can be passed directly into Iris's context.
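A minimal sketch of such an adapter, assuming a scene dict with `people` and `objects` keys (these key names and the `[vision]` prefix are illustrative, not Iris's actual schema):

```python
def scene_to_context(scene):
    """Serialize a scene dict into a one-line context string for a chat model.

    scene -- e.g. {"people": ["Victor"], "objects": ["laptop", "cup"]}
    """
    people = ", ".join(scene.get("people", [])) or "no one I recognize"
    objects = ", ".join(scene.get("objects", [])) or "nothing notable"
    return f"[vision] People: {people}. Objects: {objects}."
```

The resulting string can be prepended to the assistant's prompt so that replies can reference what the camera currently sees.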

Next Steps

  1. Test with your webcam: See what objects are detected
  2. Add your face: Create known_faces/Victor.jpg
  3. Experiment with different scenes: Try different objects, lighting
  4. Understand the output: See how confidence scores work
  5. Customize: Adjust thresholds and filters for your needs
  6. Integrate: Once comfortable, add to Iris's vision system

Resources

License

This demo uses:

  • YOLOv8: AGPL-3.0
  • face_recognition: MIT
  • dlib: Boost Software License

For commercial use, review each library's licensing requirements.
