Image Processing: Algorithm Improvement for Real-Time FedEx Logo Detector

Question

I've been working on a project involving image processing for logo detection. Specifically, the goal is to develop an automated system for a real-time FedEx truck/logo detector that reads frames from an IP camera stream and sends a notification on detection. Here's a sample of the system in action, with the recognized logo surrounded by the green rectangle.

Some constraints on the project:

Uses raw OpenCV (no deep learning, AI, or trained neural networks)
Image background can be noisy
The brightness of the image can vary greatly (morning, afternoon, night)
The FedEx truck/logo can have any scale, rotation, or orientation since it could be parked anywhere on the sidewalk
The logo could potentially be fuzzy or blurry with different shades depending on the time of day
There may be many other vehicles with similar sizes or colors in the same frame
Real-time detection (~25 FPS from IP camera)
The IP camera is in a fixed position and the FedEx truck will always be in the same orientation (never backwards or upside down)
The FedEx truck will always be the "red" variation instead of the "green" variation

Current Implementation/Algorithm

I have two threads:

Thread #1 - Captures frames from the IP camera using cv2.VideoCapture() and resizes frame for further processing. Decided to handle grabbing frames in a separate thread to improve FPS by reducing I/O latency since cv2.VideoCapture() is blocking. By dedicating an independent thread just for capturing frames, this would allow the main processing thread to always have a frame available to perform detection on.
Thread #2 - Main processing/detection thread to detect FedEx logo using color thresholding and contour detection.
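The capture thread described above could look roughly like the sketch below. It is only an illustration, not the code from my project: the class name LatestFrameGrabber and its methods are hypothetical, and it assumes the capture object exposes a read() method returning (ok, frame), as cv2.VideoCapture does.

```python
import threading
import time


class LatestFrameGrabber:
    """Keep only the newest frame from a capture object in a background thread."""

    def __init__(self, capture):
        # `capture` is anything with read() -> (ok, frame),
        # for example cv2.VideoCapture(stream_url) (assumed here).
        self._capture = capture
        self._lock = threading.Lock()
        self._frame = None
        self._running = False
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def start(self):
        self._running = True
        self._thread.start()
        return self

    def _loop(self):
        while self._running:
            ok, frame = self._capture.read()  # blocking call stays off the main thread
            if not ok:
                time.sleep(0.01)  # brief back-off after a dropped frame
                continue
            with self._lock:
                self._frame = frame  # overwrite, so stale frames are discarded

    def latest(self):
        """Return the most recent frame, or None if nothing has arrived yet."""
        with self._lock:
            return self._frame

    def stop(self):
        self._running = False
        self._thread.join(timeout=1.0)
```

The detection thread then calls latest() whenever it is ready for more work, so it never waits on the camera and never processes a backlog of old frames.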

Overall Pseudo-algorithm

    For each frame:
        Find bounding box for purple color of logo
        Find bounding box for red/orange color of logo
        If both bounding boxes are valid/adjacent and contours pass checks:
            Combine bounding boxes
            Draw combined bounding boxes on original frame
            Play sound notification for detected logo

Color thresholding for logo detection

For color thresholding, I have defined HSV (low, high) thresholds for purple and red to detect the logo.

    colors = {
        'purple': ([120, 45, 45], [150, 255, 255]),
        'red': ([0, 130, 0], [15, 255, 255])
    }

To find the bounding box coordinates for each color, I follow this algorithm:

Blur the frame
Erode and dilate the frame with a kernel to remove background noise
Convert frame from BGR to HSV color format
Perform a mask on the frame using the lower and upper HSV color bounds with set color thresholds
Find largest contour in the mask and obtain bounding coordinates

After performing a mask, I obtain these isolated purple (left) and red (right) sections of the logo.

False positive checks

Now that I have the two masks, I perform checks to ensure that the found bounding boxes actually form a logo. To do this, I use cv2.matchShapes(), which compares two contours and returns a similarity metric: the lower the result, the closer the match. In addition, I use cv2.pointPolygonTest(), which finds the shortest distance between a point in the image and a contour, for additional verification. My false positive process involves:

Checking if the bounding boxes are valid
Ensuring the two bounding boxes are adjacent based on their relative proximity

If the bounding boxes pass the adjacency and similarity metric test, the bounding boxes are combined and a FedEx notification is triggered.
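To make the adjacency test concrete, here is one possible way to express it for (x, y, w, h) boxes. The helper names and the two thresholds (max_gap_ratio and the 50% vertical overlap) are illustrative assumptions, not the values used in my system.

```python
def boxes_adjacent(box_a, box_b, max_gap_ratio=0.5):
    """Return True if two (x, y, w, h) boxes sit side by side on roughly one line."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Horizontal gap between the boxes (negative when they overlap).
    gap = max(ax, bx) - min(ax + aw, bx + bw)
    # Vertical overlap: both halves of the logo should share a baseline.
    overlap = min(ay + ah, by + bh) - max(ay, by)
    return gap <= max_gap_ratio * min(aw, bw) and overlap >= 0.5 * min(ah, bh)


def combine_boxes(box_a, box_b):
    """Return the smallest (x, y, w, h) box that contains both inputs."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    x1, y1 = min(ax, bx), min(ay, by)
    x2, y2 = max(ax + aw, bx + bw), max(ay + ah, by + bh)
    return (x1, y1, x2 - x1, y2 - y1)
```

Scaling the allowed gap by the box width keeps the test independent of how far the truck is from the camera.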

Results

This check algorithm is not really robust as there are many false positives and failed detections. For instance, these false positives were triggered.

While this color thresholding and contour detection approach worked in basic cases where the logo was clear, it was severely lacking in some areas:

There are latency problems from having to compute bounding boxes on each frame
It occasionally produces false detections when the logo is not present
Brightness and time of day had a great impact on detection accuracy
When the logo was at a skewed angle, color thresholding still found the colors, but the check algorithm rejected the logo

Would anyone be able to help me improve my algorithm or suggest alternative detection strategies? Is there any other way to perform this detection since color thresholding is highly dependent on exact calibration? If possible, I would like to move away from color thresholding and the multiple layers of filters since it's not very robust. Any insight or advice is greatly appreciated!

One idea is to filter out the false contours by shape matching: once you have detected the purple and red contours, you can check whether the shapes match (purple with purple and red with red) to about 70%. The logo has a fixed shape, which makes it easier to detect.
– Bahramdun Adil, Commented Apr 2, 2019 at 2:39
Have a look at stackoverflow.com/questions/10168686/… since it's basically the same problem. Also note stackoverflow.com/questions/24299500/can-sift-run-in-realtime
– Piglet, Commented Apr 2, 2019 at 8:59
Color segmentation is a good start, but I think you should try using a custom-trained Haar cascade. Haar features are the backbone of human face detection. You need some positive and negative samples to train the model.
– ZdaR, Commented Jun 18, 2019 at 4:58
Is the IP camera in a fixed position and looking at a one-way street? The FedEx truck will show up with the logo in the same location and the same orientation: the truck will never be backwards or upside down. Those constraints simplify the problem greatly.
– Stephen Meschke, Commented Jun 20, 2019 at 16:47
@StephenMeschke The IP camera is in a fixed position and looking at a one-way street, like in the first picture. Yes, the FedEx truck will always show up in that orientation; it will never be backwards or upside down. In addition, the FedEx truck will always be the "red" variation like in the picture, not the "green" ground truck.
– nathancy, Commented Jun 20, 2019 at 19:37

Carl H · Accepted Answer · 2019-06-20 23:04:39Z

You might want to take a look at feature matching. The goal is to find features in two images, a template image, and a noisy image and match them. This would allow you to find the template (the logo) in the noisy image (the camera image).

A feature is, in essence, things that humans would find interesting in an image, such as corners or open spaces. I would recommend using a scale-invariant feature transform (SIFT) as a feature detection algorithm. The reason I suggest using SIFT is that it is invariant to image translation, scaling, and rotation, partially invariant to illumination changes and robust to local geometric distortion. This matches your specification.

Example of feature detection

I generated the above image using code modified from the OpenCV docs on SIFT feature detection:

    import numpy as np
    import cv2
    from matplotlib import pyplot as plt

    img = cv2.imread('main.jpg', 0)  # target image (grayscale)

    # Create the SIFT object
    # (OpenCV 4.4+ also exposes this as cv2.SIFT_create)
    sift = cv2.xfeatures2d.SIFT_create(700)

    # Find keypoints and descriptors directly
    kp, des = sift.detectAndCompute(img, None)

    # Add the keypoints to the final image
    img2 = cv2.drawKeypoints(img, kp, None, (255, 0, 0), 4)

    # Show the image
    plt.imshow(img2)
    plt.show()

You will notice when doing this that a large number of the features do land on the FedEx logo (Above).

The next thing I did was try matching the features from the video feed to the features in the FedEx logo. I did this using the FLANN feature matcher. You could have gone with many approaches (including brute force) but because you are working on a video feed this is probably your best option. The code below is inspired from the OpenCV docs on feature matching:

    import numpy as np
    import cv2
    from matplotlib import pyplot as plt

    logo = cv2.imread('logo.jpg', 0)  # query Image
    img = cv2.imread('main2.jpg', 0)  # target Image

    # Create the sift object
    sift = cv2.xfeatures2d.SIFT_create(700)

    # Find keypoints and descriptors directly
    kp1, des1 = sift.detectAndCompute(img, None)
    kp2, des2 = sift.detectAndCompute(logo, None)

    # FLANN parameters
    FLANN_INDEX_KDTREE = 1
    index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
    search_params = dict(checks=50)  # or pass empty dictionary
    flann = cv2.FlannBasedMatcher(index_params, search_params)
    matches = flann.knnMatch(des1, des2, k=2)

    # Need to draw only good matches, so create a mask
    matchesMask = [[0, 0] for i in range(len(matches))]

    # ratio test as per Lowe's paper
    for i, (m, n) in enumerate(matches):
        if m.distance < 0.7 * n.distance:
            matchesMask[i] = [1, 0]

    # Draw lines
    draw_params = dict(matchColor=(0, 255, 0),
                       singlePointColor=(255, 0, 0),
                       matchesMask=matchesMask,
                       flags=0)

    # Display the matches
    img3 = cv2.drawMatchesKnn(img, kp1, logo, kp2, matches, None, **draw_params)
    plt.imshow(img3)
    plt.show()

Using this I was able to get the following features matched as seen below. You will notice that there are outliers. However the majority of features match:

The final step would then be to simply draw a bounding box around the matched region. I will link you to another Stack Overflow question which does something similar but with the ORB detector. Here is another way to get a bounding box using the OpenCV docs.

I hope this helps!

SIFT feature detection seems like a very good approach, especially since it seems to be resistant to rotation and illumination changes. My only concern is the processing time for each frame using this approach. I will definitely try to integrate this method into the current system. Thank you for your answer!
Very good point, the processing time is a concern. Maybe then, instead of using SIFT, try a speeded up robust features (SURF) detector. It does essentially the same thing but is a lot faster (here is the original paper, see table 2). I remember reading somewhere that OpenCV's version of SURF is not well implemented (and thus slow), so it might not be what you are looking for. Just in case, here is the link to the original source code. Regardless, super interesting problem, good luck!
Try ORB (docs.opencv.org/3.4/d1/d89/tutorial_py_orb.html) with FLANN (docs.opencv.org/3.4/d5/d6f/tutorial_feature_flann_matcher.html), this is a very fast and open-source alternative to the patented SURF and SIFT methods.

fireant · Accepted Answer · 2019-06-25 17:10:16Z

You can help the detector by preprocessing the image; then you don't need as many training images.

First we reduce the barrel distortion.

    import cv2
    import numpy as np

    img = cv2.imread('fedex.jpg')

    # add border as the undistorted image is going to be larger
    margin = 150
    img = cv2.copyMakeBorder(img, margin, margin, margin, margin,
                             cv2.BORDER_CONSTANT, value=0)

    width = img.shape[1]
    height = img.shape[0]

    # distortion coefficients: radial (k1, k2) and tangential (p1, p2)
    distCoeff = np.zeros((4, 1), np.float64)
    k1 = -4.5e-5
    k2 = 0.0
    p1 = 0.0
    p2 = 0.0
    distCoeff[0, 0] = k1
    distCoeff[1, 0] = k2
    distCoeff[2, 0] = p1
    distCoeff[3, 0] = p2

    # camera matrix
    cam = np.eye(3, dtype=np.float32)
    cam[0, 2] = width / 2.0   # define center x
    cam[1, 2] = height / 2.0  # define center y
    cam[0, 0] = 12.           # define focal length x
    cam[1, 1] = 12.           # define focal length y

    dst = cv2.undistort(img, cam, distCoeff)

Then we transform the image as if the camera were facing the FedEx truck head on. That is, wherever along the curb the truck is parked, the FedEx logo will have almost the same size and orientation.

    # use four points for homography estimation, coordinates taken from undistorted image
    # 1. top-left corner of F
    # 2. bottom-left corner of F
    # 3. top-right of E
    # 4. bottom-right of E
    pts_src = np.array([[1083, 235], [1069, 343], [1238, 301], [1201, 454]])
    pts_dst = np.array([[1069, 235], [1069, 320], [1201, 235], [1201, 320]])
    h, status = cv2.findHomography(pts_src, pts_dst)
    im_out = cv2.warpPerspective(dst, h, (dst.shape[1], dst.shape[0]))
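To see what the homography does to individual pixels, here is a NumPy-only sketch that solves for the 3x3 matrix from the same four point pairs and maps the source corners through it. This is a plain direct linear solve written for illustration (it assumes the bottom-right entry of H can be fixed to 1), not what cv2.findHomography does internally; the helper names are hypothetical.

```python
import numpy as np


def homography_from_four_points(src, dst):
    """Solve for the 3x3 homography H (with H[2, 2] = 1) mapping src[i] to dst[i]."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence, eight unknowns in total.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h = np.linalg.solve(np.array(rows, float), np.array(rhs, float))
    return np.append(h, 1.0).reshape(3, 3)


def apply_homography(H, points):
    """Map (N, 2) pixel coordinates through H using homogeneous coordinates."""
    pts = np.c_[np.asarray(points, float), np.ones(len(points))]
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]  # divide by the third coordinate


pts_src = [[1083, 235], [1069, 343], [1238, 301], [1201, 454]]
pts_dst = [[1069, 235], [1069, 320], [1201, 235], [1201, 320]]
H = homography_from_four_points(pts_src, pts_dst)
mapped = apply_homography(H, pts_src)  # should land on pts_dst
```

Because the camera is fixed, the matrix only needs to be computed once and can then be reused for every frame.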

Very interesting approach to ensure that the logo is always facing the front. This method would definitely be a great preprocessing stage before detection, and it would remove the problems involving logo rotation.
