Revisions to Image Processing: Algorithm Improvement for 'Coca-Cola Can' Recognition

replaced http://stackoverflow.com/ with https://stackoverflow.com/

edited May 23, 2017 at 11:47

URL Rewriter Bot

I really like Darren Cook's Darren Cook's and stacker's answers stacker's answers to this problem. I was in the midst of throwing my thoughts into a comment on those, but I believe my approach is too answer-shaped to not leave here.

In short summary, you've identified an algorithm to determine that a Coca-Cola logo is present at a particular location in space. You're now trying to determine, for arbitrary orientations and arbitrary scaling factors, a heuristic suitable for distinguishing Coca-Cola cans from other objects, inclusive of: bottles, billboards, advertisements, and Coca-Cola paraphernalia all associated with this iconic logo. You didn't call out many of these additional cases in your problem statement, but I feel they're vital to the success of your algorithm.

The secret here is determining what visual features a can contains or, through the negative space, what features are present for other Coke products that are not present for cans. To that end, the current top answer the current top answer sketches out a basic approach for selecting "can" if and only if "bottle" is not identified, either by the presence of a bottle cap, liquid, or other similar visual heuristics.

The problem is this breaks down. A bottle could, for example, be empty and lack the presence of a cap, leading to a false positive. Or, it could be a partial bottle with additional features mangled, leading again to false detection. Needless to say, this isn't elegant, nor is it effective for our purposes.

To this end, the most correct selection criteria for cans appear to be the following:

Is the shape of the object silhouette, as you sketched out in your question you sketched out in your question, correct? If so, +1.
If we assume the presence of natural or artificial light, do we detect a chrome outline to the bottle that signifies whether this is made of aluminum? If so, +1.
Do we determine that the specular properties of the object are correct, relative to our light sources (illustrative video link on light source detection)? If so, +1.
Can we determine any other properties about the object that identify it as a can, including, but not limited to, the topological image skew of the logo, the orientation of the object, the juxtaposition of the object (for example, on a planar surface like a table or in the context of other cans), and the presence of a pull tab? If so, for each, +1.

Your classification might then look like the following:

For each candidate match, if the presence of a Coca Cola logo was detected, draw a gray border.
For each match over +2, draw a red border.

This visually highlights to the user what was detected, emphasizing weak positives that may, correctly, be detected as mangled cans.

The detection of each property carries a very different time and space complexity, and for each approach, a quick pass through http://dsp.stackexchange.com is more than reasonable for determining the most correct and most efficient algorithm for your purposes. My intent here is, purely and simply, to emphasize that detecting if something is a can by invalidating a small portion of the candidate detection space isn't the most robust or effective solution to this problem, and ideally, you should take the appropriate actions accordingly.

And hey, congrats on the Hacker News posting! On the whole, this is a pretty terrific question worthy of the publicity it received. :)

I really like Darren Cook's and stacker's answers to this problem. I was in the midst of throwing my thoughts into a comment on those, but I believe my approach is too answer-shaped to not leave here.

In short summary, you've identified an algorithm to determine that a Coca-Cola logo is present at a particular location in space. You're now trying to determine, for arbitrary orientations and arbitrary scaling factors, a heuristic suitable for distinguishing Coca-Cola cans from other objects, inclusive of: bottles, billboards, advertisements, and Coca-Cola paraphernalia all associated with this iconic logo. You didn't call out many of these additional cases in your problem statement, but I feel they're vital to the success of your algorithm.

The secret here is determining what visual features a can contains or, through the negative space, what features are present for other Coke products that are not present for cans. To that end, the current top answer sketches out a basic approach for selecting "can" if and only if "bottle" is not identified, either by the presence of a bottle cap, liquid, or other similar visual heuristics.

The problem is this breaks down. A bottle could, for example, be empty and lack the presence of a cap, leading to a false positive. Or, it could be a partial bottle with additional features mangled, leading again to false detection. Needless to say, this isn't elegant, nor is it effective for our purposes.

To this end, the most correct selection criteria for cans appear to be the following:

Is the shape of the object silhouette, as you sketched out in your question, correct? If so, +1.
If we assume the presence of natural or artificial light, do we detect a chrome outline to the bottle that signifies whether this is made of aluminum? If so, +1.
Do we determine that the specular properties of the object are correct, relative to our light sources (illustrative video link on light source detection)? If so, +1.
Can we determine any other properties about the object that identify it as a can, including, but not limited to, the topological image skew of the logo, the orientation of the object, the juxtaposition of the object (for example, on a planar surface like a table or in the context of other cans), and the presence of a pull tab? If so, for each, +1.

Your classification might then look like the following:

For each candidate match, if the presence of a Coca Cola logo was detected, draw a gray border.
For each match over +2, draw a red border.