
Is there any literature on the different ways translation invariance can be achieved when classifying images with convolutional neural networks (CNNs)? Aside from relying on the CNN architecture itself, has anyone attempted something different, for example pre-processing?

  • I can't give you a specific link, but I'd start looking into convolutional neural networks (CNNs). I don't know whether there are other approaches to this problem. Commented Sep 2, 2016 at 13:20

1 Answer


This answer by Matt Krause on "What is translation invariance in computer vision and convolutional neural network?" contains some pointers:

One can show that the convolution operator commutes with respect to translation. If you convolve $f$ with $g$, it doesn't matter if you translate the convolved output $f*g$, or if you translate $f$ or $g$ first, then convolve them. Wikipedia has a bit more.
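A minimal numerical check of this commutation property, sketched in NumPy. It assumes circular (periodic) convolution, implemented via the FFT, so that the identity holds exactly; the array sizes and random inputs are only illustrative:

```python
# Check that (circular) convolution commutes with translation:
# shifting the image and then convolving equals convolving and then shifting.
import numpy as np

rng = np.random.default_rng(0)
f = rng.random((32, 32))          # "image"
g = rng.random((32, 32))          # "filter", zero-padded to the image size

def circ_conv(a, b):
    """2-D circular convolution computed via the FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(a) * np.fft.fft2(b)))

shift = (5, -3)                   # translate 5 rows down, 3 columns left

a = circ_conv(np.roll(f, shift, axis=(0, 1)), g)   # translate, then convolve
b = np.roll(circ_conv(f, g), shift, axis=(0, 1))   # convolve, then translate

print(np.allclose(a, b))          # True: convolution is translation-equivariant
```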

One approach to translation-invariant object recognition is to take a "template" of the object and convolve it with the image at every possible location. A large response at a location suggests that an object resembling the template is present there. This approach is often called template matching.
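A hedged sketch of that template-matching idea in plain NumPy. The synthetic image, the planted template location, and the normalised cross-correlation score are illustrative assumptions, not part of the quoted answer:

```python
# Slide a small template over every location of a larger image and report the
# location with the highest normalised cross-correlation as the detection.
import numpy as np

rng = np.random.default_rng(1)
image = rng.random((64, 64))
template = image[20:28, 35:43].copy()      # plant the template at (20, 35)

th, tw = template.shape
t_norm = (template - template.mean()) / (template.std() + 1e-8)

best_score, best_loc = -np.inf, None
for i in range(image.shape[0] - th + 1):
    for j in range(image.shape[1] - tw + 1):
        patch = image[i:i + th, j:j + tw]
        p_norm = (patch - patch.mean()) / (patch.std() + 1e-8)
        score = np.sum(p_norm * t_norm)    # normalised cross-correlation
        if score > best_score:
            best_score, best_loc = score, (i, j)

print(best_loc)                            # expected: (20, 35), where the template was taken from
```

Because the same score is computed at every position, the detection does not depend on where the object sits in the image, which is the translation-invariance property the answer describes.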

You may also find this technical report interesting; it gives an overview: Leibo, Joel Z., Jim Mutch, Lorenzo Rosasco, Shimon Ullman, and Tomaso Poggio. "Learning generic invariances in object recognition: translation and scale." (2010). https://scholar.google.com/scholar?cluster=17887886525836197513&hl=en&as_sdt=0,22 ; http://cbcl.mit.edu/cbcl/publications/ps/Efficiency_of_invariance_and_learning_CBCL_TR.pdf

