I have a dataset of 330 images containing guns. For each image there is an associated text file (parsed roughly as sketched after this list) that contains:
- The number of objects (guns) in the image.
- The coordinates of the bounding box(es) around the gun(s) in the image.
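For reference, this is roughly how I read the annotation files. The exact layout below is my assumption of a simple format (first line is the object count, each following line holds four coordinates), and the file path is just a placeholder:

```python
from pathlib import Path

# Assumed annotation layout (my files may differ slightly):
#   line 1            -> number of guns in the image
#   each further line -> x_min y_min x_max y_max for one gun
def read_annotation(txt_path):
    lines = Path(txt_path).read_text().strip().splitlines()
    num_objects = int(lines[0])
    boxes = [tuple(float(v) for v in line.split())
             for line in lines[1:1 + num_objects]]
    return num_objects, boxes

# Hypothetical path, just to show usage
num_objects, boxes = read_annotation("annotations/image_001.txt")
```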
I need to train a model that takes an image as input and outputs 4 integer values: the coordinates of the bounding box (its vertices).
For training an object detection model like this, should the image be kept as the input and the coordinates as the output of the model? Should there be convolutional layers for feature extraction, followed by fully connected (FC) layers that learn from those features and produce the 4 outputs (the bounding-box coordinates)? A rough sketch of what I have in mind is at the end of this post.
Is this notion of the model architecture correct? Any other tips/suggestions?
I am creating this model entirely in TensorFlow Keras, without using any pretrained models.
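Here is a rough sketch of the architecture I'm describing: conv layers for feature extraction, then FC layers that regress the 4 coordinates. The layer counts, filter sizes, the 224x224 input resolution, and the MSE loss are placeholders I picked for the sketch, not settings I've tuned:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(input_shape=(224, 224, 3)):
    # Convolutional feature extractor followed by FC layers that
    # output the 4 bounding-box coordinates as a regression target.
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        # 4 outputs: (x_min, y_min, x_max, y_max) of the bounding box
        layers.Dense(4, activation="linear"),
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model

model = build_model()
model.summary()
```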