The overall development and growing era of the 21st century is reflected in different fields and use-cases. Computer vision has shifted its pace from old statistical methods to deep learning neural network methods. It is widely used in place for facial recognition with indexing, photo stabilization or machine vision. Major applications have been developed to process the image data and generate insights from it. Here we discuss the evolution of various deep learning architectures that deal with processing image data.
Classification of Images
The journey of deep learning originates with the classification problem of image data sets. Images with objects like cats or dogs must be classified into a given set of categories or labels. For this task, Convolutional Neural Network (CNN) must be trained with images and its corresponding labels.
Understanding Convolutional Neural Network
CNN works with the principle of convolution, which is a form of matrix multiplication which will down-sample the features to a matrix/vector of features. The resultant matrix or vector is later given to a layer of neurons which will further transform the vector to a single value vector. This single value will be later identified as the label.
Detection & Recognition
Classification of images is a straightforward task. The image contains only one object and the entire image is entitled to that object which is presented as label.
Object Detection (Single & Multiple)
Image classification and image recognition have seen many advances and deep neural architectures which give better accuracy than the previous one.
Semantic Segmentation
Now that regional object detection is achieved it became essential to detect the object without failing to detect the objects present under the bounding box of another object. A concept in image processing which presents the idea of pixel level segmentation is called Semantic Segmentation of images.