I am writing this blog post to give an overview of deep learning (DL) and to explain how its algorithms work in practice, without diving into the mathematical expressions behind the models. DL is a branch of machine learning (ML) based on computational models built from multiple processing layers that learn representations of data at increasing levels of abstraction. The data can be text, voice, images, video, etc. As for tools to perform deep learning, I only list the most popular ones for the top programming languages of 2017.
On Twitter, the hashtag #DeepLearning makes it easy to find posts related to deep learning.
DL is becoming increasingly indispensable because of its accuracy in solving problems in automatic speech recognition, image recognition, natural language processing (NLP), drug discovery and toxicology, customer relationship management, recommendation systems, bioinformatics, etc.
For people who want to make DL their cup of tea, there are some prerequisites to brush up on in order to understand the skeleton of DL and how to handle function parameters. The starting point includes the basics of linear algebra, probability and information theory, numerical computation, and some ML techniques. Since the algorithms build on these disciplines, beginners are advised to study them in order to master the theoretical side of DL and to be able to interpret the results.
A learning algorithm can be supervised or unsupervised. The goal of a supervised learning algorithm is to build an artificial system that learns the mapping between inputs and outputs and can predict the output for new inputs. The goal of unsupervised learning is to build representations of the input that can be used for decision making, predicting future inputs, efficiently communicating the inputs to another machine, etc.
The perceptron is a supervised learning algorithm; it is a linear classifier that combines a set of weights with the feature vector. The weights are applied to the inputs, and the resulting weighted sum is passed to a function that produces the output. The algorithm has three steps:
- (Step 1) initialize the weights to zero or to small random values;
- (Step 2) compute the output for each training example and update the weights; and
- (Step 3) in offline learning, repeat step 2 as many times as needed until the error is less than a threshold.
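To make the three perceptron steps concrete, here is a minimal sketch in plain Python. The function names (`predict`, `train_perceptron`), the step activation, the learning rate, and the AND-gate data are my own assumptions for illustration; they are not taken from any particular library.

```python
def predict(weights, bias, x):
    # Weighted sum of the inputs, passed to a step function.
    total = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if total > 0 else 0


def train_perceptron(samples, labels, lr=1.0, max_epochs=100, error_threshold=0):
    # Step 1: initialize the weights (and the bias) to zero.
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, y in zip(samples, labels):
            # Step 2: compute the output, then update the weights on a mistake.
            delta = y - predict(weights, bias, x)
            if delta != 0:
                errors += 1
                weights = [w + lr * delta * xi for w, xi in zip(weights, x)]
                bias += lr * delta
        # Step 3: repeat until the number of errors is within the threshold.
        if errors <= error_threshold:
            break
    return weights, bias


# Toy data (assumed for illustration): the logical AND of two inputs.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print(predictions)
```

AND is linearly separable, so the loop stops once every sample is classified correctly. On data that is not linearly separable, such as XOR, the sketch simply runs until `max_epochs`.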
Kernel methods are a class of algorithms for pattern analysis that perform a nonlinear mapping from the raw data into a high-dimensional space, where a simple classifier can be used to handle the data. Algorithms that can be used with inner product information include:
- Ridge Regression
- Fisher Discriminant
- Principal Components Analysis
- Canonical Correlation Analysis
- Spectral Clustering
- Support Vector Machines, etc.
Kernel methods work as follows:
- (Step 1) Embed the data in a vector space and look for linear relations in that space.
- (Step 2) If the map is chosen suitably, complex relations can be simplified and easily detected.
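The two steps can be sketched with a toy example. I assume a degree-2 polynomial kernel on 2-D points and four XOR-like points; the names `feature_map`, `poly_kernel`, and `linear_classifier`, as well as the hand-picked weight vector, are hypothetical choices for illustration only.

```python
import math


def feature_map(x):
    # Step 1: explicit embedding of a 2-D point into a 3-D feature space.
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, z):
    # Same inner product as dot(feature_map(x), feature_map(z)),
    # computed without ever building the feature vectors.
    return dot(x, z) ** 2


# XOR-like data (assumed): the label is +1 when both coordinates share a sign.
points = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
labels = [1, 1, -1, -1]


def linear_classifier(x):
    # Step 2: in the feature space a plain linear rule is enough.
    # The weight vector is picked by hand for this toy data, not learned.
    w = (0.0, 1.0, 0.0)
    return 1 if dot(w, feature_map(x)) > 0 else -1


predicted = [linear_classifier(p) for p in points]
kernel_value = poly_kernel((1.0, 2.0), (3.0, -1.0))
explicit_value = dot(feature_map((1.0, 2.0)), feature_map((3.0, -1.0)))
print(predicted, kernel_value, explicit_value)
```

No straight line separates these four points in the original 2-D space, but after the embedding a single coordinate separates them. The last two values agree up to rounding, which is why algorithms that only need inner products can work with the kernel directly.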
Backpropagation to train multilayer architectures. It replaces hand-engineered features with trainable multilayer networks. The backpropagation (backward propagation of errors) algorithm is a method of training artificial neural networks that looks for a minimum of the error function in weight space using an optimization method such as gradient descent (moving in the direction of steepest descent). The combination of weights that minimizes the error function is considered the solution to the learning problem. The algorithm can be decomposed into the following steps:
- Feed-forward computation (on a graph whose nodes are computing units and whose directed edges transmit numerical information from node to node)
- Backpropagation to the output layer
- Backpropagation to the hidden layer
- Weight updates.
The goal is to increase or decrease each weight so that the error becomes sufficiently small.
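The four steps above can be sketched for a tiny network with two inputs, two hidden units, and one output. Everything specific here is an assumption made for illustration: the sigmoid activation, the squared-error function, the initial weights, the learning rate, and the XOR training data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(net, x):
    # Feed-forward computation: hidden activations, then the output.
    hidden = [sigmoid(b + sum(w * xi for w, xi in zip(ws, x)))
              for ws, b in zip(net["w_hidden"], net["b_hidden"])]
    output = sigmoid(net["b_out"] + sum(w * h for w, h in zip(net["w_out"], hidden)))
    return hidden, output


def train_step(net, x, target, lr):
    hidden, output = forward(net, x)
    # Backpropagation to the output layer, for E = 0.5 * (output - target)^2.
    delta_out = (output - target) * output * (1.0 - output)
    # Backpropagation to the hidden layer (uses the old output weights).
    delta_hidden = [delta_out * w * h * (1.0 - h)
                    for w, h in zip(net["w_out"], hidden)]
    # Weight updates: move each weight against its gradient.
    net["w_out"] = [w - lr * delta_out * h for w, h in zip(net["w_out"], hidden)]
    net["b_out"] -= lr * delta_out
    net["w_hidden"] = [[w - lr * d * xi for w, xi in zip(ws, x)]
                       for ws, d in zip(net["w_hidden"], delta_hidden)]
    net["b_hidden"] = [b - lr * d for b, d in zip(net["b_hidden"], delta_hidden)]


def total_error(net, data):
    return sum(0.5 * (forward(net, x)[1] - t) ** 2 for x, t in data)


# Assumed toy setup: XOR data and arbitrary small initial weights.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
net = {"w_hidden": [[0.15, 0.20], [0.25, 0.30]], "b_hidden": [0.35, 0.35],
       "w_out": [0.40, 0.45], "b_out": 0.60}

error_before = total_error(net, data)
for _ in range(2000):
    for x, target in data:
        train_step(net, x, target, lr=0.5)
error_after = total_error(net, data)
print(error_before, error_after)
```

The sketch only shows that the error shrinks as the weights are adjusted. How small it gets depends on the initial weights, the learning rate, and the number of passes, since gradient descent can settle in a poor local minimum.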
Convolutional neural networks (ConvNet or CNN). A CNN is a type of feed-forward artificial neural network with a specialized connectivity structure:
- it stacks multiple stages of feature extractors;
- higher stages compute more global, more invariant features;
- a classification layer sits at the end.
CNNs were inspired by biological processes and are variations of multilayer perceptrons designed to use minimal amounts of preprocessing. Applications of CNNs include image and video recognition, recommender systems, NLP, etc.
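A single feature-extraction stage can be sketched in plain Python as a convolution, a nonlinearity, and a pooling step. The 5x5 image, the hand-written 2x2 edge kernel, and the helper names (`conv2d`, `relu`, `max_pool`) are assumptions for illustration. A real CNN learns its kernels during training and stacks several such stages before the classification layer.

```python
def conv2d(image, kernel):
    # Slide the kernel over the image (no padding, stride 1) and take
    # the weighted sum at every position.
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(feature_map):
    # Keep positive responses and zero out the rest.
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    # Keep the strongest response in each non-overlapping size x size block.
    return [[max(feature_map[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, size)]
            for i in range(0, len(feature_map) - size + 1, size)]


# Assumed toy image: dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-written kernel that responds to a dark-to-bright vertical edge.
kernel = [[-1, 1],
          [-1, 1]]

features = relu(conv2d(image, kernel))
pooled = max_pool(features)
print(features)
print(pooled)
```

The feature map is nonzero only in the column where the edge sits, and pooling keeps that response while shrinking the map. This is a small instance of the idea that higher stages see coarser, more invariant features.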
Let me explain the ConvNet step by step, to make it easier for beginners (including me), by referring to Fig. 1, a deep convolutional neural network for image classification.