0.1 Introduction

Deep learning frameworks often abstract away the mechanics of training a neural network. That makes it possible to build deep learning models very quickly, but it is equally important to slow down and understand how they work. In this 2-part series, we’ll dig deep and build our own neural-net from scratch. This will help us understand, at a basic level, how those big frameworks work. The network we’ll build will contain a single hidden layer and perform binary classification using a vectorized implementation of backpropagation, all written in base R.

Across the series, we will describe in detail what a single-layer neural network is, how it works, and the equations that describe it. We will see what data preparation is required before a dataset can be fed to a neural network. Then, we will implement a neural-net step by step from scratch and examine the output at each step. Finally, to see how our neural-net fares, we will describe a few metrics used for classification problems and apply them.

We will also compare our neural-net with a logistic regression model and see how the decision boundaries differ. By the end of this series, you will have a deeper understanding of the maths behind neural networks and the ability to implement one yourself from scratch!

In this first part, we’ll go through the dataset we are going to use, the pre-processing involved, and the train-test split, and then describe the architecture of the model in detail. Then we’ll build our neural-net chunk by chunk. This will involve writing functions for initializing parameters and running forward propagation.

In the second part, we’ll implement backpropagation by writing functions to calculate gradients and update the weights. Finally, we’ll make predictions on the test data and see how accurate our model is using metrics such as Accuracy, Recall, Precision, and F1-score. We’ll compare our neural-net with a logistic regression model and visualize the difference in the decision boundaries produced by these models.

0.1.1 Set Seed

Before we start, let’s set a seed value to ensure reproducibility of the results.

set.seed(69)
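As a quick check of what seeding buys us, the minimal sketch below re-seeds the generator and draws the same random numbers twice. The names `first_draw` and `second_draw` are illustrative only and are not used later in the series.

```r
# Re-seeding with the same value restarts the random number stream,
# so the two draws below are identical.
set.seed(69)
first_draw <- runif(3)

set.seed(69)
second_draw <- runif(3)

identical(first_draw, second_draw)  # TRUE

# Re-seed once more so that later steps start from the same state
# as they would have without this check.
set.seed(69)
```

Any code that relies on random numbers, such as a random train-test split or random weight initialization, benefits from this in the same way.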

0.1.2 Architecture Definition

To understand the matrix multiplications better and keep the numbers digestible, we will describe a very simple 3-layer neural-net, i.e., a neural-net with an input layer, a single hidden layer, and an output layer. The \(1^{st}\) layer will take in the inputs and the \(3^{rd}\) layer will produce the output.

The input layer will have 2 (input) neurons, the hidden layer 4 (hidden) neurons, and the output layer 1 (output) neuron.

Our input layer has 2 neurons because we’ll be passing 2 features (columns of a dataframe) as the input. There is a single output neuron because we’re performing binary classification, which means two output classes: 0 and 1. The output will actually be a probability (a number that lies between 0 and 1), and we’ll define a threshold for converting this probability to 0 or 1. For instance, this threshold can be 0.5.
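To make the thresholding step concrete, here is a minimal sketch. The probabilities in `probs` are made-up values for illustration, not outputs of the network, and `threshold` and `predicted_class` are hypothetical names.

```r
# Hypothetical output probabilities for five observations (made-up values).
probs <- c(0.08, 0.35, 0.50, 0.62, 0.97)

# Threshold used to convert a probability into a class label.
threshold <- 0.5

# Probabilities strictly above the threshold map to class 1, the rest to class 0.
predicted_class <- ifelse(probs > threshold, 1, 0)

predicted_class  # 0 0 0 1 1
```

Note that a probability exactly equal to the threshold is assigned to class 0 here; using `>=` instead would assign it to class 1. Either choice is a convention.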

In a deep neural-net, multiple hidden layers are stacked together (hence the name “deep”). Each hidden layer can contain any number of neurons you want.

In this series, we’re implementing a single-layer neural-net which, as the name suggests, contains a single hidden layer. We’ll refer to the layer sizes by the following names:

  • n_x: the size of the input layer (set this to 2).
  • n_h: the size of the hidden layer (set this to 4).
  • n_y: the size of the output layer (set this to 1).
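These three sizes determine the shape of every parameter in the network. The sketch below assumes one common convention, in which each weight matrix has one row per neuron in the layer it feeds into and one column per neuron in the layer it reads from. The names `W1`, `b1`, `W2`, and `b2` and the zero-filled placeholders are illustrative assumptions, not the initialization used later in the series.

```r
n_x <- 2  # size of the input layer
n_h <- 4  # size of the hidden layer
n_y <- 1  # size of the output layer

# Placeholder parameters (all zeros), only to show the shapes
# under the assumed convention.
W1 <- matrix(0, nrow = n_h, ncol = n_x)  # hidden-layer weights: 4 x 2
b1 <- matrix(0, nrow = n_h, ncol = 1)    # hidden-layer biases:  4 x 1
W2 <- matrix(0, nrow = n_y, ncol = n_h)  # output-layer weights: 1 x 4
b2 <- matrix(0, nrow = n_y, ncol = 1)    # output-layer biases:  1 x 1

# Total number of trainable parameters: 8 + 4 + 4 + 1 = 17.
n_params <- length(W1) + length(b1) + length(W2) + length(b2)
n_params
```

With this layout, multiplying `W1` by a 2-row input matrix yields one row per hidden neuron, which is what keeps the matrix multiplications easy to follow for a network this small.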