MISSION

rm(list=ls()) #clear workspace


I wanted to set myself the challenge of building a neural network from scratch, relying solely on base R. I think the best way to fully get to grips with an algorithm is to code it from the ground up - it leaves little wiggle room for winging it. Moreover, I wanted to tackle the well-known MNIST (“Modified National Institute of Standards and Technology”) digit classification problem. The dataset is a collection of handwritten digits: each row represents a single digit, and each column represents a pixel with a grey-scale value between 0 and 255. I kept asking myself: could I design a neural network algorithm that achieves at least 90% accuracy at simple digit recognition? For the vast majority of the project, I found myself scribbling away with good old-fashioned pen and paper - obtaining the key derivatives was by far the trickiest aspect of the process! To simplify matters, I made the neural network almost as small as possible - that way, I could free up my attention to focus on the key mathematics involved.

This project is condensed into a two-part series. The first part (this document) provides a broad introduction to neural networks and includes the code I employed to build the neural network. The second part shows the working I used to obtain the key derivatives - it is mathematically heavy and assumes prior knowledge of multivariable calculus and linear algebra. For the mathematically inclined, I include a link to the second part at the bottom of the page.

ENSURE KEY LIBRARIES LOADED


Whilst I use a few libraries in this post, none of them were used to build either the neural network function or the associated prediction function. I used the dslabs library to source the digits and their pixel representations, and I employed one of my favourite visualisation packages, highcharter, to visualise the movement of the loss functions. Okay, let me take you through the building process!


If you have RStudio and are interested, here is a neat function that will help ensure you have the prerequisite libraries to replicate my results.

library_sort <- function(x){
  for( library in x ){
    #  require returns TRUE invisibly if it was able to load package
    if( ! require( library , character.only = TRUE ) ){
      # if package unavailable, then install
      install.packages( library , dependencies = TRUE )
      #  Load package after installing
      require( library , character.only = TRUE )
    }
  }
}
#  Then try/install packages...
library_sort( c("dslabs" , "caret" , "tidyverse", "rafalib","matrixStats","highcharter") )


Categorical data are defined by label values - unfortunately, not all machine learning models are designed to work in harmony with categorical data. Most tree-based models can learn from categorical data without applying transformations. However, neural networks - like most machine learning models - require the inputs and outputs to be numeric. Integer coding (1, 2, 3…) can be recruited when the categories can be ranked - in other words, when the data are inherently ordinal. One-hot encoding is used when there is no clear order in the data: each label becomes a vector whose elements are all 0s, except for the element corresponding to the actual category, which is set to 1. A neural network cannot distinguish between character labels - however, it can discriminate between 1s and 0s. The dependent variable (the column containing the digit classification) therefore required one-hot encoding. I put together a function that performs one-hot encoding:

one_hot_convert <- function(Y){

  #unique categories, sorted so the column order is predictable
  categories <- sort(unique(Y))

  #start with a matrix of zeros: one row per observation, one column per category
  matrix_hot <- matrix(0,
                       nrow = nrow(Y),
                       ncol = length(categories),
                       dimnames = list(NULL, categories))

  #flag the column matching each observation's category with a 1
  for(i in 1:nrow(Y)){
    for(j in 1:length(categories)){
      if(Y[i] == categories[j]){ matrix_hot[i, j] <- 1 }
    }
  }

  return(matrix_hot)
}
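
To see the function in action, here is a quick toy illustration (a small example of my own, rather than the MNIST labels themselves):

#toy example: three labels drawn from the digits 0-2
toy_labels <- as.matrix(c(2, 0, 1))

one_hot_convert(toy_labels)
#      0 1 2
# [1,] 0 0 1
# [2,] 1 0 0
# [3,] 0 1 0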


After obtaining the raw training data from the dslabs library, I did a little spring cleaning before splitting the data into a training and test set.


if(!exists("mnist"))mnist<-read_mnist()
x<-mnist$train$images
y<-mnist$train$labels
combined<-cbind(x,y)
df<-cbind(x,y)
df<-data.frame(df)

#keep all single digits [0-9] 
df<-df%>% filter(y != 10)            

#omit cases with missing values
df<- na.omit(df)                                                                                                                                                                    
ynew<-df[,ncol(df)]      


#remove zero variance columns
removeZeroVar <- function(df){
    df[, !sapply(df, function(x) min(x) == max(x))]
}
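
As a toy illustration of what this does (my own small example, not part of the MNIST pipeline), a constant column is dropped while the informative columns survive:

#toy data frame: column "b" never varies, so it carries no information
toy <- data.frame(a = c(1, 2, 3), b = c(5, 5, 5), c = c(0, 1, 0))

removeZeroVar(toy)
#   a c
# 1 1 0
# 2 2 1
# 3 3 0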


#apply zero variance removal function
reduced<-removeZeroVar(df) 


#partition data into training/test sets
n <- nrow(reduced)
trainIndex <- sample(1:n, size = round(0.7*n), replace = FALSE)
train <- reduced[trainIndex, ]
test  <- reduced[-trainIndex, ]

#Y training before one-hot encoding
YORIG<-train[,ncol(train)]  

#Add 1 to the Y values (for intuitive index comparison later)
YORIG<-YORIG+1

#training design matrix
X<-as.matrix(train[,-ncol(train)])  

#apply one-hot encoding to the dependent variable
Y<-as.matrix(train[,ncol(train)])
Y<-Y+1

Y<-one_hot_convert(Y)
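
A couple of quick optional sanity checks at this point: every row of the one-hot matrix should sum to exactly 1, and there should be one column per digit class.

#each row should contain exactly one 1
all(rowSums(Y) == 1)

#expect 10 columns - one per digit class
ncol(Y)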


Let’s set aside and prep the test set:

XTEST<-as.matrix(test[,-ncol(test)])
YORIG_TEST<-test[,ncol(test)]

#will be used for indexing matching later
YORIG_TEST<-YORIG_TEST+1

NEURAL NETWORKS


It is no exaggeration to claim that the proliferation of artificial neural networks (NNs) over the last decade or so has revolutionised the field of artificial intelligence (AI). Neural networks have enjoyed phenomenal success in the domains of image recognition and natural language processing. Moreover, they are proving to be of great utility in a wide range of other fields - and there is every indication that this trend is set to continue in the coming years.


NEURAL NETWORK ARCHITECTURE


I decided to construct a fully connected neural network - in such an architecture, every neuron is connected to every neuron in the adjoining layers. Training a neural network primarily consists of two phases: the forward pass and the backward pass. After the forward pass, the loss is computed; we then recursively apply the chain rule of calculus to propagate derivatives back to the respective inputs. The chain rule is a technique that enables us to evaluate the derivative of a composition of two or more functions. Once the respective derivatives are established, the weights can be updated using the gradient descent algorithm.
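
To make the gradient descent step concrete, here is a minimal sketch of a single parameter update (the names W, dW and learning_rate are illustrative placeholders, not the variables used in the full network function):

#learning rate controls the size of each downhill step
learning_rate <- 0.01

#toy weight matrix and a made-up gradient (in practice dW comes from the backward pass)
W  <- matrix(rnorm(4), nrow = 2)
dW <- matrix(rnorm(4), nrow = 2)

#gradient descent update: move the weights in the direction that reduces the loss
W <- W - learning_rate * dW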

VISUAL OVERVIEW OF FEED FORWARD PHASE


One of the best ways to grasp what is going on inside a neural network is to view its internal workings via a computational graph. I will make a slight break with convention by visualising the activation layers (a0, a1, a2) and the z (weighted sum) layers separately.

Below, the feedforward phase is illustrated:

[Figure: computational graph of the feedforward phase]
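
To give a flavour of the computations in the graph, here is a minimal sketch of one feedforward pass for a tiny single-hidden-layer network, using the a/z notation above. The sigmoid activation, the toy dimensions and the omission of bias terms are simplifying assumptions for illustration only:

sigmoid <- function(z) 1 / (1 + exp(-z))

#toy dimensions: 4 input pixels, 3 hidden units, 2 output classes
a0 <- matrix(rnorm(4), nrow = 1)        #input layer (one observation)
W1 <- matrix(rnorm(4 * 3), nrow = 4)    #input-to-hidden weights
W2 <- matrix(rnorm(3 * 2), nrow = 3)    #hidden-to-output weights

#feedforward: each z layer is a weighted sum, each a layer an activation
z1 <- a0 %*% W1
a1 <- sigmoid(z1)

z2 <- a1 %*% W2
a2 <- sigmoid(z2)                       #network output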