Abalone Data Set

1. Pre-process Data

#library(readr)
abalone <- read.csv("~/Spring2022/ANN/abalone.data", header=FALSE)
head(abalone)
dim(abalone)
## [1] 4177    9
colnames(abalone)
## [1] "V1" "V2" "V3" "V4" "V5" "V6" "V7" "V8" "V9"

Renaming data

names(abalone) <- c('Sex', 'Length', 'Diameter', 'Height', 'Whole', 'Shucked', 
                       'Viscera', 'Shell', 'Rings')

head(abalone)
colnames(abalone)
## [1] "Sex"      "Length"   "Diameter" "Height"   "Whole"    "Shucked"  "Viscera" 
## [8] "Shell"    "Rings"

Loading the libraries

library(tensorflow)
library(keras)
#head(abalone) #dataset

 library(caret) #this package has the createDataPartition function
## Loading required package: ggplot2
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:tensorflow':
## 
##     train
 set.seed(123) # fix the random seed for reproducibility
    
 #creating indices
 trainIndex <- createDataPartition(abalone$Rings,p=0.75,list=FALSE)
    
 #splitting data into training/testing data using the trainIndex object
 abalone_TRAIN <- abalone[trainIndex,] #training data (75% of data)
    
 abalone_TEST <- abalone[-trainIndex,] #testing data (25% of data)
head(abalone_TRAIN)

Construct Dataset

Let’s shuffle our dataset so that our model is invariant to the order of samples. This is good for generalization and will help increase performance on unseen (test) data.

df <- abalone[sample(nrow(abalone)), ]

#df = df %>% mutate_if(is.factor, as.numeric) #convert categorical data to numeric data
library(readr)
df <- read_csv("~/Spring2022/Dissertation20220127/ANN_R/df.csv")
## New names:
## * `` -> ...1
## * ...1 -> ...2
## Rows: 3134 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (11): ...1, ...2, Sex, Length, Diameter, Height, Whole, Shucked, Viscera...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#View(df)
head(df)
df = data.frame(df)
#write.csv(df,"myfirstcsvfile.csv")
#write.csv(df, "df.csv")

A Neural Net Using R

Train-Test Split

Now that we have our dataset prepared, let’s go ahead and split it into train and test sets. We’ll put 80% of our data into our train set and the remaining 20% into our test set. (To keep the focus on the neural net, we will not use a validation set here.)

train_test_split_index <- 0.8 * nrow(df)

Train and Test Dataset

Because we’ve already shuffled the dataset above, we can go ahead and extract the first 80% of the rows into the train set.

train <- df[1:train_test_split_index,]
head(train)

Next, we select the last 20% of the rows of the shuffled dataset as our test set.

test <- df[(train_test_split_index+1): nrow(df),]
head(test)
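
In the code above the split index is 0.8 * nrow(df), which need not be a whole number. R truncates fractional indices, so the subsetting still runs, but (train_test_split_index+1):nrow(df) then steps through fractional values and can stop one row short of the end. A minimal sketch of a whole-number split, using a made-up 13-row data frame (toy_df, split_idx, toy_train and toy_test are illustrative names, not part of the original analysis):

```r
# Toy data frame standing in for the shuffled data set (hypothetical values).
toy_df <- data.frame(id = 1:13, value = (1:13) / 10)

# Whole-number boundary between train and test rows: floor(10.4) = 10.
split_idx <- floor(0.8 * nrow(toy_df))

toy_train <- toy_df[seq_len(split_idx), ]
toy_test  <- toy_df[(split_idx + 1):nrow(toy_df), ]

nrow(toy_train)  # 10
nrow(toy_test)   # 3
```

With the boundary floored, every row lands in exactly one of the two sets.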

Preprocess

Neural networks tend to work best when the input values are standardized. So, we’ll scale all the values to have mean = 0 and standard deviation = 1.

Standardizing input values usually speeds up training and helps gradient descent converge faster.

To standardize the input values, we’ll use the scale() function in R. Note that we’re standardizing the input values (X) only and not the output values (y).

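# NOTE: columns are selected by position. In df as loaded above, positions 1:8
# are ...1, ...2, Sex, Length, Diameter, Height, Whole and Shucked, so the two
# index columns are included while Viscera and Shell are left out.
X_train <- scale(train[, c(1:8)])

y_train <- train$Rings
dim(y_train) <- c(length(y_train), 1) # add extra dimension to vector

# Use the same 8 columns as X_train so the test inputs match the shape of W1.
X_test <- scale(test[, c(1:8)])

y_test <- test$Rings
dim(y_test) <- c(length(y_test), 1) # add extra dimension to vector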
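
A common convention is to standardize the test features with the mean and standard deviation computed on the training features, so that both sets pass through the same transformation. The sketch below illustrates this on made-up numbers; the toy_* objects are hypothetical, and only scale() and its "scaled:center" and "scaled:scale" attributes come from base R.

```r
# Hypothetical toy feature matrices (two features each).
toy_train <- matrix(c(1, 2, 3, 4,
                      10, 20, 30, 40), ncol = 2)
toy_test  <- matrix(c(2, 3,
                      20, 30), ncol = 2)

# Standardize the training features; scale() stores the column means and
# standard deviations it used as attributes of the result.
toy_train_scaled <- scale(toy_train)
train_center <- attr(toy_train_scaled, "scaled:center")
train_sd     <- attr(toy_train_scaled, "scaled:scale")

# Reuse the training statistics on the test features.
toy_test_scaled <- scale(toy_test, center = train_center, scale = train_sd)

colMeans(toy_train_scaled)        # both columns have mean 0 (up to rounding)
apply(toy_train_scaled, 2, sd)    # both columns have standard deviation 1
toy_test_scaled
```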

Because neural nets are made up of a bunch of matrix multiplications, let’s convert our input and output from dataframes to matrices. While dataframes are a good way to represent data in a tabular form, we choose to convert to a matrix type because matrices are smaller than an equivalent dataframe and often speed up the computations.

We will also change the shape of X and y by taking their transposes, so that each column holds one sample. This will make the matrix calculations slightly more intuitive, as we’ll see in the second part. There’s really no difference though. Some of you might find this way better, while others might prefer the non-transposed way. I feel that this makes more sense.

We’re going to use the as.matrix() method to construct our matrices. Note that byrow is an argument of matrix(), not of as.matrix(), so it has no effect in the calls below; the transpose t() is what changes the layout.

X_train <- as.matrix(X_train, byrow=TRUE)
X_train <- t(X_train)
y_train <- as.matrix(y_train, byrow=TRUE)
y_train <- t(y_train)

X_test <- as.matrix(X_test, byrow=TRUE)
X_test <- t(X_test)
y_test <- as.matrix(y_test, byrow=TRUE)
y_test <- t(y_test)

Build a neural-net

Now that we’re done processing our data, let’s move on to building our neural net. As discussed above, we will broadly follow the steps outlined below.

Get layer sizes

A neural network optimizes certain parameters to get to the right output. These parameters are initialized randomly. However, the size of these parameter matrices depends on the number of neurons in the different layers of the neural net.

To generate matrices with random parameters, we need to first obtain the size (number of neurons) of all the layers in our neural-net. We’ll write a function to do that. Let’s denote n_x, n_h, and n_y as the number of neurons in input layer, hidden layer, and output layer respectively.

We will obtain these shapes from our input and output data matrices created above.

getLayerSize <- function(X, y, hidden_neurons, train=TRUE) {
  n_x <- dim(X)[1]
  n_h <- hidden_neurons
  n_y <- dim(y)[1]   
  
  size <- list("n_x" = n_x,
               "n_h" = n_h,
               "n_y" = n_y)
  
  return(size)
}

As we can see below, the number of neurons is decided based on shape of the input and output matrices.

layer_size <- getLayerSize(X_train, y_train, hidden_neurons = 40)
layer_size
## $n_x
## [1] 8
## 
## $n_h
## [1] 40
## 
## $n_y
## [1] 1

Initialise parameters

Before we start training our parameters, we need to initialize them. We draw the weights from a uniform distribution (scaled by 0.01) and set the biases to zero.

The function initializeParameters() takes as argument an input matrix and a list which contains the layer sizes i.e. number of neurons. The function returns the trainable parameters W1, b1, W2, b2.

Our neural net has 3 layers, which gives us 2 sets of parameters. The first set is W1 and b1. The second set is W2 and b2. Note that these parameters exist as matrices.

These random weights matrices W1, b1, W2, b2 are created based on the layer sizes of the different layers (n_x, n_h, and n_y).

The sizes of these weight matrices are:

W1 = (n_h, n_x)
b1 = (n_h, 1)
W2 = (n_y, n_h)
b2 = (n_y, 1)

initializeParameters <- function(X, list_layer_size){

    m <- dim(data.matrix(X))[2]
    
    n_x <- list_layer_size$n_x
    n_h <- list_layer_size$n_h
    n_y <- list_layer_size$n_y
        
    W1 <- matrix(runif(n_h * n_x), nrow = n_h, ncol = n_x, byrow = TRUE) * 0.01
    b1 <- matrix(rep(0, n_h), nrow = n_h)
    W2 <- matrix(runif(n_y * n_h), nrow = n_y, ncol = n_h, byrow = TRUE) * 0.01
    b2 <- matrix(rep(0, n_y), nrow = n_y)
    
    params <- list("W1" = W1,
                   "b1" = b1, 
                   "W2" = W2,
                   "b2" = b2)
    
    return (params)
    
}

For our network, the sizes of the weight matrices are shown below. Remember that the number of input neurons is n_x = 8, hidden neurons n_h = 40, and output neurons n_y = 1; layer_size was calculated above.

n_x: the size of the input layer (8 here).

n_h: the size of the hidden layer (40 here).

n_y: the size of the output layer (1 here).
init_params <- initializeParameters(X_train, layer_size)
lapply(init_params, function(x) dim(x))
## $W1
## [1] 40  8
## 
## $b1
## [1] 40  1
## 
## $W2
## [1]  1 40
## 
## $b2
## [1] 1 1
sigmoid <- function(x){
    return(1 / (1 + exp(-x)))
}
forwardPropagation <- function(X, params, list_layer_size){
    
    m <- dim(X)[2]
    n_h <- list_layer_size$n_h
    n_y <- list_layer_size$n_y
    
    W1 <- params$W1
    b1 <- params$b1
    W2 <- params$W2
    b2 <- params$b2
    
    b1_new <- matrix(rep(b1, m), nrow = n_h)
    b2_new <- matrix(rep(b2, m), nrow = n_y)
    
    Z1 <- W1 %*% X + b1_new
    A1 <- sigmoid(Z1)
    Z2 <- W2 %*% A1 + b2_new
    A2 <- sigmoid(Z2)
    
    cache <- list("Z1" = Z1,
                  "A1" = A1, 
                  "Z2" = Z2,
                  "A2" = A2)

    return (cache)
}
fwd_prop <- forwardPropagation(X_train, init_params, layer_size)
lapply(fwd_prop, function(x) dim(x))
## $Z1
## [1]   40 2507
## 
## $A1
## [1]   40 2507
## 
## $Z2
## [1]    1 2507
## 
## $A2
## [1]    1 2507
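
forwardPropagation() repeats each bias vector across the m sample columns before adding it to the weighted inputs. The following sketch replays that step on a tiny made-up network so the shapes are easy to follow; the sizes and bias values are arbitrary assumptions chosen for illustration.

```r
set.seed(1)
n_x <- 3; n_h <- 2; m <- 4

X  <- matrix(rnorm(n_x * m), nrow = n_x)            # one column per sample
W1 <- matrix(runif(n_h * n_x), nrow = n_h) * 0.01   # small random weights
b1 <- matrix(c(0.5, -0.5), nrow = n_h)              # hypothetical bias values

# Repeat the bias so it has one column per sample, as forwardPropagation() does.
b1_new <- matrix(rep(b1, m), nrow = n_h)

Z1 <- W1 %*% X + b1_new          # (n_h x n_x) %*% (n_x x m) gives n_h x m
A1 <- 1 / (1 + exp(-Z1))         # sigmoid activation, element-wise

dim(Z1)       # 2 4
b1_new[, 1]   # 0.5 -0.5, and the same in every other column
```

Each column of Z1 is the pre-activation of the hidden layer for one sample, which is why the outputs above have 2507 columns, one per training row.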

Compute Cost

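We will use the binary cross-entropy loss function (also known as log loss). Here, \(y\) is the true label and \(\hat y\) is the predicted output. Note that this loss assumes labels in {0, 1} and is guaranteed to be non-negative only in that case; Rings is an integer count, which is consistent with the negative cost printed below.

\[ cost = -\frac{1}{N}\sum_{i=1}^{N}\left[ y_i\log(\hat y_i) + (1-y_i)\log(1-\hat y_i) \right] \]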

The computeCost() function takes as arguments the input matrix X, the true labels y and a cache. cache is the output of the forward pass that we calculated above. To calculate the error, we will only use the final output A2 from the cache.

computeCost <- function(X, y, cache) {
    m <- dim(X)[2]
    A2 <- cache$A2
    logprobs <- (log(A2) * y) + (log(1-A2) * (1-y))
    cost <- -sum(logprobs/m)
    return (cost)
}

cost <- computeCost(X_train, y_train, fwd_prop)
cost
## [1] -0.2559867
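
To make the behaviour of this loss concrete, here is a small sketch that evaluates the same formula on made-up values. The function name binary_cross_entropy and all the numbers are illustrative assumptions; the first call uses labels in {0, 1}, the second uses count-like targets similar in scale to Rings.

```r
# Binary cross-entropy for true labels y and predicted probabilities y_hat.
binary_cross_entropy <- function(y, y_hat) {
  -mean(y * log(y_hat) + (1 - y) * log(1 - y_hat))
}

y_true <- c(1, 0, 1, 0)            # made-up binary labels
y_prob <- c(0.9, 0.1, 0.8, 0.3)    # made-up predicted probabilities

bce_binary <- binary_cross_entropy(y_true, y_prob)
bce_binary   # positive, as expected for labels in {0, 1}

# With targets outside {0, 1}, the (1 - y) factor becomes negative and the
# same formula can return a negative value.
bce_counts <- binary_cross_entropy(c(9, 10, 11), c(0.525, 0.525, 0.525))
bce_counts   # negative
```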

Backpropagation

backwardPropagation <- function(X, y, cache, params, list_layer_size){
    
    m <- dim(X)[2]
    
    n_x <- list_layer_size$n_x
    n_h <- list_layer_size$n_h
    n_y <- list_layer_size$n_y

    A2 <- cache$A2
    A1 <- cache$A1
    W2 <- params$W2

    dZ2 <- A2 - y
    dW2 <- 1/m * (dZ2 %*% t(A1)) 
    db2 <- matrix(1/m * sum(dZ2), nrow = n_y)
    db2_new <- matrix(rep(db2, m), nrow = n_y)
    
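    # The hidden layer uses sigmoid(), whose derivative is A1 * (1 - A1).
    dZ1 <- (t(W2) %*% dZ2) * (A1 * (1 - A1))
    dW1 <- 1/m * (dZ1 %*% t(X))
    # One bias gradient per hidden neuron: average each row of dZ1 over the m samples.
    db1 <- matrix(1/m * rowSums(dZ1), nrow = n_h)
    db1_new <- matrix(rep(db1, m), nrow = n_h)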
    
    grads <- list("dW1" = dW1, 
                  "db1" = db1,
                  "dW2" = dW2,
                  "db2" = db2)
    
    return(grads)
}
back_prop <- backwardPropagation(X_train, y_train, fwd_prop, init_params, layer_size)
lapply(back_prop, function(x) dim(x))
## $dW1
## [1] 40  8
## 
## $db1
## [1] 40  1
## 
## $dW2
## [1]  1 40
## 
## $db2
## [1] 1 1
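
Because the hidden layer uses a sigmoid activation, the backward pass relies on the sigmoid’s derivative, which can be written in terms of the activation itself as a(1 - a). The sketch below checks this against a finite-difference estimate at a few arbitrary points; it is a sanity check under those assumptions, not part of the training code.

```r
sigmoid <- function(x) 1 / (1 + exp(-x))

z   <- c(-2, -0.5, 0, 0.5, 2)   # arbitrary test points
a   <- sigmoid(z)
eps <- 1e-6

# Central finite-difference estimate of d sigmoid / dz.
numeric_grad <- (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

# Analytic derivative of the sigmoid, expressed through its output a.
analytic_grad <- a * (1 - a)

max(abs(numeric_grad - analytic_grad))   # close to zero

# For comparison, 1 - a^2 is the derivative of tanh, not of the sigmoid.
max(abs(numeric_grad - (1 - a^2)))       # clearly non-zero
```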

Update Parameters

updateParameters <- function(grads, params, learning_rate){

    W1 <- params$W1
    b1 <- params$b1
    W2 <- params$W2
    b2 <- params$b2
    
    dW1 <- grads$dW1
    db1 <- grads$db1
    dW2 <- grads$dW2
    db2 <- grads$db2
    
    
    W1 <- W1 - learning_rate * dW1
    b1 <- b1 - learning_rate * db1
    W2 <- W2 - learning_rate * dW2
    b2 <- b2 - learning_rate * db2
    
    updated_params <- list("W1" = W1,
                           "b1" = b1,
                           "W2" = W2,
                           "b2" = b2)
    
    return (updated_params)
}
update_params <- updateParameters(back_prop, init_params, learning_rate = 0.01)
lapply(update_params, function(x) dim(x))
## $W1
## [1] 40  8
## 
## $b1
## [1] 40  1
## 
## $W2
## [1]  1 40
## 
## $b2
## [1] 1 1
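
The update rule in updateParameters() is plain gradient descent: each parameter moves against its gradient by a step proportional to the learning rate. As a rough illustration of how the learning rate affects this, the sketch below minimizes the one-dimensional function f(w) = (w - 3)^2; run_gd is a hypothetical helper written for this example only.

```r
# Gradient descent on f(w) = (w - 3)^2, whose minimum is at w = 3.
run_gd <- function(learning_rate, steps = 100, w = 0) {
  for (i in seq_len(steps)) {
    grad <- 2 * (w - 3)              # derivative of f at the current w
    w <- w - learning_rate * grad    # same form as the update in updateParameters()
  }
  w
}

run_gd(0.1)   # approaches 3
run_gd(1.1)   # moves far away from 3: the step is too large for this function
```

The largest stable step depends on the function being minimized, so a learning rate that works in one setting can diverge in another.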

Train the Model

Now that we have all our components, let’s go ahead and write a function that will train our model.

We will use all the functions we have written above in the following order.

Run forward propagation
Calculate loss
Calculate gradients
Update parameters
Repeat
trainModel <- function(X, y, num_iteration, hidden_neurons, lr){
    
    layer_size <- getLayerSize(X, y, hidden_neurons)
    init_params <- initializeParameters(X, layer_size)
    cost_history <- c()
    for (i in 1:num_iteration) {
        fwd_prop <- forwardPropagation(X, init_params, layer_size)
        cost <- computeCost(X, y, fwd_prop)
        back_prop <- backwardPropagation(X, y, fwd_prop, init_params, layer_size)
        update_params <- updateParameters(back_prop, init_params, learning_rate = lr)
        init_params <- update_params
        cost_history <- c(cost_history, cost)
        
        if (i %% 10000 == 0) cat("Iteration", i, " | Cost: ", cost, "\n")
    }
    
    model_out <- list("updated_params" = update_params,
                      "cost_hist" = cost_history)
    return (model_out)
}
EPOCHS = 60
HIDDEN_NEURONS = 40
LEARNING_RATE = 0.9

train_model <- trainModel(X_train, y_train, hidden_neurons = HIDDEN_NEURONS, num_iteration = EPOCHS, lr = LEARNING_RATE)
train_model$updated_params
## $W1
##             ...1       ...2        Sex   Length Diameter   Height    Whole
##  [1,] -0.3299709 -0.8467479 -0.1889782 2.469116 2.580707 2.388210 2.363017
##  [2,] -0.3459184 -0.7778733 -0.1819614 2.721740 2.829377 2.621017 2.638501
##  [3,] -0.3511797 -0.8201793 -0.1844913 2.594473 2.700110 2.495740 2.499021
##  [4,] -0.3298825 -0.8266540 -0.1764896 2.618472 2.727805 2.529207 2.529172
##  [5,] -0.3496832 -0.8424577 -0.1688283 2.512849 2.623431 2.427213 2.409480
##  [6,] -0.3015580 -0.7832754 -0.2386074 2.658286 2.773063 2.561475 2.567851
##  [7,] -0.3004132 -0.8387778 -0.1551971 2.641717 2.759636 2.560325 2.555174
##  [8,] -0.2979555 -0.8125678 -0.2195029 2.542459 2.648580 2.455463 2.441979
##  [9,] -0.3500734 -0.8313314 -0.1764329 2.484271 2.600553 2.408794 2.384861
## [10,] -0.3004219 -0.8559863 -0.2137749 2.640340 2.748334 2.552875 2.547302
## [11,] -0.3462960 -0.8073615 -0.1740992 2.567358 2.672881 2.475135 2.464450
## [12,] -0.2865947 -0.8072512 -0.2270057 2.503276 2.614619 2.417389 2.399496
## [13,] -0.3364998 -0.7974085 -0.1739677 2.635508 2.743955 2.537672 2.541395
## [14,] -0.3343154 -0.7967951 -0.1731873 2.688473 2.798530 2.601738 2.600512
## [15,] -0.3533269 -0.7889517 -0.2332113 2.562597 2.665819 2.466860 2.459645
## [16,] -0.2924701 -0.8345417 -0.1821861 2.814836 2.931828 2.699067 2.747862
## [17,] -0.3382335 -0.8379235 -0.1682023 2.768654 2.879190 2.657164 2.693368
## [18,] -0.3063503 -0.8391714 -0.1527376 2.726507 2.839794 2.627314 2.647668
## [19,] -0.3280545 -0.8385911 -0.1762587 2.677048 2.794799 2.577868 2.598351
## [20,] -0.2828904 -0.8160959 -0.2309591 2.729470 2.836996 2.626408 2.644861
## [21,] -0.2837795 -0.8164262 -0.1595788 2.451178 2.562958 2.374026 2.345792
## [22,] -0.3390645 -0.8348899 -0.1486970 2.675190 2.790841 2.574486 2.593882
## [23,] -0.3387570 -0.8109863 -0.1487898 2.444533 2.554489 2.360372 2.335421
## [24,] -0.3585092 -0.7814334 -0.1616409 2.585951 2.697756 2.490387 2.491038
## [25,] -0.2863984 -0.8597730 -0.2195089 2.420340 2.531392 2.351437 2.306008
## [26,] -0.3315377 -0.8446946 -0.2258533 2.668874 2.782601 2.580157 2.582189
## [27,] -0.3123696 -0.8549333 -0.2148962 2.682815 2.798187 2.591290 2.601155
## [28,] -0.3060121 -0.7628251 -0.2056828 2.823646 2.936687 2.712933 2.754968
## [29,] -0.2990963 -0.8118043 -0.2323999 2.575695 2.689626 2.495261 2.478887
## [30,] -0.3577601 -0.8020223 -0.1954068 2.732647 2.846486 2.640192 2.652168
## [31,] -0.3347439 -0.8072987 -0.1720361 2.685713 2.794093 2.586332 2.596944
## [32,] -0.3110387 -0.7759805 -0.2210814 2.810937 2.922946 2.693850 2.738061
## [33,] -0.3683008 -0.8524699 -0.2236686 2.742827 2.863377 2.641429 2.670912
## [34,] -0.3044102 -0.8293786 -0.1785781 2.624655 2.729100 2.526888 2.527658
## [35,] -0.3568356 -0.8022556 -0.1607533 2.576406 2.685870 2.487391 2.477064
## [36,] -0.2825601 -0.8349947 -0.2318740 2.563594 2.677024 2.476073 2.464828
## [37,] -0.3380288 -0.8414531 -0.2027817 2.727169 2.831641 2.626164 2.640067
## [38,] -0.3028403 -0.8272539 -0.1627030 2.603032 2.711537 2.518288 2.508232
## [39,] -0.3641869 -0.7748099 -0.1500617 2.801869 2.916072 2.688634 2.731090
## [40,] -0.3567212 -0.7808500 -0.2085704 2.734829 2.849500 2.630745 2.653357
##        Shucked
##  [1,] 1.719350
##  [2,] 1.985521
##  [3,] 1.849149
##  [4,] 1.874759
##  [5,] 1.768540
##  [6,] 1.922831
##  [7,] 1.904874
##  [8,] 1.793080
##  [9,] 1.737964
## [10,] 1.896376
## [11,] 1.820747
## [12,] 1.757331
## [13,] 1.897574
## [14,] 1.949266
## [15,] 1.813508
## [16,] 2.091127
## [17,] 2.036339
## [18,] 1.992194
## [19,] 1.941865
## [20,] 1.991622
## [21,] 1.704195
## [22,] 1.939983
## [23,] 1.700339
## [24,] 1.846912
## [25,] 1.664850
## [26,] 1.926268
## [27,] 1.943268
## [28,] 2.094489
## [29,] 1.832405
## [30,] 1.996750
## [31,] 1.949294
## [32,] 2.084652
## [33,] 2.009251
## [34,] 1.879551
## [35,] 1.830993
## [36,] 1.822481
## [37,] 1.985848
## [38,] 1.858757
## [39,] 2.078096
## [40,] 2.005564
## 
## $b1
##           [,1]
##  [1,] 588.6234
##  [2,] 588.6234
##  [3,] 588.6234
##  [4,] 588.6234
##  [5,] 588.6234
##  [6,] 588.6234
##  [7,] 588.6234
##  [8,] 588.6234
##  [9,] 588.6234
## [10,] 588.6234
## [11,] 588.6234
## [12,] 588.6234
## [13,] 588.6234
## [14,] 588.6234
## [15,] 588.6234
## [16,] 588.6234
## [17,] 588.6234
## [18,] 588.6234
## [19,] 588.6234
## [20,] 588.6234
## [21,] 588.6234
## [22,] 588.6234
## [23,] 588.6234
## [24,] 588.6234
## [25,] 588.6234
## [26,] 588.6234
## [27,] 588.6234
## [28,] 588.6234
## [29,] 588.6234
## [30,] 588.6234
## [31,] 588.6234
## [32,] 588.6234
## [33,] 588.6234
## [34,] 588.6234
## [35,] 588.6234
## [36,] 588.6234
## [37,] 588.6234
## [38,] 588.6234
## [39,] 588.6234
## [40,] 588.6234
## 
## $W2
##          [,1]     [,2]    [,3]     [,4]     [,5]     [,6]     [,7]     [,8]
## [1,] 475.5578 475.5383 475.548 475.5474 475.5515 475.5473 475.5447 475.5506
##         [,9]   [,10]    [,11]   [,12]    [,13]    [,14]    [,15]    [,16]
## [1,] 475.554 475.543 475.5541 475.558 475.5486 475.5446 475.5533 475.5352
##         [,17]    [,18]    [,19]    [,20]    [,21]   [,22]    [,23]    [,24]
## [1,] 475.5375 475.5404 475.5435 475.5386 475.5586 475.543 475.5584 475.5516
##         [,25]    [,26]    [,27]    [,28]    [,29]    [,30]    [,31]    [,32]
## [1,] 475.5587 475.5429 475.5418 475.5335 475.5491 475.5398 475.5401 475.5348
##         [,33]    [,34]    [,35]    [,36]    [,37]    [,38]    [,39]   [,40]
## [1,] 475.5392 475.5482 475.5487 475.5495 475.5395 475.5492 475.5361 475.541
## 
## $b2
##          [,1]
## [1,] 481.7346

Logistic Regression

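Before we go ahead and test our neural net, let’s quickly train a simple baseline model so that we can compare its performance with our neural net. A logistic regression model can learn only linear decision boundaries, so it cannot capture non-linear structure the way a neural network can. Note that the call below does not set the family argument, so glm() uses its default gaussian family and fits an ordinary linear regression of Rings on the predictors rather than a logistic regression.

We’ll use the glm() function in R to build this model.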

lr_model <- glm(Rings~Sex + Length + Diameter + Height + Whole + Shucked + Viscera + Shell, data = train)
lr_model
## 
## Call:  glm(formula = Rings ~ Sex + Length + Diameter + Height + Whole + 
##     Shucked + Viscera + Shell, data = train)
## 
## Coefficients:
## (Intercept)          Sex       Length     Diameter       Height        Whole  
##     3.10042      0.06182     -2.71772     15.08943      7.78905      9.28470  
##     Shucked      Viscera        Shell  
##   -20.34189    -10.70057     10.19805  
## 
## Degrees of Freedom: 2506 Total (i.e. Null);  2498 Residual
## Null Deviance:       27370 
## Residual Deviance: 12750     AIC: 11210
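
The difference between glm() with its default family and a logistic regression can be seen on a built-in data set. This sketch uses mtcars purely for illustration (it is unrelated to the abalone data): the default gaussian fit matches lm(), while a logistic fit needs a binary response and family = binomial.

```r
# mtcars ships with base R (datasets package).
gaussian_fit <- glm(mpg ~ wt, data = mtcars)   # default family is gaussian
linear_fit   <- lm(mpg ~ wt, data = mtcars)

# The two fits give the same coefficients.
all.equal(coef(gaussian_fit), coef(linear_fit))

# Logistic regression: binary response (am is 0/1) and family = binomial.
logistic_fit <- glm(am ~ wt, data = mtcars, family = binomial)
probs <- predict(logistic_fit, type = "response")
head(probs)   # predicted probabilities between 0 and 1
```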

Let’s now generate predictions from this model on the test set.

#colnames(test)
#head(test)
lr_pred <- round(as.vector(predict(lr_model, test[, 1:10])))
lr_pred
##   [1]  9 13  7 17  7  6 11  9  9  9  8  9  9 12 11 10 10 11 10  7 11 10  6 10 15
##  [26]  8 14  9 10 12 11  9 11  6 14  6 12  6 10  8 12  9 11 14 15  9  7 10  9  9
##  [51] 12 10  6 10 10 10  9 11 12  9  8  8  7  7 13  6 10  8 12  8 11 12  5 11 11
##  [76]  9 12 14 16 10  8  6 17 10 10  9 10  8 12 11 13 10  8 11 11  7 11 10  8  9
## [101] 10  9  6  8  9 10  7 11 11  9  9 10 13  9 12 11 13 10  8  9 10  9 16  9  8
## [126]  7  9  8 10 14  9  9 11  8 12  8 11 12 10  9 10 11 11  6  9  9 12  8  9  9
## [151] 11  9 11  9 11 12 13  7  6  7 11  6  8  9 10 15  9 12  7  9  8 11 11 13 10
## [176] 14  8  8  9  7  8  9  7 10  8 10 12 12 11 12 10  5  9  7 12  8  9  9 13  8
## [201] 12 10  9 10 12 11  8  8  7 11  9  9 10  8 19 10 12 14 15  8  8  9 12  9  7
## [226] 13 10  9 10 12  9  8 12 11 10 15 12 10  7 17  7 13  9 11  6 10 10  5  8  8
## [251]  6 10  9 14 16  7  9 13  8 13  9  8  8  8 16  8 10 11 10  9 10  8 10 11  9
## [276] 15  9 11 10  9  9 12  5  8  9  9 10  9  9 11 10  9  9  7 10 11 14 13 10 11
## [301]  9  9  9  5 10 10  8  7  9  8 12 11 11  7 11 10 11  9  6  9 10 13 12  8 10
## [326]  9  9  6 10 11 10 12  8 14  7 13 12 10  9 11 10 11 11  7  7 10  8  9 24  9
## [351] 13 11 10  6 10  5 10 11  8  6  6  7 12  9 11 10 10 11 10  9 11  8 10  7  9
## [376] 12  8  9 13 12 11  6 12  9  7  9  8 10 11  8  9 11 10 12 11  9 12 10  9  9
## [401]  9  9  6  9 12 14 10 10  7 11 10  9 12 10 11 11 12 10  9 10 10 10  8  8 11
## [426] 11 -1 10 11  9  7 11 11  7 10 11 10 10 10  8  9 13  9 12  9  7 14 12 10 11
## [451]  9 11  9 13 11  9 14 12  8 10 11  9 18 11  8  4  7  9 11  7  9 10 10  9  8
## [476] 10 14 10  9 12 13  9 10 12  8  9  5  7 11 11 11 10 11  6  8  8 13 12  9 13
## [501]  6 14 11 11 10  8 12 12  7 13 11  9 12 13 10 15  7 14 11  9 10  9 10 10  8
## [526] 10  8 11 14  7 11 12 10 12 11  8  8 10  9 12  8 11  9  9 13 13  9  5  8  6
## [551]  6  9 11  7  8  8  8  9 14  8 10 11  9  9 11 13 13  6 11 12 11 13  6 10 13
## [576]  8 10  9  8 10 13  8  8  9 12  8 11 11 11 11  8  5  9 10 10  8  9 10  9  9
## [601] 12  9 11 15  9  9 14  6 10 11  7 10 10  6 11 13  9  8  6 11  7 11 13 14 10
## [626] 11

Test the Model

Finally, it’s time to make predictions. To do that -

First get the layer sizes.
Run forward propagation.
Return the prediction.

During inference, we do not need to perform backpropagation, as you can see below. We only perform forward propagation and return the final output from our neural network. (Note that instead of randomly initializing parameters, we’re using the trained parameters here.)

makePrediction <- function(X, y, hidden_neurons){
    layer_size <- getLayerSize(X, y, hidden_neurons)
    params <- train_model$updated_params
    fwd_prop <- forwardPropagation(X, params, layer_size)
    pred <- fwd_prop$A2
    
    return (pred)
}

After obtaining our output probabilities (sigmoid), we round them to obtain output labels.

# X_test and y_test were already converted to matrices and transposed above;
# transposing them a second time would undo that, so they are used as-is here.
#y_pred <- makePrediction(X_test, y_test, HIDDEN_NEURONS)
#y_pred <- round(y_pred)

Confusion Matrix

A confusion matrix is often used to describe the performance of a classifier. It is defined as:

confusionMatrix =  matrix(c("TrueNegative", "FalsePositive", "FalseNegative", "TruePositive "), 2, 2)

confusionMatrix
##      [,1]            [,2]           
## [1,] "TrueNegative"  "FalseNegative"
## [2,] "FalsePositive" "TruePositive "

Let’s go over the basic terms used in a confusion matrix through an example. Consider the case where we were trying to predict if an email was spam or not.

#tb_nn <- table(y_test, y_pred)
#tb_lr <- table(y_test, lr_pred)

Accuracy Metrics

We’ll calculate the Precision, Recall, F1 Score, Accuracy. These metrics, derived from the confusion matrix, are defined as -

Precision is defined as the number of true positives over the number of true positives plus the number of false positives.

\[ Precision=\frac{TruePositive}{TruePositive+FalsePositive} \]

Recall is defined as the number of true positives over the number of true positives plus the number of false negatives.

\[ Recall=\frac{TruePositive}{TruePositive+FalseNegative} \]

F1-score is the harmonic mean of precision and recall.

\[ F1Score=2\times\frac{Precision\times Recall}{Precision+Recall} \]

Accuracy gives us the proportion of correct predictions out of all predictions made.

\[ Accuracy=\frac{TruePositive+TrueNegative}{TruePositive+FalsePositive+TrueNegative+FalseNegative} \]

To better understand these terms, let’s continue the example of “email-spam” we used above.

Now that we have an understanding of the accuracy metrics, let’s actually calculate them. We’ll define a function that takes as input the confusion matrix. Then based on the above formulas, we’ll calculate the metrics.

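calculate_stats <- function(tb, model_name) {
  # tb is indexed column by column, following the layout defined above:
  # tb[1] = TN, tb[2] = FP, tb[3] = FN, tb[4] = TP
  # (rows are predicted classes, columns are actual classes).
  acc <- (tb[1] + tb[4])/(tb[1] + tb[2] + tb[3] + tb[4])
  recall <- tb[4]/(tb[4] + tb[3])
  precision <- tb[4]/(tb[4] + tb[2])
  f1 <- 2 * ((precision * recall) / (precision + recall))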
  
  cat(model_name, ": \n")
  cat("\tAccuracy = ", acc*100, "%.")
  cat("\n\tPrecision = ", precision*100, "%.")
  cat("\n\tRecall = ", recall*100, "%.")
  cat("\n\tF1 Score = ", f1*100, "%.\n\n")
}
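
As a worked example of these formulas, the sketch below computes the four metrics from a hand-made 2 x 2 table laid out like the confusion matrix defined earlier. The counts in toy_cm are invented for illustration and do not come from the abalone models.

```r
# Hypothetical counts, laid out like the confusion matrix defined earlier:
# [1,1] = TN, [2,1] = FP, [1,2] = FN, [2,2] = TP.
toy_cm <- matrix(c(50, 10, 5, 35), nrow = 2, ncol = 2)

tn <- toy_cm[1]; fp <- toy_cm[2]; fn <- toy_cm[3]; tp <- toy_cm[4]

accuracy  <- (tp + tn) / sum(toy_cm)                          # 85 / 100
precision <- tp / (tp + fp)                                   # 35 / 45
recall    <- tp / (tp + fn)                                   # 35 / 40
f1        <- 2 * precision * recall / (precision + recall)    # 70 / 85

round(c(accuracy = accuracy, precision = precision,
        recall = recall, f1 = f1), 4)
```

Here precision is lower than recall because the toy table contains more false positives (10) than false negatives (5).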

Conclusion

In this two-part series, we’ve built a neural net from scratch with a vectorized implementation of backpropagation. We went through the entire life cycle of training a model; right from data pre-processing to model evaluation. Along the way, we learned about the mathematics that makes a neural-network. We went over basic concepts of linear algebra and calculus and implemented them as functions. We saw how to initialize weights, perform forward propagation, gradient descent, and back-propagation.

We learned about the ability of a neural net to fit non-linear data and understood the importance of the role activation functions play in it. We trained a neural net and compared its performance to a logistic-regression model. We visualized the decision boundaries of both these models and saw how a neural net was able to fit better than logistic regression. We learned about metrics like Precision, Recall, F1-Score, and Accuracy by evaluating our models against them.

You should now have a pretty solid understanding of how neural-networks are built.

I hope you had as much fun reading as I had while writing this! If I’ve made a mistake somewhere, I’d love to hear about it so I can correct it. Suggestions and constructive criticism are welcome. :)

Define the Activation Functions.

We implement the sigmoid() activation function for the output layer.

Build the model

Building the neural network requires configuring the layers of the model, then compiling the model.

Setup the layers

model <- keras_model_sequential()
## Loaded Tensorflow version 2.7.1
model %>%
  #layer_flatten(input_shape = c(28, 28)) %>%
  layer_dense(units = 40, activation = 'relu') #%>%
  #layer_dense(units = 10, activation = 'softmax')

Compile the model

model %>% compile(
  optimizer = 'adam', 
  loss = 'sparse_categorical_crossentropy',
  metrics = c('accuracy')
)

#dim(abalone)[1]
dim(abalone)
## [1] 4177    9
colnames(abalone_TRAIN)
## [1] "Sex"      "Length"   "Diameter" "Height"   "Whole"    "Shucked"  "Viscera" 
## [8] "Shell"    "Rings"
#install.packages("neuralnet")
#library(neuralnet)

#df=data.frame(abalone_TRAIN)

#nn=neuralnet(Rings~Sex + Length + Diameter + Height + Whole + Shucked + Viscera + Shell, data=abalone_TRAIN, hidden=40,act.fct = "logistic",
#                linear.output = FALSE)

fit neural network

#nn =ne