Info about the lab

Important information

We will work with the library neuralnet here. This is an old library, but it is enough to get a bit familiar with deep learning. If you are interested in the subject, you should take a dedicated course or even several (e.g., the Deep Learning specialization on Coursera consists of 5 courses).

There are two modern frameworks for deep learning available in R. One is keras (a high-level interface for TensorFlow, developed by Google). keras is definitely worth getting familiar with. The only problem is that keras for R is not really written in R: it merely lets you call Python functions with R syntax. This means that for it to work, you first need to install Python + Anaconda + TensorFlow + Keras for Python. Only after installing the Python bundle will you be able to work in keras, and chances are that there will be some issues with the installation.

Another popular framework for deep learning, called PyTorch (PyTorch sits on top of Torch, which is written in C), was released for R last year as a native R implementation, i.e., it works in R without your having to install Python. It is easier to install than keras for R, but there may be problems too. In any case, this new library is worth exploring, and you can do it in your project:
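If you decide to try torch, below is a sketch of the usual installation steps based on the torch package documentation (our description, not run in this lab):

# install.packages("torch")  # install the R package once
# library(torch)             # the first load offers to download the backend (libtorch) binaries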

Learning aim

The aim of this lab is to acquire basic familiarity with deep learning.

Objectives

By the end of this lab session, students should be able to

  1. Plot greyscale images in R

  2. Arrange several plots on a grid

  3. Train artificial neural networks in R

  4. Set training controls and hyperparameters for neural networks.

Mode

Please run the R chunks one by one, look at the output, and make sure that you understand how it is produced. There will be questions that either require a short answer - then you type your answer right in this document - or require modifying R code - then you modify the R code here. In either case, you can discuss your work with the lab instructor.

Data

The original raw dataset can be found here:

It consists of 60000 training and 10000 test grayscale images, each labelled with one of the following classes:

0 T-shirt/top
1 Trouser
2 Pullover
3 Dress
4 Coat
5 Sandal
6 Shirt
7 Sneaker
8 Bag
9 Ankle boot

Loading data into R

First we will load the data into R. Below are the data dimensions and variables.

library(tidyverse) # for manipulation with data
library(caret) # for machine learning, including KNN
library(neuralnet)
library(fastDummies)

# library(keras) # for artificial neural networks
library(gridExtra) # for arranging several plots

fashion_mnist_train <- read_csv("fashion-mnist_train.csv")
fashion_mnist_test <- read_csv("fashion-mnist_test.csv")
# c(train_images, train_labels) %<-% fashion_mnist$train
# c(test_images, test_labels) %<-% fashion_mnist$test

# Human-readable class names are stored separately: the data contains only numeric labels
class_names = c('T_shirt_top',
                'Trouser',
                'Pullover',
                'Dress',
                'Coat', 
                'Sandal',
                'Shirt',
                'Sneaker',
                'Bag',
                'Ankle_boot')

cat("Training data dim =", dim(fashion_mnist_train), "\n")
## Training data dim = 60000 785
cat("Test data dim =", dim(fashion_mnist_test), "\n")
## Test data dim = 10000 785

Here is a sample of how the first image is represented in the raw data:

matrix(fashion_mnist_train[1, -1], ncol = 28)[10:15, 10:15]
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 208  217  193  158  184  230 
## [2,] 203  230  139  90   156  229 
## [3,] 215  222  136  103  150  221 
## [4,] 210  215  195  186  193  213 
## [5,] 209  224  147  138  170  224 
## [6,] 209  233  156  100  164  233

And this is what labels look like:

head(fashion_mnist_train$label)
## [1] 2 9 6 0 3 4

The dataset is perfectly balanced:

table(fashion_mnist_train$label)
## 
##    0    1    2    3    4    5    6    7    8    9 
## 6000 6000 6000 6000 6000 6000 6000 6000 6000 6000

Actual images

Each row of the raw data can be reshaped into a 28 x 28 matrix:

fashion_mnist_train %>% select(-label) %>% slice(1) %>% as.numeric %>% matrix(ncol = 28) %>% as.data.frame()

Below is an actual image:

plot_fashion_mnist_image <- function(i, dataset = fashion_mnist_train) {
  image <- dataset %>% select(-label) %>% slice(i) %>% 
    as.numeric %>% matrix(ncol = 28) %>% t %>% as.data.frame()
  colnames(image) <- seq_len(ncol(image))
  image$y <- seq_len(nrow(image))
  image <- pivot_longer(image, -y, names_to = "x", values_to = "value")
  image$x <- as.integer(image$x)
  
  ggplot(image, aes(x = x, y = y, fill = value)) +
    geom_tile() +
    scale_fill_gradient(low = "white", high = "black", na.value = NA) +
    scale_y_reverse() +
    theme_minimal() +
    theme(panel.grid = element_blank(), axis.text.y=element_blank(),
          axis.text.x=element_blank(), aspect.ratio = 1, 
          legend.position = "none") +
    xlab("") +
    ylab("")
}

plot_fashion_mnist_image(1)

And here is how we can plot several images on the same plot:

1:25 %>%
  lapply(plot_fashion_mnist_image) %>%
  marrangeGrob(nrow = 5, ncol = 5)

Question 1

Change the function for plotting images so that it prints the class label as the image title. Plot the first 16 images with their class labels.

plot_fashion_mnist_image <- function(i, dataset = fashion_mnist_train) {
  image <- dataset %>% select(-label) %>% slice(i) %>% 
    as.numeric %>% matrix(ncol = 28) %>% t %>% as.data.frame()
  colnames(image) <- seq_len(ncol(image))
  image$y <- seq_len(nrow(image))
  image <- pivot_longer(image, -y, names_to = "x", values_to = "value")
  image$x <- as.integer(image$x)

  ggplot(image, aes(x = x, y = y, fill = value)) +
    geom_tile() +
    scale_fill_gradient(low = "white", high = "black", na.value = NA) +
    scale_y_reverse() +
    theme_minimal() +
    theme(panel.grid = element_blank(), axis.text.y=element_blank(),
          axis.text.x=element_blank(), aspect.ratio = 1, 
          legend.position = "none") +
    xlab("") +
    ylab("") + ggtitle(class_names[dataset$label[i]+1])
}

1:16 %>%
  lapply(plot_fashion_mnist_image) %>%
  marrangeGrob(nrow = 4, ncol = 4)

Random forest

Here, we will train a random forest to get some idea of how a familiar model can predict the labels.

First, we convert our numeric labels to categorical with the function recode or, in this case, recode_factor. Its argument is a named vector whose names are old labels and whose entries are new labels.

names(class_names) <- 0:9
train_labels_char <- fashion_mnist_train$label %>%
  as.character %>%
  recode_factor(!!!class_names) 

test_labels_char <- fashion_mnist_test$label %>%
  as.character %>%
  recode_factor(!!!class_names) 

head(train_labels_char)
## [1] Pullover    Ankle_boot  Shirt       T_shirt_top Dress       Coat       
## 10 Levels: T_shirt_top Trouser Pullover Dress Coat Sandal Shirt Sneaker ... Ankle_boot

The !!! is the splice operator from the rlang package (used throughout the tidyverse): it unpacks the named vector class_names so that each element is passed to recode_factor as a separate named argument, as if we had written recode_factor(`0` = "T_shirt_top", `1` = "Trouser", ...).
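Here is a minimal illustration with a toy lookup vector (hypothetical values, not from the dataset):

# !!! unpacks the named vector into separate named arguments
lookup <- c("0" = "no", "1" = "yes")
x <- c("1", "0", "1")
recode_factor(x, !!!lookup)
# ...which is the same as recode_factor(x, `0` = "no", `1` = "yes")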

Now we replace the numeric label column with the categorical labels. Below is a sample of the training data:

train_df <- fashion_mnist_train %>%
  mutate(label = train_labels_char)

test_df <- fashion_mnist_test %>%
  mutate(label = test_labels_char)

train_df %>%
  select(pixel305:pixel310, label) %>%
  sample_n(10)

Now we are ready to train familiar models. We will use a random forest since it natively supports multi-class classification. Since the dataset is quite large, we will skip proper hyperparameter tuning, but you can experiment with the hyperparameters at home to see how well you can predict the test labels (see the sketch after the output below).

set.seed(199)

rfGrid <- expand.grid(mtry = c(30), 
                      min.node.size = c(40),
                      splitrule = "gini")

mod_rf <- train(label ~ . , data = train_df, method = "ranger",
                num.trees = 100,
                importance = 'impurity',
                tuneGrid = rfGrid,
                trControl = trainControl("oob"))
mod_rf
## Random Forest 
## 
## 60000 samples
##   784 predictor
##    10 classes: 'T_shirt_top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle_boot' 
## 
## No pre-processing
## Resampling results:
## 
##   Accuracy   Kappa    
##   0.8730167  0.8589074
## 
## Tuning parameter 'mtry' was held constant at a value of 30
## Tuning
##  parameter 'splitrule' was held constant at a value of gini
## Tuning
##  parameter 'min.node.size' was held constant at a value of 40
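If you do experiment at home, a larger tuning grid might look like the following sketch (hypothetical values chosen for illustration; with OOB resampling caret trains one forest per grid row, so expect a long run time):

rfGrid_big <- expand.grid(mtry = c(10, 30, 50),
                          min.node.size = c(10, 40, 100),
                          splitrule = "gini")

mod_rf_big <- train(label ~ . , data = train_df, method = "ranger",
                    num.trees = 100,
                    tuneGrid = rfGrid_big,
                    trControl = trainControl("oob"))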

Question 2

Which classes do you think are the hardest for the model to predict? Which ones will it confuse the most? Think about it and then print the test confusion matrix.

Answer The model should have difficulty distinguishing items that look similar. It will probably have a hard time telling apart a shirt, a t-shirt, and a pullover, and maybe ankle boots vs sandals with high heels.

Looking at the confusion matrix, we see that, indeed, “shirt” is the hardest label to predict, but distinguishing sandals from high-heeled boots does not seem to be hard - probably, the dataset does not contain many shoes with high heels.

mod_rf %>%
  predict(test_df, type = "raw") %>%
  confusionMatrix(test_df$label)
## Confusion Matrix and Statistics
## 
##              Reference
## Prediction    T_shirt_top Trouser Pullover Dress Coat Sandal Shirt Sneaker Bag
##   T_shirt_top         842       2        7    19    0      0   171       0   1
##   Trouser               0     966        1     9    1      0     1       0   1
##   Pullover             11       6      802    14   59      0    94       0   7
##   Dress                37      17       11   928   32      0    31       0   0
##   Coat                  2       1      115    19  864      0    76       0   3
##   Sandal                2       1        0     0    0    946     0      20   2
##   Shirt                94       6       53    10   41      0   609       0  10
##   Sneaker               0       0        0     0    0     37     0     925   2
##   Bag                  12       1       11     1    3      5    18       0 973
##   Ankle_boot            0       0        0     0    0     12     0      55   1
##              Reference
## Prediction    Ankle_boot
##   T_shirt_top          0
##   Trouser              0
##   Pullover             0
##   Dress                0
##   Coat                 0
##   Sandal               7
##   Shirt                1
##   Sneaker             42
##   Bag                  3
##   Ankle_boot         947
## 
## Overall Statistics
##                                           
##                Accuracy : 0.8802          
##                  95% CI : (0.8737, 0.8865)
##     No Information Rate : 0.1             
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.8669          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: T_shirt_top Class: Trouser Class: Pullover
## Sensitivity                      0.8420         0.9660          0.8020
## Specificity                      0.9778         0.9986          0.9788
## Pos Pred Value                   0.8081         0.9867          0.8077
## Neg Pred Value                   0.9824         0.9962          0.9780
## Prevalence                       0.1000         0.1000          0.1000
## Detection Rate                   0.0842         0.0966          0.0802
## Detection Prevalence             0.1042         0.0979          0.0993
## Balanced Accuracy                0.9099         0.9823          0.8904
##                      Class: Dress Class: Coat Class: Sandal Class: Shirt
## Sensitivity                0.9280      0.8640        0.9460       0.6090
## Specificity                0.9858      0.9760        0.9964       0.9761
## Pos Pred Value             0.8788      0.8000        0.9673       0.7391
## Neg Pred Value             0.9919      0.9848        0.9940       0.9574
## Prevalence                 0.1000      0.1000        0.1000       0.1000
## Detection Rate             0.0928      0.0864        0.0946       0.0609
## Detection Prevalence       0.1056      0.1080        0.0978       0.0824
## Balanced Accuracy          0.9569      0.9200        0.9712       0.7926
##                      Class: Sneaker Class: Bag Class: Ankle_boot
## Sensitivity                  0.9250     0.9730            0.9470
## Specificity                  0.9910     0.9940            0.9924
## Pos Pred Value               0.9195     0.9474            0.9330
## Neg Pred Value               0.9917     0.9970            0.9941
## Prevalence                   0.1000     0.1000            0.1000
## Detection Rate               0.0925     0.0973            0.0947
## Detection Prevalence         0.1006     0.1027            0.1015
## Balanced Accuracy            0.9580     0.9835            0.9697

Neural networks

Preparing data

For multiclass classification, we will need to explicitly convert our vectors of labels to matrices of dummy variables.

y_train <- dummy_cols(train_labels_char, remove_selected_columns = TRUE) %>% setNames(class_names)
y_test <- dummy_cols(test_labels_char, remove_selected_columns = TRUE) %>% setNames(class_names)
head(y_train)

Here are our datasets:

set.seed(158)
train_df <- fashion_mnist_train %>%
  select(-label) %>%
  cbind(y_train)

test_df <- fashion_mnist_test %>%
  select(-label) %>%
  cbind(y_test)

train_df %>%
  select(pixel305:pixel310, Trouser, Shirt, Bag) %>%
  sample_n(10)

Since the response variable is 10-dimensional, we will need to create a special formula for prediction:

f <- class_names %>% 
  paste(collapse = " + ") %>%
  paste("~ .") %>%
  as.formula()
f
## T_shirt_top + Trouser + Pullover + Dress + Coat + Sandal + Shirt + 
##     Sneaker + Bag + Ankle_boot ~ .
## <environment: 0x11dde8538>

Training a neural network

Now we will train a model with one hidden layer with 10 units.

mod_nn <- neuralnet(f, 
                    data = train_df, hidden = c(10),
                    stepmax = 1000,
                    threshold = 50,
                    lifesign = "full",
                    lifesign.step = 100,
                    err.fct = "ce",
                    linear.output = FALSE)
## hidden: 10    thresh: 50    rep: 1/1    steps:     100   min thresh: 355.029772451359
##                                                    200   min thresh: 174.398683265989
##                                                    300   min thresh: 55.8617951596633
##                                                    400   min thresh: 55.8617951596633
##                                                    500   min thresh: 55.8617951596633
##                                                    600   min thresh: 55.8617951596633
##                                                    700   min thresh: 54.9082071779552
##                                                    714   error: 107062.88718 time: 10.07 mins
summary(mod_nn)
##                     Length   Class      Mode    
## call                      10 -none-     call    
## response              600000 -none-     numeric 
## covariate           47040000 -none-     numeric 
## model.list                 2 -none-     list    
## err.fct                    1 -none-     function
## act.fct                    1 -none-     function
## linear.output              1 -none-     logical 
## data                     794 data.frame list    
## exclude                    0 -none-     NULL    
## net.result                 1 -none-     list    
## weights                    1 -none-     list    
## generalized.weights        1 -none-     list    
## startweights               1 -none-     list    
## result.matrix           7963 -none-     numeric

Question 3

How many parameters does our model have? First, find the answer based on your knowledge of lecture material and then verify it by printing the summary of the model.

Answer The input has \(28\times 28=784\) units, i.e., layer dimensions are \[ n_1=784,\quad n_2=10,\quad n_3=10 \] Thus the total number of parameters is \[ (784+1)\times 10 + (10+1)\times 10=7960 \]

This is how we can find the dimensions of the weight matrices:

mod_nn$weights[[1]] %>% lapply(dim)
## [[1]]
## [1] 785  10
## 
## [[2]]
## [1] 11 10

And the total number of parameters:

mod_nn$weights[[1]] %>% lapply(dim) %>% sapply(prod) %>% sum
## [1] 7960

Model’s predictions

This is what raw predictions look like:

mod_nn %>% predict(fashion_mnist_test) %>% head
##             [,1]         [,2]        [,3]        [,4]        [,5]         [,6]
## [1,] 0.640174955 0.0658487596 0.010963269 0.087205895 0.008027419 0.0004322397
## [2,] 0.040296847 0.8531541214 0.018011822 0.019602809 0.069062228 0.0001437993
## [3,] 0.004420297 0.0087816599 0.375128071 0.007880877 0.326293704 0.0009430098
## [4,] 0.004420297 0.0087816599 0.375128071 0.007880877 0.326293704 0.0009430098
## [5,] 0.003400619 0.2932075764 0.008434255 0.357872946 0.280149888 0.0009832037
## [6,] 0.158338363 0.0001074794 0.266215500 0.036566973 0.050180557 0.0028300134
##            [,7]         [,8]        [,9]        [,10]
## [1,] 0.15011072 2.198296e-34 0.008089964 6.418063e-05
## [2,] 0.06559408 2.365127e-33 0.003558540 1.420052e-04
## [3,] 0.21954092 3.936055e-33 0.033918875 7.309403e-04
## [4,] 0.21954092 3.936055e-33 0.033918875 7.309403e-04
## [5,] 0.02968224 3.086418e-34 0.007676339 2.964850e-05
## [6,] 0.41443740 3.658414e-34 0.074230896 3.304623e-04

Raw predictions are probabilities rather than labels. To identify the predicted labels, we will construct a special function: it first finds the position of the class with the largest predicted probability and then returns that class's label. Below is the test confusion matrix of our model.

predict_nn <- function(nn, dataset = fashion_mnist_test, label_vector = class_names) {
  predicted_positions <- nn %>%
    predict(dataset) %>%
    apply(1, which.max)

  as.factor(label_vector[predicted_positions])
}

mod_nn %>% predict_nn(test_df) %>%
  confusionMatrix(test_labels_char)
## Warning in confusionMatrix.default(., test_labels_char): Levels are not in the
## same order for reference and data. Refactoring data to match.
## Confusion Matrix and Statistics
## 
##              Reference
## Prediction    T_shirt_top Trouser Pullover Dress Coat Sandal Shirt Sneaker Bag
##   T_shirt_top         745      26       14    69   11      5   213       0   3
##   Trouser               9     835       13    29   53      2    12       0   1
##   Pullover             20      21      841    19  725      0   522       0  82
##   Dress               125     112       11   852  173      0   104       0  37
##   Coat                  0       0       10     2    8      0     7       0   2
##   Sandal                0       0        0     0    1    291     0      28   5
##   Shirt                50       4       94    24   16      2   100       0  45
##   Sneaker               0       1        0     0    0    452     0     809  15
##   Bag                  49       1       14     4    3     94    31       5 429
##   Ankle_boot            2       0        3     1   10    154    11     158 381
##              Reference
## Prediction    Ankle_boot
##   T_shirt_top          3
##   Trouser              1
##   Pullover             2
##   Dress                0
##   Coat                 0
##   Sandal               1
##   Shirt                0
##   Sneaker            108
##   Bag                 57
##   Ankle_boot         828
## 
## Overall Statistics
##                                          
##                Accuracy : 0.5738         
##                  95% CI : (0.564, 0.5835)
##     No Information Rate : 0.1            
##     P-Value [Acc > NIR] : < 2.2e-16      
##                                          
##                   Kappa : 0.5264         
##                                          
##  Mcnemar's Test P-Value : NA             
## 
## Statistics by Class:
## 
##                      Class: T_shirt_top Class: Trouser Class: Pullover
## Sensitivity                      0.7450         0.8350          0.8410
## Specificity                      0.9618         0.9867          0.8454
## Pos Pred Value                   0.6841         0.8743          0.3768
## Neg Pred Value                   0.9714         0.9818          0.9795
## Prevalence                       0.1000         0.1000          0.1000
## Detection Rate                   0.0745         0.0835          0.0841
## Detection Prevalence             0.1089         0.0955          0.2232
## Balanced Accuracy                0.8534         0.9108          0.8432
##                      Class: Dress Class: Coat Class: Sandal Class: Shirt
## Sensitivity                0.8520      0.0080        0.2910       0.1000
## Specificity                0.9376      0.9977        0.9961       0.9739
## Pos Pred Value             0.6025      0.2759        0.8926       0.2985
## Neg Pred Value             0.9828      0.9005        0.9267       0.9069
## Prevalence                 0.1000      0.1000        0.1000       0.1000
## Detection Rate             0.0852      0.0008        0.0291       0.0100
## Detection Prevalence       0.1414      0.0029        0.0326       0.0335
## Balanced Accuracy          0.8948      0.5028        0.6436       0.5369
##                      Class: Sneaker Class: Bag Class: Ankle_boot
## Sensitivity                  0.8090     0.4290            0.8280
## Specificity                  0.9360     0.9713            0.9200
## Pos Pred Value               0.5841     0.6245            0.5349
## Neg Pred Value               0.9778     0.9387            0.9796
## Prevalence                   0.1000     0.1000            0.1000
## Detection Rate               0.0809     0.0429            0.0828
## Detection Prevalence         0.1385     0.0687            0.1548
## Balanced Accuracy            0.8725     0.7002            0.8740

As we see, the model’s performance is not exactly spectacular. It is actually possible to create a deep learning model that outperforms the random forest, but it has to be a convolutional neural network (a special architecture) trained in a modern deep learning framework (such as keras or torch), and it will require a lot of tuning. This goes beyond our course, especially since finding the right combination of hyperparameters for a deep learning model requires a lot of time and effort. In the next section, you will see the tip of the iceberg of tuning a deep learning model.
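For the curious, here is a sketch of what a minimal convolutional architecture might look like in keras (an illustration only: it assumes a working keras installation and images reshaped into 28 x 28 x 1 arrays, and it is not run in this lab):

library(keras)

# one convolutional block followed by a softmax classifier
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(28, 28, 1)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dense(units = 10, activation = "softmax")

model %>% compile(optimizer = "adam",
                  loss = "categorical_crossentropy",
                  metrics = "accuracy")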

Customizing the model

Artificial neural networks usually have a large number of parameters (the weights and biases defining the model layers) and a large number of hyperparameters that control the training process. One can choose the number of layers and the layer dimensions, the learning rate, the optimization algorithm, the activation functions in the hidden layers, etc. You can read the manual of the function neuralnet to find the list of available controls.

Below we train a new model with two hidden layers (10 and 5 units, respectively), using the algorithm called “slr” rather than the default “rprop+”. If you are super-patient, you can also try a smaller value of threshold and a larger value of stepmax.

set.seed(158)
mod_nn_2 <- neuralnet(f,
                    data = train_df, hidden = c(10, 5),
                    stepmax = 1000,
                    threshold = 50,
                    lifesign = "full",
                    lifesign.step = 10,
                    err.fct = "ce",
                    algorithm = "slr",
                    linear.output = FALSE)
## hidden: 10, 5    thresh: 50    rep: 1/1    steps:      10    min thresh: 2196.46647194059
##                                                        20    min thresh: 1763.73885525966
##                                                        30    min thresh: 1305.29781962469
##                                                        40    min thresh: 444.983399972875
##                                                        50    min thresh: 381.514121566101
##                                                        60    min thresh: 308.199536800102
##                                                        70    min thresh: 260.695932471294
##                                                        80    min thresh: 260.695932471294
##                                                        90    min thresh: 260.695932471294
##                                                       100    min thresh: 260.695932471294
##                                                       110    min thresh: 222.386316954971
##                                                       120    min thresh: 222.386316954971
##                                                       130    min thresh: 222.386316954971
##                                                       140    min thresh: 222.386316954971
##                                                       150    min thresh: 222.386316954971
##                                                       160    min thresh: 222.386316954971
##                                                       170    min thresh: 173.733595514207
##                                                       180    min thresh: 173.733595514207
##                                                       190    min thresh: 173.733595514207
##                                                       200    min thresh: 173.733595514207
##                                                       210    min thresh: 173.733595514207
##                                                       220    min thresh: 173.733595514207
##                                                       230    min thresh: 173.733595514207
##                                                       240    min thresh: 173.733595514207
##                                                       250    min thresh: 169.459673421583
##                                                       260    min thresh: 159.75651651691
##                                                       270    min thresh: 159.75651651691
##                                                       280    min thresh: 111.642730676291
##                                                       290    min thresh: 111.642730676291
##                                                       300    min thresh: 111.642730676291
##                                                       310    min thresh: 111.642730676291
##                                                       320    min thresh: 111.642730676291
##                                                       330    min thresh: 111.642730676291
##                                                       340    min thresh: 111.642730676291
##                                                       350    min thresh: 111.642730676291
##                                                       360    min thresh: 96.9121750173971
##                                                       370    min thresh: 60.2778975343813
##                                                       380    min thresh: 60.2778975343813
##                                                       390    min thresh: 60.2778975343813
##                                                       400    min thresh: 60.2778975343813
##                                                       410    min thresh: 60.2778975343813
##                                                       420    min thresh: 60.2778975343813
##                                                       430    min thresh: 60.2778975343813
##                                                       440    min thresh: 60.2778975343813
##                                                       450    min thresh: 60.2778975343813
##                                                       460    min thresh: 60.2778975343813
##                                                       470    min thresh: 60.2778975343813
##                                                       480    min thresh: 60.2778975343813
##                                                       490    min thresh: 60.2778975343813
##                                                       500    min thresh: 60.2778975343813
##                                                       510    min thresh: 60.2778975343813
##                                                       520    min thresh: 60.2778975343813
##                                                       530    min thresh: 60.2778975343813
##                                                       540    min thresh: 60.2778975343813
##                                                       550    min thresh: 60.2778975343813
##                                                       560    min thresh: 60.2778975343813
##                                                       570    min thresh: 60.2778975343813
##                                                       580    min thresh: 60.2778975343813
##                                                       590    min thresh: 60.2778975343813
##                                                       600    min thresh: 60.2778975343813
##                                                       610    error: 108487.54983 time: 8.83 mins
summary(mod_nn_2)
##                     Length   Class      Mode    
## call                      11 -none-     call    
## response              600000 -none-     numeric 
## covariate           47040000 -none-     numeric 
## model.list                 2 -none-     list    
## err.fct                    1 -none-     function
## act.fct                    1 -none-     function
## linear.output              1 -none-     logical 
## data                     794 data.frame list    
## exclude                    0 -none-     NULL    
## net.result                 1 -none-     list    
## weights                    1 -none-     list    
## generalized.weights        1 -none-     list    
## startweights               1 -none-     list    
## result.matrix           7968 -none-     numeric

And here is the confusion matrix of the new model:

mod_nn_2 %>% predict_nn(test_df) %>%
  confusionMatrix(test_labels_char)
## Warning in confusionMatrix.default(., test_labels_char): Levels are not in the
## same order for reference and data. Refactoring data to match.
## Confusion Matrix and Statistics
## 
##              Reference
## Prediction    T_shirt_top Trouser Pullover Dress Coat Sandal Shirt Sneaker Bag
##   T_shirt_top           0       0        0     0    0      0     1       0   0
##   Trouser              88     932       14   787  101      0    48       0   1
##   Pullover            789      29      903    58  721      1   774       0  22
##   Dress                13       6        6    42   13      4    12       0   5
##   Coat                 73      29       34    95  136      1    99       0   1
##   Sandal                3       1        1     0    1    571     1      48  22
##   Shirt                 3       0        0     0    0      1     2       0   1
##   Sneaker               0       1        0    13    1    104     0     626   4
##   Bag                  30       2       41     5   27     14    63       2 937
##   Ankle_boot            1       0        1     0    0    304     0     324   7
##              Reference
## Prediction    Ankle_boot
##   T_shirt_top          1
##   Trouser              0
##   Pullover             0
##   Dress                0
##   Coat                 0
##   Sandal              46
##   Shirt                0
##   Sneaker             13
##   Bag                  2
##   Ankle_boot         938
## 
## Overall Statistics
##                                           
##                Accuracy : 0.5087          
##                  95% CI : (0.4989, 0.5185)
##     No Information Rate : 0.1             
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.4541          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: T_shirt_top Class: Trouser Class: Pullover
## Sensitivity                      0.0000         0.9320          0.9030
## Specificity                      0.9998         0.8846          0.7340
## Pos Pred Value                   0.0000         0.4729          0.2739
## Neg Pred Value                   0.9000         0.9915          0.9855
## Prevalence                       0.1000         0.1000          0.1000
## Detection Rate                   0.0000         0.0932          0.0903
## Detection Prevalence             0.0002         0.1971          0.3297
## Balanced Accuracy                0.4999         0.9083          0.8185
##                      Class: Dress Class: Coat Class: Sandal Class: Shirt
## Sensitivity                0.0420      0.1360        0.5710       0.0020
## Specificity                0.9934      0.9631        0.9863       0.9994
## Pos Pred Value             0.4158      0.2906        0.8228       0.2857
## Neg Pred Value             0.9032      0.9094        0.9539       0.9001
## Prevalence                 0.1000      0.1000        0.1000       0.1000
## Detection Rate             0.0042      0.0136        0.0571       0.0002
## Detection Prevalence       0.0101      0.0468        0.0694       0.0007
## Balanced Accuracy          0.5177      0.5496        0.7787       0.5007
##                      Class: Sneaker Class: Bag Class: Ankle_boot
## Sensitivity                  0.6260     0.9370            0.9380
## Specificity                  0.9849     0.9793            0.9292
## Pos Pred Value               0.8215     0.8344            0.5956
## Neg Pred Value               0.9595     0.9929            0.9926
## Prevalence                   0.1000     0.1000            0.1000
## Detection Rate               0.0626     0.0937            0.0938
## Detection Prevalence         0.0762     0.1123            0.1575
## Balanced Accuracy            0.8054     0.9582            0.9336

Question 4

Find the test image for which the model is least confident in its prediction. Did the model predict that image’s class correctly?

conf_level <- mod_nn_2 %>% predict(test_df) %>% apply(1, max)
ind_of_least_confidence <- which.min(conf_level)
cat("The model is least confident in predicting test image",
      ind_of_least_confidence, "\n")
## The model is least confident in predicting test image 1977
cat("Below are class probabilities for test image", ind_of_least_confidence, ":\n")
## Below are class probabilities for test image 1977 :
mod_nn_2 %>% predict(test_df[ind_of_least_confidence , ]) %>% round(3)
##       [,1]  [,2]  [,3]  [,4]  [,5] [,6]  [,7] [,8]  [,9] [,10]
## 1977 0.018 0.002 0.005 0.011 0.004    0 0.004    0 0.001 0.018
cat("Predicted class is", as.character(predict_nn(mod_nn_2, test_df[ind_of_least_confidence , ])),
    "\n")
## Predicted class is T_shirt_top
cat("The actual class is", as.character(test_labels_char[ind_of_least_confidence]), "\n")
## The actual class is Ankle_boot

And here is the image the model is least confident about:

plot_fashion_mnist_image(ind_of_least_confidence, fashion_mnist_test)

Regularization

If our deep learning model is too flexible, i.e., if it has too many parameters, it may be prone to overfitting (which clearly does not happen here). One possible way to prevent overfitting is not to have a super-flexible model, but then the model may underfit. A common method for constructing flexible models that neither overfit nor underfit is regularization.

We are not going to study regularization in this lab (because, again, it takes a lot of time and effort). Below is just some basic information for you.

The most common techniques for regularizing neural networks are \(l_1/l_2\) regularization (similar to LASSO / ridge, respectively) and dropout. Both techniques are applied at a particular layer.

Let’s say we introduce \(l_1\)-regularization in the first hidden layer with coefficient \(0.005\). It means that the loss function will get the extra term \[ +0.005\sum_{i=1}^{n_2}\sum_{j=1}^{n_1}|w^{(1)}_{i,j}| \]

Let’s say that we introduce dropout regularization in the second hidden layer with \(p=0.05\). It means that when calculating the model’s output, the output of every unit in the second hidden layer is replaced with 0 with probability \(0.05\). This prevents the model’s output from depending too heavily on any particular unit.
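For concreteness, this is roughly how the two regularizers above might be specified in keras (a sketch under the same caveats as before; neuralnet itself does not support them):

library(keras)

model_reg <- keras_model_sequential() %>%
  # l1 penalty with coefficient 0.005 on the first hidden layer's weights
  layer_dense(units = 10, activation = "relu", input_shape = 784,
              kernel_regularizer = regularizer_l1(0.005)) %>%
  layer_dense(units = 5, activation = "relu") %>%
  # each unit of the second hidden layer is dropped with probability 0.05
  layer_dropout(rate = 0.05) %>%
  layer_dense(units = 10, activation = "softmax")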

The point of this section is to see that regularizing a deep learning model takes a lot of effort - we need to choose which layers to regularize, the regularization type, and the regularization constants (which are also hyperparameters of the model).

Survey

There is a link to a simple survey after lab 7:

Answers

Here are the answers: