Libraries Used

library(nnet)
library(caret)
library(RSNNS)
library(darch)
library(deepnet)
library(DT)
library(h2o)
library(data.table)
library(plotly)


Objective

This project describes my approach to building a digit recognizer from the pixel values of digit images. A neural network is used to predict the digit from its pixels, and the accuracy of the model is then improved through several adjustments.

Data Loading and Partial Preview

First 5 columns and 100 rows

digits.train <- read.csv("train.csv")

datatable(digits.train[1:100, 1:5])


Data Preprocessing

The label column must be converted to a factor so that R treats this as a classification problem rather than a regression problem.

digits.train$label <- as.factor(digits.train$label)

#check for levels
levels(digits.train$label)
##  [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"

Data Sampling

Training on all 42,000 rows would take a long time, so only the first 5,000 observations are used to reduce the computation time of training a neural network. The label is also separated from the pixel columns so that the predictors and the target can be passed to the training function separately.

#row indices of the sample (the first 5,000 rows)
i <- 1:5000

#predictors (every column except the label)
digits.x <- digits.train[i , -1]

#labels
digits.y <- digits.train[i , 1]
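
The code above takes the first 5,000 rows rather than a random draw. If the row order of the file were not random, drawing the rows with sample() would be a safer choice. A minimal sketch of that alternative follows; the `toy` data frame and its column names are hypothetical stand-ins for digits.train, used only so the snippet runs on its own.

```r
# Swapped-in technique: random sampling instead of taking the first rows.
set.seed(1234)

# Hypothetical stand-in for digits.train: a label column plus two pixel columns
toy <- data.frame(
  label  = factor(rep(0:9, each = 10)),
  pixel0 = 0,
  pixel1 = runif(100)
)

# Draw 20 distinct row indices at random (sampling without replacement)
idx <- sample(nrow(toy), size = 20)

toy.x <- toy[idx, -1]   # predictors: every column except the label
toy.y <- toy[idx, 1]    # labels, still a factor
```

With the real data, the same pattern would be `i <- sample(nrow(digits.train), 5000)` in place of `i <- 1:5000`.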

Label Distribution

It is important that the labels are distributed roughly evenly across all digits so that training is not biased towards a particular set of digits.
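
One way to check this is to tabulate the labels. The sketch below is self-contained: the counts are the column totals of the confusion matrices reported later in this document, which is what `table(digits.y)` would return once the data is loaded. The variable names are my own.

```r
# Label counts for the 5,000-row sample, taken from the "Reference" column
# totals of the confusion matrices shown later in this document.
label.counts <- c("0" = 494, "1" = 558, "2" = 545, "3" = 480, "4" = 477,
                  "5" = 469, "6" = 516, "7" = 506, "8" = 477, "9" = 478)

# Share of each digit in the sample
label.share <- label.counts / sum(label.counts)
round(label.share, 4)

# Simple bar chart of the counts
barplot(label.counts, xlab = "Digit", ylab = "Count",
        main = "Label distribution")
```

Each digit makes up between roughly 9% and 11% of the sample, so the classes are close to balanced.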


Model Training

The caret package will be used to train our neural network with the following parameters:

#set seed for reproducible results
set.seed(1234)
digits.ml <- caret::train(
  digits.x, digits.y,
  method = "nnet",
  tuneGrid = expand.grid(.size = c(5), .decay = 0.1),
  trControl = trainControl(method = "none"),
  MaxNWts = 10000,
  maxit = 100
)
## # weights:  3985
## initial  value 12434.260588 
## iter  10 value 9977.228514
## iter  20 value 9802.227158
## iter  30 value 9446.929945
## iter  40 value 9381.958983
## iter  50 value 9223.237586
## iter  60 value 9109.338442
## iter  70 value 8979.846532
## iter  80 value 8681.859348
## iter  90 value 8593.547349
## iter 100 value 8459.820674
## final  value 8459.820674 
## stopped after 100 iterations
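
The weight count in the log explains why MaxNWts has to be raised. For a network with one hidden layer and bias terms, the count is (inputs + 1) * hidden + (hidden + 1) * outputs. The sketch below assumes 784 pixel columns, which is not stated above but is consistent with the reported 3985 weights; `count_weights` is a helper of my own, not part of nnet or caret.

```r
# Number of weights in a single-hidden-layer network with bias units and no
# skip-layer connections: (inputs + 1) * hidden + (hidden + 1) * outputs
count_weights <- function(n_in, n_hidden, n_out) {
  (n_in + 1) * n_hidden + (n_hidden + 1) * n_out
}

# Assumption: 784 pixel inputs; 10 outputs, one per digit
w5 <- count_weights(784, 5, 10)    # 3985, matches "# weights:  3985"
w10 <- count_weights(784, 10, 10)  # 7960
w5
w10
```

Both values sit below the MaxNWts limit of 10000 used here, but the count grows by 795 weights per extra hidden unit, so larger hidden layers need a higher limit.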

Model Prediction

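The predict() function generates the predicted digits. Because it is called without new data, the predictions are for the same 5,000 observations the model was trained on, so the accuracy reported below is training accuracy.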

digits.yhat1 <- predict(digits.ml)

Prediction distribution
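
A minimal sketch of how the predictions can be tabulated. To keep it self-contained, the counts are the row totals of the confusion matrix reported in the next section, which is what `table(digits.yhat1)` would return; the variable names are my own.

```r
# Number of times each digit was predicted by the first model, taken from the
# "Prediction" row totals of the confusion matrix in the next section.
pred.counts <- c("0" = 1285, "1" = 593, "2" = 1981, "3" = 806, "4" = 0,
                 "5" = 0, "6" = 0, "7" = 335, "8" = 0, "9" = 0)

# Digits the model never predicts
never.predicted <- names(pred.counts)[pred.counts == 0]
never.predicted

# Simple bar chart of the predicted digits
barplot(pred.counts, xlab = "Predicted digit", ylab = "Count",
        main = "Prediction distribution")
```

The model predicts only five of the ten digits, and the digit 2 alone accounts for almost 40% of all predictions, even though the labels themselves are roughly balanced.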

Confusion Matrix

caret::confusionMatrix(digits.yhat1, digits.y)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1   2   3   4   5   6   7   8   9
##          0 445   7  76 117  50 192  38  73 205  82
##          1   0 530  17   1   1  12   1  12  14   5
##          2  28  17 437  24 417  49 477 150  94 288
##          3  20   3  11 329   5 209   0  17 156  56
##          4   0   0   0   0   0   0   0   0   0   0
##          5   0   0   0   0   0   0   0   0   0   0
##          6   0   0   0   0   0   0   0   0   0   0
##          7   1   1   4   9   4   7   0 254   8  47
##          8   0   0   0   0   0   0   0   0   0   0
##          9   0   0   0   0   0   0   0   0   0   0
## 
## Overall Statistics
##                                           
##                Accuracy : 0.399           
##                  95% CI : (0.3854, 0.4127)
##     No Information Rate : 0.1116          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.3292          
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity            0.9008   0.9498   0.8018   0.6854   0.0000   0.0000
## Specificity            0.8136   0.9858   0.6534   0.8945   1.0000   1.0000
## Pos Pred Value         0.3463   0.8938   0.2206   0.4082      NaN      NaN
## Neg Pred Value         0.9868   0.9936   0.9642   0.9640   0.9046   0.9062
## Prevalence             0.0988   0.1116   0.1090   0.0960   0.0954   0.0938
## Detection Rate         0.0890   0.1060   0.0874   0.0658   0.0000   0.0000
## Detection Prevalence   0.2570   0.1186   0.3962   0.1612   0.0000   0.0000
## Balanced Accuracy      0.8572   0.9678   0.7276   0.7899   0.5000   0.5000
##                      Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity            0.0000   0.5020   0.0000   0.0000
## Specificity            1.0000   0.9820   1.0000   1.0000
## Pos Pred Value            NaN   0.7582      NaN      NaN
## Neg Pred Value         0.8968   0.9460   0.9046   0.9044
## Prevalence             0.1032   0.1012   0.0954   0.0956
## Detection Rate         0.0000   0.0508   0.0000   0.0000
## Detection Prevalence   0.0000   0.0670   0.0000   0.0000
## Balanced Accuracy      0.5000   0.7420   0.5000   0.5000
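
The overall accuracy in this output is the number of correct predictions (the diagonal of the confusion matrix) divided by the number of observations. A short check using the diagonal values copied from the matrix above; the variable names are my own.

```r
# Diagonal of the confusion matrix above: correct predictions for digits 0-9
correct <- c(445, 530, 437, 329, 0, 0, 0, 254, 0, 0)

# Total number of observations in the sample
n <- 5000

# Accuracy = correct predictions / all predictions
accuracy <- sum(correct) / n
accuracy
```

This gives 1995 correct predictions out of 5,000, which is the 0.399 reported under Overall Statistics.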


Improving Model Performance

To improve the model, the number of neurons in the hidden layer is increased from 5 to 10. MaxNWts is raised as well to leave room for the larger number of weights.

#set seed for reproducible results
set.seed(1234)

digits.m2 <- caret::train(
  digits.x, digits.y,
  method = "nnet",
  tuneGrid = expand.grid(.size = c(10), .decay = 0.1),
  trControl = trainControl(method = "none"),
  MaxNWts = 50000,
  maxit = 100
)
## # weights:  7960
## initial  value 14194.996238 
## iter  10 value 8394.797041
## iter  20 value 7325.992444
## iter  30 value 6839.076549
## iter  40 value 6691.493033
## iter  50 value 6617.147106
## iter  60 value 6373.274511
## iter  70 value 6208.286504
## iter  80 value 5944.266565
## iter  90 value 5851.746804
## iter 100 value 5775.463191
## final  value 5775.463191 
## stopped after 100 iterations

Improved Model Prediction Plot

digits.yhat2 <- predict(digits.m2)

Improved Accuracy

Adding additional hidden neurons significantly improved the model's accuracy, from about 40% (0.399) to about 67% (0.6666), as the output below shows. When the hidden nodes are set to 40, accuracy jumps to around 80%.

caret::confusionMatrix(digits.yhat2, digits.y)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1   2   3   4   5   6   7   8   9
##          0 347   0  50   3   2   9  25   3   6   3
##          1   0 507  12   4   0   6   3  10  10   7
##          2  13   8 353  17  13  41  30  21  22   0
##          3  10   8  47 393   0  49   1  17  93   7
##          4  10   0   6   1 253  17  14   1   9 105
##          5  74   0  10  22   0 104   2   5  17  14
##          6   5  28  33   2   4   8 426   2  12   0
##          7  19   2   8   8   8  26   2 383  10  32
##          8  15   3   7  24   1 156  10   5 268  11
##          9   1   2  19   6 196  53   3  59  30 299
## 
## Overall Statistics
##                                           
##                Accuracy : 0.6666          
##                  95% CI : (0.6533, 0.6797)
##     No Information Rate : 0.1116          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.6294          
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity            0.7024   0.9086   0.6477   0.8187   0.5304   0.2217
## Specificity            0.9776   0.9883   0.9630   0.9487   0.9640   0.9682
## Pos Pred Value         0.7746   0.9070   0.6815   0.6288   0.6082   0.4194
## Neg Pred Value         0.9677   0.9885   0.9572   0.9801   0.9511   0.9232
## Prevalence             0.0988   0.1116   0.1090   0.0960   0.0954   0.0938
## Detection Rate         0.0694   0.1014   0.0706   0.0786   0.0506   0.0208
## Detection Prevalence   0.0896   0.1118   0.1036   0.1250   0.0832   0.0496
## Balanced Accuracy      0.8400   0.9484   0.8053   0.8837   0.7472   0.5950
##                      Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity            0.8256   0.7569   0.5618   0.6255
## Specificity            0.9790   0.9744   0.9487   0.9184
## Pos Pred Value         0.8192   0.7691   0.5360   0.4476
## Neg Pred Value         0.9799   0.9727   0.9536   0.9587
## Prevalence             0.1032   0.1012   0.0954   0.0956
## Detection Rate         0.0852   0.0766   0.0536   0.0598
## Detection Prevalence   0.1040   0.0996   0.1000   0.1336
## Balanced Accuracy      0.9023   0.8657   0.7553   0.7720
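
The per-class sensitivity in this output is the number of correct predictions for a digit divided by the number of times that digit actually occurs. A short sketch that reproduces the Sensitivity row from the matrix above and finds the weakest class; the values are copied from the diagonal and the column totals, and the variable names are my own.

```r
# Diagonal of the second confusion matrix: correct predictions for digits 0-9
correct2 <- c(347, 507, 353, 393, 253, 104, 426, 383, 268, 299)

# Column totals of the matrix: how often each digit actually occurs
actual <- c(494, 558, 545, 480, 477, 469, 516, 506, 477, 478)
names(correct2) <- names(actual) <- 0:9

# Sensitivity (recall) per digit
sensitivity <- correct2 / actual
round(sensitivity, 4)

# Digit with the lowest sensitivity
worst <- names(which.min(sensitivity))
worst

# Overall accuracy of the second model
accuracy2 <- sum(correct2) / sum(actual)
accuracy2
```

The digit 5 is the weakest class: only 104 of its 469 occurrences are recognized, and 156 of them are predicted as 8.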