library(nnet)
library(caret)
library(RSNNS)
library(darch)
library(deepnet)
library(DT)
library(h2o)
library(data.table)
library(plotly)
This project presents my approach to building a digit recognizer from the pixel values of digit images. A neural network is used to predict the digit from the pixels, and the accuracy of the model is improved through several adjustments.
The first 100 rows and first 5 columns of the training data:
digits.train <- read.csv("train.csv")
datatable(digits.train[1:100, 1:5])
The label column needs to be converted to a factor so that R treats this as a classification problem rather than a regression problem.
digits.train$label <- as.factor(digits.train$label)
#check for levels
levels(digits.train$label)
## [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"
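As a small illustration of what the conversion does, the sketch below uses a few made-up label values (not taken from train.csv). It also shows a common pitfall when turning factor labels back into digits: `as.integer()` returns the internal level codes, not the digits themselves.

```r
# Illustrative labels only; these values are assumptions, not rows of train.csv
labels_num <- c(3, 0, 9, 3)
labels_fac <- as.factor(labels_num)

# Levels are the sorted unique values, stored as character strings
levels(labels_fac)                     # "0" "3" "9"

# as.integer() gives the position of each value among the levels
as.integer(labels_fac)                 # 2 1 3 2

# Going through character first recovers the original digits
as.numeric(as.character(labels_fac))   # 3 0 9 3
```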
Training on all 42,000 rows would take a long time, so only the first 5,000 observations are used to reduce the computation time of training the neural network. The labels are also separated from the pixel columns: the pixels serve as the inputs to the network and the labels as the target.
i <- 1:5000
digits.x <- digits.train[i , -1]
#labels
digits.y <- digits.train[i , 1]
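The index `1:5000` selects the first 5,000 rows rather than a random sample. If the rows of the file were ordered in some way, a random draw might be more representative. A minimal sketch of that alternative, assuming the 42,000-row count stated above; `i_random` is a hypothetical name, and the last two lines are left as comments because they need `digits.train` to be loaded:

```r
#set seed so the same rows are drawn on every run
set.seed(1234)

n_total  <- 42000   # number of rows in the training file, as stated in the text
n_sample <- 5000    # number of rows to keep

#draw 5,000 distinct row numbers without replacement
i_random <- sort(sample(n_total, n_sample))

#the subsetting would then mirror the original code:
#digits.x <- digits.train[i_random, -1]
#digits.y <- digits.train[i_random, 1]
```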
The caret package will be used to train the neural network with the following parameters:
#set seed for reproducible results
set.seed(1234)
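#one hidden layer with 5 units and weight decay 0.1; method = "none" fits once without resampling
#MaxNWts raises nnet's default limit of 1000 weights; maxit caps training at 100 iterations
digits.ml <- caret::train(digits.x, digits.y, method = "nnet",
                          tuneGrid = expand.grid(.size = c(5), .decay = 0.1),
                          trControl = trainControl(method = "none"),
                          MaxNWts = 10000, maxit = 100)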
## # weights: 3985
## initial value 12434.260588
## iter 10 value 9977.228514
## iter 20 value 9802.227158
## iter 30 value 9446.929945
## iter 40 value 9381.958983
## iter 50 value 9223.237586
## iter 60 value 9109.338442
## iter 70 value 8979.846532
## iter 80 value 8681.859348
## iter 90 value 8593.547349
## iter 100 value 8459.820674
## final value 8459.820674
## stopped after 100 iterations
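The `# weights` figure in the log can be checked by hand. With one hidden layer and a bias term on every unit, the count is (inputs + 1) × hidden + (hidden + 1) × outputs. The sketch below assumes 784 pixel columns, which is the input count consistent with the 3985 weights reported; `nnet_weight_count` is a hypothetical helper, not a function from nnet.

```r
#hypothetical helper: weights in a single-hidden-layer network with bias
#terms and no skip-layer connections
nnet_weight_count <- function(n_inputs, n_hidden, n_outputs) {
  (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs
}

n_pixels  <- 784   # assumed number of pixel columns
n_classes <- 10    # digits 0 to 9

nnet_weight_count(n_pixels, 5, n_classes)    # 3985
nnet_weight_count(n_pixels, 10, n_classes)   # 7960
```

Because the count grows roughly linearly with the number of hidden units, `MaxNWts` has to be raised whenever the hidden layer is enlarged.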
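The predict() function is called without new data, so it returns the fitted model's predictions for the same 5,000 training rows. The accuracy reported below is therefore in-sample accuracy.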
digits.yhat1 <- predict(digits.ml)
caret::confusionMatrix(digits.yhat1, digits.y)
## Confusion Matrix and Statistics
##
##           Reference
## Prediction   0   1   2   3   4   5   6   7   8   9
##          0 445   7  76 117  50 192  38  73 205  82
##          1   0 530  17   1   1  12   1  12  14   5
##          2  28  17 437  24 417  49 477 150  94 288
##          3  20   3  11 329   5 209   0  17 156  56
##          4   0   0   0   0   0   0   0   0   0   0
##          5   0   0   0   0   0   0   0   0   0   0
##          6   0   0   0   0   0   0   0   0   0   0
##          7   1   1   4   9   4   7   0 254   8  47
##          8   0   0   0   0   0   0   0   0   0   0
##          9   0   0   0   0   0   0   0   0   0   0
##
## Overall Statistics
##
##                Accuracy : 0.399
##                  95% CI : (0.3854, 0.4127)
##     No Information Rate : 0.1116
##     P-Value [Acc > NIR] : < 2.2e-16
##
##                   Kappa : 0.3292
##  Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
##                      Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity            0.9008   0.9498   0.8018   0.6854   0.0000   0.0000
## Specificity            0.8136   0.9858   0.6534   0.8945   1.0000   1.0000
## Pos Pred Value         0.3463   0.8938   0.2206   0.4082      NaN      NaN
## Neg Pred Value         0.9868   0.9936   0.9642   0.9640   0.9046   0.9062
## Prevalence             0.0988   0.1116   0.1090   0.0960   0.0954   0.0938
## Detection Rate         0.0890   0.1060   0.0874   0.0658   0.0000   0.0000
## Detection Prevalence   0.2570   0.1186   0.3962   0.1612   0.0000   0.0000
## Balanced Accuracy      0.8572   0.9678   0.7276   0.7899   0.5000   0.5000
##                      Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity            0.0000   0.5020   0.0000   0.0000
## Specificity            1.0000   0.9820   1.0000   1.0000
## Pos Pred Value            NaN   0.7582      NaN      NaN
## Neg Pred Value         0.8968   0.9460   0.9046   0.9044
## Prevalence             0.1032   0.1012   0.0954   0.0956
## Detection Rate         0.0000   0.0508   0.0000   0.0000
## Detection Prevalence   0.0000   0.0670   0.0000   0.0000
## Balanced Accuracy      0.5000   0.7420   0.5000   0.5000
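The overall accuracy is the sum of the diagonal of the confusion matrix divided by the total number of observations. The sketch below re-enters the counts printed above into a plain matrix and reproduces that figure. It also lists the digits the network never predicts, which is why their Pos Pred Value shows as NaN. The object names `cm`, `accuracy`, and `never_predicted` are illustrative.

```r
#counts copied from the confusion matrix above (rows = prediction, columns = reference)
cm <- matrix(
  c(445,   7,  76, 117,  50, 192,  38,  73, 205,  82,
      0, 530,  17,   1,   1,  12,   1,  12,  14,   5,
     28,  17, 437,  24, 417,  49, 477, 150,  94, 288,
     20,   3,  11, 329,   5, 209,   0,  17, 156,  56,
      0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
      0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
      0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
      1,   1,   4,   9,   4,   7,   0, 254,   8,  47,
      0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
      0,   0,   0,   0,   0,   0,   0,   0,   0,   0),
  nrow = 10, byrow = TRUE,
  dimnames = list(Prediction = as.character(0:9),
                  Reference  = as.character(0:9))
)

#correct predictions sit on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)
accuracy          # 0.399

#digits whose prediction row is all zeros are never predicted
never_predicted <- rownames(cm)[rowSums(cm) == 0]
never_predicted   # "4" "5" "6" "8" "9"
```

With only 5 hidden units, the network predicts just five of the ten digits.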
To improve the model, the number of neurons in the hidden layer is increased from 5 to 10.
#set seed for reproducible results
set.seed(1234)
#same settings as before, but with 10 hidden units and a higher weight limit
digits.m2 <- caret::train(digits.x, digits.y, method = "nnet",
                          tuneGrid = expand.grid(.size = c(10), .decay = 0.1),
                          trControl = trainControl(method = "none"),
                          MaxNWts = 50000, maxit = 100)
## # weights: 7960
## initial value 14194.996238
## iter 10 value 8394.797041
## iter 20 value 7325.992444
## iter 30 value 6839.076549
## iter 40 value 6691.493033
## iter 50 value 6617.147106
## iter 60 value 6373.274511
## iter 70 value 6208.286504
## iter 80 value 5944.266565
## iter 90 value 5851.746804
## iter 100 value 5775.463191
## final value 5775.463191
## stopped after 100 iterations
digits.yhat2 <- predict(digits.m2)
Adding additional hidden neurons significantly improved the model’s in-sample accuracy, from about 40% to about 67%. When the number of hidden nodes is set to 40, accuracy jumps to around 80%.
caret::confusionMatrix(digits.yhat2, digits.y)
## Confusion Matrix and Statistics
##
##           Reference
## Prediction   0   1   2   3   4   5   6   7   8   9
##          0 347   0  50   3   2   9  25   3   6   3
##          1   0 507  12   4   0   6   3  10  10   7
##          2  13   8 353  17  13  41  30  21  22   0
##          3  10   8  47 393   0  49   1  17  93   7
##          4  10   0   6   1 253  17  14   1   9 105
##          5  74   0  10  22   0 104   2   5  17  14
##          6   5  28  33   2   4   8 426   2  12   0
##          7  19   2   8   8   8  26   2 383  10  32
##          8  15   3   7  24   1 156  10   5 268  11
##          9   1   2  19   6 196  53   3  59  30 299
##
## Overall Statistics
##
##                Accuracy : 0.6666
##                  95% CI : (0.6533, 0.6797)
##     No Information Rate : 0.1116
##     P-Value [Acc > NIR] : < 2.2e-16
##
##                   Kappa : 0.6294
##  Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
##                      Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity            0.7024   0.9086   0.6477   0.8187   0.5304   0.2217
## Specificity            0.9776   0.9883   0.9630   0.9487   0.9640   0.9682
## Pos Pred Value         0.7746   0.9070   0.6815   0.6288   0.6082   0.4194
## Neg Pred Value         0.9677   0.9885   0.9572   0.9801   0.9511   0.9232
## Prevalence             0.0988   0.1116   0.1090   0.0960   0.0954   0.0938
## Detection Rate         0.0694   0.1014   0.0706   0.0786   0.0506   0.0208
## Detection Prevalence   0.0896   0.1118   0.1036   0.1250   0.0832   0.0496
## Balanced Accuracy      0.8400   0.9484   0.8053   0.8837   0.7472   0.5950
##                      Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity            0.8256   0.7569   0.5618   0.6255
## Specificity            0.9790   0.9744   0.9487   0.9184
## Pos Pred Value         0.8192   0.7691   0.5360   0.4476
## Neg Pred Value         0.9799   0.9727   0.9536   0.9587
## Prevalence             0.1032   0.1012   0.0954   0.0956
## Detection Rate         0.0852   0.0766   0.0536   0.0598
## Detection Prevalence   0.1040   0.0996   0.1000   0.1336
## Balanced Accuracy      0.9023   0.8657   0.7553   0.7720
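Both accuracies are measured on the rows the networks were trained on. One option for comparing hidden-layer sizes such as 5, 10, and 40 on unseen data is to hold out part of the sample. The sketch below is one way to do that, not the procedure used for the results above. It assumes caret and nnet are installed. `holdout_accuracy` and its arguments are hypothetical names, and the 80/20 split is an arbitrary choice.

```r
#hypothetical helper: fit one network per hidden-layer size on a training
#split and return the accuracy of each on the held-out rows
holdout_accuracy <- function(x, y, sizes, train_frac = 0.8, seed = 1234) {
  set.seed(seed)
  n <- nrow(x)
  train_idx <- sample(n, size = floor(train_frac * n))

  acc <- vapply(sizes, function(s) {
    fit <- caret::train(x[train_idx, , drop = FALSE], y[train_idx],
                        method = "nnet",
                        tuneGrid = expand.grid(.size = s, .decay = 0.1),
                        trControl = caret::trainControl(method = "none"),
                        MaxNWts = 50000, maxit = 100, trace = FALSE)
    #predict on rows that were not used for fitting
    pred <- predict(fit, newdata = x[-train_idx, , drop = FALSE])
    mean(as.character(pred) == as.character(y[-train_idx]))
  }, numeric(1))

  names(acc) <- sizes
  acc
}
```

With the objects defined earlier, a call might look like `holdout_accuracy(digits.x, digits.y, sizes = c(5, 10, 40))`. Held-out accuracy will usually come out lower than the in-sample figures.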