One of the biggest challenges in machine learning is computer vision, the field of training computers to recognize imagery much as a human would. The goal is always to approach human-level accuracy, and as you'll see, some handwritten numbers are hard to discern even by eye.
The computer vision challenge in the demonstration that follows involves training the computer to recognize handwritten numbers. The training data comes from the widely known MNIST database. It is a good database for people who want to study deep learning techniques and pattern recognition methods on real-world data while spending minimal effort on pre-processing and formatting.
There are many ways to conduct training and testing on the MNIST data set, but the deep learning methods here use the Keras interface to R, which is underpinned by Google's TensorFlow library. Using Keras, we will define models with two separate approaches and compare both accuracy and runtimes. Readers can consider this a "hello world" test for the Keras interface to R.
#library(keras)
#install_keras()
loadedNamespaces()
## [1] "colorspace" "ellipsis" "class" "rprojroot" "base64enc"
## [6] "fs" "rstudioapi" "stats" "remotes" "prodlim"
## [11] "fansi" "lubridate" "xml2" "codetools" "splines"
## [16] "R.methodsS3" "knitr" "pkgload" "zeallot" "jsonlite"
## [21] "pROC" "caret" "broom" "base" "kernlab"
## [26] "dbplyr" "R.oo" "tfruns" "readr" "compiler"
## [31] "httr" "backports" "assertthat" "Matrix" "lazyeval"
## [36] "cli" "htmltools" "prettyunits" "tools" "gtable"
## [41] "glue" "reshape2" "dplyr" "naivebayes" "grDevices"
## [46] "Rcpp" "cellranger" "vctrs" "nlme" "iterators"
## [51] "timeDate" "xfun" "gower" "stringr" "ps"
## [56] "testthat" "rvest" "lifecycle" "pacman" "devtools"
## [61] "MASS" "scales" "ipred" "graphics" "hms"
## [66] "tidyverse" "utils" "yaml" "memoise" "reticulate"
## [71] "ggplot2" "keras" "datasets" "rpart" "stringi"
## [76] "tensorflow" "desc" "randomForest" "foreach" "pkgbuild"
## [81] "lava" "rlang" "pkgconfig" "evaluate" "lattice"
## [86] "purrr" "recipes" "htmlwidgets" "processx" "tidyselect"
## [91] "plyr" "magrittr" "R6" "generics" "DBI"
## [96] "pillar" "haven" "whisker" "withr" "survival"
## [101] "nnet" "tibble" "modelr" "crayon" "plotly"
## [106] "rmarkdown" "usethis" "grid" "readxl" "data.table"
## [111] "blob" "callr" "methods" "forcats" "ModelMetrics"
## [116] "webshot" "reprex" "digest" "tidyr" "R.utils"
## [121] "stats4" "munsell" "MLeval" "kableExtra" "viridisLite"
## [126] "sessioninfo"
Load the data from the Kaggle digits data set. There are separate train and test files; for this example I use only the train file but run cross-validation. Also note: this data set is large, so I reduce it by a fraction ("percent") – see below.
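The chunk that loads the data is not shown in the original; here is a minimal sketch of what it might look like, assuming the Kaggle train.csv sits in the working directory and a hypothetical sampling fraction named percent (1.0 reproduces the full 42,000 rows shown below). The dim() and head() output that follows came from the original run.
# Hypothetical reconstruction of the unshown loading chunk
library(readr)
percent <- 1.0 # assumed sampling fraction; 1.0 keeps all rows
DigitTotalDF <- as.data.frame(read_csv("train.csv"))
if (percent < 1) {
  keep <- sample(nrow(DigitTotalDF), floor(percent * nrow(DigitTotalDF)))
  DigitTotalDF <- DigitTotalDF[keep, ]
}
DigitTotalDF$label <- factor(DigitTotalDF$label) # labels as factors (used later)
dim(DigitTotalDF)
head(DigitTotalDF[, 1:5])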
## [1] 42000 785
| label | pixel0 | pixel1 | pixel2 | pixel3 |
|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 |
EDA <- DigitTotalDF
## Distribution of digits across all data sets
plot1 <- ggplot(EDA, aes(x = label, y = (..count..)/sum(..count..), fill = label)) +
  geom_bar() + theme_light() +
  labs(y = "Relative frequency", title = "mnist_train dataset") +
  scale_y_continuous(labels = scales::percent, limits = c(0, 0.15)) +
  geom_text(stat = "count",
            aes(label = scales::percent((..count..)/sum(..count..))),
            vjust = -1)
plot1
View the digit images and see how they vary.
library(readr)
# Split the training data into a list of data frames, one per digit label
digit_groups <- split(EDA, EDA$label)
# Remove the label column from each group
dg2 <- lapply(digit_groups, function(x) { x["label"] <- NULL; x })
# 129-step greyscale palette
gs1 <- grey(0:128 / 128)
# Apply a summary function pixel-wise across a group's images and draw the result
pixel_wise <- function(df, func) {
  output <- apply(t(df), 1, func)  # one summary value per pixel position
  u <- matrix(output, ncol = 28)   # reshape the 784 values to 28 x 28
  v <- u
  for (i in 1:28) {
    v[i, ] <- rev(u[i, ])          # flip so the digit renders upright
  }
  image(v, col = gs1, axes = FALSE)
}
# Plot each digit's pixel-wise mean
#par(mfrow=c(4,3))
plot <- lapply(dg2, pixel_wise, func = mean)
title("Pixel-wise Mean", outer = TRUE, line = -2)
# Plot each digit's pixel-wise median
#par(mfrow=c(4,3))
plot <- lapply(dg2, pixel_wise, func = median)
title("Pixel-wise Median", outer = TRUE, line = -2)
# Plot each digit's pixel-wise standard deviation
#par(mfrow=c(4,3))
plot <- lapply(dg2, pixel_wise, func = sd)
title("Pixel-wise Standard Deviation", outer = TRUE, line = -2)
Running install_keras() provides default CPU-based installations of Keras and TensorFlow. If you want a more customized installation, e.g. to take advantage of NVIDIA GPUs, see the documentation for install_keras().
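For example, a GPU build can be requested directly (a sketch; it assumes working NVIDIA drivers and CUDA on the host):
#install_keras(tensorflow = "gpu") # GPU-enabled install; assumes NVIDIA drivers/CUDA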
Learning Keras
Below we walk through a simple example of using Keras to recognize handwritten digits from the MNIST dataset. After getting familiar with the basics, check out the tutorials and additional learning resources available on the Keras for R website (keras.rstudio.com).
The Deep Learning with R book by François Chollet (the creator of Keras) provides a more comprehensive introduction to both Keras and the concepts and practice of deep learning.
You may also find it convenient to download the Deep Learning with Keras cheat sheet, a quick high-level reference to all of the capabilities of Keras.
MNIST Example
We can learn the basics of Keras by walking through a simple example: recognizing handwritten digits from the MNIST dataset. MNIST consists of 28 x 28 grayscale images of handwritten digits like these:
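A single row of the data frame can be reshaped and rendered to see one of these digits; a quick sketch using the data loaded above (the row index is arbitrary):
# Render one flattened 784-pixel row as a 28 x 28 image (row choice arbitrary)
digit <- matrix(unlist(DigitTotalDF[1, -1]), nrow = 28, byrow = TRUE)
image(t(digit)[, 28:1], col = grey(seq(1, 0, length.out = 256)), axes = FALSE)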
require(kableExtra)
df <- DigitTotalDF # keep a copy so we can reset without reloading
head(df[,1:5],10)
## label pixel0 pixel1 pixel2 pixel3
## 1 1 0 0 0 0
## 2 0 0 0 0 0
## 3 1 0 0 0 0
## 4 4 0 0 0 0
## 5 0 0 0 0 0
## 6 0 0 0 0 0
## 7 7 0 0 0 0
## 8 3 0 0 0 0
## 9 5 0 0 0 0
## 10 3 0 0 0 0
dim(df)
## [1] 42000 785
nDim <- dim(df)[2]-1
nDim
## [1] 784
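The split below relies on modelDataC(), a helper defined outside this document that returns an S4 object with trn, tst, and hld slots. A minimal sketch of what such a splitter might look like, under assumed proportions (80% train, 20% test, with the holdout drawn from the test split, which matches the overlapping row indices printed further down):
# Hypothetical sketch of modelDataC() (the real helper is not shown).
# Assumed proportions: 80% train, 20% test, holdout = 20% sample of test.
setClass("modelData", slots = c(trn = "data.frame",
                                tst = "data.frame",
                                hld = "data.frame"))
modelDataC <- function(df, p_train = 0.8, p_hold = 0.2) {
  set.seed(42)                                        # reproducible split
  trn_idx <- sample(nrow(df), floor(p_train * nrow(df)))
  trn <- df[trn_idx, ]
  tst <- df[-trn_idx, ]
  hld <- tst[sample(nrow(tst), floor(p_hold * nrow(tst))), ]
  new("modelData", trn = trn, tst = tst, hld = hld)
}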
# Split into train / test / holdout sets using the modelDataC() helper
x <- modelDataC(df)
train <- x@trn
test  <- x@tst
hold  <- x@hld
trn <- as.data.frame(dim(train))
names(trn) <- "train_dim"
tst <- as.data.frame(dim(test))
names(tst) <- "test_dim"
hld <- as.data.frame(dim(hold))
names(hld) <- "holdout_dim"
dim(train)
## [1] 33604 785
dim(test)
## [1] 8396 785
dim(hold)
## [1] 1676 785
split <- cbind(trn,tst,hld)
split %>%kable() %>% kable_styling()
| train_dim | test_dim | holdout_dim |
|---|---|---|
| 33604 | 8396 | 1676 |
| 785 | 785 | 785 |
#view the data sets
head(train[,1:5]) %>% kable() %>% kable_styling()
| label | pixel0 | pixel1 | pixel2 | pixel3 |
|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 |
head(test[,1:5]) %>% kable() %>% kable_styling()
| | label | pixel0 | pixel1 | pixel2 | pixel3 |
|---|---|---|---|---|---|
| 10 | 3 | 0 | 0 | 0 | 0 |
| 11 | 8 | 0 | 0 | 0 | 0 |
| 15 | 3 | 0 | 0 | 0 | 0 |
| 18 | 0 | 0 | 0 | 0 | 0 |
| 20 | 5 | 0 | 0 | 0 | 0 |
| 23 | 2 | 0 | 0 | 0 | 0 |
head(hold[,1:5]) %>% kable() %>% kable_styling()
| | label | pixel0 | pixel1 | pixel2 | pixel3 |
|---|---|---|---|---|---|
| 11 | 8 | 0 | 0 | 0 | 0 |
| 24 | 0 | 0 | 0 | 0 | 0 |
| 81 | 5 | 0 | 0 | 0 | 0 |
| 100 | 5 | 0 | 0 | 0 | 0 |
| 104 | 7 | 0 | 0 | 0 | 0 |
| 111 | 0 | 0 | 0 | 0 | 0 |
#devtools::install_github("rstudio/keras")
library(keras)
x_train <- train[,2:785] # omit label
x_test <- test[,2:785] # omit label
x_hold <- hold[,2:785] # omit label
# convert the data frames to matrices (the shape Keras expects for features)
x_train <- as.matrix(x_train)
x_test <- as.matrix(x_test)
x_hold <- as.matrix(x_hold)
dim(x_train)
## [1] 33604 784
head(x_train[,1:5]) %>% kable() %>% kable_styling()
| pixel0 | pixel1 | pixel2 | pixel3 | pixel4 |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 |
dim(x_test)
## [1] 8396 784
head(x_test[,1:5]) %>% kable() %>% kable_styling()
| | pixel0 | pixel1 | pixel2 | pixel3 | pixel4 |
|---|---|---|---|---|---|
| 10 | 0 | 0 | 0 | 0 | 0 |
| 11 | 0 | 0 | 0 | 0 | 0 |
| 15 | 0 | 0 | 0 | 0 | 0 |
| 18 | 0 | 0 | 0 | 0 | 0 |
| 20 | 0 | 0 | 0 | 0 | 0 |
| 23 | 0 | 0 | 0 | 0 | 0 |
dim(x_hold)
## [1] 1676 784
head(x_hold[,1:5]) %>% kable() %>% kable_styling()
| | pixel0 | pixel1 | pixel2 | pixel3 | pixel4 |
|---|---|---|---|---|---|
| 11 | 0 | 0 | 0 | 0 | 0 |
| 24 | 0 | 0 | 0 | 0 | 0 |
| 81 | 0 | 0 | 0 | 0 | 0 |
| 100 | 0 | 0 | 0 | 0 | 0 |
| 104 | 0 | 0 | 0 | 0 | 0 |
| 111 | 0 | 0 | 0 | 0 | 0 |
In the standard MNIST distribution the x data is a 3-d array (images, width, height) of grayscale values. To prepare the data for training, the 3-d arrays are converted into matrices by reshaping width and height into a single dimension (28x28 images are flattened into length-784 vectors); the Kaggle CSV used here is already flat (one 784-pixel row per image), so the reshape below leaves the dimensions unchanged. Then we convert the grayscale values from integers ranging from 0 to 255 into floating-point values ranging between 0 and 1:
x_train <- array_reshape(x_train, c(nrow(x_train), nDim))
x_test <- array_reshape(x_test, c(nrow(x_test), nDim))
x_hold <- array_reshape(x_hold, c(nrow(x_hold), nDim))
#check dimensions
dim(x_train)
## [1] 33604 784
dim(x_test)
## [1] 8396 784
dim(x_hold)
## [1] 1676 784
#rescale to [0, 1] - don't do this if using PCA input, as it's already scaled
x_train <- x_train / 255
x_test <- x_test / 255
x_hold <- x_hold / 255
Note that we use the array_reshape() function rather than the dim<-() function to reshape the array. This is so that the data is re-interpreted using row-major semantics (as opposed to R’s default column-major semantics), which is in turn compatible with the way that the numerical libraries called by Keras interpret array dimensions.
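A tiny illustration of the difference (a sketch; the values are just for demonstration):
m <- matrix(1:6, nrow = 2)   # filled column-major: rows are (1,3,5) and (2,4,6)
array_reshape(m, c(3, 2))    # row-major: rows (1,3), (5,2), (4,6)
dim(m) <- c(3, 2)            # column-major: rows (1,4), (2,5), (3,6)
m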
The y data is an integer vector with values ranging from 0 to 9. To prepare this data for training we one-hot encode the vectors into binary class matrices using the Keras to_categorical() function:
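to_categorical() expects 0-based integer class labels. The code below passes the R factors straight through, which worked in the author's environment; be aware that as.integer() on a factor returns 1-based level codes, so the safe conversion pattern (a sketch, not the author's code) is:
# Factors store 1-based level codes, so convert via character to recover digits
lbl <- factor(c(3, 0, 9), levels = 0:9)
as.integer(lbl)                    # 4 1 10 -> level codes, NOT the digits
y <- as.integer(as.character(lbl)) # 3 0 9  -> the actual digit values
to_categorical(y, 10)              # 3 x 10 one-hot matrix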
#y_train <- as.numeric(train$label)
y_train <- train$label # labels are already R factors
head(y_train)
## [1] 1 0 1 4 0 0
## Levels: 0 1 2 3 4 5 6 7 8 9
#y_test <- as.numeric(test$label)
y_test <- test$label # labels are already R factors
head(y_test)
## [1] 3 8 3 0 5 2
## Levels: 0 1 2 3 4 5 6 7 8 9
y_hold <- hold$label # labels are already R factors
head(y_hold)
## [1] 8 0 5 5 7 0
## Levels: 0 1 2 3 4 5 6 7 8 9
# categorical conversion
y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)
y_hold <- to_categorical(y_hold, 10)
The simplest type of model is the Sequential model, a linear stack of layers. We begin by creating a sequential model and then adding layers using the pipe (%>%) operator:
tf.fit <- keras_model_sequential()
tf.fit %>%
layer_dense(units = 512, activation = 'relu', input_shape = c(nDim)) %>%
#layer_dense(units = 256, activation = 'relu') %>%
layer_dropout(rate = 0.4) %>%
layer_dense(units = 256, activation = 'relu') %>%
layer_dropout(rate = 0.3) %>%
layer_dense(units = 64, activation = 'relu') %>%
layer_dropout(rate = 0.2) %>%
layer_dense(units = 10, activation = 'softmax')
The input_shape argument to the first layer specifies the shape of the input data (a length 784 numeric vector representing a grayscale image). The final layer outputs a length 10 numeric vector (probabilities for each digit) using a softmax activation function.
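As a quick optional sanity check, the softmax rows sum to 1 even before training (the weights are still random at this point):
# Each predicted row is a probability distribution over the 10 digits
probs <- predict(tf.fit, x_train[1:3, , drop = FALSE])
rowSums(probs) # approximately 1 1 1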
Use the summary() function to print the details of tf.fit:
summary(tf.fit)
## Model: "sequential"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## dense (Dense) (None, 512) 401920
## ________________________________________________________________________________
## dropout (Dropout) (None, 512) 0
## ________________________________________________________________________________
## dense_1 (Dense) (None, 256) 131328
## ________________________________________________________________________________
## dropout_1 (Dropout) (None, 256) 0
## ________________________________________________________________________________
## dense_2 (Dense) (None, 64) 16448
## ________________________________________________________________________________
## dropout_2 (Dropout) (None, 64) 0
## ________________________________________________________________________________
## dense_3 (Dense) (None, 10) 650
## ================================================================================
## Total params: 550,346
## Trainable params: 550,346
## Non-trainable params: 0
## ________________________________________________________________________________
Next, compile the tf.fit with appropriate loss function, optimizer, and metrics:
tf.fit %>% keras::compile(
loss = 'categorical_crossentropy',
  # optimizer = optimizer_rmsprop(),
  optimizer = 'rmsprop',
  # Passing the optimizer by name was a workaround from Stack Overflow that
  # made this work on Linux with TF 2.3.1; it does limit the ability to
  # change the learning rate.
metrics = c('accuracy')
)
Use the fit() function to train the tf.fit for 30 epochs using batches of 128 images:
ptm <- proc.time() #timer
tf.train <- tf.fit %>% fit(
x_train, y_train,
#epochs = 10, batch_size = 10,
epochs = 30, batch_size = 128,
validation_split = 0.2
)
proc.time() - ptm
## user system elapsed
## 211.43 61.64 109.69
The history object returned by fit() includes loss and accuracy metrics which we can plot:
plot(tf.train)
## `geom_smooth()` using formula 'y ~ x'
# Results
#alluvial plots
plotCM <- function(cm){
  cmdf <- as.data.frame(cm[["table"]])
  # correct predictions in grey, errors in red
  cmdf[["color"]] <- ifelse(cmdf[[1]] == cmdf[[2]], "grey", "red")
  alluvial::alluvial(cmdf[, 1:2],
                     freq  = cmdf$Freq,
                     col   = cmdf[["color"]],
                     alpha = 0.5,
                     hide  = cmdf$Freq == 0)
}
require(plotly)
# ggplotly heat-map version of the confusion matrix
ggPlotCM <- function(cm){
  cm_d <- as.data.frame(cm$table)
  # confusion matrix statistics as a data.frame
  cm_st <- data.frame(cm$overall)
  # round the values
  cm_st$cm.overall <- round(cm_st$cm.overall, 2)
  # rounded percentage values per cell
  cm_p <- as.data.frame(prop.table(cm$table))
  cm_d$Perc <- round(cm_p$Freq * 100, 2)
  # plotting the matrix
  cm_d_p <- ggplot(data = cm_d, aes(x = Prediction, y = Reference, fill = Freq)) +
    geom_tile() +
    geom_text(aes(label = paste("", Freq, ",", Perc, "%")), color = 'red', size = 3) +
    theme_light() +
    guides(fill = FALSE) # theme_1() (a custom theme not defined here) dropped so this runs standalone
  # plotting the stats
  #cm_st_p <- tableGrob(cm_st)
  cm_d_ply <- ggplotly(cm_d_p)
  return(cm_d_ply)
}
Evaluate the tf.fit’s performance on the test data:
tf.fit %>% keras::evaluate(x_test, y_test)
## loss accuracy
## 0.1750664 0.9767746
#Generate predictions on new data:
# (predict_classes() exists in the keras version used here; newer releases
# removed it in favor of taking the argmax of predict() output)
pred <- tf.fit %>% predict_classes(x_test)
#table(Predicted=pred, Actual =test$label)
cm <- confusionMatrix(factor(pred), factor(test$label))
cm
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 815 0 1 3 2 2 4 3 3 3
## 1 0 925 0 0 0 0 1 2 5 0
## 2 0 3 817 5 1 2 0 4 3 0
## 3 3 0 3 847 0 10 0 0 4 5
## 4 1 5 2 0 804 0 3 1 2 19
## 5 0 0 0 3 1 730 1 0 6 0
## 6 2 0 1 0 1 7 814 0 2 1
## 7 1 1 7 7 0 0 0 865 1 5
## 8 3 1 4 0 0 4 3 1 782 2
## 9 1 1 0 5 5 4 1 4 4 802
##
## Overall Statistics
##
## Accuracy : 0.9768
## 95% CI : (0.9733, 0.9799)
## No Information Rate : 0.1115
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9742
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity 0.98668 0.9882 0.97844 0.9736 0.98771 0.96179
## Specificity 0.99723 0.9989 0.99762 0.9967 0.99565 0.99856
## Pos Pred Value 0.97488 0.9914 0.97844 0.9713 0.96057 0.98516
## Neg Pred Value 0.99854 0.9985 0.99762 0.9969 0.99868 0.99621
## Prevalence 0.09838 0.1115 0.09945 0.1036 0.09695 0.09040
## Detection Rate 0.09707 0.1102 0.09731 0.1009 0.09576 0.08695
## Detection Prevalence 0.09957 0.1111 0.09945 0.1039 0.09969 0.08826
## Balanced Accuracy 0.99195 0.9936 0.98803 0.9851 0.99168 0.98018
## Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity 0.98428 0.9830 0.96305 0.95818
## Specificity 0.99815 0.9971 0.99763 0.99669
## Pos Pred Value 0.98309 0.9752 0.97750 0.96977
## Neg Pred Value 0.99828 0.9980 0.99605 0.99538
## Prevalence 0.09850 0.1048 0.09671 0.09969
## Detection Rate 0.09695 0.1030 0.09314 0.09552
## Detection Prevalence 0.09862 0.1056 0.09528 0.09850
## Balanced Accuracy 0.99122 0.9900 0.98034 0.97744
#plotCM(cm)
#ggPlotCM(cm)
Evaluate the tf.fit's performance on the hold-out data:
tf.fit %>% keras::evaluate(x_hold, y_hold)
## loss accuracy
## 0.1880576 0.9743437
#Generate predictions on new data:
pred2 <- tf.fit %>% predict_classes(x_hold)
#table(Predicted=pred, Actual =test$label)
cm <- confusionMatrix(factor(pred2), factor(hold$label))
cm
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 164 0 0 1 0 0 3 1 1 1
## 1 0 184 0 0 0 0 0 1 0 0
## 2 0 1 161 3 0 0 0 0 0 0
## 3 1 0 1 164 0 2 0 0 1 1
## 4 0 1 1 0 160 0 1 0 0 3
## 5 0 0 0 2 0 145 0 0 0 0
## 6 0 0 1 0 1 2 161 0 0 0
## 7 0 0 1 3 0 0 0 173 0 0
## 8 0 1 2 0 0 1 0 0 159 0
## 9 0 0 0 1 1 1 0 1 1 162
##
## Overall Statistics
##
## Accuracy : 0.9743
## 95% CI : (0.9656, 0.9814)
## No Information Rate : 0.1116
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9715
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity 0.99394 0.9840 0.96407 0.94253 0.98765 0.96026
## Specificity 0.99537 0.9993 0.99735 0.99601 0.99604 0.99869
## Pos Pred Value 0.95906 0.9946 0.97576 0.96471 0.96386 0.98639
## Neg Pred Value 0.99934 0.9980 0.99603 0.99336 0.99868 0.99608
## Prevalence 0.09845 0.1116 0.09964 0.10382 0.09666 0.09010
## Detection Rate 0.09785 0.1098 0.09606 0.09785 0.09547 0.08652
## Detection Prevalence 0.10203 0.1104 0.09845 0.10143 0.09905 0.08771
## Balanced Accuracy 0.99465 0.9916 0.98071 0.96927 0.99185 0.97948
## Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity 0.97576 0.9830 0.98148 0.97006
## Specificity 0.99735 0.9973 0.99736 0.99669
## Pos Pred Value 0.97576 0.9774 0.97546 0.97006
## Neg Pred Value 0.99735 0.9980 0.99802 0.99669
## Prevalence 0.09845 0.1050 0.09666 0.09964
## Detection Rate 0.09606 0.1032 0.09487 0.09666
## Detection Prevalence 0.09845 0.1056 0.09726 0.09964
## Balanced Accuracy 0.98656 0.9901 0.98942 0.98337
#plotCM(cm)
#ggPlotCM(cm)
The previous run attained roughly 97.7% accuracy on the test set. The following code tries to improve on that using a convolutional neural network (CNN). The MNIST data preparation is a little different with this approach.
batch_size <- 128
num_classes <- 10
epochs <- 12
# Input image dimensions
img_rows <- 28
img_cols <- 28
library(keras)
#The data, shuffled and split between train and test sets
mnist <- dataset_mnist()
x_train <- mnist$train$x
y_train <- mnist$train$y
x_test <- mnist$test$x
y_test <- mnist$test$y
x_train <- array_reshape(x_train, c(nrow(x_train), img_rows, img_cols, 1))
x_test <- array_reshape(x_test, c(nrow(x_test), img_rows, img_cols, 1))
input_shape <- c(img_rows, img_cols, 1)
#Grayscale values rescaled from the 0-255 range to 0-1
x_train <- x_train / 255
x_test <- x_test / 255
cat('x_train_shape:', dim(x_train), '\n')
## x_train_shape: 60000 28 28 1
cat(nrow(x_train), 'train samples\n')
## 60000 train samples
cat(nrow(x_test), 'test samples\n')
## 10000 test samples
#Convert class vectors to binary class matrices
y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)
The key differences in the CNN model definition are the structure of the input data (images kept as 28 x 28 x 1 arrays rather than flattened vectors) and the use of two-dimensional convolutional layers, implemented as layer_conv_2d().
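The spatial dimensions shown in the model summary further down follow from "valid" convolution arithmetic, which is worth a quick check:
# Output width of a 'valid' convolution: (input - kernel)/stride + 1
(28 - 3)/1 + 1 # 26 -> first conv layer outputs (None, 26, 26, 32)
(26 - 3)/1 + 1 # 24 -> second conv layer outputs (None, 24, 24, 64)
24/2           # 12 -> 2 x 2 max pooling gives (None, 12, 12, 64)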
model <- keras_model_sequential() %>%
layer_conv_2d(filters = 32, kernel_size = c(3,3), activation = 'relu',
input_shape = input_shape) %>%
layer_conv_2d(filters = 64, kernel_size = c(3,3), activation = 'relu') %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_dropout(rate = 0.25) %>%
layer_flatten() %>%
layer_dense(units = 128, activation = 'relu') %>%
layer_dropout(rate = 0.5) %>%
layer_dense(units = num_classes, activation = 'softmax')
summary(model)
## Model: "sequential_1"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## conv2d (Conv2D) (None, 26, 26, 32) 320
## ________________________________________________________________________________
## conv2d_1 (Conv2D) (None, 24, 24, 64) 18496
## ________________________________________________________________________________
## max_pooling2d (MaxPooling2D) (None, 12, 12, 64) 0
## ________________________________________________________________________________
## dropout_3 (Dropout) (None, 12, 12, 64) 0
## ________________________________________________________________________________
## flatten (Flatten) (None, 9216) 0
## ________________________________________________________________________________
## dense_4 (Dense) (None, 128) 1179776
## ________________________________________________________________________________
## dropout_4 (Dropout) (None, 128) 0
## ________________________________________________________________________________
## dense_5 (Dense) (None, 10) 1290
## ================================================================================
## Total params: 1,199,882
## Trainable params: 1,199,882
## Non-trainable params: 0
## ________________________________________________________________________________
model %>% keras::compile(
loss = loss_categorical_crossentropy,
optimizer = optimizer_adadelta(),
metrics = c('accuracy')
)
ptm <- proc.time() #timer
model %>% fit(
x_train, y_train,
batch_size = batch_size,
epochs = epochs,
validation_split = 0.2
)
proc.time() - ptm
## user system elapsed
## 3031.71 612.77 1412.36
scores <- model %>% keras::evaluate(
x_test, y_test, verbose = 0
)
# Output metrics
cat('Test loss:', scores[[1]], '\n')
## Test loss: 0.02950084
cat('Test accuracy:', scores[[2]], '\n')
## Test accuracy: 0.9909
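Comparing the two elapsed wall-clock times recorded above puts the accuracy gain in perspective:
# Elapsed seconds from the two fit() timings above
1412.36 / 109.69 # ~12.9: the CNN took roughly 13x longer to train on this CPU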
We set out to demonstrate that deep learning methods can be used to train a machine to recognize hand-written digits. After this demonstration, we can safely conclude that computer vision is very effective here, with accuracy of roughly 97.7% and above. There was a further increase in accuracy in the second case, using Keras with the CNN approach, which yielded about 99.1% on its test set. While the CNN approach did yield better accuracy, its training time was roughly 13x longer, indicating trade-offs to consider. If this were a larger-scale training initiative where one iteration took an hour with the first method, the second method would burn more than the entire workday. Is a gain of about 1.4 percentage points worth it? That depends on the use case, but either way it is a decision you will have to make.
https://keras.rstudio.com/
https://blog.rstudio.com/2017/09/05/keras-for-r/
https://tensorflow.rstudio.com/guide/keras/guide_keras/
https://keras.rstudio.com/articles/examples/mnist_cnn.html
https://www.analyticsvidhya.com/blog/2020/02/learn-image-classification-cnn-convolutional-neural-networks-3-datasets/