American Sign Language Image Classification
library(tidyverse)
library(tensorflow)
library(openssl)
library(reticulate)
library(keras)
library(nnet)
library(plotly)
library(rpart.plot)
library(caret)
Objective
The goal of this project is to classify images using two different machine learning algorithms learned throughout the semester, specifically one methodology studied in weeks 1-10, and one methodology from weeks 11-15. I chose to model the data using XG boost, learned in week 7, and convolutional neural network, learned in week 14. Each algorithms was analyzed in terms of computational speed and accuracy to determine which algorithm is better for classifying images of American Sign Language letters.
Data Source
The data source used for this assignment is the ‘Sign Language MNIST’ dataset on Kaggle.com (https://www.kaggle.com/datasets/datamunge/sign-language-mnist?select=sign_mnist_train). The images within this dataset represent 24 of the 26 American Sign Language letters. Both the letters ‘J’ and ‘Z’ were excluded from this analysis as they include motion to sign the letter which is not possible to show in a still image. This data source contains separate .csv files for training and testing the model. The columns of the files relate to the grayscale pixel value of a 28 x 28 pixel image as well as a column for the label.
Read in Data
The training and testing datasets were stored in memory. I releveled the labels of the target variable as the numerical location of the letter was being used to represent the label instead of using the letter.
<- read.csv("sign_mnist_train.csv")
train <- read.csv("sign_mnist_test.csv") test
$label <- as.factor(train$label)
train
levels(train$label) <- list(A="0", B="1", C="2",
D="3", E="4", F="5",
G="6", H="7", I="8",
K="10", L="11", M="12",
N="13", O="14", P="15",
Q="16", R="17", S="18",
T="19", U="20", V="21",
W="22", X="23", Y="24")
$label <- as.factor(test$label)
test
levels(test$label) <- list(A="0", B="1", C="2",
D="3", E="4", F="5",
G="6", H="7", I="8",
K="10", L="11", M="12",
N="13", O="14", P="15",
Q="16", R="17", S="18",
T="19", U="20", V="21",
W="22", X="23", Y="24")
Glimpse at Data
Looking at the summary statistics of the first and last 10 columns of data:
- The pixel data stored in the columns ranges from 0 to 255
- There are 784 different columns related to pixel location, meaning that the data can be rearanged into a 28 x 28 matrix
summary(train[,1:10])
## label pixel1 pixel2 pixel3
## R : 1294 Min. : 0.0 Min. : 0.0 Min. : 0.0
## Q : 1279 1st Qu.:121.0 1st Qu.:126.0 1st Qu.:130.0
## L : 1241 Median :150.0 Median :153.0 Median :156.0
## W : 1225 Mean :145.4 Mean :148.5 Mean :151.2
## F : 1204 3rd Qu.:174.0 3rd Qu.:176.0 3rd Qu.:178.0
## S : 1199 Max. :255.0 Max. :255.0 Max. :255.0
## (Other):20013
## pixel4 pixel5 pixel6 pixel7
## Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0
## 1st Qu.:133.0 1st Qu.:137.0 1st Qu.:140.0 1st Qu.:142.0
## Median :158.0 Median :160.0 Median :162.0 Median :164.0
## Mean :153.5 Mean :156.2 Mean :158.4 Mean :160.5
## 3rd Qu.:179.0 3rd Qu.:181.0 3rd Qu.:182.0 3rd Qu.:183.0
## Max. :255.0 Max. :255.0 Max. :255.0 Max. :255.0
##
## pixel8 pixel9
## Min. : 0.0 Min. : 0
## 1st Qu.:144.0 1st Qu.:146
## Median :165.0 Median :166
## Mean :162.3 Mean :164
## 3rd Qu.:184.0 3rd Qu.:185
## Max. :255.0 Max. :255
##
summary(train[,776:785])
## pixel775 pixel776 pixel777 pixel778 pixel779
## Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0
## 1st Qu.: 92.0 1st Qu.: 96.0 1st Qu.:103.0 1st Qu.:112.0 1st Qu.:120
## Median :144.0 Median :162.0 Median :172.0 Median :180.0 Median :183
## Mean :141.1 Mean :147.5 Mean :153.3 Mean :159.1 Mean :162
## 3rd Qu.:196.0 3rd Qu.:202.0 3rd Qu.:205.0 3rd Qu.:207.0 3rd Qu.:208
## Max. :255.0 Max. :255.0 Max. :255.0 Max. :255.0 Max. :255
## pixel780 pixel781 pixel782 pixel783 pixel784
## Min. : 0.0 Min. : 0.0 Min. : 0 Min. : 0.0 Min. : 0.0
## 1st Qu.:125.0 1st Qu.:128.0 1st Qu.:128 1st Qu.:128.0 1st Qu.:125.5
## Median :184.0 Median :184.0 Median :182 Median :182.0 Median :182.0
## Mean :162.7 Mean :162.9 Mean :162 Mean :161.1 Mean :159.8
## 3rd Qu.:207.0 3rd Qu.:207.0 3rd Qu.:206 3rd Qu.:204.0 3rd Qu.:204.0
## Max. :255.0 Max. :255.0 Max. :255 Max. :255.0 Max. :255.0
paste0("There are ", nrow(train), " rows in the training data.")
## [1] "There are 27455 rows in the training data."
paste0("There are ", ncol(train), " columns in the training data.")
## [1] "There are 785 columns in the training data."
Normalize Data
It is important to normalize pixel intensity to make sure all input parameters have a similar data distribution. The normalization will also reduce the computational load for future calculations.
for(i in 2:ncol(train)){
<- train[,i] / 255
min.max.scale
<- min.max.scale
train[,i]
}
for(i in 2:ncol(test)){
<- test[,i] / 255
min.max.scale2
<- min.max.scale2
test[,i] }
Explore Data
Frequency of Labels
Having a near equal distribution of labeled data is imperative when creating a machine learning model. A balanced dataset will give the algorithm a “fighting chance” to classify all classes successfully.
Training Data
The training data appears have a fairly even distribution across most classes. R is the most common class and E is the least common class with a 1.2% difference in the volume of records.
<- c("A","B","C","D","E","F",
letter "G","H","I","K","L","M",
"N","O","P","Q","R","S",
"T","U","V","W","X","Y")
<- as.vector(prop.table(table(train$label)))
proportions
<- plot_ly(
fig x = letter,
y = proportions,
name = "Distribution",
type = "bar",
marker = list(color = "#ED7171",
line = list(color = '#910E0E',
width = 1.5)))
%>% layout(title = "Frequency of Letter in Training Data",
fig xaxis = list(categoryorder = "total descending"))
Testing Data
All classes in the testing set contain over 2% of the records in the testing set (approx. 143 records). This should serve as a good dataset for prediction as all classes are represented and have many examples.
<- c("A","B","C","D","E","F",
letter "G","H","I","K","L","M",
"N","O","P","Q","R","S",
"T","U","V","W","X","Y")
<- as.vector(prop.table(table(test$label)))
proportions
<- plot_ly(
fig x = letter,
y = proportions,
name = "Distribution",
type = "bar",
marker = list(color = "#545BC1",
line = list(color = '#050B5C',
width = 1.5)))
%>% layout(title = "Frequency of Letter in Testing Data",
fig xaxis = list(categoryorder = "total descending"))
Explore Images
The data needs to be parsed and stored into a 28 x 28 matrix in order to visualize the image. The function that I created below visualizes the images. The function was used to showcase all of the classes in the dataset. There are similarities between a handful of the American Sign Language letters such as ‘M’ and ‘N’ both are expressed through a closed fist or ‘I’ and ‘Y’ both use an extended pinky finger. This may prove difficult for modeling.
# function to plot images and provide the number the image is representing
<- function(dataset, row.index){
plot.image
# Obtain and store all pixels in a single row
<- as.numeric(dataset[row.index, 2:785])
x
# Create an empty matrix to store image pixels
<- matrix(nrow = 28, ncol = 28)
im
# Store data in matrix
<- 1
j for(i in 28:1){
<- x[j:(j+27)]
im[,i]
<- j+28
j
}
# Plot the image
image(x = 1:28,
y = 1:28,
z = im,
col=gray((0:255)/255),
main = paste0("Image: ", row.index, " Label: ", dataset$label[row.index]))
}
par(mfrow = c(2, 2))
plot.image(train, 47)
plot.image(train, 30)
plot.image(train, 3)
plot.image(train, 1)
par(mfrow = c(2, 2))
plot.image(train, 45)
plot.image(train, 49)
plot.image(train, 2)
plot.image(train, 50)
par(mfrow = c(2, 2))
plot.image(train, 7)
plot.image(train, 12)
plot.image(train, 41)
plot.image(train, 32)
par(mfrow = c(2, 2))
plot.image(train, 5)
plot.image(train, 62)
plot.image(train, 42)
plot.image(train, 6)
par(mfrow = c(2, 2))
plot.image(train, 17)
plot.image(train, 11)
plot.image(train, 20)
plot.image(train, 15)
par(mfrow = c(2, 2))
plot.image(train, 22)
plot.image(train, 8)
plot.image(train, 24)
plot.image(train, 27)
Machine Learning
XG Boost
Feature Engineering
Create New Features
I created two new features to be used in the XG boost algorithm:
rowSum
and rowVariance
. The
rowSum
and rowVariance
variables store the sum
and variance of the grayscale pixel values, respectfully.
<- train
train.xgb <- test
test.xgb
$rowSum <- rowSums(train.xgb[, -1])
train.xgb$rowVariance <- rowSums((train.xgb[, -1] - rowMeans(train.xgb[, -1]))^2)/(dim(train.xgb)[2] - 1)
train.xgb
$rowSum <- rowSums(test[, -1])
test.xgb$rowVariance <- rowSums((test[, -1] - rowMeans(test[, -1]))^2)/(dim(test)[2] - 1) test.xgb
Dimensionality Reduction via PCA
PCA was used to reduce the dimensionality of the data. The variance captured through the Principal Components seems to converge at 0.962 within the first 119 Principal Components. The XG boost model will consist of the first 119 Principal Components. PCA effectively reduced the feature size from 786 to 119.
<- prcomp(train.xgb[,-1],
pca.train center = TRUE,
scale. = TRUE)
<- predict(pca.train, newdata = test.xgb) pca.test
<- as.data.frame(pca.train$x)
train1 <- as.data.frame(pca.test)
test1
$label <- train$label
train1$label <- test$label test1
# Calculate Variance
<- round(pca.train $sdev^2/ sum(pca.train $sdev^2),3)
a
<- cumsum(a)
variance
1:150] variance[
## [1] 0.359 0.442 0.512 0.561 0.598 0.624 0.649 0.669 0.687 0.703 0.717 0.730
## [13] 0.741 0.751 0.760 0.768 0.775 0.782 0.789 0.795 0.801 0.806 0.811 0.816
## [25] 0.820 0.824 0.828 0.832 0.835 0.838 0.841 0.844 0.847 0.850 0.853 0.856
## [37] 0.859 0.862 0.865 0.867 0.869 0.871 0.873 0.875 0.877 0.879 0.881 0.883
## [49] 0.885 0.887 0.889 0.891 0.893 0.895 0.897 0.899 0.900 0.901 0.902 0.903
## [61] 0.904 0.905 0.906 0.907 0.908 0.909 0.910 0.911 0.912 0.913 0.914 0.915
## [73] 0.916 0.917 0.918 0.919 0.920 0.921 0.922 0.923 0.924 0.925 0.926 0.927
## [85] 0.928 0.929 0.930 0.931 0.932 0.933 0.934 0.935 0.936 0.937 0.938 0.939
## [97] 0.940 0.941 0.942 0.943 0.944 0.945 0.946 0.947 0.948 0.949 0.950 0.951
## [109] 0.952 0.953 0.954 0.955 0.956 0.957 0.958 0.959 0.960 0.961 0.962 0.962
## [121] 0.962 0.962 0.962 0.962 0.962 0.962 0.962 0.962 0.962 0.962 0.962 0.962
## [133] 0.962 0.962 0.962 0.962 0.962 0.962 0.962 0.962 0.962 0.962 0.962 0.962
## [145] 0.962 0.962 0.962 0.962 0.962 0.962
Model XG Boost Algorithm
I elected to use the XG boost algorithm with 119 features (PC1 to PC119) to classify the images. The model hyperparameters are:
nrounds
= 100 (Number of Trees)max_depth
= 3 (Maximum tree depth)eta
= 0.3 (Learning Rate)gamma
= 0 (Regularization Tuning)colsample_bytree
= 0.8 (Column sampling)min_child_weight
= 1 (Minimum leaf weight)subsample
= 0.5 (Row sampling)
set.seed(12345)
<- Sys.time()
start_time
<- expand.grid(
hyperparameters nrounds = 100,
max_depth = 3,
eta = 0.3,
gamma = 0,
colsample_bytree = 0.8,
min_child_weight = 1,
subsample = 0.5
)
<- trainControl(method = "cv",
trctrl number = 4)
<- train(label ~ PC1 + PC2 + PC3 + PC4 + PC5 + PC6 +PC7 + PC8 + PC9 + PC10 + PC11 + PC12 + PC13 + PC14 + PC15 + PC16 + PC17 + PC18 + PC19 + PC20 + PC21 + PC22 + PC23 + PC24 + PC25 + PC26 + PC27 + PC28 + PC29 + PC30 + PC31 + PC32 + PC33 + PC34 + PC35 + PC36 + PC37 + PC38 + PC39 + PC40 + PC41 + PC42 + PC43 + PC44 + PC45 + PC46 + PC47 + PC48 + PC49 + PC50 + PC51 + PC52 + PC53 + PC54 + PC55 + PC56 + PC57 + PC58 + PC59 + PC60 + PC61 + PC62 + PC63 + PC64 + PC65 + PC66 + PC67 + PC68 + PC69 + PC70 + PC71 + PC72 + PC73 + PC74 + PC75 + PC76 + PC77 + PC78 + PC79 + PC80 + PC81 + PC82 + PC83 + PC84 + PC85 + PC86 + PC87 + PC88 + PC89 + PC90 + PC91 + PC92 + PC93 + PC94 + PC95 + PC96 + PC97 + PC98 + PC99 + PC100 + PC101 + PC102 + PC103 + PC104 + PC105 + PC106 + PC107 + PC108 + PC109 + PC110 + PC111 + PC112 + PC113 + PC114 + PC115 + PC116 + PC117 + PC118 + PC119,
xgboost method = "xgbTree",
trControl = trctrl,
tuneGrid = hyperparameters,
data = train1)
<- Sys.time() end_time
- start_time end_time
## Time difference of 15.29524 mins
# set.seed(12345)
#
# start_time <- Sys.time()
#
# trctrl <- trainControl(method = "cv",
# number = 4)
#
# forest <- train(label ~ PC1 + PC2 + PC3 + PC4 + PC5 + PC6 +PC7 + PC8 + PC9 + PC10 + PC11 + PC12 + PC13 + PC14 + PC15 + PC16 + PC17 + PC18 + PC19 + PC20 + PC21 + PC22 + PC23 + PC24 + PC25 + PC26 + PC27 + PC28 + PC29 + PC30,
# method = "xgbTree",
# trControl = trctrl,
# data = train1)
#
# end_time <- Sys.time()
# end_time - start_time
Model Evaluation
The model performed well with an overall accuracy of 74.48%. This model is much more effective than guessing by random chance, as the random chance percentage is 1/24 or 4.167%. This model has the capacity to classify all classes with some level of accuracy, though some classes achieved better results than others. For example, C received a near perfect prediction rate on the testing data where as N was less successful in terms of prediction.
<- predict(xgboost, newdata = test1)
pred
<- confusionMatrix(data = pred, reference = test1$label)
cm1
cm1
## Confusion Matrix and Statistics
##
## Reference
## Prediction A B C D E F G H I K L M N O P Q R
## A 319 0 0 0 0 0 0 0 8 0 0 11 21 0 0 0 0
## B 0 361 0 0 0 0 0 0 19 0 0 0 0 0 0 0 0
## C 0 0 307 0 0 0 0 0 0 0 0 0 0 5 0 0 0
## D 0 18 0 225 0 0 1 0 6 0 0 8 5 0 0 0 0
## E 0 2 0 0 419 0 0 1 0 0 0 29 37 0 0 1 0
## F 0 0 0 1 0 217 0 0 0 22 0 0 0 12 0 0 0
## G 7 0 0 0 0 0 250 40 0 0 0 0 0 5 0 0 0
## H 0 0 0 0 0 0 27 383 16 0 0 0 0 0 0 0 0
## I 0 0 0 0 0 0 2 0 149 0 0 0 0 1 0 0 0
## K 0 38 0 0 0 9 0 0 3 204 0 0 0 0 0 0 2
## L 0 0 0 0 0 21 0 0 0 2 209 0 0 0 0 0 13
## M 0 0 0 0 21 0 8 0 0 1 0 224 15 2 0 0 0
## N 0 0 0 0 0 0 0 0 0 4 0 25 133 0 0 0 1
## O 0 0 2 0 0 0 3 1 0 0 0 11 25 184 0 0 0
## P 0 0 0 0 0 0 12 0 0 0 0 0 0 0 336 0 0
## Q 5 0 0 0 0 0 0 0 0 0 0 9 31 0 7 162 0
## R 0 8 0 1 0 0 0 0 1 36 0 0 0 0 0 0 92
## S 0 0 0 0 58 0 0 0 1 15 0 58 16 1 0 1 7
## T 0 0 0 0 0 0 23 5 0 0 0 0 8 24 0 0 8
## U 0 0 0 16 0 0 0 0 13 3 0 0 0 12 0 0 19
## V 0 0 0 2 0 0 19 0 3 8 0 4 0 0 0 0 0
## W 0 5 0 0 0 0 0 0 0 11 0 0 0 0 0 0 0
## X 0 0 0 0 0 0 3 6 0 0 0 0 0 0 4 0 2
## Y 0 0 1 0 0 0 0 0 69 25 0 15 0 0 0 0 0
## Reference
## Prediction S T U V W X Y
## A 11 0 0 0 0 0 0
## B 0 0 39 0 0 0 1
## C 0 0 0 0 0 0 0
## D 0 0 2 1 0 0 0
## E 2 0 0 0 0 0 0
## F 0 0 0 20 0 0 1
## G 0 0 0 0 0 0 0
## H 0 3 0 0 0 0 0
## I 21 12 0 0 0 0 2
## K 0 0 32 40 20 0 8
## L 0 15 15 0 0 0 0
## M 3 0 0 0 0 0 0
## N 0 0 0 0 0 0 0
## O 0 3 0 0 0 0 0
## P 0 3 0 0 0 0 0
## Q 0 0 0 0 0 0 0
## R 0 4 21 45 47 4 54
## S 209 0 0 0 0 24 15
## T 0 161 1 4 0 21 13
## U 0 9 154 62 20 0 2
## V 0 1 0 126 7 2 1
## W 0 0 0 48 112 16 29
## X 0 37 0 0 0 200 0
## Y 0 0 2 0 0 0 206
##
## Overall Statistics
##
## Accuracy : 0.7448
## 95% CI : (0.7346, 0.7549)
## No Information Rate : 0.0694
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.7332
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: A Class: B Class: C Class: D Class: E Class: F
## Sensitivity 0.96375 0.83565 0.99032 0.91837 0.84137 0.87854
## Specificity 0.99254 0.99125 0.99927 0.99408 0.98921 0.99191
## Pos Pred Value 0.86216 0.85952 0.98397 0.84586 0.85336 0.79487
## Neg Pred Value 0.99824 0.98948 0.99956 0.99710 0.98818 0.99565
## Prevalence 0.04615 0.06023 0.04322 0.03416 0.06944 0.03444
## Detection Rate 0.04448 0.05033 0.04281 0.03137 0.05842 0.03026
## Detection Prevalence 0.05159 0.05856 0.04350 0.03709 0.06846 0.03806
## Balanced Accuracy 0.97815 0.91345 0.99480 0.95622 0.91529 0.93523
## Class: G Class: H Class: I Class: K Class: L Class: M
## Sensitivity 0.71839 0.87844 0.51736 0.61631 1.00000 0.56853
## Specificity 0.99238 0.99317 0.99448 0.97778 0.99052 0.99262
## Pos Pred Value 0.82781 0.89277 0.79679 0.57303 0.76000 0.81752
## Neg Pred Value 0.98574 0.99214 0.98010 0.98137 1.00000 0.97536
## Prevalence 0.04852 0.06079 0.04016 0.04615 0.02914 0.05494
## Detection Rate 0.03486 0.05340 0.02078 0.02844 0.02914 0.03123
## Detection Prevalence 0.04211 0.05982 0.02607 0.04964 0.03834 0.03820
## Balanced Accuracy 0.85539 0.93581 0.75592 0.79705 0.99526 0.78058
## Class: N Class: O Class: P Class: Q Class: R Class: S
## Sensitivity 0.45704 0.74797 0.96830 0.98780 0.63889 0.84959
## Specificity 0.99564 0.99350 0.99780 0.99258 0.96855 0.97170
## Pos Pred Value 0.81595 0.80349 0.95726 0.75701 0.29393 0.51605
## Neg Pred Value 0.97746 0.99107 0.99839 0.99971 0.99242 0.99453
## Prevalence 0.04057 0.03430 0.04838 0.02287 0.02008 0.03430
## Detection Rate 0.01854 0.02566 0.04685 0.02259 0.01283 0.02914
## Detection Prevalence 0.02273 0.03193 0.04894 0.02984 0.04364 0.05647
## Balanced Accuracy 0.72634 0.87074 0.98305 0.99019 0.80372 0.91065
## Class: T Class: U Class: V Class: W Class: X Class: Y
## Sensitivity 0.64919 0.57895 0.36416 0.54369 0.74906 0.62048
## Specificity 0.98455 0.97741 0.99311 0.98435 0.99247 0.98363
## Pos Pred Value 0.60075 0.49677 0.72832 0.50679 0.79365 0.64780
## Neg Pred Value 0.98740 0.98368 0.96857 0.98648 0.99032 0.98162
## Prevalence 0.03458 0.03709 0.04824 0.02872 0.03723 0.04629
## Detection Rate 0.02245 0.02147 0.01757 0.01562 0.02789 0.02872
## Detection Prevalence 0.03737 0.04322 0.02412 0.03081 0.03514 0.04434
## Balanced Accuracy 0.81687 0.77818 0.67864 0.76402 0.87077 0.80205
CNN
Set up Tensorflow and Keras Framework in R
The CNN function that I will use comes from the Keras package that is native to Python. To access the functions needed to run the CNN in R, a python virtual environment needs to be estabished and the tensorflow and keras packages need to be installed in both R and the newly created virtual environment.
# path_to_python <- install_python()
# virtualenv_create("r-reticulate", python = path_to_python)
#virtualenv_install(envname = "r-reticulate", "html5lib")
#install_tensorflow(envname = "r-reticulate", pip_options = "--no-cache-dir")
#install_keras(envname = "r-reticulate", pip_options = "--no-cache-dir")
Prepare Data
To prepare the data for the CNN, I separated the data into two separate lists. In the first list, the training labels were converted to the python categorical data type. In the second list, pixel data was reshaped into a 28 x 28 matrix for each record. Testing data recieved the same procedure.
use_virtualenv("r-reticulate")
<- train[,-1]
train.cnn <- train[,1]
train.labels
<- 25
num.classes <- to_categorical(as.integer(train.labels), num.classes) train.labels
## Loaded Tensorflow version 2.9.3
<- test[,-1]
test.cnn <- test[,1] test.labels
<- Sys.time()
start_time
<- list()
train.cnn2
# Create an empty matrix to store image pixels
<- matrix(nrow = 28, ncol = 28)
im # Store data in matrix
for(k in 1:nrow(train.cnn)){
<- as.numeric(train.cnn[k, 1:784])
x <- 1
j for(i in 28:1){
<- x[j:(j+27)]
im[,i]
<- j+28
j
} <- im
train.cnn2[[k]]
}
<- Sys.time()
end_time - start_time end_time
## Time difference of 2.393476 mins
<- Sys.time()
start_time
<- list()
test.cnn2
# Create an empty matrix to store image pixels
<- matrix(nrow = 28, ncol = 28)
im # Store data in matrix
for(k in 1:nrow(test.cnn)){
<- as.numeric(test.cnn[k, 1:784])
x <- 1
j for(i in 28:1){
<- x[j:(j+27)]
im[,i]
<- j+28
j
} <- im
test.cnn2[[k]]
}
<- Sys.time()
end_time - start_time end_time
## Time difference of 37.57966 secs
Reshape Data
The CNN requires the structure of the data to be 4 dimensional (row x height, width, color channel). As the image is grayscale, that means there is only 1 color channel. The new data shapes are:
- Training - 27455 x 28 x 28 x 1
- Testing - 7172 x 28 x 28 x 1
use_virtualenv("r-reticulate")
<- array_reshape(train.cnn2,
train.cnn3 dim = c(nrow(train.cnn), 28, 28, 1)
)
<- array_reshape(test.cnn2,
test.cnn3 dim = c(nrow(test.cnn), 28, 28, 1)
)
Create CNN Structure
The CNN framework I used for modeling consists of the following structure:
- Convolutional Layer
- 32 filters
- kernel size of 4 x 4
- padding = same
- activation function of relu
- Max Pooling Layer
- pool size of 3 x 3
- Dropout Later
- Rate of 0.1
- Convolutional Layer
- 32 filters
- kernel size of 4 x 4
- padding = same
- activation function of relu
- Max Pooling Layer
- pool size of 3 x 3
- Dropout Later
- Rate of 0.1
- Convolutional Layer
- 32 filters
- kernel size of 4 x 4
- padding = same
- activation function of relu
- Max Pooling Layer
- pool size of 3 x 3
- Flatten Layer
- Dense Layer
- 16 neurons
- activation function of relu
- Dense Layer (Output Layer)
- 25 neurons
- softmax activation
::tf$random$set_seed(12345)
tensorflow
<- keras_model_sequential(name = "CNN_Model") %>%
model
layer_conv_2d(filters = 32,
kernel_size = c(4,4),
padding = "same", activation = "relu",
input_shape = c(28, 28, 1)
%>%
)
layer_max_pooling_2d(pool_size = c(3,3)) %>%
layer_dropout(rate = 0.1) %>%
layer_conv_2d(filters = 32,
kernel_size = c(4,4),
padding = "same", activation = "relu",
input_shape = c(28, 28, 1)
%>%
)
layer_max_pooling_2d(pool_size = c(3,3)) %>%
layer_dropout(rate = 0.1) %>%
layer_conv_2d(filters = 32,
kernel_size = c(4,4),
padding = "same", activation = "relu",
input_shape = c(28, 28, 1)
%>%
)
layer_max_pooling_2d(pool_size = c(3,3)) %>%
layer_flatten() %>%
layer_dense(units = 16,
activation = "relu") %>%
layer_dense(units = 25,
activation = "softmax",
name = "Output"
)
model
## Model: "CNN_Model"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## conv2d_2 (Conv2D) (None, 28, 28, 32) 544
## max_pooling2d_2 (MaxPooling2D) (None, 9, 9, 32) 0
## dropout_1 (Dropout) (None, 9, 9, 32) 0
## conv2d_1 (Conv2D) (None, 9, 9, 32) 16416
## max_pooling2d_1 (MaxPooling2D) (None, 3, 3, 32) 0
## dropout (Dropout) (None, 3, 3, 32) 0
## conv2d (Conv2D) (None, 3, 3, 32) 16416
## max_pooling2d (MaxPooling2D) (None, 1, 1, 32) 0
## flatten (Flatten) (None, 32) 0
## dense (Dense) (None, 16) 528
## Output (Dense) (None, 25) 425
## ================================================================================
## Total params: 34,329
## Trainable params: 34,329
## Non-trainable params: 0
## ________________________________________________________________________________
Train Model
The model was trained to optimize accuracy and minimize categorical crossentropy. The following hyperparameters were used to train the model:
- epochs = 10
- batch_size = 32
- vaidation_splot = 0.1
<- Sys.time()
start_time
%>%
model compile(loss = "categorical_crossentropy",
optimizer = optimizer_adam(learning_rate = 0.001),
metrics = "accuracy"
)
<- model %>%
train_history fit(x = train.cnn3,
y = train.labels,
epochs = 10,
batch_size = 32,
validation_split = 0.1,
verbose = 2
)
<- Sys.time()
end_time - start_time end_time
## Time difference of 2.400791 mins
plot(train_history)
CNN Testing Data Predictions
The CNN model produced a testing set accuracy of 85.63%. This accuracy is deceiving as the model correctly classified almost all classes perfectly apart from 3 letters: M, N, and O. The algorithm classified 0 values as M, N labels as O, and O labels as M and N.
<- model %>% predict(test.cnn3) %>% k_argmax()
pred.cnn
<- as.factor(as.numeric(pred.cnn))
pred.factor
levels(pred.factor) <- list(A="1", B="2", C="3",
D="4", E="5", F="6",
G="7", H="8", I="9",
K="10", L="11", M="12",
N="12", O="13", P="15",
Q="16", R="17", S="18",
T="19", U="20", V="21",
W="22", X="23", Y="24")
<- confusionMatrix(data = pred.factor, reference = test$label)
cm2
cm2
## Confusion Matrix and Statistics
##
## Reference
## Prediction A B C D E F G H I K L M N O P Q R
## A 331 0 0 0 0 0 0 0 0 0 0 0 1 20 0 0 0
## B 0 412 0 0 0 0 0 0 0 0 0 0 37 0 0 0 0
## C 0 0 310 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## D 0 0 0 220 0 0 13 0 0 0 0 0 0 0 0 0 0
## E 0 0 0 18 494 0 0 0 0 0 0 0 0 0 0 0 0
## F 0 0 0 0 0 247 0 0 0 0 0 0 0 0 0 0 0
## G 0 0 0 0 0 0 267 23 0 0 0 0 0 0 0 0 0
## H 0 0 0 0 0 0 21 413 0 0 0 0 0 0 0 0 0
## I 0 0 0 0 0 0 0 0 272 0 0 6 0 0 0 0 0
## K 0 20 0 0 0 0 0 0 0 329 0 0 0 0 0 0 19
## L 0 0 0 0 0 0 0 0 0 0 209 0 0 0 0 0 1
## M 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## N 0 0 0 0 0 0 0 0 0 0 0 326 12 0 0 0 0
## O 0 0 0 0 0 0 0 0 0 0 0 27 231 0 0 0 0
## P 0 0 0 0 0 0 0 0 0 0 0 0 0 0 347 0 0
## Q 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 164 0
## R 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 124
## S 0 0 0 0 4 0 20 0 0 1 0 35 10 0 0 0 0
## T 0 0 0 6 0 0 26 0 0 0 0 0 0 0 0 0 0
## U 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## V 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## W 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## X 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
## Y 0 0 0 0 0 0 0 0 16 0 0 0 0 0 0 0 0
## Reference
## Prediction S T U V W X Y
## A 0 0 0 0 0 0 0
## B 0 0 8 0 0 0 0
## C 0 0 0 0 0 0 0
## D 0 0 0 0 0 0 0
## E 0 0 0 0 0 0 0
## F 0 0 0 0 0 0 0
## G 0 0 0 0 0 0 0
## H 0 0 0 0 0 0 0
## I 11 0 0 0 0 0 33
## K 0 0 20 0 0 0 8
## L 0 0 0 0 0 0 0
## M 0 0 0 0 0 0 0
## N 1 0 0 0 0 0 0
## O 0 0 0 0 0 0 0
## P 0 0 0 0 0 0 0
## Q 0 0 0 0 0 0 0
## R 0 0 4 0 0 0 0
## S 234 0 0 0 0 0 0
## T 0 227 0 0 0 7 0
## U 0 0 234 0 0 0 0
## V 0 0 0 346 2 0 0
## W 0 0 0 0 204 0 0
## X 0 21 0 0 0 260 0
## Y 0 0 0 0 0 0 291
##
## Overall Statistics
##
## Accuracy : 0.8563
## 95% CI : (0.8478, 0.8645)
## No Information Rate : 0.0717
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.8496
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: A Class: B Class: C Class: D Class: E Class: F
## Sensitivity 1.00000 0.95370 1.00000 0.90164 0.99197 1.00000
## Specificity 0.99682 0.99309 1.00000 0.99806 0.99721 1.00000
## Pos Pred Value 0.94034 0.90153 1.00000 0.94421 0.96484 1.00000
## Neg Pred Value 1.00000 0.99692 1.00000 0.99642 0.99938 1.00000
## Prevalence 0.04766 0.06220 0.04464 0.03513 0.07171 0.03557
## Detection Rate 0.04766 0.05932 0.04464 0.03168 0.07113 0.03557
## Detection Prevalence 0.05068 0.06580 0.04464 0.03355 0.07372 0.03557
## Balanced Accuracy 0.99841 0.97340 1.00000 0.94985 0.99459 1.00000
## Class: G Class: H Class: I Class: K Class: L Class: M
## Sensitivity 0.76724 0.94725 0.94444 0.99396 1.00000 0.00000
## Specificity 0.99651 0.99677 0.99249 0.98987 0.99985 1.00000
## Pos Pred Value 0.92069 0.95161 0.84472 0.83081 0.99524 NaN
## Neg Pred Value 0.98783 0.99647 0.99758 0.99969 1.00000 0.94327
## Prevalence 0.05011 0.06278 0.04147 0.04766 0.03009 0.05673
## Detection Rate 0.03844 0.05947 0.03916 0.04737 0.03009 0.00000
## Detection Prevalence 0.04176 0.06249 0.04636 0.05702 0.03024 0.00000
## Balanced Accuracy 0.88188 0.97201 0.96847 0.99191 0.99993 0.50000
## Class: N Class: O Class: P Class: Q Class: R Class: S
## Sensitivity 0.041237 0.00000 1.00000 1.00000 0.86111 0.95122
## Specificity 0.950857 0.96274 1.00000 1.00000 0.99926 0.98955
## Pos Pred Value 0.035398 0.00000 1.00000 1.00000 0.96124 0.76974
## Neg Pred Value 0.957766 0.99701 1.00000 1.00000 0.99707 0.99819
## Prevalence 0.041901 0.00288 0.04996 0.02361 0.02073 0.03542
## Detection Rate 0.001728 0.00000 0.04996 0.02361 0.01785 0.03369
## Detection Prevalence 0.048812 0.03715 0.04996 0.02361 0.01857 0.04377
## Balanced Accuracy 0.496047 0.48137 1.00000 1.00000 0.93019 0.97039
## Class: T Class: U Class: V Class: W Class: X Class: Y
## Sensitivity 0.91532 0.87970 1.00000 0.99029 0.97378 0.8765
## Specificity 0.99418 1.00000 0.99970 1.00000 0.99671 0.9976
## Pos Pred Value 0.85338 1.00000 0.99425 1.00000 0.92199 0.9479
## Neg Pred Value 0.99686 0.99523 1.00000 0.99970 0.99895 0.9938
## Prevalence 0.03571 0.03830 0.04982 0.02966 0.03844 0.0478
## Detection Rate 0.03269 0.03369 0.04982 0.02937 0.03744 0.0419
## Detection Prevalence 0.03830 0.03369 0.05011 0.02937 0.04060 0.0442
## Balanced Accuracy 0.95475 0.93985 0.99985 0.99515 0.98524 0.9370
Which Performed Better? (XG Boost vs CNN)
Each algorithm produced a respectable accuracy metric being well above the baseline of random chance. CNN was able to attain a much higher accuracy for most classes, but failed to classify the letters M, N, and O properly. The XG Boost algorithm showcased the ability to successfully identify all classes with a degree of accuracy. I would consider the XG Boost algorithm to be superior to the CNN algorithm as the CNN algorithm cannot identify all classes.
Conclusion
Using techniques learned throughout the semester, I achieved two models that could classify images of American Sign Language letters. It was found that the XG Boost algorithm was more reliable than its CNN counterpart after analyzing the model results. The inclusion of synthesized/distorted images could have improved model performance further and may have fixed the issues with the CNN model.
A classification model of sign language letters could be applied to a real time application that would convert sign language letters to text. This type of application would provide an alternative way to communicate with someone who is deaf or hard of hearing.