American Sign Language Image Classification

library(tidyverse)
library(tensorflow)
library(openssl)
library(reticulate)
library(keras)
library(nnet)
library(plotly)
library(rpart.plot)
library(caret)

Objective

The goal of this project is to classify images using two different machine learning algorithms learned throughout the semester, specifically one methodology studied in weeks 1-10, and one methodology from weeks 11-15. I chose to model the data using XG boost, learned in week 7, and convolutional neural network, learned in week 14. Each algorithms was analyzed in terms of computational speed and accuracy to determine which algorithm is better for classifying images of American Sign Language letters.

Data Source

The data source used for this assignment is the ‘Sign Language MNIST’ dataset on Kaggle.com (https://www.kaggle.com/datasets/datamunge/sign-language-mnist?select=sign_mnist_train). The images within this dataset represent 24 of the 26 American Sign Language letters. Both the letters ‘J’ and ‘Z’ were excluded from this analysis as they include motion to sign the letter which is not possible to show in a still image. This data source contains separate .csv files for training and testing the model. The columns of the files relate to the grayscale pixel value of a 28 x 28 pixel image as well as a column for the label.

Read in Data

The training and testing datasets were stored in memory. I releveled the labels of the target variable as the numerical location of the letter was being used to represent the label instead of using the letter.

train <- read.csv("sign_mnist_train.csv")
test <- read.csv("sign_mnist_test.csv")
train$label <- as.factor(train$label)

levels(train$label) <- list(A="0", B="1", C="2",
                           D="3", E="4", F="5",
                            G="6", H="7", I="8",
                            K="10", L="11", M="12",
                            N="13", O="14", P="15",
                            Q="16", R="17", S="18",
                            T="19", U="20", V="21",
                            W="22", X="23", Y="24")

test$label <- as.factor(test$label)

levels(test$label) <- list(A="0", B="1", C="2",
                           D="3", E="4", F="5",
                            G="6", H="7", I="8",
                            K="10", L="11", M="12",
                            N="13", O="14", P="15",
                            Q="16", R="17", S="18",
                            T="19", U="20", V="21",
                            W="22", X="23", Y="24")

Glimpse at Data

Looking at the summary statistics of the first and last 10 columns of data:

  • The pixel data stored in the columns ranges from 0 to 255
  • There are 784 different columns related to pixel location, meaning that the data can be rearanged into a 28 x 28 matrix
summary(train[,1:10])
##      label           pixel1          pixel2          pixel3     
##  R      : 1294   Min.   :  0.0   Min.   :  0.0   Min.   :  0.0  
##  Q      : 1279   1st Qu.:121.0   1st Qu.:126.0   1st Qu.:130.0  
##  L      : 1241   Median :150.0   Median :153.0   Median :156.0  
##  W      : 1225   Mean   :145.4   Mean   :148.5   Mean   :151.2  
##  F      : 1204   3rd Qu.:174.0   3rd Qu.:176.0   3rd Qu.:178.0  
##  S      : 1199   Max.   :255.0   Max.   :255.0   Max.   :255.0  
##  (Other):20013                                                  
##      pixel4          pixel5          pixel6          pixel7     
##  Min.   :  0.0   Min.   :  0.0   Min.   :  0.0   Min.   :  0.0  
##  1st Qu.:133.0   1st Qu.:137.0   1st Qu.:140.0   1st Qu.:142.0  
##  Median :158.0   Median :160.0   Median :162.0   Median :164.0  
##  Mean   :153.5   Mean   :156.2   Mean   :158.4   Mean   :160.5  
##  3rd Qu.:179.0   3rd Qu.:181.0   3rd Qu.:182.0   3rd Qu.:183.0  
##  Max.   :255.0   Max.   :255.0   Max.   :255.0   Max.   :255.0  
##                                                                 
##      pixel8          pixel9   
##  Min.   :  0.0   Min.   :  0  
##  1st Qu.:144.0   1st Qu.:146  
##  Median :165.0   Median :166  
##  Mean   :162.3   Mean   :164  
##  3rd Qu.:184.0   3rd Qu.:185  
##  Max.   :255.0   Max.   :255  
## 
summary(train[,776:785])
##     pixel775        pixel776        pixel777        pixel778        pixel779  
##  Min.   :  0.0   Min.   :  0.0   Min.   :  0.0   Min.   :  0.0   Min.   :  0  
##  1st Qu.: 92.0   1st Qu.: 96.0   1st Qu.:103.0   1st Qu.:112.0   1st Qu.:120  
##  Median :144.0   Median :162.0   Median :172.0   Median :180.0   Median :183  
##  Mean   :141.1   Mean   :147.5   Mean   :153.3   Mean   :159.1   Mean   :162  
##  3rd Qu.:196.0   3rd Qu.:202.0   3rd Qu.:205.0   3rd Qu.:207.0   3rd Qu.:208  
##  Max.   :255.0   Max.   :255.0   Max.   :255.0   Max.   :255.0   Max.   :255  
##     pixel780        pixel781        pixel782      pixel783        pixel784    
##  Min.   :  0.0   Min.   :  0.0   Min.   :  0   Min.   :  0.0   Min.   :  0.0  
##  1st Qu.:125.0   1st Qu.:128.0   1st Qu.:128   1st Qu.:128.0   1st Qu.:125.5  
##  Median :184.0   Median :184.0   Median :182   Median :182.0   Median :182.0  
##  Mean   :162.7   Mean   :162.9   Mean   :162   Mean   :161.1   Mean   :159.8  
##  3rd Qu.:207.0   3rd Qu.:207.0   3rd Qu.:206   3rd Qu.:204.0   3rd Qu.:204.0  
##  Max.   :255.0   Max.   :255.0   Max.   :255   Max.   :255.0   Max.   :255.0
paste0("There are ", nrow(train), " rows in the training data.")
## [1] "There are 27455 rows in the training data."
paste0("There are ", ncol(train), " columns in the training data.")
## [1] "There are 785 columns in the training data."

Normalize Data

It is important to normalize pixel intensity to make sure all input parameters have a similar data distribution. The normalization will also reduce the computational load for future calculations.

for(i in 2:ncol(train)){
  min.max.scale <- train[,i] / 255
  
  train[,i] <- min.max.scale
}

for(i in 2:ncol(test)){
  min.max.scale2 <- test[,i] / 255
  
  test[,i] <- min.max.scale2
}

Explore Data

Frequency of Labels

Having a near equal distribution of labeled data is imperative when creating a machine learning model. A balanced dataset will give the algorithm a “fighting chance” to classify all classes successfully.

Training Data

The training data appears have a fairly even distribution across most classes. R is the most common class and E is the least common class with a 1.2% difference in the volume of records.

letter <- c("A","B","C","D","E","F",
            "G","H","I","K","L","M",
            "N","O","P","Q","R","S",
            "T","U","V","W","X","Y")
proportions <- as.vector(prop.table(table(train$label)))

fig <- plot_ly(
  x = letter,
  y = proportions,
  name = "Distribution",
  type = "bar",
  marker = list(color = "#ED7171",
                line = list(color = '#910E0E',
                                  width = 1.5)))

fig %>% layout(title = "Frequency of Letter in Training Data",
               xaxis = list(categoryorder = "total descending"))

Testing Data

All classes in the testing set contain over 2% of the records in the testing set (approx. 143 records). This should serve as a good dataset for prediction as all classes are represented and have many examples.

letter <- c("A","B","C","D","E","F",
            "G","H","I","K","L","M",
            "N","O","P","Q","R","S",
            "T","U","V","W","X","Y")
proportions <- as.vector(prop.table(table(test$label)))

fig <- plot_ly(
  x = letter,
  y = proportions,
  name = "Distribution",
  type = "bar",
  marker = list(color = "#545BC1",
                line = list(color = '#050B5C',
                                  width = 1.5)))

fig %>% layout(title = "Frequency of Letter in Testing Data",
               xaxis = list(categoryorder = "total descending"))

Explore Images

The data needs to be parsed and stored into a 28 x 28 matrix in order to visualize the image. The function that I created below visualizes the images. The function was used to showcase all of the classes in the dataset. There are similarities between a handful of the American Sign Language letters such as ‘M’ and ‘N’ both are expressed through a closed fist or ‘I’ and ‘Y’ both use an extended pinky finger. This may prove difficult for modeling.

# function to plot images and provide the number the image is representing  
plot.image <- function(dataset, row.index){

# Obtain and store all pixels in a single row
x <- as.numeric(dataset[row.index, 2:785])

# Create an empty matrix to store image pixels
im <- matrix(nrow = 28, ncol = 28)

# Store data in matrix
j <- 1
for(i in 28:1){

  im[,i] <- x[j:(j+27)]

  j <- j+28

}  

# Plot the image
image(x = 1:28, 
      y = 1:28, 
      z = im, 
      col=gray((0:255)/255), 
      main = paste0("Image: ", row.index, " Label: ", dataset$label[row.index]))
}
par(mfrow = c(2, 2))

plot.image(train, 47)
plot.image(train, 30)
plot.image(train, 3)
plot.image(train, 1)

par(mfrow = c(2, 2))

plot.image(train, 45)
plot.image(train, 49)
plot.image(train, 2)
plot.image(train, 50)

par(mfrow = c(2, 2))

plot.image(train, 7)
plot.image(train, 12)
plot.image(train, 41)
plot.image(train, 32)

par(mfrow = c(2, 2))

plot.image(train, 5)
plot.image(train, 62)
plot.image(train, 42)
plot.image(train, 6)

par(mfrow = c(2, 2))

plot.image(train, 17)
plot.image(train, 11)
plot.image(train, 20)
plot.image(train, 15)

par(mfrow = c(2, 2))

plot.image(train, 22)
plot.image(train, 8)
plot.image(train, 24)
plot.image(train, 27)

Machine Learning

XG Boost

Feature Engineering

Create New Features

I created two new features to be used in the XG boost algorithm: rowSum and rowVariance. The rowSum and rowVariance variables store the sum and variance of the grayscale pixel values, respectfully.

train.xgb <- train
test.xgb <- test

train.xgb$rowSum <- rowSums(train.xgb[, -1])
train.xgb$rowVariance <- rowSums((train.xgb[, -1] - rowMeans(train.xgb[, -1]))^2)/(dim(train.xgb)[2] - 1)

test.xgb$rowSum <- rowSums(test[, -1])
test.xgb$rowVariance <- rowSums((test[, -1] - rowMeans(test[, -1]))^2)/(dim(test)[2] - 1)

Dimensionality Reduction via PCA

PCA was used to reduce the dimensionality of the data. The variance captured through the Principal Components seems to converge at 0.962 within the first 119 Principal Components. The XG boost model will consist of the first 119 Principal Components. PCA effectively reduced the feature size from 786 to 119.

pca.train <- prcomp(train.xgb[,-1],
                   center = TRUE,
                   scale. = TRUE)

pca.test <- predict(pca.train, newdata = test.xgb)
train1 <- as.data.frame(pca.train$x)
test1 <- as.data.frame(pca.test)

train1$label <- train$label
test1$label <- test$label
# Calculate Variance
a <- round(pca.train $sdev^2/ sum(pca.train $sdev^2),3)

variance <- cumsum(a)

variance[1:150]
##   [1] 0.359 0.442 0.512 0.561 0.598 0.624 0.649 0.669 0.687 0.703 0.717 0.730
##  [13] 0.741 0.751 0.760 0.768 0.775 0.782 0.789 0.795 0.801 0.806 0.811 0.816
##  [25] 0.820 0.824 0.828 0.832 0.835 0.838 0.841 0.844 0.847 0.850 0.853 0.856
##  [37] 0.859 0.862 0.865 0.867 0.869 0.871 0.873 0.875 0.877 0.879 0.881 0.883
##  [49] 0.885 0.887 0.889 0.891 0.893 0.895 0.897 0.899 0.900 0.901 0.902 0.903
##  [61] 0.904 0.905 0.906 0.907 0.908 0.909 0.910 0.911 0.912 0.913 0.914 0.915
##  [73] 0.916 0.917 0.918 0.919 0.920 0.921 0.922 0.923 0.924 0.925 0.926 0.927
##  [85] 0.928 0.929 0.930 0.931 0.932 0.933 0.934 0.935 0.936 0.937 0.938 0.939
##  [97] 0.940 0.941 0.942 0.943 0.944 0.945 0.946 0.947 0.948 0.949 0.950 0.951
## [109] 0.952 0.953 0.954 0.955 0.956 0.957 0.958 0.959 0.960 0.961 0.962 0.962
## [121] 0.962 0.962 0.962 0.962 0.962 0.962 0.962 0.962 0.962 0.962 0.962 0.962
## [133] 0.962 0.962 0.962 0.962 0.962 0.962 0.962 0.962 0.962 0.962 0.962 0.962
## [145] 0.962 0.962 0.962 0.962 0.962 0.962

Model XG Boost Algorithm

I elected to use the XG boost algorithm with 119 features (PC1 to PC119) to classify the images. The model hyperparameters are:

  • nrounds = 100 (Number of Trees)
  • max_depth = 3 (Maximum tree depth)
  • eta = 0.3 (Learning Rate)
  • gamma = 0 (Regularization Tuning)
  • colsample_bytree = 0.8 (Column sampling)
  • min_child_weight = 1 (Minimum leaf weight)
  • subsample = 0.5 (Row sampling)
set.seed(12345)

start_time <- Sys.time()

hyperparameters <- expand.grid(
  nrounds = 100,
  max_depth = 3,
  eta = 0.3,
  gamma = 0,
  colsample_bytree = 0.8,
  min_child_weight = 1,
  subsample = 0.5
)

trctrl <- trainControl(method = "cv",
                       number = 4) 

xgboost <- train(label ~ PC1 + PC2 + PC3 + PC4 + PC5 + PC6 +PC7 + PC8 + PC9 + PC10 + PC11 + PC12 + PC13 + PC14 + PC15 + PC16 + PC17 + PC18 + PC19 + PC20 + PC21 + PC22 + PC23 + PC24 + PC25 + PC26 + PC27 + PC28 + PC29 + PC30 + PC31 + PC32 + PC33 + PC34 + PC35 + PC36 + PC37 + PC38 + PC39 + PC40 + PC41 + PC42 + PC43 + PC44 + PC45 + PC46 + PC47 + PC48 + PC49 + PC50 + PC51 + PC52 + PC53 + PC54 + PC55 + PC56 + PC57 + PC58 + PC59 + PC60 + PC61 + PC62 + PC63 + PC64 + PC65 + PC66 + PC67 + PC68 + PC69 + PC70 + PC71 + PC72 + PC73 + PC74 + PC75 + PC76 + PC77 + PC78 + PC79 + PC80 + PC81 + PC82 + PC83 + PC84 + PC85 + PC86 + PC87 + PC88 + PC89 + PC90 + PC91 + PC92 + PC93 + PC94 + PC95 + PC96 + PC97 + PC98 + PC99 + PC100 + PC101 + PC102 + PC103 + PC104 + PC105 + PC106 + PC107 + PC108 + PC109 + PC110 + PC111 + PC112 + PC113 + PC114 + PC115 + PC116 + PC117 + PC118 + PC119,
                method = "xgbTree",
                trControl = trctrl,
                tuneGrid = hyperparameters,
                data = train1)

end_time <- Sys.time()
end_time - start_time
## Time difference of 15.29524 mins
# set.seed(12345)
# 
# start_time <- Sys.time()
# 
# trctrl <- trainControl(method = "cv",
#                        number = 4) 
# 
# forest <- train(label ~ PC1 + PC2 + PC3 + PC4 + PC5 + PC6 +PC7 + PC8 + PC9 + PC10 + PC11 + PC12 + PC13 + PC14 + PC15 + PC16 + PC17 + PC18 + PC19 + PC20 + PC21 + PC22 + PC23 + PC24 + PC25 + PC26 + PC27 + PC28 + PC29 + PC30,
#                 method = "xgbTree",
#                 trControl = trctrl,
#                 data = train1)
# 
# end_time <- Sys.time()
# end_time - start_time

Model Evaluation

The model performed well with an overall accuracy of 74.48%. This model is much more effective than guessing by random chance, as the random chance percentage is 1/24 or 4.167%. This model has the capacity to classify all classes with some level of accuracy, though some classes achieved better results than others. For example, C received a near perfect prediction rate on the testing data where as N was less successful in terms of prediction.

pred <- predict(xgboost, newdata = test1)

cm1 <- confusionMatrix(data = pred, reference = test1$label)

cm1
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   A   B   C   D   E   F   G   H   I   K   L   M   N   O   P   Q   R
##          A 319   0   0   0   0   0   0   0   8   0   0  11  21   0   0   0   0
##          B   0 361   0   0   0   0   0   0  19   0   0   0   0   0   0   0   0
##          C   0   0 307   0   0   0   0   0   0   0   0   0   0   5   0   0   0
##          D   0  18   0 225   0   0   1   0   6   0   0   8   5   0   0   0   0
##          E   0   2   0   0 419   0   0   1   0   0   0  29  37   0   0   1   0
##          F   0   0   0   1   0 217   0   0   0  22   0   0   0  12   0   0   0
##          G   7   0   0   0   0   0 250  40   0   0   0   0   0   5   0   0   0
##          H   0   0   0   0   0   0  27 383  16   0   0   0   0   0   0   0   0
##          I   0   0   0   0   0   0   2   0 149   0   0   0   0   1   0   0   0
##          K   0  38   0   0   0   9   0   0   3 204   0   0   0   0   0   0   2
##          L   0   0   0   0   0  21   0   0   0   2 209   0   0   0   0   0  13
##          M   0   0   0   0  21   0   8   0   0   1   0 224  15   2   0   0   0
##          N   0   0   0   0   0   0   0   0   0   4   0  25 133   0   0   0   1
##          O   0   0   2   0   0   0   3   1   0   0   0  11  25 184   0   0   0
##          P   0   0   0   0   0   0  12   0   0   0   0   0   0   0 336   0   0
##          Q   5   0   0   0   0   0   0   0   0   0   0   9  31   0   7 162   0
##          R   0   8   0   1   0   0   0   0   1  36   0   0   0   0   0   0  92
##          S   0   0   0   0  58   0   0   0   1  15   0  58  16   1   0   1   7
##          T   0   0   0   0   0   0  23   5   0   0   0   0   8  24   0   0   8
##          U   0   0   0  16   0   0   0   0  13   3   0   0   0  12   0   0  19
##          V   0   0   0   2   0   0  19   0   3   8   0   4   0   0   0   0   0
##          W   0   5   0   0   0   0   0   0   0  11   0   0   0   0   0   0   0
##          X   0   0   0   0   0   0   3   6   0   0   0   0   0   0   4   0   2
##          Y   0   0   1   0   0   0   0   0  69  25   0  15   0   0   0   0   0
##           Reference
## Prediction   S   T   U   V   W   X   Y
##          A  11   0   0   0   0   0   0
##          B   0   0  39   0   0   0   1
##          C   0   0   0   0   0   0   0
##          D   0   0   2   1   0   0   0
##          E   2   0   0   0   0   0   0
##          F   0   0   0  20   0   0   1
##          G   0   0   0   0   0   0   0
##          H   0   3   0   0   0   0   0
##          I  21  12   0   0   0   0   2
##          K   0   0  32  40  20   0   8
##          L   0  15  15   0   0   0   0
##          M   3   0   0   0   0   0   0
##          N   0   0   0   0   0   0   0
##          O   0   3   0   0   0   0   0
##          P   0   3   0   0   0   0   0
##          Q   0   0   0   0   0   0   0
##          R   0   4  21  45  47   4  54
##          S 209   0   0   0   0  24  15
##          T   0 161   1   4   0  21  13
##          U   0   9 154  62  20   0   2
##          V   0   1   0 126   7   2   1
##          W   0   0   0  48 112  16  29
##          X   0  37   0   0   0 200   0
##          Y   0   0   2   0   0   0 206
## 
## Overall Statistics
##                                           
##                Accuracy : 0.7448          
##                  95% CI : (0.7346, 0.7549)
##     No Information Rate : 0.0694          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.7332          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E Class: F
## Sensitivity           0.96375  0.83565  0.99032  0.91837  0.84137  0.87854
## Specificity           0.99254  0.99125  0.99927  0.99408  0.98921  0.99191
## Pos Pred Value        0.86216  0.85952  0.98397  0.84586  0.85336  0.79487
## Neg Pred Value        0.99824  0.98948  0.99956  0.99710  0.98818  0.99565
## Prevalence            0.04615  0.06023  0.04322  0.03416  0.06944  0.03444
## Detection Rate        0.04448  0.05033  0.04281  0.03137  0.05842  0.03026
## Detection Prevalence  0.05159  0.05856  0.04350  0.03709  0.06846  0.03806
## Balanced Accuracy     0.97815  0.91345  0.99480  0.95622  0.91529  0.93523
##                      Class: G Class: H Class: I Class: K Class: L Class: M
## Sensitivity           0.71839  0.87844  0.51736  0.61631  1.00000  0.56853
## Specificity           0.99238  0.99317  0.99448  0.97778  0.99052  0.99262
## Pos Pred Value        0.82781  0.89277  0.79679  0.57303  0.76000  0.81752
## Neg Pred Value        0.98574  0.99214  0.98010  0.98137  1.00000  0.97536
## Prevalence            0.04852  0.06079  0.04016  0.04615  0.02914  0.05494
## Detection Rate        0.03486  0.05340  0.02078  0.02844  0.02914  0.03123
## Detection Prevalence  0.04211  0.05982  0.02607  0.04964  0.03834  0.03820
## Balanced Accuracy     0.85539  0.93581  0.75592  0.79705  0.99526  0.78058
##                      Class: N Class: O Class: P Class: Q Class: R Class: S
## Sensitivity           0.45704  0.74797  0.96830  0.98780  0.63889  0.84959
## Specificity           0.99564  0.99350  0.99780  0.99258  0.96855  0.97170
## Pos Pred Value        0.81595  0.80349  0.95726  0.75701  0.29393  0.51605
## Neg Pred Value        0.97746  0.99107  0.99839  0.99971  0.99242  0.99453
## Prevalence            0.04057  0.03430  0.04838  0.02287  0.02008  0.03430
## Detection Rate        0.01854  0.02566  0.04685  0.02259  0.01283  0.02914
## Detection Prevalence  0.02273  0.03193  0.04894  0.02984  0.04364  0.05647
## Balanced Accuracy     0.72634  0.87074  0.98305  0.99019  0.80372  0.91065
##                      Class: T Class: U Class: V Class: W Class: X Class: Y
## Sensitivity           0.64919  0.57895  0.36416  0.54369  0.74906  0.62048
## Specificity           0.98455  0.97741  0.99311  0.98435  0.99247  0.98363
## Pos Pred Value        0.60075  0.49677  0.72832  0.50679  0.79365  0.64780
## Neg Pred Value        0.98740  0.98368  0.96857  0.98648  0.99032  0.98162
## Prevalence            0.03458  0.03709  0.04824  0.02872  0.03723  0.04629
## Detection Rate        0.02245  0.02147  0.01757  0.01562  0.02789  0.02872
## Detection Prevalence  0.03737  0.04322  0.02412  0.03081  0.03514  0.04434
## Balanced Accuracy     0.81687  0.77818  0.67864  0.76402  0.87077  0.80205

CNN

Set up Tensorflow and Keras Framework in R

The CNN function that I will use comes from the Keras package that is native to Python. To access the functions needed to run the CNN in R, a python virtual environment needs to be estabished and the tensorflow and keras packages need to be installed in both R and the newly created virtual environment.

# path_to_python <- install_python()
# virtualenv_create("r-reticulate", python = path_to_python)
#virtualenv_install(envname = "r-reticulate", "html5lib")
#install_tensorflow(envname = "r-reticulate", pip_options = "--no-cache-dir")
#install_keras(envname = "r-reticulate", pip_options = "--no-cache-dir")

Prepare Data

To prepare the data for the CNN, I separated the data into two separate lists. In the first list, the training labels were converted to the python categorical data type. In the second list, pixel data was reshaped into a 28 x 28 matrix for each record. Testing data recieved the same procedure.

use_virtualenv("r-reticulate")
train.cnn <- train[,-1]
train.labels <- train[,1]

num.classes <- 25
train.labels <- to_categorical(as.integer(train.labels), num.classes)
## Loaded Tensorflow version 2.9.3
test.cnn <- test[,-1]
test.labels <- test[,1]
start_time <- Sys.time()


train.cnn2 <- list()

# Create an empty matrix to store image pixels

im <- matrix(nrow = 28, ncol = 28)
# Store data in matrix


for(k in 1:nrow(train.cnn)){
  x <- as.numeric(train.cnn[k, 1:784])
  j <- 1
  for(i in 28:1){
  
  
  
  im[,i] <- x[j:(j+27)]
  
  j <- j+28
  
  }  
  train.cnn2[[k]] <- im
}

end_time <- Sys.time()
end_time - start_time
## Time difference of 2.393476 mins
start_time <- Sys.time()


test.cnn2 <- list()

# Create an empty matrix to store image pixels

im <- matrix(nrow = 28, ncol = 28)
# Store data in matrix


for(k in 1:nrow(test.cnn)){
  x <- as.numeric(test.cnn[k, 1:784])
  j <- 1
  for(i in 28:1){
  
  
  
  im[,i] <- x[j:(j+27)]
  
  j <- j+28
  
  }  
  test.cnn2[[k]] <- im
}

end_time <- Sys.time()
end_time - start_time
## Time difference of 37.57966 secs

Reshape Data

The CNN requires the structure of the data to be 4 dimensional (row x height, width, color channel). As the image is grayscale, that means there is only 1 color channel. The new data shapes are:

  • Training - 27455 x 28 x 28 x 1
  • Testing - 7172 x 28 x 28 x 1
use_virtualenv("r-reticulate")
train.cnn3 <- array_reshape(train.cnn2, 
                               dim = c(nrow(train.cnn), 28, 28, 1)
                               )

test.cnn3 <- array_reshape(test.cnn2, 
                               dim = c(nrow(test.cnn), 28, 28, 1)
                               )

Create CNN Structure

The CNN framework I used for modeling consists of the following structure:

  • Convolutional Layer
    • 32 filters
    • kernel size of 4 x 4
    • padding = same
    • activation function of relu
  • Max Pooling Layer
    • pool size of 3 x 3
  • Dropout Later
    • Rate of 0.1
  • Convolutional Layer
    • 32 filters
    • kernel size of 4 x 4
    • padding = same
    • activation function of relu
  • Max Pooling Layer
    • pool size of 3 x 3
  • Dropout Later
    • Rate of 0.1
  • Convolutional Layer
    • 32 filters
    • kernel size of 4 x 4
    • padding = same
    • activation function of relu
  • Max Pooling Layer
    • pool size of 3 x 3
  • Flatten Layer
  • Dense Layer
    • 16 neurons
    • activation function of relu
  • Dense Layer (Output Layer)
    • 25 neurons
    • softmax activation
tensorflow::tf$random$set_seed(12345)

model <- keras_model_sequential(name = "CNN_Model") %>% 
  
  layer_conv_2d(filters = 32, 
                kernel_size = c(4,4), 
                padding = "same", activation = "relu",
                input_shape = c(28, 28, 1)
                ) %>% 
  
  layer_max_pooling_2d(pool_size = c(3,3)) %>% 
  
  layer_dropout(rate = 0.1) %>%
  
  layer_conv_2d(filters = 32, 
                kernel_size = c(4,4), 
                padding = "same", activation = "relu",
                input_shape = c(28, 28, 1)
                ) %>% 
  
  layer_max_pooling_2d(pool_size = c(3,3)) %>% 
  
  layer_dropout(rate = 0.1) %>%
  
  layer_conv_2d(filters = 32, 
                kernel_size = c(4,4), 
                padding = "same", activation = "relu",
                input_shape = c(28, 28, 1)
                ) %>% 
  
  layer_max_pooling_2d(pool_size = c(3,3)) %>% 
  
  
  layer_flatten() %>% 
  
  layer_dense(units = 16, 
              activation = "relu") %>% 
 
  layer_dense(units = 25, 
              activation = "softmax",
              name = "Output"
              )

model
## Model: "CNN_Model"
## ________________________________________________________________________________
##  Layer (type)                       Output Shape                    Param #     
## ================================================================================
##  conv2d_2 (Conv2D)                  (None, 28, 28, 32)              544         
##  max_pooling2d_2 (MaxPooling2D)     (None, 9, 9, 32)                0           
##  dropout_1 (Dropout)                (None, 9, 9, 32)                0           
##  conv2d_1 (Conv2D)                  (None, 9, 9, 32)                16416       
##  max_pooling2d_1 (MaxPooling2D)     (None, 3, 3, 32)                0           
##  dropout (Dropout)                  (None, 3, 3, 32)                0           
##  conv2d (Conv2D)                    (None, 3, 3, 32)                16416       
##  max_pooling2d (MaxPooling2D)       (None, 1, 1, 32)                0           
##  flatten (Flatten)                  (None, 32)                      0           
##  dense (Dense)                      (None, 16)                      528         
##  Output (Dense)                     (None, 25)                      425         
## ================================================================================
## Total params: 34,329
## Trainable params: 34,329
## Non-trainable params: 0
## ________________________________________________________________________________

Train Model

The model was trained to optimize accuracy and minimize categorical crossentropy. The following hyperparameters were used to train the model:

  • epochs = 10
  • batch_size = 32
  • vaidation_splot = 0.1
start_time <- Sys.time()

model %>% 
  compile(loss = "categorical_crossentropy",
          optimizer = optimizer_adam(learning_rate = 0.001), 
          metrics = "accuracy"
          )


train_history <- model %>% 
  fit(x = train.cnn3, 
      y = train.labels,
      epochs = 10, 
      batch_size = 32,
      validation_split = 0.1, 
      
      verbose = 2
      )


end_time <- Sys.time()
end_time - start_time
## Time difference of 2.400791 mins
plot(train_history)

CNN Testing Data Predictions

The CNN model produced a testing set accuracy of 85.63%. This accuracy is deceiving as the model correctly classified almost all classes perfectly apart from 3 letters: M, N, and O. The algorithm classified 0 values as M, N labels as O, and O labels as M and N.

pred.cnn <- model %>% predict(test.cnn3) %>% k_argmax()

pred.factor <- as.factor(as.numeric(pred.cnn))

levels(pred.factor) <- list(A="1", B="2", C="3",
                           D="4", E="5", F="6",
                            G="7", H="8", I="9",
                            K="10", L="11", M="12",
                            N="12", O="13", P="15",
                            Q="16", R="17", S="18",
                            T="19", U="20", V="21",
                            W="22", X="23", Y="24")

cm2 <- confusionMatrix(data = pred.factor, reference = test$label)

cm2
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   A   B   C   D   E   F   G   H   I   K   L   M   N   O   P   Q   R
##          A 331   0   0   0   0   0   0   0   0   0   0   0   1  20   0   0   0
##          B   0 412   0   0   0   0   0   0   0   0   0   0  37   0   0   0   0
##          C   0   0 310   0   0   0   0   0   0   0   0   0   0   0   0   0   0
##          D   0   0   0 220   0   0  13   0   0   0   0   0   0   0   0   0   0
##          E   0   0   0  18 494   0   0   0   0   0   0   0   0   0   0   0   0
##          F   0   0   0   0   0 247   0   0   0   0   0   0   0   0   0   0   0
##          G   0   0   0   0   0   0 267  23   0   0   0   0   0   0   0   0   0
##          H   0   0   0   0   0   0  21 413   0   0   0   0   0   0   0   0   0
##          I   0   0   0   0   0   0   0   0 272   0   0   6   0   0   0   0   0
##          K   0  20   0   0   0   0   0   0   0 329   0   0   0   0   0   0  19
##          L   0   0   0   0   0   0   0   0   0   0 209   0   0   0   0   0   1
##          M   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
##          N   0   0   0   0   0   0   0   0   0   0   0 326  12   0   0   0   0
##          O   0   0   0   0   0   0   0   0   0   0   0  27 231   0   0   0   0
##          P   0   0   0   0   0   0   0   0   0   0   0   0   0   0 347   0   0
##          Q   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 164   0
##          R   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0 124
##          S   0   0   0   0   4   0  20   0   0   1   0  35  10   0   0   0   0
##          T   0   0   0   6   0   0  26   0   0   0   0   0   0   0   0   0   0
##          U   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
##          V   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
##          W   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
##          X   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0
##          Y   0   0   0   0   0   0   0   0  16   0   0   0   0   0   0   0   0
##           Reference
## Prediction   S   T   U   V   W   X   Y
##          A   0   0   0   0   0   0   0
##          B   0   0   8   0   0   0   0
##          C   0   0   0   0   0   0   0
##          D   0   0   0   0   0   0   0
##          E   0   0   0   0   0   0   0
##          F   0   0   0   0   0   0   0
##          G   0   0   0   0   0   0   0
##          H   0   0   0   0   0   0   0
##          I  11   0   0   0   0   0  33
##          K   0   0  20   0   0   0   8
##          L   0   0   0   0   0   0   0
##          M   0   0   0   0   0   0   0
##          N   1   0   0   0   0   0   0
##          O   0   0   0   0   0   0   0
##          P   0   0   0   0   0   0   0
##          Q   0   0   0   0   0   0   0
##          R   0   0   4   0   0   0   0
##          S 234   0   0   0   0   0   0
##          T   0 227   0   0   0   7   0
##          U   0   0 234   0   0   0   0
##          V   0   0   0 346   2   0   0
##          W   0   0   0   0 204   0   0
##          X   0  21   0   0   0 260   0
##          Y   0   0   0   0   0   0 291
## 
## Overall Statistics
##                                           
##                Accuracy : 0.8563          
##                  95% CI : (0.8478, 0.8645)
##     No Information Rate : 0.0717          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.8496          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E Class: F
## Sensitivity           1.00000  0.95370  1.00000  0.90164  0.99197  1.00000
## Specificity           0.99682  0.99309  1.00000  0.99806  0.99721  1.00000
## Pos Pred Value        0.94034  0.90153  1.00000  0.94421  0.96484  1.00000
## Neg Pred Value        1.00000  0.99692  1.00000  0.99642  0.99938  1.00000
## Prevalence            0.04766  0.06220  0.04464  0.03513  0.07171  0.03557
## Detection Rate        0.04766  0.05932  0.04464  0.03168  0.07113  0.03557
## Detection Prevalence  0.05068  0.06580  0.04464  0.03355  0.07372  0.03557
## Balanced Accuracy     0.99841  0.97340  1.00000  0.94985  0.99459  1.00000
##                      Class: G Class: H Class: I Class: K Class: L Class: M
## Sensitivity           0.76724  0.94725  0.94444  0.99396  1.00000  0.00000
## Specificity           0.99651  0.99677  0.99249  0.98987  0.99985  1.00000
## Pos Pred Value        0.92069  0.95161  0.84472  0.83081  0.99524      NaN
## Neg Pred Value        0.98783  0.99647  0.99758  0.99969  1.00000  0.94327
## Prevalence            0.05011  0.06278  0.04147  0.04766  0.03009  0.05673
## Detection Rate        0.03844  0.05947  0.03916  0.04737  0.03009  0.00000
## Detection Prevalence  0.04176  0.06249  0.04636  0.05702  0.03024  0.00000
## Balanced Accuracy     0.88188  0.97201  0.96847  0.99191  0.99993  0.50000
##                      Class: N Class: O Class: P Class: Q Class: R Class: S
## Sensitivity          0.041237  0.00000  1.00000  1.00000  0.86111  0.95122
## Specificity          0.950857  0.96274  1.00000  1.00000  0.99926  0.98955
## Pos Pred Value       0.035398  0.00000  1.00000  1.00000  0.96124  0.76974
## Neg Pred Value       0.957766  0.99701  1.00000  1.00000  0.99707  0.99819
## Prevalence           0.041901  0.00288  0.04996  0.02361  0.02073  0.03542
## Detection Rate       0.001728  0.00000  0.04996  0.02361  0.01785  0.03369
## Detection Prevalence 0.048812  0.03715  0.04996  0.02361  0.01857  0.04377
## Balanced Accuracy    0.496047  0.48137  1.00000  1.00000  0.93019  0.97039
##                      Class: T Class: U Class: V Class: W Class: X Class: Y
## Sensitivity           0.91532  0.87970  1.00000  0.99029  0.97378   0.8765
## Specificity           0.99418  1.00000  0.99970  1.00000  0.99671   0.9976
## Pos Pred Value        0.85338  1.00000  0.99425  1.00000  0.92199   0.9479
## Neg Pred Value        0.99686  0.99523  1.00000  0.99970  0.99895   0.9938
## Prevalence            0.03571  0.03830  0.04982  0.02966  0.03844   0.0478
## Detection Rate        0.03269  0.03369  0.04982  0.02937  0.03744   0.0419
## Detection Prevalence  0.03830  0.03369  0.05011  0.02937  0.04060   0.0442
## Balanced Accuracy     0.95475  0.93985  0.99985  0.99515  0.98524   0.9370

Which Performed Better? (XG Boost vs CNN)

Each algorithm produced a respectable accuracy metric being well above the baseline of random chance. CNN was able to attain a much higher accuracy for most classes, but failed to classify the letters M, N, and O properly. The XG Boost algorithm showcased the ability to successfully identify all classes with a degree of accuracy. I would consider the XG Boost algorithm to be superior to the CNN algorithm as the CNN algorithm cannot identify all classes.

Conclusion

Using techniques learned throughout the semester, I achieved two models that could classify images of American Sign Language letters. It was found that the XG Boost algorithm was more reliable than its CNN counterpart after analyzing the model results. The inclusion of synthesized/distorted images could have improved model performance further and may have fixed the issues with the CNN model.

A classification model of sign language letters could be applied to a real time application that would convert sign language letters to text. This type of application would provide an alternative way to communicate with someone who is deaf or hard of hearing.