Ex. 1

Use the nnet package to analyze the iris data set. Use 80% of the 150 samples as the training data and the rest for validation. Discuss the results.

library(nnet)   # nnet()
library(caret)  # confusionMatrix()

i <- iris
head(i)
# stratified 80% sample: 40 of the 50 rows from each species
samp <- c(sample(1:50, 40), sample(51:100, 40), sample(101:150, 40))
train <- i[samp, ]
test <- i[-samp, ]
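
A quick check that the split is stratified as intended:
table(train$Species)  # 40 of each species
table(test$Species)   # 10 of each species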

iris_nnet <- nnet(Species~., train, size = 2, rang = 0.1,
            decay = 5e-4, maxit = 200)
## # weights:  19
## initial  value 132.649285 
## iter  10 value 59.475556
## iter  20 value 56.075284
## iter  30 value 25.896365
## iter  40 value 9.275291
## iter  50 value 7.653829
## iter  60 value 7.556775
## iter  70 value 7.514557
## iter  80 value 7.192152
## iter  90 value 6.730758
## iter 100 value 6.054399
## iter 110 value 5.978169
## iter 120 value 5.906089
## iter 130 value 5.882412
## iter 140 value 5.867738
## iter 150 value 5.865712
## iter 160 value 5.862511
## iter 170 value 5.860267
## iter 180 value 5.858933
## iter 190 value 5.858714
## iter 200 value 5.858672
## final  value 5.858672 
## stopped after 200 iterations
i_pred <- as.factor(predict(iris_nnet, test, type = 'class'))
confusionMatrix(i_pred, test$Species)
## Confusion Matrix and Statistics
## 
##             Reference
## Prediction   setosa versicolor virginica
##   setosa         10          0         0
##   versicolor      0         10         0
##   virginica       0          0        10
## 
## Overall Statistics
##                                              
##                Accuracy : 1                  
##                  95% CI : (0.884, 1)         
##     No Information Rate : 0.333              
##     P-Value [Acc > NIR] : 0.00000000000000486
##                                              
##                   Kappa : 1                  
##                                              
##  Mcnemar's Test P-Value : NA                 
## 
## Statistics by Class:
## 
##                      Class: setosa Class: versicolor Class: virginica
## Sensitivity                  1.000             1.000            1.000
## Specificity                  1.000             1.000            1.000
## Pos Pred Value               1.000             1.000            1.000
## Neg Pred Value               1.000             1.000            1.000
## Prevalence                   0.333             0.333            0.333
## Detection Rate               0.333             0.333            0.333
## Detection Prevalence         0.333             0.333            0.333
## Balanced Accuracy            1.000             1.000            1.000
I used the help documentation in R to understand how to use nnet, and since it uses the iris data set as its example, I chose the same hyperparameters (size = 2 hidden units, rang = 0.1 for the initial random weight range, decay = 5e-4 for weight decay, and maxit = 200 iterations). I wish I knew why those values were chosen rather than the defaults, though. Let’s run it again using the defaults to see what happens…
set.seed(42)
iris_nnet2 <- nnet(Species~., train, size = 1)
## # weights:  11
## initial  value 136.485426 
## iter  10 value 122.644900
## iter  20 value 55.495846
## iter  30 value 55.451895
## final  value 55.451790 
## converged
i_pred2 <- as.factor(predict(iris_nnet2, test, type = 'class'))
confusionMatrix(i_pred2, test$Species)
## Confusion Matrix and Statistics
## 
##             Reference
## Prediction   setosa versicolor virginica
##   setosa         10          0         0
##   versicolor      0          0         0
##   virginica       0         10        10
## 
## Overall Statistics
##                                         
##                Accuracy : 0.667         
##                  95% CI : (0.472, 0.827)
##     No Information Rate : 0.333         
##     P-Value [Acc > NIR] : 0.000194      
##                                         
##                   Kappa : 0.5           
##                                         
##  Mcnemar's Test P-Value : NA            
## 
## Statistics by Class:
## 
##                      Class: setosa Class: versicolor Class: virginica
## Sensitivity                  1.000             0.000            1.000
## Specificity                  1.000             1.000            0.500
## Pos Pred Value               1.000               NaN            0.500
## Neg Pred Value               1.000             0.667            1.000
## Prevalence                   0.333             0.333            0.333
## Detection Rate               0.333             0.000            0.333
## Detection Prevalence         0.333             0.000            0.667
## Balanced Accuracy            1.000             0.500            0.750
Using the defaults with size = 1, I got really inconsistent results, so I set a seed to show the problem. Above, with 42 as my seed, a third of the test set is misclassified: every versicolor is predicted as virginica. With seed 250 (below), however, I get the same perfect performance as the first model, which gave consistently good results no matter how many times I ran it. So the customized hyperparameters clearly produce a better and more consistent model; a sketch for checking this across many seeds follows the output below.
set.seed(250)
iris_nnet2 <- nnet(Species~., train, size = 1)
## # weights:  11
## initial  value 159.272971 
## iter  10 value 57.107406
## iter  20 value 53.041122
## iter  30 value 11.023845
## iter  40 value 5.671716
## iter  50 value 5.594857
## iter  60 value 5.572958
## iter  70 value 5.562449
## iter  80 value 5.542120
## iter  90 value 5.532411
## iter 100 value 5.526612
## final  value 5.526612 
## stopped after 100 iterations
i_pred2 <- as.factor(predict(iris_nnet2, test, type = 'class'))
confusionMatrix(i_pred2, test$Species)
## Confusion Matrix and Statistics
## 
##             Reference
## Prediction   setosa versicolor virginica
##   setosa         10          0         0
##   versicolor      0         10         0
##   virginica       0          0        10
## 
## Overall Statistics
##                                              
##                Accuracy : 1                  
##                  95% CI : (0.884, 1)         
##     No Information Rate : 0.333              
##     P-Value [Acc > NIR] : 0.00000000000000486
##                                              
##                   Kappa : 1                  
##                                              
##  Mcnemar's Test P-Value : NA                 
## 
## Statistics by Class:
## 
##                      Class: setosa Class: versicolor Class: virginica
## Sensitivity                  1.000             1.000            1.000
## Specificity                  1.000             1.000            1.000
## Pos Pred Value               1.000             1.000            1.000
## Neg Pred Value               1.000             1.000            1.000
## Prevalence                   0.333             0.333            0.333
## Detection Rate               0.333             0.333            0.333
## Detection Prevalence         0.333             0.333            0.333
## Balanced Accuracy            1.000             1.000            1.000
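
Rather than cherry-picking two seeds, a sketch like the following could refit both configurations across many seeds and compare test accuracies (seed_accuracy() is my own helper, not part of nnet):

seed_accuracy <- function(seed, ...) {
  set.seed(seed)
  fit <- nnet(Species ~ ., train, trace = FALSE, ...)  # trace = FALSE silences the iteration log
  mean(predict(fit, test, type = 'class') == test$Species)
}
seeds <- 1:20
acc_default <- sapply(seeds, seed_accuracy, size = 1)
acc_tuned <- sapply(seeds, seed_accuracy, size = 2, rang = 0.1,
                    decay = 5e-4, maxit = 200)
summary(acc_default)
summary(acc_tuned)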

Ex. 2

As a mini project, install the keras package and learn how to use it. Then, carry out various tasks that may be useful to your project and studies.

I used the CRAN documentation to learn about using the keras package. Most of the code is modified from this page: https://cran.r-project.org/web/packages/keras/vignettes/index.html

Downloading and preparing the data

I decided to use the alternative Fashion-MNIST dataset instead of the classic MNIST digits. The following code loads it:
library(keras)
fashion <- dataset_fashion_mnist()

First 4 images

[Figure: the first four images from the training set]
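
These were rendered as an image in the original document; here is a minimal base-R sketch that could reproduce them (labels is built from the class list quoted below):

labels <- c('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
            'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot')
par(mfrow = c(1, 4), mar = c(0, 0, 2, 0))
for (j in 1:4) {
  img <- fashion$train$x[j, , ]
  # reverse the rows and transpose so image() draws the picture right side up
  image(t(img[28:1, ]), col = gray((0:255) / 255), axes = FALSE,
        main = labels[fashion$train$y[j] + 1])
}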

Here’s a description of the dataset from the keras package documentation found here: https://cran.r-project.org/web/packages/keras/keras.pdf
Details

Dataset of 60,000 28x28 grayscale images of 10 fashion categories, along with a test set of 10,000 images. This dataset can be used as a drop-in replacement for MNIST. The class labels are:

  • 0 - T-shirt/top
  • 1 - Trouser
  • 2 - Pullover
  • 3 - Dress
  • 4 - Coat
  • 5 - Sandal
  • 6 - Shirt
  • 7 - Sneaker
  • 8 - Bag
  • 9 - Ankle boot
Value

Lists of training and test data: train$x, train$y, test$x, test$y, where x is an array of grayscale image data with shape (num_samples, 28, 28) and y is an array of article labels (integers in range 0-9) with shape (num_samples).

The data is already split into training and testing sets, but some preprocessing is needed: each 28x28 image in x must be flattened, turning the 3-d arrays of shape (num_samples, 28, 28) into 2-d matrices of shape (num_samples, 784); the grayscale values must be rescaled from the 0-255 range to 0-1; and the y labels must be one-hot encoded into binary class matrices using the Keras to_categorical() function.
x_train <- fashion$train$x
y_train <- fashion$train$y
x_test <- fashion$test$x
y_test <- fashion$test$y
# reshape
x_train <- array_reshape(x_train, c(nrow(x_train), 784))
x_test <- array_reshape(x_test, c(nrow(x_test), 784))
# rescale
x_train <- x_train / 255
x_test <- x_test / 255

# one-hot encode the labels into binary class matrices
y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)
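
A quick sanity check on the resulting shapes (the dataset has 60,000 training and 10,000 test images, as documented):

dim(x_train)    # 60000 784 (after reshape)
range(x_train)  # 0 1 (after rescale)
dim(y_train)    # 60000 10 (one-hot)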

Defining and training the model

Now that the data is ready, we can define a model. For the first model I used the parameters exactly as I found them in the keras documentation example. I then iterated through a range of hyperparameter values and numbers of layers to see if I could improve upon the model. The code is shown for the first model, and just the results for the subsequent models (a sketch automating the whole sweep follows Model 8).

Model 1 - 2 hidden layers (256/128 units), batch size 128, 30 epochs

model <- keras_model_sequential() 
model %>% 
  layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>% 
  layer_dropout(rate = 0.4) %>% 
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dropout(rate = 0.3) %>%
  # outputs a length 10 numeric vector (probabilities for each class) using a softmax activation function.
  layer_dense(units = 10, activation = 'softmax')
summary(model)
## Model: "sequential"
## ________________________________________________________________________________
##  Layer (type)                       Output Shape                    Param #     
## ================================================================================
##  dense_2 (Dense)                    (None, 256)                     200960      
##                                                                                 
##  dropout_1 (Dropout)                (None, 256)                     0           
##                                                                                 
##  dense_1 (Dense)                    (None, 128)                     32896       
##                                                                                 
##  dropout (Dropout)                  (None, 128)                     0           
##                                                                                 
##  dense (Dense)                      (None, 10)                      1290        
##                                                                                 
## ================================================================================
## Total params: 235,146
## Trainable params: 235,146
## Non-trainable params: 0
## ________________________________________________________________________________
model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)
# note: set.seed() seeds only R's RNG; TensorFlow draws its own seeds, so
# keras results can still vary between runs
set.seed(42)
history <- model %>% fit(
  x_train, y_train, 
  epochs = 30, batch_size = 128, 
  validation_split = 0.2
)

plot(history)

model %>% evaluate(x_test, y_test)
##     loss accuracy 
##   0.4072   0.8853

Model 2 - 2 hidden layers (256/128 units), batch size 256, 30 epochs

##     loss accuracy 
##   0.3508   0.8850

Model 3 - 2 hidden layers (256/128 units), batch size 64, 30 epochs

##     loss accuracy 
##   0.4908   0.8771

Model 4 - 2 hidden layers (128/64 units), batch size 128, 30 epochs

##     loss accuracy 
##   0.4074   0.8721

Model 5 - 2 hidden layers (128/64 units), batch size 256, 30 epochs

##     loss accuracy 
##   0.3729   0.8753

Model 6 - 2 hidden layers (64/32 units), batch size 256, 30 epochs

##     loss accuracy 
##   0.3802   0.8678

Model 7 - 3 hidden layers (128/64/32 units), batch size 256, 30 epochs

##     loss accuracy 
##   0.3720   0.8766

Model 8 - 3 hidden layers (256/128/64 units), batch size 512, 30 epochs

##     loss accuracy 
##   0.3413   0.8832
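
I fit the models above one at a time by editing the Model 1 code; a loop like this sketch could run the whole sweep instead (fit_and_score() is my own hypothetical helper, with the dropout rates held at the Model 1 values):

fit_and_score <- function(units, batch_size, epochs = 30) {
  model <- keras_model_sequential()
  model %>%
    layer_dense(units = units[1], activation = 'relu', input_shape = c(784)) %>%
    layer_dropout(rate = 0.4)
  # any remaining hidden layers
  for (u in units[-1]) {
    model %>% layer_dense(units = u, activation = 'relu') %>% layer_dropout(rate = 0.3)
  }
  model %>% layer_dense(units = 10, activation = 'softmax')
  model %>% compile(loss = 'categorical_crossentropy',
                    optimizer = optimizer_rmsprop(), metrics = c('accuracy'))
  model %>% fit(x_train, y_train, epochs = epochs, batch_size = batch_size,
                validation_split = 0.2, verbose = 0)
  model %>% evaluate(x_test, y_test, verbose = 0)
}

configs <- list(
  list(units = c(256, 128), batch = 128),      # Model 1
  list(units = c(256, 128), batch = 256),      # Model 2
  list(units = c(256, 128), batch = 64),       # Model 3
  list(units = c(128, 64), batch = 128),       # Model 4
  list(units = c(128, 64), batch = 256),       # Model 5
  list(units = c(64, 32), batch = 256),        # Model 6
  list(units = c(128, 64, 32), batch = 256),   # Model 7
  list(units = c(256, 128, 64), batch = 512)   # Model 8
)
results <- lapply(configs, function(cfg) fit_and_score(cfg$units, cfg$batch))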

Evaluation

All of the models performed about the same, with only about 1% or less difference in accuracy between them. As the confusion matrix below shows, model 7, the most stable-looking model, is particularly bad at classifying class 6, with only about two-thirds (0.677) classified correctly. Class 6 is labeled ‘Shirt’ and was most often misclassified as class 0, ‘T-shirt/top’, which is a pretty minor distinction, so it’s not surprising that the model had trouble differentiating the two. Class 2 (‘Pullover’) was next, with just under 75% classified correctly, and all the remaining classes were above 80%.
# predicted class = index of the largest softmax probability
pred <- model7 %>% predict(x_test) %>% k_argmax() %>% as.numeric() %>% as.factor()
# recover the integer labels from the one-hot matrix: max.col() returns the
# column holding the 1 in each row, so label = column index - 1
y_transformed <- as.factor(max.col(y_test) - 1)
confusionMatrix(pred, y_transformed) 
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1   2   3   4   5   6   7   8   9
##          0 804   0  13  21   0   0 119   0   1   0
##          1   2 959   1   4   0   0   1   0   0   0
##          2  16   3 747   7  74   0  79   0   5   0
##          3  55  30  16 922  45   1  47   0   4   0
##          4   7   3 133  24 809   0  66   0   3   0
##          5   1   0   0   0   0 954   0  12   1   7
##          6 103   2  87  16  68   0 677   0   6   1
##          7   1   0   0   0   0  32   0 975   5  48
##          8  11   3   3   6   4   2  11   0 975   0
##          9   0   0   0   0   0  11   0  13   0 944
## 
## Overall Statistics
##                                              
##                Accuracy : 0.877              
##                  95% CI : (0.87, 0.883)      
##     No Information Rate : 0.1                
##     P-Value [Acc > NIR] : <0.0000000000000002
##                                              
##                   Kappa : 0.863              
##                                              
##  Mcnemar's Test P-Value : NA                 
## 
## Statistics by Class:
## 
##                      Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity            0.8040   0.9590   0.7470   0.9220   0.8090   0.9540
## Specificity            0.9829   0.9991   0.9796   0.9780   0.9738   0.9977
## Pos Pred Value         0.8392   0.9917   0.8024   0.8232   0.7742   0.9785
## Neg Pred Value         0.9783   0.9955   0.9721   0.9912   0.9787   0.9949
## Prevalence             0.1000   0.1000   0.1000   0.1000   0.1000   0.1000
## Detection Rate         0.0804   0.0959   0.0747   0.0922   0.0809   0.0954
## Detection Prevalence   0.0958   0.0967   0.0931   0.1120   0.1045   0.0975
## Balanced Accuracy      0.8934   0.9791   0.8633   0.9500   0.8914   0.9758
##                      Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity            0.6770   0.9750   0.9750   0.9440
## Specificity            0.9686   0.9904   0.9956   0.9973
## Pos Pred Value         0.7052   0.9189   0.9606   0.9752
## Neg Pred Value         0.9643   0.9972   0.9972   0.9938
## Prevalence             0.1000   0.1000   0.1000   0.1000
## Detection Rate         0.0677   0.0975   0.0975   0.0944
## Detection Prevalence   0.0960   0.1061   0.1015   0.0968
## Balanced Accuracy      0.8228   0.9827   0.9853   0.9707
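
The by-class figures above can also be pulled programmatically from the caret object, which makes the class 6 weakness easy to spot; for example, storing the result of confusionMatrix() and extracting the sensitivities:

cm <- confusionMatrix(pred, y_transformed)
round(cm$byClass[, 'Sensitivity'], 3)
# class 6 ('Shirt') stands out at 0.677; every other class is above 0.74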