6.2.1. Naive Bayes
We trained the Naive Bayes model on the training features and labels:
modelNB <- naiveBayes(x = train_features, y = as.factor(train_labels))
summary(modelNB)
##           Length Class  Mode
## apriori    2     table  numeric
## tables    32     -none- list
## levels     2     -none- character
## isnumeric 32     -none- logical
## call       3     -none- call
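We computed the confusion matrix on the training data set. The balanced accuracy of this method is 0.4937, which is no better than chance, and the overall accuracy (0.8636) is below the no-information rate (0.9616).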
prediction_NB_train <- predict(modelNB, as.matrix(dtm[train_id, ]))
cm_NB_train <- caret::confusionMatrix(as.factor(prediction_NB_train), as.factor(train_labels[train_id]))
cm_NB_train
## Confusion Matrix and Statistics
##
##           Reference
## Prediction     0     1
##          0   281  7994
##          1  2737 67661
##
## Accuracy : 0.8636
## 95% CI : (0.8612, 0.866)
## No Information Rate : 0.9616
## P-Value [Acc > NIR] : 1
##
## Kappa : -0.0068
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.093108
## Specificity : 0.894336
## Pos Pred Value : 0.033958
## Neg Pred Value : 0.961121
## Prevalence : 0.038361
## Detection Rate : 0.003572
## Detection Prevalence : 0.105182
## Balanced Accuracy : 0.493722
##
## 'Positive' Class : 0
##
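As a check on the reported figure, balanced accuracy is the mean of sensitivity and specificity, so it can be recomputed from the four counts in the confusion matrix above. The following is a minimal sketch in base R; the variable names are illustrative and not part of the original analysis.

```r
# Counts copied from the Naive Bayes confusion matrix above
# (rows = prediction, columns = reference; matrix() fills column by column).
cm <- matrix(c(281, 2737, 7994, 67661), nrow = 2,
             dimnames = list(Prediction = c("0", "1"),
                             Reference  = c("0", "1")))

# caret treats class 0 as the 'positive' class here.
sensitivity <- cm["0", "0"] / sum(cm[, "0"])  # share of true 0s predicted as 0
specificity <- cm["1", "1"] / sum(cm[, "1"])  # share of true 1s predicted as 1

balanced_accuracy <- (sensitivity + specificity) / 2
round(c(sensitivity = sensitivity,
        specificity = specificity,
        balanced_accuracy = balanced_accuracy), 6)
```

Because class 0 makes up only about 3.8% of the samples, plain accuracy is dominated by class 1, which is why balanced accuracy is the more informative figure here.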
6.2.2. Neural Network
We used a feed-forward neural network with 2 hidden layers.
- The first hidden layer has 128 nodes and uses the ‘relu’ activation function.
- The second hidden layer has 64 nodes and also uses the ‘relu’ activation function.
- We applied a dropout rate of 30% after each hidden layer.
- The output layer has 2 nodes and uses the ‘softmax’ activation function.
model_NN <- keras_model_sequential()
## Loaded Tensorflow version 2.7.0
model_NN %>%
  layer_dense(units = 128, activation = 'relu', input_shape = c(2^5)) %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 64, activation = 'relu') %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 2, activation = 'softmax')
summary(model_NN)
## Model: "sequential"
## ________________________________________________________________________________
##  Layer (type)                       Output Shape                    Param #
## ================================================================================
##  dense_2 (Dense)                    (None, 128)                     4224
##
##  dropout_1 (Dropout)                (None, 128)                     0
##
##  dense_1 (Dense)                    (None, 64)                      8256
##
##  dropout (Dropout)                  (None, 64)                      0
##
##  dense (Dense)                      (None, 2)                       130
##
## ================================================================================
## Total params: 12,610
## Trainable params: 12,610
## Non-trainable params: 0
## ________________________________________________________________________________
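The parameter counts in the summary can be verified by hand: a dense layer with n_in inputs and n_out units has n_in × n_out weights plus n_out biases, and dropout layers add no parameters. The sketch below redoes that arithmetic in base R; `dense_params` and `layer_sizes` are hypothetical names introduced only for this illustration.

```r
# Parameters of a fully connected layer: one weight per input-unit pair,
# plus one bias per unit.
dense_params <- function(n_in, n_out) n_in * n_out + n_out

# Input width followed by the width of each dense layer in the model above.
layer_sizes <- c(2^5, 128, 64, 2)

params <- mapply(dense_params,
                 head(layer_sizes, -1),   # inputs to each dense layer
                 tail(layer_sizes, -1))   # units in each dense layer
params       # 4224 8256 130
sum(params)  # 12610
```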
We compiled the model with the ‘adam’ optimizer and the ‘sparse_categorical_crossentropy’ loss function, and measured performance with the ‘accuracy’ metric.
model_NN %>% compile(
  optimizer = 'adam',
  loss = 'sparse_categorical_crossentropy',
  metrics = c('accuracy')
)
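To make the loss choice concrete: sparse categorical cross-entropy takes integer class labels (0 or 1 here) rather than one-hot vectors, and averages the negative log of the probability the model assigns to the true class. The following base R sketch illustrates the formula on made-up probabilities; it is not the Keras implementation, and `sparse_cce`, `probs`, and `labels` are hypothetical names.

```r
# Mean negative log-probability of the true class.
# probs:  matrix of predicted class probabilities, one row per sample
# labels: 0-based integer class labels, one per sample
sparse_cce <- function(probs, labels) {
  # Pick, for each row, the probability in the column of its true class.
  picked <- probs[cbind(seq_along(labels), labels + 1)]
  mean(-log(picked))
}

# Two illustrative samples (values invented for the example).
probs  <- rbind(c(0.9, 0.1),
                c(0.2, 0.8))
labels <- c(0, 1)

loss <- sparse_cce(probs, labels)
loss
```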
We trained the model for 10 epochs, holding out 20% of the training set as a validation set.
history <- model_NN %>% keras::fit(train_features, train_labels, epochs = 10, validation_split = 0.2, verbose = TRUE)
plot(history)
## `geom_smooth()` using formula 'y ~ x'

The loss history chart suggests that the model starts to overfit as the number of epochs increases. We then predicted on the training set and computed the confusion matrix. The balanced accuracy on the training set is 0.5000: the network assigns class 1 to all but 7 samples, so its sensitivity for class 0 is 0.
prediction_NN_train <- predict(model_NN, train_features)
prediction_NN_train <- argmax(prediction_NN_train) - 1
cm_NN_train <- caret::confusionMatrix(as.factor(prediction_NN_train), as.factor(train_labels[train_id]))
cm_NN_train
## Confusion Matrix and Statistics
##
## Reference
## Prediction     0     1
##          0     0     7
##          1  3018 75648
##
## Accuracy : 0.9615
## 95% CI : (0.9602, 0.9629)
## No Information Rate : 0.9616
## P-Value [Acc > NIR] : 0.5565
##
## Kappa : -2e-04
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.000e+00
## Specificity : 9.999e-01
## Pos Pred Value : 0.000e+00
## Neg Pred Value : 9.616e-01
## Prevalence : 3.836e-02
## Detection Rate : 0.000e+00
## Detection Prevalence : 8.898e-05
## Balanced Accuracy : 5.000e-01
##
## 'Positive' Class : 0
##
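The step that turns the network's softmax output into class labels before building the confusion matrix can be sketched in base R. This assumes that the `argmax` helper used above returns, for each row, the 1-based index of the column with the largest probability; its definition is not shown in this section, so the code below is an illustrative equivalent on made-up probabilities rather than the original helper.

```r
# Illustrative softmax output: one row per sample, columns = P(class 0), P(class 1).
probs <- rbind(c(0.10, 0.90),
               c(0.70, 0.30),
               c(0.40, 0.60))

# max.col() returns the 1-based column index of each row maximum;
# ties.method = "first" keeps the result deterministic.
# Subtracting 1 maps column indices 1/2 to class labels 0/1.
predicted_class <- max.col(probs, ties.method = "first") - 1
predicted_class  # 1 0 1
```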