This document revisits the imbalanced Lyme disease data set of the top 80 genes by fold change in acute versus chronic samples. The target has 4 classes that are not equally represented: one class has only 10 samples while the other three have between 20 and 30 samples each. This time we are going to use deep neural networks with Keras in R.
I have recently been following Dr. Bharatendra Rai, a doctorate in data science on YouTube, and his videos were a great way to review the Keras deep learning projects I did in Python years ago. While searching Kaggle for data, his link came up for time series in R; this project won't use time series but instead draws on the deep learning material from his many other videos. Tutorials can be confusing if you don't follow along closely or are given instructions that omit details about the layers, but with instant access to AI tools those gaps can be filled in quickly. You can subscribe to his channel or search YouTube for his name to find many well explained data science videos.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(keras3)
library(tensorflow)
##
## Attaching package: 'tensorflow'
## The following objects are masked from 'package:keras3':
##
## set_random_seed, shape
Lyme <- read.csv('ML_lyme_DF_86X81.csv', header=T, na.strings=c('',' ','na','NA'))
str(Lyme)
## 'data.frame': 86 obs. of 81 variables:
## $ NEK6 : num -0.3508 -0.1491 -0.0436 0.5418 -0.4119 ...
## $ HECTD1 : num -0.302 0.521 -1.802 0.547 0.531 ...
## $ PDE4DIP : num -0.5338 -0.111 0.04 -0.0543 0.0395 ...
## $ POU2F2 : num -0.305 -0.258 -0.0195 -0.0769 -0.1503 ...
## $ TMX3 : num -0.0214 0.2596 -0.61 0.234 0.4109 ...
## $ KAT5 : num 0.2 0.434 0.377 0.37 -0.159 ...
## $ PLCXD1 : num -0.5578 -0.5126 -0.334 0.0917 -0.4348 ...
## $ MPDU1 : num -0.31 -0.444 -0.122 -0.09 -0.354 ...
## $ PCSK1N : num -0.0774 -0.0476 0.7991 -0.0582 -0.3024 ...
## $ TCEAL4 : num 0.5637 -0.318 -1.2768 0.282 0.0424 ...
## $ KRT36 : num -0.05664 -0.199899 0.185492 0.000144 -0.155894 ...
## $ HSD11B1 : num -0.0724 0.0489 0.1243 -0.0985 -0.1971 ...
## $ LOC100287314 : num -0.1245 0.1503 -0.1013 0.0786 -0.1708 ...
## $ CHGA : num 0.0631 -0.2733 1.2062 -0.2496 -0.2776 ...
## $ STK38L : num -0.7317 0.0727 -0.3871 0.3623 0.0709 ...
## $ BRD3 : num 0.0203 0.2553 0.126 0.313 0.4716 ...
## $ TMSB4Y : num 0.3061 -0.5535 0.0623 -0.2572 0.2259 ...
## $ GGT7 : num -0.156 0.111 -0.0523 -0.4699 0.2977 ...
## $ RNF217 : num -0.315 1.019 -0.152 -0.205 0.305 ...
## $ ALAS2 : num 0.14 2.008 -0.517 -0.554 -0.762 ...
## $ LILRA6 : num -0.571 -0.804 -1.225 -0.696 -0.374 ...
## $ ADRBK2 : num -0.7166 -0.0239 -0.5431 0.0757 -0.2658 ...
## $ TAS2R3 : num 0.2325 0.1031 -0.036 -0.0213 -0.2399 ...
## $ C1orf51 : num 0.1973 -0.1743 0.2388 0.1867 -0.0457 ...
## $ CFTR : num 0.25 -0.272 0.751 -0.632 -0.32 ...
## $ HS3ST3B1 : num -0.336 0.323 0.194 0.392 0.282 ...
## $ LRRCC1 : num -0.0765 -0.0728 -0.0306 0.1316 0.0904 ...
## $ MPO : num -0.0461 -0.3983 -0.0862 -0.0156 -0.249 ...
## $ LOC653562 : num -0.158 1.295 0.837 -0.192 -0.407 ...
## $ TPBG : num -0.287 0.947 0.276 -0.4 -0.591 ...
## $ SCN11A : num -0.0107 0.0328 1.0342 -0.0841 -0.1941 ...
## $ VSIG8 : num 0.627 -0.198 -0.332 0.423 -0.334 ...
## $ CD2AP : num -0.0392 -0.1189 0.0636 0.1281 0.119 ...
## $ FLJ23867 : num -0.4395 0.4936 -0.0292 -0.3027 0.0265 ...
## $ BEGAIN : num -0.3016 -0.1186 1.0694 -0.0139 -0.2875 ...
## $ MGC34800 : num -0.0912 -0.2683 -0.0734 -0.019 -0.2323 ...
## $ P2RX7 : num -0.699 0.453 -0.534 -0.214 0.141 ...
## $ WDR46 : num 0.0299 -0.1665 -0.2608 -0.1359 -0.1444 ...
## $ GSC : num 0.3248 -0.1677 0.407 0.0255 0.3996 ...
## $ C3orf42 : num 0.0669 -0.1948 0.4071 0.0616 -0.4221 ...
## $ MTMR14 : num -0.0949 -0.0122 -0.1157 0.1285 -0.0997 ...
## $ FBXO48 : num 0.173 -0.098 -0.393 0.239 -0.116 ...
## $ H2AFJ : num -0.633 -0.0892 -0.715 0.1485 0.3033 ...
## $ KRTAP10.7 : num 0.734 0.257 1.111 -0.148 -0.135 ...
## $ SNORA74A : num 0.676 0.139 0.872 0.114 -0.38 ...
## $ RASAL1 : num 0.11742 -0.00149 -0.02482 -0.23536 0.028 ...
## $ ENSG00000229196: num 0.75822 -0.00579 0.93324 -0.02652 0.18702 ...
## $ FBXO9 : num 0.0192 -0.0072 -0.6633 -0.0984 0.025 ...
## $ SOX10 : num 0.0862 -0.1704 -0.1274 -0.0798 -0.3639 ...
## $ C1orf127 : num 0.246 0.336 1.465 -0.343 -0.128 ...
## $ F13A1 : num -0.605 0.36 -0.218 -0.598 -0.223 ...
## $ OR10A3 : num -0.06915 0.05162 0.24496 0.00137 -0.28854 ...
## $ LOC145757 : num -0.0569 -0.04769 0.00873 -0.09158 -0.1248 ...
## $ FAT2 : num 0.1919 -0.3219 0.3047 -0.0593 -0.1934 ...
## $ OOEP : num 0.177491 -0.000566 0.668561 0.199253 -0.173194 ...
## $ EIF2C2 : num 0.4964 0.2373 -0.7133 0.3628 -0.0516 ...
## $ MMP13 : num 0.0434 -0.0802 0.2907 -0.0312 -0.1319 ...
## $ MAP2K3 : num -0.7315 0.0363 0.1161 -0.0728 -0.1372 ...
## $ BRSK1 : num -0.604 0.0353 0.3551 -0.164 -0.3456 ...
## $ RHOBTB2 : num 0.1453 0.1134 0.6744 0.1848 0.0606 ...
## $ Septin.6 : num -0.1437 -0.3411 -0.0162 -0.4229 -0.0351 ...
## $ STRA8 : num 0.8642 0.0254 0.3683 0.3999 -0.0766 ...
## $ ATP2C1 : num -0.361 0.125 -0.659 0.119 0.149 ...
## $ OR14A16 : num 0.0286 -0.0641 0.4736 0.0261 -0.3375 ...
## $ AHSG : num 0.192 -0.22 0.92 -0.15 -0.277 ...
## $ ORC6L : num 0.19 -1.369 -0.131 -0.742 0.143 ...
## $ TLX1NB : num 0.502 -0.256 0.807 -0.475 -0.145 ...
## $ MTHFD2 : num 0.35248 0.31067 -1.05314 -0.01933 0.00216 ...
## $ GMPPB : num 0.0119 0.3596 0.5872 0.3397 0.0886 ...
## $ RGPD3 : num -0.000246 0.262285 -0.040798 -0.375937 0.190607 ...
## $ FXR1 : num 0.012 0.4993 -0.3251 0.0827 0.3152 ...
## $ GJA5 : num 0.3345 0.1924 0.4793 -0.0485 -0.1555 ...
## $ ZNF836 : num 0.0683 -0.0257 0.1888 -0.057 -0.1004 ...
## $ C16orf48 : num 0.2156 -0.0478 0.4009 0.3981 -0.21 ...
## $ DNAJC24 : num -0.4986 0.0249 -0.5918 0.3067 -0.0755 ...
## $ TSHR : num -0.056 0.1009 0.3179 0.0243 -0.0942 ...
## $ MTA3 : num 0.239 -0.491 -0.483 0.221 0.871 ...
## $ NDST3 : num -0.0178 0.0633 0.1542 -0.064 -0.0408 ...
## $ OR12D2 : num 0.2254 -0.1449 0.066 0.135 -0.0408 ...
## $ MKL1 : num -0.33057 0.12237 0.00884 0.07873 -0.08919 ...
## $ class : chr "healthy" "healthy" "healthy" "healthy" ...
Change the class feature, which is our target, from a character into a factor.
Lyme$class <- as.factor(Lyme$class)
Let's look at the class imbalance in the target using dplyr. In the source video the instructor used table(), as shown below; you could use that instead if preferred and skip dplyr and tidyr, since table() is part of base R.
table(Lyme$class)
##
## 1 month 6 month acute healthy
## 27 10 28 21
Lyme %>% group_by(class) %>% count(class)
## # A tibble: 4 × 2
## # Groups: class [4]
## class n
## <fct> <int>
## 1 1 month 27
## 2 6 month 10
## 3 acute 28
## 4 healthy 21
barplot(prop.table(table(Lyme$class)), col = rainbow(4), ylab='Proportion', xlab='class',cex.names=1.5)
The bar plot above shows the proportion of each class relative to the total number of samples. The class feature is the target that our deep neural network, yet to be built, will predict.
Next, convert the Lyme disease data to a matrix, remove the default dimension names, and normalize it for the Keras neural networks. The target class, the last of the 81 features in the data frame, must be removed first: a matrix can hold only one type of data, such as numeric, integer, or character.
DNN_matrix <- as.matrix(Lyme[,1:80])
dimnames(DNN_matrix) <- NULL
DNN_matrix <- normalize(DNN_matrix)
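Worth noting: the normalize() utility L2-normalizes along the last axis by default, which rescales each sample row to unit length rather than standardizing each gene. If per-gene standardization is preferred, base R's scale() is an alternative; a sketch, not used in the runs below:
# Alternative preprocessing sketch: standardize each gene (column) to
# mean 0 and sd 1 instead of L2-normalizing each sample (row).
DNN_matrix_scaled <- scale(as.matrix(Lyme[, 1:80]))
dimnames(DNN_matrix_scaled) <- NULL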
We now have our predictor matrix, but the target class was only dropped from it, not stored separately. We can easily make the target vector from the original data read in as Lyme.
TargetFeature <- Lyme$class
str(TargetFeature)
## Factor w/ 4 levels "1 month","6 month",..: 4 4 4 4 4 4 4 4 4 4 ...
summary(TargetFeature)
## 1 month 6 month acute healthy
## 27 10 28 21
set.seed(1234)
ind <- sample(1:86, .7*86)
training <- DNN_matrix[ind,]
testing <- DNN_matrix[-ind,]
trainingtarget <- TargetFeature[ind]
testingtarget <- TargetFeature[-ind]
summary(trainingtarget)
## 1 month 6 month acute healthy
## 16 7 20 17
summary(testingtarget)
## 1 month 6 month acute healthy
## 11 3 8 4
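One caveat about this random split: only 3 of the 10 "6 month" samples end up in the test set. A stratified split would keep each class proportionally represented in both sets; a base R sketch, not used in the runs that follow:
# Sketch of a stratified 70/30 split: sample 70% within each class so the
# rare "6 month" group is proportionally represented in both sets.
set.seed(1234)
ind_strat <- unlist(lapply(split(seq_len(nrow(Lyme)), Lyme$class),
                           function(i) sample(i, floor(0.7 * length(i)))))
training_strat <- DNN_matrix[ind_strat, ]
testing_strat  <- DNN_matrix[-ind_strat, ]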
Turn the target into one-hot (binary) categorical labels.
trainLabels <- to_categorical(trainingtarget) #turns labels into binary categories
testLabels <- to_categorical(testingtarget)
testLabels[1:10,]
## [,1] [,2] [,3] [,4]
## [1,] 0 0 0 1
## [2,] 0 0 0 1
## [3,] 0 0 0 1
## [4,] 0 0 0 1
## [5,] 0 0 1 0
## [6,] 0 0 1 0
## [7,] 0 0 1 0
## [8,] 0 0 1 0
## [9,] 0 0 1 0
## [10,] 0 0 1 0
summary(testingtarget)
## 1 month 6 month acute healthy
## 11 3 8 4
summary(testLabels)
## V1 V2 V3 V4
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.4231 Mean :0.1154 Mean :0.3077 Mean :0.1538
## 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
That summary is not useful for mapping the columns back to class labels. Instead, count the 1s in each column and match the counts to the class tallies.
sum(testLabels[,1]==1)
## [1] 11
That count of 11 tells me the 1st column is 1 month.
sum(testLabels[,2]==1)
## [1] 3
The 2nd column is 6 months.
sum(testLabels[,3]==1)
## [1] 8
The 3rd column is acute, and the 4th by elimination is healthy. Let's store these class names in a vector to use later when comparing prediction accuracy against the actual test data.
classesOfCategories <- c("1 month","6 months","acute","healthy")
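Two notes here. First, the one-hot columns follow the factor's level order, so the class names can also be read straight from the factor rather than deduced from column sums:
# The one-hot columns follow the factor level order, so the levels
# ("1 month", "6 month", "acute", "healthy") name the columns directly.
levels(TargetFeature)
Second, the hand-typed vector above says "6 months" while the factor level is "6 month" (no s). The model never picks that column as the row maximum in these runs, so the accuracy numbers are unaffected, but taking the names from levels() would avoid that kind of silent mismatch.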
Our data has 4 classes, and each row of these label matrices has exactly one 1 marking the class of that observation. The test target matrix below has 26 rows, one per test observation, which will be used to evaluate the model our yet-to-be-built DNN learns from the 60 training samples.
testLabels
## [,1] [,2] [,3] [,4]
## [1,] 0 0 0 1
## [2,] 0 0 0 1
## [3,] 0 0 0 1
## [4,] 0 0 0 1
## [5,] 0 0 1 0
## [6,] 0 0 1 0
## [7,] 0 0 1 0
## [8,] 0 0 1 0
## [9,] 0 0 1 0
## [10,] 0 0 1 0
## [11,] 0 0 1 0
## [12,] 0 0 1 0
## [13,] 1 0 0 0
## [14,] 1 0 0 0
## [15,] 1 0 0 0
## [16,] 1 0 0 0
## [17,] 1 0 0 0
## [18,] 1 0 0 0
## [19,] 1 0 0 0
## [20,] 1 0 0 0
## [21,] 1 0 0 0
## [22,] 1 0 0 0
## [23,] 1 0 0 0
## [24,] 0 1 0 0
## [25,] 0 1 0 0
## [26,] 0 1 0 0
trainLabels
## [,1] [,2] [,3] [,4]
## [1,] 0 0 1 0
## [2,] 0 1 0 0
## [3,] 0 0 1 0
## [4,] 0 0 0 1
## [5,] 0 0 0 1
## [6,] 0 0 1 0
## [7,] 0 0 0 1
## [8,] 0 0 0 1
## [9,] 1 0 0 0
## [10,] 0 0 0 1
## [11,] 1 0 0 0
## [12,] 1 0 0 0
## [13,] 0 1 0 0
## [14,] 1 0 0 0
## [15,] 0 0 0 1
## [16,] 0 0 1 0
## [17,] 1 0 0 0
## [18,] 1 0 0 0
## [19,] 0 1 0 0
## [20,] 1 0 0 0
## [21,] 0 0 1 0
## [22,] 1 0 0 0
## [23,] 1 0 0 0
## [24,] 0 0 0 1
## [25,] 0 0 1 0
## [26,] 0 0 1 0
## [27,] 0 0 0 1
## [28,] 0 0 1 0
## [29,] 0 0 1 0
## [30,] 1 0 0 0
## [31,] 1 0 0 0
## [32,] 0 0 1 0
## [33,] 0 0 0 1
## [34,] 0 0 1 0
## [35,] 0 0 1 0
## [36,] 0 0 1 0
## [37,] 0 0 0 1
## [38,] 0 0 0 1
## [39,] 0 1 0 0
## [40,] 0 0 0 1
## [41,] 0 0 1 0
## [42,] 1 0 0 0
## [43,] 0 1 0 0
## [44,] 0 0 1 0
## [45,] 0 0 0 1
## [46,] 1 0 0 0
## [47,] 0 0 1 0
## [48,] 0 1 0 0
## [49,] 0 0 1 0
## [50,] 0 1 0 0
## [51,] 1 0 0 0
## [52,] 0 0 1 0
## [53,] 0 0 0 1
## [54,] 1 0 0 0
## [55,] 0 0 1 0
## [56,] 0 0 0 1
## [57,] 0 0 1 0
## [58,] 1 0 0 0
## [59,] 0 0 0 1
## [60,] 0 0 0 1
model <- keras_model_sequential()
Next, use the pipe operator to connect fully connected (dense) layers into the network. The first hidden layer has 8 neurons with the 'relu' activation, a rectified linear unit and the most popular activation function, and an input shape of 80 for our 80 independent variables. The %>% pipe connects it to a second layer with 4 units, one per class in the target, using the softmax activation, the most suitable choice for multi-class output.
set.seed(123)
model %>%
layer_dense(units=8,activation='relu', input_shape=c(80)) %>%
layer_dense(units=4, activation='softmax')
Running this produced a user warning asking for a change: when using Sequential models, Keras prefers an Input(shape) object as the first layer in the model instead of passing input_shape to the first dense layer.
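One way to address the warning, per the message's recommendation, is to declare the input on the sequential model itself rather than via input_shape on the first dense layer; a sketch, assuming keras3's keras_model_sequential() accepts input_shape, and not the form used in the runs here:
# Sketch: declare the input shape on the model, as the warning suggests,
# instead of passing input_shape to the first dense layer.
model <- keras_model_sequential(input_shape = c(80)) %>%
  layer_dense(units = 8, activation = 'relu') %>%
  layer_dense(units = 4, activation = 'softmax')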
summary(model)
## Model: "sequential"
## ┌───────────────────────────────────┬──────────────────────────┬───────────────
## │ Layer (type) │ Output Shape │ Param #
## ├───────────────────────────────────┼──────────────────────────┼───────────────
## │ dense (Dense) │ (None, 8) │ 648
## ├───────────────────────────────────┼──────────────────────────┼───────────────
## │ dense_1 (Dense) │ (None, 4) │ 36
## └───────────────────────────────────┴──────────────────────────┴───────────────
## Total params: 684 (2.67 KB)
## Trainable params: 684 (2.67 KB)
## Non-trainable params: 0 (0.00 B)
Running summary() produces a table whose first row has output shape (None, 8) and 648 parameters, which is 80 inputs × 8 nodes + 8 bias units = 648. The second row, layer dense_1 (also of type Dense like the first), has output shape (None, 4) and 36 parameters: 8 nodes from the first layer × 4 output nodes + 4 bias terms = 36. Below the table, the totals show 684 parameters, all 684 of them trainable and 0 non-trainable.
Next, compile the model; for multi-class classification the most suitable loss is categorical crossentropy.
set.seed(123)
model %>%
compile(loss='categorical_crossentropy',
optimizer='adam',
metrics='accuracy')
Let's fit the model to our training data and store the history. Make sure the batch size yields enough batches per epoch for the data available: with 60 training samples and a batch size of 6, each epoch runs 10 batches (the 10/10 shown per epoch below).
set.seed(123)
history <- fit(model, x=training,
y=trainLabels,
epochs=10,
batch_size = 6)
## Epoch 1/10
## 10/10 - 0s - 44ms/step - accuracy: 0.1833 - loss: 1.4001
## Epoch 2/10
## 10/10 - 0s - 3ms/step - accuracy: 0.2000 - loss: 1.3856
## Epoch 3/10
## 10/10 - 0s - 3ms/step - accuracy: 0.2667 - loss: 1.3728
## Epoch 4/10
## 10/10 - 0s - 3ms/step - accuracy: 0.2833 - loss: 1.3612
## Epoch 5/10
## 10/10 - 0s - 3ms/step - accuracy: 0.3500 - loss: 1.3501
## Epoch 6/10
## 10/10 - 0s - 3ms/step - accuracy: 0.3833 - loss: 1.3397
## Epoch 7/10
## 10/10 - 0s - 3ms/step - accuracy: 0.4167 - loss: 1.3291
## Epoch 8/10
## 10/10 - 0s - 3ms/step - accuracy: 0.4833 - loss: 1.3182
## Epoch 9/10
## 10/10 - 0s - 3ms/step - accuracy: 0.5000 - loss: 1.3068
## Epoch 10/10
## 10/10 - 0s - 6ms/step - accuracy: 0.5333 - loss: 1.2957
plot(history)
It looks good: accuracy increases while loss decreases across the 10 epochs with a batch size of 6. Let's try more epochs and a smaller batch size. Note that calling fit() again continues training the same model from its current weights, so the next run builds on this one rather than starting fresh.
set.seed(123)
history <- fit(model, x=training,
y=trainLabels,
epochs=15,
batch_size = 4)
## Epoch 1/15
## 15/15 - 0s - 14ms/step - accuracy: 0.5667 - loss: 1.2818
## Epoch 2/15
## 15/15 - 0s - 2ms/step - accuracy: 0.6000 - loss: 1.2635
## Epoch 3/15
## 15/15 - 0s - 2ms/step - accuracy: 0.6167 - loss: 1.2460
## Epoch 4/15
## 15/15 - 0s - 2ms/step - accuracy: 0.6167 - loss: 1.2250
## Epoch 5/15
## 15/15 - 0s - 3ms/step - accuracy: 0.6167 - loss: 1.2046
## Epoch 6/15
## 15/15 - 0s - 2ms/step - accuracy: 0.6333 - loss: 1.1846
## Epoch 7/15
## 15/15 - 0s - 2ms/step - accuracy: 0.6333 - loss: 1.1644
## Epoch 8/15
## 15/15 - 0s - 3ms/step - accuracy: 0.6833 - loss: 1.1420
## Epoch 9/15
## 15/15 - 0s - 2ms/step - accuracy: 0.7000 - loss: 1.1206
## Epoch 10/15
## 15/15 - 0s - 4ms/step - accuracy: 0.7000 - loss: 1.0984
## Epoch 11/15
## 15/15 - 0s - 2ms/step - accuracy: 0.7000 - loss: 1.0759
## Epoch 12/15
## 15/15 - 0s - 2ms/step - accuracy: 0.7167 - loss: 1.0530
## Epoch 13/15
## 15/15 - 0s - 2ms/step - accuracy: 0.7333 - loss: 1.0313
## Epoch 14/15
## 15/15 - 0s - 3ms/step - accuracy: 0.7333 - loss: 1.0101
## Epoch 15/15
## 15/15 - 0s - 3ms/step - accuracy: 0.7333 - loss: 0.9889
plot(history)
After the further 15 epochs with a batch size of 4, the model improved to above 70% training accuracy. Now evaluate this model on the testing set of 26 samples.
set.seed(123)
model %>%
evaluate(testing,testLabels)
## 1/1 - 0s - 100ms/step - accuracy: 0.4231 - loss: 1.1701
## $accuracy
## [1] 0.4230769
##
## $loss
## [1] 1.170123
Evaluating the fitted model on the testing set gave 42% accuracy with a loss of about 1.2 on the normalized data in the first run; a second run gave 31%. set.seed() doesn't really seem to help.
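A likely explanation: set.seed() seeds only R's random number generator, while Keras and TensorFlow initialize weights and shuffle batches from their own generators. Both keras3 and tensorflow export set_random_seed() (visible in the package-attach messages at the top), which their documentation describes as seeding R, Python, NumPy, and TensorFlow together; a sketch:
# Sketch: seed R and the TensorFlow/Keras backends in one call before
# building and fitting the model, instead of relying on set.seed() alone.
keras3::set_random_seed(123)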
Let's use this model to classify the testing set with the predict function.
set.seed(123)
pred <- model %>%
predict(testing)
## 1/1 - 0s - 33ms/step
pred
## [,1] [,2] [,3] [,4]
## [1,] 0.2052311 0.26876014 0.1015893 0.42441946
## [2,] 0.2790193 0.20253578 0.1981393 0.32030568
## [3,] 0.1868092 0.26065952 0.1632926 0.38923863
## [4,] 0.2887329 0.13820300 0.2614725 0.31159166
## [5,] 0.3281626 0.13632673 0.3497936 0.18571706
## [6,] 0.4159975 0.16103640 0.1635657 0.25940049
## [7,] 0.2037498 0.14383301 0.5088997 0.14351737
## [8,] 0.3194025 0.18580954 0.2918125 0.20297545
## [9,] 0.3040723 0.15074617 0.3895478 0.15563369
## [10,] 0.3941506 0.11098588 0.3908422 0.10402124
## [11,] 0.3874596 0.17675640 0.2469133 0.18887067
## [12,] 0.1305620 0.20648573 0.3761832 0.28676903
## [13,] 0.4103707 0.07876841 0.4388428 0.07201809
## [14,] 0.3641017 0.10263371 0.3935627 0.13970180
## [15,] 0.2955174 0.17513637 0.3464928 0.18285340
## [16,] 0.2968966 0.18567362 0.3029403 0.21448940
## [17,] 0.3307474 0.14276147 0.3448962 0.18159500
## [18,] 0.3178866 0.19428258 0.2717184 0.21611241
## [19,] 0.2351886 0.21626684 0.2846549 0.26388970
## [20,] 0.2197315 0.11352270 0.5407102 0.12603557
## [21,] 0.3818920 0.07789458 0.4448255 0.09538786
## [22,] 0.3090226 0.25591794 0.1417481 0.29331139
## [23,] 0.4538968 0.13142340 0.1757105 0.23896928
## [24,] 0.2850437 0.17970487 0.3216736 0.21357781
## [25,] 0.3826675 0.18594480 0.2039630 0.22742471
## [26,] 0.1975918 0.24264613 0.2876964 0.27206576
Each row's predicted class is the column holding the highest value. The pred matrix is 26 samples by 4 class columns. Let's attach the classesOfCategories names we made earlier to the pred table so we can read off the class with the highest probability in each row.
predicted1 <- data.frame(pred)
colnames(predicted1) <- classesOfCategories
predicted1
## 1 month 6 months acute healthy
## 1 0.2052311 0.26876014 0.1015893 0.42441946
## 2 0.2790193 0.20253578 0.1981393 0.32030568
## 3 0.1868092 0.26065952 0.1632926 0.38923863
## 4 0.2887329 0.13820300 0.2614725 0.31159166
## 5 0.3281626 0.13632673 0.3497936 0.18571706
## 6 0.4159975 0.16103640 0.1635657 0.25940049
## 7 0.2037498 0.14383301 0.5088997 0.14351737
## 8 0.3194025 0.18580954 0.2918125 0.20297545
## 9 0.3040723 0.15074617 0.3895478 0.15563369
## 10 0.3941506 0.11098588 0.3908422 0.10402124
## 11 0.3874596 0.17675640 0.2469133 0.18887067
## 12 0.1305620 0.20648573 0.3761832 0.28676903
## 13 0.4103707 0.07876841 0.4388428 0.07201809
## 14 0.3641017 0.10263371 0.3935627 0.13970180
## 15 0.2955174 0.17513637 0.3464928 0.18285340
## 16 0.2968966 0.18567362 0.3029403 0.21448940
## 17 0.3307474 0.14276147 0.3448962 0.18159500
## 18 0.3178866 0.19428258 0.2717184 0.21611241
## 19 0.2351886 0.21626684 0.2846549 0.26388970
## 20 0.2197315 0.11352270 0.5407102 0.12603557
## 21 0.3818920 0.07789458 0.4448255 0.09538786
## 22 0.3090226 0.25591794 0.1417481 0.29331139
## 23 0.4538968 0.13142340 0.1757105 0.23896928
## 24 0.2850437 0.17970487 0.3216736 0.21357781
## 25 0.3826675 0.18594480 0.2039630 0.22742471
## 26 0.1975918 0.24264613 0.2876964 0.27206576
Let's combine the pred matrix with the actual testLabels matrix into a data frame that is 26 × 8 in shape.
df_results1 <- data.frame(Predicted = pred, Actual=testLabels)
df_results1
## Predicted.1 Predicted.2 Predicted.3 Predicted.4 Actual.1 Actual.2 Actual.3
## 1 0.2052311 0.26876014 0.1015893 0.42441946 0 0 0
## 2 0.2790193 0.20253578 0.1981393 0.32030568 0 0 0
## 3 0.1868092 0.26065952 0.1632926 0.38923863 0 0 0
## 4 0.2887329 0.13820300 0.2614725 0.31159166 0 0 0
## 5 0.3281626 0.13632673 0.3497936 0.18571706 0 0 1
## 6 0.4159975 0.16103640 0.1635657 0.25940049 0 0 1
## 7 0.2037498 0.14383301 0.5088997 0.14351737 0 0 1
## 8 0.3194025 0.18580954 0.2918125 0.20297545 0 0 1
## 9 0.3040723 0.15074617 0.3895478 0.15563369 0 0 1
## 10 0.3941506 0.11098588 0.3908422 0.10402124 0 0 1
## 11 0.3874596 0.17675640 0.2469133 0.18887067 0 0 1
## 12 0.1305620 0.20648573 0.3761832 0.28676903 0 0 1
## 13 0.4103707 0.07876841 0.4388428 0.07201809 1 0 0
## 14 0.3641017 0.10263371 0.3935627 0.13970180 1 0 0
## 15 0.2955174 0.17513637 0.3464928 0.18285340 1 0 0
## 16 0.2968966 0.18567362 0.3029403 0.21448940 1 0 0
## 17 0.3307474 0.14276147 0.3448962 0.18159500 1 0 0
## 18 0.3178866 0.19428258 0.2717184 0.21611241 1 0 0
## 19 0.2351886 0.21626684 0.2846549 0.26388970 1 0 0
## 20 0.2197315 0.11352270 0.5407102 0.12603557 1 0 0
## 21 0.3818920 0.07789458 0.4448255 0.09538786 1 0 0
## 22 0.3090226 0.25591794 0.1417481 0.29331139 1 0 0
## 23 0.4538968 0.13142340 0.1757105 0.23896928 1 0 0
## 24 0.2850437 0.17970487 0.3216736 0.21357781 0 1 0
## 25 0.3826675 0.18594480 0.2039630 0.22742471 0 1 0
## 26 0.1975918 0.24264613 0.2876964 0.27206576 0 1 0
## Actual.4
## 1 1
## 2 1
## 3 1
## 4 1
## 5 0
## 6 0
## 7 0
## 8 0
## 9 0
## 10 0
## 11 0
## 12 0
## 13 0
## 14 0
## 15 0
## 16 0
## 17 0
## 18 0
## 19 0
## 20 0
## 21 0
## 22 0
## 23 0
## 24 0
## 25 0
## 26 0
The data frame above, pairing the predicted class probabilities with the actual one-hot class matrix, is hard to read: the actuals carry a single 1 in the class column and zeros elsewhere. To summarize it we would need one column giving each row's maximum predicted probability as a class name and another recovering the actual class from the one-hot columns. The guide showed table() returning a confusion matrix, but because pred holds probabilities rather than class labels, table() here just cross-tabulates every distinct probability value, as the output below shows.
table(predicted=pred, actual=testLabels)
## actual
## predicted 0 1
## 0.0720180869102478 1 0
## 0.0778945758938789 1 0
## 0.0787684097886086 1 0
## 0.0953878611326218 1 0
## 0.101589344441891 1 0
## 0.102633714675903 1 0
## 0.104021243751049 1 0
## 0.1109858751297 1 0
## 0.113522700965405 1 0
## 0.126035571098328 1 0
## 0.130562007427216 1 0
## 0.131423398852348 1 0
## 0.136326730251312 1 0
## 0.138202995061874 1 0
## 0.139701798558235 1 0
## 0.14174810051918 1 0
## 0.142761468887329 1 0
## 0.143517374992371 1 0
## 0.143833011388779 1 0
## 0.150746166706085 1 0
## 0.155633687973022 1 0
## 0.161036401987076 1 0
## 0.163292646408081 1 0
## 0.163565680384636 0 1
## 0.175136372447014 1 0
## 0.175710469484329 1 0
## 0.176756396889687 1 0
## 0.179704874753952 0 1
## 0.181594997644424 1 0
## 0.182853400707245 1 0
## 0.185673624277115 1 0
## 0.185717061161995 1 0
## 0.185809537768364 1 0
## 0.185944795608521 0 1
## 0.186809226870537 1 0
## 0.188870668411255 1 0
## 0.194282576441765 1 0
## 0.197591751813889 1 0
## 0.198139324784279 1 0
## 0.202535778284073 1 0
## 0.202975451946259 1 0
## 0.20374983549118 1 0
## 0.203963026404381 1 0
## 0.205231115221977 1 0
## 0.206485733389854 1 0
## 0.213577806949615 1 0
## 0.21448940038681 1 0
## 0.216112405061722 1 0
## 0.216266840696335 1 0
## 0.219731524586678 0 1
## 0.227424710988998 1 0
## 0.235188573598862 0 1
## 0.238969281315804 1 0
## 0.242646127939224 0 1
## 0.2469132989645 0 1
## 0.255917936563492 1 0
## 0.259400486946106 1 0
## 0.260659515857697 1 0
## 0.26147249341011 1 0
## 0.263889700174332 1 0
## 0.268760144710541 1 0
## 0.271718442440033 1 0
## 0.272065758705139 1 0
## 0.279019266366959 1 0
## 0.284654915332794 1 0
## 0.285043686628342 1 0
## 0.286769032478333 1 0
## 0.28769639134407 1 0
## 0.28873285651207 1 0
## 0.291812509298325 0 1
## 0.293311387300491 1 0
## 0.295517385005951 0 1
## 0.296896636486053 0 1
## 0.302940338850021 1 0
## 0.304072320461273 1 0
## 0.309022605419159 0 1
## 0.311591655015945 0 1
## 0.317886620759964 0 1
## 0.319402515888214 1 0
## 0.320305675268173 0 1
## 0.321673601865768 1 0
## 0.328162550926208 1 0
## 0.330747365951538 0 1
## 0.344896197319031 1 0
## 0.346492826938629 1 0
## 0.349793642759323 0 1
## 0.364101678133011 0 1
## 0.376183241605759 0 1
## 0.381891995668411 0 1
## 0.382667452096939 1 0
## 0.387459635734558 1 0
## 0.389238625764847 0 1
## 0.389547795057297 0 1
## 0.390842229127884 0 1
## 0.393562734127045 1 0
## 0.394150644540787 1 0
## 0.410370737314224 0 1
## 0.415997505187988 1 0
## 0.424419462680817 0 1
## 0.4388427734375 1 0
## 0.444825530052185 1 0
## 0.453896820545197 0 1
## 0.508899748325348 0 1
## 0.540710151195526 1 0
That is no confusion matrix. Instead, let's work with the predicted1 data frame and the testingtarget class names (the test labels before one-hot encoding): take the maximum entry of each row in predicted1 and compare it to the actual test label.
predicted1$predicted = ifelse(
(predicted1$"1 month"> predicted1$"6 months") &
(predicted1$"1 month" > predicted1$acute) &
(predicted1$"1 month"> predicted1$healthy)
,
"1 month",(
ifelse(
(predicted1$"6 months" > predicted1$"1 month") &
(predicted1$"6 months" > predicted1$acute) &
(predicted1$"6 months" > predicted1$healthy)
,
"6 months",
ifelse(
(predicted1$acute > predicted1$"1 month") &
(predicted1$acute > predicted1$"6 months") &
(predicted1$acute > predicted1$healthy)
,
"acute",
"healthy"))
)
)
predicted1
## 1 month 6 months acute healthy predicted
## 1 0.2052311 0.26876014 0.1015893 0.42441946 healthy
## 2 0.2790193 0.20253578 0.1981393 0.32030568 healthy
## 3 0.1868092 0.26065952 0.1632926 0.38923863 healthy
## 4 0.2887329 0.13820300 0.2614725 0.31159166 healthy
## 5 0.3281626 0.13632673 0.3497936 0.18571706 acute
## 6 0.4159975 0.16103640 0.1635657 0.25940049 1 month
## 7 0.2037498 0.14383301 0.5088997 0.14351737 acute
## 8 0.3194025 0.18580954 0.2918125 0.20297545 1 month
## 9 0.3040723 0.15074617 0.3895478 0.15563369 acute
## 10 0.3941506 0.11098588 0.3908422 0.10402124 1 month
## 11 0.3874596 0.17675640 0.2469133 0.18887067 1 month
## 12 0.1305620 0.20648573 0.3761832 0.28676903 acute
## 13 0.4103707 0.07876841 0.4388428 0.07201809 acute
## 14 0.3641017 0.10263371 0.3935627 0.13970180 acute
## 15 0.2955174 0.17513637 0.3464928 0.18285340 acute
## 16 0.2968966 0.18567362 0.3029403 0.21448940 acute
## 17 0.3307474 0.14276147 0.3448962 0.18159500 acute
## 18 0.3178866 0.19428258 0.2717184 0.21611241 1 month
## 19 0.2351886 0.21626684 0.2846549 0.26388970 acute
## 20 0.2197315 0.11352270 0.5407102 0.12603557 acute
## 21 0.3818920 0.07789458 0.4448255 0.09538786 acute
## 22 0.3090226 0.25591794 0.1417481 0.29331139 1 month
## 23 0.4538968 0.13142340 0.1757105 0.23896928 1 month
## 24 0.2850437 0.17970487 0.3216736 0.21357781 acute
## 25 0.3826675 0.18594480 0.2039630 0.22742471 1 month
## 26 0.1975918 0.24264613 0.2876964 0.27206576 acute
Now that the tedious nested ifelse() work for four classes is done, compare across the rows of the output above to check that the new 'predicted' feature holds the class with the highest probability, and you will see it does.
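For reference, base R's max.col() gives a much more compact route to the same result; a sketch, equivalent to the nested ifelse() when there are no ties:
# Sketch: take the column index of each row's maximum probability and map
# it to a class name; ties.method = "first" keeps the result deterministic.
predicted1$predicted <- classesOfCategories[max.col(pred, ties.method = "first")]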
Let's now combine this with the actual labels in testingtarget as a data frame and get the accuracy.
results1 <- data.frame(predicted=predicted1$predicted,actual=testingtarget)
results1
## predicted actual
## 1 healthy healthy
## 2 healthy healthy
## 3 healthy healthy
## 4 healthy healthy
## 5 acute acute
## 6 1 month acute
## 7 acute acute
## 8 1 month acute
## 9 acute acute
## 10 1 month acute
## 11 1 month acute
## 12 acute acute
## 13 acute 1 month
## 14 acute 1 month
## 15 acute 1 month
## 16 acute 1 month
## 17 acute 1 month
## 18 1 month 1 month
## 19 acute 1 month
## 20 acute 1 month
## 21 acute 1 month
## 22 1 month 1 month
## 23 1 month 1 month
## 24 acute 6 month
## 25 1 month 6 month
## 26 acute 6 month
Let's compute the accuracy as well.
sum(results1$predicted==results1$actual)/length(results1$predicted)
## [1] 0.4230769
The accuracy is 42% on this run in RStudio, but a rerun gives 38%.
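As an aside, now that predicted and actual are plain label vectors rather than probability matrices, the table() call from the guide does produce a genuine confusion matrix:
# With class labels instead of probabilities, table() cross-tabulates
# predicted versus actual classes into a real confusion matrix.
table(predicted = results1$predicted, actual = results1$actual)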
*** More layers in the Deep Neural Network built in Keras ***
Let's see if adding more hidden layers improves the accuracy. It should if this behaves like the demonstration, but let's see how it fares on this completely different data.
set.seed(123)
model2 <- keras_model_sequential()
set.seed(123)
model2 %>%
layer_dense(units=40,activation='relu', input_shape=c(80)) %>%
layer_dropout(rate=0.4) %>%
layer_dense(units=30,activation='relu') %>%
layer_dropout(rate=0.3) %>%
layer_dense(units=20,activation='relu') %>%
layer_dropout(rate=0.2) %>%
layer_dense(units=4, activation='softmax')
summary(model2)
## Model: "sequential_1"
## ┌───────────────────────────────────┬──────────────────────────┬───────────────
## │ Layer (type) │ Output Shape │ Param #
## ├───────────────────────────────────┼──────────────────────────┼───────────────
## │ dense_2 (Dense) │ (None, 40) │ 3,240
## ├───────────────────────────────────┼──────────────────────────┼───────────────
## │ dropout (Dropout) │ (None, 40) │ 0
## ├───────────────────────────────────┼──────────────────────────┼───────────────
## │ dense_3 (Dense) │ (None, 30) │ 1,230
## ├───────────────────────────────────┼──────────────────────────┼───────────────
## │ dropout_1 (Dropout) │ (None, 30) │ 0
## ├───────────────────────────────────┼──────────────────────────┼───────────────
## │ dense_4 (Dense) │ (None, 20) │ 620
## ├───────────────────────────────────┼──────────────────────────┼───────────────
## │ dropout_2 (Dropout) │ (None, 20) │ 0
## ├───────────────────────────────────┼──────────────────────────┼───────────────
## │ dense_5 (Dense) │ (None, 4) │ 84
## └───────────────────────────────────┴──────────────────────────┴───────────────
## Total params: 5,174 (20.21 KB)
## Trainable params: 5,174 (20.21 KB)
## Non-trainable params: 0 (0.00 B)
set.seed(123)
model2 %>%
compile(loss='categorical_crossentropy',
optimizer='adam',
metrics='accuracy')
set.seed(123)
history2a <- fit(model2, x=training,
y=trainLabels,
epochs=10,
batch_size = 6)
## Epoch 1/10
## 10/10 - 1s - 64ms/step - accuracy: 0.2167 - loss: 1.3906
## Epoch 2/10
## 10/10 - 0s - 3ms/step - accuracy: 0.2833 - loss: 1.3745
## Epoch 3/10
## 10/10 - 0s - 3ms/step - accuracy: 0.4500 - loss: 1.3540
## Epoch 4/10
## 10/10 - 0s - 3ms/step - accuracy: 0.3500 - loss: 1.3463
## Epoch 5/10
## 10/10 - 0s - 4ms/step - accuracy: 0.4667 - loss: 1.3255
## Epoch 6/10
## 10/10 - 0s - 3ms/step - accuracy: 0.3667 - loss: 1.3241
## Epoch 7/10
## 10/10 - 0s - 4ms/step - accuracy: 0.5167 - loss: 1.2762
## Epoch 8/10
## 10/10 - 0s - 3ms/step - accuracy: 0.3833 - loss: 1.2620
## Epoch 9/10
## 10/10 - 0s - 3ms/step - accuracy: 0.5167 - loss: 1.2209
## Epoch 10/10
## 10/10 - 0s - 6ms/step - accuracy: 0.5333 - loss: 1.2025
plot(history2a)
set.seed(123)
history2b <- fit(model2, x=training,
y=trainLabels,
epochs=15,
batch_size = 4)
## Epoch 1/15
## 15/15 - 0s - 24ms/step - accuracy: 0.5833 - loss: 1.1743
## Epoch 2/15
## 15/15 - 0s - 3ms/step - accuracy: 0.5167 - loss: 1.1611
## Epoch 3/15
## 15/15 - 0s - 2ms/step - accuracy: 0.5333 - loss: 1.1083
## Epoch 4/15
## 15/15 - 0s - 3ms/step - accuracy: 0.6833 - loss: 1.0461
## Epoch 5/15
## 15/15 - 0s - 3ms/step - accuracy: 0.5667 - loss: 1.0067
## Epoch 6/15
## 15/15 - 0s - 3ms/step - accuracy: 0.5667 - loss: 0.9694
## Epoch 7/15
## 15/15 - 0s - 3ms/step - accuracy: 0.6500 - loss: 0.9225
## Epoch 8/15
## 15/15 - 0s - 3ms/step - accuracy: 0.7000 - loss: 0.8196
## Epoch 9/15
## 15/15 - 0s - 3ms/step - accuracy: 0.6833 - loss: 0.7906
## Epoch 10/15
## 15/15 - 0s - 5ms/step - accuracy: 0.7167 - loss: 0.7361
## Epoch 11/15
## 15/15 - 0s - 2ms/step - accuracy: 0.7500 - loss: 0.7746
## Epoch 12/15
## 15/15 - 0s - 3ms/step - accuracy: 0.6667 - loss: 0.8199
## Epoch 13/15
## 15/15 - 0s - 2ms/step - accuracy: 0.7500 - loss: 0.7011
## Epoch 14/15
## 15/15 - 0s - 3ms/step - accuracy: 0.7667 - loss: 0.6379
## Epoch 15/15
## 15/15 - 0s - 2ms/step - accuracy: 0.7833 - loss: 0.6380
plot(history2b)
set.seed(123)
model2 %>%
evaluate(testing,testLabels)
## 1/1 - 0s - 112ms/step - accuracy: 0.4231 - loss: 1.0618
## $accuracy
## [1] 0.4230769
##
## $loss
## [1] 1.061782
set.seed(123)
pred2 <- model %>%
predict(testing)
## 1/1 - 0s - 22ms/step
pred2
## [,1] [,2] [,3] [,4]
## [1,] 0.2052311 0.26876014 0.1015893 0.42441946
## [2,] 0.2790193 0.20253578 0.1981393 0.32030568
## [3,] 0.1868092 0.26065952 0.1632926 0.38923863
## [4,] 0.2887329 0.13820300 0.2614725 0.31159166
## [5,] 0.3281626 0.13632673 0.3497936 0.18571706
## [6,] 0.4159975 0.16103640 0.1635657 0.25940049
## [7,] 0.2037498 0.14383301 0.5088997 0.14351737
## [8,] 0.3194025 0.18580954 0.2918125 0.20297545
## [9,] 0.3040723 0.15074617 0.3895478 0.15563369
## [10,] 0.3941506 0.11098588 0.3908422 0.10402124
## [11,] 0.3874596 0.17675640 0.2469133 0.18887067
## [12,] 0.1305620 0.20648573 0.3761832 0.28676903
## [13,] 0.4103707 0.07876841 0.4388428 0.07201809
## [14,] 0.3641017 0.10263371 0.3935627 0.13970180
## [15,] 0.2955174 0.17513637 0.3464928 0.18285340
## [16,] 0.2968966 0.18567362 0.3029403 0.21448940
## [17,] 0.3307474 0.14276147 0.3448962 0.18159500
## [18,] 0.3178866 0.19428258 0.2717184 0.21611241
## [19,] 0.2351886 0.21626684 0.2846549 0.26388970
## [20,] 0.2197315 0.11352270 0.5407102 0.12603557
## [21,] 0.3818920 0.07789458 0.4448255 0.09538786
## [22,] 0.3090226 0.25591794 0.1417481 0.29331139
## [23,] 0.4538968 0.13142340 0.1757105 0.23896928
## [24,] 0.2850437 0.17970487 0.3216736 0.21357781
## [25,] 0.3826675 0.18594480 0.2039630 0.22742471
## [26,] 0.1975918 0.24264613 0.2876964 0.27206576
predicted2 <- data.frame(pred2)
colnames(predicted2) <- classesOfCategories
predicted2
## 1 month 6 months acute healthy
## 1 0.2052311 0.26876014 0.1015893 0.42441946
## 2 0.2790193 0.20253578 0.1981393 0.32030568
## 3 0.1868092 0.26065952 0.1632926 0.38923863
## 4 0.2887329 0.13820300 0.2614725 0.31159166
## 5 0.3281626 0.13632673 0.3497936 0.18571706
## 6 0.4159975 0.16103640 0.1635657 0.25940049
## 7 0.2037498 0.14383301 0.5088997 0.14351737
## 8 0.3194025 0.18580954 0.2918125 0.20297545
## 9 0.3040723 0.15074617 0.3895478 0.15563369
## 10 0.3941506 0.11098588 0.3908422 0.10402124
## 11 0.3874596 0.17675640 0.2469133 0.18887067
## 12 0.1305620 0.20648573 0.3761832 0.28676903
## 13 0.4103707 0.07876841 0.4388428 0.07201809
## 14 0.3641017 0.10263371 0.3935627 0.13970180
## 15 0.2955174 0.17513637 0.3464928 0.18285340
## 16 0.2968966 0.18567362 0.3029403 0.21448940
## 17 0.3307474 0.14276147 0.3448962 0.18159500
## 18 0.3178866 0.19428258 0.2717184 0.21611241
## 19 0.2351886 0.21626684 0.2846549 0.26388970
## 20 0.2197315 0.11352270 0.5407102 0.12603557
## 21 0.3818920 0.07789458 0.4448255 0.09538786
## 22 0.3090226 0.25591794 0.1417481 0.29331139
## 23 0.4538968 0.13142340 0.1757105 0.23896928
## 24 0.2850437 0.17970487 0.3216736 0.21357781
## 25 0.3826675 0.18594480 0.2039630 0.22742471
## 26 0.1975918 0.24264613 0.2876964 0.27206576
Note that pred2 above is identical to pred: the predict() call was piped from model rather than model2, so these probabilities come from the first network, not the deeper one. As before, combine predictions and actual labels into a data frame 26 × 8 in shape.
df_results2 <- data.frame(Predicted = pred2, Actual=testLabels)
df_results2
## Predicted.1 Predicted.2 Predicted.3 Predicted.4 Actual.1 Actual.2 Actual.3
## 1 0.2052311 0.26876014 0.1015893 0.42441946 0 0 0
## 2 0.2790193 0.20253578 0.1981393 0.32030568 0 0 0
## 3 0.1868092 0.26065952 0.1632926 0.38923863 0 0 0
## 4 0.2887329 0.13820300 0.2614725 0.31159166 0 0 0
## 5 0.3281626 0.13632673 0.3497936 0.18571706 0 0 1
## 6 0.4159975 0.16103640 0.1635657 0.25940049 0 0 1
## 7 0.2037498 0.14383301 0.5088997 0.14351737 0 0 1
## 8 0.3194025 0.18580954 0.2918125 0.20297545 0 0 1
## 9 0.3040723 0.15074617 0.3895478 0.15563369 0 0 1
## 10 0.3941506 0.11098588 0.3908422 0.10402124 0 0 1
## 11 0.3874596 0.17675640 0.2469133 0.18887067 0 0 1
## 12 0.1305620 0.20648573 0.3761832 0.28676903 0 0 1
## 13 0.4103707 0.07876841 0.4388428 0.07201809 1 0 0
## 14 0.3641017 0.10263371 0.3935627 0.13970180 1 0 0
## 15 0.2955174 0.17513637 0.3464928 0.18285340 1 0 0
## 16 0.2968966 0.18567362 0.3029403 0.21448940 1 0 0
## 17 0.3307474 0.14276147 0.3448962 0.18159500 1 0 0
## 18 0.3178866 0.19428258 0.2717184 0.21611241 1 0 0
## 19 0.2351886 0.21626684 0.2846549 0.26388970 1 0 0
## 20 0.2197315 0.11352270 0.5407102 0.12603557 1 0 0
## 21 0.3818920 0.07789458 0.4448255 0.09538786 1 0 0
## 22 0.3090226 0.25591794 0.1417481 0.29331139 1 0 0
## 23 0.4538968 0.13142340 0.1757105 0.23896928 1 0 0
## 24 0.2850437 0.17970487 0.3216736 0.21357781 0 1 0
## 25 0.3826675 0.18594480 0.2039630 0.22742471 0 1 0
## 26 0.1975918 0.24264613 0.2876964 0.27206576 0 1 0
## Actual.4
## 1 1
## 2 1
## 3 1
## 4 1
## 5 0
## 6 0
## 7 0
## 8 0
## 9 0
## 10 0
## 11 0
## 12 0
## 13 0
## 14 0
## 15 0
## 16 0
## 17 0
## 18 0
## 19 0
## 20 0
## 21 0
## 22 0
## 23 0
## 24 0
## 25 0
## 26 0
table(predicted=pred2, actual=testLabels)
## actual
## predicted 0 1
## 0.0720180869102478 1 0
## 0.0778945758938789 1 0
## 0.0787684097886086 1 0
## 0.0953878611326218 1 0
## 0.101589344441891 1 0
## 0.102633714675903 1 0
## 0.104021243751049 1 0
## 0.1109858751297 1 0
## 0.113522700965405 1 0
## 0.126035571098328 1 0
## 0.130562007427216 1 0
## 0.131423398852348 1 0
## 0.136326730251312 1 0
## 0.138202995061874 1 0
## 0.139701798558235 1 0
## 0.14174810051918 1 0
## 0.142761468887329 1 0
## 0.143517374992371 1 0
## 0.143833011388779 1 0
## 0.150746166706085 1 0
## 0.155633687973022 1 0
## 0.161036401987076 1 0
## 0.163292646408081 1 0
## 0.163565680384636 0 1
## 0.175136372447014 1 0
## 0.175710469484329 1 0
## 0.176756396889687 1 0
## 0.179704874753952 0 1
## 0.181594997644424 1 0
## 0.182853400707245 1 0
## 0.185673624277115 1 0
## 0.185717061161995 1 0
## 0.185809537768364 1 0
## 0.185944795608521 0 1
## 0.186809226870537 1 0
## 0.188870668411255 1 0
## 0.194282576441765 1 0
## 0.197591751813889 1 0
## 0.198139324784279 1 0
## 0.202535778284073 1 0
## 0.202975451946259 1 0
## 0.20374983549118 1 0
## 0.203963026404381 1 0
## 0.205231115221977 1 0
## 0.206485733389854 1 0
## 0.213577806949615 1 0
## 0.21448940038681 1 0
## 0.216112405061722 1 0
## 0.216266840696335 1 0
## 0.219731524586678 0 1
## 0.227424710988998 1 0
## 0.235188573598862 0 1
## 0.238969281315804 1 0
## 0.242646127939224 0 1
## 0.2469132989645 0 1
## 0.255917936563492 1 0
## 0.259400486946106 1 0
## 0.260659515857697 1 0
## 0.26147249341011 1 0
## 0.263889700174332 1 0
## 0.268760144710541 1 0
## 0.271718442440033 1 0
## 0.272065758705139 1 0
## 0.279019266366959 1 0
## 0.284654915332794 1 0
## 0.285043686628342 1 0
## 0.286769032478333 1 0
## 0.28769639134407 1 0
## 0.28873285651207 1 0
## 0.291812509298325 0 1
## 0.293311387300491 1 0
## 0.295517385005951 0 1
## 0.296896636486053 0 1
## 0.302940338850021 1 0
## 0.304072320461273 1 0
## 0.309022605419159 0 1
## 0.311591655015945 0 1
## 0.317886620759964 0 1
## 0.319402515888214 1 0
## 0.320305675268173 0 1
## 0.321673601865768 1 0
## 0.328162550926208 1 0
## 0.330747365951538 0 1
## 0.344896197319031 1 0
## 0.346492826938629 1 0
## 0.349793642759323 0 1
## 0.364101678133011 0 1
## 0.376183241605759 0 1
## 0.381891995668411 0 1
## 0.382667452096939 1 0
## 0.387459635734558 1 0
## 0.389238625764847 0 1
## 0.389547795057297 0 1
## 0.390842229127884 0 1
## 0.393562734127045 1 0
## 0.394150644540787 1 0
## 0.410370737314224 0 1
## 0.415997505187988 1 0
## 0.424419462680817 0 1
## 0.4388427734375 1 0
## 0.444825530052185 1 0
## 0.453896820545197 0 1
## 0.508899748325348 0 1
## 0.540710151195526 1 0
predicted2$predicted = ifelse(
(predicted2$"1 month"> predicted2$"6 months") &
(predicted2$"1 month" > predicted2$acute) &
(predicted2$"1 month"> predicted2$healthy)
,
"1 month",(
ifelse(
(predicted2$"6 months" > predicted2$"1 month") &
(predicted2$"6 months" > predicted2$acute) &
(predicted2$"6 months" > predicted2$healthy)
,
"6 months",
ifelse(
(predicted2$acute > predicted2$"1 month") &
(predicted2$acute > predicted2$"6 months") &
(predicted2$acute > predicted2$healthy)
,
"acute",
"healthy"))
)
)
predicted2
## 1 month 6 months acute healthy predicted
## 1 0.2052311 0.26876014 0.1015893 0.42441946 healthy
## 2 0.2790193 0.20253578 0.1981393 0.32030568 healthy
## 3 0.1868092 0.26065952 0.1632926 0.38923863 healthy
## 4 0.2887329 0.13820300 0.2614725 0.31159166 healthy
## 5 0.3281626 0.13632673 0.3497936 0.18571706 acute
## 6 0.4159975 0.16103640 0.1635657 0.25940049 1 month
## 7 0.2037498 0.14383301 0.5088997 0.14351737 acute
## 8 0.3194025 0.18580954 0.2918125 0.20297545 1 month
## 9 0.3040723 0.15074617 0.3895478 0.15563369 acute
## 10 0.3941506 0.11098588 0.3908422 0.10402124 1 month
## 11 0.3874596 0.17675640 0.2469133 0.18887067 1 month
## 12 0.1305620 0.20648573 0.3761832 0.28676903 acute
## 13 0.4103707 0.07876841 0.4388428 0.07201809 acute
## 14 0.3641017 0.10263371 0.3935627 0.13970180 acute
## 15 0.2955174 0.17513637 0.3464928 0.18285340 acute
## 16 0.2968966 0.18567362 0.3029403 0.21448940 acute
## 17 0.3307474 0.14276147 0.3448962 0.18159500 acute
## 18 0.3178866 0.19428258 0.2717184 0.21611241 1 month
## 19 0.2351886 0.21626684 0.2846549 0.26388970 acute
## 20 0.2197315 0.11352270 0.5407102 0.12603557 acute
## 21 0.3818920 0.07789458 0.4448255 0.09538786 acute
## 22 0.3090226 0.25591794 0.1417481 0.29331139 1 month
## 23 0.4538968 0.13142340 0.1757105 0.23896928 1 month
## 24 0.2850437 0.17970487 0.3216736 0.21357781 acute
## 25 0.3826675 0.18594480 0.2039630 0.22742471 1 month
## 26 0.1975918 0.24264613 0.2876964 0.27206576 acute
Then combine with the actual labels and get the accuracy.
results2 <- data.frame(predicted=predicted2$predicted,actual=testingtarget)
results2
## predicted actual
## 1 healthy healthy
## 2 healthy healthy
## 3 healthy healthy
## 4 healthy healthy
## 5 acute acute
## 6 1 month acute
## 7 acute acute
## 8 1 month acute
## 9 acute acute
## 10 1 month acute
## 11 1 month acute
## 12 acute acute
## 13 acute 1 month
## 14 acute 1 month
## 15 acute 1 month
## 16 acute 1 month
## 17 acute 1 month
## 18 1 month 1 month
## 19 acute 1 month
## 20 acute 1 month
## 21 acute 1 month
## 22 1 month 1 month
## 23 1 month 1 month
## 24 acute 6 month
## 25 1 month 6 month
## 26 acute 6 month
sum(results2$predicted==results2$actual)/length(results2$predicted)
## [1] 0.4230769
The results were the same 42% on the first run with no changes; a second run after closing and reopening RStudio gave 38%, and another run dropped to 31%, but runs tend to land at about the same accuracy whether the network has one hidden layer or several. Bear in mind, too, that the predict() call above reused model, so pred2 merely reproduces pred; the evaluate() call on model2 itself, however, independently returned the same 42%. There was no real improvement from these settings. The demo data had over 2,000 observations of 21 features, while this set has 86 observations of 80 features. The demonstration scored well, above 70 to 85%, but this data did not do well at all with more or fewer hidden layers, staying around 42% accuracy in classifying 4 classes from the top 80 genes by up- and down-regulation fold change in acute versus chronic infection. We also used 30% of the data to test and 70% to train, whereas the other models that scored better in random forest used an 80/20 split.
When I restart RStudio with .rs.restartR(), the 2nd model predicted with 46% accuracy on the testing set and the 1st model scored exactly the same. Restarting again gave 26.9% accuracy.
Compare this to random forest on the same data, which reached 87% accuracy predicting the 2 classes of acute or chronic infection and 61% on 4 classes. That was better, and less stressful than finding workarounds to run the Python Keras package in R through reticulate and TensorFlow, which at best gave 46% and never produced the same result twice, even with the same set.seed() value and after restarting RStudio. The reason is that set.seed() only seeds R's random number generator; the weight initialization and batch shuffling inside TensorFlow use their own generators, which is what the set_random_seed() sketch earlier addresses. It could also involve any of the dependency packages. My takeaway from this project is to be wary of using a package not made specifically for R. When the goal was classifying this noisy Lyme disease pathology data rather than retrieving the best gene targets, the worst run came in at 16%.
Deep neural networks, like any machine learning algorithm, work better with large samples. Class weighting was meant to handle the imbalance here, but I just remembered that I didn't add it to the run with the extra hidden layers, so let's backtrack and do that now (note the fit below continues training model, the original single-hidden-layer network, from its earlier weights). Even without class weights, a Keras DNN with these settings and this data set performs better than PCA on the same data, but not better than random forest. To build the weights, take the class with the most samples, compute the ratio of its count to each other class's count, and pass those weights to fit() with the same epochs and batch_size as earlier, 15 and 4, then compare the training results against the testing set.
table(Lyme$class)
##
## 1 month 6 month acute healthy
## 27 10 28 21
28/27
## [1] 1.037037
28/10
## [1] 2.8
28/28
## [1] 1
28/21
## [1] 1.333333
The weights for our classes are 1.04 for 1 month, 2.8 for 6 month, 1 for acute, and 1.33 for healthy.
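One caution before fitting: the Keras documentation describes class_weight as keyed by 0-based class index rather than by class name, so a mapping like the sketch below may be the safer form; the assumption here is that the index order follows the one-hot columns, i.e., the factor levels. The run below keeps the named-list form as originally written.
# Sketch (assumption): key the weights by 0-based class index, in the
# one-hot column order of 1 month, 6 month, acute, healthy.
wts <- list("0" = 28/27, "1" = 28/10, "2" = 28/28, "3" = 28/21)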
history <- model %>%
fit(training,
trainLabels,
epochs=15,
batch_size=4,
class_weight = list("1 month"=1.0 , "6 month" = 2.8, "acute" = 1, "healthy" = 1.3),
validation_split=0.2)
## Epoch 1/15
## 12/12 - 0s - 21ms/step - accuracy: 0.7083 - loss: 0.9753 - val_accuracy: 0.8333 - val_loss: 0.9309
## Epoch 2/15
## 12/12 - 0s - 6ms/step - accuracy: 0.7292 - loss: 0.9562 - val_accuracy: 0.8333 - val_loss: 0.9251
## Epoch 3/15
## 12/12 - 0s - 5ms/step - accuracy: 0.7292 - loss: 0.9358 - val_accuracy: 0.8333 - val_loss: 0.9208
## Epoch 4/15
## 12/12 - 0s - 5ms/step - accuracy: 0.7292 - loss: 0.9159 - val_accuracy: 0.8333 - val_loss: 0.9165
## Epoch 5/15
## 12/12 - 0s - 5ms/step - accuracy: 0.7500 - loss: 0.8965 - val_accuracy: 0.8333 - val_loss: 0.9119
## Epoch 6/15
## 12/12 - 0s - 5ms/step - accuracy: 0.7708 - loss: 0.8764 - val_accuracy: 0.8333 - val_loss: 0.9098
## Epoch 7/15
## 12/12 - 0s - 5ms/step - accuracy: 0.7708 - loss: 0.8584 - val_accuracy: 0.7500 - val_loss: 0.9056
## Epoch 8/15
## 12/12 - 0s - 6ms/step - accuracy: 0.7917 - loss: 0.8397 - val_accuracy: 0.7500 - val_loss: 0.9037
## Epoch 9/15
## 12/12 - 0s - 7ms/step - accuracy: 0.7917 - loss: 0.8227 - val_accuracy: 0.6667 - val_loss: 0.9015
## Epoch 10/15
## 12/12 - 0s - 8ms/step - accuracy: 0.7917 - loss: 0.8045 - val_accuracy: 0.6667 - val_loss: 0.9000
## Epoch 11/15
## 12/12 - 0s - 7ms/step - accuracy: 0.7917 - loss: 0.7883 - val_accuracy: 0.6667 - val_loss: 0.8980
## Epoch 12/15
## 12/12 - 0s - 7ms/step - accuracy: 0.7917 - loss: 0.7712 - val_accuracy: 0.6667 - val_loss: 0.8970
## Epoch 13/15
## 12/12 - 0s - 8ms/step - accuracy: 0.7917 - loss: 0.7547 - val_accuracy: 0.6667 - val_loss: 0.8959
## Epoch 14/15
## 12/12 - 0s - 7ms/step - accuracy: 0.8125 - loss: 0.7384 - val_accuracy: 0.6667 - val_loss: 0.8952
## Epoch 15/15
## 12/12 - 0s - 6ms/step - accuracy: 0.8125 - loss: 0.7231 - val_accuracy: 0.6667 - val_loss: 0.8949
plot(history)
The two panels of the plot above show the training data overfitting: training accuracy climbs above 80% while validation accuracy falls from 83% to 67%, and the validation loss barely improves from about 0.9.
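A common guard against this kind of overfitting, sketched here but not run, is an early-stopping callback that halts training once the validation loss stops improving:
# Sketch: stop training when val_loss fails to improve for 5 epochs and
# restore the best weights seen during the run.
history <- model %>% fit(
  training, trainLabels,
  epochs = 50, batch_size = 4, validation_split = 0.2,
  callbacks = list(callback_early_stopping(monitor = "val_loss",
                                           patience = 5,
                                           restore_best_weights = TRUE)))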
model %>%
evaluate(testing,testLabels)
## 1/1 - 0s - 26ms/step - accuracy: 0.4615 - loss: 1.1279
## $accuracy
## [1] 0.4615385
##
## $loss
## [1] 1.127879
The evaluation above shows the test set at 46% accuracy using those weights (an earlier run gave 35%). Note that these weights were computed from the whole data set, which makes sense, since weights derived from the training data would not match the class proportions in the testing data.
pred3 <- model %>%
predict(testing)
## 1/1 - 0s - 28ms/step
predicted3 <- data.frame(pred3)
colnames(predicted3) <- classesOfCategories
predicted3
## 1 month 6 months acute healthy
## 1 0.18336290 0.24976259 0.05131632 0.51555812
## 2 0.31384629 0.15927246 0.12510520 0.40177596
## 3 0.16288677 0.25216654 0.09917263 0.48577401
## 4 0.30378717 0.10722215 0.18391219 0.40507847
## 5 0.38008893 0.08921684 0.39885283 0.13184145
## 6 0.54656357 0.11279293 0.09290297 0.24774066
## 7 0.09173267 0.08418077 0.76146758 0.06261895
## 8 0.43070769 0.12563123 0.28966445 0.15399659
## 9 0.27664629 0.09612387 0.54023427 0.08699558
## 10 0.42635339 0.05576184 0.47579584 0.04208894
## 11 0.49636069 0.14190561 0.20291014 0.15882356
## 12 0.06201007 0.16015822 0.54153502 0.23629673
## 13 0.46293628 0.03903709 0.46798021 0.03004644
## 14 0.42015547 0.05712447 0.44840401 0.07431605
## 15 0.29740363 0.11302797 0.48539883 0.10416954
## 16 0.32085422 0.15431388 0.33389607 0.19093582
## 17 0.39013615 0.08714156 0.41611046 0.10661182
## 18 0.36020768 0.16188848 0.29956284 0.17834099
## 19 0.20054211 0.21565567 0.32921991 0.25458238
## 20 0.10336647 0.05916153 0.78646255 0.05100938
## 21 0.35799044 0.04485825 0.54336417 0.05378722
## 22 0.37479120 0.23029664 0.07659261 0.31831956
## 23 0.52260792 0.10317834 0.08461027 0.28960350
## 24 0.34858719 0.13816924 0.32686472 0.18637879
## 25 0.51155967 0.15035981 0.11311059 0.22496992
## 26 0.18707566 0.24802709 0.26121375 0.30368352
predicted3a <- data.frame(predicted3, testingtarget)
predicted3a
## X1.month X6.months acute healthy testingtarget
## 1 0.18336290 0.24976259 0.05131632 0.51555812 healthy
## 2 0.31384629 0.15927246 0.12510520 0.40177596 healthy
## 3 0.16288677 0.25216654 0.09917263 0.48577401 healthy
## 4 0.30378717 0.10722215 0.18391219 0.40507847 healthy
## 5 0.38008893 0.08921684 0.39885283 0.13184145 acute
## 6 0.54656357 0.11279293 0.09290297 0.24774066 acute
## 7 0.09173267 0.08418077 0.76146758 0.06261895 acute
## 8 0.43070769 0.12563123 0.28966445 0.15399659 acute
## 9 0.27664629 0.09612387 0.54023427 0.08699558 acute
## 10 0.42635339 0.05576184 0.47579584 0.04208894 acute
## 11 0.49636069 0.14190561 0.20291014 0.15882356 acute
## 12 0.06201007 0.16015822 0.54153502 0.23629673 acute
## 13 0.46293628 0.03903709 0.46798021 0.03004644 1 month
## 14 0.42015547 0.05712447 0.44840401 0.07431605 1 month
## 15 0.29740363 0.11302797 0.48539883 0.10416954 1 month
## 16 0.32085422 0.15431388 0.33389607 0.19093582 1 month
## 17 0.39013615 0.08714156 0.41611046 0.10661182 1 month
## 18 0.36020768 0.16188848 0.29956284 0.17834099 1 month
## 19 0.20054211 0.21565567 0.32921991 0.25458238 1 month
## 20 0.10336647 0.05916153 0.78646255 0.05100938 1 month
## 21 0.35799044 0.04485825 0.54336417 0.05378722 1 month
## 22 0.37479120 0.23029664 0.07659261 0.31831956 1 month
## 23 0.52260792 0.10317834 0.08461027 0.28960350 1 month
## 24 0.34858719 0.13816924 0.32686472 0.18637879 6 month
## 25 0.51155967 0.15035981 0.11311059 0.22496992 6 month
## 26 0.18707566 0.24802709 0.26121375 0.30368352 6 month
predicted3$predicted = ifelse(
(predicted3$"1 month"> predicted3$"6 months") &
(predicted3$"1 month" > predicted3$acute) &
(predicted3$"1 month"> predicted3$healthy)
,
"1 month",(
ifelse(
(predicted3$"6 months" > predicted3$"1 month") &
(predicted3$"6 months" > predicted3$acute) &
(predicted3$"6 months" > predicted3$healthy)
,
"6 months",
ifelse(
(predicted3$acute > predicted3$"1 month") &
(predicted3$acute > predicted3$"6 months") &
(predicted3$acute > predicted3$healthy)
,
"acute",
"healthy"))
)
)
predicted3
## 1 month 6 months acute healthy predicted
## 1 0.18336290 0.24976259 0.05131632 0.51555812 healthy
## 2 0.31384629 0.15927246 0.12510520 0.40177596 healthy
## 3 0.16288677 0.25216654 0.09917263 0.48577401 healthy
## 4 0.30378717 0.10722215 0.18391219 0.40507847 healthy
## 5 0.38008893 0.08921684 0.39885283 0.13184145 acute
## 6 0.54656357 0.11279293 0.09290297 0.24774066 1 month
## 7 0.09173267 0.08418077 0.76146758 0.06261895 acute
## 8 0.43070769 0.12563123 0.28966445 0.15399659 1 month
## 9 0.27664629 0.09612387 0.54023427 0.08699558 acute
## 10 0.42635339 0.05576184 0.47579584 0.04208894 acute
## 11 0.49636069 0.14190561 0.20291014 0.15882356 1 month
## 12 0.06201007 0.16015822 0.54153502 0.23629673 acute
## 13 0.46293628 0.03903709 0.46798021 0.03004644 acute
## 14 0.42015547 0.05712447 0.44840401 0.07431605 acute
## 15 0.29740363 0.11302797 0.48539883 0.10416954 acute
## 16 0.32085422 0.15431388 0.33389607 0.19093582 acute
## 17 0.39013615 0.08714156 0.41611046 0.10661182 acute
## 18 0.36020768 0.16188848 0.29956284 0.17834099 1 month
## 19 0.20054211 0.21565567 0.32921991 0.25458238 acute
## 20 0.10336647 0.05916153 0.78646255 0.05100938 acute
## 21 0.35799044 0.04485825 0.54336417 0.05378722 acute
## 22 0.37479120 0.23029664 0.07659261 0.31831956 1 month
## 23 0.52260792 0.10317834 0.08461027 0.28960350 1 month
## 24 0.34858719 0.13816924 0.32686472 0.18637879 1 month
## 25 0.51155967 0.15035981 0.11311059 0.22496992 1 month
## 26 0.18707566 0.24802709 0.26121375 0.30368352 healthy
Then combine with the actual labels and get the accuracy.
results3 <- data.frame(predicted=predicted3$predicted,actual=testingtarget)
results3
## predicted actual
## 1 healthy healthy
## 2 healthy healthy
## 3 healthy healthy
## 4 healthy healthy
## 5 acute acute
## 6 1 month acute
## 7 acute acute
## 8 1 month acute
## 9 acute acute
## 10 acute acute
## 11 1 month acute
## 12 acute acute
## 13 acute 1 month
## 14 acute 1 month
## 15 acute 1 month
## 16 acute 1 month
## 17 acute 1 month
## 18 1 month 1 month
## 19 acute 1 month
## 20 acute 1 month
## 21 acute 1 month
## 22 1 month 1 month
## 23 1 month 1 month
## 24 1 month 6 month
## 25 1 month 6 month
## 26 healthy 6 month
sum(results3$predicted==results3$actual)/length(results3$predicted)
## [1] 0.4615385
So even with class weights added to balance the samples, the limitations of this data kept the DNN from working well. Deep neural networks are designed for very big data: not just many features but gigabytes and terabytes of observations, as in facial recognition and sentiment analysis. Of course it wouldn't beat random forest in class prediction here, though it was better than PCA. At best, this 86 × 81 data frame was modeled to predict 4 classes with 46% accuracy, and you will have to take my word for it, because runs with cache clearing, RStudio restarts, and the same set.seed() of 123 kept producing a different accuracy measure. The parameters were limited as well, because fitting flagged errors and stopped when the epoch and batch-size settings demanded more than the number of observations in the sample. AI feedback and error messages within RStudio did suggest duplicating rows of the 86 × 81 feature space, say into an 86,000 × 81 data frame, but that wasn't tested here; a sketch follows below. That may be one way to exercise what a DNN offers. For now, PCA within randomForest gave at best 18% in class predictions of the 4 classes, while randomForest gave 65% prediction accuracy on 4 classes and, when reduced to 2 classes, 87%. See the previous Lyme disease projects in my profile for this data and those results.
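For completeness, the untested row-duplication idea would look something like the sketch below; duplicating rows only repeats the same information, so it inflates the row count without adding the diversity that deep networks exploit in genuinely large data.
# Sketch: naive oversampling by resampling rows with replacement (~10x).
# This grows the data frame but not its information content.
set.seed(1234)
idx_big  <- sample(seq_len(nrow(Lyme)), size = 10 * nrow(Lyme), replace = TRUE)
Lyme_big <- Lyme[idx_big, ]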
Thanks for following along.