We have acquired 250 images, of which 125 are cats and the other 125 are flowers. For our model, we first load and process the images.
Now that we are done loading, processing, and splitting the data, let's build the logistic regression model.
##   2   3  10  12  15  18  19  27  28  29  31  33  37  47  57  58  61  62  65  73
##   0   0   0   0   0   0   0   0   1   0   1   0   0   0   0   0   0   0   0   0
##  80  82  87  96 100 101 104 105 111 115 123 124 128 130 133 134 138 145 147 149
##   0   0   0   0   0   0   0   0   0   0   0   0   1   1   1   1   1   1   1   0
## 157 163 165 172 173 174 176 177 180 182 184 186 189 191 193 194 196 203 204 206
##   1   1   1   1   1   1   1   1   1   0   1   1   1   1   1   1   1   1   1   1
## 208 212 213 215 217 218 219 220 222 227 230 232 235 240 241
##   1   1   1   1   0   0   1   1   1   1   1   1   1   1   1
## [1] "Confusion Matrix:"
##          Actual
## Predicted  0  1
##         0 30  4
##         1  2 39
## [1] "Accuracy: 0.92"
## [1] "Precision: 0.9512"
## [1] "Recall: 0.907"
## [1] "F1 Score: 0.9286"
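The four scores above follow directly from the confusion matrix. As a minimal sketch in base R (the variable names `cm`, `tp`, `fp`, `fn`, `tn` are illustrative and not taken from the original code, and class 1 is assumed to be the positive class), they can be recomputed like this:

```r
# Confusion matrix as printed above: rows = Predicted, columns = Actual.
# matrix() fills column by column, so the first column is Actual = 0.
cm <- matrix(c(30, 2, 4, 39), nrow = 2,
             dimnames = list(Predicted = c("0", "1"), Actual = c("0", "1")))

tp <- cm["1", "1"]  # predicted 1, actually 1
fp <- cm["1", "0"]  # predicted 1, actually 0
fn <- cm["0", "1"]  # predicted 0, actually 1
tn <- cm["0", "0"]  # predicted 0, actually 0

accuracy  <- (tp + tn) / sum(cm)
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f1        <- 2 * precision * recall / (precision + recall)

print(round(c(accuracy = accuracy, precision = precision,
              recall = recall, f1 = f1), 4))
```

With 69 of 75 images classified correctly, the rounded values agree with the reported accuracy, precision, recall, and F1 score.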
From this evaluation, we see that the Log_Reg model performs well on a very small labelled dataset: accuracy 0.92 and F1 score 0.9286 on the 75 evaluated images. Next come the ROC curve and its AUC, followed by the features that contributed most to the prediction.
## [1] "AUC: 0.9578"
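The AUC is the probability that a randomly chosen positive image receives a higher score than a randomly chosen negative one. The sketch below computes it from ranks in base R; `auc_rank` is a hypothetical helper, and the scores and labels are toy values, not the predictions from this report. The original analysis may well have used a dedicated ROC package instead.

```r
# Rank-based AUC (equivalent to the Mann-Whitney U statistic).
# scores: predicted probabilities; labels: 0/1 ground truth.
auc_rank <- function(scores, labels) {
  r <- rank(scores)              # average ranks, so ties are handled
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example (illustrative values only)
toy_scores <- c(0.1, 0.4, 0.35, 0.8)
toy_labels <- c(0, 0, 1, 1)
toy_auc <- auc_rank(toy_scores, toy_labels)
print(toy_auc)  # 0.75
```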
## [1] "Top 10 important features:"
##                 feature coefficient      p_value
## X1                   X1 -246.479766 1.369728e-11
## X4                   X4  194.176820 1.920739e-11
## X3                   X3   54.202074 1.155562e-09
## X7                   X7   50.821998 4.097232e-02
## X6                   X6  -29.941606 1.871504e-01
## (Intercept) (Intercept)   -4.449820 4.791257e-03
## X2                   X2   -3.157375 6.454207e-01
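A table of this shape can be produced from the coefficient summary of a fitted `glm`. The following is a minimal sketch on simulated data, so the data frame `toy`, the object names, and the numbers it prints are all assumptions rather than the report's actual features. Rows are ordered by absolute coefficient, which appears to be the ordering used in the table above.

```r
set.seed(42)

# Simulated stand-in for the image features (hypothetical data)
n <- 200
toy <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
toy$y <- rbinom(n, 1, plogis(1.5 * toy$X1 - 1.0 * toy$X3))

# Logistic regression: binomial family with the default logit link
fit <- glm(y ~ ., data = toy, family = binomial)

# Collect estimates and p-values, then sort by absolute coefficient
coefs <- summary(fit)$coefficients
importance <- data.frame(
  feature     = rownames(coefs),
  coefficient = coefs[, "Estimate"],
  p_value     = coefs[, "Pr(>|z|)"]
)
importance <- importance[order(abs(importance$coefficient), decreasing = TRUE), ]
print(head(importance, 10))
```

Coefficient magnitudes are only comparable across features when the features share a common scale, so the p-value column is a useful second check when reading such a ranking.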
Before clustering, we first need to evaluate whether our data can be clustered at all. For that, we calculate the Hopkins statistic H.
If H is close to 1 (> 0.75), the data have a high tendency to cluster. If H is around 0.5, the data are randomly distributed. If H is close to 0, the data are uniformly (regularly) spaced.
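To make the interpretation of H concrete, here is a minimal base R sketch of the statistic. It is not the factoextra implementation: `hopkins_sketch` is a hypothetical helper, it samples uniform points from the bounding box of the data, and it uses raw nearest-neighbour distances, whereas some formulations raise the distances to the power of the data dimension.

```r
# Hopkins statistic sketch: H = sum(u) / (sum(u) + sum(w)), where
#   w = nearest-neighbour distances from m sampled real points to the other real points
#   u = nearest-neighbour distances from m uniform random points to the real points
hopkins_sketch <- function(X, m = ceiling(nrow(X) / 10)) {
  X <- as.matrix(X)
  n <- nrow(X)

  # Euclidean distance from point p to its nearest row in `data`
  nn_dist <- function(p, data) min(sqrt(rowSums(sweep(data, 2, p)^2)))

  idx <- sample(n, m)
  w <- sapply(idx, function(i) nn_dist(X[i, ], X[-i, , drop = FALSE]))

  # Uniform random points inside the bounding box of the data
  lo <- apply(X, 2, min)
  hi <- apply(X, 2, max)
  U <- matrix(sapply(seq_len(ncol(X)), function(j) runif(m, lo[j], hi[j])),
              nrow = m)
  u <- apply(U, 1, nn_dist, data = X)

  sum(u) / (sum(u) + sum(w))
}

# Toy check: two tight, well-separated blobs should give H close to 1
set.seed(1)
blobs <- rbind(matrix(rnorm(100, mean = 0, sd = 0.1), ncol = 2),
               matrix(rnorm(100, mean = 5, sd = 0.1), ncol = 2))
h_blobs <- hopkins_sketch(blobs, m = 20)
print(h_blobs)
```

In clustered data the real points sit close to each other (small w) while random points mostly fall in empty space (large u), which pushes H towards 1.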
## [1] "Hopkins statistic: 0.8273"
##    all_labels
##      0  1
##   1 45 64
##   2 80 61
## [1] "Clustering accuracy: 0.424"
## [1] "Average silhouette width: 0.2365"
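Cluster IDs are arbitrary, so a clustering accuracy depends on how clusters are mapped to labels. The sketch below recomputes the accuracy from the cross-table above under both possible mappings; the names `tab`, `acc_as_is`, and `acc_flipped` are illustrative. The reported 0.424 corresponds to mapping cluster 1 to label 0 and cluster 2 to label 1.

```r
# Cross-table as printed above: rows = cluster, columns = true label.
# matrix() fills column by column, so the first column is label 0.
tab <- matrix(c(45, 80, 64, 61), nrow = 2,
              dimnames = list(cluster = c("1", "2"), all_labels = c("0", "1")))

# Mapping A: cluster 1 -> label 0, cluster 2 -> label 1
acc_as_is <- (tab["1", "0"] + tab["2", "1"]) / sum(tab)

# Mapping B: cluster 1 -> label 1, cluster 2 -> label 0
acc_flipped <- (tab["1", "1"] + tab["2", "0"]) / sum(tab)

print(c(as_is = acc_as_is, flipped = acc_flipped, best = max(acc_as_is, acc_flipped)))
```

The better mapping gives 144 / 250 = 0.576, which is still only slightly above chance and is consistent with the low average silhouette width.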
## [1] "Accuracy: 1"
plot(nn)