Creating a logistic regression model for image classification.

About the Dataset.

We have 250 labelled images: 125 of cats and 125 of flowers. Before building the model, we load and preprocess the images with the imager package.


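With the images loaded, processed, and split into training and test sets, we fit the logistic regression model. The printed vector that follows gives the predicted class (0 or 1) for each of the 75 test images, labelled by image index.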
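A minimal sketch of this step, assuming the script fits the model with base R's glm() (the "(Intercept)" row and p-values in the coefficient table later on are consistent with that, but it is an assumption). The data are synthetic stand-ins: the feature names, the effect sizes and the random split are hypothetical; only the 75-row test set size is taken from the output in this section.

```r
# Hypothetical stand-in data: 250 rows, two informative features and one noise feature
set.seed(42)
n <- 250
label <- rep(c(0, 1), each = n / 2)
dat <- data.frame(
  X1 = rnorm(n, mean = 2 * label),
  X2 = rnorm(n, mean = -2 * label),
  X3 = rnorm(n),
  label = label
)

# Hold out 75 rows (30%) as the test set
test_idx <- sample(n, size = 75)
train <- dat[-test_idx, ]
test  <- dat[test_idx, ]

# Logistic regression: models the log-odds of label == 1 as linear in the features
model <- glm(label ~ ., data = train, family = binomial)

# Predicted probabilities on the test set, thresholded at 0.5 to get classes
prob <- predict(model, newdata = test, type = "response")
pred <- ifelse(prob > 0.5, 1, 0)
pred  # 0/1 vector named by the original row indices
```

Printing `pred` gives a 0/1 vector named by row index, the same shape as the printed predictions in this section.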

##   2   3  10  12  15  18  19  27  28  29  31  33  37  47  57  58  61  62  65  73 
##   0   0   0   0   0   0   0   0   1   0   1   0   0   0   0   0   0   0   0   0 
##  80  82  87  96 100 101 104 105 111 115 123 124 128 130 133 134 138 145 147 149 
##   0   0   0   0   0   0   0   0   0   0   0   0   1   1   1   1   1   1   1   0 
## 157 163 165 172 173 174 176 177 180 182 184 186 189 191 193 194 196 203 204 206 
##   1   1   1   1   1   1   1   1   1   0   1   1   1   1   1   1   1   1   1   1 
## 208 212 213 215 217 218 219 220 222 227 230 232 235 240 241 
##   1   1   1   1   0   0   1   1   1   1   1   1   1   1   1

Evaluating the logistic regression model

## [1] "Confusion Matrix:"
##          Actual
## Predicted  0  1
##         0 30  4
##         1  2 39
## [1] "Accuracy: 0.92"
## [1] "Precision: 0.9512"
## [1] "Recall: 0.907"
## [1] "F1 Score: 0.9286"
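All four metrics follow directly from the confusion matrix, taking class 1 as the positive class (an inference: the printed precision and recall only match under that choice). This base-R sketch recomputes them from the printed counts; it reproduces the figures above but is not the author's script.

```r
# Confusion matrix from the output above: rows = predicted, columns = actual
cm <- matrix(c(30, 2, 4, 39), nrow = 2,
             dimnames = list(Predicted = c("0", "1"), Actual = c("0", "1")))

tp <- cm["1", "1"]  # predicted 1, actually 1
fp <- cm["1", "0"]  # predicted 1, actually 0
fn <- cm["0", "1"]  # predicted 0, actually 1
tn <- cm["0", "0"]  # predicted 0, actually 0

accuracy  <- (tp + tn) / sum(cm)   # 69 / 75
precision <- tp / (tp + fp)        # 39 / 41
recall    <- tp / (tp + fn)        # 39 / 43
f1        <- 2 * precision * recall / (precision + recall)

round(c(accuracy = accuracy, precision = precision, recall = recall, f1 = f1), 4)
# accuracy 0.92, precision 0.9512, recall 0.907, f1 0.9286
```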

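On this small labelled dataset the logistic regression model performs well: 69 of the 75 test images are classified correctly (accuracy 0.92), with precision 0.95 and recall 0.91 for class 1. Below we plot the ROC curve, report the area under it (AUC), and list the model's coefficients sorted by absolute value to see which features drive the predictions. The coefficient table contains only seven terms (six features plus the intercept), so fewer than ten rows appear under "Top 10 important features". X1, X4 and X3 are highly significant (p < 1e-8), X7 is significant at the 5% level (p = 0.041), and X6 and X2 are not significant.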
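The AUC summarises the ROC curve in one number: it is the probability that a randomly chosen positive image receives a higher predicted probability than a randomly chosen negative one. Here is a minimal base-R sketch of that rank (Mann-Whitney) formulation. `auc_rank` is a hypothetical helper; applying it to this model would mean passing the predicted test probabilities as `scores` and the true test labels as `labels`. Packages such as pROC compute the same quantity together with the curve itself.

```r
# Rank-based AUC: probability that a random positive (label 1) scores higher
# than a random negative (label 0); tied scores count as half a win.
auc_rank <- function(scores, labels) {
  r <- rank(scores)              # average ranks, so ties get half credit
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example: 3 of the 4 positive/negative pairs are ordered correctly
auc_rank(scores = c(0.1, 0.4, 0.35, 0.8), labels = c(0, 0, 1, 1))  # 0.75
```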

## [1] "AUC: 0.9578"
## [1] "Top 10 important features:"
##                 feature coefficient      p_value
## X1                   X1 -246.479766 1.369728e-11
## X4                   X4  194.176820 1.920739e-11
## X3                   X3   54.202074 1.155562e-09
## X7                   X7   50.821998 4.097232e-02
## X6                   X6  -29.941606 1.871504e-01
## (Intercept) (Intercept)   -4.449820 4.791257e-03
## X2                   X2   -3.157375 6.454207e-01
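A table like the one above can be built from a fitted glm() model by sorting the coefficient matrix from summary() by absolute estimate. This sketch uses a small synthetic dataset (the data and the names `y`, `X1`, `X2`, `importance` are hypothetical) and assumes the original model was fitted with glm(), which the output suggests but does not confirm.

```r
# Hypothetical synthetic data: X1 is informative, X2 is pure noise
set.seed(1)
n <- 200
y <- rbinom(n, size = 1, prob = 0.5)
d <- data.frame(X1 = rnorm(n, mean = 2 * y), X2 = rnorm(n), y = y)
model <- glm(y ~ X1 + X2, data = d, family = binomial)

# Coefficient matrix with columns Estimate, Std. Error, z value, Pr(>|z|)
coefs <- coef(summary(model))
importance <- data.frame(
  feature = rownames(coefs),
  coefficient = coefs[, "Estimate"],
  p_value = coefs[, "Pr(>|z|)"]
)

# Sort by absolute coefficient, largest first, and show at most 10 rows
importance <- importance[order(abs(importance$coefficient), decreasing = TRUE), ]
head(importance, 10)
```

Ranking by absolute coefficient only compares features fairly when they share a scale, so standardising the features (for example with scale()) before fitting makes the ranking more meaningful; the p-value column, by contrast, indicates whether each coefficient is distinguishable from zero.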

Part B - Clustering

Before clustering, we check whether the data has any tendency to cluster at all by computing the Hopkins statistic H (this part uses the factoextra package). H is interpreted as follows:

H close to 1 (above about 0.75): the data has a strong tendency to cluster.
H around 0.5: the points look like uniformly random noise, with no cluster structure.
H close to 0: the points are regularly spaced, as on a grid.
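To make the definition concrete, here is a minimal base-R sketch of a common simplified form, H = sum(u) / (sum(u) + sum(w)), where u are distances from m uniformly sampled points (inside the data's bounding box) to their nearest data point, and w are nearest-neighbour distances from m randomly chosen data points. The name `hopkins_stat` and the default m (10% of the rows) are illustrative choices. Packages differ in convention: some use d-th powers of the distances, and some report 1 - H, where values near 0 indicate clustering, so check which form your package returns.

```r
# Simplified Hopkins statistic: H = sum(u) / (sum(u) + sum(w))
hopkins_stat <- function(X, m = ceiling(nrow(X) / 10)) {
  X <- as.matrix(X)
  n <- nrow(X)
  d <- ncol(X)
  # Euclidean distance from point p to the nearest row of Y
  nearest <- function(p, Y) min(sqrt(colSums((t(Y) - p)^2)))

  # u: m uniform random points inside each column's range, to nearest data point
  lo <- apply(X, 2, min)
  hi <- apply(X, 2, max)
  U <- matrix(runif(m * d, lo, hi), nrow = m, byrow = TRUE)
  u <- apply(U, 1, nearest, Y = X)

  # w: m sampled data points, to their nearest *other* data point
  idx <- sample(n, m)
  w <- sapply(idx, function(i) nearest(X[i, ], X[-i, , drop = FALSE]))

  sum(u) / (sum(u) + sum(w))
}

set.seed(1)
clustered <- rbind(matrix(rnorm(100, 0, 0.1), ncol = 2),
                   matrix(rnorm(100, 10, 0.1), ncol = 2))
H <- hopkins_stat(clustered)
H  # close to 1: two tight, well-separated clusters
```

For tight, well-separated clusters the u distances are large and the w distances tiny, which pushes H towards 1.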

## [1] "Hopkins statistic: 0.8273"
##    all_labels
##      0  1
##   1 45 64
##   2 80 61
## [1] "Clustering accuracy: 0.424"
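## [1] "Average silhouette width: 0.2365"

The Hopkins statistic of 0.83 is above 0.75, so the features do show a tendency to cluster. The two clusters, however, do not match the cat and flower labels: each cluster contains many images of both classes. Cluster numbers are arbitrary, so clustering accuracy should be taken under the better of the two cluster-to-label assignments; matching cluster 1 to label 1 and cluster 2 to label 0 gives (64 + 80) / 250 = 0.576, only modestly better than chance. The average silhouette width of 0.24 likewise indicates weak, overlapping clusters.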
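Because a two-cluster result can be mapped to the true labels in two ways, accuracy has to be checked under both mappings. This sketch does that for the contingency table printed above and adds a base-R average silhouette width for comparison. The helper names are hypothetical; in practice cluster::silhouette() provides the silhouette computation.

```r
# Contingency table from the output above: rows = cluster, columns = true label
tab <- matrix(c(45, 80, 64, 61), nrow = 2,
              dimnames = list(cluster = c("1", "2"), label = c("0", "1")))

# Accuracy of a 2x2 table under both cluster-to-label assignments
cluster_accuracy <- function(tab) {
  direct  <- (tab[1, 1] + tab[2, 2]) / sum(tab)  # cluster 1 -> 0, cluster 2 -> 1
  swapped <- (tab[1, 2] + tab[2, 1]) / sum(tab)  # cluster 1 -> 1, cluster 2 -> 0
  c(direct = direct, swapped = swapped, best = max(direct, swapped))
}
acc <- cluster_accuracy(tab)
acc  # direct 0.424, swapped 0.576, best 0.576

# Average silhouette width: s(i) = (b(i) - a(i)) / max(a(i), b(i))
avg_silhouette <- function(X, cl) {
  D <- as.matrix(dist(X))
  s <- sapply(seq_len(nrow(D)), function(i) {
    own <- cl == cl[i]
    if (sum(own) == 1) return(0)           # convention for singleton clusters
    a <- sum(D[i, own]) / (sum(own) - 1)   # mean distance to own cluster, excluding self
    b <- min(sapply(setdiff(unique(cl), cl[i]),
                    function(k) mean(D[i, cl == k])))  # nearest other cluster
    (b - a) / max(a, b)
  })
  mean(s)
}

# Hypothetical example: two well-separated groups give a width near 1
set.seed(1)
X <- rbind(matrix(rnorm(40, 0), ncol = 2), matrix(rnorm(40, 10), ncol = 2))
cl <- rep(1:2, each = 20)
sil <- avg_silhouette(X, cl)
sil
```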

An ANN model

## [1] "Accuracy: 1"
plot(nn)