How well can KNN modeling predict whether a person has diabetes or heart disease based on their medical statistics?
## [1] -6.577989
## [1] 0.8349515
## [1] -5.215195
## [1] 0.4761905
| | Diabetes | Heart Disease |
|---|---|---|
| Number of Neighbors | 3 | 7 |
| Accuracy | 0.78 | 0.64 |
| Kappa | 0.52 | 0.21 |
| TPR | 0.6 | 0.76 |
| FPR | 0.1 | 0.56 |
| LogLoss | -6.58 | -5.22 |
| F1 | 0.83 | 0.47 |
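As a reference for how F1 and LogLoss are usually defined, here is a minimal base R sketch. The labels and probabilities are hypothetical, not the study's data. Log loss is conventionally non-negative, so the sketch computes both the mean log-likelihood and its negation; the source does not state which convention produced the table values.

```r
# Hypothetical labels (1 = has the condition) and predicted probabilities;
# illustrative values only, not the data behind the table.
actual <- c(1, 0, 1, 1, 0, 0, 1, 0)
prob   <- c(0.9, 0.2, 0.6, 0.4, 0.1, 0.7, 0.8, 0.3)
pred   <- as.integer(prob >= 0.5)

tp <- sum(pred == 1 & actual == 1)  # true positives
fp <- sum(pred == 1 & actual == 0)  # false positives
fn <- sum(pred == 0 & actual == 1)  # false negatives

precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f1        <- 2 * precision * recall / (precision + recall)

# Clip probabilities away from 0 and 1 so log() stays finite.
eps <- 1e-15
p   <- pmin(pmax(prob, eps), 1 - eps)

# Mean log-likelihood (always <= 0) and the conventional log loss (its negation).
mean_loglik <- mean(actual * log(p) + (1 - actual) * log(1 - p))
log_loss    <- -mean_loglik

print(c(F1 = f1, mean_loglik = mean_loglik, log_loss = log_loss))
```

KNN class probabilities are vote fractions and can be exactly 0 or 1, which is why the clipping step matters for this kind of model.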
The threshold for this situation is low because we do not want to miss any positive cases (that is, tell someone they don’t have a disease when they do).
This means the most important statistics are accuracy (want it high) and TPR (want it high, because a high TPR means the proportion of false negatives is low).
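The effect of lowering the threshold can be sketched in base R. The probabilities, labels, and the helper `rates_at` are hypothetical and chosen only for illustration; the thresholds 0.5 and 0.25 are likewise assumptions, not the values used in this analysis.

```r
# Hypothetical true labels (1 = has the condition) and predicted probabilities.
actual <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
prob   <- c(0.9, 0.6, 0.4, 0.3, 0.7, 0.4, 0.3, 0.2, 0.1, 0.1)

# Classify as positive when the probability meets the threshold,
# then report the true positive rate and false positive rate.
rates_at <- function(threshold) {
  pred <- as.integer(prob >= threshold)
  c(TPR = sum(pred == 1 & actual == 1) / sum(actual == 1),
    FPR = sum(pred == 1 & actual == 0) / sum(actual == 0))
}

default <- rates_at(0.5)   # conventional cutoff
lowered <- rates_at(0.25)  # lower cutoff: fewer missed positives
print(rbind(default, lowered))
```

In this toy example, lowering the cutoff catches every positive case but also flags more healthy people, which is the trade-off the adjusted models have to accept.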
## Confusion Matrix and Statistics
##
## Actual
## Prediction 0 1
## 0 23 15
## 1 25 15
##
## Accuracy : 0.4872
## 95% CI : (0.3723, 0.6031)
## No Information Rate : 0.6154
## P-Value [Acc > NIR] : 0.9921
##
## Kappa : -0.0196
##
## Mcnemar's Test P-Value : 0.1547
##
## Sensitivity : 0.5000
## Specificity : 0.4792
## Pos Pred Value : 0.3750
## Neg Pred Value : 0.6053
## Precision : 0.3750
## Recall : 0.5000
## F1 : 0.4286
## Prevalence : 0.3846
## Detection Rate : 0.1923
## Detection Prevalence : 0.5128
## Balanced Accuracy : 0.4896
##
## 'Positive' Class : 1
##
## Confusion Matrix and Statistics
##
## Actual
## Prediction 0 1
## 0 11 11
## 1 12 27
##
## Accuracy : 0.623
## 95% CI : (0.4896, 0.7439)
## No Information Rate : 0.623
## P-Value [Acc > NIR] : 0.5567
##
## Kappa : 0.1904
##
## Mcnemar's Test P-Value : 1.0000
##
## Sensitivity : 0.7105
## Specificity : 0.4783
## Pos Pred Value : 0.6923
## Neg Pred Value : 0.5000
## Precision : 0.6923
## Recall : 0.7105
## F1 : 0.7013
## Prevalence : 0.6230
## Detection Rate : 0.4426
## Detection Prevalence : 0.6393
## Balanced Accuracy : 0.5944
##
## 'Positive' Class : 1
##
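The headline statistics in the two outputs above can be recomputed from the four cell counts alone. The base R sketch below does this with a hypothetical helper, `metrics`, which is not the function that produced the output; rows are predictions, columns are actual classes, and class 1 is positive.

```r
# Recompute accuracy, sensitivity, specificity and Cohen's kappa
# from the counts of a 2x2 confusion matrix.
metrics <- function(tn, fn, fp, tp) {
  n   <- tn + fn + fp + tp
  acc <- (tp + tn) / n
  # Agreement expected by chance, from the row and column totals.
  pe  <- ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n^2
  c(accuracy    = acc,
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    kappa       = (acc - pe) / (1 - pe))
}

# Counts copied from the two confusion matrices above.
first  <- metrics(tn = 23, fn = 15, fp = 25, tp = 15)
second <- metrics(tn = 11, fn = 11, fp = 12, tp = 27)
print(round(rbind(first, second), 4))
```

Kappa corrects accuracy for chance agreement, which is why it sits near zero for the first matrix even though nearly half of the predictions are correct.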
| | Diabetes | Heart Disease |
|---|---|---|
| Number of Neighbors | 3 | 7 |
| Accuracy | 0.78 | 0.64 |
| Kappa | 0.52 | 0.21 |
| TPR | 0.6 | 0.76 |
| FPR | 0.1 | 0.56 |
| LogLoss | -6.58 | -5.22 |
| F1 | 0.83 | 0.47 |
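The models with the adjusted threshold perform poorly. In the confusion matrices above, the first model's accuracy (0.49) falls below its no-information rate (0.62), and the second's (0.62) only matches it. Neither is much better than guessing whether a person has a serious health condition, which we definitely don’t want to do.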
More data could improve these models, since the training and test sets were small (the two test sets above hold only 78 and 61 observations). A larger sample might also allow us to lower the threshold further to limit the number of false negative classifications.