GitHub link: https://github.com/asmozo24/Data621_Assignment2
Web link: https://rpubs.com/amekueko/742903
Using R to acquire data
What is the structure of the data?
## Rows: 181
## Columns: 11
## $ pregnant <int> 7, 2, 3, 1, 4, 1, 9, 8, 1, 2, 5, 5, 13, 0, 7, 12...
## $ glucose <int> 124, 122, 107, 91, 83, 100, 89, 120, 79, 123, 88...
## $ diastolic <int> 70, 76, 62, 64, 86, 74, 62, 78, 60, 48, 78, 72, ...
## $ skinfold <int> 33, 27, 13, 24, 19, 12, 0, 0, 42, 32, 30, 43, 0,...
## $ insulin <int> 215, 200, 48, 0, 0, 46, 0, 0, 48, 165, 0, 75, 0,...
## $ bmi <dbl> 25.5, 35.9, 22.9, 29.2, 29.3, 19.5, 22.5, 25.0, ...
## $ pedigree <dbl> 0.161, 0.483, 0.678, 0.192, 0.317, 0.149, 0.142,...
## $ age <int> 37, 26, 23, 21, 34, 28, 33, 64, 23, 26, 37, 33, ...
## $ class <int> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, ...
## $ scored.class <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ...
## $ scored.probability <dbl> 0.32845226, 0.27319044, 0.10966039, 0.05599835, ...
## [1] 181 11
## [1] 0
##             class
## scored.class   0   1
##            0 119  30
##            1   5  27
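The table above cross-tabulates predicted labels (`scored.class`, rows) against actual labels (`class`, columns). As a minimal sketch, the same matrix can be rebuilt from the printed counts and its four cells named, assuming class 1 is the positive class. The object name `conf_mat` is illustrative and not taken from the original code; with the data frame loaded, something like `table(classif_df$scored.class, classif_df$class)` would presumably produce the same counts.

```r
# Counts copied from the confusion table above
# (rows = scored.class / predicted, columns = class / actual).
conf_mat <- matrix(
  c(119, 5, 30, 27),  # matrix() fills column by column
  nrow = 2,
  dimnames = list(scored.class = c("0", "1"), class = c("0", "1"))
)

# Name the four cells, assuming class 1 is the positive class.
TN <- conf_mat["0", "0"]  # predicted 0, actual 0
FN <- conf_mat["0", "1"]  # predicted 0, actual 1
FP <- conf_mat["1", "0"]  # predicted 1, actual 0
TP <- conf_mat["1", "1"]  # predicted 1, actual 1

c(TN = TN, FN = FN, FP = FP, TP = TP)
```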
## The classification accuracy for dataset is: 80.663 %
## The classification error rate for dataset is: 19.337 %
## The classification precision for dataset is: 84.375 %
## The classification sensitivity for dataset is: 47.368 %
## The classification specificity for dataset is: 95.968 %
## The classification F1 score for dataset is: 60.674 %
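The six scores above follow directly from the four cells of the confusion table. The sketch below recomputes them from the printed counts, assuming class 1 is the positive class; the variable names are illustrative and may differ from the functions used in the original analysis.

```r
# Cell counts from the confusion table (class 1 treated as positive).
TP <- 27; TN <- 119; FP <- 5; FN <- 30
n  <- TP + TN + FP + FN

accuracy    <- (TP + TN) / n
error_rate  <- (FP + FN) / n
precision   <- TP / (TP + FP)
sensitivity <- TP / (TP + FN)   # true positive rate (recall)
specificity <- TN / (TN + FP)   # true negative rate
f1_score    <- 2 * precision * sensitivity / (precision + sensitivity)

metrics <- c(Accuracy = accuracy, `Error Rate` = error_rate,
             Precision = precision, Sensitivity = sensitivity,
             Specificity = specificity, `F1 Score` = f1_score)

# Express as percentages with three decimals, as in the report.
round(100 * metrics, 3)
```

Accuracy and error rate sum to 1 by construction, which is a quick sanity check on any hand-written implementation.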
##   Classification_Metrics  Score
## 1               Accuracy 80.663
## 2             Error Rate 19.337
## 3              Precision 84.375
## 4            Sensitivity 47.368
## 5            Specificity 95.968
## 6               F1 Score 60.674
## Confusion Matrix and Statistics
##
##           Reference
## Prediction   0   1
##          0 119  30
##          1   5  27
##
## Accuracy : 0.8066
## 95% CI : (0.7415, 0.8615)
## No Information Rate : 0.6851
## P-Value [Acc > NIR] : 0.0001712
##
## Kappa : 0.4916
##
## Mcnemar's Test P-Value : 4.976e-05
##
## Sensitivity : 0.9597
## Specificity : 0.4737
## Pos Pred Value : 0.7987
## Neg Pred Value : 0.8438
## Prevalence : 0.6851
## Detection Rate : 0.6575
## Detection Prevalence : 0.8232
## Balanced Accuracy : 0.7167
##
## 'Positive' Class : 0
##
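The caret output reports Sensitivity 0.9597 and Specificity 0.4737, the reverse of the hand-computed values, because it treats class 0 as the 'Positive' class. The sketch below reproduces the caret figures from the same four counts by swapping the roles of the two classes. It uses base R only; the variable names are illustrative. Passing `positive = "1"` to `confusionMatrix()` should bring the caret output in line with the hand-computed scores.

```r
# Cell counts named with class 1 as positive, as in the manual calculation.
TP <- 27; TN <- 119; FP <- 5; FN <- 30

# With class 0 as the positive class, the roles of the cells swap.
sens_pos0 <- TN / (TN + FP)  # 119 / 124: actual 0s predicted as 0
spec_pos0 <- TP / (TP + FN)  #  27 / 57:  actual 1s predicted as 1
ppv_pos0  <- TN / (TN + FN)  # 119 / 149: predicted 0s that are truly 0
npv_pos0  <- TP / (TP + FP)  #  27 / 32:  predicted 1s that are truly 1
balanced  <- (sens_pos0 + spec_pos0) / 2

round(c(Sensitivity = sens_pos0, Specificity = spec_pos0,
        `Pos Pred Value` = ppv_pos0, `Neg Pred Value` = npv_pos0,
        `Balanced Accuracy` = balanced), 4)
```

Accuracy, Kappa, and Balanced Accuracy do not depend on which class is labelled positive, which is why the accuracy of 0.8066 agrees in both calculations.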
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
##
## Call:
## roc.default(response = classif_df$class, predictor = classif_df$scored.probability, plot = T, print.auc = T)
##
## Data: classif_df$scored.probability in 124 controls (classif_df$class 0) < 57 cases (classif_df$class 1).
## Area under the curve: 0.8503
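The area under the curve can be read as the probability that a randomly chosen case (class 1) receives a higher score than a randomly chosen control (class 0). The sketch below computes that quantity with the rank-sum (Mann-Whitney) formula in base R. The function `auc_rank` and the toy vectors are hypothetical and are not the assignment data; applied to `classif_df$class` and `classif_df$scored.probability`, it would be expected to agree with the pROC value above.

```r
# Hypothetical toy data, NOT the assignment data.
actual <- c(0, 0, 1, 0, 1, 1)
prob   <- c(0.10, 0.40, 0.35, 0.20, 0.80, 0.70)

# AUC via the rank-sum formula: the share of (case, control) pairs
# in which the case has the higher predicted probability.
auc_rank <- function(actual, prob) {
  r  <- rank(prob)            # average ranks, so ties count as one half
  n1 <- sum(actual == 1)      # number of cases
  n0 <- sum(actual == 0)      # number of controls
  (sum(r[actual == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc_rank(actual, prob)
```

In the toy data, 8 of the 9 case-control pairs are ordered correctly, so the function returns 8/9.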