GitHub link: https://github.com/asmozo24/Data621_Assignment2
Web link: https://rpubs.com/amekueko/742903

Using R to acquire data

What is the structure of the data?

## Rows: 181
## Columns: 11
## $ pregnant           <int> 7, 2, 3, 1, 4, 1, 9, 8, 1, 2, 5, 5, 13, 0, 7, 12...
## $ glucose            <int> 124, 122, 107, 91, 83, 100, 89, 120, 79, 123, 88...
## $ diastolic          <int> 70, 76, 62, 64, 86, 74, 62, 78, 60, 48, 78, 72, ...
## $ skinfold           <int> 33, 27, 13, 24, 19, 12, 0, 0, 42, 32, 30, 43, 0,...
## $ insulin            <int> 215, 200, 48, 0, 0, 46, 0, 0, 48, 165, 0, 75, 0,...
## $ bmi                <dbl> 25.5, 35.9, 22.9, 29.2, 29.3, 19.5, 22.5, 25.0, ...
## $ pedigree           <dbl> 0.161, 0.483, 0.678, 0.192, 0.317, 0.149, 0.142,...
## $ age                <int> 37, 26, 23, 21, 34, 28, 33, 64, 23, 26, 37, 33, ...
## $ class              <int> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, ...
## $ scored.class       <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ...
## $ scored.probability <dbl> 0.32845226, 0.27319044, 0.10966039, 0.05599835, ...
## [1] 181  11
## [1] 0
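
The output above looks like a structure listing (the layout resembles `dplyr::glimpse()`), followed by a dimension check and a missing-value count. A minimal sketch of that kind of inspection is shown here. It assumes a small stand-in data frame, `toy_df`, which is hypothetical and holds only three of the eleven columns with the first four values printed above. It uses base `str()` to avoid any package dependency.

```r
# Hypothetical stand-in for the assignment data: three columns,
# first four values copied from the structure listing above.
toy_df <- data.frame(
  class              = c(0L, 0L, 1L, 0L),
  scored.class       = c(0L, 0L, 0L, 0L),
  scored.probability = c(0.32845226, 0.27319044, 0.10966039, 0.05599835)
)

str(toy_df)                         # column types and leading values
toy_dims    <- dim(toy_df)          # number of rows, number of columns
toy_missing <- sum(is.na(toy_df))   # total count of NA cells

print(toy_dims)
print(toy_missing)
```

On the full data set the same two checks would correspond to the `181  11` and `0` lines above, assuming those lines come from `dim()` and an NA count.
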
##             class
## scored.class   0   1
##            0 119  30
##            1   5  27
## The classification accuracy for dataset is:  80.663 %
## The classification error rate for dataset is:  19.337 %
## The classification precision for dataset is:  84.375 %
## The classification sensitivity for dataset is:  47.368 %
## The classification specificity for dataset is:  95.968 %
## The classification F1 score for dataset is:  60.674 %
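
The six figures above can be reproduced directly from the four cells of the confusion table. The sketch below treats class 1 as the positive class, which is the convention the reported sensitivity and specificity imply. The variable names are illustrative, not taken from the original code.

```r
# Cell counts from the confusion table above
# (rows = scored.class, columns = class).
tn <- 119  # predicted 0, actual 0
fn <- 30   # predicted 0, actual 1
fp <- 5    # predicted 1, actual 0
tp <- 27   # predicted 1, actual 1

accuracy    <- (tp + tn) / (tp + tn + fp + fn)
error_rate  <- 1 - accuracy
precision   <- tp / (tp + fp)
sensitivity <- tp / (tp + fn)   # true positive rate (recall)
specificity <- tn / (tn + fp)   # true negative rate
f1_score    <- 2 * precision * sensitivity / (precision + sensitivity)

# Express as percentages rounded to three decimals, as in the report.
metrics <- round(100 * c(
  Accuracy    = accuracy,
  ErrorRate   = error_rate,
  Precision   = precision,
  Sensitivity = sensitivity,
  Specificity = specificity,
  F1          = f1_score
), 3)
print(metrics)
```
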

##   Classification_Metrics  Score
## 1               Accuracy 80.663
## 2             Error Rate 19.337
## 3              Precision 84.375
## 4            Sensitivity 47.368
## 5            Specificity 95.968
## 6               F1 Score 60.674
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 119  30
##          1   5  27
##                                           
##                Accuracy : 0.8066          
##                  95% CI : (0.7415, 0.8615)
##     No Information Rate : 0.6851          
##     P-Value [Acc > NIR] : 0.0001712       
##                                           
##                   Kappa : 0.4916          
##                                           
##  Mcnemar's Test P-Value : 4.976e-05       
##                                           
##             Sensitivity : 0.9597          
##             Specificity : 0.4737          
##          Pos Pred Value : 0.7987          
##          Neg Pred Value : 0.8438          
##              Prevalence : 0.6851          
##          Detection Rate : 0.6575          
##    Detection Prevalence : 0.8232          
##       Balanced Accuracy : 0.7167          
##                                           
##        'Positive' Class : 0               
## 
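
The caret summary reports `'Positive' Class : 0`, so its sensitivity (0.9597) and specificity (0.4737) are swapped relative to the hand-computed values of 47.368 % and 95.968 %, which treat class 1 as positive. The base R sketch below shows how the choice of positive class flips the two rates. The helper `rates_for_positive` is hypothetical and written only for this illustration. In caret itself, `confusionMatrix()` accepts a `positive` argument (for example `positive = "1"`) to select the positive class.

```r
# Confusion matrix from the output above: rows = prediction, columns = reference.
cm <- matrix(c(119, 5, 30, 27), nrow = 2,
             dimnames = list(Prediction = c("0", "1"),
                             Reference  = c("0", "1")))

# Hypothetical helper: sensitivity and specificity for a two-class matrix
# when `positive` names the positive class.
rates_for_positive <- function(cm, positive) {
  negative <- setdiff(colnames(cm), positive)
  c(sensitivity = cm[positive, positive] / sum(cm[, positive]),
    specificity = cm[negative, negative] / sum(cm[, negative]))
}

pos0 <- round(rates_for_positive(cm, "0"), 4)  # matches the caret output
pos1 <- round(rates_for_positive(cm, "1"), 4)  # matches the hand-computed metrics
print(pos0)
print(pos1)
```
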
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases

## 
## Call:
## roc.default(response = classif_df$class, predictor = classif_df$scored.probability,     plot = T, print.auc = T)
## 
## Data: classif_df$scored.probability in 124 controls (classif_df$class 0) < 57 cases (classif_df$class 1).
## Area under the curve: 0.8503
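
The area under the ROC curve equals the probability that a randomly chosen case (class 1) receives a higher score than a randomly chosen control (class 0), which is the normalized Mann-Whitney U statistic. The sketch below computes it from ranks on a four-row toy example. The helper `auc_rank` and the toy vectors are hypothetical, so the result does not reproduce the 0.8503 reported above, which requires the full data.

```r
# Hypothetical helper: AUC from the rank-sum (Mann-Whitney) formulation.
# Assumes `labels` holds 0 for controls and 1 for cases.
auc_rank <- function(labels, scores) {
  r  <- rank(scores)          # average ranks, so tied scores count as half
  n1 <- sum(labels == 1)      # number of cases
  n0 <- sum(labels == 0)      # number of controls
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy data for illustration only.
toy_labels <- c(0, 0, 1, 1)
toy_scores <- c(0.1, 0.4, 0.35, 0.8)

toy_auc <- auc_rank(toy_labels, toy_scores)
print(toy_auc)
```

Three of the four case-control pairs are ordered correctly in the toy data, so the sketch returns 0.75.
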