Introduction:

In this homework, you will apply logistic regression to a real-world dataset: the Pima Indians Diabetes Database. This dataset contains medical records from 768 women of Pima Indian heritage, aged 21 or older, and is used to predict the onset of diabetes (binary outcome: 0 = no diabetes, 1 = diabetes) based on physiological measurements.

The data originated in the UCI Machine Learning Repository; the copy used here is a public mirror on GitHub and can be imported directly from the URL below.

Dataset URL: https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv

Columns (no header in the CSV, so we need to assign them manually):

  1. Pregnancies: Number of times pregnant
  2. Glucose: Plasma glucose concentration (2-hour test)
  3. BloodPressure: Diastolic blood pressure (mm Hg)
  4. SkinThickness: Triceps skin fold thickness (mm)
  5. Insulin: 2-hour serum insulin (mu U/ml)
  6. BMI: Body mass index (weight in kg/(height in m)^2)
  7. DiabetesPedigreeFunction: Diabetes pedigree function (a function scoring genetic risk)
  8. Age: Age in years
  9. Outcome: Class variable (0 = no diabetes, 1 = diabetes)

Task Overview: You will load the data, build a logistic regression model to predict diabetes onset using a subset of predictors (Glucose, BMI, Age), interpret the model, evaluate it with a confusion matrix and metrics, and analyze the ROC curve and AUC.

Cleaning the dataset

Don’t change the following code.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
url <- "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"

data <- read.csv(url, header = FALSE)

colnames(data) <- c("Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome")

data$Outcome <- as.factor(data$Outcome)

# Handle missing values (replace 0s with NA because 0 makes no sense here)
data$Glucose[data$Glucose == 0] <- NA
data$BloodPressure[data$BloodPressure == 0] <- NA
data$BMI[data$BMI == 0] <- NA


colSums(is.na(data))
##              Pregnancies                  Glucose            BloodPressure 
##                        0                        5                       35 
##            SkinThickness                  Insulin                      BMI 
##                        0                        0                       11 
## DiabetesPedigreeFunction                      Age                  Outcome 
##                        0                        0                        0
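The recoding pattern above can be tried on a small made-up data frame before running it on the full dataset. The sketch below uses hypothetical values (the `toy` data frame is not part of the assignment) and shows how recorded zeros become NA and how many rows remain once rows with any NA are dropped.

```r
# Hypothetical toy data, not taken from the Pima dataset
toy <- data.frame(
  Glucose = c(148, 0, 183, 89),
  BMI     = c(33.6, 26.6, 0, 28.1)
)

# Same pattern as above: a recorded 0 is treated as a missing measurement
toy$Glucose[toy$Glucose == 0] <- NA
toy$BMI[toy$BMI == 0] <- NA

# Count missing values per column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Number of rows with no missing value in any column
n_complete <- sum(complete.cases(toy))
print(n_complete)
```

A model fit on these columns with R's default NA handling would use only the complete rows.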

Question 1: Create and Interpret a Logistic Regression Model

- Fit a logistic regression model to predict Outcome using Glucose, BMI, and Age.

## Enter your code here
logistic <- glm(Outcome ~ Glucose + BMI + Age, data=data, family="binomial")

summary(logistic)
## 
## Call:
## glm(formula = Outcome ~ Glucose + BMI + Age, family = "binomial", 
##     data = data)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -9.032377   0.711037 -12.703  < 2e-16 ***
## Glucose      0.035548   0.003481  10.212  < 2e-16 ***
## BMI          0.089753   0.014377   6.243  4.3e-10 ***
## Age          0.028699   0.007809   3.675 0.000238 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 974.75  on 751  degrees of freedom
## Residual deviance: 724.96  on 748  degrees of freedom
##   (16 observations deleted due to missingness)
## AIC: 732.96
## 
## Number of Fisher Scoring iterations: 4
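To make the coefficients concrete, the fitted log-odds can be converted into a probability by hand. The sketch below hard-codes the rounded estimates printed above and uses an illustrative patient with Glucose = 148, BMI = 33.6, and Age = 50; the object names (`coefs`, `patient`) are only for illustration.

```r
# Rounded estimates copied from the summary output above
coefs <- c(intercept = -9.032377, Glucose = 0.035548, BMI = 0.089753, Age = 0.028699)

# Illustrative predictor values for one patient
patient <- c(Glucose = 148, BMI = 33.6, Age = 50)

# Linear predictor (log-odds) = intercept + sum of coefficient * value
log_odds <- coefs[["intercept"]] + sum(coefs[names(patient)] * patient)

# The inverse logit turns log-odds into a probability; plogis() is part of base R (stats)
prob <- plogis(log_odds)

print(round(log_odds, 3))
print(round(prob, 3))
```

With the fitted model object, `predict(logistic, newdata = ..., type = "response")` should give the same probability up to rounding of the coefficients.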

Calculating pseudo-R^2

r_square <- 1 - (logistic$deviance/logistic$null.deviance)

r_square
## [1] 0.25626

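This pseudo-R^2 (McFadden's, computed from the deviances) indicates that adding Glucose, BMI, and Age reduces the deviance by about 26% relative to the intercept-only model. Unlike R^2 in linear regression, it is not a literal share of variance explained.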
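The same quantity can be recomputed from the two deviances printed in the summary, which also gives a quick overall test of the model. This is a sketch using the rounded values from the output above; the likelihood-ratio test is an addition for illustration and is not required by the assignment.

```r
# Deviances and degrees of freedom copied from the summary output
null_dev  <- 974.75   # intercept-only model, 751 df
resid_dev <- 724.96   # model with Glucose, BMI, and Age, 748 df

# Deviance-based pseudo-R^2 (McFadden's, for a 0/1 outcome)
pseudo_r2 <- 1 - resid_dev / null_dev

# Likelihood-ratio statistic: the drop in deviance, compared to a chi-square with 3 df
lr_stat <- null_dev - resid_dev
lr_df   <- 751 - 748
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

print(round(pseudo_r2, 4))
print(lr_stat)
print(p_value)
```

A very small p-value here would suggest that the three predictors together improve on the intercept-only model.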

What does the intercept represent (log-odds of diabetes when predictors are zero)?

For each predictor (Glucose, BMI, Age), does a one-unit increase raise or lower the odds of diabetes? Are they significant (p-value < 0.05)?
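One way to approach these questions is to exponentiate the coefficients, which turns log-odds into odds and odds ratios. The sketch below hard-codes the rounded estimates and p-values from the summary output (values printed as "< 2e-16" are entered as 2e-16, which is an assumption); with the fitted model, `exp(coef(logistic))` should give the same numbers.

```r
# Estimates and p-values copied from the summary output
estimates <- c("(Intercept)" = -9.032377, Glucose = 0.035548,
               BMI = 0.089753, Age = 0.028699)
p_values  <- c("(Intercept)" = 2e-16, Glucose = 2e-16,
               BMI = 4.3e-10, Age = 0.000238)

# exp(coefficient) is the multiplicative change in odds per one-unit increase.
# For the intercept it is the baseline odds when all predictors are zero.
odds_ratios <- exp(estimates)

result <- data.frame(
  estimate    = estimates,
  odds_ratio  = round(odds_ratios, 4),
  raises_odds = estimates > 0,
  significant = p_values < 0.05
)
print(result)
```

Read this way, each extra unit of Glucose multiplies the odds of diabetes by roughly 1.036, holding BMI and Age fixed.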

Question 2: Confusion Matrix and Important Metrics

Calculate and report the metrics:

- Accuracy: (TP + TN) / Total
- Sensitivity (Recall): TP / (TP + FN)
- Specificity: TN / (TN + FP)
- Precision: TP / (TP + FP)
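These formulas can be sketched in a few lines of R. The example below uses made-up probabilities and outcomes and a 0.5 cutoff, which is an assumption (the text above does not fix a threshold); the vector names are hypothetical.

```r
# Hypothetical predicted probabilities and true outcomes (1 = diabetes, 0 = no diabetes)
prob   <- c(0.90, 0.80, 0.30, 0.60, 0.20, 0.10, 0.70, 0.40, 0.15, 0.55)
actual <- c(1,    1,    1,    0,    0,    0,    1,    0,    0,    0)

# Classify as 1 when the probability is at least the assumed cutoff of 0.5
predicted <- ifelse(prob >= 0.5, 1, 0)

# Confusion matrix with fixed factor levels so all four cells always exist
cm <- table(
  Predicted = factor(predicted, levels = c(0, 1)),
  Actual    = factor(actual, levels = c(0, 1))
)
print(cm)

# Rows are predictions, columns are true outcomes
TP <- cm["1", "1"]; TN <- cm["0", "0"]
FP <- cm["1", "0"]; FN <- cm["0", "1"]

accuracy    <- (TP + TN) / sum(cm)
sensitivity <- TP / (TP + FN)
specificity <- TN / (TN + FP)
precision   <- TP / (TP + FP)

print(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, precision = precision))
```

Lowering the cutoff would generally trade specificity for sensitivity, which is the trade-off the ROC curve summarizes.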

Use the following starter code

# Keep only rows with no missing values in Glucose, BMI, or Age
data_subset <- data[complete.cases(data[, c("Glucose", "BMI", "Age")]), ]

# Create a numeric version of the outcome (0 = no diabetes, 1 = diabetes).
# This is required for calculating confusion matrices.
data_subset$Outcome_num <- ifelse(data_subset$Outcome == "1", 1, 0)


# Predicted probabilities
predicted.data <- data.frame(
  probability.of.hd=logistic$fitted.values,
  age=data_subset$Age, glucose = data_subset$Glucose, bmi = data_subset$BMI)

predicted.data
##     probability.of.hd age glucose  bmi
## 1          0.66360006  50     148 33.6
## 2          0.06101402  31      85 26.6
## 3          0.61834186  32     183 23.3
## 4          0.06043396  21      89 28.1
## 5          0.65771328  33     137 43.1
## 6          0.14802668  30     116 25.6
## 7          0.06116212  26      78 31.0
## 8          0.28013239  29     115 35.3
## 9          0.90283168  53     197 30.5
## 11         0.29185049  30     110 37.6
## 12         0.79018904  34     168 38.0
## 13         0.49423685  57     139 27.1
## 14         0.88904285  59     189 30.1
## 15         0.65652969  51     166 25.8
## 16         0.13393352  32     100 30.0
## 17         0.54057167  31     118 45.8
## 18         0.15678020  31     107 29.6
## 19         0.36875552  33     103 43.3
## 20         0.28484895  32     115 34.6
## 21         0.43753713  27     126 39.3
## 22         0.28886248  50      99 35.4
## 23         0.93606735  41     196 39.8
## 24         0.20309566  29     119 29.0
## 25         0.68988838  51     143 36.6
## 26         0.34957706  41     125 31.1
## 27         0.72382298  43     147 39.4
## 28         0.05362751  22      97 23.2
## 29         0.43793280  57     145 22.2
## 30         0.32692619  38     117 34.1
## 31         0.44902956  60     109 36.0
## 32         0.55575994  28     158 31.6
## 33         0.04535146  22      88 24.8
## 34         0.04022137  28      92 19.9
## 35         0.28355775  45     122 27.6
## 36         0.09365574  33     103 24.0
## 37         0.46443795  35     138 33.2
## 38         0.24352489  46     102 32.9
## 39         0.16388268  27      90 38.2
## 40         0.46267837  56     111 37.1
## 41         0.76206518  26     180 34.0
## 42         0.59035695  37     133 40.2
## 43         0.13595021  48     106 22.7
## 44         0.93528537  54     171 45.4
## 45         0.55649424  40     159 27.4
## 46         0.86452121  25     180 42.0
## 47         0.41473239  29     146 29.7
## 48         0.03343950  22      71 28.0
## 49         0.27449787  31     103 39.1
## 51         0.04750056  22     103 19.4
## 52         0.07420421  26     101 24.2
## 53         0.05451568  30      88 24.4
## 54         0.87138830  58     176 33.7
## 55         0.65012987  42     150 34.7
## 56         0.02252439  21      73 23.0
## 57         0.89802263  41     187 37.7
## 58         0.40432752  31     100 46.8
## 59         0.74180750  44     146 40.5
## 60         0.28015166  22     105 41.5
## 62         0.44217047  39     133 32.9
## 63         0.01490161  36      44 25.0
## 64         0.25891619  24     141 25.4
## 65         0.30350831  42     114 32.8
## 66         0.12005398  32      99 29.0
## 67         0.24046916  38     109 32.5
## 68         0.55590485  54     109 42.7
## 69         0.03997582  25      95 19.6
## 70         0.38375591  27     146 28.9
## 71         0.15172557  28     100 32.9
## 72         0.31473014  26     139 28.6
## 73         0.63351150  42     126 43.4
## 74         0.34608813  23     129 35.1
## 75         0.06176809  22      79 32.0
## 77         0.06146859  41      62 32.6
## 78         0.18291002  27      95 37.7
## 79         0.56166300  26     131 43.2
## 80         0.10732115  24     112 25.0
## 81         0.08520738  22     113 22.4
## 83         0.08173799  36      83 29.3
## 84         0.06896305  22     101 24.6
## 85         0.78236624  37     137 48.8
## 86         0.19166509  27     110 32.4
## 87         0.33450675  45     106 36.6
## 88         0.21824692  26     100 38.5
## 89         0.59050304  43     136 37.1
## 90         0.10326043  24     107 26.5
## 91         0.02040067  21      80 19.1
## 92         0.30744087  34     123 32.0
## 93         0.31948020  42      81 46.7
## 94         0.39870087  60     134 23.8
## 95         0.23776250  21     142 24.7
## 96         0.56884043  40     144 33.9
## 97         0.09647762  24      92 31.6
## 98         0.01718930  22      71 20.4
## 99         0.07653216  23      93 28.7
## 100        0.65810775  31     122 49.7
## 101        0.77018913  33     163 39.0
## 102        0.33387716  22     151 26.1
## 103        0.12273756  21     125 22.5
## 104        0.04407517  24      81 26.6
## 105        0.15687002  27      85 39.6
## 106        0.20185497  21     126 28.7
## 107        0.05549181  27      96 22.4
## 108        0.44920355  37     144 29.5
## 109        0.09229839  25      83 34.3
## 110        0.16661940  24      95 37.4
## 111        0.67346049  24     171 33.3
## 112        0.70042432  46     155 34.0
## 113        0.08254696  23      89 31.2
## 114        0.07164768  25      76 34.0
## 115        0.62528208  39     160 30.5
## 116        0.67008425  61     146 31.2
## 117        0.38171854  38     124 34.0
## 118        0.07464178  25      78 33.7
## 119        0.08152471  22      97 28.2
## 120        0.05582038  21      99 23.2
## 121        0.90191908  25     162 53.2
## 122        0.20945381  24     111 34.2
## 123        0.17465864  23     107 33.6
## 124        0.51139163  69     132 26.8
## 125        0.20316943  23     113 33.3
## 126        0.44483507  26      88 55.0
## 127        0.48619280  30     120 42.9
## 128        0.23346251  23     118 33.3
## 129        0.34777790  40     117 34.5
## 130        0.26573200  62     105 27.9
## 131        0.67483938  33     173 29.7
## 132        0.31871597  33     122 33.3
## 133        0.72476634  30     170 34.5
## 134        0.18399064  39      84 38.3
## 135        0.04834651  26      96 21.1
## 136        0.33949245  31     125 33.8
## 137        0.10807987  21     100 30.8
## 138        0.07452834  22      93 28.7
## 139        0.30701297  29     129 31.2
## 140        0.23426580  28     105 36.9
## 141        0.26698030  55     128 21.1
## 142        0.34785494  38     106 39.5
## 143        0.16180713  22     108 32.5
## 144        0.25353709  42     108 32.4
## 145        0.51149487  23     154 32.8
## 147        0.05287106  41      57 32.8
## 148        0.17493386  34     106 30.5
## 149        0.74711670  65     147 33.7
## 150        0.06000638  22      90 27.3
## 151        0.46199537  24     136 37.4
## 152        0.12428635  37     114 21.9
## 153        0.68933166  42     156 34.3
## 154        0.67051462  23     153 40.6
## 155        0.96022281  43     188 47.9
## 156        0.86895301  36     152 50.0
## 157        0.06282464  21      99 24.6
## 158        0.09658196  23     109 25.2
## 159        0.06477073  22      88 29.0
## 160        0.85590643  47     163 40.9
## 161        0.50854865  36     151 29.7
## 162        0.31513699  45     102 37.2
## 163        0.44079186  27     114 44.2
## 164        0.09892429  21     100 29.7
## 165        0.34954797  32     131 31.6
## 166        0.18616725  41     104 29.9
## 167        0.44449836  22     148 32.5
## 168        0.24339382  34     120 29.6
## 169        0.19361257  29     110 31.9
## 170        0.15377524  29     111 28.4
## 171        0.16673816  36     102 30.8
## 172        0.43550662  29     134 35.4
## 173        0.06733514  25      87 28.9
## 174        0.15979544  23      79 43.5
## 175        0.05988683  33      75 29.7
## 176        0.78563291  36     179 32.7
## 177        0.11866407  42      85 31.2
## 178        0.91067595  26     129 67.1
## 179        0.80825740  47     143 45.0
## 180        0.53993201  37     130 39.1
## 181        0.05025601  32      87 23.2
## 182        0.26703676  23     119 34.9
## 184        0.03707194  27      73 26.8
## 185        0.40252230  40     141 27.6
## 186        0.90574248  41     194 35.9
## 187        0.86120289  60     181 30.1
## 188        0.34005026  33     128 32.0
## 189        0.14630667  31     109 27.9
## 190        0.36876070  25     139 31.6
## 191        0.07904064  21     111 22.6
## 192        0.36791134  40     123 33.1
## 193        0.59421282  36     159 30.4
## 194        0.83322339  40     135 52.3
## 195        0.06814984  42      85 24.4
## 196        0.72166676  29     158 39.4
## 197        0.07473295  21     105 24.3
## 198        0.07492956  23     107 22.9
## 199        0.21618023  26     109 34.8
## 200        0.45868538  29     148 30.9
## 201        0.16377117  21     113 31.0
## 202        0.56854406  28     138 40.1
## 203        0.13888668  32     108 27.3
## 204        0.05179431  27      99 20.4
## 205        0.39920095  55     103 37.7
## 206        0.10279202  27     111 23.9
## 207        0.94962688  57     196 37.5
## 208        0.83235858  52     162 37.7
## 209        0.11534291  21      96 33.2
## 210        0.86661368  41     184 35.5
## 211        0.04976700  25      81 27.7
## 212        0.67335126  24     147 42.8
## 213        0.89304307  60     179 34.2
## 214        0.61220621  24     140 42.6
## 215        0.27923019  36     112 34.2
## 216        0.76451750  38     151 41.8
## 217        0.22670470  25     109 35.8
## 218        0.27330482  32     125 30.0
## 219        0.07659116  32      85 29.0
## 220        0.38185630  41     112 37.8
## 221        0.72467033  21     177 34.6
## 222        0.78827161  66     158 31.6
## 223        0.18565015  37     119 25.2
## 224        0.58685205  61     142 28.8
## 225        0.06829164  26     100 23.6
## 226        0.09949318  22      87 34.6
## 227        0.18367083  26     101 35.7
## 228        0.68004613  24     162 37.2
## 229        0.89605868  31     197 36.7
## 230        0.46813090  24     117 45.2
## 231        0.64472858  22     142 44.0
## 232        0.76813313  46     134 46.2
## 233        0.03512856  22      79 25.4
## 234        0.32697577  29     122 35.0
## 235        0.04410466  23      74 29.7
## 236        0.84628204  26     171 43.6
## 237        0.88969137  51     181 35.9
## 238        0.87532621  23     179 44.1
## 239        0.61780765  32     164 30.8
## 240        0.05170763  27     104 18.4
## 241        0.07082780  21      91 29.2
## 242        0.10017276  22      91 33.1
## 243        0.23827632  22     139 25.6
## 244        0.19422429  33     119 27.1
## 245        0.60311596  29     146 38.2
## 246        0.83303549  49     184 30.0
## 247        0.32770854  41     122 31.2
## 248        0.89909414  23     165 52.3
## 249        0.38428435  34     124 35.4
## 250        0.15124012  23     111 30.1
## 251        0.22120889  42     106 31.2
## 252        0.23889824  27     129 28.0
## 253        0.04953331  24      90 24.4
## 254        0.11459758  25      86 35.8
## 255        0.11691034  44      92 27.6
## 256        0.19828080  21     113 33.6
## 257        0.17887125  30     111 30.1
## 258        0.15623408  25     114 28.7
## 259        0.69883589  24     193 25.9
## 260        0.71707281  51     155 33.3
## 261        0.81853025  34     191 30.9
## 262        0.36525031  27     141 30.0
## 263        0.11051715  24      95 32.1
## 264        0.67512915  63     142 32.4
## 265        0.31358505  35     123 32.0
## 266        0.20261849  43      96 33.6
## 267        0.46226051  25     138 36.3
## 268        0.44933995  24     128 40.0
## 269        0.07235915  21     102 25.1
## 270        0.36110033  28     146 27.5
## 271        0.43567653  38     101 45.6
## 272        0.08877056  21     108 25.2
## 273        0.18493829  40     122 23.0
## 274        0.05088367  21      71 33.2
## 275        0.33128368  52     106 34.2
## 276        0.24506561  25     100 40.5
## 277        0.11369280  29     106 26.5
## 278        0.10154494  23     104 27.8
## 279        0.24801836  57     114 24.9
## 280        0.09186564  22     108 25.3
## 281        0.58972781  28     146 37.9
## 282        0.47370166  39     129 35.9
## 283        0.41711384  37     133 32.4
## 284        0.68313032  47     161 30.4
## 285        0.21797420  52     108 27.0
## 286        0.40116323  51     136 26.0
## 287        0.71641934  34     155 38.7
## 288        0.53067228  29     119 45.6
## 289        0.04712263  26      96 20.8
## 290        0.26775522  33     108 36.1
## 291        0.08745865  21      78 36.9
## 292        0.22682859  25     107 36.6
## 293        0.57291177  31     128 43.3
## 294        0.46046734  24     128 40.5
## 295        0.62758689  65     161 21.9
## 296        0.58058435  28     151 35.5
## 297        0.37824219  29     146 28.0
## 298        0.24803175  24     126 30.7
## 299        0.29474268  46     100 36.6
## 300        0.21955100  58     112 23.6
## 301        0.66018715  30     167 32.3
## 302        0.41100863  25     144 31.6
## 303        0.11129750  35      77 35.8
## 304        0.64729036  28     115 52.9
## 305        0.32005860  37     150 21.0
## 306        0.40826274  29     120 39.7
## 307        0.58137125  47     161 25.5
## 308        0.20853969  21     137 24.8
## 309        0.26360927  25     128 30.5
## 310        0.30776656  30     124 32.9
## 311        0.06535416  41      80 26.2
## 312        0.25036944  22     106 39.4
## 313        0.41092641  27     155 26.6
## 314        0.16107311  25     113 29.5
## 315        0.33149016  43     109 35.9
## 316        0.22369709  26     112 34.1
## 317        0.05117748  30      99 19.3
## 318        0.73245084  29     182 30.5
## 319        0.32712977  28     115 38.1
## 320        0.84109136  59     194 23.5
## 321        0.25184255  31     129 27.5
## 322        0.18282382  25     112 31.6
## 323        0.24378690  36     124 27.4
## 324        0.50258896  43     152 26.8
## 325        0.22371606  21     112 35.7
## 326        0.38582606  24     157 25.6
## 327        0.33531991  30     122 35.1
## 328        0.82388681  37     179 35.1
## 329        0.34014640  23     102 45.5
## 330        0.18639905  37     105 30.8
## 331        0.19088602  46     118 23.1
## 332        0.09218007  25      87 32.7
## 333        0.91902894  41     180 43.3
## 334        0.13200336  44     106 23.6
## 335        0.05320940  22      95 23.9
## 336        0.86742537  26     165 47.9
## 337        0.35965734  44     117 33.8
## 338        0.29290730  44     115 31.2
## 339        0.59568971  33     152 34.2
## 340        0.88624733  41     178 39.9
## 341        0.18920894  22     130 25.9
## 342        0.09132616  36      95 25.9
## 344        0.34659852  33     122 34.7
## 345        0.32815123  57      95 36.8
## 346        0.57649831  49     126 38.5
## 347        0.29236645  22     139 28.7
## 348        0.10531291  23     116 23.5
## 349        0.05676819  26      99 21.8
## 351        0.24193286  29      92 42.2
## 352        0.37729664  30     137 31.2
## 353        0.07898002  46      61 34.4
## 354        0.06279661  24      90 27.2
## 355        0.19814573  21      90 42.7
## 356        0.72467781  49     165 30.4
## 357        0.31076811  28     125 33.3
## 358        0.59801892  44     129 39.9
## 359        0.20451411  48      88 35.3
## 360        0.88526710  29     196 36.5
## 361        0.78897461  29     189 31.2
## 362        0.74400417  63     158 29.8
## 363        0.50320629  65     103 39.2
## 364        0.82287641  67     146 38.5
## 365        0.54649709  30     147 34.9
## 366        0.16790447  30      99 34.0
## 367        0.21165603  29     124 27.6
## 368        0.04952246  21     101 21.0
## 369        0.04507078  22      81 27.5
## 370        0.48272227  45     133 32.8
## 371        0.78269038  25     173 38.4
## 373        0.09704412  21      84 35.8
## 374        0.19000429  25     105 34.9
## 375        0.34459338  28     122 36.2
## 376        0.75532282  58     140 39.2
## 377        0.06564960  22      98 25.2
## 378        0.12244145  22      87 37.2
## 379        0.85402788  32     156 48.3
## 380        0.30435064  35      93 43.4
## 381        0.14485063  24     107 30.8
## 382        0.05348425  22     105 20.0
## 383        0.09319434  21     109 25.4
## 384        0.05402437  25      90 25.1
## 385        0.15572249  25     125 24.3
## 386        0.10794570  24     119 22.3
## 387        0.26789645  35     116 32.3
## 388        0.46951938  45     105 43.3
## 389        0.65094037  58     144 32.0
## 390        0.13731054  28     100 31.6
## 391        0.19779697  42     100 32.0
## 392        0.85134307  27     166 45.7
## 393        0.16168195  21     131 23.7
## 394        0.13430673  37     116 22.1
## 395        0.60509679  31     158 32.9
## 396        0.21179332  25     127 27.7
## 397        0.09248973  39      96 24.7
## 398        0.33946376  22     131 34.3
## 399        0.02913697  25      82 21.1
## 400        0.84267098  25     193 34.9
## 401        0.13084013  31      95 32.0
## 402        0.39847300  55     137 24.2
## 403        0.48699564  35     136 35.0
## 404        0.07268482  38      72 31.6
## 405        0.74444828  41     168 32.9
## 406        0.46625190  26     123 42.1
## 407        0.26301832  46     115 28.9
## 408        0.05958239  25     101 21.9
## 409        0.80446521  39     197 25.9
## 410        0.84435244  28     172 42.4
## 411        0.19801831  28     102 35.7
## 412        0.22338931  25     112 34.4
## 413        0.61960934  22     143 42.4
## 414        0.26996338  21     143 26.2
## 415        0.39684645  21     138 34.6
## 416        0.72171476  22     173 35.7
## 417        0.07505161  22      97 27.2
## 418        0.64654541  37     144 38.5
## 419        0.02475853  27      83 18.2
## 420        0.21863561  28     129 26.4
## 421        0.50245499  26     119 45.3
## 422        0.05982690  21      94 26.0
## 423        0.23869814  21     102 40.6
## 424        0.17118007  21     115 30.8
## 425        0.77187668  36     151 42.9
## 426        0.84799767  31     184 37.0
## 428        0.82533914  38     181 34.1
## 429        0.53910692  26     135 40.6
## 430        0.21756689  43      95 35.0
## 431        0.05413944  23      99 22.2
## 432        0.11409791  38      89 30.4
## 433        0.05393312  22      80 30.0
## 434        0.27662620  29     139 25.6
## 435        0.06907776  36      90 24.5
## 436        0.64969264  29     141 42.4
## 437        0.61721992  41     140 37.4
## 438        0.42076439  28     147 29.9
## 439        0.03395945  21      97 18.2
## 440        0.26189146  31     107 36.8
## 441        0.87450365  41     189 34.3
## 442        0.07172639  22      83 32.2
## 443        0.23064261  24     117 33.2
## 444        0.18113764  33     108 30.5
## 445        0.20642234  30     117 29.7
## 446        0.96817202  25     180 59.4
## 447        0.08292490  28     100 25.3
## 448        0.16339824  26      95 36.5
## 449        0.15599867  22     104 33.6
## 450        0.21704383  26     120 30.5
## 451        0.02779798  23      82 21.2
## 452        0.26600085  23     134 28.9
## 453        0.18259140  25      91 39.9
## 454        0.27355231  72     119 19.6
## 455        0.19842979  24     100 37.8
## 456        0.78495582  38     175 33.6
## 457        0.48559124  62     135 26.7
## 458        0.07070356  24      86 30.2
## 459        0.74404318  51     148 37.6
## 460        0.59394115  81     134 25.9
## 461        0.17913761  48     120 20.8
## 462        0.02176006  26      71 21.8
## 463        0.10771664  39      74 35.3
## 464        0.08587260  37      88 27.6
## 465        0.14009285  34     115 24.0
## 466        0.11253219  21     124 21.8
## 467        0.03642788  22      74 27.8
## 468        0.17309703  25      97 36.8
## 469        0.27220493  38     120 30.0
## 470        0.79486433  27     154 46.1
## 471        0.64494785  28     144 41.3
## 472        0.36560333  22     137 33.2
## 473        0.33439546  22     119 38.8
## 474        0.48018967  50     136 29.9
## 475        0.15482239  24     114 28.9
## 476        0.49529998  59     137 27.3
## 477        0.19109830  29     105 33.7
## 478        0.12410536  31     114 23.8
## 479        0.24797063  39     126 25.9
## 480        0.49527096  63     132 28.0
## 481        0.68458039  35     158 35.5
## 482        0.33885594  29     123 35.2
## 483        0.06226367  28      85 27.8
## 484        0.12371592  23      84 38.2
## 485        0.72687680  31     145 44.2
## 486        0.56265143  24     135 42.3
## 487        0.54101247  21     139 40.7
## 488        0.95052210  58     173 46.5
## 489        0.08227156  28      99 25.6
## 490        0.89372062  67     194 26.1
## 491        0.11005245  24      83 36.8
## 492        0.16022990  42      89 33.5
## 493        0.16490742  33      99 32.8
## 494        0.33102362  45     125 28.9
## 496        0.75953954  66     166 26.6
## 497        0.12702210  30     110 26.0
## 498        0.06099965  25      81 30.1
## 499        0.84950541  55     195 25.1
## 500        0.54761477  39     154 29.3
## 501        0.11828122  21     117 25.2
## 502        0.12966096  28      84 37.2
## 504        0.17866354  41      94 33.3
## 505        0.24526648  40      96 37.3
## 506        0.09221063  38      75 33.3
## 507        0.83844593  35     180 36.5
## 508        0.22417076  21     130 28.6
## 509        0.06208386  21      84 30.4
## 510        0.33491186  64     120 25.0
## 511        0.11299337  46      84 29.7
## 512        0.18168280  21     139 22.1
## 513        0.12336536  58      91 24.2
## 514        0.06204313  22      91 27.3
## 515        0.07400936  24      99 25.6
## 516        0.59909870  28     163 31.6
## 517        0.58968113  53     145 30.3
## 518        0.56205034  51     125 37.6
## 519        0.09884110  41      76 32.8
## 520        0.27576164  60     129 19.6
## 521        0.02523876  25      68 25.0
## 522        0.28936865  26     124 33.2
## 524        0.48747062  45     130 34.2
## 525        0.25656343  24     125 31.6
## 526        0.03291344  21      87 21.8
## 527        0.03395945  21      97 18.2
## 528        0.13475781  24     116 26.3
## 529        0.18580656  22     117 30.8
## 530        0.12036732  31     111 24.6
## 531        0.19948677  22     122 29.8
## 532        0.38363361  24     107 45.3
## 533        0.19213803  29      86 41.3
## 534        0.09680854  31      91 29.8
## 535        0.06801244  24      77 33.3
## 536        0.32583319  23     132 32.9
## 537        0.21032098  46     105 29.6
## 538        0.04166002  67      57 21.7
## 539        0.35441889  23     127 36.3
## 540        0.43504221  32     129 36.4
## 541        0.33020703  43     100 39.4
## 542        0.31016162  27     128 32.4
## 543        0.25095444  56      90 34.9
## 544        0.14385608  25      84 39.5
## 545        0.09976982  29      88 32.0
## 546        0.85041886  37     186 34.5
## 547        0.95475562  53     187 43.6
## 548        0.35407064  28     131 33.1
## 549        0.76428975  50     164 32.8
## 550        0.78684564  37     189 28.5
## 551        0.13623736  21     116 27.4
## 552        0.07829450  25      84 31.9
## 553        0.35648962  66     114 27.8
## 554        0.07172684  23      88 29.9
## 555        0.12665258  28      84 36.9
## 556        0.21859713  37     124 25.5
## 557        0.21354962  30      97 38.1
## 558        0.27639435  58     110 27.8
## 559        0.49525433  42     103 46.2
## 560        0.09072912  35      85 30.1
## 561        0.49863081  54     125 33.8
## 562        0.92529006  28     198 41.3
## 563        0.13282467  24      87 37.6
## 564        0.10152438  32      99 26.9
## 565        0.10768221  27      91 32.4
## 566        0.06408069  22      95 26.1
## 567        0.19062099  21      99 38.6
## 568        0.17225806  46      92 32.0
## 569        0.57765294  37     154 31.3
## 570        0.33059973  33     121 34.3
## 571        0.09766907  39      78 32.5
## 572        0.14429785  21     130 22.6
## 573        0.14094557  22     111 29.5
## 574        0.14150256  22      98 34.7
## 575        0.35723780  23     143 30.1
## 576        0.28936725  25     119 35.5
## 577        0.11561206  35     108 24.0
## 578        0.40501046  21     118 42.9
## 579        0.29985290  36     133 27.0
## 580        0.94605555  62     197 34.7
## 581        0.67186867  21     151 42.1
## 582        0.10536856  27     109 25.0
## 583        0.36048196  62     121 26.5
## 584        0.31028760  42     100 38.7
## 585        0.36443558  52     124 28.7
## 586        0.04412532  22      93 22.5
## 587        0.58904672  41     143 34.9
## 588        0.08645851  29     103 24.3
## 589        0.84621187  52     176 33.3
## 590        0.02132939  25      73 21.1
## 591        0.59997227  45     111 46.8
## 592        0.30450063  24     112 39.4
## 593        0.50255482  44     132 34.4
## 594        0.05509608  25      82 28.5
## 595        0.33883146  34     123 33.6
## 596        0.76026197  22     188 32.0
## 597        0.22016747  46      67 45.3
## 598        0.05892304  21      89 27.8
## 599        0.81919423  38     173 36.8
## 600        0.08801009  26     109 23.1
## 601        0.11183714  24     108 27.1
## 602        0.06362255  28      96 23.7
## 603        0.21954458  30     124 27.8
## 604        0.73280017  54     150 35.2
## 605        0.74174410  36     183 28.4
## 606        0.30819104  21     124 35.8
## 607        0.83525045  22     181 40.0
## 608        0.03576715  25      92 19.5
## 609        0.70485802  27     152 41.5
## 610        0.09343462  23     111 24.0
## 611        0.14159007  24     106 30.9
## 612        0.75749794  36     174 32.9
## 613        0.81997981  40     168 38.2
## 614        0.16291589  26     105 32.5
## 615        0.63373693  50     138 36.1
## 616        0.10212907  27     106 25.8
## 617        0.19210652  30     117 28.7
## 618        0.01550448  23      68 20.1
## 619        0.25255837  50     112 28.2
## 620        0.23051726  24     119 32.4
## 621        0.30983015  28     112 38.4
## 622        0.05806539  28      92 24.2
## 623        0.91880972  45     183 40.8
## 624        0.23434633  21      94 43.5
## 625        0.13870084  21     108 30.8
## 626        0.16560570  29      90 37.7
## 627        0.14562927  21     125 24.7
## 628        0.30377897  21     132 32.4
## 629        0.47868226  45     128 34.6
## 630        0.05359131  21      94 24.7
## 631        0.17582314  34     114 27.4
## 632        0.16503579  24     102 34.5
## 633        0.11155604  23     111 26.2
## 634        0.20058382  22     128 27.5
## 635        0.07258193  31      92 25.9
## 636        0.19084517  38     104 31.2
## 637        0.20214464  48     104 28.8
## 638        0.10023708  23      94 31.6
## 639        0.26993477  32      97 40.9
## 640        0.05098845  28     100 19.5
## 641        0.11900934  27     102 29.3
## 642        0.32851032  24     128 34.3
## 643        0.56852689  50     147 29.5
## 644        0.08089095  31      90 28.0
## 645        0.10727554  27     103 27.6
## 646        0.72028900  30     157 39.4
## 647        0.48785825  33     167 23.4
## 648        0.79490552  22     179 37.8
## 649        0.38877110  42     136 28.3
## 650        0.09982364  23     107 26.4
## 651        0.05337015  23      91 25.2
## 652        0.25640525  27     117 33.8
## 653        0.31091947  28     123 34.1
## 654        0.16989599  27     120 26.8
## 655        0.17316520  22     106 34.2
## 656        0.66116202  25     155 38.7
## 657        0.05447303  22     101 21.8
## 658        0.47537877  41     120 38.9
## 659        0.60974433  51     127 39.0
## 660        0.08753509  27      80 34.2
## 661        0.68185368  54     162 27.7
## 662        0.92576964  22     199 42.9
## 663        0.81949182  43     167 37.6
## 664        0.66187967  40     145 37.9
## 665        0.31610727  40     115 33.7
## 666        0.22464126  24     112 34.8
## 667        0.74038861  70     145 32.5
## 668        0.18688482  40     111 27.5
## 669        0.22045595  43      98 34.0
## 670        0.62406486  45     154 30.9
## 671        0.77816331  49     165 33.6
## 672        0.06718727  21      99 25.4
## 673        0.11105363  47      68 35.5
## 674        0.75292098  22     123 57.3
## 675        0.34281746  68      91 35.6
## 676        0.82671405  31     195 30.9
## 677        0.56464439  53     156 24.8
## 678        0.13697367  25      93 35.3
## 679        0.31378419  25     121 36.0
## 680        0.06850206  23     101 24.2
## 681        0.01422697  22      56 24.2
## 682        0.87261940  26     162 49.6
## 683        0.26484153  22      95 44.6
## 684        0.28598125  27     125 32.3
## 686        0.32094951  25     129 33.2
## 687        0.15362251  22     130 23.1
## 688        0.13511619  29     107 28.3
## 689        0.22573798  23     140 24.1
## 690        0.82408951  46     144 46.1
## 691        0.11455343  34     107 24.6
## 692        0.83801270  44     158 42.3
## 693        0.36316566  23     121 39.1
## 694        0.56041931  43     129 38.5
## 695        0.04713819  25      90 23.5
## 696        0.49449709  43     142 30.4
## 697        0.63379122  31     169 29.9
## 698        0.06673730  22      99 25.0
## 699        0.35029763  28     127 34.5
## 700        0.47563672  26     118 44.5
## 701        0.32580619  26     122 35.9
## 702        0.33060689  49     125 27.6
## 703        0.82826641  52     168 35.0
## 704        0.54623397  41     129 38.5
## 705        0.14206554  27     110 28.4
## 706        0.14030083  28      80 39.8
## 708        0.31026059  22     127 34.4
## 709        0.73746767  45     164 32.8
## 710        0.16033718  23      93 38.0
## 711        0.51831021  24     158 31.2
## 712        0.32110908  40     126 29.6
## 713        0.58460380  38     129 41.2
## 714        0.21470587  21     134 26.4
## 715        0.13700414  32     102 29.5
## 716        0.83664700  34     187 33.9
## 717        0.73899368  31     173 33.8
## 718        0.11811293  56      94 23.1
## 719        0.21112030  24     108 35.5
## 720        0.28973691  52      97 35.6
## 721        0.07753199  34      83 29.3
## 722        0.27735147  21     114 38.1
## 723        0.52482847  42     149 29.3
## 724        0.46044316  42     117 39.1
## 725        0.29918259  45     111 32.8
## 726        0.39551955  38     112 39.4
## 727        0.27863328  25     116 36.1
## 728        0.38207831  22     141 32.4
## 729        0.46885040  22     175 22.9
## 730        0.08098655  22      92 30.1
## 731        0.29185614  34     130 28.4
## 732        0.16991140  22     120 28.4
## 733        0.86244607  24     174 44.5
## 734        0.11608125  22     106 29.0
## 735        0.15609935  53     105 23.3
## 736        0.15782675  28      95 35.4
## 737        0.18370871  21     126 27.4
## 738        0.06634126  42      65 32.0
## 739        0.16444963  21      99 36.6
## 740        0.34166645  42     102 39.5
## 741        0.60048581  48     120 42.3
## 742        0.13057157  26     102 30.8
## 743        0.12257080  22     109 28.5
## 744        0.54257711  45     140 32.7
## 745        0.76309072  39     153 40.6
## 746        0.18772966  46     100 30.0
## 747        0.80105067  27     147 49.3
## 748        0.25368504  32      81 46.3
## 749        0.87161001  36     187 36.4
## 750        0.58476045  50     162 24.3
## 751        0.31730612  22     136 31.2
## 752        0.39481184  28     121 39.0
## 753        0.10506793  25     108 26.0
## 754        0.88435023  26     181 43.3
## 755        0.65508506  45     154 32.4
## 756        0.46396635  37     128 36.5
## 757        0.45736779  39     137 32.0
## 758        0.52258754  52     123 36.3
## 759        0.24005514  26     106 37.5
## 760        0.94278973  66     190 35.5
## 761        0.06158409  22      88 28.4
## 762        0.89970687  43     170 44.0
## 763        0.05205013  33      89 22.5
## 764        0.33601291  63     101 32.9
## 765        0.35029608  27     122 36.8
## 766        0.17967203  30     121 26.2
## 767        0.37685726  47     126 30.1
## 768        0.08803680  23      93 30.4
 #xtabs(~ probability.of.hd + age + glucose + bmi, data=predicted.data)
# NOTE: I tried running this code from the class notes and it was running for about 5 minutes with no output, so I stopped it. I hope that isn't a problem

# Data frame of predicted probabilities alongside the actual outcome and predictors
predicted.data <- data.frame(
  probability.of.hd=logistic$fitted.values,
  diabetes = data_subset$Outcome, age=data_subset$Age, glucose = data_subset$Glucose, bmi = data_subset$BMI)
head(predicted.data)
##   probability.of.hd diabetes age glucose  bmi
## 1        0.66360006        1  50     148 33.6
## 2        0.06101402        0  31      85 26.6
## 3        0.61834186        1  32     183 23.3
## 4        0.06043396        0  21      89 28.1
## 5        0.65771328        1  33     137 43.1
## 6        0.14802668        0  30     116 25.6
# Confusion matrix

Confusion Matrix

# Create a numeric version of the outcome (0 = no diabetes, 1 = diabetes) to use when building the confusion matrix.
data_subset$Outcome_num <- ifelse(data_subset$Outcome == "1", 1, 0)


# Predicted probabilities
predicted.probs <- logistic$fitted.values

# Predicted classes: 1 if prob > 0.5, else 0
predicted.classes <- ifelse(predicted.probs > 0.5, 1, 0)

# Confusion matrix
confusion <- table(
  Predicted = factor(predicted.classes, levels = c(0, 1)),
  Actual = factor(data_subset$Outcome_num, levels = c(0, 1))
)

confusion
##          Actual
## Predicted   0   1
##         0 429 114
##         1  59 150
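# Extract values from the confusion matrix (rows = Predicted, columns = Actual)
# rather than typing them in by hand, so they stay correct if the model changes.
TN <- confusion["0", "0"]  # predicted 0, actual 0 (429)
FP <- confusion["1", "0"]  # predicted 1, actual 0 (59)
FN <- confusion["0", "1"]  # predicted 0, actual 1 (114)
TP <- confusion["1", "1"]  # predicted 1, actual 1 (150)

# Metrics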
accuracy <- (TP + TN) / (TP + TN + FP + FN)
sensitivity <- TP / (TP + FN)   # also called recall or true positive rate
specificity <- TN / (TN + FP)   # true negative rate
precision <- TP / (TP + FP)     # positive predictive value

cat("Accuracy:", round(accuracy, 3), "\nSensitivity:", round(sensitivity, 3), "\nSpecificity:", round(specificity, 3), "\nPrecision:", round(precision, 3))
## Accuracy: 0.77 
## Sensitivity: 0.568 
## Specificity: 0.879 
## Precision: 0.718

Interpret: How well does the model perform? Is it better at detecting diabetes (sensitivity) or non-diabetes (specificity)? Why might this matter for medical diagnosis?

This model isn't incredible, but it's not horrible either. Its overall accuracy is 77%. It correctly identifies individuals who do not have diabetes 87.9% of the time (specificity). However, it correctly identifies individuals who do have diabetes only 56.8% of the time (sensitivity), which is not far above a 50-50 chance. So the model is better at detecting non-diabetes than diabetes. For a medical diagnosis, I would say this is a pretty bad model. When it comes to an individual's health and life, the model needs to be extremely accurate. If the model detects diabetic individuals only a little more than half the time, then many people will be predicted healthy when they actually have diabetes (false negatives: 114 of the 264 diabetic cases here), which endangers them further.
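As a quick check on the numbers in this interpretation, here is a minimal sketch that rebuilds the confusion matrix from the printed counts and recomputes both rates. The names `cm`, `sens`, `spec`, `missed_cases`, and `false_alarms` are introduced only for illustration and are not part of the homework code.

```r
# Rebuild the 2x2 confusion matrix from the counts printed above.
# matrix() fills column by column: first the Actual = 0 column, then Actual = 1.
cm <- matrix(c(429, 59, 114, 150), nrow = 2,
             dimnames = list(Predicted = c("0", "1"), Actual = c("0", "1")))

# Sensitivity: share of actual diabetes cases that were predicted as diabetes.
sens <- cm["1", "1"] / sum(cm[, "1"])

# Specificity: share of actual non-diabetes cases that were predicted as non-diabetes.
spec <- cm["0", "0"] / sum(cm[, "0"])

# Raw counts of the two error types.
missed_cases <- cm["0", "1"]   # false negatives
false_alarms <- cm["1", "0"]   # false positives

cat("Sensitivity:", round(sens, 3), "\nSpecificity:", round(spec, 3),
    "\nMissed cases:", missed_cases, "\nFalse alarms:", false_alarms, "\n")
```

Reading the rates column by column (within each actual class) makes it harder to mix up which number describes diabetic patients and which describes healthy ones.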

Question 3: ROC Curve, AUC, and Interpretation

#Enter your code here
library(pROC)
## Warning: package 'pROC' was built under R version 4.5.3
## Type 'citation("pROC")' for a citation.
## 
## Attaching package: 'pROC'
## The following objects are masked from 'package:stats':
## 
##     cov, smooth, var
# ROC curve & AUC on full data
roc_obj <- roc(response = data_subset$Outcome,
               predictor = logistic$fitted.values,
               levels = c("0", "1"),
               direction = "<")  # smaller prob = Healthy

# Print AUC value
auc_val <- auc(roc_obj); auc_val
## Area under the curve: 0.828
# Plot ROC with AUC displayed
plot.roc(roc_obj, print.auc = TRUE, legacy.axes = TRUE,
         xlab = "False Positive Rate (1 - Specificity)",
         ylab = "True Positive Rate (Sensitivity)")

What does AUC indicate (0.5 = random, 1.0 = perfect)?

The AUC (Area Under the Curve) summarizes how well our model separates individuals who have diabetes from those who do not, across all possible thresholds. It can be read as the probability that a randomly chosen diabetic individual receives a higher predicted probability than a randomly chosen non-diabetic individual. If the ROC curve were a straight diagonal (AUC = 0.5), the model would be no better than a random guess. The closer the curve is pulled toward the upper left corner (AUC near 1.0), the better the model discriminates. Our model has an AUC of 0.828, which is pretty good. However, I think for a medical situation like this the model should be more accurate.
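One way to see what the AUC measures is to compute it by hand as a pairwise ranking probability. The sketch below uses a small made-up set of scores (not the diabetes data) and base R only; `scores_pos`, `scores_neg`, and `auc_manual` are hypothetical names chosen for this example.

```r
# Hypothetical predicted probabilities for 3 positive and 4 negative cases.
scores_pos <- c(0.9, 0.8, 0.4)
scores_neg <- c(0.7, 0.3, 0.2, 0.1)

# Compare every positive score with every negative score.
# A pair counts 1 if the positive case is ranked higher, 0.5 if tied, 0 otherwise.
wins <- outer(scores_pos, scores_neg, ">")
ties <- outer(scores_pos, scores_neg, "==")

# AUC = average over all positive/negative pairs.
auc_manual <- mean(wins + 0.5 * ties)
auc_manual  # 11 of the 12 pairs are ranked correctly, so about 0.917
```

An AUC of 0.5 would mean the positive case wins only half of these comparisons, which is what random scores would give.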

For diabetes diagnosis, prioritize sensitivity (catching cases) or specificity (avoiding false positives)? Suggest a threshold and explain.

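For diabetes diagnosis, you should prioritize sensitivity (catching cases). Obviously false positives aren't ideal, but they do far less harm than false negatives. In a medical scenario, a false negative means the patient won't receive the necessary treatment because they've been wrongly told they don't have diabetes. Such errors can cause the individual's health to severely deteriorate and can even lead to death. Because missing a case is so costly, I would suggest a classification threshold below the default of 0.5, for example around 0.3, so that more borderline patients are flagged for follow-up testing. A threshold as low as 0.05 or 0.01 would flag almost everyone, since (judging by the fitted probabilities printed above) very few fall below 0.05. Changing the threshold does not change the AUC, which stays at 0.828 because it summarizes performance over all thresholds.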
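The trade-off behind any threshold choice can be sketched by computing sensitivity and specificity at several cutoffs. The example below uses hypothetical toy probabilities and labels rather than the fitted model, and `rates_at` and `threshold_table` are illustrative names, not functions from any package.

```r
# Hypothetical predicted probabilities and true labels (1 = diabetes, 0 = no diabetes).
probs  <- c(0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.55, 0.60, 0.80, 0.90)
actual <- c(0,    0,    0,    1,    0,    1,    0,    1,    1,    1)

# Sensitivity and specificity when classifying as 1 whenever prob > threshold.
rates_at <- function(threshold, probs, actual) {
  predicted <- as.integer(probs > threshold)
  c(threshold   = threshold,
    sensitivity = sum(predicted == 1 & actual == 1) / sum(actual == 1),
    specificity = sum(predicted == 0 & actual == 0) / sum(actual == 0))
}

# One row per candidate threshold.
threshold_table <- t(sapply(c(0.25, 0.50, 0.75), rates_at,
                            probs = probs, actual = actual))
threshold_table
```

On this toy data, lowering the cutoff from 0.50 to 0.25 raises sensitivity from 0.6 to 1.0 while specificity drops from 0.8 to 0.6. The same sweep could be run on `logistic$fitted.values` and `data_subset$Outcome_num` to pick a threshold for the actual model.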