Introduction:

In this homework, you will apply logistic regression to a real-world dataset: the Pima Indians Diabetes Database. This dataset contains medical records from 768 women of Pima Indian heritage, aged 21 or older, and is used to predict the onset of diabetes (binary outcome: 0 = no diabetes, 1 = diabetes) based on physiological measurements.

The data is publicly available from the UCI Machine Learning Repository and can be imported directly.

Dataset URL: https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv

Columns (no header in the CSV, so we need to assign them manually):

  1. Pregnancies: Number of times pregnant
  2. Glucose: Plasma glucose concentration (2-hour test)
  3. BloodPressure: Diastolic blood pressure (mm Hg)
  4. SkinThickness: Triceps skin fold thickness (mm)
  5. Insulin: 2-hour serum insulin (mu U/ml)
  6. BMI: Body mass index (weight in kg/(height in m)^2)
  7. DiabetesPedigreeFunction: Diabetes pedigree function (a function scoring genetic risk)
  8. Age: Age in years
  9. Outcome: Class variable (0 = no diabetes, 1 = diabetes)

Task Overview: You will load the data, build a logistic regression model to predict diabetes onset using a subset of predictors (Glucose, BMI, Age), interpret the model, evaluate it with a confusion matrix and metrics, and analyze the ROC curve and AUC.

Cleaning the dataset (do not change the following code)

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.1
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
url <- "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"

data <- read.csv(url, header = FALSE)

colnames(data) <- c("Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome")

data$Outcome <- as.factor(data$Outcome)

# Handle missing values (replace 0s with NA because 0 makes no sense here)
data$Glucose[data$Glucose == 0] <- NA
data$BloodPressure[data$BloodPressure == 0] <- NA
data$BMI[data$BMI == 0] <- NA


colSums(is.na(data))
##              Pregnancies                  Glucose            BloodPressure 
##                        0                        5                       35 
##            SkinThickness                  Insulin                      BMI 
##                        0                        0                       11 
## DiabetesPedigreeFunction                      Age                  Outcome 
##                        0                        0                        0
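Note that glm() silently drops rows containing an NA in any model variable (the default na.action = na.omit). As a quick sanity check (a sketch using base R's complete.cases()), the predictors used in Question 1 leave 752 usable rows, which matches the "16 observations deleted due to missingness" note in the model summary below:

# Rows with complete Glucose, BMI, and Age (the Question 1 predictors);
# glm() drops the other 16 of the 768 rows automatically
sum(complete.cases(data[, c("Glucose", "BMI", "Age")]))
## [1] 752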

Question 1: Create and Interpret a Logistic Regression Model - Fit a logistic regression model to predict Outcome using Glucose, BMI, and Age.

Logistic Regression Model

## Enter your code here
logistic <- glm(Outcome ~ Glucose + BMI + Age, data = data, family = "binomial")

summary(logistic)
## 
## Call:
## glm(formula = Outcome ~ Glucose + BMI + Age, family = "binomial", 
##     data = data)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -9.032377   0.711037 -12.703  < 2e-16 ***
## Glucose      0.035548   0.003481  10.212  < 2e-16 ***
## BMI          0.089753   0.014377   6.243  4.3e-10 ***
## Age          0.028699   0.007809   3.675 0.000238 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 974.75  on 751  degrees of freedom
## Residual deviance: 724.96  on 748  degrees of freedom
##   (16 observations deleted due to missingness)
## AIC: 732.96
## 
## Number of Fisher Scoring iterations: 4
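To turn the fitted model into probabilities for specific patients, predict() with type = "response" applies the inverse logit. A minimal sketch with hypothetical values (the two patients below are made up for illustration):

# Predicted probability of diabetes for two hypothetical patients
new_patients <- data.frame(Glucose = c(95, 170), BMI = c(24, 38), Age = c(25, 55))
predict(logistic, newdata = new_patients, type = "response")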

Calculating R^2

# McFadden's pseudo-R^2: 1 - (residual deviance / null deviance)
r_square <- 1 - (logistic$deviance / logistic$null.deviance)

r_square
## [1] 0.25626

What does it indicate about the model’s explanatory power? This is McFadden’s pseudo-R^2: the model reduces the null deviance by about 25.6%. It is not literally “variance explained” as in linear regression, but it indicates modest, meaningful explanatory power for a model with only three predictors.

What does the intercept represent (log-odds of diabetes when predictors are zero)?

The intercept represents the log-odds of having diabetes when Glucose, BMI, and Age are all equal to 0. It has no meaningful real-world interpretation, because no person has a glucose level of 0, a BMI of 0, and an age of 0. Its purpose is to serve as the model’s baseline starting point for calculating log-odds.
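If we nonetheless want the baseline probability the intercept implies, the inverse logit maps log-odds to probability; a minimal sketch:

# Probability implied by the intercept alone
# plogis(x) = exp(x) / (1 + exp(x)), the inverse logit
plogis(coef(logistic)["(Intercept)"])   # about 0.00012, essentially zero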

For each predictor (Glucose, BMI, Age), does a one-unit increase raise or lower the odds of diabetes? Are they significant (p-value < 0.05)? All three coefficients are positive, so a one-unit increase in Glucose, BMI, or Age increases the odds of diabetes. All three p-values are well below 0.05, so each predictor is statistically significant. Age has the smallest coefficient, so its per-unit effect is weaker than that of Glucose or BMI.
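Because these coefficients are on the log-odds scale, exponentiating them gives odds ratios, which are easier to read; a short sketch:

# Odds ratios: multiplicative change in the odds per one-unit increase
exp(coef(logistic))
# From the coefficients above: Glucose ~ 1.036, BMI ~ 1.094, Age ~ 1.029,
# e.g. each additional unit of BMI multiplies the odds of diabetes by ~1.09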

Question 2: Confusion Matrix and Important Metrics

Calculate and report the metrics:

  • Accuracy: (TP + TN) / Total
  • Sensitivity (Recall): TP / (TP + FN)
  • Specificity: TN / (TN + FP)
  • Precision: TP / (TP + FP)

Use the following starter code

# Keep only rows with no missing values in Glucose, BMI, or Age
data_subset <- data[complete.cases(data[, c("Glucose", "BMI", "Age")]), ]

# Create a numeric version of the outcome (0 = no diabetes, 1 = diabetes).
# This is required for calculating confusion matrices.
data_subset$Outcome_num <- ifelse(data_subset$Outcome == "1", 1, 0)


# Predicted probabilities (logistic$fitted.values covers the same 752
# complete-case rows as data_subset, so the columns line up)
predicted.data <- data.frame(probability_diabetes = logistic$fitted.values,
                             Glucose = data_subset$Glucose,
                             BMI = data_subset$BMI,
                             Age = data_subset$Age)

# Show only the first rows; the full data frame has 752 rows
head(predicted.data)
##   probability_diabetes Glucose  BMI Age
## 1           0.66360006     148 33.6  50
## 2           0.06101402      85 26.6  31
## 3           0.61834186     183 23.3  32
## 4           0.06043396      89 28.1  21
## 5           0.65771328     137 43.1  33
## 6           0.14802668     116 25.6  30
# Predicted classes at the default 0.5 probability cutoff
predicted.classes <- ifelse(predicted.data$probability_diabetes > 0.5, 1, 0)


# Confusion matrix
confusion <- table(
  Predicted = factor(predicted.classes, levels = c(0, 1)),
  Actual = factor(data_subset$Outcome_num, levels = c(0, 1))
)

confusion
##          Actual
## Predicted   0   1
##         0 429 114
##         1  59 150

Now let’s calculate key performance metrics:

# Extract values from the confusion matrix (rows = Predicted, columns = Actual)
TN <- confusion["0", "0"]   # 429: correctly predicted non-diabetic
FP <- confusion["1", "0"]   #  59: predicted diabetic, actually non-diabetic
FN <- confusion["0", "1"]   # 114: predicted non-diabetic, actually diabetic
TP <- confusion["1", "1"]   # 150: correctly predicted diabetic

# Metrics
accuracy <- (TP + TN) / (TP + TN + FP + FN)
sensitivity <- TP / (TP + FN)   # also called recall or true positive rate
specificity <- TN / (TN + FP)   # true negative rate
precision <- TP / (TP + FP)     # positive predictive value
f1_score <- 2 * (precision * sensitivity) / (precision + sensitivity)

cat("Accuracy:", round(accuracy, 3), "\nSensitivity:", round(sensitivity, 3), "\nSpecificity:", round(specificity, 3), "\nPrecision:", round(precision, 3))
## Accuracy: 0.77 
## Sensitivity: 0.568 
## Specificity: 0.879 
## Precision: 0.718
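The F1 score is computed above but never printed; for completeness (0.634 follows directly from the precision and sensitivity values above):

cat("F1 Score:", round(f1_score, 3))
## F1 Score: 0.634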

Interpret: How well does the model perform? Is it better at detecting diabetes (sensitivity) or non-diabetes (specificity)? Why might this matter for medical diagnosis?

The model has an accuracy of 77%, which means it performs moderately well and correctly classifies most cases.

The model is better at identifying non-diabetic individuals, with a high specificity of 87.9%, than at detecting people with diabetes: the sensitivity is only 56.8%, so the model misses a substantial share of true diabetes cases. The precision of 71.8% shows that most positive predictions are correct, but the low sensitivity remains the main weakness. In conclusion, the model identifies people without diabetes more reliably than it catches cases of diabetes. This matters in medicine because missed cases (false negatives) can be detrimental to people’s health.
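One reason sensitivity lags behind specificity is class imbalance: of the 752 complete-case rows, only about a third are diabetic (these counts are the column totals of the confusion matrix above). A quick check:

# Class balance in the modeling subset: 488 non-diabetic vs 264 diabetic
table(data_subset$Outcome_num)
prop.table(table(data_subset$Outcome_num))   # roughly 65% vs 35%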

Question 3: ROC Curve, AUC, and Interpretation

  • Plot the ROC curve, use the “data_subset” from Q2.

  • Calculate AUC.

# Enter your code here
library(pROC)
## Type 'citation("pROC")' for a citation.
## 
## Attaching package: 'pROC'
## The following objects are masked from 'package:stats':
## 
##     cov, smooth, var
# ROC curve & AUC on the complete-case data (data_subset)
roc_obj <- roc(response = data_subset$Outcome_num,
               predictor = logistic$fitted.values,
               levels = c(0, 1),
               direction = "<")  

# Print AUC value
auc_val <- auc(roc_obj); auc_val
## Area under the curve: 0.828
# Plot ROC with AUC displayed
plot.roc(roc_obj, print.auc = TRUE, legacy.axes = TRUE,
         xlab = "False Positive Rate (1 - Specificity)",
         ylab = "True Positive Rate (Sensitivity)")

What does AUC indicate (0.5 = random, 1.0 = perfect)?

The AUC measures how well the model distinguishes between the two classes across all possible thresholds: an AUC of 0.5 means the model is no better than random guessing, and an AUC of 1.0 means perfect classification. Our AUC of 0.828 indicates good discriminative ability.

For diabetes diagnosis, prioritize sensitivity (catching cases) or specificity (avoiding false positives)? Suggest a threshold and explain. For diabetes diagnosis, it is better to prioritize sensitivity, because catching people who really have diabetes matters more than avoiding some false alarms. A false negative is worse, since that person will not get the treatment they need, while a false positive just means more testing. To catch more real cases, we can lower the threshold from 0.5 to something like 0.3, so the model flags more people as possibly having diabetes. This raises sensitivity even though specificity drops somewhat, which is usually the safer trade-off in medical screening.
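As a sketch of how the trade-off shifts at that lower cutoff (using pROC's coords(); the 0.3 value is the suggestion above, not a tuned threshold):

# Sensitivity and specificity at the default 0.5 cutoff vs. the proposed 0.3
coords(roc_obj, x = c(0.5, 0.3), input = "threshold",
       ret = c("threshold", "sensitivity", "specificity"))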