In this homework, you will apply logistic regression to a real-world dataset: the Pima Indians Diabetes Database. This dataset contains medical records from 768 women of Pima Indian heritage, aged 21 or older, and is used to predict the onset of diabetes (binary outcome: 0 = no diabetes, 1 = diabetes) based on physiological measurements.
The data is publicly available from the UCI Machine Learning Repository and can be imported directly.
Dataset URL: https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv
Columns (no header in the CSV, so we need to assign them manually): Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age, Outcome.
Task Overview: You will load the data, build a logistic regression model to predict diabetes onset using a subset of predictors (Glucose, BMI, Age), interpret the model, evaluate it with a confusion matrix and metrics, and analyze the ROC curve and AUC.
Cleaning the dataset
Don’t change the following code
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.5.1
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
url <- "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
data <- read.csv(url, header = FALSE)
colnames(data) <- c("Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome")
data$Outcome <- as.factor(data$Outcome)
# Handle missing values (replace 0s with NA because 0 makes no sense here)
data$Glucose[data$Glucose == 0] <- NA
data$BloodPressure[data$BloodPressure == 0] <- NA
data$BMI[data$BMI == 0] <- NA
colSums(is.na(data))
## Pregnancies Glucose BloodPressure
## 0 5 35
## SkinThickness Insulin BMI
## 0 0 11
## DiabetesPedigreeFunction Age Outcome
## 0 0 0
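As an aside (the cleaning chunk above is fixed, so this is purely illustrative): the same zero-to-NA replacement can be written more compactly with tidyverse verbs; data_tidy is a hypothetical name not used anywhere else in the homework.
# Equivalent tidyverse version of the zero-to-NA step (sketch only)
data_tidy <- data %>%
  mutate(across(c(Glucose, BloodPressure, BMI), ~ na_if(.x, 0)))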
Question 1: Create and Interpret a Logistic Regression Model
Fit a logistic regression model to predict Outcome using Glucose, BMI, and Age.
Provide the model summary.
Calculate and interpret R²: 1 - (model$deviance / model$null.deviance). What does it indicate about the model’s explanatory power?
## Enter your code here
logistic <- glm(Outcome ~ Glucose + BMI + Age, data = data, family = "binomial")
summary(logistic)
##
## Call:
## glm(formula = Outcome ~ Glucose + BMI + Age, family = "binomial",
## data = data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -9.032377 0.711037 -12.703 < 2e-16 ***
## Glucose 0.035548 0.003481 10.212 < 2e-16 ***
## BMI 0.089753 0.014377 6.243 4.3e-10 ***
## Age 0.028699 0.007809 3.675 0.000238 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 974.75 on 751 degrees of freedom
## Residual deviance: 724.96 on 748 degrees of freedom
## (16 observations deleted due to missingness)
## AIC: 732.96
##
## Number of Fisher Scoring iterations: 4
r_square <- 1 - (logistic$deviance/logistic$null.deviance)
r_square
## [1] 0.25626
What does it indicate about the model’s explanatory power? This pseudo-R² indicates that the model explains around 25.6% of the deviance (variation) in diabetes outcomes relative to the null model: modest but real explanatory power.
What does the intercept represent (log-odds of diabetes when predictors are zero)?
The intercept represents the log-odds of having diabetes when Glucose, BMI, and Age are all equal to 0. It has no meaningful real-world interpretation, because no person has a glucose level of 0, a BMI of 0, and an age of 0. Its purpose is to serve as the model’s baseline starting point for calculating log-odds.
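One optional way to give the intercept an interpretable baseline (a sketch, not required by the assignment) is to mean-center the predictors, so the intercept becomes the log-odds for a person with average Glucose, BMI, and Age; logistic_centered is a hypothetical name.
# Refit with mean-centered predictors: the slopes are unchanged, only the
# intercept's reference point moves to the average patient.
logistic_centered <- glm(Outcome ~ I(Glucose - mean(Glucose, na.rm = TRUE)) +
                           I(BMI - mean(BMI, na.rm = TRUE)) +
                           I(Age - mean(Age, na.rm = TRUE)),
                         data = data, family = "binomial")
plogis(coef(logistic_centered)[1])  # predicted probability at average values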
For each predictor (Glucose, BMI, Age), does a one-unit increase raise or lower the odds of diabetes? Are they significant (p-value < 0.05)? All three coefficients are positive, so a one-unit increase in Glucose, BMI, or Age raises the odds of diabetes, and all three p-values are below 0.05, so each predictor is statistically significant. Per unit, BMI has the largest coefficient (0.0898), followed by Glucose (0.0355) and Age (0.0287), so Age’s per-unit effect is the smallest of the three, as the odds-ratio sketch below makes concrete.
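To make these directions concrete, exponentiating the coefficients converts log-odds to odds ratios (a quick sketch using the fitted model above):
# exp(beta) is the multiplicative change in the odds of diabetes for a
# one-unit increase in each predictor, holding the others fixed.
exp(coef(logistic))
# e.g. exp(0.035548) is about 1.036: each additional unit of Glucose
# multiplies the odds of diabetes by roughly 1.04.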
Question 2: Confusion Matrix and Important Metrics
Predict probabilities using the fitted model.
Create predicted classes with a 0.5 threshold (1 if probability > 0.5, else 0).
Build a confusion matrix (Predicted vs. Actual Outcome).
Calculate and report the metrics:
Accuracy: (TP + TN) / Total
Sensitivity (Recall): TP / (TP + FN)
Specificity: TN / (TN + FP)
Precision: TP / (TP + FP)
Use the following starter code
# Keep only rows with no missing values in Glucose, BMI, or Age
data_subset <- data[complete.cases(data[, c("Glucose", "BMI", "Age")]), ]
# Create a numeric version of the outcome (0 = no diabetes, 1 = diabetes). This is required for calculating the confusion matrix.
data_subset$Outcome_num <- ifelse(data_subset$Outcome == "1", 1, 0)
# Predicted probabilities
predicted.data <- data.frame(probability_diabetes = logistic$fitted.values,
                             Glucose = data_subset$Glucose,
                             BMI = data_subset$BMI,
                             Age = data_subset$Age)
head(predicted.data)
##   probability_diabetes Glucose  BMI Age
## 1           0.66360006     148 33.6  50
## 2           0.06101402      85 26.6  31
## 3           0.61834186     183 23.3  32
## 4           0.06043396      89 28.1  21
## 5           0.65771328     137 43.1  33
## 6           0.14802668     116 25.6  30
# Predicted classes
predicted.classes <- ifelse(predicted.data$probability_diabetes > 0.5, 1, 0)
# Confusion matrix
confusion <- table(
Predicted = factor(predicted.classes, levels = c(0, 1)),
Actual = factor(data_subset$Outcome_num, levels = c(0, 1))
)
confusion
## Actual
## Predicted 0 1
## 0 429 114
## 1 59 150
# Extract values from the confusion matrix:
TN <- 429
FP <- 59
FN <- 114
TP <- 150
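As a more robust alternative to hardcoding these counts, the same values can be read off the confusion matrix by its dimnames, so they stay correct if the data change (a sketch; rows are Predicted, columns are Actual):
# TN = Predicted 0 / Actual 0, FP = Predicted 1 / Actual 0, etc.
TN <- confusion["0", "0"]; FP <- confusion["1", "0"]
FN <- confusion["0", "1"]; TP <- confusion["1", "1"]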
#Metrics
accuracy <- (TP + TN) / (TP + TN + FP + FN)
sensitivity <- TP / (TP + FN) # also called recall or true positive rate
specificity <- TN / (TN + FP) # true negative rate
precision <- TP / (TP + FP) # positive predictive value
f1_score <- 2 * (precision * sensitivity) / (precision + sensitivity)
cat("Accuracy:", round(accuracy, 3), "\nSensitivity:", round(sensitivity, 3), "\nSpecificity:", round(specificity, 3), "\nPrecision:", round(precision, 3))
## Accuracy: 0.77
## Sensitivity: 0.568
## Specificity: 0.879
## Precision: 0.718
Interpret: How well does the model perform? Is it better at detecting diabetes (sensitivity) or non-diabetes (specificity)? Why might this matter for medical diagnosis?
With an accuracy of 77%, the model performs moderately well and correctly classifies most cases.
It is better at identifying non-diabetic individuals (specificity = 87.9%) than at detecting diabetes (sensitivity = 56.8%), which means it misses a substantial share of true diabetes cases. The precision of 71.8% shows that most positive predictions are correct, but the lower sensitivity confirms that many diabetic patients go undetected. In conclusion, the model is better at identifying people without diabetes than at catching people with it, which matters in medicine because a missed diagnosis can be detrimental to a patient’s health and livelihood.
Question 3: ROC Curve, AUC, and Interpretation
Plot the ROC curve, using the “data_subset” from Q2.
Calculate AUC.
#Enter your code here
library(pROC)
## Type 'citation("pROC")' for a citation.
##
## Attaching package: 'pROC'
## The following objects are masked from 'package:stats':
##
## cov, smooth, var
# ROC curve & AUC on full data
roc_obj <- roc(response = data_subset$Outcome_num,
predictor = logistic$fitted.values,
levels = c(0, 1),
direction = "<")
# Print AUC value
auc_val <- auc(roc_obj); auc_val
## Area under the curve: 0.828
# Plot ROC with AUC displayed
plot.roc(roc_obj, print.auc = TRUE, legacy.axes = TRUE,
xlab = "False Positive Rate (1 - Specificity)",
ylab = "True Positive Rate (Sensitivity)")
What does AUC indicate (0.5 = random, 1.0 = perfect)?
AUC measures how well the model distinguishes between the two classes: an AUC of 0.5 means the model is no better than random guessing, while an AUC of 1.0 means perfect classification. Our AUC of 0.828 indicates good discriminative ability.
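Equivalently, AUC is the probability that a randomly chosen diabetic case receives a higher predicted probability than a randomly chosen non-diabetic case; a small sketch verifying this Mann–Whitney interpretation on data_subset:
# Compare every diabetic/non-diabetic pair of fitted probabilities;
# ties count as half. The result should match auc_val (~0.828).
p_pos <- logistic$fitted.values[data_subset$Outcome_num == 1]
p_neg <- logistic$fitted.values[data_subset$Outcome_num == 0]
mean(outer(p_pos, p_neg, ">") + 0.5 * outer(p_pos, p_neg, "==ized"))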
For diabetes diagnosis, prioritize sensitivity (catching cases) or specificity (avoiding false positives)? Suggest a threshold and explain.
For diabetes diagnosis, it is better to prioritize sensitivity, because catching people who really have diabetes matters more than avoiding a few false alarms. A false negative is worse: the person will not get the treatment they need, while a false positive just means more testing. Lowering the threshold from 0.5 to something like 0.3 marks more people as possibly having diabetes, raising sensitivity even if specificity drops a bit, which is usually the safer choice in medical screening.
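To see what the lower cutoff buys, pROC's coords() reports sensitivity and specificity at any threshold (the 0.3 value here is illustrative, not a clinically validated choice):
# Sensitivity/specificity at the proposed 0.3 cutoff
coords(roc_obj, x = 0.3, input = "threshold",
       ret = c("threshold", "sensitivity", "specificity"))
# Threshold maximizing Youden's J (sensitivity + specificity - 1)
coords(roc_obj, x = "best", best.method = "youden")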