Multiple Logistic Regression

Multiple logistic regression analysis of diabetes complications

Step 1: Generate the data

Study objective: To investigate the association between various risk factors and the occurrence of diabetes complications (yes/no) using multiple logistic regression.

  • The data set contains information on 1000 patients with diabetes.
  • The variables include age (in years), body mass index (BMI), blood pressure (BP), cholesterol level, smoking status (yes/no), and the presence of diabetes complications (yes/no), all generated randomly.
set.seed(123) # For reproducibility
n <- 1000
age <- rnorm(n, mean=55, sd=10)
bmi <- rnorm(n, mean=28, sd=5)
bp <- rnorm(n, mean=130, sd=15)
cholesterol <- rnorm(n, mean=200, sd=30)
smoking_status <- rbinom(n, 1, 0.3) # 30% smokers
# Logistic model to generate diabetes complications
logit_prob <- -5 + 0.02*age + 0.1*bmi + 0.005*bp + 0.004*cholesterol + 1.0*smoking_status
prob_diabetes_complications <- 1 / (1 + exp(-logit_prob))
diabetes_complications <- rbinom(n, 1, prob_diabetes_complications)

Create a data frame

diabetes_data <- data.frame(
  age = age,
  bmi = bmi,
  bp = bp,
  cholesterol = cholesterol,
  smoking_status = factor(smoking_status, labels=c("No", "Yes")),
  diabetes_complications = factor(diabetes_complications, labels=c("No", "Yes"))
)
head(diabetes_data)
##        age      bmi       bp cholesterol smoking_status diabetes_complications
## 1 49.39524 23.02101 122.3259    195.4908             No                    Yes
## 2 52.69823 22.80022 133.5541    190.1673             No                    Yes
## 3 70.58708 27.91010 121.8762    156.5550            Yes                    Yes
## 4 55.70508 27.33912 148.2884    179.0815             No                    Yes
## 5 56.29288 15.25329 132.6120    277.9547            Yes                     No
## 6 72.15065 33.20287 120.7710    198.8775            Yes                    Yes
  • (Optional) Export the data to a CSV file
write.csv(diabetes_data, "diabetes_data.csv", row.names = FALSE)
  • Since the variables were created as factors above, we do not need to convert them again (the mutate step is not needed). If the data were instead re-read from the CSV file, they would need re-coding, as sketched below.
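
If the exported CSV is later read back in, read.csv() returns the character columns as plain text, so they would need to be re-coded as factors at that point. A minimal sketch (assuming the tidyverse loaded in Step 2 and the file written above; diabetes_reimport is a hypothetical name):

# Hypothetical re-import: re-code the character columns as factors
diabetes_reimport <- read.csv("diabetes_data.csv") %>%
  mutate(
    smoking_status = factor(smoking_status, levels = c("No", "Yes")),
    diabetes_complications = factor(diabetes_complications, levels = c("No", "Yes"))
  )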

Step 2: Load the required libraries

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(broom)
library(haven)
library(here)
## here() starts at D:/DrPH/SEM1/Multivariable Analysis/Asignment Prof KIM/MLogR
library(gtsummary)

Step 3: Estimation

Univariable logistic regression

For age

Set options(scipen=999) to avoid scientific notation in the output.

modlog.age <- glm(diabetes_complications ~ age, data = diabetes_data, family = binomial(link = "logit"))
summary(modlog.age)
## 
## Call:
## glm(formula = diabetes_complications ~ age, family = binomial(link = "logit"), 
##     data = diabetes_data)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.836200   0.377519  -2.215 0.026761 *  
## age          0.026078   0.006826   3.820 0.000133 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1302.2  on 999  degrees of freedom
## Residual deviance: 1287.2  on 998  degrees of freedom
## AIC: 1291.2
## 
## Number of Fisher Scoring iterations: 4
options(scipen=999)
tidy(modlog.age, exponentiate = TRUE, conf.int = TRUE)
## # A tibble: 2 × 7
##   term        estimate std.error statistic  p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)    0.433   0.378       -2.21 0.0268      0.206     0.906
## 2 age            1.03    0.00683      3.82 0.000133    1.01      1.04

Interpretation: In this univariable model, each additional year of age increases the odds of having diabetes complications by a factor of exp(0.026) = 1.03.

For BMI

modlog.bmi <- glm(diabetes_complications ~ bmi, data = diabetes_data, family = binomial(link = "logit"))
summary(modlog.bmi)
## 
## Call:
## glm(formula = diabetes_complications ~ bmi, family = binomial(link = "logit"), 
##     data = diabetes_data)
## 
## Coefficients:
##             Estimate Std. Error z value         Pr(>|z|)    
## (Intercept) -2.21317    0.39569  -5.593 0.00000002229953 ***
## bmi          0.10071    0.01415   7.117 0.00000000000111 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1302.2  on 999  degrees of freedom
## Residual deviance: 1247.5  on 998  degrees of freedom
## AIC: 1251.5
## 
## Number of Fisher Scoring iterations: 4
tidy(modlog.bmi, exponentiate = TRUE, conf.int = TRUE)
## # A tibble: 2 × 7
##   term        estimate std.error statistic  p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)    0.109    0.396      -5.59 2.23e- 8   0.0499     0.236
## 2 bmi            1.11     0.0142      7.12 1.11e-12   1.08       1.14

Interpretation: In this univariable model, each additional unit increase in BMI increases the odds of having diabetes complications by a factor of exp(0.101) = 1.106.

For Blood Pressure

modlog.bp <- glm(diabetes_complications ~ bp, data = diabetes_data, family = binomial(link = "logit"))
summary(modlog.bp)
## 
## Call:
## glm(formula = diabetes_complications ~ bp, family = binomial(link = "logit"), 
##     data = diabetes_data)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)  0.779754   0.588108   1.326    0.185
## bp          -0.001441   0.004503  -0.320    0.749
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1302.2  on 999  degrees of freedom
## Residual deviance: 1302.1  on 998  degrees of freedom
## AIC: 1306.1
## 
## Number of Fisher Scoring iterations: 4
tidy(modlog.bp, exponentiate = TRUE, conf.int = TRUE)
## # A tibble: 2 × 7
##   term        estimate std.error statistic p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
## 1 (Intercept)    2.18    0.588       1.33    0.185    0.690      6.93
## 2 bp             0.999   0.00450    -0.320   0.749    0.990      1.01

Interpretation: In this univariable model, each additional unit increase in blood pressure changes the odds of having diabetes complications by a factor of exp(-0.001) = 0.999, i.e. essentially no change; the association is not statistically significant (p = 0.749).

For Cholesterol

modlog.chol <- glm(diabetes_complications ~ cholesterol, data = diabetes_data, family = binomial(link = "logit"))
summary(modlog.chol)
## 
## Call:
## glm(formula = diabetes_complications ~ cholesterol, family = binomial(link = "logit"), 
##     data = diabetes_data)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.057273   0.447296   0.128    0.898
## cholesterol 0.002686   0.002223   1.208    0.227
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1302.2  on 999  degrees of freedom
## Residual deviance: 1300.7  on 998  degrees of freedom
## AIC: 1304.7
## 
## Number of Fisher Scoring iterations: 4
tidy(modlog.chol, exponentiate = TRUE, conf.int = TRUE)
## # A tibble: 2 × 7
##   term        estimate std.error statistic p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
## 1 (Intercept)     1.06   0.447       0.128   0.898    0.441      2.55
## 2 cholesterol     1.00   0.00222     1.21    0.227    0.998      1.01

Interpretation: In this univariable model, each additional unit increase in cholesterol level increases the odds of having diabetes complications by a factor of exp(0.003) = 1.003; the association is not statistically significant (p = 0.227).

For Smoking Status

modlog.smoke <- glm(diabetes_complications ~ smoking_status, data = diabetes_data, family = binomial(link = "logit"))
summary(modlog.smoke)
## 
## Call:
## glm(formula = diabetes_complications ~ smoking_status, family = binomial(link = "logit"), 
##     data = diabetes_data)
## 
## Coefficients:
##                   Estimate Std. Error z value     Pr(>|z|)    
## (Intercept)        0.36408    0.07567   4.811 0.0000014982 ***
## smoking_statusYes  0.92607    0.16425   5.638 0.0000000172 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1302.2  on 999  degrees of freedom
## Residual deviance: 1267.4  on 998  degrees of freedom
## AIC: 1271.4
## 
## Number of Fisher Scoring iterations: 4
tidy(modlog.smoke, exponentiate = TRUE, conf.int = TRUE)
## # A tibble: 2 × 7
##   term              estimate std.error statistic      p.value conf.low conf.high
##   <chr>                <dbl>     <dbl>     <dbl>        <dbl>    <dbl>     <dbl>
## 1 (Intercept)           1.44    0.0757      4.81 0.00000150       1.24      1.67
## 2 smoking_statusYes     2.52    0.164       5.64 0.0000000172     1.84      3.51

Interpretation: In this univariable model, smokers have exp(0.926) = 2.524 times the odds of having diabetes complications compared to non-smokers.

Multivariable logistic regression

We will use only the statistically significant and clinically relevant variables from the univariable analysis (age, BMI, and smoking status).

without interaction terms

modlog.multi <- glm(diabetes_complications ~ age + bmi + smoking_status, data = diabetes_data, family = binomial(link = "logit"))
summary(modlog.multi)
## 
## Call:
## glm(formula = diabetes_complications ~ age + bmi + smoking_status, 
##     family = binomial(link = "logit"), data = diabetes_data)
## 
## Coefficients:
##                    Estimate Std. Error z value         Pr(>|z|)    
## (Intercept)       -3.797968   0.557167  -6.817 0.00000000000932 ***
## age                0.023893   0.007142   3.345         0.000822 ***
## bmi                0.101781   0.014576   6.983 0.00000000000289 ***
## smoking_statusYes  0.990752   0.169755   5.836 0.00000000533555 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1302.2  on 999  degrees of freedom
## Residual deviance: 1199.1  on 996  degrees of freedom
## AIC: 1207.1
## 
## Number of Fisher Scoring iterations: 4
tidy(modlog.multi, exponentiate = TRUE, conf.int = TRUE)
## # A tibble: 4 × 7
##   term              estimate std.error statistic  p.value conf.low conf.high
##   <chr>                <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)         0.0224   0.557       -6.82 9.32e-12  0.00741    0.0659
## 2 age                 1.02     0.00714      3.35 8.22e- 4  1.01       1.04  
## 3 bmi                 1.11     0.0146       6.98 2.89e-12  1.08       1.14  
## 4 smoking_statusYes   2.69     0.170        5.84 5.34e- 9  1.94       3.78

Interpretation: After adjusting for the other variables in the model, each additional year of age increases the odds of diabetes complications by a factor of exp(0.024) = 1.024, each additional unit increase in BMI increases the odds by a factor of exp(0.102) = 1.107, and smokers have exp(0.991) = 2.694 times the odds of diabetes complications compared to non-smokers.

with interaction terms (age and BMI)

modlog.interact <- glm(diabetes_complications ~ age + bmi + smoking_status + age:bmi, data = diabetes_data, family = binomial(link = "logit"))
summary(modlog.interact)
## 
## Call:
## glm(formula = diabetes_complications ~ age + bmi + smoking_status + 
##     age:bmi, family = binomial(link = "logit"), data = diabetes_data)
## 
## Coefficients:
##                     Estimate Std. Error z value      Pr(>|z|)    
## (Intercept)       -4.1322178  2.3194910  -1.782        0.0748 .  
## age                0.0300109  0.0418060   0.718        0.4728    
## bmi                0.1139454  0.0832113   1.369        0.1709    
## smoking_statusYes  0.9903857  0.1698123   5.832 0.00000000547 ***
## age:bmi           -0.0002223  0.0014962  -0.149        0.8819    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1302.2  on 999  degrees of freedom
## Residual deviance: 1199.0  on 995  degrees of freedom
## AIC: 1209
## 
## Number of Fisher Scoring iterations: 4
tidy(modlog.interact, exponentiate = TRUE, conf.int = TRUE)
## # A tibble: 5 × 7
##   term              estimate std.error statistic      p.value conf.low conf.high
##   <chr>                <dbl>     <dbl>     <dbl>        <dbl>    <dbl>     <dbl>
## 1 (Intercept)         0.0160   2.32       -1.78       7.48e-2 0.000161      1.46
## 2 age                 1.03     0.0418      0.718      4.73e-1 0.950         1.12
## 3 bmi                 1.12     0.0832      1.37       1.71e-1 0.953         1.32
## 4 smoking_statusYes   2.69     0.170       5.83       5.47e-9 1.94          3.78
## 5 age:bmi             1.000    0.00150    -0.149      8.82e-1 0.997         1.00

Interpretation: The interaction term between age and BMI was not statistically significant (p = 0.882), indicating that the effect of age on diabetes complications does not differ significantly across BMI levels in this dataset.

with interaction terms (age and smoking status)

modlog.interact2 <- glm(diabetes_complications ~ age + bmi + smoking_status + age:smoking_status, data = diabetes_data, family = binomial(link = "logit"))
summary(modlog.interact2)
## 
## Call:
## glm(formula = diabetes_complications ~ age + bmi + smoking_status + 
##     age:smoking_status, family = binomial(link = "logit"), data = diabetes_data)
## 
## Coefficients:
##                        Estimate Std. Error z value         Pr(>|z|)    
## (Intercept)           -3.655834   0.587255  -6.225 0.00000000048067 ***
## age                    0.021160   0.008004   2.644           0.0082 ** 
## bmi                    0.102051   0.014588   6.996 0.00000000000264 ***
## smoking_statusYes      0.282053   0.965880   0.292           0.7703    
## age:smoking_statusYes  0.013173   0.017735   0.743           0.4576    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1302.2  on 999  degrees of freedom
## Residual deviance: 1198.5  on 995  degrees of freedom
## AIC: 1208.5
## 
## Number of Fisher Scoring iterations: 4
tidy(modlog.interact2, exponentiate = TRUE, conf.int = TRUE)
## # A tibble: 5 × 7
##   term                  estimate std.error statistic  p.value conf.low conf.high
##   <chr>                    <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)             0.0258   0.587      -6.23  4.81e-10  0.00804    0.0805
## 2 age                     1.02     0.00800     2.64  8.20e- 3  1.01       1.04  
## 3 bmi                     1.11     0.0146      7.00  2.64e-12  1.08       1.14  
## 4 smoking_statusYes       1.33     0.966       0.292 7.70e- 1  0.197      8.78  
## 5 age:smoking_statusYes   1.01     0.0177      0.743 4.58e- 1  0.979      1.05

Interpretation: The interaction term between age and smoking status was not statistically significant (p = 0.458), indicating that the effect of age on diabetes complications does not differ significantly between smokers and non-smokers in this dataset.

with interaction terms (BMI and smoking status)

modlog.interact3 <- glm(diabetes_complications ~ age + bmi + smoking_status + bmi:smoking_status, data = diabetes_data, family = binomial(link = "logit"))
summary(modlog.interact3)
## 
## Call:
## glm(formula = diabetes_complications ~ age + bmi + smoking_status + 
##     bmi:smoking_status, family = binomial(link = "logit"), data = diabetes_data)
## 
## Coefficients:
##                        Estimate Std. Error z value       Pr(>|z|)    
## (Intercept)           -3.724888   0.591973  -6.292 0.000000000313 ***
## age                    0.023949   0.007142   3.353       0.000799 ***
## bmi                    0.099043   0.016399   6.040 0.000000001544 ***
## smoking_statusYes      0.643954   0.978902   0.658       0.510645    
## bmi:smoking_statusYes  0.012851   0.035782   0.359       0.719488    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1302.2  on 999  degrees of freedom
## Residual deviance: 1198.9  on 995  degrees of freedom
## AIC: 1208.9
## 
## Number of Fisher Scoring iterations: 4
tidy(modlog.interact3, exponentiate = TRUE, conf.int = TRUE)
## # A tibble: 5 × 7
##   term                  estimate std.error statistic  p.value conf.low conf.high
##   <chr>                    <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)             0.0241   0.592      -6.29  3.13e-10  0.00743    0.0758
## 2 age                     1.02     0.00714     3.35  7.99e- 4  1.01       1.04  
## 3 bmi                     1.10     0.0164      6.04  1.54e- 9  1.07       1.14  
## 4 smoking_statusYes       1.90     0.979       0.658 5.11e- 1  0.273     12.8   
## 5 bmi:smoking_statusYes   1.01     0.0358      0.359 7.19e- 1  0.945      1.09

Interpretation: The interaction term between BMI and smoking status was not statistically significant (p = 0.719), indicating that the effect of BMI on diabetes complications does not differ significantly between smokers and non-smokers in this dataset.

Conclusion: Since none of the interaction terms were statistically significant, we will proceed with the simpler model without interaction terms (modlog.multi) for further inference and prediction.
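
As a formal check on this choice, likelihood-ratio tests can compare the main-effects model with each interaction model fitted above; a minimal sketch using anova():

# Likelihood-ratio tests: main-effects model vs each interaction model.
# Non-significant p-values support dropping the interaction terms.
anova(modlog.multi, modlog.interact, test = "Chisq")
anova(modlog.multi, modlog.interact2, test = "Chisq")
anova(modlog.multi, modlog.interact3, test = "Chisq")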

Step 4: Inference

For log odds and odds ratio

final_model <- modlog.multi
summary(final_model)
## 
## Call:
## glm(formula = diabetes_complications ~ age + bmi + smoking_status, 
##     family = binomial(link = "logit"), data = diabetes_data)
## 
## Coefficients:
##                    Estimate Std. Error z value         Pr(>|z|)    
## (Intercept)       -3.797968   0.557167  -6.817 0.00000000000932 ***
## age                0.023893   0.007142   3.345         0.000822 ***
## bmi                0.101781   0.014576   6.983 0.00000000000289 ***
## smoking_statusYes  0.990752   0.169755   5.836 0.00000000533555 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1302.2  on 999  degrees of freedom
## Residual deviance: 1199.1  on 996  degrees of freedom
## AIC: 1207.1
## 
## Number of Fisher Scoring iterations: 4
tidy(final_model, exponentiate = TRUE, conf.int = TRUE)
## # A tibble: 4 × 7
##   term              estimate std.error statistic  p.value conf.low conf.high
##   <chr>                <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)         0.0224   0.557       -6.82 9.32e-12  0.00741    0.0659
## 2 age                 1.02     0.00714      3.35 8.22e- 4  1.01       1.04  
## 3 bmi                 1.11     0.0146       6.98 2.89e-12  1.08       1.14  
## 4 smoking_statusYes   2.69     0.170        5.84 5.34e- 9  1.94       3.78

Interpretation: After adjusting for the other variables in the model, each additional year of age increases the odds of diabetes complications by a factor of 1.02 (95% CI: 1.01, 1.04), each additional unit increase in BMI increases the odds by a factor of 1.11 (95% CI: 1.08, 1.14), and smokers have 2.69 (95% CI: 1.94, 3.78) times the odds of having diabetes complications compared to non-smokers.
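
If the estimates are also needed on the log-odds scale (tidy() above reports the exponentiated odds ratios), they can be pulled directly from the fitted model; a minimal sketch:

# Log-odds coefficients with profile-likelihood 95% confidence intervals
cbind(logOR = coef(final_model), confint(final_model))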

Step 5: Prediction

augment(final_model, type.predict = "link") %>% 
head(10)
## # A tibble: 10 × 10
##    diabetes_complications   age   bmi smoking_status .fitted .resid    .hat
##    <fct>                  <dbl> <dbl> <fct>            <dbl>  <dbl>   <dbl>
##  1 Yes                     49.4  23.0 No             -0.275   1.30  0.00301
##  2 Yes                     52.7  22.8 No             -0.218   1.27  0.00287
##  3 Yes                     70.6  27.9 Yes             1.72    0.574 0.00475
##  4 Yes                     55.7  27.3 No              0.316   1.05  0.00153
##  5 No                      56.3  15.3 Yes             0.0903 -1.22  0.0128 
##  6 Yes                     72.2  33.2 Yes             2.30    0.438 0.00392
##  7 Yes                     59.6  29.2 No              0.603   0.934 0.00178
##  8 Yes                     42.3  40.1 No              1.29    0.696 0.00799
##  9 No                      48.1  31.4 No              0.551  -1.42  0.00264
## 10 No                      50.5  25.8 No              0.0321 -1.19  0.00197
## # ℹ 3 more variables: .sigma <dbl>, .cooksd <dbl>, .std.resid <dbl>
augment(final_model, type.predict = "response") %>% 
head(10)
## # A tibble: 10 × 10
##    diabetes_complications   age   bmi smoking_status .fitted .resid    .hat
##    <fct>                  <dbl> <dbl> <fct>            <dbl>  <dbl>   <dbl>
##  1 Yes                     49.4  23.0 No               0.432  1.30  0.00301
##  2 Yes                     52.7  22.8 No               0.446  1.27  0.00287
##  3 Yes                     70.6  27.9 Yes              0.848  0.574 0.00475
##  4 Yes                     55.7  27.3 No               0.578  1.05  0.00153
##  5 No                      56.3  15.3 Yes              0.523 -1.22  0.0128 
##  6 Yes                     72.2  33.2 Yes              0.909  0.438 0.00392
##  7 Yes                     59.6  29.2 No               0.646  0.934 0.00178
##  8 Yes                     42.3  40.1 No               0.785  0.696 0.00799
##  9 No                      48.1  31.4 No               0.634 -1.42  0.00264
## 10 No                      50.5  25.8 No               0.508 -1.19  0.00197
## # ℹ 3 more variables: .sigma <dbl>, .cooksd <dbl>, .std.resid <dbl>

Manually calculate the predicted probability for the first observation

# Coefficients
coef <- coef(final_model)
# First observation values
obs1 <- diabetes_data[1, ]
# Linear predictor
linear_predictor <- coef[1] + coef[2]*obs1$age + coef[3]*obs1$bmi + coef[4]*(ifelse(obs1$smoking_status == "Yes", 1, 0))
# Probability calculation
probability <- 1 / (1 + exp(-linear_predictor))
probability
## (Intercept) 
##   0.4317638

Interpretation: For the first patient in the dataset, the predicted probability of having diabetes complications is approximately 0.432 (43.2%), matching the first row of the augmented data above.
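
The same value can be obtained with predict(), which provides a convenient cross-check on the manual calculation:

# Cross-check: predicted probability for the first observation
predict(final_model, newdata = diabetes_data[1, ], type = "response")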

Step 6: Confounding and mediation

Load library

library(corrplot)
## corrplot 0.95 loaded

Check correlation between age and BMI (numerical variables)

cor(diabetes_data$age, diabetes_data$bmi, use = 'complete.obs')
## [1] 0.08647944

Correlation matrix of the numeric covariates (age, BMI, BP, cholesterol)

cor_matrix <- diabetes_data %>%
    dplyr::select(age, bmi, bp, cholesterol) %>%
    cor(use = "complete.obs")
cor_matrix
##                     age          bmi          bp  cholesterol
## age          1.00000000  0.086479441 -0.01932954 -0.002994710
## bmi          0.08647944  1.000000000  0.02650333 -0.007029076
## bp          -0.01932954  0.026503334  1.00000000  0.050560850
## cholesterol -0.00299471 -0.007029076  0.05056085  1.000000000

Visualize the correlation matrix

corrplot(cor_matrix, type = 'upper', order = 'hclust')

Interpretation: The correlation between age and BMI is low (r = 0.09), indicating that there is weak to no linear relationship between these two variables in this dataset. Therefore, we can include both in the model without worrying about multicollinearity, and it is unlikely that BMI strongly confounds the relationship between age and diabetes complications.
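
A more direct check for multicollinearity in the fitted model is the variance inflation factor; a minimal sketch, assuming the car package is installed:

# Variance inflation factors for the multivariable model;
# values well below 5-10 suggest multicollinearity is not a concern
library(car)
vif(final_model)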

Check the association of each variable with the outcome (diabetes complications)

slr.age.bmi.smoke <- 
  diabetes_data %>%
  dplyr::select(age, bmi, smoking_status) %>%
  purrr::map(~ glm(diabetes_complications ~ .x, data = diabetes_data, family = binomial)) %>%
  purrr::map(tidy) %>%  bind_rows()

Display the results

slr.age.bmi.smoke %>% 
  mutate(model = c('b0', 'age', 'b0', 'bmi', 'b0', 'smoking_statusYes')) %>%
  dplyr::select(model, everything())
## # A tibble: 6 × 6
##   model             term        estimate std.error statistic  p.value
##   <chr>             <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 b0                (Intercept)  -0.836    0.378       -2.21 2.68e- 2
## 2 age               .x            0.0261   0.00683      3.82 1.33e- 4
## 3 b0                (Intercept)  -2.21     0.396       -5.59 2.23e- 8
## 4 bmi               .x            0.101    0.0142       7.12 1.11e-12
## 5 b0                (Intercept)   0.364    0.0757       4.81 1.50e- 6
## 6 smoking_statusYes .xYes         0.926    0.164        5.64 1.72e- 8

Interpretation: All three variables (age, BMI, and smoking status) show significant associations with diabetes complications in the univariable analyses. This suggests that they are important predictors and should be included in the multivariable model. Since none of these variables show strong correlations with each other, it is unlikely that confounding is a major concern in this analysis.
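
Confounding can also be assessed more directly by comparing the crude and adjusted odds ratios for an exposure of interest; a sketch for smoking status (a change of more than roughly 10% is often taken to suggest confounding):

# Crude (univariable) vs adjusted (multivariable) OR for smoking status
crude.or    <- exp(coef(modlog.smoke)["smoking_statusYes"])
adjusted.or <- exp(coef(final_model)["smoking_statusYes"])
100 * (adjusted.or - crude.or) / crude.or  # percent change in the OR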

Step 7: Model checking

Load library

library(caret)
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
## 
##     lift

Create predicted classes based on a 0.5 threshold

final.m.prob <- augment(final_model, type.predict = "response") %>%
  mutate(pred.class = ifelse(.fitted >= 0.5, "Yes", "No"))

Create confusion matrix

confusionMatrix(as.factor(final.m.prob$pred.class), diabetes_data$diabetes_complications, positive = "Yes")
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  No Yes
##        No  113  79
##        Yes 243 565
##                                               
##                Accuracy : 0.678               
##                  95% CI : (0.648, 0.7069)     
##     No Information Rate : 0.644               
##     P-Value [Acc > NIR] : 0.01301             
##                                               
##                   Kappa : 0.2171              
##                                               
##  Mcnemar's Test P-Value : < 0.0000000000000002
##                                               
##             Sensitivity : 0.8773              
##             Specificity : 0.3174              
##          Pos Pred Value : 0.6993              
##          Neg Pred Value : 0.5885              
##              Prevalence : 0.6440              
##          Detection Rate : 0.5650              
##    Detection Prevalence : 0.8080              
##       Balanced Accuracy : 0.5974              
##                                               
##        'Positive' Class : Yes                 
## 

Interpretation: The confusion matrix shows that the model has an accuracy of approximately 67.8%, with a sensitivity of 87.7% and a specificity of 31.7%. This indicates that the model is quite good at identifying patients with diabetes complications (high sensitivity) but less effective at correctly identifying those without complications (low specificity). Depending on the clinical context, further refinement of the model or adjustment of the classification threshold may be necessary to improve performance.
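
Beyond the confusion matrix at a single 0.5 threshold, overall discrimination can be summarized with the area under the ROC curve; a minimal sketch, assuming the pROC package is installed:

# ROC curve and AUC for the final model
library(pROC)
roc.final <- roc(diabetes_data$diabetes_complications,
                 predict(final_model, type = "response"))
auc(roc.final)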

Check for linearity of covariates in the logit

Load library

library(mfp)
## Warning: package 'mfp' was built under R version 4.5.2
## Loading required package: survival
## 
## Attaching package: 'survival'
## The following object is masked from 'package:caret':
## 
##     cluster
lin.age.bmi <- mfp(diabetes_complications ~ fp(age) + fp(bmi) + smoking_status, data = diabetes_data, family = binomial(link = "logit"),
                   verbose = TRUE)
## 
##  Variable    Deviance    Power(s)
## ------------------------------------------------
## Cycle 1
##  bmi              
##              1251.692     
##              1199.066    1
##              1199.066    1
##              1199.065    -2 1
## 
##  smoking_statusYes            
##              1236.408     
##              1199.066    1
##                              
##                              
## 
##  age              
##              1210.496     
##              1199.066    1
##              1198.847    2
##              1197.198    -2 -2
## 
## 
## Tansformation
##                   shift scale
## bmi                   0    10
## smoking_statusYes     0     1
## age                   0   100
## 
## Fractional polynomials
##                   df.initial select alpha df.final power1 power2
## bmi                        4      1  0.05        1      1      .
## smoking_statusYes          1      1  0.05        1      1      .
## age                        4      1  0.05        1      1      .
## 
## 
## Transformations of covariates:
##                       formula
## age            I((age/100)^1)
## bmi             I((bmi/10)^1)
## smoking_status smoking_status
## 
## 
## Deviance table:
##           Resid. Dev
## Null model    1302.164
## Linear model  1199.066
## Final model   1199.066
summary(lin.age.bmi)
## 
## Call:
## glm(formula = diabetes_complications ~ I((bmi/10)^1) + smoking_status + 
##     I((age/100)^1), family = binomial(link = "logit"), data = diabetes_data)
## 
## Coefficients:
##                   Estimate Std. Error z value         Pr(>|z|)    
## (Intercept)        -3.7980     0.5572  -6.817 0.00000000000932 ***
## I((bmi/10)^1)       1.0178     0.1458   6.983 0.00000000000289 ***
## smoking_statusYes   0.9908     0.1698   5.836 0.00000000533555 ***
## I((age/100)^1)      2.3893     0.7142   3.345         0.000822 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1302.2  on 999  degrees of freedom
## Residual deviance: 1199.1  on 996  degrees of freedom
## AIC: 1207.1
## 
## Number of Fisher Scoring iterations: 4

Interpretation: The analysis confirms that both age and BMI have a linear relationship with the log-odds of diabetes complications. The selected power for both variables is 1, which means a straight line (\(x^1\)) fits the data best. Because the relationship is linear, we do not need to apply any transformations to these variables and can include them in the logistic regression model exactly as they are.
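
Overall calibration of the final model can also be checked with the Hosmer-Lemeshow goodness-of-fit test; a minimal sketch, assuming the ResourceSelection package is installed:

# Hosmer-Lemeshow test with 10 groups: a non-significant p-value
# indicates no evidence of poor fit
library(ResourceSelection)
hoslem.test(as.numeric(diabetes_data$diabetes_complications == "Yes"),
            fitted(final_model), g = 10)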

Step 8: Diagnostics plot

plot(final_model)

Interpretation: The diagnostic plots show no strong systematic patterns in the residuals, and only a few observations stand out as potential outliers, none of which appear to exert a large influence on the model. Because the outcome is binary, the usual linear-regression checks for normality and homoscedasticity of residuals do not apply directly, so these plots are screened mainly for influential points. Overall, the model appears adequate.
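
As noted above, the default plot() output is less informative for a binary outcome; a binned residual plot is often a better check of calibration. A minimal sketch, assuming the arm package is installed:

# Binned residual plot: mean residuals within bins of fitted probability
# should scatter around zero if the model is well calibrated
arm::binnedplot(fitted(final_model),
                residuals(final_model, type = "response"))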

Influential observations

infl <- influence.measures(final_model)
infl.v <- data.frame(infl$infmat)
names(infl.v)
## [1] "dfb.1_"   "dfb.age"  "dfb.bmi"  "dfb.sm_Y" "dffit"    "cov.r"    "cook.d"  
## [8] "hat"

Identify observations with high Cook’s distance

cutoff.cook.d <- 4/n  # rule-of-thumb cutoff of 4/n, with n = 1000 observations
infl.v |>
filter(cook.d > cutoff.cook.d)
##             dfb.1_      dfb.age         dfb.bmi    dfb.sm_Y       dffit
## 5    -0.0594546125 -0.013760500  0.095001730423 -0.06653561 -0.12715827
## 9    -0.0036668251  0.030754523 -0.034136540333  0.02103548 -0.06662158
## 13    0.0287719914  0.019330487 -0.049539410183 -0.02409077  0.06889973
## 25    0.0771688848 -0.024049945 -0.073232852550 -0.03014678  0.09259093
## 26    0.0403897472 -0.062474750  0.012708518816 -0.02039296  0.07550210
## 29   -0.0243170607  0.047803601 -0.021702139261  0.02157175 -0.06960049
## 33   -0.0074937617 -0.038155329  0.038617929701  0.02202888 -0.06888745
## 41   -0.0433039754  0.024800426  0.035611298199 -0.08036276 -0.10537456
## 51    0.0302424402 -0.016125885 -0.026021688661 -0.08823483 -0.10178356
## 53    0.0256254063 -0.002166441 -0.033137362615 -0.08829956 -0.10246213
## 55    0.0263571000  0.011569229 -0.057346000895  0.01938892 -0.07752616
## 56    0.0624278374 -0.071405249 -0.026992630749  0.01894098 -0.09312697
## 64   -0.0024765226  0.047029675 -0.051640543801  0.02059966 -0.08477320
## 72    0.0832766087 -0.098035618 -0.010709360763 -0.02670270  0.10994829
## 73    0.0090593588 -0.054651847  0.040253018050 -0.08597737 -0.11929707
## 84    0.0377687885  0.031634070 -0.073302182685 -0.02674058  0.09171378
## 87    0.0164852277 -0.058759061  0.034005868007 -0.08675925 -0.11946518
## 88    0.0391918009 -0.019769929 -0.044786369011  0.01928679 -0.07154832
## 90    0.0061733527  0.036065248 -0.043268020184  0.04784627  0.07924954
## 91    0.0848531673 -0.044083088 -0.084074853529  0.01496162 -0.10934298
## 97    0.0791171146 -0.103721099 -0.018815503197  0.01849341 -0.11829082
## 104   0.0585049925 -0.011517174 -0.060016996515 -0.02770489  0.07808634
## 108   0.0449211285 -0.063159470  0.007348289352 -0.02136874  0.07612787
## 120  -0.0748294882  0.033276610  0.070774239114 -0.06662538 -0.11548341
## 122   0.0797264747 -0.038831401 -0.062543601010 -0.02969689  0.08988129
## 123   0.0260769865  0.024494648 -0.069473177330  0.01897824 -0.08921042
## 136   0.0065789490 -0.049856990  0.030064171963  0.02203569 -0.07359092
## 140   0.0468197530 -0.055465987 -0.002487085289 -0.02236398  0.07048256
## 143   0.0425936602 -0.059100982  0.006584327447 -0.02115850  0.07261121
## 145   0.0358577160 -0.058166111  0.014619531848 -0.01967485  0.07182520
## 146  -0.0298989047  0.018205674  0.023543671421 -0.08334318 -0.10218926
## 147   0.0398092461 -0.039167877 -0.016951071598  0.05197936  0.07588149
## 149   0.0588442092 -0.103605283  0.019025975925 -0.08815882 -0.14288937
## 159   0.0027456343 -0.053445966  0.047769397602 -0.08503917 -0.12114988
## 161   0.0448052898 -0.049227113 -0.024053856596  0.01979964 -0.07568435
## 164   0.0943647052 -0.155363837  0.009978694732  0.01904539 -0.16396028
## 169   0.0277103200  0.024224304 -0.052775234620 -0.02423263  0.07264669
## 175  -0.0152108423  0.029647153 -0.007681956767 -0.08671213 -0.10346525
## 179   0.0219031793  0.018622702 -0.048069730471  0.05225514  0.08101861
## 187   0.0020600581 -0.048228270  0.034879140917  0.02208916 -0.07397214
## 195   0.0693652101 -0.053957476 -0.034103918658 -0.02695120  0.08054690
## 198   0.0200122488  0.054334162 -0.079910755756 -0.08814365 -0.13406576
## 202   0.0098092418 -0.057787054  0.033239660704  0.02207295 -0.08033060
## 207  -0.0450501036  0.029129766  0.033836409105 -0.08028843 -0.10587811
## 216   0.0758226752 -0.078707957 -0.038399280066  0.01778459 -0.10308053
## 217   0.0912043340 -0.015192359 -0.100779221031 -0.03281726  0.11471181
## 219   0.0786245487 -0.051676024 -0.048763432672 -0.02874054  0.08747657
## 223  -0.0285665197 -0.038551332  0.069731654212  0.02042252 -0.08568228
## 231   0.0259219358 -0.087571483  0.039551078465  0.02208404 -0.10607868
## 234   0.0542111888 -0.019370863 -0.055841529499  0.05913453  0.09461803
## 244   0.0606051811 -0.045645819 -0.030267662714 -0.02585371  0.07267663
## 248  -0.0268439714  0.057644900 -0.027756719794  0.02153043 -0.07846011
## 249   0.0208581566  0.032611617 -0.070106389594  0.01918866 -0.09194029
## 255   0.0884981235 -0.073696805 -0.060594144695  0.01595973 -0.11020328
## 257   0.0064457152  0.043350099 -0.042637535263 -0.02109147  0.07258503
## 260  -0.0424008183  0.058922115 -0.006862063876  0.02157026 -0.07303982
## 261   0.0196567805  0.020424437 -0.046719262609 -0.08841155 -0.10899426
## 265   0.0697677772 -0.108829685 -0.000981009138  0.01966567 -0.12061391
## 273   0.0456490046 -0.035326258 -0.028683753625 -0.08784789 -0.10605961
## 278   0.0285276876 -0.002845517 -0.046392622493  0.01967038 -0.06922296
## 286   0.0134328922 -0.011570409 -0.007307304224 -0.08790483 -0.09948174
## 288  -0.0049929931  0.062583090 -0.045709127549 -0.02001701  0.08599440
## 289   0.0768438131 -0.005102569 -0.090938183933 -0.03119519  0.10434887
## 307   0.1075469544 -0.089340206 -0.051846700001 -0.03129327  0.11676762
## 309   0.0019803460 -0.053200378  0.039859677881  0.02213573 -0.07924040
## 310   0.0587224720 -0.096156836  0.002064485612  0.02021131 -0.10899499
## 312   0.0187150916 -0.041812873  0.014597468457 -0.08750916 -0.10807664
## 313  -0.0189853445  0.077794759 -0.048927206463 -0.08837825 -0.13396596
## 322   0.0103028065  0.040749114 -0.053461853558  0.05048495  0.08955506
## 327   0.0339110289  0.004020864 -0.060498537222  0.01894705 -0.07952394
## 332  -0.0200876063  0.051438675 -0.031229147877  0.02140447 -0.07592914
## 336  -0.0927371536  0.070310859  0.059675366135 -0.06187064 -0.12284673
## 338  -0.0412626364  0.001445270  0.055329031520 -0.07839313 -0.10936270
## 340  -0.0063186863 -0.001194817  0.009838454548 -0.08637283 -0.09934483
## 343   0.0263529474 -0.075743149  0.027426836187  0.02181642 -0.09315769
## 345   0.0534406641  0.001576445 -0.065700332953 -0.02763650  0.08130034
## 347  -0.0239752587 -0.011879375  0.044414848646 -0.08227025 -0.10719248
## 349   0.0583447915 -0.038604395 -0.033950839810 -0.02589303  0.07013564
## 353  -0.0281200591  0.013298615  0.025830741013 -0.08335600 -0.10196855
## 354   0.0278642341 -0.004904459 -0.043476684723  0.01979292 -0.06739643
## 358   0.0519474777  0.002348921 -0.064428211019 -0.02741474  0.08015976
## 359  -0.0674779177  0.070208738  0.018830513588  0.01914386 -0.08017726
## 371   0.0534477567 -0.113021894  0.025732149114  0.02119108 -0.12559214
## 374  -0.0205578060 -0.032935191  0.052295950198  0.02158272 -0.07262499
## 380  -0.0363781180  0.074866730 -0.031117358896  0.02157295 -0.09226627
## 385   0.0461770064 -0.025214914 -0.039161056408 -0.08762281 -0.10578707
## 386  -0.0200183638 -0.050643552  0.069084046177  0.02123510 -0.09205411
## 388  -0.0205823131  0.048712031 -0.018684248248 -0.08697779 -0.11159512
## 389   0.0051881693  0.019884326 -0.035952659824  0.02076579 -0.06407994
## 392   0.1066685503 -0.095765683 -0.044489750903 -0.03080859  0.11858049
## 395   0.0030302623  0.026238553 -0.039115734216  0.02074795 -0.06792017
## 396   0.0011787421 -0.063151331  0.050753506653  0.02220334 -0.09083038
## 401  -0.0267036156 -0.003719957  0.040296447882 -0.08227757 -0.10506390
## 403  -0.0011670753  0.028623198 -0.035563542128  0.02094326 -0.06661295
## 413   0.0199038098  0.028692793 -0.046611229634 -0.02292537  0.06912788
## 414   0.0336481722  0.002129164 -0.048308806620 -0.08799173 -0.10682912
## 416   0.0687467299 -0.106433985  0.016781969411 -0.02332233  0.11605932
## 422   0.0386554039 -0.012602120 -0.050979319636  0.01908390 -0.07396269
## 432   0.0421991897  0.020098396 -0.068255047235 -0.02679014  0.08473924
## 436  -0.0240675538  0.044996301 -0.019308037310  0.02159412 -0.06696531
## 437  -0.0351727246 -0.024729977  0.065825122919  0.01981129 -0.07682384
## 439   0.0464022483 -0.061905294 -0.004086164522  0.05180609  0.08776164
## 444   0.0236213059  0.005257703 -0.037542278590 -0.08833385 -0.10392539
## 448  -0.0299578144 -0.031682534  0.064915281759  0.02053172 -0.07935661
## 449   0.0398167475 -0.074486501  0.024855552418 -0.01929654  0.08732540
## 450  -0.0467830957  0.061013644 -0.002620209516  0.02144763 -0.07391290
## 453   0.0368115369 -0.019856481 -0.041405333692  0.01950731 -0.06936922
## 454   0.0623934557 -0.032331485 -0.064651700806  0.01733276 -0.08997287
## 456   0.0275586194 -0.060227054  0.020199259279  0.04262695  0.07968510
## 458   0.0253642411  0.024164237 -0.068166815513  0.01905659 -0.08810504
## 459   0.0272677056 -0.010390670 -0.037330319930  0.02001436 -0.06422138
## 462   0.0080215678  0.016009300 -0.036144155528  0.02068043 -0.06331700
## 468   0.0726840970 -0.081250374 -0.031630357026  0.01825056 -0.10256805
## 472   0.0533102471 -0.008667426 -0.055748174951 -0.02693961  0.07393994
## 473   0.0401789224 -0.035419721 -0.021074012470  0.05260297  0.07589007
## 476   0.0531931545  0.010834981 -0.074206953006 -0.02819178  0.08888738
## 479   0.0521408899  0.020992879 -0.082490064894 -0.02865674  0.09738132
## 482   0.0264584408 -0.014390797 -0.032326277984  0.02020068 -0.06215377
## 484   0.0383255177 -0.004601803 -0.058261704922  0.01882777 -0.07816617
## 490   0.0150230672  0.018211439 -0.048044025266  0.02012471 -0.07171115
## 491   0.0168399933  0.005128740 -0.037875939756  0.02035996 -0.06317653
## 493   0.0083818178 -0.056774951  0.034271365107  0.02209013 -0.07989279
## 495   0.0638807476 -0.034726577 -0.045090086448 -0.02718443  0.07509361
## 499   0.0428303165 -0.016184762 -0.053286955929  0.01880781 -0.07645362
## 500   0.0235457599 -0.031322511 -0.002160145156 -0.08801984 -0.10318639
## 513   0.0481037124 -0.041063953 -0.036520630685  0.01920780 -0.07618101
## 519  -0.0252262169  0.043640640 -0.007409478708 -0.08603885 -0.10814624
## 523  -0.0300692987  0.026390688  0.015884208879 -0.08395415 -0.10279823
## 535   0.0704195857 -0.092108857 -0.018030998048  0.01894247 -0.10792563
## 537   0.0417956699 -0.059569224 -0.000006526297 -0.08814827 -0.11417496
## 540   0.0125311555  0.009422924 -0.036038867468  0.02054855 -0.06226941
## 545   0.0267873587 -0.040162301 -0.007773579248  0.02091441 -0.06476571
## 554   0.0342907581 -0.011023825 -0.046463300745  0.01943389 -0.07043842
## 556  -0.0713256767  0.055706840  0.044325712351 -0.07351642 -0.11569718
## 567   0.0554616845 -0.026641827 -0.050548376645 -0.08672349 -0.10914097
## 572   0.0676980667 -0.059835659 -0.026250601870 -0.02626017  0.08129500
## 583  -0.0672134780  0.043555370  0.050391745850 -0.07346685 -0.11282481
## 591   0.1666304232 -0.127321955 -0.096194380423 -0.03707237  0.17087617
## 598  -0.0003327305  0.093874286 -0.100105203325  0.01913831 -0.14414752
## 599   0.0175618567  0.043155715 -0.065761661268  0.05377780  0.10120763
## 605  -0.0270206480 -0.030008249  0.058861337085  0.02104542 -0.07494638
## 608   0.0002010670  0.062620180 -0.060667857750 -0.08879298 -0.12978287
## 615   0.0277663842 -0.056783281  0.016592444345 -0.08777930 -0.11473132
## 616  -0.0998881669  0.091696270  0.048881431196 -0.06216445 -0.13209360
## 623   0.0460835672 -0.039370087 -0.035363538744  0.01933803 -0.07461792
## 629   0.0507551372 -0.054172089 -0.008962935854 -0.02329357  0.07078196
## 631  -0.0689030411  0.019541344  0.075873690178 -0.06760794 -0.11600365
## 632  -0.0576182607  0.060658345  0.013552262563  0.02056326 -0.07278294
## 633   0.0273397802  0.024880605 -0.052904446828 -0.02420473  0.07292346
## 636  -0.0578375462  0.064923555  0.009608582173  0.02071021 -0.07600603
## 637   0.0371130062  0.006922990 -0.067730517758  0.01852520 -0.08534186
## 649   0.0501922807 -0.034166539 -0.046075188256  0.01874464 -0.07808379
## 654  -0.0267619537  0.062116312 -0.032246458488  0.02147940 -0.08343200
## 656   0.0799830612 -0.033410798 -0.068080128728 -0.03005873  0.09176486
## 661   0.0565696422 -0.039893031 -0.049342329967  0.01830364 -0.08293562
## 664  -0.0423028377 -0.006623206  0.064540283251 -0.07699873 -0.11344234
## 665   0.0932842263 -0.097706124 -0.024512881675 -0.02850039  0.11293826
## 666  -0.0024148915 -0.007509754  0.010562172802 -0.08658744 -0.09970431
## 668   0.0241418228  0.007998771 -0.050813878566  0.01969745 -0.07232223
## 676  -0.0200321472  0.090191934 -0.069100462711  0.02071590 -0.12208912
## 688   0.0421373770  0.024703654 -0.081749772622  0.05980763  0.11269915
## 690   0.0555668846 -0.081173152  0.010311923634 -0.02235250  0.09231197
## 692   0.0055059555  0.036944404 -0.043198428746  0.04766353  0.07941830
## 693  -0.0077897249 -0.049139283  0.049834060474  0.02203327 -0.08077339
## 700  -0.0454532025  0.060874879 -0.004400066396  0.02149659 -0.07407385
## 702   0.0615659022 -0.062215774 -0.015754520499 -0.02492132  0.07937805
## 707   0.0151420088 -0.019066722 -0.002427279773 -0.08783268 -0.10043439
## 708   0.0122822316  0.043523104 -0.068787271776  0.01959371 -0.09495367
## 710   0.0614975028 -0.033997886 -0.061813750065  0.01750931 -0.08858709
## 716   0.0156320015 -0.050842741  0.018249576739  0.02176754 -0.07172948
## 722   0.0422640489 -0.064495330  0.012142170223 -0.02066551  0.07726931
## 726   0.0652087597  0.016459724 -0.095815961785 -0.03058958  0.10881681
## 727   0.0307047805  0.026800771 -0.059229296399 -0.02502794  0.07859245
## 732   0.0245715205  0.039822159 -0.063415224925 -0.02473673  0.08634977
## 733   0.0407388993 -0.071284652  0.020601891197 -0.01977446  0.08389185
## 742   0.0067840959 -0.044074970  0.024125814393  0.02194999 -0.06799240
## 744   0.0784243337 -0.028577138 -0.090207110758  0.01508095 -0.10869618
## 746   0.1020783810 -0.097522842 -0.036581924018 -0.02997945  0.11684661
## 747   0.0553088483 -0.134996019  0.054159464275 -0.08809233 -0.17446262
## 749   0.1126112393 -0.108482724 -0.060052211731  0.01432510 -0.13608010
## 756   0.0422907929  0.007434641 -0.075366876233  0.01796199 -0.09160321
## 759   0.0855715497 -0.087239158 -0.024121148132 -0.02778611  0.10351371
## 766   0.0126706222  0.040498940 -0.048194285611 -0.02226810  0.07506735
## 771   0.0765231470 -0.070049941 -0.028363429830 -0.02723327  0.09038342
## 773   0.0739390648 -0.018837596 -0.093449294580  0.01521961 -0.10904378
## 775   0.0623894069  0.017610454 -0.093100714963 -0.03019559  0.10646803
## 781   0.0282865042 -0.037768140 -0.012190078189  0.02074642 -0.06428168
## 785  -0.0234376164  0.057537817 -0.023270789444 -0.08706442 -0.11644718
## 788   0.0221436499 -0.063564833  0.037729020977 -0.01567033  0.08082556
## 789   0.0601277614 -0.057905201 -0.026812059078  0.05727007  0.09545303
## 791   0.0274558290 -0.018170629 -0.030047939612  0.02023689 -0.06203557
## 792  -0.0509300494  0.034315943  0.036917850015 -0.07894614 -0.10765921
## 801   0.0324869229  0.017896329 -0.053139440859 -0.02474450  0.07156697
## 808   0.0540334167 -0.101493230  0.013744603206  0.02077948 -0.11392223
## 811  -0.0595140843  0.093467558 -0.016324399264  0.02132187 -0.10292837
## 812   0.0649207380 -0.063833224 -0.018708469587 -0.02546636  0.08167560
## 827  -0.0443301723  0.009077702  0.052185122429 -0.07821453 -0.10817642
## 830  -0.0125620895  0.065905381 -0.055905600543  0.02079082 -0.09819000
## 831   0.0295358261 -0.048867986 -0.003156921166  0.02096957 -0.07005768
## 833   0.0383501802  0.038479830 -0.080610375210 -0.02731426  0.10000450
## 834  -0.0045201333  0.057335030 -0.041373310025 -0.01967427  0.07996844
## 839   0.0436435444 -0.001134236 -0.068954579808  0.01815231 -0.08635707
## 842  -0.0300702500  0.086419729 -0.035308539590 -0.01563319  0.09836433
## 850   0.0541960831  0.002625725 -0.067719148811 -0.02784140  0.08302747
## 851   0.0750616597  0.032519171 -0.124580686477 -0.03303123  0.13754525
## 860   0.0123104807  0.070374742 -0.076079026086 -0.02449667  0.11110394
## 863   0.0574391081 -0.041604375 -0.038836685240  0.05800280  0.09198569
## 867  -0.0784124039  0.072740364  0.037640402648 -0.07294191 -0.12227985
## 868   0.0108314780 -0.050481965  0.024663239822  0.02193364 -0.07265611
## 875   0.0603381911 -0.069848262 -0.025610647374  0.01909120 -0.09144649
## 877   0.0917544845 -0.029562742 -0.097622356241 -0.07702365 -0.12862646
## 878   0.0112414822  0.080544524 -0.084316513911 -0.02503870  0.12302632
## 884   0.0242039280 -0.064359447  0.028796197378 -0.08734700 -0.12096099
## 886   0.0531819448 -0.023403585 -0.050537456810 -0.08688593 -0.10860734
## 887   0.0211306402  0.044570085 -0.072031622674 -0.08820463 -0.12638704
## 888   0.0729725166 -0.023085713 -0.068478918321 -0.02950685  0.08828817
## 891   0.0753781030 -0.040556645 -0.055015929605 -0.02885798  0.08518023
## 900   0.0266945545 -0.011028784 -0.035914921997  0.02008003 -0.06347091
## 901   0.0142997292  0.049207902 -0.077095942202  0.01923946 -0.10354464
## 910   0.0508879455 -0.010718804 -0.069679094731  0.01774373 -0.08792618
## 913   0.0323987630 -0.003919430 -0.050720002780  0.01936392 -0.07243909
## 914  -0.0475700360  0.078034335 -0.009863492330 -0.08430864 -0.12552815
## 921   0.0641318535 -0.007880550 -0.080560613454  0.06272312  0.11350616
## 923  -0.0401137860 -0.006149051  0.061073761581 -0.07789077 -0.11208213
## 930  -0.0024882513 -0.027061735  0.029519122397 -0.08561272 -0.10625941
## 936   0.0650547062 -0.021689819 -0.059124339041 -0.02822694  0.08012375
## 937  -0.0299455442  0.025832415  0.016252489038 -0.08393770 -0.10270653
## 940  -0.0162998375 -0.052115651  0.065087296261  0.02156927 -0.09064962
## 950   0.0506980280  0.061033820 -0.138538839132  0.01452705 -0.15786656
## 953   0.0051558190 -0.013180049  0.005623361646 -0.08720926 -0.09987979
## 955   0.0524879990 -0.015289789 -0.067469748600  0.01775201 -0.08700179
## 956   0.0591587452 -0.064950552 -0.009926256549 -0.02425376  0.08010417
## 961   0.0245354397  0.035444912 -0.059204189579 -0.02440770  0.08133723
## 967  -0.0313324923  0.080168328 -0.043444976705  0.02140469 -0.10147881
## 972  -0.0632584832  0.062625174  0.020057887333  0.01963738 -0.07455618
## 975   0.0095564068  0.040321978 -0.043891977058 -0.02155600  0.07192686
## 976   0.0085816454  0.047664806 -0.057765364880 -0.08865234 -0.12176772
## 977   0.0014144205  0.033275676 -0.034035729627 -0.08830675 -0.10887315
## 980   0.0253061902 -0.006451708 -0.038422898307  0.02005102 -0.06422146
## 984   0.0761082051 -0.019052946 -0.076580903691 -0.03026700  0.09403175
## 988   0.0770815595 -0.034737179 -0.082454256888  0.01557689 -0.10443103
## 992   0.0086053142  0.046517889 -0.056691804732 -0.08864507 -0.12088871
## 997   0.0364223946 -0.049906970 -0.011748798480  0.02050950 -0.07238784
## 1000 -0.0117153327  0.005705746  0.010602145425 -0.08595414 -0.09951017
##          cov.r      cook.d         hat
## 5    1.0119988 0.003601324 0.012826119
## 9    0.9999273 0.001150827 0.002640252
## 13   1.0024883 0.001079011 0.003656786
## 25   1.0018276 0.002283288 0.004854630
## 26   1.0048428 0.001211051 0.005189871
## 29   1.0016424 0.001154229 0.003370873
## 33   1.0028842 0.001058160 0.003822832
## 41   1.0028488 0.003012658 0.006088440
## 51   0.9956379 0.004287956 0.003487367
## 53   0.9956837 0.004345316 0.003533694
## 55   0.9985961 0.001791482 0.002892721
## 56   0.9989761 0.002729400 0.003883274
## 64   1.0008581 0.001934549 0.004006751
## 72   1.0059925 0.002945603 0.007969413
## 73   1.0010118 0.004579346 0.006173645
## 84   1.0038429 0.002026305 0.005697737
## 87   1.0002835 0.004786782 0.005901280
## 88   0.9979507 0.001549620 0.002416006
## 90   1.0110216 0.001156067 0.009522798
## 91   0.9967725 0.004740574 0.004178540
## 97   1.0006044 0.004579081 0.005952020
## 104  1.0016957 0.001514058 0.003909350
## 108  1.0044427 0.001254465 0.005018610
## 120  1.0097466 0.002971430 0.010617433
## 122  1.0017711 0.002126821 0.004664578
## 123  0.9991227 0.002431729 0.003700935
## 136  1.0021925 0.001280174 0.003827329
## 140  1.0032868 0.001096425 0.004101361
## 143  1.0040945 0.001136418 0.004620655
## 145  1.0046985 0.001081284 0.004882323
## 146  1.0010911 0.003045802 0.005154070
## 147  1.0087436 0.001092161 0.007656896
## 149  0.9990035 0.008421112 0.006847845
## 159  1.0019552 0.004539856 0.006678775
## 161  0.9986145 0.001689992 0.002795995
## 164  1.0063176 0.008675978 0.011495484
## 169  1.0028433 0.001202441 0.004039966
## 175  0.9991002 0.003530179 0.004525919
## 179  1.0096181 0.001250032 0.008569067
## 187  1.0026894 0.001264663 0.004056525
## 195  1.0022368 0.001587160 0.004275955
## 198  0.9964065 0.008346082 0.005473704
## 202  1.0028465 0.001530678 0.004521491
## 207  1.0029489 0.003035414 0.006165338
## 216  0.9986370 0.003600005 0.004357272
## 217  1.0024486 0.003831399 0.006492772
## 219  1.0020305 0.001962481 0.004620783
## 223  1.0097810 0.001422361 0.008944072
## 231  1.0052300 0.002764792 0.007309941
## 234  1.0094265 0.001821118 0.009197069
## 244  1.0019638 0.001257260 0.003680635
## 248  1.0024629 0.001472866 0.004240228
## 249  0.9996094 0.002541980 0.004011473
## 255  0.9975814 0.004577666 0.004452386
## 257  1.0043334 0.001124516 0.004739698
## 260  1.0037291 0.001170263 0.004468883
## 261  0.9959653 0.004978951 0.003951482
## 265  1.0022023 0.004430513 0.006752356
## 273  0.9948751 0.005036142 0.003541381
## 278  0.9982440 0.001405806 0.002359225
## 286  0.9971518 0.003632278 0.003731329
## 288  1.0069344 0.001543650 0.007051476
## 289  1.0024961 0.002988320 0.005867441
## 307  1.0036368 0.003795953 0.007169507
## 309  1.0035019 0.001437093 0.004750850
## 310  1.0017693 0.003468963 0.005845984
## 312  0.9983437 0.004140958 0.004557339
## 313  1.0005006 0.006448653 0.006871932
## 322  1.0123694 0.001505085 0.011088050
## 327  0.9982698 0.001945691 0.002919429
## 332  1.0016854 0.001416269 0.003773274
## 336  1.0133156 0.003203768 0.013536086
## 338  1.0041352 0.003128308 0.006949272
## 340  0.9988942 0.003225480 0.004219319
## 343  1.0030344 0.002184803 0.005408091
## 345  1.0020903 0.001635494 0.004262857
## 347  1.0021063 0.003265460 0.005876004
## 349  1.0017058 0.001171311 0.003426937
## 353  1.0010507 0.003035710 0.005124956
## 354  0.9982394 0.001321756 0.002262330
## 358  1.0020849 0.001580814 0.004190136
## 359  1.0099875 0.001210616 0.008793110
## 371  1.0051550 0.004317956 0.008487880
## 374  1.0050036 0.001097477 0.005096794
## 380  1.0045926 0.001993472 0.006109159
## 385  0.9943912 0.005192943 0.003424050
## 386  1.0094690 0.001701714 0.009080595
## 388  0.9997185 0.004133349 0.005218637
## 389  0.9992681 0.001097168 0.002324828
## 392  1.0042201 0.003853065 0.007569137
## 395  0.9995575 0.001232193 0.002611761
## 396  1.0056086 0.001844554 0.006567203
## 401  1.0018824 0.003136344 0.005649409
## 403  0.9997641 0.001162333 0.002594454
## 413  1.0030434 0.001058980 0.003907365
## 414  0.9948791 0.005127536 0.003582331
## 416  1.0094147 0.003036640 0.010431370
## 422  0.9979482 0.001675175 0.002542652
## 432  1.0029509 0.001735029 0.004843464
## 436  1.0014637 0.001064389 0.003146621
## 437  1.0088531 0.001121473 0.007787415
## 439  1.0115063 0.001455989 0.010333207
## 444  0.9957323 0.004487114 0.003622658
## 448  1.0082228 0.001228659 0.007494252
## 449  1.0072325 0.001588065 0.007317896
## 450  1.0043764 0.001172083 0.004845266
## 453  0.9980197 0.001435003 0.002317246
## 454  0.9974132 0.002781042 0.003275861
## 456  1.0137557 0.001125772 0.011725463
## 458  0.9991142 0.002359627 0.003634606
## 459  0.9982391 0.001182843 0.002097557
## 462  0.9990793 0.001080822 0.002237806
## 468  0.9990302 0.003467710 0.004450694
## 472  1.0016589 0.001331059 0.003641631
## 473  1.0085061 0.001098367 0.007490497
## 476  1.0026419 0.001980643 0.004966502
## 479  1.0033881 0.002402858 0.005836075
## 482  0.9982873 0.001093890 0.002001588
## 484  0.9980261 0.001898723 0.002785624
## 490  0.9990021 0.001450434 0.002676330
## 491  0.9986221 0.001109245 0.002125755
## 493  1.0029280 0.001504745 0.004530058
## 495  1.0016025 0.001385343 0.003690936
## 499  0.9978444 0.001824372 0.002651019
## 500  0.9970378 0.004014440 0.003906636
## 513  0.9980984 0.001777472 0.002696030
## 519  1.0001319 0.003718485 0.005158232
## 523  1.0008506 0.003133495 0.005099228
## 535  1.0001646 0.003691898 0.005156728
## 537  0.9968569 0.005270884 0.004472297
## 540  0.9988075 0.001059579 0.002119304
## 545  0.9991660 0.001132033 0.002337008
## 554  0.9980553 0.001483245 0.002381001
## 556  1.0071134 0.003231351 0.008970328
## 567  0.9934885 0.006032266 0.003410770
## 572  1.0026557 0.001590095 0.004498743
## 583  1.0066631 0.003074359 0.008529970
## 591  1.0037723 0.010959881 0.010615573
## 598  1.0055161 0.006205770 0.009835068
## 599  1.0135993 0.001974635 0.012627665
## 605  1.0064768 0.001125626 0.006097931
## 608  0.9982894 0.006744673 0.005814310
## 615  0.9984349 0.004808469 0.004974367
## 616  1.0153796 0.003706639 0.015558182
## 623  0.9981231 0.001689682 0.002618037
## 629  1.0029060 0.001127092 0.003950658
## 631  1.0094337 0.003031320 0.010440608
## 632  1.0062763 0.001056769 0.005843125
## 633  1.0028884 0.001210794 0.004076859
## 636  1.0065072 0.001162680 0.006181091
## 637  0.9983013 0.002301441 0.003248991
## 649  0.9978254 0.001920446 0.002733517
## 654  1.0028258 0.001680146 0.004706291
## 656  1.0017690 0.002239605 0.004779965
## 661  0.9977392 0.002231375 0.002973878
## 664  1.0052520 0.003286064 0.007781272
## 665  1.0051657 0.003259245 0.007703795
## 666  0.9987275 0.003289533 0.004188926
## 668  0.9985296 0.001526999 0.002591558
## 676  1.0055765 0.003936812 0.008494725
## 688  1.0128361 0.002603972 0.012641677
## 690  1.0061752 0.001882056 0.006982542
## 692  1.0111400 0.001159474 0.009622550
## 693  1.0048921 0.001421730 0.005546493
## 700  1.0042029 0.001186430 0.004767057
## 702  1.0031031 0.001469364 0.004576033
## 707  0.9973056 0.003681036 0.003824645
## 708  1.0003957 0.002629673 0.004456477
## 710  0.9974612 0.002668625 0.003212309
## 716  1.0010929 0.001277361 0.003303831
## 722  1.0049466 0.001275047 0.005356421
## 726  1.0035087 0.003174676 0.006610033
## 727  1.0031498 0.001431390 0.004547866
## 732  1.0042840 0.001713655 0.005579735
## 733  1.0063962 0.001478403 0.006595229
## 742  1.0014171 0.001105885 0.003192141
## 744  0.9968814 0.004633679 0.004171704
## 746  1.0046182 0.003643685 0.007662184
## 747  1.0045467 0.011251535 0.011221831
## 749  0.9981234 0.007761561 0.006132074
## 756  0.9982456 0.002746981 0.003584836
## 759  1.0044113 0.002683975 0.006721401
## 766  1.0040861 0.001230665 0.004770996
## 771  1.0031540 0.002015059 0.005288809
## 773  0.9970605 0.004614504 0.004239833
## 775  1.0034956 0.003002281 0.006456602
## 781  0.9989330 0.001130096 0.002254992
## 785  1.0001670 0.004502649 0.005674103
## 788  1.0083906 0.001278024 0.007693606
## 789  1.0106156 0.001811287 0.010084360
## 791  0.9983011 0.001088095 0.001998393
## 792  1.0037461 0.003055267 0.006651422
## 801  1.0024797 0.001180949 0.003818844
## 808  1.0031022 0.003645899 0.006738998
## 811  1.0093223 0.002246912 0.009604181
## 812  1.0030930 0.001575404 0.004715407
## 827  1.0040689 0.003049916 0.006841947
## 830  1.0026601 0.002538651 0.005555234
## 831  0.9995717 0.001323539 0.002735777
## 833  1.0045907 0.002440337 0.006593282
## 834  1.0061497 0.001327637 0.006207263
## 839  0.9979973 0.002417423 0.003225624
## 842  1.0127429 0.001870264 0.011823437
## 850  1.0021696 0.001714484 0.004402141
## 851  1.0053302 0.005489908 0.009325549
## 860  1.0082369 0.002808465 0.009378832
## 863  1.0094958 0.001697586 0.009095400
## 867  1.0084633 0.003574867 0.010191156
## 868  1.0016607 0.001276577 0.003563900
## 875  0.9989985 0.002605225 0.003792864
## 877  0.9885312 0.014036379 0.003474076
## 878  1.0099720 0.003477395 0.011206431
## 884  0.9997142 0.005114067 0.005781541
## 886  0.9936133 0.005898995 0.003408719
## 887  0.9960857 0.007274316 0.004946013
## 888  1.0017539 0.002036553 0.004559418
## 891  1.0017517 0.001864835 0.004367039
## 900  0.9982575 0.001149948 0.002062839
## 901  1.0007936 0.003202174 0.005123195
## 910  0.9977068 0.002576263 0.003237675
## 913  0.9981499 0.001573130 0.002508058
## 914  1.0033681 0.004661382 0.007589747
## 921  1.0111894 0.002744511 0.011494342
## 923  1.0046922 0.003257121 0.007402095
## 930  1.0000310 0.003574156 0.005008189
## 936  1.0016143 0.001618191 0.004003053
## 937  1.0008489 0.003126662 0.005092953
## 940  1.0083091 0.001687474 0.008210981
## 950  1.0011512 0.009907081 0.008611966
## 953  0.9981278 0.003434874 0.004021041
## 955  0.9976310 0.002523963 0.003168121
## 956  1.0035151 0.001474434 0.004811331
## 961  1.0038084 0.001510675 0.005028918
## 967  1.0048591 0.002505363 0.006825363
## 972  1.0080780 0.001064364 0.007117925
## 975  1.0040233 0.001114400 0.004542227
## 976  0.9971833 0.006106797 0.004997945
## 977  0.9975589 0.004443170 0.004370057
## 980  0.9982953 0.001178185 0.002109445
## 984  1.0019208 0.002361830 0.004981769
## 988  0.9969608 0.004160514 0.003955404
## 992  0.9971562 0.006001357 0.004939506
## 997  0.9991554 0.001468025 0.002753956
## 1000 0.9992471 0.003168574 0.004340472

Interpretation: The analysis identified a few observations with high Cook’s distance values, indicating that they may be influential points that could disproportionately affect the model’s estimates. It is important to investigate these observations further to determine if they are data entry errors, outliers, or valid extreme cases. Depending on the findings, we may consider conducting sensitivity analyses by excluding these points to assess their impact on the model results.
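
One possible sensitivity analysis is to refit the model without the flagged observations and compare the coefficients with the full-data fit; a minimal sketch:

# Refit excluding observations with Cook's distance above the cutoff
flagged <- which(infl.v$cook.d > cutoff.cook.d)
modlog.sens <- update(final_model, data = diabetes_data[-flagged, ])
cbind(full_data = coef(final_model), excluding_flagged = coef(modlog.sens))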

Step 9: Model presentation

Regression table

final_model %>%
  tbl_regression(
    exponentiate = TRUE,
    conf.level = 0.95,
    label = list(
      age ~ "Age (years)",
      bmi ~ "Body Mass Index (BMI)",
      smoking_status ~ "Smoking Status (Yes vs No)"
    )
  ) %>%
  as_gt() %>%
  gt::tab_header(
    title = "Multiple Logistic Regression Analysis of Diabetes Complications",
    subtitle = "Odds Ratios with 95% Confidence Intervals"
  ) %>% 
  # --- Adding model fit statistics ---
  gt::tab_source_note(
    source_note = paste0(
      "Note: Model fit statistics - AIC: ", round(AIC(final_model), 2),
      ", BIC: ", round(BIC(final_model), 2),
      ", Log-Likelihood: ", round(logLik(final_model), 2)
    )
  )
Multiple Logistic Regression Analysis of Diabetes Complications
Odds Ratios with 95% Confidence Intervals

Characteristic               OR     95% CI        p-value
Age (years)                  1.02   1.01, 1.04    <0.001
Body Mass Index (BMI)        1.11   1.08, 1.14    <0.001
Smoking Status (Yes vs No)
    No
    Yes                      2.69   1.94, 3.78    <0.001

Abbreviations: CI = Confidence Interval, OR = Odds Ratio
Note: Model fit statistics - AIC: 1207.07, BIC: 1226.7, Log-Likelihood: -599.53

Step 10: Interpretation of the final model table

Interpretation:

  • The final multiple logistic regression model indicates that age, BMI, and smoking status are significant predictors of diabetes complications.
  • Each additional year of age is associated with a roughly 2.4% increase in the odds of complications (OR = 1.02, 95% CI: 1.01-1.04), and each unit increase in BMI corresponds to a roughly 10.7% increase in the odds (OR = 1.11, 95% CI: 1.08-1.14).
  • Smokers have nearly 2.7 times the odds of developing complications compared to non-smokers (OR = 2.69, 95% CI: 1.94-3.78).
  • The model fit statistics (AIC, BIC, Log-Likelihood) suggest a reasonable fit to the data, supporting the validity of these findings.