This topic is for those who interested in how to use logistic regression model for basic epidemiology study with R. This topic is a part of FETP biostatistic course so I can not provide the dataset for you.

##Setting up environment : Package require
library(tidyverse)#For data manipulation and plot.
library(haven)#For import .dta file.
library(sjPlot)#For export model result into nice form table.
library(jtools)#For export model result into nice form table.
library(kableExtra)#For export model result into nice form table.

In this topic I try to fit the model to study about the relationship of oral contraceptive use and occurence of myocardial infarction from dataset.

oc_mi <- haven::read_dta("shapiro_mi.dta")
head(oc_mi)  
## # A tibble: 6 × 5
##          mi        oc     smoke agegroup     agemid
##   <dbl+lbl> <dbl+lbl> <dbl+lbl>    <dbl>  <dbl+lbl>
## 1    0 [No]   0 [No]   0 [None]        1 27 [25-29]
## 2    0 [No]   0 [No]   0 [None]        5 47 [45-49]
## 3    0 [No]   0 [No]   0 [None]        3 37 [35-39]
## 4    0 [No]   0 [No]   0 [None]        2 32 [30-34]
## 5    0 [No]   1 [Yes]  0 [None]        4 42 [40-44]
## 6    0 [No]   0 [No]   0 [None]        3 37 [35-39]

The data set provide occurence of myocardial infarction (mi) as outcome of interest, coded 0 mean absence of mi and 1 as presence of mi. For oral contraceptive use (oc) coded 0 mean no history of oral contraceptive use and 1 mean presence of oral contraceptive use. For smoking status (smoke) 0 mean no smoking , 1 mean smoking 1-24 rolls/day and 2 mean smoking more than 25 rolls/day. While age (agemid) record in number of years and they also devided age into groups.
1.Find crude odds ratio for all predictors.

oc_mi<- oc_mi%>%
             mutate(mi=factor(mi,labels = c("negative","positive")))%>%
             mutate(oc=factor(oc,labels = c("not use","use")))%>%
             mutate(smoke=factor(smoke,labels = c("none","1-24",">25")))
mi_oc_model <- glm(data = oc_mi,
                   mi~oc,
                   family = binomial(link = "logit"))

mi_smoke_model <- glm(data = oc_mi,
                      mi~smoke,
                      family = binomial(link = "logit"))

mi_age_model <- glm(data = oc_mi,
                    mi~agemid,
                    family = binomial(link = "logit"))

sjPlot::tab_model(mi_oc_model,
                  show.reflvl = TRUE,
                  show.intercept = FALSE,
                  show.r2 = FALSE,
                  show.obs = FALSE,
                  pred.labels = c("Oral contracetion use"),
                  dv.labels = "Crude odds ratio of being mi")
  Crude odds ratio of being mi
Predictors Odds Ratios CI p
Oral contracetion use 1.68 1.08 – 2.55 0.017
tab_model(mi_smoke_model,
                  show.reflvl = TRUE,
                  show.intercept = FALSE,
                  show.r2 = FALSE,
                  show.obs = FALSE,
                  pred.labels = c("Smoke 1-24 rolls/day",
                                  "Smoke more than 25 rolls/day"),
                  dv.labels = "Crude odds ratio of being mi")
  Crude odds ratio of being mi
Predictors Odds Ratios CI p
Smoke 1-24 rolls/day 2.82 1.91 – 4.24 <0.001
Smoke more than 25 rolls/day 7.58 5.18 – 11.31 <0.001
tab_model(mi_age_model,
                  show.reflvl = TRUE,
                  show.intercept = FALSE,
                  show.r2 = FALSE,
                  show.obs = FALSE,
                  pred.labels = c("Age"),
                  dv.labels = "Crude odds ratio of being mi")
  Crude odds ratio of being mi
Predictors Odds Ratios CI p
Age 1.13 1.11 – 1.16 <0.001

Interpretation :
1. Odds of being myocardial infarction among those who use oral contraception is 1.68 times of those who not use.
2. Odds of being myocardial infarction among those who smoke 1-24 rolls per day is 2.82 times of those who not smoke.
3. Odds of being myocardial infarction among those who smoke more than 25 rolls per day is 7.58 times of thos who not smoke.
4.Odds of being myocardial infarction increase 13% when age increase 1 years.
5. All predictors are statistical significant at alpha level =0.05.
We can conclude that the relationship of using oral contraception and being mi is not occur by chance

2.Examine adjusted odds ratio of all predictors

mi_adj<- glm(data = oc_mi, mi~oc+smoke+agemid,
             family = binomial(link = "logit"))
tab_model(mi_adj,
          show.ci = FALSE,
          show.intercept = FALSE,
          show.r2 = FALSE,
          show.obs = FALSE,
          pred.labels = c("Use oral contraception",
                          "Smoke 1-24 rolls/day",
                          "Smoke more than 25 rolls/day",
                          "Age"),
          dv.labels = "Adjusted odds ratio")
  Adjusted odds ratio
Predictors Odds Ratios p
Use oral contraception 3.28 <0.001
Smoke 1-24 rolls/day 3.08 <0.001
Smoke more than 25 rolls/day 8.47 <0.001
Age 1.16 <0.001

Interpretation : 1. After adjusted for age and smoking status, odds of being mi in those who use oral contraception is 3.28 times of those who not.
2. After adjusted for oral contraception use and age, odds of being mi in those who smoke 1-2 rolls/day is 3.08 times of those who not smoke and odds of being mi in those who smoke more than 25 rolls/day is 8.47 times of those who not smoke.
3. After adjusted for oral contraception use and smoking status, odds of being mi increase 16% if age increase 1 year.
3.Explore interaction
Now I will add multiplicative cross-product term to identify interaction between oral contraception use and other predictors.

mi_oc_smoke <-glm(data = oc_mi, mi~oc+smoke+(oc*smoke),
                  family = binomial(link = "logit"))
summary(mi_oc_smoke)
## 
## Call:
## glm(formula = mi ~ oc + smoke + (oc * smoke), family = binomial(link = "logit"), 
##     data = oc_mi)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0230  -0.5112  -0.3176  -0.2970   2.5072  
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -3.0990     0.1753 -17.679  < 2e-16 ***
## ocuse             0.5341     0.5477   0.975    0.329    
## smoke1-24         1.1299     0.2125   5.317 1.05e-07 ***
## smoke>25          1.9613     0.2123   9.237  < 2e-16 ***
## ocuse:smoke1-24  -1.3981     0.8169  -1.712    0.087 .  
## ocuse:smoke>25    0.2289     0.6253   0.366    0.714    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1437.6  on 1975  degrees of freedom
## Residual deviance: 1305.0  on 1970  degrees of freedom
## AIC: 1317
## 
## Number of Fisher Scoring iterations: 5

The p-value of cross-product term is not a statistical significant so there is absence multiplicative interaction between oral contraceptive use and smoking status.

mi_oc_age <- glm(data = oc_mi, mi~oc+agemid+(oc*agemid),
                 family = binomial(link = "logit"))
summary(mi_oc_age)
## 
## Call:
## glm(formula = mi ~ oc + agemid + (oc * agemid), family = binomial(link = "logit"), 
##     data = oc_mi)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2139  -0.5503  -0.3911  -0.2754   2.8260  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -7.86873    0.61310 -12.834   <2e-16 ***
## ocuse         1.92044    1.34502   1.428    0.153    
## agemid        0.14423    0.01440  10.018   <2e-16 ***
## ocuse:agemid -0.01585    0.03596  -0.441    0.659    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1437.6  on 1975  degrees of freedom
## Residual deviance: 1288.2  on 1972  degrees of freedom
## AIC: 1296.2
## 
## Number of Fisher Scoring iterations: 6

Again the p-value of cross-product term is not a statistical significant so there is absence multiplicative interaction between oral contraceptive use and age.

Since there is no interaction between oral contraceptive use and other predictors the next step is compare crude odds ratio and adjust odds ratio between oral contraception use and other predictors to find the confounder.
4.Compare crude odds ration and adjusted odds ratio

oc_smoke <- glm(data = oc_mi, mi~oc+smoke,
                family = binomial(link = "logit"))
tab_model(oc_smoke,
          show.ci = FALSE,
          show.intercept = FALSE,
          show.r2 = FALSE,
          show.obs = FALSE,
          pred.labels = c("Use oral contraception",
                          "Smoke 1-24 rolls/day",
                          "Smoke more than 25 rolls/day"),
          dv.labels = "Adjusted odds ratio")
  Adjusted odds ratio
Predictors Odds Ratios p
Use oral contraception 1.39 0.147
Smoke 1-24 rolls/day 2.81 <0.001
Smoke more than 25 rolls/day 7.44 <0.001

The difference between crude odds ratio (1.68) and odds ratio of being mi among those who use oral contraception after adjusted for smoking status (1.39) is 17%. We can say that smoking status is a confounder.

oc_age <- glm(data = oc_mi, mi~oc+agemid,
                family = binomial(link = "logit"))
tab_model(oc_age,
          show.ci = FALSE,
          show.intercept = FALSE,
          show.r2 = FALSE,
          show.obs = FALSE,
          pred.labels = c("Use oral contraception",
                          "Age"),
          dv.labels = "Adjusted odds ratio")
  Adjusted odds ratio
Predictors Odds Ratios p
Use oral contraception 3.81 <0.001
Age 1.15 <0.001

The difference between crude odds ratio (1.68) and odds ratio of being mi among those who use oral contraception after adjusted for age (3.81) is 127%. We can say that age is a confounder too.

Conclusion
According to this dataset there is a positive relation of using oral contraception and being myocardial infarction without effect modification by the way this relationship is confounded by age and smoking status.