This topic is for those who interested in how to use logistic regression model for basic epidemiology study with R. This topic is a part of FETP biostatistic course so I can not provide the dataset for you.
##Setting up environment : Package require
library(tidyverse)#For data manipulation and plot.
library(haven)#For import .dta file.
library(sjPlot)#For export model result into nice form table.
library(jtools)#For export model result into nice form table.
library(kableExtra)#For export model result into nice form table.
In this topic I try to fit the model to study about the relationship of oral contraceptive use and occurence of myocardial infarction from dataset.
oc_mi <- haven::read_dta("shapiro_mi.dta")
head(oc_mi)
## # A tibble: 6 × 5
## mi oc smoke agegroup agemid
## <dbl+lbl> <dbl+lbl> <dbl+lbl> <dbl> <dbl+lbl>
## 1 0 [No] 0 [No] 0 [None] 1 27 [25-29]
## 2 0 [No] 0 [No] 0 [None] 5 47 [45-49]
## 3 0 [No] 0 [No] 0 [None] 3 37 [35-39]
## 4 0 [No] 0 [No] 0 [None] 2 32 [30-34]
## 5 0 [No] 1 [Yes] 0 [None] 4 42 [40-44]
## 6 0 [No] 0 [No] 0 [None] 3 37 [35-39]
The data set provide occurence of myocardial infarction (mi) as
outcome of interest, coded 0 mean absence of mi and 1 as presence of mi.
For oral contraceptive use (oc) coded 0 mean no history of oral
contraceptive use and 1 mean presence of oral contraceptive use. For
smoking status (smoke) 0 mean no smoking , 1 mean smoking 1-24 rolls/day
and 2 mean smoking more than 25 rolls/day. While age (agemid) record in
number of years and they also devided age into groups.
1.Find crude odds ratio for all
predictors.
oc_mi<- oc_mi%>%
mutate(mi=factor(mi,labels = c("negative","positive")))%>%
mutate(oc=factor(oc,labels = c("not use","use")))%>%
mutate(smoke=factor(smoke,labels = c("none","1-24",">25")))
mi_oc_model <- glm(data = oc_mi,
mi~oc,
family = binomial(link = "logit"))
mi_smoke_model <- glm(data = oc_mi,
mi~smoke,
family = binomial(link = "logit"))
mi_age_model <- glm(data = oc_mi,
mi~agemid,
family = binomial(link = "logit"))
sjPlot::tab_model(mi_oc_model,
show.reflvl = TRUE,
show.intercept = FALSE,
show.r2 = FALSE,
show.obs = FALSE,
pred.labels = c("Oral contracetion use"),
dv.labels = "Crude odds ratio of being mi")
| Crude odds ratio of being mi | |||
|---|---|---|---|
| Predictors | Odds Ratios | CI | p |
| Oral contracetion use | 1.68 | 1.08 – 2.55 | 0.017 |
tab_model(mi_smoke_model,
show.reflvl = TRUE,
show.intercept = FALSE,
show.r2 = FALSE,
show.obs = FALSE,
pred.labels = c("Smoke 1-24 rolls/day",
"Smoke more than 25 rolls/day"),
dv.labels = "Crude odds ratio of being mi")
| Crude odds ratio of being mi | |||
|---|---|---|---|
| Predictors | Odds Ratios | CI | p |
| Smoke 1-24 rolls/day | 2.82 | 1.91 – 4.24 | <0.001 |
| Smoke more than 25 rolls/day | 7.58 | 5.18 – 11.31 | <0.001 |
tab_model(mi_age_model,
show.reflvl = TRUE,
show.intercept = FALSE,
show.r2 = FALSE,
show.obs = FALSE,
pred.labels = c("Age"),
dv.labels = "Crude odds ratio of being mi")
| Crude odds ratio of being mi | |||
|---|---|---|---|
| Predictors | Odds Ratios | CI | p |
| Age | 1.13 | 1.11 – 1.16 | <0.001 |
Interpretation :
1. Odds of being myocardial infarction among those who use oral
contraception is 1.68 times of those who not use.
2. Odds of being myocardial infarction among those who smoke 1-24 rolls
per day is 2.82 times of those who not smoke.
3. Odds of being myocardial infarction among those who smoke more than
25 rolls per day is 7.58 times of thos who not smoke.
4.Odds of being myocardial infarction increase 13% when age
increase 1 years.
5. All predictors are statistical significant at alpha level
=0.05.
We can conclude that the relationship of using oral contraception and
being mi is not occur by chance
2.Examine adjusted odds ratio of all predictors
mi_adj<- glm(data = oc_mi, mi~oc+smoke+agemid,
family = binomial(link = "logit"))
tab_model(mi_adj,
show.ci = FALSE,
show.intercept = FALSE,
show.r2 = FALSE,
show.obs = FALSE,
pred.labels = c("Use oral contraception",
"Smoke 1-24 rolls/day",
"Smoke more than 25 rolls/day",
"Age"),
dv.labels = "Adjusted odds ratio")
| Adjusted odds ratio | ||
|---|---|---|
| Predictors | Odds Ratios | p |
| Use oral contraception | 3.28 | <0.001 |
| Smoke 1-24 rolls/day | 3.08 | <0.001 |
| Smoke more than 25 rolls/day | 8.47 | <0.001 |
| Age | 1.16 | <0.001 |
Interpretation : 1. After adjusted for age and
smoking status, odds of being mi in those who use oral contraception is
3.28 times of those who not.
2. After adjusted for oral contraception use and age, odds of being mi
in those who smoke 1-2 rolls/day is 3.08 times of those who not
smoke and odds of being mi in those who smoke more than 25
rolls/day is 8.47 times of those who not smoke.
3. After adjusted for oral contraception use and smoking status, odds of
being mi increase 16% if age increase 1 year.
3.Explore interaction
Now I will add multiplicative cross-product term to identify interaction
between oral contraception use and other predictors.
mi_oc_smoke <-glm(data = oc_mi, mi~oc+smoke+(oc*smoke),
family = binomial(link = "logit"))
summary(mi_oc_smoke)
##
## Call:
## glm(formula = mi ~ oc + smoke + (oc * smoke), family = binomial(link = "logit"),
## data = oc_mi)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0230 -0.5112 -0.3176 -0.2970 2.5072
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.0990 0.1753 -17.679 < 2e-16 ***
## ocuse 0.5341 0.5477 0.975 0.329
## smoke1-24 1.1299 0.2125 5.317 1.05e-07 ***
## smoke>25 1.9613 0.2123 9.237 < 2e-16 ***
## ocuse:smoke1-24 -1.3981 0.8169 -1.712 0.087 .
## ocuse:smoke>25 0.2289 0.6253 0.366 0.714
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1437.6 on 1975 degrees of freedom
## Residual deviance: 1305.0 on 1970 degrees of freedom
## AIC: 1317
##
## Number of Fisher Scoring iterations: 5
The p-value of cross-product term is not a statistical significant so there is absence multiplicative interaction between oral contraceptive use and smoking status.
mi_oc_age <- glm(data = oc_mi, mi~oc+agemid+(oc*agemid),
family = binomial(link = "logit"))
summary(mi_oc_age)
##
## Call:
## glm(formula = mi ~ oc + agemid + (oc * agemid), family = binomial(link = "logit"),
## data = oc_mi)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2139 -0.5503 -0.3911 -0.2754 2.8260
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -7.86873 0.61310 -12.834 <2e-16 ***
## ocuse 1.92044 1.34502 1.428 0.153
## agemid 0.14423 0.01440 10.018 <2e-16 ***
## ocuse:agemid -0.01585 0.03596 -0.441 0.659
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1437.6 on 1975 degrees of freedom
## Residual deviance: 1288.2 on 1972 degrees of freedom
## AIC: 1296.2
##
## Number of Fisher Scoring iterations: 6
Again the p-value of cross-product term is not a statistical significant so there is absence multiplicative interaction between oral contraceptive use and age.
Since there is no interaction between oral contraceptive use and
other predictors the next step is compare crude odds ratio and
adjust odds ratio between oral contraception use and other predictors to
find the confounder.
4.Compare crude odds ration and adjusted odds ratio
oc_smoke <- glm(data = oc_mi, mi~oc+smoke,
family = binomial(link = "logit"))
tab_model(oc_smoke,
show.ci = FALSE,
show.intercept = FALSE,
show.r2 = FALSE,
show.obs = FALSE,
pred.labels = c("Use oral contraception",
"Smoke 1-24 rolls/day",
"Smoke more than 25 rolls/day"),
dv.labels = "Adjusted odds ratio")
| Adjusted odds ratio | ||
|---|---|---|
| Predictors | Odds Ratios | p |
| Use oral contraception | 1.39 | 0.147 |
| Smoke 1-24 rolls/day | 2.81 | <0.001 |
| Smoke more than 25 rolls/day | 7.44 | <0.001 |
The difference between crude odds ratio (1.68) and odds ratio of being mi among those who use oral contraception after adjusted for smoking status (1.39) is 17%. We can say that smoking status is a confounder.
oc_age <- glm(data = oc_mi, mi~oc+agemid,
family = binomial(link = "logit"))
tab_model(oc_age,
show.ci = FALSE,
show.intercept = FALSE,
show.r2 = FALSE,
show.obs = FALSE,
pred.labels = c("Use oral contraception",
"Age"),
dv.labels = "Adjusted odds ratio")
| Adjusted odds ratio | ||
|---|---|---|
| Predictors | Odds Ratios | p |
| Use oral contraception | 3.81 | <0.001 |
| Age | 1.15 | <0.001 |
The difference between crude odds ratio (1.68) and odds ratio of being mi among those who use oral contraception after adjusted for age (3.81) is 127%. We can say that age is a confounder too.
Conclusion
According to this dataset there is a positive relation of using oral
contraception and being myocardial infarction without effect
modification by the way this relationship is confounded by age and
smoking status.