This tutorial is located on RPubs.
The entire R Markdown code is located on my GitHub page
epitools
This tutorial will center around using the R package epitools
to understand confounding and interaction in epidemiological studies.
Install the package epitools
# install.packages("epitools") ### Once installed, you can comment this line
library("epitools") ### Load the epitools package
epitools
is an R package that allows you to perform basic epidemiologic calculations such as the risk ratio (RR) and odds ratio (OR).
Documentation on epitools
can be found using the following link
In this example, we will go over how to use epitools
to estimate the risk ratio and odds ratio using a matrix that we create. This is useful for when you have a table (e.g., 2 x 2 contingency table) and want to check the risk ratio or odds ratio.
Here some examples of using epitools
.
##########################################
### Estimate the risk and odds ratios
##########################################
# Step 1: Create a matrix
Table1 <- matrix(c(11, 36, 518, 517), nrow = 2, ncol = 2)
Table1
## [,1] [,2]
## [1,] 11 518
## [2,] 36 517
This generates the 2 x 2 contingency table that we will use for this simple example. The arrangement is critical for interpreting the output. By default, the unexposed group (exposure = 0) is in the first row and the non-outcome (outcome = 0) is in the first column. However, you can use the epitools
command to change this arrangement with the rev()
argument so that the analysis will use the contingency table on the right where the exposed group (exposure = 1) is in the first row and the outcome (outcome = 1) is in the first column.
Here is an example of the arrangement:
Throughout this exercise, I will interpret the findings using the arrangement on the right.
riskratio.wald()
function to estimate the risk ratio for our object (Table1
)# Step 2: Estimate the RR
riskratio.wald(Table1, rev = c("both")) # use the rev() argument to change the arrangement of the contingency table
## $data
## Outcome
## Predictor Disease2 Disease1 Total
## Exposed2 517 36 553
## Exposed1 518 11 529
## Total 1035 47 1082
##
## $measure
## risk ratio with 95% C.I.
## Predictor estimate lower upper
## Exposed2 1.0000000 NA NA
## Exposed1 0.3194182 0.1643304 0.6208709
##
## $p.value
## two-sided
## Predictor midp.exact fisher.exact chi.square
## Exposed2 NA NA NA
## Exposed1 0.0002954494 0.0003001641 0.0003517007
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
The risk ratio is 0.32 with a 95% confidence interval (CI) of 0.16, 0.62. The p-value is <0.001. Therefore, subjects in the Exposure Group 1 had a 68% lower risk of developing the outcome (Outcome = 1) compared to subjects in the Exposure Group 2.
oddsratio.wald()
function to estimate the odds ratio for our object (Table1
)# Step 3: Estimate the OR
oddsratio.wald(Table1, rev = c("both"))
## $data
## Outcome
## Predictor Disease2 Disease1 Total
## Exposed2 517 36 553
## Exposed1 518 11 529
## Total 1035 47 1082
##
## $measure
## odds ratio with 95% C.I.
## Predictor estimate lower upper
## Exposed2 1.0000000 NA NA
## Exposed1 0.3049657 0.1535563 0.6056675
##
## $p.value
## two-sided
## Predictor midp.exact fisher.exact chi.square
## Exposed2 NA NA NA
## Exposed1 0.0002954494 0.0003001641 0.0003517007
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
The odds ratio is 0.30 with a 95% CI of 0.15, 0.61. The p-value is <0.001. Therefore the subjects in the Exposure Group 1 had a 70% lower odds or developing the outcome (Outcome = 1) compared to subjects in the Exposure Group 2.
Once you’ve estimated the risk ratio and odds ratio, you can double check your work. Here is how I checked the risk and odds ratio calculation.
### RR check
risk1 <- 11 / (11+517)
risk2 <- 36/(36+518)
RR <- risk1/risk2
RR
## [1] 0.3206019
### OR check
num <- 11 * 517
denom <- 518 * 36
OR <- num / denom
OR
## [1] 0.3049657
Unfortunately, epitools
does not seem to estimate the risk differences, so I show how you can do that using some simple R codes.
# Step 4: Estimate the risk difference
risk1 <- 11 / (518 + 11)
risk2 <- 36 / (517 + 36)
RD <- risk1 - risk2
RD
## [1] -0.04430551
Suppose we were interested in figuring out if confounding or effect modification was happening to a hypothetical drug study. In this drug study, patients either have high or low cholesterol. These patients were followed up for 12 months and assessed for all-cause mortality (e.g., death).
Here is a direct acyclic graph depicting Cholesterol and Death with Exercise as a potential confounder.
The RR and OR in this example is a crude estimate, which means that we did not adjust for confounding. Let’s assume a retrospective cohort study was performed to collect data to investigate the association between cholesterol status (high versus low) and all-cause mortality. The data are aggregated into a contingency table.
We can enter the values from the table into R as a matrix
##################################################################################
# Motivating Example #1 [Does High cholesterol (v. Low cholesterol) cause Death?]
##################################################################################
Table2 <- matrix(c(250, 150, 2000, 1500), nrow = 2, ncol = 2)
Table2
## [,1] [,2]
## [1,] 250 2000
## [2,] 150 1500
Then we can estimate the risk ratio (RR) and odds ratio (OR) using the riskratio.wald()
and oddsratio.wald()
functions. We will use the Wald method for estimating the 95% CIs.
Throughout the exercise, I will present the findings in both the RR and OR. Recall that with rare events, the RR and OR yield similar results. However, with more common events, the RR and OR will diverge.
riskratio.wald(Table2, rev = c("both"))
## $data
## Outcome
## Predictor Disease2 Disease1 Total
## Exposed2 1500 150 1650
## Exposed1 2000 250 2250
## Total 3500 400 3900
##
## $measure
## risk ratio with 95% C.I.
## Predictor estimate lower upper
## Exposed2 1.000000 NA NA
## Exposed1 1.222222 1.008509 1.481224
##
## $p.value
## two-sided
## Predictor midp.exact fisher.exact chi.square
## Exposed2 NA NA NA
## Exposed1 0.03942366 0.04233379 0.03993182
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
oddsratio.wald(Table2, rev = c("both"))
## $data
## Outcome
## Predictor Disease2 Disease1 Total
## Exposed2 1500 150 1650
## Exposed1 2000 250 2250
## Total 3500 400 3900
##
## $measure
## odds ratio with 95% C.I.
## Predictor estimate lower upper
## Exposed2 1.00 NA NA
## Exposed1 1.25 1.009986 1.547051
##
## $p.value
## two-sided
## Predictor midp.exact fisher.exact chi.square
## Exposed2 NA NA NA
## Exposed1 0.03942366 0.04233379 0.03993182
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
Subjects with High Cholesterol had a 22% increase in the risk of Death compared to subjects with Low Cholesterol (RR = 1.22; 95% CI: 1.01, 1.48; P=0.040).
Subjects with High Cholesterol had a 25% increase in odds of Death compared to subjects with Low Cholesterol (OR = 1.25; 95% CI: 1.01, 1.55; P=0.040).
In both these measures, the association was significant.
Let’s suppose we are interested in seeing whether Exercise is a confounder on the Drug to Death direct pathway.
DAG diagram with Exercise as a confounder.
To check if there is confounding, we need to determine if the confounder meets three criteria:
The confounding variable is associated with the outcome
The confounding variable is associated with treatment assignment
The confounding variable is not on the causal pathway between the exposure and outcome
For the first criterion, we look at the association between Exercise and Death.
### Criterion 1: The confounding variable is associated with the outcome (Is exercise associated with death?)
Table3<- matrix(c(200, 200, 2400, 1100), nrow = 2, ncol= 2)
Table3
## [,1] [,2]
## [1,] 200 2400
## [2,] 200 1100
riskratio.wald(Table3, rev = c("both"))
## $data
## Outcome
## Predictor Disease2 Disease1 Total
## Exposed2 1100 200 1300
## Exposed1 2400 200 2600
## Total 3500 400 3900
##
## $measure
## risk ratio with 95% C.I.
## Predictor estimate lower upper
## Exposed2 1.0 NA NA
## Exposed1 0.5 0.4158255 0.6012138
##
## $p.value
## two-sided
## Predictor midp.exact fisher.exact chi.square
## Exposed2 NA NA NA
## Exposed1 3.715916e-13 4.52674e-13 8.380705e-14
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
oddsratio.wald(Table3, rev = c("both"))
## $data
## Outcome
## Predictor Disease2 Disease1 Total
## Exposed2 1100 200 1300
## Exposed1 2400 200 2600
## Total 3500 400 3900
##
## $measure
## odds ratio with 95% C.I.
## Predictor estimate lower upper
## Exposed2 1.0000000 NA NA
## Exposed1 0.4583333 0.3720441 0.5646359
##
## $p.value
## two-sided
## Predictor midp.exact fisher.exact chi.square
## Exposed2 NA NA NA
## Exposed1 3.715916e-13 4.52674e-13 8.380705e-14
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
First criterion: Confounder is associated with the outcome. In this example, Exercise is assocaited with Death.
Subjects who exercise had a lower risk (RR = 0.50; 95% CI: 0.42, 0.60) and odds (OR = 0.46; 95% CI: 0.37, 0.56 ) of Death compared to subjects who did not exercise. Hence, this satisfies the first criterion for confounding. The results are summarized into a table below.
For the second criterion, we look at the association between Exercise and Cholesterol Status (High v. Low).
### Criterion 2: The confounding variable (Exercise) is associated with the Cholesterol Status (Is exercise associated with cholesterol?)
Table4<- matrix(c(1750, 500, 850, 800), nrow = 2, ncol= 2)
Table4
## [,1] [,2]
## [1,] 1750 850
## [2,] 500 800
riskratio.wald(Table4, rev = c("both"))
## $data
## Outcome
## Predictor Disease2 Disease1 Total
## Exposed2 800 500 1300
## Exposed1 850 1750 2600
## Total 1650 2250 3900
##
## $measure
## risk ratio with 95% C.I.
## Predictor estimate lower upper
## Exposed2 1.00 NA NA
## Exposed1 1.75 1.62551 1.884024
##
## $p.value
## two-sided
## Predictor midp.exact fisher.exact chi.square
## Exposed2 NA NA NA
## Exposed1 0 5.385362e-66 3.221793e-66
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
oddsratio.wald(Table4, rev = c("both"))
## $data
## Outcome
## Predictor Disease2 Disease1 Total
## Exposed2 800 500 1300
## Exposed1 850 1750 2600
## Total 1650 2250 3900
##
## $measure
## odds ratio with 95% C.I.
## Predictor estimate lower upper
## Exposed2 1.000000 NA NA
## Exposed1 3.294118 2.867891 3.78369
##
## $p.value
## two-sided
## Predictor midp.exact fisher.exact chi.square
## Exposed2 NA NA NA
## Exposed1 0 5.385362e-66 3.221793e-66
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
Second criterion: Confounder is associated with the exposure In this example, Exercise is associated with Cholesterol Status (High and Low).
Third criterion: Confounder is not on the causal pathway between the exposure and outcome.
Subjects who exercise had a higher risk (RR = 1.75; 95% CI: 1.63, 1.88) and odds (OR = 3.29; 95% CI: 2.87, 3.78) of having High Cholesterol compared to subjects who do not exercise. Hence, this satisfies the second criterion for confounding. The results are summarized into a table below.
Since Exercise is not on the causal pathway between Cholesterol Status and Death, it meets the necessary criteria to be considered a confounder.
Stratifying your cohort is a great way to see how different the results can be. In this tutorial, we focus on stratifying on a couple of variables. But you can stratify many variables as long as you have a large enough sample.
We can check to see how much of an impact Exercise has on the exposure to outcome relationship using stratification. We do this by stratifying the groups into those who exercise and don’t exercise. Then we evaluate the exposure to outcome relationship.
Here is an illustration of how exercise is stratified into two strata (Exercise = 1 and Exercise = 0).
The association between Cholesterol Status and Death can be estimated for each strata.
### Distribution of Exercise across Drug groups
# Among subjects who exercise (N=2600)
Table3 <- matrix(c(150, 50, 1600, 800), nrow = 2, ncol = 2)
Table3
## [,1] [,2]
## [1,] 150 1600
## [2,] 50 800
riskratio.wald(Table3, rev = c("both"))
## $data
## Outcome
## Predictor Disease2 Disease1 Total
## Exposed2 800 50 850
## Exposed1 1600 150 1750
## Total 2400 200 2600
##
## $measure
## risk ratio with 95% C.I.
## Predictor estimate lower upper
## Exposed2 1.000000 NA NA
## Exposed1 1.457143 1.069385 1.985501
##
## $p.value
## two-sided
## Predictor midp.exact fisher.exact chi.square
## Exposed2 NA NA NA
## Exposed1 0.01429813 0.01507732 0.01578802
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
oddsratio.wald(Table3, rev = c("both"))
## $data
## Outcome
## Predictor Disease2 Disease1 Total
## Exposed2 800 50 850
## Exposed1 1600 150 1750
## Total 2400 200 2600
##
## $measure
## odds ratio with 95% C.I.
## Predictor estimate lower upper
## Exposed2 1.0 NA NA
## Exposed1 1.5 1.077177 2.088794
##
## $p.value
## two-sided
## Predictor midp.exact fisher.exact chi.square
## Exposed2 NA NA NA
## Exposed1 0.01429813 0.01507732 0.01578802
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
# Among subjects who do not exercise (N=1300)
Table4 <- matrix(c(100, 100, 400, 700), nrow = 2, ncol = 2)
Table4
## [,1] [,2]
## [1,] 100 400
## [2,] 100 700
riskratio.wald(Table4, rev = c("both"))
## $data
## Outcome
## Predictor Disease2 Disease1 Total
## Exposed2 700 100 800
## Exposed1 400 100 500
## Total 1100 200 1300
##
## $measure
## risk ratio with 95% C.I.
## Predictor estimate lower upper
## Exposed2 1.0 NA NA
## Exposed1 1.6 1.241526 2.061978
##
## $p.value
## two-sided
## Predictor midp.exact fisher.exact chi.square
## Exposed2 NA NA NA
## Exposed1 0.0003199078 0.0003604172 0.0002660503
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
oddsratio.wald(Table4, rev = c("both"))
## $data
## Outcome
## Predictor Disease2 Disease1 Total
## Exposed2 700 100 800
## Exposed1 400 100 500
## Total 1100 200 1300
##
## $measure
## odds ratio with 95% C.I.
## Predictor estimate lower upper
## Exposed2 1.00 NA NA
## Exposed1 1.75 1.29231 2.369787
##
## $p.value
## two-sided
## Predictor midp.exact fisher.exact chi.square
## Exposed2 NA NA NA
## Exposed1 0.0003199078 0.0003604172 0.0002660503
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
Among subjects who exercise, those with High Cholesterol had a higher risk (RR = 1.45; 95% CI: 1.07, 1.99) and odds (OR = 1.50; 95% CI: 1.08, 2.09) of Death compared to those with Low Cholesterol. Similarly, among subjects who did not exercise, those with High Cholesterol had a higher risk (RR = 1.60; 95% CI: 1.24, 2.06) and odds (OR = 1.75; 95% CI: 1.29, 2.37) of Death compared to those with Low Cholesterol.
Compared to the crude analysis where the RR = 1.22 and the OR = 1.25, the stratified results are much higher. this suggests that Exercise has some confounding effect on the exposure to outcome relationship. When we stratify the groups, we get a stronger measure of association between Cholesterol Status and Death. However, we would like to have a single measure of this association, which means that we need to combine these two stratified results. A common method for adjusting these stratified results into a single measure of association is to use the Mantel-Haenszel (M-H) method of adjustment.
These equations are for the point estimates (RR or OR), but not the 95% CI. If you need to estimate the 95% CI, it’s recommended that you use the epi.2by2()
function, which is part of the epiR
package.
The equation for the M-H adjusted risk ratio:
\(\begin{aligned} \LARGE RR_{adjusted} = \frac{\sum{\frac{a_{i}(c_{i} + d_{i})}{n_{i}}}}{\sum{\frac{c_{i}(a_{i} + b_{i})}{n_{i}}}} \end{aligned}\)
The equation for the M-H adjusted odds ratio:
\(\begin{aligned} \LARGE OR_{adjusted} = \frac{\sum{\frac{a_{i}d_{i}}{n_{i}}}}{\sum{\frac{c_{i}b_{i}}{n_{i}}}} \end{aligned}\)
Fortunately, R has an easier way to estimate the M-H adjusted RR and OR.
You need to install the epiR
package, which contains the epi.2by2()
function, which will generate the M-H adjusted RR and OR.
## install.packages("epiR")
library("epiR")
The epiR
documentation is available here.
We need to create an array with our stratified matrices. We already created two tables (Table4
and Table5
), which contains the stratified groups. Table3
contains the stratum where subjects were classified as having exercised (Exercise = 1), and Table4
contains the stratum where subjects were classified as not having exercised (Exercise = 0).
matrix.array <- array(c(Table3, Table4), dim = c(2, 2, 2))
matrix.array
## , , 1
##
## [,1] [,2]
## [1,] 150 1600
## [2,] 50 800
##
## , , 2
##
## [,1] [,2]
## [1,] 100 400
## [2,] 100 700
Once we have our array, we can use the epi.2by2()
function. This will generate the M-H adjusted RR and OR.
The output contains a lot of information. We are interested in the risk ratio (M-H) and the odds ratio (M-H).
epi.2by2(matrix.array)
## Outcome + Outcome - Total Inc risk * Odds
## Exposed + 250 2000 2250 11.11 0.125
## Exposed - 150 1500 1650 9.09 0.100
## Total 400 3500 3900 10.26 0.114
##
##
## Point estimates and 95% CIs:
## -------------------------------------------------------------------
## Inc risk ratio (crude) 1.22 (1.01, 1.48)
## Inc risk ratio (M-H) 1.53 (1.26, 1.87)
## Inc risk ratio (crude:M-H) 0.80
## Odds ratio (crude) 1.25 (1.01, 1.55)
## Odds ratio (M-H) 1.62 (1.30, 2.03)
## Odds ratio (crude:M-H) 0.77
## Attrib risk in the exposed (crude) * 2.02 (0.12, 3.92)
## Attrib risk in the exposed (M-H) * 4.37 (2.03, 6.71)
## Attrib risk (crude:M-H) 0.46
## -------------------------------------------------------------------
## M-H test of homogeneity of PRs: chi2(1) = 0.210 Pr>chi2 = 0.647
## M-H test of homogeneity of ORs: chi2(1) = 0.490 Pr>chi2 = 0.484
## Test that M-H adjusted OR = 1: chi2(1) = 18.325 Pr>chi2 = <0.001
## Wald confidence limits
## M-H: Mantel-Haenszel; CI: confidence interval
## * Outcomes per 100 population units
There are no hypothesis tests that can be performed to determine whether confounding exists. Hence, it is often necessary to adjust for confounders using the M-H adjustment method or multivariable regression models.
The M-H adjusted RR and and OR are higher than the crude RR and OR. Additionally, the M-H adjusted RR and OR for the two strata were similar. This suggests that there was confounding by Exercise in the overall cohort.
We can estimate the magnitude of confounding by looking at the relative change between the crude and adjusted measures of associations:
Magnitude of confounding for RR = \(\LARGE \frac{RR_{crude} - RR_{adjusted}}{RR_{adjusted}}\)
The greater than 10% rule of thumb is not a recommended method to discern whether confounding is present. It is best to have a solid framework to identify potential confounders and use a study design to mitigate their impact on the causal relationship between the exposure and outcome.
A general rule of thumb, if the magnitude of confounding is greater than 10%, then we can conclude that the variable is a confounder.
rr_crude <- 1.22
rr_adjusted <- 1.53
rr_change <- (rr_crude - rr_adjusted) / rr_adjusted
rr_change
## [1] -0.2026144
Magnitude of confounding for OR = \(\LARGE \frac{OR_{crude} - OR_{adjusted}}{OR_{adjusted}}\)
or_crude <- 1.25
or_adjusted <- 1.63
or_change <- (or_crude - or_adjusted) / or_adjusted
or_change
## [1] -0.2331288
Since the magnitude of confounding is greater than 10% for the RR and OR, we can conclude that Exercise was a confounder on the Cholesterol Status and Death relationship. Additionally, We reported that subjects who exercised were likely to have higher cholesterol, and we also reported that subjects who exercised had lower risk (or odds) of deaths. The crude RR and OR underestimated the association of Cholesterol Status and Death because of the large number of people who exercised had High Cholesterol.
Another type of issue that can impact the causal relationship between the exposure and outcome is an effect modifier. Effect modifiers occur when the third variable (e.g., Stress Level) modifies the exposure to outcome relationship. In other words, when you vary the Stress Level (High versus Low), the effect of the exposure to outcome pathway will change in different ways. See diagram below.
To assess effect modification, we stratify the groups into the different levels of the third variable (e.g., stress). Let’s assume that the stress variable has two levels (“High Stress” and “Low Stress”). We stratify the analysis based on these two levels of stree (“High Stress” and “Low Stress”).
We estimate the stratified RR and OR using the R commands that we have been using thus far.
##########################################
# Interactions or Effect Modification
##########################################
### Among subjects with High Stress ###
Table6 <- matrix(c(250, 75, 1500, 825), nrow = 2, ncol= 2)
Table6
## [,1] [,2]
## [1,] 250 1500
## [2,] 75 825
riskratio.wald(Table6, rev = c("both"))
## $data
## Outcome
## Predictor Disease2 Disease1 Total
## Exposed2 825 75 900
## Exposed1 1500 250 1750
## Total 2325 325 2650
##
## $measure
## risk ratio with 95% C.I.
## Predictor estimate lower upper
## Exposed2 1.000000 NA NA
## Exposed1 1.714286 1.341514 2.190641
##
## $p.value
## two-sided
## Predictor midp.exact fisher.exact chi.square
## Exposed2 NA NA NA
## Exposed1 5.766403e-06 6.30191e-06 9.69556e-06
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
oddsratio.wald(Table6, rev = c("both"))
## $data
## Outcome
## Predictor Disease2 Disease1 Total
## Exposed2 825 75 900
## Exposed1 1500 250 1750
## Total 2325 325 2650
##
## $measure
## odds ratio with 95% C.I.
## Predictor estimate lower upper
## Exposed2 1.000000 NA NA
## Exposed1 1.833333 1.397199 2.405607
##
## $p.value
## two-sided
## Predictor midp.exact fisher.exact chi.square
## Exposed2 NA NA NA
## Exposed1 5.766403e-06 6.30191e-06 9.69556e-06
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
### Among subjects with Low Stress ###
Table7<- matrix(c(25, 100, 425, 700), nrow = 2, ncol= 2)
Table7
## [,1] [,2]
## [1,] 25 425
## [2,] 100 700
riskratio.wald(Table7, rev = c("both"))
## $data
## Outcome
## Predictor Disease2 Disease1 Total
## Exposed2 700 100 800
## Exposed1 425 25 450
## Total 1125 125 1250
##
## $measure
## risk ratio with 95% C.I.
## Predictor estimate lower upper
## Exposed2 1.0000000 NA NA
## Exposed1 0.4444444 0.291213 0.6783037
##
## $p.value
## two-sided
## Predictor midp.exact fisher.exact chi.square
## Exposed2 NA NA NA
## Exposed1 4.861788e-05 5.207064e-05 8.55232e-05
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
oddsratio.wald(Table7, rev = c("both"))
## $data
## Outcome
## Predictor Disease2 Disease1 Total
## Exposed2 700 100 800
## Exposed1 425 25 450
## Total 1125 125 1250
##
## $measure
## odds ratio with 95% C.I.
## Predictor estimate lower upper
## Exposed2 1.0000000 NA NA
## Exposed1 0.4117647 0.2613655 0.648709
##
## $p.value
## two-sided
## Predictor midp.exact fisher.exact chi.square
## Exposed2 NA NA NA
## Exposed1 4.861788e-05 5.207064e-05 8.55232e-05
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
After stratifying based on the Stress Level, we can see that the point estimates for the risk ratio and odds ratio are different for the different strata. You can see the difference with the risk ratio for the stratum of subjects with High Stress versus the stratum of subjects with Low Stress (RR= 1.71 versus OR = 0.44) and with the the odds ratio (OR = 1.83 versus OR = 0.41). Among subjects with High Stress, those with High Cholesterol had a higher risk (and odds) of mortality compared to those with Low Cholesterol (RR=1.71; 95% CI: 1.34, 2.19 and OR = 1.83; 95% CI: 1.40, 2.41). However, among subjects with Low Stress, those with High Cholesterol had a lower risk (and odds) or mortality compared to those with Low Cholesterol (RR = 0.44; 95% CI: 0.29, 0.68 and OR = 0.41; 95% CI: 0.26, 0.65).
Effect modification is different from confounding because the direction of effect for the different strata are not aligned. For example, there is a positive association between Cholesterol Level and Death for subjects who have High Stress, but this becomes a negative association among subjects who have Low Stress.
These stratified results by Stress Level (“High Stress” v. “Low Stress”) generated conflicting measures of associations for the Cholesterol Status and Death relationship. Because the variable Stress Level provides disparate findings at different levels of stress, it is an effect modifier.
Using epitools
and epiR
will help you quickly estimate the risk and odds ratio for any 2 x 2 contingency table and perform adjustments using the M-H method. Moreover, you can also check for confounding and effect modification using this tool. Make sure to carefully set up your DAG diagram and identify the potential confounders and effect modifiers.
I used the Tufte style for this tutorial based on Allaire and Xie; see “A Tufte Handout Example”
Additional Tufte style resources are available at “Tint Is Not Tufte” by Allaire, Xie, and Eddelbuettel
This is a work in progress. I will make updates as appropriate. If there are any issues or concerns, please email me at internal.validity.blog@gmail.com