In this worksheet, you’ll investigate how adding a control variable can affect the relationship between two variables. You’ll:
Estimate a bivariate regression (e.g., child height-for-age on HWISE score).
Add a potential confounder (e.g., rural vs. urban) in a multivariable regression.
Compare the results visually using jtools::plot_summs().
Before regressions: Save your cleaned data
So far, we have been working from the raw DHS data. In previous assignments, you recoded your main variables using mutate(). Now save that cleaned dataset by adding the command write.csv(dhs_clean, file = "Temp/dhs_clean.csv") at the end of your script from those assignments. Then run the whole script.
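The save step can be checked with a quick round trip. The sketch below is only an illustration: the tiny data frame stands in for your real dhs_clean, and it writes to a temporary folder (via tempdir()) rather than your project's Temp folder. Setting row.names = FALSE is an optional choice made here so that the file does not gain an extra column of row numbers when it is read back in.

```r
# Stand-in for your real dhs_clean (hypothetical values, not DHS data)
dhs_clean <- data.frame(
  height_for_age = c(-1.2, 0.3, -0.5),
  urban          = c(1, 0, 1)
)

# Use a temporary folder for this illustration; in your project this would be "Temp"
out_dir <- file.path(tempdir(), "Temp")
dir.create(out_dir, showWarnings = FALSE)  # write.csv() fails if the folder is missing
out_file <- file.path(out_dir, "dhs_clean.csv")

# Note the argument is `file`, not `filename`
write.csv(dhs_clean, file = out_file, row.names = FALSE)

# Read the file back and confirm the columns survived the round trip
dhs_check <- read.csv(out_file)
names(dhs_check)
```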
Next, create a new R script named “regressions.R”. Save it in your “Code” folder.
library(tidyverse)

# Install jtools if not already installed
if (!requireNamespace("jtools", quietly = TRUE)) {
  install.packages("jtools")
}
library(jtools)

dhs_clean <- read.csv("Temp/dhs_clean.csv")
Choose Your Variables
# Example variables; replace with your own choices
dependent_var <- "height_for_age"
main_predictor <- "WISE_score"
confounder <- "urban"  # Re-coded: 1 = urban, 0 = rural
Rescale Your Variables
When running regressions, it is best to standardize your variables. This allows you to easily compare the magnitude of different coefficients.
Leave categorical variables as they are.
Divide your continuous variables by 2 times their standard deviation.
This makes continuous variables roughly comparable to binary variables. [1]
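The rescaling rule above can be sketched in a few lines of base R. This is a minimal illustration, not the worksheet's own code: the data frame dhs_example and the helper rescale_2sd() are hypothetical names, and the values are made up. With your own data you would apply the same division to the continuous columns of dhs_clean before fitting the models.

```r
# Hypothetical example data; column names mirror the worksheet's example variables
dhs_example <- data.frame(
  height_for_age = c(-2.1, -0.4, 0.6, -1.3, 0.2, -0.9),
  WISE_score     = c(14, 3, 0, 9, 2, 7),
  urban          = c(0, 1, 1, 0, 1, 0)
)

# Hypothetical helper: divide a continuous variable by 2 standard deviations
rescale_2sd <- function(x) {
  x / (2 * sd(x, na.rm = TRUE))
}

dhs_scaled <- dhs_example
dhs_scaled$height_for_age <- rescale_2sd(dhs_example$height_for_age)
dhs_scaled$WISE_score     <- rescale_2sd(dhs_example$WISE_score)
# urban is binary (categorical), so it is left as it is

# Each rescaled variable now has a standard deviation of 0.5
sd(dhs_scaled$WISE_score)
```

After this step, a one-unit change in a rescaled variable corresponds to a change of two standard deviations in the original variable.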
# Bivariate model
model_biv <- lm(height_for_age ~ WISE_score, data = dhs_clean)

# Multivariable model with confounder
model_mult <- lm(height_for_age ~ WISE_score + urban, data = dhs_clean)
Compare the Models Visually
# Plot both models side by side to compare coefficients for WISE_score
plot_summs(model_biv, model_mult,
           model.names = c("Bivariate", "Multivariable"),
           ci_level = .68)
Interpretation Questions
How is the independent variable related to the dependent variable?
Higher HWISE scores (more water insecurity) are associated with lower height-for-age.
How did adding the confounder change the estimated association between the dependent and independent variables? Why do you think this is?
Adding the urban variable attenuated the association between the HWISE score and height-for-age. This suggests that urban households tend to have both lower HWISE scores and taller children, so part of the bivariate association reflected urban-rural differences.
Which variable seems to have a stronger relationship with the dependent variable: your independent variable or the confounder?
Urban-rural has a stronger relationship with height-for-age than HWISE does.
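To see why a confounder can attenuate a coefficient, it may help to try the comparison on simulated data where the true relationships are known. The sketch below uses made-up numbers, not DHS data: all coefficients and the names sim, model_biv_sim, and model_mult_sim are assumptions chosen for illustration. Urban households are given lower scores and taller children, and the direct effect of the score on height-for-age is set to -0.05.

```r
set.seed(42)
n <- 2000

# Simulated (not real DHS) data: urban households have lower water insecurity
# scores and taller children, so urban confounds the score-height relationship
urban          <- rbinom(n, size = 1, prob = 0.5)
WISE_score     <- 10 - 4 * urban + rnorm(n, sd = 2)
height_for_age <- -1.5 + 0.8 * urban - 0.05 * WISE_score + rnorm(n, sd = 1)
sim <- data.frame(height_for_age, WISE_score, urban)

# Same two specifications as in the worksheet, fitted to the simulated data
model_biv_sim  <- lm(height_for_age ~ WISE_score, data = sim)
model_mult_sim <- lm(height_for_age ~ WISE_score + urban, data = sim)

b_biv  <- coef(model_biv_sim)[["WISE_score"]]
b_mult <- coef(model_mult_sim)[["WISE_score"]]

# The bivariate slope absorbs part of the urban effect, so it sits further from zero
c(bivariate = b_biv, multivariable = b_mult)
```

In this setup the bivariate coefficient is more negative than the multivariable one, because it also picks up the height difference between urban and rural households. Controlling for urban moves the estimate back toward the -0.05 built into the simulation.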
Footnotes
[1] If you are interested in more explanation, see this paper.