Lab 8

Updates

  • No lab next week (5/26), Zoom office hours instead!
  • Last day to turn in any homework: Tuesday 5/30

Variables Beyond Covariates

Confounders, colliders, mediators, moderators, etc.

Covariates

  • aka \(X\) variables, independent variables, explanatory variables, etc.
  • Some texts reserve the term for variables that directly affect the outcome variable.
  • Other texts use the term more broadly to include direct, indirect, and other influential variables, as in the slides that follow.

Mediating (Intervening) Variables

  • If a variable “intervenes” or “mediates” an effect, it sits on the causal path between the independent variable and the dependent variable
  • Mediators carry some or all of the effect of the independent variable through to the outcome variable (see the simulated sketch after this list)
  • You can have partial or complete mediation.
  • Helps us answer “how” and “why”
  • Another resource
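A minimal simulated sketch of mediation (hypothetical variables, not the GSS data): the effect of x on y runs partly through the mediator m, so controlling for m shrinks the coefficient on x.

set.seed(8)
n <- 5000
x <- rnorm(n)                       # independent variable
m <- 0.6 * x + rnorm(n)             # mediator: caused by x
y <- 0.2 * x + 0.4 * m + rnorm(n)   # outcome: direct effect of x plus the path through m

coef(lm(y ~ x))       # total effect of x, roughly 0.2 + 0.6 * 0.4 = 0.44
coef(lm(y ~ x + m))   # direct effect of x, roughly 0.2, once the mediator is controlled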

Confounders

  • Confounders are a problem if you do not control for them because they distort the association between the covariate and the outcome
  • They affect both the covariate and the outcome (see the simulated sketch below)
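A minimal simulated sketch of confounding (again hypothetical variables): z drives both x and y, so leaving z out of the model makes x look related to y even though x has no effect.

set.seed(8)
n <- 5000
z <- rnorm(n)             # confounder
x <- 0.7 * z + rnorm(n)   # covariate: partly driven by the confounder
y <- 0.5 * z + rnorm(n)   # outcome: driven by the confounder, not by x

coef(lm(y ~ x))       # x picks up a spurious positive coefficient
coef(lm(y ~ x + z))   # controlling for z, the x coefficient falls to roughly 0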

Moderating Variables

  • Sounds similar to mediating, but it is distinct
  • Affects the direction or strength between independent variable and dependent variable
  • E.g., with a variable of interest like education and an outcome like salary, a moderator might be gender: the education-salary relationship can differ in strength or direction by gender (moderators are fit as interaction terms; see the sketch below)
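A minimal simulated sketch of moderation using the education/salary/gender example (all fake data and made-up effect sizes): the interaction term lets the education slope differ by gender.

set.seed(8)
n <- 1000
gender    <- factor(sample(c("female", "male"), n, replace = TRUE))
education <- rnorm(n, mean = 14, sd = 2)
# in this fake data the education slope is 1.5 units steeper for women than for men
salary    <- 30 + 2 * education + 1.5 * education * (gender == "female") + rnorm(n, sd = 5)

mod <- lm(salary ~ education * gender, data = data.frame(salary, education, gender))
coef(mod)   # education:gendermale is roughly -1.5: the moderation (interaction) effect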

Covariates

  • There are others to be wary of like colliders and proxy confounders, but not for this class!
  • You can even leverage these relationships for causal inference, e.g., with instrumental variables
  • Not discussing the causal techniques in lab, but willing to discuss in office hours!

Mediating/Intervening Variables

KHB Method

# full model (with the mediator, idv.income)
full.logit <- glm(happy.b ~ prestige + idv.income + age + sex + 
                    married + health.b + relig.b,  
                   family = binomial(logit), 
                   data = df.gss)

# reduced model (without mediator)
med.logit <- glm(happy.b ~ prestige + age + sex + married + health.b + relig.b, 
                   family = binomial(logit), 
                   data = df.gss)

# It looks like khb only works if you drop all NAs first
  # If you don't drop them (e.g., with `na.omit` or `tidyr::drop_na`; see the sketch below), you'll get this error when executing khb:
      # Error in data.frame(model.frame(reduced), zresid): arguments imply differing number of rows
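One possible way to drop the incomplete rows first, assuming the model variables listed above (adjust the column names to your own data), then refit both models on the complete-case data:

# keep only rows that are complete on every variable used in the models
df.khb <- df.gss %>% 
  tidyr::drop_na(happy.b, prestige, idv.income, age, sex, married, health.b, relig.b)

# refit med.logit and full.logit with data = df.khb before calling khb()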

KHB Method

khb::khb(reduced = med.logit, full = full.logit)
KHB method
Model type: glm lm (logit) 
Variables of interest: 
prestige, age, sex, married (marriedmarried), health.b (health.bhealthy), relig.b (relig.breligious)
Z variables (mediators): idv.income

Summary of confounding
                        Ratio   Percentage Rescaling
prestige            -2.220586   145.033149    1.2189
age                  0.071806 -1292.647872    0.7933
sex                  0.684802   -46.027648    0.9999
marriedmarried       1.123962    11.029021    1.0115
health.bhealthy      1.065202     6.121052    1.0087
relig.breligious     0.979365    -2.106985    1.0030
------------------------------------------------------------------
prestige :
          Estimate Std. Error z value Pr(>|z|)   
Reduced  0.0024662  0.0033044  0.7463 0.455464   
Full    -0.0011106  0.0034212 -0.3246 0.745465   
Diff     0.0035768  0.0011171  3.2019 0.001365 **
------------------------------------------------------------------
age :
           Estimate  Std. Error z value Pr(>|z|)
Reduced  0.00002524  0.00267151  0.0094   0.9925
Full     0.00035150  0.00267288  0.1315   0.8954
Diff    -0.00032626  0.00021204 -1.5387   0.1239
------------------------------------------------------------------
sex :
         Estimate Std. Error z value Pr(>|z|)   
Reduced  0.066500   0.090071  0.7383 0.460328   
Full     0.097109   0.090513  1.0729 0.283325   
Diff    -0.030609   0.011215 -2.7293 0.006346 **
------------------------------------------------------------------
marriedmarried :
        Estimate Std. Error z value              Pr(>|z|)    
Reduced 1.022787   0.094504 10.8226 < 0.00000000000000022 ***
Full    0.909983   0.098513  9.2372 < 0.00000000000000022 ***
Diff    0.112803   0.035072  3.2163              0.001298 ** 
------------------------------------------------------------------
health.bhealthy :
        Estimate Std. Error z value              Pr(>|z|)    
Reduced 0.918316   0.096903  9.4767 < 0.00000000000000022 ***
Full    0.862105   0.097737  8.8207 < 0.00000000000000022 ***
Diff    0.056211   0.018736  3.0001              0.002699 ** 
------------------------------------------------------------------
relig.breligious :
          Estimate Std. Error z value Pr(>|z|)   
Reduced  0.2689653  0.0981354  2.7408 0.006130 **
Full     0.2746324  0.0981600  2.7978 0.005145 **
Diff    -0.0056671  0.0071413 -0.7936 0.427451   

KHB Method Interpretation

  • Because logit coefficients get rescaled when we add a variable, we cannot simply compare the reduced and full coefficients to check for mediation; the KHB method decomposes direct and indirect effects on a comparable scale
  • The “Reduced” line of the output gives the total effect of the variable
  • The “Full” line gives the direct effect of the variable (net of the mediator) and the “Diff” line gives the indirect effect, as in the worked example below
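Using the prestige row of the khb output above: total = direct + indirect, i.e. \(0.0024662 = -0.0011106 + 0.0035768\). The reduced-model coefficient (total effect) equals the full-model coefficient (direct effect) plus the difference (the indirect effect carried through income).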

Tables

(Advice from Alice)

Tables

Saving Lists to Use in Tables

  • It may be easier to create a list of variable names to use in the tables

  • Do this if you have a lot of variables or plan to build several different tables

Example:

 df.names <- df.gss %>%
                      # new name = old name 
       dplyr::select("Occupational prestige" = prestige,
                     "Age" = age,
                     "Personal income (thousands USD)" = idv.income,
                     "Sex" = sex,
                     "Marital status" = married,
                     "Religiosity" = relig.b,
                     "Self-reported health" = health.b,
                     "Happiness" = happy.b)

Exporting Tables

datasummary_skim(df.names,
                 output = "flextable",
                 fmt = 2,
                 title = "Descriptive Statistics for Continuous Variables",
                 notes = str_c("N = ", nrow(df.names)), # nrow() gives the sample size; count() returns a data frame
                 histogram = F
                 ) %>% 
  autofit() %>% 
  save_as_docx(path = "summary_flextable.docx")
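If you need a different file format, flextable also provides save_as_html() and save_as_pptx(), which slot into the same pipeline in place of save_as_docx().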

Regression Output

  • First decide what you want to include in your tables

  • Do you need BIC? N? Exponentiated (odds ratios) or raw logit coefficients? All the variables, or omit the controls?

Regression Output

modelsummary(list("Reduced Model" = med.logit, "Full Model" = full.logit),
             fmt = 4,
             stars = T,
             coef_map = c( 
               "prestige" = "Occupational prestige", # list key vars first
               "idv.income" = "Income (thousands USD)", # unit of measurement
               # provide reference categories for factor variables
               "health.bhealthy" = "Self-reported health (1 = healthy)",
               "relig.breligious" = "Religiosity (1 = religious)",
               "sexfemale" = "Sex (1 = female)",
               "marriedmarried" = "Marital status (1 = married)",
               "age" = "Age",
               "(Intercept)" = "(Intercept)" # shift order of intercept to last
             ),
             gof_map = c("nobs", "aic", "bic", "logLik"), # change these as necessary
             exponentiate = T, # remove exponentiation if not applicable for you
             # provide informative title and notes
             title = "Binary Logit of Reported Happiness with and without Income",
             notes = c("Standard error in parentheses",
                       "Coefficients reported as odds ratios",
                       "KHB method to decompose direct and indirect effects (mediator: personal income)"),
             output = "flextable") %>% 
  autofit()

Binary Logit of Reported Happiness with and without Income

                                      Reduced Model   Full Model
Occupational prestige                 1.0020          0.9989
                                      (0.0033)        (0.0034)
Income (thousands USD)                                1.0049**
                                                      (0.0015)
Self-reported health (1 = healthy)    2.4853***       2.3681***
                                      (0.2405)        (0.2315)
Religiosity (1 = religious)           1.3076**        1.3160**
                                      (0.1281)        (0.1292)
Marital status (1 = married)          2.7488***       2.4843***
                                      (0.2590)        (0.2447)
Age                                   1.0000          1.0004
                                      (0.0027)        (0.0027)
(Intercept)                           0.7759          0.7624
                                      (0.1957)        (0.1925)
Num.Obs.                              3204            3204
AIC                                   3175.1          3165.6
BIC                                   3217.6          3214.2
Log.Lik.                              -1580.534       -1574.798

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
Standard error in parentheses
Coefficients reported as odds ratios
KHB method to decompose direct and indirect effects (mediator: personal income)
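As with the descriptive table earlier, this flextable can be exported to Word by piping the autofit() result into save_as_docx() with a file path of your choosing.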

Bonus: Stargazer \(\LaTeX\)

stargazer::stargazer(med.logit, full.logit,
                     type = "latex",
                     title = "Binary Logit of Reported Happiness with and without Income",
                     # covariate.labels are assigned by position, so keep them in the
                     # same order as the coefficients appear in the models
                     covariate.labels = c("Occupational Prestige",
                                          "Income (thousands USD)",
                                          "Age",
                                          "Sex (1 = female)",
                                          "Marital Status (1 = married)",
                                          "Self-reported health (1 = healthy)",
                                          "Religiosity (1 = religious)",
                                          "(Intercept)"),
                     dep.var.labels = "Happiness",
                     column.labels = c("Reduced", "Full Model"),
                     out = "regressiontable.html")

% Table created by stargazer v.5.2.3 by Marek Hlavac, Social Policy Institute. E-mail: marek.hlavac at gmail.com
% Date and time: Fri, May 19, 2023 - 10:49:21 AM
\begin{table}[!htbp] \centering 
  \caption{Binary Logit of Reported Happiness with and without Income} 
  \label{} 
\begin{tabular}{@{\extracolsep{5pt}}lcc} 
\\[-1.8ex]\hline 
\hline \\[-1.8ex] 
 & \multicolumn{2}{c}{\textit{Dependent variable:}} \\ 
\cline{2-3} 
\\[-1.8ex] & \multicolumn{2}{c}{Happiness} \\ 
 & Reduced & Full Model \\ 
\\[-1.8ex] & (1) & (2)\\ 
\hline \\[-1.8ex] 
 Occupational Prestige & 0.002 & $-$0.001 \\ 
  & (0.003) & (0.003) \\ 
  & & \\ 
 Income (thousands USD) &  & 0.005$^{***}$ \\ 
  &  & (0.001) \\ 
  & & \\ 
 Age & 0.00003 & 0.0004 \\ 
  & (0.003) & (0.003) \\ 
  & & \\ 
 Sex (1 = female) & 0.067 & 0.097 \\ 
  & (0.090) & (0.091) \\ 
  & & \\ 
 Marital Status (1 = married) & 1.011$^{***}$ & 0.910$^{***}$ \\ 
  & (0.094) & (0.099) \\ 
  & & \\ 
 Self-reported health (1 = healthy) & 0.910$^{***}$ & 0.862$^{***}$ \\ 
  & (0.097) & (0.098) \\ 
  & & \\ 
 Religiosity (1 = religious) & 0.268$^{***}$ & 0.275$^{***}$ \\ 
  & (0.098) & (0.098) \\ 
  & & \\ 
 (Intercept) & $-$0.254 & $-$0.271 \\ 
  & (0.252) & (0.253) \\ 
  & & \\ 
\hline \\[-1.8ex] 
Observations & 3,204 & 3,204 \\ 
Log Likelihood & $-$1,580.534 & $-$1,574.798 \\ 
Akaike Inf. Crit. & 3,175.068 & 3,165.596 \\ 
\hline 
\hline \\[-1.8ex] 
\textit{Note:}  & \multicolumn{2}{r}{$^{*}$p$<$0.1; $^{**}$p$<$0.05; $^{***}$p$<$0.01} \\ 
\end{tabular} 
\end{table}