Lab 8

Updates

  • No lab next week (5/26), Zoom office hours instead!
  • Last day to turn in any homework: Tuesday 5/30

Variables Beyond Covariates

Confounders, colliders, mediators, moderators, etc.

Covariates

  • aka \(X\) variables, independent variables, explanatory variables, etc.
  • Some texts reserve the term for variables that directly affect the outcome variable.
  • Other texts use the term more broadly to include direct, indirect, and other influential variables, as in the slides that follow.

Mediating (Intervening) Variables

  • If a variable “intervenes” or “mediates” an effect, it sits on the causal path between the independent variable and the dependent variable
  • Mediators carry some or all of the effect of the independent variable through to the outcome variable (see the simulated sketch after this list)
  • You can have partial or complete mediation.
  • Helps us answer “how” and “why”
  • Another resource
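A minimal simulated sketch of mediation (hypothetical variables, not the GSS data): the effect of x on y runs partly through the mediator m, so controlling for m shrinks the coefficient on x.

set.seed(8)
n <- 5000
x <- rnorm(n)                       # independent variable
m <- 0.6 * x + rnorm(n)             # mediator: caused by x
y <- 0.2 * x + 0.4 * m + rnorm(n)   # outcome: direct effect of x plus the path through m

coef(lm(y ~ x))       # total effect of x, roughly 0.2 + 0.6 * 0.4 = 0.44
coef(lm(y ~ x + m))   # direct effect of x, roughly 0.2, once the mediator is controlled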

Confounders

  • Confounders are a problem if you do not control for them because they distort the association between the covariate and the outcome
  • They affect both the covariate and the outcome (see the simulated sketch below)
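A minimal simulated sketch of confounding (again hypothetical variables): z drives both x and y, so leaving z out of the model makes x look related to y even though x has no effect.

set.seed(8)
n <- 5000
z <- rnorm(n)             # confounder
x <- 0.7 * z + rnorm(n)   # covariate: partly driven by the confounder
y <- 0.5 * z + rnorm(n)   # outcome: driven by the confounder, not by x

coef(lm(y ~ x))       # x picks up a spurious positive coefficient
coef(lm(y ~ x + z))   # controlling for z, the x coefficient falls to roughly 0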

Moderating Variables

  • Sounds similar to mediating, but it is distinct
  • Affects the direction or strength between independent variable and dependent variable
  • E.g., with a variable of interest like education and an outcome like salary, a moderator might be gender: the education-salary relationship can differ in strength or direction by gender (moderators are fit as interaction terms; see the sketch below)
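A minimal simulated sketch of moderation using the education/salary/gender example (all fake data and made-up effect sizes): the interaction term lets the education slope differ by gender.

set.seed(8)
n <- 1000
gender    <- factor(sample(c("female", "male"), n, replace = TRUE))
education <- rnorm(n, mean = 14, sd = 2)
# in this fake data the education slope is 1.5 units steeper for women than for men
salary    <- 30 + 2 * education + 1.5 * education * (gender == "female") + rnorm(n, sd = 5)

mod <- lm(salary ~ education * gender, data = data.frame(salary, education, gender))
coef(mod)   # education:gendermale is roughly -1.5: the moderation (interaction) effect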

Covariates

  • There are others to be wary of like colliders and proxy confounders, but not for this class!
  • You can even leverage these relationships for causal inference, e.g., with instrumental variables
  • Not discussing the causal techniques in lab, but willing to discuss in office hours!

Mediating/Intervening Variables

KHB Method

# full model (with the mediator, idv.income)
full.logit <- glm(happy.b ~ prestige + idv.income + age + sex + 
                    married + health.b + relig.b,  
                   family = binomial(logit), 
                   data = df.gss)

# reduced model (without mediator)
med.logit <- glm(happy.b ~ prestige + age + sex + married + health.b + relig.b, 
                   family = binomial(logit), 
                   data = df.gss)

# It looks like khb only works if you drop all NAs first
  # If you don't drop them (e.g., with `na.omit` or `tidyr::drop_na`; see the sketch below), you'll get this error when executing khb:
      # Error in data.frame(model.frame(reduced), zresid): arguments imply differing number of rows
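One possible way to drop the incomplete rows first, assuming the model variables listed above (adjust the column names to your own data), then refit both models on the complete-case data:

# keep only rows that are complete on every variable used in the models
df.khb <- df.gss %>% 
  tidyr::drop_na(happy.b, prestige, idv.income, age, sex, married, health.b, relig.b)

# refit med.logit and full.logit with data = df.khb before calling khb()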

KHB Method

khb::khb(reduced = med.logit, full = full.logit)
KHB method
Model type: glm lm (logit) 
Variables of interest: 
prestige, age, sex, married (marriedmarried), health.b (health.bhealthy), relig.b (relig.breligious)
Z variables (mediators): idv.income

Summary of confounding
                        Ratio   Percentage Rescaling
prestige            -2.220586   145.033149    1.2189
age                  0.071806 -1292.647872    0.7933
sex                  0.684802   -46.027648    0.9999
marriedmarried       1.123962    11.029021    1.0115
health.bhealthy      1.065202     6.121052    1.0087
relig.breligious     0.979365    -2.106985    1.0030
------------------------------------------------------------------
prestige :
          Estimate Std. Error z value Pr(>|z|)   
Reduced  0.0024662  0.0033044  0.7463 0.455464   
Full    -0.0011106  0.0034212 -0.3246 0.745465   
Diff     0.0035768  0.0011171  3.2019 0.001365 **
------------------------------------------------------------------
age :
           Estimate  Std. Error z value Pr(>|z|)
Reduced  0.00002524  0.00267151  0.0094   0.9925
Full     0.00035150  0.00267288  0.1315   0.8954
Diff    -0.00032626  0.00021204 -1.5387   0.1239
------------------------------------------------------------------
sex :
         Estimate Std. Error z value Pr(>|z|)   
Reduced  0.066500   0.090071  0.7383 0.460328   
Full     0.097109   0.090513  1.0729 0.283325   
Diff    -0.030609   0.011215 -2.7293 0.006346 **
------------------------------------------------------------------
marriedmarried :
        Estimate Std. Error z value              Pr(>|z|)    
Reduced 1.022787   0.094504 10.8226 < 0.00000000000000022 ***
Full    0.909983   0.098513  9.2372 < 0.00000000000000022 ***
Diff    0.112803   0.035072  3.2163              0.001298 ** 
------------------------------------------------------------------
health.bhealthy :
        Estimate Std. Error z value              Pr(>|z|)    
Reduced 0.918316   0.096903  9.4767 < 0.00000000000000022 ***
Full    0.862105   0.097737  8.8207 < 0.00000000000000022 ***
Diff    0.056211   0.018736  3.0001              0.002699 ** 
------------------------------------------------------------------
relig.breligious :
          Estimate Std. Error z value Pr(>|z|)   
Reduced  0.2689653  0.0981354  2.7408 0.006130 **
Full     0.2746324  0.0981600  2.7978 0.005145 **
Diff    -0.0056671  0.0071413 -0.7936 0.427451   

KHB Method Interpretation

  • Because logit coefficients get rescaled when we add a variable, we cannot simply compare the reduced and full coefficients to check for mediation; the KHB method decomposes direct and indirect effects on a comparable scale
  • The “Reduced” line of the output gives the total effect of the variable
  • The “Full” line gives the direct effect of the variable (net of the mediator) and the “Diff” line gives the indirect effect, as in the worked example below
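Using the prestige row of the khb output above: total = direct + indirect, i.e. \(0.0024662 = -0.0011106 + 0.0035768\). The reduced-model coefficient (total effect) equals the full-model coefficient (direct effect) plus the difference (the indirect effect carried through income).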

Tables

(Advice from Alice)

Tables

Saving Lists to Use in Tables

  • It may be easier to create a list of variable names to use in the tables

  • Do this if you have a lot of variables or plan to build several different tables

Example:

 df.names <- df.gss %>%
                      # new name = old name 
       dplyr::select("Occupational prestige" = prestige,
                     "Age" = age,
                     "Personal income (thousands USD)" = idv.income,
                     "Sex" = sex,
                     "Marital status" = married,
                     "Religiosity" = relig.b,
                     "Self-reported health" = health.b,
                     "Happiness" = happy.b)

Exporting Tables

datasummary_skim(df.names,
                 output = "flextable",
                 fmt = 2,
                 title = "Descriptive Statistics for Continuous Variables",
                 notes = str_c("N = ", nrow(df.names)), # nrow() gives the sample size; count() returns a data frame
                 histogram = F
                 ) %>% 
  autofit() %>% 
  save_as_docx(path = "summary_flextable.docx")
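If you need a different file format, flextable also provides save_as_html() and save_as_pptx(), which slot into the same pipeline in place of save_as_docx().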

Regression Output

  • First decide what you want to include in your tables

  • Do you need BIC? N? Exponentiated (odds ratios) or raw logit coefficients? All the variables, or omit the controls?

Regression Output

modelsummary(list("Reduced Model" = med.logit, "Full Model" = full.logit),
             fmt = 4,
             stars = T,
             coef_map = c( 
               "prestige" = "Occupational prestige", # list key vars first
               "idv.income" = "Income (thousands USD)", # unit of measurement
               # provide reference categories for factor variables
               "health.bhealthy" = "Self-reported health (1 = healthy)",
               "relig.breligious" = "Religiosity (1 = religious)",
               "sexfemale" = "Sex (1 = female)",
               "marriedmarried" = "Marital status (1 = married)",
               "age" = "Age",
               "(Intercept)" = "(Intercept)" # shift order of intercept to last
             ),
             gof_map = c("nobs", "aic", "bic", "logLik"), # change these as necessary
             exponentiate = T, # remove exponentiation if not applicable for you
             # provide informative title and notes
             title = "Binary Logit of Reported Happiness with and without Income",
             notes = c("Standard error in parentheses",
                       "Coefficients reported as odds ratios",
                       "KHB method to decompose direct and indirect effects (mediator: personal income)"),
             output = "flextable") %>% 
  autofit()

Binary Logit of Reported Happiness with and without Income

                                      Reduced Model   Full Model
Occupational prestige                 1.0020          0.9989
                                      (0.0033)        (0.0034)
Income (thousands USD)                                1.0049**
                                                      (0.0015)
Self-reported health (1 = healthy)    2.4853***       2.3681***
                                      (0.2405)        (0.2315)
Religiosity (1 = religious)           1.3076**        1.3160**
                                      (0.1281)        (0.1292)
Marital status (1 = married)          2.7488***       2.4843***
                                      (0.2590)        (0.2447)
Age                                   1.0000          1.0004
                                      (0.0027)        (0.0027)
(Intercept)                           0.7759          0.7624
                                      (0.1957)        (0.1925)
Num.Obs.                              3204            3204
AIC                                   3175.1          3165.6
BIC                                   3217.6          3214.2
Log.Lik.                              -1580.534       -1574.798

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
Standard error in parentheses
Coefficients reported as odds ratios
KHB method to decompose direct and indirect effects (mediator: personal income)
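As with the descriptive table earlier, this flextable can be exported to Word by piping the autofit() result into save_as_docx() with a file path of your choosing.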

Bonus: Stargazer \(\LaTeX\)

stargazer::stargazer(med.logit, full.logit,
                     type = "latex",
                     title = "Binary Logit of Reported Happiness with and without Income",
                     # covariate.labels are assigned by position, so keep them in the
                     # same order as the coefficients appear in the models
                     covariate.labels = c("Occupational Prestige",
                                          "Income (thousands USD)",
                                          "Age",
                                          "Sex (1 = female)",
                                          "Marital Status (1 = married)",
                                          "Self-reported health (1 = healthy)",
                                          "Religiosity (1 = religious)",
                                          "(Intercept)"),
                     dep.var.labels = "Happiness",
                     column.labels = c("Reduced", "Full Model"),
                     out = "regressiontable.html")

% Table created by stargazer v.5.2.3 by Marek Hlavac, Social Policy Institute. E-mail: marek.hlavac at gmail.com
% Date and time: Fri, May 19, 2023 - 10:49:21 AM
\begin{table}[!htbp] \centering 
  \caption{Binary Logit of Reported Happiness with and without Income} 
  \label{} 
\begin{tabular}{@{\extracolsep{5pt}}lcc} 
\\[-1.8ex]\hline 
\hline \\[-1.8ex] 
 & \multicolumn{2}{c}{\textit{Dependent variable:}} \\ 
\cline{2-3} 
\\[-1.8ex] & \multicolumn{2}{c}{Happiness} \\ 
 & Reduced & Full Model \\ 
\\[-1.8ex] & (1) & (2)\\ 
\hline \\[-1.8ex] 
 Occupational Prestige & 0.002 & $-$0.001 \\ 
  & (0.003) & (0.003) \\ 
  & & \\ 
 Income (thousands USD) &  & 0.005$^{***}$ \\ 
  &  & (0.001) \\ 
  & & \\ 
 Age & 0.00003 & 0.0004 \\ 
  & (0.003) & (0.003) \\ 
  & & \\ 
 Sex (1 = female) & 0.067 & 0.097 \\ 
  & (0.090) & (0.091) \\ 
  & & \\ 
 Marital Status (1 = married) & 1.011$^{***}$ & 0.910$^{***}$ \\ 
  & (0.094) & (0.099) \\ 
  & & \\ 
 Self-reported health (1 = healthy) & 0.910$^{***}$ & 0.862$^{***}$ \\ 
  & (0.097) & (0.098) \\ 
  & & \\ 
 Religiosity (1 = religious) & 0.268$^{***}$ & 0.275$^{***}$ \\ 
  & (0.098) & (0.098) \\ 
  & & \\ 
 (Intercept) & $-$0.254 & $-$0.271 \\ 
  & (0.252) & (0.253) \\ 
  & & \\ 
\hline \\[-1.8ex] 
Observations & 3,204 & 3,204 \\ 
Log Likelihood & $-$1,580.534 & $-$1,574.798 \\ 
Akaike Inf. Crit. & 3,175.068 & 3,165.596 \\ 
\hline 
\hline \\[-1.8ex] 
\textit{Note:}  & \multicolumn{2}{r}{$^{*}$p$<$0.1; $^{**}$p$<$0.05; $^{***}$p$<$0.01} \\ 
\end{tabular} 
\end{table}