We have the following variables: pedu2, parental education which takes the values 0 low and 1 high
performance, performance in a test in 2007, continuos
tutor, private tutoring in 2008, 0 no 1 yes
parental education –> performance in 2007 –> Tutoring in 2008 A M Y
fit1<-lm(performance ~ pedu2, dtf)
stargazer(fit1, type="html")
| Dependent variable: | |
| performance | |
| pedu2 | 14.435*** |
| (0.203) | |
| Constant | 51.937*** |
| (0.150) | |
| Observations | 25,237 |
| R2 | 0.167 |
| Adjusted R2 | 0.167 |
| Residual Std. Error | 16.043 (df = 25235) |
| F Statistic | 5,066.230*** (df = 1; 25235) |
| Note: | p<0.1; p<0.05; p<0.01 |
fit2<-lm(performance ~ tutor, dtf)
stargazer(fit2, type="html")
| Dependent variable: | |
| performance | |
| tutor | -10.663*** |
| (0.372) | |
| Constant | 60.818*** |
| (0.114) | |
| Observations | 25,237 |
| R2 | 0.032 |
| Adjusted R2 | 0.032 |
| Residual Std. Error | 17.299 (df = 25235) |
| F Statistic | 822.945*** (df = 1; 25235) |
| Note: | p<0.1; p<0.05; p<0.01 |
Total effect of A on Y is negative BUT (!) Direct effect of A on Y is positive controlling for M
fit3<-lm(tutor~ pedu2 + performance, dtf)
stargazer(fit3, type="html")
| Dependent variable: | |
| tutor | |
| pedu2 | 0.018*** |
| (0.004) | |
| performance | -0.003*** |
| (0.0001) | |
| Constant | 0.275*** |
| (0.006) | |
| Observations | 25,237 |
| R2 | 0.032 |
| Adjusted R2 | 0.032 |
| Residual Std. Error | 0.288 (df = 25234) |
| F Statistic | 422.401*** (df = 2; 25234) |
| Note: | p<0.1; p<0.05; p<0.01 |
beta1 <- lm(performance ~ pedu2 , dtf)
summary(beta1)
##
## Call:
## lm(formula = performance ~ pedu2, data = dtf)
##
## Residuals:
## Min 1Q Median 3Q Max
## -66.372 -10.872 1.128 12.063 47.063
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 51.9369 0.1497 346.83 <2e-16 ***
## pedu2 14.4348 0.2028 71.18 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.04 on 25235 degrees of freedom
## Multiple R-squared: 0.1672, Adjusted R-squared: 0.1672
## F-statistic: 5066 on 1 and 25235 DF, p-value: < 2.2e-16
First we are going to calculate using simple OLS model and then adding a interaction term:
gamma <- lm(tutor ~ pedu2 + performance, dtf) #simple OLS
gamma.inter<- lm(tutor ~ pedu2*performance + pedu2 + performance, dtf) #OLS with interaction
Now we are going to see the same results using probit model and then the same but with an interaction term:
gamma.probit<- glm(tutor ~ pedu2 + performance, family = binomial(link="probit"), dtf) #simple Probit
gamma.probit.inter<- glm(tutor ~ pedu2*performance+ pedu2 + performance, family = binomial(link="probit"), dtf) # probit with interaction
We already set a seed and loaded “mediation” packages. Note that we are seting only 50 simulations to save time. We will therefore run 4 different mediation analysis.
First, med.out is a simple OLS model Second, med.out.inter includes a interaction term between performance and parental education Third, med.out.prob is a probit model Finally, med.out.prob.inter is a probit model that includes the above mentioned interaction term.
Remember that: ACME is the average causal mediaiton effects: indirect effect of parental education on having a private tutor that goes through our mediatior (performance)
ADE is the average direct effects: direct effect of parental education on tutoring
Total effect: direct + indirect
Prop.mediated describes the proportion of the effect of pedu on tutor that goes through the mediatior (perf).
med.out <- mediate(beta1, gamma, treat="pedu2", mediator="performance", sims=50)
summary(med.out)
##
## Causal Mediation Analysis
##
## Quasi-Bayesian Confidence Intervals
##
## Estimate 95% CI Lower 95% CI Upper p-value
## ACME -0.04594 -0.04900 -0.04 <2e-16 ***
## ADE 0.01799 0.00965 0.03 <2e-16 ***
## Total Effect -0.02795 -0.03606 -0.02 <2e-16 ***
## Prop. Mediated 1.66664 1.28160 2.33 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sample Size Used: 25237
##
##
## Simulations: 50
med.out.inter <- mediate(beta1, gamma.inter, treat="pedu2", mediator="performance", sims=5)
summary(med.out.inter)
##
## Causal Mediation Analysis
##
## Quasi-Bayesian Confidence Intervals
##
## Estimate 95% CI Lower 95% CI Upper p-value
## ACME (control) -0.03971 -0.04193 -0.04 <2e-16 ***
## ACME (treated) -0.05167 -0.05662 -0.05 <2e-16 ***
## ADE (control) 0.02481 0.01874 0.03 <2e-16 ***
## ADE (treated) 0.01285 0.00966 0.02 <2e-16 ***
## Total Effect -0.02686 -0.03213 -0.02 <2e-16 ***
## Prop. Mediated (control) 1.40412 1.30565 1.86 <2e-16 ***
## Prop. Mediated (treated) 1.99894 1.59318 2.26 <2e-16 ***
## ACME (average) -0.04569 -0.04881 -0.04 <2e-16 ***
## ADE (average) 0.01883 0.01420 0.02 <2e-16 ***
## Prop. Mediated (average) 1.70153 1.44941 2.06 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sample Size Used: 25237
##
##
## Simulations: 5
med.out.prob<- mediate(beta1, gamma.probit, treat="pedu2", mediator="performance", sims=5)
summary(med.out.prob)
##
## Causal Mediation Analysis
##
## Quasi-Bayesian Confidence Intervals
##
## Estimate 95% CI Lower 95% CI Upper p-value
## ACME (control) -0.0421 -0.0433 -0.04 <2e-16 ***
## ACME (treated) -0.0485 -0.0492 -0.05 <2e-16 ***
## ADE (control) 0.0220 0.0187 0.03 <2e-16 ***
## ADE (treated) 0.0155 0.0132 0.02 <2e-16 ***
## Total Effect -0.0265 -0.0296 -0.02 <2e-16 ***
## Prop. Mediated (control) 1.5085 1.4579 2.10 <2e-16 ***
## Prop. Mediated (treated) 1.7217 1.6548 2.54 <2e-16 ***
## ACME (average) -0.0453 -0.0463 -0.04 <2e-16 ***
## ADE (average) 0.0187 0.0160 0.03 <2e-16 ***
## Prop. Mediated (average) 1.6151 1.5564 2.32 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sample Size Used: 25237
##
##
## Simulations: 5
med.out.prob.inter <- mediate(beta1, gamma.probit.inter, treat="pedu2", mediator="performance", sims=5)
summary(med.out.prob.inter)
##
## Causal Mediation Analysis
##
## Quasi-Bayesian Confidence Intervals
##
## Estimate 95% CI Lower 95% CI Upper p-value
## ACME (control) -0.03406 -0.03656 -0.03 <2e-16 ***
## ACME (treated) -0.05701 -0.06089 -0.05 <2e-16 ***
## ADE (control) 0.03336 0.02684 0.04 <2e-16 ***
## ADE (treated) 0.01040 0.00801 0.01 <2e-16 ***
## Total Effect -0.02365 -0.02724 -0.02 <2e-16 ***
## Prop. Mediated (control) 1.43915 1.29441 1.69 <2e-16 ***
## Prop. Mediated (treated) 2.39389 1.98634 3.13 <2e-16 ***
## ACME (average) -0.04553 -0.04762 -0.04 <2e-16 ***
## ADE (average) 0.02188 0.01743 0.03 <2e-16 ***
## Prop. Mediated (average) 1.91652 1.64038 2.41 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sample Size Used: 25237
##
##
## Simulations: 5
First we need to pass mediate output through medsens function, we will do only for the OLS models, the probit ones take too long.
sens.cont <- medsens(med.out,rho.by=.1, eps=.01, effect.type="both")
summary(sens.cont)
##
## Mediation Sensitivity Analysis for Average Causal Mediation Effect
##
## Rho at which ACME = 0: -0.2
## R^2_M*R^2_Y* at which ACME = 0: 0.04
## R^2_M~R^2_Y~ at which ACME = 0: 0.0322
##
##
## Mediation Sensitivity Analysis for Average Direct Effect
##
## Sensitivity Region
##
## Rho ADE 95% CI Lower 95% CI Upper R^2_M*R^2_Y* R^2_M~R^2_Y~
## [1,] -0.1 -0.0076 -0.0154 3e-04 0.01 0.0081
##
## Rho at which ADE = 0: -0.1
## R^2_M*R^2_Y* at which ADE = 0: 0.01
## R^2_M~R^2_Y~ at which ADE = 0: 0.0081
Plot true ACMEs and ADEs as functions of rho and Plot true ACMEs and ADEs as functions of “R square tildes”
par.orig <- par(mfrow = c(2,2))
plot1<-plot(sens.cont)
# Plot true ACMEs and ADEs as functions of "R square tildes"
plot1<- plot1 + plot(sens.cont, sens.par="R2", r.type="total", sign.prod="positive")
par(par.orig)
sens.cont <- medsens(med.out.inter,rho.by=.1, eps=.01, effect.type="both")
summary(sens.cont)
##
## Mediation Sensitivity Analysis: Average Mediation Effect
##
## Rho at which ACME for Control Group = 0: -0.2
## R^2_M*R^2_Y* at which ACME for Control Group = 0: 0.04
## R^2_M~R^2_Y~ at which ACME for Control Group = 0: 0.0322
##
##
## Sensitivity Region: ACME for Treatment Group
##
## Rho ACME(treated) 95% CI Lower 95% CI Upper R^2_M*R^2_Y*
## [1,] -0.2 0.0013 -0.0032 0.0058 0.04
## R^2_M~R^2_Y~
## [1,] 0.0322
##
## Rho at which ACME for Treatment Group = 0: -0.2
## R^2_M*R^2_Y* at which ACME for Treatment Group = 0: 0.04
## R^2_M~R^2_Y~ at which ACME for Treatment Group = 0: 0.0322
##
##
## Mediation Sensitivity Analysis: Average Direct Effect
##
## Sensitivity Region: ADE for Control Group
##
## Rho ADE(control) 95% CI Lower 95% CI Upper R^2_M*R^2_Y* R^2_M~R^2_Y~
## [1,] -0.1 -0.0018 -0.0102 0.0067 0.01 0.0081
##
## Rho at which ADE for Control Group = 0: -0.1
## R^2_M*R^2_Y* at which ADE for Control Group = 0: 0.01
## R^2_M~R^2_Y~ at which ADE for Control Group = 0: 0.0081
##
##
## Rho at which ADE for Treatment Group = 0: 0
## R^2_M*R^2_Y* at which ADE for Treatment Group = 0: 0
## R^2_M~R^2_Y~ at which ADE for Treatment Group = 0: 0
# PLOT 2 & 3: ACME AND ADE FOR MED.OUT.INTER
# Again, pass mediate output through medsens function
sens.cont.inter <- medsens(med.out.inter,rho.by=.1, eps=.01, effect.type="both")
summary(sens.cont.inter)
##
## Mediation Sensitivity Analysis: Average Mediation Effect
##
## Rho at which ACME for Control Group = 0: -0.2
## R^2_M*R^2_Y* at which ACME for Control Group = 0: 0.04
## R^2_M~R^2_Y~ at which ACME for Control Group = 0: 0.0322
##
##
## Sensitivity Region: ACME for Treatment Group
##
## Rho ACME(treated) 95% CI Lower 95% CI Upper R^2_M*R^2_Y*
## [1,] -0.2 0.0013 -0.0032 0.0058 0.04
## R^2_M~R^2_Y~
## [1,] 0.0322
##
## Rho at which ACME for Treatment Group = 0: -0.2
## R^2_M*R^2_Y* at which ACME for Treatment Group = 0: 0.04
## R^2_M~R^2_Y~ at which ACME for Treatment Group = 0: 0.0322
##
##
## Mediation Sensitivity Analysis: Average Direct Effect
##
## Sensitivity Region: ADE for Control Group
##
## Rho ADE(control) 95% CI Lower 95% CI Upper R^2_M*R^2_Y* R^2_M~R^2_Y~
## [1,] -0.1 -0.0018 -0.0102 0.0067 0.01 0.0081
##
## Rho at which ADE for Control Group = 0: -0.1
## R^2_M*R^2_Y* at which ADE for Control Group = 0: 0.01
## R^2_M~R^2_Y~ at which ADE for Control Group = 0: 0.0081
##
##
## Rho at which ADE for Treatment Group = 0: 0
## R^2_M*R^2_Y* at which ADE for Treatment Group = 0: 0
## R^2_M~R^2_Y~ at which ADE for Treatment Group = 0: 0
# Plot true ACMEs and ADEs as functions of rho
par.orig <- par(mfrow = c(2,2))
plot2<-plot(sens.cont.inter)
# Plot true ACMEs and ADEs as functions of "R square tildes"
plot3<-plot(sens.cont.inter, sens.par="R2", r.type="total", sign.prod="positive")
We have U (dislexia) that affects M and Y
M is a collider; controlling for M creates a negative association between U and A that is likely to bias the direcrt effect of A controlling for M
M= alpha1 + beta1A + e1 where e1=delta1U
Y= alpha2 + beta2A + gammaM + e2 where e2=delta2U
Note that with dislexia the rho=corr(e1,e2) is likely to be negative (even if the same variable U is not observed) Following the results from the sensitivity analysis one could argue that we are overestimating.