set.seed(312)
n <- 50
y_j0t0 <- scale(rnorm(n))*10+43
y_j1t0 <- scale(rnorm(n))*10+49
y_j0t1 <- scale(rnorm(n, mean = .5*scale(y_j0t0)))*10+49
y_j1t1 <- scale(rnorm(n, mean = .5*scale(y_j1t0)))*10+59
Two Groups, Two Times, Two Methods?
Options and relations in the analysis of (quasi-)experimental data with pre- and post-measurements
The scenario
Researchers have \(i = \{1,2,\dots,n\}\) units which are members of two groups of interest, \(j =\{0,1\}\), each with measures of \(Y\) at two time points \(t=\{0,1\}\).
- \(Y\) is some continuous variable
- \(j\) is an indicator of a treatment that is either randomly assigned or assigned through a well-matched sample meeting the usual criteria found in Stuart (2010)
- \(t\) is a time indicator, with 0 indicating pretest and 1 indicating post-test
The analysis without covariates
- RQ1: what is the difference between groups on the post-test? The data could be arranged with one row per unit, a column for \(Y_{ij1}\), and a column for \(I(j=1)\) (see the sketch below); the model is then
\(Y_{ij1}=\alpha_0 + \alpha_1 I(j = 1) + e_{ij1}\)
- RQ2: what is the difference between groups in gains between pre- and post-test? The data could be arranged with one row per unit, a column for the difference between \(Y_{ij1}\) and \(Y_{ij0}\), and a column for \(I(j=1)\); the model is then
\(Y_{ij1}-Y_{ij0}=\beta_0 + \beta_1 I(j = 1) + r_{ij}\)
\[ \alpha_1 \ne \beta_1 \]
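A minimal sketch of the two data arrangements and fits described above, using the group vectors created in the setup chunk (the full analyses appear in later slides):
# One row per unit: post-test score, gain score, and treatment indicator
dat <- data.frame(
  y1 = c(y_j0t1, y_j1t1),                    # post-test
  d  = c(y_j0t1 - y_j0t0, y_j1t1 - y_j1t0),  # gain (post minus pre)
  j  = rep(c(0, 1), each = n)
)
coef(lm(y1 ~ j, data = dat))["j"]  # alpha_1: post-test difference
coef(lm(d ~ j, data = dat))["j"]   # beta_1: difference in gains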
The analysis with covariates
- RQ1: what is the difference between groups on the post-test, holding the pretest constant? The data could be arranged with one row per unit, a column for \(Y_{ij0}\), another for \(Y_{ij1}\), and a column for \(I(j=1)\); the model is then
\(Y_{ij1}=\gamma_0 + \gamma_1 I(j = 1) + \gamma_2 Y_{ij0} + u_{ij1}\)
- RQ2: what is the difference between groups in gains between pre- and post-test, holding the pretest constant? The data could be arranged with one row per unit, a column for the difference between \(Y_{ij1}\) and \(Y_{ij0}\), a column for \(Y_{ij0}\), and a column for \(I(j=1)\); the model is then
\(Y_{ij1}-Y_{ij0}=\lambda_0 + \lambda_1 I(j = 1) +\lambda_2 Y_{ij0} + q_{ij}\)
\[ \gamma_1 = \lambda_1 \]
The problem
Lord (1967) noted an analytic paradox when two groups were measured at two time points: the difference in conditional means at one time point is not the same as the difference in average unit change.
A vague research question such as “is there a difference?” can have different answers (neither wrong) depending on the analysis, because each analysis answers a different, specific research question.
The point
The point of this talk is to explore two assertions:
- The analysis of post-test scores and the analysis of gains answer different research questions
- In either analysis, holding the pre-test constant in a regression model produces conditional post-test differences
Data generation
Suppose two groups of 50 units each, with scores drawn from normal distributions with an overall mean of 50 (means vary by group and time) and a standard deviation of 10.
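The scores are generated in the setup chunk at the top of the document; wrapping rnorm() in scale() before rescaling forces each group-by-time vector to have exactly the stated mean and standard deviation. A quick check:
# By construction, each vector has exactly the intended mean and SD
round(c(mean = mean(y_j0t0), sd = sd(y_j0t0)), 4)
round(c(mean = mean(y_j1t1), sd = sd(y_j1t1)), 4)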
Summary statistics
the means
knitr::kable(
rbind(
j0 = data.frame(t0 = mean(y_j0t0), t1 = mean(y_j0t1), total=mean(c(y_j0t0,y_j0t1))),
j1 = data.frame(t0 = mean(y_j1t0), t1 = mean(y_j1t1), total=mean(c(y_j1t0,y_j1t1))),
total = data.frame(t0 = mean(c(y_j1t0,y_j0t0)), t1 = mean(c(y_j1t1,y_j0t1)),total = mean(c(y_j1t0,y_j0t0,y_j1t1,y_j0t1))),
make.row.names =TRUE
), digits = 4)
|    | t0 | t1 | total |
|---|---|---|---|
| j0 | 43 | 49 | 46 |
| j1 | 49 | 59 | 54 |
| total | 46 | 54 | 50 |
the standard deviations
knitr::kable(
rbind(
j0 = data.frame(t0 = sd(y_j0t0), t1 = sd(y_j0t1), total=sd(c(y_j0t0,y_j0t1))),
j1 = data.frame(t0 = sd(y_j1t0), t1 = sd(y_j1t1), total=sd(c(y_j1t0,y_j1t1))),
total = data.frame(t0 = sd(c(y_j1t0,y_j0t0)), t1 = sd(c(y_j1t1,y_j0t1)),total = sd(c(y_j1t0,y_j0t0,y_j1t1,y_j0t1))),
make.row.names =TRUE
), digits = 4
)
|    | t0 | t1 | total |
|---|---|---|---|
| j0 | 10.0000 | 10.0000 | 10.3962 |
| j1 | 10.0000 | 10.0000 | 11.1464 |
| total | 10.3962 | 11.1464 | 11.4742 |
Scatter plot
# Plot pre-test (x, t=0) against post-test (y, t=1) for each group on the same axes
plot(y_j0t0, y_j0t1, type = "p", col = "blue", ylim = c(0,100), xlim = c(0,100), pch = 1, xlab = "t=0", ylab = "t=1")
par(new=TRUE)
plot(y_j1t0, y_j1t1, type = "p", col = "red", ylim = c(0,100), xlim = c(0,100), pch = 2, xlab = "t=0", ylab = "t=1")
legend(0,90, pch = c(1,2), col = c("blue","red"), legend = c("j=0","j=1"))
title("Scatterplot of pre- and \npost-scores by group")
Data engineering and independent t-test
y_t0 <- c(y_j0t0,y_j1t0)
y_t1 <- c(y_j0t1,y_j1t1)
j <- c(rep(0,length(y_j0t1)),rep(1,length(y_j1t1)))
t.test(y_t1~j, var.equal = TRUE)
Two Sample t-test
data: y_t1 by j
t = -5, df = 98, p-value = 2.514e-06
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
-13.968935 -6.031065
sample estimates:
mean in group 0 mean in group 1
49 59
Independent t-test via regression
indmod <- lm(y_t1~j)
summary(indmod)
Call:
lm(formula = y_t1 ~ j)
Residuals:
Min 1Q Median 3Q Max
-25.8984 -5.6354 -0.1711 6.5160 27.2431
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 49.000 1.414 34.65 < 2e-16 ***
j 10.000 2.000 5.00 2.51e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 10 on 98 degrees of freedom
Multiple R-squared: 0.2033, Adjusted R-squared: 0.1951
F-statistic: 25 on 1 and 98 DF, p-value: 2.514e-06
Independent t-test via anova
summary(aov(y_t1~j))
Df Sum Sq Mean Sq F value Pr(>F)
j 1 2500 2500 25 2.51e-06 ***
Residuals 98 9800 100
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note the same p-value. This is because, with 1 numerator degree of freedom (i.e., two groups), the \(F\) statistic equals the square of the \(t\) statistic, and the degrees of freedom for the \(t\) test are the same as the \(F\) test's denominator degrees of freedom.
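A quick check of that relationship (a minimal sketch; indmod is the regression fit from the previous slide):
# The squared regression t statistic for j equals the ANOVA F statistic (both 25 here)
t_j <- summary(indmod)$coefficients["j", "t value"]
F_j <- summary(aov(y_t1 ~ j))[[1]][1, "F value"]
c(t_squared = t_j^2, F_value = F_j)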
Controlling for a covariate in the post-test difference
Of course, we can hold the pretest constant.
ancovamod <- lm(y_t1~j+y_t0)
summary(ancovamod)
Call:
lm(formula = y_t1 ~ j + y_t0)
Residuals:
Min 1Q Median 3Q Max
-23.3921 -5.4853 0.6753 5.6545 24.2040
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 30.07996 4.12321 7.295 8.17e-11 ***
j 7.35999 1.88630 3.902 0.000176 ***
y_t0 0.44000 0.09118 4.826 5.19e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9.026 on 97 degrees of freedom
Multiple R-squared: 0.3575, Adjusted R-squared: 0.3443
F-statistic: 26.99 on 2 and 97 DF, p-value: 4.806e-10
ANCOVA effect
Why did the treatment effect change? Porter and Raudenbush (1987) note that the conditional impact of a treatment is equal to
\[ (\mu_{Y|j=1,t=1} -\mu_{Y|j=0,t=1})-\phi(\mu_{Y|j=1,t=0} -\mu_{Y|j=0,t=0}) \]
where \(\phi\) is the conditional association between the outcome and the covariate.
The unadjusted difference in the post-test is 10, and the difference in the pretest is 6. The conditional association between pre and post is \(\phi=\) 0.440001.
The adjusted difference is therefore \(10 - 0.44 \times 6 \approx 7.36\), matching the coefficient on \(j\) in the ANCOVA model.
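A minimal sketch of that computation with the simulated data, using ancovamod from above:
# Porter & Raudenbush adjustment: post-test gap minus phi times the pre-test gap
post_gap <- mean(y_j1t1) - mean(y_j0t1)  # 10
pre_gap  <- mean(y_j1t0) - mean(y_j0t0)  # 6
phi      <- coef(ancovamod)["y_t0"]      # 0.44
unname(post_gap - phi * pre_gap)         # 7.36, the coefficient on j above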
Data engineering for gains analysis and test of difference
d_t1 <- c(y_j0t1-y_j0t0, y_j1t1-y_j1t0)
t.test(d_t1 ~ j, var.equal = TRUE)
Two Sample t-test
data: d_t1 by j
t = -1.8898, df = 98, p-value = 0.06174
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
-8.2003221 0.2003221
sample estimates:
mean in group 0 mean in group 1
6 10
Difference in gains analysis via regression (difference in difference)
summary(lm(d_t1 ~ j))
Call:
lm(formula = d_t1 ~ j)
Residuals:
Min 1Q Median 3Q Max
-25.6537 -7.7752 0.1601 7.3185 27.4555
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.000 1.497 4.009 0.000119 ***
j 4.000 2.117 1.890 0.061737 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 10.58 on 98 degrees of freedom
Multiple R-squared: 0.03516, Adjusted R-squared: 0.02532
F-statistic: 3.571 on 1 and 98 DF, p-value: 0.06174
Difference in gains analysis via ANOVA
summary(aov(d_t1 ~ j))
Df Sum Sq Mean Sq F value Pr(>F)
j 1 400 400 3.571 0.0617 .
Residuals 98 10976 112
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Controlling for a covariate: difference in gains via regression
summary(lm(d_t1 ~ j + y_t0))
Call:
lm(formula = d_t1 ~ j + y_t0)
Residuals:
Min 1Q Median 3Q Max
-23.3921 -5.4853 0.6753 5.6545 24.2040
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 30.07996 4.12321 7.295 8.17e-11 ***
j 7.35999 1.88630 3.902 0.000176 ***
y_t0 -0.56000 0.09118 -6.142 1.79e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9.026 on 97 degrees of freedom
Multiple R-squared: 0.3053, Adjusted R-squared: 0.291
F-statistic: 21.32 on 2 and 97 DF, p-value: 2.122e-08
ANCOVA influence
What gives?
Why is the treatment effect the same in
\(Y_{ij1}=\gamma_0 + \gamma_1 I(j = 1) + \gamma_2 Y_{ij0} + u_{ij1}\)
and
\(Y_{ij1}-Y_{ij0}=\lambda_0 + \lambda_1 I(j = 1) +\lambda_2 Y_{ij0} + q_{ij}\)?
It is due to the double use of \(Y_{ij0}\): it appears on both sides of the model.
Diff-in-diff as post-test with pre-test offset
Algebraically, the model
\(Y_{ij1}-Y_{ij0}=\beta_0 + \beta_1 I(j = 1) + r_{ij}\)
is equivalent to
\(Y_{ij1}=\beta_0 + \beta_1 I(j = 1) +Y_{ij0}+ r_{ij}\)
where the pretest is a covariate with its slope fixed at 1.
Diff-in-diff as post-test with pre-test offset
summary(lm(d_t1 ~ j))$coef
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6 1.496662 4.008922 0.0001190755
j 4 2.116599 1.889824 0.0617369041
summary(lm(y_t1 ~ j, offset = y_t0))$coef
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6 1.496662 4.008922 0.0001190755
j 4 2.116599 1.889824 0.0617369041
Covariate slope in Diff-in-diff with pre-test
So, with a covariate, you are running
\(Y_{ij1}=\lambda_0 + \lambda_1 I(j = 1) +\lambda_2 Y_{ij0} +Y_{ij0}+ q_{ij}\)
\(Y_{ij1}=\lambda_0 + \lambda_1 I(j = 1) +(\lambda_2+1) Y_{ij0} + q_{ij}\)
So, while \(\gamma_1 = \lambda_1\), \(\gamma_2 = \lambda_2+1\), or equivalently \(\gamma_2-1 = \lambda_2\).
Diff-in-diff as post-test with pre-test offset
summary(lm(y_t1 ~ j+y_t0))$coef #gamma 2 - 1 ...
Estimate Std. Error t value Pr(>|t|)
(Intercept) 30.079956 4.12321335 7.295270 8.167547e-11
j 7.359994 1.88630118 3.901813 1.761743e-04
y_t0 0.440001 0.09117781 4.825747 5.187617e-06
summary(lm(d_t1 ~ j+y_t0))$coef #...is lambda 2
Estimate Std. Error t value Pr(>|t|)
(Intercept) 30.079956 4.12321335 7.295270 8.167547e-11
j 7.359994 1.88630118 3.901813 1.761743e-04
y_t0 -0.559999 0.09117781 -6.141834 1.794142e-08
Diff-in-diff and ANCOVA
Put another way, difference in difference is defined as
\[ \mu_{D|j=1}-\mu_{D|j=0}=(\mu_{Y|j=1,t=1} -\mu_{Y|j=1,t=0})-(\mu_{Y|j=0,t=1} -\mu_{Y|j=0,t=0}) \]
where \(D_{ij} = Y_{ij,t=1}-Y_{ij,t=0}\). Rearranging, this can also be expressed as
\[ (\mu_{Y|j=1,t=1} -\mu_{Y|j=0,t=1})-(\mu_{Y|j=1,t=0} -\mu_{Y|j=0,t=0}) \]
cf. the ANCOVA adjustment
\[ (\mu_{Y|j=1,t=1} -\mu_{Y|j=0,t=1})-\phi(\mu_{Y|j=1,t=0} -\mu_{Y|j=0,t=0}) \]
DiD with a covariate controls for the association between the pre-test and gains, \(\zeta\) (verified in the sketch below):
\[ (\mu_{D|j=1}-\mu_{D|j=0})-\zeta(\mu_{Y|j=1,t=0} -\mu_{Y|j=0,t=0}) \]
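A minimal sketch verifying both expressions with the simulated data, using the vectors and gains model defined above:
# Unadjusted DiD equals the rearranged difference of post- and pre-test gaps
did  <- mean(y_j1t1 - y_j1t0) - mean(y_j0t1 - y_j0t0)                  # 10 - 6 = 4
gaps <- (mean(y_j1t1) - mean(y_j0t1)) - (mean(y_j1t0) - mean(y_j0t0))  # same value
c(did = did, rearranged = gaps)
# zeta is the pre-test slope in the gains model (-0.56); adjusting recovers 7.36
zeta <- coef(lm(d_t1 ~ j + y_t0))["y_t0"]
unname(did - zeta * (mean(y_j1t0) - mean(y_j0t0)))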
Diff-in-diff holding starting point constant
Yet another way to think about it: when we control for the pretest and the dependent variable is the difference between post and pre, we “hold constant” the pre-test value, \(Y^{*}_{t=0}\), so the gains are measured from the same starting point.
Hence our conditional diff-in-diff is the difference in gains when all cases have the same pre-test, \(Y^{*}_{t=0}\):
\[ (\hat{\mu}_{Y|j=1,t=1} -Y^{*}_{t=0})-(\hat{\mu}_{Y|j=0,t=1} -Y^{*}_{t=0}) \] or rearranging,
\[ \hat{\mu}_{Y|j=1,t=1} -Y^{*}_{t=0} -\hat{\mu}_{Y|j=0,t=1} + Y^{*}_{t=0} \] the \(Y^{*}_{t=0}\) values, one negative and one positive, cancel to give the post-test difference \[ \hat{\mu}_{Y|j=1,t=1} -\hat{\mu}_{Y|j=0,t=1} \]
Note that \(\hat{\mu}\) is an adjusted mean, not the observed mean.
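A minimal sketch of this adjustment with predict(), holding the pre-test at its grand mean (46) for both groups and using ancovamod from above:
# Predicted post-test means at a common pre-test value; their difference is 7.36
nd <- data.frame(j = c(0, 1), y_t0 = mean(y_t0))
preds <- predict(ancovamod, newdata = nd)  # about 50.32 and 57.68
unname(diff(preds))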
Diff-in-diff holding starting point constant: geometry
The following charts show the differences in gains, then the differences from a common starting point, and finally adjust the predictions to show our ANCOVA effect.
Difference in gains with different starting points (I)
plot(1, type = "n", xlab = "",
ylab = "", xlim = c(-.25, 1.25), xaxt='n',
ylim = c(40, 60))
segments(0,43,1,49)
segments(0,49,1,59)
Difference in gains with different starting points (II)
plot(1, type = "n", xlab = "",
ylab = "", xlim = c(-.25, 1.25), xaxt='n',
ylim = c(40, 60))
segments(0,43,1,49)
segments(0,43,1,43, lty = 2)
arrows(1,43,1,49, lty = 2)
text(1.1, 46, labels="6")
segments(0,49,1,59)
segments(0,49,1,49, lty = 2)
arrows(1,49,1,59, lty = 2)
text(1.1, 54, labels="10")
text(.5,40, "10-6=4")Difference in gains with the same starting point (I)
plot(1, type = "n", xlab = "",
ylab = "", xlim = c(-.25, 1.25), xaxt='n',
ylim = c(40, 60))
segments(0,46,1,49)
segments(0,46,1,59)
Difference in gains with the same starting point (II)
plot(1, type = "n", xlab = "",
ylab = "", xlim = c(-.25, 1.25), xaxt='n',
ylim = c(40, 60))
segments(0,46,1,49)
segments(0,46,1,46, lty = 2)
arrows(1,46,1,49, lty = 2)
text(1.1, 47, labels="3")
arrows(1,46,1,59, lty = 2)
text(1.1, 57, labels="13")
segments(0,46,1,59)
text(.5,42, "Post-hoc adjustment")
text(.5,44, "13-3=10")
text(.5,40, "10-.44*6=7.36")Difference in gains with the same starting point (III)
plot(1, type = "n", xlab = "",
ylab = "", xlim = c(-.25, 1.25), xaxt='n',
ylim = c(40, 60))
segments(0,46,1,50.32)
segments(0,46,1,46, lty = 2)
arrows(1,46,1,50.32, lty = 2)
text(1.1, 47, labels="4.32")
arrows(1,46,1,57.68, lty = 2)
text(1.1, 53, labels="11.68")
segments(0,46,1,57.68)
text(.5,42, "Adjust predictions by 1.32, each")
text(.5,44, "(.44*6)/2=1.32")
text(.5,40, "11.68-4.32=7.36")Discussion points
- Care must be taken to be very specific about the research question
- Whether you randomize or match units into groups matters for internal validity, so covariate balance is important
- See page 11 in Katz et al. (2022)
- If not randomized, or if the RCT is broken, QED methods such as matching are often not helpful (see, e.g., Daw and Hatfield (2018)). If only two time points are available, ANCOVA assumptions are testable; DiD assumptions are not.
- These properties are not preserved for generalized outcomes (e.g., logistic regression) and some multilevel models (two levels of means, multiple sources of variance)