Two Groups, Two Times, Two Methods?

Options and relations in the analysis of (quasi-)experimental data with pre- and post-test measurements

Author

E. C. Hedberg, PhD

The scenario

Researchers have \(i = \{1,2,\dots,n\}\) units which are members of two groups of interest, \(j =\{0,1\}\), each with measures of \(Y\) at two time points \(t=\{0,1\}\).

  • \(Y\) is some continuous variable

  • \(j\) is an indicator of a treatment that is either randomly assigned or assigned through a well-matched sample meeting the usual criteria found in Stuart (2010).

  • \(t\) is a time indicator with 0 indicating pretest and 1 indicating post-test

The analysis without covariates

  • RQ1: What is the difference between groups on the post-test? The data could be arranged with one row per unit, a column for \(Y_{ij1}\), and a column for \(I(j=1)\); the model is then

\(Y_{ij1}=\alpha_0 + \alpha_1 I(j = 1) + e_{ij1}\)

  • RQ2: What is the difference between groups on gains between pre- and post-test? The data could be arranged with one row per unit, a column for the difference between \(Y_{ij1}\) and \(Y_{ij0}\), and a column for \(I(j=1)\); the model is then

\(Y_{ij1}-Y_{ij0}=\beta_0 + \beta_1 I(j = 1) + r_{ij}\)

In general, these two estimates differ: \[ \alpha_1 \ne \beta_1 \]

The analysis with covariates

  • RQ1: What is the difference between groups on the post-test, holding the pretest constant? The data could be arranged with one row per unit, a column for \(Y_{ij0}\), another for \(Y_{ij1}\), and a column for \(I(j=1)\); the model is then

\(Y_{ij1}=\gamma_0 + \gamma_1 I(j = 1) + \gamma_2 Y_{ij0} + u_{ij1}\)

  • RQ2: What is the difference between groups on gains between pre- and post-test, holding the pretest constant? The data could be arranged with one row per unit, a column for the difference between \(Y_{ij1}\) and \(Y_{ij0}\), a column for \(Y_{ij0}\), and a column for \(I(j=1)\); the model is then

\(Y_{ij1}-Y_{ij0}=\lambda_0 + \lambda_1 I(j = 1) +\lambda_2 Y_{ij0} + q_{ij}\)

Here, however, the two estimates agree: \[ \gamma_1 = \lambda_1 \]

The problem

Lord (1967) noted a paradox in the analysis of two measured groups: the difference in conditional means at one time point is not the same as the difference in average unit change.

A vague research question of “is there a difference?” can thus have different answers (neither wrong) depending on the analysis, because each analysis answers a specific research question.

The point

The point of this talk is to explore two assertions

  • The analysis of post-test scores and the analysis of gains answer different research questions
  • In either analysis, holding the pre-test constant in a regression model will produce conditional post-test differences

Data generation

Suppose two groups of 50 units each, with scores drawn from a normal distribution with an overall mean of 50 (but means vary by group and time) and standard deviation of 10.

set.seed(312)
n <- 50
# scale() standardizes each draw, so every group hits its target mean and SD exactly
y_j0t0 <- scale(rnorm(n))*10+43  # j=0 pretest: mean 43, SD 10
y_j1t0 <- scale(rnorm(n))*10+49  # j=1 pretest: mean 49, SD 10
# post-test draws are correlated with the pretest through the mean argument
y_j0t1 <- scale(rnorm(n, mean = .5*scale(y_j0t0)))*10+49  # j=0 post-test: mean 49, SD 10
y_j1t1 <- scale(rnorm(n, mean = .5*scale(y_j1t0)))*10+59  # j=1 post-test: mean 59, SD 10

Summary statistics

the means

knitr::kable(
rbind(
  j0 = data.frame(t0 = mean(y_j0t0), t1 = mean(y_j0t1), total=mean(c(y_j0t0,y_j0t1))),
  j1 = data.frame(t0 = mean(y_j1t0), t1 = mean(y_j1t1), total=mean(c(y_j1t0,y_j1t1))),
  total = data.frame(t0 = mean(c(y_j1t0,y_j0t0)), t1 = mean(c(y_j1t1,y_j0t1)),total = mean(c(y_j1t0,y_j0t0,y_j1t1,y_j0t1))),
  make.row.names =TRUE
), digits = 4)
        t0   t1   total
j0      43   49   46
j1      49   59   54
total   46   54   50

the standard deviations

knitr::kable(
  rbind(
  j0 = data.frame(t0 = sd(y_j0t0), t1 = sd(y_j0t1), total=sd(c(y_j0t0,y_j0t1))),
  j1 = data.frame(t0 = sd(y_j1t0), t1 = sd(y_j1t1), total=sd(c(y_j1t0,y_j1t1))),
  total = data.frame(t0 = sd(c(y_j1t0,y_j0t0)), t1 = sd(c(y_j1t1,y_j0t1)),total = sd(c(y_j1t0,y_j0t0,y_j1t1,y_j0t1))),
  make.row.names =TRUE
), digits = 4
)
        t0        t1        total
j0      10.0000   10.0000   10.3962
j1      10.0000   10.0000   11.1464
total   10.3962   11.1464   11.4742

Scatter plot

# x-axis: pretest (t=0); y-axis: post-test (t=1)
plot(y_j0t0, y_j0t1, type = "p", col = "blue", xlim = c(0,100), ylim = c(0,100),
     pch = 1, xlab = "t=0", ylab = "t=1")
points(y_j1t0, y_j1t1, col = "red", pch = 2)
legend(0, 90, pch = c(1,2), col = c("blue","red"), legend = c("j=0","j=1"))
title("Scatterplot of pre- and \npost-scores by group")

Data engineering and independent t-test

y_t0 <- c(y_j0t0,y_j1t0)
y_t1 <- c(y_j0t1,y_j1t1)
j <- c(rep(0,length(y_j0t1)),rep(1,length(y_j1t1)))
t.test(y_t1~j, var.equal = TRUE)

    Two Sample t-test

data:  y_t1 by j
t = -5, df = 98, p-value = 2.514e-06
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 -13.968935  -6.031065
sample estimates:
mean in group 0 mean in group 1 
             49              59 

Independent t-test via regression

indmod <- lm(y_t1~j)
summary(indmod)

Call:
lm(formula = y_t1 ~ j)

Residuals:
     Min       1Q   Median       3Q      Max 
-25.8984  -5.6354  -0.1711   6.5160  27.2431 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   49.000      1.414   34.65  < 2e-16 ***
j             10.000      2.000    5.00 2.51e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 10 on 98 degrees of freedom
Multiple R-squared:  0.2033,    Adjusted R-squared:  0.1951 
F-statistic:    25 on 1 and 98 DF,  p-value: 2.514e-06

Independent t-test via ANOVA

summary(aov(y_t1~j))
            Df Sum Sq Mean Sq F value   Pr(>F)    
j            1   2500    2500      25 2.51e-06 ***
Residuals   98   9800     100                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note the same p-value. This is because, with 1 degree of freedom in the numerator (i.e., two groups), the \(F\) statistic equals the square of the \(t\) statistic; the denominator degrees of freedom of the \(F\) test match the degrees of freedom of the \(t\) test, as the quick check below shows.
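A minimal sketch of that check, reusing y_t1 and j from the chunks above (anova() on the lm fit is just one way to pull out the \(F\) statistic):

tt <- t.test(y_t1 ~ j, var.equal = TRUE)
unname(tt$statistic)^2               # (-5)^2 = 25
anova(lm(y_t1 ~ j))["j", "F value"]  # 25, matching the ANOVA table above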

Controlling for a covariate in the post-test difference

Of course, we can hold the pretest constant.

ancovamod <- lm(y_t1~j+y_t0)
summary(ancovamod)

Call:
lm(formula = y_t1 ~ j + y_t0)

Residuals:
     Min       1Q   Median       3Q      Max 
-23.3921  -5.4853   0.6753   5.6545  24.2040 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 30.07996    4.12321   7.295 8.17e-11 ***
j            7.35999    1.88630   3.902 0.000176 ***
y_t0         0.44000    0.09118   4.826 5.19e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.026 on 97 degrees of freedom
Multiple R-squared:  0.3575,    Adjusted R-squared:  0.3443 
F-statistic: 26.99 on 2 and 97 DF,  p-value: 4.806e-10

ANCOVA effect

Why did the treatment effect change? Porter and Raudenbush (1987) note that the conditional impact of a treatment is equal to

\[ (\mu_{Y|j=1,t=1} -\mu_{Y|j=0,t=1})-\phi(\mu_{Y|j=1,t=0} -\mu_{Y|j=0,t=0}) \]

where \(\phi\) is the conditional association between the outcome and the covariate.

The unadjusted difference in the post-test is 10, and the difference in the pretest is 6. The conditional association between pre and post is \(\phi = 0.44\).

The adjusted difference is therefore \(10 - 0.44 \times 6 = 7.36\).
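We can recover this by hand (a minimal sketch, reusing ancovamod and the group means from the summary tables above):

post_diff <- 59 - 49                  # unadjusted post-test difference
pre_diff  <- 49 - 43                  # pre-test difference
phi       <- coef(ancovamod)["y_t0"]  # conditional association, 0.44
post_diff - phi * pre_diff            # 10 - 0.44*6 = 7.36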

Data engineering for gains analysis and test of difference

d_t1 <- c(y_j0t1-y_j0t0, y_j1t1-y_j1t0)
t.test(d_t1 ~ j, var.equal = TRUE)

    Two Sample t-test

data:  d_t1 by j
t = -1.8898, df = 98, p-value = 0.06174
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 -8.2003221  0.2003221
sample estimates:
mean in group 0 mean in group 1 
              6              10 

Difference in gains analysis via regression (difference in difference)

summary(lm(d_t1 ~ j))

Call:
lm(formula = d_t1 ~ j)

Residuals:
     Min       1Q   Median       3Q      Max 
-25.6537  -7.7752   0.1601   7.3185  27.4555 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)    6.000      1.497   4.009 0.000119 ***
j              4.000      2.117   1.890 0.061737 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 10.58 on 98 degrees of freedom
Multiple R-squared:  0.03516,   Adjusted R-squared:  0.02532 
F-statistic: 3.571 on 1 and 98 DF,  p-value: 0.06174

Difference in gains analysis via ANOVA

summary(aov(d_t1 ~ j))
            Df Sum Sq Mean Sq F value Pr(>F)  
j            1    400     400   3.571 0.0617 .
Residuals   98  10976     112                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Controlling for the covariate: difference in gains via regression

summary(lm(d_t1 ~ j + y_t0))

Call:
lm(formula = d_t1 ~ j + y_t0)

Residuals:
     Min       1Q   Median       3Q      Max 
-23.3921  -5.4853   0.6753   5.6545  24.2040 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 30.07996    4.12321   7.295 8.17e-11 ***
j            7.35999    1.88630   3.902 0.000176 ***
y_t0        -0.56000    0.09118  -6.142 1.79e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.026 on 97 degrees of freedom
Multiple R-squared:  0.3053,    Adjusted R-squared:  0.291 
F-statistic: 21.32 on 2 and 97 DF,  p-value: 2.122e-08

ANCOVA influence

What gives?

Why is the treatment effect the same in

\(Y_{ij1}=\gamma_0 + \gamma_1 I(j = 1) + \gamma_2 Y_{ij0} + u_{ij1}\)

and

\(Y_{ij1}-Y_{ij0}=\lambda_0 + \lambda_1 I(j = 1) +\lambda_2 Y_{ij0} + q_{ij}\)

It is due to the double use of \(Y_{ij0}\): it appears on both sides of the model.

Diff-in-diff as post-test with pre-test offset

Algebraically, the model

\(Y_{ij1}-Y_{ij0}=\beta_0 + \beta_1 I(j = 1) + r_{ij}\)

is equivalent to

\(Y_{ij1}=\beta_0 + \beta_1 I(j = 1) +Y_{ij0}+ r_{ij}\)

where the pretest enters as a covariate with a slope fixed at 1.

Diff-in-diff as post-test with pre-test offset

summary(lm(d_t1 ~ j))$coef
            Estimate Std. Error  t value     Pr(>|t|)
(Intercept)        6   1.496662 4.008922 0.0001190755
j                  4   2.116599 1.889824 0.0617369041
summary(lm(y_t1 ~ j, offset = y_t0))$coef
            Estimate Std. Error  t value     Pr(>|t|)
(Intercept)        6   1.496662 4.008922 0.0001190755
j                  4   2.116599 1.889824 0.0617369041

Covariate slope in Diff-in-diff with pre-test

So, with a covariate, you are effectively running

\(Y_{ij1}=\lambda_0 + \lambda_1 I(j = 1) + \lambda_2 Y_{ij0} + Y_{ij0} + r_{ij}\)

which collects the two pre-test terms into

\(Y_{ij1}=\lambda_0 + \lambda_1 I(j = 1) + (\lambda_2 + 1) Y_{ij0} + r_{ij}\)

So, while \(\gamma_1 = \lambda_1\), \(\gamma_2 = \lambda_2 + 1\), or \(\gamma_2 - 1 = \lambda_2\).

Covariate slope in diff-in-diff with pre-test: demonstration

summary(lm(y_t1 ~ j+y_t0))$coef #gamma 2 - 1 ...
             Estimate Std. Error  t value     Pr(>|t|)
(Intercept) 30.079956 4.12321335 7.295270 8.167547e-11
j            7.359994 1.88630118 3.901813 1.761743e-04
y_t0         0.440001 0.09117781 4.825747 5.187617e-06
summary(lm(d_t1 ~ j+y_t0))$coef #...is lambda 2
             Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 30.079956 4.12321335  7.295270 8.167547e-11
j            7.359994 1.88630118  3.901813 1.761743e-04
y_t0        -0.559999 0.09117781 -6.141834 1.794142e-08

Diff-in-diff and ANCOVA

Put another way, difference-in-differences is defined as

\[ \mu_{D|j=1}-\mu_{D|j=0}=(\mu_{Y|j=1,t=1} -\mu_{Y|j=1,t=0})-(\mu_{Y|j=0,t=1} -\mu_{Y|j=0,t=0}) \]

where \(D_{ij} = Y_{ij,t=1}-Y_{ij,t=0}\). Rearranging, this can also be expressed as

\[ (\mu_{Y|j=1,t=1} -\mu_{Y|j=0,t=1})-(\mu_{Y|j=1,t=0} -\mu_{Y|j=0,t=0}) \]

cf. the ANCOVA adjustment

\[ (\mu_{Y|j=1,t=1} -\mu_{Y|j=0,t=1})-\phi(\mu_{Y|j=1,t=0} -\mu_{Y|j=0,t=0}) \]

DiD with a covariate instead controls for the association between the pre-test and gains, \(\zeta\):

\[ (\mu_{D|j=1}-\mu_{D|j=0})-\zeta(\mu_{Y|j=1,t=0} -\mu_{Y|j=0,t=0}) \]
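A quick numeric check of this expression (a minimal sketch reusing d_t1, j, and y_t0 from the earlier chunks; \(\zeta\) is taken from the regression of gains on group and pre-test):

did   <- mean(d_t1[j == 1]) - mean(d_t1[j == 0])  # unadjusted DiD: 10 - 6 = 4
zeta  <- coef(lm(d_t1 ~ j + y_t0))["y_t0"]        # association of pre-test with gains: -0.56
pre_d <- mean(y_t0[j == 1]) - mean(y_t0[j == 0])  # pre-test difference: 6
did - zeta * pre_d                                # 4 - (-0.56)*6 = 7.36, the ANCOVA effect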

Diff-in-diff holding starting point constant

Yet another way to think about it: when we control for the pretest and the dependent variable is the difference between post and pre, we “hold constant” the pre-test value, \(Y^{*}_{t=0}\), so the gains are measured from the same starting point.

Hence our conditional diff in diff is the difference in gains when all cases have the same pre-test, \(Y^{*}_{t=0}\)

\[ (\hat{\mu}_{Y|j=1,t=1} -Y^{*}_{t=0})-(\hat{\mu}_{Y|j=0,t=1} -Y^{*}_{t=0}) \] or rearranging,

\[ \hat{\mu}_{Y|j=1,t=1} -Y^{*}_{t=0} -\hat{\mu}_{Y|j=0,t=1} + Y^{*}_{t=0} \] the \(Y^{*}_{t=0}\) values, one negative and one positive, cancel to give the post-test difference \[ \hat{\mu}_{Y|j=1,t=1} -\hat{\mu}_{Y|j=0,t=1} \]

Note that \(\hat{\mu}\) is an adjusted mean, not the observed mean.
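The cancellation can be seen directly with predict() (a minimal sketch holding both groups at the grand pre-test mean of 46; any common value gives the same difference):

yhat <- predict(ancovamod, newdata = data.frame(j = c(0, 1), y_t0 = 46))
yhat        # 50.32 and 57.68, the adjusted means used in the geometry slides below
diff(yhat)  # 7.36, the conditional effect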

Diff-in-diff holding starting point constant: geometry

The following charts show the differences in gains, then show the difference from a common starting point, then adjust the predictions to show our ANCOVA effect.

Difference in gains with different starting points (I)

plot(1, type = "n", xlab = "",
     ylab = "", xlim = c(-.25, 1.25), xaxt='n',
     ylim = c(40, 60))
segments(0,43,1,49)
segments(0,49,1,59)

Difference in gains with different starting points (II)

plot(1, type = "n", xlab = "",
     ylab = "", xlim = c(-.25, 1.25), xaxt='n',
     ylim = c(40, 60))
segments(0,43,1,49)
segments(0,43,1,43, lty = 2)
arrows(1,43,1,49, lty = 2)
text(1.1, 46, labels="6")
segments(0,49,1,59)
segments(0,49,1,49, lty = 2)
arrows(1,49,1,59, lty = 2)
text(1.1, 54, labels="10")
text(.5,40, "10-6=4")

Difference in gains with the same starting point (I)

plot(1, type = "n", xlab = "",
     ylab = "", xlim = c(-.25, 1.25), xaxt='n',
     ylim = c(40, 60))
segments(0,46,1,49)
segments(0,46,1,59)

Difference in gains with the same starting point (II)

plot(1, type = "n", xlab = "",
     ylab = "", xlim = c(-.25, 1.25), xaxt='n',
     ylim = c(40, 60))
segments(0,46,1,49)
segments(0,46,1,46, lty = 2)
arrows(1,46,1,49, lty = 2)
text(1.1, 47, labels="3")
arrows(1,46,1,59, lty = 2)
text(1.1, 57, labels="13")
segments(0,46,1,59)
text(.5,42, "Post-hoc adjustment")
text(.5,44, "13-3=10")
text(.5,40, "10-.44*6=7.36")

Difference in gains with the same starting point (III)

plot(1, type = "n", xlab = "",
     ylab = "", xlim = c(-.25, 1.25), xaxt='n',
     ylim = c(40, 60))
segments(0,46,1,50.32)
segments(0,46,1,46, lty = 2)
arrows(1,46,1,50.32, lty = 2)
text(1.1, 47, labels="4.32")
arrows(1,46,1,57.68, lty = 2)
text(1.1, 53, labels="11.68")
segments(0,46,1,57.68)
text(.5,42, "Adjust predictions by 1.32, each")
text(.5,44, "(.44*6)/2=1.32")
text(.5,40, "11.68-4.32=7.36")

Discussion points

  • Care must be taken to be very specific about the research question
  • Whether you randomize or match units into groups matters for internal validity, so covariate balance is important
  • See page 11 in Katz et al. (2022)
  • If the design is not randomized, or if the RCT is broken, QED methods such as matching are often not helpful (see, e.g., Daw and Hatfield (2018)). If only two time points are available, ANCOVA's assumptions are testable while DiD's are not.
  • These properties are not preserved for generalized outcomes (e.g., logistic regression) or for some multilevel models (two levels of means, multiple sources of variance)

References

Daw, Jamie R., and Laura A. Hatfield. 2018. “Matching and Regression to the Mean in Difference-in-Differences Analysis.” Health Services Research 53 (6): 4138–56.
Katz, Charles M., Hyunjung Cheon, E. C. Hedberg, and Scott H. Decker. 2022. “Impact of Family-Based Secondary Prevention Programming on Risk, Resilience, and Delinquency: A 6-Month Follow-up Within a Randomized Control Trial in Honduras.” Justice Quarterly 39 (6): 1237–62.
Lord, Frederic M. 1967. “A Paradox in the Interpretation of Group Comparisons.” Psychological Bulletin 68 (5): 304.
Porter, Andrew C., and Stephen W. Raudenbush. 1987. “Analysis of Covariance: Its Model and Use in Psychological Research.” Journal of Counseling Psychology 34 (4): 383.
Stuart, Elizabeth A. 2010. “Matching Methods for Causal Inference: A Review and a Look Forward.” Statistical Science: A Review Journal of the Institute of Mathematical Statistics 25 (1): 1.