Regression is the most commonly used statistical method for estimating a causal effect. When its assumptions are satisfied, regression can be applied to observed data to estimate the average treatment effect (ATE), regardless of whether the data come from a randomized experiment or a quasi-experiment.

To illustrate the use of a regression model, I use simulated data from a hypothetical SAT example.

Generate data for SAT example

ID = c(1:6)
Grp = rep(c(0, 1), each = 3)
Score = c(550, 600, 650, 600, 720, 630)
SES = c(1, 2, 2, 2, 3, 2)
SATdat = data.frame(ID, Grp, Score, SES)
SATdat$SES = factor(SATdat$SES)
SATdat
##   ID Grp Score SES
## 1  1   0   550   1
## 2  2   0   600   2
## 3  3   0   650   2
## 4  4   1   600   2
## 5  5   1   720   3
## 6  6   1   630   2

There are:

The causal question is: does the SAT tutoring service improve students’ SAT scores?

Scenario 1: Randomized experiment

In a randomized experiment, random assignment balances confounders across the two groups in expectation, so no confounder needs to be included and the only predictor is the group indicator Grp:

lm1 = lm(Score ~ Grp, data = SATdat)
summary(lm1)
## 
## Call:
## lm(formula = Score ~ Grp, data = SATdat)
## 
## Residuals:
##          1          2          3          4          5          6 
## -5.000e+01 -7.638e-14  5.000e+01 -5.000e+01  7.000e+01 -2.000e+01 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   600.00      32.66  18.371 5.17e-05 ***
## Grp            50.00      46.19   1.083     0.34    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 56.57 on 4 degrees of freedom
## Multiple R-squared:  0.2266, Adjusted R-squared:  0.03323 
## F-statistic: 1.172 on 1 and 4 DF,  p-value: 0.3399

A Welch two-sample t-test on the same two groups gives a nearly identical result:

t.test(SATdat$Score[SATdat$Grp==1], SATdat$Score[SATdat$Grp==0])
## 
##  Welch Two Sample t-test
## 
## data:  SATdat$Score[SATdat$Grp == 1] and SATdat$Score[SATdat$Grp == 0]
## t = 1.0825, df = 3.8173, p-value = 0.3426
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -80.69522 180.69522
## sample estimates:
## mean of x mean of y 
##       650       600
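
With a single 0/1 predictor, the slope on Grp is simply the difference between the two group means, which is why the regression and the t-test agree on the estimate of 50. The sketch below checks this on the same simulated data. The object names `mean_diff`, `slope`, and `tt` are introduced here for illustration only. It also runs a pooled-variance t-test (`var.equal = TRUE`), which makes the same equal-variance assumption as the regression and should therefore reproduce its p-value exactly, unlike the Welch test above.

```r
# Recreate the simulated SAT data used in this example
SATdat <- data.frame(
  ID    = 1:6,
  Grp   = rep(c(0, 1), each = 3),
  Score = c(550, 600, 650, 600, 720, 630)
)

treated <- SATdat$Score[SATdat$Grp == 1]
control <- SATdat$Score[SATdat$Grp == 0]

# Difference in group means (treated minus control)
mean_diff <- mean(treated) - mean(control)

# Slope on Grp from the simple regression
lm1   <- lm(Score ~ Grp, data = SATdat)
slope <- unname(coef(lm1)["Grp"])

# Pooled-variance t-test: same equal-variance assumption as lm()
tt <- t.test(treated, control, var.equal = TRUE)

mean_diff                                        # difference in means
slope                                            # regression slope
tt$p.value                                       # pooled t-test p-value
summary(lm1)$coefficients["Grp", "Pr(>|t|)"]     # regression p-value for Grp
```

The Welch test differs slightly only because it does not pool the two group variances.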

Scenario 2: Quasi-experiment

Most likely, students self-selected into the SAT tutoring service. As a result, the two groups were not randomly formed, and there might be systematic differences between them.

By definition, this is a quasi-experiment.
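
One simple way to look for such systematic differences is to cross-tabulate the group indicator against a background variable. The sketch below does this for the SES column of the simulated data. The name `balance` is introduced here for illustration, and a cross-tabulation is only one of several possible balance checks.

```r
# Recreate the group indicator and SES from the simulated SAT data
SATdat <- data.frame(
  Grp = rep(c(0, 1), each = 3),
  SES = factor(c(1, 2, 2, 2, 3, 2))
)

# Rows: group (0 = control, 1 = treated); columns: SES level
balance <- table(Grp = SATdat$Grp, SES = SATdat$SES)
balance
```

In these data, SES level 1 appears only in group 0 and SES level 3 appears only in group 1, so the two groups overlap only at SES level 2.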

Suppose that in this scenario, we include one confounder, SES, in the regression analysis:

SES: socioeconomic status, broken into three levels (high, middle, and low)

ses.lm = lm(Score ~ Grp+SES, data = SATdat)
summary(ses.lm)
## 
## Call:
## lm(formula = Score ~ Grp + SES, data = SATdat)
## 
## Residuals:
##          1          2          3          4          5          6 
##  1.066e-14 -2.500e+01  2.500e+01 -1.500e+01 -4.016e-15  1.500e+01 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   550.00      29.15  18.865   0.0028 **
## Grp           -10.00      29.15  -0.343   0.7643   
## SES2           75.00      35.71   2.100   0.1705   
## SES3          180.00      50.50   3.565   0.0705 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 29.15 on 2 degrees of freedom
## Multiple R-squared:  0.8973, Adjusted R-squared:  0.7432 
## F-statistic: 5.824 on 3 and 2 DF,  p-value: 0.1501
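
After adjusting for SES, the estimated coefficient on Grp changes from 50 to -10. To see where the adjusted estimate comes from, note that only SES level 2 contains students from both groups, so only that stratum carries information about the within-SES group difference. The sketch below compares the difference in means within SES level 2 to the adjusted coefficient. The names `ses2`, `stratum_diff`, and `adj` are introduced here for illustration.

```r
# Recreate the simulated SAT data used in this example
SATdat <- data.frame(
  Grp   = rep(c(0, 1), each = 3),
  Score = c(550, 600, 650, 600, 720, 630),
  SES   = factor(c(1, 2, 2, 2, 3, 2))
)

# Keep only the stratum that contains both treated and control students
ses2 <- subset(SATdat, SES == 2)

# Treated-minus-control difference in means within SES level 2
stratum_diff <- mean(ses2$Score[ses2$Grp == 1]) -
  mean(ses2$Score[ses2$Grp == 0])

# SES-adjusted coefficient on Grp from the regression
ses.lm <- lm(Score ~ Grp + SES, data = SATdat)
adj    <- unname(coef(ses.lm)["Grp"])

stratum_diff   # within-stratum difference
adj            # adjusted regression coefficient
```

Both quantities equal -10 here: the treated students at SES level 2 average 615 and the control students average 625. With only six observations and 2 residual degrees of freedom, neither the unadjusted nor the adjusted estimate is statistically distinguishable from zero.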