Federico Ferrero
How can we measure the effect of a program — such as high-dosage tutoring or gifted and talented initiatives — without placing students who truly need or qualify for support into a control group?
Randomly withholding access would be ethically unacceptable. Regression Discontinuity Designs provide a solution for evaluating such programs while maintaining fairness and rigor.
In education, we often want to know whether interventions like high-dosage tutoring or gifted and talented programs actually improve student outcomes. The gold standard for causal inference is a randomized controlled trial (RCT), but randomly assigning students to a “no intervention” control group is often ethically and politically impossible. We don’t want to withhold opportunities from students who could benefit the most.
Regression Discontinuity (RD) provides a solution. Many programs already have eligibility rules based on a cutoff, such as a failing score, a risk index, or a percentile threshold, and RD leverages these pre-existing rules. For example, students just below the eligibility threshold receive the program, while students just above it do not. The RD design then compares posttest outcomes between these two groups: students who were very similar in their pretest scores but differed only in program participation.
KEY RULE: If there is a program effect, there will be a discontinuity in the regression lines at the cutoff.
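Formally, the sharp RD estimand is the jump in the conditional expectation of the outcome at the cutoff c:

$$\tau_{SRD} = \lim_{x \downarrow c} E[Y \mid X = x] \; - \; \lim_{x \uparrow c} E[Y \mid X = x]$$

If the program has no effect, the two one-sided limits coincide and the regression function is continuous at the cutoff.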
In the plots below, each dot represents a student.
Left plot – Regression Continuity: Students with scores below the cutoff (3550) were not assigned to a tutoring program. When we compare their pretest scores with posttest scores, the outcome changes smoothly across the cutoff, showing regression continuity. There is no jump at the threshold.
Right plot – Regression Discontinuity: Students with scores below the cutoff (3550) were assigned to the tutoring program. After the program, when we compare pretest and posttest scores, we clearly see a jump at the cutoff, which is a regression discontinuity. This jump indicates the tutoring program had a measurable effect on student outcomes. It works!
This simulated example mirrors a realistic Texas STAAR EOC Algebra I tutoring policy.
Running variable (X): 2023–24 EOY STAAR EOC Algebra I scale score.
Treatment: assignment to Algebra I high-dosage tutoring during the 2024–25 school year.
Outcome (Y): 2024–25 EOY STAAR EOC Algebra I scale score.
Cutoff (c): scale score = 3550, corresponding to Approaches Grade Level according to the TEA Raw Score Conversion Table (EOY 2023–24).
Note: To find the specific minimum scale score needed to achieve Approaches Grade Level for a given test, we need to consult the official TEA Raw Score Conversion Tables for that particular subject, grade, and test administration year.
Policy rule (sharp RD):
Students with a scale score < 3550 did not meet Approaches Grade Level and were assigned to tutoring.
Students with a scale score ≥ 3550 were not assigned to tutoring.
Local randomization near the cutoff: Students just below vs just above 3550 are assumed to be similar in ability except for treatment assignment. The estimand is the causal effect of tutoring for students near the Approaches cutoff.
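Because the dataset is simulated, it helps to see a minimal data-generating sketch before inspecting it. The parameters below (the score distribution, the slope, the noise, and the roughly 90-point tutoring boost) are illustrative assumptions chosen to resemble the summaries that follow, not the exact original script; this is also where n and cutoff_score, used throughout, are defined.

# Minimal simulation sketch (assumed parameters; approximates, but will not
# exactly reproduce, the output shown below)
set.seed(123)
n <- 3000
cutoff_score <- 3550

# Prior-year scale scores centered near the cutoff
eoy_2324 <- round(rnorm(n, mean = 3550, sd = 120))

# Sharp assignment rule: students below the cutoff receive tutoring
tutoring <- as.integer(eoy_2324 < cutoff_score)

# Next-year scores: a smooth baseline trend plus an assumed ~90-point
# tutoring effect and random noise
eoy_2425 <- round(3550 + 0.85 * (eoy_2324 - 3550) + 90 * tutoring +
                    rnorm(n, sd = 60))

staar_rd <- data.frame(eoy_2324, tutoring, eoy_2425)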
Before moving to diagnostics and estimation, it is useful to inspect the structure of the simulated data.
summary(staar_rd)
## eoy_2324 tutoring eoy_2425
## Min. :3109 Min. :0.0000 Min. :3031
## 1st Qu.:3469 1st Qu.:0.0000 1st Qu.:3513
## Median :3549 Median :1.0000 Median :3597
## Mean :3549 Mean :0.5007 Mean :3599
## 3rd Qu.:3625 3rd Qu.:1.0000 3rd Qu.:3686
## Max. :3960 Max. :1.0000 Max. :4043
head(staar_rd, 8)
## eoy_2324 tutoring eoy_2425
## 1 3612 0 3519
## 2 3420 1 3544
## 3 3567 0 3744
## 4 3540 1 3608
## 5 3470 1 3575
## 6 3248 1 3442
## 7 3462 1 3746
## 8 3428 1 3280
The output above shows summary statistics and the first eight rows of the dataset. For each student:
eoy_2324: prior-year STAAR Algebra I scale score (running variable).
tutoring: assignment to Algebra I tutoring (1 = assigned, 0 = not assigned).
eoy_2425: next-year STAAR Algebra I scale score (outcome).
library(ggplot2)

ggplot(staar_rd, aes(x = eoy_2324, y = eoy_2425, color = factor(tutoring))) +
  geom_point(alpha = 0.6) +
  geom_vline(xintercept = cutoff_score, linetype = "dashed", color = "red") +
  annotate("text", x = cutoff_score + 10, y = max(staar_rd$eoy_2425),
           label = "Cutoff = 3550", color = "red", hjust = 0) +
  # Local linear regressions fit separately on each side of the cutoff
  geom_smooth(data = subset(staar_rd, eoy_2324 < cutoff_score),
              method = "lm", se = FALSE, color = "black") +
  geom_smooth(data = subset(staar_rd, eoy_2324 >= cutoff_score),
              method = "lm", se = FALSE, color = "black") +
  labs(
    x = "\nEOY 2023-24 STAAR EOC\nAlgebra I Scale Score",
    y = "\nEOY 2024-25 STAAR EOC\nAlgebra I Scale Score",
    title = "Tutoring Effect on STAAR Algebra I",
    color = "Tutoring"
  ) +
  scale_color_manual(values = c("0" = "darkgreen", "1" = "purple"),
                     labels = c("Did Not Take", "Took")) +
  theme_minimal() +
  theme(panel.grid = element_blank(),
        plot.title = element_text(hjust = 0.5))
We can also zoom in on the area around the cutoff score…
ggplot(staar_rd, aes(x = eoy_2324, y = eoy_2425, color = factor(tutoring))) +
  geom_point(alpha = 0.6) +
  geom_vline(xintercept = cutoff_score, linetype = "dashed", color = "red") +
  annotate("text", x = cutoff_score + 10, y = max(staar_rd$eoy_2425),
           label = "Cutoff = 3550", color = "red", hjust = 0) +
  # Local linear regressions fit separately on each side of the cutoff
  geom_smooth(data = subset(staar_rd, eoy_2324 < cutoff_score),
              method = "lm", se = FALSE, color = "black") +
  geom_smooth(data = subset(staar_rd, eoy_2324 >= cutoff_score),
              method = "lm", se = FALSE, color = "black") +
  labs(
    x = "\nEOY 2023-24 STAAR EOC\nAlgebra I Scale Score",
    y = "\nEOY 2024-25 STAAR EOC\nAlgebra I Scale Score",
    title = "Tutoring Effect on STAAR Algebra I",
    color = "Tutoring"
  ) +
  scale_color_manual(values = c("0" = "darkgreen", "1" = "purple"),
                     labels = c("Did Not Take", "Took")) +
  theme_minimal() +
  theme(panel.grid = element_blank(),
        plot.title = element_text(hjust = 0.5)) +
  # Zoom in on the region around the cutoff without dropping points
  coord_cartesian(xlim = c(3500, 3600))
After inspecting the scatterplots, we can observe a regression discontinuity at the cutoff score of 3550 (STAAR EOC Algebra I). This means that among students with similar pretest scores near the cutoff, those who were assigned to and participated in the tutoring program performed better on the posttest than those who were not assigned.
The program appears to have a positive effect, but to determine how large this effect is with statistical precision, we need to run formal statistical analyses.
# Load the packages
# rdrobust: regression discontinuity (RD) estimation with robust standard
# errors, used to estimate the causal effect of a program at a cutoff
library(rdrobust)
# rddensity: density tests for the running variable around the cutoff,
# important for detecting manipulation (no "gaming" of the score)
library(rddensity)

# Fit a sharp RD model: rdrobust() estimates the size of the jump at the
# cutoff, i.e., the causal effect of tutoring on students near the threshold
rd_model <- rdrobust(y = staar_rd$eoy_2425, x = staar_rd$eoy_2324, c = cutoff_score)
summary(rd_model)
## Sharp RD estimates using local polynomial regression.
##
## Number of Obs. 3000
## BW type mserd
## Kernel Triangular
## VCE method NN
##
## Number of Obs. 1502 1498
## Eff. Number of Obs. 984 984
## Order est. (p) 1 1
## Order bias (q) 2 2
## BW est. (h) 112.731 112.731
## BW bias (b) 182.005 182.005
## rho (h/b) 0.619 0.619
## Unique Obs. 273 271
##
## =====================================================================
## Point Robust Inference
## Estimate z P>|z| [ 95% C.I. ]
## ---------------------------------------------------------------------
## RD Effect -92.661 -8.287 0.000 [-114.495 , -70.696]
## =====================================================================
Type of analysis: The results are from a sharp regression discontinuity (RD) design using local linear regression (order 1 polynomial) with a triangular kernel. This is appropriate because treatment (tutoring) was strictly assigned based on the cutoff.
Number of observations: 3000 in total. Left of cutoff: 1502 students. Right of cutoff: 1498 students.
The effective number of observations used in the estimation is smaller (≈984 per side) because RD uses observations close to the cutoff, giving more weight to students near 3550.
Bandwidth: The estimation bandwidth (h) is about 113 points, meaning the RD effect is estimated using students with pretest scores roughly within 113 points of the cutoff on either side. The bias bandwidth (b) is larger (182 points) and is used for bias-corrected inference.
Estimated treatment effect: RD Effect = -92.66. Since in our simulation tutoring was coded as 1 for students below the cutoff, a negative effect here means that the posttest scores of treated students (below the cutoff) are higher than those of untreated students above the cutoff; the sign simply reflects the coding direction.
Robust z-value = -8.287, p < 0.001 → the effect is highly statistically significant.
Confidence interval: 95% CI: [-114.5, -70.7]. Given the coding direction, we are 95% confident that the tutoring program increased posttest scores by roughly 71–115 points for students near the cutoff.
The regression discontinuity analysis shows a large and statistically significant jump at the cutoff (3550). Students who were assigned and took tutoring scored substantially higher on the posttest than similar students just above the cutoff, confirming that the program had a clear positive impact.
To put this estimate in context, we compare the tutoring effect to how much scores typically vary. This expresses the improvement on a standardized scale that is easy to interpret, rather than only telling us that the tutoring helped.
# Extract the RD effect estimate: the first coefficient from the model,
# i.e., the estimated jump in posttest scores at the cutoff
tutoring_effect <- rd_model$coef[1]
tutoring_effect
## [1] -92.66131
# Calculate a standardized effect size (Cohen's d): divide the RD effect by
# the standard deviation of the posttest scores, which expresses how many
# standard deviations the tutoring shifted scores
sd_post <- sd(staar_rd$eoy_2425)
effect_size_d <- tutoring_effect / sd_post
effect_size_d
## [1] -0.6963258
The RD analysis shows that students who received tutoring scored about 93 points higher at the cutoff compared to similar students who did not receive tutoring. When we standardize this effect using the posttest score variability, we get a Cohen’s d of approximately 0.70, meaning the tutoring increased scores by about 0.7 standard deviations.
Note: Because we coded students below the cutoff as treated (1), the RD effect appears as a negative number (-93), even though the program increases posttest scores as we saw in the scatterplot. A negative estimate here actually corresponds to a positive tutoring effect.
According to Cohen’s conventions, an effect size of 0.2 is small, 0.5 is medium, and 0.8 is large, so 0.7 represents a substantial positive impact of the tutoring program on student performance.
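Because of the coding direction, it can help to flip the sign explicitly when reporting. A small convenience step along these lines (a presentational assumption, not part of the original analysis):

# Flip the sign so the estimate reads as a tutoring gain
# (treatment = 1 lies below the cutoff, so rdrobust reports a negative jump)
tutoring_gain <- -tutoring_effect
tutoring_gain        # about 92.7 scale-score points
abs(effect_size_d)   # about 0.70 standard deviations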
The code below is used to select the optimal bandwidth for a regression discontinuity (RD) analysis. In RD, the bandwidth determines how many observations near the cutoff are included when estimating the treatment effect. Choosing the right bandwidth is important: using a bandwidth that is too wide can include students far from the cutoff, who are less comparable, while using one that is too narrow can reduce statistical power. The function rdbwselect() from the rdrobust package computes data-driven, optimal bandwidths for RD estimation using standard methods, such as minimizing mean squared error.
bw <- rdbwselect(y = staar_rd$eoy_2425, x = staar_rd$eoy_2324, c = cutoff_score)
bw
## Call: rdbwselect
##
## Number of Obs. 3000
## BW type mserd
## Kernel Triangular
## VCE method NN
##
## Number of Obs. 1502 1498
## Order est. (p) 1 1
## Order bias (q) 2 2
## Unique Obs. 273 271
Out of 3,000 students, 1,502 are below the cutoff and 1,498 above; the output also reports the number of unique running-variable values on each side (273 below, 271 above). The bandwidth is selected with the mserd criterion (mean squared error–optimal) and a triangular kernel, which gives more weight to students nearer the cutoff. In short, estimation concentrates on students near the cutoff to obtain the most precise and least biased estimate of the tutoring effect.
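A common robustness step, sketched here as an assumed extension of the analysis, is to re-estimate the effect at bandwidths narrower and wider than the MSE-optimal choice; if the estimate is stable across bandwidths, the conclusion does not hinge on that tuning decision.

# Bandwidth sensitivity check (assumed extension, not in the original analysis):
# re-fit the RD model at half and double the MSE-optimal bandwidth (~113 points)
rd_half   <- rdrobust(y = staar_rd$eoy_2425, x = staar_rd$eoy_2324,
                      c = cutoff_score, h = 113 / 2)
rd_double <- rdrobust(y = staar_rd$eoy_2425, x = staar_rd$eoy_2324,
                      c = cutoff_score, h = 113 * 2)
summary(rd_half)
summary(rd_double)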
The code below checks whether students’ pretest scores are distributed smoothly around the cutoff, which is a critical assumption for the regression discontinuity design to produce valid causal estimates.
rd_density <- rddensity(staar_rd$eoy_2324, c = cutoff_score)
summary(rd_density)
##
## Manipulation testing using local polynomial density estimation.
##
## Number of obs = 3000
## Model = unrestricted
## Kernel = triangular
## BW method = estimated
## VCE method = jackknife
##
## c = 3550 Left of c Right of c
## Number of obs 1502 1498
## Eff. Number of obs 721 875
## Order est. (p) 2 2
## Order bias (q) 3 3
## BW est. (h) 77.425 94.121
##
## Method T P > |T|
## Robust 0.7925 0.4281
##
## P-values of binomial tests (H0: p=0.5).
##
## Window Length / 2 <c >=c P>|T|
## 2.000 22 34 0.1409
## 4.000 37 53 0.1133
## 6.000 56 78 0.0693
## 8.000 70 98 0.0369
## 10.000 93 122 0.0559
## 12.000 119 144 0.1388
## 14.000 135 161 0.1461
## 16.000 158 186 0.1454
## 18.000 177 207 0.1388
## 20.000 200 229 0.1764
The density test shows that the distribution of pretest scores (eoy_2324) is smooth around the cutoff (3550). The robust test statistic is 0.79 with a p-value of 0.43, well above 0.05, indicating no evidence of manipulation. What we want is a p-value greater than 0.05, meaning there is no statistically significant jump in the density of scores at the cutoff.
The binomial tests for different window lengths also show p-values mostly above 0.05 (only the 8-point window dips slightly below), confirming that the number of students just below and above the cutoff is roughly balanced. This means students could not “game the cutoff” (they could not manipulate their scores to qualify for tutoring).
Overall, this supports the key RD assumption, so the observed jump in posttest scores can be confidently interpreted as the causal effect of the tutoring program.
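The density evidence can also be inspected visually. The rddensity package includes a companion plotting function, rdplotdensity(), which draws the estimated density on each side of the cutoff with confidence bands; overlapping bands at the cutoff are consistent with no manipulation.

# Plot the estimated density of eoy_2324 on each side of the cutoff
rdplotdensity(rd_density, staar_rd$eoy_2324)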
The code below checks whether other student characteristics are balanced across the cutoff. A non-significant effect indicates that the RD groups are comparable, supporting the validity of the estimated tutoring effect.
# Simulated placebo covariate, drawn independently of the running variable,
# so by construction it should not jump at the cutoff
covariate <- rnorm(n)
rd_covariate <- rdrobust(y = covariate, x = staar_rd$eoy_2324, c = cutoff_score)
summary(rd_covariate)
## Sharp RD estimates using local polynomial regression.
##
## Number of Obs. 3000
## BW type mserd
## Kernel Triangular
## VCE method NN
##
## Number of Obs. 1502 1498
## Eff. Number of Obs. 649 677
## Order est. (p) 1 1
## Order bias (q) 2 2
## BW est. (h) 67.372 67.372
## BW bias (b) 108.726 108.726
## rho (h/b) 0.620 0.620
## Unique Obs. 273 271
##
## =====================================================================
## Point Robust Inference
## Estimate z P>|z| [ 95% C.I. ]
## ---------------------------------------------------------------------
## RD Effect -0.042 -0.500 0.617 [-0.369 , 0.219]
## =====================================================================
What we want: If the covariate does not jump at the cutoff, it confirms that students on either side are comparable, and the observed jump in the posttest is likely caused by the tutoring program. The estimate should therefore be close to zero (no jump at the cutoff), and p > 0.05 (not statistically significant) indicates no evidence of imbalance.
The covariate check shows that the RD estimate for this student-level variable is -0.042 with a p-value of 0.617, which is not statistically significant. The 95% confidence interval includes zero ([-0.369, 0.219]), indicating that the covariate is balanced across the cutoff. Students just below and just above the cutoff are similar in this characteristic, which supports the validity of the RD design and suggests that the jump in posttest scores is due to the tutoring program, not pre-existing differences.
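A further falsification check, sketched here as an assumed extension, is to test a placebo cutoff away from 3550. Restricting to observations on one side of the true cutoff avoids contamination from the real treatment effect, and no significant jump should appear:

# Placebo-cutoff check (assumed extension, not in the original analysis):
# within the untreated (above-cutoff) group, test a fake cutoff at 3650,
# where no discontinuity should exist
above <- subset(staar_rd, eoy_2324 >= cutoff_score)
rd_placebo <- rdrobust(y = above$eoy_2425, x = above$eoy_2324, c = 3650)
summary(rd_placebo)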
Note: the same Regression Discontinuity analysis can be implemented in SPSS. For further guidance, material from Dr. Wuensch can be consulted for these purposes (see reference list below).