Federico Ferrero


How can we measure the effect of a program — such as high-dosage tutoring or gifted and talented initiatives — without placing students who truly need or qualify for support into a control group?

Randomly withholding access would be ethically unacceptable. Regression Discontinuity Designs provide a solution for evaluating such programs while maintaining fairness and rigor.


Why Regression Discontinuity Matters in Program Evaluation?

In education, we often want to know whether interventions like high-dosage tutoring or gifted and talented programs actually improve student outcomes. The gold standard for causal inference is a randomized controlled trial (RCT), but randomly assigning students to a “no intervention” control group is often ethically and politically impossible. We don’t want to withhold opportunities from students who could benefit the most.

Regression Discontinuity (RD) provides a solution. Many programs already have eligibility rules based on a cutoff, such as a failing score, a risk index, or a percentile threshold. Regression Discontinuity leverages pre-existing cutoff rules. For example, students just below the eligibility threshold receive the program, while students just above it do not. The RD design then compares posttest outcomes between these two groups—students who were very similar in their pretest scores but differed only in program participation.

Assumptions and Requirements

Regression Continuity vs. Regression Discontinuity

KEY RULE: If there is a program effect, there will be a discontinuity in the regression lines at the cutoff.

As we can see in the plots below, each dot represents a student.


Simulating an Educational RD Case (STAAR EOC Algebra I)

This simulated example mirrors a realistic Texas STAAR EOC Algebra I tutoring policy.

Policy rule (sharp RD):

Local randomization near the cutoff: Students just below vs just above 3550 are assumed to be similar in ability except for treatment assignment. The estimand is the causal effect of tutoring for students near the Approaches cutoff.

Previewing the Simulated Dataset

Before moving to diagnostics and estimation, it is useful to inspect the structure of the simulated data.

summary(staar_rd)
##     eoy_2324       tutoring         eoy_2425   
##  Min.   :3109   Min.   :0.0000   Min.   :3031  
##  1st Qu.:3469   1st Qu.:0.0000   1st Qu.:3513  
##  Median :3549   Median :1.0000   Median :3597  
##  Mean   :3549   Mean   :0.5007   Mean   :3599  
##  3rd Qu.:3625   3rd Qu.:1.0000   3rd Qu.:3686  
##  Max.   :3960   Max.   :1.0000   Max.   :4043
head(staar_rd, 8)
##   eoy_2324 tutoring eoy_2425
## 1     3612        0     3519
## 2     3420        1     3544
## 3     3567        0     3744
## 4     3540        1     3608
## 5     3470        1     3575
## 6     3248        1     3442
## 7     3462        1     3746
## 8     3428        1     3280

The output above shows the first eight rows of the dataset, for each student:

Visual Inspection: The First Diagnostic

library(ggplot2)
ggplot(staar_rd, aes(x = eoy_2324, y = eoy_2425, color = factor(tutoring))) +
  geom_point(alpha = 0.6) +
  geom_vline(xintercept = cutoff_score, linetype = "dashed", color = "red") +
  annotate("text", x = cutoff_score + 10, y = max(staar_rd$eoy_2425), 
           label = "Cutoff = 3550", color = "red", hjust = 0) +
  # Local linear regressions per side of cutoff
  geom_smooth(data = subset(staar_rd, eoy_2324 < cutoff_score), 
              method = "lm", se = FALSE, color = "black") +
  geom_smooth(data = subset(staar_rd, eoy_2324 >= cutoff_score), 
              method = "lm", se = FALSE, color = "black") +
  labs(
    x = "
    EOY 2023-24 STAAR EOC
    Algebra I Scale Score",
    y = "
    EOY 2024-25 STAAR EOC
    Algebra I Scale Score",
    title = "Tutoring Effect on STAAR Algebra I",
    color = "Tutoring"
  ) +
  scale_color_manual(values = c("0" = "darkgreen", "1" = "purple"),
                     labels = c("Did Not Take", "Took")) +
  theme_minimal() +
  theme(panel.grid = element_blank(),
        plot.title = element_text(hjust = 0.5))

We can also zoom in on the area around the cutoff score…

ggplot(staar_rd, aes(x = eoy_2324, y = eoy_2425, color = factor(tutoring))) +
  geom_point(alpha = 0.6) +
  geom_vline(xintercept = cutoff_score, linetype = "dashed", color = "red") +
  annotate("text", x = cutoff_score + 10, y = max(staar_rd$eoy_2425), 
           label = "Cutoff = 3550", color = "red", hjust = 0) +
  # Local linear regressions per side of cutoff
  geom_smooth(data = subset(staar_rd, eoy_2324 < cutoff_score), 
              method = "lm", se = FALSE, color = "black") +
  geom_smooth(data = subset(staar_rd, eoy_2324 >= cutoff_score), 
              method = "lm", se = FALSE, color = "black") +
  labs(
    x = "
    EOY 2023-24 STAAR EOC
    Algebra I Scale Score",
    y = "
    EOY 2024-25 STAAR EOC
    Algebra I Scale Score",
    title = "Tutoring Effect on STAAR Algebra I",
    color = "Tutoring"
  ) +
  scale_color_manual(values = c("0" = "darkgreen", "1" = "purple"),
                     labels = c("Did Not Take", "Took")) +
  theme_minimal() +
  theme(panel.grid = element_blank(),
        plot.title = element_text(hjust = 0.5)) +
  coord_cartesian(xlim = c(3500, 3600))

After inspecting the scatterplots, we can observe a regression discontinuity at the cutoff score of 3550 (STAAR EOC Algebra I). This means that among students with similar pretest scores near the cutoff, those who were assigned to and participated in the tutoring program performed better on the posttest than those who were not assigned.

The program appears to have a positive effect, but to determine how large this effect is with statistical accuracy, we need to run formal statistical analyses.

Estimate Regression Discontinuity Metrics

# Load the packages
library(rdrobust) # Implements regression discontinuity (RD) estimation with robust standard errors, allowing us to estimate the causal effect of a program at a cutoff
library(rddensity) # Helps check the density of the running variable around the cutoff, which is important to test for manipulation (no “gaming” of the score)

# Fit a regression discontinuity model: This function estimates the size of the jump at the cutoff, i.e., the causal effect of tutoring on students near the threshold
rd_model <- rdrobust(y = staar_rd$eoy_2425, x = staar_rd$eoy_2324, c = cutoff_score)
summary(rd_model)
## Sharp RD estimates using local polynomial regression.
## 
## Number of Obs.                 3000
## BW type                       mserd
## Kernel                   Triangular
## VCE method                       NN
## 
## Number of Obs.                 1502         1498
## Eff. Number of Obs.             984          984
## Order est. (p)                    1            1
## Order bias  (q)                   2            2
## BW est. (h)                 112.731      112.731
## BW bias (b)                 182.005      182.005
## rho (h/b)                     0.619        0.619
## Unique Obs.                     273          271
## 
## =====================================================================
##                    Point    Robust Inference
##                 Estimate         z     P>|z|      [ 95% C.I. ]       
## ---------------------------------------------------------------------
##      RD Effect   -92.661    -8.287     0.000  [-114.495 , -70.696]   
## =====================================================================

Type of analysis: The results are from a sharp regression discontinuity (RD) design using local linear regression (order 1 polynomial) with a triangular kernel. This is appropriate because treatment (tutoring) was strictly assigned based on the cutoff.

Number of observations: 3000 in total. Left of cutoff: 1502 students. Right of cutoff: 1498 students.

The effective number of observations used in the estimation is smaller (≈984 per side) because RD uses observations close to the cutoff, giving more weight to students near 3550.

Bandwidth: The estimation bandwidth (h) is about 113 points. This means that the RD effect is estimated using students with pretest scores roughly within 113 points of the cutoff on either side.The bias bandwidth (b) is larger (182 points) for calculating bias-corrected inference.

Estimated treatment effect: RD Effect = -92.66. Since in your simulation the tutoring was coded as 1 for students below the cutoff, a negative effect here means that the posttest scores of treated students (below cutoff) are higher than those above the cutoff (because of coding direction).

Robust z-value = -8.287, p < 0.001 → the effect is highly statistically significant.

Confidence interval: 95% CI: [-114.5, -70.7]. This means we are 95% confident that the tutoring program increased posttest scores by roughly 71–115 points for students near the cutoff.

The regression discontinuity analysis shows a large and statistically significant jump at the cutoff (3550). Students who were assigned and took tutoring scored substantially higher on the posttest than similar students just above the cutoff, confirming that the program had a clear positive impact.

Estimate the Effect Size of the Program: How Much Did Tutoring Improve Student Scores?

We compare the tutoring effect to how much scores usually vary. This tells us how big the improvement was in a way that’s easy to understand, not just that the tutoring helped.

# Extract the RD effect estimate (jump at cutoff)
tutoring_effect <- rd_model$coef[1]  # takes the first coefficient from the RD model, which is the estimated jump in posttest scores at the cutoff
tutoring_effect
## [1] -92.66131
# Calculate a standardized effect size (Cohen's d)
sd_post <- sd(staar_rd$eoy_2425) # calculates the standard deviation of the posttest scores

effect_size_d <- tutoring_effect / sd_post # divides the RD effect by this standard deviation. This gives a standardized effect size, also called Cohen’s d, which tells you how many standard deviations the tutoring improved scores.
effect_size_d
## [1] -0.6963258

The RD analysis shows that students who received tutoring scored about 93 points higher at the cutoff compared to similar students who did not receive tutoring. When we standardize this effect using the posttest score variability, we get a Cohen’s d of approximately 0.70, meaning the tutoring increased scores by about 0.7 standard deviations.

Note: Because we coded students below the cutoff as treated (1), the RD effect appears as a negative number (-93), even though the program increases posttest scores as we saw in the scatterplot. A negative estimate here actually corresponds to a positive tutoring effect.

According to Cohen’s conventions, an effect size of 0.2 is small, 0.5 is medium, and 0.8 is large, so 0.7 represents a substantial positive impact of tutoring program on student performance.


Validity Checks for the RD Design

Bandwidth Selection

The code below is used to select the optimal bandwidth for a regression discontinuity (RD) analysis. In RD, the bandwidth determines how many observations near the cutoff are included when estimating the treatment effect. Choosing the right bandwidth is important: using a bandwidth that is too wide can include students far from the cutoff, who are less comparable, while using one that is too narrow can reduce statistical power. The function rdbwselect() from the rdrobust package computes data-driven, optimal bandwidths for RD estimation using standard methods, such as minimizing mean squared error.

bw <- rdbwselect(y = staar_rd$eoy_2425, x = staar_rd$eoy_2324, c = cutoff_score)
bw
## Call: rdbwselect
## 
## Number of Obs.                 3000
## BW type                       mserd
## Kernel                   Triangular
## VCE method                       NN
## 
## Number of Obs.                 1502         1498
## Order est. (p)                    1            1
## Order bias  (q)                   2            2
## Unique Obs.                     273          271

Out of 3,000 students, about 1,502 are below the cutoff and 1,498 above, but the effective number of observations used for estimation is smaller, focusing on students closest to the cutoff (≈273 below and 271 above). The method used to select the bandwidth is mserd (mean squared error–optimal), with a triangular kernel, which gives more weight to students nearer the cutoff. In short, this shows that the analysis focuses on students near the cutoff to get the most precise and unbiased estimate of the tutoring effect.

Density Test (Manipulation Check)

The code below checks whether students’ pretest scores are distributed smoothly around the cutoff, which is a critical assumption for the regression discontinuity design to produce valid causal estimates.

rd_density <- rddensity(staar_rd$eoy_2324, c = cutoff_score)

summary(rd_density)
## 
## Manipulation testing using local polynomial density estimation.
## 
## Number of obs =       3000
## Model =               unrestricted
## Kernel =              triangular
## BW method =           estimated
## VCE method =          jackknife
## 
## c = 3550              Left of c           Right of c          
## Number of obs         1502                1498                
## Eff. Number of obs    721                 875                 
## Order est. (p)        2                   2                   
## Order bias (q)        3                   3                   
## BW est. (h)           77.425              94.121              
## 
## Method                T                   P > |T|             
## Robust                0.7925              0.4281
## 
## P-values of binomial tests (H0: p=0.5).
## 
## Window Length / 2          <c     >=c    P>|T|
## 2.000                      22      34    0.1409
## 4.000                      37      53    0.1133
## 6.000                      56      78    0.0693
## 8.000                      70      98    0.0369
## 10.000                     93     122    0.0559
## 12.000                    119     144    0.1388
## 14.000                    135     161    0.1461
## 16.000                    158     186    0.1454
## 18.000                    177     207    0.1388
## 20.000                    200     229    0.1764

The density test shows that the distribution of pretest scores (eoy_2324) is smooth around the cutoff (3550). The robust test statistic is 0.79 with a p-value of 0.43, well above 0.05, indicating no evidence of manipulation. What we want is p-value greater than 0.05, meaning there is no statistically significant jump in the distribution of scores at the cutoff.

The binomial tests for different window lengths also show p-values mostly above 0.05, confirming that the number of students just below and above the cutoff is roughly balanced. This means students could not “game the cutoff” (they could not manipulate their scores to qualify for tutoring).

Overall, this supports the key RD assumption, so the observed jump in posttest scores can be confidently interpreted as the causal effect of the tutoring program.

Optional: Covariate Check

The code below checks whether other student characteristics are balanced across the cutoff. A non-significant effect indicates that the RD groups are comparable, supporting the validity of the estimated tutoring effect.

covariate <- rnorm(n)
rdrobust <- rdrobust(y = covariate, x = staar_rd$eoy_2324, c = cutoff_score)

summary(rdrobust)
## Sharp RD estimates using local polynomial regression.
## 
## Number of Obs.                 3000
## BW type                       mserd
## Kernel                   Triangular
## VCE method                       NN
## 
## Number of Obs.                 1502         1498
## Eff. Number of Obs.             649          677
## Order est. (p)                    1            1
## Order bias  (q)                   2            2
## BW est. (h)                  67.372       67.372
## BW bias (b)                 108.726      108.726
## rho (h/b)                     0.620        0.620
## Unique Obs.                     273          271
## 
## =====================================================================
##                    Point    Robust Inference
##                 Estimate         z     P>|z|      [ 95% C.I. ]       
## ---------------------------------------------------------------------
##      RD Effect    -0.042    -0.500     0.617    [-0.369 , 0.219]     
## =====================================================================

What we want: If the covariate does not jump at the cutoff, it confirms that students are comparable, and the observed jump in the posttest is likely caused by the tutoring program. Therefore, the estimate should be close to zero, meaning no jump at the cutoff. Likewise, p > 0.05 (not statistically significant) → no evidence of imbalance.

The covariate check shows that the RD estimate for this student-level variable is -0.042 with a p-value of 0.617, which is not statistically significant. The 95% confidence interval includes zero ([-0.369, 0.219]), indicating that the covariate is balanced across the cutoff. Students just below and just above the cutoff are similar in this characteristic, which supports the validity of the RD design and suggests that the jump in posttest scores is due to the tutoring program, not pre-existing differences.


Note: the same Regression Discontinuity analysis can be implemented in SPSS. For further guidance, material from Dr. Wuensch can be consulted for these purposes (see reference list below).

References