Section 1: Short-answer applications

01. A city wants to know whether legal representation (i.e., lawyers) for tenants in eviction court reduces the probability of eviction. The city’s analysis compares tenants who had a lawyer to tenants who appeared without a lawyer.

(a) Using potential outcomes, decompose this comparison into a treatment-effect component and a selection-bias component.

Ans:

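To decompose using potential outcomes, let \(D_i = 1\) if tenant \(i\) had a lawyer and \(D_i = 0\) otherwise, let \(Y_i(1)\) and \(Y_i(0)\) denote tenant \(i\)’s eviction outcome (1 if evicted) with and without a lawyer, and let \(Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0)\) be the observed outcome.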

Then the observed difference in average eviction rates between tenants who had a lawyer and those who did not is:

\[ E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] \]

This equals:

\[ \underbrace{E[Y_i(1) - Y_i(0) \mid D_i = 1]}_{\text{Average treatment effect on treated}} + \underbrace{[E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]]}_{\text{Selection bias component}}. \]
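The identity can be checked numerically. The sketch below is a simulation under purely illustrative assumptions (the `case_strength` variable and every coefficient are hypothetical, not taken from the question). It constructs both potential outcomes for every tenant, which is possible only in simulated data.

```r
# Illustrative simulation; all parameters are assumptions, not estimates.
set.seed(1)
n <- 100000

# Hypothetical unobserved case strength: stronger cases are more likely
# to attract a lawyer and less likely to end in eviction.
case_strength <- rnorm(n)
D  <- rbinom(n, 1, plogis(case_strength))          # 1 = had a lawyer
Y0 <- rbinom(n, 1, plogis(-0.5 - case_strength))   # eviction without a lawyer
Y1 <- rbinom(n, 1, plogis(-1.0 - case_strength))   # eviction with a lawyer
Y  <- ifelse(D == 1, Y1, Y0)                       # observed outcome

observed_diff  <- mean(Y[D == 1]) - mean(Y[D == 0])
att            <- mean(Y1[D == 1] - Y0[D == 1])
selection_bias <- mean(Y0[D == 1]) - mean(Y0[D == 0])

round(c(observed = observed_diff, att = att, selection = selection_bias,
        att_plus_selection = att + selection_bias), 3)
```

Because represented tenants here have stronger cases, the selection term is negative, so the raw comparison is more negative than the ATT and overstates how much lawyers reduce eviction.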

(b) Give one reason the selection-bias component might be positive and one reason it might be negative in this setting.

Ans:

Positive selection bias: Tenants who obtain lawyers may be those facing the most severe eviction threats or weakest positions, driving them to seek legal help as a last resort. If tenants with worse underlying cases are more likely to need and obtain lawyers, then \(E[Y_i(0) \mid D_i = 1] > E[Y_i(0) \mid D_i = 0]\), making the selection-bias component positive. The observed difference then understates the reduction in eviction that lawyers cause, because represented tenants would have been evicted more often even without a lawyer.

Negative selection bias: Tenants who obtain lawyers may have stronger cases or more defensible positions (e.g., landlord violations, procedural errors). If tenants with better underlying cases are more likely to secure legal representation, then \(E[Y_i(0) \mid D_i = 1] < E[Y_i(0) \mid D_i = 0]\), making the selection-bias component negative. Lawyers then appear more effective at preventing eviction than they actually are, because they are working with inherently stronger cases.

(c) Describe an ideal experiment that would target the causal question without changing the policy being studied too much.

Ans:

An ideal experiment would be a randomized controlled trial (RCT): randomly assign free legal representation to a subset of eligible tenants facing eviction, while others proceed as usual. This preserves normal court processes but ensures that having a lawyer is exogenous, isolating the causal effect of legal representation on eviction probability.

02.

(a)

Ans:

If the city compares eviction outcomes by representation status (represented vs. unrepresented), the comparison is observational, not causal.

\[ E[Y(1) \mid D=1] - E[Y(0) \mid D=0] \]

This difference combines the true treatment effect of legal representation with a selection bias component, since tenants who obtain lawyers may differ systematically from those who do not.

(b)

Ans:

If the city compares eviction outcomes by random-assignment status (informed vs. uninformed), the estimate is the Intention‑to‑Treat (ITT) effect:

\[ E[Y \mid Z=1] - E[Y \mid Z=0] \]

This measures the average causal effect of being offered information about the volunteer lawyers, regardless of whether the tenant actually obtains legal representation.

(c)

Ans:

If the randomized information is used as an instrument for receiving legal representation, a two‑stage least squares (2SLS) model identifies the Local Average Treatment Effect (LATE):

\[ \text{LATE} = E[Y(1) - Y(0) \mid \text{Compliers}] \]

This is the effect of legal representation for tenants whose decision to obtain a lawyer was influenced by being informed.
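With a single binary instrument, the 2SLS estimate reduces to the Wald ratio of the ITT effect to the first stage. The following is a minimal simulation sketch of that relationship; the compliance shares and eviction probabilities are assumptions chosen for illustration, and the compliance types are observable only because the data are simulated.

```r
# Illustrative simulation; compliance shares and effect sizes are assumptions.
set.seed(2)
n <- 200000

Z <- rbinom(n, 1, 0.5)   # randomized information about volunteer lawyers
type <- sample(c("complier", "always", "never"), n, replace = TRUE,
               prob = c(0.4, 0.2, 0.4))
# Representation status implied by each compliance type
D <- ifelse(type == "always", 1, ifelse(type == "complier", Z, 0))

# Potential eviction outcomes; lawyers help compliers more than always-takers
Y0 <- rbinom(n, 1, 0.6)
Y1 <- rbinom(n, 1, ifelse(type == "complier", 0.3, 0.5))
Y  <- ifelse(D == 1, Y1, Y0)

itt         <- mean(Y[Z == 1]) - mean(Y[Z == 0])    # reduced form (ITT)
first_stage <- mean(D[Z == 1]) - mean(D[Z == 0])    # estimated share of compliers
wald        <- itt / first_stage                    # IV estimate with one binary instrument
late_true   <- mean((Y1 - Y0)[type == "complier"])  # known only in simulation

round(c(itt = itt, first_stage = first_stage, wald = wald,
        late_true = late_true), 3)
```

The ITT is diluted by never-takers and always-takers, whose representation status does not respond to the information; dividing by the first stage rescales it to the complier effect.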

(d)

Ans:

No, the 2SLS regression does not compare all represented versus all unrepresented tenants. It identifies the causal effect only for compliers (tenants who take up legal aid because they were offered the information) and excludes always‑takers and never‑takers.

(e)

Ans:

The proposed instrument (the randomized information) satisfies the three IV requirements:

  1. Relevance: Randomly informing tenants increases the probability of obtaining legal representation.
  2. Exogeneity: The random assignment of information is independent of unobserved tenant characteristics.
  3. Exclusion restriction: The information affects eviction outcomes only through its effect on obtaining legal representation.

03 A researcher estimates a regression of wages on education and many covariates. She says, “I am not worried about causality; least squares estimates the CEF, and the CEF is causal.”

Explain what is right and wrong in her statement.

Ans:

What is right:

What is wrong:

04.

Ans:

  1. age: Include as a control.

Age may affect both participation (older workers may be less likely to enroll) and earnings potential.
If not controlled, omitted‑variable bias could occur, since age is likely correlated with both training participation and income.

  2. pre-program ability score: Include as a control.

This reflects pre‑existing ability that affects both program participation and later earnings. Controlling for it helps isolate the program’s effect from variation due to initial ability.

  3. post-program ability score: Exclude.

It is likely affected by the training itself and could be a bad control. Including it would “block” part of the treatment effect, biasing the estimate downward because it lies on the causal path between training and earnings.

  4. industry: Think carefully before including.

Industry measured after the program may be endogenous: training might influence which industry someone works in.
If industry assignment is itself an outcome, controlling for it would again remove part of the treatment’s causal effect. If measured before training, it could be used as a control; otherwise, exclude.

  5. employed: Exclude.

Employment status is a likely outcome (perhaps intermediate) of training. Controlling for it would induce bad‑control bias because it mediates part of the program’s effect on earnings.

  6. caseworker: Include only with caution.

Caseworker assignment could be treated as a fixed effect if caseworkers influence both the likelihood of participation and future earnings but are unrelated to unobserved worker traits. If caseworker assignment is correlated with unobserved ability or motivation, controlling for it may introduce bias. It can sometimes serve as a proxy variable if random or exogenous.

05

  1. Which variables are confounders, mediators, and colliders?

Ans:

Confounders:
Variables that affect both D and Y.
- E and C are confounders because each influences D and also has a path to Y (directly or through other nodes).

Mediators:
Variables that lie on the causal path from D to Y.
- A and G are mediators.

Colliders:
Variables where two arrows collide (two incoming arrows).
- F is a collider on paths E → F ← G.
- B is a collider on Y → B ← D.

  2. What control set would you use if your goal is the total effect of D on Y?

Ans:

To estimate the total effect, we should control for confounders but not mediators or colliders.

Control set: {E, C}

This blocks backdoor confounding paths without closing causal (mediating) paths such as D → A → Y.

  3. What control set would you use if your goal is the direct effect of D on Y?

Ans:

To estimate the direct effect, we want to remove the indirect (mediated) channels.

Control set: {E, C, A, G}

This controls for confounding and also holds the mediators fixed, isolating the direct link between D and Y.

  4. What happens if you condition on F?

Ans:

F is a collider on the path E → F ← G.
Conditioning on F opens this blocked path, creating a spurious association between E and G.

That, in turn, opens a non‑causal backdoor path from D to Y through E–F–G–Y, biasing the estimated effect of D.
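Collider bias of this kind is easy to reproduce in a toy simulation. The sketch below uses hypothetical variables `e`, `g`, and `f` that mirror only the \(E \to F \leftarrow G\) fragment; it is not the full DAG from the question, and the coefficients are assumptions.

```r
# Toy collider example; variables and coefficients are illustrative assumptions.
set.seed(3)
n <- 50000
e <- rnorm(n)           # stands in for E
g <- rnorm(n)           # stands in for G, independent of e by construction
f <- e + g + rnorm(n)   # collider: e -> f <- g

# Without conditioning on f, e and g are unrelated
marginal_cor <- cor(e, g)

# Conditioning on f (here by adding it as a regressor) induces an association
coef_given_f <- unname(coef(lm(g ~ e + f))["e"])

round(c(marginal_cor = marginal_cor, coef_given_f = coef_given_f), 3)
```

The marginal correlation is close to zero, while the coefficient on `e` becomes clearly negative once `f` is held fixed: among units with the same `f`, a larger `e` must be offset by a smaller `g`.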

06.

(a) Is this mismatch between treated and control groups a problem? Hint: Consider the propensity-score theorem.

Ans: Some mismatch is not necessarily a problem.
According to the propensity‑score theorem, if treatment assignment is strongly ignorable given the covariates \(X\), then it is also ignorable given the true propensity score \(p(X)\).

However, in practice the propensity score must be estimated. If the estimated scores do not perfectly balance covariates, it means that the propensity‑score model may be misspecified.

Thus, large imbalances after matching suggest a failure of the balancing property and raise concerns about bias.

(b) What would you check or change before trusting the estimate?

Ans:

Before trusting the estimate, we should:

If the model achieves balance on all relevant covariates, the matching estimate becomes more credible.

(c) Suppose we trust the estimate. What kind of treatment effect does it recover?

Ans:

If we trust the estimate after verifying balance and overlap, the matching estimator recovers the Average Treatment Effect on the Treated (ATT):

\[ E[Y(1) - Y(0) \mid D = 1] \]

This is because matching compares each treated unit to a similar untreated counterpart rather than to the full control population.

07 (a) Explain which estimand each estimator is trying to recover.

Ans:

Estimator 1: This is the Inverse Probability Weighted (IPW) estimator for the Average Treatment Effect (ATE). Each observation is weighted by the inverse probability of receiving the treatment it actually received, ensuring that the reweighted sample represents the full population (as if treatment were randomly assigned).

Formally, it estimates: \[ E[Y(1) - Y(0)] \]

Estimator 2: This weighting scheme targets the Average Treatment Effect on the Treated (ATT). It leaves the treated group unweighted and reweights the control group so its covariate distribution matches that of the treated population.

Formally, it estimates: \[ E[Y(1) - Y(0) \mid D = 1] \]
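The two weighting schemes can be compared on simulated data where both estimands are known. This is a sketch under stated assumptions: the data-generating process is invented for illustration, the propensity score is estimated with a correctly specified logit, and the weights are normalized within each group (via `weighted.mean`), which may differ from the exact form of the estimators in the question.

```r
# Illustrative simulation; the DGP and normalized weights are assumptions.
set.seed(4)
n <- 50000
X   <- rnorm(n)
D   <- rbinom(n, 1, plogis(0.8 * X))
tau <- 1 + X                 # heterogeneous effect, so ATE and ATT differ
Y0  <- X + rnorm(n)
Y   <- Y0 + tau * D

# Estimated propensity score
p_hat <- fitted(glm(D ~ X, family = binomial()))

# ATE weights: 1/p for treated units, 1/(1 - p) for control units
ate_hat <- weighted.mean(Y[D == 1], 1 / p_hat[D == 1]) -
  weighted.mean(Y[D == 0], 1 / (1 - p_hat[D == 0]))

# ATT weights: 1 for treated units, p/(1 - p) for control units
att_hat <- mean(Y[D == 1]) -
  weighted.mean(Y[D == 0], p_hat[D == 0] / (1 - p_hat[D == 0]))

ate_true <- mean(tau)           # known only in simulation
att_true <- mean(tau[D == 1])   # known only in simulation

round(c(ate_hat = ate_hat, ate_true = ate_true,
        att_hat = att_hat, att_true = att_true), 3)
```

Only the control weights \(p/(1-p)\) appear in the ATT version, so that estimator is sensitive to propensity scores near 1 but not to scores near 0.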

(b) Is the overlap requirement different for the two approaches? Explain why or why not.

Ans:

The overlap/common support requirement is conceptually similar in both cases:

we need \(0 < \hat{p}(X_i) < 1\) for all observations so that each treatment group has comparable units in the other.

However, in practice:

Thus, the ATT estimator is somewhat less demanding in terms of overlap, because it focuses on the part of the covariate space where treated units are found.

08 (a) Why is the first stage not sufficient to prove the IV design is valid?

Ans:

The first stage only shows that the letter affects program participation (instrument relevance).

However, a valid IV must also satisfy:

The first stage confirms the letter changes participation, but it cannot prove the exclusion restriction.

If the letter itself changes households’ awareness or motivation to save energy, it directly affects energy use, violating exclusion.

Hence, a strong first stage is necessary but not sufficient for a valid instrument.

(b) Why would we expect the 2SLS estimate to differ from the ATE? Under what conditions would the two estimates be equal?

Ans:

2SLS estimates the Local Average Treatment Effect (LATE) which is the effect of the program on compliers, i.e., households that join because they receive the letter.

The Average Treatment Effect (ATE) is the effect for all households.

We expect the 2SLS estimate to differ from the ATE because:

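The two estimates would be equal if treatment effects are homogeneous, meaning all households experience the same energy savings from joining the program, or more generally if the average effect among compliers happens to equal the average effect in the full population (for example, under full compliance, where every household is a complier).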

(c) Suppose the letter also includes energy-saving tips. Which IV assumption(s) is threatened?

Ans:

If the letter includes energy‑saving tips, households may change their energy use even without joining the program.
This violates the exclusion restriction, because the instrument now directly affects the outcome through its content rather than only through program participation.
The instrument no longer isolates exogenous variation in treatment alone, making the IV design invalid.

Q9.

Ans:

This difference can occur even if both instruments are valid because each identifies a different Local Average Treatment Effect (LATE).
Each instrument affects a different subgroup of the population:

If treatment effects are heterogeneous across income groups, the two IV estimates will differ even though both are internally valid. This difference is therefore consistent with theory, not a sign of invalidity.

Why averaging the two IV estimates would be misleading:

Averaging the two IV estimates assumes each instrument identifies the same underlying treatment effect. That assumption fails here because the LATEs apply to different groups.

Averaging would misrepresent the causal effect for any particular population and obscure meaningful heterogeneity in the program’s impact.

Q10.

(a) Is this RD sharp or fuzzy? Explain your answer.

Ans:

This is a fuzzy RD design. In a sharp RD, treatment status changes from 0 to 1 exactly at the cutoff. Here, students can retake the test, and not everyone above 70 necessarily receives the scholarship. The presence of manipulation and imperfect compliance means treatment assignment is correlated with, but not perfectly determined by, the running variable. Hence, the probability of treatment changes discontinuously but not perfectly, making it a fuzzy RD.

(b) Which pieces of evidence threaten the RD design? Explain your answer.

Ans:

Two features threaten the validity of this RD:

Bunching just above 70: This indicates students are manipulating their scores to cross the threshold. Manipulation breaks the key RD assumption that units just below and just above the cutoff are comparable. The unusually high number of scores just above the cutoff suggests that people are trying to qualify for treatment, which violates the assumption that assignment is as good as random around the cutoff.

Large jump in parental education: This suggests that higher-educated parents help children retake or prepare better for the test, meaning that the composition of students above and below the cutoff differs systematically. This also violates the continuity assumption that potential outcomes vary smoothly at the cutoff.

Q11.

How the researcher should think about this disagreement

Ans:

This difference arises because the two specifications use the same RD identification idea but make different trade-offs between bias and variance.

  1. Large bandwidth, quadratic specification: this uses many observations far from the cutoff and imposes a functional form (quadratic) over a wide range that may not fit the true relationship well. If the functional form is misspecified, it can produce biased estimates, even with small standard errors.

  2. Small bandwidth, local linear specification: this focuses on observations very close to the cutoff, where the RD design is most credible. It reduces bias from functional form misspecification but increases sampling variability (wider standard errors). A smaller, insignificant estimate near the cutoff is often more reliable because it captures the local treatment effect with fewer assumptions.

How should the researcher proceed?

The main goal of RD is to estimate the local treatment effect at the cutoff. The local linear (smaller bandwidth) estimate should generally be preferred because it relies less on global functional form assumptions. The large, significant estimate obtained with the quadratic specification may be spurious, driven by extrapolation away from the cutoff or by incorrect curvature assumptions.

The researcher can check robustness by:
- Trying several bandwidths and plotting the results.
- Using visual diagnostics (e.g., RD plots) to assess fit near the cutoff.
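A bandwidth-sensitivity check along these lines can be sketched in a few lines of base R. Everything below is an assumption made for illustration: the simulated data (a true jump of 0.3 plus a cubic trend), the helper name `rd_local_linear`, the uniform kernel, and the grid of bandwidths. In applied work a dedicated package such as `rdrobust` would normally choose the bandwidth and compute bias-corrected inference.

```r
# Illustrative simulation; the DGP, helper name, and bandwidth grid are assumptions.
set.seed(5)
n <- 50000
r <- runif(n, -1, 1)                            # running variable, cutoff at 0
z <- as.numeric(r >= 0)                         # above-cutoff indicator
y <- 0.3 * z + 1.5 * r^3 + rnorm(n, sd = 0.3)   # true jump 0.3, curved trend

# Local linear RD with a uniform kernel: separate intercepts and slopes per side
rd_local_linear <- function(y, r, z, h) {
  keep <- abs(r) <= h
  fit  <- lm(y ~ z * r, subset = keep)
  est  <- summary(fit)$coefficients["z", c("Estimate", "Std. Error")]
  c(bandwidth = h, n = sum(keep),
    estimate = est[["Estimate"]], se = est[["Std. Error"]])
}

bandwidths  <- c(0.1, 0.2, 0.4, 0.8)
sensitivity <- as.data.frame(
  t(sapply(bandwidths, function(h) rd_local_linear(y, r, z, h)))
)
sensitivity
```

In this simulated example the narrow windows stay close to the true jump with larger standard errors, while the widest window is precise but pulled away from 0.3 because a straight line cannot absorb the curvature far from the cutoff.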

Q12.

Ans:

The most likely explanation is that the standard errors are underestimated, which makes the confidence intervals too narrow. Even though the estimator is unbiased (the average estimate is close to the true value), incorrect standard errors lead to overly optimistic inference.

Q13.

Ans:

(a) Explain why this argument is wrong.

The argument is wrong because the randomization occurs at the village level, not at the individual level. This means treatment status is identical for all individuals within a village, creating within-cluster correlation.

As a result, outcomes of individuals in the same village are not independent, and the effective sample size is only about 30, not 20,000. Using IID (individual-level) standard errors therefore massively understates the true sampling variability, leading to overstated statistical significance.

Therefore, inference must account for cluster-level dependence by clustering standard errors at the village level.

(b) Explain how you could use a permutation test in this setting.

A permutation (randomization) test can exploit the fact that treatment is randomized across villages.

Steps:
1. Compute the observed difference in average outcomes between treated and control villages.
2. Randomly reassign treatment to villages (keeping the number of treated and control villages fixed).
3. For each reassignment, recompute the mean difference.
4. Build the empirical distribution of these placebo differences.
5. Compare the observed treatment effect to this distribution to obtain a p-value.

This approach gives valid inference under the actual randomization scheme and does not rely on large-sample or independence assumptions.
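The steps above can be sketched as follows. The data are simulated under assumptions made for illustration (30 villages split 15/15, a true effect of 0.5, and outcomes already aggregated to village means); none of these numbers come from the study itself.

```r
# Illustrative simulation; village count, split, and effect size are assumptions.
set.seed(6)
n_villages <- 30
treated <- sample(rep(c(0, 1), each = n_villages / 2))   # village-level assignment

# Hypothetical village-level mean outcomes with a true effect of 0.5
village_mean <- 0.5 * treated + rnorm(n_villages)

diff_in_means <- function(d, y) mean(y[d == 1]) - mean(y[d == 0])
observed <- diff_in_means(treated, village_mean)

# Re-randomize treatment across villages, keeping 15 treated and 15 control
n_perm  <- 5000
placebo <- replicate(n_perm, diff_in_means(sample(treated), village_mean))

# Two-sided p-value: share of placebo differences at least as extreme as observed
p_value <- mean(abs(placebo) >= abs(observed))
c(observed = observed, p_value = p_value)
```

Working with village means keeps the unit of permutation equal to the unit of randomization; with unequal village sizes one would instead permute village labels in the individual-level data and recompute the statistic each time.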

(c) Would bootstrapping work in this case? Explain.

Bootstrapping could work only if done at the correct level of randomization. That is, by resampling villages, not individuals.

Thus, cluster (village-level) bootstrapping is conceptually valid but may perform poorly in small samples.
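A village-level (cluster) bootstrap can be sketched as below, again on simulated data. The village count, village size, effect size, and variance components are assumptions made for illustration. Resampling whole villages is implemented through village-level sums and counts, which gives the same difference in individual-level means as stacking the resampled villages.

```r
# Illustrative simulation; all design parameters are assumptions.
set.seed(7)
n_villages <- 30
n_per      <- 50
village <- rep(seq_len(n_villages), each = n_per)
treated_village <- sample(rep(c(0, 1), each = n_villages / 2))
d <- treated_village[village]
# Outcome = treatment effect + shared village shock + individual noise
y <- 0.5 * d + rnorm(n_villages)[village] + rnorm(n_villages * n_per)

# Village-level sums and counts (ordered by village id)
v_sum <- as.numeric(tapply(y, village, sum))
v_n   <- as.numeric(tapply(y, village, length))

# One bootstrap draw: resample villages with replacement, recompute the difference
boot_once <- function() {
  draw  <- sample(n_villages, replace = TRUE)
  is_tr <- treated_village[draw] == 1
  sum(v_sum[draw][is_tr]) / sum(v_n[draw][is_tr]) -
    sum(v_sum[draw][!is_tr]) / sum(v_n[draw][!is_tr])
}
boot_est <- replicate(500, boot_once())
se_cluster_boot <- sd(boot_est)

# Naive IID standard error for comparison
se_iid <- sqrt(var(y[d == 1]) / sum(d == 1) + var(y[d == 0]) / sum(d == 0))

round(c(se_cluster_boot = se_cluster_boot, se_iid = se_iid), 3)
```

The cluster bootstrap standard error is several times the IID one in this setup, reflecting the shared village shock that individual-level resampling would ignore.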

Q14.

  1. Which part of the potential-outcomes framework is threatened?

Ans:

The Stable Unit Treatment Value Assumption (SUTVA) is threatened, specifically its no-interference component, because students’ outcomes may be affected by their classmates’ treatment status.

  2. How would this threat change the meaning of \(Y_{1i}\) and \(Y_{0i}\)?

Ans:

Each potential outcome now depends not only on student \(i\)’s own treatment but also on others’ treatment assignments within the same class. Thus, \(Y_{1i}\) and \(Y_{0i}\) no longer represent well-defined individual potential outcomes; they vary depending on the treatment mix of peers.

  3. Explain how you might update the proposed research design and analysis to make the causal question clearer.

Ans:

To address interference, the researcher could:

Q15.

Ans:

AI can predict outcomes well but cannot identify causal effects without knowing how treatment is assigned. The fundamental problem of causal inference is that for each unit, we never observe both potential outcomes but only one of them. Prediction models learn correlations, not causal relationships.

Unless data come from an experiment or a valid causal design (e.g., IV, RD, diff-in-diff), AI cannot distinguish whether relationships are due to treatment or confounding factors. No algorithm can infer the missing counterfactual without assumptions about the data-generating process or treatment assignment mechanism.

Section 2: A simulated evaluation

Q16 .

Ans:

The causal question would be:

“What is the causal effect of participating in the summer bridge program (treatment \(D_i = 1\)) on a student’s first-year GPA (\(Y_i\))?”

Formally, the question asks for the average difference in outcomes if the same student were to attend versus not attend the bridge program: \[ E[Y_i(1) - Y_i(0)]. \]

An ideal experiment would be to randomly assign incoming students to either:

This randomization would eliminate confounding from unobserved factors such as motivation or ability (\(U_i\)), ensuring that any difference in first-year GPA can be causally attributed to participation in the program.

Q17.

Ans:

The Average Treatment Effect (ATE) is the expected value of \(\tau_i\):

\[ E[\tau_i] = 0.25 + 0.10E[U_i] \]

Since \(E[U_i] = 0\),

\[ \boxed{ATE = 0.25} \]
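The result can be checked by simulation. The sketch below reuses the attendance and treatment-effect equations from this document's simulation code, with a larger sample and a different seed (both chosen here for illustration). It also computes the average effect among attendees, which is observable only because the data are simulated.

```r
# Simulation check; sample size and seed are illustrative choices.
set.seed(8)
n <- 1e6
X <- rnorm(n)
U <- rnorm(n)
R <- 0.7 * X + rnorm(n)
Z <- as.numeric(R >= 0)
D <- as.numeric(-0.2 + 0.8 * Z + 0.4 * X + 0.5 * U + rnorm(n) >= 0)

tau <- 0.25 + 0.10 * U       # individual treatment effect

ate_sim <- mean(tau)         # average over all students
att_sim <- mean(tau[D == 1]) # average over attendees only

round(c(ate_sim = ate_sim, att_sim = att_sim), 3)
```

Because attendance depends positively on \(U\) and \(\tau_i\) increases in \(U\), attendees have above-average treatment effects, so the ATT exceeds the ATE of 0.25 in this design.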

Q18.

# Load required packages
library(dagitty)
library(ggdag)
library(ggplot2)
library(dplyr)
library(rdrobust)
library(AER)
# Q18.

dag <- dagitty("
dag {
  X [pos=\"0,2\"]
  U [pos=\"1,3\"]
  R [pos=\"1,1\"]
  Z [pos=\"2,0\"]
  D [pos=\"3,1\"]
  M [pos=\"4,3\"]
  Y [pos=\"5,2\"]

  X -> R
  R -> Z
  Z -> D
  X -> D
  U -> D
  D -> M
  U -> M
  X -> M
  D -> Y
  X -> Y
  U -> Y
  R -> Y
}
")

# Create the plot with a single label layer
ggdag(dag) +                      # no use_labels and no text argument
  geom_dag_text(color = "white") + # render node labels (one set only)
  theme_dag_blank() +
  labs(
    title = "DAG for Summer Bridge Program Evaluation",
    subtitle = "Data Generating Process"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5)
  )

Q19.

Ans:

The econometrician observes:
\(X, R, Z, D, M, Y\)

The econometrician does not observe: \(U\)

\(U\) represents unobserved student ability or motivation that affects attendance (\(D\)), study habits (\(M\)), and GPA (\(Y\)).

Why This Matters:

(i). Regression with controls

A regression of \(Y\) on \(D\) can condition only on observed covariates. The pre-treatment variables \(X\) and \(R\) are appropriate controls (\(Z\) is a deterministic function of \(R\)), while \(M\) is post-treatment and should be left out. Since \(U\) is omitted and correlated with both \(D\) and \(Y\), the regression suffers from omitted-variable bias. Thus, regression alone will not yield a consistent estimate of the treatment effect.

(ii). Matching or propensity-score weighting

These methods adjust for observable covariates like \(X\) or \(R\). However, \(U\) remains unobserved, and since it influences both participation and outcomes, matching will not eliminate the bias due to selection on unobservables.

(iii). IV using \(Z\) as an instrument

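\(Z\) (eligibility) affects \(D\) but is not directly caused by \(U\). If the exclusion restriction holds (i.e., \(Z\) affects \(Y\) only through \(D\)), IV can correct for endogeneity stemming from \(U\). In this DGP \(Z\) is a function of \(R\), and \(R\) (together with \(X\)) also affects \(Y\) directly, so the instrument is credible only after controlling for \(R\) or when the comparison is restricted to a neighborhood of the cutoff. Under those conditions, this approach can produce a consistent estimate of the local average treatment effect (LATE).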

(iv). Regression Discontinuity (RD) using \(R\) as the running variable

RD relies on the cutoff in \(R\) determining \(Z\). If \(U\) changes smoothly around the cutoff, comparisons just above and below it approximate random assignment. Thus, RD can identify the causal effect for students near the threshold, even though \(U\) is unobserved.
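These identification arguments can be checked mechanically with `dagitty`. The sketch below re-declares the same edges as the DAG for this evaluation and marks \(U\) as latent; the object names `dag_latent`, `adj_sets`, and `iv_sets` are introduced here for illustration.

```r
library(dagitty)

# Same edges as the evaluation DAG, with U declared unobserved
dag_latent <- dagitty("dag {
  U [latent]
  X -> R -> Z -> D -> Y
  X -> D
  U -> D
  D -> M
  U -> M
  X -> M
  X -> Y
  U -> Y
  R -> Y
}")

# Backdoor adjustment sets that use observed variables only
adj_sets <- adjustmentSets(dag_latent, exposure = "D", outcome = "Y")
length(adj_sets)   # a length of zero means no observed set closes every backdoor path

# Candidate (possibly conditional) instruments for the effect of D on Y
iv_sets <- instrumentalVariables(dag_latent, exposure = "D", outcome = "Y")
iv_sets
```

An empty list of adjustment sets corresponds to points (i) and (ii): with \(U\) latent, the path \(D \leftarrow U \rightarrow Y\) cannot be blocked by conditioning on observables. Any instrument reported comes with the conditioning set it requires, which is where the role of \(R\) shows up.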

Q20.

# Set the seed for reproducibility
set.seed(6072026)

# Number of students
n <- 5000

# Generate exogenous variables
X <- rnorm(n, 0, 1)
U <- rnorm(n, 0, 1)
nu <- rnorm(n, 0, 1)

# Running variable and eligibility
R <- 0.7 * X + nu
Z <- ifelse(R >= 0, 1, 0)

# Latent attendance and realized attendance
xi <- rnorm(n, 0, 1)
D_star <- -0.2 + 0.8 * Z + 0.4 * X + 0.5 * U + xi
D <- ifelse(D_star >= 0, 1, 0)

# Individual treatment effect
tau <- 0.25 + 0.10 * U

# Outcome (first-year GPA)
eps <- rnorm(n, 0, 0.5)
Y <- 2.4 + tau * D + 0.35 * X + 0.40 * U + 0.10 * R + eps

# Post-treatment study-habits measure
eta <- rnorm(n, 0, 1)
M <- 0.4 * D + 0.3 * X + 0.3 * U + eta

# Combine into a data frame (as observed by the econometrician: X, R, Z, D, Y, M)
data_dgp <- data.frame(Y, D, Z, R, X, M)

# Display basic summary statistics
summary(data_dgp)
##        Y                 D                Z                R            
##  Min.   :-0.5683   Min.   :0.0000   Min.   :0.0000   Min.   :-4.471241  
##  1st Qu.: 1.9670   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:-0.851615  
##  Median : 2.5633   Median :1.0000   Median :0.0000   Median :-0.014848  
##  Mean   : 2.5567   Mean   :0.5702   Mean   :0.4954   Mean   :-0.006408  
##  3rd Qu.: 3.1305   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.: 0.816493  
##  Max.   : 5.7813   Max.   :1.0000   Max.   :1.0000   Max.   : 4.438084  
##        X                   M          
##  Min.   :-3.462979   Min.   :-3.7440  
##  1st Qu.:-0.674040   1st Qu.:-0.5356  
##  Median : 0.007856   Median : 0.2459  
##  Mean   :-0.003644   Mean   : 0.2418  
##  3rd Qu.: 0.669703   3rd Qu.: 1.0123  
##  Max.   : 3.950339   Max.   : 4.1844

Q21. Ans:

# Estimate the naive regression: Y on D only
model_naive <- lm(Y ~ D, data = data_dgp)

# Display the regression results
summary(model_naive)
## 
## Call:
## lm(formula = Y ~ D, data = data_dgp)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.82834 -0.47887 -0.00819  0.49262  2.86076 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.07395    0.01586  130.79   <2e-16 ***
## D            0.84663    0.02100   40.32   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7351 on 4998 degrees of freedom
## Multiple R-squared:  0.2454, Adjusted R-squared:  0.2453 
## F-statistic:  1625 on 1 and 4998 DF,  p-value: < 2.2e-16

From the results:

\[ \hat{\beta}_1 = 0.8466 \]

Interpretation: Students who attended the bridge program (\(D = 1\)) have, on average, a 0.85 point higher first-year GPA than those who did not attend (\(D = 0\)).

Should it recover the ATE from the DGP?

No. The naive regression does not recover the true Average Treatment Effect (ATE) because program attendance \(D\) is endogenous. It depends partly on the unobserved factor \(U\), which also affects GPA (\(Y\)). This induces selection bias, meaning the estimated \(\hat{\beta}_1\) captures both the treatment effect and pre-existing differences in ability or motivation, not the causal effect alone.

Q22.

Ans:

# Run the controlled regression
model_controls <- lm(Y ~ D + X + R, data = data_dgp)

# Display the summary of the model
summary(model_controls)
## 
## Call:
## lm(formula = Y ~ D + X + R, data = data_dgp)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.44197 -0.42909  0.00136  0.44728  2.21781 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.232827   0.014766 151.211  < 2e-16 ***
## D           0.570756   0.020189  28.271  < 2e-16 ***
## X           0.314194   0.011435  27.476  < 2e-16 ***
## R           0.066861   0.009267   7.215  6.2e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6522 on 4996 degrees of freedom
## Multiple R-squared:  0.4061, Adjusted R-squared:  0.4058 
## F-statistic:  1139 on 3 and 4996 DF,  p-value: < 2.2e-16

| Model | Specification | \(\hat{\beta}_1\) (estimate on \(D\)) |
|---|---|---|
| Naive OLS | \(Y_i = \beta_0 + \beta_1 D_i + e_i\) | 0.847 |
| OLS with \(X\), \(R\) controls | \(Y_i = \beta_0 + \beta_1 D_i + \beta_2 X_i + \beta_3 R_i + e_i\) | 0.571 |

Interpretation:

The estimated effect of program attendance drops substantially from 0.85 to 0.57 after adding controls for \(X\) and \(R\).

Sources of Bias:

So adding observed controls moves the estimate closer to the true effect but does not fully eliminate bias because unobserved motivation \(U\) remains a confounder.
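One way to confirm that the remaining gap comes from \(U\) is to run an infeasible "oracle" regression that includes it. This is possible only in simulation, since the econometrician never observes \(U\). The sketch regenerates data from the same equations with a larger sample and a different seed (illustrative choices), and interacts \(D\) with \(U\) because the treatment effect depends on \(U\).

```r
# Oracle check; sample size and seed are illustrative choices.
set.seed(10)
n <- 50000
X <- rnorm(n)
U <- rnorm(n)
R <- 0.7 * X + rnorm(n)
Z <- as.numeric(R >= 0)
D <- as.numeric(-0.2 + 0.8 * Z + 0.4 * X + 0.5 * U + rnorm(n) >= 0)
Y <- 2.4 + (0.25 + 0.10 * U) * D + 0.35 * X + 0.40 * U + 0.10 * R +
  rnorm(n, 0, 0.5)

# Feasible regression: observed controls only
feasible <- unname(coef(lm(Y ~ D + X + R))["D"])

# Infeasible regression: adds U and the D:U interaction
# The coefficient on D is the effect at U = 0, which equals the ATE since E[U] = 0
oracle <- unname(coef(lm(Y ~ D * U + X + R))["D"])

round(c(feasible = feasible, oracle = oracle), 3)
```

The oracle coefficient sits near 0.25 while the feasible one remains well above it, which isolates the omitted \(U\) as the source of the bias.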

Q23.

Ans:

model_rich <- lm(Y ~ D + X + R + M, data = data_dgp)

# Display model summary
summary(model_rich)
## 
## Call:
## lm(formula = Y ~ D + X + R + M, data = data_dgp)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.59077 -0.41920 -0.01102  0.43970  2.15348 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.243879   0.014572 153.981  < 2e-16 ***
## D           0.505490   0.020570  24.574  < 2e-16 ***
## X           0.283689   0.011529  24.606  < 2e-16 ***
## R           0.071060   0.009134   7.779 8.79e-15 ***
## M           0.107846   0.008689  12.412  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6425 on 4995 degrees of freedom
## Multiple R-squared:  0.4239, Adjusted R-squared:  0.4234 
## F-statistic: 918.9 on 4 and 4995 DF,  p-value: < 2.2e-16

After adding the post-treatment variable \(M\) to the model, the estimated coefficient on \(D\) decreased from 0.571 (in Q22) to 0.505.

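This coefficient should not be interpreted as the total effect of program attendance because \(M\) is a post-treatment variable, not a pre-treatment confounder. In the DAG and the simulation code, \(M\) is caused by \(D\), \(X\), and \(U\) but has no arrow into \(Y\), so it is not a mediator; it is a collider on \(D \to M \leftarrow U\) and a noisy proxy for \(U\). Conditioning on \(M\) therefore changes how much of the \(U\)-driven selection loads onto \(D\) rather than isolating any causal channel, and the resulting coefficient (still well above the true ATE of 0.25) has no clean causal interpretation.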

The RD and IV design

Q24.

# Set parameters for binning
binwidth <- 0.25  # width of each bin
cutoff <- 0        # RD cutoff

# 1. TREATMENT (Attendance D) by Running Variable (R)

binned_D <- data_dgp %>%
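  # Align bin edges with the cutoff so no bin mixes eligible and ineligible students
  mutate(bin = cut(R, breaks = cutoff + seq(floor((min(R) - cutoff) / binwidth),
                                            ceiling((max(R) - cutoff) / binwidth)) * binwidth,
                   right = FALSE)) %>%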
  group_by(bin) %>%
  summarise(
    R_bin = mean(R, na.rm = TRUE),
    D_bin = mean(D, na.rm = TRUE),
    n = n()
  )

p1 <- ggplot(binned_D, aes(x = R_bin, y = D_bin)) +
  geom_point(color = "steelblue") +
  geom_vline(xintercept = cutoff, linetype = "dashed", color = "black") +
  labs(
    title = "Program Attendance by Running Variable (Binned Means)",
    x = "Running Variable (R)",
    y = "Mean Attendance (Pr(D = 1))"
  ) +
  theme_minimal()


# 2. OUTCOME (First-Year GPA Y) by Running Variable (R)

binned_Y <- data_dgp %>%
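  # Align bin edges with the cutoff so no bin mixes eligible and ineligible students
  mutate(bin = cut(R, breaks = cutoff + seq(floor((min(R) - cutoff) / binwidth),
                                            ceiling((max(R) - cutoff) / binwidth)) * binwidth,
                   right = FALSE)) %>%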
  group_by(bin) %>%
  summarise(
    R_bin = mean(R, na.rm = TRUE),
    Y_bin = mean(Y, na.rm = TRUE),
    n = n()
  )

p2 <- ggplot(binned_Y, aes(x = R_bin, y = Y_bin)) +
  geom_point(color = "darkgreen") +
  geom_vline(xintercept = cutoff, linetype = "dashed", color = "black") +
  labs(
    title = "Outcome by Running Variable (Binned Means)",
    x = "Running Variable (R)",
    y = "Average GPA (Y)"
  ) +
  theme_minimal()


# 3. DENSITY (Counts) of Running Variable (R) near Cutoff

p3 <- ggplot(data_dgp, aes(x = R)) +
  geom_histogram(binwidth = binwidth, boundary = cutoff, fill = "gray70", color = "black") +
  geom_vline(xintercept = cutoff, linetype = "dashed", color = "red") +
  labs(
    title = "Density of Running Variable Near the Cutoff",
    x = "Running Variable (R)",
    y = "Count"
  ) +
  theme_minimal()

# Display figures
p1

p2

p3

Figure 1. Program Attendance by Running Variable

There is a clear upward jump in average program attendance at the cutoff (R=0).
Students just above the threshold are much more likely to attend the bridge program than those just below, indicating a discontinuity in treatment probability. This confirms an effective first stage and supports a fuzzy RD design.

Figure 2. Outcome by Running Variable

Average GPA increases with the running variable, and a small visible rise appears at the cutoff. This suggests a modest positive discontinuity in outcomes for students just above the eligibility threshold, consistent with a positive local treatment effect of the program near R=0.

Figure 3. Density of the Running Variable Near the Cutoff

The distribution of R is smooth and symmetric around 0 with no visible bunching or breaks. This indicates no evidence of strategic manipulation or sorting near the eligibility cutoff, supporting the validity of the RD identification assumption.

Q25.

Ans:

The RD in this simulated design is fuzzy, not sharp. In the first figure (“Program Attendance by Running Variable”), there is a discrete jump in the probability of attending the bridge program at the cutoff \(R = 0\), but the jump is incomplete: attendance does not go from 0 to 1. This means eligibility (\(Z_i\)) strongly encourages attendance (\(D_i\)) but does not determine it perfectly; some eligible students do not attend, and some ineligible students do.

This pattern matches the DGP:
\[ D_i^* = -0.2 + 0.8 Z_i + 0.4 X_i + 0.5 U_i + \xi_i \] where both observed (\(X_i\)) and unobserved (\(U_i, \xi_i\)) factors influence attendance beyond eligibility. Because treatment assignment changes probabilistically rather than deterministically at the cutoff, this is a fuzzy RD design.

Q26.

# Define cutoff and bandwidth
cutoff <- 0
bandwidth <- 0.25

# Subset local window around the cutoff
local_data <- subset(data_dgp,
                     R >= (cutoff - bandwidth) & R <= (cutoff + bandwidth))

# 1. Reduced-form regression: effect of Z on Y near cutoff
reduced_form <- lm(Y ~ Z + R + I(R >= cutoff):R, data = local_data)
summary(reduced_form)
## 
## Call:
## lm(formula = Y ~ Z + R + I(R >= cutoff):R, data = local_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.29186 -0.51784  0.04654  0.48218  2.05423 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           2.44755    0.07844  31.204   <2e-16 ***
## Z                     0.06324    0.10740   0.589    0.556    
## R                    -0.02838    0.55396  -0.051    0.959    
## R:I(R >= cutoff)TRUE  0.68461    0.75409   0.908    0.364    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.755 on 783 degrees of freedom
## Multiple R-squared:  0.01071,    Adjusted R-squared:  0.00692 
## F-statistic: 2.826 on 3 and 783 DF,  p-value: 0.0378
# 2. First-stage regression: effect of Z on D near cutoff
first_stage <- lm(D ~ Z + R + I(R >= cutoff):R, data = local_data)
summary(first_stage)
## 
## Call:
## lm(formula = D ~ Z + R + I(R >= cutoff):R, data = local_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.7190 -0.4051  0.2873  0.3136  0.6222 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            0.3777     0.0493   7.662 5.43e-14 ***
## Z                      0.3034     0.0675   4.495 8.02e-06 ***
## R                     -0.1640     0.3482  -0.471    0.638    
## R:I(R >= cutoff)TRUE   0.3169     0.4740   0.669    0.504    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4745 on 783 degrees of freedom
## Multiple R-squared:  0.09276,    Adjusted R-squared:  0.08929 
## F-statistic: 26.69 on 3 and 783 DF,  p-value: < 2.2e-16

Choice of Bandwidth

A bandwidth of 0.25 on each side of the cutoff (R between -0.25 and 0.25) was used to focus estimation on observations close to the eligibility threshold, where the discontinuity in program attendance is most evident. As shown in the graphs from Q24 and consistent with local estimation, this window limits bias from nonlinearity in the running variable while retaining enough observations (787 in the window) for stable inference.

Functional form:
A linear specification with an interaction term I(R >= cutoff):R allows for different slopes on either side of the cutoff, capturing possible changes in the relationship between the running variable and the outcomes.
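
The strength of the first stage can be summarized with one number. As a rough sketch: with a single excluded instrument, the F statistic for that instrument equals the square of its t statistic, so it can be read off the Q26 first-stage output. The comparison against a threshold of 10 is a common rule of thumb and an assumption here, not something stated in the assignment.

```r
# t statistic on Z from the first-stage output in Q26
t_first_stage <- 4.495

# With one excluded instrument, the F statistic for the instrument is t^2
f_first_stage <- t_first_stage^2
round(f_first_stage, 1)
## [1] 20.2
```

A value of about 20 is above the conventional threshold of 10, which is consistent with describing eligibility as a reasonably strong instrument in this window.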

Q27.

# Using the same cutoff and bandwidth as Q26
cutoff <- 0
bandwidth <- 0.25

# Subset local window around the cutoff
local_data <- subset(data_dgp,
                     R >= (cutoff - bandwidth) & R <= (cutoff + bandwidth))

# Fuzzy RD / IV estimation using 2SLS
# Instrument = Z (eligibility indicator)
# Functional form same as Q26: include R and the interaction I(R >= cutoff):R
fuzzy_iv <- ivreg(Y ~ D + R + I(R >= cutoff):R | 
                    Z + R + I(R >= cutoff):R,
                  data = local_data)

# Display the results
summary(fuzzy_iv)
## 
## Call:
## ivreg(formula = Y ~ D + R + I(R >= cutoff):R | Z + R + I(R >= 
##     cutoff):R, data = local_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.14743 -0.48001  0.02273  0.45433  1.98817 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          2.368817   0.189487  12.501   <2e-16 ***
## D                    0.208443   0.338138   0.616    0.538    
## R                    0.005799   0.495866   0.012    0.991    
## R:I(R >= cutoff)TRUE 0.618555   0.721165   0.858    0.391    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7212 on 783 degrees of freedom
## Multiple R-Squared: 0.09724, Adjusted R-squared: 0.09378 
## Wald test: 3.096 on 3 and 783 DF,  p-value: 0.02626

Interpretation:

The fuzzy RD / IV estimate of the effect of program attendance on first-year GPA is 0.208 (SE = 0.338), a small positive point estimate that is not statistically significant (p = 0.538). This estimate captures the local average treatment effect (LATE) for compliers: students whose decision to attend the bridge program was influenced by becoming eligible (i.e., crossing the placement-test cutoff). It reflects the causal effect of attending the program for students near the cutoff who enroll only because they become eligible. The estimate is local to this group and cannot be generalized to all students or to those far from the eligibility threshold.
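
Because the model is just-identified (one instrument for one endogenous regressor), the 2SLS coefficient on D should equal the ratio of the reduced-form jump to the first-stage jump. The short check below uses the Z coefficients printed in Q26; the variable names are chosen here for illustration.

```r
rf_jump <- 0.06324   # coefficient on Z in the reduced form (Y regression), Q26
fs_jump <- 0.3034    # coefficient on Z in the first stage (D regression), Q26

# Wald ratio: jump in the outcome divided by jump in treatment probability
wald_estimate <- rf_jump / fs_jump
round(wald_estimate, 3)
## [1] 0.208
```

The ratio reproduces the 0.208 reported by `ivreg`, which is a useful sanity check that the reduced form, first stage, and IV specifications line up.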

Q28.

Ans:

The OLS estimate of program attendance on GPA is 0.505 (SE=0.021), while the fuzzy-RD/IV local estimate is 0.208 (SE=0.338).

The OLS estimate is considerably larger because it captures both the causal effect of attendance and selection bias. Students who choose to attend may differ systematically from those who do not.

In contrast, the fuzzy-RD/IV estimate identifies a local average treatment effect (LATE) for compliers near the cutoff, where eligibility is effectively randomized. This local estimate is less biased but noisier due to a smaller effective sample and imperfect compliance.
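
How noisy the local estimate is can be made concrete with a rough interval. The sketch below builds a normal-approximation 95% confidence interval for the IV estimate from the numbers quoted above and checks whether the OLS estimate falls inside it. The normal approximation and the variable names are assumptions made for illustration.

```r
iv_est  <- 0.208   # fuzzy-RD/IV estimate from Q27
iv_se   <- 0.338   # its conventional standard error
ols_est <- 0.505   # OLS estimate quoted above

z_crit <- qnorm(0.975)
iv_ci <- c(lower = iv_est - z_crit * iv_se, upper = iv_est + z_crit * iv_se)
round(iv_ci, 3)

# Does the OLS point estimate lie inside the IV interval?
ols_inside_iv_ci <- unname(ols_est >= iv_ci["lower"] && ols_est <= iv_ci["upper"])
ols_inside_iv_ci
```

The interval runs from roughly -0.45 to 0.87 and contains the OLS value, so although selection bias is a plausible explanation for the gap, the two estimates cannot be statistically distinguished in this sample.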

Inference

Q29.

# fuzzy_iv model from Q27; coeftest() is from lmtest, vcovHC() is from sandwich
coeftest(fuzzy_iv, vcov = vcovHC(fuzzy_iv, type = "HC1"))
## 
## t test of coefficients:
## 
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          2.3688169  0.1877648 12.6159   <2e-16 ***
## D                    0.2084429  0.3387845  0.6153   0.5386    
## R                    0.0057993  0.4834048  0.0120   0.9904    
## R:I(R >= cutoff)TRUE 0.6185549  0.7324169  0.8445   0.3986    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Explain Your Choice:

I used heteroskedasticity-robust (HC1) standard errors because RD designs often involve non-constant variance near the cutoff due to local fitting and variation in treatment probability (fuzzy design). These robust SEs correct for heteroskedasticity without assuming constant error variance and are appropriate with independent individual-level data. Clustering was not needed since observations are i.i.d. in the simulated sample. This approach provides more reliable inference for the fuzzy-RD/IV estimator.
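
It can help to see how much the robust correction changes the standard error on D in practice. The comparison below uses the two values printed in Q27 and Q29; the variable names are illustrative.

```r
se_conventional <- 0.338138    # SE on D from summary(fuzzy_iv), Q27
se_hc1          <- 0.3387845   # SE on D from coeftest() with HC1, Q29

# Percentage change when moving from conventional to HC1 standard errors
pct_change <- 100 * (se_hc1 / se_conventional - 1)
round(pct_change, 2)
## [1] 0.19
```

The two standard errors differ by about 0.2 percent, so heteroskedasticity has little practical effect on inference here, and the robust SEs serve mainly as a safeguard.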

Q30.

set.seed(6072026)

# Define cutoff
cutoff <- 0
bandwidths <- c(0.25, 0.50)
B <- 500  # number of bootstrap replications

# Function to run fuzzy RD/IV and extract coefficient on D
run_fuzzy_iv <- function(data, cutoff, bandwidth) {
  local_data <- subset(data,
                       R >= (cutoff - bandwidth) & R <= (cutoff + bandwidth))
  
  model <- ivreg(Y ~ D + R + I(R >= cutoff):R |
                   Z + R + I(R >= cutoff):R,
                 data = local_data)
  coef(model)["D"]
}

# Bootstrap results for each bandwidth
bootstrap_se <- function(bandwidth) {
  estimates <- replicate(B, {
    boot_idx <- sample(1:nrow(data_dgp), replace = TRUE)
    boot_data <- data_dgp[boot_idx, ]
    run_fuzzy_iv(boot_data, cutoff, bandwidth)
  })
  sd(estimates)
}

# Compute bootstrap standard errors for two bandwidths
se_bw1 <- bootstrap_se(bandwidths[1])
se_bw2 <- bootstrap_se(bandwidths[2])

se_bw1
## [1] 0.3571991
se_bw2
## [1] 0.2860775

Q31.

Ans:

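The bootstrap results from Q30 show that the standard error decreases when the bandwidth increases: from about 0.357 at a bandwidth of 0.25 to about 0.286 at a bandwidth of 0.50. The wider window includes more observations, which lowers sampling variance, but it also relies more heavily on the linear functional form holding farther from the cutoff, so the gain in precision comes with a risk of more bias.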
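
The size of the change can be computed directly from the two bootstrap standard errors printed in Q30. The variable names below are illustrative.

```r
se_bw_025 <- 0.3571991   # bootstrap SE at bandwidth 0.25 (Q30)
se_bw_050 <- 0.2860775   # bootstrap SE at bandwidth 0.50 (Q30)

# Relative reduction in the standard error from doubling the bandwidth
pct_reduction <- 100 * (1 - se_bw_050 / se_bw_025)
round(pct_reduction, 1)
## [1] 19.9
```

Doubling the bandwidth lowers the bootstrap standard error by roughly 20 percent.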

Q32.

Ans:

The actual standard error should not combine information from both bootstraps. Each bandwidth reflects a distinct trade-off between bias and variance. Therefore, inference should rely on the preferred (justified) bandwidth, while sensitivity to alternative bandwidths can be reported as a robustness check, not averaged across. Combining SEs would misrepresent the sampling variability tied to a specific model choice.
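
One way to report sensitivity without pooling is a small table with one row per bandwidth. The sketch below is a self-contained illustration on simulated data, not the `data_dgp` used above: the data-generating values, the helper name `wald_by_bandwidth`, and the object `sim` are all assumptions. It uses base R only and computes the just-identified fuzzy-RD estimate as the ratio of the reduced-form jump to the first-stage jump, with `Z:R` playing the role of `I(R >= cutoff):R` because the cutoff is 0.

```r
# Hypothetical simulated fuzzy-RD data (NOT the data_dgp used above)
set.seed(1)
n <- 4000
R <- runif(n, -1, 1)                   # running variable
Z <- as.numeric(R >= 0)                # eligibility indicator
D <- rbinom(n, 1, 0.3 + 0.4 * Z)       # attendance probability jumps at the cutoff
Y <- 2.5 + 0.3 * D + 0.5 * R + rnorm(n, sd = 0.7)
sim <- data.frame(Y, D, Z, R)

# Wald-type estimate within a window: reduced-form jump / first-stage jump
wald_by_bandwidth <- function(data, cutoff, bandwidth) {
  window_data <- data[abs(data$R - cutoff) <= bandwidth, ]
  rf <- coef(lm(Y ~ Z + R + Z:R, data = window_data))["Z"]
  fs <- coef(lm(D ~ Z + R + Z:R, data = window_data))["Z"]
  unname(rf / fs)
}

# One row per bandwidth; estimates are reported side by side, never averaged
bandwidths <- c(0.25, 0.50)
sensitivity <- data.frame(
  bandwidth = bandwidths,
  estimate  = sapply(bandwidths, function(bw) wald_by_bandwidth(sim, 0, bw))
)
sensitivity
```

Keeping each bandwidth on its own row makes the bias-variance trade-off visible and leaves the preferred specification clearly identified.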

Q33.

Ans:

I would emphasize the fuzzy RD/IV estimate using the smaller bandwidth (0.25) because it best balances precision and local validity near the cutoff. This estimate targets the Local Average Treatment Effect (LATE) for compliers. The LATE is policy-relevant since it captures how an encouragement policy (eligibility) affects those marginally induced to attend the bridge program. The key identifying assumptions are continuity of potential outcomes at the cutoff, first-stage relevance (eligibility shifts attendance), and exclusion (eligibility has no direct effect on GPA other than through attendance). Bandwidth choice affects how “local” this interpretation is: narrower windows reduce bias but increase variance. My inference relies on heteroskedasticity-robust and bootstrap-based SEs to ensure credible uncertainty estimates.

From this simulated exercise, I learned that RD and IV designs can yield credible causal estimates when the assumptions hold and diagnostics support validity, such as no manipulation around the cutoff and smooth outcomes away from it. However, these designs are local and design-specific, so results may not generalize beyond the cutoff or to all students. This case reinforces how inference decisions (bandwidth, functional form, resampling) critically shape both the statistical strength and the policy interpretation of causal findings.