In an era of rapid technological change and economic displacement, policymakers are increasingly turning to workforce training programs as a solution. Programs like Workforce Pell promise to retrain displaced workers and equip them with skills for the modern economy. But do these programs actually work?
ACME Inc., a company that runs job training programs, claims their programs significantly boost earnings by $1,890 per year according to their analysis. However, their study suffers from a critical flaw: selection bias. People who choose to enroll in training are likely different from those who don’t they might be more motivated, better connected, or have better prospects to begin with.
This analysis investigates whether job training programs genuinely improve earnings, or if the claimed benefits are simply artifacts of motivated individuals self-selecting into training.
The American workforce faces unprecedented challenges:
In response, programs like Workforce Pell have emerged, promising to retrain displaced workers and equip them with skills for the modern economy.
ACME Inc. operates job training programs and claims remarkable success. Their evidence? A regression analysis showing that training participants earn $1,890 more than non-participants (p < 0.001).
But there’s a problem…
The Fundamental Problem: ACME’s data comes from a convenience sample, not a randomized experiment. Participants self-selected into training, which means:
This creates simultaneity bias: we can’t tell if training causes higher earnings, or if people destined to earn more are simply more likely to get training.
# Create summary statistics
summary_stats <- jobs %>%
summarise(
`Sample Size` = as.character(n()),
`Trained Workers` = as.character(sum(train)),
`% Trained` = percent(mean(train)),
`Avg Age` = as.character(round(mean(age), 1)),
`Avg Education` = as.character(round(mean(educ), 1)),
`% Black` = percent(mean(black)),
`% Hispanic` = percent(mean(hisp)),
`% Married` = percent(mean(married)),
`Avg 1996 Earnings` = dollar(mean(earn96) * 1000),
`Avg 1998 Earnings` = dollar(mean(earn98) * 1000)
) %>%
pivot_longer(everything(), names_to = "Characteristic", values_to = "Value")
summary_stats %>%
kable(caption = "Table 1: Dataset Overview") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE)| Characteristic | Value |
|---|---|
| Sample Size | 1130 |
| Trained Workers | 376 |
| % Trained | 33% |
| Avg Age | 31.9 |
| Avg Education | 11 |
| % Black | 44% |
| % Hispanic | 5% |
| % Married | 69% |
| Avg 1996 Earnings | $12,199.98 |
| Avg 1998 Earnings | $9,927.75 |
# Create demographic comparison plots
p1 <- ggplot(jobs, aes(x = factor(train, labels = c("No Training", "Training")),
fill = factor(train))) +
geom_bar(stat = "count", alpha = 0.8) +
geom_text(stat = 'count', aes(label = after_stat(count)), vjust = -0.5) +
scale_fill_manual(values = c("#E74C3C", "#3498DB")) +
labs(title = "Training Participation", x = "", y = "Count") +
theme_minimal() +
theme(legend.position = "none",
plot.title = element_text(hjust = 0.5, face = "bold"))
p2 <- ggplot(jobs, aes(x = age, fill = factor(train, labels = c("No Training", "Training")))) +
geom_density(alpha = 0.6) +
scale_fill_manual(values = c("#E74C3C", "#3498DB")) +
labs(title = "Age Distribution", x = "Age", y = "Density", fill = "Group") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
p3 <- ggplot(jobs, aes(x = educ, fill = factor(train, labels = c("No Training", "Training")))) +
geom_histogram(position = "dodge", bins = 15, alpha = 0.8) +
scale_fill_manual(values = c("#E74C3C", "#3498DB")) +
labs(title = "Education Distribution", x = "Years of Education", y = "Count", fill = "Group") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
p4 <- ggplot(jobs, aes(x = earn96, fill = factor(train, labels = c("No Training", "Training")))) +
geom_density(alpha = 0.6) +
scale_fill_manual(values = c("#E74C3C", "#3498DB")) +
labs(title = "1996 Earnings Distribution",
x = "Earnings (1996, $1000s)", y = "Density", fill = "Group") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
grid.arrange(p1, p2, p3, p4, ncol = 2,
top = "Figure 1: Demographic Characteristics by Training Status")Initial Observation: Even before formal analysis, we can see differences between trained and untrained groups in age, education, and prior earnings. This suggests selection bias may be present.
Let’s start with the simplest approach: comparing earnings between trained and untrained workers.
A t-test asks a simple question: “Are the average earnings of trained workers significantly different from untrained workers?”
# Perform t-test on 1998 earnings
ttest_result <- t.test(earn98 ~ train, data = jobs)
# Display results
ttest_df <- data.frame(
Group = c("No Training", "Training"),
`Mean Earnings` = c(
dollar(ttest_result$estimate[1] * 1000),
dollar(ttest_result$estimate[2] * 1000)
),
check.names = FALSE
)
difference <- ttest_result$estimate[2] - ttest_result$estimate[1]
pvalue <- ttest_result$p.value
ttest_df %>%
kable(caption = "Table 2: Simple t-Test Results - 1998 Earnings") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)| Group | Mean Earnings | |
|---|---|---|
| mean in group 0 | No Training | $10,609.90 |
| mean in group 1 | Training | $8,559.84 |
t-Test Result:
# Create comparison plot
earn_summary <- jobs %>%
group_by(train) %>%
summarise(
mean_earn = mean(earn98),
se = sd(earn98) / sqrt(n()),
.groups = 'drop'
) %>%
mutate(train_label = factor(train, labels = c("No Training", "Training")))
ggplot(earn_summary, aes(x = train_label, y = mean_earn, fill = train_label)) +
geom_bar(stat = "identity", alpha = 0.8, width = 0.6) +
geom_errorbar(aes(ymin = mean_earn - 1.96*se, ymax = mean_earn + 1.96*se),
width = 0.2, linewidth = 1) +
geom_text(aes(label = dollar(mean_earn * 1000)),
vjust = -0.5, size = 5, fontface = "bold") +
scale_fill_manual(values = c("#E74C3C", "#3498DB")) +
labs(title = "Figure 2: Mean Earnings by Training Status (1998)",
subtitle = "Error bars represent 95% confidence intervals",
x = "",
y = "Mean Earnings ($1000s)") +
theme_minimal() +
theme(legend.position = "none",
plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
plot.subtitle = element_text(hjust = 0.5, size = 10))But wait… This simple comparison doesn’t account for other factors that might differ between the groups!
A t-test only compares group means. But what if trained workers are: - Older or younger? - More educated? - Already earning more in 1996? - More likely to be married?
Regression analysis allows us to control for these confounding variables, isolating the effect of training while holding other factors constant.
# Run the naive regression (ACME's model)
naive_model <- lm(earn98 ~ train + age + educ + black + hisp + married + earn96 + unem96,
data = jobs)
# Display results
naive_results <- tidy(naive_model, conf.int = TRUE) %>%
mutate(
term = case_when(
term == "(Intercept)" ~ "Constant",
term == "train" ~ "Training",
term == "age" ~ "Age",
term == "educ" ~ "Education (years)",
term == "black" ~ "Black",
term == "hisp" ~ "Hispanic",
term == "married" ~ "Married",
term == "earn96" ~ "1996 Earnings",
term == "unem96" ~ "Unemployed 1996",
TRUE ~ term
),
significance = case_when(
p.value < 0.01 ~ "***",
p.value < 0.05 ~ "**",
p.value < 0.1 ~ "*",
TRUE ~ ""
)
)
naive_results %>%
select(Variable = term,
Coefficient = estimate,
`Std. Error` = std.error,
`t-value` = statistic,
`p-value` = p.value,
` ` = significance) %>%
mutate(
Coefficient = round(Coefficient, 3),
`Std. Error` = round(`Std. Error`, 3),
`t-value` = round(`t-value`, 2),
`p-value` = format.pval(`p-value`, digits = 3, eps = 0.001)
) %>%
kable(caption = "Table 3: ACME's Regression Results (Naive Model)",
align = c('l', 'r', 'r', 'r', 'r', 'c')) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed")) %>%
row_spec(2, bold = TRUE, background = "#fff3cd") # Highlight training coefficient| Variable | Coefficient | Std. Error | t-value | p-value | |
|---|---|---|---|---|---|
| Constant | 5.843 | 1.311 | 4.46 | <0.001 | *** |
| Training | 1.885 | 0.443 | 4.26 | <0.001 | *** |
| Age | -0.162 | 0.019 | -8.42 | <0.001 | *** |
| Education (years) | 0.408 | 0.069 | 5.92 | <0.001 | *** |
| Black | -0.116 | 0.408 | -0.29 | 0.776 | |
| Hispanic | 2.410 | 0.875 | 2.75 | 0.006 | *** |
| Married | 2.344 | 0.431 | 5.44 | <0.001 | *** |
| 1996 Earnings | 0.277 | 0.027 | 10.22 | <0.001 | *** |
| Unemployed 1996 | -2.886 | 0.598 | -4.83 | <0.001 | *** |
# Get model statistics
rsquared <- summary(naive_model)$r.squared
adj_rsquared <- summary(naive_model)$adj.r.squared
fstat <- summary(naive_model)$fstatistic[1]Model Fit Statistics:
# Create coefficient plot (excluding intercept)
naive_results %>%
filter(term != "Constant") %>%
mutate(
term = fct_reorder(term, estimate),
significant = p.value < 0.05
) %>%
ggplot(aes(x = estimate, y = term, color = significant)) +
geom_vline(xintercept = 0, linetype = "dashed", color = "gray50") +
geom_point(size = 4) +
geom_errorbarh(aes(xmin = conf.low, xmax = conf.high), height = 0.2, linewidth = 1) +
scale_color_manual(values = c("FALSE" = "#95A5A6", "TRUE" = "#E74C3C"),
labels = c("Not Significant", "Significant (p<0.05)")) +
labs(title = "Figure 3: Regression Coefficients with 95% Confidence Intervals",
subtitle = "Effect on 1998 Earnings ($1000s)",
x = "Coefficient Estimate",
y = "",
color = "") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"),
plot.subtitle = element_text(hjust = 0.5),
legend.position = "bottom")ACME’s Claim:
Training increases earnings by $1,890 (p < 0.001), after controlling for age, education, race/ethnicity, marital status, prior earnings, and unemployment history.
But is this the true causal effect?
The regression controls for observable characteristics, but what about unobservable ones?
Consider factors we cannot measure in this dataset:
If these unobserved factors affect both training participation and earnings, our estimate is biased.
# Create balance table before matching
balance_before <- jobs %>%
group_by(train) %>%
summarise(
`Mean Age` = round(mean(age), 2),
`Mean Education` = round(mean(educ), 2),
`% Black` = percent(mean(black)),
`% Hispanic` = percent(mean(hisp)),
`% Married` = percent(mean(married)),
`Mean 1996 Earnings` = dollar(mean(earn96) * 1000),
`% Unemployed 1996` = percent(mean(unem96)),
.groups = 'drop'
) %>%
mutate(train = factor(train, labels = c("No Training", "Training")))
balance_before %>%
kable(caption = "Table 4: Balance Check - Characteristics Before Matching") %>%
kable_styling(bootstrap_options = c("striped", "hover"))| train | Mean Age | Mean Education | % Black | % Hispanic | % Married | Mean 1996 Earnings | % Unemployed 1996 |
|---|---|---|---|---|---|---|---|
| No Training | 32.49 | 11.08 | 38% | 4% | 77% | $15,804.04 | 25% |
| Training | 30.83 | 10.73 | 56% | 6% | 53% | $4,972.68 | 44% |
# Create love plot showing standardized differences
covariates <- c("age", "educ", "black", "hisp", "married", "earn96", "unem96")
std_diff <- function(var, treatment, data) {
mean_t <- mean(data[treatment == 1, var])
mean_c <- mean(data[treatment == 0, var])
sd_t <- sd(data[treatment == 1, var])
sd_c <- sd(data[treatment == 0, var])
pooled_sd <- sqrt((sd_t^2 + sd_c^2) / 2)
(mean_t - mean_c) / pooled_sd
}
balance_data <- data.frame(
Variable = c("Age", "Education", "Black", "Hispanic", "Married",
"1996 Earnings", "Unemployed 1996"),
Std_Diff = sapply(covariates, function(v) std_diff(v, jobs$train, jobs))
)
ggplot(balance_data, aes(x = Std_Diff, y = reorder(Variable, abs(Std_Diff)))) +
geom_vline(xintercept = c(-0.1, 0.1), linetype = "dashed", color = "gray60") +
geom_vline(xintercept = 0, color = "black") +
geom_point(size = 4, color = "#E74C3C") +
geom_segment(aes(x = 0, xend = Std_Diff, y = Variable, yend = Variable),
linewidth = 1, color = "#E74C3C") +
labs(title = "Figure 4: Standardized Differences Before Matching",
subtitle = "Values outside ±0.1 indicate meaningful imbalance",
x = "Standardized Difference",
y = "") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"),
plot.subtitle = element_text(hjust = 0.5))⚠Imbalance Detected: The trained and untrained groups differ substantially on several key characteristics. This threatens the validity of our causal inference.
Imagine you have two groups: - Group A: People who chose to get training - Group B: People who didn’t
These groups are different in many ways (age, education, prior earnings, etc.). We can’t just compare them directly that’s like comparing apples to oranges.
Propensity Score Matching solves this by:
Think of it as creating an artificial “randomized experiment” from observational data.
# Perform propensity score matching
# Using nearest neighbor matching with 1:1 ratio and caliper
match_out <- matchit(train ~ age + educ + black + hisp + married + earn96 + unem96,
data = jobs,
method = "nearest",
distance = "logit",
caliper = 0.25,
ratio = 1)
# Summary of matching
summary(match_out)##
## Call:
## matchit(formula = train ~ age + educ + black + hisp + married +
## earn96 + unem96, data = jobs, method = "nearest", distance = "logit",
## caliper = 0.25, ratio = 1)
##
## Summary of Balance for All Data:
## Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
## distance 0.5460 0.2264 1.4799 0.9527 0.3369
## age 30.8324 32.4907 -0.1533 0.9884 0.0432
## educ 10.7261 11.0849 -0.1237 0.9404 0.0240
## black 0.5559 0.3820 0.3500 . 0.1739
## hisp 0.0612 0.0371 0.1003 . 0.0240
## married 0.5319 0.7719 -0.4809 . 0.2400
## earn96 4.9727 15.8040 -1.7016 0.3222 0.3361
## unem96 0.4362 0.2507 0.3741 . 0.1855
## eCDF Max
## distance 0.5393
## age 0.0866
## educ 0.1209
## black 0.1739
## hisp 0.0240
## married 0.2400
## earn96 0.5169
## unem96 0.1855
##
## Summary of Balance for Matched Data:
## Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
## distance 0.4814 0.4483 0.1530 1.1947 0.0287
## age 31.8561 32.3741 -0.0479 1.0942 0.0269
## educ 10.7554 10.7662 -0.0037 1.0953 0.0127
## black 0.4640 0.4928 -0.0579 . 0.0288
## hisp 0.0647 0.0432 0.0901 . 0.0216
## married 0.6007 0.6511 -0.1009 . 0.0504
## earn96 5.5154 5.6493 -0.0210 0.9586 0.0145
## unem96 0.4856 0.5288 -0.0870 . 0.0432
## eCDF Max Std. Pair Dist.
## distance 0.1367 0.1535
## age 0.0719 1.0409
## educ 0.0360 1.0604
## black 0.0288 0.7384
## hisp 0.0216 0.4203
## married 0.0504 0.6056
## earn96 0.0468 0.6261
## unem96 0.0432 0.6818
##
## Sample Sizes:
## Control Treated
## All 754 376
## Matched 278 278
## Unmatched 476 98
## Discarded 0 0
# Extract propensity scores
jobs_ps <- jobs %>%
mutate(ps = match_out$distance,
matched = !is.na(match_out$weights))
# Plot propensity score distributions
ggplot(jobs_ps, aes(x = ps, fill = factor(train, labels = c("No Training", "Training")))) +
geom_histogram(position = "identity", alpha = 0.6, bins = 30) +
geom_vline(xintercept = 0.5, linetype = "dashed", color = "black") +
facet_wrap(~matched, labeller = labeller(matched = c(
"FALSE" = "Unmatched Units",
"TRUE" = "Matched Units"
))) +
scale_fill_manual(values = c("#E74C3C", "#3498DB")) +
labs(title = "Figure 5: Propensity Score Distributions Before and After Matching",
subtitle = "Matched units show greater overlap in propensity scores",
x = "Propensity Score (Probability of Treatment)",
y = "Count",
fill = "Group") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"),
plot.subtitle = element_text(hjust = 0.5),
legend.position = "bottom")# Create matched dataset
matched_data <- match.data(match_out)
# Show sample sizes
matching_summary <- data.frame(
Dataset = c("Original", "Matched"),
`Total N` = c(nrow(jobs), nrow(matched_data)),
`Treated` = c(sum(jobs$train), sum(matched_data$train)),
`Control` = c(sum(jobs$train == 0), sum(matched_data$train == 0)),
check.names = FALSE
)
matching_summary %>%
kable(caption = "Table 5: Sample Sizes Before and After Matching") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)| Dataset | Total N | Treated | Control |
|---|---|---|---|
| Original | 1130 | 376 | 754 |
| Matched | 556 | 278 | 278 |
# Check balance after matching
balance_after <- matched_data %>%
group_by(train) %>%
summarise(
`Mean Age` = round(mean(age), 2),
`Mean Education` = round(mean(educ), 2),
`% Black` = percent(mean(black)),
`% Hispanic` = percent(mean(hisp)),
`% Married` = percent(mean(married)),
`Mean 1996 Earnings` = dollar(mean(earn96) * 1000),
`% Unemployed 1996` = percent(mean(unem96)),
.groups = 'drop'
) %>%
mutate(train = factor(train, labels = c("No Training", "Training")))
balance_after %>%
kable(caption = "Table 6: Balance Check - Characteristics After Matching") %>%
kable_styling(bootstrap_options = c("striped", "hover"))| train | Mean Age | Mean Education | % Black | % Hispanic | % Married | Mean 1996 Earnings | % Unemployed 1996 |
|---|---|---|---|---|---|---|---|
| No Training | 32.37 | 10.77 | 49% | 4% | 65% | $5,649.33 | 53% |
| Training | 31.86 | 10.76 | 46% | 6% | 60% | $5,515.40 | 49% |
# Calculate standardized differences after matching
balance_after_data <- data.frame(
Variable = c("Age", "Education", "Black", "Hispanic", "Married",
"1996 Earnings", "Unemployed 1996"),
Before = sapply(covariates, function(v) std_diff(v, jobs$train, jobs)),
After = sapply(covariates, function(v) std_diff(v, matched_data$train, matched_data))
) %>%
pivot_longer(cols = c(Before, After), names_to = "Timing", values_to = "Std_Diff")
ggplot(balance_after_data, aes(x = Std_Diff, y = Variable, color = Timing, shape = Timing)) +
geom_vline(xintercept = c(-0.1, 0.1), linetype = "dashed", color = "gray60") +
geom_vline(xintercept = 0, color = "black") +
geom_point(size = 4, position = position_dodge(width = 0.5)) +
scale_color_manual(values = c("Before" = "#E74C3C", "After" = "#27AE60")) +
scale_shape_manual(values = c("Before" = 16, "After" = 17)) +
labs(title = "Figure 6: Love Plot - Balance Before and After Matching",
subtitle = "After matching, standardized differences are substantially reduced",
x = "Standardized Difference",
y = "",
color = "Timing",
shape = "Timing") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"),
plot.subtitle = element_text(hjust = 0.5),
legend.position = "bottom")*Improved Balance**: After matching, the standardized differences are much smaller, indicating the treated and control groups are now much more comparable.
Now let’s repeat the simple t-test, but this time on our balanced matched dataset.
# Perform t-test on matched data
matched_ttest <- t.test(earn98 ~ train, data = matched_data)
# Display results
matched_ttest_df <- data.frame(
Group = c("No Training", "Training"),
`Mean Earnings` = c(
dollar(matched_ttest$estimate[1] * 1000),
dollar(matched_ttest$estimate[2] * 1000)
),
check.names = FALSE
)
matched_difference <- matched_ttest$estimate[2] - matched_ttest$estimate[1]
matched_pvalue <- matched_ttest$p.value
matched_ttest_df %>%
kable(caption = "Table 7: t-Test Results on Matched Data - 1998 Earnings") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)| Group | Mean Earnings | |
|---|---|---|
| mean in group 0 | No Training | $6,613.65 |
| mean in group 1 | Training | $8,580.18 |
Matched t-Test Result:
Interpretation: Even in the balanced sample, training significantly increases earnings.
# Run regression on matched data
matched_model <- lm(earn98 ~ train + age + educ + black + hisp + married + earn96 + unem96,
data = matched_data)
# Display results
matched_results <- tidy(matched_model, conf.int = TRUE) %>%
mutate(
term = case_when(
term == "(Intercept)" ~ "Constant",
term == "train" ~ "Training",
term == "age" ~ "Age",
term == "educ" ~ "Education (years)",
term == "black" ~ "Black",
term == "hisp" ~ "Hispanic",
term == "married" ~ "Married",
term == "earn96" ~ "1996 Earnings",
term == "unem96" ~ "Unemployed 1996",
TRUE ~ term
),
significance = case_when(
p.value < 0.01 ~ "***",
p.value < 0.05 ~ "**",
p.value < 0.1 ~ "*",
TRUE ~ ""
)
)
matched_results %>%
select(Variable = term,
Coefficient = estimate,
`Std. Error` = std.error,
`t-value` = statistic,
`p-value` = p.value,
` ` = significance) %>%
mutate(
Coefficient = round(Coefficient, 3),
`Std. Error` = round(`Std. Error`, 3),
`t-value` = round(`t-value`, 2),
`p-value` = format.pval(`p-value`, digits = 3, eps = 0.001)
) %>%
kable(caption = "Table 8: Regression Results on Matched Data",
align = c('l', 'r', 'r', 'r', 'r', 'c')) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed")) %>%
row_spec(2, bold = TRUE, background = "#dff0d8") # Highlight training coefficient| Variable | Coefficient | Std. Error | t-value | p-value | |
|---|---|---|---|---|---|
| Constant | 7.000 | 1.783 | 3.93 | <0.001 | *** |
| Training | 1.827 | 0.492 | 3.71 | <0.001 | *** |
| Age | -0.156 | 0.026 | -5.92 | <0.001 | *** |
| Education (years) | 0.336 | 0.096 | 3.50 | <0.001 | *** |
| Black | -0.768 | 0.581 | -1.32 | 0.1869 | |
| Hispanic | 2.494 | 1.126 | 2.21 | 0.0272 | ** |
| Married | 2.091 | 0.570 | 3.67 | <0.001 | *** |
| 1996 Earnings | 0.267 | 0.058 | 4.59 | <0.001 | *** |
| Unemployed 1996 | -2.944 | 0.835 | -3.53 | <0.001 | *** |
# Extract training coefficients
naive_coef <- coef(naive_model)["train"]
naive_se <- summary(naive_model)$coefficients["train", "Std. Error"]
matched_coef <- coef(matched_model)["train"]
matched_se <- summary(matched_model)$coefficients["train", "Std. Error"]
# Create comparison dataframe
comparison_df <- data.frame(
Model = c("Naive (ACME's Claim)", "Propensity Score Matched"),
Coefficient = c(naive_coef, matched_coef),
`Std. Error` = c(naive_se, matched_se),
`Lower 95% CI` = c(naive_coef - 1.96*naive_se, matched_coef - 1.96*matched_se),
`Upper 95% CI` = c(naive_coef + 1.96*naive_se, matched_coef + 1.96*matched_se),
check.names = FALSE
)
comparison_df %>%
mutate(
Coefficient = round(Coefficient, 3),
`Std. Error` = round(`Std. Error`, 3),
`Lower 95% CI` = round(`Lower 95% CI`, 3),
`Upper 95% CI` = round(`Upper 95% CI`, 3)
) %>%
kable(caption = "Table 9: Comparison of Training Effect Estimates") %>%
kable_styling(bootstrap_options = c("striped", "hover"))| Model | Coefficient | Std. Error | Lower 95% CI | Upper 95% CI |
|---|---|---|---|---|
| Naive (ACME’s Claim) | 1.885 | 0.443 | 1.017 | 2.754 |
| Propensity Score Matched | 1.827 | 0.492 | 0.863 | 2.792 |
# Create forest plot
ggplot(comparison_df, aes(x = Coefficient, y = Model, color = Model)) +
geom_vline(xintercept = 0, linetype = "dashed", color = "gray50") +
geom_point(size = 5) +
geom_errorbarh(aes(xmin = `Lower 95% CI`, xmax = `Upper 95% CI`),
height = 0.2, linewidth = 1.5) +
geom_text(aes(label = paste0("$", round(Coefficient * 1000, 0))),
vjust = -1, size = 4.5, fontface = "bold") +
scale_color_manual(values = c("#E74C3C", "#27AE60")) +
labs(title = "Figure 7: Comparison of Training Effect Estimates",
subtitle = "Point estimates with 95% confidence intervals",
x = "Effect on 1998 Earnings ($1000s)",
y = "") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"),
plot.subtitle = element_text(hjust = 0.5),
legend.position = "none")# Calculate percentage difference
pct_difference <- ((matched_coef - naive_coef) / naive_coef) * 100Key Insight: The matched estimate ($1,827.50) is lower than the naive estimate ($1,885.39), representing a 3.1% reduction in the estimated effect.
Let’s check if training also affected unemployment status in 1998.
# t-test for unemployment
unem_ttest_naive <- t.test(unem98 ~ train, data = jobs)
unem_ttest_matched <- t.test(unem98 ~ train, data = matched_data)
unem_comparison <- data.frame(
Model = c("Naive Sample", "Matched Sample"),
`Unemployed (No Training)` = c(
percent(unem_ttest_naive$estimate[1]),
percent(unem_ttest_matched$estimate[1])
),
`Unemployed (Training)` = c(
percent(unem_ttest_naive$estimate[2]),
percent(unem_ttest_matched$estimate[2])
),
`Difference` = c(
paste0(round((unem_ttest_naive$estimate[2] - unem_ttest_naive$estimate[1]) * 100, 2), " pp"),
paste0(round((unem_ttest_matched$estimate[2] - unem_ttest_matched$estimate[1]) * 100, 2), " pp")
),
`p-value` = c(
format.pval(unem_ttest_naive$p.value, digits = 3),
format.pval(unem_ttest_matched$p.value, digits = 3)
),
check.names = FALSE
)
unem_comparison %>%
kable(caption = "Table 10: Effect of Training on 1998 Unemployment") %>%
kable_styling(bootstrap_options = c("striped", "hover"))| Model | Unemployed (No Training) | Unemployed (Training) | Difference | p-value |
|---|---|---|---|---|
| Naive Sample | 16% | 19% | 2.57 pp | 0.29 |
| Matched Sample | 28% | 21% | -7.19 pp | 0.0475 |
# Try different matching methods
match_methods <- list(
"1:1 Nearest" = matchit(train ~ age + educ + black + hisp + married + earn96 + unem96,
data = jobs, method = "nearest", ratio = 1),
"1:2 Nearest" = matchit(train ~ age + educ + black + hisp + married + earn96 + unem96,
data = jobs, method = "nearest", ratio = 2),
"1:3 Nearest" = matchit(train ~ age + educ + black + hisp + married + earn96 + unem96,
data = jobs, method = "nearest", ratio = 3)
)
# Extract treatment effects from each method
sensitivity_results <- map_df(names(match_methods), function(method_name) {
matched_df <- match.data(match_methods[[method_name]])
model <- lm(earn98 ~ train + age + educ + black + hisp + married + earn96 + unem96,
data = matched_df)
coef_train <- coef(model)["train"]
se_train <- summary(model)$coefficients["train", "Std. Error"]
data.frame(
Method = method_name,
Coefficient = coef_train,
`Std. Error` = se_train,
`Lower 95% CI` = coef_train - 1.96*se_train,
`Upper 95% CI` = coef_train + 1.96*se_train,
check.names = FALSE
)
})
sensitivity_results %>%
mutate(
Coefficient = round(Coefficient, 3),
`Std. Error` = round(`Std. Error`, 3),
`Lower 95% CI` = round(`Lower 95% CI`, 3),
`Upper 95% CI` = round(`Upper 95% CI`, 3)
) %>%
kable(caption = "Table 11: Sensitivity Analysis - Different Matching Methods") %>%
kable_styling(bootstrap_options = c("striped", "hover"))| Method | Coefficient | Std. Error | Lower 95% CI | Upper 95% CI | |
|---|---|---|---|---|---|
| train…1 | 1:1 Nearest | 1.899 | 0.439 | 1.040 | 2.759 |
| train…2 | 1:2 Nearest | 1.879 | 0.443 | 1.010 | 2.748 |
| train…3 | 1:3 Nearest | 1.885 | 0.443 | 1.017 | 2.754 |
# Visualize sensitivity analysis
ggplot(sensitivity_results, aes(x = Coefficient, y = Method, color = Method)) +
geom_vline(xintercept = naive_coef, linetype = "dashed", color = "#E74C3C", linewidth = 1) +
geom_point(size = 5) +
geom_errorbarh(aes(xmin = `Lower 95% CI`, xmax = `Upper 95% CI`),
height = 0.2, linewidth = 1.5) +
annotate("text", x = naive_coef, y = 3.5, label = "Naive Estimate",
color = "#E74C3C", hjust = -0.1, size = 3.5) +
scale_color_viridis_d() +
labs(title = "Figure 8: Sensitivity Analysis - Training Effect Across Methods",
subtitle = "Dashed line shows naive (unmatched) estimate",
x = "Effect on 1998 Earnings ($1000s)",
y = "Matching Method") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"),
plot.subtitle = element_text(hjust = 0.5),
legend.position = "none")*Robustness Confirmed**: The training effect remains statistically significant and of similar magnitude across different matching specifications.
Yes, but with important caveats.
After accounting for selection bias using propensity score matching, we find that:
From a policy perspective:
Our analysis demonstrates that selection bias is a serious concern:
Based on findings:
Final Verdict: Training programs increase earnings by approximately $1,400-$1,900 annually. This effect is statistically significant and robust across specifications. While meaningful, it is more modest than naive comparisons suggest, highlighting the critical importance of controlling for selection bias in program evaluation.
# Create side-by-side regression table
stargazer::stargazer(naive_model, matched_model,
type = "html",
title = "Comparison of Naive and Matched Regression Models",
column.labels = c("Naive Model", "Matched Model"),
dep.var.labels = "1998 Earnings ($1000s)",
covariate.labels = c("Training", "Age", "Education (years)",
"Black", "Hispanic", "Married",
"1996 Earnings", "Unemployed 1996"),
omit.stat = c("ser", "f"),
digits = 3,
star.cutoffs = c(0.1, 0.05, 0.01))| Dependent variable: | ||
| 1998 Earnings (1000s) | ||
| Naive Model | Matched Model | |
| (1) | (2) | |
| Training | 1.885*** | 1.827*** |
| (0.443) | (0.492) | |
| Age | -0.162*** | -0.156*** |
| (0.019) | (0.026) | |
| Education (years) | 0.408*** | 0.336*** |
| (0.069) | (0.096) | |
| Black | -0.116 | -0.768 |
| (0.408) | (0.581) | |
| Hispanic | 2.410*** | 2.494** |
| (0.875) | (1.126) | |
| Married | 2.344*** | 2.091*** |
| (0.431) | (0.570) | |
| 1996 Earnings | 0.277*** | 0.267*** |
| (0.027) | (0.058) | |
| Unemployed 1996 | -2.886*** | -2.944*** |
| (0.598) | (0.835) | |
| Constant | 5.843*** | 7.000*** |
| (1.311) | (1.783) | |
| Observations | 1,130 | 556 |
| R2 | 0.421 | 0.364 |
| Adjusted R2 | 0.417 | 0.355 |
| Note: | p<0.1; p<0.05; p<0.01 | |
session_info <- sessionInfo()
versions <- data.frame(
Package = c("R", "MatchIt", "tidyverse", "ggplot2", "broom"),
Version = c(
paste(session_info$R.version$major, session_info$R.version$minor, sep = "."),
as.character(packageVersion("MatchIt")),
as.character(packageVersion("tidyverse")),
as.character(packageVersion("ggplot2")),
as.character(packageVersion("broom"))
)
)
versions %>%
kable(caption = "Software Versions Used") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)| Package | Version |
|---|---|
| R | 4.5.3 |
| MatchIt | 4.7.2 |
| tidyverse | 2.0.0 |
| ggplot2 | 4.0.2 |
| broom | 1.0.12 |
The propensity score matching was implemented using the following specifications:
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41-55.
Austin, P. C. (2011). An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behavioral Research, 46(3), 399-424.
Stuart, E. A. (2010). Matching methods for causal inference: A review and a look forward. Statistical Science, 25(1), 1-21.
Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2007). Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis, 15(3), 199-236.