This project looks at the National Supported Work Demonstration (NSW) – a subsidized work program that operated in 15 locations in the 1970s. The program provided approximately 10,000 persons with 12-18 months of employment and helped them find a job afterwards. The program costed an average of $5,600 per person in 1978 dollars. To qualify for the program, the individual had to be unemployed, must have been employed full-time for less than half of the preceding 6 months, and fall into one of the following categories:
The primary question is whether or not the NSW program benefited its recipients and whether the benefits outweighed the cost.
load("~/Downloads/nsw.rda")
The data is a sample of males that participated in the program and those who did not and did not receive significant aid in another form. The relevant variables are
id – unique identifier representing the person in the
sampleyear – year of the observation. Observations in 1975
took place before participation in NSW; observations in 1978 took place
afterwards.treated – 1 if participated in NSW, 0 if not, and
NA if from a different control sample.re – Annual real earnings in 1975 dollarsre74 – Annual real earnings in 1974library(dplyr)
library(ggplot2)
## PLACE CODE HERE
# Replace NA with 0
nsw$treated <- ifelse(is.na(nsw$treated),0,nsw$treated)
# Dummy Variable
nsw <- nsw %>%
mutate(treatment = ifelse(year ==1975,0,1))
# Frequency Table
table <- table(nsw$treated, nsw$black)
prop.table(table, margin = 1)
##
## 0 1
## 0 0.8868144 0.1131856
## 1 0.1986532 0.8013468
Black people are over-representative in the NSW treatment group because there is much higher proportion of treatment group (treated =1) is Black (80.13%).
Plot a histogram of real earnings for both the control and treatment group, in the two years.
# 1st histogram
library(scales)
ggplot(nsw,aes(re74)) +
geom_histogram(position = "dodge", bins = 30) +
facet_wrap(~year + treated, ncol = 2) +
ggtitle("Histogram: Real earnings (1975) in 1975 and 1978, by control and treatment group") +
xlab("Real Earnings (1975)") +
scale_x_continuous(labels = label_number(scale_cut = cut_long_scale()))
To filter out high earnings, we can use the IQR method
Q <- quantile(nsw$re, probs = c(.25, .75), na.rm = FALSE)
iqr <- IQR(nsw$re)
# Histogram 2: Remove outlier in real earnings and converted treated to a factor
nsw1 <- nsw %>%
subset(re > (Q[1] - 1.5*iqr) & re <(Q[2]+1.5*iqr)) %>%
mutate(treated = as.factor(treated))
ggplot(nsw1, aes(re)) +
geom_histogram(position = "dodge", bins = 30) +
facet_wrap(~year + treated, ncol = 2, scales = "free_y") +
ggtitle("Histogram: Real earnings (1975) in 1975 and 1978, by control and treatment group") +
xlab("Real Earnings (1975)") +
scale_x_continuous(labels = label_number(scale_cut = cut_long_scale()))
Describe the histogram:
nsw1 %>%
filter(re ==0) %>%
group_by(treated) %>%
summarize(No_Earnings_Count = n())
## # A tibble: 2 × 2
## treated No_Earnings_Count
## <fct> <int>
## 1 0 4762
## 2 1 178
The reason is there is many rows have real earnings (1975) is 0.
Calculate and show the average real earnings for each group before and after the treatment.
# Avg. Real Earnings of each group before and after the treatment
avg_res <- nsw %>%
group_by(treated, year, treatment) %>%
summarize(avg_re = mean(re))
## `summarise()` has grouped output by 'treated', 'year'. You can override using
## the `.groups` argument.
avg_res
## # A tibble: 4 × 4
## # Groups: treated, year [4]
## treated year treatment avg_re
## <dbl> <dbl> <dbl> <dbl>
## 1 0 1975 0 14125.
## 2 0 1978 1 15511.
## 3 1 1975 0 3066.
## 4 1 1978 1 5976.
# Counterfactual group (parallet to control)
control_avg <- avg_res$avg_re[avg_res$treated ==0]
treatment_avg <- avg_res$avg_re[avg_res$treated ==1]
time_effect <- (control_avg[2] - control_avg[1])
treatment_ctrl_re <- treatment_avg[1] + time_effect
treatment_ctrl_re
## [1] 4451.967
# Dataframe of counterfactual
counter_treatment <- data.frame(
treated = c("Counter-treatment", "Counter-treatment"),
treatment = c(0,1),
year = c(1975, 1978),
avg_re = c(treatment_avg[1], treatment_ctrl_re)
)
counter_treatment
## treated treatment year avg_re
## 1 Counter-treatment 0 1975 3066.098
## 2 Counter-treatment 1 1978 4451.967
# Combine into a single dataframe
avg_res <- avg_res %>%
mutate(treated = ifelse(treated ==0, "Control", "Treatment")) %>%
bind_rows(counter_treatment)
avg_res
## # A tibble: 6 × 4
## # Groups: treated, year [6]
## treated year treatment avg_re
## <chr> <dbl> <dbl> <dbl>
## 1 Control 1975 0 14125.
## 2 Control 1978 1 15511.
## 3 Treatment 1975 0 3066.
## 4 Treatment 1978 1 5976.
## 5 Counter-treatment 1975 0 3066.
## 6 Counter-treatment 1978 1 4452.
# Line plot
ggplot(avg_res, aes(treatment, avg_re, color = treated)) +
geom_line() +
ggtitle("Real Earnings (1975) by control, treatment and counter-control group") +
scale_x_continuous(breaks = c(0,1), labels = c(1975, 1978)) +
ylab("Avg. Real Earnings (1975)") +
xlab("Year")
Average treatment effect appear to be positive.
Perform a simple linear regression to estimate the average treatment effect using the difference-in-differences approach.
reg <- lm( re ~ treated * treatment, nsw)
summary(reg)
##
## Call:
## lm(formula = re ~ treated * treatment, data = nsw)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15511 -9470 744 9727 142528
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14124.81 76.35 185.008 <2e-16 ***
## treated -11058.71 613.92 -18.013 <2e-16 ***
## treatment 1385.87 107.97 12.836 <2e-16 ***
## treated:treatment 1524.39 868.21 1.756 0.0791 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10500 on 38404 degrees of freedom
## Multiple R-squared: 0.01887, Adjusted R-squared: 0.0188
## F-statistic: 246.3 on 3 and 38404 DF, p-value: < 2.2e-16
Interpret the results:
treated coefficient = -11,058.71 indicates that the
treatment group’s average real earnings is lower than of the control
group before the treatment, by about 11,058.71treatment coefficient = 1385.87: On average, real
earnings increased by $13385.87 in the post-treatment for all
individualstreated:treatment coefficient = 1524.39: the
intervention had a positive effect of $1524.39 on the treated group’s
real earnings post-treatment but the p-value is higher than 0.05 so this
variable is not statistically significant.Perform the same regression as in question 4, but add in the other control variables from the dataset.
reg1 <- lm( re ~ married + hisp + black + treated * treatment , nsw)
summary(reg1)
##
## Call:
## lm(formula = re ~ married + hisp + black + treated * treatment,
## data = nsw)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18055 -7177 819 7510 139985
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9473.7 112.7 84.080 < 2e-16 ***
## married 7194.9 113.2 63.555 < 2e-16 ***
## hisp -2455.0 201.6 -12.180 < 2e-16 ***
## black -3182.8 160.4 -19.849 < 2e-16 ***
## treated -4836.9 591.4 -8.179 2.95e-16 ***
## treatment 1385.9 101.8 13.617 < 2e-16 ***
## treated:treatment 1524.4 818.4 1.863 0.0625 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9895 on 38401 degrees of freedom
## Multiple R-squared: 0.1284, Adjusted R-squared: 0.1282
## F-statistic: 942.5 on 6 and 38401 DF, p-value: < 2.2e-16
After adding married, hisp and
black variable, we see that the
treated:treatment doesn’t change much so the average
treatment effect doesn’t change much. However, the R-squared go up to
0.1284, suggesting that the model increase its accuracy in explaining
variations in real income.
From the regression result in part #4, the coefficient of
treated:treatment is 1,524.39, suggesting that the real
earnings has increased $1,524.39 for the treatment group between 1975
and 1978, so the program did have a positive impact. However, this
interaction variable is not statistically significant (p-value >
0.05). To conduct a cost-benefit analysis of this program, it is crucial
to consider how long the real earnings increase can sustained after the
program. The cost is $5600, meaning the increase in the real earnings
needs to be at least ~3.7 years to cover the cost (5600/1524.39).
In addition, employment stability and social & economic impact needs to be considered as well. After the training program, there are many other factors affect the stability of the job, such as the job itself, etc. For social groups participating in the program (Ex-drug addict, Ex-offender, Young school dropout and female & recipient of “Aid to Families with Dependent Children” (AFDC)), increasing real earnings can take away government assistance from them, making it harder for the sustainability of the program, or benefits outweigh the costs.