Introduction

This project looks at the National Supported Work Demonstration (NSW) – a subsidized work program that operated in 15 locations in the 1970s. The program provided approximately 10,000 persons with 12-18 months of employment and helped them find a job afterwards. The program costed an average of $5,600 per person in 1978 dollars. To qualify for the program, the individual had to be unemployed, must have been employed full-time for less than half of the preceding 6 months, and fall into one of the following categories:

The primary question is whether or not the NSW program benefited its recipients and whether the benefits outweighed the cost.

 load("~/Downloads/nsw.rda")

The data is a sample of males that participated in the program and those who did not and did not receive significant aid in another form. The relevant variables are

library(dplyr)
library(ggplot2)

1. Prepping the Data


## PLACE CODE HERE
# Replace NA with 0
nsw$treated <- ifelse(is.na(nsw$treated),0,nsw$treated)

# Dummy Variable
nsw <- nsw %>% 
  mutate(treatment = ifelse(year ==1975,0,1))

# Frequency Table
table <- table(nsw$treated, nsw$black)
prop.table(table, margin = 1) 
##    
##             0         1
##   0 0.8868144 0.1131856
##   1 0.1986532 0.8013468

Black people are over-representative in the NSW treatment group because there is much higher proportion of treatment group (treated =1) is Black (80.13%).

2. Histogram

Plot a histogram of real earnings for both the control and treatment group, in the two years.


# 1st histogram
library(scales)
ggplot(nsw,aes(re74)) +
  geom_histogram(position = "dodge", bins = 30) +
  facet_wrap(~year + treated, ncol = 2) +
  ggtitle("Histogram: Real earnings (1975) in 1975 and 1978, by control and treatment group") +
  xlab("Real Earnings (1975)") +
  scale_x_continuous(labels = label_number(scale_cut = cut_long_scale()))

To filter out high earnings, we can use the IQR method

Q <- quantile(nsw$re, probs = c(.25, .75), na.rm = FALSE)
iqr <- IQR(nsw$re)
# Histogram 2: Remove outlier in real earnings and converted treated to a factor
nsw1 <- nsw %>% 
  subset(re > (Q[1] - 1.5*iqr) & re <(Q[2]+1.5*iqr)) %>% 
  mutate(treated = as.factor(treated))

ggplot(nsw1, aes(re)) +
  geom_histogram(position = "dodge", bins = 30) +
  facet_wrap(~year + treated, ncol = 2, scales = "free_y") +
  ggtitle("Histogram: Real earnings (1975) in 1975 and 1978, by control and treatment group") +
  xlab("Real Earnings (1975)") +
  scale_x_continuous(labels = label_number(scale_cut = cut_long_scale()))

Describe the histogram:

nsw1 %>% 
  filter(re ==0) %>% 
  group_by(treated) %>% 
  summarize(No_Earnings_Count = n())
## # A tibble: 2 × 2
##   treated No_Earnings_Count
##   <fct>               <int>
## 1 0                    4762
## 2 1                     178

The reason is there is many rows have real earnings (1975) is 0.

3. Plot

Calculate and show the average real earnings for each group before and after the treatment.


# Avg. Real Earnings of each group before and after the treatment

avg_res <- nsw %>% 
  group_by(treated, year, treatment) %>% 
  summarize(avg_re = mean(re))
## `summarise()` has grouped output by 'treated', 'year'. You can override using
## the `.groups` argument.
avg_res
## # A tibble: 4 × 4
## # Groups:   treated, year [4]
##   treated  year treatment avg_re
##     <dbl> <dbl>     <dbl>  <dbl>
## 1       0  1975         0 14125.
## 2       0  1978         1 15511.
## 3       1  1975         0  3066.
## 4       1  1978         1  5976.
# Counterfactual group (parallet to control)
control_avg <- avg_res$avg_re[avg_res$treated ==0]
treatment_avg <- avg_res$avg_re[avg_res$treated ==1]

time_effect <- (control_avg[2] - control_avg[1])
treatment_ctrl_re <- treatment_avg[1] + time_effect 
treatment_ctrl_re
## [1] 4451.967
# Dataframe of counterfactual
counter_treatment <- data.frame(
  treated = c("Counter-treatment", "Counter-treatment"),
  treatment = c(0,1),
  year = c(1975, 1978),
  avg_re = c(treatment_avg[1], treatment_ctrl_re)
)
counter_treatment
##             treated treatment year   avg_re
## 1 Counter-treatment         0 1975 3066.098
## 2 Counter-treatment         1 1978 4451.967
# Combine into a single dataframe
avg_res <- avg_res %>% 
  mutate(treated = ifelse(treated ==0, "Control", "Treatment")) %>% 
  bind_rows(counter_treatment)
avg_res
## # A tibble: 6 × 4
## # Groups:   treated, year [6]
##   treated            year treatment avg_re
##   <chr>             <dbl>     <dbl>  <dbl>
## 1 Control            1975         0 14125.
## 2 Control            1978         1 15511.
## 3 Treatment          1975         0  3066.
## 4 Treatment          1978         1  5976.
## 5 Counter-treatment  1975         0  3066.
## 6 Counter-treatment  1978         1  4452.
# Line plot
ggplot(avg_res, aes(treatment, avg_re, color = treated)) +
  geom_line() +
  ggtitle("Real Earnings (1975) by control, treatment and counter-control group") +
  scale_x_continuous(breaks = c(0,1), labels = c(1975, 1978)) +
  ylab("Avg. Real Earnings (1975)") +
  xlab("Year")

Average treatment effect appear to be positive.

4. Regression 1

Perform a simple linear regression to estimate the average treatment effect using the difference-in-differences approach.


reg <- lm( re ~ treated * treatment, nsw)
summary(reg)
## 
## Call:
## lm(formula = re ~ treated * treatment, data = nsw)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -15511  -9470    744   9727 142528 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        14124.81      76.35 185.008   <2e-16 ***
## treated           -11058.71     613.92 -18.013   <2e-16 ***
## treatment           1385.87     107.97  12.836   <2e-16 ***
## treated:treatment   1524.39     868.21   1.756   0.0791 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10500 on 38404 degrees of freedom
## Multiple R-squared:  0.01887,    Adjusted R-squared:  0.0188 
## F-statistic: 246.3 on 3 and 38404 DF,  p-value: < 2.2e-16

Interpret the results:

5. Regression 2

Perform the same regression as in question 4, but add in the other control variables from the dataset.


reg1 <- lm( re ~ married + hisp + black +  treated * treatment , nsw)
summary(reg1)
## 
## Call:
## lm(formula = re ~ married + hisp + black + treated * treatment, 
##     data = nsw)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -18055  -7177    819   7510 139985 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         9473.7      112.7  84.080  < 2e-16 ***
## married             7194.9      113.2  63.555  < 2e-16 ***
## hisp               -2455.0      201.6 -12.180  < 2e-16 ***
## black              -3182.8      160.4 -19.849  < 2e-16 ***
## treated            -4836.9      591.4  -8.179 2.95e-16 ***
## treatment           1385.9      101.8  13.617  < 2e-16 ***
## treated:treatment   1524.4      818.4   1.863   0.0625 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9895 on 38401 degrees of freedom
## Multiple R-squared:  0.1284, Adjusted R-squared:  0.1282 
## F-statistic: 942.5 on 6 and 38401 DF,  p-value: < 2.2e-16

After adding married, hisp and black variable, we see that the treated:treatment doesn’t change much so the average treatment effect doesn’t change much. However, the R-squared go up to 0.1284, suggesting that the model increase its accuracy in explaining variations in real income.

6. Cost-Benefit Analysis


From the regression result in part #4, the coefficient of treated:treatment is 1,524.39, suggesting that the real earnings has increased $1,524.39 for the treatment group between 1975 and 1978, so the program did have a positive impact. However, this interaction variable is not statistically significant (p-value > 0.05). To conduct a cost-benefit analysis of this program, it is crucial to consider how long the real earnings increase can sustained after the program. The cost is $5600, meaning the increase in the real earnings needs to be at least ~3.7 years to cover the cost (5600/1524.39).

In addition, employment stability and social & economic impact needs to be considered as well. After the training program, there are many other factors affect the stability of the job, such as the job itself, etc. For social groups participating in the program (Ex-drug addict, Ex-offender, Young school dropout and female & recipient of “Aid to Families with Dependent Children” (AFDC)), increasing real earnings can take away government assistance from them, making it harder for the sustainability of the program, or benefits outweigh the costs.