Import required packages and load data:

library(flexdashboard)
library(data.table)
load("~/MWR_course/exercises/exercise_data.RData")

Ex 1. Use the experimental data to estimate the effect of the job training treatment. How much does it appear to affect 1978 income?

setDT(d_exper)
t.test(d_exper[treat==0,re78], d_exper[treat==1,re78])

    Welch Two Sample t-test

data:  d_exper[treat == 0, re78] and d_exper[treat == 1, re78]
t = -1.8154, df = 557.06, p-value = 0.06999
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.84525058  0.07264296
sample estimates:
mean of x mean of y 
 5.090048  5.976352 
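
The t-test above lists the control group first, so the reported difference (control minus treated) is negative. A one-variable regression gives the same difference in means as treated minus control, which is easier to read. The following is a minimal sketch on simulated data. The `toy` data frame and its effect size are invented for illustration and are not the course data.

```r
set.seed(1)
# Simulated stand-in for an experimental sample; values are illustrative only
toy <- data.frame(treat = rep(c(0, 1), each = 200))
toy$re78 <- 5 + 1 * toy$treat + rnorm(nrow(toy), sd = 6)

# Treated-minus-control difference in means
diff_means <- mean(toy$re78[toy$treat == 1]) - mean(toy$re78[toy$treat == 0])

# The coefficient on treat in a simple regression is the same quantity
fit <- lm(re78 ~ treat, data = toy)
diff_lm <- unname(coef(fit)["treat"])

print(c(diff_means = diff_means, diff_lm = diff_lm))
```

Read this way, the experimental output above implies the treated group earned about 5.976 - 5.090 = 0.886 more in 1978, with a p-value of 0.07. If re78 is recorded in thousands of dollars, which the later answer suggests, that is roughly $886.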

Now look at the observational data (for all exercises from now on). How large is the raw difference in 1978 income between the treatment group and the PSID comparison group?

setDT(d)
t.test(d[treat==0,re78], d[treat==1,re78])

    Welch Two Sample t-test

data:  d[treat == 0, re78] and d[treat == 1, re78]
t = 31.595, df = 801.32, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 15.51366 17.56902
sample estimates:
mean of x mean of y 
22.517690  5.976352 
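
The raw gap above (22.52 versus 5.98, about 16.5 in the units of re78) likely says more about how different the two groups were before the program than about the program itself. One way to check is to compare covariate means by treatment group. The following is a minimal sketch on simulated data. The `toy` data frame and its numbers are invented; with the course data the same `aggregate` call could be run on `d`.

```r
set.seed(2)
n <- 500
# Simulated observational sample: the treated are younger with lower prior earnings
treat <- rbinom(n, 1, 0.3)
toy <- data.frame(
  treat = treat,
  age   = ifelse(treat == 1, rnorm(n, 25, 5), rnorm(n, 35, 8)),
  re75  = ifelse(treat == 1, rnorm(n, 2, 1), rnorm(n, 19, 5))
)

# Mean of every covariate within each treatment group
balance <- aggregate(. ~ treat, data = toy, FUN = mean)
print(balance)
```

Large differences in pre-treatment covariates such as prior earnings are a sign that a raw comparison of outcomes should not be read as a treatment effect.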

  1. Try to estimate the effect of the treatment using regression. What does regression say the effect of the program is?

summary(lm(re78 ~ ., data=d))

Call:
lm(formula = re78 ~ ., data = d)

Residuals:
    Min      1Q  Median      3Q     Max 
-64.411  -4.364  -0.350   3.938 110.624 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.25001    1.79008  -0.140 0.888938    
treat       -2.46239    0.86283  -2.854 0.004355 ** 
age         -0.08364    0.02317  -3.611 0.000312 ***
educ         0.63322    0.10823   5.850 5.54e-09 ***
black       -0.39075    0.52073  -0.750 0.453087    
hisp         1.59185    1.07662   1.479 0.139382    
married      0.98918    0.61974   1.596 0.110586    
nodegr       0.50535    0.66269   0.763 0.445791    
re74         0.31633    0.03312   9.550  < 2e-16 ***
re75         0.52153    0.03245  16.072  < 2e-16 ***
u74          3.58345    1.08631   3.299 0.000985 ***
u75         -1.35541    1.02480  -1.323 0.186085    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 10.14 on 2498 degrees of freedom
Multiple R-squared:  0.5973,    Adjusted R-squared:  0.5955 
F-statistic: 336.9 on 11 and 2498 DF,  p-value: < 2.2e-16

Trying again after dropping the binary covariates (black, hisp, married, nodegr, u74, u75):

summary(lm(re78 ~ treat+age+educ+re74+re75, data=d))

Call:
lm(formula = re78 ~ treat + age + educ + re74 + re75, data = d)

Residuals:
    Min      1Q  Median      3Q     Max 
-63.915  -4.374  -0.376   3.825 109.927 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.56943    1.26510   0.450  0.65267    
treat       -2.21312    0.71187  -3.109  0.00190 ** 
age         -0.05753    0.02182  -2.637  0.00842 ** 
educ         0.61539    0.07574   8.125 6.96e-16 ***
re74         0.27247    0.02962   9.199  < 2e-16 ***
re75         0.54657    0.02903  18.830  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 10.17 on 2504 degrees of freedom
Multiple R-squared:  0.5941,    Adjusted R-squared:  0.5933 
F-statistic:   733 on 5 and 2504 DF,  p-value: < 2.2e-16

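Based on the observational data, both regressions estimate that treatment reduces 1978 income: the coefficient on treat is -2.46 with all covariates and -2.21 with the reduced set, or roughly $2,200 to $2,500 if re78 is in thousands of dollars. This has the opposite sign to the experimental estimate, where the treated group's mean income was about $900 higher.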
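
To report the treatment estimate with its uncertainty rather than reading it off the full summary, the coefficient row and a confidence interval can be extracted from the fitted model. The following is a minimal sketch on simulated data. The `toy` data frame and the effect of -2 built into it are invented for illustration.

```r
set.seed(3)
n <- 400
# Simulated data; the true effect of treat is set to -2 for illustration
toy <- data.frame(treat = rbinom(n, 1, 0.5), educ = rnorm(n, 12, 2))
toy$re78 <- 1 - 2 * toy$treat + 0.6 * toy$educ + rnorm(n, sd = 3)

fit <- lm(re78 ~ treat + educ, data = toy)

# Row of the coefficient table for treat: estimate, SE, t value, p-value
treat_row <- coef(summary(fit))["treat", ]
# 95% confidence interval for the treat coefficient
treat_ci <- confint(fit, "treat", level = 0.95)

print(treat_row)
print(treat_ci)
```

For the reduced model above, the reported estimate of -2.21 with standard error 0.71 corresponds to an approximate 95% interval of about -3.6 to -0.8.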