summary(assignment2)
##       re78            train             age             educ     
##  Min.   : 0.000   Min.   :0.0000   Min.   :17.00   Min.   : 3.0  
##  1st Qu.: 0.000   1st Qu.:0.0000   1st Qu.:20.00   1st Qu.: 9.0  
##  Median : 3.702   Median :0.0000   Median :24.00   Median :10.0  
##  Mean   : 5.301   Mean   :0.4157   Mean   :25.37   Mean   :10.2  
##  3rd Qu.: 8.125   3rd Qu.:1.0000   3rd Qu.:28.00   3rd Qu.:11.0  
##  Max.   :60.308   Max.   :1.0000   Max.   :55.00   Max.   :16.0  
##      black             hisp            married            re74        
##  Min.   :0.0000   Min.   :0.00000   Min.   :0.0000   Min.   : 0.0000  
##  1st Qu.:1.0000   1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.: 0.0000  
##  Median :1.0000   Median :0.00000   Median :0.0000   Median : 0.0000  
##  Mean   :0.8337   Mean   :0.08764   Mean   :0.1685   Mean   : 2.1023  
##  3rd Qu.:1.0000   3rd Qu.:0.00000   3rd Qu.:0.0000   3rd Qu.: 0.8244  
##  Max.   :1.0000   Max.   :1.00000   Max.   :1.0000   Max.   :39.5707  
##       re75       
##  Min.   : 0.000  
##  1st Qu.: 0.000  
##  Median : 0.000  
##  Mean   : 1.377  
##  3rd Qu.: 1.221  
##  Max.   :25.142
str(assignment2)
## Classes 'tbl_df', 'tbl' and 'data.frame':    445 obs. of  9 variables:
##  $ re78   : num  9.93 3.6 24.91 7.51 0.29 ...
##   ..- attr(*, "label")= chr "real earns., 1978, $1000s"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ train  : num  1 1 1 1 1 1 1 1 1 1 ...
##   ..- attr(*, "label")= chr "=1 if assigned to job training"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ age    : num  37 22 30 27 33 22 23 32 22 33 ...
##   ..- attr(*, "label")= chr "age in 1977"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ educ   : num  11 9 12 11 8 9 12 11 16 12 ...
##   ..- attr(*, "label")= chr "years of education"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ black  : num  1 0 1 1 1 1 1 1 1 0 ...
##   ..- attr(*, "label")= chr "=1 if black"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ hisp   : num  0 1 0 0 0 0 0 0 0 0 ...
##   ..- attr(*, "label")= chr "=1 if Hispanic"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ married: num  1 0 0 0 0 0 0 0 0 1 ...
##   ..- attr(*, "label")= chr "=1 if married"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ re74   : num  0 0 0 0 0 0 0 0 0 0 ...
##   ..- attr(*, "label")= chr "real earns., 1974, $1000s"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ re75   : num  0 0 0 0 0 0 0 0 0 0 ...
##   ..- attr(*, "label")= chr "real earns., 1975, $1000s"
##   ..- attr(*, "format.stata")= chr "%9.0g"

Question 1

Test baseline equivalence for the variables age, years of education, and marital status using t-tests. Report the mean for each group, the mean difference, SE, p-value, and effect size (Hedges’ g). Present your results in a table similar to Table 1 of Konstantopoulos et al. 2013, and interpret the results.

tage <- t.test(assignment2$age ~ assignment2$train)
tage
## 
##  Welch Two Sample t-test
## 
## data:  assignment2$age by assignment2$train
## t = -1.114, df = 393.11, p-value = 0.2659
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.1077774  0.5830373
## sample estimates:
## mean in group 0 mean in group 1 
##        25.05385        25.81622
l1 <- length(assignment2$train[assignment2$train==0])
l2 <- length(assignment2$train[assignment2$train==1])
library(esc)
esc_t(t=tage$statistic, grp1n = l1, grp2n=l2, es.type="g")
## 
## Effect Size Calculation for Meta Analysis
## 
##      Conversion: t-value to effect size Hedges' g
##     Effect Size:  -0.1070
##  Standard Error:   0.0963
##        Variance:   0.0093
##        Lower CI:  -0.2956
##        Upper CI:   0.0817
##          Weight: 107.9394
teduc <- t.test(assignment2$educ ~ assignment2$train)
teduc
## 
##  Welch Two Sample t-test
## 
## data:  assignment2$educ by assignment2$train
## t = -1.4422, df = 340.6, p-value = 0.1502
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.60866000  0.09369118
## sample estimates:
## mean in group 0 mean in group 1 
##        10.08846        10.34595
esc_t(t=teduc$statistic, grp1n = l1, grp2n=l2, es.type="g")
## 
## Effect Size Calculation for Meta Analysis
## 
##      Conversion: t-value to effect size Hedges' g
##     Effect Size:  -0.1385
##  Standard Error:   0.0963
##        Variance:   0.0093
##        Lower CI:  -0.3272
##        Upper CI:   0.0503
##          Weight: 107.8379
tmarried <- t.test(assignment2$married ~ assignment2$train)
tmarried
## 
##  Welch Two Sample t-test
## 
## data:  assignment2$married by assignment2$train
## t = -0.96684, df = 375.72, p-value = 0.3342
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.10722173  0.03653566
## sample estimates:
## mean in group 0 mean in group 1 
##       0.1538462       0.1891892
esc_t(t=tmarried$statistic, grp1n = l1, grp2n=l2, es.type="g")
## 
## Effect Size Calculation for Meta Analysis
## 
##      Conversion: t-value to effect size Hedges' g
##     Effect Size:  -0.0928
##  Standard Error:   0.0962
##        Variance:   0.0093
##        Lower CI:  -0.2815
##        Upper CI:   0.0958
##          Weight: 107.9765
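
The quantities requested for the table (group means, mean difference, SE, p-value, Hedges' g) can also be gathered in one row per covariate. The following is a minimal base-R sketch, not the method used above: `balance_row` is a hypothetical helper, the data are simulated, and the difference is taken as treated minus control, so its sign is the opposite of the `t.test` output above (group 0 minus group 1). Because g is computed here from the pooled SD rather than converted from the Welch t statistic, values may differ slightly from those reported by `esc_t`.

```r
# Hypothetical helper: one row of a baseline-balance table (base R only).
balance_row <- function(x, treat) {
  x1 <- x[treat == 1]                      # treated group
  x0 <- x[treat == 0]                      # control group
  n1 <- length(x1)
  n0 <- length(x0)
  tt <- t.test(x1, x0)                     # Welch two-sample t-test
  se <- sqrt(var(x1) / n1 + var(x0) / n0)  # SE of the mean difference
  # Pooled SD, Cohen's d, and the small-sample correction J for Hedges' g
  sp <- sqrt(((n1 - 1) * var(x1) + (n0 - 1) * var(x0)) / (n1 + n0 - 2))
  d  <- (mean(x1) - mean(x0)) / sp
  J  <- 1 - 3 / (4 * (n1 + n0 - 2) - 1)
  data.frame(mean_control = mean(x0), mean_treated = mean(x1),
             diff = mean(x1) - mean(x0), se = se,
             p_value = tt$p.value, hedges_g = J * d)
}

# Simulated covariate and assignment (assumption: NOT the assignment2 data)
set.seed(1)
treat_sim <- rep(c(0, 1), each = 50)
age_sim   <- rnorm(100, mean = 25, sd = 7)
row_age   <- balance_row(age_sim, treat_sim)
print(row_age)
```

With the real data, rows for age, educ, and married could be stacked with `rbind()` to form the table.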

Question 2

Test baseline equivalence using a regression of treatment assignment (i.e., train) on the full set of covariates included in the dataset. Report the regression results and the F-statistic, and interpret the results.

Reg <- lm(assignment2$train ~ assignment2$age+assignment2$educ+assignment2$married, data = assignment2)
summary(Reg)
## 
## Call:
## lm(formula = assignment2$train ~ assignment2$age + assignment2$educ + 
##     assignment2$married, data = assignment2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.5352 -0.4194 -0.3638  0.5682  0.6947 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)         0.140895   0.158293   0.890    0.374
## assignment2$age     0.003112   0.003370   0.924    0.356
## assignment2$educ    0.018523   0.013098   1.414    0.158
## assignment2$married 0.041645   0.064045   0.650    0.516
## 
## Residual standard error: 0.4929 on 441 degrees of freedom
## Multiple R-squared:  0.008614,   Adjusted R-squared:  0.00187 
## F-statistic: 1.277 on 3 and 441 DF,  p-value: 0.2816
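
The question asks for the full set of covariates, while the regression above uses three of them. As a hedged sketch of how the full-set version and its joint F-test might look, the code below uses a simulated data frame that only borrows the column names of the dataset (an assumption; the numbers are not the assignment2 results). The overall F-statistic tests whether all covariates jointly predict assignment.

```r
# Simulated stand-in for the data (assumption: same column names, fake values)
set.seed(42)
n <- 400
sim <- data.frame(
  train   = rbinom(n, 1, 0.4),
  age     = round(runif(n, 17, 55)),
  educ    = sample(3:16, n, replace = TRUE),
  black   = rbinom(n, 1, 0.8),
  hisp    = rbinom(n, 1, 0.1),
  married = rbinom(n, 1, 0.17),
  re74    = rexp(n, rate = 1 / 2),
  re75    = rexp(n, rate = 1 / 1.4)
)

# Regress assignment on every other column.
# With the real data the outcome would be excluded: train ~ . - re78
fit <- lm(train ~ ., data = sim)

# Overall F-statistic and its p-value (joint test of all covariates)
fs <- summary(fit)$fstatistic
p_joint <- pf(fs["value"], fs["numdf"], fs["dendf"], lower.tail = FALSE)
print(fs)
print(unname(p_joint))
```

Writing the formula with bare column names plus `data =` also keeps the coefficient labels short (`age` rather than `assignment2$age`).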

Question 3

Use regression analysis to evaluate the treatment effects.

Simple regression analysis: \(re78 = β_0 + β_1Train + u\)

* Interpret the results

SimReg <- lm(assignment2$re78 ~ assignment2$train, data = assignment2)
summary(SimReg)
## 
## Call:
## lm(formula = assignment2$re78 ~ assignment2$train, data = assignment2)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.349 -4.555 -1.829  2.917 53.959 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         4.5548     0.4080  11.162  < 2e-16 ***
## assignment2$train   1.7943     0.6329   2.835  0.00479 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.58 on 443 degrees of freedom
## Multiple R-squared:  0.01782,    Adjusted R-squared:  0.01561 
## F-statistic: 8.039 on 1 and 443 DF,  p-value: 0.004788
library(MASS)
RobReg <- rlm(assignment2$re78 ~ assignment2$train, data = assignment2)
summary(RobReg)
## 
## Call: rlm(formula = assignment2$re78 ~ assignment2$train, data = assignment2)
## Residuals:
##    Min     1Q Median     3Q    Max 
## -5.321 -4.156 -1.072  3.454 54.987 
## 
## Coefficients:
##                   Value   Std. Error t value
## (Intercept)        4.1557  0.3169    13.1120
## assignment2$train  1.1655  0.4916     2.3711
## 
## Residual standard error: 6.161 on 443 degrees of freedom
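
In the OLS output above, the intercept (4.5548) is the control-group mean of re78 and the coefficient on train (1.7943) is the treated-minus-control difference in means, both in $1000s. That identity holds for any regression on a single 0/1 indicator; the sketch below checks it on simulated data (an assumption, not the assignment2 data).

```r
# Simulated outcome with a binary treatment indicator (illustrative values only)
set.seed(7)
treat <- rep(c(0, 1), times = c(60, 40))
y <- 4.5 + 1.8 * treat + rnorm(100, sd = 6)

fit_simple <- lm(y ~ treat)

# Difference in group means, treated minus control
diff_means <- mean(y[treat == 1]) - mean(y[treat == 0])

# The slope equals the mean difference; the intercept equals the control mean
print(c(slope = unname(coef(fit_simple)["treat"]), diff_means = diff_means))
print(c(intercept = unname(coef(fit_simple)["(Intercept)"]),
        control_mean = mean(y[treat == 0])))
```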

Multiple regression: \(re78 = β_0 + β_1Train + β_2age + β_3educ + β_4black + β_5hisp + β_6married + β_7re74 + β_8re75 + u\)

* Should include all the covariates in the dataset

MulReg <- lm(assignment2$re78 ~ assignment2$train + assignment2$age + assignment2$educ + assignment2$black + assignment2$hisp + assignment2$married + assignment2$re74 + assignment2$re75, data = assignment2)
summary(MulReg)
## 
## Call:
## lm(formula = assignment2$re78 ~ assignment2$train + assignment2$age + 
##     assignment2$educ + assignment2$black + assignment2$hisp + 
##     assignment2$married + assignment2$re74 + assignment2$re75, 
##     data = assignment2)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.768 -4.399 -1.668  2.983 54.081 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)   
## (Intercept)          0.62174    2.44747   0.254  0.79959   
## assignment2$train    1.68259    0.63177   2.663  0.00802 **
## assignment2$age      0.05577    0.04477   1.246  0.21355   
## assignment2$educ     0.40588    0.17563   2.311  0.02130 * 
## assignment2$black   -2.16978    1.15859  -1.873  0.06177 . 
## assignment2$hisp     0.15793    1.54526   0.102  0.91864   
## assignment2$married -0.14027    0.87845  -0.160  0.87321   
## assignment2$re74     0.08286    0.07666   1.081  0.28040   
## assignment2$re75     0.05153    0.13419   0.384  0.70114   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.506 on 436 degrees of freedom
## Multiple R-squared:  0.05482,    Adjusted R-squared:  0.03748 
## F-statistic: 3.161 on 8 and 436 DF,  p-value: 0.001714
* Interpret the results regarding the treatment effects and the regression coefficients for all covariates

* Check whether there is a multicollinearity problem using variance inflation factors (Stata command “vif”; vif() from the car package in R) and interpret the results
library(car)
## Loading required package: carData
vif(MulReg)
##   assignment2$train     assignment2$age    assignment2$educ 
##            1.019210            1.059961            1.039169 
##   assignment2$black    assignment2$hisp assignment2$married 
##            1.956452            2.007244            1.136847 
##    assignment2$re74    assignment2$re75 
##            1.773579            1.875306
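
For reference, the VIF of predictor j is \(1 / (1 - R_j^2)\), where \(R_j^2\) comes from regressing predictor j on all the other predictors. The sketch below computes this by hand in base R on simulated predictors (an assumption, not the assignment2 data); `manual_vif` is a hypothetical helper. For a model with only numeric, single-column terms, these values should correspond to what `car::vif` reports.

```r
# Hypothetical helper: VIF for each column of a data frame of predictors
manual_vif <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    # Regress predictor v on all remaining predictors
    f  <- reformulate(setdiff(names(X), v), response = v)
    r2 <- summary(lm(f, data = X))$r.squared
    1 / (1 - r2)
  })
}

# Simulated predictors: x2 is built from x1, x3 is independent of both
set.seed(99)
x1 <- rnorm(200)
x2 <- x1 + rnorm(200, sd = 0.5)
x3 <- rnorm(200)
vifs <- manual_vif(data.frame(x1 = x1, x2 = x2, x3 = x3))
print(round(vifs, 2))
```

Here x1 and x2 get clearly elevated VIFs because each largely predicts the other, while x3 stays close to 1.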

(Bonus question, 1 point)

Compare the treatment effects estimated from simple regression and multiple regression. Explain why they are not the same. Should they be the same theoretically?
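
One way to explore this question is a small simulation. The sketch below uses a made-up data-generating process (a true effect of 2, one covariate, random assignment; all of these are assumptions, not features of the assignment2 data) and compares the simple and covariate-adjusted estimates over repeated samples. `one_rep` is a hypothetical helper.

```r
# One simulated experiment: returns the simple and adjusted treatment estimates
one_rep <- function(n = 200, effect = 2) {
  x     <- rnorm(n)                          # baseline covariate
  treat <- sample(rep(0:1, length.out = n))  # random assignment, equal groups
  y     <- effect * treat + x + rnorm(n)     # outcome depends on both
  c(simple   = unname(coef(lm(y ~ treat))["treat"]),
    adjusted = unname(coef(lm(y ~ treat + x))["treat"]))
}

set.seed(123)
est <- replicate(500, one_rep())  # 2 x 500 matrix of estimates

print(rowMeans(est))       # average estimate per method
print(apply(est, 1, sd))   # sampling variability per method
```

In this setup both estimators center on the true effect, but they differ in any single sample because chance imbalance in the covariate shifts the simple estimate, and the adjusted estimate varies less across samples.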