summary(assignment2)
## re78 train age educ
## Min. : 0.000 Min. :0.0000 Min. :17.00 Min. : 3.0
## 1st Qu.: 0.000 1st Qu.:0.0000 1st Qu.:20.00 1st Qu.: 9.0
## Median : 3.702 Median :0.0000 Median :24.00 Median :10.0
## Mean : 5.301 Mean :0.4157 Mean :25.37 Mean :10.2
## 3rd Qu.: 8.125 3rd Qu.:1.0000 3rd Qu.:28.00 3rd Qu.:11.0
## Max. :60.308 Max. :1.0000 Max. :55.00 Max. :16.0
## black hisp married re74
## Min. :0.0000 Min. :0.00000 Min. :0.0000 Min. : 0.0000
## 1st Qu.:1.0000 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.: 0.0000
## Median :1.0000 Median :0.00000 Median :0.0000 Median : 0.0000
## Mean :0.8337 Mean :0.08764 Mean :0.1685 Mean : 2.1023
## 3rd Qu.:1.0000 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.: 0.8244
## Max. :1.0000 Max. :1.00000 Max. :1.0000 Max. :39.5707
## re75
## Min. : 0.000
## 1st Qu.: 0.000
## Median : 0.000
## Mean : 1.377
## 3rd Qu.: 1.221
## Max. :25.142
str(assignment2)
## Classes 'tbl_df', 'tbl' and 'data.frame': 445 obs. of 9 variables:
## $ re78 : num 9.93 3.6 24.91 7.51 0.29 ...
## ..- attr(*, "label")= chr "real earns., 1978, $1000s"
## ..- attr(*, "format.stata")= chr "%9.0g"
## $ train : num 1 1 1 1 1 1 1 1 1 1 ...
## ..- attr(*, "label")= chr "=1 if assigned to job training"
## ..- attr(*, "format.stata")= chr "%9.0g"
## $ age : num 37 22 30 27 33 22 23 32 22 33 ...
## ..- attr(*, "label")= chr "age in 1977"
## ..- attr(*, "format.stata")= chr "%9.0g"
## $ educ : num 11 9 12 11 8 9 12 11 16 12 ...
## ..- attr(*, "label")= chr "years of education"
## ..- attr(*, "format.stata")= chr "%9.0g"
## $ black : num 1 0 1 1 1 1 1 1 1 0 ...
## ..- attr(*, "label")= chr "=1 if black"
## ..- attr(*, "format.stata")= chr "%9.0g"
## $ hisp : num 0 1 0 0 0 0 0 0 0 0 ...
## ..- attr(*, "label")= chr "=1 if Hispanic"
## ..- attr(*, "format.stata")= chr "%9.0g"
## $ married: num 1 0 0 0 0 0 0 0 0 1 ...
## ..- attr(*, "label")= chr "=1 if married"
## ..- attr(*, "format.stata")= chr "%9.0g"
## $ re74 : num 0 0 0 0 0 0 0 0 0 0 ...
## ..- attr(*, "label")= chr "real earns., 1974, $1000s"
## ..- attr(*, "format.stata")= chr "%9.0g"
## $ re75 : num 0 0 0 0 0 0 0 0 0 0 ...
## ..- attr(*, "label")= chr "real earns., 1975, $1000s"
## ..- attr(*, "format.stata")= chr "%9.0g"
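The `str()` output shows that every column carries a Stata variable label in its `label` attribute. The `tbl_df` class and `format.stata` attributes suggest the file was read with `haven::read_dta()`, but that is an assumption. The hypothetical helper `codebook()` below collects those labels into a small table, which is handy when writing up the results. It uses only base R.

```r
# Hypothetical helper: build a two-column codebook (variable name, label)
# from the "label" attribute that Stata imports attach to each column.
codebook <- function(data) {
  labs <- vapply(data, function(x) {
    lab <- attr(x, "label", exact = TRUE)
    if (is.null(lab)) NA_character_ else as.character(lab)
  }, character(1))
  data.frame(variable = names(data), label = unname(labs),
             stringsAsFactors = FALSE)
}
```

Usage: `codebook(assignment2)` should list, for example, `re78` as "real earns., 1978, $1000s". Note that all earnings variables are in thousands of dollars.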
Test baseline equivalence for age, years of education, and marital status using t-tests. For each variable, report the group means, the mean difference, its SE, the p-value, and the effect size (Hedges' g). Present the results in a table similar to Table 1 of Konstantopoulos et al. 2013, and interpret them.
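For reference, Hedges' g is the standardized mean difference with a small-sample correction \(J\). Here \(n_C = 260\) control and \(n_T = 185\) treated cases (445 × 0.4157), so \(J = 1 - 3/1771 \approx 0.9983\). `esc_t()` used below starts from the t-statistic and converts it with \(d = t\sqrt{1/n_1 + 1/n_2}\). This conversion is exact for the pooled-variance t and only approximate when t comes from Welch's test, as it does here.

\[
g = J \cdot \frac{\bar{x}_T - \bar{x}_C}{S_p}, \qquad
S_p = \sqrt{\frac{(n_T - 1)s_T^2 + (n_C - 1)s_C^2}{n_T + n_C - 2}}, \qquad
J = 1 - \frac{3}{4(n_T + n_C) - 9}
\]

\[
g_{\text{from } t} = J \cdot t\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
\]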
tage <- t.test(assignment2$age ~ assignment2$train)
tage
##
## Welch Two Sample t-test
##
## data: assignment2$age by assignment2$train
## t = -1.114, df = 393.11, p-value = 0.2659
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.1077774 0.5830373
## sample estimates:
## mean in group 0 mean in group 1
## 25.05385 25.81622
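# Group sizes: control (train = 0) and treatment (train = 1).
# t.test() orders the groups 0 then 1, so each t (and the g from esc_t) is
# control minus treatment: a negative value means a higher treatment-group mean.
l1 <- sum(assignment2$train == 0)
l2 <- sum(assignment2$train == 1)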
library(esc)
esc_t(t=tage$statistic, grp1n = l1, grp2n=l2, es.type="g")
##
## Effect Size Calculation for Meta Analysis
##
## Conversion: t-value to effect size Hedges' g
## Effect Size: -0.1070
## Standard Error: 0.0963
## Variance: 0.0093
## Lower CI: -0.2956
## Upper CI: 0.0817
## Weight: 107.9394
teduc <- t.test(assignment2$educ ~ assignment2$train)
teduc
##
## Welch Two Sample t-test
##
## data: assignment2$educ by assignment2$train
## t = -1.4422, df = 340.6, p-value = 0.1502
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.60866000 0.09369118
## sample estimates:
## mean in group 0 mean in group 1
## 10.08846 10.34595
esc_t(t=teduc$statistic, grp1n = l1, grp2n=l2, es.type="g")
##
## Effect Size Calculation for Meta Analysis
##
## Conversion: t-value to effect size Hedges' g
## Effect Size: -0.1385
## Standard Error: 0.0963
## Variance: 0.0093
## Lower CI: -0.3272
## Upper CI: 0.0503
## Weight: 107.8379
tmarried <- t.test(assignment2$married ~ assignment2$train)
tmarried
##
## Welch Two Sample t-test
##
## data: assignment2$married by assignment2$train
## t = -0.96684, df = 375.72, p-value = 0.3342
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.10722173 0.03653566
## sample estimates:
## mean in group 0 mean in group 1
## 0.1538462 0.1891892
esc_t(t=tmarried$statistic, grp1n = l1, grp2n=l2, es.type="g")
##
## Effect Size Calculation for Meta Analysis
##
## Conversion: t-value to effect size Hedges' g
## Effect Size: -0.0928
## Standard Error: 0.0962
## Variance: 0.0093
## Lower CI: -0.2815
## Upper CI: 0.0958
## Weight: 107.9765
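A Table 1-style summary needs the means, difference, SE, p-value, and g for each variable in one place. The sketch below uses two hypothetical helpers, `baseline_row()` and `baseline_table()`, which are not part of `esc` or any package. They report treatment minus control, so their differences and g values have the opposite sign of the t-tests and `esc_t()` output above. The SE and p-value follow Welch's test. g is computed from the pooled SD directly, so it can differ slightly from the `esc_t()` value, which converts the Welch t.

```r
# Hypothetical helper: one Table 1-style row for a single baseline covariate.
baseline_row <- function(data, var_name, group = "train") {
  x <- data[[var_name]]
  g <- data[[group]]
  x1 <- x[g == 1]  # treatment group
  x0 <- x[g == 0]  # control group
  n1 <- length(x1)
  n0 <- length(x0)
  d <- mean(x1) - mean(x0)                 # treatment minus control
  se <- sqrt(var(x1) / n1 + var(x0) / n0)  # Welch (unpooled) SE
  p <- t.test(x1, x0)$p.value              # Welch two-sample t-test
  s_pooled <- sqrt(((n1 - 1) * var(x1) + (n0 - 1) * var(x0)) / (n1 + n0 - 2))
  j <- 1 - 3 / (4 * (n1 + n0) - 9)         # small-sample correction
  data.frame(variable = var_name, mean_treat = mean(x1), mean_control = mean(x0),
             diff = d, se = se, p_value = p, hedges_g = j * d / s_pooled)
}

# Stack one row per covariate into a single table.
baseline_table <- function(data, vars, group = "train") {
  do.call(rbind, lapply(vars, function(v) baseline_row(data, v, group)))
}
```

Usage: `baseline_table(assignment2, c("age", "educ", "married"))`, after which the columns can be rounded and formatted for the report.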
Test baseline equivalence using a regression of treatment assignment (i.e., train) on the full set of covariates in the dataset. Report the regression results and the F-statistic, and interpret the results.
Reg <- lm(assignment2$train ~ assignment2$age+assignment2$educ+assignment2$married, data = assignment2)
summary(Reg)
##
## Call:
## lm(formula = assignment2$train ~ assignment2$age + assignment2$educ +
## assignment2$married, data = assignment2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.5352 -0.4194 -0.3638 0.5682 0.6947
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.140895 0.158293 0.890 0.374
## assignment2$age 0.003112 0.003370 0.924 0.356
## assignment2$educ 0.018523 0.013098 1.414 0.158
## assignment2$married 0.041645 0.064045 0.650 0.516
##
## Residual standard error: 0.4929 on 441 degrees of freedom
## Multiple R-squared: 0.008614, Adjusted R-squared: 0.00187
## F-statistic: 1.277 on 3 and 441 DF, p-value: 0.2816
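The regression above includes only age, educ, and married. The task asks for the full covariate set, which is age, educ, black, hisp, married, re74, and re75. Below is a minimal sketch of the full version with a hypothetical helper, `balance_check()`. It fits the linear probability model of `train` on the covariates and extracts the overall F-test. Under successful randomization, this joint test should not reject the null that all covariate coefficients are zero.

```r
# Hypothetical helper: balance regression of the assignment indicator on a
# set of covariates, returning the fit and the overall F-test.
balance_check <- function(data, covariates, group = "train") {
  fml <- reformulate(covariates, response = group)  # e.g. train ~ age + educ
  fit <- lm(fml, data = data)
  fstat <- summary(fit)$fstatistic                  # value, numdf, dendf
  list(
    fit = fit,
    F = unname(fstat["value"]),
    p_value = unname(pf(fstat["value"], fstat["numdf"], fstat["dendf"],
                        lower.tail = FALSE))
  )
}
```

Usage: `bc <- balance_check(assignment2, c("age", "educ", "black", "hisp", "married", "re74", "re75"))`, then `summary(bc$fit)`. The F-test reported there has 7 and 437 degrees of freedom.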
Use regression analysis to evaluate the treatment effects.
Simple regression analysis: \(re78 = \beta_0 + \beta_1 train + u\)
* Interpret the results
SimReg <- lm(assignment2$re78 ~ assignment2$train, data = assignment2)
summary(SimReg)
##
## Call:
## lm(formula = assignment2$re78 ~ assignment2$train, data = assignment2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.349 -4.555 -1.829 2.917 53.959
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.5548 0.4080 11.162 < 2e-16 ***
## assignment2$train 1.7943 0.6329 2.835 0.00479 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.58 on 443 degrees of freedom
## Multiple R-squared: 0.01782, Adjusted R-squared: 0.01561
## F-statistic: 8.039 on 1 and 443 DF, p-value: 0.004788
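With a single binary regressor, OLS simply reproduces the group means. The intercept (4.5548) is the control-group mean of re78. The slope (1.7943) is the treatment-minus-control difference in means. Because re78 is measured in $1000s, that slope is about $1,794 in 1978 real earnings. The hypothetical helper below checks this equivalence directly.

```r
# Hypothetical helper: compare OLS coefficients with raw group means.
mean_diff_check <- function(data, outcome = "re78", group = "train") {
  y <- data[[outcome]]
  g <- data[[group]]
  fit <- lm(y ~ g)
  c(
    control_mean    = mean(y[g == 0]),
    ols_intercept   = unname(coef(fit)[1]),
    mean_difference = mean(y[g == 1]) - mean(y[g == 0]),
    ols_slope       = unname(coef(fit)[2])
  )
}
```

Usage: `mean_diff_check(assignment2)` should return matching pairs, with an intercept of 4.5548 and a slope of 1.7943.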
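library(MASS)
# Robust regression (Huber M-estimation, the rlm() default): downweights
# observations with large residuals, such as the re78 maximum of 60.3
RobReg <- rlm(assignment2$re78 ~ assignment2$train, data = assignment2)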
summary(RobReg)
##
## Call: rlm(formula = assignment2$re78 ~ assignment2$train, data = assignment2)
## Residuals:
## Min 1Q Median 3Q Max
## -5.321 -4.156 -1.072 3.454 54.987
##
## Coefficients:
## Value Std. Error t value
## (Intercept) 4.1557 0.3169 13.1120
## assignment2$train 1.1655 0.4916 2.3711
##
## Residual standard error: 6.161 on 443 degrees of freedom
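`rlm()` fits a robust M-estimator. It changes the point estimate itself (1.1655 versus the OLS 1.7943) because it downweights large residuals. If the goal is instead to keep the OLS estimate and only make the standard errors robust to heteroskedasticity, a common choice is HC1. The sketch below computes HC1 standard errors by hand in base R. The same matrix is available as `sandwich::vcovHC(fit, type = "HC1")`.

```r
# HC1 heteroskedasticity-robust SEs for an lm fit:
# (X'X)^{-1} [sum_i e_i^2 x_i x_i'] (X'X)^{-1} * n / (n - k)
hc1_se <- function(fit) {
  X <- model.matrix(fit)
  e <- residuals(fit)
  n <- nrow(X)
  k <- ncol(X)
  bread <- solve(crossprod(X))
  meat <- crossprod(X * e)  # row i of X scaled by e_i, then X'X
  vcov_hc1 <- bread %*% meat %*% bread * n / (n - k)
  sqrt(diag(vcov_hc1))
}
```

Usage: `hc1_se(SimReg)` gives robust SEs for the intercept and the `train` coefficient of the simple regression, leaving the estimate of 1.7943 unchanged.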
Multiple regression: \(re78 = \beta_0 + \beta_1 train + \beta_2 age + \beta_3 educ + \beta_4 black + \beta_5 hisp + \beta_6 married + \beta_7 re74 + \beta_8 re75 + u\)
* Include all the covariates in the dataset
MulReg <- lm(assignment2$re78 ~ assignment2$train + assignment2$age + assignment2$educ + assignment2$black + assignment2$hisp + assignment2$married + assignment2$re74 + assignment2$re75, data = assignment2)
summary(MulReg)
##
## Call:
## lm(formula = assignment2$re78 ~ assignment2$train + assignment2$age +
## assignment2$educ + assignment2$black + assignment2$hisp +
## assignment2$married + assignment2$re74 + assignment2$re75,
## data = assignment2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.768 -4.399 -1.668 2.983 54.081
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.62174 2.44747 0.254 0.79959
## assignment2$train 1.68259 0.63177 2.663 0.00802 **
## assignment2$age 0.05577 0.04477 1.246 0.21355
## assignment2$educ 0.40588 0.17563 2.311 0.02130 *
## assignment2$black -2.16978 1.15859 -1.873 0.06177 .
## assignment2$hisp 0.15793 1.54526 0.102 0.91864
## assignment2$married -0.14027 0.87845 -0.160 0.87321
## assignment2$re74 0.08286 0.07666 1.081 0.28040
## assignment2$re75 0.05153 0.13419 0.384 0.70114
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.506 on 436 degrees of freedom
## Multiple R-squared: 0.05482, Adjusted R-squared: 0.03748
## F-statistic: 3.161 on 8 and 436 DF, p-value: 0.001714
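Because `assignment2` contains exactly re78 plus the eight regressors, the same model can be written more compactly with `.`, which expands to every column other than the response. Using `data =` without `assignment2$` prefixes also gives shorter coefficient names (`train`, `age`, ...) in the output. The helper name `fit_full_model()` is hypothetical.

```r
# Hypothetical helper: regress re78 on every other column of `data`.
# For assignment2 this is train, age, educ, black, hisp, married, re74, re75.
fit_full_model <- function(data) {
  lm(re78 ~ ., data = data)
}
```

Usage: `summary(fit_full_model(assignment2))` should reproduce the coefficients of `MulReg`, for example 1.68259 for `train`. This holds only as long as no extra columns are added to the data frame.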
* Interpret the results regarding the treatment effects and the regression coefficients for all covariates
* Check for multicollinearity using variance inflation factors (`vif()` from the car package, the R counterpart of Stata's `vif` command), and interpret the results
library(car)
## Loading required package: carData
vif(MulReg)
## assignment2$train assignment2$age assignment2$educ
## 1.019210 1.059961 1.039169
## assignment2$black assignment2$hisp assignment2$married
## 1.956452 2.007244 1.136847
## assignment2$re74 assignment2$re75
## 1.773579 1.875306
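Each VIF is \(1/(1 - R_j^2)\), where \(R_j^2\) comes from regressing regressor \(j\) on all the other regressors. The values above range from about 1.02 (train) to 2.01 (hisp), well below the common rule-of-thumb cutoffs of 5 or 10. The hypothetical helper below reproduces this calculation for models with only numeric regressors, which makes the definition concrete.

```r
# Hypothetical helper: VIF_j = 1 / (1 - R_j^2) for each regressor j,
# where R_j^2 is from the auxiliary regression of j on the other regressors.
vif_by_hand <- function(data, regressors) {
  sapply(regressors, function(v) {
    others <- setdiff(regressors, v)
    r2 <- summary(lm(reformulate(others, response = v), data = data))$r.squared
    1 / (1 - r2)
  })
}
```

Usage: `vif_by_hand(assignment2, c("train", "age", "educ", "black", "hisp", "married", "re74", "re75"))` should match the `vif(MulReg)` values.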
Compare the treatment effects estimated from simple regression and multiple regression. Explain why they are not the same. Should they be the same theoretically?
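For OLS on the same sample, the two estimates are linked by an exact identity (the omitted-variable formula). The simple-regression slope equals the multiple-regression slope plus \(\sum_k \hat\gamma_k \hat\delta_k\). Here \(\hat\gamma_k\) is covariate \(k\)'s coefficient in the multiple regression, and \(\hat\delta_k\) is the slope from regressing covariate \(k\) on train, i.e., its treatment-minus-control mean difference. Under random assignment every \(\delta_k\) is zero in expectation, so both estimators target the same effect. In a finite sample, however, chance covariate imbalances make them differ. The hypothetical helper below computes the decomposition.

```r
# Hypothetical helper: decompose the gap between simple and covariate-adjusted
# treatment estimates; simple = multiple + adjustment holds exactly for OLS.
ovb_decomposition <- function(data, outcome, treatment, covariates) {
  short <- lm(reformulate(treatment, response = outcome), data = data)
  long <- lm(reformulate(c(treatment, covariates), response = outcome), data = data)
  # delta_k: slope on the treatment when covariate k is regressed on it
  delta <- sapply(covariates, function(z) {
    coef(lm(reformulate(treatment, response = z), data = data))[[treatment]]
  })
  gamma <- coef(long)[covariates]  # covariate coefficients in the long model
  c(
    simple = coef(short)[[treatment]],
    multiple = coef(long)[[treatment]],
    adjustment = sum(gamma * delta)
  )
}
```

Usage: `ovb_decomposition(assignment2, "re78", "train", c("age", "educ", "black", "hisp", "married", "re74", "re75"))` should return 1.7943 and 1.6826 (up to rounding), with an adjustment of about 0.112 accounting for the gap.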