# Load the necessary packages
library(haven)
library(stargazer)
Reference: Krueger (1999), "Experimental Estimates of Education Production Functions," QJE 114(2): 497-532
a. What is the causal link the paper is trying to reveal?
This paper tries to identify the causal effect of class size on student performance, as measured by standardized test scores.
b. What would be the ideal experiment to test this causal link?
The ideal experiment is a randomized controlled trial in which a large sample of students is randomly assigned to classes of different sizes. Because assignment is random, the groups should be comparable on average, so their performance can be compared using standardized test scores at the end of the trial period.
c. What is the identification strategy?
The identification strategy relies on the random assignment of students to classes of different sizes. Random assignment makes class size independent of omitted variables/characteristics that would otherwise be correlated with it, so differences in test scores across class types can be attributed to class size.
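The role of random assignment can be illustrated with a small simulation. This is only a sketch on made-up data: the variable names, the sample size, the "true" effect of 5 points, and the way ability enters the score are all hypothetical and are not taken from Krueger (1999).

```r
set.seed(1)
n <- 5000
ability <- rnorm(n)          # unobserved student characteristic (hypothetical)
true_effect <- 5             # hypothetical test-score gain from a small class

# Non-random sorting: higher-ability students are more likely to be in small classes.
small_sorted <- rbinom(n, 1, plogis(ability))
score_sorted <- 50 + true_effect * small_sorted + 10 * ability + rnorm(n)

# Random assignment: class type is independent of ability.
small_random <- rbinom(n, 1, 0.5)
score_random <- 50 + true_effect * small_random + 10 * ability + rnorm(n)

# Simple regressions that omit ability in both cases.
coef_sorted <- coef(lm(score_sorted ~ small_sorted))[["small_sorted"]]
coef_random <- coef(lm(score_random ~ small_random))[["small_random"]]
c(sorted = coef_sorted, random = coef_random)
```

Under these assumptions the estimate from the sorted data should overstate the effect, because it also picks up ability, while the estimate from the randomly assigned data should land close to the hypothetical true effect even though ability is omitted from both regressions.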
d. What are the assumptions / threats to this identification strategy?
The author notes several limitations of the random assignment. Even after students were randomly assigned to class sizes, some switched between small and regular classes between grades, primarily because of behavioral problems or parental complaints. Actual class sizes also varied because some students nonrandomly left to join another school. These departures from randomization could threaten the identification strategy.
Reference: Ashenfelter and Krueger (1994), "Estimates of the Economic Return to Schooling from a New Sample of Twins," AER 84(5): 1157-1173
a. What is the causal link the paper is trying to reveal?
Ashenfelter and Krueger (1994) estimate the causal effect of schooling on the wage rate, using identical twins who have different schooling levels.
b. What would be the ideal experiment to test this causal link?
The ideal experiment would be a randomized controlled trial in which individuals are randomly assigned to different numbers of years of schooling. Comparing wages across the groups, for example by regressing wages on years of schooling, would then give the causal effect of schooling on wages.
c. What is the identification strategy?
The identification strategy is to relate the within-pair difference in wage rates between identical twins to the within-pair difference in years of schooling completed.
d. What are the assumptions / threats to this identification strategy?
The authors assume that monozygotic (from the same egg) twins are genetically identical and share the same family background. Under that assumption, a within-pair difference in wages reflects the within-pair difference in schooling, which in turn arises from differences in individual preferences.
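The reasoning behind this assumption can be sketched with a stylized model. The notation is illustrative and not copied from the paper: $y_{ji}$ is the log wage of twin $j$ in family $i$, $S_{ji}$ is that twin's years of schooling, and $\mu_i$ collects the genetic and family-background factors assumed to be common to both twins.

```latex
\begin{aligned}
y_{1i} &= \alpha + \beta S_{1i} + \mu_i + \varepsilon_{1i}, \\
y_{2i} &= \alpha + \beta S_{2i} + \mu_i + \varepsilon_{2i}, \\
y_{1i} - y_{2i} &= \beta \,(S_{1i} - S_{2i}) + (\varepsilon_{1i} - \varepsilon_{2i}).
\end{aligned}
```

Differencing removes $\mu_i$, so $\beta$ can be estimated without bias from the shared component. The threat is any factor that differs within a pair and is correlated with the schooling difference, since it would remain in $\varepsilon_{1i} - \varepsilon_{2i}$.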
a. Load Ashenfelter and Krueger AER 1994 data.
# Load the dataset
paper2data <- read_dta("AshenfelterKrueger1994_twins.dta")
dim(paper2data)
## [1] 149 10
head(paper2data)
## # A tibble: 6 x 10
## famid age educ1 educ2 lwage1 lwage2 male1 male2 white1 white2
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 33.3 16 16 2.16 2.42 0 0 1 1
## 2 2 43.6 12 19 2.17 2.89 0 0 1 1
## 3 3 31.0 12 12 2.79 2.80 1 1 1 1
## 4 4 34.6 14 14 2.82 2.26 1 1 1 1
## 5 5 35.0 15 13 2.03 3.56 0 0 1 1
## 6 6 29.3 14 12 2.71 2.48 1 1 1 1
b. Reproduce the result from table 3 column 5.
# Within-pair differences (twin 1 minus twin 2) in schooling and log wage
paper2data$educdiff <- paper2data$educ1 - paper2data$educ2
paper2data$wagediff <- paper2data$lwage1 - paper2data$lwage2
# Regress the log-wage difference on the schooling difference
model1 <- lm(wagediff ~ educdiff, data = paper2data)
stargazer(model1, type = "text", title = "Table 3 Column 5", align = TRUE,
          dep.var.labels = "First difference (v)", keep.stat = c("n", "rsq"),
          omit = c("Constant"))
##
## Table 3 Column 5
## ========================================
## Dependent variable:
## ---------------------------
## First difference (v)
## ----------------------------------------
## educdiff 0.092***
## (0.024)
##
## ----------------------------------------
## Observations 149
## R2 0.092
## ========================================
## Note: *p<0.1; **p<0.05; ***p<0.01
c. Explain how this coefficient should be interpreted.
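Because the dependent variable is the within-pair difference in log wages, the coefficient is read as a proportional effect: on average, one additional year of schooling is associated with a wage rate that is approximately 9.2% higher. The estimate is statistically significant at the 1% level.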
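The percentage reading of a log-wage coefficient is an approximation. As a minimal sketch, the snippet below compares it with the exact percentage change implied by the 0.092 estimate reported in the table above. The object names are illustrative and the coefficient is typed in by hand rather than extracted from `model1`.

```r
# Coefficient on educdiff as reported in the table above (log points per year)
beta_fd <- 0.092

# Approximate and exact percentage effects of one additional year of schooling
approx_pct <- 100 * beta_fd
exact_pct  <- 100 * (exp(beta_fd) - 1)

round(c(approximate = approx_pct, exact = exact_pct), 2)
```

For a coefficient this small the two numbers are close, with the exact figure slightly above the approximation, so reporting the effect as roughly 9% is reasonable.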
d. Reproduce the result in table 3 column 1.
# Reshape from wide (one row per twin pair) to long (one row per twin)
tab3col1 <- reshape(paper2data,
                    varying = c("educ1", "lwage1", "male1", "white1",
                                "educ2", "lwage2", "male2", "white2"),
                    v.names = c("educ", "lwage", "male", "white"),
                    timevar = "twin",
                    times = c("T1", "T2"),
                    idvar = c("famid", "age"),
                    direction = "long")
# Sort by family so the two twins in each pair sit on adjacent rows
tab3col1 <- tab3col1[order(tab3col1$famid), ]
# Age squared, divided by 100 to match the scaling in the table label
tab3col1$agesq <- tab3col1$age^2 / 100
model2 <- lm(lwage ~ educ + age + agesq + male + white, data = tab3col1)
stargazer(model2, type = "text", title = "Table 3 Column 1", align = TRUE,
          dep.var.labels = "OLS (i)", keep.stat = c("n", "rsq"),
          covariate.labels = c("Own education", "Age", "Age squared (÷ 100)",
                               "Male", "White"),
          omit = c("Constant"))
##
## Table 3 Column 1
## ===============================================
## Dependent variable:
## ---------------------------
## OLS (i)
## -----------------------------------------------
## Own education 0.084***
## (0.014)
##
## Age 0.088***
## (0.019)
##
## Age squared (÷ 100) -0.087***
## (0.023)
##
## Male 0.204***
## (0.063)
##
## White -0.410***
## (0.127)
##
## -----------------------------------------------
## Observations 298
## R2 0.272
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
e. Explain how the coefficient on education should be interpreted.
Holding age, sex, and race constant, one additional year of education is associated with a wage rate that is approximately 8.4% higher on average. The estimate is statistically significant at the 1% level.
f. Explain how the coefficient on the control variables should be interpreted.
The positive coefficient on age and the negative coefficient on age squared imply a non-linear, concave relationship between age and the wage rate: wages rise with age at first and then decline. The coefficient on male indicates that, on average and holding the other covariates constant, the wage rate of male twins is about 20.4% higher than that of female twins. The coefficient on white indicates that white twins have a wage rate about 0.41 log points lower than non-white twins on average (approximately 41% under the usual log approximation). All of these estimates are statistically significant at the 1% level.
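These readings can be made more concrete with a short calculation based on the coefficients reported in the Table 3 Column 1 output above. This is only a sketch: the coefficients are typed in by hand, the object names are illustrative, and the turning point is just the age at which the fitted quadratic peaks, which may or may not lie inside the sample's age range.

```r
# Coefficients as reported in the Table 3 Column 1 output above
b_age   <- 0.088
b_agesq <- -0.087   # coefficient on age^2 / 100
b_male  <- 0.204
b_white <- -0.410

# Fitted profile: b_age * age + b_agesq * age^2 / 100.
# Setting its derivative to zero gives the age at which log wage peaks.
peak_age <- -100 * b_age / (2 * b_agesq)

# Exact percentage differences implied by the dummy-variable coefficients
male_pct  <- 100 * (exp(b_male) - 1)
white_pct <- 100 * (exp(b_white) - 1)

round(c(peak_age = peak_age, male_pct = male_pct, white_pct = white_pct), 1)
```

The fitted wage profile peaks at roughly age 50. The log approximation is looser for large coefficients: the exact percentage gap implied by the coefficient on white is noticeably smaller in magnitude than 41%, while the exact figure for male is somewhat above 20.4%.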