Assignment based on the seminal paper by Alan Krueger: Krueger (1999), "Experimental Estimates of Education Production Functions," Quarterly Journal of Economics 114(2): 497-532.
The paper seeks to establish the causal relationship between class size and students' educational outcomes: do smaller classes raise students' test scores?
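The ideal experiment is a randomized controlled trial in which students are randomly assigned to class sizes at the start, and the same sample is followed until the end of the experiment without contamination.
Because class size is randomly assigned, it is independent of the other determinants of test scores (such as ability or family background). Any systematic difference in test scores between small and regular classes can therefore be attributed to class size.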
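Random assignment is what lets a simple comparison of means identify the effect. The sketch below uses made-up numbers (not the STAR data): the assumed true effect of a small class is 5 points, and `ability` stands in for any unobserved determinant of scores. Because assignment is a coin flip, ability is balanced across groups and the difference in mean scores recovers the assumed effect.

```r
# Hypothetical simulation with made-up parameters, not the experiment's data
set.seed(1999)
n       <- 4000
ability <- rnorm(n)                           # unobserved determinant of scores
small   <- rbinom(n, size = 1, prob = 0.5)    # random assignment to a small class
effect  <- 5                                  # assumed true effect, in score points
score   <- 50 + effect * small + 10 * ability + rnorm(n, sd = 5)

# Randomization balances ability across the two groups ...
balance <- mean(ability[small == 1]) - mean(ability[small == 0])

# ... so the difference in mean scores estimates the class-size effect
diff_means <- mean(score[small == 1]) - mean(score[small == 0])
c(balance = balance, estimate = diff_means)
```

With non-random assignment (for example, if more able students were placed in small classes), `balance` would be non-zero and `diff_means` would mix the class-size effect with the ability gap.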
Assignment based on the seminal paper by Orley Ashenfelter and Alan Krueger: Ashenfelter and Krueger (1994), "Estimates of the Economic Return to Schooling from a New Sample of Twins," American Economic Review 84(5): 1157-1173.
The paper seeks to estimate the causal effect of education on wages: does an additional year of schooling increase a worker's wage?
The sample was collected at a twins festival (the Twins Days Festival in Twinsburg, Ohio) rather than drawn randomly from the whole U.S. population. A random sample would have controlled for biases due to:
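The authors control for unobservables by comparing identical twins, whose genetic endowment and family background are the same, so these factors are held constant within each pair. Measurement error is addressed by interviewing the twins separately and asking each about the other's education and wage, which provides a second, independent report.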
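Both ideas can be seen in a small simulation. This is a sketch with made-up parameters, not the authors' data or exact estimator: a shared family effect raises both schooling and wages, so pooled OLS is biased upward; differencing within pairs removes it. Adding reporting error to schooling then attenuates the first-difference estimate, and using the sibling's report of the difference as an instrument (a simple Wald/IV ratio, assuming the two reports have independent errors) undoes the attenuation.

```r
# Hypothetical simulation: parameters are assumptions, not estimates from the paper
set.seed(1994)
n_pairs <- 5000
beta    <- 0.10                                # assumed true return to a year of schooling

family <- rnorm(n_pairs)                       # unobserved effect shared within a pair
S1 <- 13 + 1.5 * family + rnorm(n_pairs)       # true schooling, twin 1
S2 <- 13 + 1.5 * family + rnorm(n_pairs)       # true schooling, twin 2
y1 <- 1 + beta * S1 + 0.2 * family + rnorm(n_pairs, sd = 0.3)   # log wage, twin 1
y2 <- 1 + beta * S2 + 0.2 * family + rnorm(n_pairs, sd = 0.3)   # log wage, twin 2

# Pooled OLS on stacked data: biased upward by the omitted family effect
b_ols <- unname(coef(lm(c(y1, y2) ~ c(S1, S2)))[2])

# Within-pair first difference: the shared family effect cancels out
b_fd <- unname(coef(lm(I(y1 - y2) ~ I(S1 - S2)))[2])

# Schooling observed with error: own report and sibling's cross-report (independent errors)
own1   <- S1 + rnorm(n_pairs, sd = 0.7); own2   <- S2 + rnorm(n_pairs, sd = 0.7)
cross1 <- S1 + rnorm(n_pairs, sd = 0.7); cross2 <- S2 + rnorm(n_pairs, sd = 0.7)

dy <- y1 - y2
dx <- own1 - own2                              # noisy difference attenuates the estimate
dz <- cross1 - cross2                          # sibling-reported difference as instrument
b_fd_noisy <- unname(coef(lm(dy ~ dx))[2])
b_fd_iv    <- cov(dy, dz) / cov(dx, dz)        # simple IV (Wald) estimator

round(c(ols = b_ols, fd = b_fd, fd_noisy = b_fd_noisy, fd_iv = b_fd_iv), 3)
```

The design choice to draw the two reports with independent errors is what makes the cross-report a valid instrument here; if the errors were correlated, the IV estimate would also be biased.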
library(haven)   # read_dta() reads Stata .dta files
df <- read_dta("data/AshenfelterKrueger1994_twins.dta")
head(df, 6)
famid | age | educ1 | educ2 | lwage1 | lwage2 | male1 | male2 | white1 | white2 |
---|---|---|---|---|---|---|---|---|---|
1 | 33.3 | 16 | 16 | 2.16 | 2.42 | 0 | 0 | 1 | 1 |
2 | 43.6 | 12 | 19 | 2.17 | 2.89 | 0 | 0 | 1 | 1 |
3 | 31 | 12 | 12 | 2.79 | 2.8 | 1 | 1 | 1 | 1 |
4 | 34.6 | 14 | 14 | 2.82 | 2.26 | 1 | 1 | 1 | 1 |
5 | 35 | 15 | 13 | 2.03 | 3.56 | 0 | 0 | 1 | 1 |
6 | 29.3 | 14 | 12 | 2.71 | 2.48 | 1 | 1 | 1 | 1 |
library(stargazer)
# Within-pair (first) differences: twin 1 minus twin 2
df_fd <- data.frame(edu   = df$educ1 - df$educ2,
                    lwage = df$lwage1 - df$lwage2)
model1 <- lm(lwage ~ edu, data = df_fd)
stargazer(model1, header = FALSE, type = 'text', font.size = "small",
          omit.stat = c("adj.rsq", "ser", "f"),
          title = "Table 3: First Difference estimates of log wage for identical twins",
          covariate.labels = c("Education", "Constant"))
##
## Table 3: First Difference estimates of log wage for identical twins
## ========================================
## Dependent variable:
## ---------------------------
## lwage
## ----------------------------------------
## Education 0.092***
## (0.024)
##
## Constant -0.079*
## (0.045)
##
## ----------------------------------------
## Observations 149
## R2 0.092
## ========================================
## Note: *p<0.1; **p<0.05; ***p<0.01
The first-difference estimate implies that a one-year difference in schooling between twins is associated with a wage difference of about 9.2% (0.092 log points).
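The reason differencing removes family background can be written out under a simplified version of the setup (an assumption for illustration): the log wage of twin $i$ in family $j$ depends on own schooling $S_{ij}$, a family effect $\mu_j$ shared by both twins, and a twin-order intercept $\alpha_i$.

$$
\ln w_{ij} = \alpha_i + \beta S_{ij} + \mu_j + \varepsilon_{ij}, \qquad i = 1, 2
$$

Subtracting the equation for twin 2 from the equation for twin 1 eliminates $\mu_j$:

$$
\ln w_{1j} - \ln w_{2j} = (\alpha_1 - \alpha_2) + \beta\,(S_{1j} - S_{2j}) + (\varepsilon_{1j} - \varepsilon_{2j})
$$

In the regression above, the intercept (-0.079) estimates $\alpha_1 - \alpha_2$ and the slope (0.092) estimates $\beta$.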
First, reshape the data into long form, with one row per twin.
# Stack twin 1 and twin 2 into separate rows; `time` records twin order
df_new <- reshape(df, idvar = c("famid", "age"),
                  varying = list(c(3, 4), c(5, 6), c(7, 8), c(9, 10)),
                  v.names = c("edu", "lwage", "male", "white"),
                  direction = "long")
head(df_new, 6)
famid | age | time | edu | lwage | male | white |
---|---|---|---|---|---|---|
1 | 33.3 | 1 | 16 | 2.16 | 0 | 1 |
2 | 43.6 | 1 | 12 | 2.17 | 0 | 1 |
3 | 31 | 1 | 12 | 2.79 | 1 | 1 |
4 | 34.6 | 1 | 14 | 2.82 | 1 | 1 |
5 | 35 | 1 | 15 | 2.03 | 0 | 1 |
6 | 29.3 | 1 | 14 | 2.71 | 1 | 1 |
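To see what the `varying` list does, here is the same `reshape()` call on a toy data frame with two hypothetical twin pairs (all values made up). Each element of `varying` names a pair of wide columns that become one long column, named by the matching entry of `v.names`; all twin-1 rows come first, then all twin-2 rows.

```r
# Toy wide data: two hypothetical twin pairs with made-up values
toy <- data.frame(famid  = 1:2, age = c(30, 40),
                  educ1  = c(12, 16),  educ2  = c(14, 16),
                  lwage1 = c(2.0, 2.5), lwage2 = c(2.2, 2.4))

# Same call pattern as above, with columns named instead of given by position
toy_long <- reshape(toy, idvar = c("famid", "age"),
                    varying = list(c("educ1", "educ2"), c("lwage1", "lwage2")),
                    v.names = c("edu", "lwage"),
                    direction = "long")
toy_long
```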
Next, reproduce the OLS estimates in column (i) of Table 3 of the paper using the stacked (long) data frame.
# Age squared, divided by 100 as in the paper
df_new$age_sq <- df_new$age^2 / 100
model2 <- lm(lwage ~ edu + age + age_sq + male + white, data = df_new)
stargazer(model2, header = FALSE, type = 'text', font.size = "small",
          omit.stat = c("adj.rsq", "ser", "f"),
          title = "Table 3: Ordinary Least Squares (OLS) estimates of log wage for identical twins",
          covariate.labels = c("Own education", "Age", "Age Squared (/100)", "Male", "White"))
##
## Table 3: Ordinary Least Squares (OLS) estimates of log wage for identical twins
## ==============================================
## Dependent variable:
## ---------------------------
## lwage
## ----------------------------------------------
## Own education 0.084***
## (0.014)
##
## Age 0.088***
## (0.019)
##
## Age Squared (/100) -0.087***
## (0.023)
##
## Male 0.204***
## (0.063)
##
## White -0.410***
## (0.127)
##
## Constant -0.471
## (0.426)
##
## ----------------------------------------------
## Observations 298
## R2 0.272
## ==============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
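Because the long data frame has one row per twin, the first-difference estimate can also be obtained from it: with exactly two twins per family, a regression with family fixed effects and a twin-order dummy gives the same education coefficient as the first difference with an intercept. The sketch below checks this on hypothetical simulated data laid out like `df_new` (the names `sim`, `fe_fit`, and `fd_fit` are illustrative, not from the original).

```r
# Hypothetical toy data in the same long layout as df_new (not the real sample)
set.seed(42)
n_fam      <- 50
fam_effect <- rnorm(n_fam)
sim <- data.frame(famid = rep(1:n_fam, times = 2),
                  time  = rep(1:2, each = n_fam))
sim$edu   <- round(13 + 2 * fam_effect[sim$famid] + rnorm(2 * n_fam))
sim$lwage <- 1 + 0.09 * sim$edu + 0.3 * fam_effect[sim$famid] +
             rnorm(2 * n_fam, sd = 0.3)

# Family fixed effects plus a twin-order dummy, on the stacked data
fe_fit <- lm(lwage ~ edu + factor(time) + factor(famid), data = sim)

# First differences (twin 1 minus twin 2) with an intercept, like model1
t1 <- sim[sim$time == 1, ]
t2 <- sim[sim$time == 2, ]
stopifnot(identical(t1$famid, t2$famid))   # rows line up family by family
fd_fit <- lm(I(t1$lwage - t2$lwage) ~ I(t1$edu - t2$edu))

fe_edu <- unname(coef(fe_fit)["edu"])
fd_edu <- unname(coef(fd_fit)[2])
c(fixed_effects = fe_edu, first_difference = fd_edu)   # equal up to rounding error
```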
Both models in a single table:
library(huxtable)
# Stacked OLS (column i) and first difference (column v) side by side
model_tbl3 <- list("OLS (i)" = model2, "First difference (v)" = model1)
# `coefs` keeps and relabels only the listed terms, so the intercept is not shown
ht <- huxreg(model_tbl3, number_format = 3,
             coefs = c("Own education" = "edu", "Age" = "age", "Age Squared (/100)" = "age_sq",
                       "Male" = "male", "White" = "white"),
             statistics = c("Sample size:" = "nobs", "R2" = "r.squared")) |>
  set_caption("Table 3: Ordinary Least Squares (OLS) and First difference estimates of log wage for identical twins")
add_footnote(ht, "Each equation also includes an intercept term. Numbers in parentheses are estimated standard errors.")
| | OLS (i) | First difference (v) |
|---|---|---|
| Own education | 0.084 *** | 0.092 *** |
| | (0.014) | (0.024) |
| Age | 0.088 *** | |
| | (0.019) | |
| Age Squared (/100) | -0.087 *** | |
| | (0.023) | |
| Male | 0.204 ** | |
| | (0.063) | |
| White | -0.410 ** | |
| | (0.127) | |
| Sample size: | 298 | 149 |
| R2 | 0.272 | 0.092 |

*** p < 0.001; ** p < 0.01; * p < 0.05.
Each equation also includes an intercept term. Numbers in parentheses are estimated standard errors.
OLS on the stacked data gives a return to schooling of 8.4% per year, compared with 9.2% from the within-twin first difference. Wages rise with age, but the negative coefficient on Age Squared implies that beyond a certain age wages begin to decline. The coefficient on White is -0.410: other things equal, white twins in this sample earn about 41 log points less than non-white twins.
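Two pieces of arithmetic make these statements concrete. The sketch below uses only the coefficients reported in the table: it solves for the age at which the quadratic age profile peaks, and converts log-point coefficients to exact percentage effects via $100(e^{b} - 1)$. The helper `pct` is a hypothetical name introduced for illustration.

```r
# Coefficients copied from the OLS column of Table 3 above
b_age    <- 0.088
b_age_sq <- -0.087                 # coefficient on age^2 / 100

# d(lwage)/d(age) = b_age + 2 * (b_age_sq / 100) * age = 0  =>  peak age
peak_age <- -b_age / (2 * b_age_sq / 100)
peak_age                           # roughly 50.6 years

# Hypothetical helper: exact percentage effect implied by a log-point coefficient
pct <- function(b) 100 * (exp(b) - 1)
round(pct(c(educ_ols = 0.084, educ_fd = 0.092, white = -0.410)), 1)
```

For small coefficients such as the schooling returns, the log-point and exact percentage figures are close; for the large White coefficient the exact percentage gap is noticeably smaller in magnitude than 41%.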