Krueger (1999) Experimental Estimates of Education Production Functions QJE 114(2): 497-532
Krueger (1999) estimated the effect of class sizes on student performance (test scores).
The author argued that most past studies rely on a value-added specification, which points to the need for a more appropriate model of student learning. To test the causal effect, the ideal experiment would randomly assign teachers and students to classes of different sizes across schools and test student performance at the end of each school year.
The identification strategy rests on two design features: each school was required to have at least one class of each type (small, regular with aide, and regular without aide), and students were randomly assigned to class types within schools. Because randomization was done within schools, class-size assignment is independent of other variables only within schools.
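One way to write down this within-school comparison is a regression with school fixed effects. The notation is an assumption made here for illustration rather than a quotation from the paper: \(Y_{ics}\) is the test score of student \(i\) in class \(c\) at school \(s\), \(\mathrm{SMALL}_{cs}\) and \(\mathrm{REGA}_{cs}\) indicate a small class and a regular class with aide, and \(\alpha_s\) is a school effect.

\[ Y_{ics} = \beta_0 + \beta_1 \mathrm{SMALL}_{cs} + \beta_2 \mathrm{REGA}_{cs} + \alpha_s + \varepsilon_{ics} \]

Under this sketch, \(\alpha_s\) absorbs differences between schools, so \(\beta_1\) and \(\beta_2\) are identified only from comparisons among students in the same school, which is where random assignment holds.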
Krueger (1999) made several assumptions and deviated from the ideal experimental design:
Students were randomly reassigned between regular-size classes (with and without full-time aides) at the beginning of first grade, while students in small classes continued on in small classes, often with the same set of classmates (re-randomization).
Roughly 10% of students were switched between small and regular-size classes because of behavioral problems or parental complaints (nonrandom transitions).
Krueger addressed these problems, along with the variability of class size within a given assignment type, by using initial random assignment as an instrumental variable for actual class size in some of the analysis. He also addressed the limitation that students and their families relocated during the school year.
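The logic of instrumenting actual class type with initial assignment can be sketched on simulated data. Everything in this block is hypothetical: the variable names, the 5-point effect, and the switching rule are assumptions chosen for illustration, and class type is reduced to a binary small/regular indicator rather than the actual class size used in the paper.

```r
# Simulated illustration only: all names and numbers below are hypothetical,
# not taken from Krueger (1999) or the experimental data.
set.seed(1)
n <- 10000

# Unobserved trait (e.g. parental involvement) that also raises test scores
u <- rnorm(n)

# Initial random assignment to a small class (independent of u by design)
assigned_small <- rbinom(n, 1, 0.3)

# Nonrandom transitions: some students assigned to regular classes
# move into small classes, and they are the ones with high u
switched <- assigned_small == 0 & u > 1.3
actual_small <- as.numeric(assigned_small == 1 | switched)

# Assumed true effect of attending a small class: 5 points
true_effect <- 5
score <- 50 + true_effect * actual_small + 4 * u + rnorm(n, sd = 3)

# Naive OLS on actual class type is contaminated by the switchers
ols_est <- unname(coef(lm(score ~ actual_small))["actual_small"])

# Two-stage least squares by hand:
# stage 1 predicts actual class type from initial assignment,
# stage 2 regresses the score on that prediction
stage1 <- lm(actual_small ~ assigned_small)
small_hat <- fitted(stage1)
iv_est <- unname(coef(lm(score ~ small_hat))["small_hat"])

# Standard errors from the manual second stage are not valid;
# only the point estimates are compared here
print(c(true = true_effect, ols = ols_est, iv = iv_est))
```

In this setup the OLS estimate should overstate the effect because switchers differ systematically, while the estimate based on initial assignment should sit close to the assumed true value. For real work, a dedicated routine such as `ivreg` from the AER package would be the usual choice, since it also reports correct standard errors.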
Ashenfelter and Krueger (1994) Estimates of the Economic Return to Schooling from a New Sample of Twins AER 84(5): 1157-1173
Ashenfelter and Krueger (1994) estimated the returns to schooling by contrasting the wage rates of identical twins with different schooling levels.
The ideal experiment would be a random assignment of subjects to different schooling levels so that all other differences are controlled, and the returns would be attributed to different schooling levels.
To estimate the causal effect of schooling on wages, Ashenfelter and Krueger (1994) controlled for other unobservable factors by assuming that they are identical for both twins in a pair.
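This assumption is what makes within-pair differencing work. In notation assumed here for illustration, let \(\ln w_{ji}\) be the log wage and \(S_{ji}\) the schooling of twin \(j\) in family \(i\), and let \(\mu_i\) be the unobserved component shared by both twins:

\[ \ln w_{1i} = \alpha + \beta S_{1i} + \mu_i + \varepsilon_{1i}, \qquad \ln w_{2i} = \alpha + \beta S_{2i} + \mu_i + \varepsilon_{2i} \]

\[ \ln w_{1i} - \ln w_{2i} = \beta (S_{1i} - S_{2i}) + (\varepsilon_{1i} - \varepsilon_{2i}) \]

Subtracting the second equation from the first removes \(\mu_i\), so \(\beta\) can be estimated from the within-pair differences even when \(\mu_i\) is correlated with schooling.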
Measurement error in schooling, which past studies did not address, could threaten identification; this study explicitly incorporates errors in the measurement of schooling. Schooling may also be associated with family factors (for example, twins who are raised by individual parents), which would in turn affect their wages.
# Load STATA file using the foreign package, make table using stargazer package, and melt data using reshape package
library(foreign)
library(stargazer)
library(reshape)
# Import dta data
my_data <- read.dta("AshenfelterKrueger1994_twins.dta")
head(my_data)
##   famid      age educ1 educ2   lwage1   lwage2 male1 male2 white1 white2
## 1     1 33.25120    16    16 2.161021 2.420368     0     0      1      1
## 2     2 43.57016    12    19 2.169054 2.890372     0     0      1      1
## 3     3 30.96783    12    12 2.791778 2.803360     1     1      1      1
## 4     4 34.63381    14    14 2.824351 2.263366     1     1      1      1
## 5     5 34.97878    15    13 2.032088 3.555348     0     0      1      1
## 6     6 29.33881    14    12 2.708050 2.484907     1     1      1      1
# Create difference variable for lwage and education
my_data$wage_diff <- my_data$lwage1 - my_data$lwage2
my_data$educ_diff <- my_data$educ1 - my_data$educ2
# Run the first difference model
mod <- lm(wage_diff ~ educ_diff, data = my_data)
# Create a table with stargazer package
stargazer(mod, type = "text", title = "TABLE 3", align = TRUE, keep.stat = c("n","rsq"),
dep.var.labels = c("First difference"), covariate.labels = c("Own education"),
omit = c("Constant")) # Display sample size and R-squared and remove constant
##
## TABLE 3
## =========================================
##                   Dependent variable:
##               ---------------------------
##                    First difference
## -----------------------------------------
## Own education           0.092***
##                          (0.024)
##
## -----------------------------------------
## Observations               149
## R2                        0.092
## =========================================
## Note:         *p<0.1; **p<0.05; ***p<0.01
Interpretation: The result shows that \(\hat{\beta} = 0.092\), which means that the wage increases by about 9.2% for each additional year of schooling; the estimate is statistically significant at the 1% level.
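Why the first-difference estimator is useful can be illustrated with a small simulation. The data below are generated, not the twins sample, and every number (a true return of 0.10, the strength of the family component) is an assumption. The upward bias of pooled OLS is built into the simulation by construction, so the direction of the gap need not match what the real data show.

```r
# Simulated illustration only: hypothetical numbers, not the twins data.
set.seed(42)
n_pairs <- 5000
true_beta <- 0.10

# Family component shared by both twins; it raises schooling and wages
family <- rnorm(n_pairs)

# Schooling of each twin depends on the shared family component
educ1 <- 12 + 2 * family + rnorm(n_pairs)
educ2 <- 12 + 2 * family + rnorm(n_pairs)

# Log wages: same return to schooling, same family component
lwage1 <- 1 + true_beta * educ1 + family + rnorm(n_pairs, sd = 0.5)
lwage2 <- 1 + true_beta * educ2 + family + rnorm(n_pairs, sd = 0.5)

# Pooled OLS across all twins picks up the family component
pooled <- data.frame(lwage = c(lwage1, lwage2), educ = c(educ1, educ2))
ols_beta <- unname(coef(lm(lwage ~ educ, data = pooled))["educ"])

# Within-pair first difference removes the family component
wage_diff <- lwage1 - lwage2
educ_diff <- educ1 - educ2
fd_beta <- unname(coef(lm(wage_diff ~ educ_diff))["educ_diff"])

print(c(true = true_beta, ols = ols_beta, fd = fd_beta))
```

Under these assumptions the first-difference estimate should land near the assumed return, while pooled OLS absorbs part of the family component into the schooling coefficient.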
# reshape the data (make it long using melt command from the reshape package)
wage <- melt(cbind(my_data$lwage1, my_data$lwage2))
educ <- melt(cbind(my_data$educ1, my_data$educ2))
male <- melt(cbind(my_data$male1, my_data$male2))
white <- melt(cbind(my_data$white1, my_data$white2))
age <- melt(cbind(my_data$age, my_data$age))
# create a new dataset by combining these variables and make it a data frame
my_newdata <- data.frame(cbind(wage[,3], educ[,3], male[,3], white[,3], age[,3]))
# Give variable names to the data frame
colnames(my_newdata) <- c("wage", "educ", "male", "white", "age")
# Then, create new variable age squared
my_newdata$agesq <- ((my_newdata$age)^2) / 100
# Run the model for this new dataset
mod1 <- lm(wage ~ educ + male + white + age + agesq, data=my_newdata)
# Create a table with stargazer package
stargazer(mod1, type = "text", title = "TABLE 3", align = TRUE, keep.stat = c("n","rsq"),
dep.var.labels = c("OLS"),
covariate.labels = c("Own education", "Male", "White", "Age", "Age squared / 100"),
omit = c("Constant")) # Display sample size and R-squared and remove constant
##
## TABLE 3
## =============================================
##                       Dependent variable:
##                   ---------------------------
##                               OLS
## ---------------------------------------------
## Own education              0.084***
##                             (0.014)
##
## Male                       0.204***
##                             (0.063)
##
## White                      -0.410***
##                             (0.127)
##
## Age                        0.088***
##                             (0.019)
##
## Age squared / 100          -0.087***
##                             (0.023)
##
## ---------------------------------------------
## Observations                  298
## R2                           0.272
## =============================================
## Note:             *p<0.1; **p<0.05; ***p<0.01
Coefficient on education:
The result shows that the coefficient on education is 0.084, which means that the wage increases by about 8.4% on average for each additional year of schooling; the estimate is statistically significant at the 1% level.
Coefficient on other control variables:
The coefficient on male is 0.204, which means that the wage of male twins is 22.63% higher on average than that of female twins; the estimate is statistically significant at the 1% level.
The coefficient on white is -0.410, which means that the wage of white twins is 33.63% lower on average than that of non-white twins; the estimate is statistically significant at the 1% level.
The coefficient on age is 0.088 and the coefficient on agesq is -0.087, so the marginal effect of age on wage, in percent, is 100(0.088) + 2(-0.087)age. This means that at age 40, the wage increases by 1.84% for an additional year of age; both coefficients are statistically significant at the 1% level.
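The percentages quoted for the dummy variables use the exact transformation \(100(e^{\hat{\beta}} - 1)\) rather than the approximation \(100\hat{\beta}\). The arithmetic can be checked with a short sketch; the coefficients are copied from the OLS table, and the helper names `pct_effect` and `age_effect` are introduced here only for illustration.

```r
# Coefficients copied from the OLS table above
b_male  <- 0.204
b_white <- -0.410
b_age   <- 0.088
b_agesq <- -0.087   # coefficient on age^2 / 100

# Exact percentage effect of a dummy variable in a log-wage regression
pct_effect <- function(b) 100 * (exp(b) - 1)
male_pct  <- pct_effect(b_male)
white_pct <- pct_effect(b_white)

# Marginal effect of age in percent: 100 * b_age + 2 * b_agesq * age
age_effect <- function(age) 100 * b_age + 2 * b_agesq * age
age40_pct <- age_effect(40)

print(round(c(male = male_pct, white = white_pct, age40 = age40_pct), 2))
```

The factor of 100 on the age coefficient and its absence on the squared term follow from the rescaling agesq = age^2 / 100 used when the variable was created.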