Based on Krueger (1999), "Experimental Estimates of Education Production Functions," QJE 114(2): 497-532.
The relationship of interest is that between the size of a student's class (the number of students in it) and the student's performance on two standardized tests: the Stanford Achievement Test (SAT) and the Tennessee Basic Skills First (BSF) test.
The ideal experiment would randomly assign students and teachers to classes of different sizes, ensure 100 percent compliance, and then compare students' performance on the tests.
Causal identification rests on random assignment: provided that both students and teachers are randomly assigned to classes and everyone complies, the treatment groups are comparable, so any difference in student performance can be attributed to the treatment (class size and access to a teacher's aide).
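To illustrate why random assignment with full compliance identifies the effect, here is a minimal R simulation on made-up data. The sample size, the 5-point class-size effect and the score equation are hypothetical assumptions, not STAR values. Ability is unobserved, but because class type is assigned by a coin flip it is balanced across groups, so the simple difference in mean scores recovers the assumed effect.

```r
# Hypothetical STAR-style experiment on simulated data (all values made up).
set.seed(42)
n <- 6000
ability <- rnorm(n)                  # unobserved student ability
small <- rbinom(n, 1, 0.5)           # coin-flip assignment, full compliance
true_effect <- 5                     # assumed gain from a small class (test points)
score <- 50 + true_effect * small + 10 * ability + rnorm(n, sd = 5)

# Random assignment balances unobserved ability across the two groups...
balance <- mean(ability[small == 1]) - mean(ability[small == 0])
# ...so the raw difference in mean scores estimates the class-size effect.
diff_means <- mean(score[small == 1]) - mean(score[small == 0])
# The OLS slope on a 0/1 treatment dummy equals that difference in means.
fit <- lm(score ~ small)
print(c(balance = balance, diff_means = diff_means,
        ols = unname(coef(fit)["small"])))
```

The regression line is included only to show that, with a single binary treatment, "regress the outcome on the treatment dummy" and "compare group means" are the same estimator.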
Following some parents' complaints, students in regular-size classes were randomly re-assigned between classes with and without a full-time aide at the beginning of first grade, while students in small classes often did not switch and kept the same classmates.
There were some non-random transitions (roughly 10 percent of students) between small and regular classes between grades, due to behavioral problems and parental complaints. In addition, some families relocated during the experiment. There was also attrition: not all students continued from kindergarten to first grade in the same school. The biggest concern is that some of these switches were non-random (e.g., students moved after learning their class assignments).
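The danger of non-random switching can be sketched with a similar simulation. Here we assume, purely hypothetically, that about half of the relatively able regular-class students (ability above 0.5) move into small classes. Comparing students by the class they actually attended then credits part of their ability advantage to class size. All parameter values and the switching rule are made up for illustration.

```r
# Hypothetical non-random switching on simulated data (all values made up).
set.seed(1)
n <- 6000
ability <- rnorm(n)
assigned_small <- rbinom(n, 1, 0.5)  # initial random assignment
true_effect <- 5
# Assumed switching rule: half of the relatively able regular-class students
# (ability > 0.5) move into small classes, e.g. after parental complaints.
switch_in <- assigned_small == 0 & ability > 0.5 & runif(n) < 0.5
actual_small <- ifelse(switch_in, 1, assigned_small)
score <- 50 + true_effect * actual_small + 10 * ability + rnorm(n, sd = 5)

share_switched <- mean(switch_in)    # share of all students who switched
# Groups defined by the class actually attended are no longer balanced...
ability_gap <- mean(ability[actual_small == 1]) - mean(ability[actual_small == 0])
# ...so the naive comparison overstates the true effect of 5.
naive <- mean(score[actual_small == 1]) - mean(score[actual_small == 0])
print(c(share_switched = share_switched, ability_gap = ability_gap,
        naive = naive))
```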
Based on Ashenfelter and Krueger (1994), "Estimates of the Economic Return to Schooling from a New Sample of Twins," AER 84(5): 1157-1173.
The effect of receiving an extra year of education on wages.
Randomly assigning pupils to different levels of education and measuring their earnings once they have completed their schooling.
Use data on the education levels and labor-market outcomes of identical twins to identify the effect of education on earnings, assuming that all unobservable factors (especially those related to both education and earnings) are the same within each pair of identical twins, so that they cancel out when the twins are compared with each other.
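A small simulation can show why comparing twins helps. In the hypothetical data-generating process below, a family-level factor shared by both twins raises both schooling and log wages. Pooled OLS across individuals picks up that factor, while regressing the within-pair wage difference on the within-pair schooling difference cancels it. All parameter values are made-up assumptions.

```r
# Hypothetical twins data-generating process (all values made up).
set.seed(7)
n_pairs <- 2000
family <- rnorm(n_pairs)             # unobserved factor shared by both twins
educ1 <- 13 + 1.5 * family + rnorm(n_pairs)
educ2 <- 13 + 1.5 * family + rnorm(n_pairs)
beta <- 0.08                         # assumed true return to a year of schooling
lwage1 <- 1 + beta * educ1 + 0.3 * family + rnorm(n_pairs, sd = 0.3)
lwage2 <- 1 + beta * educ2 + 0.3 * family + rnorm(n_pairs, sd = 0.3)

# Pooled OLS across all individuals: the family factor is omitted,
# so the schooling slope also absorbs its effect on wages.
ols <- unname(coef(lm(c(lwage1, lwage2) ~ c(educ1, educ2)))[2])
# Within-pair first difference: the family factor drops out.
fd <- unname(coef(lm(I(lwage2 - lwage1) ~ I(educ2 - educ1)))[2])
print(c(ols = ols, fd = fd))
```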
The main threat to identification is that even identical twins may differ in their inherent proclivity toward higher education, and this difference may also affect their earnings. In other words, one twin might obtain more schooling because of a stronger intrinsic drive, and the same drive might also show up as higher wages; the within-pair comparison would then overstate the return to schooling.
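Adding a twin-specific "drive" term to a twins simulation makes this threat concrete. The `drive1`/`drive2` variables are hypothetical (nothing like them is in the data) and every parameter value is made up. Because drive differs within pairs and raises both schooling and wages, it does not cancel in the difference, and the first-difference estimate lands above the assumed true return.

```r
# Hypothetical extension with a twin-specific "drive" (all values made up).
set.seed(11)
n_pairs <- 2000
family <- rnorm(n_pairs)             # shared family factor (cancels in differences)
drive1 <- rnorm(n_pairs)             # twin-specific drive, NOT shared within the pair
drive2 <- rnorm(n_pairs)
educ1 <- 13 + 1.5 * family + drive1 + rnorm(n_pairs)
educ2 <- 13 + 1.5 * family + drive2 + rnorm(n_pairs)
beta <- 0.08                         # assumed true return to schooling
lwage1 <- 1 + beta * educ1 + 0.3 * family + 0.1 * drive1 + rnorm(n_pairs, sd = 0.3)
lwage2 <- 1 + beta * educ2 + 0.3 * family + 0.1 * drive2 + rnorm(n_pairs, sd = 0.3)

# Differencing removes `family` but not the drive gap, which moves both
# the schooling gap and the wage gap, so the estimate exceeds beta.
fd <- unname(coef(lm(I(lwage2 - lwage1) ~ I(educ2 - educ1)))[2])
print(c(beta = beta, fd = fd))
```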
library(utils) # provides download.file() (attached by default in R)
library(haven) # has the read_dta() function to read Stata files
# fetch the data from the URL and save it in the working directory
# (mode = "wb" keeps the binary .dta file intact on Windows)
download.file("http://www.mfilipski.com/files/AshenfelterKrueger1994_twins.dta",
              destfile = "twins.dta", mode = "wb")
twins <- read_dta("twins.dta")
head(twins)
## # A tibble: 6 x 10
## famid age educ1 educ2 lwage1 lwage2 male1 male2 white1 white2
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 33.3 16 16 2.16 2.42 0 0 1 1
## 2 2 43.6 12 19 2.17 2.89 0 0 1 1
## 3 3 31.0 12 12 2.79 2.80 1 1 1 1
## 4 4 34.6 14 14 2.82 2.26 1 1 1 1
## 5 5 35.0 15 13 2.03 3.56 0 0 1 1
## 6 6 29.3 14 12 2.71 2.48 1 1 1 1
Column 5 of Table 3 reports results from a regression of the within-pair difference in log wages on the within-pair difference in years of schooling.
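Why are first-difference results reported under a "fixed-effects" heading? With exactly two observations per family, regressing the within-pair difference on the within-pair difference gives the same schooling slope as a regression in levels with a separate dummy for each twin pair, and the constant of the difference regression equals the coefficient on a dummy for the second-listed twin. The sketch below checks this numerically on simulated data; the data frames `wide` and `long`, their values and the `twin2` dummy are hypothetical, not the Ashenfelter-Krueger sample.

```r
# Hypothetical twin data on simulated values (not the Ashenfelter-Krueger sample).
set.seed(3)
n_pairs <- 50
wide <- data.frame(famid = 1:n_pairs,
                   educ1 = sample(10:18, n_pairs, replace = TRUE),
                   educ2 = sample(10:18, n_pairs, replace = TRUE))
wide$lwage1 <- 1 + 0.09 * wide$educ1 + rnorm(n_pairs, sd = 0.4)
wide$lwage2 <- 1 + 0.09 * wide$educ2 + rnorm(n_pairs, sd = 0.4)

# (1) First-difference regression, one row per pair.
fd <- lm(I(lwage2 - lwage1) ~ I(educ2 - educ1), data = wide)

# (2) Same data stacked to one row per twin, with a dummy for each pair
#     and a dummy `twin2` for the second-listed twin.
long <- data.frame(famid = rep(wide$famid, 2),
                   twin2 = rep(c(0, 1), each = n_pairs),
                   educ  = c(wide$educ1, wide$educ2),
                   lwage = c(wide$lwage1, wide$lwage2))
fe <- lm(lwage ~ educ + twin2 + factor(famid), data = long)

# The schooling slopes coincide, as do the FD constant and the twin2 coefficient.
rbind(first_difference = unname(coef(fd)[1:2]),
      pair_dummies     = unname(coef(fe)[c("twin2", "educ")]))
```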
# fit a linear model: difference in log wages ~ difference in education
# I() evaluates its argument "as is", so the minus signs are treated as
# arithmetic rather than as formula operators; no extra variables are needed
tab3col5 <- lm(I(lwage2 - lwage1) ~ I(educ2 - educ1), data = twins)
# print the table with some light edits
stargazer::stargazer(tab3col5,
                     title = "FIXED-EFFECTS ESTIMATES OF LOG WAGE EQUATIONS FOR IDENTICAL TWINS",
                     out.header = FALSE,
                     dep.var.labels = "First difference",
                     dep.var.caption = "",
                     type = "text",
                     summary = FALSE,
                     covariate.labels = "Own education")
##
## FIXED-EFFECTS ESTIMATES OF LOG WAGE EQUATIONS FOR IDENTICAL TWINS
## ===============================================
## First difference
## -----------------------------------------------
## Own education 0.092***
## (0.024)
##
## Constant 0.079*
## (0.045)
##
## -----------------------------------------------
## Observations 149
## R2 0.092
## Adjusted R2 0.086
## Residual Std. Error 0.554 (df = 147)
## F Statistic 14.914*** (df = 1; 147)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
If our assumptions hold, a twin with one more year of schooling than his or her sibling earns about 0.092 log points, or roughly 9.2 percent, more on average.
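Since the dependent variable is a log, 0.092 is a change in log points; reading it as a percentage is an approximation that works well for small coefficients. The sketch below converts the rounded coefficient and standard error printed in the table into an exact percentage effect and a 95 percent confidence interval. It is approximate because it uses the rounded printed values rather than the fitted model object.

```r
# Rounded values copied from the first-difference table above.
b <- 0.092                       # coefficient on the schooling difference
se <- 0.024                      # its standard error
resid_df <- 147                  # residual degrees of freedom
t_crit <- qt(0.975, resid_df)    # two-sided 95% critical value
ci_log <- b + c(-1, 1) * t_crit * se         # 95% CI in log points
pct <- 100 * (exp(b) - 1)                    # exact percent wage change
ci_pct <- 100 * (exp(ci_log) - 1)            # CI endpoints as percents
round(c(pct = pct, lower = ci_pct[1], upper = ci_pct[2]), 1)
```

This gives roughly 9.6 percent, with an interval of roughly 4.6 to 15 percent.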
# fit the pooled OLS model (Table 3, column 1)
# Instead of reshaping the data to long format (e.g. with melt() or gather()),
# we stack the two twins' values directly in the formula with c(), which
# concatenates its arguments: the first half of the stacked rows is twin 1
# and the second half is twin 2.
# Results match those reported in the paper.
tab3col1 <- lm(c(lwage1, lwage2) ~ c(educ1, educ2) + c(age, age) +
                 c(age^2/100, age^2/100) + c(male1, male2) +
                 c(white1, white2), data = twins)
# print the table with some light editing
stargazer::stargazer(tab3col1,
                     title = "FIXED-EFFECTS ESTIMATES OF LOG WAGE EQUATIONS FOR IDENTICAL TWINS",
                     out.header = FALSE,
                     dep.var.labels = "OLS",
                     dep.var.caption = "",
                     type = "text",
                     covariate.labels = c("Own education", "Age", "Age squared/100", "Male", "White"))
##
## FIXED-EFFECTS ESTIMATES OF LOG WAGE EQUATIONS FOR IDENTICAL TWINS
## ===============================================
## OLS
## -----------------------------------------------
## Own education 0.084***
## (0.014)
##
## Age 0.088***
## (0.019)
##
## Age squared/100 -0.087***
## (0.023)
##
## Male 0.204***
## (0.063)
##
## White -0.410***
## (0.127)
##
## Constant -0.471
## (0.426)
##
## -----------------------------------------------
## Observations 298
## R2 0.272
## Adjusted R2 0.260
## Residual Std. Error 0.532 (df = 292)
## F Statistic 21.860*** (df = 5; 292)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
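We find that, if our assumptions are valid, an extra year of schooling is associated with about 0.084 log points, or roughly 8.4 percent, higher wages. Because pooled OLS does not difference out family background, this requires the stronger assumption that schooling is uncorrelated with unobserved determinants of wages.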
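Holding everything else constant, men earn about 0.204 log points more than women (about 22.6 percent, since exp(0.204) - 1 ≈ 0.226), while white individuals earn about 0.410 log points less than non-white individuals (about 33.6 percent less, since exp(-0.410) - 1 ≈ -0.336).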
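For dummy variables with coefficients as large as these, the log-point approximation can be noticeably off, so it is worth computing exp(b) - 1. A minimal sketch using the rounded coefficients printed in the OLS table:

```r
# Rounded coefficients copied from the OLS table above.
b <- c(educ = 0.084, male = 0.204, white = -0.410)
approx_pct <- 100 * b                 # log-point approximation
exact_pct  <- 100 * (exp(b) - 1)      # exact percent difference
round(cbind(approx_pct, exact_pct), 1)
```

The gap between the two columns grows with the size of the coefficient: it is small for schooling and large for the white dummy.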
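Because age enters as a quadratic, the coefficient on age alone is not the effect of being one year older. The marginal effect of age on log wages is 0.088 + 2(-0.087)(age/100), which declines as age increases: log wages rise with age at a decreasing rate. The age-squared/100 term captures this non-linearity in the relationship between wages and age.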
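The age profile implied by the quadratic can be computed directly from the rounded coefficients in the OLS table. This is a sketch: `marginal_age()` is a hypothetical helper, and evaluating it far outside the ages observed in the sample would be an extrapolation.

```r
# Rounded coefficients copied from the OLS table above.
b_age  <- 0.088
b_age2 <- -0.087   # coefficient on age^2/100
# d(log wage)/d(age) = b_age + 2 * b_age2 * age / 100
marginal_age <- function(age) b_age + 2 * b_age2 * age / 100
ages <- c(25, 35, 45)
round(marginal_age(ages), 4)
# Age at which the fitted log-wage profile peaks (marginal effect = 0)
peak_age <- -b_age / (2 * b_age2 / 100)
peak_age
```

The marginal effect is about 0.044 log points at age 25 and about 0.027 at age 35, and the fitted profile peaks at around age 51.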