Part 1: Paper using randomized data: Impact of Class Size on Learning

Based on Krueger (1999) Experimental Estimates of Education Production Functions QJE 114(2) : 497-532

1.1

  1. What is the causal link the paper is trying to reveal?

The causal effect of class size (the number of students in a class) on student performance on two standardized tests: the Stanford Achievement Test (SAT) and the Tennessee Basic Skills First (BSF) test.

  2. What would be the ideal experiment to test this causal link?

One in which students and teachers are randomly assigned to classes of different sizes, with 100 percent compliance. We would then compare students' test performance across class sizes.

  3. What is the identification strategy?

Identification rests on random assignment. Provided that both students and teachers are randomly assigned to classes and compliance is complete, the treatment groups are comparable in expectation, so any difference in student performance can be attributed to the treatment (class size and access to a teacher's aide).
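Why random assignment licenses a simple comparison of means can be illustrated with a small simulation. This is only a sketch on made-up data, not the data used in the paper: the sample size, the share of small classes, the `ability` variable, and the assumed effect of 5 points are all hypothetical.

```r
# Simulated illustration only: none of these numbers come from the paper.
set.seed(1)
n <- 6000
ability <- rnorm(n)              # unobserved student characteristic
small <- rbinom(n, 1, 0.3)       # random assignment to a small class (1 = small)
true_effect <- 5                 # assumed effect of a small class on the test score
score <- 50 + 10 * ability + true_effect * small + rnorm(n, sd = 5)

# Balance check: under random assignment, ability should not differ by group
balance <- unname(coef(lm(ability ~ small))["small"])

# Difference in mean scores between small-class and regular-class students
diff_means <- mean(score[small == 1]) - mean(score[small == 0])

# The same number is the slope in a regression of score on the treatment dummy
ols_slope <- unname(coef(lm(score ~ small))["small"])
```

Because `small` is drawn independently of `ability`, `balance` is close to zero and `diff_means` is close to the assumed effect, even though `ability` is never observed or controlled for.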

  4. What are the assumptions/threats to this identification strategy?

The strategy assumes that assignment was random and that students remained in the class type they were assigned to. Two features of the experiment threaten this:

  1. Following complaints from some parents, students in regular-size classes were randomly re-assigned between classes with and without a full-time aide at the beginning of first grade, whereas students in small classes generally did not switch and kept the same set of classmates.

  2. About 10 percent of students moved between small and regular classes between grades, and these moves were not random: they reflected behavioral problems and parental complaints. In addition, some families relocated during the experiment, and there was attrition, since not all students continued from kindergarten to first grade in the same school. The biggest concern is that some of these moves were non-random (i.e., students moved after learning their class assignments).

Part 2: Paper using Twins for Identification: Economic Returns to Schooling

Based on Ashenfelter and Krueger (1994) Estimates of the Economic Return to Schooling from a New Sample of Twins AER 84(5): 1157-1173

2.1

  1. What is the causal link the paper is trying to reveal?

The effect of receiving an extra year of education on wages.

  2. What would be the ideal experiment to test this causal link?

Randomly assigning individuals to different levels of education and comparing their earnings after they complete their schooling.

  3. What is the identification strategy?

Use data on the education levels and labor market outcomes of identical twins, and compare the two twins within each pair. The assumption is that unobservable factors related to both education and earnings (such as family background and innate ability) are the same for identical twins, so differencing within a pair removes them and leaves only the effect of education on earnings.
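The differencing argument can be written out explicitly. The following is a stylized sketch: the notation ($y_{ij}$ for the log wage of twin $j$ in family $i$, $S_{ij}$ for years of schooling, $\mu_i$ for the unobserved component shared by both twins) is chosen here for illustration and is not necessarily the paper's.

```latex
\begin{aligned}
y_{i1} &= \alpha + \beta S_{i1} + \mu_i + \varepsilon_{i1} \\
y_{i2} &= \alpha + \beta S_{i2} + \mu_i + \varepsilon_{i2} \\
y_{i2} - y_{i1} &= \beta\,(S_{i2} - S_{i1}) + (\varepsilon_{i2} - \varepsilon_{i1})
\end{aligned}
```

The shared term $\mu_i$ cancels in the third line, so $\beta$ is identified from within-pair differences in schooling as long as $S_{i2} - S_{i1}$ is uncorrelated with $\varepsilon_{i2} - \varepsilon_{i1}$.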

  4. What are the assumptions/threats to this identification strategy?

The main threat to identification is that even identical twins may differ in their inherent inclination toward higher education, and that this difference may also affect their earnings. In other words, one twin might obtain more education because of a stronger intrinsic drive, and the same drive might also show up as higher wages, in which case the within-pair difference in schooling does not isolate the effect of education.

2.2 Replication Analysis

  1. Load Ashenfelter and Krueger AER 1994 data.
library(utils) # required to download file off the Web
library(haven) # has the read_dta() function to read Stata files

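# fetch data from the url and store it in the working directory
# (mode = "wb" keeps the binary .dta file intact on Windows)
download.file("http://www.mfilipski.com/files/AshenfelterKrueger1994_twins.dta", "twins.dta", mode = "wb")
# read the same file that was just downloaded
twins <- read_dta("twins.dta")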
head(twins)
## # A tibble: 6 x 10
##   famid   age educ1 educ2 lwage1 lwage2 male1 male2 white1 white2
##   <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl>
## 1     1  33.3    16    16   2.16   2.42     0     0      1      1
## 2     2  43.6    12    19   2.17   2.89     0     0      1      1
## 3     3  31.0    12    12   2.79   2.80     1     1      1      1
## 4     4  34.6    14    14   2.82   2.26     1     1      1      1
## 5     5  35.0    15    13   2.03   3.56     0     0      1      1
## 6     6  29.3    14    12   2.71   2.48     1     1      1      1
  2. Reproduce the result from table 3 column 5

Table 3 column 5 reports results from a regression of the intrapair difference in log wage rates on the intrapair difference in schooling levels.

# fit a linear model as: difference in log wages ~ difference in education
# to fit the model without creating extra variables, wrap the differences in I()
# I() makes R evaluate '-' as arithmetic ('as is') rather than as a formula operator
tab3col5 <- lm(I(lwage2 - lwage1) ~ I(educ2 - educ1), data = twins)

# print the table with some light edits
stargazer::stargazer(tab3col5,
                     title = "FIXED-EFFECTS ESTIMATES OF LOG WAGE EQUATIONS FOR IDENTICAL TWINS",
                     out.header = FALSE,
                     dep.var.labels = "First difference",
                     dep.var.caption = "",
                     type = "text",
                     summary = FALSE,
                     covariate.labels = "Own education")
## 
## FIXED-EFFECTS ESTIMATES OF LOG WAGE EQUATIONS FOR IDENTICAL TWINS
## ===============================================
##                          First difference      
## -----------------------------------------------
## Own education                0.092***          
##                               (0.024)          
##                                                
## Constant                      0.079*           
##                               (0.045)          
##                                                
## -----------------------------------------------
## Observations                    149            
## R2                             0.092           
## Adjusted R2                    0.086           
## Residual Std. Error      0.554 (df = 147)      
## F Statistic           14.914*** (df = 1; 147)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01
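The table title refers to fixed effects, while the code above fits a first-difference regression. With exactly two observations per family the two approaches give the same slope, which the sketch below checks. It uses simulated twin pairs rather than the Ashenfelter and Krueger data: the number of pairs, the `family` component, and the assumed return of 0.09 are all hypothetical.

```r
# Simulated twin pairs for illustration only; not the Ashenfelter-Krueger data.
set.seed(42)
n_pairs <- 200
family <- rnorm(n_pairs)                          # unobserved, shared within a pair
educ1 <- round(13 + 2 * family + rnorm(n_pairs))  # schooling correlates with family
educ2 <- round(13 + 2 * family + rnorm(n_pairs))
beta <- 0.09                                      # assumed return to schooling
lwage1 <- 1 + beta * educ1 + 0.5 * family + rnorm(n_pairs, sd = 0.3)
lwage2 <- 1 + beta * educ2 + 0.5 * family + rnorm(n_pairs, sd = 0.3)

# (a) first-difference regression, same form as the replication code
fd <- lm(I(lwage2 - lwage1) ~ I(educ2 - educ1))

# (b) the same data stacked in long format, with one dummy per family
long <- data.frame(
  famid = factor(rep(seq_len(n_pairs), 2)),
  twin2 = rep(c(0, 1), each = n_pairs),   # plays the role of the intercept in (a)
  educ  = c(educ1, educ2),
  lwage = c(lwage1, lwage2)
)
fe <- lm(lwage ~ educ + twin2 + famid, data = long)

# (c) pooled OLS ignores the shared family component
pooled <- lm(lwage ~ educ, data = long)

b_fd     <- unname(coef(fd)[2])
b_fe     <- unname(coef(fe)["educ"])
b_pooled <- unname(coef(pooled)["educ"])
```

In this simulation `b_fd` and `b_fe` agree up to numerical precision, while `b_pooled` is pushed upward because schooling is correlated with the omitted family component by construction.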
  3. Explain how this coefficient should be interpreted.

If our assumptions hold, an extra year of education raises the log wage by about 0.092 on average, which corresponds to roughly 9.2 percent higher wages.
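Reading a log-wage coefficient as a percent change is an approximation that works well for small coefficients. A minimal sketch of the exact conversion follows; the helper `log_to_percent` is defined here for illustration and is not part of the original analysis.

```r
# Convert a coefficient from a log-wage regression into a percent change in wages
log_to_percent <- function(b) 100 * (exp(b) - 1)  # illustrative helper

b_educ <- 0.092                       # coefficient from the first-difference table
approx_pct <- 100 * b_educ            # usual approximation
exact_pct  <- log_to_percent(b_educ)  # exact conversion, slightly larger
```

For a coefficient of this size the two numbers differ by less than half a percentage point, so the approximate reading is adequate.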

  4. Reproduce the result in table 3 column 1.
# fit the model
# Note: instead of reshaping the data to long format (e.g. with melt or gather),
# we stack the twin-1 and twin-2 columns directly inside the formula using c(),
# which concatenates vectors from the 'twins' dataset.
# Results match those from the paper

tab3col1 <- lm(c(lwage1, lwage2) ~ c(educ1, educ2) + c(age, age) +
                 c(age^2 / 100, age^2 / 100) + c(male1, male2) +
                 c(white1, white2), data = twins)

# print the table with some light editing
stargazer::stargazer(tab3col1,
                     title = "FIXED-EFFECTS ESTIMATES OF LOG WAGE EQUATIONS FOR IDENTICAL TWINS",
                     out.header = FALSE,
                     dep.var.labels = "OLS",
                     dep.var.caption = "",
                     type = "text",
                     covariate.labels = c("Own education", "Age", "Age squared/100", "Male", "White"))
## 
## FIXED-EFFECTS ESTIMATES OF LOG WAGE EQUATIONS FOR IDENTICAL TWINS
## ===============================================
##                                 OLS            
## -----------------------------------------------
## Own education                0.084***          
##                               (0.014)          
##                                                
## Age                          0.088***          
##                               (0.019)          
##                                                
## Age squared/100              -0.087***         
##                               (0.023)          
##                                                
## Male                         0.204***          
##                               (0.063)          
##                                                
## White                        -0.410***         
##                               (0.127)          
##                                                
## Constant                      -0.471           
##                               (0.426)          
##                                                
## -----------------------------------------------
## Observations                    298            
## R2                             0.272           
## Adjusted R2                    0.260           
## Residual Std. Error      0.532 (df = 292)      
## F Statistic           21.860*** (df = 5; 292)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01
  5. Explain how the coefficient on education should be interpreted.

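We find that, holding age, sex, and race constant, an extra year of schooling is associated with a log wage that is 0.084 higher, i.e. roughly 8.4 percent higher wages. This is a causal effect only if our assumptions are valid, in particular if schooling is uncorrelated with unobserved determinants of wages.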

  6. Explain how the coefficient on the control variables should be interpreted.

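If everything else is the same, being male is associated with a log wage about 0.204 higher, while being white is associated with a log wage about 0.410 lower (the coefficient on White is negative). Because these coefficients are large, reading them directly as percentages is only a rough approximation: the exact changes in wages are exp(0.204) - 1, or about +23 percent, and exp(-0.410) - 1, or about -34 percent.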

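Because age enters both linearly and as a squared term, the coefficient on age alone is not the effect of being one year older. The effect of an additional year of age on the log wage is 0.088 - 2 × 0.087 × age / 100, which depends on age. The age-squared/100 term captures this non-linearity: the positive coefficient on age and the negative coefficient on age-squared imply that wages rise with age at a decreasing rate and eventually decline.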
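The shape of the fitted age profile can be worked out directly from the two age coefficients. The sketch below uses the rounded values printed in the OLS table, so the results are approximate; the function name `marginal_effect` is introduced here for illustration.

```r
# Coefficients copied (rounded) from the OLS table above
b_age  <- 0.088    # Age
b_age2 <- -0.087   # Age squared / 100

# Marginal effect of one more year of age on the log wage:
# d(lwage)/d(age) = b_age + 2 * b_age2 * age / 100
marginal_effect <- function(age) b_age + 2 * b_age2 * age / 100

me_30 <- marginal_effect(30)  # positive: fitted wages still rising at 30
me_60 <- marginal_effect(60)  # negative: fitted wages falling at 60

# Age at which the fitted profile peaks (marginal effect equals zero)
peak_age <- -100 * b_age / (2 * b_age2)
```

With these rounded coefficients the fitted log wage peaks at around age 50 to 51.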