Part 1: Paper using randomized data: Impact of Class Size on Learning

Krueger (1999) Experimental Estimates of Education Production Functions QJE 114 (2) : 497-532

c. What is the identification strategy?

The identification strategy is random assignment of students to class types within schools. Each participating school was required to have at least one class of each type (small, regular with aide, and regular without aide), and students were randomly assigned to a class type within their school. Because randomization was done within schools, class-size assignment is independent of other variables only within schools, so the estimates rest on within-school comparisons.

d. What are the assumptions / threats to this identification strategy?

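The strategy assumes that random assignment was carried out as designed and that students stayed in their assigned class type. Krueger (1999) notes several ways in which the experiment deviated from the ideal design: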

  1. Students were randomly reassigned between regular-size classes (with and without full-time aides) at the beginning of first grade, while students in small classes continued on in small classes, often with the same set of classmates (re-randomization).

  2. Roughly 10% of students were switched between small and regular-size classes because of behavioral problems or parental complaints (nonrandom transitions).

To address the nonrandom transitions, and the variability of class size within a given assignment type, some of Krueger's analysis uses initial random assignment as an instrumental variable for actual class size. The paper also addresses the limitation that students and their families relocated during the school year.
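The instrumental-variable step can be sketched as two stages. The notation is an assumption made here for illustration, not the paper's exact specification: \(Y_{is}\) is the test score of student \(i\) in school \(s\), \(CS_{is}\) is actual class size, \(Z_{is}\) indicates initial random assignment to a small class, and \(\delta_s\) and \(\alpha_s\) are school effects.

```latex
\[
\begin{aligned}
CS_{is} &= \pi_0 + \pi_1 Z_{is} + \delta_s + \nu_{is} && \text{(first stage)} \\
Y_{is}  &= \beta_0 + \beta_1 \widehat{CS}_{is} + \alpha_s + \varepsilon_{is} && \text{(second stage)}
\end{aligned}
\]
```

Because \(Z_{is}\) is randomly assigned within schools, it is uncorrelated with \(\varepsilon_{is}\) conditional on the school effect. As long as initial assignment shifts actual class size (\(\pi_1 \neq 0\)), \(\beta_1\) remains identified even when some students later switch class types.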

Part 2: Paper using Twins for Identification: Economic Returns to Schooling

Ashenfelter and Krueger (1994) Estimates of the Economic Return to Schooling from a New Sample of Twins AER 84(5): 1157-1173

c. What is the identification strategy?

Ashenfelter and Krueger (1994) compare twins within the same pair. By assuming that other unobservable factors are identical for the two twins, they difference these factors out and estimate the causal effect of schooling on wages.
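The differencing argument can be written out as a short derivation. The symbols are chosen here for illustration: \(y_{ji}\) is the log wage and \(S_{ji}\) the schooling of twin \(j\) in pair \(i\), \(X_i\) collects observed characteristics shared by the pair, and \(\mu_i\) is the unobserved component assumed to be common to both twins.

```latex
\[
\begin{aligned}
y_{1i} &= \alpha X_i + \beta S_{1i} + \mu_i + \varepsilon_{1i} \\
y_{2i} &= \alpha X_i + \beta S_{2i} + \mu_i + \varepsilon_{2i} \\
y_{1i} - y_{2i} &= \beta \,(S_{1i} - S_{2i}) + (\varepsilon_{1i} - \varepsilon_{2i})
\end{aligned}
\]
```

Both \(\alpha X_i\) and \(\mu_i\) cancel in the difference, so a regression of the wage difference on the schooling difference recovers \(\beta\) provided the remaining error is uncorrelated with the schooling difference.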

d. What are the assumptions / threats to this identification strategy?

Measurement error in schooling, which past studies did not address, is one threat to identification; this study, however, explicitly incorporates errors in the measurement of schooling. A second threat is that a twin's schooling level may be associated with family factors that are not identical within the pair, such as twins being raised by individual parents, and these factors may in turn affect wages.
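Why measurement error matters more in twin differences can be seen from a standard result for classical measurement error. This is a sketch under stated assumptions: the errors are uncorrelated with true schooling and across twins, \(R\) is the reliability ratio of reported schooling, and \(\rho\) is the correlation between the two twins' reported schooling. The numerical values below are hypothetical and not taken from the paper.

```latex
\[
\operatorname{plim} \hat{\beta}_{FD} = \beta \, R_{\Delta},
\qquad
R_{\Delta} = \frac{R - \rho}{1 - \rho}
\]
\[
\text{Example: } R = 0.9,\ \rho = 0.7
\;\Rightarrow\;
R_{\Delta} = \frac{0.9 - 0.7}{1 - 0.7} \approx 0.67
\]
```

Because twins' schooling levels are highly correlated, differencing removes much of the true variation in schooling but none of the noise, so even modest reporting error can attenuate the first-difference estimate noticeably.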

Replication Analysis

Reproduce the result from Table 3, column 5

# Load STATA file using the foreign package, make table using stargazer package, and melt data using reshape package
library(foreign)
library(stargazer)
library(reshape)

# Import dta data
my_data <- read.dta("AshenfelterKrueger1994_twins.dta")
head(my_data)
##   famid      age educ1 educ2   lwage1   lwage2 male1 male2 white1 white2
## 1     1 33.25120    16    16 2.161021 2.420368     0     0      1      1
## 2     2 43.57016    12    19 2.169054 2.890372     0     0      1      1
## 3     3 30.96783    12    12 2.791778 2.803360     1     1      1      1
## 4     4 34.63381    14    14 2.824351 2.263366     1     1      1      1
## 5     5 34.97878    15    13 2.032088 3.555348     0     0      1      1
## 6     6 29.33881    14    12 2.708050 2.484907     1     1      1      1
# Create difference variable for lwage and education
my_data$wage_diff <- my_data$lwage1 - my_data$lwage2
my_data$educ_diff <- my_data$educ1 - my_data$educ2

# Run the first difference model
mod <- lm(wage_diff ~ educ_diff, data = my_data)

# Create a table with stargazer package
stargazer(mod, type = "text", title = "TABLE 3", align = TRUE, keep.stat = c("n","rsq"),
          dep.var.labels = c("First difference"), covariate.labels = c("Own education"),
          omit = c("Constant"))               # Display sample size and R-squared and remove constant 
## 
## TABLE 3
## =========================================
##                   Dependent variable:    
##               ---------------------------
##                    First difference      
## -----------------------------------------
## Own education          0.092***          
##                         (0.024)          
##                                          
## -----------------------------------------
## Observations              149            
## R2                       0.092           
## =========================================
## Note:         *p<0.1; **p<0.05; ***p<0.01

Interpretation: The result shows that \(\hat{\beta} = 0.092\), which means that, within twin pairs, one additional year of schooling is associated with a wage that is approximately 9.2% higher. The estimate is statistically significant at the 1% level.
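To illustrate why the first-difference model is used, here is a minimal simulation sketch. All data and parameter values are made up for illustration (they are not from the Ashenfelter and Krueger sample), and the simulation assumes an unobserved component that is identical within each twin pair and raises both schooling and wages.

```r
# Simulated twin pairs; every number below is hypothetical.
set.seed(1)
n_pairs   <- 5000
true_beta <- 0.09                      # assumed return to one year of schooling

# Unobserved family/ability component shared by both twins in a pair
ability <- rnorm(n_pairs)

# Schooling depends on the shared component plus a twin-specific shock
educ1 <- 12 + 2 * ability + rnorm(n_pairs, sd = 2)
educ2 <- 12 + 2 * ability + rnorm(n_pairs, sd = 2)

# Log wages depend on schooling and on the shared unobserved component
lwage1 <- 1 + true_beta * educ1 + 0.3 * ability + rnorm(n_pairs, sd = 0.5)
lwage2 <- 1 + true_beta * educ2 + 0.3 * ability + rnorm(n_pairs, sd = 0.5)

# Pooled OLS ignores the shared component, so it is biased upward here
pooled <- lm(lwage ~ educ,
             data = data.frame(lwage = c(lwage1, lwage2),
                               educ  = c(educ1, educ2)))
beta_pooled <- unname(coef(pooled)["educ"])

# First differences within pairs cancel the shared component
wage_diff <- lwage1 - lwage2
educ_diff <- educ1 - educ2
fd <- lm(wage_diff ~ educ_diff)
beta_fd <- unname(coef(fd)["educ_diff"])

print(c(pooled = beta_pooled, first_difference = beta_fd))
```

In this simulation the pooled estimate overstates the return by construction, while the first-difference estimate stays close to the assumed value. The direction of the bias in real data is not guaranteed to match: it depends on how the unobserved component relates to schooling and on measurement error.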

Reproduce the result from Table 3, column 1

# reshape the data (make it long using melt command from the reshape package)
wage <- melt(cbind(my_data$lwage1, my_data$lwage2))
educ <- melt(cbind(my_data$educ1, my_data$educ2))
male <- melt(cbind(my_data$male1, my_data$male2))
white <- melt(cbind(my_data$white1, my_data$white2))
age <- melt(cbind(my_data$age, my_data$age))

# create a new dataset by combining these variables and make it a data frame
my_newdata <- data.frame(cbind(wage[,3], educ[,3], male[,3], white[,3], age[,3]))

# Give variable names to the data frame
colnames(my_newdata) <- c("wage", "educ", "male", "white", "age")

# Then, create new variable age squared
my_newdata$agesq <- ((my_newdata$age)^2) / 100

# Run the model for this new dataset
mod1 <- lm(wage ~ educ + male + white + age + agesq, data=my_newdata)

# Create a table with stargazer package
stargazer(mod1, type = "text", title = "TABLE 3", align = TRUE, keep.stat = c("n","rsq"),
          dep.var.labels = c("OLS"),
          covariate.labels = c("Own education", "Male", "White", "Age", "Age squared / 100"),
          omit = c("Constant"))               # Display sample size and R-squared and remove constant 
## 
## TABLE 3
## =============================================
##                       Dependent variable:    
##                   ---------------------------
##                               OLS            
## ---------------------------------------------
## Own education              0.084***          
##                             (0.014)          
##                                              
## Male                       0.204***          
##                             (0.063)          
##                                              
## White                      -0.410***         
##                             (0.127)          
##                                              
## Age                        0.088***          
##                             (0.019)          
##                                              
## Age squared / 100          -0.087***         
##                             (0.023)          
##                                              
## ---------------------------------------------
## Observations                  298            
## R2                           0.272           
## =============================================
## Note:             *p<0.1; **p<0.05; ***p<0.01

Coefficient on education:
The result shows that the coefficient on education is 0.084, which means that one additional year of schooling is associated with a wage that is approximately 8.4% higher on average, holding the other covariates fixed. The estimate is statistically significant at the 1% level.

Coefficient on other control variables:
The coefficient on male is 0.204, which means that the wage of male twins is 22.63% higher than that of female twins on average, computed as \(100(e^{0.204} - 1)\). The estimate is statistically significant at the 1% level.

The coefficient on white is -0.410, which means that the wage of white twins is 33.63% lower than that of non-white twins on average, computed as \(100(e^{-0.410} - 1)\). The estimate is statistically significant at the 1% level.

The coefficient on age is 0.088 and the coefficient on agesq is -0.087. Because agesq is age squared divided by 100, the marginal effect of age on wage, in percent, is 100(0.088) + 2(-0.087)age. This means that at age 40, wage increases by 1.84% for an additional year of age. Both age coefficients are individually statistically significant at the 1% level.
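The percentage figures above can be checked with a few lines of R. This is a small sketch using the coefficients copied from the OLS table; the helper names `pct_effect` and `age_effect_pct` are hypothetical and introduced only for this check.

```r
# Coefficients copied from the OLS table above
b_male  <- 0.204
b_white <- -0.410
b_age   <- 0.088
b_agesq <- -0.087   # coefficient on age^2 / 100

# Exact percent change implied by a dummy coefficient in a log-wage model
pct_effect <- function(b) 100 * (exp(b) - 1)

# Marginal effect of one more year of age, in percent:
# d(lwage)/d(age) = b_age + 2 * b_agesq * age / 100
age_effect_pct <- function(age) 100 * b_age + 2 * b_agesq * age

round(pct_effect(b_male), 2)    # about 22.6
round(pct_effect(b_white), 2)   # about -33.6
age_effect_pct(40)              # 1.84
```

The exponential transformation is used for the dummy variables because their coefficients are too large for the usual "coefficient times 100" approximation to be accurate.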