Part 1: Paper using randomized data: Impact of Class Size on Learning
Krueger (1999) Experimental Estimates of Education Production Functions QJE 114(2): 497-532

  1. What is the causal link the paper is trying to reveal?
    The paper estimates the causal effect of class size on student performance, measured by standardized test scores.

  2. What would be the ideal experiment to test this causal link?
    The ideal experiment would randomly assign both students and teachers to classes of different sizes across many schools. Class size would then be independent of student background and teacher quality.

  3. What is the identification strategy?
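    In Project STAR, each participating school had to have at least one class of each type: small, regular, and regular with a teacher's aide. Students and teachers were randomly assigned to class types within schools. The effect of class size is therefore identified by comparing class types within the same school.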

  4. What are the assumptions/threats to this identification strategy?
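    The key assumption is that random assignment within schools makes class type independent of students' and teachers' characteristics. The main threat is that implementation deviated from the ideal experimental design in three ways:
      - some students switched class types after their initial assignment;
      - attrition was substantial and may not have been random;
      - actual class sizes did not always stay within the intended ranges.
    If these deviations are related to outcomes, comparisons by actual class type can be biased. Krueger addresses this in part by using students' initial random assignment as an instrument for the class type they actually attended.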
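The within-school identification strategy can be illustrated with a small simulation. This is a sketch under stated assumptions, not Project STAR data. All of the following are hypothetical choices for illustration:
- 80 schools with 60 students each;
- a true small-class effect of 5 points;
- better schools having a larger share of small-class seats.

Because the share of small classes differs across schools, a pooled comparison mixes the class-size effect with school quality. Adding school fixed effects restricts the comparison to students in the same school.

```r
# Hypothetical simulation: random assignment to class type WITHIN schools.
# School counts, effect size, and the quality/share link are assumptions.
set.seed(1999)
n_schools <- 80
n_per_school <- 60
true_effect <- 5
school <- rep(seq_len(n_schools), each = n_per_school)

# Schools differ in quality; by assumption, better schools have more small classes
school_quality <- rnorm(n_schools, sd = 10)
share_small <- ifelse(school_quality > 0, 0.5, 0.2)

# Within each school, randomly permute the small (1) and regular (0) seats
small <- unlist(lapply(seq_len(n_schools), function(s) {
  n_small <- round(share_small[s] * n_per_school)
  sample(rep(c(1, 0), times = c(n_small, n_per_school - n_small)))
}))

score <- 50 + school_quality[school] + true_effect * small +
  rnorm(length(school), sd = 15)

# Pooled comparison: confounded by differences in school quality
naive <- unname(coef(lm(score ~ small))["small"])
# School fixed effects: compares class types within the same school
within <- unname(coef(lm(score ~ small + factor(school)))["small"])

round(c(naive = naive, within_school = within), 2)
```

Under these assumptions the pooled estimate is pushed upward by the quality gap between schools, while the fixed-effects estimate stays close to the true effect of 5.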

Part 2: Paper using Twins for Identification: Economic Returns to Schooling
Ashenfelter and Krueger (1994) Estimates of the Economic Return to Schooling from a New Sample of Twins AER 84(5): 1157-1173

2.1. Briefly answer these questions:

  1. What is the causal link the paper is trying to reveal?
    The paper is estimating the returns to schooling by contrasting the wage rates of identical twins with different schooling levels.

  2. What would be the ideal experiment to test this causal link?
    The ideal experiment would randomly assign people to different levels of schooling. Other wage determinants, such as ability and family background, would then be balanced by design, and any wage difference could be attributed to schooling.

  3. What is the identification strategy?
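    The identification strategy is to difference wages and schooling within pairs of identical twins. Identical twins share their genes and, largely, their family background. Unobserved factors such as ability, assumed identical within a pair, therefore cancel out in the within-pair comparison.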

  4. What are the assumptions/threats to this identification strategy?
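    The key assumption is that within-pair differences in schooling are unrelated to within-pair differences in unobserved ability or other wage determinants. If the better-schooled twin is also the more able one, the estimate remains biased.

    A second threat is measurement error in reported schooling. Twins' true schooling levels are highly correlated, so differencing removes much of the true variation but little of the reporting error. The first-difference estimate is therefore attenuated toward zero. Ashenfelter and Krueger address this by using each twin's report of the co-twin's schooling as an instrument.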
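The two points above can be checked with a simulation: differencing removes a shared ability term, and differencing amplifies measurement error. This is a hypothetical sketch, not the paper's data. Every parameter is an assumption:
- a true return of 0.08;
- ability raises both schooling and wages;
- classical reporting error with standard deviation 0.7.

```r
# Hypothetical simulation of identical twins; every parameter is an assumption.
set.seed(1994)
n_pairs <- 5000
beta <- 0.08                        # assumed true return to a year of schooling

# Ability is shared within a pair and raises both schooling and wages
ability <- rnorm(n_pairs)
educ1 <- 13 + 1.5 * ability + rnorm(n_pairs)
educ2 <- 13 + 1.5 * ability + rnorm(n_pairs)
lwage1 <- 1 + beta * educ1 + 0.2 * ability + rnorm(n_pairs, sd = 0.3)
lwage2 <- 1 + beta * educ2 + 0.2 * ability + rnorm(n_pairs, sd = 0.3)

# Pooled OLS on the stacked sample omits ability, so it is biased upward
b_ols <- unname(coef(lm(c(lwage1, lwage2) ~ c(educ1, educ2)))[2])

# Within-pair first difference: the shared ability term cancels
b_fd <- unname(coef(lm(I(lwage1 - lwage2) ~ I(educ1 - educ2)))[2])

# Classical error in reported schooling is amplified by differencing
rep1 <- educ1 + rnorm(n_pairs, sd = 0.7)
rep2 <- educ2 + rnorm(n_pairs, sd = 0.7)
b_fd_noisy <- unname(coef(lm(I(lwage1 - lwage2) ~ I(rep1 - rep2)))[2])

round(c(pooled_ols = b_ols, first_diff = b_fd, first_diff_noisy = b_fd_noisy), 3)
```

With these assumed variances, the estimates should land near the following values:
- pooled estimate: about 0.08 + 0.3/3.25 ≈ 0.17;
- clean first difference: about 0.08;
- noisy first difference: about 0.08 × 2/2.98 ≈ 0.054.

The reliability ratio of the schooling difference is 2/(2 + 0.98) ≈ 0.67, compared with 3.25/(3.25 + 0.49) ≈ 0.87 for the schooling level.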

2.2. Replication analysis

  1. Load Ashenfelter and Krueger AER 1994 data.
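# Load the packages used below: read.dta() (foreign), stargazer(), melt() (reshape)
library(foreign)
library(stargazer)
library(reshape)
df <- read.dta("AshenfelterKrueger1994_twins.dta")
head(df)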
##   famid      age educ1 educ2   lwage1   lwage2 male1 male2 white1 white2
## 1     1 33.25120    16    16 2.161021 2.420368     0     0      1      1
## 2     2 43.57016    12    19 2.169054 2.890372     0     0      1      1
## 3     3 30.96783    12    12 2.791778 2.803360     1     1      1      1
## 4     4 34.63381    14    14 2.824351 2.263366     1     1      1      1
## 5     5 34.97878    15    13 2.032088 3.555348     0     0      1      1
## 6     6 29.33881    14    12 2.708050 2.484907     1     1      1      1
  2. Reproduce the result from Table 3 column 5.
# Within-pair first differences (twin 1 minus twin 2) of log wage and education
df$wage_diff1 <- df$lwage1 - df$lwage2
df$educ_diff1 <- df$educ1 - df$educ2

#First difference model
firstDif <- lm(wage_diff1 ~ educ_diff1, data = df)

stargazer(firstDif, type = "text", title = "TABLE 3 column 5", align = TRUE, 
          keep.stat = c("n","rsq"), dep.var.labels = c("First difference"), 
          covariate.labels = c("Own education"), omit = c("Constant"))
## 
## TABLE 3 column 5
## =========================================
##                   Dependent variable:    
##               ---------------------------
##                    First difference      
## -----------------------------------------
## Own education          0.092***          
##                         (0.024)          
##                                          
## -----------------------------------------
## Observations              149            
## R2                       0.092           
## =========================================
## Note:         *p<0.1; **p<0.05; ***p<0.01
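  3. Explain how this coefficient should be interpreted.
    \(\hat{\beta} = 0.092\). Within a pair of identical twins, the twin with one more year of schooling earns about 9.2% more; the exact figure is \(100(e^{0.092} - 1) \approx 9.6\%\). Family background and genetic endowments shared by the twins are differenced out, so this is read as the return to schooling net of those shared factors.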
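The exact conversion from a log-wage coefficient to a percentage difference can be checked in a couple of lines. The coefficient is copied by hand from the Table 3 column 5 output above. In practice it could be read with `coef(firstDif)`.

```r
# Convert a log-wage coefficient into an approximate and an exact percentage change.
# b_fd is copied from the TABLE 3 column 5 output above.
b_fd <- 0.092
approx_pct <- 100 * b_fd             # log-point approximation
exact_pct <- 100 * (exp(b_fd) - 1)   # exact percentage difference
round(c(approx = approx_pct, exact = exact_pct), 2)
```

The gap between the two grows with the size of the coefficient. That is why the exact form is usually preferred for larger effects, such as the dummy variables in column 1.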

  4. Reproduce the result in Table 3 column 1.

# Reshape the data from wide (one row per pair) to long (one row per twin)
# with melt() from the reshape package; column 3 of each result holds the values
wage <- melt(cbind(df$lwage1, df$lwage2))
educ <- melt(cbind(df$educ1, df$educ2))
age <- melt(cbind(df$age, df$age))   # twins share the same age
male <- melt(cbind(df$male1, df$male2))
white <- melt(cbind(df$white1, df$white2))

# create new dataframe with above new variables
newdf <- data.frame(cbind(wage[,3], educ[,3], age[,3],  male[,3], white[,3]))

# Give column names
colnames(newdf) <- c("wage", "educ", "age", "male", "white")

# Create age squared, divided by 100 to match the paper's scaling
newdf$sqAge <- ((newdf$age)^2) / 100

# Run the OLS model
olsMod <- lm(wage ~ educ + age + sqAge + male + white, data = newdf)

stargazer(olsMod, type = "text", title = "TABLE 3 column 1", align = TRUE, 
         keep.stat = c("n","rsq"), dep.var.labels = c("OLS"),
         covariate.labels = c("Own education", "Age",  "Age squared (/100)", 
         "Male", "White"), omit = c("Constant"))    
## 
## TABLE 3 column 1
## ==============================================
##                        Dependent variable:    
##                    ---------------------------
##                                OLS            
## ----------------------------------------------
## Own education               0.084***          
##                              (0.014)          
##                                               
## Age                         0.088***          
##                              (0.019)          
##                                               
## Age squared (/100)          -0.087***         
##                              (0.023)          
##                                               
## Male                        0.204***          
##                              (0.063)          
##                                               
## White                       -0.410***         
##                              (0.127)          
##                                               
## ----------------------------------------------
## Observations                   298            
## R2                            0.272           
## ==============================================
## Note:              *p<0.1; **p<0.05; ***p<0.01
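The reshaping step above relies on `melt()` and positional column indexing. A misspelled column name there (for example `df$malew`) returns `NULL` without an error. Below is a sketch of a base-R alternative that stacks twin 1 and twin 2 by column name and stops if a column is missing. The helper `stack_twins()` is hypothetical. It is tested here on the six rows printed by `head(df)`, as displayed.

```r
# Toy data: the six rows printed by head(df) above (values as displayed)
df6 <- data.frame(
  famid  = 1:6,
  age    = c(33.25120, 43.57016, 30.96783, 34.63381, 34.97878, 29.33881),
  educ1  = c(16, 12, 12, 14, 15, 14),
  educ2  = c(16, 19, 12, 14, 13, 12),
  lwage1 = c(2.161021, 2.169054, 2.791778, 2.824351, 2.032088, 2.708050),
  lwage2 = c(2.420368, 2.890372, 2.803360, 2.263366, 3.555348, 2.484907),
  male1  = c(0, 0, 1, 1, 0, 1),
  male2  = c(0, 0, 1, 1, 0, 1),
  white1 = c(1, 1, 1, 1, 1, 1),
  white2 = c(1, 1, 1, 1, 1, 1)
)

# Hypothetical helper: stack twin 1 rows, then twin 2 rows (same order as melt)
stack_twins <- function(d) {
  needed <- c("famid", "age",
              paste0(c("lwage", "educ", "male", "white"), rep(1:2, each = 4)))
  missing <- setdiff(needed, names(d))
  if (length(missing) > 0) stop("missing columns: ", paste(missing, collapse = ", "))

  twin <- function(k) {
    data.frame(famid = d$famid,
               twin  = k,
               wage  = d[[paste0("lwage", k)]],
               educ  = d[[paste0("educ", k)]],
               age   = d$age,
               male  = d[[paste0("male", k)]],
               white = d[[paste0("white", k)]])
  }
  long <- rbind(twin(1), twin(2))
  long$sqAge <- long$age^2 / 100   # same scaling as the paper
  long
}

long6 <- stack_twins(df6)
head(long6)
```

On the full data, `stack_twins(df)` would replace the `melt()` steps, and the `lm()` call would stay the same. Keeping `famid` also makes it possible to add family fixed effects or cluster standard errors by pair.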
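  5. Explain how the coefficient on education should be interpreted.
    \(\hat{\beta}_{\text{educ}} = 0.084\). Holding age, sex, and race constant, one more year of schooling is associated with roughly 8.4% higher wages; the exact figure is \(100(e^{0.084} - 1) \approx 8.8\%\). This is a cross-sectional OLS estimate, so it may still be biased by unobserved ability or family background. The within-twin estimate in column 5 is designed to remove that bias.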

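  6. Explain how the coefficients on the control variables should be interpreted.
    \(\hat{\beta}_{\text{age}} = 0.088\) and \(\hat{\beta}_{\text{age}^2/100} = -0.087\). Because age squared is divided by 100, the marginal effect of age on log wage is \(\partial \ln w / \partial \text{age} = 0.088 + 2(-0.087)\,\text{age}/100\). In percent, this is \(8.8 - 0.174\,\text{age}\). At age 40, an additional year of age therefore raises the wage by about \(8.8 - 6.96 = 1.84\%\), holding the other regressors constant.
    \(\hat{\beta}_{\text{male}} = 0.204\). Because the dependent variable is the log wage, male twins earn \(100(e^{0.204} - 1) \approx 22.63\%\) more than female twins on average. This holds education, age, and race constant.
    \(\hat{\beta}_{\text{white}} = -0.410\). White twins earn \(100(e^{-0.410} - 1) \approx -33.63\%\), i.e. about 33.63% less than non-white twins, other things equal.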
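The calculations above can be reproduced from the printed Table 3 column 1 coefficients. This is a sketch that copies the estimates by hand. The peak age is an added illustration not discussed in the text. It is the point where the quadratic age profile is flat, and it is only meaningful within the age range of the sample.

```r
# Coefficients copied from the TABLE 3 column 1 output above
b <- c(educ = 0.084, age = 0.088, sqAge = -0.087, male = 0.204, white = -0.410)

# Age enters as age + age^2/100, so d ln(w)/d age = b_age + 2 * b_sqAge * age / 100
age_effect_pct <- function(age) 100 * (b[["age"]] + 2 * b[["sqAge"]] * age / 100)

# Age at which the fitted log-wage profile peaks (derivative equal to zero)
peak_age <- -b[["age"]] / (2 * b[["sqAge"]] / 100)

# Exact percentage differences for the 0/1 regressors
dummy_pct <- 100 * (exp(b[c("male", "white")]) - 1)

age_effect_pct(40)    # 1.84 percent per additional year at age 40
peak_age              # about 50.6
round(dummy_pct, 2)   # male 22.63, white -33.63
```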