Part 1: Paper using randomized data: Impact of Class Size on Learning

1.1

  1. What is the causal link the paper is trying to reveal?

The paper tries to identify the causal effect of class size on student achievement, measured through SAT and BSF test scores.

  1. What would be the ideal experiment to test this causal link?

The ideal experiment would be perfectly controlled, with an experimental group (smaller classes) and a control group (regular classes) across schools, no confounders, perfectly comparable students, and no spillovers or attrition.

  1. What is the identification strategy?

The identification strategy is random assignment within schools: each school was required to have at least one class of each class-size type, so comparisons between class types can be made among students in the same school.
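To make the within-school logic concrete, here is a minimal sketch on simulated data. Nothing in it comes from the paper: the number of schools, the class shares, the 4-point effect, and all variable names are assumptions chosen for illustration. It shows how school fixed effects absorb between-school differences when assignment is random within each school.

```r
# Simulated data only: 40 hypothetical schools with 60 students each.
set.seed(1)
n_schools <- 40
n_per_school <- 60
school <- rep(seq_len(n_schools), each = n_per_school)

# School-level differences in achievement (resources, intake, and so on).
school_effect <- rnorm(n_schools, mean = 0, sd = 5)[school]

# Random assignment within each school: one third of students get a small class.
small <- ave(rep(0, length(school)), school,
             FUN = function(x) sample(rep(c(1, 0, 0), length.out = length(x))))

# Assumed true effect of 4 test-score points (illustrative, not an estimate).
true_effect <- 4
score <- 50 + true_effect * small + school_effect +
  rnorm(length(school), sd = 10)

# School fixed effects compare small and regular classes within the same school.
fit <- lm(score ~ small + factor(school))
est <- unname(coef(fit)["small"])
print(round(est, 2))
```

Because assignment is random within schools, the coefficient on `small` should land close to the assumed effect regardless of how much schools differ from one another.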

  1. What are the assumptions / threats to this identification strategy?
  1. Re-randomization took place at the beginning of first grade: students in regular-size classes were randomly reassigned between classes with and without full-time aides, while students in small classes continued in small classes.
  2. Approximately 10 percent of students switched between small and regular classes between grades because of behavioral problems or parental complaints. In addition, because students and their families naturally relocate during the year, actual class size varied within both small and regular classes.
  3. Sample attrition was common – half of students who were present in kindergarten were missing in at least one subsequent year.
  4. Baseline test scores were not available, so one cannot examine whether the treatment and control groups looked similar on this measure before the experiment began.

Part 2: Paper using Twins for Identification: Economic Returns to Schooling

2.1

  1. What is the causal link the paper is trying to reveal?

The causal link that the paper tries to reveal is the economic return to schooling, specifically how each additional year of completed schooling affects a worker’s wage rate.

  1. What would be the ideal experiment to test this causal link?

The ideal experiment to test this causal link would involve the random assignment of subjects to schooling levels so that all other differences are controlled.

  1. What is the identification strategy?

The identification strategy is to estimate the returns to schooling by contrasting the wage rates of identical twins, who share similar ability and other characteristics but have different schooling levels.

  1. What are the assumptions / threats to this identification strategy?
  1. Measurement error in reported schooling may lead to considerable underestimation of the returns to schooling in studies based on siblings, because differencing within pairs removes much of the true variation in schooling while the reporting error remains.
  2. Twins in the sample used may bear stronger similarities than would be the case in a random sample of twins.
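The measurement-error concern can be illustrated with a small simulation. This is a sketch under stated assumptions, not the paper's procedure: the true return of 0.10, the error variances, and every variable name are made up for illustration, and the reporting error is assumed to be classical (independent of true schooling).

```r
# Simulated twin pairs; all parameters are illustrative assumptions.
set.seed(42)
n_pairs <- 5000
true_return <- 0.10

ability <- rnorm(n_pairs)                 # shared within a twin pair
educ1 <- 13 + ability + rnorm(n_pairs)    # true schooling, twin 1
educ2 <- 13 + ability + rnorm(n_pairs)    # true schooling, twin 2
lwage1 <- 1 + true_return * educ1 + 0.2 * ability + rnorm(n_pairs, sd = 0.5)
lwage2 <- 1 + true_return * educ2 + 0.2 * ability + rnorm(n_pairs, sd = 0.5)

# Reported schooling = true schooling + classical reporting error.
rep1 <- educ1 + rnorm(n_pairs, sd = 0.7)
rep2 <- educ2 + rnorm(n_pairs, sd = 0.7)

# Within-pair differences remove the shared ability term.
dwage <- lwage1 - lwage2
deduc_true <- educ1 - educ2
deduc_rep <- rep1 - rep2

ols_rep <- unname(coef(lm(c(lwage1, lwage2) ~ c(rep1, rep2)))[2])
fd_true <- unname(coef(lm(dwage ~ deduc_true))[2])
fd_rep <- unname(coef(lm(dwage ~ deduc_rep))[2])

print(round(c(ols_reported = ols_rep, fd_true = fd_true, fd_reported = fd_rep), 3))
```

In this setup the first-difference estimate based on true schooling should sit near the assumed return, while the one based on reported schooling should be pulled toward zero, since the error makes up a larger share of the within-pair variation.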

2.2 Replication Analysis

  1. Reproduce the result from table 3 column 5
library(foreign)    # read.dta() for Stata files
library(stargazer)  # formatted regression tables
# Load the data
data <- read.dta("AshenfelterKrueger1994_twins.dta")

# Create variables for intrapair wage difference and intrapair difference in school levels
wagediff <- data$lwage1 - data$lwage2
educdiff <- data$educ1 - data$educ2
# First difference regression
y1 <- lm(wagediff ~ educdiff)
# Outputting the table
stargazer(y1, type = "text", title = "Table 3, Col 5", align = TRUE, keep.stat = c("n","rsq"),
          dep.var.labels = c("First difference"), covariate.labels = c("Own education"),
          omit = c("Constant"))
## 
## Table 3, Col 5
## =========================================
##                   Dependent variable:    
##               ---------------------------
##                    First difference      
## -----------------------------------------
## Own education          0.092***          
##                         (0.024)          
##                                          
## -----------------------------------------
## Observations              149            
## R2                       0.092           
## =========================================
## Note:         *p<0.1; **p<0.05; ***p<0.01

c. Interpretation of coefficient - Within twin pairs, one additional year of schooling is associated with a wage that is approximately 9.2% higher (0.092 log points). This result is statistically significant at the 1% level.
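Since the dependent variable is a difference in log wages, the 9.2% reading is a log-point approximation. The sketch below converts the reported coefficient and standard error into an exact percentage change with a 95% interval. It relies on the rounded values printed in the table and on a normal approximation for the interval, both of which are simplifying assumptions.

```r
# Rounded values taken from the first-difference table above.
beta <- 0.092
se <- 0.024

# Exact percentage change implied by a log-point coefficient.
exact_pct <- 100 * (exp(beta) - 1)

# 95% interval in log points (normal approximation), then in percent.
ci_log <- beta + c(-1, 1) * qnorm(0.975) * se
ci_pct <- 100 * (exp(ci_log) - 1)

print(round(c(estimate = exact_pct, lower = ci_pct[1], upper = ci_pct[2]), 1))
```

For a coefficient this small the exact figure and the log-point approximation differ by less than half a percentage point, so the 9.2% reading is a reasonable shorthand.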

# Reshape the data
colnames(data) <- c('famid','age', 'educ.1', 'educ.2', 'lwage.1','lwage.2', 'male.1','male.2','white.1','white.2')
data2 <- reshape(data, direction = "long",
                 varying = c('educ.1','educ.2','lwage.1','lwage.2','male.1','male.2','white.1','white.2'),
                 sep = ".")
# Create age squared variable from the reshaped (long) data
data2$agesqr <- data2$age^2/100
# Run OLS regression
y2 <- lm(lwage ~ educ + age + agesqr + male + white, data = data2)
# Outputting the table
stargazer(y2, type = "text", title = "TABLE 3, Col 1", align = TRUE, keep.stat = c("n","rsq"),
          dep.var.labels = c("OLS"),
          covariate.labels = c("Own education", "Age", "Age squared (/100)", "Male", "White"),
          omit = c("Constant"))
## 
## TABLE 3, Col 1
## ==============================================
##                        Dependent variable:    
##                    ---------------------------
##                                OLS            
## ----------------------------------------------
## Own education               0.084***          
##                              (0.014)          
##                                               
## Age                         0.088***          
##                              (0.019)          
##                                               
## Age squared (/100)          -0.087***         
##                              (0.023)          
##                                               
## Male                        0.204***          
##                              (0.063)          
##                                               
## White                       -0.410***         
##                              (0.127)          
##                                               
## ----------------------------------------------
## Observations                   298            
## R2                            0.272           
## ==============================================
## Note:              *p<0.1; **p<0.05; ***p<0.01

e. Coefficient on education - One additional year of schooling is associated with a wage that is approximately 8.4% higher, holding age, gender and race constant. This result is statistically significant at the 1% level.

f. Coefficients on control variables -

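  1. Age - The coefficient on age is 0.088 while the coefficient on age squared /100 is -0.087. This means that age and wage have a non-linear (concave) relationship: wages rise with age at a decreasing rate up to a peak, beyond which additional years of age are associated with lower wages.

  2. Male - Being male is associated with a wage about 0.204 log points higher than being female on average, which is roughly 20.4% as an approximation.

  3. White - Being white is associated with a wage about 0.410 log points lower than being non-white on average. For a coefficient this large the log-point approximation of 41% overstates the gap; the exact change, exp(-0.410) - 1, is about 34% lower.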

These results are significant at the 1% level.
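The age profile and the dummy coefficients can be worked out numerically from the reported estimates. The sketch below uses the rounded coefficients from the OLS table, so the results are approximate; the helper `pct()` is introduced here only for illustration.

```r
# Rounded coefficients from the OLS table above.
b_age <- 0.088
b_agesq <- -0.087   # coefficient on age^2 / 100
b_male <- 0.204
b_white <- -0.410

# Log wage is b_age * age + b_agesq * age^2 / 100, so the peak is where
# the derivative b_age + 2 * b_agesq * age / 100 equals zero.
turning_age <- -b_age / (2 * b_agesq / 100)

# Exact percentage change implied by a log-point coefficient.
pct <- function(b) 100 * (exp(b) - 1)

print(round(c(turning_age = turning_age,
              male_pct = pct(b_male),
              white_pct = pct(b_white)), 1))
```

With these rounded inputs the implied wage peak falls at about age 50 to 51, and the exact percentage differences are somewhat larger in magnitude for the male coefficient and noticeably smaller for the white coefficient than the log-point readings suggest.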