Part 1: Paper using randomized data: Impact of Class Size on Learning

Assignment based on the seminal paper by Alan Krueger: Krueger (1999), "Experimental Estimates of Education Production Functions," QJE 114(2): 497-532.

1.1. Briefly answer these questions:

c. What is the identification strategy?

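Students and teachers were randomly assigned to classes of different sizes, so class size is independent of every other determinant of test scores. Any systematic difference in test scores between class types can therefore be attributed to class size.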
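The logic of this answer can be written as the usual decomposition of a difference in means. This is a sketch in generic potential-outcomes notation, not the paper's own equation: the symbols $D_i$ (1 if student $i$ is in a small class), $Y_{1i}$ and $Y_{0i}$ (test scores with and without a small class) are assumptions introduced here for illustration.

```latex
\begin{aligned}
E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]
  &= \underbrace{E[Y_{1i} - Y_{0i} \mid D_i = 1]}_{\text{effect of a small class}} \\
  &\quad + \underbrace{E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]}_{\text{selection bias}}
\end{aligned}
```

Under random assignment $D_i$ is independent of $Y_{0i}$, so the selection-bias term is zero and the raw difference in mean scores identifies the class-size effect.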

d. What are the assumptions / threats to this identification strategy? (answer specifically with reference to the data the authors are using)

The key assumption is that students and teachers were randomly assigned to classes. Threats to this assumption:

  1. Some students may have been placed in small classes under pressure from parents who considered a small class better for their child.
  2. New students joined classes and others left (attrition) during the experiment, so the groups being compared may no longer be the ones that were randomly assigned.

Part 2: Using Twins for Identification: Economic Returns to Schooling

Assignment based on the seminal paper by Orley Ashenfelter and Alan Krueger: Ashenfelter and Krueger (1994), "Estimates of the Economic Return to Schooling from a New Sample of Twins," AER 84(5): 1157-1173.

2.1. Briefly answer these questions:

c. What is the identification strategy?

The authors control for unobservables by comparing twins, for whom everything else (family background, ability) is assumed to be the same. Measurement error was addressed by interviewing the twins separately and asking each about the other's education and wage.
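The twin comparison can be sketched as a first-difference argument. The notation below is an assumption chosen for illustration: $y_{1i}$ and $y_{2i}$ are the log wages of the two twins in family $i$, $S_{1i}$ and $S_{2i}$ their years of schooling, $Z_i$ observed family-level characteristics, and $\mu_i$ the unobserved family component (ability, background) that the twins share.

```latex
\begin{aligned}
y_{1i} &= \alpha Z_i + \beta S_{1i} + \mu_i + \varepsilon_{1i} \\
y_{2i} &= \alpha Z_i + \beta S_{2i} + \mu_i + \varepsilon_{2i} \\
y_{1i} - y_{2i} &= \beta \,(S_{1i} - S_{2i}) + (\varepsilon_{1i} - \varepsilon_{2i})
\end{aligned}
```

Both $\alpha Z_i$ and $\mu_i$ drop out of the difference, so $\beta$ is estimated only from within-pair differences in schooling.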

d. What are the assumptions / threats to this identification strategy? (answer specifically with reference to the data the authors are using)

Assumptions and threats:

  1. Unobserved ability is handled by comparing twins, but motivation (will) still differs from person to person: some twins are more inclined towards education than others. Schooling levels within twin pairs were also similar, which leaves little within-pair variation to work with.
  2. Everything else is constant within a twin pair (e.g., ability and family background).
  3. Bias due to the data collection method. Since the fair celebrates twins and their similarity, twins who believe they differ in education, jobs, or finances may have chosen not to attend.

2.2. Replication analysis from Ashenfelter and Krueger AER 1994

a. Load the data from this website. Variable names are self-explanatory.

library(haven)  # provides read_dta() for Stata .dta files

df <- read_dta("data/AshenfelterKrueger1994_twins.dta")
head(df,6)
famid   age  educ1  educ2  lwage1  lwage2  male1  male2  white1  white2
    1  33.3     16     16    2.16    2.42      0      0       1       1
    2  43.6     12     19    2.17    2.89      0      0       1       1
    3  31       12     12    2.79    2.8       1      1       1       1
    4  34.6     14     14    2.82    2.26      1      1       1       1
    5  35       15     13    2.03    3.56      0      0       1       1
    6  29.3     14     12    2.71    2.48      1      1       1       1
629.314122.712.481111

b. Reproduce the result from table 3 column 5 of the paper

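library(stargazer)  # regression tables

# Within-pair (first) differences: twin 1 minus twin 2
edu <- df$educ1-df$educ2
lwage <- df$lwage1-df$lwage2

# edu and lwage are standalone vectors, not columns of df; lm() finds them in the calling environment
model1 <- lm(lwage ~ edu, data=df)

stargazer(model1, header=FALSE, type='text', font.size="small",
            omit.stat=c("adj.rsq", "ser", "f"), title = "Table 3: First Difference estimates of log wage for identical twins",
            covariate.labels= c("Education","Constant"))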
## 
## Table 3: First Difference estimates of log wage for identical twins
## ========================================
##                  Dependent variable:    
##              ---------------------------
##                         lwage           
## ----------------------------------------
## Education             0.092***          
##                        (0.024)          
##                                         
## Constant               -0.079*          
##                        (0.045)          
##                                         
## ----------------------------------------
## Observations             149            
## R2                      0.092           
## ========================================
## Note:        *p<0.1; **p<0.05; ***p<0.01

c. Explain how this coefficient should be interpreted.

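Within a pair of identical twins, the twin with one more year of schooling has, on average, a log wage that is 0.092 higher, i.e. a wage roughly 9.2% higher. Because differencing removes everything the twins share (family background, common ability), this is read as the return to one additional year of schooling.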
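Reading a log-wage coefficient directly as a percentage is an approximation that works well for small values. A minimal sketch of the exact conversion is given here; the helper name `log_points_to_percent` is hypothetical and introduced only for illustration, and the input is the rounded estimate from the table.

```r
# Hypothetical helper: convert a log-point coefficient into a percentage change.
log_points_to_percent <- function(b) 100 * (exp(b) - 1)

beta_fd <- 0.092  # rounded first-difference estimate from the regression table

# Exact percentage change implied by the coefficient
pct_fd <- round(log_points_to_percent(beta_fd), 1)
print(pct_fd)  # 9.6
```

The exact figure (about 9.6%) is close to the 9.2% approximation, so either reading is acceptable for a coefficient of this size.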

d. Reproduce the result in table 3 column 1. You will need to reshape the data first.

First reshape the data to the long form.

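# Stack the two twins of each family: one row per twin, columns selected by name
df_new <- reshape(df, idvar = c("famid","age"),
                  varying = list(c("educ1","educ2"), c("lwage1","lwage2"),
                                 c("male1","male2"), c("white1","white2")),
                  v.names = c("edu","lwage","male","white"), direction = "long")
head(df_new,6)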
famid   age  time  edu  lwage  male  white
    1  33.3     1   16   2.16     0      1
    2  43.6     1   12   2.17     0      1
    3  31       1   12   2.79     1      1
    4  34.6     1   14   2.82     1      1
    5  35       1   15   2.03     0      1
    6  29.3     1   14   2.71     1      1
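The wide-to-long step can be checked on a small toy data frame before applying it to the real file. The sketch below uses made-up values and only two of the paired variables; the names `wide`, `long`, and the time variable `twin` are assumptions chosen for illustration.

```r
# Toy wide data: one row per family, one column per twin (values are made up)
wide <- data.frame(
  famid  = 1:2,
  age    = c(30, 40),
  educ1  = c(16, 12), educ2  = c(14, 12),
  lwage1 = c(2.5, 2.0), lwage2 = c(2.3, 2.1)
)

# Stack twin 1 and twin 2 into one row per person
long <- reshape(
  wide,
  idvar     = "famid",
  varying   = list(c("educ1", "educ2"), c("lwage1", "lwage2")),
  v.names   = c("edu", "lwage"),
  timevar   = "twin",
  times     = 1:2,
  direction = "long"
)

# Sort so that the two twins of a family sit next to each other
long <- long[order(long$famid, long$twin), ]
print(long)
```

Each family now contributes two rows, which is why the stacked regression has twice as many observations as the first-difference regression.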

Let’s reproduce the result from table 3 column 1 of the paper using the new dataframe

df_new$age_sq <- ((df_new$age^2)/100)

model2 <- lm(lwage ~ edu + age + age_sq + male + white, data=df_new)

stargazer(model2, header=FALSE, type='text', font.size="small",
            omit.stat=c("adj.rsq", "ser", "f"), title = "Table 3: Ordinary Least Square (OLS) estimates of log wage for identical twins",
            covariate.labels= c("Own education","Age","Age Squared (/100)","Male","White"))
## 
## Table 3: Ordinary Least Square (OLS) estimates of log wage for identical twins
## ==============================================
##                        Dependent variable:    
##                    ---------------------------
##                               lwage           
## ----------------------------------------------
## Own education               0.084***          
##                              (0.014)          
##                                               
## Age                         0.088***          
##                              (0.019)          
##                                               
## Age Squared (/100)          -0.087***         
##                              (0.023)          
##                                               
## Male                        0.204***          
##                              (0.063)          
##                                               
## White                       -0.410***         
##                              (0.127)          
##                                               
## Constant                     -0.471           
##                              (0.426)          
##                                               
## ----------------------------------------------
## Observations                   298            
## R2                            0.272           
## ==============================================
## Note:              *p<0.1; **p<0.05; ***p<0.01

Table with both models:

library(huxtable)  # huxreg(), set_caption(), add_footnote()

# Column labels follow the paper's numbering: OLS is column (i), first difference is column (v)
model_tbl3 <- list("OLS (i)" = model2, "First difference (v)" = model1)

# Only the coefficients listed in `coefs` are displayed, so the intercepts are left out
tbl3 <- huxreg(model_tbl3, number_format = 3,
       coefs = c("Own education"="edu","Age"="age","Age Squared (/100)"="age_sq", "Male"="male","White"="white"),
      statistics = c("Sample size:" = "nobs", "R2" = "r.squared"))%>% 
  set_caption("Table 3: Ordinary Least Square (OLS) and First difference estimates of log wage for identical twins")

add_footnote(tbl3,"Each equation also includes an intercept term. Number in parentheses are estimated standard errors.")
Table 3: Ordinary Least Square (OLS) and First difference estimates of log wage for identical twins

                      OLS (i)       First difference (v)
Own education         0.084 ***     0.092 ***
                      (0.014)       (0.024)
Age                   0.088 ***
                      (0.019)
Age Squared (/100)    -0.087 ***
                      (0.023)
Male                  0.204 **
                      (0.063)
White                 -0.410 **
                      (0.127)
Sample size:          298           149
R2                    0.272         0.092

*** p < 0.001; ** p < 0.01; * p < 0.05.
Each equation also includes an intercept term. Number in parentheses are estimated standard errors.

e. Explain how the coefficient on education should be interpreted.

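In the stacked (pooled OLS) data, one additional year of own education is associated with a wage roughly 8.4% higher, holding age, sex, and race constant, compared with 9.2% in the first-difference specification.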

f. Explain how the coefficient on the control variables should be interpreted.

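Wage increases with age, but at a decreasing rate: the negative coefficient on Age Squared implies that beyond a certain age the wage starts to decline. Holding the other variables constant, men have a log wage about 0.204 higher than women (roughly 23% higher wages). The coefficient on White is -0.410, i.e. white twins in this sample have a log wage about 0.41 lower (roughly 34% lower wages) than non-white twins.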
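The age at which the estimated wage profile peaks can be worked out from the two age coefficients. The calculation below is a sketch using the rounded estimates from the OLS column (0.088 on Age, -0.087 on Age Squared divided by 100); the symbols $\beta_{age}$ and $\beta_{age^2}$ are labels introduced here for those two coefficients.

```latex
\begin{aligned}
\frac{\partial\, \text{lwage}}{\partial\, \text{age}}
  &= \beta_{age} + \frac{2\,\beta_{age^2}}{100}\,\text{age} = 0 \\
\text{age}^{*}
  &= -\frac{100\,\beta_{age}}{2\,\beta_{age^2}}
   = \frac{100 \times 0.088}{2 \times 0.087} \approx 50.6
\end{aligned}
```

With these rounded coefficients the predicted log wage rises until about age 50 or 51 and declines afterwards.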