Part 1: Paper using randomized data: Impact of Class Size on Learning

Download and go over this seminal paper by Alan Krueger. Krueger (1999), Experimental Estimates of Education Production Functions, QJE 114(2): 497-532

1.1. Briefly answer these questions:

a. What is the causal link the paper is trying to reveal?

Answer: The effect of class size on students' learning performance, as measured by test scores.

b. What would be the ideal experiment to test this causal link?

Answer: The ideal experiment is to randomly assign students to classes of different sizes, hold all other factors the same, and compare learning outcomes across class sizes. The experiment used in this paper comes close to this: participating public schools had to offer classes of different sizes in each grade, and students were randomly assigned to a class size and kept in it through third grade. Test scores were collected as the measure of learning. One limitation is that baseline test scores are not available for the students.

c. What is the identification strategy?

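Answer: The identification strategy relies on random assignment of students to class sizes within schools. After controlling for school effects, the treatment group (small classes) and the control group (regular classes) should be the same on all other factors that influence learning performance, so differences in test scores can be attributed to class size.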
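As a sketch of how this comparison can be written as a regression (the notation below is chosen here for illustration and is not claimed to be the paper's exact specification), let $Y_{ics}$ be the test score of student $i$ in class $c$ at school $s$, $\mathrm{SMALL}_{cs}$ a dummy for being in a small class, $X_{ics}$ student characteristics, and $\alpha_s$ a school effect:

```latex
Y_{ics} = \beta_0 + \beta_1 \,\mathrm{SMALL}_{cs} + \beta_2 X_{ics} + \alpha_s + \varepsilon_{ics},
\qquad
E\left[\varepsilon_{ics} \mid \mathrm{SMALL}_{cs}, \alpha_s\right] = 0
```

The second condition is what random assignment within schools is meant to deliver. If it holds, $\beta_1$ can be read as the causal effect of being in a small class.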

d. What are the assumptions / threats to this identification strategy? (Answer specifically with reference to the data the authors are using) (For instance: “This identification strategy would not be revealing a causal effect if [insert potential issue]”)

Answer: This identification strategy would not be revealing a causal effect if school or student factors other than class size differ across class sizes, for example if there are small differences in the fraction of students on free lunch, the racial mix, or the average age of students across the different class sizes.
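A balance check of this kind can be sketched on simulated data. Everything below is made up for illustration (the sample size, the school ids, the free-lunch share, and the share of small classes are all assumptions, and assignment here ignores schools for simplicity). It only shows the mechanics of comparing a baseline trait across randomly assigned groups.

```r
# Simulated balance check: all numbers below are assumptions for illustration.
set.seed(1)
n <- 6000
students <- data.frame(
  school     = sample(1:80, n, replace = TRUE),  # hypothetical school id
  free_lunch = rbinom(n, 1, 0.5)                 # hypothetical baseline trait
)

# Random assignment to a small class, independent of student traits
students$small <- rbinom(n, 1, 0.3)

# Share of students on free lunch by class type (0 = regular, 1 = small)
balance <- tapply(students$free_lunch, students$small, mean)
gap <- unname(balance["1"] - balance["0"])

# Same comparison as a regression with school fixed effects
fit <- lm(free_lunch ~ small + factor(school), data = students)

print(balance)
print(coef(summary(fit))["small", ])
```

Under random assignment the gap and the coefficient on `small` should both be close to zero. A large, statistically significant difference would be a warning sign for the identification strategy.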

Part 2: Paper using Twins for Identification: Economic Returns to Schooling

Download and go over this seminal paper by Orley Ashenfelter and Alan Krueger. Ashenfelter and Krueger (1994), Estimates of the Economic Return to Schooling from a New Sample of Twins, AER 84(5): 1157-1173

2.1. Briefly answer these questions:

a. What is the causal link the paper is trying to reveal?

Answer: The causal link is the effect of education (years of schooling) on wages.

b. What would be the ideal experiment to test this causal link?

Answer: The ideal experiment is to randomly assign people to different numbers of years of schooling, hold all other factors the same, and compare their wages. The authors approximate this experiment with data on twins, which include independent measures of the years of schooling for each pair of twins.

c. What is the identification strategy?

Answer: The identification strategy is to compare twins within the same pair, so that unobserved variables such as IQ and family background are the same for the treatment group and the control group (the more-schooled and the less-schooled twin).
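The logic can be written out as a short derivation. This is a simplified sketch with notation chosen here: $w_{ji}$ is the wage and $S_{ji}$ the years of schooling of twin $j$ in family $i$, and $\mu_i$ collects everything the twins share (family background, genes, age).

```latex
\ln w_{1i} = \beta S_{1i} + \mu_i + \varepsilon_{1i}, \qquad
\ln w_{2i} = \beta S_{2i} + \mu_i + \varepsilon_{2i}
\\[4pt]
\ln w_{1i} - \ln w_{2i} = \beta \,(S_{1i} - S_{2i}) + (\varepsilon_{1i} - \varepsilon_{2i})
```

Subtracting the second equation from the first removes $\mu_i$, so a regression of the wage difference on the schooling difference identifies $\beta$ as long as the remaining twin-specific errors are unrelated to the schooling difference.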

d. What are the assumptions / threats to this identification strategy? (Answer specifically with reference to the data the authors are using)

Answer: The assumption is that the twins in the data share the same family background and genes, so that all controlled and unobserved variables are the same for the treatment group and the control group. This identification strategy would not be revealing a causal effect if the twins in a pair have different backgrounds, for example if they were raised by different families or have different levels of intelligence.
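The role of this assumption can be illustrated with a small simulation. All parameter values below are assumptions chosen for illustration and are not taken from the paper. A family-level factor (`ability`) raises both schooling and wages and is shared exactly within each pair.

```r
# Simulated twins: every parameter here is an assumption for illustration.
set.seed(42)
n_pairs <- 5000
beta    <- 0.10               # true return to one year of schooling
ability <- rnorm(n_pairs)     # family-level factor shared by both twins

# Schooling depends on the shared factor plus twin-specific noise
educ1 <- 12 + 2 * ability + rnorm(n_pairs, sd = 2)
educ2 <- 12 + 2 * ability + rnorm(n_pairs, sd = 2)

# Log wages depend on schooling and on the shared factor
lwage1 <- 1 + beta * educ1 + 0.3 * ability + rnorm(n_pairs, sd = 0.5)
lwage2 <- 1 + beta * educ2 + 0.3 * ability + rnorm(n_pairs, sd = 0.5)

# Pooled OLS ignores the shared factor and overstates the return
lw  <- c(lwage1, lwage2)
ed  <- c(educ1, educ2)
ols <- unname(coef(lm(lw ~ ed))[2])

# Within-pair differences cancel the shared factor
dlw <- lwage1 - lwage2
ded <- educ1 - educ2
fd  <- unname(coef(lm(dlw ~ ded))[2])

print(c(true = beta, ols = ols, first_difference = fd))
```

In this setup the first-difference estimate lands close to the true return while pooled OLS is biased upward. If the simulated twins were instead given different ability levels that also affect schooling, the first-difference estimate would be biased as well, which is the threat described above.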

2.2. Replication analysis

a. Load Ashenfelter and Krueger AER 1994 data. You can load it directly from my website here. Variable names should be self-explanatory if you read the paper.

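# Packages: reshape2 (reshaping helpers), stargazer (regression tables),
# foreign (reads Stata .dta files), knitr (kable for printing tables)
library(reshape2)
library(stargazer)
library(foreign)
library(knitr)
# Load the twins data (one row per twin pair) and preview the first 5 rows
hw4 <- read.dta("AshenfelterKrueger1994_twins.dta")
kable(head(hw4, n = 5))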
famid       age  educ1  educ2    lwage1    lwage2  male1  male2  white1  white2
    1  33.25120     16     16  2.161021  2.420368      0      0       1       1
    2  43.57016     12     19  2.169054  2.890372      0      0       1       1
    3  30.96783     12     12  2.791778  2.803360      1      1       1       1
    4  34.63381     14     14  2.824351  2.263366      1      1       1       1
    5  34.97878     15     13  2.032088  3.555348      0      0       1       1

b. Reproduce the result from table 3 column 5.

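# Within-pair differences: twin 1 minus twin 2
hw4$lwageD <- hw4$lwage1 - hw4$lwage2
hw4$educD <- hw4$educ1 - hw4$educ2
# Regress the wage difference on the schooling difference
reg1 <- lm(lwageD ~ educD, data = hw4)
stargazer(reg1, type = "html",
          title = "Regression results for Table 3 column 5")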
Regression results for Table 3 column 5

                       Dependent variable: lwageD
educD                  0.092*** (0.024)
Constant               -0.079* (0.045)
Observations           149
R2                     0.092
Adjusted R2            0.086
Residual Std. Error    0.554 (df = 147)
F Statistic            14.914*** (df = 1; 147)
Note: *p<0.1; **p<0.05; ***p<0.01

c. Explain how this coefficient should be interpreted.

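Answer: The coefficient of 0.092 means that, within a pair of twins, a one-year difference in schooling is associated with a difference of 0.092 in log wages, that is, roughly 9.2 percent higher wages for the twin with one more year of schooling. Because the regression uses within-pair differences, everything the twins share (such as family background and age) is held the same.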
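Reading a log-wage coefficient as a percent change is an approximation that works well for small coefficients. The exact conversion can be sketched as follows (the helper name `log_points_to_percent` is made up here for illustration):

```r
# Convert a log-wage coefficient (log points) into an exact percent change
log_points_to_percent <- function(b) 100 * (exp(b) - 1)

b_fd <- 0.092                       # first-difference estimate from the table above
pct  <- log_points_to_percent(b_fd)
print(round(pct, 2))                # about 9.64
```

The exact figure is about 9.6 percent, so the usual "9.2 percent" reading is a close approximation here.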

d. Reproduce the result in table 3 column 1. You will need to reshape the data first. Hint: I used the reshape2 package. It required me to rename the variables with “.1” instead of just “1”. There are probably other ways to do it using melt or gather.

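# reshape() splits names like "educ.1" at the "." into a variable name and a
# twin index, so first copy each twin-specific column to a "."-style name
hw4$educ.1 <- hw4$educ1
hw4$educ.2 <- hw4$educ2
hw4$lwage.1 <- hw4$lwage1
hw4$lwage.2 <- hw4$lwage2
hw4$male.1 <- hw4$male1
hw4$male.2 <- hw4$male2
hw4$white.1 <- hw4$white1
hw4$white.2 <- hw4$white2
# Wide to long: one row per twin instead of one row per pair
hw4new <- reshape(hw4, direction = "long",
                  varying = c("educ.1", "educ.2", "lwage.1", "lwage.2",
                              "male.1", "male.2", "white.1", "white.2"))
# Age squared, divided by 100 to keep the coefficient on a readable scale
hw4new$age2 <- hw4new$age^2 / 100
# Pooled OLS of log wage on schooling and the control variables
reg2 <- lm(lwage ~ educ + male + white + age + age2, data = hw4new)
stargazer(reg2, type = "html",
          title = "Regression results for Table 3 column 1")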
Regression results for Table 3 column 1

                       Dependent variable: lwage
educ                   0.084*** (0.014)
male                   0.204*** (0.063)
white                  -0.410*** (0.127)
age                    0.088*** (0.019)
age2                   -0.087*** (0.023)
Constant               -0.471 (0.426)
Observations           298
R2                     0.272
Adjusted R2            0.260
Residual Std. Error    0.532 (df = 292)
F Statistic            21.860*** (df = 5; 292)
Note: *p<0.1; **p<0.05; ***p<0.01
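As an alternative to renaming the columns before reshaping, base R's `reshape()` also accepts the column pairs explicitly. The sketch below uses a toy data frame built from the first three twin pairs in the preview shown earlier. The twin index name `twin` is chosen here for illustration.

```r
# Toy wide data: the first three twin pairs from the preview above
wide <- data.frame(
  famid  = 1:3,
  educ1  = c(16, 12, 12),
  educ2  = c(16, 19, 12),
  lwage1 = c(2.161021, 2.169054, 2.791778),
  lwage2 = c(2.420368, 2.890372, 2.803360)
)

# List the column pairs explicitly, so no ".1"/".2" renaming is needed
long <- reshape(
  wide,
  direction = "long",
  idvar     = "famid",
  timevar   = "twin",    # illustrative name for the twin index (1 or 2)
  varying   = list(c("educ1", "educ2"), c("lwage1", "lwage2")),
  v.names   = c("educ", "lwage")
)

# Sort so the two twins of each family sit next to each other
long <- long[order(long$famid, long$twin), ]
print(long)
```

Each family now contributes two rows, one per twin, which is the layout the pooled regression needs.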

e. Explain how the coefficient on education should be interpreted.

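Answer: The coefficient on education means that one additional year of schooling is associated with log wages that are 0.084 higher, that is, roughly 8.4 percent higher wages, holding the other variables (sex, race and age) constant. Unlike the first-difference estimate, this pooled regression does not control for unobserved factors shared within a family.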

f. Explain how the coefficient on the control variables should be interpreted.

Answer: The coefficients on age and age squared have to be read together. Because age2 is age squared divided by 100, the effect of one more year of age on log wages is 0.088 - 2(0.087)(age)/100, so wages rise with age but at a decreasing rate. Holding the other variables constant, men earn about 0.204 log points (roughly 20 percent) more than women. White twins earn about 0.410 log points less than non-white twins, which is roughly 34 percent less, since the log approximation is poor for a coefficient this large.
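The arithmetic behind these interpretations can be checked with a short sketch that uses only the coefficients reported in the table for column 1. The ages 30 and 45 are arbitrary evaluation points chosen here.

```r
b_age  <- 0.088    # coefficient on age
b_age2 <- -0.087   # coefficient on age^2 / 100

# Effect of one more year of age on log wages, evaluated at a given age
age_effect <- function(age) b_age + 2 * b_age2 * age / 100

# Age at which the fitted age-wage profile peaks (effect equals zero)
peak_age <- -100 * b_age / (2 * b_age2)

# Exact percent gaps implied by the dummy-variable coefficients
pct_male  <- 100 * (exp(0.204) - 1)
pct_white <- 100 * (exp(-0.410) - 1)

print(round(c(effect_at_30 = age_effect(30),
              effect_at_45 = age_effect(45),
              peak_age     = peak_age,
              pct_male     = pct_male,
              pct_white    = pct_white), 3))
```

The age effect shrinks from about 0.036 log points at age 30 to about 0.010 at age 45, and the fitted profile peaks at roughly age 50.6. The exact percent gaps are about 22.6 percent for men and about -33.6 percent for white twins.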