Part 1: Paper using randomized data: Impact of Class Size on Learning

Looking at the paper: Krueger (1999), Experimental Estimates of Education Production Functions, QJE 114(2): 497-532

Questions 1

Briefly answer these questions

What is the causal link the paper is trying to reveal?

The paper aims to identify whether class size has a causal effect on student learning.

What would be the ideal experiment to test this causal link?

In the ideal setup, students would be randomly assigned to classes of different sizes and required to stay in their assigned class for the entire experiment: no families moving away, no parents requesting transfers, and no students switching between classes.

What is the identification strategy?

The identification strategy is random assignment of students to classes of different sizes. Because assignment is random, the treatment and control groups are comparable on average, so any difference in educational outcomes between them can be attributed to class size.
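The logic of random assignment can be illustrated with a small simulation. This is only a sketch on made-up data, not the paper's data: the variable names, the assumed true effect of 5 points, and the selection rule are all hypothetical choices for illustration.

```r
# Simulated illustration; every number here is an assumption, not an estimate from the paper.
set.seed(1)
n <- 10000
ability <- rnorm(n)          # unobserved student characteristic
true_effect <- 5             # assumed effect of a small class on the test score

# Case 1: random assignment to a small class, independent of ability
small <- rbinom(n, 1, 0.5)
score <- 50 + true_effect * small + 10 * ability + rnorm(n)
est_random <- coef(lm(score ~ small))[["small"]]

# Case 2: non-random assignment, where higher-ability students are
# more likely to end up in a small class
small_sel <- rbinom(n, 1, plogis(ability))
score_sel <- 50 + true_effect * small_sel + 10 * ability + rnorm(n)
est_selected <- coef(lm(score_sel ~ small_sel))[["small_sel"]]

# The first estimate should be close to 5; the second is pushed upward by ability
print(c(random = est_random, selected = est_selected))
```

Under random assignment the simple difference in means is close to the assumed effect, while under selection it also picks up the ability gap between the two groups.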

What are the assumptions / threats to this identification strategy?

As noted above, assignment was random only at the start of the experiment, not in the following rounds. Some parents complained that their children had been placed in the larger classes without a teacher's aide (supposedly the worst case for students) and demanded that they be moved. This changed the setting: students in the large classes now had different classmates, while those in the small classes kept the same ones, which introduces a new variable that could influence the outcome. In addition, some families moved and had to enroll their children in a different school, which altered the original groups. All of these changes may bias the results.

Part 2: Paper using Twins for Identification: Economic Returns to Schooling

Looking at the paper: Ashenfelter and Krueger (1994), Estimates of the Economic Return to Schooling from a New Sample of Twins, AER 84(5): 1157-1173

Questions 2

Briefly answer these questions

What is the causal link the paper is trying to reveal?

The paper aims to estimate the causal effect of education on labor income.

What would be the ideal experiment to test this causal link?

The ideal experiment would randomly assign different levels of education to the same person, or to two people who are as similar as possible, such as identical twins.

What is the identification strategy?

The identification strategy is to take pairs of twins with different levels of education and compare their wages. The assumption is that twins are identical in unobservable characteristics such as ability.

What are the assumptions / threats to this identification strategy?

The main issue is self-selection into education: people with higher ability are more likely to get more years of education and also to earn higher wages. The education variable would therefore also capture the effect of ability, since the two are correlated. The problem is that ability cannot be measured; it is an unobservable variable. Comparing twins removes this bias only if ability does not differ within a pair.
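A short simulation can make the ability-bias argument concrete. It is a sketch on simulated data only: the true return of 0.08, the ability loadings, and the assumption that ability is fully shared within a pair are hypothetical choices, not results from the paper.

```r
# Simulated twins; all parameters are assumptions chosen for illustration.
set.seed(2)
n_pairs <- 5000
family_ability <- rnorm(n_pairs)   # unobserved and, by assumption, identical within a pair
beta <- 0.08                       # assumed true return to a year of schooling

educ1 <- 12 + 2 * family_ability + rnorm(n_pairs)
educ2 <- 12 + 2 * family_ability + rnorm(n_pairs)
lwage1 <- 1 + beta * educ1 + 0.3 * family_ability + rnorm(n_pairs, sd = 0.5)
lwage2 <- 1 + beta * educ2 + 0.3 * family_ability + rnorm(n_pairs, sd = 0.5)

# Pooled OLS ignores ability, so the education coefficient absorbs part of its effect
stacked <- data.frame(lwage = c(lwage1, lwage2), educ = c(educ1, educ2))
pooled <- coef(lm(lwage ~ educ, data = stacked))[["educ"]]

# Within-pair differences cancel the shared ability term
diffs <- data.frame(diff_wage = lwage1 - lwage2, diff_educ = educ1 - educ2)
within <- coef(lm(diff_wage ~ diff_educ, data = diffs))[["diff_educ"]]

print(c(pooled = pooled, within = within))
```

In this setup the pooled estimate overstates the return, while the within-pair estimate is close to the assumed 0.08. If ability differed within pairs, the within-pair estimate would no longer be protected.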

Load Ashenfelter and Krueger AER 1994 data

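# read.dta() comes from the foreign package; mutate() comes from dplyr
library(foreign)
library(dplyr)
my_data_HW4 <- read.dta("C:/Users/Diego/Dropbox/Pendrive/UGA/UGA_4/Metrics_II/HW4/AshenfelterKrueger1994_twins.dta")
# Within-pair differences in log wage and years of education (twin 1 minus twin 2)
my_data_HW4_02 <- mutate(my_data_HW4,
                  diff_wage = lwage1 - lwage2,
                  diff_educ = educ1 - educ2)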

Reproduce the result from table 3 column 5

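library(stargazer)
# First-difference regression: within-pair wage gap on within-pair education gap
First_difference <- lm(diff_wage ~ diff_educ, data = my_data_HW4_02)
stargazer(First_difference, title="Table 3 - v", align=TRUE, type='html')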
Table 3 - v

                          Dependent variable:
                          diff_wage
diff_educ                 0.092***
                          (0.024)
Constant                  -0.079*
                          (0.045)
Observations              149
R2                        0.092
Adjusted R2               0.086
Residual Std. Error       0.554 (df = 147)
F Statistic               14.914*** (df = 1; 147)
Note:                     *p<0.1; **p<0.05; ***p<0.01

Explain how this coefficient should be interpreted

The coefficient shows how one extra year of education changes the log wage, which is approximately the percentage change in the wage. Here, within a twin pair, an extra year of education raises wages by about 9.2% on average, ceteris paribus. The estimate is statistically significant at the 1% level and positive, as expected. This interpretation assumes that the twins are identical in all observable and unobservable characteristics other than education.
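The reason differencing supports this interpretation can be written out in a short derivation. The notation is an assumption made here for illustration (it is not taken from the paper): y_{ij} is the log wage and S_{ij} the years of education of twin j in family i, and \mu_i collects everything the two twins share, including ability.

```latex
\begin{align*}
y_{i1} &= \alpha + \beta S_{i1} + \mu_i + \varepsilon_{i1} \\
y_{i2} &= \alpha + \beta S_{i2} + \mu_i + \varepsilon_{i2} \\
y_{i1} - y_{i2} &= \beta \,(S_{i1} - S_{i2}) + (\varepsilon_{i1} - \varepsilon_{i2})
\end{align*}
```

The shared term \mu_i drops out, so \beta is identified from within-pair differences alone. In this stylized model the intercept also cancels, whereas the regression reported here includes a constant.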

Reproduce the result in table 3 column 1

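# melt() below comes from the reshape2 package; rename() comes from dplyr (new name = old name)
library(reshape2)
my_data_HW4_03<- rename(my_data_HW4,
                educ.1=educ1,
                educ.2=educ2,
                lwage.1=lwage1,
                lwage.2=lwage2,
                male.1=male1,
                male.2=male2,
                white.1=white1,
                white.2=white2)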

# Reshape from wide (one row per twin pair) to long (one row per twin), one variable at a time
my_data_HW4_03.01<-melt(my_data_HW4_03, id.vars = c("famid", "age"), c("educ.1","educ.2"))
my_data_HW4_03.02<-melt(my_data_HW4_03, id.vars = c("famid", "age"), c("lwage.1","lwage.2"))
my_data_HW4_03.03<-melt(my_data_HW4_03, id.vars = c("famid", "age"), c("male.1","male.2"))
my_data_HW4_03.04<-melt(my_data_HW4_03, id.vars = c("famid", "age"), c("white.1","white.2"))
colnames(my_data_HW4_03.01) <- c("famid", "age", "var1", "educ")
colnames(my_data_HW4_03.02) <- c("famid", "age", "var2", "lwage")
colnames(my_data_HW4_03.03) <- c("famid", "age", "var3", "male")
colnames(my_data_HW4_03.04) <- c("famid", "age", "var4", "white")
# Row identifier used as the merge key (298 rows: 149 pairs x 2 twins)
my_data_HW4_03.01$id<-seq_len(nrow(my_data_HW4_03.01))
my_data_HW4_03.02$id<-seq_len(nrow(my_data_HW4_03.02))
my_data_HW4_03.03$id<-seq_len(nrow(my_data_HW4_03.03))
my_data_HW4_03.04$id<-seq_len(nrow(my_data_HW4_03.04))
# Drop the melt labels and the duplicated famid/age columns before merging
my_data_HW4_03.01<-subset(my_data_HW4_03.01, select = -c(var1) )
my_data_HW4_03.02<-subset(my_data_HW4_03.02, select = -c(var2, famid, age) )
my_data_HW4_03.03<-subset(my_data_HW4_03.03, select = -c(var3, famid, age) )
my_data_HW4_03.04<-subset(my_data_HW4_03.04, select = -c(var4, famid, age) )
my_data_HW4_03.05 <-  merge(my_data_HW4_03.01, my_data_HW4_03.02, by="id")
my_data_HW4_03.06 <-  merge(my_data_HW4_03.05, my_data_HW4_03.03, by="id")
my_data_HW4_03.07 <-  merge(my_data_HW4_03.06, my_data_HW4_03.04, by="id")
# Age squared, scaled by 100 so the coefficient is readable
my_data_HW4_03.07<- mutate(my_data_HW4_03.07,
                age2 = (age^2)/100)
# Pooled OLS on individual twins
OLS_i <-lm(lwage ~ educ + age + age2 + male + white, data = my_data_HW4_03.07)
stargazer(OLS_i, title="Table 3 - i", align=TRUE,  type='html')
Table 3 - i

                          Dependent variable:
                          lwage
educ                      0.084***
                          (0.014)
age                       0.088***
                          (0.019)
age2                      -0.087***
                          (0.023)
male                      0.204***
                          (0.063)
white                     -0.410***
                          (0.127)
Constant                  -0.471
                          (0.426)
Observations              298
R2                        0.272
Adjusted R2               0.260
Residual Std. Error       0.532 (df = 292)
F Statistic               21.860*** (df = 5; 292)
Note:                     *p<0.1; **p<0.05; ***p<0.01

Explain how the coefficient on education should be interpreted

The coefficient shows how one extra year of education changes the log wage, which is approximately the percentage change in the wage. Here, an extra year of education raises wages by about 8.4% on average, ceteris paribus. The estimate is statistically significant at the 1% level and positive, as expected.

Explain how the coefficient on the control variables should be interpreted

The coefficients on the control variables show the approximate percentage change in wages when one of them increases by one unit. For example:

  • Being male increases wages by about 20.4% on average, ceteris paribus (using the log approximation).
  • Being white decreases wages by about 41% on average, ceteris paribus (using the log approximation).
  • The marginal effect of age is allowed to be nonlinear. For example, for someone who is 40, turning 41 increases her wage by about 1.84% (0.088 - 2 × 0.087 × 40 / 100) on average, ceteris paribus.
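The arithmetic behind these interpretations can be checked with a few lines of R. This is a sketch that copies the coefficients from the Table 3 - i output by hand; the helper names marginal_age and exact_pct are hypothetical and introduced only for this illustration.

```r
# Coefficients copied by hand from the Table 3 - i output
b_age   <- 0.088
b_age2  <- -0.087
b_male  <- 0.204
b_white <- -0.410

# age2 is defined as age^2 / 100, so d(lwage)/d(age) = b_age + 2 * b_age2 * age / 100
marginal_age <- function(age) b_age + 2 * b_age2 * age / 100
print(marginal_age(40))   # 0.088 - 0.0696 = 0.0184, i.e. about 1.84%

# Age at which the predicted log wage stops rising (marginal effect equals zero)
peak_age <- -100 * b_age / (2 * b_age2)
print(peak_age)

# For a dummy variable, the exact percentage change implied by a log-wage
# coefficient b is 100 * (exp(b) - 1); the coefficient itself is an approximation
exact_pct <- function(b) 100 * (exp(b) - 1)
print(c(male = exact_pct(b_male), white = exact_pct(b_white)))
```

For small coefficients the approximation and the exact conversion are close. For a coefficient as large as the one on white, the gap between the two is noticeable.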