Part 1: Paper using randomized data: Impact of Class Size on Learning
Download and go over this seminal paper by Alan Krueger.
Krueger (1999) Experimental Estimates of Education Production Functions QJE 114(2): 497-532
1.1. Briefly answer these questions:
a. What is the causal link the paper is trying to reveal?
b. What would be the ideal experiment to test this causal link?
c. What is the identification strategy?
Part 2: Using Twins for Identification: Economic Returns to Schooling
Download and go over this seminal paper by Orley Ashenfelter and Alan Krueger.
Ashenfelter and Krueger (1994) Estimates of the Economic Return to Schooling from a New Sample of Twins AER 84(5): 1157-1173
2.1. Briefly answer these questions:
a. What is the causal link the paper is trying to reveal?
b. What would be the ideal experiment to test this causal link?
c. What is the identification strategy?
2.2. Replication analysis
a. Load Ashenfelter and Krueger AER 1994 data
You can load it directly from my website here. Variable names are self-explanatory if you read the paper.

| famid | age | educ1 | educ2 | lwage1 | lwage2 | male1 | male2 | white1 | white2 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 33.25120 | 16 | 16 | 2.161021 | 2.420368 | 0 | 0 | 1 | 1 |
| 2 | 43.57016 | 12 | 19 | 2.169054 | 2.890372 | 0 | 0 | 1 | 1 |
| 3 | 30.96783 | 12 | 12 | 2.791778 | 2.803360 | 1 | 1 | 1 | 1 |
| 4 | 34.63381 | 14 | 14 | 2.824351 | 2.263366 | 1 | 1 | 1 | 1 |
| 5 | 34.97878 | 15 | 13 | 2.032088 | 3.555348 | 0 | 0 | 1 | 1 |
| 6 | 29.33881 | 14 | 12 | 2.708050 | 2.484907 | 1 | 1 | 1 | 1 |
b. Reproduce the result from table 3 column 5 of the paper
You will need to create the “difference” variables first.
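To see what the difference variables do, it can help to write out a within-pair model. The following is only a sketch under stated assumptions: both twins in family $i$ share an unobserved family effect $\mu_i$ and face the same return $\beta$. The symbols $X_i$, $\alpha$, $\mu_i$ and $\varepsilon$ are labels chosen here for illustration; only the variable names come from the data file.

$$
\begin{aligned}
\text{lwage1}_i &= X_i\alpha + \beta\,\text{educ1}_i + \mu_i + \varepsilon_{1i} \\
\text{lwage2}_i &= X_i\alpha + \beta\,\text{educ2}_i + \mu_i + \varepsilon_{2i} \\
\underbrace{\text{lwage1}_i - \text{lwage2}_i}_{\text{lwageD}_i} &= \beta\,\underbrace{(\text{educ1}_i - \text{educ2}_i)}_{\text{educD}_i} + (\varepsilon_{1i} - \varepsilon_{2i})
\end{aligned}
$$

Under these assumptions, anything that is identical for both twins, such as age and the family effect $\mu_i$, cancels in the third line.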
I use the stargazer package to make ok-looking regression
result tables. There are other ways.
##
## ========================================
##                 Dependent variable:
##             ---------------------------
##                       lwageD
## ----------------------------------------
## educD                0.092***
##                       (0.024)
##
## ----------------------------------------
## Observations            149
## Adjusted R2            0.086
## ========================================
## Note:        *p<0.1; **p<0.05; ***p<0.01
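The differencing step mentioned in the hint for part b can be sketched in R as follows. This is a minimal illustration, not the full replication: it uses only the six families shown in the data preview, so it will not return the 0.092 estimate. The object names `twins` and `fit_fd` are chosen here, and dropping the intercept is an assumption of this sketch.

```r
# Six families copied from the data preview in part a (the full sample has
# 149 pairs). The object name `twins` is an assumption made for this sketch.
twins <- data.frame(
  famid  = 1:6,
  educ1  = c(16, 12, 12, 14, 15, 14),
  educ2  = c(16, 19, 12, 14, 13, 12),
  lwage1 = c(2.161021, 2.169054, 2.791778, 2.824351, 2.032088, 2.708050),
  lwage2 = c(2.420368, 2.890372, 2.803360, 2.263366, 3.555348, 2.484907)
)

# Within-pair differences (twin 1 minus twin 2).
twins$lwageD <- twins$lwage1 - twins$lwage2
twins$educD  <- twins$educ1 - twins$educ2

# First-difference regression. Omitting the intercept ("0 +") is an
# assumption of this sketch, not something stated in the assignment.
fit_fd <- lm(lwageD ~ 0 + educD, data = twins)
print(coef(summary(fit_fd)))

# With the stargazer package installed, a text table could be printed with:
# stargazer::stargazer(fit_fd, type = "text")
```

Because the twin ordering within a pair is arbitrary, swapping which twin is subtracted flips the sign of both differences and leaves the slope of a no-intercept regression unchanged.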
c. Explain how this coefficient should be interpreted.
d. Reproduce the result in table 3 column 1
You will need to reshape the data first.
Hint: I used the reshape function from base R. With its default
sep = ".", it required me to rename the variables with a dot, like
“educ.1” instead of just “educ1”. Then I just ran
reshape(data, direction="long", varying =..., timevar = ...).
There are probably other ways to do it using melt (from reshape2) or
gather (from tidyr).
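The wide-to-long step in the hint can be sketched as follows. It again uses only the six families from the data preview, and the object names `twins` and `long` are chosen here. Scaling `agesq` as age squared divided by 100 is inferred from the values in the preview, not stated in the assignment.

```r
# Six families copied from the data preview in part a.
# The object names `twins` and `long` are assumptions made for this sketch.
twins <- data.frame(
  famid  = 1:6,
  age    = c(33.25120, 43.57016, 30.96783, 34.63381, 34.97878, 29.33881),
  educ1  = c(16, 12, 12, 14, 15, 14),
  educ2  = c(16, 19, 12, 14, 13, 12),
  lwage1 = c(2.161021, 2.169054, 2.791778, 2.824351, 2.032088, 2.708050),
  lwage2 = c(2.420368, 2.890372, 2.803360, 2.263366, 3.555348, 2.484907),
  male1  = c(0, 0, 1, 1, 0, 1),
  male2  = c(0, 0, 1, 1, 0, 1),
  white1 = c(1, 1, 1, 1, 1, 1),
  white2 = c(1, 1, 1, 1, 1, 1)
)

# Rename "educ1" to "educ.1" and so on, so that reshape() can split each
# name into a variable and a twin number using its default sep = ".".
names(twins) <- sub("^(educ|lwage|male|white)([12])$", "\\1.\\2", names(twins))

# One row per twin: the twin-specific columns are stacked, while famid and
# age are repeated. reshape() also adds an `id` column by default.
long <- reshape(
  twins,
  direction = "long",
  varying   = grep("\\.[12]$", names(twins), value = TRUE),
  timevar   = "twin"
)

# Keep the two twins of each family on adjacent rows.
long <- long[order(long$famid, long$twin), ]

# Age squared, scaled by 100 (scaling inferred from the preview values).
long$agesq <- long$age^2 / 100

print(head(long))
```

In this six-family subset every person has white equal to 1, so that variable only carries information in the full sample.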
| | famid | age | twin | educ | lwage | male | white | id | agesq |
|---|---|---|---|---|---|---|---|---|---|
| 1.1 | 1 | 33.25120 | 1 | 16 | 2.161021 | 0 | 1 | 1 | 11.056424 |
| 1.2 | 1 | 33.25120 | 2 | 16 | 2.420368 | 0 | 1 | 1 | 11.056424 |
| 2.1 | 2 | 43.57016 | 1 | 12 | 2.169054 | 0 | 1 | 2 | 18.983588 |
| 2.2 | 2 | 43.57016 | 2 | 19 | 2.890372 | 0 | 1 | 2 | 18.983588 |
| 3.1 | 3 | 30.96783 | 1 | 12 | 2.791778 | 1 | 1 | 3 | 9.590065 |
| 3.2 | 3 | 30.96783 | 2 | 12 | 2.803360 | 1 | 1 | 3 | 9.590065 |
Regression result matches the paper exactly:
##
## ========================================
##                 Dependent variable:
##             ---------------------------
##                        lwage
## ----------------------------------------
## educ                 0.084***
##                       (0.014)
##
## age                  0.088***
##                       (0.019)
##
## agesq                -0.087***
##                       (0.023)
##
## male                 0.204***
##                       (0.063)
##
## white                -0.410***
##                       (0.127)
##
## ----------------------------------------
## Observations            298
## Adjusted R2            0.260
## ========================================
## Note:        *p<0.1; **p<0.05; ***p<0.01