Part 1: Paper using randomized data: Impact of Class Size on Learning

Download and go over this seminal paper by Alan Krueger. Krueger (1999) Experimental Estimates of Education Production Functions QJE 114(2): 497-532

1.1. Briefly answer these questions:

a. What is the causal link the paper is trying to reveal?

The author tries to identify the causal effect of class size on students’ academic performance, as measured by test scores.

b. What would be the ideal experiment to test this causal link?

In the ideal experiment, we would hold every variable other than class size constant: students of similar ability, similar teachers, and similar classroom settings. With those variables controlled, we would randomly assign students to classes of different sizes. The students would then be taught in the same environment for the same period of time, take the same exam at the same time, and have their test scores collected.
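As a rough illustration of why random assignment matters, the following R sketch simulates a hypothetical version of such an experiment. All names and numbers (`ability`, `small`, `true_effect`, the sample size) are assumptions made for illustration, not values from the paper.

```r
set.seed(42)

n           <- 6000  # hypothetical number of students
true_effect <- 5     # assumed test-score gain from a small class

# Unobserved ability affects scores but not assignment
ability <- rnorm(n)

# Random assignment: about one third of students go to a small class
small <- rbinom(n, 1, 1 / 3)

# Test scores depend on ability, class type, and noise
score <- 50 + 10 * ability + true_effect * small + rnorm(n, sd = 5)

# Balance check: mean ability should be similar in both groups
balance <- mean(ability[small == 1]) - mean(ability[small == 0])

# Difference in mean scores estimates the causal effect
effect_hat <- mean(score[small == 1]) - mean(score[small == 0])

print(c(balance = balance, effect_hat = effect_hat))
```

Because assignment is independent of ability, `balance` should be close to zero and `effect_hat` should land near `true_effect`.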

c. What is the identification strategy?

The paper uses a randomized controlled trial (RCT) conducted in the United States: the Tennessee Student/Teacher Achievement Ratio experiment, known as Project STAR. The experiment randomly assigned students and teachers to three groups with different class sizes: “small classes (13-17 students per teacher), regular-size classes (22-25 students), and regular/aide classes (22-25 students) which also included a full-time teacher’s aide”. Students in each group were given “standardized tests at the end of each school year”. The experiment lasted four years. The author then compares test scores across the class types to analyse the effect of class size on students’ performance.
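In regression form, the comparison across class types can be sketched as follows. The notation is an assumption made here for illustration: \(Y_{ics}\) is the test score of student \(i\) in class \(c\) at school \(s\), \(SMALL_{cs}\) and \(REGA_{cs}\) are indicators for the small and regular/aide class types, and \(\alpha_s\) is a school effect, included on the assumption that assignment was randomized within schools.

\[Y_{ics} = \beta_0 + \beta_1 SMALL_{cs} + \beta_2 REGA_{cs} + \alpha_s + \varepsilon_{ics}\]

Under random assignment, \(\beta_1\) measures the average score gap between small and regular-size classes, and \(\beta_2\) the gap between regular/aide and regular-size classes.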

d. What are the assumptions / threats to this identification strategy?

Assumptions:
- Randomization: assignment to a class type is unrelated to student or teacher characteristics.
- Controlled environment.
- Similar characteristics of students and teaching environments across class types.
Threats (as the author notes in the introduction):
- Students in regular-size classes with and without a full-time aide were re-randomized, which could compromise the experimental results.
- Other nonrandom transitions: about 10% of students switched between small and regular classes, for reasons such as parents’ complaints.
- Class sizes varied more than intended because families relocated.
- Sample attrition was common.
- Students may have switched schools nonrandomly.

Part 2: Paper using Twins for Identification: Economic Returns to Schooling

Download and go over this seminal paper by Orley Ashenfelter and Alan Krueger. Ashenfelter and Krueger (1994) Estimates of the Economic Return to Schooling from a New Sample of Twins AER 84(5): 1157-1173

2.1. Briefly answer these questions:

a. What is the causal link the paper is trying to reveal?

This paper analyzes the causal effect of years of schooling on wages.

b. What would be the ideal experiment to test this causal link?

The ideal experiment would be impossible to implement. To identify this causal link, we would take pairs of people with identical backgrounds and abilities, assign the two members of each pair to different levels of schooling, and follow them for years afterward to compare their wages, holding everything else in their lives constant.

c. What is the identification strategy?

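The authors’ team interviewed twins at the 16th Annual Twins Days Festival in Twinsburg, Ohio, in August of 1991. The twins used in the analysis are identical twins, which means that they are genetically identical. After collecting the survey data, the authors compare wages and schooling within each pair of twins, so that ability and family background shared by the two twins cancel out of the comparison.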
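The logic of comparing twins can be sketched with a simple wage model. The notation here is assumed for illustration: \(y_{1i}\) and \(y_{2i}\) are the log wages of the two twins in family \(i\), \(S_{1i}\) and \(S_{2i}\) are their years of schooling, and \(\mu_i\) is an unobserved family component (ability, background) shared by both twins.

\[y_{1i} = \alpha + \beta S_{1i} + \mu_i + \varepsilon_{1i}, \qquad y_{2i} = \alpha + \beta S_{2i} + \mu_i + \varepsilon_{2i}\]

\[y_{1i} - y_{2i} = \beta (S_{1i} - S_{2i}) + (\varepsilon_{1i} - \varepsilon_{2i})\]

Differencing within a pair removes \(\mu_i\), so \(\beta\) is identified from within-pair variation in schooling, provided the remaining errors are unrelated to the schooling difference.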

d. What are the assumptions / threats to this identification strategy? (Answer specifically with reference to the data the authors are using)

Threats:
- Twins may differ in ability even though they are genetically identical.
- Selection bias:
  - Twins in the sample are more similar to each other than twins in a random sample would be, because the authors recruited them at a festival.
  - Twins in this study vary along dimensions on which twins in other studies do not.
- Measurement error.

The authors discuss two types of threat that the data must deal with:

Omitted ability variables
- The authors use “coefficients \(\beta\) to measure the structural (or selection-corrected) effect of the observables on earnings” and aim to obtain an unbiased estimator.
Measurement error
- Measurement error in reported schooling is reflected in the correlation between the two measures of schooling: “the reliability ratio for the twins schooling levels in Table 2 are 0.92 and 0.88”. This measurement error biases the estimator, and under classical measurement error the bias is toward zero.
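The two threats above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' procedure: every parameter (`beta`, `rho`, `r`, the ability effect of 0.1) is hypothetical, and measurement error is assumed to be classical and uncorrelated across twins, in which case the first-difference estimate is scaled by `r * (1 - rho) / (1 - r * rho)`.

```r
set.seed(123)

n    <- 20000  # hypothetical number of twin pairs
beta <- 0.10   # assumed true return to a year of schooling
rho  <- 0.75   # assumed within-pair correlation of true schooling
r    <- 0.90   # assumed reliability ratio of reported schooling

# Family ability is shared by both twins and raises schooling and wages
ability <- rnorm(n)
s1 <- 12 + 2 * (sqrt(rho) * ability + sqrt(1 - rho) * rnorm(n))
s2 <- 12 + 2 * (sqrt(rho) * ability + sqrt(1 - rho) * rnorm(n))
y1 <- beta * s1 + 0.1 * ability + rnorm(n, sd = 0.3)
y2 <- beta * s2 + 0.1 * ability + rnorm(n, sd = 0.3)

# Reported schooling = true schooling + classical measurement error
sd_v <- 2 * sqrt((1 - r) / r)
m1 <- s1 + rnorm(n, sd = sd_v)
m2 <- s2 + rnorm(n, sd = sd_v)

dy <- y1 - y2

# Pooled OLS is biased upward by omitted ability
ols <- coef(lm(c(y1, y2) ~ c(s1, s2)))[[2]]

# First differences remove the shared ability term
fd_true <- coef(lm(dy ~ I(s1 - s2)))[[2]]

# With noisy schooling, first differences are attenuated
fd_noisy <- coef(lm(dy ~ I(m1 - m2)))[[2]]
attenuation <- r * (1 - rho) / (1 - r * rho)

print(c(ols = ols, fd_true = fd_true, fd_noisy = fd_noisy,
        fd_expected = beta * attenuation))
```

In this setup pooled OLS overstates `beta`, differencing on true schooling recovers it, and differencing on reported schooling is pulled toward zero by roughly the factor `attenuation`.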

2.2. Replication analysis

a. Load Ashenfelter and Krueger AER 1994 data. You can load it directly from my website here. Variable names should be self-explanatory if you read the paper.

library(haven)
d <- read_dta("hw4/AshenfelterKrueger1994_twins.dta")

b. Reproduce the result from table 3 column 5.

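# First-difference regression: within-pair difference in log wages
# on the within-pair difference in years of schooling.
# The differences are kept as standalone vectors so that d itself
# is unchanged for the later reshape step.
wage_dif <- d$lwage1 - d$lwage2
edu_dif <- d$educ1 - d$educ2
g <- lm(wage_dif ~ edu_dif, data = d)
summary(g)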

## 
## Call:
## lm(formula = wage_dif ~ edu_dif, data = d)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.03115 -0.20909  0.00722  0.34395  1.15740 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.07859    0.04547  -1.728 0.086023 .  
## edu_dif      0.09157    0.02371   3.862 0.000168 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5542 on 147 degrees of freedom
## Multiple R-squared:  0.09211,    Adjusted R-squared:  0.08593 
## F-statistic: 14.91 on 1 and 147 DF,  p-value: 0.0001682

library(stargazer)

## 
## Please cite as:

##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.

##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer

stargazer(g,  
          type="text",
          title = "Table 3 column 5",
          dep.var.labels = c("First difference (v)"),
          covariate.labels = c("Own education"))

## 
## Table 3 column 5
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                        First difference (v)    
## -----------------------------------------------
## Own education                0.092***          
##                               (0.024)          
##                                                
## Constant                      -0.079*          
##                               (0.045)          
##                                                
## -----------------------------------------------
## Observations                    149            
## R2                             0.092           
## Adjusted R2                    0.086           
## Residual Std. Error      0.554 (df = 147)      
## F Statistic           14.914*** (df = 1; 147)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

c. Explain how this coefficient should be interpreted.

The coefficient means that a one-year increase in the intrapair difference in education is associated with an increase of about 0.092 in the intrapair difference in log wages, that is, roughly 9.2% higher wages on average for the twin with more schooling.
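Since the dependent variable is a difference in log wages, the coefficient is a change in log points. As a worked check, the exact proportional change implied by a coefficient of 0.092 is:

\[e^{0.092} - 1 \approx 0.096\]

so reading the coefficient as about 9.2% is a close approximation to an exact effect of roughly 9.6%.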

d. Reproduce the result in table 3 column 1. You will need to reshape the data first.

Hint: I used the reshape command from the reshape2 package. It likes to have a “.” in variable names so I renamed the variables with “.1” and “.2” instead of just “1” and “2” – but you can avoid that by just setting sep="". There are probably other ways to do it using melt or gather.

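library(reshape2)
# Note: reshape() below is base R (stats::reshape), not a reshape2 function.
# Wide -> long: columns such as lwage1/lwage2 become one lwage column,
# and the trailing digit is stored in the new "twin" variable.
d2 <- reshape(d,
              idvar = c("famid", "age"),
              sep = "",
              timevar = "twin",
              direction = "long",
              varying = 3:ncol(d))  # assumes columns 1-2 are famid and age

# Age squared, scaled by 100
d2$age2 <- (d2$age^2) / 100
g2 <- lm(lwage ~ educ + age + age2 + male + white, data = d2)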
library(stargazer)
stargazer(g2,  
          type="text",
          title = "Table 3 column 1",
          dep.var.labels = c("OLS (i)"),
          covariate.labels = c("Own education","Age","Age squared(/100)","Male","White"))

## 
## Table 3 column 1
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                               OLS (i)          
## -----------------------------------------------
## Own education                0.084***          
##                               (0.014)          
##                                                
## Age                          0.088***          
##                               (0.019)          
##                                                
## Age squared(/100)            -0.087***         
##                               (0.023)          
##                                                
## Male                         0.204***          
##                               (0.063)          
##                                                
## White                        -0.410***         
##                               (0.127)          
##                                                
## Constant                      -0.471           
##                               (0.426)          
##                                                
## -----------------------------------------------
## Observations                    298            
## R2                             0.272           
## Adjusted R2                    0.260           
## Residual Std. Error      0.532 (df = 292)      
## F Statistic           21.860*** (df = 5; 292)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01
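To see what the reshape step does, here is a toy sketch on a made-up two-family data set. The data frame `wide` and its values are hypothetical; only the column naming pattern (`lwage1`, `lwage2`, `educ1`, `educ2`) mirrors the code above.

```r
# Hypothetical wide data: one row per twin pair
wide <- data.frame(
  famid  = c(1, 2),
  age    = c(30, 45),
  lwage1 = c(2.1, 2.8),
  lwage2 = c(2.3, 2.5),
  educ1  = c(12, 16),
  educ2  = c(14, 16)
)

# Base R reshape: sep = "" splits names like "lwage1" into "lwage" and 1
long <- reshape(wide,
                idvar = "famid",
                varying = 3:ncol(wide),
                sep = "",
                timevar = "twin",
                direction = "long")

# One row per twin: famid, age, twin, lwage, educ
print(long)
```

Each pair now contributes two rows, one per twin, which is the layout the pooled OLS regression needs.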

e. Explain how the coefficient on education should be interpreted.

If schooling increases by one year while the other variables remain the same, wages increase by about 8.4% on average.

f. Explain how the coefficient on the control variables should be interpreted.

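Age

Taken alone, the coefficient on age says that one more year of age raises wages by about 8.8% on average, holding other variables constant. Because age also enters through the squared term, this is the marginal effect only at age zero; the effect at any given age depends on both coefficients.

Age squared

\[\log(wage) = \beta_0 + \beta_1 age + \beta_2 \frac{age^2}{100} + \text{other controls}\]

The coefficient on age is positive and the coefficient on age squared is negative, which means that the relationship between age and log wage has an inverted “U” shape: wages rise with age up to a peak and decline afterwards.

Male

Holding other variables constant, male twins earn on average about 0.204 log points more than female twins, roughly 20% more (\(e^{0.204} - 1 \approx 0.23\) exactly).

White

Holding other variables constant, white twins earn on average about 0.410 log points less than twins of other races. The log approximation is poor for a coefficient this large: the exact difference is about 34% less (\(e^{-0.410} - 1 \approx -0.34\)), not 41%.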
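The inverted-U age profile implies a peak age, which can be worked out from the two age coefficients. This is a sketch assuming the dependent variable is log wage, with \(\beta_1\) the coefficient on age and \(\beta_2\) the coefficient on age squared divided by 100, as in the regression code:

\[\frac{\partial \log(wage)}{\partial age} = \beta_1 + \frac{2 \beta_2 \, age}{100} = 0 \quad \Rightarrow \quad age^{*} = -\frac{100 \, \beta_1}{2 \beta_2} = \frac{100 \times 0.088}{2 \times 0.087} \approx 50.6\]

With the rounded estimates from the table, predicted wages peak at roughly age 50 to 51.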

HW4

Sue Zeng

2023-02-02
