Download and go over this seminal paper by Alan Krueger: Krueger (1999), “Experimental Estimates of Education Production Functions,” QJE 114(2): 497-532.
The paper estimates the causal effect of class size on students’ test scores.
In the ideal experiment, we would hold every factor other than class size fixed: students with similar prior achievement, similar teachers, and similar classroom settings. We would then randomly assign students to classes of different sizes, teach them in the same environment for the same period of time, give them all the same exam at the same time, and compare their test scores.
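To make the logic concrete, here is a minimal sketch on simulated data (every number is hypothetical, not from STAR). Random assignment balances unobserved ability across the two groups, so a plain difference in mean scores recovers the assumed class-size effect.

```r
# Simulated illustration only: hypothetical students, not the STAR data
set.seed(1)
n <- 3000
ability <- rnorm(n)                          # unobserved student ability
# Random assignment: each student is equally likely to get either class type
class_type <- sample(c("small", "regular"), n, replace = TRUE)
true_effect <- 5                             # assumed effect of a small class, in test points
score <- 50 + 10 * ability + true_effect * (class_type == "small") + rnorm(n, sd = 5)

# Randomization balances ability across the two groups...
balance <- tapply(ability, class_type, mean)
# ...so the raw difference in mean scores estimates the causal effect
diff_means <- mean(score[class_type == "small"]) - mean(score[class_type == "regular"])
balance
diff_means
```

Without randomization (for example, if stronger students were steered into small classes), the same difference in means would mix the class-size effect with differences in ability.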
The paper uses a randomized controlled trial (RCT) conducted in Tennessee: the Tennessee Student/Teacher Achievement Ratio experiment, known as Project STAR. The experiment randomly assigned students and teachers to three class types: “small classes (13-17 students per teacher), regular-size classes (22-25 students), and regular/aide classes (22-25 students) which also included a full-time teacher’s aide”. Students in each group took “standardized tests at the end of each school year”, and the experiment lasted four years. The author then compares test scores across class types to estimate the effect of class size on student performance.
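The comparison across the three class types can be written as a regression of test scores on class-type dummies, with regular classes as the omitted group. The sketch below uses simulated data (all effect sizes and school counts are hypothetical) and adds school dummies because STAR randomized students within schools; it illustrates the specification, not Krueger’s exact estimates.

```r
# Simulated STAR-style data: hypothetical numbers, not the real STAR files
set.seed(2)
n_schools <- 40
per_school <- 60
school <- factor(rep(seq_len(n_schools), each = per_school))
# Schools differ in average scores
school_effect <- rnorm(n_schools, sd = 8)[as.integer(school)]
# Random assignment to the three class types happens within each school
class_type <- factor(
  unlist(lapply(seq_len(n_schools), function(s)
    sample(rep(c("regular", "small", "regular_aide"), length.out = per_school)))),
  levels = c("regular", "small", "regular_aide"))
# Assumed true effects relative to regular classes: +5 small, +1 regular/aide
score <- 50 + school_effect +
  5 * (class_type == "small") + 1 * (class_type == "regular_aide") +
  rnorm(n_schools * per_school, sd = 10)

# Class-type dummies (regular omitted) plus school fixed effects
fit <- lm(score ~ class_type + school)
coef(fit)[c("class_typesmall", "class_typeregular_aide")]
```

Because assignment is random within schools, the school dummies are not needed for unbiasedness here; they absorb school-level variation and tighten the estimates.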
Download and go over this seminal paper by Orley Ashenfelter and Alan Krueger: Ashenfelter and Krueger (1994), “Estimates of the Economic Return to Schooling from a New Sample of Twins,” AER 84(5): 1157-1173.
This paper estimates the causal effect of schooling on wages.
The ideal experiment is probably impossible to implement. It would take pairs of people with identical backgrounds and abilities, randomly assign the two members of each pair to different levels of schooling, hold everything else constant as they grow up, and then follow the pairs for years to compare the wages they earn.
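The authors’ team interviewed twins at the 16th Annual Twins Days Festival in Twinsburg, Ohio, in August 1991. The twins they interviewed are identical twins, meaning they are genetically identical. After collecting the survey data, the authors difference log wages and years of schooling within each twin pair, which removes any factors the twins share, such as genes and family background. The code below replicates this first-difference regression.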
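library(haven)  # read_dta() reads Stata .dta files
d <- read_dta("hw4/AshenfelterKrueger1994_twins.dta")
# First difference: twin 1 minus twin 2, which removes everything the pair shares
wage_dif <- d$lwage1 - d$lwage2
edu_dif  <- d$educ1 - d$educ2
g <- lm(wage_dif ~ edu_dif, data = d)
summary(g)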
##
## Call:
## lm(formula = wage_dif ~ edu_dif, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.03115 -0.20909 0.00722 0.34395 1.15740
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.07859 0.04547 -1.728 0.086023 .
## edu_dif 0.09157 0.02371 3.862 0.000168 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5542 on 147 degrees of freedom
## Multiple R-squared: 0.09211, Adjusted R-squared: 0.08593
## F-statistic: 14.91 on 1 and 147 DF, p-value: 0.0001682
library(stargazer)
stargazer(g,
type="text",
title = "Table 3 column 5",
dep.var.labels = c("First difference (v)"),
covariate.labels = c("Own education"))
##
## Table 3 column 5
## ===============================================
## Dependent variable:
## ---------------------------
## First difference (v)
## -----------------------------------------------
## Own education 0.092***
## (0.024)
##
## Constant -0.079*
## (0.045)
##
## -----------------------------------------------
## Observations 149
## R2 0.092
## Adjusted R2 0.086
## Residual Std. Error 0.554 (df = 147)
## F Statistic 14.914*** (df = 1; 147)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
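The coefficient means that, on average, a twin with one more year of schooling than his or her co-twin earns about 9.2% more (0.092 log points) than that co-twin. Because both twins share genes and family background, differencing removes those factors from the estimate.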
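The reason differencing helps can be written out. Suppose log wages for twin \(i\) in pair \(j\) are \(\ln w_{ij} = \alpha + \beta S_{ij} + \mu_j + \varepsilon_{ij}\), where \(S_{ij}\) is schooling and \(\mu_j\) is ability and family background shared by both twins. Subtracting twin 2 from twin 1 gives
\[\ln w_{1j} - \ln w_{2j} = \beta (S_{1j} - S_{2j}) + (\varepsilon_{1j} - \varepsilon_{2j}),\]
so \(\mu_j\) drops out. The simulation below (all parameters hypothetical, not taken from the paper) gives schooling and wages a common family factor: pooled OLS then overstates \(\beta\), while the within-pair regression recovers it.

```r
# Simulated twin pairs: hypothetical parameters, not the Ashenfelter-Krueger data
set.seed(3)
n_pairs <- 2000
beta <- 0.09                    # assumed true return to a year of schooling
family <- rnorm(n_pairs)        # ability/background shared by both twins
# Schooling depends on the shared family factor plus a twin-specific shock
educ1 <- 13 + 1.5 * family + rnorm(n_pairs)
educ2 <- 13 + 1.5 * family + rnorm(n_pairs)
# Log wages also depend on the shared family factor
lwage1 <- 1 + beta * educ1 + 0.3 * family + rnorm(n_pairs, sd = 0.4)
lwage2 <- 1 + beta * educ2 + 0.3 * family + rnorm(n_pairs, sd = 0.4)

# Pooled OLS ignores the family factor, so it is biased upward here
ols <- unname(coef(lm(c(lwage1, lwage2) ~ c(educ1, educ2)))[2])
# Within-pair first difference removes the family factor entirely
fd <- unname(coef(lm(I(lwage1 - lwage2) ~ I(educ1 - educ2)))[2])
c(ols = ols, fd = fd)
```

In this setup the upward bias in pooled OLS comes entirely from the assumed positive link between the family factor and both schooling and wages; with a different correlation structure the bias could be smaller or reversed.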
Hint: I used the reshape() function (it lives in base R’s stats package; reshape2 mainly provides melt() and dcast()). reshape() expects a “.” in the variable names by default, so I renamed the variables with “.1” and “.2” instead of just “1” and “2”, but you can avoid that by setting sep = "". There are probably other ways to do it using melt() from reshape2 or gather() from tidyr.
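The sep = "" trick can be checked on a tiny hypothetical data frame with the same wide layout (the values below are made up). With sep = "", reshape() guesses each stem by splitting the name at its first letter-digit boundary, so educ1 becomes stem educ and twin 1.

```r
# Hypothetical toy data: one row per twin pair, twin-specific columns end in 1 or 2
wide <- data.frame(famid  = c(1, 2),
                   age    = c(30, 45),
                   educ1  = c(12, 16), educ2  = c(14, 16),
                   lwage1 = c(2.5, 3.0), lwage2 = c(2.7, 2.9))
# Base R stats::reshape, wide to long: one row per twin
long <- reshape(wide,
                idvar     = c("famid", "age"),
                sep       = "",
                timevar   = "twin",
                direction = "long",
                varying   = 3:ncol(wide))
long
```

Each pair now contributes two rows, one per twin, with columns twin, educ and lwage.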
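# reshape() is base R (stats), so no extra package is needed.
# Wide to long: one row per twin, with stems such as educ, lwage, male, white
d2 <- reshape(d,
              idvar     = c("famid", "age"),  # age is shared by both twins
              sep       = "",                 # names end in 1/2 with no "." separator
              timevar   = "twin",             # new column: 1 or 2
              direction = "long",
              varying   = 3:ncol(d))          # assumes cols 1-2 are famid and age
# Age squared, scaled by 100
d2$age2 <- (d2$age^2) / 100
# Pooled OLS across all twins (Table 3 column 1)
g2 <- lm(lwage ~ educ + age + age2 + male + white, data = d2)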
stargazer(g2,
type="text",
title = "Table 3 column 1",
dep.var.labels = c("OLS (i)"),
covariate.labels = c("Own education","Age","Age squared(/100)","Male","White"))
##
## Table 3 column 1
## ===============================================
## Dependent variable:
## ---------------------------
## OLS (i)
## -----------------------------------------------
## Own education 0.084***
## (0.014)
##
## Age 0.088***
## (0.019)
##
## Age squared(/100) -0.087***
## (0.023)
##
## Male 0.204***
## (0.063)
##
## White -0.410***
## (0.127)
##
## Constant -0.471
## (0.426)
##
## -----------------------------------------------
## Observations 298
## R2 0.272
## Adjusted R2 0.260
## Residual Std. Error 0.532 (df = 292)
## F Statistic 21.860*** (df = 5; 292)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
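Holding the other variables constant, one more year of schooling is associated with about 8.4% higher wages (0.084 log points) on average. Unlike the first-difference estimate, this pooled OLS estimate does not remove the family background or ability shared within twin pairs.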
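The age coefficient of 0.088 does not by itself mean that wages rise 8.8% with each additional year of age: because age also enters squared, 0.088 is the slope of log wages in age only at age zero, and the actual gain from one more year of age shrinks as the twins get older.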
Age squared
\[\ln(wage) = \beta_0 + \beta_1 age + \beta_2 \frac{age^2}{100} + \beta_3 educ + \beta_4 male + \beta_5 white + u\]
The coefficient on age is positive and the coefficient on age squared is negative, so the relationship between age and log wages is an inverted U: wages rise with age up to a peak and then decline. The slope in age is \(\beta_1 + 2\beta_2\,age/100\), which equals zero at \(age = -100\beta_1/(2\beta_2) = 8.8/0.174 \approx 51\) years.
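The slope and turning point can be computed directly from the Table 3 column 1 output. This sketch hard-codes the two rounded coefficients instead of reading them from g2, so small differences from the unrounded estimates are expected.

```r
# Rounded coefficients from the Table 3 column 1 output
b_age  <- 0.088
b_age2 <- -0.087    # coefficient on age^2 / 100

# Slope of log wage in age: b_age + 2 * b_age2 * age / 100
marginal_effect <- function(age) b_age + 2 * b_age2 * age / 100
# Peak of the inverted U, where the slope is zero
peak_age <- -b_age / (2 * b_age2 / 100)

round(marginal_effect(c(20, 30, 40, 50)), 4)
round(peak_age, 1)
```

At age 30, for example, the slope is about 0.036, i.e. roughly 3.6% higher wages per additional year, well below the 8.8% suggested by the age coefficient alone.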
Male
Holding the other variables constant, male twins earn on average about 20.4% (0.204 log points) more than female twins.
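Holding the other variables constant, white twins earn on average 0.410 log points less than twins of other races, which corresponds to roughly 34% less (exp(-0.410) - 1 ≈ -0.336). For a coefficient this large, reading it directly as 41% overstates the gap.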
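Several interpretations above read log-point coefficients directly as percentages. That approximation works well for small coefficients but not for large ones; a quick check using the rounded coefficients from the outputs above (the vector names are just labels chosen here):

```r
# Rounded coefficients from the regressions above (log-wage outcome)
coefs <- c(own_education_fd  = 0.092,
           own_education_ols = 0.084,
           male              = 0.204,
           white             = -0.410)
# Exact percentage change implied by a log-point coefficient b: 100 * (exp(b) - 1)
pct <- 100 * (exp(coefs) - 1)
round(pct, 1)
```

This gives roughly 9.6%, 8.8%, 22.6% and -33.6%, so the approximation matters mainly for the male and white coefficients.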