Download and go over this seminal paper by Alan Krueger. Krueger (1999) Experimental Estimates of Education Production Functions QJE 114(2): 497-532.
ANSWER: The author studies the effect of class size on students' educational outcomes, that is, whether smaller classes lead to better student achievement.
ANSWER: The ideal experiment would randomly assign an initial sample of children to classes of different sizes and keep that sample intact for the whole experiment, with no contamination from attrition or from new students entering in later periods.
ANSWER: The identification strategy is that, because students are randomly sorted into classes of different sizes, differences in educational outcomes can be attributed to class size alone and not to any other factor affecting those outcomes.
ANSWER: Ashenfelter & Krueger re-examine the return to schooling by relating differences in education to differences in wages within pairs of identical twins.
ANSWER: The ideal experiment for the authors would have been to randomly sample twins across the USA, which would ensure twins of different backgrounds, rather than to survey twins at a gathering of twins, which could cause selection bias. It would also obtain accurate measures of schooling and wages.
ANSWER: The identification strategy is claimed to rest on a sample of twins drawn at a large and popular event that brings together many twins from across the US and the world. Moreover, interviewing the twins separately and asking each about both their own and their sibling's education and wages helps to address measurement error in self-reported characteristics.
setwd("C:\\Users\\Akash\\Dropbox\\UGA\\AAEC8610AdvEcotrix_Filipski\\FilipskiHW9")
library(tidyverse)
library(haven)
library(stargazer)
mydata <- read_dta(file = "AshenfelterKrueger1994_twins.dta")
head(mydata)
mydata$wagediff = mydata$lwage2-mydata$lwage1
mydata$educdiff = mydata$educ2-mydata$educ1
firstdiff.model = lm(wagediff~educdiff, data=mydata)
stargazer(firstdiff.model,
  header=FALSE, type='html',
  font.size="small", digits=3,
  omit.stat=c("adj.rsq", "ser", "f"),
  title = "Result of Table 3 and Column 5")
| | Dependent variable: |
| | wagediff |
| educdiff | 0.092*** |
| | (0.024) |
| Constant | 0.079* |
| | (0.045) |
| Observations | 149 |
| R2 | 0.092 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
Replicating Figure 1:
plot(mydata$educdiff, mydata$wagediff, main="FIGURE 1. INTRAPAIR RETURNS TO SCHOOLING, IDENTICAL TWINS",
  xlab="Difference in Years of Schooling", ylab="Difference in Log Hourly Wage", pch=19)
abline(lm(wagediff~educdiff, data=mydata), col="red") # regression line (y~x)
ANSWER: The first-difference estimate tells us that a one-year increase in the intra-pair difference in schooling is associated with an increase of about 9.2% in the intra-pair difference in wages, ceteris paribus. Thus schooling has a positive and statistically significant return.
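To see why differencing within twin pairs helps, here is a minimal sketch. It assumes each twin's log wage depends on own schooling plus an unobserved family component $\mu_i$ (ability, background) shared by both twins in family $i$. The symbols $\alpha_1$, $\alpha_2$, $\beta$, $\mu_i$ and $\varepsilon$ are notation chosen for illustration, not taken from the paper.

```latex
\[
\begin{aligned}
lwage_{1i} &= \alpha_1 + \beta\, educ_{1i} + \mu_i + \varepsilon_{1i} \\
lwage_{2i} &= \alpha_2 + \beta\, educ_{2i} + \mu_i + \varepsilon_{2i} \\
\underbrace{lwage_{2i} - lwage_{1i}}_{wagediff_i}
  &= (\alpha_2 - \alpha_1) + \beta\,\underbrace{(educ_{2i} - educ_{1i})}_{educdiff_i}
     + (\varepsilon_{2i} - \varepsilon_{1i})
\end{aligned}
\]
```

The shared term $\mu_i$ drops out of the difference, so under this sketch the slope on educdiff is not contaminated by anything the twins have in common, and the constant picks up any average wage gap between the twins labelled 1 and 2.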
library(tidyr)
# Drop columns 11-12 (assumed here to be the wagediff and educdiff columns created above)
mydata.wide = mydata[-(11:12)]
# Make sure the subject column is a factor
mydata.wide$famid <- factor(mydata.wide$famid)
# First reshape to long format
mydata.long <- gather(mydata.wide, variables, values, educ1:white2, factor_key=TRUE)
## Warning: attributes are not identical across measure variables;
## they will be dropped
# Separate the text from numeric
separate_DF <- mydata.long %>% separate(variables, sep = "(?<=[A-Za-z])(?=[0-9])", c("xvar", "twinid"))
# Now spread it to look like a long data we need.
spread_df <- separate_DF %>% spread(xvar, values)
## Generate age squared / 100
spread_df$agesq = (spread_df$age*spread_df$age)/100
## Now run your OLS regression
ols.model = lm(lwage ~ educ+age+agesq+male+white, data = spread_df)
##print(summary(ols.model),digits=3)
stargazer(ols.model,
  header=FALSE, type='html',
  font.size="small", digits=3,
  omit.stat=c("rsq", "ser", "f"),
  title = "Result of Table 3 and Column 1")
| | Dependent variable: |
| | lwage |
| educ | 0.084*** |
| | (0.014) |
| age | 0.088*** |
| | (0.019) |
| agesq | -0.087*** |
| | (0.023) |
| male | 0.204*** |
| | (0.063) |
| white | -0.410*** |
| | (0.127) |
| Constant | -0.471 |
| | (0.426) |
| Observations | 298 |
| Adjusted R2 | 0.260 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
ANSWER: The OLS estimate of the return to schooling implies that each additional completed year of schooling is associated with about 8.4% higher wages, ceteris paribus.
ANSWER:
Age: The coefficient of 0.088 is the linear part of the age profile. Because age also enters through age-squared, the effect of one more year of age is not a constant 8.8%; it starts near that value and shrinks as age rises, ceteris paribus.
Age-squared: This variable captures the non-linear effect of age. The negative coefficient tells us that wages rise with age up to a certain age and fall afterwards (a concave relationship). Since the regressor is age squared divided by 100, the turning point is \[-\dfrac{100\,\beta[age]}{2\,\beta[age^2]}\], which puts the peak at about 50.5 years.
Male: Being male raises wages by about 20.4% (0.204 log points), ceteris paribus.
White: Being white lowers wages by 0.410 log points (roughly 41% under the log approximation), ceteris paribus. The authors mention this is different from earlier CPS studies.
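The arithmetic behind these interpretations can be sketched in a few lines of R. This is only an illustration: it uses the rounded coefficients from the OLS table above rather than the fitted model object, and the helper `age_effect` is a hypothetical name introduced here.

```r
# Rounded coefficients copied from the OLS table above (not the fitted object)
b <- c(educ = 0.084, age = 0.088, agesq = -0.087, male = 0.204, white = -0.410)

# Turning point of the age profile; the factor 100 appears because
# the regressor was built as age^2 / 100
peak_age <- -100 * b[["age"]] / (2 * b[["agesq"]])

# Exact percentage change implied by a coefficient in a log-wage equation
pct_exact <- 100 * (exp(b[c("educ", "male", "white")]) - 1)

# Marginal effect of one more year of age, evaluated at a given age
# (hypothetical helper for illustration)
age_effect <- function(age) b[["age"]] + 2 * b[["agesq"]] * age / 100

round(peak_age, 1)
round(pct_exact, 1)
round(age_effect(c(30, 40, 50)), 3)
```

With the rounded coefficients the peak comes out near 50.6 years; the small gap from the 50.5 quoted above is consistent with rounding in the reported coefficients. The exact conversion also shows that the log approximation is close for small coefficients such as educ but overstates the magnitude of a large negative one such as white (about 34% rather than 41%).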
Reference: Card and Krueger (1994) Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania AER 84(4): 772-793.
ANSWER: The authors look at the effect of an increase in the minimum wage on establishment-level employment outcomes. They use the minimum wage policy change in New Jersey on April 1, 1992 as the treatment, measure its effect on job growth in New Jersey's fast-food industry, and compare it with job growth in Pennsylvania (a neighboring state with no change in policy). In the words of the authors, “How do employers in a low-wage labor market respond to an increase in the minimum wage?”
ANSWER: The ideal experiment would be one in which the introduction of the policy was truly random and not precipitated by economic conditions.
ANSWER: The identification strategy relies on two things. First, it assumes that the effect of the minimum wage increase was not masked by a strong economy. Second, under the parallel trends assumption, fast-food restaurants in eastern Pennsylvania serve as a control group for the treatment group of fast-food restaurants in New Jersey.
(Note: for this one, you will not obtain exactly the same results as the paper - but close.)
ANSWER:
rm(list = ls())
setwd("C:\\Users\\Akash\\Dropbox\\UGA\\AAEC8610AdvEcotrix_Filipski\\FilipskiHW9")
task3data = read.csv("CardKrueger1994_fastfood.csv", header = TRUE)
attach(task3data) # the code below refers to columns (bk, state, emptot, ...) by bare name
head(task3data)
ANSWER:
## Computing Table 2 statistics for verifying the data.
#DISTRIBUTION OF STORES :
### NewJersey
njstore.bk = format(round(mean(bk[state==1], na.rm=TRUE)*100,1),nsmall=1)
njstore.kfc = format(round(mean(kfc[state==1], na.rm=TRUE)*100,1),nsmall=1)
njstore.roys = format(round(mean(roys[state==1], na.rm=TRUE)*100,1) ,nsmall=1)
njstore.wendys = format(round(mean(wendys[state==1], na.rm=TRUE)*100,1),nsmall=1)
### Pennsylvania
pennstore.bk = format(round(mean(bk[state==0], na.rm=TRUE)*100,1) ,nsmall=1)
pennstore.kfc = format(round(mean(kfc[state==0], na.rm=TRUE)*100,1),nsmall=1)
pennstore.roys = format(round(mean(roys[state==0], na.rm=TRUE)*100,1) ,nsmall=1)
pennstore.wendys = format(round(mean(wendys[state==0], na.rm=TRUE)*100,1),nsmall=1)
##### T_TESTS
tstat.bk = t.test(bk[state==1], bk[state==0], var.equal = FALSE)
#round(tstat.bk$statistic,1)
tstat.kfc = t.test(kfc[state==1], kfc[state==0], var.equal = FALSE)
#round(tstat.kfc$statistic,1)
tstat.roys = t.test(roys[state==1], roys[state==0], var.equal = FALSE)
#round(tstat.roys$statistic,1)
tstat.wendys = t.test(wendys[state==1], wendys[state==0], var.equal = FALSE)
#round(tstat.wendys$statistic,1)
# WAVE 1 : MEAN FTE -
FTE.NJ.wave1.mean = round(mean(emptot[state==1], na.rm=TRUE),1)
FTE.NJ.wave1.sd = sd(emptot[state==1], na.rm=TRUE)
FTE.NJ.wave1.se = round(
FTE.NJ.wave1.sd/sqrt(length(which(!is.na(emptot[state==1])))), 2
)
FTE.PENN.wave1.mean = round(mean(emptot[state==0], na.rm=TRUE),1)
FTE.PENN.wave1.sd = sd(emptot[state==0], na.rm=TRUE)
FTE.PENN.wave1.se = round(
FTE.PENN.wave1.sd/sqrt(length(which(!is.na(emptot[state==0])))), 2
)
##### T_TESTS - WAVE 1 _ FTE
tstat.fte.wave1 = t.test(emptot[state==1], emptot[state==0], var.equal = FALSE)
#format(round(tstat.fte.wave1$statistic,1), nsmall=1)
# WAVE 2 : MEAN FTE -
FTE.NJ.wave2.mean = format(round(mean(emptot2[state==1], na.rm=TRUE),1), nsmall=1)
FTE.NJ.wave2.sd = sd(emptot2[state==1], na.rm=TRUE)
FTE.NJ.wave2.se = round(
FTE.NJ.wave2.sd/sqrt(length(which(!is.na(emptot2[state==1])))),2
)
FTE.PENN.wave2.mean =format(round(mean(emptot2[state==0], na.rm=TRUE),1), nsmall=1)
FTE.PENN.wave2.sd = sd(emptot2[state==0], na.rm=TRUE)
FTE.PENN.wave2.se = round(
FTE.PENN.wave2.sd/sqrt(length(which(!is.na(emptot2[state==0])))), 2
)
##### T_TESTS - WAVE 2 _ FTE
tstat.fte.wave2 = t.test(emptot2[state==1], emptot2[state==0], var.equal = FALSE)
#format(round(tstat.fte.wave2$statistic,1), nsmall=1)
This is the reproduction of Table 2; it verifies that our data match the paper.
## TABLE 2:
row1 = c(njstore.bk, pennstore.bk, round(tstat.bk$statistic,1) )
row2 = c(njstore.kfc, pennstore.kfc, round(tstat.kfc$statistic,1) )
row3 = c(njstore.roys, pennstore.roys, round(tstat.roys$statistic,1) )
row4 = c(njstore.wendys, pennstore.wendys, round(tstat.wendys$statistic,1) )
row5 = c("","","")
row6 = c(FTE.NJ.wave1.mean, FTE.PENN.wave1.mean, format(round(tstat.fte.wave1$statistic,1), nsmall=1))
row7 = c(FTE.NJ.wave1.se, FTE.PENN.wave1.se,"")
row8 = c("","","")
row9 = c(FTE.NJ.wave2.mean, FTE.PENN.wave2.mean, format(round(tstat.fte.wave2$statistic,1), nsmall=1))
row10 = c(FTE.NJ.wave2.se, FTE.PENN.wave2.se,"")
tab2 <- data.frame(
rbind(row1, row2, row3, row4, row5, row6, row7, row8, row9, row10)
)
row.names(tab2)<-c("a. Burger King", "b. KFC", "c. Roy Rogers","d. Wendys", "Means In Wave 1", "FTE Employment (Wave I)", "Std. Err (Wave I).", "Means In Wave II", "FTE Employment (Wave II)", "Std. Err. (Wave II)")
colnames(tab2) <- c("Stores: NJ","Stores:PA","t-statistic")
tab2
Don’t use a regression. Reproduce the columns 1,2,3 rows 1,2,4 (not 3) of the top-left corner of Table 3 in the paper. You will not obtain the exact same estimates - but pretty close (differences only on the decimals). You can skip computing the standard errors by hand if you are not sure how to do that. My table is transposed compared to theirs, but the results are the same. If you have time, try to make a table that matches theirs, but that isn’t the point here.
ANSWER:
Let’s first get column 1,2 and 3 of the row 1 of Table 3: FTE employment before minimum wage policy change.
### 1. FTE employment before policy change all available observations
##Pennsylvania:
FTE.PENN.before = round(mean(emptot[state==0], na.rm=TRUE),2)
FTE.PENN.before.sd = sd(emptot[state==0], na.rm=TRUE)
FTE.PENN.before.se = round(FTE.PENN.before.sd/sqrt(length(which(!is.na(emptot[state==0])))), 2)
##NEW JERSEY (COLUMN 2):
FTE.NJ.before = round(mean(emptot[state==1], na.rm=TRUE),2)
FTE.NJ.before.sd = sd(emptot[state==1], na.rm=TRUE)
FTE.NJ.before.se = round(FTE.NJ.before.sd/sqrt(length(which(!is.na(emptot[state==1])))), 2)
## DIFFERENCE BETWEEN COLUMN 1 and Column 2 [NJ-PA]:
## Calculate the difference between the means
diff.means <- FTE.NJ.before - FTE.PENN.before
## GETTING THE STANDARD ERROR OF DIFFERENCE OF MEANS
obs <- c(length(which(!is.na(emptot[state==0]))),length(which(!is.na(emptot[state==1]))))
SDs <- c(FTE.PENN.before.sd, FTE.NJ.before.sd)
# Standard error of difference
se.diff.before = round(sqrt(
((SDs[1]^2)/obs[1]) +
((SDs[2]^2)/obs[2])
), 2
)
## put means & Se into a vector
means.FTE.before<-c(FTE.PENN.before,FTE.NJ.before, diff.means)
se.FTE.before<- c(FTE.PENN.before.se,FTE.NJ.before.se, se.diff.before)
## add means to dataframe
tab3 <- rbind(means.FTE.before)
tab3<- rbind(tab3,se.FTE.before)
row.names(tab3)[1] <- "FTE employment before"
row.names(tab3)[2] <- "Std. Err. (Before)"
Now, we will calculate columns 1, 2 and 3 of row 2 of Table 3: FTE employment AFTER the minimum wage policy change.
### 1. FTE employment AFTER policy change all available observations
##Pennsylvania (COLUMN 1):
FTE.PENN.after = round(mean(emptot2[state==0], na.rm=TRUE),2)
FTE.PENN.after.sd = sd(emptot2[state==0], na.rm=TRUE)
FTE.PENN.after.se = round(FTE.PENN.after.sd/sqrt(length(which(!is.na(emptot2[state==0])))), 2)
##NEW JERSEY (COLUMN 2):
FTE.NJ.after = round(mean(emptot2[state==1], na.rm=TRUE),2)
FTE.NJ.after.sd = sd(emptot2[state==1], na.rm=TRUE)
FTE.NJ.after.se = round(FTE.NJ.after.sd/sqrt(length(which(!is.na(emptot2[state==1])))), 2)
## DIFFERENCE BETWEEN COLUMN 1 and Column 2 [NJ-PA]:
## Calculate the difference between the means
diff.means.after <- FTE.NJ.after - FTE.PENN.after
## GETTING THE STANDARD ERROR OF DIFFERENCE OF MEANS
obs <- c(length(which(!is.na(emptot2[state==0]))),length(which(!is.na(emptot2[state==1]))))
SDs <- c(FTE.PENN.after.sd, FTE.NJ.after.sd)
# Standard error of difference
se.diff.after = round(sqrt(
((SDs[1]^2)/obs[1]) +
((SDs[2]^2)/obs[2])
), 2
)
## put means & Se into a vector
means.FTE.after<-c(FTE.PENN.after,FTE.NJ.after, diff.means.after)
se.FTE.after<- c(FTE.PENN.after.se,FTE.NJ.after.se, se.diff.after)
## add means to dataframe
tab3 <- rbind(tab3, means.FTE.after)
tab3<- rbind(tab3,se.FTE.after)
row.names(tab3)[3] <- "FTE employment After"
row.names(tab3)[4] <- "Std. Err. (After)"
Now, we calculate row 4 (columns 1, 2, 3) of Table 3. For this we need a balanced sample.
### 3. Change in mean FTE employment, balanced sample of stores
smalldf = subset(task3data, select=c(id, state, emptot,emptot2, demp))
smalldf = na.omit(smalldf)
## PENNSYLVANIA
FTE.PENN.before = round(mean(smalldf$emptot[smalldf$state==0], na.rm=TRUE),2)
FTE.PENN.after = round(mean(smalldf$emptot2[smalldf$state==0], na.rm=TRUE),2)
## Calculate the difference between the means
diff.means.PENN <- FTE.PENN.after - FTE.PENN.before
# Standard error of difference
se.diff.PENN = round(
sqrt(var(smalldf$demp[smalldf$state==0])/length(smalldf$demp[smalldf$state==0])),
2)
##NEW JERSEY
FTE.NJ.before = round(mean(smalldf$emptot[smalldf$state==1]),2)
FTE.NJ.after = round(mean(smalldf$emptot2[smalldf$state==1]),2)
## Calculate the difference between the means
diff.means.NJ <- FTE.NJ.after - FTE.NJ.before
# Standard error of difference
se.diff.NJ = round(
sqrt(var(smalldf$demp[smalldf$state==1])/length(smalldf$demp[smalldf$state==1])),
2)
## DIFFERENCES IN DIFFERENCES
## COLUMN 2-Column 1 (NJ-PA)
diffindiff <- diff.means.NJ-diff.means.PENN
## Std Err
se.diffindiff = round(sqrt(se.diff.PENN^2+se.diff.NJ^2),2)
## put means & Se into a vector
diff.FTE<-c(diff.means.PENN ,diff.means.NJ , diffindiff)
se.FTE.diff<- c(se.diff.PENN,se.diff.NJ, se.diffindiff)
This is Table 3:
## add means to dataframe
tab3 <- rbind(tab3, diff.FTE)
tab3<- rbind(tab3,se.FTE.diff)
row.names(tab3)[5] <- "Change in Mean FTE employment, balanced"
row.names(tab3)[6] <- "Std. Err. (Change in Mean FTE)"
tab3.df = data.frame(tab3)
colnames(tab3.df)[1] = "PA"
colnames(tab3.df)[2] = "NJ"
colnames(tab3.df)[3] = "Difference: NJ-PA"
tab3.df
ANSWER:
My estimates match the paper.
The difference-in-differences estimator tells us whether the mean change in the outcome (here, FTE employment in fast-food restaurants) from before to after the minimum wage increase differed between New Jersey and Pennsylvania. We see that the “relative gain” in employment in NJ is 2.75 FTE employees (statistically significant at the 5% level).
ANSWER:
smalldf = subset(task3data, select=c(id, state, emptot,emptot2, demp))
smalldf = na.omit(smalldf)
lmfit <-lm(demp~state, data = smalldf)
#Load libraries
library("lmtest")
library("sandwich")
# Robust t test
#Using "HC1" will replicate the robust standard errors you would obtain using STATA.
simple.ols<- coeftest(lmfit, vcov = vcovHC(lmfit, type = "HC1"))
The results are shown in the table below:
library(stargazer)
stargazer(simple.ols, header=FALSE, type='html', font.size="small", digits=2,
  omit.stat=c("adj.rsq", "ser", "f"), title = "OLS to obtain Diff-in-diff")
| | Dependent variable: |
| state | 2.75** |
| | (1.34) |
| Constant | -2.28* |
| | (1.25) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
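Why a regression of the employment change on the state dummy reproduces the hand-computed estimate can be checked on simulated data. The sketch below is illustrative only: the sample sizes, means and standard deviation are hypothetical values chosen for the simulation, not figures from the Card and Krueger data.

```r
# Simulated data only: sample sizes and means are hypothetical
set.seed(42)
n_pa <- 75
n_nj <- 300
state <- c(rep(0, n_pa), rep(1, n_nj))   # 0 = PA (control), 1 = NJ (treated)
demp <- rnorm(n_pa + n_nj,
              mean = ifelse(state == 1, 0.5, -2.0),
              sd = 9)                    # change in FTE employment per store

# Difference-in-differences by hand: NJ mean change minus PA mean change
did_by_hand <- mean(demp[state == 1]) - mean(demp[state == 0])

# The same quantity as the slope of a regression of the change on the NJ dummy
fit <- lm(demp ~ state)
did_by_ols <- unname(coef(fit)["state"])

# The intercept is the mean change in the control group
pa_change <- unname(coef(fit)["(Intercept)"])

all.equal(did_by_hand, did_by_ols)            # slope equals the double difference
all.equal(pa_change, mean(demp[state == 0]))  # intercept equals the PA mean change
```

Read the same way, the constant of -2.28 in the table above is the mean change in Pennsylvania, and -2.28 + 2.75 = 0.47 is the implied mean change in New Jersey.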
ANSWER:
rm(list = ls())
task3data = read.csv("CardKrueger1994_fastfood.csv", header = TRUE)
task3data$emptot0 <- task3data$emptot
task3data$emptot1 <- task3data$emptot2
library(tidyr)
task3.wide = subset(task3data, select = c("id", "state", "emptot0", "emptot1"))
# Make sure the subject column is a factor
task3.wide$id <- factor(task3.wide$id)
#First to long
task3.long <- gather(task3.wide, variables, empvalue, emptot0:emptot1, factor_key=TRUE)
# Separate the text from numeric
separate_DF <- task3.long %>% separate(variables, sep = "(?<=[A-Za-z])(?=[0-9])", c("xvar", "treatdate"))
# Now spread it to look like a long data we need.
spread_df <- separate_DF %>% spread(xvar, empvalue)
spread_df$state<-as.factor(spread_df$state)
spread_df$treatdate<-as.factor(spread_df$treatdate)
summary(spread_df)
##        id      state   treatdate     emptot
##  407    :  4   0:158   0:410     Min.   : 0.00
##  1      :  2   1:662   1:410     1st Qu.:14.50
##  2      :  2                     Median :20.00
##  3      :  2                     Mean   :21.03
##  4      :  2                     3rd Qu.:25.50
##  5      :  2                     Max.   :85.00
##  (Other):806                     NA's   :26
ANSWER:
##fixed effects regression uses the plm package
library(plm)
fe <- plm(emptot ~ state + treatdate + state*treatdate
, data=spread_df, index = c("id") )
fe.robust<- coeftest(fe, vcov = vcovHC(fe, type = "HC1"))
stargazer(fe.robust, header=FALSE, type='html', font.size="small", digits=2,
  omit.stat=c("adj.rsq", "ser", "f"), title = "Fixed Effect Diff-In-Diff")
| | Dependent variable: |
| state1 | 0.88 |
| | (0.67) |
| treatdate1 | -2.28* |
| | (1.25) |
| state1:treatdate1 | 2.75** |
| | (1.34) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
COMMENT: My results exactly match the results of Table 3. I get the diff-in-diff estimate: 2.75 (1.34). Here, again, the interpretation is that after the treatment (the change in the minimum wage) FTE employment in New Jersey rose by 2.75 relative to Pennsylvania. So, running a regression is an easier way to obtain the diff-in-diff estimator.
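The link between the interaction coefficient and the double difference can be made explicit with a small derivation. As a simplifying assumption it uses a pooled specification with a common intercept $\alpha$ (the plm call above uses restaurant-specific intercepts instead), and the Greek letters are notation introduced here for illustration.

```latex
\[
E[emptot \mid state, treatdate]
  = \alpha + \beta\, state + \gamma\, treatdate + \delta\,(state \times treatdate)
\]
\[
\begin{array}{lcc}
 & treatdate = 0 & treatdate = 1 \\
state = 0\ (\text{PA}) & \alpha & \alpha + \gamma \\
state = 1\ (\text{NJ}) & \alpha + \beta & \alpha + \beta + \gamma + \delta
\end{array}
\]
\[
\big[(\alpha + \beta + \gamma + \delta) - (\alpha + \beta)\big]
  - \big[(\alpha + \gamma) - \alpha\big] = \delta
\]
```

Under this reading, treatdate1 (-2.28) is the change in Pennsylvania and state1:treatdate1 (2.75) is the additional change in New Jersey.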
ANSWER: The “Parallel Trends” (also called “common trends”) assumption says that, in the absence of treatment, the outcome would have followed the same trend in the treatment and control groups. The causal interpretation of the diff-in-diff estimate is valid only if this assumption holds.
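In potential-outcomes notation, the assumption can be written as follows. The notation is an assumption made for illustration: $Y_{it}(0)$ denotes the employment store $i$ would have had in period $t$ without the minimum wage increase.

```latex
\[
E\big[\,Y_{i,\text{after}}(0) - Y_{i,\text{before}}(0) \mid \text{NJ}\,\big]
  = E\big[\,Y_{i,\text{after}}(0) - Y_{i,\text{before}}(0) \mid \text{PA}\,\big]
\]
```

The left-hand side is never observed, because New Jersey stores were in fact treated; the observed change in Pennsylvania stands in for it, which is why the assumption cannot be tested directly with only two periods of data.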