Download and go over this seminal paper by David Card and Alan
Krueger. Card and Krueger (1994) Minimum Wages and
Employment: A Case Study of the Fast-Food Industry in New Jersey and
Pennsylvania AER 84(4): 772-793. Be careful: They released a 2000
follow-up with the exact same title followed by “:Reply”. We want the
original, not the follow-up.
1.1. Briefly answer these questions:
a. What is the causal link the paper is trying to reveal?
This paper seeks to identify the effect of a minimum wage increase on employment in the fast-food industry.
b. What would be the ideal experiment to test this causal link?
The ideal experiment would take a set of states with identical demographic characteristics and the same number of restaurants of the same size, then raise the minimum wage in a randomly chosen subset of states while holding it constant in the rest. The effect on employment would be estimated by comparing the change in employment in the treatment group (states with the increase) with the change in the control group. Such an experiment is not feasible in the real world.
c. What is the identification strategy?
The authors conducted two waves of surveys covering 410 stores from four major fast-food chains in New Jersey, where the minimum wage was raised between the two waves (the second wave took place roughly eight months after the increase), and in adjacent eastern Pennsylvania, where the minimum wage did not change and which shares very similar geographic and demographic attributes with NJ. The effect is identified by comparing the change in employment in NJ stores with the change in PA stores.
d. What are the assumptions / threats to this identification strategy?
The key assumption is that eastern Pennsylvania, which borders New
Jersey, is similar enough to serve as a control group for the NJ
treatment group: absent the minimum wage increase, employment in NJ
stores would have followed the same trend as in PA stores. One threat
to this identification strategy is a spillover effect: the increase in
NJ could draw employees previously working at fast-food chains in
eastern PA to fast-food jobs in NJ, so the control group would itself
be affected by the treatment.
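The similarity assumption is commonly formalized as a parallel-trends condition. A sketch in potential-outcomes notation, where \(Y_{i,t}(0)\) denotes employment at store i in wave t had the minimum wage not been raised (this notation is introduced here for illustration and is not taken from the paper):

\[
E[\,Y_{i,2}(0) - Y_{i,1}(0) \mid NJ\,] = E[\,Y_{i,2}(0) - Y_{i,1}(0) \mid PA\,]
\]

Under this condition the two states may differ in employment levels; only the untreated change over time has to match. A spillover of workers from PA to NJ would violate it, because the observed PA change would no longer reflect an untreated trend.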
a. Load data from Card and Krueger AER 1994
You can load it directly from my website here. Variable names are self-explanatory if you read the paper.
library(haven)
library(stargazer)
library(foreign)
library(kableExtra)
library(vtable)
library(dplyr)
library(reshape)
setwd("/Users/hunteryuan/Downloads/AAEC 8610/R Working Directory/HW5")
Fastfood <- read.csv("CardKrueger1994_fastfood.csv")
head(Fastfood)
## id state emptot emptot2 demp chain bk kfc roys wendys wage_st wage_st2
## 1 46 0 40.50 24.0 -16.50 1 1 0 0 0 NA 4.30
## 2 49 0 13.75 11.5 -2.25 2 0 1 0 0 NA 4.45
## 3 506 0 8.50 10.5 2.00 2 0 1 0 0 NA 5.00
## 4 56 0 34.00 20.0 -14.00 4 0 0 0 1 5.0 5.25
## 5 61 0 24.00 35.5 11.50 4 0 0 0 1 5.5 4.75
## 6 62 0 20.50 NA NA 4 0 0 0 1 5.0 NA
b. Verify that the data is correct. Reproduce the % of Burger King, KFC, Roys, and Wendys, as well as the FTE means in the 2 waves. (Note: This is just to force you to do a summary stats table with R. I used group_by then %>% then summarize. I’m sure some of you will find better ways to do it.)
# Label states: 0 = PA (control), 1 = NJ (treatment)
Fastfood$state_name <- ifelse(Fastfood$state == 0, "PA", "NJ")
# Chain shares (in %) and mean FTE employment by state; t() puts states in columns
Table_2b <- t(Fastfood %>%
  group_by(state_name) %>%
  summarise(bk_bar = mean(bk)*100,
            kfc_bar = mean(kfc)*100,
            wendys_bar = mean(wendys)*100,
            roys_bar = mean(roys)*100,
            emptot_bar = mean(emptot, na.rm = TRUE),
            emptot2_bar = mean(emptot2, na.rm = TRUE)))
colnames(Table_2b) <- Table_2b[1, ]
Table_2b <- Table_2b[-1, ]
rownames(Table_2b) <- c("Burger King", "KFC","Wendy's","Roy Rogers","FTE Wave 1","FTE Wave 2")
Table_2b
## NJ PA
## Burger King "41.08761" "44.30380"
## KFC "20.54381" "15.18987"
## Wendy's "13.59517" "18.98734"
## Roy Rogers "24.77341" "21.51899"
## FTE Wave 1 "20.43941" "23.33117"
## FTE Wave 2 "21.02743" "21.16558"
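The values above print as quoted strings because transposing a table that includes the character column `state_name` yields a character matrix. One possible alternative that keeps the summary numeric is base R's `aggregate()`. The sketch below runs on a small made-up data frame (`toy`, with invented values), not on the actual survey data:

```r
# Toy stand-in for the fast-food data; all values are invented for illustration.
toy <- data.frame(
  state   = c(0, 0, 1, 1, 1),      # 0 = PA, 1 = NJ
  bk      = c(1, 0, 1, 0, 0),      # Burger King indicator
  emptot  = c(20, 30, 10, NA, 14), # wave 1 FTE
  emptot2 = c(18, 28, 12, 16, NA)  # wave 2 FTE
)

# Group means by state. na.action = na.pass keeps rows with missing values,
# and na.rm = TRUE then drops the NAs column by column.
tab <- aggregate(cbind(bk, emptot, emptot2) ~ state, data = toy,
                 FUN = function(x) mean(x, na.rm = TRUE),
                 na.action = na.pass)
tab$bk <- tab$bk * 100             # share of Burger King stores, in percent
print(round(tab, 2))
```

Because `tab` stays a numeric data frame, it can be rounded or passed to a table formatter without converting strings back to numbers.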
c. Use a “first-differenced” OLS to obtain their Diff-in-diff estimator (almost – you won’t get it exactly). Comment on how your OLS compared to the DiD estimate in Table 3 of the paper.
dif_ols <- lm(demp ~ state, data = Fastfood)
stargazer(dif_ols, type = "text", title = "Table 3 Column (iii)",
align = TRUE,
dep.var.labels = c("Difference, NJ - PA"), covariate.labels = c("State"),
keep.stat = c("n", "rsq", "adj.rsq"), omit = c("Constant"),
table.layout = "=d=t-s=n")
##
## Table 3 Column (iii)
## ========================================
## Difference, NJ - PA
## ========================================
## State 2.750**
## (1.154)
##
## ----------------------------------------
## Observations 384
## R2 0.015
## Adjusted R2 0.012
## ========================================
## Note: *p<0.1; **p<0.05; ***p<0.01
The result indicates that, after the minimum wage increase, NJ
fast-food restaurants gained 2.75 full-time-equivalent (FTE) employees
on average relative to PA fast-food restaurants. The coefficient
estimate is statistically significant at the 5% level. The DiD estimate in
Table 3 of the paper indicates a relative gain of 2.76 FTE employees,
which is slightly higher than the first-differenced OLS estimate.
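The reason a regression of the employment change on a state dummy gives a DiD estimate is that, with a single binary regressor, the OLS slope equals the difference in group means of the dependent variable. A minimal sketch on invented numbers (the data frame `toy` is hypothetical, not the survey data):

```r
# Hypothetical changes in FTE employment (wave 2 minus wave 1); values are invented.
toy <- data.frame(
  state = c(0, 0, 0, 1, 1, 1, 1),   # 0 = PA, 1 = NJ
  demp  = c(-3, -1, -2, 1, 0, 2, -1)
)

fit <- lm(demp ~ state, data = toy)

# Mean change in each state, computed directly.
mean_change   <- tapply(toy$demp, toy$state, mean)
diff_in_means <- unname(mean_change["1"] - mean_change["0"])

# The slope on the binary regressor equals the difference in mean changes.
slope <- unname(coef(fit)["state"])
print(c(slope = slope, diff_in_means = diff_in_means))
```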
d. What would be the equation of a standard “difference in difference” regression? Just write down the equation and briefly explain each coefficient.
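The equation of a standard “Difference-in-Difference” regression in the context of this paper would be:
\[Y_{i,t} = \beta_0 + \beta_1 State_i + \beta_2 Time_t + \beta_3 (State_i * Time_t) + \epsilon_{i,t}\]
where:
\(Y_{i,t}\) is the outcome employment variable for store i at time t;
\(\beta_0\) is the mean employment of PA stores in the first wave;
\(State_i\) is a binary dummy variable that is equal to 0 if store i is in the control group (PA) and equal to 1 if it is in the treatment group (NJ); its coefficient \(\beta_1\) is the difference in employment between NJ and PA before the increase;
\(Time_t\) is a binary dummy variable that is equal to 0 in the first survey wave and equal to 1 in the second wave; its coefficient \(\beta_2\) is the change in employment in PA between the two waves;
\(State_i * Time_t\) is the interaction term between the survey wave and the treatment of the minimum wage increase; its coefficient \(\beta_3\) is the difference-in-differences estimator;
\(\epsilon_{i,t}\) is the error term.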
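To see why the interaction coefficient is the DiD estimator, it can help to write out the four group means the regression implies. The labels \(\beta_0, \beta_1, \beta_2, \beta_3\) below are assumed to denote the intercept and the coefficients on State, Time, and State \(*\) Time, with Time equal to 0 in wave 1 and 1 in wave 2:

\[
\begin{aligned}
E[Y \mid PA, \text{wave 1}] &= \beta_0 \\
E[Y \mid NJ, \text{wave 1}] &= \beta_0 + \beta_1 \\
E[Y \mid PA, \text{wave 2}] &= \beta_0 + \beta_2 \\
E[Y \mid NJ, \text{wave 2}] &= \beta_0 + \beta_1 + \beta_2 + \beta_3
\end{aligned}
\]

Subtracting the PA change (\(\beta_2\)) from the NJ change (\(\beta_2 + \beta_3\)) leaves \(\beta_3\).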
e. Compute the difference-in-differences estimator “by hand”. Interpret the results in a couple of sentences.
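\(\Delta^{Wave1}_{NJ-PA}\) = 20.44 - 23.33 = -2.89
\(\Delta^{Wave2}_{NJ-PA}\) = 21.03 - 21.17 = -0.14
\(\beta_{DID}\) = \(\Delta^{Wave2}_{NJ-PA}\) - \(\Delta^{Wave1}_{NJ-PA}\) = -0.14 - (-2.89) = 2.75
NJ stores started out with about 2.89 fewer FTE employees than PA stores and had almost closed that gap by the second wave. Relative to PA, NJ stores therefore gained about 2.75 FTE employees per store after the minimum wage increase.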
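The same arithmetic can be checked in R. This sketch copies the four mean FTE values from the summary table in part b into a small matrix (the object names `fte`, `gap_wave1`, `gap_wave2`, and `did` are chosen here for illustration):

```r
# Mean FTE employment by state and wave, copied from the summary table in part b.
fte <- matrix(c(20.43941, 23.33117,
                21.02743, 21.16558),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("wave1", "wave2"), c("NJ", "PA")))

# NJ minus PA gap in each wave, then the difference between the two gaps.
gap_wave1 <- fte["wave1", "NJ"] - fte["wave1", "PA"]
gap_wave2 <- fte["wave2", "NJ"] - fte["wave2", "PA"]
did <- gap_wave2 - gap_wave1

print(round(c(gap_wave1 = gap_wave1, gap_wave2 = gap_wave2, did = did), 2))
```

Rounded to two decimals this gives -2.89, -0.14, and 2.75.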
f. Run the regression you wrote up in part d.
(Note: You will likely need to reshape your data to long form first)
Comment on the results you obtain.
Fastfood_1 <- reshape(Fastfood, varying=c("emptot", "emptot2"),
v.names=c("employ_tot"),
timevar = "time",
times=c("1", "2"),
idvar = c("id", "state"),
direction = "long")
head(Fastfood_1)
## id state demp chain bk kfc roys wendys wage_st wage_st2 state_name
## 46.0.1 46 0 -16.50 1 1 0 0 0 NA 4.30 PA
## 49.0.1 49 0 -2.25 2 0 1 0 0 NA 4.45 PA
## 506.0.1 506 0 2.00 2 0 1 0 0 NA 5.00 PA
## 56.0.1 56 0 -14.00 4 0 0 0 1 5.0 5.25 PA
## 61.0.1 61 0 11.50 4 0 0 0 1 5.5 4.75 PA
## 62.0.1 62 0 NA 4 0 0 0 1 5.0 NA PA
## time employ_tot
## 46.0.1 1 40.50
## 49.0.1 1 13.75
## 506.0.1 1 8.50
## 56.0.1 1 34.00
## 61.0.1 1 24.00
## 62.0.1 1 20.50
DID_3f <- lm(employ_tot ~ state + time + state*time, data = Fastfood_1)
stargazer(DID_3f, type="text", align=TRUE,
title="Difference in Difference Regression",
dep.var.labels = "Difference, NJ - PA",
covariate.labels=c("State", "Time", "State * Time"),
keep.stat = c("n", "rsq"), omit=c("Constant", "adj.rsq"),
table.layout = "=d=t-s=n")
##
## Difference in Difference Regression
## ========================================
## Difference, NJ - PA
## ========================================
## State -2.892**
## (1.194)
##
## Time -2.166
## (1.516)
##
## State * Time 2.754
## (1.688)
##
## ----------------------------------------
## Observations 794
## R2 0.007
## ========================================
## Note: *p<0.1; **p<0.05; ***p<0.01
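The DiD coefficient (2.754) is slightly larger than the
“first-differenced” OLS estimate from part c (2.750) and closer
to the DiD estimate from the paper (2.76). The small gap arises
because this regression uses all 794 store-wave observations with
non-missing employment, whereas the first-differenced regression uses
only the 384 stores observed in both waves. However, the estimate is
not statistically significant at conventional levels.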
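On a balanced panel, where every store is observed in both waves, the pooled DiD regression and the first-differenced regression give the same point estimate and differ only in their standard errors. The sketch below illustrates this on an invented six-store panel; the data frames `wide` and `long` and all values in them are hypothetical:

```r
# Invented balanced panel: 6 stores observed in both waves (not real data).
wide <- data.frame(
  id      = 1:6,
  state   = c(0, 0, 0, 1, 1, 1),        # 0 = PA, 1 = NJ
  emptot  = c(22, 25, 24, 19, 21, 20),  # wave 1 FTE
  emptot2 = c(20, 24, 21, 20, 21, 22)   # wave 2 FTE
)
wide$demp <- wide$emptot2 - wide$emptot

# Stack the two waves into long form with a 0/1 wave indicator.
long <- data.frame(
  id    = rep(wide$id, 2),
  state = rep(wide$state, 2),
  time  = rep(c(0, 1), each = nrow(wide)),
  fte   = c(wide$emptot, wide$emptot2)
)

fd_fit   <- lm(demp ~ state, data = wide)        # first-differenced regression
pool_fit <- lm(fte ~ state * time, data = long)  # pooled DiD regression

fd_est   <- unname(coef(fd_fit)["state"])
pool_est <- unname(coef(pool_fit)["state:time"])
print(c(first_difference = fd_est, pooled_did = pool_est))

# The pooled model treats the two waves of a store as unrelated observations,
# so its standard error generally differs from the first-differenced one.
fd_se   <- summary(fd_fit)$coefficients["state", "Std. Error"]
pool_se <- summary(pool_fit)$coefficients["state:time", "Std. Error"]
print(c(first_difference_se = fd_se, pooled_se = pool_se))
```

In the homework data the panel is not balanced, which is consistent with the two point estimates above differing slightly.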