Card and Krueger (1994), "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania," AER 84(4): 772-793.
Briefly answer these questions:
a. What is the causal link the paper is trying to reveal?
The paper tries to identify the causal effect of a minimum wage increase on establishment-level employment, using fast-food restaurants in New Jersey and Pennsylvania.
b. What would be the ideal experiment to test this causal link?
The ideal experiment would randomly assign a higher minimum wage to some otherwise comparable stores and not to others, then compare employment, wages, and prices at the two groups before and after the increase. Within New Jersey, further comparisons could be made between initially high-wage stores and other stores.
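c. What is the identification strategy?
The identification strategy is difference-in-differences: the change in employment at New Jersey stores before and after the increase in New Jersey's minimum wage is compared with the change at eastern Pennsylvania stores over the same period, where the minimum wage did not change.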
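d. What are the assumptions/threats to this identification strategy?
The paper uses stores in eastern Pennsylvania as a control group. The key assumption is that, absent the minimum wage increase, employment at New Jersey stores would have followed the same trend as employment at Pennsylvania stores. One threat to the difference-in-differences estimate is a change in the sample between the two survey waves (for example, stores that close or do not respond in the second wave), but the paper was able to address this threat.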
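The key assumption behind this comparison, usually called parallel trends, can be written out explicitly. The notation below is introduced here for illustration and is not taken from the paper: \(\bar{Y}_{s,t}\) is mean FTE employment in state \(s\) in period \(t\) (0 = before, 1 = after), and \(Y(0)\) denotes the outcome that would occur without the minimum wage increase. The estimator is

\[\hat{\delta}_{DiD} = (\bar{Y}_{NJ,1} - \bar{Y}_{NJ,0}) - (\bar{Y}_{PA,1} - \bar{Y}_{PA,0})\]

and it identifies the effect of the increase on New Jersey stores if

\[E[Y_{NJ,1}(0) - Y_{NJ,0}(0)] = E[Y_{PA,1}(0) - Y_{PA,0}(0)]\]

that is, if the change observed in Pennsylvania is a valid stand-in for the change New Jersey would have experienced without the increase.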
a. Load data from Card and Krueger AER 1994.
#Load dataset
df <- read.csv("CardKrueger1994_fastfood.csv")
head(df)
## id state emptot emptot2 demp chain bk kfc roys wendys wage_st wage_st2
## 1 46 0 40.50 24.0 -16.50 1 1 0 0 0 NA 4.30
## 2 49 0 13.75 11.5 -2.25 2 0 1 0 0 NA 4.45
## 3 506 0 8.50 10.5 2.00 2 0 1 0 0 NA 5.00
## 4 56 0 34.00 20.0 -14.00 4 0 0 0 1 5.0 5.25
## 5 61 0 24.00 35.5 11.50 4 0 0 0 1 5.5 4.75
## 6 62 0 20.50 NA NA 4 0 0 0 1 5.0 NA
summary(df)
## id state emptot emptot2
## Min. : 1.0 Min. :0.0000 Min. : 5.00 Min. : 0.00
## 1st Qu.:119.2 1st Qu.:1.0000 1st Qu.:14.56 1st Qu.:14.50
## Median :237.5 Median :1.0000 Median :19.50 Median :20.50
## Mean :246.5 Mean :0.8073 Mean :21.00 Mean :21.05
## 3rd Qu.:371.8 3rd Qu.:1.0000 3rd Qu.:24.50 3rd Qu.:26.50
## Max. :522.0 Max. :1.0000 Max. :85.00 Max. :60.50
## NA's :12 NA's :14
## demp chain bk kfc
## Min. :-41.50000 Min. :1.000 Min. :0.0000 Min. :0.0000
## 1st Qu.: -4.00000 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:0.0000
## Median : 0.00000 Median :2.000 Median :0.0000 Median :0.0000
## Mean : -0.07044 Mean :2.117 Mean :0.4171 Mean :0.1951
## 3rd Qu.: 4.00000 3rd Qu.:3.000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. : 34.00000 Max. :4.000 Max. :1.0000 Max. :1.0000
## NA's :26
## roys wendys wage_st wage_st2
## Min. :0.0000 Min. :0.0000 Min. :4.250 Min. :4.250
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:4.250 1st Qu.:5.050
## Median :0.0000 Median :0.0000 Median :4.500 Median :5.050
## Mean :0.2415 Mean :0.1463 Mean :4.616 Mean :4.996
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:4.950 3rd Qu.:5.050
## Max. :1.0000 Max. :1.0000 Max. :5.750 Max. :6.250
## NA's :20 NA's :21
b. Verify that the data is correct.
#Data verification
library(dplyr)
by_store_fte <- t(df %>% # transpose so variables are rows and states are columns
group_by(state) %>% # state: 0 = PA, 1 = NJ
summarise(across(c(bk, kfc, roys, wendys, emptot, emptot2),
list(mean = ~ mean(.x, na.rm = TRUE))))) # store-level means by state, ignoring NAs
colnames(by_store_fte) <- c("PA", "NJ") # name the columns
by_store_fte1 <- round(by_store_fte[1:5, ], 3)*100 # rows 1-5 (state plus chain shares) as percentages
by_store_fte2 <- round(by_store_fte[6:7, ], 2) # rows 6-7: mean FTE employment, rounded to 2 digits
newCount <- rbind(by_store_fte1, by_store_fte2) # combine the two blocks by row
newCount <- newCount[-1, ] # drop the state row
rownames(newCount) <- c("Burger King", "KFC", "Roy Rogers", "Wendy's",
"FTE employment1", "FTE employment2")
newCount
## PA NJ
## Burger King 44.30 41.10
## KFC 15.20 20.50
## Roy Rogers 21.50 24.80
## Wendy's 19.00 13.60
## FTE employment1 23.33 20.44
## FTE employment2 21.17 21.03
c. Use OLS to obtain their diff-in-diff estimator.
#OLS
library(stargazer) # provides stargazer() for regression tables
modols <- lm(demp ~ state, data = df) # demp = emptot2 - emptot; state: 1 = NJ, 0 = PA
stargazer(modols, type = "text", title = "TABLE 3 output from OLS", align = TRUE, keep.stat = c("n","rsq"), dep.var.labels = c("Difference, NJ - PA"), covariate.labels = c("State"))
##
## TABLE 3 output from OLS
## ========================================
## Dependent variable:
## ---------------------------
## Difference, NJ - PA
## ----------------------------------------
## State 2.750**
## (1.154)
##
## Constant -2.283**
## (1.036)
##
## ----------------------------------------
## Observations 384
## R2 0.015
## ========================================
## Note: *p<0.1; **p<0.05; ***p<0.01
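\(\hat{\beta} = 2.75\) and is significant at the 5% level. This is almost equal to the estimate reported in the paper. The constant (\(-2.28\)) is the mean change in full-time-equivalent (FTE) employment at Pennsylvania stores, so the positive estimate indicates that, after the minimum wage change, FTE employment in New Jersey rose by 2.75 relative to Pennsylvania (a mean change of \(-2.28 + 2.75 = 0.47\) in New Jersey).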
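The regression above works as a difference-in-differences estimator because, with a single 0/1 regressor, the OLS intercept equals the mean of the outcome in the 0 group and the slope equals the difference in group means. The following is a minimal sketch on made-up numbers; the `toy` data frame is hypothetical and is not part of the Card and Krueger data.

```r
# Toy data, invented for illustration only (not the Card-Krueger sample)
toy <- data.frame(
  state = c(0, 0, 0, 1, 1, 1, 1),    # 0 = control group, 1 = treated group
  demp  = c(-3, -1, -2, 1, 0, 2, -1) # change in employment at each store
)

# OLS of the change in employment on the group dummy
fit <- lm(demp ~ state, data = toy)
slope <- unname(coef(fit)["state"])
intercept <- unname(coef(fit)["(Intercept)"])

# The same quantities computed directly from group means
mean_change <- tapply(toy$demp, toy$state, mean)
diff_means <- unname(mean_change["1"] - mean_change["0"])

# TRUE if the slope matches the difference in mean changes
all.equal(slope, diff_means)
```

Since `demp` is already a before/after difference, the difference in its group means is a difference of differences.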
d. What would be the equation of a standard “difference in difference” regression?
\[FTE_{i,t} = \alpha+\beta_1 state_i+\beta_2 time_{t}+\beta_3 (state_i \times time_{t})+\epsilon_{i,t}\]
where \(state_i\) is a group dummy (1 for NJ, 0 for PA), \(time_t\) is a period dummy (1 after the increase, 0 before), and \(\beta_3\) is the difference-in-differences estimate.
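To see why the coefficient on the interaction is the difference-in-differences, it helps to write the expected outcome in each of the four state-by-period cells implied by the equation above:

\[E[FTE \mid state = 0, time = 0] = \alpha\]
\[E[FTE \mid state = 0, time = 1] = \alpha + \beta_2\]
\[E[FTE \mid state = 1, time = 0] = \alpha + \beta_1\]
\[E[FTE \mid state = 1, time = 1] = \alpha + \beta_1 + \beta_2 + \beta_3\]

The change over time is \(\beta_2\) in PA and \(\beta_2 + \beta_3\) in NJ, so the difference between the two changes is \((\beta_2 + \beta_3) - \beta_2 = \beta_3\).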
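e. Run the regression you wrote up in part d.
# Reshape the data to long format: one row per store and survey wave
library(reshape2) # provides melt()
totEmp <- melt(cbind(df$emptot, df$emptot2)) # stacks wave 1 on top of wave 2 in column 3
time <- c(rep(0, length(df$emptot)), rep(1, length(df$emptot2))) # 1 after treatment, 0 otherwise
totState <- rep(df$state, 2) # repeat the state dummy for each wave
#create new dataset
new_df <- data.frame(cbind(totEmp[, 3], totState, time))
colnames(new_df) <- c("emptot", "state", "time")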
#Run DiD model
modDiD <- lm(emptot ~ state + time + state*time, data = new_df)
stargazer(modDiD, type = "text", title = "TABLE 3 output from Difference-In-Differences",
align = TRUE, keep.stat = c("n","rsq"),
dep.var.labels = c("Difference, NJ - PA"),
covariate.labels = c("State", "Treatment time"))
##
## TABLE 3 output from Difference-In-Differences
## ==========================================
## Dependent variable:
## ---------------------------
## Difference, NJ - PA
## ------------------------------------------
## State -2.892**
## (1.194)
##
## Treatment time -2.166
## (1.516)
##
## state:time 2.754
## (1.688)
##
## Constant 23.331***
## (1.072)
##
## ------------------------------------------
## Observations 794
## R2 0.007
## ==========================================
## Note: *p<0.1; **p<0.05; ***p<0.01
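The estimate (\(\hat{\beta}_3 = 2.75\)) is similar to that of the paper. However, it is not statistically significant: the pooled regression does not difference out persistent store-level variation in employment, so its standard error (1.69) is larger than in the first-difference regression of part c (1.15).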
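The relationship between the pooled regression and the first-difference regression can be illustrated with simulated data. The sketch below is not the Card and Krueger sample: every number in it (50 stores, a store effect with standard deviation 5, a true effect of 3) is an assumption chosen for illustration. On a balanced panel the two point estimates coincide, while the pooled standard error is larger because store-level differences stay in the residual.

```r
set.seed(1)
n <- 50                               # hypothetical number of stores
state <- rep(c(0, 1), each = n / 2)   # 0 = control, 1 = treated
store_effect <- rnorm(n, sd = 5)      # persistent store-level differences

# Employment before and after; the treated group gets an extra +3 afterwards
before <- 20 + store_effect + rnorm(n)
after  <- 20 + store_effect - 2 + 3 * state + rnorm(n)

# First-difference regression: change in employment on the group dummy
fd <- lm(I(after - before) ~ state)

# Pooled regression on stacked data with a state-by-time interaction
long <- data.frame(
  emp   = c(before, after),
  state = rep(state, 2),
  time  = rep(c(0, 1), each = n)
)
pooled <- lm(emp ~ state * time, data = long)

b_fd      <- unname(coef(fd)["state"])
b_pooled  <- unname(coef(pooled)["state:time"])
se_fd     <- summary(fd)$coefficients["state", "Std. Error"]
se_pooled <- summary(pooled)$coefficients["state:time", "Std. Error"]

all.equal(b_fd, b_pooled)  # identical point estimates on a balanced panel
se_pooled > se_fd          # pooled standard error is larger in this setup
```

In the actual data the two samples also differ slightly (794 store-wave observations versus 384 stores with a non-missing change), so the match between the two point estimates need not be exact.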
f. Compute the difference-in-differences estimator “by hand”. Don’t use a regression.
newCount # using the FTE means from part b
## PA NJ
## Burger King 44.30 41.10
## KFC 15.20 20.50
## Roy Rogers 21.50 24.80
## Wendy's 19.00 13.60
## FTE employment1 23.33 20.44
## FTE employment2 21.17 21.03
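FTE <- newCount[5:6, ] # mean FTE employment before (row 1) and after (row 2)
diffMean <- FTE[, "PA"] - FTE[, "NJ"] # PA - NJ gap in each wave
#Results = 2.89, 0.14
DiD <- unname(diffMean[1] - diffMean[2]) # fall in the PA - NJ gap = DiD estimate
DiD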
## [1] 2.75
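As a check on the arithmetic, the same number can be written as the change in New Jersey minus the change in Pennsylvania, using the rounded means from the table above:

\[\widehat{DiD} = (21.03 - 20.44) - (21.17 - 23.33) = 0.59 - (-2.16) = 2.75\]

The pieces also line up with the regression in part e: \(21.17 - 23.33 = -2.16\) is close to the coefficient on time (\(-2.166\)), and \(20.44 - 23.33 = -2.89\) is close to the coefficient on state (\(-2.892\)).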