Part 1: Reading and Questions

Download and go over this seminal paper by David Card and Alan Krueger. Card and Krueger (1994) Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania AER 84(4): 772-793. Be careful: They released a 2000 follow up with the exact same title followed by “:Reply”. We want the original, not the follow-up.

1.1. Briefly answer these questions:

a. What is the causal link the paper is trying to reveal?

This paper seeks to identify the causal effect of a minimum wage increase on employment in the fast-food industry.

b. What would be the ideal experiment to test this causal link?

In the ideal experiment there would be a number of states with identical demographic characteristics and the same number of restaurants of the same size. The minimum wage would be raised in a randomly chosen subset of states and held constant in the rest. The employment effect would then be measured by comparing the change in employment in the treatment group (states with the increase) with the change in the control group. Such an experiment is not feasible in the real world.

c. What is the identification strategy?

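The authors conducted two waves of a survey covering 410 stores from four major fast-food chains in New Jersey and adjacent eastern Pennsylvania. The first wave was conducted shortly before New Jersey raised its minimum wage, and the second wave about eight months after the increase. The minimum wage in eastern Pennsylvania did not change, and the region shares very similar geographic and demographic attributes with NJ, so it serves as the control group. The effect is identified by comparing the change in employment at NJ stores with the change at PA stores.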

d. What are the assumptions / threats to this identification strategy?

The key assumption is that, because eastern Pennsylvania borders New Jersey and the two regions share highly similar characteristics, PA stores are a valid control group for NJ stores: absent the minimum wage increase, employment in the two regions would have followed the same trend. One threat to this identification strategy is a spillover effect: the higher minimum wage in NJ could motivate employees of fast-food restaurants in eastern PA to move to jobs at fast-food restaurants in NJ, which would contaminate the control group.

Part 2: Replication Analysis

a. Load data from Card and Krueger AER 1994

You can load it directly from my website here. Variable names are self-explanatory if you read the paper.

library(haven)
library(stargazer)
library(foreign)
library(kableExtra)
library(vtable)
library(dplyr)
library(reshape)

setwd("/Users/hunteryuan/Downloads/AAEC 8610/R Working Directory/HW5")

Fastfood <- read.csv("CardKrueger1994_fastfood.csv")
head(Fastfood)
##    id state emptot emptot2   demp chain bk kfc roys wendys wage_st wage_st2
## 1  46     0  40.50    24.0 -16.50     1  1   0    0      0      NA     4.30
## 2  49     0  13.75    11.5  -2.25     2  0   1    0      0      NA     4.45
## 3 506     0   8.50    10.5   2.00     2  0   1    0      0      NA     5.00
## 4  56     0  34.00    20.0 -14.00     4  0   0    0      1     5.0     5.25
## 5  61     0  24.00    35.5  11.50     4  0   0    0      1     5.5     4.75
## 6  62     0  20.50      NA     NA     4  0   0    0      1     5.0       NA

b. Verify that the data is correct. Reproduce the % of Burger King, KFC, Roys, and Wendys, as well as the FTE means in the 2 waves. (Note: This is just to force you to do a summary stats table with R. I used group_by then %>% then summarize. I’m sure some of you will find better ways to do it.)

# state is coded 0 = PA (control) and 1 = NJ (treatment)
Fastfood$state_name <- ifelse(Fastfood$state == 0, "PA", "NJ")

Table_2b <- t(Fastfood %>% 
              group_by(state_name) %>%
              summarise(bk_bar = mean(bk)*100,
                       kfc_bar = mean(kfc)*100,
                       wendys_bar = mean(wendys)*100,
                       roys_bar = mean(roys)*100,
                       emptot_bar = mean(emptot, na.rm = TRUE),
                       emptot2_bar = mean(emptot2, na.rm = TRUE)))

colnames(Table_2b) <- Table_2b[1, ]
Table_2b <- Table_2b[-1, ]

rownames(Table_2b) <- c("Burger King", "KFC","Wendy's","Roy Rogers","FTE Wave 1","FTE Wave 2")
Table_2b
##             NJ         PA        
## Burger King "41.08761" "44.30380"
## KFC         "20.54381" "15.18987"
## Wendy's     "13.59517" "18.98734"
## Roy Rogers  "24.77341" "21.51899"
## FTE Wave 1  "20.43941" "23.33117"
## FTE Wave 2  "21.02743" "21.16558"
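
The table above prints its numbers as quoted strings because `t()` is applied to a data frame that still contains the character column `state_name`, which turns the whole matrix into text. One possible way to keep the table numeric is sketched below in base R. The data frame `toy` and its values are hypothetical and are not the Card and Krueger sample; only the column names mirror the ones used above.

```r
# Toy data with made-up values (NOT the Card and Krueger sample)
toy <- data.frame(
  state_name = c("NJ", "NJ", "NJ", "PA", "PA"),
  bk         = c(1, 0, 1, 0, 1),        # Burger King dummy
  emptot     = c(20, NA, 30, 25, 15)    # FTE employment, wave 1
)

# Split the numeric columns by state and take column means.
# sapply() returns a numeric matrix: one row per statistic, one column per state.
vars <- c("bk", "emptot")
tab <- sapply(split(toy[vars], toy$state_name),
              function(d) colMeans(d, na.rm = TRUE))

# Express the chain share in percent, as in the table above
tab["bk", ] <- tab["bk", ] * 100

print(round(tab, 2))
```

Because the grouping variable never enters the matrix, the result can be rounded or passed to a table function without converting it back from text.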

c. Use a “first-differenced” OLS to obtain their Diff-in-diff estimator (almost – you won’t get it exactly). Comment on how your OLS compared to the DiD estimate in Table 3 of the paper.

dif_ols <- lm(demp ~ state, data = Fastfood)
stargazer(dif_ols, type = "text", title = "Table 3 Column (iii)", 
          align = TRUE,  
          dep.var.labels = c("Difference, NJ - PA"), covariate.labels = c("State"),
          keep.stat = c("n", "rsq", "adj.rsq"), omit = c("Constant"), 
          table.layout = "=d=t-s=n") 
## 
## Table 3 Column (iii)
## ========================================
##                  Difference, NJ - PA    
## ========================================
## State                  2.750**          
##                        (1.154)          
##                                         
## ----------------------------------------
## Observations             384            
## R2                      0.015           
## Adjusted R2             0.012           
## ========================================
## Note:        *p<0.1; **p<0.05; ***p<0.01


The result indicates that, after the minimum wage increase, NJ fast-food restaurants gained on average 2.75 full-time-equivalent (FTE) employees relative to PA fast-food restaurants. The coefficient estimate is statistically significant at the 5% level. The DiD estimate in Table 3 of the paper is a relative gain of 2.76 FTE employees, slightly higher than the first-differenced OLS estimate. One likely reason for the gap is the sample: the regression uses only the 384 stores with employment data in both waves.
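
As a small check on the logic, the slope in a regression of the change in employment on a 0/1 state dummy equals the difference between the two groups' mean changes. The sketch below illustrates this with a hypothetical data frame `toy`; its values are invented for the example and only the column names `state` and `demp` follow the data set above.

```r
# Toy first-differenced data with made-up values (NOT the real sample)
toy <- data.frame(
  state = c(1, 1, 1, 0, 0),       # 1 = treated (NJ), 0 = control (PA)
  demp  = c(2, -1, 5, -3, -1)     # change in FTE employment between waves
)

# First-differenced OLS: regress the change on the treatment dummy
fd <- lm(demp ~ state, data = toy)
slope <- unname(coef(fd)["state"])

# The same quantity computed directly from group means of the change
by_hand <- mean(toy$demp[toy$state == 1]) - mean(toy$demp[toy$state == 0])

print(c(ols = slope, by_hand = by_hand))
```

Rows with a missing `demp` would be dropped by `lm()`, which is why this approach uses only stores observed in both waves.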

Part 3: Alternative ways of running DiD

d. What would be the equation of a standard “difference in difference” regression? Just write down the equation and briefly explain each coefficient.

The equation of a standard “Difference-in-Difference” regression in the context of this paper would be:

\(Y_{i,t} = \beta_0 + \beta_1 State + \beta_2 Time_t + \beta_3 (State * Time_t) + \epsilon_{i,t}\)


where:

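\(Y_{i,t}\) is the outcome variable, FTE employment at restaurant i in wave t;

State is a binary dummy variable equal to 0 if the restaurant is in the control group (PA) and 1 if it is in the treatment group (NJ);

\(Time_t\) is a binary dummy variable equal to 0 for the first wave (before the minimum wage increase) and 1 for the second wave (after it);

State \(*\) \(Time_t\) is the interaction of the two dummies, equal to 1 only for NJ restaurants in the second wave;

\(\beta_0\) is mean employment in PA in the first wave, \(\beta_1\) is the NJ - PA gap in the first wave, \(\beta_2\) is the change in PA between the two waves, and \(\beta_3\) is the difference-in-differences estimate of the effect of the minimum wage increase;

\(\epsilon_{i,t}\) is the error term.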
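
To see why \(\beta_3\) is the difference-in-differences estimator, it may help to write out the four group means implied by the equation. The lines below assume that \(Time_t\) is coded 0 for the first wave and 1 for the second, and that the error term has mean zero within each state-wave cell:

\(E[Y_{i,t} \mid State = 0, Time_t = 0] = \beta_0\)

\(E[Y_{i,t} \mid State = 1, Time_t = 0] = \beta_0 + \beta_1\)

\(E[Y_{i,t} \mid State = 0, Time_t = 1] = \beta_0 + \beta_2\)

\(E[Y_{i,t} \mid State = 1, Time_t = 1] = \beta_0 + \beta_1 + \beta_2 + \beta_3\)

Subtracting the change in PA from the change in NJ then leaves only the interaction coefficient:

\([(\beta_0 + \beta_1 + \beta_2 + \beta_3) - (\beta_0 + \beta_1)] - [(\beta_0 + \beta_2) - \beta_0] = \beta_3\)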

e. Compute the difference-in-differences estimator “by hand”. Interpret the results in a couple of sentences.


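\(\Delta^{Wave1}_{NJ-PA} = 20.44 - 23.33 = -2.89\)

\(\Delta^{Wave2}_{NJ-PA} = 21.03 - 21.17 = -0.14\)

\(\beta_{DID} = \Delta^{Wave2}_{NJ-PA} - \Delta^{Wave1}_{NJ-PA} = -0.14 - (-2.89) = 2.75\)

Before the increase, NJ stores employed on average 2.89 fewer FTE workers than PA stores; after it, the gap had shrunk to 0.14. Relative to PA, average FTE employment per store in NJ therefore rose by about 2.75 after the minimum wage increase.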
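
The same arithmetic can be reproduced in R from the four FTE means reported in the summary table of Part 2, question b. This is only a sketch: the object names `means`, `gap`, and `did` are illustrative, and the numbers are typed in by hand rather than computed from the data.

```r
# FTE means copied from the summary table in Part 2, question b
means <- matrix(c(20.43941, 23.33117,    # wave 1: NJ, PA
                  21.02743, 21.16558),   # wave 2: NJ, PA
                nrow = 2, byrow = TRUE,
                dimnames = list(c("wave1", "wave2"), c("NJ", "PA")))

# NJ - PA gap within each wave
gap <- means[, "NJ"] - means[, "PA"]

# Difference-in-differences: wave 2 gap minus wave 1 gap
did <- unname(gap["wave2"] - gap["wave1"])

print(round(gap, 2))
print(round(did, 3))
```

Using the unrounded means avoids the small rounding error that comes from working with two-decimal figures.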

f. Run the regression you wrote up in part d. (Note: You will likely need to reshape your data to long form first) Comment on the results you obtain.

Fastfood_1 <- reshape(Fastfood, varying=c("emptot", "emptot2"), 
                      v.names=c("employ_tot"),
                      timevar = "time",
                      times=c("1", "2"),
                      idvar = c("id", "state"),
                      direction = "long")
head(Fastfood_1)
##          id state   demp chain bk kfc roys wendys wage_st wage_st2 state_name
## 46.0.1   46     0 -16.50     1  1   0    0      0      NA     4.30         PA
## 49.0.1   49     0  -2.25     2  0   1    0      0      NA     4.45         PA
## 506.0.1 506     0   2.00     2  0   1    0      0      NA     5.00         PA
## 56.0.1   56     0 -14.00     4  0   0    0      1     5.0     5.25         PA
## 61.0.1   61     0  11.50     4  0   0    0      1     5.5     4.75         PA
## 62.0.1   62     0     NA     4  0   0    0      1     5.0       NA         PA
##         time employ_tot
## 46.0.1     1      40.50
## 49.0.1     1      13.75
## 506.0.1    1       8.50
## 56.0.1     1      34.00
## 61.0.1     1      24.00
## 62.0.1     1      20.50
DID_3f <- lm(employ_tot ~ state + time + state*time, data = Fastfood_1)

stargazer(DID_3f, type="text", align=TRUE,
          title="Difference in Difference Regression", 
          dep.var.labels = "Difference, NJ - PA",
          covariate.labels=c("State", "Time", "State * Time"), 
          keep.stat = c("n", "rsq"), omit=c("Constant", "adj.rsq"), 
          table.layout = "=d=t-s=n")
## 
## Difference in Difference Regression
## ========================================
##                  Difference, NJ - PA    
## ========================================
## State                 -2.892**          
##                        (1.194)          
##                                         
## Time                   -2.166           
##                        (1.516)          
##                                         
## State * Time            2.754           
##                        (1.688)          
##                                         
## ----------------------------------------
## Observations             794            
## R2                      0.007           
## ========================================
## Note:        *p<0.1; **p<0.05; ***p<0.01


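The interaction coefficient (2.754) is slightly larger than the "first-differenced" OLS estimate from Part 2, question c (2.750), and closer to the DiD estimate in the paper (2.76). However, with a standard error of 1.688 the estimate is not statistically significant at conventional levels. The pooled regression uses all 794 store-wave observations and does not difference out store-specific effects, which likely explains the larger standard error.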
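
The link between the long-form regression and the by-hand calculation can be illustrated on a small example. The sketch below uses a hypothetical wide data frame `toy` with invented values, reshapes it with base R's `reshape()`, and checks that the interaction coefficient equals the double difference of cell means. The names `fte` and `post` are illustrative, and the wave indicator is coded 0/1 here rather than "1"/"2" as in the code above.

```r
# Toy wide-format data with made-up values (NOT the real sample)
toy <- data.frame(
  id      = 1:4,
  state   = c(1, 1, 0, 0),        # 1 = treated (NJ), 0 = control (PA)
  emptot  = c(10, 20, 30, 20),    # FTE employment, wave 1
  emptot2 = c(14, 22, 28, 18)     # FTE employment, wave 2
)

# Wide to long: one row per store and wave, with post = 0 (wave 1) or 1 (wave 2)
long <- reshape(toy,
                varying   = c("emptot", "emptot2"),
                v.names   = "fte",
                timevar   = "post",
                times     = c(0, 1),
                idvar     = "id",
                direction = "long")

# Standard DiD regression; state * post expands to both main effects
# plus the interaction term
fit <- lm(fte ~ state * post, data = long)
did_ols <- unname(coef(fit)["state:post"])

# Double difference of the four cell means
m <- tapply(long$fte, list(long$state, long$post), mean)
did_hand <- (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])

print(c(ols = did_ols, by_hand = did_hand))
```

In a balanced toy panel like this one the two numbers agree exactly; with missing observations in one wave, as in the real data, the pooled regression and the first-differenced regression can differ slightly because they use different samples.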