Using Panel data: Difference-in-Differences Impact of Minimum Wages

Part 1: Reading and questions

Download and go over this seminal paper by David Card and Alan Krueger: Card and Krueger (1994), “Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania”, AER 84(4): 772-793. Be careful: they released a follow-up in 2000 with the exact same title followed by “: Reply”. We want the original, not the follow-up.

Briefly answer these questions:

c. What is the identification strategy?

The authors use a difference-in-differences (DID) design. They surveyed 410 fast-food restaurants (Burger King, KFC, Wendy’s, and Roy Rogers) in New Jersey and eastern Pennsylvania by telephone, before and after New Jersey raised its minimum wage. The survey has two waves: the first about a month before the increase and the second about eight months after it. New Jersey stores are the treated group and eastern Pennsylvania stores are the comparison group, so the DID estimate of the employment effect is the change in employment in New Jersey minus the change in eastern Pennsylvania.

d. What are the assumptions / threats to this identification strategy?

  • Fast-food employment is a good proxy for low-wage employment: the authors argue that fast-food restaurants are a “leading employer of low-wage workers”, that they comply with minimum-wage regulations and therefore respond to an increase, that employment there can be measured reliably, and that a sampling frame is easy to build.

  • No anticipation effect: the minimum wage increase does not affect New Jersey stores before it actually takes effect, so first-wave outcomes are untreated.

  • Parallel trends: without the increase, the employment gap between New Jersey and eastern Pennsylvania would have stayed the same, that is, employment in the two states would have followed the same trend.
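
The parallel-trends assumption can be written more formally. The sketch below uses potential-outcomes notation that is not in the answer above: \(Y_{it}(0)\) is an assumed symbol for FTE employment at store \(i\) in wave \(t\) if the minimum wage had not been raised.

\[ E\left[Y_{i,\text{after}}(0) - Y_{i,\text{before}}(0) \mid \text{NJ}\right] = E\left[Y_{i,\text{after}}(0) - Y_{i,\text{before}}(0) \mid \text{PA}\right] \]

The left-hand side is never observed, because New Jersey was in fact treated in the second wave. The assumption therefore cannot be tested directly with two waves of data.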

Part 2: Replication Analysis

a. Load data from Card and Krueger AER 1994

library(tidyverse)
fastfood <-read.csv("hw5/CardKrueger1994_fastfood.csv",header=TRUE)
head(fastfood)
##    id state emptot emptot2   demp chain bk kfc roys wendys wage_st wage_st2
## 1  46     0  40.50    24.0 -16.50     1  1   0    0      0      NA     4.30
## 2  49     0  13.75    11.5  -2.25     2  0   1    0      0      NA     4.45
## 3 506     0   8.50    10.5   2.00     2  0   1    0      0      NA     5.00
## 4  56     0  34.00    20.0 -14.00     4  0   0    0      1     5.0     5.25
## 5  61     0  24.00    35.5  11.50     4  0   0    0      1     5.5     4.75
## 6  62     0  20.50      NA     NA     4  0   0    0      1     5.0       NA

b. Verify that the data is correct. Reproduce the % of Burger King, KFC, Roys, and Wendys, as well as the FTE means in the 2 waves.

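# the % of Burger King, KFC, Roys, and Wendys within each state
# (dplyr is already attached by library(tidyverse) above)
d <- fastfood %>%
  group_by(state) %>%
  summarize(Burger_King = sum(bk),
            KFC = sum(kfc),
            Roys = sum(roys),
            Wendys = sum(wendys))
d <- as.matrix(d)
d <- d[, -1]  # drop the state column, keep the store counts
rownames(d) <- c("eastern Pennsylvania", "New Jersey")  # state 0 = PA, state 1 = NJ
d / rowSums(d) * 100  # counts to row percentages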
##                      Burger_King      KFC     Roys   Wendys
## eastern Pennsylvania    44.30380 15.18987 21.51899 18.98734
## New Jersey              41.08761 20.54381 24.77341 13.59517
# the FTE means in the 2 waves
d1 <- fastfood %>% group_by(state) %>% summarize(First_Wave=mean(emptot,na.rm=TRUE),Second_Wave=mean(emptot2,na.rm=TRUE))
d1 <- as.matrix(d1)    
d1 <- d1[,-1]
rownames(d1) <- c( "eastern Pennsylvania", "New Jersey")
d1
##                      First_Wave Second_Wave
## eastern Pennsylvania   23.33117    21.16558
## New Jersey             20.43941    21.02743

c. Use a “first-differenced” OLS to obtain their Diff-in-diff estimator (almost – you won’t get it exactly). Comment on how your OLS compares to the DiD estimate in Table 3 of the paper.

g <-  lm(demp~state,data = fastfood)
library(stargazer)
stargazer(g,  
          type="text",
          title = "Table 3")
## 
## Table 3
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                demp            
## -----------------------------------------------
## state                         2.750**          
##                               (1.154)          
##                                                
## Constant                     -2.283**          
##                               (1.036)          
##                                                
## -----------------------------------------------
## Observations                    384            
## R2                             0.015           
## Adjusted R2                    0.012           
## Residual Std. Error      8.968 (df = 382)      
## F Statistic            5.675** (df = 1; 382)   
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

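The OLS coefficient on state (2.75) is very close to the DID estimate in Table 3 of the paper (2.76). The two most likely differ because demp is defined only for the 384 stores with employment data in both waves, whereas a difference of wave means uses every store available in each wave.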
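
The following minimal sketch illustrates why the first-differenced regression returns a DID estimate. With a single 0/1 regressor, the OLS slope equals the difference between the two group means of the outcome. The data frame `toy` and its values are hypothetical and are not taken from the Card and Krueger sample.

```r
# Toy data (hypothetical values): change in FTE employment per store
toy <- data.frame(
  state = c(0, 0, 0, 1, 1, 1, 1),    # 0 = comparison group, 1 = treated group
  demp  = c(-3, -1, -2, 1, 0, 2, -1)
)

# First-differenced OLS: regress the change on the treatment-group dummy
fit <- lm(demp ~ state, data = toy)

# The same quantity by hand: mean change in the treated group
# minus mean change in the comparison group
mean_change  <- tapply(toy$demp, toy$state, mean)
did_by_means <- unname(mean_change["1"] - mean_change["0"])

coef(fit)[["state"]]
did_by_means
all.equal(coef(fit)[["state"]], did_by_means)
```

The intercept of this regression is the mean change in the comparison group, which is how the Constant row in the table above should be read.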

Part 3: Alternative ways of running DiD

d. What would be the equation of a standard “difference in difference” regression? Just write down the equation and briefly explain each coefficient.

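\[ y_{it} = \beta_0 + \beta_1 state_i + \beta_2 D_t + \beta_3 state_i \times D_t + \epsilon_{it}\] where \(y_{it}\) is FTE employment at store \(i\) in wave \(t\), \(state_i\) is 1 for New Jersey stores and 0 for eastern Pennsylvania stores, and \(D_t\) is a period dummy equal to 0 in the first wave (before the increase) and 1 in the second wave (after it). \(\beta_0\) is mean employment in Pennsylvania before the increase, \(\beta_1\) is the pre-increase gap between New Jersey and Pennsylvania, \(\beta_2\) is the change over time in Pennsylvania, and \(\beta_3\) is the difference-in-differences estimate of the effect of the minimum wage increase.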
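
To see why the interaction coefficient is the DID estimate, take expectations of the equation in each state-period cell. This short derivation assumes \(state_i = 1\) for New Jersey, \(D_t = 1\) in the second wave, and \(E[\epsilon_{it} \mid state_i, D_t] = 0\).

\[ \begin{aligned} E[y_{it} \mid state_i = 0, D_t = 0] &= \beta_0 \\ E[y_{it} \mid state_i = 0, D_t = 1] &= \beta_0 + \beta_2 \\ E[y_{it} \mid state_i = 1, D_t = 0] &= \beta_0 + \beta_1 \\ E[y_{it} \mid state_i = 1, D_t = 1] &= \beta_0 + \beta_1 + \beta_2 + \beta_3 \end{aligned} \]

The change in New Jersey minus the change in Pennsylvania is then

\[ \left[(\beta_0 + \beta_1 + \beta_2 + \beta_3) - (\beta_0 + \beta_1)\right] - \left[(\beta_0 + \beta_2) - \beta_0\right] = \beta_3 \]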

e. Compute the difference-in-differences estimator “by hand”

I calculate the difference-in-differences estimator from the wave means in part 2b. Each entry in the Diff column is FTE_after minus FTE_before, and each entry in the Diff row is NJ minus PA.

library(knitr)
did <- data.frame(
  Name = c("PA", "NJ", "Diff"),
  FTE_before = c(23.33, 20.44, -2.89),
  FTE_after = c(21.17, 21.03, -0.14),
  Diff = c(-2.16, 0.59, 2.75)
)
kable(did, caption = "DID calculation")
DID calculation
Name  FTE_before  FTE_after   Diff
PA         23.33      21.17  -2.16
NJ         20.44      21.03   0.59
Diff       -2.89      -0.14   2.75

Then I calculate the DID estimator as below, where \(state = 1\) denotes New Jersey and \(state = 0\) denotes eastern Pennsylvania: \[ \begin{aligned} ATT &= \left(E[Y_{i,\text{after}} \mid state = 1] - E[Y_{i,\text{after}} \mid state = 0]\right) - \left(E[Y_{i,\text{before}} \mid state = 1] - E[Y_{i,\text{before}} \mid state = 0]\right) = -0.14 - (-2.89) = 2.75 \\ \text{or:} \quad ATT &= \left(E[Y_{i,\text{after}} \mid state = 1] - E[Y_{i,\text{before}} \mid state = 1]\right) - \left(E[Y_{i,\text{after}} \mid state = 0] - E[Y_{i,\text{before}} \mid state = 0]\right) = 0.59 - (-2.16) = 2.75 \end{aligned} \]

The DID estimator shows how much more FTE employment changed in New Jersey than in eastern Pennsylvania after the minimum wage increase: about 2.75 FTE per store.
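
The same calculation can be done in R instead of typing the differences by hand, which avoids arithmetic slips. This is a minimal sketch that copies the unrounded wave means from the output in part 2b. The object names `fte`, `change`, and `did_hat` are illustrative.

```r
# Mean FTE per store by state and wave, copied from the part 2b output
fte <- matrix(c(23.33117, 21.16558,
                20.43941, 21.02743),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("PA", "NJ"), c("before", "after")))

# Within-state change between the two waves
change <- fte[, "after"] - fte[, "before"]

# Difference-in-differences: change in NJ minus change in PA
did_hat <- unname(change["NJ"] - change["PA"])

change
round(did_hat, 2)
```

Because it starts from unrounded means, intermediate values can differ from the hand-rounded table in the second decimal place.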

f. Run the regression you wrote up in part d

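# reshape the data from wide to long (reshape() is in base R, so no extra package is needed)
fastfood2 <- reshape(fastfood,
                     idvar = c("id", "state"),
                     sep = "",
                     timevar = "D",
                     direction = "long",
                     varying = rbind(c("emptot", "emptot2"),
                                     c("wage_st", "wage_st2")))
# note: reshape() codes D as 1 (first wave) and 2 (second wave), not 0/1, so the
# intercept and the state coefficient below refer to a hypothetical D = 0;
# the D and state:D coefficients are unaffected by this coding
# regression
g2 <- lm(emptot ~ state * D, data = fastfood2)
summary(g2)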
## 
## Call:
## lm(formula = emptot ~ state * D, data = fastfood2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -21.166  -6.439  -1.027   4.473  64.561 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   25.497      2.397  10.638   <2e-16 ***
## state         -5.645      2.669  -2.115   0.0347 *  
## D             -2.166      1.516  -1.429   0.1535    
## state:D        2.754      1.688   1.631   0.1033    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.406 on 790 degrees of freedom
##   (26 observations deleted due to missingness)
## Multiple R-squared:  0.007401,   Adjusted R-squared:  0.003632 
## F-statistic: 1.964 on 3 and 790 DF,  p-value: 0.118
library(stargazer)
stargazer(g2,  
          type="text")
## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                               emptot           
## -----------------------------------------------
## state                        -5.645**          
##                               (2.669)          
##                                                
## D                             -2.166           
##                               (1.516)          
##                                                
## state:D                        2.754           
##                               (1.688)          
##                                                
## Constant                     25.497***         
##                               (2.397)          
##                                                
## -----------------------------------------------
## Observations                    794            
## R2                             0.007           
## Adjusted R2                    0.004           
## Residual Std. Error      9.406 (df = 790)      
## F Statistic             1.964 (df = 3; 790)    
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01
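
The interaction coefficient (2.754) matches the by-hand DID estimate, but the intercept (25.497) and the state coefficient (-5.645) do not match the first-wave means. One way to see why is to fit the same saturated model to the four cell means from part 2b under two codings of the time variable. This is a sketch: `cells`, `wave`, and `post` are illustrative names, and only point estimates are compared, not standard errors.

```r
# Cell means of FTE employment, copied from the part 2b output
cells <- data.frame(
  state = c(0, 0, 1, 1),    # 0 = PA, 1 = NJ
  wave  = c(1, 2, 1, 2),    # time coded 1/2, as in the regression above
  fte   = c(23.33117, 21.16558, 20.43941, 21.02743)
)
cells$post <- cells$wave - 1  # time recoded as a 0/1 dummy

# With four cells and four coefficients the model fits the means exactly
b_wave <- coef(lm(fte ~ state * wave, data = cells))
b_post <- coef(lm(fte ~ state * post, data = cells))

out <- rbind(wave_1_2 = b_wave, post_0_1 = b_post)
colnames(out) <- c("Intercept", "state", "time", "state:time")
round(out, 3)
```

Under the 1/2 coding the intercept and the state coefficient are extrapolated to a wave 0 that does not exist. Under the 0/1 coding they equal the first-wave Pennsylvania mean and the first-wave gap between New Jersey and Pennsylvania. The time and interaction coefficients are the same under both codings.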