Using Panel Data: Difference-in-Differences Impact of Minimum Wages

Card and Krueger (1994), “Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania,” AER 84(4): 772-793.

Part 1: Reading and questions

Briefly answer these questions:

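a. What is the causal link the paper is trying to reveal?
The paper tries to identify the causal effect of a minimum wage increase on establishment-level employment in fast-food restaurants, using New Jersey's April 1992 increase from $4.25 to $5.05 per hour and comparing stores in New Jersey with stores in eastern Pennsylvania.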

b. What would be the ideal experiment to test this causal link?
The ideal experiment would randomly assign a higher minimum wage to some stores and not to others, and then compare employment, wages, and prices at the two groups of stores before and after the increase. Within New Jersey, comparisons could also be made between initially high-wage stores and other stores.

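c. What is the identification strategy?
The identification strategy is difference-in-differences: the change in employment at New Jersey stores before and after the increase in New Jersey’s minimum wage is compared with the change at Pennsylvania stores over the same period.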

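d. What are the assumptions/threats to this identification strategy?
The paper uses stores in eastern Pennsylvania as a control group. The key assumption is that, without the minimum wage increase, employment at New Jersey and Pennsylvania stores would have followed parallel trends. One threat to the difference-in-differences estimate is a change in the sample between the two survey waves (for example, stores that close or cannot be re-interviewed), but the paper was able to address this threat.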

Part 2: Replication Analysis

a. Load data from Card and Krueger AER 1994.

#Load dataset
df <- read.csv("CardKrueger1994_fastfood.csv")
head(df)
##    id state emptot emptot2   demp chain bk kfc roys wendys wage_st wage_st2
## 1  46     0  40.50    24.0 -16.50     1  1   0    0      0      NA     4.30
## 2  49     0  13.75    11.5  -2.25     2  0   1    0      0      NA     4.45
## 3 506     0   8.50    10.5   2.00     2  0   1    0      0      NA     5.00
## 4  56     0  34.00    20.0 -14.00     4  0   0    0      1     5.0     5.25
## 5  61     0  24.00    35.5  11.50     4  0   0    0      1     5.5     4.75
## 6  62     0  20.50      NA     NA     4  0   0    0      1     5.0       NA
summary(df)
##        id            state            emptot         emptot2     
##  Min.   :  1.0   Min.   :0.0000   Min.   : 5.00   Min.   : 0.00  
##  1st Qu.:119.2   1st Qu.:1.0000   1st Qu.:14.56   1st Qu.:14.50  
##  Median :237.5   Median :1.0000   Median :19.50   Median :20.50  
##  Mean   :246.5   Mean   :0.8073   Mean   :21.00   Mean   :21.05  
##  3rd Qu.:371.8   3rd Qu.:1.0000   3rd Qu.:24.50   3rd Qu.:26.50  
##  Max.   :522.0   Max.   :1.0000   Max.   :85.00   Max.   :60.50  
##                                   NA's   :12      NA's   :14     
##       demp               chain             bk              kfc        
##  Min.   :-41.50000   Min.   :1.000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.: -4.00000   1st Qu.:1.000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :  0.00000   Median :2.000   Median :0.0000   Median :0.0000  
##  Mean   : -0.07044   Mean   :2.117   Mean   :0.4171   Mean   :0.1951  
##  3rd Qu.:  4.00000   3rd Qu.:3.000   3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   : 34.00000   Max.   :4.000   Max.   :1.0000   Max.   :1.0000  
##  NA's   :26                                                           
##       roys            wendys          wage_st         wage_st2    
##  Min.   :0.0000   Min.   :0.0000   Min.   :4.250   Min.   :4.250  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:4.250   1st Qu.:5.050  
##  Median :0.0000   Median :0.0000   Median :4.500   Median :5.050  
##  Mean   :0.2415   Mean   :0.1463   Mean   :4.616   Mean   :4.996  
##  3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:4.950   3rd Qu.:5.050  
##  Max.   :1.0000   Max.   :1.0000   Max.   :5.750   Max.   :6.250  
##                                    NA's   :20      NA's   :21

b. Verify that the data is correct.

#Data verification
library(dplyr)
by_store_fte <- t(df %>%            # transpose so that states become columns
  group_by(state) %>%               # group by state (0 = PA, 1 = NJ)
  summarise(across(c(bk, kfc, roys, wendys, emptot, emptot2),
                   list(mean = ~ mean(.x, na.rm = TRUE)))))   # averages by state, ignoring NAs
colnames(by_store_fte) <- c("PA", "NJ")                   # name the columns
by_store_fte1 <- round(by_store_fte[1:5, ], 3) * 100      # state row + chain shares, as percentages
by_store_fte2 <- round(by_store_fte[6:7, ], 2)            # FTE means, rounded to 2 decimals
newCount <- rbind(by_store_fte1, by_store_fte2)           # combine the two blocks by row
newCount <- newCount[-1, ]                                # drop the leftover state row
rownames(newCount) <- c("Burger King", "KFC", "Roy Rogers", "Wendy's",
                            "FTE employment1", "FTE employment2")
newCount
##                    PA    NJ
## Burger King     44.30 41.10
## KFC             15.20 20.50
## Roy Rogers      21.50 24.80
## Wendy's         19.00 13.60
## FTE employment1 23.33 20.44
## FTE employment2 21.17 21.03

c. Use OLS to obtain their difference-in-differences estimator.

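#OLS
library(stargazer)   # needed for the regression table below
modols <- lm(demp ~ state, data = df)   # regress the change in FTE employment on the NJ dummy
stargazer(modols, type = "text", title = "TABLE 3 output from OLS", align = TRUE,
          keep.stat = c("n","rsq"), dep.var.labels = c("Difference, NJ - PA"),
          covariate.labels = c("State"))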
## 
## TABLE 3 output from OLS
## ========================================
##                  Dependent variable:    
##              ---------------------------
##                  Difference, NJ - PA    
## ----------------------------------------
## State                  2.750**          
##                        (1.154)          
##                                         
## Constant              -2.283**          
##                        (1.036)          
##                                         
## ----------------------------------------
## Observations             384            
## R2                      0.015           
## ========================================
## Note:        *p<0.1; **p<0.05; ***p<0.01

The estimate \(\hat{\beta} = 2.75\) is significant at the 5% level and is almost equal to the paper's estimate. The positive sign indicates that, after the minimum wage increase, full-time-equivalent (FTE) employment in New Jersey rose relative to Pennsylvania.
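To see why the coefficient on state in this regression is a difference-in-differences estimate, here is a minimal sketch on made-up data. The data frame `toy` and its values are hypothetical and are not taken from the Card and Krueger file. With a single 0/1 regressor, the OLS slope equals the difference between the two group means of the outcome, which here is the mean change in employment in NJ minus the mean change in PA.

```r
# Hypothetical toy data: change in FTE employment (demp) for 6 stores
toy <- data.frame(
  state = c(0, 0, 0, 1, 1, 1),     # 0 = PA, 1 = NJ
  demp  = c(-3, -1, -2, 1, 0, 2)   # made-up changes, wave 2 minus wave 1
)

# OLS of the change in employment on the state dummy
fit <- lm(demp ~ state, data = toy)
beta_state <- unname(coef(fit)["state"])

# Difference in mean changes, NJ minus PA, computed without a regression
mean_change <- tapply(toy$demp, toy$state, mean)
did_means <- unname(mean_change["1"] - mean_change["0"])

# The two numbers agree up to floating-point error
all.equal(beta_state, did_means)
```

The same logic applies to `modols`: its State coefficient is the average `demp` in NJ minus the average `demp` in PA among stores with non-missing `demp`.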

d. What would be the equation of a standard “difference in difference” regression?
\[FTE_{i,t} = \alpha + \beta_1\, state_i + \beta_2\, time_t + \beta_3\, (state_i \times time_t) + \epsilon_{i,t}\]
where \(state_i\) is a group dummy (NJ vs. PA), \(time_t\) is a period dummy (before vs. after), and \(\beta_3\), the coefficient on the interaction, is the difference-in-differences estimate.
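As a short worked step, the interaction coefficient can be read off the four group means implied by this regression. This is a sketch that assumes the error term has mean zero within each state-period cell; the labels PA, NJ, before, and after stand for the values of the two dummies.

\[
\begin{aligned}
E[FTE \mid PA,\ before] &= \alpha \\
E[FTE \mid NJ,\ before] &= \alpha + \beta_1 \\
E[FTE \mid PA,\ after] &= \alpha + \beta_2 \\
E[FTE \mid NJ,\ after] &= \alpha + \beta_1 + \beta_2 + \beta_3
\end{aligned}
\]

Subtracting the change in PA from the change in NJ gives

\[
\big(E[FTE \mid NJ,\ after] - E[FTE \mid NJ,\ before]\big) - \big(E[FTE \mid PA,\ after] - E[FTE \mid PA,\ before]\big) = (\beta_2 + \beta_3) - \beta_2 = \beta_3 .
\]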

Part 3: Optional Questions

e. Run the regression you wrote up in part d.

# Reshape the data from wide to long: stack wave 1 on top of wave 2
totEmp <- c(df$emptot, df$emptot2)
time <- rep(c(0, 1), each = nrow(df)) # 1 after treatment, 0 otherwise
totState <- rep(df$state, 2)          # each store keeps its state in both waves

#create new dataset
new_df <- data.frame(emptot = totEmp, state = totState, time = time)

#Run DiD model
modDiD <- lm(emptot ~ state + time + state*time, data = new_df)
stargazer(modDiD, type = "text", title = "TABLE 3 output from Difference-In-Differences",
          align = TRUE, keep.stat = c("n","rsq"),
          dep.var.labels = c("Difference, NJ - PA"), 
          covariate.labels =      c("State", "Treatment time"))
## 
## TABLE 3 output from Difference-In-Differences
## ==========================================
##                    Dependent variable:    
##                ---------------------------
##                    Difference, NJ - PA    
## ------------------------------------------
## State                   -2.892**          
##                          (1.194)          
##                                           
## Treatment time           -2.166           
##                          (1.516)          
##                                           
## state:time                2.754           
##                          (1.688)          
##                                           
## Constant                23.331***         
##                          (1.072)          
##                                           
## ------------------------------------------
## Observations               794            
## R2                        0.007           
## ==========================================
## Note:          *p<0.1; **p<0.05; ***p<0.01

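The estimate (\(\hat{\beta}_3 = 2.75\)) is similar to that of the paper; however, it is not statistically significant at conventional levels. A likely reason is that the pooled regression does not use the within-store pairing of observations that the first-difference regression in part c exploits, so its standard error is larger (1.688 versus 1.154).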
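The contrast between the pooled regression and the first-difference regression from part c can be illustrated with a small sketch. The data below are hypothetical (the data frames `wide` and `long` and all values are made up), and the panel is balanced, unlike the real sample, where the two regressions use 794 and 384 observations. On a balanced panel the two approaches give the same point estimate, but the pooled standard error is larger when stores differ persistently in their employment levels.

```r
# Hypothetical balanced panel: 6 stores observed before and after
wide <- data.frame(
  state   = c(0, 0, 0, 1, 1, 1),        # 0 = PA, 1 = NJ
  emptot  = c(30, 20, 10, 28, 18, 8),   # made-up FTE employment, wave 1
  emptot2 = c(27, 19, 8, 29, 18, 10)    # made-up FTE employment, wave 2
)

# First-difference regression: change in employment on the state dummy
fd <- lm(I(emptot2 - emptot) ~ state, data = wide)

# Pooled regression on the stacked (long) data with an interaction term
long <- data.frame(
  emptot = c(wide$emptot, wide$emptot2),
  state  = rep(wide$state, 2),
  time   = rep(c(0, 1), each = nrow(wide))
)
pooled <- lm(emptot ~ state * time, data = long)

# Point estimates and standard errors from the two approaches
b_fd      <- unname(coef(fd)["state"])
b_pooled  <- unname(coef(pooled)["state:time"])
se_fd     <- coef(summary(fd))["state", "Std. Error"]
se_pooled <- coef(summary(pooled))["state:time", "Std. Error"]

c(first_difference = b_fd, pooled = b_pooled)    # same estimate
c(first_difference = se_fd, pooled = se_pooled)  # pooled SE is larger here
```

In the toy data the store-level differences in employment are large relative to the changes over time, which is what drives the gap between the two standard errors.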

f. Compute the difference-in-differences estimator “by hand”. Don’t use a regression.

newCount    #using FTEmeans from part2(b)
##                    PA    NJ
## Burger King     44.30 41.10
## KFC             15.20 20.50
## Roy Rogers      21.50 24.80
## Wendy's         19.00 13.60
## FTE employment1 23.33 20.44
## FTE employment2 21.17 21.03
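FTE <- newCount[5:6,]
diffMean <- FTE[, 1] - FTE[, 2]     # PA - NJ gap in each wave
#Results = 2.89 (before), 0.14 (after)
DiD <- unname(diffMean[1] - diffMean[2])   # change in the gap = difference-in-differences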
DiD
## [1] 2.75
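The same calculation can also be written as "change in NJ minus change in PA", which matches the usual definition of the estimator more directly. The sketch below types in the four rounded FTE means from the table above by hand; the object names `fte`, `change`, and `did` are only illustrative.

```r
# Rounded FTE means from the table above (rows: wave, columns: state)
fte <- matrix(
  c(23.33, 21.17,    # PA: before, after
    20.44, 21.03),   # NJ: before, after
  nrow = 2,
  dimnames = list(c("before", "after"), c("PA", "NJ"))
)

# Change over time within each state
change <- fte["after", ] - fte["before", ]

# Difference-in-differences: change in NJ minus change in PA
did <- unname(change["NJ"] - change["PA"])
round(did, 2)
```

Both orderings of the subtraction give the same number, because (PA - NJ) before minus (PA - NJ) after rearranges to (NJ after - NJ before) minus (PA after - PA before).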