Using Panel Data: Difference-in-Differences Impact of Minimum Wages

Paper 1: David Card and Alan Krueger (1994), Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania, AER 84(4): 772-793.

Download and go over this seminal paper by David Card and Alan Krueger (1994), Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania, AER 84(4): 772-793. Be careful: they released a 2000 follow-up with the same title followed by “: Reply”. We want the original, not the follow-up.

Sub-Part 1.1. Answers to brief questions:

1.1.a. What is the causal link the paper is trying to reveal?

The paper aims to identify the causal effect of minimum wages on establishment-level employment. The authors compare employment in fast-food restaurants in New Jersey and in neighboring Pennsylvania (which had no similar policy change) before and after New Jersey raised its minimum wage from $4.25 to $5.05 per hour.

1.1.b. What would be the ideal experiment to test this causal link?

The ideal experiment would randomly assign minimum wage increases across otherwise comparable stores or labor markets and compare subsequent employment. The authors approximate this with a natural experiment: they collected data on fast-food restaurants in New Jersey, where the minimum wage increased, and in neighboring Pennsylvania, where it did not, which serves as the control group. Data on variables such as employment, wages, and prices were collected both before and after the increase in New Jersey, covering 410 fast-food restaurants in the two states. The first wave was collected between February 15 and March 4, 1992 (before the new minimum wage took effect), and the second between November 5 and December 31, 1992 (after it took effect).

1.1.c. What is the identification strategy?

The authors use a difference-in-differences approach as the identification strategy and estimate the impact at the store level. They compare employment in fast-food restaurants in New Jersey and Pennsylvania (the neighboring state without a similar policy change) before and after New Jersey raised its minimum wage from $4.25 to $5.05 per hour. In the regression model, the dependent variable is the change in employment at a given store from wave 1 to wave 2, and the explanatory variables are a set of store characteristics and a dummy variable equal to 1 for stores in New Jersey. A second equation has the same dependent variable but replaces the New Jersey dummy with an alternative measure of the minimum wage's impact at each store: the proportional wage increase needed to bring its starting wage up to the new minimum (the GAP variable).
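The GAP measure in the second equation can be built from the wave-1 starting wage. The following is a minimal R sketch, assuming the dataset's `state` column (1 = NJ, 0 = PA) and `wage_st` (wave-1 starting wage); the function name `compute_gap` is hypothetical. It sets GAP to 0 for Pennsylvania stores and for New Jersey stores already paying at least $5.05, and to (5.05 - w1) / w1 otherwise.

```r
# Hypothetical helper: GAP_i = 0 for PA stores, 0 for NJ stores already paying
# at least $5.05, and (5.05 - w1) / w1 for the remaining NJ stores.
# Assumes state = 1 for NJ, 0 for PA, and wage_st = wave-1 starting wage.
compute_gap <- function(state, wage_st, new_min = 5.05) {
  # pmax() floors the gap at zero; a missing NJ wage gives a missing GAP
  nj_gap <- pmax(0, (new_min - wage_st) / wage_st)
  ifelse(state == 0, 0, nj_gap)
}

# Toy example: one PA store and two NJ stores
compute_gap(state = c(0, 1, 1), wage_st = c(4.25, 4.25, 5.25))
```

With the real data, something like `df_minimumwage$gap <- compute_gap(df_minimumwage$state, df_minimumwage$wage_st)` followed by `lm(demp ~ gap, data = df_minimumwage)` would give a simplified GAP regression, without the store characteristics the paper also controls for.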

1.1.d. What are the assumptions / threats to this identification strategy?

The central threat to the difference-in-differences method is a violation of the parallel trends assumption: that employment in New Jersey and Pennsylvania would have followed the same time trend in the absence of the minimum wage increase.
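In potential-outcomes notation (introduced here for illustration), let \(FTEemployment_{i,t}(0)\) be the employment store i would have had in wave t without the minimum wage increase. The parallel trends assumption then reads:

\[ E\left[FTEemployment_{i,2}(0) - FTEemployment_{i,1}(0) \mid NJ\right] = E\left[FTEemployment_{i,2}(0) - FTEemployment_{i,1}(0) \mid PA\right] \]

Under this assumption, the observed change in Pennsylvania stands in for the unobserved change New Jersey would have experienced without the policy, so the DD estimate recovers the effect of the increase on New Jersey stores.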

This identification strategy also assumes that fast-food restaurants in Pennsylvania are a valid counterfactual for those in New Jersey. Most importantly, the degree of competition in the fast-food industry is assumed to be the same in the two states.

Sub-Part 1.2. Replication analysis:

1.2.a. Load Card and Krueger AER 1994 data.

# Load packages (install any that are missing)
knitr::opts_chunk$set(echo = TRUE, eval = TRUE, message = FALSE, warning = FALSE, fig.height = 4)
necessaryPackages <- c("foreign", "reshape", "rvest", "tidyverse", "dplyr", "stringr", "ggplot2", "stargazer", "readr")
new.packages <- necessaryPackages[
              !(necessaryPackages %in% installed.packages()[, "Package"])]
if (length(new.packages)) install.packages(new.packages)
# invisible() suppresses the printed list of TRUE/FALSE load results
invisible(lapply(necessaryPackages, require, character.only = TRUE))
# Load the Card and Krueger (1994) dataset (adjust the path to your local copy)
library(readr)
df_minimumwage <- read_csv("/Users/twinkleroy/Downloads/CardKrueger1994_fastfood.csv")
## Parsed with column specification:
## cols(
##   id = col_double(),
##   state = col_double(),
##   emptot = col_double(),
##   emptot2 = col_double(),
##   demp = col_double(),
##   chain = col_double(),
##   bk = col_double(),
##   kfc = col_double(),
##   roys = col_double(),
##   wendys = col_double(),
##   wage_st = col_double(),
##   wage_st2 = col_double()
## )
df_minimumwage
## # A tibble: 410 x 12
##       id state emptot emptot2   demp chain    bk   kfc  roys wendys wage_st
##    <dbl> <dbl>  <dbl>   <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>   <dbl>
##  1    46     0   40.5    24   -16.5      1     1     0     0      0   NA   
##  2    49     0   13.8    11.5  -2.25     2     0     1     0      0   NA   
##  3   506     0    8.5    10.5   2        2     0     1     0      0   NA   
##  4    56     0   34      20   -14        4     0     0     0      1    5   
##  5    61     0   24      35.5  11.5      4     0     0     0      1    5.5 
##  6    62     0   20.5    NA    NA        4     0     0     0      1    5   
##  7   445     0   70.5    29   -41.5      1     1     0     0      0    5   
##  8   451     0   23.5    36.5  13        1     1     0     0      0    5   
##  9   455     0   11      11     0        2     0     1     0      0    5.25
## 10   458     0    9       8.5  -0.5      2     0     1     0      0    5   
## # … with 400 more rows, and 1 more variable: wage_st2 <dbl>
head(df_minimumwage)
## # A tibble: 6 x 12
##      id state emptot emptot2   demp chain    bk   kfc  roys wendys wage_st
##   <dbl> <dbl>  <dbl>   <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>   <dbl>
## 1    46     0   40.5    24   -16.5      1     1     0     0      0    NA  
## 2    49     0   13.8    11.5  -2.25     2     0     1     0      0    NA  
## 3   506     0    8.5    10.5   2        2     0     1     0      0    NA  
## 4    56     0   34      20   -14        4     0     0     0      1     5  
## 5    61     0   24      35.5  11.5      4     0     0     0      1     5.5
## 6    62     0   20.5    NA    NA        4     0     0     0      1     5  
## # … with 1 more variable: wage_st2 <dbl>
summary(df_minimumwage)
##        id            state            emptot         emptot2     
##  Min.   :  1.0   Min.   :0.0000   Min.   : 5.00   Min.   : 0.00  
##  1st Qu.:119.2   1st Qu.:1.0000   1st Qu.:14.56   1st Qu.:14.50  
##  Median :237.5   Median :1.0000   Median :19.50   Median :20.50  
##  Mean   :246.5   Mean   :0.8073   Mean   :21.00   Mean   :21.05  
##  3rd Qu.:371.8   3rd Qu.:1.0000   3rd Qu.:24.50   3rd Qu.:26.50  
##  Max.   :522.0   Max.   :1.0000   Max.   :85.00   Max.   :60.50  
##                                   NA's   :12      NA's   :14     
##       demp               chain             bk              kfc        
##  Min.   :-41.50000   Min.   :1.000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.: -4.00000   1st Qu.:1.000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :  0.00000   Median :2.000   Median :0.0000   Median :0.0000  
##  Mean   : -0.07044   Mean   :2.117   Mean   :0.4171   Mean   :0.1951  
##  3rd Qu.:  4.00000   3rd Qu.:3.000   3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   : 34.00000   Max.   :4.000   Max.   :1.0000   Max.   :1.0000  
##  NA's   :26                                                           
##       roys            wendys          wage_st         wage_st2    
##  Min.   :0.0000   Min.   :0.0000   Min.   :4.250   Min.   :4.250  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:4.250   1st Qu.:5.050  
##  Median :0.0000   Median :0.0000   Median :4.500   Median :5.050  
##  Mean   :0.2415   Mean   :0.1463   Mean   :4.616   Mean   :4.996  
##  3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:4.950   3rd Qu.:5.050  
##  Max.   :1.0000   Max.   :1.0000   Max.   :5.750   Max.   :6.250  
##                                    NA's   :20      NA's   :21

1.2.b. Verify that the data are correct: reproduce the percentages of Burger King, KFC, Roy Rogers, and Wendy's stores, as well as the FTE employment means in the two waves. (Note: This is just to force you to do a summary stats table with R. I used group_by then %>% then summarize. I'm sure some of you will find better ways to do it.)

### Distribution of store types (percentages by state)

# Count stores of each chain by state (state: 0 = PA, 1 = NJ)
panel1Variables <- df_minimumwage %>% group_by(state) %>%
  summarize(a.BurgerKing = sum(bk), b.KFC = sum(kfc), c.RoyRogers = sum(roys),
            d.Wendys = sum(wendys))

# Drop the state column, transpose, and convert counts to column percentages
table2Panel1 <- as.matrix(panel1Variables)
table2Panel1 <- prop.table(t(table2Panel1[, -1]), margin = 2) * 100

colnames(table2Panel1) <- c("PA", "NJ")
print(table2Panel1)
##                    PA       NJ
## a.BurgerKing 44.30380 41.08761
## b.KFC        15.18987 20.54381
## c.RoyRogers  21.51899 24.77341
## d.Wendys     18.98734 13.59517
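Since each store belongs to exactly one of the four chains, the column percentages of the counts equal the state-wise means of the 0/1 chain dummies, which avoids the matrix and transpose steps. A minimal base-R sketch of that shortcut, assuming the `bk`, `kfc`, `roys`, `wendys`, and `state` columns shown above; `chain_shares` and the toy data are hypothetical:

```r
# Hypothetical helper: percentage of stores in each chain, by state.
# Assumes 0/1 dummies bk, kfc, roys, wendys and state = 0 (PA) / 1 (NJ).
chain_shares <- function(df) {
  dummies <- c("bk", "kfc", "roys", "wendys")
  # split() gives one data frame per state ("0", "1"); colMeans of a 0/1
  # dummy is the share of stores with that chain
  shares <- sapply(split(df[dummies], df$state), colMeans) * 100
  colnames(shares) <- ifelse(colnames(shares) == "1", "NJ", "PA")
  shares
}

# Toy example with two PA stores and two NJ stores
toy <- data.frame(state = c(0, 0, 1, 1), bk = c(1, 0, 1, 1),
                  kfc = c(0, 1, 0, 0), roys = 0, wendys = 0)
chain_shares(toy)
```

Applied to `df_minimumwage`, this should reproduce the percentages above, with the dummy names as row labels.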
# FTE employment (means by wave and state)

# Mean FTE employment in each wave, using all stores with data in that wave
panel2Variables <- df_minimumwage %>% group_by(state) %>%
  summarize(FTEemploymentWave1 = mean(emptot, na.rm = TRUE),
            FTEemploymentWave2 = mean(emptot2, na.rm = TRUE))

# Drop the state column and arrange waves as rows, states as columns
table2Panel2 <- as.matrix(panel2Variables)
table2Panel2 <- table2Panel2[, -1]

colnames(table2Panel2) <- c("FTE employment (Wave 1)", "FTE employment (Wave 2)")
rownames(table2Panel2) <- c("PA", "NJ")
table2Panel2 <- t(table2Panel2)
print(table2Panel2)
##                               PA       NJ
## FTE employment (Wave 1) 23.33117 20.43941
## FTE employment (Wave 2) 21.16558 21.02743
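These means already give a simple DD estimate, computed on all stores with data in each wave (so the two waves cover slightly different sets of stores):

\[ DD = (21.02743 - 21.16558) - (20.43941 - 23.33117) = -0.13815 - (-2.89176) = 2.75361 \]

This is close to, but not identical to, the OLS estimate of 2.75, because the OLS regression uses only stores with employment data in both waves.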

1.2.c. Use OLS to obtain their diff-in-diff estimator (almost: you won't get it exactly). Comment on how your OLS estimate compares to the DiD estimate in Table 3 of the paper.

To compute this estimate, stores with a missing value of the change variable demp (that is, without employment data in both waves) are dropped, which leaves a balanced sample of stores.

dataNodempNA <- df_minimumwage[complete.cases(df_minimumwage$demp),]
olsmodel <- lm(demp ~ state, dataNodempNA)
stargazer(olsmodel, type = "html")
Dependent variable: demp
state                  2.750** (1.154)
Constant              -2.283** (1.036)
Observations           384
R2                     0.015
Adjusted R2            0.012
Residual Std. Error    8.968 (df = 382)
F Statistic            5.675** (df = 1; 382)
Note: *p<0.1; **p<0.05; ***p<0.01

Using FTE employment means by wave and state computed on the balanced sample (the 384 stores with non-missing demp), the difference-in-differences estimate (DD) can be calculated in the following steps:

  • Wave 2 difference: D2 = 20.89725 - 21.09667 = -0.19942 is the post-policy difference in mean FTE employment between the treated state (NJ) and the control state (PA);

  • Wave 1 difference: D1 = 20.43058 - 23.38000 = -2.94942 is the pre-policy difference in mean FTE employment between the treated state (NJ) and the control state (PA);

  • Difference-in-differences: DD = D2 - D1 = -0.19942 - (-2.94942) = 2.75, which equals the OLS coefficient on state.
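The steps above can be sketched in R as follows. This is a minimal example, assuming the dataset's `emptot`, `emptot2`, and `state` columns; `dd_from_means` and the toy data are hypothetical. It restricts to stores observed in both waves, computes D2 and D1 from the state means, and compares D2 - D1 with the OLS coefficient on `state` from a regression of the change on the NJ indicator.

```r
# Hypothetical helper: DD from group means on the balanced sample
dd_from_means <- function(df) {
  # Balanced sample: stores with employment in both waves
  bal <- df[!is.na(df$emptot) & !is.na(df$emptot2), ]
  nj <- bal$state == 1
  d2 <- mean(bal$emptot2[nj]) - mean(bal$emptot2[!nj])  # wave 2: NJ - PA
  d1 <- mean(bal$emptot[nj]) - mean(bal$emptot[!nj])    # wave 1: NJ - PA
  d2 - d1
}

# Toy data: two PA stores, two NJ stores, and one NJ store missing wave 1
toy <- data.frame(state   = c(0, 0, 1, 1, 1),
                  emptot  = c(20, 30, 20, 10, NA),
                  emptot2 = c(18, 26, 21, 12, 40))
toy$demp <- toy$emptot2 - toy$emptot

dd <- dd_from_means(toy)
# lm() drops the row with a missing demp by default (na.action = na.omit)
ols <- unname(coef(lm(demp ~ state, data = toy))["state"])
c(dd = dd, ols = ols)
```

Both numbers agree because, with a single 0/1 regressor, the OLS slope is the NJ mean of demp minus the PA mean of demp, which is algebraically D2 - D1 on the same stores.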

The OLS coefficient on state means that, after the minimum wage increase, New Jersey stores gained 2.75 FTE employees relative to Pennsylvania stores (the difference-in-differences of the change in employment for the subset of stores with employment data in both waves). Since the estimate is positive, at face value the policy change did not lead employers to cut employment. Whether this impact is statistically significant depends on its standard error; the OLS standard error of 1.154 makes the estimate significant at the 5% level. The point estimate matches the balanced-sample DD estimate of 2.75 in Table 3 of the paper, but I did not get a standard error close to the one reported there.
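Table 3 reports the difference between two state means with its own standard error. If that standard error is computed from the separate state variances, it will generally differ from the classical OLS one, which pools a single residual variance across both states; this is one plausible reason for the mismatch. A minimal R sketch comparing the two formulas, assuming a change vector `demp` and a 0/1 `state` indicator; `dd_standard_errors` and the toy numbers are hypothetical:

```r
# Hypothetical helper: classical OLS SE vs. unequal-variance SE of the
# NJ - PA difference in mean employment changes
dd_standard_errors <- function(demp, state) {
  keep <- !is.na(demp)
  demp <- demp[keep]
  state <- state[keep]
  # Classical OLS SE: one residual variance pooled across both states
  fit <- lm(demp ~ state)
  ols_se <- summary(fit)$coefficients["state", "Std. Error"]
  # Difference-in-means SE: each state keeps its own sample variance
  g1 <- demp[state == 1]
  g0 <- demp[state == 0]
  unequal_se <- sqrt(var(g1) / length(g1) + var(g0) / length(g0))
  c(ols = ols_se, unequal_variance = unequal_se)
}

# Toy data: three PA stores with volatile changes, five NJ stores with stable ones
se <- dd_standard_errors(demp  = c(-2, -4, 0, 1, 2, 3, 2, 2),
                         state = c(0, 0, 0, 1, 1, 1, 1, 1))
se
```

In this toy example the smaller group (PA) has the larger variance, so the unequal-variance SE (about 1.20) exceeds the pooled OLS SE (about 0.94). With the real data the call would be `dd_standard_errors(dataNodempNA$demp, dataNodempNA$state)`.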

1.2.d. What would be the equation of a standard "difference-in-differences" regression? Just write down the equation.

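The OLS regression used in 1.2.c regresses the store-level change in FTE employment between the two waves on the NJ indicator:

\[ \Delta FTEemployment_{i} = FTEemployment_{i,2} - FTEemployment_{i,1} = \gamma + \delta\, state_{i} + u_{i}, \]

where state equals 1 for stores in NJ and 0 for stores in PA. Differencing Equation (1) below across the two waves gives exactly this regression (the terms \(\alpha\) and \(\beta\, state_{i}\) drop out), so \(\delta\) here is the same DD coefficient, estimated at 2.75 in the balanced sample.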

The standard DiD regression in levels is Equation (1), where the restaurant is indexed by i, the period by t, state is the indicator for NJ, post is the indicator for the post-policy period (wave 2), and interaction = state × post. The coefficient of interest is \(\delta\), the DD estimate:

\[\begin{equation} \tag{1} FTEemployment_{i,t}=\alpha + \beta state_{i} + \gamma post_{i,t} + \delta interaction_{i,t} + \varepsilon_{i,t}. \end{equation}\]
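Equation (1) can be estimated by stacking the two waves into a long panel. A minimal R sketch, assuming the dataset's `id`, `state`, `emptot`, and `emptot2` columns; `did_levels` and the toy data are hypothetical. On the balanced sample, the coefficient on the interaction reproduces the DD from the first-difference regression.

```r
# Hypothetical helper: estimate Equation (1) on a stacked (long) panel
did_levels <- function(df) {
  # Keep the balanced sample so both waves cover the same stores
  bal <- df[!is.na(df$emptot) & !is.na(df$emptot2), ]
  n <- nrow(bal)
  long <- data.frame(
    id    = rep(bal$id, 2),
    state = rep(bal$state, 2),
    post  = rep(c(0, 1), each = n),  # 0 = wave 1, 1 = wave 2
    fte   = c(bal$emptot, bal$emptot2)
  )
  long$interaction <- long$state * long$post
  lm(fte ~ state + post + interaction, data = long)
}

# Toy data: two PA stores, two NJ stores, one NJ store missing wave 1
toy <- data.frame(id      = 1:5,
                  state   = c(0, 0, 1, 1, 1),
                  emptot  = c(20, 30, 20, 10, NA),
                  emptot2 = c(18, 26, 21, 12, 40))
fit <- did_levels(toy)
coef(fit)  # alpha, beta, gamma, delta in the order of Equation (1)
```

The point estimate of \(\delta\) matches the first-difference regression, but the standard errors of this stacked version are not comparable, because it treats the two observations of the same store as independent; clustering by `id` would be one way to account for that.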