Download and go over this seminal paper by David Card and Alan Krueger: Card and Krueger (1994), “Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania,” AER 84(4): 772-793. Be careful: they released a 2000 follow-up with the exact same title followed by “: Reply”. We want the original, not the follow-up.
1.1.a. What is the causal link the paper is trying to reveal?
The paper aims to reveal the causal effect of a minimum wage increase on establishment-level employment. The authors compare employment levels in fast-food restaurants in New Jersey and Pennsylvania (a neighboring state without a similar policy change) before and after New Jersey's minimum wage rose from $4.25 to $5.05 per hour.
1.1.b. What would be the ideal experiment to test this causal link?
The ideal experiment would randomly assign the minimum wage increase across otherwise comparable fast-food restaurants and compare the resulting employment changes. Because that is not feasible, the authors approximate it with a natural experiment: they collect data on fast-food restaurants in New Jersey, where the minimum wage increased, and use restaurants in a neighboring state (Pennsylvania) with no increase as the control group. Data on several variables (such as wages and store prices) were collected before and after the increase in New Jersey, covering 410 fast-food restaurants in New Jersey and Pennsylvania. The first wave was collected between February 15 and March 4, 1992 (before the increase), and the second wave between November 5 and December 31, 1992 (after the increase).
1.1.c. What is the identification strategy?
The authors use a difference-in-differences approach as the identification strategy and estimate the impact at the store level. They compare employment levels in fast-food restaurants in New Jersey and Pennsylvania (a neighboring state without a similar policy change) before and after the minimum wage rises from $4.25 to $5.05 per hour in New Jersey. The dependent variable is the change in employment from wave 1 to wave 2 at a given store, and the explanatory variables are a set of store characteristics and a dummy variable that equals 1 for stores in New Jersey. A second equation has the same dependent variable; its explanatory variables are the set of store characteristics and an alternative measure of the impact of the minimum wage at a given store.
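As a rough illustration of what a store-level impact measure could look like, the sketch below computes the proportional wage increase a store would need to reach the new minimum. To the best of my recollection this matches the wage-gap variable in the paper, but treat the exact definition as an assumption. The data frame `stores`, the constant `new_min`, and the column `gap` are hypothetical; `state` and `wage_st` follow the column names in the assignment data.

```r
# Hypothetical toy data: three NJ stores (state = 1) and one PA store (state = 0)
stores <- data.frame(
  state   = c(1, 1, 1, 0),
  wage_st = c(4.25, 4.75, 5.25, 4.25)  # starting wage in wave 1
)
new_min <- 5.05  # new New Jersey minimum wage

# Assumed definition: zero for PA stores and for NJ stores already paying
# at least new_min; otherwise the proportional raise needed to reach new_min
stores$gap <- ifelse(stores$state == 1 & stores$wage_st < new_min,
                     (new_min - stores$wage_st) / stores$wage_st,
                     0)
print(stores)
```

With real data, NJ stores with a missing `wage_st` would get a missing `gap`, so they would drop out of any regression that uses it.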
1.1.d. What are the assumptions / threats to this identification strategy?
The difference-in-differences method rests on the parallel trends assumption: employment levels in New Jersey and Pennsylvania would have followed the same time trend in the absence of the minimum wage change. Any state-specific shock between the two waves is therefore a threat to identification.
This identification strategy also assumes that fast-food restaurants in Pennsylvania are a valid counterfactual for those in New Jersey. Most importantly, the degree of competition in the fast-food industry is assumed to be the same in the two states.
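One way to state the trend assumption formally is sketched below. The notation is mine, not the paper's: \(Y_{i,t}(0)\) denotes FTE employment at store i in wave t if the minimum wage had not increased.

\[ E\left[ Y_{i,2}(0) - Y_{i,1}(0) \mid NJ \right] = E\left[ Y_{i,2}(0) - Y_{i,1}(0) \mid PA \right] \]

The left-hand side is never observed, since New Jersey stores did experience the increase in wave 2; the assumption lets the observed Pennsylvania change stand in for it.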
1.2.a. Load the Card and Krueger AER 1994 data.
# Loading packages
knitr::opts_chunk$set(echo = TRUE, eval=TRUE, message=FALSE, warning=FALSE, fig.height=4)
necessaryPackages <- c("foreign","reshape","rvest","tidyverse","dplyr","stringr","ggplot2", "stargazer","readr")
new.packages <- necessaryPackages[
!(necessaryPackages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
lapply(necessaryPackages, require, character.only = TRUE)
## Loading required package: foreign
## Loading required package: reshape
## Loading required package: rvest
## Loading required package: xml2
## Loading required package: tidyverse
## ── Attaching packages ────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0 ✓ purrr 0.3.3
## ✓ tibble 3.0.0 ✓ dplyr 1.0.2
## ✓ tidyr 1.0.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ───────────────────────────────────────── tidyverse_conflicts() ──
## x tidyr::expand() masks reshape::expand()
## x dplyr::filter() masks stats::filter()
## x readr::guess_encoding() masks rvest::guess_encoding()
## x dplyr::lag() masks stats::lag()
## x purrr::pluck() masks rvest::pluck()
## x dplyr::rename() masks reshape::rename()
## Loading required package: stargazer
##
## Please cite as:
## Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.2. https://CRAN.R-project.org/package=stargazer
## [[1]]
## [1] TRUE
##
## [[2]]
## [1] TRUE
##
## [[3]]
## [1] TRUE
##
## [[4]]
## [1] TRUE
##
## [[5]]
## [1] TRUE
##
## [[6]]
## [1] TRUE
##
## [[7]]
## [1] TRUE
##
## [[8]]
## [1] TRUE
##
## [[9]]
## [1] TRUE
# Load dataset for Card and Krueger
library(readr)
df_minimumwage <- read_csv("/Users/twinkleroy/Downloads/CardKrueger1994_fastfood.csv")
## Parsed with column specification:
## cols(
## id = col_double(),
## state = col_double(),
## emptot = col_double(),
## emptot2 = col_double(),
## demp = col_double(),
## chain = col_double(),
## bk = col_double(),
## kfc = col_double(),
## roys = col_double(),
## wendys = col_double(),
## wage_st = col_double(),
## wage_st2 = col_double()
## )
df_minimumwage
## # A tibble: 410 x 12
## id state emptot emptot2 demp chain bk kfc roys wendys wage_st
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 46 0 40.5 24 -16.5 1 1 0 0 0 NA
## 2 49 0 13.8 11.5 -2.25 2 0 1 0 0 NA
## 3 506 0 8.5 10.5 2 2 0 1 0 0 NA
## 4 56 0 34 20 -14 4 0 0 0 1 5
## 5 61 0 24 35.5 11.5 4 0 0 0 1 5.5
## 6 62 0 20.5 NA NA 4 0 0 0 1 5
## 7 445 0 70.5 29 -41.5 1 1 0 0 0 5
## 8 451 0 23.5 36.5 13 1 1 0 0 0 5
## 9 455 0 11 11 0 2 0 1 0 0 5.25
## 10 458 0 9 8.5 -0.5 2 0 1 0 0 5
## # … with 400 more rows, and 1 more variable: wage_st2 <dbl>
head(df_minimumwage)
## # A tibble: 6 x 12
## id state emptot emptot2 demp chain bk kfc roys wendys wage_st
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 46 0 40.5 24 -16.5 1 1 0 0 0 NA
## 2 49 0 13.8 11.5 -2.25 2 0 1 0 0 NA
## 3 506 0 8.5 10.5 2 2 0 1 0 0 NA
## 4 56 0 34 20 -14 4 0 0 0 1 5
## 5 61 0 24 35.5 11.5 4 0 0 0 1 5.5
## 6 62 0 20.5 NA NA 4 0 0 0 1 5
## # … with 1 more variable: wage_st2 <dbl>
summary(df_minimumwage)
## id state emptot emptot2
## Min. : 1.0 Min. :0.0000 Min. : 5.00 Min. : 0.00
## 1st Qu.:119.2 1st Qu.:1.0000 1st Qu.:14.56 1st Qu.:14.50
## Median :237.5 Median :1.0000 Median :19.50 Median :20.50
## Mean :246.5 Mean :0.8073 Mean :21.00 Mean :21.05
## 3rd Qu.:371.8 3rd Qu.:1.0000 3rd Qu.:24.50 3rd Qu.:26.50
## Max. :522.0 Max. :1.0000 Max. :85.00 Max. :60.50
## NA's :12 NA's :14
## demp chain bk kfc
## Min. :-41.50000 Min. :1.000 Min. :0.0000 Min. :0.0000
## 1st Qu.: -4.00000 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:0.0000
## Median : 0.00000 Median :2.000 Median :0.0000 Median :0.0000
## Mean : -0.07044 Mean :2.117 Mean :0.4171 Mean :0.1951
## 3rd Qu.: 4.00000 3rd Qu.:3.000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. : 34.00000 Max. :4.000 Max. :1.0000 Max. :1.0000
## NA's :26
## roys wendys wage_st wage_st2
## Min. :0.0000 Min. :0.0000 Min. :4.250 Min. :4.250
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:4.250 1st Qu.:5.050
## Median :0.0000 Median :0.0000 Median :4.500 Median :5.050
## Mean :0.2415 Mean :0.1463 Mean :4.616 Mean :4.996
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:4.950 3rd Qu.:5.050
## Max. :1.0000 Max. :1.0000 Max. :5.750 Max. :6.250
## NA's :20 NA's :21
1.2.b. Verify that the data is correct. Reproduce the % of Burger King, KFC, Roys, and Wendys, as well as the FTE means in the 2 waves. (Note: This is just to force you to do a summary stats table with R. I used group_by then %>% then summarize. I’m sure some of you will find better ways to do it.)
## Distribution of store types (percentages by state)
panel1Variables <- df_minimumwage %>% group_by(state) %>%
  summarize(a.BurgerKing = sum(bk), b.KFC = sum(kfc),
            c.RoyRogers = sum(roys), d.Wendys = sum(wendys))
table2Panel1 <- as.matrix(panel1Variables)
table2Panel1 <- prop.table(t(table2Panel1[,-1]), margin=2)*100
colnames(table2Panel1) <- c("PA", "NJ")
print(table2Panel1)
## PA NJ
## a.BurgerKing 44.30380 41.08761
## b.KFC 15.18987 20.54381
## c.RoyRogers 21.51899 24.77341
## d.Wendys 18.98734 13.59517
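Because each store belongs to exactly one chain, the mean of each chain dummy within a state is already that chain's share, so the same percentages can be obtained without building counts first. The following is a minimal base-R sketch on a hypothetical toy data frame (`toy`, `chains`, and `shares` are illustrative names; the column names mirror the assignment data).

```r
# Hypothetical toy data with the same column names as the assignment data
toy <- data.frame(
  state  = c(0, 0, 1, 1, 1, 1),
  bk     = c(1, 0, 1, 1, 0, 0),
  kfc    = c(0, 1, 0, 0, 1, 0),
  roys   = c(0, 0, 0, 0, 0, 1),
  wendys = c(0, 0, 0, 0, 0, 0)
)
chains <- c("bk", "kfc", "roys", "wendys")

# Split the dummy columns by state, then take column means within each state
shares <- sapply(split(toy[chains], toy$state), colMeans) * 100
colnames(shares) <- c("PA", "NJ")  # state 0 = PA, state 1 = NJ
print(shares)
```

Each column of `shares` sums to 100 as long as every store is flagged for exactly one chain.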
##FTE employment (means by wave and state)
panel2Variables <- df_minimumwage %>% group_by(state) %>%
summarize(FTEemploymentWave1= mean(emptot, na.rm = TRUE),
FTEemploymentWave2 = mean(emptot2,na.rm = TRUE))
table2Panel2 <- as.matrix(panel2Variables)
table2Panel2 <- table2Panel2[,-1]
colnames(table2Panel2) <- c("FTE employment (Wave 1)", "FTE employment (Wave 2)")
rownames(table2Panel2) <- c("PA", "NJ")
table2Panel2 <- t(table2Panel2)
print(table2Panel2)
## PA NJ
## FTE employment (Wave 1) 23.33117 20.43941
## FTE employment (Wave 2) 21.16558 21.02743
1.2.c. Use OLS to obtain their Diff-in-diff estimator (almost – you won’t get it exactly). Comment on how your OLS compared to the DiD estimate in Table 3 of the paper.
To compute this estimate on a consistent sample, stores with a missing value of the difference variable “demp” are discarded.
dataNodempNA <- df_minimumwage[complete.cases(df_minimumwage$demp),]
olsmodel <- lm(demp ~ state, dataNodempNA)
stargazer(olsmodel, type = "html")
| | Dependent variable: demp |
| state | 2.750** (1.154) |
| Constant | -2.283** (1.036) |
| Observations | 384 |
| R2 | 0.015 |
| Adjusted R2 | 0.012 |
| Residual Std. Error | 8.968 (df = 382) |
| F Statistic | 5.675** (df = 1; 382) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 (standard errors in parentheses) |
Using the FTE employment means by wave and state for the stores with non-missing “demp” (the sample used in the regression, which is why these means differ slightly from the full-sample table above), the difference-in-differences estimate (DD) can be calculated in the following steps:
Wave 2 Difference: D2=(20.89725-21.09667) is the post-policy change difference in FTE employment means between the treated state (NJ) and the control state (PA);
Wave 1 Difference: D1=(20.43058-23.38000) is the pre-policy change difference in FTE employment means between the treated state (NJ) and the control state (PA);
Difference-in-differences: DD=D2-D1=(-0.19942)-(-2.94942)=2.75.
The coefficient on state reproduces this value: the relative gain in employment for New Jersey stores compared with Pennsylvania stores after the minimum wage increase (the “difference-in-differences” of the changes in employment for the subset of stores with available employment data in wave 1 and wave 2) is 2.75 FTE employees. Since the number is positive, at face value the policy change did not lead employers to cut employment. The standard error is needed to judge whether the estimated impact is statistically significant; the OLS result shows that it is significant at the 5% level. However, the OLS standard error (1.154) is not close to the one reported in the paper.
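The arithmetic can be checked directly from the four means quoted in the steps above. This is only a quick sanity check; the object names (`fte`, `d1`, `d2`, `dd`, `change`) are illustrative.

```r
# FTE employment means quoted above, by state (rows) and wave (columns)
fte <- matrix(c(23.38000, 21.09667,
                20.43058, 20.89725),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("PA", "NJ"), c("wave1", "wave2")))

d1 <- fte["NJ", "wave1"] - fte["PA", "wave1"]  # pre-policy NJ minus PA
d2 <- fte["NJ", "wave2"] - fte["PA", "wave2"]  # post-policy NJ minus PA
dd <- d2 - d1                                  # difference-in-differences

# Within-state changes: the PA change should match the regression constant
change <- fte[, "wave2"] - fte[, "wave1"]

print(round(c(D1 = d1, D2 = d2, DD = dd), 3))
print(round(change, 3))
```

The PA change reproduces the intercept of the regression and DD reproduces the coefficient on state, which is what a regression of a change on a single dummy should deliver.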
1.2.d. What would be the equation of a standard “difference in difference” regression? Just write down the equation.
The between-state difference in mean FTE employment in wave t is
\[ \Delta FTEemployment_{t}= \overline{FTEemployment}_{NJ,t}-\overline{FTEemployment}_{PA,t}, \]
and the difference-in-differences estimate DD is the wave-2 difference minus the wave-1 difference.
The OLS regression above, which regresses the store-level change “demp” on state (the indicator for NJ), yields the same DD estimate as the coefficient on state.
The standard DiD regression is shown in Equation (1), where the restaurant is indexed by i, the period is indexed by t, post is the indicator for the post-policy change period, and the interaction term is the product of state and post. The coefficient of interest is \(\delta\).
\[\begin{equation} \tag{1} FTEemployment_{i,t}=\alpha + \beta state_{i} + \gamma post_{t} + \delta (state_{i} \times post_{t}) + \varepsilon_{i,t}. \end{equation}\]
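To see how Equation (1) relates to the regression of “demp” on state, the sketch below simulates a small balanced panel and fits both. Everything here is simulated and hypothetical (the sample size, the built-in effect of 1.5, and the object names); the column names only mirror the assignment data. On a balanced panel the interaction coefficient and the first-difference coefficient coincide, although their standard errors generally do not.

```r
set.seed(1)
n <- 40  # hypothetical number of stores, half in each state

# Simulated wide data: one row per store, both waves observed
wide <- data.frame(id = 1:n, state = rep(c(0, 1), each = n / 2))
wide$emptot  <- 20 + rnorm(n, sd = 5)
wide$emptot2 <- wide$emptot + 1.5 * wide$state + rnorm(n, sd = 3)
wide$demp    <- wide$emptot2 - wide$emptot

# Long data: one row per store and wave, with a post indicator
long <- rbind(
  data.frame(id = wide$id, state = wide$state, post = 0, fte = wide$emptot),
  data.frame(id = wide$id, state = wide$state, post = 1, fte = wide$emptot2)
)

did <- lm(fte ~ state + post + state:post, data = long)  # Equation (1)
fd  <- lm(demp ~ state, data = wide)                     # first-difference form

delta_hat <- unname(coef(did)["state:post"])
b_hat     <- unname(coef(fd)["state"])
print(c(delta_hat = delta_hat, b_hat = b_hat))
```

With missing waves, as in the actual data, the two only agree once the sample is restricted to stores observed in both waves.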