Card and Krueger (1994) estimated the effect of a higher minimum wage on employment by surveying 410 fast-food restaurants in New Jersey and Pennsylvania.
The ideal comparison is between employment, wages, and prices at fast-food stores in New Jersey (where the minimum wage rose) and in Pennsylvania (where it did not) before and after the increase. Alternatively, comparing stores within New Jersey that were already paying high wages with other New Jersey stores would also estimate the impact of the new law.
Difference-in-differences is the identification strategy: the authors constructed a sample frame of fast-food restaurants, conducted a telephone survey before the scheduled increase in New Jersey’s minimum wage, and then conducted a second survey after the increase took effect.
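In this design the difference-in-differences estimator is the change in average full-time-equivalent (FTE) employment in New Jersey minus the corresponding change in Pennsylvania:
\[
\hat{\delta} = \left(\bar{Y}_{NJ,\,after} - \bar{Y}_{NJ,\,before}\right) - \left(\bar{Y}_{PA,\,after} - \bar{Y}_{PA,\,before}\right),
\]
where \(\bar{Y}\) denotes mean FTE employment in the indicated state and survey wave.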
# Read csv file
my_data <- read.csv("CardKrueger1994_fastfood.csv")
head(my_data)
## id state emptot emptot2 demp chain bk kfc roys wendys wage_st wage_st2
## 1 46 0 40.50 24.0 -16.50 1 1 0 0 0 NA 4.30
## 2 49 0 13.75 11.5 -2.25 2 0 1 0 0 NA 4.45
## 3 506 0 8.50 10.5 2.00 2 0 1 0 0 NA 5.00
## 4 56 0 34.00 20.0 -14.00 4 0 0 0 1 5.0 5.25
## 5 61 0 24.00 35.5 11.50 4 0 0 0 1 5.5 4.75
## 6 62 0 20.50 NA NA 4 0 0 0 1 5.0 NA
summary(my_data) # look at the data summary
## id state emptot emptot2
## Min. : 1.0 Min. :0.0000 Min. : 5.00 Min. : 0.00
## 1st Qu.:119.2 1st Qu.:1.0000 1st Qu.:14.56 1st Qu.:14.50
## Median :237.5 Median :1.0000 Median :19.50 Median :20.50
## Mean :246.5 Mean :0.8073 Mean :21.00 Mean :21.05
## 3rd Qu.:371.8 3rd Qu.:1.0000 3rd Qu.:24.50 3rd Qu.:26.50
## Max. :522.0 Max. :1.0000 Max. :85.00 Max. :60.50
## NA's :12 NA's :14
## demp chain bk kfc
## Min. :-41.50000 Min. :1.000 Min. :0.0000 Min. :0.0000
## 1st Qu.: -4.00000 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:0.0000
## Median : 0.00000 Median :2.000 Median :0.0000 Median :0.0000
## Mean : -0.07044 Mean :2.117 Mean :0.4171 Mean :0.1951
## 3rd Qu.: 4.00000 3rd Qu.:3.000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. : 34.00000 Max. :4.000 Max. :1.0000 Max. :1.0000
## NA's :26
## roys wendys wage_st wage_st2
## Min. :0.0000 Min. :0.0000 Min. :4.250 Min. :4.250
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:4.250 1st Qu.:5.050
## Median :0.0000 Median :0.0000 Median :4.500 Median :5.050
## Mean :0.2415 Mean :0.1463 Mean :4.616 Mean :4.996
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:4.950 3rd Qu.:5.050
## Max. :1.0000 Max. :1.0000 Max. :5.750 Max. :6.250
## NA's :20 NA's :21
# Percentage calculation: Use dplyr package
library(dplyr)
stores <- t(my_data %>% # transpose so store types appear as rows and states as columns
  group_by(state) %>% # group by state (0 = PA, 1 = NJ)
  summarise(across(c(bk, kfc, roys, wendys), list(mean = mean)))) # mean of each chain dummy = share of stores in that chain
colnames(stores) <- c("PA", "NJ") # provide column names
stores <- round(stores[-1,], 3) * 100 # drop the state row and convert shares to percentages
rownames(stores) <- c("Burger King", "KFC", "Roy Rogers", "Wendy's") # change row names
stores
## PA NJ
## Burger King 44.3 41.1
## KFC 15.2 20.5
## Roy Rogers 21.5 24.8
## Wendy's 19.0 13.6
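As a quick check that the mix of chains is comparable across the two states, one could test whether the store-type distribution differs by state. A minimal sketch, assuming the chain variable codes 1 = Burger King, 2 = KFC, 3 = Roy Rogers, 4 = Wendy's (consistent with the head() output above):
# Chi-squared test of chain composition by state (assumed coding: 1 = BK, 2 = KFC, 3 = Roy Rogers, 4 = Wendy's)
chisq.test(table(my_data$chain, my_data$state))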
# Mean of FTE
fte <- t(my_data %>% # transpose so the FTE measures appear as rows and states as columns
  group_by(state) %>% # group by state (0 = PA, 1 = NJ)
  summarise(across(c(emptot, emptot2), list(mean = ~ mean(.x, na.rm = TRUE))))) # mean FTE employment in each wave, ignoring NAs
colnames(fte) <- c("PA", "NJ") # provide column names
fte <- round(fte[-1,], 1) # drop the state row and round to one decimal place
rownames(fte) <- c("FTE employment1", "FTE employment2") # change row names
fte
## PA NJ
## FTE employment1 23.3 20.4
## FTE employment2 21.2 21.0
# OLS estimation to obtain DiD estimator
# Regress difference variable with state
mod <- lm(demp ~ state, data = my_data)
# Create a table with stargazer package
library(stargazer)
stargazer(mod, type = "text", title = "TABLE 3 output from OLS",
          align = TRUE, keep.stat = c("n", "rsq"),
          dep.var.labels = c("Difference, NJ - PA"),
          covariate.labels = c("State")) # display only sample size and R-squared
##
## TABLE 3 output from OLS
## ========================================
## Dependent variable:
## ---------------------------
## Difference, NJ - PA
## ----------------------------------------
## State 2.750**
## (1.154)
##
## Constant -2.283**
## (1.036)
##
## ----------------------------------------
## Observations 384
## R2 0.015
## ========================================
## Note: *p<0.1; **p<0.05; ***p<0.01
The output shows that \(\hat{\beta} = 2.75\), which is statistically significant at the 5% level and close to the estimate reported in Table 3 of the article. In other words, full-time-equivalent employment rose in New Jersey relative to Pennsylvania after the minimum-wage increase.
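Because state is a binary regressor, the slope in this regression is simply the difference between the average employment change in New Jersey and in Pennsylvania. A quick sketch to verify this by hand (it should reproduce the State coefficient of roughly 2.75):
# Difference in mean employment change, NJ minus PA, among stores with non-missing demp
nj_change <- mean(my_data$demp[my_data$state == 1], na.rm = TRUE) # average change in NJ
pa_change <- mean(my_data$demp[my_data$state == 0], na.rm = TRUE) # average change in PA
nj_change - pa_change # should equal the State coefficient above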
We need to create a dummy variable indicating the treatment period (0 before the minimum-wage increase, 1 after), after which the model can be written as:
\[
FTEemployment_{i,t} = \alpha + \beta \, state_i + \gamma \, time_t + \delta \, (state_i \times time_t) + \epsilon_{i,t}
\]
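The interaction coefficient \(\delta\) is the difference-in-differences parameter: under this model, the change in expected FTE employment in New Jersey minus the change in Pennsylvania is
\[
\big(E[FTE \mid state{=}1, time{=}1] - E[FTE \mid state{=}1, time{=}0]\big) - \big(E[FTE \mid state{=}0, time{=}1] - E[FTE \mid state{=}0, time{=}0]\big) = (\gamma + \delta) - \gamma = \delta.
\]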
# DiD regression
# Reshape data first using the melt function from the reshape package
library(reshape)
emptot <- melt(cbind(my_data$emptot, my_data$emptot2)) # stack wave-1 and wave-2 FTE employment into long format
state <- melt(cbind(my_data$state, my_data$state)) # repeat the state indicator for both waves
time <- c(rep(0, length(my_data$emptot)), rep(1, length(my_data$emptot2))) #0 for before treatment and 1 for after treatment
# Create new data
my_newdata <- data.frame(cbind(emptot[,3], state[,3], time)) # create new data frame
# Give variable names to the data frame
colnames(my_newdata) <- c("emptot", "state", "time") # renaming the column names
# Run the DiD model for this new dataset
mod1 <- lm(emptot ~ state * time, data = my_newdata) # state * time expands to state + time + state:time (the interaction term)
stargazer(mod1, type = "text", title = "TABLE 3 output from Difference-In-Differences",
          align = TRUE, keep.stat = c("n", "rsq"),
          dep.var.labels = c("FTE employment"),
          covariate.labels = c("State", "Treatment time", "State x Time")) # display only sample size and R-squared
##
## TABLE 3 output from Difference-In-Differences
## ==========================================
## Dependent variable:
## ---------------------------
## FTE employment
## ------------------------------------------
## State -2.892**
## (1.194)
##
## Treatment time -2.166
## (1.516)
##
## State x Time 2.754
## (1.688)
##
## Constant 23.331***
## (1.072)
##
## ------------------------------------------
## Observations 794
## R2 0.007
## ==========================================
## Note: *p<0.1; **p<0.05; ***p<0.01
The output shows \(\hat{\delta} = 2.75\), which matches the estimate reported in the article; however, it is not statistically significant here, since the standard error (1.688) is larger than in the first-difference regression above.
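The same long-format data can also be built without the older reshape package. A minimal sketch using tidyr::pivot_longer (assuming the column names shown above), which should give the same estimates as mod1:
library(tidyr)
my_newdata2 <- my_data %>%
  select(id, state, emptot, emptot2) %>% # keep the id, state, and both employment waves
  pivot_longer(c(emptot, emptot2), names_to = "wave", values_to = "fte") %>% # stack the two waves
  mutate(time = ifelse(wave == "emptot2", 1, 0)) # 0 = before the increase, 1 = after
mod2 <- lm(fte ~ state * time, data = my_newdata2) # same specification as mod1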
To compute the DiD estimate by hand, I first reused the code from part b, which gives the mean FTE employment in wave 1 and wave 2 for each state. I then took the difference between the two states' means within each wave, and finally the difference between those two gaps, which is the parameter of interest.
# DiD by hand (without regression)
# Mean of FTE
fte <- t(my_data %>% # transpose so the FTE measures appear as rows and states as columns
  group_by(state) %>% # group by state (0 = PA, 1 = NJ)
  summarise(across(c(emptot, emptot2), list(mean = ~ mean(.x, na.rm = TRUE))))) # mean FTE employment in each wave, ignoring NAs
colnames(fte) <- c("PA", "NJ") # provide column names
fte <- round(fte[-1,], 2)
rownames(fte) <- c("FTEemployment1", "FTEemployment2") # change row names
fte
## PA NJ
## FTEemployment1 23.33 20.44
## FTEemployment2 21.17 21.03
# Difference between PA and NJ within each wave
MeanDiff <- fte[, 1] - fte[, 2]
# Then the difference between the wave-1 and wave-2 gaps
DiD <- unname(MeanDiff[1] - MeanDiff[2]) # (PA - NJ in wave 1) minus (PA - NJ in wave 2) = 2.89 - 0.14
DiD
## [1] 2.75
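In terms of the wave means above, this is
\[
\hat{\delta} = (23.33 - 20.44) - (21.17 - 21.03) = 2.89 - 0.14 = 2.75,
\]
which equals the change in New Jersey minus the change in Pennsylvania, \((21.03 - 20.44) - (21.17 - 23.33) = 2.75\).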
The hand-computed estimate is 2.75, which matches the regression estimate above and the value reported in the article.