Using Panel Data: Difference-in-Differences Impact of Minimum Wages

Based on Card and Krueger (1994), “Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania”

Part 1: Reading and questions

a. What is the causal link the paper is trying to reveal?

That between increasing minimum wages and employment in the fast food industry.

b. What would be the ideal experiment to test this link?

Ideally, we would want two identical states with identical working populations, where movement between the states is not possible. We would then increase the minimum wage in one state while holding it constant in the other, and observe employment levels in both states before and after the change.
Maybe we can simulate it!
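A minimal sketch of such a simulation, using only base R. Everything in it is an assumption made for illustration: the number of stores, the common time trend, the noise levels, and the "true" effect of 2.5 FTE are invented and are not taken from the paper. The column names `state`, `emptot`, and `emptot2` simply mirror the data set used in Part 2.

```r
# Simulated "ideal experiment": two states with identical stores and a common
# time trend; only the treated state receives the minimum-wage increase.
# All numbers below are made up for illustration.
set.seed(1994)
nStores    <- 5000   # stores per state (hypothetical)
trueEffect <- 2.5    # assumed true effect on FTE employment
timeTrend  <- -2     # change between waves shared by both states

sim <- data.frame(state = rep(c(0, 1), each = nStores))  # 0 = control, 1 = treated
sim$emptot  <- 20 + rnorm(2 * nStores, sd = 8)           # wave 1 employment
sim$emptot2 <- sim$emptot + timeTrend + trueEffect * sim$state +
  rnorm(2 * nStores, sd = 4)                             # wave 2 employment

# Difference in differences from the mean change in each state
change   <- tapply(sim$emptot2 - sim$emptot, sim$state, mean)
didMeans <- unname(change["1"] - change["0"])

# The same quantity as the slope from regressing the change on the state dummy
didOLS <- unname(coef(lm(I(emptot2 - emptot) ~ state, data = sim))["state"])

c(didMeans = didMeans, didOLS = didOLS)
```

Because both simulated states share the same trend by construction, the estimate should land close to the assumed effect, and the two ways of computing it should agree.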

c. What is the identification strategy?

The minimum wage changed in one state (New Jersey) and not in the other (Pennsylvania). Compute the change in mean employment before and after the policy change within each state, then take the difference between the two changes. Assuming the two economies would otherwise have followed the same trend, this difference is the effect of the minimum wage increase on employment.
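Written out, the estimator is a difference of two before-and-after differences. The notation here is an assumption introduced for illustration: \(\bar{Y}_{s,t}\) denotes mean employment in state \(s\) at wave \(t\), with \(t = 1\) before and \(t = 2\) after the change.

\[\hat{\delta} = \left(\bar{Y}_{NJ,2} - \bar{Y}_{NJ,1}\right) - \left(\bar{Y}_{PA,2} - \bar{Y}_{PA,1}\right)\]

The second bracket estimates what would have happened in New Jersey without the policy; subtracting it removes any change common to both states.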

d. What are the assumptions/ threats to this identification strategy?

There is reason to believe that the economies of the two states are similar, but the strategy rests on a strong assumption: absent the minimum wage increase, employment in both states would have followed the same trend. Any difference in seasonal or local trends between the two states over the course of the study would threaten the validity of the identification strategy. Incomplete information on the opening and closing of stores also poses problems.
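To see how differing trends bias the estimate, here is a small sketch with hypothetical cell means (none of these numbers come from the paper). The treated state is assumed to have a different underlying trend than the control state, and the difference in differences picks up that gap on top of the true effect.

```r
# Hypothetical cell means: both states would have changed even without the
# policy, but by different amounts, so the common-trend assumption fails.
trendControl <- -2.0   # change in the control state
trendTreated <- -0.5   # change the treated state would have seen anyway
trueEffect   <-  1.0   # assumed true policy effect

means <- data.frame(
  state = c(0, 0, 1, 1),
  wave  = c(1, 2, 1, 2),
  fte   = c(23, 23 + trendControl, 20, 20 + trendTreated + trueEffect)
)

changeControl <- with(means, fte[state == 0 & wave == 2] - fte[state == 0 & wave == 1])
changeTreated <- with(means, fte[state == 1 & wave == 2] - fte[state == 1 & wave == 1])

did  <- changeTreated - changeControl
bias <- did - trueEffect   # equals trendTreated - trendControl

c(did = did, bias = bias)
```

In this toy case the estimate overstates the assumed effect by exactly the difference between the two underlying trends.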

Part 2: Replication Analysis

a. Load data from Card and Krueger AER 1994

# Retrieve dataset from the url 
download.file("http://www.mfilipski.com/files/CardKrueger1994_fastfood.csv", "fastFood.csv" )
fastFood <- read.csv("fastFood.csv") 
head(fastFood)
##    id state emptot emptot2   demp chain bk kfc roys wendys wage_st wage_st2
## 1  46     0  40.50    24.0 -16.50     1  1   0    0      0      NA     4.30
## 2  49     0  13.75    11.5  -2.25     2  0   1    0      0      NA     4.45
## 3 506     0   8.50    10.5   2.00     2  0   1    0      0      NA     5.00
## 4  56     0  34.00    20.0 -14.00     4  0   0    0      1     5.0     5.25
## 5  61     0  24.00    35.5  11.50     4  0   0    0      1     5.5     4.75
## 6  62     0  20.50      NA     NA     4  0   0    0      1     5.0       NA


b. Verify that the data is correct.

library(dplyr) # for data manipulation 
library(kableExtra) # for the 'kable' function to make tables  

# to compute and display means 
fastFood %>% 
  group_by(state) %>% 
  summarise(
    "a. Burger King" = sum(bk)/sum(bk, wendys, roys, kfc) * 100,
    "b. KFC" = sum(kfc)/sum(bk, wendys, roys, kfc)* 100,
    "c. Roy Rogers" = sum(roys)/sum(bk, wendys, roys, kfc)* 100,
    "d. Wendy's" = sum(wendys)/sum(bk, wendys, roys, kfc)* 100,
    "Wave 1 FTE " = mean(na.omit(emptot)),  # NAs need to be removed first 
    "Wave 2 FTE " = mean(na.omit(emptot2))) %>%  
  arrange(desc(state)) %>% select(-state) %>%  # sort by state and drop "state"
  t()  %>%   # transpose the  table to mimic the paper
  kable(
    format = "html",
    digits = 1,  
    caption = "Means of Key Variables",
    col.names = c("NJ", "PA")
  ) %>% kable_styling()
Means of Key Variables

                    NJ      PA
a. Burger King    41.1    44.3
b. KFC            20.5    15.2
c. Roy Rogers     24.8    21.5
d. Wendy’s        13.6    19.0
Wave 1 FTE        20.4    23.3
Wave 2 FTE        21.0    21.2


c. Use OLS to obtain their diff-in-diff estimator.

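# regress the change in FTE employment (wave 2 - wave 1) on the state dummy;
# stores with a missing wave are dropped, which leaves the balanced sample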
diffInDiff <- lm(c(emptot2 - emptot) ~ state, data = fastFood) 

# print table
stargazer::stargazer(diffInDiff, 
                     title = "Difference in Differences Estimate",
                     type = "html", keep.stat = c("n","adj.rsq"),
                     dep.var.caption= "", 
                     dep.var.labels = "Difference, NJ - PA",
                     covariate.labels = "Change in mean FTE employment, balanced sample of stores",
                     omit = "Constant")
Difference in Differences Estimate

                                                            Difference, NJ - PA
Change in mean FTE employment, balanced sample of stores    2.750** (1.154)
Observations                                                384
Adjusted R²                                                 0.012
Note: *p<0.1; **p<0.05; ***p<0.01

We get the same point estimate as the paper; however, our standard error is smaller than the one reported there.


d. What would be the equation of a standard “difference in difference” regression?

A general form of the “difference in difference” regression may be:

\[Y_{ist} = \alpha + \gamma \times state_s + \lambda \times d_t+ \delta (state_s \times d_t) + \epsilon_{ist}\]

where: \(Y_{ist} =\) outcome for observation \(i\) in state (or any other group) \(s\) at time \(t\),
\(state_s =\) dummy that codes for the state (or any other group) under observation, \(=1\) for the treated state, \(=0\) for the control state,
\(d_t =\) dummy that codes for time, \(=0\) if the observation is from before the change, \(=1\) otherwise,
\(\delta =\) coefficient on the interaction term, which is the difference-in-differences estimate,
\(\epsilon_{ist} =\) state- and time-specific shock
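To see why the interaction coefficient is the difference in differences, take expectations of the regression in each of the four state-by-time cells. This assumes \(E[\epsilon_{ist} \mid state_s, d_t] = 0\), which the equation above leaves implicit.

\[\begin{aligned}
E[Y_{ist} \mid state_s = 0, d_t = 0] &= \alpha \\
E[Y_{ist} \mid state_s = 0, d_t = 1] &= \alpha + \lambda \\
E[Y_{ist} \mid state_s = 1, d_t = 0] &= \alpha + \gamma \\
E[Y_{ist} \mid state_s = 1, d_t = 1] &= \alpha + \gamma + \lambda + \delta
\end{aligned}\]

The change over time is \(\lambda + \delta\) in the treated state and \(\lambda\) in the control state, so

\[(\alpha + \gamma + \lambda + \delta - \alpha - \gamma) - (\alpha + \lambda - \alpha) = \delta\]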

Part 3: Optional Questions

e. Run the regression you wrote up in part d

# stack wave 1 and wave 2 employment into one long vector;
# 'wave' is 0 for the wave 1 rows and 1 for the wave 2 rows
n <- nrow(fastFood)
wave <- c(rep(0, n), rep(1, n))
optionalDiD <- lm(c(emptot, emptot2) ~ c(state, state) + wave + I(wave * c(state, state)), data = fastFood)

# print table; the dependent variable is the level of FTE employment
stargazer::stargazer(optionalDiD, 
                     title = "'Standard' Difference in Differences Regression Result",
                     type = "html", keep.stat = c("n","adj.rsq"),
                     dep.var.caption= "", 
                     dep.var.labels = "FTE employment",
                     covariate.labels = c("State", "Time", "Diff-in-Diff Estimate"),
                     omit = "Constant")
‘Standard’ Difference in Differences Regression Result

                         FTE employment
State                    -2.892** (1.194)
Time                     -2.166 (1.516)
Diff-in-Diff Estimate    2.754 (1.688)
Observations             794
Adjusted R²              0.004
Note: *p<0.1; **p<0.05; ***p<0.01
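An alternative way to set up the same regression is to reshape the data to long format first, so that the time dummy is an explicit column. The sketch below uses a toy data frame with invented values; only the column names `state`, `emptot`, and `emptot2` follow the real file, while `time` and `fte` are names introduced here for illustration.

```r
# Toy wide data with the same column names as the Card-Krueger file
# (values are invented for illustration).
wide <- data.frame(
  id      = 1:6,
  state   = c(0, 0, 0, 1, 1, 1),
  emptot  = c(24, 20, 26, 18, 22, 20),
  emptot2 = c(21, 19, 23, 20, 23, 19)
)

# Stack the two waves: one row per store and wave
long <- rbind(
  data.frame(id = wide$id, state = wide$state, time = 0, fte = wide$emptot),
  data.frame(id = wide$id, state = wide$state, time = 1, fte = wide$emptot2)
)

# state * time expands to state + time + state:time
fit    <- lm(fte ~ state * time, data = long)
didOLS <- unname(coef(fit)["state:time"])

# Check against the four cell means
cellMeans <- tapply(long$fte, list(long$state, long$time), mean)
didMeans  <- (cellMeans["1", "1"] - cellMeans["1", "0"]) -
             (cellMeans["0", "1"] - cellMeans["0", "0"])

c(didOLS = didOLS, didMeans = didMeans)
```

Because the model has one parameter per state-by-time cell, the interaction coefficient reproduces the difference in differences of the cell means exactly.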


f. Compute the difference-in-differences estimator “by hand”.

# change in mean FTE employment within each state (NAs dropped per wave);
# stored in a separate object so the original data set is not modified
changeByState <- fastFood %>% 
  group_by(state) %>% 
  summarise(DiffInMeans = mean(emptot2, na.rm = TRUE) - mean(emptot, na.rm = TRUE))

# difference in differences = change in NJ (state == 1) minus change in PA (state == 0)
changeNJ <- changeByState$DiffInMeans[changeByState$state == 1]
changePA <- changeByState$DiffInMeans[changeByState$state == 0]
DiffInDiff <- round(changeNJ - changePA, 2) 
DiffInDiff
## [1] 2.75
# Hand-computed difference in differences = 2.75; in paper: 2.76
# differencing the rounded changes instead reproduces the paper's figure
DiffInDiffRounded <- round(changeNJ, 2) - round(changePA, 2)
DiffInDiffRounded
## [1] 2.76

Compute standard errors “by hand” too

🤔 🙅
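A minimal sketch of how the standard error of a difference in mean changes could be computed by hand, on simulated data rather than the real file (group sizes, means, and spreads are invented). It compares two formulas: the pooled-variance one, which is what `lm()` reports under homoskedastic errors, and an unequal-variance one that adds up each state's variance of the mean. Whether the second formula accounts for the standard-error gap noted in Part 2c is an assumption this document does not verify.

```r
set.seed(42)
# Simulated store-level changes in FTE employment (made-up numbers):
# a small control group with noisy changes, a larger treated group.
dempControl <- rnorm(75,  mean = -2.0, sd = 10)
dempTreated <- rnorm(300, mean =  0.5, sd = 8)
demp  <- c(dempControl, dempTreated)
state <- rep(c(0, 1), c(75, 300))

n0 <- length(dempControl); n1 <- length(dempTreated)
v0 <- var(dempControl);    v1 <- var(dempTreated)

# (1) Pooled-variance SE: a single error variance shared by both states
pooledVar <- ((n0 - 1) * v0 + (n1 - 1) * v1) / (n0 + n1 - 2)
sePooled  <- sqrt(pooledVar * (1 / n0 + 1 / n1))

# (2) Unequal-variance SE: each state's variance of the mean, added up
seUnequal <- sqrt(v0 / n0 + v1 / n1)

# SE reported by lm() for the state coefficient, for comparison with (1)
seOLS <- summary(lm(demp ~ state))$coefficients["state", "Std. Error"]

c(sePooled = sePooled, seOLS = seOLS, seUnequal = seUnequal)
```

The pooled formula matches the `lm()` standard error, while the unequal-variance formula can be larger or smaller depending on which group has the noisier changes.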