Homework #5

Using Panel data: Difference-in-Differences Impact of Minimum Wages

Part 1: Reading and questions

Article: Card and Krueger (1994) Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania AER 84(4): 772-793.

a. What is the causal link the paper is trying to reveal?

The paper estimates the causal effect of an increase in the minimum wage on employment, specifically employment at fast-food restaurants.

b. What would be the ideal experiment to test this causal link?

A minimum wage is a policy-level intervention that applies to an entire jurisdiction, so it cannot be randomized across individual stores or workers the way a standard experiment would be. That's where federalism is our friend in impact evaluation: because states set their own minimum wages, we get a quasi-experimental setting. The ideal experiment would randomly assign some U.S. states to raise their minimum wage while the rest keep it unchanged. We could then compare the change in the outcome (fast-food employment) before and after the increase between the treatment and control states. Randomization would ensure that, absent the policy, the two groups would be expected to follow the same trend.

c. What is the identification strategy?

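The authors exploit the fact that New Jersey raised its minimum wage (from $4.25 to $5.05 per hour on April 1, 1992) while neighboring Pennsylvania did not. They survey fast-food restaurants in New Jersey and eastern Pennsylvania before and after the increase and compare the change in employment across the two states. Because the two areas are similar in many characteristics, the Pennsylvania stores serve as a control for what would have happened in New Jersey without the increase.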

d. What are the assumptions / threats to this identification strategy?

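As we discussed last week after class about my Brazil paper, having only one control state is always tricky: any other state-specific policy change or economic shock that hits New Jersey or Pennsylvania (but not both) between the two survey waves would be attributed to the minimum wage and bias the estimate. Moreover, difference-in-differences estimates rest on the parallel trends assumption: absent the increase, employment at New Jersey and Pennsylvania fast-food stores would have followed the same trend. With only one pre-treatment wave, this assumption cannot be checked here. If there were multiple pre-treatment periods, an event-study specification could test for differential pre-trends; finding none would support, but not prove, the assumption.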
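To make the event-study idea concrete, here is a minimal sketch on simulated data, since the Card and Krueger data have only one pre-treatment wave. The panel size, the treatment period, the true effect of 2, the noise level, and the ev_* dummy names are all made up for illustration. The regression includes unit and period fixed effects plus treated-by-relative-period dummies, omitting the period just before treatment as the reference; under parallel trends the pre-period coefficients should be close to zero.

```r
# Simulated (hypothetical) panel: NOT the Card-Krueger data
set.seed(1)
n_units <- 20
n_periods <- 6
treat_period <- 4                                    # first treated period (assumed)
panel <- expand.grid(id = 1:n_units, t = 1:n_periods)
panel$treat <- as.integer(panel$id <= n_units / 2)   # first half of units treated

# Outcome = unit effect + common time trend + true effect of 2 after treatment + noise
unit_fe <- rnorm(n_units)
time_fe <- seq(0, 1, length.out = n_periods)
panel$y <- unit_fe[panel$id] + time_fe[panel$t] +
  2 * panel$treat * (panel$t >= treat_period) +
  rnorm(nrow(panel), sd = 0.1)

# Treated-by-relative-period dummies; relative period -1 is the omitted reference
panel$rel <- panel$t - treat_period
for (k in setdiff(sort(unique(panel$rel)), -1)) {
  nm <- paste0("ev_", if (k < 0) paste0("m", -k) else k)
  panel[[nm]] <- as.integer(panel$treat == 1 & panel$rel == k)
}
ev_vars <- grep("^ev_", names(panel), value = TRUE)

# Two-way fixed effects event-study regression
es <- lm(reformulate(c(ev_vars, "factor(id)", "factor(t)"), response = "y"),
         data = panel)
round(coef(es)[ev_vars], 3)
```

Coefficients on ev_m3 and ev_m2 that differ clearly from zero would signal diverging pre-trends. A flat pre-period is consistent with parallel trends but cannot show that the trends would have stayed parallel after treatment.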

Part 2: Replication

a. Load data from Card and Krueger AER 1994.

b. Verify that the data is correct. Reproduce the % of Burger King, KFC, Roys, and Wendys, as well as the FTE means in the 2 waves.

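library(stargazer)

# Store-level data from Card and Krueger (1994), one row per store
data <- read.csv("CardKrueger1994_fastfood.csv")

# Share of all stores in each chain. chain is coded 1-4 in the same order
# as the chain dummies in columns 7:10 (bk, kfc, roys, wendys).
chain_share <- as.data.frame(matrix(nrow = 4, ncol = 2))
chain_share[, 1] <- colnames(data)[7:10]
for (val in 1:4) {
  chain_share[val, 2] <- sum(data$chain == val, na.rm = TRUE) / nrow(data)
}
print(chain_share)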

##       V1        V2
## 1     bk 0.4170732
## 2    kfc 0.1951220
## 3   roys 0.2414634
## 4 wendys 0.1463415
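The code above reproduces chain shares for the pooled sample, while the question also asks for the FTE means in the two waves, and the paper reports chain shares separately by state. The sketch below shows one way to compute both on a small made-up data frame. The column names empft, emppt, nmgrs (wave 1) and empft2, emppt2, nmgrs2 (wave 2) are assumptions and may not match CardKrueger1994_fastfood.csv, so check them before applying this to the real data. FTE employment follows the paper's definition: full-time workers plus managers plus half of the part-time workers.

```r
# Made-up stand-in for the store-level data. Column names are ASSUMED and may
# differ from those in CardKrueger1994_fastfood.csv.
stores <- data.frame(
  state  = c(1, 1, 1, 0, 0),      # 1 = New Jersey, 0 = Pennsylvania
  chain  = c(1, 2, 1, 3, 4),      # 1 = bk, 2 = kfc, 3 = roys, 4 = wendys
  empft  = c(10, 5, 8, 12, 6),    # wave 1: full-time workers
  emppt  = c(20, 10, 16, 14, 18), # wave 1: part-time workers
  nmgrs  = c(3, 2, 3, 4, 3),      # wave 1: managers
  empft2 = c(11, 4, NA, 10, 7),   # wave 2 (NA = missing response)
  emppt2 = c(18, 12, 20, 12, 16),
  nmgrs2 = c(3, 2, 3, 3, 3)
)
state_name <- ifelse(stores$state == 1, "NJ", "PA")

# FTE employment = full-time workers + managers + 0.5 * part-time workers
stores$fte  <- stores$empft  + stores$nmgrs  + 0.5 * stores$emppt
stores$fte2 <- stores$empft2 + stores$nmgrs2 + 0.5 * stores$emppt2

# Chain shares within each state (each column sums to 1)
chain_by_state <- prop.table(table(chain = stores$chain, state = state_name),
                             margin = 2)
print(round(chain_by_state, 3))

# Mean FTE by state in each wave, using all available observations per wave
fte_means <- sapply(c(wave1 = "fte", wave2 = "fte2"), function(v)
  tapply(stores[[v]], state_name, mean, na.rm = TRUE))
print(fte_means)
```

Dropping missing values wave by wave mirrors an "all available observations" comparison; restricting to stores observed in both waves would instead give a balanced sample.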

c. Use OLS to obtain their Diff-in-diff estimator. Comment on how your OLS estimate compares to the DiD estimate in Table 3 of the paper.

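# demp: each store's change in FTE employment (wave 2 minus wave 1);
# state = 1 for New Jersey, 0 for Pennsylvania.
# lm() drops stores with a missing change, leaving the balanced sample.
reg <- lm(demp ~ state, data = data)
stargazer(reg, type = "text", align = TRUE,
          keep.stat = c("n", "rsq"),
          dep.var.labels = c("NJ - PA"),
          covariate.labels = c("State"))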

## 
## ========================================
##                  Dependent variable:    
##              ---------------------------
##                        NJ - PA          
## ----------------------------------------
## State                  2.750**          
##                        (1.154)          
##                                         
## Constant              -2.283**          
##                        (1.036)          
##                                         
## ----------------------------------------
## Observations             384            
## R2                      0.015           
## ========================================
## Note:        *p<0.1; **p<0.05; ***p<0.01

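The OLS estimate matches the paper. Because demp is already each store's change in FTE employment between the two waves, the constant (-2.283) is the mean change in Pennsylvania and the coefficient on state (2.750) is New Jersey's mean change minus Pennsylvania's, i.e., the difference-in-differences. This reproduces the DiD of 2.75 reported in Table 3 for the balanced sample of stores with employment data in both waves (384 stores here). The standard error is smaller than the paper's (1.15 versus 1.34) because the default OLS standard error pools the residual variance across the two states, whereas the paper's standard error allows the variance of employment changes to differ between New Jersey and Pennsylvania.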
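Regressing each store's employment change on a New Jersey dummy is the same as differencing the two states' mean changes. The sketch below checks this on made-up numbers: the changes d and the dummy nj are hypothetical, not the paper's data. It also contrasts the default pooled OLS standard error with one that lets the variance of the changes differ by state.

```r
# Made-up store-level changes in FTE employment (wave 2 minus wave 1)
d  <- c(1.5, -0.5, 2.0, 0.0, -1.0, -3.0, -2.5, 0.5)
nj <- c(1,    1,   1,   1,    1,    0,    0,   0)   # 1 = New Jersey, 0 = Pennsylvania

# DiD as the difference between the two states' mean changes
did_means <- mean(d[nj == 1]) - mean(d[nj == 0])

# The same number as the OLS coefficient on the New Jersey dummy
fit <- lm(d ~ nj)
did_ols <- unname(coef(fit)["nj"])

# Default OLS standard error: one residual variance pooled across both states
se_pooled <- summary(fit)$coefficients["nj", "Std. Error"]

# Standard error that lets the variance of changes differ between states
se_unequal <- sqrt(var(d[nj == 1]) / sum(nj == 1) +
                   var(d[nj == 0]) / sum(nj == 0))

round(c(did_means = did_means, did_ols = did_ols,
        se_pooled = se_pooled, se_unequal = se_unequal), 3)
```

The point estimates always coincide; the two standard errors differ whenever the groups have different sizes and different variances.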

d. What would be the equation of a standard “difference in difference” regression? Just write down the equation.

The standard difference-in-differences equation is given by:

\(y_{it} = \beta_0 + \beta_1 Treat_{i} + \beta_2 Post_{t} + \beta_3 (Treat_{i} \times Post_{t}) + e_{it}\)

…where \(y_{it}\) is the outcome (here, FTE employment) of unit \(i\) in period \(t\); \(Treat_{i}\) is a binary dummy equal to 1 if unit \(i\) belongs to the treatment group (New Jersey); \(Post_{t}\) is a binary dummy equal to 1 if the observation is from a period after the treatment; and \(e_{it}\) is the idiosyncratic error term, with standard errors typically clustered at the level at which treatment is assigned. \(\beta_1\) captures the pre-treatment difference between the groups and \(\beta_2\) the common change over time, so \(\beta_3\), the coefficient of interest, is the difference-in-differences estimate of the impact of the intervention.
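As a check on the equation, the sketch below fits it by OLS on a tiny made-up balanced panel (four stores, two waves; all numbers hypothetical) and compares \(\beta_3\) with the difference-in-differences of the four group means. Because the model with Treat, Post, and their interaction is saturated, the two coincide exactly. For clustered standard errors, one option is vcovCL() from the sandwich package (e.g. vcovCL(fit, cluster = ~store)) combined with lmtest::coeftest(); that is not run here because four stores are far too few clusters.

```r
# Toy balanced panel: 4 stores observed in two waves (made-up numbers)
panel <- data.frame(
  store = rep(1:4, times = 2),
  treat = rep(c(1, 1, 0, 0), times = 2),   # 1 = New Jersey
  post  = rep(c(0, 1), each = 4),          # 1 = after the increase
  y     = c(20, 18, 23, 25,                # wave 1 FTE
            21, 19, 21, 22)                # wave 2 FTE
)

# Standard DiD regression: y = b0 + b1*Treat + b2*Post + b3*Treat*Post + e
fit <- lm(y ~ treat * post, data = panel)
b3 <- unname(coef(fit)["treat:post"])

# The same quantity from the four group means (rows: treat, columns: post)
m <- with(panel, tapply(y, list(treat, post), mean))
did <- (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])

c(b3 = b3, did = did)
```

In this toy panel the intercept equals the control group's pre-period mean, \(\beta_1\) the pre-period gap, and \(\beta_2\) the control group's change, matching the interpretation of each term in the equation.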