Assignment based on the seminal paper by David Card and Alan Krueger: Card and Krueger (1994), "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania," AER 84(4): 772-793.

Part 1: Briefly answer these questions:

c. What is the identification strategy?

The identification strategy is difference-in-differences. The higher minimum wage was implemented in NJ, so it can affect employment there, while no effect is expected in PA, where no such policy was implemented. Employment is compared before and after the policy change in both states, and the change in PA serves as the counterfactual for the change NJ would have seen without the policy.

d. What are the assumptions / threats to this identification strategy? (answer specifically with reference to the data the authors are using)

The identification strategy rests on the following assumptions:

  1. Employment depends largely on local socio-economic conditions. NJ and PA are assumed to be similar in these conditions, which is a concern unless the authors surveyed restaurants near the border between the two states.
  2. All other factors are assumed to be the same across the two states over the period, i.e. the only change between the two survey waves is the policy, which was implemented in NJ and not in PA (the parallel-trends assumption).

Part 2: Replication Analysis

a. Load the data

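library(dplyr); library(huxtable); library(plm)  # packages used below
df <- read.csv("data/CardKrueger1994_fastfood.csv")
head(df, 6)  # preview the first six rows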
id   state  emptot  emptot2  demp    chain  bk  kfc  roys  wendys  wage_st  wage_st2
46   0      40.5    24       -16.5   1      1   0    0     0       NA       4.3
49   0      13.8    11.5     -2.25   2      0   1    0     0       NA       4.45
506  0      8.5     10.5     2       2      0   1    0     0       NA       5
56   0      34      20       -14     4      0   0    0     1       5        5.25
61   0      24      35.5     11.5    4      0   0    0     1       5.5      4.75
62   0      20.5    NA       NA      4      0   0    0     1       5        NA

b. Verify that the data is correct. Reproduce the % of Burger King, KFC, Roys, and Wendys, as well as the FTE means in the 2 waves

# Chain shares (in %) and mean FTE employment by state (0 = PA, 1 = NJ),
# transposed so that the states become columns
a <- df %>% group_by(state) %>% summarise(mean(bk)*100, mean(kfc)*100, mean(roys)*100, 
                                          mean(wendys)*100, mean(emptot, na.rm=TRUE),mean(emptot2, na.rm=TRUE)) %>% t()
aa <- a[2:7,]  # drop the first row, which holds the state codes

# Welch t statistics (unequal variances) for the NJ - PA difference in each variable
tt_bk <- t.test(df$bk[df$state==1],df$bk[df$state==0], var.equal=FALSE)$statistic
tt_kfc <- t.test(df$kfc[df$state==1],df$kfc[df$state==0], var.equal=FALSE)$statistic
tt_roys <- t.test(df$roys[df$state==1],df$roys[df$state==0], var.equal=FALSE)$statistic
tt_wendys <- t.test(df$wendys[df$state==1],df$wendys[df$state==0], var.equal=FALSE)$statistic
tt_fte1 <- t.test(df$emptot[df$state==1],df$emptot[df$state==0], var.equal=FALSE)$statistic
tt_fte2 <- t.test(df$emptot2[df$state==1],df$emptot2[df$state==0], var.equal=FALSE)$statistic
ttest <- data.frame(c(tt_bk,tt_kfc,tt_roys,tt_wendys,tt_fte1,tt_fte2))

# Combine means and t statistics into one table
tab <- cbind(aa,ttest)
colnames(tab) <- c("PA","NJ","t")
tab[,1:2] <- sapply(tab[,1:2],as.numeric)
tab$Var <- c("Burger King","KFC","Roy Rogers","Wendy's","FTE-Wave1","FTE-Wave2")
# Reorder columns to Var, NJ, PA, t
tab2 <- tab[,c(4,2,1,3)]
rownames(tab2) <- NULL
huxtable(tab2)
Var          NJ    PA    t
Burger King  41.1  44.3  -0.515
KFC          20.5  15.2  1.16
Roy Rogers   24.8  21.5  0.623
Wendy's      13.6  19    -1.12
FTE-Wave1    20.4  23.3  -2
FTE-Wave2    21    21.2  -0.128
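
The t column comes from `t.test(..., var.equal=FALSE)`, i.e. a Welch test. As a minimal sketch of what that statistic is, the helper below (the name `welch_t` and the toy vectors are made up for illustration, not taken from the survey) computes it by hand and compares it with the built-in result:

```r
# Welch t statistic for the difference in means of x1 and x0 (unequal variances)
welch_t <- function(x1, x0) {
  x1 <- x1[!is.na(x1)]  # drop missing values, as t.test() does
  x0 <- x0[!is.na(x0)]
  se <- sqrt(var(x1) / length(x1) + var(x0) / length(x0))
  (mean(x1) - mean(x0)) / se
}

# Toy data (made-up values, not from the survey)
nj <- c(20, 18, 25, 22, NA, 19)
pa <- c(24, 26, 21, 23)

by_hand  <- welch_t(nj, pa)
built_in <- unname(t.test(nj, pa, var.equal = FALSE)$statistic)
print(c(by_hand = by_hand, built_in = built_in))
```

The two numbers should agree, which is a quick way to confirm that the sign of t follows the order of the arguments (first group minus second group).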

c. Use OLS to obtain the diff-in-diff

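# Regress the change in FTE employment (demp = emptot2 - emptot) on the NJ dummy;
# the slope is the difference in mean changes, i.e. the diff-in-diff estimate
model1 <- lm(demp~state,data=df)

# Stored under its own name so the huxtable() function name is not reused for an object
reg_table1 <- huxreg(model1, coefs = c("State"="state"),
                   statistics = c("N. obs." = "nobs","R squared" = "r.squared"))
reg_table1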
             (1)
State        2.750 *
             (1.154)
N. obs.      384
R squared    0.015
*** p < 0.001; ** p < 0.01; * p < 0.05.

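Table 3 in the paper reports a difference-in-differences estimate of 2.76, whereas the OLS estimate here is 2.75. The regression uses only the 384 stores with employment observed in both waves, so it matches the paper's balanced-sample figure rather than the difference of wave means taken over all available stores.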
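
Regressing the change in employment on the state dummy returns the difference in mean changes between the two states. A minimal sketch on made-up numbers (the data frame `toy` is hypothetical and unrelated to the survey values) illustrates the equivalence:

```r
# Toy data (made-up values): change in employment for 3 PA stores and 4 NJ stores
toy <- data.frame(
  state = c(0, 0, 0, 1, 1, 1, 1),
  demp  = c(-3, -1, -2, 1, 0, 2, -1)
)

# Slope from regressing the change on the state dummy
slope <- unname(coef(lm(demp ~ state, data = toy))["state"])

# Difference in mean changes, computed directly
mean_diff <- mean(toy$demp[toy$state == 1]) - mean(toy$demp[toy$state == 0])

print(c(slope = slope, mean_diff = mean_diff))
```

Both values coincide because, with a single binary regressor, the OLS slope is the difference between the two group means.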

d. The equation of a standard “difference in difference” regression:

\(y_{i,t} = \alpha + \beta S_i + \tau T_t + \gamma (S_i \times T_t) + \varepsilon_{i,t}\)

where:

\(y_{i,t}\) is the outcome (here, FTE employment) for store \(i\) at time \(t\),

\(S_i\) is the state dummy, 1 for NJ and 0 for PA,

\(T_t\) is the time dummy, 0 before and 1 after the implementation of the policy,

\(\gamma\) is the coefficient on the interaction term, i.e. the difference-in-differences estimate,

\(\varepsilon_{i,t}\) is the error term.
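
To see why the coefficient on the interaction term is the difference-in-differences, it may help to write out the expected outcome in each of the four state-by-period cells. This sketch assumes \(E[\varepsilon_{i,t} \mid S_i, T_t] = 0\), an assumption added here for the illustration:

\[
\begin{aligned}
E[y_{i,t} \mid S_i = 0, T_t = 0] &= \alpha \\
E[y_{i,t} \mid S_i = 0, T_t = 1] &= \alpha + \tau \\
E[y_{i,t} \mid S_i = 1, T_t = 0] &= \alpha + \beta \\
E[y_{i,t} \mid S_i = 1, T_t = 1] &= \alpha + \beta + \tau + \gamma
\end{aligned}
\]

The change in NJ is \(\tau + \gamma\) and the change in PA is \(\tau\), so their difference is \((\tau + \gamma) - \tau = \gamma\).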

Part 3: Optional:

e. To run the diff-in-diff regression, first reshape the data to long form.

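# Time indicators for the two waves: 0 = before (wave 1), 1 = after (wave 2)
df$time_1 <- rep(0, nrow(df))
df$time_2 <- rep(1, nrow(df))
# Keep id, state, both employment measures, and the two time indicators
df1 <- df %>% select(id, state, emptot, emptot2, time_1, time_2)

# Stack wave 1 and wave 2 so that each store contributes one row per wave
df_new <- reshape(df1, idvar = c("id","state"), varying = list(c("emptot","emptot2"),c("time_1","time_2")), v.names = c("emp_toto","timeofyr"), direction = "long")
head(df_new,6)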
id   state  time  emp_toto  timeofyr
46   0      1     40.5      0
49   0      1     13.8      0
506  0      1     8.5       0
56   0      1     34        0
61   0      1     24        0
62   0      1     20.5      0
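
The wide-to-long step can also be pictured as stacking one block of rows per wave. The sketch below uses base R only and made-up values; the data frame `wide` and its contents are hypothetical, and only the column names follow the ones used above:

```r
# Toy wide data (made-up values, not from the survey): one row per store
wide <- data.frame(
  id      = c(1, 2, 3),
  state   = c(0, 1, 1),
  emptot  = c(20, 15, 30),   # wave 1 employment
  emptot2 = c(18, NA, 31)    # wave 2 employment (one store missing)
)

# Build one block per wave with a common set of column names, then stack them
wave1 <- data.frame(id = wide$id, state = wide$state, timeofyr = 0, emp_toto = wide$emptot)
wave2 <- data.frame(id = wide$id, state = wide$state, timeofyr = 1, emp_toto = wide$emptot2)
long  <- rbind(wave1, wave2)

print(long)
```

Each store now appears twice, once with `timeofyr = 0` and once with `timeofyr = 1`, which is the layout the interaction regression expects.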
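# NOTE: this sets missing employment to 0 instead of dropping those rows,
# which pulls the group means down and is why the estimates below differ from 2.75
df_new[is.na(df_new)]=0
model2 <- lm(emp_toto ~ state, data=df_new)  # state difference only
model3 <- lm(emp_toto ~ state + timeofyr + state*timeofyr, data=df_new)  # diff-in-diff by OLS
# plm() defaults to the within (fixed-effects) model
model4 <- plm(emp_toto ~ state + timeofyr + state*timeofyr, data=df_new,index=c("id"))

# Stored under its own name so the huxtable() function name is not reused for an object
reg_table2 <- huxreg("Simple Reg" = model2,"Diff-in-Diff" = model3,"Diff-in-Diff using PLM" = model4, coefs = c("State"="state","Time" = "timeofyr","State:Time" = "state:timeofyr" ),
                   statistics = c("N. obs." = "nobs","R squared" = "r.squared"))%>% 
  set_caption("Table: Simple regression and the Diff-in-Diff regression")
reg_table2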
Table: Simple regression and the Diff-in-Diff regression
            Simple Reg  Diff-in-Diff  Diff-in-Diff using PLM
State       -1.642      -2.919 *      0.973
            (0.882)     (1.247)       (7.438)
Time                    -2.111        -2.111
                        (1.585)       (1.179)
State:Time              2.554         2.554
                        (1.764)       (1.312)
N. obs.     820         820           820
R squared   0.004       0.007         0.009
*** p < 0.001; ** p < 0.01; * p < 0.05.
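
The line `df_new[is.na(df_new)]=0` replaces missing employment with zero before the models are fitted, which changes the group means and therefore the interaction coefficient. A minimal sketch on made-up numbers (the data frame `toy` is hypothetical) contrasts zero-filling with dropping the missing rows, which is what `lm()` does by default:

```r
# Toy long-format data (made-up values): two stores per state, two waves each
toy <- data.frame(
  state    = c(0, 0, 0, 0, 1, 1, 1, 1),
  timeofyr = c(0, 0, 1, 1, 0, 0, 1, 1),
  emp_toto = c(20, 24, 19, 21, 18, 22, 21, NA)  # one NJ store missing in wave 2
)

# Option 1: leave the NA in place; lm() drops that row under the default na.action
fit_drop <- lm(emp_toto ~ state * timeofyr, data = toy)

# Option 2: replace the missing value with zero, as in the code above
toy_zero <- toy
toy_zero$emp_toto[is.na(toy_zero$emp_toto)] <- 0
fit_zero <- lm(emp_toto ~ state * timeofyr, data = toy_zero)

did_drop <- unname(coef(fit_drop)["state:timeofyr"])
did_zero <- unname(coef(fit_zero)["state:timeofyr"])
print(c(drop = did_drop, zero_fill = did_zero))
```

In this toy example the sign of the estimate flips; in the survey data the shift is much smaller, but the direction of the distortion depends on which cells have the missing values.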

f. Computing DID

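DID computed from the mean FTE employment values in Table 3 of the paper, where subscript 0 denotes wave 1 (before the increase) and subscript 1 denotes wave 2 (after):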

\(\beta_{NJ,0}\) = 20.44

\(\beta_{NJ,1}\) = 21.03

\(\beta_{PA,0}\) = 23.33

\(\beta_{PA,1}\) = 21.17

\(\beta_{DID} = (\beta_{NJ,1} - \beta_{NJ,0}) - (\beta_{PA,1} - \beta_{PA,0})\)

## [1] 2.75
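
The arithmetic behind this value can be written out as a short sketch. The variable names are chosen here for illustration and are not taken from the original code:

```r
# Mean FTE employment by state and wave, as listed above (from Table 3 of the paper)
nj_before <- 20.44
nj_after  <- 21.03
pa_before <- 23.33
pa_after  <- 21.17

# Difference-in-differences: change in NJ minus change in PA
did <- (nj_after - nj_before) - (pa_after - pa_before)
round(did, 2)
```

The change in NJ is 0.59 and the change in PA is -2.16, so the difference is 0.59 + 2.16 = 2.75.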