Assignment from the seminal paper by David Card and Alan Krueger. Card and Krueger (1994) Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania AER 84(4): 772-793.
The paper is trying to study the impact of higher minimum wage policy on the employment. The treatment group is New Jersey (NJ) where the law is implemented and Pennsylvania(PA) is the control group.
Ideal experiment would be to randomly assign the treatment and control group. Here, one state is treatment and other is control group assuming they are almost identical which might not be the case.
Implementation of the higher minimum wage policy in NJ wage will affect the employment in NJ and no any effect is likely in PA where no such policy is implemented. Before and after comparison is done in both the states to study the impact of policy implementation
df <- read.csv("data/CardKrueger1994_fastfood.csv")
head(df,6)
id | state | emptot | emptot2 | demp | chain | bk | kfc | roys | wendys | wage_st | wage_st2 |
---|---|---|---|---|---|---|---|---|---|---|---|
46 | 0 | 40.5 | 24 | -16.5 | 1 | 1 | 0 | 0 | 0 | 4.3 | |
49 | 0 | 13.8 | 11.5 | -2.25 | 2 | 0 | 1 | 0 | 0 | 4.45 | |
506 | 0 | 8.5 | 10.5 | 2 | 2 | 0 | 1 | 0 | 0 | 5 | |
56 | 0 | 34 | 20 | -14 | 4 | 0 | 0 | 0 | 1 | 5 | 5.25 |
61 | 0 | 24 | 35.5 | 11.5 | 4 | 0 | 0 | 0 | 1 | 5.5 | 4.75 |
62 | 0 | 20.5 | 4 | 0 | 0 | 0 | 1 | 5 |
a <- df %>% group_by(state) %>% summarise(mean(bk)*100, mean(kfc)*100, mean(roys)*100,
mean(wendys)*100, mean(emptot, na.rm=TRUE),mean(emptot2, na.rm=TRUE)) %>% t()
aa <- a[2:7,]
tt_bk = t.test(df$bk[df$state==1],df$bk[df$state==0], var.equal=FALSE)$statistic
tt_kfc = t.test(df$kfc[df$state==1],df$kfc[df$state==0], var.equal=FALSE)$statistic
tt_roys = t.test(df$roys[df$state==1],df$roys[df$state==0], var.equal=FALSE)$statistic
tt_wendys = t.test(df$wendys[df$state==1],df$wendys[df$state==0], var.equal=FALSE)$statistic
tt_fte1 = t.test(df$emptot[df$state==1],df$emptot[df$state==0], var.equal=FALSE)$statistic
tt_fte2 = t.test(df$emptot2[df$state==1],df$emptot2[df$state==0], var.equal=FALSE)$statistic
ttest <- data.frame(c(tt_bk,tt_kfc,tt_roys,tt_wendys,tt_fte1,tt_fte2))
tab <- cbind(aa,ttest)
colnames(tab) <- c("PA","NJ","t")
tab[,1:2] <- sapply(tab[,1:2],as.numeric)
tab$Var <- c("Burger King","KFC","Roy Rogers","Wendy's","FTE-Wave1","FTE-Wave2")
#Reordering column names
tab2 <- tab[,c(4,2,1,3)]
rownames(tab2) <- NULL
huxtable(tab2)
Var | NJ | PA | t |
---|---|---|---|
Burger King | 41.1 | 44.3 | -0.515 |
KFC | 20.5 | 15.2 | 1.16 |
Roy Rogers | 24.8 | 21.5 | 0.623 |
Wendy's | 13.6 | 19 | -1.12 |
FTE-Wave1 | 20.4 | 23.3 | -2 |
FTE-Wave2 | 21 | 21.2 | -0.128 |
model1 <- lm(demp~state,data=df)
huxtable <- huxreg(model1, coefs = c("State"="state"),
statistics = c("N. obs." = "nobs","R squared" = "r.squared"))
huxtable
(1) | |
---|---|
State | 2.750 * |
(1.154) | |
N. obs. | 384 |
R squared | 0.015 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
As per Table 3 in the paper, the coeff is 2.76 whereas with OLS what I get is 2.75
\(y_{i,t}\) = \(\alpha\) + \(\beta S_i\) + \(\tau T_t\) + \(\gamma (S_i * T_t)\) + \(\varepsilon_{i,t}\)
where:
\(y_{i,t}\) is the \(i^{th}\) observation at time t,
\(S_i\) is the dummy for state, 1 for NJ and 0 for PA,
\(T_t\) is the dummy for time, 0 for before and 1 is after the implementation of the policy,
\(\varepsilon_{i,t}\) is the error term
df$time_1 <- rep(0,410)
df$time_2 <- rep(1,410)
df1 <- df %>% select(c(1,2,3,4,13,14))
df_new <- reshape(df1, idvar = c("id","state"), varying = list(c("emptot","emptot2"),c("time_1","time_2")), v.names = c("emp_toto","timeofyr"), direction = "long")
head(df_new,6)
id | state | time | emp_toto | timeofyr |
---|---|---|---|---|
46 | 0 | 1 | 40.5 | 0 |
49 | 0 | 1 | 13.8 | 0 |
506 | 0 | 1 | 8.5 | 0 |
56 | 0 | 1 | 34 | 0 |
61 | 0 | 1 | 24 | 0 |
62 | 0 | 1 | 20.5 | 0 |
df_new[is.na(df_new)]=0
model2 <- lm(emp_toto ~ state, data=df_new)
model3 <- lm(emp_toto ~ state + timeofyr + state*timeofyr, data=df_new)
model4 <- plm(emp_toto ~ state + timeofyr + state*timeofyr, data=df_new,index=c("id"))
huxtable <- huxreg("Simple Reg" = model2,"Diff-in-Diff" = model3,"Diff-in-Diff using PLM" = model4, coefs = c("State"="state","Time" = "timeofyr","State:Time" = "state:timeofyr" ),
statistics = c("N. obs." = "nobs","R squared" = "r.squared"))%>%
set_caption("Table: Simple regression and the Diff-in-Diff regression")
huxtable
Simple Reg | Diff-in-Diff | Diff-in-Diff using PLM | |
---|---|---|---|
State | -1.642 | -2.919 * | 0.973 |
(0.882) | (1.247) | (7.438) | |
Time | -2.111 | -2.111 | |
(1.585) | (1.179) | ||
State:Time | 2.554 | 2.554 | |
(1.764) | (1.312) | ||
N. obs. | 820 | 820 | 820 |
R squared | 0.004 | 0.007 | 0.009 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
DID computed from table3 in the paper:
\(\beta_{NJ,0}\) = 20.44
\(\beta_{NJ,1}\) = 21.03
\(\beta_{PA,0}\) = 23.33
\(\beta_{PA,1}\) = 21.17
\(\beta_{DID}\) = \(\beta_{NJ,1}\) - \(\beta_{NJ,0}\) - (\(\beta_{PA,1}\) - \(\beta_{PA,0}\) )
## [1] 2.75