#loading necessary packages
library(haven)
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.2. https://CRAN.R-project.org/package=stargazer
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Reference: Card, D. and Krueger, A. B. (1994). "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania." American Economic Review 84(4): 772-793.
Briefly answer these questions:
a. What is the causal link the paper is trying to reveal?
The paper seeks to identify the causal effect of a minimum-wage increase on establishment-level employment outcomes.
b. What would be the ideal experiment to test this causal link?
The ideal experiment to test this causal link is to perform difference-in-differences (DiD) estimation. DiD estimates the effect of a new policy by comparing outcomes for units exposed to the intervention (treated) with outcomes for units not exposed to it (control), both before and after the intervention.
c. What is the identification strategy?
The identification strategy exploits the new law that raised the minimum wage in New Jersey. The effect of the higher minimum wage is identified from two groups (New Jersey, where the minimum wage increased, and Pennsylvania, where it stayed constant) and two periods (before and after the law took effect). Comparing employment, wages, and prices at stores in New Jersey and Pennsylvania before and after the rise offers a simple method for evaluating the effects of the higher minimum wage.
d. What are the assumptions / threats to this identification strategy?
The authors assume that New Jersey is a relatively small state whose economy is closely linked to those of nearby states. A control group of fast-food stores in eastern Pennsylvania therefore forms a natural basis for comparison with restaurants in New Jersey. Seasonal patterns of employment are also assumed to be similar in New Jersey and eastern Pennsylvania.
a. Load data from Card and Krueger AER 1994
#loading the dataset
wagedata <- read.csv("CardKrueger1994_fastfood.csv")
dim(wagedata)
## [1] 410 12
head(wagedata)
## id state emptot emptot2 demp chain bk kfc roys wendys wage_st wage_st2
## 1 46 0 40.50 24.0 -16.50 1 1 0 0 0 NA 4.30
## 2 49 0 13.75 11.5 -2.25 2 0 1 0 0 NA 4.45
## 3 506 0 8.50 10.5 2.00 2 0 1 0 0 NA 5.00
## 4 56 0 34.00 20.0 -14.00 4 0 0 0 1 5.0 5.25
## 5 61 0 24.00 35.5 11.50 4 0 0 0 1 5.5 4.75
## 6 62 0 20.50 NA NA 4 0 0 0 1 5.0 NA
b. Verify that the data is correct
The means of the key variables (chain shares in percent and FTE employment in each wave) match the values in Table 2 of Card and Krueger (1994), which suggests that the data were loaded correctly.
summary <- wagedata %>%
group_by(state) %>%
summarize(Burger_King=mean(bk)*100, KFC = mean(kfc)*100, Roy_Rogers = mean(roys)*100, Wendys=mean(wendys)*100, FTE_employment_wave1= mean(emptot, na.rm=TRUE), FTE_employment_wave2= mean(emptot2, na.rm = TRUE))
summary$state[1] <- "PA"
summary$state[2] <- "NJ"
table2 <- t(summary)
stargazer(table2 , type="text", title="Table 2: Means of Key Variables", align=TRUE, dep.var.labels = "State")
##
## Table 2: Means of Key Variables
## ======================================
## state PA NJ
## Burger_King 44.30380 41.08761
## KFC 15.18987 20.54381
## Roy_Rogers 21.51899 24.77341
## Wendys 18.98734 13.59517
## FTE_employment_wave1 23.33117 20.43941
## FTE_employment_wave2 21.16558 21.02743
## --------------------------------------
c. Use OLS to obtain their Diff-in-diff estimator
#OLS Estimation
model1 <- lm(demp~state, data=wagedata)
stargazer(model1, type="text", title="Table 3: AVERAGE EMPLOYMENT PER STORE BEFORE AND AFTER THE RISE IN NEW JERSEY MINIMUM WAGE", align=TRUE, dep.var.labels = "Difference, NJ-PA (iii)", covariate.labels=c("Change in mean FTE employment"))
##
## Table 3: AVERAGE EMPLOYMENT PER STORE BEFORE AND AFTER THE RISE IN NEW JERSEY MINIMUM WAGE
## =========================================================
## Dependent variable:
## ---------------------------
## Difference, NJ-PA (iii)
## ---------------------------------------------------------
## Change in mean FTE employment 2.750**
## (1.154)
##
## Constant -2.283**
## (1.036)
##
## ---------------------------------------------------------
## Observations 384
## R2 0.015
## Adjusted R2 0.012
## Residual Std. Error 8.968 (df = 382)
## F Statistic 5.675** (df = 1; 382)
## =========================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
The OLS coefficient on the state dummy, 2.75 (SE = 1.154), estimates the NJ-PA difference in the change in mean FTE employment between wave 1 and wave 2, and it is statistically significant at the 5% level. It is slightly lower than the corresponding estimate in Table 3 of the paper (2.76, SE = 1.36).
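The slope in a regression of the change in employment on the state dummy is simply the difference in mean changes between the two groups. The following is a minimal sketch on made-up numbers; the data frame `toy` and its values are illustrative assumptions, not the Card and Krueger data.

```r
# Toy data: hypothetical stores, NOT the Card-Krueger sample.
toy <- data.frame(
  state = c(0, 0, 0, 1, 1, 1, 1),     # 0 = PA (control), 1 = NJ (treated)
  demp  = c(-3, -1, -2, 1, 0, 2, 1)   # change in FTE employment, wave 2 - wave 1
)

# OLS of the change on the treatment dummy
fit <- lm(demp ~ state, data = toy)

# Difference in mean changes, NJ minus PA
mean_diff <- mean(toy$demp[toy$state == 1]) - mean(toy$demp[toy$state == 0])

# The slope equals the difference in mean changes;
# the intercept is the mean change in the control group (PA)
slope     <- unname(coef(fit)["state"])
intercept <- unname(coef(fit)["(Intercept)"])
print(c(slope = slope, mean_diff = mean_diff, intercept = intercept))
```

In this toy example the PA mean change is -2 and the NJ mean change is 1, so both `slope` and `mean_diff` equal 3.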
d. What would be the equation of a standard “difference in difference” regression? Just write down the equation.
The 2-period DiD regression is:
\[ Y_{i,t} = \alpha + \beta D_i + \tau T_t + \gamma (D_i \times T_t) + e_{i,t} \]
where \(Y_{i,t}\) = FTE employment at store i in wave t (before and after the wage change)
\(D_i\) = dummy variable for the state of store i (NJ = 1, PA = 0)
\(T_t\) = time-period dummy (before vs. after, i.e. wave 1 vs. wave 2)
\(\gamma\) = coefficient on the interaction term, i.e. the DiD estimator
\(e_{i,t}\) = error term
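To see how the coefficients map to group means, take conditional expectations of the equation above in each of the four state-by-wave cells. This is a sketch under two assumptions that are not stated in the text: the error has mean zero in every cell, and the dummies are coded \(D_i = 1\) for NJ and \(T_t = 1\) for wave 2.

\[
\begin{aligned}
E[Y_{i,t} \mid D_i = 0, T_t = 0] &= \alpha \\
E[Y_{i,t} \mid D_i = 1, T_t = 0] &= \alpha + \beta \\
E[Y_{i,t} \mid D_i = 0, T_t = 1] &= \alpha + \tau \\
E[Y_{i,t} \mid D_i = 1, T_t = 1] &= \alpha + \beta + \tau + \gamma
\end{aligned}
\]

Subtracting the change in the control state from the change in the treated state isolates the interaction coefficient:

\[
\big( E[Y \mid 1, 1] - E[Y \mid 1, 0] \big) - \big( E[Y \mid 0, 1] - E[Y \mid 0, 0] \big) = (\tau + \gamma) - \tau = \gamma
\]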
e. Run the regression you wrote up in part d
Reshaping the dataset in long form
DiD <- reshape(wagedata,
               varying   = c("emptot", "emptot2"),
               v.names   = c("employment"),
               timevar   = "Wave",
               times     = c("1", "2"),
               idvar     = c("id", "state"),
               direction = "long")
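What the wide-to-long reshape does is easiest to see on a few rows. The sketch below uses a hypothetical three-store data frame (`wide` and its values are illustrative assumptions) with the same column names as the two employment variables.

```r
# Toy wide-format data: hypothetical stores, NOT the Card-Krueger sample.
wide <- data.frame(
  id      = c(1, 2, 3),
  state   = c(0, 1, 1),
  emptot  = c(20, 15, 30),   # FTE employment, wave 1
  emptot2 = c(18, 17, 31)    # FTE employment, wave 2
)

# Stack the two employment columns into one, adding a Wave indicator
long <- reshape(wide,
                varying   = c("emptot", "emptot2"),
                v.names   = "employment",
                timevar   = "Wave",
                times     = c("1", "2"),
                idvar     = "id",
                direction = "long")

print(long)
```

Each store now appears twice, once per wave, and time-invariant columns such as `state` are repeated on both rows.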
Running the model
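# Note: demp is already the wave 2 - wave 1 change and is constant within a store across waves,
# so the Wave and state:Wave coefficients below are exactly zero; the outcome that matches the
# equation in part d is the long-format column `employment`
model2 <- lm(demp~state+Wave+state*Wave, data=DiD)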
stargazer(model2, type="text", title="Difference in Difference Regression", align=TRUE, dep.var.labels = "DiD",keep.stat = c("n", "rsq"), covariate.labels=c("Change in FTE Employment" ,"After wage change", "State*Wave2"), omit=c("Constant") )
##
## Difference in Difference Regression
## ====================================================
## Dependent variable:
## ---------------------------
## DiD
## ----------------------------------------------------
## Change in FTE Employment 2.750**
## (1.154)
##
## After wage change -0.000
## (1.464)
##
## State*Wave2 0.000
## (1.633)
##
## ----------------------------------------------------
## Observations 768
## R2 0.015
## ====================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
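The regression in part d has the level of FTE employment on the left-hand side; in the long-format data that corresponds to the `employment` column created by the reshape. The following is a minimal sketch on a hypothetical balanced panel (the data frame `toy_long` and its values are illustrative assumptions, not the Card and Krueger data). It shows that the interaction coefficient in such a levels regression reproduces the DiD of the four cell means.

```r
# Toy balanced panel in long format: hypothetical values, NOT the Card-Krueger sample.
toy_long <- data.frame(
  id         = rep(1:6, times = 2),
  state      = rep(c(0, 0, 0, 1, 1, 1), times = 2),   # 0 = PA, 1 = NJ
  Wave       = rep(c("1", "2"), each = 6),
  employment = c(24, 22, 23, 20, 21, 19,               # wave 1
                 21, 20, 22, 21, 22, 20)               # wave 2
)

# Levels regression with a state-by-wave interaction
fit <- lm(employment ~ state * Wave, data = toy_long)

# DiD computed directly from the four state-by-wave cell means
m <- with(toy_long, tapply(employment, list(state, Wave), mean))
did_means <- (m["1", "2"] - m["1", "1"]) - (m["0", "2"] - m["0", "1"])

# Coefficient on the interaction term
gamma_hat <- unname(coef(fit)["state:Wave2"])
print(c(gamma_hat = gamma_hat, did_means = did_means))
```

Here the cell means are 23 and 21 for PA and 20 and 21 for NJ, so both quantities equal (21 - 20) - (21 - 23) = 3.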
f. Compute the difference-in-differences estimator “by hand”
Computing the DiD estimator from the values in Table 3 (columns 1, 2, and 3; rows 1, 2, and 4)
For Wave 1, before wage change
\(\beta_{PA, before}=23.33\)
\(\beta_{NJ, before}=20.44\)
\(\Delta\beta_{NJ-PA, before} = 20.44 - 23.33 = -2.89\)
For Wave 2, after wage change
\(\beta_{PA, after}=21.17\)
\(\beta_{NJ, after}=21.03\)
\(\Delta\beta_{NJ-PA, after} = 21.03 - 21.17 = -0.14\)
The DiD estimator
\(\beta_{DiD} = \Delta\beta_{NJ-PA, after} - \Delta\beta_{NJ-PA, before} = -0.14 - (-2.89) = 2.75\)
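The same arithmetic can be reproduced in a few lines of R. The four means below are the rounded values quoted above; the variable names are illustrative.

```r
# Wave means as quoted in the text above (rounded to two decimals)
pa_before <- 23.33
nj_before <- 20.44
pa_after  <- 21.17
nj_after  <- 21.03

diff_before <- nj_before - pa_before   # NJ - PA gap in wave 1
diff_after  <- nj_after  - pa_after    # NJ - PA gap in wave 2
did         <- diff_after - diff_before  # difference-in-differences

print(round(c(diff_before = diff_before, diff_after = diff_after, did = did), 2))
```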
#State PA=0
#State NJ =1
PA <- filter(wagedata, state==0)
NJ <- filter(wagedata, state==1)
modBPA <- lm(emptot~state, data=PA)
modAPA <- lm(emptot2~state, data=PA)
modDPA <- lm(demp~state, data=PA)
stargazer(modBPA, modAPA, modDPA ,type="text", title="Difference in Difference Regression (Pennsylvania)", align=TRUE, dep.var.labels = c("FTE employment before", "FTE employment after", "Change in FTE employment"),keep.stat = c("n", "rsq"), covariate.labels=c("PA (i)"))
##
## Difference in Difference Regression (Pennsylvania)
## ================================================================================
## Dependent variable:
## -------------------------------------------------------------------
## FTE employment before FTE employment after Change in FTE employment
## (1) (2) (3)
## --------------------------------------------------------------------------------
## PA (i)
##
##
## Constant 23.331*** 21.166*** -2.283*
## (1.351) (0.943) (1.253)
##
## --------------------------------------------------------------------------------
## Observations 77 77 75
## R2 0.000 0.000 0.000
## ================================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
modBNJ <- lm(emptot~state, data=NJ)
modANJ <- lm(emptot2~state, data=NJ)
modDNJ <- lm(demp~state, data=NJ)
stargazer(modBNJ, modANJ, modDNJ ,type="text", title="Difference in Difference Regression (New Jersey)", align=TRUE, dep.var.labels = c("FTE employment before", "FTE employment after", "Change in FTE employment"),keep.stat = c("n", "rsq"), covariate.labels=c("NJ (ii)"))
##
## Difference in Difference Regression (New Jersey)
## ================================================================================
## Dependent variable:
## -------------------------------------------------------------------
## FTE employment before FTE employment after Change in FTE employment
## (1) (2) (3)
## --------------------------------------------------------------------------------
## NJ (ii)
##
##
## Constant 20.439*** 21.027*** 0.467
## (0.508) (0.520) (0.481)
##
## --------------------------------------------------------------------------------
## Observations 321 319 309
## R2 0.000 0.000 0.000
## ================================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
modBdiff <- lm(emptot~state, data=wagedata)
modAdiff <- lm(emptot2~state, data=wagedata)
# Named modDdiff so the long-format data frame DiD created earlier is not overwritten
modDdiff <- lm(demp~state, data=wagedata)
stargazer(modBdiff, modAdiff, modDdiff ,type="text", title="Difference in Difference Regression (NJ-PA)", align=TRUE, dep.var.labels = c("FTE employment before", "FTE employment after", "Change in FTE employment"),keep.stat = c("n", "rsq"), covariate.labels=c("Differences, NJ-PA, (iii)"), omit=c("Constant"))
##
## Difference in Difference Regression (NJ-PA)
## =============================================================================================
## Dependent variable:
## -------------------------------------------------------------------
## FTE employment before FTE employment after Change in FTE employment
## (1) (2) (3)
## ---------------------------------------------------------------------------------------------
## Differences, NJ-PA, (iii) -2.892** -0.138 2.750**
## (1.230) (1.156) (1.154)
##
## ---------------------------------------------------------------------------------------------
## Observations 398 396 384
## R2 0.014 0.00004 0.015
## =============================================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
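The three columns above are estimated on different samples (398, 396, and 384 observations). That is one plausible reason why the difference between the first two columns, -0.138 - (-2.892) = 2.754, is not exactly the 2.750 in the third column. The sketch below illustrates the mechanism on hypothetical data with one missing wave-2 value; `toy`, `grp_mean`, and the numbers are illustrative assumptions, not the Card and Krueger data.

```r
# Toy data with a missing wave-2 value: hypothetical, NOT the Card-Krueger sample.
toy <- data.frame(
  state   = c(0, 0, 1, 1, 1),
  emptot  = c(20, 24, 18, 22, 26),
  emptot2 = c(19, 21, 20, 23, NA)     # last NJ store not observed in wave 2
)
toy$demp <- toy$emptot2 - toy$emptot  # NA whenever either wave is missing

# Group means that skip missing values
grp_mean <- function(x, g) tapply(x, g, mean, na.rm = TRUE)

# (1) Difference of cell means, each using all available observations
b <- grp_mean(toy$emptot,  toy$state)
a <- grp_mean(toy$emptot2, toy$state)
did_unbalanced <- unname((a["1"] - b["1"]) - (a["0"] - b["0"]))

# (2) Mean of within-store changes, using only stores seen in both waves
d <- grp_mean(toy$demp, toy$state)
did_balanced <- unname(d["1"] - d["0"])

print(c(unbalanced = did_unbalanced, balanced = did_balanced))
```

The two numbers differ (1.5 versus 3.5 in this toy example) because the store with a missing wave-2 value still contributes to the wave-1 mean in the first calculation but drops out of the second.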