1. Introduction
2. DiD Theory
3. Replication: Study by David Card and Alan B. Krueger about the effect of a raise in minimum wages on employment.
2022-08-10
In economics, researchers often use natural or quasi-experimental settings. These let one study a given change in the environment that splits the population at hand into a treatment group and a control group.
We want to evaluate a program or treatment.
We have treatment and control groups.
We observe them before and after.
Treatment is not random.
Other things were happening while the program was in effect.
We can’t control for all the potential confounders.
Goal: choose a quasi-experimental technique appropriate for estimating a treatment effect in a given situation, and evaluate its validity.
Quasi-experiment: a situation where you, as researcher, did not assign people to treatment or control. The context isolates the pathway between treatment and outcome.
The DiD approach includes a before-after comparison for a treatment and control group. This is a combination of:
A cross-sectional comparison (= compare a sample that was treated to a non-treated control group)
A before-after comparison (= compare treatment group with itself, before and after the treatment)
Regression: \(Y_{i} = \beta_{0} + \beta_{1}T_{i} + \beta_{2}P_{i} + \beta_{3}T_{i}P_{i} + u_{i}\)
\(Y_{i} =\) outcome
\(T_{i} =\) 1 if treatment
\(P_{i} =\) 1 if after event
\(\hat{\beta_{3}} = (\overline{Y}^{TG,AT} - \overline{Y}^{TG,BT}) - (\overline{Y}^{CG,AT} - \overline{Y}^{CG,BT})\)
\(\hat{\beta_{3}} = ((\beta_{0}+\beta_{1}+\beta_{2}+\beta_{3}) - (\beta_{0}+\beta_{1})) - ((\beta_{0}+\beta_{2})-(\beta_{0})) = \beta_{3}\)
The DiD estimator relies on the parallel trend assumption. That means without the change in the environment, the change in the outcome variable would have been the same for the two groups (counterfactual outcome).
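To make the algebra concrete, the following minimal sketch builds a toy data set in base R with known coefficients and checks that the interaction coefficient from lm() equals the difference of the four group means. The coefficient values and the names b0 to b3, toy, treat, post and y are illustrative assumptions and are not taken from any study.

```r
# Illustrative coefficients (assumed values, chosen only for this sketch)
b0 <- 10; b1 <- 2; b2 <- -1; b3 <- 3

# Five units in each of the four group-by-period cells
toy <- expand.grid(unit = 1:5, treat = c(0, 1), post = c(0, 1))

# Small mean-zero deviations within each cell stand in for the error term u
dev <- c(-0.2, -0.1, 0, 0.1, 0.2)
toy$y <- b0 + b1 * toy$treat + b2 * toy$post +
  b3 * toy$treat * toy$post + dev[toy$unit]

# DiD from the four cell means (rows: treat, columns: post)
m <- tapply(toy$y, list(toy$treat, toy$post), mean)
did_means <- (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])

# DiD from the regression: coefficient on the interaction term
fit <- lm(y ~ treat + post + treat:post, data = toy)
did_ols <- unname(coef(fit)["treat:post"])

print(c(means = did_means, ols = did_ols))
```

Because the deviations average to zero within every cell, both numbers equal b3 up to floating-point error.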
The validity of the DiD approach is closely related to the similarity of the treatment and control group. Hence, some plausibility checks should be conducted:
Compute Placebo-DiD for periods without a change in the environment.
For (longer) time series: check and demonstrate the parallel time trends.
Use an alternative control group (if available): the estimate should be the same.
Replace Y by an alternative outcome which is definitely independent of the treatment (the DiD estimator should be 0).
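The first check can be sketched as follows. This is a toy illustration with assumed group means over three periods, where the policy change happens between period 2 and period 3. The helper did_between() is a hypothetical name introduced only for this example.

```r
# Toy group means over three periods; the policy change happens between
# period 2 and period 3 (all numbers are assumed for illustration)
means <- data.frame(
  period  = c(1, 2, 3),
  treated = c(20, 21, 25),
  control = c(23, 24, 25)
)

# Hypothetical helper: DiD between two periods `from` and `to`
did_between <- function(d, from, to) {
  d_treated <- d$treated[d$period == to] - d$treated[d$period == from]
  d_control <- d$control[d$period == to] - d$control[d$period == from]
  d_treated - d_control
}

placebo_did <- did_between(means, from = 1, to = 2)  # no policy change
actual_did  <- did_between(means, from = 2, to = 3)  # policy change

print(c(placebo = placebo_did, actual = actual_did))
```

A placebo estimate close to zero is consistent with parallel trends before the treatment. It does not prove that the trends would have stayed parallel afterwards.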
Ashenfelter's dip: earnings often fall just prior to entering a training program, which complicates the measurement of the treatment effect.
- Over-estimation of the treatment effect.
- The treatment under study is specific to a particular target group. (For example, an effect measured for adults over 50 years of age will not necessarily carry over to young people under 18.)
A functional form refers to the algebraic form of a relationship between a dependent variable and regressors or explanatory variables.
- Consider the difference in logs.
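Why the choice between levels and logs matters can be seen in a small sketch with assumed numbers. Both groups grow by the same 10% and there is no treatment effect, yet the DiD in levels is not zero because the groups start from different levels.

```r
# Assumed toy outcomes: without any treatment effect, both groups grow by 10%
before <- c(treated = 100, control = 200)
after  <- before * 1.10

# DiD in levels: nonzero although there is no treatment effect
did_levels <- unname((after["treated"] - before["treated"]) -
                     (after["control"] - before["control"]))

# DiD in logs: zero, because equal growth rates are parallel in logs
did_logs <- unname((log(after["treated"]) - log(before["treated"])) -
                   (log(after["control"]) - log(before["control"])))

print(c(levels = did_levels, logs = did_logs))
```

Whether parallel trends are more plausible in levels or in logs depends on the application. The sketch only shows that the two specifications can disagree.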
Parallel trends are more plausible over a short time period than over a long one.
From a policy point of view, however, the long-term effect is usually of interest.
It is possible to apply the DiD estimator, if both groups are affected by a policy change, but with different doses.
But: DiD estimator may be misleading, if the intensity of the response is different between the two groups.
In this section I replicate a study by David Card and Alan B. Krueger about the effect of a raise in minimum wages on employment.
Conventional economic theory suggests that in a labour market with perfect competition an increase in the minimum wage leads to an increase in unemployment.
In April 1992, the U.S. state of New Jersey (NJ) raised the minimum wage from $4.25 to $5.05. Card and Krueger (1994) use a DiD approach and show that this increase in minimum wages led to an increase in employment in the sector of fast food restaurants.
The control group in their setting is the neighbouring U.S. state of Pennsylvania (PA), which was not subject to this policy change. The authors conducted a survey before and after the raise of the minimum wage with a representative sample of fast food restaurants in NJ and PA. This setting can be regarded as quasi-experimental, as the two states are not identical in many respects and the legislative procedure to raise the minimum wage was not initiated at random.
\(emp_{it} = \beta_{0} + \beta_{1}NJ_{i} + \beta_{2}POST_{t} + \beta_{3}NJ_{i}POST_{t} + u_{it}\)
\(i\): denotes a fast food restaurant
\(t\): denotes time
\(emp_{it}\): number of employees in restaurant \(i\) at time \(t\)
\(emp^{PA,feb} = \beta_{0} + u\)
\(emp^{PA,nov} = \beta_{0} + \beta_{2} + u\)
\(emp^{NJ,feb} = \beta_{0} + \beta_{1} + u\)
\(emp^{NJ,nov} = \beta_{0} + \beta_{1} + \beta_{2} + \beta_{3} + u\)
Difference-in-Differences Estimator:
\(counterfactual = emp^{NJ,feb} - (emp^{PA,feb} - emp^{PA,nov})\)
\(counterfactual = (\beta_{0} + \beta_{1}) - ((\beta_{0}) - (\beta_{0} + \beta_{2}))\)
\(counterfactual = \beta_{0} + \beta_{1} + \beta_{2}\)
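The remaining step can be written out using only the definitions above, with the error terms omitted. The DiD estimator is the gap between the observed November outcome in NJ and this counterfactual:

```latex
\begin{aligned}
DiD &= emp^{NJ,nov} - counterfactual \\
    &= (\beta_{0} + \beta_{1} + \beta_{2} + \beta_{3}) - (\beta_{0} + \beta_{1} + \beta_{2}) \\
    &= \beta_{3}
\end{aligned}
```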
library(dplyr)
library(readr)
library(ggplot2)
library(tidyr)
library(sjlabelled)
library(ggrepel)
library(scales)
library(ggpubr)
library(plm)
library(lmtest)
library(stringi)
library(tidyverse)
Raw Data
Cleaned Data
Transposed Data (February 1992 vs November 1992)
Final Data
Generate a variable that measures employment. According to the paper, full-time equivalents (FTE, emptot) consist of full-time employees (empft), managers (nmgrs) and part-time employees (emppt), where the latter are weighted by a factor of 0.5. I also generate the share of full-time employees in total FTE (pct_fte).
card_krueger_1994_mod <- card_krueger_1994 %>%
mutate(emptot = empft + nmgrs + 0.5 * emppt,
pct_fte = empft / emptot * 100)
Table 2 in the paper shows extensive descriptive statistics of the dataset. Some of them are replicated in this section, in order to show that reading and processing the data was not prone to errors.
##         state
## chain    New Jersey Pennsylvania
##   bk          41.1%        44.3%
##   kfc         20.5%        15.2%
##   roys        24.8%        21.5%
##   wendys      13.6%        19.0%
Next, I am adding the mean values of certain variables of the first wave of the survey grouped by each state …
## # A tibble: 4 × 3
##   variable `New Jersey` Pennsylvania
##   <chr>           <dbl>        <dbl>
## 1 emptot          20.4         23.3
## 2 pct_fte         32.8         35.0
## 3 wage_st          4.61         4.63
## 4 hrsopen         14.4         14.5
… as well as the mean values of the second wave of the survey. My calculations are in line with the numbers published in the paper.
## # A tibble: 4 × 3
##   variable `New Jersey` Pennsylvania
##   <chr>           <dbl>        <dbl>
## 1 emptot          21.0         21.2
## 2 pct_fte         35.9         30.4
## 3 wage_st          5.08         4.62
## 4 hrsopen         14.4         14.7
Before calculating the DiD estimator with OLS, I want to deduce it by means of differencing the mean values of employment (emptot) between each group. This is easily done with functions group_by() and summarise(). We obtain four groups with distinct mean values!
differences <- card_krueger_1994_mod %>%
  group_by(observation, state) %>%
  summarise(emptot = mean(emptot, na.rm = TRUE))

# Treatment group (NJ) before treatment
njfeb <- differences[1,3]

# Control group (PA) before treatment
pafeb <- differences[2,3]

# Treatment group (NJ) after treatment
njnov <- differences[3,3]

# Control group (PA) after treatment
panov <- differences[4,3]
The Average Treatment Effect (ATE) in this setting can be determined in two ways:
(njnov-njfeb)-(panov-pafeb)
##     emptot
## 1 2.753606
(njnov-panov)-(njfeb-pafeb)
##     emptot
## 1 2.753606
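As a rough cross-check, the same calculation can be repeated by hand with the rounded means of emptot printed in the two descriptive tables. This is only a sketch. Because the inputs are rounded to one decimal place, the result matches the exact estimate only approximately. The variable names are chosen for this example.

```r
# Rounded mean FTE employment (emptot) as printed in the descriptive tables
nj_feb <- 20.4; nj_nov <- 21.0
pa_feb <- 23.3; pa_nov <- 21.2

# Way 1: difference of the before-after changes
did_a <- (nj_nov - nj_feb) - (pa_nov - pa_feb)

# Way 2: change in the cross-sectional gap between the states
did_b <- (nj_nov - pa_nov) - (nj_feb - pa_feb)

print(round(c(did_a, did_b), 2))
```

Both orderings give about 2.7 FTE, close to the 2.75 obtained from the unrounded means. They are algebraically the same expression with the terms rearranged.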
First, I use the mean values of variable emptot calculated in the previous step for NJ and PA in February and November. Additionally, we require the outcome of NJ if the treatment (raise of the minimum wage) had not happened. This is called the counterfactual outcome (nj_counterfactual). The DiD assumption states that, without the treatment, the trends of treatment and control group would have been identical. Hence, without the treatment the employment (emptot) of NJ would have declined from February to November by the same amount as in PA.
# Calculate counterfactual outcome
nj_counterfactual <- tibble(
observation = c("February 1992","November 1992"),
state = c("New Jersey (Counterfactual)","New Jersey (Counterfactual)"),
emptot = as.numeric(c(njfeb, njfeb-(pafeb-panov))))
# Data points for treatment event
intervention <- tibble(
observation = c("Intervention", "Intervention", "Intervention"),
state = c("New Jersey", "Pennsylvania", "New Jersey (Counterfactual)"),
emptot = c(19.35, 22.3, 19.35))
# Combine data
did_plotdata <- bind_rows(differences, nj_counterfactual, intervention)
In November 1992, the distance between the actual and counterfactual employment (emptot) of NJ identifies the causal effect of an increase in minimum wages on employment.
With linear regression, this result can be obtained easily. First, we need to create two dummy variables. One indicates the start of the treatment (time) and is equal to zero before the treatment and equal to one after the treatment. The other variable separates the observations into a treatment and control group (treated). This dummy variable is equal to one for fast food restaurants located in NJ and equal to zero for fast food restaurants located in PA.
card_krueger_1994_mod <- mutate(card_krueger_1994_mod,
time = ifelse(observation == "November 1992", 1, 0),
treated = ifelse(state == "New Jersey", 1, 0)
)
The DiD estimator is an interaction between both dummy variables. This interaction can be specified with the : operator in the formula of function lm() in addition to the individual dummy variables.
Hint: Another possibility is to only specify time*treated in the formula which adds the individual dummy variables automatically.
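A minimal sketch of both formula variants follows. It runs on a small made-up data set rather than on the Card and Krueger data. The data frame toy and its values are assumptions for illustration, and only the column names mirror the ones used in this tutorial.

```r
# Made-up 2x2 data: two restaurants per state and period (illustrative values)
toy <- data.frame(
  time    = c(0, 0, 0, 0, 1, 1, 1, 1),
  treated = c(0, 0, 1, 1, 0, 0, 1, 1),
  emptot  = c(23, 24, 20, 21, 21, 22, 20.5, 21.5)
)

# Variant 1: main effects plus the interaction spelled out with `:`
fit_long <- lm(emptot ~ time + treated + time:treated, data = toy)

# Variant 2: `*` expands to the same three terms
fit_short <- lm(emptot ~ time * treated, data = toy)

print(coef(fit_long))
print(coef(fit_short))
```

Both calls return identical coefficients. In this toy data the coefficient of time:treated equals the difference of the four cell means, (21 - 20.5) - (21.5 - 23.5) = 2.5.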
The coefficient of time:treated is the difference-in-differences estimator. The treatment (raise of the minimum wage) leads on average to an increase of employment (emptot) in NJ by 2.75 FTE.
Balanced sample: In R we can get this result by computing a fixed effects model, which is sometimes also called a within estimator. I am using R package plm to run this regression with function plm() and argument model = "within". Beforehand, the data has to be declared as a panel with function pdata.frame(). Variable sheet uniquely identifies each fast food restaurant. Additionally, we need the function coeftest() from R package lmtest in order to obtain the correct standard errors, which must be clustered by sheet.
# Declare as panel data
panel <- pdata.frame(card_krueger_1994_mod, "sheet")
# Within model
did.reg <- plm(emptot ~ time + treated + time:treated,
data = panel, model = "within")
# obtain clustered standard errors
coeftest(did.reg, vcov = function(x)
vcovHC(x, cluster = "group", type = "HC1"))
##
## t test of coefficients:
##
##               Estimate Std. Error t value Pr(>|t|)
## time2          -2.2833     1.2465 -1.8319  0.06775 .
## time2:treated   2.7500     1.3359  2.0585  0.04022 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
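With only two periods and a balanced panel, the within estimator gives the same point estimate as a first-difference regression. The sketch below illustrates this alternative with base R on a made-up balanced panel. It does not use plm or the Card and Krueger data, and all names and values are assumptions. Each restaurant's change in employment is regressed on the treatment dummy, and the slope is the DiD estimate.

```r
# Made-up balanced panel in wide form: one row per restaurant (illustrative values)
wide <- data.frame(
  sheet   = 1:6,
  treated = c(0, 0, 0, 1, 1, 1),
  emp_feb = c(23, 25, 22, 20, 19, 21),
  emp_nov = c(21, 22, 21, 21, 20, 21)
)

# The within-restaurant change removes restaurant fixed effects
wide$d_emp <- wide$emp_nov - wide$emp_feb

# Intercept = common time effect, slope on treated = DiD estimate
fd_fit <- lm(d_emp ~ treated, data = wide)
did_fd <- unname(coef(fd_fit)["treated"])

# Same number from the four group means
is_nj <- wide$treated == 1
did_means <- (mean(wide$emp_nov[is_nj]) - mean(wide$emp_feb[is_nj])) -
  (mean(wide$emp_nov[!is_nj]) - mean(wide$emp_feb[!is_nj]))

print(c(first_difference = did_fd, group_means = did_means))
```

The sketch covers only the point estimate. The clustered standard errors in the output above still require the plm and lmtest functions.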
Baker, A. (September 25, 2019). Difference-in-Differences Methodology. Retrieved on August 3 2022, from https://andrewcbaker.netlify.app/2019/09/25/difference-in-differences-methodology/
Leppert, P. (September 18, 2020). R Tutorial: Difference-in-Differences. Retrieved on August 3 2022, from https://rpubs.com/phle/r_tutorial_difference_in_differences
Ashenfelter, O. (2007). Orley Ashenfelter, Distinguished Fellow 2007. Retrieved on August 3 2022, from https://www.aeaweb.org/about-aea/honors-awards/distinguished-fellows/orley-ashenfelter