GENERAL NOTES

Objective of Problem Set 5

  • This is the first of two problem sets focused on the difference-in-differences method. The goal of this first problem set is to review the fundamentals of the method using R. In the second problem set, you will build on this foundation by combining the difference-in-differences approach with other analytic methods in R.

Submission process

  • When you are done with the problem set, publish it on RPubs using your temporary account.
  • Copy the RPubs link of your work and submit it on Canvas.
  • If two submissions are entirely identical, whoever sent me their RPubs link last will be treated as having copied the other, so timely submission matters. Own your work. I may randomly ask for your R script and .Rmd files for double-checking purposes. As a standard practice, work in a script file before building your code chunks in the .Rmd file. Your .Rmd file and RPubs submission page MUST show the code used to produce every output you present in your answers.

Academic integrity

Academic integrity is the pursuit of scholarly activity in an open, honest and responsible manner. Academic integrity is a basic guiding principle for all academic activity at The Pennsylvania State University, and all members of the University community are expected to act in accordance with this principle. Consistent with this expectation, the University’s Code of Conduct states that all students should act with personal integrity, respect other students’ dignity, rights and property, and help create and maintain an environment in which all can succeed through the fruits of their efforts.

Academic integrity includes a commitment by all members of the University community not to engage in or tolerate acts of falsification, misrepresentation or deception. Such acts of dishonesty violate the fundamental ethical principles of the University community and compromise the worth of work completed by others.

# Load packages
library(pacman)
p_load(wooldridge, multiwayvcov, causalweight, lmtest, did, usmap, 
       sandwich, AER, ivmodel, haven, estimatr, tidyverse, lubridate, 
       gridExtra, stringr, readxl, reshape2, scales, broom, 
       ggplot2, fixest, lfe, stargazer, foreign, ggthemes, ggforce,  
       ggridges, latex2exp, viridis, extrafont, kableExtra, snakecase, 
       janitor, tableone, causaldata, multcomp) # etable() is provided by fixest

PROBLEM 1. Canonical Difference-in-Differences.

Practice data. The practice data for this question is the Kiel & McClain (JEEM 1995) dataset on house prices in North Andover, Massachusetts, before and after the construction of a garbage incinerator. You can load the wooldridge package by Shea (2021) to access the kielmc dataset. Our treatment variable nearinc is the binary indicator for a house located near the incinerator (\(nearinc=1\)) versus farther away from it (\(nearinc=0\)). We have two periods of data captured by the variable y81: 1978, before the construction of the incinerator (\(y81=0\)), and 1981, after the construction of the incinerator (\(y81=1\)). The outcome variable is the real price of the house in 1978 US dollars (rprice). Other variables include the distance from the house to the central business district, which takes discrete values (cbd), the number of rooms (rooms), the number of bathrooms (baths), the age of the house in years (age) and its square (agesq), the living area in square feet (area) and its log (larea), the lot size in square feet (land) and its log (lland), etc. You may call the dataset's help page (?kielmc) for a further description.

# Load `kielmc` data
data(kielmc)
attach(kielmc)
#table(nearinc)
#table(y81)
as.data.frame(table(nearinc, y81))
##   nearinc y81 Freq
## 1       0   0  123
## 2       1   0   56
## 3       0   1  102
## 4       1   1   40
summary(rprice)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   26000   59000   82000   83721  100230  300000
length(unique(cbd)) # number of clusters
## [1] 34
#?kielmc

i. Without covariates.

  • Estimate the treatment effect on house prices using the 2x2 difference-in-differences estimator without covariates, clustering the standard errors at the cbd level. You may use the multiwayvcov package by Graham et al. (2016) and the lmtest package to compute the cluster-robust standard errors. You may use packages like stargazer, modelsummary, texreg, or any other package that helps you produce well-formatted estimation tables. Interpret your results.
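
As a starting point, here is a minimal sketch of one possible workflow; the object names (did_0, vcov_cbd) are arbitrary, and the lm() + cluster.vcov() + coeftest() chain is only one of several ways to obtain the clustered coefficient table.

# Sketch: 2x2 DiD without covariates, standard errors clustered at the cbd level
did_0 <- lm(rprice ~ nearinc * y81, data = kielmc)
# Cluster-robust variance-covariance matrix at the cbd level
vcov_cbd <- multiwayvcov::cluster.vcov(did_0, kielmc$cbd)
# The coefficient on nearinc:y81 is the DiD estimate
lmtest::coeftest(did_0, vcov = vcov_cbd)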

ii. With covariates.

  • Re-estimate the treatment effect on house prices using the 2x2 difference-in-differences estimator with the set of covariates rooms, baths, age, agesq, larea, and lland, while clustering the standard errors at the cbd level; a minimal sketch covering this regression approach and the IPW approach below appears after this list.

  • Re-estimate the treatment effect on house prices using the 2x2 difference-in-differences estimator with the set of covariates rooms, baths, age, area, and land, while clustering the standard errors at the cbd level. Present a well-formatted estimation table and compare your results to the preceding ones.

  • Using the set of covariates rooms, baths, age, agesq, larea, and lland, re-estimate the treatment effect on house prices, while applying a difference-in-differences estimator based on inverse probability weighting (IPW) with cluster bootstrap. You may use the didweight command of the causalweight package. Show the treatment effect, standard errors, and p-value.

  • Using the set of covariates rooms, baths, age, area, and land, re-estimate the treatment effect on house prices, while applying a difference-in-differences estimator based on inverse probability weighting (IPW) with cluster bootstrap. Show the treatment effect, standard errors, and p-value.

  • Interpret and compare these estimation results using covariates.

  • Of the five difference-in-differences estimates, which is your preferred estimate, and why?
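
As a starting point, here is a minimal sketch of the regression-with-covariates and IPW estimations. The object names (did_x, ipw_x) and the choice of boot = 999 are arbitrary, and the didweight() argument names follow my reading of its help page, so confirm them with ?didweight before running. Swapping agesq, larea, and lland for area and land gives the remaining two estimates.

# Sketch: 2x2 DiD with the log-transformed covariate set, clustered at the cbd level
did_x <- lm(rprice ~ nearinc * y81 + rooms + baths + age + agesq + larea + lland,
            data = kielmc)
lmtest::coeftest(did_x, vcov = multiwayvcov::cluster.vcov(did_x, kielmc$cbd))

# Sketch: IPW difference-in-differences with cluster bootstrap (causalweight::didweight)
ipw_x <- didweight(y = kielmc$rprice, d = kielmc$nearinc, t = kielmc$y81,
                   x = as.matrix(kielmc[, c("rooms", "baths", "age",
                                            "agesq", "larea", "lland")]),
                   boot = 999, cluster = kielmc$cbd)
ipw_x # the returned list contains the effect, its bootstrap standard error, and the p-value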

iii. Two-Way Fixed Effects

You observed a discussion between two friends who had access to county-year panel data spanning two time periods (\(t=1,2\)). The dataset includes the outcome variable \(\left(Y_{it}\right)\) and a binary treatment indicator \(\left(D_{it}\right)\) for county \(i\) at time \(t\). In this setup, no counties were treated in the first period \(\left(D_{i1}=0 \quad \forall i \right)\), but some counties received treatment in the second period \(\left(D_{i2} \in \lbrace 0,1 \rbrace \right)\).

Your friends defined \(Y_{it}(0)\) as the potential outcome for county \(i\) in period \(t\) if it remained untreated in both periods, and \(Y_{it}(1)\) as the potential outcome for county \(i\) in period \(t\) if it was untreated at \(t=1\) but received treatment at \(t=2\). They assumed parallel trends in the average outcome for the treated and untreated counties in the absence of treatment:

\[\begin{equation} \tag{A1} \mathop{E}\left[Y_{i2}(0)-Y_{i1}(0)|D_{i}=1\right] = \mathop{E}\left[Y_{i2}(0)-Y_{i1}(0)|D_{i}=0\right], \end{equation}\] where \(D_{i}=\max \lbrace D_{i1}, D_{i2} \rbrace = D_{i2}\) is the treatment status of county \(i\).

Both friends were interested in estimating the average treatment effect on the treated \(\left(ATT\right)\) at \(t=2\) that would be robust to treatment effect heterogeneity. Friend 1 proposed estimating Equation (1) to obtain \(\tau\), while Friend 2 suggested using Equation (2) to estimate \(\delta\). In Equation (1), \(\eta_{i}\) and \(\theta_{t}\) represent county and time fixed effects, respectively, while in Equation (2), \(P_{t}\) serves as the indicator for the second period.

\[\begin{equation} \tag{1} Y_{it} = \eta_{i} + \theta_{t} + \tau D_{it} + \epsilon_{it}. \end{equation}\]

\[\begin{equation} \tag{2} Y_{it} = \alpha + \beta D_{i} + \lambda P_{t} + \delta \left(D_{i} \times P_{t}\right) + \varepsilon_{it}. \end{equation}\]

  • Your first task is to adjudicate between the two approaches. Demonstrate your reasoning.

  • Friend 3 joined your discussion and suggested estimating the equation below as in Equation (3) of Roth, Sant’Anna, Bilinski, & Poe (JoE 2023).

\[\begin{equation} \tag{3} Y_{it} = \gamma_{i} + \phi_{t} + \rho \left(D_{i} \times P_{t}\right) + \nu_{it}. \end{equation}\]

Using a subset of their dataset provided below, compare \(\hat{\tau}\), \(\hat{\delta}\), and \(\hat{\rho}\), clustering standard errors at the \(i\) level; a minimal sketch of these estimations follows the data below. You may use packages like stargazer, modelsummary, texreg, or any other package that helps you produce well-formatted estimation tables. Draw your conclusions.

# Friend's data
i <- c(1, 1, 2, 2, 3, 3, 4, 4)
t <- c(1, 2, 1, 2, 1, 2, 1, 2)
P_t <- c(0, 1, 0, 1, 0, 1, 0, 1)
D_it <- c(0, 0, 0, 0, 0, 1, 0, 1)
D_i <- c(0, 0, 0, 0, 1, 1, 1, 1)
Y_it <- c(6, 7, 10, 9, 8, 14, 8, 12)

df <- data.frame(i=i, t=t, P_t=P_t, D_it=D_it, D_i=D_i, Y_it=Y_it)
df
##   i t P_t D_it D_i Y_it
## 1 1 1   0    0   0    6
## 2 1 2   1    0   0    7
## 3 2 1   0    0   0   10
## 4 2 2   1    0   0    9
## 5 3 1   0    0   1    8
## 6 3 2   1    1   1   14
## 7 4 1   0    0   1    8
## 8 4 2   1    1   1   12
rm(i, t, P_t, D_it, D_i, Y_it)
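
As a starting point, here is a minimal sketch of the three estimations with fixest::feols() (fixest is already loaded via p_load above); the object names (eq1, eq2, eq3) are arbitrary, and any estimator that supports fixed effects and clustered standard errors would do.

# Sketch: Equations (1)-(3) on the friends' data, standard errors clustered at the i level
eq1 <- feols(Y_it ~ D_it | i + t, data = df, cluster = ~ i)    # tau-hat, Equation (1)
eq2 <- feols(Y_it ~ D_i * P_t, data = df, cluster = ~ i)       # delta-hat, Equation (2)
eq3 <- feols(Y_it ~ D_i:P_t | i + t, data = df, cluster = ~ i) # rho-hat, Equation (3)
etable(eq1, eq2, eq3)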

PROBLEM 2. Difference-in-Differences with Multiple Periods of Treatment

Practice data. The practice data for this question is the Callaway & Sant’Anna (JoE 2021) dataset on teen employment in 500 US counties from 2003 to 2007, before and after the introduction of a minimum wage policy. You can load the did package by Callaway & Sant’Anna (2020) to access the dataset mpdta. Counties are identified by the variable countyreal and the year of observation by year. The variable treat is the binary indicator for a county ever introducing a minimum wage (treatment timing is staggered across counties). The variable first.treat is the first year in which the county introduced a minimum wage; it takes the value 0 for counties without a minimum wage policy (\(treat=0\)) and the values 2004, 2006, or 2007 for counties with a minimum wage policy (\(treat=1\)). The variable lemp is the log of teen employment in the county in that year. Other variables include the log of the county population in thousands, measured only once per county (lpop), etc. You may call the dataset's help page (?mpdta) for a further description.

# Load `mpdta` data
library(did)
data(mpdta)
table(mpdta$treat, mpdta$year)
##    
##     2003 2004 2005 2006 2007
##   0  309  309  309  309  309
##   1  191  191  191  191  191
length(unique(mpdta$first.treat)) # cohorts or groups (incl. never-treated)
## [1] 4
table(mpdta$first.treat, mpdta$year)
##       
##        2003 2004 2005 2006 2007
##   0     309  309  309  309  309
##   2004   20   20   20   20   20
##   2006   40   40   40   40   40
##   2007  131  131  131  131  131
summary(mpdta$first.treat[mpdta$treat==0])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0       0       0       0       0       0
summary(mpdta$first.treat[mpdta$treat==1])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2004    2006    2007    2006    2007    2007
summary(mpdta$lemp)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.099   4.868   5.697   5.773   6.684  10.443
#?mpdta

i. Descriptive county-level maps.

  • Create a single figure combining two county-level maps: one showing the treat data, another showing the first.treat data. You may use the usmap package or any other package that helps you produce high-quality maps. Interpret the figure; a minimal sketch for this figure appears after this list.

  • Create county-level maps showing the means of lemp by first.treat categories, year-by-year. Combine the maps as desired. You may use the usmap package or any other package that helps you produce high-quality maps. Interpret the figure.
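
A minimal sketch for the first figure is below. It assumes countyreal stores 5-digit county FIPS codes (see ?mpdta); the object names are arbitrary, and because treat and first.treat are time-invariant, a single year of data suffices for the join. Converting treat and first.treat to factors would give discrete legends.

# Sketch: county-level maps of treat and first.treat with usmap
map_df <- mpdta %>%
  dplyr::filter(year == 2003) %>%                               # one row per county
  dplyr::mutate(fips = sprintf("%05d", as.integer(countyreal))) # usmap expects a fips column
p_treat <- plot_usmap(regions = "counties", data = map_df, values = "treat") +
  labs(title = "Ever treated (treat)")
p_first <- plot_usmap(regions = "counties", data = map_df, values = "first.treat") +
  labs(title = "Year of first treatment (first.treat; 0 = never treated)")
gridExtra::grid.arrange(p_treat, p_first, ncol = 2)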

ii. Cohort-year-specific average treatment effects on the treated

  • Using the att_gt command of the did package, compute the group-time average treatment effects on the treated based on outcome regression estimation with the never-treated as the control group, lpop as the covariate, and clustering standard errors at the countyreal level. Interpret your results; a minimal sketch of the att_gt() call appears after this list.

  • Using the att_gt command of the did package, compute the group-time average treatment effects on the treated based on inverse probability weighting (IPW) estimation with the never-treated as the control group, lpop as the covariate, and clustering standard errors at the countyreal level. Do your results qualitatively change as compared to the regression adjustment above?

  • Using the att_gt command of the did package, compute the group-time average treatment effects on the treated based on a doubly robust (DR) estimation with the never-treated as the control group, lpop as the covariate, and clustering standard errors at the countyreal level. Do your results qualitatively change as compared to the regression adjustment and IPW procedures above?

  • Using the ggdid command of the did package, plot the group-time average treatment effects on the treated based on a doubly robust (DR) estimation. Do your earlier maps hint at these results?
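
A minimal sketch of the doubly robust att_gt() call is below; switching est_method to "reg" (outcome regression) or "ipw" gives the other two estimators, and the object name atts_dr is arbitrary.

# Sketch: group-time ATTs, doubly robust, never-treated control group, lpop as covariate
atts_dr <- att_gt(yname = "lemp", tname = "year", idname = "countyreal",
                  gname = "first.treat", xformla = ~ lpop, data = mpdta,
                  control_group = "nevertreated", est_method = "dr",
                  clustervars = "countyreal")
summary(atts_dr)
ggdid(atts_dr) # plot the group-time ATTs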

iii. Cohort-specific and aggregate average treatment effects on the treated

  • Using the aggte command of the did package, compute the group-specific and overall average treatment effects on the treated based on a doubly robust (DR) estimation with the never-treated as the control group, lpop as the covariate, and clustering standard errors at the countyreal level. Interpret your results; a minimal sketch of the aggte() call appears after this list.

  • Based on Table 2 from Callaway & Sant’Anna (JoE 2021) or Table 2 from de Chaisemartin & D’Haultfœuille (Econometrics J 2022), identify three R packages, apart from did, that are suitable for heterogeneity-robust estimations in contexts of staggered treatment timing. Employ each of these packages to re-estimate the average treatment effect on the treated and compare the results with those obtained using the did package.
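
For the first bullet, a minimal sketch using the atts_dr object from the previous sketch:

# Sketch: cohort-specific and overall ATTs aggregated from the doubly robust group-time ATTs
agg_group  <- aggte(atts_dr, type = "group")  # cohort-specific ATTs (with an overall summary)
agg_simple <- aggte(atts_dr, type = "simple") # simple weighted-average overall ATT
summary(agg_group)
summary(agg_simple)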

HAVE FUN AND KEEP FAITH IN THE FUN!