Introduction
Difference-in-differences (DiD) estimators are most frequently
used in policy analysis, where they allow researchers to
estimate the impact of a treatment on a dependent variable. This
discussion post demonstrates the effectiveness of DiD by
analyzing a dataset from 2005 on the impact that natural
disasters, specifically hurricanes, have on the prices of homes in
affected areas. In my model, I compare control and treatment groups
to see whether natural disasters have a statistically significant
effect on home prices in an area.
Dataset Exploration
The dataset contains six variables, which can be seen in the
variable table below. Additionally, I have included summary statistics
and PDF plots to show how the variables are structured.
Variable Table
| Variable | Description |
|---|---|
| STATE | US State Identifier |
| HPI_CHG | House Price Inflation in the Coastal States |
| Disaster_Affected | Binary Treatment Variable indicating being impacted by the Hurricane Season |
| Time_Period | Binary variable taking on the value of one if the year is 2005 |
| NUM_IND_ASSIST | Number of Counties receiving Individual Assistance from FEMA |
| NUM_IND_DISASTERS | The Number of Natural Disasters |
Summary Stats
Variable Graphs

Regression Model
Estimating Equation
\(HPI\_CHG_{it} = \beta_{0} + \delta_{0} TimePeriod_{t} + \beta_{1} DisasterAffected_{it} + \delta_{1} (TimePeriod_{t} \times DisasterAffected_{it}) + \epsilon_{it}\)
The DiD regression model uses only two of the
variables from the dataset: the time period variable and the treatment
variable, which in this case is a binary variable taking on the value of
one if a state received a certain amount of aid from FEMA in a given
year. The \(\beta_{0}\) and \(\delta_{0}\) coefficients together
determine the intercept in each period: \(\beta_{0}\) is the intercept in
the initial period, and \(\beta_{0} + \delta_{0}\) is the intercept in the
post-period (year 2005).
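To see how the estimating equation maps onto regressors, the sketch below builds the design matrix R would use for a toy data frame. The four rows are made-up values, one per group-period cell, and are not observations from the coastal dataset; only the column names are taken from the variable table.

```r
# Toy data: one row per (period, group) cell; values are illustrative only
toy <- data.frame(
  Time_Period       = c(0, 1, 0, 1),
  Disaster_Affected = c(0, 0, 1, 1)
)

# Columns correspond to: intercept (beta_0), Time_Period (delta_0),
# Disaster_Affected (beta_1), and the interaction (delta_1)
X <- model.matrix(~ Time_Period + Disaster_Affected +
                    Time_Period:Disaster_Affected, data = toy)
print(X)
```

The interaction column equals one only for the treated group in the post-period, which is why its coefficient isolates the treatment effect.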
An important thing to notice is that
we now have two subscripts, “i” and “t.” The subscript “i” represents
each state, while the subscript “t” represents the year during which the
state was observed. The \(TimePeriod_{t}\) variable has only the
subscript “t,” since it varies only across time, while \(HPI\_CHG_{it}\), \(DisasterAffected_{it}\), and \(\epsilon_{it}\) carry both subscripts,
since they vary across both time and states.
Naturally, the most essential part of the model is the interaction term.
Its coefficient, \(\delta_{1}\), is
the treatment effect during the post-period, and it represents the
impact of the policy we are attempting to study through our model. The
table below demonstrates how \(\delta_{1}\) becomes the DiD coefficient.
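Coefficient Table
| . | Pre-period | Post-period | Difference (post minus pre) |
|---|---|---|---|
| Control | \(\beta_{0}\) | \(\beta_{0} + \delta_{0}\) | \((\beta_{0} + \delta_{0}) - \beta_{0} = \delta_{0}\) |
| Treatment | \(\beta_{0} + \beta_{1}\) | \(\beta_{0} + \delta_{0} + \beta_{1} + \delta_{1}\) | \((\beta_{0} + \delta_{0} + \beta_{1} + \delta_{1}) - (\beta_{0} + \beta_{1}) = \delta_{0} + \delta_{1}\) |
| Difference (treatment minus control) | \(\beta_{1}\) | \(\beta_{1} + \delta_{1}\) | \(\delta_{1}\) |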
The table above shows that by subtracting the control group's
change from the treatment group's change, we ultimately arrive at the
\(\delta_{1}\) coefficient. This can be
confirmed by running our model.
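The cell entries follow from setting the two dummies to 0 or 1 in the estimating equation and taking expectations, assuming the error term has mean zero in every cell. Written out as a short derivation:

```latex
\begin{aligned}
E[HPI\_CHG \mid TimePeriod = 0,\ DisasterAffected = 0] &= \beta_0 \\
E[HPI\_CHG \mid TimePeriod = 1,\ DisasterAffected = 0] &= \beta_0 + \delta_0 \\
E[HPI\_CHG \mid TimePeriod = 0,\ DisasterAffected = 1] &= \beta_0 + \beta_1 \\
E[HPI\_CHG \mid TimePeriod = 1,\ DisasterAffected = 1] &= \beta_0 + \delta_0 + \beta_1 + \delta_1 \\
\big[(\beta_0 + \delta_0 + \beta_1 + \delta_1) - (\beta_0 + \beta_1)\big]
  - \big[(\beta_0 + \delta_0) - \beta_0\big] &= (\delta_0 + \delta_1) - \delta_0 = \delta_1
\end{aligned}
```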
The intuition behind
this table is simple. We are subtracting different predicted values
\(\hat{y}\) from each other. The table can be
re-written as the equation:
\(\delta_{1} = (\hat{y}_{1T} - \hat{y}_{1C}) - (\hat{y}_{0T} - \hat{y}_{0C})\)
where the T and C subscripts represent “treatment” and
“control”, and the numerical subscripts represent the time period. The
equation can also be written as:
\(\delta_{1} = (\hat{y}_{1T} - \hat{y}_{0T}) - (\hat{y}_{1C} - \hat{y}_{0C})\)
Looking at the first equation, it becomes clear that we are
subtracting the predicted value for the control group in period one
from the predicted value for the treatment group in period one,
and then repeating the process for period zero. We then
subtract the period-zero difference from the period-one
difference, giving us the
difference-in-differences coefficient.
In the simplest
terms, we are subtracting the earlier gap between the two groups from
the current gap and seeing whether anything has changed.
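The equivalence between the difference of group means and the interaction coefficient can be checked numerically. The sketch below uses simulated data, not the coastal dataset: the panel size, the "true" coefficients, and the names `sim`, `cell`, `did_means`, and `did_lm` are all assumptions made for illustration.

```r
set.seed(42)  # reproducible simulated data

# Hypothetical balanced panel: 24 "states" observed in two periods
n <- 24
sim <- data.frame(
  Time_Period       = rep(c(0, 1), each = n),
  Disaster_Affected = rep(rep(c(0, 1), each = n / 2), times = 2)
)
# Assumed true coefficients, chosen only for illustration
sim$HPI_CHG <- 0.04 - 0.03 * sim$Time_Period - 0.01 * sim$Disaster_Affected +
  0.02 * sim$Time_Period * sim$Disaster_Affected + rnorm(nrow(sim), sd = 0.01)

# DiD from the four cell means (rows: group, columns: period)
cell <- with(sim, tapply(HPI_CHG, list(Disaster_Affected, Time_Period), mean))
did_means <- (cell["1", "1"] - cell["0", "1"]) - (cell["1", "0"] - cell["0", "0"])

# DiD from the regression's interaction coefficient
fit <- lm(HPI_CHG ~ Time_Period * Disaster_Affected, data = sim)
did_lm <- unname(coef(fit)["Time_Period:Disaster_Affected"])

# Compare the two estimates up to floating-point tolerance
all.equal(unname(did_means), did_lm)
```

The two numbers agree up to floating-point error because the model has one parameter per group-period cell, so its fitted values are exactly the cell means.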
Regression Model
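library(sjPlot)  # provides tab_model()
# DiD regression: main effects for period and treatment plus their interaction
pDIDdy <- lm(HPI_CHG ~ Time_Period + Disaster_Affected + Time_Period:Disaster_Affected, data = coastal)
tab_model(pDIDdy)  # formatted table of estimates, CIs, and p-values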
Dependent variable: HPI_CHG
| Predictors | Estimates | CI | p |
|---|---|---|---|
| (Intercept) | 0.04 | 0.03 – 0.04 | <0.001 |
| Time Period | -0.03 | -0.04 – -0.02 | <0.001 |
| Disaster Affected | -0.01 | -0.03 – -0.00 | 0.029 |
| Time Period × Disaster Affected | 0.02 | 0.00 – 0.04 | 0.029 |
| Observations | 48 | | |
| R2 / R2 adjusted | 0.536 / 0.504 | | |
From the results, we can see that, on average, house price
inflation in unaffected areas was three percentage points lower in 2005
than in the previous period. Additionally, in the initial period, house
price inflation in areas affected by hurricanes was, on average, one
percentage point lower than in areas that were not. Most importantly,
however, the interaction term between the Time Period and Disaster
Affected variables tells us that, between the two periods, house price
inflation in disaster-affected areas changed by two percentage points
more than it did in non-affected areas; it fell by about one percentage
point instead of three. This could be an indicator of the disaster
relief aid provided by FEMA working and mitigating the negative effects
that natural disasters have on the prices of homes.
To verify that our difference-in-differences estimate is
consistent with the other coefficients, we can plug our estimated \(\beta_{0}\), \(\beta_{1}\), \(\delta_{0}\), and \(\delta_{1}\) into the
coefficient table.
DiD Coefficient Derivation from Estimated Values
| . | Pre-period | Post-period | Difference (post minus pre) |
|---|---|---|---|
| Control | 0.04 | 0.04 + (-0.03) = 0.01 | 0.01 - 0.04 = -0.03 |
| Treatment | 0.04 + (-0.01) = 0.03 | 0.04 + (-0.03) + (-0.01) + 0.02 = 0.02 | 0.02 - 0.03 = -0.01 |
| Difference (treatment minus control) | -0.01 | -0.01 + 0.02 = 0.01 | 0.01 - (-0.01) = 0.02 |
As we can see in the table, after doing some basic math,
the coefficients do indeed add up to the DiD coefficient.
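The same arithmetic can be reproduced in a few lines of R. This is a minimal sketch using the rounded estimates from the regression table; the object names `cells`, `change`, and `did` are my own and do not come from the original analysis.

```r
# Rounded estimates copied from the regression table
b0 <- 0.04   # (Intercept)
d0 <- -0.03  # Time Period
b1 <- -0.01  # Disaster Affected
d1 <- 0.02   # Time Period x Disaster Affected

# Predicted HPI_CHG in each group-period cell
cells <- matrix(
  c(b0,      b0 + d0,
    b0 + b1, b0 + d0 + b1 + d1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Control", "Treatment"), c("Pre", "Post"))
)

# Change over time within each group, then the difference between groups
change <- cells[, "Post"] - cells[, "Pre"]
did <- unname(change["Treatment"] - change["Control"])
round(did, 2)
```

Because the inputs are rounded to two decimals, this only confirms that the table is internally consistent; it does not add precision beyond what the regression output reports.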
Parallel Trends Assumption
The DiD model, like any OLS regression, needs to meet the
Gauss-Markov assumptions to be BLUE; on top of that, it also
has to meet the parallel trends assumption (PTA). According to the PTA,
the treatment and control groups would have “moved” in the same way in
the absence of treatment, so that the only difference between them is
the treatment itself. In this case, for example, house price inflation
should have changed by the same amount in all places included
in the dataset, were it not for natural disasters. This could be an
issue, since the dataset spans all American coastal states from
California to Georgia. Since each state has its own real-estate market
with different regulations and trends, which surely changed over this
period in different ways, the PTA was likely violated.
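With only two periods, the PTA cannot be tested directly. One common check, if earlier years were available, is a placebo DiD on pre-treatment periods only. The sketch below is hypothetical throughout: the years 2003 and 2004, the `state` index, the treated-state assignment, and the simulated outcome are assumptions made for illustration and are not part of the dataset described in this post.

```r
set.seed(1)  # reproducible simulated data

# Hypothetical panel with TWO pre-treatment years; the dataset described
# in this post does not contain these years
pre <- expand.grid(state = 1:24, year = c(2003, 2004))
pre$Disaster_Affected <- as.integer(pre$state <= 12)  # assumed treated states
pre$Later_Year <- as.integer(pre$year == 2004)

# Simulated outcome with a common trend in both groups (parallel by design)
pre$HPI_CHG <- 0.05 - 0.01 * pre$Later_Year - 0.01 * pre$Disaster_Affected +
  rnorm(nrow(pre), sd = 0.01)

# Placebo DiD: no treatment occurred in 2004, so the interaction should be
# statistically indistinguishable from zero if pre-trends are parallel
placebo <- lm(HPI_CHG ~ Later_Year * Disaster_Affected, data = pre)
summary(placebo)$coefficients["Later_Year:Disaster_Affected", ]
```

A placebo interaction that is large and statistically significant would suggest the two groups were already diverging before the hurricanes, which would cast doubt on the DiD estimate.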