1 Introduction


      Difference-in-differences (DiD) estimators are most frequently used in policy analysis, where they allow researchers to estimate the impact of a treatment on a dependent variable. This discussion post demonstrates the DiD approach on a dataset from 2005 that focuses on the impact of natural disasters, specifically hurricanes, on home prices in affected areas. In my model, I compare control and treatment groups to see whether natural disasters have a statistically significant effect on home prices in an area.

2 Dataset Exploration


      The dataset contains six variables, which are listed in the variable table below. Additionally, I have included summary statistics and PDF plots to show how the variables are distributed.

## Variable Table

| Variable | Description |
| --- | --- |
| STATE | US State Identifier |
| HPI_CHG | House Price Inflation in the Coastal States |
| Disaster_Affected | Binary treatment variable indicating that the state was impacted by the hurricane season |
| Time_Period | Binary variable taking on the value of one if the year is 2005 |
| NUM_IND_ASSIST | Number of counties receiving Individual Assistance from FEMA |
| NUM_IND_DISASTERS | Number of natural disasters |

2.1 Summary Stats

## SummaryStats was converted to a data frame

Data Frame Summary

SummaryStats

Dimensions: 15 x 5
Duplicates: 0
| Variable | Stats / Values | Freqs (% of Valid) |
| --- | --- | --- |
| Disaster_Affected [numeric] | Mean (sd): 10.2 (27.7); min ≤ med ≤ max: -0.1 ≤ 0.2 ≤ 100; IQR (CV): 1.2 (2.7) | -0.06!: 1 (6.7%); 0.00: 6 (40.0%); 0.21!: 1 (6.7%); 0.34!: 1 (6.7%); 0.41!: 1 (6.7%); 1.00: 1 (6.7%); 1.39!: 1 (6.7%); 1.97!: 1 (6.7%); 48.00: 1 (6.7%); 100.00: 1 (6.7%) |
| HPI_CHG [numeric] | Mean (sd): 10 (27.8); min ≤ med ≤ max: -0.4 ≤ 0 ≤ 100; IQR (CV): 0.5 (2.8) | 15 distinct values |
| NUM_DISASTERS [numeric] | Mean (sd): 12.3 (27); min ≤ med ≤ max: 0.3 ≤ 3 ≤ 100; IQR (CV): 2.8 (2.2) | 15 distinct values |
| NUM_IND_ASSIST [numeric] | Mean (sd): 16.9 (28.7); min ≤ med ≤ max: 0 ≤ 3.2 ≤ 100; IQR (CV): 12.3 (1.7) | 12 distinct values |
| Time_Period [numeric] | Mean (sd): 10.2 (27.7); min ≤ med ≤ max: -2 ≤ 0.5 ≤ 100; IQR (CV): 0.8 (2.7) | -2.04!: 1 (6.7%); 0.00: 3 (20.0%); 0.34!: 1 (6.7%); 0.50: 2 (13.3%); 0.51!: 1 (6.7%); 0.74!: 1 (6.7%); 1.00: 3 (20.0%); 1.01!: 1 (6.7%); 48.00: 1 (6.7%); 100.00: 1 (6.7%) |

! rounded

Generated by summarytools 1.0.1 (R version 4.2.3)
2023-10-09

2.2 Variable Graphs

3 Regression Model

3.1 Estimating Equation

\(HPI\_CHG_{it} = \beta_{0} + \delta_{0}\,TimePeriod_{t} + \beta_{1}\,DisasterAffected_{it} + \delta_{1}\,(TimePeriod_{t} \times DisasterAffected_{it}) + \epsilon_{it}\)


      The DiD regression model uses only two explanatory variables from the dataset: the time period variable and the treatment variable, which in this case is a binary variable taking on the value of one if a state received a certain amount of aid from FEMA in a given year. The \(\beta_{0}\) and \(\delta_{0}\) coefficients determine the intercept in each year: \(\beta_{0}\) is the intercept in the initial period, and \(\beta_{0} + \delta_{0}\) is the intercept in the post-period (year 2005).
      An important thing to notice is that we now have two subscripts, “i” and “t.” The subscript “i” indexes the state, while the subscript “t” indexes the year in which the state was observed. The \(TimePeriod_{t}\) variable carries only the subscript “t” because it varies only across time, whereas \(HPI\_CHG_{it}\), \(DisasterAffected_{it}\), and \(\epsilon_{it}\) carry both subscripts because they vary across both time and states.
      Naturally, the most essential part of the model is the interaction term. Its coefficient, \(\delta_{1}\), is the treatment effect during the post-period, and it represents the impact of the treatment we are attempting to study through our model. The table below demonstrates how \(\delta_{1}\) becomes the DiD coefficient.

3.2 Coefficient Table

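| Group | Pre-period | Post-period | Difference (Post - Pre) |
| --- | --- | --- | --- |
| Control | \(\beta_{0}\) | \(\beta_{0} + \delta_{0}\) | \((\beta_{0} + \delta_{0}) - \beta_{0}\) |
| Treatment | \(\beta_{0} + \beta_{1}\) | \(\beta_{0} + \delta_{0} + \beta_{1} + \delta_{1}\) | \((\beta_{0} + \delta_{0} + \beta_{1} + \delta_{1}) - (\beta_{0} + \beta_{1})\) |
| Difference (Treatment - Control) | \(\beta_{1}\) | \(\beta_{1} + \delta_{1}\) | \(\delta_{1}\) |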

      The table above shows that by subtracting the control from the treatment, we ultimately arrive at the \(\delta_{1}\) coefficient. This can be checked numerically by running our model.
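      Written out step by step, the subtraction looks as follows. This only restates the table entries using the coefficients of the estimating equation; the labels "Treatment change" and "Control change" are introduced here for readability.

\[
\begin{aligned}
\text{Treatment change} &= (\beta_{0} + \delta_{0} + \beta_{1} + \delta_{1}) - (\beta_{0} + \beta_{1}) = \delta_{0} + \delta_{1} \\
\text{Control change} &= (\beta_{0} + \delta_{0}) - \beta_{0} = \delta_{0} \\
\text{Treatment change} - \text{Control change} &= (\delta_{0} + \delta_{1}) - \delta_{0} = \delta_{1}
\end{aligned}
\]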
      The intuition behind this table is simple: we are subtracting different predicted values \(\hat{y}\) from each other. The table can be rewritten as the equation:
\(\delta_{1} = (\hat{y}_{1T} - \hat{y}_{1C}) - (\hat{y}_{0T} - \hat{y}_{0C})\)

      where the T and C subscripts represent “treatment” and “control”, and the numerical subscripts represent the time period. The equation can also be written as:
\(\delta_{1} = (\hat{y}_{1T} - \hat{y}_{0T}) - (\hat{y}_{1C} - \hat{y}_{0C})\)


      Looking at the first equation, we subtract the predicted value for the control group in period one from the predicted value for the treatment group in period one, and then repeat the process for period zero. We then subtract the period-zero difference from the period-one difference, giving us the difference-in-differences coefficient.
      In the simplest terms, we take the current difference between the two groups, subtract the difference that existed between them before, and see whether anything has changed.
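      The equivalence between the "difference of group means" and the interaction coefficient can be illustrated with a minimal sketch in R. The data below are simulated, not the hurricane dataset: the data frame `sim`, the sample size, the noise level, and the "true" coefficients `b0`, `d0`, `b1`, and `d1` are all assumptions chosen only for illustration.

```r
# Simulated two-group, two-period data (hypothetical values, for illustration only).
set.seed(42)
n_per_cell <- 50
sim <- expand.grid(
  unit = seq_len(n_per_cell),
  Time_Period = c(0, 1),
  Disaster_Affected = c(0, 1)
)

# Assumed "true" coefficients, using the notation of the estimating equation.
b0 <- 0.04; d0 <- -0.03; b1 <- -0.01; d1 <- 0.02
sim$HPI_CHG <- b0 + d0 * sim$Time_Period + b1 * sim$Disaster_Affected +
  d1 * sim$Time_Period * sim$Disaster_Affected +
  rnorm(nrow(sim), sd = 0.01)

# Cell means: rows index Time_Period ("0", "1"), columns index Disaster_Affected ("0", "1").
cell_means <- tapply(sim$HPI_CHG,
                     list(sim$Time_Period, sim$Disaster_Affected),
                     mean)

# (treatment - control) in period one minus (treatment - control) in period zero.
did_by_hand <- (cell_means["1", "1"] - cell_means["1", "0"]) -
  (cell_means["0", "1"] - cell_means["0", "0"])

# The same quantity as the interaction coefficient of the regression.
fit <- lm(HPI_CHG ~ Time_Period * Disaster_Affected, data = sim)
did_by_lm <- coef(fit)[["Time_Period:Disaster_Affected"]]

print(c(by_hand = did_by_hand, by_lm = did_by_lm))
```

      Because the regression contains both dummies and their interaction, its fitted values are the four cell means, so the two numbers agree up to floating-point error.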

3.3 Regression Model

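library(sjPlot)  # provides tab_model()
# Time_Period * Disaster_Affected expands to both main effects plus their interaction
pDIDdy <- lm(HPI_CHG ~ Time_Period * Disaster_Affected, data = coastal)

tab_model(pDIDdy)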
Dependent variable: HPI_CHG

| Predictors | Estimates | CI | p |
| --- | --- | --- | --- |
| (Intercept) | 0.04 | 0.03 – 0.04 | <0.001 |
| Time Period | -0.03 | -0.04 – -0.02 | <0.001 |
| Disaster Affected | -0.01 | -0.03 – -0.00 | 0.029 |
| Time Period × Disaster Affected | 0.02 | 0.00 – 0.04 | 0.029 |
| Observations | 48 | | |
| R2 / R2 adjusted | 0.536 / 0.504 | | |


      From the results, we can see that, on average, house price inflation in 2005 was three percentage points lower than in the previous period for states that were not affected, ceteris paribus. Additionally, in the initial period, house price inflation in states affected by hurricanes was, on average, one percentage point lower than in states that were not, ceteris paribus. Most importantly, however, the interaction term between the Time Period and Disaster Affected variables tells us that the change in house price inflation between the two periods was two percentage points higher in affected states than in non-affected states, ceteris paribus. This could be an indicator that the disaster relief aid provided by FEMA is working and mitigating the negative effects that natural disasters have on home prices.

      To check that our difference-in-differences estimate is consistent with the other coefficients, we can plug our \(\beta_{0}\), \(\beta_{1}\), \(\delta_{0}\), and \(\delta_{1}\) estimates into the coefficient table.

3.4 DiD Coefficient Derivation from Estimated Values

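| Group | Pre-period | Post-period | Difference (Post - Pre) |
| --- | --- | --- | --- |
| Control | 0.04 | 0.04 + (-0.03) | 0.04 + (-0.03) - 0.04 |
| Treatment | 0.04 + (-0.01) | 0.04 + (-0.03) + (-0.01) + 0.02 | 0.04 + (-0.03) + (-0.01) + 0.02 - (0.04 + (-0.01)) |
| Difference (Treatment - Control) | -0.01 | -0.01 + 0.02 | 0.02 |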


      As we can see in the table, after doing some basic math, the coefficients do indeed add up to the DiD coefficient.
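      The same arithmetic can be checked in a few lines of R. This is a small sketch that uses the rounded estimates from the regression table, so it confirms the bookkeeping rather than re-estimating anything; the variable names (`control_pre`, `treatment_post`, and so on) are introduced here only for illustration.

```r
# Rounded estimates copied from the regression table.
b0 <- 0.04   # (Intercept)
d0 <- -0.03  # Time Period
b1 <- -0.01  # Disaster Affected
d1 <- 0.02   # Time Period x Disaster Affected

# Predicted HPI_CHG in each of the four group/period cells.
control_pre    <- b0
control_post   <- b0 + d0
treatment_pre  <- b0 + b1
treatment_post <- b0 + d0 + b1 + d1

# Change in the treatment group minus change in the control group.
did <- (treatment_post - treatment_pre) - (control_post - control_pre)
print(round(did, 2))  # 0.02
```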