1 Load the Data

library(readr)
mydata <- read_csv("/Users/gaoyuqi/Desktop/ADEC 7320/us_fred_coastal_us_states_avg_hpi_before_after_2005.csv")
## Rows: 48 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): STATE
## dbl (5): HPI_CHG, Time_Period, Disaster_Affected, NUM_DISASTERS, NUM_IND_ASSIST
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
print(mydata)
## # A tibble: 48 × 6
##    STATE       HPI_CHG Time_Period Disaster_Affected NUM_DISASTERS NUM_IND_ASS…¹
##    <chr>         <dbl>       <dbl>             <dbl>         <dbl>         <dbl>
##  1 GASTHPI_CHG  0.0140           0                 0             1             0
##  2 NCSTHPI_CHG  0.0142           0                 0             3             0
##  3 TXSTHPI_CHG  0.0102           0                 1             5            22
##  4 MASTHPI_CHG  0.0275           0                 0             4             9
##  5 ALSTHPI_CHG  0.0176           0                 1             4            14
##  6 MSSTHPI_CHG  0.0133           0                 1             3            49
##  7 SCSTHPI_CHG  0.0180           0                 0             1             0
##  8 NHSTHPI_CHG  0.0285           0                 0             5             6
##  9 LASTHPI_CHG  0.0156           0                 1             5            55
## 10 CTSTHPI_CHG  0.0323           0                 0             3             0
## # … with 38 more rows, and abbreviated variable name ¹NUM_IND_ASSIST
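The readr message above suggests declaring column types explicitly. Here is a minimal sketch, assuming readr is installed as in the code above. It writes the first three rows of the printed tibble to a hypothetical temporary file (`demo_path`), with HPI_CHG rounded as displayed. It then shows that an explicit `col_types` specification fixes each column's type and silences the message:

```r
library(readr)

# Hypothetical demo file: the first three rows of the printed tibble
# (HPI_CHG values are rounded as displayed above)
demo_path <- tempfile(fileext = ".csv")
writeLines(c(
  "STATE,HPI_CHG,Time_Period,Disaster_Affected,NUM_DISASTERS,NUM_IND_ASSIST",
  "GASTHPI_CHG,0.0140,0,0,1,0",
  "NCSTHPI_CHG,0.0142,0,0,3,0",
  "TXSTHPI_CHG,0.0102,0,1,5,22"
), demo_path)

# STATE is read as text; every other column is read as a double
demo_types <- cols(STATE = col_character(), .default = col_double())
demo <- read_csv(demo_path, col_types = demo_types)
print(demo)
```

For the real file, the same `col_types = demo_types` argument could be passed to the `read_csv()` call on the CSV path above.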

2 Fit Diff in Diff Regression Model

\[ \text{HPI\_CHG} = \beta_0 + \beta_1\,\text{Disaster\_Affected} + \beta_2\,\text{Time\_Period} + \beta_3\,(\text{Disaster\_Affected} \times \text{Time\_Period}) + \epsilon \]

  • Control Group: The areas that were not affected by a disaster.

  • Treatment Group: The areas that were affected by a disaster. This group is exposed to the event of interest.

  • Disaster_Affected is the treatment indicator in our diff in diff regression model: 0 means an area was not affected by a disaster, and 1 means it was.

  • Time_Period equals 0 for observations from the pre-disaster period and 1 for observations from the post-disaster period.
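Substitute the four combinations of the two indicators into the model, with \(\epsilon\) averaging to zero. This gives the expected HPI change in each group and period, and shows why \(\beta_3\) is the DID estimate:

\[
\begin{aligned}
E[\text{HPI\_CHG} \mid \text{Control}, \text{Pre}] &= \beta_0 \\
E[\text{HPI\_CHG} \mid \text{Control}, \text{Post}] &= \beta_0 + \beta_2 \\
E[\text{HPI\_CHG} \mid \text{Treatment}, \text{Pre}] &= \beta_0 + \beta_1 \\
E[\text{HPI\_CHG} \mid \text{Treatment}, \text{Post}] &= \beta_0 + \beta_1 + \beta_2 + \beta_3 \\[4pt]
\text{DID} &= \big[(\beta_0 + \beta_1 + \beta_2 + \beta_3) - (\beta_0 + \beta_1)\big] - \big[(\beta_0 + \beta_2) - \beta_0\big] = (\beta_2 + \beta_3) - \beta_2 = \beta_3
\end{aligned}
\]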

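# Fit the DID regression; the Disaster_Affected * Time_Period term adds the interaction whose coefficient is the DID estimate
lm_mydata <- lm(HPI_CHG ~ Disaster_Affected + Time_Period + Disaster_Affected * Time_Period, data = mydata)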
summary(lm_mydata)
## 
## Call:
## lm(formula = HPI_CHG ~ Disaster_Affected + Time_Period + Disaster_Affected * 
##     Time_Period, data = mydata)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.023081 -0.007610 -0.000171  0.004656  0.035981 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    0.037090   0.002819  13.157  < 2e-16 ***
## Disaster_Affected             -0.013944   0.006176  -2.258   0.0290 *  
## Time_Period                   -0.027847   0.003987  -6.985  1.2e-08 ***
## Disaster_Affected:Time_Period  0.019739   0.008734   2.260   0.0288 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01229 on 44 degrees of freedom
## Multiple R-squared:  0.5356, Adjusted R-squared:  0.504 
## F-statistic: 16.92 on 3 and 44 DF,  p-value: 1.882e-07
  • Note: All coefficients have p-values below 0.05, so each term, including the interaction term, is statistically significant at the 5% level.
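The `lm()` call above lists the main effects and the interaction explicitly. In R's formula syntax, however, `a * b` already expands to `a + b + a:b`. A quick check with base R's `terms()`, which needs no data, suggests that the two formulas specify identical models. So `HPI_CHG ~ Disaster_Affected * Time_Period` is an equivalent, shorter spelling:

```r
# Term labels R derives from each formula (the variables need not exist)
long_form  <- HPI_CHG ~ Disaster_Affected + Time_Period + Disaster_Affected * Time_Period
short_form <- HPI_CHG ~ Disaster_Affected * Time_Period

long_terms  <- attr(terms(long_form), "term.labels")
short_terms <- attr(terms(short_form), "term.labels")

# Main effects first, then the interaction
print(short_terms)

# Duplicate main effects are collapsed, so both term sets match
identical(long_terms, short_terms)
```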

2.1 Methodology for Diff in Diff

  • DID captures the treatment effect as the difference in changes over time between the treatment and control groups. Specifically, each group's change is its post-treatment average minus its pre-treatment average. The DID estimate is the treatment group's change minus the control group's change.

  • We estimate the magnitude of the treatment effect through the coefficient on the interaction term, \(\beta_3\), in our DID model.
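The "difference in changes" described above can be computed directly from the four group averages. The sketch below uses a small made-up data frame; `toy`, `y`, `treated`, and `post` are hypothetical names, not the HPI data. It computes each group's post-minus-pre change with `tapply()` and compares the difference of those changes with the interaction coefficient from `lm()`:

```r
# Hypothetical toy data: 2 observations per group-period cell
toy <- data.frame(
  treated = c(0, 0, 0, 0, 1, 1, 1, 1),
  post    = c(0, 0, 1, 1, 0, 0, 1, 1),
  y       = c(0.030, 0.034, 0.010, 0.012, 0.020, 0.024, 0.016, 0.018)
)

# 2 x 2 table of cell means: rows = treated (0/1), columns = post (0/1)
cell_means <- tapply(toy$y, list(toy$treated, toy$post), mean)

# Each group's change over time (post-treatment average minus pre-treatment average)
change_control   <- cell_means["0", "1"] - cell_means["0", "0"]
change_treatment <- cell_means["1", "1"] - cell_means["1", "0"]
did_means <- change_treatment - change_control

# The interaction coefficient of the saturated regression gives the same estimate
fit <- lm(y ~ treated * post, data = toy)
did_lm <- unname(coef(fit)["treated:post"])

c(did_means = did_means, did_lm = did_lm)
```

With these toy values, both approaches give 0.016. The regression form is more convenient in practice because it also reports a standard error for the estimate.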

3 2 × 2 Matrix

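Each cell is the fitted average of HPI_CHG for one group and period: control pre-treatment = \(\beta_0\), control post-treatment = \(\beta_0 + \beta_2\), treatment pre-treatment = \(\beta_0 + \beta_1\), and treatment post-treatment = \(\beta_0 + \beta_1 + \beta_2 + \beta_3\).

                     Control Group   Treatment Group
Pre-Treatment             0.037090          0.023146
Post-Treatment            0.009243          0.015038
Diff (Post - Pre)        -0.027847         -0.008108
# Difference in differences: the treatment group's change minus the control group's change
did <- (-0.008108) - (-0.027847)
print(did)
## [1] 0.019739
  • -0.008108 is the change in the average HPI_CHG of the treatment group from the pre- to the post-treatment period. -0.027847 is the corresponding change for the control group. Under the parallel trends assumption, the control group's change estimates the time trend that the treatment group would also have experienced without the disaster.

  • The result, 0.019739, equals the coefficient on the interaction term in the regression summary. A model with two binary regressors and their interaction is saturated, so its fitted values are exactly the four cell means. The interaction coefficient is therefore exactly the difference in differences of those means.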
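As a cross-check, the 2 × 2 matrix can be rebuilt from the four coefficient estimates, since each cell is a fitted value of the saturated model. This sketch hard-codes the estimates printed by `summary(lm_mydata)` so that it runs on its own. In the live session, `coef(lm_mydata)` would supply the unrounded values instead:

```r
# Estimates copied from the summary(lm_mydata) output
b0 <- 0.037090   # (Intercept)
b1 <- -0.013944  # Disaster_Affected
b2 <- -0.027847  # Time_Period
b3 <- 0.019739   # Disaster_Affected:Time_Period

# Fitted cell means: rows = period, columns = group
did_matrix <- matrix(
  c(b0,      b0 + b1,
    b0 + b2, b0 + b1 + b2 + b3),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Pre-Treatment", "Post-Treatment"),
                  c("Control Group", "Treatment Group"))
)

# Post-minus-pre change for each group, then the difference of those changes
changes <- did_matrix["Post-Treatment", ] - did_matrix["Pre-Treatment", ]
did_estimate <- unname(changes["Treatment Group"] - changes["Control Group"])

print(round(did_matrix, 6))
print(round(changes, 6))
print(round(did_estimate, 6))
```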

4 Assumptions for DID

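Parallel trends: Ideally, the two groups being compared share an identical background except for one factor, the treatment. Formally, the assumption is that without treatment, the treatment group's average outcome would have changed over time by the same amount as the control group's. For example, consider a scenario where the Federal Reserve implements different monetary policies in two states to determine which policy is more effective. For DID to be valid, both states should have had similar economic environments before the policy changes. Factors such as unemployment rates, tax rates, and other relevant economic indicators should be comparable between the two states, or at least evolving in the same way. Strictly, DID requires similar trends rather than identical levels, because any constant gap between the groups is differenced out. Under this assumption, we can interpret the DID estimate as the causal effect of the monetary policy.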
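Parallel trends cannot be tested directly, but a common informal check is to compare the two groups' trends across several pre-treatment periods. The data here appear to have only one pre-period (Time_Period is 0 or 1), so this sketch uses hypothetical, noise-free yearly data. The names `pre_data`, `year_c`, `treated`, and `hpi` and all values are made up. A treatment-by-year interaction near zero in the pre-period is consistent with parallel trends, though it does not prove them:

```r
# Hypothetical pre-treatment panel: 5 years for a control and a treated group.
# Both groups share a slope of 0.001 per year; the treated group has a level offset.
years <- 2000:2004
pre_data <- data.frame(
  year    = rep(years, times = 2),
  treated = rep(c(0, 1), each = length(years))
)
pre_data$year_c <- pre_data$year - 2000  # centered year for numerical stability
pre_data$hpi <- 0.02 + 0.005 * pre_data$treated + 0.001 * pre_data$year_c

# Group-specific linear trends; treated:year_c is the difference in pre-period slopes
trend_fit <- lm(hpi ~ treated * year_c, data = pre_data)
slope_gap <- unname(coef(trend_fit)["treated:year_c"])
print(slope_gap)
```

With real, noisy data, one would inspect this coefficient together with its standard error (for example via `summary(trend_fit)`) rather than expect an exact zero.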