library(readr)
mydata <- read_csv("/Users/gaoyuqi/Desktop/ADEC 7320/us_fred_coastal_us_states_avg_hpi_before_after_2005.csv")
## Rows: 48 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): STATE
## dbl (5): HPI_CHG, Time_Period, Disaster_Affected, NUM_DISASTERS, NUM_IND_ASSIST
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
print(mydata)
## # A tibble: 48 × 6
## STATE HPI_CHG Time_Period Disaster_Affected NUM_DISASTERS NUM_IND_ASS…¹
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 GASTHPI_CHG 0.0140 0 0 1 0
## 2 NCSTHPI_CHG 0.0142 0 0 3 0
## 3 TXSTHPI_CHG 0.0102 0 1 5 22
## 4 MASTHPI_CHG 0.0275 0 0 4 9
## 5 ALSTHPI_CHG 0.0176 0 1 4 14
## 6 MSSTHPI_CHG 0.0133 0 1 3 49
## 7 SCSTHPI_CHG 0.0180 0 0 1 0
## 8 NHSTHPI_CHG 0.0285 0 0 5 6
## 9 LASTHPI_CHG 0.0156 0 1 5 55
## 10 CTSTHPI_CHG 0.0323 0 0 3 0
## # … with 38 more rows, and abbreviated variable name ¹NUM_IND_ASSIST
\[ HPICHG = \beta_0 + \beta_1 \, DisasterAffected + \beta_2 \, TimePeriod + \beta_3 \, (DisasterAffected \times TimePeriod) + \epsilon \]
Control Group: The areas unaffected by the disaster.
Treatment Group: The areas affected by the disaster. This group experiences the event of interest.
Disaster_Affected is the treatment indicator in our difference-in-differences (DID) regression model: 0 means the area was not affected by the disaster, and 1 means the area was affected.
Time_Period equal to 0 indicates observations from the pre-disaster period, and 1 indicates observations from the post-disaster period.
lm_mydata <- lm(HPI_CHG ~ Disaster_Affected + Time_Period + Disaster_Affected * Time_Period, data = mydata )
summary(lm_mydata)
##
## Call:
## lm(formula = HPI_CHG ~ Disaster_Affected + Time_Period + Disaster_Affected *
## Time_Period, data = mydata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.023081 -0.007610 -0.000171 0.004656 0.035981
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.037090 0.002819 13.157 < 2e-16 ***
## Disaster_Affected -0.013944 0.006176 -2.258 0.0290 *
## Time_Period -0.027847 0.003987 -6.985 1.2e-08 ***
## Disaster_Affected:Time_Period 0.019739 0.008734 2.260 0.0288 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01229 on 44 degrees of freedom
## Multiple R-squared: 0.5356, Adjusted R-squared: 0.504
## F-statistic: 16.92 on 3 and 44 DF, p-value: 1.882e-07
DID captures the treatment effect as the difference between the treatment and control groups in their changes over time. More specifically, each group's change is its post-treatment average minus its pre-treatment average.
We obtain the magnitude of the treatment effect by estimating the coefficient on the interaction term in our DID model.
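To see why the interaction coefficient is the DID estimate, it can help to write out the four group-period means implied by the model. The derivation below is a sketch that assumes the error term has mean zero within each group and period; \(D\) and \(T\) are shorthand introduced here for DisasterAffected and TimePeriod.

\[
\begin{aligned}
E[HPICHG \mid D = 0, T = 0] &= \beta_0 \\
E[HPICHG \mid D = 1, T = 0] &= \beta_0 + \beta_1 \\
E[HPICHG \mid D = 0, T = 1] &= \beta_0 + \beta_2 \\
E[HPICHG \mid D = 1, T = 1] &= \beta_0 + \beta_1 + \beta_2 + \beta_3
\end{aligned}
\]

Subtracting the control group's change over time from the treatment group's change over time leaves only the interaction coefficient:

\[
\bigl[(\beta_0 + \beta_1 + \beta_2 + \beta_3) - (\beta_0 + \beta_1)\bigr] - \bigl[(\beta_0 + \beta_2) - \beta_0\bigr] = (\beta_2 + \beta_3) - \beta_2 = \beta_3
\]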
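The four group-period means follow from the regression coefficients: the intercept (0.037090) is the control group's pre-treatment mean, and adding the Disaster_Affected coefficient gives the treatment group's pre-treatment mean (0.023146).
| | Control Group | Treatment Group |
|---|---|---|
| Pre-Treatment | 0.037090 | 0.023146 |
| Post-Treatment | 0.009243 | 0.015038 |
| diff (Post minus Pre) | -0.02784 | -0.008108 |
# Difference in differences: treatment-group change minus control-group change
diff <- (-0.008108) - (-0.02784)
print(diff)
## [1] 0.019732
-0.008108 is the change in average HPI_CHG from the pre-treatment to the post-treatment period for the treatment group (0.015038 - 0.023146). -0.02784 is the corresponding change for the control group (0.009243 - 0.037090); it equals the Time_Period coefficient and captures the change over time in the absence of the disaster.
The result, 0.0197, matches the coefficient on the interaction term in the regression summary (0.019739). The small discrepancy in the last digits arises because -0.02784 is rounded.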
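As a cross-check, the four group-period means can be rebuilt from the coefficient estimates printed in the regression summary. The sketch below hard-codes those estimates rather than reading the CSV, so it runs on its own; the vector `b` and the other variable names are illustrative and not part of the original analysis.

```r
# Coefficient estimates copied from the summary(lm_mydata) output.
# The element names are illustrative labels, not names used by lm().
b <- c(intercept = 0.037090, disaster = -0.013944,
       time = -0.027847, interaction = 0.019739)

# Group-period means implied by the fitted model
control_pre  <- b[["intercept"]]
treat_pre    <- b[["intercept"]] + b[["disaster"]]
control_post <- b[["intercept"]] + b[["time"]]
treat_post   <- sum(b)

# Change over time within each group
control_change <- control_post - control_pre
treat_change   <- treat_post - treat_pre

# Difference in differences: treatment change minus control change
did <- treat_change - control_change

print(c(control_pre = control_pre, treat_pre = treat_pre,
        control_post = control_post, treat_post = treat_post))
print(did)
```

Because the full-precision Time_Period coefficient is used here, `did` reproduces the interaction coefficient without the rounding gap.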
Parallel trends: The ideal comparison is between two groups that share the same background and differ in only one factor, the treatment. DID relies on the assumption that, absent the treatment, the outcome in both groups would have followed the same trend over time; the groups may differ in level, but not in trend. For example, consider a scenario where the Federal Reserve implements different monetary policies in two states to determine which policy is more effective. For DID to be valid, both states should ideally have similar economic environments prior to the policy change, meaning that factors such as unemployment rates, tax rates, and other relevant economic indicators should be comparable between the two states. Under that condition, we can estimate the causal effect of the monetary policy.
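The role of a shared trend can be illustrated with simulated data. The sketch below is hypothetical and does not use the FRED data: the data frame `sim`, its columns, and the assumed effect size `true_effect` are all made up for illustration. The two groups differ in level but share the same time trend, and the interaction coefficient is compared with the difference in differences of the four cell means.

```r
# Hypothetical simulated data; none of these values come from the FRED file.
set.seed(1)
n <- 200                                   # observations per group-period cell
sim <- expand.grid(id = seq_len(n), treated = 0:1, post = 0:1)

true_effect <- 0.02                        # assumed treatment effect
sim$y <- 0.04 -                            # baseline level
  0.015 * sim$treated -                    # level gap between groups (allowed)
  0.03 * sim$post +                        # time trend shared by both groups
  true_effect * sim$treated * sim$post +   # effect only for treated units, post-period
  rnorm(nrow(sim), sd = 0.01)              # noise

# DID estimate from the regression interaction term
fit <- lm(y ~ treated * post, data = sim)
did_lm <- coef(fit)[["treated:post"]]

# DID estimate from the four cell means (rows: treated, columns: post)
cell <- tapply(sim$y, list(sim$treated, sim$post), mean)
did_means <- (cell["1", "1"] - cell["1", "0"]) -
  (cell["0", "1"] - cell["0", "0"])

print(c(did_lm = did_lm, did_means = did_means, true_effect = true_effect))
```

The two estimates agree because the model is saturated, and both land near the assumed effect because the simulated groups share a trend. If the treated group were given its own trend in the simulation, the same calculation would absorb that extra trend into the estimate.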