Diff-in-Diff

Author

Betty Wang

library(pacman)
library(readr)
library(stargazer)

library(ggplot2)
library(dplyr)

# Load the CSV file using read_csv
dataset <- read_csv("/Users/bettywang/Desktop/dataset.csv")
Rows: 48 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): STATE
dbl (5): HPI_CHG, Time_Period, Disaster_Affected, NUM_DISASTERS, NUM_IND_ASSIST

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Create the interaction term
dataset$Interaction <- dataset$Time_Period * dataset$Disaster_Affected
# Run the linear regression
model <- lm(HPI_CHG ~ Time_Period + Disaster_Affected + Interaction,
            data = dataset)
# Summary of the model
summary(model)

Call:
lm(formula = HPI_CHG ~ Time_Period + Disaster_Affected + Interaction, 
    data = dataset)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.023081 -0.007610 -0.000171  0.004656  0.035981 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)        0.037090   0.002819  13.157  < 2e-16 ***
Time_Period       -0.027847   0.003987  -6.985  1.2e-08 ***
Disaster_Affected -0.013944   0.006176  -2.258   0.0290 *  
Interaction        0.019739   0.008734   2.260   0.0288 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.01229 on 44 degrees of freedom
Multiple R-squared:  0.5356,    Adjusted R-squared:  0.504 
F-statistic: 16.92 on 3 and 44 DF,  p-value: 1.882e-07
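
The summary reports an estimate and a standard error for Interaction but no interval. As a rough sketch, a 95% confidence interval can be rebuilt from the printed values (estimate 0.019739, standard error 0.008734, 44 residual degrees of freedom). The variable names below are illustrative and not part of the original analysis. With the fitted object available, `confint(model)` should give the same interval up to rounding. The interval relies on the default OLS standard errors.

```r
# Values copied from the summary(model) output above
est <- 0.019739   # Interaction estimate
se  <- 0.008734   # its standard error
df  <- 44         # residual degrees of freedom

# Two-sided 95% interval based on the t distribution
t_crit <- qt(0.975, df)
ci <- c(lower = est - t_crit * se, upper = est + t_crit * se)
ci
```

The interval lies entirely above zero, which is consistent with the p-value of 0.0288 reported for Interaction.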
# Control group:
# Disaster_Affected = 0 (regions not affected by the disaster), observed in
# both time periods.
# Treatment group:
# Disaster_Affected = 1 (regions affected by the disaster), observed in
# both time periods.
# Time_Period = 0 is the pre-disaster period and Time_Period = 1 is the
# post-disaster period.

# Difference-in-Differences:
# Difference-in-Differences (DiD) measures the effect of the disaster on the
# affected group by comparing it with a group that did not experience the
# disaster.
# Before the disaster, record the outcome variable HPI_CHG for both groups
# (Disaster_Affected = 0 and Disaster_Affected = 1).
# Repeat this after the disaster.
# Then compute how HPI_CHG changed from before to after within each group
# (Disaster_Affected = 0 and Disaster_Affected = 1) separately.
# The DiD estimate is the extra change in the affected group
# (Disaster_Affected = 1) relative to the change in the unaffected group
# (Disaster_Affected = 0). Because anything that shifts both groups equally
# cancels out in this subtraction, DiD controls for such common factors.
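
The comparison described above can be sketched with plain group means. The data frame `toy` below is hypothetical and invented purely for illustration (it is not the dataset used in this document); only the column names mirror the real ones. The sketch computes the four cell means, takes the difference of the two before/after changes, and checks the result against the interaction coefficient from `lm()`. It uses the `*` formula shorthand, which expands to both main effects plus their interaction.

```r
# Hypothetical toy data, NOT the real dataset: two observations per cell
toy <- data.frame(
  Time_Period       = c(0, 0, 1, 1, 0, 0, 1, 1),
  Disaster_Affected = c(0, 0, 0, 0, 1, 1, 1, 1),
  HPI_CHG           = c(0.040, 0.036, 0.010, 0.008, 0.025, 0.021, 0.016, 0.014)
)

# Mean outcome in each cell: rows index Time_Period, columns Disaster_Affected
cell_means <- with(toy, tapply(HPI_CHG, list(Time_Period, Disaster_Affected), mean))

# Change over time within each group, then the difference of those changes
change_control <- cell_means["1", "0"] - cell_means["0", "0"]
change_treated <- cell_means["1", "1"] - cell_means["0", "1"]
did_manual <- change_treated - change_control

# Same quantity from a regression with main effects and their interaction
toy_model <- lm(HPI_CHG ~ Time_Period * Disaster_Affected, data = toy)
did_lm <- coef(toy_model)[["Time_Period:Disaster_Affected"]]

# The two approaches should agree up to floating-point error
all.equal(did_manual, did_lm)
```

With two binary indicators and their interaction the model is saturated, so the fitted values are exactly the cell means and the two calculations coincide.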

# Calculate the mean of HPI_CHG
mean_HPI_CHG <- mean(dataset$HPI_CHG, na.rm = TRUE)
# Display the mean
mean_HPI_CHG
[1] 0.02231762
# Extract the regression coefficients
coef_model <- coef(model)
# Calculate the predicted values for each group
# Group 1: Time_Period = 0, Disaster_Affected = 0
pred_00 <- coef_model["(Intercept)"]
# Group 2: Time_Period = 1, Disaster_Affected = 0
pred_10 <- coef_model["(Intercept)"] + coef_model["Time_Period"]
# Group 3: Time_Period = 0, Disaster_Affected = 1
pred_01 <- coef_model["(Intercept)"] + coef_model["Disaster_Affected"]
# Group 4: Time_Period = 1, Disaster_Affected = 1
pred_11 <- coef_model["(Intercept)"] + coef_model["Time_Period"] +
  coef_model["Disaster_Affected"] + coef_model["Interaction"]

# Create the 2x2 matrix
regression_matrix <- matrix(c(pred_00, pred_01, pred_10, pred_11),
                            nrow = 2, byrow = TRUE)
# Name the rows and columns
rownames(regression_matrix) <- c("Time_Period = 0", "Time_Period = 1")
colnames(regression_matrix) <- c("Disaster_Affected = 0", "Disaster_Affected = 1")
# Display the matrix
regression_matrix
                Disaster_Affected = 0 Disaster_Affected = 1
Time_Period = 0           0.037090020            0.02314612
Time_Period = 1           0.009242792            0.01503835
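
As an alternative to adding coefficients by hand, the four cell predictions can be obtained with `predict()`. The sketch below uses a hypothetical toy data frame (invented for illustration, not the real dataset) and builds the `Interaction` column in the new data the same way as in the fit. The names `toy`, `toy_model`, `cells`, and `pred_matrix` are assumptions introduced here.

```r
# Hypothetical toy data, NOT the real dataset
toy <- data.frame(
  Time_Period       = c(0, 0, 1, 1, 0, 0, 1, 1),
  Disaster_Affected = c(0, 0, 0, 0, 1, 1, 1, 1),
  HPI_CHG           = c(0.040, 0.036, 0.010, 0.008, 0.025, 0.021, 0.016, 0.014)
)
toy$Interaction <- toy$Time_Period * toy$Disaster_Affected
toy_model <- lm(HPI_CHG ~ Time_Period + Disaster_Affected + Interaction, data = toy)

# One row per cell; Time_Period varies fastest in expand.grid()
cells <- expand.grid(Time_Period = 0:1, Disaster_Affected = 0:1)
cells$Interaction <- cells$Time_Period * cells$Disaster_Affected

# matrix() fills column by column, so rows are Time_Period and
# columns are Disaster_Affected
pred_matrix <- matrix(
  predict(toy_model, newdata = cells), nrow = 2,
  dimnames = list(c("Time_Period = 0", "Time_Period = 1"),
                  c("Disaster_Affected = 0", "Disaster_Affected = 1"))
)
pred_matrix
```

This avoids indexing coefficients by name, at the cost of having to keep the `Interaction` column in the new data consistent with the one used for fitting.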
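# Calculate the change over time for the non-affected and affected groups.
# Note: the "(Intercept)" label in the output below is only a name carried over
# from the coefficient vector; the values are the differences themselves.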
difference_Time <- pred_10 - pred_00
difference_Time_Affected <- pred_11 - pred_01
# Display the result
difference_Time
(Intercept) 
-0.02784723 
difference_Time_Affected
 (Intercept) 
-0.008107772 
# Calculate the difference in difference coefficient
difference_Interaction <- difference_Time_Affected - difference_Time
# Display the result
difference_Interaction
(Intercept) 
 0.01973946 
# This result, derived from the 2x2 table, matches the difference-in-differences
# coefficient "Interaction" (0.019739) in the linear regression model.
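
The match is not a coincidence; it follows from the algebra of the model. As a short worked derivation, write \(\beta_0, \beta_1, \beta_2, \beta_3\) for the coefficients on the intercept, Time_Period, Disaster_Affected, and Interaction, and let \(\hat{y}_{td}\) denote the predicted value for Time_Period = t and Disaster_Affected = d (this notation is introduced here for illustration and mirrors `pred_00` through `pred_11`):

\[ \hat{y}_{00} = \beta_0, \quad \hat{y}_{10} = \beta_0 + \beta_1, \quad \hat{y}_{01} = \beta_0 + \beta_2, \quad \hat{y}_{11} = \beta_0 + \beta_1 + \beta_2 + \beta_3 \]

\[ (\hat{y}_{11} - \hat{y}_{01}) - (\hat{y}_{10} - \hat{y}_{00}) = (\beta_1 + \beta_3) - \beta_1 = \beta_3 \]

With the values computed above, this is \((-0.008108) - (-0.027847) = 0.019739\).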

\[ \mathrm{HPI.CHG}_i = \beta_0 + \beta_1 \, \mathrm{Time.Period}_i + \beta_2 \, \mathrm{Disaster.Affected}_i + \beta_3 \, \mathrm{Interaction}_i + \epsilon_i \]

\[ y_{ihv} = \beta_0 + \beta_1 BH_{ihv} + \beta_2 F_{ihv} + \beta_3 T_{ihv} + \beta_4 F_{ihv} \times BH_{ihv} + \beta_5 T_{ihv} \times BH_{ihv} + \beta_6 T_{ihv} \times F_{ihv} + \beta_7 T_{ihv} \times F_{ihv} \times BH_{ihv} + \epsilon_{ihv} \]