Part B.

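#import data 
df1 <- read_csv("us_fred_coastal_us_states_avg_hpi_before_after_2005.csv", show_col_types = FALSE)
#overview of the data
#note: stargazer() expects a data.frame; given the tibble returned by read_csv() it prints
#an empty table, as in the output below. stargazer(as.data.frame(df1)) gives the summary statistics.
stargazer(df1)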
## 
## % Table created by stargazer v.5.2.3 by Marek Hlavac, Social Policy Institute. E-mail: marek.hlavac at gmail.com
## % Date and time: Tue, Oct 10, 2023 - 19:59:12
## \begin{table}[!htbp] \centering 
##   \caption{} 
##   \label{} 
## \begin{tabular}{@{\extracolsep{5pt}}lccccc} 
## \\[-1.8ex]\hline 
## \hline \\[-1.8ex] 
## Statistic & \multicolumn{1}{c}{N} & \multicolumn{1}{c}{Mean} & \multicolumn{1}{c}{St. Dev.} & \multicolumn{1}{c}{Min} & \multicolumn{1}{c}{Max} \\ 
## \hline \\[-1.8ex] 
## \hline \\[-1.8ex] 
## \end{tabular} 
## \end{table}
#create did variable 
df1$did <- df1$Time_Period*df1$Disaster_Affected

Estimating Equation

\[ y_i = \beta_0 + \beta_1TimePeriod_i + \beta_2Treated_i +\beta_3(TimePeriod_i*Treated_i)+\epsilon_i \]
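
Substituting the four combinations of the two dummies into the estimating equation shows what each coefficient measures. This is a short sketch that assumes \(E(\epsilon_i)=0\) within each cell:

\[
\begin{aligned}
E(y_i \mid TimePeriod_i=0,\ Treated_i=0) &= \beta_0 \\
E(y_i \mid TimePeriod_i=1,\ Treated_i=0) &= \beta_0+\beta_1 \\
E(y_i \mid TimePeriod_i=0,\ Treated_i=1) &= \beta_0+\beta_2 \\
E(y_i \mid TimePeriod_i=1,\ Treated_i=1) &= \beta_0+\beta_1+\beta_2+\beta_3
\end{aligned}
\]

The treated group changes by \(\beta_1+\beta_3\) over time and the control group by \(\beta_1\), so \(\beta_3\) is the part of the treated group's change that the control group does not share.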

#create model: regress HPI_CHG on the period dummy, the treatment dummy and their interaction (did)
lm_formula <- HPI_CHG ~ Time_Period + Disaster_Affected + did
lm1 <- lm(formula = lm_formula, data = df1)
summary(lm1)
## 
## Call:
## lm(formula = lm_formula, data = df1)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.023081 -0.007610 -0.000171  0.004656  0.035981 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        0.037090   0.002819  13.157  < 2e-16 ***
## Time_Period       -0.027847   0.003987  -6.985  1.2e-08 ***
## Disaster_Affected -0.013944   0.006176  -2.258   0.0290 *  
## did                0.019739   0.008734   2.260   0.0288 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01229 on 44 degrees of freedom
## Multiple R-squared:  0.5356, Adjusted R-squared:  0.504 
## F-statistic: 16.92 on 3 and 44 DF,  p-value: 1.882e-07
stargazer(lm1, type = "text", title = "Model Results")
## 
## Model Results
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                               HPI_CHG          
## -----------------------------------------------
## Time_Period                  -0.028***         
##                               (0.004)          
##                                                
## Disaster_Affected            -0.014**          
##                               (0.006)          
##                                                
## did                           0.020**          
##                               (0.009)          
##                                                
## Constant                     0.037***          
##                               (0.003)          
##                                                
## -----------------------------------------------
## Observations                    48             
## R2                             0.536           
## Adjusted R2                    0.504           
## Residual Std. Error       0.012 (df = 44)      
## F Statistic           16.916*** (df = 3; 44)   
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01
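
The manually built did column is equivalent to letting R's formula interface create the interaction, since `a * b` in a formula expands to `a + b + a:b`. The sketch below illustrates this on a small synthetic data set; the data frame `toy` and its values are made up for illustration and are not the FRED data used above.

```r
# Synthetic stand-in for df1 (hypothetical values, for illustration only)
set.seed(1)
toy <- expand.grid(Time_Period = c(0, 1),
                   Disaster_Affected = c(0, 1),
                   rep = 1:12)
toy$HPI_CHG <- 0.037 - 0.028 * toy$Time_Period -
  0.014 * toy$Disaster_Affected +
  0.020 * toy$Time_Period * toy$Disaster_Affected +
  rnorm(nrow(toy), sd = 0.01)

# Manual interaction column, as in the model above
toy$did <- toy$Time_Period * toy$Disaster_Affected
fit_manual <- lm(HPI_CHG ~ Time_Period + Disaster_Affected + did, data = toy)

# Formula shorthand: both main effects plus their interaction
fit_formula <- lm(HPI_CHG ~ Time_Period * Disaster_Affected, data = toy)

# Same estimates; only the name of the interaction term differs
coef_manual  <- unname(coef(fit_manual))
coef_formula <- unname(coef(fit_formula))
print(all.equal(coef_manual, coef_formula))
```

With the shorthand, the interaction coefficient is reported as `Time_Period:Disaster_Affected` instead of `did`.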

2 X 2 Matrix

The 2 x 2 matrix for the diff in diff model is built from the expected value of the dependent variable in each combination of time period and treatment status. Hence, we first construct a simple table of the mean of HPI_CHG by Time_Period and Disaster_Affected.

#compute mean 
mn <- aggregate(df1$HPI_CHG, FUN=mean, 
                by=list(df1$Time_Period, df1$Disaster_Affected))
colnames(mn) <- c('Time_Period', 'Disaster_Affected', 'Mean of HPI_CHG') #rename columns
print(mn)
##   Time_Period Disaster_Affected Mean of HPI_CHG
## 1           0                 0     0.037090020
## 2           1                 0     0.009242792
## 3           0                 1     0.023146118
## 4           1                 1     0.015038346

Calculate Diff in Diff Coefficients using Data

\(\beta_0\) is simply the mean value of the control group before treatment, which is the first mean in the table, where Time_Period and Disaster_Affected are both 0.

\(\beta_1\) is the change in the control group's mean from Time_Period 0 to Time_Period 1, that is, the second mean in the table minus the first.

\(\beta_2\) is the difference in mean between the treatment group (Disaster_Affected = 1) and the control group (Disaster_Affected = 0) when Time_Period is 0, that is, the third mean in the table minus the first.

\(\beta_3\) is the difference in differences: the change over time in the gap between the two groups, \(E(Y_{Treatment}-Y_{Control}|Time=1)-E(Y_{Treatment}-Y_{Control}|Time=0)\). It is calculated by subtracting \(\beta_2\) from the difference between the treatment and control group when Time_Period is 1.

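# calculate coefficients from mean 
# rows of mn: 1 = control/pre, 2 = control/post, 3 = treated/pre, 4 = treated/post
b0 <- signif(mn[1,3], 2)                # control group mean before treatment
b1 <- signif((mn[2,3] - mn[1,3]), 2)    # change over time in the control group
b2 <- signif((mn[3,3] - mn[1,3]), 2)    # treated minus control before treatment
b3 <- signif(((mn[4,3] - mn[2,3]) - (mn[3,3] - mn[1,3])), 2)  # post-period gap minus pre-period gap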
b3
## [1] 0.02
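
As a check on the arithmetic, the same quantities can be worked out by hand from the table of means (values rounded to six decimals):

\[
\begin{aligned}
\beta_0 &= 0.037090 \\
\beta_1 &= 0.009243 - 0.037090 = -0.027847 \\
\beta_2 &= 0.023146 - 0.037090 = -0.013944 \\
\beta_3 &= (0.015038 - 0.009243) - (0.023146 - 0.037090) = 0.005795 + 0.013944 = 0.019739
\end{aligned}
\]

These agree with the estimates in the regression output, including the negative signs on Time_Period and Disaster_Affected.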

Now that we have the coefficients, we can create a complete matrix with the values.

2 x 2 Matrix

| | Time_Period = 0 | Time_Period = 1 |
|---|---|---|
| Treated = 0 | \(y_i=0.037+\epsilon_i\) | \(y_i=0.037-0.028+\epsilon_i\) |
| Treated = 1 | \(y_i=0.037-0.014+\epsilon_i\) | \(y_i=0.037-0.028-0.014+0.020+\epsilon_i\) |

Part C.

Interpretation

  1. What is the control and the control group, and what is the treatment and treatment group?

    In this model, Disaster_Affected is the group dummy (its coefficient is \(\beta_2\)): observations with a value of 0 are in the control group and observations with a value of 1 are in the treatment group. The treatment is exposure to the disaster, and the control is the absence of that exposure. Time_Period marks timing rather than treatment: 0 indicates the pre-treatment period and 1 the post-treatment period.

  2. The basic idea is to compare the difference in outcomes between treatment and control groups before and after treatment is introduced. By differencing these differences, what do you hope to achieve and why should it work? Intuitively, why does the methodology work? Describe it in plain, simple English to your grandma.

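    A main purpose of many studies is to understand the effect of a treatment. However, other factors also affect the outcome. In simple terms, difference in differences compares how much the outcome changed in the treatment group with how much it changed over the same period in a control group that did not receive the treatment. The change in the control group captures everything else that happened over time, so subtracting it from the change in the treatment group leaves the effect of the treatment. Taking differences within each group also removes permanent differences between the two groups. This works as long as the two groups would have followed the same trend without the treatment.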

  3. Does your difference in differences coefficient in the linear regression above match the difference in differences effect of the two groups from the 2*2 table created?

Yes. The did coefficient from the linear regression (0.0197, which rounds to 0.020) equals the difference in differences computed from the four group means in the 2 x 2 table (0.02).
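
The match is not a coincidence: with two dummies and their interaction the regression has one parameter per cell, so OLS reproduces the four cell means exactly. A minimal sketch of that check, on synthetic data (the data frame `toy` and its values are hypothetical, not the FRED data):

```r
# Synthetic 2 x 2 data (hypothetical values, for illustration only)
set.seed(42)
toy <- data.frame(
  Time_Period       = rep(c(0, 1, 0, 1), each = 12),
  Disaster_Affected = rep(c(0, 0, 1, 1), each = 12)
)
toy$HPI_CHG <- rnorm(nrow(toy), mean = 0.02, sd = 0.01)

# Cell means: rows are Time_Period (0, 1), columns are Disaster_Affected (0, 1)
m <- tapply(toy$HPI_CHG, list(toy$Time_Period, toy$Disaster_Affected), mean)

# Difference in differences from the four means:
# change in the treated group minus change in the control group
did_means <- unname((m["1", "1"] - m["0", "1"]) - (m["1", "0"] - m["0", "0"]))

# Interaction coefficient from the regression with both dummies and their product
fit <- lm(HPI_CHG ~ Time_Period * Disaster_Affected, data = toy)
did_ols <- unname(coef(fit)["Time_Period:Disaster_Affected"])

print(all.equal(did_means, did_ols))
```

The two numbers agree up to floating point error whatever values the outcome takes, which is why the table and the regression must give the same answer.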

Part D.

What are the “threats to identification”? In other words, what are the “implicit assumptions”, like the 5 Gauss Markov assumptions/conditions we have for simple OLS? Alternatively, under what conditions can you trust the point estimates, and when would you take the study results with a grain of salt?

  1. For the causal effect estimated by a difference in differences regression model to be reliable, several assumptions must hold.
    • Data used in the model must be panel data or repeated cross-sectional data, so that both groups are observed before and after treatment. Differencing over time then removes potential bias caused by permanent differences between the control and treatment group.

    • Exchangeability - The control and treatment groups should be comparable, as they would be under random assignment in a randomized controlled trial.

    • Parallel Trend Assumption - In the absence of treatment, the control and treatment group should follow the same trend over time, so that the difference between the groups would stay constant. This is what allows an unbiased estimate of the causal effect. It can be checked with data visualization when several pre-treatment periods are available.

    • Stable Unit Treatment Value Assumption (SUTVA)

      • Each observation’s potential outcome is not affected by another observation’s exposure to treatment.

      • The treatment should be the same for every treated observation; different versions or levels of treatment might lead to different outcomes.
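
The visual check of parallel trends mentioned in the list needs more than one pre-treatment period, which the two-period data in this report does not have. The sketch below therefore uses a made-up yearly panel; the names `panel`, `year`, `treated` and `y`, and all values, are hypothetical. It plots the group means by year and, as a numeric companion, estimates the gap between the two groups' pre-treatment slopes.

```r
# Hypothetical yearly pre-treatment panel (made-up names and values)
set.seed(7)
panel <- expand.grid(year = 2000:2004, treated = c(0, 1), unit = 1:10)
# Both groups share the same pre-treatment slope; treated units start lower
panel$y <- 0.03 + 0.002 * (panel$year - 2000) - 0.01 * panel$treated +
  rnorm(nrow(panel), sd = 0.003)

# Mean outcome per year (rows) and group (columns)
means <- tapply(panel$y, list(panel$year, panel$treated), mean)
colnames(means) <- c("control", "treated")

# Visual check: the two lines should move roughly in parallel before treatment
matplot(as.numeric(rownames(means)), means, type = "b", pch = 1:2, lty = 1:2,
        col = 1:2, xlab = "Year", ylab = "Mean outcome")
legend("topleft", legend = colnames(means), pch = 1:2, lty = 1:2, col = 1:2)

# Numeric companion: the year x treated interaction is the gap in pre-period slopes
pre_fit <- lm(y ~ year * treated, data = panel)
slope_gap <- unname(coef(pre_fit)["year:treated"])
print(slope_gap)
```

A slope gap close to zero is consistent with parallel pre-treatment trends, although it cannot prove that the trends would have stayed parallel after treatment.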

Useful Links (Diff-in-Diff)