A Guide To Using The Difference-In-Differences Regression Model: https://towardsdatascience.com/a-guide-to-using-the-difference-in-differences-regression-model-87cd2fb3224a
In a simple diff-in-diff paper, look for either the 2 × 2 table of group means (treatment vs. control, before vs. after), from which you can construct the diff-in-diff estimator, or a regression of the following form: Y = α + β1(time) + β2(treatment) + β3(time*treatment)
β1 is the expected mean change in outcome from before to after the onset of the intervention era among the control group. It reflects, if you will, the pure effect of the passage of time in the absence of the actual intervention.
β2 (coefficient of the treatment variable) is the estimated mean difference in Y between the treatment and control groups prior to the intervention: it represents whatever “baseline” differences existed between the groups before the intervention was applied to the treatment group.
β3 by itself is the difference in differences estimator. In most contexts, it is β3 that is the focus of interest. It tells us whether the expected mean change in outcome from before to after was different in the two groups. (That would typically be the hallmark of an effective intervention, assuming adequate power, etc.) To get the estimated mean difference in Y between the treatment and control groups after the intervention, you need to look at β2 + β3. It is possible that you will find that β2 + β3 is significantly different from zero, even though neither β2 nor β3 by itself is.
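As a small worked illustration of how the coefficients map onto the four group-by-period means, the sketch below plugs in made-up coefficient values (`alpha`, `b1`, `b2`, `b3` are assumptions chosen for easy arithmetic, not estimates from any dataset) and shows which sums of coefficients correspond to which comparisons.

```r
# Hypothetical coefficient values, chosen only for illustration
alpha <- 10  # control group mean in the pre-period
b1 <- 2      # change over time in the control group
b2 <- 1      # baseline gap (treatment minus control, pre-period)
b3 <- 3      # difference in differences

# Cell means implied by Y = alpha + b1*time + b2*treatment + b3*time*treatment
# (matrix() fills column by column: control pre/post, then treatment pre/post)
cell_means <- matrix(
  c(alpha,      alpha + b1,
    alpha + b2, alpha + b1 + b2 + b3),
  nrow = 2,
  dimnames = list(period = c("pre", "post"), group = c("control", "treatment"))
)
print(cell_means)

# Gap between groups after the intervention equals b2 + b3
post_gap <- cell_means["post", "treatment"] - cell_means["post", "control"]

# Change over time within the treatment group equals b1 + b3
treat_change <- cell_means["post", "treatment"] - cell_means["pre", "treatment"]

# Treatment change minus control change recovers b3
did <- treat_change - (cell_means["post", "control"] - cell_means["pre", "control"])
```

With these numbers the post-period gap is 4, the treatment group's change is 5, and the difference in differences is 3.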
Estimating Equation
\(y_i = \beta_0 + \beta_1 TimePeriod_i + \beta_2 Treatment_i + \beta_3 (TimePeriod \times Treatment)_i + \epsilon_i\)
The dependent variable is HPI_CHG and the independent variables are Time_Period (dummy), Disaster_Affected (dummy), and their interaction term.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
diff_data <- read.csv("E:/RM/us_fred_coastal_us_states_avg_hpi_before_after_2005.csv")
# Creating the interaction term
Interaction_Term <- diff_data$Time_Period * diff_data$Disaster_Affected
# Simple linear regression
model <- lm(HPI_CHG ~ Time_Period + Disaster_Affected + Interaction_Term, data = diff_data)
regression_DID <- summary(model)$coefficients["Interaction_Term", "Estimate"]
# Summary of the regression
summary(model)
##
## Call:
## lm(formula = HPI_CHG ~ Time_Period + Disaster_Affected + Interaction_Term,
## data = diff_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.023081 -0.007610 -0.000171 0.004656 0.035981
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.037090 0.002819 13.157 < 2e-16 ***
## Time_Period -0.027847 0.003987 -6.985 1.2e-08 ***
## Disaster_Affected -0.013944 0.006176 -2.258 0.0290 *
## Interaction_Term 0.019739 0.008734 2.260 0.0288 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01229 on 44 degrees of freedom
## Multiple R-squared: 0.5356, Adjusted R-squared: 0.504
## F-statistic: 16.92 on 3 and 44 DF, p-value: 1.882e-07
print(regression_DID)
## [1] 0.01973946
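The interaction column does not have to be built by hand. A minimal sketch, assuming a made-up data frame `toy` (simulated for illustration, not the housing dataset), shows that R's formula shorthand `a * b` expands to `a + b + a:b` and gives the same interaction estimate as the manual column.

```r
# Simulated data for illustration only; the numbers are not from the housing dataset
set.seed(1)
toy <- expand.grid(Time_Period = 0:1, Disaster_Affected = 0:1, unit = 1:6)
toy$HPI_CHG <- 0.04 - 0.03 * toy$Time_Period - 0.01 * toy$Disaster_Affected +
  0.02 * toy$Time_Period * toy$Disaster_Affected + rnorm(nrow(toy), sd = 0.01)

# Manual interaction column
toy$Interaction_Term <- toy$Time_Period * toy$Disaster_Affected
fit_manual <- lm(HPI_CHG ~ Time_Period + Disaster_Affected + Interaction_Term, data = toy)

# Formula shorthand: Time_Period * Disaster_Affected adds both main effects
# and the Time_Period:Disaster_Affected interaction
fit_formula <- lm(HPI_CHG ~ Time_Period * Disaster_Affected, data = toy)

did_manual <- unname(coef(fit_manual)["Interaction_Term"])
did_formula <- unname(coef(fit_formula)["Time_Period:Disaster_Affected"])
print(c(manual = did_manual, formula = did_formula))
```

Keeping the interaction inside the formula avoids relying on a free-standing vector that lives outside the data frame, which can silently go out of sync if the data are filtered or reordered.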
The control group consists of states that were not significantly affected by coastal weather events; it shows how house prices changed around the 2005 Atlantic hurricane season in states that were not hit. The treatment is exposure to those coastal weather events, and the treatment group consists of the states that were heavily impacted. The effect of the treatment is estimated by comparing house price changes before and after the events across the two groups.
We’re trying to figure out if hurricanes affect house prices. The Control Group includes states where hurricanes had little impact, so we can see how prices change without hurricanes. The Treatment Group has states hit hard by hurricanes, so we can see how prices change with hurricanes. We want to see if hurricanes really make prices go up or down. To do that, we compare how prices changed before and after hurricanes in both groups. It should work because it helps us separate the real effect of hurricanes from other things that might also be changing prices. By comparing the two groups and looking at how prices change before and after the hurricanes, we can be more confident that any big price changes we see are likely caused by the hurricanes themselves.
mean_control_pre <- mean(diff_data$HPI_CHG[diff_data$Time_Period == 0 & diff_data$Disaster_Affected == 0])
mean_control_post <- mean(diff_data$HPI_CHG[diff_data$Time_Period == 1 & diff_data$Disaster_Affected == 0])
mean_treatment_pre <- mean(diff_data$HPI_CHG[diff_data$Time_Period == 0 & diff_data$Disaster_Affected == 1])
mean_treatment_post <- mean(diff_data$HPI_CHG[diff_data$Time_Period == 1 & diff_data$Disaster_Affected == 1])
# matrix() fills column by column: rows are pre/post, columns are control/treatment
matrix_2x2 <- matrix(c(mean_control_pre, mean_control_post, mean_treatment_pre, mean_treatment_post), nrow = 2)
print(matrix_2x2)
## [,1] [,2]
## [1,] 0.037090020 0.02314612
## [2,] 0.009242792 0.01503835
DID_coefficient <- (mean_treatment_post - mean_treatment_pre) - (mean_control_post - mean_control_pre)
print(DID_coefficient)
## [1] 0.01973946
coefficients_match <- abs(DID_coefficient - regression_DID)
coefficients_match
## [1] 0
Therefore, the difference in differences coefficient from the linear regression matches the estimate computed from the group means: the post-treatment minus pre-treatment change in the treatment group’s mean, minus the post-treatment minus pre-treatment change in the control group’s mean.
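The match is not a coincidence: with two dummies and their interaction, the regression has one parameter per cell, so its fitted values are exactly the four cell means. A minimal sketch on simulated data (the data frame `toy` is made up for illustration) builds a labeled 2 × 2 table with `tapply()` and compares the two estimates with `all.equal()`, which allows for floating-point rounding rather than demanding an exact zero difference.

```r
# Simulated data for illustration only
set.seed(2)
toy <- expand.grid(Time_Period = 0:1, Disaster_Affected = 0:1, unit = 1:6)
toy$HPI_CHG <- rnorm(nrow(toy), mean = 0.02, sd = 0.01)

# Labeled 2 x 2 table of cell means: rows are periods, columns are groups
cell_means <- tapply(toy$HPI_CHG,
                     list(toy$Time_Period, toy$Disaster_Affected),
                     mean)
dimnames(cell_means) <- list(period = c("pre", "post"),
                             group = c("control", "treatment"))
print(cell_means)

# DiD from the table of means
did_means <- (cell_means["post", "treatment"] - cell_means["pre", "treatment"]) -
  (cell_means["post", "control"] - cell_means["pre", "control"])

# DiD from the regression interaction coefficient
fit <- lm(HPI_CHG ~ Time_Period * Disaster_Affected, data = toy)
did_reg <- unname(coef(fit)["Time_Period:Disaster_Affected"])

# Tolerance-aware comparison
estimates_match <- isTRUE(all.equal(did_means, did_reg))
print(estimates_match)
```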
In difference in difference analysis, there are many assumptions and potential threats to identification that we should consider. These assumptions are critical to trust the point estimates and the validity of the study results.
Parallel Trends Assumption: It assumes that, in the absence of the treatment, the two groups (treatment and control) would follow parallel trends over time. If this assumption is violated, other factors are influencing the two groups’ outcomes differently, making it difficult to attribute changes solely to the treatment. Here, the assumption implies that in the absence of coastal weather events, house prices in both groups would have moved in parallel; similar pre-treatment trends are the usual supporting evidence and serve as the baseline for comparison.
Common Trends: Under this assumption, both groups should show common trends before the treatment begins. This is essential for creating a baseline against which to compare post-treatment changes.
Adequate Time Periods: DiD may not work well with very short time periods because there may not be enough data points to capture trends accurately. To address this, we obtained a dataset with sufficient time periods before and after the coastal weather events. We collected the target variable data (housing prices) for four quarters before and after the hurricane and, for each state, calculated the average quarter-over-quarter fractional change in the house price index over each of the two sets of quarters.
Large Enough Sample: Having a sufficiently large sample size is essential to detect statistically significant treatment effects. To accomplish this, we obtained data from 24 coastal states and classified them into treatment and control groups based upon the median values of the affected states to avoid bias due to an imbalanced dataset.
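The data preparation described above can be sketched roughly as follows. Everything here is a placeholder: the index levels are made up, the state labels are hypothetical, and `exposure` stands in for whatever measure was actually used for the median split, which the text does not specify. Five index levels give four quarterly changes, which is one possible reading of "four quarters".

```r
# Made-up index levels: 5 consecutive quarters for 4 hypothetical states
hpi <- data.frame(
  state = rep(c("A", "B", "C", "D"), each = 5),
  index = c(100, 101, 102, 103, 104,
            200, 204, 208, 212, 216,
            150, 150, 153, 153, 156,
            120, 126, 126, 132, 132)
)

# Fractional quarter-over-quarter change: (x_t - x_{t-1}) / x_{t-1}
qoq_change <- function(x) diff(x) / head(x, -1)

# Average quarter-over-quarter change per state
avg_chg <- tapply(hpi$index, hpi$state, function(x) mean(qoq_change(x)))
print(avg_chg)

# Hypothetical exposure measure per state (placeholder for the real split variable)
exposure <- c(A = 0.2, B = 3.5, C = 1.1, D = 7.9)

# Median split: above-median exposure is coded 1 (treatment), otherwise 0 (control)
affected <- as.integer(exposure > median(exposure))
names(affected) <- names(exposure)
print(affected)
```

A median split produces two groups of equal size by construction when the number of states is even and there are no ties at the median.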
A. ARTICLE SUMMARY
The treatment in this study is the sudden inflow of low-skilled Cuban immigrants due to the Mariel Boatlift. The treatment group consists of Miami where this occurred, and the control group consists of other cities in the United States that did not experience the same immigration shock.
The study compared Miami to other cities, which is the conventional way to assess the impact of the Mariel Boatlift on the local labor market. This approach is interesting because it allows researchers to isolate the effects of the Boatlift from other factors influencing the labor market. The methodology works because it leverages the differences in labor market outcomes between Miami and other cities both before and after the Boatlift. Although the findings are interesting, it is important to exercise caution before making significant policy decisions based on a single study; other relevant studies and factors should be considered as well. I am also not fully convinced by the study’s methodology: the Boatlift happened in 1980 and the survey compared the cities from 1979 to 1985, a relatively short time frame, so the impact of the Boatlift might not be fully realized in the data. Longer-term effects could differ from the short-term findings, and I would want evidence on them before making an informed decision.
The study covers a relatively short time frame. The labor market effects of the Boatlift may not fully manifest within this period. Long-term effects might differ from short-term effects.
Cuban immigrants arrived in Miami as a result of the Boatlift. These immigrants might have differed from non-migrants in terms of skills. This immigration may have an impact on labor market results. If the migrants were more skilled, the estimated effects cannot be solely attributed to the Boatlift itself. Selectivity could be driving the results.
If Miami’s labor market was already on a different trajectory from the comparison cities before the Boatlift, the parallel trends assumption is violated.
I would consider the results of the study if the above criteria have been met.
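One common way to probe the parallel trends concern, when several pre-treatment periods are available, is to test whether the groups' slopes already differed before the treatment. The sketch below is an illustration on simulated data, not the procedure used in the study: the cities, years, `treated` flag, and `outcome` values are all made up, and the groups are given the same slope by construction.

```r
set.seed(42)
# Simulated pre-period panel: 10 hypothetical cities over 4 pre-treatment years
pre <- expand.grid(city = 1:10, year = 1:4)
pre$treated <- as.integer(pre$city <= 5)  # first five cities play the treated role

# Both groups share the same slope (0.5 per year), so trends are parallel here
pre$outcome <- 2 + 0.3 * pre$treated + 0.5 * pre$year + rnorm(nrow(pre), sd = 0.1)

# The year:treated coefficient measures how much the treated group's
# pre-period slope differs from the control group's slope
fit_pre <- lm(outcome ~ year * treated, data = pre)
pre_gap <- unname(coef(fit_pre)["year:treated"])
print(summary(fit_pre)$coefficients["year:treated", ])
```

An interaction estimate close to zero is consistent with parallel pre-trends, though it cannot prove that trends would have stayed parallel after the treatment date. With a single treated city, as in the Miami case, such a test has little power.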
“California Paid Family Leave and Parental Time Use”. - Samantha Trajkovski
The author used a Diff-in-Diff method to compare parents’ time allocation before and after the implementation of the Paid Family Leave policy in California in 2004. The study found that this policy had a positive effect on the time parents spent with their children, and it highlights the importance of family-friendly policies in promoting more parental time with children in their crucial early years. In this paper, the treatment group consists of people residing in California.
https://surface.syr.edu/cpr/249/