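# Clearing the environment
rm(list = ls())
# Loading the packages used below (their startup messages follow)
library(stargazer)
library(tidyverse)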
##
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.5.0 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
data = read.csv("~/Desktop/1fc451683137398e11c75b2e47031cf1-211bac7f1490d57867d34c1a516617d59a485b21/us_fred_coastal_us_states_avg_hpi_before_after_2005.csv")
head(data)
## STATE HPI_CHG Time_Period Disaster_Affected NUM_DISASTERS
## 1 GASTHPI_CHG 0.01400856 0 0 1
## 2 NCSTHPI_CHG 0.01422063 0 0 3
## 3 TXSTHPI_CHG 0.01019172 0 1 5
## 4 MASTHPI_CHG 0.02753656 0 0 4
## 5 ALSTHPI_CHG 0.01758507 0 1 4
## 6 MSSTHPI_CHG 0.01325241 0 1 3
## NUM_IND_ASSIST
## 1 0
## 2 0
## 3 22
## 4 9
## 5 14
## 6 49
stargazer(data,
type = "text")
##
## ================================================
## Statistic N Mean St. Dev. Min Max
## ------------------------------------------------
## HPI_CHG 48 0.022 0.017 -0.006 0.061
## Time_Period 48 0.500 0.505 0 1
## Disaster_Affected 48 0.208 0.410 0 1
## NUM_DISASTERS 48 3.208 2.143 1 10
## NUM_IND_ASSIST 48 8.583 14.946 0 55
## ------------------------------------------------
# Histogram of HPI_CHG
hist(data$HPI_CHG,
breaks = 30, col = "skyblue",
main = "Distribution of HPI_CHG",
xlab = "State-wise house price inflation",
ylab = "Frequency")
#Creating the interaction term
data$did = data$Time_Period * data$Disaster_Affected
data$did
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 1 0 0 0 0 0
## [39] 0 0 0 1 0 0 0 0 0 0
#Passing it to the linear regression model
model1 = lm( data = data,
formula = HPI_CHG ~ Time_Period + Disaster_Affected + did
)
summary(model1)
##
## Call:
## lm(formula = HPI_CHG ~ Time_Period + Disaster_Affected + did,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.023081 -0.007610 -0.000171 0.004656 0.035981
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.037090 0.002819 13.157 < 2e-16 ***
## Time_Period -0.027847 0.003987 -6.985 1.2e-08 ***
## Disaster_Affected -0.013944 0.006176 -2.258 0.0290 *
## did 0.019739 0.008734 2.260 0.0288 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01229 on 44 degrees of freedom
## Multiple R-squared: 0.5356, Adjusted R-squared: 0.504
## F-statistic: 16.92 on 3 and 44 DF, p-value: 1.882e-07
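Interpretation: The intercept (0.0371) is the average HPI_CHG for control states in the pre period. The Time_Period coefficient (-0.0278) shows that house price growth in control states was lower in the post period, and the Disaster_Affected coefficient (-0.0139) shows that disaster-affected states already had lower house price growth than control states in the pre period. Both describe a slower rate of price change, not necessarily falling price levels. The interaction term did (0.0197, p = 0.029) is the difference-in-differences estimate: between the two periods, house price growth fell by about 0.0197 less in disaster-affected states than in control states, so the post-period slowdown was less severe for the affected states.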
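To make the reading of the coefficients concrete, the sketch below rebuilds the four group means from the estimates printed in the regression summary. The vector b and its element names are introduced here only for illustration; the numbers are copied from the output, so the results are accurate only up to rounding.

```r
# Coefficients copied from the summary(model1) output (rounded as printed)
b <- c(intercept = 0.037090, time = -0.027847,
       disaster = -0.013944, did = 0.019739)

# Each group mean is the sum of the coefficients that are "switched on"
group_means <- c(
  pre_control    = b[["intercept"]],
  post_control   = b[["intercept"]] + b[["time"]],
  pre_treatment  = b[["intercept"]] + b[["disaster"]],
  post_treatment = b[["intercept"]] + b[["time"]] + b[["disaster"]] + b[["did"]]
)
print(round(group_means, 6))
```

Because the model has one parameter per group, these sums should reproduce the raw group averages of HPI_CHG, which is why the did coefficient can also be obtained by differencing group means.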
The goal of comparing the differences in outcomes (housing price changes) between treatment and control groups before and after the intervention (disasters) is to isolate the disaster’s effect while adjusting for any broader trends. This method is effective because it eliminates extraneous trends impacting both groups, leaving solely the impact of the event.
Imagine two towns, one got hit by a storm, and the other didn't. We look at house prices before and after the storm in both towns. By comparing the change in house prices in the town with the storm to the change in the town without the storm, we can figure out how much the storm affected prices. This helps us not blame the storm for something that would have happened anyway.
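The two-town story can be turned into a small calculation. The numbers below are made up purely for illustration and do not come from the data set; towns, get_change, and did_toy are hypothetical names.

```r
# Made-up average house price changes for the two hypothetical towns
towns <- data.frame(
  town   = c("storm", "storm", "no_storm", "no_storm"),
  period = c("before", "after", "before", "after"),
  change = c(0.05, 0.02, 0.06, 0.01)
)

# Helper to look up one town-period cell
get_change <- function(t, p) towns$change[towns$town == t & towns$period == p]

# First difference: after minus before, within each town
storm_diff    <- get_change("storm", "after") - get_change("storm", "before")
no_storm_diff <- get_change("no_storm", "after") - get_change("no_storm", "before")

# Second difference: storm town's change minus the no-storm town's change
did_toy <- storm_diff - no_storm_diff
print(did_toy)
```

In this toy case price growth drops by 0.03 in the storm town and by 0.05 in the other town, so the storm is credited with a difference of 0.02 rather than with the full drop of 0.03.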
# Create a two-way table with labels
raw_table <- table(Time_Period = ifelse(test = data$Time_Period == 0, yes = "pre", no = "post"),
Treatment_Status = ifelse(data$Disaster_Affected == 0, yes = "control", no = "treatment")
)
raw_table
## Treatment_Status
## Time_Period control treatment
## post 19 5
## pre 19 5
?tapply # Apply a function to each cell of a ragged array, that is to each (non-empty) group of values given by a unique combination of the levels of certain factors.
# Calculate means for each group
mean_table <- tapply(X = data$HPI_CHG,
INDEX = list(data$Time_Period,
data$Disaster_Affected),
FUN = mean)
mean_table
## 0 1
## 0 0.037090020 0.02314612
## 1 0.009242792 0.01503835
# Recompute the group means with labeled rows and columns
mean_table <- tapply(data$HPI_CHG, list(Time_Period = ifelse(data$Time_Period == 0, "pre", "post"),
Treatment_Status = ifelse(data$Disaster_Affected == 0, "control", "treatment")),
mean)
# Display the table
print(mean_table)
## Treatment_Status
## Time_Period control treatment
## post 0.009242792 0.01503835
## pre 0.037090020 0.02314612
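# Calculate DiD effect: (post - pre) for treatment minus (post - pre) for control.
# Rows sort alphabetically ("post" before "pre"), so index by name rather than by position.
DiD_effect <- (mean_table["post", "treatment"] - mean_table["pre", "treatment"]) -
(mean_table["post", "control"] - mean_table["pre", "control"])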
print(DiD_effect)
## [1] 0.01973946
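They match: the DiD effect computed from the group means (0.01974) equals the coefficient on did in the regression (0.019739).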
The primary assumption of Difference-in-Differences (DiD) is parallel trends, which means that the treatment and control groups would have followed the same path in the absence of the treatment. If additional events or shocks influence only one group, or if the groups were already on different paths prior to treatment, the estimate will be biased. When pre-treatment trends are similar across groups and no other shock hits only one group at the time of treatment, the estimates are more credible.
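One common way to probe parallel trends is to look only at pre-treatment periods and test whether the time trend differs by group. The data used here has just one pre and one post observation per state, so this check cannot be run on it. The sketch below uses a hypothetical panel with four pre-treatment periods; all values and the names panel, t, treated, and trend_gap are assumptions for illustration.

```r
# Hypothetical pre-treatment panel: four periods, two groups (made-up values)
panel <- data.frame(
  t       = rep(0:3, times = 2),
  treated = rep(c(0, 1), each = 4),
  hpi_chg = c(0.030, 0.033, 0.034, 0.037,   # control group
              0.020, 0.023, 0.024, 0.027)   # treated group
)

# The t:treated coefficient measures how much the pre-period slope
# of the treated group differs from that of the control group
pretrend <- lm(hpi_chg ~ t * treated, data = panel)
trend_gap <- coef(pretrend)[["t:treated"]]
print(trend_gap)
```

A trend gap near zero is consistent with parallel pre-trends, as in this constructed example where both groups move identically. It does not prove the assumption, which concerns what would have happened after treatment.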
\[ \text{Enrollment} = \beta_0 + \beta_1 \text{Treat} + \beta_2 \text{Female} + \beta_3 \text{Bihar} + \beta_4 (\text{Treat} \times \text{Female}) + \beta_5 (\text{Treat} \times \text{Bihar}) + \beta_6 (\text{Female} \times \text{Bihar}) + \beta_7 (\text{Treat} \times \text{Female} \times \text{Bihar}) + \epsilon \]
This study compares females and males in Bihar and other states in terms of grade 9 enrollment before and after the Cycle Program was implemented. The triple-difference (DDD) approach removes confounding from gender differences, location differences, and time differences (pre/post program). By differencing across all three dimensions, the study isolates the Cycle Program’s impact on female school enrollment in Bihar.
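As a rough sketch of how the triple-difference equation works, the example below builds one made-up enrollment rate for each of the eight Treat, Female, and Bihar combinations and shows that the coefficient on the triple interaction equals the difference of two DiD estimates. The enrollment values are invented, and coding treat as 1 for the group exposed to the program period is an assumption about how Treat is defined.

```r
# One made-up enrollment rate per cell; treat varies fastest in expand.grid
cells <- expand.grid(treat = 0:1, female = 0:1, bihar = 0:1)
cells$enrollment <- c(0.50, 0.52, 0.40, 0.43, 0.45, 0.48, 0.30, 0.40)

# Helper to look up a single cell
cell <- function(t, f, b) {
  cells$enrollment[cells$treat == t & cells$female == f & cells$bihar == b]
}

# DiD (female vs male) inside Bihar minus the same DiD in other states
ddd_manual <-
  ((cell(1, 1, 1) - cell(0, 1, 1)) - (cell(1, 0, 1) - cell(0, 0, 1))) -
  ((cell(1, 1, 0) - cell(0, 1, 0)) - (cell(1, 0, 0) - cell(0, 0, 0)))

# Saturated regression: the triple interaction is beta_7 in the equation
ddd_model <- lm(enrollment ~ treat * female * bihar, data = cells)
ddd_coef <- coef(ddd_model)[["treat:female:bihar"]]

print(c(manual = ddd_manual, regression = ddd_coef))
```

With only one value per cell the model fits exactly, so this illustrates the algebra only. Standard errors would require individual-level or repeated observations.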