Setting up the environment

#Clearing the environment
rm(list = ls())

#Loading the packages used below
library(stargazer)
library(tidyverse)

## 
## Please cite as:

##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.

##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.5.0     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

B.

data = read.csv("~/Desktop/1fc451683137398e11c75b2e47031cf1-211bac7f1490d57867d34c1a516617d59a485b21/us_fred_coastal_us_states_avg_hpi_before_after_2005.csv")
head(data)

##         STATE    HPI_CHG Time_Period Disaster_Affected NUM_DISASTERS
## 1 GASTHPI_CHG 0.01400856           0                 0             1
## 2 NCSTHPI_CHG 0.01422063           0                 0             3
## 3 TXSTHPI_CHG 0.01019172           0                 1             5
## 4 MASTHPI_CHG 0.02753656           0                 0             4
## 5 ALSTHPI_CHG 0.01758507           0                 1             4
## 6 MSSTHPI_CHG 0.01325241           0                 1             3
##   NUM_IND_ASSIST
## 1              0
## 2              0
## 3             22
## 4              9
## 5             14
## 6             49

Exploring dataset

stargazer(data, 
          type = "text")

## 
## ================================================
## Statistic         N  Mean  St. Dev.  Min    Max 
## ------------------------------------------------
## HPI_CHG           48 0.022  0.017   -0.006 0.061
## Time_Period       48 0.500  0.505     0      1  
## Disaster_Affected 48 0.208  0.410     0      1  
## NUM_DISASTERS     48 3.208  2.143     1     10  
## NUM_IND_ASSIST    48 8.583  14.946    0     55  
## ------------------------------------------------

# Histogram of HPI_CHG
hist(data$HPI_CHG, 
     breaks = 30, col = "skyblue", 
     main = "Distribution of HPI_CHG", 
     xlab = "State-wise house price inflation", 
     ylab = "Frequency")

#Creating the interaction term
data$did = data$Time_Period * data$Disaster_Affected
data$did

##  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 1 0 0 0 0 0
## [39] 0 0 0 1 0 0 0 0 0 0

Estimating the DID estimator

#Passing it to the linear regression model
model1 = lm( data = data, 
             formula = HPI_CHG ~ Time_Period + Disaster_Affected + did
             )

summary(model1)

## 
## Call:
## lm(formula = HPI_CHG ~ Time_Period + Disaster_Affected + did, 
##     data = data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.023081 -0.007610 -0.000171  0.004656  0.035981 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        0.037090   0.002819  13.157  < 2e-16 ***
## Time_Period       -0.027847   0.003987  -6.985  1.2e-08 ***
## Disaster_Affected -0.013944   0.006176  -2.258   0.0290 *  
## did                0.019739   0.008734   2.260   0.0288 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01229 on 44 degrees of freedom
## Multiple R-squared:  0.5356, Adjusted R-squared:  0.504 
## F-statistic: 16.92 on 3 and 44 DF,  p-value: 1.882e-07

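Interpretation: The intercept (0.037) is the average HPI change for unaffected states in the pre period. The Time_Period coefficient (-0.028) shows that the average HPI change in unaffected states fell in the post period, and the Disaster_Affected coefficient (-0.014) shows that disaster-affected states already had a lower average HPI change than unaffected states in the pre period. The coefficient on did (0.0197, p = 0.029) is the DiD estimate: between the two periods, the HPI change in disaster-affected states fell by about 0.020 less than it did in unaffected states. Under the parallel trends assumption, this is the estimated effect of disasters on house price changes, and it is significant at the 5% level.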
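As a rough check on this reading, the four group means implied by the regression can be rebuilt from the coefficients. The sketch below hard-codes the rounded estimates printed in the summary output rather than reading them from model1, and the variable names are only illustrative.

```r
# Rounded coefficient estimates copied from the summary(model1) output
b <- c(intercept = 0.037090, time = -0.027847,
       disaster = -0.013944, did = 0.019739)

# Predicted mean HPI_CHG in each of the four cells
pre_control  <- b[["intercept"]]
post_control <- b[["intercept"]] + b[["time"]]
pre_treated  <- b[["intercept"]] + b[["disaster"]]
post_treated <- b[["intercept"]] + b[["time"]] + b[["disaster"]] + b[["did"]]

# Change over time within each group, and the difference between the changes
change_control <- post_control - pre_control
change_treated <- post_treated - pre_treated
change_treated - change_control  # equals the did coefficient
```

Because the model has one parameter per cell, these predicted means should coincide with the raw group means, up to rounding.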

C.

Control Group: States not affected by disasters (Disaster_Affected = 0), observed in both periods (Time_Period = 0 and Time_Period = 1).

Treatment Group: States affected by disasters (Disaster_Affected = 1), observed in both periods (Time_Period = 0 and Time_Period = 1).

The goal of comparing the change in outcomes (housing price changes) between the treatment and control groups before and after the intervention (disasters) is to isolate the disaster’s effect while adjusting for broader trends. The method works because any trend that affects both groups equally is differenced out, leaving only the impact of the event, provided the two groups would otherwise have followed parallel trends.

Imagine two towns: one was hit by a storm and the other was not. We look at house prices before and after the storm in both towns. By comparing the change in house prices in the town with the storm to the change in the town without the storm, we can figure out how much the storm affected prices. This helps us not blame the storm for something that would have happened anyway.
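The two-town story can be put into numbers. The prices below are entirely made up for illustration (they are not from the dataset); the point is only to show the arithmetic of subtracting one change from the other.

```r
# Hypothetical average house prices (in thousands); all numbers are invented
prices <- matrix(c(200, 210,
                   180, 170),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(town = c("no_storm", "storm"),
                                 period = c("before", "after")))

# Change over time within each town
change_no_storm <- prices["no_storm", "after"] - prices["no_storm", "before"]
change_storm    <- prices["storm", "after"] - prices["storm", "before"]

# Difference-in-differences: the storm town's change minus the other town's change
did_toy <- change_storm - change_no_storm
did_toy
```

In this toy case the storm town's prices fell by 10 while the other town's rose by 10, so the storm is credited with a drop of 20 rather than 10.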

# Create a two-way table with labels
raw_table <- table(Time_Period = ifelse(test = data$Time_Period == 0, yes = "pre", no = "post"),
                   Treatment_Status = ifelse(data$Disaster_Affected == 0, yes = "control", no = "treatment")
                   )

raw_table

##            Treatment_Status
## Time_Period control treatment
##        post      19         5
##        pre       19         5

?tapply # Apply a function to each cell of a ragged array, that is to each (non-empty) group of values given by a unique combination of the levels of certain factors.

# Calculate means for each group
mean_table <- tapply(X = data$HPI_CHG, 
                     INDEX = list(data$Time_Period, 
                                  data$Disaster_Affected),
                     FUN =  mean)
mean_table

##             0          1
## 0 0.037090020 0.02314612
## 1 0.009242792 0.01503835

# Recalculate the group means with labelled rows and columns
mean_table <- tapply(data$HPI_CHG, list(Time_Period = ifelse(data$Time_Period == 0, "pre", "post"),
                                       Treatment_Status = ifelse(data$Disaster_Affected == 0, "control", "treatment")),
                    mean)

# Display the table
print(mean_table)

##            Treatment_Status
## Time_Period     control  treatment
##        post 0.009242792 0.01503835
##        pre  0.037090020 0.02314612

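# Calculate DiD effect: (post - pre) for treatment minus (post - pre) for control
DiD_effect <- (mean_table["post", "treatment"] - mean_table["pre", "treatment"]) -
  (mean_table["post", "control"] - mean_table["pre", "control"])
print(DiD_effect)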

## [1] 0.01973946

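The DiD effect computed by hand from the group means (0.01973946) matches the coefficient on did in model1 (0.019739).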

D.

The primary assumption of Difference-in-Differences (DiD) is parallel trends: the treatment and control groups would have followed the same path in the absence of the treatment. If other events or shocks affect only one group, or if the groups were already on different paths before treatment, the estimate will be biased. The estimates can be trusted when pre-treatment trends are similar in the two groups and no other group-specific shock coincides with the treatment.
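To see how a violation of parallel trends feeds into the estimate, here is a small sketch with invented cell means (no noise, not the housing data). The names true_effect and extra_trend are hypothetical: the treated group is given an additional trend of its own, and the DiD coefficient absorbs it.

```r
# Invented numbers: the true treatment effect and a trend that only the treated group has
true_effect <- 2
extra_trend <- 1

# One row per cell: control/treated x pre/post
cells <- expand.grid(post = c(0, 1), treated = c(0, 1))
cells$y <- 10 - 2 * cells$treated + 2 * cells$post +
  (extra_trend + true_effect) * cells$post * cells$treated

# Standard DiD regression; the interaction coefficient is the DiD estimate
fit <- lm(y ~ post * treated, data = cells)
did_hat <- unname(coef(fit)["post:treated"])

# The estimate equals the true effect plus the group-specific trend
bias <- did_hat - true_effect
c(did_hat = did_hat, bias = bias)
```

Setting extra_trend to 0 in this sketch restores parallel trends, and the estimate then equals true_effect.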

E.

\[ \text{Enrollment} = \beta_0 + \beta_1 \text{Treat} + \beta_2 \text{Female} + \beta_3 \text{Bihar} + \beta_4 (\text{Treat} \times \text{Female}) + \beta_5 (\text{Treat} \times \text{Bihar}) + \beta_6 (\text{Female} \times \text{Bihar}) + \beta_7 (\text{Treat} \times \text{Female} \times \text{Bihar}) + \epsilon \]

This study compares the grade 9 enrollment of females and males, in Bihar and in other states, before and after the Cycle Program was implemented. The DDD (triple-difference) technique removes confounding effects from gender differences and location differences, as well as time differences (pre/post program). The coefficient on the triple interaction, \(\beta_7\), is the DDD estimate: by differencing across all three dimensions, it isolates the Cycle Program’s impact on female school enrollment in Bihar.
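The role of \(\beta_7\) can be illustrated with a minimal sketch on invented cell means. None of these numbers come from the study; the beta vector is a hypothetical set of values for \(\beta_0\) through \(\beta_7\), and the lower-case variable names mirror the terms in the equation.

```r
# One row per cell of the 2 x 2 x 2 design
cells <- expand.grid(treat = c(0, 1), female = c(0, 1), bihar = c(0, 1))

# Hypothetical values for beta_0 ... beta_7 (beta[8] plays the role of beta_7)
beta <- c(0.40, 0.03, -0.10, -0.05, 0.01, 0.02, -0.02, 0.05)
cells$enrollment <- with(cells,
  beta[1] + beta[2] * treat + beta[3] * female + beta[4] * bihar +
  beta[5] * treat * female + beta[6] * treat * bihar +
  beta[7] * female * bihar + beta[8] * treat * female * bihar)

# Triple-difference regression; the three-way interaction is the DDD estimate
fit <- lm(enrollment ~ treat * female * bihar, data = cells)
ddd_hat <- unname(coef(fit)["treat:female:bihar"])

# Same quantity by hand: the female-vs-male DiD in Bihar minus the same DiD elsewhere
m <- function(t, f, b) {
  cells$enrollment[cells$treat == t & cells$female == f & cells$bihar == b]
}
did_bihar <- (m(1, 1, 1) - m(0, 1, 1)) - (m(1, 0, 1) - m(0, 0, 1))
did_other <- (m(1, 1, 0) - m(0, 1, 0)) - (m(1, 0, 0) - m(0, 0, 0))
ddd_manual <- did_bihar - did_other

c(ddd_hat = ddd_hat, ddd_manual = ddd_manual)
```

Both routes return the value placed in the last element of beta, which is the sense in which the triple interaction is a difference of two difference-in-differences.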

Diff in Diff

Aritra

2024-10-09
