diff-in-diff

Author

Song Lu

B. Linear Regression

  • the dependent variable is HPI_CHG

  • the independent variables are Time_Period (dummy), Disaster_Affected (dummy), and their interaction term

df <- read.csv("E:\\coastal.csv")

model <- lm(df$HPI_CHG ~ df$Time_Period + df$Disaster_Affected + df$Time_Period*df$Disaster_Affected)

model

Call:
lm(formula = df$HPI_CHG ~ df$Time_Period + df$Disaster_Affected + 
    df$Time_Period * df$Disaster_Affected)

Coefficients:
                        (Intercept)                       df$Time_Period  
                            0.03709                             -0.02785  
               df$Disaster_Affected  df$Time_Period:df$Disaster_Affected  
                           -0.01394                              0.01974  
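The same kind of model can be written more compactly. In an R formula, `a * b` already expands to `a + b + a:b`, and passing `data =` avoids the `df$` prefixes in the coefficient names. The sketch below uses a small made-up data frame (the names `toy`, `period`, `treated`, and `y` and all values are hypothetical, not taken from coastal.csv) to illustrate that the interaction coefficient equals the difference-in-differences of the four cell means.

```r
# Toy data (hypothetical values, not from coastal.csv): two observations per cell
toy <- data.frame(
  period  = c(0, 0, 0, 0, 1, 1, 1, 1),
  treated = c(0, 0, 1, 1, 0, 0, 1, 1),
  y       = c(1.0, 1.2, 0.8, 1.0, 0.5, 0.7, 0.9, 1.1)
)

# `period * treated` expands to period + treated + period:treated
fit <- lm(y ~ period * treated, data = toy)

# Cell means, rows indexed by period and columns by treated
cell <- tapply(toy$y, list(toy$period, toy$treated), mean)

# Difference-in-differences computed directly from the cell means
did_means <- (cell["1", "1"] - cell["0", "1"]) - (cell["1", "0"] - cell["0", "0"])

# The interaction coefficient from the regression
did_reg <- unname(coef(fit)["period:treated"])

print(c(means = did_means, regression = did_reg))
```

Because the model has one parameter per cell, the fitted values reproduce the cell means exactly, so the two numbers should agree up to floating-point error.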

C.

  1. What is the control and the control group, and what is the treatment and the treatment group?

    The control: the condition of an area not affected by the disaster (df$Disaster_Affected = 0)

    The treatment: the condition of an area affected by the disaster (df$Disaster_Affected = 1)

    The control group: the areas not affected by the disaster

    The treatment group: the areas affected by the disaster

  2. The basic idea is to compare the difference in outcomes between the treatment and control groups before and after the treatment is introduced. By differencing these differences, what do you hope to achieve, and why should it work?

    Comparing the change in outcomes over time between the treatment and control groups isolates the effect of the treatment. The first difference (after minus before, within each group) removes time-invariant differences between the groups. The second difference (treatment minus control) removes trends that affect both groups equally, regardless of treatment. What remains is the effect of the treatment itself. This works under the parallel trends assumption: in the absence of treatment, the difference between the treatment group and the control group would have remained the same over time.
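
    In symbols, a sketch of the estimator described above, where \(\bar{Y}\) denotes a group mean of the outcome and the subscripts (T and C for treatment and control, pre and post for before and after) are labels introduced here for illustration:

    \[ \hat{\delta}_{DiD} = (\bar{Y}_{T,post} - \bar{Y}_{T,pre}) - (\bar{Y}_{C,post} - \bar{Y}_{C,pre}) \]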

  3. 2 × 2 matrix of regression equations

    mean_00 <- mean(df$HPI_CHG[df$Time_Period == 0 & df$Disaster_Affected == 0], na.rm = TRUE)
    mean_01 <- mean(df$HPI_CHG[df$Time_Period == 0 & df$Disaster_Affected == 1], na.rm = TRUE)
    mean_10 <- mean(df$HPI_CHG[df$Time_Period == 1 & df$Disaster_Affected == 0], na.rm = TRUE)
    mean_11 <- mean(df$HPI_CHG[df$Time_Period == 1 & df$Disaster_Affected == 1], na.rm = TRUE)
    
    
    matrix_2x2 <- matrix(c(
      mean_00, mean_01,
      mean_10, mean_11
    ), nrow = 2, byrow = TRUE)
    
    colnames(matrix_2x2) <- c("Disaster_Affected = 0", "Disaster_Affected = 1")
    rownames(matrix_2x2) <- c("Time_Period = 0", "Time_Period = 1")
    
    print(matrix_2x2)
                    Disaster_Affected = 0 Disaster_Affected = 1
    Time_Period = 0           0.037090020            0.02314612
    Time_Period = 1           0.009242792            0.01503835
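
    Each cell mean corresponds to a sum of regression coefficients. As a sketch, write \(\beta_0\) for the intercept, \(\beta_1\) for the Time_Period coefficient, \(\beta_2\) for the Disaster_Affected coefficient, and \(\beta_3\) for the interaction coefficient (these symbols are labels assumed here, not part of the R output):

    \[
    \begin{array}{l|cc}
     & \text{Disaster\_Affected} = 0 & \text{Disaster\_Affected} = 1 \\ \hline
    \text{Time\_Period} = 0 & \beta_0 & \beta_0 + \beta_2 \\
    \text{Time\_Period} = 1 & \beta_0 + \beta_1 & \beta_0 + \beta_1 + \beta_2 + \beta_3
    \end{array}
    \]

    For example, the top-left cell 0.03709 matches the intercept, and the bottom-left cell equals 0.03709 - 0.02785 ≈ 0.00924.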

    Diff-in-Diff comparison

    [1] "Matrix DiD: 0.0197394560315789"
    [1] "Regression DiD: 0.0197394560315789"
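
    The code that produced the two lines above is not shown. One way to reproduce the matrix-based figure, assuming only the four rounded cell means printed in the 2 × 2 matrix, is sketched below. The labels `pre`, `post`, `control`, and `treated` are names chosen here for Time_Period = 0/1 and Disaster_Affected = 0/1.

    ```r
    # Cell means copied from the printed 2 x 2 matrix (rounded as displayed)
    m <- matrix(c(0.037090020, 0.02314612,
                  0.009242792, 0.01503835),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("pre", "post"), c("control", "treated")))

    # Change over time within each group
    change_treated <- m["post", "treated"] - m["pre", "treated"]
    change_control <- m["post", "control"] - m["pre", "control"]

    # Difference of the two changes: the DiD estimate
    did <- change_treated - change_control
    print(did)
    ```

    Because the inputs are rounded, the result should agree with the reported 0.0197394560315789 only to about seven decimal places.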

D.

  1. Parallel Trends Assumption:

    • Threat: If the treatment and control groups have different underlying trends over time, the DiD estimate will be biased.

  2. Stable Unit Treatment Value Assumption (SUTVA):

    • Threat: If there are spillover effects or interference between control and treatment groups, the DiD estimate may be biased.

  3. No Anticipation Effects:

    • Threat: If individuals change their behavior in anticipation of the treatment, it can bias the DiD estimate.

E.

There are four specifications, each with a different set of control variables, while the three dimensions of comparison stay the same: treatment, gender, and location.

\[ Y_{it} = \alpha + \beta_1\,\text{treatment} + \beta_2\,\text{gender} + \beta_3\,\text{location} + \beta_4 (\text{treatment} \times \text{gender}) + \beta_5 (\text{treatment} \times \text{location}) + \beta_6 (\text{gender} \times \text{location}) + \beta_7 (\text{treatment} \times \text{gender} \times \text{location}) + \epsilon \]

The study design compares changes in school enrollment across three dimensions (time, gender, and location), which helps isolate the effect of the bicycle program. Using boys and other locations as control groups, while also controlling for demographic and socioeconomic characteristics, lets the authors account for other factors that might influence school enrollment, so that the estimated impact can be attributed to the bicycle program.
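
As a sketch of why the triple interaction carries the estimate, assume the three regressors are binary indicators and ignore any additional controls. Writing \(\bar{Y}_{t,g,l}\) for the mean outcome in the cell with treatment \(t\), gender \(g\), and location \(l\) (notation introduced here for illustration), the equation above implies that \(\beta_7\) is the difference between two difference-in-differences:

\[ \beta_7 = \left[ (\bar{Y}_{1,1,1} - \bar{Y}_{0,1,1}) - (\bar{Y}_{1,0,1} - \bar{Y}_{0,0,1}) \right] - \left[ (\bar{Y}_{1,1,0} - \bar{Y}_{0,1,0}) - (\bar{Y}_{1,0,0} - \bar{Y}_{0,0,0}) \right] \]

The first bracket equals \(\beta_4 + \beta_7\) and the second equals \(\beta_4\), so subtracting them leaves \(\beta_7\).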