Advanced quantitative data analysis

class: center, middle, inverse, title-slide

.title[
# Advanced quantitative data analysis
]
.subtitle[
## Difference in Difference II
]
.author[
### Mengni Chen
]
.institute[
### Department of Sociology, University of Copenhagen
]

---

#Let's get ready

```r
#install.packages("did")
library(tidyverse) # Add the tidyverse package to my current library.
library(haven) # Handle labelled data.
library(splitstackshape) #transform wide data (with stacked variables) to long data
library(plm) #linear models for panel data
library(did) #for difference in difference analysis
```

---
#Difference in Difference: Visualize the three ways of understanding
- Assumption
    - We assume that the trends in the dependent variable over time were identical in the treated and non-treated groups before the treatment took place.
    - We assume that the trends would have remained parallel if there had been no treatment.

- Three ways of understanding
<img src="https://github.com/fancycmn/slide12/blob/main/S12_Pic11.PNG?raw=true" width="100%" style="display: block; margin-left:0px;">
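
---
#Sketch: a 2x2 difference in difference by hand

A minimal sketch of the basic two-group, two-period case, using hypothetical toy numbers (the data frame `toy` and its values are invented for illustration). It computes the estimate in two common ways: as a difference of before-after differences, and as the interaction term in a regression.

```r
# Hypothetical toy data: two groups, two periods, two people per cell
toy <- data.frame(
  treated = c(0, 0, 0, 0, 1, 1, 1, 1),  # 1 = treated group
  post    = c(0, 0, 1, 1, 0, 0, 1, 1),  # 1 = after the treatment
  y       = c(4.5, 5.5, 5.5, 6.5, 5.5, 6.5, 8.5, 9.5)
)

# Way 1: difference of the two before-after differences
cell_means <- tapply(toy$y, list(toy$treated, toy$post), mean)
did_by_hand <- (cell_means["1", "1"] - cell_means["1", "0"]) -
  (cell_means["0", "1"] - cell_means["0", "0"])

# Way 2: the interaction term in a regression with both main effects
did_by_lm <- unname(coef(lm(y ~ treated * post, data = toy))["treated:post"])

c(by_hand = did_by_hand, by_lm = did_by_lm)  # both are 2 in this toy data
```

The control group changes by 1 and the treated group by 3, so the estimate is 2. Under parallel trends, the control group's change stands in for what the treated group would have experienced without treatment.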

---
#Difference in Difference: application
- It can not only look at the effect of life events on an individual's life satisfaction, mental health, salary, working hours, etc.

- It can also evaluate the effect of policies.
    - For example, KU introduces a new one-year MA program. How would you evaluate the impact of this new program?

---
#Difference in difference: time-varying treatment
- Time-varying treatment, sometimes called staggered treatment
  - More than two groups: one control group, one group treated earlier, and one group treated later
  - More than two periods
  - The treatment time is not the same for all members in the treated group
  - This is the so-called **staggered treatment**

- Example: KU introduces a new one-year MA program. How does this affect the salary of the graduates?
  - The program is rolled out to students at different times based on their departments:
  - One-year MA program in the Sociology Department starts in Sep 2025.
  - One-year MA program in the Psychology Department starts in Sep 2026.
  - One-year MA program in the Economics Department starts in Sep 2027.

- Data: We have yearly data on graduates' employment and salary from January 2022 to December 2030.

- Goal: To estimate the causal effect of the one-year MA program on graduates' salary.
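
---
#Sketch: what staggered treatment looks like in data

A minimal sketch of the KU example as a department-by-year panel. Two things are assumptions made only for illustration: a department counts as treated from the calendar year its program starts, and "Anthropology" is a hypothetical department that never introduces the program (coded 0).

```r
# First treated year per department, taken from the start dates in the example.
# "Anthropology" is a hypothetical never-treated department (coded 0).
first_treated <- c(Sociology = 2025, Psychology = 2026,
                   Economics = 2027, Anthropology = 0)

# One row per department and year, 2022 to 2030
panel <- expand.grid(dept = names(first_treated), year = 2022:2030,
                     stringsAsFactors = FALSE)
panel$g <- unname(first_treated[panel$dept])

# Treatment indicator: 1 from the first treated year onwards, 0 otherwise
panel$treat <- as.integer(panel$g > 0 & panel$year >= panel$g)

# Number of treated years per department
treated_years <- tapply(panel$treat, panel$dept, sum)
treated_years
```

Each treated department switches from 0 to 1 in a different year, which is what makes the design staggered.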

---
# TWFE is problematic when the treatment is staggered
The TWFE model includes:
- Unit Fixed Effects `\(μ_{i}\)`: Captures time-invariant characteristics of each unit (e.g., state, individual, etc.)
- Time Fixed Effects `\(λ_{t}\)`: Captures time-varying factors common to all units (e.g., macroeconomic trends).
- The regression equation is: `\(Y_{i,t}=μ_{i}+λ_{t}+ β^{DD}Treatment_{i,t}+ ϵ_{i,t}\)`
  - `\(Treatment_{i,t}\)` is a binary indicator equal to 1 if individual `\(i\)` is treated at time `\(t\)`, and 0 otherwise.
  - `\(β^{DD}\)` is meant to estimate the average treatment effect on the treated (ATT).

- Using a two-way fixed effects (TWFE) model to estimate the average treatment effect on the treated (ATT) is problematic (Goodman-Bacon 2018)
  - Problem 1: `\(β^{DD}\)` becomes a weighted average of many 2x2 comparisons, with weights that are hard to interpret
  - Problem 2: the treatment effects may be heterogeneous across groups and over time, and TWFE does not handle this well
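
---
#Sketch: the TWFE regression in R

A minimal sketch of the regression equation above, fitted with base R `lm()` and dummy variables for units and periods. The panel is hypothetical and noise-free, with a constant treatment effect of 2 built in. In this easy case (same effect for everyone, at all times) TWFE recovers the effect.

```r
# Hypothetical noise-free panel: 6 units, 5 periods
panel <- expand.grid(id = 1:6, period = 1:5)

# First treated period: units 1-2 at period 3, units 3-4 at period 4,
# units 5-6 never treated (coded 0)
panel$g <- c(3, 3, 4, 4, 0, 0)[panel$id]
panel$treat <- as.integer(panel$g > 0 & panel$period >= panel$g)

# Outcome = unit effect + time effect + constant treatment effect of 2
panel$y <- panel$id + 0.5 * panel$period + 2 * panel$treat

# TWFE: unit dummies and period dummies plus the treatment indicator
twfe <- lm(y ~ treat + factor(id) + factor(period), data = panel)
beta_dd <- unname(coef(twfe)["treat"])
beta_dd  # 2, the effect that was built in
```

The same model can also be fitted with `plm(..., model = "within", effect = "twoways")` from the `plm` package loaded earlier.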

---
# TWFE is problematic when the treatment is staggered
- Goodman-Bacon (2018) shows that `\(β^{DD}\)` is a [strange weighted average](https://www.youtube.com/watch?v=aUHCAG98G-o) of 2x2 comparisons (see the following graphs for the different 2x2 comparisons).
  - Later treated groups serve as a control group for earlier treated groups
  - Earlier treated groups also serve as a control group for later treated groups, even though they are already treated
  - Heterogeneous treatment effects may then lead to severe bias
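
---
#Sketch: TWFE with effects that grow over time

A minimal sketch of how the "already treated as control" comparison can go wrong. The data are hypothetical and noise-free: one early-treated unit, one late-treated unit, no never-treated group, and an effect that grows with exposure. Every individual effect is at least 1, yet the TWFE coefficient comes out below that.

```r
# Hypothetical noise-free data: 2 units, 3 periods, no never-treated group
# The early unit is treated from period 2, the late unit from period 3
toy <- data.frame(
  id     = rep(c("early", "late"), each = 3),
  period = rep(1:3, times = 2),
  treat  = c(0, 1, 1,  0, 0, 1)
)

# The effect grows with exposure: 1 in the first treated period, 2 in the second
toy$effect <- c(0, 1, 2,  0, 0, 1)
toy$y <- toy$effect  # no unit or time effects, to keep the arithmetic visible

# True ATT: average effect over the treated observations
true_att <- mean(toy$effect[toy$treat == 1])

# TWFE estimate of the same quantity
twfe <- lm(y ~ treat + factor(id) + factor(period), data = toy)
beta_dd <- unname(coef(twfe)["treat"])

c(true_att = true_att, twfe = beta_dd)  # 4/3 versus 0.5
```

When the late unit becomes treated in period 3, its only comparison is the early unit, whose outcome is still rising because of its own treatment. That rise is subtracted as if it were a common trend, which pulls the estimate down.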

---
#Difference in difference: time-varying treatment
- Using a two-way fixed effects (TWFE) model to estimate the average treatment effect on the treated (ATT) is problematic (Goodman-Bacon 2018)
  - It is difficult to interpret what the ATT from the two-way fixed effects model means
  - `\(β^{DD}\)` becomes a weighted average of 2x2 comparisons, with weights that are hard to interpret (Goodman-Bacon 2018)
  
<img src="https://github.com/fancycmn/2024-Session13/blob/main/Figure1.JPG?raw=true" width="80%" style="display: block; margin-left:0px;">

---
# Two-way fixed effect is problematic
- `\(β^{DD}\)` from the two-way fixed effects estimate mixes several kinds of comparisons, and the mix depends on factors such as variation in treatment timing and group size:
  - treated group vs untreated group
  - earlier treated vs later treated
  - later treated vs earlier treated
  
- Greater weight will be given to
  - Big groups (i.e. many observations)
  - Groups that are treated closer to the middle of the sample period
  
- Therefore, because of these weights, it is very difficult to explain what `\(β^{DD}\)` means
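
---
#Sketch: why the middle of the sample period gets more weight

A minimal sketch of the timing part of the weights only. In Goodman-Bacon's decomposition, the weight on a comparison rises with the variance of the 0/1 treatment indicator over time, which is (share of periods treated) x (1 - share of periods treated). The setup with 10 periods is hypothetical, and group sizes, which also enter the weights, are ignored here.

```r
# Hypothetical setup: 10 periods, groups first treated in period 2, 3, ..., 10
n_periods <- 10
first_treated <- 2:10

# Share of periods each group spends under treatment
share_treated <- (n_periods - first_treated + 1) / n_periods

# Variance of the 0/1 treatment indicator over time within each group
treat_variance <- share_treated * (1 - share_treated)

data.frame(first_treated, share_treated, treat_variance)

# The variance peaks for the group that is treated half of the time
middle_group <- first_treated[which.max(treat_variance)]
middle_group  # 6
```

A group treated very early or very late has little variation in its treatment indicator, so it contributes less to `\(β^{DD}\)`, regardless of how interesting its effect is.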

---
#Solutions:  Callaway and Sant’Anna (2021)
- Give a group-period specific estimate
  - ATT(g,t): average treatment effect at time t for group g, where g is the period in which the group is first treated
  
- Use the "never treated" group as the comparison for every treated group and every period

- Summarize the ATT(g,t)s flexibly, using weights that are meaningful to you
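
---
#Sketch: ATT(g,t) by hand

A minimal sketch of the idea behind a group-period specific estimate: a 2x2 comparison of group g against the never-treated group, between period g - 1 and period t. The panel is hypothetical and noise-free, and `att_gt_by_hand()` is a helper written only for this slide. The estimator in the `did` package is more general (covariates, inference, pre-treatment periods).

```r
# Hypothetical noise-free panel: 3 units per group, periods 1 to 3
# g = first treated period; g = 0 is the never-treated group
panel <- expand.grid(unit = 1:3, g = c(0, 2, 3), period = 1:3)

# The effect grows with exposure: 1 in the first treated period, 2 in the second
exposure <- ifelse(panel$g > 0 & panel$period >= panel$g,
                   panel$period - panel$g + 1, 0)

# Outcome = group level + common time trend + treatment effect
panel$y <- panel$g + panel$period + exposure

# ATT(g, t): change in group g since period g - 1,
# minus the same change in the never-treated group
att_gt_by_hand <- function(data, g, t) {
  mean_y <- function(group, period) {
    mean(data$y[data$g == group & data$period == period])
  }
  (mean_y(g, t) - mean_y(g, g - 1)) - (mean_y(0, t) - mean_y(0, g - 1))
}

att_gt_by_hand(panel, g = 2, t = 2)  # 1
att_gt_by_hand(panel, g = 2, t = 3)  # 2
att_gt_by_hand(panel, g = 3, t = 3)  # 1
```

Because every comparison uses the never-treated group, an already treated group never serves as a control, and the growing effect is recovered period by period.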

---
#Using DID package
- [prepare the data](https://rpubs.com/fancycmn/1251748)
- three functions from the `did` package 
  - estimate a group-period specific effect: `att_gt()`
  - plot the result: `ggdid()`
  - summarize the aggregate treatment effect: `aggte()`

---
#Estimate a group-period specific effect

```r
sixwaves_long4 <- sixwaves_long3 %>% 
    group_by(id) %>%   
    mutate(
    treatgroup=case_when(
      anchorwave %in% c(2:6) ~ anchorwave, #use anchorwave to define the different treated groups
      anchorwave ==99 ~ 0) #if not treated, define the treatgroup as 0, that is the control group
          )
```
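
---
#Sketch: checking the group variable

A minimal sketch of what the recoding does, in base R and on hypothetical toy data. It assumes that `anchorwave` holds the wave of first treatment and that 99 marks people who are never treated. It also checks that the group variable does not change within a person, because the group is meant to be a fixed characteristic of each unit.

```r
# Hypothetical toy data: 3 people observed in 2 waves
toy <- data.frame(
  id         = rep(1:3, each = 2),
  wave       = rep(1:2, times = 3),
  anchorwave = rep(c(2, 99, 5), each = 2)  # 99 = never treated (assumption)
)

# Base R version of the recoding: waves 2 to 6 stay, 99 becomes 0, anything else NA
toy$treatgroup <- ifelse(toy$anchorwave %in% 2:6, toy$anchorwave,
                         ifelse(toy$anchorwave == 99, 0, NA))

# The group variable should take exactly one value per person
values_per_id <- tapply(toy$treatgroup, toy$id, function(x) length(unique(x)))
all(values_per_id == 1)
```

In `att_gt()`, the value 0 in the group variable marks the never-treated units that form the control group.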

---
#Estimate a group-period specific effect

```r
did <- att_gt(yname = "sat", #dependent variable
              tname = "wave", #time variable
              idname = "id", #id
              gname = "treatgroup", #identify five treatment groups
              xformla = ~ 1, #when you don't have any covariates to control, use "~ 1"; if yes, you can add covariates here by ~ x1+x2
              data = sixwaves_long4 #specify your data,
)
```

```
## Warning in pre_process_did(yname = yname, tname = tname, idname = idname, :
## Dropped 1844 observations while converting to balanced panel.
```

---
#Estimate a group-period specific effect

```r
summary(did)
```

```
## 
## Call:
## att_gt(yname = "sat", tname = "wave", idname = "id", gname = "treatgroup", 
##     xformla = ~1, data = sixwaves_long4)
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
## 
## Group-Time Average Treatment Effects:
##  Group Time ATT(g,t) Std. Error [95% Simult.  Conf. Band]  
##      2    2   0.4211     0.2102       -0.1838      1.0261  
##      2    3   0.5487     0.1890        0.0050      1.0925 *
##      2    4   0.3960     0.2137       -0.2191      1.0110  
##      2    5   0.3775     0.2126       -0.2343      0.9893  
##      2    6   0.3463     0.2313       -0.3191      1.0117  
##      3    2   0.1969     0.2009       -0.3813      0.7751  
##      3    3   0.3428     0.1971       -0.2242      0.9098  
##      3    4  -0.0380     0.2866       -0.8626      0.7867  
##      3    5  -0.1051     0.2239       -0.7492      0.5390  
##      3    6   0.2281     0.2099       -0.3757      0.8320  
##      4    2  -0.2392     0.2341       -0.9128      0.4344  
##      4    3   0.3234     0.2081       -0.2754      0.9223  
##      4    4   0.0800     0.2108       -0.5265      0.6865  
##      4    5   0.1634     0.1605       -0.2984      0.6253  
##      4    6   0.3281     0.1797       -0.1891      0.8452  
##      5    2   0.2097     0.2408       -0.4831      0.9024  
##      5    3  -0.2581     0.2160       -0.8795      0.3634  
##      5    4   0.2928     0.2047       -0.2961      0.8817  
##      5    5   0.2506     0.2130       -0.3623      0.8635  
##      5    6   0.1241     0.2107       -0.4821      0.7302  
##      6    2  -0.4166     0.2878       -1.2448      0.4116  
##      6    3  -0.0192     0.2405       -0.7114      0.6729  
##      6    4  -0.1032     0.3224       -1.0310      0.8245  
##      6    5   0.1628     0.3322       -0.7931      1.1186  
##      6    6   0.4390     0.2421       -0.2576      1.1356  
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## P-value for pre-test of parallel trends assumption:  0.3068
## Control Group:  Never Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust
```
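
---
#Sketch: simultaneous band vs pointwise interval

A minimal sketch of why few rows carry a `*`. The numbers are copied from the row for group 2 at time 2 in the output. The simultaneous band covers all ATT(g,t)s jointly, so it uses a larger critical value than the usual 1.96.

```r
# Group 2, time 2, copied from the att_gt() output
estimate   <- 0.4211
std_error  <- 0.2102
band_upper <- 1.0261

# Critical value implied by the simultaneous band
crit_simultaneous <- (band_upper - estimate) / std_error

# Pointwise 95% interval with the usual normal critical value
crit_pointwise <- qnorm(0.975)
pointwise_ci <- estimate + c(-1, 1) * crit_pointwise * std_error

round(crit_simultaneous, 2)  # about 2.88
round(pointwise_ci, 3)       # lower bound is just above 0
```

Looked at in isolation, this estimate would exclude 0 at the 5% level, but the simultaneous band (-0.1838 to 1.0261) covers 0. The band is the more cautious choice when many group-time effects are inspected at once.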

---
#Plot group-period specific effect

```r
ggdid(did, ylim=c(-2,2))
```
<img src="https://github.com/fancycmn/slide13/blob/main/S13_Pic6.png?raw=true" width="45%" style="display: block; margin-left:20px;">

---
#Summarize the ATT(g,t)s: get one overall ATT

```r
#"simple" (this just computes a weighted average of all group-time average treatment effects with weights proportional to group size)
agg.overall <- aggte(did, type = "simple") #summarize the overall effect
summary(agg.overall)
```

```
## 
## Call:
## aggte(MP = did, type = "simple")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
## 
## 
##     ATT    Std. Error     [ 95%  Conf. Int.]  
##  0.2726        0.0976     0.0814      0.4638 *
## 
## 
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Never Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust
```

---
#Summarize the ATT(g,t)s: get a group-specific ATT

```r
#average treatment effects across different groups
agg.group <- aggte(did, type = "group") #summarize the effect by the group
summary(agg.group)
```

```
## 
## Call:
## aggte(MP = did, type = "group")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
## 
## 
## Overall summary of ATT's based on group/cohort aggregation:  
##     ATT    Std. Error     [ 95%  Conf. Int.]  
##  0.2678        0.0892      0.093      0.4426 *
## 
## 
## Group Effects:
##  Group Estimate Std. Error [95% Simult.  Conf. Band]  
##      2   0.4179     0.1752        0.0012      0.8347 *
##      3   0.1070     0.1854       -0.3341      0.5480  
##      4   0.1905     0.1473       -0.1599      0.5409  
##      5   0.1873     0.1852       -0.2533      0.6280  
##      6   0.4390     0.2441       -0.1418      1.0198  
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Never Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust
```
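
---
#Sketch: where the group effects come from

A minimal sketch that rebuilds the group effects from the ATT(g,t) table. The values are copied from the `att_gt()` output. Each group effect is the plain average of that group's post-treatment cells (wave >= group), and the results agree with the "Group Effects" table up to rounding.

```r
# ATT(g,t) copied from the att_gt() output
# Rows are groups 2 to 6, columns are waves 2 to 6
att <- matrix(
  c( 0.4211,  0.5487,  0.3960,  0.3775, 0.3463,
     0.1969,  0.3428, -0.0380, -0.1051, 0.2281,
    -0.2392,  0.3234,  0.0800,  0.1634, 0.3281,
     0.2097, -0.2581,  0.2928,  0.2506, 0.1241,
    -0.4166, -0.0192, -0.1032,  0.1628, 0.4390),
  nrow = 5, byrow = TRUE,
  dimnames = list(group = 2:6, wave = 2:6)
)

# Average each group's post-treatment cells (wave >= group)
group_att <- sapply(2:6, function(g) {
  mean(att[as.character(g), as.character(g:6)])
})
names(group_att) <- 2:6

round(group_att, 4)
```

Group 2 averages five post-treatment waves, while group 6 has only one, which is why its group effect equals ATT(6,6) = 0.4390.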

---
#Summarize the ATT(g,t)s: plot a group-specific ATT

```r
ggdid(agg.group) #plot the effect by the group
```
<img src="https://github.com/fancycmn/slide13/blob/main/S13_Pic7.png?raw=true" width="45%" style="display: block; margin-left:20px;">

---
#Summarize the ATT(g,t)s
- Get a time-dynamic ATT

```r
agg.dynamic <- aggte(did, type = "dynamic") #summarize the time dynamic effect
summary(agg.dynamic)
```

```
## 
## Call:
## aggte(MP = did, type = "dynamic")
## 
## Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
## 
## 
## Overall summary of ATT's based on event-study/dynamic aggregation:  
##     ATT    Std. Error     [ 95%  Conf. Int.]  
##  0.2841          0.11     0.0684      0.4997 *
## 
## 
## Dynamic Effects:
##  Event time Estimate Std. Error [95% Simult.  Conf. Band]  
##          -4  -0.4166     0.2780       -1.1785      0.3453  
##          -3   0.1110     0.1618       -0.3323      0.5543  
##          -2  -0.2111     0.1331       -0.5759      0.1537  
##          -1   0.2526     0.1012       -0.0249      0.5300  
##           0   0.3012     0.0890        0.0573      0.5452 *
##           1   0.2278     0.1023       -0.0524      0.5081  
##           2   0.2309     0.1152       -0.0848      0.5465  
##           3   0.3142     0.1554       -0.1118      0.7401  
##           4   0.3463     0.2476       -0.3322      1.0249  
## ---
## Signif. codes: `*' confidence band does not cover 0
## 
## Control Group:  Never Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust
```
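
---
#Sketch: what event time means

A minimal sketch of how the group-time cells map to event time, defined as wave minus group (0 is the wave in which a group is first treated). It only builds the grid of cells from this analysis (groups 2 to 6, waves 2 to 6) and counts how many cells feed each event time.

```r
# All group-wave cells that appear in the att_gt() output
cells <- expand.grid(group = 2:6, wave = 2:6)

# Event time: waves since first treatment (negative = before treatment)
cells$event_time <- cells$wave - cells$group

# Number of group-wave cells behind each event time
cells_per_event_time <- table(cells$event_time)
cells_per_event_time

# The extreme event times rest on a single cell each
cells[cells$event_time == -4, ]  # group 6 in wave 2
cells[cells$event_time == 4, ]   # group 2 in wave 6
```

This is why the dynamic estimates at event time -4 (-0.4166) and 4 (0.3463) equal ATT(6,2) and ATT(2,6) in the group-time table. The event times in between combine several groups, so they rest on more information.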
---
#Summarize the ATT(g,t)s: plot a time-dynamic ATT

```r
ggdid(agg.dynamic) #plot the time dynamic effect 
```

---
#Estimate a group-period specific effect: using an unbalanced panel
Sometimes the balanced panel is very small, because many people miss at least one wave. In that case you can use the unbalanced option.
However, you should consider the selection issues that come with either choice, balanced or unbalanced.

```r
did_unbalanced <- att_gt(yname = "sat", #dependent variable
              tname = "wave", #time variable
              idname = "id", #id
              gname = "treatgroup", #identify five treatment groups
              xformla = ~ 1, #when you don't have any covariates to control, use "~ 1"; if yes, you can add covariates here by ~ x1+x2
              data = sixwaves_long4, #specify your data,
              allow_unbalanced_panel =TRUE #you can specify here to use the unbalanced panel
              )
```
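
---
#Sketch: how many rows does balancing drop?

A minimal sketch of what "converting to balanced panel" means, on hypothetical toy data. A person is kept only if observed in every wave. The internal pre-processing in `did` may differ in its details, so treat this as a way to inspect your own data before choosing an option.

```r
# Hypothetical toy data: person 2 misses wave 2
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3, 3),
  wave = c(1, 2, 3, 1, 3, 1, 2, 3)
)

# People observed in every wave
n_waves <- length(unique(toy$wave))
waves_per_id <- table(toy$id)
complete_ids <- names(waves_per_id)[waves_per_id == n_waves]

# Balanced panel: keep only complete cases
balanced <- toy[toy$id %in% complete_ids, ]

rows_dropped <- nrow(toy) - nrow(balanced)
rows_dropped  # 2, both rows of person 2
```

Comparing the people who are dropped with those who are kept (for example on baseline characteristics) is one way to judge how serious the selection issue is.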

---
#Take home
- What is staggered treatment
- What is the problem with using two-way fixed effects to estimate the ATT of a staggered treatment
- Using the `did` package to do staggered DID
  - `att_gt()`: estimate the group-time specific effect
  - `ggdid()`: plot the effect
  - `aggte()`: aggregate the group-time specific effect
  - [weight explanation in Callaway and Sant’Anna(1:08:01)](https://www.youtube.com/watch?v=VLviaylakAo&t=4642s)

---
class: center, middle
#[Exercise](https://rpubs.com/fancycmn/1251752)