Data Breach Disclosures and Firm Financial Outcomes

class: center, middle, inverse, title-slide

.title[
# Data Breach Disclosures and Firm Financial Outcomes
]
.subtitle[
## A Callaway & Sant’Anna Event-Study Approach
]
.author[
### Ruby Qiu, Batu Buyukbezci, and Arnav Sahai
]
.institute[
### Columbia University | SIPA
]

---

<style>
.title-slide h1, .title-slide h2, .title-slide h3 {
  color: #ffffff;
  text-shadow: 2px 2px 6px rgba(0,0,0,0.75);
}

blockquote {
  border-left: 4px solid #00aaff;
  background-color: #f0f8ff;
  padding: 10px 15px;
  margin: 20px 0;
  font-style: italic;
  color: #333;
}

.purple-block {
  display: inline-block;
  border: 2px solid #6a0dad;
  background-color: #e6d5f7;
  color: #000;
  padding: 2px 8px;
  border-radius: 5px;
  margin: 0;
}

.red-block {
  display: inline-block;
  border: 2px solid #9d0208;
  background-color: #ffd6d6;
  color: #000;
  padding: 2px 8px;
  border-radius: 5px;
  margin: 0;
}

.teal-block {
  display: inline-block;
  border: 2px solid #0f766e;
  background-color: #ccfbf1;
  color: #000;
  padding: 2px 8px;
  border-radius: 5px;
  margin: 0;
}

.text-plot-container {
  display: flex;
  align-items: center;
}

.text-plot-container .text {
  flex: 1;
  margin-right: 20px;
}

.text-plot-container .plot {
  flex: 2;
}

.custom-table table {
  font-size: 12px;
  width: auto;
  margin: 0 auto;
}

.custom-table th, .custom-table td {
  padding: 5px;
}

.scrollable {
 max-width: 100%;
 overflow-x: auto;
}
</style>

# Roadmap

This presentation walks through the project end-to-end:

1. **Research topic** — why data breaches and firm value

2. **Research question** — what we want to identify

3. **Data & exploratory analysis** — Rosati & Lynn (2020) + market data

4. **Identification strategy** — the population regression function

5. **Estimation** — Callaway & Sant'Anna (CS) dynamic ATT

6. **Results** — magnitudes, dynamics, and robustness

7. **Why it matters** — implications for firms, investors, and policy

---

# Research Topic

## Cyber incidents as economic events

<blockquote>
Does the public disclosure of a corporate data breach cause measurable negative abnormal stock returns, and does the magnitude of this penalty vary by breach severity, breach type, and industry sector?
</blockquote>

- Data breaches have become a **routine operational risk** for public firms
- Disclosures are often accompanied by reputational damage, regulatory action, and class-action lawsuits
- Prior event-study literature finds mixed and often short-lived market reactions

- But the canonical **market-model event study** rests on strong assumptions: 1) a stable estimation window; 2) one-shot treatment timing; and 3) no heterogeneity in treatment effects

- Recent advances in the **staggered DiD literature** (Callaway & Sant'Anna, 2021) let us relax these assumptions

---

# Research Question

## The question we want to answer

What is the causal effect of a data breach disclosure on the cumulative abnormal return (CAR) of the affected firm, within a window of 15 trading days on either side of the disclosure?

And two supporting sub-questions:

- Does the effect **vary** with breach type (hacking, insider, portable device, etc.)?

- Does the effect **scale** with breach size (records exposed)?

---

# Data

## Two data sources

.text-plot-container[
.text[
**1. Rosati & Lynn (2020)** — Mendeley Data
 Hand-collected panel of U.S. public-firm breach disclosures, 2005–2015

- `event_date`, `ticker`, `breach_type`, `breach_size`
- `confound_dum` flags events within 10 days of another major announcement

**2. Yahoo Finance via `tidyquant`**
 Daily adjusted prices for every treated firm and for the S&P 500

- Firm returns — used to compute abnormal returns
- S&P 500 — serves as the market return *and* as the never-treated control in the CS design
]
]

---

# Sample Construction

## Cleaning and merging

``` r
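library(dplyr)      # mutate(), filter(), distinct()
library(lubridate)  # dmy()

breaches <- breaches_raw %>%
  mutate(event_date = dmy(event_date)) %>%   # parse day-month-year strings
  filter(
    breach_type %in% c("INSD", "PORT", "DISC", "CARD", "STAT", "UNKN"),
    confound_dum == 0,                       # drop confounded events
    !is.na(ticker), !is.na(event_date)
  ) %>%
  distinct(Event_ID, .keep_all = TRUE)       # one row per breach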
```

Three restrictions for identification:

- Retain only **standard breach types** reported consistently across years
- Drop confounded events — disclosures within 10 days of earnings, M&A, or other major news
- De-duplicate on `Event_ID` so each breach enters only once

> After cleaning: **~153 non-confounded events** across NYSE/NASDAQ firms

---

# EDA — Events Over Time

Disclosures accelerate sharply after 2010 — reflecting both rising breach frequency and the rollout of state disclosure laws.

---

# EDA — Breach Types

---

# EDA — Worst Performers After Disclosure

---

# From EDA to Identification

## Why a simple market-model event study is not enough

- The traditional event study regresses firm returns on market returns over an estimation window, then treats **residuals after the disclosure as "abnormal"**

- This gives us descriptive CARs, but:
  + no explicit counterfactual group
  + no pre-trend test
  + no robustness to treatment-effect heterogeneity across events

- We need an estimator that:
  + **defines a counterfactual** (a "never-treated" unit)
  + is **robust to heterogeneous treatment effects**
  + produces **dynamic ATTs** we can interpret as a causal CAR path

- **Enter Callaway & Sant'Anna (2021).**
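
For reference, the market-model calculation described in the first bullet can be sketched in base R. This is a minimal illustration on simulated returns, not the project's pipeline: the 120-day estimation window, the simulated `firm` and `mkt` series, and all object names are assumptions made for the example.

``` r
set.seed(1)

# Hypothetical daily returns: 120 estimation days, then a 31-day event window
n_est <- 120
n_evt <- 31
mkt   <- rnorm(n_est + n_evt, mean = 0.0004, sd = 0.01)
firm  <- 0.0002 + 1.1 * mkt + rnorm(n_est + n_evt, sd = 0.015)

est_idx <- seq_len(n_est)           # estimation window
evt_idx <- n_est + seq_len(n_evt)   # event window, event time -15..+15

# Market model fitted on the estimation window only
fit <- lm(firm ~ mkt, data = data.frame(firm = firm[est_idx], mkt = mkt[est_idx]))

# Abnormal return = realized return minus the market-model prediction
expected <- predict(fit, newdata = data.frame(mkt = mkt[evt_idx]))
ar  <- as.numeric(firm[evt_idx] - expected)
car <- cumsum(ar)                   # cumulative abnormal return path

event_study <- data.frame(event_time = -15:15, ar = ar, car = car)
```

The only benchmark here is the fitted line itself, which is why this approach has no separate control group to compare against.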

---

# Population Regression Function

## The PRF we take to the data

We state the PRF before turning to the results.

`$$Y_{it} = \alpha_i + \lambda_t + \sum_{e=-15}^{+15} \beta_e \cdot \mathbb{1}\!\{t - G_i = e\} + X_i'\gamma + \varepsilon_{it}$$`

- `$Y_{it}$`: cumulative log return from the start of the event window for unit `$i$` in period `$t$` (event time re-indexed so the window runs over `$t = 1, \ldots, 31$` and disclosure falls in `$t = G_i$`)
- `$\alpha_i$`: unit fixed effect (firm-event combination, or S&P 500 control)
- `$\lambda_t$`: period fixed effect
- `$\mathbb{1}\{t - G_i = e\}$`: indicator that unit `$i$` is `$e$` periods from treatment
- `$X_i$`: pre-event covariates — volatility, momentum, log price on day `$-1$`
- `$\beta_e$`: the dynamic ATT at event-time `$e$` — **what we want to estimate**
- `$\varepsilon_{it}$`: idiosyncratic error

---

# Identifying Assumptions

## What we need for causal interpretation

1. **Parallel trends** (conditional on covariates): in the absence of treatment, treated firms and the S&P 500 would have followed the same cumulative-return path

2. **No anticipation**: the disclosure is a genuine information event, so firms show no abnormal returns in the pre-event window `$[-15, -1]$`

3. **Stable Unit Treatment Value Assumption (SUTVA)**: one firm's breach does not affect another firm's counterfactual return

> The **pre-trend panel** of the event study (`$e < 0$`) is our primary test of Assumption 1.
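
One common way to formalize the pre-trend check is a joint Wald test that all pre-period coefficients are zero. The sketch below uses made-up estimates and a made-up diagonal covariance matrix purely for illustration; none of the numbers come from this project, and failing to reject does not prove that parallel trends holds after disclosure.

``` r
# Hypothetical pre-period estimates for e = -5, ..., -2 and their covariance matrix
beta_pre <- c(0.002, -0.001, 0.003, -0.002)
V_pre    <- diag(c(0.004, 0.004, 0.005, 0.005)^2)

# Wald statistic for H0: all pre-period coefficients equal zero
wald_stat <- as.numeric(t(beta_pre) %*% solve(V_pre) %*% beta_pre)
df        <- length(beta_pre)
p_value   <- pchisq(wald_stat, df = df, lower.tail = FALSE)
```

A large `p_value` means the pre-period path is statistically indistinguishable from flat at conventional levels.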

---

# The CS Estimator

## Why Callaway & Sant'Anna

Traditional two-way fixed-effects (TWFE) DiD can be biased when the design combines:
- staggered treatment timing
- heterogeneous treatment effects across cohorts

CS (2021) proposes estimating a **separate ATT for each (group, time) pair**:

`$$ATT(g, t) = \mathbb{E}[\,Y_t(1) - Y_t(0)\,|\,G = g\,]$$`

then aggregating these building blocks into event-study-style dynamic ATTs.

We implement CS with doubly-robust estimation — robust to misspecification of **either** the outcome regression or the propensity score.
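
To make the building block concrete, here is a minimal base-R sketch of an unconditional `$ATT(g, t)$` with a never-treated comparison group. It is a simplification, not the doubly-robust estimator used in the project: there are no covariates or weights, the base period is fixed at `$g - 1$`, and the simulated panel (a 2% drop from period 16 onward) and the function name `att_gt_manual` are assumptions made for the example.

``` r
set.seed(42)

n_treat <- 200
n_ctrl  <- 200
g       <- 16      # first treated period in the re-indexed window
periods <- 1:31

# Hypothetical unit path: common trend + random walk; treated units drop 2% from g on
sim_unit <- function(treated) {
  y <- 0.001 * periods + cumsum(rnorm(length(periods), sd = 0.005))
  if (treated) y <- y - 0.02 * (periods >= g)
  y
}
Y_treat <- t(replicate(n_treat, sim_unit(TRUE)))   # n_treat x 31 matrix
Y_ctrl  <- t(replicate(n_ctrl,  sim_unit(FALSE)))  # n_ctrl  x 31 matrix

# ATT(g, t): change from base period g - 1 to t, treated minus never-treated
att_gt_manual <- function(t) {
  mean(Y_treat[, t] - Y_treat[, g - 1]) - mean(Y_ctrl[, t] - Y_ctrl[, g - 1])
}

att_post <- sapply(g:31, att_gt_manual)   # one estimate per post-treatment period
```

Each element of `att_post` should sit near the simulated effect of -0.02, with noise that grows as `t` moves away from the base period.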

---

# Results — Dynamic ATT

---

# Reading the Event Study

## Three things to look for

.text-plot-container[
.text[

**1. Pre-period (`$e < 0$`)**
 ATTs are indistinguishable from zero, which is consistent with no anticipation and with parallel pre-trends.

**2. Treatment day (`$e = 0$`)**
 Sharp negative jump — the disclosure is a genuine information event.

**3. Post-period (`$e > 0$`)**
 Effect persists and widens — the market does not fully reverse within the 15-day window.

]
]

---

# Results — Cross-Sectional Regression

Supporting evidence: does the post-disclosure CAR(0, +10) vary with breach **size** and **type**?

<table style="border-bottom: 0; width: auto !important; margin-left: auto; margin-right: auto; font-size: 12px; font-family: Cambria; margin-left: auto; margin-right: auto;" class="table table">
<caption style="font-size: initial !important;">Cross-sectional determinants of CAR(0, +10)</caption>
 <thead>
 <tr>
 <th style="text-align:left;"> </th>
 <th style="text-align:center;"> Size </th>
 <th style="text-align:center;"> Type </th>
 <th style="text-align:center;"> Size + Type </th>
 </tr>
 </thead>
<tbody>
 <tr>
 <td style="text-align:left;"> log_breach_size </td>
 <td style="text-align:center;"> −0.001 </td>
 <td style="text-align:center;"> </td>
 <td style="text-align:center;"> −0.002 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> </td>
 <td style="text-align:center;"> (0.002) </td>
 <td style="text-align:center;"> </td>
 <td style="text-align:center;"> (0.002) </td>
 </tr>
 <tr>
 <td style="text-align:left;"> factor(breach_type)DISC </td>
 <td style="text-align:center;"> </td>
 <td style="text-align:center;"> −0.014 </td>
 <td style="text-align:center;"> 0.014 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> </td>
 <td style="text-align:center;"> </td>
 <td style="text-align:center;"> (0.019) </td>
 <td style="text-align:center;"> (0.028) </td>
 </tr>
 <tr>
 <td style="text-align:left;"> factor(breach_type)HACK </td>
 <td style="text-align:center;"> </td>
 <td style="text-align:center;"> −0.022 </td>
 <td style="text-align:center;"> 0.013 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> </td>
 <td style="text-align:center;"> </td>
 <td style="text-align:center;"> (0.019) </td>
 <td style="text-align:center;"> (0.027) </td>
 </tr>
 <tr>
 <td style="text-align:left;"> factor(breach_type)INSD </td>
 <td style="text-align:center;"> </td>
 <td style="text-align:center;"> −0.023 </td>
 <td style="text-align:center;"> −0.007 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> </td>
 <td style="text-align:center;"> </td>
 <td style="text-align:center;"> (0.019) </td>
 <td style="text-align:center;"> (0.027) </td>
 </tr>
 <tr>
 <td style="text-align:left;"> factor(breach_type)PHYS </td>
 <td style="text-align:center;"> </td>
 <td style="text-align:center;"> −0.025 </td>
 <td style="text-align:center;"> −0.016 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> </td>
 <td style="text-align:center;"> </td>
 <td style="text-align:center;"> (0.027) </td>
 <td style="text-align:center;"> (0.048) </td>
 </tr>
 <tr>
 <td style="text-align:left;"> factor(breach_type)PORT </td>
 <td style="text-align:center;"> </td>
 <td style="text-align:center;"> −0.017 </td>
 <td style="text-align:center;"> 0.007 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> </td>
 <td style="text-align:center;"> </td>
 <td style="text-align:center;"> (0.019) </td>
 <td style="text-align:center;"> (0.026) </td>
 </tr>
 <tr>
 <td style="text-align:left;"> factor(breach_type)STAT </td>
 <td style="text-align:center;"> </td>
 <td style="text-align:center;"> −0.198*** </td>
 <td style="text-align:center;"> 0.014 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> </td>
 <td style="text-align:center;"> </td>
 <td style="text-align:center;"> (0.040) </td>
 <td style="text-align:center;"> (0.048) </td>
 </tr>
 <tr>
 <td style="text-align:left;"> factor(breach_type)UNKN </td>
 <td style="text-align:center;"> </td>
 <td style="text-align:center;"> −0.027 </td>
 <td style="text-align:center;"> 0.039 </td>
 </tr>
 <tr>
 <td style="text-align:left;box-shadow: 0px 1.5px"> </td>
 <td style="text-align:center;box-shadow: 0px 1.5px"> </td>
 <td style="text-align:center;box-shadow: 0px 1.5px"> (0.030) </td>
 <td style="text-align:center;box-shadow: 0px 1.5px"> (0.064) </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Num.Obs. </td>
 <td style="text-align:center;"> 100 </td>
 <td style="text-align:center;"> 224 </td>
 <td style="text-align:center;"> 100 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> R2 </td>
 <td style="text-align:center;"> 0.009 </td>
 <td style="text-align:center;"> 0.109 </td>
 <td style="text-align:center;"> 0.027 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> F </td>
 <td style="text-align:center;"> 0.842 </td>
 <td style="text-align:center;"> 3.780 </td>
 <td style="text-align:center;"> 0.320 </td>
 </tr>
</tbody>
<tfoot>
<tr><td style="padding: 0; " colspan="100%">
 * p &lt; 0.1, ** p &lt; 0.05, *** p &lt; 0.01</td></tr>
<tr><td style="padding: 0; " colspan="100%">
 Non-confounded events only. OLS with heteroskedasticity-consistent SEs.</td></tr>
</tfoot>
</table>
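
The table note mentions heteroskedasticity-consistent standard errors without naming the variant. As a sketch of how such errors are computed, the block below builds the HC1 sandwich estimator by hand on simulated data. The choice of HC1, the simulated variables, and the coefficient values are all assumptions made for the example and are unrelated to the estimates in the table.

``` r
set.seed(7)

# Hypothetical cross-section: CAR(0, +10) and log breach size for 100 events
n               <- 100
log_breach_size <- runif(n, min = 2, max = 16)
car_0_10        <- -0.001 * log_breach_size +
  rnorm(n, sd = 0.02 + 0.002 * log_breach_size)   # noise grows with size

fit <- lm(car_0_10 ~ log_breach_size)

# HC1 sandwich: (X'X)^-1 X' diag(e^2) X (X'X)^-1, scaled by n / (n - k)
X     <- model.matrix(fit)
e     <- residuals(fit)
k     <- ncol(X)
bread <- solve(crossprod(X))
meat  <- crossprod(X * e)    # equals X' diag(e^2) X
V_hc1 <- bread %*% meat %*% bread * n / (n - k)

robust_se <- sqrt(diag(V_hc1))
t_stat    <- coef(fit) / robust_se
```

In practice `sandwich::vcovHC(fit, type = "HC1")` returns the same matrix without the manual algebra.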

---

# Why the Results Matter

## Three audiences

**Investors**
- The effect is persistent — not a one-day blip that reverts
- Cross-sectional pricing of cyber risk is incomplete

**Firms**
- Reputational costs are real and measurable — disclosure practices matter
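- Breach type appears to matter more than breach size: in the cross-section only the STAT coefficient is statistically significant, and the size coefficient is close to zero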

**Policymakers**
- Mandatory disclosure regimes (SEC 2023 rule) are informative: the market **responds** to them
- Suggests returns to standardizing what must be disclosed and when

---

# Contributions

## What this project adds

- First application (to our knowledge) of **CS dynamic ATT** to breach disclosures

- Uses the **S&P 500 as a synthetic never-treated control** — a clean counterfactual that sidesteps the usual "clean-control firm" selection problem

- Delivers a **pre-trend test** that the traditional market-model event study cannot provide

- Quantifies **heterogeneity** across breach types and sizes with a consistent estimator

---

# Limitations & Next Steps

## What we haven't done — yet

- **Confounded events** are dropped rather than instrumented — future work could model them explicitly

- **Anticipation** is assumed away, but some breaches leak before formal disclosure

- Single control (S&P 500) could be extended to an **industry-matched synthetic control**

- Longer post-window (`$+60$` days) to test persistence vs. reversal

- **Honest DiD** sensitivity analysis (Rambachan & Roth, 2023) already loaded as a package — natural next robustness check

---

class: center, middle

# Thank you

### Questions?

Code, data, and replication materials available on request.

---

# Appendix — Summary Statistics

<table class=" lightable-paper lightable-hover table" style="font-family: Helvetica; width: auto !important; margin-left: auto; margin-right: auto; font-size: 16px; margin-left: auto; margin-right: auto;">
<caption style="font-size: initial !important;">Breach sample summary</caption>
 <thead>
 <tr>
 <th style="text-align:center;"> N events </th>
 <th style="text-align:center;"> N confounded </th>
 <th style="text-align:center;"> % confounded </th>
 <th style="text-align:center;"> Median breach size (records) </th>
 <th style="text-align:center;"> Mean breach size (records) </th>
 <th style="text-align:center;"> SD breach size (records) </th>
 </tr>
 </thead>
<tbody>
 <tr>
 <td style="text-align:center;"> 506 </td>
 <td style="text-align:center;"> 185 </td>
 <td style="text-align:center;"> 36.6 </td>
 <td style="text-align:center;"> 5154 </td>
 <td style="text-align:center;"> 3951716 </td>
 <td style="text-align:center;"> 20076278 </td>
 </tr>
</tbody>
</table>

---

# Appendix — Mean AR by Event Day

---

# Appendix — CS `att_gt` Object

``` r
summary(att)     # group-time ATTs with uniform confidence bands
summary(es)      # dynamic aggregation (event-study)
```

Full `att_gt` object saved as `att_gt_object.rds` for replication.

Pair with `HonestDiD::createSensitivityResults_relativeMagnitudes()`
for post-hoc sensitivity analysis of parallel-trends violations.

---

# EDA — Mean CAR by Breach Type

Early signal: the market reaction varies meaningfully by breach type — but the cross-sectional mean is noisy without a proper counterfactual.

---

# Population Regression Function

## Potential outcomes setup

For each event-unit `$i$` at event-time `$t \in \{-15, \ldots, +15\}$`, define:

- `$Y_{it}(1)$` — cumulative return path the firm experiences **having disclosed a breach**
- `$Y_{it}(0)$` — cumulative return path the firm would have experienced **had it not disclosed**

The object of interest is the **dynamic ATT** at event-time `$e$`:

`$$ATT(e) = \mathbb{E}\!\left[\, Y_{i,\,G_i+e}(1) - Y_{i,\,G_i+e}(0) \,\middle|\, G_i = g \,\right]$$`

where `$G_i$` is the period in which unit `$i$` is first treated (here, the disclosure day).

`$Y_{it}(0)$` is the missing counterfactual — the CS estimator tells us how to recover it.
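
A sketch of how that recovery works in the simplest case, with no covariates and a never-treated comparison group. The indicator `$C_i = 1$` for never-treated units is notation introduced here for the example. Parallel trends equates the untreated change across groups:

`$$\mathbb{E}\!\left[\, Y_{i,\,g+e}(0) - Y_{i,\,g-1}(0) \,\middle|\, G_i = g \,\right] = \mathbb{E}\!\left[\, Y_{i,\,g+e}(0) - Y_{i,\,g-1}(0) \,\middle|\, C_i = 1 \,\right]$$`

No anticipation means the treated unit's observed outcome just before disclosure is its untreated outcome, `$Y_{i,\,g-1} = Y_{i,\,g-1}(0)$`. Adding and subtracting `$Y_{i,\,g-1}(0)$` inside the ATT and substituting both conditions gives, for `$e \ge 0$`, an expression in observed outcomes only:

`$$ATT(e) = \mathbb{E}\!\left[\, Y_{i,\,g+e} - Y_{i,\,g-1} \,\middle|\, G_i = g \,\right] - \mathbb{E}\!\left[\, Y_{i,\,g+e} - Y_{i,\,g-1} \,\middle|\, C_i = 1 \,\right]$$`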

---

# Implementation

## Setting up the panel

``` r
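library(dplyr)  # select(), left_join(), mutate()
library(purrr)  # pmap_dfr()

# Stack one event window per breach; build_event_window() is not shown here
panel <- breaches %>%
  select(Event_ID, ticker, event_date) %>%
  pmap_dfr(~ build_event_window(..1, ..2, ..3))

panel <- panel %>%
  left_join(breaches %>% select(Event_ID, breach_size), by = "Event_ID") %>%
  mutate(
    period = as.integer(event_time + 16),   # event time -15..+15 becomes period 1..31
    G      = ifelse(treat == 1L, 16, 0),    # disclosure in period 16; 0 = never treated
    id_num = as.integer(factor(paste(unit_id, Event_ID, sep = "_"))),
    w_raw  = log(pmax(breach_size, 1) + 1), # log breach size, floored at one record
    w      = w_raw / mean(w_raw[treat == 1L], na.rm = TRUE)  # mean treated weight = 1
  )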
```

Each breach generates **two units**: a treated firm (`$F_i$`) and a synthetic never-treated S&P 500 window (`$M_i$`) over the same calendar dates.
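
The helper that builds each window is not shown on the slide. The sketch below is a hypothetical stand-in, not the project's `build_event_window()`: the function name `build_window_sketch`, its signature, the `prices` data frame with columns `date`, `firm_px` and `mkt_px`, and the choice to measure returns from the close before the window opens are all assumptions made for the example.

``` r
# Hypothetical stand-in for a window builder; the deck's own helper is not shown.
# `prices` is an assumed data frame with columns date, firm_px, mkt_px.
build_window_sketch <- function(event_id, prices, event_date, half_width = 15) {
  prices <- prices[order(prices$date), ]
  # Day 0 = first trading day on or after the disclosure date
  day0 <- which(prices$date >= event_date)[1]
  stopifnot(!is.na(day0))
  idx <- (day0 - half_width):(day0 + half_width)
  stopifnot(min(idx) >= 2, max(idx) <= nrow(prices))

  # Cumulative log return measured from the close before the window opens
  cum_ret <- function(px) log(px[idx]) - log(px[min(idx) - 1])

  rbind(
    data.frame(Event_ID = event_id, unit_id = "F", treat = 1L,
               event_time = -half_width:half_width, y = cum_ret(prices$firm_px)),
    data.frame(Event_ID = event_id, unit_id = "M", treat = 0L,
               event_time = -half_width:half_width, y = cum_ret(prices$mkt_px))
  )
}

# Usage on simulated prices (consecutive calendar days, weekends ignored for brevity)
set.seed(3)
prices <- data.frame(
  date    = as.Date("2012-01-02") + 0:99,
  firm_px = 50   * exp(cumsum(rnorm(100, sd = 0.02))),
  mkt_px  = 1300 * exp(cumsum(rnorm(100, sd = 0.01)))
)
win <- build_window_sketch("E001", prices, as.Date("2012-02-20"))
```

The result has 31 rows for the firm and 31 for the market, both on the same dates, which is the two-units-per-breach layout described above.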

---

# Implementation

## The CS call

``` r
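library(did)  # att_gt(), aggte()

att <- att_gt(
  yname         = "y",
  tname         = "period",
  idname        = "id_num",
  gname         = "G",
  xformla       = ~ pre_mom + log_size,  # pre-event covariates
  data          = panel,
  control_group = "nevertreated",        # S&P 500 windows (G = 0)
  weightsname   = "w",
  panel         = FALSE,                 # treat the data as repeated cross sections
  bstrap        = TRUE,
  biters        = 1000,
  cband         = TRUE,                  # uniform confidence bands
  est_method    = "dr"                   # doubly robust
)

# Aggregate group-time ATTs into event-time effects for e = -15, ..., +15
es <- aggte(att, type = "dynamic", min_e = -15, max_e = 15, na.rm = TRUE)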
```

Weights: **normalized log breach size**. Bigger breaches still count more, but the log transform compresses the size distribution so a handful of mega-breaches cannot dominate, and the weights are scaled so the mean treated weight equals one.
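
A small numeric illustration of the weighting, using the same transform as the panel setup. The five breach sizes are invented for the example, and every unit here is treated, so the normalization divides by the overall mean.

``` r
# Hypothetical breach sizes (records exposed), including one mega-breach
breach_size <- c(500, 5000, 20000, 150000, 40000000)

w_raw <- log(pmax(breach_size, 1) + 1)   # same transform as in the panel setup
w     <- w_raw / mean(w_raw)             # normalize so the mean weight is one

raw_share <- max(breach_size) / sum(breach_size)  # largest breach, share of raw sizes
w_share   <- max(w) / sum(w)                      # largest breach, share of log weights
```

With raw sizes the mega-breach accounts for nearly all of the total, while under the log weights its share falls to roughly a third.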

---

# References

- Callaway, B. & Sant'Anna, P.H.C. (2021). "Difference-in-Differences with multiple time periods." *Journal of Econometrics*, 225(2), 200–230.

- Rambachan, A. & Roth, J. (2023). "A More Credible Approach to Parallel Trends." *Review of Economic Studies*, 90(5), 2555–2591.

- Rosati, P. & Lynn, T. (2020). "Data Breaches Dataset." *Mendeley Data.*

- Sant'Anna, P.H.C. & Zhao, J. (2020). "Doubly robust difference-in-differences estimators." *Journal of Econometrics*, 219(1), 101–122.

- MacKinlay, A.C. (1997). "Event studies in economics and finance." *Journal of Economic Literature*, 35(1), 13–39.