---
title: "The Residual Alpha Search: Citadel 13-F Deep Dive"
subtitle: "Information Decay | Lindy Filter | Factor Attribution"
date: today
format:
  html:
    theme: cosmo
    toc: true
    toc-depth: 3
    toc-title: "Report Sections"
    code-fold: true
    code-summary: "Show Code"
    fig-width: 11
    fig-height: 6
    self-contained: true
    highlight-style: github
execute:
  echo: true
  warning: false
  message: false
---
```{r setup}
#| include: false

library(dplyr)
library(ggplot2)
library(tidyr)
library(scales)
library(knitr)
library(kableExtra)
library(broom)

bt     <- readRDS("data/backtest_results.rds")
lindy  <- readRDS("data/lindy_backtest.rds")
decay  <- readRDS("data/alpha_decay.rds")
fa     <- readRDS("data/factor_attribution.rds")
hp     <- readRDS("data/holding_periods.rds")

FUND_LABEL <- "Citadel Advisors LLC"
N_QTRS     <- nrow(bt$cumulative_returns)
N_YRS      <- N_QTRS / 4
```

## The Verdict Up Front

> *We ran a 5-year, 20-quarter backtest simulating a "copy Citadel 45 days late" strategy across 275 positions per quarter. The result is unambiguous: the alpha is gone by Day 2.*

This report documents the full analysis — the decay curve, the tenure segmentation, and the factor attribution that explains why the signal dies so fast and whether Citadel's edge is skill or speed.

---

## 1. Portfolio Structure

### 1.1 What We Are Working With
```{r portfolio-summary}
holdings <- readRDS("data/holdings_raw.rds") |>
  dplyr::filter(
    is.na(putCall) | putCall == "",
    !is.na(issuerSector)
  )

tibble(
  Metric = c(
    "Quarters of Data",
    "Total Position-Quarter Observations",
    "Avg. Positions Per Quarter (Long Equity)",
    "Sectors Covered",
    "Latest Quarter"
  ),
  Value = c(
    as.character(N_QTRS),
    scales::comma(nrow(holdings)),
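    # Divide by the quarters present in `holdings` itself, which may differ
    # from the backtest's quarter count (N_QTRS).
    scales::comma(round(nrow(holdings) / n_distinct(holdings$quarter_end_date))),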
    as.character(n_distinct(holdings$issuerSector)),
    as.character(max(holdings$quarter_end_date))
  )
) |>
  kable(align = c("l","r")) |>
  kable_styling(
    bootstrap_options = c("striped","hover"),
    full_width = FALSE
  )
```

### 1.2 Tactical Noise vs. Structural Conviction

Lindy positions (20+ quarters) represent only 20% of positions but control
53.6% of portfolio weight. Citadel's capital allocation is itself a signal —
they concentrate dollars in their oldest, highest-conviction ideas.
```{r tenure-chart}
seg <- hp$segment_summary |>
  tidyr::pivot_longer(
    cols      = c(pct_positions, pct_weight),
    names_to  = "metric",
    values_to = "value"
  ) |>
  mutate(metric = recode(metric,
    "pct_positions" = "% of Positions",
    "pct_weight"    = "% of Portfolio Weight"
  ))

ggplot(seg, aes(x = segment, y = value, fill = metric)) +
  geom_col(position = "dodge", width = 0.65) +
  geom_text(
    aes(label = paste0(round(value, 1), "%")),
    position = position_dodge(width = 0.65),
    vjust = -0.4, size = 3.2, fontface = "bold"
  ) +
  scale_fill_manual(values = c(
    "% of Positions"        = "#0d1b2a",
    "% of Portfolio Weight" = "#457b9d"
  )) +
  scale_y_continuous(
    labels = \(x) paste0(x, "%"),
    expand = expansion(mult = c(0, 0.15))
  ) +
  labs(
    title    = "Citadel — Tactical Noise vs. Structural Conviction",
    subtitle = "Lindy positions punch far above their weight in capital allocation",
    x = NULL, y = "Share of Total (%)", fill = NULL
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title         = element_text(face = "bold", size = 14),
    plot.subtitle      = element_text(color = "#457b9d"),
    legend.position    = "top",
    panel.grid.major.x = element_blank()
  )
```
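
The two bars per segment compare a count share with a dollar share. A minimal sketch of how such shares could be computed, assuming one row per position with a segment label and a market value. The toy data, the short segment names, and the `value` column are invented for illustration; only the `pct_positions` and `pct_weight` names come from the chart code.

```r
# Invented toy holdings: one row per position. All figures are made up.
toy_pos <- data.frame(
  segment = c("Tactical", "Tactical", "Tactical", "Tactical", "Lindy"),
  value   = c(10, 10, 10, 10, 60)
)

# Position count and total dollar value per segment (both sorted by name).
n_by_seg <- table(toy_pos$segment)
v_by_seg <- tapply(toy_pos$value, toy_pos$segment, sum)

# Share of position count vs. share of dollar weight, in percent.
toy_seg <- data.frame(
  segment       = names(v_by_seg),
  pct_positions = 100 * as.numeric(n_by_seg[names(v_by_seg)]) / nrow(toy_pos),
  pct_weight    = 100 * as.numeric(v_by_seg) / sum(toy_pos$value)
)
toy_seg
```

In this toy case one position out of five holds most of the dollars, which is the same shape of mismatch the chart shows for the Lindy bucket.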

---

## 2. The Information Decay Backtest

### 2.1 Headline Results
```{r headline-results}
final     <- tail(bt$cumulative_returns, 1)
ann_full  <- (final$cum_full_alpha^(1/N_YRS) - 1) * 100
ann_clone <- (final$cum_clone^(1/N_YRS)      - 1) * 100
ann_spy   <- (final$cum_spy^(1/N_YRS)         - 1) * 100

tibble(
  Strategy = c(
    "Full Alpha — Enter at Quarter-End (Theoretical)",
    "Lagged Clone — Enter at Filing Date (+45 Days)",
    "SPY Benchmark"
  ),
  `Annualised Return` = c(
    scales::percent(ann_full  / 100, accuracy = 0.1),
    scales::percent(ann_clone / 100, accuracy = 0.1),
    scales::percent(ann_spy   / 100, accuracy = 0.1)
  ),
  `vs. SPY` = c(
    scales::percent((ann_full  - ann_spy) / 100, accuracy = 0.1),
    scales::percent((ann_clone - ann_spy) / 100, accuracy = 0.1),
    "—"
  )
) |>
  kable(align = c("l","r","r")) |>
  kable_styling(
    bootstrap_options = c("striped","hover"),
    full_width = FALSE
  ) |>
  row_spec(1, bold = TRUE, background = "#e8f8f5") |>
  row_spec(2, bold = TRUE, background = "#fdecea")
```

The 45-day delay destroys `r round(ann_full - ann_clone, 1)` percentage points
of annualised return. Citadel's theoretical edge is real and large. None of it
is accessible to a 13-F replicator.
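
The annualised figures come from compounding the final cumulative growth factor over the length of the sample. A short sketch of that arithmetic, using invented end values rather than backtest output; the helper name `annualise` is hypothetical.

```r
# Hypothetical helper: convert a cumulative growth factor (1.50 = +50%)
# over `n_quarters` quarters into an annualised percentage return.
annualise <- function(cum_growth, n_quarters) {
  n_years <- n_quarters / 4
  (cum_growth^(1 / n_years) - 1) * 100
}

# Illustrative inputs only: 20 quarters, invented end values.
toy_full  <- annualise(2.00, 20)  # a series that doubled over 5 years
toy_clone <- annualise(1.10, 20)  # a series that gained 10% over 5 years

# Percentage points of annualised return separating the two series.
toy_gap <- toy_full - toy_clone
round(c(full = toy_full, clone = toy_clone, gap = toy_gap), 2)
```

The gap is a difference of two annualised rates, so it is stated in percentage points, not as a percentage of the theoretical return.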

### 2.2 Cumulative Returns
```{r cumulative-chart}
cum_long <- bt$cumulative_returns |>
  select(filing_date, cum_full_alpha, cum_clone, cum_spy) |>
  pivot_longer(
    cols      = c(cum_full_alpha, cum_clone, cum_spy),
    names_to  = "series",
    values_to = "value"
  ) |>
  mutate(series = recode(series,
    "cum_full_alpha" = "Full Alpha (Theoretical)",
    "cum_clone"      = "Lagged Clone (+45 Days)",
    "cum_spy"        = "SPY Benchmark"
  ))

ggplot(cum_long,
       aes(x = filing_date, y = (value - 1) * 100,
           color = series, linetype = series)) +
  geom_hline(yintercept = 0, linetype = "dotted",
             color = "grey50", linewidth = 0.5) +
  geom_line(linewidth = 1.1) +
  scale_color_manual(values = c(
    "Full Alpha (Theoretical)" = "#0d1b2a",
    "Lagged Clone (+45 Days)"  = "#457b9d",
    "SPY Benchmark"            = "#e63946"
  )) +
  scale_linetype_manual(values = c(
    "Full Alpha (Theoretical)" = "solid",
    "Lagged Clone (+45 Days)"  = "dashed",
    "SPY Benchmark"            = "solid"
  )) +
  scale_x_date(date_breaks = "6 months", date_labels = "%b %Y") +
  scale_y_continuous(labels = \(x) paste0(x, "%")) +
  labs(
    title    = paste(FUND_LABEL, "— 5-Year Information Decay Backtest"),
    subtitle = "Cumulative return above starting value. Clone enters at filing date (+45 days).",
    x = NULL, y = "Cumulative Return (%)",
    color = NULL, linetype = NULL
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title       = element_text(face = "bold", size = 14),
    plot.subtitle    = element_text(color = "#457b9d", size = 10),
    legend.position  = "bottom",
    panel.grid.minor = element_blank(),
    axis.text.x      = element_text(angle = 30, hjust = 1)
  )
```

---

## 3. Information Life Expectancy

*On what day does the alpha hit zero?*
```{r decay-curve}
decay$decay_plot_df |>
  filter(segment != "All Positions") |>
  ggplot(aes(x = day, y = avg_excess_return * 100,
             color = segment, linetype = segment)) +
  geom_hline(yintercept = 0, color = "grey40", linewidth = 0.6) +
  geom_vline(xintercept = 45, color = "#e63946",
             linewidth = 0.8, linetype = "dashed") +
  geom_line(linewidth = 1.0) +
  geom_line(
    data = decay$decay_plot_df |> filter(segment == "All Positions"),
    aes(x = day, y = avg_excess_return * 100),
    color = "grey30", linewidth = 0.7,
    linetype = "dotted", inherit.aes = FALSE
  ) +
  annotate("text", x = 46.5, y = Inf,
           label = "Day 45\nFiling Date",
           color = "#e63946", hjust = 0, vjust = 1.5,
           size = 3.2, fontface = "bold") +
  annotate("rect",
           xmin = 0, xmax = 2, ymin = -Inf, ymax = Inf,
           fill = "#e63946", alpha = 0.08) +
  annotate("text", x = 1, y = Inf,
           label = "Alpha\nDead\nZone",
           color = "#e63946", hjust = 0.5, vjust = 1.3,
           size = 2.8, fontface = "bold") +
  scale_color_manual(values = c(
    "Tactical (1-3 Qtrs)"    = "#e63946",
    "Established (4-7 Qtrs)" = "#f4a261",
    "Structural (8-19 Qtrs)" = "#457b9d",
    "Lindy (20+ Qtrs)"       = "#2a9d8f"
  )) +
  scale_x_continuous(
    breaks = c(0, 2, 15, 30, 45, 60, 75, 90),
    labels = c("D0","D2","D15","D30","D45\n(Filing)","D60","D75","D90")
  ) +
  scale_y_continuous(labels = \(x) paste0(x, "%")) +
  labs(
    title    = "Citadel 13-F — Information Life Expectancy",
    subtitle = paste0(
      "Avg. cumulative excess return vs. SPY from quarter-end to Day 90.\n",
      "Every segment crosses zero by Day 2. Red zone = Days 0 to 2, the window in which the alpha dies."
    ),
    x = NULL,
    y = "Avg. Cumulative Excess Return vs. SPY (%)",
    color = "Tenure Segment",
    linetype = "Tenure Segment"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title       = element_text(face = "bold", size = 14, color = "#0d1b2a"),
    plot.subtitle    = element_text(color = "#457b9d", size = 9),
    legend.position  = "bottom",
    panel.grid.minor = element_blank(),
    axis.text.x      = element_text(size = 9)
  )
```

Every tenure segment, including Lindy positions held for 5+ years, crosses
zero by Day 2. The alpha's life expectancy is about 48 hours, not 45 days.
The 13-F replication project is officially closed.
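
The "day the alpha hits zero" can be read off a decay curve mechanically. A minimal sketch, assuming a cumulative excess-return series indexed by day that starts at zero on Day 0; the helper `first_zero_day` and the toy curves are invented for illustration and are not the report's decay data.

```r
# Hypothetical helper: first day after Day 0 on which cumulative excess
# return is at or below zero. Returns NA if the series never gets there.
first_zero_day <- function(day, excess) {
  ord    <- order(day)
  day    <- day[ord]
  excess <- excess[ord]
  hit    <- which(day > 0 & excess <= 0)
  if (length(hit) == 0) NA_real_ else as.numeric(day[hit[1]])
}

# Invented toy curves: flat at Day 0, positive on Day 1, non-positive after.
toy_decay <- data.frame(
  segment = rep(c("A", "B"), each = 5),
  day     = rep(0:4, times = 2),
  excess  = c(0, 0.004, -0.002, -0.003, -0.001,
              0, 0.006,  0.000, -0.001, -0.004)
)

# One crossing day per segment.
toy_cross <- sapply(
  split(toy_decay, toy_decay$segment),
  function(d) first_zero_day(d$day, d$excess)
)
toy_cross
```

Day 0 is excluded because a cumulative series is zero there by construction; counting it would make every curve "cross" immediately.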

### Decay Values at Key Days
```{r decay-table}
decay$decay_plot_df |>
  filter(day %in% c(1, 2, 5, 10, 20, 30, 45, 60, 90)) |>
  select(segment, day, avg_excess_return) |>
  mutate(avg_excess_return = round(avg_excess_return * 100, 3)) |>
  tidyr::pivot_wider(names_from = day, values_from = avg_excess_return) |>
  rename_with(\(x) paste0("Day ", x), -segment) |>
  kable(align = c("l", rep("r", 9))) |>
  kable_styling(
    bootstrap_options = c("striped","hover","condensed"),
    full_width = TRUE
  ) |>
  add_header_above(c(" " = 1, "Cumulative Excess Return vs. SPY (%)" = 9))
```

---

## 4. The Lindy Filter

*Does position tenure slow the decay rate?*
```{r lindy-table}
lindy_tbl <- lindy$segment_perf_summary |>
  arrange(segment) |>
  mutate(
    `Full Alpha`    = paste0(round(ann_full_alpha, 1), "%"),
    `Leakage`       = paste0(round(ann_leakage,   1), "%"),
    `Clone Return`  = paste0(round(ann_clone,      1), "%"),
    `SPY`           = paste0(round(ann_spy,        1), "%"),
    `Active Return` = paste0(round(active_return,  2), "%")
  ) |>
  select(
    Segment = segment,
    `Full Alpha`, `Leakage`,
    `Clone Return`, `SPY`, `Active Return`
  )

# Locate the Lindy row by label rather than assuming it sorts to row 4
# (it would not if `segment` were a character column sorted alphabetically).
lindy_row <- which(startsWith(as.character(lindy_tbl$Segment), "Lindy"))

lindy_tbl |>
  kable(align = c("l","r","r","r","r","r")) |>
  kable_styling(
    bootstrap_options = c("striped","hover","condensed"),
    full_width = TRUE
  ) |>
  row_spec(lindy_row, bold = TRUE, background = "#e8f8f5")
```
```{r lindy-chart}
lindy$cumulative_by_segment |>
  filter(!is.na(cum_clone)) |>
  mutate(cum_clone_pct = (cum_clone - 1) * 100) |>
  ggplot(aes(x = filing_date, y = cum_clone_pct,
             color = segment, linetype = segment)) +
  geom_hline(yintercept = 0, linetype = "dotted", color = "grey50") +
  geom_line(linewidth = 1.0) +
  scale_color_manual(values = c(
    "Tactical (1-3 Qtrs)"    = "#e63946",
    "Established (4-7 Qtrs)" = "#f4a261",
    "Structural (8-19 Qtrs)" = "#457b9d",
    "Lindy (20+ Qtrs)"       = "#2a9d8f"
  )) +
  scale_x_date(date_breaks = "6 months", date_labels = "%b %Y") +
  scale_y_continuous(labels = \(x) paste0(x, "%")) +
  labs(
    title    = "Lagged Clone Returns by Tenure Segment",
    subtitle = "Lindy positions lose the least — but all segments underperform SPY",
    x = NULL, y = "Cumulative Clone Return (%)",
    color = NULL, linetype = NULL
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title       = element_text(face = "bold", size = 14),
    plot.subtitle    = element_text(color = "#457b9d"),
    legend.position  = "bottom",
    panel.grid.minor = element_blank(),
    axis.text.x      = element_text(angle = 30, hjust = 1)
  )
```

The gradient is real — each step up in tenure reduces the annual decay penalty
by roughly 1 percentage point. But no segment escapes the fundamental problem.
By Day 45 the alpha is gone across all buckets.
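
The tenure buckets behind this comparison can be expressed as a simple binning rule. A minimal sketch, assuming tenure is measured as a count of quarters held; the helper name `tenure_segment` and the toy input are invented, while the four labels are the ones used in the charts.

```r
# Hypothetical helper: map quarters held to the four tenure segments.
# Intervals are right-closed: (0,3], (3,7], (7,19], (19,Inf).
tenure_segment <- function(quarters_held) {
  cut(
    quarters_held,
    breaks = c(0, 3, 7, 19, Inf),
    labels = c("Tactical (1-3 Qtrs)", "Established (4-7 Qtrs)",
               "Structural (8-19 Qtrs)", "Lindy (20+ Qtrs)"),
    right  = TRUE
  )
}

# Invented tenures that sit on each bucket boundary.
toy_tenure <- c(1, 3, 4, 7, 8, 19, 20, 32)
data.frame(
  quarters_held = toy_tenure,
  segment       = tenure_segment(toy_tenure)
)
```

Because `cut()` returns a factor whose levels follow the order of `labels`, sorting a table by this column puts the segments in tenure order rather than alphabetical order.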

---

## 5. Factor Attribution

*Is Citadel smart or just fast?*
```{r factor-table}
fa$regression_results |>
  filter(series == "Alpha Gap (Full - Clone)") |>
  mutate(
    Significant = ifelse(p.value < 0.05, "Yes", ""),
    term = recode(term,
      "(Intercept)" = "Jensen's Alpha (Intercept)",
      "Mkt_RF"      = "Market Beta (Mkt-RF)",
      "SMB"         = "Size Factor (SMB)",
      "HML"         = "Value Factor (HML)",
      "UMD"         = "Momentum Factor (UMD)"
    )
  ) |>
  select(
    Factor      = term,
    Estimate    = estimate,
    `Std Error` = std.error,
    `t-Stat`    = statistic,
    `p-Value`   = p.value,
    Significant
  ) |>
  mutate(across(where(is.numeric), \(x) round(x, 4))) |>
  kable(align = c("l","r","r","r","r","c")) |>
  kable_styling(
    bootstrap_options = c("striped","hover"),
    full_width = FALSE
  ) |>
  row_spec(1, bold = TRUE, background = "#e8f8f5") |>
  row_spec(5, bold = TRUE, background = "#fff3cd")
```
```{r factor-chart}
fa$regression_results |>
  filter(
    series == "Alpha Gap (Full - Clone)",
    term != "(Intercept)"
  ) |>
  mutate(
    term = recode(term,
      "Mkt_RF" = "Market\n(Mkt-RF)",
      "SMB"    = "Size\n(SMB)",
      "HML"    = "Value\n(HML)",
      "UMD"    = "Momentum\n(UMD)"
    ),
    significant = p.value < 0.05,
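    # Use a t critical value instead of 1.96 because the sample is small.
    # Assumes the regression used all N_QTRS quarters and 5 parameters
    # (intercept + 4 factors).
    t_crit  = qt(0.975, df = N_QTRS - 5),
    ci_low  = estimate - t_crit * std.error,
    ci_high = estimate + t_crit * std.error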
  ) |>
  ggplot(aes(x = term, y = estimate, fill = significant)) +
  geom_col(width = 0.5, show.legend = FALSE) +
  geom_errorbar(aes(ymin = ci_low, ymax = ci_high),
                width = 0.15, linewidth = 0.8, color = "grey30") +
  geom_hline(yintercept = 0, linewidth = 0.5, color = "grey40") +
  scale_fill_manual(values = c("TRUE" = "#0d1b2a", "FALSE" = "#adb5bd")) +
  labs(
    title    = "Factor Loadings on the Alpha Gap (Full Alpha minus Clone)",
    subtitle = "Dark bars = statistically significant (p < 0.05). Error bars = 95% CI.",
    x = NULL, y = "Factor Loading"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title         = element_text(face = "bold", size = 13),
    plot.subtitle      = element_text(color = "#457b9d", size = 9),
    panel.grid.major.x = element_blank()
  )
```

### Interpretation
```{r factor-interpretation}
#| echo: false

alpha_row <- fa$alpha_summary |>
  filter(series == "Alpha Gap (Full - Clone)")

mom_row <- fa$regression_results |>
  filter(series == "Alpha Gap (Full - Clone)", term == "UMD")

smb_row <- fa$regression_results |>
  filter(series == "Alpha Gap (Full - Clone)", term == "SMB")

rsq <- fa$regression_results |>
  filter(series == "Alpha Gap (Full - Clone)") |>
  pull(r_sq) |>
  first()
```

**Jensen's Alpha: `r round(alpha_row$estimate, 4)`
(p = `r round(alpha_row$p.value, 4)`)** — significant and large, translating
to `r round(alpha_row$ann_alpha, 1)`% annualised unexplained alpha. After
controlling for all four factors, Citadel still has a genuine information edge.
They are not just riding factor tailwinds — they are smart AND fast.

**Momentum (UMD): `r round(mom_row$estimate, 3)`
(p = `r round(mom_row$p.value, 4)`)** — significant. Citadel enters momentum
positions before the signal is widely observable. By Day 2, momentum-driven
price discovery has already incorporated their information.

**Size (SMB): `r round(smb_row$estimate, 3)`
(p = `r round(smb_row$p.value, 4)`)** — the strongest factor loading.
The gap concentrates in smaller-cap names where Citadel's information advantage
is widest and prices move fastest. Exactly where you would expect a
multi-strategy fund to have an edge — and exactly where a replicator
gets hurt the most.

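**Model R-squared = `r round(rsq, 3)`.** The four factors explain
`r round(rsq * 100, 1)`% of quarterly variance in the gap; the remaining
`r round((1 - rsq) * 100, 1)`% is residual variation the factors do not capture.
The evidence for idiosyncratic skill is the significant intercept, which
measures the average gap left after factor exposure and is separate from
R-squared.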
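
For readers who want to see the mechanics, a four-factor regression of this kind can be sketched with base R. The data below are simulated with a known intercept and known loadings, so nothing here reproduces the report's estimates; the sample size of 20 and the use of `lm()` are assumptions, and only the factor names match the tables above.

```r
set.seed(42)

# Simulated quarterly factor returns: 20 invented observations.
n <- 20
toy_fac <- data.frame(
  Mkt_RF = rnorm(n, 0.02, 0.08),
  SMB    = rnorm(n, 0.00, 0.04),
  HML    = rnorm(n, 0.00, 0.04),
  UMD    = rnorm(n, 0.01, 0.05)
)

# Simulated return gap with a known intercept (0.03) and known loadings.
toy_fac$gap <- 0.03 + 0.1 * toy_fac$Mkt_RF + 0.5 * toy_fac$SMB +
  0.3 * toy_fac$UMD + rnorm(n, 0, 0.01)

fit <- lm(gap ~ Mkt_RF + SMB + HML + UMD, data = toy_fac)

# Coefficient table: estimate, standard error, t-statistic, p-value.
coefs <- summary(fit)$coefficients

# 95% intervals from the t distribution with n - 5 degrees of freedom,
# which are wider than estimate +/- 1.96 * SE in a sample this small.
ci <- confint(fit, level = 0.95)

# Share of variance in the gap explained by the four factors.
r_sq <- summary(fit)$r.squared

list(coefficients = round(coefs, 4), ci = round(ci, 4), r_squared = round(r_sq, 3))
```

The intercept row is the per-period alpha and the remaining rows are factor loadings. R-squared describes how much of the period-to-period variation the factors track, which is a different question from whether the intercept is non-zero.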

---

## 6. Final Verdict
```{r verdict-table}
tibble(
  Question = c(
    "Does the 13-F clone beat SPY?",
    "Does the Lindy filter recover alpha?",
    "What day does alpha hit zero?",
    "Is Citadel's edge skill or speed?",
    "Is 13-F replication viable for Citadel?",
    "Where does the real edge live?"
  ),
  Answer = c(
    "No. Best segment (Lindy) returns -0.39% ann. vs SPY at +0.9%.",
    "Directionally yes — decay slows with tenure. Not enough to beat SPY.",
    "Day 2. Within 48 hours of quarter-end.",
    "Both. Significant Jensen's alpha AND large momentum loading.",
    "No. The alpha is gone well before the filing is public.",
    "Small-cap information asymmetry and momentum front-running."
  )
) |>
  kable(align = c("l","l")) |>
  kable_styling(
    bootstrap_options = c("striped","hover"),
    full_width = TRUE
  ) |>
  row_spec(5, bold = TRUE, background = "#fdecea") |>
  row_spec(6, bold = TRUE, background = "#e8f8f5")
```

Citadel's 13-F is a graveyard not because they are bad — but because they are
exceptional. Their edge is real, significant, and concentrated in exactly the
positions where information moves prices fastest. By the time the filing hits
EDGAR, you are not following smart money. You are providing exit liquidity to it.

The next research mandate: apply this same framework to a low-turnover,
high-conviction manager where structural alpha has a longer half-life.
The Lindy gradient found here is the thread worth pulling.

---

## Appendix: Methodology
```{r appendix}
#| echo: false

cat(paste0(
  "Data source:       QUANTkiosk API via qkiosk R package\n",
  "Fund:              ", FUND_LABEL, "\n",
  "Quarters:          2020 Q1 to 2025 Q1 (", N_QTRS, " quarters)\n",
  "Universe:          Top 25 long equity positions per sector per quarter\n",
  "Sector weights:    Citadel actual allocations\n",
  "Within-sector:     Equal weighted\n",
  "Price data:        Yahoo Finance via tidyquant (adjusted close)\n",
  "Benchmark:         SPY (SPDR S&P 500 ETF)\n",
  "Factor data:       Ken French Data Library (FF3 + Carhart Momentum)\n",
  "Decay granularity: Daily, top 100 positions, Day 0 to Day 90\n",
  "Report generated:  ", as.character(Sys.time()), "\n"
))
```
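
The universe and weighting rules listed above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline that produced the report: the toy holdings, the column names, and the choice to measure each sector's weight over all of its positions (rather than only the retained ones) are assumptions.

```r
# Invented toy holdings for one quarter. All figures are made up.
toy_hold <- data.frame(
  ticker = c("A", "B", "C", "D", "E"),
  sector = c("Tech", "Tech", "Tech", "Energy", "Energy"),
  value  = c(50, 30, 10, 6, 4)
)
top_n <- 2  # the report uses 25; 2 keeps the toy example readable

# Sector weight = sector's share of total value (assumption: all positions).
sector_w <- tapply(toy_hold$value, toy_hold$sector, sum) / sum(toy_hold$value)

# Rank positions by value within each sector and keep the top_n largest.
ranked <- toy_hold[order(toy_hold$sector, -toy_hold$value), ]
ranked$rank <- ave(ranked$value, ranked$sector, FUN = function(v) seq_along(v))
kept <- ranked[ranked$rank <= top_n, ]

# Equal weight within each sector, scaled by that sector's weight.
n_in_sector <- ave(kept$value, kept$sector, FUN = length)
kept$weight <- as.numeric(sector_w[as.character(kept$sector)]) / n_in_sector
kept[, c("ticker", "sector", "weight")]
```

Under this scheme the sector mix follows the fund's dollars, while stock selection inside a sector is deliberately flattened so that no single name dominates the clone.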