Extreme value analysis demo

Fort Collins Precipitation
- How unsual was the event?
- Quick side question: why not just use empirical quantiles?
Modeling Fort Collins extremes
- Approach (1): modelling all non-zero data
- Approach (2): model only extreme data
Why did we see what we saw?

Demo based on example from WCRP (world climate research program) summer school 2019.

Fort Collins Precipitation

Fort Collins flood, 1997 caused $250 million in damage and killed 5 people.

Recording station not at center of storm.

How unsual was the event?

Measured value for the 1997 event: 117.6 mm.

Questions to answer: what is the return period for a 117.6 mm event? And, what is the 100-year return level?

To answer the questions: we will use data from 1948-1990.

This requires extrapolation into the tail: the largest observed event from 1948-1990 is 103 mm.

we will model the data in two ways:

model all non-zero data
model only extreme data

Quick side question: why not just use empirical quantiles?

Formula for empirical frequency estimates:

\[ \hat{p} = \frac{1}{N}\sum^N_{i=1}\mathbf{1}_{x_i>u} \] which has relative error

\[ \text{RE} = \frac{1}{\sqrt{\hat{p}N}} \]

…to estimate a return period of 100 years with 10% error, we need 10,000 years of data.

Modeling Fort Collins extremes

Approach (1): modelling all non-zero data

Let $X_t$ be the daily summer precipitation amount for Fort Collins, where summer is April - October.

To model precipitation, we need to account for zeros. We assume:

\[ X_t > 0 \text{ w.p. } p \\ X_t = 0 \text{ w.p. } 1- p \] Probability that we have a non-zero precipitation event:

length(summer[Prec>0,get("Prec")]) / length(summer[,get("Prec")])

## [1] 0.2628776

so probability that $X_T$ > 0 is 0.2629.

Now assume that $[X_t | X_t > 0]$ are distributed Gamma(alpha, beta):

ML estimates: $\hat{\alpha} = 0.657$, $\hat{\beta} = 3.18$.

All precipitation model estimate:

\[ \begin{align} P(X_t > 117.6) &= P(X_t > 117.6 | X_t > 0)P(X_t > 0)\\ &= (1 - F_x(117.6))(0.2629)\\ &=3.01 \times 10^{-8} \end{align} \]

Now get return period probability:

\[ \begin{align} P(\text{ann max} > 117.6) &= 1- P(\text{entire year's obs} < 117.6))\\ &= 1 - (1- P(\text{indiv obs} > 117.6))^{214}\\ &= 1 - (1 - 3.01 \times 10^{-8})^{214}\\ &= 6.44 \times 10^{-6} \end{align} \] Assumes independence of individual observations, 214 summer days in a year.

Return period for precipitation event of 117.6:

\[ 1 / 6.44 \times 10^{-6} = 115,246 \text{ years} \] Estimate of the 100 year return level: 60.5 mm

98% of the data lies below 30 mm…

Approach (2): model only extreme data

Extract annual maxima and fit a model:

Let $M_n = \text{max}_{t=1,\dots,n}(X_t)$. Assume $M_n \sim \text{GEV}(\mu,\sigma,\xi)$.

ML estimates: $\hat{\mu} = 36.0$, $\hat{\sigma} = 14.12$, $\hat{\xi} = 0.16$.

Calculate the return period of the 117.6 mm event and the return level of the 100 year return period:

\[ P(\text{ann max > 117.6}) = 1 - F_{M_n}(117.6) = 0.016 \] Return period point estimate for an event of 117.6 mm: $1 / 0.016 = 62.5$ years

100-year return level point estimate: 130.9 mm

Why did we see what we saw?

Intuitively: phenomena that generate extreme events are fundamentally different than those that generate typical events.

Mathematically: Assume $X_t$ has cdf $F_X(x)$.

\[ \begin{align} F_{M_n}(x) &= P(M_n \leq x) = P(X_t \leq x \text{ for all } t = 1, \dots,n) \\ &= P^n(X_t \leq x)\\ &= F_X^n(x) \end{align} \] If we know $F_X$ exactly, then we know $F_{M_n}$ exactly. But if we have to estimate $F_X$, any errors get amplified by $n$.