Demo based on example from WCRP (world climate research program) summer school 2019.
Fort Collins flood, 1997 caused $250 million in damage and killed 5 people.
Recording station not at center of storm.
Measured value for the 1997 event: 117.6 mm.
Questions to answer: what is the return period for a 117.6 mm event? And, what is the 100-year return level?
To answer the questions: we will use data from 1948-1990.
This requires extrapolation into the tail: the largest observed event from 1948-1990 is 103 mm.
we will model the data in two ways:
model all non-zero data
model only extreme data
Formula for empirical frequency estimates:
\[ \hat{p} = \frac{1}{N}\sum^N_{i=1}\mathbf{1}_{x_i>u} \] which has relative error
\[ \text{RE} = \frac{1}{\sqrt{\hat{p}N}} \]
…to estimate a return period of 100 years with 10% error, we need 10,000 years of data.
Let \(X_t\) be the daily summer precipitation amount for Fort Collins, where summer is April - October.
To model precipitation, we need to account for zeros. We assume:
\[ X_t > 0 \text{ w.p. } p \\ X_t = 0 \text{ w.p. } 1- p \] Probability that we have a non-zero precipitation event:
length(summer[Prec>0,get("Prec")]) / length(summer[,get("Prec")])
## [1] 0.2628776
so probability that \(X_T\) > 0 is 0.2629.
Now assume that \([X_t | X_t > 0]\) are distributed Gamma(alpha, beta):
ML estimates: \(\hat{\alpha} = 0.657\), \(\hat{\beta} = 3.18\).
All precipitation model estimate:
\[ \begin{align} P(X_t > 117.6) &= P(X_t > 117.6 | X_t > 0)P(X_t > 0)\\ &= (1 - F_x(117.6))(0.2629)\\ &=3.01 \times 10^{-8} \end{align} \]
Now get return period probability:
\[ \begin{align} P(\text{ann max} > 117.6) &= 1- P(\text{entire year's obs} < 117.6))\\ &= 1 - (1- P(\text{indiv obs} > 117.6))^{214}\\ &= 1 - (1 - 3.01 \times 10^{-8})^{214}\\ &= 6.44 \times 10^{-6} \end{align} \] Assumes independence of individual observations, 214 summer days in a year.
Return period for precipitation event of 117.6:
\[ 1 / 6.44 \times 10^{-6} = 115,246 \text{ years} \] Estimate of the 100 year return level: 60.5 mm
98% of the data lies below 30 mm…
Extract annual maxima and fit a model:
Let \(M_n = \text{max}_{t=1,\dots,n}(X_t)\). Assume \(M_n \sim \text{GEV}(\mu,\sigma,\xi)\).
ML estimates: \(\hat{\mu} = 36.0\), \(\hat{\sigma} = 14.12\), \(\hat{\xi} = 0.16\).
Calculate the return period of the 117.6 mm event and the return level of the 100 year return period:
\[ P(\text{ann max > 117.6}) = 1 - F_{M_n}(117.6) = 0.016 \] Return period point estimate for an event of 117.6 mm: \(1 / 0.016 = 62.5\) years
100-year return level point estimate: 130.9 mm
Intuitively: phenomena that generate extreme events are fundamentally different than those that generate typical events.
Mathematically: Assume \(X_t\) has cdf \(F_X(x)\).
\[ \begin{align} F_{M_n}(x) &= P(M_n \leq x) = P(X_t \leq x \text{ for all } t = 1, \dots,n) \\ &= P^n(X_t \leq x)\\ &= F_X^n(x) \end{align} \] If we know \(F_X\) exactly, then we know \(F_{M_n}\) exactly. But if we have to estimate \(F_X\), any errors get amplified by \(n\).