Data pulled from Hydra II with command:
lescon_var -b 0 -f timevalue 35 56 1 0 1001 1 > sandsli.txt
We use data from 01.01.2002 - 31.12.2023. Data older than this has time spacing that is too coarse for frequency analysis.
Currently the peaks used in the frequency analysis have time spacing of one minute.
Data from 01.01.2002 to 01.06.2014 has time spacing of one minute, except when there are gaps in the data. The gaps usually occur in periods of low flow.
On 01.06.2014 a new sensor was installed; it records peaks at one-minute spacing and otherwise records a value every 60 minutes.
We model the threshold excesses with the generalized Pareto distribution. There are several parameterizations of the generalized Pareto. We use the following pdf:
\[\begin{equation*} p(y|u,\sigma,k) = \begin{cases} \frac{1}{\sigma}\left(1 + k\left(\frac{y-u}{\sigma}\right)\right)^{-\frac{1}{k}-1} & \text{if } k \neq 0\\ \frac{1}{\sigma}\exp\left(-\frac{y-u}{\sigma}\right) & \text{if } k = 0 \end{cases} \end{equation*}\]
where \(u\) is the threshold, \(\sigma\) is the scale parameter, \(k\) is the shape parameter and the data \(y \in (u, \infty)\).
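As a quick sanity check of this parameterization, the density can be written directly in R. The helper below is an illustrative sketch, not part of the analysis scripts; the example values are chosen near the posterior means reported later in this document.

# Hedged sketch: generalized Pareto density in the parameterization above.
dgpareto <- function(y, u, sigma, k) {
  stopifnot(sigma > 0, all(y > u))
  z <- (y - u) / sigma
  if (abs(k) < 1e-12) {
    dens <- exp(-z) / sigma                   # k = 0 case (exponential tail)
  } else {
    dens <- (1 + k * z)^(-1 / k - 1) / sigma  # k != 0 case
    dens[1 + k * z <= 0] <- 0                 # outside the support when k < 0
  }
  dens
}
# Example: density at a few excess values
dgpareto(c(0.07, 0.10, 0.15), u = 0.065, sigma = 0.027, k = 0.08)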
We use a Bayesian approach to estimate the parameters \(\sigma\) and \(k\) of the distribution. Bayesian inference is performed using a Hamiltonian Monte Carlo (HMC) algorithm implemented in the probabilistic programming language Stan. The implementation of the generalized Pareto follows Aki Vehtari’s work on user-defined probability functions in Stan.
We can make an empirical threshold selection prior to fitting a model
via a mean
residual life plot. We plot average excess value against a series of
thresholds; the mean residual life plot should be approximately linear
above the threshold at which the generalized Pareto model holds.
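A minimal sketch of how such a plot can be computed is given below; `flows` is assumed to stand for the numeric vector of observed flow values (the name is an assumption), and extRemes::mrlplot offers a ready-made alternative.

# Hedged sketch: mean residual life (mean excess) plot.
# 'flows' is assumed to be the vector of observed flows (e.g. read from sandsli.txt).
thresholds <- seq(0.02, 0.12, by = 0.005)
mean_excess <- sapply(thresholds, function(u) mean(flows[flows > u] - u))
plot(thresholds, mean_excess, type = "b",
     xlab = "Threshold u", ylab = "Mean excess",
     main = "Mean residual life plot")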
The interpretation of mean residual life plots is not always simple in practice. Ours seems to have several potentially linear sections. Above 0.10 the variation becomes large, likely because there are too few points above this level to make meaningful inferences. We instead choose a threshold of 0.065, marking the beginning of the second linear section.
We assume events are separated by at least 24 hours. We then identify the maximum from each event that exceeds the threshold value. This set of event maxima is modeled as generalized Pareto.
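This step is implemented in getEventsFn.R; a rough sketch of the idea (illustrative only, not the actual function) using data.table might look like the following. The column names 'date' and 'cumecs' follow the naming used later in this document.

# Hedged sketch of the declustering step (the analysis uses getEventsFn.R, not this code).
# Assumes a data.table 'dt' with a POSIXct column 'date' and a flow column 'cumecs'.
library(data.table)
decluster_pot <- function(dt, u, gap_hours = 24) {
  exc <- dt[cumecs > u][order(date)]
  # start a new event when the gap to the previous exceedance is at least 'gap_hours'
  exc[, event := cumsum(c(TRUE, diff(as.numeric(date)) >= gap_hours * 3600))]
  # keep only the maximum observation from each event
  exc[, .SD[which.max(cumecs)], by = event][, event := NULL][]
}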
The parameter estimates and credible intervals are:
##             mean        2.5%         25%        50%        75%     97.5%
## sigma 0.02699058  0.01917025  0.02382493 0.02666858 0.02977309 0.0369629
## k     0.08415996 -0.14251167 -0.01050895 0.06910055 0.16516279 0.3853012
We plot the full posterior distributions for both \(k\) and \(\sigma\) and compare the Stan estimates to two other estimation methods: maximum likelihood from the extRemes package and the method proposed in Zhang & Stephens (2009) and implemented in the loo package.
The estimates all show good agreement. Note that the estimation uncertainty for \(k\) is large and the 80% credible interval crosses zero.
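The two alternative fits can be obtained roughly as sketched below, using the peaks and threshold u defined in the fitting code further down; argument names follow the current package documentation and may differ between versions.

# Hedged sketch: alternative estimates used for comparison.
library(extRemes)
library(loo)
# maximum likelihood fit of the generalized Pareto to the event maxima
fit.mle <- fevd(peaks$cumecs, threshold = u, type = "GP")
# Zhang & Stephens (2009) estimator; gpdfit() expects the excesses over the threshold
fit.zs <- gpdfit(peaks$cumecs - u, wip = FALSE)
c(k = fit.zs$k, sigma = fit.zs$sigma)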
We fit the Stan model (functions getEventsFn.R and gpareto.stan found in filepath).
library(rstan)
source("getEventsFn.R")  # provides getEventsFn()

u = 0.065 # threshold
peaks <- getEventsFn(sandsli.Q, u) # find independent POT maxima
ds <- list(ymin = u,
           N = length(peaks$cumecs),
           y = peaks$cumecs)
fit.gpd <- stan(file = 'gpareto.stan',
                data = ds,
                refresh = 0, chains = 4, seed = 42)
The MCMC diagnostics for the model fit look good:
monitor(as.array(fit.gpd, pars=c("k","sigma")))
## Inference for the input samples (4 chains: each with iter = 1000; warmup = 500):
##
##          Q5  Q50  Q95  Mean   SD  Rhat Bulk_ESS Tail_ESS
## k      -0.1  0.1  0.3   0.1  0.1  1.01      784      814
## sigma   0.0  0.0  0.0   0.0  0.0  1.01      818     1078
##
## For each parameter, Bulk_ESS and Tail_ESS are crude measures of
## effective sample size for bulk and tail quantities respectively (an ESS > 100
## per chain is considered good), and Rhat is the potential scale reduction
## factor on rank normalized split chains (at convergence, Rhat <= 1.05).
check_hmc_diagnostics(fit.gpd)
##
## Divergences:
## 0 of 4000 iterations ended with a divergence.
##
## Tree depth:
## 0 of 4000 iterations saturated the maximum tree depth of 10.
##
## Energy:
## E-BFMI indicated no pathological behavior.
Posterior predictive checking is the process of comparing observed data to simulated samples. If our model is a good fit, we should be able to use it to generate samples that are consistent with observed data.
Ideally, this comparison would take place with an independent test set of holdout data, but this is often not accessible, particularly in the data-sparse world of extreme value modeling. We can still do some checking and performance assessments with what we have; these checks can tell us if there are aspects of our in-sample data that our model is failing to capture.
We assess the following:
Kernel density estimates for observed vs simulated data
Posterior predictive test statistics
Leave-one-out cross-validation (LOO) probability integral transforms (PIT)
All of the checks showed good agreement between the model and the data.
Kernel density estimates for observed vs simulated data compare the empirical distribution of the data (thick dark curve) to the distributions of simulated data from the posterior predictive distribution (thin light blue lines). Here we simulate 100 data sets.
The simulated vs observed data are in good agreement.
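A sketch of how this overlay can be produced with bayesplot, assuming gpareto.stan generates replicated data in a generated quantities block named yrep (as in Vehtari's case study); if not, replicates can instead be simulated in R from the posterior draws of \(\sigma\) and \(k\).

# Hedged sketch: kernel density overlay of observed vs simulated data.
# Assumes the Stan model outputs replicated data in a quantity named 'yrep'.
library(bayesplot)
yrep <- as.matrix(fit.gpd, pars = "yrep")
ppc_dens_overlay(peaks$cumecs, yrep[1:100, ])  # overlay 100 simulated data sets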
In addition to comparing the density estimates for simulated vs observed data, it is often useful to compare “test” statistics that summarize different distributional properties. In extreme value analysis, the maximum data point is often important, so we compute the maximum of each of our simulated data sets and compare to the maximum of the observed data set.
The observed and simulated data are in good agreement, and we see that our model can predict maxima beyond the observed maximum value.
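Reusing the yrep matrix from the sketch above, the maximum-statistic check can be drawn with bayesplot as well.

# Hedged sketch: posterior predictive check on the sample maximum.
ppc_stat(peaks$cumecs, yrep, stat = "max")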
The kernel density estimates and test statistics used replicated data from the joint posterior predictive distribution. Alternatively, we can use leave-one-out (LOO) cross-validation to partially avoid double use of the data.
To check how reliable our model is, we can look at the LOO cross-validation predictive cumulative distribution function values, which are asymptotically uniform if the model produces an estimated distribution close to the unknown true distribution.
In this figure the light blue lines are 100 simulated data sets from a standard uniform distribution and the computed LOO PIT values are the thick dark curve.
Overall, we see good agreement between the simulated data and the LOO PIT values. The dip at higher quantiles shows the model may slightly underestimate at these quantiles, but overall the LOO PITs are consistent with the simulated uniform data.
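A sketch of the LOO PIT computation, assuming gpareto.stan also stores the pointwise log-likelihood in a generated quantities vector named log_lik and reusing the yrep matrix from the checks above.

# Hedged sketch: LOO PIT overlay via PSIS log weights.
library(loo)
library(bayesplot)
ll <- extract_log_lik(fit.gpd, merge_chains = FALSE)
psis_obj <- psis(-ll, r_eff = relative_eff(exp(ll)))
ppc_loo_pit_overlay(peaks$cumecs, yrep, lw = weights(psis_obj))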
Return levels in a POT model are not as straightforward as in a model that uses block maxima. Return levels are tied to yearly exceedance probabilities: the \(N\) year return level is the level with an average recurrence interval of \(N\) years. But in a POT model, more than one event can happen per year.
Thus, in addition to estimating \(k\) and \(\sigma\), in order to calculate return levels we need an estimate of \(\tau_u\), the probability of an individual observation exceeding the threshold \(u\) in any given year.
We compute this empirically. Note that this ignores the estimation uncertainty in \(\tau_u\); if we want to account for it, we could model the number of exceedances as binomial, Bin\((n,\tau_u)\), as in Coles (2001).
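With these quantities, the \(N\) year return level used in the code below (for \(k \neq 0\), following Coles 2001, pg. 82) is

\[\begin{equation*} x_N = u + \frac{\sigma}{k}\left[\left(N \, n_y \, \tau_u\right)^{k} - 1\right] \end{equation*}\]

where \(n_y\) is the number of observations per year (here minutes, so \(n_y = 525600\)).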
We plot the return level curve with a 90% credible interval and the observed points overlaid. Observed points are plotted using the Weibull plotting position formula (see section 2.4.3 in NVE report 2011-09). Plotting positions are used only to visualize the observed extremes on return level plots; they are not used to generate the flood frequency curve.
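A minimal sketch of these plotting positions is given below; the conversion from per-event probabilities to annual return periods via the average event rate is an assumption on our part (see NVE report 2011-09, section 2.4.3, for the exact formulation).

# Hedged sketch: Weibull plotting positions for the observed event maxima.
x <- sort(peaks$cumecs)                          # event maxima, smallest to largest
n <- length(x)
p <- (1:n) / (n + 1)                             # Weibull non-exceedance probabilities
lambda <- n / length(unique(year(peaks$date)))   # average number of events per year
T.emp <- 1 / (lambda * (1 - p))                  # approximate empirical return period (years)
# the pairs (T.emp, x) are overlaid as points on the return level plot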
Return levels for Sandsli are computed as
## 'peaks' are the independent threshold-exceeding event maxima
u = 0.065 # threshold
# total number of independent threshold exceedances:
nu <- dim(peaks)[1]
# average number of independent threshold exceedances per year:
nu.avg <- mean(peaks[,.N,by=year(date)]$N)
M <- length(unique(year(peaks$date))) # num. of years of records
ny <- 525600 # number of minutes in a year
tau.hat <- nu/(ny*M*nu.avg)
# parameter estimates (posterior means) from generalized Pareto:
sigma.hat <- sigma_k_summary[1,1]
k.hat <- sigma_k_summary[2,1]
# desired return periods:
N = 2:500
## written to match notation in Coles (2001) pg. 82.
## this return level ignores the estimation uncertainty tied to tau.hat:
rl <- u + sigma.hat/k.hat*( (N*ny*tau.hat)^k.hat - 1 )
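The 90% credible interval on the return level curve can be obtained by evaluating the same formula at each posterior draw of \(\sigma\) and \(k\), as sketched below (still ignoring the uncertainty in tau.hat).

# Hedged sketch: pointwise 90% credible interval for the return level curve,
# computed from the posterior draws of sigma and k with tau.hat held fixed.
draws <- as.data.frame(fit.gpd, pars = c("sigma", "k"))
rl.draws <- sapply(N, function(n)
  u + draws$sigma / draws$k * ((n * ny * tau.hat)^draws$k - 1))
rl.ci <- apply(rl.draws, 2, quantile, probs = c(0.05, 0.5, 0.95))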
Coles, Stuart. An Introduction to Statistical Modeling of Extreme Values. London: Springer, 2001.
Zhang, Jin, and Michael A. Stephens. “A new and efficient estimation method for the generalized Pareto distribution.” Technometrics 51.3 (2009): 316-325.