By the end of this chapter, the student should be able to:
Before fitting a statistical model for extremes, it is important to look at the data carefully. Extreme value models are sensitive to tail behaviour, and tail behaviour can be affected by outliers, data errors, dependence, volatility clustering, skewness, and changes in the data-generating process. For this reason, exploratory data analysis is not a cosmetic step. It is part of model building.
In ordinary statistical analysis, exploratory tools help the analyst understand the centre, spread, and shape of the data. In extreme value analysis, they also help the analyst understand the tail. Since the Peaks Over Threshold method depends on selecting a high threshold, the analyst must examine how the data behave at high levels before fitting the Generalized Pareto Distribution.
Useful exploratory tools include histograms, box plots, time series plots, empirical distribution function plots, probability plots, quantile plots, and scatter plots in multivariate cases. These tools help reveal whether the distribution is symmetric or skewed, whether it appears heavy-tailed, whether outliers are present, and whether observations show time dependence.
For financial return series, exploratory analysis should normally begin by plotting returns and losses over time. This helps reveal volatility clustering. A histogram can then show the general shape of the distribution. A box plot can highlight unusually large observations. A QQ plot can compare the empirical distribution with a reference distribution such as the normal distribution. If the data have heavier tails than the reference distribution, the QQ plot will show systematic departures from linearity in the tails.
set.seed(2426)
n <- 2500
returns <- rt(n, df = 4) / 100
losses <- -returns
loss_data <- tibble(
Time = 1:n,
Loss = losses
)
summary(loss_data$Loss)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.0742262 -0.0077355 -0.0005381 -0.0005440 0.0067286 0.0667676
ggplot(loss_data, aes(x = Time, y = Loss)) +
geom_line() +
labs(
title = "Simulated Loss Series",
x = "Time",
y = "Loss"
)
ggplot(loss_data, aes(x = Loss)) +
geom_histogram(bins = 70) +
labs(
title = "Histogram of Losses",
x = "Loss",
y = "Frequency"
)
ggplot(loss_data, aes(y = Loss)) +
geom_boxplot() +
labs(
title = "Box Plot of Losses",
y = "Loss"
)
The time plot gives a first impression of whether the loss series is stable over time. The histogram gives the overall distributional shape. The box plot highlights unusually large losses, which are potential candidates for tail modelling.
Probability plots and quantile plots are useful for comparing the empirical data with a reference distribution. A quantile-quantile plot, commonly called a QQ plot, compares sample quantiles with theoretical quantiles from a chosen distribution.
If the data are generated from the reference distribution, the QQ plot should be approximately linear. If the data are a linear transformation of the reference distribution, the plot should still be approximately linear, but with a different slope and intercept. This makes QQ plots useful for assessing distributional fit, location, scale, and shape.
QQ plots are also useful for detecting outliers. If one or a few observations are very different from the rest, they may appear as points far from the general linear pattern. In extreme value analysis, the most important departures usually occur in the tail. If the right tail of the data is heavier than the right tail of the reference distribution, the upper tail points may curve upward relative to the reference line.
For financial losses, a normal QQ plot can show whether the normal distribution underestimates the upper tail. If losses are heavy-tailed, the largest sample quantiles will often be larger than the corresponding normal quantiles.
qqnorm(losses, main = "Normal QQ Plot of Losses")
qqline(losses, lty = 2)
If the points follow a straight line closely, the normal model may be a reasonable approximation. If the upper tail bends away from the line, the losses may be heavier-tailed than the normal distribution. This supports the use of extreme value methods.
Threshold selection is one of the most important practical problems in Peaks Over Threshold modelling. The POT method assumes that exceedances over a sufficiently high threshold can be approximated by the Generalized Pareto Distribution. The phrase “sufficiently high” is important. The GPD approximation is a tail approximation, not a model for the entire distribution.
If the threshold is too low, the analyst includes observations from the centre of the distribution. These observations are not genuine tail observations and can bias the fitted GPD model. A low threshold usually gives more exceedances, which may reduce variance, but the model may be wrong because non-tail data have been included.
If the threshold is too high, only a few exceedances remain. This reduces bias because the selected observations are more genuinely extreme, but it increases variance because the parameter estimates are based on very little data. Estimates may become unstable and unreliable.
Therefore, threshold selection involves a bias-variance trade-off. A lower threshold gives more observations but may introduce bias. A higher threshold gives fewer observations but may increase variance. The practical goal is to choose a threshold high enough for the GPD approximation to be reasonable, but low enough to retain enough exceedances for stable estimation.
There is no universally perfect threshold selection method. Analysts often compare several graphical and numerical tools before making a final judgement. This chapter discusses three methods: the Mean Residual Life Plot, the Hill Plot, and the Square Error Method.
The Mean Residual Life Plot, also called the Mean Excess Plot, is one of the most commonly used graphical methods for threshold selection.
For a threshold \(u\), the mean excess function is
\[ e(u)=E[X-u\mid X>u]. \]
It measures the expected amount by which a loss exceeds the threshold, given that the threshold has already been exceeded.
In a sample, the empirical mean excess at threshold \(u\) is estimated by
\[ \hat{e}(u)=\frac{1}{N_u}\sum_{i:X_i>u}(X_i-u), \]
where \(N_u\) is the number of observations exceeding \(u\).
The theoretical reason behind the Mean Residual Life Plot is that if the excesses above a threshold follow a GPD, then the mean excess function is approximately linear in \(u\), provided \(\xi<1\). For a GPD tail, the mean excess over threshold (u”) can be expressed in a linear form over a suitable threshold range. Therefore, a good GPD fit is suggested by a region of the Mean Residual Life Plot that is approximately linear.
In practice, the analyst plots empirical mean excesses over a range of candidate thresholds. The threshold is often chosen as the lowest level above which the plot appears roughly linear, allowing for sampling uncertainty. Choosing the lowest point in a stable linear region helps retain as many exceedances as possible while still focusing on the tail.
mean_excess <- function(x, thresholds) {
map_dfr(thresholds, function(u) {
exceedances <- x[x > u]
tibble(
Threshold = u,
Number_Exceedances = length(exceedances),
Mean_Excess = ifelse(length(exceedances) > 0, mean(exceedances - u), NA_real_)
)
})
}
threshold_grid <- quantile(losses, probs = seq(0.70, 0.98, by = 0.01))
mrl_data <- mean_excess(losses, threshold_grid)
head(mrl_data)
ggplot(mrl_data, aes(x = Threshold, y = Mean_Excess)) +
geom_line() +
geom_point() +
labs(
title = "Mean Residual Life Plot",
x = "Threshold",
y = "Mean excess"
)
A threshold may be considered reasonable where the plot becomes approximately linear. In practice, this decision should not be based on one point alone. The analyst should look for a range of thresholds where the pattern is stable.
The Hill Plot is another important graphical method for studying heavy-tailed data. It is based on the Hill estimator of the tail index. The Hill estimator is mainly used for heavy-tailed distributions, especially those in the Fréchet domain of attraction.
Let the sample observations be ordered from largest to smallest:
\[ X_{1,n}\geq X_{2,n}\geq \cdots \geq X_{n,n}. \]
Using the largest \(k\) observations, the Hill estimator is commonly written as
\[ \widehat{\gamma}_k = \frac{1}{k}\sum_{i=1}^{k} \left(\log X_{i,n}-\log X_{k+1,n}\right), \]
where \(k\) is the number of upper order statistics used. The value \(X_{k+1,n}\) acts like a threshold. The Hill estimator is often interpreted as an estimator of the tail index for heavy-tailed data.
The Hill Plot is constructed by plotting the Hill estimate against \(k\), the number of upper order statistics. The idea is to search for a region where the estimate is relatively stable. A stable region suggests a reasonable range of thresholds or exceedance counts.
If \(k\) is too small, very few observations are used and the estimator may be highly variable. If \(k\) is too large, non-tail observations may be included and the estimator may be biased. This is another form of the same threshold selection problem.
hill_estimates <- function(x, max_k = NULL) {
x_positive <- sort(x[x > 0], decreasing = TRUE)
n_pos <- length(x_positive)
if (is.null(max_k)) {
max_k <- floor(n_pos / 3)
}
k_values <- 5:max_k
map_dfr(k_values, function(k) {
hill_value <- mean(log(x_positive[1:k]) - log(x_positive[k + 1]))
tibble(
k = k,
Hill_Estimate = hill_value,
Threshold = x_positive[k + 1]
)
})
}
hill_data <- hill_estimates(losses, max_k = 300)
head(hill_data)
ggplot(hill_data, aes(x = k, y = Hill_Estimate)) +
geom_line() +
labs(
title = "Hill Plot",
x = "Number of upper order statistics, k",
y = "Hill estimate"
)
The Hill Plot should be interpreted cautiously. A completely flat region is rarely observed in real data. The analyst looks for a reasonably stable portion of the curve, not mathematical perfection. The Hill Plot is useful, but it should be compared with other threshold selection tools.
The Square Error Method is a more numerical approach to threshold selection. The purpose is to choose a threshold that minimizes the mean square error of the tail index estimator. The method is motivated by the idea that threshold selection should balance bias and variance.
A very low threshold may produce a biased estimator because it includes non-extreme observations. A very high threshold may produce an estimator with high variance because too few exceedances remain. The Square Error Method attempts to find a threshold that gives a better balance between these two problems.
In principle, the optimal threshold is the one that minimizes an estimate of the mean square error. Mean square error can be viewed as
\[ MSE(\hat{\theta})=Bias(\hat{\theta})^2+Var(\hat{\theta}), \]
where \(\hat{\theta}\) is an estimator of a tail parameter. Although estimating the exact MSE can be difficult in practice, the method is useful because it encourages analysts to think objectively about threshold choice rather than relying only on visual judgement.
The Square Error Method is therefore useful as a complement to graphical methods. It does not remove the need for judgement, but it provides a more systematic basis for comparing thresholds.
The following simple R illustration compares thresholds by fitting a GPD at each threshold and recording changes in the estimated shape parameter. This is not a full implementation of the theoretical Square Error Method, but it gives a practical numerical diagnostic for threshold stability.
candidate_thresholds <- as.numeric(quantile(losses, probs = seq(0.80, 0.98, by = 0.02)))
fit_gpd_safe <- function(x, u) {
fit <- tryCatch(
evir::gpd(x, threshold = u),
error = function(e) NULL
)
if (is.null(fit)) {
return(tibble(
Threshold = u,
Exceedances = sum(x > u),
Beta = NA_real_,
Xi = NA_real_
))
}
tibble(
Threshold = u,
Exceedances = sum(x > u),
Beta = as.numeric(fit$par.ests["beta"]),
Xi = as.numeric(fit$par.ests["xi"])
)
}
threshold_stability <- map_dfr(candidate_thresholds, ~ fit_gpd_safe(losses, .x))
threshold_stability
ggplot(threshold_stability, aes(x = Threshold, y = Xi)) +
geom_line() +
geom_point() +
labs(
title = "Shape Parameter Stability Across Thresholds",
x = "Threshold",
y = "Estimated shape parameter"
)
A reasonable threshold range is suggested when the estimated shape parameter becomes relatively stable while still retaining enough exceedances. If the estimates vary wildly, the threshold choice may be unreliable.
After selecting a threshold, the next task is to estimate the parameters of the Generalized Pareto Distribution. The most common approach is Maximum Likelihood Estimation.
Maximum Likelihood Estimation, abbreviated as MLE, is a general method for estimating the parameters of a statistical model. The principle is simple: choose the parameter values that make the observed data most likely under the assumed model.
Suppose the excess observations are
\[ Y_1,Y_2,\ldots,Y_m, \]
where \(m=N_u\) is the number of exceedances above threshold \(u\). If the excesses are treated as independent observations from a GPD with parameters \(\xi\) and \(\beta\), then the likelihood is the joint density of the observed excesses as a function of the parameters.
For independent and identically distributed observations, the likelihood is
\[ L(\xi,\beta)=\prod_{i=1}^{m} g_{\xi,\beta}(Y_i), \]
where \(g_{\xi,\beta}\) is the GPD density. The log-likelihood is
\[ \ell(\xi,\beta)=\sum_{i=1}^{m}\log g_{\xi,\beta}(Y_i). \]
The maximum likelihood estimates are the values \(\hat{\xi}\) and \(\hat{\beta}\) that maximize this log-likelihood. In practice, numerical optimization is usually used because the equations may not have simple closed-form solutions.
The likelihood approach depends on two key requirements. First, the distributional form must be specified. In this case, the assumed distribution is the GPD. Second, the likelihood function must be tractable enough to evaluate for admissible parameter values.
chosen_threshold <- as.numeric(quantile(losses, 0.95))
gpd_fit <- evir::gpd(losses, threshold = chosen_threshold)
gpd_fit
## $n
## [1] 2500
##
## $data
## [1] 0.02115044 0.02542669 0.03704204 0.02099064 0.02260003 0.02896119
## [7] 0.03647515 0.02200445 0.03450129 0.02161077 0.02546448 0.02555203
## [13] 0.02137535 0.04564327 0.02282741 0.02367057 0.02427238 0.03061952
## [19] 0.02095952 0.02803815 0.02590543 0.02328326 0.03194959 0.04006507
## [25] 0.02159431 0.02589774 0.02808120 0.02396135 0.02634717 0.02698198
## [31] 0.04177238 0.03062239 0.06676763 0.02870527 0.03047829 0.02145501
## [37] 0.02883841 0.02776283 0.03157348 0.04079771 0.02543815 0.03955570
## [43] 0.04579705 0.02759358 0.04741408 0.03746513 0.02597083 0.02843993
## [49] 0.04817968 0.02227096 0.02936915 0.04273414 0.03151796 0.03706597
## [55] 0.03107302 0.02120672 0.02200827 0.02619922 0.02268391 0.03550055
## [61] 0.04000169 0.02420011 0.02114728 0.02112431 0.03463345 0.02180116
## [67] 0.02937626 0.02466265 0.03077847 0.02182595 0.03155997 0.02861594
## [73] 0.02181366 0.02287301 0.02420867 0.03811384 0.04518738 0.02287953
## [79] 0.02320065 0.03218452 0.02803604 0.03957394 0.02550968 0.02783878
## [85] 0.03182176 0.02820740 0.02391506 0.03069947 0.03047560 0.02415990
## [91] 0.05119441 0.02878724 0.02735805 0.02398549 0.02107715 0.02355790
## [97] 0.04010289 0.02495669 0.02319530 0.02458218 0.02375462 0.02546648
## [103] 0.02999231 0.02745287 0.02344161 0.02599453 0.03259374 0.03153653
## [109] 0.03996491 0.04146278 0.04802274 0.02103627 0.04484540 0.02375889
## [115] 0.02809160 0.03024708 0.02779610 0.02244865 0.02107879 0.02936198
## [121] 0.02567790 0.02577683 0.02389564 0.02391462 0.02092191
##
## $threshold
## [1] 0.02086317
##
## $p.less.thresh
## [1] 0.95
##
## $n.exceed
## [1] 125
##
## $method
## [1] "ml"
##
## $par.ests
## xi beta
## -0.066368645 0.009037808
##
## $par.ses
## xi beta
## 0.088550531 0.001091323
##
## $varcov
## [,1] [,2]
## [1,] 7.841196e-03 -7.260395e-05
## [2,] -7.260395e-05 1.190985e-06
##
## $information
## [1] "observed"
##
## $converged
## [1] 0
##
## $nllh.final
## [1] -471.5698
##
## attr(,"class")
## [1] "gpd"
The fitted object gives estimates of the GPD scale and shape parameters. The shape parameter is especially important because it determines tail heaviness and affects the existence of moments and Expected Shortfall.
This application combines the main tools in the chapter. We simulate heavy-tailed losses, examine the data, use graphical threshold diagnostics, fit GPD models at different thresholds, and compare risk estimates.
set.seed(2026)
n <- 3000
returns <- rt(n, df = 4) / 100
losses <- -returns
threshold_probs <- c(0.90, 0.925, 0.95, 0.975)
thresholds <- as.numeric(quantile(losses, threshold_probs))
threshold_comparison <- map_dfr(seq_along(thresholds), function(i) {
u <- thresholds[i]
fit <- evir::gpd(losses, threshold = u)
beta_hat <- as.numeric(fit$par.ests["beta"])
xi_hat <- as.numeric(fit$par.ests["xi"])
Nu <- sum(losses > u)
pot_var <- function(q) {
if (abs(xi_hat) < 1e-8) {
u + beta_hat * log(Nu / (n * (1 - q)))
} else {
u + (beta_hat / xi_hat) * ((Nu / (n * (1 - q)))^xi_hat - 1)
}
}
var_99 <- pot_var(0.99)
es_99 <- if (xi_hat < 1) {
var_99 + (beta_hat + xi_hat * (var_99 - u)) / (1 - xi_hat)
} else {
Inf
}
tibble(
Threshold_Probability = threshold_probs[i],
Threshold = u,
Exceedances = Nu,
Beta = beta_hat,
Xi = xi_hat,
VaR_99 = var_99,
ES_99 = es_99
)
})
threshold_comparison
ggplot(threshold_comparison, aes(x = Threshold_Probability, y = Xi)) +
geom_line() +
geom_point() +
labs(
title = "Estimated GPD Shape Parameter Across Threshold Choices",
x = "Threshold probability",
y = "Estimated shape parameter"
)
The comparison table shows how the number of exceedances, GPD parameter estimates, VaR, and Expected Shortfall change as the threshold increases. A good threshold choice should not be based only on the highest possible threshold. It should balance tail purity against estimation stability.
Students often treat threshold selection as a mechanical rule. In practice, it requires judgement informed by diagnostic plots, parameter stability, sample size, and the purpose of the analysis.
Another common mistake is to choose a threshold that is too low because it gives many exceedances. More data is useful only if the data are relevant to the tail model. Including central observations can bias the GPD fit.
A third mistake is to choose a threshold that is too high because it seems more extreme. A very high threshold may leave too few observations, causing unstable parameter estimates.
Students also sometimes misread the Mean Residual Life Plot. The aim is not to find a perfectly straight plot from the beginning. The aim is to find a region where the plot becomes approximately linear.
The Hill Plot is also frequently overinterpreted. Real data rarely produce a perfectly stable Hill estimate. The analyst looks for a reasonably stable range, while comparing with other diagnostics.
Finally, students sometimes forget that threshold selection affects VaR and Expected Shortfall estimates. A poor threshold can produce misleading tail-risk estimates even if the GPD fitting procedure itself is correctly coded.
A risk analyst wants to estimate extreme losses using the Peaks Over Threshold method. The analyst knows that threshold selection is a major practical problem and is considering the Mean Residual Life Plot, Hill Plot, and Square Error Method.
Required: