1 Learning Objectives

By the end of this chapter, the student should be able to:

  1. explain why exploratory data analysis is necessary before fitting extreme value models;
  2. explain the role of histograms, box plots, empirical distribution functions, probability plots, and quantile plots in tail analysis;
  3. explain why threshold selection is difficult in Peaks Over Threshold modelling;
  4. describe the bias-variance trade-off involved in choosing a threshold;
  5. explain the Mean Residual Life Plot and how it is used to choose a threshold;
  6. explain the Hill Plot and how it is used to study tail stability;
  7. explain the Square Error Method as a more objective threshold selection method;
  8. describe the basic idea of Maximum Likelihood Estimation for GPD parameters;
  9. use R to construct threshold diagnostic plots;
  10. use R to compare threshold choices and fit GPD models.

2 Exploratory Data Analysis for Extremes

Before fitting a statistical model for extremes, it is important to look at the data carefully. Extreme value models are sensitive to tail behaviour, and tail behaviour can be affected by outliers, data errors, dependence, volatility clustering, skewness, and changes in the data-generating process. For this reason, exploratory data analysis is not a cosmetic step. It is part of model building.

In ordinary statistical analysis, exploratory tools help the analyst understand the centre, spread, and shape of the data. In extreme value analysis, they also help the analyst understand the tail. Since the Peaks Over Threshold method depends on selecting a high threshold, the analyst must examine how the data behave at high levels before fitting the Generalized Pareto Distribution.

Useful exploratory tools include histograms, box plots, time series plots, empirical distribution function plots, probability plots, quantile plots, and scatter plots in multivariate cases. These tools help reveal whether the distribution is symmetric or skewed, whether it appears heavy-tailed, whether outliers are present, and whether observations show time dependence.

For financial return series, exploratory analysis should normally begin by plotting returns and losses over time. This helps reveal volatility clustering. A histogram can then show the general shape of the distribution. A box plot can highlight unusually large observations. A QQ plot can compare the empirical distribution with a reference distribution such as the normal distribution. If the data have heavier tails than the reference distribution, the QQ plot will show systematic departures from linearity in the tails.

set.seed(2426)

n <- 2500
returns <- rt(n, df = 4) / 100
losses <- -returns

loss_data <- tibble(
  Time = 1:n,
  Loss = losses
)

summary(loss_data$Loss)
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
## -0.0742262 -0.0077355 -0.0005381 -0.0005440  0.0067286  0.0667676
ggplot(loss_data, aes(x = Time, y = Loss)) +
  geom_line() +
  labs(
    title = "Simulated Loss Series",
    x = "Time",
    y = "Loss"
  )

ggplot(loss_data, aes(x = Loss)) +
  geom_histogram(bins = 70) +
  labs(
    title = "Histogram of Losses",
    x = "Loss",
    y = "Frequency"
  )

ggplot(loss_data, aes(y = Loss)) +
  geom_boxplot() +
  labs(
    title = "Box Plot of Losses",
    y = "Loss"
  )

The time plot gives a first impression of whether the loss series is stable over time. The histogram gives the overall distributional shape. The box plot highlights unusually large losses, which are potential candidates for tail modelling.

3 Probability and Quantile Plots

Probability plots and quantile plots are useful for comparing the empirical data with a reference distribution. A quantile-quantile plot, commonly called a QQ plot, compares sample quantiles with theoretical quantiles from a chosen distribution.

If the data are generated from the reference distribution, the QQ plot should be approximately linear. If the data are a linear transformation of the reference distribution, the plot should still be approximately linear, but with a different slope and intercept. This makes QQ plots useful for assessing distributional fit, location, scale, and shape.

QQ plots are also useful for detecting outliers. If one or a few observations are very different from the rest, they may appear as points far from the general linear pattern. In extreme value analysis, the most important departures usually occur in the tail. If the right tail of the data is heavier than the right tail of the reference distribution, the upper tail points may curve upward relative to the reference line.

For financial losses, a normal QQ plot can show whether the normal distribution underestimates the upper tail. If losses are heavy-tailed, the largest sample quantiles will often be larger than the corresponding normal quantiles.

qqnorm(losses, main = "Normal QQ Plot of Losses")
qqline(losses, lty = 2)

If the points follow a straight line closely, the normal model may be a reasonable approximation. If the upper tail bends away from the line, the losses may be heavier-tailed than the normal distribution. This supports the use of extreme value methods.

4 Threshold Selection in POT Models

Threshold selection is one of the most important practical problems in Peaks Over Threshold modelling. The POT method assumes that exceedances over a sufficiently high threshold can be approximated by the Generalized Pareto Distribution. The phrase “sufficiently high” is important. The GPD approximation is a tail approximation, not a model for the entire distribution.

If the threshold is too low, the analyst includes observations from the centre of the distribution. These observations are not genuine tail observations and can bias the fitted GPD model. A low threshold usually gives more exceedances, which may reduce variance, but the model may be wrong because non-tail data have been included.

If the threshold is too high, only a few exceedances remain. This reduces bias because the selected observations are more genuinely extreme, but it increases variance because the parameter estimates are based on very little data. Estimates may become unstable and unreliable.

Therefore, threshold selection involves a bias-variance trade-off. A lower threshold gives more observations but may introduce bias. A higher threshold gives fewer observations but may increase variance. The practical goal is to choose a threshold high enough for the GPD approximation to be reasonable, but low enough to retain enough exceedances for stable estimation.

There is no universally perfect threshold selection method. Analysts often compare several graphical and numerical tools before making a final judgement. This chapter discusses three methods: the Mean Residual Life Plot, the Hill Plot, and the Square Error Method.

5 Mean Residual Life Plot

The Mean Residual Life Plot, also called the Mean Excess Plot, is one of the most commonly used graphical methods for threshold selection.

For a threshold \(u\), the mean excess function is

\[ e(u)=E[X-u\mid X>u]. \]

It measures the expected amount by which a loss exceeds the threshold, given that the threshold has already been exceeded.

In a sample, the empirical mean excess at threshold \(u\) is estimated by

\[ \hat{e}(u)=\frac{1}{N_u}\sum_{i:X_i>u}(X_i-u), \]

where \(N_u\) is the number of observations exceeding \(u\).

The theoretical reason behind the Mean Residual Life Plot is that if the excesses above a threshold follow a GPD, then the mean excess function is approximately linear in \(u\), provided \(\xi<1\). For a GPD tail, the mean excess over threshold (u”) can be expressed in a linear form over a suitable threshold range. Therefore, a good GPD fit is suggested by a region of the Mean Residual Life Plot that is approximately linear.

In practice, the analyst plots empirical mean excesses over a range of candidate thresholds. The threshold is often chosen as the lowest level above which the plot appears roughly linear, allowing for sampling uncertainty. Choosing the lowest point in a stable linear region helps retain as many exceedances as possible while still focusing on the tail.

mean_excess <- function(x, thresholds) {
  map_dfr(thresholds, function(u) {
    exceedances <- x[x > u]
    tibble(
      Threshold = u,
      Number_Exceedances = length(exceedances),
      Mean_Excess = ifelse(length(exceedances) > 0, mean(exceedances - u), NA_real_)
    )
  })
}

threshold_grid <- quantile(losses, probs = seq(0.70, 0.98, by = 0.01))
mrl_data <- mean_excess(losses, threshold_grid)

head(mrl_data)
ggplot(mrl_data, aes(x = Threshold, y = Mean_Excess)) +
  geom_line() +
  geom_point() +
  labs(
    title = "Mean Residual Life Plot",
    x = "Threshold",
    y = "Mean excess"
  )

A threshold may be considered reasonable where the plot becomes approximately linear. In practice, this decision should not be based on one point alone. The analyst should look for a range of thresholds where the pattern is stable.

6 Hill Plot Approach

The Hill Plot is another important graphical method for studying heavy-tailed data. It is based on the Hill estimator of the tail index. The Hill estimator is mainly used for heavy-tailed distributions, especially those in the Fréchet domain of attraction.

Let the sample observations be ordered from largest to smallest:

\[ X_{1,n}\geq X_{2,n}\geq \cdots \geq X_{n,n}. \]

Using the largest \(k\) observations, the Hill estimator is commonly written as

\[ \widehat{\gamma}_k = \frac{1}{k}\sum_{i=1}^{k} \left(\log X_{i,n}-\log X_{k+1,n}\right), \]

where \(k\) is the number of upper order statistics used. The value \(X_{k+1,n}\) acts like a threshold. The Hill estimator is often interpreted as an estimator of the tail index for heavy-tailed data.

The Hill Plot is constructed by plotting the Hill estimate against \(k\), the number of upper order statistics. The idea is to search for a region where the estimate is relatively stable. A stable region suggests a reasonable range of thresholds or exceedance counts.

If \(k\) is too small, very few observations are used and the estimator may be highly variable. If \(k\) is too large, non-tail observations may be included and the estimator may be biased. This is another form of the same threshold selection problem.

hill_estimates <- function(x, max_k = NULL) {
  x_positive <- sort(x[x > 0], decreasing = TRUE)
  n_pos <- length(x_positive)

  if (is.null(max_k)) {
    max_k <- floor(n_pos / 3)
  }

  k_values <- 5:max_k

  map_dfr(k_values, function(k) {
    hill_value <- mean(log(x_positive[1:k]) - log(x_positive[k + 1]))
    tibble(
      k = k,
      Hill_Estimate = hill_value,
      Threshold = x_positive[k + 1]
    )
  })
}

hill_data <- hill_estimates(losses, max_k = 300)
head(hill_data)
ggplot(hill_data, aes(x = k, y = Hill_Estimate)) +
  geom_line() +
  labs(
    title = "Hill Plot",
    x = "Number of upper order statistics, k",
    y = "Hill estimate"
  )

The Hill Plot should be interpreted cautiously. A completely flat region is rarely observed in real data. The analyst looks for a reasonably stable portion of the curve, not mathematical perfection. The Hill Plot is useful, but it should be compared with other threshold selection tools.

7 Square Error Method

The Square Error Method is a more numerical approach to threshold selection. The purpose is to choose a threshold that minimizes the mean square error of the tail index estimator. The method is motivated by the idea that threshold selection should balance bias and variance.

A very low threshold may produce a biased estimator because it includes non-extreme observations. A very high threshold may produce an estimator with high variance because too few exceedances remain. The Square Error Method attempts to find a threshold that gives a better balance between these two problems.

In principle, the optimal threshold is the one that minimizes an estimate of the mean square error. Mean square error can be viewed as

\[ MSE(\hat{\theta})=Bias(\hat{\theta})^2+Var(\hat{\theta}), \]

where \(\hat{\theta}\) is an estimator of a tail parameter. Although estimating the exact MSE can be difficult in practice, the method is useful because it encourages analysts to think objectively about threshold choice rather than relying only on visual judgement.

The Square Error Method is therefore useful as a complement to graphical methods. It does not remove the need for judgement, but it provides a more systematic basis for comparing thresholds.

The following simple R illustration compares thresholds by fitting a GPD at each threshold and recording changes in the estimated shape parameter. This is not a full implementation of the theoretical Square Error Method, but it gives a practical numerical diagnostic for threshold stability.

candidate_thresholds <- as.numeric(quantile(losses, probs = seq(0.80, 0.98, by = 0.02)))

fit_gpd_safe <- function(x, u) {
  fit <- tryCatch(
    evir::gpd(x, threshold = u),
    error = function(e) NULL
  )

  if (is.null(fit)) {
    return(tibble(
      Threshold = u,
      Exceedances = sum(x > u),
      Beta = NA_real_,
      Xi = NA_real_
    ))
  }

  tibble(
    Threshold = u,
    Exceedances = sum(x > u),
    Beta = as.numeric(fit$par.ests["beta"]),
    Xi = as.numeric(fit$par.ests["xi"])
  )
}

threshold_stability <- map_dfr(candidate_thresholds, ~ fit_gpd_safe(losses, .x))
threshold_stability
ggplot(threshold_stability, aes(x = Threshold, y = Xi)) +
  geom_line() +
  geom_point() +
  labs(
    title = "Shape Parameter Stability Across Thresholds",
    x = "Threshold",
    y = "Estimated shape parameter"
  )

A reasonable threshold range is suggested when the estimated shape parameter becomes relatively stable while still retaining enough exceedances. If the estimates vary wildly, the threshold choice may be unreliable.

8 Maximum Likelihood Estimation for GPD Parameters

After selecting a threshold, the next task is to estimate the parameters of the Generalized Pareto Distribution. The most common approach is Maximum Likelihood Estimation.

Maximum Likelihood Estimation, abbreviated as MLE, is a general method for estimating the parameters of a statistical model. The principle is simple: choose the parameter values that make the observed data most likely under the assumed model.

Suppose the excess observations are

\[ Y_1,Y_2,\ldots,Y_m, \]

where \(m=N_u\) is the number of exceedances above threshold \(u\). If the excesses are treated as independent observations from a GPD with parameters \(\xi\) and \(\beta\), then the likelihood is the joint density of the observed excesses as a function of the parameters.

For independent and identically distributed observations, the likelihood is

\[ L(\xi,\beta)=\prod_{i=1}^{m} g_{\xi,\beta}(Y_i), \]

where \(g_{\xi,\beta}\) is the GPD density. The log-likelihood is

\[ \ell(\xi,\beta)=\sum_{i=1}^{m}\log g_{\xi,\beta}(Y_i). \]

The maximum likelihood estimates are the values \(\hat{\xi}\) and \(\hat{\beta}\) that maximize this log-likelihood. In practice, numerical optimization is usually used because the equations may not have simple closed-form solutions.

The likelihood approach depends on two key requirements. First, the distributional form must be specified. In this case, the assumed distribution is the GPD. Second, the likelihood function must be tractable enough to evaluate for admissible parameter values.

chosen_threshold <- as.numeric(quantile(losses, 0.95))
gpd_fit <- evir::gpd(losses, threshold = chosen_threshold)
gpd_fit
## $n
## [1] 2500
## 
## $data
##   [1] 0.02115044 0.02542669 0.03704204 0.02099064 0.02260003 0.02896119
##   [7] 0.03647515 0.02200445 0.03450129 0.02161077 0.02546448 0.02555203
##  [13] 0.02137535 0.04564327 0.02282741 0.02367057 0.02427238 0.03061952
##  [19] 0.02095952 0.02803815 0.02590543 0.02328326 0.03194959 0.04006507
##  [25] 0.02159431 0.02589774 0.02808120 0.02396135 0.02634717 0.02698198
##  [31] 0.04177238 0.03062239 0.06676763 0.02870527 0.03047829 0.02145501
##  [37] 0.02883841 0.02776283 0.03157348 0.04079771 0.02543815 0.03955570
##  [43] 0.04579705 0.02759358 0.04741408 0.03746513 0.02597083 0.02843993
##  [49] 0.04817968 0.02227096 0.02936915 0.04273414 0.03151796 0.03706597
##  [55] 0.03107302 0.02120672 0.02200827 0.02619922 0.02268391 0.03550055
##  [61] 0.04000169 0.02420011 0.02114728 0.02112431 0.03463345 0.02180116
##  [67] 0.02937626 0.02466265 0.03077847 0.02182595 0.03155997 0.02861594
##  [73] 0.02181366 0.02287301 0.02420867 0.03811384 0.04518738 0.02287953
##  [79] 0.02320065 0.03218452 0.02803604 0.03957394 0.02550968 0.02783878
##  [85] 0.03182176 0.02820740 0.02391506 0.03069947 0.03047560 0.02415990
##  [91] 0.05119441 0.02878724 0.02735805 0.02398549 0.02107715 0.02355790
##  [97] 0.04010289 0.02495669 0.02319530 0.02458218 0.02375462 0.02546648
## [103] 0.02999231 0.02745287 0.02344161 0.02599453 0.03259374 0.03153653
## [109] 0.03996491 0.04146278 0.04802274 0.02103627 0.04484540 0.02375889
## [115] 0.02809160 0.03024708 0.02779610 0.02244865 0.02107879 0.02936198
## [121] 0.02567790 0.02577683 0.02389564 0.02391462 0.02092191
## 
## $threshold
## [1] 0.02086317
## 
## $p.less.thresh
## [1] 0.95
## 
## $n.exceed
## [1] 125
## 
## $method
## [1] "ml"
## 
## $par.ests
##           xi         beta 
## -0.066368645  0.009037808 
## 
## $par.ses
##          xi        beta 
## 0.088550531 0.001091323 
## 
## $varcov
##               [,1]          [,2]
## [1,]  7.841196e-03 -7.260395e-05
## [2,] -7.260395e-05  1.190985e-06
## 
## $information
## [1] "observed"
## 
## $converged
## [1] 0
## 
## $nllh.final
## [1] -471.5698
## 
## attr(,"class")
## [1] "gpd"

The fitted object gives estimates of the GPD scale and shape parameters. The shape parameter is especially important because it determines tail heaviness and affects the existence of moments and Expected Shortfall.

9 Chapter R Application

This application combines the main tools in the chapter. We simulate heavy-tailed losses, examine the data, use graphical threshold diagnostics, fit GPD models at different thresholds, and compare risk estimates.

set.seed(2026)

n <- 3000
returns <- rt(n, df = 4) / 100
losses <- -returns

threshold_probs <- c(0.90, 0.925, 0.95, 0.975)
thresholds <- as.numeric(quantile(losses, threshold_probs))

threshold_comparison <- map_dfr(seq_along(thresholds), function(i) {
  u <- thresholds[i]
  fit <- evir::gpd(losses, threshold = u)
  beta_hat <- as.numeric(fit$par.ests["beta"])
  xi_hat <- as.numeric(fit$par.ests["xi"])
  Nu <- sum(losses > u)

  pot_var <- function(q) {
    if (abs(xi_hat) < 1e-8) {
      u + beta_hat * log(Nu / (n * (1 - q)))
    } else {
      u + (beta_hat / xi_hat) * ((Nu / (n * (1 - q)))^xi_hat - 1)
    }
  }

  var_99 <- pot_var(0.99)
  es_99 <- if (xi_hat < 1) {
    var_99 + (beta_hat + xi_hat * (var_99 - u)) / (1 - xi_hat)
  } else {
    Inf
  }

  tibble(
    Threshold_Probability = threshold_probs[i],
    Threshold = u,
    Exceedances = Nu,
    Beta = beta_hat,
    Xi = xi_hat,
    VaR_99 = var_99,
    ES_99 = es_99
  )
})

threshold_comparison
ggplot(threshold_comparison, aes(x = Threshold_Probability, y = Xi)) +
  geom_line() +
  geom_point() +
  labs(
    title = "Estimated GPD Shape Parameter Across Threshold Choices",
    x = "Threshold probability",
    y = "Estimated shape parameter"
  )

The comparison table shows how the number of exceedances, GPD parameter estimates, VaR, and Expected Shortfall change as the threshold increases. A good threshold choice should not be based only on the highest possible threshold. It should balance tail purity against estimation stability.

10 Common Mistakes

Students often treat threshold selection as a mechanical rule. In practice, it requires judgement informed by diagnostic plots, parameter stability, sample size, and the purpose of the analysis.

Another common mistake is to choose a threshold that is too low because it gives many exceedances. More data is useful only if the data are relevant to the tail model. Including central observations can bias the GPD fit.

A third mistake is to choose a threshold that is too high because it seems more extreme. A very high threshold may leave too few observations, causing unstable parameter estimates.

Students also sometimes misread the Mean Residual Life Plot. The aim is not to find a perfectly straight plot from the beginning. The aim is to find a region where the plot becomes approximately linear.

The Hill Plot is also frequently overinterpreted. Real data rarely produce a perfectly stable Hill estimate. The analyst looks for a reasonably stable range, while comparing with other diagnostics.

Finally, students sometimes forget that threshold selection affects VaR and Expected Shortfall estimates. A poor threshold can produce misleading tail-risk estimates even if the GPD fitting procedure itself is correctly coded.

11 Exercises

11.1 Conceptual Questions

  1. Why is exploratory data analysis important before fitting an extreme value model?
  2. What information can a histogram provide in extreme value analysis?
  3. What information can a box plot provide in extreme value analysis?
  4. What is the purpose of a QQ plot?
  5. How can a QQ plot suggest heavy-tailed behaviour?
  6. Why is threshold selection important in POT modelling?
  7. Explain the bias-variance trade-off in threshold selection.
  8. What happens if the threshold is too low?
  9. What happens if the threshold is too high?
  10. Define the mean excess function.
  11. Explain how the Mean Residual Life Plot is used to choose a threshold.
  12. Why does approximate linearity in the Mean Residual Life Plot support a GPD fit?
  13. Define the Hill estimator.
  14. What is the purpose of the Hill Plot?
  15. Why should the Hill Plot be interpreted cautiously?
  16. What is the main idea of the Square Error Method?
  17. Explain the basic principle of Maximum Likelihood Estimation.
  18. Why is MLE commonly used for GPD parameter estimation?
  19. Why should threshold diagnostics be compared rather than used in isolation?
  20. How does threshold choice affect VaR and Expected Shortfall estimates?

11.2 Computational Questions

  1. Simulate 2,000 losses from a Student’s t distribution with 4 degrees of freedom. Plot the time series, histogram, box plot, and normal QQ plot.
  2. Construct a Mean Residual Life Plot for the simulated losses.
  3. Construct a Hill Plot for the positive losses.
  4. Fit GPD models using thresholds at the 90th, 95th, and 97.5th percentiles.
  5. Compare the number of exceedances at each threshold.
  6. Compare the estimated shape parameter at each threshold.
  7. Estimate the 99% VaR and Expected Shortfall at each threshold.
  8. Based on the diagnostics and estimates, discuss which threshold appears most reasonable.

11.3 Exam-Style Question

A risk analyst wants to estimate extreme losses using the Peaks Over Threshold method. The analyst knows that threshold selection is a major practical problem and is considering the Mean Residual Life Plot, Hill Plot, and Square Error Method.

Required:

  1. Explain why threshold selection is important in POT modelling.
  2. Explain the bias-variance trade-off involved in choosing a threshold.
  3. Describe the Mean Residual Life Plot and explain how it is used to select a threshold.
  4. Describe the Hill Plot and explain how it is used in tail analysis.
  5. Explain the basic idea of the Square Error Method.
  6. Explain why Maximum Likelihood Estimation is used after threshold selection.
  7. Explain how threshold choice can affect estimated VaR and Expected Shortfall.
  8. State two common mistakes analysts should avoid when selecting a threshold.