By the end of this chapter, the student should be able to:
The previous chapters introduced Extreme Value Theory as a framework for studying rare but severe financial losses. Two major EVT approaches were identified: the Block Maxima method and the Peaks Over Threshold method. This chapter focuses on the second approach.
The Peaks Over Threshold method, usually abbreviated as POT, focuses on all observations that exceed a sufficiently high threshold. If the threshold is chosen well, the observations above it represent the extreme tail of the distribution. This is useful because tail-based risk measures such as Value-at-Risk and Expected Shortfall are concerned precisely with such extreme observations.
POT is often preferred in practical risk measurement because it uses extreme data more efficiently than the Block Maxima method. In Block Maxima analysis, only one maximum is taken from each block. If a block contains several large losses, all except the largest are discarded. In POT analysis, all observations above the threshold are retained.
In financial risk measurement, POT is especially important because large losses may be rare. Since tail observations are limited, it is valuable to use as much relevant tail information as possible.
Let
\[ X_1,X_2,\ldots,X_n \]
be independent realizations of a random variable \(X\) with distribution function \(F(x)\). Suppose we choose a high threshold \(u\). An observation \(X\) is an exceedance if
\[ X>u. \]
The excess over the threshold is the amount by which \(X\) exceeds \(u\). It is written as
\[ Y=X-u, \qquad X>u. \]
For example, if the threshold is KSh 10 million and the observed loss is KSh 14 million, then the excess is KSh 4 million.
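The distinction between an exceedance and its excess can be checked numerically. The following is a minimal sketch in R; the loss values and the names toy_losses, toy_exceedances and toy_excesses are hypothetical and chosen only for illustration.

```r
# Hypothetical losses in KSh millions (illustrative values, not real data)
toy_losses <- c(3, 14, 8, 10, 22, 11.5)
u_toy <- 10                                     # threshold in KSh millions

# Strict inequality, as in X > u: a loss exactly equal to 10 is not an exceedance
toy_exceedances <- toy_losses[toy_losses > u_toy]

# Excess = amount by which each exceedance is above the threshold
toy_excesses <- toy_exceedances - u_toy

toy_exceedances   # 14.0 22.0 11.5
toy_excesses      #  4.0 12.0  1.5
```

The loss of KSh 14 million reproduces the excess of KSh 4 million from the example above.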
The excess distribution describes the conditional distribution of these excesses, given that the threshold has been exceeded. More precisely, the distribution function of the excesses over threshold \(u\) is
\[ F_u(y)=P(X-u\leq y\mid X>u), \]
where
\[ 0\leq y<x_F-u, \]
and \(x_F\) is the right endpoint of the distribution \(F\). The right endpoint is defined as
\[ x_F=\sup\{x\in\mathbb{R}:F(x)<1\}. \]
If \(x_F=\infty\), the distribution allows arbitrarily large losses. If \(x_F<\infty\), the distribution has a finite maximum possible value.
The excess distribution is useful because it answers the question: given that the loss has already exceeded a high threshold, what is the probability that the excess over that threshold is at most \(y\)?
The excess distribution can be written in terms of the original distribution function \(F\). Starting from the definition,
\[ F_u(y)=P(X-u\leq y\mid X>u). \]
The event \(X-u\leq y\) is the same as
\[ X\leq u+y. \]
Therefore,
\[ F_u(y)=P(X\leq u+y\mid X>u). \]
Using the definition of conditional probability,
\[ F_u(y)=\frac{P(X\leq u+y,\;X>u)}{P(X>u)}. \]
The event \(X\leq u+y\) and \(X>u\) means
\[ u<X\leq u+y. \]
Thus,
\[ P(X\leq u+y,\;X>u)=P(u<X\leq u+y). \]
This probability is
\[ P(X\leq u+y)-P(X\leq u)=F(u+y)-F(u). \]
Also,
\[ P(X>u)=1-F(u). \]
Hence,
\[ F_u(y)=\frac{F(u+y)-F(u)}{1-F(u)}, \]
for
\[ 0\leq y<x_F-u. \]
This formula is central to the theory of threshold exceedances. It shows that the excess distribution is determined by the behaviour of the original distribution above the threshold \(u\).
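The formula can be verified on a distribution whose excess distribution is known in closed form. The sketch below assumes an exponential loss distribution with rate 0.5 and a threshold of 3 (both arbitrary choices); by the memoryless property, the excesses should again be exponential with the same rate. The helper excess_cdf is a hypothetical name introduced here.

```r
# Excess distribution F_u(y) = (F(u + y) - F(u)) / (1 - F(u)) for any CDF
excess_cdf <- function(y, u, cdf, ...) {
  (cdf(u + y, ...) - cdf(u, ...)) / (1 - cdf(u, ...))
}

# Assumed example: X ~ Exponential(rate = 0.5), threshold u = 3
y_grid <- c(0.5, 1, 2, 5)
from_formula <- excess_cdf(y_grid, u = 3, cdf = pexp, rate = 0.5)

# Memoryless property: excesses over any u are again Exponential(rate = 0.5)
direct <- pexp(y_grid, rate = 0.5)

all.equal(from_formula, direct)
```

The two vectors agree up to floating-point error, which confirms that the formula only uses the behaviour of \(F\) above \(u\).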
The Pickands-Balkema-de Haan theorem is one of the most important results in Peaks Over Threshold analysis. It explains why the Generalized Pareto Distribution is used to model excesses over a high threshold.
The theorem states that, for a large class of underlying distributions, as the threshold \(u\) becomes large, the excess distribution \(F_u(y)\) can be approximated by a Generalized Pareto Distribution.
More formally, suppose that the underlying distribution \(F\) belongs to the domain of attraction of a Generalized Extreme Value distribution. Then there exists a positive scaling function \(\beta(u)\) such that
\[ \sup_{0\leq y<x_F-u} \left|F_u(y)-G_{\xi,\beta(u)}(y)\right| \to 0 \]
as
\[ u\to x_F. \]
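Here, \(G_{\xi,\beta(u)}\) denotes the Generalized Pareto Distribution with shape parameter \(\xi\) and scale parameter \(\beta(u)\). The shape parameter \(\xi\) is the same as the shape parameter of the limiting Generalized Extreme Value distribution, while the scale \(\beta(u)\) is allowed to change with the threshold.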
The meaning of the theorem is that the distribution of excesses above a sufficiently high threshold is approximately GPD, even if the full distribution \(F\) is unknown. This is why the GPD is the natural model for threshold exceedances.
The theorem also shows the connection between the Block Maxima approach and the POT approach. The Block Maxima method leads to the Generalized Extreme Value distribution, while the POT method leads to the Generalized Pareto Distribution. These are not unrelated ideas; they are linked through the same tail behaviour of the underlying distribution.
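As a worked illustration (an assumed example, not part of the theorem's statement), take the Pareto distribution
\[ F(x)=1-x^{-\alpha}, \qquad x\geq1,\quad \alpha>0. \]
For any threshold \(u\geq1\), the excess distribution formula gives
\[ F_u(y)=\frac{F(u+y)-F(u)}{1-F(u)} =\frac{u^{-\alpha}-(u+y)^{-\alpha}}{u^{-\alpha}} =1-\left(1+\frac{y}{u}\right)^{-\alpha}. \]
This has the form \(1-\left(1+\xi y/\beta\right)^{-1/\xi}\) with
\[ \xi=\frac{1}{\alpha}, \qquad \beta(u)=\frac{u}{\alpha}. \]
In this special case the excess distribution is exactly a GPD at every threshold: the shape stays fixed at \(1/\alpha\), while the scale grows linearly with \(u\). For other distributions in the domain of attraction, the GPD form holds only approximately and improves as \(u\) increases.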
The Generalized Pareto Distribution, abbreviated as GPD, is a two-parameter distribution used to model threshold excesses. Its distribution function is
\[ G_{\xi,\beta}(y)= \begin{cases} 1-\left(1+\dfrac{\xi y}{\beta}\right)^{-1/\xi}, & \xi\neq0,\\[1.2em] 1-\exp\left(-\dfrac{y}{\beta}\right), & \xi=0, \end{cases} \]
where
\[ \beta>0 \]
is the scale parameter and \(\xi\) is the shape parameter.
The support of the distribution depends on the value of \(\xi\). If \(\xi\geq0\), then
\[ y\geq0. \]
If \(\xi<0\), then
\[ 0\leq y\leq -\frac{\beta}{\xi}. \]
The shape parameter \(\xi\) is the most important parameter for extreme risk measurement because it determines the tail behaviour.
If \(\xi>0\), the GPD is heavy-tailed and corresponds to a Pareto-type tail. This case is particularly relevant in finance and insurance because it allows very large losses.
If \(\xi=0\), the GPD reduces to the exponential distribution. This corresponds to a light-tailed case.
If \(\xi<0\), the distribution has a finite upper endpoint. This means there is a maximum possible excess.
The scale parameter \(\beta\) controls the spread of the excess distribution. Larger values of \(\beta\) imply more dispersed excess losses.
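The effect of the shape parameter can be seen by comparing tail probabilities for the three cases. The following is a minimal sketch; the function name gpd_cdf and the parameter values are illustrative assumptions, with the scale fixed at 1 so that only the shape differs.

```r
# GPD distribution function G_{xi,beta}(y); gpd_cdf is an illustrative name
gpd_cdf <- function(y, xi, beta) {
  stopifnot(beta > 0)
  if (abs(xi) < 1e-8) {
    p <- 1 - exp(-y / beta)                # exponential case, xi = 0
  } else {
    z <- pmax(1 + xi * y / beta, 0)        # equals 0 beyond the endpoint when xi < 0
    p <- 1 - z^(-1 / xi)
  }
  p[y < 0] <- 0                            # no probability mass below zero
  p
}

y <- c(0.5, 1, 2, 4)
tail_probs <- data.frame(
  y       = y,
  heavy   = 1 - gpd_cdf(y, xi =  0.5, beta = 1),   # Pareto-type tail
  expo    = 1 - gpd_cdf(y, xi =  0,   beta = 1),   # exponential tail
  bounded = 1 - gpd_cdf(y, xi = -0.5, beta = 1)    # upper endpoint at -beta/xi = 2
)
tail_probs
```

At \(y=4\) the bounded case has tail probability zero, the exponential case is already small, and the heavy-tailed case still assigns probability \(1/9\) to larger excesses.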
The GPD is useful because it can represent different tail behaviours through the value of \(\xi\). The case \(\xi>0\) is especially important in risk management because it represents heavy-tailed behaviour.
A heavy-tailed distribution may not possess all moments. For the GPD with \(\xi>0\), the \(k\)-th moment is finite only if
\[ k<\frac{1}{\xi}. \]
Equivalently, the \(k\)-th moment is infinite if
\[ k\geq \frac{1}{\xi}. \]
This has important implications. If
\[ \xi=\frac{1}{2}, \]
then the second moment is infinite. This means the variance is infinite. If
\[ \xi=\frac{1}{4}, \]
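then the fourth moment is infinite. In that case the kurtosis is not defined, and sample kurtosis does not settle to a stable value as more data are collected, so kurtosis-based interpretations of risk become unreliable.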
Financial losses and insurance claims sometimes show very heavy-tailed behaviour. In such cases, the normal distribution is inappropriate because it has finite moments of all orders and thin tails. The GPD can capture behaviour that the normal distribution cannot.
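The moment condition can be traced to the tail of the GPD survival function. For a GPD-distributed excess \(Y\) with \(\xi>0\),
\[ E[Y^k]=\int_0^\infty k\,y^{k-1}\left(1+\frac{\xi y}{\beta}\right)^{-1/\xi}\,dy, \]
and for large \(y\) the integrand behaves like a constant multiple of \(y^{k-1-1/\xi}\). The integral converges only if \(k-1-1/\xi<-1\), that is, only if \(k<1/\xi\). The first two moments give the standard results
\[ E[Y]=\frac{\beta}{1-\xi}\quad(\xi<1), \qquad Var(Y)=\frac{\beta^2}{(1-\xi)^2(1-2\xi)}\quad\left(\xi<\tfrac{1}{2}\right). \]
As an illustration with assumed values \(\beta=1\) and \(\xi=0.4\), the mean is \(1/0.6\approx1.67\) and the variance is \(1/(0.36\times0.2)\approx13.9\). As \(\xi\) approaches \(1/2\), the factor \(1-2\xi\) tends to zero and the variance diverges.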
The main steps of implementing the Peaks Over Threshold method are as follows.
First, check whether the working assumptions are reasonable for the data. Basic POT theory is developed under independent and identically distributed observations. In financial time series, this assumption is often not exactly true because returns may exhibit volatility clustering. In practice, one may work with filtered residuals from a volatility model or use diagnostic checks before fitting a tail model.
Second, an appropriate threshold must be selected. This is one of the most important and difficult steps in POT analysis. If the threshold is too low, observations from the centre of the distribution may be included, causing bias. If the threshold is too high, too few exceedances remain, causing high variance and unstable estimates.
Third, the excesses over the selected threshold are computed:
\[ Y_i=X_i-u, \qquad X_i>u. \]
Fourth, the GPD parameters \(\xi\) and \(\beta\) are estimated using a suitable method, commonly Maximum Likelihood Estimation.
Fifth, diagnostic checks are carried out. These may include probability plots, quantile plots, return level plots, and checks of parameter stability.
Finally, the fitted tail model is used to estimate extreme risk measures such as VaR and Expected Shortfall.
Mathematically, extreme risk measures are defined in terms of the loss distribution \(F\). If \(q\) is a high probability level such as 0.95 or 0.99, then Value-at-Risk is the \(q\)-quantile of the distribution:
\[ VaR_q = F^{-1}(q). \]
Expected Shortfall is the expected loss size given that VaR has been exceeded:
\[ ES_q = E[X \mid X > VaR_q]. \]
In practice, \(F\) is unknown. Therefore, \(VaR_q\) and \(ES_q\) are theoretical quantities that must be estimated from data. The POT-GPD method estimates these quantities by fitting a GPD to the excesses above a high threshold and then extrapolating into the tail.
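For comparison, the two definitions can be estimated directly from a sample without any tail model. The sketch below uses simulated Student-t losses (an assumption made only for illustration) and the hypothetical names sim_losses, var_emp and es_emp.

```r
set.seed(1)                                    # arbitrary seed for reproducibility
sim_losses <- -rt(5000, df = 4) / 100          # assumed: simulated heavy-tailed losses

q <- 0.99
var_emp <- as.numeric(quantile(sim_losses, q))        # empirical q-quantile
es_emp  <- mean(sim_losses[sim_losses > var_emp])     # average loss beyond VaR
n_beyond <- sum(sim_losses > var_emp)                 # observations behind the ES estimate

c(VaR = var_emp, ES = es_emp, n_beyond = n_beyond)
```

The empirical Expected Shortfall rests on only about one percent of the sample, and neither estimate can go beyond the largest observed loss. This is the limitation that the GPD-based extrapolation is meant to address.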
Suppose \(X\) is a loss random variable and \(u\) is a high threshold. The excess distribution is
\[ F_u(y)=P(X-u \leq y \mid X>u). \]
For a sufficiently high threshold, the Pickands-Balkema-de Haan theorem motivates the approximation
\[ F_u(y) \approx G_{\xi,\beta}(y), \]
where \(G_{\xi,\beta}\) is the Generalized Pareto Distribution:
\[ G_{\xi,\beta}(y)= \begin{cases} 1-\left(1+\dfrac{\xi y}{\beta}\right)^{-1/\xi}, & \xi \neq 0,\\[1.2em] 1-\exp\left(-\dfrac{y}{\beta}\right), & \xi = 0. \end{cases} \]
Using the relationship between the excess distribution and the original distribution,
\[ F_u(y)=\frac{F(u+y)-F(u)}{1-F(u)}, \]
we can solve for \(F(u+y)\):
\[ F(u+y)=F(u)+F_u(y)[1-F(u)]. \]
Since \(F_u(y)\approx G_{\xi,\beta}(y)\), we obtain
\[ F(u+y)\approx F(u)+G_{\xi,\beta}(y)[1-F(u)]. \]
For \(\xi \neq 0\), substituting the GPD distribution function gives
\[ F(u+y)\approx F(u)+\left[1-\left(1+\frac{\xi y}{\beta}\right)^{-1/\xi}\right][1-F(u)]. \]
Expanding,
\[ F(u+y)\approx F(u)+[1-F(u)]-[1-F(u)]\left(1+\frac{\xi y}{\beta}\right)^{-1/\xi}. \]
Since
\[ F(u)+[1-F(u)]=1, \]
we get
\[ F(u+y)\approx 1-[1-F(u)]\left(1+\frac{\xi y}{\beta}\right)^{-1/\xi}. \]
Therefore, the survival probability above \(u+y\) is
\[ 1-F(u+y)\approx [1-F(u)]\left(1+\frac{\xi y}{\beta}\right)^{-1/\xi}. \]
This is the key tail approximation used to derive the GPD-based VaR and Expected Shortfall formulas.
In a sample of size \(n\), suppose \(N_u\) observations exceed the threshold \(u\). Then the exceedance probability \(1-F(u)\) is estimated by
\[ \widehat{1-F(u)}=\frac{N_u}{n}. \]
For \(x>u\), set \(y=x-u\). The estimated tail probability is therefore
\[ \widehat{P(X>x)} = \frac{N_u}{n} \left(1+\frac{\hat{\xi}(x-u)}{\hat{\beta}}\right)^{-1/\hat{\xi}}. \]
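The tail estimator translates directly into a short function. This is a sketch under assumed inputs: the name pot_tail_prob and the numerical values are hypothetical, and the exponential branch for a shape of zero follows from the \(\xi=0\) case of the GPD.

```r
# Estimated P(X > x) for x >= u under the POT tail model
pot_tail_prob <- function(x, u, xi, beta, n, n_u) {
  stopifnot(all(x >= u), beta > 0)
  excess <- x - u
  if (abs(xi) < 1e-8) {
    gpd_surv <- exp(-excess / beta)                        # xi = 0 case
  } else {
    gpd_surv <- pmax(1 + xi * excess / beta, 0)^(-1 / xi)  # xi != 0 case
  }
  (n_u / n) * gpd_surv     # scale by the empirical exceedance rate N_u / n
}

# Illustrative inputs (assumed values, not estimates from real data)
x_grid <- c(0.02, 0.03, 0.05)
tail_est <- pot_tail_prob(x_grid, u = 0.02, xi = 0.1, beta = 0.01,
                          n = 3000, n_u = 150)
tail_est
```

At \(x=u\) the function returns the empirical exceedance rate \(N_u/n\), and the estimated probability decreases as \(x\) moves further into the tail.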
To derive GPD-based VaR, set \(x=VaR_q\). By definition,
\[ P(X>VaR_q)=1-q. \]
Using the GPD tail approximation,
\[ 1-q = \frac{N_u}{n} \left(1+\frac{\hat{\xi}(VaR_q-u)}{\hat{\beta}}\right)^{-1/\hat{\xi}}. \]
Divide both sides by \(N_u/n\):
\[ \frac{n(1-q)}{N_u} = \left(1+\frac{\hat{\xi}(VaR_q-u)}{\hat{\beta}}\right)^{-1/\hat{\xi}}. \]
Raise both sides to the power \(-\hat{\xi}\):
\[ \left(\frac{n(1-q)}{N_u}\right)^{-\hat{\xi}} = 1+\frac{\hat{\xi}(VaR_q-u)}{\hat{\beta}}. \]
Since
\[ \left(\frac{n(1-q)}{N_u}\right)^{-\hat{\xi}} = \left(\frac{N_u}{n(1-q)}\right)^{\hat{\xi}}, \]
we obtain
\[ 1+\frac{\hat{\xi}(VaR_q-u)}{\hat{\beta}} = \left(\frac{N_u}{n(1-q)}\right)^{\hat{\xi}}. \]
Subtracting 1 from both sides gives
\[ \frac{\hat{\xi}(VaR_q-u)}{\hat{\beta}} = \left(\frac{N_u}{n(1-q)}\right)^{\hat{\xi}}-1. \]
Multiplying by \(\hat{\beta}/\hat{\xi}\),
\[ VaR_q-u = \frac{\hat{\beta}}{\hat{\xi}} \left[ \left(\frac{N_u}{n(1-q)}\right)^{\hat{\xi}}-1 \right]. \]
Therefore, for \(q>1-N_u/n\), so that the quantile lies above the threshold \(u\), the GPD-based estimator of VaR is
\[ \widehat{VaR}_q = u+ \frac{\hat{\beta}}{\hat{\xi}} \left[ \left(\frac{N_u}{n(1-q)}\right)^{\hat{\xi}}-1 \right], \qquad \hat{\xi}\neq 0. \]
For the special case \(\hat{\xi}=0\), the GPD becomes exponential and the VaR formula becomes
\[ \widehat{VaR}_q = u+ \hat{\beta} \log\left(\frac{N_u}{n(1-q)}\right). \]
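Both VaR formulas can be collected in one function. The following is a minimal sketch; the name pot_var and the input values are assumptions for illustration, not estimates from data.

```r
# GPD-based VaR estimator; intended for levels q above 1 - n_u / n
pot_var <- function(q, u, xi, beta, n, n_u) {
  stopifnot(all(q > 0), all(q < 1), beta > 0)
  ratio <- n_u / (n * (1 - q))
  if (abs(xi) < 1e-8) {
    u + beta * log(ratio)                  # exponential case, xi = 0
  } else {
    u + beta / xi * (ratio^xi - 1)         # general case, xi != 0
  }
}

# Illustrative inputs (assumed values)
var_99  <- pot_var(0.99,  u = 0.02, xi = 0.1, beta = 0.01, n = 3000, n_u = 150)
var_999 <- pot_var(0.999, u = 0.02, xi = 0.1, beta = 0.01, n = 3000, n_u = 150)
c(VaR_99 = var_99, VaR_999 = var_999)

# Consistency check: the tail estimator evaluated at VaR_q should return 1 - q
(150 / 3000) * (1 + 0.1 * (var_99 - 0.02) / 0.01)^(-1 / 0.1)
```

With these inputs the ratio \(N_u/(n(1-q))\) equals 5 at the 99% level, and substituting the resulting VaR back into the tail estimator recovers the tail probability 0.01.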
Expected Shortfall under the GPD tail model is derived using the fact that if the excess distribution above \(u\) is GPD with parameters \(\xi\) and \(\beta\), then the excess distribution above a higher threshold \(x>u\) is also GPD with the same shape parameter and updated scale parameter
\[ \beta_x=\beta+\xi(x-u). \]
For \(\xi<1\), the mean excess over \(x\) is
\[ E[X-x \mid X>x] = \frac{\beta+\xi(x-u)}{1-\xi}. \]
Therefore,
\[ E[X\mid X>x] = x+E[X-x\mid X>x]. \]
Substituting the mean excess expression gives
\[ E[X\mid X>x] = x+ \frac{\beta+\xi(x-u)}{1-\xi}. \]
Setting \(x=VaR_q\), we obtain
\[ ES_q = VaR_q+ \frac{\beta+\xi(VaR_q-u)}{1-\xi}. \]
Using estimated parameters, the GPD-based Expected Shortfall estimator is
\[ \widehat{ES}_q = \widehat{VaR}_q+ \frac{ \hat{\beta}+\hat{\xi}(\widehat{VaR}_q-u) } {1-\hat{\xi}}, \qquad \hat{\xi}<1. \]
This can also be rearranged into the equivalent form
\[ \widehat{ES}_q = \frac{ \widehat{VaR}_q+\hat{\beta}-\hat{\xi}u } {1-\hat{\xi}}, \qquad \hat{\xi}<1. \]
The condition \(\hat{\xi}<1\) is important. If \(\xi\geq1\), the mean of the GPD tail is infinite, and Expected Shortfall is not finite. This is one reason why the shape parameter is central in extreme risk measurement.
The interpretation is simple. VaR estimates a high loss threshold, while Expected Shortfall estimates the average loss beyond that threshold.
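The two forms of the Expected Shortfall estimator can be checked against each other numerically. In this sketch the function names pot_es and pot_es_alt and all input values are hypothetical.

```r
# GPD-based Expected Shortfall: VaR plus the mean excess over VaR
pot_es <- function(var_q, u, xi, beta) {
  stopifnot(xi < 1)                        # the mean excess is finite only for xi < 1
  var_q + (beta + xi * (var_q - u)) / (1 - xi)
}

# Rearranged form of the same estimator
pot_es_alt <- function(var_q, u, xi, beta) {
  stopifnot(xi < 1)
  (var_q + beta - xi * u) / (1 - xi)
}

# Illustrative inputs (assumed values)
es_a <- pot_es(0.0375, u = 0.02, xi = 0.1, beta = 0.01)
es_b <- pot_es_alt(0.0375, u = 0.02, xi = 0.1, beta = 0.01)
c(form_1 = es_a, form_2 = es_b)
```

Both forms give \(0.0455/0.9\approx0.0506\) for these inputs, which exceeds the assumed VaR of 0.0375, as an average of losses beyond VaR must.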
There are two broad categories of approaches that use EVT results to estimate market risk. The first is the Block Maxima Model, which uses the extremal types theorem to model the distribution of the largest or smallest observations collected from non-overlapping blocks of fixed size. A Generalized Extreme Value distribution is then fitted to the block extrema.
For example, suppose the data consist of daily returns of a portfolio and we are interested in the lower tail of the return distribution. Under the Block Maxima approach, one may divide the data into monthly blocks and select the minimum return from each block. These minima correspond to large losses. A GEV distribution may then be fitted to these block extremes.
The second approach is the Peaks Over Threshold method, which models all observations exceeding a high threshold. When losses are used, this means modelling all losses above \(u\). When returns are used and the lower tail is of interest, this may involve modelling negative returns below a low threshold after converting them into positive losses.
Both approaches are useful, but POT is often preferred because it uses more tail information. However, POT depends heavily on threshold selection, which will be studied in detail in the next chapter.
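The difference in how the two approaches select extreme observations can be sketched on simulated data. The set-up below is an assumption for illustration: 2520 daily returns from a Student-t distribution, blocks of 21 trading days standing in for months, and a threshold at the 95% empirical quantile of the losses.

```r
set.seed(7)                                    # arbitrary seed
n_days <- 2520                                 # assumed: 120 blocks of 21 trading days
sim_returns <- rt(n_days, df = 4) / 100
sim_loss <- -sim_returns                       # lower tail of returns = upper tail of losses

# Block Maxima: one extreme per block (minimum return = maximum loss)
block_id <- rep(seq_len(n_days / 21), each = 21)
block_max_loss <- tapply(sim_loss, block_id, max)

# POT: every loss above the threshold
u_sim <- as.numeric(quantile(sim_loss, 0.95))
pot_exc <- sim_loss[sim_loss > u_sim]

# Exceedances that are not a block maximum would be discarded by Block Maxima
n_discarded <- sum(!(pot_exc %in% block_max_loss))

c(blocks = length(block_max_loss),
  exceedances = length(pot_exc),
  discarded_by_block_maxima = n_discarded)
```

Any positive count in the last entry corresponds to large losses that share a block with an even larger one, which is the information POT retains and Block Maxima drops.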
The following application illustrates the POT workflow using simulated heavy-tailed losses. We generate losses, select a high threshold, compute excesses, and fit a GPD model.
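library(tibble)   # tibble()
library(dplyr)    # %>% pipe
library(ggplot2)  # ggplot(), geom_histogram()
library(evir)     # gpd()
set.seed(2426)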
n <- 3000
returns <- rt(n, df = 4) / 100
losses <- -returns
threshold <- as.numeric(quantile(losses, 0.95))
exceedances <- losses[losses > threshold]
excesses <- exceedances - threshold
pot_summary <- tibble(
Total_Observations = n,
Threshold = threshold,
Number_Exceedances = length(exceedances),
Exceedance_Rate = length(exceedances) / n,
Mean_Excess = mean(excesses),
Maximum_Excess = max(excesses)
)
pot_summary
tibble(Excess = excesses) %>%
ggplot(aes(x = Excess)) +
geom_histogram(bins = 40) +
labs(
title = "Excesses Over a High Threshold",
x = "Excess over threshold",
y = "Frequency"
)
gpd_fit <- gpd(losses, threshold = threshold)
gpd_fit
## $n
## [1] 3000
##
## $data
## [1] 0.02115044 0.02542669 0.03704204 0.02260003 0.02896119 0.03647515
## [7] 0.02200445 0.03450129 0.02161077 0.02546448 0.02555203 0.02137535
## [13] 0.04564327 0.02282741 0.02367057 0.02427238 0.03061952 0.02803815
## [19] 0.02590543 0.02328326 0.03194959 0.04006507 0.02159431 0.02589774
## [25] 0.02808120 0.02396135 0.02634717 0.02698198 0.04177238 0.03062239
## [31] 0.06676763 0.02870527 0.03047829 0.02145501 0.02883841 0.02776283
## [37] 0.03157348 0.04079771 0.02543815 0.03955570 0.04579705 0.02759358
## [43] 0.04741408 0.03746513 0.02597083 0.02843993 0.04817968 0.02227096
## [49] 0.02936915 0.04273414 0.03151796 0.03706597 0.03107302 0.02120672
## [55] 0.02200827 0.02619922 0.02268391 0.03550055 0.04000169 0.02420011
## [61] 0.02114728 0.02112431 0.03463345 0.02180116 0.02937626 0.02466265
## [67] 0.03077847 0.02182595 0.03155997 0.02861594 0.02181366 0.02287301
## [73] 0.02420867 0.03811384 0.04518738 0.02287953 0.02320065 0.03218452
## [79] 0.02803604 0.03957394 0.02550968 0.02783878 0.03182176 0.02820740
## [85] 0.02391506 0.03069947 0.03047560 0.02415990 0.05119441 0.02878724
## [91] 0.02735805 0.02398549 0.02107715 0.02355790 0.04010289 0.02495669
## [97] 0.02319530 0.02458218 0.02375462 0.02546648 0.02999231 0.02745287
## [103] 0.02344161 0.02599453 0.03259374 0.03153653 0.03996491 0.04146278
## [109] 0.04802274 0.04484540 0.02375889 0.02809160 0.03024708 0.02779610
## [115] 0.02244865 0.02107879 0.02936198 0.02567790 0.02577683 0.02389564
## [121] 0.02391462 0.02400737 0.05112878 0.04279581 0.03370550 0.02374807
## [127] 0.02582502 0.02343280 0.02600867 0.02215507 0.03748672 0.03208982
## [133] 0.11053057 0.03898657 0.02249191 0.03462917 0.02157815 0.06302766
## [139] 0.02177466 0.02826290 0.03023844 0.03266240 0.02553206 0.02245835
## [145] 0.05230422 0.03138011 0.02932979 0.02859193 0.03643016 0.02689226
##
## $threshold
## [1] 0.02105156
##
## $p.less.thresh
## [1] 0.95
##
## $n.exceed
## [1] 150
##
## $method
## [1] "ml"
##
## $par.ests
## xi beta
## 0.076115358 0.008765281
##
## $par.ses
## xi beta
## 0.0764881180 0.0009437829
##
## $varcov
## [,1] [,2]
## [1,] 5.850432e-03 -4.470833e-05
## [2,] -4.470833e-05 8.907261e-07
##
## $information
## [1] "observed"
##
## $converged
## [1] 0
##
## $nllh.final
## [1] -549.1046
##
## attr(,"class")
## [1] "gpd"
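The fitted GPD output reports the parameter estimates in the par.ests component: the shape estimate is \(\hat{\xi}\approx0.076\) and the scale estimate is \(\hat{\beta}\approx0.0088\), based on 150 exceedances of the threshold 0.0211. The corresponding standard errors appear in par.ses, and the value 0 in the converged component is the optimizer's code for successful convergence. The shape parameter is especially important because it indicates whether the tail is heavy, light, or bounded.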
If the estimated shape parameter is positive, the fitted model suggests heavy-tailed losses. If it is close to zero, the fitted tail resembles an exponential tail. If it is negative, the fitted tail has a finite upper endpoint.
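Whether the estimated shape is distinguishable from zero can be gauged with an approximate 95% Wald interval. The sketch below copies the estimate and its standard error from the output above and assumes the usual large-sample normal approximation for the maximum likelihood estimator, which may be rough with only 150 exceedances. The names xi_est, xi_se and xi_ci are hypothetical.

```r
# Values copied from the par.ests and par.ses components of the fitted output
xi_est <- 0.076115358
xi_se  <- 0.0764881180

# Approximate 95% Wald interval: estimate +/- z * standard error
xi_ci <- xi_est + c(-1, 1) * qnorm(0.975) * xi_se
round(xi_ci, 3)
```

The interval runs from slightly below zero to roughly 0.23, so this fit alone does not rule out an exponential tail. For reference, the Student-t distribution with 4 degrees of freedom used in the simulation has theoretical shape \(\xi=1/4\), which lies just above the interval; this is a reminder that estimates at a moderate threshold can be biased as well as noisy.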
# GPD diagnostic plots
# Extract fitted GPD parameters from the evir::gpd object
beta_hat <- gpd_fit$par.ests["beta"]
xi_hat <- gpd_fit$par.ests["xi"]
# Sort observed excesses
observed_excesses <- sort(excesses)
m <- length(observed_excesses)
plotting_positions <- ppoints(m)
# Fitted GPD CDF
pgpd_manual <- function(y, beta, xi) {
if (abs(xi) < 1e-8) {
1 - exp(-y / beta)
} else {
1 - (1 + xi * y / beta)^(-1 / xi)
}
}
# Fitted GPD quantile function
qgpd_manual <- function(p, beta, xi) {
if (abs(xi) < 1e-8) {
-beta * log(1 - p)
} else {
beta / xi * ((1 - p)^(-xi) - 1)
}
}
fitted_probabilities <- pgpd_manual(observed_excesses, beta_hat, xi_hat)
theoretical_quantiles <- qgpd_manual(plotting_positions, beta_hat, xi_hat)
par(mfrow = c(1, 2))
plot(
plotting_positions,
fitted_probabilities,
xlab = "Empirical probability",
ylab = "Fitted GPD probability",
main = "GPD Probability Plot"
)
abline(0, 1, lty = 2)
plot(
theoretical_quantiles,
observed_excesses,
xlab = "Theoretical GPD quantiles",
ylab = "Observed excesses",
main = "GPD QQ Plot"
)
abline(0, 1, lty = 2)
par(mfrow = c(1, 1))
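The diagnostic plots should be interpreted carefully. In both panels, points lying close to the dashed 45-degree line indicate that the fitted GPD is broadly consistent with the observed excesses. Some scatter among the largest excesses is expected, because the most extreme order statistics are the most variable. Systematic curvature or severe deviations may suggest that the threshold is inappropriate or that the model assumptions are not suitable.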
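The final step of the workflow, estimating risk measures from the fitted tail, can be sketched by plugging the reported fit into the GPD-based VaR and Expected Shortfall formulas derived earlier. The values below are copied from the fitted output; the 99% level and the variable names are choices made here for illustration.

```r
# Values copied from the fitted output reported above
n_obs    <- 3000
n_exc    <- 150
u_fit    <- 0.02105156
xi_fit   <- 0.076115358
beta_fit <- 0.008765281

q_level <- 0.99
ratio   <- n_exc / (n_obs * (1 - q_level))

# GPD-based VaR and Expected Shortfall (xi != 0 and xi < 1)
var_fit <- u_fit + beta_fit / xi_fit * (ratio^xi_fit - 1)
es_fit  <- var_fit + (beta_fit + xi_fit * (var_fit - u_fit)) / (1 - xi_fit)

round(c(VaR_99 = var_fit, ES_99 = es_fit), 4)
```

For this simulated sample the 99% VaR is roughly a 3.6% loss and the 99% Expected Shortfall is roughly 4.7%. These figures inherit the uncertainty in the shape and scale estimates, so they are best read as point estimates rather than precise values.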
Students often confuse exceedances and excesses. An exceedance is an observation above the threshold, while an excess is the amount by which that observation exceeds the threshold.
Another common mistake is to choose the threshold mechanically without considering the bias-variance trade-off. A low threshold gives more data but may include non-tail observations. A high threshold focuses more purely on the tail but leaves fewer observations for estimation.
Students also sometimes forget that the GPD is an approximation to the excess distribution for high thresholds. It is not usually assumed to model the whole distribution.
A further mistake is to misinterpret the shape parameter. The shape parameter \(\xi\) controls tail heaviness. Positive \(\xi\) indicates a heavy-tailed case; zero corresponds to the exponential case; negative \(\xi\) implies a finite upper endpoint.
Finally, students sometimes try to estimate VaR and Expected Shortfall as if the full distribution \(F\) were known. In practice, \(F\) is unknown, and the purpose of the tail model is to estimate the relevant high quantiles and tail expectations.
Let \(X\) be a loss random variable with distribution function \(F\), and let \(u\) be a high threshold. The excess distribution over \(u\) is defined by
\[ F_u(y)=P(X-u\leq y\mid X>u). \]
Required: