1 Learning Objectives

By the end of this chapter, the student should be able to:

  1. explain the general purpose of Extreme Value Theory;
  2. describe the type of financial and actuarial data for which EVT is useful;
  3. explain exceedances over a high threshold and the logic of point processes of exceedances;
  4. define sample maxima and explain why their asymptotic behaviour depends on the right tail of the distribution;
  5. derive the exact distribution function of the sample maximum;
  6. explain the Fisher-Tippett theorem as the limiting result for normalized maxima;
  7. distinguish between the Fréchet, Gumbel, and Weibull extreme value distributions;
  8. explain the meaning of maximum domain of attraction;
  9. distinguish between the Block Maxima and Peaks Over Threshold approaches;
  10. use R to illustrate exceedances, block maxima, and the three limiting tail behaviours.

2 General Theory

Extreme Value Theory is a branch of probability and statistics concerned with the behaviour of very large or very small observations. In financial risk measurement, the focus is usually on large losses. These large losses may arise from daily negative returns on a financial asset or portfolio, operational losses, catastrophic insurance claims, credit losses, or other high-impact financial events.

Let

\[ X_1, X_2, \ldots, X_n \]

be a sequence of identically distributed random variables with unknown distribution function

\[ F(x)=P(X_i\leq x). \]

In this chapter we work mainly with distribution functions rather than densities. This is important because Extreme Value Theory is concerned with probabilities in the tail, and the distribution function gives a direct way of describing such probabilities.

The variables \(X_i\) may represent losses, negative returns, claims, or other risk quantities. If we use the convention that losses are positive, then the largest observations in the sample represent the most severe losses. EVT asks how such largest observations behave, especially when the sample size becomes large.

The central idea is that the extreme part of a distribution may have a structure that can be modelled, even when the full underlying distribution is unknown. This is very useful in finance because the full distribution of returns or losses is rarely known exactly. Instead of trying to model the entire distribution, EVT concentrates on the tail.

3 Exceedances Over a High Threshold

A natural way to study extremes is to choose a high threshold and observe which data points exceed it. Suppose the threshold is denoted by \(u_n\). An observation \(X_i\) is called an exceedance if

\[ X_i>u_n. \]

The number of exceedances in a sample of size \(n\) is

\[ \#\{i:X_i>u_n,\;i=1,\ldots,n\} = \sum_{i=1}^{n} I(X_i>u_n), \]

where

\[ I(X_i>u_n)= \begin{cases} 1, & X_i>u_n,\\ 0, & X_i\leq u_n. \end{cases} \]

If the data are independent and identically distributed, then each observation has the same probability of exceeding the threshold. This probability is

\[ P(X_i>u_n). \]

Therefore, the number of exceedances follows a binomial distribution with parameters \(n\) and \(P(X_i>u_n)\).

For extremes, the threshold should not remain fixed as the sample size increases. With a fixed threshold \(u\), the expected number of exceedances \(nP(X_i>u)\) grows in proportion to \(n\), and if the threshold is too low, many of the selected observations are ordinary values from the centre of the distribution rather than extremes. EVT therefore considers the case where \(n\to\infty\) and the threshold \(u_n\) also increases in a suitable way.

A key condition is that for some \(\tau>0\),

\[ nP(X_i>u_n)\to \tau, \qquad n\to\infty. \]

This condition says that as the sample size grows and the threshold rises, the expected number of exceedances approaches a finite positive value. The threshold is rising, so exceedances become rarer, but the sample size is also growing, so a meaningful number of exceedances remains.

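Under this condition, and for iid data, the number of exceedances converges in distribution to a Poisson distribution with parameter \(\tau\). This is the classical Poisson approximation to the binomial: the count is Binomial\((n,p_n)\) with \(p_n=P(X_i>u_n)\to 0\) and \(np_n\to\tau\). This result helps explain why exceedances over high thresholds are often modelled using point process ideas.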

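set.seed(2426)

# Simulate n heavy-tailed observations from a Student's t distribution with 4 df
n <- 1000
x <- rt(n, df = 4)

# Use the empirical 95th percentile as the threshold
threshold <- quantile(x, 0.95)

# Indicator I(X_i > threshold) for each observation, then the exceedance count
indicators <- ifelse(x > threshold, 1, 0)
number_exceedances <- sum(indicators)

list(
  threshold = threshold,
  number_exceedances = number_exceedances,
  proportion_exceeding = mean(indicators)
)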
## $threshold
##      95% 
## 2.092996 
## 
## $number_exceedances
## [1] 50
## 
## $proportion_exceeding
## [1] 0.05

The R output shows the threshold, the number of observations exceeding it, and the proportion of observations above it. Because the threshold is the empirical 95th percentile of the same sample, the proportion exceeding it is 5% by construction: here exactly 50 of the 1,000 observations.
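The Poisson limit for the number of exceedances can also be checked numerically. The following is a minimal sketch, assuming iid Student's t observations with 4 degrees of freedom and an illustrative value \(\tau=5\) that is not taken from the text. For each sample size, the threshold \(u_n\) is chosen so that \(nP(X_i>u_n)=\tau\) holds exactly, and the Binomial\((n,\tau/n)\) probabilities are compared with the Poisson\((\tau)\) probabilities.

```r
# Assumed for illustration: tau = 5 and a Student's t distribution with 4 df.
tau <- 5
sample_sizes <- c(50, 500, 5000)
k <- 0:20  # exceedance counts at which the two pmfs are compared

poisson_check <- data.frame(
  n = sample_sizes,
  # Threshold u_n solving n * P(X > u_n) = tau, i.e. the (1 - tau/n) quantile.
  u_n = qt(1 - tau / sample_sizes, df = 4),
  # Largest gap between the Binomial(n, tau/n) and Poisson(tau) pmfs over k.
  max_pmf_gap = sapply(sample_sizes, function(n) {
    max(abs(dbinom(k, size = n, prob = tau / n) - dpois(k, lambda = tau)))
  })
)

poisson_check
```

The threshold column should rise with \(n\) while the gap column shrinks, which is the behaviour described by the condition \(nP(X_i>u_n)\to\tau\).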

4 Time-Normalized Point Process of Exceedances

When exceedances are observed in a sample, one may index the exceedance times. If the observations are \(X_1,\ldots,X_n\), the original observation indices are \(1,2,\ldots,n\). However, as \(n\) increases, the interval \([0,n]\) becomes larger and larger. A more convenient representation is obtained by rescaling time to the interval \([0,1]\).

An observation \(X_i\) exceeding \(u_n\) is then represented by its normalized time point

\[ \frac{i}{n}. \]

For an interval \((a,b]\subset[0,1]\), define

\[ N_n((a,b]) = \#\left\{\frac{i}{n}\in(a,b]:X_i>u_n,\;i=1,2,\ldots,n\right\}. \]

This counts the number of exceedances whose normalized times fall in \((a,b]\). The resulting object is called a time-normalized point process of exceedances.

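The important idea is that as the threshold rises and the sample size increases, exceedances become sparse. For iid data satisfying \(nP(X_i>u_n)\to\tau\), the point process of exceedances converges to a homogeneous Poisson process on \((0,1]\) with intensity \(\tau\), so the number of exceedances in \((a,b]\) is approximately Poisson with mean \(\tau(b-a)\). This supports the use of threshold exceedance models in EVT.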

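library(tidyverse)  # provides tibble(), ggplot(), map_dfr(), and %>% used in this chapter

# Reuses n, x, and threshold from the previous simulation
exceedance_data <- tibble(
  index = 1:n,
  time_normalized = index / n,   # rescale observation indices to (0, 1]
  value = x,
  exceedance = x > threshold     # TRUE for observations above the threshold
)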

ggplot(exceedance_data, aes(x = time_normalized, y = value)) +
  geom_point(alpha = 0.5) +
  geom_hline(yintercept = threshold, linetype = "dashed") +
  labs(
    title = "Threshold Exceedances on Normalized Time Scale",
    x = "Normalized time i/n",
    y = "Observation"
  )

The dashed line represents the threshold. The points above the line are exceedances. In later chapters, these exceedances will become the foundation for the Peaks Over Threshold method.
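The counting measure \(N_n((a,b])\) defined above can be computed directly. The sketch below uses base R only and assumes a fresh simulated sample; the seed, the helper name count_exceedances, and the four quarter-length intervals are illustrative choices rather than part of the chapter.

```r
set.seed(1)  # arbitrary seed, chosen only for reproducibility

n <- 1000
x <- rt(n, df = 4)                 # assumed iid Student's t observations
u_n <- quantile(x, 0.95)           # empirical 95th percentile as threshold
time_normalized <- seq_len(n) / n  # normalized times i/n in (0, 1]

# N_n((a, b]): number of exceedances whose normalized time lies in (a, b].
count_exceedances <- function(a, b) {
  sum(time_normalized > a & time_normalized <= b & x > u_n)
}

# Counts over four disjoint intervals that partition (0, 1].
breaks <- c(0, 0.25, 0.5, 0.75, 1)
counts <- mapply(count_exceedances, head(breaks, -1), tail(breaks, -1))

counts
sum(counts) == sum(x > u_n)  # the interval counts add up to the total count
```

Because the intervals are disjoint and cover \((0,1]\), the four counts must sum to the total number of exceedances.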

5 Review of Extreme Value Theory

Extreme Value Theory studies the distribution of the extreme realizations of a random variable or stochastic process under suitable assumptions. The foundational results are associated with Fisher and Tippett and, later, Gnedenko, who established that, after suitable centring and rescaling, the distribution of sample extremes can converge to one of only three possible limiting families.

This is one reason EVT is powerful. The underlying distribution \(F\) can take many different forms, yet for normalized maxima only three types of non-degenerate limiting distribution can arise. This is analogous to the role played by the Central Limit Theorem for sample averages.

The Central Limit Theorem tells us that, under suitable conditions, normalized sums or averages converge to the normal distribution. EVT gives a parallel result for maxima. Instead of asking about the average of a sample, EVT asks about the largest observation in the sample.

The three possible limiting families are:

  1. Gumbel, associated with distributions whose tails decay faster than any power law (for example, exponentially) and whose moments are all finite, such as the normal, lognormal, and gamma distributions;
  2. Fréchet, associated with heavy-tailed distributions whose tails decay by a power law, such as the Pareto, Cauchy, Student’s t, and stable Paretian distributions;
  3. Weibull, associated with distributions that have a finite upper endpoint.

This classification is important in risk management because financial losses often appear heavy-tailed. A heavy-tailed loss distribution can produce extreme losses more frequently than the normal distribution suggests.
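The gap between a heavy-tailed and a normal model can be made concrete with exact tail probabilities. The following sketch is an illustration under stated assumptions: the heavy-tailed model is a Student's t distribution with 4 degrees of freedom, rescaled to unit variance so that both models have the same standard deviation, and \(k\) measures the loss in standard deviations.

```r
# Assumption for illustration: a Student's t with 4 df, rescaled to unit variance.
df <- 4
scale_t <- sqrt(df / (df - 2))  # standard deviation of a t distribution with df > 2
k <- 2:6                        # loss sizes measured in standard deviations

tail_comparison <- data.frame(
  k = k,
  normal_tail = pnorm(k, lower.tail = FALSE),             # P(Z > k)
  t_tail = pt(k * scale_t, df = df, lower.tail = FALSE)   # P(T / scale_t > k)
)
tail_comparison$ratio <- tail_comparison$t_tail / tail_comparison$normal_tail

tail_comparison
```

The ratio column shows how many times more likely the rescaled t model makes a loss beyond \(k\) standard deviations, and it grows quickly as \(k\) increases.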

EVT is particularly useful in finance because VaR and Expected Shortfall are tail-based quantities. They depend mainly on high quantiles and losses beyond high quantiles. The centre of the distribution is less important for these measures than the behaviour of the tail.

6 Distribution of Maxima

Let

\[ X_1,X_2,\ldots \]

be a sequence of independent and identically distributed non-degenerate random variables with common distribution function \(F\). Define the sample maximum by

\[ M_n=\max(X_1,X_2,\ldots,X_n), \qquad n\geq 2. \]

The random variable \(M_n\) records the largest observation in the sample. If the observations are losses, \(M_n\) is the largest loss observed in the sample.

The distribution function of \(M_n\) can be derived exactly. The event \(M_n\leq x\) means that the largest observation is less than or equal to \(x\). This can happen only if every observation in the sample is less than or equal to \(x\). Therefore,

\[ \begin{aligned} P(M_n\leq x) &=P(X_1\leq x,X_2\leq x,\ldots,X_n\leq x). \end{aligned} \]

If the observations are independent, this joint probability becomes the product of the individual probabilities:

\[ P(M_n\leq x)=P(X_1\leq x)P(X_2\leq x)\cdots P(X_n\leq x). \]

Since the observations are identically distributed,

\[ P(X_i\leq x)=F(x) \]

for each \(i\). Hence,

\[ P(M_n\leq x)=F(x)^n. \]

This formula is simple but very important. It shows that the distribution of the maximum depends on the underlying distribution \(F\), especially on the right tail of \(F\). Extremes occur near the upper end of the support of the distribution.

If the right endpoint of \(F\) is denoted by

\[ x_F=\sup\{x\in\mathbb{R}:F(x)<1\}, \]

then for values \(x<x_F\), one has \(F(x)<1\), and therefore

\[ F(x)^n\to 0, \qquad n\to\infty. \]

If \(x_F<\infty\) and \(x\geq x_F\), then \(F(x)=1\), so

\[ F(x)^n=1. \]

This means that, without rescaling, \(M_n\) converges to the upper endpoint \(x_F\): when \(x_F<\infty\), the limiting distribution is degenerate and places all its mass at \(x_F\). If the endpoint is infinite, the maximum drifts upward without settling into a useful non-degenerate distribution. This is why EVT studies centred and normalized maxima.

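set.seed(2426)

sample_sizes <- c(10, 50, 100, 500)
B <- 5000  # number of simulated samples per sample size

# For each sample size, simulate B standard normal samples and keep each maximum
maxima_data <- map_dfr(sample_sizes, function(size) {
  maxima <- replicate(B, max(rnorm(size)))
  tibble(
    sample_size = factor(size),
    maximum = maxima
  )
})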

ggplot(maxima_data, aes(x = maximum)) +
  geom_histogram(bins = 60) +
  facet_wrap(~ sample_size, scales = "free_y") +
  labs(
    title = "Distribution of Sample Maxima from Normal Samples",
    x = "Sample maximum",
    y = "Frequency"
  )

The simulation shows that as the sample size increases, the sample maximum tends to move to the right. This is expected because a larger sample gives more opportunities to observe a large value.
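The exact formula \(P(M_n\leq x)=F(x)^n\) can be compared with simulated maxima. The sketch below is a minimal check in base R, assuming standard normal observations; the seed, the sample size \(n=50\), and the evaluation points are illustrative choices.

```r
set.seed(1)  # arbitrary seed, chosen only for reproducibility

n <- 50                      # sample size behind each maximum
B <- 5000                    # number of simulated maxima
x_grid <- c(1.5, 2, 2.5, 3)  # points at which the two cdfs are compared

maxima <- replicate(B, max(rnorm(n)))

cdf_check <- data.frame(
  x = x_grid,
  exact = pnorm(x_grid)^n,                                     # F(x)^n
  simulated = sapply(x_grid, function(x) mean(maxima <= x))   # empirical cdf of M_n
)

cdf_check
```

The two columns should agree up to simulation error, which is of order \(1/\sqrt{B}\).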

7 The Fisher-Tippett Theorem

The Fisher-Tippett theorem, also called the extremal types theorem, is a fundamental result in EVT. It gives the possible limiting distributions for centred and normalized maxima.

Suppose there exist constants \(c_n>0\) and \(d_n\in\mathbb{R}\), and a non-degenerate distribution function \(H\), such that

\[ \frac{M_n-d_n}{c_n}\xrightarrow{d}H. \]

Then \(H\) must belong to one of only three possible types of extreme value distributions: Fréchet, Weibull, or Gumbel.

The Fréchet distribution is given by

\[ \Phi_\alpha(x)= \begin{cases} 0, & x\leq 0,\\ \exp\{-x^{-\alpha}\}, & x>0, \end{cases} \qquad \alpha>0. \]

The Weibull distribution is given by

\[ \Psi_\alpha(x)= \begin{cases} \exp\{-(-x)^\alpha\}, & x\leq 0,\\ 1, & x>0, \end{cases} \qquad \alpha>0. \]

The Gumbel distribution is given by

\[ \Lambda(x)=\exp\{-e^{-x}\}, \qquad x\in\mathbb{R}. \]

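The parameter \(\alpha\) is called the tail index. In the Fréchet case it measures the heaviness of the tail: the smaller \(\alpha\) is, the heavier the tail, and for a distribution whose normalized maxima have the limit \(\Phi_\alpha\), moments of order greater than \(\alpha\) are infinite. In the Weibull case, \(\alpha\) describes how the distribution behaves near its finite upper endpoint.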

The theorem is powerful because it does not require the full distribution \(F\) to be known. It says that if a non-degenerate limiting distribution for normalized maxima exists, then it must be one of these three types.
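The three distribution functions above are easy to code, and the convergence in the theorem can be checked exactly in a simple case. The sketch below is an illustration rather than part of the chapter's derivation: it assumes the standard exponential distribution, for which the textbook normalizing constants are \(c_n=1\) and \(d_n=\log n\), and compares the exact distribution function of \((M_n-d_n)/c_n\), namely \(F(c_nx+d_n)^n\), with the Gumbel limit. The function names are hypothetical helpers.

```r
# The three standard extreme value distribution functions, as defined above.
frechet_cdf <- function(x, alpha) ifelse(x > 0, exp(-pmax(x, 0)^(-alpha)), 0)
weibull_cdf <- function(x, alpha) ifelse(x <= 0, exp(-(-pmin(x, 0))^alpha), 1)
gumbel_cdf  <- function(x) exp(-exp(-x))

# Assumed example: Exp(1) observations with c_n = 1 and d_n = log(n).
x <- seq(-3, 6, by = 0.1)
n_values <- c(10, 100, 1000)

# Largest distance between the exact cdf of M_n - log(n) and the Gumbel cdf.
gap_to_gumbel <- sapply(n_values, function(n) {
  max(abs(pexp(x + log(n))^n - gumbel_cdf(x)))
})

data.frame(n = n_values, max_gap = gap_to_gumbel)
```

No simulation is needed here because \(F(x)^n\) is available in closed form; the maximum gap shrinks as \(n\) grows.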

8 Maximum Domain of Attraction

If normalized maxima from a distribution \(F\) converge to an extreme value distribution \(H\), then \(F\) is said to belong to the maximum domain of attraction of \(H\). This is written as

\[ F\in MDA(H). \]

The maximum domain of attraction of a distribution \(H\) is the set of all distributions whose normalized maxima converge to \(H\).

The three major cases are as follows.

The Gumbel domain of attraction contains distributions with relatively thin tails and often infinite upper endpoints. Examples include the normal, lognormal, exponential, and gamma distributions. These distributions may allow very large observations, but the tail decays quickly.

The Fréchet domain of attraction contains heavy-tailed distributions. Examples include Pareto, Cauchy, Student’s t, and stable Paretian distributions. These distributions are important in finance because they assign much higher probability to extreme losses than thin-tailed distributions such as the normal.

The Weibull domain of attraction contains distributions with finite upper endpoints. In this case, the distribution has a maximum possible value. An example is a beta distribution on a bounded interval.
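The Fréchet and Weibull cases can be illustrated with exact calculations. The sketch below rests on assumptions not stated in the chapter: a Pareto distribution \(F(x)=1-x^{-\alpha}\) for \(x\geq 1\) with \(\alpha=2\), normalized by \(c_n=n^{1/\alpha}\) and \(d_n=0\), and a Uniform\((0,1)\) distribution, normalized by \(c_n=1/n\) and \(d_n=1\). These are standard textbook choices of normalizing constants. In each case the exact distribution function of the normalized maximum, \(F(c_nx+d_n)^n\), is compared with its limit.

```r
n <- 1000  # assumed sample size for the illustration

# Frechet case: Pareto with F(x) = 1 - x^(-alpha) for x >= 1, c_n = n^(1/alpha), d_n = 0.
alpha <- 2
pareto_cdf <- function(x) ifelse(x >= 1, 1 - pmax(x, 1)^(-alpha), 0)
x_pos <- seq(0.1, 10, by = 0.1)
frechet_gap <- max(abs(pareto_cdf(n^(1 / alpha) * x_pos)^n - exp(-x_pos^(-alpha))))

# Weibull case: Uniform(0, 1) with c_n = 1 / n and d_n = 1; the limit is exp(x) for x <= 0.
x_neg <- seq(-10, 0, by = 0.1)
weibull_gap <- max(abs(punif(1 + x_neg / n)^n - exp(x_neg)))

c(frechet_gap = frechet_gap, weibull_gap = weibull_gap)
```

Both gaps should be small for \(n=1000\), consistent with the Pareto distribution lying in the Fréchet domain and the uniform distribution lying in the Weibull domain with \(\alpha=1\).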

9 Block Maxima and Peaks Over Threshold

EVT has two major practical approaches.

The first approach is the Block Maxima method. The data are divided into non-overlapping blocks, and the maximum observation from each block is selected. For example, if daily losses are available, one may divide them into months or years and select the largest loss in each month or year. These block maxima are then modelled using an extreme value distribution.

The second approach is the Peaks Over Threshold method. A high threshold is selected, and all observations exceeding that threshold are retained. If the threshold is \(u\), the exceedances are observations satisfying \(X_i>u\). The excesses are the amounts by which those observations exceed the threshold:

\[ Y_i=X_i-u, \qquad X_i>u. \]

The POT approach is often preferred in practical applications because it uses data more efficiently. The block maxima method may discard many large observations if they are not the largest within their blocks. POT keeps all observations above a sufficiently high threshold.

Within the POT class, two styles of analysis are common. The first is semi-parametric and uses estimators such as the Hill estimator. The second is fully parametric and is based on the Generalized Pareto Distribution. These will be studied in later chapters.

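set.seed(2426)

n <- 1500
losses <- -rt(n, df = 4) / 100   # losses are positive when returns are negative
block_size <- 50

# Assign each observation to a non-overlapping block of 50 observations
loss_data <- tibble(
  time = 1:n,
  loss = losses,
  block = ceiling(time / block_size)
)

# Block Maxima: keep the largest loss in each block
block_maxima <- loss_data %>%
  group_by(block) %>%
  summarise(maximum_loss = max(loss), .groups = "drop")

# Peaks Over Threshold: keep every loss above the empirical 95th percentile
threshold <- quantile(loss_data$loss, 0.95)
exceedances <- loss_data %>%
  filter(loss > threshold) %>%
  mutate(excess = loss - threshold)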

list(
  number_of_blocks = n_distinct(loss_data$block),
  number_of_block_maxima = nrow(block_maxima),
  threshold = as.numeric(threshold),
  number_of_threshold_exceedances = nrow(exceedances)
)
## $number_of_blocks
## [1] 30
## 
## $number_of_block_maxima
## [1] 30
## 
## $threshold
## [1] 0.02096107
## 
## $number_of_threshold_exceedances
## [1] 75

ggplot(loss_data, aes(x = time, y = loss)) +
  geom_line() +
  geom_hline(yintercept = threshold, linetype = "dashed") +
  labs(
    title = "Losses with a High Threshold for POT Analysis",
    x = "Time",
    y = "Loss"
  )

The block maxima approach produces one maximum per block. The POT approach produces all losses above the threshold. In many practical risk problems, the POT method gives a larger set of extreme observations for estimation.

10 Chapter R Application

This application shows, in one place, how to move from raw simulated losses to exceedance counts, block maxima, and the empirical behaviour of maxima.

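set.seed(2026)

n <- 2000
losses <- -rt(n, df = 3) / 100   # simulated heavy-tailed losses

# Peaks Over Threshold: excesses above the empirical 97.5th percentile
u <- quantile(losses, 0.975)
excesses <- losses[losses > u] - u

# Block Maxima: largest loss in each block of 100 observations
block_size <- 100
blocks <- ceiling(seq_along(losses) / block_size)
block_max <- tibble(loss = losses, block = blocks) %>%
  group_by(block) %>%
  summarise(max_loss = max(loss), .groups = "drop")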

summary_table <- tibble(
  Total_Observations = n,
  Threshold = as.numeric(u),
  Number_Exceedances = length(excesses),
  Proportion_Exceeding = length(excesses) / n,
  Number_Blocks = nrow(block_max),
  Mean_Block_Maximum = mean(block_max$max_loss),
  Maximum_Observed_Loss = max(losses)
)

summary_table

tibble(Maximum_Loss = block_max$max_loss) %>%
  ggplot(aes(x = Maximum_Loss)) +
  geom_histogram(bins = 30) +
  labs(
    title = "Empirical Distribution of Block Maxima",
    x = "Block maximum loss",
    y = "Frequency"
  )

tibble(Excess = excesses) %>%
  ggplot(aes(x = Excess)) +
  geom_histogram(bins = 30) +
  labs(
    title = "Excesses Above a High Threshold",
    x = "Excess over threshold",
    y = "Frequency"
  )

The first histogram summarizes block maxima. The second summarizes excesses over a high threshold. These represent the two practical routes through which EVT enters financial risk measurement.

11 Common Mistakes

Students often confuse the maximum observation \(M_n\) with the original sample \(X_1,\ldots,X_n\). The maximum is a new random variable formed from the sample.

Another common mistake is to forget the independence assumption when deriving

\[ P(M_n\leq x)=F(x)^n. \]

This formula uses both independence and identical distribution. Without independence, the joint probability does not generally factor into a product.
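A small simulation shows how far the formula can be off when independence fails. The sketch below is an illustration under assumptions of its own: each observation is standard normal, but all observations in a sample share a common factor, giving a pairwise correlation of \(\rho=0.8\). The seed, \(\rho\), \(n\), and the evaluation point are arbitrary choices.

```r
set.seed(1)  # arbitrary seed, chosen only for reproducibility

n <- 10      # sample size behind each maximum
B <- 5000    # number of simulated samples
rho <- 0.8   # assumed common correlation between observations
x0 <- 1      # point at which P(M_n <= x0) is evaluated

# Each X_i = sqrt(rho) * Z + sqrt(1 - rho) * E_i is standard normal,
# but the shared factor Z makes the X_i dependent.
dependent_maxima <- replicate(B, {
  z <- rnorm(1)
  max(sqrt(rho) * z + sqrt(1 - rho) * rnorm(n))
})

dependence_check <- c(
  iid_formula = pnorm(x0)^n,                          # F(x0)^n, valid under independence
  dependent_simulated = mean(dependent_maxima <= x0)  # estimate under dependence
)

dependence_check
```

The marginal distribution is the same in both cases, yet the simulated probability under dependence is far larger than \(F(x)^n\), because positively dependent observations tend to be large or small together.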

Students also sometimes interpret EVT as a method that predicts the exact worst possible loss. That is not correct. EVT provides probabilistic models for tail behaviour. It helps estimate rare-event probabilities and high quantiles, but it does not eliminate uncertainty.

A further mistake is to think that the block maxima method and POT method use the same observations. They do not. Block maxima selects one maximum from each block. POT selects all observations above a high threshold.
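The difference between the two selections can be counted directly. The following base R sketch assumes simulated losses of the kind used in this chapter, blocks of 50 observations, and the empirical 95th percentile as threshold; the seed is arbitrary. It records which observations each method selects and how many are selected by both.

```r
set.seed(1)  # arbitrary seed, chosen only for reproducibility

n <- 1500
block_size <- 50
losses <- -rt(n, df = 4) / 100            # simulated losses, as in the chapter's examples
block <- ceiling(seq_len(n) / block_size)

# Block maxima: position of the largest loss in each block.
block_max_index <- as.integer(
  tapply(seq_len(n), block, function(i) i[which.max(losses[i])])
)

# POT: positions of all losses above the empirical 95th percentile.
u <- quantile(losses, 0.95)
exceedance_index <- which(losses > u)

selection_summary <- c(
  block_maxima = length(block_max_index),
  exceedances = length(exceedance_index),
  in_both = length(intersect(block_max_index, exceedance_index)),
  block_maxima_below_u = sum(losses[block_max_index] <= u)
)

selection_summary
```

Any block maximum that falls below the threshold is used by the Block Maxima method but not by POT, while exceedances that are not the largest in their block are used only by POT.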

Finally, students often mix up the three domains of attraction. A helpful mnemonic is: Gumbel is associated with relatively thin tails, Fréchet with heavy tails, and Weibull with finite upper endpoints.

12 Exercises

12.1 Conceptual Questions

  1. What is the main purpose of Extreme Value Theory?
  2. Give five examples of data to which EVT may be applied in finance, insurance, or risk management.
  3. Explain why EVT works mainly with the tail of the distribution rather than the centre.
  4. Define an exceedance over a threshold.
  5. Explain why the threshold \(u_n\) is allowed to increase as \(n\to\infty\).
  6. Explain the meaning of the condition \(nP(X_i>u_n)\to\tau\).
  7. Why does the number of exceedances follow a binomial distribution under iid assumptions?
  8. Explain why the limiting distribution of exceedance counts may be Poisson.
  9. Define the sample maximum \(M_n\).
  10. Derive the distribution function of \(M_n\).
  11. Explain why the distribution of \(M_n\) depends on the right tail of \(F\).
  12. State the Fisher-Tippett theorem in words.
  13. List the three possible limiting extreme value distributions.
  14. Which domain of attraction is associated with heavy-tailed distributions?
  15. Which domain of attraction is associated with finite upper endpoints?
  16. Explain the meaning of maximum domain of attraction.
  17. Distinguish between Block Maxima and Peaks Over Threshold.
  18. Why is POT often preferred in practical financial risk measurement?
  19. What is an excess over a threshold?
  20. Why is EVT particularly useful for VaR and Expected Shortfall estimation?

12.2 Computational Questions

  1. Simulate 1,000 observations from a standard normal distribution. Choose the empirical 95th percentile as a threshold and count the number of exceedances.
  2. Repeat Question 1 using a Student’s t distribution with 3 degrees of freedom. Compare the largest observations with those from the normal distribution.
  3. Simulate 500 samples of size 100 from a normal distribution. Store the maximum from each sample and plot a histogram of the maxima.
  4. Generate 2,000 simulated losses by drawing from a Student’s t distribution with 4 degrees of freedom, dividing by 100, and multiplying by \(-1\). Divide the data into blocks of size 50 and extract the block maxima.
  5. Using the same data in Question 4, select the 95th percentile as a threshold and extract all exceedances and excesses.
  6. Compare the number of block maxima with the number of threshold exceedances. Which method uses more extreme observations?
  7. Simulate observations from a Pareto-type distribution using 1 / runif(1000). Plot a histogram and comment on its tail behaviour.
  8. Simulate observations from a bounded uniform distribution on \([0,1]\). Extract sample maxima for increasing sample sizes and comment on the finite upper endpoint.

12.3 Exam-Style Question

Let \(X_1,X_2,\ldots,X_n\) be independent and identically distributed random variables with common distribution function \(F\), and let

\[ M_n=\max(X_1,X_2,\ldots,X_n). \]

Required:

  1. Explain why \(M_n\) is important in Extreme Value Theory.
  2. Derive the distribution function of \(M_n\).
  3. Explain why the behaviour of \(M_n\) is related to the right tail of \(F\).
  4. State the Fisher-Tippett theorem and name the three possible limiting extreme value distributions.
  5. Explain the meaning of maximum domain of attraction.
  6. Distinguish between the Gumbel, Fréchet, and Weibull domains of attraction.
  7. Explain the difference between the Block Maxima method and the Peaks Over Threshold method.
  8. Give one reason why the POT method is often useful in financial risk measurement.