TPE: Extreme Value Statistics

Presentation of generalized extreme value distributions, in particular heavy-tailed distributions.

Let \(X_1,X_2,\dots,X_n\) be a sequence of independent and identically distributed random variables and \(M_n = \max\lbrace X_1,\dots,X_n\rbrace\). If there exist sequences of real numbers \((a_n,b_n)\), with \(a_n > 0\), such that \(\lim _{n \to \infty} P\left(\frac{M_n - b_n} {a_n} \leq x\right) = F(x)\), where F is a non-degenerate distribution function, then F belongs to the Gumbel, Fréchet or Weibull family. These three families can be grouped into a single class, the generalized extreme value (GEV) distributions.
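As an illustration of how one formula covers the three families, here is a minimal sketch of the standardized GEV distribution function \(H_\gamma(x) = \exp(-(1+\gamma x)^{-1/\gamma})\) for \(1+\gamma x > 0\), with the Gumbel limit \(\exp(-e^{-x})\) at \(\gamma = 0\). The helper name `gev_cdf` is hypothetical and the location/scale are fixed to 0 and 1 (an assumption for brevity).

```r
# Standardized GEV distribution function H_gamma (location 0, scale 1).
gev_cdf <- function(x, gamma) {
  if (abs(gamma) < 1e-12) {
    # Gumbel case (gamma = 0): exp(-exp(-x)) on the whole real line
    return(exp(-exp(-x)))
  }
  z <- 1 + gamma * x
  out <- numeric(length(x))
  inside <- z > 0
  out[inside] <- exp(-z[inside]^(-1 / gamma))
  # Outside the support: 0 below the lower endpoint (Frechet, gamma > 0),
  # 1 above the upper endpoint (Weibull, gamma < 0)
  out[!inside] <- if (gamma > 0) 0 else 1
  out
}

print(gev_cdf(c(-2, 0, 2), gamma = 0.5))   # Frechet type
print(gev_cdf(c(-2, 0, 2), gamma = 0))     # Gumbel
print(gev_cdf(c(-2, 0, 2), gamma = -0.5))  # Weibull type
```

The sign of \(\gamma\) alone decides the family, which is why the domains of attraction below are indexed by \(\gamma\).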

The table below shows the domain of attraction of some common distributions, according to the sign of the extreme value index \(\gamma\).

Domain of attraction	Gumbel \(\gamma = 0\)	Fréchet \(\gamma > 0\)	Weibull \(\gamma < 0\)
Distributions	Normal	Cauchy	Uniform
	Exponential	Pareto	Beta
	Lognormal	Student
	Gamma	Burr
	Weibull

Note in particular that heavy-tailed distributions (i.e. those with \(\gamma > 0\)) lie in the Fréchet domain of attraction.

Hence, after normalization, the maxima of a heavy-tailed distribution have a Fréchet limit distribution.

Let \((X_n)\) be a sequence of random variables with distribution function F; here the \(X_n\) are taken to be heavy-tailed. Recall that the survival function \(\overline{F}=1-F\) is regularly varying (at infinity) with index \(-\alpha\), \(\alpha \geq 0\), if \(\lim _{t \to \infty}\frac{\overline{F}(tx)}{\overline{F}(t)}=x^{-\alpha}\) for all \(x>0\); when \(\alpha=0\), \(\overline{F}\) is said to be slowly varying.

Since F is the distribution function of a heavy-tailed law, there exists \(\gamma > 0\) such that \(\lim _{t \to \infty}\frac{\overline{F}(tx)}{\overline{F}(t)}=x^{-1/\gamma}\) for all \(x>0\). Hence \(\overline{F}\) is regularly varying with index \(-\alpha = -1/\gamma\).
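A quick numerical check of this limit, assuming a Student t law with \(\nu = 3\) degrees of freedom, whose survival function is regularly varying with index \(-\nu\) (so \(\gamma = 1/\nu\)). The helper `survival_ratio` is a hypothetical name used only for this sketch; it relies on base R's `pt` with `lower.tail = FALSE` for an accurate upper tail.

```r
# Ratio Fbar(t x) / Fbar(t) for a Student t law with nu degrees of freedom;
# it should approach x^(-1/gamma) = x^(-nu) as t grows.
survival_ratio <- function(t, x, nu) {
  pt(t * x, df = nu, lower.tail = FALSE) / pt(t, df = nu, lower.tail = FALSE)
}

nu <- 3
x <- 2
t_grid <- c(10, 100, 1000)
ratios <- sapply(t_grid, survival_ratio, x = x, nu = nu)
print(cbind(t = t_grid, ratio = ratios, limit = x^(-nu)))
```

The ratio is already close to \(2^{-3}\) for moderate t, and the gap shrinks as t increases.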

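Regular variation implies that \(\overline{F}(F^{\leftarrow}(1-\frac{1}{t}))\sim t^{-1}\) as \(t \to \infty\), where \(F^{\leftarrow}\) is the generalized inverse of F. In terms of the tail quantile function \(U(t) = F^{\leftarrow}(1-\frac{1}{t})\), used below, this reads \(\overline{F}(U(t)) \sim 1/t\).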
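The tail quantile function can be sketched numerically, again assuming a Student t law (here \(\nu = 4\), so \(\gamma = 1/4\)). The sketch checks \(t\,\overline{F}(U(t)) \approx 1\) and also that U itself behaves like \(t^{\gamma}\), i.e. \(U(2t)/U(t) \to 2^{\gamma}\); `qt(..., lower.tail = FALSE)` is used to avoid computing \(1 - 1/t\) in floating point.

```r
# Tail quantile function U(t) = F^{<-}(1 - 1/t) of a Student t law, nu = 4
nu <- 4
U <- function(t) qt(1 / t, df = nu, lower.tail = FALSE)

t_big <- 1e8
# Fbar(U(t)) * t should be close to 1
fbar_times_t <- pt(U(t_big), df = nu, lower.tail = FALSE) * t_big
# U is regularly varying with index gamma = 1/nu: U(2t)/U(t) -> 2^gamma
u_ratio <- U(2 * t_big) / U(t_big)
print(c(fbar_times_t = fbar_times_t, u_ratio = u_ratio, limit = 2^(1 / nu)))
```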

Methods for estimating the extreme value index

We present two different estimators, both based on the k largest order statistics \(X_{k,n}\leq\dots\leq X_{1,n}\) of the sample (or the k smallest ones, when studying minima).

Pickands estimator

It is defined by the statistic (with \(4k \leq n\)): \[\hat{\gamma} _{k,n} ^P = \frac{1}{\ln 2} \ln \left( \frac{X_{k,n} - X_{2k,n}}{X_{2k,n} - X_{4k,n}} \right)\] Its advantage is that it is valid whatever the domain of attraction (Gumbel, Weibull or Fréchet), i.e. for any \(\gamma \in \mathbb{R}\). Plotted against the number k of order statistics used, it is generally very volatile for small k, which makes the plot hard to read. Moreover, it is very sensitive to the choice of k, which makes it not very robust, so it must be handled with care. It is asymptotically normal:

\[\sqrt{k} \ \frac{\hat{\gamma} _{k,n} ^P - \gamma}{\sigma (\gamma)} \longrightarrow N(0,1)\] as \(k \longrightarrow + \infty\) with \(k/n \to 0\) (under suitable conditions on k), where the asymptotic standard deviation is \[\sigma(\gamma) = \frac{\gamma \sqrt{2^{2 \gamma + 1} + 1}}{2(2^{\gamma}-1) \ln 2}\]
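A minimal sketch of the Pickands estimator and of \(\sigma(\gamma)\), written in base R from the two formulas above. The helper names `pickands` and `pickands_sd` are hypothetical; the value at \(\gamma = 0\) is the limit of the general formula, \(\sqrt{3}/(2(\ln 2)^2)\). The example assumes a Pareto sample with \(\alpha = 2\) (\(\gamma = 1/2\)) drawn by inversion, and the interval is an approximate plug-in 95% interval.

```r
# Pickands estimator on descending order statistics X_{1,n} >= ... >= X_{n,n}
pickands <- function(x, k) {
  xs <- sort(x, decreasing = TRUE)
  stopifnot(k >= 1, 4 * k <= length(xs))
  log((xs[k] - xs[2 * k]) / (xs[2 * k] - xs[4 * k])) / log(2)
}

# Asymptotic standard deviation sigma(gamma); gamma = 0 uses the limit value
pickands_sd <- function(gamma) {
  if (abs(gamma) < 1e-12) return(sqrt(3) / (2 * log(2)^2))
  gamma * sqrt(2^(2 * gamma + 1) + 1) / (2 * (2^gamma - 1) * log(2))
}

# Pareto sample with alpha = 2 (gamma = 1/2): X = U^(-1/alpha), U uniform
set.seed(1)
n <- 1e6
x <- runif(n)^(-1 / 2)
k <- 1e4
g_hat <- pickands(x, k)
half_width <- qnorm(0.975) * pickands_sd(g_hat) / sqrt(k)
print(c(estimate = g_hat, lower = g_hat - half_width, upper = g_hat + half_width))
```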

Hill estimator

The Hill estimator can only be used for Fréchet-type distributions (hence with \(\gamma > 0\)), for which it gives a more efficient estimator of the tail index than the Pickands estimator. It is defined by the statistic: \[\hat{\gamma} _{k,n} ^H = \frac{1}{k-1} \sum _{j=1} ^{k-1} \ln \left( \frac{X_{j,n}}{X_{k,n}} \right)\] If \(k,n \longrightarrow +\infty\) with \(\frac{k}{n} \longrightarrow 0\), one can show that \(\hat{\gamma} _{k,n} ^H\) converges in probability to \(\gamma\), and, under additional conditions controlling the bias, the Hill estimator is asymptotically normal: \[\sqrt{k} \ \frac{\hat{\gamma} _{k,n} ^H - \gamma}{\gamma} \longrightarrow N(0,1)\] the convergence being in distribution.

This estimator is the maximum likelihood estimator in the particular model \(S(x)=1-F(x)=Cx^{-\frac{1}{\gamma}}\), which is a Pareto distribution with index \(\alpha=\frac{1}{\gamma}\). In the general case of the Fréchet domain, the survival function has the form \(S(x)=1-F(x)=x^{-\frac{1}{\gamma}}L(x)\), with L a slowly varying function. This can induce a large bias in the Hill estimator, which is therefore delicate to use in practice: in general, L acts as an infinite-dimensional nuisance parameter that complicates estimation.
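A minimal base-R sketch of the Hill estimator in exactly the form above (sum over \(j = 1, \dots, k-1\), normalized by \(k-1\)). The helper name `hill_estimator` is hypothetical and distinct from `evir::hill` used later. The example assumes an exact Pareto sample with \(\alpha = 2\), the model in which Hill is the maximum likelihood estimator, and uses the asymptotic normality to build an approximate 95% interval.

```r
# Hill estimator: 1/(k-1) * sum_{j=1}^{k-1} log(X_{j,n} / X_{k,n}),
# with descending order statistics; assumes positive data and k >= 2
hill_estimator <- function(x, k) {
  xs <- sort(x, decreasing = TRUE)
  stopifnot(k >= 2, k <= length(xs), xs[k] > 0)
  mean(log(xs[1:(k - 1)] / xs[k]))
}

# Pareto sample: S(x) = x^(-2) for x >= 1, so gamma = 1/2
set.seed(2)
x <- runif(1e5)^(-1 / 2)
k <- 1000
g_hat <- hill_estimator(x, k)
# approximate 95% interval from sqrt(k) (g_hat - gamma) / gamma ~ N(0, 1)
print(c(estimate = g_hat,
        lower = g_hat * (1 - qnorm(0.975) / sqrt(k)),
        upper = g_hat * (1 + qnorm(0.975) / sqrt(k))))
```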

Consistency of the estimators

If the error made by using \({\hat\gamma}\) in place of \(\gamma\) can be made as small as desired by increasing the sample size, the estimator is said to be consistent: it converges to the true value. The precise mathematical definition is the following:

The estimator \(\hat\gamma_n\) is consistent if it converges in probability to \(\gamma\), that is: \[\lim _{n \to \infty}P(|\hat\gamma_n-\gamma|>\epsilon)=0 \quad\forall\epsilon>0\]
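The definition can be illustrated by a Monte Carlo approximation of \(P(|\hat\gamma_n-\gamma|>\epsilon)\) for the Hill estimator on Pareto samples with \(\gamma = 1/2\). The choice \(k = \lfloor\sqrt{n}\rfloor\) (so that \(k \to \infty\) and \(k/n \to 0\)), \(\epsilon = 0.1\) and 500 replications are illustrative assumptions, as are the helper names.

```r
# Hill estimator (same form as in the text)
hill_estimator <- function(x, k) {
  xs <- sort(x, decreasing = TRUE)
  mean(log(xs[1:(k - 1)] / xs[k]))
}

# Monte Carlo estimate of P(|gamma_hat_n - gamma| > eps)
prob_error <- function(n, gamma = 0.5, eps = 0.1, reps = 500) {
  k <- floor(sqrt(n))
  errors <- replicate(reps, {
    x <- runif(n)^(-gamma)      # Pareto sample with Fbar(x) = x^(-1/gamma)
    abs(hill_estimator(x, k) - gamma)
  })
  mean(errors > eps)
}

set.seed(3)
n_grid <- c(100, 1000, 10000)
p_hat <- sapply(n_grid, prob_error)
print(cbind(n = n_grid, prob = p_hat))
```

The estimated probability decreases as n grows, as the definition of consistency requires.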

Hill

If \(k,n \longrightarrow +\infty\) with \(\frac{k}{n} \longrightarrow 0\), one can show that \(\hat{\gamma} _{k,n} ^H\) converges in probability to \(\gamma\).

Pickands

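In what follows we use the following lemma (in this section, order statistics are indexed in increasing order, \(X_{(1,n)}\leq\dots\leq X_{(n,n)}\), so that \(X_{(n-k+1,n)} = X_{k,n}\)):

Lemma. Let \(X_1,X_2,\dots,X_n\) be independent random variables with distribution function F, and let \(U_1,U_2,\dots,U_n\) be independent random variables uniformly distributed on [0;1]. Then \((F^{-1}(U_{(1,n)}),F^{-1}(U_{(2,n)}),\dots,F^{-1}(U_{(n,n)}))\) has the same distribution as \((X_{(1,n)},X_{(2,n)},\dots,X_{(n,n)})\), where \(F^{-1}\) denotes the generalized inverse of F.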
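A Monte Carlo check of the lemma, assuming the exponential law \(F(x) = 1 - e^{-x}\): since \(F^{-1}\) is nondecreasing, the maximum of the \(F^{-1}(U_i)\) is \(F^{-1}(U_{(n,n)})\), and its law should match that of the maximum of a direct exponential sample. Both empirical means are compared with the exact value \(E[X_{(n,n)}] = 1 + \frac{1}{2} + \dots + \frac{1}{n}\). The sample size and number of replications are illustrative.

```r
set.seed(4)
n <- 10
reps <- 20000
# F^{-1}(U_(n,n)) = max of qexp(U_i), because qexp is increasing
max_via_quantile <- replicate(reps, max(qexp(runif(n))))
# X_(n,n) from a direct exponential sample
max_direct <- replicate(reps, max(rexp(n)))
exact <- sum(1 / seq_len(n))
print(c(via_quantile = mean(max_via_quantile),
        direct = mean(max_direct), exact = exact))
```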

We also use the following notation, where \(H(\gamma)\) denotes the GEV distribution function with extreme value index \(\gamma\):

\(F \in D(H(\gamma))\) if and only if \[\lim \limits _{n \to \infty} n \overline{F}(a_n x + b_n) = - \log H(\gamma)(x)\] for every x such that \(H(\gamma)(x) > 0\), for some sequence \((a_n, b_n)_{n \geq 1}\) with \(a_n > 0\) and \(b_n \in \mathbb{R}\). In that case, \((a_n ^{-1} (M_n - b_n))_{n \geq 1}\) converges in distribution to a random variable with distribution function \(H(\gamma)\).
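Two classical cases where this condition holds exactly for every n, sketched numerically: the exponential law with \(a_n = 1\), \(b_n = \log n\) gives \(e^{-x} = -\log\) of the Gumbel distribution function, and the Pareto law \(\overline{F}(y) = 1/y\), \(y \geq 1\), with \(a_n = b_n = n\) gives \((1+x)^{-1} = -\log H(1)(x)\). The choices of n and of the evaluation points are illustrative; `pareto_sf` is a hypothetical helper.

```r
n <- 1000
x <- c(-0.5, 0, 1, 3)

# Exponential law (gamma = 0): a_n = 1, b_n = log(n)
a_n <- 1
b_n <- log(n)
expo <- n * pexp(a_n * x + b_n, lower.tail = FALSE)

# Pareto law Fbar(y) = 1/y for y >= 1 (gamma = 1): a_n = b_n = n
pareto_sf <- function(y) ifelse(y >= 1, 1 / y, 1)
pareto <- n * pareto_sf(n * x + n)

print(rbind(expo = expo, gumbel_limit = exp(-x),
            pareto = pareto, gev1_limit = (1 + x)^(-1)))
```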

Theorem. Let \((X_n)_{n \geq 1}\) be a sequence of independent random variables with common distribution function \(F \in D(H(\gamma))\), where \(\gamma \in \mathbb{R}\). If \(\lim _{n \to \infty} k(n) = \infty\) and \(\lim _{n \to \infty} \frac{k(n)}{n} = 0\), then the Pickands estimator \(\hat{\gamma}^P_{k(n),n}\) converges in probability to \(\gamma\).

Proof. For \(\gamma \in \mathbb{R}\), the characterization of \(D(H(\gamma))\) in terms of the tail quantile function U gives: \[\lim _{t \to \infty} \frac{U(t) - U(t/2)}{U(t/2) - U(t/4)} = 2^{\gamma}\]
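This limit can be checked as follows, assuming the standard characterization of \(F \in D(H(\gamma))\) through U: there is a positive function a such that \(\frac{U(tx)-U(t)}{a(t)} \to \frac{x^{\gamma}-1}{\gamma}\) for all \(x > 0\) (read as \(\ln x\) when \(\gamma = 0\)). Writing \(s = t/4\):

\[\frac{U(t)-U(t/2)}{U(t/2)-U(t/4)} = \frac{\dfrac{U(4s)-U(s)}{a(s)} - \dfrac{U(2s)-U(s)}{a(s)}}{\dfrac{U(2s)-U(s)}{a(s)}} \xrightarrow[s \to \infty]{} \frac{\dfrac{4^{\gamma}-1}{\gamma} - \dfrac{2^{\gamma}-1}{\gamma}}{\dfrac{2^{\gamma}-1}{\gamma}} = \frac{2^{\gamma}(2^{\gamma}-1)}{2^{\gamma}-1} = 2^{\gamma}\]

When \(\gamma = 0\), the same computation gives \(\frac{\ln 4 - \ln 2}{\ln 2} = 1 = 2^{0}\).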

In fact, using the fact that U is nondecreasing (which follows from F being nondecreasing), one obtains more generally: \[\lim \limits _{t \to \infty} \frac{U(t) - U(t\,c_1(t))}{U(t\,c_1(t)) - U(t\,c_2(t))} = 2^{\gamma}\] as soon as \(\lim \limits _{t \to \infty} c_1(t) = 1/2 \ \text{ and } \ \lim \limits _{t \to \infty} c_2(t) = 1/4\).

It therefore remains to find estimators of \(U(t)\).

Let \((k(n))_{n \geq 1}\) be a sequence of integers such that: \[\lim _{n \to \infty} k(n) = \infty \ \ \text{ and } \ \lim _{n \to \infty} \frac{k(n)}{n} = 0\]

Let \((V_{(1,n)},\dots,V_{(n,n)})\) be the order statistics of a sample of n independent standard Pareto random variables, with distribution function \(F_V(x) = 1 - \frac{1}{x}\) for \(x \geq 1\). The following convergences hold in probability: \[V_{(n-k+1,n)} \to \infty\] \[\frac{V_{(n-2k+1,n)}}{V_{(n-k+1,n)}} \to 1/2\] \[\frac{V_{(n-4k+1,n)}}{V_{(n-k+1,n)}} \to 1/4\]
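These three convergences can be observed on a simulated standard Pareto sample, drawn by inversion as \(V = 1/U\) with U uniform. The values \(n = 10^5\) and \(k = 1000\) are illustrative stand-ins for the asymptotic regime \(k \to \infty\), \(k/n \to 0\).

```r
set.seed(5)
n <- 1e5
k <- 1000
v <- sort(1 / runif(n))       # ascending order statistics V_(1,n) <= ... <= V_(n,n)

top_k <- v[n - k + 1]         # V_(n-k+1,n), of order n/k
ratio_2k <- v[n - 2 * k + 1] / top_k
ratio_4k <- v[n - 4 * k + 1] / top_k
print(c(V_top = top_k, ratio_2k = ratio_2k, ratio_4k = ratio_4k))
```

Since \(V_{(n-j+1,n)} = 1/U_{(j,n)}\), the first ratio equals \(U_{(k,n)}/U_{(2k,n)}\), which concentrates around 1/2 with fluctuations of order \(1/\sqrt{k}\).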

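Applying the previous limit with \(t = V_{(n-k+1,n)}\), \(c_1 = V_{(n-2k+1,n)}/V_{(n-k+1,n)}\) and \(c_2 = V_{(n-4k+1,n)}/V_{(n-k+1,n)}\), we deduce that the following convergence holds in probability: \[\frac{U(V_{(n-k+1,n)}) - U(V_{(n-2k+1,n)})}{U(V_{(n-2k+1,n)}) - U(V_{(n-4k+1,n)})} \to 2^{\gamma}\]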

It now remains to determine the distribution of \((U(V_{(1,n)}),\dots,U(V_{(n,n)}))\).

Note that for \(x \geq 1\), \(U(x) = F^{-1}(F_V(x))\), since \(F_V(x) = 1 - \frac{1}{x}\). Hence: \[(U(V_{(1,n)}),\dots,U(V_{(n,n)})) = (F^{-1}(F_V(V_{(1,n)})),\dots,F^{-1}(F_V(V_{(n,n)})))\] where \(F_V\) is the Pareto distribution function. The variables \(F_V(V_i)\) are independent and uniform on [0;1], and since \(F_V\) is increasing, \((F_V(V_{(1,n)}),\dots,F_V(V_{(n,n)}))\) are their order statistics. The previous lemma then shows that the random vector \((F^{-1}(F_V(V_{(1,n)})), \dots, F^{-1}(F_V(V_{(n,n)})))\) has the same distribution as \((X_{(1,n)},\dots,X_{(n,n)})\), the order statistics of a sample of n independent random variables with distribution function F.

Therefore the random variable \[\frac{U(V_{(n-k+1,n)}) - U(V_{(n-2k+1,n)})}{U(V_{(n-2k+1,n)}) - U(V_{(n-4k+1,n)})}\] has the same distribution as \[\frac{X_{(n-k+1,n)} - X_{(n-2k+1,n)}}{X_{(n-2k+1,n)} - X_{(n-4k+1,n)}}\]

Since the first quantity converges in probability to \(2^{\gamma}\), the second one converges in distribution to \(2^{\gamma}\) as n tends to infinity.

Since the logarithm is continuous on \(\mathbb{R} _+ ^*\), the continuous mapping theorem shows that the Pickands estimator, which is \(\frac{1}{\ln 2}\) times the logarithm of this ratio, converges in distribution to \(\frac{\ln 2^{\gamma}}{\ln 2} = \gamma\). Since \(\gamma\) is a constant, convergence in probability follows.
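Because the proof works for any \(\gamma \in \mathbb{R}\), the ratio \(\frac{X_{(n-k+1,n)} - X_{(n-2k+1,n)}}{X_{(n-2k+1,n)} - X_{(n-4k+1,n)}}\) should approach \(2^{\gamma}\) in all three domains. This sketch assumes three textbook examples: uniform (Weibull domain, \(\gamma = -1\)), exponential (Gumbel domain, \(\gamma = 0\)) and Pareto with \(\alpha = 1\) (Fréchet domain, \(\gamma = 1\)); `pickands_ratio` is a hypothetical helper and n, k are illustrative.

```r
# Pickands ratio with ascending order statistics X_(1,n) <= ... <= X_(n,n)
pickands_ratio <- function(x, k) {
  xs <- sort(x)
  n <- length(xs)
  (xs[n - k + 1] - xs[n - 2 * k + 1]) / (xs[n - 2 * k + 1] - xs[n - 4 * k + 1])
}

set.seed(6)
n <- 1e6
k <- 1e4
samples <- list(uniform = runif(n), exponential = rexp(n), pareto = 1 / runif(n))
true_gamma <- c(uniform = -1, exponential = 0, pareto = 1)
gamma_hat <- sapply(samples, function(x) log(pickands_ratio(x, k)) / log(2))
print(rbind(gamma_hat = gamma_hat, gamma = true_gamma))
```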

Estimator selection problem

As mentioned above, the selection problem raised by the Hill estimator comes from the fact that it can only be used for Fréchet-type distributions. In that case it is preferred to the Pickands estimator, since it is more efficient. For distributions in the Weibull or Gumbel domains, however, it cannot be used, and the Pickands estimator is used instead.
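The efficiency claim can be quantified from the two asymptotic normality results given earlier: for \(\gamma > 0\), the asymptotic variance is \(\gamma^2\) for Hill and \(\sigma(\gamma)^2\) for Pickands. A minimal sketch computing their ratio on an illustrative grid of \(\gamma\) values; a ratio above 1 means Hill is asymptotically more efficient.

```r
# Asymptotic variances of sqrt(k) (gamma_hat - gamma) for gamma > 0
pickands_var <- function(gamma) {
  gamma^2 * (2^(2 * gamma + 1) + 1) / (2 * (2^gamma - 1) * log(2))^2
}
hill_var <- function(gamma) gamma^2

gamma_grid <- c(0.1, 0.25, 0.5, 1, 2, 3)
efficiency_ratio <- pickands_var(gamma_grid) / hill_var(gamma_grid)
print(round(cbind(gamma = gamma_grid, var_ratio = efficiency_ratio), 2))
```

The ratio is very large for small \(\gamma\) and decreases for larger \(\gamma\) while staying above 1 on this grid.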

Simulations and estimation

In what follows we simulate several distributions (Pareto, Fréchet, Student, log-gamma) and apply the Hill estimator to the simulated data.
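As a base-R cross-check that does not rely on evir, the sketch below computes the Hill estimate (in the form defined earlier) on one sample from each of the four families, each parameterized so that the true \(\gamma\) is 1/2. These parameter choices, the sample size, k, and the helper name `hill_estimator` are assumptions for illustration, not those of the simulations that follow. Note that `evir::hill` plots by default the estimate of \(\alpha = 1/\gamma\), whereas this sketch returns \(\gamma\) directly.

```r
hill_estimator <- function(x, k) {
  xs <- sort(x, decreasing = TRUE)
  mean(log(xs[1:(k - 1)] / xs[k]))
}

set.seed(7)
n <- 1e5
k <- 1000
samples <- list(
  pareto    = runif(n)^(-1 / 2),                 # Fbar(x) = x^(-2), x >= 1
  frechet   = (-log(runif(n)))^(-1 / 2),         # F(x) = exp(-x^(-2))
  student   = abs(rt(n, df = 2)),                # |T| has tail index 2
  log_gamma = exp(rgamma(n, shape = 2, rate = 2))  # log X ~ Gamma(2, 2)
)
gamma_hat <- sapply(samples, hill_estimator, k = k)
print(round(gamma_hat, 3))
```

The log-gamma estimate is typically the most biased of the four, consistent with the slowly varying factor \((\ln x)^{a-1}\) in its tail discussed in the Hill section.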

library(evir)

library(evmix)

library(RobExtremes)

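# Student distribution with df = 1, ..., 10 degrees of freedom
# (tail index alpha = df, i.e. gamma = 1/df; df = 1 is the Cauchy law)

par(mfrow=c(2,5))
for (i in 1:10){
  x=rt(10000,df=i)   # x rather than T, which would mask the TRUE shortcut
  #plot(x)
  #hist(x)
  # evir::hill plots the Hill estimate of alpha = 1/gamma by default
  # (option = "alpha"); option = "xi" would plot gamma instead
  hill(x)
}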

##Pareto


library(VGAM)

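par(mfrow=c(2,5))

# VGAM::rpareto(n, scale, shape): survival (scale/x)^shape for x >= scale,
# so gamma = 1/shape. The explicit prefix keeps the call unambiguous once
# actuar (which also defines rpareto) is attached.
P=VGAM::rpareto(10000, scale = 1, shape = 1)
plot(P)
hist(P)

P2=VGAM::rpareto(10000, scale = 1, shape = 5)
plot(P2)
hist(P2)

P3=VGAM::rpareto(10000, scale = 1, shape = 10)
plot(P3)
hist(P3)

P4=VGAM::rpareto(10000, scale = 1, shape = 20)
plot(P4)
hist(P4)

P5=VGAM::rpareto(10000, scale = 1, shape = 50)
plot(P5)
hist(P5)

# Hill plots (estimate of alpha = shape by default)
hill(P)
hill(P2)
hill(P3)
hill(P4)
hill(P5)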

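# RobExtremes::PickandsEstimator estimates the scale and shape parameters of
# a generalized Pareto model (see ?PickandsEstimator for its exact definition)
PickandsEstimator(P)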

## Evaluations of PickandsEstimator:
## ---------------------------------
## An object of class "Estimate" 
## generated by call
##   PickandsEstimator(x = P)
## samplesize:   10000
## estimate:
##       scale         shape   
##   2.897369485   0.005494826 
##  (0.069397894) (0.036073354)
## asymptotic (co)variance (multiplied with samplesize):
##           scale     shape
## scale  48.16068 -21.77534
## shape -21.77534  13.01287
## Infos:
##      method              message
## [1,] "PickandsEstimator" ""

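# Frechet
# evd is already attached through RobExtremes, but VGAM masks its rfrechet;
# the evd:: prefix makes the intended generator explicit
library(evd)

par(mfrow=c(2,5))

# evd::rfrechet(n, loc, scale, shape): gamma = 1/shape.
# Fr is used instead of F, which would mask the FALSE shortcut
Fr=evd::rfrechet(10000, loc = 0, scale = 1, shape = 1)
plot(Fr)
hist(Fr)

Fr2=evd::rfrechet(10000, loc = 0, scale = 1, shape = 2)
plot(Fr2)
hist(Fr2)

Fr3=evd::rfrechet(10000, loc = 0, scale = 1, shape = 3)
plot(Fr3)
hist(Fr3)

hill(Fr)
hill(Fr2)
hill(Fr3)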


#log-gamma


library(actuar)

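par(mfrow=c(2,5))

# actuar::rlgamma(n, shapelog, ratelog): log(X) ~ Gamma(shapelog, ratelog),
# a heavy-tailed law with gamma = 1/ratelog
L=actuar::rlgamma(10000, shapelog = 1, ratelog = 1)
plot(L)
hist(L)

L2=actuar::rlgamma(10000, shapelog = 1, ratelog = 2)
plot(L2)
hist(L2)

L3=actuar::rlgamma(10000, shapelog = 2, ratelog = 2)
plot(L3)
hist(L3)

L4=actuar::rlgamma(10000, shapelog = 3, ratelog = 2)
plot(L4)
hist(L4)

L5=actuar::rlgamma(10000, shapelog = 3, ratelog = 3)
plot(L5)
hist(L5)

hill(L)
hill(L2)
hill(L3)
hill(L4)
hill(L5)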