Exercise 5 Assignment 1 Extremes

CHEVALIER Paolo

2025-02-06

Exercise 5)

Loading the data

france_extremes = read.table("https://www.mas.ncl.ac.uk/~nlf8/teaching/mas3918/VE_Spring25-France.txt",header=TRUE)

Attach the data so we don’t have to use france_extremes$…

attach(france_extremes)

Overlook on the data

I have to use the annual maximum sunshine hours data.

Let’s look at basic stats from our data.

summary(sunshine_anmax)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   13.40   14.70   15.00   14.86   15.10   15.40

(Thankfully these are maximas because Rennes isn’t usually that sunny, trust me)

Plotting the series gives the following graph;

plot(x=Year,y=sunshine_anmax,type="b",main="Annual maximum sunshine hours by year",xlab="Year",ylab="AMSH")

The data does not look like it contains a trend which is great for GEV analysis.

Perform a GEV fit using the ismev package

library(ismev)
mod=gev.fit(xdat=sunshine_anmax)
## $conv
## [1] 0
## 
## $nllh
## [1] 12.09655
## 
## $mle
## [1] 14.7989245  0.4005894 -0.6517679
## 
## $se
## [1] 0.06438650 0.05051933 0.08421450

Let’s check the goodness of fit of our model

gev.diag(mod)

All these plots seems to indicate that our model is adequate. The interesting shapes of the two plots on the top are due to some values being exactly the same.

Let’s store the parameters and standard errors for later use.

varcov=mod$cov
param=list("location"=mod$mle[1],"scale"=mod$mle[2],"shape"=mod$mle[3])
param_se=list("location"=mod$se[1],"scale"=mod$se[2],"shape"=mod$se[3])

Let’s compute the 95% confidence intervals for our parameters

\(\mu\)

location_CI=c(param$location-1.96*param_se$location,param$location+1.96*param_se$location)
location_CI
## [1] 14.67273 14.92512

\(\sigma\)

scale_CI=c(param$scale-1.96*param_se$scale,param$scale+1.96*param_se$scale)
scale_CI
## [1] 0.3015715 0.4996073

\(\xi\)

shape_CI=c(param$shape-1.96*param_se$shape,param$shape+1.96*param_se$shape)
shape_CI
## [1] -0.8168283 -0.4867075

Zero is not included in our confidence interval, therefore we can rule out the possibility of having a Gumbel distribution (type 1) and our distribution is of Weibull (type 3). This conclusion seems adequate given the data we are working with, indeed sunshine hours are bounded at a location like Rennes.

Return levels, their standard errors and empirical estimates

Function to compute return levels

return_level=function(location,scale,shape,r){
  return(location + (scale/shape)*((-log(1-(1/r)))^(-shape)-1))
}

Let’s compute the return levels

z50=return_level(param$location,param$scale,param$shape,50)
z500=return_level(param$location,param$scale,param$shape,500)

Let’s now compute their respective standard errors using the delta method as detailed in the lesson.

return_level_se=function(location,scale,shape,r,varcov){
  yr=-log(1-(1/r))
  deltazr=c(1, -shape^-1 * ( 1-yr^-shape ), scale * shape^-2 * ( 1-yr^-shape ) - scale * shape^-1 * yr^-shape * log(yr))
  return( as.numeric(sqrt(t(deltazr) %*% varcov %*% deltazr)) ) #as.numeric just so the result is prettier (just a number and not a 1x1 matrix)
}
z50_se=return_level_se(param$location,param$scale,param$shape,50,varcov)
z500_se=return_level_se(param$location,param$scale,param$shape,500,varcov)
z50
## [1] 15.36522
z50_se
## [1] 0.01916292
z500
## [1] 15.40283
z500_se
## [1] 0.01829871

Therefore, we have \(z_{50}=15.36522 \, (0.01916292) \, \text{hours}\) and \(z_{500}=15.40283 \, (0.01829871) \, \text{hours}\)

We can also compare our return levels with their empirical counterparts (when r<=44), for that we infer the empirical estimate of the 10-years return level by looking at the empirical distribution function.

sunshine_anmax_sorted=sort(sunshine_anmax)
sunshine_anmax_sorted
##  [1] 13.4 14.0 14.1 14.2 14.4 14.5 14.6 14.6 14.7 14.7 14.7 14.7 14.7 14.7 14.8
## [16] 14.8 14.8 14.9 14.9 14.9 14.9 15.0 15.0 15.0 15.0 15.0 15.0 15.1 15.1 15.1
## [31] 15.1 15.1 15.1 15.1 15.1 15.2 15.2 15.2 15.2 15.2 15.2 15.2 15.3 15.4
n=length(sunshine_anmax_sorted)
n
## [1] 44
empirical_pr=(1:n)/(n+1)
empirical_pr
##  [1] 0.02222222 0.04444444 0.06666667 0.08888889 0.11111111 0.13333333
##  [7] 0.15555556 0.17777778 0.20000000 0.22222222 0.24444444 0.26666667
## [13] 0.28888889 0.31111111 0.33333333 0.35555556 0.37777778 0.40000000
## [19] 0.42222222 0.44444444 0.46666667 0.48888889 0.51111111 0.53333333
## [25] 0.55555556 0.57777778 0.60000000 0.62222222 0.64444444 0.66666667
## [31] 0.68888889 0.71111111 0.73333333 0.75555556 0.77777778 0.80000000
## [37] 0.82222222 0.84444444 0.86666667 0.88888889 0.91111111 0.93333333
## [43] 0.95555556 0.97777778
comparaison=cbind(sunshine_anmax_sorted,empirical_pr)
comparaison
##       sunshine_anmax_sorted empirical_pr
##  [1,]                  13.4   0.02222222
##  [2,]                  14.0   0.04444444
##  [3,]                  14.1   0.06666667
##  [4,]                  14.2   0.08888889
##  [5,]                  14.4   0.11111111
##  [6,]                  14.5   0.13333333
##  [7,]                  14.6   0.15555556
##  [8,]                  14.6   0.17777778
##  [9,]                  14.7   0.20000000
## [10,]                  14.7   0.22222222
## [11,]                  14.7   0.24444444
## [12,]                  14.7   0.26666667
## [13,]                  14.7   0.28888889
## [14,]                  14.7   0.31111111
## [15,]                  14.8   0.33333333
## [16,]                  14.8   0.35555556
## [17,]                  14.8   0.37777778
## [18,]                  14.9   0.40000000
## [19,]                  14.9   0.42222222
## [20,]                  14.9   0.44444444
## [21,]                  14.9   0.46666667
## [22,]                  15.0   0.48888889
## [23,]                  15.0   0.51111111
## [24,]                  15.0   0.53333333
## [25,]                  15.0   0.55555556
## [26,]                  15.0   0.57777778
## [27,]                  15.0   0.60000000
## [28,]                  15.1   0.62222222
## [29,]                  15.1   0.64444444
## [30,]                  15.1   0.66666667
## [31,]                  15.1   0.68888889
## [32,]                  15.1   0.71111111
## [33,]                  15.1   0.73333333
## [34,]                  15.1   0.75555556
## [35,]                  15.1   0.77777778
## [36,]                  15.2   0.80000000
## [37,]                  15.2   0.82222222
## [38,]                  15.2   0.84444444
## [39,]                  15.2   0.86666667
## [40,]                  15.2   0.88888889
## [41,]                  15.2   0.91111111
## [42,]                  15.2   0.93333333
## [43,]                  15.3   0.95555556
## [44,]                  15.4   0.97777778

An empirical estimate of the 10-year return level is the value which is exceeded once every 10 observations.

That is \(\mathbb{P}(M_{n}>z_{10})=\frac{1}{10} \implies \mathbb{P}(M_n \leq z_{10})=0.9\)

Hence, we need to find the value of \(i\) for which \(\frac{i}{n + 1} = 0.9\). This gives us a return level of \(\approx 15.2 \text{h}\)

With the function above we get

z10=return_level(param$location,param$scale,param$shape,10)
z10_se=return_level_se(param$location,param$scale,param$shape,10,varcov)
z10
## [1] 15.27176
z10_se
## [1] 0.03123557

Hence, the model gives us a pretty decent estimate.

Conclusion

What we get from this analysis is interesting, it shows that even a 500 years maxima would be around 15.4 hours which means we wouldn’t reach any maxima we haven’t already (given there is no trend here). The very small standard errors are probably due to the fact that our data does not really contains any extreme value and seems very stable. We can therefore conclude that there is no significant “risk” of having a maxima very high compared to what we have already seen, which seems fairly adequate given the fact we’re working with sunshine hours.