Prevence hydrologických extrémů
Extrémy
Definice extrému
- Absolutní maximum
- Bloková maxima
- Nadprahové hodnoty
Zobrazení extrému
Data
Stanice Herwijnen v Holandsku, záznam od 5.5.1993
lop <- c('stats4',
'lmom',
'CoSMoS',
'extraDistr')
to.instal <- lop[which(x = !lop %in% installed.packages()[,'Package'])]
if(length(x = to.instal) != 0) {
install.packages(pkgs = to.instal)
}
aux <- lapply(X = lop,
FUN = library, character.only = T)
rm(list = 'aux')
dta.raw <- fread(input = 'https://climexp.knmi.nl/data/rhrh356.dat',
skip = 16,
verbose = FALSE,
col.names = c('Y', 'm', 'd', 'P'))
dta.raw[, `:=`(DTM = as.Date(x = paste(Y, m, d,
sep = '-'),
format = '%Y-%m-%d'))]
dta <- dta.raw[, .(DTM, P)]
summary(object = dta)
> DTM P
> Min. :1993-05-05 Min. : 0.000
> 1st Qu.:2000-01-09 1st Qu.: 0.000
> Median :2006-09-08 Median : 0.000
> Mean :2006-09-07 Mean : 2.159
> 3rd Qu.:2013-05-07 3rd Qu.: 2.400
> Max. :2020-01-05 Max. :99.100
ggplot(data = dta) +
geom_line(mapping = aes(x = DTM,
y = P),
colour = 'red4') +
labs(x = 'Date',
y = 'Precipitation totals [mm]') +
theme_bw()
Gumbel plot
ÚKOL - vykreslete ‘Gumbel plot’
Jak na to?
- Odhadnout pravděpodobnost \(p\)
- na osu \(x\) vynést \(-log(-log(p))\), na osu \(y\) pak daná maxima
Rozdělení
GEV - Generalized extreme value distribution
Rozdělení pro bloková maxima
\[f_{\text{GEV}}(x)=exp\left\{-\left[1+\gamma(\frac{x-\alpha}{\beta})\right]^{-\frac{1}{\gamma}}\right\},\gamma\neq0,\\ f_{\text{GEV}}(x)=exp\left\{-exp\left[-(\frac{x-\alpha}{\beta})\right]\right\},\gamma=0. \]
GPD - Generalized Pareto distribution
Rozdělení používané pro nad/podprahové hodnoty \[f_{\text{GPD}}(x) = \beta^{-1}e^{-(1 - \gamma)y}, \quad y = \begin{cases} -\gamma^{-1}\log \Big[ 1 - \frac{\gamma(x - \alpha)}{\beta} \Big], & \gamma \neq 0,\\\frac{(x - \alpha)}{\beta}, & \gamma = 0.\end{cases} \]
Burr XII/ggamma - Burr type XII & Generalzed gamma distribution
Rozdělení vhodné pro denní klimatické veličiny \[ f_{\text{BrXII}}(x)=\frac{\left(\frac{x}{\beta }\right)^{\gamma _1-1} \left(\gamma _2 \left(\frac{x}{\beta }\right)^{\gamma _1}+1\right){}^{-\frac{1}{\gamma _1 \gamma _2}-1}}{\beta}. \] \[f_{\mathcal{G}\mathcal{G}}(x)=\frac{\gamma _2 e^{-\left(\frac{x}{\beta }\right)^{\gamma _2}} \left(\frac{x}{\beta }\right)^{\gamma _1-1}}{\beta \Gamma \left(\frac{\gamma _1}{\gamma _2}\right)}. \]
Fitování
Jak je možné získat parametry rozdělení?
Maximum likelihood
\[L = \sum_{t}^{}l[\theta(t) ;X(t)]. \]
opt <- function(loc, scale, shape) {
aux <- dgev(x = mx[, mx],
mu = loc,
sigma = scale,
xi = shape)
aux <- log(x = aux)
-sum(x = aux[!is.infinite(aux)], na.rm = TRUE)
}
ml <- mle(minuslogl = opt,
method = 'L-BFGS-B',
start = list(loc = mean(mx[, mx]),
scale = 5,
shape = .6),
lower = c(-Inf, .01, -1),
upper = c(Inf, Inf, 1))
par.ml <- coef(ml)
L-moments
\[ \lambda_{1}=E(X_{1:1}), \] \[ \lambda_{2}=E(X_{2:2} - X_{1:2}), \] \[ \lambda_{3}=E(X_{3:3} - 2X_{2:3} + X_{1:3}), \] \[ \lambda_{4}=E(X_{4:4} - 3X_{3:4} + 3X_{2:4} - X_{1:4}). \]
Norms
\[ N1 = \frac{1}{n}\sum_{i = 1}^{n}\left( \frac{\overline{F}(x_{(i)})}{\overline{F}_{n}(x_{(i)})} - 1 \right)^{2}, \] \[ N2 = \frac{1}{n}\sum_{i = 1}^{n}\left(\overline{F}(x_{(i)}) - \overline{F}_{n}(x_{(i)})\right)^{2}, \] \[ N3 = \frac{1}{n}\sum_{i = 1}^{n}\left(\frac{x_{u}}{x{(i)}} - 1\right)^{2}, \] \[ N4 = \frac{1}{n}\sum_{i = 1}^{n}\left(x_{u} - x{(i)}\right)^{2}. \]
par.n <- fitDist(data = mx[,mx],
dist = 'gev',
n.points = 50,
norm = 'N2',
constrain = FALSE)
par.n <- unlist(x = par.n)
names(x = par.n) <- c('loc', 'scale', 'shape')
> loc scale shape
> 33.4890466 5.4100687 0.6518717
> loc scale shape
> 26.4281093 5.1142200 -0.4698528
> loc scale shape
> 26.3355573 4.8867747 0.6576104
Goodnes-of-fit
Jak se dá ověřit správnost odhadu parametrů?
- Subjektivní metody (grafické porovnání empirických a teoretických hodnot etc…)
- Objektivni metody (statistické testy - Anderson-Darling, Kolmogorov-Smirnof etc…)
Porovnání distribučních funkcí
ÚKOL - vykreslete následující graf
Anderson-Darling test
Anderson-Darling (AD) test je statistický test porovnávající rozdíl mezi \(F\) (teoretickou distribuční funkcí pod nulovou hypotézoua \(F_{n}\) (empirickou distribuční funkcí).
\[ A^{2} = n\int_{-\infty}^{\infty} \frac{[F_{n}(x) - F(x)]^{2}}{F(x)[1 - F(x)]}{\text{d}F(x)}. \]
AD.test <- function(data, dist, ...) {
val <- unlist(x = data)
val <- val[!is.na(x = val)]
pars <- list(...)
p <- do.call(what = paste0('p', dist),
args = c(list(q = val),
pars))
u <- sort(x = p[(p != 0) & (p != 1)])
nr <- length(x = u)
-nr - 1/nr*sum((2*1:nr - 1)*log(x = u) +
(2*nr - 2*1:nr + 1)*log(x = 1 - u))
}
base.ad <- AD.test(data = mx[, mx],
dist = 'gev',
mu = par.ml['loc'],
sigma = par.ml['scale'],
xi = par.ml['shape'])
base.ad
> [1] 19.53714
Máme kritickou hodnotu a co dál?
n.smp <- 200
dist <- 'gev'
pars <- list(mu = par.n['loc'],
sigma = par.n['scale'],
xi = par.n['shape'])
smp <- replicate(n = n.smp,
expr = do.call(what = paste0('r', dist),
args = c(list(n = dim(mx)[1]),
pars)))
smp.ad <- apply(X = smp,
MARGIN = 2,
FUN = function(x) {
par.aux <- fitDist(data = x,
dist = 'gev',
n.points = 50,
norm = 'N2',
constrain = FALSE)
par.aux <- unlist(x = par.aux)
names(x = par.aux) <- c('loc', 'scale', 'shape')
AD.test(data = x,
dist = 'gev',
mu = par.aux['loc'],
sigma = par.aux['scale'],
xi = par.aux['shape'])
}
)
p.val <- length(x = which(x = smp.ad >= base.ad))/n.smp
p.val
> [1] 0
ÚKOL - odhadněte hodnoty pro doby opakování 5, 10, 20, 50 a 100let.