We fitted a Generalized Extreme Value (GEV) distribution using the fevd() function from the extRemes package on annual maxima data up to 1999.
The model estimated with estimated GEV Parameters as below, Location (μ): 48.61, Scale (σ): 20.32, Shape (ξ): 0.1505
Model Fit Assessment gives information based on the plots: First, Diagnostic plots (from plot(gev_fit)) provide visual tools for evaluating fit. QQ Plot and Probability Plot: Show how well the empirical quantiles match the model predictions. Return Level Plot: Indicates how well return levels predicted by the model align with observed extremes. LAST, Density Plot: Compares the empirical density and fitted GEV density. The GEV diagnostic plots confirm that, The model provides a good fit across most of the distribution. The return level estimates are consistent with the empirical maxima. The shape parameter ξ > 0 captures moderate tail heaviness appropriately, which is important for extreme rainfall. These plots suggest the GEV model is a reasonable fit, particularly since the data aligns well on the QQ and return level plots. Based on the result of the plots, we assume the GEV model a suitable tool for modeling the annual maxima of the pre-1999 rainfall data.
# Return level estimation for 100-year event
return_levels <- return.level(gev_fit, return.period = 100)
rl.extremes <- as.numeric(return_levels)
rl.extremes
# 410.4 mm Event return period:
1 / (1 - pevd(410.4, loc = 48.61, scale = 20.32, shape = 0.1505, type = "GEV"))
The estimated return period for 410.4 mm Event is around 5747.875 years.
Null Hypothesis: ξ = 0 (Gumbel distribution) Alternative: ξ ≠ 0 (general GEV)
library(extRemes)
gev_fit_gumbel <- fevd(annual_maxima$max_rain, type = "GEV", shape = 0)
lr.test(gev_fit, gev_fit_gumbel)
To assess whether the GEV model with an estimated shape parameter provides a significantly better fit than the simplified Gumbel model (which assumes the shape parameter ξ = 0), we performed a likelihood ratio test. The result shows a likelihood ratio value of 0, a chi-square critical value of 0.00, and a p-value of 1. This indicates that the two models are statistically indistinguishable in terms of fit. In other words, adding the shape parameter (ξ) does not improve the model’s performance. Consequently, we fail to reject the null hypothesis, and we conclude that the Gumbel distribution (ξ = 0) is sufficient for modeling the annual maxima data. This simplifies interpretation and may lead to more stable return level estimates, especially when sample sizes are limited.
# Likelihood Ratio Test between GEV and Gumbel
lrt_result <- lr.test(gev_fit, gev_fit_gumbel)
# Print the test result
print(lrt_result)
# Optional: Extract and display the p-value
cat("\nP-value for the likelihood ratio test:", lrt_result$p.value, "\n")
P-value for the likelihood ratio test: 1
return_period_gumbel
[1] 53940580
Under the Gumbel distribution (ξ = 0), the return period of an annual maximum event of 410.4mm is 53940580 years.
In this case, we compared two models for estimating the return period of a very extreme rainfall event of 410.4 mm: Under the GEV model, the estimated return period is approximately 5,748 years. Under the Gumbel model (a special case of GEV with shape ξ=0), the return period is approximately 53,940,580 years.
Interpretation: The Gumbel model assumes a thin-tailed distribution (ξ = 0), which underestimates the probability of extremely large events. This leads to inflated return periods, such as tens of millions of years—well beyond any meaningful climatological or planning timescale.
In contrast, the GEV model fitted (with ξ ≈ 0.15) allows for heavy tails, acknowledging the possibility of more extreme rainfall. The return period of ~5,748 years is still very long, but it’s a more realistic estimate for this extreme event and lies within computational and interpretive feasibility.
Conclusion: The GEV model provides the more useful return period estimate. It accounts for tail behavior more accurately through the shape parameter ξ, allowing for more realistic extrapolation to rare events. The Gumbel model severely underestimates tail risk, resulting in a practically unusable return period.
library(extRemes)
Loading required package: Lmoments
Warning: package ‘Lmoments’ was built under R version 4.1.3
Loading required package: distillery
Attaching package: ‘extRemes’
The following object is masked from ‘package:evd’:
mrlplot
The following objects are masked from ‘package:stats’:
qqnorm, qqplot
Daily Rainfall Over Time (First Plot) This time series plot shows daily rainfall from 1961 to 1999. Observation: The majority of the rainfall events appear low in magnitude, with a few pronounced spikes (extreme events) occurring sporadically. Conclusion: There is no clear long-term increasing or decreasing trend in the rainfall intensity over the years. The data appears stationary in the mean, with occasional extreme events scattered across the time series.
Rainfall vs. Day of Year (Second Plot) This scatter plot shows rainfall values against the day of the year (DOY), repeated for each year. Observation: Some clustering of higher rainfall values is observed towards the end of the year (DOY 330–365), where several of the largest values, including the maximum, appear. Conclusion: There is evidence of weak within-year seasonality, suggesting that extreme rainfall events tend to occur more frequently near the end of the calendar year, potentially corresponding to the wet season in the region of study.
Summary Annual Trend: No significant annual trend is detected. Seasonality: Weak but noticeable seasonality exists, with heavier rainfall more likely in the final months of the year.
To identify a suitable threshold for fitting a Generalized Pareto Distribution (GPD), we examined the mean residual life (MRL) plot using data up to and including 13th December 1999. The MRL plot shows the average excesses (mean values of exceedances) over a range of threshold values, along with confidence bands.
From the plot provided, we observe the following behavior: The curve initially rises, indicating increasing excesses with increasing thresholds, which is expected in the lower range. Around the threshold value of u ≈ 60–80 mm, the plot begins to level off and follows a roughly linear trend. Beyond u ≈ 100 mm, the plot becomes unstable, and the confidence intervals widen significantly due to the scarcity of data (fewer exceedances), making estimates unreliable.
Chosen Threshold We select a threshold of u = 70 mm for the following reasons: The MRL plot is approximately linear starting at u = 70 mm, satisfying one of the key conditions for a valid GPD approximation. This region provides a balance between bias and variance — a threshold too low can violate the assumptions of the GPD model, while a threshold too high leads to high variance due to few exceedances. The number of exceedances above 70 mm appears to be adequate for inference, while still maintaining the validity of the GPD approximation.
# Fit GPD to excesses over threshold
threshold <- 70
fit_gpd <- fevd(ven_sub$rainfall, threshold = threshold, type = "GP")
# Diagnostic plots
plot(fit_gpd)
# Estimate return period for 410.4 mm event
# Extract parameter estimates from the fitted GPD model
sigma_gpd <- fit_gpd$results$par["scale"]
xi_gpd <- fit_gpd$results$par["shape"]
u <- threshold # The threshold you used
# Define the event
event <- 410.4
# Compute exceedance probability
prob_exceed <- 1 - pevd(event, threshold = u,
scale = sigma_gpd,
shape = xi_gpd,
type = "GP")
# Estimate number of exceedances per year
years <- length(unique(format(ven_sub$date, "%Y")))
lambda <- sum(ven_sub$rainfall > threshold) / years
# Return period
return_period <- 1 / (lambda * prob_exceed)
cat("Return period under GPD model:", return_period, "years\n")
Return period under GPD model: Inf years
In this task, we fitted a Generalized Pareto Distribution (GPD) threshold exceedance model to the rainfall data using a constant parameter approach. The threshold was selected based on the Mean Residual Life (MRL) plot, which exhibited a relatively linear trend beginning around 70 mm. Thus, a threshold of 70 mm was used to fit the GPD model.
After fitting the model, we examined its diagnostic plots. The quantile-quantile (QQ) plot and return level plot both indicate a reasonably good fit between the observed exceedances and the GPD model. Specifically, the model quantiles align closely with the empirical quantiles, and the return level plot shows plausible extrapolation behavior with relatively tight confidence bounds.
We then attempted to estimate the return period for the 410.4 mm rainfall event, which occurred in mid-December 1999. The estimated probability of exceedance was extremely close to zero, resulting in a calculated return period of infinity. This implies that, under the fitted GPD model, an event of this magnitude is so rare (relative to the historical record) that it is considered virtually unprecedented. This aligns with the empirical rarity of such an extreme value in the dataset.
Comparison with Annual Maxima (GEV) Method In contrast, the return period estimated using the GEV model (Block Maxima approach) was approximately 5748 years. While still extreme, this estimate is finite and likely more interpretable in practical hydrological planning.
The key distinction lies in the data used: The GEV model summarizes annual maxima and may smooth over shorter extreme events. The GPD model, using threshold exceedances, focuses on localized extremes and has the potential to be more sensitive to very rare events. However, the GPD model’s reliability hinges critically on the chosen threshold and number of exceedances. In our case, with only 16 exceedances, model uncertainty is high. This is reflected in the wide confidence intervals in the return level plot and the unstable estimation of return periods for extreme events like 410.4 mm.
Conclusion While the GPD model gives a technically sound result suggesting an infinite return period, its reliability is questionable due to data scarcity beyond the threshold and the extremely rare nature of the target event. Therefore, despite the theoretical appeal of the GPD model, the GEV-based return period (approx. 5748 years) may offer a more stable and interpretable estimate in this case.
# Likelihood Ratio Test: constant vs time-varying
lr.test(fit_gpd, fit_gpd_time)
Likelihood-ratio Test
data: ven_sub$rainfallrainfall
Likelihood-ratio = 5.2206, chi-square critical value = 5.9915, alpha = 0.0500,
Degrees of Freedom = 2.0000, p-value = 0.07351
alternative hypothesis: greater
We constructed a GPD threshold exceedance model in which both the scale parameter σ and the shape parameter ξ were modeled as linear functions of time, using the following formulation: fit_gpd_time <- fevd(x = rainfall, data = ven_sub, threshold = threshold, scale.fun = ~time, shape.fun = ~time, type = “GP”) Parameter Estimates From the model output: σ(t) = 22.71 + 0.00289·t ξ(t) = –0.0663 – 9.1e–5·t
Both components of scale and shape functions showed very small time trends. The standard errors were extremely small (≈ 2e–8), indicating numerical stability, but possibly overfitting or limited informative contribution from the time covariate.
Hypothesis Testing A likelihood ratio test was performed to compare the time-varying model (fit_gpd_time) with the constant-parameter model (fit_gpd): lr.test(fit_gpd, fit_gpd_time) Likelihood ratio statistic: 5.22 Degrees of freedom: 2 Chi-square critical value (α = 0.05): 5.99 p-value: 0.0735 Since the p-value > 0.05, we fail to reject the null hypothesis that the simpler (constant-parameter) model is sufficient. This suggests that adding time-dependence does not significantly improve model fit.
Model Diagnostics The diagnostic plots from plot(fit_gpd_time) support this conclusion: The quantile-quantile (QQ) plot and return level plot fit the data reasonably well but do not show notable improvements over the constant GPD model. Residual plots do not indicate strong violations of model assumptions.
Interpretation The magnitude of the time-dependent coefficients (σ1, ξ1) is very small, implying minimal temporal variation in extremes. The data do not support strong evidence for temporal changes in the scale or shape of the tail behavior. This suggests the process is stationary, or that the dataset is too limited in sample size to detect non-stationarity robustly.
Final Evaluation The time-varying GPD model is technically more flexible but statistically not significantly better than the constant-parameter GPD model. Based on: The likelihood ratio test. The tiny magnitudes of time coefficients Model diagnostics plots We conclude that the constant-parameter model is more parsimonious and appropriate for this dataset. Thus, the added complexity of time-varying parameters is not justified by the evidence.
cat("Return period at 410.4mm:", return_period, "years\n")
Return period at 410.4mm: Inf years
In part (e), we aim to estimate the return period of the 15th December event (410.4 mm) using a time-varying GPD threshold exceedance model and compare it with results obtained from the annual maxima model.
Time-Varying GPD Model Result: From the output: Return period at 410.4mm: Inf years This result indicates that, under the time-varying GPD model, the 410.4 mm rainfall event is so extreme that the estimated return period is infinite. This typically happens when the extrapolated cumulative probability of exceeding this value is near zero, resulting in division by nearly zero (i.e., 1 / very small probability ≈ ∞).
This result can be due to: High tail heaviness (positive shape parameter). Limited number of exceedances beyond the threshold (in this case, only 16 exceedances as per diagnostic plots). The time-varying covariate structure may shrink the probability of occurrence further.
Annual Maxima Model Result: In contrast, under the GEV fit to annual maxima, the return period was estimated at approximately: ~ 5748 years. This is already a very long return period, indicating extreme rarity. However, it is still a finite estimate, suggesting a quantifiable, though rare, likelihood of recurrence.
Interpretation and Comparison: The time-varying GPD model, while more flexible, may overfit trends or yield unstable estimates for extremes far outside the data range. Extrapolation from a few exceedances (e.g., 16 points) can be dangerous, especially for estimating return levels near or beyond observed maximums.
Final Comment: In conclusion, while the time-varying GPD model accounts for nonstationarity and provides a detailed framework, its extrapolated return period for the 410.4 mm event may not be reliable. The annual maxima GEV model offers a more interpretable and robust return level, though it still reflects the exceptional rarity of such an event.
Part3. Rolling 3-day storm maxima (a): Replace Dec 16, Aggregate into 3-Day Storms, and Decluster
Plot over Time The declustered 3-day rainfall maxima over time shows a significant spike in December 1999, corresponding to the Vargas Tragedy. Otherwise, the majority of 3-day maxima stay below 200 mm. There is no clear upward or downward trend in the frequency or magnitude of storms over the 40-year period (1960–1999).
Plot against Day of Year (DOY) The seasonal pattern is weakly visible. Most 3-day extreme events occur throughout the year, but there’s a slight concentration toward the end of the year, including the most extreme event.
Comparison with Non-Aggregated (Daily) Data The daily data (as seen in Part 2 plots) also shows: No strong trend over time. A slightly more clear seasonal pattern, with extremes tending to occur in the latter part of the year. The 3-day aggregation smooths short-term noise and better captures multi-day storm systems (e.g., the 1999 event), which are missed or underestimated by daily maxima. The declustering also helps reduce dependence between events, making the data more suitable for extreme value modeling.
Conclusion The declustered 3-day storm data retains the key feature of the extreme December 1999 event and shows similar, albeit smoother and more interpretable patterns as the non-aggregated data. While the seasonality is slightly less apparent, the extreme nature of multi-day storm events becomes clearer, justifying the 3-day rolling window and declustering approach in modeling block maxima or exceedances.
Part 3(b): Fit GPD & Estimate Return Period (Excluding Dec 1999 Storm)
A good candidate threshold based on the plot is around u = 65 to 80. *we’ve tested the threshold from u = 65 to 80, but with Infinite as our result. Hence, we reduce our threshold to 50 to see whethere there would be a more reasonable result.
cat("Estimated return period:", return_period, "years\n")
Estimated return period: 72177645 years
GEV Model: The first estimate, around 5748 years, is derived from a Generalized Extreme Value (GEV) distribution fitted to annual maxima. This method assumes that yearly maximum rainfall values are independent and identically distributed and follow a GEV distribution. It is a standard method in extreme value analysis and works well for long-term planning, but may underutilize available data by discarding non-maximal extreme events within each year.
Gumbel Model: The second estimate, based on a Gumbel distribution (a special case of GEV with shape parameter zero), suggests an exceedingly small exceedance probability of 1.85e-08, indicating an extreme rarity. This translates to a return period of over 50 million years, which may reflect overfitting or poor tail modeling, given the Gumbel model’s light-tail assumption that may not suit heavy-tailed rainfall extremes.
GPD Model with 70mm threshold: The third model uses the Generalized Pareto Distribution (GPD) in a peaks-over-threshold (POT) framework with a threshold of 70 mm. The model yields an infinite return period, likely because 410.4 mm falls far in the tail with a near-zero or negative shape parameter, implying either a bounded tail or insufficient information in the exceedances to extrapolate that far. This result suggests that the event is beyond the scope of the model’s reasonable prediction capacity.
3-Day Aggregated POT Model (threshold = 50mm): The fourth estimate, approximately 722 million years, is based on a GPD fit to declustered, 3-day rolling rainfall totals with a threshold of 50 mm. Although this model incorporates storm clustering and is more hydrologically appropriate, the extremely high return period again reflects the sensitivity of the model to the shape parameter and the difficulty of extrapolating far into the tail.
In summary, each model has different assumptions: GEV assumes block maxima, Gumbel assumes a light-tailed distribution, and GPD requires a careful threshold choice and assumes independent threshold exceedances. The 3-day model incorporates temporal aggregation to improve independence and physical realism. However, due to model sensitivity and extrapolation uncertainty, particularly in the tail, the GEV estimate (∼5748 years) is the most statistically robust and interpretable among the four, though the 3-day model provides a physically meaningful complementary view.