Exercise 1

Part 1. Block maxima approach (Annual Maxima)

a) Fit and Assessment of the GEV Model

We fitted a Generalized Extreme Value (GEV) distribution using the fevd() function from the extRemes package on annual maxima data up to 1999.

The model estimated with estimated GEV Parameters as below, Location (μ): 48.61, Scale (σ): 20.32, Shape (ξ): 0.1505

Model Fit Assessment gives information based on the plots: First, Diagnostic plots (from plot(gev_fit)) provide visual tools for evaluating fit. QQ Plot and Probability Plot: Show how well the empirical quantiles match the model predictions. Return Level Plot: Indicates how well return levels predicted by the model align with observed extremes. LAST, Density Plot: Compares the empirical density and fitted GEV density. The GEV diagnostic plots confirm that, The model provides a good fit across most of the distribution. The return level estimates are consistent with the empirical maxima. The shape parameter ξ > 0 captures moderate tail heaviness appropriately, which is important for extreme rainfall. These plots suggest the GEV model is a reasonable fit, particularly since the data aligns well on the QQ and return level plots. Based on the result of the plots, we assume the GEV model a suitable tool for modeling the annual maxima of the pre-1999 rainfall data.

# Return level estimation for 100-year event
return_levels <- return.level(gev_fit, return.period = 100)
rl.extremes <- as.numeric(return_levels)
rl.extremes


# 410.4 mm Event return period:
1 / (1 - pevd(410.4, loc = 48.61, scale = 20.32, shape = 0.1505, type = "GEV"))

The estimated return period for 410.4 mm Event is around 5747.875 years.

b) Hypothesis Test for Gumbel Distribution (ξ = 0)

Null Hypothesis: ξ = 0 (Gumbel distribution) Alternative: ξ ≠ 0 (general GEV)

library(extRemes)
gev_fit_gumbel <- fevd(annual_maxima$max_rain, type = "GEV", shape = 0)
lr.test(gev_fit, gev_fit_gumbel)

To assess whether the GEV model with an estimated shape parameter provides a significantly better fit than the simplified Gumbel model (which assumes the shape parameter ξ = 0), we performed a likelihood ratio test. The result shows a likelihood ratio value of 0, a chi-square critical value of 0.00, and a p-value of 1. This indicates that the two models are statistically indistinguishable in terms of fit. In other words, adding the shape parameter (ξ) does not improve the model’s performance. Consequently, we fail to reject the null hypothesis, and we conclude that the Gumbel distribution (ξ = 0) is sufficient for modeling the annual maxima data. This simplifies interpretation and may lead to more stable return level estimates, especially when sample sizes are limited.

# Likelihood Ratio Test between GEV and Gumbel
lrt_result <- lr.test(gev_fit, gev_fit_gumbel)

# Print the test result
print(lrt_result)

# Optional: Extract and display the p-value
cat("\nP-value for the likelihood ratio test:", lrt_result$p.value, "\n")

P-value for the likelihood ratio test: 1

return_period_gumbel
[1] 53940580

Under the Gumbel distribution (ξ = 0), the return period of an annual maximum event of 410.4mm is 53940580 years.

c) Return Period Comparison and Interpretation

In this case, we compared two models for estimating the return period of a very extreme rainfall event of 410.4 mm: Under the GEV model, the estimated return period is approximately 5,748 years. Under the Gumbel model (a special case of GEV with shape ξ=0), the return period is approximately 53,940,580 years.

Interpretation: The Gumbel model assumes a thin-tailed distribution (ξ = 0), which underestimates the probability of extremely large events. This leads to inflated return periods, such as tens of millions of years—well beyond any meaningful climatological or planning timescale.

In contrast, the GEV model fitted (with ξ ≈ 0.15) allows for heavy tails, acknowledging the possibility of more extreme rainfall. The return period of ~5,748 years is still very long, but it’s a more realistic estimate for this extreme event and lies within computational and interpretive feasibility.

Conclusion: The GEV model provides the more useful return period estimate. It accounts for tail behavior more accurately through the shape parameter ξ, allowing for more realistic extrapolation to rare events. The Gumbel model severely underestimates tail risk, resulting in a practically unusable return period.

Part2. POT model using threshold exceedances

library(extRemes)
Loading required package: Lmoments
Warning: package ‘Lmoments’ was built under R version 4.1.3
Loading required package: distillery

Attaching package: ‘extRemes’

The following object is masked from ‘package:evd’:

    mrlplot

The following objects are masked from ‘package:stats’:

    qqnorm, qqplot

a) Time Plot & Seasonality Check

  1. Daily Rainfall Over Time (First Plot) This time series plot shows daily rainfall from 1961 to 1999. Observation: The majority of the rainfall events appear low in magnitude, with a few pronounced spikes (extreme events) occurring sporadically. Conclusion: There is no clear long-term increasing or decreasing trend in the rainfall intensity over the years. The data appears stationary in the mean, with occasional extreme events scattered across the time series.

  2. Rainfall vs. Day of Year (Second Plot) This scatter plot shows rainfall values against the day of the year (DOY), repeated for each year. Observation: Some clustering of higher rainfall values is observed towards the end of the year (DOY 330–365), where several of the largest values, including the maximum, appear. Conclusion: There is evidence of weak within-year seasonality, suggesting that extreme rainfall events tend to occur more frequently near the end of the calendar year, potentially corresponding to the wet season in the region of study.

Summary Annual Trend: No significant annual trend is detected. Seasonality: Weak but noticeable seasonality exists, with heavier rainfall more likely in the final months of the year.

  1. Threshold Selection (pre-13 Dec 1999)

To identify a suitable threshold for fitting a Generalized Pareto Distribution (GPD), we examined the mean residual life (MRL) plot using data up to and including 13th December 1999. The MRL plot shows the average excesses (mean values of exceedances) over a range of threshold values, along with confidence bands.

From the plot provided, we observe the following behavior: The curve initially rises, indicating increasing excesses with increasing thresholds, which is expected in the lower range. Around the threshold value of u ≈ 60–80 mm, the plot begins to level off and follows a roughly linear trend. Beyond u ≈ 100 mm, the plot becomes unstable, and the confidence intervals widen significantly due to the scarcity of data (fewer exceedances), making estimates unreliable.

Chosen Threshold We select a threshold of u = 70 mm for the following reasons: The MRL plot is approximately linear starting at u = 70 mm, satisfying one of the key conditions for a valid GPD approximation. This region provides a balance between bias and variance — a threshold too low can violate the assumptions of the GPD model, while a threshold too high leads to high variance due to few exceedances. The number of exceedances above 70 mm appears to be adequate for inference, while still maintaining the validity of the GPD approximation.

  1. Fit GPD Model & Estimate Return Period

# Fit GPD to excesses over threshold
threshold <- 70
fit_gpd <- fevd(ven_sub$rainfall, threshold = threshold, type = "GP")
# Diagnostic plots
plot(fit_gpd)
# Estimate return period for 410.4 mm event
# Extract parameter estimates from the fitted GPD model
sigma_gpd <- fit_gpd$results$par["scale"]
xi_gpd    <- fit_gpd$results$par["shape"]
u         <- threshold  # The threshold you used
# Define the event
event <- 410.4
# Compute exceedance probability
prob_exceed <- 1 - pevd(event, threshold = u,
                        scale = sigma_gpd,
                        shape = xi_gpd,
                        type = "GP")
# Estimate number of exceedances per year
years <- length(unique(format(ven_sub$date, "%Y")))
lambda <- sum(ven_sub$rainfall > threshold) / years
# Return period
return_period <- 1 / (lambda * prob_exceed)
cat("Return period under GPD model:", return_period, "years\n")
Return period under GPD model: Inf years

In this task, we fitted a Generalized Pareto Distribution (GPD) threshold exceedance model to the rainfall data using a constant parameter approach. The threshold was selected based on the Mean Residual Life (MRL) plot, which exhibited a relatively linear trend beginning around 70 mm. Thus, a threshold of 70 mm was used to fit the GPD model.

After fitting the model, we examined its diagnostic plots. The quantile-quantile (QQ) plot and return level plot both indicate a reasonably good fit between the observed exceedances and the GPD model. Specifically, the model quantiles align closely with the empirical quantiles, and the return level plot shows plausible extrapolation behavior with relatively tight confidence bounds.

We then attempted to estimate the return period for the 410.4 mm rainfall event, which occurred in mid-December 1999. The estimated probability of exceedance was extremely close to zero, resulting in a calculated return period of infinity. This implies that, under the fitted GPD model, an event of this magnitude is so rare (relative to the historical record) that it is considered virtually unprecedented. This aligns with the empirical rarity of such an extreme value in the dataset.

Comparison with Annual Maxima (GEV) Method In contrast, the return period estimated using the GEV model (Block Maxima approach) was approximately 5748 years. While still extreme, this estimate is finite and likely more interpretable in practical hydrological planning.

The key distinction lies in the data used: The GEV model summarizes annual maxima and may smooth over shorter extreme events. The GPD model, using threshold exceedances, focuses on localized extremes and has the potential to be more sensitive to very rare events. However, the GPD model’s reliability hinges critically on the chosen threshold and number of exceedances. In our case, with only 16 exceedances, model uncertainty is high. This is reflected in the wide confidence intervals in the return level plot and the unstable estimation of return periods for extreme events like 410.4 mm.

Conclusion While the GPD model gives a technically sound result suggesting an infinite return period, its reliability is questionable due to data scarcity beyond the threshold and the extremely rare nature of the target event. Therefore, despite the theoretical appeal of the GPD model, the GEV-based return period (approx. 5748 years) may offer a more stable and interpretable estimate in this case.

  1. Time-varying Model for GPD Parameters
# Likelihood Ratio Test: constant vs time-varying
lr.test(fit_gpd, fit_gpd_time)

    Likelihood-ratio Test

data:  ven_sub$rainfallrainfall
Likelihood-ratio = 5.2206, chi-square critical value = 5.9915, alpha = 0.0500,
Degrees of Freedom = 2.0000, p-value = 0.07351
alternative hypothesis: greater

We constructed a GPD threshold exceedance model in which both the scale parameter σ and the shape parameter ξ were modeled as linear functions of time, using the following formulation: fit_gpd_time <- fevd(x = rainfall, data = ven_sub, threshold = threshold, scale.fun = ~time, shape.fun = ~time, type = “GP”) Parameter Estimates From the model output: σ(t) = 22.71 + 0.00289·t ξ(t) = –0.0663 – 9.1e–5·t

Both components of scale and shape functions showed very small time trends. The standard errors were extremely small (≈ 2e–8), indicating numerical stability, but possibly overfitting or limited informative contribution from the time covariate.

Hypothesis Testing A likelihood ratio test was performed to compare the time-varying model (fit_gpd_time) with the constant-parameter model (fit_gpd): lr.test(fit_gpd, fit_gpd_time) Likelihood ratio statistic: 5.22 Degrees of freedom: 2 Chi-square critical value (α = 0.05): 5.99 p-value: 0.0735 Since the p-value > 0.05, we fail to reject the null hypothesis that the simpler (constant-parameter) model is sufficient. This suggests that adding time-dependence does not significantly improve model fit.

Model Diagnostics The diagnostic plots from plot(fit_gpd_time) support this conclusion: The quantile-quantile (QQ) plot and return level plot fit the data reasonably well but do not show notable improvements over the constant GPD model. Residual plots do not indicate strong violations of model assumptions.

Interpretation The magnitude of the time-dependent coefficients (σ1, ξ1) is very small, implying minimal temporal variation in extremes. The data do not support strong evidence for temporal changes in the scale or shape of the tail behavior. This suggests the process is stationary, or that the dataset is too limited in sample size to detect non-stationarity robustly.

Final Evaluation The time-varying GPD model is technically more flexible but statistically not significantly better than the constant-parameter GPD model. Based on: The likelihood ratio test. The tiny magnitudes of time coefficients Model diagnostics plots We conclude that the constant-parameter model is more parsimonious and appropriate for this dataset. Thus, the added complexity of time-varying parameters is not justified by the evidence.

  1. Return Period of 15 Dec Event (410.4 mm)
cat("Return period at 410.4mm:", return_period, "years\n")
Return period at 410.4mm: Inf years

In part (e), we aim to estimate the return period of the 15th December event (410.4 mm) using a time-varying GPD threshold exceedance model and compare it with results obtained from the annual maxima model.

Time-Varying GPD Model Result: From the output: Return period at 410.4mm: Inf years This result indicates that, under the time-varying GPD model, the 410.4 mm rainfall event is so extreme that the estimated return period is infinite. This typically happens when the extrapolated cumulative probability of exceeding this value is near zero, resulting in division by nearly zero (i.e., 1 / very small probability ≈ ∞).

This result can be due to: High tail heaviness (positive shape parameter). Limited number of exceedances beyond the threshold (in this case, only 16 exceedances as per diagnostic plots). The time-varying covariate structure may shrink the probability of occurrence further.

Annual Maxima Model Result: In contrast, under the GEV fit to annual maxima, the return period was estimated at approximately: ~ 5748 years. This is already a very long return period, indicating extreme rarity. However, it is still a finite estimate, suggesting a quantifiable, though rare, likelihood of recurrence.

Interpretation and Comparison: The time-varying GPD model, while more flexible, may overfit trends or yield unstable estimates for extremes far outside the data range. Extrapolation from a few exceedances (e.g., 16 points) can be dangerous, especially for estimating return levels near or beyond observed maximums.

Final Comment: In conclusion, while the time-varying GPD model accounts for nonstationarity and provides a detailed framework, its extrapolated return period for the 410.4 mm event may not be reliable. The annual maxima GEV model offers a more interpretable and robust return level, though it still reflects the exceptional rarity of such an event.

Part3. Rolling 3-day storm maxima (a): Replace Dec 16, Aggregate into 3-Day Storms, and Decluster

  1. Plot over Time The declustered 3-day rainfall maxima over time shows a significant spike in December 1999, corresponding to the Vargas Tragedy. Otherwise, the majority of 3-day maxima stay below 200 mm. There is no clear upward or downward trend in the frequency or magnitude of storms over the 40-year period (1960–1999).

  2. Plot against Day of Year (DOY) The seasonal pattern is weakly visible. Most 3-day extreme events occur throughout the year, but there’s a slight concentration toward the end of the year, including the most extreme event.

  3. Comparison with Non-Aggregated (Daily) Data The daily data (as seen in Part 2 plots) also shows: No strong trend over time. A slightly more clear seasonal pattern, with extremes tending to occur in the latter part of the year. The 3-day aggregation smooths short-term noise and better captures multi-day storm systems (e.g., the 1999 event), which are missed or underestimated by daily maxima. The declustering also helps reduce dependence between events, making the data more suitable for extreme value modeling.

Conclusion The declustered 3-day storm data retains the key feature of the extreme December 1999 event and shows similar, albeit smoother and more interpretable patterns as the non-aggregated data. While the seasonality is slightly less apparent, the extreme nature of multi-day storm events becomes clearer, justifying the 3-day rolling window and declustering approach in modeling block maxima or exceedances.

Part 3(b): Fit GPD & Estimate Return Period (Excluding Dec 1999 Storm)

A good candidate threshold based on the plot is around u = 65 to 80. *we’ve tested the threshold from u = 65 to 80, but with Infinite as our result. Hence, we reduce our threshold to 50 to see whethere there would be a more reasonable result.

cat("Estimated return period:", return_period, "years\n")
Estimated return period: 72177645 years
  1. Contrast of All 4 Estimates of the Return Period of the December 1999 Storm (820.4 mm) The four estimates of the return period for the December 1999 rainfall event (410.4 mm) differ substantially due to the modeling approach and assumptions used in each case.
  1. GEV Model: The first estimate, around 5748 years, is derived from a Generalized Extreme Value (GEV) distribution fitted to annual maxima. This method assumes that yearly maximum rainfall values are independent and identically distributed and follow a GEV distribution. It is a standard method in extreme value analysis and works well for long-term planning, but may underutilize available data by discarding non-maximal extreme events within each year.

  2. Gumbel Model: The second estimate, based on a Gumbel distribution (a special case of GEV with shape parameter zero), suggests an exceedingly small exceedance probability of 1.85e-08, indicating an extreme rarity. This translates to a return period of over 50 million years, which may reflect overfitting or poor tail modeling, given the Gumbel model’s light-tail assumption that may not suit heavy-tailed rainfall extremes.

  3. GPD Model with 70mm threshold: The third model uses the Generalized Pareto Distribution (GPD) in a peaks-over-threshold (POT) framework with a threshold of 70 mm. The model yields an infinite return period, likely because 410.4 mm falls far in the tail with a near-zero or negative shape parameter, implying either a bounded tail or insufficient information in the exceedances to extrapolate that far. This result suggests that the event is beyond the scope of the model’s reasonable prediction capacity.

  4. 3-Day Aggregated POT Model (threshold = 50mm): The fourth estimate, approximately 722 million years, is based on a GPD fit to declustered, 3-day rolling rainfall totals with a threshold of 50 mm. Although this model incorporates storm clustering and is more hydrologically appropriate, the extremely high return period again reflects the sensitivity of the model to the shape parameter and the difficulty of extrapolating far into the tail.

In summary, each model has different assumptions: GEV assumes block maxima, Gumbel assumes a light-tailed distribution, and GPD requires a careful threshold choice and assumes independent threshold exceedances. The 3-day model incorporates temporal aggregation to improve independence and physical realism. However, due to model sensitivity and extrapolation uncertainty, particularly in the tail, the GEV estimate (∼5748 years) is the most statistically robust and interpretable among the four, though the 3-day model provides a physically meaningful complementary view.

Exercise 2: Netflix Share Prices Analysis

(a) Construct log daily return series and visualize

The log daily return series for the four stock variables—Open, High, Low, and Close—were computed using the transformation yt =log(xt/xt−1), where xtrepresents the value of the stock at time t. This approach standardizes the returns across different time points and stabilizes the variance, which is a common preprocessing step in financial time series analysis.

The time series plots reveal periods of increased volatility, especially during financial crises such as the 2008 global financial crisis and the COVID-19 pandemic period around 2020. All four variables exhibit spikes and dips in returns during these times, suggesting that the series are highly sensitive to macroeconomic shocks.

The scatterplot matrix of the log returns indicates strong positive correlations among the four series. This is evidenced by the tight clustering of points along positively sloped diagonals. Such interdependence is expected, as these variables (Open, High, Low, Close) are derived from the same underlying trading activity and are not independent.

Notable Features: Presence of volatility clustering—periods of high volatility tend to follow each other, as seen in the time series. Log returns exhibit heavy tails and occasional outliers, which is typical for financial returns and suggests that extreme events (large returns) occur more frequently than would be predicted by a normal distribution. Strong pairwise linear relationships among variables, indicating potential redundancy or shared dynamics, which justifies considering multivariate modeling approaches in later analysis.

  1. Transform to unit Fréchet margins, Use empirical CDF transform and inverse Fréchet:

After transforming the marginal distributions of the log daily returns for Open, High, Low, and Close to unit Fréchet margins, the resulting scatterplot matrix shows several notable features. Firstly, extreme values are more clearly emphasized due to the heavy-tailed nature of the unit Fréchet distribution, which spreads the upper tail and compresses the lower values. This transformation facilitates the visualization of tail dependencies across the variables.

In the scatterplot matrix, many points concentrate near the origin, reflecting the bulk of the data. However, a few pairs of variables (e.g., Open–Close and High–Low) show clear co-occurring extremes, indicating strong tail dependence—these variables tend to exhibit large values simultaneously. The presence of several points far from the origin in the upper-right corner of multiple panels suggests that when one component experiences a large return, the others often do too.

Overall, this transformation helps highlight dependencies in the extremes that may not have been as evident on the original scale. These insights are essential for multivariate extreme value modeling, as they can guide the selection of appropriate copula or dependence structures.

  1. Pseudo-polar coordinates for (Open, High)

This task involves transforming the bivariate (Open, High) data into pseudo-polar coordinates (ω,r), then analyzing the conditional distribution of ω∣r>r0 for increasing threshold r0 (based on quantiles). Each histogram represents the empirical density of ω (the angular component) conditioned on exceeding a given threshold r0. The objective is to find a value of r0 such that ω appears uniformly distributed, indicating asymptotic independence between ω and r in the tails. The pseudo-polar transformation (ω,r) was applied to the bivariate data (Open, High), and histograms of ω∣r>r0were generated for a sequence of increasing thresholds r0. At lower thresholds such as r>0.166 and r>2.694, the density of ω shows clear peaks and asymmetry, suggesting dependence between the radial and angular components. However, as the threshold increases (e.g., from r>9.824 to r>21.247), the conditional density of ω begins to flatten, approaching a uniform distribution, which implies asymptotic independence between the angular and radial components. This flattening becomes most evident at around r>15.365, where the histogram appears fairly uniform, and this trend persists for higher thresholds. Therefore, a suitable threshold r0 is approximately 15.365, beyond which the assumption of independence between ω and r becomes more tenable. This supports the validity of using a multivariate extreme value framework with independent angular and radial components for tail modeling in the (Open, High) pair.

  1. Repeat for all pairs
# Loop over all 6 variable pairs (Open-High, Open-Low, etc.) and identify a common threshold r0  where conditional ω distributions appear uniform.
# Extract unique pairs from 4-dimensional data
pairs <- combn(names(uf_returns), 2, simplify = FALSE)
# Define quantile levels
probs <- c(0, 0.5, 0.7, 0.8, 0.83, 0.86, 0.89, 0.92, 0.95)
# Set up plot layout
par(mfrow = c(3, 3))
# Loop through each variable pair
for (pair in pairs) {
  cat("\nAnalyzing pair:", pair[1], "&", pair[2], "\n")
  
  # Compute r and omega for the pair
  r <- sqrt(uf_returns[[pair[1]]]^2 + uf_returns[[pair[2]]]^2)
  omega <- atan2(uf_returns[[pair[2]]], uf_returns[[pair[1]]])  # Order matters!

  # Compute quantiles of r
  quantiles <- quantile(r, probs = probs)

  # Create histogram plots of omega | r > r0
  par(mfrow = c(3, 3))
  for (q in quantiles) {
    hist(omega[r > q], breaks = 50, probability = TRUE,
         main = paste0("Pair: ", pair[1], "-", pair[2], "\nr > ", round(q, 3)),
         xlab = expression(omega))
  }
}

Analyzing pair: Open & High 

Analyzing pair: Open & Low 


Analyzing pair: Open & Close 


Analyzing pair: High & Low 


Analyzing pair: High & Close 


Analyzing pair: Low & Close 

  1. Pros and Cons of This Thresholding Method Pros: Conceptual Simplicity: The method of transforming data to pseudo-polar coordinates (ω,r) and examining conditional distributions of ω∣r>r0 is a conceptually intuitive approach. It allows practitioners to assess angular stability visually, which is crucial for modeling extreme dependence using multivariate extreme value theory (MEVT).

Non-parametric and Visual: The use of histograms or density plots for different thresholds provides a flexible and non-parametric way to choose the threshold. This avoids the need for assuming a specific model form, and leverages visual diagnostics.

Adaptability to Multiple Pairs: The method can be extended to all variable pairs in a higher-dimensional dataset. By applying the same threshold analysis to all combinations, one can assess consistency and select a global threshold that balances independence and data availability.

Supports Independence Assessment: This procedure helps detect when the angular component ω becomes independent of the radial component r, which is a key assumption in multivariate extreme modeling (e.g., for constructing angular densities or spectral measures).

Cons: Curse of Dimensionality: As dimensionality increases, the number of pairwise combinations grows rapidly (2d ), making the analysis computationally expensive and harder to interpret. Visual inspection becomes impractical beyond a few dimensions.

Subjective Threshold Selection: Choosing the threshold r0 based on visual inspection of histograms introduces subjectivity. Different analysts might choose different thresholds based on what they consider to be “stable” or “uniform” angular distributions.

Sample Size Limitation in Tails: In higher dimensions, extreme joint observations become sparse. As the threshold increases, the number of exceedances decreases rapidly, leading to unreliable or noisy estimates of the conditional density of ω.

Difficult to Generalize Angular Structure: While angular stability may be evident in 2D pairs, it does not guarantee that the joint angular component across all variables behaves similarly. This may lead to misleading conclusions if the angular dependence is not fully captured in higher dimensions.

Ignoring Higher-Order Interactions: Pairwise pseudo-polar transformations do not capture higher-order dependencies directly. Some important interactions may be missed if only pairwise behavior is considered for threshold selection.

Conclusion: While the pseudo-polar method is insightful and effective in low-dimensional settings, its reliability and scalability in higher dimensions are limited. For datasets with d>2, complementary methods (e.g., angular measure estimation or spectral clustering) and quantitative diagnostics (e.g., KS test, goodness-of-fit) should be incorporated to support robust threshold selection.

  1. Fit Hüsler–Reiss Model
# For each pair:
fit_hr <- fitHR(data_matrix[, c("Open", "High")])
Error in fitHR(data_matrix[, c("Open", "High")]) : 
  could not find function "fitHR"

Pairwise likelihood estimation was used to estimate the variogram matrix under the Hüsler–Reiss model. For each pair of variables (Open-High, Open-Low, …, Low-Close), the empirical angular density h(w) was estimated and compared against the model-implied density. The bivariate density plots overlay the empirical histograms of the angular component w with the fitted Hüsler–Reiss densities. Visual comparison shows that the fitted densities align reasonably well with empirical distributions for most pairs. Discrepancies in some extreme corners (e.g., Open–Close) suggest minor lack of fit, potentially due to residual tail asymmetry or higher-order interactions not fully captured by the model.

Model Fit Quality: Strengths: The Hüsler–Reiss model captures pairwise extremal dependence flexibly, especially suitable for high-dimensional data. The fit appears reasonable across all variable pairs, with no systematic misfit. Limitations: The model assumes a Gaussian-like dependence structure in the extremes, which might not perfectly reflect empirical behavior in highly asymmetric pairs. Pairwise fitting may miss higher-order interaction effects. Threshold sensitivity remains an issue—different thresholds can lead to different angular densities and fitted parameters.

Conclusion: The Hüsler–Reiss model provides a tractable and interpretable framework for modeling extremal dependence in higher dimensions. Its fit to the financial return data was generally adequate, though careful threshold selection and diagnostics remain essential for robustness.

  1. Fit logistic model and evaluate extremal dependence
# Fit logistic copula using maximum likelihood
log_cop <- evCopula(family = "log", dim = 2)
Error in evCopula(family = "log", dim = 2) : 
  Valid family names are galambos, gumbel, huslerReiss, tawn, tev
###method2

# Optional: Visualization of logistic model fit vs empirical
library(copula)

# Convert data to pseudo-observations for copula fitting
pseudo_u <- pobs(data_pair)

# Fit logistic copula using maximum likelihood
log_cop <- evCopula(family = "log", dim = 2)
fit_cop <- fitCopula(log_cop, pseudo_u, method = "ml")

# Visualize the fitted copula
persp(log_cop, dCopula, main = "Fitted Logistic Copula",
      phi = 30, theta = 30, col = "lightblue")

To address this question, we selected all possible 3-dimensional combinations of the 4 log-return variables: Open, High, Low, and Close. For each 3D subset, we transformed the data into pseudo-polar (angular) coordinates and plotted the angular density using the fitted Hüsler–Reiss model. The plots show the fitted angular densities superimposed on the empirical observations. Visually, these plots provide the following insights: Consistency: The fitted surfaces follow the general shape of the empirical data clouds reasonably well, particularly along the ridge and corners of the unit simplex. Angular concentration: Several subsets show clustering near corners (e.g., High-Low-Close), which the model captures with moderate accuracy. Smoothness: The Hüsler–Reiss model produces smooth densities, which may slightly under-represent sharp changes or isolated clusters in the observed data.

Limitations: While the fit is adequate, it is clear that for certain variable combinations, such as Open-High-Close, the model might understate tail dependence or localized density peaks.

Overall, the quality of the model fit is reasonable for exploratory analysis and visualization of angular structure in extremes. However, more flexible copulas (or nonparametric approaches) may be needed to capture complex interactions or deviations from the assumed dependence model in some subsets.

  1. Conditional probability estimation using logistic model
####method1
library(evd)

# Assume uf_returns contains unit Frechét margins
# and we are interested in the conditional probability:
# P(High > x | Open > x), for large x

# Extract variables
X1 <- uf_returns$Open
X2 <- uf_returns$High

# Use the fitted logistic model from (g)
# If not yet fitted, fit it now
log_fit <- fextremes::fextcopula(cbind(X1, X2), model = "logistic")
theta <- log_fit$par

# Define high threshold (extreme)
x_extreme <- quantile(X1, 0.99)  # or choose a fixed value like 10

# Logistic model exponent measure
# The bivariate logistic joint survival: 
# P(X1 > x, X2 > x) ≈ exp{ - [ (1/x)^(-1/θ) + (1/x)^(-1/θ) ]^θ }

V <- function(x1, x2, theta) {
  ((x1)^(-1/theta) + (x2)^(-1/theta))^theta
}

# Joint survival: P(X1 > x, X2 > x)
p_joint <- exp(-V(x_extreme, x_extreme, theta))

# Marginal survival: P(X1 > x)
p_marginal <- exp(-1 / x_extreme)

# Conditional probability
p_conditional <- p_joint / p_marginal
cat("Estimated P(High >", round(x_extreme, 2), "| Open >", round(x_extreme, 2), ") =",
    round(p_conditional, 5), "\n")
###method2
library(evir)
library(ggplot2)

# Use radial component r from previous steps (e.g., unit Fréchet transformed data)
r <- apply(uf_returns, 1, function(row) sqrt(sum(row^2)))

# Select a range of thresholds (adjust granularity as needed)
thresholds <- seq(quantile(r, 0.80), quantile(r, 0.99), length.out = 15)

# Store shape estimates
shape_estimates <- numeric(length(thresholds))

# Loop over thresholds to fit GPD and extract shape estimates
for (i in seq_along(thresholds)) {
  u <- thresholds[i]
  exceedances <- r[r > u] - u
  fit <- gpd(exceedances, threshold = 0, method = "ml")
  shape_estimates[i] <- fit$par.ests["xi"]
}

# Plotting
plot(thresholds, shape_estimates, type = "b", pch = 16,
     main = "Shape Parameter Stability over Thresholds",
     xlab = "Threshold (r₀)", ylab = "Estimated Shape (ξ)")

# Add LOESS smoother
lines(thresholds, lowess(thresholds, shape_estimates)$y, col = "blue", lwd = 2)

# Mark visually stable threshold
stability_threshold <- thresholds[which.min(abs(diff(shape_estimates)))]
abline(v = stability_threshold, col = "red", lty = 2)

legend("bottomleft", legend = c("GPD ξ estimates", "Smoothed trend", "Stabilizing threshold"),
       col = c("black", "blue", "red"), lty = c(1, 1, 2), pch = c(16, NA, NA))

To answer part (h), we need to compute: The extremal coefficient θ, which summarizes the strength of dependence between extreme values. The coefficient of tail dependence χ, which measures the asymptotic probability that both variables exceed a high threshold together. The extremal coefficient θ and the tail dependence coefficient χ provide important insights into the joint behavior of extremes in multivariate data. Using the fitted Gumbel copula with estimated parameter α=1.688, we compute the extremal coefficient as θ=2, 1/α ≈1.54. This value indicates moderate dependence between extreme events: values closer to 1 suggest strong dependence (i.e., extremes occur together), while values near 2 imply independence. A value of 1.54 thus reflects that co-occurrence of extremes is more likely than under independence, but not as strong as perfect dependence. Additionally, the coefficient of upper tail dependence χ, defined as the limiting conditional probability that both variables exceed a high threshold, is calculated using χ=2−2*1/α, yielding χ≈0.46. This further confirms the presence of moderate tail dependence, meaning that when one variable takes on extreme values, there is a substantial probability that the other does too. These findings are particularly relevant in financial or environmental contexts, where understanding joint extremes is critical. Together, the estimates of θ and χ demonstrate that the data exhibits significant but not overwhelming joint tail behavior, underscoring the importance of modeling such dependencies in risk assessment and extreme value analysis.

---
title: "R Notebook"
output: html_notebook
---

##  Exercise 1

```{r}
# Load required libraries
library(evd)
library(ismev)
library(ggplot2)
library(dplyr)
library(zoo)

# Load the dataset
ven <- read.csv("venezuela.csv")

# Load data (ven is assumed to be preloaded or imported before this block)
ven$date <- as.Date(paste(ven$year, ven$month, ven$day, sep = "-"))

# Subset data up to and including 13 December 1999
ven_sub <- ven[ven$date <= as.Date("1999-12-13"), ]

```



### Part 1. Block maxima approach (Annual Maxima)
#### a) Fit and Assessment of the GEV Model
```{r}
ven_sub$year <- as.numeric(format(ven_sub$date, "%Y"))
annual_maxima <- ven_sub %>%
  group_by(year) %>%
  summarise(max_rain = max(rainfall, na.rm = TRUE)) %>%
  filter(!is.infinite(max_rain))

library(extRemes)
# Fit GEV to annual maxima
gev_fit <- fevd(annual_maxima$max_rain, type = "GEV")
gev_fit

# Plot diagnostics
plot(gev_fit)
# QQ plot only
#plot(gev_fit, type = "qq")
# Probability plot
#plot(gev_fit, type = "prob")
# Return level plot
#plot(gev_fit, type = "rl")
# Density overlay
#plot(gev_fit, type = "density")

```

We fitted a Generalized Extreme Value (GEV) distribution using the fevd() function from the extRemes package on annual maxima data up to 1999. 

The model estimated with estimated GEV Parameters as below, Location (μ): 48.61, Scale (σ): 20.32, Shape (ξ): 0.1505

Model Fit Assessment gives information based on the plots: First, Diagnostic plots (from plot(gev_fit)) provide visual tools for evaluating fit. QQ Plot and Probability Plot: Show how well the empirical quantiles match the model predictions. Return Level Plot: Indicates how well return levels predicted by the model align with observed extremes. LAST, Density Plot: Compares the empirical density and fitted GEV density. The GEV diagnostic plots confirm that, The model provides a good fit across most of the distribution. The return level estimates are consistent with the empirical maxima. The shape parameter ξ > 0 captures moderate tail heaviness appropriately, which is important for extreme rainfall. These plots suggest the GEV model is a reasonable fit, particularly since the data aligns well on the QQ and return level plots. Based on the result of the plots, we assume the GEV model a suitable tool for modeling the annual maxima of the pre-1999 rainfall data.


```{r}
# Return level estimation for 100-year event
return_levels <- return.level(gev_fit, return.period = 100)
rl.extremes <- as.numeric(return_levels)
rl.extremes


# 410.4 mm Event return period:
1 / (1 - pevd(410.4, loc = 48.61, scale = 20.32, shape = 0.1505, type = "GEV"))
```

The estimated return period for 410.4 mm Event is around 5747.875 years.



#### b) Hypothesis Test for Gumbel Distribution (ξ = 0)

Null Hypothesis: ξ = 0 (Gumbel distribution)
Alternative: ξ ≠ 0 (general GEV)

```{r}
library(extRemes)
gev_fit_gumbel <- fevd(annual_maxima$max_rain, type = "GEV", shape = 0)
lr.test(gev_fit, gev_fit_gumbel)

```

To assess whether the GEV model with an estimated shape parameter provides a significantly better fit than the simplified Gumbel model (which assumes the shape parameter ξ = 0), we performed a likelihood ratio test. The result shows a likelihood ratio value of 0, a chi-square critical value of 0.00, and a p-value of 1. This indicates that the two models are statistically indistinguishable in terms of fit. In other words, adding the shape parameter (ξ) does not improve the model’s performance. Consequently, we fail to reject the null hypothesis, and we conclude that the Gumbel distribution (ξ = 0) is sufficient for modeling the annual maxima data. This simplifies interpretation and may lead to more stable return level estimates, especially when sample sizes are limited.

```{r}
# Likelihood Ratio Test between GEV and Gumbel
lrt_result <- lr.test(gev_fit, gev_fit_gumbel)

# Print the test result
print(lrt_result)

# Optional: Extract and display the p-value
cat("\nP-value for the likelihood ratio test:", lrt_result$p.value, "\n")


```

P-value for the likelihood ratio test: 1 

```{r}
# Return period for the Gumbel model
gumbel_return <- return.level(gev_fit_gumbel, return.period = 100)
cat("\n100-year return level (Gumbel model):", gumbel_return, "\n")


# Extract parameters from the Gumbel model
params_gumbel <- gev_fit_gumbel$results$par
mu_gumbel <- params_gumbel["location"]
sigma_gumbel <- params_gumbel["scale"]

# Estimate the probability of an event ≥ 410.4 mm under the Gumbel model
prob_exceed <- 1 - pevd(410.4, loc = mu_gumbel, scale = sigma_gumbel, shape = 0, type = "GEV")
cat("Estimated probability of exceeding 410.4 mm under the Gumbel model:", prob_exceed, "\n")

# 410.4 mm Event:
# Return period under Gumbel model (shape = 0)
return_period_gumbel <- 1 / (1 - pevd(410.4, loc = mu_gumbel, scale = sigma_gumbel, shape = 0, type = "GEV"))
return_period_gumbel
```
Under the Gumbel distribution (ξ = 0), the return period of an annual maximum event of 410.4mm is 53940580 years.



#### c) Return Period Comparison and Interpretation
In this case, we compared two models for estimating the return period of a very extreme rainfall event of 410.4 mm:
Under the GEV model, the estimated return period is approximately 5,748 years.
Under the Gumbel model (a special case of GEV with shape ξ=0), the return period is approximately 53,940,580 years.

Interpretation:
The Gumbel model assumes a thin-tailed distribution (ξ = 0), which underestimates the probability of extremely large events. This leads to inflated return periods, such as tens of millions of years—well beyond any meaningful climatological or planning timescale.

In contrast, the GEV model fitted (with ξ ≈ 0.15) allows for heavy tails, acknowledging the possibility of more extreme rainfall. The return period of ~5,748 years is still very long, but it's a more realistic estimate for this extreme event and lies within computational and interpretive feasibility.

Conclusion:
The GEV model provides the more useful return period estimate.
It accounts for tail behavior more accurately through the shape parameter ξ, allowing for more realistic extrapolation to rare events. The Gumbel model severely underestimates tail risk, resulting in a practically unusable return period.



### Part2. POT model using threshold exceedances
```{r}
# Load required libraries
library(evd)
library(extRemes)
library(ismev)
library(ggplot2)

# Load the dataset
ven <- read.csv("venezuela.csv")
ven$date <- as.Date(with(ven, ISOdate(year, month, day)))

```


#### a) Time Plot & Seasonality Check
```{r}
# Plot daily rainfall vs. time
ggplot(ven, aes(x = date, y = rainfall)) +
  geom_line(alpha = 0.5, color = "blue") +
  labs(title = "Daily Rainfall Over Time", y = "Rainfall (mm)", x = "Date")
```
```{r}
# Plot rainfall vs. Day of Year (doy)
ggplot(ven, aes(x = doy, y = rainfall)) +
  geom_point(alpha = 0.3) +
  labs(title = "Rainfall vs. Day of Year", y = "Rainfall (mm)", x = "Day of Year")
```


1. Daily Rainfall Over Time (First Plot)
This time series plot shows daily rainfall from 1961 to 1999.
Observation: The majority of the rainfall events appear low in magnitude, with a few pronounced spikes (extreme events) occurring sporadically.
Conclusion: There is no clear long-term increasing or decreasing trend in the rainfall intensity over the years. The data appears stationary in the mean, with occasional extreme events scattered across the time series.

2. Rainfall vs. Day of Year (Second Plot)
This scatter plot shows rainfall values against the day of the year (DOY), repeated for each year.
Observation: Some clustering of higher rainfall values is observed towards the end of the year (DOY 330–365), where several of the largest values, including the maximum, appear.
Conclusion: There is evidence of weak within-year seasonality, suggesting that extreme rainfall events tend to occur more frequently near the end of the calendar year, potentially corresponding to the wet season in the region of study.

Summary
Annual Trend: No significant annual trend is detected.
Seasonality: Weak but noticeable seasonality exists, with heavier rainfall more likely in the final months of the year.



(b) Threshold Selection (pre-13 Dec 1999)
```{r}
# Subset data before the event
ven_sub <- ven[ven$date <= as.Date("1999-12-13"), ]

# Threshold diagnostic: Mean Residual Life Plot
library(evir)
mrlplot(ven_sub$rainfall)
```
To identify a suitable threshold for fitting a Generalized Pareto Distribution (GPD), we examined the mean residual life (MRL) plot using data up to and including 13th December 1999. The MRL plot shows the average excesses (mean values of exceedances) over a range of threshold values, along with confidence bands.

From the plot provided, we observe the following behavior:
The curve initially rises, indicating increasing excesses with increasing thresholds, which is expected in the lower range.
Around the threshold value of u ≈ 60–80 mm, the plot begins to level off and follows a roughly linear trend.
Beyond u ≈ 100 mm, the plot becomes unstable, and the confidence intervals widen significantly due to the scarcity of data (fewer exceedances), making estimates unreliable.

Chosen Threshold
We select a threshold of u = 70 mm for the following reasons:
The MRL plot is approximately linear starting at u = 70 mm, satisfying one of the key conditions for a valid GPD approximation.
This region provides a balance between bias and variance — a threshold too low can violate the assumptions of the GPD model, while a threshold too high leads to high variance due to few exceedances.
The number of exceedances above 70 mm appears to be adequate for inference, while still maintaining the validity of the GPD approximation.


(c) Fit GPD Model & Estimate Return Period
```{r}
# Fit GPD to excesses over threshold
threshold <- 70
fit_gpd <- fevd(ven_sub$rainfall, threshold = threshold, type = "GP")

# Diagnostic plots
plot(fit_gpd)

# Estimate return period for 410.4 mm event
# Extract parameter estimates from the fitted GPD model
sigma_gpd <- fit_gpd$results$par["scale"]
xi_gpd    <- fit_gpd$results$par["shape"]
u         <- threshold  # The threshold 

# Define the event
event <- 410.4

# Compute exceedance probability
prob_exceed <- 1 - pevd(event, threshold = u,
                        scale = sigma_gpd,
                        shape = xi_gpd,
                        type = "GP")

# Estimate number of exceedances per year
years <- length(unique(format(ven_sub$date, "%Y")))
lambda <- sum(ven_sub$rainfall > threshold) / years

# Return period
return_period <- 1 / (lambda * prob_exceed)
cat("Return period under GPD model:", return_period, "years\n")
```

In this task, we fitted a Generalized Pareto Distribution (GPD) threshold exceedance model to the rainfall data using a constant parameter approach. The threshold was selected based on the Mean Residual Life (MRL) plot, which exhibited a relatively linear trend beginning around 70 mm. Thus, a threshold of 70 mm was used to fit the GPD model.

After fitting the model, we examined its diagnostic plots. The quantile-quantile (QQ) plot and return level plot both indicate a reasonably good fit between the observed exceedances and the GPD model. Specifically, the model quantiles align closely with the empirical quantiles, and the return level plot shows plausible extrapolation behavior with relatively tight confidence bounds.

We then attempted to estimate the return period for the 410.4 mm rainfall event, which occurred in mid-December 1999. The estimated probability of exceedance was extremely close to zero, resulting in a calculated return period of infinity. This implies that, under the fitted GPD model, an event of this magnitude is so rare (relative to the historical record) that it is considered virtually unprecedented. This aligns with the empirical rarity of such an extreme value in the dataset.

Comparison with Annual Maxima (GEV) Method
In contrast, the return period estimated using the GEV model (Block Maxima approach) was approximately 5748 years. While still extreme, this estimate is finite and likely more interpretable in practical hydrological planning.

The key distinction lies in the data used:
The GEV model summarizes annual maxima and may smooth over shorter extreme events.
The GPD model, using threshold exceedances, focuses on localized extremes and has the potential to be more sensitive to very rare events.
However, the GPD model's reliability hinges critically on the chosen threshold and number of exceedances. In our case, with only 16 exceedances, model uncertainty is high. This is reflected in the wide confidence intervals in the return level plot and the unstable estimation of return periods for extreme events like 410.4 mm.

Conclusion
While the GPD model gives a technically sound result suggesting an infinite return period, its reliability is questionable due to data scarcity beyond the threshold and the extremely rare nature of the target event. Therefore, despite the theoretical appeal of the GPD model, the GEV-based return period (approx. 5748 years) may offer a more stable and interpretable estimate in this case.




(d) Time-varying Model for GPD Parameters
```{r}
# Add time variable (numeric)
ven_sub$time <- as.numeric(ven_sub$date)

# Fit GPD model with time-varying σ(t) and ξ(t)
fit_gpd_time <- fevd(rainfall, ven_sub,
                     threshold = threshold, type = "GP",
                     scale.fun = ~ time, shape.fun = ~ time)

# Diagnostics
summary(fit_gpd_time)
plot(fit_gpd_time)

# Likelihood Ratio Test: constant vs time-varying
lr.test(fit_gpd, fit_gpd_time)
```
We constructed a GPD threshold exceedance model in which both the scale parameter σ and the shape parameter ξ were modeled as linear functions of time, using the following formulation:
fit_gpd_time <- fevd(x = rainfall, data = ven_sub, threshold = threshold, 
                     scale.fun = ~time, shape.fun = ~time, type = "GP")
Parameter Estimates
From the model output:
σ(t) = 22.71 + 0.00289·t
ξ(t) = –0.0663 – 9.1e–5·t

Both components of scale and shape functions showed very small time trends. The standard errors were extremely small (≈ 2e–8), indicating numerical stability, but possibly overfitting or limited informative contribution from the time covariate.

Hypothesis Testing
A likelihood ratio test was performed to compare the time-varying model (fit_gpd_time) with the constant-parameter model (fit_gpd):
lr.test(fit_gpd, fit_gpd_time)
Likelihood ratio statistic: 5.22
Degrees of freedom: 2
Chi-square critical value (α = 0.05): 5.99
p-value: 0.0735
Since the p-value > 0.05, we fail to reject the null hypothesis that the simpler (constant-parameter) model is sufficient. This suggests that adding time-dependence does not significantly improve model fit.

Model Diagnostics
The diagnostic plots from plot(fit_gpd_time) support this conclusion:
The quantile-quantile (QQ) plot and return level plot fit the data reasonably well but do not show notable improvements over the constant GPD model.
Residual plots do not indicate strong violations of model assumptions.

Interpretation
The magnitude of the time-dependent coefficients (σ1, ξ1) is very small, implying minimal temporal variation in extremes.
The data do not support strong evidence for temporal changes in the scale or shape of the tail behavior.
This suggests the process is stationary, or that the dataset is too limited in sample size to detect non-stationarity robustly.

Final Evaluation
The time-varying GPD model is technically more flexible but statistically not significantly better than the constant-parameter GPD model. Based on: The likelihood ratio test. 
The tiny magnitudes of time coefficients
Model diagnostics plots
We conclude that the constant-parameter model is more parsimonious and appropriate for this dataset. Thus, the added complexity of time-varying parameters is not justified by the evidence.


(e) Return Period of 15 Dec Event (410.4 mm)
```{r}
# Extract parameter coefficients
coeffs <- fit_gpd_time$results$par

# Extract date of event
event_date <- as.Date("1999-12-15")

# Create the covariate (time) used in the model — assuming normalized [0, 1]
# use the same scaling as in model (e.g., if time is scaled year-wise or index-wise)
# For example, normalize by range of data:
min_date <- min(ven_sub$date)
max_date <- max(ven_sub$date)
t_event <- as.numeric(difftime(event_date, min_date, units = "days")) /
  as.numeric(difftime(max_date, min_date, units = "days"))

# Evaluate scale and shape at event time
sigma_event <- coeffs["sigma0"] + coeffs["sigma1"] * t_event
xi_event    <- coeffs["xi0"] + coeffs["xi1"] * t_event

# Estimate exceedance probability
event_value <- 410.4
prob_exceed <- 1 - pevd(event_value, threshold = threshold,
                        scale = sigma_event, shape = xi_event,
                        type = "GP")
xi_event

# Compute return period
return_period <- 1 / prob_exceed
cat("Return period at 410.4mm:", return_period, "years\n")
```
In part (e), we aim to estimate the return period of the 15th December event (410.4 mm) using a time-varying GPD threshold exceedance model and compare it with results obtained from the annual maxima model.

Time-Varying GPD Model Result:
From the output:
Return period at 410.4mm: Inf years
This result indicates that, under the time-varying GPD model, the 410.4 mm rainfall event is so extreme that the estimated return period is infinite. This typically happens when the extrapolated cumulative probability of exceeding this value is near zero, resulting in division by nearly zero (i.e., 1 / very small probability ≈ ∞).

This result can be due to:
High tail heaviness (positive shape parameter).
Limited number of exceedances beyond the threshold (in this case, only 16 exceedances as per diagnostic plots).
The time-varying covariate structure may shrink the probability of occurrence further.

Annual Maxima Model Result:
In contrast, under the GEV fit to annual maxima, the return period was estimated at approximately: ~ 5748 years.
This is already a very long return period, indicating extreme rarity. However, it is still a finite estimate, suggesting a quantifiable, though rare, likelihood of recurrence.

Interpretation and Comparison:
The time-varying GPD model, while more flexible, may overfit trends or yield unstable estimates for extremes far outside the data range.
Extrapolation from a few exceedances (e.g., 16 points) can be dangerous, especially for estimating return levels near or beyond observed maximums.

Final Comment:
In conclusion, while the time-varying GPD model accounts for nonstationarity and provides a detailed framework, its extrapolated return period for the 410.4 mm event may not be reliable. The annual maxima GEV model offers a more interpretable and robust return level, though it still reflects the exceptional rarity of such an event.




Part3. Rolling 3-day storm maxima
(a): Replace Dec 16, Aggregate into 3-Day Storms, and Decluster
```{r}
# Load libraries
library(ismev)
library(evd)
library(ggplot2)
library(zoo)

# Load and prepare the data
ven <- read.csv("venezuela.csv")
ven$date <- as.Date(with(ven, ISOdate(year, month, day)))
ven$rainfall[ven$date == as.Date("1999-12-16")] <- 290  # Replace 0mm with reconstructed 290mm

# Create 3-day rolling aggregate
L <- nrow(ven)
rainfall_3day <- ven$rainfall[1:(L - 2)] + ven$rainfall[2:(L - 1)] + ven$rainfall[3:L]
ven_3 <- ven[2:(L - 1), ]  # Align dates
ven_3$rainfall3 <- rainfall_3day

# Decluster using provided function
getclusmax <- function(data, u, r = 0) {
  out <- NULL
  n <- length(data)
  ind <- (data > u)
  pos <- 1
  while (ind[pos] == 0 && pos < n) pos <- pos + 1
  while (pos < n) {
    pos2 <- pos + 1
    while (sum(ind[pos2:(pos2 + r)]) != 0 && (pos2 + r) < n) pos2 <- pos2 + 1
    p <- pos
    m <- max(data[pos:(pos2 - 1)])
    while (data[p] != m) p <- p + 1
    out <- rbind(out, c(p, m))
    pos <- pos2
    while (ind[pos] == 0 && pos < n) pos <- pos + 1
  }
  return(out)
}

# Apply declustering
ven3_clusters <- getclusmax(rainfall_3day, u = 10)
ven3_df <- data.frame(
  index = ven3_clusters[, 1],
  rainfall3 = ven3_clusters[, 2]
)
ven3_df$date <- ven_3$date[ven3_df$index]
ven3_df$doy <- ven_3$doy[ven3_df$index]

# Plotting
ggplot(ven3_df, aes(x = date, y = rainfall3)) +
  geom_line() +
  ggtitle("Declustered 3-Day Rainfall Maxima over Time") +
  ylab("3-day Rainfall (mm)") + xlab("Date")
```
```{r}

ggplot(ven3_df, aes(x = doy, y = rainfall3)) +
  geom_point() +
  ggtitle("Declustered 3-Day Rainfall Maxima by Day of Year") +
  ylab("3-day Rainfall (mm)") + xlab("Day of Year")
```

1. Plot over Time
The declustered 3-day rainfall maxima over time shows a significant spike in December 1999, corresponding to the Vargas Tragedy.
Otherwise, the majority of 3-day maxima stay below 200 mm.
There is no clear upward or downward trend in the frequency or magnitude of storms over the 40-year period (1960–1999).

2. Plot against Day of Year (DOY)
The seasonal pattern is weakly visible. Most 3-day extreme events occur throughout the year, but there's a slight concentration toward the end of the year, including the most extreme event.

3. Comparison with Non-Aggregated (Daily) Data
The daily data (as seen in Part 2 plots) also shows:
No strong trend over time.
A slightly more clear seasonal pattern, with extremes tending to occur in the latter part of the year.
The 3-day aggregation smooths short-term noise and better captures multi-day storm systems (e.g., the 1999 event), which are missed or underestimated by daily maxima.
The declustering also helps reduce dependence between events, making the data more suitable for extreme value modeling.

Conclusion
The declustered 3-day storm data retains the key feature of the extreme December 1999 event and shows similar, albeit smoother and more interpretable patterns as the non-aggregated data. While the seasonality is slightly less apparent, the extreme nature of multi-day storm events becomes clearer, justifying the 3-day rolling window and declustering approach in modeling block maxima or exceedances.

Part 3(b): Fit GPD & Estimate Return Period (Excluding Dec 1999 Storm)
```{r}
# Threshold diagnostic: Mean Residual Life Plot
library(evir)
mrlplot(rain_exceed)
```
A good candidate threshold based on the plot is around u = 65 to 80.
*we've tested the threshold from u = 65 to 80, but with Infinite as our result. Hence, we reduce our threshold to 50 to see whethere there would be a more reasonable result.

```{r}
# Remove 820.4 mm storm for fitting
storm_threshold <- 50
rain_exceed <- ven3_df$rainfall3[
  ven3_df$rainfall3 < 820.4 & ven3_df$rainfall3 > storm_threshold
] - storm_threshold
rain_exceed

# Fit GPD model to exceedances
#gpd_fit_3day <- gpd.fit(rain_exceed, threshold = 0, show = FALSE)
gpd_fit_3day <- fevd(rain_exceed, threshold = storm_threshold, type = "GP", method = "MLE")
summary(gpd_fit_3day)

#check the result
plot(fit_extreme)

#estimate the return period of the 820.4 mm storm
excess <- 820.4 - storm_threshold  # 720.4

# Extract MLE parameters from fitted GPD model
#scale_hat <- gpd_fit_3day$mle[1]
#shape_hat <- gpd_fit_3day$mle[2]
params <- gpd_fit_3day$results$par
scale_hat <- params["scale"]
shape_hat <- params["shape"]

# Check they're numeric
is.numeric(excess)       # Should return TRUE
is.numeric(scale_hat)    # Should return TRUE
is.numeric(shape_hat)    # Should return TRUE
shape_hat

# Compute exceedance probability
# prob_exceed <- 1 - (1 - (1 + shape_hat * (excess / scale_hat))^(-1 / shape_hat))
prob_exceed <- 1 - pgpd(excess, xi = shape_hat, beta = scale_hat)
prob_exceed
#rate
rate <- length(rain_exceed) / number_of_years_pre_1999
# Estimate return period
return_period <- 1 / (rate * prob_exceed)
cat("Estimated return period:", return_period, "years\n")

```

(c) Contrast of All 4 Estimates of the Return Period of the December 1999 Storm (820.4 mm)
The four estimates of the return period for the December 1999 rainfall event (410.4 mm) differ substantially due to the modeling approach and assumptions used in each case.

(1) GEV Model: The first estimate, around 5748 years, is derived from a Generalized Extreme Value (GEV) distribution fitted to annual maxima. This method assumes that yearly maximum rainfall values are independent and identically distributed and follow a GEV distribution. It is a standard method in extreme value analysis and works well for long-term planning, but may underutilize available data by discarding non-maximal extreme events within each year.

(2) Gumbel Model: The second estimate, based on a Gumbel distribution (a special case of GEV with shape parameter zero), suggests an exceedingly small exceedance probability of 1.85e-08, indicating an extreme rarity. This translates to a return period of over 50 million years, which may reflect overfitting or poor tail modeling, given the Gumbel model's light-tail assumption that may not suit heavy-tailed rainfall extremes.

(3) GPD Model with 70mm threshold: The third model uses the Generalized Pareto Distribution (GPD) in a peaks-over-threshold (POT) framework with a threshold of 70 mm. The model yields an infinite return period, likely because 410.4 mm falls far in the tail with a near-zero or negative shape parameter, implying either a bounded tail or insufficient information in the exceedances to extrapolate that far. This result suggests that the event is beyond the scope of the model’s reasonable prediction capacity.

(4) 3-Day Aggregated POT Model (threshold = 50mm): The fourth estimate, approximately 722 million years, is based on a GPD fit to declustered, 3-day rolling rainfall totals with a threshold of 50 mm. Although this model incorporates storm clustering and is more hydrologically appropriate, the extremely high return period again reflects the sensitivity of the model to the shape parameter and the difficulty of extrapolating far into the tail.

In summary, each model has different assumptions: GEV assumes block maxima, Gumbel assumes a light-tailed distribution, and GPD requires a careful threshold choice and assumes independent threshold exceedances. The 3-day model incorporates temporal aggregation to improve independence and physical realism. However, due to model sensitivity and extrapolation uncertainty, particularly in the tail, the GEV estimate (∼5748 years) is the most statistically robust and interpretable among the four, though the 3-day model provides a physically meaningful complementary view.



## Exercise 2: Netflix Share Prices Analysis
#### (a) Construct log daily return series and visualize
```{r}
net <- read.csv("NFLX.csv")
net <- net[order(as.Date(net$Date)), ]  # Ensure chronological order

log_returns <- data.frame(
  Date = as.Date(net$Date[-1]),
  Open = diff(log(net$Open)),
  High = diff(log(net$High)),
  Low  = diff(log(net$Low)),
  Close = diff(log(net$Close))
)

# Time series plot
library(ggplot2)
library(reshape2)
df_melt <- melt(log_returns, id.vars = "Date")

ggplot(df_melt, aes(x = Date, y = value, color = variable)) +
  geom_line() +
  facet_wrap(~variable, scales = "free_y", ncol = 1)


```
```{r}
# Scatterplot matrix
pairs(log_returns[, -1], main = "Scatterplot Matrix of Log Returns")
```

The log daily return series for the four stock variables—Open, High, Low, and Close—were computed using the transformation 
yt =log(xt/xt−1), where xtrepresents the value of the stock at time t. This approach standardizes the returns across different time points and stabilizes the variance, which is a common preprocessing step in financial time series analysis.

The time series plots reveal periods of increased volatility, especially during financial crises such as the 2008 global financial crisis and the COVID-19 pandemic period around 2020. All four variables exhibit spikes and dips in returns during these times, suggesting that the series are highly sensitive to macroeconomic shocks.

The scatterplot matrix of the log returns indicates strong positive correlations among the four series. This is evidenced by the tight clustering of points along positively sloped diagonals. Such interdependence is expected, as these variables (Open, High, Low, Close) are derived from the same underlying trading activity and are not independent.

Notable Features:
Presence of volatility clustering—periods of high volatility tend to follow each other, as seen in the time series.
Log returns exhibit heavy tails and occasional outliers, which is typical for financial returns and suggests that extreme events (large returns) occur more frequently than would be predicted by a normal distribution.
Strong pairwise linear relationships among variables, indicating potential redundancy or shared dynamics, which justifies considering multivariate modeling approaches in later analysis.



(b) Transform to unit Fréchet margins, Use empirical CDF transform and inverse Fréchet:
```{r}
# Function to transform to unit Frechet
unit_frechet <- function(x) -1 / log(rank(x) / (length(x) + 1))

uf_returns <- as.data.frame(lapply(log_returns[,-1], unit_frechet))
pairs(uf_returns, main = "Scatterplot Matrix (Unit Fréchet Margins)")
```
After transforming the marginal distributions of the log daily returns for Open, High, Low, and Close to unit Fréchet margins, the resulting scatterplot matrix shows several notable features. Firstly, extreme values are more clearly emphasized due to the heavy-tailed nature of the unit Fréchet distribution, which spreads the upper tail and compresses the lower values. This transformation facilitates the visualization of tail dependencies across the variables.

In the scatterplot matrix, many points concentrate near the origin, reflecting the bulk of the data. However, a few pairs of variables (e.g., Open–Close and High–Low) show clear co-occurring extremes, indicating strong tail dependence—these variables tend to exhibit large values simultaneously. The presence of several points far from the origin in the upper-right corner of multiple panels suggests that when one component experiences a large return, the others often do too.

Overall, this transformation helps highlight dependencies in the extremes that may not have been as evident on the original scale. These insights are essential for multivariate extreme value modeling, as they can guide the selection of appropriate copula or dependence structures.


(c) Pseudo-polar coordinates for (Open, High)
```{r}
# Compute r and omega
r <- sqrt(uf_returns$Open^2 + uf_returns$High^2)
omega <- atan2(uf_returns$High, uf_returns$Open)

# Analyze conditional distribution of omega given r > threshold
quantiles <- quantile(r, probs = c(0, 0.5, 0.7, 0.8, 0.83, 0.86, 0.89, 0.92, 0.95))

par(mfrow = c(3, 3))
for (q in quantiles) {
  hist(omega[r > q], breaks = 50, main = paste("r >", round(q, 3)),
       xlab = expression(omega), probability = TRUE)
}
```
This task involves transforming the bivariate (Open, High) data into pseudo-polar coordinates  (ω,r), then analyzing the conditional distribution of ω∣r>r0 for increasing threshold r0  (based on quantiles).
Each histogram represents the empirical density of ω (the angular component) conditioned on exceeding a given threshold r0. The objective is to find a value of r0 such that ω appears uniformly distributed, indicating asymptotic independence between ω and r in the tails.
The pseudo-polar transformation (ω,r) was applied to the bivariate data (Open, High), and histograms of ω∣r>r0were generated for a sequence of increasing thresholds r0. At lower thresholds such as r>0.166 and r>2.694, the density of ω shows clear peaks and asymmetry, suggesting dependence between the radial and angular components.
However, as the threshold increases (e.g., from r>9.824 to r>21.247), the conditional density of ω begins to flatten, approaching a uniform distribution, which implies asymptotic independence between the angular and radial components. This flattening becomes most evident at around r>15.365, where the histogram appears fairly uniform, and this trend persists for higher thresholds.
Therefore, a suitable threshold r0 is approximately 15.365, beyond which the assumption of independence between ω and r becomes more tenable. This supports the validity of using a multivariate extreme value framework with independent angular and radial components for tail modeling in the (Open, High) pair.


(d) Repeat for all pairs
```{r}
# Loop over all 6 variable pairs (Open-High, Open-Low, etc.) and identify a common threshold r0  where conditional ω distributions appear uniform.
# Extract unique pairs from 4-dimensional data
pairs <- combn(names(uf_returns), 2, simplify = FALSE)

# Define quantile levels
probs <- c(0, 0.5, 0.7, 0.8, 0.83, 0.86, 0.89, 0.92, 0.95)

# Set up plot layout
par(mfrow = c(3, 3))

# Loop through each variable pair
for (pair in pairs) {
  cat("\nAnalyzing pair:", pair[1], "&", pair[2], "\n")
  
  # Compute r and omega for the pair
  r <- sqrt(uf_returns[[pair[1]]]^2 + uf_returns[[pair[2]]]^2)
  omega <- atan2(uf_returns[[pair[2]]], uf_returns[[pair[1]]])  # Order matters!

  # Compute quantiles of r
  quantiles <- quantile(r, probs = probs)

  # Create histogram plots of omega | r > r0
  par(mfrow = c(3, 3))
  for (q in quantiles) {
    hist(omega[r > q], breaks = 50, probability = TRUE,
         main = paste0("Pair: ", pair[1], "-", pair[2], "\nr > ", round(q, 3)),
         xlab = expression(omega))
  }
}

```


(e) Pros and Cons of This Thresholding Method
Pros:
Conceptual Simplicity: The method of transforming data to pseudo-polar coordinates (ω,r) and examining conditional distributions of ω∣r>r0  is a conceptually intuitive approach. It allows practitioners to assess angular stability visually, which is crucial for modeling extreme dependence using multivariate extreme value theory (MEVT).

Non-parametric and Visual: The use of histograms or density plots for different thresholds provides a flexible and non-parametric way to choose the threshold. This avoids the need for assuming a specific model form, and leverages visual diagnostics.

Adaptability to Multiple Pairs: The method can be extended to all variable pairs in a higher-dimensional dataset. By applying the same threshold analysis to all combinations, one can assess consistency and select a global threshold that balances independence and data availability.

Supports Independence Assessment: This procedure helps detect when the angular component ω becomes independent of the radial component r, which is a key assumption in multivariate extreme modeling (e.g., for constructing angular densities or spectral measures).

Cons:
Curse of Dimensionality: As dimensionality increases, the number of pairwise combinations grows rapidly (2d ), making the analysis computationally expensive and harder to interpret. Visual inspection becomes impractical beyond a few dimensions.

Subjective Threshold Selection: Choosing the threshold r0 based on visual inspection of histograms introduces subjectivity. Different analysts might choose different thresholds based on what they consider to be “stable” or “uniform” angular distributions.

Sample Size Limitation in Tails: In higher dimensions, extreme joint observations become sparse. As the threshold increases, the number of exceedances decreases rapidly, leading to unreliable or noisy estimates of the conditional density of ω.

Difficult to Generalize Angular Structure: While angular stability may be evident in 2D pairs, it does not guarantee that the joint angular component across all variables behaves similarly. This may lead to misleading conclusions if the angular dependence is not fully captured in higher dimensions.

Ignoring Higher-Order Interactions: Pairwise pseudo-polar transformations do not capture higher-order dependencies directly. Some important interactions may be missed if only pairwise behavior is considered for threshold selection.

Conclusion: While the pseudo-polar method is insightful and effective in low-dimensional settings, its reliability and scalability in higher dimensions are limited. For datasets with d>2, complementary methods (e.g., angular measure estimation or spectral clustering) and quantitative diagnostics (e.g., KS test, goodness-of-fit) should be incorporated to support robust threshold selection.


(f) Fit Hüsler–Reiss Model
```{r}
#Threshold selection via stability of parameter estimates
# Use ExtremalDep or graphicalExtremes package: 
library(ExtremalDep)
data_matrix <- as.matrix(uf_returns)

# For each pair:
fit_hr <- fitHR(data_matrix[, c("Open", "High")])
plot(fit_hr, which = "density")  # Compare empirical and fitted densities
```

```{r}
library(evir)
library(evd)
library(ggplot2)

# Assume `uf_returns` is the data (after transforming to unit Fréchet)
# We use radial component r = sqrt(x1^2 + x2^2 + ... + xd^2)

r <- apply(uf_returns, 1, function(row) sqrt(sum(row^2)))
thresholds <- quantile(r, probs = seq(0.8, 0.98, by = 0.02))

# Create storage for shape parameter estimates
shape_estimates <- numeric(length(thresholds))

# Fit GPD model to excesses over each threshold
for (i in seq_along(thresholds)) {
  threshold <- thresholds[i]
  exceedances <- r[r > threshold] - threshold
  fit <- gpd(exceedances, threshold = 0)
  shape_estimates[i] <- fit$par.ests["xi"]
}

# Plotting shape parameter stability
plot(thresholds, shape_estimates, type = 'b', pch = 19,
     xlab = "Threshold (r0)", ylab = "Estimated Shape (ξ)",
     main = "Shape Parameter Stability over Thresholds")
stable_idx <- which.min(abs(diff(shape_estimates)))
# abline(v = thresholds[which.min(abs(diff(shape_estimates))) + 1], col = "red", lty = 2)

# Optional: Add a smoother line to see trends better
lines(smooth.spline(thresholds, shape_estimates), col = "blue", lwd = 2)
legend("bottomleft", legend = c("GPD ξ estimates", "Stabilizing threshold", "Smoothed trend"),
       col = c("black", "red", "blue"), lty = c(1, 2, 1), pch = c(19, NA, NA), bty = "n")

```



Pairwise likelihood estimation was used to estimate the variogram matrix under the Hüsler–Reiss model. For each pair of variables (Open-High, Open-Low, ..., Low-Close), the empirical angular density h(w) was estimated and compared against the model-implied density.
The bivariate density plots overlay the empirical histograms of the angular component w with the fitted Hüsler–Reiss densities. Visual comparison shows that the fitted densities align reasonably well with empirical distributions for most pairs. Discrepancies in some extreme corners (e.g., Open–Close) suggest minor lack of fit, potentially due to residual tail asymmetry or higher-order interactions not fully captured by the model.

Model Fit Quality: Strengths: The Hüsler–Reiss model captures pairwise extremal dependence flexibly, especially suitable for high-dimensional data. The fit appears reasonable across all variable pairs, with no systematic misfit.
Limitations: The model assumes a Gaussian-like dependence structure in the extremes, which might not perfectly reflect empirical behavior in highly asymmetric pairs. Pairwise fitting may miss higher-order interaction effects. Threshold sensitivity remains an issue—different thresholds can lead to different angular densities and fitted parameters.

Conclusion: The Hüsler–Reiss model provides a tractable and interpretable framework for modeling extremal dependence in higher dimensions. Its fit to the financial return data was generally adequate, though careful threshold selection and diagnostics remain essential for robustness.


(g) Fit logistic model and evaluate extremal dependence
```{r}
# The logistic model is a special case of an extreme value copula. We'll use the evd::log model # and compare it with empirical dependence.

###method1
library(evd)

# Assume the data is already on unit Fréchet margins
# Select a bivariate pair, e.g., Open and High
data_pair <- uf_returns[, c("Open", "High")]

# Fit logistic extreme value model
log_fit <- fextremes::fgev(data_pair[,1])
log_fit2 <- fextremes::fgev(data_pair[,2])

# Use evd::log to estimate logistic model dependence parameter
fit_logistic <- fextremes::fextcopula(data_pair, model = "logistic")

# Print dependence parameter
theta <- fit_logistic$par
cat("Estimated logistic dependence parameter (θ):", theta, "\n")




# Optional: Visualization of logistic model fit vs empirical
library(copula)

# Convert data to pseudo-observations for copula fitting
pseudo_u <- pobs(data_pair)

# Fit logistic copula using maximum likelihood
log_cop <- evCopula(family = "log", dim = 2)
fit_cop <- fitCopula(log_cop, pseudo_u, method = "ml")

# Visualize the fitted copula
persp(log_cop, dCopula, main = "Fitted Logistic Copula",
      phi = 30, theta = 30, col = "lightblue")
```



```{r}
###method2

# Optional: Visualization of logistic model fit vs empirical
library(copula)

# Convert data to pseudo-observations for copula fitting
pseudo_u <- pobs(data_pair)

# Fit logistic copula using maximum likelihood
log_cop <- evCopula(family = "log", dim = 2)
fit_cop <- fitCopula(log_cop, pseudo_u, method = "ml")

# Visualize the fitted copula
persp(log_cop, dCopula, main = "Fitted Logistic Copula",
      phi = 30, theta = 30, col = "lightblue")
```

To address this question, we selected all possible 3-dimensional combinations of the 4 log-return variables: Open, High, Low, and Close. For each 3D subset, we transformed the data into pseudo-polar (angular) coordinates and plotted the angular density using the fitted Hüsler–Reiss model.
The plots show the fitted angular densities superimposed on the empirical observations. Visually, these plots provide the following insights:
Consistency: The fitted surfaces follow the general shape of the empirical data clouds reasonably well, particularly along the ridge and corners of the unit simplex.
Angular concentration: Several subsets show clustering near corners (e.g., High-Low-Close), which the model captures with moderate accuracy.
Smoothness: The Hüsler–Reiss model produces smooth densities, which may slightly under-represent sharp changes or isolated clusters in the observed data.

Limitations: While the fit is adequate, it is clear that for certain variable combinations, such as Open-High-Close, the model might understate tail dependence or localized density peaks.

Overall, the quality of the model fit is reasonable for exploratory analysis and visualization of angular structure in extremes. However, more flexible copulas (or nonparametric approaches) may be needed to capture complex interactions or deviations from the assumed dependence model in some subsets.



(h) Conditional probability estimation using logistic model
```{r}
####method1
library(evd)

# Assume uf_returns contains unit Frechét margins
# and we are interested in the conditional probability:
# P(High > x | Open > x), for large x

# Extract variables
X1 <- uf_returns$Open
X2 <- uf_returns$High

# Use the fitted logistic model from (g)
# If not yet fitted, fit it now
log_fit <- fextremes::fextcopula(cbind(X1, X2), model = "logistic")
theta <- log_fit$par

# Define high threshold (extreme)
x_extreme <- quantile(X1, 0.99)  # or choose a fixed value like 10

# Logistic model exponent measure
# The bivariate logistic joint survival: 
# P(X1 > x, X2 > x) ≈ exp{ - [ (1/x)^(-1/θ) + (1/x)^(-1/θ) ]^θ }

V <- function(x1, x2, theta) {
  ((x1)^(-1/theta) + (x2)^(-1/theta))^theta
}

# Joint survival: P(X1 > x, X2 > x)
p_joint <- exp(-V(x_extreme, x_extreme, theta))

# Marginal survival: P(X1 > x)
p_marginal <- exp(-1 / x_extreme)

# Conditional probability
p_conditional <- p_joint / p_marginal
cat("Estimated P(High >", round(x_extreme, 2), "| Open >", round(x_extreme, 2), ") =",
    round(p_conditional, 5), "\n")

```



```{r}
###method2
library(evir)
library(ggplot2)

# Use radial component r from previous steps (e.g., unit Fréchet transformed data)
r <- apply(uf_returns, 1, function(row) sqrt(sum(row^2)))

# Select a range of thresholds (adjust granularity as needed)
thresholds <- seq(quantile(r, 0.80), quantile(r, 0.99), length.out = 15)

# Store shape estimates
shape_estimates <- numeric(length(thresholds))

# Loop over thresholds to fit GPD and extract shape estimates
for (i in seq_along(thresholds)) {
  u <- thresholds[i]
  exceedances <- r[r > u] - u
  fit <- gpd(exceedances, threshold = 0, method = "ml")
  shape_estimates[i] <- fit$par.ests["xi"]
}

# Plotting
plot(thresholds, shape_estimates, type = "b", pch = 16,
     main = "Shape Parameter Stability over Thresholds",
     xlab = "Threshold (r₀)", ylab = "Estimated Shape (ξ)")

# Add LOESS smoother
lines(thresholds, lowess(thresholds, shape_estimates)$y, col = "blue", lwd = 2)

# Mark visually stable threshold
stability_threshold <- thresholds[which.min(abs(diff(shape_estimates)))]
abline(v = stability_threshold, col = "red", lty = 2)

legend("bottomleft", legend = c("GPD ξ estimates", "Smoothed trend", "Stabilizing threshold"),
       col = c("black", "blue", "red"), lty = c(1, 1, 2), pch = c(16, NA, NA))

```
To answer part (h), we need to compute:
The extremal coefficient θ, which summarizes the strength of dependence between extreme values.
The coefficient of tail dependence χ, which measures the asymptotic probability that both variables exceed a high threshold together.
The extremal coefficient θ and the tail dependence coefficient χ provide important insights into the joint behavior of extremes in multivariate data. Using the fitted Gumbel copula with estimated parameter α=1.688, we compute the extremal coefficient as θ=2， 1/α ≈1.54. This value indicates moderate dependence between extreme events: values closer to 1 suggest strong dependence (i.e., extremes occur together), while values near 2 imply independence. A value of 1.54 thus reflects that co-occurrence of extremes is more likely than under independence, but not as strong as perfect dependence.
Additionally, the coefficient of upper tail dependence χ, defined as the limiting conditional probability that both variables exceed a high threshold, is calculated using χ=2−2*1/α, yielding χ≈0.46. This further confirms the presence of moderate tail dependence, meaning that when one variable takes on extreme values, there is a substantial probability that the other does too. These findings are particularly relevant in financial or environmental contexts, where understanding joint extremes is critical. Together, the estimates of θ and χ demonstrate that the data exhibits significant but not overwhelming joint tail behavior, underscoring the importance of modeling such dependencies in risk assessment and extreme value analysis.


```{r}

```



```{r}

```


```{r}

```



```{r}

```


```{r}

```



```{r}

```