Analysis and modeling of snowmelt data for the prediction of Glacier lake outburst flood

Arpan Dutta (MD2305)

Under the Supervision of Prof. Sarbani Palit

Endterm Project Presentation

Introduction

  • Glacial Lake Outburst Floods (GLOFs) occur when rapid snowmelt weakens natural lake dams, leading to sudden flooding.
  • This study examines how climatic factors (temperature, humidity, pressure) and topographical features (elevation, slope, aspect) influence snowmelt.
  • The goal is to develop a more accurate predictive model for flood risk assessment and disaster preparedness.

Basic Theory of Copulas

Definition (Copula)

For every \(d\geq 4\), a d dimentional copula \(\mathcal{C} : [0,1]^d\rightarrow[0,1]\) is a multivariate distribution function on \([0,1]^d\) with uniformly distributed marginals.

Let the random vector \((X_1, X_2,\ldots, X_d )\) exist with marginal cdf’s \(F_i(x) = P(X_i\leq x)\). Using the probability integral transformation the vector \((U_1, \ldots,U_d )\) = \((F_1(x_1),\ldots ,F_d(x_d))\) has uniform marginal, Then the copula of \((X1, \ldots, X_d )\) is the joint cdf of \((U_1,\ldots, U_d )\). Namely, it is \(\mathcal{C}_\theta(u_1,\ldots,u_d) = P[U_1\leq u_1,\ldots,U_d\leq u_d ]\). where \(\theta\) is the parameter values of copula.

it has some properties:

  • \(C(x_1, \ldots, x_d)\) is non-decreasing in each component \(x_i\).
  • \(C(1,\ldots,1, x_i,1,\ldots,1) = x_i\) for all \(i \in \{1,...,n\}\), \(x_i \in [0,1]\).
  • For \(a_i \leq b_i\), the probability \(P[U_1 \in [a_1, b_1] ,...,U_d \in [a_d,b_d]]\) must be non-negetive.

Sklar’s Theorem (1959)

Consider a d dimensional CDF, F, with pdf f and with marginals \(F_1, F_2, \ldots, F_d\) Then there exists a copula, \(\mathcal{C}\), such that,

\[F(x_1,x_2,\ldots,x_d)= \mathcal{C}_\theta(F_1(x_1), F_1(x_2),\ldots,F_1(x_d))\] For all \(x_i\in\mathbb{R}\) and \(i = 1,2,\ldots,d\)

If \(F_i\) is continuous for all \(i = 1,2,3,\ldots, d\), then \(\mathcal{C}\) is unique. If any distribution is discrete, then C is uniquely defined on the support of the joint distribution.

On the otherhand, consider a copula, \(\mathcal C\) and univariate CDF’s \(F_1,F_2,\ldots,F_d\). Then \(F\) as defined as before is a multivariate CDF with marginals \(F_1,F_2,\ldots,F_d\).

Continued

The joint density function is obtained by,

\[ \begin{aligned} f(x_1,..., x_d) &= \frac{\partial ^d F(x_1, ..., x_d)}{\partial x_1 \cdots \partial x_d} \\ &= \frac{\partial^d C_\theta(F_1(x_1),...,F_d(x_d))}{\partial x_1 \cdots \partial x_d} \\ &= \frac{\partial ^d C_\theta (F_1(x_1),...,F_d(x_d))}{\partial F_1(x_1) \cdots \partial F_d(x_d)} \cdot \frac{\partial F_1(x_1)}{\partial x_1} \cdots \frac{\partial F_d(x_d)}{\partial x_d} \\ &= c_\theta (F_1(x_1),...,F_d(x_d)) \cdot f_1(x_1) \cdots f_d(x_d) \end{aligned} \] Where \(c_\theta\) is multivariate density of the copula.

Estimating Copulas (Pseudo-Maximum Likelihood Estimation)

  • Suppose, \(\mathbf{Y} = (Y_1,\ldots,Y_d)\) is a random vector, We see that the density of \(\mathbf{Y}\) is given by:

\[ f_{\mathbf{Y}}(\mathbf{y}) = f_Y(y_1, \dots, y_d) = c_Y(F_{Y_1} (y_1), \dots, F_{Y_d} (y_d)) \prod_{j=1}^{d} f_{Y_j} (y_j). \]

  • We then obtain the log-likelihood as,

\[ \begin{aligned} log L(\theta_1, ..., \theta_d, \theta_C) &= \log \prod_{i=1}^{n} f_Y(y_i) \\ &= \sum_{i=1}^{n} \Bigg( \log c_Y \Big( F_{Y_1}(y_{i,1} \mid \theta_1), \ldots, F_{Y_d}(y_{i,d} \mid \theta_d) \mid \theta_C \Big) \quad \\ &+ \log f_{Y_1}(y_{i,1} \mid \theta_1) + \dots + \log f_{Y_d}(y_{i,d} \mid \theta_d) \Bigg) \end{aligned} \]

Data Description

The dataset is sourced from NASA Giovanni with a daily temporal resolution and \(0.01^\circ\) spatial resolution. It combines static topographical variables derived from DEM data at a 30m spatial resolution and dynamic climatic variables that vary over time. The dataset includes Snow Melt data (kg \(m^{-2} s^{-1}\)) recorded from 15th March 2022 to 15th June 2022 (93 days). The data was processed in QGIS, resulting in 2025 spatial points for model analysis.

Variables Overview

Static Variables Dynamic Variables
Elevation Snow Melt
Slope Specific Humidity
Aspect Surface Air Temperature
Surface Radiative Temperature
Surface Pressure

Exploratory Data Analysis

Scatter plot for 50 th day

Scatterplot of variables on 50 th day

Histogram of 50th day

Assumed Distribution

Variable Assumed Distribution Skewness Variable Assumed Distribution Skewness
Snowmelt Exponential 1.36 RadTemp Weibull -0.63
Humidity Normal 0.16 Elevation Gamma 0.49
Air Temp Weibull -0.39 Slope Normal -0.30
Pressure Normal -0.24 Aspect Normal 0.11

Variable Selection

Choice of Copulas

  • We will examine two elliptical copulas Gaussian copula and the t-copula, while the Archimedean copulas are the Joe copula, the Clayton copula, and the Gumbel copula.

  • Fitting the data

The table bellow shows the AIC values for the given copula fittted for the data. The copula with least AIC values are considered as the best fit copula for the data.

AIC values for different copulas on various days
Day Normal t Joe Clayton Gumbel
10 -22836.39 -22917.27 -12.6755 -611.2854 -212.1033
20 -22099.68 -22105.65 -8.9146 -558.2436 -170.5951
30 -32569.72 -32734.67 -60.6516 -784.6610 -309.8161
40 -27376.37 -26612.30 -33.2967 -654.6281 -231.0704
50 -24961.45 -25031.70 -67.2194 -1136.8968 -465.9788
60 -19693.16 -19725.62 -57.3981 -997.0749 -409.1935
70 -24042.41 -24129.75 -63.2418 -976.6298 -405.6140
80 - - -63.8057 -1030.9356 -428.8872
90 -23313.31 -24218.83 -69.2516 -1205.9109 -491.2298

Continued

Variable Combinations AIC
humidity, Air Temp, pressure, radtemp, Elevation, slope, aspect -25419.45
humidity, Air Temp, pressure, radtemp, Elevation, aspect -25210.43
humidity, Air Temp, pressure, radtemp, Elevation, slope -25181.98
humidity, Air Temp, pressure, radtemp, Elevation -24977.63
humidity, Air Temp, pressure, radtemp, slope, aspect -22852.87
humidity, Air Temp, pressure, radtemp, aspect -22741.06
humidity, Air Temp, pressure, radtemp, slope -22676.84
humidity, Air Temp, pressure, radtemp -22562.03
humidity, Air Temp, pressure, Elevation, slope, aspect -18986.30
humidity, Air Temp, pressure, Elevation, slope -18823.07
humidity, Air Temp, pressure, Elevation, aspect -18784.57
Air Temp, pressure, radtemp, Elevation, slope, aspect -18722.30
humidity, Air Temp, pressure, Elevation -18629.39
Air Temp, pressure, radtemp, Elevation, aspect -18527.43
Air Temp, pressure, radtemp, Elevation, slope -18484.03

OLS Prediction and Limitations

We first fit an Ordinary Least Squares (OLS) model to predict snowmelt. The figure below shows the predicted versus actual snowmelt values for Day 60.

Actual vs. Predicted Snowmelt for Day 60 Using Standard OLS

Construction of the Copula Based Model

  • we are trying to predict the response variable through conditional expectation given the covariates.

\[ \hat{x}_d = E[X_d \mid X_1 = x_1, \ldots, X_{d-1} = x_{d-1}] = \int_0^\infty x_d f(x_d \mid x_1, \ldots, x_{d-1}, R) \, dx_d \]

The copula-based approach involves two key steps:

  • Modeling Marginal Distributions: For each variable \(X_i\) we assumed specific marginal distributions.

  • Modeling Dependence with a Copula:

  1. The dependence structure is defined by a correlation matrix \(\mathbf{R}\) estimated from the data.

  2. Rank-order correlations (e.g., Spearman’s \(\rho\), Kendall’s \(\tau\)) are used because they are invariant to marginal distributions

  3. We convert these rank-order correlations into Pearson correlations using transformations.

\[ r_{ij} = \sin\left(\frac{\pi \tau_{ij}}{2}\right). \]

\[ r_{ij} = 2 \sin\left(\frac{\pi \rho_{ij}}{6}\right). \]

Construction of Gaussian copula

  • The joint density of \((X_1, \ldots, X_d)\) can be written as:

\[ f(x_1, \ldots, x_d \mid \mathbf{R}^*) = f_1(x_1) \times \cdots \times f_d(x_d) \times c_N(F_1(x_1), \ldots, F_d(x_d) \mid \mathbf{R}^*). \]

  • The conditional density of \((X_d \mid X_1, \ldots,X_{d-1} )\) can be written as:

\[ f(x_d \mid x_1 \ldots, x_{d-1}, \mathbf{R}^*) = f_d(x_d) \times \frac{ c_N(F_1(x_1), \ldots, F_d(x_d) \mid \mathbf{R}^*)}{c_N(F_1(x_1), \ldots, F_{d-1}(x_{d-1}) \mid \mathbf{R}^*)} \]

  • A \(d\)-dimensional normal distribution with mean zero and covariance matrix \(\mathbf{R}\):

\[ \phi^{(d)}(y_1, \ldots, y_d \mid \mathbf{R}) = \frac{1}{(2\pi)^{d/2} |\mathbf{R}|^{1/2}} \exp\left(-\frac{1}{2} \mathbf{y}^{\mathrm{T}} \mathbf{R}^{-1} \mathbf{y}\right), \]

Finding conditional copula density

  • the copula density is:

\[ c_N\left[\Phi(y_1), \ldots, \Phi(y_d) \mid \mathbf{R}^*\right] = \frac{\phi^{(d)}(y_1, \ldots, y_d \mid \mathbf{R})}{\phi(y_1) \times \cdots \times \phi(y_d)}. \]

  • Conditional copula density:

\[ \begin{align} &\frac{c_N(F_1(x_1), \ldots, F_d(x_d) \mid \mathbf{R}^*)}{c_N(F_1(x_1), \ldots, F_{d-1}(x_{d-1}) \mid \mathbf{R}^*)} \notag \\ &= \frac{\phi^{(d)}\left( \Phi^{-1}[F_1(x_1)], \ldots, \Phi^{-1}[F_d(x_d)] \mid \mathbf{R} \right)}{\phi\left( \Phi^{-1}[F_d(x_d)] \right) \times \phi^{(d-1)}\left( \Phi^{-1}[F_1(x_1)], \ldots, \Phi^{-1}[F_{d-1}(x_{d-1})] \mid \mathbf{R}_{d-1} \right)} \notag \\ &= \exp \left\{ -\frac{1}{2} \left[ \frac{\left( \Phi^{-1}[F_d(x_d)] - \mathbf{r}^{\mathrm{T}} \mathbf{R}_{d-1}^{-1} \mathbf{y}_{d-1} \right)^2}{1 - \mathbf{r}^{\mathrm{T}} \mathbf{R}_{d-1}^{-1} \mathbf{r}} - \left( \Phi^{-1}[F_d(x_d)] \right)^2 \right] \right\} \times \left( 1 - \mathbf{r}^{\mathrm{T}} \mathbf{R}_{d-1}^{-1} \mathbf{r} \right)^{-1/2}. \end{align} \]

  • Conditional distribution:

\[ \begin{aligned} f(x_d \mid x_1, \ldots, x_{d-1}, \mathbf{R}^*) &= f_d(x_d) \times \exp \left\{ -0.5 \left[ \frac{\left( \Phi^{-1}[F_d(x_d)] - \mathbf{r}^{\mathrm{T}} \mathbf{R}_{d-1}^{-1} \mathbf{y}_{d-1} \right)^2}{1 - \mathbf{r}^{\mathrm{T}} \mathbf{R}_{d-1}^{-1} \mathbf{r}} - \left( \Phi^{-1}[F_d(x_d)] \right)^2 \right] \right\} \\ &\quad \times \left( 1 - \mathbf{r}^{\mathrm{T}} \mathbf{R}_{d-1}^{-1} \mathbf{r} \right)^{-1/2}. \end{aligned} \]

\[ \hat{x}_d = E[X_d \mid X_1 = x_1, \ldots, X_{d-1} = x_{d-1}] = \int_0^\infty x_d f(x_d \mid x_1, \ldots, x_{d-1}, R, \nu) \, dx_d \]

We use numerical integration to evaluate this for each test data point.

Gaussian Copula Predictions Using Gaussian Copula (Kendall’s Tau)

Actual vs. Predicted Snowmelt for Day 60 Using Gaussian Copula (Kendall’s Tau)

Gaussian Copula Predictions Using Gaussian Copula (Spearman’s Rho)

Actual vs. Predicted Snowmelt for Day 60 Using Gaussian Copula (Spearman’s Rho)

Construction of the T-Copula

  • The joint density of \((X_1, \ldots, X_d)\) is:

\[ f(x_1, \ldots, x_d \mid R, \nu) = \prod_{i=1}^d f_i(x_i) \times c_t(F_1(x_1), \ldots, F_d(x_d) \mid R, \nu) \]

  • The conditional density of \((X_d \mid X_1, \ldots, X_{d-1})\) is:

\[ f(x_d \mid x_1, \ldots, x_{d-1}, R, \nu) = f_d(x_d) \times \frac{c_t(F_1(x_1), \ldots, F_d(x_d) \mid R, \nu)}{c_t(F_1(x_1), \ldots, F_{d-1}(x_{d-1}) \mid R, \nu)} \]

  • A \(d\)-dimensional t-distribution with mean zero, correlation matrix \(R\), and degrees of freedom \(\nu\):

\[ f_t^{(d)}(\mathbf{y} \mid R, \nu) = \frac{\Gamma\left(\frac{\nu + d}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right) (\nu \pi)^{d/2} |R|^{1/2}} \left(1 + \frac{1}{\nu} \mathbf{y}^\top R^{-1} \mathbf{y}\right)^{-\frac{\nu + d}{2}} \]

Finding Conditional Copula Density

  • The t-copula density is:

\[ c_t(u_1, \ldots, u_d \mid R, \nu) = \frac{f_t^{(d)}(y_1, \ldots, y_d \mid R, \nu)}{\prod_{i=1}^d f_{t,\nu}(y_i)} \]

where \(y_i = t_\nu^{-1}(u_i)\), and the univariate t-density is:

\[ f_{t,\nu}(y_i) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right) \sqrt{\nu \pi}} \left(1 + \frac{y_i^2}{\nu}\right)^{-\frac{\nu + 1}{2}} \]

  • Conditional copula density:

\[ \frac{c_t(F_1(x_1), \ldots, F_d(x_d) \mid R, \nu)}{c_t(F_1(x_1), \ldots, F_{d-1}(x_{d-1}) \mid R, \nu)} = \frac{f_t^{(d)}(y_1, \ldots, y_d \mid R, \nu)}{f_t^{(d-1)}(y_1, \ldots, y_{d-1} \mid R_{d-1}, \nu) \times f_{t,\nu}(y_d)} \]

  • This simplifies to the conditional density of \(Y_d \mid Y_1, \ldots, Y_{d-1}\), which is a univariate t-distribution.

Deriving the Conditional Distribution

  • Partition: \(\mathbf{y} = (\mathbf{y}_{d-1}, y_d)\), \(R = \begin{bmatrix} R_{d-1} & \mathbf{r} \\ \mathbf{r}^\top & 1 \end{bmatrix}\).

  • The conditional density is:

\[ f(y_d \mid \mathbf{y}_{d-1}, \nu) = \frac{f_t^{(d)}(\mathbf{y} \mid R, \nu)}{f_t^{(d-1)}(\mathbf{y}_{d-1} \mid R_{d-1}, \nu)} \]

  • This is a univariate t-distribution with:

    • Degrees of freedom: \(\nu + d - 1\)
    • Location: \(\mu_{\text{cond}} = \mathbf{r}^\top R_{d-1}^{-1} \mathbf{y}_{d-1}\)
    • Scale: \(\sigma_{\text{cond}} = \sqrt{\frac{\nu + \mathbf{y}_{d-1}^\top R_{d-1}^{-1} \mathbf{y}_{d-1}}{\nu + d - 1} (1 - \mathbf{r}^\top R_{d-1}^{-1} \mathbf{r})}\)
  • Conditional density of \(X_d\):

\[ f(x_d \mid x_1, \ldots, x_{d-1}, R, \nu) = f_d(x_d) \cdot \frac{f_{t,\nu + d - 1}\left( \frac{y_d - \mu_{\text{cond}}}{\sigma_{\text{cond}}} \right) / \sigma_{\text{cond}}}{f_{t,\nu}(y_d)} \]

where \(y_d = t_\nu^{-1}(F_d(x_d))\).

\[ \hat{x}_d = E[X_d \mid X_1 = x_1, \ldots, X_{d-1} = x_{d-1}] = \int_0^\infty x_d f(x_d \mid x_1, \ldots, x_{d-1}, R, \nu) \, dx_d \]

We use numerical integration to evaluate this for each test data point.

t-copula predictions for day 60

Actual vs. Predicted Snowmelt for Day 60 Using t-Copula

SSE Results

The table below compares the Sum of Squared Errors (SSE) for the four methods across the evaluated days. Lower SSE values indicate better predictive performance.

Day t-Copula Gaussian Copula
( \(\tau\))
Gaussian Copula
( \(\rho\))
GLM
(log-link)
Standard OLS
10 \(1.121\times10^{3}\) \(1.178\times10^{3}\) \(1.176\times10^{3}\) \(1.255\times10^{5}\) \(2.099\times10^{26}\)
20 \(7.005\times10^{3}\) \(6.613\times10^{3}\) \(6.613\times10^{3}\) \(6.711\times10^{5}\) \(1.007\times10^{27}\)
30 \(9.343\times10^{4}\) \(1.778\times10^{5}\) \(1.757\times10^{5}\) \(1.744\times10^{7}\) \(9.893\times10^{27}\)
40 \(5.160\times10^{4}\) \(7.277\times10^{4}\) \(7.244\times10^{4}\) \(7.488\times10^{6}\) \(7.921\times10^{27}\)
50 \(3.442\times10^{5}\) \(1.832\times10^{6}\) \(1.948\times10^{6}\) \(2.054\times10^{8}\) \(1.941\times10^{27}\)
60 \(2.004\times10^{5}\) \(1.321\times10^{6}\) \(1.261\times10^{6}\) \(1.113\times10^{8}\) \(1.986\times10^{28}\)
70 \(2.812\times10^{5}\) \(8.783\times10^{5}\) \(8.833\times10^{5}\) \(9.004\times10^{7}\) \(2.587\times10^{28}\)
80 \(5.043\times10^{5}\) \(1.866\times10^{6}\) \(1.754\times10^{6}\) \(1.718\times10^{8}\) \(8.879\times10^{28}\)
90 \(2.909\times10^{6}\) \(4.897\times10^{6}\) \(4.094\times10^{6}\) \(3.933\times10^{8}\) \(4.317\times10^{28}\)

Final model for prediction

  • Correlations between variables (excluding snowmelt) remain relatively stable across days, the correlation between snowmelt and other variables increases progressively

  • We averaged the correlation matrices over 93 days, capturing the overall dependency structure effectively.

  • We selected t-copula model for prediction due to its consistently lower SSE compared to other methods.

  • Parameters are the degrees of freedom, \(\nu = 4\) and the average dependency matrix.

Limitations and Conclusion

  • The t-copula model proved most effective for detecting large snowmelt events, outperforming GLM and OLS by accurately capturing tail dependencies and complex variable interactions critical for GLOF risk assessment.

  • We have data of 93 days(15 th March - 15 th June, 2022), the fitted t‑copula may not capture correlation patterns in other seasons.

References

Maina Samuel C., Mwigereri D., Weyn J., Mackey L., Ochieng M. (2023).”Evaluation of Dependency Structure for Multivariate Weather Predictors Using Copulas.” Available at: https://doi.org/10.1145/3616384

Parsa, Rahul A., Stuart A. Klugman. (2011). “Copula Regression.” Available at: https://variancejournal.org/article/84912

Nelsen Roger B. (2006). An Introduction to Copulas (2nd ed.), Springer Series in Statistics.

Hofert M., Kojadinovic I., Mächler M.,Yan J. (2018), Elements of Copula Modeling with R., Springer International Publishing AG

Haugh M., (2016). An introduction to copulas, IEOR E4602: Quantitative Risk Management (Spring 2016)

Joe, Harry (2014). . CRC Press.

Salvadori, G., De Michele, C., Kilsby, C. G., Chandler, R. E. (2007). “Estimating Return Periods of Extreme Hydrological Events via the Maxima of Stochastic Processes Selected by a Peaks-Over-Threshold Condition on Another Process.” , 43(11).

Genest, C., Favre, A. C. (2007). “Everything You Always Wanted to Know About Copula Modeling but Were Afraid to Ask.” , 12(4), 347–368.

Thank You