Analysis and modeling of snowmelt data for the prediction of Glacier lake outburst flood

Arpan Dutta (MD2305)

Under the Supervision of Prof. Sarbani Palit

Endterm Project Presentation

Introduction

Glacial Lake Outburst Floods (GLOFs) occur when rapid snowmelt weakens natural lake dams, leading to sudden flooding.
This study examines how climatic factors (temperature, humidity, pressure) and topographical features (elevation, slope, aspect) influence snowmelt.
The goal is to develop a more accurate predictive model for flood risk assessment and disaster preparedness.

Basic Theory of Copulas

Definition (Copula)

For every \(d\geq 4\), a d dimentional copula \(\mathcal{C} : [0,1]^d\rightarrow[0,1]\) is a multivariate distribution function on \([0,1]^d\) with uniformly distributed marginals.

Let the random vector \((X_1, X_2,\ldots, X_d )\) exist with marginal cdf’s \(F_i(x) = P(X_i\leq x)\). Using the probability integral transformation the vector \((U_1, \ldots,U_d )\) = \((F_1(x_1),\ldots ,F_d(x_d))\) has uniform marginal, Then the copula of \((X1, \ldots, X_d )\) is the joint cdf of \((U_1,\ldots, U_d )\). Namely, it is \(\mathcal{C}_\theta(u_1,\ldots,u_d) = P[U_1\leq u_1,\ldots,U_d\leq u_d ]\). where \(\theta\) is the parameter values of copula.

it has some properties:

\(C(x_1, \ldots, x_d)\) is non-decreasing in each component \(x_i\).
\(C(1,\ldots,1, x_i,1,\ldots,1) = x_i\) for all \(i \in \{1,...,n\}\), \(x_i \in [0,1]\).
For \(a_i \leq b_i\), the probability \(P[U_1 \in [a_1, b_1] ,...,U_d \in [a_d,b_d]]\) must be non-negetive.

Sklar’s Theorem (1959)

Consider a d dimensional CDF, F, with pdf f and with marginals \(F_1, F_2, \ldots, F_d\) Then there exists a copula, \(\mathcal{C}\), such that,

\[F(x_1,x_2,\ldots,x_d)= \mathcal{C}_\theta(F_1(x_1), F_1(x_2),\ldots,F_1(x_d))\] For all \(x_i\in\mathbb{R}\) and \(i = 1,2,\ldots,d\)

If \(F_i\) is continuous for all \(i = 1,2,3,\ldots, d\), then \(\mathcal{C}\) is unique. If any distribution is discrete, then C is uniquely defined on the support of the joint distribution.

On the otherhand, consider a copula, \(\mathcal C\) and univariate CDF’s \(F_1,F_2,\ldots,F_d\). Then \(F\) as defined as before is a multivariate CDF with marginals \(F_1,F_2,\ldots,F_d\).

Continued

The joint density function is obtained by,

\[ \begin{aligned} f(x_1,..., x_d) &= \frac{\partial ^d F(x_1, ..., x_d)}{\partial x_1 \cdots \partial x_d} \\ &= \frac{\partial^d C_\theta(F_1(x_1),...,F_d(x_d))}{\partial x_1 \cdots \partial x_d} \\ &= \frac{\partial ^d C_\theta (F_1(x_1),...,F_d(x_d))}{\partial F_1(x_1) \cdots \partial F_d(x_d)} \cdot \frac{\partial F_1(x_1)}{\partial x_1} \cdots \frac{\partial F_d(x_d)}{\partial x_d} \\ &= c_\theta (F_1(x_1),...,F_d(x_d)) \cdot f_1(x_1) \cdots f_d(x_d) \end{aligned} \] Where \(c_\theta\) is multivariate density of the copula.

Estimating Copulas (Pseudo-Maximum Likelihood Estimation)

Suppose, \(\mathbf{Y} = (Y_1,\ldots,Y_d)\) is a random vector, We see that the density of \(\mathbf{Y}\) is given by:

\[ f_{\mathbf{Y}}(\mathbf{y}) = f_Y(y_1, \dots, y_d) = c_Y(F_{Y_1} (y_1), \dots, F_{Y_d} (y_d)) \prod_{j=1}^{d} f_{Y_j} (y_j). \]

We then obtain the log-likelihood as,

\[ \begin{aligned} log L(\theta_1, ..., \theta_d, \theta_C) &= \log \prod_{i=1}^{n} f_Y(y_i) \\ &= \sum_{i=1}^{n} \Bigg( \log c_Y \Big( F_{Y_1}(y_{i,1} \mid \theta_1), \ldots, F_{Y_d}(y_{i,d} \mid \theta_d) \mid \theta_C \Big) \quad \\ &+ \log f_{Y_1}(y_{i,1} \mid \theta_1) + \dots + \log f_{Y_d}(y_{i,d} \mid \theta_d) \Bigg) \end{aligned} \]

Data Description

The dataset is sourced from NASA Giovanni with a daily temporal resolution and \(0.01^\circ\) spatial resolution. It combines static topographical variables derived from DEM data at a 30m spatial resolution and dynamic climatic variables that vary over time. The dataset includes Snow Melt data (kg \(m^{-2} s^{-1}\)) recorded from 15th March 2022 to 15th June 2022 (93 days). The data was processed in QGIS, resulting in 2025 spatial points for model analysis.

Variables Overview

Static Variables	Dynamic Variables
Elevation	Snow Melt
Slope	Specific Humidity
Aspect	Surface Air Temperature
	Surface Radiative Temperature
	Surface Pressure

Exploratory Data Analysis

Scatter plot for 50 th day

Scatterplot of variables on 50 th day

Histogram of 50th day

Assumed Distribution

Variable	Assumed Distribution	Skewness	Variable	Assumed Distribution	Skewness
Snowmelt	Exponential	1.36	RadTemp	Weibull	-0.63
Humidity	Normal	0.16	Elevation	Gamma	0.49
Air Temp	Weibull	-0.39	Slope	Normal	-0.30
Pressure	Normal	-0.24	Aspect	Normal	0.11

Variable Selection

Choice of Copulas

We will examine two elliptical copulas Gaussian copula and the t-copula, while the Archimedean copulas are the Joe copula, the Clayton copula, and the Gumbel copula.
Fitting the data

The table bellow shows the AIC values for the given copula fittted for the data. The copula with least AIC values are considered as the best fit copula for the data.

AIC values for different copulas on various days
Day	Normal	t	Joe	Clayton	Gumbel
10	-22836.39	-22917.27	-12.6755	-611.2854	-212.1033
20	-22099.68	-22105.65	-8.9146	-558.2436	-170.5951
30	-32569.72	-32734.67	-60.6516	-784.6610	-309.8161
40	-27376.37	-26612.30	-33.2967	-654.6281	-231.0704
50	-24961.45	-25031.70	-67.2194	-1136.8968	-465.9788
60	-19693.16	-19725.62	-57.3981	-997.0749	-409.1935
70	-24042.41	-24129.75	-63.2418	-976.6298	-405.6140
80	-	-	-63.8057	-1030.9356	-428.8872
90	-23313.31	-24218.83	-69.2516	-1205.9109	-491.2298

Continued

Variable Combinations	AIC
humidity, Air Temp, pressure, radtemp, Elevation, slope, aspect	-25419.45
humidity, Air Temp, pressure, radtemp, Elevation, aspect	-25210.43
humidity, Air Temp, pressure, radtemp, Elevation, slope	-25181.98
humidity, Air Temp, pressure, radtemp, Elevation	-24977.63
humidity, Air Temp, pressure, radtemp, slope, aspect	-22852.87
humidity, Air Temp, pressure, radtemp, aspect	-22741.06
humidity, Air Temp, pressure, radtemp, slope	-22676.84
humidity, Air Temp, pressure, radtemp	-22562.03
humidity, Air Temp, pressure, Elevation, slope, aspect	-18986.30
humidity, Air Temp, pressure, Elevation, slope	-18823.07
humidity, Air Temp, pressure, Elevation, aspect	-18784.57
Air Temp, pressure, radtemp, Elevation, slope, aspect	-18722.30
humidity, Air Temp, pressure, Elevation	-18629.39
Air Temp, pressure, radtemp, Elevation, aspect	-18527.43
Air Temp, pressure, radtemp, Elevation, slope	-18484.03

OLS Prediction and Limitations

We first fit an Ordinary Least Squares (OLS) model to predict snowmelt. The figure below shows the predicted versus actual snowmelt values for Day 60.

Actual vs. Predicted Snowmelt for Day 60 Using Standard OLS

Construction of the Copula Based Model

we are trying to predict the response variable through conditional expectation given the covariates.

\[ \hat{x}_d = E[X_d \mid X_1 = x_1, \ldots, X_{d-1} = x_{d-1}] = \int_0^\infty x_d f(x_d \mid x_1, \ldots, x_{d-1}, R) \, dx_d \]

The copula-based approach involves two key steps:

Modeling Marginal Distributions: For each variable \(X_i\) we assumed specific marginal distributions.
Modeling Dependence with a Copula:

The dependence structure is defined by a correlation matrix \(\mathbf{R}\) estimated from the data.
Rank-order correlations (e.g., Spearman’s \(\rho\), Kendall’s \(\tau\)) are used because they are invariant to marginal distributions
We convert these rank-order correlations into Pearson correlations using transformations.

\[ r_{ij} = \sin\left(\frac{\pi \tau_{ij}}{2}\right). \]

\[ r_{ij} = 2 \sin\left(\frac{\pi \rho_{ij}}{6}\right). \]

Construction of Gaussian copula

The joint density of \((X_1, \ldots, X_d)\) can be written as:

\[ f(x_1, \ldots, x_d \mid \mathbf{R}^*) = f_1(x_1) \times \cdots \times f_d(x_d) \times c_N(F_1(x_1), \ldots, F_d(x_d) \mid \mathbf{R}^*). \]

The conditional density of \((X_d \mid X_1, \ldots,X_{d-1} )\) can be written as:

\[ f(x_d \mid x_1 \ldots, x_{d-1}, \mathbf{R}^*) = f_d(x_d) \times \frac{ c_N(F_1(x_1), \ldots, F_d(x_d) \mid \mathbf{R}^*)}{c_N(F_1(x_1), \ldots, F_{d-1}(x_{d-1}) \mid \mathbf{R}^*)} \]

A \(d\)-dimensional normal distribution with mean zero and covariance matrix \(\mathbf{R}\):

\[ \phi^{(d)}(y_1, \ldots, y_d \mid \mathbf{R}) = \frac{1}{(2\pi)^{d/2} |\mathbf{R}|^{1/2}} \exp\left(-\frac{1}{2} \mathbf{y}^{\mathrm{T}} \mathbf{R}^{-1} \mathbf{y}\right), \]

Finding conditional copula density

the copula density is:

\[ c_N\left[\Phi(y_1), \ldots, \Phi(y_d) \mid \mathbf{R}^*\right] = \frac{\phi^{(d)}(y_1, \ldots, y_d \mid \mathbf{R})}{\phi(y_1) \times \cdots \times \phi(y_d)}. \]

Conditional copula density:

\[ \begin{align} &\frac{c_N(F_1(x_1), \ldots, F_d(x_d) \mid \mathbf{R}^*)}{c_N(F_1(x_1), \ldots, F_{d-1}(x_{d-1}) \mid \mathbf{R}^*)} \notag \\ &= \frac{\phi^{(d)}\left( \Phi^{-1}[F_1(x_1)], \ldots, \Phi^{-1}[F_d(x_d)] \mid \mathbf{R} \right)}{\phi\left( \Phi^{-1}[F_d(x_d)] \right) \times \phi^{(d-1)}\left( \Phi^{-1}[F_1(x_1)], \ldots, \Phi^{-1}[F_{d-1}(x_{d-1})] \mid \mathbf{R}_{d-1} \right)} \notag \\ &= \exp \left\{ -\frac{1}{2} \left[ \frac{\left( \Phi^{-1}[F_d(x_d)] - \mathbf{r}^{\mathrm{T}} \mathbf{R}_{d-1}^{-1} \mathbf{y}_{d-1} \right)^2}{1 - \mathbf{r}^{\mathrm{T}} \mathbf{R}_{d-1}^{-1} \mathbf{r}} - \left( \Phi^{-1}[F_d(x_d)] \right)^2 \right] \right\} \times \left( 1 - \mathbf{r}^{\mathrm{T}} \mathbf{R}_{d-1}^{-1} \mathbf{r} \right)^{-1/2}. \end{align} \]

Conditional distribution:

\[ \begin{aligned} f(x_d \mid x_1, \ldots, x_{d-1}, \mathbf{R}^*) &= f_d(x_d) \times \exp \left\{ -0.5 \left[ \frac{\left( \Phi^{-1}[F_d(x_d)] - \mathbf{r}^{\mathrm{T}} \mathbf{R}_{d-1}^{-1} \mathbf{y}_{d-1} \right)^2}{1 - \mathbf{r}^{\mathrm{T}} \mathbf{R}_{d-1}^{-1} \mathbf{r}} - \left( \Phi^{-1}[F_d(x_d)] \right)^2 \right] \right\} \\ &\quad \times \left( 1 - \mathbf{r}^{\mathrm{T}} \mathbf{R}_{d-1}^{-1} \mathbf{r} \right)^{-1/2}. \end{aligned} \]

\[ \hat{x}_d = E[X_d \mid X_1 = x_1, \ldots, X_{d-1} = x_{d-1}] = \int_0^\infty x_d f(x_d \mid x_1, \ldots, x_{d-1}, R, \nu) \, dx_d \]

We use numerical integration to evaluate this for each test data point.

Gaussian Copula Predictions Using Gaussian Copula (Kendall’s Tau)

Actual vs. Predicted Snowmelt for Day 60 Using Gaussian Copula (Kendall’s Tau)

Gaussian Copula Predictions Using Gaussian Copula (Spearman’s Rho)

Actual vs. Predicted Snowmelt for Day 60 Using Gaussian Copula (Spearman’s Rho)

Construction of the T-Copula

The joint density of \((X_1, \ldots, X_d)\) is:

\[ f(x_1, \ldots, x_d \mid R, \nu) = \prod_{i=1}^d f_i(x_i) \times c_t(F_1(x_1), \ldots, F_d(x_d) \mid R, \nu) \]

The conditional density of \((X_d \mid X_1, \ldots, X_{d-1})\) is:

\[ f(x_d \mid x_1, \ldots, x_{d-1}, R, \nu) = f_d(x_d) \times \frac{c_t(F_1(x_1), \ldots, F_d(x_d) \mid R, \nu)}{c_t(F_1(x_1), \ldots, F_{d-1}(x_{d-1}) \mid R, \nu)} \]

A \(d\)-dimensional t-distribution with mean zero, correlation matrix \(R\), and degrees of freedom \(\nu\):

\[ f_t^{(d)}(\mathbf{y} \mid R, \nu) = \frac{\Gamma\left(\frac{\nu + d}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right) (\nu \pi)^{d/2} |R|^{1/2}} \left(1 + \frac{1}{\nu} \mathbf{y}^\top R^{-1} \mathbf{y}\right)^{-\frac{\nu + d}{2}} \]

Finding Conditional Copula Density

The t-copula density is:

\[ c_t(u_1, \ldots, u_d \mid R, \nu) = \frac{f_t^{(d)}(y_1, \ldots, y_d \mid R, \nu)}{\prod_{i=1}^d f_{t,\nu}(y_i)} \]

where \(y_i = t_\nu^{-1}(u_i)\), and the univariate t-density is:

\[ f_{t,\nu}(y_i) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right) \sqrt{\nu \pi}} \left(1 + \frac{y_i^2}{\nu}\right)^{-\frac{\nu + 1}{2}} \]

Conditional copula density:

\[ \frac{c_t(F_1(x_1), \ldots, F_d(x_d) \mid R, \nu)}{c_t(F_1(x_1), \ldots, F_{d-1}(x_{d-1}) \mid R, \nu)} = \frac{f_t^{(d)}(y_1, \ldots, y_d \mid R, \nu)}{f_t^{(d-1)}(y_1, \ldots, y_{d-1} \mid R_{d-1}, \nu) \times f_{t,\nu}(y_d)} \]

This simplifies to the conditional density of \(Y_d \mid Y_1, \ldots, Y_{d-1}\), which is a univariate t-distribution.

Deriving the Conditional Distribution

Partition: \(\mathbf{y} = (\mathbf{y}_{d-1}, y_d)\), \(R = \begin{bmatrix} R_{d-1} & \mathbf{r} \\ \mathbf{r}^\top & 1 \end{bmatrix}\).
The conditional density is:

\[ f(y_d \mid \mathbf{y}_{d-1}, \nu) = \frac{f_t^{(d)}(\mathbf{y} \mid R, \nu)}{f_t^{(d-1)}(\mathbf{y}_{d-1} \mid R_{d-1}, \nu)} \]

This is a univariate t-distribution with:
- Degrees of freedom: \(\nu + d - 1\)
- Location: \(\mu_{\text{cond}} = \mathbf{r}^\top R_{d-1}^{-1} \mathbf{y}_{d-1}\)
- Scale: \(\sigma_{\text{cond}} = \sqrt{\frac{\nu + \mathbf{y}_{d-1}^\top R_{d-1}^{-1} \mathbf{y}_{d-1}}{\nu + d - 1} (1 - \mathbf{r}^\top R_{d-1}^{-1} \mathbf{r})}\)
Conditional density of \(X_d\):

\[ f(x_d \mid x_1, \ldots, x_{d-1}, R, \nu) = f_d(x_d) \cdot \frac{f_{t,\nu + d - 1}\left( \frac{y_d - \mu_{\text{cond}}}{\sigma_{\text{cond}}} \right) / \sigma_{\text{cond}}}{f_{t,\nu}(y_d)} \]

where \(y_d = t_\nu^{-1}(F_d(x_d))\).

\[ \hat{x}_d = E[X_d \mid X_1 = x_1, \ldots, X_{d-1} = x_{d-1}] = \int_0^\infty x_d f(x_d \mid x_1, \ldots, x_{d-1}, R, \nu) \, dx_d \]

We use numerical integration to evaluate this for each test data point.

t-copula predictions for day 60

Actual vs. Predicted Snowmelt for Day 60 Using t-Copula

SSE Results

The table below compares the Sum of Squared Errors (SSE) for the four methods across the evaluated days. Lower SSE values indicate better predictive performance.

Day	t-Copula	Gaussian Copula ( \(\tau\))	Gaussian Copula ( \(\rho\))	GLM (log-link)	Standard OLS
10	\(1.121\times10^{3}\)	\(1.178\times10^{3}\)	\(1.176\times10^{3}\)	\(1.255\times10^{5}\)	\(2.099\times10^{26}\)
20	\(7.005\times10^{3}\)	\(6.613\times10^{3}\)	\(6.613\times10^{3}\)	\(6.711\times10^{5}\)	\(1.007\times10^{27}\)
30	\(9.343\times10^{4}\)	\(1.778\times10^{5}\)	\(1.757\times10^{5}\)	\(1.744\times10^{7}\)	\(9.893\times10^{27}\)
40	\(5.160\times10^{4}\)	\(7.277\times10^{4}\)	\(7.244\times10^{4}\)	\(7.488\times10^{6}\)	\(7.921\times10^{27}\)
50	\(3.442\times10^{5}\)	\(1.832\times10^{6}\)	\(1.948\times10^{6}\)	\(2.054\times10^{8}\)	\(1.941\times10^{27}\)
60	\(2.004\times10^{5}\)	\(1.321\times10^{6}\)	\(1.261\times10^{6}\)	\(1.113\times10^{8}\)	\(1.986\times10^{28}\)
70	\(2.812\times10^{5}\)	\(8.783\times10^{5}\)	\(8.833\times10^{5}\)	\(9.004\times10^{7}\)	\(2.587\times10^{28}\)
80	\(5.043\times10^{5}\)	\(1.866\times10^{6}\)	\(1.754\times10^{6}\)	\(1.718\times10^{8}\)	\(8.879\times10^{28}\)
90	\(2.909\times10^{6}\)	\(4.897\times10^{6}\)	\(4.094\times10^{6}\)	\(3.933\times10^{8}\)	\(4.317\times10^{28}\)

Final model for prediction

Correlations between variables (excluding snowmelt) remain relatively stable across days, the correlation between snowmelt and other variables increases progressively
We averaged the correlation matrices over 93 days, capturing the overall dependency structure effectively.
We selected t-copula model for prediction due to its consistently lower SSE compared to other methods.
Parameters are the degrees of freedom, \(\nu = 4\) and the average dependency matrix.

Limitations and Conclusion

The t-copula model proved most effective for detecting large snowmelt events, outperforming GLM and OLS by accurately capturing tail dependencies and complex variable interactions critical for GLOF risk assessment.
We have data of 93 days(15 th March - 15 th June, 2022), the fitted t‑copula may not capture correlation patterns in other seasons.

References

Maina Samuel C., Mwigereri D., Weyn J., Mackey L., Ochieng M. (2023).”Evaluation of Dependency Structure for Multivariate Weather Predictors Using Copulas.” Available at: https://doi.org/10.1145/3616384

Parsa, Rahul A., Stuart A. Klugman. (2011). “Copula Regression.” Available at: https://variancejournal.org/article/84912

Nelsen Roger B. (2006). An Introduction to Copulas (2nd ed.), Springer Series in Statistics.

Hofert M., Kojadinovic I., Mächler M.,Yan J. (2018), Elements of Copula Modeling with R., Springer International Publishing AG

Haugh M., (2016). An introduction to copulas, IEOR E4602: Quantitative Risk Management (Spring 2016)

Joe, Harry (2014). . CRC Press.

Salvadori, G., De Michele, C., Kilsby, C. G., Chandler, R. E. (2007). “Estimating Return Periods of Extreme Hydrological Events via the Maxima of Stochastic Processes Selected by a Peaks-Over-Threshold Condition on Another Process.” , 43(11).

Genest, C., Favre, A. C. (2007). “Everything You Always Wanted to Know About Copula Modeling but Were Afraid to Ask.” , 12(4), 347–368.

Analysis and modeling of snowmelt data for the prediction of Glacier lake outburst flood

Introduction

Basic Theory of Copulas

Definition (Copula)

Sklar’s Theorem (1959)

Continued

Estimating Copulas (Pseudo-Maximum Likelihood Estimation)

Data Description

Variables Overview

Exploratory Data Analysis

Scatter plot for 50 th day

Histogram of 50th day

Assumed Distribution

Variable Selection

Choice of Copulas

Continued

OLS Prediction and Limitations

Construction of the Copula Based Model

Construction of Gaussian copula

Finding conditional copula density

Gaussian Copula Predictions Using Gaussian Copula (Kendall’s Tau)

Gaussian Copula Predictions Using Gaussian Copula (Spearman’s Rho)

Construction of the T-Copula

Finding Conditional Copula Density

Deriving the Conditional Distribution

t-copula predictions for day 60

SSE Results

Final model for prediction

Limitations and Conclusion

References

Thank You