Meeting agenda 30.08.2023

In this meeting we discuss the set-up and framing of paper III.

Discussion should be guided by “what is the minimum we can do and still have a defensible study?”

Agenda items:

Read through draft abstract together. This provides a first attempt at framing the study.
I will present preliminary results. These will hopefully provide context on the sorts of questions we can ask.
Discuss framing of the study (three specific questions)

Draft abstract

Regression-based methods are commonly used to estimate flood quantiles in ungauged basins. Comparisons of different regression methods typically focus on predictive accuracy. However, when flood quantile estimates are needed at multiple durations, consistency of estimates between durations is also important: we do not want, for example, an estimate that says more water will arrive in a 1-hour window than in a 24-hour window when the latter encompasses the former. In this study we compare two regression-based regional flood frequency analysis (RFFA) methods and assess their ability to provide estimates that match the consistency between durations observed in the data. In one approach we regress directly on predetermined flood quantiles (quantile regression technique). In the other approach we regress on the (independently estimated) parameters of the Generalized Extreme Value (GEV) distribution, which are then used to calculate particular flood quantiles (parameter regression technique). For both methods, the estimation is performed independently across durations. For the quantile regression technique, the regression model is a boosted tree ensemble (XGBoost). As one of the main differences between the parameter and quantile regression techniques is the ability of parameter regression to provide unified uncertainty estimates across quantiles, it is natural to choose a regression model for the parameter regression technique that preserves this property. A generalized additive model (GAM) is a good choice here, as the data-driven nature of the predictor-response relationship within the GAM matches XGBoost in character, while the underlying distributional assumptions allow for uncertainty assessment. We find that…

Preliminary results

Parameters of GEV from pre-selection

beta

no interactions XGBoost

pure XGBoost

xi

no interactions XGBoost

pure XGBoost

Parameters of quantiles from pre-selection

Q50

pure XGBoost

Q100

pure XGBoost

Q500

pure XGBoost

The main findings are:
- XGBoost finds it increasingly hard to detect a signal the further out in the tail we go. This means we should be careful not to over-interpret IIS-chosen covariates.
- Regression directly on the quantiles with XGBoost is far more duration-inconsistent than either regression on the parameters of the GEV or the local Stan fits. However, we should discuss the choice of regression model.

Regression on quantiles vs regression on GEV parameters

Regression on parameters of GEV:

\(\eta\) - floodGAM
\(\beta\) - GAM with covariates \(Q_N\), \(A_{For}\), \(R_{G1085}\), \(P_{Jun}\)
\(\xi\) - GAM with covariates \(Q_N\), \(R_{G1085}\), \(P_{Mai}\)

Quantile regression: Q50-Q1000 - pure XGBoost with IIS pre-selection

All predicted GEV parameters are within the support of the distribution.
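
For reference, the step from predicted GEV parameters to a flood quantile can be sketched as below. This is a minimal sketch that assumes the standard location/scale/shape parameterization (here called `mu`, `sigma`, `xi`); the \(\eta\), \(\beta\), \(\xi\) parameterization used in our models may differ and would need to be converted first. The function name is hypothetical.

```python
import math


def gev_return_level(mu: float, sigma: float, xi: float, T: float) -> float:
    """Return level for return period T (years) under a GEV(mu, sigma, xi).

    Assumes the standard parameterization (an assumption, not necessarily
    the one used in the paper): F(x) = exp(-(1 + xi*(x - mu)/sigma)^(-1/xi)).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if T <= 1:
        raise ValueError("T must be greater than 1")
    # y = -log(p) with non-exceedance probability p = 1 - 1/T
    y = -math.log(1.0 - 1.0 / T)
    if abs(xi) < 1e-9:
        # Gumbel limit (xi -> 0)
        return mu - sigma * math.log(y)
    # expm1 keeps the result accurate when xi is small
    return mu + sigma / xi * math.expm1(-xi * math.log(y))
```

With this form, a larger return period always gives a larger return level for fixed parameters, so quantiles derived from a single set of GEV parameters cannot cross one another.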

Inconsistencies between quantiles (quantile crossing) are corrected with the rearrange function from the R package quantreg (see the link Thomas shared at the last meeting).
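
The idea behind the rearrangement step can be illustrated with a short sketch. On a fixed grid of return periods, monotone rearrangement amounts to sorting each station's predicted quantiles into increasing order. This is not the quantreg implementation; the function name and the dictionary layout are hypothetical and the numbers are toy values.

```python
from typing import Dict, List


def rearrange_quantiles(pred: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Sort each station's predictions so they are non-decreasing.

    Assumes each list is ordered by increasing return period
    (e.g. Q50, Q100, Q500), so a decrease indicates quantile crossing.
    """
    return {station: sorted(values) for station, values in pred.items()}


# Toy example: station "A" has Q100 > Q500, station "B" is already monotone.
toy = {"A": [10.0, 12.0, 11.5], "B": [5.0, 6.0, 7.0]}
fixed = rearrange_quantiles(toy)
```

Sorting changes nothing for stations that are already monotone, so the adjustment only touches predictions where quantiles cross.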

Inconsistencies between durations

For individual Stan fits:

##          ID V1
## 1: 31100004  1

For regression on parameters of GEV:

##          ID V1
## 1: 15600017  1
## 2:   200415  1
## 3:   200614  1
## 4:  2500024  1

For regression on quantiles:

##           ID V1
##  1:  1200171  3
##  2: 12800009  1
##  3: 13900025  2
##  4: 14000002 10
##  5: 14800002  2
##  6:  1900107  2
##  7: 19600007  4
##  8:   200614  8
##  9: 21200011  7
## 10: 21300002  9
## 11: 23400005  2
## 12: 24600009  2
## 13:  2600021  1
## 14:  2700015  1
## 15:   300022  1
## 16: 31100006  5
## 17: 31100460  7
## 18:  6800001  5
## 19:  8300002  2
## 20:  8300012  8
## 21:  8800004  6
## 22:  8900001 11
## 23:  9600003  5
##           ID V1
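
A count of this kind could be computed along the following lines. This is only a sketch under stated assumptions: estimates are expressed as the amount of water arriving within each duration window, an inconsistency is any pair of adjacent durations where the shorter window receives more than the longer one, and the data layout and function name are hypothetical. It is not necessarily how the counts in the V1 column were produced.

```python
from typing import Dict, List, Tuple

# Hypothetical layout:
# estimates[station_id][return_period] = [(duration_hours, volume), ...]
Estimates = Dict[str, Dict[int, List[Tuple[float, float]]]]


def count_duration_inconsistencies(estimates: Estimates) -> Dict[str, int]:
    """Count, per station, adjacent-duration pairs where the shorter
    window is estimated to receive more water than the longer one."""
    counts: Dict[str, int] = {}
    for station, by_period in estimates.items():
        n = 0
        for pairs in by_period.values():
            ordered = sorted(pairs)  # order by duration
            for (_, v_short), (_, v_long) in zip(ordered, ordered[1:]):
                if v_short > v_long:
                    n += 1
        if n > 0:  # report only stations with at least one inconsistency
            counts[station] = n
    return counts


# Toy values only; station labels "A" and "B" are placeholders.
toy_estimates = {
    "A": {
        50: [(1, 10.0), (24, 8.0), (48, 12.0)],
        100: [(1, 11.0), (24, 15.0), (48, 14.0)],
    },
    "B": {50: [(1, 1.0), (24, 2.0)]},
}
toy_counts = count_duration_inconsistencies(toy_estimates)
```

Counting only adjacent durations keeps the measure simple; comparing all pairs of durations would give larger counts for the same estimates.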

Return level plots

10100001

14000002

15600017

To-do: decide framing of the study

Today I think there are three questions we should discuss:

We have options for each regression model. Right now we use GAMs for the parameter regression and XGBoost for the quantile regression. Which regression models do we want to use? Is there a way to structure our study so that we won’t get “why didn’t you try this, this, and this regression model?”
What role does predictive accuracy play in our study? What role does uncertainty play?
Do we have an idea of what the implications are of one method being more duration-consistent than the other?


Paper III

D. Barna

2023-08-13
