In this meeting we discuss the set-up and framing of paper III.
Discussion should be guided by “what is the minimum we can do and still have a defensible study?”
Agenda items:
Regression-based methods are commonly used to estimate flood quantiles in ungauged basins. Comparisons of regression methods typically focus on predictive accuracy. However, when flood quantile estimates are needed at multiple durations, consistency of the estimates between durations is also important: we do not want, for example, an estimate that says more water will arrive in a 1-hour window than in a 24-hour window when the latter encompasses the former. In this study we compare two regression-based regional flood frequency analysis (RFFA) methods and assess their ability to provide estimates that match the between-duration consistency observed in the data. In one approach we regress directly on predetermined flood quantiles (quantile regression technique). In the other we regress on the (independently estimated) parameters of the Generalized Extreme Value (GEV) distribution, which are then used to calculate particular flood quantiles (parameter regression technique). For both methods the estimation is performed independently across durations. For the quantile regression technique the regression model is a boosted tree ensemble (XGBoost). Because one of the main differences between the two techniques is the ability of parameter regression to provide unified uncertainty estimates across quantiles, it is natural to choose a regression model for the parameter regression that preserves this property. A generalized additive model (GAM) is a good choice here: the data-driven nature of the predictor-response relationship within the GAM matches XGBoost in character, while the underlying distributional assumptions allow for uncertainty assessment. We find that…
The main findings are:
- XGBoost has a harder and harder time finding a signal the further out in the tail we go. This means we should be careful not to over-interpret IIS-chosen covariates.
- Regression directly on the quantiles with XGBoost is far more duration-inconsistent than either regression on the parameters of the GEV or the local Stan fits. However, we should discuss choice of regression model.
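To make "duration-inconsistent" concrete, here is a minimal Python sketch of one possible reading of the requirement: the water volume implied by an estimate should not be larger for a shorter window than for a longer one. It assumes, hypothetically, that each estimate is a mean discharge in m³/s over a window of the given length in hours. The function names and the numbers are illustrative and not taken from the study.

```python
def implied_volumes(mean_discharge_by_duration):
    """Convert {duration in hours: mean discharge in m3/s} to {duration: volume in m3}.

    Assumption (not from the notes): estimates are window-averaged discharges.
    """
    return {d: q * d * 3600.0 for d, q in mean_discharge_by_duration.items()}


def duration_inconsistencies(mean_discharge_by_duration):
    """Return (shorter, longer) duration pairs where the shorter window holds more volume."""
    volumes = implied_volumes(mean_discharge_by_duration)
    durations = sorted(volumes)
    return [
        (short, long_)
        for i, short in enumerate(durations)
        for long_ in durations[i + 1:]
        if volumes[short] > volumes[long_]
    ]


# Hypothetical estimates of the same quantile at two durations.
consistent = {1: 100.0, 24: 60.0}
inconsistent = {1: 100.0, 24: 3.0}

print(duration_inconsistencies(consistent))
print(duration_inconsistencies(inconsistent))
```

Counting such pairs per station and quantile would give a simple tally to compare the two regression techniques, if this reading matches the definition we settle on.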
Regression on parameters of GEV:
- \(\eta\) - floodGAM
- \(\beta\) - GAM with parameters \(Q_N\), \(A_{For}\), \(R_{G1085}\), \(P_{Jun}\)
- \(\xi\) - GAM with parameters \(Q_N\), \(R_{G1085}\), \(P_{Mai}\)

Quantile regression: Q50-Q1000 - pure XGBoost w. IIS pre-selection
All predicted GEV parameters are within the support of the distribution.
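For reference, a minimal Python sketch of how flood quantiles follow from a set of GEV parameters, together with a support check for a given value. It uses the standard location-scale-shape form \((\mu, \sigma, \xi)\), which is an assumption: the mapping from the \(\eta\) and \(\beta\) used in these notes to \(\mu\) and \(\sigma\) is not stated here and would have to be applied first. The function names and parameter values are hypothetical.

```python
import math


def gev_quantile(p, mu, sigma, xi):
    """Quantile of GEV(mu, sigma, xi) at non-exceedance probability p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    y = -math.log(p)
    if abs(xi) < 1e-12:
        # Gumbel limit as the shape parameter tends to zero.
        return mu - sigma * math.log(y)
    return mu + sigma / xi * (y ** (-xi) - 1.0)


def return_level(return_period, mu, sigma, xi):
    """Quantile for a return period in years, assuming annual maxima."""
    return gev_quantile(1.0 - 1.0 / return_period, mu, sigma, xi)


def in_support(x, mu, sigma, xi):
    """True if x has positive density under GEV(mu, sigma, xi)."""
    return 1.0 + xi * (x - mu) / sigma > 0.0


# Hypothetical parameter values, for illustration only.
mu, sigma, xi = 10.0, 2.0, 0.1
for T in (50, 100, 1000):
    print(T, round(return_level(T, mu, sigma, xi), 2))
```

Because all quantiles come from one set of parameters, they are ordered by construction within a duration. Ordering across durations is not guaranteed when each duration is fitted independently.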
Inconsistencies between quantiles (a higher quantile estimated below a lower one) are adjusted with rearrange from the quantreg package (link Thomas shared last meeting).
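As a rough illustration of what the adjustment does, here is a simplified Python sketch. On a finite grid of quantile levels, monotone rearrangement amounts to sorting the predicted values so that they are non-decreasing in the quantile level. This is not a port of the quantreg function, which operates on step functions. The function names and numbers are hypothetical.

```python
def count_crossings(estimates):
    """Count adjacent pairs where a higher quantile is estimated below a lower one.

    `estimates` must be ordered by increasing quantile level (e.g. Q50, Q100, ...).
    """
    return sum(1 for lower, higher in zip(estimates, estimates[1:]) if higher < lower)


def rearrange_quantiles(estimates):
    """Sort the estimates so they are non-decreasing along the quantile grid."""
    return sorted(estimates)


# Hypothetical predictions for four increasing return periods at one station.
predicted = [120.0, 150.0, 145.0, 170.0]
adjusted = rearrange_quantiles(predicted)

print(count_crossings(predicted), adjusted, count_crossings(adjusted))
```

Note that sorting only repairs the ordering within a duration. It does nothing for consistency between durations.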
## ID V1
## 1: 31100004 1
## ID V1
## 1: 15600017 1
## 2: 200415 1
## 3: 200614 1
## 4: 2500024 1
## ID V1
## 1: 1200171 3
## 2: 12800009 1
## 3: 13900025 2
## 4: 14000002 10
## 5: 14800002 2
## 6: 1900107 2
## 7: 19600007 4
## 8: 200614 8
## 9: 21200011 7
## 10: 21300002 9
## 11: 23400005 2
## 12: 24600009 2
## 13: 2600021 1
## 14: 2700015 1
## 15: 300022 1
## 16: 31100006 5
## 17: 31100460 7
## 18: 6800001 5
## 19: 8300002 2
## 20: 8300012 8
## 21: 8800004 6
## 22: 8900001 11
## 23: 9600003 5
## ID V1
Today I think there are three questions we should discuss:
We have options for each regression model. Right now we use GAMs for the parameter regression and XGBoost for the quantile regression. Which regression models do we want to use? Is there a way to structure the study so that we don’t get “why didn’t you try this, this, and this regression model?”
What role does predictive accuracy play in our study? Uncertainty?
Do we have an idea of what the implications are of having one method be more duration-consistent than the other?