Conclusions
Model sensitivity to input durations
Model performance on in-sample durations
- IQD
- MAPE
Model performance on out-of-sample durations
- IQD
- MAPE
Comparison of in- and out-of-sample sub-daily estimates
Appendix

Conclusions

DD outperforms both Javelle and the mixture model. Its advantage is greatest for events with short durations, long return periods, or both. The mixture model performs second best and Javelle third.
With a few exceptions, QDF models predict out-of-sample durations with only a moderate loss in accuracy relative to in-sample durations. The exceptions tend to be stations that are already poorly fit by QDF models; that is, if there is already a significant difference between the QDF and reference models, this difference is likely to be amplified when predicting out-of-sample durations.
QDF models in general can struggle when there is a large spread in shape parameter values between the durations estimated, or when catchment characteristics contribute to underlying differences in hydrograph shape between durations (for example, the diurnal melt-freeze cycle seen in Hugdal Bru).
QDF models are sensitive to the durations fed into them and should be fit with the minimum number of evenly spaced durations needed to ensure convergence of the model.

Model sensitivity to input durations

Purpose of section: assess model behavior.

Adding durations unnecessarily can bias estimates and artificially narrow the credible intervals. Section already written up in paper draft.

Model performance on in-sample durations

Purpose of section: assess model behavior.

Models are fit with four durations (1, 24, 48, 72 hours) and assessed at each of these in-sample durations.

IQD

Purpose of IQD: analyze distributional similarity to the reference model (local GEV) within the observed range of data.
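
As an illustration of what the IQD measures, the sketch below integrates the squared difference between two GEV distribution functions over a fixed range. It assumes the IQD is the unweighted integrated quadratic distance, i.e. the integral of (F(x) - G(x))^2, restricted to the observed data range. The parameter values, the range, and the function names are hypothetical and are not taken from the fitted models.

```python
import math

def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function with location mu, scale sigma, shape xi."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:          # Gumbel limit
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:                 # outside the support
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def iqd(cdf_f, cdf_g, lower, upper, n=2000):
    """Trapezoidal approximation of the integral of (F - G)^2 on [lower, upper]."""
    h = (upper - lower) / n
    total = 0.0
    for i in range(n + 1):
        x = lower + i * h
        diff2 = (cdf_f(x) - cdf_g(x)) ** 2
        total += diff2 if 0 < i < n else 0.5 * diff2
    return total * h

# Hypothetical parameters: a local GEV fit (reference) and a QDF-implied GEV.
reference = lambda x: gev_cdf(x, mu=100.0, sigma=30.0, xi=0.10)
qdf_model = lambda x: gev_cdf(x, mu=105.0, sigma=28.0, xi=0.05)

# Hypothetical observed range of the annual maxima.
score = iqd(reference, qdf_model, lower=60.0, upper=250.0)
identical = iqd(reference, reference, lower=60.0, upper=250.0)
print(score, identical)
```

A score of zero means the two distribution functions coincide on the chosen range; larger values indicate a poorer distributional match.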

Performance on in-sample durations is what we expect: models with more parameters fit the data better.

##    model mean_IQD
## 1:    DD    0.066
## 2:     J    0.074
## 3:    RJ    0.071

Double-Delta (DD) beats both J and RJ at a majority of the 12 stations at every duration (see return level plots in appendix), and has the biggest advantage on the shorter durations.

##     d DD_beats_J DD_beats_RJ
## 1:  1      10/12       10/12
## 2: 24       9/12        9/12
## 3: 48       8/12        8/12
## 4: 72       7/12        7/12
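
Counts like those in the table above can be tabulated from per-station scores. The sketch below is a minimal illustration with made-up IQD values for three placeholder stations at one duration. The scores and the `beats` helper are hypothetical, and counting a tie as "does not beat" is an assumption.

```python
# Hypothetical IQD scores (lower is better), keyed by (station, duration in hours).
scores = {
    "DD": {("A", 1): 0.05, ("B", 1): 0.07, ("C", 1): 0.09},
    "J":  {("A", 1): 0.06, ("B", 1): 0.07, ("C", 1): 0.08},
}

def beats(scores, model, rival, duration):
    """Return (wins, total): stations where `model` scores strictly lower than `rival`."""
    keys = [k for k in scores[model] if k[1] == duration]
    wins = sum(scores[model][k] < scores[rival][k] for k in keys)
    return wins, len(keys)

wins, total = beats(scores, "DD", "J", duration=1)
print(f"{wins}/{total}")  # prints 1/3
```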

There are certain stations and durations where QDF models provide a poor distributional fit (Hugdal Bru at 1 hour, Øyungen).

MAPE

Purpose of MAPE: analyze differences in tail behavior between the QDF model and local GEV fits outside the observed range of data.
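
A sketch of how such a tail comparison could be computed. It assumes the MAPE is the mean, over stations, of the absolute percentage difference between the QDF and local GEV return levels at a given return period; the exact averaging used in the analysis is not stated here. All parameter values and function names below are hypothetical.

```python
import math

def gev_return_level(T, mu, sigma, xi):
    """Level exceeded on average once every T years under a GEV(mu, sigma, xi)."""
    y = -math.log(1.0 - 1.0 / T)
    if abs(xi) < 1e-12:                      # Gumbel limit
        return mu - sigma * math.log(y)
    return mu + sigma / xi * (y ** (-xi) - 1.0)

def mape(estimates, references):
    """Mean absolute percentage error of estimates relative to references."""
    errors = [abs(e - r) / abs(r) for e, r in zip(estimates, references)]
    return 100.0 * sum(errors) / len(errors)

# Hypothetical (mu, sigma, xi) per station: local GEV fit vs. QDF-implied GEV.
local_fits = [(100.0, 30.0, 0.10), (40.0, 12.0, 0.20)]
qdf_fits = [(102.0, 29.0, 0.08), (41.0, 12.5, 0.12)]

for T in (100, 1000):
    ref = [gev_return_level(T, *p) for p in local_fits]
    est = [gev_return_level(T, *p) for p in qdf_fits]
    print(T, round(mape(est, ref), 2))
```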

The addition of the second delta has the most impact when estimating events with long return periods (see figures: the IQD values for RJD and J are almost identical, whereas the MAPE values for RJD cluster much more tightly around the diagonal than those for J).
DD again has the lowest average MAPE on the in-sample durations at both return periods (as expected), but its advantage is concentrated on the short durations: it beats J and RJ at most stations for 1 and 24 hours, but at no more than half of them for 48 and 72 hours.

##    model   rp mean_MAPE
## 1:    DD  100     7.108
## 2:     J  100     8.315
## 3:    RJ  100     7.271
## 4:    DD 1000    10.962
## 5:     J 1000    12.483
## 6:    RJ 1000    11.151

##     d   rp DD_beats_J DD_beats_RJ
## 1:  1  100      10/12       10/12
## 2:  1 1000      10/12       10/12
## 3: 24  100       9/12       10/12
## 4: 24 1000       9/12        9/12
## 5: 48  100       4/12        4/12
## 6: 48 1000       4/12        4/12
## 7: 72  100       6/12        4/12
## 8: 72 1000       4/12        3/12

Some of the MAPE values are high. We suspect that one contributing factor is a wide spread of shape parameter (xi) values across the local GEV fits, for example at Gryta.
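
One way to see why a spread in xi matters at long return periods is through the GEV return level formula. This is the standard expression for a GEV with location $\mu$, scale $\sigma$ and shape $\xi \neq 0$, not something specific to the models fit here:

```latex
x_T = \mu + \frac{\sigma}{\xi}\left[\left(-\log\left(1 - \tfrac{1}{T}\right)\right)^{-\xi} - 1\right]
\approx \mu + \frac{\sigma}{\xi}\left(T^{\xi} - 1\right) \quad \text{for large } T .
```

Because the return level grows roughly like $T^{\xi}$, local fits that differ in xi diverge more and more as T goes from 100 to 1000. To the extent that a QDF model constrains the shape across durations, this would show up as larger percentage errors in the tail.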

Model performance on out-of-sample durations

Purpose of section: rank the models.

Models are fit with four durations (24, 36, 48, 60 hours) and used to predict the 1 & 12 hour durations.

IQD

DD has the best average IQD score on the out-of-sample durations (1 & 12 hours).

##    model mean_IQD
## 1:    DD    0.488
## 2:     J    0.616
## 3:    RJ    0.594

There are only three station / duration combinations where DD performs worse than either J or RJD (Sjodalsvatn at both durations, and Dyrdalsvatn at 1 hour). Everywhere else it’s either the same or better.
Some stations are still difficult to estimate with QDF models (Hugdal Bru, Røykenes).

MAPE

DD has the best average MAPE score on the out-of-sample durations (1 & 12 hours) at both return periods of interest.

##    model   rp mean_MAPE
## 1:    DD  100    12.678
## 2:     J  100    13.826
## 3:    RJ  100    13.248
## 4:    DD 1000    17.332
## 5:     J 1000    18.738
## 6:    RJ 1000    18.185

DD provides an equal or better fit at about 75% of the station / duration combinations studied. There are 5 or 6 combinations where it is outperformed by J or RJD (marked in red on the plot).
Several of the smallest catchments (Gravå, Gryta, Grosettjern) have high out-of-sample MAPE values. This is likely because the sub-daily durations differ substantially from the other durations (i.e. the averaging window starts to become much wider than the typical flood event). One of the ways the sub-daily durations tend to differ is in their xi values.
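
The effect of the averaging window can be illustrated with a toy hydrograph. The sketch below assumes that a duration d corresponds to a moving average of the flow series over a window of d hours, and uses a made-up hourly flood pulse lasting six hours. The series and the `max_window_mean` helper are hypothetical.

```python
def max_window_mean(flow, window):
    """Largest mean flow over any run of `window` consecutive hourly values."""
    running = sum(flow[:window])
    best = running
    for i in range(window, len(flow)):
        running += flow[i] - flow[i - window]
        best = max(best, running)
    return best / window

# Hypothetical hourly flow: baseflow of 2 with a 6-hour flood pulse, as in a small catchment.
flow = [2.0] * 30 + [10.0, 30.0, 50.0, 40.0, 20.0, 8.0] + [2.0] * 36

for d in (1, 12, 24, 48):
    print(d, round(max_window_mean(flow, d), 2))
```

Once the window is much wider than the pulse, the windowed maximum is dominated by baseflow rather than by the flood peak, which is one way the sub-daily and multi-day maxima can end up with different distributions.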

Comparison of in- and out-of-sample sub-daily estimates

Purpose of section: assess capabilities of QDF models. How much do we lose going from in sample to out of sample?

Two sets of models are fit: one with four durations (24, 36, 48, 60 hours) that is then used to predict the 1 & 12 hour durations, and one with six durations (1, 12, 24, 36, 48, 60 hours) where the 1 & 12 hour durations are evaluated as in-sample durations. The 1 & 12 hour estimates from each set of models are then compared.

The stations that have the greatest loss when going from in-sample to out-of-sample tend to be stations that already had high IQD or MAPE values. This means that if there is already a significant difference between the QDF and reference models, this difference is likely to be amplified when predicting out-of-sample durations.
Most stations and durations have a relatively moderate loss when moving from in- to out-of-sample on both the IQD and MAPE (the exceptions are labeled in the plots). MAPE has an intuitive interpretation: there is typically only a +/- 5% difference in MAPE between the in- and out-of-sample sets.
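
The comparison described above amounts to a paired difference per station and duration. A minimal sketch follows, using made-up MAPE values for placeholder stations and a hypothetical 5-point threshold for flagging exceptions. The threshold is motivated by the +/- 5% figure quoted above but is not the rule used for the plot labels.

```python
# Hypothetical MAPE values keyed by (station, duration in hours).
in_sample = {("A", 1): 8.0, ("A", 12): 6.5, ("B", 1): 20.0, ("B", 12): 9.0}
out_of_sample = {("A", 1): 10.5, ("A", 12): 7.0, ("B", 1): 31.0, ("B", 12): 12.0}

def loss(in_scores, out_scores):
    """Out-of-sample minus in-sample score for every key present in both."""
    return {k: out_scores[k] - in_scores[k] for k in in_scores if k in out_scores}

def flag_exceptions(losses, threshold=5.0):
    """Keys whose absolute loss exceeds the threshold, sorted by key."""
    return sorted(k for k, v in losses.items() if abs(v) > threshold)

losses = loss(in_sample, out_of_sample)
print(flag_exceptions(losses))  # prints [('B', 1)]
```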

Results version 5.0

D. Barna

7/19/2022

Appendix

Gryta with individually fit GEV overlaid…