Let \(\{Y(\mathbf{d})\}\) be a series of annual maxima indexed by \(\mathbf{d} \in D \subseteq \mathbb{R}^d\). Then the scaling property states that
\[\begin{equation} Y(\lambda \mathbf{d}) \overset{dist}{=} \lambda^{\theta}Y(\mathbf{d}) \end{equation}\]
for all \(\lambda > 0\), where \(\theta\) is the scaling exponent.
Equality in distribution implies \[\begin{equation} \mathbb{E}\left[Y^h(\lambda \mathbf{d})\right] = \lambda^{h\theta}\mathbb{E}\left[Y^h(\mathbf{d})\right] \end{equation}\]
where \(h\) is the order of the moment. Equivalently, \[\begin{equation} \tag{11} \text{log}\left(\mathbb{E}\left[Y^h(\lambda \mathbf{d})\right]\right) = h\theta \, \text{log}\left( \lambda \right) + \text{log}\left(\mathbb{E}\left[Y^h(\mathbf{d})\right]\right) \end{equation}\]
Following from (11), the simple scaling hypothesis holds when the data fulfill
\[\begin{equation} \tag{19} \text{linearity of } \text{log}\left(\mathbb{E}\left[Y^h(\lambda \mathbf{d})\right]\right) \text{ in } \text{log}\left( \lambda \right) \text{ for each } h \end{equation}\]
\[\begin{equation} \tag{20} \text{linearity of the slope in the moment order, } h \rightarrow h \theta \end{equation}\]
The multiscaling hypothesis holds when the data fulfill (19) but not (20); instead of a linear slope change, multiscaling as defined by Gupta & Waymire presents as a nonlinear, concave change of slope with the moment order \(h\).
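As a rough illustration of how these two assumptions can be checked in practice, here is a minimal Python sketch; the `durations`, `orders`, and `moments` arrays are hypothetical stand-ins for the empirical moments of the annual maxima, and only the fitting logic is the point:

```python
import numpy as np

# Hypothetical inputs: durations (hours) and empirical moments, where
# moments[i, j] = mean(Y(d_j) ** orders[i]) over the annual maxima at duration d_j.
durations = np.array([1, 2, 3, 6, 12, 24, 48, 96], dtype=float)
orders = np.arange(1, 6, dtype=float)
rng = np.random.default_rng(0)
# Toy data generated to mimic simple scaling with theta = 0.4.
moments = np.exp(0.4 * orders[:, None] * np.log(durations)[None, :]
                 + rng.normal(0.0, 0.01, (orders.size, durations.size)))

log_d = np.log(durations)
slopes, r2 = [], []
for i in range(orders.size):
    log_m = np.log(moments[i])
    # Assumption (19): log E[Y^h(lambda d)] should be linear in log(lambda) for each h.
    slope, intercept = np.polyfit(log_d, log_m, 1)
    resid = log_m - (intercept + slope * log_d)
    r2.append(1 - resid.var() / log_m.var())
    slopes.append(slope)
slopes = np.array(slopes)

print("R^2 of the log-log fits (close to 1 under (19)):", np.round(r2, 4))

# Assumption (20): the slopes should satisfy s(h) = h * theta, i.e. be linear
# in h through the origin. Least-squares estimate of theta and the departures:
theta_hat = np.sum(orders * slopes) / np.sum(orders ** 2)
print("theta estimate:", round(float(theta_hat), 3))
print("departure of s(h) from h*theta:", np.round(slopes - theta_hat * orders, 4))
```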
The behavior of the slope change in assumption (20) shows the rate at which the variability in the data decreases as the scale increases. A concave slope change means that as we increase the scale, the variability in the data decreases faster than we would expect under simple scaling, which assumes the variability does not change with scale.
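One way to make this concrete (a short derivation, assuming (19) holds so that \(\mathbb{E}\left[Y^h(\lambda \mathbf{d})\right] = \lambda^{s(h)}\,\mathbb{E}\left[Y^h(\mathbf{d})\right]\) for some slope function \(s(h)\)): the normalized second moment transforms as
\[\begin{equation*} \frac{\mathbb{E}\left[Y^2(\lambda \mathbf{d})\right]}{\mathbb{E}\left[Y(\lambda \mathbf{d})\right]^2} = \lambda^{\,s(2) - 2 s(1)} \, \frac{\mathbb{E}\left[Y^2(\mathbf{d})\right]}{\mathbb{E}\left[Y(\mathbf{d})\right]^2} \end{equation*}\]
Under simple scaling \(s(h) = h\theta\), so \(s(2) - 2s(1) = 0\) and this ratio does not change with scale; under concave multiscaling \(s(2) < 2s(1)\), so for \(\lambda > 1\) the ratio decreases as the scale increases.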
Note there is no physical requirement that the slope change be concave rather than convex; Gupta & Waymire simply found concavity in the slope change to be a common property of empirical moments for the data sets in their analysis. It would be possible to develop a similar theoretical framework for convex slope change. However, this would present modeling challenges, as the concavity is used to develop parameter constraints for the multiscaling model. A model that does not assume concavity or convexity ahead of time would be non-identifiable.
We’re betting that when we temporally (time-domain) scale our series of annual maxima, it will adhere to a set of specific scaling properties that have been established for spatial scaling of hydrologic processes. If our annual maxima fit these properties, we can draw on a rich theoretical basis that allows us to describe how the variability in our process changes as the scale is increased or decreased.
But it is entirely possible that our underlying physical process does not adhere to the scaling properties that would allow us to make claims about either simple scaling or multiscaling.
This would show up as a failure of assumption (19). If our data fail the assumption in (19), it means we are outside the natural class of multiplicative stochastic processes that exhibit the structure needed for multiscaling or simple scaling.
The key point is: if you fail multiscaling, you don’t just drop down to simple scaling; you are outside this realm of stochastic process theory altogether.
The current QDF and IDF models don’t rely on this bit of stochastic process theory, so they are unaffected if the assumption in (19) doesn’t hold. The reason is that they rely on a separable functional dependence between the duration-dependent parameters and the parameters of the GEV, i.e. we can write
\[\begin{equation} q = \frac{a(T)}{b(d)} \label{eqn:sep} \end{equation}\]
where \(a(T)\) is a function of the return period and \(b(d)\) is a function of the duration. Since \(b(d)\) is simply acting as a scaling factor on the GEV parameters, we can remain confident we will have equality in distribution across durations, up to the scale factor \(b(d)\).
The connection between IDF and simple/multiscaling comes when the function \(b(d)\) is a power law in the duration, \(b(d) \propto d^{-\theta}\), so that \(q(\lambda d, T) = \lambda^{\theta}\, q(d, T)\). Then the form of the IDF relationship matches the form of the scaling hypothesis but does not necessarily fulfill the scaling hypothesis.
In most scenarios, this distinction is unimportant, since we are not interested in making statements beyond the first-order distribution of our data and the separability relationship in \eqref{eqn:sep} is enough. But it is important to realize that the scaling hypothesis is much stricter than the separability requirement, and although there is overlap between the IDF model and simple scaling, it is not guaranteed we actually have a simple scaling process unless our data adhere to the specific property in (19).
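To make the separability in \eqref{eqn:sep} concrete, here is a minimal Python sketch under assumed, illustrative ingredients: a GEV fitted to the 1-hour annual maxima plays the role of \(a(T)\) (via its quantile function), and a hypothetical power-law factor \(b(d) = d^{-\theta}\) plays the role of the duration dependence. It demonstrates only the functional form, not the actual QDF/IDF model used here:

```python
import numpy as np
from scipy.stats import genextreme

# Assumed GEV parameters for the 1-hour annual maxima (illustrative only).
# Note: scipy's shape parameter c corresponds to minus the usual GEV shape xi.
c, loc, scale = -0.1, 100.0, 30.0
theta = -0.2                    # assumed scaling exponent, purely illustrative

def a(T):
    """Return-period dependence: quantile of the reference-duration GEV."""
    return genextreme.ppf(1.0 - 1.0 / T, c, loc=loc, scale=scale)

def b(d):
    """Hypothetical power-law duration dependence, b(d) = d**(-theta)."""
    return d ** (-theta)

def q(d, T):
    """Separable QDF/IDF form q = a(T) / b(d)."""
    return a(T) / b(d)

# b(d) only rescales the quantiles, so quantiles at durations d and lambda*d
# differ by the factor b(d) / b(lambda * d) = lambda**theta for every T.
for d in (1, 24, 96):
    print(d, [round(float(q(d, T)), 1) for T in (10, 100, 1000)])
```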
We use data from the hykval 35 database. The data are interpolated to hourly values. The interpolation is applied uniformly across the entire dataset regardless of gap size, which means we do not take point density into account in the linearly interpolated representation.
To check the assumptions behind the scaling hypothesis, we need to decide ahead of time which durations we want to observe (that is, we need to choose the scaling factor \(\lambda\)). Following Gupta and Waymire, we choose a set of durations from our interpolated data set that stretches over roughly two orders of magnitude, from 1 hour to 4 days (96 hours):
\[\begin{equation*} d = 1, 2, 3, \dots, 96\text{ hours} \end{equation*}\]
We are most interested in how the second moment (a measure of the variability) changes with scale, so we plot \(\text{log}\left(\mathbb{E}\left[Y^2(\mathbf{d})\right]\right)\) against \(\text{log}\left(d\right)\) for each station.
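For reference, a minimal Python sketch of this step, using synthetic hourly data in place of a real station (the record length and distribution are illustrative): aggregate the interpolated hourly series to each duration with a moving mean, take annual maxima, and plot the empirical second moment on log-log axes.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for one station's linearly interpolated hourly discharge.
idx = pd.date_range("1990-01-01", "2009-12-31 23:00", freq="h")
q_hourly = pd.Series(np.random.default_rng(1).gamma(2.0, 10.0, len(idx)), index=idx)

durations = range(1, 97)                        # d = 1, 2, ..., 96 hours
annual_max = {}
for d in durations:
    q_d = q_hourly.rolling(window=d, min_periods=d).mean()   # d-hour moving mean
    annual_max[d] = q_d.groupby(q_d.index.year).max()        # annual maxima Y(d)
am = pd.DataFrame(annual_max)                   # rows: years, columns: durations

# Empirical second moment E[Y^2(d)] for each duration, on log-log axes.
second_moment = (am ** 2).mean(axis=0)
plt.plot(np.log(np.array(durations, dtype=float)),
         np.log(second_moment.to_numpy()), "o")
plt.xlabel("log(duration), duration in hours")
plt.ylabel("log E[Y^2(d)]")
plt.title("Second-moment scaling for one (synthetic) station")
plt.show()
```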
All of our stations fail assumption (19) for this range of durations; that is, neither simple scaling nor multiscaling can explain the underlying process when we scale our data across the two orders of magnitude from 1 hour to 96 hours.
Interestingly, we note that for durations greater than ~24 hours for the small (< 200 km\(^2\)) catchments and greater than ~48 hours for the large catchments the plots appear to have a more linear relationship:
The same does not hold true for durations < 24 hours:
The question now is whether this behavior is indicative of a physical property (i.e., as we approach instantaneous flood data, the variability in flow cannot be expressed with the scaling property) or a data issue (i.e., most of our data have a daily resolution, and when we linearly interpolate to hourly values we are destroying some inherent property of the data).
Luckily, seven of our stations have at least 10 years of ultra-high-quality, digitally collected data with no gaps larger than 1 hour around the annual maxima. This forms a sort of “oracle” data set we can use to check the linearity behavior.
We find the curvilinear behavior is preserved when using the oracle data set. This means the failure of assumption (19) at short durations is more likely a property of the underlying process than an artifact of the interpolation.
For the durations that fit assumption (19) (so above 24 hours for the small catchments and above 48 hours for the large catchments) we check assumption (20) for evidence of multiscaling behavior using the full interpolated dataset.
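A minimal Python sketch of this restricted check, again on synthetic data (the cut-off, moment orders, and annual-maxima matrix are illustrative): keep only durations above the cut-off, refit the log-log slope for each moment order, and look at the sign of the curvature of the slope-versus-order relationship.

```python
import numpy as np

# Synthetic annual-maxima matrix: rows = years, columns = durations (hours).
durations = np.arange(1, 97, dtype=float)
rng = np.random.default_rng(2)
am = np.exp(rng.normal(0.0, 0.3, (20, durations.size))) * durations ** 0.35

cutoff = 24.0                                  # 48.0 for the large catchments
keep = durations >= cutoff
orders = np.arange(1, 6, dtype=float)

# Slope of log E[Y^h(d)] against log(d) for each moment order h, using only
# the durations that satisfy assumption (19).
slopes = np.array([
    np.polyfit(np.log(durations[keep]),
               np.log((am[:, keep] ** h).mean(axis=0)), 1)[0]
    for h in orders
])

# Assumption (20): under simple scaling the slopes are linear in h; under
# Gupta & Waymire multiscaling they bend concavely. Negative second differences
# of s(h) indicate concavity, positive ones convexity.
print("slopes s(h):", np.round(slopes, 3))
print("second differences of s(h):", np.round(np.diff(slopes, n=2), 4))
```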
We see both concavity and convexity in our data. To model both types of slope change we would need to be able to predict which catchments have convex vs concave slope change. It is not obvious how we could do this, especially for our particular application where we will extend the model to ungauged catchments.
A secondary goal could simply be to establish the presence of multiscaling, even if we cannot model it. However, it is not clear that the figure above presents evidence of multiscaling: there are multiple issues that hinder our ability to validate assumption (20).
To demonstrate point (2), we repeat the moment analysis using three different periods of data for each station. Each period has around 20 years of data. We see the slope change flip between concavity and convexity in some cases. We also see the relationship change from simple scaling (a linear slope change) to multiscaling, and vice versa, depending on the period used.
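A minimal Python sketch of the period comparison, again on a synthetic annual-maxima table (the record length and period boundaries are illustrative); the point is simply that the slope-versus-order shape is re-estimated separately for each block of years.

```python
import numpy as np
import pandas as pd

# Synthetic annual-maxima table for one station: rows = years, columns = durations.
durations = np.arange(1, 97, dtype=float)
years = np.arange(1950, 2010)
rng = np.random.default_rng(3)
am = pd.DataFrame(
    np.exp(rng.normal(0.0, 0.3, (years.size, durations.size))) * durations ** 0.35,
    index=years, columns=durations)

def slope_vs_order(block, orders=range(1, 6)):
    """Slope of log E[Y^h(d)] against log(d) for each moment order h."""
    log_d = np.log(block.columns.to_numpy(dtype=float))
    return np.array([
        np.polyfit(log_d, np.log((block ** h).mean(axis=0).to_numpy()), 1)[0]
        for h in orders
    ])

# Split the record into three consecutive ~20-year blocks and repeat the analysis;
# instability of the slopes (and of their curvature) across blocks is what the
# text above describes.
for block_years in np.array_split(years, 3):
    s = slope_vs_order(am.loc[block_years])
    print(f"{block_years[0]}-{block_years[-1]}:",
          "slopes", np.round(s, 3), "curvature", np.round(np.diff(s, n=2), 4))
```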
These changes in the relationship between slope and moment order are most likely due to differing amounts of error in the data (daily values vs. limnigraph vs. digitally collected data, and possibly data gaps) rather than reflecting a meaningful change in catchment properties.
Finally, we check assumption (20) with oracle data and get yet another set of slope changes that differ from both (i) using the full set of interpolated data and (ii) using the separate periods of data.
The scaling relationship between instantaneous (hourly) and daily (24 hour) floods cannot be explained by either simple scaling or multiscaling.
Above 24 hours for small catchments (< 200 km\(^2\)) and above 48 hours for large catchments we see behavior that matches the scaling hypothesis, but we cannot determine whether this behavior is indicative of simple scaling or multiscaling.
Since we cannot differentiate between simple scaling and multiscaling behavior in our data, we cannot disprove the simple model.