This markdown document reports a quick afternoon look at several probability distributions for the subsurface storage data.

Plots for the exponential distribution

The support of the exponential distribution is \(x \in [0, \infty)\):

Histogram

Empirical CDF

Probability integral transform

We can check whether our observations follow the estimated distribution using the probability integral transform (PIT), which evaluates the fitted CDF at each observation, \(u_i = \hat{F}(x_i)\). If the estimated distribution closely matches the unknown true distribution of the data, the PIT values will be approximately uniformly distributed on [0, 1] (Dawid, 1984).
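The report does not show how the PIT values were computed. Below is a minimal sketch in Python. It assumes the exponential was fitted by maximum likelihood, where the rate estimate is the reciprocal of the sample mean, and that the observations are a plain list of non-negative floats. All function names are hypothetical.

```python
import math

def fit_exponential_rate(data):
    """Maximum-likelihood exponential rate: the reciprocal of the sample mean."""
    if not data or any(x < 0 for x in data):
        raise ValueError("need non-negative observations on the support [0, inf)")
    mean = sum(data) / len(data)
    if mean == 0:
        raise ValueError("sample mean must be positive")
    return 1.0 / mean

def exponential_cdf(x, rate):
    """F(x) = 1 - exp(-rate * x) on x >= 0, and 0 below the support."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def pit_values(data, cdf):
    """Probability integral transform: u_i = F_hat(x_i) for each observation."""
    return [cdf(x) for x in data]

def pit_histogram(u, n_bins=10):
    """Count PIT values in n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for value in u:
        # min(...) keeps a PIT value of exactly 1.0 in the last bin
        counts[min(int(value * n_bins), n_bins - 1)] += 1
    return counts

# Usage, with the data in a list of floats (hypothetical name `storage`):
# rate = fit_exponential_rate(storage)
# counts = pit_histogram(pit_values(storage, lambda x: exponential_cdf(x, rate)))
```

A roughly flat set of counts is consistent with a good fit. As a sanity check, synthetic points placed exactly at the quantiles of an exponential with a known rate produce identical counts in every bin.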

The PIT histogram has an excess of values in the lowest bin. More observations fall below the fitted distribution's low quantiles than a good fit would produce, so the exponential distribution tends to overestimate the smallest values in the data. This is likely due to the spike at low values, which the exponential distribution captures poorly.

The PIT histogram also shows a slight excess of values at the higher quantiles, suggesting that the exponential distribution's tail may be too light. We therefore test a few heavier-tailed distributions.
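One way to quantify "an excess at the lowest quantile" and "more values at higher quantiles" is to compare the lowest and highest PIT-bin counts with the count expected under uniformity, which is the total divided by the number of bins. Below is a minimal sketch. The function name and the example counts are hypothetical and are not taken from the storage data.

```python
def edge_bin_ratios(counts):
    """Observed / expected counts for the lowest and highest PIT bins.

    Under a well-calibrated fit every bin expects total / n_bins values.
    A low-bin ratio above 1: too many observations sit below the fitted
    distribution's low quantiles. A high-bin ratio above 1: too many sit
    above its high quantiles, i.e. the fitted tail is too light.
    """
    if not counts or sum(counts) == 0:
        raise ValueError("need at least one PIT value")
    expected = sum(counts) / len(counts)
    return counts[0] / expected, counts[-1] / expected

# Hypothetical 10-bin PIT histogram holding 100 values (illustration only)
low, high = edge_bin_ratios([20, 8, 8, 8, 8, 8, 8, 8, 8, 16])
print(low, high)  # 2.0 1.6
```

How large a ratio must be before it matters depends on the sample size. A formal goodness-of-fit test on the bin counts, such as a chi-square test, could complement this check.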

Investigating tail behavior and heavy-tailed distributions

We test the Frechet and log-logistic distributions as heavier-tailed alternatives. However, neither fits both the low values and the heavier tail well (see plots).
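For reference, here is a minimal sketch of the two CDFs. It assumes the two-parameter forms with location fixed at zero, so only shape and scale vary. The report does not state which parameterization was fitted, so treat these as illustrative. Either function can be evaluated at each observation to obtain PIT values.

```python
import math

def frechet_cdf(x, shape, scale):
    """Frechet CDF with location 0: F(x) = exp(-(x / scale) ** (-shape)) for x > 0."""
    if x <= 0:
        return 0.0
    return math.exp(-((x / scale) ** (-shape)))

def loglogistic_cdf(x, shape, scale):
    """Log-logistic CDF: F(x) = 1 / (1 + (x / scale) ** (-shape)) for x > 0."""
    if x <= 0:
        return 0.0
    return 1.0 / (1.0 + (x / scale) ** (-shape))
```

In this form, the Frechet CDF approaches 0 faster than any power of x as x approaches 0. It therefore places very little probability on the smallest values, which may contribute to the difficulty of fitting the low end.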

Taken together, the fits indicate that none of the individual probability distributions tested here captures all characteristics of the data. For a better representation of the subsurface storage values, we could (i) model the spike at low values and/or (ii) blend a fatter tail onto the exponential. Neither option is trivial, and either would likely require a mixture distribution.
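As one illustration of how options (i) and (ii) could be combined, a two-component mixture CDF \(F(x) = w F_1(x) + (1 - w) F_2(x)\) could pair a sharply peaked component for the low-value spike with a heavier-tailed component for the body and tail. Below is a minimal sketch. The component choices (a steep exponential plus a log-logistic), the weight, and all parameter values are hypothetical and were not fitted to the storage data. In practice, estimating \(w\) and the component parameters would require maximum likelihood, for example via the EM algorithm, which is not shown here.

```python
import math

def mixture_cdf(x, weight, cdf_spike, cdf_body):
    """Two-component mixture: F(x) = w * F_spike(x) + (1 - w) * F_body(x)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("mixture weight must lie in [0, 1]")
    return weight * cdf_spike(x) + (1.0 - weight) * cdf_body(x)

def make_exponential_cdf(rate):
    """Exponential CDF with the given rate, returned as a function of x."""
    return lambda x: (1.0 - math.exp(-rate * x)) if x >= 0 else 0.0

def make_loglogistic_cdf(shape, scale):
    """Log-logistic CDF with the given shape and scale, as a function of x."""
    return lambda x: (1.0 / (1.0 + (x / scale) ** (-shape))) if x > 0 else 0.0

# Hypothetical parameters: 30% of the mass in a steep exponential near zero,
# 70% in a log-logistic body whose tail decays polynomially (heavier).
spike = make_exponential_cdf(rate=20.0)
body = make_loglogistic_cdf(shape=2.5, scale=1.0)

def example_cdf(x):
    return mixture_cdf(x, 0.3, spike, body)
```

Because the result is still an ordinary CDF, the same PIT check used for the single distributions applies to it unchanged.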

Histogram

Empirical CDF

Probability integral transform

The PIT values show that the log-logistic and Frechet distributions are far too heavy-tailed. The exponential may be slightly too light-tailed, but its tail behavior is not far off from what we want.
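To make "too heavy" and "too light" concrete, one can compare how fast each survival function \(S(x) = 1 - F(x)\) decays. For the exponential, \(S(2x)/S(x) = e^{-\lambda x}\), which shrinks toward zero as \(x\) grows. For the Frechet and log-logistic (location 0), \(S(x)\) eventually decays like a power of \(x\), so the ratio approaches \(2^{-\alpha}\) for shape \(\alpha\) and tail probability persists much further out. Below is a minimal sketch with hypothetical parameter values.

```python
import math

def exponential_survival(x, rate):
    return math.exp(-rate * x)

def frechet_survival(x, shape, scale):
    # 1 - exp(-t) computed as -expm1(-t) to stay accurate when t is tiny
    return -math.expm1(-((x / scale) ** (-shape)))

def loglogistic_survival(x, shape, scale):
    return 1.0 / (1.0 + (x / scale) ** shape)

def doubling_ratio(survival, x):
    """S(2x) / S(x): share of tail probability left when the threshold doubles."""
    return survival(2.0 * x) / survival(x)

# Hypothetical parameters, chosen only to contrast the tail shapes
for x in (1.0, 10.0, 100.0):
    print(
        x,
        doubling_ratio(lambda t: exponential_survival(t, 1.0), x),
        doubling_ratio(lambda t: frechet_survival(t, 2.0, 1.0), x),
        doubling_ratio(lambda t: loglogistic_survival(t, 2.0, 1.0), x),
    )
```

With rate 1, the exponential ratio equals \(e^{-x}\) and collapses quickly. Both shape-2 ratios settle near \(2^{-2} = 0.25\) for large \(x\).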

References

Dawid, A. P.: Present position and potential developments: Some personal views: Statistical theory: The prequential approach, Journal of the Royal Statistical Society: Series A (General), 147, 278–290, 1984.