This document reports a quick afternoon look at several candidate probability distributions for the subsurface storage data.
The support of the distribution is \(x \in [0, \infty)\):
We can check if our observations follow the estimated distribution using the probability integral transform (PIT). If the estimated distribution closely aligns with the unknown true distribution of the data, the PIT values will be approximately uniformly distributed (Dawid, 1984).
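The PIT check described above can be sketched as follows. This is a minimal illustration, not the code used for this report: it assumes the exponential distribution is fitted by maximum likelihood (rate = 1 / sample mean), it uses synthetic stand-in data rather than the subsurface storage observations, and the function names `fit_exponential_rate`, `exponential_pit`, and `pit_histogram` are hypothetical.

```python
import math
import random


def fit_exponential_rate(values):
    """Maximum-likelihood rate of an exponential distribution (1 / sample mean)."""
    return len(values) / sum(values)


def exponential_pit(values, rate):
    """PIT values: the fitted CDF F(x) = 1 - exp(-rate * x) at each observation."""
    return [1.0 - math.exp(-rate * x) for x in values]


def pit_histogram(pit_values, n_bins=10):
    """Count PIT values in equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for u in pit_values:
        idx = min(int(u * n_bins), n_bins - 1)  # a value of exactly 1.0 goes in the last bin
        counts[idx] += 1
    return counts


# Synthetic stand-in data (NOT the subsurface storage observations):
# draws from an exponential with rate 2, so the fitted model is well specified.
rng = random.Random(0)
sample = [rng.expovariate(2.0) for _ in range(5000)]

rate_hat = fit_exponential_rate(sample)
pit = exponential_pit(sample, rate_hat)
counts = pit_histogram(pit)
print("estimated rate:", rate_hat)
print("PIT bin counts:", counts)
```

Because the synthetic data really are exponential, the bin counts come out roughly equal. With real data, a tall first bin or a tall last bin points to a mismatch at the low end or in the tail, respectively.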
The PIT histogram shows an excess of values in the lowest quantile bin, indicating that the fitted exponential distribution tends to overestimate relative to the data. This overestimation can likely be attributed to the spike at low values, which the exponential distribution captures poorly.
Additionally, the histogram shows a slight bias with more values at higher quantiles, suggesting the exponential distribution’s tail could be too light. Because of this, we test fitting a few heavier-tailed distributions.
We test the Fréchet and log-logistic distributions as heavier-tailed alternatives. However, both struggle to fit the low values and a heavier tail at the same time (see plots).
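To make "heavier-tailed" concrete, the sketch below compares survival probabilities \(P(X > x)\) for the three distributions using their standard closed-form CDFs. The parameter values are hypothetical and chosen only for illustration; they are not the values fitted to the subsurface storage data.

```python
import math


def exponential_sf(x, rate):
    """Survival function of the exponential: exp(-rate * x)."""
    return math.exp(-rate * x)


def frechet_sf(x, shape, scale):
    """Survival function of the Frechet: 1 - exp(-(x / scale) ** (-shape)), for x > 0."""
    return -math.expm1(-((x / scale) ** (-shape)))


def loglogistic_sf(x, shape, scale):
    """Survival function of the log-logistic: 1 / (1 + (x / scale) ** shape), for x > 0."""
    return 1.0 / (1.0 + (x / scale) ** shape)


# Hypothetical parameters, for illustration only.
for x in (1.0, 5.0, 20.0):
    print(
        x,
        exponential_sf(x, rate=1.0),
        frechet_sf(x, shape=2.0, scale=1.0),
        loglogistic_sf(x, shape=2.0, scale=1.0),
    )
```

The exponential survival function decays exponentially, whereas the Fréchet and log-logistic survival functions decay like a power of \(x\), so they assign far more probability to large values. The Fréchet density also vanishes as \(x \to 0\), which may be one reason it has trouble with a spike at low values.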
These results indicate that none of the individual probability distributions tested here captures all characteristics of the data. If we want a better representation of the subsurface storage values, we could attempt to (i) model the spike at low values and/or (ii) blend a fatter tail onto the exponential. Neither (i) nor (ii) is trivial, and both would likely require a mixture distribution.
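As a rough illustration of what a mixture could look like, the sketch below defines a two-component exponential mixture: one component with a high rate for the spike at low values and one with a low rate for the tail. This is only one possible construction, not a model fitted in this report. The function names and all parameter values are hypothetical, and the parameters are not estimated from data.

```python
import math


def mixture_cdf(x, weight, rate_spike, rate_tail):
    """CDF of a two-component exponential mixture.

    weight is the probability of the high-rate "spike" component;
    the remaining 1 - weight goes to the low-rate "tail" component.
    """
    spike = 1.0 - math.exp(-rate_spike * x)
    tail = 1.0 - math.exp(-rate_tail * x)
    return weight * spike + (1.0 - weight) * tail


def mixture_mean(weight, rate_spike, rate_tail):
    """Mean of the mixture: the weighted average of the component means."""
    return weight / rate_spike + (1.0 - weight) / rate_tail


# Hypothetical parameters, for illustration only.
w, r_spike, r_tail = 0.4, 10.0, 0.5

# A single exponential with the same mean, for comparison.
single_rate = 1.0 / mixture_mean(w, r_spike, r_tail)

for x in (0.05, 1.0, 10.0):
    single_cdf = 1.0 - math.exp(-single_rate * x)
    print(x, mixture_cdf(x, w, r_spike, r_tail), single_cdf)
```

With these illustrative values the mixture puts more mass near zero and more mass at large values than a single exponential with the same mean. Its far tail is still exponential, so a genuinely fat tail would need a different tail component.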
The PIT values show that the log-logistic and Fréchet distributions are far too heavy-tailed. The exponential is possibly a little too light-tailed, but its tail behavior is not far off from what we want.
Dawid, A. P.: Present position and potential developments: Some personal views: Statistical theory: The prequential approach, Journal of the Royal Statistical Society: Series A (General), 147, 278–290, 1984.