Aleatoric Uncertainty & Hurricane Counts

If your forecast target is a count, like the number of hurricanes in a season, it helps to understand aleatoric uncertainty: the randomness that remains even when the underlying rate is known exactly. For small counts this uncertainty is quite large.
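To see why small counts are so noisy, recall that a Poisson count with rate lambda has variance equal to lambda, so its standard deviation relative to its mean is 1/sqrt(lambda). The short sketch below tabulates that ratio; the rates listed are chosen only for illustration and do not come from any data set.

```r
# Relative spread (sd / mean) of a Poisson count with rate lambda.
# For a Poisson distribution the variance equals the mean,
# so sd / mean = sqrt(lambda) / lambda = 1 / sqrt(lambda).
lambda = c(1, 4, 6, 8, 25, 100)   # illustrative rates (assumption)
rel_spread = 1 / sqrt(lambda)
print(data.frame(lambda, rel_spread_pct = round(100 * rel_spread, 1)))
```

At rates of 4 to 8 events per season the standard deviation is roughly 35% to 50% of the mean. It only falls to 10% when the rate reaches 100.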

You can demonstrate this using the following code. First, create a vector of 100 yearly rates that cycle predictably through the values 4 to 8. This rate, the expected count, is what statistical models for hurricane counts predict.

Then, from the rate for each year, generate a random count from a Poisson distribution and compute the squared correlation between the count vector and the rate vector. This is the fraction of variation in the counts explained by the rate. One minus this is the fraction left unexplained by the rate (the aleatoric uncertainty).
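A single replicate of this step might look like the following sketch. The seed is an arbitrary choice added here only so the draw can be reproduced; it is not part of the original example.

```r
set.seed(1)                          # arbitrary seed (assumption), for reproducibility only
rate = rep(4:8, 20)                  # predictable rate for 100 years
h = rpois(100, lambda = rate)        # one random count per year
explained = cor(rate, h)^2 * 100     # % of count variation explained by the rate
unexplained = 100 - explained        # % left unexplained (aleatoric)
print(c(explained = explained, unexplained = unexplained))
```

Any one replicate can land well above or below the typical value, which is why the step is repeated many times.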

Repeat 1000 times and generate a histogram.

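rate = rep(4:8, 20)      # rate cycles through 4, 5, 6, 7, 8 over 100 years
xx = numeric(1000)       # percent unexplained, one value per replicate
for (j in 1:1000) {
    h = rpois(100, lambda = rate)        # one Poisson count per year
    xx[j] = 100 - cor(rate, h)^2 * 100   # percent of count variation not explained by the rate
}
hist(xx, xlab = "Aleatoric Uncertainty (%)", main = "")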

[Figure: histogram of the simulated aleatoric uncertainty (%) over 1000 replicates]

The estimated level of irreducible uncertainty in this example is 74.1%, with an interquartile range of 69.7% to 79.2%.
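As a rough check on these numbers, the law of total variance gives the expected split directly: the variance of the counts is the variance of the rate plus the average Poisson variance, and the latter equals the mean rate. The sketch below computes that theoretical value, then reruns the simulation under an arbitrary seed and summarizes it. Which summary statistic produced the 74.1% above is not stated, so using the median here is an assumption.

```r
rate = rep(4:8, 20)

# Theoretical split from the law of total variance:
# Var(count) = Var(rate) + mean(rate), since a Poisson variance equals its rate.
var_rate = mean((rate - mean(rate))^2)                 # population variance of the rate: 2
theory = 100 * mean(rate) / (var_rate + mean(rate))    # 100 * 6 / 8 = 75

# Simulated values, summarized by the median and quartiles.
set.seed(42)                                           # arbitrary seed (assumption)
xx = replicate(1000, 100 - cor(rate, rpois(100, lambda = rate))^2 * 100)
print(theory)
print(median(xx))
print(quantile(xx, c(0.25, 0.75)))
```

The theoretical value of 75% sits close to the simulated estimate, which suggests the large unexplained share is a property of Poisson counts at these rates rather than an accident of the simulation.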

This is not to say that we can't learn anything from a busted forecast, and we should certainly try. It only means that what we do learn will not diminish this type of uncertainty.