Edward S. Epstein

Predictive Hurricane Climatology

James B. Elsner

February 5, 2014

With help from Thomas H. Jagger

Epstein's 1985 Monograph

“The goal is to produce useful predictions that are consistent with one's best judgments, and to allow consistent revisions of such judgments as data…become available.”—Edward S. Epstein

Statistical Inference and Prediction in Climatology: A Bayesian Approach. Meteorological Monographs, Volume 20, Number 42, September 1985. American Meteorological Society.

Problem

When I first became interested in hurricane climatology in the early 1990s, I ran across lots of information about past hurricanes from a variety of sources including instrument records and handwritten accounts.

The handwritten accounts were, of course, less precise than the instrument records, and the instrument records themselves grew more precise as technology advanced.

How was I to make use of all this information in a meaningful way?

Solution

In 1985, Edward S. Epstein explained a solution: keep all of the information, but discount the older, less reliable information. His solution is Bayesian.

In 2001 I published a paper on hurricane climatology using the Bayesian formalism clearly outlined in Epstein (1985).

Even today reading his monograph tingles my neck hairs.

30-year hurricane forecast

In fact, thirteen years ago this July, based on our Journal of Climate paper (Elsner and Bossak, 2001), I issued a 30-year forecast of U.S. hurricane activity.

Forecast Issued July 1, 2001

Region/Type              Forecast count   Observed count (2001-2013)
U.S. hurricanes                50                    21
U.S. major hurricanes          18                     7
Florida hurricanes             20                     7

How did I use Epstein's work to solve my problem and come up with this forecast?

Poisson-gamma conjugate

Consider the arrival of hurricanes on the coast as a stochastic (read: random) process, where the annual counts are described by a Poisson distribution.

The Poisson distribution is described by the parameter \( \lambda \), the rate at which hurricanes arrive on the coast.

Knowledge of \( \lambda \) allows you to make statements about future hurricane frequency. Since the process is stochastic your statements will be given using probabilities.

Poisson-gamma conjugate

For example, the probability of \( \hat h \) hurricanes occurring over the next \( T \) years (e.g., \( T \) = 1, 5, 20, etc.) is

\[ f(\hat h | \lambda, T) = \hbox{exp}(-\lambda T){(\lambda T)^{\hat h} \over \hat h!} \]

for \( \hat h = 0, 1, \ldots \), \( \lambda > 0 \), and \( T > 0 \).

The hat notation is used to indicate future values.

Knowledge about \( \lambda \) can come from written archives and instrumental records. You want to use as much of this information as possible before inferring what might happen in the future.
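As a minimal sketch of the Poisson model in R (the rate of 1.7 hurricanes per year here is an assumed value, for illustration only):

lam = 1.7                         # assumed annual hurricane rate (illustration only)
Tyr = 5                           # forecast horizon in years
round(dpois(0:12, lam * Tyr), 3)  # P(0, 1, ..., 12 hurricanes in the next 5 years)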

Poisson-gamma conjugate

This requires you to treat \( \lambda \) as an uncertain quantity rather than a fixed constant. \( \lambda \) can take on any positive real number.

One way to express your judgment about \( \lambda \) is through the gamma distribution.

The numbers used to estimate \( \lambda \) from a set of data are the time interval \( T' \) and the number of hurricanes \( h' \) over this interval.

The prime indicates prior (here, earlier) information.

Poisson-gamma conjugate

For instance, observations from the hurricane record since 1851 indicate 15 hurricanes over the first ten years of the record, so \( T'=10 \) and \( h'=15 \).

con = "http://myweb.fsu.edu/jelsner/data/US.txt"  # annual U.S. hurricane counts
H = read.table(con, header=TRUE)
sum(H$All[1:10])                                  # hurricanes during 1851-1860
[1] 15

The above R code downloads the dataset and sums the counts over the first 10 years.

Note: You can do this yourself by downloading the slides from http://rpubs.com/jelsner/EpsteinSymposium and using copy/paste in an R session.

Poisson-gamma conjugate

The gamma distribution for the possible values of \( \hat \lambda \) is given by \[ f(\hat \lambda | h', T') = {T'^{h'}\hat \lambda^{h'-1} \over \Gamma (h')} \hbox{exp}(-\hat \lambda T'), \] with the expected value E(\( \hat \lambda \)) = \( h'/T' \), and where \( \Gamma (x) \) is the gamma function.

If the probability density on \( \hat \lambda \) is a gamma distribution, with initial numbers (prior parameters) \( h' \) and \( T' \), and the numbers \( h \) and \( T \) are later observed, then the posterior density of \( \hat \lambda \) is also gamma with parameters \( h+h' \) and \( T+T' \).

In other words the gamma density is the conjugate prior for the Poisson rate \( \lambda \).

Priors

This gives you a convenient way to combine earlier, less reliable information with later information. You simply add the prior parameters \( h' \) and \( T' \) to the observed numbers \( h \) and \( T \) to get the posterior parameters.
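As a minimal sketch of this update rule in R, take the early-record numbers from above (\( h' \) = 15 hurricanes over \( T' \) = 10 years) as the prior and suppose you later observe 20 hurricanes over 12 years (a hypothetical observation, for illustration):

hp = 15; Tp = 10   # prior parameters from the early record
h = 20; T = 12     # hypothetical later observations
hpp = h + hp       # posterior shape
Tpp = T + Tp       # posterior rate
hpp / Tpp          # posterior mean rate: 35/22, about 1.59 hurricanes per year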

But how do you estimate the priors? You have values but they could be too low (or too high) due to missing or misclassified information. One way is to use bootstrapping. Bootstrapping is sampling from your sample (resampling).

This idea of using bootstrapping to specify the priors is what tingled my neck hairs when I first read it in Edward's monograph.

Priors

Here is how to do it in R.

library(bootstrap)                             # provides the bootstrap() function
early = H$All[H$Year < 1899]                   # annual counts from the less reliable era
bs = bootstrap(early, theta=mean, nboot=1000)  # resample the mean 1000 times

To obtain a 90% bootstrapped confidence interval about the mean hurricane rate prior to 1899, type

qbs = quantile(bs$thetastar, probs=c(.05, .95))
qbs
   5%   95% 
1.396 2.125 

Priors

You can't say with certainty what the true hurricane rate is, but you are willing to admit a 5% chance that the true rate is less than 1.4 and a 5% chance that it is greater than 2.1.

Given this appraisal about the early American hurricane rate you need to obtain an estimate of the parameters of the gamma distribution.

You start by creating an objective function: the sum of the absolute differences between the gamma cumulative probabilities at your target quantiles and the target probabilities (.05 and .95).

obj = function(x){  # x[1] = shape (h'), x[2] = rate (T')
  sum(abs(pgamma(q=qbs, shape=x[1], 
                 rate=x[2]) - c(.05, .95)))
}

Priors

You then minimize this function with optim(), starting from initial values for the gamma parameters.

theta = optim(par = c(2, 1), obj)$par  # best-fit (shape, rate)

Store these parameters as separate objects by typing

hp = theta[1]  # prior h' (shape)
Tp = theta[2]  # prior T' (rate)

The above procedure quantifies your judgment about hurricanes prior to the start of the reliable record (1899). Your prior parameters are \( h' \) = 61.7 and \( T' \) = 35.4.
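As a quick sanity check (a sketch using the hp and Tp objects just created), the 5th and 95th percentiles of the fitted gamma should land near your bootstrapped targets:

qgamma(c(.05, .95), shape=hp, rate=Tp)  # should be close to 1.4 and 2.1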

Likelihood statistics

You have additional information: the total number of hurricanes over the reliable period of record (since 1899) and the length of that record.

late = H$All[H$Year >= 1899]  # annual counts from the reliable era
h = sum(late)                 # total hurricane count
T = length(late)              # record length in years

That is, there are \( h \) = 187 hurricanes over \( T \) = 112 years.

These are your likelihood statistics.

Posterior distribution

The posterior parameters are then \( h'' = h+h' \) = 248.7 and \( T'' = T+T' \) = 147.4. Note that although the likelihood parameters \( h \) and \( T \) must be integers, the prior parameters can take on any positive real value depending on your degree of belief.

Since the prior, likelihood, and posterior are all in the same gamma family, you can use the dgamma() function to compute the densities.
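Here is a minimal sketch of how the three densities in the next figure can be computed and drawn (assuming the hp, Tp, h, and T objects created earlier):

lam = seq(1, 2.5, .01)  # grid of rates (hurricanes per year)
plot(lam, dgamma(lam, shape=h + hp, rate=T + Tp), type="l",
     xlab="Annual hurricane rate", ylab="Density")  # posterior
lines(lam, dgamma(lam, shape=hp, rate=Tp), lty=2)   # prior
lines(lam, dgamma(lam, shape=h, rate=T), lty=3)     # likelihood (normalized)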

Prior, likelihood, & posterior

[Figure: prior, likelihood, and posterior gamma densities for the annual U.S. hurricane rate]

Predictive distribution

The information you have about \( \lambda \) is codified in the two parameters \( h'' \) and \( T'' \) of the gamma density. Of more practical interest is how to use the information to predict future hurricane activity.

The answer lies in the fact that the predictive density for observing \( \hat h \) hurricanes over the next \( \hat T \) years is a negative binomial distribution, with parameters \( h'' \) and \( {T'' \over \hat T + T''} \), given by

\[ f\Bigl(\hat h | h'', {T'' \over \hat T + T''}\Bigr) = {\Gamma(\hat h + h'') \over \Gamma (h'') \hat h!}\Bigl[{T'' \over \hat T + T''}\Bigr]^{h''}\Bigl[{\hat T \over \hat T + T''}\Bigr]^{\hat h}. \]
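This density corresponds to R's dnbinom() with size \( h'' \) and prob \( T''/(\hat T + T'') \). A minimal sketch (assuming the posterior parameters from above):

hpp = h + hp  # posterior h''
Tpp = T + Tp  # posterior T''
That = 1      # forecast horizon: next year
dnbinom(0:4, size=hpp, prob=Tpp/(That + Tpp))  # P(0, ..., 4 hurricanes next year)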

Predictive distribution

The mean and variance of the negative binomial are \( \hat T{h'' \over T''} \) and \( \hat T{h'' \over T''}\bigl({\hat T + T'' \over T''}\bigr) \), respectively.

Note that the variance of the predictive distribution is larger than it would be if \( \lambda \) were known precisely.

If your interest is the probability of a hurricane next year, then \( \hat T = 1 \), which is small compared with \( T'' \), so the extra variance makes little difference; but if your interest is hurricane activity over the next 20 years, it matters.
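A short sketch making this concrete (again assuming the hpp and Tpp objects from above), comparing the predictive variance with the Poisson variance you would use if \( \lambda \) were known exactly:

That = c(1, 10, 20, 30)     # forecast horizons in years
m = That * hpp / Tpp        # predictive mean (equals the Poisson variance)
v = m * (That + Tpp) / Tpp  # predictive variance
cbind(That, poisson.var = m, predictive.var = v)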

10-Yr prediction

[Figure: predictive distribution of the number of U.S. hurricanes over the next 10 years]

10-, 20-, 30-yr predictions

[Figure: predictive distributions of U.S. hurricane counts over the next 10, 20, and 30 years]
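Here is a minimal sketch of how such curves can be drawn (again assuming hpp and Tpp from above); the 30-year distribution centers near 50 hurricanes:

hh = 0:100  # possible future hurricane counts
plot(hh, dnbinom(hh, size=hpp, prob=Tpp/(10 + Tpp)), type="l",
     xlab="Number of U.S. hurricanes", ylab="Probability")    # 10-yr
lines(hh, dnbinom(hh, size=hpp, prob=Tpp/(20 + Tpp)), lty=2)  # 20-yr
lines(hh, dnbinom(hh, size=hpp, prob=Tpp/(30 + Tpp)), lty=3)  # 30-yr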

Summary

With Edward's method applied to my hurricane data, in July 2001 I predicted that the number of U.S. hurricanes over the next 30 years would be 50, of which 18 would be major (Category 3 or above).

The approach outlined in Epstein (1985) and adapted here to hurricane climatology provides a rational and coherent foundation for incorporating all the available information about hurricane occurrences while accounting for differences in its precision.

It could be used to account for the influence of climate change. Records influenced by recent changes can be given more weight than records from earlier decades.

Final words: Reproducibility

Epstein's 1985 monograph on statistical climatology is clearly written. It was easy for me to follow and reproduce his examples. This helped my career.

I try to emulate Edward by making my research reproducible.

The code used to produce these slides is available on RPubs at http://rpubs.com/jelsner/EpsteinSymposium

Thank you for your attention.

I would like to give my best regards to the family & friends of Edward Epstein.

Questions?