Frequency, intensity, and sensitivity of TCs in best-track and simulated data

James B. Elsner, Jill C. Trepanier, Kerry A. Emanuel

## [1] "Wed Aug 29 21:30:08 2012"

1) Introduction

Hurricanes are lethal and costly. Understanding if, and how, they might change as the planet warms is important. Hurricanes are not adequately resolved in current global climate model (GCM) simulations, so these simulations cannot be used to address future changes directly with confidence.

Statistical models built on observational hurricane data provide evidence of what is happening and intriguing glimpses of what might happen in the future. Uncertainty in the model parameters is large, however, owing to short, inhomogeneous data records.

A method for generating synthetic hurricane data has recently been developed that makes use of NCEP's numerical reanalysis of the observational data and empirical models (Emanuel et al. 2006). The synthetic data are a series of potential tracks and intensities covering the period 1980–2010, inclusive.

Here we provide a comparison of the observational and synthetic hurricane data sets using frequency, intensity, and the sensitivity of intensity to sea-surface temperature (SST). We do this using the method developed in Elsner et al. (2012). The hypothesis is that the synthetic hurricanes will match the observed sensitivity of limiting intensity to SST.

The main findings are:

The spatial distribution of TC frequency in the synthetic TC data matches the spatial distribution of the TC frequency in the best-track data. Regions where TCs actually occur correlate well with regions where the synthetic TCs are found.
The spatial distribution of TC intensity in the synthetic TC data also matches the spatial distribution of TC intensity in the best-track data. Regions where strong hurricanes occur in the best-track data compare well with regions where strong hurricanes occur in the simulated data.
On average the sensitivity of limiting intensity to SST is larger in actual hurricanes.
The spatial distribution of the occurrence of hurricanes in the best-track set does not match the spatial distribution of the occurrence of hurricanes in the simulated set.

2) Data

This study compares simulated TC data with the best-track data. We use a single SST data set for comparing sensitivity. The study region is the North Atlantic basin, which includes the Caribbean Sea and the Gulf of Mexico. We begin with a description of the data sets.

Best-track data

The best-track data set contains the six-hourly center locations and intensities of all known tropical cyclones across the North Atlantic basin including the Gulf of Mexico and Caribbean Sea. The data set is called HURDAT for HURricane DATa. It is maintained by the U.S. National Oceanic and Atmospheric Administration (NOAA) at the National Hurricane Center (NHC).

Center locations (center fixes) are given in geographic coordinates (in tenths of degrees). Intensities, representing the one-minute near-surface (\( \sim \) 10 m) wind speeds, are given in knots (1 kt = 0.5144 m s\( ^{-1} \)), and minimum central pressures are given in millibars (1 mb = 1 hPa). The data are provided at six-hourly intervals starting at 00 UTC (Coordinated Universal Time). The version of the HURDAT file used here contains cyclones over the period 1851 through 2010 inclusive (http://www.nhc.noaa.gov/pastall.shtml#hurdat). Information on the history and origin of these data is found in Jarvinen et al. (1984).

We interpolate the raw best-track data to one-hour intervals using splines and spherical geometry. Details of the procedure, including R code for the interpolation, are given in Elsner & Jagger (2012). We subtract 60% of the forward speed from the best-track wind speed to get an estimate of the cyclone's maximum rotational velocity.
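The forward-speed adjustment and the unit conversion can be sketched in a few lines of R. This is a minimal illustration, not the authors' code: the function name `rotational_speed` is hypothetical, and it assumes the forward speed is supplied in knots like the best-track wind speed.

```r
# Conversion factor stated in the text: 1 kt = 0.5144 m/s
KT_TO_MS <- 0.5144

# Hypothetical helper: remove 60% of the forward (translation) speed
# from the best-track wind speed, then convert from knots to m/s.
rotational_speed <- function(wmax_kt, forward_kt, fraction = 0.6) {
  (wmax_kt - fraction * forward_kt) * KT_TO_MS
}

# Example: a cyclone with 100 kt winds moving forward at 10 kt
rotational_speed(100, 10)
```

Keeping the fraction as an argument makes it easy to check how much the later comparisons depend on the 60% choice.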

The best-track data set has a total of 364 TCs over this period.

Simulated TC data

The simulated data represent a large number of cyclone tracks together with a deterministic intensity model integrated along each track. The tracks are based on a weighted average of the upper- and lower-tropospheric flow plus 'beta drift'. The flow is simulated from wind time series where the monthly mean, variance, and covariances conform to the statistics derived from the NCEP reanalysis data. The kinetic energy of the simulated flow field obeys the observed frequency distribution characteristic of geostrophic turbulence. Wind shear derived from the flows is input to the intensity model. Statistics of the simulated hurricane motion match the statistics of observed hurricane motion. The method ensures that hurricane intensity conforms broadly to the underlying physics, including the natural limitations imposed by potential intensity, ocean coupling, vertical wind shear, and landfall. The simulated track data are available in Matlab format from ftp://texmex.mit.edu/pub/emanuel/ForJim/ncep850b.zip.

Note: The file was replaced on 8/21/12 at 2:06:00 PM with ncep850b_rev.zip. Email from Kerry: "I placed a revised zip file in the same directory as before; this is identical to the old one but contains an additional array, vnetstore, that contains the peak surface winds in knots that includes the translation effect plus our baroclinic correction. The latter is unpublished, but basically adds a fraction of the background wind shear to the surface wind speed, based on isallobaric arguments. I am not sure how this might affect sensitivity."

The track information is stored in arrays with the first index representing the cyclone number and the second indicating the center fix along the track. Additional vectors record the cyclone year and the annual frequency. The maximum rotational velocity is given in units of knots. We reformat the data to match the best-track data.

##   Sid SYear Mo Da hr    lon   lat WmaxS    P YrRate
## 1   1  1980  9 24 20 -68.50 17.76 8.315 1004    9.4
## 2   1  1980  9 24 22 -68.66 17.81 8.229 1004    9.4
## 3   1  1980  9 25  0 -68.82 17.85 8.466 1004    9.4
## 4   1  1980  9 25  2 -68.98 17.90 8.855 1004    9.4
## 5   1  1980  9 25  4 -69.13 17.95 9.360 1004    9.4
## 6   1  1980  9 25  6 -69.29 18.01 9.930 1003    9.4

## [1] 6200
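The reformatting described above, from arrays indexed by cyclone and center fix to one row per fix, can be sketched as follows. The array names (`lon_arr`, `lat_arr`, `wmax_arr`, `year_vec`) and the use of `NA` for unused fixes are assumptions made for illustration; the layout of the actual Matlab file may differ.

```r
# Hypothetical toy arrays: rows are cyclones, columns are center fixes.
# Unused trailing fixes are assumed to be NA (an assumption about padding).
lon_arr  <- matrix(c(-68.50, -68.66, -68.82,
                     -50.10, -50.30,     NA), nrow = 2, byrow = TRUE)
lat_arr  <- matrix(c(17.76, 17.81, 17.85,
                     12.00, 12.10,    NA), nrow = 2, byrow = TRUE)
wmax_arr <- matrix(c( 8.3,  8.2,  8.5,
                     20.0, 21.5,   NA), nrow = 2, byrow = TRUE)
year_vec <- c(1980, 1981)

# Row/column positions of all valid fixes, ordered by cyclone then fix
idx <- which(!is.na(wmax_arr), arr.ind = TRUE)
idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]

# One row per (cyclone, fix) pair, in the column style of the table above
tracks <- data.frame(
  Sid   = idx[, "row"],
  SYear = year_vec[idx[, "row"]],
  fix   = idx[, "col"],
  lon   = lon_arr[idx],
  lat   = lat_arr[idx],
  WmaxS = wmax_arr[idx]
)
tracks
```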

We keep only wind speeds above 31 m s\( ^{-1} \) to match the 75th percentile wind speeds used in the best-track data that correspond to hurricane intensity.

The simulated data set has a total of 6200 hurricanes over the period.

Figure 1: Comparison of wind speed densities. Frequency of 2 hour hurricane records. Rotational intensity shown in m s\( ^{-1} \).

We take only the even hours of the best-track record to match the 2-hourly simulation samples. Figure 1 shows the distribution of the rotational component of the wind speeds in the best-track and simulated hurricane sets. The mean best-track wind velocity for winds above the 75th percentile is 24.6 m s\( ^{-1} \) with a standard deviation of 13.61 m s\( ^{-1} \), which compares with a mean simulated velocity of 25.5 m s\( ^{-1} \) and a standard deviation of 11.01 m s\( ^{-1} \).

The simulated wind speeds have a skewness of 1.11 which compares with a skewness of 1.03 for the observed wind speeds. The simulated wind speeds have a kurtosis of 1.6 which compares with a kurtosis of 0.67 for the observed wind speeds.
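For reference, the summary statistics quoted above can be computed with base R alone. This is a sketch: it assumes the reported kurtosis is excess kurtosis (zero for a normal distribution), which the text does not state, and the wind speeds are toy values rather than values from either data set.

```r
# Sample skewness and excess kurtosis from central moments (base R only).
# Treating the reported kurtosis as "excess" kurtosis is an assumption.
skewness <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}
excess_kurtosis <- function(x) {
  z <- x - mean(x)
  mean(z^4) / mean(z^2)^2 - 3
}

# Toy wind speeds in m/s (illustrative values only)
w <- c(33, 35, 36, 40, 42, 47, 55, 70)
c(mean = mean(w), sd = sd(w), skew = skewness(w), kurt = excess_kurtosis(w))
```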

The distributions match at the 70th percentile (30 m s\( ^{-1} \)).

Sea-surface temperature

We also make use of SST available from www.esrl.noaa.gov/psd/data/gridded/data.noaa.ersst.html in netCDF format.

We create a spatial points data frame and transform the latitude-longitude grids to a Lambert conformal conic (LCC) projection. The secant latitudes are 30\( ^\circ \) and 60\( ^\circ \) N and the projection is centered on the 60\( ^\circ \) W longitude. The projection is used by the NHC for the seasonal summary maps.
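To make the projection concrete, here is a base R sketch of the spherical LCC forward equations with the secant latitudes and central meridian given above. The latitude of origin (`lat0`), the spherical Earth radius, and the function name `lcc_project` are assumptions; in practice a projection library would normally handle this step.

```r
# Spherical Lambert conformal conic forward projection.
# Standard parallels 30N and 60N and central meridian 60W come from the text;
# the origin latitude (lat0) and the Earth radius R (km) are assumptions.
lcc_project <- function(lon, lat, lat1 = 30, lat2 = 60,
                        lon0 = -60, lat0 = 30, R = 6371) {
  rad <- pi / 180
  phi  <- lat  * rad; lam  <- lon  * rad
  phi1 <- lat1 * rad; phi2 <- lat2 * rad
  phi0 <- lat0 * rad; lam0 <- lon0 * rad
  tfun <- function(p) tan(pi / 4 + p / 2)
  # Cone constant and scaling factor
  n  <- log(cos(phi1) / cos(phi2)) / log(tfun(phi2) / tfun(phi1))
  Fc <- cos(phi1) * tfun(phi1)^n / n
  # Radial distances from the cone apex for each point and for the origin
  rho  <- R * Fc / tfun(phi)^n
  rho0 <- R * Fc / tfun(phi0)^n
  theta <- n * (lam - lam0)
  data.frame(x = rho * sin(theta), y = rho0 - rho * cos(theta))  # km
}

# A center fix from the simulated track table, projected to km
lcc_project(lon = -68.50, lat = 17.76)
```

Working in projected kilometers is what makes equal-area hexagons and distance-based bandwidths meaningful later on.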

3) Spatial framework for comparisons

Our method is a spatial comparison of simulated and observed hurricanes in terms of frequency, intensity, and sensitivity. We create separate spatial points data frames for the observed and simulated hurricanes and transform the center fix locations given in latitude and longitude to the same LCC projection as used to project the SST data.

We next create a hexagonal tessellation of the basin from the set of best-track hurricanes. A rectangular domain encompassing the set of observed hurricane locations is gridded into equal-area hexagons (Elsner et al. 2012).

The area of each hexagon is a compromise: large enough that a sufficient number of hurricanes pass through to estimate model parameters reliably, and small enough that regional variations are meaningful. Here we use an area of 6.1782 × 10⁵ km². The same tessellation is used on the simulated hurricanes.

Overlay the hexagon tessellation of the basin on the best-track and simulated track points and get the per grid hurricane count and per hurricane maximum.

We remove hexagons having fewer than 15 observed tropical cyclones. This is done so that each hexagon has a sufficient number of wind speed values to estimate the limiting-intensity model (Elsner et al. 2012). Within a single grid, the highest number of observed hurricanes is 69. This compares with a high of 1837 simulated hurricanes.
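The overlay and counting steps can be illustrated without any spatial packages. The sketch below is a stand-in for the tessellation tools actually used, which are not shown in this document: it bins projected points into pointy-top hexagons with axial coordinates, takes each hurricane's maximum wind within each hexagon, and then counts hurricanes per hexagon. The function names, the hexagon orientation, the toy points, and the reading of the area as 6.1782 × 10⁵ km² are all assumptions.

```r
# Side length of a regular hexagon with a given area: A = (3 * sqrt(3) / 2) * s^2
hex_side <- function(area) sqrt(2 * area / (3 * sqrt(3)))

# Assign projected points (x, y) to pointy-top hexagons of side s using
# axial coordinates (q, r) with cube rounding. Returns an id of the form "q_r".
hex_id <- function(x, y, s) {
  q <- (sqrt(3) / 3 * x - y / 3) / s
  r <- (2 / 3 * y) / s
  cx <- q; cz <- r; cy <- -cx - cz
  rx <- round(cx); ry <- round(cy); rz <- round(cz)
  dx <- abs(rx - cx); dy <- abs(ry - cy); dz <- abs(rz - cz)
  # Recompute the coordinate with the largest rounding error
  fix_x <- dx > dy & dx > dz
  fix_z <- !fix_x & dy <= dz
  rx[fix_x] <- -ry[fix_x] - rz[fix_x]
  rz[fix_z] <- -rx[fix_z] - ry[fix_z]
  paste(rx, rz, sep = "_")
}

# Toy track points in km (hypothetical values): storm id, position, wind speed
pts <- data.frame(Sid  = c(1, 1, 2, 2, 3),
                  x    = c(0, 50, 10, 2000, 2010),
                  y    = c(0, 20, -30, 5, 15),
                  Wmax = c(35, 40, 50, 45, 60))
s <- hex_side(6.1782e5)                 # side length in km
pts$hex <- hex_id(pts$x, pts$y, s)

# Per-hurricane maximum within each hexagon, then per-hexagon summaries
per_storm <- aggregate(Wmax ~ hex + Sid, data = pts, FUN = max)
grid <- aggregate(Wmax ~ hex, data = per_storm,
                  FUN = function(w) c(count = length(w), max = max(w)))
grid <- data.frame(hex = grid$hex, count = grid$Wmax[, "count"],
                   Wmax = grid$Wmax[, "max"])
grid[grid$count >= 2, ]                 # the text uses a minimum of 15
```

Counting distinct hurricanes rather than track points keeps slow-moving storms from inflating the frequency in a hexagon.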

Figure 2: Frequency and intensity maps.

The maps show the observed and simulated number of hurricanes per hexagon grid as a color ramp. There are at least an order of magnitude more simulated hurricanes. Grids with the largest number of simulated hurricanes are concentrated over the southwestern Atlantic into the Gulf of Mexico. In contrast, grids with the largest number of observed hurricanes are centered across the central Atlantic. The correlation between the observed and simulated counts across the domain is 0.71.

Figure 3: Relative risk of a simulated hurricane. Factor by which the simulated relative TC frequency exceeds that of the observed relative TC frequency.
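One way to read the quantity mapped in Figure 3 is as a ratio of shares: each hexagon's fraction of all simulated hurricanes divided by its fraction of all observed hurricanes. The sketch below follows that reading, which is an interpretation of the caption rather than the authors' code; the function name and counts are hypothetical.

```r
# Relative frequency = each hexagon's share of the basin-wide total.
# The ratio below is one reading of the caption of Figure 3 (an assumption).
relative_risk <- function(count_sim, count_obs) {
  (count_sim / sum(count_sim)) / (count_obs / sum(count_obs))
}

# Toy per-hexagon counts (hypothetical values)
count_obs <- c(20, 50, 30)
count_sim <- c(600, 800, 600)
relative_risk(count_sim, count_obs)
```

Values above one mark hexagons where the simulation places relatively more hurricanes than the best-track record; dividing by the totals removes the order-of-magnitude difference in sample size.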

4) Relationships to SST

One advantage of the spatial framework is that we can match the hurricane data with covariate information spatially. To see this we map the SST field using the same grids. Each grid contains the seasonal and area averaged SST value.

Figure 4: Map of SST averaged over the grids.

Frequency

A correlation is computed between the average SST values per grid and the hurricane counts across the domain for the best-track and simulated data.

The correlation is 0.43 between the observed hurricane frequency and SST across the domain. This compares with a correlation of 0.24 between the simulated hurricane frequency and SST.

Interestingly the relationship between hurricane occurrence and SST is essentially zero in the best-track data, but not so with the simulated hurricanes. We might want to compare these correlations with correlations obtained using only grids where average SST exceeds 26\( ^\circ \) C. This result is opposite if we use all tropical cyclones.

Side-by-side sensitivity of occurrence to SST plots.

Intensity

The best-track and simulated maximum hurricane intensities match quite well.

The correlation over the grids is 0.84. The correlation between the maximum winds and the corresponding SST is 0.64 for the best-track data and 0.49 for the simulated data.

High values of correlation indicate a tight relationship between SST and limiting intensity (LI). However, correlation does not answer the sensitivity question. For that we use a regression model of LI onto SST. Since not all grids have the same number of hurricanes, we use weighted regression so that the grids having more hurricanes contribute more to the relationship than do those having fewer hurricanes.

Extract a data frame of LI and SST and compute the sensitivity. Use weights=count for weighted regression. Test if northing is an important predictor.

Sensitivity values are 7.4 and 4.6 m s\( ^{-1} \)/K from the best-track and simulated hurricanes, respectively. SST explains 47.9% of the spatial variability in the observed maximum intensity and 33.9% of the spatial variability in the simulated maximum intensity.
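The weighted regression can be sketched with `lm()` and its `weights` argument. The data frame below holds toy values, not values from this study, and its column names are chosen to resemble those used elsewhere in the document.

```r
# Toy per-hexagon data (hypothetical values): average SST, limiting
# intensity in m/s, and the number of hurricanes used as the weight.
grids <- data.frame(
  avgsst = c(25.5, 26.0, 27.0, 27.5, 28.0, 28.5),
  Wmax   = c(48, 55, 60, 66, 68, 75),
  count  = c(15, 22, 40, 61, 35, 18)
)

# Weighted least squares: hexagons with more hurricanes count for more
fit <- lm(Wmax ~ avgsst, weights = count, data = grids)

sensitivity <- unname(coef(fit)["avgsst"])  # m/s per K
r_squared   <- summary(fit)$r.squared       # share of variance explained
c(sensitivity = sensitivity, r_squared = r_squared)
```

The slope is the sensitivity, and the R-squared corresponds to the share of spatial variability in maximum intensity explained by SST.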

Figure 5: Sensitivity of limiting intensity to SST. The slope of the red line is the sensitivity of limiting intensity to spatial variation in SST.

5) Spatial Variability in Sensitivity

We first examine the spatial autocorrelation of the model residuals.

Estimate Moran's I on the model residuals.

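library(spdep)  # poly2nb, nb2listw, lm.morantest
library(spgwr)  # gwr.sel, gwr (used below)
# Keep hexagons with avgsst of at least 25
hspdf2 = hspdf[hspdf$avgsst >= 25, ]
# Contiguity neighbors and row-standardized spatial weights
hexnb2 = poly2nb(hspdf2)
wts2 = nb2listw(hexnb2, style = "W")
# Count-weighted regressions of maximum intensity on SST
stvty.lmOBS = lm(WmaxOBS ~ avgsst, weights = countOBS, data = hspdf2)
stvty.lmSIM = lm(WmaxSIM ~ avgsst, weights = countSIM, data = hspdf2)
# Moran's I test on the regression residuals
lm.morantest(stvty.lmOBS, wts2)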

## 
##  Global Moran's I for regression residuals
## 
## data:  
## model: lm(formula = WmaxOBS ~ avgsst, data = hspdf2, weights =
## countOBS)
## weights: wts2
##  
## Moran I statistic standard deviate = 4.578, p-value = 2.351e-06
## alternative hypothesis: greater 
## sample estimates:
## Observed Moran's I        Expectation           Variance 
##            0.44622           -0.05061            0.01178 
##

lm.morantest(stvty.lmSIM, wts2)

## 
##  Global Moran's I for regression residuals
## 
## data:  
## model: lm(formula = WmaxSIM ~ avgsst, data = hspdf2, weights =
## countSIM)
## weights: wts2
##  
## Moran I statistic standard deviate = 6.768, p-value = 6.552e-12
## alternative hypothesis: greater 
## sample estimates:
## Observed Moran's I        Expectation           Variance 
##             0.6839            -0.0513             0.0118 
##

The p-values indicate significant spatial autocorrelation. The spatial autocorrelation largely disappears if we subset on WmaxOBS >= 55 m/s.
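As a reminder of what the statistic measures, Moran's I can be computed by hand for a small example. This sketch shows only the statistic itself; the residual-based test used above also adjusts the expectation and variance for the fitted regression, which is not reproduced here. The function name and the four-cell layout are hypothetical.

```r
# Moran's I for a vector z with a spatial weights matrix W:
# I = (n / S0) * (z' W z) / (z' z), where S0 is the sum of all weights.
morans_i <- function(z, W) {
  z <- z - mean(z)
  n <- length(z)
  S0 <- sum(W)
  as.numeric((n / S0) * (t(z) %*% W %*% z) / sum(z^2))
}

# Toy example: four cells in a line, each neighboring the adjacent cells
A <- matrix(0, 4, 4)
A[cbind(1:3, 2:4)] <- 1
A <- A + t(A)
W <- A / rowSums(A)            # row standardization, as in style = "W"

morans_i(c(1, 2, 3, 4), W)     # smooth gradient: positive autocorrelation
morans_i(c(1, -1, 1, -1), W)   # alternating values: negative autocorrelation
```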

bwOBS = gwr.sel(WmaxOBS ~ avgsst, weights = countOBS, data = hspdf2)

## Bandwidth: 3338701 CV score: 3651431 
## Bandwidth: 5396738 CV score: 4079888 
## Bandwidth: 2066764 CV score: 3023835 
## Bandwidth: 1280664 CV score: 2447022 
## Bandwidth: 794827 CV score: 2514531 
## Bandwidth: 1139003 CV score: 2391995 
## Bandwidth: 1083085 CV score: 2385046 
## Bandwidth: 1064579 CV score: 2384956 
## Bandwidth: 1072306 CV score: 2384855 
## Bandwidth: 1072365 CV score: 2384855 
## Bandwidth: 1072344 CV score: 2384855 
## Bandwidth: 1072344 CV score: 2384855 
## Bandwidth: 1072344 CV score: 2384855 
## Bandwidth: 1072352 CV score: 2384855 
## Bandwidth: 1072347 CV score: 2384855 
## Bandwidth: 1072345 CV score: 2384855 
## Bandwidth: 1072344 CV score: 2384855 
## Bandwidth: 1072344 CV score: 2384855 
## Bandwidth: 1072344 CV score: 2384855 
## Bandwidth: 1072344 CV score: 2384855 
## Bandwidth: 1072344 CV score: 2384855

bwSIM = gwr.sel(WmaxSIM ~ avgsst, weights = countSIM, data = hspdf2)

## Bandwidth: 3338701 CV score: 1.412e+09 
## Bandwidth: 5396738 CV score: 1.639e+09 
## Bandwidth: 2066764 CV score: 1.076e+09 
## Bandwidth: 1280664 CV score: 793090944 
## Bandwidth: 794827 CV score: 690653800 
## Bandwidth: 494563 CV score: 735653112 
## Bandwidth: 807996 CV score: 691239272 
## Bandwidth: 765558 CV score: 690124436 
## Bandwidth: 765639 CV score: 690124382 
## Bandwidth: 766125 CV score: 690124234 
## Bandwidth: 766115 CV score: 690124234 
## Bandwidth: 766115 CV score: 690124234 
## Bandwidth: 766115 CV score: 690124234 
## Bandwidth: 766115 CV score: 690124234 
## Bandwidth: 766115 CV score: 690124234 
## Bandwidth: 766115 CV score: 690124234

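# Geographically weighted regressions using the cross-validated bandwidths
gwrOBS = gwr(WmaxOBS ~ avgsst, weights = countOBS, data = hspdf2, 
    bandwidth = bwOBS)
gwrSIM = gwr(WmaxSIM ~ avgsst, weights = countSIM, data = hspdf2, 
    bandwidth = bwSIM)
# Correlation between the observed and simulated local sensitivities
rSensOBSSIM = cor(gwrOBS$SDF$avgsst, gwrSIM$SDF$avgsst)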
range(gwrOBS$SDF$avgsst)

## [1] -13.048   9.891

range(gwrSIM$SDF$avgsst)

## [1] -0.09577 12.78153

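library(RColorBrewer)  # brewer.pal
library(grid)          # textGrob, unit
# Collapse negative sensitivities into a single "< 0" class for mapping
gwrOBS$SDF$avgsst[gwrOBS$SDF$avgsst < 0] = -1
gwrSIM$SDF$avgsst[gwrSIM$SDF$avgsst < 0] = -1
# Gray for the "< 0" class, then seven shades of orange in bins of width 2
cls = brewer.pal(7, "Oranges")
cls = c("gray", cls)
rng = seq(-2, 14, 2)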
p0 = spplot(gwrOBS$SDF, "avgsst", col = "white", col.regions = cls, 
    sp.layout = list(l1), at = rng, colorkey = list(space = "bottom", labels = c("< 0", 
        rng[2:length(rng)])), sub = "Sensitivity (Observed) [m s$^{-1}$/K]")
p1 = spplot(gwrSIM$SDF, "avgsst", col = "white", col.regions = cls, 
    sp.layout = list(l1), at = rng, colorkey = list(space = "bottom", labels = c("< 0", 
        rng[2:length(rng)])), sub = "Sensitivity (Simulated) [m s$^{-1}$/K]")
p0 = update(p0, main = textGrob("a", x = unit(0.05, "npc")), par.settings = list(fontsize = list(text = 15)))
p1 = update(p1, main = textGrob("b", x = unit(0.05, "npc")), par.settings = list(fontsize = list(text = 15)))
plot(p0, split = c(1, 1, 1, 2), more = TRUE)
plot(p1, split = c(1, 2, 1, 2), more = FALSE)

plot of chunk sensitivityMaps
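To make explicit what the geographically weighted regression estimates at each hexagon, the following base R sketch fits a separate weighted regression at every location, with weights that decay with distance through a Gaussian kernel. It is a simplified stand-in, not a reimplementation of the package routine: the function name, the toy data, the kilometer units, and the fixed bandwidth `h = 500` are assumptions.

```r
# Local (geographically weighted) slope at every location, using a Gaussian
# kernel exp(-0.5 * (d / h)^2) multiplied by the prior weights.
gwr_slopes <- function(x, y, coords, h, prior = rep(1, length(y))) {
  D <- as.matrix(dist(coords))
  sapply(seq_along(y), function(i) {
    w <- prior * exp(-0.5 * (D[i, ] / h)^2)
    unname(coef(lm(y ~ x, weights = w))["x"])
  })
}

# Toy data (hypothetical): the SST-intensity slope is 2 in the west, 6 in the east
set.seed(1)
coords <- cbind(east = seq(0, 4000, length.out = 40), north = 0)  # km
sst <- runif(40, 25, 29)
true_slope <- ifelse(coords[, "east"] < 2000, 2, 6)
wmax <- 50 + true_slope * (sst - 27)

slopes <- gwr_slopes(sst, wmax, coords, h = 500)
range(slopes)
```

A small bandwidth lets the local slopes follow regional differences, while a very large one recovers the single basin-wide sensitivity; the cross-validation step above chooses between these extremes.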

6) Effect of grid size

CODE NEEDS TO BE WRITTEN.

7) Summary and conclusions

A method for simulating hurricanes has recently been developed that makes use of reanalysis data from NCEP and empirical models. Here we analyze a set of these synthetic hurricanes and compare them to the set of observed hurricanes over the common period 1980–2010. Comparisons of the frequency, intensity, and sensitivity to SST are made spatially using a hexagon grid covering the North Atlantic basin including the Gulf of Mexico and parts of the Caribbean Sea.

The main findings are:

The spatial distribution of TC frequency in the synthetic TC data matches the spatial distribution of the TC frequency in the best-track data. The pattern correlation is 0.715. Regions where TCs actually occur correlate well with regions where the synthetic TCs are found.
The spatial distribution of TC intensity in the synthetic TC data also matches the spatial distribution of TC intensity in the best-track data. The pattern correlation is 0.84. Regions where strong hurricanes occur in the best-track data compare well with regions where strong hurricanes occur in the simulated data.
On average the sensitivity of limiting intensity to SST is larger in actual hurricanes. The sensitivity is estimated to be 7.4 m s\( ^{-1} \)/K using the best-track data and 4.6 m s\( ^{-1} \)/K using the simulated TCs.
The spatial distribution of the occurrence of hurricanes in the best-track set does not match the spatial distribution of the occurrence of hurricanes in the simulated set. The pattern correlation is -0.532.

References

Can't get RStudio to list all references.

bibliography()

## Emanuel K, Ravela S, Vivant E and Risi C (2006). "A statistical
## deterministic approach to hurricane risk assessment." _Bulletin of
## the American Meteorological Society_, *87*, pp. 299-314.

Supporting Material: Statistical model of the limiting intensity from the observed data

Start with a single grid to show the procedure. Grid id 28. This corresponds to the 13th grid id and the 28th grid idSIM. It is located in the southeastern Gulf of Mexico.

Plot a histogram of the best-track and simulated hurricane maximum wind speeds.

Run the model on grid id 28 using the best-track data only.

Make a plot of the return level curve with the empirical estimates and the horizontal lines corresponding to the threshold and limiting intensity.

Run the GPD model at all grids and create a data frame with the limiting intensity, the 3 parameters, and the average SST.

15 of the 27 grids have xi < -1 so the limiting intensity is the observed max intensity for these grids.
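The rule above can be written as a small function. For a generalized Pareto distribution with negative shape, the upper end point is the threshold minus the scale divided by the shape; the fallback to the observed maximum when xi < -1 follows the statement above. The function name and parameter values are hypothetical.

```r
# Upper end point of a generalized Pareto distribution fitted to winds above
# threshold u: u - sigma / xi when the shape xi is negative. Following the
# rule stated above, the observed maximum is used when xi < -1.
limiting_intensity <- function(u, sigma, xi, wmax_obs) {
  if (xi >= 0) return(Inf)          # no finite upper bound
  if (xi < -1) return(wmax_obs)     # fall back to the observed maximum
  u - sigma / xi
}

# Toy parameter values (hypothetical): threshold 31 m/s, scale 12 m/s
limiting_intensity(u = 31, sigma = 12, xi = -0.3, wmax_obs = 65)
limiting_intensity(u = 31, sigma = 12, xi = -1.4, wmax_obs = 65)
```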

The sensitivity to SST increases to 8.7 m s\( ^{-1} \)/K.

Side-by-side sensitivity of limiting intensity to SST plots.