A spatial point process model for violent tornado occurrence in the U.S.~Great Plains

James B. Elsner, Richard J. Murnane, Thomas H. Jagger, and Holly M. Widen

Code in support of our paper to be submitted to Mathematical Geosciences.

“The business of the statistician is to catalyze the scientific learning process.” - George E. P. Box

date()
## [1] "Sun Feb 10 12:31:12 2013"
# install.packages('pixmap')
# source('http://www.math.ntnu.no/inla/givemeINLA.R')

Preliminaries

Set working directory and load packages.

setwd("~/Dropbox/Tornadoes")
require(maptools)
require(maps)
require(ggplot2)
require(plyr)
require(evaluate)
require(reshape)
require(ggmap)
require(rgdal)
require(spatstat)
require(grid)
require(mapproj)
require(colorRamps)
require(knitcitations)
require(raster)
require(lattice)
require(RColorBrewer)
require(vcd)

Tornado Data and Study Area

tmp = download.file("http://www.spc.noaa.gov/gis/svrgis/zipped/tornado.zip", 
    "tornado.zip", mode = "wb")
unzip("tornado.zip")
Torn = readOGR(dsn = ".", layer = "tornado")
## OGR data source with driver: ESRI Shapefile 
## Source: ".", layer: "tornado"
## with 56221 features and 28 fields
## Feature type: wkbPoint with 2 dimensions
names(Torn)
##  [1] "OM"         "YEAR"       "MONTH"      "DAY"        "DATE"      
##  [6] "TIME"       "TIMEZONE"   "STATE"      "FIPS"       "STATENUMBE"
## [11] "FSCALE"     "INJURIES"   "FATALITIES" "LOSS"       "CROPLOSS"  
## [16] "SLAT"       "SLON"       "ELAT"       "ELON"       "LENGTH"    
## [21] "WIDTH"      "NS"         "SN"         "SG"         "F1"        
## [26] "F2"         "F3"         "F4"
proj4string(Torn)
## [1] "+proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"

We downloaded the dataset from \url{http://www.spc.noaa.gov/gis/svrgis/}. At the time of download the number of tornado reports was 56221.

ourlocation = geocode("Russell KS")
Map = get_map(location = unlist(ourlocation), maptype = "road", zoom = 7)
lla = attr(Map, "bb")$ll.lat
llo = attr(Map, "bb")$ll.lon
ula = attr(Map, "bb")$ur.lat
ulo = attr(Map, "bb")$ur.lon
Torn2 = subset(Torn, Torn$SLAT <= ula & Torn$SLAT >= lla & Torn$SLON <= ulo & 
    Torn$SLON >= llo)
Torn3 = subset(Torn2, Torn2$MONTH == 3 | Torn2$MONTH == 4 | Torn2$MONTH == 5 | 
    Torn2$MONTH == 6)
Torn3 = subset(Torn3, Torn3$FSCALE >= 1)
Torn3$Category = factor(Torn3$FSCALE, ordered = TRUE)
p1 = ggmap(Map, extent = "normal") + theme_bw() + geom_point(aes(x = SLON, y = SLAT, 
    size = Category), color = "red", alpha = 0.5, data = Torn3@data) + xlab(expression(paste("Longitude (", 
    degree, "E)"))) + ylab(expression(paste("Latitude (", degree, "N)"))) + 
    guides(size = guide_legend(reverse = TRUE))
p1

F1-F5 tornado occurrences 1950-2011.

FIGURE 1

The principal aim is to demonstrate a statistically significant model for local violent tornado rates. Thus we are motivated to use a portion of the data where reports are numerous and spatially homogeneous. Here we use the same region defined in \cite{ElsnerMichaelsScheitlinElsner2013}, centered on Russell, Kansas and bounded by 36.10\( ^\circ \) and 41.57\( ^\circ \) N latitude and 102.37\( ^\circ \) and 95.34\( ^\circ \) W longitude. The region covers the central Plains from northern Texas to central Nebraska. It is an area with a high concentration of tornadoes and no large spatial gradients in occurrence rates, although there are regional variations in the number of tornado reports. It also corresponds to an area favored by storm chasers. We further restrict our attention to the months of March through June, when the primary tornado-producing severe convective storm is the supercell.
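
The latitude and longitude bounds quoted above come from the bounding box of the road map retrieved in the chunk above; printing the corner values stored there (lla, llo, ula, ulo) verifies them.

# corners of the study region in degrees (south, west, north, east)
round(c(south = lla, west = llo, north = ula, east = ulo), 2)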

Figure~1 shows a road map of the study domain and the 2116 touchdown points in the region during March–June by Fujita damage scale over the period 1950–2011. The touchdown points are provided on a Lambert conformal conic (LCC) projection with reference parallels of 33 and 45\( ^\circ \) N latitudes. The native spatial unit is meters.
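
As a quick check that the native spatial unit is meters, the planar coordinates can be inspected directly and, if desired, re-projected back to longitude and latitude (the shapefile also carries SLON and SLAT fields); a minimal sketch:

head(coordinates(Torn3))  # planar LCC coordinates in meters
Torn3.ll = spTransform(Torn3, CRS("+proj=longlat +datum=NAD83"))
head(coordinates(Torn3.ll))  # the same touchdown points in degrees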

The Fujita scale (F scale), introduced in the 1970s, is the standard measure of tornado intensity. It is based on the maximum damage caused along the tornado path and ranges from F0 (minimal damage) to F5 (total destruction). It was replaced by the enhanced Fujita scale in early 2007, which uses slightly different and more specific damage criteria \citep{Potter2007}. The Fujita scale and the enhanced Fujita scale are considered equivalent for climatological applications. In this study we consider only tornadoes with an F scale rating of one or higher because the F0 level was historically used as a default when the amount of damage was unknown \citep{DoswellEtAl2009}. For reference, the minimum wind speed for an F1 tornado is 33 m~s\( ^{-1} \).

trpy = ddply(Torn3@data, .(YEAR, FSCALE), .drop = FALSE, "nrow")
mT = tapply(trpy$nrow, trpy$FSCALE, sum)
F4F5 = sum(mT[4:5])
F1F3 = sum(mT[1:3])

The total number of F4 and F5 tornadoes over the study domain is 59, which represents 2.87\% of the total number of F1 through F5 tornadoes over the 62-year period.

mR = tapply(trpy$nrow, trpy$FSCALE, mean)
vR = tapply(trpy$nrow, trpy$FSCALE, var)
trpy$Fscale = paste("F", trpy$FSCALE, sep = "")
p2 = ggplot(trpy, aes(x = YEAR, y = nrow)) + geom_point() + facet_grid(. ~ Fscale)
p2 = p2 + ylab("Number of Tornado Reports") + xlab("Year") + theme_bw()
# add the per-F-scale mean annual count as a red horizontal line
mR.df = data.frame(Fscale = paste("F", names(mR), sep = ""), mR = as.numeric(mR))
p2 = p2 + geom_hline(data = mR.df, aes(yintercept = mR), col = "red")
p2

plot of chunk timeSeriesPlot

FIGURE 2

Figure~2 shows time-series plots of the annual number of tornadoes by F scale. The horizontal line is the average annual rate over the study domain. Occurrence rates decrease sharply with increasing tornado intensity. The annual rate is 20.6 for F1 tornadoes, compared with 0.145 for F5 tornadoes.

The corresponding annual variance is much larger than the annual rate for F1, F2, and F3 tornadoes. This is because tornadoes frequently occur in clusters—defined as an outbreak—on days when the weather is particularly favorable for severe convective storms. Statistically the occurrence of a tornado on a given day increases the chance of another one on the same day.
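
The overdispersion noted above can be tabulated directly from the annual rates (mR) and variances (vR) computed earlier; a variance-to-rate ratio well above one is consistent with clustering.

# annual rate, variance, and variance-to-rate ratio by F scale
data.frame(Fscale = paste("F", names(mR), sep = ""), rate = round(mR, 2), 
    variance = round(vR, 2), ratio = round(vR/mR, 2))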

V = Torn3[Torn3$FSCALE >= 4, ]
trpyV = ddply(V@data, .(YEAR), .drop = FALSE, "nrow")
table(table(as.character(V$DATE)))
## 
##  1  2  3  5 
## 35  8  1  1
N1 = as.numeric(table(table(as.character(V$DATE)))[1])
N2 = as.numeric(table(table(as.character(V$DATE)))[2])
Nmax = max(table(as.character(V$DATE)))

Violent tornadoes (F4 and F5) are less obviously clustered. Of the 45 days with at least one of the 59 violent tornadoes, 35 had a single event, 8 had two events, and one had three. The largest outbreak occurred on April 26, 1991 with 5 violent tornadoes.
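
The outbreak date quoted above can be recovered from the same day-count table; a small sketch:

# date with the most violent (F4-F5) tornado reports
dayCounts = table(as.character(V$DATE))
names(dayCounts)[which.max(dayCounts)]
max(dayCounts)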

2116 reports / 3.84419 × 10\( ^{11} \) m\( ^2 \) = 5.504 × 10\( ^{-9} \) per m\( ^2 \) = 5.504 × 10\( ^{-3} \) per km\( ^2 \) = 55.04 per 10\( ^4 \) km\( ^2 \)

tbl = table(factor(V$YEAR, levels = 1950:2011))
tbl.df = as.data.frame(tbl)
tbl.df$Year = as.numeric(as.character(tbl.df[, 1]))
tbl.df = tbl.df[, c("Year", "Freq")]
tbl2.df = as.data.frame(table(factor(tbl.df$Freq, levels = 0:6)))
p3 = ggplot(tbl2.df, aes(x = Var1, y = Freq)) + geom_bar(stat = "identity")
p3 = p3 + xlab("Number of Violent Tornado Reports") + ylab("Number of Years") + 
    theme_bw()
p3

plot of chunk dataFrameYrvsFreq

FIGURE 3

gfPois = goodfit(tbl.df$Freq, type = "poisson")
xx = summary(gfPois)
print(gfPois)

Figure~3 shows the distribution of annual violent tornado counts in the region. More than half the years (32) are without a violent tornado. The expected number of years without a violent tornado is 24 assuming a Poisson distribution with an annual rate equal to the average count. Of the remaining years, 43\% have one and 30\% have two violent tornadoes. The year with the single largest outbreak (1991) had a total of six violent tornadoes.

A goodness-of-fit test for a Poisson distribution using the likelihood ratio gives a \( p \)-value of 0.0057, providing evidence that violent tornadoes tend to cluster in time. This clustering means there will be more years without a violent tornado, and more years with four or more, than would be expected under a Poisson distribution.
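
The expected number of tornado-free years quoted above follows from the Poisson assumption with the rate set to the observed mean annual count; a minimal check:

# expected number of years with zero violent tornadoes under a Poisson model
lambda = mean(tbl.df$Freq)
round(length(tbl.df$Freq) * dpois(0, lambda), 1)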

(3) Regional and Local Tornado Report Rates

The regional tornado rate is defined as the number of tornado reports per unit area. Since the touchdown locations are provided on a planar projection, we compute the rate using functions in the spatstat package for R.

All.ppp = as.ppp(Torn3["FSCALE"])
All = summary(All.ppp)
shortside(as.rectangle(All.ppp))/8
NV.ppp = All.ppp[All.ppp$marks <= 3 & All.ppp$marks >= 1]
NV = summary(NV.ppp)
V.ppp = All.ppp[All.ppp$marks >= 4]
V = summary(V.ppp)
area = All$window$area * 1e-06
density = All$intensity * 1e+10
NVdensity = NV$intensity * 1e+10
Vdensity = V$intensity * 1e+10

The area is in square meters. To convert to square kilometers divide by 10\( ^6 \) (1000 × 1000). The intensity is points per square meter. To convert to points per square kilometer multiply by 10\( ^6 \). To convert to points per 10\( ^4 \) km\( ^2 \) (10000 square kilometers) multiply by 10\( ^{10} \).

There are 2116 March–June (1950–2011) F1–F5 tornadoes in our central Plains region, which has an area of 3.8442 × 10\( ^5 \) km\( ^2 \). This amounts to an average rate of 55 tornadoes per 10\( ^4 \) km\( ^2 \) over the 62 years. The rate excludes F0 tornadoes and tornadoes with no F scale rating. In contrast, there are only 59 violent tornado touchdowns over the same period, for a regional rate of 1.5 per 10\( ^4 \) km\( ^2 \).

denNV = density.ppp(NV.ppp)
bwNV = attr(denNV, "sigma") * 0.001
denNV$v = denNV$v * 1e+10
denNV.sgdf = as(denNV, "SpatialGridDataFrame")
# namespace conflict (with maptools?) if package name is not included
cr = brewer.pal(7, "Reds")
rng = seq(20, 90, 10)
l0 = list("sp.points", spatstat::coords(V.ppp), col = "black", pch = 16)
p4 = spplot(denNV.sgdf, "v", col.regions = cr, sp.layout = list(l0), at = rng, 
    colorkey = list(space = "bottom", labels = paste(rng)), sub = list(expression(paste("Non-Violent Tornado Reports/10", 
        {
        }^4, " km", {
        }^2))))
p4

plot of chunk densityPlot

denV = density(V.ppp)
denV$v = denV$v * 1e+10
bwV = attr(denV, "sigma") * 0.001
denV.sgdf = as(denV, "SpatialGridDataFrame")
coVNV = cor(denV.sgdf$v, denNV.sgdf$v)

FIGURE 4

(4) Tornado Rates and Distance from Nearest City

As noted, the estimated tornado rates are affected by a population bias, with fewer reports away from towns and cities. Here we model the effect of this bias on the rates of violent tornadoes using the method of \cite{ElsnerMichaelsScheitlinElsner2013}.

unzip("ci08au12.zip")
ll = "+proj=longlat +datum=NAD83"
US = readOGR(dsn = ".", layer = "ci08au12", p4s = ll)
## OGR data source with driver: ESRI Shapefile 
## Source: ".", layer: "ci08au12"
## with 43603 features and 23 fields
## Feature type: wkbPoint with 2 dimensions
UST = spTransform(US, CRS(proj4string(Torn3)))
USt = subset(UST, UST$POP_1990 > 2000)
US.ppp = as.ppp(USt)
W = as.owin(All.ppp)
CP.ppp = US.ppp[W]

We obtain centroid locations for all U.S. cities from http://www.nws.noaa.gov/geodata/catalog/national/html/cities.htm. We remove cities with 1990 population less than 2000 and those outside our study region. City centers are specified with longitudes and latitudes. We project the centers using the same LCC as for the tornado touchdown reports.

USt$NAME[USt$POP_1990 == 367302]
## [1] TULSA
## 28039 Levels: (BWI)BALTMORE-WASHINGTON ARPT ... ZWOLLE
Zc = distmap(CP.ppp)
m2km = 0.001
Zc$v = Zc$v * m2km
dd = as.numeric(quantile(Zc$v))
dist.sgdf = as(Zc, "SpatialGridDataFrame")

At each pixel where the local tornado rates are estimated in the previous section, we determine the distance from the grid point to the nearest city center.

Zc = distmap(CP.ppp)
rhat2 = rhohat(unmark(V.ppp), Zc, bw = 0.2, smoother = "kernel", method = "transform")
# plot(rhat2)
bwm = sd(as.vector(Zc$v)) * 0.2
m2km = 0.001
km2 = 1e+10
dist2 = rhat2$Zc * m2km
rho2 = rhat2$rho * km2
hi2 = rhat2$hi * km2
lo2 = rhat2$lo * km2
Rhowide.df = data.frame(dist2 = dist2, rho2 = rho2, hi2 = hi2, lo2 = lo2)
bw = attr(rhat2, "stuff")$sigma
bwm = sd(as.vector(Zc$v)) * bw * m2km
p5 = ggplot(Rhowide.df) + geom_ribbon(aes(x = dist2, ymin = lo2, ymax = hi2), 
    alpha = 0.3) + geom_line(aes(x = dist2, y = rho2), col = "black") + # scale_y_continuous(limits = c(0, 3.5)) + scale_x_continuous(limits =
# c(0, 60)) +
ylab(expression(paste(hat(rho), " (Violent Tornado Reports/10", {
}^4, " km", {
}^2, ")"))) + xlab("Distance from Nearest City Center (km)") + theme_bw()
p5

plot of chunk Rhoplot

FIGURE 5

Repeat the smoothing for the non-violent (NV) tornadoes.

Zc = distmap(CP.ppp)
rhat2 = rhohat(unmark(NV.ppp), Zc, bw = 0.2, smoother = "kernel", method = "transform")
# plot(rhat2)
bwm = sd(as.vector(Zc$v)) * 0.2
m2km = 0.001
km2 = 1e+10
dist2 = rhat2$Zc * m2km
rho2 = rhat2$rho * km2
hi2 = rhat2$hi * km2
lo2 = rhat2$lo * km2

(5) Spatial Models for Violent Tornado Occurrences

The previous analysis suggests that violent tornado rates across the study domain might be modeled successfully using non-violent tornado rates after controlling for distance from the nearest city. We begin by fitting a model to the violent tornado occurrences using distance from nearest city as the only covariate. The fit uses a maximum likelihood procedure with a Berman-Turner approximation \citep{BermanTurner1992} and a border correction that trims 5 km inward from the domain boundaries.

Zc = distmap(CP.ppp)
fit1 = ppm(unmark(V.ppp), ~Zc, covariates = list(Zc = Zc), correction = "border", 
    rbord = 5000)
xx = summary(fit1)
trend = xx$coefs.SE.CI[2, 1] * 1000
trendSE = xx$coefs.SE.CI[2, 2] * 1000

As expected the model shows a decreasing trend with increasing distance from the nearest city. The trend term amounts to a decrease of 4\% for every 1 km increase in distance. The standard error on the trend estimate is 1.1\%, consistent with the notion that there is a significant population bias.
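
The percentages quoted above can be read off the fitted log-linear trend: trend and trendSE (from the chunk above) are the coefficient and its standard error scaled to per-km units. A sketch of the conversion, assuming the quoted values were obtained this way:

round(100 * trend, 1)             # approximate percent change per km
round(100 * (exp(trend) - 1), 1)  # exact percent change per km for the log-linear model
round(100 * trendSE, 1)           # standard error expressed in percent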

null = ppm(unmark(V.ppp), ~1, correction = "border", rbord = 5000)
AIC(null)
## [1] 2783
AIC(fit1)
## [1] 2726
x = 2 * (as.numeric(logLik(fit1)) - as.numeric(logLik(null)))/V.ppp$n

Rule of thumb for twice the difference in log likelihood per event: 1 is huge, 0.1 is large, 0.01 is good, 0.001 is okay.

We check to see if the model with the trend term is better than a model without it by comparing the AIC from the two alternatives. The AIC is smaller with the distance from city model so we choose it over the null model. In fact, two times the difference in log likelihood between the two models divided by the number of violent tornadoes is close to 1 at 0.996, indicating a huge improvement.

Next we examine the violent tornado density relative to the non-violent tornado density. Let \( \kappa(u) \) be the violent tornado density at location \( u \) conditional on the distance from the nearest city, and let \( Z_{nv}(u) \) be the distance from \( u \) to the nearest non-violent tornado. The model is then \begin{equation} \lambda(u) = \rho(Z_{nv}(u)) \, \kappa(u). \end{equation} A smoothing estimator for \( \rho \) as a function of \( Z_{nv}(u) \) gives the factor by which the violent tornado rate departs from the baseline rate.

Znv = distmap(NV.ppp)
rhat3 = rhohat(fit1, Znv, bw = 0.2, smoother = "kernel", method = "transform")
# plot(rhat3)
dist3 = rhat3$Znv * m2km
rho3 = rhat3$rho
hi3 = rhat3$hi
lo3 = rhat3$lo
Rhowide3.df = data.frame(dist3 = dist3, rho3 = rho3, hi3 = hi3, lo3 = lo3)
p6 = ggplot(Rhowide3.df) + geom_ribbon(aes(x = dist3, ymin = lo3, ymax = hi3), 
    alpha = 0.3) + geom_line(aes(x = dist3, y = rho3), col = "black") + scale_x_continuous(limits = c(0, 
    20)) + ylab("Factor By Which Violent Tornado Rates Exceed the Overall Rate") + 
    xlab("Distance from Nearest Non-Violent Tornado (km)") + theme_bw()
p6

plot of chunk Rhoplot2

FIGURE 6

The curve in Fig.~6 shows the ratio as a function of distance from the nearest non-violent tornado. The ratio peaks at 1.5 in the immediate vicinity of a non-violent tornado. This says that, on average, the risk of a violent tornado is higher by a factor of 1.5 in the vicinity of less violent tornadoes after accounting for the population bias. We randomize the non-violent touchdown locations, repeat the smoothing, and find, as expected, that the ratio does not depart significantly from one at any distance.

Randomlocations.ppp = runifpoint(n = NV.ppp$n, win = NV.ppp$window)
Z2 = distmap(Randomlocations.ppp)
rhat4 = rhohat(fit1, Z2, bw = 0.2, smoother = "kernel", method = "transform")
plot(rhat4)
abline(h = 1, col = "red")

plot of chunk smoothingEstimateResidualRandom

Next we use the method of maximum likelihood to fit a parametric point process model \citep{Besag1975} to the occurrence of violent tornadoes using distance from nearest city and distance from nearest non-violent tornado as covariates. We assume the point process is inhomogeneous Poisson. The integration domain for the pseudolikelihood is obtained by trimming off 5 km from the domain boundaries to reduce edge effects.

fit2 = ppm(unmark(V.ppp), trend = ~Zc + Znv, interaction = Poisson(), covariates = list(Zc = Zc, 
    Znv = Znv), correction = "border", rbord = 5000)
xx2 = summary(fit2)
trendCity = xx2$coefs.SE.CI[2, 1] * 1000
trendCitySE = xx2$coefs.SE.CI[2, 2] * 1000
trendNV = xx2$coefs.SE.CI[3, 1] * 1000
trendNVSE = xx2$coefs.SE.CI[3, 2] * 1000

As expected the model shows fewer violent tornadoes with increasing distance from the nearest city. Adding a cluster process (Matérn or Thomas) to the model increases the parameter values by less than 4\%.

fit3 = kppm(unmark(V.ppp), trend = ~Zc + Znv, clusters = "MatClust", covariates = list(Zc = Zc, 
    Znv = Znv), correction = "border", rbord = 5000, statistic = "K")
fit3
## Inhomogeneous cluster point process model
## Fitted to point pattern dataset 'unmark(V.ppp)' 
## Fitted by minimum contrast
##  Summary statistic: inhomogeneous K-function 
## 
## Trend formula: ~Zc + Znv
## 
## Fitted coefficients for trend formula:
## (Intercept)          Zc         Znv 
##  -2.111e+01  -3.252e-05  -1.108e-04 
## 
## Cluster model: Matern Cluster process 
## Fitted cluster parameters:
## kappa     R 
## 31165 85185 
## Mean cluster size: [pixel image]
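
Only the Matérn-cluster fit is shown above; the Thomas-cluster alternative mentioned in the text can be fit the same way. A sketch (fit3t is an illustrative name), assuming the same covariates and border correction:

fit3t = kppm(unmark(V.ppp), trend = ~Zc + Znv, clusters = "Thomas", covariates = list(Zc = Zc, 
    Znv = Znv), correction = "border", rbord = 5000, statistic = "K")
coef(fit3t)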

(6) Model Diagnostics and Prediction

aic1 = AIC(fit1)
aic2 = AIC(fit2)
aic1
## [1] 2726
aic2
## [1] 2718
x = 2 * (logLik(fit2) - logLik(fit1))/59

The two-term final model appears to be a reasonable representation of the spatial pattern of violent tornadoes across the study domain. The AIC for the model with only the distance-from-nearest-city term is 2726 and the AIC for the two-term final model is 2718. Two times the difference in log likelihoods between the two models divided by the number of violent tornadoes is 0.172, indicating a large improvement.

x66 = quadrat.test(fit2, nx = 6, ny = 6, method = "MonteCarlo", nsim = 999)

A \( \chi \)-squared goodness-of-fit test of the null hypothesis that the final model is true against the very general alternative that it is not gives a \( p \)-value of 0.55 when using a 7 by 7 quadrat scheme suggesting the model is adequate. Similar large \( p \)-values are obtained using 6 by 6 and 8 by 8 quadrats.
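
The other quadrat schemes mentioned above can be run in the same way; a sketch:

x77 = quadrat.test(fit2, nx = 7, ny = 7, method = "MonteCarlo", nsim = 999)
x88 = quadrat.test(fit2, nx = 8, ny = 8, method = "MonteCarlo", nsim = 999)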

plot(unmark(V.ppp), pch = 25, cols = "red", main = "")
M = quadrat.test(fit2, nx = 6, ny = 6)
## Warning: Some expected counts are small; chi^2 approximation may be
## inaccurate
plot(M, add = TRUE, cex = 1.1)

plot of chunk predictionPlot

cor(M$expected, M$observed)
## [1] 0.554

FIGURE 7

Figure~7 shows the actual violent tornado count versus the expected number from the final model using a 6 by 6 quadrat scheme. In each grid box the actual count is the integer on the top left and the expected number is the value on the top right. The value at the bottom is the standardized residual. Negative residuals indicate where the model predicts a greater number of violent tornadoes than have actually occurred.

plot(unmark(V.ppp), pch = 25, cols = "red")
B = dirichlet(V.ppp)
M = quadrat.test(fit2, tess = B)
plot(M, add = TRUE, cex = 1.1)

FIGURE 7B

km2d = km2 * 100/dim(tbl.df)[1]
predFit2 = predict(fit2, ngrid = 32, type = "lambda")
predFit2$v = predFit2$v * km2d
predFit2.sgdf = as(predFit2, "SpatialGridDataFrame")
cr = brewer.pal(5, "Reds")
rng = seq(0, 10, 2)
l0 = list("sp.points", spatstat::coords(V.ppp), col = "black", pch = 16)
p8 = spplot(predFit2.sgdf, "v", col.regions = cr, sp.layout = list(l0), at = rng, 
    colorkey = list(space = "bottom", labels = paste(rng)), sub = list(expression(paste("Predicted Violent Tornado Reports/10", 
        {
        }^4, " km", {
        }^2, "/Century"))))
p8

plot of chunk predictionPlot2

FIGURE 8

Figure~8 shows predicted rates in violent tornado reports per 10\( ^4 \) km\( ^2 \) per century. The actual occurrences are shown as points. The model appears to do a good job, showing higher rates in regions where violent tornadoes have occurred and lower rates in regions without one.

denV = density(V.ppp)
denV$v = denV$v * km2d
predFit3 = predict(fit2, ngrid = 128, type = "lambda")
predFit3$v = predFit3$v * km2d
predFit3$v = denV$v - predFit3$v
predFit3.sgdf = as(predFit3, "SpatialGridDataFrame")
cr = brewer.pal(10, "Spectral")
rng = seq(-10, 10, 2)
p9 = spplot(predFit3.sgdf, "v", col.regions = cr, at = rng, colorkey = list(space = "bottom", 
    labels = paste(rng)), sub = list(expression(paste("Model Residuals (Violent Tornadoes/10", 
    {
    }^4, " km", {
    }^2, "/Century)"))))
p9

plot of chunk residuals

# predFit3$v now holds the residuals, so recompute the prediction
# before correlating the observed and predicted rates
predObs = predict(fit2, ngrid = 128, type = "lambda")
corObsPre = cor(as.vector(predObs$v), as.vector(denV$v))

FIGURE 9

Figure~9 maps the observed minus model predicted violent tornado rates. The observed rates are based on a kernel smoother. Areas shown in red indicate where the model predicts more violent tornadoes than have been observed. The correlation between observed and predicted rates over this 128 by 128 grid is 0.45.

proj4string(predFit3.sgdf) = proj4string(Torn3)
spdf = spTransform(predFit3.sgdf, CRS(ll))
df = data.frame(res = spdf$v, lon = spdf@coords[, 1], lat = spdf@coords[, 2])
p10 = ggmap(Map, extent = "normal") + theme_bw() + geom_point(aes(x = lon, y = lat, 
    color = res), data = df) + xlab(expression(paste("Longitude (", degree, 
    "E)"))) + ylab(expression(paste("Latitude (", degree, "N)")))
p10
Znv = distmap(NV.ppp)
tC = numeric()
tCSE = numeric()
tNV = numeric()
tNVSE = numeric()
nC = numeric()
pp = seq(1000, 3000, 500)
for (i in pp) {
    USt = subset(UST, UST$POP_1990 > i)
    US.ppp = as.ppp(USt)
    W = as.owin(All.ppp)
    CP.ppp = US.ppp[W]
    nC = c(nC, CP.ppp$n)
    Zc = distmap(CP.ppp)
    fitX = ppm(unmark(V.ppp), ~Zc + Znv, covariates = list(Zc = Zc, Znv = Znv), 
        correction = "border", rbord = 5000)
    xx2 = summary(fitX)
    tC = c(tC, xx2$coefs.SE.CI[2, 1] * 1000)
    tCSE = c(tCSE, xx2$coefs.SE.CI[2, 2] * 1000)
    tNV = c(tNV, xx2$coefs.SE.CI[3, 1] * 1000)
    tNVSE = c(tNVSE, xx2$coefs.SE.CI[3, 2] * 1000)
}

Pop Thr   tC (%)   s.e.(tC) (%)   tNV (%)   s.e.(tNV) (%)   # cities
   1000     -4.1            1.6      -11.1             3.7        320
   1500     -3.0            1.3      -11.5             3.6        226
   2000     -3.1            1.1      -10.8             3.7        177
   2500     -2.2            0.9      -11.3             3.7        141
   3000     -1.8            0.8      -11.5             3.7        124

TABLE 2

Table~2 shows the sensitivity of the model parameters to changes in the population threshold. The magnitude of the distance-from-nearest-city trend decreases with increasing population threshold. With a minimum threshold of 1000 residents defining a city the decrease per kilometer is 4.1\%. With a minimum threshold of 3000 residents the decrease per kilometer is less than half of that, at 1.8\%. The magnitude of the distance-from-nearest-non-violent-tornado trend is insensitive to these changes. With a population threshold below 2000 the distance-from-nearest-city parameter is larger in magnitude, along with a larger standard error. With a threshold above 2000 the city parameter is about the same.
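
Table~2 can be assembled directly from the vectors filled in by the loop above, assuming the tabled values are the per-km trend coefficients expressed as percentages; a minimal sketch (table2 is an illustrative name):

table2 = data.frame(PopThreshold = pp, nCities = nC, tC.pct = round(100 * tC, 1), 
    tC.se.pct = round(100 * tCSE, 1), tNV.pct = round(100 * tNV, 1), 
    tNV.se.pct = round(100 * tNVSE, 1))
table2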

To check the sensitivity of choosing F4 as our threshold for defining a violent tornado, we decrease the threshold to F3 and call F1 and F2 non-violent tornadoes. We then refit the model to the data and find the distance-from-nearest-city parameter is smaller at -1\% per km and the distance-from-nearest-non-violent-tornado parameter is about the same at -12.3\% per km. The two terms remain statistically significant with this new definition. The smaller population bias is consistent with \cite{AndersonEtAl2007}, who show that weaker tornado reports in Oklahoma vary less with population density than do the stronger tornadoes. Including F0 tornadoes in the set of non-violent tornadoes does not change these results.

(7) Simulating Violent Tornado Occurrences

Insured loss estimates from tornadoes require an estimate of the tornado rate. The rate, together with exposure information, policy conditions, and vulnerability functions for construction and automobile types, can be used to simulate events and produce a probabilistic description of potential losses.

fit2 = ppm(unmark(V.ppp), ~Zc + Znv, covariates = list(Zc = Zc, Znv = Znv), 
    correction = "border", rbord = 5000)
plot(simulate(fit2, nsim = 16), main = "", pch = 16)
## Generating 16 simulated patterns ...1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
##  16.

plot of chunk simulation

FIGURE 10

Here we use our model to generate an event set containing sixteen 62-year (992 years) simulations of violent tornado touchdowns (see Fig.~10). The method simulates realizations from our fitted inhomogeneous point process model using the Metropolis-Hastings algorithm. Simulations provide a catalogue of touchdown locations that can be used as part of a larger catastrophe model that also includes information on track length, width, and heading.

plot(rmh(fit2))
rmhmodel(fit2)

(8) Summary and Conclusions

Reliable and stable estimates of tornado rates are critical for hazard assessment. The high level of tornado activity in 2008 and 2011 is a call to increase research efforts to better understand the relationship between tornadoes and climate change \citep{DiffenbaughEtAl2008}. In this paper we illustrated a statistical point process model that uses the spatial occurrence of non-violent tornadoes to predict the distribution of the rare, violent tornadoes during springtime across the U.S. central Great Plains. The model has a component that accounts for the population bias in the tornado record based on the distance to nearest city \citep{ElsnerMichaelsScheitlinElsner2013}.

The model is based on modern tools for statistically analyzing spatial point data. It is used to make predictions of violent tornado rates. It is also used to simulate approximately 1000 years of touchdown points. The principal findings of the research in this paper include: \begin{itemize} \item The total number of F4 and F5 tornadoes over the central Great Plains is 59, which represents 2.87\% of the total number of F1 through F5 tornadoes over the 62-year period (1950–2011).

\item The average rate of non-violent tornadoes is 55 per 10\( ^4 \) km\( ^2 \) per 62 years.

\item The average rate of violent tornadoes over the same period is 1.5 per 10\( ^4 \) km\( ^2 \).

\item The correlation between non-violent and violent rates using a 128 by 128 grid covering the study domain is 0.84.

\item Non-violent tornado report density decreases from 72.7 per 10\( ^4 \) km\( ^2 \) (62 yr) near city centers to 37.2 per 10\( ^4 \) km\( ^2 \) in the countryside.

\item The risk of a violent tornado is higher by a factor of 1.5, on average, in the vicinity of less violent tornadoes after accounting for the population bias.

\item A model for the occurrence rate of violent tornadoes indicates that rates are lower by 10.8 (+/- 3.7 [s.e.])\% for every 1 km increase in distance from the nearest non-violent tornado, controlling for distance from the nearest city.

\item Model significance and the distance-from-nearest-non-violent-tornado parameter are not sensitive to the population threshold or to the definition of a violent tornado.

\item The model is useful for generating a catalogue of touchdown points as part of a comprehensive catastrophe model.

\end{itemize}

The model can be used to estimate local violent tornado rates. Regions where the estimated rates are higher than the smoothed estimates based on the historical occurrences indicate where insurance coverage might be too low. The modeling approach can be applied wherever tornadoes occur.

The model can be improved by including measurement error on the location estimates. Measurement error will attenuate the response of the violent tornado density to distance from non-violent tornadoes. The model might also be improved by treating the events as a marked point process where the marks are the estimated F scale ratings. The code used in creating the analyses and models in this paper is available at http://rpubs.com/jelsner/3185.

References

References.bib

How do the parameters change if we redefine violent tornadoes?

# Sensitivity check 1: redefine violent as F3+ (non-violent = F1-F2)
NV.ppp = All.ppp[All.ppp$marks <= 2 & All.ppp$marks >= 1]
NV = summary(NV.ppp)
V.ppp = All.ppp[All.ppp$marks >= 3]
V = summary(V.ppp)
Znv = distmap(NV.ppp)
USt = subset(UST, UST$POP_1990 > 2000)
US.ppp = as.ppp(USt)
W = as.owin(All.ppp)
CP.ppp = US.ppp[W]
Zc = distmap(CP.ppp)
fit2 = ppm(unmark(V.ppp), ~Zc + Znv, covariates = list(Zc = Zc, Znv = Znv), 
    correction = "border", rbord = 5000)
xx2 = summary(fit2)
tC = xx2$coefs.SE.CI[2, 1] * 1000
tCSE = xx2$coefs.SE.CI[2, 2] * 1000
tNV = xx2$coefs.SE.CI[3, 1] * 1000
tNVSE = xx2$coefs.SE.CI[3, 2] * 1000
tC
tCSE
tNV
tNVSE
# Sensitivity check 2: rebuild the data retaining F0 reports; violent = F3+,
# non-violent = F1-F2
Torn3 = subset(Torn2, Torn2$MONTH == 3 | Torn2$MONTH == 4 | Torn2$MONTH == 5 | 
    Torn2$MONTH == 6)
Torn3 = subset(Torn3, Torn3$FSCALE >= 0)
All.ppp = as.ppp(Torn3["FSCALE"])
NV.ppp = All.ppp[All.ppp$marks <= 2 & All.ppp$marks >= 1]
NV = summary(NV.ppp)
V.ppp = All.ppp[All.ppp$marks >= 3]
V = summary(V.ppp)
Znv = distmap(NV.ppp)
USt = subset(UST, UST$POP_1990 > 2000)
US.ppp = as.ppp(USt)
W = as.owin(All.ppp)
CP.ppp = US.ppp[W]
Zc = distmap(CP.ppp)
fit2 = ppm(unmark(V.ppp), ~Zc + Znv, covariates = list(Zc = Zc, Znv = Znv), 
    correction = "border", rbord = 5000)
xx2 = summary(fit2)
tC = xx2$coefs.SE.CI[2, 1] * 1000
tCSE = xx2$coefs.SE.CI[2, 2] * 1000
tNV = xx2$coefs.SE.CI[3, 1] * 1000
tNVSE = xx2$coefs.SE.CI[3, 2] * 1000
tC
tCSE
tNV
tNVSE
# Sensitivity check 3: retain F0 reports with the original definition of
# violent (F4+) and non-violent (F1-F3)
Torn3 = subset(Torn2, Torn2$MONTH == 3 | Torn2$MONTH == 4 | Torn2$MONTH == 5 | 
    Torn2$MONTH == 6)
Torn3 = subset(Torn3, Torn3$FSCALE >= 0)
All.ppp = as.ppp(Torn3["FSCALE"])
NV.ppp = All.ppp[All.ppp$marks <= 3 & All.ppp$marks >= 1]
NV = summary(NV.ppp)
V.ppp = All.ppp[All.ppp$marks >= 4]
V = summary(V.ppp)
Znv = distmap(NV.ppp)
USt = subset(UST, UST$POP_1990 > 2000)
US.ppp = as.ppp(USt)
W = as.owin(All.ppp)
CP.ppp = US.ppp[W]
Zc = distmap(CP.ppp)
fit2 = ppm(unmark(V.ppp), ~Zc + Znv, covariates = list(Zc = Zc, Znv = Znv), 
    correction = "border", rbord = 5000)
xx2 = summary(fit2)
tC = xx2$coefs.SE.CI[2, 1] * 1000
tCSE = xx2$coefs.SE.CI[2, 2] * 1000
tNV = xx2$coefs.SE.CI[3, 1] * 1000
tNVSE = xx2$coefs.SE.CI[3, 2] * 1000
tC
tCSE
tNV
tNVSE