Daily U.S. Tornado Frequency Distributions

James B. Elsner, Thomas H. Jagger, Holly M. Widen, and Daniel R. Chavas

Code in support of our paper submitted to Environmental Research Letters.

Set working directory and load packages.

date()

## [1] "Mon Dec  2 08:16:37 2013"

setwd("~/Dropbox/Tornadoes")
library(ggplot2)
library(rgdal)
library(reshape)
library(plyr)
library(spatstat)
library(maptools)
library(maps)
library(ggmap)
library(raster)
library(VGAM)

Tornado data

Original data from the Storm Prediction Center (SPC) http://www.spc.noaa.gov/gis/svrgis/. New data (through 2012) from Patrick Marsh. New data are polyline files.

tmp = download.file("http://www.pmarshwx.com/gis/torn.zip", "torn.zip", mode = "wb")

Read the tornado data.

unzip("torn.zip")
TornL = readOGR(dsn = "torn", layer = "torn", stringsAsFactors = FALSE)

## OGR data source with driver: ESRI Shapefile 
## Source: "torn", layer: "torn"
## with 57150 features and 21 fields
## Feature type: wkbLineString with 2 dimensions

Change the data types in the data slot.

TornL$Year = as.integer(TornL$YR)
TornL$Month = as.integer(TornL$MO)
TornL$Day = as.integer(TornL$DY)
TornL$EF = as.integer(TornL$MAG)
TornL$Date = as.Date(TornL$DATE)

Counts by month, year, and EF threshold.

Torn2.df = as.data.frame(TornL)
Y.df = ddply(Torn2.df, .(Year), summarize, F0 = sum(EF == 0), F1 = sum(EF == 
    1), F2 = sum(EF == 2), F3 = sum(EF == 3), F4 = sum(EF == 4), F5 = sum(EF == 
    5), F0F5 = sum(EF >= 0), F1F5 = sum(EF >= 1), F2F5 = sum(EF >= 2), F3F5 = sum(EF >= 
    3), F4F5 = sum(EF >= 4))

YD.df = ddply(Torn2.df, .(Date, Year), summarize, nTor = length(OM), nVTor = sum(EF >= 
    4), nNVTor = sum(EF < 4))
xx = YD.df$nVTor > 0 & YD.df$nNVTor == 0
sum(xx)

## [1] 4

YD.df$Year[xx]

## [1] 1950 1950 1953 1988

YmaxD.df = ddply(YD.df, .(Year), summarize, nTdays = length(nTor), maxTor = max(nTor))
YmaxD2.df = merge(Y.df, YmaxD.df, by = "Year")
YmaxD2.df$frac = YmaxD2.df$maxTor/YmaxD2.df$F0F5
YmaxD3.df = subset(YmaxD2.df, Year >= 1994)
minFrac = min(YmaxD3.df$frac) * 100
maxFrac = max(YmaxD3.df$frac) * 100
minYr = YmaxD3.df$Year[which.min(YmaxD3.df$frac)]
maxYr = YmaxD3.df$Year[which.max(YmaxD3.df$frac)]

Introduction

In the United States over the period 1994–2012 there was an average of 1265 tornadoes per year with an interquartile range of 256 tornadoes. The percentage of the annual count occurring on the day of the year with the most tornadoes ranged from a low of 3.2\% in 1998 to a high of 12.2\% in 2011. Moreover, tornadoes are rated on a categorical damage scale ranging from weak EF0 to violent EF4 and EF5. Of the 24032 U.S. tornadoes in the database over the 19-year period only 0.57\% were violent.

Study period and tornado data

Torn2.df = subset(Torn2.df, Year >= 1994 & EF >= 1)
Y2.df = ddply(Torn2.df, .(Year), summarize, F1 = sum(EF == 1), F2 = sum(EF == 
    2), F3 = sum(EF == 3), F4 = sum(EF == 4), F5 = sum(EF == 5))
Y2Long = melt(Y2.df, id = "Year")
ggplot(Y2Long, aes(Year, value)) + geom_bar(stat = "identity", fill = "grey") + 
    facet_wrap(~variable, scale = "free_y", ncol = 1) + theme_bw() + ylab("Tornado Frequency")

plot of chunk figure1

FIGURE 1 Annual tornado frequency by damage category. The trend lines are shown along with the 95\% uncertainty interval about the trend (gray band).

Aggregate damage category frequency distribution

MY.df = ddply(Torn2.df, .(Month, Year), summarize, F1 = sum(EF == 1), F2 = sum(EF == 
    2), F3 = sum(EF == 3), F4 = sum(EF == 4), F5 = sum(EF == 5), F1F5 = sum(EF >= 
    1), F2F5 = sum(EF >= 2), F3F5 = sum(EF >= 3), F4F5 = sum(EF >= 4))
cs = colSums(MY.df[, 3:7])
cs

##   F1   F2   F3   F4   F5 
## 6420 1938  567  124   13

Total = sum(cs)
cs/Total

##       F1       F2       F3       F4       F5 
## 0.708453 0.213860 0.062569 0.013684 0.001435

cs[1:4]/cs[2:5]

##    F1    F2    F3    F4 
## 3.313 3.418 4.573 9.538

There are 9062 tornado reports (EF1 or higher) over the period 1994–2012 inclusive. Table~\ref{TotalDist} gives the distribution of the reports by EF rating by count and by percentage of the total. It also gives the factor by which the frequency in the category exceeds the frequency in the next highest category.

\begin{table} \caption{Distribution of tornadoes (1994–2012) by EF category.} \begin{tabular}{lrcc} \hline Category & Count & Percent of Total & Factor \ \hline EF1 & 6420 & 70.8 & 2.31 \ EF2 & 1938 & 21.4 & 2.42 \ EF3 & 567 & 6.26 & 3.57 \ EF4 & 124 & 1.37 & 8.54 \ EF5 & 13 & 0.14 & \ \hline \end{tabular} \label{TotalDist} \end{table}

Distribution frequency of tornado days

allDays = as.integer(as.Date("2012-12-31") - as.Date("1994-01-01"))
YD2.df = ddply(Torn2.df, .(Date, Year), summarize, nTor = length(OM), nVtor = sum(EF >= 
    4))
torDays = dim(YD2.df)[1]
per = torDays/allDays * 100
mTor0 = round(mean(YD2.df$nTor[YD2.df$nVtor == 0]))
mTorV = round(mean(YD2.df$nTor[YD2.df$nVtor != 0]))
YD2nd.df = ddply(YD2.df, .(Year), summarize, nTdays = length(Date), maxNtor = max(nTor))
p2a = ggplot(YD2nd.df, aes(Year, nTdays)) + geom_bar(stat = "identity", fill = "grey") + 
    theme_bw() + ylab("Tornado Days")
p2b = ggplot(YD2nd.df, aes(Year, maxNtor)) + geom_point() + theme_bw() + ylab("Number of Tornadoes\n on the Day with the Most Tornadoes")

source("multiplot.txt")
mat = matrix(c(1, 2), nrow = 1, byrow = TRUE)
p2a = p2a + ggtitle("a") + theme(plot.title = element_text(hjust = 0))
p2b = p2b + ggtitle("b") + theme(plot.title = element_text(hjust = 0))
multiplot(p2a, p2b, layout = mat)

## Loading required package: grid

plot of chunk Figure2

FIGURE 2 Annual number of tornado days and the annual count of on the tornado-day with the most tornadoes.

A majority of days have no tornadoes. Over the 6939 days from January 1, 1994 through December 31, 2012, only 2012 (28.9955\%) had at least one EF1 or stronger tornado.

xx = table(YD2.df$nTor)
datf = data.frame(xx)
datf$Size = as.integer(names(xx))
p3a = ggplot(datf, aes(x = Size, y = Freq)) + scale_x_log10() + scale_y_log10() + 
    geom_point() + xlab(expression(paste("Size (Number of Tornadoes ", d^{
    -1
}, ")"))) + ylab("Frequency (Number of Days)") + theme_bw()
zipflaw = vglm(formula = Size ~ 1, family = zipf(link = identity), data = datf, 
    weights = Freq)
zipflaw.s = summary(zipflaw)

Figure 2 shows the number of tornado days by year as well as the number of tornadoes on the day with the most tornadoes by year. The time series indicate no upward or downward trends although the 145 tornadoes on April 27, 2011 stands out as very unusual.

Figure 3a shows the frequency distribution of tornado days by size on a log-log graph. Size refers to the number of tornadoes occurring during a single calendar day. The straight-line appearance of the points suggests that the size of a tornado day follows a power-law relationship (Zipf's law or Pareto distribution).

Formally, the probability that a random tornado day has \( k \) tornadoes is given by \[ P(X = k) = 1/\zeta(s) \cdot 1/k^s \] where \( s \) is the scaling exponent and where \[ \zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s} \] is Reimann's zeta function. The rank \( n \) ranges from 1 to the number of unique tornado-day sizes in the record (47).

YD2.df$rankday = rank(-YD2.df$nTor, ties.method = "first")
ggplot(YD2.df, aes(x = rankday, y = nTor)) + scale_x_log10() + scale_y_log10() + 
    geom_point() + xlab(expression(paste("Rank of day by number of tornadoes"))) + 
    ylab("Number of Tornadoes") + theme_bw()

plot of chunk notZipfslaw

model = lm(log(nTor) ~ log(rankday), data = YD2.df)
model.s = summary(model)

We test for Zipf's law by regressing the logarithm of the rank of tornado-day size onto the logarithm of size. A slope of one indicates that Zipf's law applies. We find a slope of -0.943 with a standard error of 0.0058. Thus we reject Zipf's law.

We estimate the scaling exponent \( s \) using the {\tt vglm()} function in R to be 1.64 with a standard error of 0.019. This result is similiar to Malamud and Turcotte (2012) who find a power-law distribution with a scaling exponent of 1.8 for the frequency density of tornado path length per day.

The power-law distribution helps explain why the aggregate distribution frequency discussed in the previous section is complicated. We find that the average number of tornadoes on tornado days without a violent tornado (EF4 or EF5) is 4 which compares to an average of 20 tornadoes on days with at least one violent tornado. Since most tornado days have less than a handful of tornadoes and most violent tornadoes occur with larger outbreaks (days with more tornadoes) there is an excess accumulation in the aggregate of weaker EF1 and EF2 tornadoes relative to violent tornadoes.

N = dim(datf)[1]
n = 1:N
s = 1.643728
z = 1/n^s
zetas = sum(z)
z2 = 1/n^(s - 1/2)
zetas2 = sum(z2)
k = 1:100
pxk = 1/zetas * 1/k^s
datf2 = data.frame(k, pxk)
p3b = ggplot(datf2, aes(x = k, y = pxk)) + geom_point() + scale_x_log10() + 
    scale_y_log10() + ylab("Probability of k Tornadoes\n Given a Tornado Day") + 
    xlab("Number of Tornadoes (k)") + theme_bw()
p145 = 1/zetas * 1/145^s
p145

## [1] 0.0001369

mat = matrix(c(1, 2), nrow = 1, byrow = TRUE)
p3a = p3a + ggtitle("a") + theme(plot.title = element_text(hjust = 0))
p3b = p3b + ggtitle("b") + theme(plot.title = element_text(hjust = 0))
multiplot(p3a, p3b, layout = mat)

plot of chunk Figure3

FIGURE 3 Power-law for the daily tornado frequency. (a) The observed distribution frequency by size (number of tornadoes per day). (b) The probability distribution of the number of tornadoes on a random tornado day assuming a power-law with a scaling exponent of 1.644.

Figure 3b shows the probability distribution of tornado-day sizes out to an outbreak with 100 tornadoes assuming a power law with a scaling exponent of 1.64. Using this relationship we find a per tornado-day probability of 0.014\% that a tornado day has 145 tornadoes. This is the conditional probability given a tornado day.

Disaggregated damage scale frequency distribution

D.df = ddply(Torn2.df, .(Date), summarize, F1 = sum(EF == 1), F2 = sum(EF == 
    2), F3 = sum(EF == 3), F4 = sum(EF == 4), F5 = sum(EF == 5))
D2.df = subset(D.df, F5 != 0)
D2Long = melt(D2.df, id = "Date")
meanSE = ddply(D2Long, .(variable), summarize, avg = mean(value), N = length(value), 
    sd = sd(value), se = sd/sqrt(N), ciMult = qt(0.95/2 + 0.5, N - 1), ci = se * 
        ciMult)
ff = colMeans(D2.df[, -1])
cs2 = colSums(D2.df[, -1])
Total = sum(cs2)
cs2/Total

##      F1      F2      F3      F4      F5 
## 0.51923 0.23077 0.14744 0.06090 0.04167

cs2[1:4]/cs2[2:5]

##    F1    F2    F3    F4 
## 2.250 1.565 2.421 1.462

p4a = ggplot(meanSE, aes(x = variable, y = avg)) + geom_bar(stat = "identity", 
    fill = "grey") + geom_errorbar(aes(ymin = avg - se, ymax = avg + se), width = 0.1) + 
    xlab("Damage Category") + ylab("Mean Number of Tornadoes") + theme_bw()

D.df = ddply(Torn2.df, .(Date), summarize, F1 = sum(EF == 1), F2 = sum(EF == 
    2), F3 = sum(EF == 3), F4 = sum(EF == 4), F5 = sum(EF == 5))
D3.df = subset(D.df, F4 != 0)
D3Long = melt(D3.df, id = "Date")
meanSE2 = ddply(D3Long, .(variable), summarize, avg = mean(value), N = length(value), 
    sd = sd(value), se = sd/sqrt(N), ciMult = qt(0.95/2 + 0.5, N - 1), ci = se * 
        ciMult)
ff2 = colMeans(D2.df[, -1])
cs3 = colSums(D2.df[, -1])
Total = sum(cs3)
cs3/Total

##      F1      F2      F3      F4      F5 
## 0.51923 0.23077 0.14744 0.06090 0.04167

cs3[1:4]/cs3[2:5]

##    F1    F2    F3    F4 
## 2.250 1.565 2.421 1.462

p4b = ggplot(meanSE2, aes(x = variable, y = avg)) + geom_bar(stat = "identity", 
    fill = "grey") + geom_errorbar(aes(ymin = avg - se, ymax = avg + se), width = 0.1) + 
    xlab("Damage Category") + ylab("Mean Number of Tornadoes") + theme_bw()

mat = matrix(c(1, 2), nrow = 1, byrow = TRUE)
p4a = p4a + ggtitle("a") + theme(plot.title = element_text(hjust = 0))
p4b = p4b + ggtitle("b") + theme(plot.title = element_text(hjust = 0))
multiplot(p4a, p4b, layout = mat)

plot of chunk Figure4

FIGURE 4 Mean number of tornadoes by EF damage category. (a) On days with at least one EF5 tornado and (b) on days with at least one EF4 tornado.

Consistent with earlier studies on tornado path length we find a power-law scaling relationship with an exponent of 1.64 (0.019 s.e.) for the distribution frequency on tornado days. This indicates a long-tailed distribution with the potential for surprisingly large outbreaks as was observed on April 27, 2011.

http://www.nssl.noaa.gov/users/brooks/public_html/papers/meyeretal.pdf