Population Adjusted Tornado Probabilities

Group term project for Applied Spatial Statistics: Spring 2013

Students: Migliorelli, Loury; Michaels, Laura; Strazzo, Sarah; Mulholland, Brendan; Cruz, Rizalino; Patterson, Michael; Fraza, Erik; Xing, Guang; Amrine, Cameron; Widen, Holly

Background: Thom 1963: Tornado probabilities, Mon. Wea. Rev., Elsner, Michaels, Scheitlin, Elsner 2013: The decreasing population bias in tornado records across the central Plains, Weather, Climate, and Society, Elsner, Murnane, Jagger, Widen 2013: A spatial point process model for violent tornado occurrence in the U.S. Great Plains, Mathematical Geosciences.

Demonstrate the methods on one state, then apply them on several others. Explain the limitations of the methodology as applied to certain states.

Download data and subset

Begin by downloading the data and reading it into R. Use the rgdal package for the readOGR() function. Print the projection information.

Download tornado data.

tmp = download.file("http://www.spc.noaa.gov/gis/svrgis/zipped/tornado.zip", 
    "tornado.zip", mode = "wb")

Read the tornado data.

unzip("tornado.zip")
require(rgdal)
Torn = readOGR(dsn = ".", layer = "tornado")
## OGR data source with driver: ESRI Shapefile 
## Source: ".", layer: "tornado"
## with 56221 features and 28 fields
## Feature type: wkbPoint with 2 dimensions
names(Torn)
##  [1] "OM"         "YEAR"       "MONTH"      "DAY"        "DATE"      
##  [6] "TIME"       "TIMEZONE"   "STATE"      "FIPS"       "STATENUMBE"
## [11] "FSCALE"     "INJURIES"   "FATALITIES" "LOSS"       "CROPLOSS"  
## [16] "SLAT"       "SLON"       "ELAT"       "ELON"       "LENGTH"    
## [21] "WIDTH"      "NS"         "SN"         "SG"         "F1"        
## [26] "F2"         "F3"         "F4"
proj4string(Torn)
## [1] "+proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"

Tornadoes through 2012. CSV file only.

con = "http://www.spc.noaa.gov/wcm/data/1950-2012_torn.csv"
TornA = read.csv(con, header = FALSE)

Subset on year and F scale and create a factor variable (Category) for the F-scale. Convert length from miles and width from yards to meters. Compute per tornado path area.

Torn2 = subset(Torn, FSCALE >= 0 & YEAR >= 1950)
Torn2$Category = factor(Torn2$FSCALE, ordered = TRUE)
Torn2$Length = Torn2$LENGTH * 1609.34
Torn2$Width = Torn2$WIDTH * 0.9144
Torn2$Area = Torn2$Width * Torn2$Length

Choose a state

Here I'm interested in the state of Florida (12) http://www2.census.gov/cgi-bin/shapefiles2009/state-files?state=12 http://www.census.gov/geo/www/cob/st2000.html#shp http://www.nws.noaa.gov/geodata/catalog/wsom/html/cwa.htm

# tmp =
# download.file('http://www.census.gov/geo/cob/bdy/st/st00shp/st20_d00_shp.zip',
# 'st20_d00_shp.zip', mode='wb') tmp =
# download.file('http://www.census.gov/geo/cob/bdy/st/st00shp/st19_d00_shp.zip',
# 'st19_d00_shp.zip', mode='wb')
tmp = download.file("http://www.census.gov/geo/cob/bdy/st/st00shp/st12_d00_shp.zip", 
    "st12_d00_shp.zip", mode = "wb")
tmp = download.file("http://www.nws.noaa.gov/geodata/catalog/wsom/data/w_02ap13.zip", 
    "w_02ap13.zip", mode = "wb")

Read the data.

# unzip('st20_d00_shp.zip') STATE = readOGR(dsn = '.', layer = 'st20_d00')
# unzip('st19_d00_shp.zip') STATE = readOGR(dsn = '.', layer = 'st19_d00')
unzip("st12_d00_shp.zip")
STATE = readOGR(dsn = ".", layer = "st12_d00")
## OGR data source with driver: ESRI Shapefile 
## Source: ".", layer: "st12_d00"
## with 14 features and 10 fields
## Feature type: wkbPolygon with 2 dimensions
unzip("w_02ap13.zip")
CWArea = readOGR(dsn = ".", layer = "w_02ap13")
## OGR data source with driver: ESRI Shapefile 
## Source: ".", layer: "w_02ap13"
## with 123 features and 5 fields
## Feature type: wkbPolygon with 2 dimensions

Add projection information and transform the state boundaries.

ll = "+proj=longlat +ellps=GRS80 +datum=NAD83"
proj4string(STATE) = CRS(ll)
STATE = spTransform(STATE, CRS(proj4string(Torn2)))
require(rgeos)
stArea = gArea(STATE)
stArea/1e+06
## [1] 153583

Download the county boundaries (all US).

tmp = download.file("http://www.nws.noaa.gov/geodata/catalog/county/data/c_02ap13.zip", 
    "c_02ap13.zip", mode = "wb")

Read the data and select state.

unzip("c_02ap13.zip")
COUNTY = readOGR(dsn = ".", layer = "c_02ap13")
## OGR data source with driver: ESRI Shapefile 
## Source: ".", layer: "c_02ap13"
## with 3312 features and 11 fields
## Feature type: wkbPolygon with 2 dimensions
COUNTY = COUNTY[COUNTY$STATE == "FL", ]
COUNTY = spTransform(COUNTY, CRS(proj4string(Torn2)))

Estimate the raw statewide tornado risk

Subset by state boundary.

st = overlay(Torn2, STATE)
st = !is.na(st)
Torn3 = Torn2[st, ]

Check the subsetting.

plot(STATE)
plot(Torn3, add = TRUE)

plot of chunk plotsChecks

table(Torn3$STATE)
## 
##   AK   AL   AR   AZ   CA   CO   CT   DC   DE   FL   GA   HI   IA   ID   IL 
##    0    1    0    0    0    0    0    0    0 2751    1    0    0    0    0 
##   IN   KS   KY   LA   MA   MD   ME   MI   MN   MO   MS   MT   NC   ND   NE 
##    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 
##   NH   NJ   NM   NV   NY   OH   OK   OR   PA   PR   RI   SC   SD   TN   TX 
##    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 
##   UT   VA   VT   WA   WI   WV   WY 
##    0    0    0    0    0    0    0
# plot(Torn2[Torn2$STATE=='NE', ], add=TRUE, col='red')
stTorPerYrObs = dim(Torn3@data)[1]/length(1950:2011)
stTorPerYrObs/stArea * 1e+10  # Annual statewide number of tornadoes per year per 10000 square km
## [1] 2.891

Compute total area of tornado paths and annual average area.

tArea = sum(Torn3$Area)
tArea/1e+06
## [1] 668.5
areaPerYear = tArea/length(1950:2011)
areaPerYear/1e+06
## [1] 10.78

Estimate the annual average recurrence interval (return period) at any location and the annual probability of getting hit by a tornado.

By the principle of geometrical probability, the chance of a tornado striking a location is the ratio of the mean area covered by the tornadoes per year to the area over which the tornadoes may occur.

stArea/areaPerYear  # Annual avg recurrence interval for any location
## [1] 14244
areaPerYear/stArea * 100  # Annual probability
## [1] 0.007021

Next, create and summarize a planar point pattern object.

require(spatstat)
require(maptools)
W = as(STATE, "owin")
## Checking 14 polygons...1, [Checking polygon with 2918 edges...]
## 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,  14.
## done.
## Checking for cross-intersection between 14 polygons...1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,  13.
## done.
All.ppp = as.ppp(Torn3["Category"])
All.ppp = All.ppp[W]
unitname(All.ppp) = c("meter", "meters")
summary(All.ppp)
## Marked planar point pattern: 2753 points
## Average intensity 1.79e-08 points per square meter  
## 
## *Pattern contains duplicated points*
## 
## Coordinates are given to 2 decimal places 
## i.e. rounded to the nearest multiple of 0.01 meters  
## 
## Multitype:
##   frequency proportion intensity
## 0      1640   0.595000  1.07e-08
## 1       784   0.285000  5.10e-09
## 2       296   0.108000  1.93e-09
## 3        33   0.012000  2.15e-10
## 4         1   0.000363  6.51e-12
## 5         0   0.000000  0.00e+00
## 
## Window: polygonal boundary
## 14 separate polygons (no holes)
##             vertices      area relative.area
## polygon 1       2918 1.527e+11      9.95e-01
## polygon 2         36 4.759e+07      3.10e-04
## polygon 3         22 5.163e+07      3.36e-04
## polygon 4         26 1.412e+07      9.20e-05
## polygon 5         69 8.935e+07      5.82e-04
## polygon 6        289 1.714e+08      1.12e-03
## polygon 7         25 6.844e+06      4.46e-05
## polygon 8         15 1.089e+07      7.09e-05
## polygon 9          9 7.209e+05      4.69e-06
## polygon 10       106 3.606e+07      2.35e-04
## polygon 11        19 1.975e+07      1.29e-04
## polygon 12       179 3.577e+08      2.33e-03
## polygon 13        11 3.512e+06      2.29e-05
## polygon 14        12 2.759e+07      1.80e-04
## enclosing rectangle: [802437, 1617257] x [-1501045, -800485] meters 
## Window area =  1.53583e+11 square meters 
## Unit of length: 1 meter

The native spatial units are meters.

Estimate population bias corrected tornado risk

Get data on the location of cities.

tmp = download.file("http://www.nws.noaa.gov/geodata/catalog/national/data/ci08au12.zip", 
    "ci08au12.zip", mode = "wb")

Read the data.

unzip("ci08au12.zip")
US = readOGR(dsn = ".", layer = "ci08au12", p4s = ll)
## OGR data source with driver: ESRI Shapefile 
## Source: ".", layer: "ci08au12"
## with 43603 features and 23 fields
## Feature type: wkbPoint with 2 dimensions
UST = spTransform(US, CRS(proj4string(Torn3)))

Subset by state boundary.

st = overlay(UST, STATE)
st = !is.na(st)
Cities = UST[st, ]
plot(unmark(All.ppp), pch = 16, cex = 0.4)
plot(Cities, pch = 15, col = "green", cex = 0.4, add = TRUE)

plot of chunk subsetCitiesByStateBoundary

C = as.ppp(Cities)
window = as.owin(All.ppp)
C = C[window]

Create a map of distance from nearest city centers. Distance from nearest city is a covariate in the point pattern model (ppm).

Zc = distmap(C)

Kernel density model

Kernel density model for local report density as a function of distance from nearest city.

rhat = rhohat(unmark(All.ppp), Zc, smoother = "kernel", method = "transform", 
    adjust = 1)
## Warning: Some tiles with zero area contain points
## Warning: Some weights are zero
## Warning: 6 out of 487 pixel values were outside the pixel image domain and
## were estimated by convolution
plot(rhat, legend = FALSE, main = "")

plot of chunk populationBiasCorrectedStats

stDensity = max(rhat$rho)
index = which.max(rhat$rho)
stDensityHi = rhat$hi[index]
stDensityLo = rhat$lo[index]

Report density is highest 3-4 km from city center: population distribution within cities. Spotter network tends to maximize just outside the city.

Verify significance with a parametric point pattern model.

fit = ppm(unmark(All.ppp), ~Zc, covariates = list(Zc = Zc), correction = "border", 
    rbord = 5)
## Warning: Some tiles with zero area contain points
## Warning: Some weights are zero
## Warning: Values of the covariate 'Zc' were NA or undefined at 0.16% (10
## out of 6155) of the quadrature points. Occurred while executing:
## ppm(unmark(All.ppp), ~Zc, covariates = list(Zc = Zc), correction =
## "border",
stDensity = exp(fit$coef[1])
summary(fit)$coefs[2, 4] * 1000
## [1] -0.2621
summary(fit)$coefs[2, 1] * 1000
## [1] -0.2466
summary(fit)$coefs[2, 5] * 1000
## [1] -0.2311

Statewide expect number of tornadoes per year.

stTorPerYrExp = stDensity * stArea/length(1950:2011)
stTorPerYrExp
## (Intercept) 
##       137.7
stDensityHi * stArea/length(1950:2011)
## [1] 120.1
stDensityLo * stArea/length(1950:2011)
## [1] 104.1
stTorPerYrExp/stArea * 1e+10  # Annual statewide number of tornadoes per year per 10000 square km
## (Intercept) 
##       8.968

Local recurrence interval and annual probability. First determine the mean tornado area as the product of the expection of path length and path length accounting for the correlation between them.

Width = Torn3$Width
Length = Torn3$Length
y = log(Length)
u = log(Width)
v = y + u
meanv = mean(y[y > 0]) + mean(u[u > 0])
varv = var(y[y > 0]) + var(u[u > 0]) + 2 * cor(Width, Length) * sd(y[y > 0]) * 
    sd(u[u > 0])
meanArea = exp(meanv + 0.5 * varv)
meanArea/1e+06
## [1] 0.09312
mean(Torn3$Area)/1e+06
## [1] 0.2428

Estimate the population corrected annual average recurrence interval (return period) at any location and the annual probability of getting hit by a tornado.

tAreaPerYr = stTorPerYrExp * meanArea
tAreaPerYr/1e+06
## (Intercept) 
##       12.83
stArea/tAreaPerYr  # Annual avg recurrence interval for any location
## (Intercept) 
##       11975
tAreaPerYr/stArea * 100  # Annual probability of a tornado at any location
## (Intercept) 
##    0.008351

County-level statistics

Observed versus expected annual probabilities by county.

COUNTY$ctArea = gArea(COUNTY, byid = TRUE)
ct = over(Torn2, COUNTY)
Torn2$COUNTYNAME = ct$COUNTYNAME
require(plyr)
## Loading required package: plyr
perCounty = ddply(Torn2@data, .(COUNTYNAME), summarize, obs = length(COUNTYNAME), 
    totalArea = sum(Area))
xx = merge(COUNTY@data, perCounty, by = "COUNTYNAME")
xx$exp = stDensity * xx$ctArea
yy = xx[order(xx$OBJECTID_1), ]
COUNTY$ctTorPerYrObs = yy$obs/length(1950:2011)
COUNTY$ctTorPerYrExp = yy$exp/length(1950:2011)
COUNTY$apR = pnbinom(0, size = 1, mu = COUNTY$ctTorPerYrObs, lower.tail = FALSE) * 
    100
COUNTY$apC = pnbinom(0, size = 1, mu = COUNTY$ctTorPerYrExp, lower.tail = FALSE) * 
    100
COUNTY$apCperA = COUNTY$ctTorPerYrExp * meanArea/COUNTY$ctArea * 100
require(RColorBrewer)
## Loading required package: RColorBrewer
range(COUNTY$apR)
## [1]  4.615 71.163
rng = seq(0, 90, 10)
cls = brewer.pal(9, "YlOrRd")
p1 = spplot(COUNTY, "apR", col.regions = cls, at = rng, colorkey = list(space = "bottom", 
    labels = list(cex = 1)), sub = list("Annual Tornado Probability (%)\n [raw]", 
    cex = 1, font = 1))
range(COUNTY$apC)
## [1]  0.3997 82.9957
rng = seq(0, 90, 10)
cls = brewer.pal(9, "YlOrRd")
p2 = spplot(COUNTY, "apC", col.regions = cls, at = rng, colorkey = list(space = "bottom", 
    labels = list(cex = 1)), sub = list("Annual Tornado Probability (%)\n [adjusted]", 
    cex = 1, font = 1))
p1 = update(p1, main = textGrob("a", x = unit(0.05, "npc")))
p2 = update(p2, main = textGrob("b", x = unit(0.05, "npc")))

Conditional risk of significant tornadoes (F2 or higher).

Naming convention

F0-F1: weak
F2-F3: strong
F4-F5: violent

F2-F5: significant
F3-F5: intense (as an opposite we could use 'moderate')

Create two ppp objects and join them. This is needed for the relative risk function.

plot(split(All.ppp))

plot of chunk summaryAnalysis

H = unmark(All.ppp[All.ppp$marks <= 1 & All.ppp$marks >= 0])
S = unmark(All.ppp[All.ppp$marks >= 2])
All2.ppp = superimpose(H = H, S = S)
## Warning: data contain duplicated points
summary(All2.ppp)
## Marked planar point pattern: 2753 points
## Average intensity 1.79e-08 points per square meter  
## 
## *Pattern contains duplicated points*
## 
## Coordinates are given to 2 decimal places 
## i.e. rounded to the nearest multiple of 0.01 meters  
## 
## Multitype:
##   frequency proportion intensity
## H      2420       0.88  1.58e-08
## S       330       0.12  2.15e-09
## 
## Window: polygonal boundary
## 14 separate polygons (no holes)
##             vertices      area relative.area
## polygon 1       2918 1.527e+11      9.95e-01
## polygon 2         36 4.759e+07      3.10e-04
## polygon 3         22 5.163e+07      3.36e-04
## polygon 4         26 1.412e+07      9.20e-05
## polygon 5         69 8.935e+07      5.82e-04
## polygon 6        289 1.714e+08      1.12e-03
## polygon 7         25 6.844e+06      4.46e-05
## polygon 8         15 1.089e+07      7.09e-05
## polygon 9          9 7.209e+05      4.69e-06
## polygon 10       106 3.606e+07      2.35e-04
## polygon 11        19 1.975e+07      1.29e-04
## polygon 12       179 3.577e+08      2.33e-03
## polygon 13        11 3.512e+06      2.29e-05
## polygon 14        12 2.759e+07      1.80e-04
## enclosing rectangle: [802437, 1617257] x [-1501045, -800485] meters 
## Window area =  1.53583e+11 square meters 
## Unit of length: 1 meter
plot(S, pch = 25, cols = "red", main = "")

plot of chunk summaryAnalysis

Compute spatial densities and relative risk.

eps = 320
dAll = density(unmark(All2.ppp), dimyx = c(eps, eps))
attr(dAll, "sigma")
## [1] 87570
mean(dAll$v, na.rm = TRUE)
## [1] 1.681e-08
dH = density(H, dimyx = c(eps, eps))
dS = density(S, dimyx = c(eps, eps))
bw = bw.relrisk(All2.ppp, hmax = 1e+12)
rr = relrisk(All2.ppp, dimyx = c(eps, eps), sigma = bw)

With All2 a bivariate point pattern, moderate tornadoes are treated as non-events and the significant tornadoes are treated as events. The relrisk() function computes the spatially-varying risk of a significant tornado. That is it computes the probability p(u) that given a tornado at location u it will be significant (conditional probability of a significant tornado)

Statewide mean conditional probability of a significant tornado.

mean(rr$v, na.rm = TRUE) * 100
## [1] 12.56
sd(as.vector(rr$v), na.rm = TRUE) * 100
## [1] 3.476
range(rr$v, na.rm = TRUE) * 100
## [1]  7.007 20.177

Create a SpatialGridDataFrame (S4 class).

rr.sgdf = as(rr, "SpatialGridDataFrame")
proj4string(rr.sgdf) = CRS(proj4string(Torn2))

Conditional risk of significant tornado by county

Extract the mean relative risk per county and multiply it by the population corrected probability of a tornado.

COUNTY$v = over(COUNTY, rr.sgdf, fn = mean)
df = COUNTY$apC * COUNTY$v
names(df) = "apS"
COUNTY$apS = df$apS
COUNTY$apCperAS = COUNTY$apCperA * COUNTY$apS/COUNTY$apC * 100
range(COUNTY$apS)
## [1] NA NA
rng = seq(0, 30, 5)
cls = brewer.pal(6, "YlOrRd")
p3 = spplot(COUNTY, "apS", col.regions = cls, at = rng, colorkey = list(space = "bottom", 
    labels = list(cex = 1)), sub = list("Annual Significant Tornado\n Probability (%)", 
    cex = 1, font = 1))
range(COUNTY$apSpA)
## Warning: no non-missing arguments to min; returning Inf
## Warning: no non-missing arguments to max; returning -Inf
## [1]  Inf -Inf
rng = seq(0, 9, 1)
cls = brewer.pal(9, "YlOrRd")
p4 = spplot(COUNTY, "apCperAS", col.regions = cls, at = rng, colorkey = list(space = "bottom", 
    labels = list(cex = 1)), sub = list("Annual Significant Tornado\n Probability (%) At a Location", 
    cex = 1, font = 1))
p3 = update(p3, main = textGrob("c", x = unit(0.05, "npc")))
p4 = update(p4, main = textGrob("d", x = unit(0.05, "npc")))

Create 4 panel figure of maps.

print(p1, split = c(1, 1, 2, 2), more = TRUE)
print(p2, split = c(2, 1, 2, 2), more = TRUE)
print(p3, split = c(1, 2, 2, 2), more = TRUE)
print(p4, split = c(2, 2, 2, 2), more = FALSE)

plot of chunk fourPanelFigureMaps

county.df = as.data.frame(COUNTY)
county.df[rev(order(county.df$v))[1:10], ]
##      OBJECTID_1 STATE CWA COUNTYNAME  FIPS TIME_ZONE FE_AREA    LON   LAT
## 3286       3301    FL TAE     Walton 12131         C      pa -85.90 30.54
## 3093       3102    FL TAE    Gadsden 12039         E      bb -84.61 30.58
## 2482       2487    FL TAE       Leon 12073         E      bb -84.28 30.46
## 2523       2528    FL TAE    Jackson 12063         C      pa -84.92 30.78
## 3081       3090    FL TAE  Jefferson 12065         E      bb -83.90 30.44
## 3098       3107    FL TAE    Wakulla 12129         E      bb -84.40 30.17
## 3253       3265    FL TAE    Liberty 12077         E      bb -84.88 30.24
## 2522       2527    FL TAE    Jackson 12063         C      pa -85.22 30.80
## 2485       2490    FL TAE    Madison 12079         E      bb -83.47 30.44
## 2472       2477    FL TAE    Calhoun 12013         C      pa -85.20 30.41
##      Shape_Leng Shape_Area    ctArea ctTorPerYrObs ctTorPerYrExp   apR
## 3286      1.476  0.0004162 4.475e+06        0.5968      0.004013 37.37
## 3093      1.818  0.1287079 1.383e+09        0.4355      1.240099 30.34
## 2482      2.180  0.1707325 1.838e+09        0.3065      1.648010 23.46
## 2523      1.601  0.0029718 3.183e+07        0.4677      0.028548 31.87
## 3081      2.539  0.1487864 1.602e+09        0.1774      1.436612 15.07
## 3098      5.489  0.1495334 1.616e+09        0.2097      1.449666 17.33
## 3253      2.941  0.2046864 2.210e+09        0.1290      1.982215 11.43
## 2522      3.101  0.2301034 2.464e+09        0.4677      2.209868 31.87
## 2485      2.084  0.1741421 1.875e+09        0.2581      1.681273 20.51
## 2472      1.842  0.1396401 1.504e+09        0.2258      1.348944 18.42
##          apC  apCperA      v     apS apCperAS
## 3286  0.3997 0.008351     NA      NA       NA
## 3093 55.3591 0.008351 0.1967 10.8916   0.1643
## 2482 62.2358 0.008351 0.1954 12.1615   0.1632
## 2523  2.7755 0.008351 0.1929  0.5354   0.1611
## 3081 58.9594 0.008351 0.1904 11.2269   0.1590
## 3098 59.1781 0.008351 0.1878 11.1112   0.1568
## 3253 66.4679 0.008351 0.1842 12.2453   0.1538
## 2522 68.8461 0.008351 0.1812 12.4736   0.1513
## 2485 62.7043 0.008351 0.1804 11.3094   0.1506
## 2472 57.4277 0.008351 0.1792 10.2887   0.1496

Subset by county boundary.

CTY = subset(COUNTY, COUNTY$COUNTYNAME == "Jefferson")
ct = overlay(Torn2, CTY)
ct = !is.na(ct)
Torn3 = Torn2[ct, ]
table(Torn3$Category)
## 
## 0 1 2 3 4 5 
## 6 5 0 0 0 0