Population Adjusted Tornado Probabilities

Group term project for Applied Spatial Statistics: Spring 2013

Students: Migliorelli, Loury; Michaels, Laura; Strazzo, Sarah; Mulholland, Brendan; Cruz, Rizalino; Patterson, Michael; Fraza, Erik; Xing, Guang; Amrine, Cameron; Widen, Holly

Background: Thom 1963: Tornado probabilities, Mon. Wea. Rev., Elsner, Michaels, Scheitlin, Elsner 2013: The decreasing population bias in tornado records across the central Plains, Weather, Climate, and Society, Elsner, Murnane, Jagger, Widen 2013: A spatial point process model for violent tornado occurrence in the U.S. Great Plains, Mathematical Geosciences.

Demonstrate the methods on one state, then apply them on several others. Explain the limitations of the methodology as applied to certain states.

Download data and subset

Begin by downloading the data and reading it into R. Use the rgdal package for the readOGR() function. Print the projection information.

Download tornado data.

tmp = download.file("http://www.spc.noaa.gov/gis/svrgis/zipped/tornado.zip", 
    "tornado.zip", mode = "wb")

Read the tornado data.

unzip("tornado.zip")
require(rgdal)
Torn = readOGR(dsn = ".", layer = "tornado")
## OGR data source with driver: ESRI Shapefile 
## Source: ".", layer: "tornado"
## with 56221 features and 28 fields
## Feature type: wkbPoint with 2 dimensions
names(Torn)
##  [1] "OM"         "YEAR"       "MONTH"      "DAY"        "DATE"      
##  [6] "TIME"       "TIMEZONE"   "STATE"      "FIPS"       "STATENUMBE"
## [11] "FSCALE"     "INJURIES"   "FATALITIES" "LOSS"       "CROPLOSS"  
## [16] "SLAT"       "SLON"       "ELAT"       "ELON"       "LENGTH"    
## [21] "WIDTH"      "NS"         "SN"         "SG"         "F1"        
## [26] "F2"         "F3"         "F4"
proj4string(Torn)
## [1] "+proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"

Tornadoes through 2012. CSV file only.

con = "http://www.spc.noaa.gov/wcm/data/1950-2012_torn.csv"
TornA = read.csv(con, header = FALSE)

Subset on year and F scale and create a factor variable (Category) for the F-scale. Convert length from miles and width from yards to meters. Compute per tornado path area.

Torn2 = subset(Torn, FSCALE >= 0 & YEAR >= 1950)
Torn2$Category = factor(Torn2$FSCALE, ordered = TRUE)
Torn2$Length = Torn2$LENGTH * 1609.34
Torn2$Width = Torn2$WIDTH * 0.9144
Torn2$Area = Torn2$Width * Torn2$Length

Choose a state

Here I'm interested in the state of Illinois(17) http://www2.census.gov/cgi-bin/shapefiles2009/state-files?state=20 http://www.census.gov/geo/www/cob/st2000.html#shp http://www.nws.noaa.gov/geodata/catalog/wsom/html/cwa.htm

tmp = download.file("http://www.census.gov/geo/cob/bdy/st/st00shp/st17_d00_shp.zip", 
    "st17_d00_shp.zip", mode = "wb")
tmp = download.file("http://www.nws.noaa.gov/geodata/catalog/wsom/data/w_02ap13.zip", 
    "w_02ap13.zip", mode = "wb")

Read the data.

unzip("st17_d00_shp.zip")
STATE = readOGR(dsn = ".", layer = "st17_d00")
## OGR data source with driver: ESRI Shapefile 
## Source: ".", layer: "st17_d00"
## with 1 features and 10 fields
## Feature type: wkbPolygon with 2 dimensions
unzip("w_02ap13.zip")
CWArea = readOGR(dsn = ".", layer = "w_02ap13")
## OGR data source with driver: ESRI Shapefile 
## Source: ".", layer: "w_02ap13"
## with 123 features and 5 fields
## Feature type: wkbPolygon with 2 dimensions

Add projection information and transform the state boundaries.

ll = "+proj=longlat +ellps=GRS80 +datum=NAD83"
proj4string(STATE) = CRS(ll)
STATE = spTransform(STATE, CRS(proj4string(Torn2)))
require(rgeos)
stArea = gArea(STATE)
stArea/1e+06
## [1] 144465

Download the county boundaries (all US).

tmp = download.file("http://www.nws.noaa.gov/geodata/catalog/county/data/c_02ap13.zip", 
    "c_02ap13.zip", mode = "wb")

Read the data and select state.

unzip("c_02ap13.zip")
COUNTY = readOGR(dsn = ".", layer = "c_02ap13")
## OGR data source with driver: ESRI Shapefile 
## Source: ".", layer: "c_02ap13"
## with 3312 features and 11 fields
## Feature type: wkbPolygon with 2 dimensions
COUNTY = COUNTY[COUNTY$STATE == "IL", ]
COUNTY = spTransform(COUNTY, CRS(proj4string(Torn2)))

Estimate the raw statewide tornado risk

Subset by state boundary.

st = overlay(Torn2, STATE)
st = !is.na(st)
Torn3 = Torn2[st, ]

Check the subsetting.

plot(STATE)
plot(Torn3, add = TRUE)

plot of chunk plotsChecks

table(Torn3$STATE)
## 
##   AK   AL   AR   AZ   CA   CO   CT   DC   DE   FL   GA   HI   IA   ID   IL 
##    0    0    0    0    0    0    0    0    0    0    0    0   10    0 2092 
##   IN   KS   KY   LA   MA   MD   ME   MI   MN   MO   MS   MT   NC   ND   NE 
##    2    0    1    0    0    0    0    0    0    5    0    0    0    0    0 
##   NH   NJ   NM   NV   NY   OH   OK   OR   PA   PR   RI   SC   SD   TN   TX 
##    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 
##   UT   VA   VT   WA   WI   WV   WY 
##    0    0    0    0    1    0    0
# plot(Torn2[Torn2$STATE=='NE', ], add=TRUE, col='red')
stTorPerYrObs = dim(Torn3@data)[1]/length(1950:2011)
stTorPerYrObs
## [1] 34.05
stTorPerYrObs/stArea * 1e+10  # Annual statewide number of tornadoes per year per 10000 square km
## [1] 2.357

Compute total area of tornado paths and annual average area.

tArea = sum(Torn3$Area)
tArea/1e+06
## [1] 2722
areaPerYear = tArea/length(1950:2011)
areaPerYear/1e+06
## [1] 43.9

Estimate the annual average recurrence interval (return period) at any location and the annual probability of getting hit by a tornado.

By the principle of geometrical probability, the chance of a tornado striking a location is the ratio of the mean area covered by the tornadoes per year to the area over which the tornadoes may occur.

stArea/areaPerYear  # Annual avg recurrence interval for any location
## [1] 3291
areaPerYear/stArea * 100  # Annual probability
## [1] 0.03039

Next, create and summarize a planar point pattern object.

require(spatstat)
require(maptools)
W = as(STATE, "owin")
## Checking 1 polygon...[Checking polygon with 2832 edges...]
## done.
All.ppp = as.ppp(Torn3["Category"])
All.ppp = All.ppp[W]
unitname(All.ppp) = c("meter", "meters")
summary(All.ppp)
## Marked planar point pattern: 2111 points
## Average intensity 1.46e-08 points per square meter  
## 
## *Pattern contains duplicated points*
## 
## Coordinates are given to 2 decimal places 
## i.e. rounded to the nearest multiple of 0.01 meters  
## 
## Multitype:
##   frequency proportion intensity
## 0       977   0.463000  6.76e-09
## 1       659   0.312000  4.56e-09
## 2       353   0.167000  2.44e-09
## 3        97   0.045900  6.71e-10
## 4        23   0.010900  1.59e-10
## 5         2   0.000947  1.38e-11
## 
## Window: polygonal boundary
## single connected closed polygon with 2832 vertices
## enclosing rectangle: [379979, 734192] x [-201377, 416159] meters 
## Window area =  1.44465e+11 square meters 
## Unit of length: 1 meter

The native spatial units are meters.

Estimate population bias corrected tornado risk

Get data on the location of cities.

tmp = download.file("http://www.nws.noaa.gov/geodata/catalog/national/data/ci08au12.zip", 
    "ci08au12.zip", mode = "wb")

Read the data.

unzip("ci08au12.zip")
US = readOGR(dsn = ".", layer = "ci08au12", p4s = ll)
## OGR data source with driver: ESRI Shapefile 
## Source: ".", layer: "ci08au12"
## with 43603 features and 23 fields
## Feature type: wkbPoint with 2 dimensions
UST = spTransform(US, CRS(proj4string(Torn3)))

Subset by state boundary.

st = overlay(UST, STATE)
st = !is.na(st)
Cities = UST[st, ]
plot(unmark(All.ppp), pch = 16, cex = 0.4)
plot(Cities, pch = 15, col = "green", cex = 0.4, add = TRUE)

plot of chunk subsetCitiesByStateBoundary

C = as.ppp(Cities)
window = as.owin(All.ppp)
C = C[window]

Create a map of distance from nearest city centers. Distance from nearest city is a covariate in the point pattern model (ppm).

Zc = distmap(C)

Kernel density model

Kernel density model for local report density as a function of distance from nearest city.

rhat = rhohat(unmark(All.ppp), Zc, smoother = "kernel", method = "transform", 
    adjust = 1)
## Warning: Some tiles with zero area contain points
## Warning: Some weights are zero
plot(rhat, legend = FALSE, main = "")

plot of chunk populationBiasCorrectedStats

stDensity = max(rhat$rho)
index = which.max(rhat$rho)
stDensityHi = rhat$hi[index]
stDensityLo = rhat$lo[index]

Report density is highest 3-4 km from city center: population distribution within cities. Spotter network tends to maximize just outside the city.

Verify significance with a parametric point pattern model.

fit = ppm(unmark(All.ppp), ~Zc, covariates = list(Zc = Zc), correction = "border", 
    rbord = 5)
## Warning: Some tiles with zero area contain points
## Warning: Some weights are zero
stDensity = exp(fit$coef[1])
summary(fit)$coefs[2, 4] * 1000
## [1] -0.2021
summary(fit)$coefs[2, 1] * 1000
## [1] -0.1807
summary(fit)$coefs[2, 5] * 1000
## [1] -0.1592

Statewide expect number of tornadoes per year.

stTorPerYrExp = stDensity * stArea/length(1950:2011)
stTorPerYrExp
## (Intercept) 
##       69.36
stDensityHi * stArea/length(1950:2011)
## [1] 71.97
stDensityLo * stArea/length(1950:2011)
## [1] 61.8
stTorPerYrExp/stArea * 1e+10  # Annual statewide number of tornadoes per year per 10000 square km
## (Intercept) 
##       4.801

Local recurrence interval and annual probability. First determine the mean tornado area as the product of the expection of path length and path length accounting for the correlation between them.

Width = Torn3$Width
Length = Torn3$Length
y = log(Length)
u = log(Width)
v = y + u
meanv = mean(y[y > 0]) + mean(u[u > 0])
varv = var(y[y > 0]) + var(u[u > 0]) + 2 * cor(Width, Length) * sd(y[y > 0]) * 
    sd(u[u > 0])
meanArea = exp(meanv + 0.5 * varv)
meanArea/1e+06
## [1] 1.062
mean(Torn3$Area)/1e+06
## [1] 1.289

Estimate the population corrected annual average recurrence interval (return period) at any location and the annual probability of getting hit by a tornado.

tAreaPerYr = stTorPerYrExp * meanArea
tAreaPerYr/1e+06
## (Intercept) 
##       73.63
stArea/tAreaPerYr  # Annual avg recurrence interval for any location
## (Intercept) 
##        1962
tAreaPerYr/stArea * 100  # Annual probability of a tornado at any location
## (Intercept) 
##     0.05097

County-level statistics

Observed versus expected annual probabilities by county.

COUNTY$ctArea = gArea(COUNTY, byid = TRUE)
ct = over(Torn2, COUNTY)
Torn2$COUNTYNAME = ct$COUNTYNAME
require(plyr)
## Loading required package: plyr
perCounty = ddply(Torn2@data, .(COUNTYNAME), summarize, obs = length(COUNTYNAME), 
    totalArea = sum(Area))
xx = merge(COUNTY@data, perCounty, by = "COUNTYNAME")
xx$exp = stDensity * xx$ctArea
yy = xx[order(xx$OBJECTID_1), ]
COUNTY$ctTorPerYrObs = yy$obs/length(1950:2011)
COUNTY$ctTorPerYrExp = yy$exp/length(1950:2011)
COUNTY$apR = pnbinom(0, size = 1, mu = COUNTY$ctTorPerYrObs, lower.tail = FALSE) * 
    100
COUNTY$apC = pnbinom(0, size = 1, mu = COUNTY$ctTorPerYrExp, lower.tail = FALSE) * 
    100
COUNTY$apCperA = COUNTY$ctTorPerYrExp * meanArea/COUNTY$ctArea * 100
require(RColorBrewer)
## Loading required package: RColorBrewer
range(COUNTY$apR)
## [1]  4.615 59.477
rng = seq(0, 90, 10)
cls = brewer.pal(9, "YlOrRd")
p1 = spplot(COUNTY, "apR", col.regions = cls, at = rng, colorkey = list(space = "bottom", 
    labels = list(cex = 1)), sub = list("Annual Tornado Probability (%)\n [raw]", 
    cex = 1, font = 1))
range(COUNTY$apC)
## [1] 17.50 59.35
rng = seq(0, 90, 10)
cls = brewer.pal(9, "YlOrRd")
p2 = spplot(COUNTY, "apC", col.regions = cls, at = rng, colorkey = list(space = "bottom", 
    labels = list(cex = 1)), sub = list("Annual Tornado Probability (%)\n [adjusted]", 
    cex = 1, font = 1))
p1 = update(p1, main = textGrob("a", x = unit(0.05, "npc")))
p2 = update(p2, main = textGrob("b", x = unit(0.05, "npc")))

Conditional risk of significant tornadoes (F2 or higher).

Naming convention

F0-F1: weak
F2-F3: strong
F4-F5: violent

F2-F5: significant
F3-F5: intense (as an opposite we could use 'moderate')

Create two ppp objects and join them. This is needed for the relative risk function.

plot(split(All.ppp))

plot of chunk summaryAnalysis plot of chunk summaryAnalysis

H = unmark(All.ppp[All.ppp$marks <= 1 & All.ppp$marks >= 0])
S = unmark(All.ppp[All.ppp$marks >= 2])
All2.ppp = superimpose(H = H, S = S)
## Warning: data contain duplicated points
summary(All2.ppp)
## Marked planar point pattern: 2111 points
## Average intensity 1.46e-08 points per square meter  
## 
## *Pattern contains duplicated points*
## 
## Coordinates are given to 2 decimal places 
## i.e. rounded to the nearest multiple of 0.01 meters  
## 
## Multitype:
##   frequency proportion intensity
## H      1640      0.775  1.13e-08
## S       475      0.225  3.29e-09
## 
## Window: polygonal boundary
## single connected closed polygon with 2832 vertices
## enclosing rectangle: [379979, 734192] x [-201377, 416159] meters 
## Window area =  1.44465e+11 square meters 
## Unit of length: 1 meter
plot(S, pch = 25, cols = "red", main = "")

plot of chunk summaryAnalysis

Compute spatial densities and relative risk.

eps = 320
dAll = density(unmark(All2.ppp), dimyx = c(eps, eps))
attr(dAll, "sigma")
## [1] 44277
mean(dAll$v, na.rm = TRUE)
## [1] 1.464e-08
dH = density(H, dimyx = c(eps, eps))
dS = density(S, dimyx = c(eps, eps))
bw = bw.relrisk(All2.ppp, hmax = 1e+12)
rr = relrisk(All2.ppp, dimyx = c(eps, eps), sigma = bw)

With All2 a bivariate point pattern, moderate tornadoes are treated as non-events and the significant tornadoes are treated as events. The relrisk() function computes the spatially-varying risk of a significant tornado. That is it computes the probability p(u) that given a tornado at location u it will be significant (conditional probability of a significant tornado)

Statewide mean conditional probability of a significant tornado.

mean(rr$v, na.rm = TRUE) * 100
## [1] 23.1
sd(as.vector(rr$v), na.rm = TRUE) * 100
## [1] 4.867
range(rr$v, na.rm = TRUE) * 100
## [1] 15.02 36.47

Create a SpatialGridDataFrame (S4 class).

rr.sgdf = as(rr, "SpatialGridDataFrame")
proj4string(rr.sgdf) = CRS(proj4string(Torn2))

Conditional risk of significant tornado by county

Extract the mean relative risk per county and multiply it by the population corrected probability of a tornado.

COUNTY$v = over(COUNTY, rr.sgdf, fn = mean)
df = COUNTY$apC * COUNTY$v
names(df) = "apS"
COUNTY$apS = df$apS
COUNTY$apCperAS = COUNTY$apCperA * COUNTY$apS/COUNTY$apC * 100
range(COUNTY$apS)
## [1]  4.638 14.748
rng = seq(0, 30, 5)
cls = brewer.pal(6, "YlOrRd")
p3 = spplot(COUNTY, "apS", col.regions = cls, at = rng, colorkey = list(space = "bottom", 
    labels = list(cex = 1)), sub = list("Annual Significant Tornado\n Probability (%)", 
    cex = 1, font = 1))
range(COUNTY$apSpA)
## Warning: no non-missing arguments to min; returning Inf
## Warning: no non-missing arguments to max; returning -Inf
## [1]  Inf -Inf
rng = seq(0, 9, 1)
cls = brewer.pal(9, "YlOrRd")
p4 = spplot(COUNTY, "apCperAS", col.regions = cls, at = rng, colorkey = list(space = "bottom", 
    labels = list(cex = 1)), sub = list("Annual Significant Tornado\n Probability (%) At a Location", 
    cex = 1, font = 1))
p3 = update(p3, main = textGrob("c", x = unit(0.05, "npc")))
p4 = update(p4, main = textGrob("d", x = unit(0.05, "npc")))

Create 4 panel figure of maps.

print(p1, split = c(1, 1, 2, 2), more = TRUE)
print(p2, split = c(2, 1, 2, 2), more = TRUE)
print(p3, split = c(1, 2, 2, 2), more = TRUE)
print(p4, split = c(2, 2, 2, 2), more = FALSE)

plot of chunk fourPanelFigureMaps

county.df = as.data.frame(COUNTY)
county.df[rev(order(county.df$v))[1:10], ]
##     OBJECTID_1 STATE CWA COUNTYNAME  FIPS TIME_ZONE FE_AREA    LON   LAT
## 552        553    IL PAH    Johnson 17087         C      ss -88.88 37.46
## 584        585    IL PAH    Pulaski 17153         C      ss -89.13 37.22
## 613        614    IL PAH  Alexander 17003         C      ss -89.34 37.19
## 583        584    IL PAH       Pope 17151         C      ss -88.56 37.41
## 599        600    IL PAH      Union 17181         C      ss -89.26 37.47
## 591        592    IL PAH     Saline 17165         C      ss -88.54 37.75
## 571        572    IL PAH     Massac 17127         C      ss -88.71 37.22
## 608        609    IL PAH Williamson 17199         C      ss -88.93 37.73
## 642        643    IL PAH     Hardin 17069         C      ss -88.27 37.52
## 640        641    IL PAH   Gallatin 17059         C      ss -88.23 37.76
##     Shape_Leng Shape_Area    ctArea ctTorPerYrObs ctTorPerYrExp    apR
## 552      1.275    0.09202 8.944e+08       0.16129        0.4294 13.889
## 584      1.241    0.05348 5.216e+08       0.06452        0.2504  6.061
## 613      1.686    0.06637 6.476e+08       0.16129        0.3109 13.889
## 583      1.589    0.09876 9.606e+08       0.11290        0.4612 10.145
## 599      1.541    0.11136 1.082e+09       0.19355        0.5196 16.216
## 591      1.281    0.10247 9.919e+08       0.14516        0.4762 12.676
## 571      1.181    0.06366 6.209e+08       0.11290        0.2981 10.145
## 608      1.415    0.11763 1.139e+09       0.09677        0.5469  8.824
## 642      0.996    0.04793 4.655e+08       0.04839        0.2235  4.615
## 640      1.402    0.08697 8.418e+08       0.09677        0.4041  8.824
##       apC apCperA      v    apS apCperAS
## 552 30.04 0.05097 0.3629 10.901    1.849
## 584 20.03 0.05097 0.3612  7.233    1.841
## 613 23.72 0.05097 0.3601  8.541    1.835
## 583 31.56 0.05097 0.3537 11.165    1.803
## 599 34.19 0.05097 0.3529 12.066    1.799
## 591 32.26 0.05097 0.3503 11.299    1.785
## 571 22.97 0.05097 0.3499  8.035    1.783
## 608 35.35 0.05097 0.3436 12.146    1.751
## 642 18.27 0.05097 0.3389  6.190    1.727
## 640 28.78 0.05097 0.3373  9.708    1.719

Subset by county boundary.

CTY = subset(COUNTY, COUNTY$COUNTYNAME == "Jefferson")
ct = overlay(Torn2, CTY)
ct = !is.na(ct)
Torn3 = Torn2[ct, ]
table(Torn3$Category)
## 
## 0 1 2 3 4 5 
## 7 6 6 2 1 0