Group term project for Applied Spatial Statistics: Spring 2013
Students: Migliorelli, Loury; Michaels, Laura; Strazzo, Sarah; Mulholland, Brendan; Cruz, Rizalino; Patterson, Michael; Fraza, Erik; Xing, Guang; Amrine, Cameron; Widen, Holly
Background: Thom 1963: Tornado probabilities, Mon. Wea. Rev., Elsner, Michaels, Scheitlin, Elsner 2013: The decreasing population bias in tornado records across the central Plains, Weather, Climate, and Society, Elsner, Murnane, Jagger, Widen 2013: A spatial point process model for violent tornado occurrence in the U.S. Great Plains, Mathematical Geosciences.
Demonstrate the methods on one state, then apply them on several others. Explain the limitations of the methodology as applied to certain states.
Begin by downloading the data and reading it into R. Use the rgdal package for the readOGR() function. Print the projection information.
Download tornado data.
tmp = download.file("http://www.spc.noaa.gov/gis/svrgis/zipped/tornado.zip",
"tornado.zip", mode = "wb")
Read the tornado data.
unzip("tornado.zip")
require(rgdal)
Torn = readOGR(dsn = ".", layer = "tornado")
## OGR data source with driver: ESRI Shapefile
## Source: ".", layer: "tornado"
## with 56221 features and 28 fields
## Feature type: wkbPoint with 2 dimensions
names(Torn)
## [1] "OM" "YEAR" "MONTH" "DAY" "DATE"
## [6] "TIME" "TIMEZONE" "STATE" "FIPS" "STATENUMBE"
## [11] "FSCALE" "INJURIES" "FATALITIES" "LOSS" "CROPLOSS"
## [16] "SLAT" "SLON" "ELAT" "ELON" "LENGTH"
## [21] "WIDTH" "NS" "SN" "SG" "F1"
## [26] "F2" "F3" "F4"
proj4string(Torn)
## [1] "+proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
Tornadoes through 2012. CSV file only.
con = "http://www.spc.noaa.gov/wcm/data/1950-2012_torn.csv"
TornA = read.csv(con, header = FALSE)
Subset on year and F scale and create a factor variable (Category) for the F-scale. Convert length from miles and width from yards to meters. Compute per tornado path area.
Torn2 = subset(Torn, FSCALE >= 0 & YEAR >= 1950)
Torn2$Category = factor(Torn2$FSCALE, ordered = TRUE)
Torn2$Length = Torn2$LENGTH * 1609.34
Torn2$Width = Torn2$WIDTH * 0.9144
Torn2$Area = Torn2$Width * Torn2$Length
Here I'm interested in the state of Florida (12) http://www2.census.gov/cgi-bin/shapefiles2009/state-files?state=12 http://www.census.gov/geo/www/cob/st2000.html#shp http://www.nws.noaa.gov/geodata/catalog/wsom/html/cwa.htm
# tmp =
# download.file('http://www.census.gov/geo/cob/bdy/st/st00shp/st20_d00_shp.zip',
# 'st20_d00_shp.zip', mode='wb') tmp =
# download.file('http://www.census.gov/geo/cob/bdy/st/st00shp/st19_d00_shp.zip',
# 'st19_d00_shp.zip', mode='wb')
tmp = download.file("http://www.census.gov/geo/cob/bdy/st/st00shp/st12_d00_shp.zip",
"st12_d00_shp.zip", mode = "wb")
tmp = download.file("http://www.nws.noaa.gov/geodata/catalog/wsom/data/w_02ap13.zip",
"w_02ap13.zip", mode = "wb")
Read the data.
# unzip('st20_d00_shp.zip') STATE = readOGR(dsn = '.', layer = 'st20_d00')
# unzip('st19_d00_shp.zip') STATE = readOGR(dsn = '.', layer = 'st19_d00')
unzip("st12_d00_shp.zip")
STATE = readOGR(dsn = ".", layer = "st12_d00")
## OGR data source with driver: ESRI Shapefile
## Source: ".", layer: "st12_d00"
## with 14 features and 10 fields
## Feature type: wkbPolygon with 2 dimensions
unzip("w_02ap13.zip")
CWArea = readOGR(dsn = ".", layer = "w_02ap13")
## OGR data source with driver: ESRI Shapefile
## Source: ".", layer: "w_02ap13"
## with 123 features and 5 fields
## Feature type: wkbPolygon with 2 dimensions
Add projection information and transform the state boundaries.
ll = "+proj=longlat +ellps=GRS80 +datum=NAD83"
proj4string(STATE) = CRS(ll)
STATE = spTransform(STATE, CRS(proj4string(Torn2)))
require(rgeos)
stArea = gArea(STATE)
stArea/1e+06
## [1] 153583
Download the county boundaries (all US).
tmp = download.file("http://www.nws.noaa.gov/geodata/catalog/county/data/c_02ap13.zip",
"c_02ap13.zip", mode = "wb")
Read the data and select state.
unzip("c_02ap13.zip")
COUNTY = readOGR(dsn = ".", layer = "c_02ap13")
## OGR data source with driver: ESRI Shapefile
## Source: ".", layer: "c_02ap13"
## with 3312 features and 11 fields
## Feature type: wkbPolygon with 2 dimensions
COUNTY = COUNTY[COUNTY$STATE == "FL", ]
COUNTY = spTransform(COUNTY, CRS(proj4string(Torn2)))
Subset by state boundary.
st = overlay(Torn2, STATE)
st = !is.na(st)
Torn3 = Torn2[st, ]
Check the subsetting.
plot(STATE)
plot(Torn3, add = TRUE)
table(Torn3$STATE)
##
## AK AL AR AZ CA CO CT DC DE FL GA HI IA ID IL
## 0 1 0 0 0 0 0 0 0 2751 1 0 0 0 0
## IN KS KY LA MA MD ME MI MN MO MS MT NC ND NE
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## NH NJ NM NV NY OH OK OR PA PR RI SC SD TN TX
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## UT VA VT WA WI WV WY
## 0 0 0 0 0 0 0
# plot(Torn2[Torn2$STATE=='NE', ], add=TRUE, col='red')
stTorPerYrObs = dim(Torn3@data)[1]/length(1950:2011)
stTorPerYrObs/stArea * 1e+10 # Annual statewide number of tornadoes per year per 10000 square km
## [1] 2.891
Compute total area of tornado paths and annual average area.
tArea = sum(Torn3$Area)
tArea/1e+06
## [1] 668.5
areaPerYear = tArea/length(1950:2011)
areaPerYear/1e+06
## [1] 10.78
Estimate the annual average recurrence interval (return period) at any location and the annual probability of getting hit by a tornado.
By the principle of geometrical probability, the chance of a tornado striking a location is the ratio of the mean area covered by the tornadoes per year to the area over which the tornadoes may occur.
stArea/areaPerYear # Annual avg recurrence interval for any location
## [1] 14244
areaPerYear/stArea * 100 # Annual probability
## [1] 0.007021
Next, create and summarize a planar point pattern object.
require(spatstat)
require(maptools)
W = as(STATE, "owin")
## Checking 14 polygons...1, [Checking polygon with 2918 edges...]
## 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14.
## done.
## Checking for cross-intersection between 14 polygons...1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13.
## done.
All.ppp = as.ppp(Torn3["Category"])
All.ppp = All.ppp[W]
unitname(All.ppp) = c("meter", "meters")
summary(All.ppp)
## Marked planar point pattern: 2753 points
## Average intensity 1.79e-08 points per square meter
##
## *Pattern contains duplicated points*
##
## Coordinates are given to 2 decimal places
## i.e. rounded to the nearest multiple of 0.01 meters
##
## Multitype:
## frequency proportion intensity
## 0 1640 0.595000 1.07e-08
## 1 784 0.285000 5.10e-09
## 2 296 0.108000 1.93e-09
## 3 33 0.012000 2.15e-10
## 4 1 0.000363 6.51e-12
## 5 0 0.000000 0.00e+00
##
## Window: polygonal boundary
## 14 separate polygons (no holes)
## vertices area relative.area
## polygon 1 2918 1.527e+11 9.95e-01
## polygon 2 36 4.759e+07 3.10e-04
## polygon 3 22 5.163e+07 3.36e-04
## polygon 4 26 1.412e+07 9.20e-05
## polygon 5 69 8.935e+07 5.82e-04
## polygon 6 289 1.714e+08 1.12e-03
## polygon 7 25 6.844e+06 4.46e-05
## polygon 8 15 1.089e+07 7.09e-05
## polygon 9 9 7.209e+05 4.69e-06
## polygon 10 106 3.606e+07 2.35e-04
## polygon 11 19 1.975e+07 1.29e-04
## polygon 12 179 3.577e+08 2.33e-03
## polygon 13 11 3.512e+06 2.29e-05
## polygon 14 12 2.759e+07 1.80e-04
## enclosing rectangle: [802437, 1617257] x [-1501045, -800485] meters
## Window area = 1.53583e+11 square meters
## Unit of length: 1 meter
The native spatial units are meters.
Get data on the location of cities.
tmp = download.file("http://www.nws.noaa.gov/geodata/catalog/national/data/ci08au12.zip",
"ci08au12.zip", mode = "wb")
Read the data.
unzip("ci08au12.zip")
US = readOGR(dsn = ".", layer = "ci08au12", p4s = ll)
## OGR data source with driver: ESRI Shapefile
## Source: ".", layer: "ci08au12"
## with 43603 features and 23 fields
## Feature type: wkbPoint with 2 dimensions
UST = spTransform(US, CRS(proj4string(Torn3)))
Subset by state boundary.
st = overlay(UST, STATE)
st = !is.na(st)
Cities = UST[st, ]
plot(unmark(All.ppp), pch = 16, cex = 0.4)
plot(Cities, pch = 15, col = "green", cex = 0.4, add = TRUE)
C = as.ppp(Cities)
window = as.owin(All.ppp)
C = C[window]
Create a map of distance from nearest city centers. Distance from nearest city is a covariate in the point pattern model (ppm).
Zc = distmap(C)
Kernel density model for local report density as a function of distance from nearest city.
rhat = rhohat(unmark(All.ppp), Zc, smoother = "kernel", method = "transform",
adjust = 1)
## Warning: Some tiles with zero area contain points
## Warning: Some weights are zero
## Warning: 6 out of 487 pixel values were outside the pixel image domain and
## were estimated by convolution
plot(rhat, legend = FALSE, main = "")
stDensity = max(rhat$rho)
index = which.max(rhat$rho)
stDensityHi = rhat$hi[index]
stDensityLo = rhat$lo[index]
Report density is highest 3-4 km from city center: population distribution within cities. Spotter network tends to maximize just outside the city.
Verify significance with a parametric point pattern model.
fit = ppm(unmark(All.ppp), ~Zc, covariates = list(Zc = Zc), correction = "border",
rbord = 5)
## Warning: Some tiles with zero area contain points
## Warning: Some weights are zero
## Warning: Values of the covariate 'Zc' were NA or undefined at 0.16% (10
## out of 6155) of the quadrature points. Occurred while executing:
## ppm(unmark(All.ppp), ~Zc, covariates = list(Zc = Zc), correction =
## "border",
stDensity = exp(fit$coef[1])
summary(fit)$coefs[2, 4] * 1000
## [1] -0.2621
summary(fit)$coefs[2, 1] * 1000
## [1] -0.2466
summary(fit)$coefs[2, 5] * 1000
## [1] -0.2311
Statewide expect number of tornadoes per year.
stTorPerYrExp = stDensity * stArea/length(1950:2011)
stTorPerYrExp
## (Intercept)
## 137.7
stDensityHi * stArea/length(1950:2011)
## [1] 120.1
stDensityLo * stArea/length(1950:2011)
## [1] 104.1
stTorPerYrExp/stArea * 1e+10 # Annual statewide number of tornadoes per year per 10000 square km
## (Intercept)
## 8.968
Local recurrence interval and annual probability. First determine the mean tornado area as the product of the expection of path length and path length accounting for the correlation between them.
Width = Torn3$Width
Length = Torn3$Length
y = log(Length)
u = log(Width)
v = y + u
meanv = mean(y[y > 0]) + mean(u[u > 0])
varv = var(y[y > 0]) + var(u[u > 0]) + 2 * cor(Width, Length) * sd(y[y > 0]) *
sd(u[u > 0])
meanArea = exp(meanv + 0.5 * varv)
meanArea/1e+06
## [1] 0.09312
mean(Torn3$Area)/1e+06
## [1] 0.2428
Estimate the population corrected annual average recurrence interval (return period) at any location and the annual probability of getting hit by a tornado.
tAreaPerYr = stTorPerYrExp * meanArea
tAreaPerYr/1e+06
## (Intercept)
## 12.83
stArea/tAreaPerYr # Annual avg recurrence interval for any location
## (Intercept)
## 11975
tAreaPerYr/stArea * 100 # Annual probability of a tornado at any location
## (Intercept)
## 0.008351
Observed versus expected annual probabilities by county.
COUNTY$ctArea = gArea(COUNTY, byid = TRUE)
ct = over(Torn2, COUNTY)
Torn2$COUNTYNAME = ct$COUNTYNAME
require(plyr)
## Loading required package: plyr
perCounty = ddply(Torn2@data, .(COUNTYNAME), summarize, obs = length(COUNTYNAME),
totalArea = sum(Area))
xx = merge(COUNTY@data, perCounty, by = "COUNTYNAME")
xx$exp = stDensity * xx$ctArea
yy = xx[order(xx$OBJECTID_1), ]
COUNTY$ctTorPerYrObs = yy$obs/length(1950:2011)
COUNTY$ctTorPerYrExp = yy$exp/length(1950:2011)
COUNTY$apR = pnbinom(0, size = 1, mu = COUNTY$ctTorPerYrObs, lower.tail = FALSE) *
100
COUNTY$apC = pnbinom(0, size = 1, mu = COUNTY$ctTorPerYrExp, lower.tail = FALSE) *
100
COUNTY$apCperA = COUNTY$ctTorPerYrExp * meanArea/COUNTY$ctArea * 100
require(RColorBrewer)
## Loading required package: RColorBrewer
range(COUNTY$apR)
## [1] 4.615 71.163
rng = seq(0, 90, 10)
cls = brewer.pal(9, "YlOrRd")
p1 = spplot(COUNTY, "apR", col.regions = cls, at = rng, colorkey = list(space = "bottom",
labels = list(cex = 1)), sub = list("Annual Tornado Probability (%)\n [raw]",
cex = 1, font = 1))
range(COUNTY$apC)
## [1] 0.3997 82.9957
rng = seq(0, 90, 10)
cls = brewer.pal(9, "YlOrRd")
p2 = spplot(COUNTY, "apC", col.regions = cls, at = rng, colorkey = list(space = "bottom",
labels = list(cex = 1)), sub = list("Annual Tornado Probability (%)\n [adjusted]",
cex = 1, font = 1))
p1 = update(p1, main = textGrob("a", x = unit(0.05, "npc")))
p2 = update(p2, main = textGrob("b", x = unit(0.05, "npc")))
F0-F1: weak
F2-F3: strong
F4-F5: violent
F2-F5: significant
F3-F5: intense (as an opposite we could use 'moderate')
Create two ppp objects and join them. This is needed for the relative risk function.
plot(split(All.ppp))
H = unmark(All.ppp[All.ppp$marks <= 1 & All.ppp$marks >= 0])
S = unmark(All.ppp[All.ppp$marks >= 2])
All2.ppp = superimpose(H = H, S = S)
## Warning: data contain duplicated points
summary(All2.ppp)
## Marked planar point pattern: 2753 points
## Average intensity 1.79e-08 points per square meter
##
## *Pattern contains duplicated points*
##
## Coordinates are given to 2 decimal places
## i.e. rounded to the nearest multiple of 0.01 meters
##
## Multitype:
## frequency proportion intensity
## H 2420 0.88 1.58e-08
## S 330 0.12 2.15e-09
##
## Window: polygonal boundary
## 14 separate polygons (no holes)
## vertices area relative.area
## polygon 1 2918 1.527e+11 9.95e-01
## polygon 2 36 4.759e+07 3.10e-04
## polygon 3 22 5.163e+07 3.36e-04
## polygon 4 26 1.412e+07 9.20e-05
## polygon 5 69 8.935e+07 5.82e-04
## polygon 6 289 1.714e+08 1.12e-03
## polygon 7 25 6.844e+06 4.46e-05
## polygon 8 15 1.089e+07 7.09e-05
## polygon 9 9 7.209e+05 4.69e-06
## polygon 10 106 3.606e+07 2.35e-04
## polygon 11 19 1.975e+07 1.29e-04
## polygon 12 179 3.577e+08 2.33e-03
## polygon 13 11 3.512e+06 2.29e-05
## polygon 14 12 2.759e+07 1.80e-04
## enclosing rectangle: [802437, 1617257] x [-1501045, -800485] meters
## Window area = 1.53583e+11 square meters
## Unit of length: 1 meter
plot(S, pch = 25, cols = "red", main = "")
Compute spatial densities and relative risk.
eps = 320
dAll = density(unmark(All2.ppp), dimyx = c(eps, eps))
attr(dAll, "sigma")
## [1] 87570
mean(dAll$v, na.rm = TRUE)
## [1] 1.681e-08
dH = density(H, dimyx = c(eps, eps))
dS = density(S, dimyx = c(eps, eps))
bw = bw.relrisk(All2.ppp, hmax = 1e+12)
rr = relrisk(All2.ppp, dimyx = c(eps, eps), sigma = bw)
With All2 a bivariate point pattern, moderate tornadoes are treated as non-events and the significant tornadoes are treated as events. The relrisk() function computes the spatially-varying risk of a significant tornado. That is it computes the probability p(u) that given a tornado at location u it will be significant (conditional probability of a significant tornado)
Statewide mean conditional probability of a significant tornado.
mean(rr$v, na.rm = TRUE) * 100
## [1] 12.56
sd(as.vector(rr$v), na.rm = TRUE) * 100
## [1] 3.476
range(rr$v, na.rm = TRUE) * 100
## [1] 7.007 20.177
Create a SpatialGridDataFrame (S4 class).
rr.sgdf = as(rr, "SpatialGridDataFrame")
proj4string(rr.sgdf) = CRS(proj4string(Torn2))
Extract the mean relative risk per county and multiply it by the population corrected probability of a tornado.
COUNTY$v = over(COUNTY, rr.sgdf, fn = mean)
df = COUNTY$apC * COUNTY$v
names(df) = "apS"
COUNTY$apS = df$apS
COUNTY$apCperAS = COUNTY$apCperA * COUNTY$apS/COUNTY$apC * 100
range(COUNTY$apS)
## [1] NA NA
rng = seq(0, 30, 5)
cls = brewer.pal(6, "YlOrRd")
p3 = spplot(COUNTY, "apS", col.regions = cls, at = rng, colorkey = list(space = "bottom",
labels = list(cex = 1)), sub = list("Annual Significant Tornado\n Probability (%)",
cex = 1, font = 1))
range(COUNTY$apSpA)
## Warning: no non-missing arguments to min; returning Inf
## Warning: no non-missing arguments to max; returning -Inf
## [1] Inf -Inf
rng = seq(0, 9, 1)
cls = brewer.pal(9, "YlOrRd")
p4 = spplot(COUNTY, "apCperAS", col.regions = cls, at = rng, colorkey = list(space = "bottom",
labels = list(cex = 1)), sub = list("Annual Significant Tornado\n Probability (%) At a Location",
cex = 1, font = 1))
p3 = update(p3, main = textGrob("c", x = unit(0.05, "npc")))
p4 = update(p4, main = textGrob("d", x = unit(0.05, "npc")))
Create 4 panel figure of maps.
print(p1, split = c(1, 1, 2, 2), more = TRUE)
print(p2, split = c(2, 1, 2, 2), more = TRUE)
print(p3, split = c(1, 2, 2, 2), more = TRUE)
print(p4, split = c(2, 2, 2, 2), more = FALSE)
county.df = as.data.frame(COUNTY)
county.df[rev(order(county.df$v))[1:10], ]
## OBJECTID_1 STATE CWA COUNTYNAME FIPS TIME_ZONE FE_AREA LON LAT
## 3286 3301 FL TAE Walton 12131 C pa -85.90 30.54
## 3093 3102 FL TAE Gadsden 12039 E bb -84.61 30.58
## 2482 2487 FL TAE Leon 12073 E bb -84.28 30.46
## 2523 2528 FL TAE Jackson 12063 C pa -84.92 30.78
## 3081 3090 FL TAE Jefferson 12065 E bb -83.90 30.44
## 3098 3107 FL TAE Wakulla 12129 E bb -84.40 30.17
## 3253 3265 FL TAE Liberty 12077 E bb -84.88 30.24
## 2522 2527 FL TAE Jackson 12063 C pa -85.22 30.80
## 2485 2490 FL TAE Madison 12079 E bb -83.47 30.44
## 2472 2477 FL TAE Calhoun 12013 C pa -85.20 30.41
## Shape_Leng Shape_Area ctArea ctTorPerYrObs ctTorPerYrExp apR
## 3286 1.476 0.0004162 4.475e+06 0.5968 0.004013 37.37
## 3093 1.818 0.1287079 1.383e+09 0.4355 1.240099 30.34
## 2482 2.180 0.1707325 1.838e+09 0.3065 1.648010 23.46
## 2523 1.601 0.0029718 3.183e+07 0.4677 0.028548 31.87
## 3081 2.539 0.1487864 1.602e+09 0.1774 1.436612 15.07
## 3098 5.489 0.1495334 1.616e+09 0.2097 1.449666 17.33
## 3253 2.941 0.2046864 2.210e+09 0.1290 1.982215 11.43
## 2522 3.101 0.2301034 2.464e+09 0.4677 2.209868 31.87
## 2485 2.084 0.1741421 1.875e+09 0.2581 1.681273 20.51
## 2472 1.842 0.1396401 1.504e+09 0.2258 1.348944 18.42
## apC apCperA v apS apCperAS
## 3286 0.3997 0.008351 NA NA NA
## 3093 55.3591 0.008351 0.1967 10.8916 0.1643
## 2482 62.2358 0.008351 0.1954 12.1615 0.1632
## 2523 2.7755 0.008351 0.1929 0.5354 0.1611
## 3081 58.9594 0.008351 0.1904 11.2269 0.1590
## 3098 59.1781 0.008351 0.1878 11.1112 0.1568
## 3253 66.4679 0.008351 0.1842 12.2453 0.1538
## 2522 68.8461 0.008351 0.1812 12.4736 0.1513
## 2485 62.7043 0.008351 0.1804 11.3094 0.1506
## 2472 57.4277 0.008351 0.1792 10.2887 0.1496
Subset by county boundary.
CTY = subset(COUNTY, COUNTY$COUNTYNAME == "Jefferson")
ct = overlay(Torn2, CTY)
ct = !is.na(ct)
Torn3 = Torn2[ct, ]
table(Torn3$Category)
##
## 0 1 2 3 4 5
## 6 5 0 0 0 0