Class Project: Spring 2015

An ecological study of the relationship between tornado occurrence and terrain roughness

Ecological studies use statistics to examine factors that modify risk. Data are typically aggregated by area. Our application concerns factors that modify the risk of tornadoes.

Put tornado counts onto a raster grid

download.file(url = "http://www.spc.noaa.gov/gis/svrgis/zipped/tornado.zip",
              destfile = "tornado.zip")
unzip("tornado.zip")
library("rgdal")
## Loading required package: sp
## rgdal: version: 0.9-1, (SVN revision 518)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 1.9.2, released 2012/10/08
## Path to GDAL shared files: /Users/jelsner/Library/R/3.1/library/rgdal/gdal
## Loaded PROJ.4 runtime: Rel. 4.8.0, 6 March 2012, [PJ_VERSION: 480]
## Path to PROJ.4 shared files: /Users/jelsner/Library/R/3.1/library/rgdal/proj
TornL = readOGR(dsn = ".", layer = "tornado", 
                stringsAsFactors = FALSE)
## OGR data source with driver: ESRI Shapefile 
## Source: ".", layer: "tornado"
## with 57988 features and 21 fields
## Feature type: wkbLineString with 2 dimensions
library("raster")
r = raster(xmn = -102, xmx = -95,
           ymn = 36, ymx = 42,
           resolution = .25)
sp = as(r, 'SpatialPolygons')
spT = spTransform(sp, CRS(proj4string(TornL)))

Here we create a raster by specifying a bounding box and a resolution of .25 degree. The domain is centered on Kansas but includes parts of Nebraska, Iowa, Oklahoma, and Texas. At this resolution the cells all have about the same area, so the exposure to tornadoes is roughly constant across the grid. The CRS of the native tornado data is Lambert conformal conic (lcc) with units of meters, while the raster uses geographic coordinates. Convert the raster to spatial polygons, which are used to perform the overlay of the tornado tracks, and transform the polygon grid to the CRS of the tornado tracks.

There are 24 cells north to south (rows) and 28 cells east to west (columns) for a total of 672 cells. Overlay the tornado tracks onto the spatial polygons with the over() function from the sp package. It handles various combinations of spatial objects, but the rgeos package is needed for combinations involving spatial polygons and spatial lines.
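
As a quick check, the grid dimensions and the two coordinate reference systems can be inspected directly (a minimal sketch using the objects created above):

dim(r)              # rows, columns, layers
projection(r)       # geographic (longlat)
proj4string(TornL)  # Lambert conformal conic, meters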

Remove duplicate tracks. First overlay the tracks onto the cell polygons; over() returns a vector containing either NA (the track is not inside the extent) or a cell number. Subset the tornadoes to those within the raster extent. Compute the track lengths and store them in a data frame together with the start/end latitude and longitude and the date. Remove tracks that are duplicates of rows in this data frame.

library("rgeos")
## rgeos version: 0.3-4, (SVN revision 438)
##  GEOS runtime version: 3.3.3-CAPI-1.7.4 
##  Polygon checking: TRUE
tc = over(TornL, spT)
TornL2 = subset(TornL, !is.na(tc))
PathD = SpatialLinesLengths(TornL2)
df = data.frame(PathD, 
                SLAT = as.numeric(TornL2$SLAT), 
                SLON = as.numeric(TornL2$SLON), 
                ELAT = as.numeric(TornL2$ELAT), 
                ELON = as.numeric(TornL2$ELON),
                DATE = TornL2$DATE)
dup = duplicated(df)
TornL3 = subset(TornL2, !dup)
#TornL3 = subset(TornL3, as.numeric(YR) >= 1990)

Buffer the tracks based on the width specified in the attribute table (in yards, converted to meters) to create a spatial polygons data frame of tornado paths, then overlay the path polygons on the cell polygons. Tornadoes with a recorded width of zero are assigned the minimum nonzero width before buffering.

Width = as.numeric(TornL3$WID) * .9144
Width[Width == 0] = min(Width[Width != 0])
TornP3 = gBuffer(TornL3, byid = TRUE, width = Width/2)
ct = over(spT, TornP3, returnList = TRUE)
nT = sapply(ct, function(x) length(x))
sumL = sapply(ct, function(x) sum(TornP3$Length[x]))
nt = r
values(nt) = nT

The result is a list. Each element of the list is a vector of row numbers from the attribute table corresponding to the paths occurring in that cell. There are 672 cells. The order of the list matches the order of the raster (upper left to lower right in lexicographic order). From the list we get the length of each vector with sapply() (simple apply), where the first argument is the list object and the second argument is a function. These counts are then placed onto the original raster with the values() function. Note that the GIS overlay operation uses projected coordinates, while the analysis and display use geographic coordinates.
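
To see the structure of the overlay result (a sketch; it assumes, as described above, that each list element holds the attribute-row numbers for one cell):

length(ct)                        # should equal ncell(r) = 672
str(ct[[1]])                      # paths intersecting the upper-left cell
all.equal(nT, sapply(ct, length)) # the counts placed on the raster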

Compare with counts based on the track lines only (no buffer).

ct2 = over(spT, TornL3, returnList = TRUE)
nT2 = sapply(ct2, function(x) nrow(x))
(sum(nT) - sum(nT2))/sum(nT) * 100
## [1] 4.380215

Cell counts based on the buffered tornado paths are about 4.4% higher than counts based on the track lines alone.
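
The difference can also be mapped cell by cell. This is a sketch; nt2 and dnt are illustrative names, not objects used elsewhere in the analysis.

nt2 = r
values(nt2) = nT2  # counts from the track lines only
dnt = nt - nt2     # path counts minus track counts
plot(dnt, main = "Path counts minus track counts")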

Get elevation raster from a DEM

Digital elevation model (DEM) data are available from http://www.viewfinderpanoramas.org/DEM/TIF15/. Get the elevation raster and crop it to the extent of the tornado raster. Compute the elevation roughness with the terrain() function; here roughness is the difference between the maximum and minimum elevation in the cell and the eight surrounding cells. Match the resolution and origin of the elevation and roughness rasters to those of the tornado raster by aggregating and then resampling them. Finally, compute the correlation between the number of tornadoes and surface roughness and elevation.

#download.file(url = "http://www.viewfinderpanoramas.org/DEM/TIF15/15-H.ZIP",
#              destfile = "15-H.ZIP", mode = "wb")
#unzip("15-H.ZIP")
download.file(url = "http://myweb.fsu.edu/jelsner/data/15-H.tif.zip",
              destfile = "15-H.tif.zip", mode = "wb")
unzip("15-H.tif.zip")
Elev = raster("15-H.tif")
Elev = crop(Elev, nt)
TR = terrain(Elev, opt = 'roughness')
el = resample(aggregate(Elev, fact = c(nrow(r), ncol(r)), fun = mean), r)
tr = resample(aggregate(TR, fact = c(nrow(r), ncol(r)), fun = mean), r)
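
The promised correlations can be checked quickly at the raster level (a sketch; a more careful version using the polygon data frame appears later):

cor(values(nt), values(tr), use = "complete.obs")
cor(values(nt), values(el), use = "complete.obs")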

Get a population raster

Population data come from the Gridded Population of the World (GPW), v3: http://sedac.ciesin.columbia.edu/data/set/gpw-v3-population-density/data-download. Values are persons per square kilometer. Download the data as a grid, read it as a raster, crop it to the extent of the tornado raster, and match the resolution of the tornado grid.

download.file(url = "http://myweb.fsu.edu/jelsner/data/usadens.zip",
              destfile = "usadens.zip")
unzip("usadens.zip")
Pop = raster("usadens/usads00g/w001001.adf")
Pop = crop(Pop, r)
pop = resample(aggregate(Pop, fact = c(nrow(r), ncol(r)), fun = mean), r)
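
At this point all covariate rasters should share the geometry of the tornado raster. A minimal check (a sketch using raster's compareRaster()):

compareRaster(nt, el, tr, pop, res = TRUE)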

Exploratory analysis

Scatter plots

library("ggplot2")
df = data.frame(nT = values(nt),
                el = values(el),
                tr = values(tr),
                pop = values(pop))
ggplot(df, aes(x = tr, y = nT)) +
  geom_point() +
  geom_smooth() +
  ylab("Number of Tornadoes") +
  xlab("Terrain Roughness (m)")
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.

ggplot(df, aes(x = log2(pop), y = nT)) +
  geom_point() +
  geom_smooth(method = lm) +
  ylab(expression(paste("Number of Central Plains Tornadoes (1990-2013), .25", degree, "resolution"))) +
  xlab(expression(paste("Population density (people per ", km^2, "), in 2000"))) +
  scale_x_continuous(breaks = c(1, 2, 4, 8),
                     labels = c(1, 2, 4, 8))

ggplot(df, aes(x = el, y = nT)) +
  geom_point() +
  geom_smooth(method = lm) +
  ylab("Number of Tornadoes") +
  xlab("Elevation (m)")

Histogram of the number of tornadoes

ggplot(df, aes(x = nT)) + 
  geom_histogram(binwidth = 2, color = "white") +
  ylab("Number of cells") +
  xlab("Number of tornadoes")

mean(df$nT)
## [1] 14.98214
var(df$nT)/mean(df$nT)
## [1] 3.651618

The mean number of tornadoes per cell is about 15.0 (tornadoes per .25 degree cell over the 64 years 1950-2013). The ratio of the variance to the mean is 3.65, so the cell counts are more dispersed than expected under a Poisson distribution.
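
To make the overdispersion concrete, compare the observed variance-to-mean ratio with ratios from Poisson samples of the same size and mean (a simulation sketch; the seed and number of replicates are arbitrary):

set.seed(1)
sim = replicate(1000, {
  x = rpois(length(df$nT), lambda = mean(df$nT))
  var(x)/mean(x)
})
quantile(sim, probs = c(.025, .975))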

Get the county and state boundaries and convert the tornado path polygons to the geographic coordinates of the raster.

library("mapproj")
## Loading required package: maps
library("maptools")
## Checking rgeos availability: TRUE
ext = as.vector(extent(r))
bndryC = map("county", fill = TRUE,
            xlim = ext[1:2],
            ylim = ext[3:4],
            plot = FALSE)
IDs = sapply(strsplit(bndryC$names, ":"), function(x) x[1])
bndryCP = map2SpatialPolygons(bndryC, IDs = IDs,
                              proj4string = CRS(projection(r)))
bndryS = map("state", fill = TRUE,
            xlim = ext[1:2],
            ylim = ext[3:4],
            plot = FALSE)
IDs = sapply(strsplit(bndryS$names, ":"), function(x) x[1])
bndrySP = map2SpatialPolygons(bndryS, IDs = IDs,
                              proj4string = CRS(projection(r)))
TornP3T = spTransform(TornP3, CRS(projection(r)))

Create the map

library("rasterVis")
## Loading required package: lattice
## Loading required package: latticeExtra
## Loading required package: RColorBrewer
## 
## Attaching package: 'latticeExtra'
## 
## The following object is masked from 'package:ggplot2':
## 
##     layer
## 
## Loading required package: hexbin
library("wesanderson")
range(values(nt))
## [1]  1 49
rng = seq(0, 50, 5)
cr = wes.palette(name = "Zissou", type = "continuous")
p1 = levelplot(nt, margin = TRUE, 
          sub = "Tornado Counts (1950-2013)", 
          xlab = NULL, ylab = NULL, 
          col.regions = cr, at = rng, 
          colorkey = list(space = 'bottom'),
          par.settings = list(fontsize = list(text = 15)))
p1 + 
  latticeExtra::layer(sp.polygons(bndryCP, col = gray(.85), lwd = 1)) +
  latticeExtra::layer(sp.polygons(bndrySP, col = gray(.15), lwd = 1)) +
  latticeExtra::layer(sp.polygons(TornP3T, fill = gray(.4), 
                                  col = gray(.4), alpha = .3))

Convert raster layers to polygons

Convert the tornado count raster to a spatial polygons data frame and rename its attribute nT. Add elevation, terrain roughness, and population density as additional columns, along with the cell area (computed on the projected polygons), the base-2 logarithm of population density, and a cell ID. Then compute the pairwise correlations among the covariates.

spdf = as(nt, "SpatialPolygonsDataFrame")
names(spdf) = "nT"
spdf$el = values(el)
spdf$tr = values(tr)
spdf$pop = values(pop)
spdfT = spTransform(spdf, CRS(proj4string(TornP3)))
spdf$area = gArea(spdfT, byid = TRUE)
spdf$l2pop = log2(spdf$pop)
spdf$ID = 1:ncell(nt)
cor(spdf$el, spdf$tr)
## [1] -0.07755983
cor(spdf$el, spdf$pop)
## [1] -0.5615524
cor(spdf$tr, spdf$pop)
## [1] 0.1249989
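
As a check on the earlier claim that the cells have about the same area (and hence comparable exposure), examine the spread of cell areas (a minimal sketch; areas are in square meters, so dividing by 10^6 gives km^2):

round(range(spdf$area)/10^6)            # smallest and largest cell area (km^2)
diff(range(spdf$area))/mean(spdf$area)  # relative spread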

Spatial model

Load the INLA package (the commented source() line shows one way to install it).

#source("http://www.math.ntnu.no/inla/givemeINLA.R")
library("INLA")
## Loading required package: Matrix
## Loading required package: splines

Define the spatial neighborhood of the cells and write it as an INLA graph.

library("spdep")
nb = poly2nb(spdf)
nb2INLA("g", nb)
g = inla.read.graph("g")
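
A quick look at the neighbor structure (a sketch; poly2nb() with its default queen contiguity gives interior cells up to eight neighbors):

summary(nb)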

Fit a model for the smoothed relative rate. The cell counts are modeled with a negative binomial family, the cell area in km^2 (area/10^6) as the exposure, and a Besag intrinsic CAR random effect on the cell IDs defined by the neighborhood graph.

formula0 = nT ~ f(ID, model = "besag", graph = g)
model0  =  inla(formula0, family = "nbinomial", E = area/10^6,
                data = spdf@data)
summary(model0)
## 
## Call:
## c("inla(formula = formula0, family = \"nbinomial\", data = spdf@data, ",  "    E = area/10^6)")
## 
## Time used:
##  Pre-processing    Running inla Post-processing           Total 
##          0.1481          0.8950          0.0375          1.0806 
## 
## Fixed effects:
##                mean     sd 0.025quant 0.5quant 0.975quant   mode kld
## (Intercept) -3.7458 0.0145    -3.7741  -3.7459    -3.7171 -3.746   0
## 
## Random effects:
## Name   Model
##  ID   Besags ICAR model 
## 
## Model hyperparameters:
##                                                        mean     sd
## size for the nbinomial observations (overdispersion) 17.673 2.6565
## Precision for ID                                      2.782 0.4257
##                                                      0.025quant 0.5quant
## size for the nbinomial observations (overdispersion)     12.995   17.489
## Precision for ID                                          2.047    2.748
##                                                      0.975quant   mode
## size for the nbinomial observations (overdispersion)     23.417 17.140
## Precision for ID                                          3.712  2.678
## 
## Expected number of effective parameters(std dev): 210.17(24.12)
## Number of equivalent replicates : 3.197 
## 
## Marginal Likelihood:  -2789.89
rSR = r
values(rSR) = (exp(model0$summary.random$ID$mean) - 1) * 100

Plot the smoothed rate. The random-effect means are on the log scale, so (exp(mean) - 1) * 100, computed above, expresses each cell's rate as a percentage above or below the regional average.

library("RColorBrewer")
range(values(rSR))
## [1] -64.70234 134.29750
rng = seq(-100, 150, 50)
rngL = paste(rng, '%', sep = "")
cr = rev(brewer.pal(8, "RdBu"))
cr = cr[-(1:2)]
p2 = levelplot(rSR, margin = TRUE, 
          sub = "Tornado Reports (Above/Below Regional Average)             ",
          xlab = NULL, ylab = NULL, 
          col.regions = cr, at = rng,
          colorkey = list(at = rng, labels = rngL, col = cr),
          par.settings = list(fontsize = list(text = 13)))
p2 + 
  latticeExtra::layer(sp.polygons(bndryCP, col = gray(.85), lwd = 1)) +
  latticeExtra::layer(sp.polygons(bndrySP, col = gray(.15), lwd = 1))

Add covariates to the model

formula1 = nT ~ f(ID, model = "besag", graph = g) + 
                el + l2pop + tr
model1 = inla(formula = formula1, family = "nbinomial", E = area/10^6,
             data = spdf@data)
summary(model1)
## 
## Call:
## c("inla(formula = formula1, family = \"nbinomial\", data = spdf@data, ",  "    E = area/10^6)")
## 
## Time used:
##  Pre-processing    Running inla Post-processing           Total 
##          0.1187          0.8853          0.0337          1.0377 
## 
## Fixed effects:
##                mean     sd 0.025quant 0.5quant 0.975quant    mode kld
## (Intercept) -3.4921 0.2533    -3.9956  -3.4902    -2.9993 -3.4864   0
## el          -0.0002 0.0003    -0.0008  -0.0002     0.0005 -0.0002   0
## l2pop        0.0831 0.0329     0.0188   0.0830     0.1480  0.0828   0
## tr          -0.0272 0.0034    -0.0339  -0.0272    -0.0204 -0.0272   0
## 
## Random effects:
## Name   Model
##  ID   Besags ICAR model 
## 
## Model hyperparameters:
##                                                        mean    sd
## size for the nbinomial observations (overdispersion) 16.371 2.375
## Precision for ID                                      4.752 1.056
##                                                      0.025quant 0.5quant
## size for the nbinomial observations (overdispersion)     12.147   16.223
## Precision for ID                                          3.069    4.616
##                                                      0.975quant   mode
## size for the nbinomial observations (overdispersion)     21.463 15.949
## Precision for ID                                          7.192  4.346
## 
## Expected number of effective parameters(std dev): 151.01(25.48)
## Number of equivalent replicates : 4.45 
## 
## Marginal Likelihood:  -2783.44

Elevation is not significant: its 95% credible interval spans zero. Drop it from the model and refit.
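
As a minimal check, the interval can be pulled directly from the fixed-effect summary of model1 (a sketch using INLA's summary.fixed table):

round(model1$summary.fixed["el", c("0.025quant", "0.975quant")], 4)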

formula2 = nT ~ f(ID, model = "besag", graph = g) + 
                l2pop + tr
model2 = inla(formula = formula2, family = "nbinomial", E = area/10^6,
             data = spdf@data)
summary(model2)
## 
## Call:
## c("inla(formula = formula2, family = \"nbinomial\", data = spdf@data, ",  "    E = area/10^6)")
## 
## Time used:
##  Pre-processing    Running inla Post-processing           Total 
##          0.1132          0.9698          0.0321          1.1151 
## 
## Fixed effects:
##                mean     sd 0.025quant 0.5quant 0.975quant    mode kld
## (Intercept) -3.6068 0.1015    -3.8079  -3.6063    -3.4088 -3.6053   0
## l2pop        0.0896 0.0303     0.0301   0.0895     0.1491  0.0895   0
## tr          -0.0271 0.0034    -0.0338  -0.0271    -0.0204 -0.0271   0
## 
## Random effects:
## Name   Model
##  ID   Besags ICAR model 
## 
## Model hyperparameters:
##                                                        mean    sd
## size for the nbinomial observations (overdispersion) 16.433 2.375
## Precision for ID                                      4.743 1.040
##                                                      0.025quant 0.5quant
## size for the nbinomial observations (overdispersion)     12.200   16.290
## Precision for ID                                          3.088    4.609
##                                                      0.975quant   mode
## size for the nbinomial observations (overdispersion)     21.509 16.028
## Precision for ID                                          7.143  4.341
## 
## Expected number of effective parameters(std dev): 150.76(25.17)
## Number of equivalent replicates : 4.457 
## 
## Marginal Likelihood:  -2772.09
rSR = r
values(rSR) = (exp(model2$summary.random$ID$mean) - 1) * 100

range(values(rSR))
## [1] -41.40773  69.69852
rng = seq(-60, 80, 20)
rngL = paste(rng, '%', sep = "")
cr = rev(brewer.pal(8, "RdBu"))
cr = cr[-(1)]
p3 = levelplot(rSR, margin = TRUE, 
          sub = "Tornado Reports (Above/Below Regional Average)             ",
          xlab = NULL, ylab = NULL, 
          col.regions = cr, at = rng,
          colorkey = list(at = rng, labels = rngL, col = cr),
          par.settings = list(fontsize = list(text = 13)))

p3 + 
  latticeExtra::layer(sp.polygons(bndryCP, col = gray(.85), lwd = 1)) +
  latticeExtra::layer(sp.polygons(bndrySP, col = gray(.15), lwd = 1)) +
  latticeExtra::layer({SpatialPolygonsRescale(layout.north.arrow(type = 1), 
             offset = c(-95.5, 41.2), 
             scale = .75)})

#  layer({SpatialPolygonsRescale(layout.scale.bar(),
#                                offset = c(-101.5, 41.8),
#                                scale = 1, fill = c("transparent", "black"))}) +
#  layer(sp.text(loc = c(-101.5, 41.7), "0")) +
#  layer(sp.text(loc = c(-100.1, 41.7), "100 km"))

Define a function for plotting the posterior density of a marginal term from the INLA model, with vertical reference lines at specified values or percentiles.

ggplotmargin <- function(x, type, effect, xlab, ylab = "Posterior Density",
                         int.value = c(value = 0, 5, 95),
                         color = c("red", "gray", "gray")){
  # Extract the smoothed marginal for the requested effect and start the plot.
  xx = as.data.frame(inla.smarginal(x[[paste("marginals", type, sep = ".")]][[effect]]))
  out = ggplot(xx, aes(x, y)) + geom_line(size = 1) + ylab(ylab) + xlab(xlab)
  # Interpret int.value entries: numbers are percentiles; named or character
  # entries are keywords (mean, median, mode, zero) or fixed values.
  if(length(int.value) == 0) int.value = 0
  int.value = lapply(int.value, function(x) if(is.character(x))
    type.convert(x, as.is = TRUE) else x)
  int.value = lapply(int.value, function(x) if(is.character(x))
    lapply(strsplit(x, "=")[[1]], type.convert, as.is = TRUE) else x)
  nx = names(int.value)
  if(!is.null(nx))
    for(i in which(nx != "")) int.value[[i]] = list(nx[i], int.value[[i]])
  int.value = sapply(int.value, function(x) {
    if(is.numeric(x)) xx$x[which.max(cumsum(xx$y)/sum(xx$y) >= as.numeric(x/100))]
    else switch(x[[1]],
                mean = sum(xx$y*xx$x)/sum(xx$y),
                median = xx$x[which.max(cumsum(xx$y)/sum(xx$y) >= .5)],
                mode = xx$x[which.max(xx$y)],
                value = x[[2]],
                zero = 0)})
  # Add a vertical reference line for each requested value.
  if(length(color) <= length(int.value)) color = rep(color, length = length(int.value))
  for(i in 1:length(int.value)) out = out + geom_vline(xintercept = int.value[i], color = color[i])
  out
}
results = model2
# Express the fixed-effect marginals as percentage changes: the sign of the
# roughness coefficient is flipped so it plots as a percent decrease.
results$marginals.fixed$tr[, 1] = (exp(-results$marginals.fixed$tr[, 1]) - 1) * 100
results$marginals.fixed$l2pop[, 1] = (exp(results$marginals.fixed$l2pop[, 1]) - 1) * 100

ggplotmargin(results, type = "fixed", effect = "tr", 
             xlab = "% decrease in tornado reports\n per meter increase in elevation roughness")

ggplotmargin(results, type = "fixed", effect = "l2pop", 
             xlab = "% increase in tornado reports\n per doubling of population")