Introduction
Tornadoes are considered one of the most dangerous and unpredictable weather event causing damage or worse death. Oklahoma state has had a complex history of tornadoes with the most deadly on record in 1947, the Woodward tornado with 116 fatalities and 782 injuries with severe damage to property Woodward Tornado, Oklahoma 1947, F5 category. Tornadoes can produce multiple storms extending for miles across the state in well-defined damage paths that may or may not meet up to form stronger super tornadoes. The Woodward tornado F5 category (violent): 261-318 mph, (rare) incredible damage [Woodward F5 Category] (https://www.weather.gov/oun/tornadodata-county-ok)
The data set used in the analysis is a historic data set of tornadoes in the state of Oklahoma spanning the years 1950 to 2021, provided in Esri shapefile “multiple file format (include feature geometries, attribute table, spatial indices, and coordinate reference system). The objective of this analysis is to map the patterns of the more recent tornadoes for years 2016 to 2021. The tornado data includes a number of variable columns with information tornado touchdown point location and tornado paths.
Objective 1 - Generate a map for tornado paths with each year displayed as a different color. Create a static composite map with both tornado paths and tornado points with plot_grid()
Objective 2 - Summarize the density of tornado points by both country and year and generate a faceted static plot maps of country -level tornado density from 2016-2021
Objective 3 - Generate four chloropleth maps of tornado density based on quantile breaks with numbers of classes ranging from 3-6. Create a composite figure containing the four maps using plot_grid() and examine how the number of classes changes the interpretation.
Data Preparation
library(sf)
library(ggplot2)
library(dplyr)
library(tidyr)
library(RColorBrewer)
library(cowplot)
library(units)
library(here) #* Load the required packages
Data preparation
The analysis will focus on the sf() R-package created by Pebesma, 2022 to work with vector data sets in “R”. A series of data frame objects will be created to store the vector data. The dplyr and tidyr functions will also be used and ggplot2 will be used to generate maps from data frames. The sf and tidyverse packages will be used together to work with the multiple format spatial data.
# *Use st_read to import multiple file format shapefiles (point, line, polygons) to sf objects and assign objects names*
okcounty <- st_read("ok_counties.shp", quiet = TRUE) #*Import county boundaries OK state*
tpoint <- st_read("ok_tornado_point.shp", quiet = TRUE) #*Import tornado point data*
tpath <- st_read("ok_tornado_path.shp", quiet = TRUE) #*Import tornado paths followed after touchdown*
class(okcounty)
## [1] "sf" "data.frame"
glimpse(okcounty) # *Take a look at variable types in data set; note: the sf object(s) contain a column called geometry used by functions in sf*
## Rows: 77
## Columns: 8
## $ STATEFP <chr> "40", "40", "40", "40", "40", "40", "40", "40", "40", "40", "…
## $ COUNTYFP <chr> "077", "025", "011", "107", "105", "153", "001", "053", "059"…
## $ COUNTYNS <chr> "01101826", "01101800", "01101793", "01101841", "01101840", "…
## $ AFFGEOID <chr> "0500000US40077", "0500000US40025", "0500000US40011", "050000…
## $ GEOID <chr> "40077", "40025", "40011", "40107", "40105", "40153", "40001"…
## $ NAME <chr> "Latimer", "Cimarron", "Blaine", "Okfuskee", "Nowata", "Woodw…
## $ LSAD <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "…
## $ geometry <POLYGON [°]> POLYGON ((-95.50766 35.0292..., POLYGON ((-103.0025 3…
okcounty_sp <- as(okcounty, 'Spatial')
class(okcounty_sp) # *Convert geospatial tornado data from an sf object to a sp object if increased functionality is needed*
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
okcounty_sf <- st_as_sf(okcounty_sp)
class(okcounty_sf) #* Convert back to an sf object for the purposes of this study*
## [1] "sf" "data.frame"
Now examine the column names for the tornado variable of interest and note the variable yr
names(tpoint) # *Examine the column names for tornado year (yr) variable*
## [1] "om" "yr" "mo" "dy" "date" "time"
## [7] "tz" "st" "stf" "stn" "mag" "inj"
## [13] "fat" "loss" "closs" "slat" "slon" "elat"
## [19] "elon" "len" "wid" "fc" "geometry"
Next modify the sf objects (point data and polyline data) using filter() the data to a narrower yr range (2016 to 2021)
tpoint_16_21 <- tpoint %>%
filter(yr >= 2016 & yr <= 2021) %>%
select(om, yr, date)
tpath_16_21 <- tpath %>%
filter(yr >= 2016 & yr <= 2021) %>%
select(om, yr, date) #*Filter the yr variable for the years of interest*
Analysis
Objective 1 To create a composite plot with tornado initiation points and tornado paths after touchdown colored by year use ggplot2 and plot_grid () function.
# make a plot grid using plot_grid() function showing tornado touchdown points
p1 <- ggplot() +
geom_sf(data = okcounty, fill = NA) +
geom_sf(data = tpath_16_21, linewidth = .75, aes(color = as.factor(yr),))
p2 <- ggplot() +
geom_sf(data = okcounty, fill = NA) +
geom_sf(data = tpoint_16_21,
aes(color = as.factor(yr)), size = .5)
# arrange two plots into one column
# arrange two plots into one column
plot_grid(
p1, p2,
labels = "AUTO", ncol = 1
)
# add a single title for the composite plot
title <- ggdraw() +
draw_label(
"Oklahoma Tornadoes between 2016 and 2021",
fontface = 'bold',
x = 0,
hjust = 0)
Figure 1. Composite plot of Oklahoma tornado touchdowns and paths between 2016 to 2021 colored by year. This static visualization is difficult to interpret. A more effective figure for conveying information may be to combine the two data sets into a single image. Plot A of tornado paths after touchdown shows that there was strong activity in 2019 (teal) in the northeast part of the state. This is verified by a strong clustering of tornado touchdowns (teal) in Plot B that also shows a strong east-northeast tornado trend across the middle of the state.
Figure 2. Touchdown points of tornadoes in Oklahoma from 2016 and 2021 with years represented by individual images in a facet plot. Tornado touchdown point patterns vary over the years. There was increased activity in 2019 shown as a clustering of tornado touchdowns in central and northeastern part of the state, In 2021 the points form a well-defined east-northeast trend.
Objective 2 - Summarize the density of tornado points by both county and year and generate a faceted plot maps of county -level tornado density from 2016-2021.
Task 2 - Data Preparation
Use st_join() to carry out a spatial join on tornado paths and tornado points data sets
countypnt <- st_join(tpoint_16_21, okcounty) #* Use st_join to carry out a spatial join on tornado paths and tornado point data sets*
countypnt <- st_drop_geometry(countypnt)
countysum <- countypnt %>%
group_by(yr, GEOID) %>%
summarize(tcnt = n())
The joined data frame still contains one record for each tornado. To compute the total number of tornadoes per county, county point must be grouped by the GEOID county code. Then, the numbers of tornadoes in each county can be calculated using summarize() with the n() function to count the records in each group. Before grouping and summarizing, county point must be converted from an sf object to a normal data frame using st_drop_geometry().
Calculating Tornado Touchdown Density at the County Level
In order to calculate tornado density at the county level, the county areas need to be calculated (before the join) then county data is joined to the summary of tornado touchdowns per county. If the area was calculated after the join there would be a discrepancy in number of records (okcounty vs countysum). Na values need to be dropped to avoid applying a zero for a year which doesn’t make sense to visualize.
#* Join tornado count per county to calculate density at the county-level*
countymap <- okcounty %>%
mutate(area = st_area(okcounty)) %>% #*Calculate area before join*
left_join(countysum, by = "GEOID") %>%
na.omit() %>% #* To omit counties with no tornadoes*
mutate(tdens = 10^6 * 10^3 * tcnt / area) %>% #*Mutate- density at county-level*
drop_units() #* removed for simplicity*
Now generate a faceted plot using ggplot2 of the county-level tornado density by year
ggplot() +
geom_sf(data = okcounty) +
geom_sf(data = countymap, aes(fill = tdens)) +
scale_fill_distiller(name = expression("Tornadoes/1000 km"^2),
palette = "YlOrRd",
direction = 1) +
facet_wrap(~ yr) +
theme_void() +
ggtitle("Tornado Density at the County Level from 2016 to 2021") +
theme(legend.position = "bottom",
plot.title = element_text(hjust = 0.5, vjust = 5))
Figure 2. Tornado density at the county level form 2016 to 2021. There was the highest density of tornado touchdowns per county in 2019 centered in central and northeastern Oklahoma. In 2017 the tornado touchdowns per county were more irregular in both the eastern, southern, and western parts of the state. In 2021 tornado density per county were highest along the central part of the state.
Objective 3 - Generate four chloropleth maps of tornado density based on quantile breaks with numbers of classes ranging from 3-6. Create a composite figure containing the four maps using plot_grid() and examine how the number of classes changes the interpretation.
countypnt <- st_join(tpoint_16_21, okcounty) #* Use st_join to carry out a spatial join on tornado paths and tornado point data sets*
countypnt <- st_drop_geometry(countypnt)
countysum <- countypnt %>%
group_by(yr, GEOID) %>%
summarize(tcnt = n())
## `summarise()` has grouped output by 'yr'. You can override using the `.groups`
## argument.
#* Join tornado count per county to calculate density at the county-level*
countymap <- okcounty %>%
mutate(area = st_area(okcounty)) %>% #*Calculate area before join*
left_join(countysum, by = "GEOID") %>%
na.omit() %>% #* To omit counties with no tornadoes*
mutate(tdens = 10^6 * 10^3 * tcnt / area) %>% #*Mutate- density at county-level*
drop_units() #* removed for simplicity*
Create quantile three classes ranging from 3 to 4, 4 to 5, 5 to 6
numclas <- c(3, 4, 5, 6) #* a reasonable number classes first*
countymap_quant <- countymap %>%
mutate(tdens_c3 = cut(tdens,
breaks = quantile(tdens,
probs = seq(0, 1, length.out = 4)),
include.lowest = T)) %>%
mutate(tdens_c4 = cut(tdens,
breaks = quantile(tdens,
probs = seq(0, 1, length.out = 5)),
include.lowest = T)) %>%
mutate(tdens_c5 = cut(tdens,
breaks = quantile(tdens,
probs = seq(0, 1, length.out = 6)),
include.lowest = T)) %>%
mutate(tdens_c6 = cut(tdens, breaks = quantile(tdens,
probs = seq(0, 1, length.out = 7)),
include.lowest = T))
Plot the choropleth visualizations of the three different classes and save them as objects and use plot_grid to visualize them together to enhance the readers ability to interpret.
choropleth3 <- ggplot (data = countymap_quant) +
geom_sf(aes(fill = tdens_c3)) +
scale_fill_brewer(name = expression("Tornadoes/1000 km"^2),
palette = "YlOrRd") +
theme_void() +
theme(legend.position = "left")
choropleth4 <- ggplot (data = countymap_quant) +
geom_sf(aes(fill = tdens_c4)) +
scale_fill_brewer(name = expression("Tornadoes/1000 km"^2),
palette = "YlOrRd") +
theme_void() +
theme(legend.position = "left")
choropleth5 <- ggplot (data = countymap_quant) +
geom_sf(aes(fill = tdens_c5)) +
scale_fill_brewer(name = expression("Tornadoes/1000 km"^2),
palette = "YlOrRd") +
theme_void() +
theme(legend.position = "left")
choropleth6 <- ggplot (data = countymap_quant) +
geom_sf(aes(fill = tdens_c6)) +
scale_fill_brewer(name = expression("Tornadoes/1000 km"^2),
palette = "YlOrRd") +
theme_void() +
theme(legend.position = "left")
#* Create a composite plot with the four quantile plots by different classes
plot_grid(choropleth3, choropleth4, choropleth5, choropleth6,
labels = c("A) Three Quantiles Years 2016-2021",
"B) Four Quantiles Years 2016-2021",
"C) Five Quantiles Years 2016-2021",
"D) Six Quantiles Years 2016-2021",
label_size = .5),
ncol = 2,
hjust = 0,
label_x = .01,
label_y = 1.0,
align = "hv")
Figure 3. These four quantile maps show the difference between using three, four, five or six quantile breaks to visualize the tornado touchdowns in each county between 2016 to 2021. The color palette was selected to enhance the variations county to county. The higher the class the more varied the plot is and makes it more difficult to interpret. Choosing the righrt number of classes for the quantile classification visualization is critical.
References
Data source: National Oceanographic and Atmospheric Administration (NOAA) National Weather Service Storm Prediction Center