Lab 4: Part A

Matthew Sommer

2024-03-21

Tornado Occurences In Oklahoma 2016 - 2021

Introduction

Tornadoes are a common occurrence in the central plains of the United States. Oklahoma is one state that often gets struck by tornadoes. This analysis will examine tornadoes between 2016 and 2021 in Oklahoma and demonstrate that some simple exploratory data analysis can show that different areas of the state may be more prone to tornadoes than others.

Data

The data for this analysis is three shapefiles that represent tornado touchdown points, tornado paths as lines, and county boundaries in Oklahoma as polygons. Using these three shapefiles, the data will be subset to the appropriate time period (2016-2021), analyzed for density of touchdowns per county, and also separated by county density per year.

The first step is to import the necessary packages for operating on spatial data and reading in the shapefiles with the sf package.

# Import packages
library(sf)
library(tidyverse)
library(units)
library(RColorBrewer)
library(cowplot)
library(scales)
library(here)


# read in shapefiles using sf package to create spatial tibbles
okcounty <- st_read(here("data/Chapter5", "ok_counties.shp"))

tpoint <- st_read(here('data/Chapter5', 'ok_tornado_point.shp'))

tpath <- st_read(here('data/Chapter5', 'ok_tornado_path.shp'))

Analysis

The next step is to take a subset of the data for paths and points from 2016 - 2021.

# dplyr functions to narrow date range of tornado data
# sub setting tornado points to 2016 - 2021
tpoint_16_21 <- tpoint %>% 
  filter(yr >= 2016 & yr <= 2021) %>% 
  select(om, yr, date)

# sub setting tornado paths to 2016 - 2021
tpath_16_21 <- tpath %>% 
  filter(yr >= 2016 & yr <= 2021) %>% 
  select(om, yr, date)

Plotting this data over the Oklahoma county boundaries with year represented in color (Figure 1), gives an idea of the variability in time and space that is present in this data.

#### part A question A
# plotting the tornado points over county data 
tpoint_plot <- ggplot() +
  geom_sf(data = okcounty, 
          fill = NA) +
  geom_sf(data = tpoint_16_21,
          aes(color = as.factor(yr)),
          size = 1.25) +
  scale_color_brewer(palette = "Dark2",
                     name = "Year") +
  theme_void() +
  theme(legend.margin = margin(0.25, 0.5, 0.25, 0.25, "cm")) +
  labs(title = "Tornado Touchdown Points by Year")


# plotting tornado paths over county data
tpath_plot <- ggplot() +
  geom_sf(data = okcounty, fill = NA) +
  geom_sf(data = tpath_16_21,
          aes(color = as.factor(yr)),
          linewidth = 1) +
  scale_color_brewer(palette = 'Dark2',
                     name = "Year") +
  theme_void() +
  theme(legend.margin = margin(0.25, 0.5, 0.25, 0.25, "cm")) +
  labs(title = "Tornado Paths by Year")


# using plot grid to show the two plots together
plot_grid(tpoint_plot, tpath_plot,
          ncol = 1)

Figure 1: Raw tornado data shows variability in space and time as well as path length.

However, this representation isn’t conducive to making any inferences. For that, it makes more sense to break the years out into individual maps. This map demonstrates the variability between years, with 2019 standing out as a year with many tornadoes (Figure 2). It also shows that some counties tend to be more affected by tornadoes than others. Counties in the northwest and southeast portions of the state seem to have fewer tornado occurrences in this time period than the band of counties from the southwest to northeast corners of the state.

ggplot() +
  geom_sf(data = okcounty, 
          fill = NA, 
          color = "gray") +
  geom_sf(data = tpoint_16_21, size = 0.75) +
  facet_wrap(vars(yr), ncol = 2) +
  coord_sf(datum = NA) +
  labs(title = "Tornado Touchdowns by Year") +
  theme_void()

Figure 2: Points separated by year and plotted over a map of Oklahoma counties show that there is variability to the number and location of tornadoes each year.

Using the data from Figure 2, the density of tornado touchdowns can be calculated for each county in each year the analysis covers. Figure 3 shows this plot. Though the density scale is different for each year based on the relative densities, we can conclude from this map that certain counties are near the high end of the range in each year including several counties in the northeastern part of the state. Likewise, several counties in the north-central part of Oklahoma, near the Kansas border, are near the low end of the scale for each of these years. Analysis on a longer time scale could determine if this is an actual pattern or if it is coincidental to this five year period.

# create a function to filter point data by year

filter_year <- function(year){
  tpoint %>% 
  filter(yr >= year) %>%
  select(om, yr, date)
}

# call the function for each year in the time series
point2016 <- filter_year(2016)
point2017 <- filter_year(2017)
point2018 <- filter_year(2018)
point2019 <- filter_year(2019)
point2020 <- filter_year(2020)
point2021 <- filter_year(2021)


# create a function to calculate density by county and 
#    then map the results.

map_density <- function(yeardf){
  
  # use spatial join to get points in counties
  countypnt <- st_join(yeardf, okcounty)
  
  # drop the geometry so you can use polygon area instead of points
  countypnt <- st_drop_geometry(countypnt)
  
  # aggregate sum of points by county
  countysum <- countypnt %>%
    group_by(GEOID) %>%
    summarize(tcnt = n())
  
  # create a data frame that holds a density of touchdowns
  #    per county (tdens) by using a regular join, calculating
  #    county area with st_area, then computing density in tornadoes
  #    per km squared. This gives density as a continuous value.
  countymap <- okcounty %>%
    left_join(countysum, by = "GEOID") %>%
    replace(is.na(.), 0) %>%
    mutate(area = st_area(okcounty),
           tdens = 10^6 * 10^3 * tcnt / area) %>%
    drop_units()
  
  # plot the map
  ggplot(data = countymap) +
    geom_sf(aes(fill = tdens)) +
    scale_color_brewer(palette = "PuBu",
                       name = "Tornado Density") +
    theme_void()

}

# call the density mapping function for every year in the
#    time series
map_2016 <- map_density(point2016)
map_2017 <- map_density(point2017)
map_2018 <- map_density(point2018)
map_2019 <- map_density(point2019)
map_2020 <- map_density(point2020)
map_2021 <- map_density(point2021)

# use plot_grid() from cowplot to map the six yearly density
#    maps in one plot with two columns.
plot_grid(map_2016, map_2017, map_2018, map_2019, map_2020, map_2021,
          ncol = 2,
          labels = c("2016", "2017", "2018", "2019", "2020", "2021"),
          label_size = 12,
          label_x = 0, label_y = 0.8)

Figure 3: Densities of tornado touchdowns by county for each year between 2016 and 2021. The variable scale makes this difficult to interpret visually, but some counties are at the high end of the range nearly every year.

Other considerations

One note about this analysis is that the conclusions about the data may change based on the visualizations or other decisions made in the analysis. To demonstrate, Figure 3 shows a calculation of the density of tornadoes per county for the total number of tornadoes between 2016 and 2021. Each of the four maps has a different number of classes representing density, from three to six. With more classes, there is more granular variation at the top end of the scale which may be beneficial in drawing conclusions about which places are at greater risk. However, at the lower end of the scale, more classes may show more variation than there actually is in reality. For example, with six classes there is a visual distinction between zero and one tornadoes per thousand square kilometers, but there is no visual distinction between four and ten tornadoes per thousand square kilometers.

### Operations for mapping density in a choropleth map

# use spatial join to get points in counties
countypnt <- st_join(tpoint_16_21, okcounty)

# drop the geometry so you can use polygon area instead of points
countypnt <- st_drop_geometry(countypnt)

# aggregate sum of points by county
countysum <- countypnt %>%
  group_by(GEOID) %>%
  summarize(tcnt = n())

# create a data frame that holds a density of touchdowns
#    per county (tdens) by using a regular join, calculating
#    county area with st_area, then computing density in tornadoes
#    per km squared. This gives density as a continuous value.
countymap <- okcounty %>%
  left_join(countysum, by = "GEOID") %>%
  replace(is.na(.), 0) %>%
  mutate(area = st_area(okcounty),
         tdens = 10^6 * 10^3 * tcnt / area) %>%
  drop_units()


### Next the density values are put into discrete classes and maps
##    with classes between 3 and 6 will be compared.

# creating a function to make different maps based on number of 
#    classification breaks
break_class <- function(numclas){
  
  qbrks <- seq(0, 1, length.out = numclas + 1)
  
  countymap <- countymap %>%
    mutate(tdens_c1 = cut(tdens,
                          breaks = quantile(tdens, probs = qbrks),
                          include.lowest = T))
  
ggplot(data = countymap) +
    geom_sf(aes(fill = tdens_c1)) +
    scale_fill_brewer(name = expression("Tornadoes/1000 km"^2),   
                      palette = "YlOrRd") +
    theme_void()
}

# call function once per each number of classification and assign
#    to a map object to be passed to plot_grid() for display
breaks_3 <- break_class(3)
breaks_4 <- break_class(4)
breaks_5 <- break_class(5)
breaks_6 <- break_class(6)
 

# using plot_grid() from cowplot to make one figure of the four
#    different quantile breaks maps
plot_grid(breaks_3, breaks_4, breaks_5, breaks_6,
          ncol = 2,
          labels = c("Map A", "Map B", "Map C", "Map D"))

Figure 4: More classes allow for a more granular comparison for areas with more tornadoes. However, with more classes, more variation is implied at the low end of the scale than truly exists. Map A has 3 classes, Map B has 4, Map C has 5 and Map D has 6.

Conclusion

Tornado occurrences in Oklahoma are variable in time and space. This exploratory data analysis has shown that certain counties were more affected than others by tornadoes between 2016 and 2021. There appears to be a pattern of more tornadoes in a band from the southwest to northeast corner of the state. This exploratory data analysis has raised questions that could be answered with further statistical analysis. Point pattern analysis or analysis that takes the path, intensity, and amount of damage into account could help answer the question of which counties are most susceptible to tornadoes.