Part A of Lab 4 for GEOG 588 Spring Semester 2025

Visualizing Tornadoes in Oklahoma with Spatial Vector Data

Part A of this assignment reads and codes along with Wimberly’s (2023) “Geographic Data Science in R” The purpose of this work is to practice the tools associated with displaying data spatially in R, while familiarizing the user with resources available for vector data projection.

Data Preparaton

The code used in this script utilizes the sf package which stands for “simple features” This package allows object to be displayed spatially and creates a standardization for vector data displayed in R. Oklahoma tornadoes from 2016 to 2021 was the focus of this assignment. Data of tornado points and paths were collected filtered to the years of focus and ### Background on the sf package: The sf package uses dataframes to store vector data, This creates a standardized way of reading vector data in R. This is performed by making each feature as one row in the data frame. In this formatting attributes are columns and spatial information is special column, typically called “geometry.” It is important to note that for data to be displayed spatially, telling R which of your columns it should use for projection is crucial.

# These libraries were installed and utilized in the creation of the maps below.
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
# sf is what assist turn objects into data frames that can be displayed spatially.
library(sf)
## Linking to GEOS 3.12.2, GDAL 3.9.3, PROJ 9.4.1; sf_use_s2() is TRUE
library(units)
## udunits database from C:/Users/edwar/AppData/Local/R/win-library/4.4/units/share/udunits/udunits2.xml
# RColorBrewer provides good color scales for Choropleth mapping.
library(RColorBrewer)
# Cowplot is used for combining saved .png and saved composite figures (.jpeg, etc.)
library(cowplot)

The above libraries were used in this lab and in the creation of the maps below.

Continuing the data preparation step data was read to features to easily feed processes with the required information.

# Data was read into RStudio using st_read, which allows shapefiles to be assigned to features.
# okcounty contains the geospatial information for the state of Oklahoma
okcounty <- st_read("C:/GEOG588/Week4/data/ok_counties.shp", quiet = TRUE)
# tpoint contains the information for tornado points measured in Oklahoma
tpoint <- st_read("C:/GEOG588/Week4/data/ok_tornado_point.shp", quiet = TRUE)
# tpath contains the information for tornado paths measured in Oklahoma
tpath <- st_read("C:/GEOG588/Week4/data/ok_tornado_path.shp", quiet = TRUE)

Continuing with data preparation steps using “glimpse” to view the table layout of the organized information and “ggplot” to quickly view the boundaries shapefile.

# [1] "sf" "data.frame"
glimpse(okcounty)
## Rows: 77
## Columns: 8
## $ STATEFP  <chr> "40", "40", "40", "40", "40", "40", "40", "40", "40", "40", "…
## $ COUNTYFP <chr> "077", "025", "011", "107", "105", "153", "001", "053", "059"…
## $ COUNTYNS <chr> "01101826", "01101800", "01101793", "01101841", "01101840", "…
## $ AFFGEOID <chr> "0500000US40077", "0500000US40025", "0500000US40011", "050000…
## $ GEOID    <chr> "40077", "40025", "40011", "40107", "40105", "40153", "40001"…
## $ NAME     <chr> "Latimer", "Cimarron", "Blaine", "Okfuskee", "Nowata", "Woodw…
## $ LSAD     <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "…
## $ geometry <POLYGON [°]> POLYGON ((-95.50766 35.0292..., POLYGON ((-103.0025 3…
# plot all the county's for oklahoma in ggplot2
ggplot(data = okcounty) + 
  geom_sf(fill = NA)

To filter the raw data to focus on the information desired by this project. The “filter” and “select” functions were used to trim the time frame information down to the tornado information between 2016 and 2021. The script below shows this filtered information being stored in objects called tpoint_16_21 and tpath_16_21.

# Display list of attribute names
names(tpoint)
##  [1] "om"       "yr"       "mo"       "dy"       "date"     "time"    
##  [7] "tz"       "st"       "stf"      "stn"      "mag"      "inj"     
## [13] "fat"      "loss"     "closs"    "slat"     "slon"     "elat"    
## [19] "elon"     "len"      "wid"      "fc"       "geometry"
# Creating features called tpoint_16_21 that filter the path and points of the tornadoes between 2016 and 2021
tpoint_16_21 <- tpoint %>%
  filter(yr >= 2016 & yr <= 2021) %>%
  select(om, yr, date)

tpath_16_21 <- tpath %>%
  filter(yr >= 2016 & yr <= 2021) %>%
  select(om, yr, date)

Figure 1:

Story of Map 1: Five year spread of tornado impacts in Oklahoma 2016-2021.

Data Preparation

The script below displays the creation of two ggplot visualizations, one titled “color_path” and the other “tornado_point.” These plots were used to display the previously discussed “tpoint_16_21” and “tpath_16_21.” Using ggplots to prepare the data for viewing allows the viewers to quickly breakdown the story of data in a digestible way. A five year range in data from 2016 to 2021 was chosen to provide an adequate sampling size for viewing and to briefly display the amount of storm systems this area can expect to face annually.

color_path <- ggplot() +
  # plotting a layer of the OK counties with no fill
  geom_sf(data = okcounty, fill = NA) +
  # plotting tornado paths where the years are different colors
  geom_sf(data = tpath_16_21,
          aes(color = as.factor(yr))) +
  # using scale_color_discrete allows the legend in the resulting map to be labeled "Year"
  scale_color_discrete(name = "Year") +
  #making the theme void to better focus on the overlaying layers
  theme_void()

# polygon of state overlayed with points
tornado_point <- ggplot() +
  geom_sf(data = okcounty, fill = NA) + 
  geom_sf(data = tpoint_16_21,
          aes(color = as.factor(yr))) +
  scale_color_discrete(name = "Year") +
  theme_void()

# The maps were exported to .png to create a composite
ggsave(filename = "color_path.png",
       plot = color_path,
       width = 6,
       height = 4,
       dpi = 300)

# Creating .png of simple tornado points to composite into a single graphic in the next step
ggsave(filename = "tornado_points.png",
       plot = tornado_point,
       width = 6,
       height = 4,
       dpi = 300)

# Creating a composite of both the tornado paths by year map and tornado point map
# plot_grid is used to combine both the .png files into one graphic
plot_grid(color_path, tornado_point, 
          labels = c("Tornado Paths by Year",
                     "Tornado Locations",
                     label_size = 12),
          ncol = 1, 
          hjust = 0,
          # These values allow the designer control over their placement of the labels for the graphic
          label_x = 0,
          # label_x changes the horizontal display of the label while label_y displays the vertical display of the label
          label_y = 0.5,
          # align controls the alignment of the plots "h" is for horizontal and "v" is for vertical
          align = "hv")

###Results: Figure 1 displays two generated maps, a map of tornado paths and a map of tornado points in Oklahoma from the years 2016 to 2021. This highlights the difference in how visualizing tornadoes by paths or by touch down points changes the perspective of impact. A storm system could impact one area then travel to another, understanding both touch down points and distance traveled creates a complete picture for each of the tornadoes journey. The storm systems are color coded by year to provide distinction to viewers and highlight different extreme weather trends by year.

Figure 2:

Story of Map 2: Summarizing the density of tornado points by both county and year between 2016 and 2021.

The outcome of this map is a faceted plot that displays maps of county-level tornado density from 2016 to 2021. ### Data Preparation An object called “countymap” was created to join the information from the Oklahoma shapefile and the calculate the area of the polygon. This area would be used to spatially join the tornado points data and calculate the density of tornadoes by county and later by year.

# Step 1. Was to create a layer that joined the Oklahoma boundaries and the numeric tornado point data. This was crucial for calculating density of tornadoes. 
# countypnt is spatially joining the points from years 2016 through 2021 with the spatial information held in the "okcounty" object.
countypnt <- st_join(tpoint_16_21, okcounty)
# countysum is summarizing the points grouped by GEOID, into a new field called "tcnt" which counts all the tornadoes in a GEOID
countysum <- countypnt %>%
  group_by(GEOID) %>%
  summarize(tcnt =n())
# countymap spatially joins the county layer and the countysum layer together by the spatial area they intersect.
countymap <- st_join(okcounty, countysum, join = st_intersects)
# Next came calculating the tornado density of the area by county and by year
# "countymap2" was created to spatially join the points layer "tpoint_16_21" and the prevoiusly created "countymap" layer.
countymap2 <- st_join(tpoint_16_21, countymap)

# Step 2: This step involved removing the geometry from tornado point data after the spatial join. Geometry was removed to easily calculate county sums by year.
countymap2 <- st_drop_geometry(countymap2)

# Step 3: This step was fun and involved summarizing the tornado density by county and year.
countysum2 <- countymap2 %>%
  group_by(COUNTYFP, yr) %>%
  summarize(countysum2 = n(), .groups = 'drop')

# Step 4: This step was to reintroduce the geometry component in order to calculate density
okcounty <- okcounty %>%
  mutate(area_km2 = as.numeric(st_area(geometry) / 1e6)) 

# Step 5: Required the count summary of tornadoes by county polygons
tdensity_sf <- okcounty %>%
  left_join(countysum2, by = "COUNTYFP") %>%
  #calculating density and creating the "tdensity" field
  mutate(tdensity = countysum2 / area_km2) %>%
  mutate(tdensity = replace_na(tdensity, 0))

# Step 6: Plotting the faceted map
ggplot() +
  geom_sf(data = okcounty, 
          fill = NA,
          color = "gray") +
  # "tdensity" is the field used to display density
  geom_sf(data = tdensity_sf, aes(fill = tdensity)) +
  # Facet by year using the facet_wrap function
  # fun fact the ~ symbol means the function is dependant on the following information
  facet_wrap(~yr) + 
  # scale_fill_viridis_c lets you control the name of the legend that will display in the resulting graphic
  scale_fill_viridis_c(name = "Tornado Density") +
  theme_void()

Results:

Figure 2 shows tornado density by county wrapped by year. Which displays the change in tornado density over time and highlights the trends of these storms systems in the state over a five year range. Important to note the lighter the color the more intense the density of tornadoes was for a given year. The regional tornado activity in this area is important to understand for coordinating mitigation and preparedness efforts. A consideration is the NA features that are present in the final map. These are relic numbers and were tried to filter out. Next steps for this graphic would be to filter out any NA datasets.

Figure 3:

Story of Map 3: Tornado density based on quantile breaks with number classes ranging from 3 to 6.

The outcome of this exercise is a composite figure containing four maps that each examine the density of tornadoes in Oklahoma counties by different quantile breaks. ### Data Preparation:

# Starting off with data preparation we will use the tornado density data created for Figure 2.
tdensity_sf <- okcounty %>%
  left_join(countysum2, by = "COUNTYFP") %>%
  mutate(tdensity = countysum2 / area_km2) %>%
  mutate(tdensity = replace_na(tdensity, 0))

# creating the QUANTILE BREAK 1, using "seq"

# QUANTILE BREAK 1 - using 3 number of classes
# the numclass3 object was created to hold the value used for breaking the tornado density field by quantile classes.
numclass3 <- 3
qbreak_1 <- quantile(tdensity_sf$tdensity, probs = seq(0, 1, length.out = numclass3 + 1))
qbreak_1
##           0%    33.33333%    66.66667%         100% 
## 0.0000000000 0.0004966641 0.0009921624 0.0058610691
# tdensity_sf dataset was updated to hold the new mutated "tdens_c1" field which was used to visualize the quantile break when plotted.
tdensity_sf <- tdensity_sf %>%
  mutate(tdens_c1 = cut(tdensity,
                        breaks = qbreak_1,
                        include.lowest = T))
# assigning the plot to an object, this allowed for compositing
qbreak1 <- ggplot(data = tdensity_sf) +
  # the fill was assigned to the previously created tornado density break 1 field.
  geom_sf(aes(fill = tdens_c1)) +
  # a minimal theme was chosen to assist focus on the data.
  theme_void()

# QUANTILE BREAK 2 - using 4 number of classes
# the following quantile breaks used the same reasoning as quantile break 1.
numclass4 <- 4
qbreak_2 <- quantile(tdensity_sf$tdensity, probs = seq(0, 1, length.out = numclass4 + 1))
qbreak_2
##           0%          25%          50%          75%         100% 
## 0.0000000000 0.0004334565 0.0006496769 0.0012211401 0.0058610691
tdensity_sf <- tdensity_sf %>%
  mutate(tdens_c2 = cut(tdensity,
                        breaks = qbreak_2,
                        include.lowest = TRUE))
qbreak2 <- ggplot(data = tdensity_sf) +
  geom_sf(aes(fill = tdens_c2)) +
  theme_void()

# QUANTILE BREAK 3 - using 5 number of classes
numclass5 <- 5
qbreak_3 <- quantile(tdensity_sf$tdensity, probs = seq(0, 1, length.out = numclass5 + 1))
qbreak_3
##           0%          20%          40%          60%          80%         100% 
## 0.0000000000 0.0004249029 0.0005408858 0.0008006891 0.0013738510 0.0058610691
tdensity_sf <- tdensity_sf %>%
  mutate(tdens_c3 = cut(tdensity,
                        breaks = qbreak_3,
                        include.lowest = TRUE))
qbreak3 <- ggplot(data = tdensity_sf) +
  geom_sf(aes(fill = tdens_c3)) +
  theme_void()

# QUANTILE BREAK 4 - using 6 number of classes
numclass6 <- 6
qbreak_4 <- quantile(tdensity_sf$tdensity, probs = seq(0, 1, length.out = numclass6 + 1))
qbreak_4
##           0%    16.66667%    33.33333%          50%    66.66667%    83.33333% 
## 0.0000000000 0.0004072161 0.0004966641 0.0006496769 0.0009921624 0.0015265528 
##         100% 
## 0.0058610691
tdensity_sf <- tdensity_sf %>%
  mutate(tdens_c4 = cut(tdensity,
                        breaks = qbreak_4,
                        include.lowest = TRUE))
qbreak4 <- ggplot(data = tdensity_sf) +
  geom_sf(aes(fill = tdens_c4)) +
  theme_void()

# Compositing the four maps together using "plot_grid()" which proved very useful for easily combining plots into one graphic.
plot_grid(qbreak1, qbreak2, qbreak3, qbreak4, 
          labels = c("3 Quantile Breaks",
                     "4 Quantile Breaks",
                     "5 Quantile Breaks",
                     "6 Quantile Breaks",
                     label_size = 12),
          # the following variables allow for control over label placement.
          ncol = 2, 
          hjust = 0,
          label_x = 0,
          label_y = 0.25,
          align = "hv")

Results:

Number classes ranging from 3 to 6 were chosen to provide distinctive visualizations of the map and highlight how differences in varying quantile breaks for tornado density can create in interpretation and trend analysis. The blue and pink color fill reflects the high end of the density threshold in each of the maps. using a low amount of class breaks is useful for keeping the visual simple and demonstrating high, medium, and low density of tornadoes, but using a high amount of class breaks allow for a more complete picture of tornado risk to be told. This exercise demonstrates the level of complexity quantile breaks can create for graphics created and the story your data is trying to tell.

Sources

Wimberly, “Geographic Data Science in R,” (2023). https://bookdown.org/mcwimberly/gdswr-book/vector-geospatial-data.html CRAN R resource library, “sf: Simple Features for R.” https://cran.r-project.org/web/packages/sf/index.html