Part A - Practice

Data Cleaning

okcounty_sp <- as(okcounty, 'Spatial')

tpoint_16_21 <- tpoint %>%
  filter(yr >= 2016 & yr <= 2021) %>%
  select(om, yr, date)

tpath_16_21 <- tpath %>%
  filter(yr >= 2016 & yr <= 2021) %>%
  select(om, yr, date)

Part 1

points <- ggplot() +
  geom_sf(data = tpoint_16_21, 
          aes(color = as.factor(yr))) +
  geom_sf(data = okcounty, fill = NA) +
  scale_color_discrete(name = "Year") +
  coord_sf(datum = NA) +
  theme_void()

paths <- ggplot() +
  geom_sf(data = tpath_16_21, 
          aes(color = as.factor(yr))) +
  geom_sf(data = okcounty, fill = NA) +
  scale_color_discrete(name = "Year") +
  coord_sf(datum = NA) +
  theme_void()

plot_grid(points, paths, labels = c("A) Tornado Points", "B) Tornado Paths"), 
          ncol = 2, hjust = 0, align = "hv")

Plot 1: The locations of tornadoes in Oklahoma showing the location of tornadoes and their path by year(2016-2021)

This map shows the locations of tornadoes in Oklahoma as well as showing the path of the tornado. In order to achieve this I first had to filter the tornado point and tornado path data to only use the data from 2016 to 2021. I then used the ggplot2 library to create two plots with one plot showing the starting location of the tornado and the other showing the overall path of the tornado through Oklahoma.

Part 2

countypnt <- st_join(tpoint_16_21, okcounty)

countypnt <- st_drop_geometry(countypnt)
countysum <- countypnt %>%
  group_by(GEOID, yr) %>%
  summarize(tcnt = n())

years <- 2016:2021
countysum_full <- okcounty %>%
  st_drop_geometry() %>%
  mutate(GEOID = as.character(GEOID)) %>%
  crossing(yr = years) %>%         
  left_join(countysum, by = c("GEOID", "yr")) %>%
  mutate(tcnt = replace_na(tcnt, 0))

countymap <- okcounty %>%
  mutate(GEOID = as.character(GEOID)) %>%
  left_join(countysum_full, by = "GEOID") %>%
  mutate(area = st_area(geometry),
         tdens = 10^6 * 10^3 * tcnt / area) %>%
  drop_units()


ggplot(data = countymap) +
  geom_sf(aes(fill = tdens)) +
  facet_wrap(vars(yr), ncol = 2) +
  labs(title = "Tornadoes by County(2016-2021)", fill = "Tornado Density") + 
  theme_void()

Plot 2: Tornado density in Oklahoma by county from 2016 to 2021 with a notable increase in density in 2019.

For this map, I performed a spatial join between the tornado location data and data of the Oklahoma county boundaries to explore the counties that have the highest prevalence of tornadoes. To find the count of tornadoes by county I joined the tornado location and county data and grouped by year and by county to find the number of tornadoes in a county for each year, 2016 to 2021. I then joined this data frame with the original county boundaries data set and calculated the tornado density by using the overall number of tornadoes that year divided by the area of the county. One thing I found interesting in thsese maps is the increase in tornado density in 2019 aroudn the middle of Oklahoma.

Part 3

countypnt <- st_join(tpoint_16_21, okcounty)

countypnt <- st_drop_geometry(countypnt)
countysum <- countypnt %>%
  group_by(GEOID) %>%
  summarize(tcnt = n())

countymap <- okcounty %>%
  left_join(countysum, by = "GEOID") %>%
  replace(is.na(.), 0) %>%
  mutate(area = st_area(okcounty),
         tdens = 10^6 * 10^3 * tcnt / area) %>%
  drop_units()

breaks3 <- ggplot(data = countymap) +
  geom_sf(aes(fill = tdens)) +
  scale_fill_distiller(name = expression("Tornadoes/1000 km"^2), 
                       palette = "YlOrRd", 
                       breaks = pretty_breaks(n = 3),
                       direction = 1) + 
  theme_void()

breaks4 <- ggplot(data = countymap) +
  geom_sf(aes(fill = tdens)) +
  scale_fill_distiller(name = expression("Tornadoes/1000 km"^2), 
                       palette = "YlOrRd", 
                       breaks = pretty_breaks(n = 4),
                       direction = 1) + 
  theme_void()

breaks5 <- ggplot(data = countymap) +
  geom_sf(aes(fill = tdens)) +
  scale_fill_distiller(name = expression("Tornadoes/1000 km"^2), 
                       palette = "YlOrRd", 
                       breaks = pretty_breaks(n = 5),
                       direction = 1) + 
  theme_void()

breaks6 <- ggplot(data = countymap) +
  geom_sf(aes(fill = tdens)) +
  scale_fill_distiller(name = expression("Tornadoes/1000 km"^2), 
                       palette = "YlOrRd", 
                       breaks = pretty_breaks(n = 6),
                       direction = 1) + 
  theme_void()

plot_grid(breaks3, breaks4, breaks5, breaks6,
          labels = c("A) 3 Class Breaks", "B) 4 Class Breaks", "C) 5 Class Breaks", "D) 6 Class Breaks"))

Plot3: Tornado density in Oklahoma by county from 2016 to 2021 using four different ammounts of class breaks.

This map explores the difference in the look of a map when using a different number of classes. To first create this map, I had to perform a join similar to the one in part 2, but without the need to group and join the data based on year as well as county. I then calculated the tornado density the same as in part 2 by dividing the total number of tornadoes in each county by the total area. I created 4 different maps, but with each specifying a different number of class breaks for each map.

Lesson4

Jacob Hodges

2025-11-07

Introduction

Part A - Practice

Data Cleaning

Part 1

Part 2

Part 3