Overview

The purpose of this report is to detail the solutions for each of the three exercise problems laid out in Chapter 5 of Geographic Data Science with R. Namely, the three exercises are:

  1. Generating a map of tornado paths,
  2. Summarizing the density of tornado points by county & year, and
  3. Generating choropleth maps of tornado density.

Before getting into each exercise, first we’ll import the relevant libraries & read in the required datasets:

  • a shapefile of tornado paths, stored as LINESTRING geometries;
  • a shapefile of tornado points, stored as POINT geometries; and
  • a shapefile of counties in Oklahoma, stored as POLYGON geometries.
library(sf)
library(ggplot2)
library(cowplot)
library(dplyr)
library(units)

# Read in the shapefiles, and let's take a look at the columns.
tornado_paths <- st_read("Chapter5/ok_tornado_path.shp", quiet=TRUE)
tornado_points <- st_read("Chapter5/ok_tornado_point.shp", quiet=TRUE)
counties <- st_read("Chapter5/ok_counties.shp", quiet=TRUE)

Exercise 1

The first exercise reads as follows:

Generate a map of tornado paths where the paths from each year are displayed as a different color, similar to the map of tornado points in Figure 5.4. Create a composite figure containing the map of tornado paths and the map of tornado points using plot_grid().

To do this, we’ll leverage the plotting functionality graphics package ggplot2; namely, the workhorse here will be the geom_sf() function. After creating each of the individual plots (tornado paths & tornado points), we’ll use the cowplot package to create the composite of both plots via the plot_grid() function.

# Create the tornado paths plot
paths_plot <- ggplot(data=tornado_paths) +
  geom_sf(aes(color=yr)) +
  geom_sf(data=counties, fill=NA, color="black") +
  theme_void() +
  labs(color = "Year")

# Create the tornado points plot
points_plot <- ggplot(data=tornado_points) +
  geom_sf(aes(color=yr)) +
  geom_sf(data=counties, fill=NA, color="black") +
  theme_void() +
  labs(color = "Year")

# Composite the plots together
plot_grid(
  paths_plot, points_plot,
  labels = c(
    "Tornado Paths", "Tornado Points", label_size = 12
  ),
  ncol = 1,
  hjust = 0, 
  label_x = 0,
  align = "hv"
)

Exercise 2

The second exercise reads as follows:

Summarize the density of tornado points by both county and year and generate a faceted plot that displays maps of county-level tornado density from 2016-2021.

To do this, we’ll first need to conduct a spatial join between the tornado points & the counties dataframes. This wil give us county information for each tornado point.

# Filter to the required years, then conduct a spatial join.
tornado_points_county <- tornado_points %>%
  filter(yr >= 2016 & yr <= 2021) %>%
  st_join(counties)

Now that we’ve done that, we need to group-by/summarise the dataset, grouping by county and year & counting the occurrences of tornados.

# Group by the county & year, and count the number of tornado touchdowns.
tornado_points_total <- tornado_points_county %>%
  st_drop_geometry() %>%
  group_by(GEOID, yr) %>%
  summarise(tornado_count = n())

Now that we have the aggregate measures of tornados per county, per year we can join that table back to our original counties shapefile. This gives us the required information for plotting.

# Join the tornado touchdown totals back to the original counties shapefile
county_tornado_totals <- counties %>%
  left_join(tornado_points_total, by = "GEOID")

# Calculate the tornado density (code used from the book chapter)
county_tornado_totals <- county_tornado_totals %>%
  mutate(area = st_area(county_tornado_totals),
         tornado_density = 10^6 * 10^3 * tornado_count / area) %>%
  drop_units()

Now that we have the required dataset for plotting, we’ll continue to leverage ggplot2. To create a faceted plot by year, we’ll leverage the facet_wrap() function.

# Plot the tornado density choropleths, faceted by year.
ggplot(data = county_tornado_totals) +
  geom_sf(data=counties, fill=NA) +
  geom_sf(aes(fill = tornado_density)) +
  facet_wrap(~yr, ncol=2) +
  theme_void() +
  labs(
    title = "Tornado Density by County, by Year",
    fill = "Tornado Density"
  )

Exercise 3

The final exercise reads as follows:

Generate four choropleth maps of tornado density based on quantile breaks with numbers of classes ranging from 3 to 6. Create a composite figure containing the four maps using plot_grid() and examine how the number of classes changes the interpretation of the map.

To do this, we’ll leverage some base R functionality: the cut() function will create discrete classes based on our continuous tornado density column we’ve already created. We’ll loop through each of the desired number of classes to get our desired result.

First, let’s create a similar tornado density dataset, but without grouping by year.

# Calculate the count of tornado touchdowns per county (not per year)
tornado_points_total <- tornado_points_county %>%
  st_drop_geometry() %>%
  group_by(GEOID) %>%
  summarise(tornado_count = n())

# Join the tornado counts back to the original counties shapefile
county_tornado_totals <- counties %>%
  left_join(tornado_points_total, by = "GEOID") %>%
  replace(is.na(.), 0)

# Calculate the tornado density
county_tornado_totals <- county_tornado_totals %>%
  mutate(area = st_area(county_tornado_totals),
         tornado_density = 10^6 * 10^3 * tornado_count / area) %>%
  drop_units()

Now we can run a for-loop for each of the desired number of breaks. In each iteration of the loop, the desired quantiles are calculated for the tornado density & classes are created as a column. This column is then used to plot a choropleth. The last step is to composite these 4 plots together using the same plot_grid() function we used earlier.

# First, instantiate an empty list
plots <- list()

# For each desired number of classes...
for (num_classes in c(3, 4, 5, 6)){
  
  # ... first create the list of quantiles based on the number of breaks
  quantile_breaks <- seq(0, 1, length.out = num_classes + 1)
  
  # ... then create the classed tornado density column
  county_tornado_totals <- county_tornado_totals %>%
  mutate(
    tornado_density_classed = cut(
      tornado_density,
      breaks = quantile(tornado_density, probs = quantile_breaks),
      include.lowest = T
    )
  )
  
  # ... then create the classed choropleth
  plot <- ggplot(data = county_tornado_totals) +
    geom_sf(aes(fill = tornado_density_classed)) +
    scale_fill_brewer(palette = "YlOrRd") +
    theme_void() +
    labs(fill = paste0("Tornado Density (Classes: ", num_classes, ")"))
  
  # ... then finally save the plot to the list of plots
  plots[[num_classes]] <- plot
}

# Plot each of the classed choropleths on a composite plot.
plot_grid(plots[[3]], plots[[4]], plots[[5]], plots[[6]])

You can see from each of the different plots that the interpretation of each map can look different based on the number of defined classes. This is a consideration when creating discrete classes of a numeric variable!