This report looks at the distribution of tornadoes in the state of Oklahoma, using data from the National Oceanographic and Atmospheric Administration (NOAA) and the National Weather Service Storm Prediction Center. The dataset consists of a counties, a tornado point, and a tornado path shapefile. The counties shapefile is a file consisting of polygons that represent each county in Oklahoma. The tornado point shapefile shows the initial touchdown locations of tornadoes and the tornado path shapefile shows the paths of the tornado. The goal of this report is to explore and present the spatial distribution of tornadoes in Oklahoma.

This report is Part A of Lab #4 for GEOG588.

Load packages

These are all the packages that will be used within the report.

library(sf)
library(rgdal)
library(ggplot2)
library(dplyr)
library(tidyr)
library(scales)
library(RColorBrewer)
library(units)
library(cowplot)
library(ggsn)

Data Wrangling

Import geospatial data

Using st_read, we’ll read in the data that show the historical distrubtion of tornadoes in Oklahoma. We’ll assign them to objects for ease of analysis.

okcounty <- st_read("ok_counties.shp", quiet = TRUE) # Counties in Oklahoma.
tpoint <- st_read("ok_tornado_point.shp", quiet = TRUE) # Initial touchdown points of tornadoes.
tpath <- st_read("ok_tornado_path.shp", quiet = TRUE)  # The paths of the tornadoes. 
class(okcounty) # Verify the class of the county map.
## [1] "sf"         "data.frame"

Here is a glimpse of the data

glimpse(okcounty)
## Rows: 77
## Columns: 8
## $ STATEFP  <chr> "40", "40", "40", "40", "40", "40", "40", "40", "40", "40", "…
## $ COUNTYFP <chr> "077", "025", "011", "107", "105", "153", "001", "053", "059"…
## $ COUNTYNS <chr> "01101826", "01101800", "01101793", "01101841", "01101840", "…
## $ AFFGEOID <chr> "0500000US40077", "0500000US40025", "0500000US40011", "050000…
## $ GEOID    <chr> "40077", "40025", "40011", "40107", "40105", "40153", "40001"…
## $ NAME     <chr> "Latimer", "Cimarron", "Blaine", "Okfuskee", "Nowata", "Woodw…
## $ LSAD     <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "…
## $ geometry <POLYGON [°]> POLYGON ((-95.50766 35.0292..., POLYGON ((-103.0025 3…

Next, the data was filtered to cover tornadoes for the years 2016-2021.

tpoint_16_21 <- tpoint %>%   # The point data.
  filter(yr >= 2016 & yr <=2021) %>%   # Filter for the years.
  select(om, yr, date) # Select three variables.
tpath_16_21 <- tpath %>%   # The path (polyline) data.
  filter(yr >=2016 & yr <= 2021) %>%   # Filter for the same years.
  select(om, yr, date)   # Select three variables.

Here’s a sample of the tornado point data

head(tpoint_16_21)
## Simple feature collection with 6 features and 3 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -101.988 ymin: 35.6916 xmax: -94.5042 ymax: 36.8151
## Geodetic CRS:  NAD83
##       om   yr       date                 geometry
## 1 613662 2016 2016-03-23 POINT (-94.5042 35.6916)
## 2 613675 2016 2016-03-30 POINT (-96.0151 36.2151)
## 3 613676 2016 2016-03-30 POINT (-95.5523 36.7291)
## 4 613677 2016 2016-03-30  POINT (-95.6384 36.284)
## 5 613678 2016 2016-03-30 POINT (-95.3178 36.8151)
## 6 613727 2016 2016-04-15   POINT (-101.988 36.74)

…and a sample of the tornado path data

head(tpath_16_21)
## Simple feature collection with 6 features and 3 fields
## Geometry type: LINESTRING
## Dimension:     XY
## Bounding box:  xmin: -101.988 ymin: 35.6916 xmax: -94.4104 ymax: 36.8317
## Geodetic CRS:  NAD83
##       om   yr       date                       geometry
## 1 613662 2016 2016-03-23 LINESTRING (-94.5042 35.691...
## 2 613675 2016 2016-03-30 LINESTRING (-96.0151 36.215...
## 3 613676 2016 2016-03-30 LINESTRING (-95.5523 36.729...
## 4 613677 2016 2016-03-30 LINESTRING (-95.6384 36.284...
## 5 613678 2016 2016-03-30 LINESTRING (-95.3178 36.815...
## 6 613727 2016 2016-04-15 LINESTRING (-101.988 36.74,...

Analysis

Exercise 1

In this section, a map of tornado paths is created where the paths from each year are displayed in different colors. Another map is created in a similar fashion with all of the tornado points data.

In the final map, a composite map is created combining the color-coded tornado paths and the tornado points maps using plot_grid().

First, a map of the tornado paths, differentiated by color, is created.

tpathplot_2016_2021 <- ggplot() +
  geom_sf(data = tpath_16_21,    # We'll map the paths by year.
          aes(color = as.factor(yr))) +
  geom_sf(data = okcounty, fill = NA) +  # And add a layer for the counties.
  scale_color_discrete(name = "Year") + # Color by "Year."
  coord_sf(datum = NA) +  # Ensure the layers use the same CRS. 
  ggtitle("Tornado Paths in Oklahoma by Year") + # Add a title
  theme_minimal() +   # Add a theme
  theme(plot.title=element_text(color="darkblue", hjust= 0.2, face='bold', size=14)) # Format the title

This step adds a north arrow and a scale bar on the bottom left, using the “ggsn” package.

tpathplot_2016_2021  +
  blank() +
  north(okcounty, symbol=10, location ="bottomleft", scale = 0.20) + # north arrow
  scalebar(okcounty,location="bottomleft", dist=50, st.dist=0.05, st.size=3, dist_unit="mi", # scalebar
           transform=TRUE, model="WGS84")

Next, a map of tornado points is created in a similar fashion.

tpointplot_16_21 <- ggplot()+
  geom_sf(data=okcounty, fill=NA) + 
  geom_sf(data=tpoint_16_21, aes(color = as.factor(yr))) +
  scale_color_discrete(name = "Year") +
  ggtitle("Tornado Points in Oklahoma by Year") + 
  theme_minimal() +
  theme(plot.title=element_text(color="darkblue", hjust= 0.2, face='bold', size=14))

Again, we add a north arrow and a scale bar on the bottom left.

tpointplot_16_21  +
  blank() +
  north(okcounty, symbol=10, location ="bottomleft", scale = 0.20) +
  scalebar(okcounty,location="bottomleft", dist=50, st.dist=0.05, st.size=3, dist_unit="mi",
           transform=TRUE, model="WGS84")

Now, here is the composite map of both the tornado paths and the tornado points maps created with plot_grid():

plot_grid(tpathplot_2016_2021, tpointplot_16_21, 
          labels = c("A)", 
                     "B)",
                     label_size = 2),
          ncol = 1, 
          hjust = 0, 
          label_x = 0,
          align = "h")

The figure above shows the composite maps of tornado paths in the top map, A, and the tornado points in the bottom map, B. Both maps are organized by year, with both the paths and the points color coded according to the year values. The year color palettes are the same for both maps. From map A, it can be seen that the tornadoes seem to move generally in a southwest to northeast or a northeast to southwest direction. The specific directionality is not defined in the map but that is what the patterns suggest. For the tornado points, the touchdown areas of the tornadoes seem to be concentrated in the central area of the state, with a band of area covering swaths from the southwest to the northeast containing most points. The panhandle area to the northwest and the northern areas in general, seem to have less points, with the exception of the northeastern portion of the state.

Exercise 2

In this section, we summarize the density of tornado points by both county and year and generate a faceted plot that displays maps of county-level tornado density from 2016-2021.

First, we’ll group the data by GEOID and the year and summarize the count of the tornado point in each county for each year.

countypnt <- st_join(tpoint_16_21, okcounty)   # The point data is joined with the county shapefile. 
countypnt_yr <- st_drop_geometry(countypnt)    # The geometry is dropped so operations can be done on the data.
countysum_yr <- countypnt_yr %>%     # The data is then grouped by GEOID and year
  group_by(GEOID, yr) %>%
  summarise(tcnt=n()) %>%      # A count of each point is summarized for each county (GEOID) and year. 
  aggregate(tcnt ~ yr + GEOID, FUN=sum, na.rm=TRUE) # The count is then aggregated.
## `summarise()` has grouped output by 'GEOID'. You can override using the
## `.groups` argument.

Here’s a glimpse of that data.

glimpse(countysum_yr)
## Rows: 217
## Columns: 3
## $ yr    <dbl> 2016, 2017, 2019, 2020, 2021, 2016, 2019, 2017, 2018, 2019, 2017…
## $ GEOID <chr> "40001", "40001", "40001", "40001", "40001", "40005", "40005", "…
## $ tcnt  <int> 1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 3, 1, 3, 1, 1, 1, 1, 2, 1, 5, 4, 1…

Next, the summarized count data is joined with the county map so that it will have a geometry. The density is also calculated for each county.

countymap2 <- okcounty %>%
  left_join(countysum_yr, by="GEOID") %>%    # The summarized data is re-joined to the county shapefile so that it can have a geometry. 
  mutate(area = st_area(.),
         tdens2 = 10^6 * 10^3 * tcnt / area) %>%   # The density is calculated per county. 
  drop_units()    # The unit is dropped. 

Then, here is the composite plot that shows tornado density per county by year.

ggplot() +
  geom_sf(data = okcounty,      # A layer for the county borders.
          fill = NA, 
          color = "gray") +
  geom_sf(data = countymap2, aes(fill=tdens2))+     # A layer for the density per county. 
  facet_wrap(vars(yr), ncol = 2) +    # The data is facet_wrapped to organize by year. 
  coord_sf(datum = NA) +   # Set the coordiante system
  scale_fill_distiller(name=expression("Tornadoes/1000 km"^2), # Set the scale fill
                       palette = "YlOrRd", 
                       breaks=pretty_breaks(),
                       direction=1) +
  ggtitle("Tornado Density in Oklahoma by Year") + # Add a title
  theme(plot.title=element_text(color="darkblue", hjust= 0.2, face='bold', size=12)) # Adjust the title

The figure above shows the tornado densities in Oklahoma organized by year. A plot with “NA” years was kept to see where the missing data was coming from. The density is calculated as incidence of tornadoes within a 1000 km2 area. The color palette shows that the more red the county, the more incidence of tornadoes that occurred. In 2019, for example, there were many tornadoes in the northeastern section of the state. That year also saw many tornadoes occurring in many counties. In other years, such as 2018, there seem to be a lot less tornadoes, with not many counties seeing tornado occurrences.

Exercise 3

Here, we generate four choropleth maps of tornado density based on quantile breaks with the number of classes ranging from 3 to 6.

After the four maps are created, we generate a composite figure containing the four maps using plot_grid() and examine how the number of classes changes the interpretation of the map.

We’ll start with creating three classes. First, we’ll get rid of replace “NA” values in the records with “0.”

countymap2 <- countymap2 %>%
    replace(is.na(.), 0)

Now, we create three class. This is basically a list of the class break points.

numclas3 <- 3
qbrks3 <- seq(0, 1, length.out = numclas3 + 1)
qbrks3
## [1] 0.0000000 0.3333333 0.6666667 1.0000000

A new sf object is created assigning the density values to each county.

countymap3 <- countymap2 %>%
  mutate(tdens_c3 = cut(tdens2,
                        breaks = quantile(tdens2, probs = qbrks3),
                        include.lowest = T))

A map is then created.

map_c3 <- ggplot(data = countymap3) +
  geom_sf(aes(fill = tdens_c3)) +
  scale_fill_brewer(name = expression(""),   
                    palette = "PuRd") +
  theme_void() +
  theme(legend.position = "bottom") 

Here is the tornado density map with three classes:

map_c3

The three code chunks above is repeated to create maps with 4, 5, and 6 classes.

Four classes:

numclas4 <- 4
qbrks4 <- seq(0, 1, length.out = numclas4 + 1)
qbrks4
## [1] 0.00 0.25 0.50 0.75 1.00
countymap4 <- countymap2 %>%
  mutate(tdens_c4 = cut(tdens2,
                        breaks = quantile(tdens2, probs = qbrks4),
                        include.lowest = T))
map_c4 <- ggplot(data = countymap4) +
  geom_sf(aes(fill = tdens_c4)) +
  scale_fill_brewer(name = expression(""),   
                    palette = "PuRd") +
  theme_void() +
  theme(legend.position = "bottom") 

Five classes:

numclas5 <- 5
qbrks5 <- seq(0, 1, length.out = numclas5 + 1)
qbrks5
## [1] 0.0 0.2 0.4 0.6 0.8 1.0
countymap5 <- countymap2 %>%
  mutate(tdens_c5 = cut(tdens2,
                        breaks = quantile(tdens2, probs = qbrks5),
                        include.lowest = T))
map_c5 <- ggplot(data = countymap5) +
  geom_sf(aes(fill = tdens_c5)) +
  scale_fill_brewer(name = expression(""),   
                    palette = "PuRd") +
  theme_void() +
  theme(legend.position = "bottom") 

Lastly, six classes:

numclas6 <- 6
qbrks6 <- seq(0, 1, length.out = numclas6 + 1)
qbrks6
## [1] 0.0000000 0.1666667 0.3333333 0.5000000 0.6666667 0.8333333 1.0000000
countymap6 <- countymap2 %>%
  mutate(tdens_c6 = cut(tdens2,
                        breaks = quantile(tdens2, probs = qbrks6),
                        include.lowest = T))
map_c6 <- ggplot(data = countymap6) +
  geom_sf(aes(fill = tdens_c6)) +
  scale_fill_brewer(name = expression(""),   
                    palette = "PuRd") +
  theme_void() +
  theme(legend.position = "bottom") 

We use plot_grid() for a composite map and put together the four different maps.

classes_map <- plot_grid(map_c3, NULL, map_c4, map_c5, NULL, map_c6, # "NULL" maps were added to give space between the maps. 
          labels = c("A) 3 classes", "",
                     "B) 4 classes",
                     "C) 5 classes", "",
                     "D) 6 classes",
                     label_size = 0.2),
          ncol = 3, 
          hjust = 0, 
          vjust = 1,
          label_x = 0,
          align = "hv")

Here’s the final map:

classes_map

Above: A composite map of tornado densities in Oklahoma, split into different class breaks.

The density is calculated as incidence of tornadoes per 1000 km2.

The figure above shows the four map created with the different class breaks. The darker red/pink areas show counties with higher numbers of tornado occurrence within the 1000km2 area defined. As the number of class breaks increase, there is an impression that the number of darker hued counties increase. Though each map were created using the same data points, maps C and D, made with 5 and 6 classes respectively, seems to have higher tornado densities in general compared to maps A and B, which have 3 and 4 classes, respectively.

Conclusion

In this report, three maps were created using tornado data from the years 2016 to 2021 for the state of Oklahoma. First, a composite map of tornado paths and tornado points were created, which showed the distribution and the spatial patterns of tornadoes within the state. Then, density maps were created and presented by year. Some years, such as 2019, saw many more tornadoes in the state. Finally, a composite map exploring different class breaks using tornado densities was created. It revealed that for this dataset, the higher number of classes resulted in an appearance of having more counties with higher tornado occurrences. This resulted in different perceptions of the map, as can be seen by the difference in the overall hue between maps A and D, for example, in the last map.