This report covers the practice questions in section 5.7 of Michael C. Wimberly’s Geographic Data Science with R: Vizualizing and Analyzing Environmental Change.
library(sf)
library(ggplot2)
library(dplyr)
library(tidyr)
library(scales)
library(RColorBrewer)
library(units)
library(cowplot)
okcounty <- st_read("ok_counties.shp", quiet = TRUE)
tpoint <- st_read("ok_tornado_point.shp", quiet = TRUE)
tpath <- st_read("ok_tornado_path.shp", quiet = TRUE)
Ensure the data is properly read-in by glimpsing the data and plotting a simple map.
glimpse(okcounty)
## Rows: 77
## Columns: 8
## $ STATEFP <chr> "40", "40", "40", "40", "40", "40", "40", "40", "40", "40", "…
## $ COUNTYFP <chr> "077", "025", "011", "107", "105", "153", "001", "053", "059"…
## $ COUNTYNS <chr> "01101826", "01101800", "01101793", "01101841", "01101840", "…
## $ AFFGEOID <chr> "0500000US40077", "0500000US40025", "0500000US40011", "050000…
## $ GEOID <chr> "40077", "40025", "40011", "40107", "40105", "40153", "40001"…
## $ NAME <chr> "Latimer", "Cimarron", "Blaine", "Okfuskee", "Nowata", "Woodw…
## $ LSAD <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "…
## $ geometry <POLYGON [°]> POLYGON ((-95.50766 35.0292..., POLYGON ((-103.0025 3…
ggplot(data = okcounty) +
geom_sf(fill = NA) # no fill
- Generate a map of tornado paths where the paths from each year are displayed as a different color, similar to the map of tornado points in Figure 5.4. Create a composite figure containing the map of tornado paths and the map of tornado points using plot_grid().
Using names(), find the column name corresponding to year.
names(tpoint)
## [1] "om" "yr" "mo" "dy" "date" "time"
## [7] "tz" "st" "stf" "stn" "mag" "inj"
## [13] "fat" "loss" "closs" "slat" "slon" "elat"
## [19] "elon" "len" "wid" "fc" "geometry"
We can see that the ‘yr’ column indicates year.
Just like in section 5.1, filter the data to the year range from 2016 to 2021.
tpoint_16_21 <- tpoint %>%
filter(yr >= 2016 & yr <= 2021) %>% # set range for tornado points
select(om, yr, date) # om refers to a unique identifier
tpath_16_21 <- tpath %>%
filter(yr >= 2016 & yr <= 2021) %>% # set range for tornado paths
select(om, yr, date)
Add tornado paths to the plot, displaying each year as a different color.
tpath_plot <- ggplot(data = okcounty) + # save to new object
geom_sf(data = tpath_16_21,
aes(color = as.factor(yr)), # display yr as a discrete variable
size = 1) + # increase line width from default 0.5 to 1
geom_sf(data = okcounty, fill = NA,) +
scale_color_discrete(name = "Year") + # set legend name
coord_sf(datum = NA) +
theme_void() # remove plot axes and labels
tpath_plot # display plot
Plot Figure 5.4 from section 5.2 with tpoint_16_21.
tpoint_plot <- ggplot(data = okcounty) + # save to new object
geom_sf(data = tpoint_16_21,
aes(color = as.factor(yr))) + # display yr as a discrete variable
geom_sf(data = okcounty, fill = NA) +
scale_color_discrete(name = "Year") + # set legend name
coord_sf(datum = NA) +
theme_void() # remove plot axes and labels
tpoint_plot # display plot
Create a composite figure containing the map of tornado paths and the map of tornado points using plot_grid().
plot_grid(tpath_plot, tpoint_plot, # specify plots
labels = c("A) Oklahoma Tornado Paths", # set plot labels
"B) Oklahoma Tornado Occurrences"),
ncol = 1, # set number of columns
hjust = 0, # adjust label position horizontally
vjust = 0.95, # adjust label position vertically
label_size = 10, # adjust label font size
align = "hv") # both horizontally and vertically aligned
- Summarize the density of tornado points by both county and year and generate a faceted plot that displays maps of county-level tornado density from 2016-2021.
okcounty <- okcounty %>%
mutate(area = st_area(okcounty))
First, link the tpoint_16_21 object with the okcounty object using st_join.
countypnt <- st_join(tpoint_16_21, okcounty)
glimpse(countypnt)
## Rows: 434
## Columns: 12
## $ om <dbl> 613662, 613675, 613676, 613677, 613678, 613727, 613728, 61372…
## $ yr <dbl> 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2…
## $ date <chr> "2016-03-23", "2016-03-30", "2016-03-30", "2016-03-30", "2016…
## $ STATEFP <chr> "40", "40", "40", "40", "40", "40", "40", "40", "40", "40", "…
## $ COUNTYFP <chr> "001", "113", "105", "131", "035", "139", "139", "139", "139"…
## $ COUNTYNS <chr> "01101788", "01101844", "01101840", "01101853", "01101805", "…
## $ AFFGEOID <chr> "0500000US40001", "0500000US40113", "0500000US40105", "050000…
## $ GEOID <chr> "40001", "40113", "40105", "40131", "40035", "40139", "40139"…
## $ NAME <chr> "Adair", "Osage", "Nowata", "Rogers", "Craig", "Texas", "Texa…
## $ LSAD <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "…
## $ area [m^2] 1493619771 [m^2], 5954084222 [m^2], 1503893122 [m^2], 1848819…
## $ geometry <POINT [°]> POINT (-94.5042 35.6916), POINT (-96.0151 36.2151), POI…
The linked attributes show the data corresponding to tornado occurrences (om, yr, and date) linked with okcounty.
countypnt must be grouped by the year, as well as GEOID county code using group_by(). The summarize() function is then used to create a new data frame of the number of records in each year for each GEOID group created with the n() function.
countypnt <- st_drop_geometry(countypnt) # convert to normal data frame
countysum_yr <- countypnt %>%
group_by(GEOID, yr) %>%
summarize(tcnt = n(), .groups = "drop") # count number of records (tornadoes) for each year in each GEOID group
glimpse(countysum_yr)
## Rows: 217
## Columns: 3
## $ GEOID <chr> "40001", "40001", "40001", "40001", "40001", "40005", "40005", "…
## $ yr <dbl> 2016, 2017, 2019, 2020, 2021, 2016, 2019, 2017, 2018, 2019, 2017…
## $ tcnt <int> 1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 3, 1, 3, 1, 1, 1, 1, 2, 1, 5, 4, 1…
The newly created column tcnt displays the count of tornado occurrences per year and GEOID group.
A quick fix needs to be applied to missing values in the countysum_yr dataset in order for plotted maps to show all counties regardless of tornado count. Without running the complete() function on the data, county polygons with 0 tcnt values disappear when plotted. Documentation for the tidyr complete() function can be found here.
countysum_yr_complete <- countysum_yr %>%
complete(GEOID = okcounty$GEOID, yr, fill = list(tcnt = 0))
Perform left_join() on the newly created countysum_yr with okcounty.
countymap_yr <- okcounty %>%
left_join(countysum_yr_complete, by = "GEOID") %>%
mutate(tcnt = replace_na(tcnt, 0)) %>% # replace tcnt "NA" values with 0
mutate(tdens = 10^6 * 10^3 * tcnt / area) %>%
drop_units() %>%
filter(!is.na(yr)) # filter out yr "NA" values only
glimpse(countymap_yr)
## Rows: 462
## Columns: 12
## $ STATEFP <chr> "40", "40", "40", "40", "40", "40", "40", "40", "40", "40", "…
## $ COUNTYFP <chr> "077", "077", "077", "077", "077", "077", "025", "025", "025"…
## $ COUNTYNS <chr> "01101826", "01101826", "01101826", "01101826", "01101826", "…
## $ AFFGEOID <chr> "0500000US40077", "0500000US40077", "0500000US40077", "050000…
## $ GEOID <chr> "40077", "40077", "40077", "40077", "40077", "40077", "40025"…
## $ NAME <chr> "Latimer", "Latimer", "Latimer", "Latimer", "Latimer", "Latim…
## $ LSAD <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "…
## $ area <dbl> 1890663261, 1890663261, 1890663261, 1890663261, 1890663261, 1…
## $ yr <dbl> 2016, 2017, 2018, 2019, 2020, 2021, 2016, 2017, 2018, 2019, 2…
## $ tcnt <int> 0, 0, 0, 0, 1, 0, 2, 2, 0, 1, 0, 7, 0, 1, 0, 0, 0, 0, 0, 2, 0…
## $ geometry <POLYGON [°]> POLYGON ((-95.50766 35.0292..., POLYGON ((-95.50766 3…
## $ tdens <dbl> 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.5289149, 0.0000…
Finally, plot 6 maps displaying tornado density for the years 2016-2021.
countymap_yr %>%
filter(yr != 0) %>% # filter out years with "0" values
ggplot() +
geom_sf(aes(fill = tdens)) +
scale_fill_distiller(name = "Tornado Density",
palette = "YlOrRd",
breaks = pretty_breaks(),
direction = 1) + # reverse color ramp direction
facet_wrap(~ yr) + # for each year
labs(
title = "Tornado Density by Year, 2016-2021") +
theme_void()
The tornado density maps show that in 2019, Oklahoma experienced
noticably more tornadoes than in the few preceding and following
years.
- Generate four choropleth maps of tornado density based on quantile breaks with numbers of classes ranging from 3 to 6. Create a composite figure containing the four maps using plot_grid() and examine how the number of classes changes the interpretation of the map.
To generate these choropleth maps, I will prep the data by using the code provided in 5.3 Overlaying Vector Datasets to join okcounty to countysum.
countysum <- countypnt %>%
group_by(GEOID) %>% # group by GEOID
summarize(tcnt = n()) # count the records in each group
countymap <- okcounty %>%
left_join(countysum, by = "GEOID") %>% # ensure all county polygons are
# retained by using left_join()
replace(is.na(.), 0) %>% # NA values are replaced by 0s
mutate(area = st_area(okcounty), # compute area of each county
tdens = 10^6 * 10^3 * tcnt / area) %>% # compute density of tornadoes
drop_units() # remove measurement units
glimpse(countymap)
## Rows: 77
## Columns: 11
## $ STATEFP <chr> "40", "40", "40", "40", "40", "40", "40", "40", "40", "40", "…
## $ COUNTYFP <chr> "077", "025", "011", "107", "105", "153", "001", "053", "059"…
## $ COUNTYNS <chr> "01101826", "01101800", "01101793", "01101841", "01101840", "…
## $ AFFGEOID <chr> "0500000US40077", "0500000US40025", "0500000US40011", "050000…
## $ GEOID <chr> "40077", "40025", "40011", "40107", "40105", "40153", "40001"…
## $ NAME <chr> "Latimer", "Cimarron", "Blaine", "Okfuskee", "Nowata", "Woodw…
## $ LSAD <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "…
## $ area <dbl> 1890663261, 4766283042, 2427121029, 1657249513, 1503893122, 3…
## $ tcnt <dbl> 1, 12, 1, 10, 6, 2, 6, 0, 4, 9, 3, 10, 12, 1, 2, 8, 13, 4, 5,…
## $ geometry <POLYGON [°]> POLYGON ((-95.50766 35.0292..., POLYGON ((-103.0025 3…
## $ tdens <dbl> 0.5289149, 2.5176851, 0.4120108, 6.0340944, 3.9896452, 0.6197…
After the data is joined, create quantile breaks with numbers of classes ranging from 3 to 6.
countymap <- countymap %>%
# to cut down on code, remove the numclas and qbrks objects from the tutorial
# and enter each class into a single mutate function
mutate(
tdens_c3 = cut(tdens,
breaks = quantile(tdens, probs = seq(0, 1, length.out = 4)),
include.lowest = T),
tdens_c4 = cut(tdens,
breaks = quantile(tdens, probs = seq(0, 1, length.out = 5)),
include.lowest = T),
tdens_c5 = cut(tdens,
breaks = quantile(tdens, probs = seq(0, 1, length.out = 6)),
include.lowest = T),
tdens_c6 = cut(tdens,
breaks = quantile(tdens, probs = seq(0, 1, length.out = 7)),
include.lowest = T)
)
qb3_map <- ggplot(data = countymap) + # 3 class breaks map
geom_sf(aes(fill = tdens_c3)) +
scale_fill_brewer(palette = "YlOrRd") +
theme_void() +
theme(legend.position = "bottom",
legend.text=element_text(size=7))
qb4_map <- ggplot(data = countymap) + # 4 class breaks map
geom_sf(aes(fill = tdens_c4)) +
scale_fill_brewer(palette = "YlOrRd") +
theme_void() +
theme(legend.position = "bottom",
legend.text=element_text(size=7))
qb5_map <- ggplot(data = countymap) + # 5 class breaks map
geom_sf(aes(fill = tdens_c5)) +
scale_fill_brewer(palette = "YlOrRd") +
theme_void() +
theme(legend.position = "bottom",
legend.text=element_text(size=7))
qb6_map <- ggplot(data = countymap) + # 6 class breaks map
geom_sf(aes(fill = tdens_c6)) +
scale_fill_brewer(palette = "YlOrRd") +
theme_void() +
theme(legend.position = "bottom",
legend.text=element_text(size=7))
plot_grid(qb3_map, qb4_map, qb5_map, qb6_map,
ncol = 2, # create 2 columns
align = "hv") # align horizontally & vertically
The number of classes changes the interpretation of the severity of the data. In the maps with 3 and 4 classes, the limited color range gives the impression that counties in the northeast portion of the state (symbolized by dark orange) have seen a similar number of tornadoes as counties in the central and southwest regions of the same color. The third and fourth maps, symbolizing 5 and 6 classes, respectively, show greater variation in the density of tornado occurrences. The darker colors give the impression of a higher concentration of tornado occurrences in the northeast corner of the state, with fewer exceptions in the central and southwest regions than in the other maps. The map symbolizing 5 classes gives the impression of more severe occurrences than any other map, while adding one more class appears to dilute these colors in the fourth map.