## Note

This lab was originally created by Dr. Jerry Shannon at the University of Georgia. He kindly shared all of this information with me, and there is more to look through on his GitHub site: https://github.com/jshannon75/geog4300/tree/master. For our purposes, I have made mild changes to the text and formatting. All mistakes are mine.
There are several packages available to create and edit spatial data in R, covering both raster and vector data. This script focuses on the latter. The sf package (short for “simple features”) is one efficient way to load vector data. Other popular packages for spatial data are raster, terra, and stars.
#install.packages("sf")
library(tidyverse)
library(sf)
library(tidycensus)
library(viridis)
The main way to read in data using sf is the function st_read, which uses GDAL to read a variety of vector file formats (shapefile, geopackage, geojson, etc.). The resulting object is formatted just like a data frame. For example, we could read our census county dataset this way:
#Always remember to set your working directory
setwd("//Mac/Home/Downloads/cg_splab/cg_splab")
cty<-st_read("ACSCtyData_2022ACS_simplify.gpkg")
## Reading layer `ACSCtyData_2022ACS_simplify' from data source
## `\\Mac\Home\Downloads\cg_splab\cg_splab\ACSCtyData_2022ACS_simplify.gpkg'
## using driver `GPKG'
## Simple feature collection with 3107 features and 65 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.7322 ymin: 24.54423 xmax: -66.96015 ymax: 49.38437
## Geodetic CRS: WGS 84
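Because the result is just a data frame with a geometry column, ordinary tidyverse verbs work on it. Here is a minimal sketch, assuming the fb_pct (percent foreign born) column used in the exercise later in this lab:
#sf objects behave like data frames, so dplyr verbs apply directly
cty %>%
  st_drop_geometry() %>% #Drop the geometry column for a plain summary
  summarise(mean_fb_pct=mean(fb_pct,na.rm=TRUE)) #Average % foreign born across counties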
For this script, we won’t be using st_read. Instead, we will download public use microdata area (PUMA) boundaries for Atlanta. PUMAs are drawn to have roughly uniform populations, which makes them smaller than counties in metro areas but larger than counties in rural ones. We will download boundaries using the tidycensus package and then filter for those whose name refers to Atlanta.
One way to map these out is the geom_sf function in ggplot, which you will see below.
atl_pumas<-get_acs(geography="puma", #Use PUMA data
state="GA", #Only download the state of Georgia
variable="B19013_001", #Get the median household income variable
geometry=TRUE) %>% #Include the spatial boundaries
filter(str_detect(NAME,"Atlanta")) %>% #Filter for just Atlanta boundaries using the name
st_transform(4326) #Transform projection to WGS 84
## Getting data from the 2019-2023 5-year ACS
## Warning: • You have not set a Census API key. Users without a key are limited to 500
## queries per day and may experience performance limitations.
## ℹ For best results, get a Census API key at
## http://api.census.gov/data/key_signup.html and then supply the key to the
## `census_api_key()` function to use it throughout your tidycensus session.
## This warning is displayed once per session.
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
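As the message above notes, you can cache the downloaded boundary files so future sessions skip the download:
#Cache Census shapefiles across sessions (per the tigris message above)
options(tigris_use_cache=TRUE)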
#If tidycensus isn't working, use this:
#atl_pumas<-st_read("[where you set your workspace]/atl_pumas.gpkg")
ggplot(atl_pumas) +
geom_sf()
For this analysis, we’ll be looking at dollar stores in the Atlanta metro in 2020. We will read in a CSV file with a national dollar store listing and select only those stores identified as being in Atlanta and still open (no end year). Lastly, we will use st_as_sf to convert the result to spatial data.
dollars<-read_csv("dollars.csv") %>%
filter(str_detect(msa_name,"Atlanta") & is.na(end_year)) %>%
st_as_sf(coords=c("x","y"), #Identify the coordinate variables
crs=4326, #Identify the projection (4326 is WGS84)
remove=FALSE) %>% #Don't remove the coordinate variables
st_intersection(atl_pumas) #Select only those points that intersect the pumas layer
## Rows: 38492 Columns: 25
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (16): store_name, store_type, store_group, street_num, street_name, addr...
## dbl (9): record_id, zip, zip4, auth_year, end_year, x, y, cty_fips, msa_fips
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
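Because st_intersection attaches each PUMA's attributes to the points it keeps, you can now summarize stores by PUMA with ordinary dplyr. A quick sketch, using the GEOID column that get_acs includes:
#Count dollar stores per PUMA using the attributes joined by st_intersection
dollars %>%
  st_drop_geometry() %>% #Drop geometry so count() returns a plain tibble
  count(GEOID,sort=TRUE) %>%
  head()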
The sf package also has a set of spatial functions that can be used on these data. For example, you could convert the PUMA polygons to centroids. We then use geom_sf in ggplot to visualize those points.
ggplot(atl_pumas) +
geom_sf()
atl_pumas_points<-atl_pumas %>%
st_centroid()
## Warning: st_centroid assumes attributes are constant over geometries
ggplot(atl_pumas_points) +
geom_sf()
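Other sf functions follow the same pattern. For example, this sketch computes each PUMA's area; with unprojected WGS 84 data, st_area returns square meters using spherical geometry:
#Compute each PUMA's area with st_area
puma_areas<-st_area(atl_pumas)
head(as.numeric(puma_areas)/1e6) #Convert square meters to square kilometers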
#install.packages("tmap")
library(tmap)
You can make maps with ggplot, but it’s not the best option out there (in your instructor’s opinion). The tmap package is a popular tool for mapping spatial data. Here’s a basic plot in tmap, which follows a similar logic to ggplot:
tm_shape(atl_pumas)+
tm_polygons()
You can make a choropleth map by adding a variable. Here, we will map median household income (the estimate variable downloaded above) using the Jenks natural breaks classification method.
tm_shape(atl_pumas)+
tm_polygons("estimate", style="jenks")
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_polygons()`: instead of `style = "jenks"`, use fill.scale =
## `tm_scale_intervals()`.
## ℹ Migrate the argument(s) 'style' to 'tm_scale_intervals(<HERE>)'
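The messages above come from tmap version 4, which prefers a new argument style. The v3 code still runs, but the v4 equivalent would look roughly like this sketch, following the migration hints:
#tmap v4 syntax for the same choropleth (per the migration message above)
tm_shape(atl_pumas)+
  tm_polygons(fill="estimate",
              fill.scale=tm_scale_intervals(style="jenks"))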
Load our county census data in geopackage format: ACSCtyData_2022ACS_simplify.gpkg in the data folder. Create a map of those data showing the percentage of foreign born residents: fb_pct. Use the st_transform function to transform the data to the Albers conic projection (CRS 5070) and display the data with Jenks natural breaks classification.
#BB: Load county-level census data and map % foreign-born
cty <- st_read("ACSCtyData_2022ACS_simplify.gpkg") %>%
st_transform(5070) # Project to Albers Equal Area (EPSG:5070)
## Reading layer `ACSCtyData_2022ACS_simplify' from data source
## `\\Mac\Home\Downloads\cg_splab\cg_splab\ACSCtyData_2022ACS_simplify.gpkg'
## using driver `GPKG'
## Simple feature collection with 3107 features and 65 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.7322 ymin: 24.54423 xmax: -66.96015 ymax: 49.38437
## Geodetic CRS: WGS 84
# Map the fb_pct variable using the Fisher-Jenks algorithm (tmap's "fisher" style, equivalent to Jenks natural breaks)
tm_shape(cty) +
tm_polygons("fb_pct",
style = "fisher",
palette = "viridis",
title = "% Foreign Born") +
tm_layout(legend.outside = TRUE)
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_polygons()`: instead of `style = "fisher"`, use fill.scale =
## `tm_scale_intervals()`.
## ℹ Migrate the argument(s) 'style', 'palette' (rename to 'values') to
## 'tm_scale_intervals(<HERE>)'
## [v3->v4] `tm_polygons()`: migrate the argument(s) related to the legend of the
## visual variable `fill` namely 'title' to 'fill.legend = tm_legend(<HERE>)'
You can also add dollar stores. The code below uses just Dollar General stores for mapping. The st_intersection function keeps only those in the Atlanta metro.
dollars_dg<-dollars %>%
filter(dollar_type=="Dollar General") %>%
st_intersection(atl_pumas)
## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
tm_shape(atl_pumas)+
tm_polygons("estimate",style="jenks")+
tm_shape(dollars_dg)+
tm_dots(size=0.1,alpha=0.5)
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_polygons()`: instead of `style = "jenks"`, use fill.scale =
## `tm_scale_intervals()`.
## ℹ Migrate the argument(s) 'style' to 'tm_scale_intervals(<HERE>)'
## [v3->v4] `tm_dots()`: use `fill_alpha` instead of `alpha`.
We can make this map prettier, adding a north arrow and scale bar and moving the legend outside.
tm_shape(atl_pumas)+
tm_polygons("estimate",style="jenks")+
tm_shape(dollars_dg)+
tm_dots(size=0.1,alpha=0.5) +
tm_compass()+
tm_scale_bar(position="left")+
tm_legend(legend.outside=TRUE)
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_polygons()`: instead of `style = "jenks"`, use fill.scale =
## `tm_scale_intervals()`.
## ℹ Migrate the argument(s) 'style' to 'tm_scale_intervals(<HERE>)'
## [v3->v4] `tm_dots()`: use `fill_alpha` instead of `alpha`.
## ! `tm_scale_bar()` is deprecated. Please use `tm_scalebar()` instead.
## [v3->v4] `tm_legend()`: use 'tm_legend()' inside a layer function, e.g.
## 'tm_polygons(..., fill.legend = tm_legend())'
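Following the migration messages, a tmap v4 version of this map would look something like the sketch below, with tm_pos_out() placing the legend outside the frame:
#tmap v4 sketch of the same map, following the migration messages above
tm_shape(atl_pumas)+
  tm_polygons(fill="estimate",
              fill.scale=tm_scale_intervals(style="jenks"),
              fill.legend=tm_legend(position=tm_pos_out()))+
  tm_shape(dollars_dg)+
  tm_dots(size=0.1,fill_alpha=0.5)+
  tm_compass()+
  tm_scalebar(position="left")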
### You try it!

Make a map showing median household income and another dollar store chain besides Dollar General. How does it compare to Dollar General’s map?
#BB: Filter for Family Dollar stores instead of Dollar General
dollars_fd <- dollars %>%
filter(dollar_type == "Family Dollar") %>%
st_intersection(atl_pumas)
## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
# Map median household income with Family Dollar locations
tm_shape(atl_pumas) +
tm_polygons("estimate",
style = "fisher",
palette = "viridis",
title = "Median HH Income") +
tm_shape(dollars_fd) +
tm_dots(size = 0.1, alpha = 0.5, col = "red") +
tm_compass() +
tm_scale_bar(position = "left") +
tm_legend(legend.outside = TRUE)
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_polygons()`: instead of `style = "fisher"`, use fill.scale =
## `tm_scale_intervals()`.
## ℹ Migrate the argument(s) 'style', 'palette' (rename to 'values') to
## 'tm_scale_intervals(<HERE>)'
## [v3->v4] `tm_polygons()`: migrate the argument(s) related to the legend of the
## visual variable `fill` namely 'title' to 'fill.legend = tm_legend(<HERE>)'
## [v3->v4] `tm_dots()`: use 'fill' for the fill color of polygons/symbols
## (instead of 'col'), and 'col' for the outlines (instead of 'border.col').
## [v3->v4] `tm_dots()`: use `fill_alpha` instead of `alpha`.
## ! `tm_scale_bar()` is deprecated. Please use `tm_scalebar()` instead.
You can also make interactive maps with tmap. Make sure you set the output to the Console using the gear icon above.
tmap_mode("view") #To shift back to static maps, use tmap_mode("plot")
## ℹ tmap modes "plot" - "view"
## ℹ toggle with `tmap::ttm()`
tm_shape(atl_pumas)+
tm_polygons("estimate",style="jenks",alpha=0.4)+
tm_shape(dollars_dg)+
tm_dots(size=0.1)
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_polygons()`: instead of `style = "jenks"`, use fill.scale =
## `tm_scale_intervals()`.
## ℹ Migrate the argument(s) 'style' to 'tm_scale_intervals(<HERE>)'
## [v3->v4] `tm_polygons()`: use `fill_alpha` instead of `alpha`.
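If you want to keep an interactive map, tmap_save() can write it to a standalone HTML file. A sketch (the filename here is arbitrary):
#Save an interactive map as a self-contained HTML file (filename is hypothetical)
atl_map<-tm_shape(atl_pumas)+
  tm_polygons("estimate",style="jenks",alpha=0.4)+
  tm_shape(dollars_dg)+
  tm_dots(size=0.1)
tmap_save(atl_map,"atl_dollar_stores.html")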
There are other good mapping packages available. For example, mapview (https://r-spatial.github.io/mapview/) provides quick interactive maps, and the leaflet package (https://rstudio.github.io/leaflet/) creates maps using that popular JavaScript library.
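For instance, a one-line interactive map with mapview might look like this sketch (assumes the package is installed):
#Quick interactive map with mapview
#install.packages("mapview")
library(mapview)
mapview(atl_pumas,zcol="estimate") #Color polygons by median household income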
The wh_locations_coords.csv file in the data folder has most Waffle House locations. Load that file and filter to just those in the Atlanta metro area. Create a map that shows those Waffle House locations and the Atlanta PUMAs with median income.
library(tidyverse)
library(sf)
library(tmap)
library(stringr)
# Load Waffle House CSV
wh <- read_csv("//Mac/Home/Downloads/wh_locations_coords.csv")
## Rows: 2017 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (15): Company Name, Address, City, State, ZIP Four, County, Metro Area, ...
## dbl (6): ZIP Code, Primary SIC Code, Primary NAICS, Last Updated On, Longit...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Filter out rows with missing coordinates
wh <- wh %>% filter(!is.na(Latitude) & !is.na(Longitude))
# Convert to sf object
wh_sf <- st_as_sf(
wh,
coords = c("Longitude", "Latitude"), # exact column names
crs = 4326, # WGS 84
remove = FALSE
)
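To find the exact metro name string used in the next step, you can first list the values that contain "Atl" (a quick sketch):
#Peek at metro-area names to find Atlanta's exact spelling
wh_sf %>%
  st_drop_geometry() %>%
  distinct(`Metro Area`) %>%
  filter(str_detect(`Metro Area`,"Atl"))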
# Filter to Atlanta Metro Area (use the correct name from unique())
wh_atl <- wh_sf %>% filter(`Metro Area` == "Atl-Ss-Rswl, GA")
# Remove any empty geometries
wh_atl <- wh_atl[!st_is_empty(wh_atl), ]
# Confirm that we have rows to plot
nrow(wh_atl) # should now be > 0
## [1] 278
# Set tmap mode to static plotting
tmap_mode("plot")
## ℹ tmap modes "plot" - "view"
# Map Atlanta PUMAs median income and Waffle House locations
tm_shape(atl_pumas) +
tm_polygons(
col = "estimate", # median household income
style = "fisher", # breaks method
palette = "viridis", # color palette
alpha = 0.6,
title = "Median HH Income"
) +
tm_shape(wh_atl) +
tm_dots(
col = "orange", # dot color
size = 0.1,
alpha = 0.5
) +
tm_compass() +
tm_scalebar(position = "left") +
tm_layout(legend.outside = TRUE)
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_polygons()`: instead of `style = "fisher"`, use fill.scale =
## `tm_scale_intervals()`.
## ℹ Migrate the argument(s) 'style', 'palette' (rename to 'values') to
## 'tm_scale_intervals(<HERE>)'
## [v3->v4] `tm_polygons()`: use `fill_alpha` instead of `alpha`.
## [v3->v4] `tm_polygons()`: migrate the argument(s) related to the legend of the
## visual variable `fill` namely 'title' to 'fill.legend = tm_legend(<HERE>)'
## [v3->v4] `tm_dots()`: use `fill_alpha` instead of `alpha`.
Learn more about spatial analysis in R in Manuel Gimond’s web textbook: https://mgimond.github.io/Spatial/