Spatial Data in R

sf


There are many packages to handle spatial data in R. We are going to focus in sf, the most modern and easy-to-use.

  • In the sf package, spatial data is represented as a simple feature:
    • a set of standards on how to store spatial objects
    • similar to PostGIS, ArcGIS, QGIS
  • spatial data is represented as a data frame with:
    • a geometry column that lists all of the coordinates that make up the shape
    • slots that hold spatial information
  • To learn more about simple features and the sf package, see this vignette

Reading and writing spatial data


sf uses the Geospatial Data Abstraction Library (GDAL)

  • read and write vector and raster data
  • same functions for all types:
  • point, linestring, polygon, multipoint, multilinestring, multipolygon, geometry collection.

Read and write functions

Use st_read to import all spatial data

  • dataframe <- st_read("filepath.filetype")

Use st_write to export all spatial data

  • st_write(dataframe, "filepath.filetype")
  • the filetype indicates the type of file you want to write out to your computer , called the driver. Most commonly that will be either a shapefile (.shp) or geojson (.geojson).

Conversion function

Convert a dataframe with x and y —> to a simple feature using st_as_sf()

  • dataframe
  • coords: the columns that define x and y (latitude and logitude)
  • crs: the coordinate system that translate x and y to a location on the Earth

example <- st_as_sf(dataframe = df, coords = c("lat", "long"), crs = 4326)

In-class exercise: Use simple feature objects with sf

Import a csv of schools in Georgia

  • convert the dataframe to a simple feature

  • import a shapefile of the state of Georgia

  • create a simple map of schools in Georgia using ggplot.

Step 1.

Open georgia_title_1_schools.R

Step 2.

At the top of your script, load the sf package into your environment

# Let's load sf!
library(sf)

Notice


Notice

GEOS and GDAL are loaded when you load sf into your environment. They are the backbone spatial analysis and visualizaton in R. In all languages, really.

Step 3.


Use the same georgia schools csv file from the last lesson, and turn it into an sf object using the st_as_sf() function.

ga_schools <- read_csv("data/raw/ga_school_enrollment_2021.csv") |>
  select(school, long, lat, total_enroll)

convert to sf object


ga_schools_shp <- st_as_sf(ga_schools, 
                           coords = c("long", "lat"), 
                           crs = st_crs(4326))


Simple feature collection with 2306 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -85.5485 ymin: 30.5251 xmax: -80.845 ymax: 34.9761
Geodetic CRS:  WGS 84
# A tibble: 2,306 × 3
   school                                     total_…¹             geometry
 * <chr>                                         <dbl>          <POINT [°]>
 1 7 PILLARS CAREER ACADEMY                         96 (-84.37143 33.63869)
 2 A.L. BURRUSS ELEMENTARY SCHOOL                  406   (-84.5781 33.9434)
 3 AARON COHN MIDDLE SCHOOL                        498 (-84.82468 32.55014)
 4 AARON COHN REGIONAL YOUTH DETENTION CENTER       17 (-84.85224 32.52715)
 5 ABBOTTS HILL ELEMENTARY SCHOOL                  574   (-84.1964 34.0562)
 6 ACADEMY FOR CLASSICAL EDUCATION                1825 (-83.72791 32.93929)
 7 ACADEMY OF RICHMOND COUNTY HIGH SCHOOL         1043    (-82.0054 33.474)
 8 ACWORTH INTERMEDIATE SCHOOL                     565   (-84.6532 34.0577)
 9 ADAIRSVILLE ELEMENTARY SCHOOL                   769 (-84.93798 34.36939)
10 ADAIRSVILLE HIGH SCHOOL                        1026 (-84.91689 34.33976)
# … with 2,296 more rows, and abbreviated variable name ¹​total_enroll
# ℹ Use `print(n = ...)` to see more rows

Notice

Notice

The dataframe for the sf object looks very similar to the regular dataframe except:

  • the lat and long columns have become one geometry column
  • two slots have been added: “sf_column” and “agr”
    • these two slots define the geometry column and how it relates to the attributes
    • documentation if you’re curious

Step 4.

Use the st_read() function to import a shapefile of Georgia (the state) from our computer:

You can explicitly define the projection when you read it in, or you can let R determine the projection automatically.

ga <- st_read("~/spatial/Bellwether/mapping-workshop-2022/slides/data/raw/georgia.shp", quiet=TRUE)

Step 5.

Check the projection of all of your objects using the st_crs() function. - returns the coordinate reference system that R has assumed.

st_crs(ga_schools)
Coordinate Reference System: NA


st_crs(ga_schools_shp)
Coordinate Reference System:
  User input: EPSG:4326 
  wkt:
GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]


st_crs(ga)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]

sp

An older package that provides classes and methods for handling spatial data.

  • Spatial data is an object of Spatial class with 2 slots:
    • a bounding box
    • a CRS class object to define the Coordinate Reference System
  • Spatial data frame: a spatial object with a dataframe
    • SpatialLinesDataFrame, SpatialPointsDataFrame, SpatialPolygonsDataFrame

tidycensus

tidycensus is a package that imports census data directly from the U.S. Census as tidyverse-ready dataframes, with an option to include simple feature geometry. It is very nice.

documentation

You need a Census API key to use tidycensus which you can obtain here.

Major functions

There are two major functions implemented in tidycensus:

  • get_decennial(): grants access to the 2000, 2010, and 2020 decennial US Census APIs
    • only the redistricting file - “pl” - is available from the 2020 decennial census
  • get_acs(): grants access to the 1-year and 5-year American Community Survey APIs


-   get_decennial(
    -   geography, (find available geographies [here](https://walker-data.com/tidycensus/articles/basic-usage.html))
    -   variables = NULL,
    -   table = NULL,
    -   cache_table = FALSE,
    -   year = 2010,
    -   sumfile = "sf1",
    -   state = NULL,
    -   county = NULL,
    -   geometry = FALSE,
    -   output = "tidy",
    -   keep_geo_vars = FALSE,
    -   shift_geo = FALSE,
    -   summary_var = NULL,
    -   key = NULL,
    -   show_call = FALSE )
    

Use load_variables() to return a dataframe of all available variables for each dataset.

To load your API key, use census_api_key("YOUR API KEY GOES HERE"). You only need to add your key once.

In-class exercise: Import census data as a simple feature object

We’ll import data and simple feature objects for school districts in Georgia using the tidycensus package. We’ll keep working with the georgia_schools.R script.

Step 1.

Load the packages you’ll need at the top of your script

Add your census API key if necessary

  • census_api_key("YOUR API KEY GOES HERE")
library(tidyverse)
library(tidycensus)
library(sf)

Step 2.

Let’s look at the variables available from the 2020 decennial census.

Add cache = TRUE so that your computer stores the variables and it is quicker to load next time.

census_pl <- load_variables(year = 2020, 
                         dataset = "pl", 
                         cache=TRUE)

Step 3.

Load variables for the “sf1” dataset.

Step 4.

Import unified school districts for Georgia with total population data and geometry.

raw_sd_ga_pop <- get_decennial(geography = "school district (unified)", 
                               variables = "P1_002N", # total pop
                               state='GA', 
                               geometry = TRUE, 
                               year = 2020)

Notice

Beware of unified, secondary, elementary school districts

The census has unified, secondary, elementary school districts as different datasets. Georgia only has 1 or 2 secondary districts, so I’m ignoring them for this exercise.

Step 5.

Import unified school districts for Georgia with white-alone, not Hispanic or Latino population data without geometry.

# import school districts with white alone data
raw_sd_ga_white <- get_decennial(geography = "school district (unified)", 
                                 variables = "P2_005N", # white alone, not hispanic or latino
                                 state='GA', 
                                 geometry = FALSE, 
                                 year = 2020)

Step 6.

Combine the two dataframes and create a variable pct_bipoc = 1 - white-only population / population

ga_sd_pop <- raw_sd_ga_pop |>
  select(GEOID, NAME, value, geometry) |> 
  rename(pop = value) |>
  left_join(raw_sd_ga_white |> select(GEOID, value), by = "GEOID") |>
  rename(white_o_pop = value) |>
  mutate(pct_bipoc_pop = round(1-white_o_pop/pop, 3),
         NAME = str_replace(NAME, ", Georgia", "")) |>
  select(GEOID, NAME, pop, white_o_pop, pct_bipoc_pop, geometry) # reorder columns so that geometry is at the end

Notice

Always join data to the simple feature object

The simple feature object has extra slots and attributes to define it’s spatialness that you don’t want to lose. ALWAYS join data to the simple feature object, not the other way around. If you join a simple feature object to a regular dataframe, you will add the geometry column, but without the slots and attributes.

Step 7.

Look at your data!

Write out spatial datasets

We’ll use both of these datasets in the next lessons….

Let’s write out our two spatial dataframes so we can import them into other scripts. We’ll use st_write()

In-class exercise: Write out simple feature object

Step 1.

Re-import schools csv without removing

ga_schools <- read_csv("data/raw/ga_school_enrollment_2021.csv") 

# convert to sf object
ga_schools_shp <- st_as_sf(ga_schools, 
                           coords = c("long", "lat"), # define the coordinates, always (x, y)
                           crs = st_crs(4326))

Step 2.

Write out to processed data folder

st_write(ga_schools_shp, "data/processed/ga_schools_shp.geojson")

st_write(ga_sd_pop, "data/processed/ga_sd_pop.geojson")

Notice

Notice

We wrote out our simple features as .geojsons.

Shapefiles are the most common spatial data format, but they have 2 annoying features:

  • The maximum length for column names is 10 characters
    • if they are longer, R shortens the column name (sometimes in unpredictable ways)
    • pushes you to create meaningless column names
  • They consist of 3-7 files that must be kept in the same folder
    • confusing, annoying

For this workshop, we’ll use write out all of our spatial dataframes as .geojsons