There are many packages to handle spatial data in R. We are going to focus in sf, the most modern and easy-to-use.
sf package, spatial data is represented as a simple feature:
slots that hold spatial informationsf uses the Geospatial Data Abstraction Library (GDAL)
Use st_read to import all spatial data
dataframe <- st_read("filepath.filetype")Use st_write to export all spatial data
st_write(dataframe, "filepath.filetype")Convert a dataframe with x and y —> to a simple feature using st_as_sf()
example <- st_as_sf(dataframe = df, coords = c("lat", "long"), crs = 4326)
Import a csv of schools in Georgia
convert the dataframe to a
simple featureimport a shapefile of the state of Georgia
create a simple map of schools in Georgia using
ggplot.
Open georgia_title_1_schools.R
At the top of your script, load the sf package into your environment
Notice
GEOS and GDAL are loaded when you load sf into your environment. They are the backbone spatial analysis and visualizaton in R. In all languages, really.
Use the same georgia schools csv file from the last lesson, and turn it into an
sfobject using thest_as_sf()function.
Simple feature collection with 2306 features and 2 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -85.5485 ymin: 30.5251 xmax: -80.845 ymax: 34.9761
Geodetic CRS: WGS 84
# A tibble: 2,306 × 3
school total_…¹ geometry
* <chr> <dbl> <POINT [°]>
1 7 PILLARS CAREER ACADEMY 96 (-84.37143 33.63869)
2 A.L. BURRUSS ELEMENTARY SCHOOL 406 (-84.5781 33.9434)
3 AARON COHN MIDDLE SCHOOL 498 (-84.82468 32.55014)
4 AARON COHN REGIONAL YOUTH DETENTION CENTER 17 (-84.85224 32.52715)
5 ABBOTTS HILL ELEMENTARY SCHOOL 574 (-84.1964 34.0562)
6 ACADEMY FOR CLASSICAL EDUCATION 1825 (-83.72791 32.93929)
7 ACADEMY OF RICHMOND COUNTY HIGH SCHOOL 1043 (-82.0054 33.474)
8 ACWORTH INTERMEDIATE SCHOOL 565 (-84.6532 34.0577)
9 ADAIRSVILLE ELEMENTARY SCHOOL 769 (-84.93798 34.36939)
10 ADAIRSVILLE HIGH SCHOOL 1026 (-84.91689 34.33976)
# … with 2,296 more rows, and abbreviated variable name ¹total_enroll
# ℹ Use `print(n = ...)` to see more rows
Notice
The dataframe for the sf object looks very similar to the regular dataframe except:
Use the
st_read()function to import a shapefile of Georgia (the state) from our computer:You can explicitly define the projection when you read it in, or you can let R determine the projection automatically.
Check the projection of all of your objects using the
st_crs()function. - returns the coordinate reference system that R has assumed.
Coordinate Reference System:
User input: EPSG:4326
wkt:
GEOGCRS["WGS 84",
ENSEMBLE["World Geodetic System 1984 ensemble",
MEMBER["World Geodetic System 1984 (Transit)"],
MEMBER["World Geodetic System 1984 (G730)"],
MEMBER["World Geodetic System 1984 (G873)"],
MEMBER["World Geodetic System 1984 (G1150)"],
MEMBER["World Geodetic System 1984 (G1674)"],
MEMBER["World Geodetic System 1984 (G1762)"],
MEMBER["World Geodetic System 1984 (G2139)"],
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ENSEMBLEACCURACY[2.0]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
USAGE[
SCOPE["Horizontal component of 3D system."],
AREA["World."],
BBOX[-90,-180,90,180]],
ID["EPSG",4326]]
Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["latitude",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["longitude",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
spAn older package that provides classes and methods for handling spatial data.
tidycensus is a package that imports census data directly from the U.S. Census as tidyverse-ready dataframes, with an option to include simple feature geometry. It is very nice.
You need a Census API key to use tidycensus which you can obtain here.
There are two major functions implemented in tidycensus:
get_decennial(): grants access to the 2000, 2010, and 2020 decennial US Census APIs
get_acs(): grants access to the 1-year and 5-year American Community Survey APIs
- get_decennial(
- geography, (find available geographies [here](https://walker-data.com/tidycensus/articles/basic-usage.html))
- variables = NULL,
- table = NULL,
- cache_table = FALSE,
- year = 2010,
- sumfile = "sf1",
- state = NULL,
- county = NULL,
- geometry = FALSE,
- output = "tidy",
- keep_geo_vars = FALSE,
- shift_geo = FALSE,
- summary_var = NULL,
- key = NULL,
- show_call = FALSE )
Use load_variables() to return a dataframe of all available variables for each dataset.
To load your API key, use census_api_key("YOUR API KEY GOES HERE"). You only need to add your key once.
We’ll import data and simple feature objects for school districts in Georgia using the
tidycensuspackage. We’ll keep working with the georgia_schools.R script.
Load the packages you’ll need at the top of your script
Add your census API key if necessary
census_api_key("YOUR API KEY GOES HERE")
Let’s look at the variables available from the 2020 decennial census.
Add cache = TRUE so that your computer stores the variables and it is quicker to load next time.
Load variables for the “sf1” dataset.
Import unified school districts for Georgia with total population data and geometry.
Beware of unified, secondary, elementary school districts
The census has unified, secondary, elementary school districts as different datasets. Georgia only has 1 or 2 secondary districts, so I’m ignoring them for this exercise.
Import unified school districts for Georgia with white-alone, not Hispanic or Latino population data without geometry.
Combine the two dataframes and create a variable pct_bipoc = 1 - white-only population / population
ga_sd_pop <- raw_sd_ga_pop |>
select(GEOID, NAME, value, geometry) |>
rename(pop = value) |>
left_join(raw_sd_ga_white |> select(GEOID, value), by = "GEOID") |>
rename(white_o_pop = value) |>
mutate(pct_bipoc_pop = round(1-white_o_pop/pop, 3),
NAME = str_replace(NAME, ", Georgia", "")) |>
select(GEOID, NAME, pop, white_o_pop, pct_bipoc_pop, geometry) # reorder columns so that geometry is at the endAlways join data to the simple feature object
The simple feature object has extra slots and attributes to define it’s spatialness that you don’t want to lose. ALWAYS join data to the simple feature object, not the other way around. If you join a simple feature object to a regular dataframe, you will add the geometry column, but without the slots and attributes.
Look at your data!
We’ll use both of these datasets in the next lessons….
Let’s write out our two spatial dataframes so we can import them into other scripts. We’ll use st_write()
Re-import schools csv without removing
Write out to processed data folder
Notice
We wrote out our simple features as .geojsons.
Shapefiles are the most common spatial data format, but they have 2 annoying features:
For this workshop, we’ll use write out all of our spatial dataframes as .geojsons