The sf R package was designed to provide support for working with the simple features (sf) standard for geographic vector data representation within the R ecosystem. The sf package was first released in 2016 and within a few years had largely surpassed the sp package (first released in 2005) as the most commonly used geographic data representation for vector data in R. In additon to providing support for the simple features data standard, sf includes a wide range of functions for working with spatial vector data in R. If you are accustomed to working with spatial data using traditional GIS desktop software such as QGIS or ArCGIS, you will find analogues to the typical analysis, query, selection, geoprocessing, and visualization GIS tools within the sf package empowered with the tidyverse packages.
Simple features (sf) or simple feature access refers to a formal standard (ISO 19125-1:2004) that describes how objects in the real world can be represented in computers, with emphasis on the spatial geometry of these objects. It also describes how such objects can be stored in and retrieved from databases, and which geometrical operations should be defined for them. The standard is widely implemented in spatial databases (such as PostGIS), commercial GIS (e.g., ESRI ArcGIS) and forms the vector data basis for libraries such as GDAL.
This notebook will introduce you to the sf package and working with geographic vector data in the simple features (sf) data standard. This notebook is only intended to be a very quick introductin to the sf package as many more sf functions wont’t be touched here.
✨ Prerequisites ✨
Have a copy of the MGN2023_MPIO_POLITICO shapefile available from DANE (and used since the begining of the Geomática Básica course.
Setup
Reading and Writing Spatial Data
Creating and Viewing Spatial Objects
This notebook should live in your GB2 directory close to the directory where saved your data
This notebook requires the following R packages and functions.
This notebook uses the following functions from sf:
st_as_sf · convert foreign object to an sf object
st_bbox · return bounding of a simple feature or simple feature set
st_geometry · get, set, replace or rename geometry from an sf object
st_read · read simple features or layers from file or database
st_write · write simple features object to file or database
dplyr - dplyr is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges:
mutate() adds new variables that are functions of existing variables
select() picks variables based on their names.
filter() picks cases based on their values.
summarise() reduces multiple values down to a single summary.
arrange() changes the ordering of the rows.
ggplot2 - ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
ggspatial - spatial data plus the power of the ggplot2 framework means easier mapping.
You can install the required libraries from the console using install.packages(“name_of_the_library”). If successful, you should load sf and dplyr to use its functions:
library(sf)
## Linking to GEOS 3.13.0, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(ggspatial)
The st_read function from the sf package reads spatial vector data from a variety of formats supported by GDAL (Geospatial Data Abstraction Library), including ESRI Shapefile (.shp), GeoJSON (.geojson), GeoPackage (.gpkg), KML (.kml) and KMZ (.kmz), CSV with WKT (well-known text) (.csv), and more.
st_read simplifies the process of reading in spatial data by automatically recognizing the file format and returning the data as a simple features (sf) object.
★ Pro Tip: If the coordinate reference system (CRS) is specified in the file, it will be read automatically.
#Get the filenames close to your notebook
list.files("./DANE/MGN2023_MPIO_POLITICO")
## [1] "MGN_ADM_MPIO_GRAFICO.cpg" "MGN_ADM_MPIO_GRAFICO.dbf"
## [3] "MGN_ADM_MPIO_GRAFICO.prj" "MGN_ADM_MPIO_GRAFICO.sbn"
## [5] "MGN_ADM_MPIO_GRAFICO.sbx" "MGN_ADM_MPIO_GRAFICO.shp"
## [7] "MGN_ADM_MPIO_GRAFICO.shp.xml" "MGN_ADM_MPIO_GRAFICO.shx"
Now, you can specify the relative path to read the shapefile:
# read the *Municipios* shapefile as an sf object
colombia <- st_read("./DANE/MGN2023_MPIO_POLITICO/MGN_ADM_MPIO_GRAFICO.shp")
## Reading layer `MGN_ADM_MPIO_GRAFICO' from data source
## `/Users/ials/Documents/unal/GB2024/data/DANE/MGN2023_MPIO_POLITICO/MGN_ADM_MPIO_GRAFICO.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 1121 features and 11 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -81.73562 ymin: -4.229406 xmax: -66.84722 ymax: 13.39473
## Geodetic CRS: MAGNA-SIRGAS
When we use st_read to read a shapefile we get a preview of some of the file’s spatial metadata.
The data was read in using the ESRI Shapefile driver.
The data includes 1121 feature (the number of municipalities in Colombia) and has 11 attribute fields.
The geometry type is multipolygon. What is a multipolygon?
The bounding box is: ………… Complete it!
The coordinate reference system (CRS) is MAGNA-SIRGAS Review your physical notebook to know this CRS’s parameters
head(colombia)
To export spatial data, the st_write() function can save an sf object to a variety of formats.
You can specify the output format by either adding the appropriate file extension or specifying the appropritate GDAL driver. Additional options can be specified to control how the data is written.
★ Pro Tip: The append = F parameter allows the file to overwrite an already-existing file with the same name in the working directory.
st_write(colombia, "municipios.gpkg", driver = "GPKG", append = F)
## Deleting layer `municipios' using driver `GPKG'
## Writing layer `municipios' to data source `municipios.gpkg' using driver `GPKG'
## Writing 1121 features with 11 fields and geometry type Multi Polygon.
# Check the new geopackage file was created
list.files(pattern="gpkg")
## [1] "COL_cities.gpkg" "depto_cordoba.gpkg"
## [3] "mun_cordoba.gpkg" "mun_stder.gpkg"
## [5] "municipios.gpkg" "new_cordoba_cities.gpkg"
## [7] "soc_stder.gpkg" "stader_frutales.gpkg"
## [9] "stder_munic.gpkg"
# read the *Municipios* geopackage as an sf object
colombia2 <- st_read("./municipios.gpkg")
## Reading layer `municipios' from data source
## `/Users/ials/Documents/unal/GB2024/data/municipios.gpkg' using driver `GPKG'
## Simple feature collection with 1121 features and 11 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -81.73562 ymin: -4.229406 xmax: -66.84722 ymax: 13.39473
## Geodetic CRS: MAGNA-SIRGAS
Select By Attributes tool in GIS software allows us to provide an SQL query expression to select features that match the selection criteria.
sf & dplyr on the other hand offers quite easy and straightforward options to perform similar operations. Let’s examine one of them.
(santander <- dplyr::filter(colombia, dpto_cnmbr=="SANTANDER"))
Check the number of features selected. Does it coincide with your previous information about your department?
Then, let’s make a very rudimentary plot the Santander’s municipalities:
plot(st_geometry(santander), col = sf.colors(12, categorical = TRUE), border = 'grey',
axes = TRUE)
plot(st_geometry(st_centroid(santander)), pch = 3, col = 'red', add = TRUE)
Note that we plotted boundaries of municipios as well as their centroids.
Now, we can write the selected features into a new geopackage file for later use:
st_write(santander, "stder_munic.gpkg", driver = "GPKG", append = F)
## Deleting layer `stder_munic' using driver `GPKG'
## Writing layer `stder_munic' to data source `stder_munic.gpkg' using driver `GPKG'
## Writing 87 features with 11 fields and geometry type Multi Polygon.
A very common operations when dealing with geodata is the process of selecting features based on a spatial relationship to another object (layer). In GIS software this process is usually performed using the Select by Location tool. This tool offers selecting features based on the following relationships:
intersect: The features in the input layer will be selected if they intersect a selecting feature. This is the default.
within_a_distance: The features in the input layer will be selected if they are within the specified distance (using Euclidean distance) of a selecting feature. Use the search_distance parameter to specify the distance.
contains: The features in the input layer will be selected if they contain a selecting feature.
completely_contains: The features in the input layer will be selected if they completely contain a selecting feature.
within: The features in the input layer will be selected if they are within a selecting feature.
completely_within: The features in the input layer will be selected if they are completely within or contained by a selecting feature.
are_identical_to: The features in the input layer will be selected if they are identical (in geometry) to a selecting feature.
boundary_touches: The features in the input layer will be selected if they have a boundary that touches a selecting feature. When the input features are lines or polygons, the boundary of the input feature can only touch the boundary of the selecting feature, and no part of the input feature can cross the boundary of the selecting feature.
share_a_line_segment_with: The features in the input layer will be selected if they share a line segment with a selecting feature. The input and selecting features must be line or polygon.
crossed_by_the_outline_of: The features in the input layer will be selected if they are crossed by the outline of a selecting feature. The input and selecting features must be lines or polygons. If polygons are used for the input or selecting layer, the polygon’s boundary (line) will be used. Lines that cross at a point will be selected; lines that share a line segment will not be selected.
have_their_center_in: The features in the input layer will be selected if their center falls within a selecting feature. The center of the feature is calculated as follows: for polygon and multipoint, the geometry’s centroid is used; for line input, the geometry’s midpoint is used.
Let’s illustrate the select by location functionality.
First, let’s read a csv file with world cities. Yes, this is the same file we used several weeks ago in this course. Do you remember the CRS of this file?
cities = read.csv('./worldcities/worldcities.csv') %>%
st_as_sf(coords=c("lng","lat"), crs=4326) # remember x=longitude and y=latitude
What is cities?
cities
We may have noted that the CRS of the two datasets (i.e. santander and cities) do not match. Let’s prove it:
st_crs(cities)$epsg
## [1] 4326
st_crs(santander)$epsg
## [1] 4686
Before using any selection by location we need to transform the cities CRS to match the santander CRS. We can do it by using the st_transform function:
# take note of this function sintax
ncities <- st_transform(cities, crs= st_crs(santander))
Now, we are ready to go:
stder_cities <- ncities[santander, , op = st_within]
Did we get what we expected?
Let’s check it using a rudimentary plot again:
plot(st_geometry(santander), col = sf.colors(12, categorical = TRUE), border = 'grey', axes = TRUE)
plot(st_geometry(stder_cities), pch = 20, col = 'red', add = TRUE)
Note the variety of symbols that can be used for representing point geometries:
We’ll use ggplot() and call geom_sf() which tells R that we’re working with spatial data. When making a map, it’s also crucial to consider the order of our layers. If we place our cities data (stder_cities) before the santander municipalities polygon (santander), the cities will end up under the municipalities polygon and we won’t be able to see them.
ggplot() +
#Add municipalities
geom_sf(data = santander) +
#Add cities layer
geom_sf(data = stder_cities, aes(color = city, label = city), size = 3) +
#Add titles
labs(x = "Longitud", y = "Latitud", title = "Ciudades de Santander") +
#Add theme
theme_bw()
Now, we’ll add some custom elements to our map, like a north arrow and a scale bar, using the ggspatial R package.
ggplot() +
#Crop Virginia boundary to spatial extent of cities and add Virginia layer
geom_sf(data = santander) +
#Add cities layer
geom_sf(data = stder_cities, aes(color = city, label = city), size = 3) +
#Add scale bar to bottom left from ggspatial
annotation_scale(location = "tr", height = unit(.25, "cm"),
width = unit(1, "cm"), pad_x = unit(0.3, "in"),
pad_y = unit(0.5, "in")) +
#Add north arrow to bottom left from ggspatial
annotation_north_arrow(height = unit(1, "cm"), width = unit(1, "cm"),
which_north = "true", location = "tr",
pad_x = unit(0.5, "in"), pad_y = unit(0.05, "in")) +
#Add titles
labs(x = "Longitud", y = "Latitud", title = "Ciudades de Santander") +
#Add theme
theme_bw()
If you got similar results to the ones above displayed, congratulations!!!
sessionInfo()
## R version 4.5.0 (2025-04-11)
## Platform: x86_64-apple-darwin20
## Running under: macOS Sonoma 14.7.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/Bogota
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggspatial_1.1.9 ggplot2_3.5.2 dplyr_1.1.4 sf_1.0-21
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.6 jsonlite_2.0.0 compiler_4.5.0 tidyselect_1.2.1
## [5] Rcpp_1.0.14 jquerylib_0.1.4 scales_1.4.0 yaml_2.3.10
## [9] fastmap_1.2.0 R6_2.6.1 generics_0.1.4 classInt_0.4-11
## [13] s2_1.1.8 knitr_1.50 tibble_3.2.1 units_0.8-7
## [17] DBI_1.2.3 bslib_0.9.0 pillar_1.10.2 RColorBrewer_1.1-3
## [21] rlang_1.1.6 cachem_1.1.0 xfun_0.52 sass_0.4.10
## [25] cli_3.6.5 withr_3.0.2 magrittr_2.0.3 wk_0.9.4
## [29] class_7.3-23 digest_0.6.37 grid_4.5.0 lifecycle_1.0.4
## [33] vctrs_0.6.5 KernSmooth_2.23-26 proxy_0.4-27 evaluate_1.0.3
## [37] glue_1.8.0 farver_2.1.2 e1071_1.7-16 rmarkdown_2.29
## [41] tools_4.5.0 pkgconfig_2.0.3 htmltools_0.5.8.1