sf
Packagesf
PackageThe sf
package is the modern standard in R for working
with spatial vector data. It represents spatial
features (points, lines, polygons) in a special kind of data frame. This
makes it intuitive to use if you are already familiar with
dplyr
and the tidyverse.
An sf
object is essentially a data.frame
with one crucial addition: a special list-column, usually named
geometry
, that holds the spatial information for each
row.
We use the st_read()
function to load vector data from a
file, such as a shapefile (.shp
). Let’s load the
administrative boundaries for Somalia’s regions. You can download this
data from the Humanitarian
Data Exchange (HDX).
library(sf)
library(ggplot2)
library(dplyr)
# Load the shapefile for Somalia's administrative regions (level 1)
# Make sure the file "som_admbnda_adm1_ocha_20250108.shp" is in your working directory
som_adm1 <- st_read("som_admbnda_adm1_ocha_20250108.shp", quiet = TRUE)
# Check the class of the object
class(som_adm1)
## [1] "sf" "data.frame"
## Simple feature collection with 18 features and 10 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 40.99488 ymin: -1.664897 xmax: 51.41303 ymax: 11.9852
## Geodetic CRS: WGS 84
## First 10 features:
## ADM1_EN ADM1_PCODE ADM0_EN ADM0_PCODE date validOn validTo
## 1 Awdal SO11 Somalia SO 2024-12-04 2025-01-08 <NA>
## 2 Bakool SO25 Somalia SO 2024-12-04 2025-01-08 <NA>
## 3 Banadir SO22 Somalia SO 2024-12-04 2025-01-08 <NA>
## 4 Bari SO16 Somalia SO 2024-12-04 2025-01-08 <NA>
## 5 Bay SO24 Somalia SO 2024-12-04 2025-01-08 <NA>
## 6 Galgaduud SO19 Somalia SO 2024-12-04 2025-01-08 <NA>
## 7 Gedo SO26 Somalia SO 2024-12-04 2025-01-08 <NA>
## 8 Hiraan SO20 Somalia SO 2024-12-04 2025-01-08 <NA>
## 9 Lower Juba SO28 Somalia SO 2024-12-04 2025-01-08 <NA>
## 10 Lower Shabelle SO23 Somalia SO 2024-12-04 2025-01-08 <NA>
## Shape_Leng Shape_Area AREA_SQKM geometry
## 1 5.657071 1.31222441 15884.8143 MULTIPOLYGON (((43.46189 11...
## 2 6.650225 2.10138869 25797.6141 MULTIPOLYGON (((44.03028 4....
## 3 1.068762 0.02661158 327.3517 MULTIPOLYGON (((45.55389 2....
## 4 12.218762 5.61806371 68077.0071 MULTIPOLYGON (((50.79877 11...
## 5 8.086913 3.57292945 43931.5260 MULTIPOLYGON (((44.3111 3.5...
## 6 9.101881 4.01935108 49279.5284 MULTIPOLYGON (((47.07738 6....
## 7 9.042089 3.66528190 45056.5014 MULTIPOLYGON (((42.88542 4....
## 8 7.046882 2.75764693 33853.5458 MULTIPOLYGON (((45.71684 5....
## 9 11.338459 3.89035579 47883.1992 MULTIPOLYGON (((41.9267 -1....
## 10 8.115243 2.08196012 25611.8031 MULTIPOLYGON (((45.33112 2....
The output shows us key information: - Geometry
type: MULTIPOLYGON (since regions are areas) -
Dimension: XY (2D data) - Bounding
box: The geographic extent of the data - CRS:
The Coordinate Reference System (WGS 84) - Attributes:
A standard data frame with columns like ADM1_EN
(region
name), ADM0_EN
(country name), etc. - geometry
column: The special column holding the polygon data.
We can quickly plot the object to see the map.
sf
ObjectsSince an sf
object is like a data frame, we can use
standard dplyr
or base R methods to filter and select
data.
Example: Isolate the Banadir region (where Mogadishu is).
banadir <- som_adm1 %>%
filter(ADM1_EN == "Banadir")
# Plot just the Banadir region
ggplot() +
geom_sf(data = banadir, fill = "lightblue", color = "black") +
ggtitle("Map of Banaadir Region") +
theme_bw()
Example: Isolate the regions of Somaliland. Somaliland’s regions in this dataset are Awdal, Woqooyi Galbeed, Togdheer, Sanaag, and Sool.
somaliland_regions <- som_adm1 %>%
filter(ADM1_EN %in% c("Awdal", "Woqooyi Galbeed", "Togdheer", "Sanaag", "Sool"))
# Plot Somaliland regions
ggplot() +
geom_sf(data = somaliland_regions, aes(fill = ADM1_EN)) +
ggtitle("Regions of Somaliland") +
theme_bw() +
labs(fill = "Region Name")
A very common task is joining non-spatial data (like statistics) to a
spatial object for mapping. Let’s create a dummy dataset of Internally
Displaced Person (IDP) counts for a few regions and join it to our
som_adm1
map.
# 1. Create a simple data frame with IDP data
idp_data <- data.frame(
region_name = c("Banaadir", "Lower Shabelle", "Gedo", "Woqooyi Galbeed"),
idp_count = c(500000, 120000, 85000, 95000)
)
# 2. Join the IDP data to the sf object
# We join by the region name column in both datasets.
som_with_idps <- left_join(som_adm1, idp_data, by = c("ADM1_EN" = "region_name"))
# 3. Map the results (a choropleth map)
ggplot(data = som_with_idps) +
geom_sf(aes(fill = idp_count)) +
scale_fill_viridis_c(name = "IDP Count", na.value = "grey90") +
ggtitle("Estimated IDP Counts by Region in Somalia") +
theme_bw()
Regions without data in our
idp_data
frame are colored
grey. This is a powerful way to visualize spatial patterns in your
data.
sf
Objects from CoordinatesWe can also create our own sf
objects from a table of
coordinates. This is useful for mapping specific locations. Let’s create
an sf
object for the capital cities of Somalia and
Somaliland.
# 1. Create a data frame with city names and coordinates
cities <- data.frame(
name = c("Mogadishu", "Hargeisa"),
longitude = c(45.333333, 44.066667),
latitude = c(2.033333, 9.566667)
)
# 2. Convert the data frame to an sf object
# We specify the coordinate columns and the CRS (4326 is the code for WGS 84)
cities_sf <- st_as_sf(cities, coords = c("longitude", "latitude"), crs = 4326)
class(cities_sf)
## [1] "sf" "data.frame"
## Simple feature collection with 2 features and 1 field
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 44.06667 ymin: 2.033333 xmax: 45.33333 ymax: 9.566667
## Geodetic CRS: WGS 84
## name geometry
## 1 Mogadishu POINT (45.33333 2.033333)
## 2 Hargeisa POINT (44.06667 9.566667)
# 3. Plot the cities on top of the country map
ggplot() +
geom_sf(data = som_adm1) + # Base map
geom_sf(data = cities_sf, color = "red", size = 4) + # City points
ggtitle("Capitals on the Map of Somalia") +
theme_bw()
This lecture covered the fundamentals of reading, manipulating, and
creating vector data with sf
, using real-world examples
from Somalia and Somaliland. ```