if(!require(sf)) install.packages("sf") & require(sf)
if(!require(tmap)) install.packages("tmap") & require(tmap)
Spatial Joins and Aggregation
This document explains a standard approach to performing a spatial join between two geographic datasets in R. The process depends on the sf
package. Additionally, aggregation is also explained following the application of a spatial join.
1 Spatial joins
You are faced with two geographical data sets that can fall within the same map space. Your instincts tell you that it is a good idea to merge (join) these data sets together but you do not have a mutual field. Typically this scenario occurs when you want points to inherit the properties of polygons (and vice versa). What do you do? The answer is that you create a spatial join. This is easily achieved using the st_join(...)
function in R. Here’s how it works using the Birth Weight data set featured in various formative assignments.
1.1 Essential packages
For this process, the packages we’ll need are as follows:
1.2 Data
For the sake of demonstration, I have uploaded the birth weight data to my GitHub page, alongside the Shapefiles for Welsh regions it falls within. They can be loaded using the following code:
<- read.csv("https://raw.githubusercontent.com/Richtea84/I2Q-files/main/BIRTHS_NO68.csv")
BIRTHS_DATA <- read_sf("/vsicurl/https://github.com/Richtea84/I2Q-files/raw/main/WALES_clipped/WALES_clipped.shp") POLY_REGIONS
Upon initial inspection, we find that BIRTHS_DATA
has 4536 and the POLY_REGIONS
, 8. Already, we can see that using the merge(...)
function will result in a mismatch error. Let’s perform the spatial join.
1.3 Converting the CSV file into a mappable format
To convert the CSV file into mappable data, we use the st_as_sf(...)
function. Here’s the process:
<- st_as_sf(BIRTHS_DATA, coords = c("LONGITUDE","LATITUDE")) BIRTHS_MAP
Here, the Longitude and Latitude fields are used to match native projections for the POLY_REGIONS data (Word Geodetic System 1984, aka WGS 84). However, we still need to tell R what the coordinate reference system is. We do this by using the st_crs(...)
function in the following way:
st_crs(BIRTHS_MAP) <- 4326
The number 4326 is the European Petroleum Survey Group’s (EPSG) reference number for the WGS 84 projection. In the past we had to insert the full PROJ.4 string. Older packages like sp
still use this convention. For reference, you can retrieve the value +proj=longlat +datum=WGS84 +no_defs +type=crs
in the following way:
st_crs(4326)$proj4string
[1] "+proj=longlat +datum=WGS84 +no_defs"
In any case, sf
readily projects your data from the EPSG vector value.
To ensure that both your data sets are using the same projection, you can use the st_crs(...)
function in the following way:
st_crs(BIRTHS_MAP) <- st_crs(POLY_REGIONS)
We can produce a quick map to see whether the data sets are aligned (Figure 1).
tm_shape(POLY_REGIONS)+
tm_polygons()+
tm_shape(BIRTHS_MAP)+
tm_dots()+
tm_compass()+
tm_scale_bar()
1.4 The joining process
With the geometric properties of data sets set to the same projection, you are now able to perform the join. This is achieved using the st_join(...)
function. If you want your points to inherit the properties of the polygons, specify the polygons as your x =
points and the y =
as your polygons:
<- st_join(x = BIRTHS_MAP, y = POLY_REGIONS)
PTS_MODIFIED
# verify
nrow(PTS_MODIFIED)
[1] 4536
Under some circumstances, you may find it useful for the polygons to inherit the properties of the points. To achieve, you’ll simply reverse the join. For the data featured in this tutorial, this will result in duplications of polygons. For now, we’ll leave this process to one side and carry on with the PTS_MODIFIED
object.
We can verify our join by preparing another simple map (Figure 2).
#
tm_shape(PTS_MODIFIED)+
tm_dots("name")
2 Aggregation
Now that we’ve joined the spatial data, we can apply the aggregate(...)
function to extract further insights from the data based on the properties from the polygons. For example, we can examine the number of recorded births per region (see code below and the result in Table 1).
# Create a count field
$Count = 1
PTS_MODIFIED
# Aggregate (presented as a table)
::kable(
knitraggregate(Count ~ name, data = PTS_MODIFIED, FUN = sum)
)
name | Count |
---|---|
Caerphilly | 58 |
Cardiff Central | 925 |
Cardiff North | 1330 |
Cardiff South and Penarth | 717 |
Cardiff West | 861 |
Newport West | 56 |
Vale of Glamorgan | 586 |
The region with the most recorded births is Cardiff North.
If we wanted to calculate the average birth weight by region, the aggregate(...)
function can be applied in the following way (results shown in Table 2).
::kable(
knitraggregate(Bweight ~ name, data = PTS_MODIFIED, FUN = mean)
)
name | Bweight |
---|---|
Caerphilly | 3.333276 |
Cardiff Central | 3.262681 |
Cardiff North | 3.375639 |
Cardiff South and Penarth | 3.397782 |
Cardiff West | 3.241963 |
Newport West | 3.456250 |
Vale of Glamorgan | 3.146450 |
The region with the heaviest births is Newport West.