Spatial data operations

Disclaimer: The contents of this document come from Chapter 4. Spatial data operations of Geocomputation with `R`(Lovelace, Nowosad, & Muenchow, 2019). This document is prepared for CP6521 Advanced GIS, a graduate-level city planning elective course at Georgia Tech in Spring 2019. For any question, contact the instructor, Yongsung Lee, Ph.D. via yongsung.lee(at)gatech.edu.

This document is also published on RPubs.

if (!require("tidyverse")) install.packages("tidyverse", dependencies = TRUE)
if (!require("stringr")) install.packages("stringr", dependencies = TRUE)

if (!require("sf")) install.packages("sf", dependencies = TRUE)
if (!require("raster")) install.packages("raster", dependencies = TRUE)
if (!require("spData")) install.packages("spData", dependencies = TRUE)
if (!require("devtools")) install.packages("devtools") # for this, you need Rtools installed on your machine 
if (!require("spDataLarge")) devtools::install_github("Nowosad/spDataLarge")

if (!require("sp")) install.packages("sp", dependencies = TRUE)
if (!require("mapview")) install.packages("mapview", dependencies = TRUE)
if (!require("tmap")) install.packages("tmap", dependencies = TRUE)

library(tidyverse)
library(stringr)     # for working with strings (pattern matching)

library(sf)          # classes and functions for vector data
library(raster)      # classes and functions for raster data
library(spData)      # load geographic data
library(spDataLarge) # load larger geographic data

library(sp)
library(mapview)
library(tmap)

library(tidycensus)
library(tigris)
library(lehdr)

Learning objectives

What we do: We focus on vector data.

Learn how to subset an sf object based on its spatial relationships with other sf objects: e.g., st_intersects, st_touches, st_crosses, st_within, and st_disjoint. For a full set of available spatial operations, see sf documentation
Learn about the way sgbp(Sparse Geometry Binary Predicate) works behind the scene.
Learn how to spatially join two sf objects by st_join. See spatial left or inner join
Leanr how to calculate distances between features from different sf objects by st_distance.

4.1 Introduction

Unlike its non-spatial counterpart, spatial data processing requires to specify spatial relationships among features: e.g., one feature is completely within, crosses, touches, or is within a certain distance from another feature.
Note that for spatial data processing, input data sets should have the same (projected) coordinate reference system.

4.2 Spatial operations on vector data

4.2.1 Spatial subsetting

By [ (base R way) or filter() (tidyverse way)
- epsg 2193

# first, create a sf polygon object  
canterbury <- nz %>% filter(Name == "Canterbury") # subsetting by an attribute    

# second, use the just created sf object to subset a sf point object 
canterbury_height <- nz_height[canterbury, ]      # subsetting points that fall in a polygon

Note that Name == "canterbury" creates a logical vector on the fly, which is then used to subset nz.

Similarly, [Canterbury, ] works as if it’s a logical vector, in which TRUE indicates given points located inside the polygon and FALSE indicates the opposite cases.

Spatial relations are mnany, and the default is st_intersects, which includes st_touches, st_crosses, or st_within. The opposite operation is st_disjoint. The below scripts show how to specify different operations.

nz_height[canterbury, , op = st_disjoint]

Here is an in depth explanation on the way R conducts spatial data operations by sgbp(Sparse Geometry Binary Predicate).

sel_sgbp <- st_intersects(x = nz_height, y = canterbury)

sel_sgbp          # check the first 10 elements 
class(sel_sgbp)
typeof(sel_sgbp) 
length(sel_sgbp)  # the number of elements, 101, the same as the number of highest points 
lengths(sel_sgbp) # vectorized operation, the length of individual list elements 

sel_logical <- lengths(sel_sgbp) > 0           # TRUE or 1 indicates given points inside Canterbery 
canterbury_height2 <- nz_height[sel_logical, ] # subsetting by a logical vector, sort of attribute subsetting   

sel_logical2 <- st_intersects(x = nz_height, y = canterbury, sparse = FALSE)      # dense matrix 
class(sel_logical2)
typeof(sel_logical2)
dim(sel_logical2)

sel_logical3 <- st_intersects(x = nz_height, y = canterbury, sparse = FALSE)[, 1] # only the first column 
class(sel_logical3)
typeof(sel_logical3)
length(sel_logical3)

canterbury_height3 <- nz_height %>%
  filter(st_intersects(x = ., y = canterbury, sparse = FALSE)) # . placeholder for the input: i.e., nz_height here

4.2.2 Topological relations

A simple example that helps understand topological relations between sf objects. The main question here is:

What are the main differences among st_intersects, st_disjoint, st_within, st_touches, and st_is_within?

The below scripts create polygon, line, and multipoint sf objects, which are shown in the below figure.

# create a polygon
a_poly = st_polygon(list(rbind(c(-1, -1), c(1, -1), c(1, 1), c(-1, -1)))) # starting/ending points are the same 
a = st_sfc(a_poly)
# create a line
l_line = st_linestring(x = matrix(c(-1, -1, -0.5, 1), ncol = 2))
l = st_sfc(l_line)
# create points
p_matrix = matrix(c(0.5, 1, -1, 0, 0, 1, 0.5, 1), ncol = 2)
p_multi = st_multipoint(x = p_matrix)
p = st_cast(st_sfc(p_multi), "POINT")

st_intersects(p, a)
st_intersects(p, a, sparse = FALSE)
st_disjoint(p, a, sparse = FALSE)[, 1]
st_within(p, a, sparse = FALSE)[, 1]
st_touches(p, a, sparse = FALSE)[, 1]
sel = st_is_within_distance(p, a, dist = 0.9) # can only return a sparse matrix
lengths(sel) > 0

4.2.3 Spatial joining

epsg 4326

set.seed(2018)               # set seed for reproducibility
(bb_world = st_bbox(world))  # the world's bounds: c(xmin, ymin, xmax, ymax)

random_df = tibble(
  x = runif(n = 10, min = bb_world[1], max = bb_world[3]),
  y = runif(n = 10, min = bb_world[2], max = bb_world[4])
)

# Note that random_points has no attributes 
random_points = random_df %>% 
  st_as_sf(coords = c("x", "y")) %>% # set coordinates
  st_set_crs(4326)                   # set geographic CRS 

world_random = world[random_points, ] # spatially subset polygons if they have any point falling inside them.   
nrow(world_random)                    # number of polygons with successful spatial joining 

random_joined = st_join(random_points, world["name_long"]) # spatially join name_long from world to random_points

4.2.4 Non-overlapping joins

A useful reminder of the st_geometry function from RDocumentation: Click here.
A related interesting discussion is here.
For the Spatial reference list: Search with keywords like “NAD83 Georgia”

plot(st_geometry(cycle_hire), col = "blue")
plot(st_geometry(cycle_hire_osm), add = TRUE, pch = 3, col = "red")

tmap_mode("view")
tm_basemap("Stamen.Watercolor") +
  tm_shape(cycle_hire) + 
  tm_symbols(col = "red", shape = 16, size = 0.5, alpha = .5) + 
  tm_shape(cycle_hire_osm) +
  tm_symbols(col = "blue", shape = 16, size = 0.5, alpha = .5) + 
  tm_tiles("Stamen.TonerLabels")

tmap_mode("plot")

epsg 27700

class(st_touches(cycle_hire, cycle_hire_osm, sparse = FALSE)) # dense matrix 
dim(st_touches(cycle_hire, cycle_hire_osm, sparse = FALSE))   # rows- first input, cols - second input
sum(st_touches(cycle_hire, cycle_hire_osm, sparse = FALSE))   # coerce TRUE to 1 and FALSE to 0 
any(st_touches(cycle_hire, cycle_hire_osm, sparse = FALSE))   # if any element is TRUE

cycle_hire_P = st_transform(cycle_hire, 27700)         # projected with epsg = 27700 
cycle_hire_osm_P = st_transform(cycle_hire_osm, 27700) # projected with epsg = 27700 
sel = st_is_within_distance(cycle_hire_P, cycle_hire_osm_P, dist = 20)  # sparse matrix 
summary(lengths(sel) > 0) # for individual cycle_hire_P, how many of them have any cycle_hire_osm_P point in 20 meters?  
sum(lengths(sel) > 0)
mean(lengths(sel) > 0)

z = st_join(cycle_hire_P, cycle_hire_osm_P, st_is_within_distance, dist = 20) # 3rd argument specifies the spatial relation between two inputs
nrow(cycle_hire)
nrow(z) # st_join of two point sf objects creates the pairs of matched points 
n_distinct(z$id) # some cycle_hire points repeat b/c they more than one point from cycle_hire_osm 

z = z %>% 
  group_by(id) %>% 
  summarize(capacity = mean(capacity))
nrow(z) == nrow(cycle_hire)

plot(cycle_hire_osm["capacity"])
plot(z["capacity"])

4.2.5 Spatial data aggregation

First example: Spatial join of polygons and points

nz_height # 101 features
# any(is.na(nz_height$elevation)) # no point with missing elevation 
nz        #  16 features 
nz_avheight = aggregate(x = nz_height, by = nz, FUN = mean)  # 16 features 

nz %>%
  st_join(nz_height) %>%
  group_by(Name) %>%
  summarize(n = sum(is.na(t50_fid) == FALSE)) %>% # count valid point id's per region
  filter(n == 0) # nine out of 16 regions have no highest points 

nz_avheight2 = nz %>%
  st_join(nz_height) %>%
  group_by(Name) %>%
  summarize(elevation = mean(elevation, na.rm = TRUE)) # na.rm = TRUE for those regions with no point

Second example: Spatial join of spatially incongruent polygons: skip for now.

agg_aw = st_interpolate_aw(incongruent[, "value"], aggregating_zones,
                           extensive = TRUE)
agg_aw$value

4.2.6 Distance relations

# Check whether input data sets are projected 
st_crs(nz_height)
st_crs(nz) # the original source of the centerbury sf object 
st_crs(canterbury) 

nz_heighest = nz_height %>% top_n(n = 1, wt = elevation)
canterbury_centroid = st_centroid(canterbury)
st_distance(nz_heighest, canterbury_centroid) # returns a matrix for every possible pair of two points 

co = filter(nz, grepl("Canter|Otag", Name))
st_distance(nz_height[1:3, ], co) # if a point is within a polygon, its distance to the polygon is zero 

plot(st_geometry(nz), col = "grey") 
plot(st_geometry(co)[2], col = "white", add = TRUE) # only Otago 
plot(st_geometry(nz_height)[1:3], pch = 4, lwd = 0.5, col = "blue", add = TRUE)

plot(st_geometry(co)[2], col = "white") # only Otago 
plot(st_geometry(nz_height)[1:3], pch = 4, lwd = 0.5, col = "blue", add = TRUE)
plot(st_geometry(nz_height)[2:3], pch = 3, lwd = 1.5, col = "red", add = TRUE)

4.3 Spatial operations on raster data

Skip

4.4 Exercises

It was established in Section 4.2 that Canterbury was the region of New Zealand containing most of the 100 highest points in the country. How many of these high points does the Canterbury region contain?
Which region has the second highest number of nz_height points in, and how many does it have?
Generalizing the question to all regions: how many of New Zealand’s 16 regions contain points which belong to the top 100 highest points in the country? Which regions?

Bonus: create a table listing these regions in order of the number of points and their name.

Use data(dem, package = "RQGIS"), and reclassify the elevation in three classes: low, medium and high. Secondly, attach the NDVI raster (data(ndvi, package = "RQGIS")) and compute the mean NDVI and the mean elevation for each altitudinal class.
Apply a line detection filter to raster(system.file("external/rlogo.grd", package = "raster")). Plot the result. Hint: Read ?raster::focal().
Calculate the NDVI of a Landsat image. Use the Landsat image provided by the spDataLarge package (system.file("raster/landsat.tif", package="spDataLarge")).
A StackOverflow post shows how to compute distances to the nearest coastline using raster::distance(). Retrieve a digital elevation model of Spain, and compute a raster which represents distances to the coast across the country (hint: use getData()). Second, use a simple approach to weight the distance raster with elevation (other weighting approaches are possible, include flow direction and steepness); every 100 altitudinal meters should increase the distance to the coast by 10 km. Finally, compute the difference between the raster using the Euclidean distance and the raster weighted by elevation. Note: it may be wise to increase the cell size of the input raster to reduce compute time during this operation.