CLASS 06: Spatial Data Operators

In this class we will work with maps and analize the case when some data intersects our maps.

Before to start our project, lets clean our working space.

rm(list = ls())

Now the packages we will use:

library(sf)
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
library(raster)
## Loading required package: sp
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:raster':
## 
##     intersect, select, union
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(spData)
library(ggspatial)
## Warning: package 'ggspatial' was built under R version 4.0.3
library(ggplot2)
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.0.3
## -- Attaching packages -------------------------------------------------------------------------------- tidyverse 1.3.0 --
## v tibble  3.0.3     v purrr   0.3.4
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0
## -- Conflicts ----------------------------------------------------------------------------------- tidyverse_conflicts() --
## x tidyr::extract() masks raster::extract()
## x dplyr::filter()  masks stats::filter()
## x dplyr::lag()     masks stats::lag()
## x dplyr::select()  masks raster::select()

In this example we use projected data for the 16 main regions and 101 points in New Zealand. The following code generates the data base for those objects:

view(nz_height)
data(nz)
nz
## Simple feature collection with 16 features and 6 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 1090144 ymin: 4748537 xmax: 2089533 ymax: 6191874
## projected CRS:  NZGD2000 / New Zealand Transverse Mercator 2000
## First 10 features:
##                 Name Island Land_area Population Median_income Sex_ratio
## 1          Northland  North 12500.561     175500         23400 0.9424532
## 2           Auckland  North  4941.573    1657200         29600 0.9442858
## 3            Waikato  North 23900.036     460100         27900 0.9520500
## 4      Bay of Plenty  North 12071.145     299900         26200 0.9280391
## 5           Gisborne  North  8385.827      48500         24400 0.9349734
## 6        Hawke's Bay  North 14137.524     164000         26100 0.9238375
## 7           Taranaki  North  7254.480     118000         29100 0.9569363
## 8  Manawatu-Wanganui  North 22220.608     234500         25000 0.9387734
## 9         Wellington  North  8048.553     513900         32700 0.9335524
## 10        West Coast  South 23245.456      32400         26900 1.0139072
##                              geom
## 1  MULTIPOLYGON (((1745493 600...
## 2  MULTIPOLYGON (((1803822 590...
## 3  MULTIPOLYGON (((1860345 585...
## 4  MULTIPOLYGON (((2049387 583...
## 5  MULTIPOLYGON (((2024489 567...
## 6  MULTIPOLYGON (((2024489 567...
## 7  MULTIPOLYGON (((1740438 571...
## 8  MULTIPOLYGON (((1866732 566...
## 9  MULTIPOLYGON (((1881590 548...
## 10 MULTIPOLYGON (((1557042 531...

Lets separate only one of those regions, Canterbury, and save its information:

canterbury <- nz %>% 
  filter(Name == "Canterbury")

canterbury_height <- nz_height[canterbury, ]
view(canterbury_height)

Lets observe graphically what information we just created:

ggplot() + geom_sf(data = nz)

ggplot() + geom_sf(data = canterbury_height)

We can make the map more beautiful by using the following code:

ggplot() + geom_sf(data = nz, fill="white") + geom_sf(fill="antiquewhite") +
  geom_sf(data = canterbury, fill="yellow") + geom_sf(data=nz_height, size=3, shape=23, fill="red") +
    xlab("Longitude") + ylab("Latitude") + ggtitle("High points in New Zealand") +
  annotation_north_arrow(location = "br", which_north = "true", pad_x = unit(0.25, "in"), pad_y = unit(0.15, "in"),
                                                                  style = north_arrow_fancy_orienteering) + 
  theme(panel.background = element_rect(fill = "lightblue"))

Disjoint or Intersect elements

If we are interested in find the points that are outside a certain space or polygon, we can use the function st_joint:

nz_height[canterbury, , op = st_disjoint]
## Simple feature collection with 31 features and 2 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 1204143 ymin: 5048309 xmax: 1822492 ymax: 5650492
## projected CRS:  NZGD2000 / New Zealand Transverse Mercator 2000
## First 10 features:
##    t50_fid elevation                geometry
## 1  2353944      2723 POINT (1204143 5049971)
## 2  2354404      2820 POINT (1234725 5048309)
## 3  2354405      2830 POINT (1235915 5048745)
## 4  2369113      3033 POINT (1259702 5076570)
## 12 2363996      2759 POINT (1373264 5175442)
## 25 2364028      2756 POINT (1374183 5177165)
## 26 2364029      2800 POINT (1374469 5176966)
## 27 2364031      2788 POINT (1375422 5177253)
## 46 2364166      2782 POINT (1383006 5181085)
## 47 2364167      2905 POINT (1383486 5181270)
nz_height[canterbury, 2, op = st_disjoint]
## Simple feature collection with 31 features and 1 field
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 1204143 ymin: 5048309 xmax: 1822492 ymax: 5650492
## projected CRS:  NZGD2000 / New Zealand Transverse Mercator 2000
## First 10 features:
##    elevation                geometry
## 1       2723 POINT (1204143 5049971)
## 2       2820 POINT (1234725 5048309)
## 3       2830 POINT (1235915 5048745)
## 4       3033 POINT (1259702 5076570)
## 12      2759 POINT (1373264 5175442)
## 25      2756 POINT (1374183 5177165)
## 26      2800 POINT (1374469 5176966)
## 27      2788 POINT (1375422 5177253)
## 46      2782 POINT (1383006 5181085)
## 47      2905 POINT (1383486 5181270)

Now lets identify and safe the information for those points that intersects Canterbury, we can use the following code:

sel_sgbp <- st_intersects(x = nz_height, y = canterbury)  

In the previous code, an object of type sbgp is created, i.e. a sparse geometry binary predicate. Also, we create a list of length x converted into a logical vector sel_logical. The function lengths() identifies which features in nz_height intersect with any objects in y.

class(sel_sgbp)
## [1] "sgbp" "list"
sel_logical <- lengths(sel_sgbp) > 0
canterbury_height2 <- nz_height[sel_logical, ]

Of course, we can get the same results by using pipes and the function filter():

canterbury_height3 <- nz_height %>%
  filter(st_intersects(x = ., y = canterbury, sparse = FALSE))

Graphically what we obtain is:

ggplot() + geom_sf(data = nz, fill="white") + geom_sf(fill="antiquewhite") +
  geom_sf(data = canterbury, fill="yellow") + geom_sf(data=canterbury_height, size=3, shape=23, fill="red") +
  xlab("Longitude") + ylab("Latitude") + ggtitle("High points in Canterburry / New Zealand") +
  annotation_north_arrow(location = "br", which_north = "true", pad_x = unit(0.25, "in"), pad_y = unit(0.15, "in"),
                         style = north_arrow_fancy_orienteering) + 
  theme(panel.background = element_rect(fill = "lightblue"))

Topologic relations

It describes the spatial relationships between objects. In order to understand in detail what we are doing, lets create some spatial objects, a polygon (a), a line (l), and some points (p).

# Polygon
p_poly <- st_polygon(list(rbind(c(-1, -1), c(1, -1), c(1, 1), c(-1, -1))))
p <- st_sfc(p_poly)

# Line
l_line <- st_linestring(x = matrix(c(-1, -1, -0.5, 1), ncol = 2))
l <- st_sfc(l_line)

# Points
a_matrix <- matrix(c(0.5, 1, -1, 0, 0, 1, 0.5, 1), ncol = 2)
a_multi <- st_multipoint(x = a_matrix)
a <- st_cast(st_sfc(a_multi), "POINT")

Lets observe graphically what we created

plot(p, axes = TRUE, lwd=2, bg="antiquewhite") 
plot(l, add=TRUE, lwd=2)
plot(a, add=TRUE, cex=2, fill = "red", pch=21, bg="red")

Lets identify some relationships between two polygons:

st_intersects(a, p)
## Sparse geometry binary predicate list of length 4, where the predicate was `intersects'
##  1: 1
##  2: 1
##  3: (empty)
##  4: (empty)
st_intersects(a, p, sparse = FALSE)
##       [,1]
## [1,]  TRUE
## [2,]  TRUE
## [3,] FALSE
## [4,] FALSE
st_disjoint(a, p, sparse = FALSE)[,1]  #[,1] it changes the output into a vector
## [1] FALSE FALSE  TRUE  TRUE
st_within(a, p, sparse = FALSE)[, 1]
## [1]  TRUE FALSE FALSE FALSE
st_touches(a, p, sparse = FALSE)[, 1]
## [1] FALSE  TRUE FALSE FALSE

Lets imagine that we want to find those points that do not touch the polygon we are interesed in but almost touch it. For that purpose, we use the command st_is_within_distance(), which includes the dist option, which give as a relative reference of how close can be a point.

# Lets check if the points "almost" touch the polygon rather than just touch, we set the distance.
sel <- st_is_within_distance(a, p, dist = 0.9) # can only return a sparse matrix
lengths(sel) > 0
## [1]  TRUE  TRUE FALSE  TRUE

Spatial Joins

For this purpose, imagine you have points randomly distributed across the Earth’s surface. Lets compare with World map information:

set.seed(2020)                # set seed for reproducibility
(bb_world <- st_bbox(world))  # the world's bounds
##       xmin       ymin       xmax       ymax 
## -180.00000  -90.00000  180.00000   83.64513
random_df <- tibble(
  x = runif(n = 100, min = bb_world[1], max = bb_world[3]),
  y = runif(n = 100, min = bb_world[2], max = bb_world[4])
)
random_points <- random_df %>% 
  st_as_sf(coords = c("x", "y")) %>%  # set coordinates
  st_set_crs(4326)                    # set geographic CRS

Now we identify the joined set:

world_random <- world[random_points, ]
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
nrow(world_random)
## [1] 17
random_joined <- st_join(random_points, world["name_long"])
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
random_joined2 <- st_join(random_points, world["name_long"], left = FALSE)
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
## although coordinates are longitude/latitude, st_intersects assumes that they are planar

Graphycally we have:

# graphic
ggplot() + geom_sf(data=world, fill="white") 

ggplot() + geom_sf(data=world, fill="white") + geom_sf(data=random_joined, col="red") + 
  theme(panel.background = element_rect(fill = "lightblue")) + ggtitle("World Map")

ggplot() + geom_sf(data=world, fill="white") + geom_sf(data=random_joined2, col="red") + 
  theme(panel.background = element_rect(fill = "lightblue"))

Spatial data aggregation

Aggregated data show some statistics about a variable related to some kind of grouping variable. By using the New Zealand example, imagine we want to find out the average height of high points in each region:

view(nz)
nz_avheight <- aggregate(x = nz_height, by = nz, FUN = mean)

Graphically:

ggplot() +  geom_sf(data = nz_avheight) +
  theme(panel.background = element_rect(fill = "lightblue")) 

Same results we can get by using pipes:

nz_avheight2 <- nz %>%
  st_join(nz_height) %>%
  group_by(Name) %>%
  summarize(elevation = mean(elevation, na.rm = TRUE))
## `summarise()` ungrouping output (override with `.groups` argument)
ggplot(nz_avheight2) + geom_sf(aes(fill = elevation)) + theme(panel.background = element_rect(fill = "lightblue")) + 
  scale_fill_stepsn(colors = c('#efedf5','#bcbddc','#756bb1'))

table(nz_avheight2$elevation)
## 
##             2720             2723 2734.33333333333             2777 
##                1                1                1                1 
##             2825 2889.45454545455           2994.6 
##                1                1                1