R(Lovelace, Nowosad, & Muenchow, 2019). This document is prepared for CP6521 Advanced GIS, a graduate-level city planning elective course at Georgia Tech in Spring 2019. For any question, contact the instructor, Yongsung Lee, Ph.D. via yongsung.lee(at)gatech.edu.if (!require("tidyverse")) install.packages("tidyverse", dependencies = TRUE)
if (!require("stringr")) install.packages("stringr", dependencies = TRUE)
if (!require("sf")) install.packages("sf", dependencies = TRUE)
if (!require("raster")) install.packages("raster", dependencies = TRUE)
if (!require("spData")) install.packages("spData", dependencies = TRUE)
if (!require("devtools")) install.packages("devtools") # for this, you need Rtools installed on your machine
if (!require("spDataLarge")) devtools::install_github("Nowosad/spDataLarge")
if (!require("sp")) install.packages("sp", dependencies = TRUE)
if (!require("mapview")) install.packages("mapview", dependencies = TRUE)
if (!require("tmap")) install.packages("tmap", dependencies = TRUE)
library(tidyverse)
library(stringr) # for working with strings (pattern matching)
library(sf) # classes and functions for vector data
library(raster) # classes and functions for raster data
library(spData) # load geographic data
library(spDataLarge) # load larger geographic data
library(sp)
library(mapview)
library(tmap)
library(tidycensus)
library(tigris)
library(lehdr)
What we do: We focus on vector data.
Learn how to subset an sf object based on its spatial relationships with other sf objects: e.g., st_intersects, st_touches, st_crosses, st_within, and st_disjoint. For a full set of available spatial operations, see sf documentation
Learn about the way sgbp(Sparse Geometry Binary Predicate) works behind the scene.
Learn how to spatially join two sf objects by st_join. See spatial left or inner join
Leanr how to calculate distances between features from different sf objects by st_distance.
Unlike its non-spatial counterpart, spatial data processing requires to specify spatial relationships among features: e.g., one feature is completely within, crosses, touches, or is within a certain distance from another feature.
Note that for spatial data processing, input data sets should have the same (projected) coordinate reference system.
By [ (base R way) or filter() (tidyverse way)
- epsg 2193
# first, create a sf polygon object
canterbury <- nz %>% filter(Name == "Canterbury") # subsetting by an attribute
# second, use the just created sf object to subset a sf point object
canterbury_height <- nz_height[canterbury, ] # subsetting points that fall in a polygon
Note that Name == "canterbury" creates a logical vector on the fly, which is then used to subset nz.
Similarly, [Canterbury, ] works as if it’s a logical vector, in which TRUE indicates given points located inside the polygon and FALSE indicates the opposite cases.
Spatial relations are mnany, and the default is st_intersects, which includes st_touches, st_crosses, or st_within. The opposite operation is st_disjoint. The below scripts show how to specify different operations.
nz_height[canterbury, , op = st_disjoint]
Here is an in depth explanation on the way R conducts spatial data operations by sgbp(Sparse Geometry Binary Predicate).
sel_sgbp <- st_intersects(x = nz_height, y = canterbury)
sel_sgbp # check the first 10 elements
class(sel_sgbp)
typeof(sel_sgbp)
length(sel_sgbp) # the number of elements, 101, the same as the number of highest points
lengths(sel_sgbp) # vectorized operation, the length of individual list elements
sel_logical <- lengths(sel_sgbp) > 0 # TRUE or 1 indicates given points inside Canterbery
canterbury_height2 <- nz_height[sel_logical, ] # subsetting by a logical vector, sort of attribute subsetting
sel_logical2 <- st_intersects(x = nz_height, y = canterbury, sparse = FALSE) # dense matrix
class(sel_logical2)
typeof(sel_logical2)
dim(sel_logical2)
sel_logical3 <- st_intersects(x = nz_height, y = canterbury, sparse = FALSE)[, 1] # only the first column
class(sel_logical3)
typeof(sel_logical3)
length(sel_logical3)
canterbury_height3 <- nz_height %>%
filter(st_intersects(x = ., y = canterbury, sparse = FALSE)) # . placeholder for the input: i.e., nz_height here
A simple example that helps understand topological relations between sf objects. The main question here is:
st_intersects, st_disjoint, st_within, st_touches, and st_is_within?The below scripts create polygon, line, and multipoint sf objects, which are shown in the below figure.
# create a polygon
a_poly = st_polygon(list(rbind(c(-1, -1), c(1, -1), c(1, 1), c(-1, -1)))) # starting/ending points are the same
a = st_sfc(a_poly)
# create a line
l_line = st_linestring(x = matrix(c(-1, -1, -0.5, 1), ncol = 2))
l = st_sfc(l_line)
# create points
p_matrix = matrix(c(0.5, 1, -1, 0, 0, 1, 0.5, 1), ncol = 2)
p_multi = st_multipoint(x = p_matrix)
p = st_cast(st_sfc(p_multi), "POINT")
st_intersects(p, a)
st_intersects(p, a, sparse = FALSE)
st_disjoint(p, a, sparse = FALSE)[, 1]
st_within(p, a, sparse = FALSE)[, 1]
st_touches(p, a, sparse = FALSE)[, 1]
sel = st_is_within_distance(p, a, dist = 0.9) # can only return a sparse matrix
lengths(sel) > 0
set.seed(2018) # set seed for reproducibility
(bb_world = st_bbox(world)) # the world's bounds: c(xmin, ymin, xmax, ymax)
random_df = tibble(
x = runif(n = 10, min = bb_world[1], max = bb_world[3]),
y = runif(n = 10, min = bb_world[2], max = bb_world[4])
)
# Note that random_points has no attributes
random_points = random_df %>%
st_as_sf(coords = c("x", "y")) %>% # set coordinates
st_set_crs(4326) # set geographic CRS
world_random = world[random_points, ] # spatially subset polygons if they have any point falling inside them.
nrow(world_random) # number of polygons with successful spatial joining
random_joined = st_join(random_points, world["name_long"]) # spatially join name_long from world to random_points
st_geometry function from RDocumentation: Click here.plot(st_geometry(cycle_hire), col = "blue")
plot(st_geometry(cycle_hire_osm), add = TRUE, pch = 3, col = "red")
tmap_mode("view")
tm_basemap("Stamen.Watercolor") +
tm_shape(cycle_hire) +
tm_symbols(col = "red", shape = 16, size = 0.5, alpha = .5) +
tm_shape(cycle_hire_osm) +
tm_symbols(col = "blue", shape = 16, size = 0.5, alpha = .5) +
tm_tiles("Stamen.TonerLabels")
tmap_mode("plot")
class(st_touches(cycle_hire, cycle_hire_osm, sparse = FALSE)) # dense matrix
dim(st_touches(cycle_hire, cycle_hire_osm, sparse = FALSE)) # rows- first input, cols - second input
sum(st_touches(cycle_hire, cycle_hire_osm, sparse = FALSE)) # coerce TRUE to 1 and FALSE to 0
any(st_touches(cycle_hire, cycle_hire_osm, sparse = FALSE)) # if any element is TRUE
cycle_hire_P = st_transform(cycle_hire, 27700) # projected with epsg = 27700
cycle_hire_osm_P = st_transform(cycle_hire_osm, 27700) # projected with epsg = 27700
sel = st_is_within_distance(cycle_hire_P, cycle_hire_osm_P, dist = 20) # sparse matrix
summary(lengths(sel) > 0) # for individual cycle_hire_P, how many of them have any cycle_hire_osm_P point in 20 meters?
sum(lengths(sel) > 0)
mean(lengths(sel) > 0)
z = st_join(cycle_hire_P, cycle_hire_osm_P, st_is_within_distance, dist = 20) # 3rd argument specifies the spatial relation between two inputs
nrow(cycle_hire)
nrow(z) # st_join of two point sf objects creates the pairs of matched points
n_distinct(z$id) # some cycle_hire points repeat b/c they more than one point from cycle_hire_osm
z = z %>%
group_by(id) %>%
summarize(capacity = mean(capacity))
nrow(z) == nrow(cycle_hire)
plot(cycle_hire_osm["capacity"])
plot(z["capacity"])
First example: Spatial join of polygons and points
nz_height # 101 features
# any(is.na(nz_height$elevation)) # no point with missing elevation
nz # 16 features
nz_avheight = aggregate(x = nz_height, by = nz, FUN = mean) # 16 features
nz %>%
st_join(nz_height) %>%
group_by(Name) %>%
summarize(n = sum(is.na(t50_fid) == FALSE)) %>% # count valid point id's per region
filter(n == 0) # nine out of 16 regions have no highest points
nz_avheight2 = nz %>%
st_join(nz_height) %>%
group_by(Name) %>%
summarize(elevation = mean(elevation, na.rm = TRUE)) # na.rm = TRUE for those regions with no point
Second example: Spatial join of spatially incongruent polygons: skip for now.
agg_aw = st_interpolate_aw(incongruent[, "value"], aggregating_zones,
extensive = TRUE)
agg_aw$value
Related resources:
- Regions/districts in New Zealand
- epsg 2193
- New Zealand Transverse Mercator 2000
- dplor::top_n - grepl on RDocumentation - grepl with examples
# Check whether input data sets are projected
st_crs(nz_height)
st_crs(nz) # the original source of the centerbury sf object
st_crs(canterbury)
nz_heighest = nz_height %>% top_n(n = 1, wt = elevation)
canterbury_centroid = st_centroid(canterbury)
st_distance(nz_heighest, canterbury_centroid) # returns a matrix for every possible pair of two points
co = filter(nz, grepl("Canter|Otag", Name))
st_distance(nz_height[1:3, ], co) # if a point is within a polygon, its distance to the polygon is zero
plot(st_geometry(nz), col = "grey")
plot(st_geometry(co)[2], col = "white", add = TRUE) # only Otago
plot(st_geometry(nz_height)[1:3], pch = 4, lwd = 0.5, col = "blue", add = TRUE)
plot(st_geometry(co)[2], col = "white") # only Otago
plot(st_geometry(nz_height)[1:3], pch = 4, lwd = 0.5, col = "blue", add = TRUE)
plot(st_geometry(nz_height)[2:3], pch = 3, lwd = 1.5, col = "red", add = TRUE)
Skip
It was established in Section 4.2 that Canterbury was the region of New Zealand containing most of the 100 highest points in the country. How many of these high points does the Canterbury region contain?
Which region has the second highest number of nz_height points in, and how many does it have?
Generalizing the question to all regions: how many of New Zealand’s 16 regions contain points which belong to the top 100 highest points in the country? Which regions?
Use data(dem, package = "RQGIS"), and reclassify the elevation in three classes: low, medium and high. Secondly, attach the NDVI raster (data(ndvi, package = "RQGIS")) and compute the mean NDVI and the mean elevation for each altitudinal class.
Apply a line detection filter to raster(system.file("external/rlogo.grd", package = "raster")). Plot the result. Hint: Read ?raster::focal().
Calculate the NDVI of a Landsat image. Use the Landsat image provided by the spDataLarge package (system.file("raster/landsat.tif", package="spDataLarge")).
A StackOverflow post shows how to compute distances to the nearest coastline using raster::distance(). Retrieve a digital elevation model of Spain, and compute a raster which represents distances to the coast across the country (hint: use getData()). Second, use a simple approach to weight the distance raster with elevation (other weighting approaches are possible, include flow direction and steepness); every 100 altitudinal meters should increase the distance to the coast by 10 km. Finally, compute the difference between the raster using the Euclidean distance and the raster weighted by elevation. Note: it may be wise to increase the cell size of the input raster to reduce compute time during this operation.