Disclaimer: The contents of this document come from Chapter 3. Attribute data operations, Chapter 4. Spatial data operations, and Chapter 5. Geometry operations of Geocomputation with R(Lovelace, Nowosad, & Muenchow, 2019). This document is prepared for CP6521 Advanced GIS, a graduate-level city planning elective course at Georgia Tech in Spring 2019. For any question, contact the instructor, Yongsung Lee, Ph.D. via yongsung.lee(at)gatech.edu.
This document is also published on RPubs.
install.packages("tidyverse", dependencies = TRUE)
install.packages("stringr", dependencies = TRUE)
install.packages("sf", dependencies = TRUE)
install.packages("raster", dependencies = TRUE)
install.packages("spData", dependencies = TRUE)
install.packages("devtools") # for this, you need Rtools installed on your machine 
devtools::install_github("Nowosad/spDataLarge")
# install.packages("sp", dependencies = TRUE)
# install.packages("mapview", dependencies = TRUE)
# install.packages("tmap", dependencies = TRUE)
library(tidyverse)
library(stringr)     # for working with strings (pattern matching)
library(sf)          # classes and functions for vector data
library(raster)      # classes and functions for raster data
library(spData)      # load geographic data
library(spDataLarge) # load larger geographic data
# library(sp)
# library(mapview)
# library(tmap)

Learning Objectives

What we do:
1. Learn how to interact with raster data
- Extract/modify cell values
- Subset a raster object
- Summarize/visualize cell values
2. Learn how to spatially manipulate raster data
- Spatially subset a raster object
- Map algebra: local, focal, zonal, & global operations
- Merge raster objects

3.3 Manipulating raster objects

In contrast to the vector data model underlying simple features (which represents points, lines and polygons as discrete entities in space), raster data represent continuous surfaces.

elev = raster(nrows = 6, ncols = 6, res = 0.5,
              xmn = -1.5, xmx = 1.5, ymn = -1.5, ymx = 1.5,
              vals = 1:36)

Raster objects can also contain categorical values of class logical or factor variables in R.

grain_order = c("clay", "silt", "sand")
grain_char = sample(grain_order, 36, replace = TRUE) # randomly generate 36 values out of grain_order 
grain_fact = factor(grain_char, levels = grain_order) # convert data type from character to factor 

typeof(grain_order)
typeof(grain_char)
typeof(grain_fact)
class(grain_fact)

grain = raster(nrows = 6, ncols = 6, res = 0.5, 
               xmn = -1.5, xmx = 1.5, ymn = -1.5, ymx = 1.5,
               vals = grain_fact)

The raster object stores the corresponding look-up table or “Raster Attribute Table” (RAT) as a data frame in a new slot named attributes.

str(grain)
grain@data@attributes # extract slot elements 

levels(grain) # list 
levels(grain)[[1]] # data.frame, before change  

levels(grain)[[1]] <- 
  cbind(levels(grain)[[1]], wetness = c("wet", "moist", "dry")) # add a new column to Raster Attribute Table 
levels(grain)[[1]] # data.frame, after change  

Change factor values to their actual meanings

# extract(i.e., subset) cell values by a numeric vector (cell id)
grain[c(1, 11, 35)] 

# change above factor values to their meanings  
factorValues(grain, grain[c(1, 11, 35)])
factorValues(grain, c(1, 11, 35)) # the second argument should be a vector of possible factor values 
3.3.1 Raster subsetting

Raster subsetting is done with the base R operator [, which accepts a variety of inputs:
- Row-column indexing
- Cell IDs
- Coordinates (in Chapter 4) - Another raster object (in Chapter 4)

elev[1, 1]
elev[1]
r_stack = stack(elev, grain)
names(r_stack) = c("elev", "grain")

str(grain)
str(r_stack)
# three ways to extract a layer of a stack
raster::subset(r_stack, "elev")
r_stack[["elev"]] # RasterStack becomes RasterLayer 
r_stack$elev

# update individual cell values 
elev[] # all values, before change 
elev[1, 1] <- 0 # assign a new value 
elev[] # all values, after change 
3.3.2 Summarizing raster objects
# summarize 
cellStats(elev, sd)
?cellStats # check the full list of avaliable summary statistics 
summary(elev[])
elev %>% values() %>% summary()
elev %>% getValues() %>% summary()

# histogram 
hist(elev)

elev.df <- as.data.frame(elev[]) #ggplot takes only data.frames, not vectors 
names(elev.df) <- "elev"
ggplot(data = elev.df) +
  geom_histogram(aes(x = elev), color = "white", binwidth = 5)

3.4. Exercises

  1. Create a raster from scratch with nine rows and columns and a resolution of 0.5 decimal degrees (WGS84). Fill it with random numbers. Extract the values of the four corner cells.

    14a. Change the seed number to 2020 and extract those values in the four corners.
    14b. Change the seed number to 2030 and extract those values in the four corners.
    14c. Instead of vals = sample(81), use rnorm(n = 81), and compute max, min, and median values (Use set.seed(2019).
    14d. Create 20 by 20 raster, instead of 9 by 9. With runif(n = 4000), compute max, min, and median values (Use set.seed(2019)). You need to change xmn, ymn, xmx, and ymx accordingly (Use the same res value).

Use Google Forms here: https://goo.gl/forms/643wQk2qjGsYzK4K3

set.seed(2019)
raster1 <- raster(
  nrows = 9, ncols = 9, res = 0.5, 
  xmn= -2.25, ymn = -2.25, xmx = 2.25, ymx = 2.25, 
  vals = sample(81))

plot(raster1)  
text(raster1)

x <- ncol(raster1)
y <- nrow(raster1)

raster1[1,1] # top left
raster1[1,x] # top right
raster1[y,1] # bottom left 
raster1[y,x] # bottom right 
  1. What is the most common class of our example raster grain (hint: modal())?
plot(grain)
text(grain)

levels(grain)
grain %>% values() %>% table()
grain %>% getValues() %>% table()
  1. Plot the histogram and the boxplot of the data(dem, package = "RQGIS") raster.
# read in the data 
install.packages("RQGIS", dependencies = TRUE)
library(RQGIS)
data(dem, package = "RQGIS")

# base R functions: execute the following three lines together 
# https://www.rdocumentation.org/packages/graphics/versions/3.5.2/topics/par
par(mfrow = c(1, 2))
hist(dem)
boxplot(dem)

# ggplot functions 
dem.df <- as.data.frame(values(dem))
names(dem.df) <- "elev"

ggplot(data = dem.df) + 
  geom_histogram(aes(x = elev), color = "white", binwidth = 50)

ggplot(data = dem.df) + 
  geom_boxplot(aes(y = elev)) +
  theme_classic()
  1. Now attach also data(ndvi, package = "RQGIS"). Create a raster stack using dem and ndvi, and make a pairs() plot.
# read in the data 
install.packages("RQGIS", dependencies = TRUE)
library(RQGIS)
data(ndvi, package = "RQGIS")
?ndvi
plot(ndvi)

r_stack_mongon <- stack(dem, ndvi)
names(r_stack_mongon) 
pairs(r_stack_mongon)

4.3 Spatial operations on raster data

4.3.1 Spatial subsetting

Subset raster by location (coordinates)

id = cellFromXY(elev, xy = c(0.1, 0.1))
?cellFromXY
elev[id]
# the same as 
raster::extract(elev, data.frame(x = 0.1, y = 0.1))
?raster::extract

Subsect raster by another raster: target raster[subsetting raster]

clip = raster(nrows = 3, ncols = 3, res = 0.3, xmn = 0.9, xmx = 1.8, 
              ymn = -0.45, ymx = 0.45, vals = rep(1, 9)) # repeat 1 for 9 times 
elev[clip] # target raster[subsetting raster]

plot(elev)
plot(clip, add = TRUE, alpha = 0.75)

If we want a new raster object instead of raster values, use drop = FALSE

plot(elev[1:2, drop = FALSE])    # spatial subsetting with cell IDs 
elev[1:2]       # returns cell values 
plot(elev[1:2]) # plots two points without values 

plot(elev[1, 1:2, drop = FALSE]) # spatial subsetting by row, column indices 
elev[1, 1:2]       # returns cell values 
plot(elev[1, 1:2]) # plots two points without values 

Create a masking raster on the fly and use it on the target raster: also use mask() and overlay()

# check the raw input 
plot(elev)

# create subsetting raster 
rmask <- elev
values(rmask) <- sample(c(NA, TRUE), 36, replace = TRUE)
plot(rmask)  # only 1 or missing

# the following three scripts return the same output
elev[rmask, drop = FALSE] %>% plot()         # with [ operator 
mask(elev, rmask) %>% plot()                 # with mask()
overlay(elev, rmask, fun = "max") %>% plot() # with overlay()
4.3.2 Map algebra

If two or more raster datasets share the same extent, projection, and resolution (i.e., one-to-one locational correspondence), one could treat them as matrices for raster processing. In this case, raster processing gets really fast because in the process, we do not check their geographic coordianates. Four types of raster operations (divided by the number of cells involved in cacluation):

  1. Local or per-cell operations
  2. Focal or neighborhood operations
  3. Zonal operations, similar to focal operations but can take on any irregular size/shape as the basis for calculation
  4. Gloabl or per-raster operations returns a single value from one or several entire raster(s)
4.3.3 Local operations

Cell-by-cell operations in one or several layers

# construct a reclasification matrix 
rcl <- matrix(c(0, 12, 1, 12, 24, 2, 24, 36, 3), ncol = 3, byrow = TRUE) 

recl <- reclassify(elev, rcl = rcl)
# in each row, the lower bounds not included, but the upper bound included  
# check to which "new"" value the "old" value 12 is reclassified 

plot(elev) # before reclassification 
plot(recl) # after reclassification, note that the upper left corner has the value of zero. 

Basic arithmetic operations

elev + elev 
elev^2 
log(elev)
elev > 5 # the outcome is a Boolean raster (TRUE or FALSE)
plot(elev > 5)

Other examples: the normalized difference vegetation index (NDVI) and predictive mapping (e.g., by using lm and glm)

4.3.4 Focal operations

Focal operations take into account a central cell and its neighbors.
- The neighborhood (e.g., kernel, filter, or moving window) under consideration is typically of size 3-by-3 cells.
- The neighborhood can be in non-rectangular shapes.
- A focal operation applies an aggregation function to all cells within the specified neighborhood.
- Also called spatial filtering and convolution

r_focal <- focal(elev, w = matrix(1, nrow = 3, ncol = 3), fun = min)
plot(elev)    # input
plot(r_focal) # output 
?focal
  • focal() provides options for low-pass/smoothing (i.e., removing extreme values) and high-passing (e.g., accentuating features and detecting lines). See examples.
  • terrain() calculates the slope, aspect, and flow directions of individual cells, but not all common operations are available in R. R allows to access desktop GIS functionality from within R (see Chapter 9, but we may not cover this chapter in the class).
4.3.5 Zonal operations

Zonal filters can take on any shape instead of just a predefined windonw.

# check the inputs 
plot(elev)
plot(grain)
# calculate statistics for individual zones 
z <- zonal(elev, grain, fun = "mean") %>% # zonal returns a matrix 
  as.data.frame()
z
4.3.6 Global operations and distances

A special case of zonal operations with the entire raster dataset representing a single zone. Examples are:
- Descriptive statistics for the entire raster dataset
- The distance from each cell to a specific target cell: e.g., distance to the nearest coast
- Visibility and viewshed (a viewshed raster in Chapter 9)

4.3.7 Merging rasters

Merge two rasters that cover neighborhing areas. If they overlap, merge returns values from the first raster.

Useful links:
- getData:https://www.rdocumentation.org/packages/raster/versions/2.8-19/topics/getData
- About SRTM: http://opentopo.sdsc.edu/raster?opentopoID=OTSRTM.082015.4326.1
- For more details: Remote Sensing and GIS for Ecologists: Using Open Source Software

raster::ccodes()
aut <- getData("alt", country = "AUT", mask = TRUE)
ch  <- getData("alt", country = "CHE", mask = TRUE)
aut_ch <- merge(aut, ch) 
plot(aut)
plot(ch)
plot(aut_ch)

4.4 Exercises

  1. Use data(dem, package = "RQGIS"), and reclassify the elevation in three classes: low, medium and high. Secondly, attach the NDVI raster (data(ndvi, package = "RQGIS")) and compute the mean NDVI and the mean elevation for each altitudinal class.

    4a. Change the number of classes to five, and compute mean values for each class. Report the mean for dem from the middle class (i.e., 3rd class).
    4b. Still with five classes, report the mean for ndvi from the same class (i.e., 3rd class).
    4c. Now, change the number of classes to seven, and compute mean values for each class. Report the median for dem from the 1st class (i.e., one with the lowest values in dem).
    4d. Still with seven classes, report the median for ndvi from the same class (i.e., one with the lowest values in dem).

Use Google Forms here: https://goo.gl/forms/643wQk2qjGsYzK4K3 (the same link as the above)

library(RQGIS)
data(dem, package = "RQGIS")
plot(dem)
summary(dem)

library(classInt) # extract cutoff values 
?classIntervals
classIntervals(values(dem), n = 3) %>% str() # check its structure 

brk <- classIntervals(values(dem), n = 3)$brk # extract the second list element, brk 
rcl <- matrix(c(brk[1]-1, brk[2], 1, brk[2], brk[3], 2, brk[3], brk[4], 3), ncol = 3, byrow = TRUE) 
#in the above line, for each row, the lower-bound value is not included 
recl <- reclassify(dem, rcl = rcl)
plot(recl)
summary(recl)

data(ndvi, package = "RQGIS")
plot(ndvi)
summary(ndvi)

# for two rasters, stack them first, and compute zonal statistics together 
zonal(stack(dem, ndvi), recl, fun = "mean") # base R way 
stack(dem, ndvi) %>% 
  zonal(recl, fun = "mean") # with the pipe
  1. Apply a line detection filter to raster(system.file("external/rlogo.grd", package = "raster"))). Plot the result. Hint: Read ?raster::focal.

Useful links:
- An example of the Sobel filter: https://mgimond.github.io/Spatial/raster-operations-in-r.html#focal-operations-and-functions-1

?raster::focal
# before applying a filter
raster(system.file("external/rlogo.grd", package = "raster")) %>% plot()  
# create a filter for line detection 
sobel <- matrix(c(-1,0,1,-2,0,2,-1,0,1) / 4, nrow=3) 
# after applying the filter 
# inside the plot function, all values are divided into two groups, each of which gets either white or black 
raster(system.file("external/rlogo.grd", package = "raster")) %>% focal(w=sobel, fun=sum) %>% plot(col = c("white", "black")) 
  1. Calculate the NDVI of a Landsat image. Use the Landsat image provided by the spDataLarge package (system.file("raster/landsat.tif", package="spDataLarge")).

Useful links:
- How to assign RGB to which bands? # https://landsat.usgs.gov/what-are-band-designations-landsat-satellites
- How to rescale each band to 0 to 255? https://www.neonscience.org/dc-multiband-rasters-r

file <- system.file("raster/landsat.tif", package="spDataLarge")
multiband.raster <- stack(file)
multiband.raster
plotRGB(multiband.raster, r = 3, g = 2, b = 1, stretch = "hist")
# "landsat.4" - Near Infrared (NIR)
# "landsat.3" - Red
ndvi <- (multiband.raster[["landsat.4"]] - multiband.raster[["landsat.3"]]) / 
  (multiband.raster[["landsat.4"]] + multiband.raster[["landsat.3"]])
plot(ndvi)
  1. A StackOverflow post shows how to compute distances to the nearest coastline using raster::distance(). Retrieve a digital elevation model of Spain, and compute a raster which represents distances to the coast across the country (hint: use getData()). Second, use a simple approach to weight the distance raster with elevation (other weighting approaches are possible, include flow direction and steepness); every 100 altitudinal meters should increase the distance to the coast by 10 km. Finally, compute the difference between the raster using the Euclidean distance and the raster weighted by elevation. Note: it may be wise to increase the cell size of the input raster to reduce compute time during this operation.

Useful links:
- getData: https://www.rdocumentation.org/packages/raster/versions/2.8-19/topics/getData

# read in DEM for Spain 
raster::ccodes() %>%
  filter(NAME %in% "Spain") 
dem <- getData("alt", country = "ESP", mask = TRUE)
plot(dem)

# reduce resolution for faster processing
agg = aggregate(dem, fact = 10)    
plot(agg)

# put a boundary sp object 
# "GADM" global administrative boundaries
poly = getData("GADM", country = "ESP", level = 1)  
# poly 
# poly@data
plot(agg)
plot(poly, add = TRUE)

# create a Boolean raster 
is.na(agg) 
plot(is.na(agg))
dist <- is.na(agg)
cellStats(dist, summary) # quick summary stat for rasters 

# compute distance from land to the nearest shore 
# raster::distance computes the distance to the nearest non-NA cell
# convert land cells into NAs and sea cells into 1s
dist[dist == FALSE] <- NA
dist[dist == TRUE] <- 1
plot(dist) 

# compute distance to nearest non-NA cell
dist = raster::distance(dist)
plot(dist)
plot(poly, add = TRUE)
# just keep Spain
dist = mask(dist, poly)
# convert distance into km
dist = dist / 1000  
plot(dist)
plot(poly, add = TRUE)

# now let's weight each 100 altitudinal meters by an additionaly distance of 10 km
agg <- mask(agg, poly)
agg[agg < 0] = 0 # its unit is meter 
weight = dist + agg / 100 * 10

# plot the difference between two approaches 
plot(weight - dist)
plot(poly, add = TRUE)

5.3. Geometric operations on raster data

To be continued Next week

5.4. Raster-vector interactions

To be continued Next week

5.5 Exercises