Disclaimer: The contents of this document come from Chapter 2. Geographic data in R of Geocomputation with R(Lovelace, Nowosad, & Muenchow, 2019). This document is prepared for CP6521 Advanced GIS, a graduate-level city planning elective course at Georgia Tech in Spring 2019. For any question, contact the instructor, Yongsung Lee, Ph.D. via yongsung.lee(at)gatech.edu.
This document is also published on RPubs.
install.packages("tidyverse", dependencies = TRUE)
install.packages("sf", dependencies = TRUE)
install.packages("raster", dependencies = TRUE)
install.packages("spData", dependencies = TRUE)
if (!require("devtools")) install.packages("devtools")
devtools::install_github("Nowosad/spDataLarge")
install.packages("sp", dependencies = TRUE)
library(tidyverse)
library(sf)          # classes and functions for vector data
library(raster)      # classes and functions for raster data
library(spData)      # load geographic data
library(spDataLarge) # load larger geographic data
library(sp)

Introduction

What we do:

  1. Learn how R handles two geographic data models, vector and raster.

Why we do:

  1. Most geoprocessing operations can be done in R. To do these operations, we need to understand how R handles the geographic data models.

Basic terms/concepts:

  1. List column: In a data frame, the individual elements of a colum are lists.

2.1 Introduction

vignette(package = "sf") # see which vignettes are available
vignette("sf1")          # an introduction to the package

2.2 Vector data

world
class(world)
str(world)
names(world)
colnames(world)
summary(world["lifeExp"])
plot(world)

world_mini = world[1:2, 1:3]
world_mini

Why simple features?

world_sp = as(world, Class = "Spatial")
class(world_sp)
world_sp

world_sf = st_as_sf(world_sp, "sf")
class(world_sf)
world_sf
plot(world[3:6])     # multiple maps one for each variable
plot(world["pop"])   # treated as sf 
plot(world[["pop"]]) # treated as non-spatial data frame 
world_asia = world[world$continent == "Asia", ]
asia = st_union(world_asia)
plot(world["pop"], reset = FALSE)    # reset: keep the plot in a mode that allows adding further map elements
plot(asia, add = TRUE, col = "red")  # add: add to current plot? https://r-spatial.github.io/sf/reference/plot.html

plot(world["continent"], reset = FALSE)
cex = sqrt(world$pop) / 10000
# class(cex)
# length(cex)
world_cents = st_centroid(world, of_largest = TRUE)   # compute the centroids of countries 
# world_cents
# nrow(world_cents)
plot(st_geometry(world_cents), add = TRUE, cex = cex) # cex controls the size of circles 
india = world[world$name_long == "India", ]
# india
# names(india)
# class(india)
# plot(india, max.plot = 10)      # one for each variable 
# plot(st_geometry(india))        # st_geometry(.) only plots geometries

# compare with c(0, 0, 0, 0)
# https://r-spatial.github.io/sf/reference/plot.html
plot(st_geometry(india), expandBB = c(0, 0.2, 0.1, 1), col = "gray", lwd = 3) 
plot(world_asia[0], add = TRUE) # use of [0] to keep only the geometry column

map_chr(world, typeof)       # type of individual columns 
map_chr(world, class)        # returns an error, because the last column returns more than one value 
map(world, class)            # class of individual columns: "sfc" (simple feature columns)

map_chr(world$geom, typeof)  # type of individual elements of the geom column, which is a list   
map_chr(world$geom, class)   # returns an error, because individual elements of the geom column returns more than one value 
map(world$geom, class)       # class of individual elements of the geom column: "sfg" (simple feature geometries)

world[1, ]                   # subsets a sf: the first observation, Fiji 
plot(world[1, ][0])
# plot(st_geometry(world[1, ]),col = "red", lwd = 3) 
# plot(world[world$continent == "Oceania", ][0], add = TRUE)

class(world[1, ]) 
world$geom[1]                # returns a sfc (e.g., list[i] returns a sublist)
class(world$geom[1])                 
world$geom[[1]]              # returns a sfg (e.g., list[[i]] returns the element in the first position)
class(world$geom[[1]])

2.3 Raster data

Raster maps usually represent continuous phenomena such as elevation, temperature, population density or spectral data.

vignette("functions", package = "raster")

A LasterLayer is the basic raster data model in R.

raster_filepath = system.file("raster/srtm.tif", package = "spDataLarge")
new_raster = raster(raster_filepath)
new_raster # a digital elevation model, or DEM of an area in Zion National Park (Utah, USA)
dim(new_raster)        # the number of rows, columns and layers
ncell(new_raster)      # the number of cells (pixels)
res(new_raster)        # the raster’s spatial resolution, three arc-seconds, or 1/3600*3 about 90 meters at the equator
extent(new_raster)     # its spatial extent
crs(new_raster)        # its coordinate reference system
inMemory(new_raster)   # whether the raster data is stored in memory (the default) or on disk
help("raster-package") # Overview of the functions in the raster package
plot(new_raster)
new_raster2 = raster(nrows = 6, ncols = 6, res = 0.5,              # by default, crs = WGS84
                     xmn = -1.5, xmx = 1.5, ymn = -1.5, ymx = 1.5,
                     vals = 1:36)
class(new_raster2)
plot(new_raster2)

A RasterBrick consists of multiple layers, which typically correspond to a single multispectral satellite file or a single multilayer object in memory.

multi_raster_file = system.file("raster/landsat.tif", package = "spDataLarge")
r_brick = brick(multi_raster_file)
r_brick
nlayers(r_brick)

A RasterStack allows you to connect several raster objects stored in different files or multiply objects in memory.

raster_on_disk = raster(r_brick, layer = 1)
# nlayers(raster_on_disk)
raster_in_memory = raster(xmn = 301905, xmx = 335745,
                          ymn = 4111245, ymx = 4154085, 
                          res = 30)
values(raster_in_memory) = sample(seq_len(ncell(raster_in_memory))) # seq_len generates an integer vector from 1:n
                                                                    # sample generates an integer vector, randomly sorted 
crs(raster_in_memory) = crs(raster_on_disk)
r_stack = stack(raster_in_memory, raster_on_disk)                   # put two raster layers together 
r_stack

2.4 Coordinate Reference Systems (CRS)

The Coordinate Reference System (CRS) Defines how the spatial elements of the data related to the surface of the Earth.

2.4.1 Geographic coordinate systems (GCS)

Geographic coordinate systems identify any location on the Earth’s surface using two values — longitude and latitude.

The datum contains information on what ellipsoid to use (with the ellps parameter in the PROJ CRS library) and the precise relationship between the Cartesian coordinates and location on the Earth’s surface.

There are two types of datum — local and geocentric. In a local datum such as NAD83 the ellipsoidal surface is shifted to align with the surface at a particular location. In a geocentric datum such as WGS84 the center is the Earth’s center of gravity and the accuracy of projections is not optimized for a specific location.

st_proj_info(type = "datum")

Local vs. Earth-Centered Datums(Source:Humboldt State University)

2.4.2 Projected coordinate reference systems (PCS)

Projected CRSs are based on Cartesian coordinates on an implicitly flat surface. They have an origin, x and y axes, and a linear unit of measurement such as meters. All projected CRSs are based on a geographic CRS, and rely on map projections to convert the three-dimensional surface of the Earth into Easting and Northing (x and y) values in a projected CRS.

  • Transition introduces some distortion.
  • A PCS preserves one or two properties among area, direction, distance, and shape.
  • Projections are often named based on a property they preserve:

    • Equal-area preserves area
    • Azimuthal preserve direction
    • Equidistant preserve distance
    • Conformal preserve local shape
st_proj_info(type = "proj")

Map Projections Explained (Source: GEO awesomeness)

An interesting xkcd post: What your favorite map projection says about you

2.4.3. CRSs in R

Two ways of storing CRSs in R.

  1. an epsg code: less flexible (i.e., preset CRSs)

  2. a proj4string definition: more flexible (i.e. user-defined parameters)

# a data frame of available projections
crs_data = rgdal::make_EPSG()
crs_data %>%
  filter(grepl("NAD83", note)) %>% # grepl: return a logical vector of elements that contain "NAD83"
  filter(grepl("Georgia West", note))
vector_filepath = system.file("vector/zion.gpkg", package = "spDataLarge")
new_vector = st_read(vector_filepath)
st_crs(new_vector)
new_vector = st_set_crs(new_vector, 4326) # set CRS

Figure 2.13: Examples of geographic (WGS 84; left) and projected (NAD83 / UTM zone 12N; right) coordinate systems for a vector data type.

# for a raster object 
projection(new_raster)
# raster objects only accept proj4 definitions
projection(new_raster) = "+proj=utm +zone=12 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 
                            +units=m +no_defs" # set CRS

Figure 2.14: Examples of geographic (WGS 84; left) and projected (NAD83 / UTM zone 12N; right) coordinate systems for raster data.

2.5 Units

Skip

2.6 Exercises (These are not your homework. Weekly HW#4 will be available soon.)

  1. Use summary() on the geometry column of the world data object. What does the output tell us about:

    • Its geometry type?
    • The number of countries?
    • Its coordinate reference system (CRS)?
  2. Run the code that ‘generated’ the map of the world in Figure 2.5 at the end of Section 2.2.4. Find two similarities and two differences between the image on your computer and that in the book.

    • What does the cex argument do (see ?plot)?
    • Why was cex set to the sqrt(world$pop) / 10000?
    • Bonus: experiment with different ways to visualize the global population.
  3. Use plot() to create maps of Nigeria in context (see Section 2.2.4).

    • Adjust the lwd, col and expandBB arguments of plot().
    • Challenge: read the documentation of text() and annotate the map.

  1. Create an empty RasterLayer object called my_raster with 10 columns and 10 rows. Assign random values between 0 and 10 to the new raster and plot it.

  2. Read-in the raster/nlcd2011.tif file from the spDataLarge package. What kind of information can you get about the properties of this file?