Overlay Analysis in R

Overview

You are probably well aware that, unfortunately, the Brazilian Amazon has been experiencing rapid deforestation for several decades, with the exception of a brief pause during stepped up monitoring and enforcement during roughly 2005 through 2012, corresponding to reforms undertaken under the first Da Silva administration.

In this tutorial, we will getting some practice using some geoprocessing tools in R in the context of analyzing the drivers of Amazonian forest loss. It may be helpful to refer back to the slides on geoprocessing to refresh your memory about the different operations we will be conducting.

The Objectives

These are the main operations we will be working with in this tutorial:

  • Reading vector layers into R
  • Making basic plots with vector layers
  • Reprojecting vector layers
  • Simplifying vector geometries
  • Clipping
  • Unioning
  • Buffering
  • Erasing
  • Computing areas

The Data

Here are the datasets we will be using, along with their sources:

These data can be found in this zip file.

The Packages

As we learned in our introduction to geospatial data in R, the sf library is now the primary workhorse for manipulating vector data. We are also going to use a package called rmapshaper, which has a few geoprocessing tools that work a bit better than their counterparts in the sf package.

Let’s install and activate the packages we’ll be using for this tutorial:

require(tidyverse)
Loading required package: tidyverse
Warning: package 'ggplot2' was built under R version 4.3.3
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
require(sf)
Loading required package: sf
Linking to GEOS 3.11.2, GDAL 3.7.2, PROJ 9.3.0; sf_use_s2() is TRUE
require(rmapshaper)
Loading required package: rmapshaper
Warning: package 'rmapshaper' was built under R version 4.3.3

Reading in Vector Data with the sf Package

Just like with the terra package, we have to tell R where to look for the files holding any vector data we want to use. As before, it is easiest to do this by locating our code in the same folder as all of our data, setting our working directory accordingly, and then pointing R to the datasets we want to read in.

First, we can set the working directory. Remember that you can always use Session -> Set Working Directory -> To Source File Location to quickly find the path to the folder where your code and data are hanging out:

setwd("G:/My Drive/My Classes/GIS for Global Environment/Labs/Overlay")

Now let’s just read in one vector layer and plot it using the sf package. To read in vector data in the sf package, we use the st_read() function:

brazil <- st_read("Brazil.gpkg")
Reading layer `world_map' from data source 
  `G:\My Drive\My Classes\GIS for Global Environment\Labs\Overlay\Brazil.gpkg' 
  using driver `GPKG'
Simple feature collection with 1 feature and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -74.01847 ymin: -33.74228 xmax: -29.31688 ymax: 5.267225
Geodetic CRS:  WGS 84
plot(brazil)

As you may remember from last time, if we just use the basic plot() function with an sf object, it will plot up to the first nine variables in the layer.

Task 1: Now that we know how to read in a layer, let’s try reading in the boundaries of the legal Amazon. Using the code above as an example, read in the Legal_Amazon.gpkg file and assign it to an object called leg_amazon. Then plot() it. The output should look like this:

Reading layer `limites_amazonia_legal_2022' from data source 
  `G:\My Drive\My Classes\GIS for Global Environment\Labs\Overlay\Legal_Amazon.gpkg' 
  using driver `GPKG'
Simple feature collection with 1 feature and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -73.99045 ymin: -18.0416 xmax: -44 ymax: 5.271841
Geodetic CRS:  SIRGAS 2000

Basic Plotting with the sf Package

Now that we have two layers in memory, we can make some basic maps. Later in the lab, we’ll look at how to make more polished maps using the ggplot2 and ggspatial packages, but for right now we’ll just use the basic tools, which are helpful for doing quick checks on what your code is doing as you work through a project.

We’ll start by plotting just the boundaries Brazil. Because we’ll be adding another layer to the plot, we want to make sure that sf doesn’t plot all the variables in the brazil object. Here’s a way to do that:

plot(st_geometry(brazil))

Now, if we want to plot the outline of the Legal Amazon on top of this, we can’t just use the same plot command, because it’ll just create a new plot of only the Legal Amazon:

plot(st_geometry(leg_amazon))

Remember that this will only work if you already read in the Legal_Amazon.gpkg file and assigned it to an object called leg_amazon.

So we need a way to add multiple layers to a same plot. Here’s how we do that:

plot(st_geometry(brazil))
plot(st_geometry(leg_amazon), add=TRUE)

By including the add=TRUE argument, we’re telling R that we want to plot the second layer on top of the first.

But now we have the problem that it’s a bit unclear what we’re seeing - for all we know, that black line could just be a river or something. Fortunately, we have a way to set the color for different layers:

plot(st_geometry(brazil), col="gray")
plot(st_geometry(leg_amazon), col="darkgreen", add=TRUE)

That’s better! You might also have noticed that one cool thing about doing this in RStudio is that it will actually highlight color words with the color that will show up, giving you a bit of a preview of what the plot will look like.

There are lots of ways to reference colors in R, including commonly used RGB and Hex codes. When starting out, though, I tend to like to just use the color words. You can find all the color words for R at this link.

Task 2: Read in the IFL_Brazil_2020.gpkg file and assign it to an object called ifl_2020. Then read in the National_Roads_Brasil_Amazon.gpkg file and assign it to an object called nat_roads. Then create a plot showing the Legal Amazon in beige, the 2020 Intact Forest Landscapes in dark green, and the national roads in blue. The result should look something like this (be patient - these files are a bit large and can take a moment to read into memory):

Reading layer `clipped' from data source 
  `G:\My Drive\My Classes\GIS for Global Environment\Labs\Overlay\IFL_Brazil_2020.gpkg' 
  using driver `GPKG'
Simple feature collection with 206 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -74.01847 ymin: -25.58273 xmax: -43.16131 ymax: 5.267225
Geodetic CRS:  WGS 84
Reading layer `vias_nacional' from data source 
  `G:\My Drive\My Classes\GIS for Global Environment\Labs\Overlay\National_Roads_Brasil_Amazon.gpkg' 
  using driver `GPKG'
Simple feature collection with 41089 features and 17 fields
Geometry type: MULTILINESTRING
Dimension:     XYZ
Bounding box:  xmin: -79.54228 ymin: -20.31399 xmax: -43.97888 ymax: 10.01478
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  SIRGAS 2000

Clipping in the `rmapshaper`` Package

You’ve probably noticed that the nat_roads layer is operating with a different definition of the extent of the Amazon than we are. First, it covers the whole Amazon, not just the part in Brazil. Second, even in Brazil, it extends outside the official Legal Amazon.

For the rest of this tutorial, we’ll focus our analysis just on the area in Brazil’s Legal Amazon. That means we’re going to need to clip some of our other layers to fit.

The function in rmapshaper for clipping is just ms_clip():

nat_roads_brazil <- ms_clip(nat_roads, leg_amazon)

Then we can have a look at our plots again:

Now let’s move on to some of our analysis. As we discussed at the outset, we want to look at how the loss of intact forest in the Legal Amazon is affected by roads and indigenous territories. That means we need to compare the intact forest landscapes as of 2000 with those in 2020.

Task 3: Read in the IFL_Brazil_2000.gpkg file and assign it to an object called ifl_2000. Plot it in blue Then plot the 2020 on top of it in dark green. To make the plot clearer, you can add in an argument, border = NA to the plot() function to keep R from drawing the borders. Discuss what you see. Where were the losses of intact forest landscape in the Legal Amazon between 2000 and 2020? The results should look something like this:

Reading layer `ifl_brazil_2000' from data source 
  `G:\My Drive\My Classes\GIS for Global Environment\Labs\Overlay\IFL_Brazil_2000.gpkg' 
  using driver `GPKG'
Simple feature collection with 239 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -74.01847 ymin: -25.58273 xmax: -41.52281 ymax: 5.267225
Geodetic CRS:  WGS 84

So it’s those blue areas that we’re really interested in, because those are the areas of intact forest landscape that were lost between 2000 and 2020.

Simplifying polygons in the rmapshaper Package

Before we continue with our analysis, we have to deal with a quick problem.

The intact forest landscape layers feature a common problem we encounter with data that have been produced by vectorizing raster layers - they have a very high vertex to area ratio. In other words, the polygons have really jagged edges.

This not only dramatically increases the file size and computational load when we’re working with these layers, it also can result in creating a bunch of invalid polygons when we use geoprocessing tools on them.

One way to mitigate these problems is to use simplification algorithms. ms_simplify() in rmapshaper, for example, uses a pretty sophisticated algorithm to remove vertices in order to shrink the detail in a layer while maintaining the overall shapes and neighborhood (or topological) relationships among the polygons. While using tools will sacrifice some accuracy, it will reduce our computational load and reduce the risk of invalid polygons later on.

ms_simplify() takes up to three arguments: first, the layer you want to simplify, then, the proportion of the original vertices you want to keep, and, finally, the tolerance, which is the distance, in the units of the projection, that you are willing to allow things to be moved.

ifl_2000 <- ms_simplify(ifl_2000, keep = 0.25)
ifl_2020 <- ms_simplify(ifl_2020, keep = 0.25)

Reprojecting in the sf Package

Because at the end of the day we want to be able to do some area calculations, we need to address the fact that the intact forest areas layers are not projected. We can see the CRS of any sf object using the st_crs() function:

st_crs(ifl_2000)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]
st_crs(ifl_2020)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]

As you can see, both layers are in WGS 1984.

So what projection should we use?

Well, some of the layers we have are already projected (sf is impressive in that it can reproject layers on the fly in plotting, so we haven’t paid much attention to this point, yet). They also just happen to be using the projection that is most commonly used when mapping the Amazon Basin. Have a look:

st_crs(nat_roads_brazil)
Coordinate Reference System:
  User input: SIRGAS 2000 
  wkt:
GEOGCRS["SIRGAS 2000",
    DATUM["Sistema de Referencia Geocentrico para las AmericaS 2000",
        ELLIPSOID["GRS 1980",6378137,298.257222101,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["Latin America - Central America and South America - onshore and offshore. Brazil - onshore and offshore."],
        BBOX[-59.87,-122.19,32.72,-25.28]],
    ID["EPSG",4674]]

To reproject sf layers, we use the st_transform() function. This function takes two main arguments: first, the layer that we want to reproject, and, second, the CRS we want to reproject to.

One handy thing is that because st_crs() extracts projection information from a layer, we can use it to define our target CRS in the st_transform() function, like this:

ifl_2000 <- st_transform(ifl_2000,
                           crs = st_crs(nat_roads_brazil))

ifl_2020 <- st_transform(ifl_2020,
                         crs = st_crs(nat_roads_brazil))

Buffering in the sf Package

A final data cleaning step that we should take has to do with the fact that, even simplified, the intact forest landscape layer edges are so jagged that comparing them is still likely to generate invalid polygons, just because even the boundary lines of the same forest area might not line up exactly in the two datasets.

One way to mitigate this problem is to slightly buffer the dataset we are using to do the erasing - in this case the 2020 intact forest landscapes layer. A very small buffer can help avoid polygon errors due to slight errors in polygon alignment between the two years. As with simplification, it comes with a cost in accuracy, as it will slightly bias the results downward, but this bias should be tiny at the scale of the entire Legal Amazon.

Buffering in the sf package is handled with the st_buffer() function. It takes two arguments: first, the layer whose features you want to buffer, and, second, the distance of the buffer, in the units used by the layer’s projection. In this case, the units are meters:

ifl_2020 <- st_buffer(ifl_2020, dist = 1)

Erasing in the rmapshaper Package

Now that these layers are simplified, we’re back on track and continue trying to identify the areas in the Legal Amazon that saw intact forest landscapes degraded between 2000 and 2020.

So how do we get them?

Well, you probably already have guessed from the title of this section that the answer is by erasing areas in the 2020 intact forest landscapes layer from the 2000 layer. We can infer that parts of the 2000 intact forest area that are not present in 2020 have been degraded.

Like it’s clipping function, rmapshaper’s erasing function is pretty simple. It’s just ms_erase(). The remove_slivers = TRUE argument tells R to automatically remove “sliver polygons”, which are tiny polygons that show up as a result of small differences in the coordinates of features in two different layers:

ifl_change <- ms_erase(ifl_2000, ifl_2020,
                       remove_slivers = TRUE)

plot(st_geometry(ifl_change), col = "red", border = NA)

Computing Areas in the sf Package

Once we have our forest change layer in the correct projection, we can compute its area. Once again, the function is pretty self-explanatory. It’s just st_area(). The only argument you need is the layer the area of whose polygons you want to compute:

st_area(ifl_change)
Units: [m^2]
  [1] 2.307611e+05 7.967242e+07 6.820126e+08 6.673104e+07 3.429516e+08
  [6] 1.234064e+06 9.227031e+08 5.600445e+07 8.997843e+08 9.742856e+07
 [11] 4.012627e+08 4.215947e+07 5.805841e+08 1.108370e+09 1.131150e+08
 [16] 1.628108e+09 1.097564e+09 9.981549e+05 6.898964e+08 3.653589e+08
 [21] 1.024169e+09 1.011160e+07 6.149162e+08 5.811054e+08 4.894421e+08
 [26] 1.725710e+09 4.061873e+09 5.047224e+08 1.146925e+09 2.040136e+06
 [31] 8.109000e+08 2.774408e+09 1.482184e+08 1.375591e+09 2.557835e+09
 [36] 2.639199e+09 6.331842e+08 2.600989e+09 8.908084e+08 1.017145e+09
 [41] 5.854078e+09 1.046130e+09 2.125112e+06 1.347841e+09 5.396044e+08
 [46] 1.473590e+09 8.489057e+08 7.567140e+08 1.060168e+09 4.676751e+08
 [51] 3.485669e+09 1.547199e+09 4.324115e+09 1.396558e+08 5.256330e+08
 [56] 1.288504e+09 3.563021e+08 3.098468e+08 1.885178e+08 1.712626e+09
 [61] 4.610338e+09 3.698670e+06 4.304035e+08 1.055513e+09 4.816532e+07
 [66] 4.790458e+07 1.043789e+09 1.324364e+09 5.142351e+08 2.765344e+09
 [71] 6.336114e+07 9.084048e+05 2.723769e+09 3.029612e+09 2.102207e+08
 [76] 6.965547e+09 6.857204e+08 1.667654e+08 1.169765e+06 3.340842e+08
 [81] 2.391534e+07 1.030441e+09 6.795066e+08 6.975074e+08 7.232646e+08
 [86] 7.496413e+06 5.773175e+07 6.866935e+07 3.104832e+09 6.339763e+07
 [91] 1.189049e+09 4.954618e+09 6.258683e+06 5.645246e+08 7.659030e+08
 [96] 3.235308e+08 3.756304e+09 2.590090e+09 1.361778e+09 4.114798e+06
[101] 9.915751e+08 8.387665e+08 5.416145e+09 5.259717e+08 3.517868e+07
[106] 3.913933e+08 1.031756e+09 3.941357e+08 2.319482e+09 3.093052e+09
[111] 7.976295e+08 2.029974e+07 1.492712e+09 2.056446e+09 1.525265e+09
[116] 2.079050e+09 5.993279e+08 1.438407e+08 2.477197e+08 1.718612e+08
[121] 2.371806e+07 1.267739e+10 2.654924e+09 3.464867e+07 3.329549e+08
[126] 2.319583e+06 4.719610e+09 1.027333e+08 1.347915e+08 4.325748e+08
[131] 7.567353e+08 9.796155e+08 8.012145e+08 4.261738e+09 9.866199e+08
[136] 3.571260e+08 3.000545e+08 1.170499e+07 1.203407e+09 6.171366e+07
[141] 4.190873e+08 7.188309e+08 2.768667e+08 1.745414e+09 5.727924e+08
[146] 7.721752e+08 3.265375e+05 1.132910e+09 2.587349e+06 3.812432e+09
[151] 1.810647e+08 1.878713e+08 6.653614e+08 1.237496e+09 1.054937e+09
[156] 8.019842e+08 3.009207e+08 6.119620e+08 8.989860e+09 8.497059e+06
[161] 3.908621e+08 3.051112e+07 1.742930e+07 4.004283e+08 2.423181e+09
[166] 5.112473e+09 3.383640e+06 1.525403e+09 3.443707e+08 1.543639e+08
[171] 4.191853e+08 1.479165e+09 8.144962e+05 5.391701e+08 1.598274e+07
[176] 4.573232e+07 8.487029e+05 4.963645e+06 9.699488e+07 1.287582e+07
[181] 1.107176e+08 2.814726e+08 3.209090e+07 3.110456e+07 3.980415e+07
[186] 5.114165e+07 1.211904e+08 5.467590e+08 5.118170e+08 3.200852e+08
[191] 2.143574e+07 4.535022e+07 5.115563e+08 7.551946e+07 1.129903e+09
[196] 5.342706e+08 1.069010e+07 6.501001e+08 1.411788e+07 1.324460e+08
[201] 6.169287e+08 5.138457e+08 5.924084e+06 3.297307e+07 1.208797e+07
[206] 1.160538e+07 1.857590e+07 1.441718e+07 1.220303e+08 2.440671e+07
[211] 8.266277e+05 4.192618e+09 2.521693e+07 5.880370e+08 5.007415e+09
[216] 6.752895e+07 4.921133e+08 4.538254e+07 4.680118e+09 9.427283e+06
[221] 1.900851e+09

Unioning in the sf Package

Well, that isn’t the most helpful, is it? We really wanted to see the total area that was lost, not the area in each random polygon in the ifl_change object.

One way to get the entire area would be to sum this value up:

st_area(ifl_change) %>% 
  sum()
225130831883 [m^2]

Since square meters is a bit tough to wrap our heads around, maybe we should also convert it to square kilometers while we’re at it (there are 1 million square meters in a square kilometer):

st_area(ifl_change) %>% 
  sum()/1000000
225130.8 [m^2]

Okay, so that’s almost 200 thousand square kilometers of intact forest landscape in the Legal Amazon degraded between 2000 and 2020 - depressing!

We could also do this calculation by unioning the polygons first and then computing the area. We perform unions with the st_union() function.

st_union(ifl_change) %>%
  st_area()/1000000
225130.8 [m^2]

Note that R didn’t automatically convert the units designation when we did the division to convert to kilometers. That’s something to keep an eye on.

Task 4: Write a line of code that would compute the area of intact forest landscapes in the Legal Amazon in 2000.

Final Tasks

Now it’s your turn to conduct some analysis.

Task 5: Compute the share of total intact forest degradation that took place within five kilometers of a departmental road in the Legal Amazon between 2000 and 2020. Include all the code required to replicate your results.

Task 6: Compute the share of total forest degradation that took place in indigenous territories in the Legal Amazon between 2000 and 2020. Then compute the share that took place within 5 kilometers of the boundary of indigenous territories (remember that you need to erase the area within indigenous territories in order to get just the buffer zone). Include all the code required to replicate your results.

Task 7: Create a flowchart showing the steps of your analysis in Task 6. In the flowchart boxes, include the exact R functions you used.