Landscape Metrics in R
Overview
The term landscape metrics refers to a vast array of algorithms designed to quantitatively characterize aspects of the geographic layout of different types of land cover in a landscape. We’ve talked about the basic logic behind landscape metrics, which builds off the idea of patches of different land cover across a landscape.
In this tutorial, we’re going to be looking at how to compute and visualize landscape metrics using the landscapemetrics package. Really, computing the metrics is the easy part, although they can be computationally intensive. The harder part is thinking through what they might mean about processes taking place on a landscape.
Because computing landscape metrics is relatively straightforward, we will also spend some time in this tutorial looking at how to make cool plots of our results using ggplot2. You’re already familiar with using ggplot2 for mapping - this is an opportunity to see the other cool things it can do.
The Data
Most of the data from this tutorial come from Doyle, C., et al. (2021). Tropical forest and wetland losses and the role of protected areas in northwestern Belize, revealed from Landsat and machine learning. Remote Sensing, 13(3), 379. The data are hosted on the lead author’s GitHub repository.
These are the data we’ll be working with:
- orange_walk.gpkg has the boundaries of Belize’s Orange Walk District, extracted from the Global Administrative Areas dataset
- rf_classification_1985.tif, rf_classification_2000.tif, and rf_classification_2015.tif are GeoTIFF files with the land-cover classification results from the article
- Orange_Walk_LC_Codes.csv is a crosswalk file that identifies the land-cover types shown in the GeoTIFFs
These data can be accessed in this zip file.
The Packages
landscapemetrics works with the terra package, and, as we mentioned above, we’ll be doing some work with ggplot2, so we’ll also load in the tidyverse.
Let’s load these in now:

require(tidyverse)
require(terra)
require(landscapemetrics)
Running landscape metrics with landscapemetrics
To perform its calculations, landscapemetrics requires its input SpatRaster to be in a particular format:
- The CRS units must be in meters, so the raster must be projected.
- The raster values must be categorical - that is, they should be discrete numbers each representing a different class of some characteristic, generally land cover or land use.
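A quick way to check the first requirement on a loaded SpatRaster is terra’s is.lonlat() helper - a minimal sketch, where r is just a placeholder name for any of this tutorial’s rasters:

r <- rast("rf_classification_2015.tif")
is.lonlat(r) #TRUE means the CRS is in lon/lat degrees, so we'd need to project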
We’ll be using some relatively long-term historical data covering an important period of landscape transition in Belize’s Orange Walk District, in the northwest of the country. We have land-cover rasters for 1985, 2000, and 2015, a period in which the region shifted dramatically from a heavily forested landscape featuring some small-scale agriculture and traditional shifting cultivation of corn in a system known as milpa to a landscape dominated by large-scale cultivation of several commercial crops.
Let’s set our working directory, read in these data, and have a look:
setwd("G:/My Drive/My Classes/GIS for Global Environment/Labs/Landscape Metrics")
ow_1985 <- rast("rf_classification_1985.tif")
ow_2000 <- rast("rf_classification_2000.tif")
ow_2015 <- rast("rf_classification_2015.tif")
plot(ow_1985, type = "classes")
plot(ow_2000, type = "classes")
plot(ow_2015, type = "classes")

So you may notice one immediate problem that we need to address. We said above that landscapemetrics needs the raster to be projected, with units in meters. But we can tell from these plots’ axes that the raster is probably in latitude and longitude. Let’s just double-check, to be sure:
crs(ow_2015)

[1] "GEOGCRS[\"WGS 84\",\n ENSEMBLE[\"World Geodetic System 1984 ensemble\",\n MEMBER[\"World Geodetic System 1984 (Transit)\"],\n MEMBER[\"World Geodetic System 1984 (G730)\"],\n MEMBER[\"World Geodetic System 1984 (G873)\"],\n MEMBER[\"World Geodetic System 1984 (G1150)\"],\n MEMBER[\"World Geodetic System 1984 (G1674)\"],\n MEMBER[\"World Geodetic System 1984 (G1762)\"],\n MEMBER[\"World Geodetic System 1984 (G2139)\"],\n ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n LENGTHUNIT[\"metre\",1]],\n ENSEMBLEACCURACY[2.0]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n CS[ellipsoidal,2],\n AXIS[\"geodetic latitude (Lat)\",north,\n ORDER[1],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n AXIS[\"geodetic longitude (Lon)\",east,\n ORDER[2],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n USAGE[\n SCOPE[\"Horizontal component of 3D system.\"],\n AREA[\"World.\"],\n BBOX[-90,-180,90,180]],\n ID[\"EPSG\",4326]]"
Yup - sure enough! We’re in WGS 1984. That means we need to reproject this raster.
Before dealing with the projection issue, however, there is another, slightly more subtle problem that you might not have noticed: there are 10 land-cover classes in these rasters (not counting 0, which designates areas outside of Orange Walk), even though there are only 8 different classes in the maps in the article. In addition, for some reason the article separates out rice cultivation as its own class but aggregates multiple other classes into the single agriculture class. It seems like we should probably just group rice in with the others for consistency.
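You can verify the class count yourself with terra’s freq() function, which tabulates the number of cells in each value (a quick check, not shown in the original walkthrough):

freq(ow_2015) #one row per raster value: 0 (background) plus classes 1 through 10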
So all of that means we need to reach for the classify() function from last time again. First, let’s read in a table I’ve put together that we can use to reclassify our rasters:
ow_reclass <- read.csv("Orange_Walk_LC_Codes.csv",
stringsAsFactors = FALSE)
ow_reclass

   LC_Code                       Class New_Code
1        1                 Agriculture        1
2        2                 Agriculture        1
3        3           Savanna/Shurbland        2
4        4                  Urban/Road        3
5        5 Lowland Broad-Leaved Forest        4
6        6                Swamp Forest        5
7        7                     Wetland        6
8        8                       Water        7
9        9                 Agriculture        1
10      10                 Agriculture        1
So, as you can see, we need to reclassify these rasters from LC_Code to New_Code.
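As a reminder from last time, here’s the general pattern with classify() - a sketch for one raster, using a two-column “is/becomes” matrix built from the crosswalk table:

#classify() accepts a two-column matrix: cells whose value matches the
#first column are recoded to the value in the second column.
rcl <- as.matrix(ow_reclass[, c("LC_Code", "New_Code")])
ow_2015_rec <- classify(ow_2015, rcl)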
Task 1: Reclassify all three rasters from the old to the new values. Name the new rasters ow_2015_rec, ow_2000_rec, and ow_1985_rec. Your results should look something like this:
Reprojecting SpatRasters in terra
Now let’s deal with that projection problem. We’ve reprojected sf objects plenty of times now, but we haven’t actually seen how to reproject a raster. This process is a bit more complicated than reprojecting vector layers: because reprojection creates a new set of pixels, we have to recalculate the pixel values, and as a result there is always some distortion in those values when reprojecting a raster. That means that, to the extent possible, you should avoid reprojecting rasters.
In this case, however, we have to project the raster in order to compute landscape metrics, so here’s what we need to do to reproject a raster in terra:
- Create a template target raster in the CRS to which we want to project, with the extent and pixel resolution that we want our final raster to have
- Use the project() function to get our raster to match the target
So how do we build a template? Well, if we know the exact extent of the area we want to cover in our target projection, that is a good start.
But how can we figure that out? One way would be to extract the bounding box of our raster and then reproject that. Because the bounding box is a vector object, we can reproject it into our target CRS without distortion:
ow_bbox <- ext(ow_2015) %>% #Get SpatRaster extent/bounding box
vect(crs = crs(ow_2015)) %>% #Convert to polygon and specify CRS
project("epsg:32616") #Projecting in appropriate UTM projection
plot(ow_bbox)

As you can see, we now have units in meters. Of course, that’s not quite enough. Now we need to use this information to create a new template raster. A key decision we need to make here is what resolution we want. In general, you should try to approximate the pixel size of the raster you are projecting (though this gets tricky if the raster covers a lot of north/south distance, because the ground size of lon/lat pixels changes as you move away from the Equator). Because the paper we are building on used Landsat imagery, which has a resolution of about 30 meters, we can use that for our template.
So to get this to work, we just need to create a SpatRaster using the rast() function, with ow_bbox as our input, and specify the resolution we want:
ow_template <- rast(ow_bbox, res = 30)
ow_template

class       : SpatRaster
dimensions  : 3459, 3093, 1  (nrow, ncol, nlyr)
resolution  : 30, 30  (x, y)
extent      : 271239.9, 364029.9, 1914624, 2018394  (xmin, xmax, ymin, ymax)
coord. ref. : WGS 84 / UTM zone 16N (EPSG:32616)
Cool! If we approximated the pixel size well, our dimensions should be pretty close to those of the original rasters:
ow_2015

class       : SpatRaster
dimensions  : 3451, 3214, 1  (nrow, ncol, nlyr)
resolution  : 0.0002694946, 0.0002694946  (x, y)
extent      : -89.15231, -88.28616, 17.31287, 18.2429  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
source      : rf_classification_2015.tif
name        : classification
min value   : 0
max value   : 10
Now to project, we just need to give terra our initial SpatRaster, our template, and the method for estimating the values of the new pixels. For continuous values we have a bunch of options, but because our pixels are categorical, we don’t want any interpolation - we just want each new pixel assigned the most common value among the original pixels it covers, which is what the mode gives us:
ow_2015_utm <- project(ow_2015_rec, ow_template, method = "mode")
plot(ow_2015_utm, type = "classes")

All done! Hopefully we managed to do that without too much distortion!
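If you want a rough check on that distortion, one option (an aside, not part of the original walkthrough) is to compare class frequencies before and after projecting - the counts will differ because the pixels changed, but the relative proportions should be close:

freq(ow_2015_rec) #cell counts per class in the original lon/lat raster
freq(ow_2015_utm) #cell counts per class after projecting to UTM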
Task 2: Reproject the other two reclassified Orange Walk land-cover rasters (ow_2000_rec and ow_1985_rec). Because all three rasters are aligned, you can use the same template raster for all of them. Plot your results, which should look something like this:
Computing landscape metrics
landscapemetrics has dozens of functions to choose from, and I’ll be honest that I’m not that familiar with many of them. You can see a list of them all here.
While this is a dizzying array of functions, they mostly follow similar patterns. All the functions for computing metrics begin with lsm_, for “landscape metric,” followed by an abbreviation for the level at which the metric is calculated:
- lsm_l_: compute the metric for the whole raster (the landscape level)
- lsm_c_: compute the metric for each individual land-cover class, taken as a whole
- lsm_p_: compute the metric for every single patch
The final part of the function name is an abbreviation for the measure itself. So, for example, lsm_l_ed computes the edge density - the total length of edges per hectare - for the entire landscape, while lsm_c_ed computes the same measure separately for each class. (Not every metric exists at every level: edge density has no patch-level version, for instance; the closest patch-level analogue is the perimeter-area ratio, lsm_p_para.)
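As an aside, if you’d rather browse the available metrics from inside R, the package’s list_lsm() function returns them as a tibble that you can filter by level:

list_lsm(level = "class") #every class-level metric, with names and types
list_lsm(level = "patch") #every patch-level metric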
No matter what function you use, landscapemetrics gives you results in the same tibble format, which makes it really easy to feed those results into ggplot2 for visualization. Let’s just have a look at a simple example (be patient - while the function format is simple, these are still computationally intensive):
ow_1985_c_ed <- lsm_c_ed(ow_1985_utm)
ow_1985_c_ed

# A tibble: 8 × 6
layer level class id metric value
<int> <chr> <int> <int> <chr> <dbl>
1 1 class 0 NA ed 0.797
2 1 class 1 NA ed 10.2
3 1 class 2 NA ed 4.21
4 1 class 3 NA ed 0.297
5 1 class 4 NA ed 0.939
6 1 class 5 NA ed 1.14
7 1 class 6 NA ed 8.37
8 1 class 7 NA ed 0.334
Okay, so a few things to notice here:
- A couple of classes, classes 1 and 6, have particularly high edge density, meaning that their patches tend to be more elongated, with complex boundaries
- Classes 3 and 7 have particularly low edge densities, suggesting their patches are more compact, with simpler boundaries
- A value has been computed for class 0, which is outside the boundaries of the landscape, so we need to keep that from happening if we compute any landscape-level metrics, or it will mess up our results (see the sketch below)
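One way to keep class 0 out of any landscape-level calculation - a sketch, reusing terra’s classify(), since landscapemetrics treats NA cells as background and excludes them:

#Recode 0 to NA so it is treated as background rather than as a class:
ow_1985_mask <- classify(ow_1985_utm, cbind(0, NA))
lsm_l_ed(ow_1985_mask) #landscape-level edge density without class 0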
So what did those codes represent, again?
ow_reclass

   LC_Code                       Class New_Code
1        1                 Agriculture        1
2        2                 Agriculture        1
3        3           Savanna/Shurbland        2
4        4                  Urban/Road        3
5        5 Lowland Broad-Leaved Forest        4
6        6                Swamp Forest        5
7        7                     Wetland        6
8        8                       Water        7
9        9                 Agriculture        1
10      10                 Agriculture        1
Task 3: Compute edge density at the class level for the other two time periods (make sure to use the _utm objects) and discuss your results. Do you notice any interesting changes over time? What do you think they might indicate is going on on the landscape?
Data visualization in ggplot2
Now that we know the basic setup for running landscape metric functions, let’s turn our attention to using data visualization to get a better feel for what’s taking place on a landscape over time. Let’s start by computing a new measure, this time at the patch level, across all three time periods. Let’s have a look at each patch’s core area - that is, the area of the patch that is not on an edge with another patch:
ow_1985_p_ca <- lsm_p_core(ow_1985_utm)
ow_2000_p_ca <- lsm_p_core(ow_2000_utm)
ow_2015_p_ca <- lsm_p_core(ow_2015_utm)

Like before, each of these will be a tibble, with the same format:
# A tibble: 20,731 × 6
layer level class id metric value
<int> <chr> <int> <int> <chr> <dbl>
1 1 patch 0 1 core 479043.
2 1 patch 1 2 core 22630.
3 1 patch 1 3 core 0
4 1 patch 1 4 core 10.9
5 1 patch 1 5 core 1102.
6 1 patch 1 6 core 0
7 1 patch 1 7 core 0
8 1 patch 1 8 core 1.89
9 1 patch 1 9 core 0
10 1 patch 1 10 core 0
# ℹ 20,721 more rows
You’ll notice here that this is a much bigger dataset - almost 21,000 rows (i.e. 21,000 patches in the landscape)! There’s no way we can get a sense of what’s going on in this table - much less compare it to the other ones - just by looking at it. That’s why we need ggplot2!
Data wrangling in tidyverse
Before we get to plotting, we’ll need to do a quick tutorial on some of tidyverse’s tools for managing data tables.
Because we have measures over time, it would be nice to be able to compare those measures on the same plot. So we want to combine the core results for each of the years, but we also need to make sure that we can still distinguish which measure is from which year. Here’s how we can do it:
#First, we need to add a new variable, which we'll call "year," to
#show which measure is from which year.
ow_1985_p_ca <- ow_1985_p_ca %>%
mutate(year = 1985)
ow_2000_p_ca <- ow_2000_p_ca %>%
mutate(year = 2000)
ow_2015_p_ca <- ow_2015_p_ca %>%
mutate(year = 2015)
#Now we can combine them using the rbind() function. rbind() just stacks
#up the rows of data tables, so long as they have the same number of
#columns and the variables have consistent names across the tables.
ow_p_ca <- rbind(ow_1985_p_ca,
ow_2000_p_ca,
ow_2015_p_ca)
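#As an aside (not in the original): dplyr's bind_rows() does the same
#job, matching columns by name rather than by position, which is safer
#if the column orders ever differ:
#ow_p_ca <- bind_rows(ow_1985_p_ca, ow_2000_p_ca, ow_2015_p_ca)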
#Then we can look at a summary of the variables to see if we got it right:
summary(ow_p_ca)

     layer      level               class            id       
 Min.   :1   Length:51387       Min.   :0.000   Min.   :    1  
 1st Qu.:1   Class :character   1st Qu.:2.000   1st Qu.: 4283  
 Median :1   Mode  :character   Median :3.000   Median : 8565  
 Mean   :1                      Mean   :3.326   Mean   : 8780  
 3rd Qu.:1                      3rd Qu.:6.000   3rd Qu.:12847  
 Max.   :1                      Max.   :7.000   Max.   :20731  
    metric              value              year     
 Length:51387       Min.   :     0.0   Min.   :1985  
 Class :character   1st Qu.:     0.0   1st Qu.:1985  
 Mode  :character   Median :     0.0   Median :2000  
                    Mean   :    51.8   Mean   :2001  
                    3rd Qu.:     0.4   3rd Qu.:2015  
                    Max.   :479043.2   Max.   :2015  
Once we have this table, we can also use tidyverse to create a column that will give us the text name of the classes, rather than the code, which will be helpful for making informative plots:
#First, we get the list of new codes and land-cover classes:
lc_names <- ow_reclass[,c("Class", "New_Code")] %>%
unique() #This returns only rows with distinct combinations of the
#column variables - that gets rid of those duplicated agriculture
#entries
#Now, we use the left_join(), which performs a table join, to attach
#the new labels:
ow_p_ca <- ow_p_ca %>%
left_join(lc_names, by = join_by(class == New_Code))
#The join_by() function tells R which columns to match up
ow_p_ca

# A tibble: 51,387 × 8
layer level class id metric value year Class
<int> <chr> <int> <int> <chr> <dbl> <dbl> <chr>
1 1 patch 0 1 core 479031. 1985 <NA>
2 1 patch 0 2 core 0 1985 <NA>
3 1 patch 0 3 core 0.18 1985 <NA>
4 1 patch 0 4 core 0.18 1985 <NA>
5 1 patch 0 5 core 0.09 1985 <NA>
6 1 patch 0 6 core 0 1985 <NA>
7 1 patch 0 7 core 3.24 1985 <NA>
8 1 patch 0 8 core 0.9 1985 <NA>
9 1 patch 0 9 core 0.09 1985 <NA>
10 1 patch 0 10 core 0 1985 <NA>
# ℹ 51,377 more rows
Cool! You might have noticed another issue, though, which is that we still have the pesky 0 category hanging on. tidyverse can help us with that, too:
ow_p_ca <- ow_p_ca %>%
filter(class != 0) #i.e. class isn't equal to 0
ow_p_ca

# A tibble: 51,126 × 8
layer level class id metric value year Class
<int> <chr> <int> <int> <chr> <dbl> <dbl> <chr>
1 1 patch 1 260 core 329046. 1985 Agriculture
2 1 patch 1 261 core 0 1985 Agriculture
3 1 patch 1 262 core 0 1985 Agriculture
4 1 patch 1 263 core 0 1985 Agriculture
5 1 patch 1 264 core 0 1985 Agriculture
6 1 patch 1 265 core 0 1985 Agriculture
7 1 patch 1 266 core 0 1985 Agriculture
8 1 patch 1 267 core 0 1985 Agriculture
9 1 patch 1 268 core 0.54 1985 Agriculture
10 1 patch 1 269 core 0 1985 Agriculture
# ℹ 51,116 more rows
Ah, much better!
Also, we can use these tools to rescale the core-area measure (whatever metric we compute, landscapemetrics puts the result in a column called value; check each function’s documentation for its units) to square kilometers or football pitches, to make things easier to follow:
ow_p_ca <- ow_p_ca %>%
mutate(core_km = value / 1000000,
         core_fbp = core_km * 140)

One last trick: we can use the group_by() function to define groupings based on categorical variables in the data table and then use summarise() to compute measures broken down by group:
ow_p_ca %>%
group_by(Class, year) %>% #Class with a capital C is the text label
summarise(core_mn_fbp = mean(core_fbp),
            core_sd_fbp = sd(core_fbp)) #sd() computes standard deviation

`summarise()` has grouped output by 'Class'. You can override using the
`.groups` argument.
# A tibble: 21 × 4
# Groups: Class [7]
Class year core_mn_fbp core_sd_fbp
<chr> <dbl> <dbl> <dbl>
1 Agriculture 1985 0.0163 0.856
2 Agriculture 2000 0.0171 0.825
3 Agriculture 2015 0.00377 0.241
4 Lowland Broad-Leaved Forest 1985 0.0000576 0.000251
5 Lowland Broad-Leaved Forest 2000 0.000183 0.00134
6 Lowland Broad-Leaved Forest 2015 0.00536 0.309
7 Savanna/Shurbland 1985 0.000865 0.0165
8 Savanna/Shurbland 2000 0.000991 0.0220
9 Savanna/Shurbland 2015 0.000748 0.0164
10 Swamp Forest 1985 0.000422 0.00888
# ℹ 11 more rows
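Incidentally, that message from summarise() is purely informational. If you want to silence it, you can tell summarise() what to do with the grouping via its .groups argument - a sketch:

ow_p_ca %>%
  group_by(Class, year) %>%
  summarise(core_mn_fbp = mean(core_fbp),
            core_sd_fbp = sd(core_fbp),
            .groups = "drop") #return an ungrouped tibble, with no message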
Data visualization
Okay, that’s probably enough data wrangling for the moment. Let’s start having a look at data visualization. We’ve already played around with ggplot2 quite a bit, and you should have reviewed the ggplot2 overview video, so I’m going to be pretty quick here.
One of the things to think about when making data visualizations is how best to match the characteristics of the data with the visualization technique. Here, we’re trying to visualize a measure with a very large number of observations and very widely spread values - some of the hardest data to make sense of - so we’re just going to try a few things. Let’s start with a few approaches to visualizing distributions:
ggplot(data = ow_p_ca, aes(x = core_fbp)) +
geom_histogram() +
#We're going to label the axes for clarity. The first entry in
#a scale_ function is the title for the scale:
scale_x_continuous("Patch Size (in Football Pitches)") +
scale_y_continuous("Number of Patches") +
facet_wrap(~Class) + #Showing each class separately
  theme_dark() #We'll use a few built-in themes so you can see them

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Well, that’s not great - what’s going on? This is a common problem when we’re trying to visualize very heterogeneous data. We’ve got some absolutely gigantic agriculture plots, and they’re messing with the scales for all the other classes. Also, we would probably be better served by orienting our plots vertically, to optimize space. Let’s make those adjustments:
ggplot(data = ow_p_ca, aes(x = core_fbp)) +
geom_histogram() +
scale_x_continuous("Patch Size (in Football Pitches)") +
scale_y_continuous("Number of Patches") +
#Here's the change. We specify that we want 1 column of plots and that
#ggplot should set the x-axis scale separately for each plot
facet_wrap(~Class, ncol = 2, scales = "free_x") +
  theme_bw() #New theme

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Better, but the y-axes are still a problem, for the same reason that the x-axes were. Let’s try this:
ggplot(data = ow_p_ca, aes(x = core_fbp)) +
geom_histogram() +
scale_x_continuous("Patch Size (in Football Pitches)") +
scale_y_continuous("Number of Patches") +
#ggplot should set the scales separately for each plot
facet_wrap(~Class, ncol = 2, scales = "free") +
  theme_classic() #New theme

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Better, but we probably also want to see how this varies by year:
ggplot(data = ow_p_ca, aes(x = core_fbp)) +
geom_histogram() +
scale_x_continuous("Patch Size (in Football Pitches)") +
scale_y_continuous("Number of Patches") +
#facet_grid will lay out the plots based on two variables. The first
#will vary along the rows; the second along the columns
facet_grid(year~Class, scales = "free") +
  theme_linedraw() #New theme

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Not good, either. This is hard! Let’s try a different strategy. We can also use boxplots or violin plots to show distributions. Maybe that will work better:
#Notice now we're setting the year as the x-axis and the value as the
#y-axis. Wrapping the year variable with factor() ensures R understands
#it is a discrete value:
ggplot(data = ow_p_ca, aes(x = factor(year), y = core_fbp)) +
geom_boxplot() +
scale_x_discrete("Year") +
scale_y_continuous("Patch Size (in Football Pitches)") +
#ggplot should set the scales separately for each plot
facet_wrap(~Class, ncol = 2, scales = "free") +
  theme_light() #New theme
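Since we mentioned violin plots as an option, here’s what that version looks like - a sketch that just swaps geom_boxplot() for geom_violin():

ggplot(data = ow_p_ca, aes(x = factor(year), y = core_fbp)) +
  geom_violin() +
  scale_x_discrete("Year") +
  scale_y_continuous("Patch Size (in Football Pitches)") +
  facet_wrap(~Class, ncol = 2, scales = "free") +
  theme_light()

Nope! Still too skewed! Let’s try looking at things on a logarithmic scale: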
ggplot(data = ow_p_ca, aes(x = factor(year), y = core_fbp)) +
geom_boxplot() +
scale_x_discrete("Year") +
scale_y_log10("Patch Size (in Football Pitches)") +
#ggplot should set the scales separately for each plot
facet_wrap(~Class, ncol = 2, scales = "free") +
  theme_light() #New theme

Warning in scale_y_log10("Patch Size (in Football Pitches)"): log-10
transformation introduced infinite values.
Warning: Removed 31383 rows containing non-finite outside the scale range
(`stat_boxplot()`).
And now we’ve eliminated over 30,000 patches - all the ones that had 0 core area! Ugh!
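One compromise worth knowing about (an aside, not in the original walkthrough) is a pseudo-log scale, which behaves like a log scale for large values but passes smoothly through zero; the scales package provides pseudo_log_trans() for this:

ggplot(data = ow_p_ca, aes(x = factor(year), y = core_fbp)) +
  geom_boxplot() +
  scale_x_discrete("Year") +
  #pseudo_log_trans() is log-like for large values but defined at 0:
  scale_y_continuous("Patch Size (in Football Pitches)",
                     trans = scales::pseudo_log_trans()) +
  facet_wrap(~Class, ncol = 2, scales = "free") +
  theme_light()

When all else fails, maybe we can at least look at some overall trends. We can use the geom_smooth() method to at least see the average trend over time: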
#Notice that if we're using geom_smooth() we can keep year as a
#continuous variable, because geom_smooth() is actually computing
#a regression model to draw its trendline.
ggplot(data = ow_p_ca, aes(x = year, y = core_fbp)) +
geom_smooth(method = "glm") +
scale_x_continuous("Year") +
scale_y_continuous("Patch Size (in Football Pitches)") +
#ggplot should set the scales separately for each plot
facet_wrap(~Class, ncol = 2, scales = "free") +
  theme_light() #New theme

`geom_smooth()` using formula = 'y ~ x'
Finally, we could just take some summary measures, like the mean, and plot those over time to see if we see any clearer patterns:
ow_p_ca %>%
group_by(year, Class) %>%
summarise(core_mn_fbp = mean(core_fbp)) %>%
ggplot(aes(x = factor(year), y=core_mn_fbp)) +
geom_bar(stat = "identity", linewidth = 1, color = "black") +
scale_x_discrete("Year") +
scale_y_continuous("Mean Patch Size (FB Pitches)") +
  facet_wrap(~Class, ncol = 3, scales = "free") +
  theme_gray() #theme_gray() is ggplot2's default theme

`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
Task 4: Compute two different landscape metrics, at the class level, for all three years we have available. Then do some online searches for examples of how to use the geom_bar() method to show how these measures have changed over time. Explain what your findings might tell you about what’s going on in the landscape.