This tutorial is designed to walk the user through some basic
features of the mop package (Cobos, et al. 2024a),
a suite of tools designed to extend the Mobility-Oriented Parity (MOP)
metric (Owens, et al. 2013). MOP is a method for estimating
dissimilarity between environmental datasets when projecting a niche
model to novel environments outside those used to train the model
(examples: assessing risk of spread of invasive species, estimating
changes in available suitable habitat under climate change). When niche
models are projected, generally some form of statistical extrapolation
is done. This may lead to additional uncertainty in model projections,
which should be strongly considered when interpreting model projection
results. MOP was originally proposed as an alternative to the
Multivariate Environmental Similarity Surface (MESS) metric (Elith,
et al. 2010).
Here, we adapt supplementary material accompanying our recent paper in Frontiers in Biogeography, which introduces new extensions of MOP and demonstrates their use and usefulness (Cobos, et al. 2024b). In this example, we want to compare the model training region for a hypothetical species with a native range in the Yucután Peninsula to the rest of Mexico (perhaps a conservation organization wants to transplant the species, an endangered plant, and we want to project a niche model of the species to see where else there might be suitable habitat under present-day conditions).
First thing’s first–we need to load the necessary packages.
library(mop) # Tools for calculating Mobility-Oriented Parity and its derivatives
library(terra) # For working with spatial data## terra 1.7.78
library(geodata) # For getting WorldClim data
library(ggplot2) # Classic, for plotting
library(tidyterra) # For plotting fancy maps##
## Attaching package: 'tidyterra'
## The following object is masked from 'package:stats':
##
## filter
Now we get some data for the demonstration. For this, we will download all 19 Bioclim variables for Mexico, then select mean annual air temperature, temperature seasonality, and mean annual precipitation (Bio 1, 4, and 12, respectively), variables we have determined are most relevant for our species of interest.
# downloading environmental variables
varMex <- terra::aggregate(geodata::worldclim_country(country = "Mexico",
var = "bio",
res = 10,
path = "data/"),
4) # Further reduces resolution of environmental data
varMex <- varMex[[c(1, 4, 12)]] # Reduce dataset to variables of interest
names(varMex) <- gsub("wc2.1_30s_", "", names(varMex)) # renaming variables
names(varMex) <- gsub("bio_", "Bio ", names(varMex))
plot(varMex, col = map.pal("inferno"))Next, we simulate a training region for a species of interest, the native range of which is in the Yucután Peninsula.
# Create training region
tr <- data.frame("lon" = c(-89.25, -88.5, -89),
"lat" = c(19.75, 20.5, 19))
trYuc <- vect(tr, geom = c("lon", "lat"), crs = crs(varMex))
trYuc <- aggregate(buffer(trYuc, width = 100000))
plot(varMex[[1]], col = map.pal("reds"),
main = "Mean Annual Temperature")
plot(trYuc, add = T, lwd = 1.5)Now we calculate detailed MOP metrics comparing the training and projection regions. Be patient. These calculations will likely take a while to run, depending on the size of the dataset (resolution, extent, AND number of variables).
mexMOP <- mop(m = varYuc, # Training SpatRaster
g = varMex, # Projection SpatRaster
type = "detailed", # Calculates all the MOP variants
calculate_distance = TRUE, # Estimates how dissimilar g is from m
where_distance = "out_range", # Calculates distances from m in g (comparable to MESS)
scale = TRUE, # Recommended if variables are at different orders of magnitude
center = TRUE, # Recommended if variables are at different orders of magnitude
percentage = .95) # How much of m is used to calculate distancesThere are, of course, some basic methods available for the results of
the analysis. summary() provides the most basic results
(and reminds you what arguments were supplied).
##
## Summary of MOP resuls
## ---------------------------------------------------------------------------
##
## MOP summary:
## Values
## type scale center calculate_distance distance percentage
## 1 detailed TRUE TRUE TRUE euclidean 0.95
## rescale_distance fix_NA N_m N_g
## 1 FALSE TRUE 5520 243336
##
## Reference conditions
## Bio 1 Bio 4 Bio 12
## min 1.223214 -1.418604 -0.03005889
## max 1.809927 -1.116505 0.92187111
##
##
## Distances:
## min mean max
## 0.04570247 2.34100660 5.96785727
##
##
## Non-analogous conditions (NAC):
## Percentage = 0.954% of all conditions
## Variables with NAC in 'simple' = 3
##
##
## Detailed results were obtained, not shown here
First, there are a few things we need to make some nice-looking maps.
# Define a "clear" color for no data values
my_col <- rgb(0, 0, 255, max = 255, alpha = 0, names = "clear")
# Get a continent outline
land <- aggregate(rnaturalearth::ne_countries(scale = "small", returnclass = "sv"))Now let’s look at the MOP results in more detail. First, the
mop_simple raster shows how many variables have values
outside of those in the training region. The training region is shown
with a heavy black outline. Areas with NA values do not
have values outside the range of the training region. According to this
map, a user may want to treat model projections in areas with more
variables outside the training region limits as having more
uncertainty.
ggplot() +
geom_spatraster(data = mexMOP$mop_simple) +
scale_fill_brewer(type = "seq", name = "", palette = "Purples") +
geom_sf(data = land, bg = my_col, col = "black") +
geom_sf(data = trYuc, bg = my_col, col = "black", linewidth = 1.5) +
coord_sf(xlim = ext(mexMOP$mop_simple)[1:2],
ylim = ext(mexMOP$mop_simple)[3:4],
expand = FALSE, label_axes = list(bottom = "E", left = "N")) +
ggtitle("Number of Variables Outside Training Region") +
theme_bw()Next, let’s take a look at how different our g space (Mexico) is from
m space (the training region for our hypthetical plant in the Yucután)
based on multivariate Euclidean distances (see Cobos et al.
2024b for details). As before, areas with NA values do not
have values outside the range of the training region.
ggplot() +
geom_spatraster(data = mexMOP$mop_distances) +
scale_fill_continuous(type = "viridis", na.value = my_col, name = "") +
geom_sf(data = land, bg = my_col, col = "black") +
geom_sf(data = trYuc, bg = my_col, col = "black", linewidth = 1.5) +
coord_sf(xlim = ext(mexMOP$mop_distances)[1:2],
ylim = ext(mexMOP$mop_distances)[3:4],
expand = FALSE, label_axes = list(bottom = "E", left = "N")) +
ggtitle("Distance from m in g") +
theme_bw()One of the key improvements of Cobos et al. 2024b is the addition of more detailed extrapolation maps showing which variables are outside the limits of the training region at the high end versus the low end. If you know which variables are too high and which are too low, you can compare this information to variable response curves to better understand if and how a model is extrapolating to make a prediction.
First, we plot the low-end extrapolations. As in the first map, NA values indicate no extrapolation, but this time, only at the low end of each variable. The map is color-coded by which variable’s values are lower than the lowest values for that variable within the training region.
dl <- droplevels(mexMOP$mop_detailed$towards_low_combined)
ggplot() +
geom_spatraster(data = dl) +
scale_fill_brewer(name = "", palette = "Blues") +
geom_sf(data = land, bg = my_col, col = "black") +
geom_sf(data = trYuc, bg = my_col, col = "black", linewidth = 1.5) +
coord_sf(xlim = ext(mexMOP$mop_simple)[1:2],
ylim = ext(mexMOP$mop_simple)[3:4],
expand = FALSE, label_axes = list(bottom = "E", left = "N")) +
ggtitle("Variables Outside Training Region at Low End") +
theme_bw()And here’s high end extrapolation. NA values now indicate no extrapolation at the high end of each variable. You can see that there is not much high end extrapolation in southwest Mexico. If you look at the variable response curves of your models and see that at the high ends, responses go down (i.e. higher values = lower suitability) you might be more confident in the model’s ability to generate reasonable predictions via extrapolation in these regions.
dl <- droplevels(mexMOP$mop_detailed$towards_high_combined)
ggplot() +
geom_spatraster(data = dl) +
scale_fill_brewer(name = "", palette = "Oranges") +
geom_sf(data = land, bg = my_col, col = "black") +
geom_sf(data = trYuc, bg = my_col, col = "black", linewidth = 1.5) +
coord_sf(xlim = ext(mexMOP$mop_simple)[1:2],
ylim = ext(mexMOP$mop_simple)[3:4],
expand = FALSE, label_axes = list(bottom = "E", left = "N")) +
ggtitle("Variables Outside Training Region at High End") +
theme_bw()I hope this has helped. Happy (and careful) model projecting!
Cobos ME, Owens HL, Soberón J, Peterson AT. 2024a. mop: Mobility Oriented-Parity Metric. R package version 0.1.2. https://CRAN.R-project.org/package=mop.
Cobos ME, Owens HL, Soberón J, Peterson AT. 2024b. Detailed multivariate comparisons of environments with mobility oriented parity. Frontiers of Biogeography 17: e132916. https://doi.org/10.21425/fob.17.132916.
Elith J, Kearney M, Phillips S. 2010. The art of modelling range‐shifting species. Methods in Ecology and Evolution 1: 330-42. https://doi.org/10.1111/j.2041-210X.2010.00036.x.
Owens HL, Campbell LP, Dornak LL, Saupe EE, Barve N, Soberón J, Ingenloff K, Lira-Noriega A, Hensz CM, Myers CE, Peterson AT. 2013. Constraints on interpretation of ecological niche models by limited environmental ranges on calibration areas. Ecological Modelling 263: 10-8. https://doi.org/10.1016/j.ecolmodel.2013.04.011.