This tutorial is designed to walk the user through some basic features of the mop package (Cobos, et al. 2024a), a suite of tools designed to extend the Mobility-Oriented Parity (MOP) metric (Owens, et al. 2013). MOP is a method for estimating dissimilarity between environmental datasets when projecting a niche model to novel environments outside those used to train the model (examples: assessing risk of spread of invasive species, estimating changes in available suitable habitat under climate change). When niche models are projected, generally some form of statistical extrapolation is done. This may lead to additional uncertainty in model projections, which should be strongly considered when interpreting model projection results. MOP was originally proposed as an alternative to the Multivariate Environmental Similarity Surface (MESS) metric (Elith, et al. 2010).

Here, we adapt supplementary material accompanying our recent paper in Frontiers in Biogeography, which introduces new extensions of MOP and demonstrates their use and usefulness (Cobos, et al. 2024b). In this example, we want to compare the model training region for a hypothetical species with a native range in the Yucután Peninsula to the rest of Mexico (perhaps a conservation organization wants to transplant the species, an endangered plant, and we want to project a niche model of the species to see where else there might be suitable habitat under present-day conditions).

Setup

First thing’s first–we need to load the necessary packages.

library(mop) # Tools for calculating Mobility-Oriented Parity and its derivatives
library(terra) # For working with spatial data
## terra 1.7.78
library(geodata) # For getting WorldClim data
library(ggplot2) # Classic, for plotting
library(tidyterra) # For plotting fancy maps
## 
## Attaching package: 'tidyterra'
## The following object is masked from 'package:stats':
## 
##     filter

Getting data

Now we get some data for the demonstration. For this, we will download all 19 Bioclim variables for Mexico, then select mean annual air temperature, temperature seasonality, and mean annual precipitation (Bio 1, 4, and 12, respectively), variables we have determined are most relevant for our species of interest.

# downloading environmental variables 
varMex <- terra::aggregate(geodata::worldclim_country(country = "Mexico",
                                     var = "bio",
                                     res = 10, 
                                     path = "data/"), 
                           4) # Further reduces resolution of environmental data
varMex <- varMex[[c(1, 4, 12)]] # Reduce dataset to variables of interest
names(varMex) <- gsub("wc2.1_30s_", "", names(varMex)) # renaming variables
names(varMex) <- gsub("bio_", "Bio ", names(varMex))
plot(varMex, col = map.pal("inferno"))

Next, we simulate a training region for a species of interest, the native range of which is in the Yucután Peninsula.

# Create training region
tr <- data.frame("lon" = c(-89.25, -88.5, -89), 
                 "lat" = c(19.75, 20.5, 19))
trYuc <- vect(tr, geom = c("lon", "lat"), crs = crs(varMex))
trYuc <- aggregate(buffer(trYuc, width = 100000))

plot(varMex[[1]], col = map.pal("reds"), 
     main = "Mean Annual Temperature")
plot(trYuc, add = T, lwd = 1.5)

# Crop environmental data to training region
varYuc <- crop(varMex, trYuc, mask = TRUE)

Calculating extended MOP

Now we calculate detailed MOP metrics comparing the training and projection regions. Be patient. These calculations will likely take a while to run, depending on the size of the dataset (resolution, extent, AND number of variables).

mexMOP <- mop(m = varYuc, # Training SpatRaster
              g = varMex, # Projection SpatRaster
              type = "detailed", # Calculates all the MOP variants
              calculate_distance = TRUE, # Estimates how dissimilar g is from m
              where_distance = "out_range", # Calculates distances from m in g (comparable to MESS)
              scale = TRUE, # Recommended if variables are at different orders of magnitude
              center = TRUE, # Recommended if variables are at different orders of magnitude
              percentage = .95) # How much of m is used to calculate distances

There are, of course, some basic methods available for the results of the analysis. summary() provides the most basic results (and reminds you what arguments were supplied).

summary(mexMOP)
## 
##                         Summary of MOP resuls
## ---------------------------------------------------------------------------
## 
## MOP summary:
## Values
##       type scale center calculate_distance  distance percentage
## 1 detailed  TRUE   TRUE               TRUE euclidean       0.95
##   rescale_distance fix_NA  N_m    N_g
## 1            FALSE   TRUE 5520 243336
## 
## Reference conditions
##        Bio 1     Bio 4      Bio 12
## min 1.223214 -1.418604 -0.03005889
## max 1.809927 -1.116505  0.92187111
## 
## 
## Distances:
##        min       mean        max 
## 0.04570247 2.34100660 5.96785727 
## 
## 
## Non-analogous conditions (NAC):
## Percentage = 0.954% of all conditions
## Variables with NAC in 'simple' = 3
## 
## 
## Detailed results were obtained, not shown here

Mapping

First, there are a few things we need to make some nice-looking maps.

# Define a "clear" color for no data values
my_col <- rgb(0, 0, 255, max = 255, alpha = 0, names = "clear")

# Get a continent outline
land <- aggregate(rnaturalearth::ne_countries(scale = "small", returnclass = "sv"))

Now let’s look at the MOP results in more detail. First, the mop_simple raster shows how many variables have values outside of those in the training region. The training region is shown with a heavy black outline. Areas with NA values do not have values outside the range of the training region. According to this map, a user may want to treat model projections in areas with more variables outside the training region limits as having more uncertainty.

ggplot() +
  geom_spatraster(data = mexMOP$mop_simple) +
  scale_fill_brewer(type = "seq", name = "", palette = "Purples") +
  geom_sf(data = land, bg = my_col, col = "black") +
  geom_sf(data = trYuc, bg = my_col, col = "black", linewidth = 1.5) +
  coord_sf(xlim = ext(mexMOP$mop_simple)[1:2], 
           ylim = ext(mexMOP$mop_simple)[3:4],
           expand = FALSE, label_axes = list(bottom = "E", left = "N")) +
  ggtitle("Number of Variables Outside Training Region") +
  theme_bw()

Next, let’s take a look at how different our g space (Mexico) is from m space (the training region for our hypthetical plant in the Yucután) based on multivariate Euclidean distances (see Cobos et al. 2024b for details). As before, areas with NA values do not have values outside the range of the training region.

ggplot() +
  geom_spatraster(data = mexMOP$mop_distances) +
  scale_fill_continuous(type = "viridis", na.value = my_col, name = "") +
  geom_sf(data = land, bg = my_col, col = "black") +
  geom_sf(data = trYuc, bg = my_col, col = "black", linewidth = 1.5) +
  coord_sf(xlim = ext(mexMOP$mop_distances)[1:2], 
           ylim = ext(mexMOP$mop_distances)[3:4],
           expand = FALSE, label_axes = list(bottom = "E", left = "N")) +
  ggtitle("Distance from m in g") +
  theme_bw()

One of the key improvements of Cobos et al. 2024b is the addition of more detailed extrapolation maps showing which variables are outside the limits of the training region at the high end versus the low end. If you know which variables are too high and which are too low, you can compare this information to variable response curves to better understand if and how a model is extrapolating to make a prediction.

First, we plot the low-end extrapolations. As in the first map, NA values indicate no extrapolation, but this time, only at the low end of each variable. The map is color-coded by which variable’s values are lower than the lowest values for that variable within the training region.

dl <- droplevels(mexMOP$mop_detailed$towards_low_combined)

ggplot() +
  geom_spatraster(data = dl) +
  scale_fill_brewer(name = "", palette = "Blues") +
  geom_sf(data = land, bg = my_col, col = "black") +
  geom_sf(data = trYuc, bg = my_col, col = "black", linewidth = 1.5) +
  coord_sf(xlim = ext(mexMOP$mop_simple)[1:2], 
           ylim = ext(mexMOP$mop_simple)[3:4],
           expand = FALSE, label_axes = list(bottom = "E", left = "N")) +
  ggtitle("Variables Outside Training Region at Low End") +
  theme_bw()

And here’s high end extrapolation. NA values now indicate no extrapolation at the high end of each variable. You can see that there is not much high end extrapolation in southwest Mexico. If you look at the variable response curves of your models and see that at the high ends, responses go down (i.e. higher values = lower suitability) you might be more confident in the model’s ability to generate reasonable predictions via extrapolation in these regions.

dl <- droplevels(mexMOP$mop_detailed$towards_high_combined)

ggplot() +
  geom_spatraster(data = dl) +
  scale_fill_brewer(name = "", palette = "Oranges") +
  geom_sf(data = land, bg = my_col, col = "black") +
  geom_sf(data = trYuc, bg = my_col, col = "black", linewidth = 1.5) +
  coord_sf(xlim = ext(mexMOP$mop_simple)[1:2], 
           ylim = ext(mexMOP$mop_simple)[3:4],
           expand = FALSE, label_axes = list(bottom = "E", left = "N")) +
  ggtitle("Variables Outside Training Region at High End") +
  theme_bw()

I hope this has helped. Happy (and careful) model projecting!

References

Cobos ME, Owens HL, Soberón J, Peterson AT. 2024a. mop: Mobility Oriented-Parity Metric. R package version 0.1.2. https://CRAN.R-project.org/package=mop.

Cobos ME, Owens HL, Soberón J, Peterson AT. 2024b. Detailed multivariate comparisons of environments with mobility oriented parity. Frontiers of Biogeography 17: e132916. https://doi.org/10.21425/fob.17.132916.

Elith J, Kearney M, Phillips S. 2010. The art of modelling range‐shifting species. Methods in Ecology and Evolution 1: 330-42. https://doi.org/10.1111/j.2041-210X.2010.00036.x.

Owens HL, Campbell LP, Dornak LL, Saupe EE, Barve N, Soberón J, Ingenloff K, Lira-Noriega A, Hensz CM, Myers CE, Peterson AT. 2013. Constraints on interpretation of ecological niche models by limited environmental ranges on calibration areas. Ecological Modelling 263: 10-8. https://doi.org/10.1016/j.ecolmodel.2013.04.011.