Please find below the R code history from your Wallace v1.1.0 session.

You can reproduce your session results by running this R Markdown file in RStudio.

Each code block is called a “chunk.” You can run chunks one by one or all at once by choosing an option from the “Run” menu at the top-right corner of the “Source” pane in RStudio.

For more detailed information, see http://rmarkdown.rstudio.com.

Package installation

Wallace uses the following R packages that must be installed and loaded before starting.
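
If any of these packages are missing, they can be installed from CRAN before loading. A minimal sketch (the package list mirrors the library() calls below, plus wallace itself, which is needed later to source functions.R):

# install any required packages that are not already present
pkgs <- c("spocc", "spThin", "dismo", "rgeos", "ENMeval", "dplyr", "wallace")
install.packages(setdiff(pkgs, rownames(installed.packages())))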

library(spocc)
library(spThin)
## Loading required package: spam
## Loading required package: dotCall64
## Loading required package: grid
## Spam version 2.7-0 (2021-06-25) is loaded.
## Type 'help( Spam)' or 'demo( spam)' for a short introduction 
## and overview of this package.
## Help for individual functions is also obtained by adding the
## suffix '.spam' to the function name, e.g. 'help( chol.spam)'.
## 
## Attaching package: 'spam'
## The following objects are masked from 'package:base':
## 
##     backsolve, forwardsolve
## Loading required package: fields
## Loading required package: viridis
## Loading required package: viridisLite
## See https://github.com/NCAR/Fields for
##  an extensive vignette, other supplements and source code
## Loading required package: knitr
library(dismo)
## Loading required package: raster
## Loading required package: sp
library(rgeos)
## rgeos version: 0.5-5, (SVN revision 640)
##  GEOS runtime version: 3.8.0-CAPI-1.13.1 
##  Linking to sp version: 1.4-5 
##  Polygon checking: TRUE
library(ENMeval)
## Loading required package: magrittr
## 
## Attaching package: 'magrittr'
## The following object is masked from 'package:raster':
## 
##     extract
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:rgeos':
## 
##     intersect, setdiff, union
## The following objects are masked from 'package:raster':
## 
##     intersect, select, union
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Wallace also includes several functions developed to help integrate different packages, as well as some additional functionality. For this reason, it is necessary to load the file functions.R. The function system.file() finds this script, and source() loads it.

source(system.file('shiny/funcs', 'functions.R', package = 'wallace'))
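
The helper formatSpName() used below is one of the functions defined in functions.R; the actual definition lives in that script, but its behavior is roughly to convert a species name into the underscore-separated key that spocc uses:

# rough equivalent of formatSpName() (an assumption; see functions.R for the real definition)
# gsub(" ", "_", "Ovis canadensis")  # returns "Ovis_canadensis"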

Record of analysis for Ovis canadensis.

Obtain Occurrence Data

The search for occurrences was limited to 10,000 records. Obtain occurrence records of the selected species from the GBIF database.

# query selected database for occurrence records
results <- spocc::occ(query = "Ovis canadensis", from = "gbif", limit = 10000, has_coords = TRUE)
# retrieve data table from spocc object
results.data <- results[["gbif"]]$data[[formatSpName("Ovis canadensis")]]
# remove rows with duplicate coordinates
occs.dups <- duplicated(results.data[c('longitude', 'latitude')])
occs <- results.data[!occs.dups,]
# make sure latitude and longitude are numeric (sometimes they are characters)
occs$latitude <- as.numeric(occs$latitude)
occs$longitude <- as.numeric(occs$longitude)
# give all records a unique ID
occs$occID <- row.names(occs)

Process Occurrence Data

The following code recreates the polygon used to select occurrences to keep in the analysis.

selCoords <- data.frame(x = c(-113.9933, -109.0727, -109.0508, -114.0812, -113.9933), y = c(41.9731, 41.9894, 36.9909, 37.0084, 41.9731))
selPoly <- sp::SpatialPolygons(list(sp::Polygons(list(sp::Polygon(selCoords)), ID=1)))
occs.xy <- occs[c('longitude', 'latitude')]
sp::coordinates(occs.xy) <- ~ longitude + latitude
intersect <- sp::over(occs.xy, selPoly)
intersect.rowNums <- as.numeric(which(!(is.na(intersect))))
occs <- occs[intersect.rowNums, ]

Spatial thinning was selected, with a thinning distance of 10 km.

output <- spThin::thin(occs, 'latitude', 'longitude', 'name', thin.par = 10, reps = 100, locs.thinned.list.return = TRUE, write.files = FALSE, verbose = FALSE)
## Warning in spThin::thin(occs, "latitude", "longitude", "name", thin.par = 10, :
## There appear to be more than one species name in this *.csv file.
## Warning in spThin::thin(occs, "latitude", "longitude", "name", thin.par = 10, :
## Only using species name: Ovis canadensis Shaw, 1804

Since spThin ran 100 iterations, there are 100 different variations of how it thinned your occurrence localities. Because the algorithm has a stochastic element, some iterations may retain more localities than others, and we want to proceed with as many localities as possible.

# find the iteration that returns the max number of occurrences
maxThin <- which(sapply(output, nrow) == max(sapply(output, nrow)))
# if there's more than one max, pick the first one
maxThin <- output[[ifelse(length(maxThin) > 1, maxThin[1], maxThin)]]  
# subset occs to match only thinned occs
occs <- occs[as.numeric(rownames(maxThin)),]  

Obtain Environmental Data

Using the WorldClim (http://www.worldclim.org/) bioclimatic dataset at a resolution of 2.5 arcmin.

# get WorldClim bioclimatic variable rasters
envs <- raster::getData(name = "worldclim", var = "bio", res = 2.5)
# standardize the raster variable names (e.g. bio1 -> bio01)
envRes <- 2.5
if (envRes == 0.5) {
  i <- grep('_', names(envs))
  editNames <- sapply(strsplit(names(envs)[i], '_'), function(x) x[1])
  names(envs)[i] <- editNames
}
i <- grep('bio[0-9]$', names(envs))
editNames <- paste('bio', sapply(strsplit(names(envs)[i], 'bio'), function(x) x[2]), sep='0')
names(envs)[i] <- editNames
# subset by those variables selected
envs <- envs[[c('bio01', 'bio02', 'bio03', 'bio04', 'bio05', 'bio06', 'bio07', 'bio08', 'bio09', 'bio10', 'bio11', 'bio12', 'bio13', 'bio14', 'bio15', 'bio16', 'bio17', 'bio18', 'bio19')]]
# extract environmental values at occ grid cells
locs.vals <- raster::extract(envs[[1]], occs[, c('longitude', 'latitude')])
# remove occs without environmental values
occs <- occs[!is.na(locs.vals), ]  

Process Environmental Data

The background selection technique chosen was Bounding Box.

xmin <- min(occs$longitude)
xmax <- max(occs$longitude)
ymin <- min(occs$latitude)
ymax <- max(occs$latitude)
bb <- matrix(c(xmin, xmin, xmax, xmax, xmin, ymin, ymax, ymax, ymin, ymin), ncol=2)
bgExt <- sp::SpatialPolygons(list(sp::Polygons(list(sp::Polygon(bb)), 1)))

The buffer size of the study extent polygon was defined as 0.5 degrees.

bgExt <- rgeos::gBuffer(bgExt, width = 0.5)

Mask the environmental variables by the Bounding Box, and take a random sample of background values from the study extent. As the sample is random, your results may differ from those in the session. If there seems to be too much variability in these background samples, try increasing the number of points from 10,000 to something higher (e.g. 50,000 or 100,000). The better your background sample, the less variability you’ll have between runs.

# crop the environmental rasters by the background extent shape
envsBgCrop <- raster::crop(envs, bgExt)
# mask the background extent shape from the cropped raster
envsBgMsk <- raster::mask(envsBgCrop, bgExt)
# sample random background points
bg.xy <- dismo::randomPoints(envsBgMsk, 10000)
# convert matrix output to data frame
bg.xy <- as.data.frame(bg.xy)  
colnames(bg.xy) <- c("longitude", "latitude")
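
If you want the background sample itself to be reproducible across runs, one option is to fix R's random seed before sampling. A minimal sketch (the seed value 48 is arbitrary):

# fixing the seed makes dismo::randomPoints() return the same sample each run
set.seed(48)
bg.xy <- as.data.frame(dismo::randomPoints(envsBgMsk, 10000))
colnames(bg.xy) <- c("longitude", "latitude")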

Partition Occurrence Data

Occurrence data is now partitioned for cross-validation, a method that iteratively builds a model on all but one group and evaluates that model on the left-out group.

For example, if the data is partitioned into 3 groups A, B, and C, a model is first built with groups A and B and evaluated on C. This is repeated by building a model with B and C and evaluating it on A, and so on, until each group has been left out once.
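
As an illustration of that logic, here is a minimal sketch of the build/evaluate loop over groups (grp is a hypothetical vector of group labels, analogous to the one produced by the partitioning code below):

# assign each occurrence to one of 3 hypothetical groups
grp <- sample(rep(1:3, length.out = nrow(occs)))
for (k in unique(grp)) {
  train <- occs[grp != k, ]  # build the model with all other groups
  test  <- occs[grp == k, ]  # evaluate it on the left-out group
}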

Cross-validation operates under the assumption that the groups are independent of each other, which may or may not be a safe assumption for your dataset. Spatial partitioning is one way to ensure more independence between groups.
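
For instance, ENMeval offers spatial partitioning methods such as get.block(), which splits occurrence and background points into four geographic quadrants; a hedged sketch, kept commented out so it does not affect the partition actually used below:

# spatial alternative (not run): four geographic quadrants via ENMeval
# group.data.sp <- ENMeval::get.block(occs[c('longitude', 'latitude')], bg.xy)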

You selected to partition your occurrence data by the random k-fold method (k = 4).

occs.xy <- occs[c('longitude', 'latitude')]
group.data <- ENMeval::get.randomkfold(occ = occs.xy, bg = bg.xy, kfolds = 4)
# pull out the occurrence and background partition group numbers from the list
occs.grp <- group.data[[1]]
bg.grp <- group.data[[2]]

Build and Evaluate Niche Model

You selected the maxent model.

# define the vector of regularization multipliers to test
rms <- seq(0.5, 3.5, 1)
# iterate model building over all chosen parameter settings
e <- ENMeval::ENMevaluate(occ = occs.xy, env = envsBgMsk, bg.coords = bg.xy,
                          RMvalues = rms, fc = c('L', 'LQ', 'H', 'LQH'), method = 'user', 
                          occ.grp = occs.grp, bg.grp = bg.grp, 
                          clamp = TRUE, algorithm = "maxnet")
## * Running ENMeval v2.0.0 with legacy arguments. These will be phased out in the next version.
## Warning: These arguments were deprecated and replaced with the argument "user.grp".
## *** Running initial checks... ***
## * Clamping predictor variable rasters...
## * Model evaluations with user-defined 4-fold cross validation...
## 
## *** Running ENMeval v2.0.0 with maxnet from maxnet package v0.1.4 ***
## 
## ENMevaluate completed in 7 minutes 28.6 seconds.
# unpack the results data frame, the list of models, and the RasterStack of raw predictions
evalTbl <- e@results
evalMods <- e@models
names(evalMods) <- e@tune.settings$tune.args
evalPreds <- e@predictions
# view response curves for environmental variables with non-zero coefficients
plot(evalMods[["rm.0.5_fc.L"]], vars = c('bio01', 'bio02', 'bio03', 'bio04', 'bio05', 'bio06', 'bio08', 'bio09', 'bio11', 'bio12', 'bio13', 'bio14', 'bio15', 'bio16', 'bio18'), type = "cloglog")

# view ENMeval results
ENMeval::evalplot.stats(e, stats = "auc.val", color = "rm", x.var = "fc")
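
Beyond the plots, one common way to compare candidate settings is to inspect the results table directly. A hedged sketch, assuming ENMeval 2.0.0's column name auc.val.avg for mean validation AUC:

# order candidate models by mean validation AUC and show the best row
evalTbl[order(-evalTbl$auc.val.avg), ][1, ]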

# Select your model from the models list
mod <- evalMods[["rm.0.5_fc.L"]]
# generate raw prediction
pred <- evalPreds[["rm.0.5_fc.L"]]
# plot the model prediction
plot(pred)

# ggplot the model prediction (optional; left commented out in the session)
# library(sf)
# library(ggplot2)
# utah <- st_read("https://wwiskes.github.io/datadump/states.json") %>%
#   st_transform(crs = 4326) %>%
#   filter(NAME == "Utah")
# # convert the raster prediction to a data frame so ggplot can map it
# pred.df <- as.data.frame(pred, xy = TRUE, na.rm = TRUE)
# colnames(pred.df) <- c("x", "y", "suitability")
# # geom_tile is used instead of geom_raster, which does not work under
# # the coord_sf that geom_sf introduces
# ggplot(pred.df) +
#   geom_tile(aes(x = x, y = y, fill = suitability)) +
#   scale_fill_distiller(palette = "Spectral", direction = -1) +
#   theme(
#     axis.title.x = element_blank(),
#     axis.title.y = element_blank(),
#     legend.position = 'none'
#   ) +
#   geom_sf(data = utah, col = "black", alpha = 0)