Spatial Analysis in Maritimes Region

This document consolidates a range of methodological approaches to conduct spatial analysis and modeling in the Scotian Shelf Bioregion. Spatial modeling of the same area using different methods may provide dissimilar results, therefore, standards and protocols must be developed to explicitly state the methods and metadata associated with their production https://www.frontiersin.org/articles/10.3389/fmars.2017.00288/full?utm_source=F-AAE&utm_medium=EMLF&utm_campaign=MRK_398354_45_Marine_20170921_arts_A. In this context, we explored how dissimilar are predictions using the same data and different algorithms (i.e. Bioclim, GAM, Maxent, and Random Forest).

Our ultimate goal of this and future collaborations is to share best practices, document efforts being conducted in Maritimes region at DFO, and facilitate transparency and reproducibility of analysis.

Step 1: Load libraries

Make sure you install all packages required for this step (see 1_Setup.R)

library(devtools)
library(dismo)
library(fields)
library(lattice)
library(lubridate)
library(Matrix)
library(mgcv)
library(RandomFields)
library(randomForest)
library(raster)
library(RColorBrewer)
library(rJava)
library(rgdal)
library(ROCR)
library(rgdal)
library(sp)
library(spatstat)
library(SpatialHub)
library(spacetime)
library(oce)
library(ocedata)
pal <- colorRampPalette(rev(brewer.pal(11, "Spectral")))(100)
grid <- 3000

Step 2: Response Variable - Mystery species data

For this exercise, we are using a mystery species sampled in the summer Research Vessel (RV) survey in Maritimes Region. A mystery habitat suitability model was derived using two environmental predictors (ocean depth and bottom temperature). Presence/absence data was sampled from this surface and used to create the species data for this exercise.

To access species data, make sure that in the Guru-SpatialR folder there is a subfolder labeled data. This will be the folder where the data inputs should be stored. Download the species data for this exercise here https://www.dropbox.com/s/xf8idhcw5251alc/Sim1.rdata?dl=0 and save it in the data folder if you are not able to clone from GitHub.

The following lines of code will load the species data Sim1.rdata, and will transform latitude and longitude to UTM coordinates reported in km (plat and plon). This transformation is necessary so the species data matches the same projection as the environmental data.

load("../data/Sim1.rdata")
simData <- lonlat2planar(simData,"utm20", input_names=c("X", "Y"))
head(simData)

##       setid     Depth Temperature         X        Y        H.V Strata P.A
## 1445   2016 109.86731    5.571889 -65.31472 42.37321 0.41892178    481   1
## 6344   6949  99.04169   11.974998 -66.32499 42.64549 0.04058721    481   0
## 19452 20125 104.31867    8.170669 -65.97808 43.04224 0.26188736    481   0
## 21247 22088  99.60382   14.316823 -66.37278 43.07629 0.05877589    481   0
## 16745 17395 139.13347    5.482053 -65.64428 42.98741 0.45918563    481   0
## 7900   8505 102.12708    6.405646 -65.58473 42.73664 0.23138067    481   0
##         plon    plat
## 1445  309.42 4693.81
## 6344  227.42 4726.81
## 19452 257.42 4769.81
## 21247 225.42 4774.81
## 16745 284.42 4762.81
## 7900  288.42 4734.81

plot SimData

Black points are mystery species presence and grey points are mystery species absence.

simData_pres <- subset(simData, P.A == 1)
with(simData, plot(X, Y, col='lightgrey', pch=19, xlab="Longitude", ylab="Latitude"))
with(simData_pres, points(X,Y, pch=19, cex=0.6))
data("coastlineWorldFine") # from the ocedata package
lines(coastlineWorldFine[['longitude']], coastlineWorldFine[['latitude']]) #access object in the double wrackets

Step 3: Predictor variables - Environmental data

Bathymetry and bottom temperature assembled for this work in the Scotian Shelf bioregion were originally derived and assembled as part of the Snowcrab assessment in DFO’s Maritimes Region (Hubley et al. 2018) http://publications.gc.ca/collections/collection_2018/mpo-dfo/fs70-5/Fs70-5-2017-053-eng.pdf

Environmental predictors and species data used in this project are projected in (planar) UTM coordinates reported in km.

Download environmental data https://www.dropbox.com/s/lm5ugii6607lveb/Env1.rdata?dl=0 and save it in the sub-folder data if you were not able to clone it from GitHub.

Code below loads ocean depth (z) and bottom temperature interpolations (tmean.climatology)

load("../data/Env1.rdata")
head(predSpace)

##   plon   plat        z tmean.climatology       lon      lat
## 1 79.4 4550.8 51.91413          8.113430 -68.00020 40.99987
## 2 80.4 4550.8 51.34660          8.241729 -67.98835 41.00039
## 3 81.4 4550.8 51.80719          8.583757 -67.97651 41.00090
## 4 82.4 4550.8 53.47815          9.056436 -67.96466 41.00141
## 5 83.4 4550.8 52.89379          9.608761 -67.95282 41.00192
## 6 84.4 4550.8 50.24844         10.047371 -67.94097 41.00243

Ocean depth

Note all maps presented in this documents are plotted using the function planarMap https://github.com/BradHubley/SpatialHub/blob/master/R/planarMap.r

    xyz = predSpace[c('plon','plat','z')]
    datascale = seq(10,600,l=50)
    corners = data.frame(lon=c(-67.54,-56.5),lat=c(41,47.2))

    planarMap( xyz, fn="Depth", loc="output", datascale=datascale , corners=corners,log.variable=T,rev=T)

Bottom temperature

Interpolations derived from the summer Ecosystem Research Vessel (RV) survey

    xyz = predSpace[c('plon','plat','tmean.climatology')]
    datarange = quantile(xyz[,3],probs=c(0.01,0.99),na.rm=T)
    datascale = seq(datarange[1],datarange[2],l=50)
    corners = data.frame(lon=c(-67.54,-56.5),lat=c(41,47.2))

    planarMap( xyz, fn="tmean.climatology", loc="output", datascale=datascale , corners=corners,save=F)

Step 4: Spatial Modelling

The following code and workflow build Species Distribution Models (SDM) using the environmental and species data loaded in step 1 and 2, including:

Bioclim: presence-only model
general additive models (GAMs): presence-absence models
Maximum Entropy (Maxent): presence-only models
Random forest: presence-absence models
Boosted Regression Trees: presence-absence models (under-development)
Bayesian with INLA: presence-absence models (under-development)

Bioclim

R-package dismo: https://cran.r-project.org/web/packages/dismo/dismo.pdf

The Bioclim algorithm has been extensively used for species distribution modeling, knows as the classic ’climate-envelope-model’. Note that Bioclim does not perform as good as some other modeling methods (Elith et al. 2006) and is unsuited for predicting climate change effects (Hijmans and Graham, 2006).

The BIOCLIM algorithm computes the similarity of a location by comparing the values of environmental variables at any location to a percentile distribution of the values at known locations of occurrence (’training sites’). The closer to the 50th percentile (the median), the more suitable the location is. The tails of the distribution are not distinguished, that is, 10 percentile is treated as equivalent to 90 percentile.

In this R implementation, percentile scores are between 0 and 1, but predicted values larger than 0.5 are subtracted from 1. Then, the minimum percentile score across all the environmental variables is computed (i.e. this is like Liebig’s law of the minimum, except that high values can also be limiting factors). The final value is subtracted from 1 and multiplied with 2 so that the results are between 0 and 1. The reason for this transformation is that the results become more like that of other distribution modeling methods and are thus easier to interpret. The value 1 will rarely be observed as it would require a location that has the median value of the training data for all the variables considered. The value 0 is very common as it is assigned to all cells with a value of an environmental variable that is outside the percentile distribution (the range of the training data) for at least one of the variables.

The following lines of code will model the species distribution using Bioclim. In this example, we are only using species presence-only and two predictors: ocean depth and bottom temperature.

        simData = simData[c("setid", "Depth", "Temperature", "P.A", "plon", "plat")]
        names(simData)[1:4] = c("ID","z","t","Y")
        dat.p = subset(simData,Y==1)
        head(dat.p)

##          ID        z        t Y   plon    plat
## 1445   2016 109.8673 5.571889 1 309.42 4693.81
## 14062 14697 133.6784 8.670085 1 250.42 4755.81
## 11820 12443 142.9231 5.384462 1 306.42 4748.81
## 17134 17785 134.5403 6.220937 1 283.42 4763.81
## 14471 15115 133.2307 4.174895 1 301.42 4756.81
## 19479 20159 126.8158 5.678591 1 291.42 4769.81

        # just temp, depth
        bc = bioclim(dat.p[,c('t','z')])
        pI = predSpace[,c('tmean.climatology','z')]
        names(pI)[1] = 't'
        bcp = predict(bc,pI)
        xyz_bioclim = cbind(baseLine,z=bcp)
        corners = data.frame(lon=c(-67.54,-56.5),lat=c(41,47.2))
        with(xyz_bioclim, hist(z, main = "bioclim output"))

        bioclim.predict <- planarMap( xyz_bioclim, fn="simData.biocl.pred", loc="output",corners=corners,save=F)

GAMs

R-package: “mgcv”: https://cran.r-project.org/web/packages/mgcv/mgcv.pdf Wood 2018

Generalized additive model (GAM) approach allows for space to be explicitly incorporated into the model. GAMs allow the incorporation of smooth functions for covariates, allowing nonlinear relationships between the environmental data and the species data. “GAMs use non‐parametric, data‐defined smoothers to fit non‐linear functions, whereas GLMs fit parametric terms, usually some combination of linear, quadratic and/or cubic terms. Because of their greater flexibility, GAMs are more capable of modelling complex ecological response shapes than GLMs (Yee and Mitchell 1991)” Elith et al. 2006. The species distribution was modeled using a binomial generalized additive model (GAM; R-package “mgcv”; Wood 2006) with a logit link function. Smoothed (thin-plate-spline) covariates were ocean depth (z) and bottom temeprature (t).

# GAM: temp, depth, space
  load("../data/Sim1.rdata")
  simData <- lonlat2planar(simData,"utm20", input_names=c("X", "Y"))
  simData = simData[c("setid", "Depth", "Temperature", "P.A", "plon", "plat")]
    names(simData)[1:4] = c("ID","z","t","Y")

        Mf = formula( Y ~ s(t) +  s(z) + s(plon, plat)  )
        Md = simData[,c('plon','plat','t','z','Y')]

      Mo = gam( Mf, data=Md, family=binomial())

GAM Output

summary(Mo)

## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## Y ~ s(t) + s(z) + s(plon, plat)
## 
## Parametric coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.35423    0.05721  -6.192 5.94e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                 edf Ref.df Chi.sq p-value    
## s(t)          2.508  3.243  14.75 0.00296 ** 
## s(z)          5.435  6.623  96.47 < 2e-16 ***
## s(plon,plat) 25.183 28.090 231.52 < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =   0.25   Deviance explained = 21.5%
## UBRE = 0.11175  Scale est. = 1         n = 1800

GAM Predictions

        pI = predSpace[,c('plon','plat','tmean.climatology','z')]
        names(pI)[3] = 't'
        bcp = predict(Mo,pI,type='response')
        xyz_gam = cbind(baseLine,z=bcp)
        corners = data.frame(lon=c(-67.54,-56.5),lat=c(41,47.2))
        with(xyz_gam, hist(z, main = "GAM output"))

        GAM.predict <- planarMap( xyz_gam, fn="simData.gambi.pred", loc="output",corners=corners,save=F)

MaxEnt (Maximum Entropy)

R-package: “dismo”: https://cran.r-project.org/web/packages/dismo/dismo.pdf

“The function uses environmental data for locations of known presence and for a large number of ’background’ locations. Environmental data can be extracted from raster files. The result is a model object that can be used to predict the suitability of other locations, for example, to predict the entire range of a species” Hijmans et al. 2017.

This function uses the MaxEnt species distribution model software, which is a java program that you can download here http://www.cs.princeton.edu/~schapire/maxent/. Place the file ’maxent.jar’ in the ’java’ folder of this package. That is the folder returned by system.file(“java”, package=“dismo”). You need MaxEnt version 3.3.3b or higher“.

The follwing line of code should return somehting like this: C:/Users/Documents/R/win-library/3.1/dismo/java

system.file('java', package='dismo')

## [1] "C:/Users/GomezC/Documents/R/win-library/3.3/dismo/java"

Project environmental data in UTM coordinates reported in m, and then create a SpatialPointsDataFrame object (class for spatial attributes that have spatial point locations)

load("../data/Env1.rdata")
head(predSpace)

##   plon   plat        z tmean.climatology       lon      lat
## 1 79.4 4550.8 51.91413          8.113430 -68.00020 40.99987
## 2 80.4 4550.8 51.34660          8.241729 -67.98835 41.00039
## 3 81.4 4550.8 51.80719          8.583757 -67.97651 41.00090
## 4 82.4 4550.8 53.47815          9.056436 -67.96466 41.00141
## 5 83.4 4550.8 52.89379          9.608761 -67.95282 41.00192
## 6 84.4 4550.8 50.24844         10.047371 -67.94097 41.00243

predspace <- planar2lonlat(predSpace, "utm20")
predspace.utm <- project(as.matrix(predspace[c("lon", "lat")]), "+proj=utm +zone=20 ellps=WGS84")
head(predspace.utm)

##        lon     lat
## [1,] 79400 4550800
## [2,] 80400 4550800
## [3,] 81400 4550800
## [4,] 82400 4550800
## [5,] 83400 4550800
## [6,] 84400 4550800

cords <- predspace.utm
spdf <- SpatialPointsDataFrame(cords, data=predspace)

Grid the environmental data. predictors (ocean depth and temperature) are rasterized (if x represents points, each point is assigned to a grid cell) and stacked

xmn <- 100000
ymn <- 4500000
xmx <- 1000000
ymx <- 5300000
extent <- extent(xmn, xmx, ymn, ymx)
ncols <- length(xmx:xmn)/grid
nrows <- (length(ymx:ymn) - 1)/grid
blankraster <- raster(nrows=nrows, ncols=ncols, xmn=xmn, xmx=xmx, ymn=ymn, ymx = ymx)
blankraster[] <- 1: ncell(blankraster)
depth <- rasterize(x=spdf, y=blankraster, field="z", fun=mean)
temp <- rasterize(x=spdf, y=blankraster, field="tmean.climatology", fun=mean)
predictorsr <- list(depth, temp)
predictors<- raster::stack(predictorsr)
names(predictors) <- c("z", "t")
crs(predictors) <- "+proj=utm +zone=20 ellps=WGS84"

Project species data in UTM coordinates in m so it matches the projection of the environmental layers rasterized above

  load("../data/Sim1.rdata")
  simData <- lonlat2planar(simData,"utm20", input_names=c("X", "Y"))
  simdata.utm <- project(as.matrix(simData[c("X", "Y")]), "+proj=utm +zone=20 ellps=WGS84")
  simdata <- cbind(simdata.utm, simData)
  names(simdata)[1:2] = c("lonutm","latutm")
  simdata = simdata[c("lonutm", "latutm", "P.A")]
  pres <- simdata[simdata[, 3] == 1, 1:2]

MaxEnt is a presence-only model. Background points are sampled randomly from the cells that are not NA in the first predictor variable, unless background points (e.g. zeros) are specified. In our case we have zeros, thus we select those in the following step. Note that MaxEnt has been shown to be exactly mathematically equivalent to a GLM, more specifically, to a Poisson regression (also known as “log-linear modelling”) https://methodsblog.com/2013/02/20/some-big-news-about-maxent/

head(simdata)

##       lonutm  latutm P.A
## 1445  309420 4693810   1
## 6344  227420 4726810   0
## 19452 257420 4769810   0
## 21247 225420 4774810   0
## 16745 284420 4762810   0
## 7900  288420 4734810   0

abs <- simdata[simdata[, 3] == 0, 1:2]
head(abs)

##       lonutm  latutm
## 6344  227420 4726810
## 19452 257420 4769810
## 21247 225420 4774810
## 16745 284420 4762810
## 7900  288420 4734810
## 17528 284420 4764810

Run Maxent

Following Phillips et al. (2006) and Merow et al. (2013), the MaxEnt runs were conducted using the following settings: - selected random seed, - maximum number of background points = 10,000 (a random sample of point locations from the landscape to represent the environmental conditions in the study area), - regularization multiplier = 1 (included to reduce over-fitting), - number of replicates = 10 (to do multiple runs as a means to provide averages of the results from all models created), - no output grids, - maximum iterations = 5000 (allows the model to have adequate opportunity for convergence), convergence threshold = 0.00001. - threads=2 (speeds up the processing)

To assess uncertainty in model predictions, cross-validation replication was used, which incorporates all available sightings, making better use of smaller data-sets. Cumulative output type was selected to visualize MaxEnt results; this output does not rely on post-processing assumptions and it is useful when illustrating potential species range boundaries. Note that a sub-folder labelled “maxent”" will be created in “output” to display results.

Specify the model

#options(java.parameters = "-Xmx1g")
model.maxent<-maxent(x=predictors, p=pres, a=abs, args=c("randomseed","threads=4", "replicatetype=crossvalidate", "jackknife=true", "responsecurves=true", "maximumiterations=5000", "replicates=10", "threads=2", "nooutputgrids", "betamultiplier=1"))

The following command will pop-out a window with all of MaxEnt results in html format

model.maxent

## class     : MaxEntReplicates 
## replicates: 10

Prediction

map.maxent <- predict(model.maxent, predictors, progress='text', args=c("outputformat=raw"))

## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |================                                                 |  25%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================================                |  75%
  |                                                                       
  |=================================================================| 100%
## 
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |================                                                 |  25%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================================                |  75%
  |                                                                       
  |=================================================================| 100%
## 
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |================                                                 |  25%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================================                |  75%
  |                                                                       
  |=================================================================| 100%
## 
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |================                                                 |  25%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================================                |  75%
  |                                                                       
  |=================================================================| 100%
## 
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |================                                                 |  25%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================================                |  75%
  |                                                                       
  |=================================================================| 100%
## 
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |================                                                 |  25%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================================                |  75%
  |                                                                       
  |=================================================================| 100%
## 
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |================                                                 |  25%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================================                |  75%
  |                                                                       
  |=================================================================| 100%
## 
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |================                                                 |  25%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================================                |  75%
  |                                                                       
  |=================================================================| 100%
## 
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |================                                                 |  25%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================================                |  75%
  |                                                                       
  |=================================================================| 100%
## 
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |================                                                 |  25%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================================                |  75%
  |                                                                       
  |=================================================================| 100%
##

map.maxent.mean <- mean(map.maxent)

To allow for comparisons between results, we rescaled the raw output to range between 0 and 100.

source('../code/functions/scaledCumsum.r')
suitability.maxent.cumulative <- scaledCumsum(map.maxent.mean)

Next step edits the raster of the maxent cumulative model to a xyz list to be able to plot it using the function planarMap (which requires UTM projection in km). The MaxEnt output is rescaled to 1 - 0 for consistency with previous approaches implemented in this project.

xyz_maxent <- rasterToPoints(suitability.maxent.cumulative)
xyz_maxent <- as.data.frame(xyz_maxent)
plat.lon <- xyz_maxent[, -3]/1000 
z <- xyz_maxent[, 3]/100 
xyz_maxent = cbind(plat.lon,z)
names(xyz_maxent)[1:2] = c("plon","plat")
head(xyz_maxent)

##    plon     plat           z
## 1 296.5 5298.502 0.004049125
## 2 299.5 5298.502 0.001597283
## 3 302.5 5298.502 0.003150884
## 4 305.5 5298.502 0.006501497
## 5 308.5 5298.502 0.009876775
## 6 311.5 5298.502 0.011857594

with(xyz_maxent, hist(z, main = "MaxEnt output"))

maxent.predict <- planarMap(xyz_maxent, fn="simData.maxent.pred", loc="output",corners=corners,save=F)

Random Forest

Random Forest is a flexible, machine learning algorithm commonly used in SDMs because of it’s simplicity and its use for both classification and regression tasks. Random forest builds multiple decision trees and merges them together to get more accurate and stable prediction.

“Several studies have shown that RF models often reach top predictive performance compared to other methods (e.g., Cutler et al. 2007). The RF method (Breiman 2001) is an ensemble learning technique based on a combination of a large set of decision trees. Each tree is trained by selecting a random set of variables and a random sample from the training dataset (i.e., the calibration data set). Three training parameters need to be defined in the RF algorithm: ntree, the number of bootstrap samples for the original data (the default is 500); mtry, the number of different predictors tested at each node (which in this case can be 19 at most); and nodesize, the minimal size of the terminal nodes of the trees, below which leaves are not further subdivided. RF was implemented using the random forest package in R with the default values of ntree=500, mtry=6, and nodesize=5. To cross-validate, training and testing data were created by random sampling (without replacement) using k-fold data partitioning from the dismo library in R” http://publications.gc.ca/collections/collection_2018/mpo-dfo/fs70-5/Fs70-5-2018-002-eng.pdf

Load and set up data

load("../data/Sim1.rdata")
simData <- lonlat2planar(simData,"utm20", input_names=c("X", "Y"))
rdata = simData[c("Depth", "Temperature", "P.A")]
names(rdata)[1:3] = c("z","t", "Y")
rdata$Y = as.factor(rdata$Y)
env = predspace[, 3:4]

Specify the model

model <- Y ~ z + t
rf1 <- randomForest(model, data=rdata, proximity=TRUE, type = 'prob', ntree=500, mtry=6, nodesize=5)

## Warning in randomForest.default(m, y, ...): invalid mtry: reset to within
## valid range

rf1

## 
## Call:
##  randomForest(formula = model, data = rdata, proximity = TRUE,      type = "prob", ntree = 500, mtry = 6, nodesize = 5) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 2
## 
##         OOB estimate of  error rate: 40.33%
## Confusion matrix:
##     0   1 class.error
## 0 692 331   0.3235582
## 1 395 382   0.5083655

Explore variable importance. In this case is ocean depth (z) and temperature (t).

varImpPlot(rf1)

Prediction

pr <- predict(predictors, rf1, type = 'prob', index = 2)

xyz_forest <- rasterToPoints(pr)
xyz_forest <- as.data.frame(xyz_forest)
rlat.lon <- xyz_forest[, -3]/1000 
z <- xyz_forest[, 3]
xyz_forest = cbind(rlat.lon,z)
names(xyz_forest)[1:2] = c("plon","plat")
with(xyz_forest, hist(z, main = "Random Forest output"))

forest.predict <- planarMap(xyz_forest, fn="simData.rforest.pred", loc="output",corners=corners,save=F)

Step 5. Model Comparison

Transform SDM results to rasters, stack them, and run correlations

Project SDM results in UTM coordinates reported in m, and then create a SpatialPointsDataFrame object (class for spatial attributes that have spatial point locations)

a <- planar2lonlat(xyz_bioclim, "utm20")
a.utm <- project(as.matrix(a[c("lon", "lat")]), "+proj=utm +zone=20 ellps=WGS84")
cords_a <- a.utm
a_spdf <- SpatialPointsDataFrame(cords_a, data=a)

b <- planar2lonlat(xyz_gam, "utm20")
b.utm <- project(as.matrix(b[c("lon", "lat")]), "+proj=utm +zone=20 ellps=WGS84")
cords_b <- b.utm
b_spdf <- SpatialPointsDataFrame(cords_b, data=b)

c <- planar2lonlat(xyz_maxent, "utm20")
c.utm <- project(as.matrix(c[c("lon", "lat")]), "+proj=utm +zone=20 ellps=WGS84")
cords_c <- c.utm
c_spdf <- SpatialPointsDataFrame(cords_c, data=c)

d <- planar2lonlat(xyz_forest, "utm20")
d.utm <- project(as.matrix(d[c("lon", "lat")]), "+proj=utm +zone=20 ellps=WGS84")
cords_d <- d.utm
d_spdf <- SpatialPointsDataFrame(cords_d, data=d)

Grid the SDM results: rasterize results (if x represents points, each point is assigned to a grid cell) and stack them

xmn <- 100000
ymn <- 4500000
xmx <- 1000000
ymx <- 5300000
extent <- extent(xmn, xmx, ymn, ymx)
ncols <- length(xmx:xmn)/grid
nrows <- (length(ymx:ymn) - 1)/grid
blankraster <- raster(nrows=nrows, ncols=ncols, xmn=xmn, xmx=xmx, ymn=ymn, ymx = ymx)
blankraster[] <- 1: ncell(blankraster)

raster_bioclim <- rasterize(x=a_spdf, y=blankraster, field="z", fun=mean)
raster_gam <- rasterize(x=b_spdf, y=blankraster, field="z", fun=mean)
raster_maxent <- rasterize(x=c_spdf, y=blankraster, field="z", fun=mean)
raster_randforest <- rasterize(x=d_spdf, y=blankraster, field="z", fun=mean)

SDM <- list(raster_bioclim, raster_gam, raster_maxent, raster_randforest)
SDM <- raster::stack(SDM)
names(SDM) <- c("Bioclim", "GAMs", "MaxEnt", "Random Forest")
crs(SDM) <- "+proj=utm +zone=20 ellps=WGS84"
SDM <- setMinMax(SDM)
pairs(SDM, method= c("spearman"))

#save file:
#outfile = "output/gTiff/maxent.model.tif"
#writeRaster(maxent.model, filename=outfile, overwrite=TRUE, format="GTiff", datatype="FLT4S")

Bioclim

bioclim.predict

GAM

GAM.predict

Random Forest

forest.predict

MaxEnt

maxent.predict

Spatial Analysis in Maritimes Region

Code contributors to-date: B. Hubley, J. Choi, A. Cook, M.E. Greenlaw, K. French, P. Greyson, C. Gomez

May 2019

Step 1: Load libraries

Step 2: Response Variable - Mystery species data

Step 3: Predictor variables - Environmental data

Ocean depth

Bottom temperature

Step 4: Spatial Modelling

Bioclim

GAMs

MaxEnt (Maximum Entropy)

Random Forest

Step 5. Model Comparison

Transform SDM results to rasters, stack them, and run correlations

Bioclim

GAM

Random Forest

MaxEnt