Description

This short note shows the different ways to feed MAXENT models with background data within biomod2:
- using pseudo-absences as background (MAXENT is trained with exactly the same dataset as all other models called in biomod2)
- using the full explanatory variables dataset to let MAXENT choose its own background data (this procedure is much closer to what MAXENT does by default when it is run outside of biomod2)

Code example

Load species data

# load biomod2 package
library(biomod2)
## biomod2 3.3-11 loaded.
## 
## Type browseVignettes(package='biomod2') to access directly biomod2 vignettes.
library(raster)
## Loading required package: sp
## print the version of packages used in this example
sessionInfo()
## R version 3.2.5 (2016-04-14)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.4 LTS
## 
## locale:
##  [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8    
##  [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
##  [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] raster_2.5-8   sp_1.2-3       biomod2_3.3-11
## 
## loaded via a namespace (and not attached):
##  [1] gbm_2.1.1             zoo_1.7-13            slam_0.1-35          
##  [4] NLP_0.1-9             splines_3.2.5         lattice_0.20-33      
##  [7] colorspace_1.2-6      htmltools_0.3         viridisLite_0.1.3    
## [10] yaml_2.1.13           survival_2.39-4       hexbin_1.27.1        
## [13] DBI_0.4-1             RColorBrewer_1.1-2    plotmo_3.1.4         
## [16] plyr_1.8.4            mda_0.4-8             stringr_1.0.0        
## [19] munsell_0.4.3         gtable_0.2.0          evaluate_0.8         
## [22] latticeExtra_0.6-28   knitr_1.11            SparseM_1.7          
## [25] tm_0.6-2.1            parallel_3.2.5        class_7.3-14         
## [28] maxent_1.3.3.1        Rcpp_0.12.5           scales_0.4.0         
## [31] plotrix_3.6-2         abind_1.4-3           TeachingDemos_2.10   
## [34] ggplot2_2.1.0         digest_0.6.9          stringi_1.1.1        
## [37] dplyr_0.4.3           dismo_1.1-1           rasterVis_0.40       
## [40] grid_3.2.5            tools_3.2.5           magrittr_1.5         
## [43] PresenceAbsence_1.1.9 tibble_1.0            randomForest_4.6-12  
## [46] tidyr_0.5.1           MASS_7.3-45           Matrix_1.2-6         
## [49] pROC_1.8              assertthat_0.1        rmarkdown_0.9.5      
## [52] reshape_0.8.5         earth_4.4.4           R6_2.1.2             
## [55] rpart_4.1-10          nnet_7.3-12
# load species occurrence data
dat <- read.csv(system.file("external/species/mammals_table.csv", package="biomod2"))

We will work with GuloGulo species

resp.name <- 'GuloGulo'

In this example we will keep only the GuloGulo presences

resp.occ.id <- which(dat[, resp.name] == 1)
resp.occ <- as.numeric(dat[resp.occ.id, resp.name])

# extract GuloGulo presences coordinates
# (index by row position, resp.occ.id; resp.occ only holds the value 1)
resp.xy <- dat[resp.occ.id, c("X_WGS84", "Y_WGS84")]
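Selecting presence coordinates relies on row positions, not on the 0/1 values stored in the species column. The following is a minimal sketch in base R; the data frame `toy` and all its values are invented for illustration and only mimic the layout of the species table (one 0/1 column per species plus the X_WGS84 and Y_WGS84 columns).

```r
# toy table mimicking the layout of the species table (values are made up)
toy <- data.frame(X_WGS84  = c(10, 20, 30, 40),
                  Y_WGS84  = c(50, 55, 60, 65),
                  GuloGulo = c(0, 1, 0, 1))

# row positions of the presences
toy.occ.id <- which(toy[, "GuloGulo"] == 1)

# indexing by row position returns the coordinates of each presence
toy.xy <- toy[toy.occ.id, c("X_WGS84", "Y_WGS84")]

# indexing by the presence values (all equal to 1) would instead
# return the first row of the table, repeated once per presence
toy.wrong <- toy[toy[toy.occ.id, "GuloGulo"], c("X_WGS84", "Y_WGS84")]
```

Comparing `toy.xy` and `toy.wrong` shows why the row ids returned by `which()` are the ones to use when subsetting coordinates.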

Load climatic data

# we will work with BIOCLIM variables (bio3, bio4, bio7, bio11 & bio12)
expl.var <- stack(system.file("external/bioclim/current/bio3.grd",
                              package = "biomod2"),
                  system.file("external/bioclim/current/bio4.grd",
                              package = "biomod2"),
                  system.file("external/bioclim/current/bio7.grd",
                              package = "biomod2"),
                  system.file("external/bioclim/current/bio11.grd",
                              package = "biomod2"),
                  system.file("external/bioclim/current/bio12.grd",
                              package = "biomod2"))

Construct a MAXENT-friendly background directory

If we want to work with the usual MAXENT background data, we have to give (here, create) the path to a directory containing our explanatory variables as ASCII files

maxent.background.dat.dir <- "maxent_bg"
dir.create(maxent.background.dat.dir, showWarnings = FALSE, recursive = TRUE)

## resave explanatory data
for(var_ in names(expl.var)){
  cat("\n> saving", paste0(var_, ".asc"))
  writeRaster(subset(expl.var, var_), 
              filename = file.path(maxent.background.dat.dir, paste0(var_, ".asc")),
              overwrite = TRUE)
}
## 
## > saving bio3.asc
## > saving bio4.asc
## > saving bio7.asc
## > saving bio11.asc
## > saving bio12.asc
## define the path to maxent.jar file
path.to.maxent.jar <- file.path(getwd(), "maxent.jar")
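
Before running any model it can help to confirm that the jar file and the ASCII layers are where we expect them. The sketch below uses only base R; `check_maxent_inputs` is a hypothetical helper written for this note, not a biomod2 function, and it assumes one `.asc` file per explanatory variable, named after the variable.

```r
# hypothetical helper (not part of biomod2): returns a character vector
# describing missing inputs, or character(0) if everything is in place
check_maxent_inputs <- function(jar.path, bg.dir, var.names) {
  problems <- character(0)
  # the MAXENT jar file must exist at the given path
  if (!file.exists(jar.path)) {
    problems <- c(problems, paste("maxent.jar not found at", jar.path))
  }
  # one ASCII layer per explanatory variable is expected in bg.dir
  asc.files <- file.path(bg.dir, paste0(var.names, ".asc"))
  missing.asc <- asc.files[!file.exists(asc.files)]
  if (length(missing.asc) > 0) {
    problems <- c(problems, paste("missing background layer:", missing.asc))
  }
  problems
}
```

With the objects defined in this example, the call would be `check_maxent_inputs(path.to.maxent.jar, maxent.background.dat.dir, names(expl.var))`; an empty result means nothing is missing.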

Format the data in a biomod2-friendly format

Here we will draw 50 randomly selected pseudo-absences (PA) to show the difference between running MAXENT with biomod2 PA as background data and running it with the ‘as usual’ MAXENT background data sampled over the whole environment.

bm.dat <- BIOMOD_FormatingData(resp.var = resp.occ,
                               expl.var = expl.var,
                               resp.xy = resp.xy,
                               resp.name = resp.name,
                               PA.nb.rep = 1,
                               PA.nb.absences = 50)
## 
## -=-=-=-=-=-=-=-=-=-=-=-= GuloGulo Data Formating -=-=-=-=-=-=-=-=-=-=-=-=
## 
##       ! No data has been set aside for modeling evaluation
##    > Pseudo Absences Selection checkings...
##    > random pseudo absences selection
##    > Pseudo absences are selected in explanatory variables
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Define two sets of MAXENT model options

## to use biomod2 pseudo absences as background
bm.opt.default <- BIOMOD_ModelingOptions(MAXENT.Phillips = list(path_to_maxent.jar = path.to.maxent.jar))
## to use the full environmental dataset to define maxent background points
bm.opt.maxent.bg <- BIOMOD_ModelingOptions(MAXENT.Phillips = list(path_to_maxent.jar = path.to.maxent.jar,
                                                                  background_data_dir = maxent.background.dat.dir))

Build the models

As this is a simple example, we will only produce the full model (using 100% of the data to calibrate the models and testing their performance on the same dataset). I chose this setup to make model scores more comparable with one another (the same data are used to test each model's performance).

## model using PA as background
bm.mod.default <- BIOMOD_Modeling( bm.dat, 
                                   models = c('MAXENT.Phillips'), 
                                   models.options = bm.opt.default, 
                                   NbRunEval = 1, 
                                   DataSplit = 100,
                                   models.eval.meth = c('TSS','ROC'),
                                   do.full.models = TRUE,
                                   modeling.id = "default")
## 
## 
## Loading required library...
## 
## Checking Models arguments...
## 
## Creating suitable Workdir...
## 
##          ! Weights where automaticly defined for GuloGulo_PA1 to rise a 0.5 prevalence !
## 
## 
## -=-=-=-=-=-=-=-=-=-=-=-= GuloGulo Modeling Summary -=-=-=-=-=-=-=-=-=-=-=-=
## 
##  5  environmental variables ( bio3 bio4 bio7 bio11 bio12 )
## Number of evaluation repetitions : 1
## Models selected : MAXENT.Phillips 
## 
## Total number of model runs : 1 
## 
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
## 
## 
## -=-=-=- Run :  GuloGulo_PA1 
## 
## 
## -=-=-=--=-=-=- GuloGulo_PA1_Full 
## 
## Model=MAXENT.Phillips
##  Creating Maxent Temp Proj Data..
##  Running Maxent...
##  Getting predictions...
##  Removing Maxent Temp Data..
##  Evaluating Model stuff...
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
## model using the full environment as background
bm.mod.maxent.bg <- BIOMOD_Modeling( bm.dat, 
                                   models = c('MAXENT.Phillips'), 
                                   models.options = bm.opt.maxent.bg, 
                                   NbRunEval = 1, 
                                   DataSplit = 100,
                                   models.eval.meth = c('TSS','ROC'),
                                   do.full.models = TRUE,
                                   modeling.id = "maxent_bg")
## 
## 
## Loading required library...
## 
## Checking Models arguments...
## 
## Creating suitable Workdir...
## 
##          ! Weights where automaticly defined for GuloGulo_PA1 to rise a 0.5 prevalence !
## 
## 
## -=-=-=-=-=-=-=-=-=-=-=-= GuloGulo Modeling Summary -=-=-=-=-=-=-=-=-=-=-=-=
## 
##  5  environmental variables ( bio3 bio4 bio7 bio11 bio12 )
## Number of evaluation repetitions : 1
## Models selected : MAXENT.Phillips 
## 
## Total number of model runs : 1 
## 
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
## 
## 
## -=-=-=- Run :  GuloGulo_PA1 
## 
## 
## -=-=-=--=-=-=- GuloGulo_PA1_Full 
## 
## Model=MAXENT.Phillips
##  Creating Maxent Temp Proj Data..
##  Running Maxent...
##  Getting predictions...
##  Removing Maxent Temp Data..
##  Evaluating Model stuff...
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Compare model performances

get_evaluations(bm.mod.default)
## , , MAXENT.Phillips, Full, PA1
## 
##     Testing.data Cutoff Sensitivity Specificity
## TSS         0.98    633         100          98
## ROC         0.99    635         100          98
get_evaluations(bm.mod.maxent.bg)
## , , MAXENT.Phillips, Full, PA1
## 
##     Testing.data Cutoff Sensitivity Specificity
## TSS            1  528.0         100          98
## ROC            1  528.5         100         100

As we can see, there is a slight improvement in MAXENT performance when we let the model select its own background data (this is not always the case). This difference in model performance will shrink as we increase the number of pseudo-absences selected at the data formatting step; in that case pseudo-absences and background data become almost equivalent (practically and philosophically).
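
To read the evaluation tables, recall that the True Skill Statistic (TSS) is defined as sensitivity + specificity - 1. The short sketch below recomputes it in base R from the sensitivity and specificity printed for the model using pseudo-absences as background; `tss_from_percent` is a hypothetical helper name, and it assumes both inputs are percentages, as in the printed tables.

```r
# hypothetical helper: TSS from sensitivity and specificity given in percent
tss_from_percent <- function(sensitivity, specificity) {
  sensitivity / 100 + specificity / 100 - 1
}

# values printed for the model using pseudo-absences as background
tss.default <- tss_from_percent(sensitivity = 100, specificity = 98)
tss.default
```

The result agrees with the TSS value reported in the `Testing.data` column for that model.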

Summary

What we have to retain from this is that MAXENT needs enough background data to reach optimal performance. In particular, you have to take extra care when you are working with presence/absence data (no or few pseudo-absences generated).