This short note shows the different ways to feed MAXENT models with background data within biomod2:
- using pseudo-absences as background (MAXENT is trained with exactly the same data set as all the other models called in biomod2)
- using the full explanatory variables dataset to let MAXENT choose its own background data (this procedure is much closer to what MAXENT does by default when it is run outside of biomod2)
# load biomod2 package
library(biomod2)
## biomod2 3.3-11 loaded.
##
## Type browseVignettes(package='biomod2') to access directly biomod2 vignettes.
library(raster)
## Loading required package: sp
## print the version of packages used in this example
sessionInfo()
## R version 3.2.5 (2016-04-14)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.4 LTS
##
## locale:
## [1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8
## [5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8
## [7] LC_PAPER=fr_FR.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] raster_2.5-8 sp_1.2-3 biomod2_3.3-11
##
## loaded via a namespace (and not attached):
## [1] gbm_2.1.1 zoo_1.7-13 slam_0.1-35
## [4] NLP_0.1-9 splines_3.2.5 lattice_0.20-33
## [7] colorspace_1.2-6 htmltools_0.3 viridisLite_0.1.3
## [10] yaml_2.1.13 survival_2.39-4 hexbin_1.27.1
## [13] DBI_0.4-1 RColorBrewer_1.1-2 plotmo_3.1.4
## [16] plyr_1.8.4 mda_0.4-8 stringr_1.0.0
## [19] munsell_0.4.3 gtable_0.2.0 evaluate_0.8
## [22] latticeExtra_0.6-28 knitr_1.11 SparseM_1.7
## [25] tm_0.6-2.1 parallel_3.2.5 class_7.3-14
## [28] maxent_1.3.3.1 Rcpp_0.12.5 scales_0.4.0
## [31] plotrix_3.6-2 abind_1.4-3 TeachingDemos_2.10
## [34] ggplot2_2.1.0 digest_0.6.9 stringi_1.1.1
## [37] dplyr_0.4.3 dismo_1.1-1 rasterVis_0.40
## [40] grid_3.2.5 tools_3.2.5 magrittr_1.5
## [43] PresenceAbsence_1.1.9 tibble_1.0 randomForest_4.6-12
## [46] tidyr_0.5.1 MASS_7.3-45 Matrix_1.2-6
## [49] pROC_1.8 assertthat_0.1 rmarkdown_0.9.5
## [52] reshape_0.8.5 earth_4.4.4 R6_2.1.2
## [55] rpart_4.1-10 nnet_7.3-12
# load species occurrence data
dat <- read.csv(system.file("external/species/mammals_table.csv", package="biomod2"))
We will work with the GuloGulo species.
resp.name <- 'GuloGulo'
In this example we will keep only the GuloGulo presences.
resp.occ.id <- which(dat[, resp.name] == 1)
resp.occ <- as.numeric(dat[resp.occ.id, resp.name])
# extract GuloGulo presence coordinates
# (index with the row ids; resp.occ only holds the response values, all equal to 1)
resp.xy <- dat[resp.occ.id, c("X_WGS84", "Y_WGS84")]
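The selection of presence rows and of their coordinates can be checked on a small toy table. This is only a sketch: the data frame `toy` and the object names `presence.id`, `presence.val` and `presence.xy` are hypothetical and are not part of the biomod2 example data.

```r
# Toy occurrence table (hypothetical values, not the biomod2 mammals data)
toy <- data.frame(
  X_WGS84  = c(10, 20, 30, 40),
  Y_WGS84  = c(50, 55, 60, 65),
  GuloGulo = c(0, 1, 0, 1)
)

# Row positions of the presences
presence.id <- which(toy$GuloGulo == 1)

# Response values for those rows (all equal to 1)
presence.val <- toy$GuloGulo[presence.id]

# Coordinates are selected with the row positions, not with the values
presence.xy <- toy[presence.id, c("X_WGS84", "Y_WGS84")]

# There should be one coordinate pair per presence
stopifnot(nrow(presence.xy) == length(presence.val))
print(presence.xy)
```

Indexing with the row positions keeps each presence paired with its own coordinates, which is what the formatting step expects.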
# we will work with BIOCLIM variables (bio3, bio4, bio7, bio11 & bio12)
expl.var <- stack(
  system.file("external/bioclim/current/bio3.grd", package = "biomod2"),
  system.file("external/bioclim/current/bio4.grd", package = "biomod2"),
  system.file("external/bioclim/current/bio7.grd", package = "biomod2"),
  system.file("external/bioclim/current/bio11.grd", package = "biomod2"),
  system.file("external/bioclim/current/bio12.grd", package = "biomod2")
)
MAXENT friendly background directory
If we want MAXENT to use its usual background data, we have to give (here, create) the path to a directory containing our explanatory variables as ASCII files.
maxent.background.dat.dir <- "maxent_bg"
dir.create(maxent.background.dat.dir, showWarnings = FALSE, recursive = TRUE)
## resave explanatory data
for (var_ in names(expl.var)) {
  cat("\n> saving", paste0(var_, ".asc"))
  writeRaster(subset(expl.var, var_),
              filename = file.path(maxent.background.dat.dir, paste0(var_, ".asc")),
              overwrite = TRUE)
}
##
## > saving bio3.asc
## > saving bio4.asc
## > saving bio7.asc
## > saving bio11.asc
## > saving bio12.asc
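Before running MAXENT, it can be worth checking that the background directory holds one ASCII file per explanatory variable. The sketch below uses base R only. To stay runnable on its own it works on a temporary directory with empty placeholder files, and `var.names`, `bg.dir`, `found` and `missing.files` are hypothetical names standing in for `names(expl.var)` and `maxent.background.dat.dir`.

```r
# Hypothetical variable names and a temporary directory stand in for
# names(expl.var) and maxent.background.dat.dir
var.names <- c("bio3", "bio4", "bio7", "bio11", "bio12")
bg.dir <- file.path(tempdir(), "maxent_bg_check")
dir.create(bg.dir, showWarnings = FALSE, recursive = TRUE)

# Create empty placeholder files so that the check can run on its own
invisible(file.create(file.path(bg.dir, paste0(var.names, ".asc"))))

# List the .asc files found and compare them with the expected ones
found <- list.files(bg.dir, pattern = "\\.asc$")
missing.files <- setdiff(paste0(var.names, ".asc"), found)

if (length(missing.files) > 0) {
  stop("missing background layers: ", paste(missing.files, collapse = ", "))
}
```

In a real session the same comparison would be run against the directory created above, without the placeholder step.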
## define the path to maxent.jar file
path.to.maxent.jar <- file.path(getwd(), "maxent.jar")
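The path above assumes that maxent.jar sits in the current working directory. A small guard can stop the script early when the file is missing. This is a minimal sketch: `check_maxent_jar` is a hypothetical helper written for this note, not a biomod2 function, and the demonstration uses an empty placeholder file in a temporary directory.

```r
# Hypothetical helper: stop early if the jar file cannot be found
check_maxent_jar <- function(path) {
  if (!file.exists(path)) {
    stop("maxent.jar not found at: ", path)
  }
  invisible(normalizePath(path))
}

# Demonstration on a placeholder file created in a temporary directory
fake.jar <- file.path(tempdir(), "maxent.jar")
invisible(file.create(fake.jar))
jar.ok <- check_maxent_jar(fake.jar)

# A missing file raises an error, caught here for illustration
jar.missing <- tryCatch(
  check_maxent_jar(file.path(tempdir(), "no_such_dir", "maxent.jar")),
  error = function(e) conditionMessage(e)
)
```

In practice the helper would be called once as `check_maxent_jar(path.to.maxent.jar)` before building the modelling options.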
biomod2 friendly format
Here we draw 50 randomly selected pseudo-absences (PA.nb.absences = 50) to show the difference between running MAXENT with biomod2 pseudo-absences as background data and running it with the 'as usual' MAXENT background data sampled over the whole environment.
bm.dat <- BIOMOD_FormatingData(resp.var = resp.occ,
expl.var = expl.var,
resp.xy = resp.xy,
resp.name = resp.name,
PA.nb.rep = 1,
PA.nb.absences = 50)
##
## -=-=-=-=-=-=-=-=-=-=-=-= GuloGulo Data Formating -=-=-=-=-=-=-=-=-=-=-=-=
##
## ! No data has been set aside for modeling evaluation
## > Pseudo Absences Selection checkings...
## > random pseudo absences selection
## > Pseudo absences are selected in explanatory variables
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
MAXENT models options
## to use biomod2 pseudo absences as background
bm.opt.default <- BIOMOD_ModelingOptions(MAXENT.Phillips = list(path_to_maxent.jar = path.to.maxent.jar))
## to use the full environmental dataset to define maxent background points
bm.opt.maxent.bg <- BIOMOD_ModelingOptions(MAXENT.Phillips = list(path_to_maxent.jar = path.to.maxent.jar,
background_data_dir = maxent.background.dat.dir))
As this is a simple example, we only produce the full model (100% of the data is used to calibrate the models, and their performance is tested on the same dataset). This makes the model scores more comparable with one another, since the same data are used to test both models.
## model using PA as background
bm.mod.default <- BIOMOD_Modeling( bm.dat,
models = c('MAXENT.Phillips'),
models.options = bm.opt.default,
NbRunEval = 1,
DataSplit = 100,
models.eval.meth = c('TSS','ROC'),
do.full.models = TRUE,
modeling.id = "default")
##
##
## Loading required library...
##
## Checking Models arguments...
##
## Creating suitable Workdir...
##
## ! Weights where automaticly defined for GuloGulo_PA1 to rise a 0.5 prevalence !
##
##
## -=-=-=-=-=-=-=-=-=-=-=-= GuloGulo Modeling Summary -=-=-=-=-=-=-=-=-=-=-=-=
##
## 5 environmental variables ( bio3 bio4 bio7 bio11 bio12 )
## Number of evaluation repetitions : 1
## Models selected : MAXENT.Phillips
##
## Total number of model runs : 1
##
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
##
##
## -=-=-=- Run : GuloGulo_PA1
##
##
## -=-=-=--=-=-=- GuloGulo_PA1_Full
##
## Model=MAXENT.Phillips
## Creating Maxent Temp Proj Data..
## Running Maxent...
## Getting predictions...
## Removing Maxent Temp Data..
## Evaluating Model stuff...
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
## model using the full environment as background
bm.mod.maxent.bg <- BIOMOD_Modeling( bm.dat,
models = c('MAXENT.Phillips'),
models.options = bm.opt.maxent.bg,
NbRunEval = 1,
DataSplit = 100,
models.eval.meth = c('TSS','ROC'),
do.full.models = TRUE,
modeling.id = "maxent_bg")
##
##
## Loading required library...
##
## Checking Models arguments...
##
## Creating suitable Workdir...
##
## ! Weights where automaticly defined for GuloGulo_PA1 to rise a 0.5 prevalence !
##
##
## -=-=-=-=-=-=-=-=-=-=-=-= GuloGulo Modeling Summary -=-=-=-=-=-=-=-=-=-=-=-=
##
## 5 environmental variables ( bio3 bio4 bio7 bio11 bio12 )
## Number of evaluation repetitions : 1
## Models selected : MAXENT.Phillips
##
## Total number of model runs : 1
##
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
##
##
## -=-=-=- Run : GuloGulo_PA1
##
##
## -=-=-=--=-=-=- GuloGulo_PA1_Full
##
## Model=MAXENT.Phillips
## Creating Maxent Temp Proj Data..
## Running Maxent...
## Getting predictions...
## Removing Maxent Temp Data..
## Evaluating Model stuff...
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
get_evaluations(bm.mod.default)
## , , MAXENT.Phillips, Full, PA1
##
## Testing.data Cutoff Sensitivity Specificity
## TSS 0.98 633 100 98
## ROC 0.99 635 100 98
get_evaluations(bm.mod.maxent.bg)
## , , MAXENT.Phillips, Full, PA1
##
## Testing.data Cutoff Sensitivity Specificity
## TSS 1 528.0 100 98
## ROC 1 528.5 100 100
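The TSS score can be related to the other columns of these tables, since TSS is defined as sensitivity + specificity - 1 (with both terms expressed as proportions). The short check below recomputes it from the sensitivity and specificity reported in the TSS row for bm.mod.default. The object names `sensitivity`, `specificity` and `tss` are chosen for this illustration only.

```r
# Sensitivity and specificity (in percent) read from the TSS row of the
# evaluation table of bm.mod.default (pseudo-absences as background)
sensitivity <- 100
specificity <- 98

# TSS = sensitivity + specificity - 1, with both terms as proportions
tss <- sensitivity / 100 + specificity / 100 - 1
print(round(tss, 2))
```

Keep in mind that the tables display rounded values, so a score recomputed this way may differ slightly from the one reported.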
As we can see, there is a slight improvement in MAXENT performance when we let the model select its own background data (this is not always the case). This difference in model performance will shrink as we increase the number of pseudo-absences selected at the data formatting step. In that case, pseudo-absences and background data become almost equivalent (practically and philosophically).
The point to retain is that MAXENT needs enough background data to reach optimal performance. In particular, you have to take extra care when you are working with presence/absence data (no or few pseudo-absences generated).
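Why the amount of background data matters can be illustrated with a purely simulated example that involves neither MAXENT nor biomod2. The sketch below draws a small and a large background sample from a hypothetical environmental gradient and compares how much of the available range each one covers. All names and values (`env`, `bg.small`, `bg.large`, `coverage`, the sample sizes) are assumptions made for the illustration.

```r
set.seed(42)

# Simulated environmental gradient over 10000 cells (hypothetical values)
env <- rnorm(10000, mean = 10, sd = 3)

# Draw 5000 background cells; the first 50 form the small sample,
# so the small sample is nested in the large one
bg.id <- sample(length(env), 5000)
bg.small <- env[bg.id[1:50]]
bg.large <- env[bg.id]

# Share of the full environmental range covered by each sample
coverage <- function(x) diff(range(x)) / diff(range(env))
cov.small <- coverage(bg.small)
cov.large <- coverage(bg.large)
print(c(small = cov.small, large = cov.large))
```

Because the small sample is nested in the large one, the large sample always covers at least as much of the gradient. This is one simple way to see that a few dozen pseudo-absences may describe the available environment less completely than a large background sample.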