This document is just an example to illustrate that, depending on how you decide to combine your models to produce ensemble models, the evaluation of the ensemble models may be unfair compared to the evaluation of the formal models they are built from.
If you do pseudo-absence selection at the BIOMOD_FormatingData step, the individual models built at the BIOMOD_Modeling step (let's call them formal models) will not all be built from the same input data (e.g. PA1, PA2, …). Moreover, if you use cross-validation to build your formal models (i.e. each model is calibrated with a certain proportion of the input data and evaluated with the remaining part), two models belonging to different cross-validation runs will not be evaluated with the same data.
Depending on how you decide to combine your models to build your ensemble models (i.e. the em.by argument), models built and/or evaluated with different datasets may be combined. The question here is: which data should we use to evaluate these ensemble models as fairly as possible?
In biomod2 we decided to evaluate ensemble models the following way:
If em.by is 'PA_dataset+repet', ensemble models are evaluated on the same part of the data as the formal models they are made from. In this case the ensemble models are evaluated fairly compared to the formal models, and their evaluation scores can be compared directly with the formal-model scores.
If em.by is 'PA_dataset+algo' or 'PA_dataset', ensemble models are evaluated on the union of the calibration and evaluation data of each dataset (i.e. PA1, PA2, … if pseudo-absence sampling has been computed, or AllData if you work with a presence/absence dataset). In this case the ensemble models are slightly advantaged compared to the formal models because they are partly evaluated on points the formal models were trained with. Ensemble-model evaluation scores should therefore be a bit over-optimistic.
If em.by is 'all' or 'algo', ensemble models are evaluated on the full dataset, which is the union of the PA datasets if pseudo-absence sampling has been computed. Here again, the ensemble models are not evaluated on the same dataset as the formal models they are based on, so any comparison should be done carefully.
Let's base our example on the BIOMOD_Modeling example, where we work with a presence/absence dataset. Two algorithms (RF and SRE) with two cross-validation runs each are computed (i.e. 4 models in total).
library(biomod2)
## Loading required package: sp
## Loading required package: raster
## Loading required package: parallel
## Loading required package: reshape
## Loading required package: ggplot2
## biomod2 3.1-61 loaded.
##
## Type browseVignettes(package='biomod2') to access directly biomod2 vignettes.
example(BIOMOD_Modeling)
##
## BIOMOD> # species occurrences
## BIOMOD> DataSpecies <- read.csv(system.file("external/species/mammals_table.csv",
## BIOMOD+ package="biomod2"))
##
## BIOMOD> head(DataSpecies)
## X X_WGS84 Y_WGS84 ConnochaetesGnou GuloGulo PantheraOnca
## 1 1 -94.5 82 0 0 0
## 2 2 -91.5 82 0 1 0
## 3 3 -88.5 82 0 1 0
## 4 4 -85.5 82 0 1 0
## 5 5 -82.5 82 0 1 0
## 6 6 -79.5 82 0 1 0
## PteropusGiganteus TenrecEcaudatus VulpesVulpes
## 1 0 0 0
## 2 0 0 0
## 3 0 0 0
## 4 0 0 0
## 5 0 0 0
## 6 0 0 0
##
## BIOMOD> # the name of studied species
## BIOMOD> myRespName <- 'GuloGulo'
##
## BIOMOD> # the presence/absences data for our species
## BIOMOD> myResp <- as.numeric(DataSpecies[,myRespName])
##
## BIOMOD> # the XY coordinates of species data
## BIOMOD> myRespXY <- DataSpecies[,c("X_WGS84","Y_WGS84")]
##
## BIOMOD> # Environmental variables extracted from BIOCLIM (bio_3, bio_4, bio_7, bio_11 & bio_12)
## BIOMOD> myExpl = stack( system.file( "external/bioclim/current/bio3.grd",
## BIOMOD+ package="biomod2"),
## BIOMOD+ system.file( "external/bioclim/current/bio4.grd",
## BIOMOD+ package="biomod2"),
## BIOMOD+ system.file( "external/bioclim/current/bio7.grd",
## BIOMOD+ package="biomod2"),
## BIOMOD+ system.file( "external/bioclim/current/bio11.grd",
## BIOMOD+ package="biomod2"),
## BIOMOD+ system.file( "external/bioclim/current/bio12.grd",
## BIOMOD+ package="biomod2"))
##
## BIOMOD> # 1. Formatting Data
## BIOMOD> myBiomodData <- BIOMOD_FormatingData(resp.var = myResp,
## BIOMOD+ expl.var = myExpl,
## BIOMOD+ resp.xy = myRespXY,
## BIOMOD+ resp.name = myRespName)
##
## -=-=-=-=-=-=-=-=-=-=-=-= GuloGulo Data Formating -=-=-=-=-=-=-=-=-=-=-=-=
##
## > No pseudo absences selection !
## ! No data has been set aside for modeling evaluation
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
##
## BIOMOD> # 2. Defining Models Options using default options.
## BIOMOD> myBiomodOption <- BIOMOD_ModelingOptions()
##
## BIOMOD> # 3. Doing Modelisation
## BIOMOD>
## BIOMOD> myBiomodModelOut <- BIOMOD_Modeling( myBiomodData,
## BIOMOD+ models = c('SRE','RF'),
## BIOMOD+ models.options = myBiomodOption,
## BIOMOD+ NbRunEval=2,
## BIOMOD+ DataSplit=80,
## BIOMOD+ VarImport=0,
## BIOMOD+ models.eval.meth = c('TSS','ROC'),
## BIOMOD+ do.full.models=FALSE,
## BIOMOD+ modeling.id="test")
##
##
## Loading required library...
##
## Checking Models arguments...
##
## Creating suitable Workdir...
##
## > No weights : all observations will have the same weight
##
##
## -=-=-=-=-=-=-=-=-=-=-=-= GuloGulo Modeling Summary -=-=-=-=-=-=-=-=-=-=-=-=
##
## 5 environmental variables ( bio3 bio4 bio7 bio11 bio12 )
## Number of evaluation repetitions : 2
## Models selected : SRE RF
##
## Total number of model runs : 4
##
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
##
##
## -=-=-=- Run : GuloGulo_AllData
##
##
## -=-=-=--=-=-=- GuloGulo_AllData_RUN1
##
## Model=Surface Range Envelop
## Evaluating Model stuff...
## Model=Breiman and Cutler's random forests for classification and regression
## Evaluating Model stuff...
##
## -=-=-=--=-=-=- GuloGulo_AllData_RUN2
##
## Model=Surface Range Envelop
## Evaluating Model stuff...
## Model=Breiman and Cutler's random forests for classification and regression
## Evaluating Model stuff...
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
##
## BIOMOD> ## print a summary of modeling stuff
## BIOMOD> myBiomodModelOut
##
## -=-=-=-=-=-=-=-=-=-=-=-=-=-= BIOMOD.models.out -=-=-=-=-=-=-=-=-=-=-=-=-=-=
##
## Modeling id : test
##
## Species modeled : GuloGulo
##
## Considered variables : bio3 bio4 bio7 bio11 bio12
##
##
## Computed Models : GuloGulo_AllData_RUN1_SRE GuloGulo_AllData_RUN1_RF
## GuloGulo_AllData_RUN2_SRE GuloGulo_AllData_RUN2_RF
##
##
## Failed Models : none
##
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Here we get the scores of all the formal models:
(scores_all <- get_evaluations(myBiomodModelOut))
## , , SRE, RUN1, AllData
##
## Testing.data Cutoff Sensitivity Specificity
## TSS 0.721 495 86.36 85.75
## ROC 0.861 500 86.36 85.75
##
## , , RF, RUN1, AllData
##
## Testing.data Cutoff Sensitivity Specificity
## TSS 0.909 404 94.7 96.16
## ROC 0.987 402 94.7 96.16
##
## , , SRE, RUN2, AllData
##
## Testing.data Cutoff Sensitivity Specificity
## TSS 0.737 495 90.15 83.56
## ROC 0.869 500 90.15 83.56
##
## , , RF, RUN2, AllData
##
## Testing.data Cutoff Sensitivity Specificity
## TSS 0.892 283 94.7 94.52
## ROC 0.984 282 94.7 94.52
We will extract only the TSS scores and select a TSS threshold such that a single model is kept at the ensemble modeling step. This way we can directly compare the score of the best formal model with the score of the ensemble model made of only this formal model (i.e. exactly the same model) but evaluated on different data depending on the em.by argument.
## extract TSS scores
scores_TSS <- as.numeric(scores_all["TSS","Testing.data",,,])
## select a threshold to keep a single model
score_thresh <- mean(tail(sort(scores_TSS),2))
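With the TSS scores above, the two best values are 0.909 (RF, RUN1) and 0.892 (RF, RUN2), so score_thresh = (0.909 + 0.892) / 2 = 0.9005. Only GuloGulo_AllData_RUN1_RF exceeds this threshold, which matches the "TSS over 0.9005" filtering reported in the ensemble modeling outputs below.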
Then we build two ensemble models based on the average of the formal models' predictions (i.e. prob.mean), with em.by equal to 'all' or 'PA_dataset+repet'.
## build ensemble models
em_all <- BIOMOD_EnsembleModeling( modeling.output = myBiomodModelOut,
chosen.models = 'all',
em.by = 'all',
eval.metric = c('TSS'),
eval.metric.quality.threshold = score_thresh,
models.eval.meth = c('TSS','ROC'),
prob.mean = TRUE )
##
## -=-=-=-=-=-=-=-=-=-=-=-=-= Build Ensemble Models -=-=-=-=-=-=-=-=-=-=-=-=-=
##
## ! all models available will be included in ensemble.modeling
## > Evaluation & Weighting methods summary :
## TSS over 0.9005
##
##
## > mergedAlgo_mergedRun_mergedData ensemble modeling
## ! Models projections for whole zonation required...
## > Projecting GuloGulo_AllData_RUN1_RF ...
##
## > Mean of probabilities...
## Evaluating Model stuff...
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
em_pa_repet <- BIOMOD_EnsembleModeling( modeling.output = myBiomodModelOut,
chosen.models = 'all',
em.by = 'PA_dataset+repet',
eval.metric = c('TSS'),
eval.metric.quality.threshold = score_thresh,
models.eval.meth = c('TSS','ROC'),
prob.mean = TRUE )
##
## -=-=-=-=-=-=-=-=-=-=-=-=-= Build Ensemble Models -=-=-=-=-=-=-=-=-=-=-=-=-=
##
## ! all models available will be included in ensemble.modeling
## > Evaluation & Weighting methods summary :
## TSS over 0.9005
##
##
## > mergedAlgo_RUN1_AllData ensemble modeling
## > Mean of probabilities...
## Evaluating Model stuff...
##
## > mergedAlgo_RUN2_AllData ensemble modeling
## ! No models kept due to treshold filtering... Ensemble Modeling was skip!
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Let's now compare the evaluation scores.
## compare models scores
get_evaluations(em_all)
## $GuloGulo_EMmeanByTSS_mergedAlgo_mergedRun_mergedData
## Testing.data Cutoff Sensitivity Specificity
## TSS 0.972 475 98.49 98.74
## ROC 0.998 491 98.34 99.02
get_evaluations(em_pa_repet)
## $GuloGulo_EMmeanByTSS_mergedAlgo_RUN1_AllData
## Testing.data Cutoff Sensitivity Specificity
## TSS 0.909 404 94.7 96.16
## ROC 0.987 402 94.7 96.16
Here we see that the em_pa_repet evaluation score is the same as that of the best formal model it is based on, whereas the em_all evaluation score differs because it is not evaluated on the same data.
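As a quick check, we can put the two scores side by side (a minimal sketch reusing the objects created above; as printed above, get_evaluations() on the ensemble modeling output returns a named list of score matrices):
## TSS of the best formal model (RF, RUN1)
best_formal_TSS <- scores_all["TSS", "Testing.data", "RF", "RUN1", "AllData"]
## TSS of the ensemble model built with em.by = 'PA_dataset+repet'
em_repet_TSS <- get_evaluations(em_pa_repet)[[1]]["TSS", "Testing.data"]
best_formal_TSS == em_repet_TSS  ## TRUE: same model, same evaluation data, same score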
The best way to address this problem of ensemble models not being evaluated fairly compared to formal models is to specify an independent dataset for evaluating all models at the BIOMOD_FormatingData step (i.e. by filling the eval.resp.var, eval.expl.var and eval.resp.xy arguments). Then every model you produce will be evaluated against this same evaluation data.
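For instance (a minimal sketch reusing the objects from the example above; myEvalResp, myEvalExpl and myEvalXY are hypothetical objects holding your independent occurrences, environmental layers and their coordinates):
## provide an independent evaluation dataset at the formatting step
## (myEvalResp, myEvalExpl and myEvalXY are hypothetical independent data)
myBiomodDataEval <- BIOMOD_FormatingData(resp.var = myResp,
                                         expl.var = myExpl,
                                         resp.xy = myRespXY,
                                         resp.name = myRespName,
                                         eval.resp.var = myEvalResp,
                                         eval.expl.var = myEvalExpl,
                                         eval.resp.xy = myEvalXY)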
Be careful when comparing your formal-model and ensemble-model evaluation scores if you did not specify an independent validation dataset and/or if you use an em.by value other than 'PA_dataset+repet'.
Any comment, modification, improvement, … on this document is more than welcome! Cheers.