This document is just an example to illustrate that, depending on how you decide to combine your models to produce ensemble models, the evaluation of the ensemble models may be unfair compared to the evaluation of the formal models they are built from.
If you do pseudo-absence selection at the BIOMOD_FormatingData step, the individual models built at the BIOMOD_Modeling step (let's call them formal models) will not all be built from the same input data (e.g. PA1, PA2, …). Moreover, if you use cross-validation to build your formal models (i.e. each model is calibrated with a certain proportion of the input data and evaluated with the remaining part), two models belonging to different cross-validation runs will not be evaluated with the same data.
Depending on how you decide to combine your models to build your ensemble models (i.e. the em.by argument), models built and/or evaluated with different datasets may be combined. The question here is: which data should we use to evaluate these ensemble models as fairly as possible?
In biomod2 we decided to evaluate ensemble models the following way:
If em.by is 'PA_dataset+repet', ensemble models are evaluated on the same part of the data as the formal models they are made from. In this case the ensemble models are evaluated fairly compared to the formal models, and their evaluation scores can be compared directly with the formal-model scores.
If em.by is 'PA_dataset+algo' or 'PA_dataset', ensemble models are evaluated on the union of the calibration and evaluation data of each dataset (i.e. PA1, PA2, … if pseudo-absence sampling has been computed, or AllData if you work with a presence/absence dataset). In this case the ensemble models are slightly advantaged compared to the formal models because they are partly evaluated on points the formal models were trained with. Ensemble-model evaluation scores should therefore be a bit over-optimistic.
If em.by is 'all' or 'algo', ensemble models are evaluated on the full dataset, which is the union of the PA datasets if pseudo-absence sampling has been computed. Here again, the ensemble models are not evaluated on the same dataset as the formal models they are based on, so any comparison should be done carefully.
Let's base our example on the BIOMOD_Modeling example, where we work with a presence/absence dataset. Two algorithms (RF and SRE) with two cross-validation runs each are computed (i.e. 4 models in total).
library(biomod2)
## Loading required package: sp
## Loading required package: raster
## Loading required package: parallel
## Loading required package: reshape
## Loading required package: ggplot2
## biomod2 3.1-61 loaded.
##
## Type browseVignettes(package='biomod2') to access directly biomod2 vignettes.
example(BIOMOD_Modeling)
##
## BIOMOD> # species occurrences
## BIOMOD> DataSpecies <- read.csv(system.file("external/species/mammals_table.csv",
## BIOMOD+ package="biomod2"))
##
## BIOMOD> head(DataSpecies)
## X X_WGS84 Y_WGS84 ConnochaetesGnou GuloGulo PantheraOnca
## 1 1 -94.5 82 0 0 0
## 2 2 -91.5 82 0 1 0
## 3 3 -88.5 82 0 1 0
## 4 4 -85.5 82 0 1 0
## 5 5 -82.5 82 0 1 0
## 6 6 -79.5 82 0 1 0
## PteropusGiganteus TenrecEcaudatus VulpesVulpes
## 1 0 0 0
## 2 0 0 0
## 3 0 0 0
## 4 0 0 0
## 5 0 0 0
## 6 0 0 0
##
## BIOMOD> # the name of studied species
## BIOMOD> myRespName <- 'GuloGulo'
##
## BIOMOD> # the presence/absences data for our species
## BIOMOD> myResp <- as.numeric(DataSpecies[,myRespName])
##
## BIOMOD> # the XY coordinates of species data
## BIOMOD> myRespXY <- DataSpecies[,c("X_WGS84","Y_WGS84")]
##
## BIOMOD> # Environmental variables extracted from BIOCLIM (bio_3, bio_4, bio_7, bio_11 & bio_12)
## BIOMOD> myExpl = stack( system.file( "external/bioclim/current/bio3.grd",
## BIOMOD+ package="biomod2"),
## BIOMOD+ system.file( "external/bioclim/current/bio4.grd",
## BIOMOD+ package="biomod2"),
## BIOMOD+ system.file( "external/bioclim/current/bio7.grd",
## BIOMOD+ package="biomod2"),
## BIOMOD+ system.file( "external/bioclim/current/bio11.grd",
## BIOMOD+ package="biomod2"),
## BIOMOD+ system.file( "external/bioclim/current/bio12.grd",
## BIOMOD+ package="biomod2"))
##
## BIOMOD> # 1. Formatting Data
## BIOMOD> myBiomodData <- BIOMOD_FormatingData(resp.var = myResp,
## BIOMOD+ expl.var = myExpl,
## BIOMOD+ resp.xy = myRespXY,
## BIOMOD+ resp.name = myRespName)
##
## -=-=-=-=-=-=-=-=-=-=-=-= GuloGulo Data Formating -=-=-=-=-=-=-=-=-=-=-=-=
##
## > No pseudo absences selection !
## ! No data has been set aside for modeling evaluation
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
##
## BIOMOD> # 2. Defining Models Options using default options.
## BIOMOD> myBiomodOption <- BIOMOD_ModelingOptions()
##
## BIOMOD> # 3. Doing Modelisation
## BIOMOD>
## BIOMOD> myBiomodModelOut <- BIOMOD_Modeling( myBiomodData,
## BIOMOD+ models = c('SRE','RF'),
## BIOMOD+ models.options = myBiomodOption,
## BIOMOD+ NbRunEval=2,
## BIOMOD+ DataSplit=80,
## BIOMOD+ VarImport=0,
## BIOMOD+ models.eval.meth = c('TSS','ROC'),
## BIOMOD+ do.full.models=FALSE,
## BIOMOD+ modeling.id="test")
##
##
## Loading required library...
##
## Checking Models arguments...
##
## Creating suitable Workdir...
##
## > No weights : all observations will have the same weight
##
##
## -=-=-=-=-=-=-=-=-=-=-=-= GuloGulo Modeling Summary -=-=-=-=-=-=-=-=-=-=-=-=
##
## 5 environmental variables ( bio3 bio4 bio7 bio11 bio12 )
## Number of evaluation repetitions : 2
## Models selected : SRE RF
##
## Total number of model runs : 4
##
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
##
##
## -=-=-=- Run : GuloGulo_AllData
##
##
## -=-=-=--=-=-=- GuloGulo_AllData_RUN1
##
## Model=Surface Range Envelop
## Evaluating Model stuff...
## Model=Breiman and Cutler's random forests for classification and regression
## Evaluating Model stuff...
##
## -=-=-=--=-=-=- GuloGulo_AllData_RUN2
##
## Model=Surface Range Envelop
## Evaluating Model stuff...
## Model=Breiman and Cutler's random forests for classification and regression
## Evaluating Model stuff...
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
##
## BIOMOD> ## print a summary of modeling stuff
## BIOMOD> myBiomodModelOut
##
## -=-=-=-=-=-=-=-=-=-=-=-=-=-= BIOMOD.models.out -=-=-=-=-=-=-=-=-=-=-=-=-=-=
##
## Modeling id : test
##
## Species modeled : GuloGulo
##
## Considered variables : bio3 bio4 bio7 bio11 bio12
##
##
## Computed Models : GuloGulo_AllData_RUN1_SRE GuloGulo_AllData_RUN1_RF
## GuloGulo_AllData_RUN2_SRE GuloGulo_AllData_RUN2_RF
##
##
## Failed Models : none
##
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Here we get the scores of all the formal models:
(scores_all <- get_evaluations(myBiomodModelOut))
## , , SRE, RUN1, AllData
##
## Testing.data Cutoff Sensitivity Specificity
## TSS 0.721 495 86.36 85.75
## ROC 0.861 500 86.36 85.75
##
## , , RF, RUN1, AllData
##
## Testing.data Cutoff Sensitivity Specificity
## TSS 0.909 404 94.7 96.16
## ROC 0.987 402 94.7 96.16
##
## , , SRE, RUN2, AllData
##
## Testing.data Cutoff Sensitivity Specificity
## TSS 0.737 495 90.15 83.56
## ROC 0.869 500 90.15 83.56
##
## , , RF, RUN2, AllData
##
## Testing.data Cutoff Sensitivity Specificity
## TSS 0.892 283 94.7 94.52
## ROC 0.984 282 94.7 94.52
We will extract only the TSS scores and select a TSS threshold such that a single model is kept at the ensemble modeling step. This way we can directly compare the score of the best formal model with the score of the ensemble model made of only this formal model (i.e. exactly the same model) but evaluated on different data depending on the em.by argument.
## extract TSS scores
scores_TSS <- as.numeric(scores_all["TSS","Testing.data",,,])
## select a threshold to keep a single model
score_thresh <- mean(tail(sort(scores_TSS),2))
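With the TSS scores above, the two best values are 0.909 (RF, RUN1) and 0.892 (RF, RUN2), so score_thresh = (0.909 + 0.892) / 2 = 0.9005. Only GuloGulo_AllData_RUN1_RF exceeds this threshold, which matches the "TSS over 0.9005" filtering reported in the ensemble modeling outputs below.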
Then we build two ensemble models based on the average of the formal models' predictions (i.e. prob.mean), with em.by equal to 'all' or 'PA_dataset+repet'.
## build ensemble models
em_all <- BIOMOD_EnsembleModeling( modeling.output = myBiomodModelOut,
chosen.models = 'all',
em.by = 'all',
eval.metric = c('TSS'),
eval.metric.quality.threshold = score_thresh,
models.eval.meth = c('TSS','ROC'),
prob.mean = TRUE )
##
## -=-=-=-=-=-=-=-=-=-=-=-=-= Build Ensemble Models -=-=-=-=-=-=-=-=-=-=-=-=-=
##
## ! all models available will be included in ensemble.modeling
## > Evaluation & Weighting methods summary :
## TSS over 0.9005
##
##
## > mergedAlgo_mergedRun_mergedData ensemble modeling
## ! Models projections for whole zonation required...
## > Projecting GuloGulo_AllData_RUN1_RF ...
##
## > Mean of probabilities...
## Evaluating Model stuff...
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
em_pa_repet <- BIOMOD_EnsembleModeling( modeling.output = myBiomodModelOut,
chosen.models = 'all',
em.by = 'PA_dataset+repet',
eval.metric = c('TSS'),
eval.metric.quality.threshold = score_thresh,
models.eval.meth = c('TSS','ROC'),
prob.mean = TRUE )
##
## -=-=-=-=-=-=-=-=-=-=-=-=-= Build Ensemble Models -=-=-=-=-=-=-=-=-=-=-=-=-=
##
## ! all models available will be included in ensemble.modeling
## > Evaluation & Weighting methods summary :
## TSS over 0.9005
##
##
## > mergedAlgo_RUN1_AllData ensemble modeling
## > Mean of probabilities...
## Evaluating Model stuff...
##
## > mergedAlgo_RUN2_AllData ensemble modeling
## ! No models kept due to treshold filtering... Ensemble Modeling was skip!
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Let's now compare the evaluation scores.
## compare models scores
get_evaluations(em_all)
## $GuloGulo_EMmeanByTSS_mergedAlgo_mergedRun_mergedData
## Testing.data Cutoff Sensitivity Specificity
## TSS 0.972 475 98.49 98.74
## ROC 0.998 491 98.34 99.02
get_evaluations(em_pa_repet)
## $GuloGulo_EMmeanByTSS_mergedAlgo_RUN1_AllData
## Testing.data Cutoff Sensitivity Specificity
## TSS 0.909 404 94.7 96.16
## ROC 0.987 402 94.7 96.16
Here we see that the em_pa_repet evaluation score is the same as that of the best formal model it is based on, whereas the em_all evaluation score differs because it is not evaluated on the same data.
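As a quick check, we can put the two scores side by side (a minimal sketch reusing the objects created above; as printed above, get_evaluations() on the ensemble modeling output returns a named list of score matrices):
## TSS of the best formal model (RF, RUN1)
best_formal_TSS <- scores_all["TSS", "Testing.data", "RF", "RUN1", "AllData"]
## TSS of the ensemble model built with em.by = 'PA_dataset+repet'
em_repet_TSS <- get_evaluations(em_pa_repet)[[1]]["TSS", "Testing.data"]
best_formal_TSS == em_repet_TSS  ## TRUE: same model, same evaluation data, same score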
The best way to address this problem of ensemble models not being evaluated fairly compared to formal models is to specify an independent dataset for evaluating all models at the BIOMOD_FormatingData step (i.e. by filling the eval.resp.var, eval.expl.var and eval.resp.xy arguments). Then every model you produce will be evaluated against this same evaluation data.
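For instance (a minimal sketch reusing the objects from the example above; myEvalResp, myEvalExpl and myEvalXY are hypothetical objects holding your independent occurrences, environmental layers and their coordinates):
## provide an independent evaluation dataset at the formatting step
## (myEvalResp, myEvalExpl and myEvalXY are hypothetical independent data)
myBiomodDataEval <- BIOMOD_FormatingData(resp.var = myResp,
                                         expl.var = myExpl,
                                         resp.xy = myRespXY,
                                         resp.name = myRespName,
                                         eval.resp.var = myEvalResp,
                                         eval.expl.var = myEvalExpl,
                                         eval.resp.xy = myEvalXY)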
Be careful when comparing your formal-model and ensemble-model evaluation scores if you did not specify an independent validation dataset and/or if you use an em.by value other than 'PA_dataset+repet'.
Any comment, modification, improvement, … on this document is more than welcome! Cheers.