Brazil Schisto SDM Updates

1. AUC Figures

Below are figures comparing AUC values across models of different geographic extents. The left panels show AUC values for national models (trained on national data) predicting national data; the middle panels show AUC values for Sudeste regional models (trained on Sudeste data only) predicting Sudeste regional data; and the right panels show AUC values for Minas Gerais/Sao Paulo models (trained on Minas Gerais/Sao Paulo data only) predicting Minas Gerais/Sao Paulo data. The comparison covers all three statistical models (distinguished by color: MaxEnt in blue, Random Forest in yellow, and Boosted Regression Tree in red) for two competent species (Biomphalaria straminea and Biomphalaria tenagophila). The TSS figures in Section 4 follow the same layout.
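For readers less familiar with the two metrics, the sketch below shows one common way to compute them from presence/background labels and predicted suitability scores. It is an illustration only, not the code used to produce the figures: the function names, the toy data, and the 0.5 threshold for TSS are all assumptions.

```python
def auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen presence point
    scores higher than a randomly chosen background point (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one background point")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def tss(labels, scores, threshold=0.5):
    """True skill statistic: sensitivity + specificity - 1 at a given threshold.
    The default threshold of 0.5 is an assumption for this example."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one presence and one background point")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


# Toy test set: 1 = presence, 0 = background (made-up values).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
example_auc = auc(labels, scores)
example_tss = tss(labels, scores)
```

AUC is threshold-free, whereas TSS depends on the threshold chosen, so the two metrics can rank models differently.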

1.1 AUC Straminea

1.2 AUC Tenagophila

2. National Prediction Plots

2.1 Straminea National Predictions – Full Training Set (No CV)

2.2 Tenagophila National Predictions – Full Training Set (No CV)

3.1 National Models Subset to Minas Gerais AUC Comparison

The figures below, entitled “National Models Subset v Minas Gerais/Sao Paulo”, compare AUC values between national models (trained on national data) predicting Minas Gerais/Sao Paulo test data (left in each pair) and Minas Gerais/Sao Paulo models (trained only on Minas Gerais/Sao Paulo data) predicting the Minas Gerais/Sao Paulo test data (right in each pair). Test data are data not used in the training set. The comparison covers all three statistical models (distinguished by color: MaxEnt in blue, Random Forest in yellow, and Boosted Regression Tree in red) and both species.

3.2 National Models Limited Data Quantity v Minas Gerais/Sao Paulo

The figures below, entitled “National Models Limited Data Quantity v Minas Gerais/Sao Paulo”, compare AUC values between national models trained on a limited quantity of data (the same number of occurrence and background points as the Minas Gerais/Sao Paulo models) predicting national test data (left in each pair) and Minas Gerais/Sao Paulo models (trained only on Minas Gerais/Sao Paulo data) predicting Minas Gerais/Sao Paulo test data (right in each pair). Test data are data not used in the training set. The comparison covers all three statistical models and both species. This test investigates whether the national models perform better simply because they have more occurrence points. Notably, the result differs between the two species, which suggests that a national model is preferable to a municipal-level model in some cases but not in others.
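The data-quantity control described above can be sketched as a random subsample of the national training points down to the Minas Gerais/Sao Paulo counts. This is a minimal illustration under stated assumptions: the point counts, the seeds, and the helper name `subsample` are hypothetical, and the actual sampling scheme used for the figures may differ.

```python
import random


def subsample(points, n_target, seed=0):
    """Draw n_target points without replacement, reproducibly for a given seed."""
    if n_target > len(points):
        raise ValueError("cannot draw more points than are available")
    return random.Random(seed).sample(points, n_target)


# Placeholder national training data (hypothetical sizes).
national_occurrences = [("occ", i) for i in range(500)]
national_background = [("bg", i) for i in range(5000)]

# Hypothetical Minas Gerais/Sao Paulo training counts to match.
n_state_occurrences = 80
n_state_background = 800

# Occurrence and background points are subsampled separately so that
# both counts match the state-level training set.
limited_occurrences = subsample(national_occurrences, n_state_occurrences, seed=1)
limited_background = subsample(national_background, n_state_background, seed=2)
```

Repeating the draw with several seeds would show how much of any AUC difference is due to the particular subsample rather than to data quantity.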

4. TSS Figures

4.1 TSS Straminea

4.2 TSS Tenagophila

5. Partial Dependence Plots

I have not yet been able to address this, but please note that the x-axis ranges are not all consistent across the different scales (national, regional, state). For example, some of the state PDPs may appear to lack behavior seen in the national models, when in fact the state models simply do not extend as far into the extreme values of that predictor. Adding uncertainty bars/shading is also at the top of my to-do list upon my return.
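One possible way to handle the axis issue, sketched below, is to plot every scale on shared x-axis limits (the union of the predictor ranges) and to mark the range covered by all three scales (the overlap), since only that range supports a like-for-like comparison. The predictor values and function names are hypothetical, and this is a suggestion rather than what the current plots do.

```python
def axis_limits(values_by_scale):
    """Union of predictor ranges: common x-axis limits for all panels."""
    lo = min(min(v) for v in values_by_scale.values())
    hi = max(max(v) for v in values_by_scale.values())
    return lo, hi


def supported_range(values_by_scale):
    """Overlap of predictor ranges: the interval covered by every scale.
    Returns None if the ranges do not overlap."""
    lo = max(min(v) for v in values_by_scale.values())
    hi = min(max(v) for v in values_by_scale.values())
    if lo > hi:
        return None
    return lo, hi


# Hypothetical values of one predictor in each training set.
temperature = {
    "national": [12.0, 18.5, 24.0, 29.5],
    "regional": [15.0, 20.0, 26.0],
    "state": [17.0, 21.0, 24.5],
}
shared_limits = axis_limits(temperature)   # limits to apply to every panel
overlap = supported_range(temperature)     # range where all scales have data
```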

6. Variable Importance Plots

Finally, we have the variable importance plots. It is difficult to compare absolute variable importance across the different types of statistical models (tree-based methods produce measures that are structurally very different from those of regression methods), so these figures show values scaled within each model type to make the importance measures comparable across models.
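As an illustration of within-model scaling, the sketch below divides each model's raw importance values by that model's largest value, so the top variable in every model scores 1. Dividing by the maximum is an assumption made for this example (the scaling actually used for the figures is not specified here), and the variable names and raw values are made up.

```python
def scale_within_model(importance_by_model):
    """Scale each model's importances by that model's own maximum value."""
    scaled = {}
    for model, importances in importance_by_model.items():
        top = max(importances.values())
        if top <= 0:
            raise ValueError("model %r has no positive importance values" % model)
        scaled[model] = {var: value / top for var, value in importances.items()}
    return scaled


# Raw importances on very different native scales (hypothetical numbers).
raw = {
    "MaxEnt": {"temperature": 40.0, "precipitation": 10.0, "land_cover": 20.0},
    "Random Forest": {"temperature": 0.02, "precipitation": 0.08, "land_cover": 0.04},
}
scaled = scale_within_model(raw)
```

After scaling, values are comparable as relative rankings within each model, not as absolute effect sizes across models.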

7. Variable Importance Proportion Plots

7.1 Variable Importance Prop of LULC Variables

7.2 Variable Importance Prop of Non-BioClim Variables

Here, non-bioclim means all variables except the temperature and precipitation variables.
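The proportions in Sections 7.1 and 7.2 can be thought of as the share of a model's total importance that falls on a given group of variables. The sketch below shows that calculation under stated assumptions: the variable names, groupings, and importance values are hypothetical placeholders, not the predictors used in these models.

```python
def importance_proportion(importances, group):
    """Share of total importance carried by the variables in `group`."""
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("total importance must be positive")
    in_group = sum(value for var, value in importances.items() if var in group)
    return in_group / total


# Hypothetical importances for a single model.
importances = {
    "mean_temperature": 30.0,
    "annual_precipitation": 20.0,
    "urban_cover": 25.0,
    "cropland_cover": 15.0,
    "population_density": 10.0,
}

lulc = {"urban_cover", "cropland_cover"}                # land use/land cover variables
bioclim = {"mean_temperature", "annual_precipitation"}  # temperature and precipitation
non_bioclim = set(importances) - bioclim                # everything else

lulc_share = importance_proportion(importances, lulc)
non_bioclim_share = importance_proportion(importances, non_bioclim)
```

Because LULC variables are a subset of the non-bioclim variables, the LULC proportion can never exceed the non-bioclim proportion for the same model.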