Cholera Disease Project - Hypervolume Analysis

Introduction: Ecological Niche Modeling of Cholera Using Hypervolume Analysis

This script performs an ecological niche modeling (ENM) analysis for Cholera, utilizing hypervolume analysis to understand its environmental niche based on climate conditions. The study follows a structured workflow integrating occurrence data, environmental predictors, and advanced modeling techniques to project potential Cholera distributions under current and future climate scenarios.

Objectives

This analysis aims to:
✅ Identify key environmental variables associated with Cholera distribution.
✅ Reduce spatial and statistical biases through spatial thinning and multicollinearity filtering.
✅ Train an ecological niche model (ENM) using an ellipsoidal niche approach.
✅ Construct a hypervolume model to estimate the Cholera ecological space.
✅ Project Cholera distribution across current and future climatic conditions (SSP119 & SSP585 scenarios).

Workflow Overview

1️⃣ Data Preprocessing

Load Cholera occurrence records and environmental climate layers.
Preprocess environmental raster data (remove NA values, filter correlated variables).
Standardize predictor variables to ensure comparability.

2️⃣ Spatial Thinning of Occurrence Data

Reduce spatial autocorrelation to prevent model overfitting.
Ensure occurrences are well-distributed across the study region.

3️⃣ Variable Selection

Identify non-collinear environmental predictors using correlation heatmaps and Variance Inflation Factor (VIF) analysis.
Extract climate data at Cholera presence locations.

4️⃣ Model Calibration

Train an ellipsoidal niche model using a covariance matrix approach.
Select the optimal predictor variables for model stability.
Split data into training (70%) and testing (30%) for validation.

5️⃣ Hypervolume Construction

Use Support Vector Machines (SVM) to build a hypervolume model of Cholera’s environmental space.
Compare full environmental variable sets vs. calibrated predictor sets.

6️⃣ Projection of Cholera Suitability

Map potential Cholera distributions under current conditions.
Project Cholera hypervolume onto future climate conditions for the years 2050-2060 under:
- SSP119: Low-emission, sustainable scenario.
- SSP585: High-emission, worst-case climate scenario.

# Load Required Libraries
library(terra)        # Spatial raster manipulation
library(sp)           # Spatial data handling
library(tidyverse)    # Data manipulation
library(hypervolume)  # Hypervolume niche modeling
library(spThin)       # Spatial thinning of occurrence data
library(maps)         # Mapping support
library(usdm)         # Variance Inflation Factor (VIF) for collinearity check
library(ellipsenm)    # Ellipsoidal niche modeling
library(ggcorrplot)   # Correlation plots
library(MASS)         # Statistical modeling

Load and Preprocess Environmental Data

Load Raster Data

# ------------------------------------------------------------------
# 🔹 Load Environmental Variables (Pre-processed and Masked)
# ------------------------------------------------------------------
current_clim_m <- rast("Cholera/Climate/Mask/current_clim_m.nc")

# ------------------------------------------------------------------
# 🔹 Summarize Raster Layers to Detect Issues (NA, Zero Values, Outliers)
# This helps identify missing or invalid data before proceeding with the model
# ------------------------------------------------------------------
summary(current_clim_m)

## Warning: [summary] used a sample

##        so            thetao           chl              ph       
##  Min.   : 1.02   Min.   :-1.86   Min.   : 0.05   Min.   :7.26   
##  1st Qu.:32.78   1st Qu.: 1.37   1st Qu.: 0.16   1st Qu.:8.04   
##  Median :34.00   Median :15.48   Median : 0.40   Median :8.06   
##  Mean   :33.49   Mean   :14.40   Mean   : 0.48   Mean   :8.06   
##  3rd Qu.:35.05   3rd Qu.:27.14   3rd Qu.: 0.70   3rd Qu.:8.08   
##  Max.   :40.94   Max.   :30.17   Max.   :10.18   Max.   :8.26   
##  NA's   :70955   NA's   :70955   NA's   :70955   NA's   :70955  
##       clt             dfe            kdpar           mlotst       
##  Min.   :0.03    Min.   :0.00    Min.   :0.00    Min.   :   6.77  
##  1st Qu.:0.25    1st Qu.:0.00    1st Qu.:0.06    1st Qu.:  27.19  
##  Median :0.41    Median :0.00    Median :0.08    Median :  40.01  
##  Mean   :0.44    Mean   :0.00    Mean   :0.09    Mean   :  57.18  
##  3rd Qu.:0.63    3rd Qu.:0.00    3rd Qu.:0.11    3rd Qu.:  61.89  
##  Max.   :0.85    Max.   :0.03    Max.   :0.55    Max.   :1152.39  
##  NA's   :70955   NA's   :70955   NA's   :71480   NA's   :70955    
##       no3              o2             par             phyc      
##  Min.   : 0.00   Min.   :181.1   Min.   : 2.92   Min.   : 0.28  
##  1st Qu.: 0.20   1st Qu.:206.2   1st Qu.:25.57   1st Qu.: 1.29  
##  Median : 2.65   Median :257.9   Median :36.44   Median : 1.94  
##  Mean   : 6.49   Mean   :274.1   Mean   :34.70   Mean   : 2.46  
##  3rd Qu.: 7.70   3rd Qu.:341.8   3rd Qu.:43.88   3rd Qu.: 3.29  
##  Max.   :79.29   Max.   :394.1   Max.   :51.40   Max.   :29.84  
##  NA's   :70955   NA's   :70955   NA's   :72973   NA's   :70955  
##       po4              si             siconc         sithick     
##  Min.   :0.00    Min.   :  0.66   Min.   :0.00    Min.   :0.00   
##  1st Qu.:0.10    1st Qu.:  1.78   1st Qu.:0.00    1st Qu.:0.00   
##  Median :0.38    Median :  3.94   Median :0.00    Median :0.00   
##  Mean   :0.60    Mean   : 14.15   Mean   :0.19    Mean   :0.37   
##  3rd Qu.:0.90    3rd Qu.: 16.22   3rd Qu.:0.47    3rd Qu.:0.62   
##  Max.   :2.20    Max.   :208.61   Max.   :0.96    Max.   :5.36   
##  NA's   :70955   NA's   :70955    NA's   :70955   NA's   :70955  
##       swd              sws             tas            terrain       
##  Min.   :  4.13   Min.   :0.00    Min.   :-24.47   Min.   :-9585.3  
##  1st Qu.:125.71   1st Qu.:0.10    1st Qu.: -7.02   1st Qu.:-3926.7  
##  Median :157.93   Median :0.20    Median : 13.99   Median :-2527.7  
##  Mean   :156.88   Mean   :0.24    Mean   : 10.12   Mean   :-2368.8  
##  3rd Qu.:184.72   3rd Qu.:0.32    3rd Qu.: 26.05   3rd Qu.: -330.3  
##  Max.   :332.61   Max.   :1.67    Max.   : 30.24   Max.   :    0.0  
##  NA's   :70955    NA's   :70955   NA's   :70955    NA's   :70955

# ------------------------------------------------------------------
# 🔹 Identify and Remove Problematic Variables
# Some layers may contain excessive NA values, zero values, or unrelated variables.
# We remove these to avoid errors or bias in the model.

# ------------------------------------------------------------------
filtered_clim_m <- current_clim_m[[-c(5, 7, 11, 17, 18, 20)]]

# Print the names of the remaining raster layers to verify the selection
names(filtered_clim_m)

##  [1] "so"      "thetao"  "chl"     "ph"      "dfe"     "mlotst"  "no3"    
##  [8] "o2"      "phyc"    "po4"     "si"      "siconc"  "sithick" "tas"

# ------------------------------------------------------------------
# 🔹 Scale Environmental Variables (Standardization)
# Scaling transforms each variable to have a mean of 0 and standard deviation of 1.
# This ensures variables with different ranges (e.g., temperature vs. nutrient levels)
# do not disproportionately influence the model.
# ------------------------------------------------------------------
scaled_clim_m <- lapply(1:nlyr(filtered_clim_m), function(i) {
  raster::raster(scale(filtered_clim_m[[i]]))  # Standardize each raster layer separately
})

# Convert the list of scaled layers to a RasterStack for compatibility with modeling
current_clim_scale <- raster::stack(scaled_clim_m)

# ------------------------------------------------------------------
# 🔹 Visualize Scaled Environmental Variables
# This step ensures that all layers were successfully scaled and are ready for analysis.
# ------------------------------------------------------------------
plot(current_clim_scale)

Load and Thin Cholera Occurrence Data

Load Cholera Occurrence Data within the M region

# Load occurrence data
cholera <- read.csv("Cholera/data/Occurrence_crop.csv")

# Add species column
cholera$species <- "Cholera"

# Select relevant columns
cholera <- cholera %>% 
  dplyr::select("x", "y", "species")

Apply Spatial Thinning

# Thin the occurrence data to reduce spatial autocorrelation
thinned_cholera <- thin(loc.data = cholera,
                        lat.col = "y",
                        long.col = "x",
                        spec.col = "species",
                        thin.par = 10,  # Minimum distance threshold in km
                        reps = 100,     # Number of repetitions
                        locs.thinned.list.return = TRUE,
                        write.files = TRUE,
                        max.files = 1,
                        out.dir = "Cholera/thinning",
                        out.base = "cholera_thinned", 
                        write.log.file = TRUE)

## ********************************************** 
##  Beginning Spatial Thinning.
##  Script Started at: Wed Feb 12 22:37:55 2025
## lat.long.thin.count
## 699 702 704 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 
##   1   1   1   1   2   2   6   3   2   4   2   4   4   5   5   6   2  10   4 
## 723 724 725 726 727 728 729 730 732 738 
##   9   4   4   3   2   6   2   1   3   1 
## [1] "Maximum number of records after thinning: 738"
## [1] "Number of data.frames with max records: 1"
## [1] "Writing new *.csv files"

## [1] "Writing file: Cholera/thinning/cholera_thinned_thin1_new_new_new_new.csv"

# Load the thinned dataset
thinned_cholera_data <- read.csv("Cholera/thinning/cholera_thinned_thin1.csv")

Reduces spatial bias in the presence of data, which can influence model accuracy.

Extract Environmental Variables for Thinned Occurrences

# Convert thinned data to spatial vector
thinned_cholera_vec <- vect(thinned_cholera_data, geom = c("x", "y"),
                            crs = "+proj=longlat +datum=WGS84 +no_defs", 
                            keepgeom = FALSE)

# Extract climate data at occurrence locations
thinned_cholera_clim <- terra::extract(rast(current_clim_scale), 
                                       thinned_cholera_vec,
                                       xy = TRUE) %>%
  na.omit() %>%
  dplyr::select("x", "y") %>%
  mutate(species = "Cholera")

Check Correlation and Multicollinearity

Correlation Matrix

# ------------------------------------------------------------------
# 🔹 Compute Correlation Matrix for Environmental Variables
# ------------------------------------------------------------------
# Convert the RasterStack to a DataFrame before computing correlation
M <- cor(as.data.frame(rast(current_clim_scale)))

# ------------------------------------------------------------------
# 🔹 Visualize the Correlation Matrix as a Heatmap
# This helps identify highly correlated variables, which may cause redundancy in modeling.
# 'hc.order = TRUE' orders variables hierarchically for better interpretation.
# 'type = "lower"' shows only the lower triangle of the correlation matrix.
# 'lab = TRUE' adds correlation values inside the heatmap.
# ------------------------------------------------------------------
ggcorrplot(M, 
           hc.order = TRUE, 
           type = "lower", 
           lab = TRUE, 
           lab_size = 2,
           title = "Correlation Heatmap of Environmental Variables")

Variance Inflation Factor (VIF) Check

# ------------------------------------------------------------------
# 🔹 Compute Variance Inflation Factor (VIF) to Detect Multicollinearity
# ------------------------------------------------------------------
# VIF measures how much a variable is correlated with other predictor variables.
# High VIF (>10) indicates multicollinearity, which can lead to unreliable model predictions.
# We set a threshold (th = 5) to remove highly correlated variables.
# ------------------------------------------------------------------
current_clim_vif <- vifstep(rast(current_clim_scale), th = 5)

# Print the VIF results
current_clim_vif

## 6 variables from the 14 input variables have collinearity problem: 
##  
## thetao tas chl siconc no3 po4 
## 
## After excluding the collinear variables, the linear correlation coefficients ranges between: 
## min correlation ( sithick ~ ph ):  -0.006055419 
## max correlation ( sithick ~ o2 ):  0.7062824 
## 
## ---------- VIFs of the remained variables -------- 
##   Variables      VIF
## 1        so 1.884866
## 2        ph 1.270972
## 3       dfe 1.669753
## 4    mlotst 1.286894
## 5        o2 3.849542
## 6      phyc 1.707333
## 7        si 1.552798
## 8   sithick 2.494182

# ------------------------------------------------------------------
# 🔹 Check Selected Variables After Removing Highly Correlated Ones
# This step ensures that only independent variables are retained for modeling.
# ------------------------------------------------------------------
selected_variables <- current_clim_vif@results$Variables

# Print the final list of selected variables
selected_variables

## [1] "so"      "ph"      "dfe"     "mlotst"  "o2"      "phyc"    "si"     
## [8] "sithick"

Prepare Training and Testing Data

# Set a random seed for reproducibility
set.seed(1234)

# ------------------------------------------------------------------
# 🔹 Split Data into Training (70%) and Testing (30%) Sets
# ------------------------------------------------------------------
# This step ensures that the model is trained on a subset of data (70%)
# and tested on the remaining 30% to evaluate its predictive performance.
# 
# 'method = "random"' randomly selects training and testing points.
# 'longitude' and 'latitude' specify spatial coordinates.
# 'train_proportion = 0.70' sets the training data percentage.
# 'save = FALSE' prevents saving the split dataset as a file.
# 'name = "occ"' assigns a name to the occurrence dataset.
# ------------------------------------------------------------------
data_split <- ellipsenm::split_data(thinned_cholera_clim,
                                    method = "random",
                                    longitude = "x",
                                    latitude = "y",
                                    train_proportion = 0.70,
                                    save = FALSE,
                                    name = "occ")

Generate Three Combinations of Variables for Model Calibration

# Define the predictor variables
all_variables <- c(current_clim_vif@results$Variables)

# Function to generate three possible combinations
generate_combinations <- function(vars) {
  comb_list <- list()
  for (m in 3) {
    combs <- combn(vars, m, simplify = FALSE)
    comb_list <- c(comb_list, combs)
  }
  return(comb_list)
}

# Generate variable sets
all_combinations <- generate_combinations(all_variables)
all_combinations

## [[1]]
## [1] "so"  "ph"  "dfe"
## 
## [[2]]
## [1] "so"     "ph"     "mlotst"
## 
## [[3]]
## [1] "so" "ph" "o2"
## 
## [[4]]
## [1] "so"   "ph"   "phyc"
## 
## [[5]]
## [1] "so" "ph" "si"
## 
## [[6]]
## [1] "so"      "ph"      "sithick"
## 
## [[7]]
## [1] "so"     "dfe"    "mlotst"
## 
## [[8]]
## [1] "so"  "dfe" "o2" 
## 
## [[9]]
## [1] "so"   "dfe"  "phyc"
## 
## [[10]]
## [1] "so"  "dfe" "si" 
## 
## [[11]]
## [1] "so"      "dfe"     "sithick"
## 
## [[12]]
## [1] "so"     "mlotst" "o2"    
## 
## [[13]]
## [1] "so"     "mlotst" "phyc"  
## 
## [[14]]
## [1] "so"     "mlotst" "si"    
## 
## [[15]]
## [1] "so"      "mlotst"  "sithick"
## 
## [[16]]
## [1] "so"   "o2"   "phyc"
## 
## [[17]]
## [1] "so" "o2" "si"
## 
## [[18]]
## [1] "so"      "o2"      "sithick"
## 
## [[19]]
## [1] "so"   "phyc" "si"  
## 
## [[20]]
## [1] "so"      "phyc"    "sithick"
## 
## [[21]]
## [1] "so"      "si"      "sithick"
## 
## [[22]]
## [1] "ph"     "dfe"    "mlotst"
## 
## [[23]]
## [1] "ph"  "dfe" "o2" 
## 
## [[24]]
## [1] "ph"   "dfe"  "phyc"
## 
## [[25]]
## [1] "ph"  "dfe" "si" 
## 
## [[26]]
## [1] "ph"      "dfe"     "sithick"
## 
## [[27]]
## [1] "ph"     "mlotst" "o2"    
## 
## [[28]]
## [1] "ph"     "mlotst" "phyc"  
## 
## [[29]]
## [1] "ph"     "mlotst" "si"    
## 
## [[30]]
## [1] "ph"      "mlotst"  "sithick"
## 
## [[31]]
## [1] "ph"   "o2"   "phyc"
## 
## [[32]]
## [1] "ph" "o2" "si"
## 
## [[33]]
## [1] "ph"      "o2"      "sithick"
## 
## [[34]]
## [1] "ph"   "phyc" "si"  
## 
## [[35]]
## [1] "ph"      "phyc"    "sithick"
## 
## [[36]]
## [1] "ph"      "si"      "sithick"
## 
## [[37]]
## [1] "dfe"    "mlotst" "o2"    
## 
## [[38]]
## [1] "dfe"    "mlotst" "phyc"  
## 
## [[39]]
## [1] "dfe"    "mlotst" "si"    
## 
## [[40]]
## [1] "dfe"     "mlotst"  "sithick"
## 
## [[41]]
## [1] "dfe"  "o2"   "phyc"
## 
## [[42]]
## [1] "dfe" "o2"  "si" 
## 
## [[43]]
## [1] "dfe"     "o2"      "sithick"
## 
## [[44]]
## [1] "dfe"  "phyc" "si"  
## 
## [[45]]
## [1] "dfe"     "phyc"    "sithick"
## 
## [[46]]
## [1] "dfe"     "si"      "sithick"
## 
## [[47]]
## [1] "mlotst" "o2"     "phyc"  
## 
## [[48]]
## [1] "mlotst" "o2"     "si"    
## 
## [[49]]
## [1] "mlotst"  "o2"      "sithick"
## 
## [[50]]
## [1] "mlotst" "phyc"   "si"    
## 
## [[51]]
## [1] "mlotst"  "phyc"    "sithick"
## 
## [[52]]
## [1] "mlotst"  "si"      "sithick"
## 
## [[53]]
## [1] "o2"   "phyc" "si"  
## 
## [[54]]
## [1] "o2"      "phyc"    "sithick"
## 
## [[55]]
## [1] "o2"      "si"      "sithick"
## 
## [[56]]
## [1] "phyc"    "si"      "sithick"

# Prepare sets for ellipsoidal models
variable_sets <- prepare_sets(raster::stack(current_clim_scale), 
                              all_combinations)

## 
## Preparing sets of variables...
##  56 sets of variables prepared

Model Calibration with Ellipsoidal Niche Models

# ------------------------------------------------------------------
# 🔹 Define Ellipsoid Calibration Method
# ------------------------------------------------------------------
# 'covmat' (covariance matrix) is a method used to build ellipsoid niche models.
# It assumes the ecological niche can be represented by a multidimensional
# covariance matrix of environmental variables.
# ------------------------------------------------------------------
methods <- c("covmat")

# ------------------------------------------------------------------
# 🔹 Run Ellipsoid Model Calibration
# ------------------------------------------------------------------
# This step calibrates the model based on training data and selected variables.
# 
# - 'data = data_split' -> Uses the previously split training dataset.
# - 'species = "species"' -> Specifies the species being modeled.
# - 'longitude' and 'latitude' -> Define spatial coordinates.
# - 'variables = variable_sets' -> Environmental variables used in the model.
# - 'methods = methods' -> Uses the covariance matrix approach ('covmat').
# - 'level = 95' -> 95% confidence level for the ellipsoid.
# - 'selection_criteria = "S_OR_P"' -> Selects the best model based on suitability.
# - 'error = 5' -> Error threshold for calibration.
# - 'iterations = 100' -> Number of model iterations.
# - 'percentage = 50' -> 50% of occurrences used for testing.
# - 'overwrite = TRUE' -> Allows overwriting existing outputs.
# - 'parallel = TRUE' -> Runs computations in parallel to speed up processing.
# ------------------------------------------------------------------
calib <- ellipsoid_calibration(data = data_split, 
                               species = "species", 
                               longitude = "x", 
                               latitude = "y", 
                               variables = variable_sets, 
                               methods = methods, 
                               level = 99, 
                               selection_criteria = "S_OR_P", 
                               error = 5, 
                               iterations = 1000, 
                               percentage = 70,
                               overwrite = TRUE,
                               parallel = TRUE)

# ------------------------------------------------------------------
# 🔹 View Model Summary
# This provides an overview of the calibration results, including model performance.
# ------------------------------------------------------------------
summary(calib)

## 
##                  Summary of calibration_ellipsoid object
## ---------------------------------------------------------------------------
## 
## Methods tested:   covmat
## 
## Level used:   99
## 
## Variable sets tested:
## $set_1
## [1] "so"  "ph"  "dfe"
## 
## $set_2
## [1] "so"     "ph"     "mlotst"
## 
## $set_3
## [1] "so" "ph" "o2"
## 
## $set_4
## [1] "so"   "ph"   "phyc"
## 
## $set_5
## [1] "so" "ph" "si"
## 
## $set_6
## [1] "so"      "ph"      "sithick"
## 
## $set_7
## [1] "so"     "dfe"    "mlotst"
## 
## $set_8
## [1] "so"  "dfe" "o2" 
## 
## $set_9
## [1] "so"   "dfe"  "phyc"
## 
## $set_10
## [1] "so"  "dfe" "si" 
## 
## $set_11
## [1] "so"      "dfe"     "sithick"
## 
## $set_12
## [1] "so"     "mlotst" "o2"    
## 
## $set_13
## [1] "so"     "mlotst" "phyc"  
## 
## $set_14
## [1] "so"     "mlotst" "si"    
## 
## $set_15
## [1] "so"      "mlotst"  "sithick"
## 
## $set_16
## [1] "so"   "o2"   "phyc"
## 
## $set_17
## [1] "so" "o2" "si"
## 
## $set_18
## [1] "so"      "o2"      "sithick"
## 
## $set_19
## [1] "so"   "phyc" "si"  
## 
## $set_20
## [1] "so"      "phyc"    "sithick"
## 
## $set_21
## [1] "so"      "si"      "sithick"
## 
## $set_22
## [1] "ph"     "dfe"    "mlotst"
## 
## $set_23
## [1] "ph"  "dfe" "o2" 
## 
## $set_24
## [1] "ph"   "dfe"  "phyc"
## 
## $set_25
## [1] "ph"  "dfe" "si" 
## 
## $set_26
## [1] "ph"      "dfe"     "sithick"
## 
## $set_27
## [1] "ph"     "mlotst" "o2"    
## 
## $set_28
## [1] "ph"     "mlotst" "phyc"  
## 
## $set_29
## [1] "ph"     "mlotst" "si"    
## 
## $set_30
## [1] "ph"      "mlotst"  "sithick"
## 
## $set_31
## [1] "ph"   "o2"   "phyc"
## 
## $set_32
## [1] "ph" "o2" "si"
## 
## $set_33
## [1] "ph"      "o2"      "sithick"
## 
## $set_34
## [1] "ph"   "phyc" "si"  
## 
## $set_35
## [1] "ph"      "phyc"    "sithick"
## 
## $set_36
## [1] "ph"      "si"      "sithick"
## 
## $set_37
## [1] "dfe"    "mlotst" "o2"    
## 
## $set_38
## [1] "dfe"    "mlotst" "phyc"  
## 
## $set_39
## [1] "dfe"    "mlotst" "si"    
## 
## $set_40
## [1] "dfe"     "mlotst"  "sithick"
## 
## $set_41
## [1] "dfe"  "o2"   "phyc"
## 
## $set_42
## [1] "dfe" "o2"  "si" 
## 
## $set_43
## [1] "dfe"     "o2"      "sithick"
## 
## $set_44
## [1] "dfe"  "phyc" "si"  
## 
## $set_45
## [1] "dfe"     "phyc"    "sithick"
## 
## $set_46
## [1] "dfe"     "si"      "sithick"
## 
## $set_47
## [1] "mlotst" "o2"     "phyc"  
## 
## $set_48
## [1] "mlotst" "o2"     "si"    
## 
## $set_49
## [1] "mlotst"  "o2"      "sithick"
## 
## $set_50
## [1] "mlotst" "phyc"   "si"    
## 
## $set_51
## [1] "mlotst"  "phyc"    "sithick"
## 
## $set_52
## [1] "mlotst"  "si"      "sithick"
## 
## $set_53
## [1] "o2"   "phyc" "si"  
## 
## $set_54
## [1] "o2"      "phyc"    "sithick"
## 
## $set_55
## [1] "o2"      "si"      "sithick"
## 
## $set_56
## [1] "phyc"    "si"      "sithick"
## 
## Selection criteria:  Significance, omission rate, and prevalence
## 
## Selected parameters:
##  Method Variable_set Mean_AUC_ratio_at_5% pval_pROC Valid_iterations
##  covmat       set_27             1.599931         0              997
##  Omission_rate prevalence_E_space prevalence_G_space
##     0.01273885          0.3137428          0.3137399

# ------------------------------------------------------------------
# 🔹 Extract the Selected Variable Set
# This retrieves the environmental variables chosen during calibration.
# ------------------------------------------------------------------
selected_vars <- variable_sets$variable_sets[calib@selected_parameters$Variable_set][1]

# Flatten the list and extract only the values
selected_vars <- unname(unlist(selected_vars))

# Display the final calibrated variable set
selected_vars

## [1] "ph"     "mlotst" "o2"

Hypervolume Analysis

Construct Hypervolume

# ------------------------------------------------------------------
# 🔹 Extract Climate Data at Cholera Occurrence Locations
# ------------------------------------------------------------------
# This extracts environmental variable values from 'current_clim_scale' 
# at the locations where Cholera occurrences were recorded.
#
# - 'rast(current_clim_scale)' -> Converts raster data into a format compatible with terra.
# - 'thinned_cholera_vec' -> The vector of Cholera occurrence locations.
# - 'xy = TRUE' -> Retains spatial coordinates in the extracted dataset.
# - 'na.omit()' -> Removes rows with missing values to ensure clean data input.
# ------------------------------------------------------------------
cholera_vec_clim <- terra::extract(rast(current_clim_scale), 
                                   thinned_cholera_vec, xy = TRUE) %>%
  na.omit()

# ------------------------------------------------------------------
# 🔹 Generate Hypervolume Model Using All Environmental Variables
# ------------------------------------------------------------------
# The hypervolume model represents the ecological niche of Cholera.
#
# - 'cholera_vec_clim[, all_variables]' -> Uses all selected environmental variables.
# - 'name = "Cholera"' -> Assigns a label to the hypervolume model.
# - 'hypervolume_svm()' -> Uses Support Vector Machine (SVM) method to construct the hypervolume.
# ------------------------------------------------------------------
hv_cholera <- hypervolume_svm(cholera_vec_clim[, all_variables],
                              name = "Cholera",
                              samples.per.point = 100)

# ------------------------------------------------------------------
# 🔹 Plot the Hypervolume Model
# Enhances visualization with custom aesthetics for better interpretation.
# ------------------------------------------------------------------
plot(hv_cholera, 
     cex.legend = 1.5,    # Increases legend size
     contour.lwd = 3,     # Adjusts contour line thickness
     cex.axis = 1.5,      # Enlarges axis labels
     cex.names = 1.5,     # Enlarges variable names
     cex.centroid = 3)    # Increases centroid size

# ------------------------------------------------------------------
# 🔹 Generate Hypervolume Model Using Calibrated Variables
# ------------------------------------------------------------------
# This hypervolume model is built using only the variables selected 
# from the VIF analysis, reducing redundancy and improving efficiency.
#
# - 'cholera_vec_clim[, selected_vars]' -> Uses only the calibrated environmental variables.
# - 'name = "Cholera calibration"' -> Assigns a specific label for this model.
# ------------------------------------------------------------------
hv_cholera_calib <- hypervolume_svm(cholera_vec_clim[, selected_vars],
                                    name = "Cholera calibration",
                                    samples.per.point = 100)

# ------------------------------------------------------------------
# 🔹 Plot the Hypervolume Model with Calibrated Variables
# ------------------------------------------------------------------
plot(hv_cholera_calib,
     cex.legend = 1.5,    # Increases legend size
     contour.lwd = 3,     # Adjusts contour line thickness
     cex.axis = 1.5,      # Enlarges axis labels
     cex.names = 1.5,     # Enlarges variable names
     cex.centroid = 3)    # Increases centroid size

Hypervolume Projection Within M Region

With all environmental factors

# Project Cholera hypervolume within the defined M region
hv_cholera_projected <- hypervolume_project(
  hv_cholera,
  rasters = raster::stack(current_clim_scale[[all_variables]]),
  type = 'inclusion',
  fast.or.accurate = 'fast'
)

# Plot projected hypervolume for M region
plot(hv_cholera_projected)
maps::map("world", add = TRUE)

With calibrated environmental factors

# Project Cholera hypervolume with calibrated factors within the defined M region
hv_cholera_calib_projected <- hypervolume_project(
  hv_cholera_calib,
  rasters = raster::stack(current_clim_scale[[selected_vars]]),
  type = 'inclusion',
  fast.or.accurate = 'fast'
)

# Plot projected hypervolume for M region
plot(hv_cholera_calib_projected)
maps::map("world", add = TRUE)

Global Hypervolume Projection

# Load global environmental conditions
current_clim_world <- rast("Cholera/Climate/Current/current_clim.nc")

# Plot global climate data
plot(current_clim_world)

# Identify and remove problematic variables (NA values and irrelevant variables)
# Removing layers
filtered_clim_world <- current_clim_world[[-c(5, 7, 11, 17, 18, 20)]]

# Print the names of the remaining raster layers
names(filtered_clim_world)

# Scale environmental variables (standardization)
scaled_clim_world <- lapply(1:nlyr(filtered_clim_world), function(i) {
  raster::raster(scale(filtered_clim_world[[i]]))  # Scale each layer separately
})

# Convert list to RasterStack for analysis
current_clim_world_scale <- raster::stack(scaled_clim_world)

# Project Cholera hypervolume with all environmental factors globally
hv_cholera_world_projected <- hypervolume_project(
  hv_cholera,
  rasters = raster::stack(current_clim_world_scale[[all_variables]]),
  type = 'inclusion',
  fast.or.accurate = 'fast'
)

# Plot projected hypervolume for global conditions
plot(hv_cholera_world_projected)
maps::map("world", add = TRUE)

# Project Cholera hypervolume with calibrated environmental factors globally
hv_cholera_world_calib_projected <- hypervolume_project(
  hv_cholera_calib,
  rasters = raster::stack(current_clim_world_scale[[selected_vars]]),
  type = 'inclusion',
  fast.or.accurate = 'fast'
)

# Plot projected hypervolume for global conditions
plot(hv_cholera_world_calib_projected)
maps::map("world", add = TRUE)

Modeling Cholera Ecological Niche

Load future environmental conditions

# Load future climate projections for SSP119 (2050s)
future_ssp119 <- rast("Cholera/Climate/Future_2050_2060/future_clim_ssp119_2050s.nc")

# Remove unwanted raster layers (e.g., NA values or unrelated variables)
# Removed layers: 5 (Cloud Cover), 18 (Sea Water Speed), 16 (Sea Ice Thickness)
future_ssp119 <- future_ssp119[[-c(5, 18, 16)]]

# Load future climate projections for SSP585 (2050s)
future_ssp585 <- rast("Cholera/Climate/Future_2050_2060/future_clim_ssp585_2050s.nc")

# Apply the same filtering process for SSP585
future_ssp585 <- future_ssp585[[-c(5, 18, 16)]]

# ------------------------------------------------------------------
# 🔹 Scale environmental variables (standardization) for SSP119
# Each raster layer is standardized individually
# Scaling transforms the data to have mean = 0 and standard deviation = 1
# This prevents bias in the model due to different variable ranges
# ------------------------------------------------------------------
future_clim_ssp119_scale <- lapply(1:nlyr(future_ssp119), function(i) {
  raster::raster(scale(future_ssp119[[i]]))  # Convert SpatRaster layer to Raster after scaling
})

# Convert list of scaled layers to a RasterStack
future_clim_ssp119_stack <- raster::stack(future_clim_ssp119_scale)

# ------------------------------------------------------------------
# 🔹 Scale environmental variables (standardization) for SSP585
# Apply the same standardization process to SSP585 layers
# ------------------------------------------------------------------
future_clim_ssp585_scale <- lapply(1:nlyr(future_ssp585), function(i) {
  raster::raster(scale(future_ssp585[[i]]))  # Standardize each raster layer separately
})

# Convert list of scaled layers to a RasterStack
future_clim_ssp585_stack <- raster::stack(future_clim_ssp585_scale)

# ------------------------------------------------------------------
# 🔹 Visualizing Scaled Climate Data
# ------------------------------------------------------------------
# SSP119
plot(future_clim_ssp119_stack)

# SSP585
plot(future_clim_ssp585_stack)

Create the Ecological Niche Model for Cholera

# ------------------------------------------------------------------
# 🔹 Train the Ecological Niche Model for Cholera
# ------------------------------------------------------------------
# This model predicts the environmental suitability for Cholera
# using an ellipsoid niche approach based on climatic variables.
#
# - 'thinned_cholera_data' -> Cholera occurrence dataset (after thinning).
# - 'species = "species"' -> Defines the target species being modeled.
# - 'longitude' & 'latitude' -> Specify spatial coordinates of occurrences.
# - 'raster_layers' -> Current climate data for training (selected variables only).
# - 'method = "covmat"' -> Uses the covariance matrix method for modeling.
# - 'level = 95' -> Sets the confidence level at 95% for niche space.
# - 'replicates = 100' -> Runs 100 bootstrap replicates for robust estimates.
# - 'replicate_type = "bootstrap"' -> Uses bootstrapping to assess model stability.
# - 'bootstrap_percentage = 70' -> 70% of occurrence data is used for training.
# ------------------------------------------------------------------
cholera_mod <- tryCatch(ellipsoid_model(
  data = thinned_cholera_data, 
  species = "species", 
  longitude = "x",
  latitude = "y", 
  raster_layers = raster::stack(current_clim_world_scale[[selected_vars]]),
  method = "covmat", 
  level = 95, 
  replicates = 10, 
  replicate_type = "bootstrap",
  bootstrap_percentage = 70, 

  # ------------------------------------------------------------------
  # 🔹 Project the Model onto Future Climate Scenarios
  # This step projects the trained niche model onto future climate conditions.
  #
  # - 'future_ssp119' -> Future projection under SSP1-1.9 (low-emission scenario).
  # - 'future_ssp585' -> Future projection under SSP5-8.5 (high-emission scenario).
  # - 'raster::stack()' -> Ensures compatibility of projection data format.
  # ------------------------------------------------------------------
  projection_variables = list(
    future_ssp119 = raster::stack(future_clim_ssp119_stack[[selected_vars]]),
    future_ssp585 = raster::stack(future_clim_ssp585_stack[[selected_vars]])
  ),

  # ------------------------------------------------------------------
  # 🔹 Model Output Settings
  # - 'prvariables_format = "GTiff"' -> Saves the projected variables as GeoTIFF files.
  # - 'prediction = "suitability"' -> Outputs a continuous suitability map.
  # - 'return_numeric = TRUE' -> Returns numeric model output values.
  # - 'tolerance = 1e-60' -> Sets numerical tolerance to avoid computational errors.
  # - 'format = "GTiff"' -> Saves all model outputs as GeoTIFF files.
  # - 'color_palette = viridis::magma' -> Uses a color scheme for visual clarity.
  # - 'output_directory' -> Directory where model results are saved.
  # - 'overwrite = TRUE' -> Overwrites previous results if they exist.
  # ------------------------------------------------------------------
  prvariables_format = "GTiff", 
  prediction = "suitability", 
  return_numeric = TRUE,
  tolerance = 1e-60, 
  format = "GTiff", 
  color_palette = viridis::magma,
  output_directory = "Cholera/Models/ellipsenm_cholera", 
  overwrite = TRUE
))

# ------------------------------------------------------------------
# 🔹 View the Model Summary
# This provides an overview of the trained model, including performance metrics.
# ------------------------------------------------------------------
summary(cholera_mod)

## 
##                   Summary of ellipsoid_model_rep object
## ---------------------------------------------------------------------------
## 
## ---------------------------Ellipsoids metadata-----------------------------
## 
## Method:   covmat
## 
## Level:    95
## 
## Centroids:
##        mean_ellipsoid min_ellipsoid max_ellipsoid
## ph         -0.1852702    -0.1161515    -0.3724863
## mlotst     -1.0581585    -1.0576267    -1.0591639
## o2         -0.3578062    -0.3842882    -0.3886404
## 
## 
## Covariance matrices:
## mean_ellipsoid 
##                ph      mlotst          o2
## ph     4.80693450 0.023155857 0.461105533
## mlotst 0.02315586 0.029809430 0.003446057
## o2     0.46110553 0.003446057 0.244026678
## 
## min_ellipsoid 
##                ph      mlotst          o2
## ph     4.24440910 0.015230824 0.478576978
## mlotst 0.01523082 0.028730895 0.004429377
## o2     0.47857698 0.004429377 0.242821292
## 
## max_ellipsoid 
##                ph      mlotst          o2
## ph     5.73484320 0.057582838 0.619603110
## mlotst 0.05758284 0.032676604 0.003491106
## o2     0.61960311 0.003491106 0.262258564
## 
## 
## 
## Volumes:
## mean_ellipsoid  min_ellipsoid  max_ellipsoid 
##       15.45232       13.86448       17.34092

Visualizing the Model Predictions

Current Climate Suitability

# Load the projected suitability map (Current Climate)
cholera_suitability_current <- rast("Cholera/Models/ellipsenm_cholera/mean_suitability_calibration_Cholera.tif")

# Plot current suitability map
plot(cholera_suitability_current, 
     main = "Cholera Suitability - Current Climate", 
     range = c(0, 1))
maps::map("world", add = TRUE)

Future Climate Suitability - SSP119 (Low Emission Scenario)

# Load the projected suitability map for SSP119
cholera_suitability_future_ssp119 <- rast("Cholera/Models/ellipsenm_cholera/mean_suitability_future_ssp119_Cholera.tif")

# Plot future suitability map (SSP119)
plot(cholera_suitability_future_ssp119, 
     main = "Cholera Suitability - SSP119 (2050s)", 
     range = c(0, 1))
maps::map("world", add = TRUE)

Future Climate Suitability - SSP585 (High Emission Scenario)

# Load the projected suitability map for SSP585
cholera_suitability_future_ssp585 <- rast("Cholera/Models/ellipsenm_cholera/mean_suitability_future_ssp585_Cholera.tif")

# Plot future suitability map (SSP585)
plot(cholera_suitability_future_ssp585, 
     main = "Cholera Suitability - SSP585 (2050s)", 
     range = c(0, 1))
maps::map("world", add = TRUE)