Introduction

In this short project, I model the remaining potentially suitable habitat for the black rhino (Diceros bicornis) in Kenya and Tanzania.

Getting the occurrence data

# Download Diceros bicornis occurrence records from GBIF (gbif() is from the dismo package)
occ <- dismo::gbif('Diceros', 'bicornis', ntries = 5)

Cleaning the occurrence data

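# Keep presence records that come from observations or museum specimens
occ_clean <- occ |>
  dplyr::select(lon, lat, basisOfRecord, occurrenceStatus) |>
  dplyr::filter(basisOfRecord %in% c('HUMAN_OBSERVATION', 'PRESERVED_SPECIMEN')) |>
  dplyr::filter(occurrenceStatus == 'PRESENT') |>
  dplyr::mutate(species = 1) |>   # presence indicator used as the response
  dplyr::select(lon, lat, species) |>
  unique() |>                     # drop duplicated coordinates
  tidyr::drop_na()                # drop records with missing coordinates
# Promote the data frame to a spatial points object (coordinates<- is from sp)
coordinates(occ_clean) <- ~lon + lat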
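
To make the effect of each cleaning step visible, here is a minimal base R sketch on a made-up table. The five records are hypothetical; only the column names and the filter values are taken from the pipeline above.

```r
# Hypothetical toy records; column names mirror those used in the pipeline above
toy <- data.frame(
  lon = c(36.8, 36.8, 35.1, NA, 34.0),
  lat = c(-1.3, -1.3, -2.5, -3.0, -4.2),
  basisOfRecord = c("HUMAN_OBSERVATION", "HUMAN_OBSERVATION",
                    "PRESERVED_SPECIMEN", "HUMAN_OBSERVATION",
                    "FOSSIL_SPECIMEN"),
  occurrenceStatus = rep("PRESENT", 5),
  stringsAsFactors = FALSE
)

# Same logic as the two filter() calls
keep <- toy$basisOfRecord %in% c("HUMAN_OBSERVATION", "PRESERVED_SPECIMEN") &
  toy$occurrenceStatus == "PRESENT"
toy_clean <- toy[keep, c("lon", "lat")]

toy_clean$species <- 1                                # presence indicator
toy_clean <- unique(toy_clean)                        # removes the duplicated second record
toy_clean <- toy_clean[complete.cases(toy_clean), ]   # removes the record with missing lon
nrow(toy_clean)
```

The fossil record is dropped by the filter, the duplicate by `unique()`, and the record without a longitude by the missing-value step.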

Downloading and aggregating study area

Getting the national boundaries of Kenya and Tanzania and merging them into a single polygon.

KEN <- raster::getData('GADM', country = 'KEN', level = 0)
TZA <- raster::getData('GADM', country = 'TZA', level = 0)
g <- raster::bind(KEN, TZA)
both_countries <- raster::aggregate(g)

Downloading and masking predictor variables

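Getting the 19 WorldClim bioclimatic variables at 10 arc-minute resolution, then cropping and masking them to the study area.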

clim <- raster::getData('worldclim', var = 'bio', res = 10)
clim_crop <- mask(crop(clim, both_countries), both_countries)
plot(clim_crop[[2]])

Cropping the occurrence records to the study area boundary.

occ_crop <- crop(occ_clean, both_countries)
plot(clim_crop[[2]])
points(occ_crop, col = 'red')

Removing multicollinear predictor variables

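Leaving out highly correlated predictor variables. The predictor values are extracted at the occurrence points, and variables are then excluded stepwise based on their variance inflation factor (VIF).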

ext <- raster::extract(clim_crop, occ_crop)
v <- usdm::vifstep(ext)
clim_used <- usdm::exclude(clim_crop, v)
clim_used
## class      : RasterBrick 
## dimensions : 100, 76, 7600, 8  (nrow, ncol, ncell, nlayers)
## resolution : 0.1666667, 0.1666667  (x, y)
## extent     : 29.33333, 42, -11.66667, 5  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : memory
## names      : bio4, bio7, bio8, bio13, bio14, bio15, bio18, bio19 
## min values :  201,   92,   61,    45,     0,    30,    26,     0 
## max values : 2101,  217,  299,   667,    84,   133,   769,   591
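
The idea behind the stepwise VIF exclusion can be sketched in base R. The VIF of a variable is 1 / (1 - R^2), where R^2 comes from regressing that variable on all the others. The sketch below is a simplified illustration on simulated data, not the usdm implementation; the helper names `vif_one`, `vif_all` and `vif_stepwise` are hypothetical, and the threshold of 10 is assumed to match the default of `vifstep`.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # nearly a copy of x1, so strongly collinear
x3 <- rnorm(n)                  # independent of the other two
X  <- data.frame(x1, x2, x3)

# VIF of column j: regress it on all remaining columns, then 1 / (1 - R^2)
vif_one <- function(df, j) {
  fit <- lm(df[[j]] ~ ., data = df[-j])
  1 / (1 - summary(fit)$r.squared)
}
vif_all <- function(df) sapply(seq_along(df), function(j) vif_one(df, j))

# Stepwise exclusion: drop the variable with the largest VIF,
# recompute, and stop once every VIF is below the threshold
vif_stepwise <- function(df, th = 10) {
  while (ncol(df) > 1) {
    v <- vif_all(df)
    if (max(v) < th) break
    df <- df[-which.max(v)]
  }
  names(df)
}

kept <- vif_stepwise(X)
kept
```

In this toy case one of the two collinear variables is removed and `x3` is always retained, which mirrors how the 19 bioclimatic layers are reduced to the 8 shown above.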

Building the sdmData object

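Preparing the sdmData object from the presence records and the retained predictors, with 500 background points sampled at random across the study area.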

d <- sdmData(species~., train = occ_crop, predictors = clim_used, bg = list(method = 'gRandom', n = 500))
d
## class                                 : sdmdata 
## =========================================================== 
## number of species                     :  1 
## species names                         :  species 
## number of features                    :  8 
## feature names                         :  bio4, bio7, bio8, ... 
## type                                  :  Presence-Background 
## has independet test data?             :  FALSE 
## number of records                     :  757 
## has Coordinates?                      :  TRUE
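
As a rough illustration of random background sampling, the base R sketch below draws 500 cells from the non-missing cells of a made-up grid. The grid values and the NA pattern are hypothetical, and how `gRandom` is implemented inside sdm is not shown here; this only conveys the general idea.

```r
set.seed(7)
# Hypothetical 100 x 76 grid of predictor values; NA marks cells outside the study area
grid <- matrix(runif(100 * 76), nrow = 100, ncol = 76)
grid[sample(length(grid), 3000)] <- NA

# Background points are drawn only from cells that carry predictor data
valid_cells <- which(!is.na(grid))
bg_cells <- sample(valid_cells, size = 500)   # sampled without replacement

# Background records get species = 0, presences get species = 1
bg <- data.frame(cell = bg_cells, value = grid[bg_cells], species = 0)
head(bg)
```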

Building the sdm model

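I fit three model types: random forest, support vector machine, and generalized linear model. Each is evaluated with two data-partitioning schemes, bootstrapping and subsampling, with three replicates per scheme. For subsampling, 30% of the records are held out for testing and the remaining 70% are used for training the model.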

model <- sdm(species~., d, methods = c('rf', 'svm', 'glm'), replications = c('boot', 'sub'), test.p = 30, n = 3)
model
## class                                 : sdmModels 
## ======================================================== 
## number of species                     :  1 
## number of modelling methods           :  3 
## names of modelling methods            :  rf, svm, glm 
## replicate.methods (data partitioning) :  bootstrap,subsampling 
## number of replicates (each method)    :  3 
## toral number of replicates per model  :  6 (per species) 
## test percentage (in subsampling)      :  30 
## ------------------------------------------
## model run success percentage (per species)  :
## ------------------------------------------
## method          species          
## ---------------------- 
## rf         :        100   %
## svm        :        100   %
## glm        :        100   %
## 
## ###################################################################
## model Mean performance (per species), using test dataset (generated using partitioning):
## -------------------------------------------------------------------------------
## 
##  ## species   :  species 
## =========================
## 
## methods    :     AUC     |     COR     |     TSS     |     Deviance 
## -------------------------------------------------------------------------
## rf         :     0.97    |     0.87    |     0.86    |     0.4      
## svm        :     0.95    |     0.84    |     0.84    |     0.49     
## glm        :     0.93    |     0.78    |     0.82    |     0.63
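
The partitioning schemes and two of the reported metrics can be sketched in base R on simulated data. This is a simplified illustration under stated assumptions, not what sdm does internally: the data are hypothetical, the helpers `auc` and `tss_max` are my own, and taking TSS at the threshold that maximises it is one common convention that may differ from the one used in the summary above.

```r
set.seed(42)
n <- 400
# Hypothetical data: one predictor, presence (1) more likely at high values
x <- rnorm(n)
y <- rbinom(n, 1, plogis(2 * x))
dat <- data.frame(y, x)

# Subsampling: hold out 30% of the records without replacement
test_id   <- sample(n, size = round(0.3 * n))
train_sub <- dat[-test_id, ]
test_sub  <- dat[test_id, ]

# Bootstrapping: draw n records with replacement;
# records that were never drawn form the test set
boot_id    <- sample(n, replace = TRUE)
train_boot <- dat[boot_id, ]
test_boot  <- dat[-unique(boot_id), ]

# AUC via the rank (Mann-Whitney) formulation
auc <- function(obs, pred) {
  r  <- rank(pred)
  n1 <- sum(obs == 1)
  n0 <- sum(obs == 0)
  (sum(r[obs == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# TSS = sensitivity + specificity - 1, maximised over candidate thresholds
tss_max <- function(obs, pred) {
  th <- sort(unique(pred))
  tss <- sapply(th, function(t) {
    sens <- mean(pred[obs == 1] >= t)
    spec <- mean(pred[obs == 0] < t)
    sens + spec - 1
  })
  max(tss)
}

# Fit on the training part, evaluate on the held-out part
fit <- glm(y ~ x, family = binomial, data = train_sub)
p   <- predict(fit, newdata = test_sub, type = "response")
metrics <- c(AUC = auc(test_sub$y, p), TSS = tss_max(test_sub$y, p))
metrics
```

An AUC of 0.5 and a TSS of 0 correspond to a model that does no better than chance, which gives a reference point for reading the table above.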

Final model output for policy formulation

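Here is the prediction and ensemble model output for policy considerations. The ensemble combines the individual models as a weighted average, with weights based on each model's TSS.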

pred <- predict(model, clim_used)
ens <- ensemble(model, clim_used, setting = list(method = 'weighted', stat = 'tss', opt = 2))
plot(ens, main = 'Potential Suitable Habitats for Black Rhino in Kenya and Tanzania')
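
The weighting idea can be illustrated with a few made-up numbers in base R. The predictions below are hypothetical, and using the mean TSS per method from the table above as weights is a simplification; this is a sketch of a TSS-weighted average, not the exact computation performed by `ensemble`.

```r
# Hypothetical suitability predictions from three models for five grid cells
preds <- cbind(
  rf  = c(0.90, 0.20, 0.60, 0.10, 0.80),
  svm = c(0.80, 0.30, 0.50, 0.20, 0.70),
  glm = c(0.70, 0.40, 0.40, 0.30, 0.60)
)

# Mean TSS per method from the model summary, rescaled to sum to 1
tss <- c(rf = 0.86, svm = 0.84, glm = 0.82)
w   <- tss / sum(tss)

# Weighted average per cell: better-performing models contribute more
ens_toy <- as.vector(preds %*% w)
ens_toy
```

Because the three TSS values are close, the weights are nearly equal and the ensemble stays close to a plain average of the models.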
