Introduction

Here I build a rapid model of the potentially suitable habitats of the fish species Phago loricatus. It is essentially a demonstration of how predictor variables can help explain the distribution of a species.

In doing so, I used the following package versions:

  • R version 4.1.2 (2021-11-01)
  • knitr version 1.37
  • rmarkdown version 2.11
  • tidyverse version 1.3.1
  • sdm version 1.1.8
  • usdm version 1.1.18
  • raster version 3.5.15
  • dismo version 1.3.5
  • maptools version 1.1.2

Getting the occurrence data

I obtained occurrence records from the Global Biodiversity Information Facility (GBIF), restricting the search to a bounding box over West Africa.

ext <- c(-7, 10, 4, 15)
phago_occ <- gbif('Phago', 'loricatus', ext = ext)
## 55 records found
## 0-55 records downloaded

The query returned 55 occurrence records of the species within this extent.
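
To make the bounding box concrete: the four numbers are read in the order xmin, xmax, ymin, ymax, which is the order raster's extent() takes for a numeric vector. The base R sketch below is only an illustration; in_extent() is a hypothetical helper and the coordinates are made up, not GBIF records.

```r
# Bounding box in the order xmin, xmax, ymin, ymax (same values as above).
ext <- c(-7, 10, 4, 15)

# Hypothetical helper: TRUE where a point lies inside the box (edges included).
in_extent <- function(lon, lat, ext) {
  lon >= ext[1] & lon <= ext[2] & lat >= ext[3] & lat <= ext[4]
}

# Made-up coordinates, for illustration only.
toy <- data.frame(lon = c(-1.5, 3.2, 12.0, 8.0),
                  lat = c( 9.0, 6.5, 10.0, 2.0))
toy$inside <- in_extent(toy$lon, toy$lat, ext)
print(toy)
```

The third point falls outside on longitude and the fourth on latitude, so only the first two would be kept by a query limited to this extent.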

Cleaning the occurrence records

In this step, I keep only records that are human observations or preserved specimens, that are flagged as presences, and that have both longitude and latitude values. I then remove duplicated coordinates and add a column of 1’s to mark species presence.

phago_clean <- phago_occ |> 
  dplyr::select(lon, lat, occurrenceStatus, basisOfRecord) |> 
  filter(occurrenceStatus == 'PRESENT') |>
  filter(basisOfRecord %in% c('HUMAN_OBSERVATION', 'PRESERVED_SPECIMEN')) |>
  dplyr::select(lon, lat) |>
  drop_na() |>
  unique() |>  # Removes duplicates.
  mutate(species = 1)
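
The same cleaning logic can be traced on a small made-up table. This is a base R sketch of the steps described above, not the pipeline itself; toy_occ and all of its values are invented for illustration.

```r
# Made-up records mimicking the four GBIF columns used above.
toy_occ <- data.frame(
  lon = c(1.2, 1.2, 3.4, NA, 5.0, 6.1),
  lat = c(8.0, 8.0, 9.1, 7.5, 6.2, 5.5),
  occurrenceStatus = c("PRESENT", "PRESENT", "PRESENT",
                       "PRESENT", "ABSENT", "PRESENT"),
  basisOfRecord = c("PRESERVED_SPECIMEN", "PRESERVED_SPECIMEN",
                    "HUMAN_OBSERVATION", "HUMAN_OBSERVATION",
                    "HUMAN_OBSERVATION", "FOSSIL_SPECIMEN"),
  stringsAsFactors = FALSE
)

# Keep presences recorded as human observations or preserved specimens.
keep <- toy_occ$occurrenceStatus == "PRESENT" &
  toy_occ$basisOfRecord %in% c("HUMAN_OBSERVATION", "PRESERVED_SPECIMEN")
toy_clean <- toy_occ[keep, c("lon", "lat")]

toy_clean <- toy_clean[complete.cases(toy_clean), ]  # drop missing coordinates
toy_clean <- unique(toy_clean)                       # drop duplicated coordinates
toy_clean$species <- 1                               # presence flag
print(toy_clean)
```

Of the six toy rows, one is an absence, one is a fossil record, one lacks a longitude and one is a duplicate, so two rows remain.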

Converting the occurrence data to a SpatialPointsDataFrame object

phago_spatial <- phago_clean
coordinates(phago_spatial) <- ~lon+lat

Getting predictor variables from WorldClim

phago_clim <- raster::getData('worldclim', var = 'bio', res = 10)
phago_crop <- crop(phago_clim, ext)

Checking for multicollinearity among predictors (th >= 0.7)

points_extract <- raster::extract(phago_crop, phago_spatial)
v <- vifcor(points_extract, th = 0.7)  # pass the 0.7 correlation threshold explicitly rather than relying on the default
## Warning in summary.lm(lm(as.formula(paste(colnames(y)[w[i]], "~.", sep = "")), :
## essentially perfect fit: summary may be unreliable

## Warning in summary.lm(lm(as.formula(paste(colnames(y)[w[i]], "~.", sep = "")), :
## essentially perfect fit: summary may be unreliable
phago_crop_used <- exclude(phago_crop, v) 
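
As a reminder of what is being screened here, the variance inflation factor of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. The sketch below computes it by hand in base R on simulated data; vif_by_hand() and the variables x1, x2 and x3 are hypothetical and are not part of usdm. When a predictor is almost an exact linear combination of the others, R² approaches 1 and the VIF becomes very large, which is presumably the situation behind the "essentially perfect fit" warnings above.

```r
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)  # nearly a copy of x1
x3 <- rnorm(n)                 # unrelated to the other two
toy_pred <- data.frame(x1, x2, x3)

# Hypothetical helper: VIF of each column from a regression on the other columns.
vif_by_hand <- function(df) {
  sapply(names(df), function(v) {
    fit <- lm(reformulate(setdiff(names(df), v), response = v), data = df)
    1 / (1 - summary(fit)$r.squared)
  })
}

print(round(cor(toy_pred), 2))          # pairwise correlations
print(round(vif_by_hand(toy_pred), 1))  # one VIF per predictor
```

In this toy case x1 and x2 are strongly correlated and both get a large VIF, so a correlation-based screen would keep only one of them, while x3 is left alone.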

Building the sdmData object

phago_sdmdata <- sdmData(species~., train = phago_spatial, predictors = phago_crop_used,
                         bg = list(method = 'gRandom', n = 1000))

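Because GBIF provides presences only, the call above asks for 1000 background points. As far as I understand it, 'gRandom' draws them at random in geographic space. A rough base R sketch of that idea follows; it samples uniformly inside the bounding box, whereas the real function works on the raster cells, and the presence coordinates here are made up.

```r
set.seed(42)
ext  <- c(-7, 10, 4, 15)  # xmin, xmax, ymin, ymax, as used earlier
n_bg <- 1000              # same number of background points as above

# Background points: uniform random locations inside the box, labelled 0.
background <- data.frame(
  lon     = runif(n_bg, min = ext[1], max = ext[2]),
  lat     = runif(n_bg, min = ext[3], max = ext[4]),
  species = 0
)

# Made-up presences, labelled 1 (illustration only).
presence <- data.frame(lon = c(-1.5, 3.2, 6.0),
                       lat = c( 9.0, 6.5, 12.3),
                       species = 1)

training <- rbind(presence, background)
print(table(training$species))
```

The models are then trained to tell the presence rows from the background rows using the predictor values at each location.
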
Building the sdm model

phago_sdm <- sdm(species~., data = phago_sdmdata,
                 methods = c('rf', 'svm', 'fda'),
                 replication = c('sub', 'boot'), n = 4)
## Loading required package: parallel
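
In the call above, 'sub' stands for subsampling and 'boot' for bootstrapping, each repeated n = 4 times for every method. The base R sketch below shows how the two schemes split a set of row indices into training and test parts. It is a simplified illustration, not sdm's internal code; the 20 rows and the 30% test share are assumptions.

```r
set.seed(7)
n_records <- 20            # hypothetical number of training rows
ids   <- seq_len(n_records)
n_rep <- 4                 # matches n = 4 above

# Subsampling: hold out a random share of rows for testing (30% assumed here).
subsample <- lapply(seq_len(n_rep), function(i) {
  test <- sample(ids, size = round(0.3 * n_records))
  list(train = setdiff(ids, test), test = test)
})

# Bootstrap: draw rows with replacement; rows never drawn form the test set.
bootstrap <- lapply(seq_len(n_rep), function(i) {
  train <- sample(ids, size = n_records, replace = TRUE)
  list(train = train, test = setdiff(ids, train))
})

# Three methods x two replication schemes x four runs each.
n_models <- 3 * 2 * n_rep
print(n_models)
```

Every run is evaluated on rows it did not see during fitting, which is what makes the per-model statistics usable as ensemble weights later on.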

Ensembling the model outputs

phago_ensemble <- ensemble(phago_sdm, newdata = phago_crop_used,
                           setting = list(method = 'weighted', stat = 'TSS', opt = 2))
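
The 'weighted' setting averages the model predictions using an evaluation statistic as the weight, here TSS (the true skill statistic, sensitivity + specificity - 1). The base R sketch below reproduces that arithmetic for two made-up models. tss() is a hypothetical helper, the data are invented, and the fixed 0.5 threshold is an assumption; in the call above, opt = 2 lets sdm choose the threshold per model (to my understanding, the one that maximises sensitivity + specificity).

```r
# Hypothetical helper: true skill statistic at a given threshold.
tss <- function(observed, predicted, threshold) {
  pred_pres   <- predicted >= threshold
  sensitivity <- sum(pred_pres & observed == 1) / sum(observed == 1)
  specificity <- sum(!pred_pres & observed == 0) / sum(observed == 0)
  sensitivity + specificity - 1
}

# Made-up test data: 3 presences, 5 background points, two models' predictions.
observed <- c(1, 1, 1, 0, 0, 0, 0, 0)
model_a  <- c(0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.2)  # separates the classes well
model_b  <- c(0.7, 0.4, 0.6, 0.6, 0.5, 0.2, 0.3, 0.1)  # noisier

weights <- c(a = tss(observed, model_a, 0.5),
             b = tss(observed, model_b, 0.5))

# TSS-weighted mean of the two predictions, cell by cell.
ensemble_pred <- (weights[["a"]] * model_a + weights[["b"]] * model_b) / sum(weights)
print(round(weights, 3))
print(round(ensemble_pred, 3))
```

The better-performing model dominates the average, so the ensemble stays close to model_a. This sketch does not handle models with a TSS of zero or below, which would need to be dropped or given zero weight.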

Visualizing the ensemble output

data("wrld_simpl")
plot(phago_ensemble, main = 'Potential Suitable Habitats of Phago loricatus')
plot(wrld_simpl, add = TRUE)
points(phago_spatial, col = 'black')

Conclusion

This rapid run through the available occurrence and predictor data gives some evidence that the species’ potentially suitable habitats are confined to coastal waters. These habitats span more than one country, so the holistic conservation of Phago loricatus may require a collaborative, cross-border effort.

Next milestones

This is a work in progress, and additional data such as phylogenetic information could strengthen the findings in future versions. As one would expect, the water properties of inland rivers should be quite important in such a model, but they have not been included in the present rapid model.
