Introduction

In this short article I demonstrate how to flag species occurrence records that are ‘likely’ to lie outside the species’ range. Such records are most often outliers or imprecisely georeferenced points, and they are quite common in large datasets such as GBIF and other citizen science sources.

Loading data

I will start by downloading climate data from WorldClim and the Kenya boundary shapefile from GADM, then define a few occurrence records within Kenya and convert them to a SpatialPointsDataFrame.

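# Packages used throughout: raster (which also loads sp), rgeos for gBuffer(), tidyverse for plotting
library(raster)
library(rgeos)
library(tidyverse)
# Bioclimatic variables at 10 arc-minute resolution and the Kenya national boundary
clim <- getData('worldclim', var = 'bio', res = 10)
KEN <- getData('GADM', country = 'KEN', level = 0)
df <- data.frame(longitude = c(40.029948,  39.031136, 35.587305),
                 latitude = c(2.627751,  -1.269534, 1.072451),
                 sampling_sites = c('Wajir', 'Bura Tana', 'Chesoi'))
df_spatial <- df
# Promote the data frame to a SpatialPointsDataFrame
coordinates(df_spatial) <- ~longitude+latitude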

Cropping data to study area

The next step is to crop and mask the climate data to the boundary of Kenya.

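# Crop to Kenya's extent, then set cells outside the boundary to NA
clim_mask <- mask(crop(clim, KEN), KEN)
# Layer 4 is bio4, the variable examined below
plot(clim_mask[[4]])
plot(KEN, border = 'purple', lwd = 5, add = TRUE)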
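
To see what the two steps do separately, here is a minimal sketch on a synthetic raster. The 10 x 10 grid and the triangular polygon tri are made up for illustration and stand in for the climate stack and the Kenya boundary: crop() trims the raster to the polygon's bounding box, while mask() sets the remaining cells that fall outside the polygon itself to NA.

```r
library(raster)

# Synthetic 10 x 10 raster with 1-unit cells (made-up values, not climate data)
r <- raster(nrows = 10, ncols = 10, xmn = 0, xmx = 10, ymn = 0, ymx = 10)
values(r) <- 1:100

# Hypothetical triangular "country" standing in for the KEN polygon
tri <- spPolygons(rbind(c(2, 2), c(8, 2), c(5, 8), c(2, 2)))

r_crop <- crop(r, tri)       # keep only the bounding box of the polygon
r_mask <- mask(r_crop, tri)  # set cells outside the polygon itself to NA

ncell(r_crop)                # fewer cells than the original 100
sum(is.na(values(r_mask)))   # cells inside the box but outside the triangle
```

Cropping first keeps the masking step cheap, because mask() then only has to test the cells inside the bounding box.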

Generating buffer zone around points

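The next step is to create a buffer around each occurrence point and extract the raster values that fall within it. The buffer width is given in map units (decimal degrees here); the code below uses a radius of 0.5, so each buffer is one map unit across.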

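# Buffer each point separately (byid = TRUE), labelling the polygons by site name
set_buff <- gBuffer(df_spatial, width = 0.5,
                   byid = TRUE,
                   id = df_spatial$sampling_sites)
# One row per raster cell inside a buffer; ID is the buffer's position in set_buff
values_within_buffer <- raster::extract(clim_mask, set_buff, df = TRUE)
plot(clim_mask[[4]])
plot(KEN, border = 'purple', lwd = 5, add = TRUE)
plot(set_buff, add = TRUE)
plot(df_spatial, add = TRUE)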
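
As a rough check on how many pixels each buffer contributes, the buffer area can be compared with the cell area. This is only a back-of-the-envelope sketch: it treats degrees as planar units and ignores cells that are masked out or only partly covered, so the real count returned by extract() will differ somewhat. The variable names are mine, not part of the workflow above.

```r
# Rough count of raster cells covered by one buffer (planar approximation)
res_deg    <- 10 / 60   # res = 10 in getData() means 10 arc-minutes per cell
radius_deg <- 0.5       # the width passed to gBuffer()

buffer_area <- pi * radius_deg^2  # area of the circular buffer, square degrees
cell_area   <- res_deg^2          # area of one raster cell, square degrees

approx_cells <- buffer_area / cell_area
round(approx_cells)               # about 28 cells per buffer
```

With fewer than thirty pixels per site, each boxplot is drawn from a small sample, which is worth keeping in mind when comparing the sites.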

Plotting the pixel values within buffer

Lastly, we plot the extracted values as boxplots to reveal any point whose surrounding values clearly differ from those of the other points. Such a point may have been wrongly recorded and could be excluded when fitting a species distribution model (SDM).

# Label each extracted cell with its site (ID follows the order of the points in df)
values_within_buffer |>
  mutate(group = case_when(ID == 1 ~ "Wajir",
                           ID == 2 ~ "Bura Tana",
                           ID == 3 ~ "Chesoi")) |>
  ggplot(aes(x = group, y = bio4, fill = group)) +
  geom_boxplot() +
  geom_jitter(width = 0.1)
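
The hard-coded case_when() works for three sites; with more records, the labels could instead be looked up by position. The sketch below assumes, as the mapping above does, that ID numbers the buffers in the order of the points. The toy data frame is made up and merely stands in for values_within_buffer.

```r
# Site names in the same order as the points used to build the buffers
sites <- c("Wajir", "Bura Tana", "Chesoi")

# Made-up stand-in for values_within_buffer (ID plus one climate column)
toy <- data.frame(ID   = c(1, 1, 2, 2, 3, 3),
                  bio4 = c(10, 12, 11, 13, 30, 34))

# Index the name vector by ID instead of writing one case per site
toy$group <- sites[toy$ID]

# Per-site median and interquartile range, the numbers a boxplot draws
aggregate(bio4 ~ group, data = toy,
          FUN = function(v) c(median = median(v), IQR = IQR(v)))
```

In the real workflow the same idea would read df$sampling_sites[values_within_buffer$ID], which avoids editing the plotting code whenever a site is added.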

Conclusion

In this case, the Chesoi site appears to differ from the other two sites with regard to bio4. If bio4 is one of the most important factors determining the distribution of the species, we might decide to leave the Chesoi occurrence record out of the modelling procedure and use only those for Bura Tana and Wajir. It is also possible to run probabilistic/Bayesian models to evaluate whether Chesoi is ‘really’ outside the range of the species. Frequentist approaches such as ANOVA can likewise be used to test whether the mean bio4 values around the occurrence records differ significantly. The code that generated this HTML file is available as an .Rmd file on GitHub. Happy sdm-ing!
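
For readers who want to try the frequentist route, a one-way ANOVA followed by Tukey's HSD is one possible sketch. The data here are simulated purely for illustration (the means, spread and sample size are assumptions, not the real bio4 values); in practice the labelled values_within_buffer data frame would take the place of sim.

```r
set.seed(1)

# Simulated stand-in for the extracted values: 20 pixels per site.
# The means and standard deviation are invented, not taken from WorldClim.
sim <- data.frame(
  group = rep(c("Wajir", "Bura Tana", "Chesoi"), each = 20),
  bio4  = c(rnorm(20, mean = 100, sd = 5),
            rnorm(20, mean = 102, sd = 5),
            rnorm(20, mean = 60,  sd = 5))
)

# One-way ANOVA: does mean bio4 differ among the sites?
fit <- aov(bio4 ~ group, data = sim)
summary(fit)

# Pairwise comparisons show which site, if any, drives the difference
TukeyHSD(fit)
```

Neighbouring pixels are spatially autocorrelated and therefore not independent observations, so p-values from such a test are likely to be optimistic and are best read as a rough guide.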
