Getting started with WhereNext

Jorge Velásquez-Tibatá · The Nature Conservancy, Colombia · jorge.velasquez at


WhereNext is a recommendation system designed to optimize the process of filling gaps in biodiversity knowledge. As such, it can help plan biological surveys, prioritize sampling among a list of preselected sites, and even prioritize regions from which to mobilize data (e.g. by digitizing local collections). WhereNext relies on open biodiversity and environmental data, but also lets users upload their own datasets. Though designed to be run through a graphical interface, WhereNext functions may be called directly from the console and integrated into custom data analysis workflows. Besides the WhereNext recommendations, the intermediate products generated by the app may be useful in other analyses: raw and cleaned occurrences downloaded from GBIF, cropped and masked WorldClim data, cell sampling completeness estimates, and the spatial representation of community dissimilarity.


WhereNext is based on generalized dissimilarity modeling (GDM), a statistical method to model and predict spatial patterns of dissimilarity in community composition (Ferrier et al. 2007). In GDM, site-pair community dissimilarity (measured through a dissimilarity index, such as Bray-Curtis or Jaccard) is modeled as a function of the respective differences in environmental variables among sites. This modeling permits estimation of the expected dissimilarity between arbitrary pairs of sites, for example, between surveyed and unsurveyed sites.
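To make the site-pair idea concrete, the following base R sketch builds the kind of table that a GDM is fitted to: one row per pair of sites, with a Jaccard dissimilarity and the difference in one environmental variable. The community matrix, the temperature values and the helper `jaccard_dissimilarity()` are hypothetical and are not part of WhereNext; the sketch only prepares the data and does not fit the model itself.

```r
# Hypothetical presence/absence matrix: rows are sites, columns are species.
comm <- matrix(
  c(1, 1, 0, 0,
    1, 0, 1, 0,
    0, 0, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("siteA", "siteB", "siteC"), paste0("sp", 1:4))
)

# Hypothetical environmental variable (e.g. mean annual temperature).
temp <- c(siteA = 24, siteB = 22, siteC = 12)

# Jaccard dissimilarity: 1 - (shared species / species found in either site).
jaccard_dissimilarity <- function(x, y) {
  shared <- sum(x > 0 & y > 0)
  either <- sum(x > 0 | y > 0)
  if (either == 0) return(NA_real_)
  1 - shared / either
}

# All unordered pairs of sites, one pair per row.
pairs <- t(combn(rownames(comm), 2))

site_pairs <- data.frame(
  site1 = pairs[, 1],
  site2 = pairs[, 2],
  dissimilarity = apply(pairs, 1, function(p) {
    jaccard_dissimilarity(comm[p[1], ], comm[p[2], ])
  }),
  temp_diff = apply(pairs, 1, function(p) abs(temp[[p[1]]] - temp[[p[2]]]))
)

print(site_pairs)
```

In this toy table the pair with the largest temperature difference (siteA and siteC) is also the pair that shares no species, which is the kind of relationship a GDM tries to capture.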

To optimally fill gaps in biodiversity knowledge, WhereNext uses the finding by Faith & Walker (1996) that the number of species sampled by a set of sites will be maximized if, on average, the biological distance from sampled to unsampled sites is as small as possible. Thus, each iteration of the “Recommend Survey” component of WhereNext will find the grid cell whose addition to the already sampled places most reduces that average biological distance, in other words, the most complementary cell.
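The selection step can be illustrated with a small greedy sketch in base R. It assumes a matrix of predicted dissimilarities among grid cells is already available and scores each candidate cell by the mean distance from every cell to its nearest sampled cell once the candidate is added. The matrix `d` and the functions `mean_nearest_distance()` and `recommend_next()` are hypothetical names used for illustration; this is not WhereNext's own implementation.

```r
# Hypothetical predicted dissimilarities among five grid cells
# (symmetric, zero diagonal).
d <- matrix(
  c(0.0, 0.2, 0.9, 0.8, 0.5,
    0.2, 0.0, 0.8, 0.9, 0.4,
    0.9, 0.8, 0.0, 0.1, 0.3,
    0.8, 0.9, 0.1, 0.0, 0.7,
    0.5, 0.4, 0.3, 0.7, 0.0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("cell", 1:5), paste0("cell", 1:5))
)

# Mean distance from every cell to its nearest sampled cell.
mean_nearest_distance <- function(d, sampled) {
  mean(apply(d[, sampled, drop = FALSE], 1, min))
}

# Return the unsampled cell whose addition gives the lowest mean distance.
recommend_next <- function(d, sampled) {
  candidates <- setdiff(rownames(d), sampled)
  scores <- vapply(
    candidates,
    function(cell) mean_nearest_distance(d, c(sampled, cell)),
    numeric(1)
  )
  names(scores)[which.min(scores)]
}

sampled <- "cell1"
next_cell <- recommend_next(d, sampled)
print(next_cell)
```

Calling `recommend_next()` repeatedly, each time appending the returned cell to `sampled`, would yield an ordered list of survey suggestions under these assumptions.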


Install and run the WhereNext shiny app using the following R code.

#install.packages("devtools") #Run this line if not installed
devtools::install_github("jivelasquezt/WhereNext-Pkg", build_vignettes = TRUE)

Open the function-specific help by running help(package="WhereNext").


The WhereNext layout is divided into five sections. [1] Components, which are the major steps in the workflow of the app and are meant to be run in order. [2] Control panel, which consists of modules that also need to be run sequentially, unless they’re optional. This is where interaction with the user takes place. [3] Message box, which provides information, warnings and error messages. [4] Results, consisting of three tabs: map (spatial data), occurrence table (biological data) and results (component-dependent results). [5] Spinning wheel, which indicates that R is busy computing results. Credit is due to the Wallace package, after which the WhereNext layout is modeled.

Results from all modules may be downloaded by clicking on their respective download button.

Running an analysis on WhereNext

To get started with WhereNext, we will run an analysis of survey priorities for Colombian palms (Arecaceae), using open occurrence and environmental data available from GBIF and WorldClim, respectively. Bear in mind that reading large occurrence files into R, such as those available for birds, may cause R to crash if your system does not have enough RAM. The same applies to large rasters. In those cases, you can: a) reduce your file size by keeping only the necessary information (essentially unique occurrences by cell, though file formatting rules apply); b) increase the cell size of your analysis; and/or c) reduce the extent of your analysis. WhereNext has been tested on datasets of up to 9 million occurrences and areas as large as Russia, on both macOS and Windows computers with 8 to 16 GB of RAM.
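As an illustration of option a), the base R sketch below keeps a single record per species and grid cell. The data frame, its column names (`species`, `lon`, `lat`) and the 0.25-degree cell size are assumptions made for the example; they are not the file format WhereNext expects, so check the app's formatting rules before uploading a reduced file.

```r
# Hypothetical occurrence records; column names are illustrative only.
occ <- data.frame(
  species = c("Ceroxylon quindiuense", "Ceroxylon quindiuense",
              "Ceroxylon quindiuense", "Euterpe oleracea", "Euterpe oleracea"),
  lon = c(-75.48, -75.46, -75.10, -77.02, -77.03),
  lat = c(4.63, 4.61, 4.62, 3.88, 3.89)
)

res <- 0.25  # assumed cell size, in decimal degrees

# Snap each record to the lower-left corner of its grid cell.
occ$cell_lon <- floor(occ$lon / res) * res
occ$cell_lat <- floor(occ$lat / res) * res

# Keep the first record of each species in each cell.
occ_unique <- occ[!duplicated(occ[, c("species", "cell_lon", "cell_lat")]), ]

print(nrow(occ))         # records before thinning
print(nrow(occ_unique))  # records after thinning
```

A larger `res` merges more records into each cell, which corresponds to option b) and shrinks the table further.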

Component 1: downloading and processing occurrences

The first step in running WhereNext is to define your study area. This may be a country, which you can select from the dropdown list, or a custom area, in which case you need to select the “User shape” option in module 1.1 and upload a zipped shapefile. Here, we’ll select Colombia from the list [1].

Next, you’ll need to select the source of your occurrence data in module 1.2. To download data from GBIF, select the taxonomic rank [2] in which you’re interested (class, order or family) and enter the name of your group [3]. Here, we’ll choose “Family” and enter “Arecaceae” in the text box (without quotes; always capitalize the first letter).