Understanding why species are found at some locations and not others is a central area of focus in ecology (MacKenzie et al. 2017). Humans have actually long been following and predicting (or at least trying to predict) the movement patterns and occurrence of species, since our survival has depended on it (e.g., food, tools, clothing, medicine, avoiding predators, parasites, etc.). Today, occupancy models give us a powerful and important tool for conservation, wildlife management, and policy and decision making. When we estimate species occupancy, we can predict and map the likelihood (or probability) of a species occurring at a given location and time based on things like land cover type or habitat. For example, if you estiamte the probability of occupancy of an endangered or inherently rare species throughout an area, you would data to support protecting areas that, if disturbed or developed, might have negative effects for species (e.g., an extinction event). Other terms that get used and are synonymous with occupancy are occurrence, or presence/absence.

We will work through an example analysis in R to estimate occupancy for the wood thrush (Hylocichla mustelina) in New Hampshire, USA. We will use eBird data (https://ebird.org/home) collected by citizen scientists (Sulivan et al. 2009), and will follow methods outlined as best practices for processing, sampling, and analyzing these data (Strimas-Mackey et al. 2020; https://cornelllabofornithology.github.io/ebird-best-practices/). Within groups, you will develop and test hypotheses relating habitat characteristics to the probability of detecting wood thrushes. To fit and compare multiple occupancy models, we will use the R package ‘unmarked’ (Fiske and Chandler 2011).By evaluating model performance we can assess the contributions of site-level and observation-level covariates on probability of occupancy. We will import raw eBird data from the past 10 years in NH, and combine these data with remotely-sensed data from NASA’s Moderate Resolution Imaging Spectroradiometer (MODIS; Friedl and Sulla-Menashe 2015) to extract land cover types including forest, urban, and cropland. Additionally, we’ll add surface elevation data (Amatulli et al. 2018). Finally, within your groups, you’ll work to develop and test hypotheses and modify the existing code to use models that can predict where wood thrushes are most likely to occur (and not occur) in NH.

After working through the analysis with your group, each group member will write a brief (i.e., <= 5-page) summary that should include the motivation and objectives of your study, statement of predictions or hypotheses being tested, results (ideally some tables and figures), map(s), and a short discussion about management implications or strategies that are informed by results.


  • Use open source eBird data and NASA’s MODIS satellite imagery data
  • Estimate the occupancy of the wood thrush in New Hampshire
  • Test different factors that might influence occupancy of a forest-breeding migratory songbird
  • Use a model to predict the probability of wood thrush occupancy
  • Create figures of the relationship of covariates to detection probability and occupancy
  • Create a map of model-estimated occupancy (Mean and SE) for NH
  • Discuss habitat management strategies and decisions in NH to promote the wood thrush, a species of conservation concern


  • Using R and open source code
  • Data manipulation
  • Using raster and vector files
  • Building simple functions
  • Fitting models
  • AIC model selection and model averaging
  • Interpreting model results
  • Making publishable maps and figures

1) Load packages and set working directory in R.

It’s important at the beginning of each R session to get things organized such as setting the working directory and loading (or installing, if needed) any packages or functions. Below, you’ll want to set your own path for your working directory.

#clear environment

#set working directory
my.directory<-"YOUR PATH GOES HERE"
# setwd(my.directory)

# install.packages("devtools")
# devtools::install_github("mstrimas/ebppackages")

#load packages

# resolve namespace conflicts
select <- dplyr::select
filter <- dplyr::filter

#set global options
knitr::opts_chunk$set(echo = TRUE, message=FALSE, results='hide')

2) Read in raw eBird data and process: filtering data and zero-filling

When we read in raw eBird data, there are several steps we have to take to keep only the parts of the data we’ll be using. We do this through a process of filtering or sub-setting the data, by constraining it to include only what we’ll need for this analysis. Since the eBird data base contains data collected in various ways, using different methods, and under different conditions, we have to be specific about which data we use, to ensure it meets the appropriate assumptions of the models we’ll be using. For example, in our case, we’ll be limiting our data that we’ll use to include only those from standing or travelling surveys up to 5-km (total distance), and surveys conducted between 01 June and 30 June each year from 2010 to 2019. And in order to estimate occupancy of a species, we need all the surveys where a species was detected and all the surveys where it was NOT detected.

# create a new folder: "Data" within your working directory where we'll store data and save files
data_dir <- "./Data"
if (!dir.exists(data_dir)) {

#read in eBird data downloaded from eBird as a text (.txt) file
ebd <- auk_ebd("./Data/ebd_US-NH_relDec-2019/ebd_US-NH_relDec-2019.txt", 
               file_sampling = "./Data/ebd_US-NH_relDec-2019/ebd_US-NH_relDec-2019.txt")

#define an object, ebd_filters, which are a subset (or filter) of all the raw data, ebd:
  # Species: Wood Thrush
  # State: NH
  # Dates: 01 June to 30 June
  # Protocol Type: Stationary or Traveling
ebd_filters <- ebd %>% 
  auk_species("Wood Thrush") %>% 
  # southeastern coastal plain bcr
  auk_state("US-NH") %>% 
  # june, use * to get data from any year
  auk_date(date = c("*-06-01", "*-06-30")) %>% 
  # restrict to the standard traveling and stationary count protocols
  auk_protocol(protocol = c("Stationary", "Traveling")) %>% 

#make file paths for eBird files output from previous code to make ebd_filters
f_ebd <- file.path(data_dir, "ebd_woothr_june_NH.txt")
f_sampling <- file.path(data_dir, "ebd_checklists_june_NH.txt")

# only run if the files don't already exist
if (!file.exists(f_ebd)) {
  auk_filter(ebd_filters, file = f_ebd, file_sampling = f_sampling)

#zero-filling the data (i.e., adding zeros where NAs are for each survey event where the species was NOT detected)
ebd_zf <- auk_zerofill(f_ebd, f_sampling, collapse = TRUE)

# function to convert time observation to hours since midnight
time_to_decimal <- function(x) {
  x <- hms(x, quiet = TRUE)
  hour(x) + minute(x) / 60 + second(x) / 3600

#fix variable names

#remove some columns

# clean up variables
ebd_zf <- ebd_zf %>% 
    # convert X to NA
    observation_count = if_else(observation_count == "X", 
                                NA_character_, observation_count),
    observation_count = as.integer(observation_count),
    # effort_distance_km to 0 for non-travelling counts
    effort_distance_km = if_else(protocol_type != "Traveling", 
                                 0, effort_distance_km),
    # convert time to decimal hours since midnight
    time_observations_started = time_to_decimal(time_observations_started),
    # split date into year and day of year
    year = year(observation_date),
    day_of_year = yday(observation_date)

# additional filtering: this will constrain data further to only include surveys 1.5  hrs or less, 5-km traveled or less, years greater than or equal to 2010, and surveys with fewer than or equal to 10.
ebd_zf_filtered <- ebd_zf %>% 
    # effort filters
    duration_minutes <= 5 * 60,
    effort_distance_km <= 5,
    # last 10 years of data
    year >= 2010,
    # 10 or fewer observers
    number_observers <= 10)

#final formmating and saving
ebird <- ebd_zf_filtered %>% 
  select(checklist_id, observer_id, sampling_event_identifier,
         observation_count, species_observed, 
         state_code, locality_id, latitude, longitude,
         protocol_type, all_species_reported,
         observation_date, year, day_of_year,
         duration_minutes, effort_distance_km,

#Export the formatted eBird data as a new .csv file
write_csv(ebird, "./Data/ebd_woothr_june_NH_zf.csv", na = "")

3) Read in formatted data, perform spatial sampling, and plot on map

#read in ebd_zf data

#We'll want to set the map projection so every time we plot data (vector or raster), everything will line up correctly
map_proj <- "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"

#get bounding box coords from eBird data
x.min=min(ebird$longitude, na.rm=TRUE)-0.4
x.max=max(ebird$longitude, na.rm=TRUE)+0.4
y.min=min(ebird$latitude, na.rm=TRUE)-0.3
y.max=max(ebird$latitude, na.rm=TRUE)+0.3

#compile bounding box corner coordinates
ebird.bbox = c(left = x.min, bottom = y.min, right = x.max, top = y.max)

#define map area and type
map.new <- get_stamenmap(bbox=ebird.bbox, maptype="toner-background", source="stamen", zoom=8)

stateMap <- raster::getData('GADM', country='USA', level=2)
nh.sub<-stateMap[stateMap@data$NAME_1 == "New Hampshire",]

nh.poly<-fortify(nh.sub, region="NAME_1")
nh.poly <- nh.poly[order(nh.poly$order),]

#plot base map and vector data with ggplot
ebird$species_observed<-factor(ebird$species_observed, levels=c("TRUE","FALSE"))

rawEbirdMap<-ggmap(map.new, darken=c(0.7,"white"))+
  geom_map(data=nh.poly, map=nh.poly, aes(y=lat, x=long, map_id=id),color="black",fill="gray90",alpha=0.8, inherit.aes = FALSE)+
  geom_point(data=ebird, aes(x=longitude, y=latitude, color=species_observed),alpha=0.6,shape=16,size=0.6)+
  scale_color_manual(values=c("red","gray40"), name="Detected")+
  #overlay layer of all locations with positive detections
  #geom_point(data=subset(ebird, species_observed==TRUE), aes(x=longitude, y=latitude),color="royalblue",alpha=0.7,shape=16,size=1)+

mapOut<-rawEbirdMap+ggsn::scalebar(dist = 20, dist_unit="km", model = "WGS84", transform=TRUE, y.min = y.min, y.max = y.max, x.min =x.min, x.max = x.max, st.size = 2, st.dist=0.03, location = "bottomright", st.bottom = FALSE, border.size=0.3, anchor=c(x=-70.55, y=42.48))+
  theme(legend.key.size = ,legend.position = c(0.2,0.8),   legend.background=element_rect(color="black",fill=alpha("white",0.4)),
        legend.title = element_text(size = 7), legend.text = element_text(size = 7))

ggsn::north2(mapOut, x = 0.69, y = 0.26, scale = 0.1, symbol = 11)