Understanding why species are found at some locations and not others is a central area of focus in ecology (MacKenzie et al. 2017). Humans have actually long been following and predicting (or at least trying to predict) the movement patterns and occurrence of species, since our survival has depended on it (e.g., food, tools, clothing, medicine, avoiding predators, parasites, etc.). Today, occupancy models give us a powerful and important tool for conservation, wildlife management, and policy and decision making. When we estimate species occupancy, we can predict and map the likelihood (or probability) of a species occurring at a given location and time based on things like land cover type or habitat. For example, if you estiamte the probability of occupancy of an endangered or inherently rare species throughout an area, you would data to support protecting areas that, if disturbed or developed, might have negative effects for species (e.g., an extinction event). Other terms that get used and are synonymous with occupancy are occurrence, or presence/absence.
We will work through an example analysis in R to estimate occupancy for the wood thrush (Hylocichla mustelina) in New Hampshire, USA. We will use eBird data (https://ebird.org/home) collected by citizen scientists (Sulivan et al. 2009), and will follow methods outlined as best practices for processing, sampling, and analyzing these data (Strimas-Mackey et al. 2020; https://cornelllabofornithology.github.io/ebird-best-practices/). Within groups, you will develop and test hypotheses relating habitat characteristics to the probability of detecting wood thrushes. To fit and compare multiple occupancy models, we will use the R package ‘unmarked’ (Fiske and Chandler 2011).By evaluating model performance we can assess the contributions of site-level and observation-level covariates on probability of occupancy. We will import raw eBird data from the past 10 years in NH, and combine these data with remotely-sensed data from NASA’s Moderate Resolution Imaging Spectroradiometer (MODIS; Friedl and Sulla-Menashe 2015) to extract land cover types including forest, urban, and cropland. Additionally, we’ll add surface elevation data (Amatulli et al. 2018). Finally, within your groups, you’ll work to develop and test hypotheses and modify the existing code to use models that can predict where wood thrushes are most likely to occur (and not occur) in NH.
After working through the analysis with your group, each group member will write a brief (i.e., <= 5-page) summary that should include the motivation and objectives of your study, statement of predictions or hypotheses being tested, results (ideally some tables and figures), map(s), and a short discussion about management implications or strategies that are informed by results.
It’s important at the beginning of each R session to get things organized such as setting the working directory and loading (or installing, if needed) any packages or functions. Below, you’ll want to set your own path for your working directory.
#clear environment
rm(list=ls())
#set working directory
my.directory<-"YOUR PATH GOES HERE"
# setwd(my.directory)
# install.packages("devtools")
# devtools::install_github("mstrimas/ebppackages")
#load packages
library(raster)
library(ggplot2)
library(ggmap)
library(ggsn)
library(viridisLite)
library(auk)
library(lubridate)
library(sf)
library(gridExtra)
library(tidyverse)
library(grid)
library(velox)
library(MODIS)
library(dggridR)
library(unmarked)
library(ebirdst)
library(MuMIn)
library(AICcmodavg)
library(fields)
library(purrr)
library(dplyr)
library(knitr)
library(pander)
# resolve namespace conflicts
select <- dplyr::select
filter <- dplyr::filter
#set global options
knitr::opts_chunk$set(echo = TRUE, message=FALSE, results='hide')
When we read in raw eBird data, there are several steps we have to take to keep only the parts of the data we’ll be using. We do this through a process of filtering or sub-setting the data, by constraining it to include only what we’ll need for this analysis. Since the eBird data base contains data collected in various ways, using different methods, and under different conditions, we have to be specific about which data we use, to ensure it meets the appropriate assumptions of the models we’ll be using. For example, in our case, we’ll be limiting our data that we’ll use to include only those from standing or travelling surveys up to 5-km (total distance), and surveys conducted between 01 June and 30 June each year from 2010 to 2019. And in order to estimate occupancy of a species, we need all the surveys where a species was detected and all the surveys where it was NOT detected.
# create a new folder: "Data" within your working directory where we'll store data and save files
data_dir <- "./Data"
if (!dir.exists(data_dir)) {
dir.create(data_dir)
}
#read in eBird data downloaded from eBird as a text (.txt) file
ebd <- auk_ebd("./Data/ebd_US-NH_relDec-2019/ebd_US-NH_relDec-2019.txt",
file_sampling = "./Data/ebd_US-NH_relDec-2019/ebd_US-NH_relDec-2019.txt")
#define an object, ebd_filters, which are a subset (or filter) of all the raw data, ebd:
# Species: Wood Thrush
# State: NH
# Dates: 01 June to 30 June
# Protocol Type: Stationary or Traveling
ebd_filters <- ebd %>%
auk_species("Wood Thrush") %>%
# southeastern coastal plain bcr
auk_state("US-NH") %>%
# june, use * to get data from any year
auk_date(date = c("*-06-01", "*-06-30")) %>%
# restrict to the standard traveling and stationary count protocols
auk_protocol(protocol = c("Stationary", "Traveling")) %>%
auk_complete()
ebd_filters
#make file paths for eBird files output from previous code to make ebd_filters
f_ebd <- file.path(data_dir, "ebd_woothr_june_NH.txt")
f_sampling <- file.path(data_dir, "ebd_checklists_june_NH.txt")
# only run if the files don't already exist
if (!file.exists(f_ebd)) {
auk_filter(ebd_filters, file = f_ebd, file_sampling = f_sampling)
}
#zero-filling the data (i.e., adding zeros where NAs are for each survey event where the species was NOT detected)
ebd_zf <- auk_zerofill(f_ebd, f_sampling, collapse = TRUE)
# function to convert time observation to hours since midnight
time_to_decimal <- function(x) {
x <- hms(x, quiet = TRUE)
hour(x) + minute(x) / 60 + second(x) / 3600
}
#fix variable names
names(ebd_zf)[names(ebd_zf)=="observation_count.x"]<-"observation_count"
names(ebd_zf)[names(ebd_zf)=="scientific_name.x"]<-"scientific_name"
#remove some columns
ebd_zf$scientific_name.y<-NULL
ebd_zf$observation_count.y<-NULL
# clean up variables
ebd_zf <- ebd_zf %>%
mutate(
# convert X to NA
observation_count = if_else(observation_count == "X",
NA_character_, observation_count),
observation_count = as.integer(observation_count),
# effort_distance_km to 0 for non-travelling counts
effort_distance_km = if_else(protocol_type != "Traveling",
0, effort_distance_km),
# convert time to decimal hours since midnight
time_observations_started = time_to_decimal(time_observations_started),
# split date into year and day of year
year = year(observation_date),
day_of_year = yday(observation_date)
)
# additional filtering: this will constrain data further to only include surveys 1.5 hrs or less, 5-km traveled or less, years greater than or equal to 2010, and surveys with fewer than or equal to 10.
ebd_zf_filtered <- ebd_zf %>%
filter(
# effort filters
duration_minutes <= 5 * 60,
effort_distance_km <= 5,
# last 10 years of data
year >= 2010,
# 10 or fewer observers
number_observers <= 10)
#final formmating and saving
ebird <- ebd_zf_filtered %>%
select(checklist_id, observer_id, sampling_event_identifier,
scientific_name,
observation_count, species_observed,
state_code, locality_id, latitude, longitude,
protocol_type, all_species_reported,
observation_date, year, day_of_year,
time_observations_started,
duration_minutes, effort_distance_km,
number_observers)
#Export the formatted eBird data as a new .csv file
write_csv(ebird, "./Data/ebd_woothr_june_NH_zf.csv", na = "")
#read in ebd_zf data
ebird<-read.csv("./Data/ebd_woothr_june_NH_zf.csv",header=TRUE)
#We'll want to set the map projection so every time we plot data (vector or raster), everything will line up correctly
map_proj <- "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
#get bounding box coords from eBird data
x.min=min(ebird$longitude, na.rm=TRUE)-0.4
x.max=max(ebird$longitude, na.rm=TRUE)+0.4
y.min=min(ebird$latitude, na.rm=TRUE)-0.3
y.max=max(ebird$latitude, na.rm=TRUE)+0.3
#compile bounding box corner coordinates
ebird.bbox = c(left = x.min, bottom = y.min, right = x.max, top = y.max)
#define map area and type
map.new <- get_stamenmap(bbox=ebird.bbox, maptype="toner-background", source="stamen", zoom=8)
stateMap <- raster::getData('GADM', country='USA', level=2)
nh.sub<-stateMap[stateMap@data$NAME_1 == "New Hampshire",]
nh.poly<-fortify(nh.sub, region="NAME_1")
nh.poly <- nh.poly[order(nh.poly$order),]
nh.poly$id<-as.factor(nh.poly$id)
#plot base map and vector data with ggplot
ebird$species_observed<-as.factor(as.character(ebird$species_observed))
ebird$species_observed<-factor(ebird$species_observed, levels=c("TRUE","FALSE"))
rawEbirdMap<-ggmap(map.new, darken=c(0.7,"white"))+
geom_map(data=nh.poly, map=nh.poly, aes(y=lat, x=long, map_id=id),color="black",fill="gray90",alpha=0.8, inherit.aes = FALSE)+
geom_point(data=ebird, aes(x=longitude, y=latitude, color=species_observed),alpha=0.6,shape=16,size=0.6)+
scale_color_manual(values=c("red","gray40"), name="Detected")+
#overlay layer of all locations with positive detections
#geom_point(data=subset(ebird, species_observed==TRUE), aes(x=longitude, y=latitude),color="royalblue",alpha=0.7,shape=16,size=1)+
theme(panel.border=element_rect(color="black",fill="transparent"))+
labs(x="Longitude",y="Latitude")+
guides(alpha="none")
#rawEbirdMap
mapOut<-rawEbirdMap+ggsn::scalebar(dist = 20, dist_unit="km", model = "WGS84", transform=TRUE, y.min = y.min, y.max = y.max, x.min =x.min, x.max = x.max, st.size = 2, st.dist=0.03, location = "bottomright", st.bottom = FALSE, border.size=0.3, anchor=c(x=-70.55, y=42.48))+
theme(legend.key.size = ,legend.position = c(0.2,0.8), legend.background=element_rect(color="black",fill=alpha("white",0.4)),
legend.title = element_text(size = 7), legend.text = element_text(size = 7))
ggsn::north2(mapOut, x = 0.69, y = 0.26, scale = 0.1, symbol = 11)
#Elevation data
elev <- raster("./Data/elevation_1KMmd_GMTEDmd.tif")
raster::plot(elev, col=viridis(100))
#now subset NH and plot
#crop clips the elevation raster to extent of polygon
nh_elev<-crop(elev, nh.sub)
raster::plot(nh_elev, col=viridis(100))
#mask clips elevation raster to include everything with the polygon's perimeter
nh_elev_crop<-mask(nh_elev, nh.sub)
raster::plot(nh_elev_crop, col=viridis(100))
#save cropped NH elev raster
writeRaster(nh_elev_crop, './Data/nh_elev_crop.tif', overwrite=TRUE)
To generate the data we’re loading in the is section, you’ll use the ‘MODIS’ package, which will require some preliminary steps in getting all the correct dependencies (e.g., GDAL, hdf4) installed and talking to each other correctly (by providing path information). For detailed info on downloading MODIS data from R, there is much information available on the web, but refer here first: https://cornelllabofornithology.github.io/ebird-best-practices/
# load the landcover data as a raster stack
landcover <- list.files("./Data/modis", "^modis_mcd12q1_umd",
full.names = TRUE) %>%
stack()
# label layers with year
landcover <- names(landcover) %>%
str_extract("(?<=modis_mcd12q1_umd_)[0-9]{4}") %>%
paste0("y", .) %>%
setNames(landcover, .)
#subset raster stack to get single raster layer, 9
landcover2019<-subset(landcover, 9, drop=TRUE)
#reproject raster
landcover.proj<-raster::projectRaster(landcover2019, crs=map_proj)
#crop to NH polygon
landcover.nh<-crop(landcover.proj, nh.sub)
landcover.nh<-mask(landcover.nh, nh.sub)
#convert raster to data.frame
landcover2019.df<-as.data.frame(landcover2019, xy = TRUE, na.rm = TRUE)
#make land cover lookup data.frame
lc.lookup<-data.frame(value=0:15,Class = c("Water","Evergreen_needleleaf", "Evergreen_broadleaf",
"Deciduous_needleleaf","Deciduous_broadleaf","Mixed_forest",
"Closed_shrubland", "Open_shrubland", "Woody_savanna","Savanna",
"Grassland", "Wetland", "Cropland", "Urban","Mosiac", "Barren"))
#convert raster to a data.frame
raster2.pts <- rasterToPoints(landcover.nh)
raster2.df <- data.frame(raster2.pts)
#reset colnames
colnames(raster2.df)[3]<-"value"
#round numbers
raster2.df$value<-round(raster2.df$value,0)
#merge with raster2.df
raster.merge<-merge(raster2.df, lc.lookup, by="value",all.x=TRUE)
#set colors for land cover classes
myColors<-c("khaki3","yellow","darkgreen","springgreen3","seagreen3","darkolivegreen4","tan","chartreuse3","orange","chocolate","red","royalblue","turquoise","purple")
#plot with ggplot
landCoverMap<-ggmap(map.new, darken=c(0.7,"white"))+
geom_tile(data=raster.merge, aes(x=x, y=y, fill=Class, color=Class))+
# geom_map(data=nh.poly, map=nh.poly, aes(y=lat, x=long, map_id=id),color="black",fill="gray90",alpha=0.8, inherit.aes = FALSE)+
scale_fill_manual(values=myColors, name="Land Cover Class")+
scale_color_manual(values=myColors)+
#overlay layer of all locations with positive detections
#geom_point(data=subset(ebird, species_observed==TRUE), aes(x=longitude, y=latitude),color="royalblue",alpha=0.7,shape=16,size=1)+
theme(panel.border=element_rect(color="black",fill="transparent"))+
labs(x="Longitude",y="Latitude")+
guides(alpha="none", color="none")
landCoverOut<-landCoverMap+ggsn::scalebar(dist = 20, dist_unit="km", model = "WGS84", transform=TRUE, y.min = y.min, y.max = y.max, x.min =x.min, x.max = x.max, st.size = 2, st.dist=0.03, location = "bottomright", st.bottom = FALSE, border.size=0.3, anchor=c(x=-70.55, y=42.48))
ggsn::north2(landCoverOut, x = 0.55, y = 0.26, scale = 0.1, symbol = 11)
# load the landcover data as a raster stack
landcover <- list.files("./Data/modis", "^modis_mcd12q1_umd",
full.names = TRUE) %>%
raster::stack()
# label layers with year
landcover <- names(landcover) %>%
str_extract("(?<=modis_mcd12q1_umd_)[0-9]{4}") %>%
paste0("y", .) %>%
setNames(landcover, .)
######################################################################
#get most recent year in rasters
max_lc_year <- names(landcover) %>%
str_extract("[0-9]{4}") %>%
as.integer() %>%
max()
#create a buffer radius of cells
neighborhood_radius <- 5 * ceiling(max(res(landcover))) / 2
#use buffer around survey locations to then extract raster cell values from
ebird_buff <- ebird %>%
distinct(year = format(as.Date(observation_date), "%Y"),
locality_id, latitude, longitude) %>%
# for 2018 use 2017 landcover data
mutate(year_lc = if_else(as.integer(year) > max_lc_year,
as.character(max_lc_year), year),
year_lc = paste0("y", year_lc)) %>%
# convert to spatial features
st_as_sf(coords = c("longitude", "latitude"), crs = 4326) %>%
# transform to modis projection
st_transform(crs = raster::projection(landcover)) %>%
# buffer to create neighborhood around each point
st_buffer(dist = neighborhood_radius) %>%
# nest by year
nest(data = c(year, locality_id, geometry))
####################################################################################
# function to extract landcover data for all checklists in a given year
calculate_pland <- function(yr, regions, lc) {
# create a lookup table to get locality_id from row number
locs <- st_set_geometry(regions, NULL) %>%
mutate(id = row_number())
# extract using velox
lc_vlx <- velox(lc[[yr]])
lc_vlx$extract(regions, df = TRUE) %>%
# velox doesn't properly name columns, fix that
set_names(c("id", "landcover")) %>%
# join to lookup table to get locality_id
inner_join(locs, ., by = "id") %>%
select(-id)
}
# iterate over all years extracting landcover for all checklists in each
lc_extract <- ebird_buff %>%
mutate(pland = map2(year_lc, data, calculate_pland, lc = landcover)) %>%
select(pland) %>%
unnest(cols = pland)
pland <- lc_extract %>%
# count landcovers
count(locality_id, year, landcover) %>%
# calculate proporiton
group_by(locality_id, year) %>%
mutate(pland = n / sum(n)) %>%
ungroup() %>%
select(-n) %>%
# remove NAs after tallying so pland is relative to total number of cells
filter(!is.na(landcover))
# convert names to be more descriptive
lc_names <- tibble(landcover = 0:15,
lc_name = c("pland_00_water",
"pland_01_evergreen_needleleaf",
"pland_02_evergreen_broadleaf",
"pland_03_deciduous_needleleaf",
"pland_04_deciduous_broadleaf",
"pland_05_mixed_forest",
"pland_06_closed_shrubland",
"pland_07_open_shrubland",
"pland_08_woody_savanna",
"pland_09_savanna",
"pland_10_grassland",
"pland_11_wetland",
"pland_12_cropland",
"pland_13_urban",
"pland_14_mosiac",
"pland_15_barren"))
pland <- pland %>%
inner_join(lc_names, by = "landcover") %>%
arrange(landcover) %>%
select(-landcover)
# tranform to wide format, filling in implicit missing values with 0s%>%
pland <- pland %>%
pivot_wider(names_from = lc_name,
values_from = pland,
values_fill = list(pland = 0))
# save
write_csv(pland, "./Data/modis_pland_location-year.csv")
#Now for elevation data
# buffer each checklist location
ebird_buff_noyear <- ebird %>%
distinct(locality_id, latitude, longitude) %>%
st_as_sf(coords = c("longitude", "latitude"), crs = 4326) %>%
st_transform(crs = projection(elev)) %>%
st_buffer(dist = neighborhood_radius)
# extract using velox and calculate median and sd
locs <- st_set_geometry(ebird_buff_noyear, NULL) %>%
mutate(id = row_number())
elev_checklists <- velox(elev)$extract(ebird_buff_noyear, df = TRUE) %>%
# velox doesn't properly name columns, fix that
set_names(c("id", "elevation")) %>%
# join to lookup table to get locality_id
inner_join(locs, ., by = "id") %>%
# summarize
group_by(locality_id) %>%
summarize(elevation_median = median(elevation, na.rm = TRUE),
elevation_sd = sd(elevation, na.rm = TRUE))
# extract using velox and calculate median and sd
elev_pred <- velox(elev)$extract(r_cells, df = TRUE) %>%
# velox doesn't properly name columns, fix that
set_names(c("id", "elevation")) %>%
# summarize
group_by(id) %>%
summarize(elevation_median = median(elevation, na.rm = TRUE),
elevation_sd = sd(elevation, na.rm = TRUE))
# checklist covariates
pland_elev_checklist <- inner_join(pland, elev_checklists, by = "locality_id")
write_csv(pland_elev_checklist, "data/pland-elev_location-year.csv")
# prediction surface covariates
pland_elev_pred <- inner_join(pland_coords, elev_pred, by = "id")
write_csv(pland_elev_pred, "./Data/pland-elev_prediction-surface.csv")
# set random number seed to insure fully repeatable results
set.seed(1)
# setup output directory for saved results
if (!dir.exists("output")) {
dir.create("output")
}
# ebird data
ebird <- read_csv("./Data/ebd_woothr_june_NH_zf.csv") %>%
mutate(year = year(observation_date),
# occupancy modeling requires an integer response
species_observed = as.integer(species_observed))
# modis land cover covariates
habitat <- read_csv("./Data/pland-elev_location-year.csv") %>%
mutate(year = as.integer(year))
# combine ebird and modis data
ebird_habitat <- inner_join(ebird, habitat, by = c("locality_id", "year"))
# prediction surface
pred_surface <- read_csv("./Data/pland-elev_prediction-surface.csv")
# latest year of landcover data
max_lc_year <- pred_surface$year[1]
r <- raster("./Data/prediction-surface.tif")
# load gis data for making maps
map_proj <- "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
ne_land <- read_sf("./Data/GIS_data/gis-data.gpkg", "ne_land") %>%
st_transform(crs = map_proj) %>%
st_geometry()
bcr <- read_sf("./Data/GIS_data/gis-data.gpkg", "bcr") %>%
st_transform(crs = map_proj) %>%
st_geometry()
ne_country_lines <- read_sf("./Data/GIS_data/gis-data.gpkg", "ne_country_lines") %>%
st_transform(crs = map_proj) %>%
st_geometry()
ne_state_lines <- read_sf("./Data/GIS_data/gis-data.gpkg", "ne_state_lines") %>%
st_transform(crs = map_proj) %>%
st_geometry()
#################################################
#prepare data (filtering)
# filter prior to creating occupancy model data
ebird_filtered <- filter(ebird_habitat,
number_observers <= 5,
year == max(year))
#format data
occ <- filter_repeat_visits(ebird_filtered,
min_obs = 2, max_obs = 10,
annual_closure = TRUE,
date_var = "observation_date",
site_vars = c("locality_id", "observer_id"))
# entire data set
nrow(ebird_habitat)
#> [1] 134122
# reduced data set
nrow(occ)
#> [1] 11179
# number of individual sites
n_distinct(occ$site)
#1343
# format for unmarked
occ_wide <- format_unmarked_occu(occ,
site_id = "site",
response = "species_observed",
site_covs = c("n_observations",
"latitude", "longitude",
"pland_04_deciduous_broadleaf",
"pland_05_mixed_forest",
"pland_12_cropland",
"pland_13_urban",
"elevation_median"),
obs_covs = c("time_observations_started",
"duration_minutes",
"effort_distance_km",
"number_observers",
"protocol_type",
"pland_04_deciduous_broadleaf",
"pland_05_mixed_forest",
"pland_12_cropland",
"pland_13_urban",
"elevation_median"))
#Spatial sub sampling
# generate hexagonal grid with ~ 5 km betweeen cells
dggs <- dgconstruct(spacing = 5)
# get hexagonal cell id for each site
occ_wide_cell <- occ_wide %>%
mutate(cell = dgGEO_to_SEQNUM(dggs, longitude, latitude)$seqnum)
# sample one site per grid cell
occ_ss <- occ_wide_cell %>%
group_by(cell) %>%
sample_n(size = 1) %>%
ungroup() %>%
select(-cell)
# calculate the percent decrease in the number of sites
#1 - nrow(occ_ss) / nrow(occ_wide)
###################################################################
# generate polygons for the grid cells
hexagons <- dgcellstogrid(dggs, unique(occ_wide_cell$cell), frame = FALSE) %>%
st_as_sf()
occ_wide$Points<-paste("All (n = ",length(unique(occ_wide$site)), ")",sep="")
occ_ss$Points<-paste("Sampled (n = ", length(unique(occ_ss$site)), ")", sep="")
occ_comb<-rbind(occ_wide, occ_ss)
occ_comb$Points<-as.factor(occ_comb$Points)
levels(occ_comb$Points)
plotColors<-c("gray20","red")
subSamplePlot<-ggmap(map.new, darken=c(0.7,"white"))+
geom_sf(data = hexagons,inherit.aes = FALSE)+
geom_point(data=occ_comb, aes(x=longitude,y=latitude,color=Points),size=0.8,alpha=0.6, shape=16, inherit.aes = FALSE)+
theme(panel.border=element_rect(color="black",fill="transparent"), panel.background = element_rect(fill="white"))+
scale_color_manual(values=plotColors)+
labs(x="Longitude",y="Latitude")+
guides(alpha="none")
subSampleFacet<-subSamplePlot+facet_wrap(.~Points)
subSampleFacet
#create unmarkedFrame object
occ_um <- formatWide(occ_ss, type = "unmarkedFrameOccu")
# fit model occu(~detection process ~occupancy process)
occ_model <- occu(~ time_observations_started + duration_minutes + effort_distance_km + number_observers + protocol_type + pland_04_deciduous_broadleaf + pland_05_mixed_forest + pland_12_cropland + pland_13_urban + elevation_median
~ pland_04_deciduous_broadleaf + pland_05_mixed_forest +
pland_12_cropland + pland_13_urban+elevation_median,
data = occ_um)
# look at the regression coefficients from the model
pander(summary(occ_model))
Call: occu(formula = ~time_observations_started + duration_minutes + effort_distance_km + number_observers + protocol_type + pland_04_deciduous_broadleaf + pland_05_mixed_forest + pland_12_cropland + pland_13_urban + elevation_median ~ pland_04_deciduous_broadleaf + pland_05_mixed_forest + pland_12_cropland + pland_13_urban + elevation_median, data = occ_um)
Occupancy (logit-scale): Estimate SE z P(>|z|) (Intercept) -2.29259 0.61687 -3.7165 0.000202 pland_04_deciduous_broadleaf 1.59035 0.91097 1.7458 0.080849 pland_05_mixed_forest 1.85037 0.87616 2.1119 0.034695 pland_12_cropland 1.96672 45.55465 0.0432 0.965564 pland_13_urban -0.22543 2.27904 -0.0989 0.921205 elevation_median -0.00402 0.00129 -3.1089 0.001878
Detection (logit-scale): Estimate SE z P(>|z|) (Intercept) 2.29085 1.21631 1.883 5.96e-02 time_observations_started -0.04547 0.04869 -0.934 3.50e-01 duration_minutes 0.01849 0.00735 2.517 1.19e-02 effort_distance_km -0.30976 0.36728 -0.843 3.99e-01 number_observers -0.17555 0.64105 -0.274 7.84e-01 protocol_typeTraveling 1.95479 0.84342 2.318 2.05e-02 pland_04_deciduous_broadleaf -0.48258 1.30947 -0.369 7.12e-01 pland_05_mixed_forest -0.24232 1.21475 -0.199 8.42e-01 pland_12_cropland -91.25579 22.22884 -4.105 4.04e-05 pland_13_urban -9.26570 2.77085 -3.344 8.26e-04 elevation_median -0.00294 0.00249 -1.183 2.37e-01
AIC: 455.9301 Number of sites: 380 optim convergence code: 0 optim iterations: 100 Bootstrap iterations: 0
state:
Estimate | SE | z | P(>|z|) | |
---|---|---|---|---|
(Intercept) | -2.293 | 0.6169 | -3.716 | 0.000202 |
pland_04_deciduous_broadleaf | 1.59 | 0.911 | 1.746 | 0.08085 |
pland_05_mixed_forest | 1.85 | 0.8762 | 2.112 | 0.03469 |
pland_12_cropland | 1.967 | 45.55 | 0.04317 | 0.9656 |
pland_13_urban | -0.2254 | 2.279 | -0.09892 | 0.9212 |
elevation_median | -0.004019 | 0.001293 | -3.109 | 0.001878 |
det:
Estimate | SE | z | P(>|z|) | |
---|---|---|---|---|
(Intercept) | 2.291 | 1.216 | 1.883 | 0.05964 |
time_observations_started | -0.04547 | 0.04869 | -0.9339 | 0.3503 |
duration_minutes | 0.01849 | 0.007348 | 2.517 | 0.01185 |
effort_distance_km | -0.3098 | 0.3673 | -0.8434 | 0.399 |
number_observers | -0.1756 | 0.6411 | -0.2738 | 0.7842 |
protocol_typeTraveling | 1.955 | 0.8434 | 2.318 | 0.02047 |
pland_04_deciduous_broadleaf | -0.4826 | 1.309 | -0.3685 | 0.7125 |
pland_05_mixed_forest | -0.2423 | 1.215 | -0.1995 | 0.8419 |
pland_12_cropland | -91.26 | 22.23 | -4.105 | 4.038e-05 |
pland_13_urban | -9.266 | 2.771 | -3.344 | 0.0008258 |
elevation_median | -0.002941 | 0.002485 | -1.183 | 0.2367 |
occ_gof <- mb.gof.test(occ_model, nsim = 10, plot.hist = FALSE)
# hide the chisq table to give simpler output
occ_gof$chisq.table <- NULL
print(occ_gof)
MacKenzie and Bailey goodness-of-fit for single-season occupancy model
Chi-square statistic = 1630.075 Number of bootstrap samples = 10 P-value = 0.7
Quantiles of bootstrapped statistics: 0% 25% 50% 75% 100% 1355 1700 2065 2280 5477
Estimate of c-hat = 0.67
# get list of all possible terms, then subset to those we want to keep
det_terms <- getAllTerms(occ_model) %>%
# retain the detection submodel covariates
discard(str_detect, pattern = "psi")
# dredge all possibe combinations of the occupancy covariates
occ_dredge <- dredge(occ_model, fixed = det_terms)
# model comparison
mc <- select(occ_dredge, contains("psi("), df, AICc, delta, weight)
mc<-mc[,-1]
names(mc)<-gsub("psi\\(","", gsub("\\)","",names(mc)))
mc %>%
mutate_all(~ round(., 3)) %>%
pander()
elevation_median | pland_04_deciduous_broadleaf | pland_05_mixed_forest |
---|---|---|
-0.004 | 1.647 | 1.9 |
-0.003 | NA | 1.099 |
-0.002 | NA | NA |
-0.004 | 1.621 | 1.885 |
-0.004 | 1.627 | 1.915 |
NA | NA | NA |
-0.003 | NA | NA |
-0.003 | NA | 0.939 |
NA | NA | 0.687 |
-0.003 | 0.563 | NA |
-0.003 | NA | 1.094 |
-0.004 | 1.59 | 1.85 |
NA | NA | NA |
-0.003 | 0.424 | NA |
NA | -0.056 | NA |
-0.002 | NA | NA |
NA | NA | NA |
-0.003 | NA | NA |
-0.003 | NA | 0.975 |
NA | 0.275 | 0.812 |
NA | NA | 0.7 |
NA | NA | 0.687 |
-0.003 | 0.562 | NA |
NA | -0.134 | NA |
NA | NA | NA |
-0.003 | 0.426 | NA |
NA | -0.07 | NA |
NA | 0.411 | 0.947 |
NA | 0.29 | 0.818 |
NA | NA | 0.697 |
NA | -0.119 | NA |
NA | 0.388 | 0.929 |
pland_12_cropland | pland_13_urban | df | AICc | delta | weight |
---|---|---|---|---|---|
NA | NA | 15 | 453.4 | 0 | 0.189 |
NA | NA | 14 | 454.6 | 1.204 | 0.104 |
NA | NA | 13 | 455.1 | 1.719 | 0.08 |
0.319 | NA | 16 | 455.6 | 2.161 | 0.064 |
NA | -0.089 | 16 | 455.6 | 2.244 | 0.062 |
NA | NA | 12 | 455.8 | 2.376 | 0.058 |
NA | -2.567 | 14 | 455.9 | 2.481 | 0.055 |
NA | -1.626 | 15 | 456.4 | 2.96 | 0.043 |
NA | NA | 13 | 456.6 | 3.221 | 0.038 |
NA | NA | 14 | 456.9 | 3.514 | 0.033 |
-4.226 | NA | 15 | 457.1 | 3.673 | 0.03 |
1.967 | -0.225 | 17 | 457.6 | 4.229 | 0.023 |
NA | -0.674 | 13 | 457.8 | 4.394 | 0.021 |
NA | -2.267 | 15 | 457.8 | 4.434 | 0.021 |
NA | NA | 13 | 457.9 | 4.506 | 0.02 |
-9.323 | NA | 14 | 458.1 | 4.737 | 0.018 |
-4.391 | NA | 13 | 458.3 | 4.884 | 0.016 |
-0.764 | -2.519 | 15 | 458.3 | 4.917 | 0.016 |
1.004 | -1.668 | 16 | 458.4 | 4.99 | 0.016 |
NA | NA | 14 | 458.6 | 5.214 | 0.014 |
NA | 0.085 | 14 | 458.8 | 5.371 | 0.013 |
-0.786 | NA | 14 | 458.8 | 5.435 | 0.013 |
-8.499 | NA | 15 | 459.6 | 6.216 | 0.008 |
NA | -0.873 | 14 | 459.9 | 6.529 | 0.007 |
-0.394 | -0.665 | 14 | 460 | 6.586 | 0.007 |
-0.994 | -2.235 | 16 | 460 | 6.609 | 0.007 |
-3.96 | NA | 14 | 460.4 | 6.974 | 0.006 |
NA | 0.717 | 15 | 460.7 | 7.306 | 0.005 |
0.78 | NA | 15 | 460.7 | 7.321 | 0.005 |
1.394 | 0.058 | 15 | 460.8 | 7.434 | 0.005 |
0.358 | -0.761 | 15 | 462 | 8.656 | 0.003 |
1.892 | 0.589 | 16 | 462.7 | 9.336 | 0.002 |
detformulaList<-c(
~duration_minutes,
~effort_distance_km,
~number_observers,
~time_observations_started,
~pland_04_deciduous_broadleaf,
~pland_05_mixed_forest,
~pland_12_cropland,
~pland_13_urban,
~elevation_median
)
#get obsCovs as data.frame
obsCovs.df<-as.data.frame(occ_um@obsCovs)
#now fit models
#create empty list to save predicted detection Probs
DetCovariatePred<-list()
#for loop to iterate over detection covariates and produce predicted estimates for detection prob
for(i in 1:length(detformulaList)){
print(detformulaList[[i]])
new.occ_model <- occu(detformulaList[[i]]~ 1, data=occ_um)
#get det covariate name
varName<-gsub("~","",as.character(detformulaList[[i]]))[2]
#get range of values from data to simulate new data
varData<-obsCovs.df[,names(obsCovs.df)==varName]
#simulate new data
sim.data<-data.frame(seq(min(varData,na.rm=TRUE), max(varData,na.rm=TRUE),length.out=100))
colnames(sim.data)<-c(varName)
#get predicted values using sim.data
pred.df<-predict(new.occ_model, type="det", newdata=sim.data, appendData=TRUE)
colnames(pred.df)[5]<-"det_cov"
#add column with covariate name
pred.df$CovariateName<-varName
DetCovariatePred<-rbind(DetCovariatePred, pred.df)
}
duration_minutes effort_distance_km number_observers time_observations_started pland_04_deciduous_broadleaf pland_05_mixed_forest pland_12_cropland pland_13_urban ~elevation_median
#make CovariateName a factor
DetCovariatePred$CovariateName<-as.factor(DetCovariatePred$CovariateName)
# Plot it
detPlot<-ggplot(data=DetCovariatePred)+
geom_ribbon(aes(x=det_cov, ymin=lower, ymax=upper), fill="gray80")+
geom_line(aes(x=det_cov, y=Predicted),color="royalblue3")+
labs(x="Covariate Values", y="Detection Probability")+
#ylim(0,1)+
theme(panel.border=element_rect(color="black",fill="transparent"), panel.background = element_rect(fill="white"))
detPlotFacet<-detPlot+facet_wrap(.~CovariateName, scales="free",ncol=3)
detPlotFacet
occformulaList<-c(
~1~pland_04_deciduous_broadleaf,
~1~pland_05_mixed_forest,
~1~pland_12_cropland,
~1~pland_13_urban,
~1~elevation_median
)
#get obsCovs as data.frame
siteCovs.df<-as.data.frame(occ_um@siteCovs)
#now fit models
#create empty list to save predicted detection Probs
occCovariatePredOcc<-list()
#for loop to iterate over detection covariates and produce predicted estimates for detection prob
for(i in 1:length(occformulaList)){
print(occformulaList[[i]])
new.occ_model <- occu(occformulaList[[i]], data=occ_um)
#get det covariate name
varName<-gsub("~","",as.character(occformulaList[[i]]))[3]
#get range of values from data to simulate new data
varData<-siteCovs.df[,names(siteCovs.df)==varName]
#simulate new data
sim.data<-data.frame(seq(min(varData,na.rm=TRUE), max(varData,na.rm=TRUE),length.out=100))
colnames(sim.data)<-c(varName)
#get predicted values using sim.data
pred.df<-predict(new.occ_model, type="state", newdata=sim.data, appendData=TRUE)
colnames(pred.df)[5]<-"occ_cov"
#add column with covariate name
pred.df$CovariateName<-varName
occCovariatePredOcc<-rbind(occCovariatePredOcc, pred.df)
}
~1 ~ pland_04_deciduous_broadleaf ~1 ~ pland_05_mixed_forest ~1 ~ pland_12_cropland ~1 ~ pland_13_urban ~1 ~ elevation_median
#make CovariateName a factor
occCovariatePredOcc$CovariateName<-as.factor(occCovariatePredOcc$CovariateName)
# Plot it
occPlot<-ggplot(data=occCovariatePredOcc)+
geom_ribbon(aes(x=occ_cov, ymin=lower, ymax=upper), fill="gray80")+
geom_line(aes(x=occ_cov, y=Predicted),color="seagreen4")+
labs(x="Covariate Values", y="Probability of Site Occupancy")+
#ylim(0,1)+
theme(panel.border=element_rect(color="black",fill="transparent"), panel.background = element_rect(fill="white"))
occPlotFacet<-occPlot+facet_wrap(.~CovariateName, scales="free",ncol=3)
occPlotFacet
# select models with the most suport for model averaging
occ_dredge_95 <- get.models(occ_dredge, subset = cumsum(weight) <= 0.95)
# average models based on model weights
occ_avg <- model.avg(occ_dredge_95, fit = TRUE)
# model coefficients
t(occ_avg$coefficients) %>%
knitr::kable()
#now use model to make predictions
occ_pred <- predict(occ_avg,
newdata = as.data.frame(pred_surface),
type = "state")
# add to prediction surface
pred_occ <- bind_cols(pred_surface, occ_prob = occ_pred$fit,occ_se = occ_pred$se.fit) %>%
select(latitude, longitude, occ_prob, occ_se)
r_pred <- pred_occ %>%
# convert to spatial features
st_as_sf(coords = c("longitude", "latitude"), crs = map_proj) %>%
st_transform(crs = raster::projection(r)) %>%
# rasterize
rasterize(r)
r_pred <- r_pred[[c("occ_prob", "occ_se")]]
# save the raster
tif_dir <- "output"
if (!dir.exists(tif_dir)) {
dir.create(tif_dir)
}
writeRaster(r_pred[["occ_prob"]],
filename = file.path(tif_dir, "occupancy-model_prob_woothr.tif"),
overwrite = TRUE)
writeRaster(r_pred[["occ_se"]],
filename = file.path(tif_dir, "occupancy-model_se_woothr.tif"),
overwrite = TRUE)
#now map results
# project predictions
r_pred_proj <- projectRaster(r_pred, crs = map_proj, method = "ngb")
stateMap <- raster::getData('GADM', country='USA', level=2)
nh.sub<-stateMap[stateMap@data$NAME_1 == "New Hampshire",]
###################################################
#plot predicted occupancy in NH for each model
r_pred_proj_crop<-crop(r_pred_proj, nh.sub)
r_pred_proj_NH<-mask(r_pred_proj_crop,nh.sub)
r_pred_occu_NH<-subset(r_pred_proj_NH, 1, drop=TRUE)
# Convert the landscape data RasterLayer objects to data frames for ggplot
r_pred_occu_NH.df <- as.data.frame(r_pred_occu_NH, xy = TRUE, na.rm = TRUE)
#now add SE
r_pred_occuSE_NH<-subset(r_pred_proj_NH, 2, drop=TRUE)
# Convert the landscape data RasterLayer objects to data frames for ggplot
r_pred_occuSE_NH.df <- as.data.frame(r_pred_occuSE_NH, xy = TRUE, na.rm = TRUE)
#add Statistic Colum and standardize column names
r_pred_occu_NH.df$Statistic<-"Mean p(Occupancy)"
colnames(r_pred_occu_NH.df)[3]<-"Probability"
r_pred_occuSE_NH.df$Statistic<-"SE"
colnames(r_pred_occuSE_NH.df)[3]<-"Probability"
#combine data.frames
r_pred_comb<-rbind(r_pred_occu_NH.df, r_pred_occuSE_NH.df)
#make Statistic a factor
r_pred_comb$Statistic<-as.factor(r_pred_comb$Statistic)
NHOccuMap<-ggmap(map.new, darken=c(0.7,"white"))+
geom_tile(data=r_pred_comb, aes(x=x, y=y, fill=Probability, color=Probability))+
geom_map(data=nh.poly, map=nh.poly, aes(y=lat, x=long, map_id=id),color="black",fill="transparent",alpha=0.8, inherit.aes = FALSE)+
scale_fill_viridis_c(name="p(occupancy)")+
scale_color_viridis_c(name="p(occupancy)")+
theme(panel.border=element_rect(color="black",fill="transparent"))+
labs(x="Longitude",y="Latitude")+
guides(alpha="none", color="none")
NHOccuMap.facet<-NHOccuMap+facet_grid(~Statistic)
levels(r_pred_comb$Statistic)
[1] “Mean p(Occupancy)” “SE”
NHOccuMapOut<-NHOccuMap.facet+ggsn::scalebar(dist = 40, dist_unit="km", model = "WGS84", transform=TRUE, y.min = y.min, y.max = y.max, x.min =x.min, x.max = x.max, st.size = 2, st.dist=0.03, location = "bottomright", st.bottom = FALSE, border.size=0.3, anchor=c(x=-70.55, y=42.48))
# theme(legend.key.size = ,legend.position = c(0.22,0.8), legend.background=element_rect(color="black",fill=alpha("white",0.6)),
# legend.title = element_text(size = 5), legend.text = element_text(size = 5))
ggsn::north2(NHOccuMapOut, x = 0.75, y = 0.26, scale = 0.1, symbol = 11)
Amatulli, G., Domisch, S., Tuanmu, M.-N., Parmentier, B., Ranipeta, A., Malczyk, J., and Jetz, W. (2018) A suite of global, cross-scale topographic variables for environmental and biodiversity modeling. Scientific Data volume 5, Article number: 180040. DOI: doi:10.1038/sdata.2018.40.
Fiske, I. and Chandler, R., 2011. Unmarked: an R package for fitting hierarchical models of wildlife occurrence and abundance. Journal of statistical software, 43(10), pp.1-23.
Friedl, M., D. Sulla-Menashe. MCD12C1 MODIS/Terra+Aqua Land Cover Type Yearly L3 Global 0.05Deg CMG V006. 2015, distributed by NASA EOSDIS Land Processes DAAC, https://doi.org/10.5067/MODIS/MCD12C1.006. Accessed 2020-02-14.
MacKenzie, D.I., Nichols, J.D., Royle, J.A., Pollock, K.H., Bailey, L. and Hines, J.E., 2017. Occupancy estimation and modeling: inferring patterns and dynamics of species occurrence. Elsevier.
Strimas-Mackey, M., W.M. Hochachka, V. Ruiz-Gutierrez, O.J. Robinson, E.T. Miller, T. Auer, S. Kelling, D. Fink, A. Johnston. 2020. Best Practices for Using eBird Data. Version 1.0. https://cornelllabofornithology.github.io/ebird-best-practices/. Cornell Lab of Ornithology, Ithaca, New York. https://doi.org/10.5281/zenodo.3620739