ENVS 6611 Final Project Proposal

Author

Nissim Lebovits

Published

November 9, 2022

Summary

Using data from eBird, NOAA, and other relevant sources, this project will attempt to evaluate the possible impact of sea level rise on the spatial distribution of avifauna in the Philadelphia region.

Background

Study Area

Philadelphia sits at the boundary between the Atlantic Coastal Plain and Piedmont ecoregions. The area was historically covered with hardwood forests, but much of its natural vegetation has been removed for urbanization and cultivation.1 Built as a port along the Delaware River, the city was constructed on top of historic wetlands and mudflats. Due to infill and development, 95% of the region’s freshwater tidal wetlands have been lost. As a result of Philadelphia’s location and development patterns, SLR projections estimate that much of the area adjacent to the Delaware will be under water within the next century.2

The city of Philadelphia is also a major hotspot for bird biodiversity in North America. A key migratory stopover location on the Atlantic Flyway, Philadelphia sees more than 200 species of migratory birds pass through it each year.3 The city hosts a number of regional bird hotspots, especially Pennypack on the Delaware (266 species), John Heinz National Wildlife Reserve (281 species), and FDR Park (313 species).4 Given their proximity to the Schuylkill and Delaware rivers, all three of these sites are likely to be significantly impacted by SLR. From a management perspective, it is therefore important to understand the possible impact of SLR on Philadelphia’s avian biodiversity.

Bird Biodiversity Hotspots in the Philadelphia Region

Birds as Indicator Species

Birds are a popular subject for biodiversity monitoring because “compared to other vertebrates, birds are easily monitored by skilled observers and provide a mechanism to explore urban effects and responses to different urban designs.”5

“Bird populations are sensitive to many types of environmental change and, because they occupy a position at or near the top of the food chain, give indications of overall ecosystem functioning.”6

Existing Research

Research is lacking on the impacts of SLR on urban biodiversity in North America. While there is some literature on the impacts of SLR on tropical island biodiversity or on global extinction, there has not been much work done specifically on the impact of SLR on avifauna distributions, least of all in North American coastal cities.7 Generally speaking, however, a few general trends have been found. First, research suggests that species with limited ranges will be most impacted.8. In particular, species reliant on rare and/or vulnerable coastal habitats will be most threatened.9

Freshwater wetlands and intertidal habitats face especially high risk. Threats include increased levels of inundation and storm flooding; accelerated coastal erosion; sea water intrusion into freshwater tributaries; changes to the tidal prism, tidal range, sediment supply and rates of accretion; changes in air temperature and rainfall affecting the growth of salt marsh plants with secondary effects on sedimentation (Adam, 2002; Kennish, 2002; Moore, 1999; Nicholls et al., 1999; Reed, 1990).10 The effects of these inputs are unpredictable. For example, depending on a variety of factors, marshes could either contract or expand.11

Sea level rise will cause shifts in fl ooding potential on the urban coastal wetlands and beach zones, which will alter the habitat quality of these locations at rates signifi cantly above natural baseline conditions. The amount of sea level rise could have potential large-scale impacts on the areal extent and ecosystem health of the urban coastal wetlands, including permanent inundation, accelerated inland wetland migration (if the wetlands are not blocked by bulkheads or similar structures), and shifts in salinity gradients. (Urbanization, Biodiversity, and Ecosystem Services: Challenges and Opportunities*, Elmqvist et al., 2013, 497)

Project Goals

This report will combine NOAA sea level rise projections with eBird citizen science data on bird species presence to evaluate how Philadelphia’s endemic birds might be affected by future sea level rise. It will use a presence-only maximum entropy model, break species down by preferred habitat, and focus on endemic species listed by the IUCN as species of concern or higher.

Data

This project will draw on 3 key data sources:

  1. Bird observations recorded by eBird, a project of the Cornell Lab of Ornithology. CLO has published a guide to using eBird’s citizen science data for species distribution modeling, which I will be using extensively.
  2. The National Oceanic and Atmospheric Administration’s data on current sea levels and projected sea level rise.
  3. Various predictive factors assembled based on background reading and availability, such as habitat patch size, connectivity, degree of urbanization, road density, tree canopy cover, impervious surface cover, population density, land use, distance to fresh water, distance to salt water, etc.12 These layers will be assembled from ESRI, PASDA, and OpenDataPhilly, among other possible sources.

Challenges and Limitations

Because of the time frame of this assignment and my own limitations, this report will focus exclusively on sea level rise. Crucially, however, this is not an accurate representation of ecosystem function in the real world. It is impossible to divorce the impact of sea level rise alone on biodiversity from other confounding factors like coastal modification, saltwater intrusion, increased rainfall, and water pollution, let alone less proximate factors like changes in human population density and land use or the arrival of invasive species. Furthermore, as Pilkey and Pilkey point out, even if the scale of such impacts could be measured, it is difficult to account for ordering complexity: the unpredictable differences in impact depending on the order in which events occur.13 As Jimenez-Valverde et al., explain, prediction via species distribution modeling only works “if correlation structures are stable and consistent across different landscapes and time periods”.14 As a result, this project is more of an intellectual exercise than a genuine management tool. I will follow Pilkey and Pilkey, furthermore, in making only qualitative predictions based on my model (e.g., “the range of Species X will likely decrease”, or “species richness at FDR Park will likely decline”) rather than precise quantitative projections for species numbers or ranges.

Data Exploration and Cleaning

Sourcing Data

R Setup

Show the code
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.0      ✔ stringr 1.4.1 
✔ readr   2.1.2      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
Show the code
library(sf)
Linking to GEOS 3.9.1, GDAL 3.4.3, PROJ 7.2.1; sf_use_s2() is TRUE
Show the code
library(janitor)

Attaching package: 'janitor'

The following objects are masked from 'package:stats':

    chisq.test, fisher.test
Show the code
library(lubridate)

Attaching package: 'lubridate'

The following objects are masked from 'package:base':

    date, intersect, setdiff, union
Show the code
library(tmap)
library(mapview)
library(janitor)
library(plotly)
Warning: package 'plotly' was built under R version 4.2.2

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout
Show the code
library(ggthemr)

ggthemr("pale") #set global ggplot theme

Importing eBird and IUCN Data

The primary dataset for this project comes from Cornell Ornithology Lab’s eBird. To avoid downloading the entire 200 GB global eBird dataset, I have manually downloaded a subset of the eBird Basic Dataset prepared specifically for Philadelphia, Pennsylvania beginning in 2011. Instructions for how to manually download these data rather than using eBird’s auk package are available here.

Show the code
ebird = read.table("C:/Users/Nissim/Desktop/Fall 2022/Floodplain Management/Final Project Data/eBird/ebd_US-PA-101_relSep-2022.txt",
           sep = "\t", 
           header = TRUE,
           fill = TRUE) |>
           clean_names()

Cleaning Data

[Include explanation of how I cleaned these data.]

  1. Explain where eBird data come from and what the limitations are
  2. Explain that we are following dest practices based on the eBird book
  3. Explain what that entails:

“Based on our experience working with these data, we suggest restricting checklists to less than 5 hours long and 5 km in length, and with 10 or fewer observers. Furthermore, we’ll only consider data from the past 10 years (2010-2019).”

“The chance of an observer detecting a bird when present can be highly dependent on time of day. For example, many species exhibit a peak in detection early in the morning during dawn chorus and a secondary peak early in the evening. With this in mind, the first predictor of detection that we’ll explore is the time of day at which a checklist was started. We’ll summarize the data in 1 hour intervals, then plot them. Since estimates of detection frequency are unreliable when only a small number of checklists are available, we’ll only plot hours for which at least 100 checklists are present.”

“In later chapters, we’ll make predictions at the peak time of day for detecatibility to limit the effect of this variation. The majority of checklist submissions also occurs in the morning; however, there are reasonable numbers of checklists between 5am and 9pm. It’s in this region that our model estimates will be most reliable.”

From IUCN: “The IUCN Red List Categories and Criteria are intended to be an easily and widely understood system for classifying species at high risk of global extinction. It divides species into nine categories: Not Evaluated, Data Deficient, Least Concern, Near Threatened, Vulnerable, Endangered, Critically Endangered, Extinct in the Wild and Extinct.” https://www.iucnredlist.org/

Show the code
ebird_i = ebird |>
            dplyr::select(taxonomic_order,
                          category,
                          taxon_concept_id,
                          common_name,
                          scientific_name,
                          exotic_code,
                          observation_count,
                          breeding_code,
                          breeding_category,
                          county_code,
                          locality,
                          locality_id,
                          latitude,
                          longitude,
                          observation_date,
                          time_observations_started,
                          observer_id,
                          approved,
                          effort_distance_km,
                          number_observers,
                          protocol_type,
                          duration_minutes,
                          sampling_event_identifier) |>
            dplyr::filter(approved == "1") # make sure R doesn't confuse dplyr::filter with stats::filter

# function to convert time observation to hours since midnight
# pulled directly from https://cornelllabofornithology.github.io/ebird-best-practices/ebird.html#ebird-zf
time_to_decimal <- function(x) {
  x <- hms(x, quiet = TRUE)
  hour(x) + minute(x) / 60 + second(x) / 3600
}

ebird_i$observation_date = as_date(ebird_i$observation_date)
Warning: 3 failed to parse.
Show the code
ebird_i = ebird_i |>
            mutate(
              # convert X to NA
              observation_count = if_else(observation_count == "X", 
                                          NA_character_, observation_count),
              observation_count = as.integer(observation_count),
              # effort_distance_km to 0 for non-travelling counts
              effort_distance_km = if_else(protocol_type != "Traveling", 
                                           "0", effort_distance_km),
              # convert time to decimal hours since midnight
              time_observations_started = time_to_decimal(time_observations_started),
              # split date into year and day of year
              year = year(observation_date),
              day_of_year = yday(observation_date)
            ) |>
          filter(
            # effort filters
            duration_minutes <= 5 * 60,
            effort_distance_km <= 5,
            # last 10 years of data
            year >= 2010,
            # 10 or fewer observers
            number_observers <= 10)
Warning in mask$eval_all_mutate(quo): NAs introduced by coercion
Show the code
ebird_last_decade = ebird_i |>
                      filter(year >= 2012 & year <= 2021)

Exploring Data

Now that we’ve cleaned up and supplemented our eBird data, we’ll explore it.

Observations by Year

First, we need to look at our raw number of observations year by year. This element is crucial because of the influence of pandemic birding. The extraordinary and unprecedented rise in citizen science data collection during the pandemic has been a boon for ecological data. However, it also complicates the interpretation of these data. As Sara Harrison explains in Wired, “Scientists can’t always tell whether changes in the data are due to animal behavior or just an increase in the amount of information available. Furthermore,”It’s not just that more people are observing—it’s also a matter of where* they are observing.” Observations in urban areas have increased, while observations in rural areas have decreased, suggesting likely undersampling in less-accessible habitats.15

As we can see in Philadelphia, the number of observations in 2021 is substantially larger than any of the previous nine years. Although observations per year were already trending up over the previous decade, 2021 represented an [r avg]% increase in bird observations from 2020 and was [r avg]% higher than the average number of observations per year in the previous nine years.

Show the code
countXyear = ebird_last_decade |>
                group_by(year) |>
                  tally() 

ggplotly(ggplot(countXyear) +
  geom_col(aes(x = year, y = n)) +
  labs(title = "Raw Annual eBird Observations, 2012-21",
       x = "Year",
       y = "Total Observations"))

Checklists by Year

To understand the rise in the number of birders submitting observations during the pandemic, it helps to track how many checklists were submitted each year in the last decade. An eBird checklist represents a single observation event (a birding outing), and groups together sightings of multiple species from the same observation event.16 By counting the number of checklists submitted in each year, we have a proxy for birder activity, which will help us account for the rise in pandemic mentioned in the previous section.

Show the code
countXyearXsei = aggregate(data = ebird_last_decade,                # Applying aggregate
                          sampling_event_identifier ~ year,
                          function(sampling_event_identifier) length(unique(sampling_event_identifier)))

ggplotly(ggplot(countXyearXsei) +
  geom_col(aes(x = year, y = sampling_event_identifier)) +
  labs(title = "Annual Sampling Events, 2012-21",
       x = "Year",
       y = "Total Sampling Events"))

Number of Species

The primary focus of our analysis here is species richness: the total number of species within our study region. To that end, it’s helpful to visualize the total number of unique species observed in Philadelphia in each year. Unlike total observations or total checklists, we expect that the number of unique species observed each year has not changed due to the pandemic. [Need to give a clear explanation of why.]

Indeed, the data confirm that there has been only a slight uptick in the number of unique species observed annually in Philadelphia since the start of the pandemic. [Will want to analyze this using normal distribution, mean, std, etc.]

Show the code
countXyearXspecies = aggregate(data = ebird_last_decade,                # Applying aggregate
                          scientific_name ~ year,
                          function(scientific_name) length(unique(scientific_name)))

ggplotly(ggplot(countXyearXspecies) +
  geom_col(aes(x = year, y = scientific_name)) +
  labs(title = "Unique Species Observed per Year, 2012-21",
       x = "Year",
       y = "Total Unique Species"))

Observation Hotspots by Year

Hypothesis

Let’s identify hotposts based on the clustering maps!

Preparing for SDM

Selecting Species

Preparing Rasters

Executing SDM

Findings

Conclusions

Footnotes

  1. https://www.epa.gov/sites/default/files/2019-03/documents/phipa_final.pdf↩︎

  2. https://watercenter.sas.upenn.edu/philadelphia-urban-ecology-and-the-balance-of-human-and-ecological-communities/↩︎

  3. https://pa.audubon.org/chapters-centers/discovery-center↩︎

  4. https://ebird.org/hotspots↩︎

  5. Chace and Walsh, 1↩︎

  6. Wilby and Perry, 78↩︎

  7. include citations↩︎

  8. Urbanization, Biodiversity, and Ecosystem Services: Challenges and Opportunities, Elmqvist et al., 2013, 496↩︎

  9. Wilby and Perry (2006), 73↩︎

  10. From Wilby and Perry 76↩︎

  11. Wilby and Perry 76-78↩︎

  12. Ecology and Conservation of Birds in Urban Environments, Murgui and Hedblom eds., 2017↩︎

  13. Useless Arithmetic, 2007↩︎

  14. Jimenez-Valverde et al., 2009↩︎

  15. Sara Harrison, “Pandemic Bird-Watching Created a Data Boom-and a Conundrum,” Wired (Conde Nast, September 30, 2021), https://www.wired.com/story/pandemic-bird-watching-created-a-data-boom-and-a-conundrum/.↩︎

  16. https://cornelllabofornithology.github.io/ebird-best-practices/intro.html#intro-intro↩︎