eBird Meets Twitter

An exploration of Snowy Owl sightings and mentions

Jaclyn Janis

MPH 676, University of Southern Maine, Fall 2018

Purpose

I got into birding thanks to my husband, who has been a birder since he was 8 years old. When I tell people that this is a hobby of ours, they typically ask something like, “do you keep a list of what you see?” (if they don’t give us a weird side-eye). My response involves gushy praise for eBird.org, a massive repository of bird information that stores the observations of regular birders like me to build immense datasets that inform science, education, and conservation. eBird is the “world’s largest biodiversity-related citizen science project,” and those managing eBird at the Cornell Lab of Ornithology have made submitting observations tremendously easy and as accurate as possible.

Submitting observations to eBird (via the website or the app) includes not only the species and counts that were observed, but also some details on the length of time spent birding, distance traveled, number of observers in the party, and a good number of breeding codes that inform researchers about nesting areas, migration, and a host of other behaviors. I have been contributing checklists to eBird since 2013, and to date I have submitted 561 checklists reporting a total of 383 species. (Out of curiosity, I asked my husband how many checklists he has… he’s up around 1,000.)

In all my bird pursuits, owls have captured my awe the most. Interestingly, I have seen every owl I have ever sought, elusive though many of them can be. One of the most majestic is the Snowy Owl, with its yellow eyes beaming against stark white feathers. It’s no wonder J.K. Rowling chose this species for Hedwig in Harry Potter. The Snowies are magical.

The other great thing about eBird is that the Cornell Lab developed its own R package, auk, to handle the data. I didn’t end up using it much, though, because an eBird data request can already be narrowed down to exactly what you need, so little filtering is left to do once the data arrive. I requested and received Snowy Owl observation data from January 2018 to November 2018 in order to juxtapose it with Twitter mentions of #snowyowl. The following analysis lets eBird meet Twitter. You knew it had to happen eventually.

Preparing the Data

eBird Data

I used the auk package to read in the eBird data: 22,116 observations of 45 variables. Here is what it looks like:

library(auk)  # eBird's R package for reading eBird data files

snowy <- read_ebd("ebird_snowyowl.txt")
head(snowy)
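One convenience of read_ebd() is that it renames the EBD's column headers to the snake_case names used throughout this analysis (observation_count, observation_date, and so on). The base-R sketch below mimics only that renaming step on a hypothetical two-row excerpt. It is not auk's implementation, and the column names in the excerpt are illustrative; a real EBD file has many more columns.

```r
# Hypothetical excerpt in a tab-separated, EBD-like layout (illustrative columns only)
ebd_text <- paste(
  "COMMON NAME\tOBSERVATION COUNT\tOBSERVATION DATE\tSTATE",
  "Snowy Owl\t2\t2018-10-30\tNew York",
  "Snowy Owl\tX\t2018-10-31\tOntario",
  sep = "\n"
)

# quote = "" so stray quote marks in free text don't break parsing;
# check.names = FALSE keeps the raw headers so we can clean them ourselves
ebd <- read.delim(text = ebd_text, quote = "", check.names = FALSE,
                  stringsAsFactors = FALSE)

# "OBSERVATION COUNT" -> "observation_count", etc.
names(ebd) <- gsub(" ", "_", tolower(names(ebd)))

# Parse dates so date comparisons behave as dates rather than strings
ebd$observation_date <- as.Date(ebd$observation_date)
str(ebd)
```

Note that observation_count stays character here because of the "X" entry, which is why the analysis converts it with as.numeric() before summing.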

Spatial Polygons

I read in a spatial polygons data frame outlining bird conservation regions (BCRs), because I thought they would be an interesting way to present the eBird data. Rather than human-defined boundaries, BCRs group areas with similar habitats and bird species into ecologically distinct regions. Because the BCRs cover only North America, using them also limits my analysis to North America. I am thinking of these as birds’ state lines.

I summarized the eBird data by BCR (total Snowy Owl count and average checklist duration) and merged that summary with the BCR spatial polygons.

library(dplyr)

# eBird records "X" for birds present but not counted; as.numeric() turns these into NA
snowy$observation_count <- as.numeric(snowy$observation_count)

# Total count and average checklist duration (minutes) per Bird Conservation Region
snowy2 <- snowy %>%
  rename(BCR = bcr_code) %>%
  group_by(BCR) %>%
  summarize(bcr_count = sum(observation_count, na.rm = TRUE),
            avg_duration = as.integer(mean(duration_minutes, na.rm = TRUE)))

# Attach the summary to the BCR polygons (bcr_zip is read in separately)
snowy_map <- merge(bcr_zip, snowy2, by = "BCR")

# October 2018 checklists, for the daily time series below
last_month <- snowy %>% filter(observation_date >= as.Date("2018-10-01"))
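The summarize-then-merge step can be sketched in base R on hypothetical toy rows. The sketch also shows why an average duration is better computed with mean(..., na.rm = TRUE): dividing the summed durations by the number of rows effectively counts checklists with no recorded duration as zero minutes and pulls the average down. The regions table is a made-up stand-in for the BCR polygons' attribute data, not the real shapefile.

```r
# Hypothetical toy checklists: BCR code, Snowy Owl count, checklist duration in minutes
obs <- data.frame(
  BCR = c(13, 13, 13, 23),
  observation_count = c(2, 1, NA, 3),  # NA stands in for an "X" (present, not counted)
  duration_minutes = c(60, NA, 30, 90)
)

# Per-BCR totals and averages; na.rm = TRUE skips missing values
bcr_count <- tapply(obs$observation_count, obs$BCR, sum, na.rm = TRUE)
avg_mean  <- tapply(obs$duration_minutes, obs$BCR, mean, na.rm = TRUE)
# Dividing by all rows treats a missing duration as 0 minutes
avg_sum_n <- tapply(obs$duration_minutes, obs$BCR,
                    function(x) sum(x, na.rm = TRUE) / length(x))

summary_df <- data.frame(
  BCR = as.numeric(names(bcr_count)),
  bcr_count = as.vector(bcr_count),
  avg_duration = as.vector(avg_mean)
)

# Hypothetical region table; all.x = TRUE keeps BCR 30 even though it has no sightings
regions <- data.frame(BCR = c(13, 23, 30), label = c("toy A", "toy B", "toy C"))
merged <- merge(regions, summary_df, by = "BCR", all.x = TRUE)
merged
```

For BCR 13 the two approaches give 45 minutes (mean of the recorded durations) versus 30 minutes (sum over all three rows).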

Twitter

I gathered Twitter data using the twitteR and rtweet packages, as in the previous assignment. My primary goal was to gather tweets with #snowyowl specifically, but I also gathered geolocated #owl tweets for the final visual in the exploration below.

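library(twitteR)
library(rtweet)

# Up to 3,200 recent #snowyowl tweets via twitteR (assumes setup_twitter_oauth() has been run)
num_tweets <- 3200
snowy_tweet <- searchTwitter('#snowyowl', n = num_tweets)
snowy_df <- twListToDF(snowy_tweet)

# Geotargeted searches via rtweet; note that lookup_coords() geocodes a single place,
# and its second argument is `components`, not a second country
snowy_geo <- search_tweets(
  "snowyowl", geocode = lookup_coords("usa", "canada"), n = 20000)

owl_geo <- search_tweets("#owl", geocode = lookup_coords("usa", "canada"), n = 10000)

# Add lat and lng columns derived from each tweet's location data
owl_geo <- lat_lng(owl_geo)
snowy_geo <- lat_lng(snowy_geo)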
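rtweet's lat_lng() appends lat and lng columns to the search results, and these should be NA for tweets without location data. Since only geolocated tweets can go on a map, it helps to count and keep just those rows first. A minimal base-R sketch on hypothetical rows shaped like that output:

```r
# Hypothetical rows shaped like lat_lng() output: tweet text plus lat/lng columns
tweets <- data.frame(
  text = c("Snowy owl on the beach! #snowyowl",
           "#snowyowl print for sale",
           "night owl again #owl"),
  lat = c(43.6, NA, 40.7),
  lng = c(-70.2, NA, -74.0),
  stringsAsFactors = FALSE
)

# Keep only tweets with both coordinates present
has_geo <- !is.na(tweets$lat) & !is.na(tweets$lng)
geo_tweets <- tweets[has_geo, ]
cat(sum(has_geo), "of", nrow(tweets), "tweets are geolocated\n")
```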

Exploring the Data

eBird Exploration

I wanted to ask some basic questions of the eBird data, since this is my first time working with it and there seems to be a lot to tease out.

When was the last observation in this dataset?

max(snowy$observation_date)

## [1] "2018-10-31"

The eBird Basic Dataset I accessed is updated monthly, and I didn’t know when in the month the update happens or how recent my extract would be, so checking the latest observation date was essential.

What was the highest count of Snowy Owls in one report?

max(snowy$observation_count, na.rm = TRUE)

## [1] 44

Forty-four?!?! Where was that?? That makes me think of a scene from Harry Potter.

Where did someone see 44 Snowies?

snowy %>% filter(observation_count == 44)

Two checklists report the maximum count of 44: one from Ontario and one from New York.

What percent of checklists with Snowy Owls have media attached?

6743/22116*100  # checklists with has_media == TRUE / all checklists, as a percentage

## [1] 30.48924

Counting the checklists with has_media == TRUE (6,743) and dividing by the total number of checklists (22,116) shows that about 30.5% of Snowy Owl checklists have media attached.
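Rather than typing the two counts in by hand, the percentage can be computed directly from a logical column: sum() counts the TRUE values and mean() gives their share. A minimal sketch on hypothetical values, assuming has_media is stored as TRUE/FALSE with no missing values (with NAs, add na.rm = TRUE):

```r
# Hypothetical has_media values for five checklists
has_media <- c(TRUE, FALSE, FALSE, TRUE, FALSE)

n_media <- sum(has_media)            # TRUE counts as 1, FALSE as 0
pct_media <- mean(has_media) * 100   # share of TRUE values, as a percentage
pct_media
```

On the real data, the same idea would be mean(snowy$has_media) * 100, under the same assumption about the column's type.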

What is the maximum distance (in kilometers) that someone traveled to spot a Snowy?

max(snowy$effort_distance_km, na.rm = TRUE)

## [1] 80.467

The effort distance covers the entire checklist, not necessarily the distance traveled to find a Snowy Owl, so this result doesn’t exactly answer my question. Still, 80.467 km (50 miles) is an impressive distance to cover while birding.

Snowy Owls by Bird Conservation Regions, North America, January-October 2018

My next exploration maps total Snowy Owl observations by Bird Conservation Region. I have also included the numbered BCRs so you can see how they divide North America. I ran into major issues here, so the map is not interactive: after HOURS of trying every solution I could muster, I was unable to publish the interactive version to RPubs, so I am providing an image of it instead. In the interactive version, the popups displayed each BCR’s total Snowy Owl count for 2018 and the average checklist duration, a rough proxy for the time observers spent with the Snowy Owls.

[Figure: Snowy Owls by BCR]

[Figure: Numbered Bird Conservation Regions]
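Popups like the ones described above can be built as one HTML string per region. The sketch below uses base R's sprintf() on a hypothetical summary table; the column names match snowy2, but the numbers are invented for illustration.

```r
# Hypothetical per-BCR summary; the numbers are made up
bcr_summary <- data.frame(
  BCR = c(13, 23),
  bcr_count = c(120, 45),
  avg_duration = c(62, 48)
)

# One HTML popup string per region; <br/> tags become line breaks in the popup
bcr_summary$popup <- sprintf(
  "<strong>BCR %d</strong><br/>Snowy Owls counted: %d<br/>Average checklist duration: %d min",
  as.integer(bcr_summary$BCR),
  as.integer(bcr_summary$bcr_count),
  as.integer(bcr_summary$avg_duration)
)
bcr_summary$popup[1]
```

With the leaflet package, a column like this could then be passed to addPolygons() as popup = ~popup.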

Twitter Exploration

Frequencies of Snowy Owl Reports and Tweets, Fall 2018

Ideally, I would have compared the frequency of Snowy Owl reports with the frequency of #snowyowl Twitter mentions over the same time frame. However, my eBird dataset ends on October 31, 2018, and Twitter’s search returns only tweets from the past 6-9 days, so the two windows do not line up. Fall is the season when Snowies start to appear, and you can see the increase in the eBird Snowy Owl reports. Maybe they’re being talked about a bit more, too.
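The mismatch can be checked directly by intersecting the two date windows. A small base-R sketch using the dates reported in this analysis, assuming the eBird extract starts on January 1, 2018:

```r
# eBird extract (assumed to start Jan 1) and the #snowyowl tweet window
ebird_range <- as.Date(c("2018-01-01", "2018-10-31"))
tweet_range <- as.Date(c("2018-11-25", "2018-12-06"))

# Two intervals overlap only if the later start is on or before the earlier end
overlap_start <- max(ebird_range[1], tweet_range[1])
overlap_end   <- min(ebird_range[2], tweet_range[2])
has_overlap   <- overlap_start <= overlap_end

# Days between the last eBird observation and the first tweet in the window
gap_days <- as.numeric(tweet_range[1] - ebird_range[2])
c(has_overlap = has_overlap, gap_days = gap_days)
```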

Snowy Owl Reports

library(ggplot2)

last_month$observation_count <- as.numeric(last_month$observation_count)
# na.rm = TRUE keeps a day's total from becoming NA when a checklist reported "X"
last_month %>% group_by(observation_date) %>% summarize(totalcount = sum(observation_count, na.rm = TRUE)) %>%
  ggplot(aes(x = observation_date, y = totalcount)) + 
  geom_line() +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold")) +
  labs(
    x = "Observation Date", y = "Total Counts Across Checklists",
    title = "Total Snowy Owls Reported to eBird in October 2018",
    subtitle = "Sum of Snowy Owl counts across checklists, by day",
    caption = "Source: eBird")
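eBird records an "X" instead of a number when an observer notes a species without counting it, and as.numeric() turns those entries into NA. In a daily summary, sum() without na.rm = TRUE then returns NA for any day that includes such a checklist. A base-R sketch on hypothetical rows:

```r
# Hypothetical checklist rows; "X" means present but not counted
obs <- data.frame(
  observation_date = as.Date(c("2018-10-01", "2018-10-01", "2018-10-02", "2018-10-02")),
  observation_count = c("1", "X", "2", "3"),
  stringsAsFactors = FALSE
)

# as.numeric() turns "X" into NA (its warning is silenced here)
obs$observation_count <- suppressWarnings(as.numeric(obs$observation_count))

# Without na.rm, one "X" makes the whole day's total NA
daily_naive <- tapply(obs$observation_count, obs$observation_date, sum)
# With na.rm = TRUE, each day sums only the numeric reports
daily_total <- tapply(obs$observation_count, obs$observation_date, sum, na.rm = TRUE)
daily_total
```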

Tweets

ts_plot(snowy_geo, "12 hours") +
  theme_minimal() +
  theme(plot.title = ggplot2::element_text(face = "bold")) +
  labs(
    x = NULL, y = NULL,
    title = "Frequency of #snowyowl Twitter statuses from 11-25-2018 to 12-6-2018",
    subtitle = "Twitter status counts aggregated using twelve-hour intervals",
    caption = "\nSource: Data collected from Twitter's REST API via rtweet")

Diving into the Words

I used regular expressions to split Twitter posts about #snowyowl into words, and did the same for two free-text columns in my eBird dataset: trip comments and species comments. Trip comments are where observers describe the birding outing as a whole, and species comments are where they add specifics about the Snowy Owl they saw. The top-10 word lists below show how the sources differ: Twitter is fairly general, trip comments describe environmental conditions, and species comments give more details about the bird itself.
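The tokenizing step can be sketched in base R with the same splitting pattern used in the code below. This is a stand-in for tidytext's unnest_tokens(token = "regex"), not its implementation, and the comments and stop-word list are hypothetical. Base R's strsplit() needs perl = TRUE because the pattern contains a lookahead.

```r
# Split on any character that is not a letter, digit, #, @ or apostrophe,
# or on an apostrophe not followed by one of those
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"

# Hypothetical comments and a toy stop-word list (tidytext's stop_words is much larger)
comments <- c("Snowy owl sitting on a pole near the road.",
              "Same snowy owl, still sitting on the pole! #snowyowl")
toy_stop_words <- c("a", "on", "the", "near", "same", "still")

# Lowercase the pieces and drop the empty strings left between adjacent delimiters
tokens <- tolower(unlist(strsplit(comments, reg, perl = TRUE)))
tokens <- tokens[tokens != ""]
# Remove stop words and keep only tokens containing a letter
tokens <- tokens[!tokens %in% toy_stop_words & grepl("[a-z]", tokens)]

word_counts <- sort(table(tokens), decreasing = TRUE)
word_counts
```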

Twitter Words

tw_10 <- twitter_words %>% group_by(word) %>% summarize(n = n()) %>% arrange(desc(n)) %>% top_n(10)
kable(tw_10)

word	n
rt	300
#snowyowl	238
snowy	178
@ibgbeauty	97
owls	96
found	93
owl	87
adult	71
white	66
yesterday	66

Words from Trip Comments

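library(tidytext)  # unnest_tokens(), stop_words
library(stringr)   # str_detect()

# Keep only checklists that have trip comments
ebdtrip <- snowy %>% select(trip_comments)
ebdtrip <- ebdtrip %>% filter(!is.na(trip_comments))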

reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
df <- data.frame(text = ebdtrip$trip_comments) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))
tc_10 <- df %>% group_by(word) %>% summarize(n = n()) %>% arrange(desc(n)) %>% top_n(10)
kable(tc_10)

word	n
owl	1068
snowy	1041
snow	533
wind	498
road	466
sunny	422
driving	353
overcast	304
bird	289
light	286

Words from Species Comments

# Keep only checklists that have species comments
ebdspecies <- snowy %>% select(species_comments)
ebdspecies <- ebdspecies %>% filter(!is.na(species_comments))

reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
spec <- data.frame(text = ebdspecies$species_comments) %>% 
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))
sc_10 <- spec %>% group_by(word) %>% summarize(n = n()) %>% arrange(desc(n)) %>% top_n(10)
kable(sc_10)

word	n
owl	2642
bird	2220
white	2048
continuing	1925
sitting	1645
pole	1250
road	1169
perched	1142
field	1017
snowy	1003

Sentiments

I was interested in whether sentiment differed across tweets, trip comments, and species comments. Though I didn’t anticipate much negativity about Snowy Owls, I wondered whether the trip comments would show some (birders sometimes go to great lengths to see a good bird, which can mean enduring horrible weather, for example). I used the NRC lexicon, which tags words with emotions and with positive or negative sentiment, to score the words. One caveat: many #snowyowl tweets were advertising artwork or other products, so the Twitter and eBird comments may be discussing quite different things.

# NRC word-sentiment pairs (columns: word, sentiment)
nrc <- get_sentiments("nrc")

spec_sent<- spec %>% inner_join(nrc, by="word")
df_sent <- df %>% inner_join(nrc, by="word")
twitter_sent <- twitter_words %>% inner_join(nrc, by="word")

twitter_sent$csource <- "Twitter"
df_sent$csource <- "eBird Trip Comments"
spec_sent$csource <- "eBird Species Comments"

twitter_sent <- twitter_sent %>% select(word, sentiment, csource)
all_sent <- rbind(twitter_sent, df_sent, spec_sent)

sent_df <- all_sent %>% 
  group_by(csource, sentiment) %>% 
  summarize(n = n()) %>%
  mutate(frequency = n/sum(n))

bg <- ggplot(sent_df, aes(x = sentiment, y = frequency, fill = csource)) + 
  geom_bar(stat = "identity", position = "dodge") +
  xlab("Sentiment") +
  ylab("Share of NRC-matched words") +
  labs(fill = "Source") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

bg + scale_fill_manual(values = c("darkblue", "cadetblue3", "gray54"))
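Normalizing within each source matters because the three sources contribute very different numbers of words. Because NRC can assign several sentiments to one word, the frequency column is each sentiment's share of a source's matched word-sentiment pairs rather than a share of comments. A base-R sketch of the join and normalization on hypothetical data; the mini-lexicon is made up and is not the real NRC lexicon.

```r
# Hypothetical tokens from two sources
words <- data.frame(
  word = c("beautiful", "beautiful", "cold", "owl", "beautiful", "storm"),
  csource = c(rep("Twitter", 4), rep("eBird Trip Comments", 2)),
  stringsAsFactors = FALSE
)
# Hypothetical mini-lexicon (NOT real NRC entries); one word can carry several sentiments
lexicon <- data.frame(
  word = c("beautiful", "beautiful", "cold", "storm"),
  sentiment = c("joy", "positive", "negative", "fear"),
  stringsAsFactors = FALSE
)

# Inner join: "owl" has no entry and drops out; each "beautiful" becomes two rows
tagged <- merge(words, lexicon, by = "word")

# Count word-sentiment pairs per source, dropping empty combinations
counts <- as.data.frame(table(csource = tagged$csource, sentiment = tagged$sentiment),
                        responseName = "n")
counts <- counts[counts$n > 0, ]

# Each sentiment's share within its own source
counts$frequency <- counts$n / ave(counts$n, counts$csource, FUN = sum)
counts
```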

eBird and Twitter Meet on a Map

Below, I again used leaflet, this time to map eBird species comments alongside geolocated Twitter mentions of #snowyowl (n=2) and #owl (n=a lot more than 2). So few #snowyowl tweets had geolocation that I filled in with #owl. You can see the range of what people talk about on Twitter: actual owl sightings, wine, or “Ain’t no sleep, I been feeling like a night owl.” I had two goals here: 1) put both datasets on one map, and 2) use the Twitter icon to mark where the tweets came from. Check.

This map also suggests that BCRs were a better choice than other geographic boundaries for visualizing the data: the eBird points do seem to cluster within the BCRs outlined above.

library(leaflet)

# Custom marker icon for tweets
twittericon <- makeIcon(
  iconUrl = "https://cdn2.iconfinder.com/data/icons/minimalism/512/twitter.png",
  iconWidth = 30, iconHeight = 30)

# eBird checklists as dark-blue circles (popup: species comments);
# geolocated #snowyowl and #owl tweets as Twitter icons (popup: tweet text)
leaflet(snowy) %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addCircles(lng = ~longitude, lat = ~latitude, popup = ~species_comments, weight = 6, radius = 100,
             color = "darkblue", stroke = TRUE, fillOpacity = 0.9) %>%
  addMarkers(lng = snowy_geo$lng, lat = snowy_geo$lat, popup = snowy_geo$text, icon = twittericon) %>%
  addMarkers(lng = owl_geo$lng, lat = owl_geo$lat, popup = owl_geo$text, icon = twittericon) %>%
  setView(-100, 45, zoom = 2.5)

Discussion

Exploring Twitter mentions, eBird trip details, and eBird species comments leads me to believe that there are a good number of other people out there who are as enthusiastic about Snowy Owls as I am. While there are limitations to sentiment comparisons across Twitter and eBird, using these data solidified the many lessons I have learned throughout this semester, and it was interesting to bring all of these comments about a species into one place. As fun as creating these visuals was, I highly recommend checking out eBird’s new visualizations (the link is one for the American Kestrel); they are absolutely stunning.

For all of you in the Portland area who find yourselves intrigued by birding after this analysis, you should start with an incredible rarity that has been hanging around, the Great Black Hawk. Since you’ll be exploring eBird with your newfound enthusiasm, take a look at its range map. You’ll notice that it definitely does not belong here and yet has been hanging out since the summer. This is a very big deal in the birding world. There have probably been times when you could have seen it from campus!

Thank you all for a wonderful semesteR.

MPH 676 Final Project