I got into birding thanks to my husband, who has been a birder since he was 8 years old. When I tell people that this is a hobby of ours, they typically ask something like, “do you keep a list of what you see?” (if they don’t give us a weird side-eye). My response involves gushy praise for eBird.org, a massive repository of bird information that stores the observations of regular birders like myself to build immense datasets that inform science, education, and conservation. eBird is the “world’s largest biodiversity-related citizen science project,” and those managing eBird at the Cornell Lab of Ornithology have made submitting observations tremendously easy and as accurate as possible.
Submitting observations to eBird (via the website or the app) includes not only the species and counts that were observed, but also some details on the length of time spent birding, distance traveled, number of observers in the party, and a good number of breeding codes that inform researchers about nesting areas, migration, and a host of other behaviors. I have been contributing checklists to eBird since 2013, and to date I have submitted 561 checklists reporting a total of 383 species. (Out of curiosity, I asked my husband how many checklists he has… he’s up around 1,000.)
In all my bird pursuits, owls have captured my awe the most. Interestingly, I have seen every owl I have ever sought, elusive though many of them can be. One of the most majestic is the Snowy Owl, with its yellow eyes beaming against stark white feathers. It’s no wonder J.K. Rowling chose this species as Hedwig in Harry Potter. The Snowies are magical.
The other great thing about eBird is that they developed their own R package, auk, to handle the data. I didn’t end up using it much, though, because when you request eBird data you can narrow the dataset to exactly what you need up front, so little filtering is left to do afterward. I requested and received Snowy Owl observation data from January 2018 to November 2018 in order to juxtapose it with Twitter mentions of #snowyowl. The following analysis lets eBird meet Twitter. You knew it had to happen eventually.
I used the auk package to read in the eBird data - 22,116 observations with 45 variables. Here is what it looks like:
library(auk)    # read_ebd()
library(dplyr)  # %>%, filter(), group_by(), summarize()
snowy <- read_ebd("ebird_snowyowl.txt")
head(snowy)
I read in a spatial polygons dataframe that outlines bird conservation regions (BCRs) because I thought it would be an interesting way to present the eBird data. Rather than using human-defined boundaries, I’m using boundaries that group areas with ecologically distinct attributes that host similar habitats and bird species. Since the BCRs are in North America, this will narrow my analysis accordingly. I am thinking of these as birds’ state lines.
I merged the eBird data with the BCR spatial polygons by BCR.
snowy$observation_count <- as.numeric(snowy$observation_count)
snowy2 <- snowy %>% rename(BCR = bcr_code) %>%
select(BCR, state, latitude, longitude, observation_count, observation_date, country, duration_minutes, number_observers, has_media, trip_comments, species_comments, effort_distance_km) %>%
group_by(BCR) %>%
summarize(bcr_count = sum(observation_count, na.rm = TRUE), avg_duration = as.integer(sum(duration_minutes, na.rm = TRUE)/n()))
snowy_map <- merge(bcr_zip, snowy2, by.x = 'BCR', by.y = 'BCR')
last_month <- snowy %>% filter(observation_date >= "2018-10-01")
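A small sketch of the summarize-then-merge step on made-up rows may help show what the region-level numbers mean. The data frames `toy_checklists` and `toy_regions` and the region names are hypothetical stand-ins, and plain data frames are used in place of the spatial polygons. Note that `avg_duration` above divides total minutes by `n()`, which also counts checklists with no recorded duration; the sketch shows that value next to a mean over timed checklists only.

```r
library(dplyr)

# Made-up checklist rows standing in for the eBird data.
toy_checklists <- data.frame(
  BCR = c(13, 13, 13, 14),
  observation_count = c(2, 1, 1, 3),
  duration_minutes = c(60, NA, 30, 45)
)

by_bcr <- toy_checklists %>%
  group_by(BCR) %>%
  summarize(
    bcr_count = sum(observation_count, na.rm = TRUE),
    # Total minutes divided by ALL checklists, timed or not
    avg_all = sum(duration_minutes, na.rm = TRUE) / n(),
    # Total minutes divided by timed checklists only
    avg_timed = mean(duration_minutes, na.rm = TRUE)
  )

# Hypothetical attribute table standing in for the BCR polygons' data.
toy_regions <- data.frame(BCR = c(13, 14, 30), region = c("A", "B", "C"))

# all.x = TRUE keeps regions with no reports; their summary columns become NA.
toy_map <- merge(toy_regions, by_bcr, by = "BCR", all.x = TRUE)
print(toy_map)
```

In this toy data, region 13 averages 30 minutes when untimed checklists are counted in the denominator and 45 minutes when they are not, so the choice matters whenever many checklists lack a duration.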
I gathered Twitter data using twitteR and rtweet, as in the previous assignment. My primary goal was to gather tweets with #snowyowl specifically, but I also gathered geolocated #owl tweets for the final visual in the exploration below.
num_tweets <- 3200
snowy_tweet <- searchTwitter('#snowyowl', n = num_tweets)
snowy_df <- twListToDF(snowy_tweet)
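# Note: lookup_coords() takes one address; the second argument here is matched to
# `components`, so these searches are most likely limited to the "usa" bounding box.
snowy_geo <- search_tweets(
"snowyowl", geocode = lookup_coords("usa", "canada"), n = 20000)
owl_geo <- search_tweets("#owl", geocode = lookup_coords("usa", "canada"), n = 10000)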
owl_geo <- lat_lng(owl_geo)
snowy_geo <- lat_lng(snowy_geo)
I wanted to ask some basic questions of the eBird data, as it’s my first time working with it, and it seems like there is a lot to tease out from it.
max(snowy$observation_date)
## [1] "2018-10-31"
The basic dataset that I accessed through eBird is updated monthly. I didn’t know when in the month the update happens or what date range I would receive when I made my request, so checking the latest observation date was essential.
max(snowy$observation_count, na.rm = TRUE)
## [1] 44
Forty-four?!?! Where was that?? That makes me think of this scene from Harry Potter:
snowy %>% filter(observation_count ==44)
Two checklists report the maximum count of 44. They were in Ontario and New York.
6743/22116*100
## [1] 30.48924
I counted the observations with has_media == TRUE and divided that number by the total: about 30.5% of Snowy Owl checklists have media attached.
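The same percentage can be computed without typing the counts in by hand. This is a minimal sketch on a made-up column, assuming `has_media` is read in as a logical vector; `toy` is a hypothetical stand-in for the `snowy` data frame.

```r
# Made-up stand-in for the has_media column of the eBird data.
toy <- data.frame(has_media = c(TRUE, FALSE, FALSE, TRUE, FALSE))

# TRUE counts as 1 and FALSE as 0, so the mean is the share of TRUE rows.
media_share <- mean(toy$has_media) * 100
print(media_share)

# The hand-typed calculation from the post, rounded to one decimal place.
print(round(6743 / 22116 * 100, 1))
```

On the real data the equivalent call would be `mean(snowy$has_media) * 100`, with `na.rm = TRUE` added if the column contains missing values.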
max(snowy$effort_distance_km, na.rm = TRUE)
## [1] 80.467
Now, this could be the total distance for the entire birding outing and not necessarily just for seeing a Snowy Owl, so this result doesn’t exactly answer my question. Nonetheless, this is an impressive distance to be birding the entire time.
My next exploration of eBird data maps the total observations by Bird Conservation Region. I have also included the numbered BCRs so you can get an idea of what they look like for all of North America. I ran into some major issues here, so you will notice that the map is not interactive. As a workaround, I am providing an image of the map I was able to create but could not publish to RPubs, even after HOURS of trying every solution I could muster. The popups I coded displayed the total number of observations in each BCR for 2018 as well as the average time spent observing the Snowy Owls.
Bird Conservation Regions
Ideally, I wanted to present the frequency of Snowy Owl reports and frequency of #snowyowl Twitter mentions over the same time frame, but due to the limitations of my eBird dataset (latest observation date being 10-31-2018) and Twitter data reporting only the past 6-9 days, they do not match up perfectly. This is the season that we start to see Snowies, and you can get the sense of the increase from the eBird Snowy Owl reports. Maybe they’re being talked about a bit more, too.
last_month$observation_count <- as.numeric(last_month$observation_count)
last_month %>% group_by(observation_date) %>% summarize(totalcount = sum(observation_count, na.rm = TRUE)) %>%
ggplot(aes(x = observation_date, y = totalcount)) +
geom_line() +
theme_minimal() +
theme(plot.title = element_text(face = "bold")) +
labs(
x = "Observation Date", y = "Total Counts Across Checklists",
title = "Total Reports of Snowy Owls in October 2018",
subtitle = "Number of observations submitted to eBird by day",
caption = "Source: eBird")
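One detail worth checking in a daily total like this is how uncounted birds are handled. eBird lets observers enter "X" to mean "present, but not counted", and `as.numeric()` turns those entries into `NA`. The sketch below uses made-up rows (`toy` is hypothetical) to show how one `NA` affects a day's sum with and without `na.rm = TRUE`.

```r
library(dplyr)

# Made-up observations; "X" marks a bird reported without a count.
toy <- data.frame(
  observation_date = as.Date(c("2018-10-01", "2018-10-01",
                               "2018-10-02", "2018-10-02")),
  observation_count = c("1", "X", "2", "3"),
  stringsAsFactors = FALSE
)

daily <- toy %>%
  # "X" cannot be parsed as a number, so it becomes NA (warning suppressed).
  mutate(observation_count = suppressWarnings(as.numeric(observation_count))) %>%
  group_by(observation_date) %>%
  summarize(
    strict_total = sum(observation_count),               # NA if any count is NA
    totalcount   = sum(observation_count, na.rm = TRUE)  # ignores NA counts
  )
print(daily)
```

With `na.rm = TRUE` the total is a lower bound for days that include an "X" entry, since at least one bird was present on each of those checklists.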
ts_plot(snowy_geo, "12 hours") +
theme_minimal() +
theme(plot.title = ggplot2::element_text(face = "bold")) +
labs(
x = NULL, y = NULL,
title = "Frequency of #snowyowl Twitter statuses from 11-25-2018 to 12-6-2018",
subtitle = "Twitter status counts aggregated using twelve-hour intervals",
caption = "\nSource: Data collected from Twitter's REST API via rtweet")
I used regular expressions to identify the words from Twitter posts about #snowyowl as well as the words from two columns in my eBird dataset: trip comments and species comments. These columns are where observers can include details about the birding outing or more specifics about the Snowy Owl they saw. From these top-10 word lists, you can see how the sources differ: Twitter is somewhat general, trip comments describe environmental attributes, and species comments give more details about the bird itself.
tw_10 <- twitter_words %>% group_by(word) %>% summarize(n = n()) %>% arrange(desc(n)) %>% top_n(10)
kable(tw_10)
| word | n |
|---|---|
| rt | 300 |
| #snowyowl | 238 |
| snowy | 178 |
| @ibgbeauty | 97 |
| owls | 96 |
| found | 93 |
| owl | 87 |
| adult | 71 |
| white | 66 |
| yesterday | 66 |
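The construction of `twitter_words` is not shown in this write-up. One plausible way to build it, assuming it was tokenized with the same regex pattern and stop-word filter used for the eBird comments, is sketched here on made-up tweets. The names `toy_tweets` and `twitter_words_sketch` are hypothetical, and the real input would be the `text` column of the collected tweets.

```r
library(dplyr)
library(tidytext)
library(stringr)

# Made-up tweets standing in for the collected #snowyowl statuses.
toy_tweets <- data.frame(
  text = c("RT @birder: Snowy owl on the jetty! #snowyowl",
           "Adult white owl perched on the dunes #snowyowl"),
  stringsAsFactors = FALSE
)

# Split on anything that is not a letter, digit, #, @, or an in-word apostrophe,
# so hashtags and mentions survive as single tokens.
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"

twitter_words_sketch <- toy_tweets %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,   # drop common English stop words
         str_detect(word, "[a-z]"))    # drop tokens with no letters

# Word frequencies, most common first.
twitter_words_sketch %>% count(word, sort = TRUE)
```

Because retweets are kept, "rt" and account handles show up as frequent words, which matches the top of the table above.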
ebdtrip <- snowy %>% select(trip_comments)
ebdtrip <- ebdtrip %>% filter(!is.na(trip_comments))  # test the column, not the data frame
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
df <- data.frame(text = ebdtrip$trip_comments, stringsAsFactors = FALSE) %>%
unnest_tokens(word, text, token = "regex", pattern = reg) %>%
filter(!word %in% stop_words$word,
str_detect(word, "[a-z]"))
tc_10 <- df %>% group_by(word) %>% summarize(n = n()) %>% arrange(desc(n)) %>% top_n(10)
kable(tc_10)
| word | n |
|---|---|
| owl | 1068 |
| snowy | 1041 |
| snow | 533 |
| wind | 498 |
| road | 466 |
| sunny | 422 |
| driving | 353 |
| overcast | 304 |
| bird | 289 |
| light | 286 |
ebdspecies <- snowy %>% select(species_comments)
ebdspecies <- ebdspecies %>% filter(!is.na(species_comments))  # test the column, not the data frame
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
spec <- data.frame(text = ebdspecies$species_comments, stringsAsFactors = FALSE) %>%
unnest_tokens(word, text, token = "regex", pattern = reg) %>%
filter(!word %in% stop_words$word,
str_detect(word, "[a-z]"))
sc_10 <- spec %>% group_by(word) %>% summarize(n = n()) %>% arrange(desc(n)) %>% top_n(10)
kable(sc_10)
| word | n |
|---|---|
| owl | 2642 |
| bird | 2220 |
| white | 2048 |
| continuing | 1925 |
| sitting | 1645 |
| pole | 1250 |
| road | 1169 |
| perched | 1142 |
| field | 1017 |
| snowy | 1003 |
I was interested in seeing if sentiments differed across tweets, trip comments, and species comments. Though I didn’t anticipate too much negativity about Snowy Owls, I wondered if the trip comments would indicate any negativity (sometimes birders really extend themselves to see a good bird, which could mean horrible weather conditions, for example). I used the NRC lexicon to assess this. The caveat to this visual is that many of the #snowyowl tweets were advertising artwork or other products, so the nature of what is being discussed is potentially quite different between the eBird and Twitter data.
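# Newer tidytext releases no longer ship a `lexicon` column in `sentiments`;
# get_sentiments("nrc") returns the word/sentiment pairs (via the textdata package).
nrc <- get_sentiments("nrc") %>%
select(word, sentiment)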
spec_sent<- spec %>% inner_join(nrc, by="word")
df_sent <- df %>% inner_join(nrc, by="word")
twitter_sent <- twitter_words %>% inner_join(nrc, by="word")
twitter_sent$csource <- "Twitter"
df_sent$csource <- "eBird Trip Comments"
spec_sent$csource <- "eBird Species Comments"
twitter_sent <- twitter_sent %>% select(word, sentiment, csource)
all_sent <- rbind(twitter_sent, df_sent, spec_sent)
sent_df <- all_sent %>%
group_by(csource, sentiment) %>%
summarize(n = n()) %>%
mutate(frequency = n/sum(n))
bg <- ggplot(sent_df, aes(x = sentiment, y = frequency, fill = csource)) +
geom_bar(stat = "identity", position = "dodge") +
xlab("Sentiment") +
ylab("Share of Sentiment-Tagged Words") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
bg + scale_fill_manual(values = c("darkblue", "cadetblue3", "gray54"))
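To make the `frequency` column concrete, here is a minimal sketch of the join-and-normalize step on made-up tokens. The tiny `toy_lexicon` and its tags are invented for illustration and are not the NRC lexicon, which can assign several sentiments to one word.

```r
library(dplyr)

# Made-up tokens from two sources.
toy_tokens <- data.frame(
  csource = c("Twitter", "Twitter", "Twitter",
              "eBird Trip Comments", "eBird Trip Comments",
              "eBird Trip Comments", "eBird Trip Comments"),
  word = c("white", "found", "cold", "wind", "cold", "sunny", "cold"),
  stringsAsFactors = FALSE
)

# Invented lexicon with one tag per word (a simplification of NRC).
toy_lexicon <- data.frame(
  word = c("white", "cold", "wind", "sunny"),
  sentiment = c("positive", "negative", "negative", "positive"),
  stringsAsFactors = FALSE
)

toy_sent <- toy_tokens %>%
  inner_join(toy_lexicon, by = "word") %>%   # words not in the lexicon drop out
  count(csource, sentiment) %>%
  group_by(csource) %>%
  mutate(frequency = n / sum(n)) %>%         # shares sum to 1 within each source
  ungroup()
print(toy_sent)
```

Each bar is therefore a share of the lexicon-matched words within one source, not a share of comments, and words absent from the lexicon (such as "found" here) do not count toward the denominator.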
Below, I again used leaflet to map eBird species comments alongside Twitter mentions of #snowyowl (n=2) and #owl (n=a lot more than 2). The number of #snowyowl tweets that also had geo locations was tremendously limited, so this is where I filled in with #owl. You can see the range of what people are talking about on Twitter - it could be actual owl sightings, wine, or “Ain’t no sleep, I been feeling like a night owl.” I wanted to achieve two things here: 1) add both datasets to one map and 2) use the Twitter icon to mark where the tweets came from. Check.
This map also tells me that it was good to use BCRs rather than other geographic boundaries to visualize the data. The distribution of species comments does seem to group in those BCRs as outlined above.
twittericon <- makeIcon(
iconUrl = "https://cdn2.iconfinder.com/data/icons/minimalism/512/twitter.png",
iconWidth = 30, iconHeight = 30)
pal <- colorFactor(c("darkblue"), domain = c("eBird Species Comments"))
leaflet(snowy) %>%
addProviderTiles("CartoDB.Positron") %>%
addCircles(lng = ~longitude, lat = ~latitude, popup = ~species_comments, weight = 6, radius = 100,
color = "darkblue", stroke = TRUE, fillOpacity = 0.9) %>%
addMarkers(lng = snowy_geo$lng, lat = snowy_geo$lat, popup = snowy_geo$text, icon = twittericon) %>%
addMarkers(lng = owl_geo$lng, lat = owl_geo$lat, popup = owl_geo$text, icon = twittericon) %>%
setView(-100, 45, zoom = 2.5)
Exploring Twitter mentions, eBird trip details, and eBird species comments leads me to believe that there are a good number of other people out there who are as enthusiastic about Snowy Owls as I am. While there are limitations to sentiment comparisons across Twitter and eBird, using these data solidified the many lessons I have learned throughout this semester, and it was interesting to bring all of these comments about a species into one place. As fun as creating these visuals was, I highly recommend checking out eBird’s new visualizations (the link is one for the American Kestrel); they are absolutely stunning.
For all of you in the Portland area who find yourselves intrigued by birding after this analysis, you should start with an incredible rarity that has been hanging around, the Great Black Hawk. Since you’ll be exploring eBird with your newfound enthusiasm, take a look at its range map. You’ll notice that it definitely does not belong here and yet has been hanging out since the summer. This is a very big deal in the birding world. There have probably been times when you could have seen it from campus!
Thank you all for a wonderful semesteR.