Background

Fireflies are a well-loved family of beetles found across the globe. In the US, they are especially common along the East Coast, although every state within the contiguous United States is home to at least one species (Alaska and Hawaii have none). There are roughly fifty different species across three general taxonomic groups.

Many of these species are classified on the IUCN Red List as endangered or threatened, but an even larger number lack sufficient data for an evaluation. The Firefly Watch project, run by Mass Audubon, is attempting to fill some of these gaps through the power of crowdsourced data.

Crowdsourced Data

Crowdsourced data is data collected by a large group of people (the “crowd”) who are not otherwise connected to the research project. These contributors are often referred to as “citizen scientists.”

Crowdsourced data has many strengths. For one, it can be significantly cheaper to obtain than having a research team collect an equivalent amount themselves. It also provides far more opportunity for observations across time and space. This is invaluable for studies of rare phenomena, in which even the best-trained and best-equipped researchers would have a hard time collecting sufficient data by themselves.

One of the drawbacks is that there is an inherent bias in the data collected. Since the data is not collected through random sampling, but is submitted by motivated volunteers, there is a strong possibility that the sample data will not be representative of the overall population. Additionally, there is a natural tendency for people to not report “non-findings,” similar to how experiments that support a null hypothesis are rarely submitted for publication.

The non-uniformity of space is particularly relevant for crowdsourced data, as the distribution of data sources will almost certainly be skewed towards areas of higher population. The locational fallacy might also be exacerbated by input errors that stem from a lack of proper training or from unclear instructions given to the volunteers.
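One simple way to see the population skew is to compare raw observation counts with counts scaled by population. The sketch below uses three made-up regions; the region names and every number in it are hypothetical and do not come from the Firefly Watch data.

```r
# Hypothetical regions: none of these numbers come from the Firefly Watch data
regions <- data.frame(
  region       = c("A", "B", "C"),
  observations = c(900, 300, 60),
  population   = c(3000000, 500000, 50000)
)

# Observations per 100,000 residents
regions$obs_per_100k <- regions$observations / regions$population * 1e5

# Sort from the highest to the lowest per-capita rate
regions[order(-regions$obs_per_100k), ]
```

In this toy example the region with the most raw observations has the lowest rate per resident, which is the kind of distortion a population-skewed sample can hide.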

Working With Data

#-------------------------------------------------------------------------------
#                             IMPORT DATA
#-------------------------------------------------------------------------------
library(readxl) # read_excel()
library(dplyr)  # %>% pipe

firefly <- read_excel("fireflydata.xlsx") %>%
  as.data.frame() # convert the tibble to a classic data frame

Source: Mass Audubon


Data Description

Object of Analysis
Each record is the average number of firefly flashes counted over three ten-second observations made within a ten-minute period, recorded in the United States from 2008 to 2016.

Years & Number of Observations

## 
## 2008 2009 2010 2011 2012 2013 2014 2015 2016 
## 1542 2892 2537 2822 2590 1824 1720 1993 2214

Other Summary Statistics

## [1] 75.24436
## [1] 23
## [1] 8000
## 
##       Agricultural               City      Rural: forest   Rural: open area 
##                590               1524               2424               2893 
##             Suburb Suburban_rural mix 
##               6702               6001
## 
##         Clear        Cloudy         Foggy Partly cloudy 
##          8423          2995           149          8567
## 
##    No   Yes 
##  2645 17489
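The output above appears without the code that produced it. As a rough sketch, frequency tables and single-value summaries of this shape are typically produced with base R's table(), mean(), and range(). The toy data frame and its column names (Year, Temperature, Habitat) are assumptions made for illustration; they are not confirmed to be the columns of the actual dataset.

```r
# Toy stand-in for the firefly data frame; column names are hypothetical
toy <- data.frame(
  Year        = c(2008, 2008, 2009, 2010, 2010, 2010),
  Temperature = c(70, 75, 68, 80, 23, 72),
  Habitat     = c("Suburb", "City", "Suburb", "Agricultural", "Suburb", "City")
)

year_counts    <- table(toy$Year)     # number of observations per year
habitat_counts <- table(toy$Habitat)  # number of observations per category

year_counts
habitat_counts
mean(toy$Temperature)   # a single numeric summary
range(toy$Temperature)  # minimum and maximum
```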


Spatial Wrangling

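library(sf)            # st_as_sf(), st_transform()
library(rnaturalearth) # ne_states()

# WGS 84 (EPSG:4326), the CRS of plain longitude/latitude coordinates
crs.firefly <- 4326
sf.firefly <- st_as_sf(firefly, coords = c("Longitude", "Latitude"), crs = crs.firefly)
# State polygons, transformed to the same CRS as the observations
sf.states <- ne_states(country = "united states of america", returnclass = "sf") %>%
  st_transform(crs = crs.firefly)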
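With points and polygons in the same CRS, one common next step is to count how many observations fall inside each polygon. The sketch below is not taken from the original analysis: it uses two hypothetical square "states" and three made-up points in place of sf.states and sf.firefly, and counts points per polygon with st_intersects().

```r
library(sf)

# Helper that builds a square polygon (hypothetical, for illustration only)
square <- function(xmin, ymin, size = 1) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmin + size, ymin), c(xmin + size, ymin + size),
    c(xmin, ymin + size), c(xmin, ymin)
  )))
}

# Two toy polygons standing in for the state layer
toy.states <- st_sf(
  name     = c("West", "East"),
  geometry = st_sfc(square(0, 0), square(1, 0), crs = 4326)
)

# Three toy observations standing in for the firefly points
toy.obs <- st_as_sf(
  data.frame(Longitude = c(0.2, 0.5, 1.5), Latitude = c(0.5, 0.3, 0.5)),
  coords = c("Longitude", "Latitude"), crs = 4326
)

# For each polygon, the number of points that intersect it
toy.states$n_obs <- lengths(st_intersects(toy.states, toy.obs))
toy.states
```

The same pattern should carry over to the real layers as long as both share a CRS, which is why the state polygons are transformed to crs.firefly above.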

st_geometry allows us to plot only the geometries, without any marks (attribute columns). Without it, R draws a separate panel for every mark.
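The difference can be reproduced with a small toy object. The coordinates and the attribute columns (Temperature, Year) below are made up for illustration and are not rows from the firefly data.

```r
library(sf)

# Toy point layer with two attribute columns (hypothetical values)
toy <- st_as_sf(
  data.frame(
    Longitude   = c(-71.1, -77.0, -84.4),
    Latitude    = c(42.4, 38.9, 33.7),
    Temperature = c(70, 75, 80),
    Year        = c(2008, 2012, 2016)
  ),
  coords = c("Longitude", "Latitude"), crs = 4326
)

plot(toy)               # one panel per attribute column
plot(st_geometry(toy))  # a single panel showing only the points
```

st_geometry() returns just the geometry column (an sfc object), so plot() has no attributes left to facet on.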

Maps

Interactive



Results & Discussion

Observations are most frequent on the East Coast and near major cities, which is to be expected of crowdsourced data. What’s interesting is the sharp decline in observations midway through Texas. On the terrain map, however, this decline coincides perfectly with dry, desert areas in which fireflies are known not to live.

The other trend is a decrease along the Appalachian mountain range, which could be explained either by the increase in elevation or by a lower population density (and hence fewer opportunities for crowdsourced observations).

Additionally, 50°F seems to be the commonly accepted minimum temperature for the firefly bioluminescence reaction to work, yet there are 34 sightings below that temperature. Some, like observation #63, which claims it was 23°F in May in California, are likely user errors. Still, there might be some merit in investigating these unexpected observations more closely, especially given that there are species of fireflies in Tibet that are known to survive in freezing temperatures.
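Pulling out the sub-threshold sightings for closer inspection is a one-line filter. The sketch below runs on a toy data frame; the column names (id, Temperature) and the values are assumptions for illustration, and the real dataset may name its columns differently.

```r
# Toy stand-in for the firefly data; column names and values are hypothetical
toy <- data.frame(
  id          = 1:5,
  Temperature = c(72, 23, 49, 50, NA)
)

min_temp_f <- 50  # commonly cited minimum temperature, in degrees Fahrenheit

# which() drops rows with a missing temperature instead of returning NA rows
below <- toy[which(toy$Temperature < min_temp_f), ]
below
nrow(below)
```

Rows returned this way can then be reviewed one by one to separate probable entry errors from observations worth a closer look.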

Another possible consideration is pH and soil chemistry, specifically the presence or absence of Mg and Ca, which are necessary for the bioluminescence reaction. Cursory research suggests that fireflies prefer slightly alkaline water, which could be another factor in why they are not found at high elevation. It is also possible that either the pH or the presence of certain radioactive isotopes (like Mg-28+) in the spring rains is necessary to trigger the next stage of the firefly life cycle, from larva to adult beetle.


Package Citations

Adrian Baddeley, Ege Rubak, Rolf Turner (2015). Spatial Point Patterns: Methodology and Applications with R. London: Chapman and Hall/CRC Press, 2015. URL https://www.routledge.com/Spatial-Point-Patterns-Methodology-and-Applications-with-R/Baddeley-Rubak-Turner/978482210200/

Hijmans R (2022). raster: Geographic Data Analysis and Modeling. R package version 3.5-29, <https://CRAN.R-project.org/package=raster>.

John Fox and Sanford Weisberg (2019). An {R} Companion to Applied Regression, Third Edition. Thousand Oaks CA: Sage. URL: https://socialsciences.mcmaster.ca/jfox/Books/Companion/

Lincoln A. Mullen and Jordan Bratt (2018), “USAboundaries: Historical and Contemporary Boundaries of the United States of America,” Journal of Open Source Software 3, (23):314, https://doi.org/10.21105/joss.00314.

Müller K, Walthert L (2022). styler: Non-Invasive Pretty Printing of R Code. R package version 1.8.1, <https://CRAN.R-project.org/package=styler>.

Neuwirth E (2022). RColorBrewer: ColorBrewer Palettes. R package version 1.1-3, <https://CRAN.R-project.org/package=RColorBrewer>.

Pebesma E, Mailund T, Hiebert J (2016). “Measurement Units in R.” R Journal, 8(2), 486-494. doi:10.32614/RJ-2016-061.

Roger S. Bivand, Edzer Pebesma, Virgilio Gomez-Rubio, 2013. Applied spatial data analysis with R, Second edition. Springer, NY. https://asdar-book.org/

Tennekes M (2018). “tmap: Thematic Maps in R.” Journal of Statistical Software, 84(6), 1-39. doi:10.18637/jss.v084.i06 <https://doi.org/10.18637/jss.v084.i06>.

Waring E, Quinn M, McNamara A, Arino de la Rubia E, Zhu H, Ellis S (2022). skimr: Compact and Flexible Summaries of Data. R package version 2.1.4, <https://CRAN.R-project.org/package=skimr>.

Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” Journal of Open Source Software, 4(43), 1686. doi:10.21105/joss.01686.

Wickham H, François R, Henry L, Müller K (2022). dplyr: A Grammar of Data Manipulation. R package version 1.0.10, <https://CRAN.R-project.org/package=dplyr>.

Wickham H. (2016) ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.

Yihui Xie (2022). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.40.