Fireflies are a well-loved variety of beetle found across the globe. In the US, they are especially common along the East Coast, although every state within the contiguous United States is home to at least one species (Alaska and Hawaii have none). There are roughly fifty different species across three general taxonomic groups.
Many of these species are classified on the IUCN Red List as endangered or threatened, but an even larger number lack sufficient data for an evaluation. The Firefly Watch project run by Mass Audubon is attempting to fill some of these gaps through the power of crowd-sourced data.
Crowd-sourced data is data that is collected by a large group of
people (the “crowd”) who are not otherwise connected to the research
project. These contributors are often referred to as “citizen
scientists.”
There are many strengths to crowd-sourced data. For one, it can be
significantly cheaper to obtain compared to having a research team
collect an equivalent amount themselves. It also provides significantly
more opportunity for observations across time and space. This is
invaluable for studies of rare phenomena in which even the best trained
and best equipped researchers would have a hard time collecting
sufficient data by themselves.
One of the drawbacks is that there is an inherent bias in the data
collected. Since the data is not collected through random sampling, but
is submitted by motivated volunteers, there is a strong possibility that
the sample data will not be representative of the overall population.
Additionally, there is a natural tendency for people to not report
“non-findings,” similar to how experiments that support a null
hypothesis are rarely submitted for publication.
The non-uniformity of space is particularly relevant for
crowd-sourced data, as the distribution of data sources will almost
certainly be skewed towards areas of higher population. The locational
fallacy might also be exacerbated by input errors stemming from a lack
of proper training or from unclear instructions given to the volunteers.
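The population skew described above can be illustrated with a small simulation. This is only a sketch under invented assumptions: the grid cells, the lognormal population values, and the rule that reports arrive in proportion to population are all hypothetical and are not taken from the Firefly Watch data.

```r
set.seed(42)  # reproducible illustration

# Hypothetical setup: 100 equally sized grid cells that all have the same
# true firefly abundance but very different human populations.
n_cells    <- 100
population <- rlnorm(n_cells, meanlog = 8, sdlog = 1.5)  # right-skewed

# Assumption: a volunteer report comes from a given cell with probability
# proportional to that cell's population.
n_reports        <- 2000
report_cell      <- sample(n_cells, n_reports, replace = TRUE, prob = population)
reports_per_cell <- tabulate(report_cell, nbins = n_cells)

# Share of all reports that come from the 10 most populous cells.
# Under spatially uniform sampling this share would be about 0.10.
top10       <- order(population, decreasing = TRUE)[1:10]
share_top10 <- sum(reports_per_cell[top10]) / n_reports
print(share_top10)
```

Even though fireflies are spread evenly in this toy setup, the reports cluster in the most populous cells, so a map of raw report counts would mostly reflect where people live.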
#-------------------------------------------------------------------------------
# IMPORT DATA
#-------------------------------------------------------------------------------
library(readxl)  # read_excel()
library(dplyr)   # %>%
firefly <- read_excel("fireflydata.xlsx") %>%
  as.data.frame() # prefer classic df > tibble
Object of Analysis
Average counts of firefly flashes observed in the United States from
2008 to 2016. Each record is the average of three ten-second
observations taken within a ten-minute period.
Years & Number of Observations
##
## 2008 2009 2010 2011 2012 2013 2014 2015 2016
## 1542 2892 2537 2822 2590 1824 1720 1993 2214
Other Summary Statistics
## [1] 75.24436
## [1] 23
## [1] 8000
##
##       Agricultural               City      Rural: forest   Rural: open area
##                590               1524               2424               2893
##             Suburb Suburban_rural mix
##               6702               6001
##
##         Clear        Cloudy         Foggy Partly cloudy
##          8423          2995           149          8567
##
##    No   Yes
##  2645 17489
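Frequency tables like the ones above are the kind of output base R's table() prints. The sketch below uses a toy data frame whose column names and values are hypothetical, since the code behind the original tables is not shown. It also checks that each table's counts sum to the number of rows, which holds for the tables above (each sums to 20,134).

```r
# Toy stand-in for the firefly data; column names and values are hypothetical.
toy <- data.frame(
  Year    = c(2008, 2008, 2009, 2010, 2010, 2010),
  Habitat = c("Suburb", "City", "Suburb", "Rural: forest", "Suburb", "City")
)

# One-way frequency tables: number of records per year and per habitat.
year_counts    <- table(toy$Year)
habitat_counts <- table(toy$Habitat)
print(year_counts)
print(habitat_counts)

# With no missing values, every table's counts sum to the number of rows.
stopifnot(sum(year_counts) == nrow(toy), sum(habitat_counts) == nrow(toy))
```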
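library(sf)             # st_as_sf(), st_transform()
library(rnaturalearth)  # ne_states()
crs.firefly <- 4326     # WGS 84 (EPSG:4326), longitude/latitude in degrees
sf.firefly <- st_as_sf(firefly, coords = c("Longitude", "Latitude"), crs = crs.firefly)
sf.states <- ne_states(country = "united states of america", returnclass = "sf") %>%
  st_transform(crs = crs.firefly)
st_geometry() allows us to plot only the geometries, without any marks (the attribute columns attached to each feature). Without it, R will create an individual plot for every mark, like this: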
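The two plotting calls can be contrasted on a small, hypothetical set of points. The coordinates and attribute columns below are invented for illustration and are not the firefly data.

```r
library(sf)

# Hypothetical points with two attribute columns (not the firefly data).
pts <- st_as_sf(
  data.frame(
    lon     = c(-71.1, -73.9, -77.0),
    lat     = c(42.4, 40.7, 38.9),
    flashes = c(12, 3, 40),
    year    = c(2008, 2009, 2010)
  ),
  coords = c("lon", "lat"),
  crs = 4326
)

# Plotting the sf object itself draws one panel per attribute column.
plot(pts)

# Plotting only the geometry column draws a single map of the points.
geom <- st_geometry(pts)
plot(geom)
```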