Love them of hate them, Honeybees are the most effective pollinators on the plant. They play a significant role in the nature by pollinating 85% of food crops intended for human consumption (FDA). In addition, they produce honey and wax. Two commonly used raw material.
In recent years, there has been reports that the number of bees in the United States are quickly dwindling. With the continuous disappearance of honeybees, devastating ripple effect are occurring across all ecosystems. If honeybees were to go extinct, we would lose all the plants that bees pollinate. With the significant decrease of plants in our ecosystem, all animals that eat those plants would die due to starvation, and so on up the food chain. In addition, our supermarkets would have half the amount of fruits and vegetables.
Because of the contribution and struggles of honeybees, many beekeepers and activist have arise to preserve the honeybee population such as the save the bee campaign.
In order to preserve the honeybee population, honeybees activist need to find the root cause on why honeybees are disappearing.
For this project, I a taking a look into the disappearance of honeybees in the United States through 2010-2016.
The data was obtained from the DATA 110 dataset course folder, but original originated from the Data World website at: https://data.world/finley/bee-colony-statistical-data-from-1987-2017/workspace/file?filename=Bee+Colony+Loss.xlsx
This dataset is a statistical collection of data from 2010 - 2016. The data provides information on (by state) the loss of bee colonies (by percent) annually, the number of known colonies, the number of known beekeepers, and the percentage of beekeepers exclusive to the state.
Three categorical and five quantitative variables.
Total rows: 365
Total Features: 8
Year: Year in format fiscal format YYYY/YY
Season: All were annual
State: Full name of state in the United States
Annual Loss: Annual colony loss per 100 colonies
Beekeepers: Total number of beekeepers
Beekeepers Exclusive to State: Annual number of beekeepers per 100 that only work in one state
Colonies: Total number of honey bee colonies
Colonies Exclusive to State: Annual number of colonies per 100 that remain in the same state for the entire year.
The reason I chose this topic is because my initial topic was stolen, sad face. While scrolling around the DATA 110 dataset folder, I came upon the bee_colony_loss dataset. I was instantly reminded that for my birthday by auntie provides me with her freshly homemade baklava made from her own beehive. I was devastated from the thought if bees were to extinct, I would never be able to taste my aunties delicious baklava again. From there, I was intrigued and wanted to explore this dataset.
Packages required for this project:
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.5 v dplyr 1.0.7
## v tidyr 1.1.4 v stringr 1.4.0
## v readr 2.0.2 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library("dplyr")
library("ggplot2")
library("plotly")
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library("magrittr")
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
##
## set_names
## The following object is masked from 'package:tidyr':
##
## extract
library("ggthemes")
library("usmap")
library("knitr")
# Setting working directory
setwd("C:/Users/Jerem/OneDrive/Documents/Montgomery College/Fall 2021/DATA 110/Project 2")
# Provide a fast and friendly way to read rectangular data
library(readxl)
# rereading dataset into bees
bees <- read_csv("bee_colony_loss.csv")
## Rows: 365 Columns: 8
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (6): Year, Season, ÿState, ÿTotal Annual Loss, ÿBeekeepers Exclusive to ...
## dbl (2): ÿBeekeepers, ÿColonies
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#View(bees)
str(bees)
## spec_tbl_df [365 x 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Year : chr [1:365] "2016/17" "2016/17" "2016/17" "2016/17" ...
## $ Season : chr [1:365] "Annual" "Annual" "Annual" "Annual" ...
## $ ÿState : chr [1:365] "Massachusetts" "Montana" "Nevada" "Maine" ...
## $ ÿTotal Annual Loss : chr [1:365] "15.90%" "17.10%" "23.00%" "23.30%" ...
## $ ÿBeekeepers : num [1:365] 87 21 13 65 18 10 9 52 30 62 ...
## $ ÿBeekeepers Exclusive to State: chr [1:365] "94.30%" "52.40%" "92.30%" "93.80%" ...
## $ ÿColonies : num [1:365] 27186 35905 2512 41102 6521 ...
## $ ÿColonies Exclusive to State : chr [1:365] "2.30%" "0.30%" "5.20%" "1.40%" ...
## - attr(*, "spec")=
## .. cols(
## .. Year = col_character(),
## .. Season = col_character(),
## .. ÿState = col_character(),
## .. `ÿTotal Annual Loss` = col_character(),
## .. ÿBeekeepers = col_double(),
## .. `ÿBeekeepers Exclusive to State` = col_character(),
## .. ÿColonies = col_double(),
## .. `ÿColonies Exclusive to State` = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
Each title of each columns besides Year and State has a letter y with a diaeresis next to it. For convenience stake, I am giving each column a new name.
# Give new column names
colnames(bees) <- c("Year", "Season", "State","Annual.Percent.Loss", "Beekeeper.Count", "Beekeepers.Percent.Exclusive.to.State","Colony.Count","Colonies.Percent.Exclusive.to.State")
head(bees)
## # A tibble: 6 x 8
## Year Season State Annual.Percent.~ Beekeeper.Count Beekeepers.Perc~
## <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 2016/17 Annual Massachusetts 15.90% 87 94.30%
## 2 2016/17 Annual Montana 17.10% 21 52.40%
## 3 2016/17 Annual Nevada 23.00% 13 92.30%
## 4 2016/17 Annual Maine 23.30% 65 93.80%
## 5 2016/17 Annual Wyoming 23.40% 18 77.80%
## 6 2016/17 Annual Hawaii 26.20% 10 100.00%
## # ... with 2 more variables: Colony.Count <dbl>,
## # Colonies.Percent.Exclusive.to.State <chr>
Several columns are factors when they should be numeric.
# removing % symbols for pure numeric values
bees[] <- lapply(bees, gsub, pattern = '%', replacement = '')
bees$Annual.Percent.Loss <- as.numeric(bees$Annual.Percent.Loss)
bees$Beekeepers.Percent.Exclusive.to.State <- as.numeric(bees$Beekeepers.Percent.Exclusive.to.State)
bees$Colonies.Percent.Exclusive.to.State <- as.numeric(bees$Colonies.Percent.Exclusive.to.State)
head(bees)
## # A tibble: 6 x 8
## Year Season State Annual.Percent.~ Beekeeper.Count Beekeepers.Perc~
## <chr> <chr> <chr> <dbl> <chr> <dbl>
## 1 2016/17 Annual Massachusetts 15.9 87 94.3
## 2 2016/17 Annual Montana 17.1 21 52.4
## 3 2016/17 Annual Nevada 23 13 92.3
## 4 2016/17 Annual Maine 23.3 65 93.8
## 5 2016/17 Annual Wyoming 23.4 18 77.8
## 6 2016/17 Annual Hawaii 26.2 10 100
## # ... with 2 more variables: Colony.Count <chr>,
## # Colonies.Percent.Exclusive.to.State <dbl>
Beekeeper Count and Colony Count randomly turned into characters. Here we changed them back to numeric values
bees$Beekeeper.Count<- as.integer(bees$Beekeeper.Count)
bees$Colony.Count<- as.integer(bees$Colony.Count)
bees$Colony.Count<- as.numeric(bees$Colony.Count)
head(bees)
## # A tibble: 6 x 8
## Year Season State Annual.Percent.~ Beekeeper.Count Beekeepers.Perc~
## <chr> <chr> <chr> <dbl> <int> <dbl>
## 1 2016/17 Annual Massachusetts 15.9 87 94.3
## 2 2016/17 Annual Montana 17.1 21 52.4
## 3 2016/17 Annual Nevada 23 13 92.3
## 4 2016/17 Annual Maine 23.3 65 93.8
## 5 2016/17 Annual Wyoming 23.4 18 77.8
## 6 2016/17 Annual Hawaii 26.2 10 100
## # ... with 2 more variables: Colony.Count <dbl>,
## # Colonies.Percent.Exclusive.to.State <dbl>
In this project, I am using plotly to make interactive maps. States need to be abbreviated so choropleth maps and plotly can work properly.
State Abbreviation dataset was obtain from World Population Review website: https://worldpopulationreview.com/states/state-abbreviations
# Storing state code abbreviation
states <- read_csv("states.csv")
## Rows: 51 Columns: 3
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (3): State, Abbrev, Code
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#view(states)
# Joining both datasets together
new_bees <-bees %>%
inner_join(states, by.x = State, by.y = state) %>%
select(Year, Season, Code, Annual.Percent.Loss, Beekeeper.Count, Beekeepers.Percent.Exclusive.to.State, Colony.Count, Colonies.Percent.Exclusive.to.State)
## Joining, by = "State"
head(new_bees)
## # A tibble: 6 x 8
## Year Season Code Annual.Percent.Loss Beekeeper.Count Beekeepers.Percent.E~
## <chr> <chr> <chr> <dbl> <int> <dbl>
## 1 2016/17 Annual MA 15.9 87 94.3
## 2 2016/17 Annual MT 17.1 21 52.4
## 3 2016/17 Annual NV 23 13 92.3
## 4 2016/17 Annual ME 23.3 65 93.8
## 5 2016/17 Annual WY 23.4 18 77.8
## 6 2016/17 Annual HI 26.2 10 100
## # ... with 2 more variables: Colony.Count <dbl>,
## # Colonies.Percent.Exclusive.to.State <dbl>
Later on in my analysis, I found it troublesome to work with the leading years in the Year column. I removed the leading years from the year column.
# the Year column only shows the first 4 characters in each row
new_bees$Year <- substr(new_bees$Year,1,4)
head(new_bees)
## # A tibble: 6 x 8
## Year Season Code Annual.Percent.Loss Beekeeper.Count Beekeepers.Percent.Exc~
## <chr> <chr> <chr> <dbl> <int> <dbl>
## 1 2016 Annual MA 15.9 87 94.3
## 2 2016 Annual MT 17.1 21 52.4
## 3 2016 Annual NV 23 13 92.3
## 4 2016 Annual ME 23.3 65 93.8
## 5 2016 Annual WY 23.4 18 77.8
## 6 2016 Annual HI 26.2 10 100
## # ... with 2 more variables: Colony.Count <dbl>,
## # Colonies.Percent.Exclusive.to.State <dbl>
Adding a new variables called “hover” for my plotly text descriptions.
new_bees$hover <- with(new_bees, paste("State:", new_bees$Code,
"<br>",
"Beekeeper Count:", new_bees$Beekeeper.Count,
"<br>",
"Beekeepers %:", new_bees$Beekeepers.Percent.Exclusive.to.State,
"<br>",
"Colony Count:", new_bees$Colony.Count,
"<br>",
"Colonies %:", new_bees$Colonies.Percent.Exclusive.to.State))
Checking how many rows that have NA values.
row.has.na<- apply(new_bees,1,function(x){any(is.na(x))})
sum(row.has.na)
## [1] 13
According to the raw dataset there are 13 rows with NA values. To make sure there are no value errors within the analysis, I am removing all NA value rows.
# Removing "NA"s
df_bees <- new_bees[!row.has.na,]
str(df_bees)
## tibble [342 x 9] (S3: tbl_df/tbl/data.frame)
## $ Year : chr [1:342] "2016" "2016" "2016" "2016" ...
## $ Season : chr [1:342] "Annual" "Annual" "Annual" "Annual" ...
## $ Code : chr [1:342] "MA" "MT" "NV" "ME" ...
## $ Annual.Percent.Loss : num [1:342] 15.9 17.1 23 23.3 23.4 26.2 26.3 26.6 27.3 29.2 ...
## $ Beekeeper.Count : int [1:342] 87 21 13 65 18 10 9 52 30 62 ...
## $ Beekeepers.Percent.Exclusive.to.State: num [1:342] 94.3 52.4 92.3 93.8 77.8 100 22.2 94.2 83.3 82.3 ...
## $ Colony.Count : num [1:342] 27186 35905 2512 41102 6521 ...
## $ Colonies.Percent.Exclusive.to.State : num [1:342] 2.3 0.3 5.2 1.4 1.4 100 0.2 79.8 1.9 4.2 ...
## $ hover : chr [1:342] "State: MA <br> Beekeeper Count: 87 <br> Beekeepers %: 94.3 <br> Colony Count: 27186 <br> Colonies %: 2.3" "State: MT <br> Beekeeper Count: 21 <br> Beekeepers %: 52.4 <br> Colony Count: 35905 <br> Colonies %: 0.3" "State: NV <br> Beekeeper Count: 13 <br> Beekeepers %: 92.3 <br> Colony Count: 2512 <br> Colonies %: 5.2" "State: ME <br> Beekeeper Count: 65 <br> Beekeepers %: 93.8 <br> Colony Count: 41102 <br> Colonies %: 1.4" ...
df_bees %>% head() %>% knitr::kable()
| Year | Season | Code | Annual.Percent.Loss | Beekeeper.Count | Beekeepers.Percent.Exclusive.to.State | Colony.Count | Colonies.Percent.Exclusive.to.State | hover |
|---|---|---|---|---|---|---|---|---|
| 2016 | Annual | MA | 15.9 | 87 | 94.3 | 27186 | 2.3 | State: MA Beekeeper Count: 87 Beekeepers %: 94.3 Colony Count: 27186 Colonies %: 2.3 |
| 2016 | Annual | MT | 17.1 | 21 | 52.4 | 35905 | 0.3 | State: MT Beekeeper Count: 21 Beekeepers %: 52.4 Colony Count: 35905 Colonies %: 0.3 |
| 2016 | Annual | NV | 23.0 | 13 | 92.3 | 2512 | 5.2 | State: NV Beekeeper Count: 13 Beekeepers %: 92.3 Colony Count: 2512 Colonies %: 5.2 |
| 2016 | Annual | ME | 23.3 | 65 | 93.8 | 41102 | 1.4 | State: ME Beekeeper Count: 65 Beekeepers %: 93.8 Colony Count: 41102 Colonies %: 1.4 |
| 2016 | Annual | WY | 23.4 | 18 | 77.8 | 6521 | 1.4 | State: WY Beekeeper Count: 18 Beekeepers %: 77.8 Colony Count: 6521 Colonies %: 1.4 |
| 2016 | Annual | HI | 26.2 | 10 | 100.0 | 84 | 100.0 | State: HI Beekeeper Count: 10 Beekeepers %: 100 Colony Count: 84 Colonies %: 100 |
To begin with, I want to address if there is actually an annual loss of honeybees throughout the years.
bee_bargraph <- df_bees %>%
ggplot(aes(x = Year, y = Annual.Percent.Loss)) +
geom_col(aes(fill = Year)) +
labs(title = "Annual Loss Throughout 2010 - 2016",
x = "Year",
y = "Annual Loss")
bee_bargraph
Certainly, the annual loss of honeybees has spiked since 2011 by 00%. On average, 00% of honeybees disappear each year.
### Make Box plots of % loss by year
bee_boxplots <- ggplot(df_bees, aes(Year, Annual.Percent.Loss)) +
geom_boxplot(fill = 7) +
scale_y_continuous(name = "Annual Percent Loss",
breaks = seq(0, 100, 10),
limits=c(0, 100)) +
ggtitle("Box Plot of Annual Percent Loss by Year") +
theme_dark() +
scale_colour_brewer(palette="Accent")
bee_boxplots
The variability from year to year is quite high, but majority of colony loss hasn’t change overall. The year of 2010 is quite eye catching. The variability on annual loss was high before the initial incline of annual loss.
summary(df_bees)
## Year Season Code Annual.Percent.Loss
## Length:342 Length:342 Length:342 Min. : 7.50
## Class :character Class :character Class :character 1st Qu.:30.50
## Mode :character Mode :character Mode :character Median :39.85
## Mean :41.22
## 3rd Qu.:48.65
## Max. :86.90
## Beekeeper.Count Beekeepers.Percent.Exclusive.to.State Colony.Count
## Min. : 5.00 Min. : 3.90 Min. : 29
## 1st Qu.: 23.00 1st Qu.: 85.62 1st Qu.: 1046
## Median : 55.00 Median : 92.90 Median : 5740
## Mean : 87.97 Mean : 87.05 Mean : 31906
## 3rd Qu.:112.75 3rd Qu.: 96.78 3rd Qu.: 31220
## Max. :828.00 Max. :100.00 Max. :625897
## Colonies.Percent.Exclusive.to.State hover
## Min. : 0.00 Length:342
## 1st Qu.: 3.50 Class :character
## Median : 20.35 Mode :character
## Mean : 38.87
## 3rd Qu.: 79.50
## Max. :100.00
Perhaps some appropriate questions associates with the dataset include: How does the number of beekeepers in a state relate to the number of bee colonies? Do particular states attract more bee colonies? Do particular states attract more bee keepers? Is there a correlation between the number of beekeepers vs number of bees and the annual loss across states? Is there a correlation between the four cardinal directions and the annual loss?
#color palette didn't work
Honey <- c("#E0900E", "#FBc349", "F8B51E", "#FFFDDE", "#DCC4A4" )
bee_map1 <- plot_geo(df_bees,
locationmode = 'USA-states',
frame = ~Year) %>%
add_trace(locations = ~Code,
z = ~Annual.Percent.Loss,
zmin = 0,
zmax = max(df_bees$Annual.Percent.Loss),
color = ~Annual.Percent.Loss,
colorscale = "heat",
text = ~hover,
hoverinfo = 'text') %>%
layout(geo = list(scope = 'usa'),
title = "Annual Percent Loss of Honeybees in the US\n2010 - 2016") %>%
config(displayModeBar = FALSE) %>%
colorbar(ticksuffix = "%")
bee_map1
Missing color is due to the removal of NA values.
Scrolling through the years it seems like the east has a higher annual loss than the west.
df_bees %>%
group_by(Code) %>%
tally()
## # A tibble: 50 x 2
## Code n
## <chr> <int>
## 1 AL 7
## 2 AR 7
## 3 AZ 7
## 4 CA 7
## 5 CO 7
## 6 CT 7
## 7 DC 5
## 8 DE 7
## 9 FL 7
## 10 GA 7
## # ... with 40 more rows
On average, each state has 7 rows of data - one row per year excluding District of Columbia (5 rows of data), Mississippi (6 rows of data), Nevada (5 rows of data), South Dakota (5 rows of data), and Wyoming (5 rows of data).
From this project analysis, we can conclude that there has been a large disappearance of honeybees throughout the years of 2010 - 2016.
Time wasn’t on my side for this project. I would of like to explore the factors for the disappearance of honeybees and the questions I was curious about the dataset.
Does the number of beekeepers in a state relate to the number of bee colonies?
Do particular states attract more bee colonies? Do particular states attract more bee keepers?
Is there a correlation between the number of beekeepers vs number of bees and the annual loss across states?
Is there a correlation between the four cardinal directions and the annual loss?
In addition, if I had more time, I would of like to find and join others dataset to explore the massive decreases of the honeybee’s population each year.
Dataset I would of joined:
Stressors of honeybees
Temperature of each state throughout the years
Correlation between civilization, human impact, and honeybees
[1] EPA, Environmental Protection Agency, https://www.epa.gov/pollinator-protection/colony-collapse-disorder.
[2] Medicine, Center for Veterinary. “Helping Agriculture’s Helpful Honey Bees.” U.S. Food and Drug Administration, FDA, https://www.fda.gov/animal-veterinary/animal-health-literacy/helping-agricultures-helpful-honey-bees.
[3] Crane, Eva (1983) The Archaeology of Beekeeping, Cornell University Press, ISBN 0-8014-1609-4