This dataset describes farmed bee colonies across the United States. Its variables include the year, the state in which the colonies reside, the number of colonies, the number of beekeepers, and the percentage of colonies and beekeepers exclusive to that state.
Load the libraries and the dataset
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
New names:
• `` -> `...1`
Rows: 581 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): Year, Season, State
dbl (6): ...1, Total.Annual.Loss, Beekeepers, Beekeepers.Exclusive.to.State,...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 6 × 10
...1 year season state total.annual.loss beekeepers beekeepers.exclusive…¹
<dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 1 2020/21 Annual Alas… NA NA NA
2 2 2020/21 Annual Dist… NA NA NA
3 3 2020/21 Annual Guam NA NA NA
4 4 2020/21 Annual Hawa… NA NA NA
5 5 2020/21 Annual Neva… NA NA NA
6 6 2020/21 Annual Puer… NA NA NA
# ℹ abbreviated name: ¹beekeepers.exclusive.to.state
# ℹ 3 more variables: colonies <dbl>, colonies.exclusive.to.state <dbl>,
# year_new <chr>
# A tibble: 10 × 2
state colony_total
<chr> <dbl>
1 California 3528138
2 North Dakota 1683256
3 Texas 1175026
4 Idaho 693353
5 Washington 661741
6 Minnesota 522269
7 Florida 492600
8 Nebraska 447400
9 Oregon 357551
10 Maine 309406
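The code that produced this ranking is not shown. The following is a minimal sketch of one way such a table could be built, assuming a data frame with `state` and `colonies` columns like the preview above. The data frame `bee_toy` and its values are invented for illustration and are not taken from the dataset.

```r
library(dplyr)

# Toy stand-in for the cleaned bee data; values are invented for illustration.
bee_toy <- data.frame(
  state    = c("A", "A", "B", "B", "C", "C"),
  colonies = c(100, 150, 40, NA, 300, 20)
)

# Sum colonies per state across all rows, ignoring missing values,
# then keep the states with the largest totals (n = 10 for the real data).
top_states <- bee_toy |>
  group_by(state) |>
  summarise(colony_total = sum(colonies, na.rm = TRUE)) |>
  slice_max(colony_total, n = 2)

top_states
```

Using `na.rm = TRUE` matters here because several rows in the preview contain `NA` for `colonies`; without it, any state with a missing value would get a total of `NA`.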
Create the alluvial graph combining all of the new variables.
# geom_alluvium() comes from the ggalluvial package, which must be loaded in addition to the tidyverse
library(ggalluvial)

ggalluv <- bee_colony2 |>
  # Keep only the ten states with the largest colony totals
  filter(state %in% c("California", "North Dakota", "Texas", "Idaho", "Washington",
                      "Minnesota", "Florida", "Nebraska", "Oregon", "Maine")) |>
  ggplot(aes(x = year_new, y = colonies, alluvium = state)) +
  theme_bw() +
  # One flow per state, coloured by state
  geom_alluvium(aes(fill = state), color = "white", width = .1, alpha = .8, decreasing = FALSE) +
  scale_fill_manual(values = c("#92351e", "#fbe3c2", "#d37750", "#ecb27d", "#f7c267",
                               "#591c19", "#EF6C00", "#b64f32", "#d39a2d", "#BF360C")) +
  labs(title = "Number of Bee Colonies in the Top 10 Producing States between 2010-2020",
       x = "Year", y = "Number of Colonies", fill = "State",
       caption = "Source: USDA (National Agricultural Statistics Service)")

# Print the plot
ggalluv
Essay
This dataset is large, so I had to filter it to present meaningful information in my visualization. Initially, I attempted to narrow down the 50 states it provided to five regions (Midwest, Northeast, South, West, and other). I did this using “mutate” to create a new variable, “regions”. Although I decided against presenting this, it helped me get a sense of the similarities among states within a region and the great differences in bee colonies between regions. Later, I filtered for the 10 states with the greatest number of farmed bee colonies. This made for an interesting comparison of which states raise the most bee colonies and how the numbers change over time. I wanted to present the timeline of this fluctuation, but the data set’s “year” variable was hard to work with because of its slash-separated format (for example, 2020/21), so I used “str_remove” inside “mutate” to create a new year column that could be understood in the visualization. In addition, I used the “unique” function to identify the most relevant years for my graph and eliminate any filler or redundant years included in the data set.
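The cleaning steps described above can be sketched as follows. This is an illustration under assumptions, not the original code: the toy data frame `bee_toy`, the regular expression passed to `str_remove`, and the partial state-to-region lookup are all hypothetical.

```r
library(dplyr)
library(stringr)

# Toy rows in the same shape as the dataset's year and state columns.
bee_toy <- data.frame(
  year  = c("2019/20", "2020/21", "2020/21"),
  state = c("California", "Maine", "Texas")
)

bee_clean <- bee_toy |>
  mutate(
    # Drop the "/21"-style suffix so "2020/21" becomes "2020" (assumed pattern).
    year_new = str_remove(year, "/\\d+$"),
    # Hypothetical region lookup; only the ten plotted states are mapped here.
    region = case_when(
      state %in% c("California", "Oregon", "Washington", "Idaho") ~ "West",
      state %in% c("North Dakota", "Minnesota", "Nebraska") ~ "Midwest",
      state %in% c("Texas", "Florida") ~ "South",
      state %in% c("Maine") ~ "Northeast",
      TRUE ~ "Other"
    )
  )

# List the distinct years available before deciding which ones to plot.
years_available <- unique(bee_clean$year_new)
```

Keeping `year_new` as a character column, as in the preview above, lets it act as a discrete axis in the plot.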
I found that California was consistently the state with the highest number of bee colonies, and the state with the lowest (of the 10 states) alternated between Idaho and Nebraska. Although the states differed starkly, every one of them experienced a sudden, steep drop in bee colonies in 2017. This is due to the “Varroa destructor” mite infecting populations across the country. This parasite killed off around 1/3 of the nation's bee population, which caused great damage to surrounding ecosystems and to agricultural crops. This devastating loss was brief, but bee populations never fully recovered to their pre-disaster numbers, for numerous reasons including ongoing disease, parasites, and pesticide use.