library(tidyverse)
myData <- read_csv("../00_data/myData.csv")
## Rows: 1222 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): months, state
## dbl (8): year, colony_n, colony_max, colony_lost, colony_lost_pct, colony_ad...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
### Define Question
#I’m investigating whether the different stressors on honeybee colonies influence the percentage of colonies lost.
### Explain Data and variables
myData%>%
pivot_wider(names_from = "year",
values_from = "colony_lost")
## # A tibble: 1,222 × 15
## months state colon…¹ colon…² colon…³ colon…⁴ colon…⁵ colon…⁶ `2015` `2016`
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 January-… Alab… 7000 7000 26 2800 250 4 1800 NA
## 2 January-… Ariz… 35000 35000 13 3400 2100 6 4600 NA
## 3 January-… Arka… 13000 14000 11 1200 90 1 1500 NA
## 4 January-… Cali… 1440000 1690000 15 250000 124000 7 255000 NA
## 5 January-… Colo… 3500 12500 12 200 140 1 1500 NA
## 6 January-… Conn… 3900 3900 22 290 NA NA 870 NA
## 7 January-… Flor… 305000 315000 13 54000 25000 8 42000 NA
## 8 January-… Geor… 104000 105000 14 47000 9500 9 14500 NA
## 9 January-… Hawa… 10500 10500 4 3400 760 7 380 NA
## 10 January-… Idaho 81000 88000 4 2600 8000 9 3700 NA
## # … with 1,212 more rows, 5 more variables: `2017` <dbl>, `2018` <dbl>,
## # `2019` <dbl>, `2020` <dbl>, `2021` <dbl>, and abbreviated variable names
## # ¹colony_n, ²colony_max, ³colony_lost_pct, ⁴colony_added, ⁵colony_reno,
## # ⁶colony_reno_pct
myData%>%
separate(col = colony_n, into = c("colony gained", "colony lost"))
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1166 rows [1, 2,
## 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
## # A tibble: 1,222 × 11
## year months state colon…¹ colon…² colon…³ colon…⁴ colon…⁵ colon…⁶ colon…⁷
## <dbl> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2015 January-… Alab… 7000 <NA> 7000 1800 26 2800 250
## 2 2015 January-… Ariz… 35000 <NA> 35000 4600 13 3400 2100
## 3 2015 January-… Arka… 13000 <NA> 14000 1500 11 1200 90
## 4 2015 January-… Cali… 1440000 <NA> 1690000 255000 15 250000 124000
## 5 2015 January-… Colo… 3500 <NA> 12500 1500 12 200 140
## 6 2015 January-… Conn… 3900 <NA> 3900 870 22 290 NA
## 7 2015 January-… Flor… 305000 <NA> 315000 42000 13 54000 25000
## 8 2015 January-… Geor… 104000 <NA> 105000 14500 14 47000 9500
## 9 2015 January-… Hawa… 10500 <NA> 10500 380 4 3400 760
## 10 2015 January-… Idaho 81000 <NA> 88000 3700 4 2600 8000
## # … with 1,212 more rows, 1 more variable: colony_reno_pct <dbl>, and
## # abbreviated variable names ¹`colony gained`, ²`colony lost`, ³colony_max,
## # ⁴colony_lost, ⁵colony_lost_pct, ⁶colony_added, ⁷colony_reno
myData %>%
unite(col = "colony_n", c(colony_added, colony_lost), sep = "/")
## # A tibble: 1,222 × 8
## year months state colony_max colony_l…¹ colon…² colon…³ colon…⁴
## <dbl> <chr> <chr> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 2015 January-March Alabama 7000 26 2800/1… 250 4
## 2 2015 January-March Arizona 35000 13 3400/4… 2100 6
## 3 2015 January-March Arkansas 14000 11 1200/1… 90 1
## 4 2015 January-March California 1690000 15 250000… 124000 7
## 5 2015 January-March Colorado 12500 12 200/15… 140 1
## 6 2015 January-March Connecticut 3900 22 290/870 NA NA
## 7 2015 January-March Florida 315000 13 54000/… 25000 8
## 8 2015 January-March Georgia 105000 14 47000/… 9500 9
## 9 2015 January-March Hawaii 10500 4 3400/3… 760 7
## 10 2015 January-March Idaho 88000 4 2600/3… 8000 9
## # … with 1,212 more rows, and abbreviated variable names ¹colony_lost_pct,
## # ²colony_n, ³colony_reno, ⁴colony_reno_pct
#This shows the number of colonies added and the number of colonies lost side by side in a single column
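As a minimal sketch of how `unite()` and `separate()` mirror each other, the example below rebuilds three rows from the January-March 2015 output above as a small tibble, joins `colony_added` and `colony_lost` with `/`, and then splits them apart again. The names `sample_rows`, `united`, `restored`, and `added_lost` are illustrative and not part of the original analysis. `separate()` only finds two pieces when the column actually contains a separator, which is likely why applying it to the numeric `colony_n` column produced the warning above.

```r
library(tidyverse)

# Three rows copied from the January-March 2015 output above
sample_rows <- tribble(
  ~state,        ~colony_added, ~colony_lost,
  "Alabama",      2800,          1800,
  "Arizona",      3400,          4600,
  "Connecticut",   290,           870
)

# Join the two columns into one character column, e.g. "2800/1800"
united <- sample_rows %>%
  unite(col = "added_lost", colony_added, colony_lost, sep = "/")

# Split the column back into two; convert = TRUE turns the pieces into numbers
restored <- united %>%
  separate(col = added_lost,
           into = c("colony_added", "colony_lost"),
           sep = "/", convert = TRUE)
```

Using an explicit `sep = "/"` in both calls keeps the round trip predictable, because the default separator for `separate()` is any non-alphanumeric character.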
#This shows the distribution of the number of colonies (colony_n), with one line per state
myData %>%
ggplot(aes(x = colony_n, color = state)) +
geom_freqpoly()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 47 rows containing non-finite values (stat_bin).
#This shows the count of observations by colony lost percentage, for percentages over 10%
myData %>%
#Keep only rows where the colony lost percentage is over 10%
filter(colony_lost_pct > 10) %>%
#Plot
ggplot(aes(x = colony_lost_pct)) +
geom_histogram(binwidth = 0.1)
### Unusual values
#This shows the count by colony lost percentage, with the y-axis zoomed to counts between 0 and 50 so that rarely occurring percentages are visible
myData %>%
ggplot(aes(x = colony_lost_pct)) +
geom_histogram() +
coord_cartesian(ylim = c(0,50))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 54 rows containing non-finite values (stat_bin).
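A table can complement the zoomed histogram when looking for unusual values. The sketch below uses five rows copied from the January-March 2015 output above and lists the rows whose loss percentage exceeds a cutoff. The tibble name `sample_rows` and the 20% cutoff are illustrative assumptions, not values from the original analysis. It also counts missing values, which are the kind of rows the "Removed ... rows" warnings refer to; `colony_reno` is used for that count because these sample rows have no missing `colony_lost_pct`.

```r
library(tidyverse)

# Five rows copied from the January-March 2015 output above
sample_rows <- tribble(
  ~state,        ~colony_lost_pct, ~colony_reno,
  "Alabama",      26,               250,
  "California",   15,               124000,
  "Connecticut",  22,               NA,
  "Hawaii",        4,               760,
  "Idaho",         4,               8000
)

cutoff <- 20  # illustrative threshold, not taken from the original analysis

# Rows with an unusually high loss percentage, highest first
unusual <- sample_rows %>%
  filter(colony_lost_pct > cutoff) %>%
  arrange(desc(colony_lost_pct))

# Number of missing values in one column
n_missing <- sum(is.na(sample_rows$colony_reno))
```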
#This shows the distribution of the colony lost percentage for each state, with the states in alphabetical order.
myData %>%
ggplot(aes(x = state, y = colony_lost_pct)) +
geom_boxplot()
## Warning: Removed 54 rows containing non-finite values (stat_boxplot).
#This shows the colonies lost relative to the colony lost percentage
library(hexbin)
myData %>%
ggplot(aes(x = colony_lost, y = colony_lost_pct)) +
geom_hex()
## Warning: Removed 54 rows containing non-finite values (stat_binhex).
### Analyze data to answer the question
##Tibble 1 #This tibble spreads the colonies lost into one column per year. For each state you can also see the percentage of colonies lost, the percentage renovated, and the number of colonies.
##Tibble 2 #This tibble is meant to show colonies added versus colonies lost for each state. Note that splitting colony_n leaves the `colony lost` column as NA in most rows.
##Tibble 3 #In this tibble you can see the maximum number of bee colonies in each state, where California stands out with 1.69 million. The colony lost percentage is also notably high in Alabama at 26% and in Connecticut at 22%, possibly due to rapidly changing weather patterns or diseases among the bee population.
##Graph 1 #This graph shows the distribution of the number of bee colonies, with one line per state.
##Graph 2 #This next graph shows the colony lost percentages above 10% and suggests that most fall between 10% and 20%.
##Graph 3 #This graph shows the colony lost percentage with the count axis zoomed to between 0 and 50, which makes rare, unusually high loss percentages visible. This suggests that stressors may be a major contributor to bee colony losses among states.
##Graph 4 #This graph shows the colony lost percentages for each state in the data, in alphabetical order. Alabama and Connecticut are among the states with high percentages lost, at 26% and 22% in January-March 2015.
##Graph 5 #This last graph compares the number of colonies lost with the colony lost percentage across all of the states. In some outlying cases the colony lost percentage exceeds 50% even though few colonies were lost overall, while other states lose an extremely large number of colonies without that loss making up a large percentage of their colonies.
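The pattern described for Graph 5 follows from how the percentage relates to colony size. In the rows printed earlier, `colony_lost_pct` matches `colony_lost` divided by `colony_max`, rounded to a whole percent. This is an observation from those rows, not a documented definition of the column. The sketch below checks it on five of those rows; `sample_rows`, `checked`, and `pct_check` are illustrative names.

```r
library(tidyverse)

# Five rows copied from the January-March 2015 output above
sample_rows <- tribble(
  ~state,        ~colony_max, ~colony_lost, ~colony_lost_pct,
  "Alabama",        7000,        1800,         26,
  "California",  1690000,      255000,         15,
  "Colorado",      12500,        1500,         12,
  "Connecticut",    3900,         870,         22,
  "Hawaii",        10500,         380,          4
)

# Recompute the percentage (assumed relationship) and sort by colonies lost
checked <- sample_rows %>%
  mutate(pct_check = round(colony_lost / colony_max * 100)) %>%
  arrange(desc(colony_lost))
```

In these rows California loses by far the most colonies yet sits at 15%, while Alabama loses far fewer colonies but reaches 26%.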
### Interpret
##Overall, this data gives a useful picture of bee colony losses and their percentages throughout the United States. It shows the efforts to restore colonies as well as the honey bee’s dwindling population in different states. I believe the losses are most likely due to weather that makes the environment uninhabitable for the bees; if that isn’t the case, then disease is likely a major factor.