Import your data

library(tidyverse)
myData <- read_csv("../00_data/myData.csv")
## Rows: 1222 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): months, state
## dbl (8): year, colony_n, colony_max, colony_lost, colony_lost_pct, colony_ad...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
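
The message above goes away once the column types are declared up front. The sketch below shows this on a two-row inline sample rather than the real file, so it runs without the CSV. The sample rows are copied from the output further down, only four of the ten columns are declared, and it assumes readr 2.0 or later, where `I()` marks literal data.

```r
library(readr)

# Inline sample standing in for ../00_data/myData.csv (an assumption for illustration).
sample_csv <- I(paste0(
  "year,months,state,colony_n\n",
  "2015,January-March,Alabama,7000\n",
  "2015,January-March,Arizona,35000\n"
))

# Declaring the types explicitly means read_csv() has nothing to guess or report.
bees_sample <- read_csv(
  sample_csv,
  col_types = cols(
    year     = col_double(),
    months   = col_character(),
    state    = col_character(),
    colony_n = col_double()
  )
)
```

For the real file the same `col_types` argument would be passed alongside the file path, extended to cover the remaining columns.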

### Define Question

I’m investigating whether the different stressors on honeybee colonies influence the percentage of colonies lost.

### Explain Data and variables

Long to wide form

myData%>%
    
    pivot_wider(names_from = "year",
                values_from = "colony_lost")
## # A tibble: 1,222 × 15
##    months    state colon…¹ colon…² colon…³ colon…⁴ colon…⁵ colon…⁶ `2015` `2016`
##    <chr>     <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>  <dbl>  <dbl>
##  1 January-… Alab…    7000    7000      26    2800     250       4   1800     NA
##  2 January-… Ariz…   35000   35000      13    3400    2100       6   4600     NA
##  3 January-… Arka…   13000   14000      11    1200      90       1   1500     NA
##  4 January-… Cali… 1440000 1690000      15  250000  124000       7 255000     NA
##  5 January-… Colo…    3500   12500      12     200     140       1   1500     NA
##  6 January-… Conn…    3900    3900      22     290      NA      NA    870     NA
##  7 January-… Flor…  305000  315000      13   54000   25000       8  42000     NA
##  8 January-… Geor…  104000  105000      14   47000    9500       9  14500     NA
##  9 January-… Hawa…   10500   10500       4    3400     760       7    380     NA
## 10 January-… Idaho   81000   88000       4    2600    8000       9   3700     NA
## # … with 1,212 more rows, 5 more variables: `2017` <dbl>, `2018` <dbl>,
## #   `2019` <dbl>, `2020` <dbl>, `2021` <dbl>, and abbreviated variable names
## #   ¹​colony_n, ²​colony_max, ³​colony_lost_pct, ⁴​colony_added, ⁵​colony_reno,
## #   ⁶​colony_reno_pct
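
`pivot_wider()` takes the data from long to wide. Because only `colony_lost` is spread, every other column still identifies its own row, which is why the result keeps all 1,222 rows and fills most of the year columns with `NA`. The opposite direction, wide to long, uses `pivot_longer()`. The sketch below is a minimal illustration on three rows copied from the output above; the new column names `measure` and `colonies` are my own choices, not names from the data.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Rows copied from the tibble output above (January-March 2015).
bees <- tribble(
  ~state,     ~colony_lost, ~colony_added, ~colony_reno,
  "Alabama",          1800,          2800,          250,
  "Arizona",          4600,          3400,         2100,
  "Arkansas",         1500,          1200,           90
)

# Gather the three count columns into name/value pairs: one row per state and measure.
bees_long <- bees %>%
  pivot_longer(
    cols      = starts_with("colony_"),
    names_to  = "measure",   # hypothetical name for the former column names
    values_to = "colonies"   # hypothetical name for the counts
  )
```

The long form has one row per state and measure, which is the shape ggplot2 expects when mapping `measure` to colour or facets.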

Separating and Uniting

Separate a column

myData%>%
    
    separate(col = colony_n, into = c("colony gained", "colony lost"))
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1166 rows [1, 2,
## 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
## # A tibble: 1,222 × 11
##     year months    state colon…¹ colon…² colon…³ colon…⁴ colon…⁵ colon…⁶ colon…⁷
##    <dbl> <chr>     <chr> <chr>   <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1  2015 January-… Alab… 7000    <NA>       7000    1800      26    2800     250
##  2  2015 January-… Ariz… 35000   <NA>      35000    4600      13    3400    2100
##  3  2015 January-… Arka… 13000   <NA>      14000    1500      11    1200      90
##  4  2015 January-… Cali… 1440000 <NA>    1690000  255000      15  250000  124000
##  5  2015 January-… Colo… 3500    <NA>      12500    1500      12     200     140
##  6  2015 January-… Conn… 3900    <NA>       3900     870      22     290      NA
##  7  2015 January-… Flor… 305000  <NA>     315000   42000      13   54000   25000
##  8  2015 January-… Geor… 104000  <NA>     105000   14500      14   47000    9500
##  9  2015 January-… Hawa… 10500   <NA>      10500     380       4    3400     760
## 10  2015 January-… Idaho 81000   <NA>      88000    3700       4    2600    8000
## # … with 1,212 more rows, 1 more variable: colony_reno_pct <dbl>, and
## #   abbreviated variable names ¹​`colony gained`, ²​`colony lost`, ³​colony_max,
## #   ⁴​colony_lost, ⁵​colony_lost_pct, ⁶​colony_added, ⁷​colony_reno
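
The warning above appears because `colony_n` holds a single number per row, so `separate()` finds nothing to split and fills the second piece with `NA`. A column that does hold two pieces is `months`. The sketch below assumes every value has the form "Start-End" with a single hyphen, as in the "January-March" rows shown above; the names `month_start` and `month_end` are hypothetical.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Two rows copied from the output above.
quarters <- tibble(
  state  = c("Alabama", "Arizona"),
  months = c("January-March", "January-March")
)

# Split the quarter label at the hyphen into its first and last month.
quarters_split <- quarters %>%
  separate(col = months, into = c("month_start", "month_end"), sep = "-")
```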

Unite two columns

myData %>%
    
    unite(col = "colony_n", c(colony_added, colony_lost), sep = "/")
## # A tibble: 1,222 × 8
##     year months        state       colony_max colony_l…¹ colon…² colon…³ colon…⁴
##    <dbl> <chr>         <chr>            <dbl>      <dbl> <chr>     <dbl>   <dbl>
##  1  2015 January-March Alabama           7000         26 2800/1…     250       4
##  2  2015 January-March Arizona          35000         13 3400/4…    2100       6
##  3  2015 January-March Arkansas         14000         11 1200/1…      90       1
##  4  2015 January-March California     1690000         15 250000…  124000       7
##  5  2015 January-March Colorado         12500         12 200/15…     140       1
##  6  2015 January-March Connecticut       3900         22 290/870      NA      NA
##  7  2015 January-March Florida         315000         13 54000/…   25000       8
##  8  2015 January-March Georgia         105000         14 47000/…    9500       9
##  9  2015 January-March Hawaii           10500          4 3400/3…     760       7
## 10  2015 January-March Idaho            88000          4 2600/3…    8000       9
## # … with 1,212 more rows, and abbreviated variable names ¹​colony_lost_pct,
## #   ²​colony_n, ³​colony_reno, ⁴​colony_reno_pct
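# This pastes colonies added and colonies lost into one column (added/lost) so the two counts sit side by side.
# Note: naming the result colony_n replaces the original colony_n column.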
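
Uniting the two counts gives a text column, which is easy to read but cannot be used in calculations. If the goal is to compare added and lost colonies numerically, one option is to subtract them with `mutate()`. This is a sketch on three rows copied from the output above; the column name `colony_net` is hypothetical.

```r
library(tibble)
library(dplyr)

# Rows copied from the tibble output above (January-March 2015).
bees <- tribble(
  ~state,        ~colony_added, ~colony_lost,
  "Alabama",              2800,         1800,
  "Arizona",              3400,         4600,
  "Connecticut",           290,          870
)

# Positive values mean more colonies were added than lost in that quarter.
bees_net <- bees %>%
  mutate(colony_net = colony_added - colony_lost)
```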

Visualizing distributions

# This shows the distribution of colony counts (colony_n), with one line per state
myData %>%
     ggplot(aes(x = colony_n, color = state)) +
     geom_freqpoly()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 47 rows containing non-finite values (stat_bin).

Typical values

myData %>%
    # This shows the distribution of colony loss percentages above 10%
    
    # Keep only rows where the loss percentage exceeds 10
    filter(colony_lost_pct > 10) %>%
    
    #Plot
    ggplot(aes(x = colony_lost_pct)) +
    geom_histogram(binwidth = 0.1)

### Unusual values

# This zooms the y-axis to counts between 0 and 50 so that rarely occurring loss percentages become visible

myData %>%
    ggplot(aes(x = colony_lost_pct)) +
    geom_histogram() +
    coord_cartesian(ylim = c(0,50))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 54 rows containing non-finite values (stat_bin).
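
Once the zoomed histogram reveals unusual values, it helps to pull out the rows behind them. The sketch below filters on a cutoff and sorts the result. The cutoff of 20 is a hypothetical value chosen so the small sample returns something; on the full data it should be read off the histogram.

```r
library(tibble)
library(dplyr)

# Rows copied from the tibble output earlier in this document.
bees <- tribble(
  ~state,        ~colony_lost, ~colony_lost_pct,
  "Alabama",             1800,               26,
  "Arizona",             4600,               13,
  "Arkansas",            1500,               11,
  "Connecticut",          870,               22
)

# Hypothetical cutoff; pick one after looking at the histogram.
cutoff <- 20

# Keep the rows above the cutoff, highest loss percentage first.
unusual <- bees %>%
  filter(colony_lost_pct > cutoff) %>%
  arrange(desc(colony_lost_pct))
```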

A categorical and continuous variable

# This shows the distribution of colony loss percentage for each state, with states in alphabetical order.
myData %>%
    
    ggplot(aes(x = state, y = colony_lost_pct)) +
    geom_boxplot()
## Warning: Removed 54 rows containing non-finite values (stat_boxplot).
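
With this many states on the x-axis, alphabetical order makes it hard to see which states lose the most. One possible refinement is to order the states by their median loss percentage and flip the axes. The helper name `order_states_by_loss` is hypothetical, and the sample holds only one quarter per state (copied from the output earlier), so each box collapses to a line here; on the full data the boxes would fill out.

```r
library(tibble)
library(dplyr)
library(forcats)
library(ggplot2)

# Drop missing percentages, then order the state factor by median loss (lowest first).
order_states_by_loss <- function(data) {
  data %>%
    filter(!is.na(colony_lost_pct)) %>%
    mutate(state = fct_reorder(state, colony_lost_pct, .fun = median))
}

# Rows copied from the tibble output earlier in this document.
bees <- tribble(
  ~state,     ~colony_lost_pct,
  "Alabama",                26,
  "Arizona",                13,
  "Arkansas",               11
)

bees_ordered <- order_states_by_loss(bees)

loss_plot <- ggplot(bees_ordered, aes(x = state, y = colony_lost_pct)) +
  geom_boxplot() +
  coord_flip()  # horizontal boxes keep long state names readable
```

On the full data the call would be `order_states_by_loss(myData)`, which also removes the missing-value warning because the `NA` rows are dropped explicitly.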

Two continuous variables

# This shows the number of colonies lost against the colony loss percentage
library(hexbin)
myData %>%
    ggplot(aes(x = colony_lost, y = colony_lost_pct)) +
    geom_hex()
## Warning: Removed 54 rows containing non-finite values (stat_binhex).
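
The hex plot shows the relationship visually; a rank correlation puts a single number on it. Spearman's method is used in this sketch because the colony counts span several orders of magnitude, which is my choice rather than something the analysis above specifies. The five rows are copied from the output earlier, so the result describes only that sample, not the full data.

```r
library(tibble)

# Rows copied from the tibble output earlier in this document.
bees <- tribble(
  ~state,        ~colony_lost, ~colony_lost_pct,
  "Alabama",             1800,               26,
  "Arizona",             4600,               13,
  "Arkansas",            1500,               11,
  "California",        255000,               15,
  "Connecticut",          870,               22
)

# Rank-based correlation; complete.obs drops rows where either value is NA.
rho <- cor(bees$colony_lost, bees$colony_lost_pct,
           method = "spearman", use = "complete.obs")
```

A value near zero would mean that states losing many colonies in absolute terms do not systematically lose a larger share of them.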

### Analyze data to answer the question

Tibble 1: This tibble shows each state alongside the rest of the data. You can see the percentage of colonies lost and the percentage renovated, as well as each state's total colony count.

Tibble 2: This tibble compares colonies added with colonies lost across states.

Tibble 3: In this tibble you can see the maximum number of bee colonies in each state, where California stands out with 1.69 million. The colony loss percentage for January-March 2015 is also notably high in Alabama at 26% and in Connecticut at 22%, possibly due to rapidly changing weather patterns or disease among the bee population.

Graph 1: This graph shows the distribution of colony counts by state, with states listed in alphabetical order.

Graph 2: This graph shows the colony loss percentages above 10% and indicates that most of them fall between 10% and 20%.

Graph 3: This graph shows the distribution of colony loss percentage with the count axis limited to 0-50, which makes rare, unusually high loss percentages visible. It suggests that stressors may be a major contributor to bee colony losses across states.

Graph 4: This graph shows the colony loss percentages for each U.S. state in alphabetical order. It shows that Alabama and Connecticut are among the states with the highest loss percentages, at 26% and 22%.

Graph 5: This last graph shows the relationship between the number of colonies lost and the colony loss percentage. Some observations are outliers with a loss percentage above 50% but few colonies lost in absolute terms, while others lose an extremely high number of colonies that still amounts to only a modest percentage of the state's total.
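
The state-level figures quoted above come from individual rows. A grouped summary would show whether they hold across all quarters. The sketch below is one way to do that; the helper name `summarise_loss_by_state` and the output column names are hypothetical, and the sample rows (copied from the output earlier) contain a single quarter per state, so the means here simply repeat those values.

```r
library(tibble)
library(dplyr)

# Average loss percentage per state, highest first, ignoring missing values.
summarise_loss_by_state <- function(data) {
  data %>%
    filter(!is.na(colony_lost_pct)) %>%
    group_by(state) %>%
    summarise(
      quarters      = n(),
      mean_lost_pct = mean(colony_lost_pct),
      .groups       = "drop"
    ) %>%
    arrange(desc(mean_lost_pct))
}

# Rows copied from the tibble output earlier in this document.
bees <- tribble(
  ~state,        ~colony_lost_pct,
  "Alabama",                   26,
  "Arizona",                   13,
  "Arkansas",                  11,
  "Connecticut",               22
)

loss_by_state <- summarise_loss_by_state(bees)
```

Running `summarise_loss_by_state(myData)` on the full data would average over every quarter from 2015 to 2021, which gives a fairer basis for ranking states than a single row.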

### Interpret

Overall, this data gives a good picture of bee colony losses and their percentages throughout the United States, showing both the efforts to restore colonies and the honey bee’s dwindling population across states. I believe the losses are most likely due to weather that makes conditions uninhabitable for the bees; if that isn’t the case, then disease is definitely a major factor.