Import data

# packages (readxl for read_excel, tidyverse for ggplot)
library(tidyverse)
library(readxl)
# excel file
data <- read_excel("../00_data/MyData3.xlsx")
data
## # A tibble: 1,222 × 10
##     year months        state  colony_size colony_max colony_lost colony_lost_pct
##    <dbl> <chr>         <chr>        <dbl> <chr>            <dbl>           <dbl>
##  1  2015 January-March Alaba…        7000 7000              1800              26
##  2  2015 January-March Arizo…       35000 35000             4600              13
##  3  2015 January-March Arkan…       13000 14000             1500              11
##  4  2015 January-March Calif…     1440000 1690000         255000              15
##  5  2015 January-March Color…        3500 12500             1500              12
##  6  2015 January-March Conne…        3900 3900               870              22
##  7  2015 January-March Flori…      305000 315000           42000              13
##  8  2015 January-March Georg…      104000 105000           14500              14
##  9  2015 January-March Hawaii       10500 10500              380               4
## 10  2015 January-March Idaho        81000 88000             3700               4
## # ℹ 1,212 more rows
## # ℹ 3 more variables: colony_added <chr>, colony_reno <chr>,
## #   colony_reno_pct <chr>

State one question

What is the relationship between the initial colony size and colony losses across all states?

Plot data

ggplot(data = data) +
    geom_point(mapping = aes(x = colony_size, y = colony_lost_pct)) +
    scale_x_continuous(labels = scales::label_comma(scale = 1e-6, suffix = "M"))

Interpret

The plot suggests, at most, a weak negative relationship between colony_size and colony_lost_pct. Observations with a small colony_size tend to lose a larger percentage of their colonies, while observations with a large colony_size tend to show a lower loss percentage. The pattern is loose, so it should be read as a tendency rather than a firm rule.
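
One way to check how weak the relationship is would be to compute a correlation coefficient alongside the plot. The sketch below is an assumption rather than part of the original analysis: loss_size_cor is a hypothetical helper, and Spearman's rank correlation is chosen here only because colony_size is strongly skewed (a few values run far above the rest), which can distort a Pearson correlation.

```r
# Hypothetical helper (not part of the original analysis): rank correlation
# between colony_size and colony_lost_pct, dropping rows with missing values.
loss_size_cor <- function(df) {
  cor(df$colony_size, df$colony_lost_pct,
      method = "spearman", use = "complete.obs")
}

# With the tibble loaded above:
# loss_size_cor(data)
```

A value close to 0 would support the reading that the relationship is weak, and a negative sign would match the downward tendency seen in the plot.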