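# Load packages: readxl provides read_excel(), ggplot2 provides the plotting functions
library(readxl)
library(ggplot2)

# Import the Excel file
data <- read_excel("../00_data/MyData3.xlsx")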
data
## # A tibble: 1,222 × 10
## year months state colony_size colony_max colony_lost colony_lost_pct
## <dbl> <chr> <chr> <dbl> <chr> <dbl> <dbl>
## 1 2015 January-March Alaba… 7000 7000 1800 26
## 2 2015 January-March Arizo… 35000 35000 4600 13
## 3 2015 January-March Arkan… 13000 14000 1500 11
## 4 2015 January-March Calif… 1440000 1690000 255000 15
## 5 2015 January-March Color… 3500 12500 1500 12
## 6 2015 January-March Conne… 3900 3900 870 22
## 7 2015 January-March Flori… 305000 315000 42000 13
## 8 2015 January-March Georg… 104000 105000 14500 14
## 9 2015 January-March Hawaii 10500 10500 380 4
## 10 2015 January-March Idaho 81000 88000 3700 4
## # ℹ 1,212 more rows
## # ℹ 3 more variables: colony_added <chr>, colony_reno <chr>,
## # colony_reno_pct <chr>
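The preview above shows colony_max, colony_added, colony_reno and colony_reno_pct read as <chr> even though they hold counts and percentages. One possible cause, which is an assumption here and not something the output confirms, is that missing values were stored in the spreadsheet as the literal text "NA". The sketch below uses a small hypothetical data frame (the values are illustrative, not taken from the file) to show how such columns could be converted to numeric.

```r
# Hypothetical stand-in for the columns that were read as <chr>;
# the values are illustrative only and do not come from MyData3.xlsx
toy <- data.frame(
  colony_max   = c("7000", "35000", "NA"),
  colony_added = c("2800", "NA", "1200"),
  stringsAsFactors = FALSE
)

# Turn the literal string "NA" into a real missing value, then convert to numeric
to_number <- function(x) as.numeric(replace(x, x == "NA", NA))
toy[] <- lapply(toy, to_number)

# Both columns should now be reported as num
str(toy)
```

If the cause is indeed text "NA" cells, passing na = "NA" to read_excel() at import time should give the same result without a separate conversion step.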
What is the relationship between the initial colony size and colony losses across all states?
ggplot(data = data) +
  geom_point(mapping = aes(x = colony_size, y = colony_lost_pct)) +
  # Label the x-axis in millions of colonies (suffix "M")
  scale_x_continuous(labels = scales::label_comma(scale = 1e-6, suffix = "M"))
There appears to be a weak negative relationship between initial colony size and the percentage of colonies lost. States with fewer colonies (small colony_size) tend to lose a larger share of them, while states with more colonies tend to have lower percentage losses. The pattern is not strong, so it should be read as a tendency, not a firm rule.
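A visual impression like this can be checked with a correlation coefficient. The sketch below is only an illustration: it uses the ten rows shown in the printed preview, not the full 1,222-row data set, so its result says nothing reliable about all states and periods. Spearman's rank correlation is chosen here as an assumption, because colony_size is heavily right-skewed and a rank-based measure is less affected by a few very large states.

```r
# First ten rows of colony_size and colony_lost_pct, copied from the printed preview
preview <- data.frame(
  colony_size     = c(7000, 35000, 13000, 1440000, 3500, 3900, 305000, 104000, 10500, 81000),
  colony_lost_pct = c(26, 13, 11, 15, 12, 22, 13, 14, 4, 4)
)

# Spearman rank correlation; a value below 0 would indicate a negative relationship
rho <- cor(preview$colony_size, preview$colony_lost_pct,
           method = "spearman", use = "complete.obs")
rho

# The same call on the full data set would be:
# cor(data$colony_size, data$colony_lost_pct, method = "spearman", use = "complete.obs")
```

A coefficient close to 0 would support calling the relationship weak, while the sign shows its direction.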