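library(readxl)   # read_excel()
library(dplyr)    # filter()
library(ggplot2)  # ggplot() and geoms
# read the Excel file
colony <- read_excel("../00_data/myData.xlsx")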
colony
## # A tibble: 1,222 × 10
## year months state colon…¹ colon…² colon…³ colon…⁴ colon…⁵ colon…⁶ colon…⁷
## <dbl> <chr> <chr> <dbl> <chr> <dbl> <dbl> <chr> <chr> <chr>
## 1 2015 January-… Alab… 7000 7000 1800 26 2800 250 4
## 2 2015 January-… Ariz… 35000 35000 4600 13 3400 2100 6
## 3 2015 January-… Arka… 13000 14000 1500 11 1200 90 1
## 4 2015 January-… Cali… 1440000 1690000 255000 15 250000 124000 7
## 5 2015 January-… Colo… 3500 12500 1500 12 200 140 1
## 6 2015 January-… Conn… 3900 3900 870 22 290 NA NA
## 7 2015 January-… Flor… 305000 315000 42000 13 54000 25000 8
## 8 2015 January-… Geor… 104000 105000 14500 14 47000 9500 9
## 9 2015 January-… Hawa… 10500 10500 380 4 3400 760 7
## 10 2015 January-… Idaho 81000 88000 3700 4 2600 8000 9
## # … with 1,212 more rows, and abbreviated variable names ¹colony_n,
## # ²colony_max, ³colony_lost, ⁴colony_lost_pct, ⁵colony_added, ⁶colony_reno,
## # ⁷colony_reno_pct
Which states have the most bee colony losses in 2015?
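Before plotting, the question can also be answered with a table. The sketch below ranks states by the total number of colonies lost in one year. It assumes the columns shown in the tibble above (`year`, `state`, `colony_lost`). `rank_states_by_loss()` is a hypothetical helper, not part of any package. Summing across each state's rows is one reasonable choice, since each row covers a different time span.

```r
# Sketch: rank states by total colonies lost in one year.
# Assumes columns year, state and colony_lost as in the tibble above.
# rank_states_by_loss() is a hypothetical helper name.
rank_states_by_loss <- function(colony, target_year = 2015) {
  # keep only rows from the requested year (which() skips NA years)
  one_year <- colony[which(colony$year == target_year),
                     c("state", "colony_lost")]
  # each state has several rows (one per time span), so add them up;
  # rows with a missing colony_lost are dropped by aggregate()'s default na.action
  totals <- aggregate(colony_lost ~ state, data = one_year, FUN = sum)
  # largest losses first
  totals[order(totals$colony_lost, decreasing = TRUE), , drop = FALSE]
}
```

For example, `head(rank_states_by_loss(colony, 2015), 5)` would list the five states with the largest 2015 totals.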
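ggplot(data = colony) +
  geom_point(mapping = aes(x = state, y = colony_lost_pct, color = factor(year))) +
  # rotate state names so the x-axis labels do not overlap
  theme(axis.text.x = element_text(angle = 90, hjust = 1))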
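ggplot(data = colony, mapping = aes(x = state, y = colony_lost_pct)) +
  geom_point(mapping = aes(color = year)) +
  # filter() here only changes the data for the smoothing layer;
  # the points above still show every year
  geom_smooth(data = filter(colony, year == 2015), se = FALSE)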
While I was able to create a plot that lets me answer my question, I was not able to filter out the data I don't need, which makes the graph difficult to read. Each state has several 2015 data points because each row covers a different span of months. I attempted to filter the data in the second plot but was unsuccessful.
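One possible fix is a minimal sketch under these assumptions: subset the data before it reaches the plot, rather than inside a single layer, and collapse each state's several time-span rows into one value. It assumes the `year`, `state` and `colony_lost_pct` columns shown above. `state_loss_pct()` is a hypothetical helper. Taking a plain mean of the percentages is a simplifying choice and is not weighted by colony counts.

```r
# Sketch: one value per state for a given year, so a plot shows one point per state.
# Assumes columns year, state and colony_lost_pct as in the tibble above.
# state_loss_pct() is a hypothetical helper name.
state_loss_pct <- function(colony, target_year = 2015) {
  # subset the data itself; filtering inside one layer leaves other layers untouched
  one_year <- colony[which(colony$year == target_year),
                     c("state", "colony_lost_pct")]
  # average the percentage lost across the time spans within the year;
  # rows with a missing percentage are dropped by aggregate()'s default na.action
  aggregate(colony_lost_pct ~ state, data = one_year, FUN = mean)
}
```

The returned data frame could then be passed as the `data` argument of `ggplot()`, for example with `aes(x = reorder(state, colony_lost_pct), y = colony_lost_pct)` and `geom_col()`, so every layer uses only the 2015 values.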