Import data

# excel file (read_excel() comes from the readxl package)
library(readxl)
colony <- read_excel("../00_data/myData.xlsx")
colony
## # A tibble: 1,222 × 10
##     year months    state colon…¹ colon…² colon…³ colon…⁴ colon…⁵ colon…⁶ colon…⁷
##    <dbl> <chr>     <chr>   <dbl> <chr>     <dbl>   <dbl> <chr>   <chr>   <chr>  
##  1  2015 January-… Alab…    7000 7000       1800      26 2800    250     4      
##  2  2015 January-… Ariz…   35000 35000      4600      13 3400    2100    6      
##  3  2015 January-… Arka…   13000 14000      1500      11 1200    90      1      
##  4  2015 January-… Cali… 1440000 1690000  255000      15 250000  124000  7      
##  5  2015 January-… Colo…    3500 12500      1500      12 200     140     1      
##  6  2015 January-… Conn…    3900 3900        870      22 290     NA      NA     
##  7  2015 January-… Flor…  305000 315000    42000      13 54000   25000   8      
##  8  2015 January-… Geor…  104000 105000    14500      14 47000   9500    9      
##  9  2015 January-… Hawa…   10500 10500       380       4 3400    760     7      
## 10  2015 January-… Idaho   81000 88000      3700       4 2600    8000    9      
## # … with 1,212 more rows, and abbreviated variable names ¹​colony_n,
## #   ²​colony_max, ³​colony_lost, ⁴​colony_lost_pct, ⁵​colony_added, ⁶​colony_reno,
## #   ⁷​colony_reno_pct
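The printout shows that colony_max, colony_added, colony_reno and colony_reno_pct were read as <chr> rather than <dbl>. This usually means some cells hold text that is not a number, but the actual entries are not visible here. Below is a minimal sketch of converting such columns with dplyr's across(). It runs on a small made-up tibble (colony_toy, with hypothetical state names and values) rather than on the real file:

```r
library(dplyr)

# Hypothetical stand-in for `colony`: numeric values stored as text, including an
# assumed non-numeric placeholder "(Z)" and a missing cell
colony_toy <- tibble(
  state = c("State A", "State B", "State C"),
  colony_added = c("100", "250", "(Z)"),
  colony_reno_pct = c("4", "6", NA)
)

# Convert the text columns to numbers; entries that are not numbers become NA
colony_clean <- colony_toy %>%
  mutate(across(c(colony_added, colony_reno_pct),
                ~ suppressWarnings(as.numeric(.x))))

str(colony_clean)
```

On the real data, the same mutate() call would list all four <chr> columns. suppressWarnings() hides the "NAs introduced by coercion" message, so check which values turned into NA. If you know which placeholder strings the sheet uses, you can pass them to read_excel()'s na argument to mark them as missing at import time. For example, na = c("", "(Z)") would work, where "(Z)" is only an assumed placeholder.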

State one question

Which states had the most bee colony losses in 2015, measured as the percentage of colonies lost (colony_lost_pct)?
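The question can also be answered with a table before any plotting. The sketch below assumes dplyr and a made-up colony_toy tibble that shares column names with colony; the states, periods and numbers are hypothetical. It keeps only the 2015 rows and collapses each state's several time spans into one row:

```r
library(dplyr)

# Hypothetical stand-in for `colony`: two 2015 periods per state, plus one 2016 row
colony_toy <- tibble(
  year = c(2015, 2015, 2015, 2015, 2016),
  months = c("Period 1", "Period 2", "Period 1", "Period 2", "Period 1"),
  state = c("State A", "State A", "State B", "State B", "State A"),
  colony_lost = c(100, 300, 50, 70, 999),
  colony_lost_pct = c(10, 20, 5, 9, 40)
)

# Rank states by 2015 losses, one row per state
losses_2015 <- colony_toy %>%
  filter(year == 2015) %>%                               # drop every other year
  group_by(state) %>%
  summarise(
    total_lost = sum(colony_lost, na.rm = TRUE),         # colonies lost across all 2015 periods
    mean_lost_pct = mean(colony_lost_pct, na.rm = TRUE)  # unweighted mean of per-period percentages
  ) %>%
  arrange(desc(total_lost))                              # largest total loss first

losses_2015
```

Summing colony_lost favours states with many colonies. Averaging colony_lost_pct instead puts small and large states on the same scale. Because the average here is unweighted across time spans, treat it as a rough ranking.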

Plot data

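library(ggplot2)
# Lost-colony percentage per state; factor(year) gives one discrete colour per year
ggplot(data = colony) + 
  geom_point(mapping = aes(x = state, y = colony_lost_pct, color = factor(year)))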

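library(dplyr)  # provides filter(); without it, R falls back to stats::filter()
ggplot(data = colony, mapping = aes(x = state, y = colony_lost_pct)) + 
  geom_point(mapping = aes(color = factor(year))) + 
  geom_smooth(data = filter(colony, year == 2015), se = FALSE)  # year is numeric, so compare to 2015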
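Each state has several 2015 rows. One way to make the plot readable is to summarise first and then draw one bar per state. The sketch below uses hypothetical toy data (colony_toy, with made-up states and values) and calls reorder() so the bars are sorted by their value:

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for `colony` (state names and numbers are made up)
colony_toy <- tibble(
  year = c(2015, 2015, 2015, 2015, 2016),
  state = c("State A", "State A", "State B", "State B", "State B"),
  colony_lost_pct = c(10, 20, 5, 9, 40)
)

# Collapse the several 2015 time spans into one value per state
pct_2015 <- colony_toy %>%
  filter(year == 2015) %>%
  group_by(state) %>%
  summarise(mean_lost_pct = mean(colony_lost_pct, na.rm = TRUE))

# One bar per state, ordered from lowest to highest average loss
p <- ggplot(pct_2015, aes(x = reorder(state, mean_lost_pct), y = mean_lost_pct)) +
  geom_col() +
  coord_flip() +   # put state names on the vertical axis so they do not overlap
  labs(x = "State", y = "Mean colonies lost in 2015 (%)")

p  # draw the plot
```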

Interpret

I was able to create a plot that addresses my question, but I was not able to filter out the data I do not need, which makes the graph difficult to read. Each state has several 2015 data points because each row covers a different time span (the months column). I attempted to filter the data in the second plot but was unsuccessful.
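There is a likely reason the filter had no visible effect. The data argument passed to geom_smooth() replaces the data for that one layer only. geom_point() still inherits the full colony tibble from ggplot() and draws every year. geom_smooth() is also meant for a numeric x axis, so with the categorical state on x it is unlikely to add a useful line. Filtering before calling ggplot() makes every layer use only the 2015 rows. The sketch below demonstrates this on made-up data; colony_toy and its values are hypothetical:

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for `colony`
colony_toy <- tibble(
  year = c(2015, 2015, 2015, 2015, 2016),
  months = c("Period 1", "Period 2", "Period 1", "Period 2", "Period 1"),
  state = c("State A", "State A", "State B", "State B", "State A"),
  colony_lost_pct = c(10, 20, 5, 9, 40)
)

# Filter first, so every layer inherits only the 2015 rows
colony_2015 <- filter(colony_toy, year == 2015)

p <- ggplot(data = colony_2015, mapping = aes(x = state, y = colony_lost_pct)) +
  geom_point(mapping = aes(color = months)) +  # one colour per time span
  coord_flip() +                               # state names on the vertical axis
  labs(x = "State", y = "Colonies lost (%)", color = "Months")

p  # draw the plot
```

Colouring by months keeps the different time spans apart. coord_flip() moves the state names to the vertical axis, where they do not overlap.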