1 Packages and data

Load packages and read in datasets for class exercises:

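library(tidyverse)  # provides %>%, dplyr verbs, ggplot2, and fct_reorder()
library(ggridges)   # provides geom_density_ridges()

fish.abundance <- read.csv(file = "data/fish_abundance.csv")
site.info <- read.csv(file = "data/site_info.csv")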

2 Exercise 1

Re-familiarize yourself with the “fish_abundance.csv” dataset. It includes many fish species, which were counted across different sites and depths. Fish diversity often decreases with depth on coral reefs, so let’s explore whether there is a relationship between depth and species richness.

2.1 Part I

  1. Simplify the dataset to include species, site, depth.
  2. Use distinct() to make sure no species were recorded twice.
  3. Calculate species richness by site and depth.
summary(fish.abundance)
##     surveyid           country              site              sitelat      
##  Min.   :  4000720   Length:6159        Length:6159        Min.   :-14.69  
##  1st Qu.:  4000733   Class :character   Class :character   1st Qu.:-14.69  
##  Median :  4000750   Mode  :character   Mode  :character   Median :-14.66  
##  Mean   :402351719                                         Mean   :-14.67  
##  3rd Qu.:912347536                                         3rd Qu.:-14.65  
##  Max.   :912347584                                         Max.   :-14.65  
##     sitelong      surveydate            depth           family         
##  Min.   :145.4   Length:6159        Min.   : 1.000   Length:6159       
##  1st Qu.:145.4   Class :character   1st Qu.: 3.000   Class :character  
##  Median :145.4   Mode  :character   Median : 4.500   Mode  :character  
##  Mean   :145.5                      Mean   : 5.569                     
##  3rd Qu.:145.5                      3rd Qu.: 6.000                     
##  Max.   :145.5                      Max.   :16.000                     
##     genus             species              block           total       
##  Length:6159        Length:6159        Min.   :1.000   Min.   :   1.0  
##  Class :character   Class :character   1st Qu.:1.000   1st Qu.:   1.0  
##  Mode  :character   Mode  :character   Median :1.000   Median :   3.0  
##                                        Mean   :1.497   Mean   :  31.6  
##                                        3rd Qu.:2.000   3rd Qu.:  12.0  
##                                        Max.   :2.000   Max.   :8000.0  
##     genspe         
##  Length:6159       
##  Class :character  
##  Mode  :character  
##                    
##                    
## 
fish.sprich <- fish.abundance %>%
  select(site, depth, genspe) %>%
  distinct() %>%
  group_by(site, depth) %>%
  summarize(sprich = n())
## `summarise()` has grouped output by 'site'. You can override using the
## `.groups` argument.
fish.sprich
## # A tibble: 43 × 3
## # Groups:   site [15]
##    site                depth sprich
##    <chr>               <dbl>  <int>
##  1 Bird Islets           2.5    107
##  2 Bird Islets           3      117
##  3 Bird Islets           3.1     64
##  4 Bird Islets          10       40
##  5 Blue Hole             2       68
##  6 Blue Hole             3.5     71
##  7 Blue Hole             9       65
##  8 Blue Hole            10       58
##  9 Blue Lagoon Channel   5       88
## 10 Blue Lagoon Channel   8       85
## # … with 33 more rows
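
As a quick check on the logic of this pipeline, the sketch below uses a small made-up table (the `toy` data frame and its values are invented for illustration and are not taken from fish_abundance.csv). It suggests that distinct() followed by n() gives the same richness as n_distinct(), whereas n() on its own would count a repeated species twice.

```r
library(dplyr)

# Made-up data: species "A" is recorded twice at site "X", depth 3
toy <- tibble(
  site   = c("X", "X", "X", "Y"),
  depth  = c(3, 3, 3, 5),
  genspe = c("A", "A", "B", "A")
)

# Approach used above: drop duplicate rows, then count rows per group
rich.distinct <- toy %>%
  select(site, depth, genspe) %>%
  distinct() %>%
  group_by(site, depth) %>%
  summarize(sprich = n(), .groups = "drop")

# Equivalent shortcut: count unique species directly
rich.ndistinct <- toy %>%
  group_by(site, depth) %>%
  summarize(sprich = n_distinct(genspe), .groups = "drop")

# Without distinct(), the duplicate record inflates the count
rich.naive <- toy %>%
  group_by(site, depth) %>%
  summarize(sprich = n(), .groups = "drop")

rich.distinct
rich.ndistinct
rich.naive
```

Setting `.groups = "drop"` also silences the grouping message and returns an ungrouped tibble, which may be convenient for later plotting.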

2.2 Part II

  1. Plot depth versus species richness.
  2. Add site as a color variable.
fish.sprich.plot <- ggplot(fish.sprich, aes(x = depth, y = sprich, color = site)) +
  geom_point()
fish.sprich.plot

2.3 Part III

  1. Produce another plot, only including the families Acanthuridae, Chaetodontidae, Pomacentridae, and Serranidae.
  2. Use facet_wrap() or facet_grid() to create separate facets for each family.
fish.sprich.fam <- fish.abundance %>%
  filter(family %in% c("Acanthuridae", "Chaetodontidae", "Serranidae", "Pomacentridae")) %>%
  select(site, depth, family, genspe) %>%
  distinct() %>%
  group_by(site, depth, family) %>%
  summarize(sprich = n()) 
## `summarise()` has grouped output by 'site', 'depth'. You can override using the
## `.groups` argument.
fish.sprich.fam
## # A tibble: 162 × 4
## # Groups:   site, depth [43]
##    site        depth family         sprich
##    <chr>       <dbl> <chr>           <int>
##  1 Bird Islets   2.5 Acanthuridae        9
##  2 Bird Islets   2.5 Chaetodontidae      6
##  3 Bird Islets   2.5 Pomacentridae      29
##  4 Bird Islets   2.5 Serranidae          2
##  5 Bird Islets   3   Acanthuridae        5
##  6 Bird Islets   3   Chaetodontidae      9
##  7 Bird Islets   3   Pomacentridae      29
##  8 Bird Islets   3   Serranidae          4
##  9 Bird Islets   3.1 Acanthuridae        3
## 10 Bird Islets   3.1 Chaetodontidae      7
## # … with 152 more rows
fish.sprich.fam.plot <- ggplot(fish.sprich.fam, aes(x = depth, y = sprich, color = site)) +
  geom_point() + 
  facet_wrap(.~family, scales = "free")
fish.sprich.fam.plot

3 Exercise 2

In the last plot from Exercise 1, it appears as though some sites have higher species richness than others. Let’s examine why species richness varies across sites using “site_info.csv,” which includes metadata on the exposure of each site.

3.1 Part I

  1. Create a dataset that summarizes species richness by surveyid and site.
  2. Join this species richness data with the site exposure metadata.
fish.sprich.site <- fish.abundance %>%
  select(surveyid, site, genspe) %>%
  distinct() %>%
  group_by(surveyid, site) %>%
  summarize(sprich = n()) %>%
  left_join(site.info)
## `summarise()` has grouped output by 'surveyid'. You can override using the
## `.groups` argument.
## Joining, by = "site"
fish.sprich.site
## # A tibble: 62 × 4
## # Groups:   surveyid [62]
##    surveyid site                 sprich exposure
##       <int> <chr>                 <int> <chr>   
##  1  4000720 Watsons Bay north        87 lagoon  
##  2  4000721 Watsons Bay north        84 lagoon  
##  3  4000722 Watsons-Turtle Reef      71 lagoon  
##  4  4000723 Watsons-Turtle Reef      82 lagoon  
##  5  4000724 Horseshoe Reef           60 lagoon  
##  6  4000725 Horseshoe Reef           68 lagoon  
##  7  4000726 Vickys Reef              66 lagoon  
##  8  4000727 Vickys Reef              82 lagoon  
##  9  4000728 Mermaid Cove dropoff     88 exposed 
## 10  4000729 Mermaid Cove dropoff     68 exposed 
## # … with 52 more rows
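
left_join() above picks the join column automatically (here "site", as the message reports). The sketch below, on made-up tables whose names and values are invented for illustration, shows one way to name the key explicitly and to use anti_join() to check for sites that are missing from the metadata.

```r
library(dplyr)

# Made-up richness table and site metadata (not the course data)
toy.rich <- tibble(
  surveyid = c(1L, 2L, 3L),
  site     = c("Reef A", "Reef A", "Reef B"),
  sprich   = c(80L, 75L, 60L)
)
toy.info <- tibble(
  site     = c("Reef A"),
  exposure = c("lagoon")
)

# Name the key explicitly so the join does not depend on column matching
toy.joined <- toy.rich %>%
  left_join(toy.info, by = "site")

# Rows of toy.rich whose site has no match in toy.info
toy.unmatched <- toy.rich %>%
  anti_join(toy.info, by = "site")

toy.joined
toy.unmatched
```

In this toy case "Reef B" has no metadata, so its exposure is NA after the join and it is the only row returned by anti_join().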

3.2 Part II

  1. Use a violin plot to visualize the species richness of exposed vs. lagoon sites.
fish.sprich.site.plot <- ggplot(fish.sprich.site, aes(x = exposure, y = sprich, fill = exposure)) +
  geom_violin(draw_quantiles = c(0.025, 0.5, 0.975)) +
  geom_jitter(width = 0.1)
fish.sprich.site.plot

3.3 Part III

  1. Calculate the total abundance of each family in each survey.
  2. Plot the per-survey family abundance using density curves.
  3. Bonus: Transform the x-axis to make the plot more useful.
# 1) Calculate the total abundance of each family in each survey.
fish.abun.survey <- fish.abundance %>%
  group_by(surveyid, family) %>%
  summarize(total.fish = sum(total))
## `summarise()` has grouped output by 'surveyid'. You can override using the
## `.groups` argument.
fish.abun.survey
## # A tibble: 1,249 × 3
## # Groups:   surveyid [62]
##    surveyid family         total.fish
##       <int> <chr>               <int>
##  1  4000720 Acanthuridae           37
##  2  4000720 Blenniidae              9
##  3  4000720 Carangidae              2
##  4  4000720 Chaetodontidae         33
##  5  4000720 Gobiidae                5
##  6  4000720 Haemulidae              3
##  7  4000720 Labridae              235
##  8  4000720 Lutjanidae              1
##  9  4000720 Microdesmidae          19
## 10  4000720 Mugilidae               4
## # … with 1,239 more rows
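
The summary above adds up counts within each survey and family. For comparison, here is a minimal sketch on made-up counts (the `toy` table and the `mean.fish` column are invented for illustration) showing how sum() and mean() differ when a family has several rows in one survey.

```r
library(dplyr)

# Made-up counts: Labridae has two rows (two species) in survey 1
toy <- tibble(
  surveyid = c(1L, 1L, 1L, 2L),
  family   = c("Labridae", "Labridae", "Gobiidae", "Labridae"),
  total    = c(10, 30, 5, 8)
)

toy.summary <- toy %>%
  group_by(surveyid, family) %>%
  summarize(
    total.fish = sum(total),   # all individuals counted in the family
    mean.fish  = mean(total),  # mean count per recorded row
    .groups = "drop"
  )

toy.summary
```

For Labridae in survey 1 the total is 40 while the mean per row is 20, so the choice of summary function changes what the density curves describe.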
# 2) Plot the per-survey family abundance using density curves (ridgelines).
fish.abun.plot <- ggplot(fish.abun.survey,
                         aes(x = total.fish, y = family)) +
  geom_density_ridges(alpha = 0.5, fill = "steelblue")
fish.abun.plot
## Picking joint bandwidth of 19.9

# 3) Bonus: Transform the x-axis to make the plot more useful.
# log10-transform total.fish so that rare and abundant families fit on one axis
# use fct_reorder() to order the families on the y-axis by their total sum of fish
# use the rel_min_height argument of geom_density_ridges() to cut the tails
fish.abun.plot2 <- ggplot(fish.abun.survey,
                          aes(x = log10(total.fish),
                              y = fct_reorder(family, total.fish, .fun = sum))) +
  geom_density_ridges(alpha = 0.5, rel_min_height = 0.005, fill = "steelblue") +
  xlab("Total abundance per survey (log10)") +
  ylab("Fish family")
fish.abun.plot2
## Picking joint bandwidth of 0.215
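
To make the two bonus steps more concrete, the sketch below uses made-up totals (the vectors and their values are invented for illustration). It shows how fct_reorder() with `.fun = sum` orders factor levels by ascending group sum, and how log10() turns tenfold differences into equal steps.

```r
library(forcats)

# Made-up per-survey totals for three families
family     <- factor(c("Gobiidae", "Labridae", "Gobiidae", "Labridae", "Mugilidae"))
total.fish <- c(5, 200, 10, 300, 1)

# Levels are ordered by the ascending sum of total.fish within each family
family.ordered <- fct_reorder(family, total.fish, .fun = sum)
levels(family.ordered)

# log10 maps 1, 10, 100, 1000 onto evenly spaced values
log10(c(1, 10, 100, 1000))
```

Because ggplot2 draws the first level of a discrete y-axis at the bottom, the family with the largest sum ends up at the top of the ridgeline plot. Note that log10(0) is -Inf; the minimum of `total` in this dataset is 1, so that case does not arise here.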