Lab 3 Assignment

Author

Micah Lohr

#load packages
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0      ✔ purrr   1.0.1 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.3.0      ✔ stringr 1.5.0 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

library(lubridate)


Attaching package: 'lubridate'

The following objects are masked from 'package:base':

    date, intersect, setdiff, union

library(palmerpenguins)

Essentials:

1.) Read in https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-05-04/water.csv check the format of the date column and change it using lubridate so it is correct

#read in the water data
water<-read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-05-04/water.csv')

Rows: 473293 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (9): report_date, status_id, water_source, water_tech, facility_type, co...
dbl (4): row_id, lat_deg, lon_deg, install_year

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(water)

# A tibble: 6 × 13
  row_id lat_deg lon_deg repor…¹ statu…² water…³ water…⁴ facil…⁵ count…⁶ insta…⁷
   <dbl>   <dbl>   <dbl> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>     <dbl>
1   3957   8.07     38.6 04/06/… y       <NA>    <NA>    <NA>    Ethiop…      NA
2  33512   7.37     40.5 08/04/… y       Protec… <NA>    Improv… Ethiop…    2019
3  35125   0.773    34.9 03/18/… y       Protec… <NA>    Improv… Kenya        NA
4  37760   0.781    35.0 03/18/… y       Boreho… <NA>    Improv… Kenya        NA
5  38118   0.779    35.0 03/18/… y       Protec… <NA>    Improv… Kenya        NA
6  38501   0.308    34.1 03/19/… y       Boreho… <NA>    Improv… Kenya        NA
# … with 3 more variables: installer <chr>, pay <chr>, status <chr>, and
#   abbreviated variable names ¹report_date, ²status_id, ³water_source,
#   ⁴water_tech, ⁵facility_type, ⁶country_name, ⁷install_year

str(water)

spc_tbl_ [473,293 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ row_id       : num [1:473293] 3957 33512 35125 37760 38118 ...
 $ lat_deg      : num [1:473293] 8.073 7.374 0.773 0.781 0.779 ...
 $ lon_deg      : num [1:473293] 38.6 40.5 34.9 35 35 ...
 $ report_date  : chr [1:473293] "04/06/2017" "08/04/2020" "03/18/2015" "03/18/2015" ...
 $ status_id    : chr [1:473293] "y" "y" "y" "y" ...
 $ water_source : chr [1:473293] NA "Protected Spring" "Protected Shallow Well" "Borehole" ...
 $ water_tech   : chr [1:473293] NA NA NA NA ...
 $ facility_type: chr [1:473293] NA "Improved" "Improved" "Improved" ...
 $ country_name : chr [1:473293] "Ethiopia" "Ethiopia" "Kenya" "Kenya" ...
 $ install_year : num [1:473293] NA 2019 NA NA NA ...
 $ installer    : chr [1:473293] "Private-CRS" "WaterAid" NA NA ...
 $ pay          : chr [1:473293] NA NA NA NA ...
 $ status       : chr [1:473293] NA NA NA NA ...
 - attr(*, "spec")=
  .. cols(
  ..   row_id = col_double(),
  ..   lat_deg = col_double(),
  ..   lon_deg = col_double(),
  ..   report_date = col_character(),
  ..   status_id = col_character(),
  ..   water_source = col_character(),
  ..   water_tech = col_character(),
  ..   facility_type = col_character(),
  ..   country_name = col_character(),
  ..   install_year = col_double(),
  ..   installer = col_character(),
  ..   pay = col_character(),
  ..   status = col_character()
  .. )
 - attr(*, "problems")=<externalptr>

water$report_date<-mdy(water$report_date)
str(water)

spc_tbl_ [473,293 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ row_id       : num [1:473293] 3957 33512 35125 37760 38118 ...
 $ lat_deg      : num [1:473293] 8.073 7.374 0.773 0.781 0.779 ...
 $ lon_deg      : num [1:473293] 38.6 40.5 34.9 35 35 ...
 $ report_date  : Date[1:473293], format: "2017-04-06" "2020-08-04" ...
 $ status_id    : chr [1:473293] "y" "y" "y" "y" ...
 $ water_source : chr [1:473293] NA "Protected Spring" "Protected Shallow Well" "Borehole" ...
 $ water_tech   : chr [1:473293] NA NA NA NA ...
 $ facility_type: chr [1:473293] NA "Improved" "Improved" "Improved" ...
 $ country_name : chr [1:473293] "Ethiopia" "Ethiopia" "Kenya" "Kenya" ...
 $ install_year : num [1:473293] NA 2019 NA NA NA ...
 $ installer    : chr [1:473293] "Private-CRS" "WaterAid" NA NA ...
 $ pay          : chr [1:473293] NA NA NA NA ...
 $ status       : chr [1:473293] NA NA NA NA ...
 - attr(*, "spec")=
  .. cols(
  ..   row_id = col_double(),
  ..   lat_deg = col_double(),
  ..   lon_deg = col_double(),
  ..   report_date = col_character(),
  ..   status_id = col_character(),
  ..   water_source = col_character(),
  ..   water_tech = col_character(),
  ..   facility_type = col_character(),
  ..   country_name = col_character(),
  ..   install_year = col_double(),
  ..   installer = col_character(),
  ..   pay = col_character(),
  ..   status = col_character()
  .. )
 - attr(*, "problems")=<externalptr>

2 Read in some data from Justin’s PhD!

bzcoral<-read.csv('https://raw.githubusercontent.com/jbaumann3/BIOL234_Biostats_MHC/main/coralcover.csv')
head(bzcoral)

  site       type lat transect diver cc_percent
1    1  Back Reef   3        1     4  5.8435758
2    1  Back Reef   3        2     4  0.9505263
3    1  Back Reef   3        3     4  5.2423389
4    1  Back Reef   3        4     5  5.0040475
5    1  Back Reef   3        5     5  5.8954916
6    2 Patch Reef   3        1     4  5.2826190

3 Look at the data and generate a hypothesis to test (X, Y, and/or Z has no effect on coral cover (cc_percent), for example). cc_percent is the only numerical variable we care about here! Everything else is categorical. Write your hypothesis in BOLD below.

H0: Reef type will have no effect on coral cover (cc_percent)

Ha: Reef type will have an effect on coral cover (cc_percent)

4 Filter our the columns that you are not using for your hypothesis test

bzcoral2<-bzcoral
head(bzcoral2)

  site       type lat transect diver cc_percent
1    1  Back Reef   3        1     4  5.8435758
2    1  Back Reef   3        2     4  0.9505263
3    1  Back Reef   3        3     4  5.2423389
4    1  Back Reef   3        4     5  5.0040475
5    1  Back Reef   3        5     5  5.8954916
6    2 Patch Reef   3        1     4  5.2826190

bzcoral3<-bzcoral2[,c(2,6)]
head(bzcoral3)

        type cc_percent
1  Back Reef  5.8435758
2  Back Reef  0.9505263
3  Back Reef  5.2423389
4  Back Reef  5.0040475
5  Back Reef  5.8954916
6 Patch Reef  5.2826190

5 Using the pipe, %>%, group your data and calcualte the mean(s) you need for your visual hypothesis test

meancc<-bzcoral3 %>% 
group_by(type) %>% 
summarize(meancc = mean(cc_percent), sd=sd(cc_percent),n=n(),se=sd/sqrt(n)) 
meancc

# A tibble: 3 × 5
  type       meancc    sd     n    se
  <chr>       <dbl> <dbl> <int> <dbl>
1 Back Reef   11.9   9.52    29 1.77 
2 Nearshore    3.16  2.25    18 0.531
3 Patch Reef  12.7  11.5     30 2.11

6 Graph your results! Means + errorbars required :) Make a nice, easy to see graph with clear labels and text

ggplot(data=meancc,aes(x=type,y=meancc, color=type))+
  geom_point()+
  geom_jitter(data=bzcoral2, aes(x=type,y=cc_percent,color=type),alpha=0.3,size=0.3)+
  geom_errorbar(data=meancc,aes(x=type,ymin=meancc-se,ymax=meancc+se, width=0.2))+
  theme_classic()+
  theme(axis.text=element_text(size=10))+
  labs(x='Reef Type',y='Average Coral Cover (%)',title='Coral Cover by Reef Type')+
  theme(plot.title=element_text(hjust=0.5))

7 Assess your hypothesis! What does your graph show (Note: We did not do stats, so please do not say ‘significant’)

Based on the graph, our Ha appears to be supported even though we cannot say for sure without doing the stats: there appears to be an association between reef type and % coral cover. Specifically, nearshore reefs seem to have lower % coral cover compared to back and patch reefs.

# Depth

1: Read in intertidal transect data. View it, identify the columns that contain species/cover items and pivot from wide to long format!

op<-read.csv('https://raw.githubusercontent.com/jbaumann3/BIOL234_Biostats_MHC/main/Ocean_Point_Data%202018.csv')
head(op)

  Number  trans_description year tidal_ht_m tidal_ht_cat_num bare_rock calothr
1      0 a_cobble_protected 2017      -0.13                1         0       0
2      0 a_cobble_protected 2017      -0.13                1         5       0
3      0 a_cobble_protected 2017      -0.13                1         5       0
4      0 a_cobble_protected 2018       0.38                2        10      12
5      0 a_cobble_protected 2018       0.38                2         0      93
6      0 a_cobble_protected 2018       0.78                2         0      65
  semibal mytilus Halichondria Spongomorpha chaetomo ulva_lac ascophyl lamin_di
1       0       0            0            0        0        0        0        0
2       0       0            0            0        0        5        0        0
3       0       0            0            0        0        0        0        0
4       8       0            0            0        0        0       10        0
5       7       0            0            0        0        0        0        0
6      30       0            0            0        0        0        0        0
  lamin_sa fucus_spir fucus_vesic fucus_dist fucus_sp chondrus mastocar
1        0          0           0          0        0        0       40
2        0          0           5          0        0        0       20
3        0          0           5          0        0        0       20
4        0          0          60          0        0        0        0
5        0          0           0          0        0        0        0
6        0          0           5          0        0        0        0
  petrocel corralina ceramium lithothamn Enteromorpha tectura litt_sax litt_lit
1        0         2        0         58            0       0        0       64
2        0         5        0         60            0       0        0        0
3        0        15        0         55            0       0        0        0
4        0         0        0          0            0       0        0      112
5        0         0        0          0            0       0        0      112
6        0         0        0          0            0       0        0        0
  litt_obt nucella carcinus Cancer.crabs
1        0       0        0            0
2        0       0        0            0
3       16       0        0            0
4       80       0        0            0
5       16       0        0            0
6        0       0        0            0

long_op<- op %>%
  pivot_longer(bare_rock:Cancer.crabs,names_to='species',values_to='abundance')

head(long_op)

# A tibble: 6 × 7
  Number trans_description   year tidal_ht_m tidal_ht_cat_num species    abund…¹
   <int> <chr>              <int>      <dbl>            <int> <chr>        <dbl>
1      0 a_cobble_protected  2017      -0.13                1 bare_rock        0
2      0 a_cobble_protected  2017      -0.13                1 calothr          0
3      0 a_cobble_protected  2017      -0.13                1 semibal          0
4      0 a_cobble_protected  2017      -0.13                1 mytilus          0
5      0 a_cobble_protected  2017      -0.13                1 Halichond…       0
6      0 a_cobble_protected  2017      -0.13                1 Spongomor…       0
# … with abbreviated variable name ¹abundance

2 Filter data such that we retain only ‘species’ that are animals. These include: Semibal, Carinbus, Cancer Crabs, nucella, litt_obt, litt_lit, litt_sax. Everything else is either rock or algae. Please do this filter step AFTER you pivot to long format. Note: It is easier to do this when you are in wide format, but this is good practice! Ask questions if you need help :). Hint: The ‘|’ key is how you say ‘or’ in code :)

animalsonly<-filter(long_op,species=='semibal'|species== 'carcinus'|species== 'Cancer.crabs' | species=='nucella'|species== 'litt_obt'|species=='litt_lit'|species== 'litt_sax')
head(animalsonly)

# A tibble: 6 × 7
  Number trans_description   year tidal_ht_m tidal_ht_cat_num species  abundance
   <int> <chr>              <int>      <dbl>            <int> <chr>        <dbl>
1      0 a_cobble_protected  2017      -0.13                1 semibal          0
2      0 a_cobble_protected  2017      -0.13                1 litt_sax         0
3      0 a_cobble_protected  2017      -0.13                1 litt_lit        64
4      0 a_cobble_protected  2017      -0.13                1 litt_obt         0
5      0 a_cobble_protected  2017      -0.13                1 nucella          0
6      0 a_cobble_protected  2017      -0.13                1 carcinus         0

2 Using the same data–> rename the factors in the trans_description column based on wave exposure. a_cobble_protected is low, c_flat_profile and d_semi_expos are both moderate, e_exposed is high. Hint: use ifelse(). Justin can help!

animalsonly$wave<-ifelse(animalsonly$trans_description=="a_cobble_protected","low",
                         ifelse(animalsonly$trans_description=="e_exposed","high", "moderate"))
head(animalsonly)

# A tibble: 6 × 8
  Number trans_description   year tidal_ht_m tidal_ht_ca…¹ species abund…² wave 
   <int> <chr>              <int>      <dbl>         <int> <chr>     <dbl> <chr>
1      0 a_cobble_protected  2017      -0.13             1 semibal       0 low  
2      0 a_cobble_protected  2017      -0.13             1 litt_s…       0 low  
3      0 a_cobble_protected  2017      -0.13             1 litt_l…      64 low  
4      0 a_cobble_protected  2017      -0.13             1 litt_o…       0 low  
5      0 a_cobble_protected  2017      -0.13             1 nucella       0 low  
6      0 a_cobble_protected  2017      -0.13             1 carcin…       0 low  
# … with abbreviated variable names ¹tidal_ht_cat_num, ²abundance

3 Using the same data, make a simplified tidal height column. We want tidal height cat <4 to be ‘low’, >7 to be m’oderate’high’, and in between to be ‘intermediate’. Hint: You can use if_else for this as well!

animalsonly$tidalheight<-ifelse(animalsonly$tidal_ht_cat_num<4,"low",
                                ifelse(animalsonly$tidal_ht_cat_num>7,"high","intermediate"))
head(animalsonly)

# A tibble: 6 × 9
  Number trans_description   year tidal_…¹ tidal…² species abund…³ wave  tidal…⁴
   <int> <chr>              <int>    <dbl>   <int> <chr>     <dbl> <chr> <chr>  
1      0 a_cobble_protected  2017    -0.13       1 semibal       0 low   low    
2      0 a_cobble_protected  2017    -0.13       1 litt_s…       0 low   low    
3      0 a_cobble_protected  2017    -0.13       1 litt_l…      64 low   low    
4      0 a_cobble_protected  2017    -0.13       1 litt_o…       0 low   low    
5      0 a_cobble_protected  2017    -0.13       1 nucella       0 low   low    
6      0 a_cobble_protected  2017    -0.13       1 carcin…       0 low   low    
# … with abbreviated variable names ¹tidal_ht_m, ²tidal_ht_cat_num, ³abundance,
#   ⁴tidalheight

4 Using the same dataframe, build 2 new dataframes, one calculating mean and error (standard error) for semibal (barnacle) abundance and one doing the same for nucella (whelk) abundance by tidal height and wave exposure (trans_description)

semibal<-filter(animalsonly,species=='semibal') %>%
group_by(wave,tidalheight) %>%
summarize(mean = mean(abundance), sd=sd(abundance),n=n(),se=sd/sqrt(n))

`summarise()` has grouped output by 'wave'. You can override using the
`.groups` argument.

semibal

# A tibble: 9 × 6
# Groups:   wave [3]
  wave     tidalheight    mean    sd     n    se
  <chr>    <chr>         <dbl> <dbl> <int> <dbl>
1 high     high          0      0       12 0    
2 high     intermediate 32.4   30.4     18 7.17 
3 high     low           6.11   7.82     9 2.61 
4 low      high          0      0        9 0    
5 low      intermediate  2.24   5.24    17 1.27 
6 low      low           6.11   9.53     9 3.18 
7 moderate high          0      0       18 0    
8 moderate intermediate 14.4   20.8     36 3.47 
9 moderate low           0.593  1.45    27 0.279

nucella<-filter(animalsonly,species=='nucella') %>%
group_by(wave,tidalheight) %>%
summarize(mean = mean(abundance), sd=sd(abundance),n=n(),se=sd/sqrt(n))

`summarise()` has grouped output by 'wave'. You can override using the
`.groups` argument.

nucella

# A tibble: 9 × 6
# Groups:   wave [3]
  wave     tidalheight   mean    sd     n    se
  <chr>    <chr>        <dbl> <dbl> <int> <dbl>
1 high     high          0     0       12  0   
2 high     intermediate  0     0       18  0   
3 high     low           1.78  5.33     9  1.78
4 low      high          0     0        9  0   
5 low      intermediate  0     0       17  0   
6 low      low           0     0        9  0   
7 moderate high          0     0       18  0   
8 moderate intermediate  4.44 13.0     36  2.17
9 moderate low           2.96  8.92    27  1.72

5 Plot barnacle and snail abundance + error across tidal heights & wave exposures. You can plot them both on the same graphs (merging your dataframes could help) or you can make multiple graphs and patchwork them together.

semibal$species="semibal"
nucella$species="nucella"
combined <-rbind(semibal,nucella)
combined

# A tibble: 18 × 7
# Groups:   wave [3]
   wave     tidalheight    mean    sd     n    se species
   <chr>    <chr>         <dbl> <dbl> <int> <dbl> <chr>  
 1 high     high          0      0       12 0     semibal
 2 high     intermediate 32.4   30.4     18 7.17  semibal
 3 high     low           6.11   7.82     9 2.61  semibal
 4 low      high          0      0        9 0     semibal
 5 low      intermediate  2.24   5.24    17 1.27  semibal
 6 low      low           6.11   9.53     9 3.18  semibal
 7 moderate high          0      0       18 0     semibal
 8 moderate intermediate 14.4   20.8     36 3.47  semibal
 9 moderate low           0.593  1.45    27 0.279 semibal
10 high     high          0      0       12 0     nucella
11 high     intermediate  0      0       18 0     nucella
12 high     low           1.78   5.33     9 1.78  nucella
13 low      high          0      0        9 0     nucella
14 low      intermediate  0      0       17 0     nucella
15 low      low           0      0        9 0     nucella
16 moderate high          0      0       18 0     nucella
17 moderate intermediate  4.44  13.0     36 2.17  nucella
18 moderate low           2.96   8.92    27 1.72  nucella

pd=position_dodge(width=0.2)

ggplot(data=combined,aes(x=tidalheight,y=mean, color=wave))+
  geom_point(position=pd)+
  geom_errorbar(data=combined,aes(x=tidalheight,ymin=mean-se,ymax=mean+se, width=0.2),position=pd)+
  theme_classic()+
  theme(axis.text=element_text(size=10))+
  labs(x='Tidal Height',y='Average Species Abundance')+
  theme(plot.title=element_text(hjust=0.5))+
  facet_wrap(~species)

6 Write a short interpretative statement. We didn’t run any stats, so avoid the word ‘significant.’ How do snail and whelk abundances appear to vary by tidal height and wave exposure? I do not need you to tell me anything about the ecology of the system, but feel free to do so if you’d like :) I just need you to interpret your graphs.

It appears that neither nucella or semibal appear at high tidal heights, regardless of wave exposure. At intermediate tidal heights, it appears that nucella and semibal abundances are different at different wave exposures. The most striking difference in their average abundances is at intermediate tidal heights combined with high wave exposure. It appears that the semibal is most abundance in these conditions of intermediate tidal height and high wave exposure. Nucella are most abundant at intermediate tidal height and moderate wave exposure.