Attaching package: 'lubridate'
The following objects are masked from 'package:base':
date, intersect, setdiff, union
library(palmerpenguins)
Essentials:
1.) Read in https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-05-04/water.csv, check the format of the date column, and change it using lubridate so that it is correct.
# Read in the water data
water <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-05-04/water.csv')
Rows: 473293 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (9): report_date, status_id, water_source, water_tech, facility_type, co...
dbl (4): row_id, lat_deg, lon_deg, install_year
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(water)
# A tibble: 6 × 13
row_id lat_deg lon_deg repor…¹ statu…² water…³ water…⁴ facil…⁵ count…⁶ insta…⁷
<dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 3957 8.07 38.6 04/06/… y <NA> <NA> <NA> Ethiop… NA
2 33512 7.37 40.5 08/04/… y Protec… <NA> Improv… Ethiop… 2019
3 35125 0.773 34.9 03/18/… y Protec… <NA> Improv… Kenya NA
4 37760 0.781 35.0 03/18/… y Boreho… <NA> Improv… Kenya NA
5 38118 0.779 35.0 03/18/… y Protec… <NA> Improv… Kenya NA
6 38501 0.308 34.1 03/19/… y Boreho… <NA> Improv… Kenya NA
# … with 3 more variables: installer <chr>, pay <chr>, status <chr>, and
# abbreviated variable names ¹report_date, ²status_id, ³water_source,
# ⁴water_tech, ⁵facility_type, ⁶country_name, ⁷install_year
# Change the format of the date column with lubridate
water$report_date <- mdy(water$report_date)
str(water)
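As a quick check of what `mdy()` does, the sketch below parses two strings written in month/day/year order and compares the result with base R's `as.Date()` using an explicit format string. The two strings are illustrative, chosen to match the first two converted `report_date` values printed by `str(water)` (2017-04-06 and 2020-08-04).

```r
library(lubridate)

# Illustrative strings in month/day/year order
raw_dates <- c("04/06/2017", "08/04/2020")

# mdy() reads the parts as month, day, year and returns a Date vector
parsed <- mdy(raw_dates)

# Base R equivalent with an explicit format, for comparison
parsed_base <- as.Date(raw_dates, format = "%m/%d/%Y")

print(parsed)
all(parsed == parsed_base)
```

Reading the same strings with `dmy()` would give 4 June 2017 and 8 April 2020 instead, without an error, so the parsing function has to match the order used in the source file.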
spc_tbl_ [473,293 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ row_id : num [1:473293] 3957 33512 35125 37760 38118 ...
$ lat_deg : num [1:473293] 8.073 7.374 0.773 0.781 0.779 ...
$ lon_deg : num [1:473293] 38.6 40.5 34.9 35 35 ...
$ report_date : Date[1:473293], format: "2017-04-06" "2020-08-04" ...
$ status_id : chr [1:473293] "y" "y" "y" "y" ...
$ water_source : chr [1:473293] NA "Protected Spring" "Protected Shallow Well" "Borehole" ...
$ water_tech : chr [1:473293] NA NA NA NA ...
$ facility_type: chr [1:473293] NA "Improved" "Improved" "Improved" ...
$ country_name : chr [1:473293] "Ethiopia" "Ethiopia" "Kenya" "Kenya" ...
$ install_year : num [1:473293] NA 2019 NA NA NA ...
$ installer : chr [1:473293] "Private-CRS" "WaterAid" NA NA ...
$ pay : chr [1:473293] NA NA NA NA ...
$ status : chr [1:473293] NA NA NA NA ...
- attr(*, "spec")=
.. cols(
.. row_id = col_double(),
.. lat_deg = col_double(),
.. lon_deg = col_double(),
.. report_date = col_character(),
.. status_id = col_character(),
.. water_source = col_character(),
.. water_tech = col_character(),
.. facility_type = col_character(),
.. country_name = col_character(),
.. install_year = col_double(),
.. installer = col_character(),
.. pay = col_character(),
.. status = col_character()
.. )
- attr(*, "problems")=<externalptr>
site type lat transect diver cc_percent
1 1 Back Reef 3 1 4 5.8435758
2 1 Back Reef 3 2 4 0.9505263
3 1 Back Reef 3 3 4 5.2423389
4 1 Back Reef 3 4 5 5.0040475
5 1 Back Reef 3 5 5 5.8954916
6 2 Patch Reef 3 1 4 5.2826190
3 Look at the data and generate a hypothesis to test (X, Y, and/or Z has no effect on coral cover (cc_percent), for example). cc_percent is the only numerical variable we care about here! Everything else is categorical. Write your hypothesis in BOLD below.
Null hypothesis: Reef type has no effect on coral cover. Alternative hypothesis: Reef type has an effect on coral cover.
4 Filter out the columns that you are not using for your hypothesis test
type cc_percent
1 Back Reef 5.8435758
2 Back Reef 0.9505263
3 Back Reef 5.2423389
4 Back Reef 5.0040475
5 Back Reef 5.8954916
6 Patch Reef 5.2826190
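The code for this step is not shown above. A minimal sketch with dplyr's `select()`, assuming the full coral data frame is named `bzcoral` (a guess based on the later names `bzcoral_sel` and `bzcoral_mean`), could look like this. The six rows are copied from the printout above so the block runs on its own.

```r
library(dplyr)

# Hypothetical stand-in for the coral data: the six rows printed above
bzcoral <- data.frame(
  site = c(1, 1, 1, 1, 1, 2),
  type = c(rep("Back Reef", 5), "Patch Reef"),
  lat = 3,
  transect = c(1, 2, 3, 4, 5, 1),
  diver = c(4, 4, 4, 5, 5, 4),
  cc_percent = c(5.8435758, 0.9505263, 5.2423389, 5.0040475, 5.8954916, 5.2826190)
)

# Keep only the grouping variable and the response used in the hypothesis
bzcoral_sel <- bzcoral %>%
  select(type, cc_percent)

head(bzcoral_sel)
```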
5 Using the pipe, %>%, group your data and calculate the mean(s) you need for your visual hypothesis test
bzcoral_mean <- bzcoral_sel %>%
  group_by(type) %>%
  summarise(meancc_percent = mean(cc_percent), sd = sd(cc_percent), n = n(), se = sd / sqrt(n))
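To see what each summary column holds, the sketch below runs the same pipeline on the six example rows printed earlier (a stand-in for the full `bzcoral_sel`) and checks by hand that `se` is the standard deviation divided by the square root of the group size, SE = sd / sqrt(n).

```r
library(dplyr)

# Stand-in for bzcoral_sel: the six example rows shown above
bzcoral_sel <- data.frame(
  type = c(rep("Back Reef", 5), "Patch Reef"),
  cc_percent = c(5.8435758, 0.9505263, 5.2423389, 5.0040475, 5.8954916, 5.2826190)
)

bzcoral_mean <- bzcoral_sel %>%
  group_by(type) %>%
  summarise(meancc_percent = mean(cc_percent),
            sd = sd(cc_percent),   # later expressions see this summary column
            n = n(),
            se = sd / sqrt(n))

# Hand check for the Back Reef group: standard error = sd / sqrt(n)
back <- bzcoral_sel$cc_percent[bzcoral_sel$type == "Back Reef"]
se_back <- sd(back) / sqrt(length(back))

bzcoral_mean
```

With only one row in a group, as Patch Reef has in this small example, `sd()` returns `NA`, so error bars need at least two observations per group.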
6 Graph your results! Means + errorbars required :) Make a nice, easy to see graph with clear labels and text
ggplot(data = bzcoral_mean, aes(x = type, y = meancc_percent, color = type)) +
  geom_point() +
  geom_errorbar(aes(ymin = meancc_percent - se, ymax = meancc_percent + se)) +
  theme_classic() +
  labs(x = 'Reef Type', y = 'Mean Coral Cover Percent', title = 'Mean coral cover percent by reef type', color = 'Type')
7 Assess your hypothesis! What does your graph show (Note: We did not do stats, so please do not say ‘significant’)
Mean coral cover percent is lower in nearshore reefs than in back reefs and patch reefs. From looking at my graph, the data seem to support my alternative hypothesis that reef type has an effect on coral cover.

Depth:
1: Read in intertidal transect data. View it, identify the columns that contain species/cover items and pivot from wide to long format!
# All of the columns from bare_rock onward are species/cover items
# pivot_longer(first column to pivot : last column to pivot, ...)
op_long <- op %>%
  pivot_longer(bare_rock:Cancer.crabs, names_to = 'species', values_to = 'abundance')
head(op_long)
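The `op` data are not shown here, so the sketch below uses a tiny hypothetical wide table with three of the cover columns (`bare_rock`, `semibal`, `Cancer.crabs`) plus an invented ID column, `quadrat`, to show what `pivot_longer()` does: every column in the selected range becomes a row, with its name in `species` and its value in `abundance`, while columns outside the range are repeated.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table: one row per quadrat, one column per cover item
op_toy <- tibble(
  quadrat = c("q1", "q2"),
  bare_rock = c(10, 0),
  semibal = c(5, 20),
  Cancer.crabs = c(0, 1)
)

# bare_rock:Cancer.crabs selects every column from bare_rock through Cancer.crabs
op_toy_long <- op_toy %>%
  pivot_longer(bare_rock:Cancer.crabs, names_to = "species", values_to = "abundance")

op_toy_long
```

Because `:` selects by column position, the range depends on the column order in the data; moving a column outside that range would drop it from the pivot.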
2 Filter data such that we retain only ‘species’ that are animals. These include: carcinus, Cancer.crabs, nucella, litt_obt, litt_lit, litt_sax, semibal. Everything else is either rock or algae. Please do this filter step AFTER you pivot to long format. Note: It is easier to do this when you are in wide format, but this is good practice! Ask questions if you need help :). Hint: The ‘|’ key is how you say ‘or’ in code :)
op_animals <- op_long %>%
  filter(species == "carcinus" | species == "Cancer.crabs" | species == "nucella" |
           species == "litt_obt" | species == "litt_lit" | species == "litt_sax" |
           species == "semibal")
head(op_animals)
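The chain of `|` comparisons does what the assignment asks. For reference, `%in%` expresses the same test more compactly. The sketch below checks, on a small hypothetical long table (`fucus` is an invented algae label), that both forms keep the same rows.

```r
library(dplyr)

# Hypothetical long-format table mixing animals, algae, and rock
op_long_toy <- tibble(
  species = c("bare_rock", "semibal", "nucella", "fucus", "Cancer.crabs"),
  abundance = c(10, 5, 2, 30, 1)
)

animals <- c("carcinus", "Cancer.crabs", "nucella", "litt_obt",
             "litt_lit", "litt_sax", "semibal")

# Explicit OR chain, as in the assignment
with_or <- op_long_toy %>%
  filter(species == "carcinus" | species == "Cancer.crabs" | species == "nucella" |
           species == "litt_obt" | species == "litt_lit" | species == "litt_sax" |
           species == "semibal")

# Same test written with %in%
with_in <- op_long_toy %>%
  filter(species %in% animals)

identical(with_or, with_in)
```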
2 Using the same data, rename the factor levels in the trans_description column based on wave exposure: a_cobble_protected is low, c_flat_profile and d_semi_expos are both moderate, and e_exposed is high. Hint: use ifelse(). Justin can help!
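No code for this step appears above, but the later summaries group by a `wave` column with values such as "High" and "Low". One way to build it with nested `ifelse()` calls, as the hint suggests, is sketched below on a hypothetical four-row table. Any transect description not listed in the prompt is set to `NA`, so unexpected labels stand out instead of being silently lumped into a category.

```r
library(dplyr)

# Hypothetical rows covering the four transect descriptions named in the prompt
toy <- tibble(trans_description = c("a_cobble_protected", "c_flat_profile",
                                    "d_semi_expos", "e_exposed"))

toy <- toy %>%
  mutate(wave = ifelse(trans_description == "a_cobble_protected", "Low",
                ifelse(trans_description %in% c("c_flat_profile", "d_semi_expos"), "Moderate",
                ifelse(trans_description == "e_exposed", "High", NA))))

toy
```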
3 Using the same data, make a simplified tidal height column. We want a tidal height category < 4 to be 'low', > 7 to be 'high', and anything in between to be 'intermediate'. Hint: You can use if_else for this as well!
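A sketch of this step with nested `if_else()`, as the hint suggests. The name of the numeric tidal height column is not shown in this document, so `tidal_ht_cat` below is a hypothetical stand-in; the output name `tidal_ht_simp` matches the column used in the later summaries. Under these rules, values of exactly 4 or 7 fall into "Intermediate".

```r
library(dplyr)

# Hypothetical numeric tidal height categories
toy <- tibble(tidal_ht_cat = c(1, 4, 7, 8))

toy <- toy %>%
  mutate(tidal_ht_simp = if_else(tidal_ht_cat < 4, "Low",
                         if_else(tidal_ht_cat > 7, "High", "Intermediate")))

toy
```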
4 Using the same dataframe, build 2 new dataframes, one calculating mean and error (standard error) for semibal (barnacle) abundance and one doing the same for nucella (whelk) abundance by tidal height and wave exposure (trans_description)
# New data frame: semibal
semibal <- op_animals %>%
  filter(species == "semibal") %>%
  group_by(wave, tidal_ht_simp) %>%
  summarize(mean = mean(abundance), sd = sd(abundance), n = n(), se = sd / sqrt(n))
`summarise()` has grouped output by 'wave'. You can override using the
`.groups` argument.
semibal$species <- "Semibal"
head(semibal)
# A tibble: 6 × 7
# Groups: wave [2]
wave tidal_ht_simp mean sd n se species
<chr> <chr> <dbl> <dbl> <int> <dbl> <chr>
1 High High 0 0 12 0 Semibal
2 High Intermediate 32.4 30.4 18 7.17 Semibal
3 High Low 6.11 7.82 9 2.61 Semibal
4 Low High 0 0 9 0 Semibal
5 Low Intermediate 2.24 5.24 17 1.27 Semibal
6 Low Low 6.11 9.53 9 3.18 Semibal
# New data frame: nucella
nucella <- op_animals %>%
  filter(species == "nucella") %>%
  group_by(wave, tidal_ht_simp) %>%
  summarize(mean = mean(abundance), sd = sd(abundance), n = n(), se = sd / sqrt(n))
`summarise()` has grouped output by 'wave'. You can override using the
`.groups` argument.
nucella$species <- "Nucella"
head(nucella)
# A tibble: 6 × 7
# Groups: wave [2]
wave tidal_ht_simp mean sd n se species
<chr> <chr> <dbl> <dbl> <int> <dbl> <chr>
1 High High 0 0 12 0 Nucella
2 High Intermediate 0 0 18 0 Nucella
3 High Low 1.78 5.33 9 1.78 Nucella
4 Low High 0 0 9 0 Nucella
5 Low Intermediate 0 0 17 0 Nucella
6 Low Low 0 0 9 0 Nucella
5 Plot barnacle and snail abundance + error across tidal heights & wave exposures. You can plot them both on the same graphs (merging your dataframes could help) or you can make multiple graphs and patchwork them together.
combined <- rbind(nucella, semibal)
head(combined)
# A tibble: 6 × 7
# Groups: wave [2]
wave tidal_ht_simp mean sd n se species
<chr> <chr> <dbl> <dbl> <int> <dbl> <chr>
1 High High 0 0 12 0 Nucella
2 High Intermediate 0 0 18 0 Nucella
3 High Low 1.78 5.33 9 1.78 Nucella
4 Low High 0 0 9 0 Nucella
5 Low Intermediate 0 0 17 0 Nucella
6 Low Low 0 0 9 0 Nucella
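Building two summaries and binding them with `rbind()` works. An alternative, sketched below on a small hypothetical table in the shape of `op_animals`, is to keep `species` as a grouping variable so both summaries come out of one pipeline. Setting `.groups = "drop"` returns an ungrouped result and silences the `summarise()` message shown earlier.

```r
library(dplyr)

# Hypothetical long-format animal counts with the grouping columns used above
op_animals_toy <- tibble(
  species = rep(c("semibal", "nucella"), each = 4),
  wave = rep(c("High", "High", "Low", "Low"), times = 2),
  tidal_ht_simp = "Low",
  abundance = c(4, 8, 0, 2, 1, 3, 0, 0)
)

# One summary per species x wave exposure x tidal height combination
combined_toy <- op_animals_toy %>%
  filter(species %in% c("semibal", "nucella")) %>%
  group_by(species, wave, tidal_ht_simp) %>%
  summarise(mean = mean(abundance),
            sd = sd(abundance),
            n = n(),
            se = sd / sqrt(n),
            .groups = "drop")

combined_toy
```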
library(patchwork)
ggplot(data = combined, aes(x = wave, y = mean, color = tidal_ht_simp)) +
  geom_point() +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se)) +
  theme_classic() +
  facet_wrap(~species) +
  labs(x = 'Wave exposure', y = 'Count', color = 'Tidal height')
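In the faceted plot, points and error bars for different tidal heights sit at the same x position within each wave exposure, so they can overlap. A possible refinement, sketched below, dodges them sideways with `position_dodge()` and narrows the error-bar caps with `width`. The small data frame reuses means and standard errors from the printed `semibal` and `nucella` summaries (High tidal height rows omitted for brevity); `patchwork` is not needed here because `facet_wrap()` already places the species side by side.

```r
library(ggplot2)

# Summary values copied from the printed semibal and nucella tables
combined_toy <- data.frame(
  species = rep(c("Nucella", "Semibal"), each = 4),
  wave = rep(c("High", "High", "Low", "Low"), times = 2),
  tidal_ht_simp = rep(c("Intermediate", "Low"), times = 4),
  mean = c(0, 1.78, 0, 0, 32.4, 6.11, 2.24, 6.11),
  se = c(0, 1.78, 0, 0, 7.17, 2.61, 1.27, 3.18)
)

# Shift each tidal height group sideways so points and bars do not overlap
dodge <- position_dodge(width = 0.4)

p <- ggplot(combined_toy, aes(x = wave, y = mean, color = tidal_ht_simp)) +
  geom_point(position = dodge) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se),
                width = 0.2, position = dodge) +
  facet_wrap(~species) +
  theme_classic() +
  labs(x = "Wave exposure", y = "Mean count (+/- SE)", color = "Tidal height")

p
```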
6 Write a short interpretive statement. We didn't run any stats, so avoid the word 'significant.' How do barnacle and whelk abundances appear to vary by tidal height and wave exposure? I do not need you to tell me anything about the ecology of the system, but feel free to do so if you'd like :) I just need you to interpret your graphs.
It appears that neither Semibal nor Nucella is found at high tidal heights. Semibal is most abundant where wave exposure is high and tidal height is intermediate. Nucella is most abundant in two settings: moderate wave exposure at intermediate or low tidal heights, and high wave exposure at low tidal height.