Lab 0

Author

Zoe Ash

1.

Install R and RStudio. Get RStudio running on your computer and get familiar with the layout.

I installed R and RStudio - otherwise I wouldn’t have been able to make this Quarto document! I used the instructions provided for Lab 0 on our class website.

2.

Make a folder (directory) on your computer for this course and then make an RStudio project for this course that runs from that folder.

I created a folder!

3.

Make a new Quarto document. This is where you will do the lab assignment that you will turn in. Make a title and subject headers for each question. Copy the instructions and then add your work below.

You’re reading the Quarto document right now!

4.

Install and load packages.

I installed the packages from the Packages pane in RStudio.

library(palmerpenguins)
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(lubridate) # Already attached by tidyverse 2.0.0 (see the list above), so this call is redundant but harmless
library(performance)
library(ggsci)
library(patchwork)

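The library() function loads an installed package into the current R session. Installing only has to happen once per computer, but library() has to be run in every new session.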
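As a small sketch of the install-then-load workflow (not necessarily how the class website does it), the check below lists which of the packages loaded above are not installed yet. The names pkgs, installed, and missing_pkgs are made up for illustration, and the install.packages() call is left commented out so nothing gets installed by accident.

```r
# Packages this lab loads with library()
pkgs <- c("palmerpenguins", "tidyverse", "performance", "ggsci", "patchwork")

# requireNamespace() returns TRUE if a package is installed
# (it loads the package's namespace but does not attach it like library() does)
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Keep only the names of packages that are not installed
missing_pkgs <- pkgs[!installed]
print(missing_pkgs)

# Uncomment to install whatever is missing
# install.packages(missing_pkgs)
```

If missing_pkgs comes back empty, every package is already installed and only the library() calls are needed.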

5.

Reading in those data! Also look at the data with some commands.

those_data <- read_csv("https://raw.githubusercontent.com/jbaumann3/BIOL234_Biostats_MHC/main/belize_coral_survey_data_2016.csv")

Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)

Rows: 5223 Columns: 23
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): method, type, life.history, species
dbl (19): site, lat, transect, diver, percent.of.cover, l, w, area, pale, bl...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

This reads the .csv data file straight from GitHub into R and stores it in an object named “those_data”!
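The warning printed above means at least one value could not be parsed as the column type that read_csv() guessed. Here is a minimal sketch of how problems() could be used to track such values down. It uses a tiny made-up CSV rather than the coral data, so the toy and issues objects and all of the values in them are hypothetical.

```r
library(readr)

# A tiny made-up CSV: "oops" cannot be parsed as a number
toy <- read_csv(
  I("site,cover\n1,0.3\n2,oops\n3,0.8\n"),
  col_types = "dd"  # ask for two double (numeric) columns
)

# problems() returns one row per value that failed to parse,
# including what type was expected and what text was actually found
issues <- problems(toy)
print(issues)

# The value that failed to parse is stored as NA
print(toy$cover)
```

Running problems(those_data) in the same way should list the rows and columns behind the warning for the real file.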

str(those_data) # Shows the structure of the data: each column's name, its type (e.g. "num" is numeric, "chr" is character), and its first few values

spc_tbl_ [5,223 × 23] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ method          : chr [1:5223] "Video" "Video" "Video" "Video" ...
 $ site            : num [1:5223] 1 1 1 1 1 1 1 1 1 1 ...
 $ type            : chr [1:5223] "Back Reef" "Back Reef" "Back Reef" "Back Reef" ...
 $ lat             : num [1:5223] 3 3 3 3 3 3 3 3 3 3 ...
 $ transect        : num [1:5223] 1 1 1 1 1 1 1 1 1 1 ...
 $ diver           : num [1:5223] 4 4 4 4 4 4 4 4 4 4 ...
 $ life.history    : chr [1:5223] "Stress Tolerant" NA "Stress Tolerant" "Generalist" ...
 $ species         : chr [1:5223] "SSID" "MCOM" "SSID" "OFAV" ...
 $ percent.of.cover: num [1:5223] 0.3 0.9 0.8 0.9 0.3 0.85 0.8 0.4 0.9 0.85 ...
 $ l               : num [1:5223] 5.62 10.8 29.41 26.72 11.79 ...
 $ w               : num [1:5223] 5.62 5.4 12.94 18.28 16.07 ...
 $ area            : num [1:5223] 31.6 58.3 380.6 488.5 189.4 ...
 $ pale            : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ bleached        : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ total.pb        : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ percent.pb      : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ new             : num [1:5223] 0.7 0 0 0 0 0.1 0 0 0 0 ...
 $ trans           : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ old             : num [1:5223] 0 0 0.4 0 0 0 0 0 0 0 ...
 $ total.mort      : num [1:5223] 16 2 10 2 2 4 2 2 2 2 ...
 $ percent.mort    : num [1:5223] 70 0 40 0 0 10 0 0 0 0 ...
 $ disease         : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ percentdisease  : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 - attr(*, "spec")=
  .. cols(
  ..   method = col_character(),
  ..   site = col_double(),
  ..   type = col_character(),
  ..   lat = col_double(),
  ..   transect = col_double(),
  ..   diver = col_double(),
  ..   life.history = col_character(),
  ..   species = col_character(),
  ..   percent.of.cover = col_double(),
  ..   l = col_double(),
  ..   w = col_double(),
  ..   area = col_double(),
  ..   pale = col_double(),
  ..   bleached = col_double(),
  ..   total.pb = col_double(),
  ..   percent.pb = col_double(),
  ..   new = col_double(),
  ..   trans = col_double(),
  ..   old = col_double(),
  ..   total.mort = col_double(),
  ..   percent.mort = col_double(),
  ..   disease = col_double(),
  ..   percentdisease = col_double()
  .. )
 - attr(*, "problems")=<externalptr>

head(those_data) # Shows us just the top 6 rows

# A tibble: 6 × 23
  method  site type     lat transect diver life.history species percent.of.cover
  <chr>  <dbl> <chr>  <dbl>    <dbl> <dbl> <chr>        <chr>              <dbl>
1 Video      1 Back …     3        1     4 Stress Tole… SSID                0.3 
2 Video      1 Back …     3        1     4 <NA>         MCOM                0.9 
3 Video      1 Back …     3        1     4 Stress Tole… SSID                0.8 
4 Video      1 Back …     3        1     4 Generalist   OFAV                0.9 
5 Video      1 Back …     3        1     4 Weedy        UTEN                0.3 
6 Video      1 Back …     3        1     4 Stress Tole… SSID                0.85
# ℹ 14 more variables: l <dbl>, w <dbl>, area <dbl>, pale <dbl>,
#   bleached <dbl>, total.pb <dbl>, percent.pb <dbl>, new <dbl>, trans <dbl>,
#   old <dbl>, total.mort <dbl>, percent.mort <dbl>, disease <dbl>,
#   percentdisease <dbl>

tail(those_data) # Shows us just the bottom 6 rows

# A tibble: 6 × 23
  method  site type     lat transect diver life.history species percent.of.cover
  <chr>  <dbl> <chr>  <dbl>    <dbl> <dbl> <chr>        <chr>              <dbl>
1 AGRRA     14 Patch…     1        6     3 Weedy        PAST                  NA
2 AGRRA     14 Patch…     1        6     3 <NA>         PDIV                  NA
3 AGRRA     14 Patch…     1        6     3 Stress Tole… SSID                  NA
4 AGRRA     14 Patch…     1        6     3 <NA>         MCOM                  NA
5 AGRRA     14 Patch…     1        6     3 Weedy        PAST                  NA
6 AGRRA     14 Patch…     1        6     3 Weedy        PAST                  NA
# ℹ 14 more variables: l <dbl>, w <dbl>, area <dbl>, pale <dbl>,
#   bleached <dbl>, total.pb <dbl>, percent.pb <dbl>, new <dbl>, trans <dbl>,
#   old <dbl>, total.mort <dbl>, percent.mort <dbl>, disease <dbl>,
#   percentdisease <dbl>

summary(those_data) # Gives the min, quartiles, median, mean, max, and number of NAs for numeric columns; character columns only show length and class

    method               site            type                lat       
 Length:5223        Min.   : 1.000   Length:5223        Min.   :1.000  
 Class :character   1st Qu.: 4.000   Class :character   1st Qu.:2.000  
 Mode  :character   Median : 7.000   Mode  :character   Median :3.000  
                    Mean   : 7.366                      Mean   :3.196  
                    3rd Qu.:11.000                      3rd Qu.:4.000  
                    Max.   :14.000                      Max.   :5.000  
                                                                       
    transect         diver       life.history         species         
 Min.   :1.000   Min.   :1.000   Length:5223        Length:5223       
 1st Qu.:2.000   1st Qu.:2.000   Class :character   Class :character  
 Median :4.000   Median :4.000   Mode  :character   Mode  :character  
 Mean   :3.545   Mean   :3.433                                        
 3rd Qu.:5.000   3rd Qu.:5.000                                        
 Max.   :7.000   Max.   :5.000                                        
                                                                      
 percent.of.cover       l                w                 area          
 Min.   :0.1000   Min.   :  2.57   Min.   :   1.500   Min.   :     4.50  
 1st Qu.:0.6500   1st Qu.:  8.75   1st Qu.:   6.924   1st Qu.:    54.93  
 Median :0.8000   Median : 14.22   Median :  10.723   Median :   150.00  
 Mean   :0.7351   Mean   : 22.19   Mean   :  17.707   Mean   :   750.69  
 3rd Qu.:0.9000   3rd Qu.: 26.92   3rd Qu.:  20.000   3rd Qu.:   506.42  
 Max.   :7.0000   Max.   :600.00   Max.   :4250.000   Max.   :216750.00  
 NA's   :2286                      NA's   :1          NA's   :1          
      pale            bleached           total.pb         percent.pb     
 Min.   :0.00000   Min.   :0.000000   Min.   :0.00000   Min.   :  0.000  
 1st Qu.:0.00000   1st Qu.:0.000000   1st Qu.:0.00000   1st Qu.:  0.000  
 Median :0.00000   Median :0.000000   Median :0.00000   Median :  0.000  
 Mean   :0.03855   Mean   :0.009736   Mean   :0.04829   Mean   :  4.829  
 3rd Qu.:0.00000   3rd Qu.:0.000000   3rd Qu.:0.00000   3rd Qu.:  0.000  
 Max.   :1.00000   Max.   :1.000000   Max.   :1.15000   Max.   :115.000  
                                                                         
      new              trans              old            total.mort    
 Min.   :0.00000   Min.   :0.00000   Min.   :0.00000   Min.   : 1.000  
 1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.: 2.000  
 Median :0.00000   Median :0.00000   Median :0.00000   Median : 2.000  
 Mean   :0.01295   Mean   :0.01374   Mean   :0.02827   Mean   : 3.099  
 3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.: 2.000  
 Max.   :1.00000   Max.   :0.95000   Max.   :0.95000   Max.   :22.000  
 NA's   :1         NA's   :1         NA's   :4                         
  percent.mort        disease        percentdisease   
 Min.   :  0.000   Min.   :0.00000   Min.   :  0.000  
 1st Qu.:  0.000   1st Qu.:0.00000   1st Qu.:  0.000  
 Median :  0.000   Median :0.00000   Median :  0.000  
 Mean   :  5.495   Mean   :0.02834   Mean   :  2.834  
 3rd Qu.:  0.000   3rd Qu.:0.00000   3rd Qu.:  0.000  
 Max.   :100.000   Max.   :1.00000   Max.   :100.000  
 NA's   :1

nrow(those_data) # Number of rows in dataset

[1] 5223

ncol(those_data) # Number of columns in dataset

[1] 23
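A couple of other base R commands can answer the same questions in one step. This is just a sketch on a made-up three-row data frame (toy and its values are hypothetical, with column names borrowed from the coral data): dim() gives the rows and columns together, and colSums(is.na()) counts the missing values that summary() reports as NA's.

```r
# A made-up data frame standing in for those_data
toy <- data.frame(
  species = c("SSID", "MCOM", "OFAV"),
  percent.of.cover = c(0.3, NA, 0.9)
)

# dim() returns the number of rows and the number of columns together
print(dim(toy))

# names() lists the column names
print(names(toy))

# is.na() marks each missing value; colSums() adds them up per column
na_counts <- colSums(is.na(toy))
print(na_counts)
```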

6.

Change a numeric column to a factor. (And add a new column!)

# Change data to a factor
those_data$transect <- as.factor(those_data$transect) # This tells R to treat this column as a "factor" (i.e., as categorical groups instead of numbers)
those_data$site <- as.factor(those_data$site)
those_data$site2 <- as.factor(those_data$site) # This makes a new factor column, site2, from the site column
str(those_data)

spc_tbl_ [5,223 × 24] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ method          : chr [1:5223] "Video" "Video" "Video" "Video" ...
 $ site            : Factor w/ 13 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ type            : chr [1:5223] "Back Reef" "Back Reef" "Back Reef" "Back Reef" ...
 $ lat             : num [1:5223] 3 3 3 3 3 3 3 3 3 3 ...
 $ transect        : Factor w/ 7 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ diver           : num [1:5223] 4 4 4 4 4 4 4 4 4 4 ...
 $ life.history    : chr [1:5223] "Stress Tolerant" NA "Stress Tolerant" "Generalist" ...
 $ species         : chr [1:5223] "SSID" "MCOM" "SSID" "OFAV" ...
 $ percent.of.cover: num [1:5223] 0.3 0.9 0.8 0.9 0.3 0.85 0.8 0.4 0.9 0.85 ...
 $ l               : num [1:5223] 5.62 10.8 29.41 26.72 11.79 ...
 $ w               : num [1:5223] 5.62 5.4 12.94 18.28 16.07 ...
 $ area            : num [1:5223] 31.6 58.3 380.6 488.5 189.4 ...
 $ pale            : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ bleached        : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ total.pb        : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ percent.pb      : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ new             : num [1:5223] 0.7 0 0 0 0 0.1 0 0 0 0 ...
 $ trans           : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ old             : num [1:5223] 0 0 0.4 0 0 0 0 0 0 0 ...
 $ total.mort      : num [1:5223] 16 2 10 2 2 4 2 2 2 2 ...
 $ percent.mort    : num [1:5223] 70 0 40 0 0 10 0 0 0 0 ...
 $ disease         : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ percentdisease  : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ site2           : Factor w/ 13 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "spec")=
  .. cols(
  ..   method = col_character(),
  ..   site = col_double(),
  ..   type = col_character(),
  ..   lat = col_double(),
  ..   transect = col_double(),
  ..   diver = col_double(),
  ..   life.history = col_character(),
  ..   species = col_character(),
  ..   percent.of.cover = col_double(),
  ..   l = col_double(),
  ..   w = col_double(),
  ..   area = col_double(),
  ..   pale = col_double(),
  ..   bleached = col_double(),
  ..   total.pb = col_double(),
  ..   percent.pb = col_double(),
  ..   new = col_double(),
  ..   trans = col_double(),
  ..   old = col_double(),
  ..   total.mort = col_double(),
  ..   percent.mort = col_double(),
  ..   disease = col_double(),
  ..   percentdisease = col_double()
  .. )
 - attr(*, "problems")=<externalptr>

# To make a new column instead: dataframe$columnnew <- as.factor(dataframe$column)
# Or fill a new column with one value: dataframe$columnnew <- "whatever I want in there"

Like we did in class, I chose to do this with site. To make sure I had the hang of it, I also did it with transect.
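To see what changes when a column becomes a factor, here is a small sketch on a made-up vector of site numbers (site and site_f are hypothetical, not the real coral data). The point is that R stops treating the values as quantities and starts treating them as group labels.

```r
# Made-up numeric site IDs
site <- c(1, 1, 2, 4, 4)
site_f <- as.factor(site)

# levels() lists the distinct groups, stored as text labels
print(levels(site_f))

# nlevels() counts the groups
print(nlevels(site_f))

# summary() on the numeric version gives mean, median, and so on
print(summary(site))

# summary() on the factor version counts the rows in each group
print(summary(site_f))
```

A mean site number is not meaningful, which is why a count per group is the more useful summary for an ID column like this.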

I also wanted to be sure I could switch them back…

those_data$site <- as.numeric(as.character(those_data$site))
those_data$transect <- as.numeric(as.character(those_data$transect)) # Going through as.character() first recovers the original numbers, so these columns are numeric (quantitative) again
str(those_data)

spc_tbl_ [5,223 × 24] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ method          : chr [1:5223] "Video" "Video" "Video" "Video" ...
 $ site            : num [1:5223] 1 1 1 1 1 1 1 1 1 1 ...
 $ type            : chr [1:5223] "Back Reef" "Back Reef" "Back Reef" "Back Reef" ...
 $ lat             : num [1:5223] 3 3 3 3 3 3 3 3 3 3 ...
 $ transect        : num [1:5223] 1 1 1 1 1 1 1 1 1 1 ...
 $ diver           : num [1:5223] 4 4 4 4 4 4 4 4 4 4 ...
 $ life.history    : chr [1:5223] "Stress Tolerant" NA "Stress Tolerant" "Generalist" ...
 $ species         : chr [1:5223] "SSID" "MCOM" "SSID" "OFAV" ...
 $ percent.of.cover: num [1:5223] 0.3 0.9 0.8 0.9 0.3 0.85 0.8 0.4 0.9 0.85 ...
 $ l               : num [1:5223] 5.62 10.8 29.41 26.72 11.79 ...
 $ w               : num [1:5223] 5.62 5.4 12.94 18.28 16.07 ...
 $ area            : num [1:5223] 31.6 58.3 380.6 488.5 189.4 ...
 $ pale            : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ bleached        : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ total.pb        : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ percent.pb      : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ new             : num [1:5223] 0.7 0 0 0 0 0.1 0 0 0 0 ...
 $ trans           : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ old             : num [1:5223] 0 0 0.4 0 0 0 0 0 0 0 ...
 $ total.mort      : num [1:5223] 16 2 10 2 2 4 2 2 2 2 ...
 $ percent.mort    : num [1:5223] 70 0 40 0 0 10 0 0 0 0 ...
 $ disease         : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ percentdisease  : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ site2           : Factor w/ 13 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "spec")=
  .. cols(
  ..   method = col_character(),
  ..   site = col_double(),
  ..   type = col_character(),
  ..   lat = col_double(),
  ..   transect = col_double(),
  ..   diver = col_double(),
  ..   life.history = col_character(),
  ..   species = col_character(),
  ..   percent.of.cover = col_double(),
  ..   l = col_double(),
  ..   w = col_double(),
  ..   area = col_double(),
  ..   pale = col_double(),
  ..   bleached = col_double(),
  ..   total.pb = col_double(),
  ..   percent.pb = col_double(),
  ..   new = col_double(),
  ..   trans = col_double(),
  ..   old = col_double(),
  ..   total.mort = col_double(),
  ..   percent.mort = col_double(),
  ..   disease = col_double(),
  ..   percentdisease = col_double()
  .. )
 - attr(*, "problems")=<externalptr>
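The as.character() step matters because a factor stores its values as integer codes that point to the level labels. A minimal sketch with made-up values (f, codes, and values are hypothetical names) shows what happens when that step is skipped.

```r
# A made-up factor whose labels are not simply 1, 2, 3
f <- factor(c(10, 20, 20, 40))

# as.numeric() alone returns the internal level codes, not the labels
codes <- as.numeric(f)
print(codes)   # 1 2 2 3

# Converting to character first recovers the original numbers
values <- as.numeric(as.character(f))
print(values)  # 10 20 20 40
```

For site and transect the two results could look alike whenever the labels happen to match the codes, so the character step is the safe habit.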

7.

Save data on your computer and read the file back into R!

write_csv(those_data, file = "those_data.csv") # This saves those_data as a new file locally, in my project folder

the_same_data <- read_csv("those_data.csv") # Then I read the file back into R under a new name (the path has to match the one used in write_csv())

Rows: 5223 Columns: 24
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): method, type, life.history, species
dbl (20): site, lat, transect, diver, percent.of.cover, l, w, area, pale, bl...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

str(the_same_data)

spc_tbl_ [5,223 × 24] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ method          : chr [1:5223] "Video" "Video" "Video" "Video" ...
 $ site            : num [1:5223] 1 1 1 1 1 1 1 1 1 1 ...
 $ type            : chr [1:5223] "Back Reef" "Back Reef" "Back Reef" "Back Reef" ...
 $ lat             : num [1:5223] 3 3 3 3 3 3 3 3 3 3 ...
 $ transect        : num [1:5223] 1 1 1 1 1 1 1 1 1 1 ...
 $ diver           : num [1:5223] 4 4 4 4 4 4 4 4 4 4 ...
 $ life.history    : chr [1:5223] "Stress Tolerant" NA "Stress Tolerant" "Generalist" ...
 $ species         : chr [1:5223] "SSID" "MCOM" "SSID" "OFAV" ...
 $ percent.of.cover: num [1:5223] 0.3 0.9 0.8 0.9 0.3 0.85 0.8 0.4 0.9 0.85 ...
 $ l               : num [1:5223] 5.62 10.8 29.41 26.72 11.79 ...
 $ w               : num [1:5223] 5.62 5.4 12.94 18.28 16.07 ...
 $ area            : num [1:5223] 31.6 58.3 380.6 488.5 189.4 ...
 $ pale            : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ bleached        : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ total.pb        : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ percent.pb      : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ new             : num [1:5223] 0.7 0 0 0 0 0.1 0 0 0 0 ...
 $ trans           : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ old             : num [1:5223] 0 0 0.4 0 0 0 0 0 0 0 ...
 $ total.mort      : num [1:5223] 16 2 10 2 2 4 2 2 2 2 ...
 $ percent.mort    : num [1:5223] 70 0 40 0 0 10 0 0 0 0 ...
 $ disease         : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ percentdisease  : num [1:5223] 0 0 0 0 0 0 0 0 0 0 ...
 $ site2           : num [1:5223] 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "spec")=
  .. cols(
  ..   method = col_character(),
  ..   site = col_double(),
  ..   type = col_character(),
  ..   lat = col_double(),
  ..   transect = col_double(),
  ..   diver = col_double(),
  ..   life.history = col_character(),
  ..   species = col_character(),
  ..   percent.of.cover = col_double(),
  ..   l = col_double(),
  ..   w = col_double(),
  ..   area = col_double(),
  ..   pale = col_double(),
  ..   bleached = col_double(),
  ..   total.pb = col_double(),
  ..   percent.pb = col_double(),
  ..   new = col_double(),
  ..   trans = col_double(),
  ..   old = col_double(),
  ..   total.mort = col_double(),
  ..   percent.mort = col_double(),
  ..   disease = col_double(),
  ..   percentdisease = col_double(),
  ..   site2 = col_double()
  .. )
 - attr(*, "problems")=<externalptr>

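# the_same_data has all the rows and columns of those_data, including site2!
# But site2 came back as "num" instead of "Factor": a CSV file only stores the values, not the column types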
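Here is a sketch of that round trip on a made-up data frame, written to a temporary file so it does not touch the project folder. The names toy, tmp, back, and back_factor are hypothetical. It also shows one possible way to get the factor back on import, using the col_types argument of read_csv().

```r
library(readr)

# A made-up data frame with one numeric column and one factor column
toy <- data.frame(site = c(1, 1, 2), site2 = factor(c(1, 1, 2)))

# Write it to a temporary CSV file
tmp <- tempfile(fileext = ".csv")
write_csv(toy, tmp)

# Reading it back with guessed types turns site2 into a number
back <- read_csv(tmp, show_col_types = FALSE)
print(class(back$site2))

# Asking for a factor explicitly restores it; other columns are still guessed
back_factor <- read_csv(tmp, col_types = cols(site2 = col_factor()))
print(class(back_factor$site2))
```

The simpler alternative is to run as.factor() again after reading the file in.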

8.

Render as HTML!

Again - here it is :)