Part 0: The preamble

This is where you prepare for all of the work below. In other words, load every package you will need. It is generally a good idea to load data and packages at the top of an R document/script. Today we are just loading packages, since reading in data is part of our tutorial.

#Loading ALL of my necessary packages here :)
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.7
## v tidyr   1.1.4     v stringr 1.4.0
## v readr   2.1.1     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(infer)
## Warning: package 'infer' was built under R version 4.1.3
library(ggbeeswarm)
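
If one of the library() calls above errors because a package is not installed, a quick check like the one below can help. This is only a sketch: missing_packages() is a helper name made up for illustration, not a function from any package.

```r
# Hypothetical helper: returns the names of packages that cannot be loaded
# (requireNamespace() returns FALSE when a package is not installed).
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# The three packages this lab loads
needed <- c("tidyverse", "infer", "ggbeeswarm")
to_install <- missing_packages(needed)

if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them
}
```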

Part 1: Make an Rproject for this class.

Need help making a proj? Item number 3 in this tutorial can help.

getwd()
## [1] "C:/Users/leahh/OneDrive - Smith College/Documents/SDS 300"
setwd("C:/Users/leahh/OneDrive - Smith College/Documents/SDS 300")

Part 2: Ensure your working directory is set correctly!

a.) Set your working directory to work nicely with your rproj (you should have done this when you set up your proj). Show me what directory you chose using code in the chunk below.

#My rproj working directory is: 

getwd()
## [1] "C:/Users/leahh/OneDrive - Smith College/Documents/SDS 300"

b.) CHANGE your working directory using a line of code (the function you want here is setwd())

setwd("C:/Users/leahh/OneDrive - Smith College/Documents/Honours")

c.) CHANGE your working directory using the RStudio GUI. Set it back to your preferred working directory and show me that it worked using code

getwd()
## [1] "C:/Users/leahh/OneDrive - Smith College/Documents/SDS 300"
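
A pattern that may be handy for part b: setwd() invisibly returns the directory you were in before the change, so you can store it and switch back later. The sketch below uses tempdir() as a stand-in destination (an assumption for illustration; it is not one of the folders used above).

```r
# Remember where we started, and move to a temporary folder (stand-in destination)
old_wd <- setwd(tempdir())

# Confirm the change
getwd()

# Switch back to the original directory and confirm again
setwd(old_wd)
getwd()
```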

Part 3: Read in data!

a.) Read in data from file (csv)

Go to the course moodle page and find the Labs section. Download the ‘reef_life_survey_habitat.csv’ file. Save it somewhere meaningful (your course rproj directory, perhaps?). Now, read it into R USING CODE (not the ‘import dataset’ GUI). Using read.csv() (base R) or read_csv() (tidyverse) is strongly preferred. Have another method? Ask me first!

#I don’t have access to the moodle, so I’ve just imported an old data set that I have

#Reading in data from a file, naming it, and preparing it as an R object for later use

example_set <- read.csv("Three_Corridors_Data.csv")
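
Before reading a file by its bare name, it can help to confirm that R can see it from the current working directory. A minimal sketch, assuming a small made-up CSV written to a temporary folder (the file name, column names, and values are invented for illustration and are not from the lab data):

```r
# Write a tiny made-up CSV so the example is self-contained
demo_path <- file.path(tempdir(), "demo_sites.csv")
writeLines(c("site,depth_m", "A,5", "B,10", "C,15"), demo_path)

# Check that the file exists before trying to read it
file.exists(demo_path)

# Read it in and take a quick look at the structure
demo_set <- read.csv(demo_path)
str(demo_set)
```

The same file.exists() check works on a relative name such as "Three_Corridors_Data.csv", which is looked up in whatever getwd() reports.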

b.) Read in data from a url (github, in this case)

Follow this link to my github page. Next, find the file ‘belize_coral_survey_data_2016.csv’ and read that file into R using the URL. Need help with this? Try looking here first, then ask me!

#Reading in some coral survey data from Justin's github to practice with!

coralcover <- read.csv("https://raw.githubusercontent.com/jbaumann3/BIOL234_Biostats_MHC/main/belize_coral_survey_data_2016.csv")

5. Practice data manipulation

a.) Using the data from my github, select columns of interest. This includes type, lat, life.history, species, and percent.of.cover (for now).

#select columns
select.coral <- coralcover %>%
  select(type, lat, life.history, species, percent.of.cover)

b.) Filter that dataframe to include all life.history options except for NA

#Filter the data

filter_coral <- select.coral %>%
  filter(!is.na(life.history))
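
To see what this filter does, here is the same idea on a toy data frame, written in base R so it runs without any packages. The species labels and life.history values are made up for illustration; !is.na() keeps only the rows where life.history has a value.

```r
# Toy data frame with one missing life.history entry (made-up values)
toy <- data.frame(
  species = c("sp1", "sp2", "sp3"),
  life.history = c("weedy", NA, "competitive")
)

# Keep only the rows where life.history is not NA
kept <- toy[!is.na(toy$life.history), ]

# Count the rows that survived the filter
nrow(kept)
```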

c.) Make a new column that contains a name for each latitude. Currently latitude (lat) is a number. Let’s make it a name that aligns with a coastal town, as this is easier to understand. Lat 1 is San Pedro, Lat 2 is Belize City, Lat 3 is Dangriga, Lat 4 is Placencia, Lat 5 is Punta Gorda. There are many ways to do this. Give it a shot! I’m happy to help.

#Add a new column that assigns a name to each latitude

new_coral <- filter_coral %>%
  mutate(Latitude = case_when(lat == 1 ~ "San Pedro",
         lat == 2 ~ "Belize City",
         lat == 3 ~ "Dangriga",
         lat == 4 ~ "Placencia",
         lat == 5 ~ "Punta Gorda")) 
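
Since there are many ways to do this, here is one alternative as a sketch: a named lookup vector in base R. The town names come from the prompt above; town_lookup and toy_lat are names made up for this example, and the toy latitudes are not from the coral data.

```r
# Named vector: the names are the latitude codes, the values are the towns
town_lookup <- c(
  "1" = "San Pedro",
  "2" = "Belize City",
  "3" = "Dangriga",
  "4" = "Placencia",
  "5" = "Punta Gorda"
)

# Made-up latitude codes, including a missing one
toy_lat <- c(1, 3, 5, 2, NA)

# Look up each code by name; a missing code stays NA
town <- unname(town_lookup[as.character(toy_lat)])
town
```

With the real data this could be used inside mutate(), for example mutate(Latitude = unname(town_lookup[as.character(lat)])). Like case_when() without a default, any lat value that is not in the lookup ends up as NA.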

6.) Basic Plots

Make 3 exploratory plots in ggplot. You can manipulate data any way you’d like. I strongly recommend avoiding bar graphs.

# plot 1
ggplot(new_coral, aes(x= percent.of.cover))+
  geom_histogram(colour="white")+
  theme_bw() +
  theme(panel.border = element_rect(colour = "black", size = .5),
        axis.line = element_line(colour = "black"))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 1698 rows containing non-finite values (stat_bin).

# plot 2
ggplot(new_coral, aes(x = Latitude, y = percent.of.cover, colour = type))+
  geom_boxplot()+
  theme_bw() +
  theme(panel.border = element_rect(colour = "black", size = .5),
        axis.line = element_line(colour = "black"))
## Warning: Removed 1698 rows containing non-finite values (stat_boxplot).

# plot 3
ggplot(new_coral, aes(x = Latitude, y = log(percent.of.cover), colour = type))+
  geom_boxplot()+
  theme_bw() +
  theme(panel.border = element_rect(colour = "black", size = .5),
        axis.line = element_line(colour = "black"))
## Warning: Removed 1698 rows containing non-finite values (stat_boxplot).
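
The warnings above say that rows with non-finite values were dropped. One way to see where such values can come from is to count them directly. A sketch on a made-up vector (not the coral data): NA counts as non-finite, and taking log() of 0 gives -Inf, which is non-finite too.

```r
cover <- c(0, 5, NA, 12.5)  # made-up values

# Non-finite values before transforming: just the NA
n_raw <- sum(!is.finite(cover))

# After log(): the NA stays NA and log(0) becomes -Inf
n_log <- sum(!is.finite(log(cover)))

n_raw
n_log
```

On the real data, sum(!is.finite(new_coral$percent.of.cover)) would be the analogous check.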

7.) Save your dataframe to file

#save your dataframe to file

write.csv(new_coral, file="SDS300_lab1_data.csv")
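
One thing to be aware of: by default write.csv() also writes the row names as an extra first column, which comes back as a column called X when the file is read again. A sketch with a made-up data frame and temporary files, showing the difference row.names = FALSE makes:

```r
# Made-up data frame for illustration
toy <- data.frame(species = c("a", "b"), cover = c(1.5, 2))

# Default: row names are written as an unnamed first column
with_rows <- tempfile(fileext = ".csv")
write.csv(toy, with_rows)

# row.names = FALSE: only the data columns are written
no_rows <- tempfile(fileext = ".csv")
write.csv(toy, no_rows, row.names = FALSE)

# Compare the column names after reading each file back in
cols_with_rows <- names(read.csv(with_rows))
cols_no_rows <- names(read.csv(no_rows))
cols_with_rows
cols_no_rows
```

Whether to keep the row names is a judgment call; dropping them just keeps the saved file's columns identical to the data frame's.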

8.) Render and turn in your html quarto doc

You can do this however works for you. You should be able to turn in a link to your html file (if you publish on Rpubs or github, for example) or you can turn in your actual html file. Ensure that it works, though! If I can’t load it you will get a U (and need to revise and resubmit).