R_workbook

Author

Owen King

1 Important Notice

Whilst the data sets built into each package are useful while you familiarise yourself with the shark data and for applying the code being taught to an actual data set, use the shark data set exclusively from week 4 onwards.

sad but optimistic cat

2 Getting to grips with the language

Add any code or cool things you find for Quarto under this subheading.

  • bullet point

Rana temporaria

3 R code and suggestions for dissertation

  • Use spatial distribution models.
  • There is a helpful website on carrying them out and what they actually do at this link.
  • Once the code is complete and I have finished the methodology and results sections, put the code into this HTML document to have a digital copy of it.
  • To download the packages, run the code install.packages(c("raster", "rgdal", "dismo", "rJava"))
  • Then load each package using the library() function.
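A minimal sketch of the install-then-load step above, assuming you only want to install packages that are missing. The helper name missing_packages is made up for illustration, and the base packages stats and utils stand in for the spatial packages so the example runs on any machine:

```r
# Hypothetical helper: returns the names of packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Stand-in package names (assumption); swap in the real list when needed.
wanted <- c("stats", "utils")

# Install only what is missing.
to_install <- missing_packages(wanted)
if (length(to_install) > 0) {
  install.packages(to_install)
}

# library() attaches one package per call, so loop over the names.
for (pkg in wanted) {
  library(pkg, character.only = TRUE)
}
```

The character.only = TRUE argument is needed because pkg holds the package name as a string rather than being the name itself.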

4 What code to run for sessions:

code for Quarto sessions

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(quarto)

5 formative assessment notes

  • For formative 1 you need to think of a title and a question for your ‘research project’.
  1. Is blotching related to stress, and what variables cause this reaction when sharks are handled for research purposes?
  2. What is the main cause of blotching in sharks when they are caught and handled by researchers?
  3. What is the correlation between blotching in sharks and numerous variables, and can we use these variables to predict the time taken for blotching to occur?
  4. What are the leading factors influencing the time taken for blotching to occur in sharks, and can this time be predicted?

6 summative assessment notes

  • Comparing the time taken for blotching to occur between males and females may show a difference in the speed of their stress responses.
  • Looking at the cortisol levels, BPM and time taken to blotch could be interesting for seeing whether blotching can be classed as a stress-related behavior or not.
  • Look for papers associating length with the age of a shark, to try to determine the age of each individual shark and compare how stressed they are with how old they are, i.e. the older they are, the less stressed they may be.
  • Compare the air temp, water temp, cortisol levels, BPM and blotch to understand whether the air temperature and water temperature have an impact on these stress-related measurements.
  • Propose the idea that the blotching could be due to overheating caused by the large temperature difference between the water and air.
  • If air and water temp have a significant effect on the blotching time, find a test that could predict the conditions under which different times to blotch occur in response to air and water temp.
  • Before starting a multivariate statistical test, read the definition of one and how it differs from univariate tests, to get a general idea of what it is testing.
  • Archimedes' principle (buoyant force counteracting water density)

Sessions Formative: code and packages used for sessions and summative

Start-up code for sessions:
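#| warning: false
# The chunk option above hides warnings in the rendered document; warning(FALSE) would instead raise a warning with the message "FALSE".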
getwd()
[1] "C:/Users/Owen King/OneDrive/Documents/r studio stuff/Quarto"
library(tidyverse) # library() loads one package per call, so each package gets its own line.
library(palmerpenguins)
library(knitr)
data("diamonds")
view(diamonds)

7 Week 1 - Introduction to R and Quarto

  • established what packages would be needed for the rest of the module
  • introduction to lecturer and quarto
  • created own quarto document and had a little play around with R

8 week 2 - Ethics for experimental design

  • discussed similar themes regarding ethics surrounding research
  • similar to previous dissertation module
  • The takeaway from the subject is what is expected: treat all animals equally.

9 week 3 - Data Wrangling and Research Questions

code for the R workbook: 6.6.7

library(dplyr) # library() loads one package per call; dplyr is also attached by library(tidyverse).
library(tidyverse) # together these load the packages required for the next commands.
view(diamonds) # this opens the diamonds data set in a separate tab.
diamonds %>% arrange(price) # the pipe passes the diamonds data set to arrange(), which sorts the entire data set by price (ascending).
# A tibble: 53,940 × 10
   carat cut       color clarity depth table price     x     y     z
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
 2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
 3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
 4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
 5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
 6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
 7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
 8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
 9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39
# ℹ 53,930 more rows
diamonds %>% arrange(desc(price)) # arrange the data set by price in descending order.
# A tibble: 53,940 × 10
   carat cut       color clarity depth table price     x     y     z
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1  2.29 Premium   I     VS2      60.8    60 18823  8.5   8.47  5.16
 2  2    Very Good G     SI1      63.5    56 18818  7.9   7.97  5.04
 3  1.51 Ideal     G     IF       61.7    55 18806  7.37  7.41  4.56
 4  2.07 Ideal     G     SI2      62.5    55 18804  8.2   8.13  5.11
 5  2    Very Good H     SI1      62.8    57 18803  7.95  8     5.01
 6  2.29 Premium   I     SI1      61.8    59 18797  8.52  8.45  5.24
 7  2.04 Premium   H     SI1      58.1    60 18795  8.37  8.28  4.84
 8  2    Premium   I     VS1      60.8    59 18795  8.13  8.02  4.91
 9  1.71 Premium   F     VS2      62.3    59 18791  7.57  7.53  4.7 
10  2.15 Ideal     G     SI2      62.6    54 18791  8.29  8.35  5.21
# ℹ 53,930 more rows
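diamonds %>% arrange("cut", "price") # intended to arrange by both cut and price, but the quoted names are treated as constant strings, so the rows come back in their original order; use arrange(cut, price) to sort.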
# A tibble: 53,940 × 10
   carat cut       color clarity depth table price     x     y     z
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
 2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
 3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
 4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
 5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
 6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
 7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
 8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
 9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39
# ℹ 53,930 more rows
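diamonds %>% arrange(desc("price"), desc("cut")) # intended to arrange by two variables in descending order; each desc() call takes a single variable, and the names must be unquoted, e.g. arrange(desc(price), desc(cut)), otherwise the rows are left in their original order.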
# A tibble: 53,940 × 10
   carat cut       color clarity depth table price     x     y     z
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
 2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
 3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
 4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
 5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
 6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
 7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
 8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
 9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39
# ℹ 53,930 more rows
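The two quoted calls above return the rows unsorted. A small sketch of why, using a made-up three-row data set (not taken from diamonds) so the difference is easy to see. The names gems, quoted, unquoted, by_name and descending are only for illustration:

```r
library(dplyr)

# Made-up toy data so the effect of quoting is visible at a glance.
gems <- tibble(
  cut   = c("Ideal", "Fair", "Good"),
  price = c(300, 500, 400)
)

# A quoted name is a constant string, so every row ties and the order is unchanged.
quoted <- gems %>% arrange("price")

# An unquoted name refers to the column, so the rows are sorted by price.
unquoted <- gems %>% arrange(price)

# If the column name is held as a string, .data[["..."]] looks the column up by name.
by_name <- gems %>% arrange(.data[["price"]])

# desc() wraps one unquoted column at a time.
descending <- gems %>% arrange(desc(price))
```

Printing quoted and unquoted side by side shows that only the unquoted version changes the row order.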
diamonds %>% arrange(price, clarity) # arrange the data by the price and clarity
# A tibble: 53,940 × 10
   carat cut       color clarity depth table price     x     y     z
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
 2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
 3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
 4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
 5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
 6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
 7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
 8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
 9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39
# ℹ 53,930 more rows
diamonds <- diamonds %>% mutate(saleprice = price - 250) # mutate() creates a new variable, and <- assigns the result back to the diamonds data set so the change is saved rather than only printed.
diamonds_no_xyz <- diamonds %>% select(-x, -y, -z) # the start of this code creates a new data set so that x, y and z are not deleted from the main data set. After the pipe %>%, select() with a minus sign in front of each name drops those columns.
diamonds %>% group_by(cut) %>% summarize(count = n()) %>% ungroup() # first pipe the diamonds data set into group_by() to group the data by cut, then pipe the grouped data into summarize(). Inside summarize(), count = n() asks R to count the diamonds in each group; n() is a simple way of counting how many rows belong to each level of the grouping variable, in this case cut. A final pipe into ungroup() removes the grouping so that further code works as intended.
# A tibble: 5 × 2
  cut       count
  <ord>     <int>
1 Fair       1610
2 Good       4906
3 Very Good 12082
4 Premium   13791
5 Ideal     21551
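The same table can be made in one step with dplyr's count(), which is a shortcut for the group_by() + summarize(n()) + ungroup() pattern. A minimal sketch, assuming the untouched diamonds data set from ggplot2; the name cut_counts is only for illustration:

```r
library(dplyr)
library(ggplot2) # provides the diamonds data set

# count() groups by cut, counts the rows in each group and returns an ungrouped result.
# name = "count" sets the column name to match the summarize() version.
cut_counts <- ggplot2::diamonds %>% count(cut, name = "count")

cut_counts
```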
diamonds <- diamonds %>% mutate(totalnum = nrow(diamonds)) # the <- assigns the result back to the current data set named diamonds, so the edit is saved. The pipe %>% passes the data set to mutate(), which creates a new column named "totalnum" and fills it with the result of a calculation. Here nrow(diamonds) asks R to count the number of rows in the diamonds data set. The changes can then be viewed by asking R to show the data with view(diamonds), or by clicking on the data set in the Environment tab in the top right.
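A possible alternative to nrow(diamonds) inside mutate() is n(), which counts the rows of whatever data is being piped in, so the data set name does not need repeating. A minimal sketch on a made-up four-row data set; gems, gems_total and gems_by_cut are illustrative names only:

```r
library(dplyr)

# Made-up toy data for illustration.
gems <- tibble(
  cut   = c("Fair", "Good", "Good", "Ideal"),
  price = c(337, 327, 335, 326)
)

# On ungrouped data, n() is the total number of rows (same as nrow(gems)).
gems_total <- gems %>% mutate(totalnum = n())

# On grouped data, n() is the number of rows in each group instead.
gems_by_cut <- gems %>%
  group_by(cut) %>%
  mutate(cut_count = n()) %>%
  ungroup()
```

The grouped version is handy for working out what share of the total each group makes up.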

week 4 data exploration: download the correct packages and load them with the library function.

install.packages("ggplot2")
Warning: package 'ggplot2' is in use and will not be installed
library(ggplot2)

To create a scatter plot, use the ggplot() function, replacing the data set and variable names in the example with your own.

# ggplot(sharks, aes(x = blotch, y = depth)) + geom_point()
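The line above is commented out because the sharks data set is not loaded here. A runnable sketch of the same plot, assuming simulated stand-in data: the column names blotch and depth come from the line above, but the values, the means and the name sharks_demo are invented purely so the code runs:

```r
library(ggplot2)

set.seed(1) # makes the simulated numbers repeatable

# Simulated stand-in data (assumption): arbitrary values, not real shark measurements.
sharks_demo <- data.frame(
  blotch = rnorm(30, mean = 35, sd = 1),
  depth  = rnorm(30, mean = 50, sd = 2)
)

# aes() maps columns to the x and y axes; geom_point() draws one point per row.
p <- ggplot(sharks_demo, aes(x = blotch, y = depth)) +
  geom_point() +
  labs(x = "blotch", y = "depth")

p
```

Once the real data are loaded, swapping sharks_demo for sharks should give the plot the commented line describes.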