Week 1 01.10.24

Meet Quarto

This workbook will serve as a guide to coding and as a record of my development.

A picture from a recent trip to Iceland.

The small fishing town of Húsavík is known as the whale-watching capital of Europe.

Photograph of a defrosting lake against a sunrise: Reykjavikurtjorn during the polar winter.

To add a photograph in the Visual editor, click the image icon in the toolbar and select Browse, which lets you choose from the available image files. You can also insert links, for example to data sources or to explanations of key terms.

Coding

Switching to the Source tab shows the underlying markdown that is generated as you work in the Visual editor. The rendered document usually looks much the same as it did in the Visual editor.

Rendering generates an output file (for example HTML) that combines the text, the code, and the code's output.

A YAML header is marked off by a line of three dashes (---) at its start and another at its end.

YAML uses key-value pairs in the format key: value. Other fields commonly found in headers include date, subtitle, theme, and font colour.

Code Chunks

R code chunks are identified by {r} and can take optional chunk options, written in YAML style on lines that begin with #|.

In this case, the label of the code chunk is load-packages, and we set include to false to indicate that we don’t want the chunk itself or any of its outputs in the rendered documents.
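The chunk itself is not shown in this copy of the workbook. As a sketch, its body in the Source editor (between the {r} chunk fences) might look like the following; loading the tidyverse inside it is an assumption, not necessarily what the original chunk did.

```r
#| label: load-packages
#| include: false

# In a .qmd file the two #| lines above are Quarto chunk options;
# run as plain R they are ordinary comments and have no effect.
# Assumption: this chunk loads the tidyverse used later in the workbook.
library(tidyverse)
```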

echo: false hides the code but still shows its output. To apply it to the whole document, add it to the YAML header under execute (an execute: line followed by an indented echo: false).
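A minimal sketch of a chunk using echo: false; the label diamond-count is made up for illustration. When rendered, only the printed number would appear on the page.

```r
#| label: diamond-count
#| echo: false

# With echo: false the rendered document shows only the result below,
# not this code. (Hypothetical label; any unique label works.)
nrow(ggplot2::diamonds)
```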

Markdown text

Text with formatting, including section headers, hyperlinks, an embedded image, and an inline code chunk.

Quarto uses markdown syntax for text. If you use the Visual editor, you won't need to learn much markdown syntax to author your document, as you can use the menus and shortcuts to add a header, bold text, or insert a table. If you use the Source editor, you can achieve the same with markdown syntax, such as ## for a header or **bold** for bold text.

This scatter plot illustrates the relationship between flipper and bill length of penguins.
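The plot itself did not survive in this copy of the workbook. A minimal sketch of such a plot, assuming the palmerpenguins package is installed (its penguins data has flipper_length_mm and bill_length_mm columns); colouring by species and the axis labels are my own choices, not necessarily the original's.

```r
library(ggplot2)

# Assumption: palmerpenguins is installed; referencing it with :: avoids
# any clash with other datasets that happen to be called penguins
penguins_df <- palmerpenguins::penguins

# Flipper length against bill length, one point per penguin
p <- ggplot(penguins_df, aes(x = flipper_length_mm, y = bill_length_mm)) +
  geom_point(aes(colour = species, shape = species), na.rm = TRUE) +  # drop rows with missing measurements quietly
  labs(
    x = "Flipper length (mm)",
    y = "Bill length (mm)",
    colour = "Penguin species",
    shape = "Penguin species"
  )

p
```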

Week 3 pre-session 08.10.24

Tidyverse

With reference to Chapter 4 of R for Graduate Students by Y. Wendy Huynh.

── Attaching core tidyverse packages ────────────────────── tidyverse 2.0.0
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2
── Conflicts ──────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package to force all conflicts to become errors

The tidyverse is a package that bundles other packages, such as dplyr and ggplot2. It must be installed on your device before it can be loaded with library().

Run install.packages("tidyverse") once (the package name needs quotes here), then library(tidyverse) in each new session.

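Packages sometimes use the same function names. When two attached packages clash, R uses the function from the package loaded most recently, which masks the other one; this is why the conflicts message above reports that dplyr::filter() masks stats::filter().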

To use ONLY the filter function from dplyr, without attaching the rest of the package, prefix it with the package name and ::

dplyr::filter()

To attach (load) ALL functions from dplyr:

library(dplyr)
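The two filter() functions do completely different jobs, which is why the :: prefix matters once dplyr is attached. A small sketch using only built-in data (mtcars ships with base R):

```r
library(dplyr)  # attaching dplyr masks stats::filter()

# stats::filter() applies a linear filter to a series: here a centred
# 3-point moving average of 1, 2, 3, 4, 5 (values: NA 2 3 4 NA)
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))

# dplyr::filter() keeps the rows of a data frame that meet a condition:
# here the cars in mtcars with 4 cylinders
four_cyl <- dplyr::filter(mtcars, cyl == 4)

moving_avg
nrow(four_cyl)
```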

Example Code:

diamonds %>%
  group_by(clarity) %>%
  summarize(m = mean(price)) %>%
  ungroup()

# A tibble: 8 × 2
  clarity     m
  <ord>   <dbl>
1 I1      3924
2 SI2     5063
3 SI1     3996
4 VS2     3925
5 VS1     3839
6 VVS2    3284
7 VVS1    2523
8 IF      2865.
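dplyr 1.1.0 and later (the version attached above is 1.1.4) also accepts a .by argument in summarize(), which groups for that one call only, so no ungroup() is needed. A sketch comparing the two forms; since .by returns groups in order of first appearance, the result is sorted with arrange(clarity) to line it up with the grouped version.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data

# Grouped version, as in the example above
with_group_by <- diamonds %>%
  group_by(clarity) %>%
  summarize(m = mean(price)) %>%
  ungroup()

# Per-call grouping with .by: the result is never left grouped.
# Groups come out in order of first appearance, so sort by clarity level.
with_by <- diamonds %>%
  summarize(m = mean(price), .by = clarity) %>%
  arrange(clarity)

with_by
```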

Diamond data set

Chapter 5, R for Graduate Students

The diamonds dataset is not part of base R; it comes with the ggplot2 package, so it is available once ggplot2 (or the tidyverse) is loaded.

In RStudio, typing View(diamonds) into the console opens the data in a spreadsheet-style viewer. The viewer is read-only, so any changes to the data must be made in code.

view(diamonds)

str(diamonds) displays the structure of the data: 53,940 rows and 10 variables (three ordered factors, one integer, and six numeric).

str(diamonds)

tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
 $ carat  : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
 $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
 $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
 $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
 $ depth  : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
 $ table  : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
 $ price  : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
 $ x      : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
 $ y      : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
 $ z      : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
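The variable-type count given above can be checked in code rather than by reading the str() output. A small sketch: ordered factors report two classes, c("ordered", "factor"), so only the first class of each column is kept.

```r
library(ggplot2)  # provides the diamonds data

# First class of each column, named by column
col_types <- vapply(diamonds, function(col) class(col)[1], character(1))

# Tally of column types: integer 1, numeric 6, ordered 3
table(col_types)
```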

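Don’t forget that R is case-sensitive: View() (from base R's utils package) and view() (from the tidyverse's tibble package) are two different functions, even though both open the data viewer.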

Week 3 post-session 10.10.24

Chapter 6.6 arrange()

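arrange() sorts the rows of a data frame by the values of one or more variables, in ascending order by default or in descending order with desc(). It works for both numerical and non-numerical values: ordered factors such as cut are sorted by their level order (Fair < Good < Very Good < Premium < Ideal), not alphabetically.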
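A tiny made-up table makes the three cases visible at once: an ordered factor sorts by level order, a character column sorts alphabetically, and desc() reverses a numeric sort. The column names and values here are purely illustrative.

```r
library(dplyr)  # also re-exports tibble()

# Hypothetical toy data: an ordered factor, a character column and a number
toy <- tibble(
  grade = factor(c("Good", "Fair", "Ideal"),
                 levels = c("Fair", "Good", "Ideal"), ordered = TRUE),
  label = c("b", "c", "a"),
  price = c(500, 300, 900)
)

by_grade <- arrange(toy, grade)        # level order: Fair, Good, Ideal
by_label <- arrange(toy, label)        # alphabetical: a, b, c
by_price <- arrange(toy, desc(price))  # largest first: 900, 500, 300
```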

diamonds %>% arrange(cut)

# A tibble: 53,940 × 10
   carat cut   color clarity depth table price     x     y     z
   <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1  0.22 Fair  E     VS2      65.1    61   337  3.87  3.78  2.49
 2  0.86 Fair  E     SI2      55.1    69  2757  6.45  6.33  3.52
 3  0.96 Fair  F     SI2      66.3    62  2759  6.27  5.95  4.07
 4  0.7  Fair  F     VS2      64.5    57  2762  5.57  5.53  3.58
 5  0.7  Fair  F     VS2      65.3    55  2762  5.63  5.58  3.66
 6  0.91 Fair  H     SI2      64.4    57  2763  6.11  6.09  3.93
 7  0.91 Fair  H     SI2      65.7    60  2763  6.03  5.99  3.95
 8  0.98 Fair  H     SI2      67.9    60  2777  6.05  5.97  4.08
 9  0.84 Fair  G     SI1      55.1    67  2782  6.39  6.2   3.47
10  1.01 Fair  E     I1       64.5    58  2788  6.29  6.21  4.03
# ℹ 53,930 more rows

diamonds %>%                         # utilizes the diamonds dataset
  group_by(color, clarity) %>%       # groups data by color and clarity variables
  mutate(price200 = mean(price)) %>% # new variable: mean price within each color/clarity group
  ungroup() %>%                      # data no longer grouped by color and clarity
  mutate(random10 = 10 + price) %>%  # new variable, original price + $10
  select(cut, color,                 # retain only these listed columns
         clarity, price, 
         price200, random10) %>% 
  arrange(color) %>%                 # sort rows by color level (D first)
  group_by(cut) %>%                  # group data by cut
  mutate(dis = n_distinct(price),    # counts the number of unique price values per cut 
         rowID = row_number()) %>%   # numbers each row consecutively for each cut
  ungroup()                          # final ungrouping of data

# A tibble: 53,940 × 8
   cut       color clarity price price200 random10   dis rowID
   <ord>     <ord> <ord>   <int>    <dbl>    <dbl> <int> <int>
 1 Very Good D     VS2       357    2587.      367  5840     1
 2 Very Good D     VS1       402    3030.      412  5840     2
 3 Very Good D     VS2       403    2587.      413  5840     3
 4 Good      D     VS2       403    2587.      413  3086     1
 5 Good      D     VS1       403    3030.      413  3086     2
 6 Premium   D     VS2       404    2587.      414  6014     1
 7 Premium   D     SI1       552    2976.      562  6014     2
 8 Ideal     D     SI1       552    2976.      562  7281     1
 9 Ideal     D     SI1       552    2976.      562  7281     2
10 Very Good D     VVS1      553    2948.      563  5840     4
# ℹ 53,930 more rows
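The pipeline above uses mutate() on grouped data, which is why all 53,940 rows survive with the group mean repeated on each one. A short sketch contrasting that with summarise(), which collapses each color/clarity group to a single row:

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data

# mutate() on grouped data: every row kept, each gets its group's mean
per_row <- diamonds %>%
  group_by(color, clarity) %>%
  mutate(group_mean = mean(price)) %>%
  ungroup()

# summarise() on the same groups: one row per color/clarity combination
per_group <- diamonds %>%
  group_by(color, clarity) %>%
  summarise(group_mean = mean(price), .groups = "drop")

nrow(per_row)
nrow(per_group)
```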

Chapter 6.7 Extra practice

View all of the variable names in the diamonds dataset

view(diamonds)
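view() opens the whole table; to list just the variable names in the console, names() (or colnames()) is enough:

```r
library(ggplot2)  # provides the diamonds data

# Column names in their stored order
names(diamonds)
```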

Arrange the diamonds from lowest to highest price

diamonds %>% arrange(price)

# A tibble: 53,940 × 10
   carat cut       color clarity depth table price     x     y     z
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
 2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
 3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
 4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
 5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
 6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
 7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
 8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
 9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39
# ℹ 53,930 more rows

Arrange the diamonds from highest to lowest price

diamonds %>% arrange(desc(price))

# A tibble: 53,940 × 10
   carat cut       color clarity depth table price     x     y     z
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1  2.29 Premium   I     VS2      60.8    60 18823  8.5   8.47  5.16
 2  2    Very Good G     SI1      63.5    56 18818  7.9   7.97  5.04
 3  1.51 Ideal     G     IF       61.7    55 18806  7.37  7.41  4.56
 4  2.07 Ideal     G     SI2      62.5    55 18804  8.2   8.13  5.11
 5  2    Very Good H     SI1      62.8    57 18803  7.95  8     5.01
 6  2.29 Premium   I     SI1      61.8    59 18797  8.52  8.45  5.24
 7  2.04 Premium   H     SI1      58.1    60 18795  8.37  8.28  4.84
 8  2    Premium   I     VS1      60.8    59 18795  8.13  8.02  4.91
 9  1.71 Premium   F     VS2      62.3    59 18791  7.57  7.53  4.7 
10  2.15 Ideal     G     SI2      62.6    54 18791  8.29  8.35  5.21
# ℹ 53,930 more rows

Arrange the diamonds by cut, and from lowest to highest price within each cut

diamonds %>% arrange(price) %>% arrange(cut)

# A tibble: 53,940 × 10
   carat cut   color clarity depth table price     x     y     z
   <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1  0.22 Fair  E     VS2      65.1    61   337  3.87  3.78  2.49
 2  0.25 Fair  E     VS1      55.2    64   361  4.21  4.23  2.33
 3  0.23 Fair  G     VVS2     61.4    66   369  3.87  3.91  2.39
 4  0.27 Fair  E     VS1      66.4    58   371  3.99  4.02  2.66
 5  0.3  Fair  J     VS2      64.8    58   416  4.24  4.16  2.72
 6  0.3  Fair  F     SI1      63.1    58   496  4.3   4.22  2.69
 7  0.34 Fair  J     SI1      64.5    57   497  4.38  4.36  2.82
 8  0.37 Fair  F     SI1      65.3    56   527  4.53  4.47  2.94
 9  0.3  Fair  D     SI2      64.6    54   536  4.29  4.25  2.76
10  0.25 Fair  D     VS1      61.2    55   563  4.09  4.11  2.51
# ℹ 53,930 more rows
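Chaining two arrange() calls relies on the second sort keeping tied rows in their existing order, which, as I understand it, dplyr does. Passing both variables to a single arrange() states the intent directly: the first variable is the main sort key and the second breaks ties. A sketch checking that both give the same cut and price ordering, plus the descending version used in the next exercise:

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data

chained  <- diamonds %>% arrange(price) %>% arrange(cut)
one_call <- diamonds %>% arrange(cut, price)  # cut first, price breaks ties

# Best cut first, then highest price within each cut
desc_both <- diamonds %>% arrange(desc(cut), desc(price))
```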

Arrange the diamonds by cut (best first), and from highest to lowest price within each cut

diamonds %>% arrange(desc(price)) %>% arrange(desc(cut))

# A tibble: 53,940 × 10
   carat cut   color clarity depth table price     x     y     z
   <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1  1.51 Ideal G     IF       61.7  55   18806  7.37  7.41  4.56
 2  2.07 Ideal G     SI2      62.5  55   18804  8.2   8.13  5.11
 3  2.15 Ideal G     SI2      62.6  54   18791  8.29  8.35  5.21
 4  2.05 Ideal G     SI1      61.9  57   18787  8.1   8.16  5.03
 5  1.6  Ideal F     VS1      62    56   18780  7.47  7.52  4.65
 6  2.06 Ideal I     VS2      62.2  55   18779  8.15  8.19  5.08
 7  1.71 Ideal G     VVS2     62.1  55   18768  7.66  7.63  4.75
 8  2.08 Ideal H     SI1      58.7  60   18760  8.36  8.4   4.92
 9  2.03 Ideal G     SI1      60    55.8 18757  8.17  8.3   4.95
10  2.61 Ideal I     SI2      62.1  56   18756  8.85  8.73  5.46
# ℹ 53,930 more rows

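Arrange the diamonds by clarity, and from lowest to highest price within each clarity level. Note that clarity is an ordered factor running from I1 (worst) to IF (best), so desc(clarity) below lists the best clarity first.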

diamonds %>% arrange(price) %>% arrange(desc(clarity))

# A tibble: 53,940 × 10
   carat cut       color clarity depth table price     x     y     z
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1  0.23 Very Good H     IF       63.9    55   369  3.89  3.9   2.49
 2  0.24 Very Good H     IF       61.3    56   449  4.04  4.06  2.48
 3  0.26 Ideal     H     IF       61.1    57   468  4.12  4.16  2.53
 4  0.23 Very Good F     IF       61      62   485  3.95  3.99  2.42
 5  0.3  Ideal     J     IF       61.5    56   489  4.32  4.33  2.66
 6  0.3  Ideal     J     IF       61.5    57   489  4.29  4.36  2.66
 7  0.23 Very Good E     IF       59.9    58   492  3.98  4.03  2.4 
 8  0.24 Good      F     IF       65.1    58   492  3.86  3.88  2.52
 9  0.24 Ideal     H     IF       62.5    54   504  3.97  4     2.49
10  0.24 Ideal     H     IF       62.1    57   504  4     4.04  2.5 
# ℹ 53,930 more rows
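For worst to best clarity, the clarity sort simply stays ascending, with price as the tie-breaker within each level. A sketch:

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data

# Clarity levels run from I1 (worst) to IF (best), so ascending order
# gives worst to best; price is sorted low to high within each level
worst_to_best <- diamonds %>% arrange(clarity, price)

head(worst_to_best)
```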

Create a new variable named salePrice to reflect a discount of $250 off the original cost of each diamond

diamonds %>% mutate(salePrice = price - 250)

# A tibble: 53,940 × 11
   carat cut       color clarity depth table price     x     y     z salePrice
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>     <dbl>
 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43        76
 2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31        76
 3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31        77
 4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63        84
 5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75        85
 6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48        86
 7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47        86
 8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53        87
 9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49        87
10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39        88
# ℹ 53,930 more rows

Remove the x, y, and z variables from the diamonds dataset

diamonds %>% select(-x, -y, -z)

# A tibble: 53,940 × 7
   carat cut       color clarity depth table price
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int>
 1  0.23 Ideal     E     SI2      61.5    55   326
 2  0.21 Premium   E     SI1      59.8    61   326
 3  0.23 Good      E     VS1      56.9    65   327
 4  0.29 Premium   I     VS2      62.4    58   334
 5  0.31 Good      J     SI2      63.3    58   335
 6  0.24 Very Good J     VVS2     62.8    57   336
 7  0.24 Very Good I     VVS1     62.3    57   336
 8  0.26 Very Good H     SI1      61.9    55   337
 9  0.22 Fair      E     VS2      65.1    61   337
10  0.23 Very Good H     VS1      59.4    61   338
# ℹ 53,930 more rows
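The same columns can be dropped more compactly by negating a c() of names, or kept with a range selector, since carat through price are adjacent. A sketch showing that both give the same result:

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data

# Drop several columns at once
no_dims <- diamonds %>% select(-c(x, y, z))

# Keep the contiguous block of columns from carat to price instead
no_dims_range <- diamonds %>% select(carat:price)
```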

Determine the number of diamonds there are for each cut value

diamonds %>% group_by(cut) %>% summarise(number = n()) %>% ungroup()

# A tibble: 5 × 2
  cut       number
  <ord>      <int>
1 Fair        1610
2 Good        4906
3 Very Good  12082
4 Premium    13791
5 Ideal      21551
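dplyr's count() is a shorthand for this group_by() + summarise(n()) + ungroup() pattern; the count column is named n by default. A sketch:

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data

# One row per cut level, with the number of diamonds in column n
cut_counts <- diamonds %>% count(cut)

cut_counts
```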

Create a new column named totalnum that holds the total number of diamonds. n() already returns the number of rows, so wrapping it in sum() is unnecessary.

diamonds %>% mutate(totalnum = n())

# A tibble: 53,940 × 11
   carat cut       color clarity depth table price     x     y     z totalnum
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>    <int>
 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43    53940
 2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31    53940
 3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31    53940
 4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63    53940
 5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75    53940
 6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48    53940
 7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47    53940
 8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53    53940
 9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49    53940
10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39    53940
# ℹ 53,930 more rows
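Because mutate() repeats the total on every row, summarise() is a tidier choice when only the single number is wanted. A sketch, with base R's nrow() as an equivalent check:

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data

# One-row summary holding the total number of diamonds
total <- diamonds %>% summarise(totalnum = n())

total
nrow(diamonds)  # base R equivalent
```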