Week 1 01.10.24
Meet Quarto
This workbook will serve as a guide to assist with coding and as a record of my development.
A picture from a recent trip to Iceland.
To add a photograph, use the image icon in the editor toolbar and select Browse, which lists the available images. It is also possible to insert links to data or key terms.
Coding
Switching to the Source tab shows the Markdown that is generated as you work in the Visual editor. The rendered document usually looks much the same as it did in the Visual editor.
Rendering generates a file (for example, an HTML page) that combines the code and its output.
A YAML header is marked by three dashes (---) at the start and at the end.
YAML uses key-value pairs in the format key: value. Other fields commonly found in headers include date, subtitle, theme and font colour.
Code Chunks
R code chunks are identified by {r}, with optional chunk options written in YAML style on lines beginning with #|.
For example, in a chunk labelled load-packages, setting include: false means that neither the chunk nor any of its output appears in the rendered document.
echo: false hides the code and shows only its output. To apply it to the whole document, add it to the YAML header under the execute key.
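As a sketch of how chunk options look in practice, the R code below is what a chunk labelled load-packages might contain, assuming the tidyverse is installed. In a .qmd file it would sit inside an {r} chunk; Quarto reads the #| lines as options, while R itself treats them as ordinary comments, so the code also runs as a plain script.

```r
#| label: load-packages
#| include: false

# Quarto reads the two #| lines above as chunk options: the chunk is labelled
# load-packages, and include: false keeps both the code and its output out of
# the rendered document. Plain R ignores them as comments.
library(tidyverse)
```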
Markdown text
Text with formatting, including section headers, hyperlinks, an embedded image, and an inline code chunk.
Quarto uses Markdown syntax for text. If you use the visual editor, you won't need to learn much Markdown syntax to author your document, as you can use the menus and shortcuts to add a header, make text bold, insert a table and so on. If you use the source editor, you can achieve the same results with Markdown expressions such as ## for a section header or **bold** for bold text.
This scatter plot illustrates the relationship between flipper and bill length of penguins.
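The plot itself does not survive in this text. A minimal sketch of code that would draw a similar scatter plot, assuming the penguins data comes from the palmerpenguins package (the source does not say which dataset version or styling was used):

```r
library(ggplot2)
library(palmerpenguins) # assumption: source of the penguins data

# Scatter plot of flipper length against bill length, coloured by species.
# na.rm = TRUE silently drops the few penguins with missing measurements.
flipper_bill_plot <- ggplot(penguins,
                            aes(x = flipper_length_mm, y = bill_length_mm)) +
  geom_point(aes(colour = species, shape = species), na.rm = TRUE) +
  labs(
    title = "Flipper and bill length",
    x = "Flipper length (mm)",
    y = "Bill length (mm)",
    colour = "Species",
    shape = "Species"
  )

flipper_bill_plot
```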
Week 3 pre-session 08.10.24
Tidyverse
With reference to Chapter 4 of R for Graduate Students by Y. Wendy Huynh.
── Attaching core tidyverse packages ────────────────────── tidyverse 2.0.0
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ──────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package to force all conflicts to become errors
The tidyverse is a collection of packages, such as dplyr and ggplot2. It must be installed on the device before it can be loaded with library().
Install it once with install.packages("tidyverse") (the package name is quoted), then load it in each session with library(tidyverse).
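A common pattern for the install-once, load-every-session step, sketched here with base R's requireNamespace() so that install.packages() only runs when the tidyverse is missing:

```r
# Install the tidyverse only if it is not already available on this device.
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse") # package names are quoted strings
}

# Attach the core tidyverse packages (dplyr, ggplot2, ...) for this session.
library(tidyverse)
```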
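Packages sometimes define functions with the same name. When this happens, R uses the version from the package attached most recently, which masks the others; for example, dplyr::filter() masks stats::filter() once the tidyverse is loaded.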
To use only the filter() function from dplyr, without attaching the whole package, prefix it with the package name:
dplyr::filter()
To attach ALL of dplyr's functions for the session:
library(dplyr)
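To see the masking from the conflict messages in action, the sketch below calls both versions explicitly. dplyr::filter() keeps the rows that meet a condition, whereas stats::filter() is an unrelated function that applies a linear (here, moving-average) filter to a numeric series.

```r
library(ggplot2) # provides the diamonds dataset

# dplyr's filter(): keep only the rows where cut is "Ideal".
ideal <- dplyr::filter(diamonds, cut == "Ideal")
nrow(ideal)

# stats' filter(): a 3-point moving average of 1 to 5 (the two ends are NA).
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))
moving_avg
```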
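Example code (mean price of diamonds for each clarity grade):
diamonds %>%
group_by(clarity) %>% # one group per clarity grade
summarize(m = mean(price)) %>% # mean price within each group
ungroup() # remove any remaining grouping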
# A tibble: 8 x 2
clarity m
<ord> <dbl>
1 I1 3924
2 SI2 5063
3 SI1 3996
4 VS2 3925
5 VS1 3839
6 VVS2 3284
7 VVS1 2523
8 IF 2865.
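The same grouped summary can be extended, for example to count the diamonds in each clarity grade and sort the grades from highest to lowest mean price. A short sketch; the column names n_diamonds and mean_price are illustrative:

```r
library(dplyr)
library(ggplot2) # provides the diamonds dataset

clarity_summary <- diamonds %>%
  group_by(clarity) %>%
  summarise(
    n_diamonds = n(),         # number of diamonds in each clarity grade
    mean_price = mean(price)  # average price in each grade
  ) %>%
  arrange(desc(mean_price))   # most expensive grade (on average) first

clarity_summary
```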
Diamonds dataset
Chapter 5, R for Graduate Students
The diamonds dataset is not part of base R; it is included in the ggplot2 package, so it becomes available once ggplot2 (or the tidyverse) is loaded.
View(diamonds) typed into the console opens the data in a spreadsheet-style viewer. The viewer is read-only, so any changes to the data must be made in code.
view(diamonds)
str(diamonds) shows the structure of the data. There are 10 variables in total: three ordered factors, one integer and six numeric.
str(diamonds)
tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
$ carat : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
$ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
$ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
$ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
$ depth : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
$ table : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
$ price : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
$ x : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
$ y : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
$ z : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
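The variable counts above can also be checked in code. A small sketch using base R's vapply() to record the first class of each column:

```r
library(ggplot2) # provides the diamonds dataset

# First class of each column: "ordered" for the ordered factors,
# "integer" for price, "numeric" for the measurement columns.
var_types <- vapply(diamonds, function(col) class(col)[1], character(1))
table(var_types)
```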
Week 3 post-session 10.10.24
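Chapter 6.6 arrange()
arrange() sorts the rows of a data frame by the values of one or more variables, in ascending order by default or in descending order with desc(). It works for both numerical and non-numerical values; ordered factors such as cut are sorted by their level order (Fair, Good, Very Good, Premium, Ideal) rather than alphabetically.
diamonds %>% arrange(cut)
# A tibble: 53,940 × 10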
carat cut color clarity depth table price x y z
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49
2 0.86 Fair E SI2 55.1 69 2757 6.45 6.33 3.52
3 0.96 Fair F SI2 66.3 62 2759 6.27 5.95 4.07
4 0.7 Fair F VS2 64.5 57 2762 5.57 5.53 3.58
5 0.7 Fair F VS2 65.3 55 2762 5.63 5.58 3.66
6 0.91 Fair H SI2 64.4 57 2763 6.11 6.09 3.93
7 0.91 Fair H SI2 65.7 60 2763 6.03 5.99 3.95
8 0.98 Fair H SI2 67.9 60 2777 6.05 5.97 4.08
9 0.84 Fair G SI1 55.1 67 2782 6.39 6.2 3.47
10 1.01 Fair E I1 64.5 58 2788 6.29 6.21 4.03
# ℹ 53,930 more rows
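Because cut is an ordered factor, arrange() follows its level order rather than the alphabet. A sketch contrasting the two, using a hypothetical character copy of the column named cut_chr:

```r
library(dplyr)
library(ggplot2) # provides the diamonds dataset

# Order in which each cut first appears after sorting by the ordered factor.
by_level <- diamonds %>%
  arrange(cut) %>%
  distinct(cut) %>%
  pull(cut) %>%
  as.character()

# Same idea after converting cut to plain text, which sorts alphabetically.
by_alpha <- diamonds %>%
  mutate(cut_chr = as.character(cut)) %>%
  arrange(cut_chr) %>%
  distinct(cut_chr) %>%
  pull(cut_chr)

by_level # "Fair" "Good" "Very Good" "Premium" "Ideal"
by_alpha # "Fair" "Good" "Ideal" "Premium" "Very Good"
```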
diamonds %>% # utilizes the diamonds dataset
group_by(color, clarity) %>% # groups data by color and clarity variables
mutate(price200 = mean(price)) %>% # creates new variable (average price by groups)
ungroup() %>% # data no longer grouped by color and clarity
mutate(random10 = 10 + price) %>% # new variable, original price + $10
select(cut, color, # retain only these listed columns
clarity, price,
price200, random10) %>%
arrange(color) %>% # sort rows by color (D first)
group_by(cut) %>% # group data by cut
mutate(dis = n_distinct(price), # counts the number of unique price values per cut
rowID = row_number()) %>% # numbers each row consecutively within each cut
ungroup() # final ungrouping of data
# A tibble: 53,940 × 8
cut color clarity price price200 random10 dis rowID
<ord> <ord> <ord> <int> <dbl> <dbl> <int> <int>
1 Very Good D VS2 357 2587. 367 5840 1
2 Very Good D VS1 402 3030. 412 5840 2
3 Very Good D VS2 403 2587. 413 5840 3
4 Good D VS2 403 2587. 413 3086 1
5 Good D VS1 403 3030. 413 3086 2
6 Premium D VS2 404 2587. 414 6014 1
7 Premium D SI1 552 2976. 562 6014 2
8 Ideal D SI1 552 2976. 562 7281 1
9 Ideal D SI1 552 2976. 562 7281 2
10 Very Good D VVS1 553 2948. 563 5840 4
# ℹ 53,930 more rows
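The pipeline above uses mutate() after group_by(), which keeps every row and repeats each group's mean. summarise() instead collapses each group to a single row. A sketch comparing the two; the object names are illustrative:

```r
library(dplyr)
library(ggplot2) # provides the diamonds dataset

# mutate(): one row per diamond, each carrying its group's mean price.
with_mutate <- diamonds %>%
  group_by(color, clarity) %>%
  mutate(price200 = mean(price)) %>%
  ungroup()

# summarise(): one row per color/clarity combination.
with_summarise <- diamonds %>%
  group_by(color, clarity) %>%
  summarise(price200 = mean(price), .groups = "drop")

nrow(with_mutate)    # same as the original data: 53940
nrow(with_summarise) # at most 7 colours x 8 clarities = 56
```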
Chapter 6.7 Extra practice
- View all of the variable names in diamonds
view(diamonds)
- Arrange the diamonds by lowest to highest price
diamonds %>% arrange(price)
# A tibble: 53,940 × 10
carat cut color clarity depth table price x y z
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63
5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47
8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53
9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49
10 0.23 Very Good H VS1 59.4 61 338 4 4.05 2.39
# ℹ 53,930 more rows
- Arrange the diamonds by highest to lowest price
diamonds %>% arrange(desc(price))
# A tibble: 53,940 × 10
carat cut color clarity depth table price x y z
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 2.29 Premium I VS2 60.8 60 18823 8.5 8.47 5.16
2 2 Very Good G SI1 63.5 56 18818 7.9 7.97 5.04
3 1.51 Ideal G IF 61.7 55 18806 7.37 7.41 4.56
4 2.07 Ideal G SI2 62.5 55 18804 8.2 8.13 5.11
5 2 Very Good H SI1 62.8 57 18803 7.95 8 5.01
6 2.29 Premium I SI1 61.8 59 18797 8.52 8.45 5.24
7 2.04 Premium H SI1 58.1 60 18795 8.37 8.28 4.84
8 2 Premium I VS1 60.8 59 18795 8.13 8.02 4.91
9 1.71 Premium F VS2 62.3 59 18791 7.57 7.53 4.7
10 2.15 Ideal G SI2 62.6 54 18791 8.29 8.35 5.21
# ℹ 53,930 more rows
- Arrange the diamonds by lowest price and cut
diamonds %>% arrange(cut, price) # cut first, then lowest price within each cut
# A tibble: 53,940 × 10
carat cut color clarity depth table price x y z
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49
2 0.25 Fair E VS1 55.2 64 361 4.21 4.23 2.33
3 0.23 Fair G VVS2 61.4 66 369 3.87 3.91 2.39
4 0.27 Fair E VS1 66.4 58 371 3.99 4.02 2.66
5 0.3 Fair J VS2 64.8 58 416 4.24 4.16 2.72
6 0.3 Fair F SI1 63.1 58 496 4.3 4.22 2.69
7 0.34 Fair J SI1 64.5 57 497 4.38 4.36 2.82
8 0.37 Fair F SI1 65.3 56 527 4.53 4.47 2.94
9 0.3 Fair D SI2 64.6 54 536 4.29 4.25 2.76
10 0.25 Fair D VS1 61.2 55 563 4.09 4.11 2.51
# ℹ 53,930 more rows
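Chaining arrange(price) and then arrange(cut) also ends up sorted by cut with price ascending inside each cut, because (assuming a recent dplyr) arrange() is a stable sort that keeps tied rows in their existing order. Listing both variables in a single call states that intent directly. A sketch checking that the two forms agree on this data:

```r
library(dplyr)
library(ggplot2) # provides the diamonds dataset

two_steps <- diamonds %>% arrange(price) %>% arrange(cut)
one_step  <- diamonds %>% arrange(cut, price) # cut first, then price within cut

identical(two_steps, one_step)
```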
- Arrange the diamonds by highest price and cut
diamonds %>% arrange(desc(cut), desc(price)) # best cut first, then highest price within each cut
# A tibble: 53,940 × 10
carat cut color clarity depth table price x y z
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 1.51 Ideal G IF 61.7 55 18806 7.37 7.41 4.56
2 2.07 Ideal G SI2 62.5 55 18804 8.2 8.13 5.11
3 2.15 Ideal G SI2 62.6 54 18791 8.29 8.35 5.21
4 2.05 Ideal G SI1 61.9 57 18787 8.1 8.16 5.03
5 1.6 Ideal F VS1 62 56 18780 7.47 7.52 4.65
6 2.06 Ideal I VS2 62.2 55 18779 8.15 8.19 5.08
7 1.71 Ideal G VVS2 62.1 55 18768 7.66 7.63 4.75
8 2.08 Ideal H SI1 58.7 60 18760 8.36 8.4 4.92
9 2.03 Ideal G SI1 60 55.8 18757 8.17 8.3 4.95
10 2.61 Ideal I SI2 62.1 56 18756 8.85 8.73 5.46
# ℹ 53,930 more rows
- Arrange the diamonds by lowest to highest price and worst to best clarity.
diamonds %>% arrange(price) %>% arrange(desc(clarity))
Note: clarity is ordered from worst (I1) to best (IF), so desc(clarity) puts the best grade (IF) first, as the output below shows. To list clarity from worst to best, use diamonds %>% arrange(clarity, price) instead.
# A tibble: 53,940 × 10
carat cut color clarity depth table price x y z
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.23 Very Good H IF 63.9 55 369 3.89 3.9 2.49
2 0.24 Very Good H IF 61.3 56 449 4.04 4.06 2.48
3 0.26 Ideal H IF 61.1 57 468 4.12 4.16 2.53
4 0.23 Very Good F IF 61 62 485 3.95 3.99 2.42
5 0.3 Ideal J IF 61.5 56 489 4.32 4.33 2.66
6 0.3 Ideal J IF 61.5 57 489 4.29 4.36 2.66
7 0.23 Very Good E IF 59.9 58 492 3.98 4.03 2.4
8 0.24 Good F IF 65.1 58 492 3.86 3.88 2.52
9 0.24 Ideal H IF 62.5 54 504 3.97 4 2.49
10 0.24 Ideal H IF 62.1 57 504 4 4.04 2.5
# ℹ 53,930 more rows
- Create a new variable named salePrice to reflect a discount of $250 off the original cost of each diamond
diamonds %>% mutate(salePrice = price - 250)
# A tibble: 53,940 × 11
carat cut color clarity depth table price x y z salePrice
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43 76
2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31 76
3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31 77
4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63 84
5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75 85
6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48 86
7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47 86
8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53 87
9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49 87
10 0.23 Very Good H VS1 59.4 61 338 4 4.05 2.39 88
# ℹ 53,930 more rows
- Remove the x, y, and z variables from the diamonds dataset
diamonds %>% select(-x, -y, -z)
# A tibble: 53,940 × 7
carat cut color clarity depth table price
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int>
1 0.23 Ideal E SI2 61.5 55 326
2 0.21 Premium E SI1 59.8 61 326
3 0.23 Good E VS1 56.9 65 327
4 0.29 Premium I VS2 62.4 58 334
5 0.31 Good J SI2 63.3 58 335
6 0.24 Very Good J VVS2 62.8 57 336
7 0.24 Very Good I VVS1 62.3 57 336
8 0.26 Very Good H SI1 61.9 55 337
9 0.22 Fair E VS2 65.1 61 337
10 0.23 Very Good H VS1 59.4 61 338
# ℹ 53,930 more rows
- Determine the number of diamonds there are for each cut value
diamonds %>% group_by(cut) %>% summarise(number = n()) %>% ungroup()
# A tibble: 5 × 2
cut number
<ord> <int>
1 Fair 1610
2 Good 4906
3 Very Good 12082
4 Premium 13791
5 Ideal 21551
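dplyr's count() is a shortcut for the group_by(), summarise(n()), ungroup() pattern above; it names the count column n by default. A sketch:

```r
library(dplyr)
library(ggplot2) # provides the diamonds dataset

# One row per cut, with the number of diamonds in column n.
cut_counts <- diamonds %>% count(cut)
cut_counts
```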
- Create a new column named totalnum that calculates the total number of diamonds.
diamonds %>% mutate(totalnum = n()) # n() already returns the row count, so sum() is not needed
# A tibble: 53,940 × 11
carat cut color clarity depth table price x y z totalnum
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <int>
1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43 53940
2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31 53940
3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31 53940
4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63 53940
5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75 53940
6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48 53940
7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47 53940
8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53 53940
9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49 53940
10 0.23 Very Good H VS1 59.4 61 338 4 4.05 2.39 53940
# ℹ 53,930 more rows
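A total is most useful when combined with group counts, for example to express each cut as a share of all diamonds. A short sketch; the column name share is illustrative:

```r
library(dplyr)
library(ggplot2) # provides the diamonds dataset

cut_share <- diamonds %>%
  count(cut) %>%            # diamonds per cut, in column n
  mutate(
    totalnum = sum(n),      # total number of diamonds across all cuts
    share = n / totalnum    # proportion of all diamonds in each cut
  )

cut_share
```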