Week 1 from Introduction to Tidyverse

Notes from reading & exploring basic R from https://bookdown.org/ansellbr/WEHI_tidyR_course_book/making-beautiful-plots.html

0 Shortcut References

R

# Create the pipe operator with CTRL + SHIFT + M

Markdown

# Create a new {r} chunk with CTRL + ALT + I

1.1 Welcome to R

1.1.1 How to install a package

#install.packages('tidyverse')

1.1.2 Loading relevant packages.

Notice that library() DOES NOT need quotes around the package name, unlike install.packages('tidyverse') above.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
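The Conflicts lines mean that once tidyverse is attached, a bare `filter()` or `lag()` call runs the dplyr version instead of the stats one. Below is a minimal sketch of calling each version explicitly with the `package::function()` syntax. It assumes dplyr and ggplot2 (which provides the `mpg` data) are installed; the object names `four_cyl` and `moving_avg` are just illustrative.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# dplyr::filter() keeps the rows of a data frame that match a condition
four_cyl <- dplyr::filter(mpg, cyl == 4)

# stats::filter() is an unrelated time-series function; here it computes
# a centred 3-point moving average of the numbers 1 to 5
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))
as.numeric(moving_avg)  # NA 2 3 4 NA
```

Writing the package name explicitly avoids surprises when two loaded packages export functions with the same name.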

1.1.13 Using the pipe %>% for readability

Say we want to view the first six rows of a data frame (df) using head(). Here we store ggplot2's built-in mpg (fuel economy) dataset as mpg_df.

mpg_df <- mpg

We can simply pass the data frame directly to head() as an argument.

head(mpg_df)
## # A tibble: 6 × 11
##   manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class 
##   <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr> 
## 1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa…
## 2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa…
## 3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa…
## 4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa…
## 5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa…
## 6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa…

Alternatively, we can use the pipe (%>%) to get the same result.

mpg_df %>% head()
## # A tibble: 6 × 11
##   manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class 
##   <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr> 
## 1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa…
## 2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa…
## 3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa…
## 4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa…
## 5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa…
## 6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa…

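Although both produce the same result, the piped version is easier to read. R code often nests function calls inside one another, which have to be read from the inside out. The pipe instead passes the result on its left as the first argument of the function on its right, so a chain of steps reads from left to right.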
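To see the readability difference on more than one step, here is a small sketch that runs the same three steps (keep 4-cylinder cars, keep three columns, take the first 3 rows) once as nested calls and once as a pipe chain. It assumes dplyr and ggplot2 are installed; `filter()`, `select()` and `head()` are used only as example steps.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# Nested calls: read from the innermost function outwards
nested <- head(select(filter(mpg, cyl == 4), manufacturer, model, cty), 3)

# Piped calls: each result flows left to right into the next step
piped <- mpg %>%
  filter(cyl == 4) %>%
  select(manufacturer, model, cty) %>%
  head(3)

identical(nested, piped)  # TRUE
```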

1.2 Graphics

This section discusses how to use ggplot(), the standard plotting function of the tidyverse (from the ggplot2 package). The "gg" in ggplot stands for "grammar of graphics". The grammar refers to a standardized way of organizing graphics: every plot is built from the same components (the data, aesthetic mappings, geometric layers, and so on). This means that once you understand the basic grammar/rules, you can create many different types of plots with ggplot.

# automatically create an R chunk using CTRL+ALT+I

# ggplot() on its own creates a blank canvas for a plot
mpg_df %>% ggplot()

2.3 Aesthetics

aes() (short for aesthetics) maps columns of the data to visual properties of the plot. Here we map displ to the x axis and cty to the y axis. Since both are numeric, ggplot draws a coordinate plane with continuous x and y axes, but no data appear yet because we have not told it how to draw them.

mpg_df %>% ggplot(aes(x = displ, # engine volume
                      y = cty))  # city mileage (miles per gallon)

2.4 Geometry

We add geometric objects (how the data are drawn) with the geom_*() functions. Using '+' adds each one as a new layer on top of the base graph; geom_point() draws one point per row, giving a scatterplot.

mpg_df %>% ggplot(aes(x = displ, # engine volume
                      y = cty)) + geom_point()
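Geoms also accept fixed visual properties. Below is a sketch contrasting a fixed setting (written outside aes()) with a mapping (written inside aes()), assuming ggplot2 is installed. It uses ggplot2's layer_data(), which returns the data ggplot computes for a layer, to inspect the colours that would be drawn; the plot names are illustrative.

```r
library(ggplot2)  # provides the mpg dataset

# Setting: every point is drawn in the literal colour "blue"
set_plot <- ggplot(mpg, aes(x = displ, y = cty)) +
  geom_point(colour = "blue")

# Mapping: "blue" is treated as a data value (a one-level category),
# so ggplot picks a colour from its default palette and adds a legend
mapped_plot <- ggplot(mpg, aes(x = displ, y = cty)) +
  geom_point(aes(colour = "blue"))

unique(layer_data(set_plot)$colour)     # "blue"
unique(layer_data(mapped_plot)$colour)  # a palette colour, not "blue"
```

A common rule of thumb: put a property inside aes() when it should depend on the data, and outside aes() when it should be the same for every point.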

2.5 Adding color (Aesthetics in geom_point)

We can also give individual geom layers their own aes() mappings.

Here we color each point based on the ‘class’ column in the mpg_df data frame. Since the classes are nominal (categorical), each class is given its own distinct color.

mpg_df %>% ggplot(aes(x = displ, y = cty)) + # new line!
              geom_point(aes(colour = class))

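Notice the line break. When splitting a ggplot call across lines, the ‘+’ must go at the END of a line, not at the start of the next one. Otherwise R treats the first line as a complete command and runs it on its own.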
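R decides where a command ends by checking whether the code so far is already a complete expression. A small sketch that makes this visible with base R's parse(), which only reads code into expressions without running it; the two strings are hypothetical layouts of the same plot.

```r
# The same two-layer plot written with the "+" in two different places.
# parse() only reads the code into expressions; nothing is run or plotted.
plus_at_end   <- "ggplot(mpg, aes(displ, cty)) +\n  geom_point()"
plus_at_start <- "ggplot(mpg, aes(displ, cty))\n  + geom_point()"

length(parse(text = plus_at_end))    # 1: one complete plot expression
length(parse(text = plus_at_start))  # 2: a bare plot, then a stray "+ geom_point()"
```

If the second version is actually run, R first draws the plot without points and then evaluates `+ geom_point()` on its own, which ggplot2 reports as an error.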

Below, I use the cty variable to color the points instead. Since cty is numeric (continuous) rather than categorical, ggplot colors the points along a gradient instead of using a set of distinct colors!

mpg_df %>% ggplot(aes(x = displ, y = cty)) + # new line!
              geom_point(aes(colour = cty))
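Whether ggplot picks distinct colors or a gradient depends on the column's type. A sketch of checking the types and of wrapping a numeric column in factor() so ggplot treats it as categories, assuming ggplot2 is installed; using cyl (number of cylinders) here is just an example choice.

```r
library(ggplot2)  # provides the mpg dataset

class(mpg$class)  # "character": treated as discrete, one colour per class
class(mpg$cty)    # "integer": treated as continuous, a colour gradient

# factor() turns the numeric cyl column into categories,
# so each number of cylinders gets its own colour instead of a gradient
cyl_plot <- ggplot(mpg, aes(x = displ, y = cty)) +
  geom_point(aes(colour = factor(cyl)))

# one colour per distinct cyl value
length(unique(layer_data(cyl_plot)$colour))
```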

2.6 Adding more layers! (A trendline)

Now we can add another layer to show a trendline on our scatterplot. geom_smooth() draws a summary line fitted to all the data; here, method = 'lm' specifies that it's a straight line from a linear model (lm). By default it also shades a confidence band around the line.

mpg_df %>% ggplot(aes(x = displ, y = cty)) + 
              geom_point(aes(colour = class)) +
              geom_smooth(method = 'lm') #lm = linear model
## `geom_smooth()` using formula = 'y ~ x'
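The message `geom_smooth() using formula = 'y ~ x'` says the line is a linear model of the y aesthetic on the x aesthetic, which here means cty ~ displ. A sketch fitting that same model directly with base R's lm() to see the intercept and slope behind the line, then cross-checking it against the line ggplot computes via layer_data(). It assumes ggplot2 is installed; passing `formula = y ~ x` just silences the message.

```r
library(ggplot2)  # provides the mpg dataset

# The model geom_smooth(method = 'lm') fits: city mpg as a straight-line
# function of engine displacement (formula y ~ x with x = displ, y = cty)
fit <- lm(cty ~ displ, data = mpg)
coef(fit)  # intercept and slope; the slope for displ is negative

# Cross-check: the y values of the drawn trendline are the model's predictions
smooth_plot <- ggplot(mpg, aes(x = displ, y = cty)) +
  geom_smooth(method = "lm", formula = y ~ x)
line_data <- layer_data(smooth_plot)
all.equal(line_data$y, unname(predict(fit, data.frame(displ = line_data$x))))  # TRUE
```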

Ordering of the layers matters.

Notice above that the trendline is drawn on top of the points, covering some of them up.

This is because layers are drawn in the order they are added. So here, the points are drawn first and then the trendline, which places it on top.

Thus, if we re-arrange the layers so that the trendline is drawn first, the points then appear over the trendline.

mpg_df %>% ggplot(aes(x = displ, y = cty)) + 
              geom_smooth(method = 'lm') +  #SWAPPED!
              geom_point(aes(colour = class))
## `geom_smooth()` using formula = 'y ~ x'
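Each ‘+’ appends a layer to a list stored inside the plot object, and ggplot draws that list from first to last. A sketch that inspects this list, assuming ggplot2 3.5.x: `$layers` and each layer's `$geom` are ggplot2 internals, used here only to peek at the order, not as a stable API.

```r
library(ggplot2)  # provides the mpg dataset

points_on_top <- ggplot(mpg, aes(x = displ, y = cty)) +
  geom_smooth(method = "lm", formula = y ~ x) +  # added first, drawn at the bottom
  geom_point(aes(colour = class))                # added last, drawn on top

# The class of each layer's geom, in the order the layers were added
layer_geoms <- vapply(points_on_top$layers,
                      function(layer) class(layer$geom)[1],
                      character(1))
unname(layer_geoms)  # "GeomSmooth" "GeomPoint"
```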

2.7 Facets (Subplots)

In some cases, we want to split our original plot into subplots(!). This is called faceting.

Below we specify that the graph above should be separated based on the class variable, using facet_wrap() with the formula ‘~ class’. Notice that each subplot only includes points from its own class. Further, a separate regression line is now fitted and drawn in each subplot!

mpg_df %>% ggplot(aes(x = displ, y = cty)) + 
              geom_smooth(method = 'lm') +  
              geom_point(aes(colour = class)) + 
              facet_wrap( ~ class)
## `geom_smooth()` using formula = 'y ~ x'
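facet_wrap(~ class) makes one panel per distinct value of class. A sketch that confirms this without drawing anything, assuming ggplot2 is installed: ggplot_build() computes the plot, and its `$layout$layout` table (a ggplot2 internal, shown here for inspection only) has one row per panel.

```r
library(ggplot2)  # provides the mpg dataset

faceted <- ggplot(mpg, aes(x = displ, y = cty)) +
  geom_point() +
  facet_wrap(~ class)

# Build the plot without drawing it and count the panels in its layout
n_panels <- nrow(ggplot_build(faceted)$layout$layout)
n_panels == length(unique(mpg$class))  # TRUE: one panel per class
```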

2.8 Coordinate spaces

ggplot automatically sets the axis ranges based on the variables fed into it.

However, let's say we want both axes to start at 0. We can add another layer that sets the axis limits to our desired ranges, where xlim()/ylim() mean x limit and y limit.

mpg_df %>% ggplot(aes(x = displ, y = cty)) + 
              geom_smooth(method = 'lm') +  
              geom_point(aes(colour = class)) + 
              xlim(0, 7) + ylim(0, 40) 
## `geom_smooth()` using formula = 'y ~ x'
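One detail worth knowing: xlim()/ylim() set the scale limits, so any data outside them are removed (ggplot warns about removed rows) before layers such as geom_smooth() are computed. In the plot above every point already lies inside 0 to 7 and 0 to 40, so nothing is lost. To zoom in without dropping data, ggplot2 also offers coord_cartesian(). A sketch comparing the two with a deliberately narrow limit, assuming ggplot2 is installed and using layer_data() to inspect the computed x values:

```r
library(ggplot2)  # provides the mpg dataset

base_plot <- ggplot(mpg, aes(x = displ, y = cty)) + geom_point()

# Scale limits: x values outside 0 to 4 become NA, and those points are
# dropped when the plot is drawn
scale_limited <- base_plot + xlim(0, 4)

# Coordinate limits: every point is kept; the view is only zoomed in
coord_limited <- base_plot + coord_cartesian(xlim = c(0, 4))

sum(is.na(layer_data(scale_limited)$x))  # number of cars with displ > 4
sum(is.na(layer_data(coord_limited)$x))  # 0
```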

2.9 Axis labels

We can further label our graph by adding more layers for the axis labels and title. Here we use xlab()/ylab() for the x and y axis labels, and ggtitle() for the graph title (plus an optional subtitle).

mpg_df %>% ggplot(aes(x = displ, y = cty)) + 
              geom_smooth(method = 'lm') +  
              geom_point(aes(colour = class)) + 
              xlim(0, 7) + ylim(0, 40) + 
              xlab('Engine size (L)') +
              ylab('Miles per gallon in city') +
              ggtitle(label = 'Engine size affects mileage', subtitle = 'Can have a subtitle!') 
## `geom_smooth()` using formula = 'y ~ x'
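ggplot2 also has labs(), which sets several labels in one layer, including the legend title for a mapped aesthetic such as colour. A sketch assuming ggplot2 3.5.x; reading `$labels` back from the plot object relies on ggplot2 internals and is only done here to show where the text ends up. The legend title "Vehicle class" is an example.

```r
library(ggplot2)  # provides the mpg dataset

labelled <- ggplot(mpg, aes(x = displ, y = cty)) +
  geom_point(aes(colour = class)) +
  labs(x = "Engine size (L)",
       y = "Miles per gallon in city",
       colour = "Vehicle class",  # legend title for the colour mapping
       title = "Engine size affects mileage",
       subtitle = "Can have a subtitle!")

labelled$labels$title  # "Engine size affects mileage"
```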

2 Making more types of beautiful graphs (ggplot cont.)

Now we are going to explore different kinds of graphs using the diamonds dataset (another dataset built into ggplot2)!

ggplot can make many other types of graphs, so this section shows several of them and how to make them.

3 made edits…