These notes are designed to track Section 3 of R for Data Science, by Wickham and Grolemund.

First get all of the packages included in the tidyverse. This includes ggplot2.

library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag():    dplyr, stats

Examine the mpg dataframe from ggplot2.

str(mpg)
## Classes 'tbl_df', 'tbl' and 'data.frame':    234 obs. of  11 variables:
##  $ manufacturer: chr  "audi" "audi" "audi" "audi" ...
##  $ model       : chr  "a4" "a4" "a4" "a4" ...
##  $ displ       : num  1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
##  $ year        : int  1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
##  $ cyl         : int  4 4 4 4 6 6 6 4 4 4 ...
##  $ trans       : chr  "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
##  $ drv         : chr  "f" "f" "f" "f" ...
##  $ cty         : int  18 21 20 21 16 18 18 18 16 20 ...
##  $ hwy         : int  29 29 31 30 26 26 27 26 25 28 ...
##  $ fl          : chr  "p" "p" "p" "p" ...
##  $ class       : chr  "compact" "compact" "compact" "compact" ...
summary(mpg)
##  manufacturer          model               displ            year     
##  Length:234         Length:234         Min.   :1.600   Min.   :1999  
##  Class :character   Class :character   1st Qu.:2.400   1st Qu.:1999  
##  Mode  :character   Mode  :character   Median :3.300   Median :2004  
##                                        Mean   :3.472   Mean   :2004  
##                                        3rd Qu.:4.600   3rd Qu.:2008  
##                                        Max.   :7.000   Max.   :2008  
##       cyl           trans               drv                 cty       
##  Min.   :4.000   Length:234         Length:234         Min.   : 9.00  
##  1st Qu.:4.000   Class :character   Class :character   1st Qu.:14.00  
##  Median :6.000   Mode  :character   Mode  :character   Median :17.00  
##  Mean   :5.889                                         Mean   :16.86  
##  3rd Qu.:8.000                                         3rd Qu.:19.00  
##  Max.   :8.000                                         Max.   :35.00  
##       hwy             fl               class          
##  Min.   :12.00   Length:234         Length:234        
##  1st Qu.:18.00   Class :character   Class :character  
##  Median :24.00   Mode  :character   Mode  :character  
##  Mean   :23.44                                        
##  3rd Qu.:27.00                                        
##  Max.   :44.00

The following variables are character: class, fl, drv and trans. Get tables of these to see what they mean.

table(mpg$class)
## 
##    2seater    compact    midsize    minivan     pickup subcompact 
##          5         47         41         11         33         35 
##        suv 
##         62
table(mpg$fl)
## 
##   c   d   e   p   r 
##   1   5   8  52 168
table(mpg$drv)
## 
##   4   f   r 
## 103 106  25
table(mpg$trans)
## 
##   auto(av)   auto(l3)   auto(l4)   auto(l5)   auto(l6)   auto(s4) 
##          5          2         83         39          6          3 
##   auto(s5)   auto(s6) manual(m5) manual(m6) 
##          3         16         58         19

The minimal ggplot2 command does nothing but specify a dataframe. Note that ggplot only works on dataframes. Run a minimal command and create an object to examine.

g1 = ggplot(data = mpg)
str(g1)
## List of 9
##  $ data       :Classes 'tbl_df', 'tbl' and 'data.frame': 234 obs. of  11 variables:
##   ..$ manufacturer: chr [1:234] "audi" "audi" "audi" "audi" ...
##   ..$ model       : chr [1:234] "a4" "a4" "a4" "a4" ...
##   ..$ displ       : num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
##   ..$ year        : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
##   ..$ cyl         : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...
##   ..$ trans       : chr [1:234] "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
##   ..$ drv         : chr [1:234] "f" "f" "f" "f" ...
##   ..$ cty         : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
##   ..$ hwy         : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
##   ..$ fl          : chr [1:234] "p" "p" "p" "p" ...
##   ..$ class       : chr [1:234] "compact" "compact" "compact" "compact" ...
##  $ layers     : list()
##  $ scales     :Classes 'ScalesList', 'ggproto' <ggproto object: Class ScalesList>
##     add: function
##     clone: function
##     find: function
##     get_scales: function
##     has_scale: function
##     input: function
##     n: function
##     non_position_scales: function
##     scales: NULL
##     super:  <ggproto object: Class ScalesList> 
##  $ mapping    : list()
##  $ theme      : list()
##  $ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto' <ggproto object: Class CoordCartesian, Coord>
##     aspect: function
##     distance: function
##     expand: TRUE
##     is_linear: function
##     labels: function
##     limits: list
##     range: function
##     render_axis_h: function
##     render_axis_v: function
##     render_bg: function
##     render_fg: function
##     train: function
##     transform: function
##     super:  <ggproto object: Class CoordCartesian, Coord> 
##  $ facet      :Classes 'FacetNull', 'Facet', 'ggproto' <ggproto object: Class FacetNull, Facet>
##     compute_layout: function
##     draw_back: function
##     draw_front: function
##     draw_labels: function
##     draw_panels: function
##     finish_data: function
##     init_scales: function
##     map: function
##     map_data: function
##     params: list
##     render_back: function
##     render_front: function
##     render_panels: function
##     setup_data: function
##     setup_params: function
##     shrink: TRUE
##     train: function
##     train_positions: function
##     train_scales: function
##     vars: function
##     super:  <ggproto object: Class FacetNull, Facet> 
##  $ plot_env   :<environment: R_GlobalEnv> 
##  $ labels     : list()
##  - attr(*, "class")= chr [1:2] "gg" "ggplot"

The book suggests a template starting with just a minimal command. I usually prefer to place the basic aes specification at the beginning.

g1 = ggplot(data = mpg, aes(x = displ,y=hwy))
str(g1)
## List of 9
##  $ data       :Classes 'tbl_df', 'tbl' and 'data.frame': 234 obs. of  11 variables:
##   ..$ manufacturer: chr [1:234] "audi" "audi" "audi" "audi" ...
##   ..$ model       : chr [1:234] "a4" "a4" "a4" "a4" ...
##   ..$ displ       : num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
##   ..$ year        : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
##   ..$ cyl         : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...
##   ..$ trans       : chr [1:234] "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
##   ..$ drv         : chr [1:234] "f" "f" "f" "f" ...
##   ..$ cty         : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
##   ..$ hwy         : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
##   ..$ fl          : chr [1:234] "p" "p" "p" "p" ...
##   ..$ class       : chr [1:234] "compact" "compact" "compact" "compact" ...
##  $ layers     : list()
##  $ scales     :Classes 'ScalesList', 'ggproto' <ggproto object: Class ScalesList>
##     add: function
##     clone: function
##     find: function
##     get_scales: function
##     has_scale: function
##     input: function
##     n: function
##     non_position_scales: function
##     scales: NULL
##     super:  <ggproto object: Class ScalesList> 
##  $ mapping    :List of 2
##   ..$ x: symbol displ
##   ..$ y: symbol hwy
##  $ theme      : list()
##  $ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto' <ggproto object: Class CoordCartesian, Coord>
##     aspect: function
##     distance: function
##     expand: TRUE
##     is_linear: function
##     labels: function
##     limits: list
##     range: function
##     render_axis_h: function
##     render_axis_v: function
##     render_bg: function
##     render_fg: function
##     train: function
##     transform: function
##     super:  <ggproto object: Class CoordCartesian, Coord> 
##  $ facet      :Classes 'FacetNull', 'Facet', 'ggproto' <ggproto object: Class FacetNull, Facet>
##     compute_layout: function
##     draw_back: function
##     draw_front: function
##     draw_labels: function
##     draw_panels: function
##     finish_data: function
##     init_scales: function
##     map: function
##     map_data: function
##     params: list
##     render_back: function
##     render_front: function
##     render_panels: function
##     setup_data: function
##     setup_params: function
##     shrink: TRUE
##     train: function
##     train_positions: function
##     train_scales: function
##     vars: function
##     super:  <ggproto object: Class FacetNull, Facet> 
##  $ plot_env   :<environment: R_GlobalEnv> 
##  $ labels     :List of 2
##   ..$ x: chr "displ"
##   ..$ y: chr "hwy"
##  - attr(*, "class")= chr [1:2] "gg" "ggplot"

The object g1 has a lot of content, but attempting to display it produces an empty graph, basically a blank slate. Note tha the two axes have been created with useful values.

g1

With the aesthetics specified at the beginning we don’t need to repeat the aesthestics in the following layers.

Start with a layer containing a geom, geom_point() to create a scatterplot.

g2 = g1 + geom_point()
g2

We can add an aesthetic in the geom_point geom. Use color to indicate the class of variable.

g2 = g1 + geom_point(aes(color = class))
g2

Note that the original aesthestics specified when we created g1 are preserved.

Now we can add another layer. Add a layer with a second geom, geom_smooth() to add a loess curve.

g3 = g2 + geom_smooth()
g3
## `geom_smooth()` using method = 'loess'