Introduction

This file covers basic plotting functions with ggplot2. Material covered includes:
- Plotting by category using color, shape, etc.
- Plotting multiple geoms
- Plotting and highlighting subsets of data
- Transforming data with stats such as stat_count() to see what a plot actually draws
- Aesthetic and position adjustments
- A plot template at the end

Shortcuts

Ctrl + Alt + I #insert a new code chunk
Ctrl + Enter #run the current line or selection
Ctrl + Shift + Enter #run the whole chunk

When knitting: more # symbols before a heading = smaller heading text in the rendered document

# install.packages("tidyverse") 
# install.packages("knitr")
# remove.packages()
library("tidyverse") 
library("knitr") 

Plotting

Create a plot with the mpg dataset, which comes with ggplot2 (loaded by tidyverse). Tab completion can be used to select the x & y variables.

mpg
## # A tibble: 234 × 11
##    manufacturer model      displ  year   cyl trans drv     cty   hwy fl    class
##    <chr>        <chr>      <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
##  1 audi         a4           1.8  1999     4 auto… f        18    29 p     comp…
##  2 audi         a4           1.8  1999     4 manu… f        21    29 p     comp…
##  3 audi         a4           2    2008     4 manu… f        20    31 p     comp…
##  4 audi         a4           2    2008     4 auto… f        21    30 p     comp…
##  5 audi         a4           2.8  1999     6 auto… f        16    26 p     comp…
##  6 audi         a4           2.8  1999     6 manu… f        18    26 p     comp…
##  7 audi         a4           3.1  2008     6 auto… f        18    27 p     comp…
##  8 audi         a4 quattro   1.8  1999     4 manu… 4        18    26 p     comp…
##  9 audi         a4 quattro   1.8  1999     4 auto… 4        16    25 p     comp…
## 10 audi         a4 quattro   2    2008     4 manu… 4        20    28 p     comp…
## # ℹ 224 more rows
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) 

#displ is engine size (displacement, in litres), hwy is highway fuel efficiency (miles per gallon)
#geom_point() creates a scatter plot; aes() maps variables to aesthetics such as the x and y positions
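A ggplot can also be saved to a variable and built up layer by layer with +. A minimal sketch (the names p and p_points are just illustrative):

```r
library(ggplot2)

# ggplot() on its own returns a plot object with mappings but no layers
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))

# Adding a geom returns a new plot object; p itself is unchanged
p_points <- p + geom_point()

# Printing the object (or typing its name in the console) draws it
print(p_points)
```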

Plot template using ggplot()

#ggplot(data = <DATA>) + 
#  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))

Set car class to be displayed by color using aes()

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, colour = class))

Set car class to be displayed by point size (not good data visualization: class has no natural order, and ggplot2 warns that using size for a discrete variable is not advised)

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, size = class))

Set car class to be displayed by different transparency (ggplot2 gives a similar warning for alpha on a discrete variable)

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, alpha = class))

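Set car class to be displayed by different shapes (not recommended with more than 6 categories: the default shape palette only has 6 shapes, so the 7th class, suv, is dropped with a warning)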

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, shape = class))
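If you really need shapes for all 7 classes, the warning suggests specifying them manually. A sketch using scale_shape_manual(); the shape codes and the name p_shapes are arbitrary choices for illustration:

```r
library(ggplot2)

# mpg$class has 7 values, one more than the default shape palette supports,
# so supply 7 shape codes by hand (codes chosen arbitrarily)
p_shapes <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = class)) +
  scale_shape_manual(values = c(16, 17, 15, 3, 7, 8, 4))

p_shapes
```

With a manual scale, no rows are dropped, though 7 shapes are still hard to tell apart; color is usually the better choice here.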

Turn all points blue (color is set outside aes() because it is a fixed value, not a mapping to a variable)

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

Color-code points by whether engine size is below 5 (displ < 5 is evaluated to TRUE/FALSE for each car)

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, colour = displ < 5))

Error reminder: the + must go at the end of a line, not at the start of the next one

ggplot(data = mpg) 

+ geom_point(mapping = aes(x = displ, y = hwy)) 
## Error:
## ! Cannot use `+` with a single argument.
## ℹ Did you accidentally put `+` on a new line?

Creating subplots/panels: split the plot by class into 2 rows with facet_wrap()

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  facet_wrap(~ class, nrow = 2) #~ names the variable to facet by, nrow sets the no. of rows
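facet_wrap() has other layout options too. A sketch using ncol and scales = "free_y" so each panel gets its own y range (the values chosen, and the name p_free, are illustrative):

```r
library(ggplot2)

# Same faceting by class, laid out in 4 columns, each panel with its own y axis
p_free <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, ncol = 4, scales = "free_y")

p_free
```

Free scales make patterns within each class easier to see, but panels can no longer be compared by eye.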

Use facet_grid() to split plots by two variables (written as rows ~ columns)

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  facet_grid(drv ~ cyl)

Use . in place of a variable if you don’t want to facet in the rows or the columns dimension

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  facet_grid(. ~ cyl)
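Newer ggplot2 versions also accept the vars() helper instead of a formula, which avoids the . placeholder. A sketch, assuming a ggplot2 version that supports the rows/cols arguments of facet_grid() (the plot names are illustrative):

```r
library(ggplot2)

# Equivalent to facet_grid(. ~ cyl): one column of panels per cyl value
p_cols <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(cols = vars(cyl))

# Equivalent to facet_grid(drv ~ .): one row of panels per drv value
p_rows <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(rows = vars(drv))
```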

Plot data as smooth line w/ geom_smooth()

ggplot(data = mpg) + 
  geom_smooth(mapping = aes(x = displ, y = hwy))

Can comment out plots that don’t display data well to keep track

ggplot(data = mpg) + 
  #geom_point(mapping = aes(x = displ, y = hwy)) # points horrible 
  geom_smooth(mapping = aes(x = displ, y = hwy)) # smooth line better

Can generate multiple lines for different categories with linetype =

ggplot(data = mpg) + 
  geom_smooth(mapping = aes(x = displ, y = hwy, linetype = drv))

Use group = to draw a separate line for each category (without changing the line's look or adding a legend)

ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy, group = drv))

Change color of each line based on drv and exclude legend

ggplot(data = mpg) +
  geom_smooth(
    mapping = aes(x = displ, y = hwy, color = drv),
    show.legend = FALSE
  )

Plot data as a scatterplot and a smooth line on top of each other (to change the x variable you would have to edit it in both geoms)

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) +
  geom_smooth(mapping = aes(x = displ, y = hwy))

A more efficient way to plot the above: mappings set in ggplot() are shared by every geom, so x & y only need changing in one place

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point() + 
  geom_smooth()

These two plots are the same, just made with different methods

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point() + 
  geom_smooth()

ggplot() + 
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))

Can add a mapping to just one geom, e.g. to color only the points by class

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point(mapping = aes(color = class)) + 
  geom_smooth()

Plot a subset of the data with one geom (smooth line for subcompact vehicles only; se = FALSE hides the confidence band)

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point(mapping = aes(color = class)) + 
  geom_smooth(data = filter(mpg, class == "subcompact"), se = FALSE)

Change the geom (e.g. geom_line(), geom_histogram(), geom_boxplot(), geom_bar()) to get the desired chart type

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_line()
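For example, a box plot and a histogram of the same data. Note that some geoms need different mappings: a box plot needs a categorical x, and a histogram only takes x. A sketch; binwidth = 2 and the plot names are illustrative choices:

```r
library(ggplot2)

# Box plot: distribution of highway mpg within each car class
p_box <- ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot()

# Histogram: only an x variable; each bar counts cars in a 2 mpg wide bin
p_hist <- ggplot(data = mpg, mapping = aes(x = hwy)) +
  geom_histogram(binwidth = 2)
```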

Transforming Data (diamonds dataset)

Bar chart of counts by cut shows most diamonds have high quality cuts

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut))

stat_count() can be used instead of geom_bar(): geom_bar() uses stat_count() by default, and stat_count() draws bars by default, so both produce the same plot

ggplot(data = diamonds) + 
  stat_count(mapping = aes(x = cut))
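To check what stat_count() actually computes, ggplot_build() returns the data behind each layer. A sketch (p_count and built are illustrative names):

```r
library(ggplot2)

p_count <- ggplot(data = diamonds) +
  stat_count(mapping = aes(x = cut))

# ggplot_build() exposes the data computed by the stat: one row per bar
built <- ggplot_build(p_count)$data[[1]]
built$count

# The same numbers as tabulating the column directly
as.vector(table(diamonds$cut))
```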

Change the stat from count (a summary) to identity so the bar heights use raw y values. (Required when the data are already summarised and you want to override the default stat)

demo <- tribble(
  ~cut,         ~freq,
  "Fair",       1610,
  "Good",       4906,
  "Very Good",  12082,
  "Premium",    13791,
  "Ideal",      21551)
demo
## # A tibble: 5 × 2
##   cut        freq
##   <chr>     <dbl>
## 1 Fair       1610
## 2 Good       4906
## 3 Very Good 12082
## 4 Premium   13791
## 5 Ideal     21551
ggplot(data = demo) +
  geom_bar(mapping = aes(x = cut, y = freq), stat = "identity")
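geom_col() is a shorthand for geom_bar(stat = "identity"). A sketch reusing the same demo tribble (p_col is an illustrative name):

```r
library(ggplot2)
library(tibble)

demo <- tribble(
  ~cut,         ~freq,
  "Fair",       1610,
  "Good",       4906,
  "Very Good",  12082,
  "Premium",    13791,
  "Ideal",      21551)

# Bar heights come straight from freq; no counting is done
p_col <- ggplot(data = demo) +
  geom_col(mapping = aes(x = cut, y = freq))

p_col
```

Since cut is a character column here, the bars come out in alphabetical order; converting it to a factor with the levels in the desired order fixes that.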

Aesthetic Adjustments

fill = can give each bar of the diamond cut chart its own color, or, when mapped to another variable such as clarity, split each bar into stacked segments showing how many diamonds of that cut have a certain clarity

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = cut))

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity))

Position adjustments: identity (raw data, bars overlap), fill (stacks scaled to the same height), and dodge (forces ggplot2 to place bars side by side instead of on top of each other). Without one, geom_bar() stacks the bars.

To alter transparency (alpha)

ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) + 
  geom_bar(alpha = 1/5, position = "identity")   #identity draws each bar from 0 instead of stacking, so bars overlap

ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) + 
  geom_bar(alpha = 3/5, position = "identity")

To color the bar outlines with no fill color (never use)

ggplot(data = diamonds, mapping = aes(x = cut, colour = clarity)) + 
  geom_bar(fill = NA, position = "identity")

To make bars the same height to look at proportions in a group use: position = "fill"

Use position = "dodge" to place bars beside each other instead (handy for pop-size structure plots)

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill")

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")
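One way to confirm what position = "fill" does is to look at the built data: every stacked bar should top out at 1, i.e. a proportion. A sketch (p_fill, filled and tops are illustrative names):

```r
library(ggplot2)

p_fill <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill")

filled <- ggplot_build(p_fill)$data[[1]]

# Highest ymax per bar; unclass() turns ggplot2's internal x column into plain numbers
tops <- tapply(filled$ymax, unclass(filled$x), max)
tops
```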

position = "jitter" adds a small amount of random noise to each point to lessen overlap between points. This is useful for scatterplots but not bar plots.

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy), position = "jitter")
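Jitter is random, so the plot changes every time it is drawn. position_jitter() takes width, height and seed arguments to control the amount of noise and make it reproducible; the values below (and p_jit) are illustrative:

```r
library(ggplot2)

# Each point moves at most 0.1 horizontally and 0.5 vertically;
# a fixed seed gives the same jitter every time the plot is drawn
p_jit <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(position = position_jitter(width = 0.1, height = 0.5, seed = 1))

p_jit
```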

PLOT TEMPLATE (fill in <>)

#ggplot(data = <DATA>) + 
#  <GEOM_FUNCTION>(
#    mapping = aes(<MAPPINGS>),
#    stat = <STAT>, 
#    position = <POSITION>
#  ) +
#  <FACET_FUNCTION>
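The template filled in, as one example. The choices of data, geom, stat, position and facet (and the name p_template) are illustrative:

```r
library(ggplot2)

# <DATA> = diamonds, <GEOM_FUNCTION> = geom_bar, <MAPPINGS> = x = cut, fill = clarity,
# <STAT> = "count", <POSITION> = "dodge", <FACET_FUNCTION> = facet_wrap(~ color)
p_template <- ggplot(data = diamonds) +
  geom_bar(
    mapping = aes(x = cut, fill = clarity),
    stat = "count",
    position = "dodge"
  ) +
  facet_wrap(~ color)

p_template
```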