Introduction to Data Visualization with ggplot2

Phoenix R User Group, Alexey Butyrev

January 22, 2018

Roadmap

Introduction: ggplot2

Plot (Base)

  1. “Artist’s palette” model
  2. Start with blank canvas and build up from there
  3. Start with plot function (or similar)
  4. Use annotation functions to add/modify (text, lines, points, axis)

Pros:

Convenient, mirrors how we think of building plots and analyzing data

Cons:

Can’t go back once the plot has started (e.g. to adjust margins), so you need to plan in advance. It is difficult to “translate” a plot to others once it has been created, because there is no graphical “language”: the plot is just a series of R commands.

# Base graphics: one high-level call draws the points, connecting lines and labels
plot(x = pressure$temperature, y = pressure$pressure,
     type = "b", col = "blue", pch = 20,
     xlab = "Temperature", ylab = "Pressure", main = "Plot Example")
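To make the “artist’s palette” model concrete, here is a minimal sketch that starts from an empty plotting region and adds each element with a separate low-level base-graphics call (`points()`, `lines()`, `abline()`, `text()`, `title()`). The dashed reference line at the mean pressure and its label are illustrative additions, not part of the original example.

```r
# Start with axes and labels only (type = "n" draws no data)
plot(pressure$temperature, pressure$pressure, type = "n",
     xlab = "Temperature", ylab = "Pressure")

# Add the data one layer at a time
points(pressure$temperature, pressure$pressure, pch = 20, col = "blue")
lines(pressure$temperature, pressure$pressure, col = "blue")

# Annotate: a dashed horizontal line at the mean pressure, labelled just above it
mean_p <- mean(pressure$pressure)
abline(h = mean_p, lty = 2)
text(x = 50, y = mean_p, labels = "mean pressure", pos = 3)
title(main = "Built up one call at a time")
```

Each call draws straight onto the device and returns no plot object, so changing something drawn earlier (the margins, say) means re-running the whole sequence.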

Lattice (Base)

Plots are created with a single function call (xyplot, bwplot, etc.)

Pros:

Most useful for conditioning types of plots: looking at how y changes with x across levels of z.
Things like margins and spacing are set automatically because the entire plot is specified at once.
Good for putting many plots on one screen.

Cons:

Sometimes awkward to specify an entire plot in a single function call.
Annotation in a plot is not intuitive.
Panel functions and subscripts are difficult to wield and require intense preparation.
Cannot ‘add’ to the plot once it’s created.

library(lattice)

Depth <- equal.count(quakes$depth, number=8, overlap=.1)
xyplot(lat ~ long | Depth, data = quakes)

states <- data.frame(state.x77,
                     state.name = dimnames(state.x77)[[1]],
                     state.region = state.region)
xyplot(Murder ~ Population | state.region, data = states,
       groups = state.name,
       panel = function(x, y, subscripts, groups) {
           ltext(x = x, y = y, labels = groups[subscripts], cex=1,
                 fontfamily = "HersheySans")
       })
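Panel functions are also how lattice layers several elements inside each panel. A minimal sketch, rebuilding a simpler version of the `states` data frame so the block stands alone: each panel first calls `panel.xyplot()` for the default points and then `panel.lmline()` to add that region's least-squares line. Note that the extra line still has to be specified inside the single `xyplot()` call.

```r
library(lattice)

# State statistics plus their census region (4 regions -> 4 panels)
states <- data.frame(state.x77, state.region = state.region)

p <- xyplot(Murder ~ Population | state.region, data = states,
            panel = function(x, y, ...) {
              panel.xyplot(x, y, ...)       # default scatter of this panel's points
              panel.lmline(x, y, lty = 2)   # dashed least-squares fit for this region
            })
print(p)
```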

ggplot2: qplot (Quick Plot)

Data

library(ggplot2)
str(mpg)
## Classes 'tbl_df', 'tbl' and 'data.frame':    234 obs. of  11 variables:
##  $ manufacturer: chr  "audi" "audi" "audi" "audi" ...
##  $ model       : chr  "a4" "a4" "a4" "a4" ...
##  $ displ       : num  1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
##  $ year        : int  1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
##  $ cyl         : int  4 4 4 4 6 6 6 4 4 4 ...
##  $ trans       : chr  "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
##  $ drv         : chr  "f" "f" "f" "f" ...
##  $ cty         : int  18 21 20 21 16 18 18 18 16 20 ...
##  $ hwy         : int  29 29 31 30 26 26 27 26 25 28 ...
##  $ fl          : chr  "p" "p" "p" "p" ...
##  $ class       : chr  "compact" "compact" "compact" "compact" ...

Scatter plot

qplot(displ, hwy, data = mpg)

Color

qplot(displ, hwy, data = mpg, color = drv)

Geoms

qplot(displ, hwy, data = mpg, geom = c("point", "smooth"))

Histogram

qplot(hwy, data = mpg)

Facets

qplot(displ, hwy, data = mpg, facets = drv ~.)

qplot(hwy, data = mpg, facets = drv ~.)
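`qplot()` was deprecated in ggplot2 3.4.0, so current code usually writes the same plots with `ggplot()`. The sketch below gives one `ggplot()` equivalent for each `qplot()` call above; `facet_grid(drv ~ .)` reproduces `facets = drv ~ .`. The object names are only for illustration.

```r
library(ggplot2)

p_scatter <- ggplot(mpg, aes(displ, hwy)) + geom_point()
p_color   <- ggplot(mpg, aes(displ, hwy, color = drv)) + geom_point()
p_smooth  <- ggplot(mpg, aes(displ, hwy)) + geom_point() + geom_smooth()

# qplot() with a single variable defaults to a histogram (30 bins)
p_hist    <- ggplot(mpg, aes(hwy)) + geom_histogram(bins = 30)

# One row of panels per drive type, as with facets = drv ~ .
p_facet   <- ggplot(mpg, aes(displ, hwy)) + geom_point() + facet_grid(drv ~ .)
p_hfacet  <- ggplot(mpg, aes(hwy)) + geom_histogram(bins = 30) + facet_grid(drv ~ .)

print(p_facet)
```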

Basic components of a ggplot2 plot
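A ggplot2 plot is assembled from a handful of components: data, aesthetic mappings, a geom with its statistical transformation, scales, a coordinate system, facets and a theme. The sketch below spells each one out, even where ggplot2 would supply the same default on its own; the axis titles are illustrative.

```r
library(ggplot2)

p <- ggplot(data = mpg,                                   # data
            mapping = aes(x = displ, y = hwy)) +          # aesthetic mappings
  geom_point(stat = "identity") +                         # geom and its statistical transformation
  scale_x_continuous(name = "Engine displacement (L)") +  # scale for x
  scale_y_continuous(name = "Highway miles per gallon") + # scale for y
  coord_cartesian() +                                     # coordinate system
  facet_wrap(~ drv) +                                     # facets
  theme_grey()                                            # theme
print(p)
```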

Building plots with ggplot2

Example of ggplot2

g <- ggplot(mpg, aes(class))
g <- g + geom_bar()
print(g)

Color by drive type

g <- g + geom_bar(aes(fill = drv), color = "black")
print(g)

Change color

g <- g + scale_fill_manual(values = c("DarkMagenta", "DarkOrange", "DodgerBlue"), 
                           name = "", 
                           labels = c("4-wheel drive", "front-wheel drive", "rear-wheel drive"))
print(g)

Change axis labels and titles

g <- g + labs(x = "Vehicle Class", y = "Number of Cars", title = "Example of ggplot2", subtitle = "Meetup, Phoenix, 2018")
print(g)

Flip coordinates

g <- g + coord_flip()
print(g)
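Since ggplot2 3.3.0, `geom_bar()` can also draw horizontal bars directly when the discrete variable is mapped to `y` instead of `x`, with no `coord_flip()`. A minimal sketch of the same stacked bars:

```r
library(ggplot2)

# class on the y axis: bars grow to the right, no coord_flip() needed
g_h <- ggplot(mpg, aes(y = class)) +
  geom_bar(aes(fill = drv), color = "black") +
  labs(y = "Vehicle Class", x = "Number of Cars")
print(g_h)
```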

Add percentages to bars

library(scales)
t <- table(mpg$class)
t
## 
##    2seater    compact    midsize    minivan     pickup subcompact 
##          5         47         41         11         33         35 
##        suv 
##         62
text.df <- data.frame(class = names(t), count = as.numeric(t), rate = percent(as.numeric(t)/ sum(as.numeric(t))))

text.df 
##        class count  rate
## 1    2seater     5  2.1%
## 2    compact    47 20.1%
## 3    midsize    41 17.5%
## 4    minivan    11  4.7%
## 5     pickup    33 14.1%
## 6 subcompact    35 15.0%
## 7        suv    62 26.5%
g <- g + geom_text(data = text.df, aes(x = class,y = count, label = rate), hjust=-0.5, fontface = "bold")
g <- g + ylim(0, 90)
print(g)
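Instead of building `text.df` by hand, the shares can also be computed inside the plot. With `stat = "count"`, `geom_text()` works on the same per-class counts as `geom_bar()`, and `after_stat()` (ggplot2 3.3.0 or later) turns each count into a share of the total. A sketch, assuming the same `drv` fill as above; `percent()` comes from scales:

```r
library(ggplot2)
library(scales)

g_pct <- ggplot(mpg, aes(x = class)) +
  geom_bar(aes(fill = drv), color = "black") +
  # stat = "count" gives one row per class; label it with count / total count
  geom_text(aes(label = percent(after_stat(count / sum(count)), accuracy = 0.1)),
            stat = "count", hjust = -0.5, fontface = "bold") +
  coord_flip() +
  ylim(0, 90)
print(g_pct)
```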

Sort

library(dplyr)

# Join the per-class counts onto mpg and plot against the count as a factor:
# factor levels of a numeric vector sort numerically, which orders the bars by size
g <- ggplot(mpg %>% 
              inner_join(text.df %>% mutate(lbl = as.factor(count)), by = "class"),
            aes(x = lbl))
g <- g + geom_bar(aes(fill = drv), color = "black")
g <- g + scale_fill_manual(values = c("DarkMagenta", "DarkOrange", "DodgerBlue"), 
                           name = "", 
                           labels = c("4-wheel drive", "front-wheel drive", "rear-wheel drive"))
g <- g + labs(x = "Vehicle Class", y = "Number of Cars", title = "Example of ggplot2", subtitle = "Meetup, Phoenix, 2018")
g <- g + coord_flip()
g <- g + geom_text(data = text.df, aes(x = as.factor(count), y = count, label = rate), hjust = -0.5, fontface = "bold")
g <- g + ylim(0, 80)
# Replace the count labels with class names in the same ascending-count order
# (this only works because every class has a distinct count)
g <- g + scale_x_discrete(labels = levels(reorder(text.df$class, text.df$count)))
print(g)
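The join-and-relabel approach above relies on every class having a different count. A shorter sketch sorts the bars directly with base R's `reorder()` and `FUN = length`, so the factor levels follow each class's number of cars (from 2seater, the rarest, up to suv):

```r
library(ggplot2)

# Levels of class ordered from least to most frequent
class_by_count <- reorder(mpg$class, mpg$class, FUN = length)
levels(class_by_count)

g_sorted <- ggplot(mpg, aes(x = reorder(class, class, FUN = length))) +
  geom_bar(aes(fill = drv), color = "black") +
  labs(x = "Vehicle Class", y = "Number of Cars") +
  coord_flip()
print(g_sorted)
```

Ties are handled without extra bookkeeping because the labels are the class names themselves rather than a relabelled count axis.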

Themes

p <- g + theme_dark()
p <- p + geom_text(data = text.df, aes(x = as.factor(count),y = count, label = rate), hjust=-0.5, fontface = "bold", color = "yellow")
print(p)

p <- g + theme_bw()
print(p)
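Adding a theme changes one plot. To change the default for every plot made afterwards in the session, `theme_set()` replaces the current theme and returns the previous one so it can be restored:

```r
library(ggplot2)

old_theme <- theme_set(theme_bw())   # later plots use theme_bw() by default
p_bw <- ggplot(mpg, aes(displ, hwy)) + geom_point()
print(p_bw)
theme_set(old_theme)                 # restore the previous default theme
```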

Final chart

p <- g + theme(legend.position    = "right",
               strip.text.x       = element_text(family = "sans", face = "bold"),
               plot.background    = element_rect(fill = "white"),
               panel.background   = element_rect(fill = "white", color = "black"),
               panel.border       = element_rect(fill = NA, color = "black"),
               panel.grid.major   = element_line(colour = "LightGrey"),
               axis.text.y        = element_text(family = "sans", colour = "black", size = 10, face = "bold"),
               axis.text.x        = element_text(family = "sans", colour = "black", size = 8),
               panel.grid.major.y = element_line(colour = "LightGrey", linetype = 3),
               axis.ticks         = element_blank())
p <- p + xlab("")
print(p)
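A long `theme()` call like this is often wrapped in a function so other charts can reuse it. A minimal sketch; `theme_meetup()` is a hypothetical name, its settings are a subset of the ones above, and `ggsave()` writes the chart to a temporary PDF file:

```r
library(ggplot2)

# Hypothetical reusable theme: theme_bw() plus some of the settings above
theme_meetup <- function(base_size = 11) {
  theme_bw(base_size = base_size) +
    theme(legend.position    = "right",
          panel.grid.major.y = element_line(colour = "LightGrey", linetype = 3),
          axis.text.y        = element_text(colour = "black", face = "bold"),
          axis.ticks         = element_blank())
}

p_final <- ggplot(mpg, aes(x = class)) +
  geom_bar(aes(fill = drv), color = "black") +
  coord_flip() +
  theme_meetup()

# Save to a temporary file; width and height are in inches by default
out_file <- tempfile(fileext = ".pdf")
ggsave(out_file, plot = p_final, width = 7, height = 4)
```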

Examples