Introduction to ggplot2

Angela Zoss
February 9, 2017

What is ggplot2?

An R package for creating plots, built on the theory of the grammar of graphics.

Data visualization chapter, R4DS book

Why ggplot2 instead of base R?

  • nice defaults
  • easy faceting
  • (arguably) more natural syntax
  • can switch chart types more easily
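
A rough illustration of the last two points (a sketch using the built-in mpg data, not an example from the talk): the same plot specification can be reused with a different geom, and faceting takes one extra line.

```r
library(ggplot2)

# Shared specification: data plus aesthetic mappings
spec <- ggplot(mpg, aes(x = class, y = hwy))

# Switching chart types only means swapping the geom
p_box <- spec + geom_boxplot()
p_violin <- spec + geom_violin()

# Faceting by drive train adds one more piece to the same specification
p_facet <- spec + geom_boxplot() + facet_wrap(~ drv)
```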

“Why I use ggplot2”, David Robinson

ggplot2: Elements

Basic elements in any ggplot2 visualization

  • data
  • geom
    (or “shape” or “mark”)
  • coordinate system
    (the arrangement of the marks;
    most geoms use the default, Cartesian)
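
The three elements can be written out explicitly. The sketch below assumes the built-in mpg data; the polar variant is only there to show that the coordinate system is a separate choice from the data and the geom.

```r
library(ggplot2)

# data: mpg; geom: bars; coordinate system: Cartesian (the default, spelled out here)
p_cartesian <- ggplot(data = mpg, mapping = aes(x = class)) +
    geom_bar() +
    coord_cartesian()

# Same data and geom, arranged in a polar coordinate system instead
p_polar <- ggplot(data = mpg, mapping = aes(x = class)) +
    geom_bar() +
    coord_polar()
```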

types of geoms

  • geom_bar()
  • geom_point()
  • geom_histogram()
  • geom_map()
  • etc.

Note: some geoms also compute
a summary of the data before drawing.

For example, geom_bar() counts the
data points in each category.
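
One way to see the summary step, assuming the built-in mpg data: layer_data() returns the data frame a layer actually draws, and for geom_bar() its count column should match a plain table() of the variable.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = class)) +
    geom_bar()

# Counts computed by the bar geom's summary step
bar_counts <- layer_data(p)$count

# The same counts computed by hand
manual_counts <- as.vector(table(mpg$class))
```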

ggplot2 cheatsheet

ggplot2: Basic syntax

template for a simple plot

aesthetic variable mappings

non-variable adjustments

template for a more complex plot
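
One possible reading of the two templates, sketched with the built-in mpg data. The variables and values are assumptions chosen for illustration: aesthetic variable mappings go inside aes() and tie a visual property to a column, while non-variable adjustments sit outside aes() as constants.

```r
library(ggplot2)

# Simple plot: data, aesthetic variable mappings, and a geom
p_simple <- ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point()

# More complex plot: color is mapped to a variable inside aes(),
# while size and alpha are non-variable adjustments set outside aes()
p_complex <- ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point(aes(color = class), size = 3, alpha = 0.5) +
    labs(x = "Displacement", y = "Highway MPG")
```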

ggplot2: Building a plot

library(ggplot2)

ggplot(mpg, aes(displ, hwy)) +
    geom_point()

Graphics for Communication, R for Data Science

library(ggplot2)

ggplot(mpg, aes(displ, hwy)) +
    # map color to the class variable, inside aes()
    geom_point(aes(color = class))

library(ggplot2)

ggplot(mpg, aes(displ, hwy)) +
    geom_point(aes(color = class)) +
    # replace the default variable names with readable axis labels
    labs(x = "Engine Displacement, in Liters", y = "Highway Miles per Gallon")

library(ggplot2)

ggplot(mpg, aes(displ, hwy)) +
    geom_point(aes(color = class)) +
    labs(x = "Engine Displacement, in Liters", y = "Highway Miles per Gallon") +
    # swap the default gray theme for the black-and-white one
    theme_bw()

library(ggplot2)

ggplot(mpg, aes(displ, hwy)) +
    # size is set outside aes(), so every point gets the same value
    geom_point(aes(color = class), size = 7) +
    labs(x = "Engine Displacement, in Liters", y = "Highway Miles per Gallon") +
    theme_bw()

library(ggplot2)

ggplot(mpg, aes(displ, hwy)) +
    # alpha = 0.5 makes the points semi-transparent, so overlaps stay visible
    geom_point(aes(color = class), size = 7, alpha = 0.5) +
    labs(x = "Engine Displacement, in Liters", y = "Highway Miles per Gallon") +
    theme_bw()

library(ggplot2)

ggplot(mpg, aes(displ, hwy)) +
    geom_point(aes(color = class), size = 7, alpha = 0.5) +
    labs(x = "Engine Displacement, in Liters", y = "Highway Miles per Gallon") +
    # use the ColorBrewer "Dark2" palette and blank out the legend title
    scale_color_brewer(palette = "Dark2", name = "") +
    theme_bw()

# geom_bin2d will aggregate points for you

# using scale_?_log10 will change the axis spacing 
# but leave labels comprehensible

ggplot(diamonds, aes(carat, price)) +
    geom_bin2d() +
    scale_x_log10() +
    scale_y_log10()

Principles for Effective Visualizations

Principle 1: Order matters

Order by semantics

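# spell out the levels in their natural order;
# factor() would otherwise sort them alphabetically
data$answer <-
    factor(data$answer,
           levels=c("None", "A little", "Some", "A lot"),
           ordered = TRUE)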
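
A self-contained version of the same idea. The responses data frame is hypothetical and stands in for the survey data used in the talk.

```r
# Hypothetical survey responses, stored as plain text
responses <- data.frame(
    answer = c("Some", "None", "A lot", "A little", "Some", "A lot")
)

# Without explicit levels, factor() sorts the labels alphabetically
default_levels <- levels(factor(responses$answer))

# With explicit levels, the semantic order is kept
responses$answer <- factor(
    responses$answer,
    levels = c("None", "A little", "Some", "A lot"),
    ordered = TRUE
)
semantic_levels <- levels(responses$answer)
```

Because the factor is ordered, comparisons such as responses$answer >= "Some" also follow the semantic order.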

Order by value

data$academic_field <-
    factor(data$academic_field,
           levels=names(
               sort(
                   table(data$academic_field),
                   decreasing=TRUE)))
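
The same pattern as a runnable sketch, assuming the built-in mpg data in place of the survey data; mpg_sorted is a name introduced here for illustration.

```r
library(ggplot2)

mpg_sorted <- mpg

# Reorder manufacturers from most to least frequent
mpg_sorted$manufacturer <- factor(
    mpg_sorted$manufacturer,
    levels = names(sort(table(mpg_sorted$manufacturer), decreasing = TRUE))
)

# Bars now run from tallest to shortest, left to right
p <- ggplot(mpg_sorted, aes(x = manufacturer)) +
    geom_bar()
```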

Principle 2: Put long categories on y-axis

coord_flip()
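
A minimal sketch of the flip, assuming the built-in mpg data, with manufacturer names standing in for long category labels.

```r
library(ggplot2)

# Category labels on the x-axis tend to collide when they are long
p_vertical <- ggplot(mpg, aes(x = manufacturer)) +
    geom_bar()

# coord_flip() swaps the axes, so each label gets its own horizontal line
p_horizontal <- p_vertical + coord_flip()
```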

Oops!

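# Before: decreasing order, which coord_flip() draws
# with the largest category at the bottom
data$academic_field <-
    factor(data$academic_field,
           levels=names(
               sort(
                   table(data$academic_field),
                   decreasing=TRUE)))

# After: ascending order, so the largest category ends up at the top
data$academic_field <-
    factor(data$academic_field,
           levels=names(
               sort(
                   table(data$academic_field))))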
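
The fix as a runnable sketch, again assuming the built-in mpg data; mpg_flipped is a name introduced here for illustration.

```r
library(ggplot2)

mpg_flipped <- mpg

# Ascending order: the most frequent manufacturer becomes the last level
mpg_flipped$manufacturer <- factor(
    mpg_flipped$manufacturer,
    levels = names(sort(table(mpg_flipped$manufacturer)))
)

# After coord_flip(), the last level is drawn at the top of the chart
p <- ggplot(mpg_flipped, aes(x = manufacturer)) +
    geom_bar() +
    coord_flip()
```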

Principle 3: Pick a purpose

Different placement helps with different comparisons.

fill=highest_degree
facet_grid(.~highest_degree)
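
The two fragments in context, sketched with the built-in mpg data. Here drv plays the role of highest_degree; that substitution is an assumption made so the example runs without the survey data.

```r
library(ggplot2)

# Fill: drive trains share one bar per class,
# which favors comparing class totals
p_fill <- ggplot(mpg, aes(x = class, fill = drv)) +
    geom_bar()

# Facets: one panel per drive train,
# which favors comparing classes within a drive train
p_facet <- ggplot(mpg, aes(x = class)) +
    geom_bar() +
    facet_grid(. ~ drv)
```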

Principle 4: Keep scales consistent

Keep all categories, manually set axes

scale_x_discrete(drop=FALSE)

scale_y_continuous(limits=c(0,40),
                   breaks=c(0,10,20,30,40),
                   minor_breaks=NULL)
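
Both settings in one runnable sketch. The responses data frame is hypothetical: it contains no "A little" answers, so the default scale would drop that category from the axis.

```r
library(ggplot2)

# Hypothetical responses; "A little" is a valid level that nobody chose
responses <- data.frame(
    answer = factor(
        c("None", "Some", "Some", "A lot", "A lot", "A lot"),
        levels = c("None", "A little", "Some", "A lot")
    )
)

p <- ggplot(responses, aes(x = answer)) +
    geom_bar() +
    # keep "A little" on the axis even though its count is zero
    scale_x_discrete(drop = FALSE) +
    # fix the y-axis so plots of different subgroups share one scale
    scale_y_continuous(limits = c(0, 40),
                       breaks = c(0, 10, 20, 30, 40),
                       minor_breaks = NULL)
```

Applying the same two scale calls to every subgroup plot keeps bar heights and category positions comparable across plots.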

Principle 5: Select meaningful colors

Select colors manually, or use alternate palette

scale_fill_manual(
    values=c("snow4","snow3",
             "tan3","tan1",
             "turquoise2","turquoise4"))

scale_fill_manual(
    values=c("#fee391","#fe9929", "#cc4c02"))

# Also see package RColorBrewer
scale_fill_brewer(palette="BrBG")
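
A runnable sketch of a manual fill scale, assuming the built-in mpg data. It reuses the three-color sequence above, one color per value of drv.

```r
library(ggplot2)

# Hex colors from the slide, in the order they will be assigned to levels
degree_colors <- c("#fee391", "#fe9929", "#cc4c02")

p_manual <- ggplot(mpg, aes(x = drv, fill = drv)) +
    geom_bar() +
    scale_fill_manual(values = degree_colors)

# The ColorBrewer alternative: a ready-made diverging palette
p_brewer <- ggplot(mpg, aes(x = drv, fill = drv)) +
    geom_bar() +
    scale_fill_brewer(palette = "BrBG")
```

scale_fill_manual() needs at least as many values as there are levels in the mapped variable.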

Resources

Questions?