Introduction to ggplot2

Angela Zoss
November 15, 2016

ggplot2: Elements

Basic elements in any ggplot2 visualization

  • data
  • geom
    (or “shape” or “mark”)
  • coordinate system
    (the arrangement of the marks;
    most geoms use the default, Cartesian)

types of geoms

  • geom_bar()
  • geom_point()
  • geom_histogram()
  • geom_map()
  • etc.

Note: some geoms also include
data summary functions.

e.g., the “bar” geom will count
data points in each category.
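
As a minimal sketch of the built-in summary described above, the snippet below checks that geom_bar() draws the same per-category counts you would get by tallying by hand. It uses the mpg dataset that ships with ggplot2; the object names (p, bar_counts, manual_counts) are illustrative only.

```r
library(ggplot2)

# geom_bar() uses stat = "count" by default, so it tallies rows per category.
p <- ggplot(mpg, aes(x = class)) +
    geom_bar()

# layer_data() returns the values the geom actually draws.
bar_counts <- layer_data(p)$count

# The same tally computed by hand, for comparison.
manual_counts <- as.vector(table(mpg$class))
```

If the counts are already computed in the data, geom_col() draws them as given instead of counting again.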

ggplot2 cheatsheet

ggplot2: Basic syntax

template for a simple plot

aesthetic variable mappings

non-variable adjustments
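
The slide images for this template are not reproduced here, so the following is only a sketch of what a simple plot might look like, assuming the mpg dataset from ggplot2. It separates the two ideas labeled above: variable mappings go inside aes(), while fixed adjustments go outside it. The particular size and alpha values are arbitrary.

```r
library(ggplot2)

# Inside aes(): variables from the data are mapped to aesthetics.
# Outside aes(): fixed, non-variable adjustments apply to every mark.
p_simple <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
    geom_point(mapping = aes(color = class),  # variable mapping
               size = 3, alpha = 0.6)         # non-variable adjustments

p_simple
```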

template for a more complex plot
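
The original template image is not available, so this is one possible sketch of a more complex plot rather than the slide's exact template. It assumes the mpg dataset and stacks several optional pieces on the basic elements: a second geom, facets, a scale, labels, and a theme.

```r
library(ggplot2)

p_complex <- ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point(aes(color = class)) +                          # first geom
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE) + # second geom, fits a line
    facet_wrap(~ year) +                                      # one panel per year
    scale_color_brewer(palette = "Dark2") +                   # scale for the color mapping
    labs(title = "Fuel economy by engine size",
         x = "Engine displacement (litres)",
         y = "Highway miles per gallon",
         color = "Vehicle class") +
    theme_minimal()                                           # overall look

p_complex
```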

ggplot2: Examples

library(ggplot2)

ggplot(mpg, aes(displ, hwy)) +
    geom_point(aes(color = class))

Graphics for Communication, R for Data Science

# geom_bin2d() aggregates points into rectangular bins for you

# scale_x_log10() and scale_y_log10() change the axis spacing
# but leave the labels comprehensible

ggplot(diamonds, aes(carat, price)) +
    geom_bin2d() +
    scale_x_log10() +
    scale_y_log10()

Graphics for Communication, R for Data Science

Principles for Effective Visualizations

Principle 1: Order matters

Order by semantics

data$answer <- 
    factor(data$answer,
           levels=c("None","A little", "Some", "A lot"),
           ordered = TRUE)
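
The `data` object used on these slides is not shown, so the sketch below builds a small made-up survey table to illustrate the idea. It assumes a character column named answer; the responses are invented for demonstration only.

```r
library(ggplot2)

# Hypothetical survey responses (not the data behind the slides).
data <- data.frame(
    answer = c("Some", "A lot", "None", "A little", "Some", "A lot"),
    stringsAsFactors = FALSE
)

# Without explicit levels, factor() sorts the categories alphabetically,
# which scrambles the natural "None" to "A lot" progression.
alphabetical_levels <- levels(factor(data$answer))

# Setting levels by hand restores the semantic order.
data$answer <-
    factor(data$answer,
           levels = c("None", "A little", "Some", "A lot"),
           ordered = TRUE)

# Bars now appear in the order of the factor levels.
p_ordered <- ggplot(data, aes(x = answer)) +
    geom_bar()
```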

Order by value

data$academic_field <- 
    factor(data$academic_field,
           levels=names(
               sort(
                   table(data$academic_field),
                   decreasing=TRUE)))
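
To make the nested call easier to follow, here is the same idea unpacked into steps on a small made-up table. The field names and counts are invented, and the intermediate name field_counts is illustrative.

```r
# Hypothetical data (not the data behind the slides).
data <- data.frame(
    academic_field = c("Humanities", "Sciences", "Sciences",
                       "Social Sciences", "Sciences", "Humanities"),
    stringsAsFactors = FALSE
)

# Step 1: count rows per category, then sort the counts from largest to smallest.
field_counts <- sort(table(data$academic_field), decreasing = TRUE)

# Step 2: use the sorted category names as the factor levels.
data$academic_field <- factor(data$academic_field,
                              levels = names(field_counts))

levels(data$academic_field)
```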

Principle 2: Put long categories on y-axis

coord_flip()
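
A minimal sketch of coord_flip() in context, assuming the mpg dataset from ggplot2 in place of the survey data on the slides. The categories are mapped to x as usual, and coord_flip() then swaps the axes so the labels read horizontally along the y-axis.

```r
library(ggplot2)

# Categories are mapped to x, then the whole coordinate system is flipped.
p_flipped <- ggplot(mpg, aes(x = manufacturer)) +
    geom_bar() +
    coord_flip()

p_flipped
```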

Oops!

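# Before: levels in decreasing order of count.
# After coord_flip(), the largest category ends up at the bottom.
data$academic_field <- 
    factor(data$academic_field,
           levels=names(
               sort(
                   table(data$academic_field),
                   decreasing=TRUE)))

# Fix: sort ascending so the largest category appears at the top.
data$academic_field <- 
    factor(data$academic_field,
           levels=names(
               sort(
                   table(data$academic_field))))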

Principle 3: Pick a purpose

Different placement helps with different comparisons.

fill=highest_degree
facet_grid(.~highest_degree)
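
The two fragments above can be placed in full plots roughly as follows. This is a sketch on invented data: the column names academic_field and highest_degree come from the slides, but the values are made up, and position = "dodge" is an assumption about how the bars were arranged.

```r
library(ggplot2)

# Hypothetical data (not the data behind the slides).
data <- data.frame(
    academic_field = c("Sciences", "Sciences", "Humanities",
                       "Humanities", "Sciences", "Humanities"),
    highest_degree = c("Bachelor's", "Doctorate", "Bachelor's",
                       "Doctorate", "Doctorate", "Bachelor's")
)

# Option 1: degrees side by side within each field.
# Makes it easy to compare degrees inside one field.
p_fill <- ggplot(data, aes(x = academic_field, fill = highest_degree)) +
    geom_bar(position = "dodge")

# Option 2: one panel per degree.
# Makes it easy to compare fields inside one degree.
p_facet <- ggplot(data, aes(x = academic_field)) +
    geom_bar() +
    facet_grid(. ~ highest_degree)
```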

Principle 4: Keep scales consistent

Keep all categories, manually set axes

scale_x_discrete(drop=FALSE)

scale_y_continuous(limits=c(0,40),breaks=c(0,10,20,30,40),minor_breaks=NULL)
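
A sketch of how the two scale calls above might be combined, using invented data. It assumes the plot shows a subset in which one category has no rows; because the column is a factor that still carries all of its levels, drop=FALSE keeps the empty category on the axis. The names fields and subset_data are illustrative.

```r
library(ggplot2)

# All categories that should appear on every plot in the series.
fields <- c("Humanities", "Sciences", "Social Sciences")

# Hypothetical subset with no "Social Sciences" rows.
# The factor still carries all three levels.
subset_data <- data.frame(
    academic_field = factor(c("Sciences", "Sciences", "Humanities"),
                            levels = fields)
)

p_consistent <- ggplot(subset_data, aes(x = academic_field)) +
    geom_bar() +
    scale_x_discrete(drop = FALSE) +              # keep unused levels on the axis
    scale_y_continuous(limits = c(0, 40),         # same range on every plot
                       breaks = c(0, 10, 20, 30, 40),
                       minor_breaks = NULL)

p_consistent
```

Applying the same two scale calls to each plot in a series keeps the axes comparable from one plot to the next.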

Principle 5: Select meaningful colors

Select colors manually, or use alternate palette

scale_fill_manual(
    values=c("snow4","snow3",
             "tan3","tan1",
             "turquoise2","turquoise4"))

scale_fill_manual(
    values=c("#fee391","#fe9929", "#cc4c02"))

# Also see package RColorBrewer
scale_fill_brewer(palette="BrBG")
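
For context, here is a sketch of the fill scales above attached to a complete plot. The data are invented, and mapping answer to both x and fill is an assumption made for illustration. Manual colors are assigned to factor levels in order, so a light-to-dark sequence lines up with a low-to-high category order.

```r
library(ggplot2)

# Hypothetical ordered responses (not the data behind the slides).
data <- data.frame(
    answer = factor(c("A little", "Some", "A lot", "Some", "A lot", "A lot"),
                    levels = c("A little", "Some", "A lot"))
)

base_plot <- ggplot(data, aes(x = answer, fill = answer)) +
    geom_bar()

# Hand-picked sequential colors, light to dark, one per level.
p_manual <- base_plot +
    scale_fill_manual(values = c("#fee391", "#fe9929", "#cc4c02"))

# A ColorBrewer palette chosen by name instead.
p_brewer <- base_plot +
    scale_fill_brewer(palette = "BrBG")
```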

Questions?