Spring 2020

Outline

  1. GGPlot2 Overview
  2. GGPlot2 Layers and Examples
  3. Colors in GGPlot2
  4. More Examples
  5. More Information

GGPlot2 Overview

Why ggplot2?

The ggplot2 library:

  • uses a consistent grammar of graphics
  • provides a high-level plot specification
  • allows users to think in terms of a layered data visualization pipeline
  • provides extensive visualization functionality for many common graphics
  • is widely used

Pipeline / Grammar

  • Users specify building blocks for a plot, then layer them as desired
  • Building blocks include:
    • data sets
    • aesthetic mapping (what fields map to what visual elements)
    • geometry objects (how to visually encode those elements)
    • transformations, coordinate systems, and scaling mechanisms
    • thematic fine-tuning (fonts, annotations, position adjustments)
    • faceting (multi-panel plotting)
  • Each layer is added on top of the previous one, which makes building a plot intuitive and natural
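
As a rough illustration of layering, the sketch below builds one plot in three steps and checks how many layers each step holds. The built-in mtcars data set and the variable names base, with_points, and with_trend are choices made for this example, not part of the original slides.

```r
library(ggplot2)

# Step 1: data set and aesthetic mapping only (no layers yet)
base <- ggplot(mtcars, aes(x = wt, y = mpg))

# Step 2: add a geometry layer that encodes each row as a point
with_points <- base + geom_point()

# Step 3: add a second layer on top: a straight-line trend fit
with_trend <- with_points +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Layers are stored in the order they were added
length(base$layers)
length(with_points$layers)
length(with_trend$layers)
```

Because each step returns a new plot object, intermediate stages such as with_points can be reused as the starting point for several variants of the same plot.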

Basics of ggplot2

  • ggplot() is the foundational function in the ggplot2 library
  • It initializes a plotting object, setting up:
    • The data set to use
    • Which variables map to what plot elements (x, y, fill, size, etc.)
    • The core plot object
  • The object returned by ggplot() is not a visualization by itself
  • It is extended and interpreted by subsequent function calls to produce a visualization
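
Because ggplot() returns an object rather than drawing anything, rendering happens only when the object is printed (the console does this automatically for a top-level expression) or written to a file. A minimal sketch, assuming the built-in mtcars data and a throwaway temporary PDF as the output target; the helper name draw_plot is hypothetical:

```r
library(ggplot2)

# Build and store the plot object; nothing is drawn at this point
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Inside a function there is no auto-printing, so print() is called explicitly
draw_plot <- function(plot_obj) {
  print(plot_obj)
  invisible(plot_obj)
}
draw_plot(p)

# Alternatively, render straight to a file
out_file <- tempfile(fileext = ".pdf")
ggsave(out_file, plot = p, width = 5, height = 4)
file.exists(out_file)
```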

Data Frames

  • Most plotting tools have functions that take explicit data types as arguments (e.g., the built-in plot() function takes R vectors for x and y)
  • ggplot consumes R data frames, and it understands the concept of variables (fields) in a data frame
  • ggplot understands the difference between numeric and categorical data and treats them differently
  • You can flexibly map variables to whatever visual elements you prefer
  • Data frames should be in long (statistical) form
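
To make the long-form requirement concrete, the sketch below reshapes a small wide table into long form using base R only. The Midterm scores are made up for illustration, and the column names Exam and Score are assumptions of this example.

```r
library(ggplot2)

# Wide form: one row per student, one column per exam (Midterm values are made up)
wide <- data.frame(
  Student = c("Bob", "Sue", "Cat", "Lin"),
  Midterm = c(90, 78, 95, 70),
  Final   = c(96, 82, 97, 74)
)

# Long form: one row per (student, exam) observation
long <- data.frame(
  Student = rep(wide$Student, times = 2),
  Exam    = rep(c("Midterm", "Final"), each = nrow(wide)),
  Score   = c(wide$Midterm, wide$Final)
)

# With long data, the exam is itself a variable that can be mapped to fill
ggplot(long, aes(x = Student, y = Score, fill = Exam)) +
  geom_bar(stat = "identity", position = "dodge")
```

In the wide table there is no single column to map to fill, which is why ggplot2 works more naturally with the long version.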

Plot Element Visualization Attributes

Elements you choose to visualize may be set explicitly, or may be mapped to a variable using aes()

  • x – Position along the x-axis
  • y – Position along the y-axis
  • size – Point width, line thickness, etc.
  • linetype – Type of line (dotted, dashed, solid, etc.)
  • color – Usually the color of the outer border of something
  • fill – The color of the inner fill of some shape

Plot Element Visualization Attributes (2)

Additional elements you choose to visualize may be set explicitly, or may be mapped to a variable using aes()

  • shape – The symbol being used for some point
  • linetype – The style of line objects (e.g., solid, dashed)
  • alpha – The transparency of an object
  • label – Text labels associated with an object
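
The difference between setting an attribute and mapping it can be sketched as follows, assuming the built-in mtcars data. Arguments inside aes() vary with the data; arguments outside aes() are constants applied to the whole layer.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(
    aes(color = factor(cyl)),  # mapped: color varies with the cyl column
    size = 3,                  # set: every point gets the same size
    alpha = 0.6                # set: every point gets the same transparency
  )
p
```

Note that writing color="blue" inside aes() would treat the string "blue" as a data value to be mapped rather than as the color to draw with, which is a common source of confusion.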

Creating a Plot Object with ggplot()

  • Load the ggplot2 library
  • Call ggplot(), providing:
    • The data frame to use
    • The variable mapping, aes()
library(ggplot2)
rd = data.frame(
  Student = c("Bob", "Sue", "Cat", "Lin"),
  NumberGrade = c(96, 82, 97, 74),
  LetterGrade = factor(c("A","B","A","C")) )

p = ggplot(rd, aes(y=NumberGrade))

Encoding Plot Elements with Geometric Objects

ggplot2 interprets plot elements using geom objects:

  • geom_point – Points
  • geom_bar – Bars
  • geom_line – Lines
  • geom_smooth – Smoothed curves
  • geom_polygon – Polygons
  • geom_boxplot – Boxplot

Other Types of Layers

Aside from geom objects, there are other kinds of layers:

  • Scales – Scaling controls for the mapping between data and aesthetics (scale_x_discrete(), scale_size_continuous(), etc.). In general: scale_AESTHETIC_QUALIFIER()
  • Coordinate systems – For transforming data to other coordinate systems (coord_flip(), coord_polar(), etc.)
  • Faceting – Splitting up data into trellis displays (facet_grid(), facet_wrap(), etc.)
  • Themes – Changing non-data elements of the plot (theme(), element_text(), etc.)
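
One way these four kinds of layers might combine in a single plot is sketched below. The choice of mtcars, the axis names, and the font size are assumptions made for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  scale_x_discrete(name = "Cylinders") +           # scale: controls the x mapping
  scale_y_continuous(name = "Miles per Gallon") +  # scale: controls the y mapping
  coord_flip() +                                   # coordinate system: swap the axes
  facet_wrap(~ am) +                               # faceting: one panel per value of am
  theme(text = element_text(size = 14))            # theme: non-data styling
p
```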

http://docs.ggplot2.org/current/

GGPlot2 Layers and Examples

Adding Layers to the Plot for Visualization

  • Use the “+” operator to add visualization layers to the plot object
  • Layers are placed in the order that they are added, in a pipeline
  • It’s easy to keep each layer separate and explicit
library(ggplot2)
rd = data.frame(Student = c("Bob", "Sue", "Cat", "Lin"),
                NumberGrade = c(96, 82, 97, 74),
                LetterGrade = factor(c("A","B","A","C")) )

ggplot(rd, aes(x=Student,y=NumberGrade)) +   # Build the plot object
    geom_point(size=5) +                     # Encode visually using points
    xlab("Student Name") +                   # Label the X axis
    ylab("Numeric Grade") +                  # Label the Y axis
    ggtitle("Course Grade Results")          # Give the plot a title

Interpreting the Plot for Visualization

  • The same basic plot object can be interpreted differently:
library(ggplot2)
rd = data.frame(Student = c("Bob", "Sue", "Cat", "Lin"),
                NumberGrade = c(96, 82, 97, 74),
                LetterGrade = factor(c("A","B","A","C")) )

ggplot(rd, aes(x=Student,y=NumberGrade)) +
    geom_bar(stat="identity") +              # Only line that changed...
    xlab("Student Name") + 
    ylab("Numeric Grade") + 
    ggtitle("Course Grade Results")

Interpreting the Plot for Visualization

library(ggplot2)
rd = data.frame(Student = c("Bob", "Sue", "Cat", "Lin"),
                NumberGrade = c(96, 82, 97, 74),
                LetterGrade = factor(c("A","B","A","C")) )

ggplot(rd, aes(x=Student,y=NumberGrade)) +
    geom_bar(stat="identity") + 
    coord_flip() +
    xlab("Student Name") + 
    ylab("Numeric Grade") + 
    ggtitle("Course Grade Results")

Interpreting the Plot for Visualization

  • Other variables can be mapped to other plot features, as well
library(ggplot2)
rd = data.frame(Student = c("Bob", "Sue", "Cat", "Lin"),
                NumberGrade = c(96, 82, 97, 74),
                LetterGrade = factor(c("A","B","A","C")) )

ggplot(rd, aes(x=Student,y=NumberGrade,fill=LetterGrade)) +
    geom_bar(stat="identity") + 
    xlab("Student Name") + 
    ylab("Numeric Grade") + 
    ggtitle("Course Grade Results")

Colors in GGPlot2

Color vs. Fill

  • In ggplot2:
    • color refers to the color of borders
    • fill refers to the color of the inside fill
  • Except with the default point shape (shape 19), which is drawn as a solid glyph, much like a font character, and so uses color only
  • To use both color and fill on points, change to a fillable drawn shape (shapes 21 to 25, e.g., shape 21)

Using the Default Point Shape (Font)

library(ggplot2)

myData = data.frame(Furbletude=rnorm(30),
                    Blehmekness=rnorm(30))

ggplot(myData, aes(x=Furbletude, y=Blehmekness)) +
  geom_point(color="lightblue", fill="darkblue", size=4)

Using a Drawn Point Shape

library(ggplot2)

myData = data.frame(Furbletude=rnorm(30),
                    Blehmekness=rnorm(30))

ggplot(myData, aes(x=Furbletude, y=Blehmekness)) +
  geom_point(color="darkblue", fill="lightblue", size=4, shape=21)

Prebuilt Colors vs. Customized Colors

  • R has hundreds of prebuilt colors (type colors() at the console to list)
  • But you can also specify custom colors in several ways, including:
    • In RGB hex via a string – e.g., "#992B1A"
    • Using the rgb() function – e.g., rgb(0.26, 0.52, 0.87)
    • Using the hsv() function – e.g., hsv(0.17, 0.98, 0.66)
  • You can construct palettes as vectors of these
  • Or you can use prebuilt palettes
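
A hand-built palette along these lines might look like the sketch below. The three color values are the ones quoted in the list above; the data frame things and its values are made up, and scale_fill_manual() is used here as one way to apply a custom palette to a mapped fill.

```r
library(ggplot2)

# Three ways to specify a custom color, collected into one character vector
my_palette <- c(
  "#992B1A",               # RGB hex string
  rgb(0.26, 0.52, 0.87),   # red, green, blue components in [0, 1]
  hsv(0.17, 0.98, 0.66)    # hue, saturation, value in [0, 1]
)

# Made-up data with one row per category
things <- data.frame(
  Kind  = c("A", "B", "C"),
  Count = c(4, 7, 5)
)

# Each level of Kind is assigned one palette entry, in order
ggplot(things, aes(x = Kind, y = Count, fill = Kind)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = my_palette)
```

Both rgb() and hsv() return hex strings, so the resulting vector is uniform regardless of how each color was specified.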

Customizing Colors

library(ggplot2)

myData = data.frame(Count=sample(1:10, 30, replace=TRUE),
    Awesomeness=sample(c("CoolThings", "SillyThings", "Meh"), 30, replace=TRUE))

ggplot(myData, aes(x=Awesomeness, y=Count)) +
  geom_bar(stat="identity", color="white", fill=rgb(0.12, 0.76, 0.9))

For Data-Driven Properties Like Color, Use aes()

library(ggplot2)

myData = data.frame(Count=sample(1:10, 30, replace=TRUE),
        Awesomeness=sample(c("CoolThings", "SillyThings", "Meh"), 30, replace=TRUE),
        TypeOfThing=sample(c("A", "B", "C"), 30, replace=TRUE))

ggplot(myData, aes(x=Awesomeness, y=Count, fill=TypeOfThing)) +
  geom_bar(stat="identity", color="black")

Customizing Palettes

  • In ggplot2, the scale_…() functions control how data-driven properties like colors are displayed, overriding the defaults
  • We’ll talk more about the specifics later
  • For now, we’ll get used to using RColorBrewer
  • It’s nice because it has some good pre-built discrete and continuous palettes
  • So install it, if you have not already
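
As a hedged sketch of both palette kinds, the code below pulls a few colors from a discrete RColorBrewer palette and applies a sequential one to a continuous variable. It assumes RColorBrewer is already installed (for example via install.packages("RColorBrewer")) and uses faithfuld, a data set that ships with ggplot2.

```r
library(ggplot2)
library(RColorBrewer)

# Ask for three colors from the discrete "Set1" palette (returns hex strings)
set1_colors <- brewer.pal(n = 3, name = "Set1")
set1_colors

# A sequential palette applied to a continuous variable
p <- ggplot(faithfuld, aes(x = waiting, y = eruptions, fill = density)) +
  geom_raster() +
  scale_fill_distiller(palette = "Blues")
p
```

Calling display.brewer.all() at the console draws every available palette, which can help when choosing one.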

Selecting a Pre-Built Palette from RColorBrewer

library(ggplot2)
library(RColorBrewer)

myData = data.frame(Count=sample(1:10, 30, replace=TRUE),
        Awesomeness=sample(c("CoolThings", "SillyThings", "Meh"), 30, replace=TRUE),
        TypeOfThing=sample(c("A", "B", "C"), 30, replace=TRUE))

ggplot(myData, aes(x=Awesomeness, y=Count, fill=TypeOfThing)) +
  geom_bar(stat="identity") +
  scale_fill_brewer(palette="Set1")

More Examples

Mapping Continuous Colors and Sizes

library(ggplot2)

ggplot(mtcars, aes(x=mpg,y=hp)) + 
  geom_smooth(size=1.5, color="darkgray") + 
  geom_point(aes(size=gear,color=cyl)) +
  xlab("Miles per Gallon") + 
  ylab("Horse Power")
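
In the plot above, cyl and gear are numeric columns, so ggplot2 treats them as continuous and draws a gradient color bar. If the intent is to treat cylinder count as a category instead, one option is to wrap it in factor(). The sketch below is a variation added for comparison, not part of the original example:

```r
library(ggplot2)

# factor(cyl) turns the numeric column into a categorical one,
# so each cylinder count gets its own distinct color in the legend
p <- ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point(aes(color = factor(cyl)), size = 3) +
  labs(x = "Miles per Gallon", y = "Horse Power", color = "Cylinders")
p
```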

Changing Font & Font Size

library(ggplot2)

ggplot(mtcars, aes(x=mpg,y=hp)) + 
  geom_point(size=4, shape=21, fill="lightblue", color="darkblue") +  
  xlab("Miles per Gallon") + 
  ylab("Horse Power") +
  theme(text=element_text(size=18, family="Times"))

Histograms

library(ggplot2)

ggplot(diamonds, aes(carat)) + 
  geom_histogram(binwidth=0.5, fill="wheat", color="black") +  
  xlab("Carat") + 
  ylab("Count") +
  ggtitle("Diamond Carat Distribution") +
  theme(text=element_text(size=18, family="Times"))

Further Help

More Resources