Introduction to ggplot2

Robert Norberg
Thursday, Aug 27, 2015

What is ggplot2?

  • An R package for producing statistical graphics

  • One of Hadley Wickham's early packages

  • A great reason to learn R!

Why ggplot2?

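  • Eschews the "artist and canvas" style of plotting, where you draw on the page one element at a time
  • Replaces it with a layered approach: a plot is built from data, aesthetic mappings, and layers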
  • Designed to shorten the distance from the mind to the page
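To make the layered idea concrete, here is a minimal sketch (assuming ggplot2 is already installed) that starts from data plus aesthetic mappings and then adds layers with `+`. The data frame `df` and its values are hypothetical toy data made up for illustration.

```r
library(ggplot2)

# Hypothetical toy data: one row per observation, one column per variable
df <- data.frame(
  x = 1:10,
  y = (1:10)^2
)

# Start with the data and the aesthetic mappings (x and y columns)...
p <- ggplot(df, aes(x = x, y = y))

# ...then add layers one at a time with `+`
p <- p + geom_point()                         # layer 1: points
p <- p + geom_line()                          # layer 2: a line through the points
p <- p + labs(title = "Layers added with +")  # labels are added the same way

print(p)
```

Each `+` returns a new plot object, so a plot can be built up (or modified) step by step instead of being drawn once onto a fixed canvas.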

How to get ggplot2

Once you have R installed and running, you can install ggplot2 with

install.packages("ggplot2")
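Installing only needs to happen once, but the package has to be loaded in every new R session. A minimal sketch of that workflow follows; the `requireNamespace()` guard that skips reinstalling is our own addition, not a required step.

```r
# Install once (skipped if ggplot2 is already installed)
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Load the package in every new R session before plotting
library(ggplot2)

# Check which version is installed
packageVersion("ggplot2")
```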

Plots require data

“[Every] non-key [attribute] must provide a fact about the key, the whole key, and nothing but the key.” - Bill Kent

We will use the diamonds data set

A dataset containing the prices and other attributes of 53,940 round cut diamonds.

  • price: price in US dollars
  • carat: weight of the diamond
  • cut: quality of the cut (Fair, Good, Very Good, Premium, Ideal)
  • color: diamond colour, from J (worst) to D (best)
  • clarity: a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
  • x: length in mm
  • y: width in mm
  • z: depth in mm
  • depth: total depth percentage = 100 * z / mean(x, y) = 100 * 2 * z / (x + y) (43–79)
  • table: width of top of diamond relative to widest point
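The depth formula can be checked against the data. The sketch below recomputes depth from `x`, `y`, and `z` and compares it with the recorded column. The dimensions are rounded to 0.01 mm and a handful of rows record zero or implausible dimensions, so expect close rather than exact agreement; dropping rows with a zero dimension is our own choice.

```r
library(ggplot2)

# Keep only diamonds with positive dimensions (a few rows record 0 mm)
d <- diamonds[diamonds$x > 0 & diamonds$y > 0 & diamonds$z > 0, ]

# Recompute depth from the dimensions: 2 * z / (x + y), as a percentage
depth_calc <- 100 * 2 * d$z / (d$x + d$y)

# Compare with the recorded depth column
diffs <- abs(depth_calc - d$depth)
summary(diffs)
```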

How to get the diamonds data set

library(ggplot2)
data(diamonds)
names(diamonds)
 [1] "carat"   "cut"     "color"   "clarity" "depth"   "table"   "price"  
 [8] "x"       "y"       "z"      
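Beyond the column names, a couple of quick checks show the size of the data and how the quality grades are ordered. This is a minimal sketch using only base R functions on the `diamonds` data.

```r
library(ggplot2)

# Number of rows (diamonds) and columns (variables)
dim(diamonds)

# cut and clarity are ordered factors, listed worst to best
levels(diamonds$cut)
levels(diamonds$clarity)

# color is ordered the other way round: D (best) first, J (worst) last
levels(diamonds$color)
```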

And now for some plots!

We will investigate the relationship between a diamond's carat and its price.
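A natural starting point is a plain scatter plot with carat on the x-axis and price on the y-axis. This is a minimal sketch, not necessarily the exact code behind the plots in this talk.

```r
library(ggplot2)

# Map carat to the x-axis and price to the y-axis, one point per diamond
p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point()

print(p)
```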

We will go from this

[Figure: the starting plot (image not preserved)]

To this

[Figure: the finished plot (image not preserved)]
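The original figures are not preserved, so the following is only a hypothetical illustration of how a basic carat vs. price scatter plot might be refined by adding layers and scales. The choices of transparency, log scales, colouring by clarity, and a linear trend line are our own assumptions, not a reproduction of the author's plot.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = carat, y = price)) +
  # Semi-transparent points reduce overplotting among ~54,000 diamonds
  geom_point(aes(colour = clarity), alpha = 0.3) +
  # Log scales make the curved carat-price relationship roughly linear
  scale_x_log10() +
  scale_y_log10() +
  # One linear trend line for all points, fitted on the log-transformed scales
  geom_smooth(method = "lm", colour = "black") +
  labs(x = "Carat (log scale)", y = "Price in US dollars (log scale)",
       colour = "Clarity")

print(p)
```

Because `colour = clarity` is mapped only inside `geom_point()`, the smoothing layer does not split by clarity and draws a single line.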

Live demonstration

A challenge for you

The first person to recreate this plot exactly wins lunch on me at the (nearby) restaurant of their choice! (Email me your code to win.)

[Figure: the challenge plot (image not preserved)]