ggplot2

David Lee

November 16, 2016

Why use R/ggplot2?

What is ggplot2?

The “gg” in ggplot2 stands for “grammar of graphics”.

Wilkinson describes graphics in “layers”

For further details, see Hadley Wickham’s article “A Layered Grammar of Graphics”.

Starting from a blank canvas

Let’s load ggplot2 and the diamonds dataset, which ships with ggplot2.

library(ggplot2)
data(diamonds)
summary(diamonds)
##      carat               cut        color        clarity     
##  Min.   :0.2000   Fair     : 1610   D: 6775   SI1    :13065  
##  1st Qu.:0.4000   Good     : 4906   E: 9797   VS2    :12258  
##  Median :0.7000   Very Good:12082   F: 9542   SI2    : 9194  
##  Mean   :0.7979   Premium  :13791   G:11292   VS1    : 8171  
##  3rd Qu.:1.0400   Ideal    :21551   H: 8304   VVS2   : 5066  
##  Max.   :5.0100                     I: 5422   VVS1   : 3655  
##                                     J: 2808   (Other): 2531  
##      depth           table           price             x         
##  Min.   :43.00   Min.   :43.00   Min.   :  326   Min.   : 0.000  
##  1st Qu.:61.00   1st Qu.:56.00   1st Qu.:  950   1st Qu.: 4.710  
##  Median :61.80   Median :57.00   Median : 2401   Median : 5.700  
##  Mean   :61.75   Mean   :57.46   Mean   : 3933   Mean   : 5.731  
##  3rd Qu.:62.50   3rd Qu.:59.00   3rd Qu.: 5324   3rd Qu.: 6.540  
##  Max.   :79.00   Max.   :95.00   Max.   :18823   Max.   :10.740  
##                                                                  
##        y                z         
##  Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 4.720   1st Qu.: 2.910  
##  Median : 5.710   Median : 3.530  
##  Mean   : 5.735   Mean   : 3.539  
##  3rd Qu.: 6.540   3rd Qu.: 4.040  
##  Max.   :58.900   Max.   :31.800  
## 

We’ll start with the most basic function, ggplot().

What is the first element we pick after choosing the dataset?

Aesthetic mapping

Let’s start with no aesthetic mapping.

g <- ggplot(diamonds)
summary(g)
## data: carat, cut, color, clarity, depth, table, price, x, y, z
##   [53940x10]
## faceting: facet_null()
g

What are we looking at in this box?

Aesthetic mapping

Let’s add one variable at a time, starting with carat.

g <- ggplot(diamonds, aes(x=carat))
summary(g)
## data: carat, cut, color, clarity, depth, table, price, x, y, z
##   [53940x10]
## mapping:  x = carat
## faceting: facet_null()
g

Now we add price.

g <- ggplot(diamonds, aes(x=carat,y=price))
g

summary(g)
## data: carat, cut, color, clarity, depth, table, price, x, y, z
##   [53940x10]
## mapping:  x = carat, y = price
## faceting: facet_null()

Next, we’ll add geoms. How do we add points to this graph?

Geoms

g <- (ggplot(diamonds, aes(x=carat,y=price)) + geom_point() )
g

summary(g)
## data: carat, cut, color, clarity, depth, table, price, x, y, z
##   [53940x10]
## mapping:  x = carat, y = price
## faceting: facet_null() 
## -----------------------------------
## geom_point: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity

Note that I literally added geom_point() to the plot with the + operator. This is what we mean by adding layers.
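
To make the layering concrete, here is a minimal sketch that counts the layers stored in the plot object after each addition. It assumes the plot object exposes its layers as a list through `$layers`, which is an implementation detail and may differ between ggplot2 versions. The names `base`, `with_points` and `with_smooth` are made up for this example.

```r
library(ggplot2)

# Start from the blank canvas: data and aesthetics, but no layers yet.
base <- ggplot(diamonds, aes(x = carat, y = price))
n_base <- length(base$layers)

# Each `+ geom_*()` appends one layer to the plot object.
with_points <- base + geom_point()
with_smooth <- with_points + geom_smooth()

n_points <- length(with_points$layers)
n_smooth <- length(with_smooth$layers)

# Expected counts: blank = 0, points = 1, points_and_smooth = 2
c(blank = n_base, points = n_points, points_and_smooth = n_smooth)
```

Nothing is drawn here. The plots are only built as objects, so the check is cheap to run.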

How do we include another geom, a smoothed mean (geom_smooth)?

How do we include another aesthetic, coloring by the diamond’s cut?

Adding aesthetics and geoms

ggplot(diamonds, aes(x=carat,y=price,color=cut)) +
  geom_point() +
  geom_smooth()

Since the aesthetic mappings were defined in the ggplot() call itself, both the points and the smoothed means inherited them, so we get one colored smooth per cut.

How do we change this so that we have only a single smoothed mean?

Specifying aesthetics per geom

ggplot(diamonds, aes(x=carat,y=price)) +
  geom_point(aes(color=cut)) +
  geom_smooth()
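
One way to check the difference between the two versions without looking at the picture is to count how many separate curves the smoother computed. The sketch below swaps in a linear fit (`method = "lm"`) purely to keep the check fast. That swap is an assumption of this example and not what the plot above uses. The names `inherited` and `per_geom` are hypothetical.

```r
library(ggplot2)

# Color mapped in ggplot(): every layer inherits it.
inherited <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Color mapped only in geom_point(): the smoother sees no grouping variable.
per_geom <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(aes(color = cut)) +
  geom_smooth(method = "lm", formula = y ~ x)

# Layer 2 is the smoother; each distinct group value is one fitted curve.
n_inherited <- length(unique(layer_data(inherited, 2)$group))
n_per_geom  <- length(unique(layer_data(per_geom, 2)$group))

# Expected: inherited = 5 (one per cut), per_geom = 1
c(inherited = n_inherited, per_geom = n_per_geom)
```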

Next, let’s add labels and change the theme.

Labels and theme

ggplot(diamonds, aes(x=carat,y=price)) +
  geom_point(aes(color=cut)) +
  geom_smooth() +
  labs(title = "Scatterplot",
    x = "Carat",
    y = "Price") +
  theme(panel.grid.major = element_blank(),
         panel.grid.minor = element_blank(), 
         panel.background = element_blank())+
  theme(axis.line.y = element_line(colour = "black", size=.5),
        axis.line.x = element_line(colour = "black", size=.5))

Themes look like a pain to configure. How do we save theme settings?
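
A simple answer, sketched here under the assumption that you want the same look on several plots: store the theme() call in a variable and add it like any other layer. The name `clean_theme` is hypothetical, and the sketch sets `axis.line` once rather than the separate x and y lines used above.

```r
library(ggplot2)

# Store the theme tweaks once. `clean_theme` is a made-up name.
clean_theme <- theme(
  panel.grid.major = element_blank(),
  panel.grid.minor = element_blank(),
  panel.background = element_blank(),
  axis.line = element_line(colour = "black")
)

# Reuse it on any plot by adding it with `+`.
p1 <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  clean_theme

p2 <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(bins = 30) +
  clean_theme
```

ggplot2 also provides theme_set(), which changes the session default to a complete theme such as theme_minimal(), so that every later plot picks it up.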

Let’s talk about facets.

Facets

Sometimes I’m exploring and I just want to compare the levels of a factor side-by-side in separate graphs. Instead of mapping the diamond’s cut to a color, I use facets.

ggplot(diamonds, aes(x=carat,y=price)) +
  geom_point(size=1) +
  geom_smooth(size=1) +
  labs(title = "Scatterplot by Cut",
    x = "Carat",
    y = "Price") +
  theme(panel.grid.major = element_blank(),
         panel.grid.minor = element_blank(), 
         panel.background = element_blank())+
  theme(axis.line.y = element_line(colour = "black", size=.5),
        axis.line.x = element_line(colour = "black", size=.5)) +
  facet_grid(cut ~ .)

Other times, I want to compare levels across two variables, such as cut and clarity.

ggplot(diamonds, aes(x=carat,y=price)) +
  geom_point(size=.5) +
  geom_smooth(size=1) +
  labs(title = "Scatterplot by Cut and Clarity",
    x = "Carat",
    y = "Price") +
  theme(panel.grid.major = element_blank(),
         panel.grid.minor = element_blank(), 
         panel.background = element_blank())+
  theme(axis.line.y = element_line(colour = "black", size=.5),
        axis.line.x = element_line(colour = "black", size=.5)) +
  facet_grid(cut ~ clarity)
## Warning: Computation failed in `stat_smooth()`:
## x has insufficient unique values to support 10 knots: reduce k.
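
The warning suggests that at least one panel has too few distinct carat values for the default smoother. Here is a rough sketch for finding those panels, followed by one possible workaround: a straight-line fit with `method = "lm"`. The workaround is my assumption about what is acceptable for exploration, and it changes what the curve means. The column name `n_unique` is made up.

```r
library(ggplot2)

# Count distinct carat values in each cut-by-clarity cell.
counts <- aggregate(carat ~ cut + clarity, data = diamonds,
                    FUN = function(v) length(unique(v)))
names(counts)[3] <- "n_unique"

# Cells with the fewest distinct values are the likely culprits.
head(counts[order(counts$n_unique), ])

# A straight-line fit needs far fewer distinct x values per panel.
p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(size = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x) +
  facet_grid(cut ~ clarity)
```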

Examples of other graphs

Stacked barplot (which I generally don’t recommend)

ggplot(diamonds, aes(clarity)) +
  geom_bar(aes(fill=cut)) +
  scale_fill_brewer() +
  theme_minimal() +
  theme(legend.position="top") +
  labs(title = "Stacked barplot",
    x = "Clarity",
    y = "# of Diamonds")
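
One common alternative to stacking, not necessarily the one the author would pick, is to place the bars side by side with `position = "dodge"`. Every bar then starts at zero, which makes the counts easier to compare. A minimal sketch:

```r
library(ggplot2)

# Same data and mapping as the stacked version, but dodged.
p_dodge <- ggplot(diamonds, aes(clarity)) +
  geom_bar(aes(fill = cut), position = "dodge") +
  scale_fill_brewer() +
  theme_minimal() +
  theme(legend.position = "top") +
  labs(title = "Side-by-side barplot",
       x = "Clarity",
       y = "# of Diamonds")

# With dodging, every bar's lower edge sits on the baseline.
bars <- layer_data(p_dodge, 1)
all(bars$ymin == 0)
```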

Histogram of diamond prices

ggplot(diamonds) +
  geom_histogram(aes(price),color="red",fill="white") +
  theme_minimal() +
  labs(title = "Histogram of diamond prices",
    x = "Price",
    y = "# of Diamonds") +
  # coord_flip() swaps the axes, so price runs vertically and the
  # vline below is drawn as a horizontal line.
  coord_flip() +
  geom_vline(xintercept=3944,alpha=.8,linetype=2) +
  annotate("text", x = 4411, y = 10115, label = "National avg cost of wedding stone (not fact)")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
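
The message above is ggplot2 asking for an explicit bin choice. Here is a small sketch with `binwidth = 500` (dollars), an illustrative value rather than a recommendation. layer_data() shows the bins that were actually computed.

```r
library(ggplot2)

# binwidth = 500 is an arbitrary choice for this example.
p <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(binwidth = 500, color = "red", fill = "white") +
  labs(title = "Histogram of diamond prices",
       x = "Price",
       y = "# of Diamonds")

# Each row of the layer data is one bar of the histogram.
bins <- layer_data(p, 1)
range(bins$xmax - bins$xmin)  # every bin is 500 wide
sum(bins$count)               # all 53940 diamonds are counted
```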

Boxplot of Price per Carat by Color, from A. Meliji

ggplot(diamonds, aes(factor(color), (price/carat), fill=color)) +
  geom_boxplot() +
  ggtitle("Diamond Price per Carat by Color") +
  xlab("Color") +
  ylab("Diamond Price per Carat (US$)")