In this lab we will learn some of the basics of ggplot2.

Part I: Learning about your data

Step 1: Load in the data

First let’s load in the diamonds dataset. This dataset ships with ggplot2, which is loaded as part of the tidyverse, so make sure that library is loaded first.

library(tidyverse)
data("diamonds")

Step 2: Learn a little bit about this data

The str function allows you to learn about the structure of a dataset.

str(diamonds)
## tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
##  $ carat  : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
##  $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
##  $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
##  $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
##  $ depth  : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
##  $ table  : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
##  $ price  : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
##  $ x      : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
##  $ y      : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
##  $ z      : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
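The Ord.factor columns in the output above are ordered factors. As a small sketch, the base R functions levels() and is.ordered() let you inspect one of them directly; the name cut_levels is only an illustrative variable.

```r
library(ggplot2)  # diamonds is provided by ggplot2 (also loaded by tidyverse)

# Levels of the cut variable, from lowest to highest quality
cut_levels <- levels(diamonds$cut)
cut_levels

# TRUE when the factor carries an ordering
is.ordered(diamonds$cut)

# Number of diamonds in each cut category
table(diamonds$cut)
```

The ordering matters later in the lab, because ggplot2 treats ordered and unordered factors differently when choosing colors.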

It’s Your Turn! Learning by doing

  1. How many rows are in diamonds? How many columns?
  2. Which variables in diamonds are categorical? Which variables are continuous? (Hint: look at the output for str())
  3. What does the table variable describe? Read the help for ?diamonds to find out.

Part II: Start with a basic scatterplot

Here is a simple scatterplot of price vs carat:

ggplot(data=diamonds, aes(x=carat, y=price))+
  geom_point()

What do you observe?

It’s Your Turn! Learning by doing

  1. Run ggplot(data=diamonds). What do you see?
  2. Make a scatterplot of price vs depth.
  3. What happens if you make a scatterplot of cut vs clarity? Why is the plot not useful?

Part III: Aesthetic Mappings

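Aesthetic mappings translate variables in your data into visual properties of the plot, such as color, transparency, shape, and size.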

A) Color

Color Gradient for Ordinal Data

# If using an ordered categorical variable, each category gets a color from a gradient
ggplot(diamonds, aes(carat, price, color=clarity))+
  geom_point()

Unique colors for Nominal Data

# If the variable is not ordered, each category gets its own distinct color
ggplot(diamonds, aes(carat, price, color=as.character(clarity)))+
  geom_point()
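Converting with as.character() drops the ordering, but it also means the legend is sorted alphabetically. One possible alternative, sketched here, is to rebuild the factor with ordered = FALSE, which keeps the original level order while still being treated as nominal. The names clarity_nominal and p_nominal are illustrative only.

```r
library(ggplot2)

# Same levels as clarity, but no longer an ordered factor
clarity_nominal <- factor(diamonds$clarity, ordered = FALSE)
is.ordered(clarity_nominal)

# Use the unordered version inside aes()
p_nominal <- ggplot(diamonds, aes(carat, price,
                                  color = factor(clarity, ordered = FALSE))) +
  geom_point()
p_nominal
```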

Saturation Gradient for Numeric Data

# If using a numeric variable there will be a color gradient 
ggplot(diamonds, aes(carat, price, color=depth))+
  geom_point()

You can also apply a single color to all the data points by specifying the color outside of the aesthetic mapping.

ggplot(diamonds, aes(carat, price))+
  geom_point(color="blue")

B) Transparency

ggplot(diamonds, aes(carat, price, alpha=clarity))+
  geom_point()
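As with color, transparency can also be set to one fixed value for every point by placing alpha outside of aes(). The sketch below uses 0.1 as an arbitrary choice; with this many points, a low alpha can make dense regions easier to see. The name p_alpha is illustrative.

```r
library(ggplot2)

# alpha is set, not mapped: every point gets the same transparency
p_alpha <- ggplot(diamonds, aes(carat, price)) +
  geom_point(alpha = 0.1)
p_alpha
```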

C) Shape

ggplot(diamonds, aes(carat, price, shape=clarity))+
  geom_point()
## Warning: Using shapes for an ordinal variable is not advised
## Warning: The shape palette can deal with a maximum of 6 discrete values because
## more than 6 becomes difficult to discriminate; you have 8. Consider
## specifying shapes manually if you must have them.
## Warning: Removed 5445 rows containing missing values (geom_point).
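The second warning suggests specifying shapes manually. A minimal sketch of that, assuming the eight shape codes 0 to 7 are acceptable choices, uses scale_shape_manual() so that every clarity level has a shape and no rows need to be dropped. The plot is still hard to read, so this is an illustration rather than a recommendation.

```r
library(ggplot2)

# One shape code per clarity level (8 levels, shape codes 0 through 7)
p_shape <- ggplot(diamonds, aes(carat, price, shape = clarity)) +
  geom_point() +
  scale_shape_manual(values = 0:7)
p_shape
```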

D) Size

ggplot(diamonds, aes(carat, price, size=clarity))+
  geom_point()

It’s Your Turn! Learning by doing

  1. What’s gone wrong with this code? Why are the points not blue?
ggplot(diamonds, aes(carat, price, color="blue"))+
  geom_point()

  2. Map a continuous variable to color, size, and shape. How do these aesthetics behave differently for categorical vs. continuous variables?

  3. What happens if you map the same variable to multiple aesthetics?

  4. What happens if you map an aesthetic to something other than a variable name, like aes(colour = carat < 3)? Note, you’ll also need to specify x and y.

Part IV: Facets

Sometimes it’s useful to look at subgroups within our data. We can do this with facets.

facet_wrap()

You can specify a single discrete variable to facet by, and ggplot2 arranges the panels to fill the available space.

ggplot(diamonds, aes(carat, price))+
  geom_point()+
  facet_wrap(~cut)
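If the default arrangement is not what you want, facet_wrap() also accepts nrow and ncol arguments. The sketch below places all five cut panels in a single row; the choice of one row and the name p_wrap are illustrative only.

```r
library(ggplot2)

# Force all panels into one row instead of the default layout
p_wrap <- ggplot(diamonds, aes(carat, price)) +
  geom_point() +
  facet_wrap(~cut, nrow = 1)
p_wrap
```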

facet_grid()

You can also create a grid of graphs. In the formula passed to the function, the variable to the left of the ~ specifies the rows and the variable to the right specifies the columns.

## Grid
ggplot(diamonds, aes(carat, price))+
  geom_point()+
  facet_grid(color~cut)

If you prefer to not facet in the rows or columns dimension, use a . instead of a variable name, e.g. + facet_grid(. ~ color).
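Recent versions of ggplot2 (3.0.0 and later) also let you name the two dimensions explicitly with the rows and cols arguments and vars(), which some readers find easier to follow than the formula. This sketch should produce the same grid as facet_grid(color~cut); the name p_grid is illustrative.

```r
library(ggplot2)

# Rows are faceted by color, columns by cut
p_grid <- ggplot(diamonds, aes(carat, price)) +
  geom_point() +
  facet_grid(rows = vars(color), cols = vars(cut))
p_grid
```

To facet in only one dimension with this syntax, supply just rows or just cols.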

It’s Your Turn! Learning by doing

  1. What happens if you facet on a continuous variable?

  2. What plots does the following code make? What does . do?

ggplot(diamonds, aes(carat, price))+
  geom_point()+
  facet_grid(color~.)

ggplot(diamonds, aes(carat, price))+
  geom_point()+
  facet_grid(.~cut)

  3. When using facet_grid() you should usually put the variable with more unique levels in the columns. Why?