In this lab we will learn some of the basics of ggplot2.
First, let’s load the diamonds dataset. It ships with ggplot2, which is attached when you load the tidyverse package, so make sure that library is loaded first.
library(tidyverse)
data("diamonds")
The str() function shows the structure of a dataset.
str(diamonds)
## tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
## $ carat : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
## $ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
## $ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
## $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
## $ depth : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
## $ table : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
## $ price : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
## $ x : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
## $ y : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
## $ z : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
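If the str() output feels dense, the same facts can be pulled out one at a time. The following is a minimal sketch using base R helpers on the same data; col_types is an illustrative name, not something defined elsewhere in this lab.

```r
library(ggplot2)  # the diamonds data ships with ggplot2
data("diamonds")

# Number of rows and number of columns
dim(diamonds)

# First class of each column: "ordered" marks an ordered factor,
# while "numeric" and "integer" mark number columns
col_types <- sapply(diamonds, function(col) class(col)[1])
col_types
```

Comparing col_types against the str() output is a quick way to confirm which columns are factors and which are numbers.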
How many rows are in diamonds? How many columns?
Which variables in diamonds are categorical? Which variables are continuous? (Hint: look at the output for str().)
What does the table variable describe? Read the help for ?diamonds to find out.
Here is a simple scatterplot of price vs carat:
ggplot(data=diamonds, aes(x=carat, y=price))+
geom_point()
What do you observe?
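Run ggplot(data=diamonds). What do you see?
Make a scatterplot of price vs depth.
What happens if you make a scatterplot of cut vs clarity? Why is the plot not useful?
Aesthetic mappings translate variables in the data into visual properties of the plot, such as color, transparency, shape, and size.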
# If using a categorical variable each category will have a color
ggplot(diamonds, aes(carat, price, color=clarity))+
geom_point()
# If the variable is not an ordered factor (here, converted to character), a different discrete palette is used
ggplot(diamonds, aes(carat, price, color=as.character(clarity)))+
geom_point()
# If using a numeric variable there will be a color gradient
ggplot(diamonds, aes(carat, price, color=depth))+
geom_point()
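The three plots above differ because ggplot2 chooses a color scale based on the type of the mapped variable. One hedged way to see which case applies is to inspect the types directly with base R before plotting:

```r
library(ggplot2)
data("diamonds")

is.ordered(diamonds$clarity)                # TRUE: an ordered factor
is.ordered(as.character(diamonds$clarity))  # FALSE: a plain character vector
is.numeric(diamonds$depth)                  # TRUE: a continuous variable
```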
You can also apply a single color to all the data points by specifying the color outside of the aesthetic mapping.
ggplot(diamonds, aes(carat, price))+
geom_point(color="blue")
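To confirm that a color set outside aes() is applied as a constant rather than passed through a scale, you can look at the data the layer actually draws. This is a small sketch using ggplot2's layer_data() helper; p_blue is just an illustrative object name.

```r
library(ggplot2)
data("diamonds")

p_blue <- ggplot(diamonds, aes(carat, price)) +
  geom_point(color = "blue")

# layer_data() returns the data frame a layer draws from,
# including the colour assigned to every point
unique(layer_data(p_blue)$colour)
```

Every point carries the same literal value, and no legend is drawn because no variable was mapped.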
ggplot(diamonds, aes(carat, price, alpha=clarity))+
geom_point()
ggplot(diamonds, aes(carat, price, shape=clarity))+
geom_point()
## Warning: Using shapes for an ordinal variable is not advised
## Warning: The shape palette can deal with a maximum of 6 discrete values because
## more than 6 becomes difficult to discriminate; you have 8. Consider
## specifying shapes manually if you must have them.
## Warning: Removed 5445 rows containing missing values (geom_point).
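The warnings above say that the default shape palette covers only six values, so the points in the two highest clarity levels are dropped. As a sketch of the "specifying shapes manually" suggestion, the code below supplies one shape code per level with scale_shape_manual(); the choice of codes 0 to 7 is arbitrary, and p_shapes is an illustrative name.

```r
library(ggplot2)
data("diamonds")

# Rows in the 7th and 8th clarity levels, which the default
# six-shape palette cannot represent
sum(diamonds$clarity %in% c("VVS1", "IF"))

# Supply eight shape codes, one for each clarity level
p_shapes <- ggplot(diamonds, aes(carat, price, shape = clarity)) +
  geom_point() +
  scale_shape_manual(values = 0:7)
p_shapes
```

Even with all points drawn, eight shapes remain hard to tell apart, which is why color is usually the better choice for a variable with this many levels.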
ggplot(diamonds, aes(carat, price, size=clarity))+
geom_point()
ggplot(diamonds, aes(carat, price, color="blue"))+
geom_point()
Map a continuous variable to color, size, and shape. How do these aesthetics behave differently for categorical vs. continuous variables?
What happens if you map the same variable to multiple aesthetics?
What happens if you map an aesthetic to something other than a variable name, like aes(colour = carat < 3)? Note, you’ll also need to specify x and y.
Sometimes it’s useful to look at subgroups within our data. We can do this with facets.
facet_wrap()
You can specify a single discrete variable to facet by, and ggplot2 arranges the resulting panels to fill the available space.
ggplot(diamonds, aes(carat, price))+
geom_point()+
facet_wrap(~cut)
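If the automatic arrangement is not what you want, facet_wrap() also accepts nrow and ncol arguments. A minimal sketch, with p_wrap as an illustrative name:

```r
library(ggplot2)
data("diamonds")

p_wrap <- ggplot(diamonds, aes(carat, price)) +
  geom_point() +
  facet_wrap(~cut, nrow = 1)  # place all five cut levels in a single row
p_wrap
```

A single row makes it easier to compare prices across panels, because all panels share the same y axis.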
facet_grid()
You can also create a grid of graphs. In the formula passed to facet_grid(), the variable on the left of ~ defines the rows and the variable on the right defines the columns.
## Grid
ggplot(diamonds, aes(carat, price))+
geom_point()+
facet_grid(color~cut)
If you prefer to not facet in the rows or columns dimension, use a . instead of a variable name, e.g. + facet_grid(. ~ color).
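The row and column variables can also be given as named arguments, which some readers find easier to scan than the formula. This is a sketch that assumes ggplot2 3.0.0 or later, where facet_grid() accepts rows and cols wrapped in vars(); p_grid is an illustrative name.

```r
library(ggplot2)
data("diamonds")

# Same layout as facet_grid(color ~ cut), written with named arguments
p_grid <- ggplot(diamonds, aes(carat, price)) +
  geom_point() +
  facet_grid(rows = vars(color), cols = vars(cut))
p_grid
```

With this form, leaving out rows or cols plays the same role as the . placeholder in the formula.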
What happens if you facet on a continuous variable?
What plots does the following code make? What does . do?
ggplot(diamonds, aes(carat, price))+
geom_point()+
facet_grid(color~.)
ggplot(diamonds, aes(carat, price))+
geom_point()+
facet_grid(.~cut)
When using facet_grid(), you should usually put the variable with more unique levels in the columns. Why?