I drew much of the material in this lesson from Wickham & Grolemund, “R for Data Science.”

So the first step is to load ggplot2 and the data.

library(ggplot2)
data(mpg)
data(diamonds)

We are going to make a plot with the mpg data set. What is mpg? Type “library(ggplot2)”, then “data(mpg)”, then “?mpg” in the console to find out.
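If you would rather inspect the data from code than from the help page, base R's str() and head() give a quick overview. A minimal sketch:

```r
library(ggplot2)

# Compact summary of mpg: number of rows, column names, and column types
str(mpg)

# The first six rows of the data set
head(mpg)
```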

Now make a scatter plot of city miles per gallon (cty) against highway miles per gallon (hwy).

ggplot(data=mpg) + geom_point(mapping=aes(x=hwy, y=cty))

What did we do? “ggplot(data=mpg)” was the base layer; then we added a scatter plot layer with + geom_point(mapping=aes(x=…,y=…)).

You can put “mapping=aes(…)” either in the base layer or in the individual geoms. In previous lessons, we put it in the base layer. If you put it in the base layer, it applies to all of the geoms. Can you think of an example where it would be useful to have different aesthetic mappings in different geoms?

You can also define some aesthetics globally (in the base layer) and others for each geom.
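A minimal sketch of mixing global and local aesthetics. The choice of displ and hwy from mpg is only illustrative: x and y are mapped in the base layer, while color is mapped only in geom_point().

```r
library(ggplot2)

# x and y are mapped globally, so both geoms inherit them.
# color = drv is mapped only in geom_point(), so geom_smooth()
# draws a single curve for all cars instead of one per drv group.
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = drv)) +
  geom_smooth()

p
```

If you move color = drv into the base layer instead, geom_smooth() inherits it too and fits one curve per drive type.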

Now you try it. Add a code chunk below that plots the diamonds data set, price versus carat. The diamonds data set is already loaded for you. You need to define the base layer and the scatter plot geom: geom_point() as done above.

ggplot(data=diamonds) + geom_point(mapping=aes(x=carat, y=price))

Next, we will make all of the dots red.

ggplot(data=mpg) + geom_point(mapping=aes(x=hwy, y=cty), color="red")

Now we want different color dots, based on a categorical variable in the data set. Which variable? Let’s pick drv, which is f for “front-wheel drive”, r for “rear-wheel drive” and 4 for “four-wheel drive”. We will map drv to the aesthetic “color” in the mapping. Examine the code below. What is the difference between the code above and the code below?

The difference is that the plot above colors every point the same, so it does not show whether a car is front-wheel drive (f), rear-wheel drive (r), or four-wheel drive (4). The code below maps drv to color, so each drive type gets its own color.

ggplot(data=mpg) + geom_point(mapping=aes(x=hwy, y=cty, color=drv))
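Color has now appeared in two places: outside aes() in the red plot and inside aes() in the drv plot. A common slip is to put a fixed color inside aes(). The sketch below contrasts the two; ggplot_build(), which computes the final drawn values, is only used to check the result.

```r
library(ggplot2)

# Setting: color is a fixed value outside aes(), so every point is red
# and no legend is drawn.
set_red <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = hwy, y = cty), color = "red")

# Mapping: inside aes(), "red" is treated as data, a constant
# one-level variable. ggplot2 colors it with its default palette
# (a salmon color, not red) and adds a legend with one entry, "red".
map_red <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = hwy, y = cty, color = "red"))

set_red
map_red
```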

Now look at the diamonds data set. Try typing “?diamonds” in the console. Pick a categorical variable to map to color, then add a code chunk that draws a scatter plot with dots colored by that variable.

ggplot(data=diamonds) + geom_point(mapping=aes(x=price, y=carat, color=color))
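In the plot above, color=color can look odd: the first color is the aesthetic, and the second is the diamonds column named color, the diamond's color grade from D (best) to J (worst). To list which diamonds columns are categorical, one option is to check which are stored as factors. A minimal sketch:

```r
library(ggplot2)

# A column counts as categorical here if it is stored as a factor
# (ordered factors inherit from "factor", so they are included).
is_cat <- sapply(diamonds, is.factor)
cat_vars <- names(diamonds)[is_cat]
cat_vars

# The levels of one of them, the diamond color grade
levels(diamonds$color)
```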

In addition to color, you can map variables to “size”, “shape”, “alpha” (transparency), and other aesthetics such as “stroke”. Of course, you can also set any of these to a single fixed value for the whole plot, outside of aes(), as we did with color="red".
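A minimal sketch that maps one aesthetic and sets another on mpg: size is mapped to engine displacement (displ), while alpha is fixed for every point. The value 0.4 is an arbitrary illustrative choice.

```r
library(ggplot2)

# size is mapped inside aes(), so it varies with displ;
# alpha is set outside aes(), so every point is equally transparent,
# which makes overlapping points easier to see.
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = hwy, y = cty, size = displ), alpha = 0.4)

p
```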

Now make several more plots below, mapping different aesthetics to different variables, and try mapping several aesthetics at once. You can use either data set, mpg or diamonds, or both (one at a time).

ggplot(data=diamonds) + geom_point(mapping=aes(x=clarity, y=price, color=color))

ggplot(data=diamonds) + geom_point(mapping=aes(x=price, y=depth, color=color))

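# diamonds columns x and y are length and width in mm; here width is on the horizontal axis and length on the vertical axis
ggplot(data=diamonds) + geom_point(mapping=aes(x=y, y=x, color=color))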

ggplot(data=diamonds) + geom_point(mapping=aes(x=depth, y=table, color=color))

ggplot(data=diamonds) + geom_point(mapping=aes(x=color, y=x, color=color))

ggplot(data=diamonds) + geom_point(mapping=aes(x=price, y=table, color=color))

ggplot(data=diamonds) + geom_point(mapping=aes(x=carat, y=cut, color=color))
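Each plot above maps a single variable to color. To map several aesthetics at once, list them all inside aes(). A minimal sketch on mpg; the particular pairing of class, drv, and cyl is only an illustration:

```r
library(ggplot2)

# Three aesthetics mapped at once:
#   color = class (7 vehicle classes)
#   shape = drv   (3 drive types)
#   size  = cyl   (number of cylinders, treated as continuous)
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy,
                           color = class, shape = drv, size = cyl))

p
```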

Try mapping a quantitative variable to color, size, or shape. What happens?

ggplot(data=diamonds) + geom_point(mapping=aes(x=carat, y=cut, color=carat))

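Instead of a handful of distinct colors, ggplot2 uses a continuous color gradient (dark to light blue by default) and shows a color bar in the legend; the data are not binned. Mapping a quantitative variable to size also works, but mapping one to shape produces an error.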
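A minimal sketch of the size and shape cases on mpg. The shape error only appears when the plot is built or printed, so tryCatch() around ggplot_build() is used here to capture the message instead of stopping.

```r
library(ggplot2)

# A continuous variable mapped to size works: bigger engines, bigger points.
p_size <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = hwy, y = cty, size = displ))

# A continuous variable mapped to shape fails when the plot is built,
# because ggplot2 has no continuous scale for point shapes.
p_shape <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = hwy, y = cty, shape = displ))

p_size

# Capture the error message rather than halting the script
shape_error <- tryCatch(ggplot_build(p_shape),
                        error = function(e) conditionMessage(e))
shape_error
```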

What happens if you map the same variable to multiple aesthetics?

The same variable is represented in two different ways at once. For example, in the carat plot above, carat sets both the x position and the color, so the color adds no new information.
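Redundant mappings are not always useless: mapping one categorical variable to both color and shape helps readers who cannot easily tell the colors apart. A minimal sketch on mpg:

```r
library(ggplot2)

# drv is mapped to both color and shape, so each drive type gets its
# own color AND its own point shape. Because both legends would have the
# same title and labels, ggplot2 merges them into a single legend.
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = hwy, y = cty, color = drv, shape = drv))

p
```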

What does the stroke aesthetic do with geom_point()? Hint: it only works with continuous (i.e., quantitative) variables. Try mpg instead of diamonds; otherwise it takes too long.

ggplot(data=mpg) + geom_point(mapping=aes(x=hwy, y=cty, stroke = displ))
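stroke controls the width of a point's outline. The effect is easiest to see with shape 21, a circle with a separate border and fill. A minimal sketch, where the white fill and black border are only illustrative choices:

```r
library(ggplot2)

# Shape 21 has a border (color) and an interior (fill).
# stroke = displ makes the border thicker for larger engines,
# while the white interior stays the same.
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = hwy, y = cty, stroke = displ),
             shape = 21, fill = "white", color = "black")

p
```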

What happens if you map an aesthetic to something other than a variable name, e.g., for mpg, aes(x = hwy, y = cty, color = displ < 5)?

ggplot(data = diamonds, mapping = aes(x = carat, y = price, color = carat > 5)) + geom_point()

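ggplot2 evaluates the expression for every row and maps the result, TRUE or FALSE, to color, so the points are colored by whether the condition holds.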
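The question above uses mpg, while the plot uses diamonds, where very few diamonds weigh more than 5 carats, so nearly every point gets the same color. A minimal sketch of the mpg version from the question, where both groups are well represented:

```r
library(ggplot2)

# displ < 5 is evaluated for each row of mpg, giving TRUE or FALSE.
# color is mapped to that logical result, so the points split into two
# colors: engines under 5 litres and engines of 5 litres or more.
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = hwy, y = cty, color = displ < 5))

p
```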

Try adding another layer to your plots. A good choice here is geom_smooth(). You need to define the aesthetics, either globally in the base layer or locally in each geom. Type “?geom_smooth” into the console, and also “?geom_point”. The help pages show that each geom also takes its own data argument, so you can define data locally too.
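A minimal sketch of local data: the points show every car, while the smoother gets its own data argument and is fitted only to subcompact cars. The subset and the variable name subcompacts are illustrative choices; subset() is base R.

```r
library(ggplot2)

# Only the subcompact cars; used as local data for the smoother
subcompacts <- subset(mpg, class == "subcompact")

# x and y are mapped globally; geom_point() uses the base-layer data (all cars),
# while geom_smooth() overrides the data with subcompacts only.
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth(data = subcompacts, se = FALSE)

p
```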

ggplot(data = diamonds, mapping = aes(x = carat, y = price, color = carat > 5)) + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'gam'
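The message above is printed because no method was given. geom_smooth() then picks one from the size of the largest group: loess for fewer than 1,000 observations and a GAM otherwise, which is why the diamonds plot reports 'gam'. A minimal sketch on mpg that sets the method and formula explicitly, so a straight-line fit is drawn and no message is printed:

```r
library(ggplot2)

# method = "lm" fits a straight line; giving the formula as well
# means geom_smooth() has nothing to choose, so it prints no message.
# se = FALSE hides the confidence band.
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

p
```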


Also look into making box plots and/or violin plots with ggplot2. We have done it before, but can you figure out how? For help, type “?geom_boxplot” and/or “?geom_violin” into the console. These take an argument “fill”. Try using it as a fixed argument (outside of aes()), and try putting it inside aes(), mapped to a variable such as “drv” in the mpg data set.

ggplot(data=diamonds, aes(x=0, y=price)) + geom_violin(fill="orange")

ggplot(data = diamonds, mapping = aes(x = color, y = price, color = color)) + geom_boxplot()
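The plots above use diamonds; the exercise also suggests mapping fill to drv in mpg. A minimal sketch of fill set outside aes() versus mapped inside it, for both box plots and violin plots. "orange" is just an illustrative color.

```r
library(ggplot2)

# fill set outside aes(): every box gets the same fixed color.
p_fixed <- ggplot(data = mpg, mapping = aes(x = drv, y = hwy)) +
  geom_boxplot(fill = "orange")

# fill mapped inside aes(): each drive type gets its own fill color,
# and ggplot2 adds a legend for drv.
p_mapped <- ggplot(data = mpg, mapping = aes(x = drv, y = hwy)) +
  geom_boxplot(mapping = aes(fill = drv))

# The same mapping works for violin plots.
p_violin <- ggplot(data = mpg, mapping = aes(x = drv, y = hwy)) +
  geom_violin(mapping = aes(fill = drv))

p_fixed
p_mapped
p_violin
```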