I drew much of the material in this lesson from Wickham & Grolemund, “R for Data Science.”
So the first step is to load ggplot2 and the data.
library(ggplot2)
data(mpg)
data(diamonds)
We are going to make a plot with the mpg data set. What is mpg? Type “library(ggplot2)”, then “data(mpg)”, then “?mpg” in the console to find out.
Now do a scatter plot with city miles per gallon and highway miles per gallon.
This requires a code chunk " ``` " and a name. The base layer is the outline of the graph (like the piece of paper that we draw on), and the geoms are the points.
ggplot(data=mpg) + geom_point(mapping=aes(x=hwy, y=cty))
What did we do? “ggplot(data=mpg)” was the base layer, then we defined geom_point(mapping=aes(x=…,y=…)).
Aesthetic mappings connect variables in the data to visual properties of the graph (such as position, color, and size), and geoms are the visual representations of the observations. All geoms take aesthetic parameters.
You can also define some aesthetics globally (in the base layer) and others for each geom.
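As a minimal sketch of the global-versus-local distinction, assuming the mpg data set that ships with ggplot2: x and y are defined once in the base layer and inherited by every geom, while color is defined only for the points. The object name p is used here purely for illustration.

```r
library(ggplot2)

# x and y are defined globally in the base layer, so every geom inherits them
p <- ggplot(data = mpg, mapping = aes(x = hwy, y = cty)) +
  # color is mapped locally, so it applies to this geom only
  geom_point(mapping = aes(color = drv))

print(p)
```

Any geom added to p later would reuse hwy and cty without repeating them, but would not pick up the color mapping.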
Now you try it. Add a code chunk below that plots the diamonds data set, price versus carat. The diamonds data set is already loaded for you. You need to define the base layer and the scatter plot geom: geom_point() as done above.
ggplot(data=diamonds) + geom_point(mapping=aes(x=carat,y=price))
Now the next thing we are going to do is make the color of the dots red.
ggplot(data=mpg) + geom_point(mapping=aes(x=hwy, y=cty), color="red")
Now we want different color dots, based on a categorical variable in the data set. Which variable? Let’s pick drv, which is f for “front-wheel drive”, r for “rear-wheel drive” and 4 for “four-wheel drive”. We will map drv to the aesthetic “color” in the mapping. Examine the code below. What is the difference between the code above and the code below?
ggplot(data=mpg) + geom_point(mapping=aes(x=hwy, y=cty, color=drv))
The code above creates a scatterplot that has red dots while the code below creates a scatterplot that has different colored dots.
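The two versions can be put side by side to make the difference concrete. This is only a sketch, and the object names fixed and mapped are illustrative: in the first plot color sits outside aes() and is a fixed value, in the second it sits inside aes() and is mapped to drv.

```r
library(ggplot2)

# color outside aes(): one fixed value applied to every point
fixed <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = hwy, y = cty), color = "red")

# color inside aes(): mapped to drv, so each level of drv gets its own color
mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = hwy, y = cty, color = drv))

# the levels of drv that receive a color (and a legend entry) in the second plot
unique(mpg$drv)

print(fixed)
print(mapped)
```

Only the second plot gets a legend, because only there does color carry information about the data.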
Now look at the diamonds data set. Try typing “?diamonds” in the console. Pick a categorical variable to map to color. Now add a code chunk that draws a scatter plot with different colored dots, based on your choice of categorical variable.
ggplot(data=diamonds) + geom_point(mapping=aes(x=x, y=z, color=cut))
In addition to color, you can also map variables to “size”, “shape”, “alpha” (transparency), and any other aesthetic the geom supports. Of course, you can also define those aesthetics for the whole plot.
Now make several more plots, add them below, mapping different attributes to different variables, and try doing several attributes at once. You can try with either data set: mpg or diamonds, or both (one at a time).
ggplot(data=diamonds) + geom_point(mapping=aes(x=x, y=y, color=color))
ggplot(data=diamonds) + geom_point(mapping=aes(x=y, y=z, color=clarity))
ggplot(data=diamonds) + geom_point(mapping=aes(x=z, y=x, color=color, clarity))
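## Warning: Ignoring unknown aesthetics:
The warning appears because clarity was passed inside aes() without an aesthetic name, so it is ignored. Every variable after x and y has to be assigned to a named aesthetic, such as size= or alpha=.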
Try mapping a quantitative variable to color, size, or shape. What happens?
ggplot(data=diamonds) + geom_point(mapping=aes(x=x, y=z, size=carat))
Because I mapped carat to size rather than color, the points are not colored. The quantitative carat variable is shown by point size instead, with bigger points for bigger carat values.
What happens if you map the same variable to multiple aesthetics?
ggplot(data=diamonds) + geom_point(mapping=aes(x=x, y=z, size=carat, color=carat))
There are two legends for the variable. With the color aesthetic, the highest carat value in the legend, 5, is the lightest blue and the lowest, 1, is the darkest blue. With the size aesthetic, the highest carat corresponds to the biggest bubble and the lowest carat to the smallest bubble.
What does the stroke aesthetic do with geom_point()? Hint: it only works with a continuous (i.e. quantitative) variable. Try mpg instead of diamonds, otherwise it takes too long.
ggplot(data=mpg) + geom_point(mapping=aes(x=hwy, y=cty, stroke=cty))
It sets the border width of each point from cty, so cars with higher city mileage are drawn with thicker, larger-looking points that blur together where they overlap.
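The effect of stroke is easier to see when it is set to a fixed value on a point shape that has a separate border and interior. The sketch below assumes shape 21 (a filled circle) with a white fill. Those particular values are illustrative choices, not part of the exercise.

```r
library(ggplot2)

# shape 21 has a border (color) and an interior (fill);
# stroke controls the width of the border only
p <- ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = hwy, y = cty),
    shape = 21, color = "black", fill = "white",
    size = 3, stroke = 2
  )

print(p)
```

Changing stroke while keeping size constant thickens or thins the outline without changing the interior.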
What happens if you map an aesthetic to something other than a variable name, e.g. for mpg, aes(x = hwy, y = cty, color = displ < 5)?
ggplot(data=mpg) + geom_point(mapping=aes(x=hwy, y=cty, color = displ<5))
The legend shows TRUE and FALSE, and each dot is colored by whether the car’s engine displacement (displ, in litres) is less than 5.
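One way to see what happens is to evaluate the expression on its own first. In this sketch (the variable name small_engine is made up for illustration), displ < 5 yields one TRUE or FALSE per car, and ggplot2 then treats that logical vector like any other two-level variable.

```r
library(ggplot2)

# the expression is evaluated row by row, giving one logical value per car
small_engine <- mpg$displ < 5

# how many cars fall on each side of the 5-litre threshold
table(small_engine)

# inside aes(), the same expression is computed from the data and mapped to color
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = hwy, y = cty, color = displ < 5))

print(p)
```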
Try adding another layer to your plots. A good choice here is geom_smooth(). You need to define the aesthetics (globally in the base layer, or locally in each geom). Type “?geom_smooth” into the console. Also type “?geom_point” into the console. Apparently, you can define data locally as well.
ggplot(data=mpg) + geom_point(mapping=aes(x=hwy, y=cty, color = drv)) + geom_smooth(mapping=aes(x=hwy, y=cty))
## `geom_smooth()` using method = 'loess'
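The same plot can be written with the shared aesthetics defined once in the base layer, and a geom can also be given its own data. The following is a sketch under a few assumptions: the subset front_wheel (front-wheel-drive cars only) is a hypothetical choice made to show local data, and method, formula, and se are spelled out only to make the smoother's settings explicit.

```r
library(ggplot2)

# a smaller data set to hand to one geom only (illustrative choice)
front_wheel <- subset(mpg, drv == "f")

p <- ggplot(data = mpg, mapping = aes(x = hwy, y = cty)) +
  # points use the global data and x/y, plus a local color mapping
  geom_point(mapping = aes(color = drv)) +
  # the smoother inherits x/y but is fitted to its own local data
  geom_smooth(data = front_wheel, method = "loess", formula = y ~ x, se = FALSE)

print(p)
```

Here the points show all cars, while the smooth line is fitted to the front-wheel-drive cars alone.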
Also see about making box plots and/or violin plots with ggplot2. We have done it before, but can you figure out how? For help, type “?geom_boxplot” and/or “?geom_violin” into the console. These take an argument “fill”. Try putting it in as an argument (outside of aes) and try putting it inside of aes, mapped to a variable, like “drv” in the mpg data set.
ggplot(data=mpg) + geom_boxplot(mapping=aes(x=class, y=cty))
ggplot(data=mpg) + geom_violin(mapping=aes(x=class, y=cty))
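The two ways of using fill might look like the sketch below. The color "lightblue" and the object names are arbitrary choices for illustration, and geom_violin() accepts fill in the same two ways.

```r
library(ggplot2)

# fill as a plain argument (outside aes): every box gets the same fill color
fixed_fill <- ggplot(data = mpg) +
  geom_boxplot(mapping = aes(x = class, y = cty), fill = "lightblue")

# fill inside aes, mapped to drv: boxes are split and filled by drive type
mapped_fill <- ggplot(data = mpg) +
  geom_boxplot(mapping = aes(x = class, y = cty, fill = drv))

print(fixed_fill)
print(mapped_fill)
```

The mapped version also adds a legend for drv, and classes with more than one drive type are drawn as several boxes side by side.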