library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
Run ggplot(data = mpg). What do you see?
ggplot(data = mpg)
An empty plot: a blank gray panel with no axes, points, or other layers, because no aesthetics or geoms have been specified.
How many rows are in mpg? How many columns?
dim(mpg)
## [1] 234 11
234 rows, 11 columns.
What does the drv variable describe? Read the help for ?mpg to find out.
?mpg
“drv- the type of drive train, where f = front-wheel drive, r = rear wheel drive, 4 = 4wd”
Make a scatterplot of hwy vs cyl.
ggplot(data = mpg) + geom_point(aes(x = hwy, y = cyl))
What happens if you make a scatterplot of class vs drv? Why is the plot not useful?
ggplot(mpg) + geom_point(aes(class, drv))
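Both class and drv are discrete variables, so many cars land on exactly the same point and the scatterplot hides how many observations each combination contains. A bar plot or a count plot would be better.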
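As a rough sketch of one alternative (an assumption about what a more useful plot could look like, not part of the original exercise), geom_count() sizes each point by the number of observations at that position, so the overlap becomes visible. The name p_count is only illustrative.

```r
library(ggplot2)

# Size each point by how many cars share that class/drv combination
p_count <- ggplot(data = mpg) +
  geom_count(mapping = aes(x = class, y = drv))
p_count
```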
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
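To set a color for the entire plot (not dependent on a variable), the color
argument must go outside the aes() function. Inside aes(), "blue" is treated as a variable with the single value "blue", so the points get the default color for that one level instead of being drawn in blue.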
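A minimal sketch of the corrected call, with color passed as an argument to geom_point() rather than inside aes(). The name p_blue is only illustrative.

```r
library(ggplot2)

# color is set outside aes(), so it is a fixed property of the layer
p_blue <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
p_blue
```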
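Which variables in mpg are categorical? Which variables
are continuous? (Hint: type ?mpg to read the documentation
for the dataset). How can you see this information when you run
mpg?
manufacturer, model, trans, drv, fl, and class are categorical. displ,
year, cyl, cty, and hwy are numeric, although cyl and year take only a few distinct values. When you print mpg, the type tag under each column name shows this:
<chr> marks the categorical columns, while <dbl> and <int> mark the numeric ones.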
mpg
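The same information can also be pulled out programmatically. The sketch below uses base R's sapply() and class(); the name col_types is only illustrative.

```r
library(ggplot2)  # provides the mpg dataset

# Class of each column: "character" columns are categorical,
# "numeric" and "integer" columns hold numbers
col_types <- sapply(mpg, class)
col_types
```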
Map a continuous variable to color, size,
and shape. How do these aesthetics behave differently for
categorical vs. continuous variables?
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = year))
Categorical variables get a distinct color or size for each level, whereas continuous variables are mapped to a gradient. Shapes cannot be ordered along a smooth scale, so a continuous variable cannot be mapped to shape.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = trans, shape = trans))
## Warning: The shape palette can deal with a maximum of 6 discrete values because
## more than 6 becomes difficult to discriminate; you have 10. Consider
## specifying shapes manually if you must have them.
## Warning: Removed 96 rows containing missing values (geom_point).
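Each level of that variable is assigned a value for every aesthetic it is mapped to, so the two aesthetics carry the same information. For example, auto(l4) is green and square. Because trans has 10 levels and the default shape palette has only 6, the points for the remaining levels are dropped, which is what the warnings report.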
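The warning above suggests specifying shapes manually. A possible sketch, assuming the shape codes 1 to 10 are acceptable choices (any ten valid shape codes would do), uses scale_shape_manual(). The names n_levels and p_shapes are only illustrative.

```r
library(ggplot2)

# trans has 10 levels, so supply one shape code per level explicitly
n_levels <- length(unique(mpg$trans))
p_shapes <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = trans, shape = trans)) +
  scale_shape_manual(values = seq_len(n_levels))
p_shapes
```

With one shape per level, no rows should be dropped for lack of a shape.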
What does the stroke aesthetic do? What shapes does it
work with? (Hint: use ?geom_point)
Stroke changes the thickness of the border around the shape being drawn. It applies to shapes that have a border, such as the filled shapes 21 to 25.
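A small sketch of stroke in use, modeled loosely on the kind of example found in ?geom_point. The specific size and stroke values are arbitrary choices for illustration, as is the name p_stroke.

```r
library(ggplot2)

# Shape 21 is a filled circle with a border; stroke sets the border width
p_stroke <- ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy),
    shape = 21, color = "black", fill = "white", size = 3, stroke = 2
  )
p_stroke
```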
What happens if you map an aesthetic to something other than a variable name, like aes(colour = displ < 5)? Note,
you’ll also need to specify x and y.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = displ < 5))
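If you map the aesthetic to a logical condition, ggplot2 evaluates the condition for every row and assigns one value of the aesthetic to TRUE and another to FALSE. Here, cars with displ < 5 get one color and all other cars get a second color.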
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ cty)
It makes a panel for every unique value of the continuous variable.
What do the empty cells in a plot with facet_grid(drv ~ cyl) mean? How do they relate to this
plot?
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)
ggplot(data = mpg) +
  geom_point(mapping = aes(x = drv, y = cyl))
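Because both are discrete variables, all of the observations for a given
combination fall on the same point. Each point on this graph represents
all of the points in one panel of the facet_grid(drv ~ cyl) plot. The empty cells in the facet grid are combinations of drv and cyl with no observations, which show up here as positions with no point.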
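To check which combinations actually occur, one option is to count them. This sketch uses dplyr::count(); the name combos is only illustrative.

```r
library(ggplot2)  # provides the mpg dataset
library(dplyr)

# One row per drv/cyl combination that occurs in the data;
# combinations that are absent here are the empty facet cells
combos <- count(mpg, drv, cyl)
combos
```

drv has 3 levels and cyl has 4 distinct values, so a full grid has 12 cells; any combination missing from combos corresponds to an empty panel.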
What does . do?
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ .)
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(. ~ cyl)
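It makes a facet grid with only one variable. The . is a placeholder that
represents nothing on that side of the formula: facet_grid(drv ~ .) puts the drv panels in rows, and facet_grid(. ~ cyl) puts the cyl panels in columns. It's like a facet wrap, but instead of separate plots, it's
one plot with separate sections.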
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))
It is much easier to see the distributions of the individual classes, but harder to see the overall trend across all classes. With a larger dataset, the overall chart becomes increasingly crowded. Faceting can help separate the data into more readable plots.
Read ?facet_wrap. What does nrow do? What
does ncol do? What other options control the layout of the
individual panels? Why doesn’t facet_grid() have
nrow and ncol arguments?
nrow and ncol set the number of rows and
columns the faceted plots will be displayed in.
facet_grid() doesn’t have these arguments because the rows
and columns are determined by the number of unique values of the
variables used to facet.
When using facet_grid() you should usually put the
variable with more unique levels in the columns. Why?
This makes the plot easier to read and interpret because graphs and computer screens are usually wider than they are tall. Putting the variable with more levels in the columns means each panel gets squished less.
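As an illustration of that layout (the choice of drv and class is an assumption made for the example, not something the exercise specifies), class has more distinct values than drv, so it goes on the column side of the formula. The name p_wide is only illustrative.

```r
library(ggplot2)

# class has more levels than drv, so class goes in the columns
p_wide <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ class)
p_wide
```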