library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats
Bill Cleveland called these graphics trellises and implemented them in the original S language in the package Trellis.
Trellis was essentially re-created in R under the name lattice by Deepayan Sarkar. See figures 2.8 to 2.11, on Titanic survivors, at http://lmdvr.r-forge.r-project.org/figures/figures.html.
Trellis and lattice focus strongly on this type of graphic, and both have declined in popularity since the arrival of ggplot2 and its grammar of graphics approach.
Tufte refers to this type of graphic as ‘small multiples.’
To produce separate graphs for each value of a single categorical variable, use facet_wrap() as a layer. The syntax requires a one-sided formula: a single variable preceded by a ~.
Example
d = ggplot(data=mpg, aes(x=cty,y=hwy)) + geom_point()
d + facet_wrap(~class)
The layout can be controlled with the arguments ncol or nrow.
d + facet_wrap(~class,nrow=2) + ggtitle("nrow = 2")
d + facet_wrap(~class,ncol=2) + ggtitle("ncol=2")
Create two graphics showing the relationship between displ and hwy, broken down by the categorical variable drv in the dataframe mpg. In one version, use a single-row arrangement. In the other, use a single column. Which do you prefer?
e = ggplot(data=mpg,aes(x=displ,y=hwy)) + geom_point()
e
e + facet_wrap(~drv,nrow=1) + ggtitle("nrow=1")
e + facet_wrap(~drv,ncol=1) + ggtitle("ncol=1")
To produce separate graphs for each combination of values of two categorical variables, use facet_grid() as a layer. The syntax requires a two-variable formula (two variables separated by a ~). The first variable designates the rows and the second designates the columns.
Example
f = ggplot(mpg,aes(x=cty,y=hwy)) + geom_point()
f
f + facet_grid(drv~class) + ggtitle("drv in rows")
f + facet_grid(class~drv) + ggtitle("drv in columns")
Review the variables in mpg.
glimpse(mpg)
## Observations: 234
## Variables: 11
## $ manufacturer <chr> "audi", "audi", "audi", "audi", "audi", "audi", "...
## $ model <chr> "a4", "a4", "a4", "a4", "a4", "a4", "a4", "a4 qua...
## $ displ <dbl> 1.8, 1.8, 2.0, 2.0, 2.8, 2.8, 3.1, 1.8, 1.8, 2.0,...
## $ year <int> 1999, 1999, 2008, 2008, 1999, 1999, 2008, 1999, 1...
## $ cyl <int> 4, 4, 4, 4, 6, 6, 6, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6...
## $ trans <chr> "auto(l5)", "manual(m5)", "manual(m6)", "auto(av)...
## $ drv <chr> "f", "f", "f", "f", "f", "f", "f", "4", "4", "4",...
## $ cty <int> 18, 21, 20, 21, 16, 18, 18, 18, 16, 20, 19, 15, 1...
## $ hwy <int> 29, 29, 31, 30, 26, 26, 27, 26, 25, 28, 27, 25, 2...
## $ fl <chr> "p", "p", "p", "p", "p", "p", "p", "p", "p", "p",...
## $ class <chr> "compact", "compact", "compact", "compact", "comp...
What are the quantitative variables of interest?
* displ
* cty
* hwy
In displaying relationships among these using a scatterplot we can put two on the axes and map the third to either color or size. If we use a smoother alone, we can only use two variables.
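As a minimal sketch of the idea above, the third quantitative variable can be mapped to color in one plot and to size in another. The object names p_color and p_size are illustrative and do not come from the original notes.

```r
library(ggplot2)

# Two variables on the axes, the third (cty) mapped to color
p_color <- ggplot(data = mpg, aes(x = displ, y = hwy, color = cty)) +
  geom_point()

# Same axes, with the third variable mapped to point size instead
p_size <- ggplot(data = mpg, aes(x = displ, y = hwy, size = cty)) +
  geom_point()

p_color
p_size
```

Because cty is numeric, ggplot2 draws a continuous color gradient in the first plot and a graded size legend in the second.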
What are the categorical variables of interest?
How can we extend the relationship among quantitative variables to include one or more categorical variables?
Map to:
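A minimal sketch of some options, assuming the targets of interest are the color aesthetic, the shape aesthetic, and facets. The names h, p_col, p_shape, and p_facet are hypothetical.

```r
library(ggplot2)

# Shared base plot of two quantitative variables
h <- ggplot(data = mpg, aes(x = displ, y = hwy))

# Categorical variable mapped to color
p_col <- h + geom_point(aes(color = drv))

# Categorical variable mapped to point shape
p_shape <- h + geom_point(aes(shape = drv))

# One categorical variable as color, a second as facets
p_facet <- h + geom_point(aes(color = drv)) + facet_wrap(~class)

p_col
p_shape
p_facet
```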
Using the mpg dataframe, try something we haven’t done yet.
g = ggplot(data=mpg,aes(x=displ,y=hwy,color=year)) + geom_point()
g
g + facet_grid(drv~trans)
g + facet_wrap(~manufacturer)
The primary use of the bar chart is to display the counts of values of one or more categorical variables.
Example of just one variable.
ggplot(data = mpg,aes(x= class)) + geom_bar()
Map a second categorical variable to fill.
ggplot(data = mpg,aes(x= class,fill=drv)) + geom_bar()
ggplot(data = mpg,aes(fill= class,x=drv)) + geom_bar()
The default value of the position parameter is “stack”. The alternatives are “dodge” and “fill.”
ggplot(data = mpg,aes(x= class,fill=drv)) + geom_bar(position = "dodge")
This is known as a side-by-side bar chart.
ggplot(data = mpg,aes(x= class,fill=drv)) + geom_bar(position = 'fill')
All bars are of height 1.0 and the counts are lost. What we have displayed are proportions.
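The proportions that the filled bars display can also be computed directly. The following is a sketch using dplyr; the name props is illustrative, and the calculation assumes that each bar is normalized within its class value.

```r
library(ggplot2)  # provides the mpg data
library(dplyr)

# Count each class/drv combination, then divide by the class total
props <- mpg %>%
  count(class, drv) %>%
  group_by(class) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

props
```

Within each class the prop values sum to 1, which is why every bar reaches the same height.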
ggplot(data = mpg,aes(x= class)) +
  geom_bar() +
  facet_wrap(~drv)
Fix the labels problem with ncol = 1.
ggplot(data = mpg,aes(x= class)) +
  geom_bar() +
  facet_wrap(~drv,ncol=1)
Or maybe use coord_flip().
ggplot(data = mpg,aes(x= class)) +
  geom_bar() + coord_flip() +
  facet_wrap(~drv)
This looks dull, so add some color with fill.
ggplot(data = mpg,aes(x= class,fill=drv)) +
  geom_bar() + coord_flip() +
  facet_wrap(~drv)
Recall that what we have been doing is visualizing a contingency table, which we can examine numerically with CrossTable from the gmodels package.
library(gmodels)
CrossTable(mpg$class,mpg$drv)
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 234
##
##
## | mpg$drv
## mpg$class | 4 | f | r | Row Total |
## -------------|-----------|-----------|-----------|-----------|
## 2seater | 0 | 0 | 5 | 5 |
## | 2.201 | 2.265 | 37.334 | |
## | 0.000 | 0.000 | 1.000 | 0.021 |
## | 0.000 | 0.000 | 0.200 | |
## | 0.000 | 0.000 | 0.021 | |
## -------------|-----------|-----------|-----------|-----------|
## compact | 12 | 35 | 0 | 47 |
## | 3.649 | 8.828 | 5.021 | |
## | 0.255 | 0.745 | 0.000 | 0.201 |
## | 0.117 | 0.330 | 0.000 | |
## | 0.051 | 0.150 | 0.000 | |
## -------------|-----------|-----------|-----------|-----------|
## midsize | 3 | 38 | 0 | 41 |
## | 12.546 | 20.321 | 4.380 | |
## | 0.073 | 0.927 | 0.000 | 0.175 |
## | 0.029 | 0.358 | 0.000 | |
## | 0.013 | 0.162 | 0.000 | |
## -------------|-----------|-----------|-----------|-----------|
## minivan | 0 | 11 | 0 | 11 |
## | 4.842 | 7.266 | 1.175 | |
## | 0.000 | 1.000 | 0.000 | 0.047 |
## | 0.000 | 0.104 | 0.000 | |
## | 0.000 | 0.047 | 0.000 | |
## -------------|-----------|-----------|-----------|-----------|
## pickup | 33 | 0 | 0 | 33 |
## | 23.497 | 14.949 | 3.526 | |
## | 1.000 | 0.000 | 0.000 | 0.141 |
## | 0.320 | 0.000 | 0.000 | |
## | 0.141 | 0.000 | 0.000 | |
## -------------|-----------|-----------|-----------|-----------|
## subcompact | 4 | 22 | 9 | 35 |
## | 8.445 | 2.382 | 7.401 | |
## | 0.114 | 0.629 | 0.257 | 0.150 |
## | 0.039 | 0.208 | 0.360 | |
## | 0.017 | 0.094 | 0.038 | |
## -------------|-----------|-----------|-----------|-----------|
## suv | 51 | 0 | 11 | 62 |
## | 20.598 | 28.085 | 2.891 | |
## | 0.823 | 0.000 | 0.177 | 0.265 |
## | 0.495 | 0.000 | 0.440 | |
## | 0.218 | 0.000 | 0.047 | |
## -------------|-----------|-----------|-----------|-----------|
## Column Total | 103 | 106 | 25 | 234 |
## | 0.440 | 0.453 | 0.107 | |
## -------------|-----------|-----------|-----------|-----------|
##
##
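If gmodels is not available, the same counts can be obtained with base R. This is a minimal sketch; the name tab is illustrative, and the output omits the chi-square contributions and proportions that CrossTable reports.

```r
library(ggplot2)  # provides the mpg data

# Cross-tabulate class (rows) by drv (columns)
tab <- table(class = mpg$class, drv = mpg$drv)

# Append row and column totals to the counts
addmargins(tab)
```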
Create a visualization of the relationship between drv and cyl in the mpg dataframe.
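One possible approach, offered as a sketch rather than the intended answer: treat cyl as categorical with factor() and reuse the bar chart techniques above. The name p_drv_cyl is hypothetical.

```r
library(ggplot2)

# cyl is stored as an integer, so convert it to a factor for a discrete fill
p_drv_cyl <- ggplot(data = mpg, aes(x = drv, fill = factor(cyl))) +
  geom_bar(position = "dodge") +
  labs(fill = "cyl")

p_drv_cyl
```

Swapping position = "dodge" for "fill" would show the share of each cylinder count within each drive type instead of the raw counts.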
Three primary ways to visualize one quantitative variable:
* Histogram
* Boxplot
* Density
Use the diamonds dataset and look at price.
ggplot(data = diamonds,aes(x=price)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Try a different binwidth or two.
ggplot(data = diamonds,aes(x=price)) + geom_histogram(binwidth=1000)
ggplot(data = diamonds,aes(x=price)) + geom_histogram(binwidth=2000)
Try a logarithmic scale. The binwidth is then measured in log10 units.
ggplot(data = diamonds,aes(x=price)) +
  geom_histogram(binwidth=.1) +
  scale_x_log10()
In the ggplot2 version used here, the boxplot has a peculiarity: it requires both an x and a y aesthetic. More recent releases accept y alone.
# This fails in the ggplot2 version used here.
# ggplot(data = diamonds,aes(y=price)) + geom_boxplot()
Fix this by using any constant for x.
ggplot(data = diamonds,aes(y=price,x='Whatever')) + geom_boxplot()
Use coord_flip() to make it horizontal and align visually with the other visualizations of one quantitative variable.
ggplot(data = diamonds,aes(y=price,x="Whatever")) +
  geom_boxplot() +
  coord_flip()
A density plot is essentially a smoothed histogram.
ggplot(data = diamonds,aes(x=price)) + geom_density()
The adjust parameter makes the plot more or less detailed.
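The level of detail in a density plot depends on the bandwidth of the kernel estimate, and adjust is a multiplier on the default bandwidth. As a rough check, assuming geom_density() follows the same convention as base R's density() function, the bandwidths can be compared directly. The names d_default, d_smooth, and d_rough are illustrative.

```r
library(ggplot2)  # provides the diamonds data

d_default <- density(diamonds$price)
d_smooth  <- density(diamonds$price, adjust = 5)
d_rough   <- density(diamonds$price, adjust = 1/5)

# The bandwidth scales in proportion to adjust
c(default = d_default$bw, smooth = d_smooth$bw, rough = d_rough$bw)
```

A larger bandwidth averages over more neighboring prices and gives a smoother curve; a smaller one follows the data more closely.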
ggplot(data = diamonds,aes(x=price)) + geom_density(adjust = 5)
ggplot(data = diamonds,aes(x=price)) + geom_density(adjust = 1/5)
Create visualizations of the variable carat in the diamonds dataframe.
ggplot(diamonds,aes(x=carat)) + geom_histogram(binwidth=.1) + facet_wrap(~cut,ncol=1)
ggplot(diamonds,aes(x=carat,fill=cut)) + geom_density(adjust=2) + facet_wrap(~cut,ncol=1)
ggplot(diamonds,aes(y=carat,x=cut)) + geom_boxplot() + coord_flip()