1+2
## [1] 3
library(tidyverse)
## Warning: package 'ggplot2' was built under R version 4.5.2
## Warning: package 'readr' was built under R version 4.5.2
mpg
## # A tibble: 234 × 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto… f 18 29 p comp…
## 2 audi a4 1.8 1999 4 manu… f 21 29 p comp…
## 3 audi a4 2 2008 4 manu… f 20 31 p comp…
## 4 audi a4 2 2008 4 auto… f 21 30 p comp…
## 5 audi a4 2.8 1999 6 auto… f 16 26 p comp…
## 6 audi a4 2.8 1999 6 manu… f 18 26 p comp…
## 7 audi a4 3.1 2008 6 auto… f 18 27 p comp…
## 8 audi a4 quattro 1.8 1999 4 manu… 4 18 26 p comp…
## 9 audi a4 quattro 1.8 1999 4 auto… 4 16 25 p comp…
## 10 audi a4 quattro 2 2008 4 manu… 4 20 28 p comp…
## # ℹ 224 more rows
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = class))
How to get help
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~class, nrow = 2)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(drv ~ cyl)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~year, nrow = 2)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~displ, nrow = 2)
It will display the data points in many facets, treating the continuous variable like a categorical. Thus, it will split them into the range (the continuous variable) we chose and will create a separate “box” (facet) for each one. It is noted that by doing so, you risk having too many facets, like in the case above: displ.
ggplot(data = mpg) +
geom_point(mapping = aes(x = drv, y = cyl))
An empty cell in plot with facet_grid(drv ~ cyl) means that it is not an option that exist in the data set. This relate to this plot as this facet shows all the combinations that already exist, which helps identify combinations that do not exist and/or are missing.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(drv ~ .)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(. ~ cyl)
When the . is on the right, it creates an horizontal display, with facets for each value of drv. When the . is on the left, it creates an vertical display, with facets for each value of cyl.
Basically, the . indicates “no facet in this dimension”, as such, when on the right only row facets can be created, and when on the left, only column facets can be created.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~ class, nrow = 2)
What are the advantages to using faceting instead of the colour aesthetic? What are the disadvantages? How might the balance change if you had a larger dataset?
By using faceting instead of the colour aesthetic, it gives a clearer view of each distinct group. It is easier to compare their patterns when using facets since there is less clutter and overlapping, making it easier to read each points.
However, the disadvantages of faceting are that it is more difficult to compare the values from different groups as they are in different facets. It is less obvious than when using one panel and colours. Also, depending on the number of facets, it can be more difficult to make them fit in a document or a page when there are a lot of them.
If I had a larger dataset, facets would become more valuable since they is a higher chance of overlapping points and clutter. Also, the more groups, the more colours, which would make it even more difficult to read Thus, facets would allow a clearer view of the data for each group.
nrow: Controls the number of rows
ncol: Controls the number of columns
Other options that control the layout of the individual panels are: scales, dir, and strip.position.
facet_grid() doesn’t have nrow and ncol arguments because it is defined by rows and columns, thus we don’t have to manually set them.
Having more levels in the columns will work better with screen, since most screens have a bigger width then length, making it a better use of space to have data appear horizontally.
different visual object to represent data
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy))
not every aesthetic works with every geom
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy, linetype = drv))
two geoms in the same graph!
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color = class)) +
geom_smooth()
local vs. global mappings This makes it possible to display different aesthetics in different layers.
specify different data for each layer
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy))
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy, group = drv))
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy, color = drv), show.legend = FALSE)
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut))
demo = tribble(
~cut, ~freq,
"Fair", 1610,
"Good", 4906,
"Very Good", 12082,
"Premium", 13791,
"Ideal", 21551)
ggplot(data = demo) +
geom_bar(mapping = aes(x = cut, y = freq), stat = "identity")
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, y = stat(prop), group = 1))
## Warning: `stat(prop)` was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(prop)` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
#> Warning: `stat(prop)` was deprecated in ggplot2 3.4.0.
#> ℹ Please use `after_stat(prop)` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
ggplot(data = diamonds) +
stat_summary(mapping = aes(x = cut, y = depth), fun.min = min, fun.max = max, fun = median)
adjustments for bar charts
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")
Others:
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, colour = cut))
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = cut))
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity))
ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) +
geom_bar(alpha = 1/5, position = "identity")
ggplot(data = diamonds, mapping = aes(x = cut, colour = clarity)) +
geom_bar(fill = NA, position = "identity")
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill")
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")
adjustments for scatterplots
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), position = "jitter")
switch x and y
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
geom_boxplot()
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
geom_boxplot() +
coord_flip()
set the aspect ratio correctly for maps
Polar coordinates reveal an interesting connection between a bar chart and a Coxcomb chart.
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = cut)) +
coord_polar()
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = cut), show.legend = FALSE, width = 1) +
theme(aspect.ratio = 1) +
labs(x = NULL, y = NULL)
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = cut), show.legend = FALSE, width = 1) +
theme(aspect.ratio = 1) +
labs(x = NULL, y = NULL) + coord_flip()
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = cut), show.legend = FALSE, width = 1) +
theme(aspect.ratio = 1) +
labs(x = NULL, y = NULL) + coord_polar()
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = cut))
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = cut)) +
coord_polar()
The grammar of graphics is based on the insight that you can uniquely describe any plot as a combination of: