library(tidyverse)
Loading tidyverse: ggplot2
Loading tidyverse: tibble
Loading tidyverse: tidyr
Loading tidyverse: readr
Loading tidyverse: purrr
Loading tidyverse: dplyr
Conflicts with tidy packages ------------------------------------------------------------------------------------
filter(): dplyr, stats
lag(): dplyr, stats
Nothing plotted, but a canvas for a plot is shown.
dim(mtcars)
[1] 32 11
?mpg
# drv
# f = front-wheel drive, r = rear wheel drive, 4 = 4wd
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy))
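Both class and drv are categorical, so the points can only fall on the grid of category combinations, where they overlap. The plot therefore shows which combinations of class and drv occur, but not how many cars each combination contains.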
ggplot(data = mpg) + geom_point(mapping = aes(x = class, y = drv))
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
The points are not blue because color = "blue" is placed inside aes(). ggplot2 therefore treats "blue" as a data value, not as a color: it creates a constant variable with the single level "blue" and assigns it the first color of the default palette. To make the points blue, set the color manually outside aes(), as an argument of geom_point().
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
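One way to check this is to look at the colors ggplot2 actually computes for the points. The sketch below uses layer_data() from ggplot2; the names p_mapped and p_set are only illustrative.

```r
library(ggplot2)

# "blue" inside aes(): treated as a constant data value and passed through the color scale
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

# "blue" outside aes(): set directly as the point color
p_set <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# Colors computed for the points in each plot
unique(layer_data(p_mapped)$colour)  # a single palette color, not "blue"
unique(layer_data(p_set)$colour)     # the literal color "blue"
```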
When mpg is printed, the type is shown directly under each column name: <chr> (character) columns are likely to be categorical, whereas <dbl> (double) and <int> (integer) columns are likely to be continuous.
head(mpg, 1)
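The same split can be made programmatically. This is only a rough sketch based on storage type; the variable names are illustrative, and an integer column with few distinct values (such as cyl) may still be better treated as categorical.

```r
library(ggplot2)  # provides the mpg dataset

# Storage type of every column
col_types <- sapply(mpg, class)
col_types

# Character columns: likely categorical
categorical <- names(mpg)[sapply(mpg, is.character)]

# Numeric columns (double or integer): likely continuous
continuous <- names(mpg)[sapply(mpg, is.numeric)]

categorical
continuous
```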
For a continuous variable the legend shows a continuous scale (a color gradient or graduated point sizes); for a categorical variable it lists the category names.
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = cyl, size = hwy, shape = drv))
Both aesthetics are simply applied at the same time, so the same variable is encoded twice.
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = cyl, size = cyl))
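The stroke aesthetic adjusts the thickness of a point's outline, which is most visible for shapes that have a separate border and fill (shapes 21 to 25).
The aesthetic is mapped to the result of the expression: displ < 5 is evaluated to TRUE or FALSE for each row, and the points are colored by that logical value.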
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, colour = displ < 5))
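To make the evaluation step visible, the expression can be run on its own and the computed colors inspected. A minimal sketch; p_logical is an illustrative name.

```r
library(ggplot2)

# The expression is evaluated per row and yields a logical vector
head(mpg$displ < 5)

p_logical <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, colour = displ < 5))

# Two distinct colors are computed: one for TRUE and one for FALSE
unique(layer_data(p_logical)$colour)
```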
Faceting on a continuous variable creates one facet for each distinct value, e.g. one panel per value of displ.
ggplot(data = mpg) +
geom_point(mapping = aes(x = cyl, y = hwy)) +
facet_wrap(~ displ)
A facet is empty when there is no data for the corresponding combination, e.g. rear-wheel drive (r) does not occur with 4 or 5 cylinders. There are no facets for 7 cylinders at all, because that value does not appear in the data.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(drv ~ cyl)
ggplot(data = mpg) +
geom_point(mapping = aes(x = drv, y = cyl))
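The empty facets can also be read off a cross-tabulation of the two variables. A small sketch using base R's table(); combos is an illustrative name.

```r
library(ggplot2)  # provides the mpg dataset

# Cross-tabulate drv against cyl; a zero marks a combination with no cars,
# which corresponds to an empty panel in facet_grid(drv ~ cyl)
combos <- table(drv = mpg$drv, cyl = mpg$cyl)
combos

# cyl values present in the data (there is no 7, so no facet column for it)
sort(unique(mpg$cyl))
```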
In facet_grid(), the dot is a placeholder for "no variable". The notation "attribute ~ ." facets by rows only: there is no column variable, so one panel per attribute value is stacked vertically and the y-axis is repeated for each row. With ". ~ attribute" the row variable is missing, so the panels are placed side by side in columns and the x-axis is repeated.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(drv ~ .)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(. ~ cyl)
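For comparison, ggplot2 3.0.0 and later also accept the rows and cols arguments with vars(), which state the direction explicitly. The sketch below assumes such a version is installed; base, p_rows and p_cols are illustrative names.

```r
library(ggplot2)

base <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Same layout as facet_grid(drv ~ .): one row of panels per drv value
p_rows <- base + facet_grid(rows = vars(drv))

# Same layout as facet_grid(. ~ cyl): one column of panels per cyl value
p_cols <- base + facet_grid(cols = vars(cyl))

p_rows
p_cols
```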
With faceting it is easier to examine the individual classes. With coloring it is easier to see how the classes are clustered overall. With larger datasets it’s more likely that you want to see the overall clustering instead of the individual point clouds.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = class))
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~ class, nrow = 2)
facet_grid() does not have these arguments, because its number of rows and columns is determined by the levels of the faceting variables.
?facet_wrap
#nrow, ncol: Number of rows and columns.
#scales: should Scales be fixed ("fixed", the default), free ("free"), or free in one dimension ("free_x", "free_y").
#shrink: If TRUE, will shrink scales to fit output of statistics, not raw data. If FALSE, will be range of raw data before statistical summary.
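As a short illustration of these arguments, the sketch below combines nrow with scales = "free_y"; p_free is an illustrative name and the choice of class as the faceting variable simply follows the earlier example.

```r
library(ggplot2)

# nrow fixes the number of panel rows;
# scales = "free_y" gives each panel its own y-axis range
p_free <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2, scales = "free_y")

p_free
```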
When the variable with more levels is placed on the rows, the y-axis of each panel shrinks, so it becomes harder to read the actual values of the points, as the plot shows.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(class ~ drv)
ggplot(data = mpg) +
geom_line(mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(x = displ, y = hwy))
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
geom_point() +
geom_smooth(se = FALSE)
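I did not expect multiple lines. They appear because color = drv in the global mapping also groups the data, so geom_smooth() fits one line per drv value.
Actually, it was never used before this point; it only appears later, in section 3.9 on coordinate systems.
It shows the confidence interval around the line (the grey area).
No, the two plots look the same. In the first, both layers inherit the data and mapping from ggplot(); in the second, the same data and mapping are repeated in each layer.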
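The grouping can be confirmed by counting the groups in the data computed for the smooth layer. A minimal sketch using layer_data(); p_grouped and smooth_data are illustrative names.

```r
library(ggplot2)

p_grouped <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth(se = FALSE)

# Layer 2 is geom_smooth(); its computed data holds one group per fitted line
smooth_data <- layer_data(p_grouped, 2)
length(unique(smooth_data$group))  # number of separate smooth lines
```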
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth()
ggplot() +
geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))
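A quick way to verify that both versions draw the same smooth line is to compare the fitted values of the second layer. This is only a sketch; p_inherit, p_explicit and same_fit are illustrative names.

```r
library(ggplot2)

# Data and mapping given once and inherited by both layers
p_inherit <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth()

# Data and mapping repeated in every layer
p_explicit <- ggplot() +
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))

# Compare the fitted values of the smooth layer (layer 2) in both plots
same_fit <- isTRUE(all.equal(layer_data(p_inherit, 2)$y,
                             layer_data(p_explicit, 2)$y))
same_fit
```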
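Note: loading these packages seems to remove the grey plot background (cowplot appears to change the default theme), so the theme is reset with theme_set(theme_gray()) before the plots are drawn.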
#install.packages("gridExtra")
#install.packages("cowplot")
library(cowplot)
Attaching package: ‘cowplot’
The following object is masked from ‘package:ggplot2’:
ggsave
p1 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth(se = FALSE)
p2 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth(mapping = aes(group = drv), se = FALSE)
p3 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color=drv)) +
geom_point() +
geom_smooth(se = FALSE)
p4 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color=drv)) +
geom_smooth(se = FALSE)
p5 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color=drv)) +
geom_smooth(se = FALSE, mapping = aes(linetype = drv))
p6 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color=drv)) +
geom_point(shape = 21, color = "white", stroke = 1)
theme_set(theme_gray())
plot_grid(p1, p2, p3, p4, p5, p6, labels=c("1","2","3", "4","5","6"), ncol=2, nrow = 3)
`geom_smooth()` using method = 'loess'
`geom_smooth()` using method = 'loess'
`geom_smooth()` using method = 'loess'
`geom_smooth()` using method = 'loess'
`geom_smooth()` using method = 'loess'
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color=drv)) +
geom_point(shape = 21, color = "white", stroke = 2)
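An alternative way to get a white outline, shown here only as a sketch and not as the approach used above, is to draw larger white points first and the colored points on top. The size of 4 is an assumed value chosen for illustration; p_halo is an illustrative name.

```r
library(ggplot2)

p_halo <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(size = 4, color = "white") +   # larger white points underneath
  geom_point(mapping = aes(color = drv))    # colored points drawn on top

p_halo
```

Because the white layer is drawn first, the colored points stay fully visible and the white ring acts as a halo around them.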