A Training Workbook for ggplot2
Author: Naki Cam, myBI, DPDHL IT Services GmbH
Reference: Hadley Wickham, R for Data Science
Notice: feel free to share!
Prerequisites:
Download and install RStudio.
Download and install Microsoft R Open or another R distribution.
Install the packages using the following code: install.packages("tidyverse")
# loading tidyverse framework
library(tidyverse)
Loading tidyverse: ggplot2
Loading tidyverse: tibble
Loading tidyverse: tidyr
Loading tidyverse: readr
Loading tidyverse: purrr
Loading tidyverse: dplyr
Conflicts with tidy packages -----------------------------------------------------------------------------
filter(): dplyr, stats
lag(): dplyr, stats
head(mpg)
# creating ggplot chart
ggplot(data = mpg, mapping = aes(displ, hwy)) + geom_point()

# exercise
# will still be empty
ggplot(data = mpg)

# how many rows and columns
dim(mpg)
[1] 234 11
# asking for help on the mpg dataset
?mpg
# sample
ggplot(data = mpg) + geom_point(aes(hwy, cyl))


?geom_point
ggplot(data = mpg) + geom_point(aes(x = displ, y = hwy, color = class), stroke = 3)

ggplot(data = mpg) + geom_point(aes(displ, hwy, color = class, stroke = displ > 5))

ggplot(data = mpg) + geom_point(aes(displ, hwy, color = hwy > 30 & displ > 2)) + facet_wrap(~class)

ggplot(data = mpg) + geom_point(aes(displ, hwy, color = hwy > 30 & displ > 2)) + facet_wrap(~class, nrow = 2)

head(mpg)
ggplot(data = mpg) + geom_point(aes(displ, hwy), color = "blue") + facet_wrap(drv~cyl, nrow = 2)


ggplot(data = mpg) +
geom_point(aes(displ, hwy, color = class)) +
facet_grid(.~cyl)

# exercise
# 1 faceting on a continuous variable -> not a good idea
ggplot(data = mpg) +
geom_point(aes(displ, hwy, color = class)) +
facet_grid(.~hwy)

# 2
ggplot(data = mpg) +
geom_point(aes(drv, cyl)) +
facet_grid(drv~cyl)

# 3
ggplot(data = mpg) +
geom_point(aes(displ, hwy)) +
facet_grid(drv~.)

ggplot(data = mpg) +
geom_point(aes(displ, hwy)) +
facet_grid(.~drv)

ggplot(data = mpg) +
geom_point(aes(displ, hwy)) +
facet_grid(.~cyl)

ggplot(data = mpg) +
geom_point(aes(displ, hwy)) +
facet_grid(cyl~.)


?facet_wrap
# nrow
# ncol
# which other parameters are there?
Facet Wrap
facet_wrap wraps a 1d sequence of panels into 2d. This is generally a better use of screen space than facet_grid because most displays are roughly rectangular.
Template
facet_wrap(facets, nrow = NULL, ncol = NULL, scales = "fixed", shrink = TRUE, labeller = "label_value", as.table = TRUE, switch = NULL, drop = TRUE, dir = "h", strip.position = "top")
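A small sketch of the ncol and scales arguments from the template above, using the mpg data from this workbook. The values ncol = 4 and scales = "free_y" are only chosen for illustration.

```r
library(ggplot2)

# wrap the 7 car classes into a layout with 4 columns (so 2 rows)
p_wrap <- ggplot(data = mpg) +
  geom_point(aes(displ, hwy)) +
  facet_wrap(~class, ncol = 4)

# same panels, but every panel gets its own y axis range
p_free <- ggplot(data = mpg) +
  geom_point(aes(displ, hwy)) +
  facet_wrap(~class, ncol = 4, scales = "free_y")

p_wrap
p_free
```

With fixed scales the panels are directly comparable; free scales make patterns inside each panel easier to see but harder to compare across panels.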
?facet_grid
# why no nrow and ncol parameters?
Facet Grid
facet_grid forms a matrix of panels defined by row and column facetting variables. It is most useful when you have two discrete variables, and all combinations of the variables exist in the data.
Template
facet_grid(facets, margins = FALSE, scales = "fixed", space = "fixed", shrink = TRUE, labeller = "label_value", as.table = TRUE, switch = NULL, drop = TRUE)
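One way to look at the question of why facet_grid has no nrow and ncol parameters: the shape of the grid follows from the number of distinct values of the row and column variables. A minimal sketch with the mpg data (the variable names n_rows and n_cols are only for illustration):

```r
library(ggplot2)

# the grid dimensions come from the data, not from arguments
n_rows <- length(unique(mpg$drv))  # distinct values of the row variable
n_cols <- length(unique(mpg$cyl))  # distinct values of the column variable
c(rows = n_rows, cols = n_cols)

# drv defines the rows, cyl the columns;
# combinations that do not occur in the data stay as empty panels
p_grid <- ggplot(data = mpg) +
  geom_point(aes(displ, hwy)) +
  facet_grid(drv ~ cyl)
p_grid
```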
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))

ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy))

# different lines in geom_smooth by mapping the "linetype" aesthetic to a variable
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy, linetype=drv))

# fixed linetype values: 0 = blank, 1 = solid, 2 = dashed, 3 = dotted, 4 = dotdash, 5 = longdash, 6 = twodash
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy), linetype=2)

Description
Aids the eye in seeing patterns in the presence of overplotting. geom_smooth and stat_smooth are effectively aliases: they both use the same arguments. Use geom_smooth unless you want to display the results with a non-standard geom.
Usage
geom_smooth(mapping = NULL, data = NULL, stat = "smooth", position = "identity", ..., method = "auto", formula = y ~ x, se = TRUE, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)
stat_smooth(mapping = NULL, data = NULL, geom = "smooth", position = "identity", ..., method = "auto", formula = y ~ x, se = TRUE, n = 80, span = 0.75, fullrange = FALSE, level = 0.95, method.args = list(), na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)
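A short sketch of the method, se and span arguments listed in the usage above. The span value of 0.3 is an arbitrary choice for illustration; loess may print warnings on this data because many cars share the same displ value.

```r
library(ggplot2)

base <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + geom_point()

# straight-line fit without the confidence band
p_lm <- base + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# loess fit; a smaller span follows the points more closely
p_loess <- base + geom_smooth(method = "loess", formula = y ~ x, span = 0.3)

p_lm
p_loess
```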
# using stat_smooth
ggplot(data = mpg) + stat_smooth(aes(displ, hwy, linetype=drv),
show.legend = FALSE)

# using multiple geom functions in one chart
ggplot(data = mpg) +
geom_point(aes(displ, hwy)) +
geom_smooth(aes(displ, hwy))

# same but global and less code
ggplot(data = mpg, mapping = aes(x = displ, y= hwy)) + geom_point() + geom_smooth()

# another example but added parameters
ggplot(data = mpg, mapping = aes(x = displ, y= hwy)) +
## changed colors
geom_point( mapping = aes(color=class)) +
## smooth line fitted only to a subset of the data!
geom_smooth(data = filter(mpg, class == "subcompact"),se = FALSE)

#exercise
# brainstorming
g <- ggplot(data = mpg, mapping = aes(x = displ, y= hwy, color = drv))
g + geom_point() + geom_smooth(se=FALSE)

# Comparison between programming logic -> almost the same output chart
ggplot(data = mpg, mapping = aes(x = displ, y= hwy, color = drv)) + geom_point() + geom_smooth()

ggplot(data = mpg, mapping = aes(x = displ, y= hwy, color = drv)) +
geom_point( data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))

# exercise: write the code for the plots shown
# without splitting into legends
ggplot(data = mpg, mapping = aes(x = displ, y= hwy)) + geom_point() + geom_smooth(se=FALSE)

# with legends
ggplot(data = mpg, mapping = aes(x = displ, y= hwy)) + geom_point() + geom_smooth(data = filter(mpg, drv == "4"), se=FALSE) +
geom_smooth(data = filter(mpg, drv == "f"),se=FALSE) +
geom_smooth(data = filter(mpg, drv == "r"),se=FALSE)

Bar charts
Description:
There are two types of bar charts: geom_bar makes the height of the bar proportional to the number of cases in each group (or if the weight aesthetic is supplied, the sum of the weights). If you want the heights of the bars to represent values in the data, use geom_col instead. geom_bar uses stat_count by default: it counts the number of cases at each x position. geom_col uses stat_identity: it leaves the data as is.
Usage:
geom_bar(mapping = NULL, data = NULL, stat = "count", position = "stack", ..., width = NULL, binwidth = NULL, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)
geom_col(mapping = NULL, data = NULL, position = "stack", ..., width = NULL, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)
stat_count(mapping = NULL, data = NULL, geom = "bar", position = "stack", ..., width = NULL, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)
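The difference between counting (geom_bar) and plotting values as they are (geom_col) can be sketched as follows. The counts are computed by hand with dplyr::count, which is loaded with the tidyverse; the name cut_counts is only for illustration.

```r
library(ggplot2)
library(dplyr)

# geom_bar counts the rows per cut itself (stat_count)
p_bar <- ggplot(data = diamonds) + geom_bar(aes(x = cut))

# geom_col leaves the data as is, so the counting has to happen first
cut_counts <- count(diamonds, cut)  # columns: cut, n
p_col <- ggplot(data = cut_counts) + geom_col(aes(x = cut, y = n))

p_bar
p_col
```

Both plots should show the same bar heights; only the place where the counting happens differs.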
# using geom_bar function
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut))

# changing related computed variables
ggplot(data = diamonds) +
geom_bar(aes(cut, ..prop.., group = 1))

# using stat_summary for descriptive statistics
ggplot(data = diamonds) + stat_summary(aes(cut, depth), fun.ymin = min, fun.ymax = max, fun.y = median)

# list of other transformations available in ggplot2
# f.e.
?stat_bin
Summarise y values at unique/binned x
Description:
stat_summary operates on unique x; stat_summary_bin operates on binned x. They are more flexible versions of stat_bin: instead of just counting, they can compute any aggregate.
Usage:
stat_summary_bin(mapping = NULL, data = NULL, geom = "pointrange", position = "identity", ..., fun.data = NULL, fun.y = NULL, fun.ymax = NULL, fun.ymin = NULL, fun.args = list(), na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)
stat_summary(mapping = NULL, data = NULL, geom = "pointrange", position = "identity", ..., fun.data = NULL, fun.y = NULL, fun.ymax = NULL, fun.ymin = NULL, fun.args = list(), na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)
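To see what stat_summary computes, the same aggregate can be calculated by hand and handed to geom_pointrange directly. This is only a sketch; the column names lower, middle and upper are made up for illustration. It also avoids the fun.y / fun.ymin / fun.ymax arguments, which newer ggplot2 versions (3.3.0 and later, as far as I know) report as deprecated in favour of fun / fun.min / fun.max.

```r
library(ggplot2)
library(dplyr)

# one row per cut with the minimum, median and maximum depth
depth_summary <- diamonds %>%
  group_by(cut) %>%
  summarise(lower = min(depth), middle = median(depth), upper = max(depth))

# point at the median, line from minimum to maximum
p_summary <- ggplot(data = depth_summary) +
  geom_pointrange(aes(x = cut, y = middle, ymin = lower, ymax = upper))
p_summary
```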
# exercise
# using geom_pointrange for the same plot as before
ggplot(data = diamonds) +
geom_pointrange(mapping = aes(x = cut, y = depth),
stat = "summary",
fun.ymin = min,
fun.ymax = max,
fun.y = median)

# using geom_col vs geom_bar
ggplot(data = diamonds) + geom_col(aes(cut, depth)) + scale_y_continuous(labels = scales::comma)

ggplot(data = diamonds) + geom_bar(aes(cut))

?stat_smooth
# what is wrong with these plots?
ggplot(diamonds) + geom_bar(aes(cut,y = ..prop..))

ggplot(diamonds) + geom_bar(aes(cut, fill = color,y = ..prop..))

ggplot(diamonds) + geom_bar(aes(cut,y = ..prop.., group = TRUE))

ggplot(diamonds) + geom_bar(aes(cut, fill = color,y = ..prop..,group = 1))

ggplot(diamonds) + geom_bar(aes(cut, fill = color))
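A hint for the question above: layer_data() returns the values that stat_count computed, so the proportions can be inspected directly. The sketch uses after_stat(prop), the newer spelling of ..prop.., and therefore assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# without a group, every cut is its own group, so each proportion is 1
p_no_group <- ggplot(diamonds) + geom_bar(aes(cut, y = after_stat(prop)))
layer_data(p_no_group)$prop

# with group = 1 all rows form one group, so the proportions sum to 1
p_grouped <- ggplot(diamonds) +
  geom_bar(aes(cut, y = after_stat(prop), group = 1))
layer_data(p_grouped)$prop
```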

4 Position Functions
# position adjustments on geom_bar
# overlapping bars: position = "identity" draws every bar from zero, so the bars cover each other
ggplot(diamonds) + geom_bar(aes(cut, fill = clarity),position = "identity")

# clustered bar chart
ggplot(diamonds) + geom_bar(aes(cut, fill = clarity),position = "dodge")

# 100% stacked bar chart
ggplot(diamonds) + geom_bar(aes(cut, fill = clarity),position = "fill")

# Jitter Chart with "position = jitter" & geom_point
ggplot(diamonds) + geom_point(aes(cut,depth, color = clarity),position = "jitter")

# Or the short version: geom_jitter() is geom_point() with position = "jitter"
ggplot(diamonds) + geom_jitter(aes(cut,depth, color = clarity))
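The string "jitter" uses default amounts of random noise in both directions. A sketch of how the amount could be controlled with position_jitter(); the width of 0.3 and the seed 42 are arbitrary, and the seed argument assumes a reasonably recent ggplot2 version.

```r
library(ggplot2)

# jitter only horizontally and fix the seed so the plot is reproducible
p_jit <- ggplot(diamonds) +
  geom_point(aes(cut, depth, color = clarity),
             position = position_jitter(width = 0.3, height = 0, seed = 42))
p_jit
```

Setting height = 0 keeps the depth values exactly as measured, which matters when the y axis carries the actual information.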


#exercise
# 1
ggplot(mpg, mapping = aes(cty, hwy)) + geom_point(aes(color = class)) + geom_smooth(method = lm, se=F)

ggplot(mpg, mapping = aes(cty, hwy)) + geom_jitter(aes(cty,hwy, color = class), alpha = 0.6, size = 4) + geom_smooth(method = lm, se=F)

# 3
# standard geom_jitter chart
ggplot(mpg, mapping = aes(cty, hwy)) + geom_jitter(aes(color = class))

# bubble chart with geom_count
ggplot(mpg, mapping = aes(cty, hwy)) + geom_count(aes(color = class))

# 4
# default position = "dodge"
ggplot(mpg, aes(cty, hwy)) + geom_boxplot(aes(drv, color = drv))


Cartesian coordinates with x and y flipped
Description:
Flip cartesian coordinates so that horizontal becomes vertical, and vertical, horizontal. This is primarily useful for converting geoms and statistics which display y conditional on x, to x conditional on y.
Usage:
coord_flip(xlim = NULL, ylim = NULL, expand = TRUE)
Arguments:
xlim, ylim: Limits for the x and y axes.
expand: If TRUE, the default, adds a small expansion factor to the limits to ensure that data and axes don’t overlap. If FALSE, limits are taken exactly from the data or xlim/ylim.
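A minimal sketch of coord_flip with the mpg data: a boxplot of hwy per class, once in the default orientation and once flipped.

```r
library(ggplot2)

# hwy conditional on class, default orientation
p_box <- ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot()
p_box

# flipped: the class labels now sit on the vertical axis and overlap less
p_box_flipped <- p_box + coord_flip()
p_box_flipped
```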
# map_data() requires the maps package; install it first with install.packages("maps")
nz <- map_data("nz")
Map projections
Description:
coord_map projects a portion of the earth, which is approximately spherical, onto a flat 2D plane using any projection defined by the mapproj package. Map projections do not, in general, preserve straight lines, so this requires considerable computation. coord_quickmap is a quick approximation that does preserve straight lines. It works best for smaller areas closer to the equator.
Usage:
coord_map(projection = "mercator", ..., parameters = NULL, orientation = NULL, xlim = NULL, ylim = NULL)
coord_quickmap(xlim = NULL, ylim = NULL, expand = TRUE)
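A sketch of coord_quickmap with the New Zealand outline from the maps package. The plot is skipped when maps is not installed, because map_data() depends on it.

```r
library(ggplot2)

# map_data() needs the maps package; skip the plot if it is missing
if (requireNamespace("maps", quietly = TRUE)) {
  nz <- map_data("nz")
  p_nz <- ggplot(nz, aes(long, lat, group = group)) +
    geom_polygon(fill = "white", color = "black") +
    coord_quickmap()  # quick approximation of a map projection
  print(p_nz)
} else {
  message("Package maps is not installed: install.packages(\"maps\")")
}
```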
bar <- ggplot(diamonds) +
geom_bar(aes(cut, fill=cut), show.legend = FALSE, width = 1) + theme(aspect.ratio = 1) + labs(x = NULL, y = NULL)
bar + coord_flip()

#
bar + coord_polar(direction = 1)

Polar coordinates
Description:
The polar coordinate system is most commonly used for pie charts, which are a stacked bar chart in polar coordinates.
Usage:
coord_polar(theta = "x", start = 0, direction = 1)
Arguments:
theta: variable to map angle to (x or y)
start: offset of starting point from 12 o’clock in radians
direction: 1, clockwise; -1, anticlockwise
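The description above calls a pie chart a stacked bar chart in polar coordinates. A minimal sketch of that idea: a single stacked bar (constant x) whose counts are mapped to the angle with theta = "y".

```r
library(ggplot2)

# one stacked bar: constant x, one segment per cut
p_stack <- ggplot(diamonds) +
  geom_bar(aes(x = "", fill = cut), width = 1) +
  labs(x = NULL, y = NULL)
p_stack

# mapping the count (y) to the angle turns the bar into a pie chart
p_pie <- p_stack + coord_polar(theta = "y")
p_pie
```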
# exercise
# 1 from bar chart (position = "identity") to polar chart
ggplot(diamonds) + geom_bar(aes(cut, fill = clarity),position = "identity", alpha = 0.2)

ggplot(diamonds) + geom_bar(aes(cut, fill = clarity),position = "identity", alpha = 0.2) + coord_polar()

Modify axis, legend, and plot labels
Description:
Good labels are critical for making your plots accessible to a wider audience. Ensure the axis and legend labels display the full variable name. Use the plot title and subtitle to explain the main findings. It’s common to use the caption to provide information about the data source.
Usage:
labs(...)
xlab(label)
ylab(label)
ggtitle(label, subtitle = NULL)
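A sketch of labs() with title, subtitle, caption, axis and legend labels. The label texts are only examples written for this workbook.

```r
library(ggplot2)

p_labs <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  labs(
    title = "Highway mileage by engine displacement",
    subtitle = "mpg dataset, 234 cars",
    caption = "Source: mpg dataset shipped with ggplot2",
    x = "Engine displacement (litres)",
    y = "Highway miles per gallon",
    color = "Car class"  # legend title for the color aesthetic
  )
p_labs
```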
ggplot( data = mpg, mapping = aes( x = cty, y = hwy)) + geom_point() + geom_abline() + coord_fixed()

Reference lines: horizontal, vertical, and diagonal
Description:
These geoms add reference lines (sometimes called rules) to a plot, either horizontal, vertical, or diagonal (specified by slope and intercept). These are useful for annotating plots.
Usage:
geom_abline(mapping = NULL, data = NULL, ..., slope, intercept, na.rm = FALSE, show.legend = NA)
geom_hline(mapping = NULL, data = NULL, ..., yintercept, na.rm = FALSE, show.legend = NA)
geom_vline(mapping = NULL, data = NULL, ..., xintercept, na.rm = FALSE, show.legend = NA)
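A sketch that combines all three reference lines on the cty/hwy scatter plot. Using the means as positions is just one possible annotation.

```r
library(ggplot2)

p_ref <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point() +
  geom_hline(yintercept = mean(mpg$hwy), linetype = 2) +  # horizontal: mean hwy
  geom_vline(xintercept = mean(mpg$cty), linetype = 2) +  # vertical: mean cty
  geom_abline(slope = 1, intercept = 0, color = "blue")   # diagonal: hwy equals cty
p_ref
```

Points above the blue diagonal are cars whose highway mileage is higher than their city mileage.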
Cartesian coordinates with fixed “aspect ratio”
Description:
A fixed scale coordinate system forces a specified ratio between the physical representation of data units on the axes. The ratio represents the number of units on the y-axis equivalent to one unit on the x-axis. The default, ratio = 1, ensures that one unit on the x-axis is the same length as one unit on the y-axis. Ratios higher than one make units on the y axis longer than units on the x-axis, and vice versa. This is similar to eqscplot, but it works for all types of graphics.
Usage:
coord_fixed(ratio = 1, xlim = NULL, ylim = NULL, expand = TRUE)
Final Template for ggplot
ggplot( data = < DATA >) +
< GEOM_FUNCTION >( mapping = aes( < MAPPINGS >), stat = < STAT >, position = < POSITION > ) +
< COORDINATE_FUNCTION > +
< FACET_FUNCTION >

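The template filled in once, as a sketch with the diamonds data. The concrete choices (geom_bar, position "dodge", coord_flip, facets by color) are arbitrary and only show where each piece goes.

```r
library(ggplot2)

p_full <- ggplot(data = diamonds) +          # DATA
  geom_bar(                                  # GEOM_FUNCTION
    mapping = aes(x = cut, fill = clarity),  # MAPPINGS
    stat = "count",                          # STAT
    position = "dodge"                       # POSITION
  ) +
  coord_flip() +                             # COORDINATE_FUNCTION
  facet_wrap(~color)                         # FACET_FUNCTION
p_full
```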