To install a new package, use the function
install.packages("name.package")
To load a library, use the function
library(name.package)
### Install packages
# install.packages("ggplot2")
# install.packages("ggridges")
# install.packages("tidyverse")
# install.packages("janitor")
# install.packages("kableExtra")
# install.packages("unikn")
# install.packages("ggpubr")
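# install.packages("sjPlot")
# install.packages("ggcorrplot") # used below via ggcorrplot::ggcorrplot()
# install.packages("maps") # required by ggplot2::map_data()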
### Load libraries
library(tidyverse)
library(ggplot2)
library(ggridges)
library(janitor) # helpful to clean col names `clean_names()`
library(kableExtra) # to display and edit tables
library(unikn) # for uni Konstanz theme
library(ggpubr) # to arrange plots
library(sjPlot)
# set options for tables
bs_style <- c("striped", "hover", "condensed", "responsive")
options(kable_styling_bootstrap_options = bs_style)
### Other data
### Run the following chunk
data("state")
state.x77 %>%
as.data.frame() %>%
rownames_to_column() %>%
rename(
state = rowname
) %>%
janitor::clean_names() -> df
rm(state.abb, state.area, state.center, state.division, state.name, state.region, state.x77)
| state | population | income | illiteracy | life_exp | murder | hs_grad | frost | area |
|---|---|---|---|---|---|---|---|---|
| Alabama | 3615 | 3624 | 2.1 | 69.05 | 15.1 | 41.3 | 20 | 50708 |
| Alaska | 365 | 6315 | 1.5 | 69.31 | 11.3 | 66.7 | 152 | 566432 |
| Arizona | 2212 | 4530 | 1.8 | 70.55 | 7.8 | 58.1 | 15 | 113417 |
| Arkansas | 2110 | 3378 | 1.9 | 70.66 | 10.1 | 39.9 | 65 | 51945 |
| California | 21198 | 5114 | 1.1 | 71.71 | 10.3 | 62.6 | 20 | 156361 |
| Colorado | 2541 | 4884 | 0.7 | 72.06 | 6.8 | 63.9 | 166 | 103766 |
- Define plot aesthetics with
ggplot(aes(x = ..., y = ...))
df %>%
### pass the data to the ggplot function
ggplot(
### define aesthetics
aes(x = income, y = illiteracy)
)
- Define the type of plot. Here, we use
geom_point().
- Add colors to the plot using the
color argument.
Here, we want to color by low- vs. high-population states (the column population2, which is created in the geom_boxplot() section below)
df %>%
### pass the data to the ggplot function
ggplot(
### define aesthetics
aes(x = income, y = illiteracy, color = population2)
) +
### define geom
geom_point()
- Modify the plot labels with
xlab() and ylab() or with labs(x = "...", y = "...")
df %>%
### pass the data to the ggplot function
ggplot(
### define aesthetics
aes(x = income, y = illiteracy, color = population2)
) +
### define geom (geom_point)
geom_point() +
### add or edit labs
labs(x = "Income", y = "Illiteracy")
- Add a title with
ggtitle()
df %>%
### pass the data to the ggplot function
ggplot(
### define aesthetics
aes(x = income, y = illiteracy, color = population2)
) +
### define geom (geom_point)
geom_point() +
### add or edit labs
labs(x = "Income", y = "Illiteracy") +
### add title
ggtitle("Illiteracy by income and population size")
- Change the theme of the plot with
theme_...().
A list of themes is provided in the section customize your plot > change plot theme
df %>%
### pass the data to the ggplot function
ggplot(
### define aesthetics
aes(x = income, y = illiteracy, color = population2)
) +
### define geom (geom_point)
geom_point() +
### add or edit labs
labs(x = "Income", y = "Illiteracy") +
### add title
ggtitle("Illiteracy by income and population size") +
### change plot theme
theme_unikn()
- Change the axis limits
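No example accompanies this last step, so here is a minimal sketch of one way to do it with ggplot2's xlim() and ylim(). The data frame df_demo and the limit values are illustrative assumptions; with the tutorial's df, the same two calls would be added to the plot built above.

```r
library(ggplot2)

# Rebuild the state data with the tutorial's column names so the sketch
# runs on its own (df_demo is a hypothetical name, not the tutorial's df)
df_demo <- as.data.frame(state.x77)
names(df_demo) <- c("population", "income", "illiteracy", "life_exp",
                    "murder", "hs_grad", "frost", "area")

p_limits <- ggplot(df_demo, aes(x = income, y = illiteracy)) +
  geom_point() +
  # illustrative limits, chosen so that every observation stays inside
  xlim(3000, 6500) +
  ylim(0, 3)

p_limits
```

Note that xlim() and ylim() drop observations that fall outside the limits, whereas coord_cartesian(xlim = ..., ylim = ...) zooms in without removing data.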
geom_boxplot(): “The boxplot compactly displays the distribution of a continuous variable. It visualises five summary statistics (the median, two hinges and two whiskers), and all “outlying” points individually”.
Usage:
geom_boxplot(
outlier.color = NULL, # if NULL inherit colors from ggplot() aesthetics
outlier.fill = NULL, # if NULL inherit colors from ggplot() aesthetics
outlier.shape = 19, # change number to change shape
outlier.size = 1.5, # change number to change size
outlier.alpha = NULL, # modify transparency of outlier color
varwidth = FALSE, # if TRUE, box widths are proportional to the square roots of the number of observations
na.rm = FALSE, # if TRUE, NA values are removed silently (if FALSE, with a warning)
inherit.aes = TRUE # default
)
In the following, we will create a new column in df
based on whether the state population is above or below the mean (4246).
The new column is called population2, and we will use an
if_else statement within the function mutate,
which is used to create or modify columns in a data frame.
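The chunk that creates population2 does not appear in the text. A minimal sketch of what it could look like, mirroring the income2 chunk later in this tutorial, is given here. The `>=` comparison and the labels "high" and "low" are assumptions, and df_demo is a hypothetical stand-in that rebuilds the data so the block runs on its own; with the tutorial's df, only the mutate() call is needed.

```r
library(dplyr)

# Stand-in for the tutorial's df (same column names, without the state column)
df_demo <- as.data.frame(state.x77)
names(df_demo) <- c("population", "income", "illiteracy", "life_exp",
                    "murder", "hs_grad", "frost", "area")

# Label each state "high" or "low" relative to the mean population
df_demo %>%
  mutate(population2 = if_else(population >= mean(population), "high", "low")) -> df_demo

# Number of states in each group
table(df_demo$population2)
```

Computing the threshold with mean(population) rather than typing 4246 keeps the code valid if the data change.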
Now, we can plot the percentage of illiteracy by low and
high-population states
df %>%
ggplot(aes(x = population2, y = illiteracy, fill = population2)) +
geom_boxplot() +
ggtitle("% illiteracy by low/high-population states")
How to read a boxplot:
The lower and upper hinges correspond to the first and third quartiles (the 25th and 75th percentiles).
The upper whisker extends from the hinge to the largest value no further than 1.5 * IQR from the hinge (where IQR is the inter-quartile range, or distance between the first and third quartiles).
The lower whisker extends from the hinge to the smallest value at most 1.5 * IQR of the hinge. Data beyond the end of the whiskers are called “outlying” points and are plotted individually.
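The rules above can be checked by hand. The following sketch computes the hinges, the 1.5 * IQR fences and the whisker ends for illiteracy in base R. It assumes that the quartiles from quantile() with its default settings are close enough to what the plot shows; the variable names are illustrative.

```r
# Illiteracy column of the built-in state data
x <- as.data.frame(state.x77)$Illiteracy

hinges <- unname(quantile(x, c(0.25, 0.75)))  # lower and upper hinge
iqr <- hinges[2] - hinges[1]                  # inter-quartile range

# Fences: the furthest a whisker is allowed to reach
upper_fence <- hinges[2] + 1.5 * iqr
lower_fence <- hinges[1] - 1.5 * iqr

# Whiskers end at the most extreme observations inside the fences
upper_whisker <- max(x[x <= upper_fence])
lower_whisker <- min(x[x >= lower_fence])

# Anything beyond the fences would be drawn as an individual point
outlying <- x[x > upper_fence | x < lower_fence]
```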
Run the following code for more information about
geom_boxplot()
help("geom_boxplot")
[Reference: R documentation]
“There are two types of bar charts: geom_bar() and
geom_col().
geom_bar() makes the height of the bar proportional
to the number of cases in each group (or if the weight aesthetic is
supplied, the sum of the weights).
geom_bar() uses stat_count() by
default: it counts the number of cases at each x position.
If you want the heights of the bars to represent values in the
data, use geom_col() instead.
geom_col() uses stat_identity(): it
leaves the data as is”
geom_col()
Usage:
geom_col(
position = "stack", # define position, stack is default
...,
just = 0.5, # default: bars are centered on the axis breaks; 0 or 1 places them to the left/right of the breaks
width = NULL, # define bar width
na.rm = FALSE,
show.legend = NA
# for other options see section customize your plots
)
In the following, we are going to plot the population of the first 5
states of df
df %>%
### select first 5 rows
slice_head(n = 5) %>%
ggplot(aes(x = state, y = population, fill = state)) +
geom_col(color = "black", width = .8) +
ggtitle("Population by state")
geom_bar(): Creates bar charts, useful for visualizing the count or frequency of categorical data.
Usage
geom_bar(
position = "stack", # define position, stack is default
stat = "count", # default - if you wish to define the y value, either use geom_col() or define stat = "identity"
...,
just = 0.5, # default: bars are centered on the axis breaks; 0 or 1 places them to the left/right of the breaks
width = NULL, # define bar width
na.rm = FALSE,
show.legend = NA
# for other options see section customize your plots
)
Similarly to the example above, we are going to plot the population
of the first 5 states of df
df %>%
### select first 5 rows
slice_head(n = 5) %>%
ggplot(aes(x = state, y = population, fill = state)) +
geom_bar(stat = "identity", color = "black") +
ggtitle("Population by state")
“A bar chart uses height to represent a value, and so the base of the bar must always be shown to produce a valid visual comparison. Proceed with caution when using transformed scales with a bar chart. It’s important to always use a meaningful reference point for the base of the bar. For example, for log transformations the reference point is 1. In fact, when using a log scale, geom_bar() automatically places the base of the bar at 1. Furthermore, never use stacked bars with a transformed scale, because scaling happens before stacking. As a consequence, the height of bars will be wrong when stacking occurs with a transformed scale.
By default, multiple bars occupying the same x position will be stacked atop one another by position_stack(). If you want them to be dodged side-to-side, use position_dodge() or position_dodge2(). Finally, position_fill() shows relative proportions at each x by stacking the bars and then standardising each bar to have the same height”.
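As an illustration of the position options just described, the sketch below counts states by income group and fills the bars by population group, once dodged and once as proportions. The data frame df_demo and the two grouping columns are rebuilt here as assumptions so that the block runs on its own.

```r
library(ggplot2)

# Stand-in data with two grouping columns (split at the mean, as in this tutorial)
df_demo <- as.data.frame(state.x77)
df_demo$income2 <- ifelse(df_demo$Income >= mean(df_demo$Income), "high", "low")
df_demo$population2 <- ifelse(df_demo$Population >= mean(df_demo$Population), "high", "low")

# Bars side by side instead of stacked
p_dodge <- ggplot(df_demo, aes(x = income2, fill = population2)) +
  geom_bar(position = "dodge") +
  ggtitle("position = \"dodge\"")

# Bars stacked and rescaled so that each x position sums to 1
p_fill <- ggplot(df_demo, aes(x = income2, fill = population2)) +
  geom_bar(position = "fill") +
  ggtitle("position = \"fill\"")

p_dodge
p_fill
```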
Run the following code for more information about
geom_bar() and geom_col()
help("geom_col")
help("geom_bar")
[Reference: R documentation]
geom_density(): Display a smooth estimate of the distribution of continuous data.
Usage:
geom_density(
stat = "density", # default
position = "identity",
...,
na.rm = FALSE, # remove na values
outline.type = "upper"
)
In the following, we will plot the distribution of
illiteracy. Note:
geom_density() only requires either x or y aesthetics.
df %>%
ggplot(aes(x = illiteracy)) +
geom_density(fill = "deepskyblue4") +
ggtitle("Distribution of illiteracy")
We can also verify the distribution of illiteracy by states with high
vs. low population using the fill argument within the
aes() function
df %>%
ggplot(aes(x = illiteracy, fill = population2)) +
geom_density() +
ggtitle("Distribution of illiteracy by high vs. low-population states")
In this case, it is better to lower the color
transparency to see how the two groups are distributed.
We can do this with the alpha argument within the
geom_density function
df %>%
ggplot(aes(x = illiteracy, fill = population2)) +
geom_density(alpha = .7) +
ggtitle("Distribution of illiteracy by high vs. low-population states")
geom_density_ridges(): Create ridge plots to visualize the distribution of continuous data along one or more categorical variables.
Usage:
geom_density_ridges(
mapping = NULL,
data = NULL,
stat = "density_ridges",
position = "points_sina",
panel_scaling = TRUE, # scaling is calculated for each panel
na.rm = FALSE, # removes na values
...
)
In the following, we plot the distribution of illiteracy,
murder and life_exp. In order to use
ggridges::geom_density_ridges(), we have to manipulate the
data format. To do this, we will use the function
pivot_longer.
Note: the function
geom_density_ridges() is part of the package
ggridges.
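To make the reshaping step concrete, here is a small sketch of what pivot_longer() does to a wide table. The two-row data frame is made up from the first rows of the table shown earlier, and the object names are illustrative.

```r
library(tidyr)

# Wide format: one column per measure
wide <- data.frame(
  illiteracy = c(2.1, 1.5),
  murder     = c(15.1, 11.3),
  life_exp   = c(69.05, 69.31)
)

# Long format: one row per (measure, value) pair
long <- pivot_longer(wide,
                     cols = everything(),
                     names_to = "measure",
                     values_to = "value")

long
```

The long table has a single value column plus a measure column that says where each value came from, which is the layout geom_density_ridges() needs for its x and y aesthetics.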
df %>%
dplyr::select(illiteracy, murder, life_exp) %>%
pivot_longer(names_to = "measure", values_to = "value", 1:3) %>%
ggplot(aes(x = value, y = measure, fill = measure)) +
ggridges::geom_density_ridges(alpha = .7)
We can also plot the distribution of illiteracy,
murder and life_exp based on states with high
and low income.
First, we have to create a new column (income2) based on
whether the income of the state is above or below 4436 (mean). To do
this, we use again the if_else() function within the
mutate() function.
df %>%
mutate(income2 = if_else(income >= 4436, "high", "low")) -> df
Now we can create a plot with geom_density_ridges()
df %>%
dplyr::select(income2, illiteracy, murder, life_exp) %>%
pivot_longer(names_to = "measure", values_to = "value", 2:4) %>%
ggplot(aes(x = value, y = measure, fill = income2)) +
ggridges::geom_density_ridges(alpha = .7)
Another option is to use the function
facet_wrap(~variable) instead of the fill =
argument
df %>%
dplyr::select(income2, illiteracy, murder, life_exp) %>%
pivot_longer(names_to = "measure", values_to = "value", 2:4) %>%
ggplot(aes(x = value, y = measure, fill = measure)) +
ggridges::geom_density_ridges(alpha = .7) +
facet_wrap(~income2)
Run the following code for more information about
geom_density() and
geom_density_ridges()
help("geom_density")
help("geom_density_ridges")
[Reference: R documentation]
geom_histogram(): Display the distribution of continuous data using bars.
“Visualise the distribution of a single continuous variable by dividing the x axis into bins and counting the number of observations in each bin. Histograms (geom_histogram()) display the counts with bars; frequency polygons (geom_freqpoly()) display the counts with lines. Frequency polygons are more suitable when you want to compare the distribution across the levels of a categorical variable”.
Usage:
geom_histogram(
mapping = NULL,
data = NULL,
stat = "bin",
position = "stack", # or "jitter"
...,
binwidth = NULL, # width of the bins
bins = NULL, # number of bins overridden by binwidth
na.rm = FALSE, # remove NAs values
)
Similarly to geom_density(),
geom_histogram() also only requires either x or y
aesthetics
ggarrange(df %>%
ggplot(aes(x = illiteracy)) +
geom_histogram(fill = "deepskyblue4", color = "black", alpha = .5, position = "jitter", bins = 30) +
ggtitle("Distribution of illiteracy") +
annotate("label", x = 2.7, y = 10, label = "jitter"),
df %>%
ggplot(aes(x = illiteracy)) +
geom_histogram(fill = "indianred3", color = "black", alpha = .5, position = "stack", bins = 10) + ### default
ggtitle("Distribution of illiteracy") +
annotate("label", x = 2.7, y = 12.8, label = "stack"))
Run the following code for more information about
geom_histogram()
help("geom_histogram")
[Reference: R documentation]
geom_point() and geom_jitter(): Display scatterplots. geom_jitter() avoids overlapping of points.
“geom_point() and geom_jitter() are used to
create scatterplots. geom_point() is most useful for
displaying the relationship between two continuous variables. It can be
used to compare one continuous and one categorical variable, or two
categorical variables, but a variation like geom_jitter(),
is usually more appropriate.”
Usage:
geom_point(
mapping = NULL,
data = NULL,
stat = "identity",
position = "identity", # or jitter
...,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE
)
geom_jitter(
mapping = NULL,
data = NULL,
stat = "identity",
position = "jitter",
...,
width = NULL,
height = NULL,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE
)
Let’s compare the two plots by plotting the percentage of illiteracy
by states with low vs. high income (income2)
df %>%
ggplot(aes(x = income2, y = illiteracy, color = income2)) +
geom_point() +
ggtitle("Scatter plot with geom_point()")
“geom_jitter() is a convenient shortcut for
geom_point(position = "jitter"). It adds a small amount of
random variation to the location of each point, and is a useful way of
handling overplotting caused by discreteness in smaller datasets.”
df %>%
ggplot(aes(x = income2, y = illiteracy, color = income2)) +
geom_jitter() +
ggtitle("Scatter plot with geom_jitter()")
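The amount of jitter can be controlled with the width and height arguments listed in the usage above. The sketch below spreads the points horizontally only, so the illiteracy values on the y axis are not altered. The value 0.15 and the stand-in data frame df_demo are illustrative assumptions.

```r
library(ggplot2)

# Stand-in data with an income group column (split at the mean)
df_demo <- as.data.frame(state.x77)
df_demo$income2 <- ifelse(df_demo$Income >= mean(df_demo$Income), "high", "low")

p_jitter <- ggplot(df_demo, aes(x = income2, y = Illiteracy, color = income2)) +
  # jitter along x only; height = 0 leaves the y values untouched
  geom_jitter(width = 0.15, height = 0) +
  ggtitle("geom_jitter(width = 0.15, height = 0)")

p_jitter
```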
Run the following code for more information about
geom_point() and geom_jitter()
help("geom_point")
help("geom_jitter")
[Reference: R documentation]
geom_line(): Connect data points with lines, useful for showing trends.
Usage:
geom_line(
mapping = NULL,
data = NULL,
stat = "identity",
position = "identity",
na.rm = FALSE,
orientation = NA,
show.legend = NA,
inherit.aes = TRUE,
...
)
### Alternatives
geom_path()
geom_step()
### see help()
df %>%
slice_head(n = 5) %>%
ggplot(aes(x = income, y = illiteracy)) +
geom_line(linewidth = .8, linetype = 3, color = "deepskyblue4") +
geom_point(size = 2, color = "deepskyblue4") +
ggtitle("Change in illiteracy by income")
Run the following code for more information about
geom_line()
help("geom_line")
[Reference: R documentation]
geom_smooth(): Adds a smooth trend line to a scatter plot
geom_smooth() calculates the predicted value of y, the lower and upper
pointwise confidence interval around the mean, and the standard error.
Usage:
geom_smooth(
mapping = NULL,
data = NULL,
stat = "smooth",
position = "identity",
...,
method = NULL, # IMPORTANT: Use method "lm" for plotting linear regression
formula = NULL,
se = TRUE,
na.rm = FALSE,
...
)
Let’s plot the relationship between murders and illiteracy:
df %>%
ggplot(aes(x = murder, y = illiteracy)) +
geom_jitter(alpha = .5, color = "indianred") +
geom_smooth(color = "indianred", fill = "lightgrey") +
ggtitle("Relationship between illiteracy and murder")
If you specify method = "lm" within the
geom_smooth() function, you get a regression line
df %>%
ggplot(aes(x = murder, y = illiteracy)) +
geom_jitter(alpha = .5, color = "indianred") +
geom_smooth(color = "indianred", fill = "lightgrey", method = "lm") +
ggtitle("Smooth plot with (method = lm)")
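To see the numbers behind such a regression line, the same model can be fitted directly with lm(). This is a sketch on the built-in data (capitalised column names, stand-in data frame df_demo), not part of the original plot code.

```r
# Stand-in for the tutorial's df
df_demo <- as.data.frame(state.x77)

# Same model that geom_smooth(method = "lm") fits for y = illiteracy, x = murder
fit <- lm(Illiteracy ~ Murder, data = df_demo)

coef(fit)                # intercept and slope of the line
summary(fit)$r.squared   # share of variance explained
```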
Run the following code for more information about
geom_smooth()
help("geom_smooth")
[Reference: R documentation]
geom_errorbar(): Adds error bar to plot. It allows you to visually represent the variability or uncertainty associated with the data points.
Usage:
geom_errorbar(
mapping = NULL,
data = NULL,
stat = "identity", # default
position = "identity", # default
...,
na.rm = FALSE,
)
In the following plot, we are using geom_point() to plot
the average income for states with high vs. low population. We will use
geom_errorbar() to plot the standard error of the mean.
### first, let's compute the mean and the standard error
df %>%
summarize(
mean = mean(income),
sd = sd(income),
se = sd/sqrt(n()),
.by = "population2"
) -> summary_income
summary_income %>%
ggplot(aes(x = population2, y = mean, color = population2)) +
geom_point(size = 3) +
geom_errorbar(aes(x = population2, ymin = mean-se, ymax = mean+se), width = 0.1, linewidth = 0.5) +
ggtitle("Income by population (high vs. low)")
Run the following code for more information about
geom_errorbar()
help("geom_errorbar")
[Reference: R documentation]
geom_polygon(): the start and end points are connected and the inside is coloured by fill.
To draw a map, the outlines of the US states can be downloaded with map_data. See help for more information.
# download map states using `map_data`
map_data("state") -> usa.map
df %>% mutate(state = tolower(state)) -> df.map
states <- c("alabama", "alaska", "arizona", "arkansas", "california", "colorado", "connecticut", "delaware", "florida", "georgia", "hawaii", "idaho", "illinois", "indiana", "iowa", "kansas", "kentucky", "louisiana",
"maine", "maryland", "massachusetts", "michigan", "minnesota", "mississippi", "missouri", "montana",
"nebraska", "nevada", "new hampshire", "new jersey", "new mexico", "new york", "north carolina",
"north dakota", "ohio", "oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
"south dakota", "tennessee", "texas", "utah", "vermont", "virginia", "washington", "west virginia",
"wisconsin", "wyoming")
usa.map %>%
### filter only the states of our df
filter(region %in% states) -> usa.map
### join dfs
df.map %>% rename(region = state) -> df.map
common.cols <- intersect(names(df.map), names(usa.map))
left_join(df.map, usa.map, by = common.cols) -> df.map
In the following, we plot the murder rate by
state (called region in the current df)
df.map %>%
ggplot(aes(x = long, y = lat, group = group, fill = murder)) +
geom_polygon(color = "black") +
scale_fill_gradient(low="white", high="indianred3") +
ggtitle("Murder by state") +
labs(x = "", y = "")
Run the following code for more information about
geom_polygon()
??geom_polygon
[Reference: R documentation]
geom_violin():
“A violin plot is a compact display of a continuous distribution. It is a blend of geom_boxplot() and geom_density(): a violin plot is a mirrored density plot displayed in the same way as a boxplot”
geom_violin(
mapping = NULL,
data = NULL,
stat = "ydensity",
position = "dodge",
...,
draw_quantiles = NULL, # NULL by default (no lines). If set, e.g. c(0.25, 0.5, 0.75), draw horizontal lines at the given quantiles of the density estimate.
trim = TRUE, # If TRUE (default), trim the tails of the violins to the range of the data. If FALSE, don't trim the tails.
scale = "area", # if "area" (default), all violins have the same area (before trimming the tails). If "count", areas are scaled proportionally to the number of observations. If "width", all violins have the same maximum width.
na.rm = FALSE,
...
)
An example plotting the illiteracy rate (%) by income (low/high):
df %>%
ggplot(aes(x = income2, y = illiteracy, fill = income2)) +
geom_violin(alpha = .5) +
ggtitle("% illiteracy by low/high-income states")
Run the following code for more information about
geom_violin()
help("geom_violin")
References:
R documentation;
Hintze, J. L., Nelson, R. D. (1998) Violin Plots: A Box Plot-Density Trace Synergism. The American Statistician 52, 181-184.
ggcorrplot(): Plots a correlation matrix
cor_pmat(): Compute a correlation matrix p-values
Usage
ggcorrplot(
corr,
method = c("square", "circle"),
type = c("full", "lower", "upper"),
ggtheme = ggplot2::theme_minimal,
title = "",
show.legend = TRUE,
legend.title = "Corr",
show.diag = NULL,
colors = c("blue", "white", "red"), # set colors
outline.color = "gray",
hc.order = FALSE,
hc.method = "complete",
lab = FALSE, # add correlation coefficient to the plot
lab_col = "black",
lab_size = 4,
p.mat = NULL,
sig.level = 0.05, # p-value significance level
insig = c("pch", "blank"), # if pch = add characters, if blank = remove correlation
pch = 4, # shape
pch.col = "black",
pch.cex = 5, # size pch
# the size, the color and the string rotation of text label (variable names).
tl.cex = 12,
tl.col = "black",
tl.srt = 45,
digits = 2,
as.is = FALSE
)
cor_pmat(x, ...)
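Let’s plot the correlation between all the continuous variables in
df
### keep the 8 numeric columns (population to area)
df %>%
dplyr::select(2:9) -> df.num
### correlation matrix (Pearson); scaling is not needed, it does not change the correlations
cor(df.num, method = "pearson") -> cor
### p-values have to be computed from the data, not from the correlation matrix
ggcorrplot::cor_pmat(df.num) -> p.values
round(p.values, 3) -> p.values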
ggcorrplot::ggcorrplot(cor, hc.order = T,
type = "lower",
lab = T,
lab_size = 4,
method = "square",
colors = c("grey", "white", "turquoise3"),
p.mat = p.values,
pch.col = "grey50",
pch = 4,
show.legend = F,
insig = "pch",
title = "Correlation",
ggtheme = unikn::theme_unikn(),
outline.color = "black",
tl.cex = 10,
tl.col = "black",
tl.srt = 90) +
labs(
caption = "non-significant correlations (p > .05) are crossed out")
Run the following code for more information about
ggcorrplot::ggcorrplot()
help("ggcorrplot")
[References: R documentation]
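The p-values passed to p.mat are tests on pairs of columns of the data. As a sketch of what one entry of such a matrix represents, the block below runs a single Pearson test in base R on two columns of the built-in data. It is assumed here, not verified against the package source, that cor_pmat() performs tests of this kind for every pair.

```r
# Stand-in for the numeric columns of the tutorial's df
num <- as.data.frame(state.x77)

# Pearson correlation test for one pair of variables
ct <- cor.test(num$Income, num$Illiteracy, method = "pearson")

ct$estimate  # correlation coefficient, identical to cor() for this pair
ct$p.value   # the value compared against sig.level in the plot
```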
plot_likert(): Plot likert scales as centered stacked bars.
Usage:
plot_likert(
items,
groups = NULL,
groups.titles = "auto",
title = NULL,
legend.title = NULL,
legend.labels = NULL,
axis.titles = NULL,
axis.labels = NULL,
# optional, number of categories of items (e.g. "strongly disagree", "disagree", "agree" and "strongly agree" would be catcount = 4).
catcount = NULL,
# If there's a neutral category (like "don't know" etc.), specify the index number (value) for this category.
cat.neutral = NULL,
sort.frq = NULL,
weight.by = NULL,
title.wtd.suffix = NULL,
wrap.title = 50,
wrap.labels = 30,
wrap.legend.title = 30,
wrap.legend.labels = 28,
geom.size = 0.6,
geom.colors = "BrBG",
cat.neutral.color = "grey70",
intercept.line.color = "grey50",
reverse.colors = FALSE,
values = "show",
show.n = TRUE,
show.legend = TRUE,
show.prc.sign = FALSE,
grid.range = 1,
grid.breaks = 0.2,
expand.grid = TRUE,
digits = 1,
reverse.scale = FALSE,
coord.flip = TRUE,
sort.groups = TRUE,
legend.pos = "bottom",
rel_heights = 1,
group.legend.options = list(nrow = NULL, byrow = TRUE),
cowplot.options = list(label_x = 0.01, hjust = 0, align = "v")
)
Example:
| TrustInGovernment | PoliticalKnowledge | SatisfactionDemocracy | InterestPolitics | ApprovalLeaders |
|---|---|---|---|---|
| 5 | 4 | 4 | 1 | 2 |
| 4 | 5 | 2 | 5 | 3 |
| 4 | 2 | 3 | 3 | 5 |
| 2 | 4 | 5 | 1 | 1 |
| 2 | 4 | 4 | 5 | 1 |
| 3 | 1 | 3 | 1 | 5 |
### make sure to mutate variables into "factors"
df1 %>%
mutate_all(~as.factor(.x)) -> df1
sjPlot::plot_likert(df1,
catcount = 5,
geom.colors = c("#993F00", "#FF8E32", "#FFE5CC", "#B2FCFF", "#51C3CC", "grey"),
# legend.labels = c()
reverse.scale = T,
title = "Likert plot",
geom.size = .5) +
theme(aspect.ratio = 1/2)
df2 %>%
mutate_all(~as.factor(.x)) -> df2
sjPlot::plot_likert(df2,
catcount = 5, # there are 5 categories (1 to 5 and 1 neutral)
cat.neutral = 1, # the position of the neutral category, here is the first as it is 0
geom.colors = c("#993F00", "#FF8E32", "#FFE5CC", "#B2FCFF", "#51C3CC", "grey"),
legend.labels = c("I don't know","1", "2", "3", "4", "5"),
reverse.scale = T,
title = "Likert plot",
geom.size = .6) +
theme(aspect.ratio = 1/2)
sjPlot::plot_likert(df2,
catcount = 5, # there are 5 categories (1 to 5 and 1 neutral)
cat.neutral = 1, # the position of the neutral category, here is the first as it is 0
geom.colors = c("#993F00", "#FF8E32", "#FFE5CC", "#B2FCFF", "#51C3CC", "grey"),
legend.labels = c("I don't know","1", "2", "3", "4", "5"),
reverse.scale = T,
title = "Likert plot",
geom.size = .6,
values = "sum.inside") +
theme(aspect.ratio = 1/2)
Run the following code for more information about
sjPlot::plot_likert()
??sjPlot::plot_likert
[References: R documentation]
In the following sections you will learn how to customize your plots.
You can find this information by typing ggplot2-specs in
the help panel.
Almost every geom has either
colour, fill, or both.
Colours and fills can be specified in the following ways:
A name, e.g., “red”. R has 657 built-in named colours, which can
be listed with colours().
head(colours()) # only print first rows with "head()"
## [1] "white" "aliceblue" "antiquewhite" "antiquewhite1"
## [5] "antiquewhite2" "antiquewhite3"
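An rgb specification, i.e., a string of the form "#RRGGBB" in which each pair of hexadecimal digits gives the red, green and blue value, e.g., "#51C3CC" (see https://r-charts.com/color-palettes/)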
The transparency of the colors can be modified with the argument
alpha. A lower value sets a more transparent
colour.
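The three ways of describing a colour mentioned above can be tried in base R without drawing a plot. This is a small sketch; the particular colour and the alpha value are arbitrary choices.

```r
# Number of built-in colour names
n_named <- length(colours())

# Build a hex string from red, green and blue values on a 0-255 scale
hex <- rgb(81, 195, 204, maxColorValue = 255)

# Same colour at 50% opacity; two extra hex digits encode the alpha value
hex_alpha <- adjustcolor(hex, alpha.f = 0.5)

n_named
hex
hex_alpha
```

Strings such as hex can be passed to any color or fill argument used in this tutorial.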
The arguments fill and colour are normally
specified within the aes() function, especially if you are
plotting categorical variables in relation to continuous variables and
you want different colours/fill for each level of your categorical
variable.
Examples:
### example of a geom with only the color argument
df %>%
ggplot(aes(x = income, y = illiteracy)) +
geom_jitter(color = "indianred3") +
ggtitle("Set the color of the plot")
### example of a geom with both color and fill argument
ggarrange(
df %>%
ggplot(aes(x = income2, y = illiteracy)) +
geom_violin(fill = "#51C3CC") +
annotate("label", x = 1.5, y = 3, label = "fill", fill = "#51C3CC"),
df %>%
ggplot(aes(x = income2, y = illiteracy)) +
geom_violin(color = "#51C3CC") +
annotate("label", x = 1.5, y = 3, label = "colour", color = "#51C3CC")
)
You can also set colour and fill based on the levels of
categorical variables. In this case, the two arguments have to be
specified within the
aes(x = ..., y = ..., fill = ..., colour = ...) function.
This can be done either within the ggplot() function or
within the geom_X() function
### option 1
ggarrange(
df %>%
ggplot(aes(x = income2, y = illiteracy, fill = income2)) +
geom_violin() +
annotate("label", x = 1.5, y = 3, label = "fill"),
df %>%
ggplot(aes(x = income2, y = illiteracy, colour = income2)) +
geom_violin() +
annotate("label", x = 1.5, y = 3, label = "colour")
)
### option 2
ggarrange(
df %>%
ggplot(aes(x = income2, y = illiteracy)) +
geom_violin(aes(fill = income2)) +
annotate("label", x = 1.5, y = 3, label = "fill"),
df %>%
ggplot(aes(x = income2, y = illiteracy)) +
geom_violin(aes(colour = income2)) +
annotate("label", x = 1.5, y = 3, label = "colour")
)
Of course, you may want to set your favourite colors instead of using the default ones. For this, there are several functions that you can use depending on the type of variables (i.e., discrete/continuous).
For discrete variables
[Reference: https://stackoverflow.com/questions/70942728/understanding-color-scales-in-ggplot2]
ggarrange(
df %>%
ggplot(aes(x = income2, y = illiteracy)) +
geom_violin(aes(fill = income2)) +
annotate("label", x = 1.5, y = 3, label = "fill") +
scale_fill_manual(values = c("#0073C2", "#EFC000")),
df %>%
ggplot(aes(x = income2, y = illiteracy)) +
geom_violin(aes(colour = income2), linewidth = 1) +
annotate("label", x = 1.5, y = 3, label = "colour") +
scale_color_manual(values = c("#0073C2", "#EFC000"))
)
For continuous variables
[Reference: https://stackoverflow.com/questions/70942728/understanding-color-scales-in-ggplot2]
df %>%
ggplot(aes(x = income, y = illiteracy, color = income)) +
geom_jitter() +
scale_color_gradient2(low = "#BCFFB2", mid = "#8AE67E", high = "#1F990F", midpoint = mean(df$income)) + ### midpoint defaults to 0, which lies outside the income range
ggtitle("Example with scale_color_gradient2()")
The appearance of a line is affected by
linewidth, linetype, lineend.
Line types can be specified with:
An integer or name:
0 = blank,
1 = solid,
2 = dashed,
3 = dotted,
4 = dotdash,
5 = longdash,
6 = twodash
The appearance of the line end is controlled by the lineend parameter, which can be one of “round”, “butt” (the
default), or “square”.
Example:
ggarrange(
df %>%
slice_head(n = 5) %>%
ggplot(aes(x = income, y = illiteracy)) +
geom_line(linewidth = .8, linetype = 3, color = "deepskyblue4") +
geom_point(size = 2, color = "deepskyblue4") +
ggtitle("Dotted line"),
df %>%
slice_head(n = 5) %>%
ggplot(aes(x = income, y = illiteracy)) +
geom_line(linewidth = .8, linetype = 6, color = "indianred") +
geom_point(size = 2, color = "indianred") +
ggtitle("Twodash line")
)
You can add vertical and horizontal lines with
geom_vline() and geom_hline() respectively
Let’s plot the distribution of illiteracy by income (high vs. low)
df %>%
ggplot(aes(x = illiteracy, fill = income2)) +
geom_density(alpha = .7) +
ggtitle("Distribution of illiteracy by high vs. low-income states") +
scale_fill_manual(values = c("#0073C2", "#EFC000"))
Let’s now compute the mean illiteracy by income (high
vs. low). To do this, we can use the function summarize()
from the dplyr package (part of the tidyverse).
df %>%
summarize(
mean = mean(illiteracy),
.by = "income2"
) -> summary_illiteracy
kable(summary_illiteracy) %>% kable_styling()
| income2 | mean |
|---|---|
| low | 1.457143 |
| high | 0.962069 |
Now we can add a vertical line to the density plot, to signal the mean value for each state group (high vs. low income)
df %>%
ggplot(aes(x = illiteracy, fill = income2)) +
geom_density(alpha = .7) +
ggtitle("geom_vline()") +
scale_fill_manual(values = c("#0073C2", "#EFC000")) +
### ADD VERTICAL LINE
geom_vline(xintercept = 1.457143, color = "#EFC000") +
geom_vline(xintercept = 0.962069, color = "#0073C2")
The following is a more robust alternative that saves you from
typing the values manually. Above, we saved the summary table into a
new variable, summary_illiteracy. We can retrieve the
xintercept values directly from that table as follows:
df %>%
ggplot(aes(x = illiteracy, fill = income2)) +
geom_density(alpha = .7) +
ggtitle("geom_vline()") +
scale_fill_manual(values = c("#0073C2", "#EFC000")) +
### ADD VERTICAL LINE
geom_vline(xintercept = summary_illiteracy$mean[summary_illiteracy$income2 == "low"], color = "#EFC000") +
geom_vline(xintercept = summary_illiteracy$mean[summary_illiteracy$income2 == "high"], color = "#0073C2")
You can edit the line type and line width with linetype and
linewidth respectively (in ggplot2 versions before 3.4.0, use size instead of linewidth).
df %>%
ggplot(aes(x = illiteracy, fill = income2)) +
geom_density(alpha = .7) +
ggtitle("geom_vline()") +
scale_fill_manual(values = c("#0073C2", "#EFC000")) +
### ADD VERTICAL LINE
geom_vline(xintercept = summary_illiteracy$mean[summary_illiteracy$income2 == "low"], color = "#EFC000", linetype = 2, linewidth = 1.5) +
geom_vline(xintercept = summary_illiteracy$mean[summary_illiteracy$income2 == "high"], color = "#0073C2", linetype = 6, linewidth = 1.5)
Similarly, we can add a horizontal line to the plot.
In the following, we will plot the population of the first 5 states in the df, and we will add a horizontal line to the plot to show the mean population of these 5 states (i.e., 5900)
df %>%
slice_head(n = 5) %>%
ggplot(aes(x = state, y = population, fill = state)) +
geom_col(color = "black") +
geom_hline(yintercept = 5900, color = "indianred", linetype = 6, linewidth = 1) +
ggtitle("geom_hline()")
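As with the vertical lines, the intercept does not have to be typed by hand. The sketch below computes the mean of the first 5 states and passes it to geom_hline(). The data frame df_demo and the names first5 and mean_pop are illustrative, and the linewidth argument assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Stand-in for the tutorial's df, with the state names as a column
df_demo <- as.data.frame(state.x77)
df_demo$state <- rownames(df_demo)

first5 <- head(df_demo, 5)
mean_pop <- mean(first5$Population)  # mean population of the 5 plotted states

p_hline <- ggplot(first5, aes(x = state, y = Population, fill = state)) +
  geom_col(color = "black") +
  geom_hline(yintercept = mean_pop, color = "indianred", linetype = 6, linewidth = 1) +
  ggtitle("geom_hline() with a computed intercept")

p_hline
```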
Use
shape or pch to edit point shape
Example:
df %>%
ggplot(aes(x = income, y = illiteracy, pch = income2, color = income2)) +
geom_jitter(size = 3) +
scale_shape_manual(values = c(15, 17)) +
scale_color_manual(values = c("#0073C2", "#EFC000")) +
ggtitle("Edit point shape")
Usage:
theme(
line,
rect,
text,
title,
aspect.ratio,
axis.title,
axis.title.x,
axis.title.x.top,
axis.title.x.bottom,
axis.title.y,
axis.title.y.left,
axis.title.y.right,
axis.text,
axis.text.x,
axis.text.x.top,
axis.text.x.bottom,
axis.text.y,
axis.text.y.left,
axis.text.y.right,
axis.ticks,
axis.ticks.x,
axis.ticks.x.top,
axis.ticks.x.bottom,
axis.ticks.y,
axis.ticks.y.left,
axis.ticks.y.right,
axis.ticks.length,
axis.ticks.length.x,
axis.ticks.length.x.top,
axis.ticks.length.x.bottom,
axis.ticks.length.y,
axis.ticks.length.y.left,
axis.ticks.length.y.right,
axis.line,
axis.line.x,
axis.line.x.top,
axis.line.x.bottom,
axis.line.y,
axis.line.y.left,
axis.line.y.right,
legend.background,
legend.margin,
legend.spacing,
legend.spacing.x,
legend.spacing.y,
legend.key,
legend.key.size,
legend.key.height,
legend.key.width,
legend.text,
legend.text.align,
legend.title,
legend.title.align,
legend.position,
legend.direction,
legend.justification,
legend.box,
legend.box.just,
legend.box.margin,
legend.box.background,
legend.box.spacing,
panel.background,
panel.border,
panel.spacing,
panel.spacing.x,
panel.spacing.y,
panel.grid,
panel.grid.major,
panel.grid.minor,
panel.grid.major.x,
panel.grid.major.y,
panel.grid.minor.x,
panel.grid.minor.y,
panel.ontop,
plot.background,
plot.title,
plot.title.position,
plot.subtitle,
plot.caption,
plot.caption.position,
plot.tag,
plot.tag.position,
plot.margin,
strip.background,
strip.background.x,
strip.background.y,
strip.clip,
strip.placement,
strip.text,
strip.text.x,
strip.text.x.bottom,
strip.text.x.top,
strip.text.y,
strip.text.y.left,
strip.text.y.right,
strip.switch.pad.grid,
strip.switch.pad.wrap,
...,
complete = FALSE,
validate = TRUE
)
Arguments
line: all line elements (element_line())
rect: all rectangular elements
(element_rect())
text: all text elements (element_text())
title: all title elements: plot, axes, legends
(element_text(); inherits from text)
aspect.ratio: aspect ratio of the panel
axis.title, axis.title.x,
axis.title.y, axis.title.x.top,
axis.title.x.bottom, axis.title.y.left,
axis.title.y.right: labels of axes
(element_text()).
axis.text, axis.text.x,
axis.text.y, axis.text.x.top,
axis.text.x.bottom, axis.text.y.left,
axis.text.y.right: tick labels along axes
(element_text()).
axis.ticks, axis.ticks.x,
axis.ticks.x.top, axis.ticks.x.bottom,
axis.ticks.y, axis.ticks.y.left,
axis.ticks.y.right: tick marks along axes
(element_line()).
axis.ticks.length, axis.ticks.length.x,
axis.ticks.length.x.top,
axis.ticks.length.x.bottom,
axis.ticks.length.y, axis.ticks.length.y.left,
axis.ticks.length.y.right: length of tick marks
(unit)
axis.line, axis.line.x,
axis.line.x.top, axis.line.x.bottom,
axis.line.y, axis.line.y.left,
axis.line.y.right: lines along axes
(element_line()).
legend.background: background of legend
(element_rect(); inherits from rect)
legend.margin: the margin around each legend
(margin())
legend.spacing, legend.spacing.x,
legend.spacing.y: the spacing between legends
(unit).
legend.key: background underneath legend
keys
legend.key.size, legend.key.height,
legend.key.width: size of legend keys (unit)
legend.text: legend item labels (element_text();
inherits from text)
legend.text.align: alignment of legend labels
(number from 0 (left) to 1 (right))
legend.title: title of legend (element_text();
inherits from title)
legend.title.align: alignment of legend title
(number from 0 (left) to 1 (right))
legend.position: the position of legends (“none”,
“left”, “right”, “bottom”, “top”, or two-element numeric
vector)
legend.direction: layout of items in legends
(“horizontal” or “vertical”)
legend.justification: anchor point for positioning
legend inside plot (“center” or two-element numeric vector) or the
justification according to the plot area when positioned outside the
plot
legend.box: arrangement of multiple legends
(“horizontal” or “vertical”)
legend.box.just: justification of each legend within
the overall bounding box, when there are multiple legends (“top”,
“bottom”, “left”, or “right”)
legend.box.margin: margins around the full legend
area, as specified using margin()
legend.box.background: background of legend area
(element_rect(); inherits from rect)
plot.title: plot title (text appearance)
(element_text(); inherits from title) left-aligned by default
plot.title.position,
plot.caption.position: Alignment of the plot title/subtitle
and caption.
plot.subtitle: plot subtitle (text appearance)
(element_text(); inherits from title) left-aligned by default
plot.caption: caption below the plot (text
appearance) (element_text(); inherits from title) right-aligned by
default
For more information, run the following code:
help(theme)
Example:
df %>%
ggplot(aes(x = hs_grad, y = murder, pch = income2, color = income2)) +
geom_jitter(size = 2) +
geom_smooth(method = "lm", fill = "lightgrey") +
scale_color_manual(values = c("#0073C2", "#EFC000")) +
ggtitle("theme()") +
### customize plot
theme(
### change axis title x and y
axis.title.x = element_text(family = "mono", color = "deepskyblue4"),
axis.title.y = element_text(family = "mono", color = "deepskyblue4"),
### change title
plot.title = element_text(face = "italic", size = 20, color = "indianred"),
### legend title
legend.title = element_text(face = "italic", family = "mono", size = 12),
### edit legend position
legend.position = c(1,1),
legend.justification = c("right", "top"),
legend.background = element_rect(color = "black"),
legend.key = element_rect(color = "black")
)
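Because theme() returns an object, the same customizations can be stored once and added to several plots. A minimal sketch, built on the state.x77 matrix with its original column names; my_theme, p1 and p2 are illustrative names.

```r
library(ggplot2)

states <- as.data.frame(state.x77)

# Store the customizations once
my_theme <- theme(
  plot.title = element_text(face = "italic", size = 16),
  axis.title = element_text(family = "mono"),
  legend.position = "bottom"
)

# Reuse them on two different plots
p1 <- ggplot(states, aes(x = Income, y = Illiteracy)) +
  geom_point() +
  ggtitle("Income vs. illiteracy") +
  my_theme

p2 <- ggplot(states, aes(x = Income, y = Murder)) +
  geom_point() +
  ggtitle("Income vs. murder rate") +
  my_theme
```

Keeping the theme in one object means a later change, such as a different title size, only has to be made in one place.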
There are several built-in themes in ggplot2. Here is a
helpful link: https://r-charts.com/ggplot2/themes/
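A complete theme is added to a plot like any other layer. The sketch below applies three of the ggplot2 themes to the same base plot; base_plot and the p_* names are illustrative.

```r
library(ggplot2)

states <- as.data.frame(state.x77)
base_plot <- ggplot(states, aes(x = Income, y = Illiteracy)) +
  geom_point()

# Each call returns the same plot with a different complete theme
p_minimal <- base_plot + theme_minimal()
p_classic <- base_plot + theme_classic()
p_bw      <- base_plot + theme_bw(base_size = 14)  # base_size rescales all text
```

A call to theme() placed after a complete theme still works, so a built-in theme can serve as a starting point for further edits.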
Here are themes from ggplot2:::
The following are themes from the ggpubr:: package:
Here are themes from sjPlot:: and unikn::
packages:
Plots can be annotated with
geom_text() or with the annotate() function.
Let’s see some examples.
Imagine we want to annotate the mean of illiteracy for
both high and low income states. Here is how you can do it with
geom_text()
df %>%
ggplot(aes(x = income2, y = illiteracy, fill = income2)) +
geom_boxplot() +
scale_fill_manual(values = c("#0073C2", "#EFC000")) +
theme_unikn() +
geom_text(data = summary_illiteracy, aes(x = income2, y = mean, label = mean), vjust = -0.5, size = 5)
And here is how you can use annotate
df %>%
ggplot(aes(x = income2, y = illiteracy, fill = income2)) +
geom_boxplot() +
scale_fill_manual(values = c("#0073C2", "#EFC000")) +
theme_unikn() +
annotate("label", x = summary_illiteracy$income2, y = summary_illiteracy$mean, label = summary_illiteracy$mean)
With annotate you can choose between “text” and “label”
df %>%
ggplot(aes(x = income2, y = illiteracy, fill = income2)) +
geom_boxplot() +
scale_fill_manual(values = c("#0073C2", "#EFC000")) +
theme_unikn() +
annotate("text", x = summary_illiteracy$income2, y = summary_illiteracy$mean, label = summary_illiteracy$mean)
And you can edit the text color, the label fill and the text size with the color,
fill and size arguments
df %>%
ggplot(aes(x = income2, y = illiteracy, fill = income2)) +
geom_boxplot() +
scale_fill_manual(values = c("#0073C2", "#EFC000")) +
theme_unikn() +
annotate("label", x = summary_illiteracy$income2, y = summary_illiteracy$mean, label = summary_illiteracy$mean, fill = "grey", size = 6)
For more information run the following code:
help(annotate)
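The labels in these examples come from a small summary table with one row per group. As a sketch of how such a table could be built and passed to annotate(), the code below uses a median split on Income as a stand-in for income2 and rounds the mean so the label stays short. summary_sketch is an illustrative name and is not the tutorial's summary_illiteracy object.

```r
library(dplyr)
library(ggplot2)

states <- as.data.frame(state.x77) %>%
  # Assumption: median split on Income as a stand-in for income2
  mutate(income2 = if_else(Income > median(Income), "high", "low"))

# One row per group, mean rounded so the label stays short
summary_sketch <- states %>%
  group_by(income2) %>%
  summarize(mean = round(mean(Illiteracy), 2), .groups = "drop")

p_annot <- ggplot(states, aes(x = income2, y = Illiteracy, fill = income2)) +
  geom_boxplot() +
  annotate("label",
           x = summary_sketch$income2,
           y = summary_sketch$mean,
           label = summary_sketch$mean)

p_annot
```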
ggpubr::ggarrange() is used to arrange multiple plots together
Usage:
ggarrange(
...,
plotlist = NULL,
ncol = NULL,
nrow = NULL,
labels = NULL,
label.x = 0,
label.y = 1,
hjust = -0.5,
vjust = 1.5,
font.label = list(size = 14, color = "black", face = "bold", family = NULL),
align = c("none", "h", "v", "hv"),
widths = 1,
heights = 1,
legend = NULL,
common.legend = FALSE,
legend.grob = NULL
)
Example:
ggarrange(
df %>%
ggplot(aes(x = income2, y = illiteracy)) +
geom_violin(aes(fill = income2), alpha = .9) +
scale_fill_manual(values = c("#0073C2", "#EFC000")) +
theme_sjplot(),
df %>%
ggplot(aes(x = income2, y = illiteracy)) +
geom_violin(aes(fill = income2), alpha = .9) +
scale_fill_manual(values = c("#0073C2", "#EFC000")) +
theme_sjplot2(),
df %>%
ggplot(aes(x = income2, y = illiteracy)) +
geom_violin(aes(fill = income2), alpha = .9) +
scale_fill_manual(values = c("#0073C2", "#EFC000")) +
theme_unikn(),
df %>%
ggplot(aes(x = income2, y = illiteracy)) +
geom_violin(aes(fill = income2), alpha = .9) +
scale_fill_manual(values = c("#0073C2", "#EFC000")) +
theme_grau(),
common.legend = T,
### edit legend position
legend = "bottom"
)
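For a smaller starting point, the sketch below arranges just two plots side by side and tags them with the labels argument. It assumes ggpubr is installed; p_a, p_b and arranged are illustrative names.

```r
library(ggplot2)
library(ggpubr)

states <- as.data.frame(state.x77)

p_a <- ggplot(states, aes(x = Income, y = Illiteracy)) + geom_point()
p_b <- ggplot(states, aes(x = Income, y = Murder)) + geom_point()

# Two plots in one row, tagged "A" and "B"
arranged <- ggarrange(p_a, p_b, ncol = 2, nrow = 1, labels = c("A", "B"))

arranged
```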
If the plots share the same x and y axes, you can remove the axis titles from the
individual plots using xlab() and ylab() and add
them once, after the plots are arranged, with the function
annotate_figure()
annotate_figure(
p,
top = NULL,
bottom = NULL,
left = NULL,
right = NULL,
fig.lab = NULL,
fig.lab.pos = c("top.left", "top", "top.right", "bottom.left", "bottom",
"bottom.right"),
fig.lab.size,
fig.lab.face
)
Example:
ggarrange(
df %>%
ggplot(aes(x = income2, y = illiteracy)) +
geom_violin(aes(fill = income2), alpha = .9) +
scale_fill_manual(values = c("#0073C2", "#EFC000")) +
ylab(" ") + xlab(" ") +
theme_sjplot2(),
df %>%
ggplot(aes(x = income2, y = illiteracy)) +
geom_violin(aes(fill = income2), alpha = .9) +
scale_fill_manual(values = c("#0073C2", "#EFC000")) +
ylab(" ") + xlab(" ") +
theme_sjplot2(),
df %>%
ggplot(aes(x = income2, y = illiteracy)) +
geom_violin(aes(fill = income2), alpha = .9) +
scale_fill_manual(values = c("#0073C2", "#EFC000")) +
ylab(" ") + xlab(" ") +
theme_sjplot2(),
df %>%
ggplot(aes(x = income2, y = illiteracy)) +
geom_violin(aes(fill = income2), alpha = .9) +
scale_fill_manual(values = c("#0073C2", "#EFC000")) +
ylab(" ") + xlab(" ") +
theme_sjplot2(),
common.legend = T,
### edit legend position
legend = "right"
) -> p
annotate_figure(p,
top = text_grob("annotate_figure()", color = "black", face = "bold", size = 14),
bottom = text_grob("Income", color = "darkgrey", face = "italic"),
left = text_grob("Illiteracy", color = "darkgrey", face = "italic", rot = 90))
Data
| index_code | expenditure_on_education_pct_gdp | mortality_rate_infant | gini_index | gdp_per_capita_ppp | inflation_consumer_prices | intentional_homicides | unemployment | gross_fixed_capital_formation | population_density | suicide_mortality_rate | tax_revenue | taxes_on_income_profits_capital | alcohol_consumption_per_capita | government_health_expenditure_pct_gdp | urban_population_pct_total | country | time | sex | rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AUS-2003 | 5.246357 | 4.9 | 33.5 | 30121.82 | 2.732596 | 1.533073 | 5.933 | 26.05029 | 2.567035 | 10.5 | 24.29997 | 62.72655 | NA | 5.623778 | 84.343 | AUS | 2003 | BOY | 527 |
| AUS-2003 | 5.246357 | 4.9 | 33.5 | 30121.82 | 2.732596 | 1.533073 | 5.933 | 26.05029 | 2.567035 | 10.5 | 24.29997 | 62.72655 | NA | 5.623778 | 84.343 | AUS | 2003 | GIRL | 522 |
| AUS-2003 | 5.246357 | 4.9 | 33.5 | 30121.82 | 2.732596 | 1.533073 | 5.933 | 26.05029 | 2.567035 | 10.5 | 24.29997 | 62.72655 | NA | 5.623778 | 84.343 | AUS | 2003 | TOT | 524 |
| AUS-2006 | 4.738430 | 4.7 | NA | 34846.72 | 3.555288 | 1.372940 | 4.785 | 27.78913 | 2.662089 | 10.6 | 24.51177 | 65.23156 | NA | 5.719998 | 84.700 | AUS | 2006 | BOY | 527 |
| AUS-2006 | 4.738430 | 4.7 | NA | 34846.72 | 3.555288 | 1.372940 | 4.785 | 27.78913 | 2.662089 | 10.6 | 24.51177 | 65.23156 | NA | 5.719998 | 84.700 | AUS | 2006 | GIRL | 513 |
| AUS-2006 | 4.738430 | 4.7 | NA | 34846.72 | 3.555288 | 1.372940 | 4.785 | 27.78913 | 2.662089 | 10.6 | 24.51177 | 65.23156 | NA | 5.719998 | 84.700 | AUS | 2006 | TOT | 520 |
Dataset structure
index_code: Index code.
expenditure_on_education_pct_gdp: Expenditure on
education as a percentage of GDP.
mortality_rate_infant: Infant mortality
rate.
gini_index: Gini index.
gdp_per_capita_ppp: GDP per capita in terms of
purchasing power parity.
inflation_consumer_prices: Consumer price
inflation.
intentional_homicides: Intentional
homicides.
unemployment: Unemployment rate.
gross_fixed_capital_formation: Gross fixed capital
formation as a percentage of GDP.
population_density: Population density.
suicide_mortality_rate: Suicide mortality
rate.
tax_revenue: Tax revenue.
taxes_on_income_profits_capital: Taxes on income,
profits, and capital gains.
alcohol_consumption_per_capita: Total alcohol
consumption per capita.
government_health_expenditure_pct_gdp: Government
health expenditure as a percentage of GDP.
urban_population_pct_total: Urban population
percentage of the total population.
country: Country.
time: Years.
sex: Sex.
rating: Value of PISA (Programme for International
Student Assessment) Results.
Plot the relationship between
alcohol_consumption_per_capita and suicide_mortality_rate
The purpose of this exercise is to select the right plot. Here, the two variables are continuous.
df %>%
ggplot(aes(x = suicide_mortality_rate, y = alcohol_consumption_per_capita)) +
geom_point()
Plot the
government_health_expenditure_pct_gdp over the years (time) for Australia only (AUS)
The purpose of this exercise is to select the right
plot. Here, the two variables are continuous.
However, we have time data, so we want to see the trend
of government_health_expenditure_pct_gdp over time.
### Run this
df %>%
filter(country == "AUS") -> ex.a2
ex.a2 %>%
ggplot(aes(x = time, y = government_health_expenditure_pct_gdp)) +
geom_point() +
geom_line()
Plot the PISA
rating by sex
The purpose of this exercise is to select the right plot. Here, we have a categorical variable and a continuous variable. What is the best type of plot?
Note: You do not have to plot the “TOT” level from
the sex column. To do this, add the following code before
the ggplot function:
df %>% filter(sex != "TOT").
ggarrange(
df %>%
filter(sex != "TOT") %>%
ggplot(aes(x = sex, y = rating)) +
geom_boxplot(),
df %>%
filter(sex != "TOT") %>%
ggplot(aes(x = sex, y = rating)) +
geom_violin()
)
Plot the distribution of PISA
rating
df %>%
ggplot(aes(x = rating)) +
geom_density()
Plot the distribution of
population_density for Italy (ITA) and Germany (DEU)
Note: Add the following code before the ggplot() function:
df %>% filter(country %in% c("ITA", "DEU"))
df %>%
filter(country %in% c("ITA", "DEU")) %>%
ggplot(aes(x = population_density, fill = country)) +
geom_density()
Plot the relationship between
expenditure_on_education_pct_gdp and rating using geom_point()
df %>%
ggplot(aes(x = expenditure_on_education_pct_gdp, y = rating)) +
geom_point(color = "indianred")
Plot the PISA
rating over the years (time) for Italy only (ITA) by sex using geom_point() and geom_line()
Note: To leave out the “TOT” level of sex, add the following code before the ggplot function:
df %>% filter(country == "ITA" & sex != "TOT")
df %>%
filter(country == "ITA" & sex != "TOT") %>%
ggplot(aes(x = time, y = rating, color = sex, pch = sex, linetype = sex)) +
geom_point(size = 3) +
geom_line(size = .5) +
scale_color_manual(values = c("indianred3", "deepskyblue4"))
Plot the PISA
rating by country in 2018 using geom_point()
Hint: Use theme() to rotate the x-axis labels, and add the following code before the ggplot function:
df %>% filter(sex == "TOT" & time == 2018)
df %>%
filter(sex == "TOT" & time == 2018) %>%
ggplot(aes(x = country, y = rating, color = rating)) +
geom_point() +
xlab("") + ylab("") +
theme(
axis.text.x = element_text(angle = 90)
)
Plot the distribution of PISA
rating and expenditure_on_education_pct_gdp in 2015 with geom_density_ridges()
Hint: Remove the legend using theme()
Note: You have to edit the df first, so run the following code
### Run this code
df %>%
filter(time == 2015) %>%
distinct(country, .keep_all = T) %>%
dplyr::select(expenditure_on_education_pct_gdp, rating) %>%
pivot_longer(names_to = "measure", values_to = "value", 1:2) %>%
mutate(value = scale(value)) -> ex.B4
ex.B4 %>%
ggplot(aes(x = value, y = measure, fill = measure)) +
geom_density_ridges(alpha = .5) +
xlab("") + ylab("") +
theme(legend.position = "none")
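When several variables are stacked with pivot_longer(), it matters whether scale() is applied to each variable separately or to the pooled value column. As a variation on the preparation step above, the sketch below standardizes within each measure. It uses a small made-up data frame (toy, toy_long and the numbers in them are purely illustrative, not PISA data) and assumes dplyr 1.1.0 or later for the .by argument.

```r
library(dplyr)
library(tidyr)

# Made-up values, for illustration only
toy <- tibble(
  rating = c(480, 500, 520, 540),
  expenditure = c(4.0, 4.5, 5.0, 5.5)
)

toy_long <- toy %>%
  pivot_longer(cols = everything(), names_to = "measure", values_to = "value") %>%
  # scale() returns a one-column matrix, so convert it back to a plain vector
  mutate(value = as.numeric(scale(value)), .by = measure)

# Check the mean and standard deviation within each measure
toy_long %>%
  summarize(mean = mean(value), sd = sd(value), .by = measure)
```

Standardizing per measure puts both variables on a comparable scale, which is what makes their ridgelines directly comparable.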
Plot the mean and standard deviation of
expenditure_on_education_pct_gdp for Italy and Germany using geom_point() and geom_errorbar()
### Run this code
df %>%
filter(country == "ITA" | country == "DEU") %>%
summarize(
mean = mean(expenditure_on_education_pct_gdp),
sd = sd(expenditure_on_education_pct_gdp),
.by = "country"
) -> ex.b5
ex.b5 %>%
ggplot(aes(x = country, y = mean, color = country, linetype = country)) +
geom_point(size = 4) +
geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = .1, size = .6) +
ylim(3, 6)
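Some columns of this dataset contain NA (gini_index in the preview above, for example), and mean() and sd() return NA as soon as one value is missing. If that happens for the variable being summarized, na.rm = TRUE is one way to handle it. A minimal sketch on made-up numbers; toy_exp, toy_summary and p_err are illustrative names, not the real data.

```r
library(dplyr)
library(ggplot2)

# Made-up values with one missing entry, for illustration only
toy_exp <- tibble(
  country = c("ITA", "ITA", "ITA", "DEU", "DEU", "DEU"),
  expenditure = c(4.1, NA, 4.5, 4.8, 5.0, 4.6)
)

toy_summary <- toy_exp %>%
  group_by(country) %>%
  summarize(
    mean = mean(expenditure, na.rm = TRUE),  # ignore missing values
    sd = sd(expenditure, na.rm = TRUE),
    .groups = "drop"
  )

p_err <- ggplot(toy_summary, aes(x = country, y = mean, color = country)) +
  geom_point(size = 4) +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = .1)

p_err
```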
Plot the distribution of the data on the plot of exercise B5
Hint: Use geom_jitter()
### Run this code
df %>%
filter(country == "ITA" | country == "DEU") %>%
summarize(
mean = mean(expenditure_on_education_pct_gdp),
sd = sd(expenditure_on_education_pct_gdp),
.by = "country"
) -> challenge
challenge %>%
ggplot(aes(x = country, y = mean, color = country, linetype = country)) +
geom_point(size = 4) +
geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = .1, size = .6) +
ylim(3, 6) +
geom_jitter(data = df %>% filter(country %in% c("ITA", "DEU")), aes(x = country, y = expenditure_on_education_pct_gdp), alpha = .4) +
scale_color_manual(values = c("orange", "deepskyblue4"))
Plot the distribution of
government_health_expenditure_pct_gdp and mortality_rate_infant for the following countries: Australia, United States and Canada. Use geom_density_ridges()
Note: government_health_expenditure_pct_gdp and
mortality_rate_infant should be scaled
### Run this code
df %>%
dplyr::select(country, mortality_rate_infant, government_health_expenditure_pct_gdp) %>%
mutate(across(2:3, ~scale(.x))) %>%
filter(country %in% c("AUS", "USA", "CAN")) %>%
pivot_longer(names_to = "measure", values_to = "value", 2:3) -> ex.c1
ex.c1 %>%
ggplot(aes(x = value, y = country, fill = measure)) +
geom_density_ridges(alpha = .8) +
scale_fill_manual(values = c("orange", "deepskyblue3")) +
ggtitle("Distribution plot") +
xlab("") + ylab("")
In the plot of exercise C1, remove the legend using
theme() and manually write it inside the plot using annotate()
Note: government_health_expenditure_pct_gdp and
mortality_rate_infant should be scaled
ex.c1 %>%
ggplot(aes(x = value, y = country, fill = measure)) +
geom_density_ridges(alpha = .8) +
scale_fill_manual(values = c("orange", "deepskyblue3")) +
ggtitle("Distribution plot") +
xlab("") + ylab("") +
### C2
theme(legend.position = "none") +
annotate("label", x = 1, y = 4, label = "Government health expenditure pct gdp", fill = "orange") +
annotate("label", x = 1, y = 4.25, label = "Mortality rate infant", fill = "deepskyblue3")
Plot the distribution of all continuous variables in 2018 with
geom_histogram() and display them in a single plot using ggarrange()
### Run this code
df %>%
filter(time == 2018) %>%
dplyr::select(2:16, 20) %>%
mutate_all(~scale(.x)) -> ex.c3
ggarrange(
ex.c3 %>%
ggplot(aes(x = expenditure_on_education_pct_gdp)) +
geom_histogram() + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = mortality_rate_infant)) +
geom_histogram() + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = gini_index)) +
geom_histogram() + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = gdp_per_capita_ppp)) +
geom_histogram() + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = inflation_consumer_prices)) +
geom_histogram() + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = intentional_homicides)) +
geom_histogram() + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = unemployment)) +
geom_histogram() + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = gross_fixed_capital_formation)) +
geom_histogram() + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = population_density)) +
geom_histogram() + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = suicide_mortality_rate)) +
geom_histogram() + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = tax_revenue)) +
geom_histogram() + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = taxes_on_income_profits_capital)) +
geom_histogram() + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = government_health_expenditure_pct_gdp)) +
geom_histogram() + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = urban_population_pct_total)) +
geom_histogram() + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = rating)) +
geom_histogram() + theme(axis.title.x = element_text(size = 7)) + ylab(" ")
)
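The fifteen near-identical blocks above can also be generated from a vector of column names. The sketch below shows the idea on four columns of the built-in state.x77 matrix: lapply() builds one histogram per name, and the resulting list is passed to ggarrange() through its plotlist argument. states, vars, hist_list and grid_sketch are illustrative names, and bins = 15 is an arbitrary choice.

```r
library(ggplot2)
library(ggpubr)

states <- as.data.frame(state.x77)
vars <- c("Income", "Illiteracy", "Murder", "Frost")

# Build one histogram per column name; .data[[v]] looks the column up by string
hist_list <- lapply(vars, function(v) {
  ggplot(states, aes(x = .data[[v]])) +
    geom_histogram(bins = 15) +
    xlab(v) + ylab(" ") +
    theme(axis.title.x = element_text(size = 7))
})

# Arrange the list in a 2 x 2 grid
grid_sketch <- ggarrange(plotlist = hist_list, ncol = 2, nrow = 2)

grid_sketch
```

With this pattern, changing the fill color or the theme of every histogram is a one-line edit inside the function.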
Use the plot of exercise C3 and change the color of the histograms, and annotate the final arranged grid adding a caption using
annotate_figure()
ggarrange(
ex.c3 %>%
ggplot(aes(x = expenditure_on_education_pct_gdp)) +
geom_histogram(fill = "indianred3") + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = mortality_rate_infant)) +
geom_histogram(fill = "indianred3") + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = gini_index)) +
geom_histogram(fill = "indianred3") + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = gdp_per_capita_ppp)) +
geom_histogram(fill = "indianred3") + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = inflation_consumer_prices)) +
geom_histogram(fill = "indianred3") + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = intentional_homicides)) +
geom_histogram(fill = "indianred3") + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = unemployment)) +
geom_histogram(fill = "indianred3") + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = gross_fixed_capital_formation)) +
geom_histogram(fill = "indianred3") + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = population_density)) +
geom_histogram(fill = "indianred3") + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = suicide_mortality_rate)) +
geom_histogram(fill = "indianred3") + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = tax_revenue)) +
geom_histogram(fill = "indianred3") + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = taxes_on_income_profits_capital)) +
geom_histogram(fill = "indianred3") + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = government_health_expenditure_pct_gdp)) +
geom_histogram(fill = "indianred3") + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = urban_population_pct_total)) +
geom_histogram(fill = "indianred3") + theme(axis.title.x = element_text(size = 7)) + ylab(" "),
ex.c3 %>%
ggplot(aes(x = rating)) +
geom_histogram(fill = "indianred3") + theme(axis.title.x = element_text(size = 7)) + ylab(" ")
) %>%
annotate_figure(bottom = text_grob("Exercise C3. Distribution plot", color = "black", face = "italic", size = 10, hjust = 1.5))
Plot the
mean value of rating by country with a bar chart. Use geom_bar(stat = "identity")
Hint: Fill the bars by country
### Run this code first
df %>%
filter(sex == "TOT") %>%
summarize(
mean = round(mean(rating),2),
.by = c("country")
) -> ex.c5
ex.c5 %>%
ggplot(aes(x = country, y = mean, fill = country)) +
geom_bar(stat = "identity", color = "black") +
theme(axis.text.x = element_text(size = 8, angle = 90, face = "bold"),
legend.position = "none") +
xlab(" ") + ylab(" ") +
ggtitle("Mean value of rating")
Use the plot of exercise C5 and annotate the
mean value as a label above each column using annotate()
Hint: Use the values stored in ex.c5
ex.c5 %>%
ggplot(aes(x = country, y = mean, fill = country)) +
geom_bar(stat = "identity", color = "black", width = .5) +
theme(axis.text.x = element_text(size = 8, angle = 90, face = "bold"),
legend.position = "none") +
xlab(" ") + ylab(" ") +
ggtitle("Mean value of rating") +
annotate("label",
x = ex.c5$country,
y = ex.c5$mean,
label = ex.c5$mean,
size = 2)
Plot the change in the trend of
unemployment in Germany, France, Ireland and Italy over time using geom_point() and geom_line().
The trend for each country should be displayed in a separate panel using
facet_wrap().
### Run this code first
df %>%
filter(country %in% c("ITA", "DEU", "FRA", "IRL")) %>%
summarize(
unemployment = unemployment,
unemployement.mean = round(mean(unemployment),0),
sd.un = sd(unemployment),
suic.mean = round(mean(suicide_mortality_rate),0),
sd.su = sd(suicide_mortality_rate),
.by = c("country", "time")
) -> ex.c7
ex.c7 %>%
ggplot(aes(x = time, y = unemployment, fill = country, color = country, pch = country, linetype = country)) +
geom_point(size = 3) +
geom_line(size = 1) +
theme(legend.position = "none") +
scale_color_manual(values = c("indianred3", "darkgreen", "orange", "deepskyblue4")) +
facet_wrap(~country)
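To try facet_wrap() without the exercise data, the sketch below uses economics_long, a long-format dataset that ships with ggplot2, and draws one panel per variable. p_facets is an illustrative name, and scales = "free_y" is an optional choice that gives each panel its own y range.

```r
library(ggplot2)

# economics_long ships with ggplot2: one row per date and variable
p_facets <- ggplot(economics_long, aes(x = date, y = value)) +
  geom_line() +
  # One panel per variable, two columns, independent y axes
  facet_wrap(~variable, scales = "free_y", ncol = 2)

p_facets
```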
In a plot similar to the one from exercise C7, annotate the mean of
unemployment at each time point for each country. Use ggarrange() here.
Hint: Add the title and the axis label with annotate_figure().
### Run this code first
df %>%
filter(country %in% c("ITA", "DEU", "FRA", "IRL")) %>%
summarize(
unemployment = unemployment,
unemployement.mean = round(mean(unemployment),0),
sd.un = sd(unemployment),
suic.mean = round(mean(suicide_mortality_rate),0),
sd.su = sd(suicide_mortality_rate),
.by = c("country", "time")
) -> ex.c8
ggarrange(
ex.c8 %>%
filter(country == "ITA") %>%
ggplot(aes(x = time, y = unemployment)) +
geom_point(size = 3, pch = 16, color = "indianred3") +
geom_line(size = 1, linetype = 1, color = "indianred3") +
theme(legend.position = "none") +
annotate("label", x = ex.c8$time[ex.c8$country == "ITA"], y = ex.c8$unemployement.mean[ex.c8$country == "ITA"], label = ex.c8$unemployement.mean[ex.c8$country == "ITA"], color = "black", size = 2) +
annotate("label", x = 2017, y = 8, label = "Italy", color = "indianred3", size = 3) +
xlab(" ") + ylab(" "),
ex.c8 %>%
filter(country == "DEU") %>%
ggplot(aes(x = time, y = unemployment)) +
geom_point(size = 3, pch = 16, color = "darkgreen") +
geom_line(size = 1, linetype = 3, color = "darkgreen") +
theme(legend.position = "none") +
annotate("label", x = ex.c8$time[ex.c8$country == "DEU"], y = ex.c8$unemployement.mean[ex.c8$country == "DEU"], label = ex.c8$unemployement.mean[ex.c8$country == "DEU"], color = "black", size = 2) +
annotate("label", x = 2017, y = 10, label = "Germany", color = "darkgreen", size = 3) +
xlab(" ") + ylab(" "),
ex.c8 %>%
filter(country == "FRA") %>%
ggplot(aes(x = time, y = unemployment)) +
geom_point(size = 3, pch = 16, color = "orange") +
geom_line(size = 1, linetype = 16, color = "orange") +
theme(legend.position = "none") +
annotate("label", x = ex.c8$time[ex.c8$country == "FRA"], y = ex.c8$unemployement.mean[ex.c8$country == "FRA"], label = ex.c8$unemployement.mean[ex.c8$country == "FRA"], color = "black", size = 2) +
annotate("label", x = 2017, y = 8.5, label = "France", color = "orange", size = 3) +
xlab(" ") + ylab(" "),
ex.c8 %>%
filter(country == "IRL") %>%
ggplot(aes(x = time, y = unemployment)) +
geom_point(size = 3, pch = 16, color = "deepskyblue4") +
geom_line(size = 1, linetype = 11, color = "deepskyblue4") +
theme(legend.position = "none") +
annotate("label", x = ex.c8$time[ex.c8$country == "IRL"], y = ex.c8$unemployement.mean[ex.c8$country == "IRL"], label = ex.c8$unemployement.mean[ex.c8$country == "IRL"], color = "black", size = 2) +
annotate("label", x = 2017, y = 14, label = "Ireland", color = "deepskyblue4", size = 3) +
xlab(" ") + ylab(" ")
) %>%
annotate_figure(
top = text_grob(label = "Trend of unemployment", face = "bold", hjust = 1.7),
left = text_grob(label = "unemployment", face = "italic", rot = 90, size = 10)
)
Plot the change in
taxes_on_income_profits_capital based on gdp_per_capita_ppp in 2018 using geom_smooth().
Plot the distribution of the data using
geom_jitter().
### Run this code
df %>%
mutate(taxes_on_income_profits_capital = scale(taxes_on_income_profits_capital),
gdp_per_capita_ppp = scale(gdp_per_capita_ppp)) %>%
filter(time == 2018) -> ex.c9
ex.c9 %>%
ggplot(aes(x = gdp_per_capita_ppp, y = taxes_on_income_profits_capital)) +
geom_jitter(color = "orange", alpha = .4) +
geom_smooth(method = "lm", color = "deepskyblue4", fill = "lightgrey")
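Using
geom_boxplot(), plot rating over time.
Hint: Show the distribution of the data with geom_jitter(), mark the mean of rating using geom_hline(), and add the following code before ggplot:
df %>% mutate(time = as.factor(time))
df %>%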
mutate(time = as.factor(time)) %>%
ggplot(aes(x = time, y = rating)) +
geom_boxplot(aes(fill = time), alpha = .1) +
geom_jitter(aes(color = time), size = 2, alpha = .4) +
geom_hline(yintercept = mean(df$rating, na.rm = TRUE), size = 1, color = "indianred3", linetype = 2) +
scale_color_manual(values = c("#F3A360", "#EFB27E", "#DFCEBA", "#BEC8CC", "#7FAFD2", "#5082B0")) +
scale_fill_manual(values = c("#F3A360", "#EFB27E", "#DFCEBA", "#BEC8CC", "#7FAFD2", "#5082B0")) +
xlab(" ") + ylab("PISA rating") +
theme(legend.position = "none")
- Plot where the PISA ratings were collected in 2018 on a map using
geom_polygon()
# Run this code first
map_data("world") -> world
country_mapping <- c(
AUS = "Australia", AUT = "Austria", BEL = "Belgium", CAN = "Canada", CHL = "Chile",
COL = "Colombia", CRI = "Costa Rica", CZE = "Czech Republic", DNK = "Denmark", EST = "Estonia",
FIN = "Finland", FRA = "France", DEU = "Germany", GRC = "Greece", HUN = "Hungary", ISL = "Iceland",
IRL = "Ireland", ISR = "Israel", ITA = "Italy", JPN = "Japan", KOR = "South Korea", LVA = "Latvia",
LTU = "Lithuania", LUX = "Luxembourg", MEX = "Mexico", NLD = "Netherlands", NZL = "New Zealand",
NOR = "Norway", POL = "Poland", PRT = "Portugal", SVK = "Slovakia", SVN = "Slovenia", ESP = "Spain",
SWE = "Sweden", CHE = "Switzerland", TUR = "Turkey", USA = "USA", GBR = "UK", # map_data("world") names these regions "USA" and "UK"
BRA = "Brazil"
)
# Add full country name column
df %>%
mutate(full_country_name = country_mapping[country]) -> df
# Join data frames
df %>%
filter(sex == "TOT" & time == 2018) %>%
rename(region = full_country_name) %>%
dplyr::select(region, rating) -> df.map.2018
common.cols <- intersect(names(df.map.2018), names(world))
left_join(world, df.map.2018, by = common.cols) -> df.map.2018
df.map.2018 %>%
mutate(rating = if_else(is.na(rating), 0, rating)) %>%
ggplot(aes(x = long, y = lat, group = group, fill = rating)) +
geom_polygon(color = "black") +
scale_fill_gradient2(low = "white", mid = "lightgrey", high = "orange") +
ggtitle("PISA rating in 2018") +
labs(x = "", y = "")
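Because the join matches on the region column, a country whose name is spelled differently in map_data("world") ends up with NA instead of a rating and gives no warning. One way to catch this, sketched below, is to compare the names against the regions available in the map data before joining. The vector wanted is a hypothetical list of names to check, and the code assumes the maps package is installed, since map_data() relies on it.

```r
library(ggplot2)

# map_data("world") needs the maps package to be installed
world_regions <- unique(map_data("world")$region)

# Hypothetical list of names to check before joining
wanted <- c("Italy", "Germany", "United States", "United Kingdom")

# Names returned here have no polygon to join to
unmatched <- setdiff(wanted, world_regions)
unmatched
```

Any name that shows up in unmatched needs to be replaced by the spelling used in world_regions.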
Useful links:
- tidyverse functions: https://dplyr.tidyverse.org/articles/dplyr.html
- ggplot2 package documentation: https://ggplot2.tidyverse.org/reference/
- ggplot2 themes: https://r-charts.com/ggplot2/themes/
- Function documentation in R (help("name function"))