GraphPad Prism surely is a great tool for preparing clean, neat, publication-quality figures. But it is not free. Once the 30-day trial period expired and lockdown kicked in, the motivation was obvious. R’you there for me? Below is the first part of a simple tutorial on how to prepare GraphPad-like figures in R using the ggplot2 package.
The example is based on the built-in dataset ChickWeight, which records the effect of diet on the early growth of chicks. Only one time point (weight at 20 days) will be used for visualization.
library(dplyr)
chicken_day20 <- ChickWeight %>%
mutate(Diet = paste0("diet_", Diet)) %>% # fix diet names, so they don't start with a number
rename(Weight = weight) %>% # fix variable names (Capitalized)
filter(Time == 20) %>% # filter 20-days old chicks
select(-Time)
head(chicken_day20)
## Weight Chick Diet
## 1 199 1 diet_1
## 2 209 2 diet_1
## 3 198 3 diet_1
## 4 160 4 diet_1
## 5 220 5 diet_1
## 6 160 6 diet_1
The dataset is nice, simple and tidy (if you are not familiar with what that means, see https://vita.had.co.nz/papers/tidy-data.pdf).
GraphPad unfortunately does not benefit from the tidy data format. The following code reshapes the data so it can be easily imported into GraphPad. Note that the values become column names (diet_1, diet_2, etc.). That is why the values of the Diet column (a factor) were changed to character a moment ago: we don’t want column names starting with numbers!
library(tidyr)
chicken_day20_wide <- chicken_day20 %>%
pivot_wider(names_from = Diet, values_from = Weight)
head(chicken_day20_wide)
## # A tibble: 6 x 5
## Chick diet_1 diet_2 diet_3 diet_4
## <ord> <dbl> <dbl> <dbl> <dbl>
## 1 1 199 NA NA NA
## 2 2 209 NA NA NA
## 3 3 198 NA NA NA
## 4 4 160 NA NA NA
## 5 5 220 NA NA NA
## 6 6 160 NA NA NA
Note how the data is structured. In GraphPad (table format: Column), replicates are in rows of the same column, so every column (except Chick) represents a different group (a condition, or in this case a type of diet). In other words, column headers are values, not variable names (the first feature of messy data!). R is doing its best to keep the data tidy: one observation per row. That is why every chick still has its own row. But each chick was fed only one type of diet, so its weight measurement belongs to a single diet column. Thus, NAs fill the remaining cells. Since this can already be imported into GraphPad, let's just save the file:
write.csv(chicken_day20_wide, "chicken_wide.csv")
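One caveat when saving: by default write.csv() also writes the row names as an extra first column and spells missing values as the literal text NA. Below is a minimal sketch of a variant that leaves those cells empty instead, assuming blank cells are easier to handle on import (I have not verified how GraphPad treats the text NA). The small `wide` table, its values and the temporary file path are made up for illustration.

```r
# Toy wide table standing in for chicken_day20_wide (values are illustrative)
wide <- data.frame(
  Chick  = c(1, 2, 21),
  diet_1 = c(199, 209, NA),
  diet_2 = c(NA, NA, 331)
)

# Write to a temporary file so the sketch does not touch the working directory
out <- file.path(tempdir(), "chicken_wide_blank.csv")

# row.names = FALSE drops the extra index column; na = "" writes empty cells
write.csv(wide, out, row.names = FALSE, na = "")

# Inspect the raw lines of the file that was written
csv_lines <- readLines(out)
csv_lines
```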
After importing into GraphPad (copy-paste from Excel or any other method), I plotted the data with minor adjustments, namely:
Apart from columns (mean value) I added individual data points for better data representation
I changed column fill to yellow and removed column borders
I added both-direction SD error bars (instead of up only)
GraphPad default figure (with slight adjustments)
I used ggplot2 to plot the data in R, similarly mapping the following for each diet (x): the mean as a column, the SD as error bars, and the individual data points.
library(ggplot2)
ggplot(chicken_day20, aes(x = Diet, y = Weight)) +
stat_summary(fun = mean, geom = "bar") + # plot mean (column)
stat_summary(fun.data = mean_sdl,
fun.args = list(mult = 1),
geom = "errorbar") + # plot errorbars
geom_jitter() # plot individual points
Note that stat_summary was used to plot the data summary (mean and SD). It is just easier than creating a separate data frame with summarized data.
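For comparison, here is a rough sketch of the longer route that stat_summary saves us from: computing the mean and SD per diet in a separate data frame and plotting that. The names chicken_summary, mean_weight, sd_weight and p_manual are mine, and the data preparation is repeated so the block runs on its own. A side effect is that this version does not go through mean_sdl, which needs the Hmisc package to be installed.

```r
library(dplyr)
library(ggplot2)

# Same day-20 subset as above, rebuilt so this block is self-contained
chicken_day20 <- as.data.frame(ChickWeight) %>%
  filter(Time == 20) %>%
  transmute(Diet = paste0("diet_", Diet), Weight = weight)

# One row per diet: mean, SD and the error bar limits (mean +/- 1 SD)
chicken_summary <- chicken_day20 %>%
  group_by(Diet) %>%
  summarise(mean_weight = mean(Weight), sd_weight = sd(Weight), .groups = "drop") %>%
  mutate(ymin = mean_weight - sd_weight, ymax = mean_weight + sd_weight)

# Columns and error bars come from the summary, points from the raw data
p_manual <- ggplot(chicken_summary, aes(x = Diet, y = mean_weight)) +
  geom_col() +
  geom_errorbar(aes(ymin = ymin, ymax = ymax)) +
  geom_jitter(data = chicken_day20, aes(y = Weight))

chicken_summary
```

Two data frames and three layers that each need their own mapping, versus two stat_summary calls: that is the whole argument for the shortcut.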
Bloody hell, as accurate as ugly, huh?
Before we move on, let's work on a few details first:
Columns: fill = "skyblue" to fill the columns with a nice color, and width = 0.7 to make them less crowded
Error bars: width = 0.3
Points: a narrow jitter width (width = 0.1 in the code below)
labs: plot title, y-axis title and x-axis title
Also, assign the resulting plot to p.
line_size <- 1 # defining variable upfront as we will re-use it
p <- ggplot(chicken_day20, aes(x = Diet, y = Weight)) +
stat_summary(
fun = mean,
geom = "bar",
fill = "skyblue",
width = 0.7
) +
stat_summary(
fun.data = mean_sdl,
fun.args = list(mult = 1),
geom = "errorbar",
width = 0.3,
size = line_size
) +
geom_jitter(width = 0.1) +
labs(title = "Chicken weight", x = "", y = "Weight (grams)")
p
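Two things about geom_jitter are worth knowing here: the positions are random, so the points move on every re-draw, and by default it also jitters vertically, which slightly changes the plotted weights. Below is a small sketch of a reproducible, horizontal-only alternative using position_jitter(). The toy data frame and the seed value are arbitrary choices of mine.

```r
library(ggplot2)

# Toy data standing in for chicken_day20 (values are illustrative)
toy <- data.frame(
  Diet   = rep(c("diet_1", "diet_2"), each = 3),
  Weight = c(199, 209, 198, 331, 167, 175)
)

# height = 0 switches off vertical jitter; seed fixes the random offsets
p_jitter <- ggplot(toy, aes(x = Diet, y = Weight)) +
  geom_point(position = position_jitter(width = 0.1, height = 0, seed = 42))

# layer_data() returns the positions that would be drawn
first_draw  <- layer_data(p_jitter)
second_draw <- layer_data(p_jitter)

identical(first_draw$x, second_draw$x)  # same horizontal positions on every draw
all(first_draw$y %in% toy$Weight)       # weights are left untouched
```

In the plot above, that would mean swapping geom_jitter(width = 0.1) for the geom_point(...) layer shown here.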
Use theme_foundation() from the ggthemes library to modify the theme. The reason for using this theme is well explained in its help page (?theme_foundation):
This theme is designed to be a foundation from which to build new themes, and not meant to be used directly. theme_foundation() is a complete theme with only minimal number of elements defined. It is easier to create new themes by extending this one rather than theme_gray() or theme_bw(), because those themes define elements deep in the hierarchy.
I will keep base_size as a separate variable:
library(ggthemes)
base_size <- 12 # defining separately, same as for line_size
p <- p + theme_foundation(base_size = base_size, base_family = "sans")
p
p <- p + theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank(),
plot.background = element_blank() # that will remove border when using ggsave!
)
p
The size of the lines is controlled by the previously defined line_size.
p <- p + theme(
axis.line = element_line(colour="black", size = line_size),
axis.ticks = element_line(colour="black", size = line_size)
)
p
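A note on versions: from ggplot2 3.4.0 onwards, line thickness is set with linewidth, and using size for lines triggers a deprecation warning. Here is a hedged sketch that picks the argument name based on the installed version, so the same script runs quietly on both. The names line_arg, axis_line_args, axis_line and theme_axes are made up for this example.

```r
library(ggplot2)

line_size <- 1

# Pick the argument name understood by the installed ggplot2 version
line_arg <- if (packageVersion("ggplot2") >= "3.4.0") "linewidth" else "size"

# Build element_line(colour = "black", <line_arg> = line_size)
axis_line_args <- setNames(list("black", line_size), c("colour", line_arg))
axis_line <- do.call(element_line, axis_line_args)

# Same two theme settings as above, using the version-appropriate argument
theme_axes <- theme(axis.line = axis_line, axis.ticks = axis_line)
```

Adding theme_axes to the plot (p + theme_axes) would then stand in for the theme() call above.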
With base_size = 12, use +2 (14pt) for the plot title and -1 (11pt) for the axis text (but not the axis titles).
axis.text.x: rotated by 45 degrees
axis.text: face = "bold"
plot.title: hjust = 0.5, face = "bold"
# sizes relative to base_size: with base_size = 12, title_text_rel_size = +2 gives a 14pt title
axis_text_rel_size <- -1
title_text_rel_size <- +2
p <- p + theme(
text = element_text(colour = "black"),
plot.title = element_text(
face = "bold",
size = rel((title_text_rel_size + base_size) / base_size),
hjust = 0.5
),
axis.title = element_text(face = "bold", size = rel(1)),
axis.title.y = element_text(angle = 90, vjust = 2),
axis.title.x = element_text(vjust = -0.2),
axis.text = element_text(
face = "bold",
size = rel((axis_text_rel_size + base_size) / base_size)
),
axis.text.x = element_text(
angle = 45,
hjust = 1,
vjust = 1
)
)
p
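The rel() expressions look more complicated than they are. rel() scales a size relative to the parent element, so to add or subtract points the offset has to be turned into a ratio first. Here is the arithmetic, assuming the parent text size equals base_size (rel_factor, title_pt and axis_pt are helper names I made up):

```r
base_size <- 12
axis_text_rel_size <- -1
title_text_rel_size <- +2

# Turn a point offset into the ratio expected by rel()
rel_factor <- function(offset, base = base_size) (base + offset) / base

# Multiplying the ratio by the base size gives the resulting size in points
title_pt <- rel_factor(title_text_rel_size) * base_size  # plot title
axis_pt  <- rel_factor(axis_text_rel_size) * base_size   # axis text

round(c(title = title_pt, axis_text = axis_pt), 2)
```

Keeping the offsets in variables means that changing base_size later keeps the title 2pt larger and the axis text 1pt smaller than the base.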
p <- p + scale_y_continuous(limits = c(0, 400), expand = expansion(mult = c(0, 0)))
p
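Here expansion(mult = c(0, 0)) removes the default padding so the columns sit directly on the x-axis, and limits fixes the axis range. One thing to keep in mind is that scale limits discard values outside the range before stat_summary sees them, so a range that is too narrow would silently change the means, whereas coord_cartesian() only zooms. That is harmless as long as every weight fits within 0 to 400. Below is a toy demonstration; the three values are invented to make the effect obvious.

```r
library(ggplot2)

# One group with three invented values; 390 lies outside the 0-300 range used below
toy <- data.frame(Diet = "diet_1", Weight = c(100, 200, 390))

base_plot <- ggplot(toy, aes(x = Diet, y = Weight)) +
  stat_summary(fun = mean, geom = "bar")

# Scale limits: 390 is dropped before the mean is computed
clipped <- base_plot + scale_y_continuous(limits = c(0, 300))

# Coordinate limits: all values are kept, only the view is cropped
zoomed <- base_plot + coord_cartesian(ylim = c(0, 300))

# layer_data() exposes the bar heights computed by stat_summary
mean_clipped <- suppressWarnings(layer_data(clipped)$y)  # mean of 100 and 200
mean_zoomed  <- layer_data(zoomed)$y                     # mean of all three values

c(clipped = mean_clipped, zoomed = mean_zoomed)
```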
Use dimensions matching GraphPad for better comparison.
ggsave("figure_ggplot.png", plot = p, device = "png", dpi = 100, height = 8.16, width = 7.06, units = "cm")
library(ggplot2)
library(ggthemes)
line_size <- 1 # defining variable upfront as we will re-use it
base_size <- 12 # defining separately, same as for line_size
axis_text_rel_size <- -1
title_text_rel_size <- +2
ggplot(chicken_day20, aes(x = Diet, y = Weight)) +
# plot bars
stat_summary(fun = mean, geom = "bar", fill = "skyblue", width = 0.7) +
# plot error bars
stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1),
geom = "errorbar", width = 0.3, size = line_size) +
# plot individual points
geom_jitter(width = 0.1) +
# set scale limits
scale_y_continuous(limits = c(0, 400), expand = expansion(mult = c(0, 0))) +
# set labs
labs(title = "Chicken weight", x = "", y = "Weight (grams)") +
# theme
theme_foundation(base_size = base_size, base_family = "sans") +
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank(),
text = element_text(colour = "black"),
plot.title = element_text(face = "bold",
size = rel((title_text_rel_size + base_size) / base_size), hjust = 0.5),
axis.line = element_line(colour="black", size = line_size),
axis.ticks = element_line(colour="black", size = line_size),
axis.title = element_text(face = "bold", size = rel(1)),
axis.title.y = element_text(angle = 90, vjust = 2),
axis.title.x = element_text(vjust = -0.2),
axis.text = element_text(face = "bold", size = rel((axis_text_rel_size + base_size) / base_size)),
axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),
plot.background = element_blank()
)
Close enough?
Figure prepared with GraphPad
Figure prepared with ggplot