GraphPad Prism is surely a great tool for preparing clean, neat, publication-quality figures. But it is not free. Once the 30-day trial period expired and lockdown kicked in, the motivation was obvious. R’you there for me? Below is the first part of a simple tutorial on how to prepare GraphPad-like figures in R using the ggplot2 package.

Prepare dataset

The example is based on the built-in dataset ChickWeight, which records the effect of diet on the early growth of chicks. Only one time point (weight at 20 days) will be used for visualization.

Data for ggplot

library(dplyr)
chicken_day20 <- ChickWeight %>%
  mutate(Diet = paste0("diet_", Diet)) %>% # fix diet names, so they don't start with a number
  rename(Weight = weight) %>% # fix variable names (Capitalized)
  filter(Time == 20) %>% # keep only 20-day-old chicks
  select(-Time)
head(chicken_day20)
##   Weight Chick   Diet
## 1    199     1 diet_1
## 2    209     2 diet_1
## 3    198     3 diet_1
## 4    160     4 diet_1
## 5    220     5 diet_1
## 6    160     6 diet_1

The dataset is nice, simple and tidy (if you are not familiar with the term, see https://vita.had.co.nz/papers/tidy-data.pdf).
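A quick way to convince yourself that the data is tidy is to check that every chick contributes exactly one row and to count the chicks per diet. The snippet below is only a sketch: it rebuilds chicken_day20 with the same pipeline as above, and diet_counts is a name I made up for illustration.

```r
library(dplyr)

# same preparation as above, repeated so the snippet runs on its own
chicken_day20 <- ChickWeight %>%
  mutate(Diet = paste0("diet_", Diet)) %>%
  rename(Weight = weight) %>%
  filter(Time == 20) %>%
  select(-Time)

# tidy check: one observation (row) per chick
nrow(chicken_day20) == n_distinct(chicken_day20$Chick)

# number of chicks measured on each diet (diet_counts is a hypothetical name)
diet_counts <- count(chicken_day20, Diet)
diet_counts
```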

Data for GraphPad

Unfortunately, GraphPad does not benefit from the tidy data format. The following code reshapes the data so it can be easily imported into GraphPad. Note that the values of Diet become column names (diet_1, diet_2, etc.). That is why the values of the Diet column (a factor) were changed to prefixed character strings a moment ago: we don’t want column names starting with numbers!

library(tidyr)
chicken_day20_wide <- chicken_day20 %>%
  pivot_wider(names_from = Diet, values_from = Weight)

head(chicken_day20_wide)
## # A tibble: 6 x 5
##   Chick diet_1 diet_2 diet_3 diet_4
##   <ord>  <dbl>  <dbl>  <dbl>  <dbl>
## 1 1        199     NA     NA     NA
## 2 2        209     NA     NA     NA
## 3 3        198     NA     NA     NA
## 4 4        160     NA     NA     NA
## 5 5        220     NA     NA     NA
## 6 6        160     NA     NA     NA

Note how the data is structured. In GraphPad (table format: Column), replicates are stored in the rows of a single column, so every column (except Chick) represents a different group (a condition, or in this case a type of diet). In other words, column headers are values, not variable names (the first feature of messy data!). R is doing its best to keep the data from getting messy: one observation per row. That is why every chick still has its own row. But each chick was fed only one type of diet, so its weight measurement belongs to a single diet column, and NAs fill the remaining cells. Since this table can already be imported into GraphPad, let's just save the file:

write.csv(chicken_day20_wide, "chicken_wide.csv")
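If the mostly-NA table feels awkward, a more compact layout is possible: drop the Chick column and stack each diet's values at the top of its column. This is only a sketch, not what I used for the figure. The Replicate helper column, the chicken_day20_compact name and the output file name are made up for illustration, and I am assuming GraphPad reads empty cells as missing values.

```r
library(dplyr)
library(tidyr)

# same preparation as above, repeated so the snippet runs on its own
chicken_day20 <- ChickWeight %>%
  mutate(Diet = paste0("diet_", Diet)) %>%
  rename(Weight = weight) %>%
  filter(Time == 20) %>%
  select(-Time)

chicken_day20_compact <- chicken_day20 %>%
  group_by(Diet) %>%
  mutate(Replicate = row_number()) %>% # position of each chick within its diet
  ungroup() %>%
  select(-Chick) %>%
  pivot_wider(names_from = Diet, values_from = Weight) %>%
  select(-Replicate) # helper column is no longer needed

# na = "" writes empty cells instead of the text "NA";
# row.names = FALSE drops the extra index column
write.csv(chicken_day20_compact, "chicken_wide_compact.csv",
          na = "", row.names = FALSE)
```

NAs now appear only at the bottom of columns for diets with fewer chicks than the largest group.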

After importing the file into GraphPad (copy-paste from Excel or any other method), I plotted the data with minor adjustments, namely:

  1. Apart from the columns (mean value), I added individual data points for better data representation.

  2. I changed the column fill to yellow and removed the column borders.

  3. I added SD error bars in both directions (instead of up only).

GraphPad default figure (with slight adjustments)


Default plot with ggplot

I used ggplot2 to plot the data in R, drawing the same elements for each diet (x): the mean as a column, SD error bars and the individual data points.

library(ggplot2)
ggplot(chicken_day20, aes(x = Diet, y = Weight)) +
  stat_summary(fun = mean, geom = "bar") + # plot mean (column)
  stat_summary(fun.data = mean_sdl,
               fun.args = list(mult = 1),
               geom = "errorbar") + # plot errorbars
  geom_jitter() # plot individual points

Note that stat_summary() was used to plot the data summary (mean and SD); mean_sdl with mult = 1 returns the mean ± 1 SD and needs the Hmisc package to be installed. It is just easier than creating a separate data frame with summarized data.
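For comparison, here is a sketch of the longer route that stat_summary() saves us from: compute the mean and SD per diet by hand, then draw columns and error bars from that summary table. The names chicken_summary, mean_weight, sd_weight and p_manual are my own and appear nowhere else in this tutorial.

```r
library(dplyr)
library(ggplot2)

# same preparation as above, repeated so the snippet runs on its own
chicken_day20 <- ChickWeight %>%
  mutate(Diet = paste0("diet_", Diet)) %>%
  rename(Weight = weight) %>%
  filter(Time == 20) %>%
  select(-Time)

# summary table: one row per diet with mean and standard deviation
chicken_summary <- chicken_day20 %>%
  group_by(Diet) %>%
  summarise(mean_weight = mean(Weight), sd_weight = sd(Weight))

p_manual <- ggplot(chicken_summary, aes(x = Diet, y = mean_weight)) +
  geom_col() + # columns at the mean
  geom_errorbar(aes(ymin = mean_weight - sd_weight,
                    ymax = mean_weight + sd_weight)) + # mean +/- 1 SD
  geom_jitter(data = chicken_day20, aes(x = Diet, y = Weight)) # raw points
p_manual
```

This version does not depend on Hmisc, at the cost of keeping a second data frame in sync with the raw data.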

Bloody hell, as accurate as it is ugly, huh?

Before we move on, let's work on a few details first:

  1. Columns geom_bar()
    • Use fill = "skyblue" to fill the columns with a nice color.
    • Decrease the column width with width = 0.7 to make them less crowded.
  2. Error bars geom_errorbar()
    • Decrease the cap width to 30% with width = 0.3.
  3. Individual data points geom_jitter()
    • Reduce the jittering to 10% with width = 0.1.
  4. labs
    • Add a chart title.
    • Add a y-axis title.
    • Remove the x-axis title.

Also, assign the resulting plot to p:

line_size <- 1 # defining variable upfront as we will re-use it
p <- ggplot(chicken_day20, aes(x = Diet, y = Weight)) +
  stat_summary(
    fun = mean,
    geom = "bar",
    fill = "skyblue",
    width = 0.7
  ) +
  stat_summary(
    fun.data = mean_sdl,
    fun.args = list(mult = 1),
    geom = "errorbar",
    width = 0.3,
    size = line_size
  ) +
  geom_jitter(width = 0.1) +
  labs(title = "Chicken weight", x = "", y = "Weight (grams)")
p
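One thing to keep in mind: geom_jitter() places the points randomly, so the figure changes slightly on every run, and by default it also jitters vertically. A minimal sketch of a reproducible, horizontal-only jitter is below; the seed value is arbitrary and p_jitter is a name I made up.

```r
library(dplyr)
library(ggplot2)

# same preparation as above, repeated so the snippet runs on its own
chicken_day20 <- ChickWeight %>%
  mutate(Diet = paste0("diet_", Diet)) %>%
  rename(Weight = weight) %>%
  filter(Time == 20) %>%
  select(-Time)

p_jitter <- ggplot(chicken_day20, aes(x = Diet, y = Weight)) +
  geom_point(
    # height = 0 keeps the weights exactly where they were measured;
    # seed makes the horizontal scatter identical on every run
    position = position_jitter(width = 0.1, height = 0, seed = 1)
  )
p_jitter
```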

Theme modification

Start from theme_foundation:

Use theme_foundation() from the ggthemes package to modify the theme. The reason for using this theme is well explained in its help page (?theme_foundation):

This theme is designed to be a foundation from which to build new themes, and not meant to be used directly. theme_foundation() is a complete theme with only minimal number of elements defined. It is easier to create new themes by extending this one rather than theme_gray() or theme_bw(), because those themes define elements deep in the hierarchy.

I will keep and use base_size as a separate variable:

library(ggthemes)
base_size <- 12 # defining separately, same as for line_size
p <- p + theme_foundation(base_size = base_size, base_family = "sans")
p

Clean the grids and borders:

p <- p + theme(
  panel.grid.major = element_blank(),
  panel.grid.minor = element_blank(),
  panel.border = element_blank(),
  panel.background = element_blank(),
  plot.background = element_blank() # that will remove border when using ggsave!
)
p

Add axis lines and ticks:

The size of the lines is controlled by the previously defined line_size.

p <- p + theme(
  axis.line = element_line(colour="black", size = line_size),
  axis.ticks = element_line(colour="black", size = line_size)
)
p
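A note for newer installations: since ggplot2 3.4.0 the size argument for lines is deprecated in favour of linewidth, so the code above may print a deprecation warning. One way to handle both cases is to pick the argument based on the installed version, as sketched below. The axis_line and p_axes names are made up, and the plot uses the raw ChickWeight data only to have something to draw.

```r
library(ggplot2)

line_size <- 1

# element_line() gained `linewidth` in ggplot2 3.4.0;
# older versions only understand `size`
axis_line <- if (packageVersion("ggplot2") >= "3.4.0") {
  element_line(colour = "black", linewidth = line_size)
} else {
  element_line(colour = "black", size = line_size)
}

p_axes <- ggplot(ChickWeight, aes(x = Diet, y = weight)) +
  geom_point() +
  theme(axis.line = axis_line, axis.ticks = axis_line)
p_axes
```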

Set text size and alignment:

  • Use relative values to control font size. Since base_size = 12, use +2 (14 pt) for the plot title and -1 (11 pt) for the axis text (but not the axis title).
  • 45° angle for axis.text.x
  • Bold for all axis.text
  • Center and bold the plot.title with hjust = 0.5, face = "bold"

# sizes are offsets from base_size, e.g. with base_size = 12, title_text_rel_size = +2 gives a 14 pt title
axis_text_rel_size <- -1
title_text_rel_size <- +2

p <- p + theme(
  text = element_text(colour = "black"),
  plot.title = element_text(
    face = "bold",
    size = rel((title_text_rel_size + base_size) / base_size),
    hjust = 0.5
  ),
  axis.title = element_text(face = "bold", size = rel(1)),
  axis.title.y = element_text(angle = 90, vjust = 2),
  axis.title.x = element_text(vjust = -0.2),
  axis.text = element_text(
    face = "bold",
    size = rel((axis_text_rel_size + base_size) / base_size)
  ),
  axis.text.x = element_text(
    angle = 45,
    hjust = 1,
    vjust = 1
  )
)
p
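The rel() expressions look more complicated than they are. Here is the arithmetic spelled out, as a sketch that assumes the parent text element is drawn at base_size (the *_rel and *_pt names are mine):

```r
base_size <- 12
title_text_rel_size <- 2
axis_text_rel_size <- -1

# multipliers handed to rel()
title_rel <- (title_text_rel_size + base_size) / base_size # 14 / 12
axis_rel <- (axis_text_rel_size + base_size) / base_size   # 11 / 12

# resulting font sizes in points
title_pt <- title_rel * base_size
axis_pt <- axis_rel * base_size
c(title = title_pt, axis = axis_pt) # 14 and 11
```

Writing the offsets this way means that changing base_size later keeps the title 2 pt larger and the axis text 1 pt smaller than the base font.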

Remove y-margin and set limits:

p <- p +
  scale_y_continuous(limits = c(0, 400), expand = expansion(mult = c(0, 0)))
p
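A word of caution about limits: with the default settings, a continuous scale turns values outside its limits into NA before the summaries are computed, so limits that are too tight would silently change the means and error bars. Here every weight fits in 0 to 400, so nothing is lost. If you ever need to cut into the data range, zooming with coord_cartesian() is the safer option. A sketch (y_limits and p_zoom are made-up names):

```r
library(dplyr)
library(ggplot2)

# same preparation as above, repeated so the snippet runs on its own
chicken_day20 <- ChickWeight %>%
  mutate(Diet = paste0("diet_", Diet)) %>%
  rename(Weight = weight) %>%
  filter(Time == 20) %>%
  select(-Time)

y_limits <- c(0, 400)

# TRUE means no observation would be dropped by the scale limits
all(between(chicken_day20$Weight, y_limits[1], y_limits[2]))

# alternative: keep all data for the statistics and only zoom the view
p_zoom <- ggplot(chicken_day20, aes(x = Diet, y = Weight)) +
  stat_summary(fun = mean, geom = "bar") +
  scale_y_continuous(expand = expansion(mult = c(0, 0))) +
  coord_cartesian(ylim = y_limits)
p_zoom
```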

Save figure

Use dimensions matching GraphPad for better comparison.

ggsave("figure_ggplot.png", plot = p, device = "png", dpi = 100, height = 8.16, width = 7.06, units = "cm")
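To see what these settings mean in pixels, convert centimetres to inches and multiply by the dpi. This is just the arithmetic, not a look inside ggsave(); the variable names are mine.

```r
dpi <- 100
width_cm <- 7.06
height_cm <- 8.16
cm_per_inch <- 2.54

# centimetres -> inches -> pixels
width_px <- width_cm / cm_per_inch * dpi
height_px <- height_cm / cm_per_inch * dpi

round(c(width = width_px, height = height_px)) # roughly 278 x 321 pixels
```

That is fine for an on-screen comparison with the GraphPad export; for print you would raise dpi and keep the same physical dimensions.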

Entire plot code

library(ggplot2)
library(ggthemes)

line_size <- 1 # defining variable upfront as we will re-use it
base_size <- 12 # defining separately, same as for line_size
axis_text_rel_size <- -1
title_text_rel_size <- +2


ggplot(chicken_day20, aes(x = Diet, y = Weight)) +
  # plot bars
  stat_summary(fun = mean, geom = "bar", fill = "skyblue", width = 0.7) +
  
  # plot error bars
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), 
               geom = "errorbar", width = 0.3, size = line_size) +
  
  # plot individual points
  geom_jitter(width = 0.1) +
  
  # set scale limits
  scale_y_continuous(limits = c(0, 400), expand = expansion(mult = c(0, 0))) + 
  
  # set labs
  labs(title = "Chicken weight", x = "", y = "Weight (grams)") + 
  
  # theme
  theme_foundation(base_size = base_size, base_family = "sans") + 
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_blank(),
    panel.background = element_blank(),
    text = element_text(colour = "black"),
    plot.title = element_text(face = "bold", 
                              size = rel((title_text_rel_size + base_size) / base_size), hjust = 0.5),
    axis.line = element_line(colour="black", size = line_size),
    axis.ticks = element_line(colour="black", size = line_size),
    axis.title = element_text(face = "bold", size = rel(1)),
    axis.title.y = element_text(angle = 90, vjust = 2),
    axis.title.x = element_text(vjust = -0.2),
    axis.text = element_text(face = "bold", size = rel((axis_text_rel_size + base_size) / base_size)),
    axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),
    plot.background = element_blank()
  )
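If you plan to reuse this look, the theme part can be wrapped in a function. The sketch below collects the same settings under a hypothetical name, theme_graphpad_like(); it is not part of ggthemes or any other package. It assumes ggplot2 3.4.0 or later because it uses linewidth (swap in size on older versions).

```r
library(ggplot2)
library(ggthemes)

# hypothetical helper: the theme settings from above as one reusable function
theme_graphpad_like <- function(base_size = 12, line_size = 1,
                                title_text_rel_size = 2,
                                axis_text_rel_size = -1) {
  theme_foundation(base_size = base_size, base_family = "sans") +
    theme(
      panel.grid.major = element_blank(),
      panel.grid.minor = element_blank(),
      panel.border = element_blank(),
      panel.background = element_blank(),
      plot.background = element_blank(),
      text = element_text(colour = "black"),
      plot.title = element_text(
        face = "bold", hjust = 0.5,
        size = rel((title_text_rel_size + base_size) / base_size)
      ),
      # linewidth assumes ggplot2 >= 3.4.0
      axis.line = element_line(colour = "black", linewidth = line_size),
      axis.ticks = element_line(colour = "black", linewidth = line_size),
      axis.title = element_text(face = "bold", size = rel(1)),
      axis.title.y = element_text(angle = 90, vjust = 2),
      axis.title.x = element_text(vjust = -0.2),
      axis.text = element_text(
        face = "bold",
        size = rel((axis_text_rel_size + base_size) / base_size)
      ),
      axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)
    )
}

# usage on the raw dataset, just to show the theme in action
p_themed <- ggplot(ChickWeight, aes(x = Diet, y = weight)) +
  geom_point() +
  theme_graphpad_like()
p_themed
```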

Result: ggplot vs. GraphPad

Close enough?

Figure prepared with GraphPad


Figure prepared with ggplot
