\[\\[0.25in]\]
This is the primary source document for ggplot2, a
package under the tidyverse group:
https://ggplot2.tidyverse.org/index.html
\[\\[0.05in]\]
This cheat sheet covers basic ggplot2 syntax: https://github.com/rstudio/cheatsheets/blob/main/data-visualization-2.1.pdf
This website is a good walkthrough: https://www.datanovia.com/en/blog/ggplot-examples-best-reference/
\[\\[0.05in]\]
I like this website for coloring: http://www.sthda.com/english/wiki/ggplot2-colors-how-to-change-colors-automatically-and-manually
\[\\[0.01in]\]
Plus these online tools for specific colors:
https://www.appypie.com/design/image-color-picker/
https://www.google.com/search?q=rgb+to+hex
\[\\[0.5in]\]
Packages used in this walkthrough:
require(cowplot)
require(dplyr)
require(ggplot2)
require(ggrepel)
require(reshape2)
require(stringr)
require(tidyr)
require(tidyverse)
require(viridis)
\[\\[0.2in]\]
type1_df <- data.frame(matrix(NA, nrow = 5, ncol = 4)) # empty dataframe for data type 1
colnames(type1_df) <- c("Type", "var1", "var2", "var3") # Columns for data
type1_df$Type <- "Type1"
type1_df$var1 <- rnorm(n = 5, mean = 100, sd = 10) # Normal distribution for made up variables
type1_df$var2 <- rnorm(n = 5, mean = 20, sd = 12)
type1_df$var3 <- rnorm(n = 5, mean = 56, sd = 2)
type2_df <- data.frame(matrix(NA, nrow = 5, ncol = 4))
colnames(type2_df) <- c("Type", "var1", "var2", "var3")
type2_df$Type <- "Type2"
type2_df$var1 <- rnorm(n = 5, mean = 120, sd = 20)
type2_df$var2 <- rnorm(n = 5, mean = 20, sd = 5)
type2_df$var3 <- rnorm(n = 5, mean = 75, sd = 10)
type3_df <- data.frame(matrix(NA, nrow = 5, ncol = 4))
colnames(type3_df) <- c("Type", "var1", "var2", "var3")
type3_df$Type <- "Type3"
type3_df$var1 <- rnorm(n = 5, mean = 140, sd = 10)
type3_df$var2 <- rnorm(n = 5, mean = 36, sd = 9)
type3_df$var3 <- rnorm(n = 5, mean = 50, sd = 20)
play_df <- dplyr::bind_rows(type1_df, type2_df, type3_df) # Combining the 3 data types
play_df$Sample <- paste0("Sample", sprintf("%02d", seq_len(nrow(play_df)))) # Adding zero-padded sample names (Sample01, Sample02, ...)
play_df <- play_df[, c("Sample", "Type", "var1", "var2", "var3")] # Rearranging the columns
head(play_df)
Our play data is 15 samples from three different types, with data for 3 variables.
Let’s take a look at it with a simple graph.
\[\\[0.2in]\]
ggplot(play_df, aes(x = var1, y = var2)) +
geom_point()
Let’s dissect that syntax a bit:
- `ggplot` calls the package's main plotting function
- `play_df` is the dataframe we’re using. This object usually needs to be a data.frame (check this with `str()` if you’re unsure and/or use the function `as.data.frame()` on it), although some graph types may need something different, like a matrix
- `aes` specifies the aesthetic details, including the x and y axes
- The `+` sign shows that more layers or settings follow
- `geom_point` is the function that makes it a scatter plot

Note that you can run the plot command with the cursor anywhere
inside the command; it doesn’t have to be at the beginning. If you’ve
used the plus signs correctly it will run in one pass. ggplot graph
elements are built on top of each other sequentially, so the order
matters. Some elements are additive (such as
geom_point), while for others only the last one is
recognized.
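As a rough illustration of the "additive versus last one wins" point, here is a minimal sketch. The data frame `toy_df` and the object names are made up for this example, and peeking at `$layers` and `$labels` is only one assumed way to inspect a plot object, not something the original walkthrough relies on.

```r
library(ggplot2)

# Hypothetical toy data, only for illustration
toy_df <- data.frame(x = c(1, 2, 3), y = c(2, 4, 3))

# Geoms are additive: both layers are kept and drawn in order
p_layers <- ggplot(toy_df, aes(x = x, y = y)) +
  geom_point() +
  geom_line()
length(p_layers$layers) # 2 layers

# Titles are not additive: the later call replaces the earlier one
p_title <- p_layers +
  ggtitle("First title") +
  ggtitle("Second title")
p_title$labels$title # only the later title is kept
```

The same idea explains why a later `xlim()` or scale call silently overrides an earlier one on the same axis.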
\[\\[0.2in]\]
The block above produces a graph but doesn’t save it as an object. Let’s fix that and add some more details:
first_plot <- ggplot(play_df, aes(x = var1, y = var2, color = Type)) +
geom_point() +
xlab("variable 1") +
ylab("variable 2") +
ggtitle("Scatter plot", subtitle = "Colored by discrete variable")
first_plot
Syntax again:
- `first_plot` is the new plot object; you need to call it again to display it
- `color` in the `aes()` can be categorical or numeric data. Be aware this can be tricky: some ggplot functions spell it `colour`, and different geoms distinguish between `color` and `fill`
- `xlab` and `ylab` are the labels for each axis
- `ggtitle` sets the title. You can also include `subtitle` inside the () as a separate argument. You may need to name the main title explicitly as `label` if it’s not written first. Write `\n` to create a new line in your text if needed.

\[\\[0.02in]\]
Alternatively, we can color the points with a continuous variable:
ggplot(play_df, aes(x = var1, y = var2, color = var3)) +
geom_point() +
ggtitle("Scatter plot", subtitle = "Colored by continuous variable")
\[\\[0.5in]\]
A neat thing we can start doing, now that we have a basic plot as an
object, is generating new plots without overwriting the original base
one by using the + after our plot object:
# Setting different axis scales:
first_plot +
xlim(0, 200) +
ylim(0, 50) +
ggtitle("Scatter plot", subtitle = "Manual axis ranges")
# Changing the axis intervals:
# Note that you'll use `continuous` or `discrete` depending on your data
first_plot +
scale_x_continuous(breaks = seq(0, 200, by = 10)) +
scale_y_continuous(breaks = seq(0, 50, by = 5)) +
ggtitle("Scatter plot", subtitle = "Manual axis intervals")
# Sub-setting your data without modifying the dataframe:
# You can use other logical conditions here too, like `>` and `grepl`
second_plot <- ggplot(subset(play_df, Type %in% "Type1"), aes(x = var1, y = var2, color = Type)) +
geom_point() +
ggtitle("Scatter plot", subtitle = "Subsetting data")
second_plot
# Changing other `aes` arguments: `shape`, `size`, and `alpha` (the opacity)
ggplot(play_df, aes(x = var1, y = var2, color = Type, shape = Type, size = var3, alpha = var3)) +
geom_point() +
ggtitle("Scatter plot", subtitle = "Data shape/size/opacity")
Manually setting the colors and adding outlines:
- Use `scale_color_manual` or `scale_fill_manual`
- You can use written colors (e.g. ‘black’), hexadecimal, RGB, etc.
- Setting the `colour` (here to black) outside of the `aes()` changes the outline
ggplot(play_df, aes(x = var1, y = var2)) +
geom_point(aes(fill = Type), colour = "black", size = 2, shape = 21) +
scale_fill_manual(values = c("Type1" = "#FF99CC",
"Type2" = "#00FFFF",
"Type3" = "#00FF00")) +
ggtitle("Scatter plot", subtitle = "Manual coloring by a discrete variable")
# Or for a manual gradient along a continuous variable:
# Note that `values` are positions along the data range (0 to 1), not actual data values,
# and there must be exactly one position per colour
ggplot(play_df, aes(x = var1, y = var2)) +
  geom_point(aes(fill = var3), colour = "black", size = 2, shape = 21) +
  scale_fill_gradientn(colours = c("blue", "cyan", "green", "yellow", "red"),
                       values = c(0, 0.25, 0.5, 0.75, 1)) +
  ggtitle("Scatter plot", subtitle = "Manual coloring by a continuous variable")
\[\\[0.5in]\]
Now that we have some plots, we should save them. If you’re using RStudio, you can click the export button on the plot quadrant, but in my experience this is cumbersome for many files and the size isn’t easily reproducible. Plus, since the graphs are dynamic, the text and point sizes will look different at different file sizes.
As a safe alternative, we can hard code lines that export the file. I would recommend generating the plots and checking the saved output a couple times until it’s the size/etc you like.
Some syntax in ggsave:
- `filename`: the name of the output file. Note that the file type (.png, .jpeg, .pdf, etc.) needs to be included
- `plot`: specifies which plot object you’re saving
- `units`: can be inches, centimeters, etc.
- `dpi`: resolution
# From the plot object:
ggsave(filename = "filename.png", plot = first_plot, width = 14, height = 8,
  units = "in", dpi = 250)
# Or a more flexible approach, saving whichever plot was displayed last:
ggsave(filename = "filename.png", plot = last_plot(), width = 14, height = 8,
  units = "in", dpi = 250)
For saving many plots as a single .pdf file, use the base
pdf function.
Each plot object needs to exist already, and the file will contain the
latest version of each. Note that you need to include dev.off();
otherwise the device stays open, later plots keep being added to it as
new pages, and the file is not finalized. The file is written to your
working directory by default, but you can include a path in the file
argument to change that. Fun tip: building the file name with paste0
lets you write one file per variable when you generate plots iteratively
through a file directory, or reuse the name of your raw data file from
when you imported it.
pdf(file = "ggplot_plots.pdf", width = 6, height = 5)
print(first_plot) # print() is required when this runs inside a loop or function
print(second_plot)
invisible(dev.off())
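The paste0 tip can be sketched roughly as follows. This is a minimal sketch under stated assumptions: `toy_df`, `out_dir`, `y_vars`, and the file name pattern are all hypothetical stand-ins, and the files are written to a temporary folder rather than your working directory.

```r
library(ggplot2)

# Hypothetical toy data standing in for play_df
toy_df <- data.frame(
  Type = rep(c("Type1", "Type2"), each = 3),
  var1 = c(1, 2, 3, 4, 5, 6),
  var2 = c(6, 5, 4, 3, 2, 1),
  var3 = c(2, 2, 3, 3, 4, 4)
)

out_dir <- tempdir()        # assumed output folder
y_vars <- c("var2", "var3") # variables to plot against var1

for (y_var in y_vars) {
  # .data[[y_var]] looks the column up by its name stored in a string
  p <- ggplot(toy_df, aes(x = var1, y = .data[[y_var]], color = Type)) +
    geom_point() +
    ggtitle(paste0("var1 vs ", y_var))

  # paste0 builds one file name per variable
  out_file <- file.path(out_dir, paste0("scatter_var1_vs_", y_var, ".pdf"))
  pdf(file = out_file, width = 6, height = 5)
  print(p) # inside a loop, plots must be printed explicitly
  invisible(dev.off())
}
```

Moving the `pdf()` and `dev.off()` calls outside the loop would instead collect all plots as pages of a single file.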
\[\\[0.5in]\]
Now that we’ve covered the basics, we can add in some additional customization:
# You can manipulate data while plotting, instead of calculating it outside as an independent variable:
ggplot(play_df, aes(x = var1+var3, y = log(var2), color = Type)) +
geom_point() +
ggtitle("Scatter plot", subtitle = "Calculations in ggplot code")
\[\\[0.1in]\]
Labeling points:
- Add the `label` argument in the first line `aes()`
- `hjust` and `vjust` are the horizontal and vertical adjustments
- To reduce overlapping labels, try `check_overlap` in the `geom_text`, or additional tools like `geom_jitter`, `position_jitter`, and `ggrepel`
ggplot(play_df, aes(x = var1, y = var2, color = Type, label = Sample)) +
  geom_point() +
  geom_text(hjust = 0, vjust = -2) +
  ggtitle("Scatter plot", subtitle = "Basic labels")
ggplot(play_df, aes(x = var1, y = var2, color = Type, label = Sample)) +
geom_point() + ylim(0, 40) +
geom_label_repel(aes(label = Sample, fill = factor(Type)), color = 'white') +
ggtitle("Scatter plot", subtitle = "Pretty labels")
# Just like in the data manipulations above, there are many places where you can add new arguments:
ggplot(play_df, aes(x = var1, y = var2, color = Type, label = Sample)) +
geom_point() + ylim(0, 40) +
geom_label_repel(data = play_df, aes(label = ifelse(var2 > 20, as.character(Sample), ''))) +
ggtitle("Scatter plot", subtitle = "Labels based on a logical condition")
\[\\[0.1in]\]
Some aesthetic changes to your plots:
Adding colored blocks to the plot using geom_rect:
# `inherit.aes` belongs outside `aes()`. With constants inside `aes()`, one rectangle
# is drawn per data row, which is why the alpha is set so low
ggplot(play_df, aes(x = var1, y = var2, color = Type)) +
  geom_rect(aes(xmin = 100, xmax = 120, ymin = 0, ymax = Inf), inherit.aes = FALSE,
            color = "transparent", fill = "#9c7ba8", alpha = 0.01) +
  geom_rect(aes(xmin = 115, xmax = 130, ymin = 25, ymax = 50), inherit.aes = FALSE,
            color = "transparent", fill = "#0ea113", alpha = 0.01) +
geom_point() +
ggtitle("Scatter plot", subtitle = "Highlighting regions of the graph")
Adding lines using geom_vline (vertical) or
geom_hline (horizontal); these can be used to make nice
zero lines too:
# Note that `linewidth` replaces `size` for lines in ggplot2 3.4.0 and later
ggplot(play_df, aes(x = var1, y = var2, color = Type)) +
  geom_vline(xintercept = c(12, 90, 120), color = "#e380ff", linewidth = 1) +
  geom_vline(xintercept = 0, color = 'black', linewidth = 0.4) +
  geom_hline(yintercept = 0, color = 'black', linewidth = 0.4) +
geom_point() +
ggtitle("Scatter plot", subtitle = "Vertical and horizontal lines")
# Changing font of the plot:
ggplot(play_df, aes(x = var1, y = var2, color = Type)) +
geom_point() +
theme(text = element_text(family = "Times New Roman", size = 15)) +
ggtitle("Scatter plot", subtitle = "Changing the font")
# Changing the angle of the axis labels
# This is more helpful for long data names
ggplot(play_df, aes(x = Type, y = var3, color = Type)) +
scale_x_discrete(guide = guide_axis(angle = 45)) +
geom_point() +
ggtitle("Scatter plot", subtitle = "Angled axis labels")
Editing the legend using guides (for example to change its title, or to remove it entirely):
ggplot(play_df, aes(x = var1, y = var2, color = Type)) +
geom_point() +
guides(color = "none") +
ggtitle("Scatter plot", subtitle = "Removing the legend")
# You can use pre-made palettes for coloring:
# These are usually outside packages, such as `RColorBrewer` and `viridis`
ggplot(play_df, aes(x = var1, y = var3, color = var1)) +
geom_point() +
scale_color_viridis(option = "D") +
ggtitle("Scatter plot", subtitle = "Using pre-made color palettes")
# Themes are a wide suite of options on how the graph layout looks
ggplot(play_df, aes(x = var1, y = var2, color = Type)) +
geom_point() +
theme_classic() +
ggtitle("Scatter plot", subtitle = "Classic graph design theme")
Using theme_void and the other theme()
settings is nice if you need to add a plot to another figure;
everything here is transparent except the scatter points:
ggplot(play_df, aes(x = var1, y = var2, color = Type)) +
  geom_point() +
  theme_void() + # A complete theme resets earlier theme() settings, so call it first
  theme(panel.background = element_rect(fill = 'transparent'),
        plot.background = element_rect(fill = 'transparent', color = NA),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.background = element_rect(fill = 'transparent'),
        legend.box.background = element_rect(fill = 'transparent')) +
  guides(color = "none")
\[\\[0.1in]\]
Breaking up your data into separate graphs using
facet_wrap:
- `~` shows what variables you’re using. Leaving one side blank facets by a single variable. You can also use two variables, like `X ~ Y`; in `facet_grid`, the side of the `~` a variable sits on dictates whether it defines the rows or the columns
- Set `scales` to free to let the axes match each internal dataset, or set it to fixed to keep them constant. You can also choose this for just one axis (free_x or free_y).
ggplot(play_df, aes(x = var1, y = var2, color = Type)) +
facet_wrap(~ Type, scales = "free") +
geom_point() +
ggtitle("Scatter plot", subtitle = "Breaking up data")
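To make the two-variable formula and the single-axis option more concrete, here is a minimal sketch. The `Batch` column and `toy_df` are hypothetical (the play data has no second grouping variable), and `facet_grid` is swapped in for `facet_wrap` because it lays the two variables out as rows and columns.

```r
library(ggplot2)

# Hypothetical toy data with two grouping variables
toy_df <- data.frame(
  Type  = rep(c("Type1", "Type2"), each = 4),
  Batch = rep(c("A", "B"), times = 4),
  var1  = 1:8,
  var2  = c(10, 12, 11, 13, 100, 120, 110, 130)
)

# Type defines the rows, Batch defines the columns.
# scales = "free_y" frees only the y axis, so each row of panels
# gets its own y range while the x axis stays shared
p_grid <- ggplot(toy_df, aes(x = var1, y = var2)) +
  geom_point() +
  facet_grid(Type ~ Batch, scales = "free_y") +
  ggtitle("Scatter plot", subtitle = "Two facet variables, free y axis")
p_grid
```

Freeing only one axis is a reasonable middle ground when the groups differ widely on one variable but should stay comparable on the other.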
\[\\[0.1in]\]
More code for manipulating/adding to data:
# Plotting an inset inside of a main graph
big.plot <- ggplot(play_df, aes(x = var1, y = var2, color = Type)) +
ylim(0, 75) + xlim(25, 175) +
ggtitle("Main plot: var1 vs var2", subtitle = "Adding an inset plot") +
geom_point()
small.plot <- ggplot(play_df, aes(x = var1/var3, y = var2/var3, color = Type)) +
geom_point() +
theme(legend.position = "none")
plot.with.inset <- ggdraw() +
draw_plot(big.plot) +
draw_plot(small.plot, x = .1, y = 0.5, width = .3, height = .3)
plot.with.inset
# Changing the axis scales:
ggplot(play_df, aes(x = var1, y = var2, color = Type)) +
geom_point() +
coord_trans(x = "log2", y = "log2") +
ggtitle("Scatter plot", subtitle = "Changing the axis scales")
# Highlighting samples
ggplot(play_df, aes(x = var1, y = var2, color = Type)) +
geom_point(color = "black", size = 2) +
geom_point(data = play_df[c(2:5, 10), ], aes(x = var1, y = var2), colour = "red", size = 3) +
ggtitle("Scatter plot", subtitle = "Highlighting specific samples")
\[\\[0.5in]\]
Great, now that we’ve covered the basics for customizing a scatter plot, let’s explore some other common graph types:
# More scatter plot options, such as ellipses (`stat_ellipse`) and regressions (`geom_smooth`):
ggplot(play_df, aes(x = var1, y = var2, color = Type)) +
stat_ellipse(aes(x = var1, y = var2, color = Type), type = "norm") +
geom_smooth(method = 'lm', formula = y ~ x) +
geom_point() +
ggtitle("Scatter plot", subtitle = "Ellipses and regressions")
You control normal vs stacked plots by what variable you plot and color by, and in ggplot2 pie charts are just stacked barplots with a circular axis.
# Normal bar, plotting sample and coloring by type
ggplot(play_df, aes(x = Sample, y = var2, fill = Type, color = Type)) +
geom_bar(stat = "identity") +
scale_x_discrete(guide = guide_axis(angle = 45)) +
ggtitle("Bar plot", subtitle = "Normal")
# Stacked bar, plotting type and coloring by sample
ggplot(play_df, aes(x = Type, y = var2, fill = Sample, color = Sample)) +
geom_bar(stat = "identity") +
scale_x_discrete(guide = guide_axis(angle = 45)) +
ggtitle("Bar plot", subtitle = "Stacked")
# Pie chart:
ggplot(play_df, aes(x = "", y = var2, fill = Type)) +
geom_bar(stat = "identity", width = 1) +
coord_polar("y", start = 0) +
theme_void() +
ggtitle("Pie chart")
ggplot(play_df, aes(x = Type, y = var2, color = Type)) +
geom_boxplot(outlier.shape = NA) +
geom_point(aes(group = Type), alpha = 0.75, position = position_jitterdodge()) +
ggtitle("Box plot")
ggplot(play_df, aes(x = Type, y = var2, color = Type)) +
geom_violin() +
geom_point(aes(group = Type), alpha = 0.75, position = position_jitterdodge()) +
ggtitle("Violin plot")
# Heatmaps need long-format data: definitely check out the `reshape2` package's `melt` and `dcast` functions
# Keeping Sample and Type as identifiers stacks only the numeric variables (var1-var3)
melted.data <- melt(play_df, id.vars = c("Sample", "Type"))
head(melted.data)
ggplot(melted.data, aes(x = variable, y = Sample, fill = value)) +
geom_tile() +
scale_fill_gradient(low = "#00FFFF", high = "#9c7ba8") +
ggtitle("Heatmap")
\[\\[0.25in]\]
The ternary diagram packages interfere with ggplot2. In my experience it’s best practice to load them immediately before the code and unload them immediately afterwards.
require(ggtern)
require(ggalt)
ggtern(data = play_df, aes(var1, var2, var3, color = Type, fill = Type)) +
geom_encircle(alpha = 0.3, size = 1) +
scale_shape_manual(values = 1:8) +
geom_point(size = 2.2, color = "black") +
geom_point(size = 2) +
theme_rgbw() +
guides(color = "none") +
ggtitle("Ternary plot")
unloadNamespace("ggtern")
unloadNamespace("ggalt")
\[\\[0.25in]\]
Here’s the code I use to run a PCA plot, with all of the info included for both the stats and graphs:
PCA_df <- play_df
rownames(PCA_df) <- 1:nrow(PCA_df)
PCA_df[, 3:5] <- sapply(PCA_df[, 3:5], as.numeric) # Make sure the variables are numeric
forPCA <- PCA_df[, 3:5] # Only the numeric variables (var1, var2, var3) go into the PCA
myPCA <- prcomp(forPCA, center = TRUE, retx = TRUE)
# Percent of variance explained by each component (variance = sdev^2)
percentage <- round(myPCA$sdev^2 / sum(myPCA$sdev^2) * 100, 2)
percentage <- paste0(colnames(myPCA$x), " (", percentage, "%)") # Axis labels like "PC1 (xx%)"
PCA.df <- as.data.frame(myPCA$x)
PCA.df$Sample <- PCA_df$Sample # Shared key for joining the scores back to the sample info
PCA_df_total <- dplyr::left_join(PCA.df, PCA_df, by = "Sample")
ggplot(PCA_df_total, aes(x = PC1, y = PC2, fill = Type, group = Type)) +
  geom_vline(colour = "#000000", xintercept = 0) +
  geom_hline(colour = "#000000", yintercept = 0) +
  stat_ellipse(aes(fill = Type), geom = "polygon", type = "t", level = 0.95, alpha = 0.1) +
  geom_point(colour = "black", shape = 21) +
  xlab(percentage[1]) + ylab(percentage[2]) +
  guides(size = "none", fill = guide_legend(override.aes = list(size = 3))) +
  ggtitle("Principal Component Analysis")
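As a side note on the axis percentages, the share of variance explained by each component is usually taken from the squared standard deviations. The sketch below checks that against what `summary()` reports. It uses made-up data (`toy`) and hypothetical object names, so treat it as an illustration rather than part of the original workflow.

```r
# Hypothetical toy data, only for illustration
set.seed(1)
toy <- data.frame(a = rnorm(20), b = rnorm(20, sd = 2), c = rnorm(20, sd = 3))

pca <- prcomp(toy, center = TRUE)

# Variance of each component is sdev^2; divide by the total to get proportions
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# summary() reports the same proportions (rounded)
from_summary <- unname(summary(pca)$importance["Proportion of Variance", ])

round(var_explained * 100, 2)
round(from_summary * 100, 2)
```

Using `sdev` without squaring would give shares that still sum to 100% but understate the leading component.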
\[\\[0.25in]\]
Happy plotting!
\[\\[1in]\]