Introduction

ggplot2 is a powerful and flexible R package, created by Hadley Wickham, for producing elegant graphics.

The concept behind ggplot2 divides a plot into three fundamental parts: Plot = Data + Aesthetics + Geometry.

The principal components of every plot can be defined as follows:

  • Data is a data frame.
  • Aesthetics indicate the x and y variables. They can also be used to control the color, the size or the shape of points, the height of bars, etc.
  • Geometry defines the type of graphic (histogram, box plot, line plot, density plot, dot plot, etc.).

There are two major functions in the ggplot2 package: qplot() and ggplot().

  • qplot() stands for quick plot; it can be used to easily produce simple plots.
  • ggplot() is more flexible and robust than qplot() for building a plot piece by piece.

Load the necessary packages

Data Format and Preparation

The data should be a data.frame (columns are variables and rows are observations).

The data set mtcars is used in the examples below:
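A quick look at the data can be sketched as follows. The three-column subset and the name df are assumptions made here for illustration; mtcars itself ships with R.

```r
# mtcars is part of the built-in 'datasets' package
df <- mtcars[, c("mpg", "cyl", "wt")]  # hypothetical subset, for illustration only
head(df)  # first six rows
str(df)   # column types and dimensions
```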

Quick plot

The function qplot() [in ggplot2] is very similar to the basic plot() function from the R base package. It can be used to easily create and combine different types of plots. However, it remains less flexible than the function ggplot(), and it is deprecated as of ggplot2 3.4.0, so new code should prefer ggplot().

This chapter provides a brief introduction to qplot(), which stands for quick plot.

Data Format

The data must be a data.frame (columns are variables and rows are observations).

mtcars: Motor Trend Car Road Tests.
Description: The data comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).

Format: A data frame with 32 observations on 11 variables. The three variables used in this chapter are:
- mpg: Miles/(US) gallon
- cyl: Number of cylinders
- wt: Weight (1000 lbs)

Usage of qplot()

A simplified format of qplot() is:

qplot(x, y = NULL, data, geom = "auto", xlim = c(NA, NA), ylim = c(NA, NA))

x : x values
y : y values (optional)
data : data frame to use (optional).
geom : Character vector specifying geom to use. Defaults to “point” if x and y are specified, and “histogram” if only x is specified.
xlim, ylim: x and y axis limits

Other arguments, including main, xlab, ylab and log, can also be used:

- main: plot title
- xlab, ylab: x and y axis labels
- log: which variables to log transform. Allowed values are “x”, “y” or “xy”.

Scatterplots

Basic Scatterplots

The plot can be created using data from either numeric vectors or a data frame:
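Both input styles can be sketched as below. The vectors x and y are made-up values for illustration. Because qplot() is deprecated in recent ggplot2 releases, running this may print a deprecation warning.

```r
library(ggplot2)

# From two numeric vectors (hypothetical values)
x <- 1:10
y <- x * x
p1 <- qplot(x, y)

# From the columns of a data frame
p2 <- qplot(mpg, wt, data = mtcars)
p2
```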

Scatter plots with smoothed line

The geom “smooth” is used to add a smoothed line with its standard error:

Change scatter plot colors

Points can be colored according to the values of a continuous or a discrete variable. The argument colour is used.

Change the shape and the size of points

Like color, the shape and the size of points can be controlled by a continuous or discrete variable.

Scatter plot with texts

The argument label is used to specify the text to be used for each point:

Box plot, dot plot and violin plot

The PlantGrowth data set is used in the following example:

geom = “boxplot”: draws a box plot
geom = “dotplot”: draws a dot plot. The supplementary arguments stackdir = “center” and binaxis = “y” are required.
geom = “violin”: draws a violin plot. The argument trim is set to FALSE

Change the color by groups:

Histogram and density plots

The histogram and density plots are used to display the distribution of data.

Generate some data

The R code below generates some data containing the weights by sex (M for male; F for female):
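A minimal sketch of such a data set is shown below. The sample size, means, standard deviation and the name wdata are all hypothetical choices, not values from the original tutorial.

```r
set.seed(1234)  # make the random draws reproducible

wdata <- data.frame(
  sex    = factor(rep(c("F", "M"), each = 200)),
  # Hypothetical weights: women around 55, men around 58
  weight = c(rnorm(200, mean = 55, sd = 5), rnorm(200, mean = 58, sd = 5))
)
head(wdata)
```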

Histogram

Density plot

Main Titles and axis titles

Titles can be added to the plot as follows:

Box plot

This R tutorial describes how to create a box plot using R software and ggplot2 package.

The function geom_boxplot() is used. A simplified format is :

geom_boxplot(outlier.colour="black", outlier.shape=16,outlier.size=2, notch=FALSE)

Details

  • outlier.colour, outlier.shape, outlier.size : The color, the shape and the size for outlying points
  • notch : logical value. If TRUE, make a notched box plot. The notch displays a confidence interval around the median, normally computed as median +/- 1.58*IQR/sqrt(n). Notches are used to compare groups; if the notches of two boxes do not overlap, this is strong evidence that the medians differ.

Prepare the data set

In this illustration, the ToothGrowth data set is used:

Make sure that the variable dose is converted to a factor variable (for example with as.factor()) before plotting.
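The conversion can be sketched as follows. The copy named tg is an assumption made here so that the built-in data set is left untouched.

```r
# Work on a copy of the built-in data set
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)  # numeric doses become factor levels
levels(tg$dose)
head(tg)
```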

Basic box plot

The function stat_summary() can be used to add mean points to a box plot :
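As a sketch, a basic box plot with the group means overlaid might look like this. The point shape and size are arbitrary choices. The fun argument assumes ggplot2 3.3.0 or later (older releases used fun.y).

```r
library(ggplot2)

tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

p <- ggplot(tg, aes(x = dose, y = len)) +
  geom_boxplot() +
  # Overlay the mean of each group as a diamond-shaped point
  stat_summary(fun = mean, geom = "point", shape = 23, size = 4)
p
```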

Box plot with dots

Dots (or points) can be added to a box plot using the functions geom_dotplot() or geom_jitter() :

Change box plot line colors

Box plot line colors can be automatically controlled by the levels of the variable dose :

It is also possible to manually change box plot line colors using the functions:

  • scale_color_manual() : to use custom colors
  • scale_color_brewer() : to use color palettes from RColorBrewer package
  • scale_color_grey() : to use grey color palettes

Change box plot fill colors

In the R code below, box plot fill colors are automatically controlled by the levels of dose:

It is also possible to manually change box plot fill colors using the functions:

  • scale_fill_manual() : to use custom colors
  • scale_fill_brewer() : to use color palettes from RColorBrewer package
  • scale_fill_grey() : to use grey color palettes

Change the legend position

Change the order of items in the legend

The function scale_x_discrete() can be used to change the order of items to “2”, “0.5”, “1”:

Box plot with multiple groups

Change box plot colors and add dots :

Customized box plots

Change fill colors manually :

Histogram

This R tutorial describes how to create a histogram plot using R software and ggplot2 package.

The function geom_histogram() is used. You can also add a line for the mean using the function geom_vline.

Prepare the data

The data below will be used:

Basic histogram plots

Add mean line and density plot on the histogram

  • The histogram is plotted with density instead of count on y-axis
  • Overlay with transparent density plot. The value of alpha controls the level of transparency
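The two points above, together with the mean line, can be sketched as follows. The simulated weights, the number of bins and the colors are hypothetical choices. after_stat() assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

set.seed(1234)
wdata <- data.frame(weight = rnorm(400, mean = 57, sd = 6))  # hypothetical data

p <- ggplot(wdata, aes(x = weight)) +
  # Put density, not count, on the y-axis
  geom_histogram(aes(y = after_stat(density)), bins = 30,
                 colour = "black", fill = "white") +
  # Semi-transparent density curve; alpha sets the transparency
  geom_density(alpha = 0.2, fill = "#FF6666") +
  # Dashed vertical line at the mean
  geom_vline(xintercept = mean(wdata$weight), colour = "blue", linetype = "dashed")
p
```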

Change histogram plot line types and colors

Change histogram plot colors by groups

Calculate the mean of each group :

The package plyr is used to calculate the average weight of each group :

Change line colors

Histogram plot line colors can be automatically controlled by the levels of the variable sex.

It is also possible to manually change histogram plot line colors using the functions:

  • scale_color_manual() : to use custom colors
  • scale_color_brewer() : to use color palettes from RColorBrewer package
  • scale_color_grey() : to use grey color palettes

Change fill colors

Histogram plot fill colors can be automatically controlled by the levels of sex :

It is also possible to manually change histogram plot fill colors using the functions:

  • scale_fill_manual() : to use custom colors
  • scale_fill_brewer() : to use color palettes from RColorBrewer package
  • scale_fill_grey() : to use grey color palettes

Change the legend position

Use facets

Split the plots into multiple panels:

Customized histogram plots

Combine histogram and density plots :

Change line colors manually :

Scatter plot

This article describes how to create a scatter plot using R software and the ggplot2 package. The function geom_point() is used.

Prepare the data

The mtcars data set is used in the examples below.

Basic scatter plots

Simple scatter plots are created using the R code below. The color, the size and the shape of points can be changed using the function geom_point() as follows:

Add regression lines

The functions below can be used to add regression lines to a scatter plot:

  • geom_smooth() and stat_smooth()
  • geom_abline()

geom_abline() is already described in the article “ggplot2 add straight lines to a plot”. Only the function geom_smooth() is covered in this section.

A simplified format is:

geom_smooth(method = "auto", se = TRUE, fullrange = FALSE, level = 0.95)

  • method : smoothing method to be used. Possible values are lm, glm, gam, loess, rlm.
    • method = “loess”: This is the default for a small number of observations (fewer than 1,000). It computes a smooth local regression. You can read more about loess using the R code ?loess.
    • method = “lm”: It fits a linear model. Note that it is also possible to indicate the formula as formula = y ~ poly(x, 3) to specify a degree 3 polynomial.
  • se : logical value. If TRUE, a confidence interval is displayed around the smooth.
  • fullrange : logical value. If TRUE, the fit spans the full range of the plot.
  • level : level of confidence interval to use. Default value is 0.95.
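The arguments above can be tried out with a sketch like the following. The choice of wt and mpg from mtcars is an assumption for illustration.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Linear regression line with its 95% confidence band
p_lm <- base + geom_smooth(method = "lm", formula = y ~ x)

# Same fit without the confidence band
p_lm_nose <- base + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Degree 3 polynomial fit
p_poly <- base + geom_smooth(method = "lm", formula = y ~ poly(x, 3))

# Local regression (loess) with a 99% band
p_loess <- base + geom_smooth(method = "loess", formula = y ~ x, level = 0.99)
p_lm
```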

Change the appearance of points and lines

This section describes how to change:

  • the color and the shape of points
  • the line type and color of the regression line
  • the fill color of the confidence interval

Scatter plots with multiple groups

This section describes how to change point colors and shapes automatically and manually.

Change the point color/shape/size automatically

In the R code below, point shapes, colors and sizes are controlled by the levels of the factor variable cyl :

Add regression lines

Regression lines can be added as follows:

The fill color of confidence bands can be changed as follows:

Change the point color/shape/size manually

The functions below are used :

  • scale_shape_manual() for point shapes
  • scale_color_manual() for point colors
  • scale_size_manual() for point sizes

It is also possible to manually change point and line colors using the functions:

  • scale_color_brewer() : to use color palettes from RColorBrewer package
  • scale_color_grey() : to use grey color palettes

Scatter plots with the 2d density estimation

The functions geom_density_2d() or stat_density_2d() can be used:

Bar plot

This R tutorial describes how to create a barplot using R software and ggplot2 package.

The function geom_bar() can be used.

Basic barplots

Data derived from ToothGrowth data sets are used. ToothGrowth describes the effect of Vitamin C on Tooth growth in Guinea pigs.

Create barplots

Change the width and the color of bars :
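A minimal sketch is given below. The summary values in df are hypothetical, chosen only to have one bar per dose level.

```r
library(ggplot2)

# Hypothetical summary data: one row per dose level
df <- data.frame(dose = c("D0.5", "D1", "D2"), len = c(4.2, 10, 29.5))

# stat = "identity" uses len as the bar height instead of counting rows
p <- ggplot(df, aes(x = dose, y = len)) +
  geom_bar(stat = "identity", width = 0.5, colour = "blue", fill = "white")
p
```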

Choose which items to display :

Bar plot with labels

Barplot of counts

To make a barplot of counts, we will use the mtcars data set:

Change outline colors

Barplot outline colors can be automatically controlled by the levels of the variable dose :

It is also possible to manually change barplot line colors using the functions:

  • scale_color_manual() : to use custom colors
  • scale_color_brewer() : to use color palettes from RColorBrewer package
  • scale_color_grey() : to use grey color palettes

Change fill colors

In the R code below, barplot fill colors are automatically controlled by the levels of dose :

It is also possible to manually change barplot fill colors using the functions:

  • scale_fill_manual() : to use custom colors
  • scale_fill_brewer() : to use color palettes from RColorBrewer package
  • scale_fill_grey() : to use grey color palettes

Use black outline color :

Change legend position

Change the order of items in the legend

The function scale_x_discrete() can be used to change the order of items to “2”, “0.5”, “1”:

Barplot with multiple groups

Data derived from the ToothGrowth data set are used. ToothGrowth describes the effect of Vitamin C on tooth growth in Guinea pigs. Three dose levels of Vitamin C (0.5, 1, and 2 mg) with each of two delivery methods [orange juice (OJ) or ascorbic acid (VC)] are used:

Create barplots

A stacked barplot is created by default. You can use the function position_dodge() to change this. The barplot fill color is controlled by the levels of dose:
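The difference between stacking and dodging can be sketched as follows. Two things here are assumptions: the group means are computed with base R aggregate(), and fill is mapped to supp so that the two delivery methods are visible as separate bar segments within each dose.

```r
library(ggplot2)

# Mean tooth length per supplement and dose (base R)
df2 <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
df2$dose <- as.factor(df2$dose)

# Default behaviour: the two supplements are stacked in one bar per dose
p_stack <- ggplot(df2, aes(x = dose, y = len, fill = supp)) +
  geom_bar(stat = "identity")

# position_dodge() places them side by side instead
p_dodge <- ggplot(df2, aes(x = dose, y = len, fill = supp)) +
  geom_bar(stat = "identity", position = position_dodge())
p_dodge
```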

Change the color manually :

Add labels

Add labels to a dodged barplot:

Adding labels to a stacked barplot requires 3 steps:

  1. Sort the data by dose and supp : the package plyr is used
  2. Calculate the cumulative sum of the variable len for each dose
  3. Create the plot

If you want to place the labels at the middle of bars, you have to modify the cumulative sum as follows:

Barplot with a numeric x-axis

If the variable on the x-axis is numeric, it can be useful to treat it as a continuous or a factor variable, depending on what you want to do:

Barplot with error bars

The helper function below will be used to calculate the mean and the standard deviation, for the variable of interest, in each group:

Summarize the data

The function geom_errorbar() can be used to produce a bar graph with error bars :
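A possible sketch of a dodged barplot with mean +/- SD error bars is shown below. The summary is computed inline with base R aggregate(); the bar and error-bar widths are arbitrary choices.

```r
library(ggplot2)

# Mean and standard deviation of len for each supp/dose group
df2 <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
df2$sd <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = sd)$len
df2$dose <- as.factor(df2$dose)

p <- ggplot(df2, aes(x = dose, y = len, fill = supp)) +
  geom_bar(stat = "identity", position = position_dodge(0.9)) +
  # Use the same dodge width so each error bar sits on its bar
  geom_errorbar(aes(ymin = len - sd, ymax = len + sd),
                width = 0.2, position = position_dodge(0.9))
p
```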

Customized barplots

Change fill colors manually :

Line plot

This R tutorial describes how to change line types of a graph generated using ggplot2 package.

Line types in R

The different line types available in R are: “blank”, “solid”, “dashed”, “dotted”, “dotdash”, “longdash”, “twodash”.

Basic line plots

Create line plots and change line types

The argument linetype is used to change the line type:

Line plot with multiple groups

Change the appearance of lines globally

In the graphs below, line types, colors and sizes are the same for the two groups:

Change the line types automatically by groups

In the graphs below, line types, colors and sizes are changed automatically by the levels of the variable sex:

Change the appearance of lines manually

The functions below can be used:

  • scale_linetype_manual() : to change line types
  • scale_color_manual() : to change line colors
  • scale_size_manual() : to change the size of lines

Error bars

This tutorial describes how to create a graph with error bars using R software and ggplot2 package. There are different types of error bars which can be created using the functions below :

  • geom_errorbar()
  • geom_linerange()
  • geom_pointrange()
  • geom_crossbar()
  • geom_errorbarh()

Add error bars to bar and line plots

ToothGrowth data is used. It describes the effect of Vitamin C on tooth growth in Guinea pigs. Three dose levels of Vitamin C (0.5, 1, and 2 mg) with each of two delivery methods [orange juice (OJ) or ascorbic acid (VC)] are used :

In the example below, we’ll plot the mean value of Tooth length in each group. The standard deviation is used to draw the error bars on the graph.

First, the helper function below will be used to calculate the mean and the standard deviation, for the variable of interest, in each group :

Summarize the data :
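One way to write such a helper and apply it is sketched below. This version uses base R aggregate() and merge() as a stand-in, and the names data_summary and df3 are assumptions made for illustration.

```r
# Hypothetical helper: mean and SD of `varname` within each combination
# of the grouping columns named in `groupnames`
data_summary <- function(data, varname, groupnames) {
  # Build a formula such as len ~ supp + dose
  form <- reformulate(groupnames, response = varname)
  means <- aggregate(form, data = data, FUN = mean)
  sds <- aggregate(form, data = data, FUN = sd)
  names(sds)[names(sds) == varname] <- "sd"
  # One row per group: the mean is in `varname`, the SD in `sd`
  merge(means, sds, by = groupnames)
}

df3 <- data_summary(ToothGrowth, varname = "len", groupnames = c("supp", "dose"))
df3$dose <- as.factor(df3$dose)  # treat dose as a grouping variable
head(df3)
```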

Barplot with error bars

The function geom_errorbar() can be used to produce the error bars:

Line plot with error bars

You can also use the functions geom_pointrange() or geom_linerange() instead of using geom_errorbar()

Dot plot with mean point and error bars

The functions geom_dotplot() and stat_summary() are used :

The mean +/- SD can be added as a crossbar, an error bar or a pointrange:
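The three variants can be sketched as follows. The helper mean_sd is a hypothetical function written here so that no extra package is needed; the bin width, widths and colors are arbitrary choices.

```r
library(ggplot2)

tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# Hypothetical helper: returns the mean and mean +/- one SD
# in the y / ymin / ymax layout that stat_summary(fun.data = ) expects
mean_sd <- function(x) {
  m <- mean(x)
  s <- sd(x)
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

p <- ggplot(tg, aes(x = dose, y = len)) +
  geom_dotplot(binaxis = "y", stackdir = "center", binwidth = 1)

p_crossbar   <- p + stat_summary(fun.data = mean_sd, geom = "crossbar", width = 0.5)
p_errorbar   <- p + stat_summary(fun.data = mean_sd, geom = "errorbar",
                                 width = 0.2, colour = "red") +
  stat_summary(fun = mean, geom = "point", colour = "red")
p_pointrange <- p + stat_summary(fun.data = mean_sd, geom = "pointrange", colour = "red")
p_pointrange
```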

Pie chart

This R tutorial describes how to create a pie chart for data visualization using R software and ggplot2 package.

The function coord_polar() is used to produce a pie chart, which is just a stacked bar chart in polar coordinates.

Simple pie charts

Use a barplot to visualize the data

Create a pie chart :
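The two steps can be sketched like this. The data frame df holds hypothetical values chosen for illustration.

```r
library(ggplot2)

# Hypothetical data: one value per group
df <- data.frame(group = c("Male", "Female", "Child"), value = c(25, 25, 50))

# Step 1: a single stacked bar
bp <- ggplot(df, aes(x = "", y = value, fill = group)) +
  geom_bar(width = 1, stat = "identity")

# Step 2: map the y axis to the angle, which turns the bar into a pie
pie <- bp + coord_polar(theta = "y", start = 0)
pie
```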

Change the pie chart fill colors

It is possible to manually change the pie chart fill colors using the functions:

  • scale_fill_manual() : to use custom colors
  • scale_fill_brewer() : to use color palettes from RColorBrewer package
  • scale_fill_grey() : to use grey color palettes

Create a pie chart from a factor variable

The PlantGrowth data set is used:

Create the pie chart of the count of observations in each group :

QQ plot

This R tutorial describes how to create a qq plot (or quantile-quantile plot) using R software and the ggplot2 package. QQ plots are used to check whether a given data set follows a normal distribution.

The function stat_qq() or qplot() can be used.

Prepare the data

The mtcars data set is used in the examples below.

Basic qq plots

In the example below, the distribution of the variable mpg is explored :
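A minimal sketch using stat_qq() follows. The reference line added with stat_qq_line() is an optional extra that assumes ggplot2 3.0.0 or later.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(sample = mpg)) +
  stat_qq() +      # sample quantiles against theoretical normal quantiles
  stat_qq_line()   # reference line for judging departures from normality
p
```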

Change qq plot point shapes by groups

In the R code below, point shapes are controlled automatically by the variable cyl.

You can also set point shapes manually using the function scale_shape_manual()

In the R code below, point colors of the qq plot are automatically controlled by the levels of cyl :

It is also possible to manually change qq plot colors using the functions:

  • scale_color_manual() : to use custom colors
  • scale_color_brewer() : to use color palettes from RColorBrewer package
  • scale_color_grey() : to use grey color palettes

Change the legend position

Customized qq plots

Change colors manually :

ECDF plot

This R tutorial describes how to create an ECDF plot (Empirical Cumulative Distribution Function) using R software and the ggplot2 package. For any given number, the ECDF reports the proportion of individuals that are at or below that threshold.

The function stat_ecdf() can be used.

Create some data

ECDF plots
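As a sketch, an ECDF plot on simulated data might look like this. The heights and axis labels are hypothetical. The last line uses base R ecdf() to read off the curve at a single threshold.

```r
library(ggplot2)

set.seed(1234)
df <- data.frame(height = round(rnorm(200, mean = 60, sd = 15)))  # hypothetical data

p <- ggplot(df, aes(x = height)) +
  stat_ecdf(geom = "step") +
  labs(title = "Empirical cumulative distribution function",
       x = "Height", y = "F(height)")
p

# Proportion of observations at or below 60
ecdf(df$height)(60)
```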

Customized ECDF plots

Save a ggplot

print(): print a ggplot to a file

To print a ggplot directly to a file, open a graphics device, call print() on the plot, then close the device:
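A minimal sketch with a PDF device is shown below. The plot and the temporary output path are assumptions for illustration.

```r
library(ggplot2)

myplot <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

out_file <- file.path(tempdir(), "myplot.pdf")  # hypothetical output path
pdf(out_file)   # open a PDF graphics device
print(myplot)   # draw the plot on that device
dev.off()       # close the device so the file is written
```

The explicit print() matters inside functions, loops and scripts, where a plot object is not printed automatically.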

To print to a png file, open the device with png(), call print() on the plot, and close the device with dev.off().

ggsave: save the last ggplot

ggsave is a convenient function for saving the last plot that you displayed. It also guesses the type of graphics device from the extension. This means the only argument you need to supply is the filename.

It’s also possible to make a ggplot and to save it from the screen using the function ggsave():
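A short sketch of ggsave() is shown below. The output path is a hypothetical temporary file, and the plot is passed explicitly rather than relying on the last plot displayed.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

out_file <- file.path(tempdir(), "myplot.pdf")  # hypothetical output path
# The device is guessed from the ".pdf" extension; sizes are in inches by default
ggsave(out_file, plot = p, width = 7, height = 5)
```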

For saving to a png file, use:

Graphical Parameters

The aim of this tutorial is to describe how to modify plot titles (main title, axis labels and legend titles) using R software and ggplot2 package.

The functions below can be used :

ggtitle(label) # for the main title
xlab(label) # for the x axis label
ylab(label) # for the y axis label
labs(...) # for the main title, axis labels and legend titles

The argument label is the text to be used for the main title or for the axis labels.

Prepare the data

ToothGrowth data is used in the following examples.

Example of plot

Change the main title and axis labels

Change plot titles by using the functions ggtitle(), xlab() and ylab() :

Change plot titles using the function labs() as follows:

It is also possible to change legend titles using the function labs():
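Both approaches can be sketched as follows. The title and label strings are made up for illustration.

```r
library(ggplot2)

tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

p <- ggplot(tg, aes(x = dose, y = len, fill = dose)) + geom_boxplot()

# Title and axis labels with the dedicated functions
p1 <- p + ggtitle("Plot of length by dose") +
  xlab("Dose (mg)") + ylab("Teeth length")

# The same with labs(); naming the fill aesthetic also renames its legend
p2 <- p + labs(title = "Plot of length by dose",
               x = "Dose (mg)", y = "Teeth length", fill = "Dose (mg)")
p2
```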

Change the appearance of the main title and axis labels

The main title and the x and y axis labels can be customized using the functions theme() and element_text() as follows:

# main title
p + theme(plot.title = element_text(family, face, colour, size))
# x axis title 
p + theme(axis.title.x = element_text(family, face, colour, size))
# y axis title
p + theme(axis.title.y = element_text(family, face, colour, size))

The arguments below can be used for the function element_text() to change the appearance of the text :

  • family : font family
  • face : font face. Possible values are “plain”, “italic”, “bold” and “bold.italic”
  • colour : text color
  • size : text size in pts
  • hjust : horizontal justification (in [0, 1])
  • vjust : vertical justification (in [0, 1])
  • lineheight : line height. In multi-line text, the lineheight argument is used to change the spacing between lines.
  • color : an alias for colour
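The arguments listed above can be combined as in this sketch. The specific colors, sizes and faces are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
  geom_boxplot() +
  labs(title = "Plot of length by dose", x = "Dose (mg)", y = "Teeth length")

p_styled <- p + theme(
  # Centered, bold italic red title
  plot.title   = element_text(colour = "red", size = 14, face = "bold.italic", hjust = 0.5),
  axis.title.x = element_text(colour = "blue", size = 14, face = "bold"),
  axis.title.y = element_text(colour = "#993333", size = 14, face = "bold")
)
p_styled
```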

Remove x and y axis labels

It’s possible to hide the main title and axis labels using the function element_blank() as follows:

Legend ggplot

The goal of this R tutorial is to describe how to change the legend of a graph generated using ggplot2 package.

ToothGrowth data is used in the examples below :

Make sure that the variable dose is converted to a factor variable (for example with as.factor()) before plotting.

Example of plot

Change the legend position

The position of the legend can be changed using the function theme() as follows:

Note that the argument legend.position can also be a numeric vector c(x, y). In this case it is possible to position the legend inside the plotting area. x and y are the coordinates of the legend box. Their values should be between 0 and 1. c(0,0) corresponds to the “bottom left” and c(1,1) corresponds to the “top right” position. In ggplot2 3.5.0 and later, the numeric form is deprecated in favor of legend.position = "inside" combined with legend.position.inside = c(x, y).
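The keyword positions can be sketched as follows, using a box plot of ToothGrowth as an assumed example plot.

```r
library(ggplot2)

tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)
p <- ggplot(tg, aes(x = dose, y = len, fill = dose)) + geom_boxplot()

p_top    <- p + theme(legend.position = "top")
p_bottom <- p + theme(legend.position = "bottom")
p_none   <- p + theme(legend.position = "none")  # hides the legend entirely
p_top
```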

Change the legend title and text font styles

Change the background color of the legend box

Change the order of legend items

To change the order of items to “2”, “0.5”, “1” :

Remove the plot legend

Remove slashes in the legend of a bar plot

guides() : set or remove the legend for a specific aesthetic

It’s possible to use the function guides() to set or remove the legend of a particular aesthetic (fill, color, size, shape, etc.). The mtcars data set is used:

Default plot without guide specification

The R code below creates a scatter plot. The color and the shape of the points are determined by the factor variables cyl and gear, respectively. The size of the points is controlled by the variable qsec.

Change the legend position for multiple guides

Change the order for multiple guides

The function guide_legend() is used :

If a continuous color is used, the order of the color guide can be changed using the function guide_colourbar() :

Remove a legend for a particular aesthetic

The R code below removes the legend for the aesthetics color and size :
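A sketch of this is given below. The copy named mt is an assumption for illustration. The value "none" assumes a recent ggplot2 release, where it replaces the older FALSE.

```r
library(ggplot2)

mt <- mtcars
mt$cyl <- as.factor(mt$cyl)
mt$gear <- as.factor(mt$gear)

# Colour and shape follow the factors cyl and gear; size follows qsec
p <- ggplot(mt, aes(x = mpg, y = wt, colour = cyl, size = qsec, shape = gear)) +
  geom_point()

# Drop the colour and size legends; the shape legend is kept
p_fewer <- p + guides(colour = "none", size = "none")
p_fewer
```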

A particular legend can also be removed when using the scale_xx() functions. In this case the argument guide is used as follows:

Colors

The goal of this article is to describe how to change the color of a graph generated using R software and the ggplot2 package. A color can be specified either by name (e.g. “red”) or by hexadecimal code (e.g. “#FF1234”). The different color systems available in R are described in the article “colors in R”.

In this R tutorial, you will learn how to :

  • change colors by groups (automatically and manually)
  • use RColorBrewer and Wes Anderson color palettes
  • use gradient colors

Prepare the data

ToothGrowth and mtcars data sets are used in the examples below.

Simple plots

Change colors by groups

The following R code changes the color of the graph by the levels of dose :

The lightness (l) and the chroma (c, intensity of color) of the default (hue) colors can be modified using the functions scale_color_hue() and scale_fill_hue() as follows:

Note that, the default values for l and c are : l = 65, c = 100.

Change colors manually

Custom color palettes can be specified using the functions:

  • scale_fill_manual() for box plot, bar plot, violin plot, etc
  • scale_color_manual() for lines and points

Note that, the argument breaks can be used to control the appearance of the legend. This holds true also for the other scale_xx() functions.

Use RColorBrewer palettes

The color palettes available in the RColorBrewer package are described in the article “color in R”.

Available brewer palettes: Reds, Blues, Greys, Purples, RdPu, YlGn, YlOrRd, YlOrBr, YlGnBu, Greens, Oranges, BuPu, BuGn, OrRd, PuBu, PuBuGn, Set1, Set2, Set3, Pastel1, Pastel2, Paired, Accent, Spectral, RdYlGn, RdYlBu, RdGy, RdBu, PuOr, PRGn, PiYG, BrBG.

Use Wes Anderson color palettes

Install and load the color palettes as follows:

The available color palettes are :GrandBudapest1, Moonrise1, Royal1, Moonrise2, Royal2, Cavalcanti, Moonrise3, GrandBudapest2, Chevalier, Zissou, FantasticFox, Darjeeling, Rushmore.

Use gray colors

The functions to use are :

  • scale_colour_grey() for points, lines, etc.
  • scale_fill_grey() for box plot, bar plot, violin plot, etc.

Change the gray value at the low and the high ends of the palette :
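The start and end arguments control the two ends of the grey palette, as in this sketch. The values 0.8 and 0.2 and the example plots are arbitrary choices.

```r
library(ggplot2)

tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# Fill: light grey (0.8) for the first level, dark grey (0.2) for the last
p_box <- ggplot(tg, aes(x = dose, y = len, fill = dose)) +
  geom_boxplot() +
  scale_fill_grey(start = 0.8, end = 0.2) +
  theme_classic()

# Points: the same idea with the colour aesthetic
p_pts <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  scale_colour_grey(start = 0.8, end = 0.2)
p_box
```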

Continuous colors

The graph can be colored according to the values of a continuous variable using the functions :

  • scale_color_gradient(), scale_fill_gradient() for sequential gradients between two colors
  • scale_color_gradient2(), scale_fill_gradient2() for diverging gradients
  • scale_color_gradientn(), scale_fill_gradientn() for gradients between n colors

Gradient colors for scatter plots

The graphs are colored using the qsec continuous variable :
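The three gradient types can be sketched on a scatter plot as follows. The color choices and the use of the mean of qsec as the midpoint are assumptions for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = qsec)) + geom_point()

# Sequential gradient between two colors
p_seq <- p + scale_colour_gradient(low = "blue", high = "red")

# Diverging gradient around a midpoint
mid <- mean(mtcars$qsec)
p_div <- p + scale_colour_gradient2(midpoint = mid, low = "blue",
                                    mid = "white", high = "red")

# Gradient between n colors
p_n <- p + scale_colour_gradientn(colours = rainbow(5))
p_seq
```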

Gradient colors for histogram plots

Gradient between n colors