Learning Objectives:

At the end of the session, the participants are expected to:

  • understand the basic concepts of presenting data
  • learn how to use the R base functions in generating appropriate plots to effectively present data
  • learn how to use the functions from the package ggplot2 for effective data presentation

Introduction

  • After data collection, we organize and analyze the data, and then present the results in a form that reveals and highlights the important information we were able to extract.

  • One way to present data effectively is graphically, giving a visual picture of the data set. A graph also lets us convey more information about the variable of interest without showing too many numbers.

“Analyzing data presented in a good statistical chart is analogous to examining a painting where we can discover a deeper message.”

— Almeda, Capistrano and Sarte (2010)

Let us take a look at these pictures and try to see what is wrong with them.

image1

image2

R Base Graphs

Preliminary tasks

  1. Launch RStudio.

  2. Prepare your data and save it in an external .txt, .csv or .xlsx file.

  3. Import your data into R.
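
As a minimal sketch of step 3, assuming the data sit in a .csv file: the snippet below first writes mtcars to a temporary file (the path csv_path is only an illustration, not part of the original material) and then reads it back with read.csv().

```r
# Write a small example file so the sketch is self-contained
csv_path <- tempfile(fileext = ".csv")   # hypothetical file location
write.csv(mtcars, csv_path, row.names = TRUE)

# Import the file; the first column holds the row names
imported <- read.csv(csv_path, row.names = 1)

# Check the first rows and the dimensions of the imported data
head(imported, 3)
dim(imported)
```

For tab-delimited .txt files, read.delim() plays the same role. Reading .xlsx files needs an add-on package such as readxl.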

For illustration purposes, we will use the mtcars data.

# Create my_data
my_data <- mtcars
# Print the first 6 rows
head(my_data, 6)

Creating Graphs

The R base function plot() can be used to create graphs.

Let us create a scatterplot of miles per US gallon (mpg) against weight (wt, in 1000 lbs).

plot(x = my_data$wt, y = my_data$mpg,
     pch = 16, frame = TRUE,
     xlab = "Weight (in 1000 lbs)", ylab = "Miles per gallon", col = "royalblue4")

How do you specify colors in R plots?

R has 657 built in color names. To see the list of color names in R:

colors() 

R also accepts hexadecimal color codes of the form "#RRGGBB", in which each pair of base-16 digits gives the intensity of red, green, and blue.

The image below shows an example of hexadecimal colors and their corresponding code. This is lifted from www.visibone.com

hex

To know more about colors in R, check this cheat sheet
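
The link between color names and hexadecimal codes can be checked directly in R. The sketch below uses the base functions col2rgb() and rgb(); the variable names rgb_parts and hex_code are only illustrative.

```r
# Look up the red, green and blue components (0-255) of a named color
rgb_parts <- col2rgb("steelblue")
rgb_parts

# Convert those components to a hexadecimal code
hex_code <- rgb(rgb_parts["red", 1], rgb_parts["green", 1], rgb_parts["blue", 1],
                maxColorValue = 255)
hex_code

# Both specifications draw the same color
plot(1:2, c(1, 1), pch = 16, cex = 4, col = c("steelblue", hex_code),
     xlim = c(0, 3), xlab = "", ylab = "")
```

Either form can be passed to the col argument of the plotting functions used in this section.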

Saving graphs

If you are working with RStudio, the plot can be exported from the menu in the Plots panel.

Plots –> Export –> Save as Image or Save as PDF

Figure 2.1

It’s also possible to save the graph using R code, as follows:

  1. Open a file to save your image using a function such as jpeg(), png(), or pdf(). Additional arguments giving the width and the height of the image can also be used.
  2. Create the plot
  3. Close the file with dev.off()
# Open a pdf file
pdf("rplot.pdf") 
# 2. Create a plot
plot(x = my_data$wt, y = my_data$mpg,
     pch = 16, frame = FALSE,
     xlab = "wt", ylab = "mpg", col = "#2E9FDF")
# Close the pdf file
dev.off() 
## png 
##   2

Or use this:

# 1. Open jpeg file
jpeg("rplot.jpg", width = 350, height = 350)
# 2. Create the plot
plot(x = my_data$wt, y = my_data$mpg,
     pch = 16, frame = FALSE,
     xlab = "wt", ylab = "mpg", col = "#2E9FDF")
# 3. Close the file
dev.off()
## png 
##   2

Note that the R code above saves the file in the current working directory.
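
To save somewhere other than the working directory, a full path can be given to the device function. The sketch below is one way to do it, assuming the session's temporary directory as the target; the file name rplot_example.pdf is made up for illustration.

```r
# Build a full path inside the session's temporary directory (illustrative location)
out_file <- file.path(tempdir(), "rplot_example.pdf")

# 1. Open the pdf device; width and height are in inches for pdf()
pdf(out_file, width = 6, height = 4)
# 2. Create the plot
plot(mtcars$wt, mtcars$mpg, pch = 16,
     xlab = "Weight (in 1000 lbs)", ylab = "Miles per gallon")
# 3. Close the device so the file is written to disk
invisible(dev.off())

# Confirm that the file was created
file.exists(out_file)
```

Note that jpeg() and png() measure width and height in pixels by default, while pdf() uses inches.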

Generic plot types in R

The plot() function is the generic function for plotting in R. It can be used to create basic graphs.

A simplified format of the function is:

plot(x, y, type="p")
  • x and y: the coordinates of points to plot

  • type : the type of graph to create; Possible values are :

    • type=“p”: for points (by default)
    • type=“l”: for lines
    • type=“b”: for both; points are connected by a line
    • type=“o”: for both ‘overplotted’;
    • type=“h”: for ‘histogram’ like vertical lines
    • type=“s”: for stair steps
    • type=“n”: for no plotting

Examples

x <- 1:10; y <- x * x
plot(x, y, type = "p")

plot(x, y, type = "b")

plot(x, y, type = "h")

plot(x, y, type = "s")
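
To compare the plot types side by side, the four examples can be drawn on one page. This is a small sketch that relies on par(mfrow = ...) to split the plotting region; the vector name types is only illustrative.

```r
x <- 1:10
y <- x^2

# The plot types to compare
types <- c("p", "b", "h", "s")

# Split the device into a 2 x 2 grid and remember the previous settings
old_par <- par(mfrow = c(2, 2))
for (tp in types) {
  plot(x, y, type = tp, main = paste0("type = \"", tp, "\""))
}
# Restore the previous layout
par(old_par)
```

Restoring the saved settings at the end keeps later plots from being squeezed into the same grid.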

Scatterplots

Scatterplots are generally used to examine the relationship between two variables.

  • R base scatter plot: plot()
  • Enhanced scatter plots: car::scatterplot()
  • Scatterplot matrices

R base scatter plot: plot()

x <- mtcars$wt
y <- mtcars$mpg
# Plot with main and axis titles
# Change point shape (pch = 19) and remove frame.
plot(x, y, main = "Main title of the Graph",
     xlab = "X axis title", ylab = "Y axis title",
     pch = 19, frame = FALSE)
# Add regression line (x and y are the vectors defined above)
abline(lm(y ~ x), col = "red")
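
The line drawn by abline() comes from a fitted linear model, so its intercept and slope can be inspected before plotting. A minimal sketch, assuming the model is stored in an object named fit (an illustrative name):

```r
# Fit the regression of mpg on wt using the column names directly
fit <- lm(mpg ~ wt, data = mtcars)

# Intercept and slope of the fitted line
coef(fit)

# Draw the scatter plot and add the fitted line
plot(mtcars$wt, mtcars$mpg, pch = 19, frame = FALSE,
     xlab = "Weight (in 1000 lbs)", ylab = "Miles per gallon")
abline(fit, col = "red")
```

Storing the model first also makes it available for summary(fit) if more detail is needed.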

When choosing point symbols (pch) in R, we have the following options:

pch

Other details can be found here.

# Add a smoothed fit with lowess()
plot(x, y, main = "Main title of the Graph",
     xlab = "X axis title", ylab = "Y axis title",
     pch = 19, frame = FALSE)
lines(lowess(x, y), col = "blue")

Scatterplot matrices

Basic plots: pairs()

data(iris)
head(iris)
pairs(iris[,1:4], pch = 19)

Show the upper panels only:

pairs(iris[,1:4], pch = 19, lower.panel = NULL)

Color points by group (species):

my_cols <- c("#993333", "#FFCC33", "#003399")  
pairs(iris[,1:4], pch = 19,  cex = 0.5,
      col = my_cols[iris$Species],
      lower.panel=NULL)

Use the R package psych

The function pairs.panels() [in psych package] can also be used to create a scatter plot matrix, with bivariate scatter plots below the diagonal, histograms on the diagonal, and the Pearson correlations above the diagonal.

install.packages("psych")
library(psych)
pairs.panels(iris[,-5], 
             method = "pearson", # correlation method
             hist.col = "#00AFBB",
             density = TRUE,  # show density plots
             ellipses = FALSE # show correlation ellipses
             )
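
The Pearson correlations shown above the diagonal can be cross-checked with the base function cor(). This is only a sketch of that check; the object name cor_mat is illustrative.

```r
# Pearson correlations between the four numeric columns of iris
cor_mat <- cor(iris[, 1:4], method = "pearson")

# Round for easier reading
round(cor_mat, 2)
```

The matrix is symmetric with ones on the diagonal, so each pair of variables appears twice.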

Box plots

Boxplots are used to examine the following:

  • the central tendency (represented by the median)
  • the dispersion (represented by the first and third quartiles)
  • the skewness or symmetry of distribution of data
  • the outliers (if there are any)
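
The quantities listed above can also be computed without drawing anything. The sketch below uses the base function boxplot.stats() on mtcars$mpg; the object names mpg and bp_stats are illustrative.

```r
mpg <- mtcars$mpg

# Five numbers used to draw the box and whiskers:
# lower whisker, lower hinge, median, upper hinge, upper whisker
bp_stats <- boxplot.stats(mpg)
bp_stats$stats

# Points that would be drawn individually as outliers (may be empty)
bp_stats$out

# Quartiles for comparison; hinges and quartiles can differ slightly
quantile(mpg, probs = c(0.25, 0.5, 0.75))
```

The hinges are the values that boxplot() uses for the edges of the box.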

To demonstrate how to create box plots in R, we’ll use the R built-in ToothGrowth data set.

Description of ToothGrowth data

The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C and coded as VC).

# Print the first 6 rows
head(ToothGrowth, 6)

boxplot() function

Draw a box plot of the length of teeth:

# Box plot of one variable
boxplot(ToothGrowth$len)

# Box plots by groups (dose)
# remove frame
boxplot(len ~ dose, data = ToothGrowth, frame = FALSE)

# Horizontal box plots
boxplot(len ~ dose, data = ToothGrowth, frame = FALSE,
        horizontal = TRUE)

We can edit the labels of the groups.

# Change group names
boxplot(len ~ dose, data = ToothGrowth, frame = FALSE, names = c("D0.5", "D1", "D2"))

# Change the color of border using one single color
boxplot(len ~ dose, data = ToothGrowth, frame = FALSE,
        border = "steelblue")

# Change the color of border.
#  Use different colors for each group
boxplot(len ~ dose, data = ToothGrowth, frame = FALSE,
        border = c("#999999", "#E69F00", "#56B4E9"))

# Change fill color : single color
boxplot(len ~ dose, data = ToothGrowth, frame = FALSE,
        col = "steelblue")

# Change fill color: multiple colors
boxplot(len ~ dose, data = ToothGrowth, frame = FALSE,
        col = c("#999999", "#E69F00", "#56B4E9"))

Box plot with multiple groups

boxplot(len ~ supp*dose, data = ToothGrowth,
        col = c("green", "steelblue"), frame = FALSE)

Change main title and axis labels

# Change axis titles and remove the frame
boxplot(len ~ dose, data = ToothGrowth,
        main = "Plot of length by dose",
        xlab = "Dose (mg)", ylab = "Length",
        col = "lightgray", frame = FALSE)

Barplots

Bar graphs are used to compare things between different groups or to track changes over time. However, when trying to measure change over time, bar graphs are best when the changes are larger.

Here, we’ll use the R built-in VADeaths data set.

Description of VADeaths data

Death rates per 1000 in Virginia in 1940.

VADeaths
##       Rural Male Rural Female Urban Male Urban Female
## 50-54       11.7          8.7       15.4          8.4
## 55-59       18.1         11.7       24.3         13.6
## 60-64       26.9         20.3       37.0         19.3
## 65-69       41.0         30.9       54.6         35.1
## 70-74       66.0         54.3       71.1         50.0
# Subset
x <- VADeaths[1:3, "Rural Male"]
x
## 50-54 55-59 60-64 
##  11.7  18.1  26.9
# Bar plot of one variable
barplot(x)

# Horizontal bar plot
barplot(x, horiz = TRUE)

# Change group names
barplot(x, names.arg = c("A", "B", "C"))

# Change border and fill color using one single color
b1<-barplot(x, col = "white", border = "steelblue")

# Change the color of border.
#  Use different colors for each group
b2<-barplot(x, col = "white",
        border = c("#999999", "#E69F00", "#56B4E9"))

# Change fill color : single color
b3<-barplot(x, col = "steelblue")

# Change fill color: multiple colors
b4<-barplot(x, col = c("#999999", "#E69F00", "#56B4E9"))

# Change main title and axis titles
barplot(x, main = "Death Rates in Virginia",
        xlab = "Age", ylab = "Rate")

Stacked bar plots

barplot(VADeaths,
         col = c("lightblue", "mistyrose", "lightcyan", 
                 "lavender", "cornsilk"),
        legend = rownames(VADeaths))

Grouped bar plots

barplot(VADeaths,
         col = c("lightblue", "mistyrose", "lightcyan", 
                 "lavender", "cornsilk"),
        legend = rownames(VADeaths), beside = TRUE)
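
barplot() groups the bars by the columns of the matrix it is given. As a sketch of the alternative grouping, the matrix can be transposed with t() so that the bars are grouped by age instead; the object name mids is illustrative.

```r
# Transpose so that each age group becomes one cluster of bars
mids <- barplot(t(VADeaths), beside = TRUE,
                col = c("lightblue", "mistyrose", "lightcyan", "lavender"),
                legend.text = colnames(VADeaths),
                args.legend = list(x = "topleft"),
                xlab = "Age group", ylab = "Deaths per 1000")

# barplot() returns the x positions of the bars
dim(mids)
```

The returned positions are useful when adding text or error bars on top of the bars.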

Line plots

Line graphs are used to track changes over short and long periods of time. When smaller changes exist, line graphs are better to use than bar graphs. Line graphs can also be used to compare changes over the same period of time for more than one group.

We can use the plot() function and the lines() function to create line plots in R.

A simplified format of plot() for creating a line plot is shown below. To add another line to the same plot, we can use the lines() function.

plot(x, y, type = "l", lty = 1)
lines(x, y, type = "l", lty = 1)
  • x, y: coordinate vectors of points to join

  • type: character indicating the type of plotting. Allowed values are:

    • “p” for points
    • “l” for lines
    • “b” for both points and lines
    • “c” for the lines part of “b” alone (gaps are left where the points would be)
    • “o” for overplotted points and lines
    • “s” and “S” for stair steps
    • “n” does not produce any points or lines

  • lty: line types. Line types can either be specified as an integer (0=blank, 1=solid (default), 2=dashed, 3=dotted, 4=dotdash, 5=longdash, 6=twodash) or as one of the character strings “blank”, “solid”, “dashed”, “dotted”, “dotdash”, “longdash”, or “twodash”, where “blank” uses ‘invisible lines’ (i.e., does not draw them).

# Create some variables
x <- 1:10
y1 <- x*x
y2  <- 2*y1
# Create a basic stair steps plot 
a<-plot(x, y1, type = "S")

# Show both points and line
b<-plot(x, y1, type = "b", pch = 19, 
     col = "red", xlab = "x", ylab = "y")

# Create a first line
plot(x, y1, type = "b", frame = FALSE, pch = 19, 
     col = "red", xlab = "x", ylab = "y")
# Add a second line
lines(x, y2, pch = 18, col = "blue", type = "b", lty = 2)
# Add a legend to the plot
legend("topleft", legend=c("Line 1", "Line 2"),
       col=c("red", "blue"), lty = 1:2, cex=0.8)
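
The six drawable line types can be previewed in a single figure. This is a small illustrative sketch; the vector lty_names simply repeats the names from the lty description.

```r
# Names of line types 1 to 6
lty_names <- c("solid", "dashed", "dotted", "dotdash", "longdash", "twodash")

# Empty plotting region without axes
plot(0, 0, type = "n", xlim = c(0, 1), ylim = c(0.5, 6.5),
     xlab = "", ylab = "", xaxt = "n", yaxt = "n", frame.plot = FALSE)

# One horizontal line per line type
for (i in seq_along(lty_names)) {
  abline(h = i, lty = i, lwd = 2)
}

# Label each line with its integer code and name
axis(2, at = seq_along(lty_names), labels = paste(1:6, lty_names), las = 1)
```

The integer code and the character string are interchangeable, so lty = 2 and lty = "dashed" give the same line.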

Pie Charts

Pie charts are generally used to show percentage or proportional data and usually the percentage represented by each category is provided next to the corresponding slice of pie. Pie charts are good for displaying data for around 6 categories or fewer.

# Create some data
df <- data.frame(
  group = c("Male", "Female", "Child"),
  value = c(25, 25, 50)
  )
df

The function pie() can be used to draw a pie chart.

pie(x, labels = names(x), radius = 0.8)
  • x: a vector of non-negative numerical quantities. The values in x are displayed as the areas of pie slices.
  • labels: character strings giving names for the slices.
  • radius: radius of the pie circle. If the character strings labeling the slices are long it may be necessary to use a smaller radius.
pct <- round(df$value/sum(df$value)*100)
lbl <- paste(df$group, pct)
lbls <- paste(lbl,"%",sep="")
pie(df$value, labels = lbls, radius = 1)

# Change colors
pie(df$value, labels = lbls, radius = 1,
    col = c("#999999", "#E69F00", "#56B4E9"))

3D pie chart

The function pie3D() [in plotrix package] can be used to draw a 3D pie chart.

Install plotrix package:

install.packages("plotrix")
# 3D pie chart
library("plotrix")
pie3D(df$value, labels = df$group, radius = 1.5, 
      col = c("#999999", "#E69F00", "#56B4E9"))

# Explode the pie chart
pie3D(df$value, labels = df$group, radius = 1.5,
      col = c("#999999", "#E69F00", "#56B4E9"),
      explode = 0.1)

Histogram and Density Plots

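The simulated data set below contains 400 weights: 200 drawn from a normal distribution with mean 55 and 200 from one with mean 65 (both with standard deviation 5), standing in for the two sexes.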

set.seed(1234)
x <- c(rnorm(200, mean=55, sd=5),
     rnorm(200, mean=65, sd=5))
head(x)
## [1] 48.96467 56.38715 60.42221 43.27151 57.14562 57.53028

A histogram can be created using the function hist(), whose simplified format is as follows:

hist(x, breaks)
  • x: a numeric vector
  • breaks: breakpoints between histogram cells.
hist(x, col = "steelblue", frame = FALSE)

# Change the number of breaks
hist(x, col = "steelblue", frame = FALSE,
     breaks = 30)
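
When breaks is a single number, hist() treats it as a suggestion and picks "pretty" breakpoints near that count. The sketch below inspects what was actually used by asking hist() not to draw; the object name h is illustrative.

```r
set.seed(1234)
x <- c(rnorm(200, mean = 55, sd = 5),
       rnorm(200, mean = 65, sd = 5))

# Compute the histogram without plotting it
h <- hist(x, breaks = 30, plot = FALSE)

# Breakpoints chosen by hist() and the count in each cell
h$breaks
h$counts

# Every observation falls in exactly one cell
sum(h$counts)
```

Passing a numeric vector to breaks instead fixes the breakpoints exactly.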

The function density() is used to estimate kernel density.

# Compute the density data
dens <- density(mtcars$mpg)
# plot density
plot(dens, frame = TRUE, col = "steelblue", 
     main = "Density plot of mpg")

# Fill the density plot using polygon()
plot(dens, frame = TRUE, col = "steelblue", 
     main = "Density plot of mpg") 
polygon(dens, col = "steelblue")
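
The object returned by density() holds the evaluation points and the estimated density values, which can be inspected directly. As a rough sketch, the area under the curve is approximated below with the trapezoid rule; the names dens_mpg and area are illustrative.

```r
dens_mpg <- density(mtcars$mpg)

# Bandwidth chosen by the default rule
dens_mpg$bw

# Trapezoid-rule approximation of the area under the estimated curve
area <- sum(diff(dens_mpg$x) *
            (head(dens_mpg$y, -1) + tail(dens_mpg$y, -1)) / 2)
area
```

The area should be close to 1, as expected for a density.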

Quantile-quantile (Q-Q) plots

The Q–Q (quantile-quantile) plot is a probability plot: a graphical method for comparing two probability distributions by plotting their quantiles against each other. It can be used to check the normality of the data.

Here, we’ll again use the built-in R data set ToothGrowth.

# Store the data in the variable my_data
my_data <- ToothGrowth

The R base functions qqnorm() and qqline() can be used to produce quantile-quantile plots:

  • qqnorm(): produces a normal QQ plot of the variable
  • qqline(): adds a reference line
qqnorm(my_data$len, pch = 1, frame = FALSE)
qqline(my_data$len, col = "steelblue", lwd = 2)

It’s also possible to use the function qqPlot() [in car package]:

install.packages("car")
library(car)
qqPlot(my_data$len)

## [1] 23  1

As all the points fall approximately along this reference line, we can assume normality.
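
To see what a normal Q-Q plot contains, the same picture can be built by hand: sorted observations are plotted against normal quantiles at the probabilities given by ppoints(). This is a sketch of the idea; the names theoretical, observed and same are illustrative.

```r
len <- ToothGrowth$len
n <- length(len)

# Theoretical normal quantiles and the sorted sample
theoretical <- qnorm(ppoints(n))
observed <- sort(len)

# Manual Q-Q plot with the usual reference line
plot(theoretical, observed,
     xlab = "Theoretical quantiles", ylab = "Sample quantiles")
qqline(len, col = "steelblue", lwd = 2)

# Compare with the x values computed by qqnorm()
qq <- qqnorm(len, plot.it = FALSE)
same <- isTRUE(all.equal(sort(qq$x), theoretical))
same
```

The comparison at the end confirms that qqnorm() uses the same theoretical quantiles.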

SUMMARY

  • scatterplots
  • boxplots
  • bar plots
  • line plots
  • pie charts
  • histogram and density plots
  • qqplots

Note that other graphical parameters can be customized. To know more about this, click here

ggplot2

ggplot2 is a powerful and flexible R package, implemented by Hadley Wickham, for producing elegant graphics.

The concept behind ggplot2 divides a plot into three fundamental parts: Plot = Data + Aesthetics + Geometry.

The principal components of every plot can be defined as follows:

  • Data is a data frame
  • Aesthetics is used to indicate the x and y variables. It can also be used to control the color, the size or the shape of points, the height of bars, etc.
  • Geometry defines the type of graphic (histogram, box plot, line plot, density plot, dot plot, etc.)

There are two major functions in the ggplot2 package:

  • qplot() stands for quick plot, and can be used to produce simple plots easily.
  • ggplot() is more flexible and robust than qplot() for building a plot piece by piece.

Install and Load the necessary packages

install.packages("ggplot2")
install.packages("ggpubr")
library(ggplot2)
library(ggpubr)
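
The Data + Aesthetics + Geometry decomposition can be made explicit by building a plot one piece at a time. A minimal sketch using mtcars, with illustrative object names base and p_points:

```r
library(ggplot2)

# Data: the data frame; Aesthetics: which columns map to x and y
base <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg))

# Geometry: draw the same data and aesthetics as points
p_points <- base + geom_point()

# Printing the object draws the plot
print(p_points)
```

Swapping geom_point() for another geometry changes the type of plot while the data and aesthetics stay the same.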

Data Format and Preparation

The data should be a data.frame (columns are variables and rows are observations).

The data set mtcars is used in the examples below:

#load the data
data(mtcars)
df<-mtcars[,c("mpg","cyl","wt")]
head(df)

Box plot

This R tutorial describes how to create a box plot using R software and ggplot2 package.

The function geom_boxplot() is used. A simplified format is :

geom_boxplot(outlier.colour="black", outlier.shape=16,outlier.size=2)
  • outlier.colour, outlier.shape, outlier.size : The color, the shape and the size for outlying points

Prepare the data set

In this illustration, the ToothGrowth data is used:

# Convert the variable dose from a numeric to a factor variable
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
head(ToothGrowth)

Make sure that the variable dose is converted as a factor variable using the above R script.
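
The conversion can be verified before plotting. The sketch below works on a copy named tg (an illustrative name) so that the check does not depend on whether ToothGrowth has already been modified.

```r
# Work on a copy of the data
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# dose is now a factor with one level per dose
is.factor(tg$dose)
levels(tg$dose)

# Number of observations at each dose level
table(tg$dose)
```

With dose stored as a factor, ggplot2 treats it as a discrete variable and draws one box per level.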

Basic box plot

# Basic box plot
p <- ggplot(ToothGrowth, aes(x=dose, y=len)) + 
  geom_boxplot()
# Rotate the box plot
b<-p + coord_flip()
# Notched box plot
c<-ggplot(ToothGrowth, aes(x=dose, y=len)) + 
  geom_boxplot(notch=TRUE)
# Change outlier, color, shape and size
d<-ggplot(ToothGrowth, aes(x=dose, y=len)) + 
  geom_boxplot(outlier.colour="red",outlier.shape=8,outlier.size=4)
ggarrange(p,b,c,d, nrow=2, ncol=2, labels=c("a","b","c","d"))

Box plot with dots

Dots (or points) can be added to a box plot using the functions geom_dotplot() or geom_jitter() :

# Box plot with dot plot
a<-p + geom_dotplot(binaxis='y', stackdir='center', dotsize=1)
# Box plot with jittered points 0.2 : degree of jitter in x direction
b<-p + geom_jitter(shape=16,position=position_jitter(0.2))
ggarrange(a,b, labels=c("a","b"))

Note that geom_jitter adds a small amount of random variation to the location of each point, and is a useful way of handling overplotting caused by discreteness in smaller datasets.

Change box plot line colors

Box plot line colors can be automatically controlled by the levels of the variable dose :

# Change box plot line colors by groups
p<-ggplot(ToothGrowth, aes(x=dose, y=len, color=dose)) +
  geom_boxplot()
p

It is also possible to change box plot line colors manually using the functions :

  • scale_color_manual() : to use custom colors
  • scale_color_brewer() : to use color palettes from RColorBrewer package
  • scale_color_grey() : to use grey color palettes
# Use custom color palettes
a<-p+scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# Use brewer color palettes
b<-p+scale_color_brewer(palette="Dark2")
# Use grey scale
c<-p + scale_color_grey() + theme_classic()
ggarrange(a,b,c, nrow=1,ncol=3, labels=c("a","b","c"))

Change box plot fill colors

In the R code below, box plot fill colors are automatically controlled by the levels of dose :

# Use single color
a<-ggplot(ToothGrowth, aes(x=dose, y=len)) +
  geom_boxplot(fill='#A4A4A4', color="black")+
  theme_classic()
# Change box plot colors by groups
p<-ggplot(ToothGrowth, aes(x=dose, y=len, fill=dose)) +
  geom_boxplot()
ggarrange(a,p, labels=c("a","p"))

It is also possible to change box plot fill colors manually using the functions :

  • scale_fill_manual() : to use custom colors
  • scale_fill_brewer() : to use color palettes from RColorBrewer package
  • scale_fill_grey() : to use grey color palettes
# Use custom color palettes
a<-p+scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# use brewer color palettes
b<-p+scale_fill_brewer(palette="Dark2")
# Use grey scale
c<-p + scale_fill_grey() + theme_classic()
ggarrange(a,b,c, nrow=1,ncol=3,labels=c("a","b","c"))

Change the legend position

a<-p + theme(legend.position="top")
b<-p + theme(legend.position="bottom")
c<-p + theme(legend.position="none") # Remove legend
ggarrange(a,b,c,nrow=1,ncol=3, labels=c("a","b","c"))

Change the order of items

The function scale_x_discrete() can be used to change the order of the groups along the x axis to “2”, “0.5”, “1” :

p + scale_x_discrete(limits=c("2", "0.5", "1"))

Box plot with multiple groups

# Change box plot colors by groups
a<-ggplot(ToothGrowth, aes(x=dose, y=len,
    fill=supp)) + 
    geom_boxplot()
# Change the position
p<-ggplot(ToothGrowth, aes(x=dose, y=len, fill=supp)) +
  geom_boxplot(position=position_dodge(1))
ggarrange(a,p, labels=c("a","p"))

Change box plot colors and add dots :

# Add dots
a<-p + geom_dotplot(binaxis='y', stackdir='center',
                 position=position_dodge(1))
# Change colors
b<-p+scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))
ggarrange(a,b,labels=c("a","b"))

Customized box plots

# Basic box plot
a<-ggplot(ToothGrowth, aes(x=dose, y=len)) + 
  geom_boxplot(fill="gray")+
  labs(title="Plot of length per dose",x="Dose (mg)", y = "Length")+
  theme_classic()
# Change  automatically color by groups
bp <- ggplot(ToothGrowth, aes(x=dose, y=len, fill=dose)) + 
  geom_boxplot()+
  labs(title="Plot of length  per dose",x="Dose (mg)", y = "Length")
b<-bp + theme_classic()
ggarrange(a,b,labels=c("a","b"))

Change fill colors manually :

# Sequential palette
a<-bp + scale_fill_brewer(palette="Blues") + theme_classic()
# Qualitative palette
b<-bp + scale_fill_brewer(palette="Dark2") + theme_minimal()
# Diverging palette
c<-bp + scale_fill_brewer(palette="RdBu") + theme_minimal()
ggarrange(a,b,c, nrow=1,ncol=3, labels=c("a","b","c"))

Histogram

This R tutorial describes how to create a histogram plot using R software and ggplot2 package.

The function geom_histogram() is used. You can also add a line for the mean using the function geom_vline.

Prepare the data

The data below will be used :

set.seed(1234)
df <- data.frame(
  sex=factor(rep(c("F", "M"), each=200)),
  weight=round(c(rnorm(200, mean=55, sd=5), rnorm(200, mean=65, sd=5)))
  )
head(df)

Basic histogram plots

library(ggplot2)
# Basic histogram
a<-ggplot(df, aes(x=weight)) + geom_histogram()
# Change the width of bins
b<-ggplot(df, aes(x=weight)) + 
  geom_histogram(binwidth=1)
# Change colors
p<-ggplot(df, aes(x=weight)) + 
  geom_histogram(color="black", fill="white")
ggarrange(a,b,p,nrow=1,ncol=3, labels=c("a","b","p"))

Add mean line and density plot on the histogram

  • The histogram is plotted with density instead of count on y-axis
  • Overlay with transparent density plot. The value of alpha controls the level of transparency
# Add mean line
a<-p+ geom_vline(aes(xintercept=mean(weight)),
            color="blue", linetype="dashed", size=1)
# Histogram with density plot
# (after_stat() requires ggplot2 >= 3.3.0; older versions use y=..density..)
b<-ggplot(df, aes(x=weight)) + 
 geom_histogram(aes(y=after_stat(density)), colour="black", fill="white")+
 geom_density(alpha=.2, fill="#FF6666")
ggarrange(a,b, labels=c("a","b"))

Change histogram plot line types and colors

# Change line color and fill color
a<-ggplot(df, aes(x=weight))+
  geom_histogram(color="darkblue", fill="lightblue")
# Change line type
b<-ggplot(df, aes(x=weight))+
  geom_histogram(color="black", fill="lightblue",
                 linetype="dashed")
ggarrange(a,b, labels=c("a","b"))

Change histogram plot colors by groups

Calculate the mean of each group :

The package plyr is used to calculate the average weight of each group :

library(plyr)
mu <- ddply(df, "sex", summarise, grp.mean=mean(weight))
head(mu)
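
If plyr is not installed, the same group means can be obtained with base R. This is a sketch of one alternative using aggregate(); it rebuilds the data so that it runs on its own, and the names df_w and mu_base are illustrative.

```r
set.seed(1234)
df_w <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = round(c(rnorm(200, mean = 55, sd = 5),
                   rnorm(200, mean = 65, sd = 5)))
)

# Mean weight within each level of sex
mu_base <- aggregate(weight ~ sex, data = df_w, FUN = mean)

# Rename the column to match the plyr result
names(mu_base)[2] <- "grp.mean"
mu_base
```

The result has one row per group and can be passed to geom_vline() in the same way as mu.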

Change line colors

Histogram plot line colors can be automatically controlled by the levels of the variable sex.

# Change histogram plot line colors by groups
a<-ggplot(df, aes(x=weight, color=sex)) +
  geom_histogram(fill="white")
# Overlaid histograms
b<-ggplot(df, aes(x=weight, color=sex)) +
  geom_histogram(fill="white", alpha=0.5, position="identity")
ggarrange(a,b, labels=c("a","b"))

# Interleaved histograms
a<-ggplot(df, aes(x=weight, color=sex)) +
  geom_histogram(fill="white", position="dodge")+
  theme(legend.position="top")
# Add mean lines
p<-ggplot(df, aes(x=weight, color=sex)) +
  geom_histogram(fill="white", position="dodge")+
  geom_vline(data=mu, aes(xintercept=grp.mean, color=sex),
             linetype="dashed")+
  theme(legend.position="top")
ggarrange(a,p, labels=c("a","p"))

It is also possible to change histogram plot line colors manually using the functions :

  • scale_color_manual() : to use custom colors
  • scale_color_brewer() : to use color palettes from RColorBrewer package
  • scale_color_grey() : to use grey color palettes
# Use custom color palettes
a<-p+scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# Use brewer color palettes
b<-p+scale_color_brewer(palette="Dark2")
# Use grey scale
c<-p + scale_color_grey() + theme_classic() +
  theme(legend.position="top")
ggarrange(a,b,c, nrow=1, ncol=3, labels=c("a","b","c"))

Change fill colors

Histogram plot fill colors can be automatically controlled by the levels of sex :

# Change histogram plot fill colors by groups
a<-ggplot(df, aes(x=weight, fill=sex, color=sex)) +
  geom_histogram(position="identity")
# Use semi-transparent fill
p<-ggplot(df, aes(x=weight, fill=sex, color=sex)) +
  geom_histogram(position="identity", alpha=0.5)
# Add mean lines
c<-p+geom_vline(data=mu, aes(xintercept=grp.mean, color=sex),
             linetype="dashed")
ggarrange(a,p,c, nrow=1, ncol=3, labels=c("a","p","c"))

It is also possible to change histogram plot fill colors manually using the functions :

  • scale_fill_manual() : to use custom colors
  • scale_fill_brewer() : to use color palettes from RColorBrewer package
  • scale_fill_grey() : to use grey color palettes
# Use custom color palettes
a<-p+scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))+
  scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# use brewer color palettes
b<-p+scale_color_brewer(palette="Dark2")+
  scale_fill_brewer(palette="Dark2")
# Use grey scale
c<-p + scale_color_grey()+scale_fill_grey() +
  theme_classic()
ggarrange(a,b,c, nrow=1, ncol=3, labels=c("a","b","c"))

Change the legend position

a<-p + theme(legend.position="top")
b<-p + theme(legend.position="bottom")
# Remove legend
c<-p + theme(legend.position="none")
ggarrange(a,b,c, nrow=1, ncol=3, labels=c("a","b","c"))

Use facets

Split the plots into multiple panels:

p<-ggplot(df, aes(x=weight))+
  geom_histogram(color="black", fill="white")+
  facet_grid(sex ~ .)
# Add mean lines
a<-p+geom_vline(data=mu, aes(xintercept=grp.mean), color="red",
             linetype="dashed")
ggarrange(p,a,labels=c("p","a"))

Customized histogram plots

# Basic histogram
a<-ggplot(df, aes(x=weight, fill=sex)) +
  geom_histogram(fill="white", color="black")+
  geom_vline(aes(xintercept=mean(weight)), color="blue",
             linetype="dashed")+
  labs(title="Weight histogram plot",x="Weight(kg)", y = "Count")+
  theme_classic()
# Change line colors by groups
b<-ggplot(df, aes(x=weight, color=sex, fill=sex)) +
  geom_histogram(position="identity", alpha=0.5)+
  geom_vline(data=mu, aes(xintercept=grp.mean, color=sex),
             linetype="dashed")+
  scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))+
  scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))+
  labs(title="Weight histogram plot",x="Weight(kg)", y = "Count")+
  theme_classic()
ggarrange(a,b,labels=c("a","b"))

Combine histogram and density plots :

# Change line colors by groups
ggplot(df, aes(x=weight, color=sex, fill=sex)) +
geom_histogram(aes(y=after_stat(density)), position="identity", alpha=0.5)+
geom_density(alpha=0.6)+
geom_vline(data=mu, aes(xintercept=grp.mean, color=sex),
           linetype="dashed")+
scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))+
scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))+
labs(title="Weight histogram plot",x="Weight(kg)", y = "Density")+
theme_classic()

Change line colors manually :

p<-ggplot(df, aes(x=weight, color=sex)) +
  geom_histogram(fill="white", position="dodge")+
  geom_vline(data=mu, aes(xintercept=grp.mean, color=sex),
             linetype="dashed")
# Paired palette (all three palettes below are qualitative RColorBrewer palettes)
a<-p + scale_color_brewer(palette="Paired") + 
  theme_classic()+theme(legend.position="top")
# Dark2 palette
b<-p + scale_color_brewer(palette="Dark2") +
  theme_classic()+theme(legend.position="top")
# Accent palette
c<-p + scale_color_brewer(palette="Accent") + 
  theme_minimal()+theme(legend.position="top")
ggarrange(a,b,c,nrow=1, ncol=3, labels=c("a","b","c"))

Scatter plot

This article describes how to create a scatter plot using R software and the ggplot2 package. The function geom_point() is used.

Prepare the data

The mtcars data set is used in the examples below.

# Convert cyl column from a numeric to a factor variable
mtcars$cyl <- as.factor(mtcars$cyl)
head(mtcars)

Basic scatter plots

Simple scatter plots are created using the R code below. The color, the size and the shape of points can be changed using the function geom_point() as follows :

library(ggplot2)
# Basic scatter plot
a<-ggplot(mtcars, aes(x=wt, y=mpg)) + geom_point()
# Change the point size, and shape
b<-ggplot(mtcars, aes(x=wt, y=mpg)) +
  geom_point(size=2, shape=23)
ggarrange(a,b, labels=c("a","b"))

# Map the point size to a continuous variable (qsec)
ggplot(mtcars, aes(x=wt, y=mpg)) + 
  geom_point(aes(size=qsec))

Add regression lines

The functions below can be used to add regression lines to a scatter plot :

  • geom_smooth() and stat_smooth()
  • geom_abline() (described in the article “ggplot2 add straight lines to a plot”)

Only the function geom_smooth() is covered in this section. A simplified format is :

geom_smooth(method="auto", se=TRUE, fullrange=FALSE, level=0.95)
  • method : smoothing method to be used. Possible values are lm, glm, gam, loess, rlm.
    • method = “loess”: This is the default for a small number of observations (fewer than 1,000). It computes a smooth local regression. You can read more about loess using the R code ?loess.
    • method = “lm”: It fits a linear model. Note that it’s also possible to give the formula as formula = y ~ poly(x, 3) to specify a degree 3 polynomial.
  • se : logical value. If TRUE, a confidence interval is displayed around the smooth.
  • fullrange : logical value. If TRUE, the fit spans the full range of the plot
  • level : level of confidence interval to use. Default value is 0.95
# Add the regression line
a<-ggplot(mtcars, aes(x=wt, y=mpg)) + 
  geom_point()+
  geom_smooth(method=lm)
# Remove the confidence interval
b<-ggplot(mtcars, aes(x=wt, y=mpg)) + 
  geom_point()+
  geom_smooth(method=lm, se=FALSE)
# Loess method
c<-ggplot(mtcars, aes(x=wt, y=mpg)) + 
  geom_point()+
  geom_smooth()
ggarrange(a,b,c, nrow=1, ncol=3, labels=c("a","b","c"))
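
The formula argument mentioned earlier lets method = "lm" fit a curve rather than a straight line. A minimal sketch with a degree 3 polynomial; the object name p_poly is illustrative.

```r
library(ggplot2)

# Linear model with a degree 3 polynomial in x,
# drawn with a 95% confidence band
p_poly <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ poly(x, 3),
              se = TRUE, level = 0.95)

print(p_poly)
```

Inside formula, x and y refer to the aesthetics of the plot, not to column names in the data.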

Change the appearance of points and lines

This section describes how to change :

  • the color and the shape of points
  • the line type and color of the regression line
  • the fill color of the confidence interval
# Change the point colors and shapes
# Change the line type and color
a<-ggplot(mtcars, aes(x=wt, y=mpg)) + 
  geom_point(shape=18, color="blue")+
  geom_smooth(method=lm, se=FALSE, linetype="dashed",
             color="darkred")
# Change the confidence interval fill color
b<-ggplot(mtcars, aes(x=wt, y=mpg)) + 
  geom_point(shape=18, color="blue")+
  geom_smooth(method=lm,  linetype="dashed",
             color="darkred", fill="blue")
ggarrange(a,b, labels=c("a","b"))

Scatter plots with multiple groups

This section describes how to change point colors and shapes automatically and manually.

Change the point color/shape/size automatically

In the R code below, point shapes, colors and sizes are controlled by the levels of the factor variable cyl :

# Change point shapes by the levels of cyl
a<-ggplot(mtcars, aes(x=wt, y=mpg, shape=cyl)) +
  geom_point()
# Change point shapes and colors
b<-ggplot(mtcars, aes(x=wt, y=mpg, shape=cyl, color=cyl)) +
  geom_point()
# Change point shapes, colors and sizes
c<-ggplot(mtcars, aes(x=wt, y=mpg, shape=cyl, color=cyl, size=cyl)) +
  geom_point()
ggarrange(a,b,c,nrow=1, ncol=3, labels=c("a","b","c"))

Add regression lines

Regression lines can be added as follows :

# Add regression lines
a<-ggplot(mtcars, aes(x=wt, y=mpg, color=cyl, shape=cyl)) +
  geom_point() + 
  geom_smooth(method=lm)
# Remove confidence intervals
# Extend the regression lines
b<-ggplot(mtcars, aes(x=wt, y=mpg, color=cyl, shape=cyl)) +
  geom_point() + 
  geom_smooth(method=lm, se=FALSE, fullrange=TRUE)
ggarrange(a,b, labels=c("a","b"))

The fill color of the confidence bands can be changed as follows :

ggplot(mtcars, aes(x=wt, y=mpg, color=cyl, shape=cyl)) +
  geom_point() + 
  geom_smooth(method=lm, aes(fill=cyl))

Change the point color/shape/size manually

The functions below are used :

  • scale_shape_manual() for point shapes
  • scale_color_manual() for point colors
  • scale_size_manual() for point sizes
# Change point shapes and colors manually
a<-ggplot(mtcars, aes(x=wt, y=mpg, color=cyl, shape=cyl)) +
  geom_point() + 
  geom_smooth(method=lm, se=FALSE, fullrange=TRUE)+
  scale_shape_manual(values=c(3, 16, 17))+ 
  scale_color_manual(values=c('#999999','#E69F00', '#56B4E9'))+
  theme(legend.position="top")
  
# Change the point sizes manually
b<-ggplot(mtcars, aes(x=wt, y=mpg, color=cyl, shape=cyl))+
  geom_point(aes(size=cyl)) + 
  geom_smooth(method=lm, se=FALSE, fullrange=TRUE)+
  scale_shape_manual(values=c(3, 16, 17))+ 
  scale_color_manual(values=c('#999999','#E69F00', '#56B4E9'))+
  scale_size_manual(values=c(2,3,4))+
  theme(legend.position="top")
ggarrange(a,b, labels=c("a","b"))

Point and line colors can also be changed manually using the functions :

  • scale_color_brewer() : to use color palettes from RColorBrewer package
  • scale_color_grey() : to use grey color palettes
p <- ggplot(mtcars, aes(x=wt, y=mpg, color=cyl, shape=cyl)) +
  geom_point() + 
  geom_smooth(method=lm, se=FALSE, fullrange=TRUE)+
  theme_classic()
# Use brewer color palettes
a<-p+scale_color_brewer(palette="Dark2")
# Use grey scale
b<-p + scale_color_grey()
ggarrange(a,b, labels=c("a","b"))
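
To see which colors a brewer palette contains, the RColorBrewer package can be queried directly. This is a small sketch that assumes RColorBrewer is installed; the object name dark2 is illustrative :

```r
library(RColorBrewer)
# The three colors that the palette "Dark2" provides for three groups
dark2 <- brewer.pal(n = 3, name = "Dark2")
dark2
```

Calling display.brewer.all() shows every available palette in the plotting window.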

Bar plot

This section describes how to create a barplot using R and the ggplot2 package.

The function geom_bar() can be used.

Basic barplots

Data derived from the ToothGrowth data set are used. ToothGrowth describes the effect of vitamin C on tooth growth in guinea pigs.

df <- data.frame(dose=c("D0.5", "D1", "D2"),
                len=c(4.2, 10, 29.5))
head(df)

Create barplots

library(ggplot2)
# Basic barplot
p<-ggplot(data=df, aes(x=dose, y=len)) +
  geom_bar(stat="identity")
   
# Horizontal bar plot
a<-p + coord_flip()

Change the width and the color of bars :

# Change the width of bars
a<-ggplot(data=df, aes(x=dose, y=len)) +
  geom_bar(stat="identity", width=0.5)
# Change colors
b<-ggplot(data=df, aes(x=dose, y=len)) +
  geom_bar(stat="identity", color="blue", fill="white")
# Minimal theme + blue fill color
p<-ggplot(data=df, aes(x=dose, y=len)) +
  geom_bar(stat="identity", fill="steelblue")+
  theme_minimal()
ggarrange(a,b,p, nrow=1, ncol=3, labels=c("a","b","p"))

Choose which items to display :

p + scale_x_discrete(limits=c("D0.5", "D2"))

Bar plot with labels

# Outside bars
a<-ggplot(data=df, aes(x=dose, y=len)) +
  geom_bar(stat="identity", fill="steelblue")+
  geom_text(aes(label=len), vjust=-0.3, size=3.5)+
  theme_minimal()
# Inside bars
b<-ggplot(data=df, aes(x=dose, y=len)) +
  geom_bar(stat="identity", fill="steelblue")+
  geom_text(aes(label=len), vjust=1.6, color="white", size=3.5)+
  theme_minimal()
ggarrange(a,b, labels=c("a","b"))

Barplot of counts

To make a barplot of counts, we will use the mtcars data set :

head(mtcars)
# Don't map a variable to y
ggplot(mtcars, aes(x=factor(cyl)))+
  geom_bar(stat="count", width=0.7, fill="steelblue")+
  theme_minimal()
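
With stat="count", geom_bar() counts the number of rows in each group. As a sketch of what is being computed (the names counts and n are illustrative), the same barplot can be built from a table of counts :

```r
library(ggplot2)
# Count the number of cars for each value of cyl
counts <- as.data.frame(table(cyl = mtcars$cyl), responseName = "n")
counts
# Plot the pre-computed counts; stat="identity" uses n as the bar height
ggplot(counts, aes(x=cyl, y=n)) +
  geom_bar(stat="identity", width=0.7, fill="steelblue")+
  theme_minimal()
```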

Change outline colors

Barplot outline colors can be automatically controlled by the levels of the variable dose :

# Change barplot line colors by groups
p<-ggplot(df, aes(x=dose, y=len, color=dose)) +
  geom_bar(stat="identity", fill="white")

Barplot line colors can also be changed manually using the functions :

  • scale_color_manual() : to use custom colors
  • scale_color_brewer() : to use color palettes from RColorBrewer package
  • scale_color_grey() : to use grey color palettes
# Use custom color palettes
a<-p+scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# Use brewer color palettes
b<-p+scale_color_brewer(palette="Dark2")
# Use grey scale
c<-p + scale_color_grey() + theme_classic()
ggarrange(a,b,c,nrow=1,ncol=3,labels=c("a","b","c"))

Change fill colors

In the R code below, barplot fill colors are automatically controlled by the levels of dose :

# Change barplot fill colors by groups
p<-ggplot(df, aes(x=dose, y=len, fill=dose)) +
  geom_bar(stat="identity")+theme_minimal()
p

Barplot fill colors can also be changed manually using the functions :

  • scale_fill_manual() : to use custom colors
  • scale_fill_brewer() : to use color palettes from RColorBrewer package
  • scale_fill_grey() : to use grey color palettes
# Use custom color palettes
a<-p+scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# use brewer color palettes
b<-p+scale_fill_brewer(palette="Dark2")
# Use grey scale
c<-p + scale_fill_grey()
ggarrange(a,b,c,nrow=1, ncol=3,labels=c("a","b","c"))

Use black outline color :

ggplot(df, aes(x=dose, y=len, fill=dose))+
  geom_bar(stat="identity", color="black")+
  scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))+
  theme_minimal()

Change legend position

# Change bar fill colors to blues
p <- p+scale_fill_brewer(palette="Blues")
a<-p + theme(legend.position="top")
b<-p + theme(legend.position="bottom")
# Remove legend
c<-p + theme(legend.position="none")
ggarrange(a,b,c, nrow=1, ncol=3, labels=c("a","b","c"))

Change the order of items in the legend

The function scale_x_discrete() can be used to change the order in which the items are displayed along the x-axis to “D2”, “D0.5”, “D1” :

p + scale_x_discrete(limits=c("D2", "D0.5", "D1"))
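
Another way to control the order, sketched below, is to set the factor levels of dose explicitly. Discrete scales follow the level order by default, so this affects both the x-axis and the legend. The data frame df_ord is an illustrative copy of the example data :

```r
library(ggplot2)
df_ord <- data.frame(dose=c("D0.5", "D1", "D2"),
                     len=c(4.2, 10, 29.5))
# Set the order of the levels explicitly
df_ord$dose <- factor(df_ord$dose, levels=c("D2", "D0.5", "D1"))
ggplot(df_ord, aes(x=dose, y=len, fill=dose)) +
  geom_bar(stat="identity")+
  theme_minimal()
```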

Barplot with multiple groups

  • Data derived from the ToothGrowth data set are used.
  • ToothGrowth describes the effect of Vitamin C on tooth growth in Guinea pigs.
  • Three dose levels of Vitamin C (0.5, 1, and 2 mg) with each of two delivery methods [orange juice (OJ) or ascorbic acid (VC)] are used :
df2 <- data.frame(supp=rep(c("VC", "OJ"), each=3),
                dose=rep(c("D0.5", "D1", "D2"),2),
                len=c(6.8, 15, 33, 4.2, 10, 29.5))
head(df2)

Create barplots

A stacked barplot is created by default. You can use the function position_dodge() to place the bars side by side instead. The barplot fill color is controlled by the levels of supp :

# Stacked barplot with multiple groups
a<-ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
  geom_bar(stat="identity")
# Use position=position_dodge()
b<-ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
  geom_bar(stat="identity", position=position_dodge())
ggarrange(a, b, labels=c("a","b"))

Change the color manually :

# Change the colors manually
p <- ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
  geom_bar(stat="identity", color="black", position=position_dodge())+
  theme_minimal()
# Use custom colors
a<-p + scale_fill_manual(values=c('#999999','#E69F00'))
# Use brewer color palettes
b<-p + scale_fill_brewer(palette="Blues")
ggarrange(a,b, labels=c("a","b"))

Add labels

Add labels to a dodged barplot :

ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
  geom_bar(stat="identity", position=position_dodge())+
  geom_text(aes(label=len), vjust=1.6, color="white",
            position = position_dodge(0.9), size=3.5)+
  scale_fill_brewer(palette="Paired")+
  theme_minimal()

Add labels to a stacked barplot : 3 steps are required

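  1. Sort the data by dose and, within each dose, by supp in descending order : the package plyr is used. Current versions of ggplot2 stack the first level of supp on top, so the cumulative sum has to start from the last level.
  2. Calculate the cumulative sum of the variable len for each dose
  3. Create the plot
library(plyr)
# Sort by dose, then by supp in descending order
# (matches the stacking order of current ggplot2 versions)
df_sorted <- arrange(df2, dose, desc(supp)) 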
head(df_sorted)
# Calculate the cumulative sum of len for each dose
df_cumsum <- ddply(df_sorted, "dose",
                   transform, label_ypos=cumsum(len))
head(df_cumsum)
# Create the barplot
ggplot(data=df_cumsum, aes(x=dose, y=len, fill=supp)) +
  geom_bar(stat="identity")+
  geom_text(aes(y=label_ypos, label=len), vjust=1.6, 
            color="white", size=3.5)+
  scale_fill_brewer(palette="Paired")+
  theme_minimal()

If you want to place the labels in the middle of the bars, you have to modify the cumulative sum as follows :

df_cumsum <- ddply(df_sorted, "dose",
                   transform, 
                   label_ypos=cumsum(len) - 0.5*len)
# Create the barplot
ggplot(data=df_cumsum, aes(x=dose, y=len, fill=supp)) +
  geom_bar(stat="identity")+
  geom_text(aes(y=label_ypos, label=len), vjust=1.6, 
            color="white", size=3.5)+
  scale_fill_brewer(palette="Paired")+
  theme_minimal()
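
Recent versions of ggplot2 can also compute the label positions themselves. The sketch below assumes a version of ggplot2 in which position_stack() accepts a vjust argument; no sorting or cumulative sum is needed. The names df_lab and p_mid are illustrative :

```r
library(ggplot2)
df_lab <- data.frame(supp=rep(c("VC", "OJ"), each=3),
                     dose=rep(c("D0.5", "D1", "D2"),2),
                     len=c(6.8, 15, 33, 4.2, 10, 29.5))
# vjust=0.5 places each label at the middle of its own bar segment
p_mid <- ggplot(data=df_lab, aes(x=dose, y=len, fill=supp)) +
  geom_bar(stat="identity")+
  geom_text(aes(label=len), position=position_stack(vjust=0.5),
            color="white", size=3.5)+
  scale_fill_brewer(palette="Paired")+
  theme_minimal()
p_mid
```

Because the text layer is stacked in the same way as the bars, the labels stay inside the right segment whatever the stacking order is.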

Barplot with a numeric x-axis

If the variable on the x-axis is numeric, it can be treated either as a continuous variable or as a factor, depending on what you want to show :

# Create some data
df2 <- data.frame(supp=rep(c("VC", "OJ"), each=3),
                dose=rep(c("0.5", "1", "2"),2),
                len=c(6.8, 15, 33, 4.2, 10, 29.5))
head(df2)
# x axis treated as continuous variable
df2$dose <- as.numeric(as.vector(df2$dose))
a<-ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
  geom_bar(stat="identity", position=position_dodge())+
  scale_fill_brewer(palette="Paired")+
  theme_minimal()
# Axis treated as discrete variable
df2$dose<-as.factor(df2$dose)
b<-ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
  geom_bar(stat="identity", position=position_dodge())+
  scale_fill_brewer(palette="Paired")+
  theme_minimal()
ggarrange(a,b,labels=c("a","b"))

Customized barplots

# Add a title and axis labels
# Set the fill colors manually and use the classic theme
p + labs(title="Plot of length  per dose", 
         x="Dose (mg)", y = "Length")+
   scale_fill_manual(values=c('black','lightgray'))+
   theme_classic()

Change fill colors manually :

# Greens
a<-p + scale_fill_brewer(palette="Greens") + theme_minimal()
# Reds
b<-p + scale_fill_brewer(palette="Reds") + theme_minimal()
ggarrange(a,b,labels=c("a","b"))

Line plot

This section describes how to create line plots and change their line types using the ggplot2 package.

Line types in R

The different line types available in R software are : “blank”, “solid”, “dashed”, “dotted”, “dotdash”, “longdash”, “twodash”.
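
As a quick visual reference, the sketch below draws one horizontal segment per named line type. The names lt and p_lt are illustrative; "blank" is left out because it draws nothing :

```r
library(ggplot2)
# One row per named line type
lt <- data.frame(type=c("solid", "dashed", "dotted",
                        "dotdash", "longdash", "twodash"),
                 stringsAsFactors=FALSE)
# scale_linetype_identity() uses the values of type directly as line types
p_lt <- ggplot(lt, aes(x=0, xend=1, y=type, yend=type, linetype=type)) +
  geom_segment()+
  scale_linetype_identity()+
  labs(x=NULL, y=NULL)
p_lt
```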

Basic line plots

df <- data.frame(time=c("breakfast", "Lunch", "Dinner"),
                bill=c(10, 30, 15))
head(df)

Create line plots and change line types

The argument linetype is used to change the line type :

# Basic line plot with points
a<-ggplot(data=df, aes(x=time, y=bill, group=1)) +
  geom_line()+
  geom_point()
# Change the line type
b<-ggplot(data=df, aes(x=time, y=bill, group=1)) +
  geom_line(linetype = "dashed")+
  geom_point()
ggarrange(a,b, labels=c("a","b"))

Line plot with multiple groups

df2 <- data.frame(sex = rep(c("Female", "Male"), each=3),
                  time=c("breakfast", "Lunch", "Dinner"),
                  bill=c(10, 30, 15, 13, 40, 17) )
head(df2)

Change globally the appearance of lines

In the graphs below, line types, colors and sizes are the same for the two groups :

# Line plot with multiple groups
a<-ggplot(data=df2, aes(x=time, y=bill, group=sex)) +
  geom_line()+
  geom_point()
# Change line types
b<-ggplot(data=df2, aes(x=time, y=bill, group=sex)) +
  geom_line(linetype="dashed")+
  geom_point()
# Change line colors and sizes
c<-ggplot(data=df2, aes(x=time, y=bill, group=sex)) +
  geom_line(linetype="dotted", color="red", size=2)+
  geom_point(color="blue", size=3)
ggarrange(a,b,c, nrow=1, ncol=3, labels=c("a","b","c"))

Change automatically the line types by groups

In the graphs below, line types and colors are changed automatically by the levels of the variable sex :

# Change line types by groups (sex)
a<-ggplot(df2, aes(x=time, y=bill, group=sex)) +
  geom_line(aes(linetype=sex))+
  geom_point()+
  theme(legend.position="top")
# Change line types + colors
b<-ggplot(df2, aes(x=time, y=bill, group=sex)) +
  geom_line(aes(linetype=sex, color=sex))+
  geom_point(aes(color=sex))+
  theme(legend.position="top")
ggarrange(a,b,labels=c("a","b"))

Change manually the appearance of lines

The functions below can be used :

  • scale_linetype_manual() : to change line types
  • scale_color_manual() : to change line colors
  • scale_size_manual() : to change the size of lines
# Set line types manually
a<-ggplot(df2, aes(x=time, y=bill, group=sex)) +
  geom_line(aes(linetype=sex))+
  geom_point()+
  scale_linetype_manual(values=c("twodash", "dotted"))+
  theme(legend.position="top")
# Change line colors and sizes
b<-ggplot(df2, aes(x=time, y=bill, group=sex)) +
  geom_line(aes(linetype=sex, color=sex, size=sex))+
  geom_point()+
  scale_linetype_manual(values=c("twodash", "dotted"))+
  scale_color_manual(values=c('#999999','#E69F00'))+
  scale_size_manual(values=c(1, 1.5))+
  theme(legend.position="top")
ggarrange(a,b,labels=c("a","b"))
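
The values given to the manual scales are matched to the groups in the order of the factor levels. As a sketch, a named vector makes the pairing explicit, so that it does not depend on the level order. The data frame df_named mirrors the example data above, and the object names are illustrative :

```r
library(ggplot2)
df_named <- data.frame(sex = rep(c("Female", "Male"), each=3),
                       time=c("breakfast", "Lunch", "Dinner"),
                       bill=c(10, 30, 15, 13, 40, 17) )
p_named <- ggplot(df_named, aes(x=time, y=bill, group=sex)) +
  geom_line(aes(linetype=sex, color=sex))+
  geom_point()+
  # Names are the levels of sex; values are the line types and colors
  scale_linetype_manual(values=c(Female="twodash", Male="dotted"))+
  scale_color_manual(values=c(Female="#999999", Male="#E69F00"))+
  theme(legend.position="top")
p_named
```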

Error bars

This section describes how to create a graph with error bars using R and the ggplot2 package. Different types of error bars can be created using the functions below :

  • geom_errorbar()
  • geom_linerange()
  • geom_pointrange()
  • geom_crossbar()
  • geom_errorbarh()

Add error bars to bar and line plots

The ToothGrowth data is used.

df <- ToothGrowth
df$dose <- as.factor(df$dose)
head(df)

In the example below, we’ll plot the mean value of Tooth length in each group. The standard deviation is used to draw the error bars on the graph.

First, the helper function below will be used to calculate the mean and the standard deviation, for the variable of interest, in each group :

#+++++++++++++++++++++++++
# Function to calculate the mean and the standard deviation
# for each group
#+++++++++++++++++++++++++
# data : a data frame
# varname : the name of the column containing the variable
#   to be summarized
# groupnames : vector of column names to be used as
#   grouping variables
data_summary <- function(data, varname, groupnames){
  require(plyr)
  summary_func <- function(x, col){
    c(mean = mean(x[[col]], na.rm=TRUE),
      sd = sd(x[[col]], na.rm=TRUE))
  }
  data_sum<-ddply(data, groupnames, .fun=summary_func,
                  varname)
  data_sum <- rename(data_sum, c("mean" = varname))
 return(data_sum)
}

Summarize the data :

df2 <- data_summary(ToothGrowth, varname="len", 
                    groupnames=c("supp", "dose"))
# Convert dose to a factor variable
df2$dose=as.factor(df2$dose)
head(df2)
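
The same summary can be obtained without plyr. The sketch below uses the base function aggregate(); the names agg and df_base are illustrative, and the rows may come out in a different order than with ddply() :

```r
# Mean and standard deviation of len for each supp/dose combination
agg <- aggregate(len ~ supp + dose, data=ToothGrowth,
                 FUN=function(x) c(mean=mean(x), sd=sd(x)))
# aggregate() returns len as a two-column matrix; flatten it
df_base <- do.call(data.frame, agg)
names(df_base) <- c("supp", "dose", "len", "sd")
# Convert dose to a factor variable
df_base$dose <- as.factor(df_base$dose)
head(df_base)
```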

Barplot with error bars

The function geom_errorbar() can be used to produce the error bars :

# Default bar plot
p<- ggplot(df2, aes(x=dose, y=len, fill=supp)) + 
  geom_bar(stat="identity", color="black", 
           position=position_dodge()) +
  geom_errorbar(aes(ymin=len-sd, ymax=len+sd), width=.2,
                 position=position_dodge(.9)) 
# Finished bar plot
a<-p+labs(title="Tooth length per dose", x="Dose (mg)", y = "Length")+
   theme_classic() +
   scale_fill_manual(values=c('#999999','#E69F00'))
ggarrange(p,a,labels=c("p","a"))

# Keep only upper error bars
 ggplot(df2, aes(x=dose, y=len, fill=supp)) + 
  geom_bar(stat="identity", color="black", position=position_dodge()) +
  geom_errorbar(aes(ymin=len, ymax=len+sd), width=.2,
                 position=position_dodge(.9)) 

Line plot with error bars

# Default line plot
p<- ggplot(df2, aes(x=dose, y=len, group=supp, color=supp)) + 
  geom_line() +
  geom_point()+
  geom_errorbar(aes(ymin=len-sd, ymax=len+sd), width=.2,
                 position=position_dodge(0.05))
# Finished line plot
a<-p+labs(title="Tooth length per dose", x="Dose (mg)", y = "Length")+
   theme_classic() +
   scale_color_manual(values=c('#999999','#E69F00'))
ggarrange(p,a,labels=c("p","a"))

You can also use the functions geom_pointrange() or geom_linerange() instead of using geom_errorbar()

# Use geom_pointrange
a<-ggplot(df2, aes(x=dose, y=len, group=supp, color=supp)) + 
  geom_pointrange(aes(ymin=len-sd, ymax=len+sd))
# Use geom_line()+geom_pointrange()
b<-ggplot(df2, aes(x=dose, y=len, group=supp, color=supp)) + 
  geom_line()+
  geom_pointrange(aes(ymin=len-sd, ymax=len+sd))
ggarrange(a,b,labels=c("a","b"))

Dot plot with mean point and error bars

The functions geom_dotplot() and stat_summary() are used :

The mean +/- SD can be added as a crossbar, an error bar or a pointrange :

p <- ggplot(df, aes(x=dose, y=len)) + 
    geom_dotplot(binaxis='y', stackdir='center')
# use geom_crossbar()
a<-p + stat_summary(fun.data="mean_sdl", fun.args = list(mult=1), 
                 geom="crossbar", width=0.5)
# Use geom_errorbar()
b<-p + stat_summary(fun.data=mean_sdl, fun.args = list(mult=1), 
        geom="errorbar", color="red", width=0.2) +
  stat_summary(fun=mean, geom="point", color="red")
   
# Use geom_pointrange()
c<-p + stat_summary(fun.data=mean_sdl, fun.args = list(mult=1), 
                 geom="pointrange", color="red")
ggarrange(a,b,c,nrow=1,ncol=3,labels=c("a","b","c"))
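
mean_sdl is a wrapper around a function from the Hmisc package, so Hmisc has to be installed for the code above to run. As a hedged alternative, the sketch below defines a small summary function (mean_sd is a hypothetical name, not part of ggplot2) that returns the columns y, ymin and ymax expected by stat_summary() :

```r
library(ggplot2)
# Hypothetical helper: mean and mean +/- one standard deviation
mean_sd <- function(x) {
  data.frame(y=mean(x), ymin=mean(x) - sd(x), ymax=mean(x) + sd(x))
}
# Illustrative copy of the data with dose as a factor
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)
p_sd <- ggplot(tg, aes(x=dose, y=len)) +
  geom_dotplot(binaxis='y', stackdir='center')+
  stat_summary(fun.data=mean_sd, geom="pointrange", color="red")
p_sd
```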

Pie chart

This section describes how to create a pie chart for data visualization using R and the ggplot2 package.

The function coord_polar() is used to produce a pie chart, which is just a stacked bar chart in polar coordinates.

Simple pie charts

df <- data.frame(
  group = c("Male", "Female", "Child"),
  value = c(25, 25, 50)
  )
head(df)

Use a barplot to visualize the data

# Barplot
bp<- ggplot(df, aes(x="", y=value, fill=group))+
  geom_bar(width = 1, stat = "identity")
bp

Create a pie chart :

pie <- bp + coord_polar("y", start=0)
pie
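
Percentage labels are often useful on a pie chart. As a sketch, assuming a version of ggplot2 in which position_stack() accepts a vjust argument, the labels can be stacked in the same way as the bars so that each one sits in the middle of its slice. The names df_pie and pct are illustrative :

```r
library(ggplot2)
df_pie <- data.frame(
  group = c("Male", "Female", "Child"),
  value = c(25, 25, 50)
  )
# Share of each group in percent
df_pie$pct <- 100 * df_pie$value / sum(df_pie$value)
ggplot(df_pie, aes(x="", y=value, fill=group))+
  geom_bar(width = 1, stat = "identity")+
  # Place each label at the middle of its slice
  geom_text(aes(label=paste0(pct, "%")),
            position=position_stack(vjust=0.5))+
  coord_polar("y", start=0)
```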

Change the pie chart fill colors

The pie chart fill colors can be changed manually using the functions :

  • scale_fill_manual() : to use custom colors
  • scale_fill_brewer() : to use color palettes from RColorBrewer package
  • scale_fill_grey() : to use grey color palettes
# Use custom color palettes
pie + scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))

# use brewer color palettes
pie + scale_fill_brewer(palette="Dark2")

# Use the "Blues" brewer palette with the minimal theme
pie + scale_fill_brewer(palette="Blues")+
  theme_minimal()

# Use grey scale
pie + scale_fill_grey() + theme_minimal()

Create a pie chart from a factor variable

PlantGrowth data is used :

Description of the PlantGrowth data:

Results from an experiment to compare yields (as measured by dried weight of plants) obtained under a control and two different treatment conditions.

head(PlantGrowth)

Create the pie chart of the count of observations in each group :

ggplot(PlantGrowth, aes(x=factor(1), fill=group))+
  geom_bar(width = 1)+
  coord_polar("y")

QQ plot

This section describes how to create a QQ plot (quantile-quantile plot) using R and the ggplot2 package. A QQ plot is used to check whether a given data set follows a normal distribution.

The function stat_qq() or qplot() can be used.

Prepare the data

The mtcars data set is used in the examples below.

# Convert cyl column from a numeric to a factor variable
mtcars$cyl <- as.factor(mtcars$cyl)
head(mtcars)

Basic qq plots

In the example below, the distribution of the variable mpg is explored :

# Solution 1
a<-qplot(sample = mpg, data = mtcars)
# Solution 2
b<-ggplot(mtcars, aes(sample=mpg))+stat_qq()
ggarrange(a,b, labels=c("a","b"))
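
A reference line makes departures from normality easier to see. As a sketch, assuming a version of ggplot2 that provides stat_qq_line(), the line can be added as follows (the name p_qq is illustrative) :

```r
library(ggplot2)
# Points close to the line suggest an approximately normal distribution
p_qq <- ggplot(mtcars, aes(sample=mpg))+
  stat_qq()+
  stat_qq_line()
p_qq
```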

Change qq plot point shapes by groups

In the R code below, point shapes are controlled automatically by the variable cyl.

You can also set point shapes manually using the function scale_shape_manual()

# Change point shapes by groups
p<-qplot(sample = mpg, data = mtcars, shape=cyl)
# Change point shapes manually
a<-p + scale_shape_manual(values=c(1,17,19))
ggarrange(p,a,labels=c("p","a"))

In the R code below, point colors of the qq plot are automatically controlled by the levels of cyl :

# Change qq plot colors by groups
p<-qplot(sample = mpg, data = mtcars, color=cyl)

The QQ plot colors can also be changed manually using the functions :

  • scale_color_manual() : to use custom colors
  • scale_color_brewer() : to use color palettes from RColorBrewer package
  • scale_color_grey() : to use grey color palettes
# Use custom color palettes
a<-p+scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# Use brewer color palettes
b<-p+scale_color_brewer(palette="Dark2")
# Use grey scale
c<-p + scale_color_grey() + theme_classic()
ggarrange(a, b,c,nrow=1, ncol=3,labels=c("a","b","c"))

Change the legend position

a<-p + theme(legend.position="top")
b<-p + theme(legend.position="bottom")
c<-p + theme(legend.position="none") # Remove legend
ggarrange(a,b,c,nrow=1,ncol=3, labels=c("a","b","c"))

Customized qq plots

# Basic qq plot
qplot(sample = mpg, data = mtcars)+
labs(title="Miles per gallon according to the weight",
       y = "Miles/(US) gallon")+
theme_classic()

# Change color/shape by groups
p <- qplot(sample = mpg, data = mtcars, color=cyl, shape=cyl)+
labs(title="Miles per gallon according to the weight",
       y = "Miles/(US) gallon")
a<-p + theme_classic()
ggarrange(p,a,labels=c("p","a"))