ggplot2 is a powerful and flexible R package, implemented by Hadley Wickham, for producing elegant graphics.
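The concept behind ggplot2 divides a plot into three fundamental parts: Plot = data + Aesthetics + Geometry.
The principal components of every plot can be defined as follows:
- data: a data frame
- Aesthetics: the mapping of variables to the x and y axes and to visual properties such as the color, size or shape of points and the height of bars
- Geometry: the type of graphic (histogram, box plot, line plot, density plot, dot plot, etc.)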
There are two major functions in the ggplot2 package: qplot() and ggplot().
- qplot() stands for quick plot and can be used to easily produce simple plots.
- ggplot() is more flexible and robust than qplot() for building a plot piece by piece.
Note that qplot() has been deprecated since ggplot2 3.4.0. It still works but prints a deprecation warning.
Load the necessary packages
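The package-loading code is not shown. Here is a minimal sketch that assumes the two packages the examples rely on: ggplot2 itself, and ggpubr, which provides the ggarrange() function used throughout to place several plots side by side.

```r
# Install once if needed: install.packages(c("ggplot2", "ggpubr"))
library(ggplot2)  # grammar-of-graphics plotting: ggplot(), qplot(), geom_*() ...
library(ggpubr)   # provides ggarrange(), used to place several plots in one figure
```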
Data Format and Preparation
The data should be a data.frame (columns are variables and rows are observations).
The data set mtcars is used in the examples below:
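The code that displayed the data is not shown. This is a minimal sketch that looks at the three columns used in this chapter. The name cars3 is hypothetical.

```r
# mtcars ships with R (datasets package); keep the three columns described below
data("mtcars")
cars3 <- mtcars[, c("mpg", "cyl", "wt")]
head(cars3)
```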
Quick plot with ggplot2 in R software and data visualization
The function qplot() [in ggplot2] is very similar to the basic plot() function from the R base package. It can be used to easily create and combine different types of plots. However, it is less flexible than the function ggplot().
This chapter provides a brief introduction to qplot(), which stands for quick plot.
Data Format
The data must be a data.frame (columns are variables and rows are observations).
mtcars : Motor Trend Car Road Tests.
Description: The data comprise fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).
Format: A data frame with 32 observations on 11 variables. The three used here are:
- [, 1] mpg Miles/(US) gallon
- [, 2] cyl Number of cylinders
- [, 6] wt Weight (1000 lbs)
Usage of qplot()
A simplified format of qplot is:
qplot(x, y = NULL, data, geom = "auto", xlim = c(NA, NA), ylim = c(NA, NA))
x : x values
y : y values (optional)
data : data frame to use (optional).
geom : Character vector specifying geom to use. Defaults to “point” if x and y are specified, and “histogram” if only x is specified.
xlim, ylim: x and y axis limits
Other arguments, including main, xlab, ylab and log, can also be used:
- main: plot title
- xlab, ylab: x and y axis labels
- log: which variables to log transform. Allowed values are “x”, “y” or “xy”.
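The arguments above can be combined in a single call. This sketch sets a title, axis labels and log10 scales on both axes. The title and label strings are illustrative only.

```r
library(ggplot2)
# Title, axis labels and log10 scales on both axes, all set via qplot() arguments
# (qplot() is deprecated in recent ggplot2 releases and may print a warning)
p <- qplot(mpg, wt, data = mtcars, log = "xy",
           main = "Weight vs. fuel consumption",
           xlab = "Miles/(US) gallon (log scale)",
           ylab = "Weight (1000 lbs, log scale)")
p
```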
Scatterplots
Basic Scatterplots
The plot can be created using data from either numeric vectors or a data frame:
# Use data from numeric vectors
x <- 1:10; y <- x*x
# basic plot
a<-qplot(x,y)
#Add line
b<-qplot(x, y, geom=c("point", "line"))
#Use data from a data frame
c<-qplot(mpg, wt, data=mtcars)
ggarrange(
a, b, c,nrow=1, ncol=3, labels=c("a","b","c"),common.legend = TRUE, legend = "bottom"
)
Scatter plots with smoothed line
The geom “smooth” is used to add a smoothed line with its standard error band:
#Smoothing
a<-qplot(mpg, wt, data = mtcars, geom = c("point", "smooth"))
# Smoothed line for each group (one per level of cyl)
b<-qplot(mpg, wt, data = mtcars, color =factor(cyl),
geom=c("point", "smooth"))
ggarrange(a,b,labels = c("a", "b"),legend = "bottom")
Change scatter plot colors
Points can be colored according to the values of a continuous or a discrete variable, using the argument colour.
# Change the color by a continuous numeric variable
a<-qplot(mpg, wt, data = mtcars, colour = cyl)
# Change the color by groups (factor)
df <- mtcars
df[,'cyl'] <- as.factor(df[,'cyl'])
b<-qplot(mpg, wt, data = df, colour = cyl)
# Add lines
c<-qplot(mpg, wt, data = df, colour = cyl,
geom=c("point", "line"))
ggarrange(a,b,c,labels=c("a","b","c"), nrow=1, ncol=3)
Change the shape and the size of points
Like color, the shape and the size of points can be controlled by a continuous or discrete variable.
#Change the size of points according to the values of a continuous variable
a<-qplot(mpg, wt, data = mtcars, size = mpg)
# Change point shapes by groups
b<-qplot(mpg, wt, data = mtcars, shape = factor(cyl))
ggarrange(a,b, nrow=1, ncol=2, labels=c("a","b"))
Scatter plot with texts
The argument label is used to specify the text to display for each point:
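The example code for this section is not shown. Here is a minimal sketch, assuming the car names (the row names of mtcars) are the labels wanted. Combining the “point” and “text” geoms draws each point together with its label.

```r
library(ggplot2)
# One point and one text label per car; the labels are the row names of mtcars
p <- qplot(mpg, wt, data = mtcars, label = rownames(mtcars),
           geom = c("point", "text"))
p
```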
Box plot, dot plot and violin plot
The PlantGrowth data set is used in the following examples:
geom = “boxplot”: draws a box plot
geom = “dotplot”: draws a dot plot. The supplementary arguments stackdir = “center” and binaxis = “y” are required.
geom = “violin”: draws a violin plot. Here the argument trim is set to FALSE so that the tails of the violins are not trimmed to the range of the data.
# Basic box plot from a numeric vector
x <- "1"
y <- rnorm(100)
a<-qplot(x, y, geom="boxplot")
# Basic box plot from data frame
b<-qplot(group, weight, data = PlantGrowth,
geom=c("boxplot"))
# Dot plot
c<-qplot(group, weight, data = PlantGrowth,
geom=c("dotplot"),
stackdir = "center", binaxis = "y")
# Violin plot
d<-qplot(group, weight, data = PlantGrowth,
geom=c("violin"), trim = FALSE)
ggarrange(a,b,c,d)
Change the color by groups:
# Box plot from a data frame
# Add jitter and change fill color by group
a<-qplot(group, weight, data = PlantGrowth,
geom=c("boxplot", "jitter"), fill = group)
# Dot plot
b<-qplot(group, weight, data = PlantGrowth,
geom = "dotplot", stackdir = "center", binaxis = "y",color = group, fill = group)
ggarrange(a,b, labels=c("a","b"))
Histogram and density plots
The histogram and density plots are used to display the distribution of data.
Generate some data
The R code below generates some data containing the weights by sex (M for male; F for female):
set.seed(1234)
mydata = data.frame(
sex = factor(rep(c("F", "M"), each=200)),
weight = c(rnorm(200, 55), rnorm(200, 58)))
head(mydata)
Histogram
# Basic histogram
a<-qplot(weight, data = mydata, geom = "histogram")
# Change histogram fill color by group (sex)
b<-qplot(weight, data = mydata, geom = "histogram",
fill = sex)
ggarrange(a,b, labels=c("a","b"))
Density plot
# Basic density plot
a<-qplot(weight, data = mydata, geom = "density")
# Change density plot line color by group (sex)
# change line type
b<-qplot(weight, data = mydata, geom = "density",
color = sex, linetype = sex)
ggarrange(a,b,labels=c("a","b"))
Main Titles and axis titles
Titles can be added to the plot as follows:
qplot(weight, data = mydata, geom = "density",
xlab = "Weight (kg)", ylab = "Density",
main = "Density plot of Weight")
This R tutorial describes how to create a box plot using R and the ggplot2 package.
The function geom_boxplot() is used. A simplified format is:
geom_boxplot(outlier.colour="black", outlier.shape=16,outlier.size=2, notch=FALSE)
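Details
- outlier.colour, outlier.shape, outlier.size: the color, shape and size of outlying points
- notch: logical value. If TRUE, a notched box plot is drawn. The notch spans roughly median +/- 1.58*IQR/sqrt(n); if the notches of two boxes do not overlap, this suggests that the medians are significantly different.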
Prepare the data set
The ToothGrowth data set is used in this illustration:
# Convert the variable dose from a numeric to a factor variable
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
head(ToothGrowth)
Make sure that dose has been converted to a factor variable using the R code above.
Basic box plot
# Basic box plot
p <- ggplot(ToothGrowth, aes(x=dose, y=len)) +
geom_boxplot()
# Rotate the box plot
b<-p + coord_flip()
# Notched box plot
c<-ggplot(ToothGrowth, aes(x=dose, y=len)) +
geom_boxplot(notch=TRUE)
# Change outlier color, shape and size
d<-ggplot(ToothGrowth, aes(x=dose, y=len)) +
geom_boxplot(outlier.colour="red",outlier.shape=8,outlier.size=4)
ggarrange(p,b,c,d, nrow=2, ncol=2, labels=c("a","b","c","d"))
The function stat_summary() can be used to add mean points to a box plot:
# Box plot with mean points (fun replaces the deprecated fun.y argument in ggplot2 >= 3.3.0)
a<-p + stat_summary(fun=mean, geom="point", shape=23, size=4)
# Choose which items to display
b<- p + scale_x_discrete(limits=c("0.5", "2"))
ggarrange(a,b,labels=c("a","b"))
Box plot with dots
Dots (or points) can be added to a box plot using the functions geom_dotplot() or geom_jitter():
# Box plot with dot plot
a<-p + geom_dotplot(binaxis='y', stackdir='center', dotsize=1)
# Box plot with jittered points 0.2 : degree of jitter in x direction
b<-p + geom_jitter(shape=16,position=position_jitter(0.2))
ggarrange(a,b, labels=c("a","b"))
Change box plot line colors
Box plot line colors can be automatically controlled by the levels of the variable dose:
# Change box plot line colors by groups
p<-ggplot(ToothGrowth, aes(x=dose, y=len, color=dose)) +
geom_boxplot()
p
Box plot line colors can also be set manually using the functions scale_color_manual(), scale_color_brewer() or scale_color_grey():
# Use custom color palettes
a<-p+scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# Use brewer color palettes
b<-p+scale_color_brewer(palette="Dark2")
# Use grey scale
c<-p + scale_color_grey() + theme_classic()
ggarrange(a,b,c, nrow=1,ncol=3, labels=c("a","b","c"))
Change box plot fill colors
In the R code below, the first box plot uses a single fill color. In the second, the fill colors are automatically controlled by the levels of dose:
# Use single color
a<-ggplot(ToothGrowth, aes(x=dose, y=len)) +
geom_boxplot(fill='#A4A4A4', color="black")+
theme_classic()
# Change box plot colors by groups
p<-ggplot(ToothGrowth, aes(x=dose, y=len, fill=dose)) +
geom_boxplot()
ggarrange(a,p, labels=c("a","p"))
Box plot fill colors can also be set manually using the functions scale_fill_manual(), scale_fill_brewer() or scale_fill_grey():
# Use custom color palettes
a<-p+scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# use brewer color palettes
b<-p+scale_fill_brewer(palette="Dark2")
# Use grey scale
c<-p + scale_fill_grey() + theme_classic()
ggarrange(a,b,c, nrow=1,ncol=3,labels=c("a","b","c"))
Change the legend position
a<-p + theme(legend.position="top")
b<-p + theme(legend.position="bottom")
c<-p + theme(legend.position="none") # Remove legend
ggarrange(a,b,c,nrow=1,ncol=3, labels=c("a","b","c"))
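Change the order of items
The function scale_x_discrete() can be used to change the order of the boxes to “2”, “0.5”, “1”. The legend keys follow the fill scale, so the same limits are also passed to scale_fill_discrete() to reorder the legend:
p + scale_x_discrete(limits=c("2", "0.5", "1")) +
scale_fill_discrete(limits=c("2", "0.5", "1"))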
Box plot with multiple groups
# Change box plot colors by groups
a<-ggplot(ToothGrowth, aes(x=dose, y=len,
fill=supp)) +
geom_boxplot()
# Change the position
p<-ggplot(ToothGrowth, aes(x=dose, y=len, fill=supp)) +
geom_boxplot(position=position_dodge(1))
ggarrange(a,p, labels=c("a","p"))
Change box plot colors and add dots:
# Add dots
a<-p + geom_dotplot(binaxis='y', stackdir='center',
position=position_dodge(1))
# Change colors
b<-p+scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))
ggarrange(a,b,labels=c("a","b"))
Customized box plots
# Basic box plot
a<-ggplot(ToothGrowth, aes(x=dose, y=len)) +
geom_boxplot(fill="gray")+
labs(title="Plot of length per dose",x="Dose (mg)", y = "Length")+
theme_classic()
# Change automatically color by groups
bp <- ggplot(ToothGrowth, aes(x=dose, y=len, fill=dose)) +
geom_boxplot()+
labs(title="Plot of length per dose",x="Dose (mg)", y = "Length")
b<-bp + theme_classic()
ggarrange(a,b,labels=c("a","b"))
Change fill colors manually:
# Sequential palette
a<-bp + scale_fill_brewer(palette="Blues") + theme_classic()
# Qualitative palette
b<-bp + scale_fill_brewer(palette="Dark2") + theme_minimal()
# Diverging palette
c<-bp + scale_fill_brewer(palette="RdBu") + theme_minimal()
ggarrange(a,b,c, nrow=1,ncol=3, labels=c("a","b","c"))
This R tutorial describes how to create a histogram using R and the ggplot2 package.
The function geom_histogram() is used. You can also add a line for the mean using the function geom_vline().
Prepare the data
The data below will be used:
set.seed(1234)
df <- data.frame(
sex=factor(rep(c("F", "M"), each=200)),
weight=round(c(rnorm(200, mean=55, sd=5), rnorm(200, mean=65, sd=5)))
)
head(df)
Basic histogram plots
library(ggplot2)
# Basic histogram
a<-ggplot(df, aes(x=weight)) + geom_histogram()
# Change the width of bins
b<-ggplot(df, aes(x=weight)) +
geom_histogram(binwidth=1)
# Change colors
p<-ggplot(df, aes(x=weight)) +
geom_histogram(color="black", fill="white")
ggarrange(a,b,p,nrow=1,ncol=3, labels=c("a","b","p"))
Add mean line and density plot on the histogram
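# Add mean line (linewidth replaces size for lines in ggplot2 >= 3.4.0)
a<-p+ geom_vline(xintercept=mean(df$weight),
color="blue", linetype="dashed", linewidth=1)
# Histogram with density plot (after_stat() replaces the deprecated ..density.. notation)
b<-ggplot(df, aes(x=weight)) +
geom_histogram(aes(y=after_stat(density)), colour="black", fill="white")+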
geom_density(alpha=.2, fill="#FF6666")
ggarrange(a,b, labels=c("a","b"))
Change histogram plot line types and colors
# Change line color and fill color
a<-ggplot(df, aes(x=weight))+
geom_histogram(color="darkblue", fill="lightblue")
# Change line type
b<-ggplot(df, aes(x=weight))+
geom_histogram(color="black", fill="lightblue",
linetype="dashed")
ggarrange(a,b, labels=c("a","b"))
Change histogram plot colors by groups
Calculate the mean of each group
The plyr package is used to calculate the average weight of each group:
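The code that builds mu is missing, yet the geom_vline(data=mu, ...) calls below rely on it. Here is a minimal sketch, assuming mu is a data frame with one row per sex and the mean stored in a column named grp.mean (the name those calls use). The simulated df is re-created so the block runs on its own.

```r
library(plyr)
# Same simulated data as above, re-created so this block is self-contained
set.seed(1234)
df <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = round(c(rnorm(200, mean = 55, sd = 5), rnorm(200, mean = 65, sd = 5)))
)
# One row per sex, with the group mean in a column named grp.mean
mu <- ddply(df, "sex", summarise, grp.mean = mean(weight))
mu
```

If plyr is not available, aggregate(weight ~ sex, data = df, FUN = mean) computes the same means. However, the mean column is then named weight, so it would need renaming to grp.mean.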
Change line colors
Histogram plot line colors can be automatically controlled by the levels of the variable sex.
# Change histogram plot line colors by groups
a<-ggplot(df, aes(x=weight, color=sex)) +
geom_histogram(fill="white")
# Overlaid histograms
b<-ggplot(df, aes(x=weight, color=sex)) +
geom_histogram(fill="white", alpha=0.5, position="identity")
ggarrange(a,b, labels=c("a","b"))
# Interleaved histograms
a<-ggplot(df, aes(x=weight, color=sex)) +
geom_histogram(fill="white", position="dodge")+
theme(legend.position="top")
# Add mean lines
p<-ggplot(df, aes(x=weight, color=sex)) +
geom_histogram(fill="white", position="dodge")+
geom_vline(data=mu, aes(xintercept=grp.mean, color=sex),
linetype="dashed")+
theme(legend.position="top")
ggarrange(a,p, labels=c("a","p"))
Histogram line colors can also be set manually using the functions scale_color_manual(), scale_color_brewer() or scale_color_grey():
# Use custom color palettes
a<-p+scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# Use brewer color palettes
b<-p+scale_color_brewer(palette="Dark2")
# Use grey scale
c<-p + scale_color_grey() + theme_classic() +
theme(legend.position="top")
ggarrange(a,b,c, nrow=1, ncol=3, labels=c("a","b","c"))
Change fill colors
Histogram fill colors can be automatically controlled by the levels of sex:
# Change histogram plot fill colors by groups
a<-ggplot(df, aes(x=weight, fill=sex, color=sex)) +
geom_histogram(position="identity")
# Use semi-transparent fill
p<-ggplot(df, aes(x=weight, fill=sex, color=sex)) +
geom_histogram(position="identity", alpha=0.5)
# Add mean lines
c<-p+geom_vline(data=mu, aes(xintercept=grp.mean, color=sex),
linetype="dashed")
ggarrange(a,p,c, nrow=1, ncol=3, labels=c("a","p","c"))
Histogram fill colors can also be set manually. Each scale_fill_*() function below is paired with the matching scale_color_*() function so that the outlines match the fills:
# Use custom color palettes
a<-p+scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))+
scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# use brewer color palettes
b<-p+scale_color_brewer(palette="Dark2")+
scale_fill_brewer(palette="Dark2")
# Use grey scale
c<-p + scale_color_grey()+scale_fill_grey() +
theme_classic()
ggarrange(a,b,c, nrow=1, ncol=3, labels=c("a","b","c"))
Change the legend position
a<-p + theme(legend.position="top")
b<-p + theme(legend.position="bottom")
# Remove legend
c<-p + theme(legend.position="none")
ggarrange(a,b,c, nrow=1, ncol=3, labels=c("a","b","c"))
Use facets
Split the plots into multiple panels:
p<-ggplot(df, aes(x=weight))+
geom_histogram(color="black", fill="white")+
facet_grid(sex ~ .)
# Add mean lines (color is set outside aes() so the lines are actually red)
a<-p+geom_vline(data=mu, aes(xintercept=grp.mean),
color="red", linetype="dashed")
ggarrange(p,a,labels=c("p","a"))
Customized histogram plots
# Basic histogram
a<-ggplot(df, aes(x=weight, fill=sex)) +
geom_histogram(fill="white", color="black")+
geom_vline(aes(xintercept=mean(weight)), color="blue",
linetype="dashed")+
labs(title="Weight histogram plot",x="Weight(kg)", y = "Count")+
theme_classic()
# Change line colors by groups
b<-ggplot(df, aes(x=weight, color=sex, fill=sex)) +
geom_histogram(position="identity", alpha=0.5)+
geom_vline(data=mu, aes(xintercept=grp.mean, color=sex),
linetype="dashed")+
scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))+
scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))+
labs(title="Weight histogram plot",x="Weight(kg)", y = "Count")+
theme_classic()
ggarrange(a,b,labels=c("a","b"))
Combine histogram and density plots:
# Change line colors by groups
ggplot(df, aes(x=weight, color=sex, fill=sex)) +
geom_histogram(aes(y=after_stat(density)), position="identity", alpha=0.5)+
geom_density(alpha=0.6)+
geom_vline(data=mu, aes(xintercept=grp.mean, color=sex),
linetype="dashed")+
scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))+
scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))+
labs(title="Weight histogram plot",x="Weight(kg)", y = "Density")+
theme_classic()
Change line colors manually:
p<-ggplot(df, aes(x=weight, color=sex)) +
geom_histogram(fill="white", position="dodge")+
geom_vline(data=mu, aes(xintercept=grp.mean, color=sex),
linetype="dashed")
# Paired palette
a<-p + scale_color_brewer(palette="Paired") +
theme_classic()+theme(legend.position="top")
# Dark2 palette
b<-p + scale_color_brewer(palette="Dark2") +
theme_classic()+theme(legend.position="top")
# Accent palette
c<-p + scale_color_brewer(palette="Accent") +
theme_minimal()+theme(legend.position="top")
ggarrange(a,b,c,nrow=1, ncol=3, labels=c("a","b","c"))
This article describes how to create a scatter plot using R and the ggplot2 package. The function geom_point() is used.
Prepare the data
The mtcars data set is used in the examples below.
# Convert cyl column from a numeric to a factor variable
mtcars$cyl <- as.factor(mtcars$cyl)
head(mtcars)
Basic scatter plots
Simple scatter plots are created using the R code below. The color, size and shape of points can be changed using the function geom_point() as follows:
library(ggplot2)
# Basic scatter plot
a<-ggplot(mtcars, aes(x=wt, y=mpg)) + geom_point()
# Change the point size, and shape
b<-ggplot(mtcars, aes(x=wt, y=mpg)) +
geom_point(size=2, shape=23)
ggarrange(a,b, labels=c("a","b"))
Add regression lines
The functions geom_smooth(), stat_smooth() and geom_abline() can be used to add regression lines to a scatter plot.
Only the function geom_smooth() is covered in this section.
# Add the regression line
a<-ggplot(mtcars, aes(x=wt, y=mpg)) +
geom_point()+
geom_smooth(method=lm)
# Remove the confidence interval
b<-ggplot(mtcars, aes(x=wt, y=mpg)) +
geom_point()+
geom_smooth(method=lm, se=FALSE)
# Default method: loess (used for fewer than 1,000 observations)
c<-ggplot(mtcars, aes(x=wt, y=mpg)) +
geom_point()+
geom_smooth()
ggarrange(a,b,c, nrow=1, ncol=3, labels=c("a","b","c"))
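Change the appearance of points and lines
This section describes how to change the color and shape of points, the type and color of the regression line, and the fill color of the confidence interval: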
# Change the point colors and shapes
# Change the line type and color
a<-ggplot(mtcars, aes(x=wt, y=mpg)) +
geom_point(shape=18, color="blue")+
geom_smooth(method=lm, se=FALSE, linetype="dashed",
color="darkred")
# Change the confidence interval fill color
b<-ggplot(mtcars, aes(x=wt, y=mpg)) +
geom_point(shape=18, color="blue")+
geom_smooth(method=lm, linetype="dashed",
color="darkred", fill="blue")
ggarrange(a,b, labels=c("a","b"))
Scatter plots with multiple groups
This section describes how to change point colors and shapes automatically and manually.
Change the point color/shape/size automatically
In the R code below, point shapes, colors and sizes are controlled by the levels of the factor variable cyl:
# Change point shapes by the levels of cyl
a<-ggplot(mtcars, aes(x=wt, y=mpg, shape=cyl)) +
geom_point()
# Change point shapes and colors
b<-ggplot(mtcars, aes(x=wt, y=mpg, shape=cyl, color=cyl)) +
geom_point()
# Change point shapes, colors and sizes
c<-ggplot(mtcars, aes(x=wt, y=mpg, shape=cyl, color=cyl, size=cyl)) +
geom_point()
ggarrange(a,b,c,nrow=1, ncol=3, labels=c("a","b","c"))
Add regression lines
Regression lines can be added as follows:
# Add regression lines
a<-ggplot(mtcars, aes(x=wt, y=mpg, color=cyl, shape=cyl)) +
geom_point() +
geom_smooth(method=lm)
# Remove confidence intervals
# Extend the regression lines
b<-ggplot(mtcars, aes(x=wt, y=mpg, color=cyl, shape=cyl)) +
geom_point() +
geom_smooth(method=lm, se=FALSE, fullrange=TRUE)
ggarrange(a,b, labels=c("a","b"))
The fill color of the confidence bands can be changed as follows:
ggplot(mtcars, aes(x=wt, y=mpg, color=cyl, shape=cyl)) +
geom_point() +
geom_smooth(method=lm, aes(fill=cyl))
Change the point color/shape/size manually
The functions scale_shape_manual(), scale_color_manual() and scale_size_manual() are used:
# Change point shapes and colors manually
a<-ggplot(mtcars, aes(x=wt, y=mpg, color=cyl, shape=cyl)) +
geom_point() +
geom_smooth(method=lm, se=FALSE, fullrange=TRUE)+
scale_shape_manual(values=c(3, 16, 17))+
scale_color_manual(values=c('#999999','#E69F00', '#56B4E9'))+
theme(legend.position="top")
# Change the point sizes manually
b<-ggplot(mtcars, aes(x=wt, y=mpg, color=cyl, shape=cyl))+
geom_point(aes(size=cyl)) +
geom_smooth(method=lm, se=FALSE, fullrange=TRUE)+
scale_shape_manual(values=c(3, 16, 17))+
scale_color_manual(values=c('#999999','#E69F00', '#56B4E9'))+
scale_size_manual(values=c(2,3,4))+
theme(legend.position="top")
ggarrange(a,b, labels=c("a","b"))
Point and line colors can also be set using the functions scale_color_brewer() or scale_color_grey():
p <- ggplot(mtcars, aes(x=wt, y=mpg, color=cyl, shape=cyl)) +
geom_point() +
geom_smooth(method=lm, se=FALSE, fullrange=TRUE)+
theme_classic()
# Use brewer color palettes
a<-p+scale_color_brewer(palette="Dark2")
# Use grey scale
b<-p + scale_color_grey()
ggarrange(a,b, labels=c("a","b"))
Scatter plots with the 2d density estimation
The functions geom_density_2d() or stat_density_2d() can be used:
# Scatter plot with the 2d density estimation
sp <- ggplot(faithful, aes(x=eruptions, y=waiting)) + geom_point()
a<-sp + geom_density_2d()
# Gradient color
b<-sp + stat_density_2d(aes(fill = after_stat(level)), geom="polygon")
# Change the gradient color
c<-sp + stat_density_2d(aes(fill = after_stat(level)), geom="polygon")+
scale_fill_gradient(low="blue", high="red")
ggarrange(a,b,c,nrow=1, ncol=3, labels=c("a","b","c"))
This R tutorial describes how to create a barplot using R and the ggplot2 package.
The function geom_bar() can be used.
Basic barplots
Data derived from the ToothGrowth data set are used. ToothGrowth describes the effect of vitamin C on tooth growth in guinea pigs.
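The data frame df used by the barplot code below is not shown. As a hypothetical stand-in, this sketch builds a three-row df with one len value per dose level. The values are illustrative, not real ToothGrowth summaries: they are the OJ rows of the df2 table defined later in this tutorial.

```r
# Hypothetical stand-in data: one len value per dose level (illustrative values,
# taken from the OJ rows of the df2 table later in this tutorial)
df <- data.frame(dose = c("D0.5", "D1", "D2"),
                 len  = c(4.2, 10, 29.5))
head(df)
```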
Create barplots
library(ggplot2)
# Basic barplot
p<-ggplot(data=df, aes(x=dose, y=len)) +
geom_bar(stat="identity")
# Horizontal bar plot
a<-p + coord_flip()
Change the width and the color of bars:
# Change the width of bars
a<-ggplot(data=df, aes(x=dose, y=len)) +
geom_bar(stat="identity", width=0.5)
# Change colors
b<-ggplot(data=df, aes(x=dose, y=len)) +
geom_bar(stat="identity", color="blue", fill="white")
# Minimal theme + blue fill color
p<-ggplot(data=df, aes(x=dose, y=len)) +
geom_bar(stat="identity", fill="steelblue")+
theme_minimal()
ggarrange(a,b,p, nrow=1, ncol=3, labels=c("a","b","p"))
Choose which items to display:
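The code for this step is missing. Here is a minimal sketch, assuming the same approach used for the box plots earlier: scale_x_discrete() with a limits vector. The data are a hypothetical stand-in.

```r
library(ggplot2)
# Hypothetical stand-in data (len values from the OJ rows of df2, later in this tutorial)
df <- data.frame(dose = c("D0.5", "D1", "D2"), len = c(4.2, 10, 29.5))
p <- ggplot(data = df, aes(x = dose, y = len)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  theme_minimal()
# limits keeps only the listed categories, in the listed order; the "D1" bar is
# dropped and ggplot2 reports the removed row with a warning
p2 <- p + scale_x_discrete(limits = c("D0.5", "D2"))
p2
```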
Bar plot with labels
# Outside bars
a<-ggplot(data=df, aes(x=dose, y=len)) +
geom_bar(stat="identity", fill="steelblue")+
geom_text(aes(label=len), vjust=-0.3, size=3.5)+
theme_minimal()
# Inside bars
b<-ggplot(data=df, aes(x=dose, y=len)) +
geom_bar(stat="identity", fill="steelblue")+
geom_text(aes(label=len), vjust=1.6, color="white", size=3.5)+
theme_minimal()
ggarrange(a,b, labels=c("a","b"))
Barplot of counts
To make a barplot of counts, the mtcars data set is used:
# Don't map a variable to y
ggplot(mtcars, aes(x=factor(cyl)))+
geom_bar(stat="count", width=0.7, fill="steelblue")+
theme_minimal()
Change outline colors
Barplot outline colors can be automatically controlled by the levels of the variable dose:
# Change barplot line colors by groups
p<-ggplot(df, aes(x=dose, y=len, color=dose)) +
geom_bar(stat="identity", fill="white")
Barplot line colors can also be set manually using the functions scale_color_manual(), scale_color_brewer() or scale_color_grey():
# Use custom color palettes
a<-p+scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# Use brewer color palettes
b<-p+scale_color_brewer(palette="Dark2")
# Use grey scale
c<-p + scale_color_grey() + theme_classic()
ggarrange(a,b,c,nrow=1,ncol=3,labels=c("a","b","c"))
Change fill colors
In the R code below, barplot fill colors are automatically controlled by the levels of dose:
# Change barplot fill colors by groups
p<-ggplot(df, aes(x=dose, y=len, fill=dose)) +
geom_bar(stat="identity")+theme_minimal()
p
Barplot fill colors can also be set manually using the functions scale_fill_manual(), scale_fill_brewer() or scale_fill_grey():
# Use custom color palettes
a<-p+scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# use brewer color palettes
b<-p+scale_fill_brewer(palette="Dark2")
# Use grey scale
c<-p + scale_fill_grey()
ggarrange(a,b,c,nrow=1, ncol=3,labels=c("a","b","c"))
Use a black outline color:
ggplot(df, aes(x=dose, y=len, fill=dose))+
geom_bar(stat="identity", color="black")+
scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))+
theme_minimal()
Change legend position
# Change bar fill colors to blues
p <- p+scale_fill_brewer(palette="Blues")
a<-p + theme(legend.position="top")
b<-p + theme(legend.position="bottom")
# Remove legend
c<-p + theme(legend.position="none")
ggarrange(a,b,c, nrow=1, ncol=3, labels=c("a","b","c"))
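Change the order of items
The function scale_x_discrete() can be used to change the order of the bars to “D2”, “D0.5”, “D1”. The legend keys follow the fill scale, so they keep their original order unless the same limits are also given to the fill scale: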
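The code for this step is missing. Here is a minimal sketch on hypothetical stand-in data (the three dose labels of this section, with illustrative len values taken from the OJ rows of the df2 table below). It reorders the bars with scale_x_discrete() and the legend with scale_fill_discrete().

```r
library(ggplot2)
# Hypothetical stand-in data (len values from the OJ rows of df2, below)
df <- data.frame(dose = c("D0.5", "D1", "D2"), len = c(4.2, 10, 29.5))
p <- ggplot(df, aes(x = dose, y = len, fill = dose)) +
  geom_bar(stat = "identity") +
  theme_minimal()
# Reorder the bars on the x axis ...
p2 <- p + scale_x_discrete(limits = c("D2", "D0.5", "D1"))
# ... and the legend keys, which follow the fill scale rather than the x scale
p3 <- p2 + scale_fill_discrete(limits = c("D2", "D0.5", "D1"))
p3
```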
Barplot with multiple groups
Data derived from the ToothGrowth data set are used. ToothGrowth describes the effect of vitamin C on tooth growth in guinea pigs. Three dose levels of vitamin C (0.5, 1, and 2 mg) with each of two delivery methods [orange juice (OJ) or ascorbic acid (VC)] are used:
df2 <- data.frame(supp=rep(c("VC", "OJ"), each=3),
dose=rep(c("D0.5", "D1", "D2"),2),
len=c(6.8, 15, 33, 4.2, 10, 29.5))
head(df2)
Create barplots
A stacked barplot is created by default. Use position=position_dodge() to place the bars side by side instead. The barplot fill color is controlled by the levels of supp:
# Stacked barplot with multiple groups
a<-ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity")
# Use position=position_dodge()
b<-ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity", position=position_dodge())
ggarrange(a, b, labels=c("a","b"))
Change the colors manually:
# Change the colors manually
p <- ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity", color="black", position=position_dodge())+
theme_minimal()
# Use custom colors
a<-p + scale_fill_manual(values=c('#999999','#E69F00'))
# Use brewer color palettes
b<-p + scale_fill_brewer(palette="Blues")
ggarrange(a,b, labels=c("a","b"))
Add labels
Add labels to a dodged barplot:
ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity", position=position_dodge())+
geom_text(aes(label=len), vjust=1.6, color="white",
position = position_dodge(0.9), size=3.5)+
scale_fill_brewer(palette="Paired")+
theme_minimal()
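Add labels to a stacked barplot: three steps are required (sort the data, compute the cumulative sums, create the plot).
library(plyr)
# Sort by dose, then by supp in reverse alphabetical order, so that the cumulative
# sums follow the stacking order (ggplot2 >= 2.2.0 stacks the first level, OJ, on top)
df_sorted <- arrange(df2, dose, desc(supp))
head(df_sorted)
# Calculate the cumulative sum of len for each dose
df_cumsum <- ddply(df_sorted, "dose",
transform, label_ypos=cumsum(len))
head(df_cumsum)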
# Create the barplot
ggplot(data=df_cumsum, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity")+
geom_text(aes(y=label_ypos, label=len), vjust=1.6,
color="white", size=3.5)+
scale_fill_brewer(palette="Paired")+
theme_minimal()
To place the labels at the middle of the bars, modify the cumulative sum as follows:
df_cumsum <- ddply(df_sorted, "dose",
transform,
label_ypos=cumsum(len) - 0.5*len)
# Create the barplot
ggplot(data=df_cumsum, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity")+
geom_text(aes(y=label_ypos, label=len),
color="white", size=3.5)+
scale_fill_brewer(palette="Paired")+
theme_minimal()
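Instead of computing label_ypos by hand, ggplot2 (version 2.2.0 and later) can stack the labels itself. This is a different technique from the cumulative-sum approach above. Here position_stack(vjust = 0.5) stacks the text exactly like the bars and places each label halfway up its segment. The sketch uses the same df2 values.

```r
library(ggplot2)
df2 <- data.frame(supp = rep(c("VC", "OJ"), each = 3),
                  dose = rep(c("D0.5", "D1", "D2"), 2),
                  len  = c(6.8, 15, 33, 4.2, 10, 29.5))
# The global fill = supp also groups the text layer, so labels stack like the bars;
# vjust = 0.5 here is an argument of position_stack(), not text justification
p <- ggplot(df2, aes(x = dose, y = len, fill = supp)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = len), position = position_stack(vjust = 0.5),
            color = "white", size = 3.5) +
  scale_fill_brewer(palette = "Paired") +
  theme_minimal()
p
```

No sorting or cumulative sums are needed, and the labels stay aligned if the stacking order changes.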
Barplot with a numeric x-axis
If the variable on the x-axis is numeric, it can be useful to treat it as a continuous or a factor variable, depending on what you want to do:
# Create some data
df2 <- data.frame(supp=rep(c("VC", "OJ"), each=3),
dose=rep(c("0.5", "1", "2"),2),
len=c(6.8, 15, 33, 4.2, 10, 29.5))
head(df2)
# x axis treated as continuous variable
df2$dose <- as.numeric(as.vector(df2$dose))
a<-ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity", position=position_dodge())+
scale_fill_brewer(palette="Paired")+
theme_minimal()
# Axis treated as discrete variable
df2$dose<-as.factor(df2$dose)
b<-ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity", position=position_dodge())+
scale_fill_brewer(palette="Paired")+
theme_minimal()
ggarrange(a,b,labels=c("a","b"))
Barplot with error bars
The helper function below will be used to calculate the mean and the standard deviation of the variable of interest in each group:
#+++++++++++++++++++++++++
# Function to calculate the mean and the standard deviation
# for each group
#+++++++++++++++++++++++++
# data : a data frame
# varname : the name of a column containing the variable
# to be summarized
# groupnames : vector of column names to be used as
# grouping variables
data_summary <- function(data, varname, groupnames){
require(plyr)
summary_func <- function(x, col){
c(mean = mean(x[[col]], na.rm=TRUE),
sd = sd(x[[col]], na.rm=TRUE))
}
data_sum<-ddply(data, groupnames, .fun=summary_func,
varname)
data_sum <- plyr::rename(data_sum, c("mean" = varname))  # plyr, not dplyr, rename
return(data_sum)
}
Summarize the data
df3 <- data_summary(ToothGrowth, varname="len",
groupnames=c("supp", "dose"))
# Convert dose to a factor variable
df3$dose=as.factor(df3$dose)
head(df3)
The function geom_errorbar() can be used to produce a bar graph with error bars :
# Standard deviation of the mean as error bar
p <- ggplot(df3, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity", position=position_dodge()) +
geom_errorbar(aes(ymin=len-sd, ymax=len+sd), width=.2,
position=position_dodge(.9))
p + scale_fill_brewer(palette="Paired") + theme_minimal()
Customized barplots
# Add a title and axis labels
# Change the fill colors manually
p + labs(title="Plot of length per dose",
x="Dose (mg)", y = "Length")+
scale_fill_manual(values=c('black','lightgray'))+
theme_classic()
Change fill colors manually:
# Greens
a<-p + scale_fill_brewer(palette="Greens") + theme_minimal()
# Reds
b<-p + scale_fill_brewer(palette="Reds") + theme_minimal()
ggarrange(a,b,labels=c("a","b"))
This R tutorial describes how to change the line types of a graph generated using the ggplot2 package.
Line types in R
The different line types available in R are: “blank”, “solid”, “dashed”, “dotted”, “dotdash”, “longdash”, “twodash”.
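To see the named line types side by side, this sketch draws one horizontal segment per type. scale_linetype_identity() uses the strings as line types directly. The data frame d is hypothetical.

```r
library(ggplot2)
# The seven named line types
lt <- c("blank", "solid", "dashed", "dotted", "dotdash", "longdash", "twodash")
d <- data.frame(lt = lt, y = seq_along(lt))
# scale_linetype_identity() uses the strings in lt directly as line types
p <- ggplot(d, aes(x = 0, xend = 1, y = y, yend = y, linetype = lt)) +
  geom_segment() +
  scale_linetype_identity() +
  scale_y_continuous(breaks = d$y, labels = lt) +
  labs(x = NULL, y = NULL)
p
```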
Basic line plots
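The data frame df with the columns time and bill used in this section is not shown. As a hypothetical stand-in, this sketch reuses the Female bill values (10, 30, 15) from the df2 data frame defined later in this section.

```r
library(ggplot2)
# Hypothetical stand-in data: one bill per meal time.
# time is a factor so the x axis keeps the meal order instead of sorting alphabetically.
df <- data.frame(time = factor(c("Breakfast", "Lunch", "Dinner"),
                               levels = c("Breakfast", "Lunch", "Dinner")),
                 bill = c(10, 30, 15))
# group = 1 tells geom_line() to connect all points, since x is discrete
p <- ggplot(df, aes(x = time, y = bill, group = 1)) + geom_line() + geom_point()
p
```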
Create line plots and change line types
The argument linetype is used to change the line type:
# Basic line plot with points
a<-ggplot(data=df, aes(x=time, y=bill, group=1)) +
geom_line()+
geom_point()
# Change the line type
b<-ggplot(data=df, aes(x=time, y=bill, group=1)) +
geom_line(linetype = "dashed")+
geom_point()
ggarrange(a,b, labels=c("a","b"))
Line plot with multiple groups
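df2 <- data.frame(sex = rep(c("Female", "Male"), each=3),
# time is a factor so the meals stay in chronological order on the x axis
time=factor(rep(c("Breakfast", "Lunch", "Dinner"), 2),
levels=c("Breakfast", "Lunch", "Dinner")),
bill=c(10, 30, 15, 13, 40, 17) )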
head(df2)
Change the appearance of lines globally
In the graphs below, line types, colors and sizes are the same for the two groups:
# Line plot with multiple groups
a<-ggplot(data=df2, aes(x=time, y=bill, group=sex)) +
geom_line()+
geom_point()
# Change line types
b<-ggplot(data=df2, aes(x=time, y=bill, group=sex)) +
geom_line(linetype="dashed")+
geom_point()
# Change line colors and widths (linewidth replaces size for lines in ggplot2 >= 3.4.0)
c<-ggplot(data=df2, aes(x=time, y=bill, group=sex)) +
geom_line(linetype="dotted", color="red", linewidth=2)+
geom_point(color="blue", size=3)
ggarrange(a,b,c, nrow=1, ncol=3, labels=c("a","b","c"))
Change the line types automatically by groups
In the graphs below, line types and colors are changed automatically by the levels of the variable sex:
# Change line types by groups (sex)
a<-ggplot(df2, aes(x=time, y=bill, group=sex)) +
geom_line(aes(linetype=sex))+
geom_point()+
theme(legend.position="top")
# Change line types + colors
b<-ggplot(df2, aes(x=time, y=bill, group=sex)) +
geom_line(aes(linetype=sex, color=sex))+
geom_point(aes(color=sex))+
theme(legend.position="top")
ggarrange(a,b,labels=c("a","b"))
Change the appearance of lines manually
The functions scale_linetype_manual(), scale_color_manual() and scale_linewidth_manual() can be used:
# Set line types manually
a<-ggplot(df2, aes(x=time, y=bill, group=sex)) +
geom_line(aes(linetype=sex))+
geom_point()+
scale_linetype_manual(values=c("twodash", "dotted"))+
theme(legend.position="top")
# Change line colors and widths (linewidth replaces size for lines in ggplot2 >= 3.4.0)
b<-ggplot(df2, aes(x=time, y=bill, group=sex)) +
geom_line(aes(linetype=sex, color=sex, linewidth=sex))+
geom_point()+
scale_linetype_manual(values=c("twodash", "dotted"))+
scale_color_manual(values=c('#999999','#E69F00'))+
scale_linewidth_manual(values=c(1, 1.5))+
theme(legend.position="top")
ggarrange(a,b,labels=c("a","b"))
This tutorial describes how to create a graph with error bars using R software and the ggplot2 package. Different types of error bars can be created using the functions geom_errorbar(), geom_crossbar(), geom_linerange() and geom_pointrange().
Add error bars to bar and line plots
The ToothGrowth data set is used. It describes the effect of vitamin C on tooth growth in guinea pigs. Three dose levels of vitamin C (0.5, 1, and 2 mg) with each of two delivery methods [orange juice (OJ) or ascorbic acid (VC)] are used:
In the example below, we’ll plot the mean tooth length in each group. The standard deviation is used to draw the error bars on the graph.
First, the helper function below will be used to calculate the mean and the standard deviation, for the variable of interest, in each group :
#+++++++++++++++++++++++++
# Function to calculate the mean and the standard deviation
# for each group
#+++++++++++++++++++++++++
# data : a data frame
# varname : the name of a column containing the variable
# to be summarized
# groupnames : vector of column names to be used as
# grouping variables
data_summary <- function(data, varname, groupnames){
  require(plyr)
  summary_func <- function(x, col){
    c(mean = mean(x[[col]], na.rm=TRUE),
      sd = sd(x[[col]], na.rm=TRUE))
  }
  data_sum <- ddply(data, groupnames, .fun=summary_func,
                    varname)
  # Call plyr's rename() explicitly: dplyr::rename() has a different interface
  data_sum <- plyr::rename(data_sum, c("mean" = varname))
  return(data_sum)
}
Summarize the data :
df2 <- data_summary(ToothGrowth, varname="len",
groupnames=c("supp", "dose"))
# Convert dose to a factor variable
df2$dose=as.factor(df2$dose)
head(df2)
Barplot with error bars
The function geom_errorbar() can be used to produce the error bars:
# Default bar plot
p<- ggplot(df2, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity", color="black",
position=position_dodge()) +
geom_errorbar(aes(ymin=len-sd, ymax=len+sd), width=.2,
position=position_dodge(.9))
# Finished bar plot
a<-p+labs(title="Tooth length per dose", x="Dose (mg)", y = "Length")+
theme_classic() +
scale_fill_manual(values=c('#999999','#E69F00'))
ggarrange(p,a,labels=c("p","a"))
# Keep only upper error bars
ggplot(df2, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity", color="black", position=position_dodge()) +
geom_errorbar(aes(ymin=len, ymax=len+sd), width=.2,
position=position_dodge(.9))
Line plot with error bars
# Default line plot
p<- ggplot(df2, aes(x=dose, y=len, group=supp, color=supp)) +
geom_line() +
geom_point()+
geom_errorbar(aes(ymin=len-sd, ymax=len+sd), width=.2,
position=position_dodge(0.05))
# Finished line plot
a<-p+labs(title="Tooth length per dose", x="Dose (mg)", y = "Length")+
theme_classic() +
scale_color_manual(values=c('#999999','#E69F00'))
ggarrange(p,a,labels=c("p","a"))
You can also use the functions geom_pointrange() or geom_linerange() instead of geom_errorbar():
# Use geom_pointrange
a<-ggplot(df2, aes(x=dose, y=len, group=supp, color=supp)) +
geom_pointrange(aes(ymin=len-sd, ymax=len+sd))
# Use geom_line()+geom_pointrange()
b<-ggplot(df2, aes(x=dose, y=len, group=supp, color=supp)) +
geom_line()+
geom_pointrange(aes(ymin=len-sd, ymax=len+sd))
ggarrange(a,b,labels=c("a","b"))
Dot plot with mean point and error bars
The functions geom_dotplot() and stat_summary() are used. Note that mean_sdl() requires the Hmisc package to be installed.
The mean +/- SD can be added as a crossbar, an error bar or a pointrange:
df <- ToothGrowth
df$dose <- as.factor(df$dose)
p <- ggplot(df, aes(x=dose, y=len)) +
geom_dotplot(binaxis='y', stackdir='center')
# use geom_crossbar()
a<-p + stat_summary(fun.data=mean_sdl, fun.args = list(mult=1),
geom="crossbar", width=0.5)
# Use geom_errorbar()
b<-p + stat_summary(fun.data=mean_sdl, fun.args = list(mult=1),
geom="errorbar", color="red", width=0.2) +
stat_summary(fun=mean, geom="point", color="red")
# Use geom_pointrange()
c<-p + stat_summary(fun.data=mean_sdl, fun.args = list(mult=1),
geom="pointrange", color="red")
ggarrange(a,b,c,nrow=1,ncol=3,labels=c("a","b","c"))
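To see what the fun.data = mean_sdl summaries above compute, mean_sdl() can be called directly on a numeric vector. A minimal sketch, assuming the Hmisc package is installed (mean_sdl() wraps Hmisc::smean.sdl()): with mult = 1 it returns a one-row data frame whose y column is the mean and whose ymin and ymax columns are the mean minus and plus one standard deviation.

```r
library(ggplot2)

# Requires the Hmisc package to be installed (mean_sdl() wraps Hmisc::smean.sdl())
# Mean +/- 1 SD of tooth length over all observations
len_summary <- mean_sdl(ToothGrowth$len, mult = 1)
len_summary   # one-row data frame with columns y, ymin, ymax

# The same three values computed by hand
c(mean(ToothGrowth$len) - sd(ToothGrowth$len),
  mean(ToothGrowth$len),
  mean(ToothGrowth$len) + sd(ToothGrowth$len))
```

Inside stat_summary() the same function is applied separately to the values of each dose group.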
This R tutorial describes how to create a pie chart for data visualization using R software and ggplot2 package.
The function coord_polar() is used to produce a pie chart, which is just a stacked bar chart in polar coordinates.
Simple pie charts
Use a barplot to visualize the data
# Barplot
bp<- ggplot(df, aes(x="", y=value, fill=group))+
geom_bar(width = 1, stat = "identity")
bp
Create a pie chart :
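A minimal sketch, assuming a hypothetical data frame df with a categorical column group and a numeric column value (the numbers below are made up): coord_polar("y", start = 0) maps the y axis, i.e. the heights of the stacked segments, to the angle, which turns the single stacked bar into a pie.

```r
library(ggplot2)

# Hypothetical example data: one slice per group
df <- data.frame(group = c("A", "B", "C"),
                 value = c(25, 25, 50))

# A single stacked bar (x = "") ...
bp <- ggplot(df, aes(x = "", y = value, fill = group)) +
  geom_bar(width = 1, stat = "identity")

# ... wrapped into polar coordinates: the y axis becomes the angle
pie <- bp + coord_polar("y", start = 0)
pie
```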
Change the pie chart fill colors
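It is possible to change the pie chart fill colors manually using the functions scale_fill_manual() (custom colors), scale_fill_brewer() (RColorBrewer palettes) and scale_fill_grey() (grey palettes):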
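A short sketch of the three fill scales applied to a pie chart. The data frame is hypothetical (made-up values, one slice per group), and the hex colors are the same ones used in the other manual-color examples of this tutorial.

```r
library(ggplot2)

# Hypothetical example data
df <- data.frame(group = c("A", "B", "C"),
                 value = c(25, 25, 50))
pie <- ggplot(df, aes(x = "", y = value, fill = group)) +
  geom_bar(width = 1, stat = "identity") +
  coord_polar("y", start = 0)

# Custom colors, matched in order to the levels of group
pie_manual <- pie + scale_fill_manual(values = c("#999999", "#E69F00", "#56B4E9"))
# A palette from RColorBrewer
pie_brewer <- pie + scale_fill_brewer(palette = "Dark2")
# Shades of grey, on a minimal theme
pie_grey <- pie + scale_fill_grey() + theme_minimal()
```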
Create a pie chart from a factor variable
The PlantGrowth data set is used:
Create the pie chart of the count of observations in each group :
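A minimal sketch, assuming the same single-bar approach as above: with no y aesthetic, geom_bar() counts the observations in each level of group, stacks the counts in one bar, and coord_polar() turns that bar into a pie. PlantGrowth ships with R (datasets package) and has a factor column group with levels ctrl, trt1 and trt2.

```r
library(ggplot2)

head(PlantGrowth)

# geom_bar() uses stat = "count": each slice is the number of plants per group
pie_pg <- ggplot(PlantGrowth, aes(x = "", fill = group)) +
  geom_bar(width = 1) +
  coord_polar("y", start = 0)
pie_pg
```

Since every group contains 10 plants, the three slices are equal.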
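This R tutorial describes how to create a QQ plot (quantile-quantile plot) using R software and the ggplot2 package. A QQ plot compares the quantiles of the data with the quantiles of a theoretical distribution (the normal distribution by default); it is used to check whether the data follow that distribution.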
The function stat_qq() or qplot() can be used.
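As a rough sketch of what stat_qq() does with its default normal distribution: the sorted sample is plotted against standard normal quantiles taken at the probability points ppoints(n). The manual construction below is for illustration only; stat_qq_line() (available in ggplot2 3.0.0 and later) adds a reference line.

```r
library(ggplot2)

y <- mtcars$mpg
n <- length(y)

# Theoretical N(0, 1) quantiles at probabilities ppoints(n), paired with the sorted sample
qq_df <- data.frame(theoretical = qnorm(ppoints(n)),
                    sample = sort(y))

# Manual QQ plot
qq_manual <- ggplot(qq_df, aes(x = theoretical, y = sample)) + geom_point()

# Built-in equivalent, plus a reference line
qq_builtin <- ggplot(mtcars, aes(sample = mpg)) + stat_qq() + stat_qq_line()
```

Points lying close to the reference line suggest that the sample is consistent with a normal distribution.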
Prepare the data
The mtcars data set is used in the examples below.
# Convert cyl column from a numeric to a factor variable
mtcars$cyl <- as.factor(mtcars$cyl)
head(mtcars)
Basic qq plots
In the example below, the distribution of the variable mpg is explored :
# Solution 1
a<-qplot(sample = mpg, data = mtcars)
# Solution 2
b<-ggplot(mtcars, aes(sample=mpg))+stat_qq()
ggarrange(a,b, labels=c("a","b"))
Change qq plot point shapes by groups
In the R code below, point shapes are controlled automatically by the variable cyl.
You can also set point shapes manually using the function scale_shape_manual().
# Change point shapes by groups
p<-qplot(sample = mpg, data = mtcars, shape=cyl)
# Change point shapes manually
a<-p + scale_shape_manual(values=c(1,17,19))
ggarrange(p,a,labels=c("p","a"))
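In the R code below, point colors of the qq plot are automatically controlled by the levels of cyl:
# Change point colors by groups
p <- qplot(sample = mpg, data = mtcars, color=cyl)
p
It is also possible to change qq plot colors manually using the functions scale_color_manual(), scale_color_brewer() and scale_color_grey():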
# Use custom color palettes
a<-p+scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# Use brewer color palettes
b<-p+scale_color_brewer(palette="Dark2")
# Use grey scale
c<-p + scale_color_grey() + theme_classic()
ggarrange(a, b,c,nrow=1, ncol=3,labels=c("a","b","c"))
Change the legend position
a<-p + theme(legend.position="top")
b<-p + theme(legend.position="bottom")
c<-p + theme(legend.position="none") # Remove legend
ggarrange(a,b,c,nrow=1,ncol=3, labels=c("a","b","c"))
Customized qq plots
# Basic qq plot
qplot(sample = mpg, data = mtcars)+
labs(title="Normal QQ plot of \n miles per gallon",
y = "Miles/(US) gallon")+
theme_classic()
# Change color/shape by groups
p <- qplot(sample = mpg, data = mtcars, color=cyl, shape=cyl)+
labs(title="Normal QQ plot of \n miles per gallon",
y = "Miles/(US) gallon")
a<-p + theme_classic()
ggarrange(p,a,labels=c("p","a"))
Change colors manually :
# Sequential palette
a<-p + scale_color_brewer(palette="Blues") + theme_classic()
# Qualitative palette
b<-p + scale_color_brewer(palette="Dark2") + theme_minimal()
# Diverging palette
c<-p + scale_color_brewer(palette="RdBu")
ggarrange(a,b,c, nrow=1,ncol=3, labels=c("a","b","c"))
This R tutorial describes how to create an ECDF plot (Empirical Cumulative Distribution Function) using R software and the ggplot2 package. For any given value, the ECDF reports the proportion of observations that are less than or equal to that value.
The function stat_ecdf() can be used.
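A tiny worked example of the definition above, using a made-up sample of five values: base R's ecdf() returns the step function F(t) = (number of observations <= t) / n, which is the curve that stat_ecdf() draws.

```r
# Hypothetical toy sample
x <- c(1, 2, 2, 3, 5)

Fn <- ecdf(x)    # step function: proportion of x that is <= t
Fn(2)            # 3 of the 5 values are <= 2, so 0.6
Fn(c(0, 4, 5))   # 0, 0.8 and 1
```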
Create some data
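A sketch with hypothetical data: 200 heights (in inches, to match the axis label used below) drawn from a normal distribution with an arbitrary mean of 60 and SD of 15, stored in a column named height so that the ECDF examples that follow can run.

```r
library(ggplot2)

set.seed(1234)
# Hypothetical heights: 200 rounded draws from N(60, 15^2)
df <- data.frame(height = round(rnorm(200, mean = 60, sd = 15)))
head(df)

p_ecdf <- ggplot(df, aes(height)) + stat_ecdf(geom = "step")
p_ecdf
```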
ECDF plots
a<-ggplot(df, aes(height)) + stat_ecdf(geom = "point")
b<-ggplot(df, aes(height)) + stat_ecdf(geom = "step")
ggarrange(a,b, labels=c("a","b"))
Customized ECDF plots
# Basic ECDF plot
ggplot(df, aes(height)) + stat_ecdf(geom = "step")+
labs(title="Empirical Cumulative \n Distribution Function",
y = "F(height)", x="Height in inch")+
theme_classic()
print(): print a ggplot to a file
To print a ggplot directly to a file, open a graphics device (e.g. pdf()), draw the plot with print(), then close the device with dev.off():
# Print the plot to a pdf file
pdf("myplot.pdf")
myplot <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
print(myplot)
dev.off()
## png
## 2
For printing to a png file, use:
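A minimal sketch of the png() counterpart of the pdf() example above. png() takes its width and height in pixels (480 x 480 is its default). The value printed by dev.off() is just the graphics device that becomes current once the file device is closed.

```r
library(ggplot2)

myplot <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Open a png device (dimensions in pixels), draw the plot, close the device
png("myplot.png", width = 480, height = 480)
print(myplot)
dev.off()
```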
## png
## 2
ggsave: save the last ggplot
ggsave is a convenient function for saving the last plot that you displayed. It also guesses the type of graphics device from the extension. This means the only argument you need to supply is the filename.
You can therefore create a ggplot, display it on the screen, and then save it using the function ggsave():
# 1. Create a plot
# The plot is displayed on the screen
ggplot(mtcars, aes(wt, mpg)) + geom_point()
# 2. Save the plot to a pdf file
ggsave("myplot.pdf")
For saving to a png file, use:
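A minimal sketch: ggsave() picks the png device from the ".png" extension. Here the plot is passed explicitly with plot = p rather than relying on the last displayed plot. ggsave() also accepts width, height, units and dpi arguments when a fixed output size is needed.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Device inferred from the extension. With width and height left unset,
# ggsave() chooses a default size and reports it in a "Saving ... image" message.
ggsave("myplot.png", plot = p)
```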
## Saving 7 x 5 in image
The aim of this tutorial is to describe how to modify plot titles (main title, axis labels and legend titles) using R software and ggplot2 package.
The functions below can be used :
ggtitle(label) # for the main title
xlab(label) # for the x axis label
ylab(label) # for the y axis label
labs(...) # for the main title, axis labels and legend titles
The argument label is the text to be used for the main title or for the axis labels.
Prepare the data
ToothGrowth data is used in the following examples.
# convert dose column from a numeric to a factor variable
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
head(ToothGrowth)
Example of plot
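A minimal sketch of a box plot to which the titles below can be added, assuming dose has been converted to a factor as in the data preparation step above (it is converted again here so the block runs on its own).

```r
library(ggplot2)

# dose as a factor: one box per dose level
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

p <- ggplot(ToothGrowth, aes(x = dose, y = len)) + geom_boxplot()
p
```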
Change the main title and axis labels
Change plot titles using the functions ggtitle(), xlab() and ylab():
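A short sketch using the ToothGrowth box plot (rebuilt here so the block is self-contained); the label texts are the ones used later in this tutorial, and "\n" splits the main title over two lines.

```r
library(ggplot2)

ToothGrowth$dose <- as.factor(ToothGrowth$dose)
p <- ggplot(ToothGrowth, aes(x = dose, y = len)) + geom_boxplot()

# Main title on two lines, plus x and y axis labels
p1 <- p + ggtitle("Plot of length \n by dose") +
  xlab("Dose (mg)") + ylab("Teeth length")
p1
```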
Change plot titles using the function labs() as follows:
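The same labels can be set in a single labs() call; a minimal sketch on the same box plot.

```r
library(ggplot2)

ToothGrowth$dose <- as.factor(ToothGrowth$dose)
p <- ggplot(ToothGrowth, aes(x = dose, y = len)) + geom_boxplot()

# title, x and y in one call
p2 <- p + labs(title = "Plot of length \n by dose",
               x = "Dose (mg)", y = "Teeth length")
p2
```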
It is also possible to change legend titles using the function labs():
# Default plot
p <- ggplot(ToothGrowth, aes(x=dose, y=len, fill=dose))+
geom_boxplot()
# Modify legend titles
a<-p + labs(fill = "Dose (mg)")
ggarrange(p,a,labels=c("p","a"))
Change the appearance of the main title and axis labels
The main title and the x and y axis labels can be customized using the functions theme() and element_text() as follows:
# main title
p + theme(plot.title = element_text(family = "serif", face = "bold", colour = "red", size = 14))
# x axis title
p + theme(axis.title.x = element_text(family = "serif", face = "italic", colour = "blue", size = 12))
# y axis title
p + theme(axis.title.y = element_text(family = "serif", face = "plain", colour = "darkgreen", size = 12))
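The arguments below can be used in element_text() to change the appearance of the text:
- family: font family
- face: font face. Possible values are "plain", "italic", "bold" and "bold.italic"
- colour: text color
- size: text size in pts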
# Default plot
p <- ggplot(ToothGrowth, aes(x=dose, y=len)) + geom_boxplot() +
ggtitle("Plot of length \n by dose") +
xlab("Dose (mg)") + ylab("Teeth length")
# Change the color, the size and the face of
# the main title, x and y axis labels
a<-p + theme(
plot.title = element_text(color="red", size=14, face="bold.italic"),
axis.title.x = element_text(color="blue", size=14, face="bold"),
axis.title.y = element_text(color="#993333", size=14, face="bold")
)
ggarrange(p,a, labels=c("p","a"))
Remove x and y axis labels
It’s possible to hide the main title and axis labels using the function element_blank() as follow :
# Hide the main title and axis titles
p + theme(
plot.title = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank())
The goal of this R tutorial is to describe how to change the legend of a graph generated using ggplot2 package.
ToothGrowth data is used in the examples below :
# Convert the variable dose from numeric to factor variable
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
head(ToothGrowth)
Make sure that the variable dose is converted to a factor variable using the above R script.
Example of plot
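A minimal sketch of a plot with a legend to work on: mapping fill = dose creates the legend that the following examples customize. dose is converted to a factor again so the block runs on its own.

```r
library(ggplot2)

ToothGrowth$dose <- as.factor(ToothGrowth$dose)

# fill = dose produces a legend with one key per dose level
p <- ggplot(ToothGrowth, aes(x = dose, y = len, fill = dose)) + geom_boxplot()
p
```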
Change the legend position
The position of the legend can be changed using the function theme() as follows:
a<-p + theme(legend.position="top")
b<-p + theme(legend.position="bottom")
ggarrange(a,b, labels=c("a","b"))
Note that the argument legend.position can also be a numeric vector c(x,y). In this case, the legend is placed inside the plotting area. x and y are the coordinates of the legend box; their values should be between 0 and 1. c(0,0) corresponds to the “bottom left” position and c(1,1) to the “top right” position.
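A minimal sketch of a legend placed inside the panel near the bottom-right corner; the coordinates c(0.9, 0.2) are an arbitrary choice. In ggplot2 3.5.0 and later, the numeric form still works but is deprecated in favor of legend.position = "inside" together with legend.position.inside, shown as a comment.

```r
library(ggplot2)

ToothGrowth$dose <- as.factor(ToothGrowth$dose)
p <- ggplot(ToothGrowth, aes(x = dose, y = len, fill = dose)) + geom_boxplot()

# Legend box centered at 90% of the width and 20% of the height of the panel
p_in <- p + theme(legend.position = c(0.9, 0.2))
# ggplot2 >= 3.5.0 equivalent:
# p + theme(legend.position = "inside", legend.position.inside = c(0.9, 0.2))
p_in
```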
Change the legend title and text font styles
# legend title
a<-p + theme(legend.title = element_text(colour="blue", size=10,
face="bold"))
# legend labels
b<-p + theme(legend.text = element_text(colour="blue", size=10,
face="bold"))
ggarrange(a,b,labels=c("a","b"))
Change the background color of the legend box
# legend box background color
a<-p + theme(legend.background = element_rect(fill="lightblue",size=0.5, linetype="solid"))
b<-p + theme(legend.background = element_rect(fill="lightblue", size=0.5, linetype="solid", colour ="darkblue"))
ggarrange(a,b, labels=c("a","b"))
Change the order of legend items
To change the order of items to “2”, “0.5”, “1” :
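One way to do this, sketched under the assumption that the legend comes from fill = dose in a box plot like the one above, is the breaks argument of scale_fill_discrete(): it sets which legend keys are shown and in what order. The x-axis order is unaffected; scale_x_discrete(limits = ...) would reorder the axis instead.

```r
library(ggplot2)

ToothGrowth$dose <- as.factor(ToothGrowth$dose)
p <- ggplot(ToothGrowth, aes(x = dose, y = len, fill = dose)) + geom_boxplot()

# Legend keys in the order 2, 0.5, 1
p_ord <- p + scale_fill_discrete(breaks = c("2", "0.5", "1"))
p_ord
```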
Remove the plot legend
# Remove only the legend title
a<-p + theme(legend.title = element_blank())
# Remove the plot legend
b<-p + theme(legend.position='none')
ggarrange(a,b, labels=c("a","b"))
Remove slashes in the legend of a bar plot
# Default plot
a<-ggplot(data=ToothGrowth, aes(x=dose, fill=dose)) + geom_bar()
# Change bar plot border color,
# but slashes are added in the legend
b<-ggplot(data=ToothGrowth, aes(x=dose, fill=dose)) +
geom_bar(colour="black")
# Hide the slashes:
#1. plot the bars with no border color,
#2. plot the bars again with border color, but with a blank legend.
c<-ggplot(data=ToothGrowth, aes(x=dose, fill=dose))+
geom_bar() +
geom_bar(colour="black", show.legend=FALSE)
ggarrange(a,b,c, nrow=1, ncol=3, labels=c("a", "b", "c"))
guides() : set or remove the legend for a specific aesthetic
It’s possible to use the function guides() to set or remove the legend of a particular aesthetic (fill, color, size, shape, etc.). The mtcars data set is used:
# Prepare the data : convert cyl and gear to factor variables
mtcars$cyl<-as.factor(mtcars$cyl)
mtcars$gear <- as.factor(mtcars$gear)
head(mtcars)
Default plot without guide specification
The R code below creates a scatter plot. The color and the shape of the points are determined by the factor variables cyl and gear, respectively. The size of the points is controlled by the variable qsec.
p <- ggplot(data = mtcars,
aes(x=mpg, y=wt, color=cyl, size=qsec, shape=gear))+
geom_point()
# Print the plot without guide specification
p
Change the legend position for multiple guides
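A minimal sketch, assuming the scatter plot p defined above (rebuilt here so the block runs on its own): legend.position moves all the guides together, and the theme element legend.box controls whether the individual guides are arranged "horizontal"ly or "vertical"ly relative to each other.

```r
library(ggplot2)

mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$gear <- as.factor(mtcars$gear)
p <- ggplot(data = mtcars,
            aes(x = mpg, y = wt, color = cyl, size = qsec, shape = gear)) +
  geom_point()

# All three guides move below the panel
p_bottom <- p + theme(legend.position = "bottom")
# Same position, with the guides stacked on top of each other
p_stacked <- p + theme(legend.position = "bottom", legend.box = "vertical")
```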
Change the order for multiple guides
The function guide_legend() is used :
p+guides(color = guide_legend(order=1),
size = guide_legend(order=2),
shape = guide_legend(order=3))
If a continuous color is used, the order of the color guide can be changed using the function guide_colourbar() :
qplot(data = mpg, x = displ, y = cty, size = hwy,
colour = cyl, shape = drv) +
guides(colour = guide_colourbar(order = 1),
shape = guide_legend(order = 2),
size = guide_legend(order = 3))
Remove a legend for a particular aesthetic
The R code below removes the legend for the aesthetics color and size :
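A minimal sketch, assuming the scatter plot p defined above (rebuilt here so the block runs on its own): setting an aesthetic to "none" in guides() drops its legend, while the remaining legend (shape) is kept. Older code often uses FALSE instead of "none"; recent ggplot2 versions deprecate that form.

```r
library(ggplot2)

mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$gear <- as.factor(mtcars$gear)
p <- ggplot(data = mtcars,
            aes(x = mpg, y = wt, color = cyl, size = qsec, shape = gear)) +
  geom_point()

# Remove the color and size legends, keep the shape legend
p_less <- p + guides(color = "none", size = "none")
p_less
```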
Removing a particular legend can also be done with the scale_xx() functions. In this case, the argument guide is used as follows:
# Remove legend for the point shape
a<-p+scale_shape(guide="none")
# Remove legend for size
b<-p +scale_size(guide="none")
# Remove legend for color
c<-p + scale_color_manual(values=c('#999999','#E69F00','#56B4E9'),
guide="none")
ggarrange(a,b,c, nrow=1, ncol=3, labels=c("a", "b", "c"))
The goal of this article is to describe how to change the color of a graph generated using R software and ggplot2 package. A color can be specified either by name (e.g.: “red”) or by hexadecimal code (e.g. : “#FF1234”). The different color systems available in R are described at this link : colors in R.
In this R tutorial, you will learn how to :
Prepare the data
ToothGrowth and mtcars data sets are used in the examples below.
# Convert dose and cyl columns from numeric to factor variables
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
mtcars$cyl <- as.factor(mtcars$cyl)
head(ToothGrowth)
Simple plots
# Box plot
a<- ggplot(ToothGrowth, aes(x=dose, y=len)) +geom_boxplot()
# scatter plot
b<- ggplot(mtcars, aes(x=wt, y=mpg)) + geom_point()
ggarrange(a,b, labels=c("a","b"))
Change colors by groups
The following R code changes the color of the graph by the levels of dose :
# Box plot
bp<-ggplot(ToothGrowth, aes(x=dose, y=len, fill=dose)) +
geom_boxplot()
# Scatter plot
sp<-ggplot(mtcars, aes(x=wt, y=mpg, color=cyl)) + geom_point()
ggarrange(bp,sp, labels=c("bp","sp"))
The lightness (l) and the chroma (c, intensity of color) of the default (hue) colors can be modified using the functions scale_fill_hue() and scale_color_hue() as follows:
# Box plot
a<-bp + scale_fill_hue(l=40, c=35)
# Scatter plot
b<-sp + scale_color_hue(l=40, c=35)
ggarrange(a,b,labels=c("a","b"))
Note that the default values for l and c are l = 65 and c = 100.
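The effect of l and c can be checked without drawing a plot by calling the hue palette generator from the scales package (installed together with ggplot2). A minimal sketch: hue_pal() with its defaults returns the usual three ggplot2 colors, and lowering l and c, as in scale_fill_hue(l = 40, c = 35) above, gives darker and less saturated versions of the same hues.

```r
library(scales)

# Default hue colors for 3 groups (l = 65, c = 100)
hue_pal()(3)                 # "#F8766D" "#00BA38" "#619CFF"
identical(hue_pal(l = 65, c = 100)(3), hue_pal()(3))

# Same hues with lower lightness and chroma
hue_pal(l = 40, c = 35)(3)
```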
Change colors manually
A custom color palette can be specified using the functions scale_fill_manual() (box plots, bar plots, etc.) and scale_color_manual() (points, lines, etc.):
# Box plot
a<-bp + scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# Scatter plot
b<-sp + scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))
ggarrange(a,b, labels=c("a","b"))
Note that the argument breaks can be used to control the order of the items in the legend. This also holds for the other scale_xx() functions.
# Box plot
a<-bp + scale_fill_manual(breaks = c("2", "1", "0.5"),
values=c("red", "blue", "green"))
# Scatter plot
b<-sp + scale_color_manual(breaks = c("8", "6", "4"),
values=c("red", "blue", "green"))
ggarrange(a,b,labels=c("a","b"))
Use RColorBrewer palettes
The color palettes available in the RColorBrewer package are described here : color in R.
# Box plot
a<-bp + scale_fill_brewer(palette="Dark2")
# Scatter plot
b<-sp + scale_color_brewer(palette="BrBG")
ggarrange(a,b, labels=c("a","b"))
Available brewer palettes:
- Sequential: Blues, BuGn, BuPu, GnBu, Greens, Greys, Oranges, OrRd, PuBu, PuBuGn, PuRd, Purples, RdPu, Reds, YlGn, YlGnBu, YlOrBr, YlOrRd
- Qualitative: Accent, Dark2, Paired, Pastel1, Pastel2, Set1, Set2, Set3
- Diverging: BrBG, PiYG, PRGn, PuOr, RdBu, RdGy, RdYlBu, RdYlGn, Spectral
Use Wes Anderson color palettes
Install and load the color palettes as follows:
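A minimal sketch: the palettes come from the wesanderson package on CRAN. It is installed only if missing (which needs network access), then loaded. names(wes_palettes) lists the palette names shipped with the installed version, and wes_palette() returns the requested number of colors from one palette.

```r
# Install once from CRAN if needed, then load in each session
if (!requireNamespace("wesanderson", quietly = TRUE)) {
  install.packages("wesanderson")
}
library(wesanderson)

# Palette names available in the installed version
names(wes_palettes)

# First 3 colors of one palette
moon3 <- wes_palette(n = 3, name = "Moonrise1")
moon3
```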
The available color palettes are: GrandBudapest1, Moonrise1, Royal1, Moonrise2, Royal2, Cavalcanti, Moonrise3, GrandBudapest2, Chevalier, Zissou, FantasticFox, Darjeeling, Rushmore.
# Box plot
a<-bp+scale_fill_manual(values=wes_palette(n=3, name="Moonrise1"))
# Scatter plot
b<-sp+scale_color_manual(values=wes_palette(n=3, name="GrandBudapest1"))
ggarrange(a,b, labels=c("a","b"))
Use gray colors
The functions to use are:
- scale_colour_grey() for points, lines, etc.
- scale_fill_grey() for box plots, bar plots, violin plots, etc.
# Box plot
a<-bp + scale_fill_grey() + theme_classic()
# Scatter plot
b<-sp + scale_color_grey() + theme_classic()
ggarrange(a,b, labels=c("a","b"))
Change the gray value at the low and the high ends of the palette :
# Box plot
a<-bp + scale_fill_grey(start=0.8, end=0.2) + theme_classic()
# Scatter plot
b<-sp + scale_color_grey(start=0.8, end=0.2) + theme_classic()
ggarrange(a,b, labels=c("a","b"))
Continuous colors
The graph can be colored according to the values of a continuous variable using the functions:
- scale_color_gradient(), scale_fill_gradient() for sequential gradients between two colors
- scale_color_gradient2(), scale_fill_gradient2() for diverging gradients
- scale_color_gradientn(), scale_fill_gradientn() for gradients between n colors
Gradient colors for scatter plots
The graphs are colored using the qsec continuous variable :
# Color by qsec values
sp2<-ggplot(mtcars, aes(x=wt, y=mpg, color=qsec)) + geom_point()
# Change the low and high colors
# Sequential color scheme
a<-sp2+scale_color_gradient(low="blue", high="red")
# Diverging color scheme
mid<-mean(mtcars$qsec)
b<-sp2+scale_color_gradient2(midpoint=mid, low="blue", mid="white",
high="red", space ="Lab" )
ggarrange(sp2,a,b, labels=c("sp2","a","b"))
Gradient colors for histogram plots
set.seed(1234)
x <- rnorm(200)
# Histogram
hp<-qplot(x = x, fill = after_stat(count), geom="histogram")
# Sequential color scheme
a<-hp+scale_fill_gradient(low="blue", high="red")
ggarrange(hp,a,labels=c("hp","a"))
Gradient between n colors
# Scatter plot
# Color points by the mpg variable
sp3<-ggplot(mtcars, aes(x=wt, y=mpg, color=mpg)) + geom_point()
# Gradient between n colors
a<-sp3+scale_color_gradientn(colours = rainbow(5))
ggarrange(sp3,a,labels=c("sp3","a"))