At the end of the session, the participants are expected to:
After data collection, we organize and analyze the data, and then present the results in a form that reveals and highlights the important information we were able to extract.
One way to effectively present our data is by graphical presentation, in which we provide a visual picture of the data set. It also allows us to present more information about the variable of interest, without showing too many numbers.
“Analyzing data presented in a good statistical chart is analogous to examining a painting where we can discover a deeper message.”
Let us take a look at these pictures and try to see what is wrong with them.
image1
image2
Preliminary tasks
Launch RStudio.
Prepare your data and save it in an external .txt, .csv or .xlsx file.
Import your data into R.
For illustration purposes, we will use the mtcars data.
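The plotting calls in this handout refer to a data frame named my_data. A minimal sketch, assuming that my_data is simply a copy of the built-in mtcars data (in practice it could equally come from an imported file):

```r
# Assumption: my_data is a plain copy of the built-in mtcars data set
my_data <- mtcars

# Inspect the first rows and the structure before plotting
head(my_data)
str(my_data)
```

The columns wt (weight in 1000 lbs) and mpg (miles per US gallon) are the two variables used in the scatterplots.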
Creating Graphs
The R base function plot() can be used to create graphs.
Let us create a scatterplot of miles per US gallon (mpg) against weight (in 1000 lbs).
plot(x = my_data$wt, y = my_data$mpg,
pch = 16, frame = TRUE,
xlab = "Weight (in 1000 lbs)", ylab = "Miles per gallon", col = "royalblue4")
How do you specify colors in R plots?
R has 657 built-in color names. To see the list of color names in R:
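One way to list the names is the base function colors(). The sketch below also searches the list for a pattern; the object names all_colors and blue_names are only illustrative.

```r
# colors() returns a character vector with all built-in color names
all_colors <- colors()
length(all_colors)      # number of built-in names
head(all_colors, 10)    # first few names

# Search for names containing "blue"
blue_names <- grep("blue", all_colors, value = TRUE)
head(blue_names)
```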
R also accepts colors written in the hexadecimal system, a base-16 number system. A color code has the form "#RRGGBB", with two hexadecimal digits each for red, green and blue.
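To see how a hexadecimal code relates to its red, green and blue components, the base functions col2rgb() and rgb() can be used. A short sketch, using the code "#2E9FDF" that appears in the plotting examples of this handout:

```r
# Split a hexadecimal color into its red, green and blue components (0-255)
col2rgb("#2E9FDF")

# rgb() goes the other way and builds the hexadecimal string
hex_code <- rgb(46, 159, 223, maxColorValue = 255)
hex_code

# Named colors can be converted in the same way
col2rgb("steelblue")
```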
The image below shows examples of hexadecimal colors and their corresponding codes. It is taken from www.visibone.com
hex
To know more about colors in R, check this cheat sheet
Saving graphs
If you are working with RStudio, the plot can be exported from the menu in the plot panel.
Plots –> Export –> Save as Image or Save as PDF
Figure 2.1
It is also possible to save the graph using R code, as follows:
# 1. Open a PDF file
pdf("rplot.pdf")
# 2. Create a plot
plot(x = my_data$wt, y = my_data$mpg,
pch = 16, frame = FALSE,
xlab = "wt", ylab = "mpg", col = "#2E9FDF")
# 3. Close the PDF file
dev.off()
## png
## 2
Or use this:
# 1. Open jpeg file
jpeg("rplot.jpg", width = 350, height = 350)
# 2. Create the plot
plot(x = my_data$wt, y = my_data$mpg,
pch = 16, frame = FALSE,
xlab = "wt", ylab = "mpg", col = "#2E9FDF")
# 3. Close the file
dev.off()
## png
## 2
Note that the R code above saves the file in the current working directory.
The plot() function is the generic function for plotting in R. It can be used to create basic graphs.
A simplified format of the function is:
x and y: the coordinates of points to plot
type: the type of graph to create; possible values include "p" (points), "l" (lines) and "b" (both points and lines)
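As a sketch of the simplified call plot(x, y, type = ...), the same data can be drawn with three values of type. The panel layout set with par() and the sorting of x are only there to make the comparison readable.

```r
x <- mtcars$wt
y <- mtcars$mpg

# Draw the same data with three different values of 'type'
old_par <- par(mfrow = c(1, 3))      # three panels side by side

plot(x, y, type = "p", main = 'type = "p"')                   # points (default)
plot(sort(x), y[order(x)], type = "l", main = 'type = "l"')   # lines
plot(sort(x), y[order(x)], type = "b", main = 'type = "b"')   # points and lines

par(old_par)                         # restore the previous layout
```

For the line types, the points are ordered by x first; otherwise the line would jump back and forth across the plot.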
Examples
Scatterplots are generally used to examine the relationship between two variables.
R base scatter plot: plot()
x <- mtcars$wt
y <- mtcars$mpg
# Plot with main and axis titles
# Change point shape (pch = 19) and remove frame.
plot(x, y, main = "Main title of the Graph",
xlab = "X axis title", ylab = "Y axis title",
pch = 19, frame = FALSE)
# Add regression line
abline(lm(y ~ x, data = mtcars), col = "red")
When choosing the point symbol (pch) in R, the following options are available:
pch
Other details can be found here.
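The available symbols can also be drawn directly in R. The sketch below plots the symbols for pch = 0 to 25 on a small grid with their numbers; the grid layout and the colors are arbitrary choices.

```r
pch_values <- 0:25

# Place the 26 symbols on a grid with 9 symbols per row
gx <- pch_values %% 9
gy <- pch_values %/% 9

plot(gx, gy, pch = pch_values, cex = 2,
     col = "royalblue4", bg = "gold",      # bg fills symbols 21 to 25 only
     xlim = c(-0.5, 8.5), ylim = c(-0.5, 3),
     axes = FALSE, xlab = "", ylab = "", main = "pch = 0 to 25")

# Write the pch number above each symbol
text(gx, gy + 0.35, labels = pch_values)
```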
# Add a lowess smooth (locally weighted regression)
plot(x, y, main = "Main title of the Graph",
xlab = "X axis title", ylab = "Y axis title",
pch = 19, frame = FALSE)
lines(lowess(x, y), col = "blue")
Basic plots: pairs()
Show only the upper panel:
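A minimal sketch of both variants with the base function pairs(), using the four numeric columns of the built-in iris data:

```r
# Scatter plot matrix of the four numeric iris columns
pairs(iris[, 1:4], pch = 19)

# Same matrix, drawing only the panels above the diagonal
pairs(iris[, 1:4], pch = 19, lower.panel = NULL)
```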
Color points by group (species):
my_cols <- c("#993333", "#FFCC33", "#003399")
pairs(iris[,1:4], pch = 19, cex = 0.5,
col = my_cols[iris$Species],
lower.panel=NULL)
Use the R package psych
The function pairs.panels() [in the psych package] can also be used to create a scatter plot matrix, with bivariate scatter plots below the diagonal, histograms on the diagonal, and the Pearson correlations above the diagonal.
library(psych)
pairs.panels(iris[,-5],
method = "pearson", # correlation method
hist.col = "#00AFBB",
density = TRUE, # show density plots
ellipses = FALSE # do not draw correlation ellipses
)
Boxplots are used to examine the following:
To demonstrate how to create box plots in R, we’ll use the R built-in ToothGrowth data set.
Description of ToothGrowth data
The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C and coded as VC).
boxplot() function
Draw a box plot of the length of teeth:
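A minimal sketch, assuming the intended plots are one box for all 60 animals and one box per dose. The object name bp_stats is illustrative; boxplot() returns the statistics it draws.

```r
# Box plot of tooth length for all 60 animals
boxplot(ToothGrowth$len, ylab = "Tooth length")

# Box plot of tooth length by dose (formula interface: response ~ group)
bp_stats <- boxplot(len ~ dose, data = ToothGrowth, frame = FALSE)

# Group labels are taken from the dose values
bp_stats$names
```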
We can edit the labels of the groups.
# Change group names
boxplot(len ~ dose, data = ToothGrowth, frame = FALSE, names = c("D0.5", "D1", "D2"))
# Change the color of border using one single color
boxplot(len ~ dose, data = ToothGrowth, frame = FALSE,
border = "steelblue")
# Change the color of border.
# Use different colors for each group
boxplot(len ~ dose, data = ToothGrowth, frame = FALSE,
border = c("#999999", "#E69F00", "#56B4E9"))
# Change fill color : single color
boxplot(len ~ dose, data = ToothGrowth, frame = FALSE,
col = "steelblue")
# Change fill color: multiple colors
boxplot(len ~ dose, data = ToothGrowth, frame = FALSE,
col = c("#999999", "#E69F00", "#56B4E9"))
Box plot with multiple groups
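With the formula interface, two grouping variables can be combined. The sketch below draws one box per combination of delivery method (supp) and dose; the colors and axis labels are illustrative choices.

```r
# One box for each combination of supp (OJ, VC) and dose (0.5, 1, 2)
grp <- boxplot(len ~ supp * dose, data = ToothGrowth, frame = FALSE,
               col = c("white", "steelblue"),   # recycled: OJ white, VC blue
               xlab = "Supplement and dose", ylab = "Length")

# Labels of the six groups, in plotting order
grp$names
```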
Change main title and axis labels
# Change axis titles and remove the frame
boxplot(len ~ dose, data = ToothGrowth,
main = "Plot of length by dose",
xlab = "Dose (mg)", ylab = "Length",
col = "lightgray", frame = FALSE)
Bar graphs are used to compare things between different groups or to track changes over time. However, when trying to measure change over time, bar graphs are best when the changes are larger.
Here, we’ll use the R built-in VADeaths data set.
Description of VADeaths data
Death rates per 1000 in Virginia in 1940.
##       Rural Male Rural Female Urban Male Urban Female
## 50-54       11.7          8.7       15.4          8.4
## 55-59       18.1         11.7       24.3         13.6
## 60-64       26.9         20.3       37.0         19.3
## 65-69       41.0         30.9       54.6         35.1
## 70-74       66.0         54.3       71.1         50.0
## 50-54 55-59 60-64
##  11.7  18.1  26.9
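The second output above matches the first three age groups of the "Rural Male" column. A sketch of how the vector x used in the bar plots could be created, under that assumption:

```r
# Full matrix of death rates
VADeaths

# Assumption: x holds the first three age groups of the "Rural Male" column
x <- VADeaths[1:3, "Rural Male"]
x

# Basic vertical and horizontal bar plots
barplot(x)
barplot(x, horiz = TRUE)
```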
# Change border and fill color using one single color
b1<-barplot(x, col = "white", border = "steelblue")
# Change the color of border.
# Use different colors for each group
b2<-barplot(x, col = "white",
border = c("#999999", "#E69F00", "#56B4E9"))
# Add a main title and axis titles
barplot(x, main = "Death Rates in Virginia",
xlab = "Age", ylab = "Rate")
Line graphs are used to track changes over short and long periods of time. When smaller changes exist, line graphs are better to use than bar graphs. Line graphs can also be used to compare changes over the same period of time for more than one group.
We can use the plot() function and the lines() function to create line plots in R.
A simplified call of plot() for a line plot is plot(x, y, type = "l", lty = 1). To add another line to the same plot, use the lines() function, for example lines(x, y, type = "l", lty = 1).
x, y: coordinate vectors of points to join
type: character indicating the type of plotting. Allowed values are:
“p” for points
“l” for lines
“b” for both points and lines
“c” for empty points joined by lines
“o” for overplotted points and lines
“s” and “S” for stair steps
“n” does not produce any points or lines
lty: line types. Line types can either be specified as an integer (0=blank, 1=solid (default), 2=dashed, 3=dotted, 4=dotdash, 5=longdash, 6=twodash) or as one of the character strings “blank”, “solid”, “dashed”, “dotted”, “dotdash”, “longdash”, or “twodash”, where “blank” uses ‘invisible lines’ (i.e., does not draw them).
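The line plot examples need some data. The values below are an assumption made for illustration (ten x values and two series y1 and y2); the sketch also shows how type and lty are passed to plot() and lines().

```r
# Assumption: illustrative data, ten x values and two series
x  <- 1:10
y1 <- x * x
y2 <- 2 * y1

# First series as a solid line; ylim is widened so that y2 also fits
plot(x, y1, type = "l", lty = 1,
     xlab = "x", ylab = "y", ylim = range(c(y1, y2)))

# Second series added to the same plot as a dashed line
lines(x, y2, type = "l", lty = 2)
```

Setting ylim from both series avoids cutting off the second line, since the axis range is fixed by the first plot() call.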
# Show both points and line
plot(x, y1, type = "b", pch = 19,
     col = "red", xlab = "x", ylab = "y")
# Create a first line
plot(x, y1, type = "b", frame = FALSE, pch = 19,
col = "red", xlab = "x", ylab = "y")
# Add a second line
lines(x, y2, pch = 18, col = "blue", type = "b", lty = 2)
# Add a legend to the plot
legend("topleft", legend=c("Line 1", "Line 2"),
col=c("red", "blue"), lty = 1:2, cex=0.8)
Pie charts are generally used to show percentage or proportional data and usually the percentage represented by each category is provided next to the corresponding slice of pie. Pie charts are good for displaying data for around 6 categories or fewer.
# Create some data
df <- data.frame(
group = c("Male", "Female", "Child"),
value = c(25, 25, 50)
)
df
The function pie() can be used to draw a pie chart.
pct <- round(df$value/sum(df$value)*100)
lbl <- paste(df$group, pct)
lbls <- paste(lbl,"%",sep="")
pie(df$value, labels = lbls, radius = 1)
The function pie3D() [in the plotrix package] can be used to draw a 3D pie chart.
Install the plotrix package (needed only once) with install.packages("plotrix"), then load it:
# 3D pie chart
library("plotrix")
pie3D(df$value, labels = df$group, radius = 1.5,
col = c("#999999", "#E69F00", "#56B4E9"))
# Explode the pie chart
pie3D(df$value, labels = df$group, radius = 1.5,
col = c("#999999", "#E69F00", "#56B4E9"),
explode = 0.1)
The data set contains the value of weight by sex for 200 individuals.
## [1] 48.96467 56.38715 60.42221 43.27151 57.14562 57.53028
A histogram can be created using the function hist(), whose simplified format is as follows:
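A sketch of the simplified call hist(x, breaks = "Sturges"), where x is a numeric vector and breaks controls the bins. The simulated weights below are an assumption made so that the example runs on its own; they are not guaranteed to reproduce the values printed above.

```r
set.seed(1234)

# Assumption: simulated weights, 200 values around 55 and 200 around 65
weights <- c(rnorm(200, mean = 55, sd = 5),
             rnorm(200, mean = 65, sd = 5))

# Default binning rule
h <- hist(weights, breaks = "Sturges", col = "steelblue",
          main = "Histogram of weight", xlab = "Weight")

# Ask for about 30 bins (R may adjust this to round break points)
hist(weights, breaks = 30, col = "steelblue",
     main = "Histogram of weight", xlab = "Weight")
```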
The function density() is used to estimate kernel density.
# Compute the density data
dens <- density(mtcars$mpg)
# plot density
plot(dens, frame = TRUE, col = "steelblue",
main = "Density plot of mpg")
# Fill the density plot using polygon()
plot(dens, frame = TRUE, col = "steelblue",
main = "Density plot of mpg")
polygon(dens, col = "steelblue")
The Q–Q (quantile-quantile) plot is a probability plot: a graphical method for comparing two probability distributions by plotting their quantiles against each other. It can be used to check the normality of the data.
Here, we’ll again use the built-in R data set named ToothGrowth.
The R base functions qqnorm() and qqplot() can be used to produce quantile-quantile plots:
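A minimal sketch with qqnorm(), which plots the sample quantiles against the quantiles of a normal distribution, and qqline(), which adds a reference line. The plotting options are illustrative.

```r
# Normal Q-Q plot of tooth length
qq <- qqnorm(ToothGrowth$len, pch = 1, frame = FALSE)

# Reference line through the first and third quartiles
qqline(ToothGrowth$len, col = "steelblue", lwd = 2)
```

qqplot() works in the same spirit but compares two samples with each other instead of comparing one sample with the normal distribution.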
It’s also possible to use the function qqPlot() [in car package]:
## [1] 23 1
As all the points fall approximately along this reference line, we can assume normality.
SUMMARY
Note that other graphical parameters can be customized. To know more about this, click here
ggplot2 is a powerful and flexible R package, implemented by Hadley Wickham, for producing elegant graphics.
The concept behind ggplot2 divides a plot into three fundamental parts: Plot = Data + Aesthetics + Geometry.
The principal components of every plot can be defined as follow:
There are two major functions in the ggplot2 package: qplot() and ggplot().
qplot() stands for quick plot and can be used to quickly produce simple plots.
ggplot() is more flexible and robust than qplot() and builds a plot piece by piece.
Install and Load the necessary packages
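A sketch of the setup, assuming the two packages needed are ggplot2 (for the plots) and ggpubr (which provides the ggarrange() function used to place several plots side by side). The loop loads each package if it is installed and otherwise prints the command to install it.

```r
# Assumption: ggplot2 draws the plots and ggpubr supplies ggarrange()
needed <- c("ggplot2", "ggpubr")

for (pkg in needed) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    library(pkg, character.only = TRUE)   # load an installed package
  } else {
    message("Install first with: install.packages(\"", pkg, "\")")
  }
}
```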
Data Format and Preparation
The data should be a data.frame (columns are variables and rows are observations).
The data set mtcars is used in the examples below:
This R tutorial describes how to create a box plot using R software and ggplot2 package.
The function geom_boxplot() is used. A simplified format is :
Prepare the data set
In this illustration, the ToothGrowth data was used:
# Convert the variable dose from a numeric to a factor variable
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
head(ToothGrowth)
Make sure that the variable dose is converted to a factor using the R script above.
Basic box plot
library(ggplot2)
library(ggpubr) # provides ggarrange(), used to combine plots
# Basic box plot
p <- ggplot(ToothGrowth, aes(x=dose, y=len)) +
geom_boxplot()
# Rotate the box plot
b<-p + coord_flip()
# Notched box plot
c<-ggplot(ToothGrowth, aes(x=dose, y=len)) +
geom_boxplot(notch=TRUE)
# Change outlier, color, shape and size
d<-ggplot(ToothGrowth, aes(x=dose, y=len)) +
geom_boxplot(outlier.colour="red",outlier.shape=8,outlier.size=4)
ggarrange(p,b,c,d, nrow=2, ncol=2, labels=c("a","b","c","d"))
Box plot with dots
Dots (or points) can be added to a box plot using the functions geom_dotplot() or geom_jitter() :
# Box plot with dot plot
a<-p + geom_dotplot(binaxis='y', stackdir='center', dotsize=1)
# Box plot with jittered points 0.2 : degree of jitter in x direction
b<-p + geom_jitter(shape=16,position=position_jitter(0.2))
ggarrange(a,b, labels=c("a","b"))
Note that geom_jitter adds a small amount of random variation to the location of each point, and is a useful way of handling overplotting caused by discreteness in smaller datasets.
Change box plot line colors
Box plot line colors can be automatically controlled by the levels of the variable dose :
# Change box plot line colors by groups
p<-ggplot(ToothGrowth, aes(x=dose, y=len, color=dose)) +
geom_boxplot()
p
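Box plot line colors can also be changed manually using the functions scale_color_manual() (custom colors), scale_color_brewer() (RColorBrewer palettes) and scale_color_grey() (grey scale):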
# Use custom color palettes
a<-p+scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# Use brewer color palettes
b<-p+scale_color_brewer(palette="Dark2")
# Use grey scale
c<-p + scale_color_grey() + theme_classic()
ggarrange(a,b,c, nrow=1,ncol=3, labels=c("a","b","c"))
Change box plot fill colors
In the R code below, box plot fill colors are automatically controlled by the levels of dose :
# Use single color
a<-ggplot(ToothGrowth, aes(x=dose, y=len)) +
geom_boxplot(fill='#A4A4A4', color="black")+
theme_classic()
# Change box plot colors by groups
p<-ggplot(ToothGrowth, aes(x=dose, y=len, fill=dose)) +
geom_boxplot()
ggarrange(a,p, labels=c("a","p"))
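Box plot fill colors can also be changed manually using the functions scale_fill_manual() (custom colors), scale_fill_brewer() (RColorBrewer palettes) and scale_fill_grey() (grey scale):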
# Use custom color palettes
a<-p+scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# use brewer color palettes
b<-p+scale_fill_brewer(palette="Dark2")
# Use grey scale
c<-p + scale_fill_grey() + theme_classic()
ggarrange(a,b,c, nrow=1,ncol=3,labels=c("a","b","c"))
Change the legend position
a<-p + theme(legend.position="top")
b<-p + theme(legend.position="bottom")
c<-p + theme(legend.position="none") # Remove legend
ggarrange(a,b,c,nrow=1,ncol=3, labels=c("a","b","c"))
Change the order of items in the legend
The function scale_x_discrete can be used to change the order of items to “2”, “0.5”, “1” :
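A sketch of the call, assuming ggplot2 is installed. The names tg and p_reordered are illustrative. Note that scale_x_discrete() reorders the boxes along the x axis, while the legend keeps following the factor levels of dose.

```r
library(ggplot2)

# dose must be a factor so that the x axis is discrete
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

p <- ggplot(tg, aes(x = dose, y = len, fill = dose)) +
  geom_boxplot()

# limits sets which levels appear on the x axis and in what order
p_reordered <- p + scale_x_discrete(limits = c("2", "0.5", "1"))
p_reordered
```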
Box plot with multiple groups
# Change box plot colors by groups
a<-ggplot(ToothGrowth, aes(x=dose, y=len,
fill=supp)) +
geom_boxplot()
# Change the position
p<-ggplot(ToothGrowth, aes(x=dose, y=len, fill=supp)) +
geom_boxplot(position=position_dodge(1))
ggarrange(a,p, labels=c("a","p"))
Change box plot colors and add dots :
# Add dots
a<-p + geom_dotplot(binaxis='y', stackdir='center',
position=position_dodge(1))
# Change colors
b<-p+scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))
ggarrange(a,b,labels=c("a","b"))
Customized box plots
# Basic box plot
a<-ggplot(ToothGrowth, aes(x=dose, y=len)) +
geom_boxplot(fill="gray")+
labs(title="Plot of length per dose",x="Dose (mg)", y = "Length")+
theme_classic()
# Change automatically color by groups
bp <- ggplot(ToothGrowth, aes(x=dose, y=len, fill=dose)) +
geom_boxplot()+
labs(title="Plot of length per dose",x="Dose (mg)", y = "Length")
b<-bp + theme_classic()
ggarrange(a,b,labels=c("a","b"))
Change fill colors manually :
# Sequential palette
a<-bp + scale_fill_brewer(palette="Blues") + theme_classic()
# Qualitative palette
b<-bp + scale_fill_brewer(palette="Dark2") + theme_minimal()
# Diverging palette
c<-bp + scale_fill_brewer(palette="RdBu") + theme_minimal()
ggarrange(a,b,c, nrow=1,ncol=3, labels=c("a","b","c"))
This R tutorial describes how to create a histogram plot using R software and ggplot2 package.
The function geom_histogram() is used. You can also add a line for the mean using the function geom_vline.
Prepare the data
The data below will be used:
set.seed(1234)
df <- data.frame(
sex=factor(rep(c("F", "M"), each=200)),
weight=round(c(rnorm(200, mean=55, sd=5), rnorm(200, mean=65, sd=5)))
)
head(df)
Basic histogram plots
library(ggplot2)
# Basic histogram
a<-ggplot(df, aes(x=weight)) + geom_histogram()
# Change the width of bins
b<-ggplot(df, aes(x=weight)) +
geom_histogram(binwidth=1)
# Change colors
p<-ggplot(df, aes(x=weight)) +
geom_histogram(color="black", fill="white")
ggarrange(a,b,p,nrow=1,ncol=3, labels=c("a","b","p"))
Add mean line and density plot on the histogram
# Add mean line
a<-p+ geom_vline(aes(xintercept=mean(weight)),
color="blue", linetype="dashed", size=1)
# Histogram with density plot
b<-ggplot(df, aes(x=weight)) +
geom_histogram(aes(y=after_stat(density)), colour="black", fill="white")+
geom_density(alpha=.2, fill="#FF6666")
ggarrange(a,b, labels=c("a","b"))
Change histogram plot line types and colors
# Change line color and fill color
a<-ggplot(df, aes(x=weight))+
geom_histogram(color="darkblue", fill="lightblue")
# Change line type
b<-ggplot(df, aes(x=weight))+
geom_histogram(color="black", fill="lightblue",
linetype="dashed")
ggarrange(a,b, labels=c("a","b"))
Change histogram plot colors by groups
Calculate the mean of each group :
The package plyr is used to calculate the average weight of each group :
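The histogram code in this section refers to a data frame mu with the columns sex and grp.mean. The sketch below builds it with the base function aggregate(), swapped in here so that the example runs without extra packages; the plyr call is shown as a comment.

```r
set.seed(1234)
df <- data.frame(
  sex    = factor(rep(c("F", "M"), each = 200)),
  weight = round(c(rnorm(200, mean = 55, sd = 5),
                   rnorm(200, mean = 65, sd = 5)))
)

# Mean weight per group (base R alternative to the plyr call)
mu <- aggregate(weight ~ sex, data = df, FUN = mean)

# Rename the column to the name used by geom_vline() in this section
names(mu)[names(mu) == "weight"] <- "grp.mean"
mu

# With plyr loaded, the same table could be obtained with:
# mu <- ddply(df, "sex", summarise, grp.mean = mean(weight))
```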
Change line colors
Histogram plot line colors can be automatically controlled by the levels of the variable sex.
# Change histogram plot line colors by groups
a<-ggplot(df, aes(x=weight, color=sex)) +
geom_histogram(fill="white")
# Overlaid histograms
b<-ggplot(df, aes(x=weight, color=sex)) +
geom_histogram(fill="white", alpha=0.5, position="identity")
ggarrange(a,b, labels=c("a","b"))
# Interleaved histograms
a<-ggplot(df, aes(x=weight, color=sex)) +
geom_histogram(fill="white", position="dodge")+
theme(legend.position="top")
# Add mean lines
p<-ggplot(df, aes(x=weight, color=sex)) +
geom_histogram(fill="white", position="dodge")+
geom_vline(data=mu, aes(xintercept=grp.mean, color=sex),
linetype="dashed")+
theme(legend.position="top")
ggarrange(a,p, labels=c("a","p"))
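Histogram line colors can also be changed manually using the functions scale_color_manual() (custom colors), scale_color_brewer() (RColorBrewer palettes) and scale_color_grey() (grey scale):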
# Use custom color palettes
a<-p+scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# Use brewer color palettes
b<-p+scale_color_brewer(palette="Dark2")
# Use grey scale
c<-p + scale_color_grey() + theme_classic() +
theme(legend.position="top")
ggarrange(a,b,c, nrow=1, ncol=3, labels=c("a","b","c"))
Change fill colors
Histogram plot fill colors can be automatically controlled by the levels of sex :
# Change histogram plot fill colors by groups
a<-ggplot(df, aes(x=weight, fill=sex, color=sex)) +
geom_histogram(position="identity")
# Use semi-transparent fill
p<-ggplot(df, aes(x=weight, fill=sex, color=sex)) +
geom_histogram(position="identity", alpha=0.5)
# Add mean lines
c<-p+geom_vline(data=mu, aes(xintercept=grp.mean, color=sex),
linetype="dashed")
ggarrange(a,p,c, nrow=1, ncol=3, labels=c("a","p","c"))
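Histogram fill colors can also be changed manually using the functions scale_fill_manual() (custom colors), scale_fill_brewer() (RColorBrewer palettes) and scale_fill_grey() (grey scale), combined here with the matching scale_color_*() functions for the outlines: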
# Use custom color palettes
a<-p+scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))+
scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# use brewer color palettes
b<-p+scale_color_brewer(palette="Dark2")+
scale_fill_brewer(palette="Dark2")
# Use grey scale
c<-p + scale_color_grey()+scale_fill_grey() +
theme_classic()
ggarrange(a,b,c, nrow=1, ncol=3, labels=c("a","b","c"))
Change the legend position
a<-p + theme(legend.position="top")
b<-p + theme(legend.position="bottom")
# Remove legend
c<-p + theme(legend.position="none")
ggarrange(a,b,c, nrow=1, ncol=3, labels=c("a","b","c"))
Use facets
Split the plots into multiple panels:
p<-ggplot(df, aes(x=weight))+
geom_histogram(color="black", fill="white")+
facet_grid(sex ~ .)
# Add mean lines
a<-p+geom_vline(data=mu, aes(xintercept=grp.mean),
  color="red", linetype="dashed") # a fixed color belongs outside aes()
ggarrange(p,a,labels=c("p","a"))
Customized histogram plots
# Basic histogram
a<-ggplot(df, aes(x=weight, fill=sex)) +
geom_histogram(fill="white", color="black")+
geom_vline(aes(xintercept=mean(weight)), color="blue",
linetype="dashed")+
labs(title="Weight histogram plot",x="Weight(kg)", y = "Count")+
theme_classic()
# Change line colors by groups
b<-ggplot(df, aes(x=weight, color=sex, fill=sex)) +
geom_histogram(position="identity", alpha=0.5)+
geom_vline(data=mu, aes(xintercept=grp.mean, color=sex),
linetype="dashed")+
scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))+
scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))+
labs(title="Weight histogram plot",x="Weight(kg)", y = "Count")+
theme_classic()
ggarrange(a,b,labels=c("a","b"))
Combine histogram and density plots :
# Change line colors by groups
ggplot(df, aes(x=weight, color=sex, fill=sex)) +
geom_histogram(aes(y=after_stat(density)), position="identity", alpha=0.5)+
geom_density(alpha=0.6)+
geom_vline(data=mu, aes(xintercept=grp.mean, color=sex),
linetype="dashed")+
scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))+
scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))+
labs(title="Weight histogram plot",x="Weight(kg)", y = "Density")+
theme_classic()
Change line colors manually :
p<-ggplot(df, aes(x=weight, color=sex)) +
geom_histogram(fill="white", position="dodge")+
geom_vline(data=mu, aes(xintercept=grp.mean, color=sex),
linetype="dashed")
# "Paired" palette (qualitative)
a<-p + scale_color_brewer(palette="Paired") +
  theme_classic()+theme(legend.position="top")
# "Dark2" palette (qualitative)
b<-p + scale_color_brewer(palette="Dark2") +
  theme_classic()+theme(legend.position="top")
# "Accent" palette (qualitative)
c<-p + scale_color_brewer(palette="Accent") +
  theme_minimal()+theme(legend.position="top")
ggarrange(a,b,c,nrow=1, ncol=3, labels=c("a","b","c"))
This article describes how to create a scatter plot using R software and the ggplot2 package. The function geom_point() is used.
Prepare the data
The mtcars data set is used in the examples below.
# Convert cyl column from a numeric to a factor variable
mtcars$cyl <- as.factor(mtcars$cyl)
head(mtcars)
Basic scatter plots
Simple scatter plots are created using the R code below. The color, the size and the shape of points can be changed using the function geom_point() as follows:
library(ggplot2)
# Basic scatter plot
a<-ggplot(mtcars, aes(x=wt, y=mpg)) + geom_point()
# Change the point size, and shape
b<-ggplot(mtcars, aes(x=wt, y=mpg)) +
geom_point(size=2, shape=23)
ggarrange(a,b, labels=c("a","b"))
Add regression lines
Several functions can add a regression line to a scatter plot. Only the function geom_smooth() is covered in this section.
# Add the regression line
a<-ggplot(mtcars, aes(x=wt, y=mpg)) +
geom_point()+
geom_smooth(method=lm)
# Remove the confidence interval
b<-ggplot(mtcars, aes(x=wt, y=mpg)) +
geom_point()+
geom_smooth(method=lm, se=FALSE)
# Loess method
c<-ggplot(mtcars, aes(x=wt, y=mpg)) +
geom_point()+
geom_smooth()
ggarrange(a,b,c, nrow=1, ncol=3, labels=c("a","b","c"))
Change the appearance of points and lines
This section describes how to change the point colors and shapes, the line type and color, and the fill color of the confidence band:
# Change the point colors and shapes
# Change the line type and color
a<-ggplot(mtcars, aes(x=wt, y=mpg)) +
geom_point(shape=18, color="blue")+
geom_smooth(method=lm, se=FALSE, linetype="dashed",
color="darkred")
# Change the confidence interval fill color
b<-ggplot(mtcars, aes(x=wt, y=mpg)) +
geom_point(shape=18, color="blue")+
geom_smooth(method=lm, linetype="dashed",
color="darkred", fill="blue")
ggarrange(a,b, labels=c("a","b"))
Scatter plots with multiple groups
This section describes how to change point colors and shapes automatically and manually.
Change the point color/shape/size automatically
In the R code below, point shapes, colors and sizes are controlled by the levels of the factor variable cyl :
# Change point shapes by the levels of cyl
a<-ggplot(mtcars, aes(x=wt, y=mpg, shape=cyl)) +
geom_point()
# Change point shapes and colors
b<-ggplot(mtcars, aes(x=wt, y=mpg, shape=cyl, color=cyl)) +
geom_point()
# Change point shapes, colors and sizes
c<-ggplot(mtcars, aes(x=wt, y=mpg, shape=cyl, color=cyl, size=cyl)) +
geom_point()
ggarrange(a,b,c,nrow=1, ncol=3, labels=c("a","b","c"))
Add regression lines
Regression lines can be added as follows:
# Add regression lines
a<-ggplot(mtcars, aes(x=wt, y=mpg, color=cyl, shape=cyl)) +
geom_point() +
geom_smooth(method=lm)
# Remove confidence intervals
# Extend the regression lines
b<-ggplot(mtcars, aes(x=wt, y=mpg, color=cyl, shape=cyl)) +
geom_point() +
geom_smooth(method=lm, se=FALSE, fullrange=TRUE)
ggarrange(a,b, labels=c("a","b"))
The fill color of the confidence bands can be changed as follows:
ggplot(mtcars, aes(x=wt, y=mpg, color=cyl, shape=cyl)) +
geom_point() +
geom_smooth(method=lm, aes(fill=cyl))
Change the point color/shape/size manually
The functions scale_shape_manual(), scale_color_manual() and scale_size_manual() are used:
# Change point shapes and colors manually
a<-ggplot(mtcars, aes(x=wt, y=mpg, color=cyl, shape=cyl)) +
geom_point() +
geom_smooth(method=lm, se=FALSE, fullrange=TRUE)+
scale_shape_manual(values=c(3, 16, 17))+
scale_color_manual(values=c('#999999','#E69F00', '#56B4E9'))+
theme(legend.position="top")
# Change the point sizes manually
b<-ggplot(mtcars, aes(x=wt, y=mpg, color=cyl, shape=cyl))+
geom_point(aes(size=cyl)) +
geom_smooth(method=lm, se=FALSE, fullrange=TRUE)+
scale_shape_manual(values=c(3, 16, 17))+
scale_color_manual(values=c('#999999','#E69F00', '#56B4E9'))+
scale_size_manual(values=c(2,3,4))+
theme(legend.position="top")
ggarrange(a,b, labels=c("a","b"))
Point and line colors can also be changed using the functions scale_color_brewer() (RColorBrewer palettes) and scale_color_grey() (grey scale):
p <- ggplot(mtcars, aes(x=wt, y=mpg, color=cyl, shape=cyl)) +
geom_point() +
geom_smooth(method=lm, se=FALSE, fullrange=TRUE)+
theme_classic()
# Use brewer color palettes
a<-p+scale_color_brewer(palette="Dark2")
# Use grey scale
b<-p + scale_color_grey()
ggarrange(a,b, labels=c("a","b"))
This R tutorial describes how to create a barplot using R software and ggplot2 package.
The function geom_bar() can be used.
Basic barplots
Data derived from ToothGrowth data sets are used. ToothGrowth describes the effect of Vitamin C on Tooth growth in Guinea pigs.
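The bar plot code below expects a data frame df with the columns dose and len. A sketch of such a data frame follows; the values are an assumption for illustration and match the OJ rows of the df2 data frame defined later in this handout.

```r
# Assumption: one tooth length value per dose level
df <- data.frame(dose = c("D0.5", "D1", "D2"),
                 len  = c(4.2, 10, 29.5))
head(df)
```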
Create barplots
library(ggplot2)
# Basic barplot
p<-ggplot(data=df, aes(x=dose, y=len)) +
geom_bar(stat="identity")
# Horizontal bar plot
a<-p + coord_flip()
Change the width and the color of bars :
# Change the width of bars
a<-ggplot(data=df, aes(x=dose, y=len)) +
geom_bar(stat="identity", width=0.5)
# Change colors
b<-ggplot(data=df, aes(x=dose, y=len)) +
geom_bar(stat="identity", color="blue", fill="white")
# Minimal theme + blue fill color
p<-ggplot(data=df, aes(x=dose, y=len)) +
geom_bar(stat="identity", fill="steelblue")+
theme_minimal()
ggarrange(a,b,p, nrow=1, ncol=3, labels=c("a","b","p"))
Choose which items to display :
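One possible way to do this, assuming ggplot2 is installed, is the limits argument of scale_x_discrete(): only the listed items are drawn. The data frame repeats the illustrative values assumed for df, and the name p_subset is hypothetical.

```r
library(ggplot2)

# Assumption: illustrative data, one value per dose level
df <- data.frame(dose = c("D0.5", "D1", "D2"),
                 len  = c(4.2, 10, 29.5))

p <- ggplot(data = df, aes(x = dose, y = len)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  theme_minimal()

# Keep only two of the three dose levels on the x axis;
# ggplot2 drops the remaining row with a warning when the plot is drawn
p_subset <- p + scale_x_discrete(limits = c("D0.5", "D2"))
p_subset
```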
Bar plot with labels
# Outside bars
a<-ggplot(data=df, aes(x=dose, y=len)) +
geom_bar(stat="identity", fill="steelblue")+
geom_text(aes(label=len), vjust=-0.3, size=3.5)+
theme_minimal()
# Inside bars
b<-ggplot(data=df, aes(x=dose, y=len)) +
geom_bar(stat="identity", fill="steelblue")+
geom_text(aes(label=len), vjust=1.6, color="white", size=3.5)+
theme_minimal()
ggarrange(a,b, labels=c("a","b"))
Barplot of counts
To make a barplot of counts, we will use the mtcars data set:
# Don't map a variable to y
ggplot(mtcars, aes(x=factor(cyl)))+
geom_bar(stat="count", width=0.7, fill="steelblue")+
theme_minimal()
Change outline colors
Barplot outline colors can be automatically controlled by the levels of the variable dose :
# Change barplot line colors by groups
p<-ggplot(df, aes(x=dose, y=len, color=dose)) +
geom_bar(stat="identity", fill="white")
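Barplot outline colors can also be changed manually using the functions scale_color_manual() (custom colors), scale_color_brewer() (RColorBrewer palettes) and scale_color_grey() (grey scale):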
# Use custom color palettes
a<-p+scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# Use brewer color palettes
b<-p+scale_color_brewer(palette="Dark2")
# Use grey scale
c<-p + scale_color_grey() + theme_classic()
ggarrange(a,b,c,nrow=1,ncol=3,labels=c("a","b","c"))
Change fill colors
In the R code below, barplot fill colors are automatically controlled by the levels of dose :
# Change barplot fill colors by groups
p<-ggplot(df, aes(x=dose, y=len, fill=dose)) +
geom_bar(stat="identity")+theme_minimal()
p
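Barplot fill colors can also be changed manually using the functions scale_fill_manual() (custom colors), scale_fill_brewer() (RColorBrewer palettes) and scale_fill_grey() (grey scale):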
# Use custom color palettes
a<-p+scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# use brewer color palettes
b<-p+scale_fill_brewer(palette="Dark2")
# Use grey scale
c<-p + scale_fill_grey()
ggarrange(a,b,c,nrow=1, ncol=3,labels=c("a","b","c"))
Use black outline color :
ggplot(df, aes(x=dose, y=len, fill=dose))+
geom_bar(stat="identity", color="black")+
scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))+
theme_minimal()
Change legend position
# Change bar fill colors to blues
p <- p+scale_fill_brewer(palette="Blues")
a<-p + theme(legend.position="top")
b<-p + theme(legend.position="bottom")
# Remove legend
c<-p + theme(legend.position="none")
ggarrange(a,b,c, nrow=1, ncol=3, labels=c("a","b","c"))
Change the order of items in the legend
The function scale_x_discrete can be used to change the order of items to “2”, “0.5”, “1” :
Barplot with multiple groups
df2 <- data.frame(supp=rep(c("VC", "OJ"), each=3),
dose=rep(c("D0.5", "D1", "D2"),2),
len=c(6.8, 15, 33, 4.2, 10, 29.5))
head(df2)
Create barplots
A stacked barplot is created by default. You can use the function position_dodge() to change this. The barplot fill color is controlled by the levels of supp:
# Stacked barplot with multiple groups
a<-ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity")
# Use position=position_dodge()
b<-ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity", position=position_dodge())
ggarrange(a, b, labels=c("a","b"))
Change the color manually :
# Change the colors manually
p <- ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity", color="black", position=position_dodge())+
theme_minimal()
# Use custom colors
a<-p + scale_fill_manual(values=c('#999999','#E69F00'))
# Use brewer color palettes
b<-p + scale_fill_brewer(palette="Blues")
ggarrange(a,b, labels=c("a","b"))
Add labels
Add labels to a dodged barplot :
ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity", position=position_dodge())+
geom_text(aes(label=len), vjust=1.6, color="white",
position = position_dodge(0.9), size=3.5)+
scale_fill_brewer(palette="Paired")+
theme_minimal()
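Add labels to a stacked barplot: 3 steps are required (sort the data, compute the cumulative sum of len within each dose, then draw the plot).
library(plyr) # provides arrange(), desc() and ddply()
# Sort by dose, with supp in descending order: ggplot2 stacks the first
# level of supp on top, so the running sum has to start from the bottom bar
df_sorted <- arrange(df2, dose, desc(supp))
# Calculate the cumulative sum of len for each dose
df_cumsum <- ddply(df_sorted, "dose",
  transform, label_ypos=cumsum(len))
head(df_cumsum)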
# Create the barplot
ggplot(data=df_cumsum, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity")+
geom_text(aes(y=label_ypos, label=len), vjust=1.6,
color="white", size=3.5)+
scale_fill_brewer(palette="Paired")+
theme_minimal()
If you want to place the labels in the middle of the bars, modify the cumulative sum as follows:
df_cumsum <- ddply(df_sorted, "dose",
transform,
label_ypos=cumsum(len) - 0.5*len)
# Create the barplot
ggplot(data=df_cumsum, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity")+
geom_text(aes(y=label_ypos, label=len), vjust=1.6,
color="white", size=3.5)+
scale_fill_brewer(palette="Paired")+
theme_minimal()
Barplot with a numeric x-axis
If the variable on x-axis is numeric, it can be useful to treat it as a continuous or a factor variable depending on what you want to do :
# Create some data
df2 <- data.frame(supp=rep(c("VC", "OJ"), each=3),
dose=rep(c("0.5", "1", "2"),2),
len=c(6.8, 15, 33, 4.2, 10, 29.5))
head(df2)
# x axis treated as continuous variable
df2$dose <- as.numeric(as.vector(df2$dose))
a<-ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity", position=position_dodge())+
scale_fill_brewer(palette="Paired")+
theme_minimal()
# Axis treated as discrete variable
df2$dose<-as.factor(df2$dose)
b<-ggplot(data=df2, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity", position=position_dodge())+
scale_fill_brewer(palette="Paired")+
theme_minimal()
ggarrange(a,b,labels=c("a","b"))
Customized barplots
# Change fill colors by group
# Add a title and axis labels
p + labs(title="Plot of length per dose",
x="Dose (mg)", y = "Length")+
scale_fill_manual(values=c('black','lightgray'))+
theme_classic()
Change fill colors manually :
# Greens
a<-p + scale_fill_brewer(palette="Greens") + theme_minimal()
# Reds
b<-p + scale_fill_brewer(palette="Reds") + theme_minimal()
ggarrange(a,b,labels=c("a","b"))
This R tutorial describes how to change line types of a graph generated using ggplot2 package.
Line types in R
The line types available in R are: “blank”, “solid”, “dashed”, “dotted”, “dotdash”, “longdash”, “twodash”.
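The line types can be previewed by drawing one segment per type. The sketch below assumes ggplot2 is installed; the object names lt_df and p_lt are hypothetical. It also shows the integer codes 0 to 6, which R accepts in place of the names.

```r
library(ggplot2)

# Line type names in the order of their integer codes (0 = blank, ..., 6 = twodash)
lt_df <- data.frame(
  type = c("blank", "solid", "dashed", "dotted", "dotdash", "longdash", "twodash"),
  code = 0:6,
  stringsAsFactors = FALSE
)

# One horizontal segment per line type; scale_linetype_identity() uses the
# strings in 'type' directly as line types
p_lt <- ggplot(lt_df, aes(x = 0, xend = 1, y = code, yend = code, linetype = type)) +
  geom_segment() +
  scale_linetype_identity() +
  scale_y_continuous(breaks = lt_df$code,
                     labels = paste(lt_df$code, lt_df$type)) +
  labs(x = NULL, y = NULL) +
  theme_minimal()
p_lt
```

The row for "blank" is intentionally empty, since that line type draws nothing.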
Basic line plots
Create line plots and change line types
The argument linetype is used to change the line type:
# Basic line plot with points
a<-ggplot(data=df, aes(x=time, y=bill, group=1)) +
geom_line()+
geom_point()
# Change the line type
b<-ggplot(data=df, aes(x=time, y=bill, group=1)) +
geom_line(linetype = "dashed")+
geom_point()
ggarrange(a,b, labels=c("a","b"))
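The basic line plots above expect a data frame df with the columns time and bill. A minimal self-contained sketch follows; the values are hypothetical and simply mirror the shape of the multi-group data used in the next section. Storing time as a factor with explicit levels is an assumption made here so that the meals appear in chronological rather than alphabetical order.

```r
library(ggplot2)

# Hypothetical data: one bill amount per meal
df <- data.frame(
  time = factor(c("Breakfast", "Lunch", "Dinner"),
                levels = c("Breakfast", "Lunch", "Dinner")),
  bill = c(10, 30, 15)
)

# group = 1 tells ggplot2 that all points belong to a single line,
# which is needed when the x variable is discrete
p_line <- ggplot(df, aes(x = time, y = bill, group = 1)) +
  geom_line(linetype = "dashed") +
  geom_point()
p_line
```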
Line plot with multiple groups
df2 <- data.frame(sex = rep(c("Female", "Male"), each=3),
time=c("Breakfast", "Lunch", "Dinner"),
bill=c(10, 30, 15, 13, 40, 17) )
head(df2)
Change globally the appearance of lines
In the graphs below, line types, colors and sizes are the same for the two groups:
# Line plot with multiple groups
a<-ggplot(data=df2, aes(x=time, y=bill, group=sex)) +
geom_line()+
geom_point()
# Change line types
b<-ggplot(data=df2, aes(x=time, y=bill, group=sex)) +
geom_line(linetype="dashed")+
geom_point()
# Change line colors and sizes
c<-ggplot(data=df2, aes(x=time, y=bill, group=sex)) +
geom_line(linetype="dotted", color="red", size=2)+
geom_point(color="blue", size=3)
ggarrange(a,b,c, nrow=1, ncol=3, labels=c("a","b","c"))
Change automatically the line types by groups
In the graphs below, line types, colors and sizes are changed automatically by the levels of the variable sex:
# Change line types by groups (sex)
a<-ggplot(df2, aes(x=time, y=bill, group=sex)) +
geom_line(aes(linetype=sex))+
geom_point()+
theme(legend.position="top")
# Change line types + colors
b<-ggplot(df2, aes(x=time, y=bill, group=sex)) +
geom_line(aes(linetype=sex, color=sex))+
geom_point(aes(color=sex))+
theme(legend.position="top")
ggarrange(a,b,labels=c("a","b"))
Change manually the appearance of lines
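The functions scale_linetype_manual(), scale_color_manual() and scale_size_manual() can be used to set the line types, colors and sizes manually: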
# Set line types manually
a<-ggplot(df2, aes(x=time, y=bill, group=sex)) +
geom_line(aes(linetype=sex))+
geom_point()+
scale_linetype_manual(values=c("twodash", "dotted"))+
theme(legend.position="top")
# Change line colors and sizes
b<-ggplot(df2, aes(x=time, y=bill, group=sex)) +
geom_line(aes(linetype=sex, color=sex, size=sex))+
geom_point()+
scale_linetype_manual(values=c("twodash", "dotted"))+
scale_color_manual(values=c('#999999','#E69F00'))+
scale_size_manual(values=c(1, 1.5))+
theme(legend.position="top")
ggarrange(a,b,labels=c("a","b"))
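In the manual scales above, the values are matched to the groups by position. A named vector makes the mapping explicit and independent of the order of the levels. The following is a sketch under that assumption; the object names line_types, line_colors and p_named are hypothetical.

```r
library(ggplot2)

df2 <- data.frame(sex = rep(c("Female", "Male"), each = 3),
                  time = rep(c("Breakfast", "Lunch", "Dinner"), times = 2),
                  bill = c(10, 30, 15, 13, 40, 17))
df2$time <- factor(df2$time, levels = c("Breakfast", "Lunch", "Dinner"))

# Names correspond to the levels of 'sex'
line_types  <- c(Female = "twodash", Male = "dotted")
line_colors <- c(Female = "#999999", Male = "#E69F00")

p_named <- ggplot(df2, aes(x = time, y = bill, group = sex)) +
  geom_line(aes(linetype = sex, color = sex)) +
  geom_point(aes(color = sex)) +
  scale_linetype_manual(values = line_types) +
  scale_color_manual(values = line_colors) +
  theme(legend.position = "top")
p_named
```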
This tutorial describes how to create a graph with error bars using R and the ggplot2 package. Different types of error bars can be created with functions such as geom_errorbar(), geom_linerange(), geom_pointrange() and geom_crossbar().
Add error bars to bar and line plots
The ToothGrowth data is used.
In the example below, we’ll plot the mean tooth length in each group. The standard deviation is used to draw the error bars on the graph.
First, the helper function below will be used to calculate the mean and the standard deviation of the variable of interest in each group:
#+++++++++++++++++++++++++
# Function to calculate the mean and the standard deviation
# for each group
#+++++++++++++++++++++++++
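# data : a data frame
# varname : the name of a column containing the variable
# to be summarized
# groupnames : vector of column names to be used as
# grouping variables
data_summary <- function(data, varname, groupnames){
  require(plyr)
  # Mean and standard deviation of one column of a data frame
  summary_func <- function(x, col){
    c(mean = mean(x[[col]], na.rm=TRUE),
      sd = sd(x[[col]], na.rm=TRUE))
  }
  # Apply summary_func to each group; varname is passed on as 'col'
  data_sum <- plyr::ddply(data, groupnames, .fun=summary_func,
                          varname)
  # plyr::rename is called explicitly so that dplyr::rename
  # cannot mask it if dplyr is also loaded
  data_sum <- plyr::rename(data_sum, c("mean" = varname))
  return(data_sum)
}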
Summarize the data:
df2 <- data_summary(ToothGrowth, varname="len",
groupnames=c("supp", "dose"))
# Convert dose to a factor variable
df2$dose=as.factor(df2$dose)
head(df2)
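The summary can be cross-checked without plyr. The sketch below uses only base R and the built-in ToothGrowth data; the object names m, s and check are hypothetical, and the approach is an alternative to data_summary(), not a replacement for it.

```r
# Group means and standard deviations with aggregate()
m <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
s <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = sd)

# Rename the sd column before merging so the two summaries do not clash
names(s)[names(s) == "len"] <- "sd"
check <- merge(m, s, by = c("supp", "dose"))
check
```

Each of the six supp and dose combinations should show the same mean and standard deviation as the corresponding row of df2.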
Barplot with error bars
The function geom_errorbar() can be used to produce the error bars:
# Default bar plot
p<- ggplot(df2, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity", color="black",
position=position_dodge()) +
geom_errorbar(aes(ymin=len-sd, ymax=len+sd), width=.2,
position=position_dodge(.9))
# Finished bar plot
a<-p+labs(title="Tooth length per dose", x="Dose (mg)", y = "Length")+
theme_classic() +
scale_fill_manual(values=c('#999999','#E69F00'))
ggarrange(p,a,labels=c("p","a"))
# Keep only upper error bars
ggplot(df2, aes(x=dose, y=len, fill=supp)) +
geom_bar(stat="identity", color="black", position=position_dodge()) +
geom_errorbar(aes(ymin=len, ymax=len+sd), width=.2,
position=position_dodge(.9))
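In the bar plots above, position_dodge(.9) is given to geom_errorbar() because the bars are 0.9 units wide by default, while the error bars are only 0.2 wide; without it the two layers would be dodged by different amounts. One way to keep them aligned is to share a single position object, as in this sketch (the names tg, dodge and p_dodge are hypothetical, and geom_col() is used as the equivalent of geom_bar(stat="identity")).

```r
library(ggplot2)

# Group means and standard deviations, computed with base R
tg <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
tg$sd <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = sd)$len
tg$dose <- as.factor(tg$dose)

# One dodge object reused by both layers keeps bars and error bars aligned
dodge <- position_dodge(width = 0.9)

p_dodge <- ggplot(tg, aes(x = dose, y = len, fill = supp)) +
  geom_col(color = "black", position = dodge) +
  geom_errorbar(aes(ymin = len - sd, ymax = len + sd),
                width = 0.2, position = dodge)
p_dodge
```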
Line plot with error bars
# Default line plot
p<- ggplot(df2, aes(x=dose, y=len, group=supp, color=supp)) +
geom_line() +
geom_point()+
geom_errorbar(aes(ymin=len-sd, ymax=len+sd), width=.2,
position=position_dodge(0.05))
# Finished line plot
a<-p+labs(title="Tooth length per dose", x="Dose (mg)", y = "Length")+
theme_classic() +
scale_color_manual(values=c('#999999','#E69F00'))
ggarrange(p,a,labels=c("p","a"))
You can also use the functions geom_pointrange() or geom_linerange() instead of geom_errorbar().
# Use geom_pointrange
a<-ggplot(df2, aes(x=dose, y=len, group=supp, color=supp)) +
geom_pointrange(aes(ymin=len-sd, ymax=len+sd))
# Use geom_line()+geom_pointrange()
b<-ggplot(df2, aes(x=dose, y=len, group=supp, color=supp)) +
geom_line()+
geom_pointrange(aes(ymin=len-sd, ymax=len+sd))
ggarrange(a,b,labels=c("a","b"))
Dot plot with mean point and error bars
The functions geom_dotplot() and stat_summary() are used:
The mean +/- SD can be added as a crossbar, an error bar or a pointrange:
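# Use the ToothGrowth data, with dose converted to a factor
df <- ToothGrowth
df$dose <- as.factor(df$dose)
p <- ggplot(df, aes(x=dose, y=len)) +
geom_dotplot(binaxis='y', stackdir='center')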
# Use geom_crossbar(); mean_sdl() requires the Hmisc package to be installed
a<-p + stat_summary(fun.data="mean_sdl", fun.args = list(mult=1),
geom="crossbar", width=0.5)
# Use geom_errorbar()
# (fun replaces fun.y, which is deprecated since ggplot2 3.3.0)
b<-p + stat_summary(fun.data=mean_sdl, fun.args = list(mult=1),
geom="errorbar", color="red", width=0.2) +
stat_summary(fun=mean, geom="point", color="red")
# Use geom_pointrange()
c<-p + stat_summary(fun.data=mean_sdl, fun.args = list(mult=1),
geom="pointrange", color="red")
ggarrange(a,b,c,nrow=1,ncol=3,labels=c("a","b","c"))
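stat_summary() accepts any function for fun.data that returns a data frame with the columns y, ymin and ymax. If Hmisc is not available, a small helper can stand in for mean_sdl with mult=1. The following is a sketch only; mean_sd, tg and p_dot are hypothetical names.

```r
library(ggplot2)

# Mean +/- one standard deviation, in the format stat_summary() expects
mean_sd <- function(x) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

p_dot <- ggplot(tg, aes(x = dose, y = len)) +
  geom_dotplot(binaxis = "y", stackdir = "center") +
  stat_summary(fun.data = mean_sd, geom = "pointrange", color = "red")
p_dot
```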
This R tutorial describes how to create a pie chart for data visualization using R software and ggplot2 package.
The function coord_polar() is used to produce a pie chart, which is just a stacked bar chart in polar coordinates.
Simple pie charts
Use a barplot to visualize the data
# Barplot
bp<- ggplot(df, aes(x="", y=value, fill=group))+
geom_bar(width = 1, stat = "identity")
bp
Create a pie chart:
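A minimal sketch of the step from stacked bar to pie chart is given below. The data frame pie_df and its values are hypothetical, chosen only to match the value and group columns used in the barplot code above.

```r
library(ggplot2)

# Hypothetical data: one value per group
pie_df <- data.frame(group = c("Male", "Female", "Child"),
                     value = c(25, 25, 50))

# A single stacked bar ...
bar <- ggplot(pie_df, aes(x = "", y = value, fill = group)) +
  geom_bar(width = 1, stat = "identity")

# ... becomes a pie chart when the y axis is mapped to the angle
pie <- bar + coord_polar(theta = "y", start = 0)
pie
```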
Change the pie chart fill colors
It is possible to change the pie chart fill colors manually, using functions such as scale_fill_manual(), scale_fill_brewer() and scale_fill_grey().
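As an illustration, the fill scales used for the barplots earlier can be applied to a pie chart in the same way. The data and the object names in this sketch are hypothetical.

```r
library(ggplot2)

# Hypothetical data: one value per group
pie_df <- data.frame(group = c("Male", "Female", "Child"),
                     value = c(25, 25, 50))

base_pie <- ggplot(pie_df, aes(x = "", y = value, fill = group)) +
  geom_bar(width = 1, stat = "identity") +
  coord_polar(theta = "y", start = 0)

# Custom colors, one per group
pie_manual <- base_pie +
  scale_fill_manual(values = c("#999999", "#E69F00", "#56B4E9"))

# A ColorBrewer palette
pie_brewer <- base_pie + scale_fill_brewer(palette = "Dark2")

# Grey scale
pie_grey <- base_pie + scale_fill_grey() + theme_minimal()

pie_manual
```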
Create a pie chart from a factor variable
The PlantGrowth data is used:
Description of the PlantGrowth data:
Results from an experiment to compare yields (as measured by dried weight of plants) obtained under a control and two different treatment conditions.
Create the pie chart of the count of observations in each group:
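One possible way to do this, sketched with the built-in PlantGrowth data, is to let geom_bar() count the rows in each group and then switch to polar coordinates. The object names counts and pg_pie are hypothetical.

```r
library(ggplot2)

# Number of observations per treatment group
counts <- table(PlantGrowth$group)
counts

# geom_bar() without stat = "identity" counts the rows in each group;
# factor(1) puts all groups into a single stacked bar
pg_pie <- ggplot(PlantGrowth, aes(x = factor(1), fill = group)) +
  geom_bar(width = 1) +
  coord_polar(theta = "y")
pg_pie
```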
This R tutorial describes how to create a QQ plot (or quantile-quantile plot) using R and the ggplot2 package. QQ plots are used to check whether a given data set follows a normal distribution.
The function stat_qq() or qplot() can be used.
Prepare the data
The mtcars data set is used in the examples below.
# Convert cyl column from a numeric to a factor variable
mtcars$cyl <- as.factor(mtcars$cyl)
head(mtcars)
Basic qq plots
In the example below, the distribution of the variable mpg is explored:
# Solution 1
a<-qplot(sample = mpg, data = mtcars)
# Solution 2
b<-ggplot(mtcars, aes(sample=mpg))+stat_qq()
ggarrange(a,b, labels=c("a","b"))
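Judging normality from the points alone can be difficult, so a reference line is often added. A minimal sketch, assuming ggplot2 3.0.0 or later, where stat_qq_line() is available (the object name qq_ref is hypothetical):

```r
library(ggplot2)

# QQ plot of mpg with a reference line; by default the line passes through
# the first and third quartiles of the sample and of the normal distribution
qq_ref <- ggplot(mtcars, aes(sample = mpg)) +
  stat_qq() +
  stat_qq_line()
qq_ref
```

Points that stay close to the line are consistent with a normal distribution; systematic curvature away from it suggests skewness or heavy tails.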
Change qq plot point shapes by groups
In the R code below, point shapes are controlled automatically by the variable cyl.
You can also set point shapes manually using the function scale_shape_manual().
# Change point shapes by groups
p<-qplot(sample = mpg, data = mtcars, shape=cyl)
# Change point shapes manually
a<-p + scale_shape_manual(values=c(1,17,19))
ggarrange(p,a,labels=c("p","a"))
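In the R code below, point colors of the qq plot are automatically controlled by the levels of cyl:
# Change point colors by groups
p<-qplot(sample = mpg, data = mtcars, color=cyl)
It is also possible to change the qq plot colors manually using the functions scale_color_manual(), scale_color_brewer() and scale_color_grey():
# Use custom color palettes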
a<-p+scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))
# Use brewer color palettes
b<-p+scale_color_brewer(palette="Dark2")
# Use grey scale
c<-p + scale_color_grey() + theme_classic()
ggarrange(a, b,c,nrow=1, ncol=3,labels=c("a","b","c"))
Change the legend position
a<-p + theme(legend.position="top")
b<-p + theme(legend.position="bottom")
c<-p + theme(legend.position="none") # Remove legend
ggarrange(a,b,c,nrow=1,ncol=3, labels=c("a","b","c"))
Customized qq plots
# Basic qq plot
qplot(sample = mpg, data = mtcars)+
labs(title="Normal QQ plot of miles per gallon",
y = "Miles/(US) gallon")+
theme_classic()
# Change color/shape by groups
p <- qplot(sample = mpg, data = mtcars, color=cyl, shape=cyl)+
labs(title="Normal QQ plot of miles per gallon by cylinder",
y = "Miles/(US) gallon")
a<-p + theme_classic()
ggarrange(p,a,labels=c("p","a"))