The goal of this lesson is to provide an overview of ways to plot in R
We will cover:
base graphics system
ggplot2 package
Batteries not included
With the base installation of R you have the ability to generate 2D graphics quickly.
Basically a two-step process:
1. Initialize a new plot
2. Add to an existing plot
But often you can customize up front using arguments to the initializing call, making it a one-step process
Base functions
These include:
plot: make a scatterplot
lines: add lines to a plot
points: add points to a plot
text: add text
title: add a title to axes or plot
mtext: add margin text
axis: add axis ticks/labels
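As a minimal sketch of the two-step workflow, the block below first creates an empty plotting region (type = "n"). It then adds each element with the functions listed above. The built-in cars data, the lowess() smoother, and the text positions are illustrative choices, not part of the lesson's own examples.

```r
# Step 1: initialize the plot but draw nothing yet (type = "n");
# axes = FALSE and ann = FALSE also suppress the default axes and labels
plot(cars, type = "n", axes = FALSE, ann = FALSE)

# Step 2: add elements one at a time
points(cars, pch = 16)                          # add the data points
lines(lowess(cars), col = "blue")               # add a smoothed line
text(4, 110, "cars data", pos = 4)              # add text to the right of (4, 110)
axis(1)                                         # x axis (side 1 = bottom)
axis(2, las = 1)                                # y axis with horizontal labels
box()                                           # frame around the plot region
title(main = "Speed and Stopping Distances of Cars",
      xlab = "speed (mph)", ylab = "dist (ft)") # plot and axis titles
mtext("source: datasets::cars", side = 3, line = 0.25, cex = 0.8)  # margin text
```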
Important parameters
The par() function is used for setting/overriding graphical control parameters. These include:
pch: plotting symbol
lty: line type
lwd: line width
col: plotting color
las: orientation of axis labels
bg: background color
mar: margin size
oma: outer margin size
mfrow: number of plots per row, column. Plots filled in row-wise.
mfcol: number of plots per row, column. Plots filled in column-wise.
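par() returns the previous values of any parameters it changes, so a common pattern is to save them and restore them when finished. The sketch below assumes a two-panel layout with illustrative margin and label settings.

```r
# par() invisibly returns the old values of the parameters it sets
op <- par(mfrow = c(1, 2),        # 1 row, 2 columns, filled row-wise
          mar = c(4, 4, 2, 1),    # margins in lines: bottom, left, top, right
          las = 1)                # horizontal axis labels

plot(cars, pch = 1, col = "red", main = "pch = 1")
plot(cars, pch = 19, col = "blue", main = "pch = 19")

par(op)                           # restore the previous settings
```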
Deconstruction of plot
plot(cars)
By default, axes and annotations are TRUE (or T)
plot(cars,axes=F,ann=F)
plot(cars,axes=T,ann=F)
plot(cars,axes=T,ann=T)
Notice this is equivalent to
plot(cars$speed,cars$dist,xlab="speed", ylab="dist")
Why we need to specify axis labels: without xlab and ylab, the axes are labeled with the expressions cars$speed and cars$dist:
plot(cars$speed,cars$dist)
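The formula interface is another way to avoid these default labels. With plot(y ~ x, data = ...), base R labels the axes with the variable names. Here is a short sketch using the same cars data:

```r
# Formula interface: y ~ x, with the variables looked up in `data`;
# the axes are labeled "speed" and "dist" automatically
plot(dist ~ speed, data = cars)

# Units still have to be supplied explicitly
plot(dist ~ speed, data = cars, xlab = "speed (mph)", ylab = "dist (ft)")
```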
Adding to a plot
Initialize the plot first, then add a title and a grid:
plot(cars)
title(main="Speed and Stopping Distances of Cars")
grid()
Or, in one step (panel.first draws the grid after the axes are set up, behind the points)
plot(cars,xlab="speed (mph)", ylab="dist (ft)",
main="Speed and Stopping Distances of Cars",
panel.first=grid())
Parameters
type: "p" = points, "l" = lines, "b" = both, "o" = both, overplotted
par(mfrow=c(2,2))
#plot 1
plot(cars,xlab="speed (mph)", ylab="dist (ft)",
main="Speed and Stopping Distances of Cars",
type="p")
grid()
#plot 2
plot(cars,xlab="speed (mph)", ylab="dist (ft)",
main="Speed and Stopping Distances of Cars",
type="l")
grid()
#plot 3
plot(cars,xlab="speed (mph)", ylab="dist (ft)",
main="Speed and Stopping Distances of Cars",
type="b")
grid()
#plot 4
plot(cars,xlab="speed (mph)", ylab="dist (ft)",
main="Speed and Stopping Distances of Cars",
type="o")
grid()
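The four panels differ only in type, so the same figure can be produced with a loop. This sketch also adds the "h" (histogram-like vertical lines) and "s" (stair steps) types. The 2 x 3 layout is an arbitrary choice.

```r
# Loop over plot types instead of repeating the plot() call
types <- c("p", "l", "b", "o", "h", "s")
op <- par(mfrow = c(2, 3))        # 2 rows x 3 columns
for (ty in types) {
  plot(cars, xlab = "speed (mph)", ylab = "dist (ft)",
       main = paste0('type="', ty, '"'), type = ty)
  grid()
}
par(op)                           # restore the previous layout
```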
Line type (lty), weight (lwd), color (col)
par(mfrow=c(2,2))
#plot 1
plot(cars,xlab="speed (mph)", ylab="dist (ft)",
main="lty=2",
type="l",
lty=2,
col="red")
grid()
#plot 2
plot(cars,xlab="speed (mph)", ylab="dist (ft)",
main="lty='dashed'",
type="l",
lty="dashed",
col="red")
grid()
#plot 3
plot(cars,xlab="speed (mph)", ylab="dist (ft)",
main="lwd=3",
type="l",
col="red",
lwd=3)
grid()
#plot 4
plot(cars,xlab="speed (mph)", ylab="dist (ft)",
main="type='p',lwd=3",
type="p",
col="blue",
lwd=3)
grid()
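Line types can be given as integers or as names. The sketch below draws one horizontal line per built-in type. The mapping from 1 = "solid" through 6 = "twodash" follows the lty entry in ?par.

```r
# Integer codes and names of the built-in line types (see ?par)
lty_names <- c("solid", "dashed", "dotted", "dotdash", "longdash", "twodash")

# Empty plot used only as a canvas
plot(NULL, xlim = c(0, 1), ylim = c(0.5, 6.75), axes = FALSE,
     xlab = "", ylab = "", main = "Line types (lty)")
for (i in seq_along(lty_names)) {
  abline(h = i, lty = i, lwd = 2)                       # lty given as an integer
  text(0, i + 0.25, paste0(i, ' = "', lty_names[i], '"'),
       adj = 0, cex = 0.8)                              # left-aligned label above the line
}
```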
Plotting character (pch) and size (cex)
par(mfrow=c(2,2))
#plot 1
plot(cars,xlab="speed (mph)", ylab="dist (ft)",
main="pch=2",
pch=2)
grid()
#plot 2
plot(cars,xlab="speed (mph)", ylab="dist (ft)",
main="pch=3",
pch=3)
grid()
#plot 3
plot(cars,xlab="speed (mph)", ylab="dist (ft)",
main="pch=21, cex=2",
pch=21,
cex=2)
grid()
#plot 4
plot(cars,xlab="speed (mph)", ylab="dist (ft)",
main="bg='yellow'",
pch=21,
col="red",
bg="yellow",
cex=2)
grid()
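A quick reference chart of all plotting symbols helps when choosing pch. This is a minimal sketch. Symbols 21 to 25 are the ones that take a separate fill color (bg), as in plot 4 above. The colors are arbitrary.

```r
# Plotting symbols 0-25; for 21-25, col is the border and bg the fill
pch_codes <- 0:25
plot(pch_codes, rep(1, 26), pch = pch_codes, cex = 2,
     col = "red", bg = "yellow", ylim = c(0.5, 1.5),
     axes = FALSE, xlab = "", ylab = "", main = "pch = 0 to 25")
text(pch_codes, 1, labels = pch_codes, pos = 1, offset = 1.2)  # label each symbol
```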
Add regression line
The lm() function is used to fit linear models.
Model coefficients and statistics can be extracted for display.
Confidence intervals can be calculated and plotted.
Step through the example below:
plot(cars,xlab="speed (mph)", ylab="dist (ft)",
main="Speed and Stopping Distances of Cars",
pch=2)
grid()
x<-cars$speed
y<-cars$dist
fit<-lm(y~x) #define linear model (named fit so it does not mask lm())
abline(fit, lty=3, col="blue") #draw regression line
s<-summary(fit) #model summary statistics
names(s) #components that can be extracted
r2<-s$adj.r.squared #extract the adjusted r-squared value
s$coefficients
p<-s$coefficients[2,4] #extract the p value of the slope
mtext(col="blue",side=3,
bquote("adj." ~ italic(R)^2 == .(format(r2, digits = 3))))
pred<-predict(fit, interval = "confidence", level = 0.95)
lines(x,pred[,2],col="red",lty=2) #add lower CI band
lines(x,pred[,3],col="red",lty=2) #add upper CI band
predict() can be used for point estimates:
predict(fit,data.frame(x=21.5),interval = "confidence")
## fit lwr upr
## 1 66.97 60.25 73.68
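The bands above rely on cars$speed already being sorted. With unsorted x, lines() would zigzag back and forth. The sketch below shows a more general approach: predict on an ordered grid through the newdata argument, then shade the band with polygon(). The 100-point grid and the 20% transparency are arbitrary choices.

```r
fit <- lm(dist ~ speed, data = cars)

# Predict on an ordered grid of new speeds so the band is drawn left to right
new <- data.frame(speed = seq(min(cars$speed), max(cars$speed), length.out = 100))
ci  <- predict(fit, newdata = new, interval = "confidence", level = 0.95)

plot(cars, xlab = "speed (mph)", ylab = "dist (ft)", pch = 2,
     main = "Speed and Stopping Distances of Cars")
# Shade the 95% confidence band: lower edge left to right, upper edge right to left
polygon(c(new$speed, rev(new$speed)), c(ci[, "lwr"], rev(ci[, "upr"])),
        col = adjustcolor("red", alpha.f = 0.2), border = NA)
lines(new$speed, ci[, "fit"], col = "blue", lty = 3)   # fitted line
```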
Box Plots
Use boxplot(). Graphical parameters are similar to those of plot().
The attach() function is useful to point to an R object once and save typing. Remember to detach() the object when finished!
attach(chickwts)
boxplot(weight~feed,
col=topo.colors(6), # a color palette
horizontal=F,
main="Chick weight by feed type",
xlab="feed type",
ylab="weight, g",
notch=F,
boxwex = 0.3)
grid(NA,6) #add horizontal gridlines only
detach(chickwts)
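For functions with a formula interface, the data argument does the same job as attach() without changing the search path. For other functions, with() does the same. Here is a short sketch reusing the box plot above; the per-feed median is only an illustration.

```r
# The formula interface looks up weight and feed in `data`,
# so attach()/detach() are not needed
boxplot(weight ~ feed, data = chickwts,
        col = topo.colors(6),
        main = "Chick weight by feed type",
        xlab = "feed type", ylab = "weight, g",
        boxwex = 0.3)

# with() evaluates an expression inside the data frame,
# for functions that have no data argument
meds <- with(chickwts, tapply(weight, feed, median))
meds
```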
Using a subset of the data
attach(chickwts)
boxplot(weight~feed,
subset(chickwts,weight > 250),
col=topo.colors(6), # a color palette
horizontal=F,
main="Chick weight by feed type\n Weights > 250 g",
xlab="feed type",
ylab="weight, g",
boxwex = 0.4)
detach(chickwts)
T-test
The t.test()
function provides a variety of t-tests.
By default it assumes unequal variances.
Let's subset data from the last box plot and perform a t-test.
attach(chickwts)
set1<-"soybean"
set2<-"sunflower"
set1<-subset(weight,feed==set1)
set2<-subset(weight,feed==set2)
tt<-t.test(set1,set2,alternative='two.sided', conf.level=.95,
var.equal=TRUE)
tt
##
## Two Sample t-test
##
## data: set1 and set2
## t = -4.05, df = 24, p-value = 0.0004641
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -124.52 -40.45
## sample estimates:
## mean of x mean of y
## 246.4 328.9
p<-format(tt$p.value, digits = 3)
p
## [1] "0.000464"
detach(chickwts)
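The same comparison can be run with the formula interface, leaving var.equal at its default (FALSE). That gives Welch's unequal-variance test rather than the pooled test above. This is a sketch; droplevels() is needed because subset() keeps all six feed levels, and the grouping factor must have exactly two.

```r
# Keep only the two feeds being compared; droplevels() removes the
# four unused factor levels
two_feeds <- droplevels(subset(chickwts, feed %in% c("soybean", "sunflower")))

# Formula interface; var.equal defaults to FALSE, i.e. Welch's t-test
tt_welch <- t.test(weight ~ feed, data = two_feeds)
tt_welch$p.value
tt_welch$estimate   # group means: soybean, then sunflower
```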
Histograms
hist() plots a histogram
hist(mtcars$hp)
Arguments
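breaks: breakpoints between histogram cells, or a single number giving the suggested number of cells
freq: logical. If F, probability densities are plotted instead of counts
col: fill color for bars
border: border color around bars
labels: logical, or character if not F; if T, the values are printed above the bars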
Examples
hist(mtcars$hp,col="blue",freq=F)
hist(mtcars$hp,col="blue",breaks=20)
hist(mtcars$hp,col="yellow",border="red",labels=T)
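Plotting densities (freq=F) puts the histogram on the same scale as a kernel density estimate, so the two can be overlaid. This is a sketch. The explicit breaks vector (50 to 350 in steps of 25) is an assumption chosen to span the range of mtcars$hp.

```r
# Densities (freq = FALSE) so the histogram and the density curve share a y scale
h <- hist(mtcars$hp, breaks = seq(50, 350, by = 25), freq = FALSE,
          col = "lightblue", border = "white",
          main = "Gross horsepower, mtcars", xlab = "hp")
lines(density(mtcars$hp), col = "red", lwd = 2)   # overlay the density estimate
rug(mtcars$hp)                                    # a tick for each observation
```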
Plot layout
Use mfrow or mfcol to control the layout of multiple plots.
# 4 figures arranged in 2 rows and 2 columns
attach(mtcars)
par(mfrow=c(2,2))
plot(wt,mpg, main="Scatterplot of wt vs. mpg")
plot(wt,disp, main="Scatterplot of wt vs disp")
hist(wt, main="Histogram of wt")
boxplot(wt, main="Boxplot of wt")
detach(mtcars)
To reset to one plot per page without closing the graphics device, run
par(mfrow=c(1,1))
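When panels should have different sizes, base R's layout() takes a matrix of panel numbers in place of mfrow. The sketch below puts one wide plot on top and two plots underneath. The arrangement is an illustrative choice.

```r
# Panel 1 spans the whole top row; panels 2 and 3 share the bottom row
layout(matrix(c(1, 1,
                2, 3), nrow = 2, byrow = TRUE))
plot(mtcars$wt, mtcars$mpg, main = "Scatterplot of wt vs. mpg",
     xlab = "wt", ylab = "mpg")
hist(mtcars$wt, main = "Histogram of wt", xlab = "wt")
boxplot(mtcars$wt, main = "Boxplot of wt")

par(mfrow = c(1, 1))   # back to a single panel
```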
Another example
Just run par(mfrow=c(2,2)) initially
Is there another way to plot this, other than by using base graphics?
Base graphics code
plot(cars,
main="Speed and Stopping Distances of Cars")
grid()
x<-cars$speed
y<-cars$dist
fit<-lm(y~x)
abline(fit, lty=3, col="blue")
pred<-predict(fit, interval = "confidence", level = 0.95)
lines(x,pred[,2],col="red",lty=2) #add lower CI band
lines(x,pred[,3],col="red",lty=2) #add upper CI band
Yes
We can use the package ggplot2, like this, using ggplot():
library(ggplot2)
d<-ggplot(cars,aes(x=speed,y=dist))
d<-d + geom_point()
d<-d + geom_smooth(method="lm", formula=y~x) #fitted line with 95% confidence band
d<-d + ggtitle("Speed and Stopping Distances of Cars")
d
or this, using qplot():
qplot(speed, dist, data = cars, geom=c("point", "smooth"),
method="lm",main="Speed and Stopping Distances of Cars")
ggplot2
One of the most popular R packages
Based on the "Grammar of Graphics" (Wilkinson, 1999), a general scheme for data visualization which breaks up graphs into semantic components such as scales and layers
As with most popular R packages, a vast amount of help exists online
Why use ggplot2 over base graphics?
Install ggplot2
install.packages("ggplot2")
Load ggplot2
library(ggplot2)
Let's see what data sets are included in ggplot2
data(package = "ggplot2")
Type ?diamonds to see a brief description of this dataset.
Two ways to plot in ggplot2
qplot
ggplot
qplot
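qplot() is short for "quick plot":
concisely generates a plot with little code
uses syntax similar to the base plot() command
doesn't allow full control over all graphical parameters, as ggplot() does
Note: qplot() has been deprecated since ggplot2 3.4.0; it still works but prints a deprecation warning.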
Let's take a smaller subset from the diamonds dataset.
library(ggplot2)
set.seed(16)
data<-diamonds[sample(nrow(diamonds),500),]
head(data)
## carat cut color clarity depth table price x y z
## 36847 0.37 Ideal F VS2 61.1 57 957 4.67 4.63 2.84
## 13168 1.00 Premium E SI1 60.7 59 5445 6.38 6.41 3.88
## 24279 2.22 Fair G SI2 64.4 58 12508 8.32 8.15 5.30
## 12376 0.36 Ideal E SI1 60.9 57 597 4.58 4.62 2.80
## 46575 0.56 Ideal E SI1 62.8 58 1784 5.31 5.26 3.32
## 16785 1.31 Very Good I SI1 62.6 59 6686 6.97 6.86 4.33
using data from objects
qplot(data$cut, data$carat)
using variables in data frames
qplot(carat, price, data = data)
add a linear fit with a 95% confidence band
qplot(carat, price, data = data,geom=c("point", "smooth"), method="lm")
against an additional factor, by shape
qplot(carat, price, data = data, shape=cut)
against an additional factor, by size
data2<-data[sample(nrow(data),50),] #even smaller subset
qplot(carat, price, data = data2, size=cut)
against an additional factor, by color
qplot(carat, price, data = data, color=clarity)
add a linear fit with a 95% confidence band per additional factor level
qplot(carat, price, data = data, color=clarity,geom=c("point", "smooth"), method="lm")
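Because qplot() is deprecated in recent ggplot2 releases, it can help to see the equivalent ggplot() call. This is a sketch that rebuilds the same 500-row subset. The explicit formula = y ~ x is an assumption, added only to silence geom_smooth()'s message about the default formula.

```r
library(ggplot2)

set.seed(16)
data <- diamonds[sample(nrow(diamonds), 500), ]   # same subset as above

# ggplot() equivalent of
# qplot(carat, price, data = data, color = clarity,
#       geom = c("point", "smooth"), method = "lm")
p <- ggplot(data, aes(x = carat, y = price, color = clarity)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)     # one fit and 95% band per clarity level
p
```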
qplot
basic histogram
qplot(carat, data = data,
geom="histogram")
change bin width
qplot(carat, data = data,
geom="histogram", binwidth = 0.3)
Fill by additional factor
qplot(carat, data = data, fill=clarity,
geom="histogram", binwidth = 0.1)
Unstack bars: use position="dodge"
qplot(carat, data = data, fill=clarity,
geom="histogram", binwidth = 0.1, position="dodge")
Adjust bin width
qplot(carat, data = data, fill=clarity,
geom="histogram", binwidth = 1, position="dodge")
Probability density
qplot(carat, data = data, fill=clarity,
geom="density")
Set transparency using alpha (wrap the value in I() so it is set, not mapped to a legend)
qplot(carat, data = data, fill=clarity,
geom="density",alpha=I(0.1))
qplot
basic boxplot
qplot(cut, price, data = data, geom="boxplot")
Fill by the factor and get a quick legend
qplot(cut, price, data = data, geom="boxplot",fill=cut)
Set fixed fill color - I("color")
qplot(cut, price, data = data, geom="boxplot",fill=I("red"))
or a color palette
qplot(cut, price, data = data, geom="boxplot",fill=I(topo.colors(5)))
Fill by an additional factor
qplot(cut, price, data = data, geom="boxplot",fill=clarity)
Notice that color only applies to the box borders
qplot(cut, price, data = data, geom="boxplot",color=clarity)
Add data points jittered to reduce overplotting
qplot(cut, price, data = data, geom=c("boxplot","jitter"),fill=cut)
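With jittered points on top of a box plot, each outlier is drawn twice: once by the box plot and once by the jitter layer. A ggplot() sketch that hides the box plot's own outliers is shown below. The jitter width and point transparency are arbitrary choices.

```r
library(ggplot2)

set.seed(16)
data <- diamonds[sample(nrow(diamonds), 500), ]   # same subset as above

p <- ggplot(data, aes(x = cut, y = price, fill = cut)) +
  geom_boxplot(outlier.shape = NA) +     # hide outliers; the jittered points show them
  geom_jitter(width = 0.2, alpha = 0.3)  # spread points horizontally only
p
```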
qplot
Facets are used for conditioning plots by one or two variables.
use facets=~a for one variable (a)
qplot(carat, data = data, fill=clarity,facets=~color,
geom="density",alpha=I(0.1))
use facets=a~b for two variables (a and b)
qplot(carat, data = data, fill=clarity,facets=color~cut,
geom="density",alpha=I(0.1))
scatter plot
qplot(carat, price, data = data, color=clarity,facets=color~cut)
boxplot
qplot(cut, price, data = data, geom="boxplot",fill=cut,facets=~color)
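In ggplot(), facets are added as a layer. This sketch assumes the usual correspondence: facets = ~a maps to facet_wrap(~a), and facets = a ~ b maps to facet_grid(a ~ b), with rows ~ columns.

```r
library(ggplot2)

set.seed(16)
data <- diamonds[sample(nrow(diamonds), 500), ]   # same subset as above

# One conditioning variable: panels wrapped into a grid
p1 <- ggplot(data, aes(x = carat, fill = clarity)) +
  geom_density(alpha = 0.1) +
  facet_wrap(~color)

# Two conditioning variables: rows by color, columns by cut
p2 <- ggplot(data, aes(x = carat, y = price, color = clarity)) +
  geom_point() +
  facet_grid(color ~ cut)
p1
p2
```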
ggplot
How is a plot constructed using ggplot()?
Data
Create the plot object; the data are always supplied as a data frame
ggplot(diamonds, aes(x=carat, y=price))
aes() defines the plot aesthetics: the mapping of variables to various parts of the plot
Layers
Add layers
geom_point()
geom_xxx() functions are geometric objects, or plot types
A basic ggplot combines data with layers by using the + sign
ggplot(diamonds, aes(x=carat, y=price))+geom_point()
Additional layers
Facets
Conditioning on variable(s)
facet_wrap(~clarity)
Scales
Control the mapping between data and aesthetics
scale_y_log10()
Labels
ggtitle("Example")
ylab("price (USD)")
Adding the additional layers to the first example...
ggplot(diamonds) +
  geom_point(aes(x=carat, y=price, color=clarity)) +
  facet_wrap(~cut) +
  scale_y_log10() +
  ggtitle("Example with more layers") +
  ylab("price (USD)")
Theme
Control non-data components of the plot
Starting point
d<-ggplot(data,aes(x=carat,y=price,color=clarity))
d<-d + geom_point()
d
Change axis labels
d<-ggplot(data,aes(x=carat,y=price,color=clarity))
d<-d + geom_point()
d<-d + labs(x="diamond carat", y="diamond price")
d
Add title
d<-ggplot(data,aes(x=carat,y=price,color=clarity))
d<-d + geom_point()
d<-d + labs(x="diamond carat", y="diamond price")
d<-d + labs(title="Carat-price relationship")
d
Modify title appearance
d<-ggplot(data,aes(x=carat,y=price,color=clarity))
d<-d + geom_point()
d<-d + labs(x="diamond carat", y="diamond price")
d<-d + labs(title="Carat-price relationship")
d<-d + theme(plot.title = element_text(size = rel(2), color = "blue"))
d
Black and white theme
d<-ggplot(data,aes(x=carat,y=price,color=clarity))
d<-d + geom_point()
d<-d + labs(x="diamond carat", y="diamond price")
d<-d + labs(title="Carat-price relationship")
d<-d + theme_bw()
d<-d + theme(plot.title = element_text(size = rel(2), color = "blue"))
d
Change legend location
d<-ggplot(data,aes(x=carat,y=price,color=clarity))
d<-d + geom_point()
d<-d + labs(x="diamond carat", y="diamond price")
d<-d + labs(title="Carat-price relationship")
d<-d + theme_bw()
d<-d + theme(plot.title = element_text(size = rel(2), color = "blue"))
d<-d + theme(legend.position = "bottom")
d
Additional themes are available in the package ggthemes
Solarized theme
Theme and color and fill scales based on the Solarized palette
library(ggthemes)
d<-ggplot(data,aes(x=carat,y=price,color=clarity)) + geom_point()
d<-d + theme_solarized() + scale_colour_solarized("blue")
d
Solarized dark
library(ggthemes)
d<-ggplot(data,aes(x=carat,y=price,color=clarity)) + geom_point()
d<-d + theme_solarized(light = F) + scale_colour_solarized("red")
d
Inverse gray theme
library(ggthemes)
d<-ggplot(data,aes(x=carat,y=price,color=clarity)) + geom_point()
d<-d + theme_igray()
d
There's even a theme best described by the package author:
"For that classic ugly look and feel. For ironic purposes only. 3D bars and pies not included. Please never use this theme."
You guessed it, an Excel theme!
library(ggthemes)
ggplot(data,aes(clarity, fill = cut)) + geom_bar() + scale_fill_excel() + theme_excel()
In ggplot2, par(mfrow=c(nrows, ncols)) doesn't work to arrange multiple plots, because ggplot2 draws with the grid system rather than base graphics.
Use the gridExtra package instead.
library(gridExtra)
library(ggthemes)
d<-ggplot(data,aes(x=carat,y=price,color=clarity)) + geom_point()
e<-d + theme_tufte()
f<-d + theme_solarized()
g<-d + theme_few()
grid.arrange(d,e,f,g,ncol=2,nrow=2)
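grid.arrange() can also place plots in unequal cells through its layout_matrix argument. This is analogous to layout() in base graphics. The sketch below assumes three illustrative plots built from the same diamonds subset.

```r
library(ggplot2)
library(gridExtra)

set.seed(16)
data <- diamonds[sample(nrow(diamonds), 500), ]   # same subset as above

d <- ggplot(data, aes(x = carat, y = price, color = clarity)) + geom_point()
h <- ggplot(data, aes(x = carat)) + geom_histogram(binwidth = 0.1)
b <- ggplot(data, aes(x = cut, y = price)) + geom_boxplot()

# Numbers in layout_matrix refer to the plots in the order given:
# plot 1 spans the top row, plots 2 and 3 share the bottom row
g <- grid.arrange(d, h, b, layout_matrix = rbind(c(1, 1), c(2, 3)))
```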
Wrap-up
The base graphics in R are more than adequate in most cases to create whatever plot you wish
Use ggplot2 to make more refined plots
qplot and ggplot are two ways to plot in ggplot2
Another popular plotting package worth checking out is latticeExtra.
Plotting resources
Quick-R is again a nice resource for base plotting
docs.ggplot2.org is great for all things ggplot2
Questions?
Next session: Interactivity