Use R

Session 4: Plotting



Created by Robert Hempton


The goal of this lesson is to provide an overview of ways to plot in R

We will cover:

base graphics system

ggplot2 package

Base graphics

Batteries not included

With the base installation of R you have the ability to generate 2D graphics quickly.

Basically a two-step process:

1. Initialize a new plot

2. Add to an existing plot

But often you can customize upfront using arguments, making it one-step

Base functions

Include:

  • plot: make scatterplot
  • lines: add lines to plot
  • points: add points to plot
  • text: add text
  • title: add title to axes or plot
  • mtext: add margin text
  • axis: add axis tick/labels
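
As a rough sketch of how these functions combine, the code below builds a plot of the built-in cars data piece by piece. Starting from an empty plot with type="n", the 20 mph threshold, and the label text are illustrative assumptions rather than part of the lesson.

```r
# Start with an empty plot region: type="n" sets up the coordinates but draws nothing
plot(cars$speed, cars$dist, type = "n", axes = FALSE, ann = FALSE)

# Add the pieces one at a time
fast <- cars$speed > 20                        # illustrative threshold
points(cars$speed[!fast], cars$dist[!fast], pch = 1)
points(cars$speed[fast], cars$dist[fast], pch = 19, col = "red")
lines(lowess(cars$speed, cars$dist), lty = 2)  # smooth trend line
text(x = 5, y = 110, labels = "red: speed > 20 mph", pos = 4)
axis(side = 1)                                 # x axis ticks and labels
axis(side = 2, las = 1)                        # y axis with horizontal labels
box()                                          # frame around the plot region
title(main = "Speed and Stopping Distances of Cars",
      xlab = "speed (mph)", ylab = "dist (ft)")
mtext("data: cars", side = 4)                  # margin text on the right side
```

Each call after plot() draws on top of what is already there, so the order of the calls is the order of the layers.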

Important parameters

The par() function is used for setting or overriding graphical control parameters. These include:

  • pch: plotting symbol
  • lty: line type
  • lwd: line width
  • col: plotting color
  • las: orientation of axis labels
  • bg: background color
  • mar: margin size
  • oma: outer margin size
  • mfrow: number of rows and columns of plots, given as c(nrows, ncols). Plots are filled in row-wise.
  • mfcol: number of rows and columns of plots, given as c(nrows, ncols). Plots are filled in column-wise.
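
To make the difference between mfrow and mfcol concrete, here is a small sketch that draws six numbered panels under each setting. The helper name number_panels and the margin values are assumptions made for this illustration.

```r
# Hypothetical helper: draw six empty panels, each labelled with its draw order
number_panels <- function() {
  for (i in 1:6) {
    plot.new()                        # new panel with 0..1 coordinates
    text(0.5, 0.5, labels = i, cex = 3)
    box()
  }
}

par(mfrow = c(2, 3), mar = c(1, 1, 1, 1))  # row-wise: 1 2 3 across the top row
number_panels()

par(mfcol = c(2, 3), mar = c(1, 1, 1, 1))  # column-wise: 1 2 down the first column
number_panels()

# Back to a single plot with the default margins
par(mfrow = c(1, 1), mar = c(5, 4, 4, 2) + 0.1)
```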

Deconstruction of plot

plot(cars)

By default, axes and annotations are drawn: the axes and ann arguments are TRUE (or T)

plot(cars,axes=F,ann=F)

plot(cars,axes=T,ann=F)

plot(cars,axes=T,ann=T)

Notice this is equivalent to

plot(cars$speed,cars$dist,xlab="speed", ylab="dist")

Why we need to specify axis labels: without xlab and ylab, the labels default to the expressions cars$speed and cars$dist

plot(cars$speed,cars$dist)

Adding to a plot

Initialize first,

Add a title

Add a grid

plot(cars)
title(main="Speed and Stopping Distances of Cars")
grid()

Or, in one step

plot(cars$speed,cars$dist,xlab="speed (mph)", ylab="dist (ft)",
     main="Speed and Stopping Distances of Cars",
     panel.first=grid()) #draw the grid before the points

Parameters

type

par(mfrow=c(2,2))
#plot 1
plot(cars,xlab="speed (mph)", ylab="dist (ft)", 
     main="Speed and Stopping Distances of Cars",
     type="p")
grid()
#plot 2
plot(cars,xlab="speed (mph)", ylab="dist (ft)", 
     main="Speed and Stopping Distances of Cars",
     type="l")
grid()
#plot 3
plot(cars,xlab="speed (mph)", ylab="dist (ft)", 
     main="Speed and Stopping Distances of Cars",
     type="b")
grid()
#plot 4
plot(cars,xlab="speed (mph)", ylab="dist (ft)", 
     main="Speed and Stopping Distances of Cars",
     type="o")
grid()

line type (lty), weight (lwd), color (col)

par(mfrow=c(2,2))
#plot 1
plot(cars,xlab="speed (mph)", ylab="dist (ft)", 
     main="lty=2",
     type="l",
     lty=2,
     col="red")
grid()
#plot 2
plot(cars,xlab="speed (mph)", ylab="dist (ft)", 
     main="lty='dashed'",
     type="l",
     lty="dashed",
     col="red")
grid()
#plot 3
plot(cars,xlab="speed (mph)", ylab="dist (ft)", 
     main="lwd=3",
     type="l",
     col="red",
     lwd=3)
grid()
#plot 4
plot(cars,xlab="speed (mph)", ylab="dist (ft)", 
     main="type='p',lwd=3",
     type="p",
     col="blue",
     lwd=3)
grid()

Plotting character (pch) and size (cex)

par(mfrow=c(2,2))
#plot 1
plot(cars,xlab="speed (mph)", ylab="dist (ft)", 
     main="pch=2",
     pch=2)
grid()
#plot 2
plot(cars,xlab="speed (mph)", ylab="dist (ft)", 
     main="pch=3",
     pch=3)
#plot 3
plot(cars,xlab="speed (mph)", ylab="dist (ft)", 
     main="pch=21, cex=2",
     pch=21,
     cex=2)
grid()
#plot 4
plot(cars,xlab="speed (mph)", ylab="dist (ft)", 
     main="bg='yellow'",
     pch=21,
     col="red",
     bg="yellow",
     cex=2)
grid()
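
For reference, the standard symbols can be drawn in one plot. This is only a sketch: the variable name pch_values, the colors, and the layout are assumptions chosen for illustration.

```r
# Draw the standard plotting symbols 0 to 25 with their pch numbers underneath
pch_values <- 0:25
plot(pch_values, rep(1, length(pch_values)),
     pch = pch_values, cex = 2,
     col = "red", bg = "yellow",        # bg only fills symbols 21 to 25
     ylim = c(0.5, 1.5), axes = FALSE, ann = FALSE)
text(pch_values, rep(0.8, length(pch_values)),
     labels = pch_values, cex = 0.7)
title(main = "pch = 0 to 25")
```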

Add regression line

The lm() function is used to fit linear models.

Model coefficients and statistics can be extracted for display.

Confidence intervals can be calculated and plotted.

Step through the example below:

plot(cars,xlab="speed (mph)", ylab="dist (ft)",
     main="Speed and Stopping Distances of Cars",
     pch=2)
grid()
x<-cars$speed
y<-cars$dist
fit<-lm(y~x) #fit the linear model
abline(fit, lty=3, col="blue") #draw regression line
fit_summary<-summary(fit) #model statistics
names(fit_summary)
r2<-fit_summary$adj.r.squared #extract the adjusted r-squared value
fit_summary$coefficients
p<-fit_summary$coefficients[2,4] #extract the p value of the slope
mtext(col="blue",side=3,
      bquote(italic(R)^2 == .(format(r2, digits = 3))))
pred<-predict(fit, interval = "confidence", level = 0.95)
lines(x,pred[,2],col="red",lty=2) #add lower CI band
lines(x,pred[,3],col="red",lty=2) #add upper CI band

predict() can be used for point estimates:

predict(fit,data.frame(x=21.5),interval = "confidence")
##     fit   lwr   upr
## 1 66.97 60.25 73.68
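
As a check on what predict() is doing, the point estimate can be reproduced by hand from the fitted coefficients. This sketch refits the model with a formula on the cars columns; the names fit, by_hand and by_predict are chosen here for illustration.

```r
fit <- lm(dist ~ speed, data = cars)
b <- coef(fit)                          # b[1] is the intercept, b[2] the slope

# Point estimate at 21.5 mph: intercept + slope * speed
by_hand <- unname(b[1] + b[2] * 21.5)
by_predict <- unname(predict(fit, data.frame(speed = 21.5)))

all.equal(by_hand, by_predict)          # TRUE
round(by_hand, 2)                       # 66.97, the fit value shown above
```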

Box Plots

Use boxplot(). Graphical parameters are similar to plot()

  • Tip: The attach() function is useful to point to an R object once and save typing. Remember to detach() the object when finished!
attach(chickwts)
boxplot(weight~feed,
        col=topo.colors(6), # a color palette
        horizontal=F,
        main="Chick weight by feed type", 
        xlab="feed type",
        ylab="weight, g",
        notch=F,
        boxwex = 0.3)
grid(NA,6)  #add horizontal gridlines only
detach(chickwts)
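
If you would rather not attach and detach, a common alternative is to name the data frame in the call. The sketch below assumes the same chickwts data and uses the data argument of boxplot() and the with() function; the name mean_by_feed is illustrative.

```r
# Same boxplot, with the data frame named in the call instead of attached
boxplot(weight ~ feed, data = chickwts,
        col = topo.colors(6),
        main = "Chick weight by feed type",
        xlab = "feed type", ylab = "weight, g")

# with() evaluates a single expression inside the data frame
mean_by_feed <- with(chickwts, tapply(weight, feed, mean))
round(mean_by_feed, 1)
```

Nothing is left on the search path afterwards, so there is no detach() to forget.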

Using a subset of the data

attach(chickwts)
boxplot(weight~feed,
        subset(chickwts,weight > 250),
        col=topo.colors(6), # a color palette
        horizontal=F,
        main="Chick weight by feed type\n Weights > 250 g",
        xlab="feed type",
        ylab="weight, g",
        boxwex = 0.4)

detach(chickwts)

T-test

The t.test() function provides a variety of t-tests.

By default it assumes unequal variances (Welch's t-test); set var.equal=TRUE to assume equal variances, as in the example below.

Let's subset data from the last box plot and perform a t-test.

attach(chickwts)
set1<-"soybean"
set2<-"sunflower"
set1<-subset(weight,feed==set1)
set2<-subset(weight,feed==set2)
tt<-t.test(set1,set2,alternative='two.sided', conf.level=.95, 
       var.equal=TRUE)
tt
## 
##  Two Sample t-test
## 
## data:  set1 and set2
## t = -4.05, df = 24, p-value = 0.0004641
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -124.52  -40.45
## sample estimates:
## mean of x mean of y 
##     246.4     328.9
p<-format(tt$p.value, digits = 3)
p
## [1] "0.000464"
detach(chickwts)
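
To see the effect of the default, the same comparison can be run both ways. This is a sketch using the same two feed groups; the object names welch and pooled are assumptions for illustration.

```r
soybean <- chickwts$weight[chickwts$feed == "soybean"]
sunflower <- chickwts$weight[chickwts$feed == "sunflower"]

welch <- t.test(soybean, sunflower)                     # default: var.equal=FALSE
pooled <- t.test(soybean, sunflower, var.equal = TRUE)  # as in the example above

welch$method       # "Welch Two Sample t-test"
welch$parameter    # fractional degrees of freedom
pooled$parameter   # df = 24
c(welch = welch$p.value, pooled = pooled$p.value)
```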

Histograms

hist() plots a histogram

hist(mtcars$hp)

Arguments

  • breaks: breakpoints between histogram cells
  • freq: logical. If FALSE, probability densities are plotted instead of counts
  • col: fill color for bars
  • border: border color around bars
  • labels: logical or character. Labels are drawn on top of the bars if not FALSE
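
A quick way to see what breaks does is to compute the histogram without drawing it and inspect the result. A minimal sketch, assuming the mtcars data used above; breaks=10 is only a suggestion to hist(), so the boundaries actually used may differ.

```r
h <- hist(mtcars$hp, breaks = 10, plot = FALSE)  # compute without drawing
h$breaks         # cell boundaries actually used
h$counts         # number of cars in each cell
sum(h$counts)    # 32, one per row of mtcars

# The stored histogram can be drawn later with the usual arguments
plot(h, col = "blue", main = "Histogram of mtcars$hp")
```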

Examples

hist(mtcars$hp,col="blue",freq=F)

hist(mtcars$hp,col="blue",breaks=20)

hist(mtcars$hp,col="yellow",border="red",labels=T)

Plot layout

Use mfrow or mfcol to control multiple plot layout.

# 4 figures arranged in 2 rows and 2 columns
attach(mtcars)
par(mfrow=c(2,2))
plot(wt,mpg, main="Scatterplot of wt vs. mpg")
plot(wt,disp, main="Scatterplot of wt vs disp")
hist(wt, main="Histogram of wt")
boxplot(wt, main="Boxplot of wt")
detach(mtcars)

To reset the layout without clearing session info, run par(mfrow=c(1,1))
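
Another option is to save the old settings and restore them afterwards, since par() returns the previous values of whatever it changes. A small sketch; the name op and the choice of plots are illustrative.

```r
op <- par(mfrow = c(1, 2), las = 1)   # op holds the previous mfrow and las
hist(mtcars$mpg, main = "Histogram of mpg")
boxplot(mtcars$mpg, main = "Boxplot of mpg")
par(op)                               # restore the previous layout and label style
```

This restores every parameter that was changed in the first call, not just mfrow.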

Another example

Just run par(mfrow=c(2,2)) initially; each subsequent plot fills the next panel

ggplot2

Is there another way to plot this, other than by using base graphics?

Base graphics code

plot(cars,
     main="Speed and Stopping Distances of Cars")
grid()
x<-cars$speed
y<-cars$dist
fit<-lm(y~x) #fit the linear model
abline(fit, lty=3, col="blue")
pred<-predict(fit, interval = "confidence", level = 0.95)
lines(x,pred[,2],col="red",lty=2) #add lower CI band
lines(x,pred[,3],col="red",lty=2) #add upper CI band

Yes

We can use the package ggplot2

like this...

using ggplot...

library(ggplot2)
d<-ggplot(cars,aes(x=speed,y=dist))
d<-d + geom_point()
d<-d + geom_smooth(method="lm") #linear fit with 95% confidence band
d<-d + ggtitle("Speed and Stopping Distances of Cars")
d

or this

using qplot...

#recent ggplot2 versions ignore method= in qplot(), so add the fit as a layer
qplot(speed, dist, data = cars,
      main="Speed and Stopping Distances of Cars") + geom_smooth(method="lm")

ggplot2

One of the most popular R packages


Based on the "Grammar of Graphics" (Wilkinson, 1999), a general scheme for data visualization which breaks up graphs into semantic components such as scales and layers


As with most popular R packages, a vast amount of help exists online

Why use ggplot2 over base graphics?

  • Better looking plots than base R
  • Less code (or more systematic at least)
  • Easy faceting
  • Pre-loaded and customized themes, spatial data, pairs plots
  • Many plot types

Install ggplot2

install.packages("ggplot2")

Load ggplot2

library(ggplot2)

Let's see what data sets are included in ggplot2

data(package = "ggplot2")

Type ?diamonds to see a brief description of this dataset.

Two ways to plot in ggplot2

qplot

ggplot

qplot

qplot...

is short for quick plot.

concisely generates a plot with little code

uses similar syntax to the base plot command

doesn't allow full control over all graphical parameters, as ggplot does

Let's take a smaller subset from the diamonds dataset:

library(ggplot2)
set.seed(16)
data<-diamonds[sample(nrow(diamonds),500),] 
head(data)
##       carat       cut color clarity depth table price    x    y    z
## 36847  0.37     Ideal     F     VS2  61.1    57   957 4.67 4.63 2.84
## 13168  1.00   Premium     E     SI1  60.7    59  5445 6.38 6.41 3.88
## 24279  2.22      Fair     G     SI2  64.4    58 12508 8.32 8.15 5.30
## 12376  0.36     Ideal     E     SI1  60.9    57   597 4.58 4.62 2.80
## 46575  0.56     Ideal     E     SI1  62.8    58  1784 5.31 5.26 3.32
## 16785  1.31 Very Good     I     SI1  62.6    59  6686 6.97 6.86 4.33

using data from objects

qplot(data$cut, data$carat)

in data frames

qplot(carat, price, data = data)

add 95% confidence intervals

#recent ggplot2 versions ignore method= in qplot(), so add the fit as a layer
qplot(carat, price, data = data) + geom_smooth(method="lm")

against additional factor by shape

qplot(carat, price, data = data, shape=cut)

against additional factor by size

data2<-data[sample(nrow(data),50),] #even smaller subset
qplot(carat, price, data = data2, size=cut)

against additional factor by color

qplot(carat, price, data = data, color=clarity)

add 95% confidence intervals per additional factor

#recent ggplot2 versions ignore method= in qplot(), so add the fit as a layer
qplot(carat, price, data = data, color=clarity) + geom_smooth(method="lm")

Histograms in qplot

basic histogram

qplot(carat, data = data,
  geom="histogram")

change bin width

qplot(carat, data = data,
  geom="histogram", binwidth = 0.3)

Fill by additional factor

qplot(carat, data = data, fill=clarity,
  geom="histogram", binwidth = 0.1)

Unstack bars: use position="dodge"

qplot(carat, data = data, fill=clarity,
  geom="histogram", binwidth = 0.1, position="dodge")

Adjust bin width

qplot(carat, data = data, fill=clarity,
  geom="histogram", binwidth = 1, position="dodge")

Probability density

qplot(carat, data = data, fill=clarity,
  geom="density")

Set transparency using alpha; wrap the value in I() so it is used as a fixed setting rather than mapped as a variable

qplot(carat, data = data, fill=clarity,
  geom="density",alpha=I(0.1))

Boxplots in qplot

basic boxplot

qplot(cut, price, data = data, geom="boxplot")

Fill by the factor and get a quick legend

qplot(cut, price, data = data, geom="boxplot",fill=cut)

Set fixed fill color - I("color")

qplot(cut, price, data = data, geom="boxplot",fill=I("red"))

or a color palette

qplot(cut, price, data = data, geom="boxplot",fill=I(topo.colors(5)))

Fill by an additional factor

qplot(cut, price, data = data, geom="boxplot",fill=clarity)

Notice that color only applies to the box borders

qplot(cut, price, data = data, geom="boxplot",color=clarity)

Add data points jittered to reduce overplotting

qplot(cut, price, data = data, geom=c("boxplot","jitter"),fill=cut)

Facets in qplot

Facets are used for conditioning plots by one or two variables.

use facets=~a for one variable (a)

qplot(carat, data = data, fill=clarity,facets=~color,
  geom="density",alpha=I(0.1))

use facets=a~b for two variables (a and b)

qplot(carat, data = data, fill=clarity,facets=color~cut,
  geom="density",alpha=I(0.1))

scatter plot

qplot(carat, price, data = data, color=clarity,facets=color~cut)

boxplot

qplot(cut, price, data = data, geom="boxplot",fill=cut,facets=~color)

ggplot

How is a plot constructed using ggplot?

Data
Create the plot object from the data, which is always a data frame

ggplot(diamonds, aes(x=carat, y=price))

aes() defines the plot aesthetics: mappings of variables to visual properties of the plot
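
The distinction between mapping an aesthetic to a variable and setting it to a fixed value can be sketched as follows. The object names small, mapped and fixed are assumptions for this illustration, and every 100th row of diamonds is used only to keep drawing quick.

```r
library(ggplot2)

small <- diamonds[seq(1, nrow(diamonds), by = 100), ]  # every 100th row

# Mapped inside aes(): color varies with the clarity column and a legend is drawn
mapped <- ggplot(small, aes(x = carat, y = price)) +
  geom_point(aes(color = clarity))

# Set outside aes(): every point gets the same fixed color, no legend
fixed <- ggplot(small, aes(x = carat, y = price)) +
  geom_point(color = "blue")

mapped
fixed
```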

Layers
Add layers

geom_point()

geom_xxx are geometric objects, or plot types

A basic ggplot combines data with layers by using the + sign

ggplot(diamonds, aes(x=carat, y=price))+geom_point()

Additional layers

Facets
Conditioning on variable(s)

facet_wrap(~clarity)
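
For two conditioning variables, facet_grid() plays the role that facets=a~b plays in qplot. A brief sketch under the same assumptions as the qplot examples; the names small and p are illustrative, and every 100th row of diamonds is used to keep drawing quick.

```r
library(ggplot2)

small <- diamonds[seq(1, nrow(diamonds), by = 100), ]  # every 100th row
p <- ggplot(small, aes(x = carat, y = price)) + geom_point()

by_clarity <- p + facet_wrap(~clarity)     # one variable, panels wrap onto rows
by_color_cut <- p + facet_grid(color ~ cut)  # rows by color, columns by cut

by_clarity
by_color_cut
```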

Scales
Control the mapping between data and aesthetics

scale_y_log10()
ggtitle("Example")
ylab("price (USD)")

Adding the additional layers to the first example...

ggplot(diamonds)+
  geom_point(aes(x=carat, y=price,color=clarity))+
  facet_wrap(~cut)+
  scale_y_log10()+
  ggtitle("Example with more layers")+
  ylab("price (USD)")

Theme

Control non-data components of the plot

Starting point

d<-ggplot(data,aes(x=carat,y=price,color=clarity))
d<-d + geom_point()
d

Change axis labels

d<-ggplot(data,aes(x=carat,y=price,color=clarity))
d<-d + geom_point()
d<-d + labs(x="diamond carat", y="diamond price")
d

Add title

d<-ggplot(data,aes(x=carat,y=price,color=clarity))
d<-d + geom_point()
d<-d + labs(x="diamond carat", y="diamond price")
d<-d + labs(title="Carat-price relationship")
d

Modify title appearance

d<-ggplot(data,aes(x=carat,y=price,color=clarity))
d<-d + geom_point()
d<-d + labs(x="diamond carat", y="diamond price")
d<-d + labs(title="Carat-price relationship")
d<-d + theme(plot.title = element_text(size = rel(2), color = "blue"))
d

Black and white theme

d<-ggplot(data,aes(x=carat,y=price,color=clarity))
d<-d + geom_point()
d<-d + labs(x="diamond carat", y="diamond price")
d<-d + labs(title="Carat-price relationship")
d<-d + theme_bw()
d<-d + theme(plot.title = element_text(size = rel(2), color = "blue"))
d

Change legend location

d<-ggplot(data,aes(x=carat,y=price,color=clarity))
d<-d + geom_point()
d<-d + labs(x="diamond carat", y="diamond price")
d<-d + labs(title="Carat-price relationship")
d<-d + theme_bw()
d<-d + theme(plot.title = element_text(size = rel(2), color = "blue"))
d<-d + theme(legend.position = "bottom")
d

Additional themes are available in the package ggthemes

Solarized theme

Theme and color and fill scales based on the Solarized palette

library(ggthemes)
d<-ggplot(data,aes(x=carat,y=price,color=clarity)) + geom_point()
d<-d + theme_solarized() + scale_colour_solarized("blue")
d

Solarized dark

library(ggthemes)
d<-ggplot(data,aes(x=carat,y=price,color=clarity)) + geom_point()
d<-d + theme_solarized(light = F) + scale_colour_solarized("red")
d

Inverse gray theme

library(ggthemes)
d<-ggplot(data,aes(x=carat,y=price,color=clarity)) + geom_point()
d<-d + theme_igray()
d

There's even a theme best described by the package author:


"For that classic ugly look and feel. For ironic purposes only. 3D bars and pies not included. Please never use this theme."


You guessed it, an Excel theme!

library(ggthemes)
ggplot(data,aes(clarity, fill = cut)) + geom_bar() + scale_fill_excel() + theme_excel()

In ggplot2, par(mfrow=c(nrows, ncols)) doesn't work to arrange multiple plots.

Use the gridExtra package instead.

library(gridExtra)
library(ggthemes)
d<-ggplot(data,aes(x=carat,y=price,color=clarity)) + geom_point()
e<-d + theme_tufte()
f<-d + theme_solarized()
g<-d + theme_few()
grid.arrange(d,e,f,g,ncol=2,nrow=2)

Wrap-up

The base graphics in R are more than adequate in most cases to create whatever plot you wish

Use ggplot2 to make more refined plots

qplot and ggplot are two ways to plot in ggplot2

Another popular plotting package worth checking out is latticeExtra.

Plotting resources

Quick-R is again a nice resource for base plotting

docs.ggplot2.org is great for all things ggplot2

Questions?


Next session: Interactivity