A well-crafted graph can help you make meaningful comparisons among thousands of pieces of information, extracting patterns not easily found through other methods. This is one area where R shines.
R includes different packages for plotting data. Two packages build directly on top of the graphics engine: the graphics package and the grid package. These represent two largely incompatible graphics systems and they divide the bulk of graphics functionality in R into two separate worlds.
The graphics package contains a wide variety of functions for plotting data. This chapter gives an overview of the graphics package. We’ll discuss grid packages, especially lattice and ggplot2, in Chapter 7.
Consider the following five statements:
attach(mtcars)
plot(wt, mpg)
abline(lm(mpg~wt))
title("Regression of MPG on Weight")
detach(mtcars)
The first statement attaches the data frame mtcars. The second statement opens a graphics window and generates a scatter plot with automobile weight on the horizontal axis and miles per gallon on the vertical axis. The third statement adds a line of best fit and the fourth adds a title. The final statement detaches the data frame.
To save the graph as a PDF file instead, wrap the same statements between pdf() and dev.off():
pdf("mygraph.pdf")
attach(mtcars)
plot(wt, mpg)
abline(lm(mpg~wt))
title("Regression of MPG on Weight")
detach(mtcars)
dev.off()
In addition to pdf(), you can use the functions win.metafile() (Windows only), png(), jpeg(), bmp(), tiff(), xfig(), and postscript() to save graphs in other formats. See chapter 1, section 7.2 for more details on sending graphic output to files.
Via GUI
We describe the method for RStudio. On both Mac and Windows, select Export > Save as… from the Plots window, then choose the desired format and location in the resulting dialog.
If you just want to copy the image, click Zoom, right-click the plot in the zoom window, and select Copy Image. You can then paste it into an appropriate document, such as a Word file.
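The same export can also be scripted instead of done through the GUI. The following is a minimal sketch using png(); the file name and the pixel dimensions are illustrative assumptions, and the file is written to a temporary directory so that nothing in your project is overwritten.

```r
# Save the regression plot as a PNG file.
# out_file is a hypothetical name; replace it with your own path.
out_file <- file.path(tempdir(), "mygraph.png")

png(out_file, width = 800, height = 600)  # open the PNG device (size in pixels)
plot(mtcars$wt, mtcars$mpg)
abline(lm(mpg ~ wt, data = mtcars))
title("Regression of MPG on Weight")
invisible(dev.off())                      # close the device so the file is written

file.exists(out_file)
```

Nothing appears on screen while a file device is open; the graph is written only when dev.off() closes the device.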
Table 6.1
| Dosage | Response to Drug A | Response to Drug B |
|---|---|---|
| 20 | 16 | 15 |
| 30 | 20 | 18 |
| 40 | 27 | 25 |
| 45 | 40 | 31 |
| 60 | 60 | 40 |
dose <- c(20, 30, 40, 45, 60)
drugA <- c(16, 20, 27, 40, 60)
drugB <- c(15, 18, 25, 31, 40)
# Figure 2
plot(dose, drugA, type="b")
plot() is a generic function that can draw many types of objects, including vectors, tables, and time series. In this case, plot(x, y, type="b") places x on the horizontal axis and y on the vertical axis, plots the (x, y) data points, and connects them with line segments. The option type="b" indicates that both points and lines should be plotted.
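To see what the type option does, it can help to draw the same data with several values side by side. This is a small sketch rather than part of the original example; the choice of six type values and the 2 x 3 grid are assumptions made for illustration.

```r
dose  <- c(20, 30, 40, 45, 60)
drugA <- c(16, 20, 27, 40, 60)

# Six common values of the type argument:
# points, lines, both, overplotted, histogram-like, stair steps
types <- c("p", "l", "b", "o", "h", "s")

opar <- par(mfrow = c(2, 3))  # 2 x 3 grid; par() returns the old setting
for (tp in types) {
  plot(dose, drugA, type = tp, main = paste0('type = "', tp, '"'))
}
par(opar)                     # restore the previous layout
```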
Graphical parameters can be used to specify fonts, colors, line styles, axes, reference lines, and annotations. You can set them either with a call to the par() function or as arguments to a specific graphics function such as plot().
We'll discuss the lattice and ggplot2 packages in the next chapter. (These two packages have their own methods for customizing a graph's appearance.)
One way to specify graphical parameters is to pass optionname=value pairs directly to a high-level plotting function.
# Figure 3
plot(dose, drugA, type="b", lty=2, pch=17, lwd=3, cex=2)
This changes the line type to dashed (lty=2) and the point symbol to a solid triangle (pch=17), rather than the default solid line and open circle shown in Figure 2. lwd=3 makes the line three times wider than the default, and cex=2 makes the plotting symbols twice as large as the default. Table 6.2 lists the parameters for symbols and lines.
Table 6.2
| Parameter | Description |
|---|---|
| type | Specifies what type of plot should be drawn (see Figure 6.4). |
| pch | Specifies the symbol to use when plotting points (see Figure 6.5). |
| cex | Specifies the symbol size. cex is a number indicating the amount by which plotting symbols should be scaled relative to the default. 1=default, 1.5 is 50% larger, 0.5 is 50% smaller, and so forth. |
| lty | Specifies the line type (see Figure 6.5). |
| lwd | Specifies the line width. lwd is expressed relative to the default (default=1). For example, lwd=2 generates a line twice as wide as the default. |
Figure 6.4
Figure 6.5
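The symbol and line-type codes can also be drawn directly in R. The sketch below lays out the 26 standard symbols (pch 0 to 25) in a grid and then draws the six numbered line types; the grid layout and spacing are arbitrary choices made for illustration.

```r
# Plotting symbols: pch = 0 to 25, seven per row
pch_codes <- 0:25
col_pos   <- pch_codes %% 7    # column of each symbol in the grid
row_pos   <- pch_codes %/% 7   # row of each symbol in the grid

plot(col_pos, row_pos, pch = pch_codes, cex = 2,
     xlim = c(-0.5, 6.5), ylim = c(3.5, -0.5),  # reversed y-axis: row 0 on top
     axes = FALSE, ann = FALSE)
text(col_pos, row_pos, labels = pch_codes, pos = 1, cex = 0.7)

# Line types: lty = 1 to 6
plot(c(0, 1), c(1, 6), type = "n", axes = FALSE, ann = FALSE)
for (i in 1:6) {
  segments(0.25, i, 1, i, lty = i, lwd = 2)
  text(0.1, i, labels = paste("lty =", i))
}
```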
Table 6.3
| Parameter | Description |
|---|---|
| col | Default plotting color. Some functions (such as lines and pie) accept a vector of values that are recycled. For example, if col=c('red', 'blue') and three lines are plotted, the first line will be red, the second blue, and the third red. |
| col.axis | Color for axis text. |
| col.lab | Color for axis labels. |
| col.main | Color for titles. |
| col.sub | Color for subtitles. |
| fg | The plot’s foreground color. |
| bg | The plot’s background color. |
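
As a rough illustration of the color parameters, the sketch below specifies colors in three equivalent ways: by name, as a hexadecimal string, and with rgb(). The particular colors and the plot title are arbitrary choices, not taken from the original example.

```r
dose  <- c(20, 30, 40, 45, 60)
drugA <- c(16, 20, 27, 40, 60)
drugB <- c(15, 18, 25, 31, 40)

# A color name, a hexadecimal string, and a color built with rgb()
cols <- c("red", "#0000FF", rgb(0, 128, 0, maxColorValue = 255))

plot(dose, drugA, type = "b", col = cols[1],
     col.axis = "gray40", col.lab = "darkblue", col.main = cols[3],
     main = "Drug response", xlab = "Dosage", ylab = "Response")
lines(dose, drugB, type = "b", col = cols[2])
```

col2rgb() converts any of these specifications to red, green, and blue components, which is handy for checking that two colors really are the same.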
Table 6.4
| Parameter | Description |
|---|---|
| cex | Number indicating the amount by which plotted text should be scaled relative to the default. 1=default, 1.5 is 50% larger, 0.5 is 50% smaller, etc. |
| cex.axis | Magnification of axis text relative to cex. |
| cex.lab | Magnification of axis labels relative to cex. |
| cex.main | Magnification of titles relative to cex. |
| cex.sub | Magnification of subtitles relative to cex. |
Table 6.5
| Parameter | Description |
|---|---|
| font | Integer specifying the font to use for plotted text. 1=plain, 2=bold, 3=italic, 4=bold italic, 5=symbol (in Adobe symbol encoding). |
| font.axis | Font for axis text. |
| font.lab | Font for axis labels. |
| font.main | Font for titles. |
| font.sub | Font for subtitles. |
| ps | Font point size (roughly 1/72 inch). The text size = ps*cex. |
| family | Font family for drawing text. Standard values are serif, sans, and mono. |
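
The text size and font parameters can be combined in a single par() call. The following sketch, with arbitrarily chosen values, makes the axis labels italic and 1.5 times the default size and the title bold italic and twice the default size, then restores the original settings.

```r
dose  <- c(20, 30, 40, 45, 60)
drugA <- c(16, 20, 27, 40, 60)

opar <- par(no.readonly = TRUE)       # remember the current settings
par(font.lab = 3, cex.lab = 1.5,      # italic, larger axis labels
    font.main = 4, cex.main = 2)      # bold italic, larger title
active <- par("font.lab", "cex.lab")  # the values now in effect

plot(dose, drugA, type = "b",
     main = "Drug A", xlab = "Dosage", ylab = "Drug Response")
par(opar)                             # restore the original settings
```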
Table 6.6
| Parameter | Description |
|---|---|
| pin | Plot dimensions (width, height) in inches. |
| mai | Numerical vector indicating margin size where c(bottom, left, top, right) is expressed in inches. |
| mar | Numerical vector indicating margin size where c(bottom, left, top, right) is expressed in lines. The default is c(5, 4, 4, 2) + 0.1. |
The code par(pin=c(4,3), mai=c(1,.5, 1, .2)) produces graphs that are 4 inches wide by 3 inches tall, with a 1-inch margin on the bottom and top, a 0.5-inch margin on the left, and a 0.2-inch margin on the right.
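One way to check how margins and plot dimensions relate is to set the margins and then query the resulting sizes. This sketch sets only mai and reads back fin (the figure size) and pin; it assumes a device with no outer margins, which is the default, in which case the plot region is the figure region minus the margins.

```r
dose  <- c(20, 30, 40, 45, 60)
drugA <- c(16, 20, 27, 40, 60)

opar <- par(no.readonly = TRUE)
par(mai = c(1, 0.5, 1, 0.2))  # bottom, left, top, right margins in inches
plot(dose, drugA, type = "b")

# Sizes in inches: figure region, plot region, and margins
dims <- list(fin = par("fin"), pin = par("pin"), mai = par("mai"))
par(opar)

# Plot width  = figure width  - left margin   - right margin
# Plot height = figure height - bottom margin - top margin
expected_pin <- c(dims$fin[1] - dims$mai[2] - dims$mai[4],
                  dims$fin[2] - dims$mai[1] - dims$mai[3])
```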
# Figure 4
plot(dose, drugA, type="b",
col="red", lty=2, pch=2, lwd=2,
main="Clinical Trials for Drug A",
sub="This is hypothetical data",
xlab="Dosage", ylab="Drug Response",
xlim=c(0, 60), ylim=c(0, 70))
The plot() call above sets the title, subtitle, axis labels, and axis ranges directly. However, not all functions accept these options; see the help for the function of interest to find out which options it accepts. When an option isn't accepted, you can add further output to the plot using low-level graphics functions.
Note: Some high-level plotting functions include default titles and labels. They can be removed by adding ann=FALSE in the plot() statement or in a separate par() statement (which we'll discuss later).
Titles
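The title() function adds a title and axis labels to a plot. The format is
title(main="main title", sub="sub-title", xlab="x-axis label", ylab="y-axis label")
Graphical parameters such as text size and color can also be specified in title(). For example:
title(main="My Title", col.main="red",
      sub="My Sub-title", col.sub="blue",
      xlab="My X label", ylab="My Y label",
      col.lab="green", cex.lab=0.75)
Axes
The axis() function creates custom axes. The format is
axis(side, at=, labels=, pos=, lty=, col=, las=, tck=, ...)
| Parameter | Description |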
|---|---|
| side | An integer indicating the side of the graph to draw the axis (1=bottom, 2=left, 3=top, 4=right). |
| at | A numeric vector indicating where tick marks should be drawn. |
| labels | A character vector of labels to be placed at the tick marks (if NULL, the at values will be used). |
| pos | The coordinate at which the axis line is to be drawn (that is, the value on the other axis where it crosses). |
| lty | Line type. |
| col | The line and tick mark color. |
| las | Labels are parallel (=0) or perpendicular (=2) to the axis. |
| tck | Length of tick mark as a fraction of the plotting region (a negative number is outside the graph, a positive number is inside, 0 suppresses ticks, 1 creates gridlines); the default is 0.01. |
| (…) | Other graphical parameters. |
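
A short sketch of these options in use: the default axes are suppressed with xaxt="n" and yaxt="n", and axis() then draws custom ones. The tick positions and styling are arbitrary choices made for illustration.

```r
dose  <- c(20, 30, 40, 45, 60)
drugA <- c(16, 20, 27, 40, 60)

# Suppress both default axes, then draw custom ones
plot(dose, drugA, type = "b", xaxt = "n", yaxt = "n",
     xlab = "Dosage", ylab = "Drug Response")

axis(1, at = dose)                         # ticks exactly at the observed doses
ticks <- axis(2, at = seq(20, 60, by = 20),
              las = 2,                     # labels perpendicular to the axis
              col.axis = "red",
              tck = -0.02)                 # short ticks outside the plot
```

axis() invisibly returns the positions at which tick marks were drawn, which is why the result can be stored in ticks.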
Reference lines
abline(h=yvalues, v=xvalues)
Legend
legend(location, title, legend, ...)
Example
# Figure 5
#Specify data
x <- c(1:10)
y <- x
z <- 10/x
opar <- par(no.readonly=TRUE)
#Increase margins
par(mar=c(5, 4, 4, 8) + 0.1)
#Plot x versus y
plot(x, y, type="b",
pch=21, col="red",
yaxt="n", lty=3, ann=FALSE)
#Add x versus 10/x line
lines(x, z, type="b", pch=22, col="blue", lty=2)
#Draw your axes
axis(2, at=x, labels=x, col.axis="red", las=2)
axis(4, at=z, labels=round(z, digits=2),
col.axis="blue", las=2, cex.axis=0.7, tck=-.01)
#Add titles and text
mtext("y=10/x", side=4, line=3, cex.lab=1, las=2, col="blue")
title("An Example of Creative Axes",
xlab="X values",
ylab="Y=X")
par(opar)
par()
When you pass graphical parameters directly to plot(), these options are only in effect for that specific graph. To draw both drugs in the same style, you have to repeat them:
plot(dose, drugA, type="b", lty=2, pch=17,lwd=3, cex=2, col="red")
plot(dose, drugB, type="b", lty=2, pch=17,lwd=3, cex=2, col="blue")
Typing the same parameters several times makes the code tedious and error prone. Besides, as mentioned before, not all plotting functions allow you to specify all possible graphical parameters in this way. Alternatively, you can use the par() function.
The par() function sets values that will be in effect for the rest of the session or until they’re changed. The format is par(optionname=value, optionname=value, ...). Adding the no.readonly=TRUE option produces a list of current graphical settings that can be modified.
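The save-and-restore pattern can be checked directly, because par() invisibly returns the previous values of whatever it changes. The sketch below is illustrative only; the variable names are hypothetical.

```r
opar <- par(no.readonly = TRUE)  # snapshot of every modifiable setting

old    <- par(lty = 2, pch = 17) # change two settings; previous values are returned
during <- par("pch")             # value in effect now
par(opar)                        # restore everything from the snapshot
after  <- par("pch")             # back to the value before the change

c(before = old$pch, during = during, after = after)
```

Restoring from the snapshot is safer than resetting parameters one by one, since it also undoes changes you may have forgotten about.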
Continuing our example, we can rewrite the code using par():
# Figure 7
opar <- par(no.readonly=TRUE)
par(lty=2, pch=17,lwd=3, cex=2)
plot(dose, drugA, type="b", col="red")
plot(dose, drugB, type="b", col="blue")
par(opar)
For the next examples, we'll use the toxins.and.cancer data set from the nutshell package:
# If you haven't installed the nutshell package, first run
#install.packages("nutshell")
library(nutshell)
data(toxins.and.cancer)
To draw a scatter plot, use the plot() function. Let's compare the overall cancer rate (number of cancer deaths divided by state population) to the presence of toxins (total toxic chemicals released divided by state area):
# Figure 8
attach(toxins.and.cancer)
plot(total_toxic_chemicals/Surface_Area,deaths_total/Population)
Suppose that you wanted to know which states were associated with which points. R provides some interactive tools for identifying points on plots.
We can use the locator function to tell us the coordinates of a specific point (or set of points). To do this, first plot the data. Next, type locator(1). Then click on a point in the open graphics window. You would see results like this:
locator(1)
$x
[1] 0.0005163023
$y
[1] 0.002481683
Another useful function for identifying points is identify(). We pass three arguments to identify(): the x-axis variable, the y-axis variable, and the variable whose values we would like to see for each point. Clicking on a point in the plot then labels it with the value of that variable. In RStudio, click Finish to exit the function. The numbers that identify() returns are the row indices of the selected points.
To label all of the points at once, you could use the text function to add labels to the plot.
# Figure 9
plot(air_on_site/Surface_Area, deaths_lung/Population,
xlab="Air Release Rate of Toxic Chemicals",
ylab="Lung Cancer Death Rate")
text(air_on_site/Surface_Area, deaths_lung/Population,
labels=State_Abbrev,
cex=0.5,
pos=4)
Common options for the text() and mtext() functions are listed below.
| Parameter | Description |
|---|---|
| location | Location can be an x,y coordinate. Alternatively, the text can be placed interactively via mouse by specifying location as locator(1). |
| pos | Position relative to location. 1 = below, 2 = left, 3 = above, 4 = right. If you specify pos, you can specify offset= in percent of character width. |
| side | Which margin to place text in, where 1 = bottom, 2 = left, 3 = top, 4 = right. You can specify line= to indicate the line in the margin starting with 0 (closest to the plot area) and moving out. You can also specify adj=0 for left/bottom alignment or adj = 1 for top/right alignment. |
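
A small sketch of these options using the drug data from earlier in the chapter: text() labels each point with its response value, and mtext() writes a note in the top margin. The axis limits and the wording of the note are arbitrary choices.

```r
dose  <- c(20, 30, 40, 45, 60)
drugA <- c(16, 20, 27, 40, 60)
point_labels <- as.character(drugA)  # one label per point

plot(dose, drugA, type = "b",
     xlim = c(15, 65), ylim = c(10, 68),  # extra room for the labels
     xlab = "Dosage", ylab = "Drug Response")

# pos = 3 places each label above its point
text(dose, drugA, labels = point_labels, pos = 3, cex = 0.8)

# side = 3 is the top margin; adj = 0 left-aligns the text
mtext("Hypothetical data", side = 3, line = 0.5, adj = 0, cex = 0.8)
```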
If you have a data frame with n different variables and you would like to generate a scatter plot for each pair of values in the data frame, try the pairs function. As an example, let’s plot the hits, runs, strikeouts, walks, and home runs for each Major League Baseball (MLB) player who had more than 100 at bats in 2008.
# Figure 10
library(nutshell)
data(batting.2008)
pairs(batting.2008[batting.2008$AB>100, c("H","R","SO","BB","HR")])
R provides a plot method for time series objects (plot.ts):
plot(x, y = NULL, plot.type = c("multiple", "single"),
xy.labels, xy.lines, panel = lines, nc, yax.flip = FALSE,
mar.multi = c(0, 5.1, 0, if(yax.flip) 5.1 else 2.1),
oma.multi = c(6, 0, 5, 0), axes = TRUE, ...)
The arguments x and y specify ts objects, panel specifies how to plot the time series (by default, lines), and other arguments specify how to break time series into different plots (as in lattice). As an example, we'll plot the turkey price data:
# Figure 11
library(nutshell)
data(turkey.price.ts)
plot(turkey.price.ts)
As you can see, turkey prices are very seasonal. There are huge sales in November and December (for Thanksgiving and Christmas) and minor sales in spring (probably for Easter).
Another way to look at seasonal effects is to plot the autocorrelation function of the time series. The plot is generated by default when you call acf:
# Figure 12
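acf(turkey.price.ts)
To draw bar charts, use the barplot function. As an example, we'll use the doctorates data set from the nutshell package:
library(nutshell)
data(doctorates)
# Figure 13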
# make this into a matrix:
doctorates.m <- as.matrix(doctorates[2:7])
rownames(doctorates.m) <- doctorates[,1]
doctorates.m
##      engineering science education health humanities other
## 2001        5323   20643      6436   1591       5213  2159
## 2002        5511   20017      6349   1541       5178  2141
## 2003        5079   19529      6503   1654       5051  2209
## 2004        5280   20001      6643   1633       5020  2180
## 2005        5777   20498      6635   1720       5013  2480
## 2006        6425   21564      6226   1785       4949  2436
This code uses statements covered in earlier chapters; review them if anything is unclear.
Let’s start by just showing a bar plot of doctorates in 2001 by type:
# Figure 14
barplot(doctorates.m[1,])
# Figure 15
barplot(doctorates.m,beside=TRUE,horiz=TRUE,legend=TRUE,cex.names=.55,
col=c("yellow","gray","orange","blue","red","purple"))
# Figure 16
barplot(t(doctorates.m),legend=TRUE,ylim=c(0,66000))
You can work out how the parameters beside, horiz, and col are used by comparing the two charts above. See the help page (?barplot) if you want more details about barplot parameters.
The following examples show other plot types provided by the graphics package.
# Figure 17
boxplot(mtcars$wt, main="Boxplot of wt")
# Figure 18
domestic.catch.2006 <-c(7752,1166,463,1108)
names(domestic.catch.2006) <-c("one","two","three","four")
pie(domestic.catch.2006,init.angle=90)
# Figure 19
x <- seq(-pi, pi, length.out=50)
y <- x
f <- outer(x, y, function(x, y) cos(y)/(1+x^2))
fa <- (f-t(f))/2
persp(x, y, fa, theta=20, phi=30)
# Figure 20
image(x, y, fa)
# Figure 21
contour(x,y,f)
You can combine multiple graphs into one overall graph using either the par() or layout() function.
With the par() function, you can include the graphical parameter mfrow=c(nrows, ncols) to create a matrix of nrows x ncols plots that are filled in by row. Alternatively, you can use mfcol=c(nrows, ncols) to fill the matrix by columns.
# Figure 22
attach(mtcars)
opar <- par(no.readonly=TRUE)
par(mfrow=c(2,2))
plot(wt,mpg, main="Scatterplot of wt vs. mpg")
plot(wt,disp, main="Scatterplot of wt vs disp")
barplot(wt, main="Barplot of wt")
boxplot(wt, main="Boxplot of wt")
par(opar)
detach(mtcars)
The layout() function has the form layout(mat), where mat is a matrix object specifying the location of the multiple plots to combine. In the following code, one figure is placed in row 1 and two figures are placed in row 2:
# Figure 24
attach(mtcars)
layout(matrix(c(1,1,2,3), 2, 2, byrow = TRUE))
plot(wt,mpg)
plot(wt,disp)
barplot(wt)
detach(mtcars)
Exercise 1: Using the dose and drugB data, add the appropriate graphical parameters to your plotting function to reproduce the following graph. Then save the graph as a PDF.
Exercise 2: The MASS library contains the Boston data set, which records medv (median house value) for 506 neighborhoods around Boston. Draw a plot to explore the relationship between medv and lstat (percent of households with low socioeconomic status). Find the value of age (proportion of owner-occupied units built before 1940) for the lower-left point. (You can use names() to see all the variable names in the data set if you are interested.)
Exercise 3: Using the mtcars and drug data, draw the following graph. (In the second graph, the red text shows the row names of mtcars.)