1 An overview of R graphics

A well-crafted graph can help you make meaningful comparisons among thousands of pieces of information, extracting patterns not easily found through other methods. This is one area where R shines.

R includes different packages for plotting data. Two packages build directly on top of the graphics engine: the graphics package and the grid package. These represent two largely incompatible graphics systems and they divide the bulk of graphics functionality in R into two separate worlds.

The graphics package contains a wide variety of functions for plotting data. This chapter gives an overview of the graphics package. We’ll discuss grid-based packages, especially lattice and ggplot2, in Chapter 7.

In this chapter,

  • We first review general methods for working with graphs.
    • How to create and save graphs.
    • How to modify the features that are found in any graph.
  • Then, we’ll focus on specific types of graphs, including
    • Scatter Plots
    • Time Series
    • Bar Charts
    • Other Plots
  • Finally, we’ll investigate ways to combine multiple graphs into one overall graph.

2 Working with graphics

2.1 Create and save graphs

  • Consider the following five lines:
attach(mtcars) 
plot(wt, mpg) 
abline(lm(mpg~wt)) 
title("Regression of MPG on Weight") 
detach(mtcars) 
  • First we attach the data frame mtcars. The second statement opens a graphics window and generates a scatter plot with automobile weight on the horizontal axis and miles per gallon on the vertical axis. The third statement adds a line of best fit and the fourth adds a title. Finally, we detach the data frame.
  • In R, graphs are typically created in this interactive fashion.
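The same figure can also be built without attach(), which avoids leaving the data frame on the search path if one of the statements fails. The following is a minimal sketch of one alternative, using the formula interface and the data= argument; the object name fit is only an illustrative choice.

```r
# Scatter plot of mpg against wt, reading the columns directly from mtcars
plot(mpg ~ wt, data = mtcars)

# Fit the regression once so the line can be drawn and the fit inspected
fit <- lm(mpg ~ wt, data = mtcars)
abline(fit)
title("Regression of MPG on Weight")
```

Because fit is stored, coef(fit) can be used afterwards to read off the intercept and slope of the plotted line.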

  • You can save your graphs via code or through GUI menus.
  • Using code
    • To save a graph via code, sandwich the statements that produce the graph between a statement that sets a destination and a statement that closes that destination.
    pdf("mygraph.pdf")
     attach(mtcars)
     plot(wt, mpg)
     abline(lm(mpg~wt))
     title("Regression of MPG on Weight")
     detach(mtcars)
    dev.off()
    • In addition to pdf(), you can use the functions png(), jpeg(), bmp(), tiff(), xfig(), postscript(), and (on Windows only) win.metafile() to save graphs in other formats. See Chapter 1, Section 7.2 for more details on sending graphic output to files.
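As a sketch of the same sandwich pattern with a bitmap format, the code below writes the scatter plot to a PNG file. The file location (a temporary directory) and the 800 x 600 pixel size are arbitrary assumptions, and png() must be supported by your R build.

```r
# Destination file in the session's temporary directory (arbitrary choice)
outfile <- file.path(tempdir(), "mygraph.png")

png(outfile, width = 800, height = 600)   # open the PNG device
plot(mtcars$wt, mtcars$mpg)
abline(lm(mpg ~ wt, data = mtcars))
title("Regression of MPG on Weight")
dev.off()                                 # close the device so the file is written

file.exists(outfile)
```

Nothing appears on screen while the file device is open; the graph goes only to the file until dev.off() is called.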
  • Via GUI

    We discuss the methods in RStudio.
    • On both Mac and Windows platforms, select Export > Save as… from the Plots window, and choose the desired format and location in the resulting dialog.

    • If you just want to copy the image, click Zoom, right-click in the plot zoom window, and select Copy Image. You can then paste it into an appropriate file, such as a Word document.

2.2 Using graphical parameters

  • Let’s start with a simple example. The following dataset describes patient response to two drugs at five dosage levels.

Table 6.1

Dosage   Response to Drug A   Response to Drug B
20       16                   15
30       20                   18
40       27                   25
45       40                   31
60       60                   40
  • These data can be entered from the keyboard:
dose  <- c(20, 30, 40, 45, 60) 
drugA <- c(16, 20, 27, 40, 60) 
drugB <- c(15, 18, 25, 31, 40) 
  • A simple line graph relating dose to response for drug A can be created by using
# Figure 2
plot(dose, drugA, type="b") 

plot() is a generic function that can draw many types of objects, including vectors, tables, and time series. In this case, plot(x, y, type="b") places x on the horizontal axis and y on the vertical axis, plots the (x, y) data points, and connects them with line segments. The option type="b" indicates that both points and lines should be plotted.

Graphical parameters can be used to specify fonts, colors, line styles, axes, reference lines, and annotations. You set them either with a call to the par() function or as arguments to a particular graphics function such as plot().

  • Note: The methods discussed here work on all the graphs described in this class, with the exception of those created with the lattice and ggplot2 packages, which are covered in the next chapter. (These two packages have their own methods for customizing a graph’s appearance.)

2.2.1 Adding arguments to plot()

  • One way to specify graphical parameters is by providing the optionname=value pairs directly to a high-level plotting function.
    • (Not all high-level plotting functions allow you to specify all possible graphical parameters, but we’ll introduce this method first.)
    # Figure 3
    plot(dose, drugA, type="b", lty=2, pch=17, lwd=3, cex=2)

    • This statement changes the line type to dashed (lty=2) and the point symbol to a solid triangle (pch=17), rather than the default solid line and open circle seen in Figure 2.
    • lwd=3 makes the line three times wider than the default width, and cex=2 makes the plotting symbols twice as large as the default size.
  • The parameters for symbols and lines are summarized in Table 6.2.

Table 6.2

Parameter Description
type Specifies what type of plot should be drawn (see Figure 6.4).
pch Specifies the symbol to use when plotting points (see Figure 6.5).
cex Specifies the symbol size. cex is a number indicating the amount by which plotting symbols should be scaled relative to the default. 1=default, 1.5 is 50% larger, 0.5 is 50% smaller, and so forth.
lty Specifies the line type (see Figure 6.5).
lwd Specifies the line width. lwd is expressed relative to the default (default=1). For example, lwd=2 generates a line twice as wide as the default.

Figure 6.4

Figure 6.5
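To explore the type and pch values listed in Table 6.2 on your own device, something like the following sketch can be used. It reuses the dose and drugA vectors from the example above; the particular set of type values and the 2 x 3 grid are illustrative choices.

```r
dose  <- c(20, 30, 40, 45, 60)
drugA <- c(16, 20, 27, 40, 60)

# Draw the same data with several values of type= in a 2 x 3 grid
types <- c("p", "l", "b", "o", "s", "h")
opar <- par(no.readonly = TRUE)
par(mfrow = c(2, 3))
for (tp in types) {
  plot(dose, drugA, type = tp, main = paste0('type="', tp, '"'))
}
par(opar)   # back to one plot per page

# Show plotting symbols 0 to 25 in one row, each labelled with its pch number
plot(0:25, rep(1, 26), pch = 0:25, ylim = c(0.5, 1.5),
     axes = FALSE, xlab = "pch", ylab = "")
text(0:25, rep(0.8, 26), labels = 0:25, cex = 0.7)
```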

  • To change colors, use the following parameters.

Table 6.3

Parameter Description
col Default plotting color. Some functions (such as lines and pie) accept a vector of values that are recycled. For example, if col=c('red', 'blue') and three lines are plotted, the first line will be red, the second blue, and the third red.
col.axis Color for axis text.
col.lab Color for axis labels.
col.main Color for titles.
col.sub Color for subtitles.
fg The plot’s foreground color.
bg The plot’s background color.
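Colors can be given by name, as in the table, but base R also accepts hexadecimal strings and the output of rgb(); this is general R behaviour rather than something specific to this chapter. The sketch below shows these forms and spells out the recycling rule from the col row explicitly with rep_len(). The variable names are illustrative.

```r
# Equivalent specifications of pure red
red_hex <- rgb(1, 0, 0)     # hexadecimal string built from red, green, blue in [0, 1]
red_rgb <- col2rgb("red")   # named colour converted to 0-255 components

# Recycling: two colours spread over three lines
x <- 1:5
cols <- rep_len(c("red", "blue"), 3)   # third entry reuses the first colour
plot(x, x, type = "n", ylim = c(0, 15), ylab = "y")   # empty frame
for (i in 1:3) {
  lines(x, i * x, col = cols[i])
}
```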
  • To specify text size, font and style

Table 6.4

Parameter Description
cex Number indicating the amount by which plotted text should be scaled relative to the default. 1=default, 1.5 is 50% larger, 0.5 is 50% smaller, etc.
cex.axis Magnification of axis text relative to cex.
cex.lab Magnification of axis labels relative to cex.
cex.main Magnification of titles relative to cex.
cex.sub Magnification of subtitles relative to cex.

Table 6.5

Parameter Description
font Integer specifying font to use for plotted text. 1=plain, 2=bold, 3=italic, 4=bold italic, 5=symbol (in Adobe symbol encoding).
font.axis Font for axis text.
font.lab Font for axis labels.
font.main Font for titles.
font.sub Font for subtitles.
ps Font point size (roughly 1/72 inch). The text size = ps*cex.
family Font family for drawing text. Standard values are serif, sans, and mono.
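As a small illustration of Tables 6.4 and 6.5, the sketch below sets italic axis labels at 1.5 times the default size and a bold italic title at twice the default size, using the serif family. The specific values are arbitrary, and the settings are restored afterwards.

```r
dose  <- c(20, 30, 40, 45, 60)
drugA <- c(16, 20, 27, 40, 60)

opar <- par(no.readonly = TRUE)

# Italic labels (font 3) at 1.5x, bold italic title (font 4) at 2x, serif text
par(font.lab = 3, cex.lab = 1.5, font.main = 4, cex.main = 2, family = "serif")
plot(dose, drugA, type = "b",
     main = "Drug A", xlab = "Dosage", ylab = "Response")

# Query the values that were in effect for this plot
settings <- par("font.lab", "cex.lab", "font.main", "cex.main")

par(opar)
```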
  • To control the plot dimensions and margin sizes

Table 6.6

Parameter Description
pin Plot dimensions (width, height) in inches.
mai Numerical vector indicating margin size where c(bottom, left, top, right) is expressed in inches.
mar Numerical vector indicating margin size where c(bottom, left, top, right) is expressed in lines. The default is c(5, 4, 4, 2) + 0.1.
  • The code par(pin=c(4,3), mai=c(1,.5, 1, .2)) produces graphs that are 4 inches wide by 3 inches tall, with a 1-inch margin on the bottom and top, a 0.5-inch margin on the left, and a 0.2-inch margin on the right.
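The mar and mai parameters describe the same margins in different units. A sketch of how they relate, assuming the default mex setting: one margin line is par("csi") * par("mex") inches tall, so multiplying mar by that height should reproduce mai. The variable names are illustrative.

```r
opar <- par(no.readonly = TRUE)

par(mar = c(5, 4, 4, 2) + 0.1)           # margins in lines of text (the default)
line_height <- par("csi") * par("mex")   # height of one margin line, in inches
mar_lines   <- par("mar")
mai_inches  <- par("mai")                # the same margins, reported in inches

# The two descriptions agree once lines are converted to inches
isTRUE(all.equal(mai_inches, mar_lines * line_height))

par(opar)
```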

    • To include axis and text options.
    # Figure 4
    plot(dose, drugA, type="b",
         col="red", lty=2, pch=2, lwd=2,
         main="Clinical Trials for Drug A",
         sub="This is hypothetical data",
         xlab="Dosage", ylab="Drug Response",
         xlim=c(0, 60), ylim=c(0, 70))

2.2.2 Using low-level graphics functions

However, not all functions accept these options; see the help for the function of interest to find out which ones it accepts. When an option isn’t accepted, you can instead add further output to an existing plot using low-level graphics functions.

  • Note: Some high-level plotting functions include default titles and labels. They can be removed by adding ann=FALSE in the plot() statement or in a separate par() statement (which we’ll discuss later).

  • Titles

title(main="main title", sub="sub-title", xlab="x-axis label", ylab="y-axis label") 
title(main="My Title", col.main="red", 
      sub="My Sub-title", col.sub="blue", 
      xlab="My X label", ylab="My Y label", 
      col.lab="green", cex.lab=0.75)
  • Axes
axis(side, at=, labels=, pos=, lty=, col=, las=, tck=, ...) 
Parameter Description
side An integer indicating the side of the graph to draw the axis (1=bottom, 2=left, 3=top, 4=right).
at A numeric vector indicating where tick marks should be drawn.
labels A character vector of labels to be placed at the tick marks (if NULL, the at values will be used).
pos The coordinate at which the axis line is to be drawn (that is, the value on the other axis where it crosses).
lty Line type.
col The line and tick mark color.
las Labels are parallel (=0) or perpendicular (=2) to the axis.
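tck Length of tick mark as a fraction of the plotting region (a negative number is outside the graph, a positive number is inside, 0 suppresses ticks, 1 creates gridlines). By default tck=NA, in which case the tick length is taken from tcl (default -0.5, that is, half a line of text outside the graph).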
(…) Other graphical parameters.
  • Reference lines

    abline(h=yvalues, v=xvalues) 
  • Legend

    legend(location, title, legend, ...) 
  • Example

    # Figure 5
    #Specify data
    x <- c(1:10) 
    y <- x 
    z <- 10/x 
    
    opar <- par(no.readonly=TRUE)
    
    #Increase margins
    par(mar=c(5, 4, 4, 8) + 0.1) 
    
    #Plot x versus y
    plot(x, y, type="b",      
     pch=21, col="red", 
     yaxt="n", lty=3, ann=FALSE) 
    
    #Add x versus 10/x line
    lines(x, z, type="b", pch=22, col="blue", lty=2)
    
    #Draw your axes
    axis(2, at=x, labels=x, col.axis="red", las=2) 
    
    axis(4, at=z, labels=round(z, digits=2),      
     col.axis="blue", las=2, cex.axis=0.7, tck=-.01) 
    
    #Add titles and text
    mtext("y=10/x", side=4, line=3, cex.lab=1, las=2, col="blue") 
    
    title("An Example of Creative Axes",       
      xlab="X values",       
      ylab="Y=X")  
    
    par(opar) 
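The abline() and legend() functions listed above are not used in Figure 5, so here is a separate sketch that combines them with the drug data from Table 6.1. The reference level of 30, the colors, and the legend position are arbitrary illustrative choices.

```r
dose  <- c(20, 30, 40, 45, 60)
drugA <- c(16, 20, 27, 40, 60)
drugB <- c(15, 18, 25, 31, 40)

# Drug A as red squares with a solid line; drug B added with lines()
plot(dose, drugA, type = "b", pch = 15, lty = 1, col = "red",
     ylim = c(0, 60), xlab = "Dosage", ylab = "Drug Response",
     main = "Drug A vs. Drug B")
lines(dose, drugB, type = "b", pch = 17, lty = 2, col = "blue")

# Horizontal reference line at a response of 30
abline(h = 30, lty = 2, col = "gray")

# The legend's lty, pch, and col must repeat what was used for each series
lg <- legend("topleft", inset = 0.05, title = "Drug Type",
             legend = c("A", "B"), lty = c(1, 2), pch = c(15, 17),
             col = c("red", "blue"))
```

legend() does not look at the plot to work out what was drawn, so keeping its lty, pch, and col vectors in the same order as the plotted series is what keeps the key accurate.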

2.2.3 Function par()

  • Parameters passed to plot() are in effect only for that specific graph.
  • So, to plot two graphs with the same parameters, we have to type:
plot(dose, drugA, type="b", lty=2, pch=17,lwd=3, cex=2, col="red")
plot(dose, drugB, type="b", lty=2, pch=17,lwd=3, cex=2, col="blue")
  • Typing the same parameters several times makes the code tedious and error-prone. Besides, as mentioned before, not all plotting functions allow you to specify all possible graphical parameters in this way. Alternatively, we can use the par() function.

  • The par() function sets values that will be in effect for the rest of the session or until they’re changed. The format is par(optionname=value, optionname=value, ...). Adding the no.readonly=TRUE option produces a list of current graphical settings that can be modified.

  • Continuing our example, we write the code with par()

# Figure 7
opar <- par(no.readonly=TRUE) 
par(lty=2, pch=17,lwd=3, cex=2)
plot(dose, drugA, type="b", col="red") 
plot(dose, drugB, type="b", col="blue") 
par(opar) 
  • The first statement makes a copy of the current settings. The second statement changes the default parameters. We then generate the two plots and restore the original settings.
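When the plotting code lives inside a function, one common way to guarantee the restore step is on.exit(). The sketch below assumes a hypothetical helper named plot_drug(), which is not part of any package; it relies on the fact that par() returns the previous values of the parameters it changes.

```r
dose  <- c(20, 30, 40, 45, 60)
drugA <- c(16, 20, 27, 40, 60)
drugB <- c(15, 18, 25, 31, 40)

# plot_drug() is a hypothetical helper written for this illustration
plot_drug <- function(response, colour) {
  old <- par(lty = 2, pch = 17, lwd = 3, cex = 2)  # par() returns the old values
  on.exit(par(old))                                # restore them on exit, even after an error
  plot(dose, response, type = "b", col = colour)
}

before <- par("lty", "pch", "lwd", "cex")
plot_drug(drugA, "red")
plot_drug(drugB, "blue")
after <- par("lty", "pch", "lwd", "cex")

isTRUE(all.equal(before, after))   # settings are unchanged outside the function
```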

3 Specific Types of Graphs

3.1 Scatter Plots

  • Starting with this section, we’ll introduce some specific types of graphs in the graphics package.
  • To show how to use scatter plots, we will look at cases of cancer in 2008 and toxic waste releases by state in 2006. Data on new cancer cases (and deaths from cancer) are tabulated by the American Cancer Society; information on toxic chemicals released into the environment is tabulated by the U.S. Environmental Protection Agency (EPA).
  • The sample data is included in the nutshell package:
# If you haven't installed the nutshell package, first run
#install.packages("nutshell") 
library(nutshell)
data(toxins.and.cancer) 
  • To show a scatter plot, we can also use the plot() function. Let’s compare the overall cancer rate (number of cancer deaths divided by state population) to the presence of toxins (total toxic chemicals release divided by state area):
# Figure 8
attach(toxins.and.cancer)
plot(total_toxic_chemicals/Surface_Area,deaths_total/Population)

  • Suppose that you wanted to know which states were associated with which points. R provides some interactive tools for identifying points on plots.

    • We can use the locator function to tell us the coordinates of a specific point (or set of points). To do this, first plot the data. Next, type locator(1). Then click on a point in the open graphics window. You would see results like this:

      locator(1)
      $x
      [1] 0.0005163023

      $y
      [1] 0.002481683

    • Another useful function for identifying points is identify(). We pass in three arguments to identify(): the x-axis variable, the y-axis variable, and the variable whose values we would like to see printed for each point. Then clicking on a given point in the plot will cause R to print the value of the variable of interest. You can click finish to exit the function. The numbers printed under the identify() function correspond to the rows for the selected points.

    • To label all of the points at once, you could use the text function to add labels to the plot.

      # Figure 9
      plot(air_on_site/Surface_Area, deaths_lung/Population,
      xlab="Air Release Rate of Toxic Chemicals",
      ylab="Lung Cancer Death Rate") 
      
      text(air_on_site/Surface_Area, deaths_lung/Population,
      labels=State_Abbrev,
      cex=0.5,
      pos=4)

Common options for text() and mtext():

Parameter Description
location Location can be an x,y coordinate. Alternatively, the text can be placed interactively via mouse by specifying location as locator(1).
pos Position relative to location. 1 = below, 2 = left, 3 = above, 4 = right. If you specify pos, you can specify offset= as a fraction of a character width.
side Which margin to place text in, where 1 = bottom, 2 = left, 3 = top, 4 = right. You can specify line= to indicate the line in the margin starting with 0 (closest to the plot area) and moving out. You can also specify adj=0 for left/bottom alignment or adj = 1 for top/right alignment.
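The options in this table can be tried without the nutshell package. The sketch below uses the built-in mtcars data instead of the toxins data; the label size, offset, and margin note are arbitrary choices, and the identify() call is wrapped in interactive() because it only makes sense at the console.

```r
plot(mtcars$wt, mtcars$mpg, pch = 18, col = "blue",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Label every point with its row name, to the right of the point (pos = 4)
car_labels <- rownames(mtcars)
text(mtcars$wt, mtcars$mpg, labels = car_labels,
     cex = 0.6, pos = 4, offset = 0.3, col = "red")

# Put a note in the top margin (side = 3), on line 0, left-aligned (adj = 0)
mtext("Data: mtcars", side = 3, line = 0, adj = 0, cex = 0.8)

# Click-to-label only works in an interactive session
if (interactive()) {
  identify(mtcars$wt, mtcars$mpg, labels = car_labels)
}
```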
  • If you have a data frame with n different variables and you would like to generate a scatter plot for each pair of values in the data frame, try the pairs function. As an example, let’s plot the hits, runs, strikeouts, walks, and home runs for each Major League Baseball (MLB) player who had more than 100 at bats in 2008.

    # Figure 10
    library(nutshell)
    data(batting.2008) 
    pairs(batting.2008[batting.2008$AB>100, c("H","R","SO","BB","HR")]) 

3.2 Plotting Time Series

  • R includes tools for plotting time series data. The plot function has a method for time series:
plot(x, y = NULL, plot.type = c("multiple", "single"),
    xy.labels, xy.lines, panel = lines, nc, yax.flip = FALSE,
    mar.multi = c(0, 5.1, 0, if(yax.flip) 5.1 else 2.1),
    oma.multi = c(6, 0, 5, 0), axes = TRUE, ...) 
  • The arguments x and y specify ts objects, panel specifies how to plot the time series (by default, lines), and other arguments specify how to break time series into different plots (as in lattice). As an example, we’ll plot the turkey price data:

    # figure 11
    library(nutshell) 
    data(turkey.price.ts)
    plot(turkey.price.ts)

    As you can see, turkey prices are very seasonal. There are huge sales in November and December (for Thanksgiving and Christmas) and minor sales in spring (probably for Easter).

  • Another way to look at seasonal effects is to plot the autocorrelation function of the time series. The plot is generated by default when you call acf().

# Figure 12
acf(turkey.price.ts) 

  • As you can see, points are correlated over 12-month cycles (and inversely correlated over 6-month cycles). Time series analysis is discussed further in Chapter 9.
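To see the 12-month and 6-month pattern in isolation, the sketch below builds a synthetic monthly series with ts() and computes its autocorrelations. The values are made up (a pure sine wave with a 12-month period) and have nothing to do with the turkey price data; the start date is arbitrary.

```r
# Ten years of synthetic monthly data with a pure 12-month cycle
months <- 1:120
x <- ts(sin(2 * pi * months / 12), start = c(2001, 1), frequency = 12)

plot(x, ylab = "Synthetic value")

# Compute the autocorrelations without plotting, then plot them
a <- acf(x, plot = FALSE)
plot(a, main = "ACF of a 12-month cycle")

# Element k + 1 of a$acf is the autocorrelation at a lag of k months
lag6  <- a$acf[7]    # half a cycle apart: negative
lag12 <- a$acf[13]   # one full cycle apart: positive
```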

3.3 Bar Charts

  • To draw bar (or column) charts in R, use the barplot function.
  • As an example, let’s look at doctoral degrees awarded in the United States between 2001 and 2006.
library(nutshell) 
data(doctorates) 
  • The barplot function expects a vector or matrix rather than a data frame, so let’s first transform the data into a matrix for plotting:
# Figure 13
# make this into a matrix: 
doctorates.m <- as.matrix(doctorates[2:7]) 
rownames(doctorates.m) <- doctorates[,1] 
doctorates.m
##      engineering science education health humanities other
## 2001        5323   20643      6436   1591       5213  2159
## 2002        5511   20017      6349   1541       5178  2141
## 2003        5079   19529      6503   1654       5051  2209
## 2004        5280   20001      6643   1633       5020  2180
## 2005        5777   20498      6635   1720       5013  2480
## 2006        6425   21564      6226   1785       4949  2436
  • Here we have used some statements discussed in earlier chapters; review them if anything is unclear.

  • Let’s start by just showing a bar plot of doctorates in 2001 by type:

# Figure 14
barplot(doctorates.m[1,]) 

  • Suppose that we wanted to show all of the different years as bars placed next to each other. Suppose that we also wanted the bars plotted horizontally and wanted to show a legend and different colors for the different years.
# Figure 15
barplot(doctorates.m,beside=TRUE,horiz=TRUE,legend=TRUE,cex.names=.55,
        col=c("yellow","gray","orange","blue","red","purple"))

  • Finally, suppose that we wanted to show doctorates by year as stacked bars. To do this, we need to transpose the matrix so that each column is a year and each row is a discipline. We also need to make sure that there is enough room to see the legend, so we’ll extend the limits on the y-axis:
# Figure 16
barplot(t(doctorates.m),legend=TRUE,ylim=c(0,66000))

  • Though we haven’t introduced arguments such as beside, horiz, and col, you can guess how they are used by comparing the charts above.
  • See help if you want more details about barplot parameters.
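For readers without the nutshell package, the data-frame-to-matrix step and the effect of beside and t() can be tried on a small stand-in. The data frame below is an assumption built for illustration, with counts copied from the 2001 and 2002 rows of the matrix printed earlier; the colors and the y-axis limit are arbitrary.

```r
# Small stand-in for the doctorates data (two years, three fields)
deg <- data.frame(year        = c(2001, 2002),
                  engineering = c(5323, 5511),
                  science     = c(20643, 20017),
                  education   = c(6436, 6349))

m <- as.matrix(deg[, -1])   # drop the year column, keep the counts
rownames(m) <- deg$year     # years become row names, which the legend uses

# Grouped bars: one group per field, one bar per year
mids <- barplot(m, beside = TRUE, legend.text = TRUE,
                col = c("gray", "orange"))

# Stacked bars: transpose so each bar is a year and each segment a field
barplot(t(m), legend.text = TRUE, ylim = c(0, 45000))
```

barplot() also returns the bar midpoints (stored here in mids), which is useful if you later want to add text or points above particular bars.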

3.4 Other Plots

  • Here we list some other plots included in the graphics package.
  • boxplot()
# Figure 17
boxplot(mtcars$wt, main="Boxplot of wt") 

  • pie()
# Figure 18
domestic.catch.2006 <-c(7752,1166,463,1108)
names(domestic.catch.2006) <-c("one","two","three","four")
pie(domestic.catch.2006,init.angle=90)

  • persp()
# Figure 19
x <- seq(-pi, pi, length.out = 50)
y <- x
f <- outer(x, y, function(x, y) cos(y) / (1 + x^2))
fa <- (f - t(f)) / 2
persp(x, y, fa, theta = 20, phi = 30)

  • image()
# Figure 20
image(x,y,fa) 

  • contour()
# Figure 21
contour(x,y,f) 
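Returning to boxplot(): besides a single vector, it also accepts a formula, which draws one box per group. A minimal sketch with mtcars, splitting miles per gallon by number of cylinders (the labels are illustrative):

```r
# One box per cylinder count
b <- boxplot(mpg ~ cyl, data = mtcars,
             xlab = "Number of cylinders", ylab = "Miles per gallon",
             main = "Boxplot of mpg by cyl")

b$names   # group labels used on the x-axis
b$n       # number of cars in each group
```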

4 Combining Graphs

  • R makes it easy to combine several graphs into one overall graph, using either the par() or layout() function.
  • With the par() function, you can include the graphical parameter mfrow=c(nrows, ncols) to create a matrix of nrows x ncols plots that are filled in by row. Alternatively, you can use mfcol=c(nrows, ncols) to fill the matrix by columns.
# Figure 22
attach(mtcars) 
opar <- par(no.readonly=TRUE) 
par(mfrow=c(2,2)) 
plot(wt,mpg, main="Scatterplot of wt vs. mpg")
plot(wt,disp, main="Scatterplot of wt vs disp") 
barplot(wt, main="Barplot of wt")
boxplot(wt, main="Boxplot of wt") 
par(opar) 
detach(mtcars) 

  • The layout() function has the form layout(mat) where mat is a matrix object specifying the location of the multiple plots to combine. In the following code, one figure is placed in row 1 and two figures are placed in row 2:
# Figure 24
attach(mtcars) 
layout(matrix(c(1,1,2,3), 2, 2, byrow = TRUE)) 
plot(wt,mpg)
plot(wt,disp) 
barplot(wt)
detach(mtcars) 
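layout() also takes widths= and heights= arguments that set the relative sizes of the columns and rows. The sketch below keeps the same arrangement (one figure in row 1, two in row 2) but makes the first column twice as wide as the second and the second row twice as tall as the first; these ratios and the use of hist() are arbitrary choices.

```r
# Figure 1 spans row 1; figures 2 and 3 share row 2
n <- layout(matrix(c(1, 1, 2, 3), 2, 2, byrow = TRUE),
            widths = c(2, 1), heights = c(1, 2))

hist(mtcars$wt,   main = "wt")
hist(mtcars$mpg,  main = "mpg")
hist(mtcars$disp, main = "disp")

layout(1)   # return to a single figure per page
```

The value returned by layout() is the number of figures in the arrangement, which can be handy as a check when the matrix gets more complicated.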

5 Exercises

  • Exercise 1: Use the dose and drugB data, and add the correct graphical parameters to your function call to get the following graph. Then save the graph as a PDF.

  • Exercise 2: The MASS library contains the Boston data set, which records medv (median house value) for 506 neighborhoods around Boston. Draw a plot to find the relationship between medv and lstat (percent of households with low socioeconomic status). Find the age (proportion of owner-occupied units built before 1940) of the lower-left point. (You can use names() to see all the variable names in the data set if you are interested.)

  • Exercise 3: Use the mtcars data and the drug data to plot the following graph. (In the second graph, the red text labels are the row names of mtcars.)

6 References

  • Kabacoff, R. I. (2011). “R in Action”. Manning Publications Co.
  • Adler, J. (2010). “R in a nutshell: A desktop quick reference”. O’Reilly Media, Inc.
  • James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013). “An Introduction to Statistical Learning with Applications in R”. Springer.
  • Murrell, P. (2011). “R Graphics”. CRC Press.