GESC 258- Geographical Research Methods

Basic Plots

We look at some of the ways R can display information graphically. This is a basic introduction to some of the basic plotting commands. It is assumed that you know how to enter data or read data files which is covered in the first chapter, and it is assumed that you are familiar with the different data types.

In each of the topics that follow it is assumed that two different data sets, w1.dat and trees91.csv have been read and defined using the same variables as in the first chapter. Both of these data sets come from the study discussed on the web site given in the first chapter. We assume that they are read using “read.csv” into variables w1 and tree:

w1 <- read.csv(file="https://www.cyclismo.org/tutorial/R/_static/w1.dat",sep=",",head=TRUE)
names(w1)

## [1] "vals"

tree <- read.csv(file="https://www.cyclismo.org/tutorial/R/_static/trees91.csv",sep=",",head=TRUE)
names(tree)

##  [1] "C"      "N"      "CHBR"   "REP"    "LFBM"   "STBM"   "RTBM"   "LFNCC" 
##  [9] "STNCC"  "RTNCC"  "LFBCC"  "STBCC"  "RTBCC"  "LFCACC" "STCACC" "RTCACC"
## [17] "LFKCC"  "STKCC"  "RTKCC"  "LFMGCC" "STMGCC" "RTMGCC" "LFPCC"  "STPCC" 
## [25] "RTPCC"  "LFSCC"  "STSCC"  "RTSCC"

Strip Charts

A strip chart is the most basic type of plot available. It plots the data in order along a line with each data point represented as a box. Here we provide examples using the w1 data frame mentioned at the top of this page, and the one column of the data is w1$vals.

To create a strip chart of this data use the stripchart command:

help(stripchart)

stripchart(w1$vals)

This is the most basic possible strip charts. The stripchart() command takes many of the standard plot() options for labeling and annotations.

As you can see this is about as bare bones as you can get. There is no title nor axes labels. It only shows how the data looks if you were to put it all along one line and mark out a box at each point. If you would prefer to see which points are repeated you can specify that repeated points be stacked:

stripchart(w1$vals,method="stack")

A variation on this is to have the boxes moved up and down so that there is more separation between them:

stripchart(w1$vals,method="jitter")

If you do not want the boxes plotting in the horizontal direction you can plot them in the vertical direction:

 stripchart(w1$vals,vertical=TRUE)

 stripchart(w1$vals,vertical=TRUE,method="jitter")

Since you should always annotate your plots there are many different ways to add titles and labels. One way is within the stripchart command itself:

 stripchart(w1$vals,method="stack",
             main='Leaf BioMass in High CO2 Environment',
             xlab='BioMass of Leaves')

If you have a plot already and want to add a title, you can use the title command:

Note that this simply adds the title and labels and will write over the top of any titles or labels you already have.

Histograms

A histogram is very common plot. It plots the frequencies that data appears within certain ranges. Here we provide examples using the w1 data frame mentioned at the top of this page, and the one column of data is w1$vals.

To plot a histogram of the data use the “hist” command:

 hist(w1$vals)

 hist(w1$vals,main="Distribution of w1",xlab="w1")

#### Histogram Options

Many of the basic plot commands accept the same options. The help(hist) command will give you options specifically for the hist command. You can also use the help command to see more but also note that if you use help(plot) you may see more options. Experiment with different options to see what you can do.

As you can see R will automatically calculate the intervals to use. There are many options to determine how to break up the intervals. Here we look at just one way, varying the domain size and number of breaks. If you would like to know more about the other options check out the help page:

help(hist)

You can specify the number of breaks to use using the breaks option. Here we look at the histogram for various numbers of breaks:

 hist(w1$vals,breaks=2)

 hist(w1$vals,breaks=4)

 hist(w1$vals,breaks=6)
 hist(w1$vals,breaks=8)

 hist(w1$vals,breaks=12)

You can also vary the size of the domain using the xlim option. This option takes a vector with two entries in it, the left value and the right value:

 hist(w1$vals,breaks=12,xlim=c(0,10))

 hist(w1$vals,breaks=12,xlim=c(-1,2))

 hist(w1$vals,breaks=12,xlim=c(0,2))

 hist(w1$vals,breaks=12,xlim=c(1,1.3))

 hist(w1$vals,breaks=12,xlim=c(0.9,1.3))

The options for adding titles and labels are exactly the same as for strip charts. You should always annotate your plots and there are many different ways to add titles and labels. One way is within the hist command itself:

 hist(w1$vals,
       main='Leaf BioMass in High CO2 Environment',
       xlab='BioMass of Leaves')

If you have a plot already and want to change or add a title, you can use the title command:


 title('Leaf BioMass in High CO2 Environment',xlab='BioMass of Leaves')

>Note that this simply adds the title and labels and will write over the top of any titles or labels you already have.

It is not uncommon to add other kinds of plots to a histogram. For example, one of the options to the stripchart command is to add it to a plot that has already been drawn. For example, you might want to have a histogram with the strip chart drawn across the top. The addition of the strip chart might give you a better idea of the density of the data:

 hist(w1$vals,main='Leaf BioMass in High CO2 Environment',xlab='BioMass of Leaves',ylim=c(0,16))
 stripchart(w1$vals,add=TRUE,at=15.5)

### Boxplots A boxplot provides a graphical view of the median, quartiles, maximum, and minimum of a data set. Here we provide examples using two different data sets. The first is the w1 data frame mentioned at the top of this page, and the one column of data is w1$vals. The second is the tree data frame from the trees91.csv data file which is also mentioned at the top of the page.

We first use the w1 data set and look at the boxplot of this data set:

 boxplot(w1$vals)

Again, this is a very plain graph, and the title and labels can be specified in exactly the same way as in the stripchart and hist commands:

 boxplot(w1$vals,
          main='Leaf BioMass in High CO2 Environment',
          ylab='BioMass of Leaves')

> Note that the default orientation is to plot the boxplot vertically. Because of this we used the ylab option to specify the axis label. There are a large number of options for this command. To see more of the options see the help page:

 help(boxplot)

As an example you can specify that the boxplot be plotted horizontally by specifying the horizontal option:

 boxplot(w1$vals,
          main='Leaf BioMass in High CO2 Environment',
          xlab='BioMass of Leaves',
          horizontal=TRUE)

The option to plot the box plot horizontally can be put to good use to display a box plot on the same image as a histogram. You need to specify the add option, specify where to put the box plot using the at option, and turn off the addition of axes using the axes option:

 hist(w1$vals,main='Leaf BioMass in High CO2 Environment',xlab='BioMass of Leaves',ylim=c(0,16))
 boxplot(w1$vals,horizontal=TRUE,at=15.5,add=TRUE,axes=FALSE)

If you are feeling really crazy you can take a histogram and add a box plot and a strip chart:

 hist(w1$vals,main='Leaf BioMass in High CO2 Environment',xlab='BioMass of Leaves',ylim=c(0,16))
 boxplot(w1$vals,horizontal=TRUE,at=16,add=TRUE,axes=FALSE)
 stripchart(w1$vals,add=TRUE,at=15)

For the second part on boxplots we will look at the second data frame, “tree,” which comes from the trees91.csv file. To reiterate the discussion at the top of this page and the discussion in the data types chapter, we need to specify which columns are factors:

 tree <- read.csv(file="https://www.cyclismo.org/tutorial/R/_static/trees91.csv",sep=",",head=TRUE)
 tree$C <- factor(tree$C)
 tree$N <- factor(tree$N)

We can look at the boxplot of just the data for the stem biomass:

 boxplot(tree$STBM,
          main='Stem BioMass in Different CO2 Environments',
          ylab='BioMass of Stems')

That plot does not tell the whole story. It is for all of the trees, but the trees were grown in different kinds of environments. The boxplot command can be used to plot a separate box plot for each level. In this case the data is held in “tree$STBM,” and the different levels are stored as factors in “tree$C.” The command to create different boxplots is the following:

boxplot(tree$STBM~tree$C)

Note that for the level called “2” there are four outliers which are plotted as little circles. There are many options to annotate your plot including different labels for each level. Please use the help(boxplot) command for more information.

Scatter Plots

A scatter plot provides a graphical view of the relationship between two sets of numbers. Here we provide examples using the tree data frame from the trees91.csv data file which is mentioned at the top of the page. In particular we look at the relationship between the stem biomass (“tree$STBM”) and the leaf biomass (“tree$LFBM”).

The command to plot each pair of points as an x-coordinate and a y-coorindate is “plot:”

plot(tree$STBM,tree$LFBM)

It appears that there is a strong positive association between the biomass in the stems of a tree and the leaves of the tree. It appears to be a linear relationship. In fact, the corelation between these two sets of observations is quite high:

 cor(tree$STBM,tree$LFBM)

## [1] 0.9115949

Getting back to the plot, you should always annotate your graphs. The title and labels can be specified in exactly the same way as with the other plotting commands:

 plot(tree$STBM,tree$LFBM,
       main="Relationship Between Stem and Leaf Biomass",
       xlab="Stem Biomass",
       ylab="Leaf Biomass")

Normal QQ Plots

The final type of plot that we look at is the normal quantile plot. This plot is used to determine if your data is close to being normally distributed. You cannot be sure that the data is normally distributed, but you can rule out if it is not normally distributed. Here we provide examples using the w1 data frame mentioned at the top of this page, and the one column of data is w1$vals.

The command to generate a normal quantile plot is qqnorm. You can give it one argument, the univariate data set of interest:

 qqnorm(w1$vals)

You can annotate the plot in exactly the same way as all of the other plotting commands given here:

 qqnorm(w1$vals,
         main="Normal Q-Q Plot of the Leaf Biomass",
         xlab="Theoretical Quantiles of the Leaf Biomass",
         ylab="Sample Quantiles of the Leaf Biomass")

After you creat the normal quantile plot you can also add the theoretical line that the data should fall on if they were normally distributed:

 qqline(w1$vals)

In this example you should see that the data is not quite normally distributed. There are a few outliers, and it does not match up at the tails of the distribution.

Intermediate Plotting

We look at some more options for plotting, and we assume that you are familiar with the basic plotting commands. A variety of different subjects ranging from plotting options to the formatting of plots is given.

In many of the examples below we use some of R’s commands to generate random numbers according to various distributions. The section is divided into three sections. The focus of the first section is on graphing continuous data. The focus of the second section is on graphing discrete data. The third section offers some miscellaneous options that are useful in a variety of contexts.

Continuous Data

In the examples below a data set is defined using R’s normally distributed random number generator.

 x <- rnorm(10,sd=5,mean=20)
 y <- 2.5*x - 1.0 + rnorm(10,sd=9,mean=0)
 cor(x,y)

## [1] 0.6943603

Multiple Data Sets on One Plot

One common task is to plot multiple data sets on the same plot. In many situations the way to do this is to create the initial plot and then add additional information to the plot. For example, to plot bivariate data the plot command is used to initialize and create the plot. The points command can then be used to add additional data sets to the plot.

First define a set of normally distributed random numbers and then plot them. (This same data set is used throughout the examples below.)

 x <- rnorm(10,sd=5,mean=20)
 y <- 2.5*x - 1.0 + rnorm(10,sd=9,mean=0)
 cor(x,y)

## [1] 0.8264323

 plot(x,y,xlab="Independent",ylab="Dependent",main="Random Stuff")
 x1 <- runif(8,15,25)
 y1 <- 2.5*x1 - 1.0 + runif(8,-6,6)
 points(x1,y1,col=2)

Note that in the previous example, the colour for the second set of data points is set using the col option. You can try different numbers to see what colours are available. For most installations there are at least eight options from 1 to 8. Also note that in the example above the points are plotted as circles. The symbol that is used can be changed using the pch option.


x2 <- runif(8,15,25)
 y2 <- 2.5*x2 - 1.0 + runif(8,-6,6)
 points(x2,y2,col=3,pch=2)

Again, try different numbers to see the various options. Another helpful option is to add a legend. This can be done with the legend command. The options for the command, in order, are the x and y coordinates on the plot to place the legend followed by a list of labels to use. There are a large number of other options so use help(legend) to see more options. For example a list of colors can be given with the col option, and a list of symbols can be given with the pch option.

plot(x,y,xlab="Independent",ylab="Dependent",main="Random Stuff")
 points(x1,y1,col=2,pch=3)
 points(x2,y2,col=4,pch=5)
 legend(14,70,c("Original","one","two"),col=c(1,2,4),pch=c(1,3,5))

The three data sets displayed on the same graph. Another common task is to change the limits of the axes to change the size of the plotting area. This is achieved using the xlim and ylim options in the plot command. Both options take a vector of length two that have the minimum and maximum values.

 plot(x,y,xlab="Independent",ylab="Dependent",main="Random Stuff",xlim=c(0,30),ylim=c(0,100))
 points(x1,y1,col=2,pch=3)
 points(x2,y2,col=4,pch=5)
 legend(14,70,c("Original","one","two"),col=c(1,2,4),pch=c(1,3,5))

Multiple Graphs on One Image

Note that a new command was used in the previous example. The par command can be used to set different parameters. In the example above the mfrow was set. The plots are arranged in an array where the default number of rows and columns is one. The mfrow parameter is a vector with two entries. The first entry is the number of rows of images. The second entry is the number of columns. In the example above the plots were arranged in one row with two plots across.

 par(mfrow=c(2,2))
 plot(x,y,xlab="Independent",ylab="Dependent",main="first plot",xlim=c(0,30),ylim=c(0,100))
 boxplot(tree$STBM~tree$C,main="second plot")
 qqnorm(w1$vals,
         main="third plot",
         xlab="Theoretical Quantiles of the Leaf Biomass",
         ylab="Sample Quantiles of the Leaf Biomass")
 hist(w1$vals,main='forth plot',xlab='BioMass of Leaves',ylim=c(0,16))

Hand in (Optional)

Please add your name to the last plot using text(1.5,1,"YOUR/Name") function and submit it on MLS under Lab 2 folder.

Note: This assignment only has 1 bonus mark. So if you cannot submit anything you will not lose any marks.

Credits

All the credits of this material are for: https://www.cyclismo.org/tutorial/R