GESC 258 - Geographical Research Methods
Basic Plots
We look at some of the ways R can display information graphically. This is a short introduction to the basic plotting commands. It is assumed that you know how to enter data or read data files, which is covered in the first chapter, and that you are familiar with the different data types.
In each of the topics that follow it is assumed that two data sets, w1.dat and trees91.csv, have been read and defined using the same variables as in the first chapter. Both of these data sets come from the study discussed on the web site given in the first chapter. We assume that they are read using “read.csv” into variables w1 and tree:
w1 <- read.csv(file="https://www.cyclismo.org/tutorial/R/_static/w1.dat",sep=",",head=TRUE)
names(w1)
## [1] "vals"
tree <- read.csv(file="https://www.cyclismo.org/tutorial/R/_static/trees91.csv",sep=",",head=TRUE)
names(tree)
## [1] "C" "N" "CHBR" "REP" "LFBM" "STBM" "RTBM" "LFNCC"
## [9] "STNCC" "RTNCC" "LFBCC" "STBCC" "RTBCC" "LFCACC" "STCACC" "RTCACC"
## [17] "LFKCC" "STKCC" "RTKCC" "LFMGCC" "STMGCC" "RTMGCC" "LFPCC" "STPCC"
## [25] "RTPCC" "LFSCC" "STSCC" "RTSCC"
Strip Charts
A strip chart is the most basic type of plot available. It plots the
data in order along a line with each data point represented as a box.
Here we provide examples using the w1 data frame mentioned at the top of
this page, and the one column of the data is w1$vals.
To create a strip chart of this data use the stripchart command:
help(stripchart)
stripchart(w1$vals)
This is the most basic possible strip chart. The stripchart() command
takes many of the standard plot() options for labeling and annotations.
As you can see this is about as bare bones as you can get. There is no title and there are no axis labels. It only shows how the data looks if you were to put it all along one line and mark out a box at each point. If you would prefer to see which points are repeated you can specify that repeated points be stacked:
stripchart(w1$vals,method="stack")
A variation on this is to have the boxes moved up and down so that there
is more separation between them:
stripchart(w1$vals,method="jitter")
If you do not want the boxes plotted in the horizontal direction you can plot them in the vertical direction:
stripchart(w1$vals,vertical=TRUE)
stripchart(w1$vals,vertical=TRUE,method="jitter")
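To see why stacking helps, it can be useful to count the repeated values first. The sketch below is illustrative only: `vals` is a small made-up vector (not the w1 data) with deliberate repeats, and the `offset` and `pch` values are just one reasonable choice.

```r
# Hypothetical data with repeated values (not the w1 data set)
vals <- c(0.8, 0.9, 0.9, 1.0, 1.0, 1.0, 1.2)

# Count how often each value occurs; these counts are the stack heights
repeats <- table(vals)
print(repeats)

# Stack the repeated points; pch = 15 draws filled squares and
# offset controls the spacing between stacked points
stripchart(vals, method = "stack", offset = 0.5, pch = 15,
           main = "Stacked strip chart (made-up data)", xlab = "Value")
```

With the default method the three points at 1.0 would be drawn on top of each other and look like a single box.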
Since you should always annotate your plots there are many different
ways to add titles and labels. One way is within the stripchart command
itself:
stripchart(w1$vals,method="stack",
main='Leaf BioMass in High CO2 Environment',
xlab='BioMass of Leaves')
If you have a plot already and want to add a title, you can use the
title command:
title('Leaf BioMass in High CO2 Environment',xlab='BioMass of Leaves')
Note that this simply adds the title and labels and will write over the top of any titles or labels you already have.
Histograms
A histogram is a very common plot. It shows how often the data values fall within certain ranges. Here we provide examples using the w1 data frame mentioned at the top of this page, and the one column of data is w1$vals.
To plot a histogram of the data use the “hist” command:
hist(w1$vals)
hist(w1$vals,main="Distribution of w1",xlab="w1")
Histogram Options
Many of the basic plot commands accept the same options. The help(hist)
command will give you the options specific to the hist command. Note
that help(plot) may list further options that also apply.
Experiment with different options to see what you can do.
As you can see R will automatically calculate the intervals to use. There are many options to determine how to break up the intervals. Here we look at just one way, varying the domain size and number of breaks. If you would like to know more about the other options check out the help page:
help(hist)
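You can specify the number of breaks to use with the breaks option. When a single number is given, R treats it as a suggestion and chooses convenient break points close to it, so the number of bars may differ slightly from what you asked for. Here we look at the histogram for various numbers of breaks: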
hist(w1$vals,breaks=2)
hist(w1$vals,breaks=4)
hist(w1$vals,breaks=6)
hist(w1$vals,breaks=8)
hist(w1$vals,breaks=12)
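The breaks option also accepts a vector of break points, which gives exact control over the intervals. The following is a minimal sketch using made-up data (`vals` and `edges` are hypothetical names, not part of the w1 data set). It uses plot=FALSE so that the histogram object can be inspected without drawing it.

```r
set.seed(1)                      # make the made-up data reproducible
vals <- runif(50, 0.5, 1.5)      # hypothetical data, not the w1 data set

# A single number: R picks convenient break points near the requested count
h_suggested <- hist(vals, breaks = 6, plot = FALSE)
length(h_suggested$breaks) - 1   # number of intervals R actually used

# A vector of break points: the intervals are used exactly as given
edges <- seq(0.5, 1.5, by = 0.25)
h_exact <- hist(vals, breaks = edges, plot = FALSE)
h_exact$breaks                   # the same four intervals that were requested
h_exact$counts                   # one count per interval
```

When a vector is given, the break points must cover the full range of the data, otherwise hist stops with an error.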
You can also vary the size of the domain using the xlim option. This
option takes a vector with two entries in it, the left value and the
right value:
hist(w1$vals,breaks=12,xlim=c(0,10))
hist(w1$vals,breaks=12,xlim=c(-1,2))
hist(w1$vals,breaks=12,xlim=c(0,2))
hist(w1$vals,breaks=12,xlim=c(1,1.3))
hist(w1$vals,breaks=12,xlim=c(0.9,1.3))
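It is worth noting that xlim only changes the part of the axis that is displayed. The intervals and counts are computed from the data in the same way either way. The sketch below checks this on made-up data (`vals` is hypothetical, not the w1 data set):

```r
set.seed(2)
vals <- rnorm(40, mean = 1, sd = 0.1)   # hypothetical data

# hist() returns the histogram object (invisibly) as well as drawing it
h_full <- hist(vals, breaks = 12, main = "Default x range")
h_zoom <- hist(vals, breaks = 12, xlim = c(0, 2), main = "xlim = c(0, 2)")

# The break points and counts are the same in both cases
identical(h_full$breaks, h_zoom$breaks)
identical(h_full$counts, h_zoom$counts)
```

If the xlim range is narrower than the data, bars outside the range are simply not shown.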
The options for adding titles and labels are exactly the same as for
strip charts. You should always annotate your plots and there are many
different ways to add titles and labels. One way is within the hist
command itself:
hist(w1$vals,
main='Leaf BioMass in High CO2 Environment',
xlab='BioMass of Leaves')
If you have a plot already and want to change or add a title, you can
use the title command:
title('Leaf BioMass in High CO2 Environment',xlab='BioMass of Leaves')
Note that this simply adds the title and labels and will write over
the top of any titles or labels you already have.
It is not uncommon to add other kinds of plots to a histogram. One of the options to the stripchart command is to add it to a plot that has already been drawn. For example, you might want to have a histogram with the strip chart drawn across the top. The addition of the strip chart might give you a better idea of the density of the data:
hist(w1$vals,main='Leaf BioMass in High CO2 Environment',xlab='BioMass of Leaves',ylim=c(0,16))
stripchart(w1$vals,add=TRUE,at=15.5)
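The values 16 and 15.5 above were chosen by hand to leave room above the tallest bar. One possible way to avoid hard-coding them is to compute the tallest bar first. This is only a sketch on made-up data (`vals` and `top` are hypothetical names), and the extra space of 2 units is an arbitrary choice:

```r
set.seed(3)
vals <- rnorm(40, mean = 1, sd = 0.1)   # hypothetical data, not w1

# Compute the histogram without drawing it to find the tallest bar
h <- hist(vals, plot = FALSE)
top <- max(h$counts)

# Leave some room above the tallest bar and draw the strip chart there
hist(vals, main = "Histogram with strip chart (made-up data)",
     xlab = "Value", ylim = c(0, top + 2))
stripchart(vals, add = TRUE, at = top + 1.5)
```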
Boxplots
A boxplot provides a graphical view of the median,
quartiles, maximum, and minimum of a data set. Here we provide examples
using two different data sets. The first is the w1 data frame mentioned
at the top of this page, and the one column of data is w1$vals. The
second is the tree data frame from the trees91.csv data file which is
also mentioned at the top of the page.
We first use the w1 data set and look at the boxplot of this data set:
boxplot(w1$vals)
Again, this is a very plain graph, and the title and labels can be specified in exactly the same way as in the stripchart and hist commands:
boxplot(w1$vals,
main='Leaf BioMass in High CO2 Environment',
ylab='BioMass of Leaves')
Note that the default orientation is to plot the boxplot
vertically. Because of this we used the ylab option to specify the axis
label. There are a large number of options for this command. To see more
of the options see the help page:
help(boxplot)
As an example you can specify that the boxplot be plotted horizontally by specifying the horizontal option:
boxplot(w1$vals,
main='Leaf BioMass in High CO2 Environment',
xlab='BioMass of Leaves',
horizontal=TRUE)
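The numbers behind a boxplot can also be inspected directly with the boxplot.stats command. The sketch below uses a small made-up vector (`vals` is hypothetical, not the w1 data) that contains one extreme value. With the default settings the whiskers stop at the most extreme points that are not flagged as outliers, so they are not always the minimum and maximum.

```r
vals <- c(2, 4, 4, 5, 6, 7, 8, 9, 30)   # hypothetical data; 30 is extreme

s <- boxplot.stats(vals)
s$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
          # here: 2 4 6 8 9
s$out     # points drawn individually as outliers; here: 30

boxplot(vals, horizontal = TRUE,
        main = "Boxplot of made-up data", xlab = "Value")
```

The hinges are R's version of the lower and upper quartiles, and the middle value of `s$stats` is the median.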
The option to plot the box plot horizontally can be put to good use to display a box plot on the same image as a histogram. You need to specify the add option, specify where to put the box plot using the at option, and turn off the addition of axes using the axes option:
hist(w1$vals,main='Leaf BioMass in High CO2 Environment',xlab='BioMass of Leaves',ylim=c(0,16))
boxplot(w1$vals,horizontal=TRUE,at=15.5,add=TRUE,axes=FALSE)
If you are feeling really crazy you can take a histogram and add a box plot and a strip chart:
hist(w1$vals,main='Leaf BioMass in High CO2 Environment',xlab='BioMass of Leaves',ylim=c(0,16))
boxplot(w1$vals,horizontal=TRUE,at=16,add=TRUE,axes=FALSE)
stripchart(w1$vals,add=TRUE,at=15)
For the second part on boxplots we will look at the second data frame, “tree,” which comes from the trees91.csv file. To reiterate the discussion at the top of this page and the discussion in the data types chapter, we need to specify which columns are factors:
tree <- read.csv(file="https://www.cyclismo.org/tutorial/R/_static/trees91.csv",sep=",",head=TRUE)
tree$C <- factor(tree$C)
tree$N <- factor(tree$N)
We can look at the boxplot of just the data for the stem biomass:
boxplot(tree$STBM,
main='Stem BioMass in Different CO2 Environments',
ylab='BioMass of Stems')
That plot does not tell the whole story. It is for all of the trees, but the trees were grown in different kinds of environments. The boxplot command can be used to plot a separate box plot for each level. In this case the data is held in “tree$STBM,” and the different levels are stored as factors in “tree$C.” The command to create different boxplots is the following:
boxplot(tree$STBM~tree$C)
Note that for the level called “2” there are four outliers which are plotted as little circles. There are many options to annotate your plot including different labels for each level. Please use the help(boxplot) command for more information.
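The formula can also be written with column names only if the data frame is passed through the data option. The sketch below uses a made-up data frame, `demo`, that borrows the column names C and STBM but is not the trees91.csv data. It also shows that boxplot returns an object from which the outliers for each level can be read.

```r
set.seed(4)
# Hypothetical stand-in for the tree data frame (made-up values)
demo <- data.frame(
  C    = factor(rep(1:4, each = 10)),
  STBM = rnorm(40, mean = rep(c(0.3, 0.5, 0.7, 0.9), each = 10), sd = 0.1)
)

# One box per level of C, with labels for the title and both axes
b <- boxplot(STBM ~ C, data = demo,
             main = "Stem BioMass by level of C (made-up data)",
             xlab = "Level of C", ylab = "BioMass of Stems")

b$names                 # one box per factor level
b$n                     # number of observations in each box
split(b$out, b$group)   # outliers grouped by box number (may be empty)
```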
Scatter Plots
A scatter plot provides a graphical view of the relationship between two sets of numbers. Here we provide examples using the tree data frame from the trees91.csv data file which is mentioned at the top of the page. In particular we look at the relationship between the stem biomass (“tree$STBM”) and the leaf biomass (“tree$LFBM”).
The command to plot each pair of points as an x-coordinate and a y-coordinate is “plot”:
plot(tree$STBM,tree$LFBM)
It appears that there is a strong positive association between the
biomass in the stems of a tree and the leaves of the tree. It appears to
be a linear relationship. In fact, the correlation between these two sets
of observations is quite high:
cor(tree$STBM,tree$LFBM)
## [1] 0.9115949
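By default the cor command computes the Pearson correlation, which is the covariance of the two variables divided by the product of their standard deviations. The sketch below checks this on made-up data (`stem` and `leaf` are hypothetical vectors, not the tree data):

```r
set.seed(5)
stem <- runif(30, 0.2, 1.2)                # hypothetical stem biomass
leaf <- 0.8 * stem + rnorm(30, sd = 0.1)   # hypothetical leaf biomass

r_builtin <- cor(stem, leaf)
r_manual  <- cov(stem, leaf) / (sd(stem) * sd(leaf))

# The two values agree up to rounding error
all.equal(r_builtin, r_manual)
```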
Getting back to the plot, you should always annotate your graphs. The title and labels can be specified in exactly the same way as with the other plotting commands:
plot(tree$STBM,tree$LFBM,
main="Relationship Between Stem and Leaf Biomass",
xlab="Stem Biomass",
ylab="Leaf Biomass")
Normal QQ Plots
The final type of plot that we look at is the normal quantile plot. This plot is used to judge whether your data is close to being normally distributed. It cannot prove that the data is normally distributed, but clear departures from a straight line let you rule normality out. Here we provide examples using the w1 data frame mentioned at the top of this page, and the one column of data is w1$vals.
The command to generate a normal quantile plot is qqnorm. You can give it one argument, the univariate data set of interest:
qqnorm(w1$vals)
You can annotate the plot in exactly the same way as all of the other
plotting commands given here:
qqnorm(w1$vals,
main="Normal Q-Q Plot of the Leaf Biomass",
xlab="Theoretical Quantiles of the Leaf Biomass",
ylab="Sample Quantiles of the Leaf Biomass")
After you create the normal quantile plot you can also add the
theoretical line that the data should fall on if they were normally
distributed:
qqline(w1$vals)
In this example you should see that the data is not quite normally distributed. There are a few outliers, and it does not match up at the tails of the distribution.
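To get a feel for what these plots look like in clear-cut cases, it may help to compare a sample that really is normal with one that is strongly skewed. The sketch below uses made-up samples (`normal_vals` and `skewed_vals` are hypothetical names), so the exact pictures depend on the random seed.

```r
set.seed(6)
normal_vals <- rnorm(100, mean = 1, sd = 0.1)   # hypothetical, roughly normal
skewed_vals <- rexp(100, rate = 1)              # hypothetical, right-skewed

# Points from a normal sample should fall close to the line
qq <- qqnorm(normal_vals, main = "Roughly normal sample")
qqline(normal_vals)

# Points from a skewed sample curve away from the line
qqnorm(skewed_vals, main = "Right-skewed sample")
qqline(skewed_vals)

# qqnorm returns the plotted coordinates: x = theoretical, y = sample
str(qq)
```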
Intermediate Plotting
We look at some more options for plotting, and we assume that you are familiar with the basic plotting commands. A variety of different subjects ranging from plotting options to the formatting of plots is given.
In many of the examples below we use some of R’s commands to generate random numbers according to various distributions. This chapter is divided into three sections. The focus of the first section is on graphing continuous data. The focus of the second section is on graphing discrete data. The third section offers some miscellaneous options that are useful in a variety of contexts.
Continuous Data
In the examples below a data set is defined using R’s normally distributed random number generator.
x <- rnorm(10,sd=5,mean=20)
y <- 2.5*x - 1.0 + rnorm(10,sd=9,mean=0)
cor(x,y)
## [1] 0.6943603
Multiple Data Sets on One Plot
One common task is to plot multiple data sets on the same plot. In many situations the way to do this is to create the initial plot and then add additional information to the plot. For example, to plot bivariate data the plot command is used to initialize and create the plot. The points command can then be used to add additional data sets to the plot.
First define a set of normally distributed random numbers and then plot them. (This same data set is used throughout the examples below.)
x <- rnorm(10,sd=5,mean=20)
y <- 2.5*x - 1.0 + rnorm(10,sd=9,mean=0)
cor(x,y)
## [1] 0.8264323
plot(x,y,xlab="Independent",ylab="Dependent",main="Random Stuff")
x1 <- runif(8,15,25)
y1 <- 2.5*x1 - 1.0 + runif(8,-6,6)
points(x1,y1,col=2)
Note that in the previous example, the colour for the second set of data points is set using the col option. You can try different numbers to see what colours are available. For most installations there are at least eight options from 1 to 8. Also note that in the example above the points are plotted as circles. The symbol that is used can be changed using the pch option.
x2 <- runif(8,15,25)
y2 <- 2.5*x2 - 1.0 + runif(8,-6,6)
points(x2,y2,col=3,pch=2)
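One quick way to see what the numbers mean is to plot each number with itself as both the colour and the symbol. This is only a sketch, and it assumes the default colour palette has not been changed (`codes` is a hypothetical name):

```r
codes <- 1:8

# Draw one point per code, using the code as both colour and symbol
plot(codes, rep(1, length(codes)), col = codes, pch = codes, cex = 2,
     ylim = c(0.5, 1.5), yaxt = "n",
     xlab = "Value passed to col and pch", ylab = "",
     main = "Colours and symbols 1 to 8")

# Label each point with its code
text(codes, 1.2, labels = codes)

# The colours that the numbers currently refer to
palette()
```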
Again, try different numbers to see the various options. Another helpful option is to add a legend. This can be done with the legend command. The options for the command, in order, are the x and y coordinates on the plot to place the legend followed by a list of labels to use. There are a large number of other options so use help(legend) to see more options. For example a list of colors can be given with the col option, and a list of symbols can be given with the pch option.
plot(x,y,xlab="Independent",ylab="Dependent",main="Random Stuff")
points(x1,y1,col=2,pch=3)
points(x2,y2,col=4,pch=5)
legend(14,70,c("Original","one","two"),col=c(1,2,4),pch=c(1,3,5))
The three data sets are now displayed on the same graph. Another common task is to change the limits of the axes to change the size of the plotting area. This is achieved using the xlim and ylim options in the plot command. Both options take a vector of length two that holds the minimum and maximum values.
plot(x,y,xlab="Independent",ylab="Dependent",main="Random Stuff",xlim=c(0,30),ylim=c(0,100))
points(x1,y1,col=2,pch=3)
points(x2,y2,col=4,pch=5)
legend(14,70,c("Original","one","two"),col=c(1,2,4),pch=c(1,3,5))
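Because the data here are random, fixed limits and a fixed legend position such as (14,70) will not suit every run. One possible alternative, sketched below, is to compute the limits from all of the data with the range command and to place the legend with a keyword such as "topleft". The block regenerates its own data so that it can be run on its own, and the seed value is arbitrary.

```r
set.seed(7)
x  <- rnorm(10, sd = 5, mean = 20)
y  <- 2.5 * x - 1.0 + rnorm(10, sd = 9, mean = 0)
x1 <- runif(8, 15, 25)
y1 <- 2.5 * x1 - 1.0 + runif(8, -6, 6)

# Limits that are wide enough for both data sets
x_limits <- range(c(x, x1))
y_limits <- range(c(y, y1))

plot(x, y, xlab = "Independent", ylab = "Dependent", main = "Random Stuff",
     xlim = x_limits, ylim = y_limits)
points(x1, y1, col = 2, pch = 3)

# A keyword places the legend in a corner regardless of the data values
legend("topleft", legend = c("Original", "one"), col = c(1, 2), pch = c(1, 3))
```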
Multiple Graphs on One Image
The par command can be used to set different graphical parameters, and here it is used to place several graphs on one image. In the example below the mfrow parameter is set. The plots are arranged in an array where the default number of rows and columns is one. The mfrow parameter is a vector with two entries. The first entry is the number of rows of images. The second entry is the number of columns. In the example below the plots are arranged in two rows with two plots across each row.
par(mfrow=c(2,2))
plot(x,y,xlab="Independent",ylab="Dependent",main="first plot",xlim=c(0,30),ylim=c(0,100))
boxplot(tree$STBM~tree$C,main="second plot")
qqnorm(w1$vals,
main="third plot",
xlab="Theoretical Quantiles of the Leaf Biomass",
ylab="Sample Quantiles of the Leaf Biomass")
hist(w1$vals,main='fourth plot',xlab='BioMass of Leaves',ylim=c(0,16))
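A layout set with par stays in effect for later plots on the same device. A common pattern, sketched here with made-up data (`vals` and `old_par` are hypothetical names), is to keep the old settings that par returns and restore them afterwards:

```r
set.seed(8)
vals <- rnorm(40, mean = 1, sd = 0.1)   # hypothetical data, not w1

# par() returns the previous value of every parameter it changes
old_par <- par(mfrow = c(1, 2))   # one row, two columns

hist(vals, main = "Histogram")
boxplot(vals, main = "Boxplot")

# Restore the previous layout so that later plots fill the whole image
par(old_par)
par("mfrow")
```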
Hand in (Optional)
Please add your name to the last plot (the fourth plot in the layout above) using the text function, for example
text(1.5,1,"YOUR/Name")
and submit it on MLS under the Lab 2 folder.
Note: This assignment is worth only 1 bonus mark, so if you cannot submit anything you will not lose any marks.
Credits
All credit for this material goes to: https://www.cyclismo.org/tutorial/R