These are a few examples of settings for making graphs prettier, with a discussion of what is going on in each example.

The examples use the built-in iris data set. Each code chunk is completely self-contained, so there is some code repetition that would not be needed if you were running through the lot in one go and had the previous code in memory.

First, a fairly basic boxplot:

data(iris)
boxplot(iris$Sepal.Length ~ iris$Species)

Adding Colours

Now we can fancy this up by defining and using a few colours:

data(iris)
setoscol = "#FF000088"
versicol = "#88880088"
virgicol = "#0000FF88"
collectivecols = c(setoscol,versicol,virgicol)
boxplot(iris$Sepal.Length ~ iris$Species, col=collectivecols)

In this particular example I am storing individual colours, then combining them with c() into a vector in the order I want them used.
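A possible variation, sketched here as an assumption rather than something the example above does: naming the elements of the vector after the species makes the pairing of colour and group explicit instead of relying on position. The name species_cols is made up for this illustration.

```r
data(iris)
# Hypothetical named vector: each colour is labelled with its species level
species_cols <- c(setosa = "#FF000088",
                  versicolor = "#88880088",
                  virginica = "#0000FF88")
# Index by the factor levels so the colours follow the order the boxes are drawn in
boxplot(Sepal.Length ~ Species, data = iris,
        col = species_cols[levels(iris$Species)])
```

If the colours were ever listed in a different order, indexing by levels(iris$Species) would still line them up with the right boxes.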

One way of defining colours is by name, using the colour names listed in:

http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf

However, the colours in this example are defined by a hash mark followed by four two-digit hexadecimal numbers giving the amount of red, green, and blue, and the opacity (how solid the colour is). So #FF000088 is maximum red, no green, no blue, and about 50% see-through. For a sense of what the different amounts of red, green, and blue will give (before fading out with opacity), check the RGB hexadecimal numbers shown in:

http://www.ou.edu/research/electron/internet/bgcolors.shtml
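As a small sketch of how the hexadecimal form relates to its parts, the base R functions rgb(), col2rgb(), and strtoi() can build and take apart such a colour. The variable name red_half is made up for this illustration.

```r
# Build the semi-transparent red from its components, each on a 0-255 scale
red_half <- rgb(255, 0, 0, alpha = 136, maxColorValue = 255)
print(red_half)

# Go the other way: split a hex colour into red, green, blue, and alpha
print(col2rgb("#FF000088", alpha = TRUE))

# Hexadecimal 88 is 136 in decimal, so the opacity is 136/255, a little over half
print(strtoi("88", base = 16L) / 255)
```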

Axis Control

We can add more settings to the plot command to control the axes in various ways, including removing them.

data(iris)
setoscol = "#FF000088"
versicol = "#88880088"
virgicol = "#0000FF88"
collectivecols = c(setoscol,versicol,virgicol)
boxplot(iris$Sepal.Length ~ iris$Species, col=collectivecols, ylab="Sepal Length", xlab="", frame.plot=FALSE, xaxt="n", ylim = c(4,max(iris$Sepal.Length)))
text(1,4,"Setosa")
text(2,4,"Versicolor")
text(3,4,"Virginica")

This particular graph also takes advantage of the fact that with R's base plotting system you can add extra elements to a graph after making it, like drawing extra things onto a graph on paper. In this case, the code makes a plot with no x axis at all, then uses text() commands to add labels at chosen points.
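An alternative to placing each label with text() is sketched below, assuming the labels should sit in the usual axis position under the boxes: the axis() function can draw labels at chosen positions without tick marks. The name species_labels is made up for this illustration.

```r
data(iris)
species_labels <- c("Setosa", "Versicolor", "Virginica")  # hypothetical label vector
boxplot(iris$Sepal.Length ~ iris$Species, ylab = "Sepal Length", xlab = "",
        frame.plot = FALSE, xaxt = "n")
# side = 1 is the bottom of the plot; tick = FALSE draws the labels
# without tick marks or an axis line
axis(side = 1, at = 1:3, labels = species_labels, tick = FALSE)
```

With this approach there is no need to extend ylim to make room for the labels, since they are drawn in the margin rather than inside the plotting area.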

Multiple Graphs

Another way of showing the distribution of a variable between different groups is to make a histogram for each subgroup, using the par() settings to put several graphs in one figure.

data(iris)
par(mfrow=c(3,1))
hist(iris$Sepal.Length[iris$Species == "setosa"])
hist(iris$Sepal.Length[iris$Species == "versicolor"])
hist(iris$Sepal.Length[iris$Species == "virginica"])

par(mfrow=c(1,1))

Note that the final par() command resets the layout so that later plots are drawn one graph at a time.
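A related idiom, sketched here as an option rather than what the example above does: par() returns the previous values of any settings it changes, so they can be saved and restored without assuming the earlier layout was c(1,1). The name old_par is made up for this illustration.

```r
data(iris)
# par() hands back the old values of the settings it changes
old_par <- par(mfrow = c(3, 1))
for (sp in levels(iris$Species)) {
  hist(iris$Sepal.Length[iris$Species == sp], main = sp, xlab = "Sepal Length")
}
# Restore whatever layout was in place before
par(old_par)
```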

However, these graphs would be much nicer if we used the graph settings to give each subgraph the same x and y axis range, and made the break points for the histograms in the same places.

data(iris)

xmin = 4
xmax = 8.5
xdist = c(xmin,xmax)
xbreaks = seq(from=xmin, to=xmax, by=0.25)

setoscol = "#FF000088"
versicol = "#88880088"
virgicol = "#0000FF88"

par(mfrow=c(3,1))
hist(iris$Sepal.Length[iris$Species == "setosa"], xlim=xdist, xlab="", main="Setosa", col=setoscol, ylim=c(0,20), breaks=xbreaks)
hist(iris$Sepal.Length[iris$Species == "versicolor"], xlim=xdist, xlab="", main="Versicolor", col=versicol, ylim=c(0,20), breaks=xbreaks)
hist(iris$Sepal.Length[iris$Species == "virginica"], xlim=xdist, xlab="Sepal Length", main="Virginica", col=virgicol, ylim=c(0,20), breaks=xbreaks)

par(mfrow=c(1,1))
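The upper y limit of 20 above is typed in by hand. As a rough sketch, the shared limit could instead be worked out from the data by counting the bins without plotting them, using hist() with plot = FALSE. The names by_species, bin_counts, and ymax are made up for this illustration.

```r
data(iris)
xbreaks <- seq(from = 4, to = 8.5, by = 0.25)
# Split the measurements by species, then count each group's bins without plotting
by_species <- split(iris$Sepal.Length, iris$Species)
bin_counts <- lapply(by_species,
                     function(x) hist(x, breaks = xbreaks, plot = FALSE)$counts)
# The tallest bar across all groups sets a common upper limit for the y axis
ymax <- max(unlist(bin_counts))

par(mfrow = c(3, 1))
for (sp in names(by_species)) {
  hist(by_species[[sp]], breaks = xbreaks, xlim = range(xbreaks),
       ylim = c(0, ymax), main = sp, xlab = "Sepal Length")
}
par(mfrow = c(1, 1))
```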

Overlapping Histograms

If I were only comparing two groups, rather than making three histograms I could take advantage of see-through colours and make one overlapping histogram.

data(iris)

xmin = 4
xmax = 8.5
xdist = c(xmin,xmax)
xbreaks = seq(from=xmin, to=xmax, by=0.25)

setoscol = "#FF000088"
virgicol = "#0000FF88"
bothcol = "#9900AABB"


hist(iris$Sepal.Length[iris$Species == "setosa"], xlim=xdist, xlab="Sepal Length", col=setoscol, ylim=c(0,20), breaks=xbreaks, main="")
hist(iris$Sepal.Length[iris$Species == "virginica"], xlim=xdist, col=virgicol, breaks=xbreaks, add=TRUE)
legend("topright", legend=c("Setosa","Virginica", "Both"), fill=c(setoscol,virgicol, bothcol), inset= c(0.1,0.2),box.lwd = 0,box.col = "white")

In this case, I annotated the graph with an extra legend (adding in a combined zone colour on the legend) as an additional instruction after making the initial graph.
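The combined colour above is picked by hand. One way to estimate it instead is sketched below, assuming the colours are layered with ordinary alpha blending on a white background; that assumption may not hold on every graphics device. The helper blend_over and the other names are made up for this illustration.

```r
# Assumption: colours are layered with ordinary alpha blending on a white background
blend_over <- function(top, bottom, alpha) alpha * top + (1 - alpha) * bottom

alpha <- strtoi("88", base = 16L) / 255   # opacity shared by both colours
white <- c(255, 255, 255)
red   <- c(255, 0, 0)
blue  <- c(0, 0, 255)

# Red is drawn first, then blue on top of it
red_on_white <- blend_over(red, white, alpha)
overlap_rgb  <- blend_over(blue, red_on_white, alpha)
overlap_col  <- rgb(overlap_rgb[1], overlap_rgb[2], overlap_rgb[3],
                    maxColorValue = 255)
print(overlap_col)
```

The result is a fully opaque colour, so it can be compared with a hand-picked value such as bothcol or tried in its place in the legend.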

Note: This document may be updated as I receive feedback, so should not be regarded as permanent unchanging content.