?<command> is your best friend :-)
The grammar of graphics was conceived by L. Wilkinson to address the question: what is a statistical graphic? It lets you specify how variables in the data are mapped to aesthetic attributes that you can perceive.
Aesthetics (position, colour, size, etc) are applied to geometric objects (points, lines, polygons, etc). The plot may contain statistical transformations of the data (e.g. binning for histograms, smoothing, etc).
The grammar of graphics concepts were brought to R by H. Wickham in the ggplot2 package. Plots are built in a layered fashion from the data stored in a dataframe (beware the type of your columns, e.g. numeric, factor, logical!). The main ggplot2 functions are ggplot() to create a plot, geom_xxx() and stat_xxx() to add layers to the plot, and aes() to specify the aesthetics.
The main aesthetics are x, y, colour, size, shape, and alpha. The help (in particular the list of all geoms and stats) is best browsed at http://docs.ggplot2.org/current/.
The data and aesthetics can be set in each layer. Alternatively, they can be specified when calling ggplot(): in this case, these values are passed as defaults to all layers.
Each layer has a geom and a stat defined. However most of the time specifying only one is enough (mind the default value of the other!).
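As a minimal sketch of the two points above (defaults versus per-layer settings, and geom/stat pairs), the following assumes that the tips data frame used throughout comes from the reshape2 package; the object names p_default, p_layer, p_geom and p_stat are only illustrative.

```r
library(ggplot2)
data(tips, package = "reshape2")  # assumption: tips is the reshape2 dataset

# Data and aesthetics given to ggplot(): every layer inherits them
p_default <- ggplot(tips, aes(x = total_bill, y = tip)) +
  geom_point() +
  geom_smooth()

# Data and aesthetics given to each layer: nothing is inherited
p_layer <- ggplot() +
  geom_point(aes(x = total_bill, y = tip), data = tips) +
  geom_smooth(aes(x = total_bill, y = tip), data = tips)

# Specifying only the geom: geom_histogram() uses stat "bin" by default
p_geom <- ggplot(tips, aes(x = tip)) + geom_histogram(binwidth = 0.5)

# Specifying only the stat: stat_bin() uses geom "bar" by default
p_stat <- ggplot(tips, aes(x = tip)) + stat_bin(binwidth = 0.5)
```

p_default and p_layer should draw the same plot, and so should p_geom and p_stat; printing any of them (e.g. print(p_geom)) renders it.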
The teaser examples were created using the qplot syntax. This is a shortcut for the explicit ggplot syntax that we have used up to now. You will certainly find it in tutorials and forums, hence it is introduced here quickly. However, I don’t recommend spending too much time learning it: real-life plots often require controlling details that can only be achieved with the ggplot syntax!
The qplot function itself takes the following arguments (and is conveniently designed with appropriate defaults):
qplot(x, ..., data, facets = NULL, geom = "auto", stat = list(NULL),
position = list(NULL), xlim = c(NA, NA), ylim = c(NA, NA), log = "", main = NULL,
xlab = deparse(substitute(x)), ylab = deparse(substitute(y)))
This qplot call:
qplot(total_bill, tip, data=tips, facets=sex~time, geom=c("density2d", "smooth"))
is equivalent to this ggplot one:
ggplot(aes(x=total_bill, y=tip), data=tips) +
stat_density2d() +
stat_smooth() +
facet_grid(sex~time)
You can add additional layers to a plot created using qplot:
qplot(total_bill, tip, data=tips, geom="density2d") +
stat_smooth() +
facet_grid(sex~time)
For several geoms, the associated stat is not identity (e.g. bin, density). In this case, the stat transform produces one or more variables (e.g. ..density.., ..count..) that you can map to the aesthetics of your choice.
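A short sketch of mapping a computed variable: stat bin produces both count and density for each bin, and here y is mapped to the density rather than the default count, so that a density curve can be overlaid on the same scale. The tips data frame is assumed to come from the reshape2 package. Recent ggplot2 versions prefer writing after_stat(density) instead of ..density.., which still works but may emit a deprecation warning.

```r
library(ggplot2)
data(tips, package = "reshape2")  # assumption: tips is the reshape2 dataset

p_hist <- ggplot(tips, aes(x = tip)) +
  # y is mapped to the density computed by stat "bin" (default would be the count)
  geom_histogram(aes(y = ..density..), binwidth = 0.5) +
  # geom_density() uses stat "density", whose computed density is the default y
  geom_density(colour = "red")
```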
stat_summary is particularly useful to plot the average over all points at a given x:
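# mean tip at each party size, drawn as a line
# (in ggplot2 >= 3.3.0, fun.y is deprecated in favour of fun)
ggplot(aes(x=size, y=tip), data=tips) +
geom_jitter(alpha=0.5) +
stat_summary(fun.y="mean", geom="line")
# mean with a normal confidence interval (mean_cl_normal requires the Hmisc package)
ggplot(aes(x=size, y=tip), data=tips) +
geom_jitter(alpha=0.5) +
stat_summary(fun.data="mean_cl_normal", geom="smooth")
# same summaries computed per sex, this time by setting stat="summary" on the geoms
ggplot(aes(x=size, y=tip, colour=sex), data=tips) +
geom_point(alpha=0.5, position=position_jitter(width=0.2)) +
geom_line(stat="summary", fun.y="mean") +
geom_errorbar(stat="summary", fun.data="mean_cl_normal", width=0.2)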
Any aesthetic can be set to a constant value or mapped to a variable. In this example, colour is mapped to time but alpha is set to 0.5. In the qplot syntax, setting is achieved with the I() function (stands for “inhibit”).
ggplot(data=tips) +
geom_point(aes(x=total_bill, y=tip, colour=time, shape=sex, size=size), alpha=0.5)
qplot(total_bill, tip, data=tips, colour=time, shape=sex, size=size, alpha=I(0.5))
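The difference between setting and mapping is easiest to see when the same constant is used both ways. The sketch below (object names are illustrative, and tips is assumed to come from the reshape2 package) sets colour to "blue" outside aes() in one plot and maps it inside aes() in the other.

```r
library(ggplot2)
data(tips, package = "reshape2")  # assumption: tips is the reshape2 dataset

# Setting: colour is a constant given outside aes(); points are drawn in blue
p_set <- ggplot(tips, aes(x = total_bill, y = tip)) +
  geom_point(colour = "blue")

# Mapping: inside aes(), "blue" is treated as a one-level variable, so the
# points get the default discrete colour and a legend entry labelled "blue"
p_map <- ggplot(tips, aes(x = total_bill, y = tip)) +
  geom_point(aes(colour = "blue"))
```

If points come out in an unexpected colour together with a stray legend, a constant placed inside aes() is a likely cause.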
Scales define several important visual aspects of the plot, including its limits and transformation (e.g. log scale).
scale function
All parameters can be fine-tuned by calling the scale function. Scales can be of different types (mainly continuous, discrete, and manual):
ggplot(aes(x=total_bill, y=tip), data=tips) +
geom_point() +
geom_smooth() +
scale_x_continuous(name='total bill', trans='log10') +
scale_y_continuous(limits=c(1, 3))
## Warning: Removed 98 rows containing missing values (stat_smooth).
## Warning: Removed 98 rows containing missing values (geom_point).
## Warning: Removed 6 rows containing missing values (geom_path).
Compare with:
ggplot(aes(x=total_bill, y=tip), data=subset(tips, tip>=1 & tip<=3)) +
geom_point() +
geom_smooth() +
scale_x_continuous(name='total bill', trans='log10')
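The two calls above differ in how they treat the data: scale limits and subsetting both discard the rows outside the range before the smoother is fitted. If the aim is only to zoom in on the y range while still fitting the smoother on all rows, one possible alternative, not used in the examples above, is coord_cartesian(). A minimal sketch, assuming tips comes from the reshape2 package:

```r
library(ggplot2)
data(tips, package = "reshape2")  # assumption: tips is the reshape2 dataset

p_zoom <- ggplot(tips, aes(x = total_bill, y = tip)) +
  geom_point() +
  geom_smooth() +  # fitted on all rows, since no data is dropped
  scale_x_continuous(name = "total bill", trans = "log10") +
  coord_cartesian(ylim = c(1, 3))  # only the visible region is restricted
```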
Scales are used for all aesthetics (if you don’t set one, a default is used):
ggplot(aes(x=total_bill, y=tip, colour=size), data=tips) +
geom_point() +
scale_colour_gradient(low="red", high="white", trans="log10", breaks=c(1, 3, 6))
Several shortcuts mimicking the syntax of base R are available:
xlim and ylim,
xlab, ylab, and labs.
ggplot(aes(x=total_bill, y=tip), data=tips) +
geom_point() +
geom_smooth() +
xlim(10, 20) + xlab('total bill')
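The example above uses xlim and xlab; labs() can set several labels in one call, including the legend title of a mapped aesthetic. A small sketch with illustrative label texts, assuming tips comes from the reshape2 package:

```r
library(ggplot2)
data(tips, package = "reshape2")  # assumption: tips is the reshape2 dataset

p_labs <- ggplot(tips, aes(x = total_bill, y = tip, colour = time)) +
  geom_point() +
  # one call sets both axis labels, the legend title and the plot title
  labs(x = "total bill", y = "tip", colour = "meal",
       title = "Tip versus total bill")
```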
Position adjustments apply minor tweaks to the position of elements within a layer. The main values of the position argument are:
identity (don’t adjust position: most common default),
stack (overlapping objects are shown on top of one another),
dodge (overlapping objects are shown side by side),
jitter (jitter points to avoid overplotting).
ggplot() +
geom_histogram(aes(x=tip, fill=factor(size)), data=subset(tips, size%in%2:4)) # default position is 'stack'
ggplot() +
geom_histogram(aes(x=tip, y=..density.., fill=factor(size)), data=subset(tips, size%in%2:4), position='identity', alpha=0.5)
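The two histograms above cover stack and identity. The remaining two adjustments could be sketched as follows, using the same subset of tips (assumed to come from the reshape2 package); the choice of the day column for the bar chart is only illustrative.

```r
library(ggplot2)
data(tips, package = "reshape2")  # assumption: tips is the reshape2 dataset
d <- subset(tips, size %in% 2:4)

# dodge: bars for the different party sizes are placed side by side
p_dodge <- ggplot(d, aes(x = day, fill = factor(size))) +
  geom_bar(position = "dodge")

# jitter: points are shifted horizontally at random to reduce overplotting
p_jitter <- ggplot(d, aes(x = size, y = tip)) +
  geom_point(position = position_jitter(width = 0.2, height = 0), alpha = 0.5)
```

Setting height = 0 keeps the tip values exact, so only the discrete x position is perturbed.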
The general appearance of the plot is set by its theme. Themes can be set as default (for a given R session):
theme_set(theme_bw())
Themes can also be set (theme_xx) and customized (theme) for a given plot:
ggplot() +
geom_point(aes(x=total_bill, y=tip, colour=time, shape=sex, size=size), data=tips, alpha=0.5) +
theme_classic(base_size=18) + theme(legend.position='top')
You might also want to use a different discrete colour scale:
qplot(tip, fill=factor(size), data=subset(tips, size%in%2:4), geom="density", alpha=I(0.65)) +
scale_fill_brewer(palette="Set1")
# to set it as a default for discrete colour scales
scale_colour_discrete <- function(...) scale_colour_brewer(..., palette="Set1")
In order to arrange several plots on a single page, the easiest way is to use the gridExtra package:
library(gridExtra)
p1 <- qplot(total_bill, tip, data=tips, colour=time, shape=sex, size=size, alpha=I(0.5))
p2 <- qplot(tip, fill=factor(size), data=subset(tips, size%in%2:4), facets=sex~., geom="density", alpha=I(0.5))
grid.arrange(p1, p2, ncol = 2)
The multiplot function is a nice (user contributed) alternative, allowing you to save plots in a list and to define the relative size of the plots: http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/
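For what it is worth, grid.arrange() can also take a list of plots (through its grobs argument) and relative sizes (through widths and heights). A minimal sketch, assuming a reasonably recent gridExtra (version 2.0 or later) and tips from the reshape2 package; the two plots are simplified stand-ins for p1 and p2.

```r
library(ggplot2)
library(gridExtra)
data(tips, package = "reshape2")  # assumption: tips is the reshape2 dataset

# plots stored in a list
plots <- list(
  ggplot(tips, aes(x = total_bill, y = tip)) + geom_point(),
  ggplot(tips, aes(x = tip)) + geom_density()
)

# two columns, the first twice as wide as the second;
# grid.arrange() draws the page and invisibly returns the layout
g <- grid.arrange(grobs = plots, ncol = 2, widths = c(2, 1))
```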