Graphics: Customization
Prologue
- In our previous session, we looked at creating simple graphs with no aesthetic touches or additional annotations whatsoever.
- Sometimes we want to tweak the non-data part of our graph, or should I say “the little things”.
- Nevertheless, it is always the little things that make the biggest difference. Therefore, I strongly believe in giving great attention to small details.
- Customizing a graph is usually done for presentation or publication. Oftentimes, we plot quick and basic graphs, as we did previously, for exploratory analysis.
- We will be using the ggplot2 package for plotting purposes here.
- We will be using the iris dataset available in base R here.
library(ggplot2)
data(iris)
class(iris); dim(iris); str(iris)
## [1] "data.frame"
## [1] 150 5
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
Title and axis labels
- Pass the relevant arguments into the labs() function.
- The title argument specifies your main title.
- The subtitle argument specifies the smaller-font title just below your main title.
- The x argument specifies your x-axis label.
- The y argument specifies your y-axis label.
# Bare
ggplot(data=iris) +
geom_point(mapping=aes(x=Sepal.Length, y=Petal.Length))

# Ornamented
ggplot(data=iris) +
geom_point(mapping=aes(x=Sepal.Length, y=Petal.Length)) +
labs(title="Iris Dataset", subtitle="Correlation between sepal length and petal length", x="Sepal length", y="Petal length")

- The main title and subtitle are left-aligned by default.
- Pass the hjust argument to the element_text() function to configure the alignment (0 is left, 0.5 is center, 1 is right).
- The plot.title and plot.subtitle arguments of theme() refer to the main title and subtitle, respectively.
ggplot(data=iris) +
geom_point(mapping=aes(x=Sepal.Length, y=Petal.Length)) +
labs(title="Iris Dataset", subtitle="Correlation between sepal length and petal length", x="Sepal length", y="Petal length") +
theme(plot.title=element_text(hjust = 0.5), plot.subtitle=element_text(hjust = 0.5))
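- As a small sketch of the other end of the hjust range, the same plot can be right-aligned with hjust=1. The object name p_right and the shortened subtitle are my own choices for illustration.

```r
library(ggplot2)

# hjust runs from 0 (left) to 1 (right); 0.5 centers the text
p_right <- ggplot(data=iris) +
  geom_point(mapping=aes(x=Sepal.Length, y=Petal.Length)) +
  labs(title="Iris Dataset", subtitle="Sepal length versus petal length") +
  theme(plot.title=element_text(hjust=1), plot.subtitle=element_text(hjust=1))

# Printing the object draws the plot
p_right
```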

Tick labels
- This would be relevant for categorical variables mapped to the x-axis.
- You may prefer labels other than the default ones in your data frame.
- Pass the labels argument to the scale_x_discrete() function for this purpose.
# Bare
ggplot(data=iris) +
geom_boxplot(mapping=aes(x=Species, y=Sepal.Length))

# Ornamented
ggplot(data=iris) +
geom_boxplot(mapping=aes(x=Species, y=Sepal.Length)) +
scale_x_discrete(labels=c("Setosa species", "Versicolor species", "Virginica species"))
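- An unnamed labels vector is matched to the factor levels by position, so it silently mislabels the boxes if the level order ever changes. A possible safeguard, sketched below, is a named vector where each name is a level in the data. The names species_labels and p_named are hypothetical.

```r
library(ggplot2)

# Names are the factor levels in the data; values are the labels to display
species_labels <- c(
  setosa="Setosa species",
  versicolor="Versicolor species",
  virginica="Virginica species"
)

p_named <- ggplot(data=iris) +
  geom_boxplot(mapping=aes(x=Species, y=Sepal.Length)) +
  scale_x_discrete(labels=species_labels)

p_named
```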

Tick values and intervals
- Or more appropriately called axis scale and intervals.
- This would be relevant for continuous variables mapped to either the x- or y-axis.
- Pass the limits argument to the scale_x_continuous() and scale_y_continuous() functions for this purpose.
- The limits argument takes a vector of two values. The first value specifies the lower limit. The second value specifies the upper limit.
ggplot(data=iris) +
geom_point(mapping=aes(x=Sepal.Length, y=Petal.Length)) +
scale_x_continuous(limits=c(4, 9)) +
scale_y_continuous(limits=c(1, 8))
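- One thing to be aware of: when the limits are narrower than the data, the scale drops the observations that fall outside them, and ggplot2 warns about removed rows. If you only want to zoom in, coord_cartesian() is an alternative that keeps all the data. The sketch below compares the two; the range of 5 to 7 and the object names are arbitrary choices for illustration.

```r
library(ggplot2)

# Limits on the scale: points outside 5 to 7 are dropped before plotting
p_limits <- ggplot(data=iris) +
  geom_point(mapping=aes(x=Sepal.Length, y=Petal.Length)) +
  scale_x_continuous(limits=c(5, 7))

# Limits on the coordinate system: the view is zoomed, no data is dropped
p_zoom <- ggplot(data=iris) +
  geom_point(mapping=aes(x=Sepal.Length, y=Petal.Length)) +
  coord_cartesian(xlim=c(5, 7))

# Number of observations that lie outside the chosen range
n_outside <- sum(iris$Sepal.Length < 5 | iris$Sepal.Length > 7)
n_outside
```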

- You may want to also specify the intervals in addition to specifying the lower and upper limits of the scale.
- Pass the breaks argument to the scale_x_continuous() and scale_y_continuous() functions for this purpose.
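- The breaks argument takes a vector of tick positions, built here with seq(). The first value specifies the first tick. The second value specifies the last tick. The by value specifies the interval. Note that breaks only controls where the ticks are drawn; it does not change the axis limits.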
ggplot(data=iris) +
geom_point(mapping=aes(x=Sepal.Length, y=Petal.Length)) +
scale_x_continuous(breaks=seq(4, 9, by=0.5)) +
scale_y_continuous(breaks=seq(1, 8, by=0.5))
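- Since breaks and limits do different jobs, you would usually pass both in the same scale call when you want a fixed range with regular ticks. A minimal sketch follows; the names x_breaks, y_breaks and p_scaled are mine.

```r
library(ggplot2)

# Tick positions: 4.0, 4.5, ..., 9.0 on x and 1, 2, ..., 8 on y
x_breaks <- seq(4, 9, by=0.5)
y_breaks <- seq(1, 8, by=1)

p_scaled <- ggplot(data=iris) +
  geom_point(mapping=aes(x=Sepal.Length, y=Petal.Length)) +
  scale_x_continuous(limits=c(4, 9), breaks=x_breaks) +
  scale_y_continuous(limits=c(1, 8), breaks=y_breaks)

p_scaled
```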

Font size
- Pass the size argument to the element_text() function to configure the font size (in points).
- The axis.text and axis.title arguments of theme() refer to the tick labels and axis labels, respectively.
# Smaller font
ggplot(data=iris) +
geom_point(mapping=aes(x=Sepal.Length, y=Petal.Length)) +
labs(title="Iris Dataset", subtitle="Correlation between sepal length and petal length", x="Sepal length", y="Petal length") +
theme(axis.text=element_text(size=1), axis.title=element_text(size=2), plot.subtitle=element_text(hjust=0.5, size=3), plot.title=element_text(hjust=0.5, size=4))

# Larger font
ggplot(data=iris) +
geom_point(mapping=aes(x=Sepal.Length, y=Petal.Length)) +
labs(title="Iris Dataset", subtitle="Correlation between sepal length and petal length", x="Sepal length", y="Petal length") +
theme(axis.text=element_text(size=11), axis.title=element_text(size=12), plot.subtitle=element_text(hjust=0.5, size=13), plot.title=element_text(hjust=0.5, size=14))

- Aside from size, there are other parameters that you may configure. The main ones are as follows.
#element_text(family = NULL, face = NULL, colour = NULL, size = NULL, hjust = NULL, vjust = NULL, angle = NULL, lineheight = NULL, color = NULL)
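- To give a feel for a few of these parameters, here is a rough sketch that makes the title bold and dark blue and tilts the x-axis tick labels. The particular values (45 degrees, "darkblue") and the name p_text are arbitrary choices for illustration.

```r
library(ggplot2)

p_text <- ggplot(data=iris) +
  geom_boxplot(mapping=aes(x=Species, y=Sepal.Length)) +
  labs(title="Iris Dataset", x="Species", y="Sepal length") +
  theme(
    # face sets the font style, colour the text color
    plot.title=element_text(face="bold", colour="darkblue", hjust=0.5),
    # angle rotates the tick labels; hjust=1 right-aligns them under the ticks
    axis.text.x=element_text(angle=45, hjust=1)
  )

p_text
```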
Color fills
- We will be looking into using just one color here.
- We will look into using more than one color in a later session, when we look into adding new variables/dimensions/levels.
- Pass the color argument to the geom_*() function for dots.
- Pass the fill argument to the geom_*() function for boxes.
ggplot(data=iris) +
geom_boxplot(mapping=aes(x=Species, y=Sepal.Length), fill="blue")

ggplot(data=iris) +
geom_point(mapping=aes(x=Sepal.Length, y=Petal.Length), color="blue")
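- Colors are not limited to names like "blue". R also accepts hexadecimal codes of the form "#RRGGBB", and base R's colors() function lists the built-in color names. A short sketch follows; the hex code and the name p_hex are just examples I picked.

```r
library(ggplot2)

# First few of the color names that R recognizes
head(colors())

# The same scatterplot, with the color given as a hex code
p_hex <- ggplot(data=iris) +
  geom_point(mapping=aes(x=Sepal.Length, y=Petal.Length), color="#1B9E77")

p_hex
```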

Shapes
- The default shape for points, such as those in scatterplots, is a solid round dot.
- The full list of shapes can be found here.
- Pass the shape argument to the geom_*() function to specify your preferred shape.
ggplot(data=iris) +
geom_point(mapping=aes(x=Sepal.Length, y=Petal.Length), shape=23)

# Combine color with shape
ggplot(data=iris) +
geom_point(mapping=aes(x=Sepal.Length, y=Petal.Length), shape=23, fill="blue")
- Note that the fill argument has an effect only for point shapes 21 to 25.
Borders
- You may want the boxes in your boxplots or the points in your scatterplots to have colored borders.
- For boxplots (and also bar charts), you would need to pass the fill and color arguments to the geom_*() function, where fill and color specify the fill and border, respectively.
- Similarly for scatterplots, you would need to pass the fill and color arguments to the geom_*() function, where fill and color specify the fill and border, respectively. This works only for point shapes 21 to 25.
ggplot(data=iris) +
geom_boxplot(mapping=aes(x=Species, y=Sepal.Length), fill="blue", color="red")

ggplot(data=iris) +
geom_point(mapping=aes(x=Sepal.Length, y=Petal.Length), shape=23, fill="blue", color="red")

Background
- The default background color is grey with white grids.
- Use the theme_*() functions to specify which background you prefer.
- There are numerous theme_*() functions. Here, we will look into just a few.
- The theme_bw() function yields light grey grid lines against a white background, with a border around the panel.
- The theme_classic() function draws the x- and y-axis lines but removes the grid lines.
- The theme_linedraw() function highlights the edge of the graph and retains the grid lines.
ggplot(data=iris) +
geom_point(mapping=aes(x=Sepal.Length, y=Petal.Length)) +
theme_bw()

ggplot(data=iris) +
geom_point(mapping=aes(x=Sepal.Length, y=Petal.Length)) +
theme_classic()

ggplot(data=iris) +
geom_point(mapping=aes(x=Sepal.Length, y=Petal.Length)) +
theme_linedraw()
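- If you combine a theme_*() function with your own theme() tweaks, the order matters. A theme_*() function sets every theme element, so it overrides any theme() call that comes before it. The sketch below illustrates this with a centered title; the object names are hypothetical.

```r
library(ggplot2)

p_base <- ggplot(data=iris) +
  geom_point(mapping=aes(x=Sepal.Length, y=Petal.Length)) +
  labs(title="Iris Dataset")

# Tweak applied after the complete theme: the centered title is kept
p_kept <- p_base +
  theme_bw() +
  theme(plot.title=element_text(hjust=0.5))

# Complete theme applied last: it resets plot.title, so the title is left-aligned again
p_reset <- p_base +
  theme(plot.title=element_text(hjust=0.5)) +
  theme_bw()

p_kept
p_reset
```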

Flipped graphs
- Useful for some bar charts and boxplots especially when the variables are ordered in some way, i.e. in ascending or descending order.
- Switches the position of x- and y-axes.
- The coord_flip() is useful for this purpose.
- Example of a flipped bar chart as follows.
# Calculate mean of sepal length for each species
collapsed <- aggregate(Sepal.Length ~ Species, data=iris, mean)
# Not flipped: plot the per-species means stored in collapsed
ggplot(data=collapsed) +
geom_bar(mapping=aes(x=Species, y=Sepal.Length), stat="identity")

# Flipped
ggplot(data=collapsed) +
geom_bar(mapping=aes(x=Species, y=Sepal.Length), stat="identity") +
coord_flip()

- Notice that the species labels now sit on the vertical axis and are still drawn horizontally, so they stay easy to read.
- Example of a flipped boxplot as follows.
ggplot(data=iris) +
geom_boxplot(mapping=aes(x=Species, y=Sepal.Length)) +
coord_flip()
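- Flipping pays off most when the bars are sorted. One way to sort them, sketched here, is to reorder the factor levels by the plotted value with base R's reorder(). I use mean sepal width in this sketch, rather than sepal length, because its species order differs from the default alphabetical order. The names collapsed_w and p_sorted are mine.

```r
library(ggplot2)

# Mean sepal width for each species
collapsed_w <- aggregate(Sepal.Width ~ Species, data=iris, mean)

# reorder() sorts the Species levels by ascending mean sepal width;
# after coord_flip() the first level is drawn at the bottom
p_sorted <- ggplot(data=collapsed_w) +
  geom_bar(mapping=aes(x=reorder(Species, Sepal.Width), y=Sepal.Width), stat="identity") +
  labs(x="Species", y="Mean sepal width") +
  coord_flip()

p_sorted
```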

Summary
- Remember that you may combine customizations for different features in a single function call, e.g. you may customize the font, alignment, color etc. all in one theme() call.
- These are just some of the customization features that I can think of off the top of my head.
- As you plot your own graphs, you may realize additional features that you may wish to customize.
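- To tie things together, here is one possible way of stacking several of the customizations from this session on a single plot. It is only a sketch: the specific sizes, colors and breaks are arbitrary, and p_final is a name I made up.

```r
library(ggplot2)

p_final <- ggplot(data=iris) +
  # Shape 23 is a fillable diamond: fill sets the inside, color the border
  geom_point(mapping=aes(x=Sepal.Length, y=Petal.Length), shape=23, fill="blue", color="red") +
  labs(title="Iris Dataset", subtitle="Sepal length versus petal length", x="Sepal length", y="Petal length") +
  scale_x_continuous(limits=c(4, 9), breaks=seq(4, 9, by=1)) +
  scale_y_continuous(limits=c(1, 8), breaks=seq(1, 8, by=1)) +
  # Complete theme first, then the individual tweaks in one theme() call
  theme_bw() +
  theme(
    plot.title=element_text(hjust=0.5, size=14),
    plot.subtitle=element_text(hjust=0.5, size=12),
    axis.title=element_text(size=12),
    axis.text=element_text(size=11)
  )

p_final
```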