- Week 2: Principles of data visualisation
- Week 3: Grammar of Graphics; aesthetics and attributes
- Week 4: Major visualisation tools
- Week 5: Customising visualisations (scales, themes, and labels)
Download the exercises from the Week 3 folder on NOW and move them into your R project directory.
ggplot2

| Component | Example | Description |
|---|---|---|
| Data | d_spellname | The dataset being plotted |
| Aesthetics (aes) | x = dur, y = rt | How data variables map to visual properties |
| Geometries (geom) | geom_point() | The type of plot element (points, bars, lines) |
| Statistics (stat) | stat_smooth() | Transformations of the data (e.g., regression lines) |
| Scales | scale_x_log10() | How data values are converted to aesthetics |
| Facets | facet_wrap(~group) | Splitting data into small multiples |
| Coordinate system | coord_polar() | The space in which data are drawn (Cartesian, polar) |
| Theme | theme_minimal() | Non-data display elements (fonts, text, grid lines, etc.) |
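The sketch below puts one instance of each component from the table into a single plot. Because d_spellname is only available in the course files, it uses R's built-in mtcars data as a stand-in (wt = car weight, mpg = fuel economy, cyl = number of cylinders); the specific choices of scale, facet and theme are illustrative assumptions, not the only options.

```r
library(ggplot2)

# Hypothetical stand-in data: mtcars instead of d_spellname
p <- ggplot(data = mtcars,                        # Data
            mapping = aes(x = wt, y = mpg)) +     # Aesthetics
  geom_point() +                                  # Geometry: points
  stat_smooth(method = "lm", formula = y ~ x,     # Statistic: linear fit
              se = FALSE) +
  scale_x_log10() +                               # Scale: log10 x-axis
  facet_wrap(~cyl) +                              # Facets: one panel per cylinder count
  coord_cartesian() +                             # Coordinate system (the default)
  theme_minimal()                                 # Theme: non-data elements

print(p)
```

Only geom_point() and stat_smooth() add layers; the remaining components modify how those layers are scaled, split, positioned and styled.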
ggplot2 is an R package for creating statistical / data graphics.
ggplot2 builds on Wilkinson’s grammar of graphics by focussing on the primacy of layers and adapting the grammar for use in R (Wickham, 2010).
It is an alternative to the base R graphics functions (e.g. plot(), hist()).
Think of language, where grammar determines how words can be combined into sentences:
- “words”: graphics consist of distinct layers of grammatical elements (data, aesthetics, geometries)
- “grammar”: graphics are built around mappings that determine how data, aesthetics and geometries are combined

- Grammatical elements are organised as layers
- An underlying grammar controls how graphics are combined
- A system of rules maps variables to graphical properties
Instead of memorising “plot types”, the Grammar of Graphics teaches how plots are built.
It gives flexibility: a wide range of plots can be created by combining consistent building blocks.
- Data: ggplot(data = ...)
- Aesthetic mappings: mapping = aes()
- Geometries: geom_...()

ggplot(d_spellname, aes(x = dur, y = rt))
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point()
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_quantile()
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_rug()
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point() + geom_quantile() + geom_rug()
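To see that a ggplot is an object built up layer by layer, the sketch below stores each stage and counts its layers. It assumes the built-in mtcars data as a stand-in for d_spellname; the object names p0, p1 and p2 are hypothetical.

```r
library(ggplot2)

# Stand-in data: mtcars (wt = weight, mpg = fuel economy)
p0 <- ggplot(mtcars, aes(x = wt, y = mpg))  # data + mappings, no layers yet
p1 <- p0 + geom_point()                     # adds a point layer
p2 <- p1 + geom_rug()                       # adds a rug layer on top

length(p0$layers)  # 0
length(p1$layers)  # 1
length(p2$layers)  # 2

# Each layer carries its own geom (and stat)
class(p2$layers[[2]]$geom)[1]  # "GeomRug"
```

Printing p0 alone draws empty axes: the data and mappings are defined, but nothing is drawn until a geom layer is added.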
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point()
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point() + facet_grid( ~ modality)
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point() + stat_smooth(method = "lm", se = FALSE)
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point() + coord_trans(x = "log", y = "log")
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point() + coord_flip()
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point() + theme_dark()
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point() + theme(panel.background = element_blank())
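Scales and coordinate systems can both apply a log transformation, but at different stages: a scale transforms the data before statistics (such as stat_smooth()) are computed, whereas the coordinate system transforms only when the plot is drawn. The sketch below, assuming mtcars as stand-in data, inspects the computed layer data with layer_data() to show the difference. Note that coord_trans() may emit a deprecation message in recent ggplot2 versions.

```r
library(ggplot2)

# Log-transform x via the scale: data are transformed before stats
p_scale <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  scale_x_log10()

# Log-transform x via the coordinate system: applied only when drawing
p_coord <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  coord_trans(x = "log10")

# layer_data() returns the computed data of the first layer
head(layer_data(p_scale)$x)  # log10(wt) values
head(layer_data(p_coord)$x)  # original wt values
```

With a fitted line added, the two versions can therefore differ: under scale_x_log10() the regression is fitted to log10(wt), under coord_trans() it is fitted to wt and then drawn on a log axis.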
Complete the R Markdown document 1a_gog.Rmd.
Bonus: 1b_gog.Rmd
ggplot(d_spellname, aes(x = dur, y = rt, colour = group)) +
geom_point(size = .75) +
stat_smooth(method = "lm", se = FALSE) +
labs(x = "Response duration (in msecs)",
y = "Reaction time (in msecs)") +
theme(legend.position = "right",
legend.justification = "top",
legend.direction = "vertical")
Aesthetics: visual properties mapped to variables (data-driven), e.g. aes(x = dur, y = rt, colour = group) → colour changes with the data
Attributes: visual properties set manually (constant), e.g. geom_point(colour = "blue", size = 2) → colour is fixed, not mapped to data
Think:
Aesthetic = mapping (data → visual)
Attribute = setting (visual → fixed property)
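The difference between mapping and setting becomes visible in the colours ggplot2 actually computes. The sketch below, assuming mtcars as stand-in data, compares an attribute, an aesthetic, and a common mistake: putting a constant inside aes(), which ggplot2 treats as a variable with a single level called "blue".

```r
library(ggplot2)

# Attribute: a fixed setting, given outside aes()
p_attr <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(colour = "blue")

# Aesthetic: a mapping from a variable, given inside aes()
p_aes <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = factor(cyl)))

# Mistake: a constant inside aes() becomes a one-level category
p_oops <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = "blue"))

unique(layer_data(p_attr)$colour)  # "blue"
unique(layer_data(p_aes)$colour)   # three hues, one per cylinder count
unique(layer_data(p_oops)$colour)  # "#F8766D": the default first hue, not blue
```

p_oops also gains a legend titled "colour" with a single entry "blue", another sign that the value was mapped rather than set.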
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point(colour = "red")
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point(aes(colour = modality))
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point(aes(colour = modality)) + stat_smooth(method = "lm")
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point() + stat_smooth(aes(colour = modality), method = "lm")
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point(aes(colour = modality)) + stat_smooth(aes(colour = modality), method = "lm")
ggplot(d_spellname, aes(x = dur, y = rt, colour = modality)) + geom_point() + stat_smooth(method = "lm")
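Where a mapping is placed decides which layers inherit it: mappings in ggplot() apply to every layer, mappings inside a geom or stat apply to that layer only. The sketch below, assuming mtcars as stand-in data, counts how many groups the smoothing layer fits in each case.

```r
library(ggplot2)

# Local mapping: colour only in the point layer, so one overall fit
p_local <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = factor(cyl))) +
  stat_smooth(method = "lm", formula = y ~ x)

# Global mapping: colour inherited by all layers, so one fit per group
p_global <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x)

# Number of distinct groups in the smoothing layer (layer 2)
length(unique(layer_data(p_local, 2)$group))   # 1
length(unique(layer_data(p_global, 2)$group))  # 3
```

A discrete colour mapping therefore does double duty: it sets the colour and defines the groups that the statistic is computed for.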
ggplot(d_spellname, aes(x = dur, y = rt, shape = modality)) + geom_point(size = 2.5, colour = "red")
ggplot(d_spellname, aes(x = dur, y = rt, colour = modality)) + geom_point(size = 2.5)
ggplot(d_spellname, aes(x = dur, y = rt, colour = modality, shape = modality)) + geom_point(size = 2.5)
ggplot(d_spellname, aes(x = dur, y = rt, colour = modality, shape = name_familiarised)) + geom_point(size = 2.5)
ggplot(d_spellname, aes(x = dur, y = rt, colour = interaction(name_familiarised, modality, sep = "; "))) + geom_point(size = 2.5) + labs(colour = "")
Frequently used aesthetics: x, y, colour, fill, group

Aesthetics understood by individual geoms:
- geom_point(): x, y, shape, colour, fill, size, alpha, stroke, group
- geom_bar(): x, y, colour, fill, linewidth, linetype, alpha, width, group
- geom_boxplot(): x, y, lower, xlower, upper, xupper, middle, xmiddle, ymin, xmin, ymax, xmax, weight, colour, fill, size, alpha, shape, linetype, linewidth, width, group
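Rather than memorising these lists, the aesthetics a geom understands can be queried from R. The sketch below assumes the aesthetics() method of ggplot2's exported Geom objects (GeomPoint, GeomBoxplot); the "Aesthetics" section of each geom's help page (e.g. ?geom_point) gives the same information.

```r
library(ggplot2)

# Each Geom object can list the aesthetics it understands
GeomPoint$aesthetics()
GeomBoxplot$aesthetics()
```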
Complete the R Markdown documents
A variable's type (e.g. <dbl> or <chr>) determines which aesthetics you can map it to and how ggplot2 interprets it visually.
glimpse(d_spellname)
Rows: 144
Columns: 7
$ ppt_id            <dbl> 40, 40, 41, 41, 42, 42, 43, 43, 44, 44, 45, 45, 46, …
$ name_familiarised <chr> "familiar name", "unfamiliar name", "familiar name",…
$ ppt_vocab         <dbl> 0.9500, 0.9500, 1.0000, 1.0000, 0.9250, 0.9250, 0.90…
$ modality          <chr> "speech", "speech", "speech", "speech", "speech", "s…
$ rt                <dbl> 1476.1579, 1249.5161, 1404.2000, 1170.5397, 1283.455…
$ dur               <dbl> 735.2632, 686.8065, 565.0400, 542.4603, 623.5147, 63…
$ group             <chr> "Modality: speech; familiar name", "Modality: speech…
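ggplot2 automatically detects the variable type: numeric variables (e.g. <dbl>) are treated as continuous, character (<chr>) and factor (<fct>) variables as categorical (discrete).
Sometimes categorical variables are coded as numbers (e.g. condition = 1, 2, 3, 4; male = 1, female = 2). ggplot2 then treats them as continuous unless they are converted to factors.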
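The sketch below shows the effect of converting a numerically coded category with factor(). It assumes mtcars as stand-in data, where cyl (4, 6, 8) is stored as a number; the sex vector is a hypothetical example of codes with labels.

```r
library(ggplot2)

# cyl stored as a number: ggplot2 uses a continuous colour gradient
p_num <- ggplot(mtcars, aes(x = wt, y = mpg, colour = cyl)) +
  geom_point()

# cyl converted to a factor: one distinct hue per category
p_fac <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point()

# Hypothetical numeric codes recoded with meaningful labels
sex <- c(1, 2, 2, 1)
sex_f <- factor(sex, levels = c(1, 2), labels = c("male", "female"))
sex_f
```

Giving the factor explicit labels also produces a readable legend, instead of the raw codes.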
See Wong (2010), p. 665.
Continuous variables are best represented using gradual or quantitative encodings:
ggplot(d_spellname, aes(x = modality, y = rt)) + geom_jitter()
ggplot(d_spellname, aes(x = modality, y = rt)) + geom_jitter() + facet_wrap(~modality, scales = "free")
ggplot(d_spellname, aes(x = dur,
y = rt,
colour = ppt_vocab)) +
geom_point() +
scale_colour_gradient(high = "darkred", low = "yellow")
See Wong (2010), p. 665.
Categorical variables are best represented using distinct encodings:
ggplot(d_spellname, aes(x = dur,
y = rt,
colour = modality)) +
geom_point() +
scale_colour_brewer(type = "qual")
text / labels
ggplot(d_spellname, aes(x = dur,
y = rt,
label = modality)) +
geom_text(size = 3)
shape
ggplot(d_spellname, aes(x = dur,
y = rt,
shape = modality)) +
geom_point(size = 3)
ggplot(d_spellname, aes(x = dur,
y = rt,
colour = modality)) +
geom_point(size = 3)
ggplot(d_spellname, aes(x = dur,
y = rt,
colour = modality)) +
stat_smooth(method = "lm", se = FALSE)
linetype
ggplot(d_spellname, aes(x = dur,
y = rt,
linetype = modality)) +
stat_smooth(method = "lm", se = FALSE)
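size (for lines: linewidth)
# Since ggplot2 3.4.0, line thickness is controlled by linewidth rather than size
ggplot(d_spellname, aes(x = dur,
y = rt,
linewidth = modality)) +
stat_smooth(method = "lm", se = FALSE)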
| Variable Type | Typical Aesthetics | Visual Encoding |
|---|---|---|
| Continuous | x, y, size, colour (gradient), alpha | magnitude, order |
| Categorical | x, shape, colour (hue), fill | category, grouping |
Which aesthetics are most effective for continuous vs categorical variables?
Principles of perceptual psychology and visual encoding theory underpin the Grammar of Graphics and ggplot2’s design choices (Cleveland & McGill, 1984; Wilkinson, 2005).
The human visual system perceives some encodings more accurately than others.
| Visual encoding (most to least accurate for continuous data) | Description | Example in ggplot2 |
|---|---|---|
| Position on a common scale | where a point lies along an axis | aes(x, y) |
| Length / Size (1D) | bar height, line length | geom_bar(), aes(size) |
| Angle / Slope | pie chart slices, line angle | coord_polar(), geom_line() |
| Area (2D size) | circle or bubble area | aes(size) (use carefully) |
| Colour luminance / saturation | brightness / lightness gradient | scale_colour_gradient() |
| Colour hue | qualitative differences | scale_colour_brewer() |
| Shape / Symbol type | triangle, circle, square | aes(shape) |
| Spatial region / volume | map regions, 3D volume | geom_polygon() |
In ggplot2, continuous variables are best served by position or gradual encodings:
- x, y: continuous scales (most accurate)
- size: can work, but avoid misleading area scaling
- colour: use gradients
- alpha: transparency can suggest intensity

Avoid using hue or shape: humans don’t perceive continuous changes in colour hue or shape as ordered or proportional.
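The caution about size comes from how ggplot2 scales it: scale_size() (the default for size) maps values to point area, while scale_radius() maps them to radius. The sketch below uses a small made-up data frame with hypothetical values 1 to 4 to compare the resulting point sizes.

```r
library(ggplot2)

# Hypothetical toy data: four points with values 1 to 4
d <- data.frame(x = 1:4, y = 1, v = 1:4)

p_area <- ggplot(d, aes(x, y, size = v)) + geom_point() +
  scale_size(range = c(1, 6))    # default behaviour: v mapped to area

p_radius <- ggplot(d, aes(x, y, size = v)) + geom_point() +
  scale_radius(range = c(1, 6))  # v mapped to radius

layer_data(p_area)$size    # steps shrink as v grows (square-root scaling)
layer_data(p_radius)$size  # equal steps: approx. 1, 2.67, 4.33, 6
```

Area scaling keeps perceived size roughly proportional to the value; radius scaling exaggerates differences, because doubling the radius quadruples the area.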
Categorical variables are best served by distinct encodings:
- x, colour, fill, shape, linetype
- qualitative colour palettes (e.g. scale_colour_brewer(type = "qual"))

Each category should be visually distinct: no gradient or ordering is implied.
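A minimal sketch of a qualitative palette applied to a categorical variable, assuming mtcars as stand-in data. The choice of the ColorBrewer palette "Set2" is an assumption; any qualitative palette (e.g. "Set1", "Dark2") would serve, and when a palette is named explicitly the type argument is not needed.

```r
library(ggplot2)

# Categorical variable mapped to colour with a qualitative palette
p_qual <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  scale_colour_brewer(palette = "Set2")  # distinct, unordered colours

print(p_qual)
```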
# Modality as character
ggplot(d_spellname, aes(x = dur,
y = rt,
colour = modality)) +
geom_point()
# Convert modality to numbers
d_spell_2 <- mutate(d_spellname,
mod_2 = as.numeric(factor(modality)))
ggplot(d_spell_2, aes(x = dur,
y = rt,
colour = mod_2)) +
geom_point()
# ppt_vocab as numeric
ggplot(d_spellname, aes(x = dur,
y = rt,
colour = ppt_vocab)) +
geom_point()
# Convert ppt_vocab to characters
d_spell_2 <- mutate(d_spellname,
vocab_2 = as.character(ppt_vocab))
ggplot(d_spell_2, aes(x = dur,
y = rt,
colour = vocab_2)) +
geom_point()
| Aesthetic | Best For | Why |
|---|---|---|
| Position (x, y) | Continuous | Most precise, linear mapping |
| Length / Size | Continuous | Perceived magnitude |
| Colour (lightness) | Continuous | Suggests intensity or order |
| Colour (hue) | Categorical | Distinct, unordered groups |
| Shape / Linetype | Categorical | Symbolic differences |
| Fill / Alpha | Either | Can show density or group separation |
Hue refers to the type of colour: what we usually mean when we say red, blue, green, yellow, etc.
It distinguishes different colours along the colour wheel, not how bright or dark they are.
In ggplot2 terms: when you map a categorical variable to colour (or fill), ggplot2 typically varies the hue:
Each penguin species (palmerpenguins::penguins) gets a different hue (blue, green, red).
This uses a qualitative palette, where hues are different but brightness is roughly equal.
No order is implied, just distinct colour identities.
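The default discrete colour palette comes from scales::hue_pal(), installed alongside ggplot2. It spaces hues evenly around the colour wheel while holding chroma (c = 100) and luminance (l = 65) fixed, which is why the colours differ in hue but not in brightness. The sketch below shows the three colours used for a three-level variable.

```r
library(scales)

# Default palette for three categories: red, green, blue
hue_pal()(3)  # "#F8766D" "#00BA38" "#619CFF"

# Same colours with the default parameters written out
hue_pal(h = c(0, 360) + 15, c = 100, l = 65)(3)

# Draw the swatches
show_col(hue_pal()(3))
```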
| Type | Used For | Visual Meaning |
|---|---|---|
| Hue-based (qualitative) | Categorical data | Different kinds |
| Lightness-based (sequential) | Continuous data | More ↔ Less |
| Diverging (two hues) | Continuous data around a midpoint | Below vs. above average |
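A minimal sketch of one scale per palette type in the table, assuming mtcars as stand-in data (factor(cyl) as the categorical variable, hp as the continuous one). The colour names and the use of the median as the diverging midpoint are illustrative assumptions.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = wt, y = mpg))

# Hue-based (qualitative): categorical data
p_qual <- base + geom_point(aes(colour = factor(cyl))) +
  scale_colour_hue()

# Lightness-based (sequential): continuous data, more vs less
p_seq <- base + geom_point(aes(colour = hp)) +
  scale_colour_gradient(low = "lightyellow", high = "darkred")

# Diverging (two hues): continuous data around a midpoint
p_div <- base + geom_point(aes(colour = hp)) +
  scale_colour_gradient2(low = "steelblue", mid = "white", high = "firebrick",
                         midpoint = median(mtcars$hp))
```

For the diverging scale, the midpoint should be a value with real meaning (e.g. zero change or an average); otherwise a sequential scale is usually the clearer choice.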
Complete the R Markdown document 3_aaa.Rmd on combining attributes and aesthetics.
Next week we will continue with data visualisation in ggplot2. For basics of data visualisation in ggplot2 see
Identify a dataset you would like to use for the formative assessment: post a description on our Teams channel.
Think about what you would like to visualise about these data:
Andrews, M. (2021). Doing data science in R: An introduction for Social Scientists. SAGE Publications Ltd.
Cleveland, W. S., & McGill, R. (1984). Graphical perception: Theory, experimentation, and application to the development of graphical methods. Journal of the American Statistical Association, 79(387), 531–554.
Wickham, H. (2010). A layered grammar of graphics. Journal of Computational and Graphical Statistics, 19(1), 3–28.
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer.
Wickham, H., & Grolemund, G. (2016). R for data science: Import, tidy, transform, visualize, and model data. O’Reilly Media, Inc.
Wilkinson, L. (2005). The grammar of graphics (2nd ed.). Springer.
Wong, B. (2010). Points of view: Design of data figures. Nature Methods, 7(9), 665.