- Week 2: Principles of data visualisation
- Week 3: Grammar of Graphics; aesthetics and attributes
- Week 4: Major visualisation tools
- Week 5: Customising visualisations (scales, themes, and labels)
Download the exercises from the Week 3 folder on NOW and move them into your R project directory.
ggplot2
Component | Example | Description |
---|---|---|
Data | d_spellname | The dataset being plotted |
Aesthetics (aes) | x = dur, y = rt | How data variables map to visual properties |
Geometries (geom) | geom_point() | The type of plot element (points, bars, lines) |
Statistics (stat) | stat_smooth() | Transformations of data (e.g., regression lines) |
Scales | scale_x_log10() | How data values are converted to aesthetics |
Facets | facet_wrap(~group) | Splitting data into small multiples |
Coordinate system | coord_polar() | The space where data are drawn (Cartesian, polar) |
Theme | theme_minimal() | Non-data display elements (fonts, text, grids, etc.) |
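
As a rough sketch of how the components in the table fit together, the chunk below uses one function per component in a single plot. The data frame d_demo is a simulated stand-in for d_spellname (the values are made up), and the particular choices (log scale, Cartesian coordinates) are only illustrative.

```r
library(ggplot2)

# Simulated stand-in for d_spellname; all values are made up for illustration
set.seed(1)
d_demo <- data.frame(
  dur      = runif(60, 400, 900),
  modality = rep(c("speech", "writing"), each = 30)
)
d_demo$rt <- 800 + 0.6 * d_demo$dur + rnorm(60, sd = 80)

p_all <- ggplot(d_demo, aes(x = dur, y = rt)) +              # data + aesthetics
  geom_point() +                                             # geometry
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # statistic
  scale_x_log10() +                                          # scale
  facet_wrap(~modality) +                                    # facets
  coord_cartesian() +                                        # coordinate system
  theme_minimal()                                            # theme
```

Printing p_all draws the plot. Removing any line after the first still gives a valid plot, because ggplot2 falls back on defaults for every component that is not specified.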
ggplot2 is an R package for creating statistical / data graphics.
ggplot2 builds on Wilkinson’s grammar by focussing on the primacy of layers and adapting it for use in R (Wickham 2010).
It differs from base R plotting functions (plot(), hist()).
Think about language, where grammar determines how words can be combined into sentences:
“words”: graphics consist of distinct layers of grammatical elements (data, aesthetics, geometries)
“grammar”: graphics are built around mappings that determine how data, aesthetics and geometries are combined.
Grammatical elements are organised as layers
Underlying grammar controls how graphics are combined
System of rules for mapping variables to graphical properties
Instead of memorising “plot types,” Grammar of Graphics teaches how plots are built.
It gives flexibility: you can create any plot by combining consistent building blocks.
ggplot(data = ...)
mapping = aes()
geom_...()
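
The template above can be tried out step by step. This is a minimal sketch on simulated data (d_demo is a made-up stand-in for d_spellname): the plot is stored as an object and each + adds one layer to it.

```r
library(ggplot2)

# Simulated stand-in for d_spellname; all values are made up for illustration
set.seed(1)
d_demo <- data.frame(dur = runif(60, 400, 900))
d_demo$rt <- 800 + 0.6 * d_demo$dur + rnorm(60, sd = 80)

# Data and mapping only: axes are set up, but nothing is drawn yet
p_base <- ggplot(data = d_demo, mapping = aes(x = dur, y = rt))

# Each geom adds one layer on top of the previous ones
p_points <- p_base + geom_point()
p_both   <- p_points + geom_rug()

# The number of layers grows with each geom added
length(p_base$layers)
length(p_points$layers)
length(p_both$layers)
```

Storing the base plot in an object makes it easy to compare geoms without retyping the data and mapping.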
ggplot(d_spellname, aes(x = dur, y = rt))
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point()
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_quantile()
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_rug()
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point() + geom_quantile() + geom_rug()
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point()
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point() + facet_grid( ~ modality)
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point() + stat_smooth(method = "lm", se = FALSE)
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point() + coord_trans(x = "log", y = "log")
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point() + coord_flip()
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point() + theme_dark()
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point() + theme(panel.background = element_blank())
Complete RMarkdown document 1a_gog.Rmd
Bonus: 1b_gog.Rmd
ggplot(d_spellname, aes(x = dur, y = rt, colour = group)) +
  geom_point(size = .75) +
  stat_smooth(method = "lm", se = FALSE) +
  labs(x = "Response duration (in msecs)", y = "Reaction time (in msecs)") +
  theme(legend.position = "right", legend.justification = "top", legend.direction = "vertical")
Aesthetics: visual properties mapped to variables (data-driven) e.g. aes(x = duration, y = time, colour = group)
→ colour changes with data
Attributes: visual properties set manually (constant) e.g. geom_point(colour = "blue", size = 2)
→ colour is fixed, not mapped to data
Think:
Aesthetic = mapping (data → visual)
Attribute = setting (visual → fixed property)
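
The difference between mapping and setting can be checked directly. The sketch below uses simulated data (d_demo is a made-up stand-in for d_spellname) and also shows a common slip: putting a constant inside aes().

```r
library(ggplot2)

# Simulated stand-in for d_spellname; all values are made up for illustration
set.seed(1)
d_demo <- data.frame(
  dur      = runif(60, 400, 900),
  modality = rep(c("speech", "writing"), each = 30)
)
d_demo$rt <- 800 + 0.6 * d_demo$dur + rnorm(60, sd = 80)

# Aesthetic: colour is mapped to a variable, so it changes with the data
p_mapped <- ggplot(d_demo, aes(x = dur, y = rt)) +
  geom_point(aes(colour = modality))

# Attribute: colour is set to a constant outside aes()
p_fixed <- ggplot(d_demo, aes(x = dur, y = rt)) +
  geom_point(colour = "blue")

# Common slip: a constant inside aes() is treated as a one-level variable,
# so the points are NOT drawn in blue and a legend labelled "blue" appears
p_slip <- ggplot(d_demo, aes(x = dur, y = rt)) +
  geom_point(aes(colour = "blue"))

# layer_data() returns the values ggplot2 actually uses for drawing
unique(layer_data(p_mapped)$colour)
unique(layer_data(p_fixed)$colour)
unique(layer_data(p_slip)$colour)
```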
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point(colour = "red")
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point(aes(colour = modality))
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point(aes(colour = modality)) + stat_smooth(method = "lm")
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point() + stat_smooth(aes(colour = modality), method = "lm")
ggplot(d_spellname, aes(x = dur, y = rt)) + geom_point(aes(colour = modality)) + stat_smooth(aes(colour = modality), method = "lm")
ggplot(d_spellname, aes(x = dur, y = rt, colour = modality)) + geom_point() + stat_smooth(method = "lm")
ggplot(d_spellname, aes(x = dur, y = rt, shape = modality)) + geom_point(size = 2.5, colour = "red")
ggplot(d_spellname, aes(x = dur, y = rt, colour = modality)) + geom_point(size = 2.5)
ggplot(d_spellname, aes(x = dur, y = rt, colour = modality, shape = modality)) + geom_point(size = 2.5)
ggplot(d_spellname, aes(x = dur, y = rt, colour = modality, shape = name_familiarised)) + geom_point(size = 2.5)
ggplot(d_spellname, aes(x = dur, y = rt, colour = interaction(name_familiarised, modality, sep = "; "))) + geom_point(size = 2.5) + labs(colour = "")
x, y, colour, fill, group
geom_point()
x, y, shape, colour, fill, size, alpha, stroke, group
geom_bar()
x, y, colour, fill, linewidth, linetype, alpha, width, group
geom_boxplot()
x, y, lower, xlower, upper, xupper, middle, xmiddle, ymin, xmin, ymax, xmax, weight, colour, fill, size, alpha, shape, linetype, linewidth, width, group
Complete RMarkdown documents
The variable type (e.g. <dbl> or <chr>) determines which aesthetics you can map to it, and how ggplot2 interprets it visually.
glimpse(d_spellname)
Rows: 144
Columns: 7
$ ppt_id <dbl> 40, 40, 41, 41, 42, 42, 43, 43, 44, 44, 45, 45, 46, …
$ name_familiarised <chr> "familiar name", "unfamiliar name", "familiar name",…
$ ppt_vocab <dbl> 0.9500, 0.9500, 1.0000, 1.0000, 0.9250, 0.9250, 0.90…
$ modality <chr> "speech", "speech", "speech", "speech", "speech", "s…
$ rt <dbl> 1476.1579, 1249.5161, 1404.2000, 1170.5397, 1283.455…
$ dur <dbl> 735.2632, 686.8065, 565.0400, 542.4603, 623.5147, 63…
$ group <chr> "Modality: speech; familiar name", "Modality: speech…
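ggplot2 automatically detects variable type: numeric columns (<dbl>) are treated as continuous, character columns (<chr>) as discrete.
Sometimes categorical variables are coded as numbers, e.g. condition = 1, 2, 3, 4 or male = 1, female = 2. ggplot2 will then treat them as continuous unless they are converted.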
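
A small sketch of the number-coded case, on simulated data: the column condition and its four codes are hypothetical, and d_demo is a made-up stand-in for d_spellname. Wrapping the variable in factor() tells ggplot2 to treat the codes as categories.

```r
library(ggplot2)

# Simulated data; condition is a hypothetical categorical variable coded 1 to 4
set.seed(1)
d_demo <- data.frame(
  dur       = runif(60, 400, 900),
  condition = rep(1:4, each = 15)
)
d_demo$rt <- 800 + 0.6 * d_demo$dur + rnorm(60, sd = 80)

# Numeric codes: ggplot2 uses a continuous colour gradient and a colour bar
p_num <- ggplot(d_demo, aes(x = dur, y = rt, colour = condition)) +
  geom_point()

# Converted to a factor: one distinct hue per condition and a discrete legend
d_demo$condition_f <- factor(d_demo$condition)
p_fac <- ggplot(d_demo, aes(x = dur, y = rt, colour = condition_f)) +
  geom_point()
```

The conversion can also be done inside the mapping, as in aes(colour = factor(condition)), which avoids creating a new column.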
See Wong (2010) page 665
Continuous variables are best represented using gradual or quantitative encodings:
ggplot(d_spellname, aes(x = modality, y = rt)) + geom_jitter()
ggplot(d_spellname, aes(x = modality, y = rt)) + geom_jitter() + facet_wrap(~modality, scales = "free")
ggplot(d_spellname, aes(x = dur, y = rt, colour = ppt_vocab)) + geom_point() + scale_colour_gradient(high = "darkred", low = "yellow")
See Wong (2010) page 665
Categorical variables are best represented using distinct encodings:
ggplot(d_spellname, aes(x = dur, y = rt, colour = modality)) + geom_point() + scale_colour_brewer(type = "qual")
text / labels
ggplot(d_spellname, aes(x = dur, y = rt, label = modality)) + geom_text(size = 3)
shape
ggplot(d_spellname, aes(x = dur, y = rt, shape = modality)) + geom_point(size = 3)
ggplot(d_spellname, aes(x = dur, y = rt, colour = modality)) + geom_point(size = 3)
ggplot(d_spellname, aes(x = dur, y = rt, colour = modality)) + stat_smooth(method = "lm", se = FALSE)
linetype
ggplot(d_spellname, aes(x = dur, y = rt, linetype = modality)) + stat_smooth(method = "lm", se = FALSE)
size
ggplot(d_spellname, aes(x = dur, y = rt, size = modality)) + stat_smooth(method = "lm", se = FALSE)
Variable Type | Typical Aesthetics | Visual Encoding |
---|---|---|
Continuous | x, y, size, color (gradient), alpha | magnitude, order |
Categorical | x, shape, color (hue), fill | category, grouping |
Which aesthetics are most effective for continuous vs categorical variables?
Principles of perceptual psychology and visual encoding theory underpin the design choices of the Grammar of Graphics and ggplot2 (Wilkinson 2005; Cleveland and McGill 1984).
The human visual system perceives some encodings more accurately than others.
Accuracy (for continuous data) | Visual Encoding | Example in ggplot2 |
---|---|---|
Position on a common scale | where a point lies along an axis | aes(x, y) |
Length / Size (1D) | bar height, line length | geom_bar(), aes(size) |
Angle / Slope | pie chart slices, line angle | coord_polar(), geom_line() |
Area (2D size) | circle or bubble area | aes(size) (use carefully) |
Colour luminance / saturation | brightness / lightness gradient | scale_colour_gradient() |
Colour hue | qualitative differences | scale_colour_brewer() |
Shape / Symbol type | triangle, circle, square | aes(shape) |
Spatial region / volume | map regions, 3D volume | geom_polygon() |
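
To get a feel for the ranking in the table, the sketch below draws the same four values twice: once as bars (position and length on a common scale) and once as a pie chart (angle and area). The data frame d_counts and its values are made up, chosen to be close together so the difference in readability is noticeable.

```r
library(ggplot2)

# Made-up category totals, deliberately close in value
d_counts <- data.frame(
  category = c("A", "B", "C", "D"),
  value    = c(23, 25, 27, 25)
)

# Position / length: small differences between bars are easy to read off the y-axis
p_bar <- ggplot(d_counts, aes(x = category, y = value)) +
  geom_col()

# Angle / area: the same values as slices of a pie chart
p_pie <- ggplot(d_counts, aes(x = "", y = value, fill = category)) +
  geom_col(width = 1) +
  coord_polar(theta = "y")
```

Print p_bar and p_pie side by side and try to rank the categories from each plot.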
ggplot2
Continuous variables prefer position / gradual encodings:
- x, y: continuous scales (most accurate)
- size: can work, avoid misleading area scaling
- colour: use gradients
- alpha: transparency can suggest intensity
- Avoid using hue or shape: humans don’t perceive continuous changes in colour hue or shape as ordered or proportional.
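
On the point about area scaling, here is a hedged sketch of two size scales that ggplot2 provides. The variable errors is hypothetical and the data are made up. scale_size_area() makes point area proportional to the value (a value of 0 gets size 0), whereas scale_radius() scales the radius linearly, which makes large values look disproportionately big.

```r
library(ggplot2)

# Simulated data; errors is a hypothetical count variable ranging from 0 to 20
set.seed(1)
d_demo <- data.frame(
  dur    = runif(60, 400, 900),
  errors = rep(c(0, 5, 10, 20), times = 15)
)
d_demo$rt <- 800 + 0.6 * d_demo$dur + rnorm(60, sd = 80)

p_size <- ggplot(d_demo, aes(x = dur, y = rt, size = errors)) +
  geom_point(alpha = .5)

# Area proportional to the value; zero is mapped to size zero
p_area <- p_size + scale_size_area(max_size = 6)

# Radius proportional to the value; area grows with the square of the radius
p_radius <- p_size + scale_radius(range = c(1, 6))
```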
Categorical variables prefer distinct encodings:
- x, colour, fill, shape, linetype
- Use qualitative palettes (e.g. scale_colour_brewer(type = "qual"))
- Each category should be visually distinct: no gradient or ordering implied.
# Modality as character
ggplot(d_spellname, aes(x = dur, y = rt, colour = modality)) + geom_point()
# Convert modality to numbers (mutate() comes from dplyr)
d_spell_2 <- mutate(d_spellname, mod_2 = as.numeric(factor(modality)))
ggplot(d_spell_2, aes(x = dur, y = rt, colour = mod_2)) + geom_point()
# ppt_vocab as numeric
ggplot(d_spellname, aes(x = dur, y = rt, colour = ppt_vocab)) + geom_point()
# Convert ppt_vocab to characters (mutate() comes from dplyr)
d_spell_2 <- mutate(d_spellname, vocab_2 = as.character(ppt_vocab))
ggplot(d_spell_2, aes(x = dur, y = rt, colour = vocab_2)) + geom_point()
Aesthetic | Best For | Why |
---|---|---|
Position (x, y) | Continuous | Most precise, linear mapping |
Length / Size | Continuous | Perceived magnitude |
Color (lightness) | Continuous | Suggests intensity or order |
Color (hue) | Categorical | Distinct, unordered groups |
Shape / Linetype | Categorical | Symbolic differences |
Fill / Alpha | Either | Can show density or group separation |
Hue refers to the type of colour: what we usually mean when we say red, blue, green, yellow, etc.
It distinguishes different colours along the colour wheel, not how bright or dark they are.
ggplot2 terms
When you map a categorical variable to colour (or fill), ggplot2 typically varies the hue:
Each penguin species (palmerpenguins::penguins) gets a different hue (blue, green, red).
This uses a qualitative palette, where hues are different but brightness is roughly equal.
There’s no order implied; just distinct color identities.
Type | Used For | Visual Meaning |
---|---|---|
Hue-based (qualitative) | Categorical data | Different kinds |
Lightness-based (sequential) | Continuous data | More ↔ Less |
Diverging (two hues) | Continuous data around a midpoint | Below vs. above average |
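
The three palette types in the table could be sketched as follows. The data are simulated (d_demo is a made-up stand-in for d_spellname), the colour names are arbitrary choices, and rt_centred is a hypothetical helper column created here so that the diverging scale has a meaningful midpoint at 0.

```r
library(ggplot2)

# Simulated stand-in for d_spellname; all values are made up for illustration
set.seed(1)
d_demo <- data.frame(
  dur      = runif(60, 400, 900),
  modality = rep(c("speech", "writing"), each = 30)
)
d_demo$rt <- 800 + 0.6 * d_demo$dur + rnorm(60, sd = 80)
d_demo$rt_centred <- d_demo$rt - mean(d_demo$rt)  # hypothetical centred version of rt

# Qualitative (hue-based): categorical variable, different kinds
p_qual <- ggplot(d_demo, aes(x = dur, y = rt, colour = modality)) +
  geom_point() +
  scale_colour_brewer(type = "qual")

# Sequential (lightness-based): continuous variable, more vs less
p_seq <- ggplot(d_demo, aes(x = dur, y = rt, colour = rt)) +
  geom_point() +
  scale_colour_gradient(low = "yellow", high = "darkred")

# Diverging (two hues): continuous variable around a midpoint
p_div <- ggplot(d_demo, aes(x = dur, y = rt, colour = rt_centred)) +
  geom_point() +
  scale_colour_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0)
```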
Complete RMarkdown document 3_aaa.Rmd for combining attributes and aesthetics.
Next week we will continue with data visualisation in ggplot2. For basics of data visualisation in ggplot2 see
Identify a dataset you would like to use for the formative assessment: post a description on our Teams channel.
Think about what you would like to visualise about these data:
Andrews, Mark. 2021. Doing Data Science in R: An Introduction for Social Scientists. SAGE Publications Ltd.
Cleveland, William S, and Robert McGill. 1984. “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods.” Journal of the American Statistical Association 79 (387): 531–54.
Wickham, Hadley. 2010. “A Layered Grammar of Graphics.” Journal of Computational and Graphical Statistics 19 (1): 3–28.
———. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer.
Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc.
Wilkinson, Leland. 2005. The Grammar of Graphics. 2nd ed. Statistics and Computing. New York: Springer.
Wong, Bang. 2010. “Points of View: Design of Data Figures.” Nature Methods 7 (9): 665.