General outline for weeks 2 – 5

  • Week 2: Principles of data visualisation
  • Week 3: Grammar of Graphics; aesthetics and attributes
  • Week 4: Major visualisation tools
  • Week 5: Customising visualisations (scales, themes, and labels)

Objectives for today

  • Understanding the principle of the Grammar of Graphics
  • Differentiating between aesthetics and attributes
  • Using a variety of geometries, aesthetics and attributes in data visualisation
  • Deciding between appropriate aesthetics for continuous and categorical variables

Download the exercises from the Week 3 folder on NOW and move them into your R-project directory.

Grammar of Graphics and ggplot2

What do all data visualisations have in common?

Core components

| Component | Example | Description |
|---|---|---|
| Data | d_spellname | The dataset being plotted |
| Aesthetics (aes) | x = dur, y = rt | How data variables map to visual properties |
| Geometries (geom) | geom_point() | The type of plot element (points, bars, lines) |
| Statistics (stat) | stat_smooth() | Transformations of data (e.g., regression lines) |
| Scales | scale_x_log10() | How data values are converted to aesthetics |
| Facets | facet_wrap(~group) | Splitting data into small multiples |
| Coordinate system | coord_polar() | The space where data are drawn (Cartesian, polar) |
| Theme | theme_minimal() | Non-data display elements (fonts, text, grids, etc.) |

Grammar of Graphics

Wilkinson (2005)

  • ggplot2 is an R package for creating statistical / data graphics
  • It’s based on a Grammar of Graphics (Wilkinson 2005), hence “gg”
  • Framework for data visualisation emphasising a layered approach.
  • You assemble plots by combining independent components
  • Default settings allow rapid production of high-quality plots
  • The theming system allows custom styling
  • See also Wickham (2016), Wickham (2010)

  • Fundamental features that underlie all statistical graphics.
  • The Grammar of Graphics answers the question “what is a statistical graphic?”
  • ggplot2 builds on Wilkinson’s grammar by focussing on the primacy of layers and adapting it for use in R (Wickham 2010).
  • The grammar describes how graphics map data → aesthetics / attributes (colour, shape, size) → geometric objects (points, lines, bars).
  • Higher-level plotting system (vs plot(), hist())
  • Complex visualisations can be created with a minimal amount of code
  • Supports statistical transformations and coordinate systems
  • Enables faceting (splitting data into subsets)
  • Themes control visual appearance (fonts, backgrounds, etc.)

Think of language, where grammar determines how words can be combined into sentences:

  • “words”: graphics consist of distinct layers of grammatical elements (data, aesthetics, geometries)

  • “grammar”: graphics are built around mappings that determine how data, aesthetics and geometries are combined.

  • Grammatical elements are organised as layers

  • Underlying grammar controls how graphics are combined

  • System of rules for mapping variables to graphical properties

  • Instead of memorising “plot types,” Grammar of Graphics teaches how plots are built.

  • It gives flexibility: you can create any plot by combining consistent building blocks.
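
The building-block idea can be sketched in a few lines. This is a minimal illustration, not the course exercise: d_toy is a simulated, hypothetical stand-in for d_spellname, and the object names p_base, p_points and p_full are made up here.

```r
library(ggplot2)

# Hypothetical stand-in for d_spellname (simulated values, not real data)
set.seed(1)
d_toy <- data.frame(
  dur      = runif(40, 500, 800),
  modality = rep(c("speech", "writing"), each = 20)
)
d_toy$rt <- 600 + 0.9 * d_toy$dur + rnorm(40, sd = 60)

# Data and aesthetics only: an empty canvas with axes but no layers
p_base <- ggplot(d_toy, aes(x = dur, y = rt))

# Each `+` adds one more building block to the same plot object
p_points <- p_base + geom_point()
p_full <- p_points +
  stat_smooth(method = "lm", se = FALSE) +  # statistics
  facet_wrap(~modality)                     # facets

p_full
```

Because every intermediate step is itself a plot object, you can print p_base, p_points and p_full in turn to see what each component contributes.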

Obligatory grammatical elements

  • data: the data you want to visualise, indicated as ggplot(data = ...)
  • aesthetics: mapping of data to graphic properties (axes, size, colour), indicated as mapping = aes()
  • geometries: visual elements encoding the data, indicated as geom_...()

ggplot(d_spellname, aes(x = dur, y = rt))

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point()

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_quantile()

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_rug()

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point() +
  geom_quantile() +
  geom_rug()

Optional grammatical elements

  • facets: dividing data into subplots
  • statistics: summarising representations
  • coordinates: plotting space
  • theme: visual properties not related to the data (font, background)

  • data
  • aesthetics
  • geometries

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point()

  • data
  • aesthetics
  • geometries
  • facets

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point() +
  facet_grid( ~ modality)

  • data
  • aesthetics
  • geometries
  • facets
  • statistics

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) 

  • data
  • aesthetics
  • geometries
  • facets
  • statistics
  • coordinates

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point() +
  coord_trans(x = "log", y = "log")

  • data
  • aesthetics
  • geometries
  • facets
  • statistics
  • coordinates

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point() +
  coord_flip()

  • data
  • aesthetics
  • geometries
  • facets
  • statistics
  • coordinates
  • theme

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point() +
  theme_dark()

  • data
  • aesthetics
  • geometries
  • facets
  • statistics
  • coordinates
  • theme

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point() +
  theme(panel.background = element_blank())

Grammatical elements in action

Complete RMarkdown document 1a_gog.Rmd

Bonus: 1b_gog.Rmd

Which layer corresponds to which part of the grammar?

ggplot(d_spellname, aes(x = dur, y = rt, colour = group)) +
  geom_point(size = .75) +
  stat_smooth(method = "lm", se = FALSE) +
  labs(x = "Response duration (in msecs)",
       y = "Reaction time (in msecs)") +
  theme(legend.position = "right",
       legend.justification = "top",
       legend.direction = "vertical") 

Aesthetics vs Attributes

Aesthetics: visual properties mapped to variables (data-driven), e.g. aes(x = duration, y = time, colour = group) → colour changes with the data.

Attributes: visual properties set manually (constant), e.g. geom_point(colour = "blue", size = 2) → colour is fixed, not mapped to data.

Think:

Aesthetic = mapping (data → visual)

Attribute = setting (visual → fixed property)
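
A common slip is to put a fixed value inside aes(). The sketch below, which uses a small made-up data frame (d_toy is hypothetical), compares setting and mapping by inspecting the colours ggplot2 actually computes with layer_data().

```r
library(ggplot2)

# Hypothetical toy data
d_toy <- data.frame(dur = c(520, 610, 700, 790),
                    rt  = c(1100, 1180, 1300, 1350))

# Attribute: colour is SET outside aes(), so every point is blue
p_set <- ggplot(d_toy, aes(x = dur, y = rt)) +
  geom_point(colour = "blue")

# Aesthetic: "blue" inside aes() is treated as a one-level variable,
# so the points get the first default palette colour instead of blue
p_map <- ggplot(d_toy, aes(x = dur, y = rt)) +
  geom_point(aes(colour = "blue"))

cols_set <- unique(layer_data(p_set)$colour)
cols_map <- unique(layer_data(p_map)$colour)

cols_set
cols_map
```

The second plot also gains a legend with a single entry labelled "blue", which is usually the first hint that a setting has ended up inside aes().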

Aesthetics vs Attributes

  • appearance of geometries
  • e.g. colour, size, shape
  • attributes take fixed values (e.g. "red", 2)
  • aesthetics take variables (columns of the data)

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point(colour = "red")

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point(aes(colour = modality))

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point(aes(colour = modality)) +
  stat_smooth(method = "lm")

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point() +
  stat_smooth(aes(colour = modality), method = "lm")

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point(aes(colour = modality)) +
  stat_smooth(aes(colour = modality), method = "lm")

ggplot(d_spellname, aes(x = dur, y = rt, colour = modality)) +
  geom_point() +
  stat_smooth(method = "lm")
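
The examples above differ only in where colour is mapped. Aesthetics given in ggplot() are inherited by every layer, while aesthetics given inside a layer apply to that layer alone. A minimal sketch, assuming simulated stand-in data (d_toy is hypothetical), counts how many regression lines each version fits:

```r
library(ggplot2)

# Hypothetical stand-in for d_spellname (simulated values)
set.seed(2)
d_toy <- data.frame(
  dur      = runif(60, 500, 800),
  modality = rep(c("speech", "writing"), each = 30)
)
d_toy$rt <- 600 + 0.9 * d_toy$dur +
  ifelse(d_toy$modality == "writing", 150, 0) + rnorm(60, sd = 60)

# colour mapped in the point layer only: the smoother does not see it
p_local <- ggplot(d_toy, aes(x = dur, y = rt)) +
  geom_point(aes(colour = modality)) +
  stat_smooth(method = "lm", se = FALSE)

# colour mapped globally: both layers inherit it
p_global <- ggplot(d_toy, aes(x = dur, y = rt, colour = modality)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE)

# Number of groups (fitted lines) in the second layer of each plot
n_lines_local  <- length(unique(layer_data(p_local, 2)$group))
n_lines_global <- length(unique(layer_data(p_global, 2)$group))

c(local = n_lines_local, global = n_lines_global)
```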

ggplot(d_spellname, aes(x = dur, y = rt, shape = modality)) +
  geom_point(size = 2.5, colour = "red") 

ggplot(d_spellname, aes(x = dur, y = rt, colour = modality)) +
  geom_point(size = 2.5)

ggplot(d_spellname, aes(x = dur, y = rt, colour = modality, shape = modality))   +
  geom_point(size = 2.5)

ggplot(d_spellname, aes(x = dur, y = rt, colour = modality, shape = name_familiarised)) +
  geom_point(size = 2.5)

ggplot(d_spellname, aes(x = dur, y = rt, colour = interaction(name_familiarised, modality, sep = "; "))) +
  geom_point(size = 2.5) +
  labs(colour = "")

Aesthetics

typically x, y, colour, fill, group

  • some are required by geometries; others are optional
  • continuous vs discrete variables:
    • e.g. shape and linetype only accept categorical (discrete) variables; label is typically used with them too
  • should be chosen to facilitate comprehension
  • scatterplot: geom_point()
    x, y, shape, colour, fill, size, alpha, stroke, group
  • barplot: geom_bar()
    x, y, colour, fill, linewidth, linetype, alpha, width, group
  • boxplot: geom_boxplot()
    x, y, lower, xlower, upper, xupper, middle, xmiddle, ymin, xmin, ymax, xmax, weight, colour, fill, size, alpha, shape, linetype, linewidth, width, group
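
If you are unsure which aesthetics a geometry understands, one option is to ask the geom object itself. This is a sketch that relies on the exported ggproto objects (here GeomPoint); the help page ?geom_point lists the same information.

```r
library(ggplot2)

# Aesthetics that geom_point() cannot work without
required_point <- GeomPoint$required_aes

# All aesthetics geom_point() understands (required and optional)
all_point <- GeomPoint$aesthetics()

# The optional ones are whatever is left over
optional_point <- setdiff(all_point, required_point)

required_point
optional_point
```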

Aesthetics vs Attributes

Complete RMarkdown documents

  • 2a_attributes.R for attributes
  • 2b_aesthetics.R for aesthetics

Continuous vs Categorical Variables

A variable’s type (for example <dbl> or <chr>) determines which aesthetics you can map it to and how ggplot2 interprets it visually.

glimpse(d_spellname)
Rows: 144
Columns: 7
$ ppt_id            <dbl> 40, 40, 41, 41, 42, 42, 43, 43, 44, 44, 45, 45, 46, …
$ name_familiarised <chr> "familiar name", "unfamiliar name", "familiar name",…
$ ppt_vocab         <dbl> 0.9500, 0.9500, 1.0000, 1.0000, 0.9250, 0.9250, 0.90…
$ modality          <chr> "speech", "speech", "speech", "speech", "speech", "s…
$ rt                <dbl> 1476.1579, 1249.5161, 1404.2000, 1170.5397, 1283.455…
$ dur               <dbl> 735.2632, 686.8065, 565.0400, 542.4603, 623.5147, 63…
$ group             <chr> "Modality: speech; familiar name", "Modality: speech…

ggplot2 automatically detects variable type:

  • Continuous: numeric variables (e.g., age, height, RT, temperature)
  • Categorical: factors or character variables (e.g., group, species, modality)

Sometimes categorical variables are coded as numbers (e.g. condition = 1, 2, 3, 4; male = 1, female = 2). ggplot2 then treats them as continuous unless you convert them with factor().
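
A minimal sketch of this situation, assuming a made-up data frame (d_codes and the labels A to D are hypothetical). ggplot2's scale_type() reports which kind of default scale a vector would receive, so it can be used to check a variable before plotting.

```r
library(ggplot2)

# Hypothetical data: condition is categorical but stored as numbers
d_codes <- data.frame(
  condition = c(1, 2, 3, 4, 1, 2, 3, 4),
  rt        = c(1210, 1340, 1180, 1420, 1250, 1310, 1160, 1390)
)

# A numeric vector would get a continuous scale
type_numeric <- scale_type(d_codes$condition)

# Converting to a factor makes the categories explicit
d_codes$condition_f <- factor(d_codes$condition,
                              levels = 1:4,
                              labels = c("A", "B", "C", "D"))
type_factor <- scale_type(d_codes$condition_f)

# Character vectors are treated as discrete as well
type_character <- scale_type(c("speech", "writing"))

c(numeric = type_numeric, factor = type_factor, character = type_character)

# The factor version gets distinct hues rather than a colour gradient
ggplot(d_codes, aes(x = condition_f, y = rt, colour = condition_f)) +
  geom_point()
```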

Continuous Variables: Visual Decoding

See Wong (2010) page 665

  • Examples: age, height, RT, duration, temperature
  • Convey magnitude and order

Best represented using gradual or quantitative encodings:

  • position on an aligned scale (x, y)
  • length, angle, area, volume
  • colour gradients (e.g., light → dark)
    • monochromatic colour spectrum (saturation, grey scale)
    • pure spectrum colours
  • size (circle area, line width)

Continuous Variables: Position on common scale

ggplot(d_spellname, aes(x = modality, y = rt)) +
  geom_jitter() 

Continuous Variables: Non-aligned scale

ggplot(d_spellname, aes(x = modality, y = rt)) +
  geom_jitter() +
  facet_wrap(~modality, scales = "free") 

Continuous Variables: Colour gradients

ggplot(d_spellname, aes(x = dur, 
                        y = rt, 
                        colour = ppt_vocab)) +
  geom_point() +
  scale_colour_gradient(high = "darkred", low = "yellow")

Categorical Variables: Visual Decoding

See Wong (2010) page 665

  • Examples of categorical variables: condition, response modality, species
  • Convey difference in kind, not quantity

Best represented using distinct encodings:

  • position on discrete axis (x)
  • distinct shapes
  • distinct hues (qualitative colour palette)

Categorical Variables: Qualitative colours

ggplot(d_spellname, aes(x = dur, 
                        y = rt, 
                        colour = modality)) +
  geom_point() +
  scale_colour_brewer(type = "qual")

Categorical Variables: text / labels

ggplot(d_spellname, aes(x = dur, 
                        y = rt,
                        label = modality)) +
  geom_text(size = 3)

  • qualitative colours, labels, line colours
  • sequential colours, shape outlines, line type
  • filled shapes, hatching (shading with lines), line width

Categorical Variables: shape

ggplot(d_spellname, aes(x = dur, 
                        y = rt,
                        shape = modality)) +
  geom_point(size = 3)

Categorical Variables: Hue

ggplot(d_spellname, aes(x = dur, 
                        y = rt,
                        colour = modality)) +
  geom_point(size = 3)

Categorical Variables: Hue

ggplot(d_spellname, aes(x = dur, 
                        y = rt,
                        colour = modality)) +
  stat_smooth(method = "lm", se = FALSE)

Categorical Variables: linetype

ggplot(d_spellname, aes(x = dur, 
                        y = rt,
                        linetype = modality)) +
  stat_smooth(method = "lm", se = FALSE)

Categorical Variables: size

ggplot(d_spellname, aes(x = dur, 
                        y = rt,
                        size = modality)) +
  stat_smooth(method = "lm", se = FALSE)

Continuous vs Categorical Variables

| Variable type | Typical aesthetics | Visual encoding |
|---|---|---|
| Continuous | x, y, size, colour (gradient), alpha | magnitude, order |
| Categorical | x, shape, colour (hue), fill | category, grouping |

Visual Encoding Hierarchy

Which aesthetics are most effective for continuous vs categorical variables?

Principles of perceptual psychology and visual encoding theory underpin the Grammar of Graphics and ggplot2’s design choices (Wilkinson 2005; Cleveland and McGill 1984).

The human visual system perceives some encodings more accurately than others.

| Encoding channel (most accurate first, for continuous data) | Visual encoding | Example in ggplot2 |
|---|---|---|
| Position on a common scale | where a point lies along an axis | aes(x, y) |
| Length / Size (1D) | bar height, line length | geom_bar(), aes(size) |
| Angle / Slope | pie chart slices, line angle | coord_polar(), geom_line() |
| Area (2D size) | circle or bubble area | aes(size) (use carefully) |
| Colour luminance / saturation | brightness / lightness gradient | scale_colour_gradient() |
| Colour hue | qualitative differences | scale_colour_brewer() |
| Shape / Symbol type | triangle, circle, square | aes(shape) |
| Spatial region / volume | map regions, 3D volume | geom_polygon() |


  • The top of the hierarchy (position, length) is best for continuous variables
  • The bottom (hue, shape) is best for categorical variables

Applied to ggplot2

Continuous variables prefer position / gradual encodings:

  • x, y: continuous scales (most accurate)
  • size: can work, avoid misleading area scaling
  • colour: use gradients
  • alpha: transparency can suggest intensity
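
One way to keep size honest is scale_size_area(), which makes point area proportional to the value and maps a value of 0 to size 0. The sketch below uses made-up data (d_toy and its count variable n are hypothetical) and inspects the computed sizes.

```r
library(ggplot2)

# Hypothetical toy data: n is a made-up count variable
d_toy <- data.frame(x = 1:3, y = c(1, 1, 1), n = c(0, 1, 4))

p_area <- ggplot(d_toy, aes(x = x, y = y, size = n)) +
  geom_point() +
  scale_size_area(max_size = 10)

# Sizes ggplot2 computed for n = 0, 1 and 4.
# size behaves like a diameter, so area grows with its square:
# a fourfold increase in n should double the size.
sizes <- layer_data(p_area)$size
sizes

p_area
```

With the default scale_size(), the smallest value in the data would still be drawn as a visible point, so a zero could be misread as a small positive quantity.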

Avoid using hue or shape: humans don’t perceive continuous changes in colour hue or shape as ordered or proportional.

Categorical variables prefer distinct encodings:

  • x, colour, fill, shape, linetype
  • Use qualitative palettes (e.g., scale_colour_brewer(type = "qual"))
  • Don’t map numeric variables to hue unless they’re binned or converted to factors.

Each category should be visually distinct: no gradient or ordering implied.
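
If a numeric variable really needs distinct hues, it can be binned first. The following is a sketch under stated assumptions: d_toy is simulated, the three equal-width bins and their labels (low, medium, high) are arbitrary choices for illustration.

```r
library(ggplot2)

# Hypothetical stand-in data with a continuous vocabulary score
set.seed(3)
d_toy <- data.frame(
  dur       = runif(50, 500, 800),
  ppt_vocab = runif(50, 0.8, 1)
)
d_toy$rt <- 600 + 0.9 * d_toy$dur + rnorm(50, sd = 60)

# Bin the continuous score into three equal-width categories
d_toy$vocab_bin <- cut(d_toy$ppt_vocab,
                       breaks = 3,
                       labels = c("low", "medium", "high"))

table(d_toy$vocab_bin)

# The binned factor can now take a qualitative palette
ggplot(d_toy, aes(x = dur, y = rt, colour = vocab_bin)) +
  geom_point() +
  scale_colour_brewer(type = "qual")
```

Binning throws away information, so the gradient version is usually preferable when the exact magnitude matters.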

Which plot communicates difference better?

# Modality as character
ggplot(d_spellname, aes(x = dur, 
                        y = rt, 
                        colour = modality)) +
  geom_point() 

# Convert modality to numbers (mutate() comes from dplyr)
d_spell_2 <- mutate(d_spellname, 
                    mod_2 = as.numeric(factor(modality))) 

ggplot(d_spell_2, aes(x = dur, 
                      y = rt, 
                      colour = mod_2)) +
  geom_point() 

Which plot shows quantity more precisely?

# ppt_vocab as numeric
ggplot(d_spellname, aes(x = dur, 
                        y = rt, 
                        colour = ppt_vocab)) +
  geom_point() 

# Convert ppt_vocab to characters (mutate() comes from dplyr)
d_spell_2 <- mutate(d_spellname, 
                    vocab_2 = as.character(ppt_vocab))

ggplot(d_spell_2, aes(x = dur, 
                      y = rt, 
                      colour = vocab_2)) +
  geom_point() 

Summary Hierarchy

| Aesthetic | Best for | Why |
|---|---|---|
| Position (x, y) | Continuous | Most precise, linear mapping |
| Length / Size | Continuous | Perceived magnitude |
| Colour (lightness) | Continuous | Suggests intensity or order |
| Colour (hue) | Categorical | Distinct, unordered groups |
| Shape / Linetype | Categorical | Symbolic differences |
| Fill / Alpha | Either | Can show density or group separation |

What does “Hue” mean?

Hue refers to the type of colour; what we usually mean when we say red, blue, green, yellow, etc.

It distinguishes different colours along the colour wheel, not how bright or dark they are.

  • Hue = the colour family (red, blue, green, ...)
  • Lightness / Luminance = how light or dark the colour is
  • Saturation = how pure or intense the colour is

In ggplot2 terms

When you map a categorical variable to colour (or fill), ggplot2 typically varies the hue:

Each penguin species (palmerpenguins::penguins) gets a different hue (blue, green, red).

This uses a qualitative palette, where hues are different but brightness is roughly equal.

There’s no order implied; just distinct color identities.
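
To see what such a palette looks like, you can generate the evenly spaced hues directly. This sketch assumes the default discrete colour scale (scale_colour_hue()), which draws on the same palette function from the scales package; the number 3 stands for the three species.

```r
library(scales)

# Three evenly spaced hues with equal lightness and chroma
cols <- hue_pal()(3)
cols

# Draw the colours as swatches to compare them side by side
show_col(cols)
```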

Example: Hue vs. Brightness

| Type | Used for | Visual meaning |
|---|---|---|
| Hue-based (qualitative) | Categorical data | Different kinds |
| Lightness-based (sequential) | Continuous data | More ↔ Less |
| Diverging (two hues) | Continuous data around a midpoint | Below vs. above average |
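
The diverging case has not been shown so far, so here is a minimal sketch. The data are made up (d_toy and its variable z, a deviation from some average, are hypothetical), and the colour choices are arbitrary.

```r
library(ggplot2)

# Hypothetical toy data: z is a deviation from an average (0 = average)
d_toy <- data.frame(x = 1:5,
                    y = c(2, 4, 3, 5, 1),
                    z = c(-2, -1, 0, 1, 2))

# Two hues meet at a neutral colour placed at the midpoint
p_div <- ggplot(d_toy, aes(x = x, y = y, colour = z)) +
  geom_point(size = 4) +
  scale_colour_gradient2(low = "blue", mid = "grey90", high = "red",
                         midpoint = 0)

# Colours assigned to each point, from most negative to most positive
cols_div <- layer_data(p_div)$colour
cols_div

p_div
```

A sequential scale such as scale_colour_gradient() would be the better choice when the variable has no meaningful midpoint.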

Aesthetics vs Attributes

Complete RMarkdown document 3_aaa.Rmd for combining attributes and aesthetics.

Reading

Homework

Identify a dataset you would like to use for the formative assessment: post a description on our Teams channel.

Think about what you would like to visualise about these data:

  • Combination of variables
  • Variable types and appropriate aesthetics
  • Raw data and / or summary statistics
  • Look for appropriate visualisation tools HERE

References

Andrews, Mark. 2021. Doing Data Science in R: An Introduction for Social Scientists. SAGE Publications Ltd.

Cleveland, William S, and Robert McGill. 1984. “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods.” Journal of the American Statistical Association 79 (387): 531–54.

Wickham, Hadley. 2010. “A Layered Grammar of Graphics.” Journal of Computational and Graphical Statistics 19 (1): 3–28.

Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer.

Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc.

Wilkinson, Leland. 2005. The Grammar of Graphics. 2nd ed. Statistics and Computing. New York: Springer.

Wong, Bang. 2010. “Points of View: Design of Data Figures.” Nature Methods 7 (9): 665.