PSYC40940: Grammar of Graphics

General outline for weeks 2 – 5

~~Week 2: Principles of data visualisation~~
Week 3: Grammar of Graphics; aesthetics and attributes
Week 4: Major visualisation tools
Week 5: Customising visualisations (scales, themes, and labels)

Objectives for today

Understanding the principle of the Grammar of Graphics
Differentiating between aesthetics and attributes
Using a variety of geometries, aesthetics and attributes in data visualisation
Deciding between appropriate aesthetics for continuous and categorical variables

Download exercises from Week 3 folder on NOW and move them into your R-project directory.

Grammar of Graphics and `ggplot2`

What do all data visualisations have in common?

Core components

Component	Example	Description
Data	`d_spellname`	The dataset being plotted
Aesthetics (aes)	`x = dur, y = rt`	How data variables map to visual properties
Geometries (geom)	`geom_point()`	The type of plot element (points, bars, lines)
Statistics (stat)	`stat_smooth()`	Transformations of data (e.g., regression lines)
Scales	`scale_x_log10()`	How data values are converted to aesthetics
Facets	`facet_wrap(~group)`	Splitting data into small multiples
Coordinate system	`coord_polar()`	The space where data are drawn (Cartesian, polar)
Theme	`theme_minimal()`	Non-data display elements (fonts, text, grids, etc.)
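As a rough sketch of how the components in the table fit together, the plot below uses one call per component. The data frame `d_demo` is a simulated stand-in for `d_spellname` (the values and the second modality label "writing" are made up for illustration), and `coord_cartesian()` is only spelled out to make the default coordinate system visible.

```r
library(ggplot2)

# Simulated stand-in for d_spellname (hypothetical values, illustration only)
set.seed(1)
d_demo <- data.frame(
  dur      = runif(60, min = 400, max = 900),
  modality = rep(c("speech", "writing"), each = 30)
)
d_demo$rt <- 800 + 0.8 * d_demo$dur + rnorm(60, sd = 100)

p_all <- ggplot(d_demo, aes(x = dur, y = rt)) +              # data + aesthetics
  geom_point() +                                             # geometry
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # statistic
  scale_x_log10() +                                          # scale
  facet_wrap(~modality) +                                    # facets
  coord_cartesian() +                                        # coordinate system (the default)
  theme_minimal()                                            # theme

p_all
```

Only the first two lines (data, aesthetics, one geometry) are needed to get a plot; every other line refines a default.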

Grammar of Graphics

Wilkinson (2005)

ggplot2 is an R package for creating statistical / data graphics
It’s based on a Grammar of Graphics (Wilkinson 2005), hence “gg”
Framework for data visualisation emphasising a layered approach.
You assemble plots by combining independent components
Default settings allow rapid production of high-quality plots
The theming system allows custom styling
See also Wickham (2016), Wickham (2010)

Grammar of Graphics

Wilkinson (2005)

Fundamental features that underlie all statistical graphics.
Grammar of Graphics answers the question “what is a statistical graphic?”
ggplot2 builds on Wilkinson’s grammar by focussing on the primacy of layers and adapting it for use in R (Wickham 2010).
Grammar describes how graphics map between data → aesthetics / attributes (colour, shape, size) → geometric objects (points, lines, bars).
Higher-level plotting system than base R functions such as plot() and hist()
Complex visualisations can be created with a minimal amount of code
Supports statistical transformations and coordinate systems
Enables faceting (splitting data into subsets)
Themes control visual appearance (fonts, backgrounds, etc.)

Grammar of Graphics

Wilkinson (2005)

Think about language where grammar determines how words can be combined into sentences

“words”: graphics consist of distinct layers of grammatical elements (data, aesthetics, geometries)
“grammar”: graphics are built around mappings that determine how data, aesthetics and geometries are combined.
Grammatical elements are organised as layers
Underlying grammar controls how graphics are combined
System of rules for mapping variables to graphical properties
Instead of memorising “plot types,” Grammar of Graphics teaches how plots are built.
It gives flexibility: you can create any plot by combining consistent building blocks.
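The idea of layers as building blocks can be made concrete by storing a plot in an object and adding to it step by step. This is a minimal sketch using simulated data (`d_demo` is a hypothetical stand-in for the course data); it counts the layers in each version of the plot.

```r
library(ggplot2)

# Simulated stand-in data (hypothetical values)
set.seed(2)
d_demo <- data.frame(dur = runif(40, min = 400, max = 900))
d_demo$rt <- 800 + 0.8 * d_demo$dur + rnorm(40, sd = 100)

# Data and aesthetic mappings only: an empty canvas with axes
p_base <- ggplot(d_demo, aes(x = dur, y = rt))

# Each `+` returns a new plot object with one more layer
p_points <- p_base + geom_point()
p_both   <- p_points + geom_rug()

# Number of layers in each version of the plot
n_layers <- c(base   = length(p_base$layers),
              points = length(p_points$layers),
              both   = length(p_both$layers))
print(n_layers)
```

Because each step is a complete plot object, the same base can be reused with different geometries without retyping the data and mappings.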

Obligatory grammatical elements

data: the data you want to visualise indicated as ggplot(data = ...)
aesthetics: mapping of data to graphic properties (axes, size, colour) indicated as mapping = aes()
geometries: visual elements encoding the data indicated as geom_...()

ggplot(d_spellname, aes(x = dur, y = rt))

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point()

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_quantile()

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_rug()

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point() +
  geom_quantile() +
  geom_rug()

Optional grammatical elements

facets: dividing data into subplots
statistics: summarising representations
coordinates: plotting space
theme: visual properties not related to the data (font, background)

data
aesthetics
geometries

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point()

data
aesthetics
geometries
facets

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point() +
  facet_grid( ~ modality)

data
aesthetics
geometries
facets
statistics

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE)

data
aesthetics
geometries
facets
statistics
coordinates

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point() +
  coord_trans(x = "log", y = "log")

data
aesthetics
geometries
facets
statistics
coordinates

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point() +
  coord_flip()

data
aesthetics
geometries
facets
statistics
coordinates
theme

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point() +
  theme_dark()

data
aesthetics
geometries
facets
statistics
coordinates
theme

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point() +
  theme(panel.background = element_blank())

Grammatical elements in action

Complete RMarkdown document 1a_gog.Rmd

Bonus: 1b_gog.Rmd

Which layer corresponds to which part of the grammar?

ggplot(d_spellname, aes(x = dur, y = rt, colour = group)) +
  geom_point(size = .75) +
  stat_smooth(method = "lm", se = FALSE) +
  labs(x = "Response duration (in msecs)",
       y = "Reaction time (in msecs)") +
  theme(legend.position = "right",
       legend.justification = "top",
       legend.direction = "vertical")

Aesthetics vs Attributes

Aesthetics: visual properties mapped to variables (data-driven) e.g. aes(x = duration, y = time, colour = group) → colour changes with data

Attributes: visual properties set manually (constant) e.g. geom_point(colour = "blue", size = 2) → colour is fixed, not mapped to data

Think:

Aesthetic = mapping (data → visual)

Attribute = setting (visual → fixed property)
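A common slip is to put a fixed colour inside `aes()`. The sketch below (simulated data, hypothetical values) compares the colours ggplot2 actually draws in each case, using `layer_data()` to inspect the built layer.

```r
library(ggplot2)

# Simulated stand-in data (hypothetical values)
set.seed(3)
d_demo <- data.frame(dur = runif(20, min = 400, max = 900),
                     rt  = runif(20, min = 1000, max = 1600))

# Attribute: colour is set outside aes(), so every point is drawn in blue
p_set <- ggplot(d_demo, aes(x = dur, y = rt)) +
  geom_point(colour = "blue")

# Slip: inside aes(), "blue" is treated as a variable with a single category,
# so ggplot2 assigns it a colour from the default palette and adds a legend
p_mapped <- ggplot(d_demo, aes(x = dur, y = rt)) +
  geom_point(aes(colour = "blue"))

# Colours that end up in the built plot
colours_set    <- unique(layer_data(p_set)$colour)
colours_mapped <- unique(layer_data(p_mapped)$colour)
print(colours_set)
print(colours_mapped)
```

If a plot shows an unexpected colour together with a legend entry that reads like a colour name, a setting has probably ended up inside `aes()`.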

Aesthetics vs Attributes

appearance of geometries
e.g. colour, size, shape
attributes take properties
aesthetics take variables

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point(colour = "red")

Aesthetics vs Attributes

appearance of geometries
e.g. colour, size, shape
attributes take properties
aesthetics take variables

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point(aes(colour = modality))

Aesthetics vs Attributes

appearance of geometries
e.g. colour, size, shape
attributes take properties
aesthetics take variables

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point(aes(colour = modality)) +
  stat_smooth(method = "lm")

Aesthetics vs Attributes

appearance of geometries
e.g. colour, size, shape
attributes take properties
aesthetics take variables

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point() +
  stat_smooth(aes(colour = modality), method = "lm")

Aesthetics vs Attributes

appearance of geometries
e.g. colour, size, shape
attributes take properties
aesthetics take variables

ggplot(d_spellname, aes(x = dur, y = rt)) +
  geom_point(aes(colour = modality)) +
  stat_smooth(aes(colour = modality), method = "lm")

Aesthetics vs Attributes

appearance of geometries
e.g. colour, size, shape
attributes take properties
aesthetics take variables

ggplot(d_spellname, aes(x = dur, y = rt, colour = modality)) +
  geom_point() +
  stat_smooth(method = "lm")

Aesthetics vs Attributes

ggplot(d_spellname, aes(x = dur, y = rt, shape = modality)) +
  geom_point(size = 2.5, colour = "red")

Aesthetics vs Attributes

ggplot(d_spellname, aes(x = dur, y = rt, colour = modality)) +
  geom_point(size = 2.5)

Aesthetics vs Attributes

ggplot(d_spellname, aes(x = dur, y = rt, colour = modality, shape = modality)) +
  geom_point(size = 2.5)

Aesthetics vs Attributes

ggplot(d_spellname, aes(x = dur, y = rt, colour = modality, shape = name_familiarised)) +
  geom_point(size = 2.5)

Aesthetics vs Attributes

ggplot(d_spellname, aes(x = dur, y = rt, colour = interaction(name_familiarised, modality, sep = "; "))) +
  geom_point(size = 2.5) +
  labs(colour = "")

Aesthetics

typically `x`, `y`, `colour`, `fill`, `group`

some are required by geometries; others are optional
continuous vs discrete variables:
- e.g. shape and linetype can only be mapped to categorical variables
should be chosen to facilitate comprehension
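To see the continuous vs discrete restriction in practice, the sketch below maps a numeric variable to `shape` and captures the resulting error instead of stopping the script. The data are simulated (hypothetical values); the error is raised when the plot is built, which is why `ggplot_build()` is wrapped in `tryCatch()`.

```r
library(ggplot2)

# Simulated stand-in data (hypothetical values)
set.seed(4)
d_demo <- data.frame(dur       = runif(20, min = 400, max = 900),
                     rt        = runif(20, min = 1000, max = 1600),
                     ppt_vocab = runif(20, min = 0.8, max = 1))

# Defining the plot works; the problem only surfaces when it is built / printed
p_bad <- ggplot(d_demo, aes(x = dur, y = rt, shape = ppt_vocab)) +
  geom_point()

msg <- tryCatch({
  ggplot_build(p_bad)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# The same variable works with an aesthetic that supports continuous values
p_ok <- ggplot(d_demo, aes(x = dur, y = rt, colour = ppt_vocab)) +
  geom_point()
```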

scatterplot: geom_point()

x, y, shape, colour, fill, size, alpha, stroke, group

barplot: geom_bar()

x, y, colour, fill, linewidth, linetype, alpha, width, group

boxplot: geom_boxplot()

x, y, lower, xlower, upper, xupper, middle, xmiddle, ymin, 
xmin, ymax, xmax, weight, colour, fill, size, alpha, shape, 
linetype, linewidth, width, group

Aesthetics vs Attributes

Complete RMarkdown documents

2a_attributes.R for attributes
2b_aesthetics.R for aesthetics

Continuous vs Categorical Variables

The variable type (e.g. <dbl> or <chr>) determines which aesthetics you can map it to and how ggplot2 interprets it visually.

glimpse(d_spellname)

Rows: 144
Columns: 7
$ ppt_id            <dbl> 40, 40, 41, 41, 42, 42, 43, 43, 44, 44, 45, 45, 46, …
$ name_familiarised <chr> "familiar name", "unfamiliar name", "familiar name",…
$ ppt_vocab         <dbl> 0.9500, 0.9500, 1.0000, 1.0000, 0.9250, 0.9250, 0.90…
$ modality          <chr> "speech", "speech", "speech", "speech", "speech", "s…
$ rt                <dbl> 1476.1579, 1249.5161, 1404.2000, 1170.5397, 1283.455…
$ dur               <dbl> 735.2632, 686.8065, 565.0400, 542.4603, 623.5147, 63…
$ group             <chr> "Modality: speech; familiar name", "Modality: speech…

ggplot2 automatically detects variable type:

Continuous: numeric variables (e.g., age, height, RT, temperature)
Categorical: factors or character variables (e.g., group, species, modality)

Sometimes categorical variables are coded as numbers (e.g. condition = 1, 2, 3, 4; male = 1, female = 2); ggplot2 then treats them as continuous unless they are converted to factors.
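When a category is stored as a number, converting it with `factor()` before plotting tells ggplot2 to treat it as categorical. A minimal sketch, assuming a made-up `condition` column with codes 1 and 2 and hypothetical labels:

```r
library(ggplot2)

# Simulated data with a numerically coded condition (hypothetical values)
set.seed(5)
d_demo <- data.frame(dur       = runif(30, min = 400, max = 900),
                     rt        = runif(30, min = 1000, max = 1600),
                     condition = rep(c(1, 2), each = 15))

# As numbers: condition is treated as continuous and gets a colour gradient
p_num <- ggplot(d_demo, aes(x = dur, y = rt, colour = condition)) +
  geom_point()

# As a factor: condition is treated as categorical and gets distinct colours
# (the labels "speech" and "writing" are assumptions for this example)
d_demo$condition_f <- factor(d_demo$condition,
                             levels = c(1, 2),
                             labels = c("speech", "writing"))

p_fct <- ggplot(d_demo, aes(x = dur, y = rt, colour = condition_f)) +
  geom_point()

p_fct
```

Converting inside the mapping, e.g. `aes(colour = factor(condition))`, also works, but storing the factor in the data keeps meaningful labels for the legend.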

Continuous Variables: Visual Decoding

See Wong (2010) page 665

Examples: age, height, RT, duration, temperature
Convey magnitude and order

Best represented using gradual or quantitative encodings:

position on an aligned scale (x, y)
length, angle, area, volume
colour gradients (e.g., light → dark)
- monochromatic colour spectrum (saturation, grey scale)
- pure spectrum colours
size (circle area, line width)

Continuous Variables: Position on common scale

ggplot(d_spellname, aes(x = modality, y = rt)) +
  geom_jitter()

Continuous Variables: Non-aligned scale

ggplot(d_spellname, aes(x = modality, y = rt)) +
  geom_jitter() +
  facet_wrap(~modality, scales = "free")

Continuous Variables: Colour gradients

ggplot(d_spellname, aes(x = dur, 
                        y = rt, 
                        colour = ppt_vocab)) +
  geom_point() +
  scale_colour_gradient(high = "darkred", low = "yellow")

Categorical Variables: Visual Decoding

See Wong (2010) page 665

Examples of categorical variables: condition, response modality, species
Convey difference in kind, not quantity

Best represented using distinct encodings:

position on discrete axis (x)
distinct shapes
distinct hues (qualitative color palette)

Categorical Variables: Qualitative colour palette

ggplot(d_spellname, aes(x = dur, 
                        y = rt, 
                        colour = modality)) +
  geom_point() +
  scale_colour_brewer(type = "qual")

Categorical Variables: `text` / `labels`

ggplot(d_spellname, aes(x = dur, 
                        y = rt,
                        label = modality)) +
  geom_text(size = 3)

qualitative colours, labels, line colours
sequential colours, shape outlines, line type
filled shapes, hatching (shading with lines), line width

Categorical Variables: `shape`

ggplot(d_spellname, aes(x = dur, 
                        y = rt,
                        shape = modality)) +
  geom_point(size = 3)

qualitative colours, labels, line colours
sequential colours, shape outlines, line type
filled shapes, hatching (shading with lines), line width

Categorical Variables: Hue

ggplot(d_spellname, aes(x = dur, 
                        y = rt,
                        colour = modality)) +
  geom_point(size = 3)

qualitative colours, labels, line colours
sequential colours, shape outlines, line type
filled shapes, hatching (shading with lines), line width

Categorical Variables: Hue

ggplot(d_spellname, aes(x = dur, 
                        y = rt,
                        colour = modality)) +
  stat_smooth(method = "lm", se = FALSE)

qualitative colours, labels, line colours
sequential colours, shape outlines, line type
filled shapes, hatching (shading with lines), line width

Categorical Variables: `linetype`

ggplot(d_spellname, aes(x = dur, 
                        y = rt,
                        linetype = modality)) +
  stat_smooth(method = "lm", se = FALSE)

qualitative colours, labels, line colours
sequential colours, shape outlines, line type
filled shapes, hatching (shading with lines), line width

Categorical Variables: `size`

ggplot(d_spellname, aes(x = dur, 
                        y = rt,
                        size = modality)) +
  stat_smooth(method = "lm", se = FALSE)

qualitative colours, labels, line colours
sequential colours, shape outlines, line type
filled shapes, hatching (shading with lines), line width

Continuous vs Categorical Variables

Variable Type	Typical Aesthetics	What it conveys
Continuous	x, y, size, color (gradient), alpha	magnitude, order
Categorical	x, shape, color (hue), fill	category, grouping

Visual Encoding Hierarchy

Which aesthetics are most effective for continuous vs categorical variables?

Principles of perceptual psychology and visual encoding theory underpin the Grammar of Graphics and ggplot2’s design choices (Wilkinson 2005; Cleveland and McGill 1984)

The human visual system perceives some encodings more accurately than others.

Visual encoding (most to least accurate for continuous data)	Description	Example in ggplot2
Position on a common scale	where a point lies along an axis	`aes(x, y)`
Length / Size (1D)	bar height, line length	`geom_bar()`, `aes(size)`
Angle / Slope	pie chart slices, line angle	`coord_polar()`, `geom_line()`
Area (2D size)	circle or bubble area	`aes(size)` (use carefully)
Colour luminance / saturation	brightness / lightness gradient	`scale_colour_gradient()`
Colour hue	qualitative differences	`scale_colour_brewer()`
Shape / Symbol type	triangle, circle, square	`aes(shape)`
Spatial region / volume	map regions, 3D volume	`geom_polygon()`

The top of the hierarchy (position, length) is best for continuous variables
The bottom (hue, shape) is best for categorical variables

Applied to `ggplot2`

Continuous variables prefer position / gradual encodings:

x, y: continuous scales (most accurate)
size: can work, avoid misleading area scaling
colour: use gradients
alpha: transparency can suggest intensity

Avoid using hue or shape: humans don’t perceive continuous changes in colour hue or shape as ordered or proportional.
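One way to reduce misleading size encodings is `scale_size_area()`, which scales point area (rather than radius) with the value and maps a value of 0 to a size of 0. The sketch below uses simulated data (hypothetical values) and an arbitrary `max_size`.

```r
library(ggplot2)

# Simulated stand-in data (hypothetical values)
set.seed(6)
d_demo <- data.frame(dur       = runif(30, min = 400, max = 900),
                     rt        = runif(30, min = 1000, max = 1600),
                     ppt_vocab = runif(30, min = 0.8, max = 1))

# Continuous variable mapped to point size, with area proportional to the value;
# alpha is set as an attribute so overlapping points stay visible
p_area <- ggplot(d_demo, aes(x = dur, y = rt, size = ppt_vocab)) +
  geom_point(alpha = 0.5) +
  scale_size_area(max_size = 6)

p_area
```

Even with area scaling, size differences are read less accurately than positions, so this works best for a rough impression of magnitude.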

Categorical variables prefer distinct encodings:

x, colour, fill, shape, linetype
Use qualitative palettes (e.g., scale_colour_brewer(type = "qual"))
Don’t map numeric variables to hue unless they’re binned or converted to factors.

Each category should be visually distinct: no gradient or ordering implied.
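If a numeric variable should be shown with distinct hues, it can be binned first. A minimal sketch with `cut()`, assuming simulated vocabulary scores between 0.8 and 1 and an arbitrary split at 0.9 (both are assumptions for illustration):

```r
library(ggplot2)

# Simulated stand-in data (hypothetical values)
set.seed(7)
d_demo <- data.frame(dur       = runif(30, min = 400, max = 900),
                     rt        = runif(30, min = 1000, max = 1600),
                     ppt_vocab = runif(30, min = 0.8, max = 1))

# Bin the continuous score into two labelled categories (cut-off is arbitrary)
d_demo$vocab_bin <- cut(d_demo$ppt_vocab,
                        breaks = c(0.8, 0.9, 1),
                        labels = c("lower", "higher"),
                        include.lowest = TRUE)

# The binned variable is a factor, so a qualitative palette is appropriate
p_bin <- ggplot(d_demo, aes(x = dur, y = rt, colour = vocab_bin)) +
  geom_point() +
  scale_colour_brewer(type = "qual")

p_bin
```

Binning discards information within each bin, so the cut-offs should be chosen for a substantive reason and reported.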

Which plot communicates difference better?

# Modality as character
ggplot(d_spellname, aes(x = dur, 
                        y = rt, 
                        colour = modality)) +
  geom_point() 

# Convert modality to numbers (mutate() comes from dplyr)
d_spell_2 <- mutate(d_spellname, 
                    mod_2 = as.numeric(factor(modality))) 

ggplot(d_spell_2, aes(x = dur, 
                      y = rt, 
                      colour = mod_2)) +
  geom_point()

Which plot shows quantity more precisely?

# ppt_vocab as numeric
ggplot(d_spellname, aes(x = dur, 
                        y = rt, 
                        colour = ppt_vocab)) +
  geom_point() 

# Convert ppt_vocab to characters (mutate() comes from dplyr)
d_spell_2 <- mutate(d_spellname, 
                    vocab_2 = as.character(ppt_vocab))

ggplot(d_spell_2, aes(x = dur, 
                      y = rt, 
                      colour = vocab_2)) +
  geom_point()

Summary Hierarchy

Aesthetic	Best For	Why
Position (x, y)	Continuous	Most precise, linear mapping
Length / Size	Continuous	Perceived magnitude
Color (lightness)	Continuous	Suggests intensity or order
Color (hue)	Categorical	Distinct, unordered groups
Shape / Linetype	Categorical	Symbolic differences
Fill / Alpha	Either	Can show density or group separation

What does “Hue” mean?

Hue refers to the type of colour; what we usually mean when we say red, blue, green, yellow, etc.

It distinguishes different colours along the colour wheel, not how bright or dark they are.

Hue = the colour family (red, blue, green, ...)
Lightness / Luminance = how light or dark the colour is
Saturation = how pure or intense the colour is
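The three dimensions can be varied one at a time with base R's `hcl()` function. This is only a sketch: the specific numbers are arbitrary, and `hcl()` uses chroma, which is related to but not the same as saturation.

```r
# hcl() is part of grDevices, which is attached by default in R

# Different hues, same chroma and luminance: distinct colour families
hues <- hcl(h = c(0, 120, 240), c = 60, l = 65)

# Same hue and chroma, different luminance: dark to light versions of one colour
lights <- hcl(h = 240, c = 60, l = c(30, 60, 90))

# Same hue and luminance, different chroma: greyish to intense
chromas <- hcl(h = 240, c = c(10, 40, 70), l = 65)

# Each call returns hex colour codes that can be passed to ggplot2 scales
print(hues)
print(lights)
print(chromas)
```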

In `ggplot2` terms

When you map a categorical variable to colour (or fill), ggplot2 typically varies the hue:

Each penguin species (palmerpenguins::penguins) gets a different hue (blue, green, red).

This uses a qualitative palette, where hues are different but brightness is roughly equal.

There’s no order implied; just distinct color identities.

Example: Hue vs. Brightness

Type	Used For	Visual Meaning
Hue-based (qualitative)	Categorical data	Different kinds
Lightness-based (sequential)	Continuous data	More ↔ Less
Diverging (two hues)	Continuous data around a midpoint	Below vs. above average
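The three palette types in the table correspond to different ggplot2 colour scales. The sketch below shows one option for each, using simulated data (hypothetical values; the modality label "writing", the palette names and the colours are choices made for this example).

```r
library(ggplot2)

# Simulated stand-in data (hypothetical values)
set.seed(8)
d_demo <- data.frame(dur       = runif(40, min = 400, max = 900),
                     rt        = runif(40, min = 1000, max = 1600),
                     ppt_vocab = runif(40, min = 0.8, max = 1),
                     modality  = rep(c("speech", "writing"), each = 20))

# Qualitative: distinct hues for a categorical variable
p_qual <- ggplot(d_demo, aes(x = dur, y = rt, colour = modality)) +
  geom_point() +
  scale_colour_brewer(palette = "Set2")

# Sequential: light-to-dark gradient for a continuous variable
# (direction = 1 keeps the ColorBrewer order, so low values are light)
p_seq <- ggplot(d_demo, aes(x = dur, y = rt, colour = ppt_vocab)) +
  geom_point() +
  scale_colour_distiller(palette = "Blues", direction = 1)

# Diverging: two hues either side of a meaningful midpoint (here the mean)
p_div <- ggplot(d_demo, aes(x = dur, y = rt, colour = ppt_vocab)) +
  geom_point() +
  scale_colour_gradient2(low = "blue", mid = "grey90", high = "red",
                         midpoint = mean(d_demo$ppt_vocab))

p_qual
p_seq
p_div
```

A diverging palette only makes sense when the midpoint has a meaning, such as zero, a baseline, or an average.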

Aesthetics vs Attributes

Complete RMarkdown document 3_aaa.Rmd for combining attributes and aesthetics.

Reading

Next week we will continue with data visualisation in ggplot2. For basics of data visualisation in ggplot2 see

Homework

Identify a dataset you would like to use for the formative assessment: post a description on our Teams channel.

Think about what you would like to visualise about these data:

Combination of variables
Variable types and appropriate aesthetics
Raw data and / or summary statistics
Look for appropriate visualisation tools HERE

References

Andrews, Mark. 2021. Doing Data Science in R: An Introduction for Social Scientists. SAGE Publications Ltd.

Cleveland, William S, and Robert McGill. 1984. “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods.” Journal of the American Statistical Association 79 (387): 531–54.

Wickham, Hadley. 2010. “A Layered Grammar of Graphics.” Journal of Computational and Graphical Statistics 19 (1): 3–28.

———. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer.

Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc.

Wilkinson, Leland. 2005. The Grammar of Graphics. 2nd ed. Statistics and Computing. New York: Springer.

Wong, Bang. 2010. “Points of View: Design of Data Figures.” Nature Methods 7 (9): 665.
