General outline for weeks 2 – 5

  • Week 2: Principles of data visualisation
  • Week 3: Grammar of graphics; aesthetics and attributes
  • Week 4: Major visualisation tools
  • Week 5: Customising visualisations (scales, themes, and labels)

Overview

  • Motivation / Relevance
  • ggplot2 teaser
  • Principles of data visualisation

Download the “Exercises” folder from the NOW Learning Room (week 2) and move the files into your R project folder.

The data

  • Picture naming task
  • Written and spoken responses
  • Manipulation: Prior familiarisation with the most common name for each picture

Exploring data

glimpse(d_spellname)
Rows: 8,670
Columns: 17
$ ppt_id            <dbl> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, …
$ ppt_vocab         <dbl> 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.…
$ image_id          <chr> "almond.jpg", "ambulance.jpg", "aubergine.jpg", "austronaut.jpg", "bagpipes.jpg", "basketbal…
$ resp              <chr> "almond", "ambulance", "aubergine", "austronaut", "bagpipes", "basketball", "binoculars", "b…
$ name_familiarised <lgl> TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, …
$ modality          <chr> "speech", "speech", "speech", "speech", "speech", "speech", "speech", "speech", "speech", "s…
$ rt                <dbl> 1312, 1057, 967, 1148, 1100, 1295, 1205, 2470, 1292, 1012, 1476, 2241, 3120, 1401, 1232, 119…
$ dur               <dbl> 649, 544, 800, 680, 587, 452, 693, 618, 721, 740, 407, 873, 3510, 708, 691, 738, 287, 542, 9…
$ spell_div         <dbl> 0.52909473, 0.39199841, 1.04293875, 1.23439860, 0.88868655, 0.44764042, 1.16363177, 0.416013…
$ name_div          <dbl> 3.7269149, 0.1407271, 3.7693227, 2.9867884, 0.2224148, 0.7472810, 0.2578952, 2.8129985, 4.49…
$ aoa               <dbl> 7.67, 6.16, NA, 6.28, NA, 5.30, 6.79, 4.63, 8.72, 5.20, 9.76, 8.22, 12.25, 5.84, 9.61, 6.18,…
$ freq              <dbl> 1.0769429, 4.8264469, NA, NA, 1.9932336, 2.7816910, 2.6863808, 5.5655792, 2.3297058, 3.51928…
$ nsyl              <dbl> 2, 3, 4, 3, 3, 3, 4, 2, 2, 3, 2, 4, 3, 2, 2, 4, 1, 3, 2, 3, 2, 1, 1, 2, 2, 3, 2, 2, 3, 3, 2,…
$ nchar             <dbl> 6, 9, 9, 9, 8, 10, 10, 7, 7, 8, 4, 10, 8, 8, 8, 11, 5, 9, 9, 10, 7, 6, 3, 7, 6, 8, 7, 7, 8, …
$ nphon             <dbl> 5, 9, NA, NA, 7, 9, 10, 6, 5, 7, 3, 10, 8, 5, 7, 8, 3, 8, 6, 8, 4, NA, 3, 5, 5, NA, 6, 6, 7,…
$ cat               <chr> "is natural", "is manmade", "is natural", NA, "is manmade", "is manmade", "is manmade", "is …
$ semcat            <dbl> -0.12349131, -0.44492604, -0.72806943, NA, 0.05315648, 0.48631844, 1.55619767, 0.33721132, 1…

Exploring data

d_ppt_pic <- distinct(d_spellname, ppt_id, image_id)
count(d_ppt_pic, ppt_id)
# A tibble: 72 × 2
   ppt_id     n
    <dbl> <int>
 1      1   141
 2      2   133
 3      3   142
 4      4   139
 5      5   136
 6      6   133
 7      7   137
 8      8   141
 9      9   142
10     10   136
# ℹ 62 more rows
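
The same two steps can be tried on a tiny data frame to see what they do. This is only a sketch: `toy` and its values are made up for illustration and are not part of `d_spellname`.

```r
library(dplyr)

# Hypothetical toy data: two participants; participant 1 has a repeated picture
toy <- tibble(
  ppt_id   = c(1, 1, 1, 2, 2),
  image_id = c("almond.jpg", "almond.jpg", "ambulance.jpg",
               "almond.jpg", "bagpipes.jpg")
)

# Keep one row per participant-picture pair ...
toy_ppt_pic <- distinct(toy, ppt_id, image_id)

# ... then count how many different pictures each participant saw
n_pics <- count(toy_ppt_pic, ppt_id)
n_pics
```

Without `distinct()`, the repeated picture would be counted twice for participant 1.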

What is data visualisation?

  • Graphical representation of data
  • Graphical data analysis
  • What do we want to know?
  • What do we want to communicate?
  • What do people take away from your visualisation?
  • Exploratory plots (for small specialist audience)
  • Explanatory plots: inform and persuade wider audience

Building up a plot

d_vocab <- summarise(d_spellname, 
                      rt = mean(rt),
                      .by = c(ppt_id, ppt_vocab, modality)) 
glimpse(d_vocab, width = 120)
Rows: 72
Columns: 4
$ ppt_id    <dbl> 40, 41, 42, 43, 44, 45, 46, 47, 48, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 62, 63, 64, 65, 66, …
$ ppt_vocab <dbl> 0.9500, 1.0000, 0.9250, 0.9000, 0.9750, 1.0000, 0.9250, 0.9625, 0.9750, 0.8250, 0.9375, 0.8875, 0.97…
$ modality  <chr> "speech", "speech", "speech", "speech", "speech", "speech", "speech", "speech", "speech", "speech", …
$ rt        <dbl> 1358.0756, 1273.9292, 1392.7664, 740.0988, 1188.0392, 1341.9835, 1519.9000, 1512.6337, 1479.2810, 12…

Building up a plot

ggplot(data = d_vocab, 
       mapping = aes(x = ppt_vocab, 
                     y = rt)) 

Building up a plot

ggplot(data = d_vocab, 
       mapping = aes(x = ppt_vocab, 
                     y = rt))  +
  geom_point()  

Building up a plot

ggplot(data = d_vocab, 
       mapping = aes(x = ppt_vocab, 
                     y = rt)) +
  geom_point() +
  stat_smooth(method = "lm") 

Building up a plot

ggplot(data = d_vocab, 
       mapping = aes(x = ppt_vocab, 
                     y = rt,
                     colour = modality)) +
  geom_point() +
  stat_smooth(method = "lm") 

Building up a plot

ggplot(data = d_vocab, 
       mapping = aes(x = ppt_vocab, 
                     y = rt, 
                     colour = modality,
                     linetype = modality))  +
  geom_point(alpha = .25) +
  stat_smooth(method = "lm", se = TRUE, fullrange = TRUE) +
  scale_y_continuous(labels = scales::comma) +
  ggthemes::theme_clean() +
  ggthemes::scale_color_colorblind() +
  labs(y = "Average reaction time (in msecs)", 
       x = "Vocabulary score",
       colour = "Response modality",
       linetype = "Response modality") +
  theme(legend.position = "top",
        legend.justification = "right",
        axis.title = element_text(hjust = 0))

Creating an exploratory plot

Open RMarkdown document 1_scatterplots.Rmd

Why data visualisation?

“[data visualization] forces us to notice what we never expected to see.” (Tukey 1977)

  • exploring structures in the data
  • relationship between variables
  • distribution of data
  • develop an understanding of patterns (beyond means and SDs)
  • selecting appropriate stats
  • prevent wrong conclusions about data / theory

Anscombe’s quartet (Anscombe 1973)

            x             y             y ~ x
Data set    Mean  SD      Mean  SD      Correlation  Intercept  Slope
1           9     3.32    7.5   2.03    0.82         3          0.5
2           9     3.32    7.5   2.03    0.82         3          0.5
3           9     3.32    7.5   2.03    0.82         3          0.5
4           9     3.32    7.5   2.03    0.82         3          0.5
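
These summary statistics can be recomputed from the `anscombe` data frame that ships with base R (columns `x1` to `x4` and `y1` to `y4`). The sketch below is one possible way to do so; the name `quartet_stats` and its column names are chosen here for illustration.

```r
# `anscombe` is part of base R (package datasets); no extra packages needed
quartet_stats <- do.call(rbind, lapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)  # simple linear regression of y on x
  data.frame(
    data_set    = i,
    mean_x      = mean(x),
    sd_x        = sd(x),
    mean_y      = mean(y),
    sd_y        = sd(y),
    correlation = cor(x, y),
    intercept   = unname(coef(fit)[1]),
    slope       = unname(coef(fit)[2])
  )
}))

# Rounded to two decimals, the four rows are practically indistinguishable
round(quartet_stats, 2)
```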

Anscombe’s quartet
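
A plot of the four data sets shows how different they are despite near-identical summary statistics. The following is a minimal sketch using the built-in `anscombe` data; the reshaping step and the object names (`quartet`, `p_quartet`) are choices made here and may differ from how the original figure was produced.

```r
library(ggplot2)

# Reshape the built-in `anscombe` data from wide (x1..x4, y1..y4) to long
quartet <- do.call(rbind, lapply(1:4, function(i) {
  data.frame(data_set = paste("Data set", i),
             x = anscombe[[paste0("x", i)]],
             y = anscombe[[paste0("y", i)]])
}))

# One panel per data set, each with its own least-squares regression line
p_quartet <- ggplot(data = quartet,
                    mapping = aes(x = x, y = y)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE, fullrange = TRUE) +
  facet_wrap(~ data_set)
p_quartet
```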

The datasaurus dozen

Matejka and Fitzmaurice (2017)

Principles of data visualisation

  • No “one size fits all” method
  • Some methods are more informative than others
  • Maximise what we can learn from data
  • Going beyond summary statistics
  • Descriptive summary statistics condense what we want to communicate but may conceal / obscure important patterns
  • Visualisation helps us to understand patterns, structures, relationships
  • Prevent wrong conclusions about data / theory

Principles of data visualisation

Hartwig and Dearing (1979):

  • Skepticism: any visualization might obscure or misrepresent data
  • Openness: there might be patterns and structures that we were not expecting

Tufte (1983):

  • Above all else show the data
  • Avoid distorting what the data have to say
  • Present many numbers in a small space
  • Encourage the eye to compare different pieces of data
  • Reveal data at several levels of detail, from broad overview to fine structures

6 plots of the same data

Obscuring data and misleading information

Open RMarkdown document 3_scatterplots.Rmd

Principles of data visualisation

Edward Tufte’s principles emphasise clarity, precision, and efficiency in the visual display of information. They guide us to create visualisations that are:

  • Clear
  • Honest
  • Efficient
  • Insightful

Principles of data visualisation

Principle 1: Show the Data

  • Focus on the data itself
  • Avoid unnecessary decoration
  • Let the data tell the story

Principle 2: Maximize Data-Ink Ratio

  • Minimize non-essential elements
  • Every visual element should serve a purpose

Principle 3: Avoid Chartjunk

  • Eliminate decorative elements that obscure the message
  • Simplicity and clarity are key
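
In ggplot2, one way to raise the data-ink ratio is to remove theme elements that carry no data. The sketch below uses simulated values as a stand-in for participant-level data (`d_sim` is hypothetical), and the choice of what to remove is a judgment call rather than a fixed rule.

```r
library(ggplot2)

# Simulated stand-in data (hypothetical values, not the course data)
set.seed(1)
d_sim <- data.frame(ppt_vocab = runif(40, min = 0.8, max = 1))
d_sim$rt <- 2000 - 800 * d_sim$ppt_vocab + rnorm(40, sd = 100)

# Default look: grey panel background plus major and minor gridlines
p_base <- ggplot(data = d_sim,
                 mapping = aes(x = ppt_vocab, y = rt)) +
  geom_point()

# Leaner version: white background, no minor gridlines, informative labels
p_lean <- p_base +
  theme_minimal() +
  theme(panel.grid.minor = element_blank()) +
  labs(x = "Vocabulary score",
       y = "Average reaction time (in msecs)")
p_lean
```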

Principles of data visualisation

Principle 4: Use Small Multiples

  • Repeat charts across categories for comparison
  • Supports pattern recognition

Principle 5: Encourage Visual Comparisons

  • Design graphics to make comparisons easy
  • Align scales and axes
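
In ggplot2, small multiples are typically created with `facet_wrap()`, which by default keeps the same x and y scales in every panel so that panels can be compared directly. A minimal sketch with simulated data follows; `d_sim` and the level name "writing" are assumptions made for illustration.

```r
library(ggplot2)

# Simulated stand-in data with two response modalities (hypothetical values)
set.seed(2)
d_sim <- data.frame(
  ppt_vocab = runif(60, min = 0.8, max = 1),
  modality  = rep(c("speech", "writing"), each = 30)
)
d_sim$rt <- 2000 - 800 * d_sim$ppt_vocab +
  ifelse(d_sim$modality == "writing", 200, 0) +
  rnorm(60, sd = 100)

# One panel per modality; shared axes make the panels directly comparable
p_facets <- ggplot(data = d_sim,
                   mapping = aes(x = ppt_vocab, y = rt)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x) +
  facet_wrap(~ modality)
p_facets
```

Setting `scales = "free"` in `facet_wrap()` would give each panel its own axes, which usually makes comparisons across panels harder.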

Principles of data visualisation

Principle 6: Integrate Words, Numbers, and Images

  • Labels should be clear and close to the data
  • Avoid legends that require back-and-forth viewing
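
One way to follow this principle in ggplot2 is to write the group name next to each line and drop the legend. The sketch below uses invented group means over four hypothetical sessions; none of these values come from the course data.

```r
library(ggplot2)

# Hypothetical group means over four sessions (illustration only)
d_lines <- data.frame(
  session  = rep(1:4, times = 2),
  modality = rep(c("speech", "writing"), each = 4),
  rt       = c(1400, 1300, 1250, 1200, 1700, 1600, 1500, 1450)
)

# One label per line, placed at the last data point of that line
d_labels <- d_lines[d_lines$session == max(d_lines$session), ]

p_direct <- ggplot(data = d_lines,
                   mapping = aes(x = session, y = rt, colour = modality)) +
  geom_line() +
  geom_text(data = d_labels, mapping = aes(label = modality),
            hjust = 0, nudge_x = 0.05) +
  # Extra room on the right so the labels are not cut off
  scale_x_continuous(expand = expansion(mult = c(0.05, 0.2))) +
  theme(legend.position = "none")
p_direct
```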

Principles of data visualisation

Principle 7: Content Over Decoration

  • Focus on substance, not style
  • The story should come from the data

Principles of data visualisation

Principle 8: Use Multivariate Displays

  • Show multiple variables when appropriate
  • Balance complexity with readability

Principles of data visualisation

Principle 9: Avoid Distorting the Data

  • Maintain proportionality and scale
  • Avoid misleading visuals
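
A common source of distortion is a bar chart whose axis does not start at zero. The sketch below contrasts the two versions using two invented means; the data and the object names are assumptions for illustration.

```r
library(ggplot2)

# Hypothetical mean reaction times for two conditions
d_bar <- data.frame(modality = c("speech", "writing"),
                    rt = c(1300, 1400))

# Bars encode values by their length, so the axis should start at zero
p_honest <- ggplot(data = d_bar,
                   mapping = aes(x = modality, y = rt)) +
  geom_col()

# Cutting the axis at 1250 (no padding) shortens both bars by the same amount
p_distorted <- p_honest +
  coord_cartesian(ylim = c(1250, 1450), expand = FALSE)

# Ratio of the values versus ratio of the visible bar lengths
true_ratio    <- d_bar$rt[2] / d_bar$rt[1]
visible_ratio <- (d_bar$rt[2] - 1250) / (d_bar$rt[1] - 1250)
c(true = true_ratio, visible = visible_ratio)
```

In the truncated version the second bar is drawn three times as long as the first, although its value is only about 8% larger.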

Principles of data visualisation

  • Clarity: Avoid clutter; make the message obvious
  • Accuracy: Represent data truthfully
  • Efficiency: Use the right chart for the right data
  • Consistency: Use consistent scales, colors, and labels
  • Accessibility: Consider colorblind-friendly palettes and readable fonts
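
As one possible way to address accessibility, the sketch below takes colours from the Okabe-Ito palette in base R (`palette.colors()`, available from R 4.0.0), encodes groups redundantly by colour and shape, and increases the base font size. The simulated data and the level name "writing" are assumptions.

```r
library(ggplot2)

# Okabe-Ito colours from base R; unname() so that ggplot2 assigns them by
# position instead of trying to match the colour names to factor levels
cb_colours <- unname(palette.colors(n = 3, palette = "Okabe-Ito"))[2:3]

# Simulated stand-in data (hypothetical values)
set.seed(3)
d_sim <- data.frame(
  ppt_vocab = runif(60, min = 0.8, max = 1),
  modality  = rep(c("speech", "writing"), each = 30)
)
d_sim$rt <- 2000 - 800 * d_sim$ppt_vocab + rnorm(60, sd = 100)

# Colour and shape both encode modality, so groups stay distinguishable
# even when the colours are hard to tell apart or the plot is printed in grey
p_cb <- ggplot(data = d_sim,
               mapping = aes(x = ppt_vocab, y = rt,
                             colour = modality, shape = modality)) +
  geom_point(size = 2) +
  scale_colour_manual(values = cb_colours) +
  theme_minimal(base_size = 14)  # larger base font for readability
p_cb
```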

What’s wrong with these?

Reading

Homework

Identify a dataset for the formative assessment.

On Teams, share a poor data visualisation (from published research papers, news websites, social media, etc.) and explain why it is poor. Which principle(s) of data visualisation were violated?

References

Andrews, Mark. 2021. Doing Data Science in R: An Introduction for Social Scientists. SAGE Publications Ltd.

Anscombe, Francis J. 1973. “Graphs in Statistical Analysis.” The American Statistician 27: 17–21.

Gong, Ruyao, and Binghong Liu. 2022. “[Retracted] Monitoring of Sports Health Indicators Based on Wearable Nanobiosensors.” Advances in Materials Science and Engineering 2022 (1): 3802603. https://doi.org/10.1155/2022/3802603.

Hartwig, Frederick, and Brian E. Dearing. 1979. Exploratory Data Analysis. 16. Sage.

Ke, Y. 2024. “Examining Simultaneous Pausing on the Cognitive Writing Process: A Micro-Formative Writing Assessment.” Current Psychology 43 (1): 39–50.

Matejka, Justin, and George Fitzmaurice. 2017. “Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics Through Simulated Annealing.” In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 1290–94.

Rubiah, R., I. N. S. Degeng, P. Setyosari, and D. Kuswandi. 2024. “The Effect of Problem-Based Learning Assisted with Concept Mapping Founded on Cognitive Style on the Creativity of Writing Exposition Text.” Creativity Studies 17 (2): 419–34.

Tufte, Edward R. 1983. The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.

———. 2001. The Visual Display of Quantitative Information. 2nd ed. Cheshire, CT: Graphics Press.

Tukey, John W. 1977. Exploratory Data Analysis. Vol. 2.

van Lieburg, R., E. Sijyeniyo, R. J. Hartsuiker, and Sarah Bernolet. 2023. “The Development of Abstract Syntactic Representations in Beginning L2 Learners of Dutch.” Journal of Cultural Cognitive Science 7: 289–309. https://doi.org/10.1007/s41809-023-00131-5.

Wickham, Hadley. 2010. “A Layered Grammar of Graphics.” Journal of Computational and Graphical Statistics 19 (1): 3–28.

———. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer.

Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc.