- Week 2: Principles of data visualisation
- Week 3: Grammar of graphics; aesthetics and attributes
- Week 4: Major visualisation tools
- Week 5: Customising visualisations (scales, themes, and labels)
ggplot2 teaser
Download the “Exercises” folder from the NOW Learning Room (week 2). Move the files into your R-Project folder.
Rows: 8,670
Columns: 17
$ ppt_id            <dbl> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, …
$ ppt_vocab         <dbl> 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.…
$ image_id          <chr> "almond.jpg", "ambulance.jpg", "aubergine.jpg", "austronaut.jpg", "bagpipes.jpg", "basketbal…
$ resp              <chr> "almond", "ambulance", "aubergine", "austronaut", "bagpipes", "basketball", "binoculars", "b…
$ name_familiarised <lgl> TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, …
$ modality          <chr> "speech", "speech", "speech", "speech", "speech", "speech", "speech", "speech", "speech", "s…
$ rt                <dbl> 1312, 1057, 967, 1148, 1100, 1295, 1205, 2470, 1292, 1012, 1476, 2241, 3120, 1401, 1232, 119…
$ dur               <dbl> 649, 544, 800, 680, 587, 452, 693, 618, 721, 740, 407, 873, 3510, 708, 691, 738, 287, 542, 9…
$ spell_div         <dbl> 0.52909473, 0.39199841, 1.04293875, 1.23439860, 0.88868655, 0.44764042, 1.16363177, 0.416013…
$ name_div          <dbl> 3.7269149, 0.1407271, 3.7693227, 2.9867884, 0.2224148, 0.7472810, 0.2578952, 2.8129985, 4.49…
$ aoa               <dbl> 7.67, 6.16, NA, 6.28, NA, 5.30, 6.79, 4.63, 8.72, 5.20, 9.76, 8.22, 12.25, 5.84, 9.61, 6.18,…
$ freq              <dbl> 1.0769429, 4.8264469, NA, NA, 1.9932336, 2.7816910, 2.6863808, 5.5655792, 2.3297058, 3.51928…
$ nsyl              <dbl> 2, 3, 4, 3, 3, 3, 4, 2, 2, 3, 2, 4, 3, 2, 2, 4, 1, 3, 2, 3, 2, 1, 1, 2, 2, 3, 2, 2, 3, 3, 2,…
$ nchar             <dbl> 6, 9, 9, 9, 8, 10, 10, 7, 7, 8, 4, 10, 8, 8, 8, 11, 5, 9, 9, 10, 7, 6, 3, 7, 6, 8, 7, 7, 8, …
$ nphon             <dbl> 5, 9, NA, NA, 7, 9, 10, 6, 5, 7, 3, 10, 8, 5, 7, 8, 3, 8, 6, 8, 4, NA, 3, 5, 5, NA, 6, 6, 7,…
$ cat               <chr> "is natural", "is manmade", "is natural", NA, "is manmade", "is manmade", "is manmade", "is …
$ semcat            <dbl> -0.12349131, -0.44492604, -0.72806943, NA, 0.05315648, 0.48631844, 1.55619767, 0.33721132, 1…
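# Keep one row per participant-image combination
d_ppt_pic <- distinct(d_spellname, ppt_id, image_id)
# Count how many distinct images each participant responded to
count(d_ppt_pic, ppt_id)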
# A tibble: 72 × 2
   ppt_id     n
    <dbl> <int>
 1      1   141
 2      2   133
 3      3   142
 4      4   139
 5      5   136
 6      6   133
 7      7   137
 8      8   141
 9      9   142
10     10   136
# ℹ 62 more rows
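# Average reaction time per participant (vocabulary score and modality are kept as grouping columns)
d_vocab <- summarise(d_spellname, rt = mean(rt), .by = c(ppt_id, ppt_vocab, modality))
glimpse(d_vocab, width = 120)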
Rows: 72
Columns: 4
$ ppt_id    <dbl> 40, 41, 42, 43, 44, 45, 46, 47, 48, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 62, 63, 64, 65, 66, …
$ ppt_vocab <dbl> 0.9500, 1.0000, 0.9250, 0.9000, 0.9750, 1.0000, 0.9250, 0.9625, 0.9750, 0.8250, 0.9375, 0.8875, 0.97…
$ modality  <chr> "speech", "speech", "speech", "speech", "speech", "speech", "speech", "speech", "speech", "speech", …
$ rt        <dbl> 1358.0756, 1273.9292, 1392.7664, 740.0988, 1188.0392, 1341.9835, 1519.9000, 1512.6337, 1479.2810, 12…
ggplot(data = d_vocab, mapping = aes(x = ppt_vocab, y = rt))
ggplot(data = d_vocab, mapping = aes(x = ppt_vocab, y = rt)) + geom_point()
ggplot(data = d_vocab, mapping = aes(x = ppt_vocab, y = rt)) +
  geom_point() +
  stat_smooth(method = "lm")

ggplot(data = d_vocab, mapping = aes(x = ppt_vocab, y = rt, colour = modality)) +
  geom_point() +
  stat_smooth(method = "lm")
ggplot(data = d_vocab,
       mapping = aes(x = ppt_vocab, y = rt, colour = modality, linetype = modality)) +
  geom_point(alpha = .25) +                                   # semi-transparent points
  stat_smooth(method = "lm", se = TRUE, fullrange = TRUE) +   # linear fit with confidence band
  scale_y_continuous(labels = scales::comma) +                # thousands separator on the y-axis
  ggthemes::theme_clean() +
  ggthemes::scale_color_colorblind() +
  labs(y = "Average reaction time (in msecs)",
       x = "Vocabulary score",
       colour = "Response modality",
       linetype = "Response modality") +
  theme(legend.position = "top",
        legend.justification = "right",
        axis.title = element_text(hjust = 0))
Open RMarkdown document 1_scatterplots.Rmd
Data set | Mean (x) | SD (x) | Mean (y) | SD (y) | Correlation | Intercept | Slope |
---|---|---|---|---|---|---|---|
1 | 9 | 3.32 | 7.5 | 2.03 | 0.82 | 3 | 0.5 |
2 | 9 | 3.32 | 7.5 | 2.03 | 0.82 | 3 | 0.5 |
3 | 9 | 3.32 | 7.5 | 2.03 | 0.82 | 3 | 0.5 |
4 | 9 | 3.32 | 7.5 | 2.03 | 0.82 | 3 | 0.5 |
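The identical summary rows above can be checked directly. The following is a minimal sketch, assuming the table describes Anscombe's quartet (Anscombe 1973) and using the `anscombe` data frame that ships with base R; the helper name `quartet_stats` is made up for this illustration.

```r
# `anscombe` is a built-in data frame with columns x1..x4 and y1..y4.
# For each of the four data sets, compute the statistics shown in the table.
quartet_stats <- do.call(rbind, lapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- coef(lm(y ~ x))  # simple linear regression of y on x
  data.frame(
    dataset     = i,
    mean_x      = mean(x),
    sd_x        = sd(x),
    mean_y      = mean(y),
    sd_y        = sd(y),
    correlation = cor(x, y),
    intercept   = fit[["(Intercept)"]],
    slope       = fit[["x"]]
  )
}))

# Rounded to two decimals, the four rows are indistinguishable
print(round(quartet_stats, 2))
```

The statistics agree to two decimal places, yet the four data sets look very different once plotted, which is why the summary table alone is not enough.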
Open RMarkdown document: 2_tdd.Rmd
Hartwig and Dearing (1979):
Tufte (1983):
Open RMarkdown document 3_scatterplots.Rmd
Edward Tufte’s principles emphasise clarity, precision, and efficiency in the visual display of information. They guide us to create visualisations with those qualities:
Principle 1: Show the Data
Principle 2: Maximize Data-Ink Ratio
Principle 3: Avoid Chartjunk
Principle 4: Use Small Multiples
Principle 5: Encourage Visual Comparisons
Principle 6: Integrate Words, Numbers, and Images
Principle 7: Content Over Decoration
Principle 8: Use Multivariate Displays
Principle 9: Avoid Distorting the Data
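As one possible illustration of Principles 4 and 5 (small multiples and visual comparison), the sketch below draws four panels on shared axes with ggplot2's `facet_wrap()`. It assumes ggplot2 is installed and uses the `anscombe` data frame from base R rather than the course data; the object names `quartet_long` and `p` are made up for this example.

```r
library(ggplot2)

# Reshape the built-in `anscombe` data (columns x1..x4, y1..y4) to long format:
# one row per observation, with a `dataset` label used for faceting.
quartet_long <- do.call(rbind, lapply(1:4, function(i) {
  data.frame(
    dataset = paste("Data set", i),
    x = anscombe[[paste0("x", i)]],
    y = anscombe[[paste0("y", i)]]
  )
}))

p <- ggplot(data = quartet_long, mapping = aes(x = x, y = y)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE, fullrange = TRUE) +
  facet_wrap(~dataset) +  # one panel per data set; axes are shared by default
  theme_minimal()         # light theme with little non-data ink

p
```

Because every panel uses the same scales, differences between the data sets can be read off by eye without consulting the axis labels of each panel separately.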
See Gong and Liu (2022) (now retracted), Rubiah et al. (2024), and Ke (2024) for examples.
Check Figures in van Lieburg et al. (2023); code and data are HERE.
Next week we will continue with data visualisation in ggplot2. For fundamentals of data visualisation in ggplot2, see the ggplot2 texts in the reference list, for example Wickham (2016).
For principles of data visualisation, see Tufte (2001).
Identify a dataset for the formative assessment.
On Teams, share a poor data visualisation (from a published research paper, news website, social media, etc.) and explain why it is poor. Which principle(s) of data visualisation does it violate?
Andrews, Mark. 2021. Doing Data Science in R: An Introduction for Social Scientists. SAGE Publications Ltd.
Anscombe, Francis J. 1973. “Graphs in Statistical Analysis.” The American Statistician 27: 17–21.
Gong, Ruyao, and Binghong Liu. 2022. “[Retracted] Monitoring of Sports Health Indicators Based on Wearable Nanobiosensors.” Advances in Materials Science and Engineering 2022 (1): 3802603. https://doi.org/10.1155/2022/3802603.
Hartwig, Frederick, and Brian E. Dearing. 1979. Exploratory Data Analysis. 16. Sage.
Ke, Y. 2024. “Examining Simultaneous Pausing on the Cognitive Writing Process: A Micro-Formative Writing Assessment.” Current Psychology 43 (1): 39–50.
Matejka, Justin, and George Fitzmaurice. 2017. “Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics Through Simulated Annealing.” In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 1290–94.
Rubiah, R., I. N. S. Degeng, P. Setyosari, and D. Kuswandi. 2024. “The Effect of Problem-Based Learning Assisted with Concept Mapping Founded on Cognitive Style on the Creativity of Writing Exposition Text.” Creativity Studies 17 (2): 419–34.
Tufte, Edward R. 1983. The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.
———. 2001. The Visual Display of Quantitative Information. 2nd ed. Cheshire, CT: Graphics Press.
Tukey, John W. 1977. Exploratory Data Analysis. Vol. 2.
van Lieburg, R., E. Sijyeniyo, R. J. Hartsuiker, and S. Bernolet. 2023. “The Development of Abstract Syntactic Representations in Beginning L2 Learners of Dutch.” Journal of Cultural Cognitive Science 7: 289–309. https://doi.org/10.1007/s41809-023-00131-5.
Wickham, Hadley. 2010. “A Layered Grammar of Graphics.” Journal of Computational and Graphical Statistics 19 (1): 3–28.
———. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer.
Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc.