2023-04-04
The Statistical Consulting Centre
First, discuss booking a consultation with one of your supervisors.
Go to the Statistical Consulting Centre website and select Make an Appointment.
Fill out the form with your details and those of your chosen supervisor.
We will then send you a link to book.
If you have any questions feel free to email me…
bradleyw@uow.edu.au
or check out the SCC website…
https://www.uow.edu.au/niasra/our-research/statistical-consulting-centre/
also have a look at the NIASRA website…
Data Visualisation is the graphical representation and translation of abstract numerical and statistical information for the purposes of interpretation and communication of insights, trends, and patterns.
Have you ever really looked at a digital image?
| Row | X | Y | Red | Green | Blue |
|---|---|---|---|---|---|
| 1 | 1 | 1 | 0.3882353 | 0.3882353 | 0.3882353 |
| 2 | 2 | 1 | 0.3858633 | 0.3858633 | 0.3858633 |
| 3 | 3 | 1 | 0.3849406 | 0.3849406 | 0.3849406 |
| 4 | 4 | 1 | 0.3852481 | 0.3852481 | 0.3852481 |
| 5 | 5 | 1 | 0.3860388 | 0.3860388 | 0.3860388 |
| 6 | 6 | 1 | 0.3855557 | 0.3855557 | 0.3855557 |
| 7 | 7 | 1 | 0.3860388 | 0.3860388 | 0.3860388 |
| 8 | 8 | 1 | 0.3860388 | 0.3860388 | 0.3860388 |
| 9 | 9 | 1 | 0.3857313 | 0.3857313 | 0.3857313 |
| 10 | 10 | 1 | 0.3854237 | 0.3854237 | 0.3854237 |
| 11 | 11 | 1 | 0.3852481 | 0.3852481 | 0.3852481 |
| 12 | 12 | 1 | 0.3852481 | 0.3852481 | 0.3852481 |
| 13 | 13 | 1 | 0.3849406 | 0.3849406 | 0.3849406 |
| 14 | 14 | 1 | 0.3849406 | 0.3849406 | 0.3849406 |
| 15 | 15 | 1 | 0.3849406 | 0.3849406 | 0.3849406 |
Those values give us this plot…
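To make the link between the numbers and the picture concrete, here is a minimal sketch that rescales the first three rows of the table to 8-bit channel values. It assumes the Red, Green and Blue columns are intensities between 0 and 1; the helper names `to_8bit` and `is_grey` are illustrative and not part of the original analysis.

```python
# First three rows of the pixel table: (X, Y, Red, Green, Blue).
# Assumption: channel values are intensities on a 0-1 scale.
pixels = [
    (1, 1, 0.3882353, 0.3882353, 0.3882353),
    (2, 1, 0.3858633, 0.3858633, 0.3858633),
    (3, 1, 0.3849406, 0.3849406, 0.3849406),
]


def to_8bit(value):
    """Rescale a 0-1 intensity to the 0-255 range used by most image formats."""
    return round(value * 255)


def is_grey(red, green, blue):
    """A pixel is a shade of grey when all three channels are equal."""
    return red == green == blue


for x, y, r, g, b in pixels:
    rgb = (to_8bit(r), to_8bit(g), to_8bit(b))
    print(f"pixel ({x}, {y}) -> {rgb}, grey: {is_grey(r, g, b)}")
```

Because the three channels are equal in every row shown, each of these pixels is a slightly different shade of grey.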
Humans perceive a lot more information, a lot quicker, with images.
What am I? ….
_____ are large bulky fish that have a sharply pointed conical snout, large pectoral and dorsal fins, a strong crescent-shaped tail, and a whitish belly. They have a contrasting pattern of dark blue, gray, or brown on their back and sides and massive jaws which are armed with large sharply pointed, coarsely serrated teeth. Most weigh between 680 and 1,800 kg, but some weighing more than 2,270 kg have been documented.
_____ are large, gray aquatic mammals with bodies that taper to a flat, paddle-shaped tail. They have two forelimbs, called flippers, with three to four nails on each flipper. Their head and face are wrinkled with whiskers on the snout. The average adult is about 10 feet long and weighs as much as 590 kilograms.
Humans are visual creatures….
Visualisations are more easily interpreted by humans.
We are very good at identifying visual trends and patterns.
Can interpret a lot more information at once.
Can provide another avenue of explanation.
Good data visualisations need to be correct!
Is my data correctly and unambiguously displayed in my visualisation?
Creating purposefully misleading visualisations is unethical.
Somebody has clearly made a mistake in labelling this graph.
This is a map of flight paths, not patterns of documented spread. Pairing this graphic with that title gives a misleading impression about the extent of the spread.
The numbers on this graph make little sense, given the bar plot.
“The principle of proportional ink: The sizes of shaded areas in a visualisation need to be proportional to the data values they represent.”
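One common way this principle is broken is a bar chart whose axis does not start at zero. The sketch below, using hypothetical values chosen only for illustration, compares the ratio of two bar lengths with and without a truncated baseline. The function name `ink_ratio` is an assumption made for this example.

```python
def ink_ratio(value_a, value_b, baseline=0.0):
    """Ratio of the two bar lengths when both bars are drawn from `baseline`."""
    return (value_a - baseline) / (value_b - baseline)


# Hypothetical values, chosen only for illustration.
a, b = 82.0, 80.0

print("bars drawn from zero:", ink_ratio(a, b))
print("axis truncated at 78:", ink_ratio(a, b, baseline=78))
```

Drawn from zero, the first bar is only about 2.5% longer than the second. With the axis truncated at 78, it is twice as long, even though the data have not changed.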
In the following graph, it looks like Japanese citizens had a much greater life expectancy than citizens of Israel in 2002.
Consider the following graphic comparing the average life expectancy between different continents in 2007.
Graphic Source: Flaticon.com
Suppose we wished to show the difference in GDP per capita of different countries in 2007.
Humans are great at spotting patterns, especially when those patterns are visual. We must be scientific in our choices of visualisation. Just because there is a pattern doesn’t mean there is a causal relationship between variables.
Check out https://www.tylervigen.com/spurious-correlations for some great examples!!!
What is the point I’m trying to communicate with this visualisation?
If your answer is … I don’t know, but it looks cool … then it is not a good visualisation.
Good visualisations allow the reader to understand immediately what point the author is trying to communicate.
What is the point of this visualisation?
What is the point of this visualisation now?
Titles, labels, and captions allow you to explicitly communicate to the reader specific points that may not be readily apparent in the graph alone. While “you need to label your graphs” may be a well-known mantra, adding a compelling title or caption is something with which many still struggle.
At a bare minimum, your title and caption must succinctly and accurately give the reader enough information to interpret the data visualisation. In addition, titles and text should indicate the purpose of the visualisation and support your overall research narrative.
But importantly, text in visualisations should be used sparingly. Adding too much text complicates and distracts from the visual point you are trying to make.
Take a look at this data visualisation that appeared in USA Today.
Source: https://www.statisticshowto.com/wp-content/uploads/2014/01/usa-today-1.png
A more effective example of titles and labels can be seen below.
Source: https://clauswilke.com/dataviz/figure-titles-captions.html
Source: https://www.pewresearch.org/fact-tank/2022/02/03/what-the-data-says-about-gun-deaths-in-the-u-s/
Consider the following graphic of 2013 Major League Soccer player salaries.
Source: \[View Source\]
We use data visualisations in two ways: exploration and explanation.
What makes a good data visualisation can change based on our analysis and communication context.
Visualisations aimed at exploration should give a broad overview of the data and are aimed at making the reader more familiar with the data. Explanatory visualisations communicate the results of an analysis and are used to influence opinions.
The visualisation you create should match the broader context in which it will appear.
Considering our synthetic NSW income data again, what actual benefit does the following graph have over the simple sentence, “the average weekly income of the NSW workers surveyed was $947.83 (95% CI: $942.29 to $953.36)”?
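The quoted sentence already carries everything the graph would show. As a rough sketch, the interval can be unpacked into its midpoint, half-width and an implied standard error. The multiplier 1.96 assumes a normal-approximation interval, which the source does not confirm.

```python
# Interval endpoints as quoted in the sentence.
lower, upper = 942.29, 953.36

midpoint = (lower + upper) / 2    # should agree with the reported mean
half_width = (upper - lower) / 2  # margin of error

# Assumption: a normal-approximation 95% interval (mean +/- 1.96 * SE).
Z_95 = 1.96
implied_se = half_width / Z_95

print(f"midpoint:   {midpoint:.3f}")
print(f"half-width: {half_width:.3f}")
print(f"implied SE: {implied_se:.3f}")
```

The midpoint agrees with the reported mean of $947.83 to within rounding, and the margin of error is roughly $5.50 either side.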
Which product was the best-selling?
| Product | Percentage of Sales |
|---|---|
| A | 20% |
| B | 18% |
| C | 17% |
| D | 22% |
| E | 23% |
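The answer is easier to find once the rows are sorted. The sketch below ranks the products from the table and, assuming the shares were drawn as a pie chart, converts the top two shares to slice angles to show how small the visible difference would be.

```python
# Percentage of sales per product, copied from the table.
sales = {"A": 20, "B": 18, "C": 17, "D": 22, "E": 23}

# Rank products from best to worst selling.
ranked = sorted(sales.items(), key=lambda item: item[1], reverse=True)
best_product, best_share = ranked[0]
runner_up, runner_up_share = ranked[1]


def pie_angle(share_percent):
    """Slice angle in degrees for a share given as a percentage."""
    return 360 * share_percent / 100


print("ranking:", ranked)
print(f"best seller: {best_product} ({best_share}%)")
print(f"angle gap to {runner_up}: "
      f"{pie_angle(best_share) - pie_angle(runner_up_share):.1f} degrees")
```

Product E leads D by a single percentage point, which corresponds to only 3.6 degrees of a pie chart.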
Good data visualisations are understandable and interesting.
Is my visualisation able to be interpreted?
Can people understand the graphic?
Do I have confusing auxiliary information?
Does it look interesting and engaging?
Is it simple and makes the point?
Good visualisations are simple, clear, and engaging.
When creating visualisations of data, we can use a range of aesthetics to represent different data types.
Common examples include:
Which angle is (a) the biggest and (b) the smallest, and (c) are any the same size?
Humans are notoriously bad at seeing small differences in angles!
Which two areas are different?
B and C are different, but it is not obvious!
Which two bars are the same height?
It is much easier to see that B and C are the same!
Giving every country a different colour in this visualisation makes it indecipherable.
Can you tell which area has the highest (and lowest) rate?
Monotonic colour scales give a natural sense of lowest to highest.
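A simple way to build such a scale is to interpolate between a light and a dark colour. The sketch below is one possible approach, not the method used for the slides: the endpoint colours are hypothetical, and linear interpolation in RGB is only a rough stand-in for a perceptually uniform scale.

```python
def lerp_colour(start, end, t):
    """Linearly interpolate between two RGB colours for t in [0, 1]."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def sequential_scale(start, end, steps):
    """Return `steps` colours running from `start` to `end` (steps >= 2)."""
    return [lerp_colour(start, end, i / (steps - 1)) for i in range(steps)]


def brightness(rgb):
    """Approximate perceived brightness using the common Rec. 601 luma weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


# Hypothetical endpoints: a pale yellow and a dark blue.
LIGHT = (255, 255, 204)
DARK = (8, 48, 107)

scale = sequential_scale(LIGHT, DARK, 5)
for colour in scale:
    print(colour, round(brightness(colour), 1))
```

Brightness falls at every step, so readers can order the colours from lowest to highest without consulting the legend.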
The following graph looks more like a topography map or a weather map than a graph of the religiosity of the USA.
Source: https://vividmaps.com/faithland/
When plotting the states Biden and Trump won in the 2020 election, we should use the colours associated with their respective parties, i.e. blue for Democrats and red for Republicans.
When selecting colour palettes, we must be aware that some readers may have impaired colour vision.
Approximately 8% of males and 0.5% of females have some form of colour-vision deficiency.
People with impaired colour vision typically have difficulty distinguishing certain types of colours, for example, red and green (red–green colour-vision deficiency) or blue and green (blue–yellow colour-vision deficiency).
By selecting specific colours, we can maximise accessibility.
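To get a feel for what these rates mean for an audience, the sketch below computes the expected number of affected readers and the chance that at least one reader is affected. It uses the 8% and 0.5% rates quoted above and assumes readers are independent of one another. The function names are illustrative.

```python
MALE_RATE = 0.08     # rate quoted above for males
FEMALE_RATE = 0.005  # rate quoted above for females


def expected_affected(n_males, n_females):
    """Expected number of readers with a colour-vision deficiency."""
    return n_males * MALE_RATE + n_females * FEMALE_RATE


def prob_at_least_one(n_males, n_females):
    """Probability that at least one reader is affected (independence assumed)."""
    none_affected = (1 - MALE_RATE) ** n_males * (1 - FEMALE_RATE) ** n_females
    return 1 - none_affected


print(expected_affected(100, 100))
print(round(prob_at_least_one(10, 10), 3))
```

Even in a room of 10 males and 10 females, it is more likely than not that at least one person cannot rely on a red–green distinction.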
Source: Claus O. Wilke. Fundamentals of Data Visualization. O’Reilly Media, Inc., 2019.
Examples of how colours appear under different colour-vision deficiencies.
Source: Claus O. Wilke. Fundamentals of Data Visualization. O’Reilly Media, Inc., 2019.
Colourblind-friendly colour scale, Okabe and Ito (2008).
| Name | Hex code | Hue | C,M,Y,K (%) | R,G,B (255) | R,G,B (%) |
|---|---|---|---|---|---|
| orange | #E69F00 | 41° | 0,50,100,0 | 230,159,0 | 90,60,0 |
| sky blue | #56B4E9 | 202° | 80,0,0,0 | 86,180,233 | 35,70,90 |
| bluish green | #009E73 | 164° | 97,0,75,0 | 0,158,115 | 0,60,50 |
| yellow | #F0E442 | 56° | 10,5,90,0 | 240,228,66 | 95,90,25 |
| blue | #0072B2 | 202° | 100,50,0,0 | 0,114,178 | 0,45,70 |
| vermilion | #D55E00 | 27° | 0,80,100,0 | 213,94,0 | 80,40,0 |
| reddish purple | #CC79A7 | 326° | 10,70,0,0 | 204,121,167 | 80,60,70 |
| black | #000000 | - | 0,0,0,100 | 0,0,0 | 0,0,0 |
Source: Claus O. Wilke. Fundamentals of Data Visualization. O’Reilly Media, Inc., 2019.
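When copying a palette like this into plotting code, it is worth checking that the hex codes and RGB triples agree. The sketch below stores the palette from the table and converts each hex code to its 0-255 RGB values. The names `OKABE_ITO` and `hex_to_rgb` are illustrative.

```python
# Hex codes and R,G,B (255) values copied from the table.
OKABE_ITO = {
    "orange": ("#E69F00", (230, 159, 0)),
    "sky blue": ("#56B4E9", (86, 180, 233)),
    "bluish green": ("#009E73", (0, 158, 115)),
    "yellow": ("#F0E442", (240, 228, 66)),
    "blue": ("#0072B2", (0, 114, 178)),
    "vermilion": ("#D55E00", (213, 94, 0)),
    "reddish purple": ("#CC79A7", (204, 121, 167)),
    "black": ("#000000", (0, 0, 0)),
}


def hex_to_rgb(hex_code):
    """Convert a '#RRGGBB' string to an (R, G, B) tuple of 0-255 integers."""
    digits = hex_code.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


# Names whose hex code does not match the tabulated RGB triple.
mismatches = [
    name for name, (hex_code, rgb) in OKABE_ITO.items()
    if hex_to_rgb(hex_code) != rgb
]
print("mismatches:", mismatches)
```

An empty list of mismatches confirms that the two columns of the table describe the same colours.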
Partial transparency and jittering…
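Both techniques help when many points overlap. The sketch below shows one simple version of each: jittering adds a small uniform random offset to every value, and partial transparency lets stacked points build up to a darker mark. The function names, the jitter width and the opacity value are assumptions made for illustration.

```python
import random


def jitter(values, width, seed=None):
    """Shift each value by a uniform random offset in [-width, width]."""
    rng = random.Random(seed)
    return [v + rng.uniform(-width, width) for v in values]


def stacked_opacity(alpha, n_points):
    """Combined opacity of n identical points, each drawn with opacity alpha."""
    return 1 - (1 - alpha) ** n_points


# Hypothetical data: five observations recorded at the same rounded value.
raw = [3, 3, 3, 3, 3]
print(jitter(raw, width=0.2, seed=1))

# Overlapping points at 20% opacity become visibly darker as they pile up.
for n in (1, 2, 5):
    print(n, round(stacked_opacity(0.2, n), 3))
```

Keeping the jitter width small relative to the spacing between values prevents the offsets from being read as real differences.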
Chartjunk is a term first used by Edward Tufte in his 1983 book The Visual Display of Quantitative Information. It refers to any visual elements in a chart that are not required to understand the quantitative information being displayed and that may distract attention from it.
“Every single pixel should testify directly to content”
-Edward Tufte
The use of logos instead of simple bars in this chart makes it so incomprehensible that labels had to be introduced to communicate the data.
Source: \[View Source\]
Both are examples of going overboard with pattern and colour.
Users notice larger elements more easily.
Bright colours typically attract more attention than muted ones.
Dramatically contrasted colours are more eye-catching.
Out-of-alignment elements stand out over aligned ones.
Repeating styles can suggest content is related.
Proximity – Closely placed elements seem related.
Whitespace – More space around elements draws the eye towards them.
Texture and Style – Richer textures stand out over flat ones.
Always remember good data visualisations are:
correct and representative
not misleading
communicate a story
suited to the context where they appear
simple and easy to interpret
interesting and engaging
chartjunk free
not 3D
We will be looking at a few key types of summaries that we tend to visualise:
Amounts
Proportions
Distributions
Associations and Correlations
Trends
Uncertainty
Amounts relate to the magnitude, extent or frequency of categories of a particular variable or combination of variables. That is, amounts visualise a quantitative measure on some set of categories.
When visualising amounts, common geometries used are:
Bar graph of the number of Republican and Democratic presidents since Eisenhower.
Bar graph of length of presidential terms.
Grouped bar graph of customer satisfaction before and after a new staff training program.
Dot chart of top 10 GDP per capita countries in 2007.
Lollipop chart of top 10 life expectancy countries in 2007.
Heat map of the life expectancy in Asian countries 1952-2007.
When plotting proportions, people usually use:
Proportion of males and females in each occupation of the NSW synthetic population.
Occupation share of males and females in the NSW synthetic population.
Pie charts should only be used to show simple proportions, majorities, or overwhelming portions.
The proportions of people in each job classification and education level in the ISLR mid-Atlantic wages data.
Employed persons rate in Australia 2020-2022.
When plotting the distribution of numeric variables, people tend to use:
Wage distribution of mid-Atlantic workers (2003-2009) in the Wage (ISLR package) data set.
Wage distribution of mid-Atlantic workers (2003-2009) in the Wage (ISLR package) data set split by job classification.
Wage distribution of mid-Atlantic workers (2003-2009) in the Wage (ISLR package) data set split by education level.
Wage distribution of mid-Atlantic workers (2003-2009) in the Wage (ISLR package) data set split by education level (more detail about shape).
Synthetic NSW Income Data (2016) wage distribution by occupation.
Joint wage and age distribution of mid-Atlantic workers (2003-2009) in the Wage (ISLR package) data set.
Joint wage and age distribution of mid-Atlantic workers (2003-2009) in the Wage (ISLR package) data set - but with bins.
Assessing the normality of the wage distribution of mid-Atlantic workers (2003-2009) in the Wage (ISLR package) data set.
When dealing with two or more quantitative variables, associations are usually best visualised with scatter plots, although other plot types exist for various purposes.
Common plots for visualising associations and correlations include:
Your dependent (or outcome) variable should always take the y position, whereas an independent variable should be given the x position.
The relationship between speed and stopping distance of cars.
Bubble chart of the evolution of GDP per capita and life expectancy of countries between 1952-2007.
Changes in GDP per capita for various nations between 1957 and 2007.
Correlogram of car properties of 1974 US Motor Trend magazine cars.
Correlogram of car properties of 1974 US Motor Trend magazine cars (with sized circles).
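A correlogram is a picture of a correlation matrix. As a minimal sketch of the numbers behind one, the code below computes Pearson correlations between every pair of columns. The data are hypothetical stand-ins, not the Motor Trend values, and the function names are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Pairwise correlations for a dict mapping column names to values."""
    return {
        row: {col: pearson(columns[row], columns[col]) for col in columns}
        for row in columns
    }


# Hypothetical measurements for five cars (illustration only).
cars = {
    "weight": [2.6, 2.9, 3.2, 3.4, 3.6],
    "horsepower": [110, 93, 175, 105, 245],
    "efficiency": [21.0, 22.8, 18.7, 18.1, 14.3],
}

for name, row in correlation_matrix(cars).items():
    print(name, {col: round(r, 2) for col, r in row.items()})
```

A correlogram then maps each entry of this matrix to a colour, and optionally a circle size, so that strong positive and negative associations stand out.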
There are many ways trends and patterns can appear and be visualised in data, so this is by no means an exhaustive list. However, the most common ways we visualise trends are with:
line graphs,
area graphs,
polygons / clusters.
Relationship between the height and weight of athletes in the Australian Institute of Sport data.
GDP per capita over time of Australia and New Zealand.
Number of unemployed persons in the US from 1967-2015.
Differences in the sex of athletes across the principal components of their physical and hematological characteristics in the Australian Institute of Sport data.
When demonstrating uncertainty in statistical estimates, the following can be used:
Average weekly income by occupation in the synthetic NSW 2016 data with standard errors displayed.
Average weekly income by occupation in the synthetic NSW 2016 data with 95% confidence intervals displayed.
Mid-Atlantic workers’ average wage by ethnicity with staggered confidence intervals.
Relationship between the height and weight of athletes in the Australian Institute of Sport data with 90% and 99% fitted confidence intervals.
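The width of an interval depends on the confidence level chosen. The sketch below uses a normal approximation to show how the multiplier grows from 90% to 99%. This is a simplification: fitted bands around a regression line are usually based on a t-distribution and vary along the x-axis. The estimate and standard error are hypothetical.

```python
from statistics import NormalDist


def z_multiplier(confidence):
    """Two-sided normal multiplier for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def normal_interval(estimate, standard_error, confidence):
    """Normal-approximation confidence interval around an estimate."""
    margin = z_multiplier(confidence) * standard_error
    return estimate - margin, estimate + margin


# Hypothetical estimate and standard error (illustration only).
estimate, standard_error = 75.0, 1.5

for level in (0.90, 0.95, 0.99):
    low, high = normal_interval(estimate, standard_error, level)
    print(f"{level:.0%}: z = {z_multiplier(level):.3f}, "
          f"interval = ({low:.2f}, {high:.2f})")
```

Plotting two levels together, as in the 90% and 99% bands described above, lets readers see both a tighter and a more conservative range at once.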
Yes
But I’ve probably gone too far for one day anyway…
We are also running three short courses on using stats programs.
They are advertised in Universe for $110 (or $100) and on our website: https://www.uow.edu.au/niasra/
Chat with your supervisor if you’re interested…
REMEMBER
If you have any questions feel free to email me…
bradleyw@uow.edu.au
or check out the SCC website…
<https://www.uow.edu.au/niasra/our-research/statistical-consulting-centre/>
also have a look at the NIASRA website…