2023-08-09
Statistical Consultant at UOW working in the Statistical Consulting Centre in NIASRA.
Did a degree in Mathematics and Applied Statistics (Advanced) at UOW, with majors in both mathematics and statistics.
Stayed at UOW and did a PhD in statistical disclosure control.
Started working as a consultant in July 2021.
Data Visualisation is the graphical representation and translation of abstract numerical and statistical information for the purposes of interpretation and communication of insights, trends, and patterns.
Have you ever really looked at a digital image?
##     X Y       Red     Green      Blue
## 1   1 1 0.3882353 0.3882353 0.3882353
## 2   2 1 0.3858633 0.3858633 0.3858633
## 3   3 1 0.3849406 0.3849406 0.3849406
## 4   4 1 0.3852481 0.3852481 0.3852481
## 5   5 1 0.3860388 0.3860388 0.3860388
## 6   6 1 0.3855557 0.3855557 0.3855557
## 7   7 1 0.3860388 0.3860388 0.3860388
## 8   8 1 0.3860388 0.3860388 0.3860388
## 9   9 1 0.3857313 0.3857313 0.3857313
## 10 10 1 0.3854237 0.3854237 0.3854237
## 11 11 1 0.3852481 0.3852481 0.3852481
## 12 12 1 0.3852481 0.3852481 0.3852481
## 13 13 1 0.3849406 0.3849406 0.3849406
## 14 14 1 0.3849406 0.3849406 0.3849406
## 15 15 1 0.3849406 0.3849406 0.3849406
Those values give us this plot…
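The printout above is an image stored as data: each row is one pixel, with its X and Y position and the intensity of the red, green and blue channels on a 0 to 1 scale. A minimal Python sketch of that idea, using a hypothetical 2 by 3 grid rather than the image from the talk; `image_to_long` and `is_greyscale` are illustrative helper names, not part of any library:

```python
# Sketch: an image is a table of numbers, one row per pixel, giving its
# (X, Y) position and the intensity of each colour channel on a 0-1 scale.
# The grid below is hypothetical illustrative data, not the talk's image.


def image_to_long(pixels):
    """Flatten a grid of (red, green, blue) tuples into (X, Y, R, G, B) rows.

    pixels[y][x] holds the colour of the pixel in column x + 1, row y + 1,
    so X changes fastest while Y stays fixed along a row.
    """
    rows = []
    for y, line in enumerate(pixels, start=1):
        for x, (red, green, blue) in enumerate(line, start=1):
            rows.append((x, y, red, green, blue))
    return rows


def is_greyscale(rows, tol=1e-9):
    """True when every pixel has equal red, green and blue values (a grey)."""
    return all(abs(r - g) <= tol and abs(g - b) <= tol for _, _, r, g, b in rows)


grey_image = [
    [(0.39, 0.39, 0.39), (0.38, 0.38, 0.38), (0.40, 0.40, 0.40)],
    [(0.10, 0.10, 0.10), (0.90, 0.90, 0.90), (0.50, 0.50, 0.50)],
]

table = image_to_long(grey_image)
print("X Y Red Green Blue")
for row in table:
    print(*row)
print("greyscale:", is_greyscale(table))
```

Every row in the printout above has equal Red, Green and Blue values, which is consistent with those pixels being shades of grey.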
Humans perceive a lot more information, a lot quicker, with images.
What am I? …
Humans are visual creatures…
Visualisations are more easily interpreted by humans.
We are very good at identifying visual trends and patterns.
Can interpret a lot more information at once.
Can provide another avenue of explanation.
We will be looking at a few key types of summaries that we tend to visualise:
Amounts
Proportions
Distributions
Associations and Correlations
Trends
Uncertainty
Amounts relate to the magnitude, extent or frequency of categories of a particular variable or combination of variables. That is, amounts visualise a quantitative measure on some set of categories.
When visualising amounts, common geometries include bar graphs, grouped bar graphs and dot charts:
Bar graph of length of presidential terms.
Grouped bar graph of customer satisfaction before and after a new staff training program.
Dot chart of top 10 GDP per capita countries in 2007.
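Whatever the plotting tool, a bar graph encodes each amount as a bar length measured from zero. A minimal plain-text sketch in Python of that encoding, assuming non-negative amounts; the helper `text_bar_chart` and the before/after satisfaction scores are hypothetical and are not the data behind the graphs above:

```python
def text_bar_chart(amounts, width=40):
    """Render a horizontal bar chart of non-negative amounts as text lines.

    Every bar starts at zero and its length is proportional to its value,
    so bars can be compared by length alone.
    """
    if any(value < 0 for value in amounts.values()):
        raise ValueError("this sketch assumes non-negative amounts")
    largest = max(amounts.values()) or 1  # avoid dividing by zero if all are 0
    label_width = max(len(label) for label in amounts)
    lines = []
    for label, value in amounts.items():
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return lines


# Hypothetical mean satisfaction scores (out of 100), for illustration only.
scores = {"Before training": 62, "After training": 78}
for line in text_bar_chart(scores, width=30):
    print(line)
```

Scaling every bar against the same zero baseline is what makes the comparison honest; a bar graph with a truncated axis would exaggerate the gap between the two scores.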
When plotting proportions, people usually use:
Proportion of males and females in each occupation of the NSW synthetic population.
Occupation share of males and females in the NSW synthetic population.
Pie charts should only be used to show simple proportions, majorities, or overwhelming portions.
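The two proportion graphs above answer different questions because the proportions are taken over different totals: within each occupation (each row sums to one) versus within each sex (each column sums to one). A minimal Python sketch of both calculations, using hypothetical counts rather than the NSW synthetic population; `row_proportions` and `column_proportions` are illustrative names:

```python
def row_proportions(counts):
    """Divide each cell by its row total, so each row sums to 1."""
    return {
        row: {col: n / sum(cells.values()) for col, n in cells.items()}
        for row, cells in counts.items()
    }


def column_proportions(counts):
    """Divide each cell by its column total, so each column sums to 1."""
    totals = {}
    for cells in counts.values():
        for col, n in cells.items():
            totals[col] = totals.get(col, 0) + n
    return {
        row: {col: n / totals[col] for col, n in cells.items()}
        for row, cells in counts.items()
    }


# Hypothetical counts by occupation (rows) and sex (columns).
counts = {
    "Nurse": {"Female": 80, "Male": 20},
    "Electrician": {"Female": 5, "Male": 95},
    "Teacher": {"Female": 60, "Male": 40},
}

# "Proportion of males and females in each occupation": condition on occupation.
for occupation, shares in row_proportions(counts).items():
    print(occupation, {sex: round(p, 2) for sex, p in shares.items()})

# "Occupation share of males and females": condition on sex.
for occupation, shares in column_proportions(counts).items():
    print(occupation, {sex: round(p, 2) for sex, p in shares.items()})
```

With these made-up counts, nurses are 80% female in the first view, yet nurses account for only about 55% of the women (80 of 145) in the second, so stating which total each proportion uses matters.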
When plotting the distribution of numeric variables, people tend to use:
Wage distribution of mid-Atlantic workers (2003-2009) in the Wage (ISLR package) data set.
Wage distribution of mid-Atlantic workers (2003-2009) in the Wage (ISLR package) data set split by job classification.
Wage distribution of mid-Atlantic workers (2003-2009) in the Wage (ISLR package) data set split by education level.
Wage distribution of mid-Atlantic workers (2003-2009) in the Wage (ISLR package) data set split by education level (more detail about shape).
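A histogram, one common way to show the distribution of a numeric variable, is a count of observations falling in equal-width bins. A minimal Python sketch, assuming left-closed bins [low, low + width); the helper `histogram_counts` and the wage values (in thousands of dollars) are hypothetical, not taken from the Wage data set:

```python
def histogram_counts(values, bin_width, start):
    """Count values in equal-width, left-closed bins [low, low + bin_width).

    Returns {bin lower edge: count}, including empty bins, which a histogram
    draws as adjacent bars so the shape of the distribution is visible.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    if not values:
        return {}
    if min(values) < start:
        raise ValueError("start must be at or below the smallest value")
    n_bins = int((max(values) - start) // bin_width) + 1
    counts = {start + k * bin_width: 0 for k in range(n_bins)}
    for value in values:
        k = int((value - start) // bin_width)
        counts[start + k * bin_width] += 1
    return counts


# Hypothetical wages in thousands of dollars, not values from the Wage data.
wages = [70, 85, 92, 95, 101, 104, 110, 118, 125, 160]
width = 20
for low, n in histogram_counts(wages, bin_width=width, start=60).items():
    print(f"{low:>4}-{low + width:<4} {'#' * n}")
```

Keeping empty bins matters: dropping them would hide the gap before the single high wage, which is exactly the kind of shape detail a distribution plot is meant to reveal.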
When dealing with two or more quantitative variables, associations are usually best visualised with scatter plots, although other plot types do exist for various purposes.
Your dependent (or outcome) variable should always take the y position, whereas an independent variable should be given the x position.
The relationship between speed and stopping distance of cars.
Bubble chart of the evolution of GDP per capita and life expectancy of countries between 1952-2007.
Changes in GDP per Capita for various nations between 1957 and 2007.
Correlogram of the properties of cars featured in the 1974 Motor Trend US magazine.
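A correlogram displays the pairwise correlation coefficients between several quantitative variables, commonly Pearson's r. A minimal Python sketch computing that matrix from scratch; the helpers `pearson` and `correlation_matrix` and the car measurements are hypothetical, not the Motor Trend data:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise Pearson correlations: the numbers a correlogram colours in."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical car measurements, for illustration only.
cars = {
    "weight": [2.6, 2.9, 3.2, 3.4, 3.6, 4.1],
    "mpg": [28.0, 25.5, 22.0, 20.5, 18.0, 15.0],
    "horsepower": [95, 110, 120, 150, 175, 205],
}

for name, row in correlation_matrix(cars).items():
    print(f"{name:>10}", " ".join(f"{r:6.2f}" for r in row.values()))
```

The diagonal is always 1 and the matrix is symmetric, which is why correlograms often show only one triangle. Pearson's r summarises only linear association, so a scatter plot is still worth checking for any pair that stands out.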
There are many ways trends and patterns can appear and be visualised in data, so this is by no means an exhaustive list of ways to visualise trends. However, the most common ways we visualise trends are with:
line graphs,
area graphs,
polygons / clusters.
Relationship between the height and weight of athletes in the Australian Institute of Sport data.
GDP per capita over time of Australia and New Zealand.
Number of unemployed persons in the US from 1967-2015.
When demonstrating uncertainty in statistical estimates the following can be used:
Given you will learn about how we measure uncertainty later in the session, we won’t cover it now.
Good data visualisations need to be correct!
Is my data correctly and unambiguously displayed in my visualisation?
Creating purposefully misleading visualisations is unethical.
Somebody has clearly made a mistake in labelling this graph.
This is a map of flight paths, not patterns of documented spread. Pairing this graphic with that title gives a misleading impression about the extent of the spread.
The numbers on this graph make little sense, given the bar plot.
What is the point I’m trying to communicate with this visualisation?
If your answer is “I don’t know, but it looks cool”, then it is not a good visualisation.
Good visualisations allow the reader to understand immediately what point the author is trying to communicate.
Good data visualisations are understandable and interesting.
Is my visualisation able to be interpreted?
Can people understand the graphic?
Do I have confusing auxiliary information?
Does it look interesting and engaging?
Is it simple, and does it make the point?
Good visualisations are simple, clear, and engaging.
Always remember good data visualisations are:
correct and representative
not misleading
able to communicate a story
suited to the context where they appear
simple and easy to interpret
interesting and engaging
free of chartjunk
not 3D
If you have any questions, feel free to email me:
bradleyw@uow.edu.au