2023-08-09

A Little About Me…

I’m Brad Wakefield

  • Statistical Consultant at UOW working in the Statistical Consulting Centre in NIASRA.

  • Completed a degree in Mathematics and Applied Statistics (Advanced) at UOW, majoring in both mathematics and statistics.

  • Stayed at UOW and did a PhD in statistical disclosure control.

  • Started working as a consultant in July 2021.

What is Data Visualisation?

Definition

Data Visualisation is the graphical representation and translation of abstract numerical and statistical information for the purposes of interpretation and communication of insights, trends, and patterns.

  • We use data visualisations to simplify and explain relationships and features of data in a tractable and interpretable way, to inform complex decision making.

A picture is worth a thousand values…

Have you ever really looked at a digital image?

 X Y       Red     Green      Blue
 1 1 0.3882353 0.3882353 0.3882353
 2 1 0.3858633 0.3858633 0.3858633
 3 1 0.3849406 0.3849406 0.3849406
 4 1 0.3852481 0.3852481 0.3852481
 5 1 0.3860388 0.3860388 0.3860388
 6 1 0.3855557 0.3855557 0.3855557
 7 1 0.3860388 0.3860388 0.3860388
 8 1 0.3860388 0.3860388 0.3860388
 9 1 0.3857313 0.3857313 0.3857313
10 1 0.3854237 0.3854237 0.3854237
11 1 0.3852481 0.3852481 0.3852481
12 1 0.3852481 0.3852481 0.3852481
13 1 0.3849406 0.3849406 0.3849406
14 1 0.3849406 0.3849406 0.3849406
15 1 0.3849406 0.3849406 0.3849406

A picture is worth a thousand values…

Those values give us this plot…
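To make the slide concrete, here is a minimal Python sketch of how a grayscale picture becomes a table like the one above. It assumes the image is stored as a list of rows of intensities between 0 and 1; the tiny 2 × 3 image and the name `image_to_rows` are hypothetical. Because a grayscale pixel repeats its intensity in every colour channel, the Red, Green and Blue columns come out identical, just as in the table.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, float, float, float]


def image_to_rows(image: List[List[float]]) -> List[Pixel]:
    """Flatten a grayscale image (rows of intensities in [0, 1]) into
    (X, Y, Red, Green, Blue) records, one record per pixel."""
    rows: List[Pixel] = []
    for y, line in enumerate(image, start=1):      # Y counts image rows
        for x, value in enumerate(line, start=1):  # X counts columns within a row
            # Grayscale: the same intensity is stored in every colour channel.
            rows.append((x, y, value, value, value))
    return rows


# Hypothetical 2 x 3 grayscale image.
tiny = [[0.39, 0.38, 0.38],
        [0.40, 0.41, 0.39]]

print(f"{'X':>2} {'Y':>2} {'Red':>5} {'Green':>5} {'Blue':>5}")
for x, y, r, g, b in image_to_rows(tiny):
    print(f"{x:>2} {y:>2} {r:>5.2f} {g:>5.2f} {b:>5.2f}")
```

Six pixels already need six rows of five numbers; a photograph with millions of pixels becomes millions of rows, which nobody can read as a table.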

Why Visualisation?

Humans take in far more information, far more quickly, from images.

What am I? …

What am I?

  • _____ are large, bulky fish with a sharply pointed conical snout, large pectoral and dorsal fins, a strong crescent-shaped tail, and a whitish belly. They have a contrasting pattern of dark blue, gray, or brown on their back and sides, and massive jaws armed with large, sharply pointed, coarsely serrated teeth. Most weigh between 680 and 1,800 kg, but some weighing more than 2,270 kg have been documented.

I am a …

Great White Shark

What am I?

  • _____ are large, gray aquatic mammals with bodies that taper to a flat, paddle-shaped tail. They have two forelimbs, called flippers, with three to four nails on each flipper. Their head and face are wrinkled, with whiskers on the snout. The average adult is about 3 metres (10 feet) long and can weigh as much as 590 kilograms.

I am a …

Manatee

Compare that to…

What am I now?

Which way was quicker?

Why visualise your data?

Humans are visual creatures…

  • Visualisations are more easily interpreted by humans.

  • We are very good at identifying visual trends and patterns.

  • We can interpret a lot more information at once.

  • Visualisations provide another avenue of explanation.

Let’s have a look at some different ways we can visualise data.

The best graph depends on the type of data.

We will look at a few key types of summaries that we tend to visualise:

  • Amounts

  • Proportions

  • Distributions

  • Associations and Correlations

  • Trends

  • Uncertainty

Amounts

Amounts relate to the magnitude, extent or frequency of categories of a particular variable or combination of variables. That is, amounts visualise a quantitative measure on some set of categories.

When visualising amounts, common geometries are:

  • bar plots,
  • grouped (clustered) bar plots,
  • lollipop plots,
  • dot plots,
  • heatmaps.
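Behind each of these geometries is a table of amounts: one number per category. As a minimal sketch, assuming hypothetical survey responses, the Python below tallies frequencies with `collections.Counter` and prints a crude text bar chart in which bar length equals the count. The helper names are illustrative and not part of any plotting library.

```python
from collections import Counter
from typing import Dict, List


def category_amounts(labels: List[str]) -> Dict[str, int]:
    """Count how often each category occurs, ordered from largest to smallest."""
    return dict(Counter(labels).most_common())


def text_bar_chart(amounts: Dict[str, int], symbol: str = "#") -> str:
    """Render one text bar per category; bar length equals the count."""
    width = max(len(name) for name in amounts)
    return "\n".join(
        f"{name:<{width}} | {symbol * n} {n}" for name, n in amounts.items()
    )


# Hypothetical survey responses.
responses = ["Satisfied", "Neutral", "Satisfied", "Dissatisfied",
             "Satisfied", "Neutral"]
amounts = category_amounts(responses)
print(text_bar_chart(amounts))
```

Sorting categories from largest to smallest, as `most_common` does here, tends to make bar and dot charts easier to scan.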

Bar Graphs

Bar graph of length of presidential terms.

Grouped Bar Graph

Grouped bar graph of customer satisfaction before and after a new staff training program.

Dot Chart

Dot chart of the 10 countries with the highest GDP per capita in 2007.

Proportions

When plotting proportions, people usually use:

  • bar charts,
  • stacked bar charts,
  • grouped (clustered) bar charts,
  • pie charts,
  • mosaic plots,
  • stacked density plots.

Stacked Bar Chart

Proportion of males and females in each occupation of the NSW synthetic population.

Grouped Bar Chart

Occupation share of males and females in the NSW synthetic population.
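The stacked and grouped charts above answer different questions because they normalise the same counts in different directions. The minimal Python sketch below uses made-up `(occupation, sex)` records, not the NSW synthetic population, and computes both: proportions within each occupation (the stacked bars, where each bar sums to 1) and proportions within each sex (the grouped bars, where each sex's bars sum to 1). The function `proportions_within` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Table = Dict[str, Dict[str, float]]


def proportions_within(records: List[Tuple[str, str]], by: int) -> Table:
    """Turn (occupation, sex) records into proportions that sum to 1 within
    each level of the grouping variable at tuple position `by` (0 or 1)."""
    if by not in (0, 1):
        raise ValueError("by must be 0 (occupation) or 1 (sex)")
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for rec in records:
        group, other = rec[by], rec[1 - by]
        counts[group][other] += 1
    table: Table = {}
    for group, inner in counts.items():
        total = sum(inner.values())
        table[group] = {key: n / total for key, n in inner.items()}
    return table


# Hypothetical (occupation, sex) records.
people = [("Nurse", "F"), ("Nurse", "F"), ("Nurse", "M"),
          ("Engineer", "M"), ("Engineer", "M"), ("Engineer", "F"),
          ("Engineer", "M")]

# Stacked bars: within each occupation, what share is male or female?
print(proportions_within(people, by=0))
# Grouped bars: within each sex, how is it spread across occupations?
print(proportions_within(people, by=1))
```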

Pie Chart

Pie charts should only be used to show simple proportions, majorities, or overwhelming portions.
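A pie chart maps each category's proportion p to a slice angle of 360 × p degrees. The short sketch below, with hypothetical vote counts and a hypothetical helper `pie_angles`, shows the conversion for an overwhelming majority, the case where a pie chart works well.

```python
from typing import Dict


def pie_angles(totals: Dict[str, float]) -> Dict[str, float]:
    """Convert category totals into pie-slice angles in degrees (360 * proportion)."""
    grand_total = sum(totals.values())
    return {name: 360 * value / grand_total for name, value in totals.items()}


# Hypothetical vote counts with one overwhelming majority.
print(pie_angles({"Yes": 75, "No": 20, "Undecided": 5}))
# {'Yes': 270.0, 'No': 72.0, 'Undecided': 18.0}
```

Readers generally judge angles less precisely than lengths, so a 270° slice against an 18° slice reads instantly, whereas slices of, say, 72° and 80° are hard to tell apart; a bar chart is usually the better choice there.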

Distributions

When plotting the distribution of numeric variables, people tend to use:

  • histograms,
  • density plots,
  • boxplots,
  • violin plots,
  • strip charts,
  • ridgeline plots,
  • contour and 2D binned plots,
  • cumulative distribution and Q-Q plots.

Histogram

Wage distribution of mid-Atlantic workers (2003-2009) in the Wage (ISLR package) data set.
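A histogram counts how many observations fall into each of a set of equal-width bins. The Python sketch below assumes every value is at least `start`; the wage figures (in thousands of dollars) are invented for illustration and are not the ISLR `Wage` data, and the `histogram` helper is hypothetical.

```python
import math
from typing import List, Tuple


def histogram(values: List[float], bin_width: float, start: float) -> List[Tuple[float, int]]:
    """Count values in equal-width bins [start + k*w, start + (k+1)*w).

    Returns (left edge, count) for every bin from `start` up to the bin
    holding the largest value. Assumes every value is >= start."""
    if bin_width <= 0 or not values or min(values) < start:
        raise ValueError("need bin_width > 0 and non-empty values >= start")
    n_bins = math.floor((max(values) - start) / bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[math.floor((v - start) / bin_width)] += 1
    return [(start + k * bin_width, c) for k, c in enumerate(counts)]


# Hypothetical wages in thousands of dollars (not the ISLR Wage data).
wages = [62, 75, 81, 88, 94, 99, 104, 110, 118, 141, 160, 275]
for left, count in histogram(wages, bin_width=25, start=50):
    print(f"[{left:>3}, {left + 25:>3}) {'#' * count}")
```

The bin width is a choice: wide bins can hide structure, while very narrow bins mostly show noise, so it is worth trying a few widths.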

Density Plot

Wage distribution of mid-Atlantic workers (2003-2009) in the Wage (ISLR package) data set split by job classification.
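A density plot replaces the histogram's bars with a smooth curve. One common way to obtain that curve is a Gaussian kernel density estimate: place a normal curve of fixed width (the bandwidth) on every observation and average them. The sketch below implements that idea directly; the two job-class samples and the bandwidth of 10 are hypothetical, and plotting tools usually pick the bandwidth with a rule of thumb rather than by hand.

```python
import math
from typing import Callable, List


def gaussian_kde(data: List[float], bandwidth: float) -> Callable[[float], float]:
    """Return a function x -> estimated density, built by averaging a normal
    curve with standard deviation `bandwidth` centred on each observation."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x: float) -> float:
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density


# Hypothetical wages (thousands of dollars) for two job classes.
industrial = [70, 78, 85, 90, 95, 101, 108]
information = [85, 96, 104, 112, 120, 131, 150]
f_ind = gaussian_kde(industrial, bandwidth=10)
f_inf = gaussian_kde(information, bandwidth=10)
for x in range(60, 170, 20):
    print(f"{x:>4}  industrial {f_ind(x):.4f}  information {f_inf(x):.4f}")
```

Each estimated curve encloses an area of 1, which is what lets two groups of different sizes be overlaid on the same axes.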

Boxplot

Wage distribution of mid-Atlantic workers (2003-2009) in the Wage (ISLR package) data set split by education level.
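A boxplot summarises each group with its quartiles, whiskers and any outlying points. The sketch below follows the common Tukey convention: whiskers extend to the most extreme values within 1.5 × IQR of the box, and anything beyond is drawn as an outlier. The wage values and the helper name are hypothetical, and quartile conventions differ between tools, so the box edges may not match R's `boxplot()` exactly.

```python
import statistics
from typing import Dict, List


def boxplot_stats(values: List[float]) -> Dict[str, object]:
    """Quartiles, whisker ends and outliers using the 1.5 x IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical wages in thousands of dollars.
wages = [62, 75, 81, 88, 94, 99, 104, 110, 118, 141, 160, 275]
print(boxplot_stats(wages))
```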

Strip and Violin Chart

Wage distribution of mid-Atlantic workers (2003-2009) in the Wage (ISLR package) data set split by education level, showing more detail about the shape of each distribution.

Associations and Correlations

When dealing with two or more quantitative variables, associations are usually best visualised with scatterplots, although other plot types exist for particular purposes:

  • scatterplots,
  • bubble charts,
  • slope graphs,
  • correlograms,
  • contour and 2D binned plots.

Your dependent (or outcome) variable should always take the y position, whereas an independent variable should be given the x position.

Scatterplot

The relationship between speed and stopping distance of cars.

Bubble Chart

Bubble chart of the evolution of GDP per capita and life expectancy of countries between 1952 and 2007.
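In a bubble chart a third variable is mapped to bubble size, and the faithful mapping is by area, not radius: if the radius grows in proportion to the value, a bubble for twice the value looks four times as large. A minimal sketch, assuming non-negative values and a hypothetical `max_radius` in plot units, scales the radius by the square root of the value instead.

```python
import math
from typing import List


def bubble_radii(values: List[float], max_radius: float) -> List[float]:
    """Radius for each bubble so that bubble AREA is proportional to its value.
    The largest value gets `max_radius`."""
    biggest = max(values)
    return [max_radius * math.sqrt(v / biggest) for v in values]


# Hypothetical sizes: the second value is four times the first.
r_small, r_big = bubble_radii([25, 100], max_radius=20)
print(r_small, r_big)  # 10.0 20.0
```

The larger bubble has twice the radius but four times the area, matching the fourfold difference in the data.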

Slope Graph

Changes in GDP per capita for various nations between 1957 and 2007.

Correlogram

Correlogram of the properties of cars featured in the 1974 US Motor Trend magazine.
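Each cell of a correlogram encodes one pairwise correlation coefficient, often Pearson's r, as a colour or shape. The Python sketch below computes that matrix from scratch; the car measurements are made up for illustration and are not the Motor Trend data, and the helper names are hypothetical.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """All pairwise correlations: the numbers a correlogram colours in."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}


# Hypothetical car measurements (weight in 1000 lb, miles per gallon, horsepower).
cars = {
    "weight": [2.6, 2.9, 3.2, 3.4, 3.6, 4.1],
    "mpg": [30.4, 27.3, 22.8, 21.0, 18.7, 15.2],
    "hp": [66, 93, 110, 110, 175, 230],
}
for row, corrs in correlation_matrix(cars).items():
    print(f"{row:>6} " + " ".join(f"{r:+.2f}" for r in corrs.values()))
```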

Trends

Trend Line

Relationship between the height and weight of athletes in the Australian Institute of Sport data.
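A straight trend line is commonly the ordinary least squares fit, which chooses the intercept a and slope b that minimise the squared vertical distances from the points to the line. Assuming a straight-line fit (the plot on the slide may use a different smoother), the sketch below computes a and b with the textbook formulas b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. The heights and weights are hypothetical, not the AIS data, and `least_squares_line` is an illustrative name.

```python
from typing import List, Tuple


def least_squares_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Intercept a and slope b of the ordinary least squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b


# Hypothetical athlete heights (cm) and weights (kg).
height = [165, 170, 175, 180, 185, 190]
weight = [58, 64, 68, 74, 79, 84]
a, b = least_squares_line(height, weight)
print(f"weight ~ {a:.1f} + {b:.3f} * height")
```

Weight, the outcome, sits on the y axis and height on the x axis, following the convention for dependent and independent variables given earlier.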

Trend Line (Temporal)

GDP per capita over time of Australia and New Zealand.

Area Chart

Number of unemployed persons in the US from 1967 to 2015.

Uncertainty

When demonstrating uncertainty in statistical estimates, the following can be used:

  • error bars (for 1D points),
  • staggered bars (for 1D points),
  • ribbons (for lines),
  • polygons (for 2D points).

Since you will learn how we measure uncertainty later in the session, we won’t cover it now.

Visualisations can also be a bit more interactive…

Animations create drama…

…and display more information.

We can also build maps…

…and dashboards.

Principles of Data Visualisation

What makes a Good Data Visualisation?

The Science

Good data visualisations need to be correct!

Is my data correctly and unambiguously displayed in my visualisation?

  • Is my data correct?
  • Have I shown what I said I have shown?
  • Are my visualisations labelled correctly?
  • Are my scales correct and consistent?
  • Have I presented my data in a way faithful to the truth?

Creating purposefully misleading visualisations is unethical.
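Scales are one of the easiest places to mislead, often unintentionally. In a bar chart, bar length is proportional to the value minus the axis baseline, so starting the axis above zero inflates differences. The arithmetic is sketched below with hypothetical values and a hypothetical helper `visual_ratio`: a 5% difference in the data becomes a twofold difference in bar length once the axis starts at 95.

```python
def visual_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of drawn bar lengths for values a and b when bars start at `baseline`."""
    return (b - baseline) / (a - baseline)


# Hypothetical values: B is 5% larger than A.
a_val, b_val = 100.0, 105.0
print(visual_ratio(a_val, b_val))                 # 1.05 (axis starts at zero)
print(visual_ratio(a_val, b_val, baseline=95.0))  # 2.0 (bar B drawn twice as long)
```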

The media is filled with good (bad) examples…

More media examples…

This is a map of flight paths, not patterns of documented spread. Pairing this graphic with that title gives a misleading impression about the extent of the spread.

Source: https://venngage.com/blog/bad-infographics/

More media examples…

Ensuring the data is correctly displayed in a way faithful to the truth is the most important part of any data visualisation.

The Story

What is the point I’m trying to communicate with this visualisation?

  • Am I demonstrating a particular trend?
  • Am I demonstrating a difference?
  • Am I demonstrating a problem?

If your answer is “I don’t know, but it looks cool”, then it is not a good visualisation.

Good visualisations allow the reader to understand immediately what point the author is trying to communicate.

What is the point of this visualisation?

What is the point of this visualisation now?

The Visual

Good data visualisations are understandable and interesting.

Is my visualisation able to be interpreted?

  • Can people understand the graphic?

  • Do I have confusing auxiliary information?

  • Does it look interesting and engaging?

  • Is it simple, and does it make the point?

Good visualisations are simple, clear, and engaging.

For example…

A better plot…

Key Points

Always remember good data visualisations are:

  • correct and representative

  • not misleading

  • focused on communicating a story

  • suited to the context where they appear

  • simple and easy to interpret

  • interesting and engaging

  • free of chartjunk

  • not 3D

THANKS FOR LISTENING TO ME

If you have any questions, feel free to email me…

bradleyw@uow.edu.au