Data Visualisation Workshop - Methods

EPA Victoria - June 2018

Dr James Baglin, School of Science, Mathematical Sciences, RMIT University

Last updated: 24 June, 2018

Univariate

Bivariate

Choices Matter…

  • Every decision we make will determine success or failure.
  • Every decision is a compromise.
  • Prioritise
  • Choose wisely
  • Justify everything (Kirk 2012)
  • Let’s consider some examples…

Choices Matter… Cont.

  • People have difficulty assimilating the information in box plots (Biehler 1997) and may forget about sample size (Pfannkuch 2006).

Choices Matter… Cont. 2

Choices Matter… Cont. 3

  • Cleveland, Diaconis, and McGill (1982) found that “zooming out” on a scatter plot increases the perceived strength of a correlation between two quantitative variables.
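
Widening the axes does not change the data, so the correlation coefficient stays the same; only the share of the plot occupied by the point cloud shrinks. The Python sketch below illustrates this with invented data. The function names and the "share of plot area" measure are assumptions made for this illustration and are not taken from the cited study.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def cloud_share(xs, ys, axis_min, axis_max):
    """Fraction of the plot area covered by the bounding box of the points
    when both axes run from axis_min to axis_max."""
    width = (max(xs) - min(xs)) / (axis_max - axis_min)
    height = (max(ys) - min(ys)) / (axis_max - axis_min)
    return width * height


# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]

r = pearson_r(x, y)                      # does not depend on the axis limits
tight = cloud_share(x, y, 0, 10)         # axes fitted closely to the data
zoomed_out = cloud_share(x, y, -20, 30)  # same data, much wider axes

print(f"r = {r:.3f}")
print(f"share of plot area, tight axes: {tight:.3f}")
print(f"share of plot area, zoomed out: {zoomed_out:.3f}")
```

The statistic is identical in both views, so any change in how strong the relationship looks comes from the axis choice alone.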

Different plots, different inferences

  • Baglin and Grant (2016) found that using different plots to summarise the same data can significantly change the inferences people draw.

Different plots, different inferences Cont.

Histograms

  • Histograms, which tally a quantitative variable into equal interval “bins”, are sensitive to sample size and the number of bins selected.
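
To see the sensitivity to bin count, the following Python sketch tallies one sample into equal-width bins using three different bin counts. The sample is simulated and the helper `histogram_counts` is a hypothetical name introduced for this illustration.

```python
import random


def histogram_counts(values, n_bins):
    """Tally values into n_bins equal-width bins spanning min to max."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value belongs in the last bin, not a bin of its own.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts


random.seed(1)  # fixed seed so the sketch is repeatable
sample = [random.gauss(50, 10) for _ in range(40)]  # hypothetical data

for n_bins in (4, 10, 25):
    counts = histogram_counts(sample, n_bins)
    print(f"{n_bins:>2} bins:", " ".join(str(c) for c in counts))
```

With only 40 observations, the coarse and fine binnings can suggest quite different shapes for the same data.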

Barcharts

  • Position is a more accurate visual variable than length
  • Let’s look at this issue by posing the following question:
  • Can we predict a person’s eye colour based on their hair colour?
  • We will look at hair and eye colour data from 592 college students.

Barcharts Cont.

  • Unequal sample size and uncommon baselines make comparisons difficult.

Barcharts Cont. 2

  • Filling out the stacks makes it easier to compare proportions, but the uncommon baselines are still an issue.
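
Filling out the stacks amounts to rescaling each bar so that its segments sum to 1. The Python sketch below does this for a hair-by-eye-colour table. It assumes the data are the `HairEyeColor` table distributed with R, summed over sex (592 students); the slides do not name the source, so treat the counts as an assumption.

```python
# Hair by eye colour counts. ASSUMPTION: these are the counts from the
# HairEyeColor table distributed with R, summed over sex (n = 592).
counts = {
    "Black": {"Brown": 68, "Blue": 20, "Hazel": 15, "Green": 5},
    "Brown": {"Brown": 119, "Blue": 84, "Hazel": 54, "Green": 29},
    "Red": {"Brown": 26, "Blue": 17, "Hazel": 14, "Green": 14},
    "Blond": {"Brown": 7, "Blue": 94, "Hazel": 10, "Green": 16},
}


def fill_proportions(table):
    """Rescale each row (hair colour) so its segments sum to 1."""
    result = {}
    for hair, eyes in table.items():
        total = sum(eyes.values())
        result[hair] = {eye: n / total for eye, n in eyes.items()}
    return result


proportions = fill_proportions(counts)
for hair, eyes in proportions.items():
    n = sum(counts[hair].values())
    row = ", ".join(f"{eye} {p:.2f}" for eye, p in eyes.items())
    print(f"{hair:<5} (n = {n:>3}): {row}")
```

Printing the group size next to each row keeps the unequal sample sizes visible, which the filled bars on their own would hide.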

Barcharts Cont. 3

  • We can adjust the position by dodging eye colour, which gives every bar a common baseline.

Mosaic Plots

  • Are Mosaic plots the best of both worlds?

Side-by-side Comparisons

  • Side-by-side comparisons of distributions are a highly effective way to compare groups.

Side-by-side Comparisons Cont.

  • Ordering makes it easier to rank vehicles.

Side-by-side Comparisons Cont.

  • However, we know boxplots hide sample size…
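
A box plot is drawn from the five-number summary, and that summary does not record how many observations sit behind it. The Python sketch below uses two invented groups, one twenty times the size of the other, that produce exactly the same summary. The quartile rule (`method="inclusive"`) is one of several conventions and is an assumption of this sketch.

```python
import statistics


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile and maximum."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


# Hypothetical groups: the second repeats the first 20 times.
small = [10, 20, 30, 40, 50]
large = small * 20

summary_small = five_number_summary(small)
summary_large = five_number_summary(large)

print("small:", summary_small, "n =", len(small))
print("large:", summary_large, "n =", len(large))
```

Both groups would be drawn as identical boxes, so the sample size has to be shown some other way, for example by overlaying the raw points or annotating n.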

Side-by-side Comparisons Cont. 2

  • We can also add the mean

Side-by-side Comparisons Cont. 3

  • And some 95% Confidence Intervals for the mean
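
As a rough guide to what such an interval involves, the Python sketch below computes a mean and a 95% confidence interval using the normal approximation. The data are invented, and for small samples a t-based interval would be somewhat wider; the slides do not say which method was used for the plot.

```python
from statistics import NormalDist, mean, stdev


def mean_ci(values, level=0.95):
    """Mean with a normal-approximation confidence interval."""
    m = mean(values)
    standard_error = stdev(values) / len(values) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return m, m - z * standard_error, m + z * standard_error


# Hypothetical measurements for one group.
group = [21.0, 23.5, 19.8, 24.1, 22.3, 20.7, 25.0, 22.9, 21.6, 23.2]
m, lower, upper = mean_ci(group)
print(f"mean = {m:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Because the standard error shrinks as the sample grows, the width of the interval also conveys some of the sample-size information that the box alone hides.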

Multivariate

  • Count how many variables are contained in the following visualisation

Multivariate Cont.

  • The world is not bivariate.
  • Our world is far more interesting and complex.
  • The aim of multivariate data visualisation is to see through this complexity
  • However, multivariate data visualisation is very difficult.
  • As a general rule, use no more than three variables in one plot.

Multivariate Cont. 2

  • This data visualisation has three variables.

Multivariate Cont. 3

  • We are not good at thinking in 3 dimensions either…

Multivariate Cont. 4

  • There are four general strategies for multivariate data visualisation
    • Mapping additional aesthetics (No more than three in one plot)
    • Faceting (No more than two variables)
    • Purpose built - Use with caution
      • Sankey diagrams
      • Parallel coordinates
      • Multivariate mosaic plots
    • Animation (adds a frame aesthetic)

Animation

  • Animation essentially adds an extra aesthetic - frame
  • Frame is excellent for conveying time.

Getting Interactive

  • Interactive data visualisations take advantage of our inquisitive nature and desire to play (Murray 2013)
  • There are heaps of interactive features we can use in data visualisation:
    • Zooming
    • Selecting variables
    • Hover effects
    • Filtering data
    • Highlighting
  • I only ask you to keep one thing in mind:

The interactivity should always enhance and never detract.

Getting Interactive Cont.

Activity

  • Working with a partner or in small groups, discuss how the following visualisation can be improved.

Port Phillip Bay Report Card - EPA Vic

Quick Quiz 4

References

Baglin, J., and S. Grant. 2016. “Exploring the impact of visual representations of variation on informal statistical inference.” In Proceedings of the Ninth Australian Conference on Teaching Statistics, Dec 2016, edited by H. MacGillivray, M. Martin, and B. Phillips. Canberra, Australia.

Biehler, R. 1997. “Students’ Difficulties in Practicing Computer-Supported Data Analysis: Some Hypothetical Generalizations from Results of Two Exploratory Studies.” In Research on the Role of Technology in Teaching and Learning Statistics, edited by J. B. Garfield and G. Burrill, 176–97. International Statistical Institute.

Cleveland, W. S., P. Diaconis, and R. McGill. 1982. “Variables on Scatterplots Look More Highly Correlated When the Scales Are Increased.” Science 216 (4550): 1138–41. doi:10.1126/science.216.4550.1138.

Kirk, A. 2012. Data visualization: a successful design process. Birmingham, UK: Packt Publishing Ltd.

Murray, S. 2013. Interactive data visualization for the web. Sebastopol, CA: O’Reilly Media, Inc.

Pfannkuch, M. 2006. “Comparing Box Plot Distributions: A Teacher’s Reasoning.” Statistics Education Research Journal 5 (2): 27–45. doi:10.11120/msor.2004.04030058.