Data Visualization Masterclass

Part 1: Why visualize data?

Objective

  • Explain how effective data visualizations enhance the clarity, impact, and integrity of scientific communication

Adapting to the modern data ecosystem

  • We are in a data-rich world.

  • But data are not the same thing as insights.

  • We need effective ways to communicate the information that is hidden in large data sets.

Advantages of visual cognition

  • Humans are much better at visual perception than at almost any other kind of information processing.

    • Selective evolutionary pressure to build a strong vision system
  • Visualizations help us hold many values at once in our minds.

The limitations of tables

Visual displays take advantage of our skills by allowing us to understand many values at once.

        x         y
 3.727273  5.621515
 8.090909  8.744682
 5.545455  6.816341
 2.181818  3.911479
 7.000000  8.272315
 4.727273  5.469942
 5.454545  6.238583
 4.818182  3.956451
 9.727273 13.062818
 3.181818  3.607896

Graphical advantages - scatter plot

Graphical advantages - adding trend lines
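
As a minimal sketch of the two plots named above, the following Python code fits a least-squares trend line to the ten (x, y) pairs from the table and, optionally, draws them as a scatter plot with the line overlaid. The function names `fit_line` and `plot_scatter_with_trend` are illustrative and not from the original slides, and the use of matplotlib is an assumption; the slides do not say which tool produced the original figures.

```python
# Ten (x, y) pairs copied from the table in "The limitations of tables".
XS = [3.727273, 8.090909, 5.545455, 2.181818, 7.000000,
      4.727273, 5.454545, 4.818182, 9.727273, 3.181818]
YS = [5.621515, 8.744682, 6.816341, 3.911479, 8.272315,
      5.469942, 6.238583, 3.956451, 13.062818, 3.607896]


def fit_line(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def plot_scatter_with_trend(xs, ys, path="scatter_trend.png"):
    """Save a scatter plot with a trend line (assumes matplotlib is installed)."""
    import matplotlib
    matplotlib.use("Agg")  # render straight to a file; no display needed
    import matplotlib.pyplot as plt

    intercept, slope = fit_line(xs, ys)
    fig, ax = plt.subplots()
    ax.scatter(xs, ys)                      # the raw points
    x_ends = [min(xs), max(xs)]
    ax.plot(x_ends, [intercept + slope * x for x in x_ends])  # the trend line
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    fig.savefig(path)
    plt.close(fig)


intercept, slope = fit_line(XS, YS)
print(f"trend line: y = {intercept:.2f} + {slope:.2f} * x")
```

Calling `plot_scatter_with_trend(XS, YS)` writes the figure to a PNG file. The upward trend that is hard to see in the column of numbers is visible in the plot at a glance.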

Avoiding statistical mistakes

Statistics “compresses” values into single numbers.

  • Visualization allows you to see broader patterns and avoid confusion
  • Anscombe’s Quartet is a famous example
  x1 x2 x3 x4   y1   y2    y3    y4
1 10 10 10  8 8.04 9.14  7.46  6.58
2  8  8  8  8 6.95 8.14  6.77  5.76
3 13 13 13  8 7.58 8.74 12.74  7.71
4  9  9  9  8 8.81 8.77  7.11  8.84
5 11 11 11  8 8.33 9.26  7.81  8.47
6 14 14 14  8 9.96 8.10  8.84  7.04
7  6  6  6  8 7.24 6.13  6.08  5.25
8  4  4  4 19 4.26 3.10  5.39 12.50

Anscombe’s statistics

Summary statistics for each dataset:
Mean of x: 9 
Mean of y: 7.5 
Variance of x: 11 
Variance of y: 4.13 
Correlation: 0.816 
Linear regression: y = 3 + 0.5 * x
R-squared: 0.667 
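
The summary statistics above can be reproduced with a short sketch like the one below. It uses the full 11-row version of Anscombe's Quartet as commonly published (the table on the earlier slide lists only the first eight rows), and the helper name `summarize` is illustrative rather than taken from the slides. Variances are sample variances (dividing by n - 1).

```python
import math

# Full Anscombe's Quartet; datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the summary statistics listed on the slide for one dataset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),   # sample variance
        "var_y": syy / (n - 1),
        "correlation": r,
        "intercept": mean_y - slope * mean_x,
        "slope": slope,
        "r_squared": r ** 2,
    }


for name, (xs, ys) in QUARTET.items():
    stats = summarize(xs, ys)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four datasets agree to roughly two decimal places on every statistic, which is why the numbers alone cannot distinguish them and the plots are needed.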

Anscombe’s Quartet graphics

Advantage of modern statistical tools

Further reading

  • Few, Stephen. 2009. Now You See It: Simple Visualization Techniques for Quantitative Analysis. Analytics Press.

    • Chapter 1

Exercise

Identify a data source relevant to your interest.

Describe:

(1) A pattern in those data that could be more clearly displayed graphically.

(2) A setting where having a user interact with a data visualization would be helpful in communicating your work.

Part 2: The Mechanics of Visualization

Objective

Explain how the elements of a data visualization can represent the characteristics of a data set.

Why study the mechanics?

When you write up results in language, you need to have an understanding of structure and grammar to communicate clearly.

The same thing applies when communicating visually. We need an understanding of the “grammar” of graphics if we are to think critically about visualizations.

Claus Wilke

Mapping data onto aesthetics

Fundamentally, “data visualizations map data values into quantifiable features.”

  • We are taking data values and turning them into “blobs of ink or colored pixels on screen.”

What is an aesthetic?

Aesthetics describe “every aspect of a given graphical element.” These include:

  1. Position
  2. Shape
  3. Size
  4. Color

Illustration of aesthetics

What kinds of data are there?

  1. Continuous quantitative
  2. Discrete quantitative
  3. Categorical ordered
  4. Categorical unordered
  5. Dates and times

How do we connect data and aesthetics?

Scaling is the process of mapping data to aesthetics.

In other words, aesthetic values change in ways that correspond to changes in the underlying data values.
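
One way to make the idea of a scale concrete is to write it as a function from data values to aesthetic values. The sketch below is illustrative only: `linear_scale` and `ordinal_scale` are hypothetical helper names, and the pixel ranges, point sizes, and group labels are made-up examples rather than values from the slides.

```python
def linear_scale(domain, output_range):
    """Build a function mapping a continuous data value to an aesthetic value."""
    d0, d1 = domain
    r0, r1 = output_range

    def scale(value):
        t = (value - d0) / (d1 - d0)   # position of the value within the data domain
        return r0 + t * (r1 - r0)      # same relative position within the output range

    return scale


def ordinal_scale(categories, outputs):
    """Build a function mapping each category to one discrete aesthetic value."""
    mapping = dict(zip(categories, outputs))

    def scale(category):
        return mapping[category]

    return scale


# Position: data values 0..10 become horizontal pixel positions 50..450.
x_position = linear_scale((0, 10), (50, 450))
# Size: data values 0..100 become point sizes 2..12.
point_size = linear_scale((0, 100), (2, 12))
# Shape: an unordered categorical variable becomes a marker shape.
marker_shape = ordinal_scale(["control", "treated"], ["circle", "triangle"])

print(x_position(5))            # 250.0
print(point_size(50))           # 7.0
print(marker_shape("treated"))  # triangle
```

Continuous data pair naturally with continuous aesthetics such as position and size, while categorical data pair with discrete aesthetics such as shape.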

Choices about aesthetics can change perceptions

The same information, with different aesthetic choices

Packing information into a visualization

Coordinate systems and axes

Choices about axes scales

Color scales

  • Color has two attributes - hue and intensity

  • These attributes can be used for several kinds of visual distinction

    • Qualitative

    • Quantitative

    • Highlighting/contrast
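
The three uses of color can be sketched with Python's standard `colorsys` module. This is a rough illustration, not a recommended palette: the function names are hypothetical, "intensity" is approximated here by lightness in the HLS color model, and the specific hue, saturation, and hex values are arbitrary choices.

```python
import colorsys


def to_hex(r, g, b):
    """Convert RGB components in the range 0..1 to a hex color string."""
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


def qualitative_palette(categories, lightness=0.5, saturation=0.6):
    """Qualitative: vary hue across categories while holding intensity constant."""
    n = len(categories)
    return {
        category: to_hex(*colorsys.hls_to_rgb(i / n, lightness, saturation))
        for i, category in enumerate(categories)
    }


def quantitative_color(value, vmin, vmax, hue=0.6):
    """Quantitative: hold hue constant and darken the color as the value grows."""
    t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))      # clamp values outside the data range
    lightness = 0.9 - 0.6 * t      # light for small values, dark for large ones
    return to_hex(*colorsys.hls_to_rgb(hue, lightness, 0.7))


def highlight_colors(items, focus, focus_color="#d62728", muted="#bbbbbb"):
    """Highlighting: one saturated color for the focus item, gray for the rest."""
    return {item: (focus_color if item == focus else muted) for item in items}


print(qualitative_palette(["A", "B", "C"]))
print([quantitative_color(v, 0, 100) for v in (0, 50, 100)])
print(highlight_colors(["A", "B", "C"], focus="B"))
```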

Color for qualitative distinction

Color for quantitative distinction

Color for highlighting

A directory of visualizations

With the tools of aesthetics, scales, and coordinate systems, many visualizations become possible.

Exercise

Take an example data visualization task from your work and explicitly describe the choices you made with respect to the different aesthetic features of the figure, including position, shape, size, and color.

Part 3: How can data visualizations help us understand data?

Data visualization as exploratory analysis

  • Data visualization is not just about presenting results. It can also help us think about data and derive insights.

Objective

  • Describe how the process of visualizing data can support data exploration.

Stephen Few

  • The ideas in this presentation are derived from Chapter 4 of Now You See It (2009 edition)

Why use visualization to help with analysis?

  • Externalizes thought processes

  • Helps you reveal structure progressively

  • Helps you understand what is interesting to present to outside audiences.

Techniques of analysis with visualization

  • Compare

  • Sort

  • Add variables

  • Filter

  • Highlight

  • Zoom
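
Before any chart is drawn, most of these techniques amount to simple operations on the underlying records. The sketch below shows one possible reading of compare, sort, filter, zoom, and highlight as plain Python functions. The `SALES` records are invented purely for illustration, and the function names are hypothetical.

```python
# Hypothetical records, made up for this example only.
SALES = [
    {"region": "North", "year": 2022, "units": 120},
    {"region": "South", "year": 2022, "units": 95},
    {"region": "East", "year": 2022, "units": 143},
    {"region": "North", "year": 2023, "units": 131},
    {"region": "South", "year": 2023, "units": 88},
    {"region": "East", "year": 2023, "units": 150},
]


def compare(rows, key, group):
    """Compare: total `key` for each value of `group`."""
    totals = {}
    for row in rows:
        totals[row[group]] = totals.get(row[group], 0) + row[key]
    return totals


def sort_by(rows, key, descending=True):
    """Sort: order rows so that rank is visible."""
    return sorted(rows, key=lambda row: row[key], reverse=descending)


def filter_rows(rows, **conditions):
    """Filter: keep only rows matching every field=value condition."""
    return [row for row in rows if all(row[k] == v for k, v in conditions.items())]


def zoom(rows, key, low, high):
    """Zoom: keep only rows whose `key` falls inside a narrower range."""
    return [row for row in rows if low <= row[key] <= high]


def highlight(rows, predicate):
    """Highlight: keep every row but flag the ones to emphasize."""
    return [dict(row, highlighted=predicate(row)) for row in rows]


latest = filter_rows(SALES, year=2023)
print([row["region"] for row in sort_by(latest, "units")])
print(compare(SALES, "units", "region"))
```

Adding a variable corresponds to mapping one more field (here, `year` or the `highlighted` flag) to an aesthetic such as color or shape.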

Compare

  • Comparison and contrast are the “beating heart of analysis”

  • You can compare magnitudes or you can compare patterns

Comparing magnitudes - unordered

Nominal comparisons with no particular order

Comparing magnitudes - part-to-whole

Comparing Patterns

Sorting

Adding Variables

Filtering

Some plots have far too much going on

Filtering - two groups

Simple filtering can let you make focused comparisons

Zooming

Highlighting

With highlighting, much more useful

Foundational Patterns

  • Time series
  • Ranking and part-to-whole
  • Deviation
  • Correlation
  • Multivariate

Visualization as a way to structure your analysis?

  • I often tell students to think about their analytical mission as making a “cool chart”.

  • If the most impactful way to communicate findings is visual, then using visual tools to structure your analysis is a good practice.

Exercise

  • Consider a quantitative data set relevant to your professional or personal interests. What are some examples of relationships within data that you could discover by comparing, sorting, adding variables, zooming or the other techniques presented in this lesson?

Part 4: Making Attractive Visualizations

Edit, edit, edit

“The purpose of visualization is insight, not pictures.” - Ben Shneiderman

We should be ruthless about making visualizations that are informative in all aspects of their design.

Whose advice should we take?

  • Edward Tufte - a highly opinionated but influential voice
  • Stephen Few - practical business intelligence
  • Nathan Yau - modern data visualization
  • Alberto Cairo - The Truthful Art

What makes for a good graphic?

Tufte’s principles

  1. Graphical excellence is the well-designed presentation of interesting data - a matter of substance, of statistics, and design.
  2. Graphical excellence consists of complex ideas communicated with clarity, precision, and efficiency.
  3. Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.
  4. Graphical excellence is nearly always multivariate.
  5. Graphical excellence is about telling the truth with data.

Tufte’s Rules - The Data-Ink Ratio

“A large share of ink on a graphic should present data-information.”

Tufte’s Rules - The Lie Factor

\[ \text{Lie Factor} = \frac{\text{size of effect shown in graphic}}{\text{size of effect in data}} \]

A large lie factor means that the size of the effect in the graphic is much greater than the size of the effect in the data itself.
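
As a worked example of the formula, the sketch below measures each effect as a percent change and takes the ratio. The numbers are hypothetical: revenue rising from 100 to 125, drawn once with bars that start at zero and once with a y-axis truncated at 90. The function names are illustrative.

```python
def percent_change(start, end):
    """Size of an effect, expressed as a percent change from start to end."""
    return (end - start) / abs(start) * 100


def lie_factor(data_start, data_end, shown_start, shown_end):
    """Lie factor: size of effect shown in graphic / size of effect in data."""
    return percent_change(shown_start, shown_end) / percent_change(data_start, data_end)


# Bars start at zero, so bar heights equal the data values: 100 and 125.
honest = lie_factor(100, 125, shown_start=100, shown_end=125)

# The y-axis starts at 90, so the drawn bar heights are 10 and 35.
truncated = lie_factor(100, 125, shown_start=100 - 90, shown_end=125 - 90)

print(honest)     # 1.0
print(truncated)  # 10.0
```

A lie factor of 1 means the graphic shows the effect at its true size. In the truncated version the bar appears to grow by 250% while the data grew by 25%, so the graphic overstates the effect tenfold.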

Lie factor illustration

Consider a situation where you see 25% growth in revenue over a 5-year period. Here’s an honest plot, where the change in the data (25%) is the same as the change in the visualization (25%).

Actual revenue increase:  25 %

A dishonest chart

Overlapping Points

Don’t use too much color

(these data are fake)

A better plot

Checklist

  1. Labels
  2. Avoid 3D
  3. Choose clear professional typography
  4. Make sure text and labels are large enough to be readable, but not so big as to overstate the content. Maintain whitespace.
  5. Include citations and notes!
  6. Remember the basics:
    • Position
    • Shape
    • Size
    • Color

Make visualization part of your workflow

  • Iterate and experiment!

  • Develop good skills with a modern, powerful visualization tool (R, Python, Power BI, Tableau). Excel is OK but not as robust.

Exercise

Evaluate a visualization from your own work using the principles and pointers from this part of the course, like Tufte’s data-ink ratio or the best practices for uses of color. Describe some ways you might adjust your visualization following these recommendations. Alternatively, find a figure online and critique it in the same way.