x y
3.727273 5.621515
8.090909 8.744682
5.545455 6.816341
2.181818 3.911479
7.000000 8.272315
4.727273 5.469942
5.454545 6.238583
4.818182 3.956451
9.727273 13.062818
3.181818 3.607896
Data Visualization Masterclass
Part 1: Why visualize data?
Objective
- Explain how effective data visualizations enhance the clarity, impact, and integrity of scientific communication
Adapting to the modern data ecosystem
We are in a data-rich world.
But data is not the same thing as insight.
We need effective ways to communicate the information that is hidden in large data sets.
Advantages of visual cognition
Humans are much better at visual perception than almost anything else.
- Selective evolutionary pressure to build a strong vision system
Visualizations help us hold many values at once in our minds.
The limitations of tables
Visual displays take advantage of our skills by allowing us to understand many values at once.
Graphical advantages - scatter plot
Graphical advantages - adding trend lines
Avoiding statistical mistakes
Statistics “compresses” values into single numbers.
- Visualization allows you to see broader patterns and avoid confusion
- Anscombe’s Quartet is a famous example
row  x1  x2  x3  x4     y1    y2     y3     y4
1    10  10  10   8   8.04  9.14   7.46   6.58
2     8   8   8   8   6.95  8.14   6.77   5.76
3    13  13  13   8   7.58  8.74  12.74   7.71
4     9   9   9   8   8.81  8.77   7.11   8.84
5    11  11  11   8   8.33  9.26   7.81   8.47
6    14  14  14   8   9.96  8.10   8.84   7.04
7     6   6   6   8   7.24  6.13   6.08   5.25
8     4   4   4  19   4.26  3.10   5.39  12.50
Anscombe’s statistics
Summary statistics for each dataset:
Mean of x: 9
Mean of y: 7.5
Variance of x: 11
Variance of y: 4.13
Correlation: 0.816
Linear regression: y = 3 + 0.5 * x
R-squared: 0.667
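The near-identical statistics can be checked directly. The sketch below is one way to do it in plain Python, assuming the standard 11-row version of Anscombe's Quartet (the table above shows only the first 8 rows). The names `QUARTET` and `summarize` are illustrative and not part of the original slides.

```python
from math import sqrt

# Anscombe's Quartet, standard 11-row version. Datasets 1-3 share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "1": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "2": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "3": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
          [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the summary statistics listed on the slide for one dataset."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and cross-products around the means.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx                      # least-squares slope
    intercept = mean_y - slope * mean_x    # least-squares intercept
    r = sxy / sqrt(sxx * syy)              # Pearson correlation
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),            # sample variance
        "var_y": syy / (n - 1),
        "r": r,
        "slope": slope,
        "intercept": intercept,
        "r_squared": r * r,
    }


for name, (x, y) in QUARTET.items():
    stats = summarize(x, y)
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

All four datasets print nearly the same numbers, which is why the differences between them only show up once the points are plotted.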
Anscombe’s Quartet graphics
Advantage of modern statistical tools
- We can build visual insights directly into our workflow.
- Utah Division of Water Quality Dashboard
Further reading
Few, Stephen. 2009. Now You See It: Simple Visualization Techniques for Quantitative Analysis. Analytics Press.
- Chapter 1
Exercise
Identify a data source relevant to your interest.
Describe:
(1) A pattern in those data that could be more clearly displayed graphically.
(2) A setting where having a user interact with a data visualization would be helpful in communicating your work.
Part 2: The Mechanics of Visualization
Objective
Explain how the elements of a data visualization can represent the characteristics of a data set.
Why study the mechanics?
When you write up results in language, you need to have an understanding of structure and grammar to communicate clearly.
The same thing applies when communicating visually. We need an understanding of the “grammar” of graphics if we are to think critically about visualizations.
Claus Wilke
Mapping data onto aesthetics
Fundamentally, “data visualizations map data values into quantifiable features.”
- We are taking data values and turning them into “blobs of ink or colored pixels on screen.”
What is an aesthetic?
Aesthetics are “every aspect of a given graphical element.” These include:
- Position
- Shape
- Size
- Color
Illustration of aesthetics
What kinds of data are there?
- Continuous quantitative
- Discrete quantitative
- Categorical ordered
- Categorical unordered
- Dates and times
How do we connect data and aesthetics?
Scaling is the process of mapping data to aesthetics.
In other words, a scale changes aesthetic values in ways that correspond with changes in the underlying data values.
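As a rough illustration of scaling, the sketch below maps a continuous data value to a position and a category to a color. The function names, the pixel range, the category labels, and the hex colors are all hypothetical choices made for this example. Plotting libraries provide their own scale machinery.

```python
def linear_scale(domain, out_range):
    """Build a function mapping values in `domain` linearly onto `out_range`."""
    d0, d1 = domain
    r0, r1 = out_range
    if d0 == d1:
        raise ValueError("domain must span more than one value")

    def scale(value):
        # Fraction of the way through the data domain, reused in the output range.
        t = (value - d0) / (d1 - d0)
        return r0 + t * (r1 - r0)

    return scale


def ordinal_scale(categories, outputs):
    """Build a function mapping each category to one fixed aesthetic value."""
    lookup = dict(zip(categories, outputs))

    def scale(category):
        return lookup[category]

    return scale


# Position: data values from 0 to 10 drawn across pixels 50 to 450 (assumed layout).
x_position = linear_scale((0, 10), (50, 450))
# Color: two hypothetical groups mapped to two distinct hues.
group_color = ordinal_scale(["control", "treatment"], ["#1b9e77", "#d95f02"])

print(x_position(5))             # midpoint of the domain lands at pixel 250.0
print(group_color("treatment"))  # "#d95f02"
```

The same data value can be sent through different scales (position, size, color), which is what makes aesthetic choices a design decision rather than a property of the data.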
Choices about aesthetics can change perceptions
The same information, with different aesthetic choices
Packing information into a visualization
Coordinate systems and axes
Choices about axis scales
Color scales
Color has two attributes - hue and intensity
These attributes can be used for several kinds of visual distinction
Qualitative
Quantitative
Highlighting/contrast
Color for qualitative distinction
Color for quantitative distinction
Color for highlighting
A directory of visualizations
With the tools of aesthetics, scales, and coordinate systems, many visualizations become possible
Claus Wilke’s “Directory of Visualizations”
Exercise
Take an example data visualization task from your work and explicitly describe the choices you made with respect to the different aesthetic features of the figure, including position, shape, size, and color.
Part 3: How can data visualizations help us understand data?
Data visualization as exploratory analysis
- Data visualization is not just about presenting results. It can also help us think about data and derive insights.
Objective
- Describe how the process of visualizing data can support data exploration.
Stephen Few
- The ideas in this presentation are derived from Chapter 4 of Now You See It (2009 edition)
Why use visualization to help with analysis?
Externalizes thought processes
Helps you reveal structure progressively
Helps you understand what is interesting to present to outside audiences.
Techniques of analysis with visualization
Compare
Sort
Add variables
Filter
Highlight
Zoom
Compare
Comparison and contrast are the “beating heart of analysis”
You can compare magnitudes or you can compare patterns
Comparing magnitudes - unordered
Nominal comparisons with no particular order
Comparing magnitudes - part-to-whole
Comparing Patterns
Sorting
Adding Variables
Filtering
Some plots have far too much going on
Filtering - two groups
Simple filtering can let you make focused comparisons
Zooming
Highlighting
With highlighting, much more useful
Foundational Patterns
- Time series
- Ranking and part-to-whole
- Deviation
- Correlation
- Multivariate
Visualization as a way to structure your analysis?
I often tell students to think about their analytical mission as making a “cool chart”.
If the most impactful way to communicate findings is visual, then using visual tools to structure your analysis is a good practice.
Exercise
- Consider a quantitative data set relevant to your professional or personal interests. What are some examples of relationships within data that you could discover by comparing, sorting, adding variables, zooming or the other techniques presented in this lesson?
Part 4: Making Attractive Visualizations
Edit, edit, edit
“The purpose of visualization is insight, not pictures.” - Ben Shneiderman
We should be ruthless about making visualizations that are informative in all aspects of their design.
Whose advice should we take?
- Edward Tufte - a highly opinionated but influential voice
- Stephen Few - practical business intelligence
- Nathan Yau - modern data visualization
- Alberto Cairo - The Truthful Art
What makes for a good graphic?
Tufte’s principles
- Graphical excellence is the well-designed presentation of interesting data - a matter of substance, of statistics, and design.
- Graphical excellence consists of complex ideas communicated with clarity, precision, and efficiency.
- Graphical excellence is that which gives to the viewer the greater number of ideas in the shortest time with the least ink in the smallest space.
- Graphical excellence is nearly always multivariate.
- Graphical excellence is about telling the truth with data.
Tufte’s Rules - The Data-Ink Ratio
“A large share of ink on a graphic should present data-information.”
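Tufte expresses this principle as a ratio. Written in the notation this deck already uses for formulas, it is roughly:

\[ \text{Data-Ink Ratio} = \frac{\text{data-ink}}{\text{total ink used to print the graphic}} \]

A ratio close to 1 means that nearly every mark on the page carries data. Gridlines, borders, and decoration that could be erased without losing information pull the ratio down.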
Tufte’s Rules - The Lie Factor
\[ \text{Lie Factor} = \frac{\text{size of effect shown in graphic}}{\text{size of effect in data}} \]
A large lie factor means that the size of the effect in the graphic is much greater than the size of the effect in the data itself.
Lie factor illustration
Consider a situation where you see 25% growth in revenue over a 5-year period. Here’s an honest plot, where the change in the data (25%) is the same as the change in the visualization (25%).
Actual revenue increase: 25 %
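The lie factor for this scenario can be worked through numerically. The sketch below assumes hypothetical revenue values of 100 and 125 (a 25% increase) drawn as bars, and compares a zero-based axis with an axis truncated at 90. The revenue figures, the truncation point, and the function names are illustrative assumptions, not values from the original charts.

```python
def relative_change(start, end):
    """Size of an effect expressed as a fraction of the starting value."""
    return (end - start) / start


def lie_factor(shown_start, shown_end, data_start, data_end):
    """Tufte's lie factor: effect shown in the graphic / effect in the data."""
    return relative_change(shown_start, shown_end) / relative_change(data_start, data_end)


# Hypothetical revenue values with 25% growth.
revenue_start, revenue_end = 100.0, 125.0

# Honest chart: bars start at zero, so drawn heights equal the data values.
honest = lie_factor(revenue_start, revenue_end, revenue_start, revenue_end)

# Truncated chart: the axis starts at 90, so drawn bar heights are 10 and 35.
axis_min = 90.0
truncated = lie_factor(
    revenue_start - axis_min, revenue_end - axis_min, revenue_start, revenue_end
)

print(f"Honest lie factor: {honest:.1f}")        # 1.0
print(f"Truncated lie factor: {truncated:.1f}")  # 10.0
```

With the truncated axis, the bar grows by 250% while the data grow by 25%, so the graphic overstates the effect tenfold.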
A dishonest chart
Overlapping Points
Don’t use too much color
(these data are fake)
A better plot
Checklist
- Labels
- Avoid 3D
- Choose clear professional typography
- Make sure they are large enough to be viewable, but not so big as to overstate the content. Maintain whitespace.
- Include citations and notes!
- Remember the basics:
  - Position
  - Shape
  - Size
  - Color
Make visualization part of your workflow
Iterate and experiment!
Develop good skills with a modern, powerful piece of visualization software (R, Python, PowerBI, Tableau). (Excel is ok but not as robust).
Exercise
Evaluate a visualization from your own work using the principles and pointers from this part of the course, like Tufte’s data-ink ratio or the best practices for uses of color. Describe some ways you might adjust your visualization following these recommendations. Alternatively, find a figure online and critique it in the same way.