x y
3.727273 5.621515
8.090909 8.744682
5.545455 6.816341
2.181818 3.911479
7.000000 8.272315
4.727273 5.469942
5.454545 6.238583
4.818182 3.956451
9.727273 13.062818
3.181818 3.607896
We are in a data-rich world.
But data are not the same thing as insights.
We need effective ways to communicate the information that is hidden in large data sets.
Humans are much better at visual perception than at almost anything else.
Visualizations help us hold many values at once in our minds.
Visual displays take advantage of our skills by allowing us to understand many values at once.
Statistics “compresses” values into single numbers.
row  x1  x2  x3  x4    y1    y2     y3     y4
1    10  10  10   8  8.04  9.14   7.46   6.58
2     8   8   8   8  6.95  8.14   6.77   5.76
3    13  13  13   8  7.58  8.74  12.74   7.71
4     9   9   9   8  8.81  8.77   7.11   8.84
5    11  11  11   8  8.33  9.26   7.81   8.47
6    14  14  14   8  9.96  8.10   8.84   7.04
7     6   6   6   8  7.24  6.13   6.08   5.25
8     4   4   4  19  4.26  3.10   5.39  12.50
Summary statistics for each dataset:
Mean of x: 9
Mean of y: 7.5
Variance of x: 11
Variance of y: 4.13
Correlation: 0.816
Linear regression: y = 3 + 0.5 * x
R-squared: 0.667
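One way to check these numbers is to compute them directly. The sketch below assumes the table above is the standard Anscombe quartet. Only eight rows appear above, so rows 9 to 11 are filled in from the published dataset, and the helper name `summarize` is hypothetical.

```python
from math import sqrt

# Assumption: the table is the Anscombe quartet. Rows 9-11 come from the
# published dataset, not from the eight rows shown in the text.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    1: (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the summary statistics listed above for one dataset."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx                      # least-squares slope
    r = sxy / sqrt(sxx * syy)              # Pearson correlation
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),            # sample variance
        "var_y": syy / (n - 1),
        "corr": r,
        "intercept": mean_y - slope * mean_x,
        "slope": slope,
        "r_squared": r ** 2,
    }


summaries = {k: summarize(x, y) for k, (x, y) in quartet.items()}
for k, s in summaries.items():
    print(k, {name: round(value, 2) for name, value in s.items()})
```

All four datasets print nearly identical summaries, which is the point: the single numbers cannot distinguish datasets that look very different once plotted.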
Few, Stephen. 2009. Now You See It: Simple Visualization Techniques for Quantitative Analysis. Analytics Press.
Identify a data source relevant to your interest.
Describe:
(1) A pattern in those data that could be more clearly displayed graphically.
(2) A setting where having a user interact with a data visualization would be helpful in communicating your work.
Explain how the elements of a data visualization can represent the characteristics of a data set.
When you write up results in language, you need to have an understanding of structure and grammar to communicate clearly.
The same thing applies when communicating visually. We need an understanding of the “grammar” of graphics if we are to think critically about visualizations.
Fundamentally, all data visualizations “map data values into quantifiable features.”
Aesthetics describe “every aspect of a given graphical element.” These include position, shape, size, and color.
Scaling is the process of mapping data to aesthetics.
In other words, aesthetic values change in ways that correspond to changes in the underlying data values.
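As a minimal sketch of what a scale does, the function below maps a data value linearly onto an aesthetic range such as point size. The name `linear_scale`, the 0 to 100 data domain, and the 2 to 12 size range are all hypothetical choices for illustration. Plotting libraries provide their own scales.

```python
def linear_scale(value, domain, aesthetic_range):
    """Map a data value from `domain` onto `aesthetic_range` linearly."""
    d_min, d_max = domain
    a_min, a_max = aesthetic_range
    fraction = (value - d_min) / (d_max - d_min)  # position within the data domain
    return a_min + fraction * (a_max - a_min)     # same position within the aesthetic


# Hypothetical example: data values 0-100 mapped to point sizes 2-12.
sizes = [linear_scale(v, (0, 100), (2, 12)) for v in (0, 50, 100)]
print(sizes)  # [2.0, 7.0, 12.0]
```

Equal steps in the data produce equal steps in the aesthetic, which is what lets a reader decode values back from the figure.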
Color has two attributes: hue and intensity
These attributes can be used for several kinds of visual distinction
Qualitative
Quantitative
Highlighting/contrast
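The three uses of color listed above can be sketched with Python's standard `colorsys` module. This is an illustration under stated assumptions, not a recommended palette: "intensity" is approximated here by HLS lightness, the function names are hypothetical, and the specific hue, saturation, and hex values are arbitrary.

```python
import colorsys


def to_hex(r, g, b):
    """Convert RGB floats in [0, 1] to a hex color string."""
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


def qualitative_palette(n):
    """n evenly spaced hues at one intensity, for unordered categories."""
    return [to_hex(*colorsys.hls_to_rgb(i / n, 0.5, 0.6)) for i in range(n)]


def quantitative_palette(n, hue=0.6):
    """One hue running from light to dark, for ordered values (needs n >= 2)."""
    return [
        to_hex(*colorsys.hls_to_rgb(hue, 0.9 - 0.6 * i / (n - 1), 0.6))
        for i in range(n)
    ]


def highlight_palette(n, focus, accent="#d62728", muted="#bbbbbb"):
    """Muted grey everywhere except the one element to emphasize."""
    return [accent if i == focus else muted for i in range(n)]


print(qualitative_palette(4))
print(quantitative_palette(5))
print(highlight_palette(4, focus=2))
```

Varying hue signals "different kind", varying intensity signals "more or less", and a single accent color against grey directs attention.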
With the tools of aesthetics, scales, and coordinate systems, many visualizations become possible.
Claus Wilke’s “Directory of Visualizations”
Take an example data visualization task from your work and explicitly describe the choices you made with respect to the different aesthetic features of the figure, including position, shape, size, and color.
Externalizes thought processes
Helps you reveal structure progressively
Helps you understand what is interesting to present to outside audiences.
Compare
Sort
Add variables
Filter
Highlight
Zoom
Comparison and contrast are the “beating heart of analysis”
You can compare magnitudes or you can compare patterns.
Nominal comparisons with no particular order
Some plots have far too much going on
Simple filtering can let you make focused comparisons
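Sorting, filtering, and highlighting can be sketched in a few lines before any chart is drawn. The example below reuses the x/y table from the start of this section. The cutoff of 5 and the variable names are hypothetical.

```python
# The ten (x, y) observations from the table at the start of this section.
rows = [
    (3.727273, 5.621515), (8.090909, 8.744682), (5.545455, 6.816341),
    (2.181818, 3.911479), (7.000000, 8.272315), (4.727273, 5.469942),
    (5.454545, 6.238583), (4.818182, 3.956451), (9.727273, 13.062818),
    (3.181818, 3.607896),
]

# Sort: order observations by y, largest first, so ranks are easy to compare.
by_y = sorted(rows, key=lambda r: r[1], reverse=True)

# Filter: keep only observations with x above a hypothetical cutoff of 5.
focused = [r for r in rows if r[0] > 5]

# Highlight: pick out the single observation with the largest y.
top = max(rows, key=lambda r: r[1])

print(len(focused), "of", len(rows), "rows kept after filtering")
print("largest y:", top)
```

Each step narrows what the eventual figure has to show, which is how a plot with too much going on becomes a focused comparison.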
I often tell students to think about their analytical mission as making a “cool chart”.
If the most impactful way to communicate findings is visual, then using visual tools to structure your analysis is a good practice.
“The purpose of visualization is insight, not pictures.” - Ben Shneiderman
We should be ruthless about making visualizations that are informative in all aspects of their design.
Edward Tufte - a highly opinionated but influential voice
Stephen Few - practical business intelligence
Nathan Yau - modern data visualization
Alberto Cairo - The Truthful Art
Tufte’s principles
“A large share of ink on a graphic should present data-information.”
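This principle is usually summarized as Tufte's data-ink ratio. Written out as a formula (a paraphrase of his definition, not a quotation):

\[ \text{Data-ink ratio} = \frac{\text{data-ink}}{\text{total ink used to print the graphic}} \]

A ratio close to 1 means that almost everything drawn on the page carries information about the data.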
\[ \text{Lie Factor} = \frac{\text{size of effect shown in graphic}}{\text{size of effect in data}} \]
A large lie factor means that the size of the effect in the graphic is much greater than the size of the effect in the data itself.
Consider a situation where you see 25% growth in revenue over a five-year period. Here’s an honest plot, where the change in the data (25%) is the same as the change in the visualization (25%).
Actual revenue increase: 25%
(these data are fake)
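The lie factor for this example can be worked through numerically. The sketch below assumes Tufte's usual definition of effect size as relative change, (second - first) / first. The revenue figures of 100 and 125 and the truncated axis starting at 90 are hypothetical values chosen to match the 25% growth described above.

```python
def effect_size(first, second):
    """Relative change from the first value to the second."""
    return (second - first) / first


def lie_factor(shown_first, shown_second, data_first, data_second):
    """Effect size shown in the graphic divided by effect size in the data."""
    return effect_size(shown_first, shown_second) / effect_size(data_first, data_second)


# Hypothetical revenue figures with 25% growth.
start, end = 100.0, 125.0

# Honest plot: bars drawn from zero, so bar heights equal the data values.
honest = lie_factor(start, end, start, end)

# Hypothetical truncated axis starting at 90: bar heights become 10 and 35.
baseline = 90.0
truncated = lie_factor(start - baseline, end - baseline, start, end)

print(honest, truncated)  # 1.0 10.0
```

With the axis truncated at 90, the second bar is drawn 3.5 times as tall as the first, a 250% visual increase for a 25% change in the data, so the graphic overstates the effect tenfold.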
Iterate and experiment!
Develop good skills with a modern, powerful piece of visualization software (R, Python, Power BI, Tableau). Excel is OK but not as robust.
Evaluate a visualization from your own work using the principles and pointers from this part of the course, like Tufte’s data-ink ratio or the best practices for uses of color. Describe some ways you might adjust your visualization following these recommendations. Alternatively, find a figure online and critique it in the same way.