- Course orientation
- Statistics Review
- Terminology & Concepts
- History of Visualization
- Traits of Meaningful Data
- Effective Graphing
- Limitations in Data Visualization
Summer 2020
This course:
To do well in this course, you must
Data collected on students in a statistics class on a variety of variables:
| Student | Gender | Intro/Extra | … | Dread |
|---|---|---|---|---|
| 1 | male | extravert | \(\cdots\) | 3 |
| 2 | female | extravert | \(\cdots\) | 2 |
| 3 | female | introvert | \(\cdots\) | 4 |
| 4 | female | extravert | \(\cdots\) | 2 |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\ddots\) | \(\vdots\) |
| 86 | male | extravert | \(\cdots\) | 3 |
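As a minimal sketch, the four fully visible rows of this table could be entered in R as a data frame. The elided columns are left out, and the column name `IntroExtra` is an assumed stand-in for the "Intro/Extra" header, not a name taken from the original data set.

```r
# First four rows of the table above; the elided columns are omitted.
# IntroExtra is an assumed column name standing in for "Intro/Extra".
students <- data.frame(
  Student    = 1:4,
  Gender     = factor(c("male", "female", "female", "female")),
  IntroExtra = factor(c("extravert", "extravert", "introvert", "extravert")),
  Dread      = c(3, 2, 4, 2)
)

str(students)            # shows the type of each variable
table(students$Gender)   # counts for a categorical variable
mean(students$Dread)     # a numeric summary of the Dread scores
```

Storing the categorical variables as factors lets functions such as `table()` treat them as groups rather than as free text.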
sample = rnorm(mean=10, sd=3, n=100)
mean(sample)
## [1] 9.913674
sample = rnorm(mean=10, sd=3, n=100)
range(sample)
## [1] 3.555334 17.820299
quantile(sample)
##        0%       25%       50%       75%      100%
##  3.555334  8.118230  9.781637 11.764143 17.820299
IQR(sample)
## [1] 3.645912
var(sample); sd(sample)
## [1] 9.169551
## [1] 3.028127
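The summaries above are related to one another, and a short sketch can make that explicit. The seed and the variable names `x` and `q` are assumptions added here so the draw is reproducible; they are not part of the original slides, so the numbers will differ from the output shown above.

```r
set.seed(1)  # assumed seed, used only to make the draw reproducible
x <- rnorm(n = 100, mean = 10, sd = 3)

q <- quantile(x)  # 0%, 25%, 50%, 75%, 100%

# Each summary can be rebuilt from more basic pieces
all.equal(range(x), c(min(x), max(x)))                     # range = min and max
all.equal(median(x), unname(q["50%"]))                     # median = 50% quantile
all.equal(IQR(x), unname(q["75%"] - q["25%"]))             # IQR = Q3 - Q1
all.equal(var(x), sum((x - mean(x))^2) / (length(x) - 1))  # sample variance
all.equal(sd(x), sqrt(var(x)))                             # sd = sqrt(variance)
```

Each comparison returns `TRUE`. The same relationships can be checked by hand on the output above: 11.764143 - 8.118230 gives the reported IQR (up to rounding), and 3.028127 squared gives the reported variance.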

To produce a useful visualization, data must:
To produce the most meaningful results, data should also:
Important points are emphasized / annotated
Axes, symbols, and colors are described
Visual content clarifies (does not distract)
The graph is accurate, clear, and improves understanding
An “effective graph” communicates clearly
“Chart and graph design isn’t just about making statistical visualizations but also about explaining what the visualization shows.” (Nathan Yau)
A good visualization tells a story from data!
Patterns
Relationships \(\leadsto\) compare & contrast values
Anomalies
Focus / reduction of information
We will discuss all of this in more detail as we go, but for now …
A pie chart is bad at doing what it is designed to do: it is difficult to judge the relative size of the pie slices
Inefficient / inflexible use of space
Need many colors and high contrast to make wedges distinct
We’re much worse at estimating area than length — we’re especially bad at perceiving small differences in area
Pie charts make judging trends difficult
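To see the area-versus-length point in practice, the same values can be drawn both ways. This is a rough sketch using base R graphics; the `shares` vector is hypothetical data, chosen so that the categories are close in size.

```r
# Hypothetical shares (percent), deliberately close in size
shares <- c(A = 23, B = 21, C = 20, D = 19, E = 17)

op <- par(mfrow = c(1, 2))  # two panels side by side
pie(shares, main = "Pie: compare angles / areas")
barplot(shares, main = "Bar: compare lengths", ylab = "Share (%)")
par(op)                     # restore the previous graphics settings
```

In the bar panel, the ordering of the five categories can be read from bar heights against a common axis; in the pie panel, the same ordering has to be judged from wedge sizes.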
3D effects make graphs harder to read
Are we to judge length? Area? Volume?
Display looks 3D when the angular perspective is offset, which makes referencing values on the axes harder
Display looks 3D when shading is employed, which clutters the graph and makes it harder to read
When you have multiple variables to compare, there are several possibilities:
Plotting 2D values using a scatter plot is easy
If we have a categorical variable, we can sometimes use shading or color to add a third dimension
But if we have another numeric dimension, it’s challenging
Why not use point size (area)?
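One concrete difficulty with point size is the choice of scaling, in addition to our weak perception of area. The sketch below uses a hypothetical third variable `z` and compares two ways of sizing circles: radius proportional to `z`, and radius proportional to `sqrt(z)`.

```r
# Hypothetical third variable to encode as point size
z <- c(1, 2, 4)

# Radius proportional to z: area grows with z^2
area_linear <- pi * z^2
area_linear / area_linear[1]   # 1, 4, 16: differences are exaggerated

# Radius proportional to sqrt(z): area is proportional to z
radius_sqrt <- sqrt(z)
area_sqrt   <- pi * radius_sqrt^2
area_sqrt / area_sqrt[1]       # 1, 2, 4: areas match the data

# Draw the area-proportional version with base graphics
symbols(x = 1:3, y = c(1, 1, 1), circles = radius_sqrt,
        inches = 0.5, xlim = c(0, 4), xlab = "x", ylab = "y")
```

Even with area-proportional circles, the reader still has to compare areas, which is the weaker perceptual judgment.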
Items compared should have the same baseline for comparison
That baseline should not distort the true data values
Scaling should be set properly for comparison (apples-to-apples)
Scaling should not distort the true data values
Data should always be properly adjusted
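The baseline point can be made concrete with a small calculation. In this sketch the two values and the truncated baseline of 45 are hypothetical; it compares the true ratio of the values with the ratio of the bar heights a reader would actually see.

```r
# Hypothetical values for two groups
values <- c(A = 50, B = 55)
true_ratio <- values[["B"]] / values[["A"]]          # 1.1

# Bars drawn from a truncated baseline show only the part above it
baseline <- 45
drawn_height   <- values - baseline                  # 5 and 10
apparent_ratio <- drawn_height[["B"]] / drawn_height[["A"]]  # 2

op <- par(mfrow = c(1, 2))
barplot(values, ylim = c(0, 60), main = "Baseline at 0")
barplot(values, ylim = c(45, 60), xpd = FALSE, main = "Baseline at 45")
par(op)
```

B is 10% larger than A, but with the baseline at 45 its bar is drawn twice as tall.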
Humans are good at finding patterns
We’re not so good at judging the differences between things
We re-orient things in our mind without realizing it, focusing on where things are most similar
So when we judge the difference between two curves (e.g., inflation in the US over time vs. inflation in Europe), we minimize differences by finding the points where they are closest, regardless of orientation
This is particularly a problem when judging differences along a y-axis (which is how we typically plot things)
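A small sketch of this effect, using two made-up curves rather than real inflation data: `b` is `a` shifted up by a constant 10, so the vertical difference never changes, yet the curves tend to look closer together where they are steep. Plotting the difference directly removes the need for that judgment.

```r
x <- seq(0, 5, length.out = 200)
a <- exp(x)    # hypothetical curve
b <- a + 10    # the same curve shifted up by a constant 10

vertical_gap <- b - a   # the difference along the y-axis

op <- par(mfrow = c(1, 2))
plot(x, a, type = "l", ylim = range(a, b),
     ylab = "Value", main = "Two curves")
lines(x, b, lty = 2)
plot(x, vertical_gap, type = "l", ylim = c(0, 20),
     ylab = "b - a", main = "Difference")
par(op)
```

The right-hand panel is a flat line at 10, which is hard to see from the left-hand panel alone.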