Perception, Representation, & Handling Data

Summer 2020

Outline

Graphical Perception Tasks
Encoding Numeric Data, Proportions, Frequencies
Encoding Categorical Distinctions
Abstraction & Cognitive Considerations
Design Principles
Clutter
Gathering Data
Storing & Formatting Data

Graphical Perception Tasks

The Visual Perception System

Eyes sense light reflecting & refracting off of surfaces
A composite object is formed in our brain from various visual properties
We perceive the composite as a whole object, but we can still distinguish these properties
For example: 2D location, length, width, area, shape, color, orientation
We do not attend to everything we see
We do not have good working memory for what we see

Preattentive Processing

Humans have a limited set of visual properties that are detected very rapidly and accurately by our visual system before we are consciously aware of it

We easily detect the presence or absence of a target within a visual field
We easily detect the texture boundary between two groups of elements
We easily track an element with a unique visual feature in space and time

“Perception in Visualization”, Chris Healey, NC State

Preattentive Processing – Color is Easy

Find the red circle:

Preattentive Processing – Shape is Easy

Find the red circle:

Preattentive Processing – Conjunction is Harder

Find the red circle:

Preattentive Processing – Other Pre-attentive Cues

See: “Perception in Visualization”, Chris Healey, NC State

Postattentive Vision

What happens to our visual representation when we stop attending and look at something else?

Sustained attention to objects does not make visual search more efficient
Repeated visual searches are not more efficient
Once we see a pattern, we match the pattern even when it isn’t there
Moral: Do not make users search for things in your visualization, but draw attention to things explicitly

“Perception in Visualization”, Chris Healey, NC State

Familiar Patterns

Humans are pattern matchers …

Familiar Patterns

See the dolphin!

Familiar Patterns

We cannot easily “unsee” things …

Familiar Patterns

There is no spoon … er … dolphin!

Poor Working Memory

We rely on memory, but our working visual memory is very limited

Visual Encoding

Tasks to be done when visualizing information:
- Encode numeric data visually
- Encode categorical data visually
- Encode distinctions between different pieces of information
- Encode methods to associate data / distinctions to some context
Objective: To make the reader’s decoding process as easy and error-free as possible

Encoding Numeric & Categorical Data

Typically the categorical data we wish to encode is in fact numeric:
- Proportions: a continuous number between 0 and 1
- Frequencies: a discrete integer or count
So often the most fundamental encoding choices for numeric and categorical values to be plotted are the same
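
As a minimal sketch of why categorical data ends up numeric, the snippet below turns a list of category labels into frequencies (counts) and proportions (values between 0 and 1). The `responses` list is made-up illustrative data, not something from the lecture.

```python
from collections import Counter

# Hypothetical categorical observations (illustrative data only).
responses = ["yes", "no", "yes", "maybe", "yes", "no"]

# Frequencies: a discrete count for each category level.
frequencies = Counter(responses)

# Proportions: each count divided by the total, giving a number between 0 and 1.
total = sum(frequencies.values())
proportions = {level: count / total for level, count in frequencies.items()}

for level in frequencies:
    print(f"{level:>6}  count={frequencies[level]}  proportion={proportions[level]:.2f}")
```

Either derived quantity can then be plotted with the same encodings used for any other numeric value.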

Distinguishing Graphical Elements

Whether underlying variables are categorical or numeric, we often have multiple things on a plot. E.g.,
- Proportions from different levels of some variable
- Different categorical level values in a factor
- Different numeric variable values
- Different trend lines in a time series
Because the reader needs to discern these as different things, our encoding must distinguish these for the reader in some way

Encoding Numeric Data, Proportions, Frequencies

Common Ways to Visually Encode Numbers

In order from most easily perceived to least:
1. Position along a common scale, axis, and baseline
2. Position along non-aligned axes
3. Length, direction, angles of relative lines / slope
4. Area
5. Volume, curvature, arcs / angles within a shape
6. Color or shading

Cleveland & McGill (1984). “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods.” Journal of the American Statistical Association, 79(387), pp. 531–554.

Preattention & Quantitative Perception

Preattentive attributes make useful visual properties for encoding
Few considers even more properties and groups them:

Group	Attribute
Form	length, width, orientation, size, shape, curvature, enclosure, blur
Color	hue, intensity
Spatial Position	2D position, spatial grouping
Motion	direction

Precision of Preattentive Quantitative Perception

Visual attributes like length and 2D position encode quantitative data very precisely
Visual attributes like width, size, color intensity, and blur do not
Data visualizations should align more precise attributes with variables that have the greatest need of precision
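
One way to picture this alignment is as a simple matching problem: rank the variables by how much precision they need, rank the encodings by how precisely they are perceived, and pair them off in order. The sketch below assumes the ordering given in the ranked list above (abbreviated); the function name `assign_channels`, the variable names, and the numeric "need" scores are all hypothetical.

```python
# Encodings ordered from most to least precisely perceived,
# abbreviated from the ranked list earlier in these notes.
CHANNELS_BY_PRECISION = [
    "position on a common scale",
    "position on non-aligned axes",
    "length, direction, or slope",
    "area",
    "volume or curvature",
    "color or shading",
]


def assign_channels(precision_needs):
    """Pair each variable with an encoding, most demanding variable first.

    precision_needs maps a variable name to a score (higher = needs more
    precision). The scores are a hypothetical judgment made by the designer.
    """
    ranked = sorted(precision_needs, key=precision_needs.get, reverse=True)
    if len(ranked) > len(CHANNELS_BY_PRECISION):
        raise ValueError("more variables than available encodings")
    return dict(zip(ranked, CHANNELS_BY_PRECISION))


# Hypothetical numeric variables and how precisely each must be read.
needs = {"market_size": 1, "revenue": 3, "growth_rate": 2}
assignment = assign_channels(needs)

for variable, channel in assignment.items():
    print(f"{variable}: {channel}")
```

This is only a rule of thumb in code form; a real design would also weigh the data type of each variable and the other considerations in these notes.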

Position on a Common Scale

Position on Non-Aligned Axes

Length Comparisons

Area Comparisons

Angle / Curve Comparisons

Color Comparisons

Encoding Categorical Distinctions

Common Ways to Distinguish Visual Elements

Often we need to separate or distinguish discrete visual items using:

Distinct positions
Different colors or shading
Distinguishing symbols, words, or annotations
Other plot elements (e.g., line thickness)

Distinct Positions

Different Colors or Shading

Distinguishing Symbols, Words, or Annotations

Other Plot Elements

Color Considerations

Color can be used for a number of purposes:
- Encoding numeric values
- Distinguishing or highlighting visual elements
- Mood & effect
Perception of colors depends on context:
- Medium: paper, poster, screen, projected presentation
- Lighting: glare, contrast
- Audience: Colorblindness?
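
The contrast consideration can be made concrete with a number. The sketch below uses the contrast-ratio formula from the W3C Web Content Accessibility Guidelines (WCAG), which these notes do not mention; it is brought in here as an assumption about one reasonable way to compare a foreground and background color. It does not model colorblindness, only lightness contrast.

```python
def _linearize(channel_8bit):
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) tuple with 0-255 channels."""
    r, g, b = (_linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb_a, rgb_b):
    """Contrast ratio between two colors, from 1 (identical) to 21 (black on white)."""
    la, lb = relative_luminance(rgb_a), relative_luminance(rgb_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


white, black, light_gray = (255, 255, 255), (0, 0, 0), (211, 211, 211)
print(f"black on white:      {contrast_ratio(black, white):.1f}")
print(f"light gray on white: {contrast_ratio(light_gray, white):.1f}")
```

A low ratio suggests that an element may wash out on a projector or in glare, even if it looks fine on the author's own screen.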

Grouping: Gestalt Principles

Proximity: When objects are close together, we often perceive them as a group
Similarity: When objects share similar attributes (color, shape, etc.), we often perceive them as a group
Enclosure: When objects are surrounded by a boundary, we often perceive them as a group
Closure: Sometimes partially open structures can still be perceived as a grouping metaphor (e.g., “\(\left[ \ldots \right]\)”)
Connectivity: When you draw curves or lines through data elements, this is often perceived as creating a connection between them

Proximity

When objects are close together, we often perceive them as a group

Similarity

When objects share similar attributes (color, shape, etc.), we often perceive them as a group

Enclosure

When objects are surrounded by a boundary, we often perceive them as a group

Closure

Sometimes partially open structures can still be perceived as a grouping metaphor

Connectivity

When you draw curves or lines through data elements, this is often perceived as creating a connection between them

Abstraction & Cognitive Considerations

Lines Imply Connection

Lines imply connection … don’t use them if there isn’t any

Groups Imply Connection

Group things so that the most important things to compare are closest

Memory Limitations

Humans have different kinds of memory, stored differently and in different parts of the brain
- Long-term vs. working memory
- Verbal memory vs. visual memory
Working memory for visual information is very limited
Humans can retain roughly three chunks of information at a time
Visualizations can help “chunk” information together

Keeping It Together

We should avoid “fragmentation” (separating things that should be remembered together)
So place the things most related closest together – things that you most want the reader to remember together
Highlight and annotate things explicitly, if you want the reader to notice them

Building Blocks for Visualization

Visual Perception: properties, objects, etc. (align based on preattentive processes)
Quantitative Reasoning: relationships, comparisons, etc. (align based on postattentive processes like memory)
Visualization: patterns, trends, anomalies (align with higher level concepts)

Design Elements

Motivating Questions of Good Design

How can the perception of the design be influenced?
How can people learn from the design?
How can the design be more usable?
How can the design be more appealing?
How can we make better design decisions in general?

Three Critical Design Elements

Affordances
Accessibility
Aesthetics

Affordances in Product Design

In product design, affordances are aspects inherent to the design that make it obvious how to use the product

Affordances in Data Visualization Design

Highlight important content
Eliminate distractions
Create a clear hierarchy of information

Highlight the Important Content

Typesetting techniques in text (bold, case, color, font size, etc.), used very sparingly, can draw a reader’s eye to something, but too much is clutter
Color, shape, and size can be used to visually attract attention

Poor Highlighting Example

Demonstrating impact of a degree on marriage rate? Don’t distract!

Better Highlighting Example

Demonstrating impact of a degree on marriage rate? Pivot & highlight that!

Eliminate Distractions

Keep in mind that:

Not all data is equally important
When detail isn’t needed, summarize
Ask yourself: Would eliminating this change anything?
Push necessary but non-message-impacting items to the background

Create a Clear Hierarchy of Information

Think about the order in which you want the reader’s eyes to be drawn through the material
Use preattentive principles to guide the reader through the material in that way
Think in terms of grouping and hierarchy

Chunking Information

Combine units of information into a small number of chunks so that information is easier to process
Accommodates short-term memory limits
Things that are similar and familiar are easier to remember than the disparate and atypical
But an exception is easiest to recall: The thing that is not like the other

Cognitive Dissonance

Cognitive dissonance – the feeling that arises from the tendency to seek consistency, even when none exists
Three ways of reducing this:
- Remove dissonant cognitions (remove inconsistencies)
- Add consonant cognitions (create consistencies)
- Reduce the importance of the dissonant cognitions

Characteristics of Accessibility

Good design maximizes the number of people who find it to be …

Perceptible
Operable
Simple
Robust to error in use

Accessibility in Data Visualization

Don’t overcomplicate
- Make it legible and clean
- Use straightforward language
- Be willing to give up a bit of meaning to favor simple over complicated
Annotation/Text is helpful
- Annotating important points communicates directly
- If you highlight something, consider also annotating it
- Label your axes
Consider audience accessibility issues, as well (e.g., colorblindness)

Aesthetics in Design

Aesthetic-usability effect – People tend to perceive more aesthetic designs as easier to use than less aesthetic designs, whether they are or not
More aesthetic designs have a higher probability of being used
More aesthetic designs foster positive attitudes and make people more tolerant of other design problems
Positive relationships with a design result in more engagement and interaction

Aesthetics in Data Visualization

Be smart and judicious with the use of color
Pay attention to alignment
Make effective use of white space
Make the art serve the data, rather than act as a spectacle itself

Area vs. Edge Alignment

Aligning objects based on edges (e.g., spacing elements horizontally or vertically) works well when items are roughly uniform
High degrees of asymmetry between objects create a visual impression of misalignment when using the edges of objects
Alternatively, consider aligning so that objects use an equal amount of overall area: i.e., large objects use more space than small objects
This can all apply to text, as well as graphics

Color

Use color conservatively and limit the palette to what the eye can process at one (preattentive) glance – about five colors
Use color combinations that are adjacent on the color wheel when possible
Use warmer colors for foreground elements, cooler colors for background, and light gray for grouping/contrast
Use saturated colors when attracting attention
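
When a categorical variable has more levels than the roughly five colors suggested above, one common workaround is to keep the most frequent levels and lump the rest into a single catch-all level. The sketch below is one such approach; the function name `lump_categories`, the `"Other"` label, and the sample data are all hypothetical.

```python
from collections import Counter


def lump_categories(values, max_colors=5, other_label="Other"):
    """Keep the most frequent levels and relabel the rest as one catch-all level.

    The result has at most max_colors distinct levels, so each level can be
    given its own color without exceeding the palette limit.
    """
    counts = Counter(values)
    if len(counts) <= max_colors:
        return list(values)
    # Reserve one color for the catch-all level.
    keep = {level for level, _ in counts.most_common(max_colors - 1)}
    return [value if value in keep else other_label for value in values]


# Hypothetical data with six levels of decreasing frequency.
species = ["a"] * 6 + ["b"] * 5 + ["c"] * 4 + ["d"] * 3 + ["e"] * 2 + ["f"] * 1
lumped = lump_categories(species)

print(sorted(set(lumped)))
```

Whether lumping is acceptable depends on the message: if one of the rare levels is the point of the plot, highlight it instead and push the others to the background.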

Clutter

Emphasize Data

In general, good plots should:

Make the data stand out
Make it easy for the reader to decode the data
Avoid unnecessary “chart junk”
Use large enough plot elements to see and distinguish data

Cognitive Load

Cognitive Load – The effort used in working memory to accomplish a particular mental task. There are three types:

Intrinsic – Effort associated with a specific task or topic
Extraneous – The way information or tasks are presented to a learner (!!)
Germane – Work needed to create a permanent store of knowledge

High cognitive load situations can create physical effects in the body (e.g., loss of balance, increased heart rate, etc.)

Clutter

Clutter – visual elements that take up space but do not increase understanding

Makes our visuals appear more complicated than necessary
Can create an uncomfortable experience for the audience (increases cognitive load)
Runs the risk of audience losing focus or interest
To combat this: use our gestalt principles to create simple visual elements to organize your space

Visual Order

Good design is unnoticed by most audience members
Disorganized or haphazard layouts or visual organization lend to the impression of clutter
Emphasize key points (bold, color, highlight, etc.) and align visual elements
Horizontal alignments are most natural for most readers
Diagonal alignments, annotation lines, and other elements can appear “messy”
Rotated text is particularly hard to read for most readers

Contrast

Using contrast in visual plot elements (e.g., color) is important for focusing the reader’s attention
But when the contrast is unclear, or the purpose of the contrast is unclear, it becomes a form of visual clutter
The inherent purpose of visual contrast is pre-attentive: We want to focus the reader’s attention to something or away from something
So think carefully about color and shape use in terms of strategic use of contrast

Cluttered Scatterplot Example

Cleaner Trellised Barplot Example

Decluttering

Remove or deemphasize unnecessary borders
Remove or deemphasize unnecessary gridlines
Clean up axis labels and try to make them horizontal
Label data directly instead of using legends, where appropriate
Leverage consistent color schemes

Gathering Data

Collecting Your Own Data

Often, we are coordinating some experiment and collecting empirical results from our own experiments

We might be manually recording data based on observation
We might be using some software that produces data
We might have colleagues that give us empirical data
Sometimes, it’s all of these

When we are organizing our own data, we should choose to lay it out and store it in a way that makes later analysis and visualization as easy as possible

Garbage-In, Garbage-Out

Manually entered data is subject to periodic errors (e.g., typos)
Automatically populated data is subject to systemic error (e.g., miscalculating some statistic)
You should assume that any data you have (whether you recorded it or someone else did) is likely to contain errors
So check the data!
You should also keep track of the metadata:
- Where did the data come from?
- When was the data collected?
- What does the data represent?
- What are the observations, what are the variables?
- Etc.
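
A minimal sketch of "check the data" follows. It scans one numeric column for missing values, entries that are not numbers, and values outside a plausible range, then keeps a small metadata record next to the data. The column names, the range limits, the sample rows, and every metadata value are hypothetical placeholders.

```python
import csv
import io

# Hypothetical hand-entered measurements: one typo ("1o5.2"),
# one missing value, and one implausible value (1810).
raw = """subject,height_cm
s01,172.5
s02,168.0
s03,1o5.2
s04,
s05,1810
"""


def find_problems(rows, column, low, high):
    """Return (line_number, description) for each suspicious entry in a column."""
    problems = []
    for line_number, row in enumerate(rows, start=2):  # the header is line 1
        text = row[column].strip()
        if text == "":
            problems.append((line_number, "missing value"))
            continue
        try:
            value = float(text)
        except ValueError:
            problems.append((line_number, f"not a number: {text!r}"))
            continue
        if not low <= value <= high:
            problems.append((line_number, f"out of range: {value}"))
    return problems


rows = list(csv.DictReader(io.StringIO(raw)))
problems = find_problems(rows, "height_cm", low=50, high=250)
for line_number, description in problems:
    print(f"line {line_number}: {description}")

# Metadata kept alongside the data (all values are placeholders).
metadata = {
    "source": "manual entry (hypothetical)",
    "collected": "unknown",
    "represents": "heights of study subjects",
    "observations": "one row per subject",
    "variables": {"subject": "subject identifier", "height_cm": "height in centimetres"},
}
```

Checks like these catch only obvious errors; systematic errors in automatically generated data usually need a comparison against an independent source.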

Where to Find Data

Sometimes you must collect data not directly part of your own project

Most statistics tools have some built-in data sets
Many public institutions publish their data
Some field-specific sites provide a portal to relevant data sets
There are programs and libraries designed to help you access certain kinds of data
You can always email authors / PIs and ask for their data (the worst that can happen is that they say, “no”)

Public Data

Search the web:
- Google maintains a public data repository
- Wolfram Alpha maintains socioeconomic data sets
Check Public Institutions:

Example Field-Specific Sites

Example Programs / Libraries

API to access social media data: Flickr, Facebook, Twitter
Google API to access Google Docs & Maps, etc.
Freebase
GeoCommons
OpenStreetMap

Storing & Formatting Data

Common Data Storage Formats

Common data storage formats include:

Excel
Delimited text (e.g., CSV)
JSON
HTML / XML
Database (e.g., MySQL tables)
SAS, SPSS, or other stat-package format
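
To give a feel for how two of these formats relate, the sketch below reads a small CSV table and writes the same records as JSON using only Python's standard library. The city names and populations are invented for illustration.

```python
import csv
import io
import json

# Hypothetical delimited text: a header row followed by one record per line.
csv_text = "city,population\nSpringfield,30000\nShelbyville,25000\n"

# CSV stores everything as text, so numeric columns must be converted explicitly.
records = list(csv.DictReader(io.StringIO(csv_text)))
for record in records:
    record["population"] = int(record["population"])

# JSON keeps the distinction between strings and numbers.
json_text = json.dumps(records, indent=2)
print(json_text)

# Reading the JSON back yields the same records.
round_trip = json.loads(json_text)
```

The explicit `int` conversion is the main design point: delimited text carries no type information, so the reader of the file has to supply it.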

Data challenges

Dealing with data is usually the most challenging part.
As it is produced, data often

Comes from multiple sources or appears in multiple files / tables
Must be manually entered or automatically extracted and consolidated
Is missing values or has inaccuracies
Is not formatted conveniently for the visualization tool
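
As a small illustration of the first and third challenges, the sketch below consolidates two hypothetical sources that share a site identifier and reports which entries fail to match. The table contents and variable names are invented.

```python
# Source 1 (hypothetical): site metadata keyed by site identifier.
sites = {"A": "north", "B": "south", "C": "east"}

# Source 2 (hypothetical): readings that refer to sites by identifier.
readings = [("A", 3.2), ("B", 4.1), ("D", 2.7)]

merged = []     # readings successfully joined with their site metadata
unmatched = []  # readings whose site identifier is unknown

for site_id, value in readings:
    if site_id in sites:
        merged.append({"site": site_id, "region": sites[site_id], "value": value})
    else:
        unmatched.append(site_id)

# Sites that have metadata but no reading at all (missing values).
sites_with_readings = {site_id for site_id, _ in readings}
no_readings = sorted(set(sites) - sites_with_readings)

print("merged:", merged)
print("readings with unknown site:", unmatched)
print("sites with no reading:", no_readings)
```

Reporting both kinds of mismatch, rather than silently dropping them, makes it easier to decide whether a gap is a data-entry error or a genuinely missing observation.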

Data Scraping

Sometimes we can find the data we want on the web but:
- It’s not in one place
- It’s not in one file
- It’s posted on the web in HTML or other formats
We might tediously visit sites and record values in a format we can use
More often, we write code to automatically gather and reformat the data
Such automated processes are called data scraping
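
A minimal scraping sketch, using only Python's standard-library HTML parser, is shown below. To stay self-contained it parses an HTML string instead of fetching a live page; the page content, the `TableScraper` class name, and the column names are hypothetical. It assumes a single simple table with no nested tables.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text pieces of the cell being read, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page content standing in for something downloaded from the web.
page = """
<html><body><table>
<tr><th>Year</th><th>Count</th></tr>
<tr><td>2018</td><td>41</td></tr>
<tr><td>2019</td><td> 57 </td></tr>
</table></body></html>
"""

scraper = TableScraper()
scraper.feed(page)

# Treat the first row as the header and the rest as records.
header, *body = scraper.rows
records = [dict(zip(header, row)) for row in body]
print(records)
```

In practice the page would be downloaded first, and it is worth checking a site's terms of use before scraping it.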

Data Munging

Sometimes we can conveniently collect certain variables of data but:
- Those variables are not in a form you want
- You must perform some grouping, summarizing, or other computation to create new variables
Again, we might record the true data in one place and tediously manipulate and extract what we need to record in another place
But usually we have computers do all this for us
We sometimes call such automated processes data munging
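
The grouping and summarizing step might look like the sketch below, which reduces raw trial records to one summary row per condition. The trial data and field names are hypothetical, and only the standard library is used.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw observations: one record per trial.
trials = [
    {"condition": "control", "seconds": 12.0},
    {"condition": "control", "seconds": 14.0},
    {"condition": "treatment", "seconds": 9.0},
    {"condition": "treatment", "seconds": 11.0},
    {"condition": "treatment", "seconds": 10.0},
]

# Group: collect the measurements belonging to each condition.
grouped = defaultdict(list)
for trial in trials:
    grouped[trial["condition"]].append(trial["seconds"])

# Summarize: compute new variables (count and mean) for each group.
summary = [
    {"condition": condition, "n": len(values), "mean_seconds": mean(values)}
    for condition, values in grouped.items()
]

for row in summary:
    print(row)
```

The raw records stay untouched, so the summary can always be regenerated if the grouping or the statistic changes.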

Data Formatting Tools

There are some tools to help with managing data:
- OpenRefine (formerly Google Refine)
- Mr. Data Converter
Many of the tools we have can do some conversion
- Excel can export in CSV format
- R can read xlsx and SPSS formats
But most data scraping and munging is done in a programming language
- Python
- Perl
- Unix shell scripts
- R

A Typical Workflow

Collect data using an algorithm, device, or tool from your field
Perform scraping and munging tasks in Python
- Python has very powerful text manipulation tools built in
- Python has well-developed data manipulation libraries (e.g., pandas)
- Python’s data visualization packages are not great
Load data in R for statistical testing and visualization
- R is designed for data analysis
- ggplot2 is a particularly mature and well-known data visualization library
- But R’s file parsing routines are tedious and crufty
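
The hand-off between the two halves of this workflow could be sketched as follows: Python reshapes a "wide" table into "long" form (one row per observation) and writes it as CSV text for R to load. The long layout is chosen on the assumption that the data will be plotted with ggplot2, which generally works most naturally with one observation per row. The subjects, scores, and column names are hypothetical.

```python
import csv
import io

# Hypothetical wide table: one row per subject, one column per phase.
wide = [
    {"subject": "s01", "pre": 10, "post": 14},
    {"subject": "s02", "pre": 12, "post": 13},
]

# Reshape to long form: one row per (subject, phase) observation.
long_rows = [
    {"subject": row["subject"], "phase": phase, "score": row[phase]}
    for row in wide
    for phase in ("pre", "post")
]

# Write CSV text; an in-memory buffer stands in for a file on disk.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["subject", "phase", "score"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(long_rows)
csv_text = buffer.getvalue()
print(csv_text)
```

In practice the text would be written to a file, which R could then load with `read.csv` before testing and plotting.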