Summer 2020

Outline

  1. Graphical Perception Tasks
  2. Encoding Numeric Data, Proportions, Frequencies
  3. Encoding Categorical Distinctions
  4. Abstraction & Cognitive Considerations
  5. Design Principles
  6. Clutter
  7. Gathering Data
  8. Storing & Formatting Data

Graphical Perception Tasks

The Visual Perception System

  • Eyes sense light reflecting & refracting off of surfaces
  • A composite object is formed in our brain from various visual properties
  • We perceive the composite as a whole object, but we can still distinguish these properties
  • For example: 2D location, length, width, area, shape, color, orientation
  • We do not attend to everything we see
  • We do not have good working memory for what we see

Preattentive Processing

Humans have a limited set of visual properties that are detected very rapidly and accurately by our visual system before we are consciously aware of them

  • We easily detect the presence or absence of a target within a visual field
  • We easily detect texture boundary between two groups of elements
  • We easily track an element with a unique visual feature in space and time


“Perception in Visualization”, Chris Healey, NC State

Preattentive Processing – Color is Easy

Find the red circle:

Preattentive Processing – Shape is Easy

Find the red circle:

Preattentive Processing – Conjunction is Harder

Find the red circle:

Preattentive Processing – Other Pre-attentive Cues

Postattentive Vision

What happens to our visual representation when we stop attending and look at something else?

  • Sustained attention to objects does not make visual search more efficient
  • Repeated visual searches are not more efficient
  • Once we see a pattern, we match the pattern even when it isn’t there
  • Moral: Do not make users search for things in your visualization, but draw attention to things explicitly


“Perception in Visualization”, Chris Healey, NC State

Familiar Patterns

Humans are pattern matchers …

Familiar Patterns

See the dolphin!

Familiar Patterns

We cannot easily “unsee” things …

Familiar Patterns

There is no spoon … er … dolphin!

Poor Working Memory

We rely on memory, but our working visual memory is very limited

Visual Encoding

  • Tasks to be done when visualizing information:
    • Encode numeric data visually
    • Encode categorical data visually
    • Encode distinctions between different pieces of information
    • Encode methods to associate data / distinctions to some context
  • Objective: To make the reader’s decoding process as easy and error-free as possible

Encoding Numeric & Categorical Data

  • Typically the categorical data we wish to encode is in fact numeric:
    • Proportions: a continuous number between 0 and 1
    • Frequencies: a discrete integer or count
  • So often the most fundamental encoding choices for numeric and categorical values to be plotted are the same
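
As a minimal sketch of this point, the snippet below turns a list of categorical observations into frequencies and proportions using only the Python standard library. The `observations` list is illustrative data, not taken from the slides.

```python
from collections import Counter

# Hypothetical categorical observations (illustrative data only).
observations = ["red", "blue", "red", "green", "red", "blue"]

# Frequencies: a discrete count for each category level.
frequencies = Counter(observations)

# Proportions: each count divided by the total, a number between 0 and 1.
total = sum(frequencies.values())
proportions = {level: count / total for level, count in frequencies.items()}

for level in sorted(frequencies):
    print(f"{level}: count={frequencies[level]}, proportion={proportions[level]:.2f}")
```

Either result is a plain number per level, so it can be plotted with the same encodings used for any other numeric variable.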

Distinguishing Graphical Elements

  • Whether underlying variables are categorical or numeric, we often have multiple things on a plot. E.g.,
    • Proportions from different levels of some variable
    • Different categorical level values in a factor
    • Different numeric variable values
    • Different trend lines in a time series
  • Because the reader needs to discern these as different things, our encoding must distinguish these for the reader in some way

Encoding Numeric Data, Proportions, Frequencies

Common Ways to Visually Encode Numbers

Preattention & Quantitative Perception

  • Preattentive attributes make useful visual properties for encoding
  • Few considers even more properties and groups them:

    Group             Attributes
    Form              length, width, orientation, size, shape, curvature, enclosure, blur
    Color             hue, intensity
    Spatial Position  2D position, spatial grouping
    Motion            direction

Precision of Preattentive Quantitative Perception

  • Visual attributes like length and 2D position encode quantitative data very precisely
  • Visual attributes like width, size, color intensity, and blur do not
  • Data visualizations should align more precise attributes with variables that have the greatest need of precision
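
One way to picture this alignment is a small helper that hands out the precise attributes first. This is only a sketch: the two tiers come from the bullets above, but the order within each tier, the variable names, and the precision scores are assumptions made for illustration.

```python
# Visual attributes grouped by how precisely they encode quantities,
# following the two tiers named above. The order within each tier is
# an assumption made for this sketch.
PRECISE = ["2D position", "length"]
IMPRECISE = ["width", "size", "color intensity", "blur"]


def assign_attributes(precision_needs):
    """Pair variables with attributes, most precision-critical first.

    precision_needs maps a variable name to a hypothetical score:
    a higher score means the reader must decode it more exactly.
    """
    ranked = sorted(precision_needs, key=precision_needs.get, reverse=True)
    attributes = PRECISE + IMPRECISE
    if len(ranked) > len(attributes):
        raise ValueError("more variables than available visual attributes")
    return dict(zip(ranked, attributes))


# Hypothetical variables and scores, for illustration only.
needs = {"region_weight": 1, "revenue": 3, "growth_rate": 2}
assignment = assign_attributes(needs)
print(assignment)
```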

Position on a Common Scale

Position on Non-Aligned Axes

Length Comparisons

Length Comparisons

Area Comparisons

Angle / Curve Comparisons

Color Comparisons

Encoding Categorical Distinctions

Common Ways to Distinguish Visual Elements

Often we need to separate or distinguish discrete visual items using:

  • Distinct positions
  • Different colors or shading
  • Distinguishing symbols, words, or annotations
  • Other plot elements (e.g., line thickness)

Distinct Positions

Different Colors or Shading

Distinguishing Symbols, Words, or Annotations

Other Plot Elements

Color Considerations

  • Color can be used for a number of purposes:
    • Encoding numeric values
    • Distinguishing or highlighting visual elements
    • Mood & effect
  • Perception of colors depends on context:
    • Medium: paper, poster, screen, projected presentation
    • Lighting: glare, contrast
    • Audience: Colorblindness?

Grouping: Gestalt Principles

  1. Proximity: When objects are close together, we often perceive them as a group

  2. Similarity: When objects share similar attributes (color, shape, etc.), we often perceive them as a group

  3. Enclosure: When objects are surrounded by a boundary, we often perceive them as a group

  4. Closure: Sometimes partially open structures can still be perceived as a grouping metaphor (e.g., “\(\left[ \ldots \right]\)”)

  5. Connectivity: When you draw curves or lines through data elements, this is often perceived as creating a connection between them

Proximity

When objects are close together, we often perceive them as a group

Similarity

When objects share similar attributes (color, shape, etc.), we often perceive them as a group

Enclosure

When objects are surrounded by a boundary, we often perceive them as a group

Closure

Sometimes partially open structures can still be perceived as a grouping metaphor

Connectivity

When you draw curves or lines through data elements, this is often perceived as creating a connection between them

Abstraction & Cognitive Considerations

Lines Imply Connection

Lines imply connection … don’t use them if there isn’t any

Groups Imply Connection

Group things so that the most important things to compare are closest

Memory Limitations

  • Humans have different kinds of memory, stored differently and in different parts of the brain
    • Long-term vs. working memory
    • Verbal memory vs. visual memory
  • Working memory for visual information is very limited
  • Humans can retain roughly three chunks of information at a time
  • Visualizations can help “chunk” information together

Keeping It Together

  • We should avoid “fragmentation” (separating things that should be remembered together)
  • So place the things most related closest together – things that you most want the reader to remember together
  • Highlight and annotate things explicitly, if you want the reader to notice them

Building Blocks for Visualization

  1. Visual Perception: properties, objects, etc. (align based on preattentive processes)
  2. Quantitative Reasoning: relationships, comparisons, etc. (align based on postattentive processes like memory)
  3. Visualization: patterns, trends, anomalies (align with higher level concepts)

Design Elements

Motivating Questions of Good Design

  • How can the perception of the design be influenced?

  • How can people learn from the design?

  • How can the design be more usable?

  • How can the design be more appealing?

  • How can we make better design decisions in general?

Three Critical Design Elements

  1. Affordances

  2. Accessibility

  3. Aesthetics

Affordances in Product Design

  • In product design, affordances are aspects inherent to the design that make it obvious how to use the product

Affordances in Data Visualization Design

  1. Highlight important content

  2. Eliminate distractions

  3. Create a clear hierarchy of information

Highlight the Important Content

  • Typesetting techniques in text (bold, case, color, font size, etc.) can draw a reader’s eye to something when used very sparingly, but too much is clutter

  • Color, shape, and size can be used to visually attract attention

Poor Highlighting Example

Demonstrating impact of a degree on marriage rate? Don’t distract!

Better Highlighting Example

Demonstrating impact of a degree on marriage rate? Pivot & highlight that!

Eliminate Distractions

Keep in mind that:

  • Not all data is equally important

  • When detail isn’t needed, summarize

  • Ask yourself: Would eliminating this change anything?

  • Push necessary but non-message-impacting items to the background

Create a Clear Hierarchy of Information

  • Think about the order in which you want the reader’s eyes to be drawn through the material

  • Use preattentive principles to guide the reader through the material in that way

  • Think in terms of grouping and hierarchy

Chunking Information

  • Combine units of information into a small number of chunks so that information is easier to process

  • Accommodates short-term memory limits

  • Things that are similar and familiar are easier to remember than the disparate and atypical

  • But an exception is easiest to recall: The thing that is not like the others

Cognitive Dissonance

  • Cognitive dissonance – feeling that arises from the tendency to seek consistency, even when such doesn’t exist

  • Three ways of reducing this:
    • Remove dissonant cognitions (remove inconsistencies)
    • Add consonant cognitions (create consistencies)
    • Reduce the importance of the dissonant cognitions

Characteristics of Accessibility

Good design maximizes the number of people who find it to be …

  • Perceptible

  • Operable

  • Simple

  • Robust to error in use

Accessibility in Data Visualization

  • Don’t overcomplicate
    • Make it legible and clean
    • Use straightforward language
    • Be willing to give up a bit of meaning to favor simple over complicated
  • Annotation/Text is helpful
    • Annotating important points communicates directly
    • If you highlight something, consider also annotating it
    • Label your axes
  • Consider audience accessibility issues, as well (e.g., colorblindness)

Aesthetics in Design

  • Aesthetic-usability effect – People tend to perceive more aesthetic designs as easier to use than less aesthetic designs, whether they are or not

  • More aesthetic designs have a higher probability of being used

  • More aesthetic designs foster positive attitudes and make people more tolerant of other design problems

  • Positive relationships with a design result in more engagement and interaction

Aesthetics in Data Visualization

  • Be smart and judicious with the use of color

  • Pay attention to alignment

  • Make effective use of white space

  • Make the art serve the data, rather than act as a spectacle itself

Area vs. Edge Alignment

  • Aligning objects based on edges (e.g., spacing elements horizontally or vertically) works well when items are roughly uniform

  • High degrees of asymmetry between objects creates a visual impression of misalignment when using the edges of objects

  • Alternatively, consider aligning so that objects use an equal amount of overall area: i.e., large objects use more space than small objects

  • This can all apply to text, as well as graphics

Color

  • Use color conservatively and limit the palette to what the eye can process at one (preattentive) glance – about five colors

  • Use color combinations that are adjacent on a color wheel when possible

  • Use warmer colors for foreground elements, cooler colors for background, and light gray for grouping/contrast

  • Use saturated colors when attracting attention
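
When a categorical variable has more levels than the palette should hold, one common workaround is to keep the most frequent levels and lump the rest together. The sketch below assumes a cap of five colors (from the guideline above); the function name `lump_categories`, the "Other" label, and the sample labels are all hypothetical.

```python
from collections import Counter

MAX_COLORS = 5  # the "about five colors" guideline above


def lump_categories(labels, max_colors=MAX_COLORS, other="Other"):
    """Keep the most frequent levels and relabel the rest as `other`."""
    counts = Counter(labels)
    if len(counts) <= max_colors:
        return list(labels)
    # Reserve one color for the lumped "Other" group.
    keep = {level for level, _ in counts.most_common(max_colors - 1)}
    return [label if label in keep else other for label in labels]


# Hypothetical labels, for illustration only.
labels = ["a"] * 5 + ["b"] * 4 + ["c"] * 3 + ["d"] * 2 + ["e", "f", "g"]
lumped = lump_categories(labels)
print(sorted(set(lumped)))
```

Lumping trades detail for legibility, so it fits best when the small categories are not the point of the plot.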

Clutter

Emphasize Data

In general, good plots should:

  • Make the data stand out
  • Make it easy for the reader to decode the data
  • Avoid unnecessary “chart junk”
  • Use large enough plot elements to see and distinguish data

Cognitive Load

Cognitive Load – The effort used in working memory to accomplish a particular mental task. There are three types:

  • Intrinsic – Effort associated with a specific task or topic

  • Extraneous – Effort imposed by the way information or tasks are presented to a learner (!!)

  • Germane – Work needed to create a permanent store of knowledge

High cognitive load situations can create physical effects in the body (e.g., loss of balance, increased heart rate, etc.)

Clutter

Clutter – visual elements that take up space but do not increase understanding

  • Makes our visuals appear more complicated than necessary

  • Can create an uncomfortable experience for the audience (increases cognitive load)

  • Runs the risk of audience losing focus or interest

  • To combat this: use the Gestalt principles to create simple visual elements that organize your space

Visual Order

  • Good design is unnoticed by most audience members

  • A disorganized or haphazard layout or visual organization contributes to the impression of clutter

  • Emphasize key points (bold, color, highlight, etc.) and align visual elements

  • Horizontal alignments are most natural for most readers

  • Diagonal alignments, annotation lines, and other elements can appear “messy”

  • Rotated text is particularly hard to read for most readers

Contrast

  • Using contrast in visual plot elements (e.g., color) is important for focusing the reader’s attention

  • But when the contrast is unclear, or the purpose of the contrast is unclear, it becomes a form of visual clutter

  • The inherent purpose of visual contrast is pre-attentive: We want to direct the reader’s attention toward something or away from something

  • So think carefully about color and shape use in terms of strategic use of contrast

Cluttered Scatterplot Example

Cleaner Trellised Barplot Example

Decluttering

  1. Remove or deemphasize unnecessary borders

  2. Remove or deemphasize unnecessary gridlines

  3. Clean up axis labels and try to make them horizontal

  4. Label data directly instead of using legends, where appropriate

  5. Leverage consistent color schemes
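
Several of these steps can be captured once as plotting defaults instead of being repeated per figure. The fragment below is a sketch that assumes matplotlib as the plotting library (the slides do not name one); the keys are matplotlib rcParams names, while the specific values and the `PALETTE` colors are illustrative choices. Steps 3 and 4 (horizontal labels, direct labeling) are per-plot decisions and are not covered here.

```python
# Settings expressed as a plain dict so this fragment runs without
# matplotlib installed. With matplotlib available, it could be applied via:
#     import matplotlib as mpl
#     mpl.rcParams.update(DECLUTTER)
DECLUTTER = {
    # 1. Remove unnecessary borders: drop the top and right axes spines.
    "axes.spines.top": False,
    "axes.spines.right": False,
    # 2. Deemphasize gridlines: keep them, but light and thin.
    "axes.grid": True,
    "grid.color": "0.9",
    "grid.linewidth": 0.5,
    # Legends, when still needed, lose their frame.
    "legend.frameon": False,
}

# 5. A single palette reused across figures (hypothetical colors),
# passed explicitly to plotting calls.
PALETTE = ["#1b9e77", "#d95f02", "#7570b3"]

for key, value in DECLUTTER.items():
    print(f"{key} = {value}")
```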

Gathering Data

Collecting Your Own Data

Often, we are coordinating some experiment and collecting empirical results from our own experiments

  • We might be manually recording data based on observation
  • We might be using some software that produces data
  • We might have colleagues that give us empirical data
  • Sometimes, it’s all of these

When we are organizing our own data, we should choose to lay it out and store it in a way that makes later analysis and visualization as easy as possible

Garbage-In, Garbage-Out

  • Manually entered data is subject to periodic errors (e.g., typos)
  • Automatically populated data is subject to systemic error (e.g., miscalculating some statistic)
  • You should assume that any data you have (whether you or someone else recorded it) is likely to contain errors
  • So check the data!
  • You should also keep track of the metadata:
    • Where did the data come from?
    • When was the data collected?
    • What does the data represent?
    • What are the observations, what are the variables?
    • Etc.
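
"Check the data" can start with a few mechanical tests run over every record. The sketch below is one possible shape for such a check; the field names, the set of valid groups, and the [0, 1] score range are assumptions invented for the example.

```python
# Hypothetical hand-entered records; the field names and valid ranges are
# assumptions for this sketch, not part of the original material.
records = [
    {"subject": "s01", "group": "control", "score": "0.82"},
    {"subject": "s02", "group": "contorl", "score": "0.79"},   # typo in group
    {"subject": "s03", "group": "treatment", "score": ""},     # missing value
    {"subject": "s04", "group": "treatment", "score": "7.9"},  # out of range
]

VALID_GROUPS = {"control", "treatment"}


def check_record(record):
    """Return a list of human-readable problems found in one record."""
    problems = []
    if record["group"] not in VALID_GROUPS:
        problems.append(f"unknown group {record['group']!r}")
    if record["score"] == "":
        problems.append("missing score")
    else:
        try:
            score = float(record["score"])
        except ValueError:
            problems.append(f"non-numeric score {record['score']!r}")
        else:
            if not 0.0 <= score <= 1.0:
                problems.append(f"score {score} outside [0, 1]")
    return problems


report = {r["subject"]: check_record(r) for r in records}
for subject, problems in report.items():
    if problems:
        print(subject, "->", "; ".join(problems))
```

Checks like these catch typos and impossible values, but not plausible-looking wrong numbers, so they complement rather than replace a manual look at the data.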

Where to Find Data

Sometimes you must collect data not directly part of your own project

  • Most statistics tools have some built-in data sets
  • Many public institutions publish their data
  • Some field-specific sites provide a portal to relevant data sets
  • There are programs and libraries designed to help you access certain kinds of data
  • You can always email authors / PIs and ask for their data (the worst that can happen is that they say, “no”)

Public Data

Example Field-Specific Sites

Example Programs / Libraries

Storing & Formatting Data

Common Data Storage Formats

Common data storage formats include:

  • Excel
  • Delimited text (e.g., CSV)
  • JSON
  • HTML / XML
  • Database (e.g., MySQL tables)
  • SAS, SPSS, or other stat-package format
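
To make the difference between two of these formats concrete, the sketch below reads the same small table from CSV text and from JSON text using Python's standard `csv` and `json` modules. The contents are hypothetical, and in-memory strings stand in for files.

```python
import csv
import io
import json

# Hypothetical contents of a CSV file and a JSON file holding the same data.
csv_text = "name,count\nalpha,3\nbeta,5\n"
json_text = '[{"name": "alpha", "count": 3}, {"name": "beta", "count": 5}]'

# csv.DictReader yields one dict per row; every value arrives as a string,
# so numeric columns must be converted explicitly.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))
for row in csv_rows:
    row["count"] = int(row["count"])

# json.loads preserves the numeric types written in the file.
json_rows = json.loads(json_text)

print(csv_rows)
print(json_rows)
```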

Data challenges

Dealing with data is usually the most challenging part.
As it is produced, data often

  • Comes from multiple sources or appears in multiple files / tables
  • Must be manually entered or automatically extracted and consolidated
  • Is missing values or has inaccuracies
  • Is not formatted conveniently for the visualization tool

Data Scraping

  • Sometimes we can find the data we want on the web but:
    • It’s not in one place
    • It’s not in one file
    • It’s posted on the web in HTML or other formats
  • We might tediously visit sites and record values in a format we can use
  • More often, we write code to automatically gather and reformat the data
  • Such automated processes are called data scraping
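
A minimal scraping sketch, using only Python's standard `html.parser` module, pulls the cells out of an HTML table and turns them into records. The page fragment, the class name `TableScraper`, and the values are hypothetical; a real scraper would first download the page and would need to handle messier markup.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


# Hypothetical page fragment, for illustration only.
page = """
<table>
  <tr><th>city</th><th>population</th></tr>
  <tr><td>Springfield</td><td>30720</td></tr>
  <tr><td>Shelbyville</td><td>28150</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(page)
header, *body = scraper.rows
records = [dict(zip(header, row)) for row in body]
print(records)
```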

Data Munging

  • Sometimes we can conveniently collect certain variables of data but:
    • Those variables are not in a form you want
    • You must perform some grouping, summarizing, or other computation to create new variables
  • Again, we might record the true data in one place and tediously manipulate and extract what we need to record in another place
  • But usually we have computers do all this for us
  • We sometimes call such automated processes data munging
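
As a small illustration of munging, the sketch below groups raw measurements by condition and computes new summary variables for each group. The data and variable names are hypothetical, and only the standard library is used.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw observations: (condition, trial, measurement).
raw = [
    ("A", 1, 2.0), ("A", 2, 4.0), ("A", 3, 6.0),
    ("B", 1, 1.0), ("B", 2, 3.0),
]

# Grouping: collect the measurements for each condition.
groups = defaultdict(list)
for condition, _trial, value in raw:
    groups[condition].append(value)

# Summarizing: compute new variables (n and mean) for each group.
summary = {
    condition: {"n": len(values), "mean": mean(values)}
    for condition, values in groups.items()
}
print(summary)
```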

Data Formatting Tools

  • There are some tools to help with managing data:
  • Many of the tools we have can do some conversion
    • Excel can export in CSV format
    • R can read xlsx and SPSS formats
  • But most data scraping and munging is done in a programming language
    • Python
    • Perl
    • Unix shell scripts
    • R

A Typical Workflow

  1. Collect data using an algorithm, device, or tool from your field
  2. Perform scraping and munging tasks in Python
    • Python has very powerful text manipulation tools built in
    • Python has well-developed data manipulation libraries (e.g., pandas)
    • Python’s data visualization packages are not great
  3. Load data in R for statistical testing and visualization
    • R is designed for data analysis
    • ggplot2 is a particularly mature and well-known data visualization library
    • But R’s file parsing routines are tedious and crufty
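
The hand-off between steps 2 and 3 is usually just a clean CSV file. The sketch below shows the Python side under stated assumptions: the records, the column names, and the file name `summary.csv` mentioned in the comment are hypothetical, and an in-memory buffer stands in for the file.

```python
import csv
import io

# Hypothetical munged records produced by step 2.
rows = [
    {"condition": "A", "n": 3, "mean": 4.0},
    {"condition": "B", "n": 2, "mean": 2.0},
]

# Write to an in-memory buffer here; swapping in
# open("summary.csv", "w", newline="") would produce a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["condition", "n", "mean"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

On the R side, a file written this way can be loaded with `read.csv("summary.csv")` and passed on to ggplot2.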