Spring 2025

Graphical Perception Tasks

The Visual Perception System

  • Eyes sense light reflecting & refracting off of surfaces
  • A composite object is formed in our brain from various visual properties
  • We perceive the composite as a whole object, but we can still distinguish these properties
  • For example: 2D location, length, width, area, shape, color, orientation
  • We do not attend to everything we see
  • We do not have good working memory for what we see

Preattentive Processing

Humans have a limited set of visual properties that our visual system detects very rapidly and accurately, before we are consciously aware of them

  • We easily detect the presence or absence of a target within a visual field
  • We easily detect a texture boundary between two groups of elements
  • We easily track an element with a unique visual feature in space and time


“Perception in Visualization”, Chris Healey, NC State

Preattentive Processing – Color is Easy

Find the red circle:

Preattentive Processing – Shape is Easy

Find the red circle:

Preattentive Processing – Conjunction is Harder

Find the red circle:

Preattentive Processing – Other Pre-attentive Cues

Postattentive Vision

What happens to our visual representation when we stop attending and look at something else?

  • Sustained attention to objects does not make visual search more efficient
  • Repeated visual searches are not more efficient
  • Once we see a pattern, we match the pattern even when it isn’t there
  • Moral: Do not make users search for things in your visualization; draw attention to them explicitly


“Perception in Visualization”, Chris Healey, NC State

Familiar Patterns

Humans are pattern matchers …

Familiar Patterns

See the dolphin!

Familiar Patterns

We cannot easily “unsee” things …

Familiar Patterns

There is no spoon … er … dolphin!

Poor Working Memory

We rely on memory, but our working visual memory is very limited

Visual Encoding

  • Tasks to be done when visualizing information:
    • Encode numeric data visually
    • Encode categorical data visually
    • Encode distinctions between different pieces of information
    • Encode methods to associate data / distinctions to some context
  • Objective: To make the reader’s decoding process as easy and error-free as possible

Encoding Numeric & Categorical Data

  • Typically the categorical data we wish to encode is in fact numeric:
    • Proportions: a continuous number between 0 and 1
    • Frequencies: a discrete integer or count
  • So often the most fundamental encoding choices for numeric and categorical values to be plotted are the same
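
A minimal sketch of the point above, assuming a small made-up list of categorical observations (the eye_colors values are hypothetical): once counted, the categories become frequencies and proportions, which are plain numbers to encode.

```python
from collections import Counter

# Hypothetical categorical observations (made up for illustration).
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]

# Frequencies: a discrete count per category level.
frequencies = Counter(eye_colors)

# Proportions: each count divided by the total, a number between 0 and 1.
total = sum(frequencies.values())
proportions = {level: count / total for level, count in frequencies.items()}

for level in frequencies:
    print(level, frequencies[level], round(proportions[level], 3))
```

Either column of numbers can then be plotted with the same encodings used for any other numeric value.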

Distinguishing Graphical Elements

  • Whether underlying variables are categorical or numeric, we often have multiple things on a plot. E.g.,
    • Proportions from different levels of some variable
    • Different categorical level values in a factor
    • Different numeric variable values
    • Different trend lines in a time series
  • Because the reader needs to discern these as different things, our encoding must distinguish these for the reader in some way

Common Ways to Visually Encode Numbers

Position on a Common Scale

Position on Non-Aligned Axes

Length Comparisons

Length Comparisons

Area Comparisons

Angle / Curve Comparisons

Color Comparisons

Common Ways to Distinguish Visual Elements

Often we need to separate or distinguish discrete visual items using:

  • Distinct positions
  • Different colors or shading
  • Distinguishing symbols, words, or annotations
  • Other plot elements (e.g., line thickness)

Distinct Positions

Different Colors or Shading

Distinguishing Symbols, Words, or Annotations

Other Plot Elements

Color Considerations

  • Color can be used for a number of purposes:
    • Encoding numeric values
    • Distinguishing or highlighting visual elements
    • Mood & effect
  • Perception of colors depends on context:
    • Medium: paper, poster, screen, projected presentation
    • Lighting: glare, contrast
    • Audience: Colorblindness?

Grouping: Gestalt Principles

  1. Proximity: When objects are close together, we often perceive them as a group

  2. Similarity: When objects share similar attributes (color, shape, etc.), we often perceive them as a group

  3. Enclosure: When objects are surrounded by a boundary, we often perceive them as a group

  4. Closure: Sometimes partially open structures can still be perceived as a grouping metaphor (e.g., “\(\left[ \ldots \right]\)”)

  5. Connectivity: When you draw curves or lines through data elements, this is often perceived as creating a connection between them

Proximity

When objects are close together, we often perceive them as a group

Similarity

When objects share similar attributes (color, shape, etc.), we often perceive them as a group

Enclosure

When objects are surrounded by a boundary, we often perceive them as a group

Closure

Sometimes partially open structures can still be perceived as a grouping metaphor

Connectivity

When you draw curves or lines through data elements, this is often perceived as creating a connection between them

Lines Imply Connection

Lines imply connection … don’t use them if there isn’t any

Groups Imply Connection

Group things so that the most important things to compare are closest

Memory Limitations

  • Humans have different kinds of memory, stored differently and in different parts of the brain
    • Long-term vs. working memory
    • Verbal memory vs. visual memory
  • Working memory for visual information is very limited
  • Humans can retain roughly three chunks of information at a time
  • Visualizations can help “chunk” information together

Keeping It Together

  • We should avoid “fragmentation” (separating things that should be remembered together)
  • So place the things most related closest together – things that you most want the reader to remember together
  • Highlight and annotate things explicitly, if you want the reader to notice them

R: Visualizing Single Variate Distributions & Values

Lollipop Plots for Discrete Distributions

Suppose we want to visualize a Binomial distribution, \(n=15,\; p=0.25\)

library(ggplot2)

k = 0:15
pmf = dbinom(k,  size=max(k),  prob=0.25)
MyData = data.frame(k,  pmf)

ggplot(MyData,aes(x=k,  y=pmf)) + 
  geom_linerange(ymin=0,  ymax=pmf,  size=1.25) + 
  geom_point(size=3.5) + 
  ylab("Pr{k}") +
  theme(text=element_text(size=18, family="Times"))
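
For reference, the heights of the lollipops can be sketched directly from the Binomial formula. This is a plain Python illustration using only the standard library, not part of the R example; it assumes the same n and p as above.

```python
from math import comb

n, p = 15, 0.25  # same parameters as the slide

# Binomial pmf: Pr{k} = C(n, k) * p^k * (1 - p)^(n - k), for k = 0..n
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Each entry is the height of one lollipop; together they sum to 1.
for k, prob in enumerate(pmf):
    print(f"k={k:2d}  Pr{{k}}={prob:.4f}")
```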

Lollipop Plots for Discrete Distributions

Distribution Plot for Continuous Distributions

Suppose we want to visualize a Normal distribution, \(\mu = 5, \sigma=2\)

library(ggplot2)

ggplot(data.frame(x=c(-5,15),y=c(0,1)),aes(x=x,y=y)) + 
  stat_function(fun=dnorm,args=list(mean=5,sd=2)) + 
  ggtitle("Normal Distribution, mean=5, sd=2") +
  theme(text=element_text(size=18, family="Times"))

Distribution Plot for Continuous Distributions

Estimating Distributions with Histograms

To get a rough picture of the distribution of a sample, use a histogram

library(ggplot2)

MyData = data.frame(val=rnorm(200))

ggplot(MyData,aes(x=val)) + 
  geom_histogram(binwidth=0.5, col="white", fill="darkblue") +
  xlab("Value") + ylab("Count") + ggtitle("Histogram of MyData") +
  theme(text=element_text(size=18, family="Times"))
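
What a histogram computes can be sketched in a few lines of plain Python. This is an illustration only: the seed is arbitrary, and the bin edges here sit at multiples of the bin width, which may differ from where ggplot2 places them.

```python
import math
import random
from collections import Counter

random.seed(1)  # arbitrary seed, only for reproducibility
sample = [random.gauss(0, 1) for _ in range(200)]

binwidth = 0.5

# Assign each value to the bin [i * binwidth, (i + 1) * binwidth).
counts = Counter(math.floor(v / binwidth) for v in sample)

# Print a text histogram with the bins in numeric order.
for i in sorted(counts):
    left = i * binwidth
    print(f"[{left:5.1f}, {left + binwidth:5.1f})  {'#' * counts[i]}")
```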

Estimating Distributions with Histograms

Estimating Distributions with Density Plots

Or a density plot

library(ggplot2)

MyData = data.frame(val=rnorm(200))

ggplot(MyData,aes(x=val)) + 
  geom_density(fill="pink",col=NA) +
  xlab("Value") + ylab("Density") + ggtitle("Density of MyData") +
  theme(text=element_text(size=18, family="Times"))
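
A density plot smooths the sample rather than binning it. The sketch below is one common way to do that, a Gaussian kernel density estimate, written in plain Python. The bandwidth is a hand-picked assumption and will not match whatever default geom_density chooses.

```python
import math
import random

random.seed(2)  # arbitrary seed, only for reproducibility
sample = [random.gauss(0, 1) for _ in range(200)]

def gaussian_kde(x, data, bandwidth):
    """Average of Gaussian bumps centered on each data point, evaluated at x."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)

bandwidth = 0.4  # hand-picked for illustration (an assumption)
step = 0.1
grid = [-4 + step * i for i in range(81)]  # evaluation points from -4 to 4
density = [gaussian_kde(x, sample, bandwidth) for x in grid]

# A density should integrate to roughly 1 (simple Riemann sum over the grid).
area = sum(density) * step
print(f"approximate area under the curve: {area:.3f}")
```

A smaller bandwidth gives a bumpier curve and a larger one gives a smoother curve, much like the bin width of a histogram.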

Estimating Distributions with Density Plots

Estimating Distributions with Several Plots

Or all of these

library(ggplot2)

MyData = data.frame(val=rnorm(200)) 

mu = mean(MyData$val)
sig = sqrt(var(MyData$val))

ggplot(MyData,aes(x=val)) + 
  geom_density(fill="pink",col=NA) +
  geom_histogram(binwidth=0.5, aes(y=..density..), col="white", alpha=0.4) +
  stat_function(fun=dnorm,args=list(mean=mu,sd=sig), size=1.5, col="darkred") +
  xlab("Value") + ylab("Density") + 
  ggtitle("Estimating MyData Distribution") +
  theme(text=element_text(size=18, family="Times"))

Estimating Distributions with Several Plots

Q-Q Norm Plots

Q-Q plots give us a way to see how close to a normal distribution our data might be

[Figure: example Q-Q plot panels: Right Skew, Short Tails, Left Skew, Long Tails]

Q-Q Norm Plots

MyData = data.frame(val=rnorm(200))

qqnorm(MyData$val,pch=19,col="darkgray")
qqline(MyData$val,lwd=2,col="darkred")
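
The points in a normal Q-Q plot are pairs of (theoretical quantile, sorted sample value). A minimal Python sketch of that pairing follows; the plotting positions (i - 0.5) / n are an assumption about the convention, and the seed is arbitrary.

```python
import random
from statistics import NormalDist

random.seed(3)  # arbitrary seed, only for reproducibility
sample = sorted(random.gauss(0, 1) for _ in range(200))
n = len(sample)

# Theoretical standard-normal quantiles at probabilities (i - 0.5) / n.
std_normal = NormalDist()
theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Each pair is one point on the Q-Q plot; normal data falls near a straight line.
pairs = list(zip(theoretical, sample))
for t, s in pairs[:3] + pairs[-3:]:
    print(f"theoretical={t:6.2f}  sample={s:6.2f}")
```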

Q-Q Norm Plots

Dot Plot

Dot plots use position to encode a numeric value, proportion, or frequency

library(ggplot2)

MyData = data.frame(State=state.name[1:10], Area=state.area[1:10])

ggplot(MyData,aes(x=Area,y=State)) +
  geom_point(size=4) +
  xlab("Area (sq. miles)") +
  theme(text=element_text(size=18, family="Times"))   

Dot Plot

Dot plots use position to encode a numeric value, proportion, or frequency

Note: There’s no implicit meaning to the \(y\)-axis positions

Ordered Dot Plot

So we typically order the dot plot by value to make it easier to read

library(ggplot2)

MyData = data.frame(State=state.name[1:10], Area=state.area[1:10])
MySortedData = transform(MyData, State=reorder(State,Area))

ggplot(MySortedData,aes(x=Area,y=State)) +
  geom_point(size=4) +
  xlab("Area (sq. miles)")  +
  theme(text=element_text(size=18, family="Times"))
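
The reordering idea is independent of the plotting library. A tiny Python sketch with made-up category values (the names and numbers are hypothetical):

```python
# Hypothetical category/value pairs (made up for illustration).
values = {"A": 30, "B": 75, "C": 12, "D": 48}

# Alphabetical order carries no information about the values.
alphabetical = sorted(values)

# Ordering by value lets axis position show rank as well as magnitude.
by_value = sorted(values, key=values.get)

print("alphabetical:", alphabetical)
print("by value:    ", by_value)
```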

Ordered Dot Plot

Ordered Bar Plot

Bar plots use length and position to encode a numeric value

library(ggplot2)

MyData = data.frame(State=state.name[1:10], Area=state.area[1:10])
MySortedData = transform(MyData, State=reorder(State,Area))

ggplot(MySortedData,aes(x=State,y=Area)) +
  geom_bar(stat="identity") +
  coord_flip() +
  ylab("Area (sq. miles)")  +   # Recall we flipped the axes ...
  theme(text=element_text(size=18, family="Times"))

Note: Again, these are ordered for ease of reading …

Ordered Bar Plot

Bar Plots are Not Histograms

  • Histograms visualize an estimate of the distribution of a numeric variable
  • The bins in a histogram remain in the order given by the values
  • Bar plots, by contrast, visualize the values of specific observations
  • The order of the bars in a bar plot is typically up to us

Line Plot

For example, use lines to connect the same algorithm at different points during a run

library(ggplot2)

fakeData = data.frame(evals=c(100,150,200,250),
                      performance=c(1000.1,1300.2,1410.6,1470.3),
                      ci=c(150,90,50,30))

ggplot(fakeData,aes(evals,performance)) +
  geom_errorbar(aes(ymin=performance-ci/2, ymax=performance+ci/2),
                size=0.5, width=10) +
  geom_line(color="darkblue", size=1.25) +
  geom_point(size=5) +
  xlab("Number of Evaluations") +
  ylab("Algorithm Performance") +
  theme(text=element_text(size=18, family="Times"))

Line Plot

Box Plots

Box plots give information about the median, quartiles, and outliers, as well as (with notches) an approximate confidence interval for the median

library(ggplot2)

ggplot(mtcars, aes(1,y=mpg)) +
  geom_boxplot(notch=T, fill="pink") +
  theme(axis.text.x=element_blank(), axis.ticks.x=element_blank()) +
  xlim(c(0,2)) + 
  xlab("") + ylab("Mileage") + 
  ggtitle("Distribution of Car Mileage") +
  theme(text=element_text(size=18, family="Times"))
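
The numbers behind a notched box plot can be sketched in plain Python. The data below is made up, the quartile method is one of several conventions, and the 1.5 × IQR fences and 1.58 × IQR / sqrt(n) notch width are the usual rules of thumb; treat the exact values as illustrative rather than a reproduction of what ggplot2 draws.

```python
import math
from statistics import quantiles

# Hypothetical sample (made up for illustration), already sorted.
data = [10.4, 13.3, 14.3, 15.2, 15.8, 17.3, 18.1, 19.2, 19.7, 21.0,
        21.4, 22.8, 24.4, 26.0, 27.3, 30.4, 32.4, 33.9, 52.0]

# Box: first quartile, median, third quartile.
q1, median, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# Notch: an approximate confidence interval around the median.
half_notch = 1.58 * iqr / math.sqrt(len(data))
notch = (median - half_notch, median + half_notch)

print(f"Q1={q1:.2f} median={median:.2f} Q3={q3:.2f} IQR={iqr:.2f}")
print(f"outliers={outliers} notch=({notch[0]:.2f}, {notch[1]:.2f})")
```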

Box Plots

R: Visualizing Multi-Variate Distributions & Values

Overlaid Lollipop Plots for Discrete Distributions

Use dodge to visualize multiple Binomial distributions

library(ggplot2)

k = 0:15
p = factor(c(rep(0.25,length(k)),rep(0.4,length(k))))
pmf = c(dbinom(k,  size=max(k),  prob=0.25), dbinom(k,  size=max(k),  prob=0.4))
MyData = data.frame(k,  p, pmf)

ggplot(MyData, aes(x=k,  y=pmf, group=p)) + 
  geom_linerange(ymin=0,  
                 aes(ymax=pmf, color=p),  
                 size=1.25, 
                 position=position_dodge(width=0.25)) + 
  geom_point(size=3.5, position=position_dodge(width=0.25), aes(color=p)) + 
  ylab("Pr{k}") +
  ggtitle("Two Binomial Distributions, n=15, p=0.25 and p=0.4") 

Overlaid Lollipop Plots

Label text is too small? Use theme()

Overlaid Density Plots of Multiple Variables

You can use factors to separate different plots straightforwardly

library(ggplot2)
library(MASS)                         # Contains a lot of extra data sets

birthwt1 = birthwt                    # Copy a birth Wt / risk factor data set 
birthwt1$smoke = factor(birthwt$smoke) # Make "smoking during preg." a factor

ggplot(birthwt1, aes(x=bwt, fill=smoke)) + 
  geom_density(alpha=0.3) +
  xlab("Birth Weight (g)") +
  ylab("Distribution Density") +
  scale_fill_discrete(name="Mom Smoked?",
                      labels=c("No","Yes")) + 
  theme(text=element_text(size=20, family="Times"))

Overlaid Density Plots of Multiple Variables

Overlaid Histograms of Multiple Variables

library(ggplot2)
library(MASS)                    # Contains a lot of extra data sets

bwt = birthwt$bwt                # Get the birth Wt / risk factor vector
smoke = as.factor(birthwt$smoke) # Make "smoking during preg." variable a factor
MyData = data.frame(bwt,smoke)

ggplot(MyData, aes(x=bwt, fill=smoke)) +
  geom_histogram(aes(y=..density..),
           binwidth=500,
           position=position_dodge(width=500),
           color="black") +
  xlab("Birth Weight (g)") +
  ylab("Distribution Density") +
  scale_fill_discrete(name="Mom Smoked?",
                      labels=c("No","Yes")) + 
  theme(text=element_text(size=20, family="Times"))

Overlaid Histograms of Multiple Variables

Two-Dimensional Density Plots

You can use stat_density2d to create contour density plots

library(ggplot2)
library(gcookbook)

ggplot(faithful, aes(x=eruptions, y=waiting)) +
  stat_density2d(aes(color=..level..), size=1.5) +
  xlab("Eruption Time (min)") +
  ylab("Time Between Eruptions (min)") +
  scale_color_continuous(name="Distribution\nDensity") +
  ggtitle("Old Faithful Geyser Eruptions") + 
  theme(text=element_text(size=20, family="Times"))

Two-Dimensional Density Plots

The Basic Scatterplot

Use geom_point for scatter plots of numeric values

library(ggplot2)
library(MASS)

ggplot(Boston,aes(x=age, y=medv, size=crim, color=dis)) + 
  geom_point() + 
  xlab("Age of Home") + 
  ylab("Median Home Value (thousands)") + 
  # One size scale only: a second scale_size_* call would replace the first
  scale_size_continuous(name="Township\nCrime Rate", range=c(2.5,10)) +
  scale_color_continuous(name="Distance to\nEmployment") +
  ggtitle("Houses of Boston") + 
  theme(text=element_text(size=20, family="Times"))

The Basic Scatterplot

Pairwise Scatterplots

The standard R function pairs allows us to see all pairwise scatter plots

pairs(iris[1:4],pch=19)

Pairwise Scatterplots

Pairwise Scatterplots with GGally

If you install the GGally library, you get a ggplot version with ggpairs

library(GGally)

ggpairs(iris) + 
  theme(text=element_text(size=20, family="Times"))

Pairwise Scatterplots with GGally

Co-Plotting Multiple Trends

Co-Plotting Multiple Trends

Stacking Multiple Trends

Stacking Multiple Trends

Stacking Multiple Trends

Multiple Bar plots, Grouped

We can make “grouped” bar plots using dodge

library(ggplot2)
library(gcookbook)   # Provides the cabbage_exp data set

ggplot(cabbage_exp, aes(x=Date, y=Weight, fill=Cultivar)) +
  geom_bar(stat="identity", position="dodge", color="white") + 
  scale_fill_brewer(palette="Set1") +
  theme(text=element_text(size=20, family="Times"))

Multiple Bar plots, Grouped

Multiple Bar plots, Stacked

By default, ggplot wants to stack …

library(ggplot2)
library(gcookbook)   # Provides the cabbage_exp data set

ggplot(cabbage_exp, aes(x=Date, y=Weight, fill=Cultivar)) +
  geom_bar(stat="identity", color="white") + 
  scale_fill_brewer(palette="Set1") +
  theme(text=element_text(size=20, family="Times"))

Multiple Bar plots, Stacked

Mosaic Plots

  • Mosaic plots are like multi-dimensional bar plots
  • Encode values using area
  • In R, we need to install and load the library vcd
  • The vcd mosaic function requires a somewhat more sophisticated data table structure (more on this in another lecture)

library(vcd)
mosaic(HairEyeColor)   # vcd draws with grid, not ggplot2, so theme() does not apply
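
How a mosaic plot turns counts into areas can be sketched in plain Python. The 2x2 table below is made up, and the layout (split the unit square by the first variable, then split each strip by the second) is a simplified assumption, not the vcd implementation.

```python
# Hypothetical 2x2 contingency table of counts (made up for illustration).
table = {
    ("A", "x"): 30, ("A", "y"): 10,
    ("B", "x"): 20, ("B", "y"): 40,
}
total = sum(table.values())

tiles = {}
for row in ("A", "B"):
    row_total = sum(count for (r, _), count in table.items() if r == row)
    width = row_total / total           # share of the first variable
    for (r, col), count in table.items():
        if r == row:
            height = count / row_total  # conditional share of the second variable
            tiles[(r, col)] = (width, height, width * height)

# Each tile's area equals that cell's share of the whole table.
for cell, (width, height, area) in tiles.items():
    print(cell, f"width={width:.2f} height={height:.2f} area={area:.2f}")
```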

Mosaic Plots


Coxcomb Plots

Florence Nightingale used Coxcomb plots to convince the British that the biggest threat to their soldiers during the Crimean War was preventable disease

library(ggplot2)

nightengale = read.csv("http://eecs.ucf.edu/~wiegand/ids6938/datasets/nightengale.csv",header=TRUE)
Month = as.Date(paste("01",nightengale$Date),"%d %B %Y")
DeathType = factor(nightengale$DeathType,ordered=TRUE)
DeathRate = sqrt((1000*nightengale$NumDeaths/nightengale$AvgArmySize)/pi)
MyData = data.frame(Month,DeathType,DeathRate)

ggplot(MyData, aes(x=Month, 
                   y=DeathRate, 
                   fill=DeathType, 
                   order=as.numeric(DeathType))) + 
  geom_bar(stat="identity") +
  coord_polar() +
  scale_x_date(breaks=MyData$Month,labels=format(MyData$Month,"%b %Y")) + 
  theme(text=element_text(size=20, family="Times"))
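
The square root in the DeathRate line is there so that wedge area, not radius, tracks the value. A small Python check of that reasoning, with made-up rates (the numbers are hypothetical):

```python
import math

# Hypothetical death rates per 1000 (made up for illustration).
rates = [4.0, 16.0, 64.0]

# Same transform as the DeathRate line above: radius = sqrt(rate / pi).
radii = [math.sqrt(rate / math.pi) for rate in rates]

# A circle (or a wedge of fixed angle) has area proportional to radius squared,
# so the area recovers the original rate.
areas = [math.pi * r**2 for r in radii]

for rate, r, a in zip(rates, radii, areas):
    print(f"rate={rate:5.1f}  radius={r:.3f}  circle area={a:.1f}")
```

Quadrupling the rate only doubles the radius, so readers comparing areas see the true ratio.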

Coxcomb Plots

Multiple Boxplots

library(ggplot2)

ggplot(iris,aes(x=Species, y=Sepal.Length)) + 
  geom_boxplot(outlier.size=3, notch=TRUE) +
  ylab("Iris Sepal Length (cm)") + 
  theme(text=element_text(size=20, family="Times"))

Multiple Boxplots

A Few Simple Viz Examples in Python/Matplotlib

Bar Plot

import matplotlib.pyplot as plt

fig, ax = plt.subplots()

fruits = ['apple', 'blueberry', 'cherry', 'orange']
counts = [40, 100, 30, 55]
bar_labels = ['red', 'blue', '_red', 'orange']
bar_colors = ['tab:red', 'tab:blue', 'tab:red', 'tab:orange']

ax.bar(fruits, counts, label=bar_labels, color=bar_colors)

ax.set_ylabel('fruit supply')
ax.set_title('Fruit supply by kind and color')
ax.legend(title='Fruit color')

plt.show()

Bar Plot

Bubble Plot

import matplotlib.pyplot as plt
import numpy as np

# Fixing random state for reproducibility
np.random.seed(19680801)


N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
area = (30 * np.random.rand(N))**2  # 0 to 15 point radii

plt.scatter(x, y, s=area, c=colors, alpha=0.5)
plt.show()

Bubble Plot

Boxplots

import matplotlib.pyplot as plt
import numpy as np

np.random.seed(19680801)
fruit_weights = [
    np.random.normal(130, 10, size=100),
    np.random.normal(125, 20, size=100),
    np.random.normal(120, 30, size=100),
]
labels = ['peaches', 'oranges', 'tomatoes']
colors = ['peachpuff', 'orange', 'tomato']

fig, ax = plt.subplots()
ax.set_ylabel('fruit weight (g)')

bplot = ax.boxplot(fruit_weights,
                   patch_artist=True,  # fill with color
                   tick_labels=labels)  # will be used to label x-ticks

# fill with colors
for patch, color in zip(bplot['boxes'], colors):
    patch.set_facecolor(color)

plt.show()

Boxplots