Reproducibility of Ranking Visualization of Correlation using Weber’s Law by Harrison et al. (2014, TVCG)

Author

Fuling Sun

Published

October 30, 2024

Introduction

My research focuses on information visualization, particularly how visualizations can be designed effectively to align with authors’ communication goals and support readers’ analytic tasks. This paper, along with the follow-up study by Kay and Heer, models the perception of correlation afforded by visualizations, providing quantitative methods to compare and rank them. Replicating this study will contribute to my future research, both in theoretical development and tool design, while also enhancing my skills in statistical analysis.

To reproduce the analysis from the paper, I will use the open-source data provided by the authors, follow the analysis pipeline, and implement the process using R. Specifically, I will conduct the following tasks:

Data processing: I will understand the meaning of the variables in the dataset, and aggregate the data based on the experimental conditions, as described in Harrison et al.
Statistical analysis: I will perform analysis on both the aggregated data (from Harrison et al.) and the individual judgments (from Kay and Heer), applying techniques such as the Kruskal-Wallis test, Mann-Whitney-Wilcoxon tests, linear models, log-linear models and Bayesian estimation.
Result comparison: I will compare my results with those reported in the original papers to identify any discrepancies.
Additional analysis: I will propose and conduct further analyses to explore new questions that arise during the process.

Potential challenges in this project include learning each statistical method and implementing it in R, a language I am not yet fully familiar with. Additionally, since part of the analysis will draw from the follow-up study, I will need to understand the differences between these methods, their strengths and limitations, and when to apply each one.

Links:

GitHub repo
Ranking Visualization of Correlation using Weber’s Law by Harrison et al.
Beyond Weber’s Law: A Second Look at Ranking Visualizations of Correlation by Kay and Heer
- A follow-up analysis of the first paper.

Methods

This section presents the methodology, procedure, and analysis pipeline from the original study. Material quoted from the original paper is indicated with quotation marks; other passages are summarized for brevity.

Methodology

The original study aimed to infer just-noticeable differences (JNDs) in the perception of correlation across different visualizations. There were nine visualization types, six correlation values \(r\) (0.3, 0.4, 0.5, 0.6, 0.7 and 0.8) and two approach conditions (above and below).

The staircase procedure was used. “In the staircase procedure, given a target value for correlation, \(r\), participants are given two visualization stimuli side-by-side (two scatterplots in this case) and asked to choose which they perceive to have a higher correlation. With an “above” approach, the participant is given one visualization with the target \(r\), and another with an \(r\) value higher than the target. For example, if the target \(r\) is 0.7, then the second \(r\) value would be 0.8 (assuming a starting distance of 0.1). Conversely, with an “below” approach, the participant would be given a visualization with the target \(r\), and another that has an \(r\) value lower than the target.”

“In both cases, if a participant chooses correctly, the distance in correlation between the two visualizations is decreased by 0.01 while keeping the target \(r\) constant (e.g. 0.7 versus 0.79 in the “above” condition, or 0.7 versus 0.61 in the “below” condition). If a participant chooses incorrectly, the distance in correlation between two visualizations is increased by 0.03, making the next judgment easier. The staircase procedure “hones in” on the JND by penalizing incorrect choices more than correct choices.”

“The staircase procedure ends when either 50 individual judgments are reached or when a convergence criteria is met.”
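The update rule quoted above can be illustrated with a small simulation. This is only a sketch in Python (my actual analysis will be in R), and several parts are my own assumptions rather than details from the paper: the simulated observer `toy_observer`, the clamping of the gap, and the fact that the trial always runs for the full 50 judgments because the convergence criterion is not reproduced here.

```python
import random

def run_staircase(target_r, approach, p_correct, start_gap=0.10,
                  step_down=0.01, step_up=0.03, max_judgments=50, seed=0):
    """Simulate one staircase trial; return the gap shown at each judgment."""
    rng = random.Random(seed)
    # Assumption: the gap stays >= 0.01 and the comparison r stays in [0, 1].
    max_gap = round(1.0 - target_r, 2) if approach == "above" else round(target_r, 2)
    gap = start_gap
    gaps = []
    for _ in range(max_judgments):
        gaps.append(gap)
        # The comparison stimulus would have r = target_r + gap ("above")
        # or r = target_r - gap ("below").
        correct = rng.random() < p_correct(gap)
        # Correct answers shrink the gap by 0.01, incorrect ones widen it by 0.03.
        gap = gap - step_down if correct else gap + step_up
        gap = round(min(max(gap, 0.01), max_gap), 2)
    return gaps

def toy_observer(gap):
    """Hypothetical observer: larger gaps are easier to judge correctly."""
    return min(1.0, 0.5 + 5.0 * gap)

gaps = run_staircase(0.7, "above", toy_observer)
print(gaps[:5], "...", gaps[-5:])
```

Because an error costs three times as much as a correct answer gains, the gap tends to settle where the simulated observer is correct on roughly three out of four judgments.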

Planned Sample

The original experiment was conducted with 1687 participants (834 female) on Amazon Mechanical Turk (MTurk). In total, there were 6772 trial records. No other demographics or pre-selection rules were mentioned in the paper.

Materials

“Our study used a total of nine visualizations, two correlation directions (positive/negative), and six correlation values (0.3 to 0.8) yielding 54 main groups. Since each participant was assigned to one visualization, one correlation direction, and two correlation values (above and below), roughly 30 participants were assigned to each visualization×direction×r-value group.”

“We chose nine visualizations for this experiment based on two main criteria: a) they must be commonly used in either infovis or commercial software (external validity), and b) they must be viable within the constraints of the experiment methodology. The nine visualizations chosen include: scatterplots, parallel coordinates plots, stacked area charts, stacked bar charts, stacked line charts, line charts, ordered line charts, radar charts, and donut charts.”

“All visualizations were 300×300 pixels, contained 100 data points and displayed datasets generated from same algorithm. For visualizations that required more than one color, we chose a single color scheme from ColorBrewer.”

Figure 1 (from the original paper): a) A sample starting comparison from the experiment: r = 0.7 on the left and r = 0.6 on the right. Participants were asked to choose which of the two appeared to be more highly correlated. b) The staircase procedure hones in on the just-noticeable difference by gradually making comparisons more difficult: r = 0.7 on the left and r = 0.65 on the right.

Procedure

To accommodate the diverse educational backgrounds of MTurk participants, each study started with a training session and a practice session. Participants first read a definition of correlation and saw example visualizations with different correlation values. They then completed a practice session of 30 judgments, each asking which of two visualizations showed the higher correlation. The correct answer was shown after each judgment.

“Informed by early pilot testing, participants were randomly assigned to two correlation values: one from [0.3, 0.4, 0.5], and one from [0.6, 0.7, 0.8]. These groups roughly correspond to “hard” and “easy”, since high correlations are more easily discriminated than low correlations. For the two correlation values chosen, participants complete both the above and below approach, resulting in a total of four trials (easy×above, easy×below, hard×above and hard×below).”
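To make the assignment scheme concrete, here is a minimal Python sketch of how one participant's four trials could be generated. The function name `assign_trials` and the use of a uniform random choice are my assumptions; the paper only states that the assignment was random and that the trial order was randomized.

```python
import random

HARD_R = [0.3, 0.4, 0.5]  # lower correlations, harder to discriminate
EASY_R = [0.6, 0.7, 0.8]  # higher correlations, easier to discriminate

def assign_trials(rng):
    """Pick one hard and one easy r, cross them with both approaches, shuffle."""
    hard = rng.choice(HARD_R)
    easy = rng.choice(EASY_R)
    trials = [(r, approach)
              for r in (hard, easy)
              for approach in ("above", "below")]
    rng.shuffle(trials)  # the order of the four trials is randomized
    return trials

print(assign_trials(random.Random(1)))
```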

“After completing the training and practice sessions, participants began the four main trials. The order of the correlation-approach pairs was randomized in this session. Upon completing a trial set (either by reaching the convergence criteria or 50 individual judgments), participants were given the option to take a short break, and notified as to how many experiment trials remained. Following the completion of all four trials, a demographics questionnaire was given, which included a question that asked participants to describe the strategy they used to assess correlation. Finally, participants were given a short debrief explaining the purpose of the experiment.”

Analysis Plan

The analysis plan covers data processing and cleaning, fitting the Weber’s Law model, and comparing visualization types on correlation perception.

Data Processing and Cleaning

According to the paper, the authors aggregated the JNDs of each \(r\) value under each condition (visualization type × direction × approach).

“The resulting data were non-normally distributed, so to mitigate the effect of outliers, JNDs that fell outside 3 median-absolute deviations from the median (within one of the 54 groups) were excluded from the following analyses. Because the staircase methodology penalizes incorrect responses and controls for guessing by defining a convergence criteria, this exclusion criteria also mitigates the effect of “click-through” responses that often impact crowdsourced experiments.”
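The outlier rule can be sketched as follows in Python. The function name `mad_filter` and the sample values are hypothetical. I also assume the raw, unscaled median-absolute deviation; R's `mad()` multiplies by 1.4826 by default, so the choice of constant needs to be checked against the original analysis.

```python
from statistics import median

def mad_filter(jnds, n_mads=3.0):
    """Keep JNDs within n_mads median-absolute deviations of the group median."""
    med = median(jnds)
    # Assumption: unscaled MAD (no 1.4826 consistency constant).
    mad = median([abs(x - med) for x in jnds])
    lower, upper = med - n_mads * mad, med + n_mads * mad
    return [x for x in jnds if lower <= x <= upper]

# Hypothetical JNDs for one group; 0.60 looks like a click-through response.
group = [0.10, 0.12, 0.11, 0.13, 0.12, 0.60]
print(mad_filter(group))
```

The filter would be applied separately to each of the 54 groups, as the quoted passage specifies.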

“An exclusion criteria was also enforced for visualization × direction pairs that exceeded 20% of values falling on or outside the “chance” boundary of JND = 0.45 established previously. Six of the eighteen pairs met this exclusion criteria: stacked area-positive, stacked bar-positive, stacked line-positive, donut-positive, radar-negative, and line-negative. The following analyses include the remaining twelve visualization×direction pairs.”
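A minimal Python sketch of the chance-boundary rule follows. The function name, the dictionary layout, and the JND values are all hypothetical. I read "on or outside" as JND >= 0.45 and "exceeded 20%" as a strict inequality, which is my interpretation of the quoted sentence.

```python
def exceeds_chance_rule(jnds, chance_jnd=0.45, max_share=0.20):
    """True if more than max_share of the JNDs lie on or beyond the chance boundary."""
    at_chance = sum(1 for x in jnds if x >= chance_jnd)
    return at_chance / len(jnds) > max_share

# Hypothetical data keyed by (visualization, direction).
pairs = {
    ("scatterplot", "positive"): [0.10, 0.12, 0.08, 0.15, 0.11],
    ("donut", "positive"): [0.45, 0.50, 0.20, 0.47, 0.30],
}
kept = {pair: jnds for pair, jnds in pairs.items()
        if not exceeds_chance_rule(jnds)}
print(sorted(kept))
```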

Weber’s Law Model Fitting

“Specifically, each correlation value \(r\) was moved by half of the average JND from the above and below approach. For the above approach, the correlation \(r\) was moved towards \(r = 1\), while the \(r\) from the below approach was moved towards \(r = 0\). Specifically, correlation \(r\) is transformed into adjusted-correlation \(r_A\) by:”

\[ r_A = r \pm 0.5 \cdot \mathrm{jnd}(r) \]

“Linear models were then fit to the data, and errors were computed based on the square root of the mean squares of the residuals (RMS error).”
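The adjustment and the linear fit can be sketched in Python as below. The helper names and the data points are hypothetical. I assume ordinary least squares of mean JND on \(r_A\), and an RMS error that divides by the number of points; the original analysis may use a different denominator.

```python
from math import sqrt

def adjust_r(r, mean_jnd, approach):
    """Shift r by half the mean JND: towards 1 for 'above', towards 0 for 'below'."""
    sign = 1 if approach == "above" else -1
    return r + sign * 0.5 * mean_jnd

def fit_line(xs, ys):
    """Ordinary least squares fit; returns (slope, intercept, RMS error)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    rms = sqrt(sum(e * e for e in residuals) / n)  # assumption: divide by n
    return slope, intercept, rms

# Hypothetical mean JNDs per (r, approach) for one visualization x direction pair.
points = [(0.3, "above", 0.19), (0.3, "below", 0.20),
          (0.6, "above", 0.12), (0.6, "below", 0.14),
          (0.8, "above", 0.08), (0.8, "below", 0.09)]
r_adjusted = [adjust_r(r, jnd, approach) for r, approach, jnd in points]
mean_jnds = [jnd for _, _, jnd in points]
print(fit_line(r_adjusted, mean_jnds))
```

A negative slope is what Weber's Law predicts here: the JND shrinks as the adjusted correlation approaches 1.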

Comparison among Visualization Types

“Examining the JND data alone, there appear to be large differences in performance between many of the visualizations, as well as asymmetries between many of the positive/negative pairs. In order to confirm these observations, an overall Kruskal-Wallis test was conducted on the raw JNDs to evaluate whether there is an interaction between visualization and correlation direction conditions.”
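For reference, the Kruskal-Wallis H statistic can be computed by hand as in the Python sketch below. It uses the standard rank-based formula with a tie correction; the function names and the example groups are hypothetical. The p-value (a chi-square tail probability with k - 1 degrees of freedom) is left out because it needs a statistics library; in R, `kruskal.test` reports it directly.

```python
from collections import Counter

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the usual correction for ties."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(c ** 3 - c for c in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))

# Hypothetical JNDs for three visualization x direction groups.
print(kruskal_wallis_h([[0.10, 0.12, 0.11], [0.20, 0.22, 0.25], [0.40, 0.38, 0.45]]))
```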

“To explore further, several visualization × direction pairs were compared via Mann-Whitney-Wilcoxon tests. Rather than compare all possible pairs, we instead investigate 14 pairings reflecting the original motivations for choosing the visualizations tested.”

Additional Analysis: Winsorization and Permutation Test

One of the data exclusion criteria removed JNDs that fell outside 3 median-absolute deviations from the median (within one of the 54 groups). Winsorizing instead replaces these “outliers” with the value at a specified percentile of the data, rather than discarding them. By Winsorizing the JNDs and refitting the model, we can compare model performance between the excluded data and the Winsorized data.
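A minimal Python sketch of percentile-based Winsorization is shown below. The 5th and 95th percentile cutoffs and the nearest-rank percentile definition are placeholders of my own; the cutoffs for the actual analysis have not been fixed yet.

```python
def winsorize(values, lower_pct=5, upper_pct=95):
    """Clamp values to the [lower_pct, upper_pct] nearest-rank percentiles."""
    ordered = sorted(values)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank percentile: the ceil(p * n / 100)-th smallest value.
        rank = max(1, (p * n + 99) // 100)
        return ordered[rank - 1]

    low, high = percentile(lower_pct), percentile(upper_pct)
    return [min(max(v, low), high) for v in values]

# Hypothetical JNDs; the extreme value is pulled in rather than dropped.
jnds = [0.10, 0.12, 0.11, 0.13, 0.12, 0.14, 0.09, 0.11, 0.13, 0.60]
print(winsorize(jnds, lower_pct=10, upper_pct=90))
```

Unlike exclusion, this keeps the group size unchanged, which is what makes the two fits directly comparable.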

For the permutation test, the \(r\) values will be randomly shuffled within each visualization × direction pair and the Weber’s Law model refitted to the shuffled data. Comparing the shuffled fits with the original fit shows whether the observed relationship is stronger than would be expected by chance.
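The permutation test could look like the Python sketch below. Using the absolute slope as the test statistic, 1000 permutations, and the add-one p-value estimate are my assumptions; RMS error or another fit measure could be substituted. The data are synthetic and only illustrate the mechanics.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def permutation_p_value(r_values, jnds, n_perm=1000, seed=0):
    """Share of shuffled fits whose |slope| is at least the observed |slope|."""
    rng = random.Random(seed)
    observed = abs(slope(r_values, jnds))
    shuffled = list(r_values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the pairing between r and JND
        if abs(slope(shuffled, jnds)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one estimate avoids p = 0

# Synthetic data for one pair: JND decreases linearly with r.
r_values = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8] * 5
jnds = [0.25 - 0.2 * r for r in r_values]
print(permutation_p_value(r_values, jnds))
```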

Differences from Original Study

The overall analysis is the same as the one in the paper, with two differences. First, whereas the original analysis compared only 14 visualization × direction pairs, I will analyze more. For example, comparing visualizations that use the same type of visual mark (e.g., line, area) can help us understand which specific visualization types work better for depicting correlation. Second, I add a model robustness analysis using Winsorization and a permutation test.

Results

Data preparation

Data preparation following the analysis plan.

Confirmatory analysis

The analyses as specified in the analysis plan.

Side-by-side graph with original graph is ideal here

Exploratory analyses

Any follow-up analyses desired (not required).

Discussion

Summary of Replication Attempt

Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.

Commentary

Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.