Reproducibility of Ranking Visualizations of Correlation Using Weber’s Law by Harrison et al. (2014, TVCG)
Introduction
My research focuses on information visualization, particularly how visualizations can be designed to align with authors’ communication goals and support readers’ analytic tasks. This paper, along with the follow-up study by Kay and Heer, models the perception of correlation afforded by different visualizations, providing quantitative methods to compare and rank them. Reproducing this analysis will contribute to my future research, both in theoretical development and tool design, while also strengthening my skills in statistical analysis.
To reproduce the analysis from the paper, I will use the open-source data provided by the authors, follow the analysis pipeline, and implement the process using R. Specifically, I will conduct the following tasks:
- Data processing: I will document the meaning of each variable in the dataset and aggregate the data by experimental condition, as described in Harrison et al.
- Statistical analysis: I will perform analysis on both the aggregated data (from Harrison et al.) and the individual judgements (from Kay and Heer), applying techniques such as the Kruskal-Wallis test, Mann-Whitney-Wilcoxon tests, linear models, log-linear models and Bayesian estimation.
- Result comparison: I will compare my results with those reported in the original papers to identify any discrepancies.
- Additional analysis: I will propose and conduct further analyses to explore new questions that arise during the process.
Potential challenges in this project include learning each statistical method and implementing it in R, a language I am not yet fully familiar with. Additionally, since part of the analysis will draw from the follow-up study, I will need to understand the differences between these methods, their strengths and limitations, and when to apply each one.
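As a minimal sketch of how the data processing and statistical analysis steps above might look in base R, the block below runs them on synthetic data. The column names (`participant`, `vis`, `r_base`, `jnd`) and every value are placeholders invented for illustration, not the variables or results from the authors' dataset. Bayesian estimation is left out, and the models are plain `lm` fits rather than the exact specifications used in either paper.

```r
# Synthetic stand-in for per-trial data; all names and values are hypothetical.
set.seed(1)
trials <- expand.grid(
  participant = 1:20,
  vis = c("scatterplot", "parallel_coords"),
  r_base = c(0.3, 0.5, 0.7),
  stringsAsFactors = TRUE
)
# Toy just-noticeable differences (JNDs) that shrink as base correlation grows.
trials$jnd <- 0.25 - 0.2 * trials$r_base +
  ifelse(trials$vis == "parallel_coords", 0.05, 0) +
  rnorm(nrow(trials), sd = 0.03)
trials$jnd <- pmax(trials$jnd, 0.01)  # keep JNDs positive so log() is defined

# 1. Aggregate: mean JND per visualization and base correlation.
agg <- aggregate(jnd ~ vis + r_base, data = trials, FUN = mean)

# 2. Kruskal-Wallis test for an overall difference between visualizations.
kw <- kruskal.test(jnd ~ vis, data = trials)

# 3. Mann-Whitney-Wilcoxon test comparing the two visualizations.
mw <- wilcox.test(jnd ~ vis, data = trials)

# 4. Linear model fitted to the aggregated means.
fit_lin <- lm(jnd ~ r_base * vis, data = agg)

# 5. Log-linear model fitted to the individual judgements.
fit_log <- lm(log(jnd) ~ r_base * vis, data = trials)

print(agg)
print(kw)
print(mw)
print(coef(fit_lin))
print(coef(fit_log))
```

With the real dataset, the `trials` data frame would be replaced by the authors' file, and the grouping columns would follow the variable definitions documented in the data processing step.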
Links:
- GitHub repo
- Ranking Visualizations of Correlation Using Weber’s Law by Harrison et al.
- Beyond Weber’s Law: A Second Look at Ranking Visualizations of Correlation by Kay and Heer (a follow-up analysis of the first paper)
Methods
Power Analysis
Original effect size, and a power analysis giving the sample sizes needed to achieve 80%, 90%, and 95% power to detect that effect size. Considerations of feasibility for selecting planned sample size.
Planned Sample
Planned sample size and/or termination rule, sampling frame, known demographics if any, preselection rules if any.
Materials
All materials - can quote directly from original article - just put the text in quotations and note that this was followed precisely. Or, quote directly and just point out exceptions to what was described in the original article.
Procedure
Can quote directly from original article - just put the text in quotations and note that this was followed precisely. Or, quote directly and just point out exceptions to what was described in the original article.
Analysis Plan
Can also quote directly, though it is less often spelled out effectively for an analysis strategy section. The key is to report an analysis strategy that is as close to the original - data cleaning rules, data exclusion rules, covariates, etc. - as possible.
Clarify the key analysis of interest here. You can also pre-specify additional analyses you plan to do.
Differences from Original Study
Explicitly describe known differences in sample, setting, procedure, and analysis plan from original study. The goal, of course, is to minimize those differences, but differences will inevitably occur. Also, note whether such differences are anticipated to make a difference based on claims in the original article or subsequent published research on the conditions for obtaining the effect.
Methods Addendum (Post Data Collection)
You can comment this section out prior to final report with data collection.
Actual Sample
Sample size, demographics, data exclusions based on rules spelled out in analysis plan
Differences from pre-data collection methods plan
Any differences from what was described as the original plan, or “none”.
Results
Data preparation
Data preparation following the analysis plan.
Confirmatory analysis
The analyses as specified in the analysis plan.
Side-by-side graph with original graph is ideal here
Exploratory analyses
Any follow-up analyses desired (not required).
Discussion
Summary of Replication Attempt
Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.
Commentary
Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between the original and present study ones that definitely were, plausibly were, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.