A common data interpretation mistake is to assume that a correlation between two variables indicates a causal relationship between them. Even researchers can struggle with this distinction, as finding a correlation can be relatively easy but establishing evidence for a causal explanation is typically challenging.
People routinely see data visualizations (tables, bar graphs, line graphs, maps, and so on) in everyday contexts and media. These visualizations are intended to make data easier to understand. Unfortunately, the common mistake of inferring causation from mere correlation means they can instead lead to misconceptions.
Previous research has found that perceptions of causality can be context-dependent. When data aligns with a person’s prior beliefs and experiences, they are more likely to judge evidence as sound and stop thinking through other possible explanations (Shah et al., 2017, p. 270).
How does the design of a data visualization affect people’s perception of correlation and causation? Are there design choices that could help reduce the chances of people mistakenly assuming a causal relationship that may not exist?
These questions were the focus of an investigation by a team at Northwestern University in their research paper titled “Illusion of Causality in Visualized Data” (Xiong et al., 2019).
The investigation by Xiong et al. (2019) included a pilot experiment and three follow-up experiments. For all of the experiments, participants were recruited through Amazon Mechanical Turk and completed visualization tasks using an online survey.
Participants were shown a visualization and asked to answer graph-reading comprehension questions (true/false responses). These questions were used as a check to screen out participants who did not demonstrate basic graph-reading skills.
For the pilot experiment and Experiment 1, participants then completed a generative task that asked them to explain, in several sentences, what they concluded from the visualization and why.
Next, participants completed two judgment tasks. First, they were given a statement describing the correlation between the variables and asked to rate their agreement with it on a scale from 0 (disagree) to 100 (agree).
Then they were given a statement describing a causal relationship between the variables and asked to rate their agreement with it on the same scale.
The experiments tested three types of visualizations that people commonly encounter: bar graphs, line graphs, and scatter plots. Experiment 1 also presented text-only descriptions, each written to match its corresponding bar graph.
To create the data visualizations, the researchers randomly generated 100 data points with a correlation of 0.6 from a normal distribution. This same dataset was used for every visualization in all the experiments, regardless of the visualization design or the data context. Only the axis labels were changed.
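The paper does not include the generation code, but a dataset like this can be produced in a few lines. Here is a minimal sketch in Python with NumPy (the tooling and the seed are assumptions, not details from the paper), drawing 100 points from a bivariate normal distribution whose covariance is set to give a correlation of 0.6:

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only

# Covariance matrix for two unit-variance variables with correlation 0.6.
mean = [0.0, 0.0]
cov = [[1.0, 0.6],
       [0.6, 1.0]]

# Draw 100 (x, y) pairs; .T splits the two columns into separate arrays.
x, y = rng.multivariate_normal(mean, cov, size=100).T

print(np.corrcoef(x, y)[0, 1])  # sample correlation, close to (but not exactly) 0.6
```

Note that any single draw only approximates the target; a generator like this would typically be re-run or adjusted until the sample correlation is close enough to 0.6.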
In the pilot experiment, the researchers tested 19 pairs of variables representing different data contexts, chosen to vary in how plausible a causal relationship between the variables would seem.
For example, one visualization presented data on the association between smoking and lung cancer risk, while another visualization presented data on the association between usage of Microsoft Internet Explorer and homicide rates.
The results from the pilot experiment were used to select 4 data contexts (for Experiment 1) that differed significantly in their mean ratings of correlation and causation and covered a wide range of ratings. Despite being rated differently by participants, the visualizations for the different contexts were based on the same dataset.
The results from the pilot experiment were also used to develop a rubric to evaluate the responses for the generative task in Experiment 1.
Experiment 1 was designed to test different types of visualizations with different data contexts.
Each participant in Experiment 1 was presented with a series of 4 visualizations that differed in design type (text, bar, line, scatter) and data context (the four contexts selected from the pilot). Across the full set of participants, every data context was tested using every type of visualization in every order position (first, second, third, last).
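The paper does not spell out the assignment scheme behind this counterbalancing, but a simple cyclic construction achieves the stated coverage. A sketch (in Python; the 16-group structure and the condition names are illustrative assumptions):

```python
# 16 counterbalancing groups indexed by (g, h). A participant in group (g, h)
# sees, at order position p, the design type (g + p) % 4 rendered in data
# context (h + p) % 4. Each participant gets every type and every context
# exactly once, and across all 16 groups every (type, context) pair occurs
# at every order position exactly once.
types = ["text", "bar", "line", "scatter"]
contexts = ["context 1", "context 2", "context 3", "context 4"]  # placeholders

schedules = {
    (g, h): [(types[(g + p) % 4], contexts[(h + p) % 4]) for p in range(4)]
    for g in range(4) for h in range(4)
}

for group, trials in sorted(schedules.items()):
    print(group, trials)
```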
For Experiment 2, the researchers created visualizations using different visual encoding marks (bar, line, or dot) with different levels of data aggregation (2 bins, 8 bins, or 16 bins). The line graphs and dot plots used the same data bins as the bar graphs; the bars were simply replaced with either lines or dots.
The researchers also removed the data context from all the visualizations and instead used abstract variable labels (such as X and Y).
Each participant in Experiment 2 was presented with a series of 3 visualizations that differed in encoding type (bar, line, dot) and level of data aggregation (2 bins, 8 bins, 16 bins).
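To make the aggregation manipulation concrete, the sketch below (Python with Matplotlib; the libraries and the equal-count binning rule are assumptions, since the paper's exact procedure isn't reproduced here) groups one correlated dataset into 2, 8, or 16 bins by x-value and draws the bin means with bar, line, and dot encodings under abstract labels:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
x, y = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=100).T

def aggregate(x, y, n_bins):
    """Bin points by x-value (equal-count bins) and return the mean x and y per bin."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)
    bin_x = np.array([x[idx == b].mean() for b in range(n_bins)])
    bin_y = np.array([y[idx == b].mean() for b in range(n_bins)])
    return bin_x, bin_y

fig, axes = plt.subplots(3, 3, figsize=(9, 8))
for row, n_bins in enumerate([2, 8, 16]):   # levels of aggregation
    bx, by = aggregate(x, y, n_bins)
    axes[row, 0].bar(bx, by, width=0.2)     # bar encoding
    axes[row, 1].plot(bx, by)               # line encoding
    axes[row, 2].scatter(bx, by)            # dot encoding
for ax in axes.flat:
    ax.set_xlabel("X")                      # abstract variable labels, as in the study
    ax.set_ylabel("Y")
plt.tight_layout()
plt.show()
```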
The participants completed judgment tasks to rate their agreement (or disagreement) with correlation and causation statements given for each visualization.
Experiment 3 was designed to remove the effect of data aggregation and focus on visual encoding type.
For Experiment 3, the researchers created bar graphs, line graphs, and scatter plots with non-aggregated data. Each visualization had 16 data markers (bars, line segments, or dots).
For each type of encoding mark, they created one visualization in which the data were ordered by increasing Y-value and another in which the data were unsorted (but still showed a positive correlation).
As in the previous experiment, the data context was removed from all visualizations and abstract variable labels (such as X and Y) were used.
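A minimal sketch of the sorted-versus-unsorted manipulation, under the assumption (an interpretation, since the construction details aren't reproduced here) that the 16 y-values are drawn once and plotted against ordinal position, either in their original order or re-sorted so Y increases monotonically:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
pos = np.arange(1, 17)                        # ordinal position along the x-axis
y = 10 + 0.5 * pos + rng.normal(0, 3.0, 16)   # positively correlated with position

fig, axes = plt.subplots(2, 3, figsize=(10, 6))
for row, values in enumerate([y, np.sort(y)]):  # row 0: unsorted, row 1: sorted by increasing Y
    axes[row, 0].bar(pos, values)               # 16 bars
    axes[row, 1].plot(pos, values)              # 16 points joined by line segments
    axes[row, 2].scatter(pos, values)           # 16 dots
for ax in axes.flat:
    ax.set_xlabel("X")
    ax.set_ylabel("Y")
plt.tight_layout()
plt.show()
```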
Each participant in Experiment 3 was presented with all six visualizations, with the presentation order varied across participants. The participants completed judgment tasks to rate their agreement (or disagreement) with correlation and causation statements given for each visualization.
The results from the pilot and Experiment 1 showed that data context has a significant effect on correlation and causation ratings, consistent with previous research.
The results from Experiments 2 and 3 show that a data visualization’s design can affect how viewers will perceive a causal relationship between the variables:
Dot encodings produce the highest perceived causality, followed by line encodings, with bar encodings producing the lowest.
Visualizations with greater data aggregation (i.e., fewer groups or bins) have higher perceived causality, and perceived causality decreases as the data become less aggregated (i.e., more groups or bins).
Aggregated data has higher perceived causality than non-aggregated data, even when the number and type of encoding markers are the same.
There is an interaction effect between the type of encoding mark and the amount of data aggregation, which can complicate these patterns. For example, bar graphs with moderate aggregation (8 bins) have higher perceived causality than bar graphs with either greater (2 bins) or less (16 bins) aggregation.
The experiments purposefully focused on simple visualization designs commonly encountered in media, so it’s not known how the results might compare for complex visualizations.
The experiments purposefully used the same dataset to create all the visualizations. This dataset, drawn from a normal distribution, had a positive correlation of 0.6, so it's not known how the results might compare for datasets with other distributions, correlations, or patterns.
Shah, P., Michal, A., Ibrahim, A., Rhodes, R., & Rodriguez, F. (2017). What makes everyday scientific reasoning so challenging? In B. H. Ross (Ed.), Psychology of Learning and Motivation (Vol. 66, pp. 251-299). Elsevier. https://doi.org/10.1016/bs.plm.2016.11.006
Xiong, C., Shapiro, J., Hullman, J., & Franconeri, S. (2019). Illusion of causality in visualized data. IEEE Transactions on Visualization and Computer Graphics, 26(1), 853-862. https://doi.org/10.48550/arXiv.1908.00215