A common data interpretation mistake is to assume that a correlation between two variables indicates a causal relationship between them. Even researchers can struggle with this distinction, as finding a correlation can be relatively easy but establishing evidence for a causal explanation is typically challenging.
People routinely see data visualizations (tables, bar graphs, line graphs, maps, and so on) in everyday contexts and media. These visualizations are intended to make data easier to understand. Unfortunately, the common mistake of inferring causation from mere correlation means they can instead lead to misconceptions.
Previous research has found that perceptions of causality can be context-dependent. When data aligns with a person’s prior beliefs and experiences, they are more likely to judge evidence as sound and stop thinking through other possible explanations (Shah et al., 2017, p. 270).
How does the design of a data visualization affect people’s perception of correlation and causation? Are there design choices that could help reduce the chances of people mistakenly assuming a causal relationship that may not exist?
These questions were the focus of an investigation by a team at Northwestern University in their research paper titled “Illusion of Causality in Visualized Data” (Xiong et al., 2019).
The investigation by Xiong et al. (2019) included a pilot experiment and three follow-up experiments. For all of the experiments, participants were recruited through Amazon Mechanical Turk and completed visualization tasks using an online survey.
Participants were shown a visualization and asked to answer graph-reading comprehension questions (true/false responses). These questions were used as a check to screen out participants who did not demonstrate basic graph-reading skills.
For the pilot experiment and Experiment 1, participants then completed a generative task that asked them to explain, in several sentences, what they concluded from the visualization and why.
Next, participants completed two judgment tasks. First, they were given a statement describing the correlation between the variables and asked to rate their agreement with it on a scale from 0 (disagree) to 100 (agree).
Then they were given a statement describing a causal relationship between the variables and asked to rate their agreement with it on the same scale.
The experiments tested three types of visualizations that people commonly encounter: bar graphs, line graphs, and scatter plots. Experiment 1 also presented text-only descriptions, each written to match its corresponding bar graph.
To create the data visualizations, the researchers randomly generated 100 data points with a correlation of 0.6 from a normal distribution. This same dataset was used for every visualization in all the experiments, regardless of the visualization design or the data context. Only the axis labels were changed.
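The paper does not include the generation code, but a dataset like this can be produced in a few lines. Here is a minimal sketch in Python with NumPy (the tooling and the seed are assumptions, not details from the paper), drawing 100 points from a bivariate normal distribution whose covariance is set to give a correlation of 0.6:

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only

# Covariance matrix for two unit-variance variables with correlation 0.6.
mean = [0.0, 0.0]
cov = [[1.0, 0.6],
       [0.6, 1.0]]

# Draw 100 (x, y) pairs; .T splits the two columns into separate arrays.
x, y = rng.multivariate_normal(mean, cov, size=100).T

print(np.corrcoef(x, y)[0, 1])  # sample correlation, close to (but not exactly) 0.6
```

Note that any single draw only approximates the target; a generator like this would typically be re-run or adjusted until the sample correlation is close enough to 0.6.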
In the pilot experiment, the researchers tested 19 pairs of variables representing different data contexts, chosen to vary in how plausible a causal relationship between the variables would seem.
For example, one visualization presented data on the association between smoking and lung cancer risk, while another visualization presented data on the association between usage of Microsoft Internet Explorer and homicide rates.
The results from the pilot experiment were used to select 4 data contexts (for Experiment 1) that differed significantly in their mean ratings of correlation and causation and covered a wide range of ratings. Despite being rated differently by participants, the visualizations for the different contexts were based on the same dataset.
The results from the pilot experiment were also used to develop a rubric to evaluate the responses for the generative task in Experiment 1.
Experiment 1 was designed to test different types of visualizations with different data contexts.
Each participant in Experiment 1 was presented with a series of 4 visualizations that differed in design type (text, bar, line, scatter) and data context (the four contexts selected from the pilot). Across the full set of participants, every data context was tested using every type of visualization in every order position (first, second, third, last).
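The paper does not spell out the assignment scheme behind this counterbalancing, but a simple cyclic construction achieves the stated coverage. A sketch (in Python; the 16-group structure and the condition names are illustrative assumptions):

```python
# 16 counterbalancing groups indexed by (g, h). A participant in group (g, h)
# sees, at order position p, the design type (g + p) % 4 rendered in data
# context (h + p) % 4. Each participant gets every type and every context
# exactly once, and across all 16 groups every (type, context) pair occurs
# at every order position exactly once.
types = ["text", "bar", "line", "scatter"]
contexts = ["context 1", "context 2", "context 3", "context 4"]  # placeholders

schedules = {
    (g, h): [(types[(g + p) % 4], contexts[(h + p) % 4]) for p in range(4)]
    for g in range(4) for h in range(4)
}

for group, trials in sorted(schedules.items()):
    print(group, trials)
```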
For Experiment 2, the researchers created visualizations using different visual encoding marks (bar, line, or dot) with different levels of data aggregation (2 bins, 8 bins, or 16 bins). The line graphs and dot plots used the same data bins as the bar graphs; the bars were simply replaced with either lines or dots.
The researchers also removed the data context from all the visualizations and instead used abstract variable labels (such as X and Y).
Each participant in Experiment 2 was presented with a series of 3 visualizations that differed in encoding type (bar, line, dot) and level of data aggregation (2 bins, 8 bins, 16 bins).
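To make the aggregation manipulation concrete, the sketch below (Python with Matplotlib; the libraries and the equal-count binning rule are assumptions, since the paper's exact procedure isn't reproduced here) groups one correlated dataset into 2, 8, or 16 bins by x-value and draws the bin means with bar, line, and dot encodings under abstract labels:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
x, y = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=100).T

def aggregate(x, y, n_bins):
    """Bin points by x-value (equal-count bins) and return the mean x and y per bin."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)
    bin_x = np.array([x[idx == b].mean() for b in range(n_bins)])
    bin_y = np.array([y[idx == b].mean() for b in range(n_bins)])
    return bin_x, bin_y

fig, axes = plt.subplots(3, 3, figsize=(9, 8))
for row, n_bins in enumerate([2, 8, 16]):   # levels of aggregation
    bx, by = aggregate(x, y, n_bins)
    axes[row, 0].bar(bx, by, width=0.2)     # bar encoding
    axes[row, 1].plot(bx, by)               # line encoding
    axes[row, 2].scatter(bx, by)            # dot encoding
for ax in axes.flat:
    ax.set_xlabel("X")                      # abstract variable labels, as in the study
    ax.set_ylabel("Y")
plt.tight_layout()
plt.show()
```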
The participants completed judgment tasks to rate their agreement (or disagreement) with correlation and causation statements given for each visualization.
Experiment 3 was designed to remove the effect of data aggregation and focus on visual encoding type.
For Experiment 3, the researchers created bar graphs, line graphs, and scatter plots with non-aggregated data. Each visualization had 16 data markers (bars, line segments, or dots).
For each type of encoding mark, they created one visualization in which the data were ordered by increasing Y-value and another in which the data were unsorted (but still showed a positive correlation).
As in the previous experiment, the data context was removed from all visualizations and abstract variable labels (such as X and Y) were used.
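A minimal sketch of the sorted-versus-unsorted manipulation, under the assumption (an interpretation, since the construction details aren't reproduced here) that the 16 y-values are drawn once and plotted against ordinal position, either in their original order or re-sorted so Y increases monotonically:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
pos = np.arange(1, 17)                        # ordinal position along the x-axis
y = 10 + 0.5 * pos + rng.normal(0, 3.0, 16)   # positively correlated with position

fig, axes = plt.subplots(2, 3, figsize=(10, 6))
for row, values in enumerate([y, np.sort(y)]):  # row 0: unsorted, row 1: sorted by increasing Y
    axes[row, 0].bar(pos, values)               # 16 bars
    axes[row, 1].plot(pos, values)              # 16 points joined by line segments
    axes[row, 2].scatter(pos, values)           # 16 dots
for ax in axes.flat:
    ax.set_xlabel("X")
    ax.set_ylabel("Y")
plt.tight_layout()
plt.show()
```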
Each participant in Experiment 3 was presented with all six visualizations, with the presentation order varied across participants. The participants completed judgment tasks to rate their agreement (or disagreement) with correlation and causation statements given for each visualization.
The results from the pilot and Experiment 1 showed that data context has a significant effect on correlation and causation ratings, consistent with previous research.
The results from Experiments 2 and 3 show that a data visualization’s design can affect how viewers will perceive a causal relationship between the variables:
Dot encodings produce the highest perceived causality, followed by line encodings, with bar encodings producing the lowest.
Visualizations with greater data aggregation (i.e., fewer groups or bins) have higher perceived causality, and perceived causality decreases as the data become less aggregated (i.e., more groups or bins).
Aggregated data has higher perceived causality than non-aggregated data, even when the number and type of encoding markers are the same.
There is an interaction effect between the type of encoding mark and the amount of data aggregation, which can complicate these patterns. For example, bar graphs with moderate aggregation (8 bins) have higher perceived causality than bar graphs with either greater (2 bins) or less (16 bins) aggregation.
The experiments purposefully focused on simple visualization designs commonly encountered in media, so it’s not known how the results might compare for complex visualizations.
The experiments purposefully used the same dataset to create all the visualizations. This dataset, drawn from a normal distribution, had a positive correlation of 0.6, so it's not known how the results might compare for datasets with other distributions, correlations, or patterns.
Shah, P., Michal, A., Ibrahim, A., Rhodes, R., & Rodriguez, F. (2017). What makes everyday scientific reasoning so challenging? In B. H. Ross (Ed.), Psychology of Learning and Motivation (Vol. 66, pp. 251-299). Elsevier. https://doi.org/10.1016/bs.plm.2016.11.006
Xiong, C., Shapiro, J., Hullman, J., & Franconeri, S. (2019). Illusion of causality in visualized data. IEEE Transactions on Visualization and Computer Graphics, 26(1), 853-862. https://doi.org/10.48550/arXiv.1908.00215