Replication of Fluency and the Detection of Misleading Questions: Low Processing Fluency Attenuates The Moses Illusion by Hyunjin Song & Norbert Schwarz (2008, Social Cognition)

Author

Ashley Monteiro (asmonteiro@ucsd.edu)

Published

December 6, 2024

Introduction

The paper replicated here is “Fluency and the Detection of Misleading Questions: Low Processing Fluency Attenuates The Moses Illusion” by Hyunjin Song and Norbert Schwarz. The original paper explores how processing fluency affects a person’s ability to detect misleading questions. Specifically, it examines how low processing fluency can reduce susceptibility to the Moses Illusion, a phenomenon in which people fail to detect a distortion in a misleading question. This report replicates Experiment 1, which tests whether the font in which a question is printed influences reading ease and, in turn, the ability to detect a misleading question.

GitHub Repository

Pre-Registration

Link to experiment (easy to read condition):
Trivia Q1

Link to experiment (hard to read condition):
Trivia Q2

Methods

Power Analysis

The original paper did not report an effect size. The original study’s achieved power was 59% with 32 participants; we calculated that 52 participants are needed to achieve 80% power.
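
The report does not say how these power figures were computed. As a rough sketch only, the code below shows one way such a calculation could be run in base R, assuming a 1-df chi-squared test at alpha = .05. The effect size `w_assumed` is a hypothetical value chosen for illustration and is not taken from the original paper or from our pre-registration; the helper name `chisq_power` is also ours. Under that assumption, the power at n = 32 comes out close to the 59% stated above, and the smallest sample reaching 80% power matches the planned n.

```r
# Power of a chi-squared test for total sample size n and Cohen's effect
# size w, using the noncentral chi-squared distribution (ncp = n * w^2).
chisq_power <- function(n, w, alpha = 0.05, df = 1) {
  crit <- qchisq(1 - alpha, df = df)
  pchisq(crit, df = df, ncp = n * w^2, lower.tail = FALSE)
}

# Hypothetical effect size: an assumption for illustration, NOT a value
# reported in the source.
w_assumed <- 0.39

# Power with the original study's 32 participants under this assumption.
power_original <- chisq_power(32, w_assumed)

# Smallest total n that reaches at least 80% power under the same assumption.
candidate_n <- 10:200
n_needed <- candidate_n[which(chisq_power(candidate_n, w_assumed) >= 0.80)[1]]

power_original
n_needed
```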

Planned Sample

n = 52

Materials

The study was conducted online; therefore, no physical materials were necessary.

Procedure

Quoted from the original paper: “Thirty-two undergraduates participated for course credit. They were randomly assigned to an easy- vs. difficult-to-read condition. The instructions (modeled after Erickson & Mattson, 1981) read, “You will read a couple of trivia questions and answer them. You can write the answer in the blank. In case you do not know the answer, please write ‘don’t know.’ You may or may not encounter ill-formed questions which do not have correct answers if taken literally. For instance, you might see the question ‘Why was President Gerald Ford forced to resign his office?’ In fact, Gerald Ford was not forced to resign. Please, write ‘can’t say’ for this type of question.” Depending on condition, participants were presented with two questions printed in a hard-to-read or easy-to-read font, as described above. The first (control) question did not have a distortion. It read, “Which country is famous for cuckoo clocks, chocolate, banks, and pocket knives?” (Switzerland). The second, distorted question read, “How many animals of each kind did Moses take on the Ark?” (taken from Erickson & Mattson, 1981). This question replaces the correct actor, Noah, with Moses and should be answered “can’t say.” Answering “2” indicates the Moses illusion.”

The only exception to the original procedure is the participant pool. Participants were recruited via Prolific and were not undergraduates participating for course credit.

Analysis Plan

Quoted from the original paper: “To assess the robustness of the results across both experiments, we used Rosenthal’s (1978) procedures for combining results of independent studies. This analysis confirms the overall reliability of the observed patterns. When the question was distorted, participants were less likely to give an erroneous substantive answer, z = 3.26, p < .002, and more likely to recognize the distortion (as indicated by answering ‘can’t say’), z = 2.89, p < .004, when the font was difficult rather than easy to read. When the question was undistorted, participants were less likely to give a correct substantive answer, z = 2.5, p < .02, and more likely to report that they “don’t know,” z = 2.45, p < .02, when the font was difficult rather than easy to read.”

The key analysis of interest is the two-proportion z-test comparing the proportion of “2” answers to the distorted question between the two conditions.
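
For reference, the pooled two-proportion z statistic used in our analysis code can be written as follows. Here x_1 and x_2 are the numbers of “2” answers in the two conditions and n_1 and n_2 are the corresponding group sizes; these symbol names are ours and do not come from the original paper.

$$
\hat{p}_1 = \frac{x_1}{n_1}, \qquad
\hat{p}_2 = \frac{x_2}{n_2}, \qquad
\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}, \qquad
z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
$$

The two-sided p-value is then 2(1 - Φ(|z|)), where Φ is the standard normal cumulative distribution function.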

Differences from Original Study

Instead of recruiting undergraduates for course credit, we will be recruiting participants via Prolific, an online research platform. We do not believe this change will have any impact on the data.

Additionally, we had our participants answer a knowledge question to test their biblical knowledge. If a participant fails the knowledge question, their answer to the distorted question cannot be attributed to fluency or disfluency.

Methods Addendum (Post Data Collection)

Actual Sample

n = 52 (in some analyses one subject was excluded due to failing the knowledge question)

Differences from pre-data collection methods plan

None.

Unexpected Events

Something went wrong with our jsPsych code, so, unfortunately, we had to enter all of our data into a spreadsheet manually. Next time, we will integrate our OSF information into our jsPsych code more carefully so that our data are cleaner.

Results

Data preparation

Data were manually cleaned and then imported into R.

Confirmatory analysis

The primary analysis of interest is a z-test of proportions comparing the proportion of “2” responses to the distorted question between the two conditions. It tests whether the rate of this incorrect response, which indicates the Moses Illusion, differs with the processing fluency of the two fonts.
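
As a minimal sketch, the pooled z-test can be wrapped in a small helper so that the same calculation is not retyped for every comparison. The function name `two_prop_z` and its argument names are our own for illustration. The counts passed in are the ones reported in the Summary of Replication Attempt (23 of 25 in the Arial condition and 18 of 26 in the Brush Script condition). The cross-check relies on the fact that, without continuity correction, the chi-squared statistic from base R’s `prop.test` equals z squared.

```r
# Pooled two-proportion z-test (two-sided).
# x1, x2: number of target responses in each group; n1, n2: group sizes.
two_prop_z <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  p_pool <- (x1 + x2) / (n1 + n2)
  se <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
  z <- (p1 - p2) / se
  list(z = z, p_value = 2 * pnorm(-abs(z)))
}

# Counts as reported in the summary: "2" answers in Arial vs. Brush Script.
res <- two_prop_z(23, 25, 18, 26)

# Cross-check against base R. suppressWarnings() hides the small-expected-count
# warning that prop.test may raise for samples of this size.
chk <- suppressWarnings(
  prop.test(x = c(23, 18), n = c(25, 26), correct = FALSE)
)
all.equal(res$z^2, unname(chk$statistic))
```

A helper like this also makes it harder to mix up which total belongs to which count, since each group’s count and size are passed together.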

Exploratory analyses

On the distorted question: (1) a z-test comparing the proportions of people who answered “can’t say” in the two conditions; (2) a z-test comparing the proportions of people who answered “can’t say” or gave an answer indicating they may have caught the replacement of Noah with Moses (subjects 24, 28, 34, and 42).

On the non-distorted (control) question: (1) a z-test comparing the proportions of people who answered “Switzerland”; (2) a z-test comparing the proportions of people who answered with a country or region other than Switzerland; (3) a z-test comparing the proportions of people who answered “don’t know”.

Confirmatory Analyses Code

## Setting up libraries and reading data.
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(readr)
data <- read.csv("Stats replication data - Data.csv")
## Our confirmatory analysis examining the proportion of participants who answered "2" to the distorted question.
## The totals below include only participants whose Knowledge answer mentions Noah.
brush_total <- data %>%
  filter(Condition == "Brush", grepl("Noah", Knowledge)) %>%
  nrow()

arial_total <- data %>%
  filter(Condition == "Arial", grepl("Noah", Knowledge)) %>%
  nrow()

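## Note: these counts are taken over all participants in each condition and are not filtered by the Knowledge answer.
brush_count <- sum(data$Distorted == "2" & data$Condition == "Brush")
arial_count <- sum(data$Distorted == "2" & data$Condition == "Arial")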

p1 <- arial_count/arial_total
p2 <- brush_count/brush_total

p <- (brush_count + arial_count)/(brush_total + arial_total)

z <- (p1 - p2)/sqrt(p*(1-p)*(1/brush_total + 1/arial_total))
z
[1] 2.047379
p_value <- 2 * (1 - pnorm(abs(z)))
p_value
[1] 0.04062091
proportions <- c(p1, p2)  
conditions <- c("Arial", "Brush")  
barplot(proportions, 
        names.arg = conditions,  
        col = c("lightblue", "lightgreen"), 
        main = "Proportion of '2' Responses to Distorted Question",  
        ylab = "Proportion",  
        ylim = c(0, 1))  
text(x = 1, 
     y = p1 + 0.05,  
     labels = paste("Z =", round(z, 3), "\nP-value =", round(p_value, 3)),  
     col = "black",  
     cex = 0.8)  

Below is the table from the original paper to compare to our own graph above.

Original Paper’s Table

Exploratory Analyses Code

## Z-test of participants who responded "can't say" to our distorted question in the two different conditions.
brush_count2 <- sum(data$Distorted == "Can't say" & data$Condition == "Brush")
arial_count2 <- sum(data$Distorted == "Can't say" & data$Condition == "Arial")
p1 <- brush_count2/brush_total
p2 <- arial_count2/arial_total
p <- (brush_count2 + arial_count2)/(brush_total + arial_total)
z <- (p1 - p2)/sqrt(p*(1-p)*(1/brush_total + 1/arial_total))
z
[1] 1.001026
p_value <- 2 * (1 - pnorm(abs(z)))
p_value
[1] 0.316814
## The previous test compared the proportion of participants in the two conditions who noticed the distortion by answering "Can't say". However, some participants gave alternative responses, such as saying that Moses didn't build an ark.

## The test below additionally counts those participants: it compares the proportion in each condition who either answered "Can't say" or gave a response indicating they noticed the distortion. These counts were made manually.

brush_count3 <- 7
arial_count3 <- 1
p1 <- brush_count3/brush_total
p2 <- arial_count3/arial_total
p <- (brush_count3 + arial_count3)/(brush_total + arial_total)
z <- (p1 - p2)/sqrt(p*(1-p)*(1/brush_total + 1/arial_total))
z
[1] 2.250274
p_value <- 2 * (1 - pnorm(abs(z)))
p_value
[1] 0.02443154
## Z-test comparing the proportion of people who responded "Switzerland" to the control question.

brush_total2 <- sum(data$Condition == "Brush")
arial_total2 <- sum(data$Condition == "Arial")
brush_count4 <- sum(data$Control == "Switzerland" & data$Condition == "Brush")
arial_count4 <- sum(data$Control == "Switzerland" & data$Condition == "Arial")
p1 <- brush_count4/brush_total2
p2 <- arial_count4/arial_total2
p <- (brush_count4 + arial_count4)/(brush_total2 + arial_total2)
z <- (p1 - p2)/sqrt(p*(1-p)*(1/brush_total2 + 1/arial_total2))
z
[1] 0.5563486
p_value <- 2 * (1 - pnorm(abs(z)))
p_value
[1] 0.5779725
## Z-test comparing the proportion of people who responded with a country or region other than Switzerland to the control question; this was also counted manually.

brush_count5 <- 7
arial_count5 <- 10
p1 <- brush_count5/brush_total2
p2 <- arial_count5/arial_total2
p <- (brush_count5 + arial_count5)/(brush_total2 + arial_total2)
z <- (p1 - p2)/sqrt(p*(1-p)*(1/brush_total2 + 1/arial_total2))
z
[1] -0.8868791
p_value <- 2 * (1 - pnorm(abs(z)))
p_value
[1] 0.375144
## Z-test comparing the proportion of people who responded that they didn't know the answer to the control question.

brush_count6 <- sum(data$Control == "Don't know" & data$Condition == "Brush")  + sum(data$Control == "I don't Know" & data$Condition == "Brush")
arial_count6 <- sum(data$Control == "Don't know" & data$Condition == "Arial")
p1 <- brush_count6/brush_total2
p2 <- arial_count6/arial_total2
p <- (brush_count6 + arial_count6)/(brush_total2 + arial_total2)
z <- (p1 - p2)/sqrt(p*(1-p)*(1/brush_total2 + 1/arial_total2))
z
[1] 0
p_value <- 2 * (1 - pnorm(abs(z)))
p_value
[1] 1
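
The “don’t know” count above adds up two spellings by hand. As a sketch of a less error-prone alternative, free-text answers could be normalized before counting. The `responses` vector below is hypothetical example data, not our actual responses, and the helper name `normalize_response` is ours; in practice the input would be a column such as `data$Control`.

```r
# Hypothetical free-text answers for illustration only.
responses <- c("Don't know", "I don't Know", " don't know ", "Switzerland")

# Lower-case the answers and strip surrounding whitespace so that spelling
# variants of the same answer compare equal.
normalize_response <- function(x) {
  trimws(tolower(x))
}

# Flag every answer that contains "don't know" after normalization.
is_dont_know <- grepl("don't know", normalize_response(responses), fixed = TRUE)
sum(is_dont_know)
```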

Summary of Replication Attempt

Regarding our confirmatory findings, the result for the experimental (distorted) question replicated; however, the results for the control (non-distorted) question did not. Participants were more likely to answer ‘2’ in the easy-to-read font condition (92%, 23/25) than in the difficult-to-read font condition (69%, 18/26), z = 2.05, p = .04. Regarding our exploratory findings, more participants answered ‘Can’t say’ or otherwise indicated that they noticed the distortion in the difficult-to-read font condition (27%, 7/26) than in the easy-to-read font condition (4%, 1/25), z = 2.25, p = .02. The comparison restricted to literal ‘Can’t say’ answers was not significant, z = 1.00, p = .32.

Commentary

We are unsure why our control condition did not replicate. The main question of interest was whether the experimental (distorted) condition would replicate, so we are happy with our findings. We also found our exploratory analyses interesting: although our confirmatory findings already showed the role of fluency clearly, the additional analyses were consistent with them. If our team were to replicate the study again, we would collect participants’ ages in order to see how fluency effects change across the lifespan.

Dr. Norbert Schwarz raised a few objections. We were able to address some of them in our study, the most important being the addition of a knowledge question to assess prior biblical knowledge. One suggestion that we did not have time to implement was keeping the instructions displayed throughout the entire experiment.

Extra Information

Average completion time was 2 minutes and 24 seconds.

Design Overview

How many factors were manipulated?

One: processing fluency, manipulated via an easy-to-read or a hard-to-read font.

How many measures were taken?

One, which was the answer to the trivia questions.

Did it use a within-participants or between-participants design?

Between participants.

Were measures repeated?

No.

What would have been the consequence of changing one of these decisions? (e.g. moving from within to between participants). Could it have been either way?

A within-subjects design would likely have influenced the results. Participants may not have been as distinctly affected by low versus high processing fluency if the two conditions had been presented consecutively.

Were steps taken to reduce demand characteristics?

I think the original authors could have minimized demand characteristics further. The hard-to-read font selected by the researchers may have revealed the experiment’s manipulation because it is so unusual, possibly causing participants to spend more time on the questions and thereby decreasing the occurrence of the Moses Illusion. I believe a font that is more challenging than Arial but not as unconventional as Brush Script would have been a better choice.

How would you critique the design?

I don’t think it was a good idea for the original authors to provide an example of a distorted question that should be answered with “can’t say”. This might make participants more aware of the font and question format in the subsequent trials of the main experiment.

Can you think of any potential confounds to the study?

I think a significant potential confound to this study is prior knowledge, specifically whether or not participants are aware of the correct answer to the Moses question.

Do you see any limitations to generalizability?

I believe the original authors could have included more trials, participants, and even stimuli to enhance the validity and generalizability of the results.