Replication of Fluency and the detection of misleading questions: Low processing fluency attenuates the Moses Illusion (Experiment 1) by Song & Schwarz (2008, Social Cognition)
Authors
Katie Allen (k7allen@ucsd.edu)
Jae Kwon (jwkwon@ucsd.edu)
Ashley Monteiro (asmonteiro@ucsd.edu)
Kosta Boskovic (koboskovic@ucsd.edu)
Published
December 11, 2024
Introduction
This cognitive research paper is a perceptual investigation of fluency, specifically processing fluency, which is the ease at which information is processed in our cognitive system. In this study, the researchers question if lowering processing fluency can cause us to use more deliberate thinking and recognize errors in an intuitive question (e.g., How many animals of each kind did Moses take on the Ark?). In one experiment, the authors sought to answer this question by having undergraduate participants answer two trivia questions (distorted and non-distorted) in either an easy-to-read or hard-to-read font. We are attempting to replicate their finding that, when font is easy-to-read, we are more likely to answer distorted questions incorrectly because of high processing fluency.
The authors did not report an effect size. A post-hoc power analysis indicated their experiment achieved 60% power. Based on a prior power analysis, a sample size of 52 was required to achieve 80% power. This was a feasible sample size.
Planned Sample
Fifty two Prolific users (26 in each condition) participated for $1.00 in compensation for 5 minutes.
Materials
The stimuli consisted of 2 sets each containing two questions written in two contrasting fonts. One question contained a distortion while the other did not. “Which country is famous for cuckoo clocks, chocolate, banks, and pocket knives?” (A: Switzerland) is the non-distorted version, while “How many animals of each kind did Moses take on the Ark?” (A: can’t say) is the distorted version.
This replication experiment was conducted online through a JSPsych survey experiment on Prolific. Participants were randomly assigned to the easy vs hard-to-read conditions. They either saw the easy or hard to read text font versions of the two trivia questions. They were also informed that some of the questions they encountered might not have right answers and were encouraged to answer by selecting ‘can’t say’ when this happened. Then, the participants answered the non-distorted question followed by the distorted question. Afterwards, they answered a biblical knowledge check question.
Specific instructions given to each participant before the trials:
“You will read a couple of trivia questions and answer them. You can write the answer in the textbox provided. In case you do not know the answer, please type “don’t know”. You may or may not encounter ill-formed questions which do not have correct answers if taken literally.
For instance, you might see the question “Why was President Gerald Ford forced to resign his office?” In fact, Gerald Ford was not forced to resign. Please, type “can’t say” for this type of question.
Click “Next” to begin.”
Analysis Plan
In line with the original paper, we will analyze the data using Z scores comparing the proportion of “2” and “can’t say” responses in the easy vs. hard-to-read distorted condition. This will allow us to see if low processing fluency helps us answer distorted questions correctly.
The original researchers did not include information on cleaning or exclusion rules. We will exclude participants who do not complete the study.
The key analysis of interest here is the comparison of the proportions of “2” answers for the distorted question between the easy vs hard-to-read conditions (Z-scores). This will allow us to determine if low processing fluency helps us scrutinize information and respond more correctly.
Differences from Original Study
We used online instead of in-person data collection, recruited prolific users instead of undergraduate students, added a knowledge check, and wrote the the important instructions in bold font instead of spanning them across trials.
Methods Addendum (Post Data Collection)
We had participants who gave intelligible answers other than “can’t say” indicating they caught the Moses distortion in the question and who failed the knowledge check, so we had to modify our analysis plan to include these participants appropriately and exclude the one participant who failed the knowledge check from the main analysis.
Actual Sample
We collected a sample of 52 Prolific users. We did not collect demographic data. We excluded one participant from the key confirmatory analysis who failed the knowledge check.
Differences from pre-data collection methods plan
We further specified our plan to include participants who gave intelligible answers other than “can’t say” indicating they caught the Moses distortion in the question and participants who failed the knowledge check (only from the key analysis because we cannot include this as evidence of the Moses illusion if they didn’t have the prior knowledge to begin with).
Difficulties: We were not able to set up the JSpsych code to assign participants to conditions within the data structure which made it impossible to filter out the data into columns that were readable in r. Because of this, we had to manually enter all of our data points into a single file and relied on participants to track condition which was disguised as an “attention check.”
Results
Data Preparation & Analysis
Data was downloaded and imported into R, deidentified, and cleaned using Tidyverse prior to analysis.
### Data Preparation#### Load Relevant Libraries and Functionslibrary(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Rows: 52 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): Subject ID, Condition, Control, Distorted, Knowledge
dbl (1): Subject Number
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(data)#### Data exclusion / filteringbrush_total <- data %>%filter(Condition =="Brush", grepl("Noah", Knowledge)) %>%nrow()arial_total <- data %>%filter(Condition =="Arial", grepl("Noah", Knowledge)) %>%nrow()#### Prepare data for analysis - create columns etc.brush_count <-sum(data$Distorted =="2"& data$Condition =="Brush")arial_count <-sum(data$Distorted =="2"& data$Condition =="Arial")
Confirmatory analysis
#### creating proportionsp1 <- arial_count/arial_totalp2 <- brush_count/brush_totalp <- (brush_count + arial_count)/(brush_total + arial_total)#### z test for proportion of "2" answersz <- (p1 - p2)/sqrt(p*(1-p)*(1/brush_total +1/arial_total))z
[1] 2.047379
p_value <-2* (1-pnorm(abs(z)))p_value
[1] 0.04062091
#### visualizing data via graphproportions <-c(p1, p2) conditions <-c("Arial", "Brush") barplot(proportions, names.arg = conditions, col =c("lightblue", "lightgray"),ylab ="Proportion", ylim =c(0, 1)) %>%text(x =1, y = p1 +0.05, labels =paste("Z =", round(z, 3), "\nP-value =", round(p_value, 3)), col ="black", cex =0.8)
We wanted to test if font readability (fluency) altered frequency of the Moses illusion. So, using a z-score approach, we wanted to know if this rate of proportion differs based on processing fluency. The key analysis of interest here was the z-score comparison of the proportions of “2” answers for the distorted question between the two conditions (easy vs. hard to read fonts).
We found the same effect as the original experimenters did – participants were more likely to answer ‘2’ in the easy-to-read font condition (92%, 23/25) than in the difficult-to-read font condition (69%, 18/26), z = 2.05, p = .04.
The original experimenters only included a table of their proportion results (no graph) which are reported here for their confirmatory analysis:
Hard to read (distorted - Moses): 8/15 (53%) - answered “2”
Discussion
Summary of Replication Attempt
In this replication experiment, we recreated Song & Schwarz (2008) experiment 1 online using JSpsych and collected new data online via Prolific. Our key statistical test to replicate was the comparison of the proportions of “2” answers for the distorted question between the easy vs hard-to-read conditions (Z-scores). Based on the data analysis presented here, we successfully replicated the main claim of the original study - low processing fluency (hard-to-read font) attenuates the Moses illusion - from Song & Schwarz (2008) based on this confirmatory analysis.
Commentary
This result likely arose because fluency encourages intuitive thinking. When things are easy to process, we are more confident and quick to answer questions based on heuristics that can be subject to error when we are presented with trick questions (like switching Noah for Moses). Hard-to-read font, on the other hand, creates disfluency, which requires us to slow down and read things more carefully before answering, allowing us to catch these distortions and respond correctly.
Design Overview
One factor was manipulated (processing fluency via font readability).
One measure was taken (answer to trivia questions).
This study used a between-participants design.
Yes, their responses to the trivia questions were measured twice per participant. (They answered 2 questions, one distorted and one non distorted).
Being exposed to both font types could have revealed the purpose of the study and contaminate responses. I think it would be hard to accomplish this study within-participants without risking demand characteristics. However, I think adding more trials could help conceal the purpose if they chose to do so.
It is unclear whether they made an effort to reduce demand characteristics, but likely not. Reading the trivia question in Brush Script could be enough to reveal the purpose because we don’t see that font used much.
One critique of this design is the limited trials. I think this study would be more interesting with the distorted Moses question embedded in a set of 20-30 trials. I also think this might be difficult to generalize to today’s world because cultural knowledge has changed so much. More distorted questions should be used mixed in with nondistorted questions in a larger trial set. A potential confound could be prior (biblical) knowledge. This could impact processing fluency and was not accounted for in the study.
This study has limited generalizability for a few reasons. One, the sample was 32 Psychology undergraduates, so we cannot generalize much beyond this population. Two, without testing a range of distorted questions, it might be hard to generalize beyond the Moses illusion itself.