Replication of Study 1 by Song & Schwarz (2008, Social Cognition)

Author

Jae Kwon

Published

December 11, 2024

Github Repo Paper

Github Repo Write Up

Link to experiment (easy to read condition):
Trivia Q1

Link to experiment (hard to read condition):
Trivia Q2

Pre-Registration

Introduction

The original paper and the phenomenon it examines (processing fluency) fall within cognitive psychology. The paper, “Fluency and the Detection of Misleading Questions: Low Processing Fluency Attenuates the Moses Illusion” (Song & Schwarz, 2008), investigated how fluency affects our ability to detect errors and distortions in a question or statement. Processing fluency is the ease with which information is cognitively processed. The authors tested whether lowering processing fluency leads people to process more deliberately and skeptically (as opposed to relying on heuristics such as gut feelings) and therefore to recognize a distortion in a question (e.g., “How many animals of each kind did Moses take on the Ark?”). The experiment had two conditions: an easy-to-read font (Arial) vs. a hard-to-read font (Brush Script MT). Undergraduate participants answered two trivia questions (one non-distorted, one distorted) presented in either the easy or the hard font. The rate of the Moses illusion was compared between the two conditions to determine whether the difference in processing fluency changes how often participants fall for it.

Methods

Main Experiment

Note: Before conducting the main experiment, the authors pretested the stimuli to ensure that there was a perceptual difference (processing fluency) between the two conditions (easy to read font vs hard to read font).

Fifty-two Prolific participants took part for monetary compensation. They were randomly assigned to an easy- vs. difficult-to-read condition. The instructions (modeled after Erickson & Mattson, 1981) read, “You will read a couple of trivia questions and answer them. You can write the answer in the blank. In case you do not know the answer, please write ‘don’t know.’ You may or may not encounter ill-formed questions which do not have correct answers if taken literally. For instance, you might see the question ‘Why was President Gerald Ford forced to resign his office?’ In fact, Gerald Ford was not forced to resign. Please, write ‘can’t say’ for this type of question.”

Depending on the condition, participants were presented with two questions displayed in a hard-to-read (Brush Script MT) or easy-to-read font (Arial). The first (control) question did not have a distortion. It read, “Which country is famous for cuckoo clocks, chocolate, banks, and pocket knives?” (Switzerland). The second, distorted question read, “How many animals of each kind did Moses take on the Ark?” (taken from Erickson & Mattson, 1981). This question replaces the correct actor, Noah, with Moses and should be answered “can’t say.” Answering “2” indicates the Moses illusion. Participants provided their answers in an empty text box. Finally, to test their prior biblical knowledge, we asked them, “Did Moses or Noah build the Ark?”

Power Analysis

The original effect size was not reported. We computed the original study’s achieved power to be 59% with 32 participants. We conducted a power analysis and determined that to achieve 80% power, we needed to have a sample size of 52.
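A calculation of this kind can be sketched with base R's `power.prop.test`. The two proportions below are assumptions chosen for illustration, since this write-up does not list the inputs that were used, and the equal split of the original 32 participants into two groups of 16 is likewise assumed.

```r
# ASSUMPTION: illustrative proportions of "2" answers in each condition.
# They are not stated in this write-up and should be replaced with the
# values actually used for the power analysis.
p_easy <- 0.88
p_hard <- 0.53

# Sample size per group for 80% power, two-sided test at alpha = .05
needed <- power.prop.test(p1 = p_easy, p2 = p_hard,
                          sig.level = 0.05, power = 0.80)
n_per_group <- ceiling(needed$n)
total_n <- 2 * n_per_group

# Power achieved with 16 participants per group (ASSUMED equal split of 32)
achieved <- power.prop.test(n = 16, p1 = p_easy, p2 = p_hard,
                            sig.level = 0.05)

total_n
achieved$power
```

Under these assumed inputs the output should land close to the figures reported above; different input proportions would give a different required sample size.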

Planned Sample

n = 52 (Prolific users)

Materials

“‘Why was President Gerald Ford forced to resign his office?’ In fact, Gerald Ford was not forced to resign. Please, write ‘can’t say’ for this type of question.” –> Pilot question

“Which country is famous for cuckoo clocks, chocolate, banks, and pocket knives?” (Switzerland). –> Undistorted version

“How many animals of each kind did Moses take on the Ark?” (taken from Erickson & Mattson, 1981). –> Distorted version

These are direct quotes from the research article.

Procedure

Described in the “Main Experiment” section

Analysis Plan

For our statistical test, we will use the z-test for proportions (i.e., test of proportions) to compare the two groups: the easy-to-read vs. difficult-to-read font conditions. Specifically, for the distorted question (Moses question), we will report the proportion of participants who responded “2” and the proportion who responded “can’t say” in each condition. We will also report the proportions of “Switzerland” and “can’t say” responses to the undistorted question in each condition. These analyses will allow us to observe whether low processing fluency helps participants answer distorted questions correctly (while possibly leading to more incorrect answers for undistorted questions).

Participants who do not complete the full study will be excluded from the analysis.

KEY ANALYSIS: The key analysis of interest is the z-test of proportions comparing the proportion of “2” answers (i.e., participants who fell for the illusion) to the distorted question between the easy-to-read and hard-to-read conditions. This will allow us to determine whether low processing fluency attenuates the Moses illusion (i.e., whether participants scrutinize the question more and respond correctly more often).
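For reference, the pooled two-proportion z statistic named in this plan can be written out as follows. The symbols are introduced here only for illustration: x_e and x_h are the numbers of “2” answers in the easy and hard conditions, n_e and n_h are the corresponding group sizes, and Φ is the standard normal distribution function.

```latex
\hat{p}_e = \frac{x_e}{n_e}, \qquad
\hat{p}_h = \frac{x_h}{n_h}, \qquad
\hat{p} = \frac{x_e + x_h}{n_e + n_h}

z = \frac{\hat{p}_e - \hat{p}_h}
         {\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_e} + \dfrac{1}{n_h}\right)}},
\qquad
p\text{-value} = 2\,\bigl(1 - \Phi(|z|)\bigr)
```

A positive z means the illusion was more frequent in the easy-to-read condition, which is the direction the original study reported.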

Differences from Original Study

Our sample will differ from the original paper’s: we will recruit online participants (Prolific), whereas the original study used undergraduates.

The setting will also differ: our study will run online, whereas the original was conducted in person.

The original paper didn’t have any exclusion criteria, but we will exclude participants who do not complete the full study.

The original paper presented the instructions with the questions every time, but we will present the instructions only once at the beginning.

Methods Addendum (Post Data Collection)

Actual Sample

52 Prolific participants. One participant in the easy-to-read condition was excluded from the confirmatory analysis due to insufficient prior biblical knowledge.

Differences from pre-data collection methods plan

“None”

Results

Data preparation

The data were manually cleaned, then downloaded and imported into R. We cleaned the raw data using the tidyverse package in R (participants were deidentified using subject numbers). Once the data were cleaned, we ran the planned analysis.

Unexpected Events

Our jsPsych code wasn’t set up properly to track the assigned condition of the participants, which made combining and importing the responses into a single file difficult. Therefore, we had to manually enter all of our data points into a single file. We will work on this next time.
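One way to avoid manual entry in a future run is to make sure each saved file carries its condition and then stack the files in R. The sketch below is a minimal illustration in base R under an assumed layout: one CSV per participant with the condition encoded as a file-name prefix. The file names, the `read_one` helper, and the two demo rows are hypothetical and are not the data from this study. (jsPsych's `jsPsych.data.addProperties` can also attach a condition label to every trial, although whether that fits this experiment's setup is not something this write-up covers.)

```r
# HYPOTHETICAL layout: one CSV per participant, condition in the file name
# (e.g. "arial_001.csv"). Two tiny demo files are written to a temp folder
# so the example runs on its own.
demo_dir <- file.path(tempdir(), "responses_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Control = "Switzerland", Distorted = "2"),
          file.path(demo_dir, "arial_001.csv"), row.names = FALSE)
write.csv(data.frame(Control = "Don't know", Distorted = "Can't say"),
          file.path(demo_dir, "brush_002.csv"), row.names = FALSE)

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read one file as text and tag it with its condition and source file
read_one <- function(path) {
  responses <- read.csv(path, colClasses = "character")
  prefix <- sub("_.*$", "", basename(path))
  responses$Condition <- c(arial = "Arial", brush = "Brush")[[prefix]]
  responses$SourceFile <- basename(path)
  responses
}

# Stack all participants into a single data frame
combined <- do.call(rbind, lapply(files, read_one))
combined
```

Reading every column as character keeps free-text answers such as “2” and “Can’t say” in the same column type, so the files stack without type conflicts.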

Confirmatory analysis

As mentioned before, the key analysis of interest is the z-test of proportions comparing the proportions of “2” answers for the distorted question between the two font conditions. This analysis would allow us to determine whether there is a difference in the rate of producing the incorrect response (Moses Illusion) based on the difference in processing fluency between the two font conditions.

Exploratory analyses

We conducted a z-test of proportions comparing the proportions of “can’t say” answers to the distorted question between the two font conditions. This analysis shows how many participants in each condition caught the illusion and provided the correct answer. We also ran the same test after including other answers indicating that the participant noticed the illusion (e.g., “Moses didn’t build the Ark”).

For the non-distorted question data: Z-test comparing proportions of participants who answered “Switzerland” in the control question, z-test comparing proportions of participants who answered a country/region other than Switzerland, and z-test comparing proportions of participants who answered “don’t know”.

Confirmatory Analysis

As mentioned above, we had trouble combining the responses into a single file using jsPsych and R, so we manually created a single file with all of the participant responses.

#Comparing the proportions of "2" answers between the easy to read (Arial) vs hard to read (Brush) conditions using z-test of proportions.

library(readr)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ purrr     1.0.2
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
data <- read_csv("stats_replication_data.csv")
Rows: 52 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): SubjectID, Condition, Control, Distorted, Knowledge
dbl (1): Subject Number

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
brush_total <- data %>%
  filter(Condition == "Brush", grepl("Noah", Knowledge)) %>%
  nrow()

arial_total <- data %>%
  filter(Condition == "Arial", grepl("Noah", Knowledge)) %>%
  nrow()

brush_count <- sum(data$Distorted == "2" & data$Condition == "Brush")
arial_count <- sum(data$Distorted == "2" & data$Condition == "Arial")

p1 <- arial_count/arial_total
p2 <- brush_count/brush_total

p <- (brush_count + arial_count)/(brush_total + arial_total)

z <- (p1 - p2)/sqrt(p*(1-p)*(1/brush_total + 1/arial_total))
z
[1] 2.047379
p_value <- 2 * (1 - pnorm(abs(z)))
p_value
[1] 0.04062091
#Participants were more likely to answer '2' in the easy-to-read font condition (92%, 23/25) than in the difficult-to-read font condition (69%, 18/26), z = 2.05, p = .04.
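As a cross-check on the hand-computed statistic, the same comparison can be run with base R's `prop.test` using the counts reported above (23 of 25 in Arial, 18 of 26 in Brush). Without the continuity correction, its chi-squared statistic is the square of the pooled z. This is a sketch for verification, not part of the planned analysis.

```r
# Counts taken from the result reported above
illusion_counts <- c(Arial = 23, Brush = 18)
group_sizes <- c(Arial = 25, Brush = 26)

# correct = FALSE turns off the continuity correction so that
# X-squared equals z^2 from the pooled two-proportion z-test
check <- prop.test(x = illusion_counts, n = group_sizes, correct = FALSE)

z_from_chisq <- sqrt(unname(check$statistic))
z_from_chisq
check$p.value
```

R may warn that the chi-squared approximation could be inaccurate here, because one expected cell count falls just under 5. That is a reason to read the p-value with some caution given the small number of non-“2” answers.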

Graph

The original paper didn’t include a graph. Additionally, the original paper had a table that was not replicable because it included information that was not part of the experiment. So I will include a graph showing the results of the confirmatory analysis below:

#Graph showing the proportion of "2" answers between the conditions.

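proportions <- c(p1, p2)
conditions <- c("Arial", "Brush")
# barplot() returns the x-coordinates of the bar midpoints; store them
# instead of piping them into text(), where they would be misread as another argument
bar_midpoints <- barplot(proportions,
        names.arg = conditions,
        col = c("lightblue", "lightgreen"),
        main = "Proportion of '2' Responses to Distorted Question",
        ylab = "Proportion",
        ylim = c(0, 1))
# Annotate the first (Arial) bar with the test statistic and p-value
text(x = bar_midpoints[1],
     y = p1 + 0.05,
     labels = paste("Z =", round(z, 3), "\nP-value =", round(p_value, 3)),
     col = "black",
     cex = 0.8)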

Exploratory Analyses

#z-test of proportions of participants who responded "can't say" to the distorted question between the two conditions (answering "can't say" indicates they noticed the distortion)

brush_total2 <- data %>%
  filter(Condition == "Brush", grepl("Noah", Knowledge)) %>%
  nrow()

arial_total2 <- data %>%
  filter(Condition == "Arial", grepl("Noah", Knowledge)) %>%
  nrow()

brush_count2 <- sum(data$Distorted == "Can't say" & data$Condition == "Brush")
arial_count2 <- sum(data$Distorted == "Can't say" & data$Condition == "Arial")

p1 <- arial_count2/arial_total2
p2 <- brush_count2/brush_total2

p <- (brush_count2 + arial_count2)/(brush_total2 + arial_total2)

z <- (p1 - p2)/sqrt(p*(1-p)*(1/brush_total2 + 1/arial_total2))
z
[1] -1.001026
p_value <- 2 * (1 - pnorm(abs(z)))
p_value
[1] 0.316814
# 4% (1/25) of participants in the easy-to-read condition responded "can't say" to the distorted question, compared to 12% (3/26) in the hard-to-read condition, z = -1.00, p = 0.32
#z-test of proportions comparing the proportion of participants in both conditions who responded "can't say" or provided an answer indicating they noticed the distortion (e.g., Moses didn't build the ark). This was counted manually

brush_count3 <- 7
arial_count3 <- 1
p1 <- brush_count3/brush_total
p2 <- arial_count3/arial_total
p <- (brush_count3 + arial_count3)/(brush_total + arial_total)
z <- (p1 - p2)/sqrt(p*(1-p)*(1/brush_total + 1/arial_total))
z
[1] 2.250274
p_value <- 2 * (1 - pnorm(abs(z)))
p_value
[1] 0.02443154
# 4% (1/25) of participants in the easy-to-read condition provided a response indicating they noticed the distortion, compared to 27% (7/26) of participants in the hard-to-read condition, z = 2.25, p = 0.02
# z-test comparing the proportion of people who responded "Switzerland" to the control question (non-distorted) in the two conditions.

brush_total2 <- sum(data$Condition == "Brush")
arial_total2 <- sum(data$Condition == "Arial")

brush_count4 <- sum(data$Control == "Switzerland" & data$Condition == "Brush")
arial_count4 <- sum(data$Control == "Switzerland" & data$Condition == "Arial")

p1 <- brush_count4/brush_total2
p2 <- arial_count4/arial_total2

p <- (brush_count4 + arial_count4)/(brush_total2 + arial_total2)

z <- (p1 - p2)/sqrt(p*(1-p)*(1/brush_total2 + 1/arial_total2))
z
[1] 0.5563486
p_value <- 2 * (1 - pnorm(abs(z)))
p_value
[1] 0.5779725
# 50% (13/26) of participants in the easy-to-read condition correctly responded "Switzerland" to the control question, compared to 58% (15/26) of participants in the hard-to-read condition, z = 0.56, p = 0.58
# z-test comparing the proportion of people who responded a country or region other than Switzerland to the control question. Also counted manually

brush_count5 <- 7
arial_count5 <- 10
p1 <- brush_count5/brush_total2
p2 <- arial_count5/arial_total2

p <- (brush_count5 + arial_count5)/(brush_total2 + arial_total2)

z <- (p1 - p2)/sqrt(p*(1-p)*(1/brush_total2 + 1/arial_total2))
z
[1] -0.8868791
p_value <- 2 * (1 - pnorm(abs(z)))
p_value
[1] 0.375144
# 38% (10/26) of participants in the easy-to-read condition responded to the control question with a country or region other than Switzerland, compared to 27% (7/26) of participants in the hard-to-read condition, z = -0.89, p = 0.38.
# z-test of proportion comparing the proportion of people who responded they didn't know the answer to the control question in both conditions.

brush_count6 <- sum(data$Control == "Don't know" & data$Condition == "Brush")  + sum(data$Control == "I don't Know" & data$Condition == "Brush")
arial_count6 <- sum(data$Control == "Don't know" & data$Condition == "Arial")

p1 <- brush_count6/brush_total2
p2 <- arial_count6/arial_total2

p <- (brush_count6 + arial_count6)/(brush_total2 + arial_total2)

z <- (p1 - p2)/sqrt(p*(1-p)*(1/brush_total2 + 1/arial_total2))
z
[1] 0
p_value <- 2 * (1 - pnorm(abs(z)))
p_value
[1] 1
# 12% of participants in both conditions (3/26 in each) responded that they didn't know the answer to the control question, z = 0, p = 1.
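The five exploratory comparisons above repeat the same arithmetic, so they can also be collected in one table. The sketch below uses only the counts reported in the comments above; the `pooled_z` helper and the column names are introduced here for illustration. Every difference is taken as hard-to-read minus easy-to-read, so the sign of the first z is flipped relative to the chunk above, which subtracted in the other direction.

```r
# Pooled two-proportion z statistic (hard-to-read minus easy-to-read)
pooled_z <- function(x_hard, n_hard, x_easy, n_easy) {
  p_pooled <- (x_hard + x_easy) / (n_hard + n_easy)
  se <- sqrt(p_pooled * (1 - p_pooled) * (1 / n_hard + 1 / n_easy))
  (x_hard / n_hard - x_easy / n_easy) / se
}

# Counts copied from the results reported above
comparisons <- data.frame(
  outcome = c("Distorted: can't say",
              "Distorted: noticed distortion",
              "Control: Switzerland",
              "Control: other country/region",
              "Control: don't know"),
  x_hard = c(3, 7, 15, 7, 3),
  n_hard = c(26, 26, 26, 26, 26),
  x_easy = c(1, 1, 13, 10, 3),
  n_easy = c(25, 25, 26, 26, 26)
)

comparisons$z <- with(comparisons, pooled_z(x_hard, n_hard, x_easy, n_easy))
comparisons$p_value <- 2 * pnorm(-abs(comparisons$z))  # two-sided p-value
comparisons
```

Keeping one direction of subtraction for all rows makes the signs comparable across outcomes: a positive z means the response was more common in the hard-to-read condition.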

Discussion

In our experiment, participants were significantly more likely to answer “2” to the distorted question when the font was easy to read than when it was hard to read (confirmatory analysis). That is, we replicated the original finding for our confirmatory analysis. However, we did not replicate the original study’s result for the proportion of “can’t say” answers (our exploratory analysis). We also did not replicate the original findings for the control (non-distorted) question: we found no significant difference between the font conditions, whereas the original study found a significant difference.

Summary of Replication Attempt

As mentioned before, the primary goal of this replication project was to re-test whether processing fluency (ease of processing) influences people’s ability to notice a distortion within a question. The original study found that people were more likely to incorrectly answer “2” to the distorted question when the font was easy to read. We replicated this finding. However, we did not replicate the exploratory result (more people noticing the distortion, by answering “can’t say,” when the font is hard to read). The significant confirmatory result is consistent with the idea that when processing is easy and comfortable, we go with our gut feeling when making decisions, whereas when processing is difficult and slow, we are more likely to process skeptically and deeply.

Commentary

First, we were not able to replicate the exploratory analysis from the original study: more participants answering “can’t say” to the distorted question when the font is hard to read than when it is easy to read. This could be due to a couple of things. First, very few people answered “can’t say” in either condition. Second, participants provided answers other than “can’t say” that showed they knew the correct fact (Moses wasn’t the one with the Ark). When we pooled these responses, we found a significant result (more people noticed the distortion in the hard-to-read condition than in the easy-to-read condition). Such findings suggest that ease of processing influences our decision making.

We were not able to replicate the significant findings for the control question. Unlike for the distorted question, the original study found that more people answered the control question correctly when the font was easy to read than when it was difficult to read (i.e., the opposite effect on accuracy compared to the distorted question). However, our findings showed no significant difference in response accuracy between the two conditions. Given the small sample size and the unreported effect size in the original study, the original study could have been underpowered. Additionally, the difference in results may be due to differences in the composition of the sample (online vs in person; time difference - 2008 vs 2024 - different rate of biblical knowledge).

Finally, we couldn’t implement one recommendation by Dr. Norbert Schwarz. He recommended that we present the instructions with the questions every single time (because the original study, conducted with pen and paper, did so). However, we were unable to implement this because the study was conducted online (a combination of limited jsPsych coding knowledge, lack of space on the computer screen, font size and location, etc.).

Average Time

Average completion time was 2:24 minutes.

Statement of Contributions

Jae Kwon: Conceptualization, formal analysis, visualization, software, data curation, validation

Katherine Allen: Conceptualization, software, data curation, investigation, validation, methodology

Kosta Boskovic: Conceptualization, formal analysis, project administration, data curation, investigation, validation

Ashley Monterio: Conceptualization, formal analysis, visualization, data curation, investigation, validation

Design Choices

One –> processing fluency via font (easy vs hard to read)

There was one measure (answer to the trivia questions) which was repeated once

Between participants (randomly assigned to easy vs hard condition)

A within-subjects design could have impacted the results. Participants may not have been affected as much or as distinctly by low vs. high processing fluency if the two conditions had been presented one after the other.

I believe they could have reduced demand characteristics more. The hard-to-read font the researchers chose could have given away the manipulation because of how unusual it is, leading participants to spend more time on the question and decreasing the rate of the Moses illusion. They could have chosen a font that is harder to read than Arial but not as hard or unusual as Brush Script.

I’m not sure if it was appropriate to give an example of a distorted question that should be answered “can’t say” (rather than just telling participants to answer “can’t say” for any question where that would be applicable), because it could make participants more alert to the font and question format in the subsequent experimental trials (main experiment).

I also believe that they could have used more trials, participants, and stimuli to increase the validity and generalizability of the results. A major potential confound would be prior knowledge (whether people know the correct answer to the Moses question).