Replication of Experiment 1 by Child, Oakhill, & Garnham (2018, Language, Cognition and Neuroscience)
Author
Verity Lua (vyqlua@stanford.edu)
Published
April 11, 2024
Introduction
Does the ease of processing emotional information differ based on the emotional valence of the information and the perspective from which it is written? In Study 1 of the paper “You’re the emotional one: the role of perspective for emotion processing in reading comprehension”, Child, Oakhill, & Garnham (2018) found that participants read positively-valenced passages faster when they were written from a personal (second-person) perspective than when they were written from an onlooker’s (third-person) perspective. In contrast, there was no difference in reading times for negatively-valenced passages written from either perspective. However, a replication study by Bunderson (2020) did not find a significant difference in reading time between positively-valenced passages written from a personal perspective and those written from an onlooker’s perspective.
Hence, the current project seeks to examine the possible reason(s) why the findings failed to replicate. As a researcher in the field of affective science, I am particularly interested in how people process emotions. Furthermore, as a social psychologist, I enjoy thinking about social factors that could explain why certain effects may be more pronounced in certain contexts and samples than in others. I suspect that the effect found by Child et al. (2018) may be moderated by educational status, whereby the effect may be more pronounced among highly educated individuals, who may have had greater exposure to reading and writing personal anecdotes throughout their education. Notably, whereas Child et al. (2018) used a sample from the University of Sussex’s subject pool, Bunderson (2020) recruited participants through Amazon Mechanical Turk (MTurk), which may have introduced variability in educational status. Thus, educational status is a potential variable that could help us understand the conflicting findings.
The current project will use the same stimuli as the original study (24 scenarios × 2 perspectives = 48 emotionally-valenced passages). As in the replication study by Bunderson (2020), all stimuli and questionnaires will be administered through Qualtrics. In light of the failed replication attempt based on a sample of 40 participants, a larger sample will be used. Participants will be recruited online through the crowdsourcing platform Prolific. Participants will additionally be asked to report their number of years of formal education to facilitate exploratory analyses of the possible moderating role of educational status.
One challenge of the current project lies in the data analysis. Specifically, the main analysis in the original study by Child et al. (2018) was a linear mixed effects (LME) model, whereas the main finding that Bunderson (2020) attempted to replicate was the t-test result showing a significant difference in reading time between positively-valenced passages written from different perspectives (i.e., the simple effect of perspective within positively-valenced passages). The LME model could indicate a significant interaction between emotional valence and perspective without a significant simple effect (as in the replication study). In such a case, it would be difficult to determine whether the replication should be considered successful. Another challenge for the current project is estimating how large the possible moderation effect will be. Because the moderation effect is a novel proposition, I may misestimate its true size and thus fail to find a statistically significant effect even if a true effect exists.
Summary of prior replication attempt
There were several differences between Bunderson’s replication attempt and the original study by Child and colleagues that should be discussed. Firstly, as noted above, while Child and colleagues used a sample of college students (aged between 18 and 33), Bunderson recruited participants (aged between 24 and 70) who may or may not have attended college. Given that college students may have greater exposure to reading and writing personal anecdotes than the general population, this difference in the samples’ educational backgrounds may account for why the main finding (that individuals read positive stories faster when they are presented from the second-person perspective) was not replicated.
Secondly, there were some differences in the materials used in the two studies. While the main stimuli (the emotionally-valenced passages) and the filler passages were similar across both studies, the task instructions, the practice-trial passages, and the phrasing of the items measuring participants’ emotional state may have differed slightly, because Bunderson was unable to obtain these materials from the original authors. However, as the main effect of interest concerns reading times for the emotionally-valenced passages (which were similar across both studies), I posit that these differences were unlikely to have substantially affected the findings of Bunderson’s replication.
Thirdly, the replication study and the original study differed in the setting in which they were conducted, and consequently in their exclusion criteria. In the original study, participants completed the study in person on a computer using E-Prime software. In the replication study, however, participants completed the study online in their own time using the survey platform Qualtrics. This is notable because the key dependent measure (reading time) may be influenced by how immersed individuals are in the task and how conducive their physical surroundings are. Bunderson argued that external distractions were likely to be more common when a survey is completed in a participant’s personal setting than in a laboratory (for an alternative perspective, see Hauser & Schwarz, 2016). To account for this, Bunderson applied two additional exclusion criteria. First, Bunderson excluded participants who had reading times under 400 ms on more than half of the trials, on the assumption that these participants were likely bots or were providing ‘key-smashing’ responses. Second, reading times longer than 15 seconds on any individual sentence of the emotionally-valenced passages were excluded, on the assumption that participants were distracted during that particular trial. While these differences in exclusion criteria may have influenced the replication outcome, I posit that Bunderson’s additional criteria were justified. Nonetheless, to examine whether the results depend on these criteria, sensitivity analyses will be conducted to check that the conclusions of the current replication do not change when the additional exclusion criteria are dropped.
Based on the framework outlined in LeBel et al. (2018), Bunderson’s (2020) replication would be considered a far replication because its population was notably different from that of the original study.
Methods
Power Analysis
The original study used a sample of 36 participants, and the first replication used a sample of 40 participants. Because the statistical test of interest for the current replication (a pairwise comparison following an interaction in a mixed effects model) is relatively complex, it was difficult to conduct an a priori power analysis to determine the sample size needed for the rescue replication. Instead, the current rescue followed the rule of thumb of recruiting a sample approximately 2.5 times larger than the original sample (Simonsohn, 2015). Hence, 100 participants (40 × 2.5, using the larger replication sample of 40 to be more conservative) were recruited for the study.
Planned Sample
In line with the original study by Child et al. (2018), participation will be limited to native English speakers over 18 years of age without a diagnosed reading disorder. Participants will be recruited via Prolific, and these inclusion criteria will be listed in the Prolific task description. Screening questions will also be added at the very start of the Qualtrics survey so that the surveys of participants who are not native English speakers, are under 18 years of age, or have been diagnosed with a reading disorder are terminated.
Additionally, while Bunderson (2020) used a sample of American participants, the current replication study will use a sample of British participants (as in Child et al.’s original study) to mimic the original study as closely as possible. This criterion will be enforced using Prolific’s built-in geographic filter and a screener question at the start of the Qualtrics survey. Individuals who indicate that they are not British will have their surveys terminated and will not be allowed to participate in the study.
Materials
Emotionally-Valenced Passages
In line with the original experiment, each participant was presented with 24 of the 48 possible emotionally-valenced text passages created by Child et al. (2018). The passages were obtained from the supplemental materials uploaded by the authors of the original study. There were 24 unique scenarios (12 negatively-valenced and 12 positively-valenced), each with two versions: one written from a personal (second-person) perspective and the other from an onlooker’s (third-person) perspective. Each passage was 5-9 sentences long and ended with an explicit emotion word that matched the content of the passage. Thus, each participant read:
- 12 negatively-valenced texts, half written from a personal (second-person) perspective and half from an onlooker’s (third-person) perspective; and
- 12 positively-valenced texts, half written from a personal (second-person) perspective and half from an onlooker’s (third-person) perspective.
Participants read each scenario only once (i.e., each participant read a given scenario in either the second- or the third-person perspective, but not both). The passages were counterbalanced by perspective, valence, and gender of the character across participants.
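To make the counterbalancing concrete, here is a minimal R sketch of one possible two-list scheme, assuming each scenario is shown from the personal perspective on one list and from the onlooker perspective on the other. The names `scenarios`, `make_list`, and `lists` are hypothetical, this is not the original authors’ actual assignment, and character gender (which the study also counterbalanced) is left out for brevity.

```r
# Hypothetical two-list counterbalancing sketch (not the original assignment).
# 12 negative (N) and 12 positive (P) scenarios.
scenarios <- data.frame(
  valence  = rep(c("N", "P"), each = 12),
  scenario = rep(1:12, times = 2)
)

make_list <- function(scenarios, list_id) {
  # Within each valence, odd-numbered scenarios are second-person ("2p")
  # on list 1 and third-person ("3p") on list 2; even-numbered ones swap.
  odd <- scenarios$scenario %% 2 == 1
  second_person <- if (list_id == 1) odd else !odd
  scenarios$perspective <- ifelse(second_person, "2p", "3p")
  scenarios$list <- list_id
  scenarios
}

lists <- rbind(make_list(scenarios, 1), make_list(scenarios, 2))

# Count passages per list x valence x perspective
table(lists$list, lists$valence, lists$perspective)
```

Under this scheme each list contains six personal and six onlooker passages per valence, and every scenario appears in both perspectives across the two lists.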
Distractor Passages
Participants were also presented with 24 distractor passages to conceal the purpose of the study. Bunderson obtained these distractor passages from the senior author of the paper. Distractor passages were of comparable length to the main emotionally-valenced passages and were written from a third-person (“he/she”) or first-person (“I”) perspective. Importantly, in contrast to the experimental items, “the final sentence of the fillers did not contain an explicit emotion and the texts were therefore more ambiguous” (Child et al., 2018; p.882).
Procedure
The stimuli were presented on Qualtrics, an online survey platform. As in the original experiment by Child et al. (2018), the sentences were presented in black, size-24 font on a white background.
At the very beginning of the study, participants were presented with an introduction and three practice passages created by Bunderson (the original practice materials were not accessible). To ensure the validity of the introduction and practice passages, “Feedback was obtained during piloting to ensure that the instructions and practice trials prepared participants for the task” (Bunderson, 2020).
After completing the practice trials, participants were presented with the main text passages (the emotionally-valenced and distractor passages) sentence by sentence. Passages appeared in a random order for each participant, and the time participants took to read each sentence was recorded. As in the original study, “After having read the final sentence, participants typed in their self-rating i.e. the number rating their own emotion” (Child et al., 2018; p.882). Specifically, in line with Bunderson’s replication study, participants were asked to “Please rate your current emotional state.” on a 10-point Likert scale (1 = Negative, 10 = Positive). After participants reported their current emotional state, the next trial began following a two-second break.
After all passages were presented, participants were asked to report their years of formal education at the very end of the survey. This was done to test the novel hypothesis that educational status would moderate the relationship between the perspective from which a passage was written and reading times for positively-valenced passages. Specifically, I sought to examine whether the effect of perspective on reading time would be larger with more years of formal education.
Controls
As in the original study by Child et al. (2018) and Bunderson’s (2020) replication study, “outliers 2.5 standard deviations below or above the mean reading times per sentence, were removed for each participant” (Child et al., 2018; p.882). Additionally, in line with Bunderson (2020), participants who had reading times under 400 ms on more than half of the trials were excluded from analysis, as were individual reading times longer than 15 seconds on any sentence of the emotionally-valenced passages. These criteria were meant to ensure that the participants retained for analysis were not bots and that reading times were not skewed by brief moments of distraction.
However, to mimic the original study by Child et al. (2018) as closely as possible, sensitivity analyses will be conducted that retain participants who had reading times under 400 ms on more than half of the trials, as well as reading times longer than 15 seconds. Unless otherwise stated, the results were similar when these participants and data points were included.
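The exclusion rules above, and the sensitivity variant that keeps only the 2.5 SD trimming, can be sketched in base R as follows. This is a simplified illustration rather than the report’s own pipeline: it assumes a matrix of reading times in seconds (rows = participants, columns = sentences), and the function name `apply_exclusions` and its `extra_criteria` argument are hypothetical. Following the report’s code, participants are kept only if fewer than half of their trials are under 400 ms.

```r
# Hypothetical helper (not the report's code). `rt` is a numeric matrix of
# reading times in seconds: rows = participants, columns = sentences.
apply_exclusions <- function(rt, extra_criteria = TRUE) {
  rt[rt == 0] <- NA  # zero reading times are treated as missing
  if (extra_criteria) {
    # Keep participants with fewer than half of their trials under 400 ms
    prop_fast <- rowSums(rt < 0.4, na.rm = TRUE) / rowSums(!is.na(rt))
    rt <- rt[which(prop_fast < 0.5), , drop = FALSE]
    # Treat any single reading time longer than 15 s as missing
    rt[rt > 15] <- NA
  }
  # Trim values more than 2.5 SD from each participant's own mean
  m <- rowMeans(rt, na.rm = TRUE)
  s <- apply(rt, 1, sd, na.rm = TRUE)
  outlier <- abs(rt - m) > 2.5 * s  # m and s recycle row-wise (length = nrow)
  rt[!is.na(outlier) & outlier] <- NA
  rt
}

# Toy usage with simulated data
set.seed(1)
rt <- matrix(runif(40, min = 1, max = 3), nrow = 4)  # 4 participants x 10 sentences
rt[4, 1:6] <- 0.2   # participant 4 is "too fast" on 6 of 10 sentences
rt[1, 2]   <- 20    # participant 1 is distracted on one sentence
main <- apply_exclusions(rt)                          # 3 participants retained
sens <- apply_exclusions(rt, extra_criteria = FALSE)  # all 4 retained
```

Toggling a single argument keeps the main and sensitivity datasets on the same code path, so they differ only in the criteria being compared.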
Analysis Plan
As in Child et al. (2018; p.882), “The reading times were analysed using linear mixed effect (LME) models. A natural log transformation was performed in order to normalize the data. We accounted for length effects of different passages by regressing (log) reading times against the number of characters per sentence. These regressions were calculated by participant. As a result, log-residual reading times with a mean (intercept) of zero (per participant) were entered into the analyses… Perspective as well as valence were included as fixed-effects into the analysis using deviation coding. In addition, we also included participants and items with random intercepts and slopes into the mixed models. For parameter reports, we used the default restricted maximum likelihood estimations provided by the lme4 package”. Based on Akaike Information Criterion (AIC) scores, Child et al. (2018) found that the model including random intercepts for participants and items, but not random slopes, showed the best fit. Thus, as in the original study, the current analysis will use a model with random intercepts for participants and items but without random slopes, unless the model fails to converge and needs to be further simplified. For this analysis, the current replication sought to examine whether there would be an interaction between valence and perspective, whereby the effect of perspective on reading times differs between positively- and negatively-valenced passages.
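The length correction quoted above can be sketched in base R as follows, assuming a long data frame with one row per sentence. The data are simulated, and `toy` and its columns are hypothetical. Separate `lm()` fits per participant are used because that is what the quoted description implies. The report’s own code instead fits a single `lmer()` model with by-participant random intercepts and slopes. The last lines show one way to obtain deviation (-0.5/+0.5) coding for a two-level factor in base R.

```r
# Simulated toy data: 3 participants x 20 sentences (hypothetical columns)
set.seed(2024)
toy <- data.frame(
  subject = rep(1:3, each = 20),
  chars   = rep(sample(40:120, 20, replace = TRUE), times = 3)
)
toy$rt <- exp(0.5 + 0.01 * toy$chars + rnorm(60, sd = 0.2))

# Per-participant regression of log RT on characters per sentence;
# the residuals become the length-corrected ("log-residual") reading times
toy$log_rt_resid <- NA_real_
for (s in unique(toy$subject)) {
  rows <- toy$subject == s
  fit  <- lm(log(rt) ~ chars, data = toy[rows, ])
  toy$log_rt_resid[rows] <- residuals(fit)
}

# OLS residuals from a model with an intercept average zero per participant
tapply(toy$log_rt_resid, toy$subject, mean)

# Deviation coding: onlooker = -0.5, personal = +0.5
perspective <- factor(c("onlooker", "personal"))
contrasts(perspective) <- contr.sum(2) / -2
contrasts(perspective)
```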
The main result that the present study seeks to replicate is the finding that “reading times were faster for passages that were written from a personal perspective than from an onlooker perspective” (Child et al., 2018; p.883). To replicate this finding, a pairwise comparison will be conducted to examine whether, within the positively-valenced passages, reading times differed by the perspective in which the passages were written. In the original study, the authors found that “for positive emotions, reading times were faster for passages that were written from a personal perspective (M = 2201, SD = 1284) than from an onlooker perspective (M = 2350, SD = 1351; t(4408) = 3.73, p < .001; see Figure 1)” (Child et al., 2018). In the replication study, however, Bunderson found no significant difference (t(934) = -0.99, p = 0.323; Bunderson, 2020).
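To show which contrast the pairwise comparison targets, the sketch below fits an ordinary regression with deviation-coded predictors to simulated, balanced data. It has no random effects, so it is a simplification of the LME, and all names are hypothetical. With perspective and valence coded ±0.5 and positive valence at +0.5, the simple effect of perspective within positive passages is the perspective coefficient plus half the interaction coefficient, which equals the difference between the two positive-passage cell means. In the actual analysis, this contrast is obtained from the fitted LME via lsmeans.

```r
set.seed(7)
d <- expand.grid(perspective = c(-0.5, 0.5),  # onlooker = -0.5, personal = +0.5
                 valence     = c(-0.5, 0.5),  # negative = -0.5, positive = +0.5
                 rep         = 1:10)
# Simulated outcome: personal passages read faster only when positive
d$y <- rnorm(nrow(d), mean = -0.1 * d$perspective * (d$valence > 0), sd = 0.05)

fit <- lm(y ~ perspective * valence, data = d)
b <- coef(fit)

# Simple effect of perspective at valence = +0.5
simple_positive <- unname(b["perspective"] + 0.5 * b["perspective:valence"])

# The same quantity computed directly from the positive-passage cell means
pos <- d[d$valence == 0.5, ]
direct <- mean(pos$y[pos$perspective == 0.5]) - mean(pos$y[pos$perspective == -0.5])

all.equal(simple_positive, direct)
```

Because the toy model is saturated (four cells, four coefficients), the two computations agree exactly. A significant interaction therefore does not by itself imply a significant simple effect in either valence.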
An exploratory analysis will also be conducted to examine if educational status would affect the strength of the relationship between perspective of passage and reading time observed within the positively-valenced passages. LME will be used given that participants’ educational status is a between-persons variable while the perspective of passages was a repeated measure within participants. As in the LME analysis described above, random intercepts of participants and items will be included but not random slopes.
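The fixed-effect structure of this exploratory model can be sketched as below, assuming the data are restricted to positively-valenced passages and Education is mean-centred. The column names, `Item`, and `positive_only` are hypothetical. Plain `lm()` on simulated data stands in for the LME only to show which term carries the moderation (the Perspective × Education interaction). The corresponding mixed model is given as a comment.

```r
# Hypothetical LME version (random intercepts, no random slopes), e.g.:
#   lmerTest::lmer(log_RT_Resid ~ Perspective * Education_c +
#                  (1 | Subject) + (1 | Item), data = positive_only)
set.seed(11)
n_subj <- 40
subj <- data.frame(Subject   = 1:n_subj,
                   Education = sample(10:22, n_subj, replace = TRUE))

# Two perspective conditions per participant (deviation-coded)
trials <- merge(subj, expand.grid(Subject = 1:n_subj, Perspective = c(-0.5, 0.5)))
trials$Education_c <- trials$Education - mean(subj$Education)  # centre education

# Simulated outcome: perspective effect grows with years of education
trials$log_RT_Resid <- rnorm(nrow(trials),
                             mean = -0.01 * trials$Perspective * trials$Education_c,
                             sd = 0.1)

fit <- lm(log_RT_Resid ~ Perspective * Education_c, data = trials)
coef(summary(fit))["Perspective:Education_c", ]
```

Centring education keeps the Perspective coefficient interpretable as the perspective effect at the sample’s mean years of education.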
All analyses will be conducted in R version 4.2.2 (Henry & Wickham, 2023). LME will be conducted using lme4 version 1.1-33 (Bates et al., 2015) and lmerTest version 3.1-3 (Kuznetsova et al., 2017). In line with the original study, pairwise tests will be conducted using the lsmeans package version 2.30-0 (Lenth, 2016).
Differences from Original Study and 1st replication
The current replication study sought to minimize differences from the original study as much as possible. Nonetheless, there are three differences worth noting. Firstly, as in Bunderson (2020), the instructions, practice passages, and item used for self-reported emotion may differ from those of the original Child et al. (2018) study, because these materials could not be obtained from the original authors. The current work uses the materials created by Bunderson, who re-created them based on the descriptions in the original paper and piloted them beforehand. These changes are expected to have minimal impact on the findings, given that they are not the key measures or stimuli of interest.
Secondly, the current replication study used a British sample to minimize differences between the current study and the original study, whereas Bunderson’s replication used an American sample. This difference in sample nationality is not expected to have substantial implications for the findings, however, given that the original study expects the effects to apply to readers in general and not just to readers in a specific country.
Lastly, the current study followed the exclusion criteria outlined by Bunderson (2020), which included two additional exclusion criteria compared to the original study by Child et al. (2018), as outlined above. To ensure that these differences in exclusion criteria did not influence the findings, sensitivity analyses were conducted in which only the exclusion criteria of the original Child et al. (2018) study were implemented. Unless otherwise stated, the conclusions did not change when these participants and data points were included.
Based on the framework outlined in LeBel et al. (2018), the current rescue would be considered a far replication because the population’s age (Range: 18 to 76; Mean = 39.52) was dissimilar to that of the original study (Range: 18 to 33; Mean = 22.31). However, this was necessary to test the proposed exploratory hypothesis that educational status might moderate the original findings. If the population’s mean age is treated as a non-crucial characteristic that would not influence the original findings, this rescue would be a very close replication, given that it used exactly the same stimuli for the IV and DV measures as the original study, although some procedural details (e.g., the filler passages) and the physical setting (an online rather than in-person study) differed from the original study.
Methods Addendum (Post Data Collection)
The methods and proposed analyses as described above were preregistered. The preregistration can be found here.
Actual Sample
A balanced sample of 50% female and 50% male participants was recruited via Prolific. Although the gender composition of the original sample is unclear, the hypotheses were not gender-specific, so a balanced sample was recruited to ensure that both males and females were represented. The descriptive statistics of the sample are provided below.
Differences from pre-data collection methods plan
When calculating the log-residual reading times, the model regressing log-transformed reading time on the number of characters in each sentence (to account for length effects across items) failed to converge, even after trying multiple optimizers. Thus, the number of characters in each sentence was standardized (z-scored) before being entered as the predictor.
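Standardizing the predictor changes only the scale of its slope, not the fitted values or residuals, so the log-residual reading times should be unaffected by this change. The sketch below checks this for an ordinary least-squares fit on simulated data, and the variable names are hypothetical. In principle the same holds for the report’s mixed model with correlated by-participant random intercepts and slopes, since z-scoring is an invertible linear reparameterization of the predictor. In practice, the rescaled version mainly makes the optimizer’s job easier.

```r
set.seed(3)
chars  <- sample(30:150, 50, replace = TRUE)          # characters per sentence
log_rt <- 0.4 + 0.008 * chars + rnorm(50, sd = 0.25)  # simulated log reading times

fit_raw    <- lm(log_rt ~ chars)
fit_scaled <- lm(log_rt ~ as.numeric(scale(chars)))   # z-scored predictor

# Residuals are identical (up to floating-point error)
all.equal(unname(residuals(fit_raw)), unname(residuals(fit_scaled)))

# Only the slope changes: scaled slope = raw slope * sd(chars)
all.equal(unname(coef(fit_scaled)[2]), unname(coef(fit_raw)[2] * sd(chars)))
```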
Results
Data preparation
The raw data for the rescue project is available in the data folder of the GitHub repository. The data has been anonymized prior to being uploaded to the repository. Below are some notes for understanding the raw data column headings, as summarized by Bunderson (2020):
- ‘2p’ stands for second-person perspective.
- ‘3pF’ stands for third-person perspective with a female character.
- ‘3pM’ stands for third-person perspective with a male character.
- ‘P’ and ‘N’, followed by a number, indicate the valence and scenario number.
- ‘RT’ stands for reaction or reading time, and is preceded by a number indicating the sentence.
- ‘ER’ stands for emotion self-rating.
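Following these notes, a reading-time column name such as `2p.N1.1RT` or `3pF.P12.7RT` can be decoded with a regular expression. The sketch below assumes names in that form, after R’s automatic leading “X” has been stripped, as the report’s code does. The helper `parse_rt_column` and its output columns are hypothetical.

```r
# Hypothetical helper: decode reading-time column names into condition labels
parse_rt_column <- function(x) {
  pattern <- "^(2p|3pF|3pM)\\.([PN])([0-9]+)\\.([0-9]+)RT$"
  if (!all(grepl(pattern, x))) stop("unexpected column name format")
  code <- sub(pattern, "\\1", x)  # perspective/gender code
  data.frame(
    column      = x,
    perspective = ifelse(code == "2p", "personal", "onlooker"),
    gender      = ifelse(code == "3pF", "female", ifelse(code == "3pM", "male", NA)),
    valence     = ifelse(sub(pattern, "\\2", x) == "P", "positive", "negative"),
    scenario    = as.integer(sub(pattern, "\\3", x)),
    sentence    = as.integer(sub(pattern, "\\4", x))
  )
}

parse_rt_column(c("2p.N1.1RT", "3pF.P12.7RT", "3pM.P3.2RT"))
```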
Prior to analyses, the raw data was first cleaned to remove participants who were screened out of the survey (i.e., non-native English speakers, and/or participants with a diagnosed reading disorder). Next, the columns not needed for the data analysis (e.g., the emotion ratings and reading time for the filler passages, the click counts for each survey page, etc.) were removed and certain variables were renamed to aid readability. Specifically, “ResponseId” was renamed to “Subject”, “D1.AGE” was renamed to “Age”, and “EDU” was renamed to “Education”.
To prepare for the data analyses, data was transformed into the long format, where each row represented one trial. Thus, in the long datasets, each participant had multiple rows of data. Two additional columns were created during this transformation as well to indicate the valence and perspective conditions of the particular trial. In the column “Valence”, “TRUE” denotes a positively valenced trial, and “FALSE” denotes a negatively valenced trial. In the column “Perspective”, “TRUE” denotes a personal (second-person) perspective, and “FALSE” denotes the onlooker (third-person) perspective.
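For readers less familiar with `tidyr::pivot_longer`, the wide-to-long step and the TRUE/FALSE condition columns can be sketched in base R. The two-participant data frame `wide` and its values are made up for illustration.

```r
# Made-up wide data: one row per participant, one column per sentence
wide <- data.frame(
  Subject      = c("s1", "s2"),
  `2p.P1.1RT`  = c(1.8, 2.1),
  `3pM.N2.1RT` = c(2.4, NA),
  check.names  = FALSE
)
rt_cols <- setdiff(names(wide), "Subject")

# Stack columns: each column's values (in participant order) follow one another
long <- data.frame(
  Subject = rep(wide$Subject, times = length(rt_cols)),
  Passage = rep(rt_cols, each = nrow(wide)),
  RT      = unlist(wide[rt_cols], use.names = FALSE)
)
long$Perspective <- factor(startsWith(long$Passage, "2p"))           # TRUE = personal
long$Valence     <- factor(grepl(".P", long$Passage, fixed = TRUE))  # TRUE = positive
long <- long[!is.na(long$RT), ]  # drop missing reading times
long
```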
Two long datasets were created: one for the main analyses (excluding participants who had reading times under 400 ms on more than 50% of the trials, trials longer than 15 s, and trials more than 2.5 SDs from each individual’s mean; N = 98), and one for the sensitivity analyses (excluding only trials more than 2.5 SDs from each individual’s mean, as in the original study by Child et al., 2018; N = 100).
The data preparation process described above is implemented in the code below, with brief comment titles denoting each step. The full code can be found in the Quarto file uploaded to the repository.
### Data Preparation:
#### Load Relevant Libraries and Functions
library(RCurl) # for reading data from GitHub
library(tidyverse) # for data cleaning
library(matrixStats) # for data cleaning
library(psych) # for descriptives
library(lme4) # for LME analyses
library(lmerTest) # for LME analyses
library(lsmeans) # for LME (contrasts) analyses
library(ggplot2) # for graphs
library(metafor) # for mini meta-analysis
library(sessioninfo) # for recording session info
library(pander) # for recording session info

#### Import data and remove unnecessary columns and rows
data.url <- "https://raw.githubusercontent.com/psych251/child2018_rescue/main/data/child2018_main_December%2B5%2C%2B2023_18.46.csv"
charCount.url <- "https://raw.githubusercontent.com/psych251/child2018_rescue/main/data/Child_Oakhill_Garnham_2018_Sentence_CharacterCount.csv"

child2018res_data <- read.csv(text = getURL(data.url), na.strings = c("", "NA")) %>%
  dplyr::filter(Finished == 1) %>%
  select(-c("StartDate":"RecordedDate", "RecipientLastName":"UserLanguage"),
         -starts_with("PRACTICE"),
         -starts_with("FILLER"),
         -contains("Click")) %>%
  rename("Subject" = "ResponseId", "Age" = "D1.AGE", "Education" = "EDU") %>%
  dplyr::select(Subject, Age, Education, everything()) %>%
  rename_all(~str_replace_all(., "_Page.Submit", "")) %>%
  rename_all(~str_replace_all(., "X", "")) # strip the "X" read.csv adds to names starting with a digit

child2018res_charcount <- read.csv(text = getURL(charCount.url)) %>%
  dplyr::mutate(Character_Count_scaled = scale(Character_Count))

#### Data exclusion: removing participants
# remove participants who did not complete the main survey (i.e., were screened out)
filtered_data = child2018res_data %>%
  filter(D2.NATIVE.ENGLISH == 1 & D3.READING.DISORDER == 2) %>%
  dplyr::select(-D2.NATIVE.ENGLISH, -D3.READING.DISORDER)
# initial no. of participants = 100

# remove participants who spent less than 400 ms on more than half of trials
# (reading times are recorded in seconds)
filtered_data = filtered_data %>%
  dplyr::select(Subject:Education, ends_with("RT"), ends_with("ER")) %>% # rearrange columns
  dplyr::mutate(proplessthan400ms =
                  rowSums(across(`2p.N1.1RT`:`3pF.P12.7RT`) < 0.400, na.rm = TRUE) /
                  rowSums(across(`2p.N1.1RT`:`3pF.P12.7RT`) > 0.000, na.rm = TRUE)) %>%
  dplyr::filter(proplessthan400ms < 0.5) %>%
  dplyr::select(-proplessthan400ms)
# new no. of participants = 98 (i.e., 2 participants removed)

#### Data exclusion: removing specific data points
# initial no. of datapoints = 12642
sum(!is.na(filtered_data %>% dplyr::select(ends_with('RT'))))
[1] 12642
# Exclude reading times that are longer than 15 seconds or equal to 0 s
filtered_data <- filtered_data %>%
  dplyr::mutate(across(where(is.character) & (contains("Edu") | contains("Age")), trimws)) %>%
  dplyr::mutate(across(where(is.character) &
                         (ends_with('RT') | ends_with('ER') | contains("Edu") | contains("Age")),
                       as.numeric)) %>%
  dplyr::mutate(across(ends_with('RT'), ~ifelse(. > 15, NA_real_, .))) %>%
  dplyr::mutate(across(ends_with('RT'), ~ifelse(. == 0, NA_real_, .)))
sum(!is.na(filtered_data %>% dplyr::select(ends_with('RT')))) # no. of datapoints left = 12487 (i.e., 155 datapoints removed)
[1] 12487
# Exclude reading times that are more than 2.5 SD from each individual's mean
filtered_data <- filtered_data %>%
  dplyr::mutate(RTmean = rowMeans(as.matrix(dplyr::select(., ends_with('RT'))), na.rm = TRUE),
                RTsd = rowSds(as.matrix(dplyr::select(., ends_with('RT'))), na.rm = TRUE)) %>%
  dplyr::mutate(across(ends_with('RT'),
                       ~case_when(. > RTmean + (2.5 * RTsd) ~ NA_real_,
                                  . < RTmean - (2.5 * RTsd) ~ NA_real_,
                                  TRUE ~ .))) %>%
  dplyr::select(-RTmean, -RTsd)
sum(!is.na(filtered_data %>% dplyr::select(ends_with('RT'))))
[1] 12099
# new no. of datapoints = 12099 (i.e., another 388 datapoints removed)

#### Prepare data for analysis
# Convert data into long format (RT; main data for replication)
long_RT_data = filtered_data %>%
  dplyr::select(-ends_with(".ER")) %>%
  pivot_longer(cols = -c("Subject", "Age", "Education"),
               names_to = 'Passage',
               values_to = 'RT') %>%
  dplyr::mutate(Perspective = as.factor(grepl("2p", Passage)),                # TRUE = personal (2p)
                Valence = as.factor(grepl(".P", Passage, fixed = TRUE))) %>%  # TRUE = positive
  dplyr::filter(!is.na(RT)) %>%
  left_join(., child2018res_charcount, by = c("Passage")) %>%
  dplyr::mutate(log_RT = log(RT)) # create log RT

# Convert data into long format (ER; exploratory data)
long_ER_data = filtered_data %>%
  dplyr::select(-ends_with("RT")) %>%
  pivot_longer(cols = -c("Subject", "Age", "Education"),
               names_to = 'Passage',
               values_to = 'ER') %>%
  mutate(Perspective = as.factor(grepl("2p", Passage)),
         Valence = as.factor(grepl(".P", Passage, fixed = TRUE))) %>%
  dplyr::filter(!is.na(ER))

#### Prepare data for sensitivity analysis
sensitivity_data = child2018res_data %>%
  filter(D2.NATIVE.ENGLISH == 1 & D3.READING.DISORDER == 2) %>%
  dplyr::select(-D2.NATIVE.ENGLISH, -D3.READING.DISORDER) %>%
  dplyr::select(Subject:Education, ends_with("RT"), ends_with("ER")) %>%
  dplyr::mutate(across(where(is.character) & (contains("Edu") | contains("Age")), trimws)) %>%
  dplyr::mutate(across(where(is.character) &
                         (ends_with('RT') | ends_with('ER') | contains("Edu") | contains("Age")),
                       as.numeric)) %>%
  dplyr::mutate(across(ends_with('RT'), ~ifelse(. == 0, NA_real_, .))) %>%
  dplyr::mutate(RTmean = rowMeans(as.matrix(dplyr::select(., ends_with('RT'))), na.rm = TRUE),
                RTsd = rowSds(as.matrix(dplyr::select(., ends_with('RT'))), na.rm = TRUE)) %>%
  dplyr::mutate(across(ends_with('RT'),
                       ~case_when(. > RTmean + (2.5 * RTsd) ~ NA_real_,
                                  . < RTmean - (2.5 * RTsd) ~ NA_real_,
                                  TRUE ~ .))) %>%
  dplyr::select(-RTmean, -RTsd)
sum(!is.na(sensitivity_data %>% dplyr::select(ends_with('RT'))))
[1] 12549
# initial no. of datapoints = 12642
# new no. of datapoints = 12549 (per the output above)

# Convert data into long format (RT)
long_RT_data_sensitivity = sensitivity_data %>%
  dplyr::select(-ends_with(".ER")) %>%
  pivot_longer(cols = -c("Subject", "Age", "Education"),
               names_to = 'Passage',
               values_to = 'RT') %>%
  dplyr::mutate(Perspective = as.factor(grepl("2p", Passage)),
                Valence = as.factor(grepl(".P", Passage, fixed = TRUE))) %>%
  dplyr::filter(!is.na(RT)) %>%
  left_join(., child2018res_charcount, by = c("Passage")) %>%
  dplyr::mutate(log_RT = log(RT)) # create log RT

# Convert data into long format (ER)
long_ER_data_sensitivity = sensitivity_data %>%
  dplyr::select(-ends_with("RT")) %>%
  pivot_longer(cols = -c("Subject", "Age", "Education"),
               names_to = 'Passage',
               values_to = 'ER') %>%
  mutate(Perspective = as.factor(grepl("2p", Passage)),
         Valence = as.factor(grepl(".P", Passage, fixed = TRUE))) %>%
  dplyr::filter(!is.na(ER))
Results of control measures
There were three exclusion criteria used. Firstly, participants (n = 2 participants) who had reading times of less then 400ms for more than half the trials were excluded from analysis. Secondly, reading times that were longer than 15 seconds were re-coded as missing data (n = 155 data points across 52 participants; 1.23% of datapoints). Finally, outliers 2.5 standard deviations below or above the mean reading times per sentence per participant were removed (n = 388 data points; 3.07% of datapoints). Thus, a total of 98 participants (4.30% missing reading time data) were retained for analysis.
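The reported percentages can be reproduced from the counts printed by the data-preparation code. The sketch below assumes, as those counts indicate, that all percentages are taken relative to the 12,642 reading times of the 98 retained participants.

```r
n_initial   <- 12642  # reading times before data-point exclusions
n_after_15s <- 12487  # after removing > 15 s (and 0 s) reading times
n_after_sd  <- 12099  # after 2.5 SD trimming

removed <- c(long_or_zero = n_initial - n_after_15s,   # 155
             sd_outliers  = n_after_15s - n_after_sd,  # 388
             total        = n_initial - n_after_sd)    # 543
pct <- round(100 * removed / n_initial, 2)
pct  # 1.23, 3.07, 4.30
```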
Demographic information for the analysed sample (n = 98) is shown below.

| Variable | n | Mean | SD | Median | Trimmed mean | MAD | Min | Max | Range | Skew | Kurtosis | SE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Age | 98 | 39.52 | 12.71 | 38 | 38.96 | 16.31 | 18 | 76 | 58 | 0.39 | -0.46 | 1.28 |
| Education | 98 | 15.04 | 3.97 | 16 | 15.21 | 2.97 | 2 | 24 | 22 | -0.63 | 0.95 | 0.40 |

Education = number of years of formal education.
Sensitivity analyses were also conducted with the full 100-participant sample, excluding only data points more than 2.5 standard deviations above or below each participant's mean reading time per sentence (n = 84 data points; 0.66% of data points).
Confirmatory analysis
The basic descriptive statistics of the sample (age, years of formal education, emotion rating responses, average raw reading times) are presented below.
In line with the original study and Bunderson's replication, log-residual reading times were calculated to account for the effect of item length. Initially, (log-transformed) reading times were regressed on the raw number of characters per sentence, as in both earlier studies; however, this model failed to converge. Reading times were therefore regressed on the scaled number of characters per sentence, and the resulting log-residual reading times (with an intercept of zero for each participant) were used in the main analyses.
```r
### Regress log RT against character count and create variable
reg_RT_data = lmer(log_RT ~ 1 + Character_Count_scaled + (1 + Character_Count_scaled | Subject),
                   long_RT_data)
long_RT_data$log_RT_Resid <- residuals(reg_RT_data)
rm(reg_RT_data)
```

```r
### Regress log RT against character count and create variable (sensitivity)
reg_RT_data_sensitivity = lmer(log_RT ~ 1 + Character_Count_scaled + (1 + Character_Count_scaled | Subject),
                               long_RT_data_sensitivity)
long_RT_data_sensitivity$log_RT_Resid <- residuals(reg_RT_data_sensitivity)
rm(reg_RT_data_sensitivity)
```
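As a simplified illustration of residualization (not the model used above), the sketch below regresses simulated log reading times on a scaled character count with ordinary least squares via `lm()`, omitting the by-participant random intercepts and slopes of the `lmer()` model. All data and variable names here are hypothetical. It shows two properties of this fixed-effects case: the residuals carry no linear information about sentence length, and rescaling the predictor leaves the residuals unchanged, so scaling only changes the parameterization (which can help an optimizer).

```r
# Simplified, fixed-effects-only sketch of length-residualized log reading times
set.seed(2024)
n <- 500
chars  <- sample(40:160, n, replace = TRUE)          # hypothetical characters per sentence
log_rt <- 6.5 + 0.006 * chars + rnorm(n, sd = 0.4)   # longer sentences -> longer log RTs
chars_scaled <- as.numeric(scale(chars))             # centred and scaled predictor

fit <- lm(log_rt ~ 1 + chars_scaled)
log_rt_resid <- residuals(fit)                       # length-adjusted log RTs

# Residuals are uncorrelated with length and average zero
cor(log_rt_resid, chars)
mean(log_rt_resid)

# Using the raw (unscaled) count gives the same residuals in this OLS case
resid_raw <- residuals(lm(log_rt ~ 1 + chars))
all.equal(unname(resid_raw), unname(log_rt_resid))   # TRUE
```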
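In line with the original study, an LME model with perspective, valence, and their interaction as fixed effects and random intercepts for both participants and items was fitted to the log residual reading times. There was no interaction between valence and perspective on log residual reading times. The summary reproduced below is from the sensitivity model (data: long_RT_data_sensitivity; 100 participants). Its participant random-intercept variance is essentially zero, and the fit is flagged as singular, most likely because by-participant variation had already been removed when the reading times were residualized.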
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: log_RT_Resid ~ Perspective + Valence + Perspective * Valence +
(1 | Subject) + (1 | Passage)
Data: long_RT_data_sensitivity
REML criterion at convergence: 22542.6
Scaled residuals:
Min 1Q Median 3Q Max
-9.3252 -0.5005 -0.0519 0.4514 6.5386
Random effects:
Groups Name Variance Std.Dev.
Passage (Intercept) 2.763e-02 1.662e-01
Subject (Intercept) 2.706e-16 1.645e-08
Residual 3.410e-01 5.840e-01
Number of obs: 12549, groups: Passage, 258; Subject, 100
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.005020 0.023637 251.829694 0.212 0.832
PerspectiveTRUE -0.012862 0.033419 251.593435 -0.385 0.701
ValenceTRUE 0.002571 0.032807 252.106143 0.078 0.938
PerspectiveTRUE:ValenceTRUE 0.009012 0.046385 251.878700 0.194 0.846
Correlation of Fixed Effects:
(Intr) PrTRUE VlTRUE
PrspctvTRUE -0.707
ValenceTRUE -0.720 0.510
PTRUE:VTRUE 0.510 -0.720 -0.707
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
To test whether participants read positively-valenced passages faster when they were written from a personal (second-person) perspective than from an onlooker's (third-person) perspective, pairwise comparisons were conducted using the lsmeans package, as in the original study. There was no significant difference in log residual reading times between the perspective conditions within the positive valence condition. The graph of the current results is presented below, alongside the graphs of the results obtained by the previous replication study and by the original study.
```r
### lsmeans to test pairwise contrasts
RT_model_con <- lsmeans(RT_model, pairwise ~ Perspective | Valence,
                        at = list(Perspective = c("TRUE", "FALSE"),
                                  Valence = c("TRUE", "FALSE")))
```
Note: D.f. calculations have been disabled because the number of observations exceeds 3000.
To enable adjustments, add the argument 'pbkrtest.limit = 12099' (or larger)
[or, globally, 'set emm_options(pbkrtest.limit = 12099)' or larger];
but be warned that this may result in large computation time and memory use.
Note: D.f. calculations have been disabled because the number of observations exceeds 3000.
To enable adjustments, add the argument 'lmerTest.limit = 12099' (or larger)
[or, globally, 'set emm_options(lmerTest.limit = 12099)' or larger];
but be warned that this may result in large computation time and memory use.
```r
# graph RT results for first replication study (Bunderson, 2020)
knitr::include_graphics("https://github.com/psych251/child2018_rescue/blob/main/writeup/bunderson2020_RTfigure.png?raw=TRUE")

# graph RT results for original study (Child et al., 2018)
knitr::include_graphics("https://raw.githubusercontent.com/psych251/child2018_rescue/main/original_paper/child2018_figure1.png?raw=TRUE")
```
```r
### lsmeans to test pairwise contrasts for sensitivity analyses
RT_model_con_sensitivity <- lsmeans(RT_model_sensitivity, pairwise ~ Perspective | Valence,
                                    at = list(Perspective = c("TRUE", "FALSE"),
                                              Valence = c("TRUE", "FALSE")))
```
Note: D.f. calculations have been disabled because the number of observations exceeds 3000.
To enable adjustments, add the argument 'pbkrtest.limit = 12549' (or larger)
[or, globally, 'set emm_options(pbkrtest.limit = 12549)' or larger];
but be warned that this may result in large computation time and memory use.
Note: D.f. calculations have been disabled because the number of observations exceeds 3000.
To enable adjustments, add the argument 'lmerTest.limit = 12549' (or larger)
[or, globally, 'set emm_options(lmerTest.limit = 12549)' or larger];
but be warned that this may result in large computation time and memory use.
To explore whether educational status moderated the relationship between the perspective of the passage and reading time for positively-valenced passages, an LME model was fitted that included the three-way interaction between years of formal education, perspective, and valence as predictors. There was no three-way interaction between education, valence, and perspective on log residual reading times.
The graphical results are presented below. Based on simple slope analyses, the only marginally significant (p = .05) simple slope of education was for positive, third-person passages. Specifically, individuals with more years of formal education appear to read third-person positive passages faster than individuals with fewer years of formal education, although the estimated slope is very small (it rounds to -0.00).
```r
### analyses: education as moderator
RT_model <- lmer(log_RT_Resid ~ Perspective + Valence + Perspective*Valence*Education +
                   (1 | Subject) + (1 | Passage),
                 data = long_RT_data)
```
Warning: Johnson-Neyman intervals are not available for factor moderators.
boundary (singular) fit: see help('isSingular')
Simple slopes analysis: slope of Education within each Valence and Perspective condition (Valence TRUE = positive; Perspective TRUE = second person). Estimates and standard errors round to 0.00 at two decimals.

| Valence | Perspective | Est. | S.E. | t | p |
|---|---|---|---|---|---|
| FALSE (negative) | FALSE (third person) | 0.00 | 0.00 | 1.49 | 0.14 |
| FALSE (negative) | TRUE (second person) | 0.00 | 0.00 | 0.67 | 0.50 |
| TRUE (positive) | FALSE (third person) | -0.00 | 0.00 | -1.95 | 0.05 |
| TRUE (positive) | TRUE (second person) | -0.00 | 0.00 | -0.55 | 0.59 |
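The simple slopes above are built from coefficients of the three-way interaction model. The base-R sketch below illustrates the idea on simulated data with `lm()`; the project itself used `lmer()` with random intercepts, and all data here are hypothetical. Cell labels follow the `grepl()` coding used earlier (Perspective TRUE = second person, Valence TRUE = positive). The slope of Education in each cell is the base Education coefficient plus the interaction terms active in that cell; without random effects, these point estimates equal those from separate regressions within each cell.

```r
# Toy illustration: simple slopes of Education from a Perspective x Valence x Education model
set.seed(7)
n <- 400
d <- data.frame(
  Perspective = factor(sample(c("FALSE", "TRUE"), n, replace = TRUE)),  # TRUE = second person
  Valence     = factor(sample(c("FALSE", "TRUE"), n, replace = TRUE)),  # TRUE = positive
  Education   = sample(8:24, n, replace = TRUE)                          # years of education
)
# Simulated outcome: a small negative Education slope only for positive, third-person passages
d$y <- rnorm(n, sd = 0.5) -
  0.01 * d$Education * (d$Valence == "TRUE" & d$Perspective == "FALSE")

fit <- lm(y ~ Perspective * Valence * Education, data = d)
b <- coef(fit)

# Slope of Education in each cell = base slope + interaction terms active in that cell
slopes <- c(
  neg_3p = b[["Education"]],
  neg_2p = b[["Education"]] + b[["PerspectiveTRUE:Education"]],
  pos_3p = b[["Education"]] + b[["ValenceTRUE:Education"]],
  pos_2p = b[["Education"]] + b[["PerspectiveTRUE:Education"]] +
    b[["ValenceTRUE:Education"]] + b[["PerspectiveTRUE:ValenceTRUE:Education"]]
)

# Cross-check: same point estimates from a separate regression within each cell
cell_slope <- function(p, v) {
  cell <- d[d$Perspective == p & d$Valence == v, ]
  coef(lm(y ~ Education, data = cell))[["Education"]]
}
separate <- c(neg_3p = cell_slope("FALSE", "FALSE"), neg_2p = cell_slope("TRUE", "FALSE"),
              pos_3p = cell_slope("FALSE", "TRUE"),  pos_2p = cell_slope("TRUE", "TRUE"))
all.equal(slopes, separate)  # TRUE
```

Standard errors differ between the two approaches (the interaction model pools the residual variance), which is why the simple-slope tests come from the full model rather than from per-cell fits.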
Emotion rating
Although not the focus of the current rescue replication study, a separate linear mixed effects (LME) model was run to analyze the emotion ratings for each of the four types of passages. Perspective and valence were incorporated as fixed effects, while participants and items were treated as random intercepts. In the original study, Child et al. found “an interaction between perspective and valence for self-reported emotions”, whereby “Participants gave similar ratings for both perspectives when they were presented with a negative emotion (“he/she”: M = 3.36, SD = 1.29; “you”: M = 3.30, SD = 1.46; t = 1.08, p > 0.1), but they gave higher ratings for the second person perspective you (M = 7.04, SD = 1.77) than for the third person perspective (M = 6.82, SD = 1.73) when reading about a positive emotion (perspective effect for positive emotion: (t(4335) = 3.67, p < .001, see Figure 2)” (Child et al., 2018, p.883).
However, in the current study, there was no interaction between valence and perspective on emotion ratings. Within the positively valenced passages, there was also no significant effect of perspective on emotion ratings. Nonetheless, the trends were in the same direction as those observed in the original study.
Taken together, the current study did not find any significant difference between perspectives in the reading times of positively-valenced passages. Interestingly, the trends observed in the first replication study by Bunderson (2020) (see the graphs above showing log residual reading times in each of the four conditions) matched the original findings more closely than those of the current rescue study, although Bunderson's p-values were non-significant.
To examine the collective evidence for the hypothesis that individuals read positive stories faster when they are presented in the second- rather than third-person perspective, a mini meta-analysis of the original study, the initial replication, and the current rescue study is conducted below.
Mini meta analysis
A mini meta-analysis was conducted using the original study, the first replication study, and the current rescue project. Through a brief forward search on Google Scholar, I skimmed the titles and abstracts of 26 records that cited the original study by Child et al. (2018), and I could not find any other direct (not conceptual) replications of the original study to include in this mini meta-analysis. Notably, a few records (approximately 5) had abstracts that were not written in English, which I could not evaluate.
The effect of interest was the pairwise t-test comparing the log residual reading times of second- and third-person positive passages. Of note, Cohen's d is difficult to calculate here because the model was a mixed effects model, and there is no consensus on how effect sizes should be computed for mixed effects models. Hence, this mini meta-analysis should be interpreted very critically.
Effect size from original study (Child et al., 2018)
To calculate the Cohen's d obtained in the original study, I used the sentence “However, for positive emotions, reading times were faster for passages that were written from a personal perspective (M = 2201, SD = 1284) than from an onlooker perspective (M = 2350, SD = 1351; t(4408) = 3.73, p < .001; see Figure 1)” (Child et al., 2018, p.883) and the formula $d = (m_1 - m_2)/SD_{pooled}$, where $m_1$ and $m_2$ are the mean reading times for the two conditions and $SD_{pooled} = \sqrt{(sd_1^2 + sd_2^2)/2}$, with $sd_1$ and $sd_2$ the standard deviations of reading time for the two conditions. Substituting the values gives:

$$d = \frac{2201 - 2350}{\sqrt{(1284^2 + 1351^2)/2}} = -0.113$$

To obtain the standard error of the Cohen's d estimate, I used the formula $SE_d = \sqrt{2(1-r)/n + d^2/(2n)}$, where $r$ is the correlation between participants' reading times in the two conditions (positive-personal and positive-onlooker), $n$ is the number of participants, and $d$ is the Cohen's d value. Because $r$ cannot be computed from the available data, it is assumed to be 0.70 in the current work. Thus:

$$SE_d = \sqrt{\frac{2(1 - 0.70)}{36} + \frac{0.113^2}{2 \times 36}} = 0.130$$
Effect size from replication study (Bunderson, 2020)
Although Bunderson did report a Cohen's d value in her replication, I recalculated Cohen's d using the same formula as above to standardize the effect sizes used in the mini meta-analysis. Of note, Bunderson (as well as the current rescue study) used the average reading time per sentence in each passage rather than the total reading time per passage, which may account for why the mean values differ from those in the original study. Nonetheless, this difference should not affect the standardized effect size. Substituting the values gives:

$$d = \frac{7.57 - 8.13}{\sqrt{(2.15^2 + 1.99^2)/2}} = -0.270$$

To obtain the standard error of the Cohen's d estimate, the same formula was used as above, and $r$ is again assumed to be 0.70. Thus:

$$SE_d = \sqrt{\frac{2(1 - 0.70)}{40} + \frac{0.270^2}{2 \times 40}} = 0.126$$
Effect size from current rescue study
Using the data after excluding participants who spent less than 400 ms on more than 50% of the trials, data points greater than 15 seconds, and data points more than 2.5 SD away from each participant's mean, I calculated Cohen's d to be:

$$d = \frac{7.399660 - 7.079812}{\sqrt{(1.823134^2 + 1.963192^2)/2}} = 0.169$$

To obtain the standard error of the Cohen's d estimate, the same formula was used as above, and $r$ is again assumed to be 0.70. Thus:

$$SE_d = \sqrt{\frac{2(1 - 0.70)}{98} + \frac{0.169^2}{2 \times 98}} = 0.079$$
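The hand calculations for all three studies can be reproduced with the short R sketch below. It only encodes the two formulas above; the helper names `cohens_d()` and `se_d()` are hypothetical, and r = 0.70 remains an assumed, not observed, correlation. Standard errors are computed from the unrounded d values, which does not change them at three decimals.

```r
# Reproduce the Cohen's d and SE_d hand calculations for the three studies
cohens_d <- function(m1, m2, sd1, sd2) (m1 - m2) / sqrt((sd1^2 + sd2^2) / 2)
se_d     <- function(d, n, r = 0.70) sqrt(2 * (1 - r) / n + d^2 / (2 * n))  # r is assumed

studies <- data.frame(
  study = c("Child et al., 2018", "Bunderson, 2020", "Lua, 2023"),
  n   = c(36, 40, 98),
  m1  = c(2201, 7.57, 7.399660),   # first term of each subtraction above
  m2  = c(2350, 8.13, 7.079812),   # second term of each subtraction above
  sd1 = c(1284, 2.15, 1.823134),
  sd2 = c(1351, 1.99, 1.963192)
)
studies$d  <- with(studies, cohens_d(m1, m2, sd1, sd2))
studies$se <- with(studies, se_d(d, n))
print(studies[, c("study", "d", "se")], digits = 3)
```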
Mini meta analysis results
Based on the mini meta-analysis, there was no significant effect of perspective on reading time among positively-valenced passages. Interestingly, the effect sizes suggest that Bunderson's (2020) replication (d = -0.270) may have provided stronger evidence for the hypothesis than the original study (d = -0.113). Nonetheless, these effect sizes rest on an assumed within-participant correlation (r = 0.70) and on approximations for a mixed-effects design, so this analysis should be interpreted with caution.
The mini meta-analysis suggests that the pooled effect size is d = -0.056 (SE = 0.135, 95% CI = [-0.32, 0.21], p = .678). The three studies were also significantly heterogeneous (Q(2) = 9.88, p = .007; I² = 77.53%).
```r
### analysis: mini meta-analysis
library(metafor)  # provides rma()

minimeta_dataset <- data.frame(study = c("Child et al., 2018", "Bunderson, 2020", "Lua, 2023"),
                               N = c(36, 40, 98),
                               cohens_d = c(-0.113, -0.270, 0.169),
                               se = c(0.130, 0.126, 0.079))

# Random-effects model (REML estimator of tau^2 by default)
mini_meta_mod <- rma(yi = cohens_d, sei = se, slab = study, data = minimeta_dataset)
summary(mini_meta_mod)
```
Random-Effects Model (k = 3; tau^2 estimator: REML)
logLik deviance AIC BIC AICc
0.1164 -0.2329 3.7671 1.1534 15.7671
tau^2 (estimated amount of total heterogeneity): 0.0418 (SE = 0.0545)
tau (square root of estimated tau^2 value): 0.2045
I^2 (total heterogeneity / total variability): 77.53%
H^2 (total variability / sampling variability): 4.45
Test for Heterogeneity:
Q(df = 2) = 9.8832, p-val = 0.0071
Model Results:
estimate se zval pval ci.lb ci.ub
-0.0560 0.1346 -0.4157 0.6776 -0.3198 0.2079
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
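The Test for Heterogeneity above can be checked by hand. The sketch below computes inverse-variance weights, the fixed-effect pooled estimate, and Cochran's Q from the three effect sizes; Q does not depend on how tau^2 is estimated, so it should match the Q(df = 2) reported by `rma()`. For tau^2 it substitutes the DerSimonian-Laird moment estimator, a different technique from the REML estimator used by `rma()` here, so its tau^2 and pooled estimate will differ slightly from the output above.

```r
# Hand-computed heterogeneity statistics for the three effect sizes
d  <- c(-0.113, -0.270, 0.169)
se <- c(0.130, 0.126, 0.079)
k  <- length(d)

w    <- 1 / se^2                            # inverse-variance (fixed-effect) weights
d_fe <- sum(w * d) / sum(w)                 # fixed-effect pooled estimate
Q    <- sum(w * (d - d_fe)^2)               # Cochran's Q, df = k - 1
p_Q  <- pchisq(Q, df = k - 1, lower.tail = FALSE)

# DerSimonian-Laird tau^2 (swapped in for REML), truncated at zero
tau2  <- max(0, (Q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))
w_re  <- 1 / (se^2 + tau2)                  # random-effects weights
d_re  <- sum(w_re * d) / sum(w_re)
se_re <- sqrt(1 / sum(w_re))

round(c(Q = Q, p_Q = p_Q, tau2_DL = tau2, d_RE = d_re, se_RE = se_re), 4)
```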
```r
# plot forest plot
library(dplyr)
library(ggplot2)

aggregate <- tibble(study = "Aggregate",
                    cohens_d = as.numeric(mini_meta_mod$b),  # $b is a 1x1 matrix
                    se = mini_meta_mod$se,
                    N = sum(minimeta_dataset$N))

for_plot <- minimeta_dataset %>%
  bind_rows(aggregate) %>%
  mutate(study = factor(study, levels = c("Aggregate", "Child et al., 2018",
                                          "Bunderson, 2020", "Lua, 2023")))

ggplot(for_plot, aes(x = study, y = cohens_d,
                     ymin = cohens_d - 1.96 * se, ymax = cohens_d + 1.96 * se)) +
  geom_errorbar(colour = 'darkgray', linewidth = .5, width = .25) +
  geom_point(aes(size = N), data = for_plot %>% filter(study != "Aggregate")) +
  geom_point(aes(size = N), data = for_plot %>% filter(study == "Aggregate"), shape = 18) +
  coord_flip() +
  scale_size_area() +
  geom_hline(yintercept = 0, color = "black") +
  theme(legend.position = "none") +
  geom_vline(xintercept = 1.5, lty = 2) +
  labs(y = "Effect size (Cohen's d)", x = "")
```
Summary of Replication Attempt
The current rescue failed to replicate the finding that individuals read positive stories faster when they are presented in the second-person perspective. There was no significant difference in reading time between positive passages written in the second-person (personal) and third-person (onlooker) perspectives. The mini meta-analysis (which is preliminary and should be interpreted with caution) also suggests that there is insufficient evidence for this effect. Thus, I would conclude that the current results indicate a failed replication for now. On a scale between 0 and 1, I would rate the replication success a 0, given that the main effect of interest failed to replicate and was observed to be in the opposite direction from the original study.
Commentary
One notable insight from the exploratory analyses is that the theory behind Child et al.'s (2018) hypothesis may not be incorrect. The hypothesis that individuals read positive stories faster in the second-person perspective rests on differences in emotional (empathic) responses across perspectives; for example, Child et al. argued that “stronger empathic responses to negative information will lead to faster reading times in both perspectives” (Child et al., 2018, p.881). In the exploratory analysis, participants' emotion ratings did not differ between perspectives, and, consistent with this account, neither did their reading times: if reading-time differences depend on differences in emotional responses, the absence of the latter would be expected to accompany the absence of the former. It is possible that, relative to Child et al.'s (2018) participants, who completed the study in the lab, the rescue sample recruited on Prolific was less engaged in the task.
On my end, I tried to exclude participants and data points based on indices that could indicate inattention (e.g., speeding through the task, spending more than 15 s reading a sentence). However, these proxies may have been poor indices of inattention. It would be fruitful to examine other possible indicators of inattention to investigate whether inattention may have caused the failed replication.
I was honestly shocked that the results had failed to replicate, given that the trends observed by Bunderson (2020) in the first replication were very similar to those obtained by Child et al. (2018). To rule out the possibility that my code was incorrect, I ran it on Bunderson's data (which had similar variable names, given that the current study used the Qualtrics survey file provided by Bunderson), and the trends I observed mirrored what Bunderson had found. This suggests that the code (at least for the main effects of interest) is unlikely to be gravely incorrect. Thus, it is possible that this null result was simply due to natural variation in random sampling.