Replication of “Why do children make mirror errors in reading? Neural correlates of mirror invariance in the visual word form area” by Dehaene et al. (2009, NeuroImage)

Author

Neha Rajagopalan (Email ID: neharaj@stanford.edu)

Published

October 22, 2023

Introduction

Justification

As a PhD student in Developmental and Psychological Sciences at the Graduate School of Education, I am constantly thinking about methods to study the neural and behavioral changes that occur during learning in children and adults. Given these interests, I believe the chosen topic aligns well with my goals of expanding my knowledge of educational psychology and learning appropriate scientific reporting of results. The paper by Dehaene et al. discusses the disappearance in adults of the “mirror effect” (i.e., children's tendency to write words from right to left while they learn to read and write) and investigates whether mirror generalization is stronger for pictures than for words, using a behavioral task conducted alongside an fMRI experiment. In the past, I have explored the literature on auditory processing and narratives for learning. Through this rescue project, I intend to gain experience with the experiment design, priming effects, and analysis metrics used in studies of visual mechanisms.

Stimuli and Procedure

The behavioral task (called the “same-different” task) presents 14 French words, 14 Japanese characters, 14 pictures of tools, 14 pictures of faces, and 14 unknown-script stimuli (all in black and white) as visual stimuli, along with their left-right reversed mirror images. Each trial displays two images from the same category in succession at fixation, with 200 ms of presentation per image and a 300 ms inter-stimulus interval. The participant responds “same” if the two images depict the same item, whether they are physically identical or mirror images of each other (e.g., a word followed by its own mirror image), and “different” if they depict unrelated items (e.g., two different words from the same category), regardless of orientation.

Challenges

The target population is adults with a mean age of 23 years. The original task, however, was conducted with French and Japanese participants. One challenge I can foresee is choosing between maintaining the cultural context (which might make recruiting participants difficult and generating language-specific stimuli time-consuming) and opening the study to all participants regardless of cultural background. The original study was restricted to 26 participants. I believe the analysis may require considerably more data to support generalizable conclusions about mirror invariance in adults.

Summary of prior replication attempt

Based on a preliminary comparison between the original report and the 1st replication, the two do not differ in sample size, method, or analysis. However, because the original study included an fMRI task, it was conducted in person, whereas the replication study collected behavioral data online. As a result, participants in the replication study had no prior exposure to the same-different task, whereas participants in the original study did. The original study had a final mean age of 23 years, but the replication study had a mean age of 36 years. The replication attempt was only partially successful: depending on the analysis, the author obtained results that were either consistent or inconsistent with the original report.

Methods

Power Analysis

Original effect size, power analysis for samples to achieve 80%, 90%, 95% power to detect that effect size. Considerations of feasibility for selecting planned sample size.

How much power does your planned sample have for original effect? For an attenuated effect that is half the size of the original?

(If power analysis is not possible or precise, discuss more fully how you determined a sample size that would be sufficient for rescue.)

Planned Sample

The rescue project will increase the sample size to 40 participants and will drop the French/Japanese cultural grouping by opening the study to a global population.

Materials

“All stimuli were presented in black-and-white, and occupied similar locations on screen (approximate width and height : 2° × 2° for Japanese characters and faces; 1.5–4° × 1.5–4° for tools, depending on their compactness and vertical or horizontal main axis; 0.8° × 2.3° for French words). Several precautions were taken to ensure that the task required view-point invariant recognition and could not be performed using simple short-cuts. All stimuli were selected so that they were clearly asymmetrical and maximally distinct from their mirror images. In particular, the faces were not front views, but were viewed and lit from an angle intermediate between profile and front view. Likewise, the Japanese characters were presented in a curvy font (“HG Sei-Kaisho-Tai”) so that they did not contain any vertical or horizontal bars that would be identical after left–right inversion. The French words had an even number of non-repeated letters, so that no letter was repeated at the same location in a word and its mirror-image. Finally, the French words were made of lower-case letters b, d, i, l, m, n, o, p, q, u, v, x, and were presented in an 20-point Arial font, slightly modified so that the above letters were exactly symmetrical on screen. As a result, even in mirror-image the words appeared as alphabetical strings made of essentially normal letters (non-French readers could not easily tell that they were not French words). A similar manipulation was not possible with Japanese characters, but we selected characters made of strokes that did not seem artificial once reversed (non-Japanese readers could not easily tell that these were not Japanese characters). French and Japanese words were matched on frequency (mean Log10 frequency = 1.14 versus 0.90, n.s.).” - Pg 1846

The French and Japanese words will be substituted with English words to maintain uniformity for cultural contexts. The other rules for stimuli presentation will be maintained in the current study.
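Since the quoted stimulus sizes are given in degrees of visual angle, keeping them in the current study means converting them to on-screen sizes. The sketch below shows the standard conversion, size = 2 × distance × tan(angle / 2). The viewing distance and pixel density are assumptions for illustration only (neither is reported in the quoted passage, and both will vary across participants in an online setting), and the function names are hypothetical.

```python
import math


def visual_angle_to_cm(angle_deg, viewing_distance_cm):
    """On-screen extent (cm) that subtends angle_deg at the given distance."""
    return 2 * viewing_distance_cm * math.tan(math.radians(angle_deg) / 2)


def visual_angle_to_px(angle_deg, viewing_distance_cm, px_per_cm):
    """Same extent in pixels, rounded to the nearest whole pixel."""
    return round(visual_angle_to_cm(angle_deg, viewing_distance_cm) * px_per_cm)


# Assumed values, NOT taken from the original paper:
ASSUMED_DISTANCE_CM = 60  # hypothetical viewing distance
ASSUMED_PX_PER_CM = 38    # hypothetical screen pixel density

# A 2 degree stimulus (the size quoted for faces and Japanese characters)
face_width_px = visual_angle_to_px(2.0, ASSUMED_DISTANCE_CM, ASSUMED_PX_PER_CM)
```

In an online study the true viewing distance is unknown, so any such conversion is approximate unless participants complete a screen-calibration step.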

Procedure

“The stimuli for the behavioral same-different task, performed after fMRI, were 14 French words, 14 Japanese characters, 14 pictures of tools, 14 pictures of faces, 14 unknown script stimuli, and their corresponding left–right reversed mirror images. On each trial, two stimuli from the same category were successively presented at the fixation (200 ms presentation of each image, 300 ms inter-stimulus interval with fixation cross). The participant’s task was to decide whether the two stimuli depicted the same object, possibly in mirror- form. Thus, the participants had to respond “same” both to physically identical stimuli (1/4 of trials) and to mirror images (1/4 of trials). They had to respond “different” whenever the stimuli were unrelated, whether they were in the same orientation (e.g. two normally oriented words; two faces in the same orientation; 1/4 of trials), or whether they were in different orientations (e.g. one word followed by a mirror image of a word). The first stimulus, drawn from one of the five categories, was always in standard orientation, and the second stimulus was defined by a 2 × 2 factorial design with factors of identity (same or different object) and orientation (same or different left– right orientation). This design defined a list of 14 × 5 × 2 × 2 = 280 trials, which were run once in random order.” - Pg 1846

The rescue study differs in using four categories instead of five: the French words and Japanese characters categories are merged into a single category of English words. The number of trials therefore reduces to 14 × 4 × 2 × 2 = 224.
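As a minimal sketch of how the 224-trial list could be generated, the code below crosses the four categories, 14 items, and the 2 × 2 identity-by-orientation design, then shuffles the result. The category labels, field names, and the function `build_trials` are hypothetical; the response rule follows the quoted procedure, in which mirror-image pairs of the same item count as “same”.

```python
import itertools
import random

# Hypothetical labels for the four categories of the rescue study
CATEGORIES = ["english_words", "tools", "faces", "false_fonts"]
N_ITEMS = 14  # stimuli per category, as in the original design


def build_trials(seed=None):
    """Return the full 14 x 4 x 2 x 2 trial list in random order."""
    trials = []
    for category, item, same_identity, same_orientation in itertools.product(
        CATEGORIES, range(N_ITEMS), (True, False), (True, False)
    ):
        trials.append(
            {
                "category": category,
                "first_item": item,  # first stimulus is in standard orientation
                "same_identity": same_identity,
                "same_orientation": same_orientation,
                # "same" covers identical pairs and mirror-image pairs alike
                "correct_response": "same" if same_identity else "different",
            }
        )
    random.Random(seed).shuffle(trials)
    return trials


trials = build_trials(seed=0)
print(len(trials))  # 224
```

Passing a seed makes the random order reproducible for a given participant; omitting it gives a fresh order on each run.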

Controls

The trials will be split into 10 blocks, with a break between blocks to limit fatigue and loss of attention. If a participant takes more than 2 seconds to respond, the trial will either be subject to an exclusion criterion or trigger a message asking the participant to pay attention, followed by a substitute trial.
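The blocking and response-deadline controls could be sketched as follows. Because 224 trials do not divide evenly into 10 blocks, this sketch assumes the first four blocks get one extra trial; that allocation, the function names, and the decision to merely flag slow trials (rather than exclude or replace them) are illustrative assumptions, since the plan above leaves that choice open.

```python
def split_into_blocks(trials, n_blocks=10):
    """Split a trial list into n_blocks contiguous blocks of near-equal size."""
    base, extra = divmod(len(trials), n_blocks)
    blocks, start = [], 0
    for i in range(n_blocks):
        # The first `extra` blocks absorb the remainder, one trial each
        size = base + (1 if i < extra else 0)
        blocks.append(trials[start:start + size])
        start += size
    return blocks


def is_too_slow(rt_ms, limit_ms=2000):
    """Flag a response that exceeds the 2-second limit."""
    return rt_ms > limit_ms


# Placeholder trial IDs standing in for the 224 real trials
blocks = split_into_blocks(list(range(224)))
block_sizes = [len(b) for b in blocks]
```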

Analysis Plan

Following the original study and the replication report, the analysis will proceed as described below:

“The following analysis plan will be followed from the study: Median correct response times were analyzed using an ANOVA with group as a between-subjects factor and stimulus category, repetition (same or different image) and orientation of the first image (normal or mirror; the second image was always in normal orientation) as within-subject factors. I will perform an ANOVA test to examine means of correct median response times between the five categories (French and Japanese words, tools, faces, and false fonts).”

The five categories will be substituted with four categories (English words, tools, faces, and false fonts).
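To make the first steps of this plan concrete, the sketch below computes the median correct response time for each participant and category, then the F statistic of a one-way repeated-measures ANOVA on category. It is deliberately simplified: it leaves out the group, repetition, and orientation factors of the full design, uses only the Python standard library, and does not compute a p-value (that would need an F distribution from a statistics package). The field names and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def median_correct_rts(trials):
    """Median RT of correct trials for each (subject, category) cell."""
    cells = defaultdict(list)
    for t in trials:
        if t["correct"]:
            cells[(t["subject"], t["category"])].append(t["rt_ms"])
    return {cell: median(rts) for cell, rts in cells.items()}


def rm_anova_f(cell_values, subjects, categories):
    """One-way repeated-measures ANOVA on a complete subject x category table.

    Returns (F, df_effect, df_error).
    """
    n, k = len(subjects), len(categories)
    grand = mean(cell_values[(s, c)] for s in subjects for c in categories)
    # Variability between category means
    ss_cat = n * sum(
        (mean(cell_values[(s, c)] for s in subjects) - grand) ** 2
        for c in categories
    )
    # Variability between subject means (removed from the error term)
    ss_subj = k * sum(
        (mean(cell_values[(s, c)] for c in categories) - grand) ** 2
        for s in subjects
    )
    ss_total = sum(
        (cell_values[(s, c)] - grand) ** 2 for s in subjects for c in categories
    )
    ss_error = ss_total - ss_cat - ss_subj
    df_effect, df_error = k - 1, (k - 1) * (n - 1)
    return (ss_cat / df_effect) / (ss_error / df_error), df_effect, df_error
```

The function expects one value per subject and category; participants with a missing cell (for example, no correct trials in a category) would have to be handled before calling it.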

Differences from Original Study and 1st replication

The original study does not mention any exclusion criteria or attention checks, whereas the replication study added exclusion criteria because it was run online (the original study was conducted in person). Participants in the replication study had no prior exposure to the same-different task, whereas participants in the original study did. The original study had a final mean age of 23 years, but the replication study had a mean age of 36 years. Apart from these, there were no notable differences between the original study and the replication study. The results, however, were not completely consistent: one ANOVA replicated, whereas an additional second analysis did not.

Methods Addendum (Post Data Collection)

You can comment this section out prior to final report with data collection.

Actual Sample

Sample size, demographics, data exclusions based on rules spelled out in analysis plan

Differences from pre-data collection methods plan

Any differences from what was described as the original plan, or “none”.

Results

Data preparation

The replication report cleaned the raw data by adding a column that clearly indicates whether each stimulus pair was mirror-imaged or normally oriented. Preprocessing was completed prior to the analysis to obtain the rows and columns required for the mirror invariance task.

Results of control measures

As mentioned above, there were no quality control checks or positive and negative controls in the replication study. The study excluded trials with response times shorter than 100 ms or longer than 2000 ms, in order to remove accidental button presses and lapses of attention. Additionally, subjects with less than 80 percent accuracy were removed from the analysis to ensure that the remaining participants were attending to the task.
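These exclusion rules can be sketched as a small filter. The field names and the function `apply_exclusions` are hypothetical, and the sketch assumes that each subject's accuracy is computed over all of their trials before the response-time trimming; the replication report may have ordered these steps differently.

```python
from collections import defaultdict

RT_MIN_MS = 100      # faster responses treated as accidental presses
RT_MAX_MS = 2000     # slower responses treated as lapses of attention
MIN_ACCURACY = 0.80  # subjects below this accuracy are dropped


def apply_exclusions(trials):
    """Drop low-accuracy subjects, then drop out-of-range trials.

    trials: list of dicts with keys 'subject', 'rt_ms', 'correct'.
    """
    totals, hits = defaultdict(int), defaultdict(int)
    for t in trials:
        totals[t["subject"]] += 1
        hits[t["subject"]] += int(t["correct"])
    # Assumption: accuracy is computed on all trials, before RT trimming
    kept_subjects = {s for s in totals if hits[s] / totals[s] >= MIN_ACCURACY}
    return [
        t
        for t in trials
        if t["subject"] in kept_subjects
        and RT_MIN_MS <= t["rt_ms"] <= RT_MAX_MS
    ]
```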

Confirmatory analysis

The replication study conducted an ANOVA and found that the effect of category on mean correct response times was significant (p = 0.04708), consistent with the original study. The author also ran an additional ANOVA comparing category means against each other but did not observe significant results (inconsistent with the original study).

Results from Original study
Results from Replication study

Exploratory analyses

Any follow-up analyses desired (not required).

Discussion

Mini meta analysis

Combining across the original paper, 1st replication, and 2nd replication, what is the aggregate effect size?

Summary of Replication Attempt

Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.

Commentary

Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.