Introduction

In their paper, Mirault et al. (2018) demonstrate a novel transposed-word effect in speeded sentence grammaticality judgments. They constructed base sentences that were either grammatical or ungrammatical, and then derived test sequences from them by transposing two words. Ungrammaticality judgments for the transposed-word sequences were slower and more error-prone when the base sentences were grammatical than when they were ungrammatical. The authors suggested that these results demonstrate some uncertainty in word order encoding as well as parallel processing of words.

This transposed-word effect has important implications for psycholinguistics, particularly for models of sentence processing and grammar. Furthermore, since the effect is a relatively novel phenomenon, it should be verified through replications and reproductions to demonstrate its robustness. In particular, the original study was conducted in French, which has agreement (in person, number, and gender) among verbs, articles, adjectives, and nouns. This contrasts with English, which has much more limited agreement (chiefly between subjects and verbs). Additionally, French has certain word orders that are much less common or not possible in English (e.g. adjectives after nouns, direct object before verb). Replicating the effect in English would thus demonstrate that it is not solely due to the particular agreement or word order characteristics of French, but can generalize across languages.

The materials needed to replicate this study include the test sentences (which serve as stimuli for the speeded grammaticality judgment task), as well as the specific instructions given to participants. The former are available on the paper’s OSF site; however, the latter will have to be obtained by communicating with the authors. The original experiment was conducted online using Java, and a similar implementation (using JavaScript) on a crowdsourcing platform will be adopted for the replication study.

Since the materials, procedures, and analyses are described in considerable detail in Mirault et al.’s paper, reimplementing the experimental and analytical designs is likely to be relatively straightforward, provided that the specific instructions can be obtained from the authors. The most important challenge is developing English test stimuli that are sufficiently close in construction to the French test stimuli, so that any difference in results cannot be attributed to poorly matched stimuli.

The repository for this project is hosted on GitHub.

Methods

Power Analysis

Planned Sample

Materials

As the present replication involves a different language from the original study, new stimuli were created in English according to the specifications of the original study. Pairs of five-word grammatical base sentences were constructed, largely using translated lexical items from the original French, modifying where necessary to avoid semantic oddity, to minimize repetitions of lexical items, and to produce sentences that were amenable to the subsequent manipulations. The last words of each pair were then swapped to produce corresponding pairs of ungrammatical base sentences. Finally, the third and fourth words of each base sentence were transposed to form the test sequences, such that the transposed words were of different grammatical categories. Following the original authors, I will refer to the two conditions as the transposed-word condition (derived from a grammatical base sentence) and the control condition (derived from an ungrammatical base sentence). These are illustrated in Table 1.

Table 1. Examples of base sentences and test sequences

Sequence              Example
Base
  Grammatical         You should really leave now.
                      Her handsome neighbor is moving.
  Ungrammatical       You should really leave moving.
                      Her handsome neighbor is now.
Test
  Transposed-word     You should leave really now.
                      Her handsome is neighbor moving.
  Control             You should leave really moving.
                      Her handsome is neighbor now.
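
The construction steps described above can be sketched in a few lines of Python. This is only an illustration of the two manipulations (swapping the final words of a pair, then transposing the third and fourth words); the function names are hypothetical, and the actual stimuli were constructed and checked by hand rather than generated by this code.

```python
def split_words(sentence):
    """Split a sentence into words, dropping the final period."""
    return sentence.rstrip(".").split()


def join_words(words):
    """Rebuild a sentence from its words and restore the final period."""
    return " ".join(words) + "."


def swap_last_words(first, second):
    """Swap the final words of a sentence pair (grammatical -> ungrammatical base)."""
    a, b = split_words(first), split_words(second)
    a[-1], b[-1] = b[-1], a[-1]
    return join_words(a), join_words(b)


def transpose_third_fourth(sentence):
    """Transpose the third and fourth words of a five-word base sentence."""
    words = split_words(sentence)
    words[2], words[3] = words[3], words[2]
    return join_words(words)


# The example pair from Table 1.
grammatical = ("You should really leave now.", "Her handsome neighbor is moving.")
ungrammatical = swap_last_words(*grammatical)

# Test sequences: transposed-word (from grammatical bases) and control (from ungrammatical bases).
transposed_word = tuple(transpose_third_fourth(s) for s in grammatical)
control = tuple(transpose_third_fourth(s) for s in ungrammatical)
```

Applied to the example pair, this reproduces the ungrammatical base sentences and both sets of test sequences listed in Table 1.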

The remainder of the stimulus construction followed the original study: 160 ungrammatical test sequences were constructed from 80 grammatical and 80 ungrammatical base sentences. These were distributed into two lists, such that participants only saw half of each type of test sequence, and did not experience “repetition of sequences containing the same words”. Additionally, “each participant also saw 80 grammatically correct sentences that were not related in any way to the base sequences used to generate the ungrammatical test sequences.”

The full set of English stimuli can be found here. In this file, items 1–40 have transposed-word sequences in List 1 and control sequences in List 2, items 41–80 have control sequences in List 1 and transposed-word sequences in List 2, and items 81–160 are grammatically correct sentences identical in both lists.
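
The counterbalancing scheme just described can be expressed as a small lookup. The sketch below assumes items are numbered 1 to 160 and lists are numbered 1 and 2, as in the stimulus file; the function name and the condition labels are illustrative, not taken from the experiment code.

```python
from collections import Counter


def condition_for(item, list_id):
    """Return the condition of a given item number in a given list."""
    if not 1 <= item <= 160:
        raise ValueError("item must be between 1 and 160")
    if list_id not in (1, 2):
        raise ValueError("list_id must be 1 or 2")
    if item > 80:
        # Items 81 to 160 are grammatical sentences, identical in both lists.
        return "grammatical"
    in_first_half = item <= 40
    # Items 1 to 40 are transposed-word in List 1; items 41 to 80 in List 2.
    if (list_id == 1) == in_first_half:
        return "transposed-word"
    return "control"


# Count the conditions that a participant assigned to each list would see.
for list_id in (1, 2):
    counts = Counter(condition_for(item, list_id) for item in range(1, 161))
    print(list_id, dict(counts))
```

Under this scheme each list contains 40 transposed-word sequences, 40 control sequences, and 80 grammatical sentences, and no item appears in both ungrammatical conditions within a list.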

Procedure

The procedure from the original paper was followed closely: “Participants were instructed to decide as rapidly and as accurately as possible whether the sequence of words was grammatically correct. On each trial, a fixation cross was displayed on the center of the screen during a random time ranging between 500 and 700 ms, followed by the stimulus (a five-word sequence) centered on the screen. The distance between the central fixation cross and the first letter of the sequence varied between 8° and 18° of visual angle as a function of the length of the five-word sequence. The word sequence remained on screen until response. After this, a feedback dot was presented for 700 ms, in green if the response was correct or in red if the response was incorrect.”

Additional detail regarding online presentation was as follows: “Stimuli were presented online using Java protocol on the personal computer of the participant. Sentences were presented in 30-point mono-spaced font (Droid Sans Mono) in black on a white background. Participants were asked to sit about 60 cm from the monitor, such that 1 cm equaled approximately 1° of visual angle. Participants responded using their index fingers with two arrows on the computer keyboard: right for grammatical decisions and left for ungrammatical decisions.”
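
The stated correspondence between 1 cm and 1° at a 60 cm viewing distance can be checked with the standard visual-angle formula, angle = 2 · atan(size / (2 · distance)). The following is a minimal sketch; the function name is hypothetical, and in an online study the viewing distance is of course only approximate.

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (in degrees) subtended by an object of a given size at a given distance."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


# One centimetre on screen, viewed from about 60 cm.
print(round(visual_angle_deg(1.0, 60.0), 2))
```

At 60 cm, 1 cm subtends roughly 0.95°, so treating 1 cm as approximately 1° of visual angle is a reasonable simplification.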

The only deviation from the above procedure is that the experiment was coded in JavaScript (using the jsPsych library) instead of Java; this is unlikely to have made any substantial difference to the results.

Analysis Plan

As in the original paper, “response times (RT; the time between onset of stimulus presentation and participant’s response) for correct responses and response accuracy” were analyzed. Also following the original paper, RTs were inverse-transformed (-1000/RT) prior to analysis to bring their distribution closer to normal.
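
As a small worked illustration of the transformation, the sketch below applies -1000/RT to a few response times. It assumes RTs are recorded in milliseconds; the function name is hypothetical.

```python
def inverse_transform(rt_ms):
    """Inverse-transform a response time in milliseconds as -1000/RT."""
    if rt_ms <= 0:
        raise ValueError("response time must be positive")
    return -1000.0 / rt_ms


# Example response times in milliseconds.
for rt in (500, 1000, 2000):
    print(rt, inverse_transform(rt))
```

A 500 ms response maps to -2.0, a 1000 ms response to -1.0, and a 2000 ms response to -0.5. The negative sign preserves the ordering of the raw RTs (slower responses still yield larger values), while the long right tail of the distribution is compressed.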

Exclusion criteria

The exclusion criteria used in the original paper were:

  1. Low overall accuracy (no specific cutoff), and
  2. RTs beyond 2.5 standard deviations from the grand mean.

These were adopted for the present replication, with an accuracy cutoff specified at 50% (i.e. at chance), such that participants who performed worse than chance were excluded.
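
The two exclusion steps could be implemented along the following lines. This is a sketch under several assumptions not specified above: trials are represented as dictionaries with hypothetical keys "participant", "rt" (in milliseconds), and "correct" (1 or 0); the RT criterion is applied at the trial level to raw RTs after participant exclusion; and the sample standard deviation is used.

```python
from statistics import mean, stdev


def apply_exclusions(trials, min_accuracy=0.5, sd_cutoff=2.5):
    """Drop below-chance participants, then trials with RTs beyond the SD cutoff."""
    # Step 1: compute each participant's overall accuracy.
    responses = {}
    for trial in trials:
        responses.setdefault(trial["participant"], []).append(trial["correct"])
    kept_ids = {
        pid for pid, correct in responses.items()
        if sum(correct) / len(correct) >= min_accuracy
    }
    kept = [t for t in trials if t["participant"] in kept_ids]

    # Step 2: remove trials whose RT lies beyond sd_cutoff SDs from the grand mean.
    if len(kept) < 2:
        return kept  # the SD is undefined for fewer than two trials
    rts = [t["rt"] for t in kept]
    grand_mean, sd = mean(rts), stdev(rts)
    return [t for t in kept if abs(t["rt"] - grand_mean) <= sd_cutoff * sd]
```

Participants at exactly 50% accuracy are retained here, since only those performing worse than chance are excluded.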

Key analyses of interest

The key analyses of the original paper were:

  1. Linear mixed-effects (LME) model to analyze RTs, and
  2. Generalized (logistic) linear mixed-effects (GLME) model to analyze accuracy.

Effects were considered reliable if |t| (for LME) or |z| (for GLME) was greater than 1.96. Furthermore, the original authors “used the maximal random structure model that converged …, and this included by-participant and by-item random intercepts in all analyses”. These analyses were also adopted for the present replication.
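
The 1.96 threshold corresponds to a two-tailed test at approximately the .05 level under a standard normal distribution. The sketch below illustrates this; treating the t statistic from an LME as approximately normal is an assumption that is reasonable only with a large number of observations, and the names used here are illustrative.

```python
from statistics import NormalDist

THRESHOLD = 1.96  # reliability criterion used in the original paper


def is_reliable(statistic, threshold=THRESHOLD):
    """Return True if the absolute t or z statistic exceeds the threshold."""
    return abs(statistic) > threshold


# Two-tailed probability of exceeding the threshold under a standard normal distribution.
two_tailed_p = 2 * (1 - NormalDist().cdf(THRESHOLD))
print(round(two_tailed_p, 3))
```

For example, a statistic of -2.3 would count as reliable under this criterion, whereas one of 1.5 would not.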

Differences from Original Study

In summary, two key methodological differences exist between the original study and the present replication: the recruitment of participants on MTurk (instead of volunteers), and the use of novel English stimuli (instead of the original French). The former may influence the results to a small extent due to differences in motivation, but it is unlikely to change the results substantially, considering that the study involves a response time task. The latter, however, is much more likely to affect the results. Whether the present replication succeeds or fails would thus depend strongly on whether the transposed-word effect generalizes to English.

Actual Sample

Differences from pre-data collection methods plan

Results

Data preparation

Confirmatory analysis

Exploratory analyses

Discussion

Summary of Replication Attempt

Commentary