Replication of Exploring second language learners’ grammaticality judgment performance in relation to task design features by Shiu, Yalçın, & Spada (2018, System)

Author

Replication Author: Tonya Murray (tonyamur@stanford.edu)

Published

October 5, 2025

Introduction

This study replicates “Exploring second language learners’ grammaticality judgment performance in relation to task design features” (Shiu, Yalçın, and Spada 2018). The original study investigated whether two task design features of a grammaticality judgment task (GJT) affected adult English language learners’ performance on two grammatical features of English (the passive voice and the past progressive). The two design features were time constraint (timed vs. untimed) and modality (aural vs. written). The study recruited 120 adult English-as-a-foreign-language (EFL) learners at a university in Taiwan. Participants judged items as grammatical or ungrammatical on four computer-based GJTs that varied along the timed/untimed and aural/written dimensions. Each GJT consisted of 60 items (30 grammatical and 30 ungrammatical). Each item targeted either the passive voice or the past progressive, two features hypothesized to differ in learning difficulty. The results showed significant effects of all three variables: time constraint, modality, and grammatical feature. Although learners performed better on past progressive items, performance on both grammatical features followed similar patterns across the task design features.
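To make the design concrete, the sketch below shows one way responses from the GJTs could be scored. A judgment counts as correct when it matches the item’s grammaticality, and accuracy is computed per participant within each cell of time constraint × modality × grammatical feature. These per-participant cell scores are the kind of values a repeated-measures analysis would typically use. This is a minimal sketch, not the original authors’ scoring procedure. The `Judgment` record, its field names, and `accuracy_by_cell` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgment:
    """One response to one GJT item (hypothetical record format)."""
    participant: str
    modality: str             # "aural" or "written"
    timed: bool               # True for the timed version of the task
    feature: str              # "passive" or "past_progressive"
    grammatical: bool         # True if the item is a grammatical sentence
    judged_grammatical: bool  # the participant's judgment


def accuracy_by_cell(judgments):
    """Proportion correct per (participant, modality, timed, feature) cell."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for j in judgments:
        key = (j.participant, j.modality, j.timed, j.feature)
        total[key] += 1
        # A judgment is correct when it matches the item's true status.
        if j.judged_grammatical == j.grammatical:
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


# Usage with three made-up responses from one participant:
example = [
    Judgment("p01", "aural", True, "passive", True, True),
    Judgment("p01", "aural", True, "passive", False, True),
    Judgment("p01", "written", False, "past_progressive", False, False),
]
scores = accuracy_by_cell(example)
# scores[("p01", "aural", True, "passive")] → 0.5
```

Grammatical and ungrammatical items are pooled here. Adding `j.grammatical` to the key would score them separately if the analysis plan calls for it.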

I chose this study because it relates to my own research, which uses a digital adaptation of the Test for Reception of Grammar (Bishop 1992), an assessment of implicit knowledge of English syntax. Preliminary results suggest that performance on this measure improves as L2 students in grades 2–5 gain English proficiency. The current task uses aural prompts with picture response options, and I am interested in comparing aural and written modalities on the task. I am also interested in testing additional English features (such as tense) that are better suited to written stimuli than to pictorial stimuli. Finally, I would like to extend the measure to adolescent and adult learners.

The key features needed to implement a GJT already exist on the Rapid Online Assessment of Reading (ROAR) platform (Yeatman et al. 2021). ROAR provides jsPsych infrastructure for playing audio clips, displaying written stimuli, and recording keyboard responses. It also provides standalone web links for Prolific studies, user management, and a database for storing responses. Because I have built other ROAR apps, I expect to be able to convert an existing app into a GJT without much difficulty. The biggest challenges will be obtaining or creating the item stimuli and recruiting participants who are L2 English learners. I plan to contact the original authors to ask whether I may use their 240 items. If the original items are unavailable, I will use a large language model to assist with item creation.

The original study was conducted in two sessions one week apart. At each session, participants completed two GJTs with a 30-minute break between them. To reduce the time required for the replication, only two GJTs (contrasting the aural and written modalities) will be given, with a minimal break between them. If recruiting L2 English learners on Prolific is unsuccessful, I will seek permission to recruit Stanford students who are L2 English speakers.
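The following is a minimal sketch of how the two-GJT session could be assembled. It assumes two 60-item sets, each with 30 grammatical and 30 ungrammatical items. It also assumes a counterbalancing scheme that crosses which modality comes first with which item set is heard versus read. The counterbalancing scheme, the item dictionaries, and the function names are all hypothetical, and the original paper’s ordering procedure should be checked before adopting them. The sketch only builds trial specifications. Presenting them (audio playback, text display, keyboard responses) would be left to the existing jsPsych code in ROAR.

```python
import random

MODALITIES = ("aural", "written")


def is_balanced(items, n_items=60):
    """True if the item set has n_items items, exactly half of them grammatical."""
    n_grammatical = sum(1 for item in items if item["grammatical"])
    return len(items) == n_items and n_grammatical == n_items // 2


def build_session(set_a, set_b, participant_index, seed=None):
    """Return two shuffled GJT blocks for one participant (hypothetical format).

    participant_index % 4 picks one of four counterbalancing conditions
    (an assumption, not the original design): conditions 0-1 present the
    aural GJT first, 2-3 the written GJT first; even conditions put set_a
    in the first block, odd conditions put set_b there.
    """
    condition = participant_index % 4
    order = MODALITIES if condition < 2 else MODALITIES[::-1]
    sets = (set_a, set_b) if condition % 2 == 0 else (set_b, set_a)
    rng = random.Random(seed)
    session = []
    for modality, items in zip(order, sets):
        # Copy each item and tag it with the modality it will be presented in.
        block = [dict(item, modality=modality) for item in items]
        rng.shuffle(block)  # randomize item order within the GJT
        session.append(block)
    return session


# Usage with placeholder item sets (ids and grammaticality are made up):
set_a = [{"id": f"A{i}", "grammatical": i % 2 == 0} for i in range(60)]
set_b = [{"id": f"B{i}", "grammatical": i % 2 == 0} for i in range(60)]
first_block, second_block = build_session(set_a, set_b, participant_index=3, seed=42)
```

Across every four consecutive participants, each item set appears once in each modality and position. This keeps item-set differences from being confounded with modality or order.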

Repository: murray2025
Original paper: Exploring second language learners’ grammaticality judgment performance in relation to task design features (Shiu, Yalçın, & Spada, 2018)

Methods

Power Analysis

Original effect size, power analysis for samples to achieve 80%, 90%, 95% power to detect that effect size. Considerations of feasibility for selecting planned sample size.
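Because the replication compares aural and written performance within the same participants, one simple way to plan the sample is a two-sided paired comparison. The sketch below uses the normal approximation n = ((z_{1-α/2} + z_{power}) / d_z)^2. Here d_z is the mean within-participant difference divided by the standard deviation of those differences. The exact t-based calculation gives a slightly larger n. An effect size reported in the original paper (for example, from an ANOVA) would need to be converted to d_z first. The function name is hypothetical, and d_z = 0.5 in the usage line is a placeholder, not the original effect size.

```python
import math
from statistics import NormalDist


def paired_sample_size(d_z, power, alpha=0.05):
    """Approximate participants needed for a two-sided paired comparison.

    Normal approximation: n = ((z_{1-alpha/2} + z_{power}) / d_z) ** 2,
    rounded up. It slightly underestimates the exact t-test requirement.
    """
    if d_z <= 0 or not 0 < power < 1:
        raise ValueError("d_z must be positive and power must be in (0, 1)")
    z = NormalDist()  # standard normal distribution
    n = ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / d_z) ** 2
    return math.ceil(n)


# d_z = 0.5 is a placeholder; substitute the effect size from the original study.
sizes = {p: paired_sample_size(0.5, p) for p in (0.80, 0.90, 0.95)}
# → {0.8: 32, 0.9: 43, 0.95: 52}
```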

Planned Sample

Planned sample size and/or termination rule, sampling frame, known demographics if any, preselection rules if any.

Materials

All materials. You can quote directly from the original article: put the text in quotation marks and note that it was followed precisely, or quote it and point out any exceptions to what was described in the original article.

Procedure

You can quote directly from the original article: put the text in quotation marks and note that it was followed precisely, or quote it and point out any exceptions to what was described in the original article.

Analysis Plan

You can also quote directly, though analysis strategies are less often spelled out in enough detail to quote. The key is to report an analysis strategy that is as close to the original as possible, including data cleaning rules, data exclusion rules, and covariates.

Clarify the key analysis of interest here. You can also pre-specify additional analyses you plan to do.
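The following is a minimal sketch of one possible confirmatory contrast. It assumes the key analysis is the within-participant difference in accuracy between the aural and written GJTs. It returns the mean difference, the paired t statistic, and Cohen’s d_z. This framing is an assumption for illustration. The original study analyzed time constraint, modality, and grammatical feature together, and its model should be matched where possible. The function name is hypothetical. For a p-value, `scipy.stats.ttest_rel(aural, written)` runs the same paired t test.

```python
import math
from statistics import mean, stdev


def paired_comparison(aural, written):
    """Paired t statistic and Cohen's d_z for per-participant accuracy.

    aural[i] and written[i] must be the same participant's scores.
    """
    if len(aural) != len(written) or len(aural) < 2:
        raise ValueError("need two equal-length lists with at least 2 participants")
    diffs = [a - w for a, w in zip(aural, written)]
    d_z = mean(diffs) / stdev(diffs)   # standardized mean difference
    t = d_z * math.sqrt(len(diffs))    # equals mean / (sd / sqrt(n))
    return {"mean_diff": mean(diffs), "t": t, "df": len(diffs) - 1, "d_z": d_z}
```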

Differences from Original Study

Explicitly describe known differences in sample, setting, procedure, and analysis plan from original study. The goal, of course, is to minimize those differences, but differences will inevitably occur. Also, note whether such differences are anticipated to make a difference based on claims in the original article or subsequent published research on the conditions for obtaining the effect.

Methods Addendum (Post Data Collection)

You can comment this section out prior to final report with data collection.

Actual Sample

Sample size, demographics, data exclusions based on rules spelled out in analysis plan

Differences from pre-data collection methods plan

Any differences from what was described as the original plan, or “none”.

Results

Data preparation

Data preparation following the analysis plan.

Confirmatory analysis

The analyses as specified in the analysis plan.

A side-by-side graph with the original graph is ideal here.

Exploratory analyses

Any follow-up analyses desired (not required).

Discussion

Summary of Replication Attempt

Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.

Commentary

Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) an assessment of what the replication (or failure to replicate) means. For a failure to replicate, for example, consider whether the differences between the original and present study were definitely, plausibly, or unlikely to have been moderators of the result. Also include (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.

References

Bishop, Dorothy. 1992. T.R.O.G. Test for Reception of Grammar. Chapel Press.
Shiu, Li-Ju, Şebnem Yalçın, and Nina Spada. 2018. “Exploring Second Language Learners’ Grammaticality Judgment Performance in Relation to Task Design Features.” System 72 (February): 215–25. https://doi.org/10.1016/j.system.2017.12.004.
Yeatman, Jason D., Kenny An Tang, Patrick M. Donnelly, Maya Yablonski, Mahalakshmi Ramamurthy, Iliana I. Karipidis, Sendy Caffarra, et al. 2021. “Rapid Online Assessment of Reading Ability.” Scientific Reports 11 (1): 6396. https://doi.org/10.1038/s41598-021-85907-x.