Introduction

Wang et al. (2017) examined the intuitions of native Chinese speakers about word segmentation. Participants were presented with sentences containing keywords of varying semantic transparency and completed a segmentation task in which they inserted a marker at each perceived word boundary. Three agreement metrics were then calculated from these data: proportionate agreement, Cohen’s kappa, and Fleiss’ kappa. Based on the results, the authors concluded that agreement in word segmentation among Chinese speakers was almost perfect, and that semantic transparency did not affect that agreement. The paper is based on the first author’s PhD thesis (2016).
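To make the three agreement metrics concrete, the following is a minimal sketch in Python. It assumes that each participant's segmentation of a sentence is coded as a binary vector with one entry per position between adjacent characters (1 = boundary marked, 0 = no boundary). This coding scheme, the function names, and the toy data are illustrative assumptions and are not taken from the original paper or thesis.

```python
from collections import Counter


def proportion_agreement(a, b):
    """Share of boundary positions on which two raters give the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa for two raters: observed agreement corrected for chance.

    Undefined (division by zero) if expected agreement equals 1.
    """
    n = len(a)
    p_o = proportion_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's own marginal label frequencies.
    p_e = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / n ** 2
    return (p_o - p_e) / (1 - p_e)


def fleiss_kappa(ratings):
    """Fleiss' kappa for any number of raters who all rate the same items.

    `ratings` is a list of raters, each a list of labels (one per item).
    """
    n_raters = len(ratings)
    n_items = len(ratings[0])
    totals = Counter()
    per_item_agreement = 0.0
    for i in range(n_items):
        counts = Counter(r[i] for r in ratings)
        totals.update(counts)
        # Proportion of agreeing rater pairs for this item.
        per_item_agreement += (
            sum(c * c for c in counts.values()) - n_raters
        ) / (n_raters * (n_raters - 1))
    p_bar = per_item_agreement / n_items
    # Chance agreement from pooled label frequencies across all raters.
    p_e = sum((c / (n_items * n_raters)) ** 2 for c in totals.values())
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy data: three raters, eight possible boundary positions.
rater_1 = [1, 0, 1, 0, 0, 1, 0, 1]
rater_2 = [1, 0, 1, 0, 0, 1, 0, 1]
rater_3 = [1, 0, 0, 0, 0, 1, 0, 1]

if __name__ == "__main__":
    print(proportion_agreement(rater_1, rater_3))
    print(cohens_kappa(rater_1, rater_3))
    print(fleiss_kappa([rater_1, rater_2, rater_3]))
```

Proportionate agreement and Cohen’s kappa are pairwise measures, so in a study with many participants they would be computed for each pair of raters and then summarised, whereas Fleiss’ kappa yields a single value for the whole group.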

As a field, formal linguistics has been slow to adopt good, open, and reproducible science practices, and many papers in syntax and semantics rely solely on the intuitions of the author or of a small number of informants (see Juzek, 2016). Wang et al.’s paper represents part of a movement towards meaningful linguistic data collection, making use of formal judgements as a means to measure speakers’ intuitions about their native language. As such, replicating this study sets an important precedent for good research practices in formal linguistics and helps to encourage conversation about linguistic methodology, which is my area of research interest.

The materials needed to replicate this study include the test sentences (which serve as stimuli for the segmentation task), as well as semantic transparency data (which are used as predictors for analysis). The study also employed demographic questions and a set of screening questions. These materials are available as appendices in Wang’s PhD thesis linked above. The original experiment was conducted online as a questionnaire on a crowdsourcing platform, and this procedure will also be adopted for the replication study.

Since the materials, procedures, and analyses are described in considerable detail in Wang’s PhD thesis, reimplementing the experimental and analytical designs is likely to be relatively straightforward. One critical challenge, however, is recruiting enough participants; in particular, the authors mentioned that it took them 1.3 months to collect the number of responses they had planned for. As such, selecting a sample size that is both feasible and sufficiently powered will be crucial for this replication.

The repository for this project is hosted on GitHub.

Methods

Power Analysis

Original effect size, power analysis for samples to achieve 80%, 90%, 95% power to detect that effect size. Considerations of feasibility for selecting planned sample size.

Planned Sample

Planned sample size and/or termination rule, sampling frame, known demographics if any, preselection rules if any.

Materials

All materials - can quote directly from original article - just put the text in quotations and note that this was followed precisely. Or, quote directly and just point out exceptions to what was described in the original article.

Procedure

Can quote directly from original article - just put the text in quotations and note that this was followed precisely. Or, quote directly and just point out exceptions to what was described in the original article.

Analysis Plan

Can also quote directly, though it is less often spelled out effectively for an analysis strategy section. The key is to report an analysis strategy that is as close to the original - data cleaning rules, data exclusion rules, covariates, etc. - as possible.

Clarify key analysis of interest here. You can also pre-specify additional analyses you plan to do.

Differences from Original Study

Explicitly describe known differences in sample, setting, procedure, and analysis plan from original study. The goal, of course, is to minimize those differences, but differences will inevitably occur. Also, note whether such differences are anticipated to make a difference based on claims in the original article or subsequent published research on the conditions for obtaining the effect.

Methods Addendum (Post Data Collection)

You can comment this section out prior to final report with data collection.

Actual Sample

Sample size, demographics, data exclusions based on rules spelled out in analysis plan.

Differences from pre-data collection methods plan

Any differences from what was described as the original plan, or “none”.

Results

Data preparation

Data preparation following the analysis plan.

Confirmatory analysis

The analyses as specified in the analysis plan.

Side-by-side graph with original graph is ideal here.

Exploratory analyses

Any follow-up analyses desired (not required).

Discussion

Summary of Replication Attempt

Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.

Commentary

Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.