Replication of Study 1 by Luk & Surrain (2019, PsyArXiv Preprints)

Introduction

This project is a replication of Study 1 in Luk and Surrain (2019). All relevant documentation can be found in this GitHub repository. The authors developed a scale measuring the perception of bilingualism, the 13-item Perception of Bilingualism scale (POB), and conducted psychometric analyses using both a classical test theory (CTT) and an item response theory (IRT) approach. The authors established unidimensionality of the POB scale using exploratory factor analysis (EFA; 80 % of variance explained by a single factor), yet a single-factor confirmatory factor analysis (CFA) showed insufficient model fit across multiple indices (χ²(65) = 501.23). In a graded response model, they flagged 3 out of 13 items as uninformative.

Further, using a multiple linear regression model, they showed that participants’ language background is consistently predictive of POB score, regardless of age, education, or sex. The direction of this observation confirmed their hypothesis that bilingualism is perceived more positively by individuals who speak more than one language themselves.

This replication is important (a) because it will solidify our knowledge about how bilingual individuals perceive their bilingual status and (b) because, to date, there is no reliable scale measuring how bilingualism is perceived by individuals. Obtaining this information can help inform policy and education and can aid in assessing how successfully current research findings on multilingualism have been disseminated to the public.

Methods

Power Analysis

In the original study, effect sizes (R²) of the multiple linear regression models ranged from .15 to .24. For the replication of the regression analyses, detecting an effect of R² = .24 requires sample sizes of N = 47, 58, and 69 to achieve statistical power of 80 %, 90 %, and 95 %, respectively. Whether the sample size will allow an IRT analysis, and how reliable its estimates would be, is yet to be determined.
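The sample sizes above can be checked with a short power calculation. The sketch below is a minimal illustration and not necessarily the procedure used to obtain the reported figures: it converts R² to Cohen's f² = R² / (1 − R²) and searches for the smallest N at which the overall F test of the regression reaches the target power. The number of predictors (4: age, sex, education, language background), α = .05, and the noncentrality convention λ = f² · N are all assumptions; with a different number of predictors or a different convention, the resulting N will differ.

```python
from scipy.stats import f as f_dist, ncf


def r2_to_f2(r2):
    """Convert R-squared to Cohen's f-squared."""
    return r2 / (1.0 - r2)


def regression_power(n, n_predictors, r2, alpha=0.05):
    """Power of the overall F test in a multiple regression with n cases."""
    df_num = n_predictors
    df_den = n - n_predictors - 1
    if df_den < 1:
        return 0.0
    # Assumed noncentrality convention: lambda = f2 * N.
    ncp = r2_to_f2(r2) * n
    # Critical value under the null, then tail probability under the alternative.
    f_crit = f_dist.ppf(1.0 - alpha, df_num, df_den)
    return float(ncf.sf(f_crit, df_num, df_den, ncp))


def required_n(n_predictors, r2, power, alpha=0.05, n_max=10_000):
    """Smallest total sample size whose power meets or exceeds the target."""
    for n in range(n_predictors + 2, n_max + 1):
        if regression_power(n, n_predictors, r2, alpha) >= power:
            return n
    raise ValueError("target power not reached within n_max")


if __name__ == "__main__":
    # n_predictors=4 is an assumption, not a value stated in the original study.
    for target in (0.80, 0.90, 0.95):
        print(target, required_n(n_predictors=4, r2=0.24, power=target))
```

Rerunning `required_n` with R² = .15, the lower end of the original range, shows how much larger the sample would need to be if the true effect is smaller than the largest one reported.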

Planned Sample

The authors of the original paper recruited N = 422 US-American participants via Qualtrics Panel and Amazon Mechanical Turk. For the replication, all participants will be recruited using Amazon Mechanical Turk. Using one of the original sampling frames increases the likelihood of obtaining similar sample characteristics. Given the power calculations in the previous section, the desired sample size for the regression analysis, after exclusions for failed attention or comprehension checks, is N = 70. In order to replicate the IRT analysis, however, a much larger sample is needed; the feasibility of this is yet to be determined. Any IRT replications will be tentative at best.

Materials

The replication will use the following materials, as did the original study. The POB is available as part of the original paper, and I will contact the authors to ask for permission to use it for this replication and to obtain the precise wording of the other materials.

Perception of Bilingualism scale; Luk and Surrain (2019, pp. 12-13) described its development as follows: “An initial set of 13 items was developed based on our review of the literature, cognitive interviews, and input from members of the research team in our lab who have worked with linguistically diverse populations across the lifespan. The initial set of items […] covered perceptions of whether speaking multiple languages in the U.S. should be acknowledged, accommodated, rewarded and supported; whether speaking multiple languages in the U.S. is needed and valued; and whether speaking multiple languages incurs personal benefits and costs. Several items were adapted from Baker’s Attitude to Bilingualism Scale (21) and Byrnes and Kiger’s Language Attitudes of Teachers Scale (LATS; 33,34). We chose to use a 6-point Likert scale from 1 (strongly disagree) to 6 (strongly agree) with no midpoint option elicit greater variability and discourage satisficing, or providing a response without expending the cognitive effort required to fully interpret and respond to each item (55,56)”;
Demographic questionnaire;
10 items from MacPhee’s Knowledge of Infant Development Inventory;
multiple attention and comprehension checks.

Procedure

In one combined Qualtrics survey, participants will provide informed consent, complete the POB, respond to a basic set of questions about their demographics, educational attainment, and language background, and complete 10 items from MacPhee’s Knowledge of Infant Development Inventory. In the original paper, median survey completion time was 13 minutes; I expect a similar completion time in the replication.

Analysis Plan

The analysis plan mirrors that of the original paper: All data from participants who fail either the attention or the comprehension check will be excluded. A descriptive overview showing demographic characteristics of the resultant sample will be provided. For the psychometric analysis of the POB, I will conduct a CTT analysis, followed by an EFA and a CFA to ensure the conditions for a subsequent IRT analysis are met. The IRT analysis will then provide item characteristics for all 13 items, including category-characteristic curves showing discrimination and location parameters, as well as item information curves. Based on the item parameters, analyses will be repeated for subsets of items after excluding problematic items. Finally, correlations between POB scores, age, sex, language background, and years of education will be explored, and all predictors will be entered into a multiple linear regression model, exactly as in the original paper.
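The first two steps of this plan, exclusion and a CTT reliability estimate, can be sketched as follows. This is an illustrative sketch only: the field names (`passed_attention`, `passed_comprehension`, `items`) and the toy responses are hypothetical, three items stand in for the 13 POB items, and Cronbach's alpha is used as one common CTT reliability coefficient, which the plan does not name explicitly.

```python
from statistics import variance


def apply_exclusions(participants):
    """Keep only participants who passed both the attention and comprehension checks."""
    return [
        p for p in participants
        if p["passed_attention"] and p["passed_comprehension"]
    ]


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of participants' item-score lists."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two participants")
    # Sample variance of each item (column) and of the summed total score.
    item_vars = [variance(col) for col in zip(*rows)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


# Hypothetical toy data on the 1-6 response scale; not real study data.
participants = [
    {"passed_attention": True, "passed_comprehension": True, "items": [5, 6, 5]},
    {"passed_attention": True, "passed_comprehension": True, "items": [2, 1, 2]},
    {"passed_attention": False, "passed_comprehension": True, "items": [3, 3, 3]},
    {"passed_attention": True, "passed_comprehension": True, "items": [4, 4, 3]},
    {"passed_attention": True, "passed_comprehension": False, "items": [6, 6, 6]},
    {"passed_attention": True, "passed_comprehension": True, "items": [6, 5, 6]},
]

retained = apply_exclusions(participants)
alpha = cronbach_alpha([p["items"] for p in retained])
print(f"retained {len(retained)} of {len(participants)}; alpha = {alpha:.3f}")
```

Computing alpha only after exclusions matches the order of the plan, so that inattentive responses do not distort the reliability estimate.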

Differences from Original Study

While the original study recruited participants using both Qualtrics Panel and Amazon Mechanical Turk, the replication will use only the latter. Further, for the replication, I will most likely not be able to systematically oversample to guarantee a sufficiently large representation of parents of children exposed to both Spanish and English. Because Luk and Surrain did not provide results split by whether or not participants were parents, the effect of this sampling difference is impossible to predict. Lastly, given that only 4 % of participants in the original study took up the opportunity to complete the survey in Spanish, the replication will use only an English version. Overall, the replication will remain very close to the original study; hence, it is reasonable to expect very similar results.