Introduction

This project is a replication of Study 1 in Luk and Surrain (2019). All relevant documentation can be found in this GitHub repository. The authors developed the 13-item Perception of Bilingualism scale (POB) and examined its psychometric properties using both classical test theory (CTT) and item response theory (IRT). An exploratory factor analysis (EFA) supported the unidimensionality of the POB, with a single factor explaining 80 % of the variance. A single-factor confirmatory factor analysis (CFA), however, showed insufficient fit on multiple indices (\(\chi^2(65) = 501.23\)). In a graded response model, they flagged 3 of the 13 items as uninformative.

Further, using a multiple linear regression model, they showed that participants’ language background consistently predicted POB scores, regardless of age, education, or sex. The direction of this effect confirmed their hypothesis that bilingualism is perceived more positively by individuals who themselves speak more than one language.

This replication is important (a) because it will solidify our knowledge about how bilingual individuals perceive their bilingual status and (b) because, to date, there is no reliable scale measuring how individuals perceive bilingualism. Obtaining this information can help inform policy and education, and it can help assess how successfully current research findings on multilingualism have been disseminated to the public.

Methods

Power Analysis

In the original study, the effect sizes (\(R^2\)) of the multiple linear regression models ranged from .15 to .24. Hence, for the replication of the regression analyses, detecting an effect of \(R^2 = .24\) (the largest in the original study) requires sample sizes of N = 47, 58, and 69 to achieve statistical power of 80 %, 90 %, and 95 %, respectively. If the sample size allows an IRT analysis, its power and the reliability of its estimates are yet to be determined.
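
Sample sizes like these can be checked with a power calculation for the overall F test of \(R^2\). The sketch below uses only base R and the noncentral F distribution, with Cohen's \(f^2 = R^2 / (1 - R^2)\) and noncentrality \(\lambda = f^2 N\). The number of predictors `k` is an assumption: the text does not state how many predictors entered the calculation, so `k = 4` (age, sex, language background, education) is illustrative only, and the resulting N may differ from the figures above if a different `k` was used. `regression_power()` and `min_n()` are hypothetical helper names.

```r
# Power of the overall F test for R^2 in a multiple regression with k predictors.
# Cohen's f^2 = R^2 / (1 - R^2); noncentrality lambda = f^2 * n.
regression_power <- function(n, k, r2, alpha = 0.05) {
  f2   <- r2 / (1 - r2)
  df1  <- k
  df2  <- n - k - 1
  crit <- qf(1 - alpha, df1, df2)
  pf(crit, df1, df2, ncp = f2 * n, lower.tail = FALSE)
}

# Smallest n whose power reaches the target (simple linear search).
min_n <- function(k, r2, power, alpha = 0.05) {
  n <- k + 2  # smallest n with at least one residual degree of freedom
  while (regression_power(n, k, r2, alpha) < power) n <- n + 1
  n
}

# Assumption: k = 4 predictors; R^2 = .24 as in the largest original model.
targets <- c(0.80, 0.90, 0.95)
required_n <- sapply(targets, function(p) min_n(k = 4, r2 = 0.24, power = p))
names(required_n) <- paste0(targets * 100, "%")
required_n
```

Changing `k` (for example, if language background enters as several dummy variables) changes the required N, so the value used for the final power analysis is worth stating explicitly.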

Planned Sample

The authors of the original paper recruited US-American participants (N = 422) via Qualtrics Panel and Amazon Mechanical Turk. For the replication, all participants will be recruited via Amazon Mechanical Turk. Using one of the original recruitment platforms increases the likelihood of obtaining similar sample characteristics. Given the power calculations in the previous section, the target sample size for the regression analysis, after exclusions for failed attention or comprehension checks, is N = 70. Replicating the IRT analysis, however, requires a much larger sample, and whether this is feasible is yet to be determined. Any IRT replication will therefore be tentative at most.

Materials

The replication will make use of the following materials, as did the original study. The POB is available as part of the original paper and I will contact the authors to ask for permission to use it for this replication as well as to obtain the other materials’ precise wordings.

  • Perception of Bilingualism scale; Luk and Surrain (2019, pp. 12-13) described its development as follows: “An initial set of 13 items was developed based on our review of the literature, cognitive interviews, and input from members of the research team in our lab who have worked with linguistically diverse populations across the lifespan. The initial set of items […] covered perceptions of whether speaking multiple languages in the U.S. should be acknowledged, accommodated, rewarded and supported; whether speaking multiple languages in the U.S. is needed and valued; and whether speaking multiple languages incurs personal benefits and costs. Several items were adapted from Baker’s Attitude to Bilingualism Scale (21) and Byrnes and Kiger’s Language Attitudes of Teachers Scale (LATS; 33,34). We chose to use a 6-point Likert scale from 1 (strongly disagree) to 6 (strongly agree) with no midpoint option [to] elicit greater variability and discourage satisficing, or providing a response without expending the cognitive effort required to fully interpret and respond to each item (55,56)”;
  • Demographic questionnaire;
  • 10 items from MacPhee’s Knowledge of Infant Development Index;
  • multiple attention and comprehension checks.

Procedure

In one combined Qualtrics survey, participants will give informed consent, complete the POB, answer a basic set of questions about their demographics, educational attainment, and language background, and complete 10 items from MacPhee’s Knowledge of Infant Development Index. In the original study, median survey completion time was 13 minutes; I expect it to be similar in the replication.

Analysis Plan

The analysis plan mirrors that of the original paper. All data from participants who fail either the attention or the comprehension check will be excluded. A descriptive overview of the demographic characteristics of the resulting sample will be provided. For the psychometric analysis of the POB, I will conduct a CTT analysis, followed by an EFA and a CFA to check that the conditions for a subsequent IRT analysis are met. The IRT analysis will then describe item characteristics for all 13 items, including category characteristic curves showing discrimination and location parameters, as well as item information curves. Based on the item parameters, the analyses will be repeated for subsets of items after excluding problematic items. Finally, correlations between POB scores, age, sex, language background, and years of education will be explored, and all predictors will be entered into a multiple linear regression model, exactly as in the original paper.
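
As a sketch of the CTT and EFA steps, the code below computes Cronbach's \(\alpha\) from its definition, \(\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_X^2}\right)\), where \(\sigma_i^2\) are the item variances and \(\sigma_X^2\) is the variance of the total score, and then fits a one-factor EFA with base R's `factanal()`. It runs on simulated 6-point responses (hypothetical data, not the POB); with the real data, the `PoB` data frame created during data preparation would take the place of `items`, after any reverse-keyed items are recoded. `psych::alpha()` reports the same coefficient. Note that `factanal()` works on Pearson correlations; for ordinal items, a polychoric-based EFA may be preferable.

```r
set.seed(251)

# Hypothetical data: 300 respondents, 13 items driven by one latent trait,
# cut into a 6-point scale (1 = strongly disagree ... 6 = strongly agree).
n_resp  <- 300
n_items <- 13
trait   <- rnorm(n_resp)
latent  <- sapply(seq_len(n_items), function(i) 0.8 * trait + rnorm(n_resp, sd = 0.6))
items <- as.data.frame(apply(latent, 2, function(x) {
  as.integer(cut(x, breaks = c(-Inf, -1.5, -0.75, 0, 0.75, 1.5, Inf)))
}))
names(items) <- paste0("PoB", seq_len(n_items))

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total score)
cronbach_alpha <- function(x) {
  x <- as.matrix(x)
  k <- ncol(x)
  (k / (k - 1)) * (1 - sum(apply(x, 2, var)) / var(rowSums(x)))
}
alpha_hat <- cronbach_alpha(items)

# One-factor exploratory factor analysis (maximum likelihood, base R)
efa <- factanal(items, factors = 1)
loadings_1 <- efa$loadings[, 1]
prop_var <- sum(loadings_1^2) / n_items  # share of variance explained by the factor

round(c(alpha = alpha_hat, prop_var = prop_var), 2)
```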

Differences from Original Study

While the original study recruited participants via both Qualtrics Panel and Amazon Mechanical Turk, the replication will use only the latter. Further, I will most likely not be able to systematically oversample to guarantee a sufficiently large representation of parents of children exposed to both Spanish and English. Because Luk and Surrain did not report results separately for parents and non-parents, the effect of this sampling difference cannot be predicted. Lastly, given that only 4 % of participants in the original study chose to complete the survey in Spanish, the replication will use only an English version. Overall, the replication remains very close to the original study; hence, it is reasonable to expect very similar results.

Results

Data preparation

Data preparation will follow the analysis plan outlined above. Qualtrics survey results will be downloaded as a csv file, imported into R, and cleaned using the tidyverse.

install.packages("tidyverse")
install.packages("readr")
install.packages("ltm")
install.packages("psych")
library(tidyverse)
library(readr)
library(ltm)
library(psych)
## 
## Attaching package: 'psych'
## The following object is masked from 'package:ltm':
## 
##     factor.scores
## The following object is masked from 'package:polycor':
## 
##     polyserial
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
# import data from csv (Qualtrics output)
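# stringsAsFactors = FALSE keeps responses as character, so as.numeric() below converts values, not factor level codes
d1 <- read.csv("~/Documents/Stanford/Academics/PSYCH251/luk2019/writeup/data/luk2019_replication_pilotA_October+27%2C+2019_01.06.csv", stringsAsFactors = FALSE)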

# remove rows 1 & 2 containing the duplicate header & other superfluous variable information
d2 <- d1[-c(1,2),]

# remove columns 1-4 and 8-17 (Qualtrics metadata, e.g. StartDate, EndDate, IPAddress, RecordedDate, RecipientLastName, RecipientFirstName, RecipientEmail, ExternalReference, LocationLatitude, LocationLongitude, DistributionChannel, UserLanguage)
d3 <- d2[, -c(1:4, 8:17)]

# change class of PoB responses from character to numeric
    # 1 - Strongly disagree
    # 2 - Disagree
    # 3 - Somewhat disagree
    # 4 - Somewhat agree
    # 5 - Agree
    # 6 - Strongly agree

pob_items <- c("PoB1", "PoB2", "PoB3R", "PoB4", "PoB5R", "PoB6", "PoB7",
               "PoB8", "PoB9", "PoB10", "PoB11R", "PoB12", "PoB13")
# as.character() first, so factor columns are converted by value, not by level code
d3[pob_items] <- lapply(d3[pob_items], function(x) as.numeric(as.character(x)))

# create separate dataframe for PoB scale analysis
PoB <- d3[,c(4:16)]

# change class of demographic variables from character to numeric & add descriptors
# (as.character() first, so factor columns are converted by value, not by level code)
to_num <- function(x) as.numeric(as.character(x))

# L1: 1 - English, 2 - Spanish, 3 - Other
d3$L1 <- to_num(d3$L1)

# L2: 1 - English, 2 - Spanish, 3 - Other, 4 - None
d3$L2 <- to_num(d3$L2)

# Education: 1 - ..., 2 - ...
d3$Education <- to_num(d3$Education)

# Age
d3$Age <- to_num(d3$Age)

# Birthplace: 1 - US, 2 - Other
d3$Birthplace <- to_num(d3$Birthplace)

# Ethnicity: 1 - White, 2 - ..., 3 - ...
d3$Ethnicity <- to_num(d3$Ethnicity)

# Sex: 1 - Male, 2 - Female, 3 - Other
d3$Sex <- to_num(d3$Sex)


# create variable "EO" (L1 English, no L2)
d3$EO <- ifelse(d3$L1 == 1, ifelse(d3$L2 == 4, 1, 0), 0)

# change name of PoB score variable
names(d3)[names(d3) == 'SC0'] <- 'PoB_Score'

After data cleaning, the resultant dataframe will contain participants’ POB scores, age, sex, language background, and years of education, as well as their responses to the linguistic profile questionnaire.
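
Three item names carry an `R` suffix (`PoB3R`, `PoB5R`, `PoB11R`), which suggests they are reverse-keyed; that is an assumption, as is whether the Qualtrics score `SC0` (renamed `PoB_Score`) already applies the reversal. On a 1 to 6 scale, a reverse-keyed response \(x\) is recoded as \(7 - x\), so a total over 13 items ranges from 13 to 78. The sketch below, using the hypothetical helper `score_pob()`, recomputes such a total so that it could be compared with `PoB_Score`.

```r
# Hypothetical helper: total POB score with reverse-keyed items recoded as 7 - x.
# Assumes numeric items on a 1-6 scale and that a trailing "R" marks reverse keying.
score_pob <- function(df, scale_max = 6) {
  item_cols    <- grep("^PoB[0-9]+R?$", names(df), value = TRUE)
  reverse_cols <- grep("R$", item_cols, value = TRUE)
  items <- df[item_cols]
  items[reverse_cols] <- lapply(items[reverse_cols], function(x) (scale_max + 1) - x)
  rowSums(items)
}

# Usage on two made-up respondents (13 items each)
example <- as.data.frame(matrix(c(rep(6, 13), rep(1, 13)), nrow = 2, byrow = TRUE))
names(example) <- c("PoB1", "PoB2", "PoB3R", "PoB4", "PoB5R", "PoB6", "PoB7",
                    "PoB8", "PoB9", "PoB10", "PoB11R", "PoB12", "PoB13")
score_pob(example)
```

The regular expression skips `PoB_Score` itself, so the function can be applied directly to the cleaned data frame.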

head(d3)
##   Progress Duration..in.seconds. Finished PoB1 PoB2 PoB3R PoB4 PoB5R PoB6
## 3      100                    31        1    1    1     1    1     1    1
## 4      100                    30        1    3    5     4    4     3    4
## 5      100                    78        1    4    3     6    3     5    3
## 6      100                   100        1    7    5     3    4     4    5
## 7      100                    90        1    4    4     5    5     4    4
## 8      100                    66        1    5    3     6    3     4    3
##   PoB7 PoB8 PoB9 PoB10 PoB11R PoB12 PoB13 L1 L1_2_TEXT L2 L2_3_TEXT
## 3    1    1    1     1      1     1     1  1            1          
## 4    5    8    6     5      5     5     5  1            1          
## 5    3    3    3     3      6     3     4  3            3          
## 6    4    6    5     5      3     6     7  4   Spanish  4   English
## 7    4    5    4     4      4     4     6  3            4    German
## 8    3    4    6     3      7     3     3  3            3          
##   Education Education_y Age Birthplace Birthplace_2_TEXT Ethnicity Sex
## 3         1               1          1                           1   1
## 4         1               1          1                           1   1
## 5         3               7          3                           3   3
## 6         4               5          4            Mexico         5   4
## 7         5               6          3                           6   4
## 8         5               3          3                           3   3
##   PoB_Score EO
## 3            0
## 4        46  0
## 5        19  0
## 6        58  0
## 7        41  0
## 8        25  0
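
Some values in the pilot output above fall outside the 1 to 6 response range (for example, 7 in `PoB1` and 8 in `PoB8`), and `Age` takes values such as 1 and 3. That pattern is consistent with the columns having been read as factors (the default for `read.csv()` before R 4.0.0), in which case `as.numeric()` returns factor level codes rather than the recorded values; this is an inference from the output, not something the report confirms. The sketch below shows the pitfall, the usual `as.numeric(as.character(x))` fix, and a simple range check using the hypothetical helper `check_range()`.

```r
# A Qualtrics column read as a factor: the header text becomes one of the levels.
x <- factor(c("Question text", "4", "6", "2"))
as.numeric(x)                  # level codes, not the responses
as.numeric(as.character(x))    # actual values; the header text becomes NA (with a warning)

# Hypothetical helper: count POB responses outside the 1-6 scale, per item
check_range <- function(df, lo = 1, hi = 6) {
  item_cols <- grep("^PoB[0-9]+R?$", names(df), value = TRUE)
  out <- sapply(df[item_cols], function(v) sum(!is.na(v) & (v < lo | v > hi)))
  out[out > 0]
}

pilot_like <- data.frame(PoB1 = c(1, 3, 7), PoB8 = c(1, 8, 3), PoB2 = c(1, 5, 3))
check_range(pilot_like)
```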

Confirmatory analysis

All confirmatory analyses are specified in the analysis plan (see section Analysis Plan). Sample descriptives and between-group differences will be obtained using base R and the psych package.

The main analysis is a multiple linear regression, fitted with the lm function. It could not be run on the pilot data, because the data showed too little variability and contained at least one constant numeric variable.
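
A minimal sketch of the regression step, on simulated data (not the pilot data), with the predictor names used in data preparation: `Age`, `Sex`, `Education`, and the English-only indicator `EO`, which stands in for language background here (an assumption). It first drops predictors that are constant in the sample: in `lm()`, a constant numeric predictor receives an `NA` coefficient, and a factor with only one level stops the fit with an error. `drop_constant()` is a hypothetical helper.

```r
set.seed(42)

# Simulated data with the variable names used in data preparation
n <- 120
d <- data.frame(
  Age        = round(runif(n, 18, 70)),
  Sex        = factor(sample(c("Male", "Female"), n, replace = TRUE)),
  Education  = sample(10:20, n, replace = TRUE),
  EO         = rbinom(n, 1, 0.5),  # 1 = English-only (L1 English, no L2)
  Birthplace = 1                   # constant column, as can happen in small pilots
)
d$PoB_Score <- 55 - 6 * d$EO + 0.3 * d$Education + rnorm(n, sd = 6)

# Hypothetical helper: keep only predictors that vary in the sample
drop_constant <- function(df, predictors) {
  predictors[sapply(df[predictors], function(v) length(unique(na.omit(v))) > 1)]
}

preds <- drop_constant(d, c("Age", "Sex", "Education", "EO", "Birthplace"))
fit <- lm(reformulate(preds, response = "PoB_Score"), data = d)
summary(fit)$r.squared
coef(fit)["EO"]
```

Reporting which predictors were dropped, rather than silently removing them, keeps the model comparable to the original specification.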

If the sample size allows (depending on the availability of funds), I will conduct an IRT analysis. For this, I will follow Luk and Surrain’s approach and fit an exploratory factor model first, followed by a single-factor confirmatory factor analysis. As in the original paper, I will then fit a graded response model to the data and provide item (category) characteristic curves, item information curves, and a test information curve.
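
To make the graded response model concrete: for an item with discrimination \(a\) and ordered thresholds \(b_1 < \dots < b_5\) (six response categories), the probability of responding above category \(k\) is \(P^*_k(\theta) = 1 / (1 + e^{-a(\theta - b_k)})\), and the category probabilities are differences of adjacent boundary curves, \(P_k = P^*_{k-1} - P^*_k\) with \(P^*_0 = 1\) and \(P^*_6 = 0\). Item information is \(I(\theta) = \sum_k P_k'(\theta)^2 / P_k(\theta)\), and test information is the sum over items. The base-R sketch below computes these curves for made-up parameters; in practice the parameters would come from a fitted model such as `ltm::grm()`, loaded above. `grm_probs()` and `grm_info()` are hypothetical helpers.

```r
# Category probabilities under Samejima's graded response model.
# Rows = theta values, columns = response categories 1..(length(b) + 1).
grm_probs <- function(theta, a, b) {
  p_star <- plogis(a * outer(theta, b, "-"))   # P(X > k | theta), k = 1..length(b)
  cbind(1, p_star) - cbind(p_star, 0)          # P(X = k) = P*_{k-1} - P*_k
}

# Item information: sum over categories of (dP_k / dtheta)^2 / P_k
grm_info <- function(theta, a, b) {
  p_star <- plogis(a * outer(theta, b, "-"))
  d_star <- a * p_star * (1 - p_star)          # derivatives of the boundary curves
  probs  <- cbind(1, p_star) - cbind(p_star, 0)
  d_prob <- cbind(0, d_star) - cbind(d_star, 0)
  rowSums(d_prob^2 / probs)
}

theta <- seq(-4, 4, by = 0.1)

# Made-up parameters for two 6-category items (not estimates from the POB data)
items <- list(
  list(a = 2.0, b = c(-2, -1, 0, 1, 2)),
  list(a = 0.5, b = c(-3, -1.5, 0, 1.5, 3))    # weakly discriminating item
)

ccc       <- grm_probs(theta, items[[1]]$a, items[[1]]$b)  # category characteristic curves
item_info <- sapply(items, function(it) grm_info(theta, it$a, it$b))
test_info <- rowSums(item_info)                             # test information curve
```

The curves can be drawn with, e.g., `matplot(theta, ccc, type = "l")` and `plot(theta, test_info, type = "l")`. The second item illustrates why low-discrimination items are flagged as uninformative: its information stays low across the whole \(\theta\) range.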