This project is a replication of Study 1 in Luk and Surrain (2019). All relevant documentation can be found in this GitHub repository. The authors developed a scale measuring the perception of bilingualism, the 13-item Perception of Bilingualism (POB) scale, and conducted psychometric analyses using both a classical test theory (CTT) and an item response theory (IRT) approach. The authors established unidimensionality of the POB scale using exploratory factor analysis (EFA; 80% of variance explained by a single factor), yet a single-factor confirmatory factor analysis (CFA) produced insufficient model fit (χ²(65) = 501.23) across multiple indices. In a graded response model, they flagged 3 of the 13 items as uninformative.
Further, using a multiple linear regression model, they showed that participants’ language background consistently predicts POB score, regardless of age, education, or sex. The direction of this effect confirmed their hypothesis that bilingualism is perceived more positively by individuals who themselves speak more than one language.
This replication is important (a) because it will solidify our knowledge about how bilingual individuals perceive their bilingual status, and (b) because, to date, there is no reliable scale measuring how bilingualism is perceived by individuals. Obtaining this information can help inform policy and education and can aid in assessing how successfully current research findings on multilingualism have been disseminated to the public.
In the original study, effect sizes for the multiple linear regression models (R2) ranged from .15 in the simplest model to .24 in a four-predictor model with an interaction effect (Table 5 in the original report). Hence, to replicate a regression model with R2 = .24, sample sizes of N = 47, 58, and 69 are needed to achieve statistical power of 80%, 90%, and 95%, respectively.
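Comparable figures could be obtained with a standard power calculation for multiple regression. The sketch below is my own (the pwr package is not otherwise used in this project), and the exact N depends on the numerator degrees of freedom assumed for the final model, so it may differ somewhat from the numbers quoted above:

```r
# Power calculation for a multiple regression with R^2 = .24, expressed as
# Cohen's f2. u = 5 numerator df is an assumption (four predictors plus one
# interaction term); the required N is u + v + 1, with v read from the output.
library(pwr)
f2 <- 0.24 / (1 - 0.24)
pwr.f2.test(u = 5, f2 = f2, sig.level = .05, power = .80)
```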
The authors of the original paper recruited US-based participants via Qualtrics Panel and Amazon Mechanical Turk (N = 422). For the replication, all participants will be recruited via Amazon Mechanical Turk. Using the same sampling frame increases the likelihood of obtaining similar sample characteristics. Given the power calculations in the previous section, the desired sample size for the regression analysis, after exclusions due to failed attention/comprehension checks, is N = 70. Replicating the IRT analysis, however, would require a much larger sample, which is not feasible under the current circumstances.
The replication will make use of the same materials as the original study. The POB is available as part of the original paper. Although the demographic questionnaire is not the original one, the current version of the survey nonetheless covers all demographic variables of interest. Materials are:
In one combined Qualtrics survey, participants will give informed consent, complete the POB (interspersed with an attention check item), and respond to a basic set of questions about their demographics, educational attainment, and language background. In the original paper, median survey completion time was 13 minutes. Given that the replication focuses on Part 1 of the original paper and requires participants to complete neither the Knowledge of Infant Development Inventory nor the PoB+, I expect the survey to be completed in under 10 minutes.
The analysis plan mirrors that of the original paper: all data from participants who fail the attention check will be excluded. I will provide a descriptive overview of the demographic characteristics of the sample. For the psychometric analysis of the POB, I will conduct a CTT analysis; given the small sample size, I will not conduct the EFA, CFA, or IRT analyses. Finally, correlations between POB scores, age, sex, language background, and years of education will be explored, and all predictors will be entered into a multiple linear regression model, exactly as in the original paper.
While the original study recruited participants via both Qualtrics Panel and Amazon Mechanical Turk, the replication will use only the latter. Further, I will most likely not be able to systematically oversample to guarantee sufficient representation of parents of children exposed to both Spanish and English. Because Luk and Surrain did not report results split by whether participants were parents, the effect of this sampling difference is impossible to predict. Overall, the replication will remain very close to the original study; hence, it is reasonable to expect very similar results.
Data preparation will follow the analysis plan outlined above, using the following R packages.
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1 ✔ purrr 0.3.3
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 1.0.0 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ── Conflicts ──────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(readr)
library(ltm)
## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
## Loading required package: msm
## Loading required package: polycor
library(psych)
##
## Attaching package: 'psych'
## The following object is masked from 'package:ltm':
##
## factor.scores
## The following object is masked from 'package:polycor':
##
## polyserial
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(qualtRics) # requires devtools install
library(here) # requires devtools install
## here() starts at /Users/juliansiebert/Documents/Stanford/Academics/PSYCH251/luk2019
library(dplyr)
library(table1)
##
## Attaching package: 'table1'
## The following objects are masked from 'package:base':
##
## units, units<-
library(corrr)
First, I will download the Qualtrics survey results as a csv file, import them into R, and delete irrelevant variables.
# import data from csv (Qualtrics output), in R-friendly format, i.e. without unnecessary header rows
here()
## [1] "/Users/juliansiebert/Documents/Stanford/Academics/PSYCH251/luk2019"
d1 <- readSurvey("data/luk2019_replication_pilotA_November+26%2C+2019_06.12.csv")
## Parsed with column specification:
## cols(
## .default = col_character(),
## StartDate = col_datetime(format = ""),
## EndDate = col_datetime(format = ""),
## Progress = col_double(),
## `Duration (in seconds)` = col_double(),
## Finished = col_logical(),
## RecordedDate = col_datetime(format = ""),
## RecipientLastName = col_logical(),
## RecipientFirstName = col_logical(),
## RecipientEmail = col_logical(),
## ExternalReference = col_logical(),
## LocationLatitude = col_double(),
## LocationLongitude = col_double(),
## AttentionCheck_S = col_logical(),
## Education_y = col_double(),
## Age = col_double(),
## SC0 = col_double(),
## `Random ID` = col_double()
## )
## See spec(...) for full column specifications.
# drop irrelevant Qualtrics variables
d1 <- d1 %>%
dplyr::select(-c(StartDate, EndDate, Status, IPAddress, RecordedDate, ResponseId, RecipientFirstName, RecipientLastName, RecipientEmail, ExternalReference, Finished, DistributionChannel))
Next, I will evaluate the attention checks, recode them as 1 when passed and 0 when failed, and exclude all participants who did not pass the attention check.
d2 <- d1 %>%
filter(AttentionCheck == "Strongly disagree" | AttentionCheck_S == "Totalmente en desacuerdo") %>%
mutate(AttentionCheck = ifelse(AttentionCheck == "Strongly disagree", 1, ifelse(AttentionCheck_S == "Totalmente en desacuerdo", 1,0))) %>%
dplyr::select(-c(AttentionCheck_S))
Then, I will create a dummy variable for monolingual English status and rename the PoB score variable to PoB13_Total.
# create variable "EO" (L1 English, no L2)
d2 <- d2 %>%
mutate(EO = if_else(L1 == "English", if_else(L2 == "I do not speak a second language.", 1, 0), 0))
# rename SC0 to PoB13_Total
d2 <- d2 %>%
mutate(PoB13_Total = SC0) %>%
dplyr::select(-SC0)
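As an optional sanity check (my addition, not part of the original plan), the new dummy can be cross-tabulated against the raw first-language variable:

```r
# cross-tabulate reported first language against the new monolingual-English
# dummy, including missing values, as a quick sanity check
with(d2, table(L1, EO, useNA = "ifany"))
```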
Further, I will unify responses to the English and Spanish PoB scales into one variable per item (Lg_decision, the variable indicating the language in which the survey was taken, will be retained). Subsequently, I will delete the separate PoB response variables and convert the unified ones to numeric values.
# unifying PoB responses (inserting Spanish responses into new unified variable for each item)
d2 <- d2 %>%
mutate(PoB_1 = if_else(Lg_decision == "Español", PoB1_S, PoB1),
PoB_2 = ifelse(Lg_decision == "Español", PoB2_S, PoB2),
PoB_4 = ifelse(Lg_decision == "Español", PoB4_S, PoB4),
PoB_6 = ifelse(Lg_decision == "Español", PoB6_S, PoB6),
PoB_7 = ifelse(Lg_decision == "Español", PoB7_S, PoB7),
PoB_8 = ifelse(Lg_decision == "Español", PoB8_S, PoB8),
PoB_9 = ifelse(Lg_decision == "Español", PoB9_S, PoB9),
PoB_10 = ifelse(Lg_decision == "Español", PoB10_S, PoB10),
PoB_12 = ifelse(Lg_decision == "Español", PoB12_S, PoB12),
PoB_13 = ifelse(Lg_decision == "Español", PoB13_S, PoB13))
# deleting old separate PoB item variables
d2 <- d2 %>%
dplyr::select(-c(PoB1, PoB2, PoB4, PoB6, PoB7, PoB8, PoB9, PoB10, PoB12, PoB13, PoB1_S, PoB2_S, PoB4_S, PoB6_S, PoB7_S, PoB8_S, PoB9_S, PoB10_S, PoB12_S, PoB13_S))
# converting new PoB item variables from response labels to their corresponding levels (1-6, still stored as character); the reverse-coded items 3R, 5R, and 11R are not handled here because they are excluded from the 10-item score
d2 <- d2%>%
mutate(PoB_1 = ifelse(PoB_1 == "Strongly disagree" | PoB_1 == "Totalmente en desacuerdo", 1, ifelse(PoB_1 == "Disagree" | PoB_1 == "En desacuerdo", 2, ifelse(PoB_1 == "Somewhat disagree" | PoB_1 == "Parcialmente en desacuerdo", 3, ifelse(PoB_1 == "Somewhat agree" | PoB_1 == "Parcialmente de acuerdo", 4, ifelse(PoB_1 == "Agree" | PoB_1 == "De acuerdo", 5, ifelse(PoB_1 == "Strongly agree" | PoB_1 == "Totalmente de acuerdo", 6, "N/A")))))),
PoB_2 = ifelse(PoB_2 == "Strongly disagree" | PoB_2 == "Totalmente en desacuerdo", 1, ifelse(PoB_2 == "Disagree" | PoB_2 == "En desacuerdo", 2, ifelse(PoB_2 == "Somewhat disagree" | PoB_2 == "Parcialmente en desacuerdo", 3, ifelse(PoB_2 == "Somewhat agree" | PoB_2 == "Parcialmente de acuerdo", 4, ifelse(PoB_2 == "Agree" | PoB_2 == "De acuerdo", 5, ifelse(PoB_2 == "Strongly agree" | PoB_2 == "Totalmente de acuerdo", 6, "N/A")))))),
PoB_4 = ifelse(PoB_4 == "Strongly disagree" | PoB_4 == "Totalmente en desacuerdo", 1, ifelse(PoB_4 == "Disagree" | PoB_4 == "En desacuerdo", 2, ifelse(PoB_4 == "Somewhat disagree" | PoB_4 == "Parcialmente en desacuerdo", 3, ifelse(PoB_4 == "Somewhat agree" | PoB_4 == "Parcialmente de acuerdo", 4, ifelse(PoB_4 == "Agree" | PoB_4 == "De acuerdo", 5, ifelse(PoB_4 == "Strongly agree" | PoB_4 == "Totalmente de acuerdo", 6, "N/A")))))),
PoB_6 = ifelse(PoB_6 == "Strongly disagree" | PoB_6 == "Totalmente en desacuerdo", 1, ifelse(PoB_6 == "Disagree" | PoB_6 == "En desacuerdo", 2, ifelse(PoB_6 == "Somewhat disagree" | PoB_6 == "Parcialmente en desacuerdo", 3, ifelse(PoB_6 == "Somewhat agree" | PoB_6 == "Parcialmente de acuerdo", 4, ifelse(PoB_6 == "Agree" | PoB_6 == "De acuerdo", 5, ifelse(PoB_6 == "Strongly agree" | PoB_6 == "Totalmente de acuerdo", 6, "N/A")))))),
PoB_7 = ifelse(PoB_7 == "Strongly disagree" | PoB_7 == "Totalmente en desacuerdo", 1, ifelse(PoB_7 == "Disagree" | PoB_7 == "En desacuerdo", 2, ifelse(PoB_7 == "Somewhat disagree" | PoB_7 == "Parcialmente en desacuerdo", 3, ifelse(PoB_7 == "Somewhat agree" | PoB_7 == "Parcialmente de acuerdo", 4, ifelse(PoB_7 == "Agree" | PoB_7 == "De acuerdo", 5, ifelse(PoB_7 == "Strongly agree" | PoB_7 == "Totalmente de acuerdo", 6, "N/A")))))),
PoB_8 = ifelse(PoB_8 == "Strongly disagree" | PoB_8 == "Totalmente en desacuerdo", 1, ifelse(PoB_8 == "Disagree" | PoB_8 == "En desacuerdo", 2, ifelse(PoB_8 == "Somewhat disagree" | PoB_8 == "Parcialmente en desacuerdo", 3, ifelse(PoB_8== "Somewhat agree" | PoB_8 == "Parcialmente de acuerdo", 4, ifelse(PoB_8 == "Agree" | PoB_8 == "De acuerdo", 5, ifelse(PoB_8 == "Strongly agree" | PoB_8 == "Totalmente de acuerdo", 6, "N/A")))))),
PoB_9= ifelse(PoB_9 == "Strongly disagree" | PoB_9 == "Totalmente en desacuerdo", 1, ifelse(PoB_9 == "Disagree" | PoB_9 == "En desacuerdo", 2, ifelse(PoB_9 == "Somewhat disagree" | PoB_9 == "Parcialmente en desacuerdo", 3, ifelse(PoB_9 == "Somewhat agree" | PoB_9 == "Parcialmente de acuerdo", 4, ifelse(PoB_9 == "Agree" | PoB_9 == "De acuerdo", 5, ifelse(PoB_9 == "Strongly agree" | PoB_9 == "Totalmente de acuerdo", 6, "N/A")))))),
PoB_10 = ifelse(PoB_10 == "Strongly disagree" | PoB_10 == "Totalmente en desacuerdo", 1, ifelse(PoB_10 == "Disagree" | PoB_10 == "En desacuerdo", 2, ifelse(PoB_10 == "Somewhat disagree" | PoB_10 == "Parcialmente en desacuerdo", 3, ifelse(PoB_10 == "Somewhat agree" | PoB_10 == "Parcialmente de acuerdo", 4, ifelse(PoB_10 == "Agree" | PoB_10 == "De acuerdo", 5, ifelse(PoB_10 == "Strongly agree" | PoB_10 == "Totalmente de acuerdo", 6, "N/A")))))),
PoB_12 = ifelse(PoB_12 == "Strongly disagree" | PoB_12 == "Totalmente en desacuerdo", 1, ifelse(PoB_12 == "Disagree" | PoB_12 == "En desacuerdo", 2, ifelse(PoB_12 == "Somewhat disagree" | PoB_12 == "Parcialmente en desacuerdo", 3, ifelse(PoB_12 == "Somewhat agree" | PoB_12 == "Parcialmente de acuerdo", 4, ifelse(PoB_12 == "Agree" | PoB_12 == "De acuerdo", 5, ifelse(PoB_12 == "Strongly agree" | PoB_12 == "Totalmente de acuerdo", 6, "N/A")))))),
PoB_13 = ifelse(PoB_13 == "Strongly disagree" | PoB_13 == "Totalmente en desacuerdo", 1, ifelse(PoB_13 == "Disagree" | PoB_13 == "En desacuerdo", 2, ifelse(PoB_13 == "Somewhat disagree" | PoB_13 == "Parcialmente en desacuerdo", 3, ifelse(PoB_13 == "Somewhat agree" | PoB_13 == "Parcialmente de acuerdo", 4, ifelse(PoB_13 == "Agree" | PoB_13 == "De acuerdo", 5, ifelse(PoB_13 == "Strongly agree" | PoB_13 == "Totalmente de acuerdo", 6, "")))))))
# turn newly computed variables from character to numeric
d2$PoB_1 <- as.numeric(d2$PoB_1)
d2$PoB_2 <- as.numeric(d2$PoB_2)
d2$PoB_4 <- as.numeric(d2$PoB_4)
d2$PoB_6 <- as.numeric(d2$PoB_6)
d2$PoB_7 <- as.numeric(d2$PoB_7)
d2$PoB_8 <- as.numeric(d2$PoB_8)
d2$PoB_9 <- as.numeric(d2$PoB_9)
d2$PoB_10 <- as.numeric(d2$PoB_10)
d2$PoB_12 <- as.numeric(d2$PoB_12)
d2$PoB_13 <- as.numeric(d2$PoB_13)
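The repeated ifelse() chains above could also be written more compactly with a named lookup vector. The following is only a sketch of an equivalent alternative to the two recoding steps above (it maps the original response labels directly onto numbers, so it would replace rather than follow them), assuming the column names created above and the dplyr version loaded earlier:

```r
# Map English and Spanish Likert labels onto 1-6 with a single lookup vector,
# then apply it to all unified PoB item columns at once.
likert_levels <- c(
  "Strongly disagree" = 1, "Totalmente en desacuerdo"   = 1,
  "Disagree"          = 2, "En desacuerdo"              = 2,
  "Somewhat disagree" = 3, "Parcialmente en desacuerdo" = 3,
  "Somewhat agree"    = 4, "Parcialmente de acuerdo"    = 4,
  "Agree"             = 5, "De acuerdo"                 = 5,
  "Strongly agree"    = 6, "Totalmente de acuerdo"      = 6
)

d2 <- d2 %>%
  mutate_at(vars(starts_with("PoB_")), ~ unname(likert_levels[.x]))
```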
Then, I will compute a new PoB score variable (PoB10_Total), denoting the total score of a shortened, 10-item version of the PoB, which Luk and Surrain determined to have improved psychometric properties and which they used for subsequent analyses.
# compute new PoB10_Total
d2 <- d2 %>%
mutate(PoB10_Total = PoB_1 + PoB_2 + PoB_4 + PoB_6 + PoB_7 + PoB_8 + PoB_9 + PoB_10 + PoB_12 + PoB_13)
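As a small consistency check (my addition), the new total can be verified against a row sum over the retained item columns:

```r
# optional check: the 10-item total should equal the row sum of the items
all(d2$PoB10_Total ==
      rowSums(d2[, c("PoB_1", "PoB_2", "PoB_4", "PoB_6", "PoB_7",
                     "PoB_8", "PoB_9", "PoB_10", "PoB_12", "PoB_13")]),
    na.rm = TRUE)
```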
Then, after some practical changes to the order of columns, I will create two separate subsets of the data:

* An anonymous subset of the data, from which IP addresses and location data are removed.
* A dataframe containing only the PoB scale responses.
# changes to the order of columns
d2 <- d2 %>%
dplyr::select(-c(Progress, "Duration (in seconds)", LocationLongitude, LocationLatitude, UserLanguage, Consent, L1_2_TEXT, L2_3_TEXT, Education_y, Birthplace_2_TEXT, PoB13_Total), c(PoB13_Total, Progress, "Duration (in seconds)", LocationLongitude, LocationLatitude, UserLanguage, Consent, L1_2_TEXT, L2_3_TEXT, Education_y, Birthplace_2_TEXT))
# create separate dataframe for PoB scale analysis
PoB <- d2 %>%
dplyr::select(c(PoB_1, PoB_2, PoB_4, PoB_6, PoB_7, PoB_8, PoB_9, PoB_10, PoB_12, PoB_13))
# keep a full copy of the data with location information (latitude and longitude); anonymous_data drops it
location_data = d2
anonymous_data <- d2 %>%
dplyr::select(-c(LocationLatitude, LocationLongitude))
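The item-level dataframe created above (PoB) will feed the CTT analysis listed in the analysis plan. A minimal sketch, assuming the psych package loaded earlier (the specific reliability statistics to report are not prescribed by the plan):

```r
# Internal consistency (Cronbach's alpha) and item-level statistics for the
# 10 retained PoB items; check.keys = TRUE flags items that correlate
# negatively with the total score
PoB_alpha <- psych::alpha(PoB, check.keys = TRUE)
PoB_alpha$total       # raw and standardized alpha
PoB_alpha$item.stats  # item-total correlations, item means and SDs
```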
After data cleaning, the resultant dataframe (anonymous_data) will contain some session information, participants’ POB scores, age, sex, language background, and years of education, as well as their responses to the linguistic profile questionnaire.
head(anonymous_data)
## # A tibble: 4 x 32
## Lg_decision AttentionCheck L1 L2 Education Age Birthplace
## <chr> <dbl> <chr> <chr> <chr> <dbl> <chr>
## 1 English 1 Engl… I do… Graduate… 28 In the US
## 2 English 1 Engl… I do… Some col… 54 In the US
## 3 English 1 Engl… Othe… College … 25 In the US
## 4 English 1 Span… Engl… Some hig… 32 In the US
## # … with 25 more variables: Ethnicity <chr>, Sex <chr>, Q44 <chr>, `Random
## # ID` <dbl>, EO <dbl>, PoB_1 <dbl>, PoB_2 <dbl>, PoB_4 <dbl>,
## # PoB_6 <dbl>, PoB_7 <dbl>, PoB_8 <dbl>, PoB_9 <dbl>, PoB_10 <dbl>,
## # PoB_12 <dbl>, PoB_13 <dbl>, PoB10_Total <dbl>, PoB13_Total <dbl>,
## # Progress <dbl>, `Duration (in seconds)` <dbl>, UserLanguage <chr>,
## # Consent <chr>, L1_2_TEXT <chr>, L2_3_TEXT <chr>, Education_y <dbl>,
## # Birthplace_2_TEXT <chr>
Sample descriptives and between-groups differences will be displayed using the table1 package.
# Sample descriptives
table1::table1(~Age + Sex + Birthplace + Education + Ethnicity, data = anonymous_data)
|  | Overall (n=4) |
|---|---|
| How old are you? |  |
| Mean (SD) | 34.8 (13.1) |
| Median [Min, Max] | 30.0 [25.0, 54.0] |
| Please identify your sex. |  |
| Female | 1 (25.0%) |
| Male | 3 (75.0%) |
| Selected Choice (birthplace) |  |
| In the US | 4 (100%) |
| What is your highest level of education? |  |
| College graduate | 1 (25.0%) |
| Graduate school degree | 1 (25.0%) |
| Some college or ass. degree | 1 (25.0%) |
| Some high school or less | 1 (25.0%) |
| Please select your ethnicity. |  |
| Hispanic | 1 (25.0%) |
| White alone (not Hispanic) | 3 (75.0%) |
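The between-group comparison promised above is not yet produced by the call shown. A sketch of how the same table could be split by language background, using a grouping variable EO_group that I derive here from the EO dummy (the group labels are my own):

```r
# descriptives split by language background; EO is converted to a labelled
# factor so that table1 treats it as a grouping variable
anonymous_data$EO_group <- factor(anonymous_data$EO, levels = c(0, 1),
                                  labels = c("More than one language", "English only"))
table1::table1(~ Age + Sex + Education + Ethnicity | EO_group, data = anonymous_data)
```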
All confirmatory analyses are specified in the analysis plan (see section Analysis Plan). Following a correlation table, the main analysis is a multiple linear regression, carried out using the lm function. Luk and Surrain’s (2019) final model will be replicated as shown below.
# preparation for correlation tables
# creating separate df for correlations_data
correlations_data <- anonymous_data %>%
dplyr::select(c(PoB10_Total, EO, Age, Sex, Education_y))
# changing Sex to numeric with Female = 1
correlations_data <- correlations_data %>%
mutate(Sex = if_else(Sex == "Female", 1, 0))
# creating & showing cor_df
correlations <- corrr::correlate(correlations_data)
##
## Correlation method: 'pearson'
## Missing treated using: 'pairwise.complete.obs'
correlations
## # A tibble: 5 x 6
## rowname PoB10_Total EO Age Sex Education_y
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 PoB10_Total NA 0.408 0.645 0.471 -0.638
## 2 EO 0.408 NA 0.549 0.577 0.391
## 3 Age 0.645 0.549 NA 0.976 0.0143
## 4 Sex 0.471 0.577 0.976 NA 0.225
## 5 Education_y -0.638 0.391 0.0143 0.225 NA
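For reporting, the correlation matrix could be trimmed to its lower triangle and rounded; an optional sketch using corrr helpers (shave() and fashion()) on the correlations object above:

```r
# lower-triangle, rounded view of the correlation matrix for reporting
correlations %>%
  corrr::shave() %>%
  corrr::fashion(decimals = 2)
```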
# Multiple linear regression predicting PoB score from the same demographic and linguistic variables as in the original paper
# fit1 predictors: EO
# fit2 predictors: EO, Education_y
# fit3 predictors: EO, Education_y, Age
# fit4 predictors: EO, Education_y, Age, Sex
# fit5 predictors: EO, Education_y, Age, Sex, parent (not fitted here)
fit1 <- lm(PoB10_Total ~ EO, data=anonymous_data)
fit2 <- lm(PoB10_Total ~ EO + Education_y, data=anonymous_data)
fit3 <- lm(PoB10_Total ~ EO + Education_y + Age, data=anonymous_data)
fit4 <- lm(PoB10_Total ~ EO + Education_y + Age + Sex, data=anonymous_data)
summary(fit1)
##
## Call:
## lm(formula = PoB10_Total ~ EO, data = anonymous_data)
##
## Residuals:
## 1 2 3 4
## -2 2 -6 6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 47.000 4.472 10.510 0.00893 **
## EO 4.000 6.325 0.632 0.59175
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.325 on 2 degrees of freedom
## Multiple R-squared: 0.1667, Adjusted R-squared: -0.25
## F-statistic: 0.4 on 1 and 2 DF, p-value: 0.5918
summary(fit2)
##
## Call:
## lm(formula = PoB10_Total ~ EO + Education_y, data = anonymous_data)
##
## Residuals:
## 1 2 3 4
## -2.000e+00 2.000e+00 3.331e-16 3.331e-16
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 81.800 11.771 6.949 0.091 .
## EO 7.600 3.072 2.474 0.245
## Education_y -2.400 0.800 -3.000 0.205
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.828 on 1 degrees of freedom
## Multiple R-squared: 0.9167, Adjusted R-squared: 0.75
## F-statistic: 5.5 on 2 and 1 DF, p-value: 0.2887
summary(fit3)
##
## Call:
## lm(formula = PoB10_Total ~ EO + Education_y + Age, data = anonymous_data)
##
## Residuals:
## ALL 4 residuals are 0: no residual degrees of freedom!
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 74.2923 NA NA NA
## EO 5.3538 NA NA NA
## Education_y -2.1846 NA NA NA
## Age 0.1538 NA NA NA
##
## Residual standard error: NaN on 0 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: NaN
## F-statistic: NaN on 3 and 0 DF, p-value: NA
summary(fit4)
##
## Call:
## lm(formula = PoB10_Total ~ EO + Education_y + Age + Sex, data = anonymous_data)
##
## Residuals:
## ALL 4 residuals are 0: no residual degrees of freedom!
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 74.2923 NA NA NA
## EO 5.3538 NA NA NA
## Education_y -2.1846 NA NA NA
## Age 0.1538 NA NA NA
## SexMale NA NA NA NA
##
## Residual standard error: NaN on 0 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: NaN
## F-statistic: NaN on 3 and 0 DF, p-value: NA
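Once the full sample is collected, the nested models can be compared against each other and against the R2 values in Table 5 of the original paper. A sketch (my addition, not part of the original analysis script), assuming the fit objects above:

```r
# incremental F-tests across the nested models and their R-squared values,
# for comparison with Table 5 of the original paper
anova(fit1, fit2, fit3, fit4)
sapply(list(fit1 = fit1, fit2 = fit2, fit3 = fit3, fit4 = fit4),
       function(m) summary(m)$r.squared)
```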