Introduction

Differential educational outcomes often fall along racial or socio-economic lines (see, e.g., Reardon, Kalogrides, and Shores 2019). Race per se, however, cannot explain differential performance. Thus, a different explanatory variable is needed. One candidate is language, as different racial groups often speak (or are exposed to) different languages or varieties of the same language.

Moreover, English learners (ELs) in the United States tend to perform worse than their peers who speak English as their first language in a variety of domains (see, e.g., Solano-Flores 2016; Solano-Flores and Hakuta 2017). Schools tackle this issue in various ways, such as English second-language (ESL) programmes for their ELs, parent education programmes in English, or the employment of reading specialists to assist ELs. The Early Childhood Longitudinal Study, Kindergarten Class of 2011 (ECLS-K:2011; Tourangeau and Najarian 2006) dataset provides an opportunity to test specific hypotheses about the differential performance of ELs on a national scale, in an effort to corroborate the findings of many smaller studies conducted in the field. Further, it allows for a preliminary analysis of the efficacy of parental education programmes and reading interventions for ELs and helps to suggest answers to the question of what schools can do to close this language-based achievement gap.

Research Question and Hypotheses

How does English reading performance in 5th graders (level 1 outcome) differ by home language? To what extent are the magnitudes of these differences affected by school characteristics, such as the availability of reading specialists and of parental education programmes (level 2 predictors)? The reading score gap is conceptualised as the difference between first-language English-speakers (E1; English is L1) and second-/additional-language English-speakers (EL; English is L2). It is operationalised as the regression coefficient for EL, with balanced bilinguals coded as E1.

Level 1 Hypotheses

  • H\(_1\): First-language English-speakers have a significantly higher reading score than ELs (fixed effect of variable EL).
  • H\(_2\): The reading score gap is bigger for boys than for girls (interaction effect of Sex and EL).

Level 2 Hypotheses

  • H\(_3\): The reading score gap is bigger in schools with more ELs (fixed effect of pEL).
  • H\(_4\): The reading score gap is smaller in schools that offer parent education programmes in English (fixed effect of ParentProgram).

Cross-level Interaction Hypothesis

  • H\(_5\): Employing a reading specialist reduces the reading score gap between ELs and native English-speakers; in other words, schools with a reading specialist show a smaller difference between ELs and E1 students (interaction effect between EL and ReadingSpecialist).

Methods

Design

The study uses an intra-individual correlational design. The data was analysed using a multilevel regression model with 2 levels: students (level 1) are nested within schools (level 2). Multilevel analysis techniques make it possible to investigate the influence of both individual and school-based factors on reading scores (Hox, Moerbeek, and van de Schoot 2018).

Data and Sample

The data comes from the sixth and final wave of the ECLS-K:2011, including children from across the United States. The ECLS-K:2011 is a longitudinal study of a nationally representative sample of children, drawing together information from multiple sources using multiple methods. The data was collected using a multi-stage, stratified cluster design, allowing for sufficient group sizes on the school level for a multilevel regression analysis. Individuals in the sample used in the present study were assessed in spring 2016 whilst in grade 5. Other information stems from teacher, principal, and parent questionnaires, partly from earlier waves of the study, partly computed by the researchers (Tourangeau and Najarian 2006).

The sample comprises 12 children, 12 boys and 12 girls from 1920 different schools. At the time of reading testing, they were aged between 9.56 and 13.13 years (M = 11.1, SD = 0.37).

Variables of interest

I selected a small subset of variables from the ECLS-K:2011 dataset, then renamed and (where applicable) recoded them as listed below.

Individual-level (Level 1) Variables

  • Age: Continuous variable, the child’s age in months.
  • Sex: Categorical variable indicating the child’s biological sex, obtained from parent interviews in earlier rounds of the ECLS-K:2011 (boy = 0; girl = 1).
  • EL: Categorical (binary) variable indicating whether or not the child is an English-learner. Native English-speakers and balanced bilinguals are coded as 0, ELs are coded as 1.
  • ESL: Categorical (binary) variable indicating whether or not the child participates in an English second-language (ESL) program.
  • ReadingScore: Continuous variable indicating the child’s reading score (theta/latent ability).

School-level (Level 2) Variables

  • ParentProgram: Categorical (binary) variable indicating whether or not the child’s school offers an English language program for parents.
  • ReadingSpecialist: Categorical (binary) variable indicating whether or not the child’s school employs a reading specialist.
  • pEL: Continuous variable denoting the percentage of ELs at the child’s school.

Results

Descriptive Results

The tables below describe the sample by Sex and by EL status. Sex appears to be equally distributed across EL status groups, and Age is very similar across all groupings. Group sizes for EL status differ: there are about five times as many native English-speakers as ELs. The fact that 12 children coded as native English-speakers take part in an ESL program can be explained by the coding of balanced bilinguals as native English-speakers. The distribution of ELs across schools appears unequal: on average, ELs attend schools with a much higher percentage of ELs than native speakers do.

By Sex

Variable               Total            Boys             Girls
N                      12               12               12
EL                     12 (100)         12 (100)         12 (100)
E1                     12 (100)         12 (100)         12 (100)
Age (in months)        133.16 (4.41)    133.5 (4.47)     132.81 (4.33)
ReadingScore (theta)   1.48 (0.35)      1.46 (0.37)      1.49 (0.33)

Notes. For continuous variables, means are presented with standard deviations in parentheses. For categorical variables, counts are presented with percentages of column totals in parentheses.

By EL Status

Variable               Total            ELs              E1
N                      12               12               12
Boys (n)               12 (100)         12 (100)         12 (100)
Girls (n)              12 (100)         12 (100)         12 (100)
ESL                    12 (100)         12 (100)         12 (100)
Age (in months)        133.16 (4.41)    132.27 (4.29)    133.33 (4.41)
ReadingScore (theta)   1.48 (0.35)      1.34 (0.34)      1.51 (0.35)
pEL\(^a\) (%)          12.86 (17.89)    33.41 (21.88)    8.91 (13.89)

Notes. For continuous variables, means are presented with standard deviations in parentheses. For categorical variables, counts are presented with percentages of column totals in parentheses. \(^a\) 173 cases with missing data.

Model Building

First, I built an intercept-only model with random intercepts, m0. This model serves as the benchmark model, against which more complex models can be evaluated.

Model m0 looks as follows: \[ ReadingScore_{ij} = \gamma_{00} + u_{0j} + e_{ij} \]

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: ReadingScore ~ (1 | School)
##    Data: data
## 
##      AIC      BIC   logLik deviance df.resid 
##     5608     5629    -2801     5602     8064 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.5707 -0.6222  0.0112  0.6859  2.5805 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  School   (Intercept) 0.02456  0.1567  
##  Residual             0.10208  0.3195  
## Number of obs: 8067, groups:  School, 1920
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept) 1.453e+00  5.737e-03 1.163e+03   253.2   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Beforehand, I ran a significance test for school effects by comparing model m0 to a single-level null model (simple regression); the difference was highly significant (test statistic = 465.51). This confirms a better fit of model m0, suggesting significant between-school variance. Accordingly, the intra-class correlation coefficient of m0, \(\rho\) = 0.19, tells us that 19.39 percent of the total variance observed is explained by level 2 groupings, i.e. by differences between schools. Generally, however, there is not much variance altogether.
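The intra-class correlation can be reproduced from the two variance components printed in the m0 output, as the share of the school-level intercept variance in the total variance. The symbols \(\sigma^2_{u0}\) (school-level variance) and \(\sigma^2_{e}\) (residual variance) are notation introduced for this illustration only:

\[ \rho = \frac{\sigma^2_{u0}}{\sigma^2_{u0} + \sigma^2_{e}} = \frac{0.02456}{0.02456 + 0.10208} = \frac{0.02456}{0.12664} \approx 0.194 \]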

Next, I built a model containing all individual-level (level 1) predictors, m1. This model includes EL (binary variable indicating whether or not the student is an EL), Sex, and ESL (a binary variable indicating whether or not the student takes part in an ESL program).

\[ ReadingScore_{ij} = \gamma_{00} + \gamma_{10}EL_{ij} + \gamma_{20}Sex_{ij} + \gamma_{30}ESL_{ij} + u_{0j} + e_{ij} \]

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: ReadingScore ~ EL + Sex + ESL + (1 | School)
##    Data: data
## 
##      AIC      BIC   logLik deviance df.resid 
##   5360.3   5402.2  -2674.2   5348.3     7949 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.6176 -0.6192  0.0101  0.6826  2.4526 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  School   (Intercept) 0.01933  0.1390  
##  Residual             0.10191  0.3192  
## Number of obs: 7955, groups:  School, 1905
## 
## Fixed effects:
##               Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)  1.468e+00  6.921e-03  2.184e+03 212.161  < 2e-16 ***
## EL          -1.493e-01  1.224e-02  7.360e+03 -12.202  < 2e-16 ***
## Sex          2.420e-02  7.505e-03  7.633e+03   3.225  0.00127 ** 
## ESL          3.948e-02  1.371e-02  7.893e+03   2.879  0.00400 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##     (Intr) EL     Sex   
## EL  -0.229              
## Sex -0.525  0.007       
## ESL -0.119 -0.394 -0.026

Because of missing data for ESL, models m0 and m1 were fitted to different numbers of observations (8067 vs. 7955) and cannot be compared using the anova function; other indicators nonetheless suggest that m1 is, by far, the preferable model here.

Model m1a is a refined version of model m1, with the interaction effect of EL*Sex (testing H\(_2\)). \[ ReadingScore_{ij} = \gamma_{00} + \gamma_{10}EL_{ij} + \gamma_{20}Sex_{ij} + \gamma_{30}ESL_{ij} + \gamma_{40}EL_{ij}*Sex_{ij} + u_{0j} + e_{ij} \]

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: ReadingScore ~ EL + Sex + I(EL * Sex) + ESL + (1 | School)
##    Data: data
## 
##      AIC      BIC   logLik deviance df.resid 
##   5362.3   5411.2  -2674.2   5348.3     7948 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.6169 -0.6189  0.0100  0.6828  2.4518 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  School   (Intercept) 0.01933  0.1390  
##  Residual             0.10191  0.3192  
## Number of obs: 7955, groups:  School, 1905
## 
## Fixed effects:
##               Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)  1.469e+00  7.102e-03  2.394e+03 206.810  < 2e-16 ***
## EL          -1.509e-01  1.582e-02  7.866e+03  -9.537  < 2e-16 ***
## Sex          2.370e-02  8.175e-03  7.558e+03   2.899  0.00375 ** 
## I(EL * Sex)  3.149e-03  2.057e-02  7.851e+03   0.153  0.87830    
## ESL          3.944e-02  1.372e-02  7.893e+03   2.875  0.00405 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) EL     Sex    I(EL*S
## EL          -0.315                     
## Sex         -0.559  0.256              
## I(EL * Sex)  0.224 -0.634 -0.397       
## ESL         -0.120 -0.291 -0.015 -0.022

Model m1a does not fit better than m1, with \(\chi^2\)(1) = 0.02, p = 0.878, and the interaction effect EL*Sex is non-significant. Thus, I continued with model m1.
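The p-value of a likelihood-ratio test with one degree of freedom, as in the comparison of m1a with m1, can be checked by hand. The following is a minimal sketch in Python (not the R code used for the analysis) that relies only on the standard library; the function name chi2_sf_df1 is made up for this illustration, and the sketch uses the identity that the upper tail of a chi-square distribution with 1 degree of freedom equals erfc(sqrt(x / 2)).

```python
import math


def chi2_sf_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    if chi2 < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    # With 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


# Rounded statistic reported for m1a vs. m1 (the interaction adds one parameter).
p_interaction = chi2_sf_df1(0.02)
print(f"p = {p_interaction:.3f}")
```

With the rounded statistic of 0.02 this yields a p-value of roughly 0.89; the reported value of 0.878 was presumably computed from the unrounded statistic.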

In the next step, I added all school-level (level 2) predictors, pEL, ParentProgram, and ReadingSpecialist, resulting in m2 (with both level 1 and level 2 predictors).

\[ ReadingScore_{ij} = \gamma_{00} + \gamma_{10}EL_{ij} + \gamma_{20}Sex_{ij} + \gamma_{30}ESL_{ij} + \gamma_{01}pEL_j + \gamma_{02}ParentProgram_j + \gamma_{03}ReadingSpecialist_j + u_{0j} + e_{ij} \]

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: 
## ReadingScore ~ EL + Sex + ESL + pEL + ParentProgram + ReadingSpecialist +  
##     (1 | School)
##    Data: data
## 
##      AIC      BIC   logLik deviance df.resid 
##   5005.6   5068.0  -2493.8   4987.6     7548 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.6390 -0.6251  0.0139  0.6938  2.5383 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  School   (Intercept) 0.01535  0.1239  
##  Residual             0.10253  0.3202  
## Number of obs: 7557, groups:  School, 1801
## 
## Fixed effects:
##                     Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)        1.477e+00  1.668e-02  1.238e+03  88.590  < 2e-16 ***
## EL                -1.172e-01  1.347e-02  7.545e+03  -8.703  < 2e-16 ***
## Sex                2.748e-02  7.683e-03  7.325e+03   3.577 0.000349 ***
## ESL                4.741e-02  1.408e-02  7.452e+03   3.367 0.000763 ***
## pEL               -2.623e-03  3.378e-04  1.580e+03  -7.767 1.44e-14 ***
## ParentProgram      2.381e-02  1.555e-02  1.194e+03   1.531 0.125984    
## ReadingSpecialist  7.474e-03  1.224e-02  1.085e+03   0.611 0.541506    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) EL     Sex    ESL    pEL    PrntPr
## EL          -0.047                                   
## Sex         -0.194  0.007                            
## ESL         -0.034 -0.362 -0.026                     
## pEL         -0.501 -0.324 -0.019 -0.035              
## ParentPrgrm -0.873  0.039 -0.032 -0.002  0.368       
## RedngSpclst -0.137  0.006 -0.006 -0.032 -0.071 -0.031

Significance testing showed a significant main effect of pEL, but no effects for ParentProgram or ReadingSpecialist. Hence I removed the latter two predictors from the model, resulting in model m2a.

\[ ReadingScore_{ij} = \gamma_{00} + \gamma_{10}EL_{ij} + \gamma_{20}Sex_{ij} + \gamma_{30}ESL_{ij} + \gamma_{01}pEL_j + u_{0j} + e_{ij} \]

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: ReadingScore ~ EL + Sex + ESL + pEL + (1 | School)
##    Data: data
## 
##      AIC      BIC   logLik deviance df.resid 
##   5157.6   5206.3  -2571.8   5143.6     7779 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.6430 -0.6241  0.0136  0.6933  2.5490 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  School   (Intercept) 0.01638  0.1280  
##  Residual             0.10208  0.3195  
## Number of obs: 7786, groups:  School, 1853
## 
## Fixed effects:
##               Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)  1.501e+00  7.644e-03  1.529e+03 196.337  < 2e-16 ***
## EL          -1.127e-01  1.317e-02  7.781e+03  -8.553  < 2e-16 ***
## Sex          2.743e-02  7.561e-03  7.529e+03   3.628 0.000287 ***
## ESL          4.727e-02  1.379e-02  7.699e+03   3.428 0.000610 ***
## pEL         -2.850e-03  3.108e-04  1.749e+03  -9.170  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##     (Intr) EL     Sex    ESL   
## EL  -0.026                     
## Sex -0.477  0.010              
## ESL -0.088 -0.357 -0.026       
## pEL -0.451 -0.362 -0.008 -0.039

Next, I tested random-slope models, variable by variable. Models m3a, m3b, and m3c test random slopes for EL, Sex, and ESL, respectively.

## Random effect variances not available. Returned R2 does not account for random effects.

Given that the models are non-nested, I used AIC and BIC indicators to evaluate the models. None of the models with random slopes (m3a, m3b, and m3c with random slopes for EL, Sex, and ESL, respectively) performed better than model m2a. Hence, I settled on model m2a as the final model. The final model looks as follows: \[ ReadingScore_{ij} = \gamma_{00} + \gamma_{10}EL_{ij} + \gamma_{20}Sex_{ij} + \gamma_{30}ESL_{ij} + \gamma_{01}pEL_j + u_{0j} + e_{ij} \]
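As a rough illustration of how the fixed effects of model m2a combine, the estimates from the m2a output can be used to compute predicted scores for two hypothetical boys (Sex = 0) who do not take part in an ESL program (ESL = 0), each placed at a school with the average pEL of his group from the descriptive table. The school effect \(u_{0j}\) is set to zero, and the choice of these two profiles is an assumption made for this example only:

\[ \widehat{ReadingScore}_{E1} = 1.501 - 0.00285 \times 8.91 \approx 1.476 \]

\[ \widehat{ReadingScore}_{EL} = 1.501 - 0.1127 - 0.00285 \times 33.41 \approx 1.293 \]

Under these assumptions, the predicted difference is about 0.18 theta points, of which about 0.11 stems from the EL coefficient and about 0.07 from the difference in school composition.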

Summary of Results and Evaluation of Hypotheses

Generally, there is not much variance in ReadingScore, partly because the analysis uses theta scores, which translate actual scores to a latent ability measure ranging between -4 and 4.

Below, each hypothesis is evaluated separately.

Level 1 Hypotheses

  • H\(_1\): First-language English-speakers have a significantly higher reading score than ELs (fixed effect of variable EL).
    • Descriptive results show a difference in mean reading scores between ELs and native English-speakers, with the former trailing behind the latter.
    • EL remains a significant negative predictor of ReadingScore throughout, lending support to H\(_1\).
  • H\(_2\): The reading score gap is bigger for boys than for girls (interaction effect of Sex and EL).
    • The interaction effect Sex*EL is not significant (see model m1a).
    • Hence, I cannot conclude that the reading score gap is bigger for boys than for girls.
    • Girls do, however, generally outperform boys in reading, as indicated by the significant positive main effect of Sex on ReadingScore.

Level 2 Hypotheses

  • H\(_3\): The reading score gap is bigger in schools with more ELs (fixed effect of pEL).
    • Model m2a provides evidence for this hypothesis.
    • pEL has a significant negative effect on ReadingScore.
    • This effect holds after controlling for the other predictors and, because the model contains no interaction terms with pEL, is estimated as the same for both sexes and for ELs as well as native English-speakers.
  • H\(_4\): The reading score gap is smaller in schools that offer parent education programmes in English (fixed effect of ParentProgram).
    • ParentProgram had no significant effect when entered into model m2.
    • There is no evidence for a positive effect of schools offering parent programs on students’ reading scores (ReadingScore).

Cross-level Interaction Hypothesis

  • H\(_5\): Employing a reading specialist reduces the reading score gap between ELs and native English-speakers; in other words, schools with a reading specialist show a smaller difference between ELs and E1 students (interaction effect between EL and ReadingSpecialist).
    • ReadingSpecialist did not show a significant level 2 predictive effect on ReadingScore when tested in model m2.
    • Hence, its inclusion in the model was not warranted and interaction effects could not be tested.

Discussion and Conclusion

The results suggest that the individual-level variable EL explains a great deal of between-school variance, because ELs are unequally represented at different schools. Hence, individual differences between ELs and E1 students show up in the model as aggregated school-level variance. Regarding sex differences (though not the focus of the analysis), these findings align with previous research (see, e.g., Etchell et al. 2018).
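The claim about between-school variance can be gauged roughly through the proportional reduction in the school-level intercept variance relative to the intercept-only model m0. The figures below use the variance components printed in the model outputs; because m0, m1, and m2a were fitted to slightly different samples owing to missing data, they are approximate, and the symbol \(\sigma^2_{u0}\) is notation introduced here. Note also that m1 adds Sex and ESL alongside EL, so the reduction cannot be attributed to EL alone:

\[ \frac{\sigma^2_{u0}(m0) - \sigma^2_{u0}(m1)}{\sigma^2_{u0}(m0)} = \frac{0.02456 - 0.01933}{0.02456} \approx 0.21 \]

\[ \frac{\sigma^2_{u0}(m0) - \sigma^2_{u0}(m2a)}{\sigma^2_{u0}(m0)} = \frac{0.02456 - 0.01638}{0.02456} \approx 0.33 \]

The residual (student-level) variance, by contrast, barely changes (0.10208 in m0 versus 0.10191 in m1).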

In particular, the large main effect for pEL, the percentage of ELs at a given school, warrants further investigation. The more ELs there are in a school, the lower the average reading score at that school. Altogether, the findings (with the caveat of only explaining a small proportion of the variance) suggest that the most pressing issue to address is the concentration of ELs in a relatively small number of schools, which has an adverse effect on reading scores for all students at such schools; similar patterns were found with regard to racial segregation at schools (Reardon and Owens 2014). This effect is above and beyond the level 1 main effect of EL and, hence, does not negate the differential performance between ELs and native English-speakers at any given school. However, in order to make a more informed judgement on this, the psychometric properties of the reading measure, as well as its cultural validity, need to be investigated.

References

Etchell, Andrew, Aditi Adhikari, Lauren S Weinberg, Ai Leen Choo, Emily O Garnett, Ho Ming Chow, and Soo-Eun Chang. 2018. “A systematic literature review of sex differences in childhood language and brain development.” Neuropsychologia 114: 19–31.

Hox, Joop H, Mirjam Moerbeek, and Rens van de Schoot. 2018. Multilevel Analysis - Techniques and Applications. 3rd ed. New York, NY: Routledge.

Reardon, Sean F, Demetra Kalogrides, and Kenneth Shores. 2019. “The Geography of Racial/Ethnic Test Score Gaps.” American Journal of Sociology 124 (4): 1164–1221. https://doi.org/10.1086/700678.

Reardon, Sean F, and Ann Owens. 2014. “60 Years After Brown: Trends and Consequences of School Segregation.” Annual Review of Sociology 40 (1): 199–218. https://doi.org/10.1146/annurev-soc-071913-043152.

Solano-Flores, Guillermo. 2016. Assessing English language learners: Theory and practice. New York, NY: Routledge.

Solano-Flores, Guillermo, and Kenji Hakuta. 2017. Assessing Students in Their Home Language (Understanding Language). Stanford, CA: Stanford University Graduate School of Education.

Tourangeau, Nord, K., and M Najarian. 2006. Early Childhood Longitudinal Study, Kindergarten Class of 2010–11 (ECLS-K:2011) User’s Manual for the ECLS-K:2011 Kindergarten–Fifth Grade Data File and Electronic Codebook, Public Version (NCES 2019-051). U.S. Department of Education. Washington, DC: National Center for Education Statistics.