Differential educational outcomes often fall along racial or socio-economic lines (see, e.g., Reardon, Kalogrides, and Shores 2019). Race per se, however, cannot explain differential performance; other explanatory variables are needed. One candidate is language, as different racial groups often speak (or are exposed to) different languages or varieties of the same language.
Moreover, English-language learners (ELs) in the United States tend to perform worse than their peers who speak English as their first language in a variety of domains (see, e.g., Solano-Flores 2016; Solano-Flores and Hakuta 2017). Schools tackle this issue in various ways, such as through English-as-a-second-language (ESL) programmes for their ELs, parent education programmes in English, or the employment of reading specialists to assist ELs. The Early Childhood Longitudinal Study, Kindergarten Class of 2011 (ECLS-K:2011; Tourangeau and Najarian 2006) dataset provides a great opportunity to test some specific hypotheses about the differential performance of ELs on a national scale, in an effort to underpin the findings of many smaller studies conducted in the field. Further, it allows for a preliminary analysis of the efficacy of parental education programmes and reading interventions for ELs, and helps to suggest answers to the question of what schools can do to close this language-based achievement gap.
How does English reading performance in fifth graders (the level-1 outcome) differ by home language? To what extent are the magnitudes of these differences affected by school characteristics, such as the availability of reading specialists and of parental education programmes (level-2 predictors)? The reading score gap is conceptualised as the difference between first-language English speakers (E1; English is the L1) and second-/additional-language English speakers (EL; English is an L2). It is operationalised as the regression coefficient for EL, with balanced bilinguals coded as E1.
The study uses an intra-individual correlational design. The data was analysed using a multilevel regression model with two levels: students (level 1) are nested within schools (level 2). Multilevel analysis techniques make it possible to investigate the influence of both individual and school-based factors on reading scores (Hox, Moerbeek, and Schoot 2018).
The data comes from the sixth and final wave of the ECLS-K:2011, including children from across the United States. The ECLS-K:2011 is a longitudinal study of a nationally representative sample of children, drawing together information from multiple sources using multiple methods. The data was collected using a multi-stage, stratified cluster design, allowing for sufficient group sizes on the school level for a multilevel regression analysis. Individuals in the sample used in the present study were assessed in spring 2016 whilst in grade 5. Other information stems from teacher, principal, and parent questionnaires, partly from earlier waves of the study, partly computed by the researchers (Tourangeau and Najarian 2006).
The sample comprises 12 children (12 boys and 12 girls) from 1920 different schools. At the time of reading testing, they were aged between 9.56 and 13.13 years (M = 11.1, SD = 0.37).
I selected a small subset of variables from the ECLS-K:2011 dataset, and renamed and (where applicable) recoded them as per the lists below.
In the tables below, the sample is described by Sex and by EL status. Based on these results, Sex appears to be equally distributed across EL status groups, and Age is very similar across all groupings. Group sizes for EL status differ: there are about five times as many native English speakers as there are ELs. The fact that 12 children coded as native English speakers take part in an ESL programme can be explained by the coding of balanced bilinguals as English-native. The distribution of ELs across schools appears to be unequal: on average, ELs attend schools with a much higher percentage of ELs than native speakers do.
Variable | Total | Boys | Girls |
---|---|---|---|
N | 12 | 12 | 12 |
EL | 12 (100) | 12 (100) | 12 (100) |
E1 | 12 (100) | 12 (100) | 12 (100) |
Age (in months) | 133.16 (4.41) | 133.5 (4.47) | 132.81 (4.33) |
ReadingScore (theta) | 1.48 (0.35) | 1.46 (0.37) | 1.49 (0.33) |
Notes. For continuous variables, means are presented with standard deviations in parentheses. For categorical variables, counts are presented with percentages of column totals in parentheses.
Variable | Total | ELs | E1 |
---|---|---|---|
N | 12 | 12 | 12 |
Boys (n) | 12 (100) | 12 (100) | 12 (100) |
Girls (n) | 12 (100) | 12 (100) | 12 (100) |
ESL | 12 (100) | 12 (100) | 12 (100) |
Age (in months) | 133.16 (4.41) | 132.27 (4.29) | 133.33 (4.41) |
ReadingScore (theta) | 1.48 (0.35) | 1.34 (0.34) | 1.51 (0.35) |
pEL\(^a\) (%) | 12.86 (17.89) | 33.41 (21.88) | 8.91 (13.89) |
Notes. For continuous variables, means are presented with standard deviations in parentheses. For categorical variables, counts are presented with percentages of column totals in parentheses. \(^a\) 173 cases with missing data.
First, I built an intercept-only model with random intercepts, m0. This model serves as the benchmark model, against which more complex models can be evaluated.
Model m0 looks as follows: \[ ReadingScore_{ij} = \gamma_{00} + u_{0j} + e_{ij} \]
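For reference, a minimal sketch of the call that produces the output below, assuming the lme4/lmerTest workflow indicated by the output (the data frame is named `data`, as shown in the `Data:` line):

```r
library(lmerTest)  # wraps lme4::lmer() and adds Satterthwaite t-tests

# Intercept-only (null) model with random intercepts per school,
# fit by maximum likelihood (REML = FALSE), as in the output below.
m0 <- lmer(ReadingScore ~ (1 | School), data = data, REML = FALSE)
summary(m0)
```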
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula: ReadingScore ~ (1 | School)
## Data: data
##
## AIC BIC logLik deviance df.resid
## 5608 5629 -2801 5602 8064
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.5707 -0.6222 0.0112 0.6859 2.5805
##
## Random effects:
## Groups Name Variance Std.Dev.
## School (Intercept) 0.02456 0.1567
## Residual 0.10208 0.3195
## Number of obs: 8067, groups: School, 1920
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 1.453e+00 5.737e-03 1.163e+03 253.2 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Beforehand, I ran a significance test for school effects by comparing model m0 to a single-level null model (a simple regression), which was highly significant (\(\chi^2\)(1) = 465.51, p < .001). This confirms the better fit of model m0, suggesting significant between-schools variance. Accordingly, the intraclass correlation coefficient (ICC) of m0, \(\rho\) = 0.19, tells us that 19.39 percent of the total observed variance is explained by level-2 groupings, i.e. by differences between schools. Generally, however, there is not much variance altogether.
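For reference, the ICC follows directly from the variance components in the m0 output above: \[ \rho = \frac{\sigma^2_{u_0}}{\sigma^2_{u_0} + \sigma^2_{e}} = \frac{0.02456}{0.02456 + 0.10208} \approx 0.194 \]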
Next, I built a model containing all individual-level (level-1) predictors, m1. This model includes EL (a binary variable indicating whether or not the student is an EL), Sex, and ESL (a binary variable indicating whether or not the student takes part in an ESL programme).
\[ ReadingScore_{ij} = \gamma_{00} + \gamma_{10}EL_{ij} + \gamma_{20}Sex_{ij} + \gamma_{30}ESL_{ij} + u_{0j} + e_{ij} \]
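The corresponding call mirrors the one for m0, adding the three level-1 predictors as fixed effects (a sketch, under the same assumptions as above):

```r
m1 <- lmer(ReadingScore ~ EL + Sex + ESL + (1 | School), data = data, REML = FALSE)
summary(m1)
```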
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula: ReadingScore ~ EL + Sex + ESL + (1 | School)
## Data: data
##
## AIC BIC logLik deviance df.resid
## 5360.3 5402.2 -2674.2 5348.3 7949
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.6176 -0.6192 0.0101 0.6826 2.4526
##
## Random effects:
## Groups Name Variance Std.Dev.
## School (Intercept) 0.01933 0.1390
## Residual 0.10191 0.3192
## Number of obs: 7955, groups: School, 1905
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 1.468e+00 6.921e-03 2.184e+03 212.161 < 2e-16 ***
## EL -1.493e-01 1.224e-02 7.360e+03 -12.202 < 2e-16 ***
## Sex 2.420e-02 7.505e-03 7.633e+03 3.225 0.00127 **
## ESL 3.948e-02 1.371e-02 7.893e+03 2.879 0.00400 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) EL Sex
## EL -0.229
## Sex -0.525 0.007
## ESL -0.119 -0.394 -0.026
Given some missing data for ESL, models m0 and m1 are fit to different numbers of observations and therefore cannot be compared directly using the anova() function; the other fit indicators, however, suggest that m1 is, by far, the preferable model here.
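One possible workaround, a sketch rather than necessarily what was done here, is to refit m0 on exactly the complete cases retained by m1, so that a likelihood-ratio test is computed on a common sample:

```r
# model.frame(m1) contains only the rows (and columns) m1 actually used,
# so refitting m0 on it makes the two likelihoods directly comparable.
m0_sub <- update(m0, data = model.frame(m1))
anova(m0_sub, m1)
```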
Model m1a is a refined version of model m1, adding the interaction effect EL*Sex (testing H\(_2\)): \[ ReadingScore_{ij} = \gamma_{00} + \gamma_{10}EL_{ij} + \gamma_{20}Sex_{ij} + \gamma_{30}ESL_{ij} + \gamma_{40}EL_{ij} \times Sex_{ij} + u_{0j} + e_{ij} \]
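A sketch of the corresponding call (same assumptions as above); the anova() comparison yields the likelihood-ratio test reported below the output:

```r
m1a <- lmer(ReadingScore ~ EL + Sex + I(EL * Sex) + ESL + (1 | School),
            data = data, REML = FALSE)
anova(m1, m1a)  # LRT for the added interaction term
```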
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula: ReadingScore ~ EL + Sex + I(EL * Sex) + ESL + (1 | School)
## Data: data
##
## AIC BIC logLik deviance df.resid
## 5362.3 5411.2 -2674.2 5348.3 7948
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.6169 -0.6189 0.0100 0.6828 2.4518
##
## Random effects:
## Groups Name Variance Std.Dev.
## School (Intercept) 0.01933 0.1390
## Residual 0.10191 0.3192
## Number of obs: 7955, groups: School, 1905
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 1.469e+00 7.102e-03 2.394e+03 206.810 < 2e-16 ***
## EL -1.509e-01 1.582e-02 7.866e+03 -9.537 < 2e-16 ***
## Sex 2.370e-02 8.175e-03 7.558e+03 2.899 0.00375 **
## I(EL * Sex) 3.149e-03 2.057e-02 7.851e+03 0.153 0.87830
## ESL 3.944e-02 1.372e-02 7.893e+03 2.875 0.00405 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) EL Sex I(EL*S
## EL -0.315
## Sex -0.559 0.256
## I(EL * Sex) 0.224 -0.634 -0.397
## ESL -0.120 -0.291 -0.015 -0.022
Model m1a does not fit better than m1, \(\chi^2\)(1) = 0.02, p = 0.878, and the interaction effect _EL*Sex_ is non-significant. Thus, I continued with model m1.
As a next step, I added all school-level (level-2) predictors (pEL, ParentProgram, and ReadingSpecialist), resulting in model m2 (with both level-1 and level-2 predictors).
\[ ReadingScore_{ij} = \gamma_{00} + \gamma_{10}EL_{ij} + \gamma_{20}Sex_{ij} + \gamma_{30}ESL_{ij} + \gamma_{01}pEL_j + \gamma_{02}ParentProgram_j + \gamma_{03}ReadingSpecialist_j + u_{0j} + e_{ij} \]
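Again as a sketch under the same assumptions, the three school-level predictors simply enter as additional fixed effects:

```r
m2 <- lmer(ReadingScore ~ EL + Sex + ESL + pEL + ParentProgram +
             ReadingSpecialist + (1 | School), data = data, REML = FALSE)
summary(m2)
```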
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula:
## ReadingScore ~ EL + Sex + ESL + pEL + ParentProgram + ReadingSpecialist +
## (1 | School)
## Data: data
##
## AIC BIC logLik deviance df.resid
## 5005.6 5068.0 -2493.8 4987.6 7548
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.6390 -0.6251 0.0139 0.6938 2.5383
##
## Random effects:
## Groups Name Variance Std.Dev.
## School (Intercept) 0.01535 0.1239
## Residual 0.10253 0.3202
## Number of obs: 7557, groups: School, 1801
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 1.477e+00 1.668e-02 1.238e+03 88.590 < 2e-16 ***
## EL -1.172e-01 1.347e-02 7.545e+03 -8.703 < 2e-16 ***
## Sex 2.748e-02 7.683e-03 7.325e+03 3.577 0.000349 ***
## ESL 4.741e-02 1.408e-02 7.452e+03 3.367 0.000763 ***
## pEL -2.623e-03 3.378e-04 1.580e+03 -7.767 1.44e-14 ***
## ParentProgram 2.381e-02 1.555e-02 1.194e+03 1.531 0.125984
## ReadingSpecialist 7.474e-03 1.224e-02 1.085e+03 0.611 0.541506
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) EL Sex ESL pEL PrntPr
## EL -0.047
## Sex -0.194 0.007
## ESL -0.034 -0.362 -0.026
## pEL -0.501 -0.324 -0.019 -0.035
## ParentPrgrm -0.873 0.039 -0.032 -0.002 0.368
## RedngSpclst -0.137 0.006 -0.006 -0.032 -0.071 -0.031
Significance testing showed a significant main effect of pEL, but no effects for ParentProgram or ReadingSpecialist. Hence I removed the latter two predictors from the model, resulting in model m2a.
\[ ReadingScore_{ij} = \gamma_{00} + \gamma_{10}EL_{ij} + \gamma_{20}Sex_{ij} + \gamma_{30}ESL_{ij} + \gamma_{01}pEL_j + u_{0j} + e_{ij} \]
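Dropping the two non-significant school-level predictors gives m2a (sketch as before); note that the number of complete cases increases relative to m2 (7786 vs. 7557 observations), because fewer predictors means fewer rows lost to missing data:

```r
m2a <- lmer(ReadingScore ~ EL + Sex + ESL + pEL + (1 | School),
            data = data, REML = FALSE)
summary(m2a)
```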
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula: ReadingScore ~ EL + Sex + ESL + pEL + (1 | School)
## Data: data
##
## AIC BIC logLik deviance df.resid
## 5157.6 5206.3 -2571.8 5143.6 7779
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.6430 -0.6241 0.0136 0.6933 2.5490
##
## Random effects:
## Groups Name Variance Std.Dev.
## School (Intercept) 0.01638 0.1280
## Residual 0.10208 0.3195
## Number of obs: 7786, groups: School, 1853
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 1.501e+00 7.644e-03 1.529e+03 196.337 < 2e-16 ***
## EL -1.127e-01 1.317e-02 7.781e+03 -8.553 < 2e-16 ***
## Sex 2.743e-02 7.561e-03 7.529e+03 3.628 0.000287 ***
## ESL 4.727e-02 1.379e-02 7.699e+03 3.428 0.000610 ***
## pEL -2.850e-03 3.108e-04 1.749e+03 -9.170 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) EL Sex ESL
## EL -0.026
## Sex -0.477 0.010
## ESL -0.088 -0.357 -0.026
## pEL -0.451 -0.362 -0.008 -0.039
Next, I tested random-slopes models, variable by variable. Models m3a, m3b, and m3c test random slopes for EL, Sex, and ESL, respectively.
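A sketch of these three models; the exact random-effects specification is an assumption (correlated intercepts and slopes are shown here, but uncorrelated slopes via `||` would also be plausible):

```r
m3a <- lmer(ReadingScore ~ EL + Sex + ESL + pEL + (1 + EL | School),
            data = data, REML = FALSE)
m3b <- lmer(ReadingScore ~ EL + Sex + ESL + pEL + (1 + Sex | School),
            data = data, REML = FALSE)
m3c <- lmer(ReadingScore ~ EL + Sex + ESL + pEL + (1 + ESL | School),
            data = data, REML = FALSE)

# Information criteria for the non-nested comparisons
AIC(m2a, m3a, m3b, m3c)
BIC(m2a, m3a, m3b, m3c)
```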
Given that these models are not all nested within one another, I used the AIC and BIC indicators to evaluate them. None of the models with random slopes (m3a, m3b, and m3c, with random slopes for _EL_, _Sex_, and _ESL_, respectively) performed better than model m2a. Hence, I settled on model m2a as the final model: \[ ReadingScore_{ij} = \gamma_{00} + \gamma_{10}EL_{ij} + \gamma_{20}Sex_{ij} + \gamma_{30}ESL_{ij} + \gamma_{01}pEL_j + u_{0j} + e_{ij} \]
Generally, there is not much variance in ReadingScore altogether, partly because the analysis uses theta scores, which map raw performance onto a latent ability scale ranging from -4 to 4.
Below, each hypothesis is evaluated separately.
The results suggest that the individual-level variable EL explains a great deal of the between-schools variance, because ELs are unequally distributed across schools. Hence, individual differences between EL and E1 students show up in the model as aggregated school-level variance. Regarding sex differences (though not the focus of this analysis), the findings align with previous research (see, e.g., Etchell et al. 2018).
In particular, the large main effect of pEL, the percentage of ELs at a given school, warrants further investigation: the more ELs a school has, the lower the average reading score. Altogether, the findings (with the caveat that they explain only a small proportion of the variance) point to the concentration of ELs in a relatively small number of schools as the most pressing issue to address, since it has an adverse effect on the reading scores of all students at such schools; similar patterns have been found with regard to racial segregation in schools (Reardon and Owens 2014). This effect is above and beyond the level-1 main effect of EL and is, hence, not meant to negate the differential performance between ELs and native English speakers at any given school. However, in order to make a more informed judgement on this, the psychometric properties of the reading measure, as well as its cultural validity, need to be investigated.
Etchell, Andrew, Aditi Adhikari, Lauren S Weinberg, Ai Leen Choo, Emily O Garnett, Ho Ming Chow, and Soo-Eun Chang. 2018. “A Systematic Literature Review of Sex Differences in Childhood Language and Brain Development.” Neuropsychologia 114: 19–31.
Hox, Joop J, Mirjam Moerbeek, and Rens van de Schoot. 2018. Multilevel Analysis: Techniques and Applications. 3rd ed. New York, NY: Routledge.
Reardon, Sean F, Demetra Kalogrides, and Kenneth Shores. 2019. “The Geography of Racial/Ethnic Test Score Gaps.” American Journal of Sociology 124 (4): 1164–1221. https://doi.org/10.1086/700678.
Reardon, Sean F, and Ann Owens. 2014. “60 Years After Brown: Trends and Consequences of School Segregation.” Annual Review of Sociology 40 (1): 199–218. https://doi.org/10.1146/annurev-soc-071913-043152.
Solano-Flores, Guillermo. 2016. Assessing English Language Learners: Theory and Practice. New York, NY: Routledge.
Solano-Flores, Guillermo, and Kenji Hakuta. 2017. Assessing Students in Their Home Language (Understanding Language). Stanford, CA: Stanford University Graduate School of Education.
Tourangeau, K., C. Nord, and M. Najarian. 2006. Early Childhood Longitudinal Study, Kindergarten Class of 2010–11 (ECLS-K:2011) User’s Manual for the ECLS-K:2011 Kindergarten–Fifth Grade Data File and Electronic Codebook, Public Version (NCES 2019-051). U.S. Department of Education. Washington, DC: National Center for Education Statistics.