The Science of Reading or Balanced Literacy for Early Reading Instruction: What the Data Actually Support

Introduction

Over the last several years, national awareness has grown around how children learn to read, particularly with respect to the teaching methods used in classrooms across the United States. Methods and philosophies of reading instruction range from The Science of Reading to Balanced Literacy programs. A quickly developing body of research associated with The Science of Reading is deepening our understanding of how children learn to read. By drawing on the theories and frameworks derived from this research, including the five pillars of early literacy, Scarborough’s Reading Rope, the Simple View of Reading, and Structured Literacy, educators can use research-based best practices to inform their instruction and ensure all students have the opportunity to achieve reading success.

A massive shift in policy toward The Science of Reading has occurred, based on evolving evidence about best practices in reading instruction (Petscher et al., 2020). Along with this shift, there has been a rise in debate over theoretical views of reading instruction. The two most debated views are Balanced Literacy approaches and The Science of Reading (Petscher et al., 2020). The Simple View of Reading, which falls under The Science of Reading, holds that reading development relies on two components: decoding, developed through explicit phonics instruction, and language comprehension, which is nurtured through rich oral language experiences (Hoover & Gough, 1990). On one hand, positivists argue for explicit instruction to enhance comprehension, a position that supports The Science of Reading. On the other hand, constructivists believe that readers use their graphic, semantic, and syntactic knowledge (the three cueing systems) to guess the meaning of the printed word (Petscher et al., 2020). As a result, educators and policymakers are often left unsure which approach and theory they should adopt in the classroom to best support their students.

Recently, southern states such as Mississippi, Alabama, and Louisiana have implemented structured literacy reforms grounded in The Science of Reading. Following these reforms, Mississippi’s 2022 4th-grade NAEP reading scores rose from near the bottom of the rankings to 5 points above the national average (D’Souza, 2023). Furthermore, NAEP data show that Alabama and Louisiana, which also adopted The Science of Reading approach, were the only two states to post gains during the post-pandemic decline, particularly for low-income students. With many southern states looking to strengthen literacy instruction and improve student reading outcomes, it is critical to determine the effectiveness of The Science of Reading with regard to reading achievement.

The current study investigates the Simple View of Reading, focusing specifically on the relative roles that decoding, via explicit phonics instruction, and language comprehension play in reading development at the kindergarten level. The study examines the relative contribution of these two components to determine whether the current data support the South’s phonics-first strategy or a more balanced approach to early reading growth. In line with the Simple View of Reading, the study also investigates whether the language spoken at home moderates the effect of decoding on reading development.

Research Questions

  1. To what extent can a fall decoding composite score and a fall language comprehension composite score account for the variation in spring Reading IRT scores?

  2. What is the relative contribution of fall decoding skills (composite score) and fall language comprehension skills (composite score) to spring Reading IRT scores?

  3. Do these data support a phonics-first strategy or a more balanced approach to early reading growth, based on the relative contributions of decoding and language comprehension to spring Reading IRT scores?

  4. After controlling for fall Reading IRT scores, what is the relative contribution of fall decoding skills (composite score) and fall language comprehension skills (composite score) to spring Reading IRT scores?

  5. Is the effect of the composite decoding score on spring Reading IRT scores moderated by the home language of the child?

Methods

Participants

The data set used in this analysis is a subset of the Early Childhood Longitudinal Study, Kindergarten Class of 2010-2011 (ECLS-K:2011), which is part of a large-scale, longitudinal study conducted by the National Center for Education Statistics (NCES). It follows a nationally representative sample of children who began kindergarten in the 2010–2011 school year and tracks their development through elementary school. The sample size (n) is 10,671.

Measures & Variables

The independent variables used in the analysis are as follows:

  • Fall Decoding Indicators:
    • T1LETTER - visual identification of symbols, easily and quickly names all upper and lower-case letters of the alphabet
    • T1USESTR - strategic flexibility; decoding; language processing and uses different strategies to read unfamiliar words
    • T1WRITE - directionality; letter formation; sound symbol relationships and demonstrates early writing behaviors

These items represent the teacher’s perceptions of the child’s language and literacy abilities as compared to other children of the same age. Responses were recorded on a 6-point Likert scale (0=not started, 1=none/not yet, 2=beginning, 3=in progress, 4=intermediate, 5=proficient). The composite score was calculated by averaging the three items.

  • Fall Language Comprehension Indicators:
    • T1CMPSEN - language production and uses complex sentence structures
    • T1STORY - story structure; logic; language processing and understands and interprets a story or other text read to him/her
    • T1PRDCT - story structure; logic; language processing and predicts what will happen next in stories
    • T1CMPSTR - language production and composes simple stories

These items represent the teacher’s perceptions of the child’s language and literacy abilities as compared to other children of the same age. Responses were recorded on a 6-point Likert scale (0=not started, 1=none/not yet, 2=beginning, 3=in progress, 4=intermediate, 5=proficient). The composite score was calculated by averaging the four items.

The dependent variable used in the analysis is the spring Reading IRT scale score, coded X2RSCALK4. The IRT-based scale scores are overall measures of achievement and are appropriate for both cross-sectional and longitudinal analyses. They are useful for examining differences in overall achievement among subgroups of children within a given data collection round or across rounds, as well as for analyses of correlations between achievement and child, family, and school characteristics. The fall kindergarten and spring kindergarten scale scores are on the same metric, so an analyst examining growth across the kindergarten year can subtract the fall kindergarten score from the spring kindergarten score to compute a gain score.

The moderation variable, home language of the child (X12LANGST), is a composite created to indicate whether English was the primary language spoken in the home, whether a non-English language was the primary language, or whether both were spoken equally. The coding is 1 = non-English, 2 = English, 3 = English and non-English equally.

Measurement Properties

Reliability

Internal consistency reliability was assessed using Cronbach’s alpha for the decoding composite measure. The alpha coefficient was α = 0.84, which indicates good reliability according to conventional benchmarks (≥ .70 acceptable, ≥ .80 good, ≥ .90 excellent). This suggests that the items in the scale measure the construct consistently across participants.

Decoding Composite Score - Reliability Analysis, Cronbach’s Alpha Output
## [1] 0.8385652

Internal consistency reliability was assessed using Cronbach’s alpha for the language comprehension composite measure. The alpha coefficient was α = 0.86, which indicates good reliability according to conventional benchmarks (≥ .70 acceptable, ≥ .80 good, ≥ .90 excellent). This suggests that the items in the scale measure the construct consistently across participants.

Language Composite Score - Reliability Analysis, Cronbach’s Alpha Output
## [1] 0.8643188
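
The composites and alpha coefficients above could be produced with code along the following lines. This is a minimal sketch, assuming the seven teacher-rating items are stored as numeric 0–5 columns in Qual_Exam_Data and that the psych package is used for alpha; the original analysis script is not shown.

library(psych)

# Item sets for the two composites (names as listed under Measures & Variables)
decoding_items <- Qual_Exam_Data[, c("T1LETTER", "T1USESTR", "T1WRITE")]
langcomp_items <- Qual_Exam_Data[, c("T1CMPSEN", "T1STORY", "T1PRDCT", "T1CMPSTR")]

# Composite scores: the mean of the items for each child
Qual_Exam_Data$decoding_avg <- rowMeans(decoding_items, na.rm = TRUE)
Qual_Exam_Data$langcomp_avg <- rowMeans(langcomp_items, na.rm = TRUE)

# Internal consistency (Cronbach's alpha) for each item set
psych::alpha(decoding_items)$total$raw_alpha   # approx. .84
psych::alpha(langcomp_items)$total$raw_alpha   # approx. .86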

Construct Validity

Convergent validity was examined by correlating the decoding composite scores with fall Reading IRT scores. Results showed a strong, positive, statistically significant correlation, r(10,669) = .65, p < .001, indicating that higher scores on the decoding scale were associated with higher fall reading achievement.

Pearson’s R Convergent Validity for Decoding Comp. and Fall Reading IRT Score
## 
##  Pearson's product-moment correlation
## 
## data:  Qual_Exam_Data$decoding_avg and Qual_Exam_Data$X1RSCALK4
## t = 87.371, df = 10669, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6346214 0.6567451
## sample estimates:
##       cor 
## 0.6458188

Convergent validity was also examined by correlating the language comprehension composite scores with fall Reading IRT scores. Results showed a moderate, positive, statistically significant correlation, r(10,669) = .52, p < .001, indicating that higher scores on the language comprehension scale were associated with higher fall reading achievement.

Pearson’s R Convergent Validity for Language Comp. and Fall Reading IRT Score
## 
##  Pearson's product-moment correlation
## 
## data:  Qual_Exam_Data$langcomp_avg and Qual_Exam_Data$X1RSCALK4
## t = 62.997, df = 10669, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5067279 0.5343897
## sample estimates:
##       cor 
## 0.5206954

The decoding composite demonstrated a strong positive correlation with the fall reading measure, r(10,669) = .65, p < .001, whereas the language comprehension composite showed a moderate positive correlation, r(10,669) = .52, p < .001. This suggests that decoding skills were more strongly associated with reading performance than language comprehension skills in the fall assessment data.
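
The correlations above correspond to calls of the following form (the variable names match those shown in the output):

# Convergent validity: fall composites vs. fall Reading IRT scale score
cor.test(Qual_Exam_Data$decoding_avg, Qual_Exam_Data$X1RSCALK4)   # r approx. .65
cor.test(Qual_Exam_Data$langcomp_avg, Qual_Exam_Data$X1RSCALK4)   # r approx. .52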

Recommendations for Future Measurement Evaluation

To strengthen validity and reliability evidence, future studies should consider:

  • Test–Retest Reliability: Assess score stability over time to determine whether decoding scores and language comprehension remain consistent across measurement points.
  • Content Validity: Use a systematic expert review process with a larger panel and calculate a Content Validity Index (CVI) to ensure items fully represent the constructs.
  • Factorial Validity: Conduct factor analysis (exploratory and, ideally, confirmatory) to evaluate the unidimensionality of each scale.
  • Cultural and Contextual Validity: Evaluate the scales across diverse student populations and settings (e.g., urban vs. rural) to ensure broader applicability.

Overall, both scales demonstrated good internal consistency.

Descriptive Statistics

To begin this analysis, I examined the distributions of the composite scores for the fall decoding and language comprehension indicators (Graph 1.1). Both composites show similar ranges of scores. Fall decoding had a mean of 2.5, suggesting that teachers perceive most students as beginning to use decoding skills. Language comprehension had a mean of 2.6, slightly above decoding at the beginning of the year, indicating that teachers perceive most students as beginning to use language comprehension skills as well.

Graph 1.1 Side-by-Side Box Plots Showing Distribution of Decoding and Language Scores

The decoding composite scores ranged from 1.00 to 5.00, with a mean of 2.49 and a median of 2.33 (Table 1.1). The 25th and 75th percentiles were 1.67 and 3.33, respectively, showing moderate spread in performance.

The language comprehension composite scores also ranged from 1.00 to 5.00, with a slightly higher mean of 2.63 and a median of 2.50 (Table 1.2). The interquartile range was 1.75 to 3.25. Both composites had similar ranges and distributions, though language comprehension scores were marginally higher on average.

Table 1.1 Decoding Scores 5-Number Summary
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.667   2.333   2.486   3.333   5.000
Table 1.2 Language Scores 5-Number Summary
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.750   2.500   2.625   3.250   5.000
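
The five-number summaries and Graph 1.1 can be reproduced with a sketch such as the following (base R; the exact plotting code used for Graph 1.1 is assumed):

# Five-number summaries (Tables 1.1 and 1.2)
summary(Qual_Exam_Data$decoding_avg)
summary(Qual_Exam_Data$langcomp_avg)

# Side-by-side box plots of the two fall composites (Graph 1.1)
boxplot(Qual_Exam_Data$decoding_avg, Qual_Exam_Data$langcomp_avg,
        names = c("Decoding", "Language comprehension"),
        ylab = "Fall composite score")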

Next, I examined the frequency of languages spoken at home (Table 1.3). Of the 10,671 students, 1,590 speak a non-English language at home, 8,968 speak English at home, and 113 speak English and a non-English language equally at home. This indicates that the majority of students in the data set speak English at home.

Table 1.3 - Frequency of Languages Spoken at Home
## 
##    1    2    3 
## 1590 8968  113

Finally, I examined the gender frequencies in the data (Table 1.4). In this sample, 5,488 students were male and 5,183 were female, indicating a roughly even split by gender.

Table 1.4 - Gender Frequency (1=Male, 2=Female)

## 
##    1    2 
## 5488 5183
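
Tables 1.3 and 1.4 are simple frequency tables; a sketch is shown below. The gender variable name is assumed, since it does not appear in the output above.

# Home language frequencies (1 = non-English, 2 = English, 3 = both equally)
table(Qual_Exam_Data$X12LANGST)

# Gender frequencies (1 = male, 2 = female); variable name X_CHSEX_R is assumed
table(Qual_Exam_Data$X_CHSEX_R)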

Assumption Testing

Linearity & Homoscedasticity

A residuals-versus-fitted-values scatter plot showed a random pattern (Graph 1.2), supporting the assumption of a linear relationship between the predictors and the dependent variable. The spread of the residuals also appears roughly the same across all levels of the decoding and language variables, supporting homoscedasticity.

Graph 1.2 - Residuals-versus-Fitted Plot

Independence of Residuals

The Durbin–Watson statistic indicated no evidence of autocorrelation in the residuals, DW = 1.99, p = .65, satisfying the assumption of independence.

##  lag Autocorrelation D-W Statistic p-value
##    1     0.003959242      1.991066    0.65
##  Alternative hypothesis: rho != 0

Normality of Residuals/Errors

The Q-Q plot suggests the residuals depart mildly from normality and may be somewhat over-dispersed, with more extreme values in the tails than a normal distribution would predict. Such deviations are common in large samples and are unlikely to substantially affect the regression estimates, but results should be interpreted with this in mind.

Q-Q Plot of Residuals

Multicollinearity

Variance inflation factors (VIFs) for decoding (VIF = 2.62) and language comprehension (VIF = 2.62) were well below the commonly used threshold of 5, indicating no concerns with multicollinearity.

Variance Inflation Factors for Decoding and Language Comprehension
## decoding_avg langcomp_avg 
##     2.617553     2.617553
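
The assumption checks above can be run with the car package and base R diagnostics; a sketch, assuming the Model 1 specification reported below:

library(car)

# Model 1: fall composites predicting spring Reading IRT scores
model1 <- lm(X2RSCALK4 ~ decoding_avg + langcomp_avg, data = Qual_Exam_Data)

plot(model1, which = 1)    # residuals vs. fitted (Graph 1.2): linearity & homoscedasticity
durbinWatsonTest(model1)   # independence of residuals
plot(model1, which = 2)    # normal Q-Q plot of residuals
vif(model1)                # variance inflation factors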

After confirming that these assumptions were reasonably satisfied, I ran a multiple regression analysis to examine whether the fall decoding and language comprehension composites predicted spring reading scores. The overall model was statistically significant, F(2, 10668) = 2294.0, p < .001, and explained approximately 30.1% of the variance in spring reading performance (R² = .301). Both decoding (B = 6.27, SE = 0.17, p < .001) and language comprehension (B = 0.85, SE = 0.18, p < .001) were statistically significant positive predictors, with decoding showing a substantially larger effect.

Specifically, holding language comprehension constant, each 1-point increase in the decoding composite was associated with an increase of about 6 points in spring reading scores. Holding decoding constant, each 1-point increase in the language comprehension composite was associated with an increase of about 1 point.

From this analysis, it is evident that decoding skills are a stronger predictor, which supports The Science of Reading and a phonics-first emphasis.

Model 1-Multiple Regression Model of Decoding and Language Skills Predicting Spring Reading Scores
## 
## Call:
## lm(formula = X2RSCALK4 ~ decoding_avg + langcomp_avg, data = Qual_Exam_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -52.644  -7.708  -1.148   6.355  61.180 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   49.8122     0.3122 159.549  < 2e-16 ***
## decoding_avg   6.2747     0.1651  38.008  < 2e-16 ***
## langcomp_avg   0.8486     0.1778   4.774 1.83e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.5 on 10668 degrees of freedom
## Multiple R-squared:  0.3007, Adjusted R-squared:  0.3006 
## F-statistic:  2294 on 2 and 10668 DF,  p-value: < 2.2e-16
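
Because both composites are on the same 0–5 teacher-rating metric, the raw coefficients above are roughly comparable; standardized coefficients make the relative-contribution comparison in Research Question 2 explicit. The following sketch is an addition, not part of the original output:

# Standardized (beta) coefficients for Model 1
model1_std <- lm(scale(X2RSCALK4) ~ scale(decoding_avg) + scale(langcomp_avg),
                 data = Qual_Exam_Data)
summary(model1_std)$coefficients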

After determining that fall decoding and language comprehension skills account for 30.1% of the variation in spring Reading IRT scores, I ran a second multiple regression, this time controlling for fall Reading IRT scores.

The model was statistically significant, F(3, 10667) = 6960, p < .001, and explained approximately 66% of the variance in spring reading performance (R² = .662). After controlling for fall reading ability, decoding was no longer a statistically significant predictor, p = .307. However, language comprehension remained statistically significant, p < .001. Notably, prior reading ability (X1RSCALK4) was the strongest predictor: each 1-point increase in fall Reading IRT scores was associated with an increase of just under 1 point (B = 0.93) in spring Reading IRT scores.

From this model it is evident that Fall Reading IRT scores account for most of the variation in Spring Reading IRT scores.

Model 2-Multiple Regression Model of Decoding and Language Skills Predicting Spring Reading Scores Accounting for Fall Reading IRT Scores
## 
## Call:
## lm(formula = X2RSCALK4 ~ decoding_avg + langcomp_avg + X1RSCALK4, 
##     data = Qual_Exam_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -71.840  -5.503  -0.725   4.687  49.317 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  16.176865   0.382666  42.274  < 2e-16 ***
## decoding_avg  0.131205   0.128423   1.022    0.307    
## langcomp_avg  0.484606   0.123667   3.919 8.96e-05 ***
## X1RSCALK4     0.934729   0.008757 106.740  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8 on 10667 degrees of freedom
## Multiple R-squared:  0.6619, Adjusted R-squared:  0.6618 
## F-statistic:  6960 on 3 and 10667 DF,  p-value: < 2.2e-16
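
Models 1 and 2 can also be compared directly; the sketch below (an addition, not shown in the original output) tests whether adding the fall score improves fit and quantifies the change in R²:

# Model 2: adds the fall Reading IRT score as a covariate
model2 <- lm(X2RSCALK4 ~ decoding_avg + langcomp_avg + X1RSCALK4,
             data = Qual_Exam_Data)

anova(model1, model2)                                  # nested model comparison
summary(model2)$r.squared - summary(model1)$r.squared  # change in R-squared (approx. .36)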

Model 1 explained about 30% of the variation in spring Reading IRT scores from decoding and language comprehension alone, while Model 2, which also accounted for fall Reading IRT scores, explained 66% of the variation. In Model 2, decoding skills were no longer statistically significant, whereas language comprehension skills were. This could suggest that after students reach a certain decoding proficiency, comprehension skills may play a more important role in predicting later reading performance.

Finally, to test whether the relationship between decoding skills and spring Reading IRT scores depends on home language, I ran a moderation analysis. The overall interaction model was significant, p < .001; however, the interaction term was not statistically significant, p = .84. From this model, we can infer that home language does not moderate the relationship between fall decoding skills and spring reading scores.

Moderation Model-Fall Decoding Skills interacting with Home Language on Spring IRT Scores
## 
## Call:
## lm(formula = X2RSCALK4 ~ decoding_avg * X12LANGST + langcomp_avg + 
##     X1RSCALK4, data = Qual_Exam_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -71.796  -5.530  -0.688   4.679  49.178 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            15.008482   0.941600  15.939  < 2e-16 ***
## decoding_avg            0.109481   0.372250   0.294  0.76868    
## X12LANGST               0.735731   0.480980   1.530  0.12613    
## langcomp_avg            0.410862   0.125012   3.287  0.00102 ** 
## X1RSCALK4               0.932299   0.008777 106.217  < 2e-16 ***
## decoding_avg:X12LANGST  0.037760   0.187277   0.202  0.84021    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.995 on 10665 degrees of freedom
## Multiple R-squared:  0.6624, Adjusted R-squared:  0.6622 
## F-statistic:  4184 on 5 and 10665 DF,  p-value: < 2.2e-16
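
The moderation model above corresponds to the following sketch; the anova() comparison against Model 2 (an addition, not shown in the original output) provides a block test of the home-language terms. Note that X12LANGST enters as a numeric 1–3 code here, matching the output above; treating it as a factor would be a possible refinement.

# Moderation: does home language moderate the effect of fall decoding skills?
mod_model <- lm(X2RSCALK4 ~ decoding_avg * X12LANGST + langcomp_avg + X1RSCALK4,
                data = Qual_Exam_Data)
summary(mod_model)

# Block test of the added home-language and interaction terms vs. Model 2
anova(model2, mod_model)
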
Results and Discussion

The initial model tested whether fall decoding and language comprehension composite scores contributed to reading development at the kindergarten level. Results indicated that 30% of the variation in spring reading scores can be explained by fall decoding and language scores. This model also showed that decoding had a larger effect on reading development than language comprehension skills. From this analysis, it is evident that decoding skills are the stronger predictor of reading development at the kindergarten level, which supports The Science of Reading and a phonics-first emphasis. This aligns with prior literature showing that decoding is a key component of literacy, and that the science of reading is the body of research demonstrating the importance of explicitly teaching this skill (Duke & Cartwright, 2021).

Research from the NIH’s National Reading Panel has also shown that phonemic awareness instruction, which builds a foundational decoding skill, is most effective when students are taught to manipulate phonemes with letters, when instruction explicitly focuses on one or two types of phoneme manipulation rather than several, and when students are taught in small groups (Cunningham, 2001). Model 1 suggests that students with stronger decoding skills at the start of the year are positioned for greater gains in reading achievement. When considering this model alone, the data support the South’s phonics-first strategy over a more balanced approach to early reading growth.

However, after accounting for fall Reading IRT scores, decoding loses its predictive power and no longer accounts for unique variance in spring reading scores, while language comprehension continues to predict and explain variation in spring scores. Taken together, the data show that much of the influence of decoding skills is already captured by a student’s overall reading level at the beginning of the school year, whereas language comprehension skills continue to explain variation in spring scores beyond prior reading performance. That is to say, once students reach a certain level of decoding proficiency, differences in language comprehension skills better explain variation in subsequent reading growth at the end of the year.

These results align with both The Simple View of Reading and Scarborough’s Reading Rope. Both models are widely recognized within the educational system and fall under The Science of Reading; however, they differ in how they conceptualize and prioritize decoding and language skills (phonemic awareness, phonics, fluency, vocabulary, and comprehension). According to Philip Gough and William Tunmer’s (1986) Simple View of Reading (SVR), reading comprehension is the product of two skills, word recognition (decoding) and language comprehension, often expressed as Reading Comprehension = Decoding × Language Comprehension. Because the model is multiplicative, neither component is sufficient on its own: weakness in either one can lead to reading failure, which is why it is imperative that students develop both.

Scarborough’s Reading Rope offers a slightly different breakdown of skill components, visually representing the strands that must be woven together for skilled reading. The strands are grouped into two categories, word recognition and language comprehension, which are the main components of The Simple View of Reading and of the models in this study. Each category is further divided into smaller strands representing the specific skills it comprises. All of the skills in the strands are considered essential for skilled reading and dependent upon one another.

Both models support explicit Structured Literacy instruction, meaning that concepts and skills are directly taught and practiced. Educators should not assume that students learn literacy principles independently or through exposure alone, as a more balanced approach tends to assume. The goal is to provide ample guidance and practice to help students acquire new literacy concepts, and it is critical to give immediate feedback to minimize the possibility of students learning and practicing concepts incorrectly.

From an instructional perspective, both models and the data point toward using The Science of Reading to guide instruction across both components. Educators should emphasize decoding, which is crucial for establishing foundational skills. As decoding skills are mastered and students progress, educators should strengthen language comprehension through activities such as vocabulary building, background knowledge, and inferential skills, all of which are important for sustaining reading growth. With the current educational climate focused on The Science of Reading and moving away from a balanced literacy approach, instructional emphasis over the year should continue to reflect this developmental trajectory, ensuring that both skill sets are nurtured and developed.

Finally, the lack of a statistically significant interaction in the third (moderation) model indicates that the patterns above hold across home language groups: the relationship between decoding, language comprehension, and spring reading outcomes did not depend on whether English was the student’s home language.

To conclude, the data suggest that a phonics-first approach grounded in The Science of Reading, rather than a balanced literacy approach, will better support kindergarten students’ developing reading ability, even when home language is accounted for. This research has implications for policymakers, district representatives, and educators. When determining reading curriculum and instruction, districts should ensure that early reading instruction provides explicit instruction in both decoding and language comprehension, following The Simple View of Reading or Scarborough’s Reading Rope, to maximize student success.

References

Cunningham, J. W. (2001). The National Reading Panel Report. Reading Research Quarterly, 36(3),
326–335. https://doi.org/10.1598/rrq.36.3.5

Duke, N. K., & Cartwright, K. B. (2021). The science of reading progresses: communicating advances
beyond the simple view of reading. Reading Research Quarterly, 56(S1).
https://doi.org/10.1002/rrq.411

D’Souza, K. (2023, May 18). Southern states make big strides in early literacy. EdSource.
https://edsource.org/updates/southern-states-make-big-strides-in-early-literacy

Gough, P. B., & Tunmer, W. E. (1986). Decoding, reading, and reading disability. Remedial and Special Education, 7(1), 6–10. https://doi.org/10.1177/074193258600700104

Hoover, W. A., & Gough, P. B. (1990). The simple view of reading. Reading and Writing, 2(2),
127–160. https://doi.org/10.1007/bf00401799

National Center for Education Statistics. (n.d.). The Nation’s Report Card | NAEP.
https://nces.ed.gov/nationsreportcard/

Petscher, Y., Cabell, S. Q., Catts, H. W., Compton, D. L., Foorman, B. R., Hart, S. A., Lonigan, C. J., Phillips, B. M., Schatschneider, C., Steacy, L. M., Terry, N. P., & Wagner, R. K. (2020). How the Science of Reading Informs 21st‐Century Education. Reading Research Quarterly, 55(S1). https://doi.org/10.1002/rrq.352