Introduction

Since tests are only equated at the total test level, year-to-year comparison of reporting category performance should be made cautiously. In the test construction process, every effort is made to keep the difficulty of each reporting category similar from year to year, but some fluctuation does occur across administrations.

In this study, three estimation methods were compared between grade 3 math in 2018 and grade 4 math in 2019:
1) Percent raw subscores
2) Haberman’s estimation method (“Obtaining subscores that are estimated as a function of both the observed subscores and the observed total score.”)
3) Quasi-Normal Curve Equivalent (NCE): Soo-Hee suggested this approach at the last meeting. The NCE score adjusts the percent raw score by the average item difficulty (p-value).
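
As a rough illustration of methods 1 and 2, the sketch below computes percent raw subscores and a Haberman-style regression estimate on simulated data. It is not the code used in this study. The function names, the simulated responses, and the use of Cronbach's alpha as the subscore reliability estimate are all assumptions made for illustration. The quasi-NCE method is left out because its exact adjustment formula is not stated here.

```python
import numpy as np


def percent_raw(items):
    """Method 1: percent of items answered correctly (0-100), assuming 0/1 item scores."""
    items = np.asarray(items, dtype=float)
    return 100.0 * items.sum(axis=1) / items.shape[1]


def cronbach_alpha(items):
    """Cronbach's alpha, used here as an assumed estimate of subscore reliability."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_var_sum / total_var)


def haberman_subscore(sub, total, rel_sub):
    """Method 2 (sketch): regress the true subscore on the observed subscore and total.

    sub and total are raw scores, and total must include the subscore items.
    """
    sub = np.asarray(sub, dtype=float)
    total = np.asarray(total, dtype=float)
    cov = np.cov(sub, total)               # 2x2 covariance matrix of (sub, total)
    var_err = (1.0 - rel_sub) * cov[0, 0]  # error variance of the observed subscore
    # Cov(true sub, sub) = Var(true sub); Cov(true sub, total) = Cov(sub, total) - Var(error)
    target = np.array([cov[0, 0] - var_err, cov[0, 1] - var_err])
    beta = np.linalg.solve(cov, target)    # regression weights for (sub, total)
    return sub.mean() + beta[0] * (sub - sub.mean()) + beta[1] * (total - total.mean())


# Simulated 0/1 responses: 2,000 students, 20 items, the first 6 forming one subscore.
rng = np.random.default_rng(0)
theta = rng.normal(size=2000)
difficulty = rng.normal(size=20)
prob = 1.0 / (1.0 + np.exp(-(theta[:, None] - difficulty[None, :])))
responses = (rng.random((2000, 20)) < prob).astype(float)

sub_items = responses[:, :6]
raw_sub = sub_items.sum(axis=1)
raw_total = responses.sum(axis=1)

raw_pct = percent_raw(sub_items)
hb_pct = 100.0 * haberman_subscore(raw_sub, raw_total, cronbach_alpha(sub_items)) / 6

print("means:", round(raw_pct.mean(), 2), round(hb_pct.mean(), 2))
print("stdevs:", round(raw_pct.std(ddof=1), 2), round(hb_pct.std(ddof=1), 2))
```

By construction the regression estimate keeps the mean of the observed subscore and usually has a smaller spread. The _HB rows in the score distribution tables show the same pattern relative to the raw rows.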

For the year-to-year comparison, student_key was used to match student-level information across the two years. These are 2-year longitudinal math data. Two core test forms were developed in spring 2018: Form C and Form D.
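
The matching step could look like the following sketch, assuming the two years are held in pandas data frames. Only the student_key column comes from this study. The other column names and all values are hypothetical.

```python
import pandas as pd

# Hypothetical student-level records; only student_key is a real field name.
g3_2018 = pd.DataFrame({
    "student_key": [101, 102, 103, 104],
    "fractions_pct": [50.0, 83.3, 16.7, 66.7],
})
g4_2019 = pd.DataFrame({
    "student_key": [102, 103, 104, 105],
    "fractions_pct": [70.6, 35.3, 58.8, 88.2],
})

# Keep only students tested in both years. validate raises an error if
# student_key is duplicated in either file.
matched = g3_2018.merge(
    g4_2019,
    on="student_key",
    how="inner",
    suffixes=("_2018", "_2019"),
    validate="one_to_one",
)
print(matched)
```

An inner join drops students who appear in only one year, which is consistent with every row of the score distribution tables reporting the same n.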

General recommendations from research papers
1) Subscore information should only be used for low-stakes purposes because the subscores may not be stable for any domain with a small number of items.
2) The resulting subscores will likely be affected by differences in item difficulty as well as differences in student ability.
3) Subscores across years and grades should not be seen as reliable indicators of differences in student ability.
4) Comparisons of individual student subscores or of group means within one administration can provide useful information about the relative strengths and weaknesses on the measured domains.


Table 1. Subscore estimation methods

Method | Type | Advantage | Disadvantage
Raw subscores (percentage correct) | CTT | Very easy computation | Least accurate and reliable for a short subtest
Haberman’s (2008) methods | CTT, regression approach | 1) Provides a quick tool (PRMSE) to judge whether subscores should be reported in addition to total scores. 2) Relatively easy computation. | Hard to explain to test users why a subscore estimate depends not only on the observed subscore but also on other observed subscore(s)
Objective Performance Index (OPI; Yen, 1987; Yen et al., 1997) | Bayesian and IRT | The OPIs may provide more reliable estimates of student achievement on each domain or strand than simple raw scores or percent correct scores. | 1) Does not perform well when the correlations between subscores are low or intermediate (unidimensionality assumption). 2) Estimation errors will occur if a test is composed of a large number of polytomous items (items should be scored dichotomously).

Score distributions for each subscore estimation method

Table 2. Number of items per subscore category in 2018 (grade 3) and 2019 (grade 4)

Reporting category | Grade 3 in 2018 | Grade 4 in 2019
Computation with Whole Numbers | 14 | 12
Fractions | 6 | 17
Number Relationships and Patterns | 10 | 12
Geometric and Measurement Concepts | 19 | 11
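
The item counts in Table 2 determine how coarse a percent raw subscore can be. The short sketch below computes the size of one score step per category. It assumes every item is worth one point, which is not stated in this report, and the dictionary keys are labels chosen for illustration.

```python
# Item counts copied from Table 2.
item_counts = {
    "Computation with Whole Numbers": {"g3_2018": 14, "g4_2019": 12},
    "Fractions": {"g3_2018": 6, "g4_2019": 17},
    "Number Relationships and Patterns": {"g3_2018": 10, "g4_2019": 12},
    "Geometric and Measurement Concepts": {"g3_2018": 19, "g4_2019": 11},
}


def percent_step(n_items):
    """Change in percent raw subscore from one more correct item (1 point per item assumed)."""
    return 100.0 / n_items


for category, counts in item_counts.items():
    steps = {year: round(percent_step(n), 2) for year, n in counts.items()}
    print(category, steps)
```

Under that assumption, one Fractions item moves the 2018 percent score by about 16.7 points but the 2019 score by about 5.9 points, so the same category is measured with very different precision in the two years.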


Computation

computation n mean stdev min max
computation2018C 34799 68.39 25.63 0 100
computation2018C_HB 34799 68.21 22.25 8 100
computation2018C_NCE 34799 50.39 20.79 0 76
computation2019 34799 61.69 26.17 0 100
computation2019_HB 34799 61.69 22.11 10 100
computation2019_NCE 34799 42.67 18.05 0 69
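
Summary tables of this form (n, mean, stdev, min, max) can be produced as in the sketch below, assuming the scores sit in a pandas data frame with one column per variable. The score values are made up. pandas reports the sample standard deviation (ddof=1) by default, and it is not known whether the tables in this report use the same convention.

```python
import pandas as pd

# Made-up scores for four students; column names follow the table's row labels.
scores = pd.DataFrame({
    "computation2018C": [50.0, 75.0, 100.0, 25.0],
    "computation2019": [41.7, 66.7, 91.7, 33.3],
})

# One row per variable, one column per statistic.
summary = scores.agg(["count", "mean", "std", "min", "max"]).T
summary.columns = ["n", "mean", "stdev", "min", "max"]
print(summary.round(2))
```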



Fraction
fractions n mean stdev min max
fractions2018C 34799 49.07 28.57 0 100
fractions2018C_HB 34799 49.06 21.60 0 91
fractions2018C_NCE 34799 50.14 20.99 14 88
fractions2019 34799 60.41 25.07 0 100
fractions2019_HB 34799 60.38 22.24 8 100
fractions2019_NCE 34799 53.84 22.66 0 90



Number

number n mean stdev min max
number2018C 34799 55.51 27.78 0 100
number2018C_HB 34799 55.55 23.31 0 99
number2018C_NCE 34799 50.28 21.01 8 84
number2019 34799 58.90 24.58 0 100
number2019_HB 34799 58.81 20.75 10 96
number2019_NCE 34799 48.37 20.35 0 82



Geometric
geometrics n mean stdev min max
geometrics2018C 34799 55.01 20.91 0 100
geometrics_2018C_HB 34799 54.98 17.88 7 90
geometrics2018C_NCE 34799 50.01 20.90 0 95
geometrics2019 34799 49.84 27.30 0 100
geometrics_2019_HB 34799 49.88 22.55 0 91
geometrics2019_NCE 34799 45.74 18.79 11 80
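
To compare the methods side by side, the reported means can be turned into 2018-to-2019 changes. The sketch below does only that arithmetic. The means are copied from the four tables above, and the dictionary layout is an illustrative choice.

```python
# (2018 mean, 2019 mean) per category and method, copied from the tables above.
means = {
    "computation": {"raw": (68.39, 61.69), "HB": (68.21, 61.69), "NCE": (50.39, 42.67)},
    "fractions": {"raw": (49.07, 60.41), "HB": (49.06, 60.38), "NCE": (50.14, 53.84)},
    "number": {"raw": (55.51, 58.90), "HB": (55.55, 58.81), "NCE": (50.28, 48.37)},
    "geometrics": {"raw": (55.01, 49.84), "HB": (54.98, 49.88), "NCE": (50.01, 45.74)},
}

# Change in mean from 2018 to 2019, rounded to two decimals.
changes = {
    category: {method: round(y2019 - y2018, 2) for method, (y2018, y2019) in by_method.items()}
    for category, by_method in means.items()
}

for category, by_method in changes.items():
    print(category, by_method)
```

The raw and HB changes track each other closely in every category, while the NCE change can differ in size (fractions: +3.70 versus about +11.3) and even in sign (number: -1.91 versus about +3.3). Because these differences reflect item difficulty as well as student ability, they should be read with the caution noted in the general recommendations.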



Relationship among total SS and subscores