The purpose of this study was to determine the minimum percentage of items from an 80-item test that Angoff raters need to evaluate to produce a reliable cut score. The goal was to balance accuracy with rater effort, potentially reducing the workload without compromising the integrity of the standard-setting process.
Research question: “What is the minimum percentage of items from an 80-item test that Angoff raters need to evaluate in order to produce a cut score that is within 5% of the full-set cut score, with 95% confidence, while also maintaining a standard error no greater than 1.5 times that of the full-set standard error?”
| Sampling Rate (%) | Items Rated | Estimated Cut Score | Standard Error | Lower 95% CI | Upper 95% CI | Meets Criteria |
|---|---|---|---|---|---|---|
| 10 | 8 | 77.52 | 1.05 | 75.57 | 79.71 | FALSE |
| 20 | 16 | 76.75 | 0.76 | 75.24 | 78.18 | FALSE |
| 30 | 24 | 76.06 | 0.60 | 74.89 | 77.34 | FALSE |
| 40 | 32 | 76.55 | 0.51 | 75.48 | 77.49 | FALSE |
| 50 | 40 | 76.19 | 0.47 | 75.35 | 77.21 | TRUE |
| 60 | 48 | 76.56 | 0.44 | 75.66 | 77.39 | TRUE |
| 70 | 56 | 76.54 | 0.39 | 75.75 | 77.28 | TRUE |
| 80 | 64 | 76.69 | 0.38 | 75.96 | 77.46 | TRUE |
| 90 | 72 | 76.60 | 0.35 | 75.93 | 77.29 | TRUE |
| 100 | 80 | 76.53 | 0.33 | 75.88 | 77.14 | TRUE |
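Minimum Sampling Rate: Our analysis indicates that a sampling rate of 50% is the minimum that meets all specified criteria. This corresponds to rating 40 items out of the full 80-item test. Every confidence interval in the table lies well inside the ±5% band around the full-set cut score of 76.53 (72.70 to 80.36), so the binding constraint is the standard error: the threshold is 1.5 × 0.33 = 0.495, which the 50% condition meets (0.47) and the 40% condition does not (0.51).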
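The decision rule can be checked directly against the reported results. The sketch below is a minimal illustration under one assumed reading of the criteria: the entire 95% CI must fall within ±5% (relative) of the full-set cut score, and the standard error must not exceed 1.5 times the full-set SE. The names `ROWS`, `FULL_CUT`, `FULL_SE`, and `meets_criteria` are hypothetical helpers, not part of the study.

```python
# Results transcribed from the table above:
# (sampling rate %, estimated cut score, standard error, lower 95% CI, upper 95% CI)
ROWS = [
    (10, 77.52, 1.05, 75.57, 79.71),
    (20, 76.75, 0.76, 75.24, 78.18),
    (30, 76.06, 0.60, 74.89, 77.34),
    (40, 76.55, 0.51, 75.48, 77.49),
    (50, 76.19, 0.47, 75.35, 77.21),
    (60, 76.56, 0.44, 75.66, 77.39),
    (70, 76.54, 0.39, 75.75, 77.28),
    (80, 76.69, 0.38, 75.96, 77.46),
    (90, 76.60, 0.35, 75.93, 77.29),
    (100, 76.53, 0.33, 75.88, 77.14),
]
FULL_CUT, FULL_SE = 76.53, 0.33  # full 80-item condition


def meets_criteria(lower, upper, se, tol=0.05, se_ratio=1.5):
    """Assumed reading: the whole 95% CI lies within +/- tol (relative) of the
    full-set cut score, and the SE is at most se_ratio times the full-set SE."""
    band_lo, band_hi = FULL_CUT * (1 - tol), FULL_CUT * (1 + tol)
    within_band = band_lo <= lower and upper <= band_hi
    precise_enough = se <= se_ratio * FULL_SE
    return within_band and precise_enough


passing = [rate for rate, _cut, se, lo, hi in ROWS if meets_criteria(lo, hi, se)]
print("Rates meeting both criteria:", passing)
print("Minimum sampling rate:", min(passing), "%")
```

With these values only the 50% to 100% rows pass, which matches the Meets Criteria column; the 40% row fails on the SE condition alone.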
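Stability of Estimates: As the sampling rate increases, the cut score estimates stabilize and the confidence intervals narrow. Below 50% the estimates range from 76.06 to 77.52; from 50% upward they stay between 76.19 and 76.69, and the width of the 95% CI shrinks from 4.14 points at 10% to 1.26 points at 100%.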
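The study does not describe its simulation procedure in detail, so the following is only a hypothetical sketch of how estimates like those in the table could be produced. It assumes a panel of 10 raters, Angoff ratings expressed as probabilities, a cut score reported on a 0 to 100 scale (the mean rating times 100), and a bootstrap that draws items with replacement; the bootstrap assumption is what keeps the SE above zero at 100%, as in the table. All parameter values (rater leniency spread, rating noise, item value range) are invented for illustration.

```python
import random
import statistics


def simulate_item_means(n_raters=10, n_items=80, seed=1):
    """Hypothetical panel: returns each item's mean Angoff rating across raters
    (probability that a minimally competent candidate answers correctly)."""
    rng = random.Random(seed)
    leniency = [rng.gauss(0, 0.03) for _ in range(n_raters)]  # assumed rater effects
    item_means = []
    for _ in range(n_items):
        p = rng.uniform(0.55, 0.95)  # assumed "true" item value
        ratings = [min(1.0, max(0.0, p + b + rng.gauss(0, 0.05))) for b in leniency]
        item_means.append(sum(ratings) / n_raters)
    return item_means


def bootstrap_cut_score(item_means, rate, n_reps=2000, seed=2):
    """Bootstrap a cut score (0-100 scale) from k = rate * n items,
    drawn with replacement; returns (estimate, SE, lower, upper)."""
    rng = random.Random(seed)
    k = round(rate * len(item_means))
    scores = []
    for _ in range(n_reps):
        drawn = rng.choices(item_means, k=k)  # items drawn with replacement
        scores.append(100 * sum(drawn) / k)
    cuts = statistics.quantiles(scores, n=40)  # 39 cut points: 2.5%, ..., 97.5%
    return statistics.mean(scores), statistics.stdev(scores), cuts[0], cuts[-1]


item_means = simulate_item_means()
for rate in (0.1, 0.5, 1.0):
    est, se, lo, hi = bootstrap_cut_score(item_means, rate)
    print(f"{rate:.0%}: cut={est:.2f} SE={se:.2f} 95% CI=({lo:.2f}, {hi:.2f})")
```

The percentile interval is used here because the table's intervals are not exactly symmetric around the estimates, which is consistent with (though not proof of) a resampling-based CI.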
Precision vs. Effort Trade-off: Higher sampling rates do give more precise estimates, but the gains beyond 50% are marginal: the standard error falls from 1.05 to 0.47 between 10% and 50%, yet only from 0.47 to 0.33 when the number of rated items doubles from 40 to 80. This pattern suggests a point of diminishing returns.
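A rough scaling argument shows why the gains flatten. If item ratings behave approximately like independent draws, the SE of a cut score based on k items scales as 1/sqrt(k), so SE(k) ≈ SE_80 × sqrt(80 / k). This model is our assumption, not one stated in the study, but it tracks the reported SEs to within about 0.02. Under it, the criterion SE(k) ≤ 1.5 × SE_80 requires k ≥ 80 / 1.5² ≈ 35.6, i.e. 36 items, which falls between the 32- and 40-item conditions; on a 10-point grid of sampling rates that threshold surfaces as 40 items. The names `approx_se` and `min_items_for_se` are hypothetical.

```python
import math

FULL_ITEMS, FULL_SE = 80, 0.33  # full-set values from the results table


def approx_se(k):
    """Assumed model: SE shrinks like 1/sqrt(k) when k items are rated."""
    return FULL_SE * math.sqrt(FULL_ITEMS / k)


def min_items_for_se(ratio=1.5):
    """Smallest k with approx_se(k) <= ratio * FULL_SE, i.e. k >= FULL_ITEMS / ratio**2."""
    return math.ceil(FULL_ITEMS / ratio ** 2)


# Reported SEs by number of items rated (taken from the table).
table_se = {8: 1.05, 16: 0.76, 32: 0.51, 40: 0.47, 48: 0.44, 56: 0.39, 64: 0.38, 72: 0.35}
for k, observed in table_se.items():
    print(f"{k:2d} items: model {approx_se(k):.2f} vs table {observed:.2f}")
print("Items needed under the model:", min_items_for_se())
```

The same model makes the trade-off explicit: doubling the items from 40 to 80 reduces the SE only by a factor of sqrt(2), and halving it further would require four times as many items.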
Practical Implications: These results suggest that Angoff raters could evaluate as few as 40 items and still obtain a cut score that meets both the accuracy and the precision criteria relative to rating all 80 items.
Feasibility of Reduced Rating Burden: This study demonstrates that, at least in simulation, the number of items rated in the Angoff method can be substantially reduced (here, by half) while maintaining a high level of accuracy and confidence in the resulting cut score.
Optimal Sampling Strategy: Based on our criteria, a sampling rate of 50% (corresponding to 40 items) appears to be the optimal balance between minimizing rater effort and maintaining statistical rigor.
Limitations and Future Directions:
Simulation-Based Study: These results are based on simulated data. Validation with real-world data is necessary to confirm these findings.
Generalizability: The optimal sampling rate may vary based on factors such as test content, item difficulty distribution, and rater characteristics. Further research could explore how these factors influence the required sample size.
Alternative Approaches: Future studies could compare this method with other efficiency-improving approaches, such as iterative rating processes or stratified item sampling.
Rater Dynamics: Investigation into how reduced item sets affect rater behavior and decision-making processes could provide additional insights.
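Stratified item sampling, mentioned above as an alternative approach, can be sketched as follows. This is a hypothetical illustration, not a procedure from the study: items are sorted by an assumed difficulty measure (for example, pilot p-values), split into four roughly equal strata, and the same sampling rate is applied within each stratum so the reduced set mirrors the full test's difficulty distribution. The function name, the number of strata, and the difficulty values are all assumptions.

```python
import random


def stratified_item_sample(difficulties, rate, n_strata=4, seed=0):
    """Hypothetical sketch: sort items by difficulty, split them into n_strata
    contiguous strata, and draw round(rate * stratum size) items from each.
    Per-stratum rounding means the total can differ slightly from rate * n."""
    rng = random.Random(seed)
    n = len(difficulties)
    order = sorted(range(n), key=lambda i: difficulties[i])
    sample = []
    for s in range(n_strata):
        # Integer boundaries cover every item even when n is not divisible.
        stratum = order[s * n // n_strata:(s + 1) * n // n_strata]
        sample.extend(rng.sample(stratum, round(rate * len(stratum))))
    return sorted(sample)


rng = random.Random(42)
difficulties = [rng.uniform(0.2, 0.9) for _ in range(80)]  # hypothetical p-values
items = stratified_item_sample(difficulties, rate=0.5)
print(len(items))  # 40
```

Compared with simple random sampling of 40 items, this design guarantees that easy and hard items are both represented, which may matter for the factors listed under Generalizability.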
In conclusion, this study provides evidence-based guidance for optimizing the Angoff standard-setting process, potentially leading to more efficient and sustainable practices in educational and professional certification contexts.