Angoff Method Sampling Rate Study: Results Report

Study Overview

Purpose

The purpose of this study was to determine the minimum percentage of items from an 80-item test that Angoff raters need to evaluate to produce a reliable cut score. The goal was to balance accuracy with rater effort, potentially reducing the workload without compromising the integrity of the standard-setting process.

Research Question

“What is the minimum percentage of items from an 80-item test that Angoff raters need to evaluate in order to produce a cut score that is within 5% of the full-set cut score, with 95% confidence, while also maintaining a standard error no greater than 1.5 times that of the full-set standard error?”

Methodology

Simulation Parameters

Test length: 80 items
Number of raters: 5
Rater means: Randomly generated between 60 and 85
Rater bias: Normally distributed with mean 0 and SD 0.10
Item-level variability: SD 0.10
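The parameters above leave several details unspecified. The sketch below shows one way to generate such data. It rests on assumptions that the report does not state: ratings are on a 0-100 percent scale, the two SDs of 0.10 are on the 0-1 proportion scale and are therefore multiplied by 100, item-level noise is drawn independently for every rater-item pair, ratings are clipped to 0-100, and the cut score is the mean rating over all raters and items. The names `simulate_ratings` and `cut_score` are hypothetical.

```python
import random

N_ITEMS = 80
N_RATERS = 5

def simulate_ratings(seed=0):
    """Return N_RATERS lists of N_ITEMS Angoff ratings on a 0-100 scale.

    Assumptions (not stated in the report): the SDs of 0.10 are on the
    0-1 proportion scale, so they are multiplied by 100 here; item-level
    noise is drawn independently for every rater-item pair.
    """
    rng = random.Random(seed)
    ratings = []
    for _ in range(N_RATERS):
        rater_mean = rng.uniform(60, 85)        # rater mean, between 60 and 85
        bias = 100 * rng.gauss(0, 0.10)         # rater bias, N(0, 0.10) on the proportion scale
        row = []
        for _ in range(N_ITEMS):
            noise = 100 * rng.gauss(0, 0.10)    # item-level variability, SD 0.10 (assumed proportion scale)
            row.append(min(100.0, max(0.0, rater_mean + bias + noise)))  # clip to 0-100
        ratings.append(row)
    return ratings

def cut_score(ratings, items=None):
    """Cut score as the mean rating over all raters and the selected item indices."""
    items = list(range(len(ratings[0]))) if items is None else list(items)
    total = sum(row[i] for row in ratings for i in items)
    return total / (len(ratings) * len(items))

ratings = simulate_ratings(seed=42)
print(f"full-set cut score: {cut_score(ratings):.2f}")
```

A fixed seed makes each simulated data set reproducible, so the same ratings can be reused across every sampling rate.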

Analysis Procedure

Generated a full set of simulated Angoff ratings
Calculated the full-set cut score and standard error as benchmarks
Conducted bootstrap analyses for sampling rates from 10% to 100% in 10% increments
For each sampling rate:
- Estimated the cut score
- Calculated the standard error and 95% confidence interval
- Determined if the results met the specified criteria
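The bootstrap step for a single sampling rate can be sketched as follows. This is a minimal sketch under assumptions that are not in the report. The function works on per-item ratings already averaged across raters. The number of items k is the sampling rate times 80 (integer arithmetic). Each replicate draws k items with replacement, which is one design consistent with the nonzero standard error reported in the 100% row. The 95% CI uses simple percentiles. `bootstrap_cut_score` is a hypothetical name, and the demo data are invented, not the study's.

```python
import random
import statistics

def bootstrap_cut_score(item_means, rate_pct, n_boot=2000, seed=0):
    """Bootstrap the cut score when rate_pct% of the items are rated.

    Assumptions (not from the report): item_means holds per-item ratings
    already averaged over raters; each replicate draws k items WITH
    replacement; the 95% CI uses the 2.5th/97.5th percentiles.
    Returns (k, estimate, standard_error, lower_95, upper_95).
    """
    rng = random.Random(seed)
    k = rate_pct * len(item_means) // 100          # e.g. 30% of 80 items -> 24
    if k < 1:
        raise ValueError("sampling rate too low to select any items")
    replicates = sorted(
        statistics.fmean(rng.choices(item_means, k=k))  # mean of k resampled items
        for _ in range(n_boot)
    )
    estimate = statistics.fmean(replicates)
    se = statistics.stdev(replicates)              # bootstrap standard error
    lower = replicates[int(0.025 * n_boot)]        # percentile-method CI bounds
    upper = replicates[int(0.975 * n_boot) - 1]
    return k, estimate, se, lower, upper

# Example with hypothetical item ratings (not the study's data).
demo_rng = random.Random(1)
items = [demo_rng.gauss(76.5, 3.0) for _ in range(80)]
for rate in range(10, 101, 10):
    k, est, se, lo, hi = bootstrap_cut_score(items, rate)
    print(f"{rate:>3}% k={k:>2} cut={est:.2f} SE={se:.2f} CI=({lo:.2f}, {hi:.2f})")
```

Resampling with replacement means the 100% row still carries sampling uncertainty, which matches how the full-set standard error serves as the benchmark.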

Criteria for Acceptability

Estimated cut score within 5% of the full-set cut score with 95% confidence
Standard error no greater than 1.5 times the full-set standard error
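Stated formally, under one reading of the first criterion (an assumption, since the report does not say how "within 5% with 95% confidence" was operationalized): the whole 95% confidence interval [L_p, U_p] at sampling rate p must lie inside a ±5% band around the full-set cut score C_full, and the standard error SE_p must not exceed 1.5 times the full-set standard error SE_full.

```latex
\begin{aligned}
&\text{(1) Accuracy:}  && 0.95\,C_{\mathrm{full}} \le L_p \quad\text{and}\quad U_p \le 1.05\,C_{\mathrm{full}} \\
&\text{(2) Precision:} && SE_p \le 1.5\,SE_{\mathrm{full}}
\end{aligned}
```

A rate is acceptable only when both conditions hold. With the benchmarks reported in the results (C_full = 76.53, SE_full = 0.33), the band in (1) runs from 72.70 to 80.36 and the ceiling in (2) is 0.495. Every tested rate's interval falls inside that band, so under this reading the standard-error condition is the one that decides acceptability.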

Results

Full Dataset Benchmark

Full-set cut score: 76.53
Full-set standard error: 0.33

Results by Sampling Rate

Sampling Rate (%)	Items Rated	Estimated Cut Score	Standard Error	Lower 95% CI	Upper 95% CI	Meets Criteria
10	8	77.52	1.05	75.57	79.71	FALSE
20	16	76.75	0.76	75.24	78.18	FALSE
30	24	76.06	0.60	74.89	77.34	FALSE
40	32	76.55	0.51	75.48	77.49	FALSE
50	40	76.19	0.47	75.35	77.21	TRUE
60	48	76.56	0.44	75.66	77.39	TRUE
70	56	76.54	0.39	75.75	77.28	TRUE
80	64	76.69	0.38	75.96	77.46	TRUE
90	72	76.60	0.35	75.93	77.29	TRUE
100	80	76.53	0.33	75.88	77.14	TRUE

Minimum Acceptable Sampling Rate

Minimum sampling rate meeting all criteria: 50%
Corresponding number of items: 40
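The Meets Criteria column and the minimum rate can be re-derived directly from the results table. The sketch below transcribes the table values (using the 100% row whose upper CI bound is 77.14) and applies both criteria. It assumes the accuracy criterion means the whole 95% CI lies within ±5% of the full-set cut score; `meets_criteria` and `minimum_rate` are hypothetical names.

```python
FULL_CUT = 76.53
FULL_SE = 0.33
N_ITEMS = 80

# (sampling rate %, estimated cut score, SE, lower 95% CI, upper 95% CI) from the table
ROWS = [
    (10, 77.52, 1.05, 75.57, 79.71),
    (20, 76.75, 0.76, 75.24, 78.18),
    (30, 76.06, 0.60, 74.89, 77.34),
    (40, 76.55, 0.51, 75.48, 77.49),
    (50, 76.19, 0.47, 75.35, 77.21),
    (60, 76.56, 0.44, 75.66, 77.39),
    (70, 76.54, 0.39, 75.75, 77.28),
    (80, 76.69, 0.38, 75.96, 77.46),
    (90, 76.60, 0.35, 75.93, 77.29),
    (100, 76.53, 0.33, 75.88, 77.14),
]

def meets_criteria(se, lower, upper, tolerance=0.05, se_ratio=1.5):
    """Assumed reading: CI inside +/-5% of the full-set cut and SE <= 1.5 x full-set SE."""
    within_band = (1 - tolerance) * FULL_CUT <= lower and upper <= (1 + tolerance) * FULL_CUT
    precise_enough = se <= se_ratio * FULL_SE
    return within_band and precise_enough

def minimum_rate(rows):
    """Lowest tested rate meeting both criteria, with the matching item count."""
    for rate, _cut, se, lower, upper in sorted(rows):
        if meets_criteria(se, lower, upper):
            return rate, rate * N_ITEMS // 100
    return None

print(minimum_rate(ROWS))  # (50, 40)
```

The check reproduces the table's FALSE/TRUE pattern: the 40% row fails only because its SE of 0.51 exceeds the 0.495 ceiling.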

Interpretation of Results

Minimum Sampling Rate: Our analysis indicates that 50% is the lowest tested sampling rate that meets both criteria. This corresponds to rating 40 of the 80 items.
Stability of Estimates: As the sampling rate increases, the cut score estimates settle near the full-set value and the 95% confidence intervals narrow steadily, from a width of 4.14 points at 10% to 1.26 points at 100%.
Precision vs. Effort Trade-off: Higher sampling rates give more precise estimates, but the gains beyond 50% are modest: the standard error falls only from 0.47 to 0.33 between 50% and 100%, compared with a drop from 1.05 to 0.47 between 10% and 50%. This pattern suggests a point of diminishing returns.
Practical Implications: These results suggest that Angoff raters could evaluate as few as 40 items and still produce a cut score that meets both the accuracy and the precision criteria relative to the cut score based on all 80 items.
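The diminishing returns in the trade-off noted above follow a familiar pattern. If the standard error of a mean of k resampled items scales as 1/sqrt(k), then SE_k ≈ SE_full × sqrt(80 / k), and the precision criterion SE_k ≤ 1.5 × SE_full reduces to k ≥ 80 / 1.5² ≈ 35.6. The sketch below compares this approximation with the observed standard errors. The 1/sqrt(k) scaling is an assumption, not something the report states, and the helper names are hypothetical.

```python
import math

N_ITEMS = 80
FULL_SE = 0.33

# Observed standard errors from the results table, keyed by sampling rate (%).
OBSERVED_SE = {10: 1.05, 20: 0.76, 30: 0.60, 40: 0.51, 50: 0.47,
               60: 0.44, 70: 0.39, 80: 0.38, 90: 0.35, 100: 0.33}

def predicted_se(rate_pct, n=N_ITEMS, full_se=FULL_SE):
    """Approximate SE when k = rate_pct% of n items are rated: SE_full * sqrt(n / k)."""
    k = rate_pct * n // 100
    return full_se * math.sqrt(n / k)

def min_items_for_se_ratio(ratio=1.5, n=N_ITEMS):
    """Smallest k with sqrt(n / k) <= ratio, i.e. k >= n / ratio**2."""
    return math.ceil(n / ratio ** 2)

for rate, observed in OBSERVED_SE.items():
    print(f"{rate:>3}%  observed {observed:.2f}  predicted {predicted_se(rate):.2f}")

print("items needed for SE <= 1.5 x full-set SE:", min_items_for_se_ratio())
```

The predictions land within about 0.03 of every observed value. Under this approximation the precision criterion is first met at 36 items (45%), between the tested 40% and 50% rates; because rates were tested only in 10% steps, 50% is the first tested rate to clear it, and a finer grid would be needed to check rates in between.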

Conclusions

Feasibility of Reduced Rating Burden: These simulation results suggest that the number of items rated in the Angoff method can be substantially reduced while the resulting cut score still meets the specified accuracy and precision criteria.
Optimal Sampling Strategy: Under these criteria, a sampling rate of 50% (40 items) is the smallest tested rate that meets both, making it the best balance between minimizing rater effort and maintaining statistical rigor among the rates examined.
Potential Benefits:
- Reduced time and cognitive load for raters
- Potential for involving more raters or conducting more frequent standard setting
- Maintained integrity of the cut score determination process

Limitations and Future Directions

Simulation-Based Study: These results are based on simulated data. Validation with real-world data is necessary to confirm these findings.
Generalizability: The optimal sampling rate may vary based on factors such as test content, item difficulty distribution, and rater characteristics. Further research could explore how these factors influence the required sample size.
Alternative Approaches: Future studies could compare this method with other efficiency-improving approaches, such as iterative rating processes or stratified item sampling.
Rater Dynamics: Investigation into how reduced item sets affect rater behavior and decision-making processes could provide additional insights.

In conclusion, this study provides evidence-based guidance for optimizing the Angoff standard setting process, potentially leading to more efficient and sustainable practices in educational and professional certification contexts.