Background: Psychometric equivalence is essential in test construction to ensure that sampled items perform consistently across different test versions. This analysis evaluates the psychometric properties of 30 dichotomous (0/1) items from a sample of 1,000 participants by clustering items based on their difficulty and discrimination using Item Response Theory (IRT) models.
Methods: The analysis applied Categorical Principal Component Analysis (Princals) to assess unidimensionality, Exploratory Factor Analysis (EFA) to investigate factor structures, and Item Factor Analysis (IFA) to confirm model simplicity. IRT models, including the Rasch model, Two-Parameter Logistic (2PL), and Three-Parameter Logistic (3PL) models, were used to estimate item parameters and guide the clustering process. The 2PL model was selected for its ability to accommodate varying item discrimination, unlike the Rasch model, which assumes equal discrimination, and the 3PL model, which introduces a guessing parameter. Using K-Means clustering, items were categorized into three groups based on their discrimination and difficulty levels.
Results: The Princals analysis indicated that the items aligned along a single dimension, supporting unidimensionality. The EFA, together with a scree plot, also indicated that a one-factor solution fits the data best. The IFA suggested that a simpler factor structure provided a better fit. The Rasch model captured a wide range of item difficulties but was limited by its assumption of equal discrimination across items. The 2PL model provided the best fit, with discrimination parameters ranging from 0.778 to 1.862 and difficulty parameters covering a wide range. The guessing parameter in the 3PL model did not significantly improve the model fit. The Elbow and Silhouette methods showed that three clusters were optimal for K-Means clustering, facilitating balanced item sampling.
Conclusions: The 2PL model was the optimal framework for clustering the 30 items based on difficulty and discrimination. This approach enabled accurate grouping into three clusters, allowing future test forms to draw items from each level to support psychometric equivalence and improve the reliability and fairness of assessments while accurately measuring diverse participant abilities.
The dataset includes responses to 30 dichotomous (0/1) items (U1–U30) from 1,000 participants. Each row represents a participant’s response pattern, with “1” indicating a correct response and “0” indicating an incorrect response. Demographic data such as group and age are also given. The data can be used for psychometric analysis, particularly with Item Response Theory (IRT) models, to assess item difficulty and discrimination.
| U1 | U2 | U3 | U4 | U5 | U6 | U7 | U8 | U9 | U10 | U11 | U12 | U13 | U14 | U15 | U16 | U17 | U18 | U19 | U20 | U21 | U22 | U23 | U24 | U25 | U26 | U27 | U28 | U29 | U30 | group | Age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | men | 28 |
| 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | women | 26 |
| 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | women | 28 |
| 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | women | 30 |
| 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | men | 26 |
| 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | men | 26 |
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 20 | 26 | 28 | 27.727 | 30 | 32 |
| Group | Count |
|---|---|
| men | 500 |
| women | 500 |
The dataset consists of 1,000 participants aged 20 to 32 years, with a mean age of 27.73 years and a median of 28 years. The distribution is narrow: most participants (63.3%) fall between 26 and 30 years, so the sample is concentrated in the mid-to-late twenties. The group distribution is balanced, with exactly 500 women and 500 men.
| Item | Count_1s | Count_0s |
|---|---|---|
| U1 | 501 | 499 |
| U2 | 193 | 807 |
| U3 | 470 | 530 |
| U4 | 642 | 358 |
| U5 | 293 | 707 |
| U6 | 844 | 156 |
| U7 | 315 | 685 |
| U8 | 199 | 801 |
| U9 | 635 | 365 |
| U10 | 802 | 198 |
| U11 | 782 | 218 |
| U12 | 173 | 827 |
| U13 | 757 | 243 |
| U14 | 374 | 626 |
| U15 | 654 | 346 |
| U16 | 473 | 527 |
| U17 | 674 | 326 |
| U18 | 818 | 182 |
| U19 | 520 | 480 |
| U20 | 681 | 319 |
| U21 | 833 | 167 |
| U22 | 472 | 528 |
| U23 | 298 | 702 |
| U24 | 322 | 678 |
| U25 | 181 | 819 |
| U26 | 328 | 672 |
| U27 | 535 | 465 |
| U28 | 226 | 774 |
| U29 | 698 | 302 |
| U30 | 206 | 794 |
The distribution of responses across items U1 to U30 shows variability in the number of 0’s and 1’s. Items like U6 and U21 had a high proportion of 1’s (844 and 833), indicating they were easier for participants. In contrast, U12 and U25 had significantly fewer 1’s (173 and 181), suggesting higher difficulty.
This pattern shows that certain items were easier for participants, leading to higher correct response rates, while others created more challenges, resulting in a greater number of incorrect answers. Overall, the differences in response distribution indicate varying levels of item difficulty within the dataset.
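The link between response counts and difficulty can be made concrete with the classical difficulty index, the proportion of participants who answered an item correctly. The following is a minimal sketch using a handful of counts copied from the table above; the variable names are illustrative and not part of the original analysis.

```python
# Correct-response counts for a few items, copied from the table above.
counts_correct = {"U6": 844, "U21": 833, "U1": 501, "U25": 181, "U12": 173}
n_participants = 1000  # every item was answered by all 1,000 participants

# Classical difficulty index: proportion of correct responses per item.
# Higher values mean an easier item.
p_values = {item: c / n_participants for item, c in counts_correct.items()}

# List the items from easiest to hardest.
for item, p in sorted(p_values.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item}: proportion correct = {p:.3f}")
```

Under this index, U6 and U21 sit at the easy end and U12 and U25 at the hard end, which matches the ordering described in the text.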
Before fitting an Item Response Theory (IRT) model, it is crucial to assess the scale’s dimensionality. The following methods were used: Categorical Principal Component Analysis (Princals), Exploratory Factor Analysis (EFA), and Item Factor Analysis (IFA) for analyzing latent item structures. These techniques check that the scale is appropriate for IRT modeling.
The Princals analysis shows that all 30 items point in the same direction on the principal component plot, suggesting they measure a similar concept. This supports grouping the items into balanced clusters based on difficulty and discrimination, helping to create clusters with similar characteristics.
| Item | Factor 1 | Factor 2 | Factor 3 |
|---|---|---|---|
| U1 | 0.4583751 | 0.4146866 | 0.2245282 |
| U2 | 0.4749530 | 0.5012407 | 0.0883197 |
| U3 | 0.4540790 | 0.3512264 | 0.0874065 |
| U4 | 0.5368042 | 0.2194293 | 0.1355101 |
| U5 | 0.4991854 | 0.3067454 | 0.1303310 |
| U6 | 0.1765252 | 0.1424032 | 0.9713710 |
| U7 | 0.3560481 | 0.3641058 | 0.0744484 |
| U8 | 0.2052611 | 0.4936294 | 0.2133258 |
| U9 | 0.4427127 | 0.2513244 | 0.2079537 |
| U10 | 0.4360856 | 0.3907758 | 0.1537579 |
| U11 | 0.5655336 | 0.1685652 | 0.1369274 |
| U12 | 0.1731500 | 0.6412458 | 0.2160460 |
| U13 | 0.4997258 | 0.3650578 | 0.0974844 |
| U14 | 0.4238033 | 0.4193519 | 0.1390296 |
| U15 | 0.6852040 | 0.2895528 | 0.1409246 |
| U16 | 0.5054943 | 0.4780923 | 0.1753344 |
| U17 | 0.4192744 | 0.4352794 | 0.0120122 |
| U18 | 0.2765364 | 0.3262428 | 0.1715914 |
| U19 | 0.4144009 | 0.2099004 | -0.0399616 |
| U20 | 0.6080735 | 0.2784103 | 0.0691154 |
| U21 | 0.4073212 | 0.2540542 | 0.1202816 |
| U22 | 0.5184947 | 0.3358727 | 0.2031776 |
| U23 | 0.4023313 | 0.5120767 | 0.0276091 |
| U24 | 0.4453539 | 0.3604365 | 0.2536687 |
| U25 | 0.5685594 | 0.3351204 | 0.1343321 |
| U26 | 0.5255998 | 0.3137475 | 0.1967931 |
| U27 | 0.3768026 | 0.3910923 | 0.0724512 |
| U28 | 0.5413010 | 0.3418858 | 0.2666009 |
| U29 | 0.4252332 | 0.3504914 | 0.0695117 |
| U30 | 0.3364113 | 0.6218859 | -0.0226244 |
The eigenvalues indicate a sharp decline after the first factor, dropping from 10.62 (Factor 1) to 0.96 (Factor 2). This decrease suggests that Factor 1 explains the majority of the variance, while the factors afterwards contribute very little. The scree plot’s “elbow” at Factor 1 highlights this dramatic reduction in explained variance, showing that after the first factor, the other factors have little impact.
The EFA factor loadings follow the eigenvalue pattern, with Factor 1 showing strong loadings across most items (e.g., U15 = 0.685, U20 = 0.608). In contrast, Factors 2 and 3 have weaker and less consistent loadings, with only a few items moderately linked to these factors. For example, Factor 3 shows a high loading only on U6 (0.971), and a single item is not enough to define a meaningful factor.
The eigenvalue drop from 10.62 to 0.96 and the weak loadings for Factors 2 and 3 suggest that a single-factor model fits best. The scree plot and factor loadings show that the data do not clearly split into three distinct factors, as the additional factors add very little. Therefore, a one-factor model is simpler and fits better.
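The size of the eigenvalue drop can be quantified with two simple ratios. The sketch below uses only the two eigenvalues reported in the text. The variance share rests on an assumption that is not stated in the source, namely that the eigenvalues come from the 30 x 30 item correlation matrix, whose eigenvalues sum to the number of items.

```python
# First two eigenvalues as reported in the text.
eigenvalues = [10.62, 0.96]
n_items = 30

# Ratio of the first to the second eigenvalue: a common informal
# indicator of how dominant the first factor is.
ratio_first_to_second = eigenvalues[0] / eigenvalues[1]

# ASSUMPTION: eigenvalues are from the full item correlation matrix,
# so they sum to n_items and eigenvalue / n_items is a variance share.
share_first = eigenvalues[0] / n_items

print(f"First-to-second eigenvalue ratio: {ratio_first_to_second:.2f}")
print(f"Share of variance for Factor 1 (under the assumption): {share_first:.1%}")
```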
| Model | AIC | SABIC | HQ | BIC | logLik | X2 | df | p |
|---|---|---|---|---|---|---|---|---|
| fitifa1 | 30805.21 | 30909.11 | 30917.12 | 31099.67 | -15342.60 | NA | NA | NA |
| fitifa2 | 30811.02 | 30965.14 | 30977.03 | 31247.81 | -15316.51 | 52.185 | 29 | 0.005 |
Two models were tested. The chi-square statistic in the table is a likelihood-ratio test of the more complex model (fitifa2) against the simpler one (fitifa1); it is significant (X2 = 52.185, df = 29, p = 0.005), so the 29 additional parameters do raise the log-likelihood by a statistically detectable amount. However, every information criterion in the table (AIC, SABIC, HQ, and BIC) is lower for fitifa1, indicating that this gain does not justify the added complexity.
The IFA results therefore support the simpler model (fitifa1) on grounds of parsimony: it captures the factor structure effectively without unnecessary complexity. Because the improvement offered by fitifa2 is small relative to the number of parameters it adds, the factor structure can be treated as relatively simple, and the items can be grouped using fewer factors.
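The quantities in the model-comparison table are linked by standard formulas, which the following sketch checks against the reported values. The parameter counts (60 and 89) are not given in the source; they are an assumption inferred from the reported AIC and log-likelihood values, and their difference agrees with the reported 29 degrees of freedom.

```python
import math

n_participants = 1000

# Log-likelihoods from the table; parameter counts k are ASSUMED
# (inferred from AIC = 2k - 2*logLik, not stated in the source).
models = {
    "fitifa1": {"loglik": -15342.60, "k": 60},
    "fitifa2": {"loglik": -15316.51, "k": 89},
}

def aic(loglik, k):
    """Akaike information criterion."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion."""
    return k * math.log(n) - 2 * loglik

for name, m in models.items():
    print(f"{name}: AIC = {aic(m['loglik'], m['k']):.2f}, "
          f"BIC = {bic(m['loglik'], m['k'], n_participants):.2f}")

# Likelihood-ratio statistic: twice the log-likelihood difference,
# with degrees of freedom equal to the difference in parameter counts.
lrt_stat = 2 * (models["fitifa2"]["loglik"] - models["fitifa1"]["loglik"])
lrt_df = models["fitifa2"]["k"] - models["fitifa1"]["k"]
print(f"LRT: X2 = {lrt_stat:.2f}, df = {lrt_df}")
```

Because BIC penalizes each parameter by ln(1000), roughly 6.9, rather than 2, it favors the simpler model more strongly than AIC does, which is visible in the larger BIC gap in the table.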
| | Item | Estimate | Std. Error | Lower CI | Upper CI |
|---|---|---|---|---|---|
| U2 | U2 | 1.8276169 | 0.0885824 | 1.6539954 | 2.0012384 |
| U3 | U3 | 0.1303256 | 0.0717695 | -0.0103426 | 0.2709937 |
| U4 | U4 | -0.7862235 | 0.0738644 | -0.9309977 | -0.6414493 |
| U5 | U5 | 1.1265567 | 0.0781315 | 0.9734190 | 1.2796943 |
| U6 | U6 | -2.1491982 | 0.0935869 | -2.3326285 | -1.9657680 |
| U7 | U7 | 0.9914796 | 0.0767251 | 0.8410984 | 1.1418609 |
| U8 | U8 | 1.7798918 | 0.0876863 | 1.6080266 | 1.9517570 |
| U9 | U9 | -0.7472956 | 0.0736144 | -0.8915798 | -0.6030114 |
| U10 | U10 | -1.8031336 | 0.0862896 | -1.9722613 | -1.6340060 |
| U11 | U11 | -1.6554300 | 0.0837052 | -1.8194922 | -1.4913679 |
| U12 | U12 | 1.9943255 | 0.0919516 | 1.8141004 | 2.1745505 |
| U13 | U13 | -1.4821102 | 0.0810397 | -1.6409480 | -1.3232723 |
| U14 | U14 | 0.6493271 | 0.0739466 | 0.5043917 | 0.7942624 |
| U15 | U15 | -0.8535999 | 0.0743329 | -0.9992924 | -0.7079074 |
| U16 | U16 | 0.1145091 | 0.0717409 | -0.0261031 | 0.2551213 |
| U17 | U17 | -0.9679428 | 0.0752333 | -1.1154001 | -0.8204854 |
| U18 | U18 | -1.9284397 | 0.0887228 | -2.1023363 | -1.7545431 |
| U19 | U19 | -0.1322831 | 0.0715829 | -0.2725856 | 0.0080193 |
| U20 | U20 | -1.0086460 | 0.0755867 | -1.1567960 | -0.8604961 |
| U21 | U21 | -2.0528812 | 0.0913706 | -2.2319675 | -1.8737948 |
| U22 | U22 | 0.1197798 | 0.0717501 | -0.0208505 | 0.2604101 |
| U23 | U23 | 1.0954197 | 0.0777908 | 0.9429497 | 1.2478898 |
| U24 | U24 | 0.9494793 | 0.0763245 | 0.7998832 | 1.0990753 |
| U25 | U25 | 1.9261390 | 0.0905277 | 1.7487046 | 2.1035734 |
| U26 | U26 | 0.9138198 | 0.0759976 | 0.7648646 | 1.0627751 |
| U27 | U27 | -0.2110157 | 0.0716476 | -0.3514449 | -0.0705865 |
| U28 | U28 | 1.5759768 | 0.0841794 | 1.4109852 | 1.7409684 |
| U29 | U29 | -1.1091829 | 0.0765347 | -1.2591910 | -0.9591747 |
| U30 | U30 | 1.7253959 | 0.0866988 | 1.5554661 | 1.8953256 |
| | Item | Estimate | Std. Error | Lower CI | Upper CI |
|---|---|---|---|---|---|
| beta U1 | beta U1 | 0.0326600 | 0.0715813 | -0.1076394 | 0.1729594 |
| beta U2 | beta U2 | -1.8276169 | 0.0885824 | -2.0012384 | -1.6539954 |
| beta U3 | beta U3 | -0.1303256 | 0.0717695 | -0.2709937 | 0.0103426 |
| beta U4 | beta U4 | 0.7862235 | 0.0738644 | 0.6414493 | 0.9309977 |
| beta U5 | beta U5 | -1.1265567 | 0.0781315 | -1.2796943 | -0.9734190 |
| beta U6 | beta U6 | 2.1491982 | 0.0935869 | 1.9657680 | 2.3326285 |
| beta U7 | beta U7 | -0.9914796 | 0.0767251 | -1.1418609 | -0.8410984 |
| beta U8 | beta U8 | -1.7798918 | 0.0876863 | -1.9517570 | -1.6080266 |
| beta U9 | beta U9 | 0.7472956 | 0.0736144 | 0.6030114 | 0.8915798 |
| beta U10 | beta U10 | 1.8031336 | 0.0862896 | 1.6340060 | 1.9722613 |
| beta U11 | beta U11 | 1.6554300 | 0.0837052 | 1.4913679 | 1.8194922 |
| beta U12 | beta U12 | -1.9943255 | 0.0919516 | -2.1745505 | -1.8141004 |
| beta U13 | beta U13 | 1.4821102 | 0.0810397 | 1.3232723 | 1.6409480 |
| beta U14 | beta U14 | -0.6493271 | 0.0739466 | -0.7942624 | -0.5043917 |
| beta U15 | beta U15 | 0.8535999 | 0.0743329 | 0.7079074 | 0.9992924 |
| beta U16 | beta U16 | -0.1145091 | 0.0717409 | -0.2551213 | 0.0261031 |
| beta U17 | beta U17 | 0.9679428 | 0.0752333 | 0.8204854 | 1.1154001 |
| beta U18 | beta U18 | 1.9284397 | 0.0887228 | 1.7545431 | 2.1023363 |
| beta U19 | beta U19 | 0.1322831 | 0.0715829 | -0.0080193 | 0.2725856 |
| beta U20 | beta U20 | 1.0086460 | 0.0755867 | 0.8604961 | 1.1567960 |
| beta U21 | beta U21 | 2.0528812 | 0.0913706 | 1.8737948 | 2.2319675 |
| beta U22 | beta U22 | -0.1197798 | 0.0717501 | -0.2604101 | 0.0208505 |
| beta U23 | beta U23 | -1.0954197 | 0.0777908 | -1.2478898 | -0.9429497 |
| beta U24 | beta U24 | -0.9494793 | 0.0763245 | -1.0990753 | -0.7998832 |
| beta U25 | beta U25 | -1.9261390 | 0.0905277 | -2.1035734 | -1.7487046 |
| beta U26 | beta U26 | -0.9138198 | 0.0759976 | -1.0627751 | -0.7648646 |
| beta U27 | beta U27 | 0.2110157 | 0.0716476 | 0.0705865 | 0.3514449 |
| beta U28 | beta U28 | -1.5759768 | 0.0841794 | -1.7409684 | -1.4109852 |
| beta U29 | beta U29 | 1.1091829 | 0.0765347 | 0.9591747 | 1.2591910 |
| beta U30 | beta U30 | -1.7253959 | 0.0866988 | -1.8953256 | -1.5554661 |
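The Rasch model assumes equal discrimination for all items, with discrimination set to 1. Item difficulty (eta) ranges from -2.149 (U6) to 1.994 (U12), showing a wide range of difficulty levels. The second table reports the same estimates as beta parameters, which carry the opposite sign (for example, beta U6 = 2.149), so larger beta values correspond to easier items.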
While the Rasch model is good for estimating item difficulty, its assumption of equal discrimination makes it less suitable for this dataset. The variation in difficulty suggests some items are easier (U6) and others harder (U12). However, by assuming equal discrimination, the model doesn’t account for how well items distinguish between respondents with different abilities. This means the Rasch model may oversimplify the data, so it might not be the best choice for clustering based on difficulty and discrimination.
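To illustrate what equal discrimination means in practice, the sketch below evaluates the Rasch response function for the easiest and hardest items. It assumes the common parameterization in which the log-odds of a correct response equal ability minus difficulty; the function name is illustrative.

```python
import math

def rasch_prob(theta, difficulty):
    """Probability of a correct response under the Rasch model,
    assuming logit(P) = theta - difficulty."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

# Difficulty (eta) estimates for the easiest and hardest items, from the text.
difficulties = {"U6": -2.149, "U12": 1.994}

for theta in (-2.0, 0.0, 2.0):
    probs = ", ".join(
        f"{item}: {rasch_prob(theta, b):.3f}" for item, b in difficulties.items()
    )
    print(f"theta = {theta:+.1f} -> {probs}")
```

Every item's curve has the same slope and is only shifted along the ability axis by its difficulty, which is why the Rasch model cannot express differences in how sharply items separate participants.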
| | Item | Value | Standard_error | Z_value |
|---|---|---|---|---|
| Dffclt.U1 | Dffclt.U1 | -0.0008481 | 0.0610828 | -0.013884 |
| Dffclt.U2 | Dffclt.U2 | 1.2687406 | 0.0901622 | 14.071760 |
| Dffclt.U3 | Dffclt.U3 | 0.1361012 | 0.0699564 | 1.945515 |
| Dffclt.U4 | Dffclt.U4 | -0.6242749 | 0.0798422 | -7.818856 |
| Dffclt.U5 | Dffclt.U5 | 0.9297283 | 0.0892332 | 10.419083 |
| Dffclt.U6 | Dffclt.U6 | -2.1987512 | 0.2488342 | -8.836208 |
| Dffclt.U7 | Dffclt.U7 | 0.9480237 | 0.1049707 | 9.031317 |
| Dffclt.U8 | Dffclt.U8 | 1.6488531 | 0.1552095 | 10.623404 |
| Dffclt.U9 | Dffclt.U9 | -0.6270314 | 0.0849658 | -7.379806 |
| Dffclt.U10 | Dffclt.U10 | -1.3526105 | 0.1096486 | -12.335868 |
| Dffclt.U11 | Dffclt.U11 | -1.3128331 | 0.1134219 | -11.574781 |
| Dffclt.U12 | Dffclt.U12 | 1.6436196 | 0.1378746 | 11.921116 |
| Dffclt.U13 | Dffclt.U13 | -1.0858853 | 0.0911028 | -11.919336 |
| Dffclt.U14 | Dffclt.U14 | 0.5363938 | 0.0728710 | 7.360864 |
| Dffclt.U15 | Dffclt.U15 | -0.5365018 | 0.0598813 | -8.959423 |
| Dffclt.U16 | Dffclt.U16 | 0.0976415 | 0.0562389 | 1.736190 |
| Dffclt.U17 | Dffclt.U17 | -0.7645914 | 0.0839846 | -9.103949 |
| Dffclt.U18 | Dffclt.U18 | -1.9453258 | 0.2115523 | -9.195482 |
| Dffclt.U19 | Dffclt.U19 | -0.1104972 | 0.0930060 | -1.188065 |
| Dffclt.U20 | Dffclt.U20 | -0.7144043 | 0.0727424 | -9.821011 |
| Dffclt.U21 | Dffclt.U21 | -1.8194657 | 0.1740250 | -10.455195 |
| Dffclt.U22 | Dffclt.U22 | 0.1128470 | 0.0621808 | 1.814822 |
| Dffclt.U23 | Dffclt.U23 | 0.8628041 | 0.0817048 | 10.560012 |
| Dffclt.U24 | Dffclt.U24 | 0.7597336 | 0.0786717 | 9.657008 |
| Dffclt.U25 | Dffclt.U25 | 1.3611040 | 0.0974469 | 13.967653 |
| Dffclt.U26 | Dffclt.U26 | 0.7116288 | 0.0745092 | 9.550886 |
| Dffclt.U27 | Dffclt.U27 | -0.1587176 | 0.0749001 | -2.119059 |
| Dffclt.U28 | Dffclt.U28 | 1.1332858 | 0.0862080 | 13.145950 |
| Dffclt.U29 | Dffclt.U29 | -0.9326043 | 0.0976556 | -9.549933 |
| Dffclt.U30 | Dffclt.U30 | 1.3144806 | 0.1029637 | 12.766445 |
| Dscrmn.U1 | Dscrmn.U1 | 1.4431905 | 0.1178069 | 12.250471 |
| Dscrmn.U2 | Dscrmn.U2 | 1.6390342 | 0.1486804 | 11.023876 |
| Dscrmn.U3 | Dscrmn.U3 | 1.1620367 | 0.1017599 | 11.419396 |
| Dscrmn.U4 | Dscrmn.U4 | 1.1862358 | 0.1064482 | 11.143784 |
| Dscrmn.U5 | Dscrmn.U5 | 1.2303873 | 0.1110062 | 11.083948 |
| Dscrmn.U6 | Dscrmn.U6 | 0.8738399 | 0.1135162 | 7.697929 |
| Dscrmn.U7 | Dscrmn.U7 | 0.9910584 | 0.0969653 | 10.220753 |
| Dscrmn.U8 | Dscrmn.U8 | 1.0154358 | 0.1087823 | 9.334565 |
| Dscrmn.U9 | Dscrmn.U9 | 1.0850009 | 0.1004593 | 10.800399 |
| Dscrmn.U10 | Dscrmn.U10 | 1.3647625 | 0.1334303 | 10.228283 |
| Dscrmn.U11 | Dscrmn.U11 | 1.2405463 | 0.1216962 | 10.193796 |
| Dscrmn.U12 | Dscrmn.U12 | 1.2056943 | 0.1234510 | 9.766585 |
| Dscrmn.U13 | Dscrmn.U13 | 1.4105109 | 0.1295309 | 10.889380 |
| Dscrmn.U14 | Dscrmn.U14 | 1.2741796 | 0.1092822 | 11.659530 |
| Dscrmn.U15 | Dscrmn.U15 | 1.8623600 | 0.1526118 | 12.203254 |
| Dscrmn.U16 | Dscrmn.U16 | 1.7056954 | 0.1344097 | 12.690273 |
| Dscrmn.U17 | Dscrmn.U17 | 1.2135765 | 0.1099691 | 11.035613 |
| Dscrmn.U18 | Dscrmn.U18 | 0.8848572 | 0.1085157 | 8.154189 |
| Dscrmn.U19 | Dscrmn.U19 | 0.7781632 | 0.0839225 | 9.272405 |
| Dscrmn.U20 | Dscrmn.U20 | 1.4692961 | 0.1254177 | 11.715217 |
| Dscrmn.U21 | Dscrmn.U21 | 1.0589969 | 0.1201081 | 8.817030 |
| Dscrmn.U22 | Dscrmn.U22 | 1.4077961 | 0.1155528 | 12.183141 |
| Dscrmn.U23 | Dscrmn.U23 | 1.3319425 | 0.1160957 | 11.472801 |
| Dscrmn.U24 | Dscrmn.U24 | 1.3076825 | 0.1132013 | 11.551837 |
| Dscrmn.U25 | Dscrmn.U25 | 1.5777773 | 0.1463352 | 10.781942 |
| Dscrmn.U26 | Dscrmn.U26 | 1.3736622 | 0.1164977 | 11.791320 |
| Dscrmn.U27 | Dscrmn.U27 | 1.0551151 | 0.0964734 | 10.936852 |
| Dscrmn.U28 | Dscrmn.U28 | 1.5408897 | 0.1365496 | 11.284472 |
| Dscrmn.U29 | Dscrmn.U29 | 1.1100887 | 0.1059345 | 10.479010 |
| Dscrmn.U30 | Dscrmn.U30 | 1.3802906 | 0.1288097 | 10.715736 |
The 2PL model estimates both item difficulty and discrimination, making it more flexible than the Rasch model. Discrimination ranges from 0.778 (U19) to 1.862 (U15), and difficulty varies widely across items. The model has a good fit, with a lower AIC (30806.27) than both the Rasch and 3PL models.
The 2PL model is the best fit for this data because it allows each item to have its own discrimination value, unlike the Rasch model. High-discrimination items (e.g., U15 and U16) better differentiate between participants of different abilities, while low-discrimination items (e.g., U19) are less informative. The wide range of difficulty also shows that items cover a broad ability spectrum, making the 2PL model well suited to grouping items based on difficulty and discrimination.
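The effect of discrimination can be seen through item information, which for the 2PL model peaks at the item's difficulty with height a²/4. The sketch below compares U15 and U19 using the estimates from the table. It assumes the parameterization in which the log-odds equal a(θ − b), which is consistent with separate difficulty and discrimination columns but is not stated explicitly in the source.

```python
import math

def prob_2pl(theta, a, b):
    """2PL probability of a correct response, assuming logit(P) = a * (theta - b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = prob_2pl(theta, a, b)
    return a ** 2 * p * (1.0 - p)

# (discrimination a, difficulty b) from the 2PL table.
items = {
    "U15": (1.8623600, -0.5365018),  # most discriminating item
    "U19": (0.7781632, -0.1104972),  # least discriminating item
}

for name, (a, b) in items.items():
    peak = item_information(b, a, b)  # information is maximal at theta = b
    print(f"{name}: a = {a:.3f}, peak information = {peak:.3f}")
```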
| Item | Value | Standard_error | Z_value |
|---|---|---|---|
| Gussng.U1 | 0.0000009 | 0.0003609 | 0.0025857 |
| Gussng.U2 | 0.0000002 | 0.0000582 | 0.0031701 |
| Gussng.U3 | 0.0357052 | 0.0632781 | 0.5642578 |
| Gussng.U4 | 0.0001960 | NaN | NaN |
| Gussng.U5 | 0.0000000 | 0.0000194 | 0.0010626 |
| Gussng.U6 | 0.0015934 | 0.0801375 | 0.0198834 |
| Gussng.U7 | 0.0000431 | 0.0033458 | 0.0128825 |
| Gussng.U8 | 0.0458528 | 0.0274881 | 1.6680980 |
| Gussng.U9 | 0.0000029 | 0.0008204 | 0.0034785 |
| Gussng.U10 | 0.2851522 | 0.1370039 | 2.0813445 |
| Gussng.U11 | 0.0000032 | 0.0009037 | 0.0034951 |
| Gussng.U12 | 0.0211417 | 0.0205739 | 1.0275975 |
| Gussng.U13 | 0.1855913 | 0.1451978 | 1.2781969 |
| Gussng.U14 | 0.0459580 | 0.0406205 | 1.1313990 |
| Gussng.U15 | 0.0000048 | 0.0009051 | 0.0052610 |
| Gussng.U16 | 0.0647455 | 0.0356979 | 1.8137065 |
| Gussng.U17 | 0.2326992 | 0.0933739 | 2.4921230 |
| Gussng.U18 | 0.3740668 | 0.2431689 | 1.5383005 |
| Gussng.U19 | 0.1901005 | 0.1051988 | 1.8070599 |
| Gussng.U20 | 0.0000008 | 0.0004119 | 0.0019622 |
| Gussng.U21 | 0.0000156 | 0.0025430 | 0.0061353 |
| Gussng.U22 | 0.0512474 | 0.0511980 | 1.0009661 |
| Gussng.U23 | 0.0383912 | 0.0307128 | 1.2500078 |
| Gussng.U24 | 0.0150560 | 0.0325145 | 0.4630556 |
| Gussng.U25 | 0.0000001 | 0.0000436 | 0.0024084 |
| Gussng.U26 | 0.0000007 | 0.0002423 | 0.0028031 |
| Gussng.U27 | 0.2032610 | 0.0576739 | 3.5243161 |
| Gussng.U28 | 0.0000071 | 0.0012296 | 0.0057429 |
| Gussng.U29 | 0.1427725 | 0.1731639 | 0.8244932 |
| Gussng.U30 | 0.0089231 | 0.0192527 | 0.4634710 |
| Dffclt.U1 | 0.0119097 | 0.0601326 | 0.1980570 |
| Dffclt.U2 | 1.2482655 | 0.0879533 | 14.1923610 |
| Dffclt.U3 | 0.2214603 | 0.1507873 | 1.4686931 |
| Dffclt.U4 | -0.6136286 | 0.0700249 | -8.7630005 |
| Dffclt.U5 | 0.9197074 | 0.0868735 | 10.5867402 |
| Dffclt.U6 | -2.1935655 | 0.2917418 | -7.5188580 |
| Dffclt.U7 | 0.9366097 | 0.1021023 | 9.1732467 |
| Dffclt.U8 | 1.5919940 | 0.1340811 | 11.8733703 |
| Dffclt.U9 | -0.6174472 | 0.0851973 | -7.2472659 |
| Dffclt.U10 | -0.8258273 | 0.2935379 | -2.8133584 |
| Dffclt.U11 | -1.3117914 | 0.1142150 | -11.4852777 |
| Dffclt.U12 | 1.5794644 | 0.1259695 | 12.5384683 |
| Dffclt.U13 | -0.7455169 | 0.2877414 | -2.5909267 |
| Dffclt.U14 | 0.6248992 | 0.1025810 | 6.0917611 |
| Dffclt.U15 | -0.5208393 | 0.0599754 | -8.6842191 |
| Dffclt.U16 | 0.2244899 | 0.0804311 | 2.7910815 |
| Dffclt.U17 | -0.2573540 | 0.2162693 | -1.1899700 |
| Dffclt.U18 | -1.0038987 | 0.7850804 | -1.2787208 |
| Dffclt.U19 | 0.4474745 | 0.3085590 | 1.4502071 |
| Dffclt.U20 | -0.7028142 | 0.0730983 | -9.6146460 |
| Dffclt.U21 | -1.8180335 | 0.1722984 | -10.5516581 |
| Dffclt.U22 | 0.2227416 | 0.1139132 | 1.9553616 |
| Dffclt.U23 | 0.9106879 | 0.0887507 | 10.2611955 |
| Dffclt.U24 | 0.7816172 | 0.0929963 | 8.4048211 |
| Dffclt.U25 | 1.3373744 | 0.0950551 | 14.0694640 |
| Dffclt.U26 | 0.7114651 | 0.0728489 | 9.7663077 |
| Dffclt.U27 | 0.3234919 | 0.1439763 | 2.2468406 |
| Dffclt.U28 | 1.1216178 | 0.0843141 | 13.3028452 |
| Dffclt.U29 | -0.6201115 | 0.3983104 | -1.5568551 |
| Dffclt.U30 | 1.2926744 | 0.0986429 | 13.1045811 |
| Dscrmn.U1 | 1.4603442 | 0.1195811 | 12.2121638 |
| Dscrmn.U2 | 1.6885134 | 0.1548361 | 10.9051677 |
| Dscrmn.U3 | 1.2613939 | 0.1898812 | 6.6430674 |
| Dscrmn.U4 | 1.1859815 | 0.1043215 | 11.3685256 |
| Dscrmn.U5 | 1.2624734 | 0.1149204 | 10.9856294 |
| Dscrmn.U6 | 0.8747084 | 0.1133934 | 7.7139283 |
| Dscrmn.U7 | 1.0159475 | 0.1002129 | 10.1378939 |
| Dscrmn.U8 | 1.3804055 | 0.2852295 | 4.8396307 |
| Dscrmn.U9 | 1.0855059 | 0.1005572 | 10.7949052 |
| Dscrmn.U10 | 1.6887556 | 0.3036448 | 5.5616156 |
| Dscrmn.U11 | 1.2328643 | 0.1200453 | 10.2699879 |
| Dscrmn.U12 | 1.4563134 | 0.2664375 | 5.4658720 |
| Dscrmn.U13 | 1.6456813 | 0.2951939 | 5.5749172 |
| Dscrmn.U14 | 1.4758585 | 0.2152583 | 6.8562199 |
| Dscrmn.U15 | 1.8573273 | 0.1526182 | 12.1697658 |
| Dscrmn.U16 | 2.0487720 | 0.2551623 | 8.0292905 |
| Dscrmn.U17 | 1.6200036 | 0.2905938 | 5.5748050 |
| Dscrmn.U18 | 1.0774335 | 0.2907628 | 3.7055417 |
| Dscrmn.U19 | 1.0642060 | 0.2716899 | 3.9169878 |
| Dscrmn.U20 | 1.4663331 | 0.1249919 | 11.7314238 |
| Dscrmn.U21 | 1.0584922 | 0.1178840 | 8.9791004 |
| Dscrmn.U22 | 1.5947631 | 0.2260503 | 7.0549044 |
| Dscrmn.U23 | 1.5901450 | 0.2452181 | 6.4846162 |
| Dscrmn.U24 | 1.4031319 | 0.1966532 | 7.1350558 |
| Dscrmn.U25 | 1.6256323 | 0.1524307 | 10.6647299 |
| Dscrmn.U26 | 1.3975489 | 0.1195062 | 11.6943602 |
| Dscrmn.U27 | 1.5916525 | 0.2682938 | 5.9324987 |
| Dscrmn.U28 | 1.5741189 | 0.1411647 | 11.1509409 |
| Dscrmn.U29 | 1.2466152 | 0.2505410 | 4.9756932 |
| Dscrmn.U30 | 1.5016014 | 0.2239273 | 6.7057534 |
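The 3PL model includes a guessing parameter (c), which accounts for the chance that a participant answers correctly by guessing. This parameter is near zero for most items, and some of the larger estimates are imprecise (e.g., U18 = 0.374 with a standard error of 0.243). The model’s AIC (30843.97) is higher than that of the 2PL model (30806.27), indicating a worse trade-off between fit and complexity.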
Since the guessing parameter doesn’t significantly contribute, the 3PL model adds unnecessary complexity without improving fit. The higher AIC suggests it’s over-parameterized. This shows that guessing isn’t a major factor in how people answer, making the simpler 2PL model better for analyzing item properties and forming balanced item clusters without added complexity.
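The role of the guessing parameter is easiest to see as a lower asymptote on the response curve. The following sketch evaluates the 3PL function for one item with a sizeable estimated c and one with c near zero, using estimates from the 3PL table, and restates the AIC gap. The parameterization c + (1 − c) · logistic(a(θ − b)) is the standard form and is assumed here.

```python
import math

def prob_3pl(theta, a, b, c):
    """3PL probability of a correct response; c is the lower asymptote
    (assumed form: c + (1 - c) * logistic(a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Estimates from the 3PL table.
u18 = {"a": 1.0774335, "b": -1.0038987, "c": 0.3740668}  # largest guessing estimate
u5 = {"a": 1.2624734, "b": 0.9197074, "c": 0.0}          # guessing estimated at zero

for theta in (-6.0, 0.0, 3.0):
    print(f"theta = {theta:+.1f}: U18 = {prob_3pl(theta, **u18):.3f}, "
          f"U5 = {prob_3pl(theta, **u5):.3f}")

# AIC values reported in the text; a positive difference favors the 2PL model.
aic_2pl, aic_3pl = 30806.27, 30843.97
aic_diff = aic_3pl - aic_2pl
print(f"AIC(3PL) - AIC(2PL) = {aic_diff:.2f}")
```

When c is zero the 3PL curve reduces to the 2PL curve, so near-zero guessing estimates add parameters without changing the predicted probabilities.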
| | Item | Difficulty | Discrimination |
|---|---|---|---|
| U1 | U1 | -0.0008481 | 1.4431905 |
| U2 | U2 | 1.2687406 | 1.6390342 |
| U3 | U3 | 0.1361012 | 1.1620367 |
| U4 | U4 | -0.6242749 | 1.1862358 |
| U5 | U5 | 0.9297283 | 1.2303873 |
| U6 | U6 | -2.1987512 | 0.8738399 |
| U7 | U7 | 0.9480237 | 0.9910584 |
| U8 | U8 | 1.6488531 | 1.0154358 |
| U9 | U9 | -0.6270314 | 1.0850009 |
| U10 | U10 | -1.3526105 | 1.3647625 |
| U11 | U11 | -1.3128331 | 1.2405463 |
| U12 | U12 | 1.6436196 | 1.2056943 |
| U13 | U13 | -1.0858853 | 1.4105109 |
| U14 | U14 | 0.5363938 | 1.2741796 |
| U15 | U15 | -0.5365018 | 1.8623600 |
| U16 | U16 | 0.0976415 | 1.7056954 |
| U17 | U17 | -0.7645914 | 1.2135765 |
| U18 | U18 | -1.9453258 | 0.8848572 |
| U19 | U19 | -0.1104972 | 0.7781632 |
| U20 | U20 | -0.7144043 | 1.4692961 |
| U21 | U21 | -1.8194657 | 1.0589969 |
| U22 | U22 | 0.1128470 | 1.4077961 |
| U23 | U23 | 0.8628041 | 1.3319425 |
| U24 | U24 | 0.7597336 | 1.3076825 |
| U25 | U25 | 1.3611040 | 1.5777773 |
| U26 | U26 | 0.7116288 | 1.3736622 |
| U27 | U27 | -0.1587176 | 1.0551151 |
| U28 | U28 | 1.1332858 | 1.5408897 |
| U29 | U29 | -0.9326043 | 1.1100887 |
| U30 | U30 | 1.3144806 | 1.3802906 |
This analysis determines the best number of clusters for grouping test items based on their difficulty and discrimination using K-Means clustering. The Elbow Method indicates an inflection point at k = 3, meaning that adding more clusters does not significantly reduce within-cluster variance. The Silhouette Method also peaks at k = 3, confirming that this number offers the best balance between cluster cohesion and separation.
Therefore, three clusters provide the ideal solution for grouping items with similar characteristics, helping to balance difficulty and discrimination in item sampling.
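The Elbow Method can be sketched with a plain implementation of Lloyd's K-Means algorithm applied to the 2PL estimates (rounded to four decimals). This is an illustration under assumptions, not the original procedure: the source does not say which software was used or whether the parameters were standardized before clustering, so the values printed here need not match the authors' plot.

```python
import random

# (difficulty, discrimination) for U1..U30, rounded from the 2PL table.
items = [
    (-0.0008, 1.4432), (1.2687, 1.6390), (0.1361, 1.1620), (-0.6243, 1.1862),
    (0.9297, 1.2304), (-2.1988, 0.8738), (0.9480, 0.9911), (1.6489, 1.0154),
    (-0.6270, 1.0850), (-1.3526, 1.3648), (-1.3128, 1.2405), (1.6436, 1.2057),
    (-1.0859, 1.4105), (0.5364, 1.2742), (-0.5365, 1.8624), (0.0976, 1.7057),
    (-0.7646, 1.2136), (-1.9453, 0.8849), (-0.1105, 0.7782), (-0.7144, 1.4693),
    (-1.8195, 1.0590), (0.1128, 1.4078), (0.8628, 1.3319), (0.7597, 1.3077),
    (1.3611, 1.5778), (0.7116, 1.3737), (-0.1587, 1.0551), (1.1333, 1.5409),
    (-0.9326, 1.1101), (1.3145, 1.3803),
]

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centers, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random distinct starting centers
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster happens to be empty.
            new_centers.append(
                tuple(sum(c) / len(members) for c in zip(*members))
                if members else centers[j]
            )
        if new_centers == centers:  # converged
            break
        centers = new_centers
    labels = assign(points, centers)
    wcss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wcss

# Elbow curve: best WCSS over several random starts for each k.
wcss_by_k = {k: min(kmeans(items, k, seed=s)[2] for s in range(20))
             for k in range(1, 7)}
for k, w in wcss_by_k.items():
    print(f"k = {k}: WCSS = {w:.3f}")
```

The elbow is read off as the value of k after which further increases reduce the within-cluster sum of squares only marginally.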
| | Item | Difficulty | Discrimination | Cluster |
|---|---|---|---|---|
| U6 | U6 | -2.1987512 | 0.8738399 | 1 |
| U10 | U10 | -1.3526105 | 1.3647625 | 1 |
| U11 | U11 | -1.3128331 | 1.2405463 | 1 |
| U13 | U13 | -1.0858853 | 1.4105109 | 1 |
| U18 | U18 | -1.9453258 | 0.8848572 | 1 |
| U21 | U21 | -1.8194657 | 1.0589969 | 1 |
| U29 | U29 | -0.9326043 | 1.1100887 | 1 |
| U1 | U1 | -0.0008481 | 1.4431905 | 2 |
| U3 | U3 | 0.1361012 | 1.1620367 | 2 |
| U4 | U4 | -0.6242749 | 1.1862358 | 2 |
| U9 | U9 | -0.6270314 | 1.0850009 | 2 |
| U15 | U15 | -0.5365018 | 1.8623600 | 2 |
| U16 | U16 | 0.0976415 | 1.7056954 | 2 |
| U17 | U17 | -0.7645914 | 1.2135765 | 2 |
| U19 | U19 | -0.1104972 | 0.7781632 | 2 |
| U20 | U20 | -0.7144043 | 1.4692961 | 2 |
| U22 | U22 | 0.1128470 | 1.4077961 | 2 |
| U27 | U27 | -0.1587176 | 1.0551151 | 2 |
| U2 | U2 | 1.2687406 | 1.6390342 | 3 |
| U5 | U5 | 0.9297283 | 1.2303873 | 3 |
| U7 | U7 | 0.9480237 | 0.9910584 | 3 |
| U8 | U8 | 1.6488531 | 1.0154358 | 3 |
| U12 | U12 | 1.6436196 | 1.2056943 | 3 |
| U14 | U14 | 0.5363938 | 1.2741796 | 3 |
| U23 | U23 | 0.8628041 | 1.3319425 | 3 |
| U24 | U24 | 0.7597336 | 1.3076825 | 3 |
| U25 | U25 | 1.3611040 | 1.5777773 | 3 |
| U26 | U26 | 0.7116288 | 1.3736622 | 3 |
| U28 | U28 | 1.1332858 | 1.5408897 | 3 |
| U30 | U30 | 1.3144806 | 1.3802906 | 3 |
Items were grouped based on difficulty and discrimination parameters using the Two-Parameter Logistic (2PL) model and K-Means clustering with k = 3. The 2PL model accounts for differences in item discrimination, resulting in a more accurate psychometric evaluation.
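Cluster 1 (7 items) contains the easiest items, with difficulties from -2.20 to -0.93 and low to moderate discrimination (0.87 to 1.41); U6 and U18, for example, are easy items that differentiate only weakly between individuals. Cluster 2 (11 items) contains items of moderate difficulty (-0.76 to 0.14) and includes the most discriminating items (e.g., U15, U16) as well as the least discriminating one (U19). Cluster 3 (12 items) contains the most difficult items (0.54 to 1.65; e.g., U2, U30), with moderate to high discrimination (0.99 to 1.64). The clusters do not overlap in difficulty, so they are separated mainly along the difficulty dimension. This clustering approach enables balanced item sampling across various psychometric profiles.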
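The cluster profiles can be summarized directly from the assignments in the table above. The sketch below takes the published cluster memberships as given (it does not rerun the clustering) and reports the size, difficulty range, and mean discrimination of each cluster; values are rounded to four decimals.

```python
from statistics import mean

# (difficulty, discrimination) pairs grouped by the cluster labels in the table.
clusters = {
    1: [(-2.1988, 0.8738), (-1.3526, 1.3648), (-1.3128, 1.2405), (-1.0859, 1.4105),
        (-1.9453, 0.8849), (-1.8195, 1.0590), (-0.9326, 1.1101)],
    2: [(-0.0008, 1.4432), (0.1361, 1.1620), (-0.6243, 1.1862), (-0.6270, 1.0850),
        (-0.5365, 1.8624), (0.0976, 1.7057), (-0.7646, 1.2136), (-0.1105, 0.7782),
        (-0.7144, 1.4693), (0.1128, 1.4078), (-0.1587, 1.0551)],
    3: [(1.2687, 1.6390), (0.9297, 1.2304), (0.9480, 0.9911), (1.6489, 1.0154),
        (1.6436, 1.2057), (0.5364, 1.2742), (0.8628, 1.3319), (0.7597, 1.3077),
        (1.3611, 1.5778), (0.7116, 1.3737), (1.1333, 1.5409), (1.3145, 1.3803)],
}

summary = {}
for cluster_id, members in clusters.items():
    difficulties = [b for b, _ in members]
    discriminations = [a for _, a in members]
    summary[cluster_id] = {
        "n": len(members),
        "min_difficulty": min(difficulties),
        "max_difficulty": max(difficulties),
        "mean_discrimination": mean(discriminations),
    }
    s = summary[cluster_id]
    print(f"Cluster {cluster_id}: n = {s['n']}, "
          f"difficulty {s['min_difficulty']:.2f} to {s['max_difficulty']:.2f}, "
          f"mean discrimination = {s['mean_discrimination']:.2f}")
```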
In conclusion, the analysis grouped the 30 items based on their difficulty and their ability to differentiate between test-takers, providing a basis for psychometric equivalence across test forms. Using K-Means clustering, supported by the Elbow and Silhouette methods, the items were divided into three clusters of 7, 11, and 12 items. Drawing items from each cluster helps keep test forms fair and consistent, covering a wide range of ability levels while maintaining clear distinctions between performance levels. Overall, this approach supports the creation of psychometrically reliable and valid tests.
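One way to turn the clusters into test forms is stratified sampling: each form draws a fixed number of items from every cluster. The sketch below builds two non-overlapping forms from the cluster memberships in the table. The function name, the per-cluster quotas, and the choice of two forms are hypothetical choices made for illustration and are not taken from the original analysis.

```python
import random

# Cluster memberships as listed in the clustering table.
clusters = {
    1: ["U6", "U10", "U11", "U13", "U18", "U21", "U29"],
    2: ["U1", "U3", "U4", "U9", "U15", "U16", "U17", "U19", "U20", "U22", "U27"],
    3: ["U2", "U5", "U7", "U8", "U12", "U14", "U23", "U24", "U25", "U26", "U28", "U30"],
}

def build_parallel_forms(clusters, per_cluster, seed=0):
    """Draw two disjoint forms with the same number of items per cluster.

    per_cluster maps cluster id -> items per form (hypothetical quotas).
    """
    rng = random.Random(seed)
    form_a, form_b = [], []
    for cluster_id, members in clusters.items():
        n = per_cluster[cluster_id]
        if 2 * n > len(members):
            raise ValueError(f"Cluster {cluster_id} has too few items for two forms")
        shuffled = members[:]        # copy so the input is left untouched
        rng.shuffle(shuffled)
        form_a.extend(shuffled[:n])  # first n items go to form A
        form_b.extend(shuffled[n:2 * n])  # next n items go to form B
    return form_a, form_b

# Hypothetical quotas: 3 easy, 5 moderate, and 6 hard items per form.
form_a, form_b = build_parallel_forms(clusters, {1: 3, 2: 5, 3: 6}, seed=42)
print("Form A:", form_a)
print("Form B:", form_b)
```

Matching only on cluster membership is a coarse form of equivalence; comparing the mean difficulty and discrimination of the resulting forms would be a sensible follow-up check.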