Task number 9

Abstract

Background: Psychometric equivalence is essential in test construction to ensure that sampled items perform consistently across different test versions. This analysis evaluates the psychometric properties of 30 dichotomous (0/1) items from a sample of 1,000 participants by clustering items based on their difficulty and discrimination using Item Response Theory (IRT) models.

Methods: The analysis applied Categorical Principal Component Analysis (Princals) to assess unidimensionality, Exploratory Factor Analysis (EFA) to investigate factor structures, and Item Factor Analysis (IFA) to confirm model simplicity. Three IRT models, the Rasch model, the Two-Parameter Logistic (2PL) model, and the Three-Parameter Logistic (3PL) model, were used to estimate item parameters and guide the clustering process. The 2PL model was selected because it accommodates varying item discrimination, unlike the Rasch model, which assumes equal discrimination, and without the additional guessing parameter of the 3PL model. Using K-Means clustering, items were categorized into three groups based on their discrimination and difficulty levels.

Results: The Princals analysis indicated that the items aligned along a single dimension, supporting unidimensionality. The EFA, together with a scree plot, also pointed to a one-factor solution. In the IFA, the information criteria favored the simpler factor structure. The Rasch model captured a wide range of item difficulties but was limited by its assumption of equal discrimination across items. The 2PL model provided the best fit, with discrimination parameters ranging from 0.778 to 1.862 and difficulty parameters covering a wide range. The guessing parameter in the 3PL model did not improve the model fit. The Elbow and Silhouette methods showed that three clusters were optimal for K-Means clustering, facilitating balanced item sampling.

Conclusions: The 2PL model was the most suitable framework for clustering the 30 items based on difficulty and discrimination. This approach grouped the items into three clusters, allowing future test forms to draw items from each cluster to support psychometric equivalence and improve the reliability and fairness of assessments while accurately measuring diverse participant abilities.

Descriptive statistics

The dataset includes responses to 30 dichotomous (0/1) items (U1–U30) from 1,000 participants. Each row represents a participant’s response pattern, with “1” indicating a correct response and “0” indicating an incorrect response. Demographic data such as group and age are also given. The data can be used for psychometric analysis, particularly with Item Response Theory (IRT) models, to assess item difficulty and discrimination.

Sample of Dichotomous Item Responses
U1	U2	U3	U4	U5	U6	U7	U8	U9	U10	U11	U13	U14	U15	U16	U17	U18	U19	U20	U21	U22	U23	U24	U25	U26	U27	U28	U29	U30	group	Age
1	0	0	0	1	1	0	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	0	0	1	0	1	0	men	28
1	1	0	1	1	1	0	1	1	1	1	1	0	1	0	1	0	1	1	1	0	1	1	1	1	1	0	1	0	women	26
1	0	0	1	0	1	1	0	1	1	1	1	1	1	1	1	1	0	1	1	1	1	0	1	0	1	1	1	0	women	28
1	0	1	1	1	1	1	0	1	1	1	1	0	1	1	0	1	0	0	1	1	0	0	0	0	1	0	1	1	women	30
1	1	0	1	0	1	0	1	0	1	1	1	1	1	0	1	1	1	1	1	1	0	0	0	0	0	1	0	0	men	26
0	0	0	1	0	1	1	0	0	1	1	1	0	1	1	1	1	1	1	1	1	0	0	0	1	0	0	1	0	men	26

Age and Group Statistics

Summary Statistics of Age
Min.	1st Qu.	Median	Mean	3rd Qu.	Max.
20	26	28	27.727	30	32

Distribution of Group
Var1	Freq
men	500
women	500

The dataset consists of 1,000 participants with ages ranging from 20 to 32 years. The mean age is 27.73 years, and the median is 28 years. The distribution is relatively narrow, with most participants (63.3%) falling between 26 and 30 years, so the sample is concentrated in the late twenties. The group distribution is balanced, with exactly 500 women and 500 men.

Counting the 0’s and 1’s per Item

Counts of 0s and 1s per Item
	Count_1s	Count_0s
U1	501	499
U2	193	807
U3	470	530
U4	642	358
U5	293	707
U6	844	156
U7	315	685
U8	199	801
U9	635	365
U10	802	198
U11	782	218
U12	173	827
U13	757	243
U14	374	626
U15	654	346
U16	473	527
U17	674	326
U18	818	182
U19	520	480
U20	681	319
U21	833	167
U22	472	528
U23	298	702
U24	322	678
U25	181	819
U26	328	672
U27	535	465
U28	226	774
U29	698	302
U30	206	794

The distribution of responses across items U1 to U30 shows variability in the number of 0’s and 1’s. Items like U6 and U21 had a high proportion of 1’s (844 and 833 of 1,000), indicating they were easier for participants. In contrast, U12 and U25 had far fewer 1’s (173 and 181), suggesting higher difficulty.

This pattern shows that certain items were easier for participants, leading to higher correct response rates, while others were more challenging, resulting in a greater number of incorrect answers. Overall, the differences in response distribution indicate varying levels of item difficulty within the dataset.
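The counts above translate directly into the classical proportion-correct index for each item. The following is a minimal sketch for a handful of items; the variable names are illustrative and not taken from the original analysis.

```python
# Counts of correct (1) responses for a few items, copied from the table above.
count_ones = {"U6": 844, "U21": 833, "U1": 501, "U25": 181, "U12": 173}
n_participants = 1000

# Proportion correct (the classical item "p-value"): higher means easier.
p_correct = {item: ones / n_participants for item, ones in count_ones.items()}

# Sort from easiest to hardest.
ranked = sorted(p_correct.items(), key=lambda kv: kv[1], reverse=True)
for item, p in ranked:
    print(f"{item}: p = {p:.3f}")
```

Sorting by proportion correct gives a quick, model-free ordering of item difficulty that can be compared with the IRT estimates reported later.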

Assessing Dimensionality

Before fitting an Item Response Theory (IRT) model, it is crucial to assess the scale’s dimensionality. The following methods were used: Categorical Principal Component Analysis (Princals), Exploratory Factor Analysis (EFA), and Item Factor Analysis (IFA) for analyzing latent item structures. These techniques check whether the scale is appropriate for unidimensional IRT modeling.

Categorical Principal Component Analysis (Princals)

The Princals solution shows that all 30 items point in the same direction on the principal component plot, suggesting they measure a similar concept. This is consistent with a single underlying dimension and supports grouping the items into balanced clusters based on difficulty and discrimination.

Exploratory Factor Analysis (EFA)

EFA Loadings Table
	Factor 1	Factor 2	Factor 3
U1	0.4583751	0.4146866	0.2245282
U2	0.4749530	0.5012407	0.0883197
U3	0.4540790	0.3512264	0.0874065
U4	0.5368042	0.2194293	0.1355101
U5	0.4991854	0.3067454	0.1303310
U6	0.1765252	0.1424032	0.9713710
U7	0.3560481	0.3641058	0.0744484
U8	0.2052611	0.4936294	0.2133258
U9	0.4427127	0.2513244	0.2079537
U10	0.4360856	0.3907758	0.1537579
U11	0.5655336	0.1685652	0.1369274
U12	0.1731500	0.6412458	0.2160460
U13	0.4997258	0.3650578	0.0974844
U14	0.4238033	0.4193519	0.1390296
U15	0.6852040	0.2895528	0.1409246
U16	0.5054943	0.4780923	0.1753344
U17	0.4192744	0.4352794	0.0120122
U18	0.2765364	0.3262428	0.1715914
U19	0.4144009	0.2099004	-0.0399616
U20	0.6080735	0.2784103	0.0691154
U21	0.4073212	0.2540542	0.1202816
U22	0.5184947	0.3358727	0.2031776
U23	0.4023313	0.5120767	0.0276091
U24	0.4453539	0.3604365	0.2536687
U25	0.5685594	0.3351204	0.1343321
U26	0.5255998	0.3137475	0.1967931
U27	0.3768026	0.3910923	0.0724512
U28	0.5413010	0.3418858	0.2666009
U29	0.4252332	0.3504914	0.0695117
U30	0.3364113	0.6218859	-0.0226244

The eigenvalues indicate a sharp decline after the first factor, dropping from 10.62 (Factor 1) to 0.96 (Factor 2). This decrease suggests that Factor 1 explains the majority of the variance, while the factors afterwards contribute very little. The scree plot’s “elbow” at Factor 1 highlights this dramatic reduction in explained variance, showing that after the first factor, the other factors have little impact.

The EFA factor loadings follow the eigenvalue pattern, with Factor 1 showing moderate to strong loadings across most items (e.g., U15 = 0.685, U20 = 0.608). In contrast, Factors 2 and 3 have mostly weaker and less consistent loadings. Factor 2 exceeds 0.5 on only four items (U2, U12, U23, U30), and Factor 3 has a substantial loading only on U6 (0.971), which is not enough to define a meaningful factor.

The eigenvalue drop from 10.62 to 0.96 and the weak loadings for Factors 2 and 3 suggest that a single-factor model fits best. The scree plot and factor loadings show that the data doesn’t clearly split into three distinct factors, as the additional factors add very little. Therefore, a one-factor model is simpler and fits better.
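The size of the eigenvalue drop can be made concrete with a little arithmetic. The sketch below uses only the two eigenvalues quoted in the text; the share of variance rests on the assumption, not stated in the source, that the eigenvalues come from the full 30 x 30 correlation matrix and therefore sum to 30.

```python
# Eigenvalues reported in the text for the first two factors.
eig_factor1 = 10.62
eig_factor2 = 0.96
n_items = 30

# Ratio of the first to the second eigenvalue; a large ratio is commonly
# read as support for one dominant dimension.
ratio = eig_factor1 / eig_factor2

# ASSUMPTION: eigenvalues of the full correlation matrix, which sum to the
# number of items. If a reduced matrix was used, this share does not apply.
share_factor1 = eig_factor1 / n_items

print(f"first/second ratio = {ratio:.2f}, share of Factor 1 = {share_factor1:.1%}")
```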

Item Factor Analysis (IFA)

ANOVA Results for Item Factor Analysis
	AIC	SABIC	HQ	BIC	logLik	X2	df	p
fitifa1	30805.21	30909.11	30917.12	31099.67	-15342.60	NA	NA	NA
fitifa2	30811.02	30965.14	30977.03	31247.81	-15316.51	52.185	29	0.005

Two models were compared. The simpler model (fitifa1) has the lower AIC, SABIC, HQ, and BIC, so every information criterion favors it. The likelihood-ratio test for the more complex model (fitifa2) is significant (X2 = 52.185, df = 29, p = 0.005), meaning that its 29 additional parameters do raise the log-likelihood (from -15342.60 to -15316.51), but not by enough to offset the complexity penalty.

The IFA results therefore support the simpler model (fitifa1), which captures the factor structure without unnecessary complexity. The gain from the more complex model (fitifa2) is too small to justify its extra parameters on any of the penalized criteria. This indicates that the factor structure is relatively simple, and the items can be grouped using fewer factors.
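The quantities in the comparison table are linked by two standard formulas: the likelihood-ratio statistic is twice the difference in log-likelihood, and AIC equals -2 logLik + 2k, where k is the number of estimated parameters. The sketch below recovers the test statistic and the implied parameter counts from the reported values; the variable names are illustrative.

```python
# Values copied from the model comparison table.
loglik_1, aic_1 = -15342.60, 30805.21   # fitifa1 (simpler model)
loglik_2, aic_2 = -15316.51, 30811.02   # fitifa2 (more complex model)

# Likelihood-ratio statistic: twice the gain in log-likelihood.
x2 = 2 * (loglik_2 - loglik_1)

# Parameter counts implied by AIC = -2 * logLik + 2 * k (rounded because
# the table values are themselves rounded to two decimals).
k_1 = round((aic_1 + 2 * loglik_1) / 2)
k_2 = round((aic_2 + 2 * loglik_2) / 2)
df = k_2 - k_1

print(f"X2 = {x2:.2f} on {df} df; parameters: {k_1} vs {k_2}")
print(f"AIC difference (complex - simple) = {aic_2 - aic_1:.2f}")
```

The log-likelihood gain is smaller than the penalty of 2 points per extra parameter, which is why AIC prefers the simpler model even though the likelihood-ratio test is significant.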

The Rasch Model

Item Difficulty Parameters with 95% CI
	Item	Estimate	Std. Error	Lower CI	Upper CI
U2	U2	1.8276169	0.0885824	1.6539954	2.0012384
U3	U3	0.1303256	0.0717695	-0.0103426	0.2709937
U4	U4	-0.7862235	0.0738644	-0.9309977	-0.6414493
U5	U5	1.1265567	0.0781315	0.9734190	1.2796943
U6	U6	-2.1491982	0.0935869	-2.3326285	-1.9657680
U7	U7	0.9914796	0.0767251	0.8410984	1.1418609
U8	U8	1.7798918	0.0876863	1.6080266	1.9517570
U9	U9	-0.7472956	0.0736144	-0.8915798	-0.6030114
U10	U10	-1.8031336	0.0862896	-1.9722613	-1.6340060
U11	U11	-1.6554300	0.0837052	-1.8194922	-1.4913679
U12	U12	1.9943255	0.0919516	1.8141004	2.1745505
U13	U13	-1.4821102	0.0810397	-1.6409480	-1.3232723
U14	U14	0.6493271	0.0739466	0.5043917	0.7942624
U15	U15	-0.8535999	0.0743329	-0.9992924	-0.7079074
U16	U16	0.1145091	0.0717409	-0.0261031	0.2551213
U17	U17	-0.9679428	0.0752333	-1.1154001	-0.8204854
U18	U18	-1.9284397	0.0887228	-2.1023363	-1.7545431
U19	U19	-0.1322831	0.0715829	-0.2725856	0.0080193
U20	U20	-1.0086460	0.0755867	-1.1567960	-0.8604961
U21	U21	-2.0528812	0.0913706	-2.2319675	-1.8737948
U22	U22	0.1197798	0.0717501	-0.0208505	0.2604101
U23	U23	1.0954197	0.0777908	0.9429497	1.2478898
U24	U24	0.9494793	0.0763245	0.7998832	1.0990753
U25	U25	1.9261390	0.0905277	1.7487046	2.1035734
U26	U26	0.9138198	0.0759976	0.7648646	1.0627751
U27	U27	-0.2110157	0.0716476	-0.3514449	-0.0705865
U28	U28	1.5759768	0.0841794	1.4109852	1.7409684
U29	U29	-1.1091829	0.0765347	-1.2591910	-0.9591747
U30	U30	1.7253959	0.0866988	1.5554661	1.8953256

Item Easiness Parameters with 95% CI
	Item	Estimate	Std. Error	Lower CI	Upper CI
beta U1	beta U1	0.0326600	0.0715813	-0.1076394	0.1729594
beta U2	beta U2	-1.8276169	0.0885824	-2.0012384	-1.6539954
beta U3	beta U3	-0.1303256	0.0717695	-0.2709937	0.0103426
beta U4	beta U4	0.7862235	0.0738644	0.6414493	0.9309977
beta U5	beta U5	-1.1265567	0.0781315	-1.2796943	-0.9734190
beta U6	beta U6	2.1491982	0.0935869	1.9657680	2.3326285
beta U7	beta U7	-0.9914796	0.0767251	-1.1418609	-0.8410984
beta U8	beta U8	-1.7798918	0.0876863	-1.9517570	-1.6080266
beta U9	beta U9	0.7472956	0.0736144	0.6030114	0.8915798
beta U10	beta U10	1.8031336	0.0862896	1.6340060	1.9722613
beta U11	beta U11	1.6554300	0.0837052	1.4913679	1.8194922
beta U12	beta U12	-1.9943255	0.0919516	-2.1745505	-1.8141004
beta U13	beta U13	1.4821102	0.0810397	1.3232723	1.6409480
beta U14	beta U14	-0.6493271	0.0739466	-0.7942624	-0.5043917
beta U15	beta U15	0.8535999	0.0743329	0.7079074	0.9992924
beta U16	beta U16	-0.1145091	0.0717409	-0.2551213	0.0261031
beta U17	beta U17	0.9679428	0.0752333	0.8204854	1.1154001
beta U18	beta U18	1.9284397	0.0887228	1.7545431	2.1023363
beta U19	beta U19	0.1322831	0.0715829	-0.0080193	0.2725856
beta U20	beta U20	1.0086460	0.0755867	0.8604961	1.1567960
beta U21	beta U21	2.0528812	0.0913706	1.8737948	2.2319675
beta U22	beta U22	-0.1197798	0.0717501	-0.2604101	0.0208505
beta U23	beta U23	-1.0954197	0.0777908	-1.2478898	-0.9429497
beta U24	beta U24	-0.9494793	0.0763245	-1.0990753	-0.7998832
beta U25	beta U25	-1.9261390	0.0905277	-2.1035734	-1.7487046
beta U26	beta U26	-0.9138198	0.0759976	-1.0627751	-0.7648646
beta U27	beta U27	0.2110157	0.0716476	0.0705865	0.3514449
beta U28	beta U28	-1.5759768	0.0841794	-1.7409684	-1.4109852
beta U29	beta U29	1.1091829	0.0765347	0.9591747	1.2591910
beta U30	beta U30	-1.7253959	0.0866988	-1.8953256	-1.5554661

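The Rasch model assumes equal discrimination for all items, with discrimination set to 1. Item difficulty (eta) ranges from -2.149 (U6) to 1.994 (U12), showing a wide range of difficulty levels. Each easiness parameter (beta) is the negative of the corresponding difficulty. U1 does not appear in the difficulty table; its difficulty follows from its easiness (0.033) as -0.033.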

While the Rasch model is good for estimating item difficulty, its assumption of equal discrimination makes it less suitable for this dataset. The variation in difficulty suggests some items are easier (U6) and others harder (U12). However, by assuming equal discrimination, the model doesn’t account for how well items distinguish between respondents with different abilities. This means the Rasch model may oversimplify the data, so it might not be the best choice for clustering based on difficulty and discrimination.
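To illustrate what the difficulty estimates imply, the Rasch response probability can be evaluated for the easiest and hardest items. This is a minimal sketch of the standard Rasch formula on the logit scale; the function name and the choice of an average-ability person (theta = 0) are illustrative.

```python
import math

def rasch_prob(theta, difficulty):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

# Difficulty estimates from the table above (rounded to three decimals).
difficulty = {"U6": -2.149, "U12": 1.994}

# Easiness is simply the negated difficulty.
easiness = {item: -b for item, b in difficulty.items()}

for item, b in difficulty.items():
    p = rasch_prob(theta=0.0, difficulty=b)
    print(f"{item}: difficulty = {b:+.3f}, P(correct | theta = 0) = {p:.3f}")
```

Because every item shares the same slope, the curves for U6 and U12 differ only in their location along the ability scale, which is exactly the limitation discussed above.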

Two-Parameter Logistic Model

Summary of 2PL Model
	Item	Value	Standard_error	Z_value
Dffclt.U1	Dffclt.U1	-0.0008481	0.0610828	-0.013884
Dffclt.U2	Dffclt.U2	1.2687406	0.0901622	14.071760
Dffclt.U3	Dffclt.U3	0.1361012	0.0699564	1.945515
Dffclt.U4	Dffclt.U4	-0.6242749	0.0798422	-7.818856
Dffclt.U5	Dffclt.U5	0.9297283	0.0892332	10.419083
Dffclt.U6	Dffclt.U6	-2.1987512	0.2488342	-8.836208
Dffclt.U7	Dffclt.U7	0.9480237	0.1049707	9.031317
Dffclt.U8	Dffclt.U8	1.6488531	0.1552095	10.623404
Dffclt.U9	Dffclt.U9	-0.6270314	0.0849658	-7.379806
Dffclt.U10	Dffclt.U10	-1.3526105	0.1096486	-12.335868
Dffclt.U11	Dffclt.U11	-1.3128331	0.1134219	-11.574781
Dffclt.U12	Dffclt.U12	1.6436196	0.1378746	11.921116
Dffclt.U13	Dffclt.U13	-1.0858853	0.0911028	-11.919336
Dffclt.U14	Dffclt.U14	0.5363938	0.0728710	7.360864
Dffclt.U15	Dffclt.U15	-0.5365018	0.0598813	-8.959423
Dffclt.U16	Dffclt.U16	0.0976415	0.0562389	1.736190
Dffclt.U17	Dffclt.U17	-0.7645914	0.0839846	-9.103949
Dffclt.U18	Dffclt.U18	-1.9453258	0.2115523	-9.195482
Dffclt.U19	Dffclt.U19	-0.1104972	0.0930060	-1.188065
Dffclt.U20	Dffclt.U20	-0.7144043	0.0727424	-9.821011
Dffclt.U21	Dffclt.U21	-1.8194657	0.1740250	-10.455195
Dffclt.U22	Dffclt.U22	0.1128470	0.0621808	1.814822
Dffclt.U23	Dffclt.U23	0.8628041	0.0817048	10.560012
Dffclt.U24	Dffclt.U24	0.7597336	0.0786717	9.657008
Dffclt.U25	Dffclt.U25	1.3611040	0.0974469	13.967653
Dffclt.U26	Dffclt.U26	0.7116288	0.0745092	9.550886
Dffclt.U27	Dffclt.U27	-0.1587176	0.0749001	-2.119059
Dffclt.U28	Dffclt.U28	1.1332858	0.0862080	13.145950
Dffclt.U29	Dffclt.U29	-0.9326043	0.0976556	-9.549933
Dffclt.U30	Dffclt.U30	1.3144806	0.1029637	12.766445
Dscrmn.U1	Dscrmn.U1	1.4431905	0.1178069	12.250471
Dscrmn.U2	Dscrmn.U2	1.6390342	0.1486804	11.023876
Dscrmn.U3	Dscrmn.U3	1.1620367	0.1017599	11.419396
Dscrmn.U4	Dscrmn.U4	1.1862358	0.1064482	11.143784
Dscrmn.U5	Dscrmn.U5	1.2303873	0.1110062	11.083948
Dscrmn.U6	Dscrmn.U6	0.8738399	0.1135162	7.697929
Dscrmn.U7	Dscrmn.U7	0.9910584	0.0969653	10.220753
Dscrmn.U8	Dscrmn.U8	1.0154358	0.1087823	9.334565
Dscrmn.U9	Dscrmn.U9	1.0850009	0.1004593	10.800399
Dscrmn.U10	Dscrmn.U10	1.3647625	0.1334303	10.228283
Dscrmn.U11	Dscrmn.U11	1.2405463	0.1216962	10.193796
Dscrmn.U12	Dscrmn.U12	1.2056943	0.1234510	9.766585
Dscrmn.U13	Dscrmn.U13	1.4105109	0.1295309	10.889380
Dscrmn.U14	Dscrmn.U14	1.2741796	0.1092822	11.659530
Dscrmn.U15	Dscrmn.U15	1.8623600	0.1526118	12.203254
Dscrmn.U16	Dscrmn.U16	1.7056954	0.1344097	12.690273
Dscrmn.U17	Dscrmn.U17	1.2135765	0.1099691	11.035613
Dscrmn.U18	Dscrmn.U18	0.8848572	0.1085157	8.154189
Dscrmn.U19	Dscrmn.U19	0.7781632	0.0839225	9.272405
Dscrmn.U20	Dscrmn.U20	1.4692961	0.1254177	11.715217
Dscrmn.U21	Dscrmn.U21	1.0589969	0.1201081	8.817030
Dscrmn.U22	Dscrmn.U22	1.4077961	0.1155528	12.183141
Dscrmn.U23	Dscrmn.U23	1.3319425	0.1160957	11.472801
Dscrmn.U24	Dscrmn.U24	1.3076825	0.1132013	11.551837
Dscrmn.U25	Dscrmn.U25	1.5777773	0.1463352	10.781942
Dscrmn.U26	Dscrmn.U26	1.3736622	0.1164977	11.791320
Dscrmn.U27	Dscrmn.U27	1.0551151	0.0964734	10.936852
Dscrmn.U28	Dscrmn.U28	1.5408897	0.1365496	11.284472
Dscrmn.U29	Dscrmn.U29	1.1100887	0.1059345	10.479010
Dscrmn.U30	Dscrmn.U30	1.3802906	0.1288097	10.715736

The 2PL model estimates both item difficulty and discrimination, making it more flexible than the Rasch model. Discrimination ranges from 0.778 (U19) to 1.862 (U15), and difficulty ranges from -2.199 (U6) to 1.649 (U8). The model has a good fit, with a lower AIC (30806.27) than both the Rasch and 3PL models.

The 2PL model is the best fit for this data because it allows each item to have its own discrimination value, unlike the Rasch model. High-discrimination items (e.g., U15 and U16) better differentiate between participants of different abilities, while low-discrimination items (e.g., U19) are less informative. The wide range of difficulty also shows that items cover a broad ability spectrum, making the 2PL model well suited for grouping items based on difficulty and discrimination.
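The effect of discrimination can be made concrete with the item information function, which for the 2PL model is a^2 * P * (1 - P) and peaks at a^2 / 4 when ability equals the item difficulty. The sketch below compares the most and least discriminating items; it assumes the plain logistic metric (no 1.7 scaling constant), and the function names are illustrative.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response (logistic metric, assumed)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# (discrimination a, difficulty b) from the 2PL table, rounded.
items = {"U15": (1.862, -0.537), "U19": (0.778, -0.110)}

for name, (a, b) in items.items():
    # Information is highest where theta equals the item difficulty.
    peak = item_information(b, a, b)
    print(f"{name}: a = {a:.3f}, peak information = {peak:.3f}")
```

The peak information of U15 is several times that of U19, which is why the two items contribute very differently to measurement precision despite having similar difficulty.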

Three-Parameter Logistic Model

Summary of 3PL Model
Item	Value	Standard_error	Z_value
Gussng.U1	0.0000009	0.0003609	0.0025857
Gussng.U2	0.0000002	0.0000582	0.0031701
Gussng.U3	0.0357052	0.0632781	0.5642578
Gussng.U4	0.0001960	NaN	NaN
Gussng.U5	0.0000000	0.0000194	0.0010626
Gussng.U6	0.0015934	0.0801375	0.0198834
Gussng.U7	0.0000431	0.0033458	0.0128825
Gussng.U8	0.0458528	0.0274881	1.6680980
Gussng.U9	0.0000029	0.0008204	0.0034785
Gussng.U10	0.2851522	0.1370039	2.0813445
Gussng.U11	0.0000032	0.0009037	0.0034951
Gussng.U12	0.0211417	0.0205739	1.0275975
Gussng.U13	0.1855913	0.1451978	1.2781969
Gussng.U14	0.0459580	0.0406205	1.1313990
Gussng.U15	0.0000048	0.0009051	0.0052610
Gussng.U16	0.0647455	0.0356979	1.8137065
Gussng.U17	0.2326992	0.0933739	2.4921230
Gussng.U18	0.3740668	0.2431689	1.5383005
Gussng.U19	0.1901005	0.1051988	1.8070599
Gussng.U20	0.0000008	0.0004119	0.0019622
Gussng.U21	0.0000156	0.0025430	0.0061353
Gussng.U22	0.0512474	0.0511980	1.0009661
Gussng.U23	0.0383912	0.0307128	1.2500078
Gussng.U24	0.0150560	0.0325145	0.4630556
Gussng.U25	0.0000001	0.0000436	0.0024084
Gussng.U26	0.0000007	0.0002423	0.0028031
Gussng.U27	0.2032610	0.0576739	3.5243161
Gussng.U28	0.0000071	0.0012296	0.0057429
Gussng.U29	0.1427725	0.1731639	0.8244932
Gussng.U30	0.0089231	0.0192527	0.4634710
Dffclt.U1	0.0119097	0.0601326	0.1980570
Dffclt.U2	1.2482655	0.0879533	14.1923610
Dffclt.U3	0.2214603	0.1507873	1.4686931
Dffclt.U4	-0.6136286	0.0700249	-8.7630005
Dffclt.U5	0.9197074	0.0868735	10.5867402
Dffclt.U6	-2.1935655	0.2917418	-7.5188580
Dffclt.U7	0.9366097	0.1021023	9.1732467
Dffclt.U8	1.5919940	0.1340811	11.8733703
Dffclt.U9	-0.6174472	0.0851973	-7.2472659
Dffclt.U10	-0.8258273	0.2935379	-2.8133584
Dffclt.U11	-1.3117914	0.1142150	-11.4852777
Dffclt.U12	1.5794644	0.1259695	12.5384683
Dffclt.U13	-0.7455169	0.2877414	-2.5909267
Dffclt.U14	0.6248992	0.1025810	6.0917611
Dffclt.U15	-0.5208393	0.0599754	-8.6842191
Dffclt.U16	0.2244899	0.0804311	2.7910815
Dffclt.U17	-0.2573540	0.2162693	-1.1899700
Dffclt.U18	-1.0038987	0.7850804	-1.2787208
Dffclt.U19	0.4474745	0.3085590	1.4502071
Dffclt.U20	-0.7028142	0.0730983	-9.6146460
Dffclt.U21	-1.8180335	0.1722984	-10.5516581
Dffclt.U22	0.2227416	0.1139132	1.9553616
Dffclt.U23	0.9106879	0.0887507	10.2611955
Dffclt.U24	0.7816172	0.0929963	8.4048211
Dffclt.U25	1.3373744	0.0950551	14.0694640
Dffclt.U26	0.7114651	0.0728489	9.7663077
Dffclt.U27	0.3234919	0.1439763	2.2468406
Dffclt.U28	1.1216178	0.0843141	13.3028452
Dffclt.U29	-0.6201115	0.3983104	-1.5568551
Dffclt.U30	1.2926744	0.0986429	13.1045811
Dscrmn.U1	1.4603442	0.1195811	12.2121638
Dscrmn.U2	1.6885134	0.1548361	10.9051677
Dscrmn.U3	1.2613939	0.1898812	6.6430674
Dscrmn.U4	1.1859815	0.1043215	11.3685256
Dscrmn.U5	1.2624734	0.1149204	10.9856294
Dscrmn.U6	0.8747084	0.1133934	7.7139283
Dscrmn.U7	1.0159475	0.1002129	10.1378939
Dscrmn.U8	1.3804055	0.2852295	4.8396307
Dscrmn.U9	1.0855059	0.1005572	10.7949052
Dscrmn.U10	1.6887556	0.3036448	5.5616156
Dscrmn.U11	1.2328643	0.1200453	10.2699879
Dscrmn.U12	1.4563134	0.2664375	5.4658720
Dscrmn.U13	1.6456813	0.2951939	5.5749172
Dscrmn.U14	1.4758585	0.2152583	6.8562199
Dscrmn.U15	1.8573273	0.1526182	12.1697658
Dscrmn.U16	2.0487720	0.2551623	8.0292905
Dscrmn.U17	1.6200036	0.2905938	5.5748050
Dscrmn.U18	1.0774335	0.2907628	3.7055417
Dscrmn.U19	1.0642060	0.2716899	3.9169878
Dscrmn.U20	1.4663331	0.1249919	11.7314238
Dscrmn.U21	1.0584922	0.1178840	8.9791004
Dscrmn.U22	1.5947631	0.2260503	7.0549044
Dscrmn.U23	1.5901450	0.2452181	6.4846162
Dscrmn.U24	1.4031319	0.1966532	7.1350558
Dscrmn.U25	1.6256323	0.1524307	10.6647299
Dscrmn.U26	1.3975489	0.1195062	11.6943602
Dscrmn.U27	1.5916525	0.2682938	5.9324987
Dscrmn.U28	1.5741189	0.1411647	11.1509409
Dscrmn.U29	1.2466152	0.2505410	4.9756932
Dscrmn.U30	1.5016014	0.2239273	6.7057534

The 3PL model adds a guessing parameter (c), the lower asymptote that represents the chance of a correct answer by guessing. This parameter is near zero for most items: 23 of the 30 estimates are below 0.07. Seven items (U10, U13, U17, U18, U19, U27, U29) have larger estimates, between 0.14 and 0.37, but with large standard errors; only U10, U17, and U27 have z-values above 1.96. The standard error for U4 could not be estimated (NaN). The model’s AIC (30843.97) is higher than that of the 2PL model (30806.27), indicating a worse penalized fit.

Since the guessing parameter contributes little, the 3PL model adds complexity without improving fit, and the higher AIC suggests it is over-parameterized. Guessing does not appear to be a major factor in how participants answered, so the simpler 2PL model is better suited for analyzing item properties and forming balanced item clusters.
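The role of the guessing parameter is easiest to see in the response function itself: c sets a floor that the probability approaches for very low ability, and with c = 0 the function reduces to the 2PL form. The sketch below shows this together with the AIC comparison from the text; it assumes the plain logistic metric, and the function name is illustrative.

```python
import math

def p_3pl(theta, a, b, c):
    """3PL probability: lower asymptote c plus a scaled 2PL curve."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# U18 has the largest guessing estimate in the table (values rounded).
a, b, c = 1.077, -1.004, 0.374

# For very low ability the probability approaches the guessing floor c.
print(f"P(theta = -6) = {p_3pl(-6.0, a, b, c):.3f}")

# With c = 0 the same function is the 2PL curve.
print(f"P(theta = 0, c = 0) = {p_3pl(0.0, a, b, 0.0):.3f}")

# AIC values reported in the text; lower is better.
aic_2pl, aic_3pl = 30806.27, 30843.97
print(f"AIC(3PL) - AIC(2PL) = {aic_3pl - aic_2pl:.2f}")
```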

Cluster Analysis

Item Difficulty and Discrimination Parameters
	Item	Difficulty	Discrimination
U1	U1	-0.0008481	1.4431905
U2	U2	1.2687406	1.6390342
U3	U3	0.1361012	1.1620367
U4	U4	-0.6242749	1.1862358
U5	U5	0.9297283	1.2303873
U6	U6	-2.1987512	0.8738399
U7	U7	0.9480237	0.9910584
U8	U8	1.6488531	1.0154358
U9	U9	-0.6270314	1.0850009
U10	U10	-1.3526105	1.3647625
U11	U11	-1.3128331	1.2405463
U12	U12	1.6436196	1.2056943
U13	U13	-1.0858853	1.4105109
U14	U14	0.5363938	1.2741796
U15	U15	-0.5365018	1.8623600
U16	U16	0.0976415	1.7056954
U17	U17	-0.7645914	1.2135765
U18	U18	-1.9453258	0.8848572
U19	U19	-0.1104972	0.7781632
U20	U20	-0.7144043	1.4692961
U21	U21	-1.8194657	1.0589969
U22	U22	0.1128470	1.4077961
U23	U23	0.8628041	1.3319425
U24	U24	0.7597336	1.3076825
U25	U25	1.3611040	1.5777773
U26	U26	0.7116288	1.3736622
U27	U27	-0.1587176	1.0551151
U28	U28	1.1332858	1.5408897
U29	U29	-0.9326043	1.1100887
U30	U30	1.3144806	1.3802906

This analysis determines the best number of clusters for grouping test items based on their difficulty and discrimination using K-Means clustering. The Elbow Method indicates an inflection point at k = 3, meaning that adding more clusters does not significantly reduce within-cluster variance. The Silhouette Method also peaks at k = 3, confirming that this number offers the best balance between cluster cohesion and separation.

Therefore, three clusters provide the ideal solution for grouping items with similar characteristics, helping to balance difficulty and discrimination in item sampling.
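The Elbow Method described above can be sketched by running K-Means for several values of k and recording the within-cluster sum of squares (WCSS). The following is a plain-Python illustration of Lloyd's algorithm with random restarts, not the implementation used in the analysis. It clusters the unscaled parameters, which is an assumption, since the source does not say whether the variables were standardized; the Silhouette computation is not shown.

```python
import random

# (difficulty, discrimination) for U1..U30, rounded from the 2PL table.
points = [
    (-0.001, 1.443), (1.269, 1.639), (0.136, 1.162), (-0.624, 1.186),
    (0.930, 1.230), (-2.199, 0.874), (0.948, 0.991), (1.649, 1.015),
    (-0.627, 1.085), (-1.353, 1.365), (-1.313, 1.241), (1.644, 1.206),
    (-1.086, 1.411), (0.536, 1.274), (-0.537, 1.862), (0.098, 1.706),
    (-0.765, 1.214), (-1.945, 0.885), (-0.110, 0.778), (-0.714, 1.469),
    (-1.819, 1.059), (0.113, 1.408), (0.863, 1.332), (0.760, 1.308),
    (1.361, 1.578), (0.712, 1.374), (-0.159, 1.055), (1.133, 1.541),
    (-0.933, 1.110), (1.314, 1.380),
]

def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def centroid(members):
    n = len(members)
    return (sum(m[0] for m in members) / n, sum(m[1] for m in members) / n)

def kmeans(points, k, n_init=20, max_iter=100, seed=0):
    """Lloyd's algorithm with random restarts; returns (best WCSS, labels)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centers = rng.sample(points, k)  # k distinct items as starting centers
        for _ in range(max_iter):
            # Assign each item to its nearest center.
            labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
            # Recompute centers; keep the old one if a cluster is empty.
            new_centers = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                new_centers.append(centroid(members) if members else centers[j])
            if new_centers == centers:
                break
            centers = new_centers
        wcss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        if best is None or wcss < best[0]:
            best = (wcss, labels)
    return best

# WCSS for k = 1..6; the "elbow" is where the decrease levels off.
wcss_by_k = {k: kmeans(points, k)[0] for k in range(1, 7)}
for k, wcss in wcss_by_k.items():
    print(f"k = {k}: WCSS = {wcss:.3f}")
```

Plotting WCSS against k and looking for the point where the curve flattens reproduces the elbow check; results on unscaled data may differ from those obtained after standardization.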

Clustering & Conclusion

Clusters
	Item	Difficulty	Discrimination	Cluster
U6	U6	-2.1987512	0.8738399	1
U10	U10	-1.3526105	1.3647625	1
U11	U11	-1.3128331	1.2405463	1
U13	U13	-1.0858853	1.4105109	1
U18	U18	-1.9453258	0.8848572	1
U21	U21	-1.8194657	1.0589969	1
U29	U29	-0.9326043	1.1100887	1
U1	U1	-0.0008481	1.4431905	2
U3	U3	0.1361012	1.1620367	2
U4	U4	-0.6242749	1.1862358	2
U9	U9	-0.6270314	1.0850009	2
U15	U15	-0.5365018	1.8623600	2
U16	U16	0.0976415	1.7056954	2
U17	U17	-0.7645914	1.2135765	2
U19	U19	-0.1104972	0.7781632	2
U20	U20	-0.7144043	1.4692961	2
U22	U22	0.1128470	1.4077961	2
U27	U27	-0.1587176	1.0551151	2
U2	U2	1.2687406	1.6390342	3
U5	U5	0.9297283	1.2303873	3
U7	U7	0.9480237	0.9910584	3
U8	U8	1.6488531	1.0154358	3
U12	U12	1.6436196	1.2056943	3
U14	U14	0.5363938	1.2741796	3
U23	U23	0.8628041	1.3319425	3
U24	U24	0.7597336	1.3076825	3
U25	U25	1.3611040	1.5777773	3
U26	U26	0.7116288	1.3736622	3
U28	U28	1.1332858	1.5408897	3
U30	U30	1.3144806	1.3802906	3

Clustering of Items Using the 2PL Model and K-Means (k = 3)

Items were grouped based on difficulty and discrimination parameters using the Two-Parameter Logistic (2PL) model and K-Means clustering with k = 3. The 2PL model accounts for differences in item discrimination, resulting in a more accurate psychometric evaluation.

Cluster 1 contains the seven easiest items (difficulty from -2.20 to -0.93) and includes weakly discriminating items such as U6 and U18 (both below 0.9), which differentiate less sharply between individuals. Cluster 2 contains eleven items of moderate difficulty (-0.76 to 0.14), among them the two most discriminating items (U15 and U16). Cluster 3 contains the twelve most difficult items (0.54 to 1.65), most with discrimination above 1.2 (e.g., U2, U30). This clustering approach enables balanced item sampling across the psychometric profiles.
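Balanced item sampling can be sketched as stratified random sampling: each test form draws the same number of items from every cluster. The example below uses the cluster memberships from the table above; the function name, the number of items per cluster, and the seed are hypothetical choices made for illustration.

```python
import random

# Cluster membership copied from the Clusters table.
clusters = {
    1: ["U6", "U10", "U11", "U13", "U18", "U21", "U29"],
    2: ["U1", "U3", "U4", "U9", "U15", "U16", "U17", "U19", "U20", "U22", "U27"],
    3: ["U2", "U5", "U7", "U8", "U12", "U14", "U23", "U24", "U25", "U26",
        "U28", "U30"],
}

def draw_form(clusters, per_cluster, rng):
    """Draw the same number of items, without replacement, from each cluster."""
    form = []
    for cluster_id in sorted(clusters):
        form.extend(rng.sample(clusters[cluster_id], per_cluster))
    return form

rng = random.Random(2024)  # fixed seed so the draw is reproducible
form_a = draw_form(clusters, per_cluster=3, rng=rng)
form_b = draw_form(clusters, per_cluster=3, rng=rng)

print("Form A:", form_a)
print("Form B:", form_b)
```

Independent draws like these may share items between forms. If forms must not overlap, the items already used would need to be removed from each cluster before the next draw.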

Conclusion: Psychometric Equivalence via Clustering

In conclusion, the analysis grouped the 30 items by their difficulty and by their ability to differentiate between test-takers, which supports psychometric equivalence across test forms. Using K-Means clustering, supported by the Elbow and Silhouette methods, the items were divided into three clusters of 7, 11, and 12 items. Drawing items from every cluster helps keep each test form fair and consistent, covering a wide range of ability levels while maintaining clear distinctions between performance levels. Overall, this approach supports the creation of psychometrically reliable and valid tests.

Task number 9

Fabian Moreno Almeida

21.09.2024

Test equivalence of psychometric properties
