1, When the other assumptions hold but unidimensionality does not, should we simply declare that “these items are not good”? Because the latent variable may then reflect more than “wrist condition” alone, it is hard to study health condition from these items.
2, According to the reference “An introduction to the Rasch measurement model: An example using the Hospital Anxiety and Depression Scale (HADS)”, when there is a significant indication of DIF, even after Bonferroni adjustment, we should split the group into two subgroups to obtain unbiased estimates. Do you agree with that?
3, For local dependency, there are many criteria, and their statements do not contain a clear statistical definition, so I am still searching reference papers for the final criterion. If you know of any clearly defined criterion, please let me know.
##   mhq1a1 mhq1a2 mhq1a3 mhq1a4 mhq1a5
## 0     66     88     60     30     81
## 1    107     97    102    102     88
## 2     51     34     52     70     46
## 3      6      9     14     24     13
## 4      3      5      5      7      5
In the rating scale model (RSM), all items share the same set of thresholds. In the partial credit model (PCM), each item can have its own thresholds.
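The difference between the two models can be made concrete with a minimal sketch of the category probabilities. The function names `pcm_category_probs` and `rsm_thresholds`, and all numeric values, are hypothetical illustrations and not estimates from this data; the sketch assumes the usual adjacent-category (Masters) form of the PCM.

```python
import math

def pcm_category_probs(theta, thresholds):
    """Category probabilities for one item under the partial credit model.

    theta: person location in logits.
    thresholds: m threshold locations for an item scored 0..m.
    """
    # Category x gets the sum of (theta - threshold_k) for k = 1..x;
    # category 0 corresponds to the empty sum, i.e. 0.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - tau))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

def rsm_thresholds(item_location, shared_steps):
    """RSM restriction: one shared step pattern, shifted by the item location."""
    return [item_location + step for step in shared_steps]

# Hypothetical values for illustration only.
shared_steps = [-1.5, -0.5, 0.5, 1.5]
item_a = rsm_thresholds(0.0, shared_steps)
item_b = rsm_thresholds(1.0, shared_steps)

# Under the RSM the two items differ only by a shift, so a person at
# theta = 0 on item_a has the same probabilities as a person at theta = 1
# on item_b. Under the PCM the thresholds of item_b could be any values.
probs_a = pcm_category_probs(0.0, item_a)
probs_b = pcm_category_probs(1.0, item_b)
```

Because the RSM is the PCM with this restriction imposed, the two models are nested and can be compared with a likelihood-ratio test.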
## LR statistic: 31.85512 df = 12 p = 0.001456823
The result is significant, so we take the PCM as our final model; i.e., we cannot simplify the PCM to the RSM.
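As a check on the reported p-value, the upper tail of the chi-square distribution can be evaluated directly. This is a minimal sketch using only the Python standard library; the helper name `chi2_sf` is hypothetical, and it relies on the closed-form series that exist for integer degrees of freedom rather than on any statistics package.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^i / i!.
        half = x / 2.0
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: two-sided normal tail plus a finite series in sqrt(x).
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2.0))
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    return tail + 2.0 * density * total

# LR statistic and df as printed above for the RSM vs PCM comparison.
p_value = chi2_sf(31.85512, 12)
reject_rsm = p_value < 0.05  # the 0.05 level is an assumption
```

The computed value is approximately 0.00146, in line with the printed p-value.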
There are no disordered thresholds in the PCM.
## mhq1a1 mhq1a2 mhq1a3 mhq1a4 mhq1a5
## 1.000 1.000 1.000 0.992 0.170
The p-values for all items are shown above. For domain 1, all items fit well.
## P9 P44 P75 P79 P92 P99 P100 P107 P135 P160 P164
## 0.0023 0.0019 0.0041 0.0019 0.0039 0.0019 0.0299 0.0042 0.0265 0.0367 0.0289
## P178 P180 P192 P193 P197 P198 P205 P208 P212 P214 P220
## 0.0366 0.0438 0.0195 0.0178 0.0032 0.0036 0.0000 0.0172 0.0119 0.0328 0.0002
These persons do not fit the model well (all of the p-values listed are below 0.05).
Cronbach’s alpha is a measure of internal consistency, that is, how closely related a set of items are as a group. It is considered to be a measure of scale reliability.
The definition:
\[\alpha = \frac{p}{p-1} \left(1 - \frac{\sum_{i=1}^{p} \sigma_{y_i}^2}{\sigma_x^2}\right)\]
Here p is the number of items, \(\sigma_{y_i}^2\) is the variance of the i-th item, and \(\sigma_x^2\) is the variance of the total score.
## [1] 0.918483
\(\alpha > 0.9\) indicates excellent internal consistency; for an exploratory study, \(\alpha > 0.7\) is acceptable. The value of 0.918 obtained here is therefore excellent.
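The definition above translates directly into a few lines of code. The following is a minimal sketch, assuming complete data (no missing responses) and sample variances; the function name `cronbach_alpha` and the toy scores are hypothetical and are not taken from the study data.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of person rows, one value per item."""
    n_items = len(scores[0])
    # Sample variance of each item across persons.
    item_vars = [variance([row[j] for row in scores]) for j in range(n_items)]
    # Sample variance of the total score across persons.
    total_var = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical toy data: 4 persons answering 3 items.
toy_scores = [
    [0, 1, 1],
    [1, 1, 2],
    [2, 3, 2],
    [3, 3, 4],
]
alpha = cronbach_alpha(toy_scores)
```

For the toy data the item variances sum to 13.75/3 and the total-score variance is 12.25, so alpha is 1.5 × (1 − 13.75/36.75), roughly 0.94.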
Model 1: item difficulty + person_ability
Model 2: item difficulty + person_ability + “dominant = injury”
Model 3: item difficulty + person_ability + “dominant = injury” + person_ability \(\times\) “dominant = injury”
A significant difference between Model 1 and Model 2 indicates uniform DIF; a significant difference between Model 2 and Model 3 indicates non-uniform DIF.
##   item ncat  chi12  chi13  chi23 beta12 pseudo12.McFadden pseudo13.McFadden
## 1    1    3 0.1891 0.2519 0.3094 0.0316            0.0041            0.0065
## 2    2    3 0.5361 0.8180 0.8901 0.0051            0.0009            0.0009
## 3    3    3 0.0815 0.2129 0.8095 0.0137            0.0072            0.0073
## 4    4    4 0.5421 0.8298 0.9684 0.0075            0.0008            0.0008
## 5    5    3 0.1982 0.0913 0.0768 0.0013            0.0038            0.0109
##   pseudo23.McFadden pseudo12.Nagelkerke pseudo13.Nagelkerke pseudo23.Nagelkerke
## 1            0.0024              0.0018              0.0029              0.0011
## 2            0.0000              0.0005              0.0006              0.0000
## 3            0.0001              0.0043              0.0044              0.0001
## 4            0.0000              0.0005              0.0005              0.0000
## 5            0.0071              0.0030              0.0086              0.0056
##   pseudo12.CoxSnell pseudo13.CoxSnell pseudo23.CoxSnell df12 df13 df23
## 1            0.0016            0.0025             9e-04    1    2    1
## 2            0.0005            0.0005             0e+00    1    2    1
## 3            0.0038            0.0039             1e-04    1    2    1
## 4            0.0004            0.0005             0e+00    1    2    1
## 5            0.0027            0.0076             5e-03    1    2    1
We focus on chi12, chi13 and chi23, the p-values of the tests comparing Model 1 with Model 2, Model 1 with Model 3, and Model 2 with Model 3, respectively.
If we delete the persons who do not fit well in step 4, then no item exhibits DIF.
If we do not delete the persons who do not fit well in step 4, item 3 shows significant uniform DIF and item 5 shows non-uniform DIF.
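The decision rule behind these statements can be sketched as a small helper. The p-values are copied from the chi12 and chi23 columns printed above; the function name `flag_dif`, the 0.05 and 0.10 levels, and the optional Bonferroni division by the number of items are assumptions made for illustration, not necessarily the settings used in the analysis.

```python
# p-values copied from the chi12 and chi23 columns of the table above.
chi12 = {1: 0.1891, 2: 0.5361, 3: 0.0815, 4: 0.5421, 5: 0.1982}
chi23 = {1: 0.3094, 2: 0.8901, 3: 0.8095, 4: 0.9684, 5: 0.0768}

def flag_dif(p12, p23, alpha=0.05, bonferroni=False):
    """Return, per item, which kinds of DIF are flagged at the given level."""
    # Optional Bonferroni adjustment: divide the level by the number of items.
    level = alpha / len(p12) if bonferroni else alpha
    flags = {}
    for item in p12:
        kinds = []
        if p12[item] < level:  # Model 1 vs Model 2
            kinds.append("uniform")
        if p23[item] < level:  # Model 2 vs Model 3
            kinds.append("non-uniform")
        flags[item] = kinds
    return flags

flags_05 = flag_dif(chi12, chi23, alpha=0.05)
flags_10 = flag_dif(chi12, chi23, alpha=0.10)
flags_bonf = flag_dif(chi12, chi23, alpha=0.05, bonferroni=True)
```

With these inputs no item is flagged at the 0.05 level, with or without the Bonferroni adjustment, while at the more liberal 0.10 level item 3 is flagged for uniform DIF and item 5 for non-uniform DIF.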
##
## Martin-Loef-Test (split criterion: median)
## LR-value: 59.074
## Chi-square df: 95
## p-value: 0.999
We can assume that unidimensionality holds here; the Martin-Löf test gives no evidence against it.
##        mhq1a1 mhq1a2 mhq1a3 mhq1a4 mhq1a5
## mhq1a1   1.00  -0.13  -0.13  -0.18  -0.37
## mhq1a2  -0.13   1.00  -0.34  -0.31  -0.13
## mhq1a3  -0.13  -0.34   1.00  -0.19  -0.34
## mhq1a4  -0.18  -0.31  -0.19   1.00  -0.29
## mhq1a5  -0.37  -0.13  -0.34  -0.29   1.00
##
## n= 187
##
##
## P
##        mhq1a1 mhq1a2 mhq1a3 mhq1a4 mhq1a5
## mhq1a1        0.0706 0.0850 0.0117 0.0000
## mhq1a2 0.0706        0.0000 0.0000 0.0736
## mhq1a3 0.0850 0.0000        0.0079 0.0000
## mhq1a4 0.0117 0.0000 0.0079        0.0000
## mhq1a5 0.0000 0.0736 0.0000 0.0000
Local dependence does not usually impact the ordering of the measures, only their spacing.
Accordingly, any statistical tests based on differences between these Rasch measures should be interpreted conservatively, so that differences between measures need to be slightly larger than, say, a t-test would ordinarily require in order to be declared “significant”.
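One explicit way to screen for local dependence is to compare the largest residual correlation with the average of all off-diagonal correlations. The sketch below assumes that the matrix printed above contains residual correlations (Yen's Q3), and it uses a cut-off of 0.2 above the average, a rule of thumb from the literature on Q3 rather than a value set by this analysis. The function name `q3_star` is hypothetical.

```python
# Correlation matrix copied from the output above (domain 1 items).
corr = [
    [1.00, -0.13, -0.13, -0.18, -0.37],
    [-0.13, 1.00, -0.34, -0.31, -0.13],
    [-0.13, -0.34, 1.00, -0.19, -0.34],
    [-0.18, -0.31, -0.19, 1.00, -0.29],
    [-0.37, -0.13, -0.34, -0.29, 1.00],
]

def q3_star(matrix):
    """Return (largest minus mean off-diagonal correlation, mean off-diagonal)."""
    k = len(matrix)
    # Upper triangle only: each item pair is counted once.
    off_diag = [matrix[i][j] for i in range(k) for j in range(i + 1, k)]
    mean_off = sum(off_diag) / len(off_diag)
    return max(off_diag) - mean_off, mean_off

excess, mean_off = q3_star(corr)
CUTOFF = 0.2  # assumed rule-of-thumb threshold, not taken from this analysis
local_dependence_flagged = excess > CUTOFF
```

For this matrix the average off-diagonal correlation is about -0.24, close to the value of -1/(k-1) = -0.25 expected for k = 5 locally independent items, and the largest correlation exceeds the average by about 0.11, which is below the assumed cut-off.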
##   mhq5a1 mhq5a2 mhq5a3 mhq5a4
## 0     89    136    149    150
## 1     80     47     45     46
## 2     36     28     21     21
## 3     20     17     12     10
## 4      8      5      6      6
In the rating scale model (RSM), all items share the same set of thresholds. In the partial credit model (PCM), each item can have its own thresholds.
## LR statistic: 50.50682 df = 9 p = 8.648575e-08
The result is significant, so we take the PCM as our final model; i.e., we cannot simplify the PCM to the RSM.
There are no disordered thresholds in the PCM.
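A check for disordered thresholds amounts to verifying that each item's estimated thresholds increase from one category boundary to the next. The sketch below is a minimal illustration; the function name `disordered_items`, the item labels, and the threshold values are hypothetical and are not the estimates from this model.

```python
def disordered_items(thresholds_by_item):
    """Return items whose thresholds are not strictly increasing.

    The result maps each offending item to the list of adjacent threshold
    pairs (1-based positions) that are out of order.
    """
    problems = {}
    for item, taus in thresholds_by_item.items():
        pairs = [
            (k + 1, k + 2)
            for k in range(len(taus) - 1)
            if taus[k] >= taus[k + 1]
        ]
        if pairs:
            problems[item] = pairs
    return problems

# Hypothetical threshold estimates in logits, for illustration only.
example_thresholds = {
    "item_a": [-1.2, -0.3, 0.8, 2.1],  # ordered
    "item_b": [-0.5, 0.9, 0.4, 1.7],   # thresholds 2 and 3 are reversed
}
problems = disordered_items(example_thresholds)
```

An empty result corresponds to the situation reported here, where every item's thresholds are in order.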
## mhq5a1 mhq5a2 mhq5a3 mhq5a4
## 0.872 1.000 1.000 1.000
The p-values for all items are shown above. For domain 5, all items fit well.
## P16 P52 P139 P152 P182 P184 P189 P217 P223
## 0.0059 0.0371 0.0030 0.0007 0.0047 0.0030 0.0059 0.0001 0.0030
These persons do not fit the model well (all of the p-values listed are below 0.05).
Cronbach’s alpha is a measure of internal consistency, that is, how closely related a set of items are as a group. It is considered to be a measure of scale reliability.
The definition:
\[\alpha = \frac{p}{p-1} \left(1 - \frac{\sum_{i=1}^{p} \sigma_{y_i}^2}{\sigma_x^2}\right)\]
Here p is the number of items, \(\sigma_{y_i}^2\) is the variance of the i-th item, and \(\sigma_x^2\) is the variance of the total score.
## [1] 0.8390395
\(\alpha > 0.9\) indicates excellent internal consistency; for an exploratory study, \(\alpha > 0.7\) is acceptable. The value of 0.839 obtained here is therefore acceptable, though below the 0.9 mark.
Model 1: item difficulty + person_ability
Model 2: item difficulty + person_ability + “dominant = injury”
Model 3: item difficulty + person_ability + “dominant = injury” + person_ability \(\times\) “dominant = injury”
A significant difference between Model 1 and Model 2 indicates uniform DIF; a significant difference between Model 2 and Model 3 indicates non-uniform DIF.
##   item ncat  chi12  chi13  chi23 beta12 pseudo12.McFadden pseudo13.McFadden
## 1    1    4 0.7364 0.9201 0.8174 0.0004            0.0002            0.0003
## 2    2    4 0.3660 0.6611 0.9184 0.0017            0.0017            0.0018
## 3    3    4 0.9225 0.9950 0.9802 0.0186            0.0000            0.0000
## 4    4    3 0.0663 0.1683 0.6609 0.0316            0.0090            0.0095
##   pseudo23.McFadden pseudo12.Nagelkerke pseudo13.Nagelkerke pseudo23.Nagelkerke
## 1             1e-04              0.0002              0.0003               1e-04
## 2             0e+00              0.0013              0.0013               0e+00
## 3             0e+00              0.0000              0.0000               0e+00
## 4             5e-04              0.0062              0.0065               4e-04
##   pseudo12.CoxSnell pseudo13.CoxSnell pseudo23.CoxSnell df12 df13 df23
## 1            0.0002            0.0003             1e-04    1    2    1
## 2            0.0011            0.0011             0e+00    1    2    1
## 3            0.0000            0.0000             0e+00    1    2    1
## 4            0.0051            0.0054             3e-04    1    2    1
##
## Martin-Loef-Test (split criterion: median)
## LR-value: 92.498
## Chi-square df: 63
## p-value: 0.009
According to the Martin-Löf test, there is significant evidence against unidimensionality.
##        mhq5a1 mhq5a2 mhq5a3 mhq5a4
## mhq5a1   1.00  -0.44  -0.48  -0.61
## mhq5a2  -0.44   1.00  -0.17  -0.22
## mhq5a3  -0.48  -0.17   1.00   0.13
## mhq5a4  -0.61  -0.22   0.13   1.00
##
## n= 145
##
##
## P
##        mhq5a1 mhq5a2 mhq5a3 mhq5a4
## mhq5a1        0.0000 0.0000 0.0000
## mhq5a2 0.0000        0.0356 0.0072
## mhq5a3 0.0000 0.0356        0.1061
## mhq5a4 0.0000 0.0072 0.1061
Local dependence does not usually impact the ordering of the measures, only their spacing.
Accordingly, any statistical tests based on differences between these Rasch measures should be interpreted conservatively, so that differences between measures need to be slightly larger than, say, a t-test would ordinarily require in order to be declared “significant”.