Overall questions

1, When the other assumptions hold but unidimensionality does not, should we simply declare that these items are not good? In that case the latent variable may capture more than “wrist condition” alone, so it is hard to study health condition from these items.

2, According to the reference “An introduction to the Rasch measurement model: An example using the Hospital Anxiety and Depression Scale (HADS)”, when there is a significant indication of DIF, even after Bonferroni adjustment, we should split the group into two subgroups to obtain unbiased estimates. Do you agree with that?

3, For local dependency, there are many criteria, and their descriptions do not include a clear statistical definition, so I am still searching the literature for a final criterion. If you know of any clearly defined criterion, please let me know.

Domain 1

Table 1: Number of responses in each category (0 to 4) for each item.

##   mhq1a1 mhq1a2 mhq1a3 mhq1a4 mhq1a5
## 0     66     88     60     30     81
## 1    107     97    102    102     88
## 2     51     34     52     70     46
## 3      6      9     14     24     13
## 4      3      5      5      7      5

1, Choosing the appropriate Rasch model: hypothesis test of RSM versus PCM

In the rating scale model (RSM), all items share the same thresholds. In the partial credit model (PCM), each item can have its own thresholds.
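For reference, one standard way to write the two models is sketched here. The symbols \(\theta_n\) (ability of person n), \(\delta_{ik}\) (threshold k of item i), \(\delta_i\) (location of item i) and \(\tau_k\) (threshold k shared by all items) are notation assumed for this sketch and do not come from the analysis output. For an item scored \(0, \dots, m\), the PCM is

\[P(X_{ni} = x) = \frac{\exp \sum_{k=1}^{x} (\theta_n - \delta_{ik})}{\sum_{h=0}^{m} \exp \sum_{k=1}^{h} (\theta_n - \delta_{ik})}, \quad x = 0, \dots, m,\]

where an empty sum (x = 0 or h = 0) is taken to be zero. The RSM is the special case

\[\delta_{ik} = \delta_i + \tau_k .\]

Under the usual identification constraints, with I items and m thresholds per item, the PCM has \(Im - 1\) free item parameters and the RSM has \((I - 1) + (m - 1)\), so the likelihood-ratio test of RSM against PCM has

\[(Im - 1) - (I - 1) - (m - 1) = (I - 1)(m - 1)\]

degrees of freedom. For five items scored 0 to 4 this gives \((5 - 1)(4 - 1) = 12\), which agrees with the df reported for the test in this step.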

## LR statistic:  31.85512  df = 12  p = 0.001456823

The result is significant, so we use the PCM as our final model; that is, the PCM cannot be simplified to the RSM.
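As a check on the test above, the p-value can be recomputed from the LR statistic and its degrees of freedom. The following is a minimal sketch in Python using only the standard library; the function name `chi2_sf` is hypothetical, and it relies on the closed-form upper-tail formulas for a chi-square distribution with integer degrees of freedom rather than on the software that produced the report.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        # Even df: Q = exp(-z) * sum_{k=0}^{df/2 - 1} z^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= z / k
            total += term
        return math.exp(-z) * total
    # Odd df: Q = erfc(sqrt(z)) + exp(-z) * sum_{k=1}^{(df-1)/2} z^(k - 1/2) / Gamma(k + 1/2)
    total = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += z ** (k - 0.5) / math.gamma(k + 0.5)
    return math.erfc(math.sqrt(z)) + math.exp(-z) * total

lr_statistic = 31.85512  # LR statistic reported above for RSM versus PCM
df = 12                  # degrees of freedom reported above
p_value = chi2_sf(lr_statistic, df)
print(f"p = {p_value:.6g}")
```

The same function can be applied to any other likelihood-ratio statistic in this report, for example the Martin-Loef test, by passing its LR value and degrees of freedom.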

2, Plot category thresholds

Table 1: RSM thresholds, which follow the same pattern for every item.

Table 2: PCM thresholds, whose pattern can differ from item to item.

There are no disordered thresholds in the PCM.

3, Item fit

## mhq1a1 mhq1a2 mhq1a3 mhq1a4 mhq1a5 
##  1.000  1.000  1.000  0.992  0.170

The values shown are the item-fit p-values. For domain 1, all items fit well.

4, Person fit

##     P9    P44    P75    P79    P92    P99   P100   P107   P135   P160   P164 
## 0.0023 0.0019 0.0041 0.0019 0.0039 0.0019 0.0299 0.0042 0.0265 0.0367 0.0289 
##   P178   P180   P192   P193   P197   P198   P205   P208   P212   P214   P220 
## 0.0366 0.0438 0.0195 0.0178 0.0032 0.0036 0.0000 0.0172 0.0119 0.0328 0.0002

The persons listed above, all with person-fit p-values below 0.05, do not fit the model well.

5, Internal consistency: Cronbach’s alpha

Cronbach’s alpha is a measure of internal consistency, that is, how closely related a set of items are as a group. It is considered to be a measure of scale reliability.

The definition:

\[\alpha = \frac{p}{p-1} \left(1 - \frac{\sum_{i=1}^{p} \sigma_{y_i}^2}{\sigma_x^2}\right)\]

Here p is the number of items, \(\sigma_{y_i}^2\) is the variance of the i-th item, and \(\sigma_x^2\) is the variance of the total score.

## [1] 0.918483

\(\alpha > 0.9\) indicates excellent internal consistency; for an exploratory study, \(\alpha > 0.7\) is acceptable. Here \(\alpha \approx 0.918\), so internal consistency is excellent.
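The definition above can be sketched directly in code. The following Python example is illustrative only: the function name `cronbach_alpha` and the toy responses are hypothetical and are not the study data. It uses sample variances for both the items and the total score.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, all of equal length (one entry per person)."""
    p = len(items)
    # Total score of each person across all items
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(scores) for scores in items)
    return p / (p - 1) * (1 - item_var_sum / variance(totals))

# Hypothetical toy responses: 3 items answered by 5 persons
toy_items = [
    [0, 1, 2, 3, 4],
    [1, 1, 2, 3, 3],
    [0, 2, 1, 4, 4],
]
alpha = cronbach_alpha(toy_items)
print(f"alpha = {alpha:.3f}")
```

Because the ratio of variances is unchanged when both are divided by n instead of n - 1, the choice between sample and population variance does not affect the result as long as it is applied consistently.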

6, DIF test

Model 1: item difficulty + person_ability

Model 2: item difficulty + person_ability + “dominant = injury”

Model 3: item difficulty + person_ability + “dominant = injury” + person_ability \(\times\) “dominant = injury”

A significant difference between Model 1 and Model 2 indicates uniform DIF; a significant difference between Model 2 and Model 3 indicates non-uniform DIF.

##   item ncat  chi12  chi13  chi23 beta12 pseudo12.McFadden pseudo13.McFadden
## 1    1    3 0.1891 0.2519 0.3094 0.0316            0.0041            0.0065
## 2    2    3 0.5361 0.8180 0.8901 0.0051            0.0009            0.0009
## 3    3    3 0.0815 0.2129 0.8095 0.0137            0.0072            0.0073
## 4    4    4 0.5421 0.8298 0.9684 0.0075            0.0008            0.0008
## 5    5    3 0.1982 0.0913 0.0768 0.0013            0.0038            0.0109
##   pseudo23.McFadden pseudo12.Nagelkerke pseudo13.Nagelkerke pseudo23.Nagelkerke
## 1            0.0024              0.0018              0.0029              0.0011
## 2            0.0000              0.0005              0.0006              0.0000
## 3            0.0001              0.0043              0.0044              0.0001
## 4            0.0000              0.0005              0.0005              0.0000
## 5            0.0071              0.0030              0.0086              0.0056
##   pseudo12.CoxSnell pseudo13.CoxSnell pseudo23.CoxSnell df12 df13 df23
## 1            0.0016            0.0025             9e-04    1    2    1
## 2            0.0005            0.0005             0e+00    1    2    1
## 3            0.0038            0.0039             1e-04    1    2    1
## 4            0.0004            0.0005             0e+00    1    2    1
## 5            0.0027            0.0076             5e-03    1    2    1

Let’s focus on chi12, chi13 and chi23. These columns give the p-values of the likelihood-ratio tests comparing Models 1 and 2, Models 1 and 3, and Models 2 and 3, respectively.

If we remove the persons who fit poorly in step 4, no item shows DIF.

If we do not remove the persons who fit poorly in step 4, item 3 shows significant uniform DIF and item 5 shows non-uniform DIF.
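The decision rule described in this step can be sketched as follows. The report does not state which significance level was used, nor whether the table was computed before or after removing the misfitting persons, so the three levels tried here (0.05, a Bonferroni-adjusted 0.05 / 5, and 0.10) are assumptions, and the function name `classify_dif` is hypothetical.

```python
def classify_dif(chi12, chi23, alpha):
    """Label each item from the p-values of the nested-model LR tests."""
    labels = []
    for p12, p23 in zip(chi12, chi23):
        if p23 < alpha:
            # Model 3 improves on Model 2: the group effect depends on ability
            labels.append("non-uniform")
        elif p12 < alpha:
            # Model 2 improves on Model 1: constant group effect
            labels.append("uniform")
        else:
            labels.append("none")
    return labels

# p-values copied from the chi12 and chi23 columns of the table above (items 1 to 5)
chi12 = [0.1891, 0.5361, 0.0815, 0.5421, 0.1982]
chi23 = [0.3094, 0.8901, 0.8095, 0.9684, 0.0768]

# Assumed significance levels: nominal, Bonferroni-adjusted, and a more lenient 0.10
for alpha in (0.05, 0.05 / len(chi12), 0.10):
    print(alpha, classify_dif(chi12, chi23, alpha))
```

With the tabulated values, no item is flagged at 0.05 or at the Bonferroni-adjusted level, while at 0.10 item 3 is flagged for uniform DIF and item 5 for non-uniform DIF.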

7, Unidimensionality: Martin-Loef test

## 
## Martin-Loef-Test (split criterion: median)
## LR-value: 59.074 
## Chi-square df: 95 
## p-value: 0.999

The Martin-Loef test gives no evidence against unidimensionality (p = 0.999), so we can assume that unidimensionality holds here.

8, Local dependency

##        mhq1a1 mhq1a2 mhq1a3 mhq1a4 mhq1a5
## mhq1a1   1.00  -0.13  -0.13  -0.18  -0.37
## mhq1a2  -0.13   1.00  -0.34  -0.31  -0.13
## mhq1a3  -0.13  -0.34   1.00  -0.19  -0.34
## mhq1a4  -0.18  -0.31  -0.19   1.00  -0.29
## mhq1a5  -0.37  -0.13  -0.34  -0.29   1.00
## 
## n= 187 
## 
## 
## P
##        mhq1a1 mhq1a2 mhq1a3 mhq1a4 mhq1a5
## mhq1a1        0.0706 0.0850 0.0117 0.0000
## mhq1a2 0.0706        0.0000 0.0000 0.0736
## mhq1a3 0.0850 0.0000        0.0079 0.0000
## mhq1a4 0.0117 0.0000 0.0079        0.0000
## mhq1a5 0.0000 0.0736 0.0000 0.0000

Local dependence does not usually impact the ordering of the measures, only their spacing.

Accordingly, any statistical tests based on differences between these Rasch measures should be interpreted conservatively, so that differences between measures need to be slightly larger than, say, a t-test would ordinarily require in order to be declared “significant”.
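One simple way to summarise the matrix above is to compare each pairwise correlation with the average off-diagonal correlation. The sketch below assumes that the matrix holds correlations between item residuals, which the report does not state explicitly. The reference value \(-1/(k-1)\) for k items is a commonly cited approximate expectation for residual correlations under the Rasch model; it is used here as an assumption, not as a criterion taken from this analysis.

```python
from itertools import combinations

items = ["mhq1a1", "mhq1a2", "mhq1a3", "mhq1a4", "mhq1a5"]

# Upper triangle of the correlation matrix printed above
corr = {
    ("mhq1a1", "mhq1a2"): -0.13, ("mhq1a1", "mhq1a3"): -0.13,
    ("mhq1a1", "mhq1a4"): -0.18, ("mhq1a1", "mhq1a5"): -0.37,
    ("mhq1a2", "mhq1a3"): -0.34, ("mhq1a2", "mhq1a4"): -0.31,
    ("mhq1a2", "mhq1a5"): -0.13, ("mhq1a3", "mhq1a4"): -0.19,
    ("mhq1a3", "mhq1a5"): -0.34, ("mhq1a4", "mhq1a5"): -0.29,
}

pairs = list(combinations(items, 2))
mean_r = sum(corr[pair] for pair in pairs) / len(pairs)
reference = -1 / (len(items) - 1)            # assumed approximate expectation
max_pair = max(pairs, key=lambda pair: corr[pair])
excess = corr[max_pair] - mean_r             # largest correlation relative to the mean

print(f"mean off-diagonal r = {mean_r:.3f}, reference = {reference:.3f}")
print(f"largest pair = {max_pair}, excess over mean = {excess:.3f}")
```

For this domain the mean off-diagonal correlation is about -0.24, close to the reference value of -0.25, so the uniformly negative correlations are roughly what would be expected even without local dependence.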

Domain 5

Table 1: Number of responses in each category (0 to 4) for each item.

##   mhq5a1 mhq5a2 mhq5a3 mhq5a4
## 0     89    136    149    150
## 1     80     47     45     46
## 2     36     28     21     21
## 3     20     17     12     10
## 4      8      5      6      6

1, Choosing the appropriate Rasch model: hypothesis test of RSM versus PCM

In the rating scale model (RSM), all items share the same thresholds. In the partial credit model (PCM), each item can have its own thresholds.

## LR statistic:  50.50682  df = 9  p = 8.648575e-08

The result is significant, so we use the PCM as our final model; that is, the PCM cannot be simplified to the RSM.

2, Plot category thresholds

Table 1: RSM thresholds, which follow the same pattern for every item.

Table 2: PCM thresholds, whose pattern can differ from item to item.

There are no disordered thresholds in the PCM.

3, Item fit

## mhq5a1 mhq5a2 mhq5a3 mhq5a4 
##  0.872  1.000  1.000  1.000

The values shown are the item-fit p-values. For domain 5, all items fit well.

4, Person fit

##    P16    P52   P139   P152   P182   P184   P189   P217   P223 
## 0.0059 0.0371 0.0030 0.0007 0.0047 0.0030 0.0059 0.0001 0.0030

The persons listed above, all with person-fit p-values below 0.05, do not fit the model well.

5, Internal consistency: Cronbach’s alpha

Cronbach’s alpha is a measure of internal consistency, that is, how closely related a set of items are as a group. It is considered to be a measure of scale reliability.

The definition:

\[\alpha = \frac{p}{p-1} \left(1 - \frac{\sum_{i=1}^{p} \sigma_{y_i}^2}{\sigma_x^2}\right)\]

Here p is the number of items, \(\sigma_{y_i}^2\) is the variance of the i-th item, and \(\sigma_x^2\) is the variance of the total score.

## [1] 0.8390395

\(\alpha > 0.9\) indicates excellent internal consistency; for an exploratory study, \(\alpha > 0.7\) is acceptable. Here \(\alpha \approx 0.839\), which is above the 0.7 threshold but below 0.9.

6, DIF test

Model 1: item difficulty + person_ability

Model 2: item difficulty + person_ability + “dominant = injury”

Model 3: item difficulty + person_ability + “dominant = injury” + person_ability \(\times\) “dominant = injury”

A significant difference between Model 1 and Model 2 indicates uniform DIF; a significant difference between Model 2 and Model 3 indicates non-uniform DIF.

##   item ncat  chi12  chi13  chi23 beta12 pseudo12.McFadden pseudo13.McFadden
## 1    1    4 0.7364 0.9201 0.8174 0.0004            0.0002            0.0003
## 2    2    4 0.3660 0.6611 0.9184 0.0017            0.0017            0.0018
## 3    3    4 0.9225 0.9950 0.9802 0.0186            0.0000            0.0000
## 4    4    3 0.0663 0.1683 0.6609 0.0316            0.0090            0.0095
##   pseudo23.McFadden pseudo12.Nagelkerke pseudo13.Nagelkerke pseudo23.Nagelkerke
## 1             1e-04              0.0002              0.0003               1e-04
## 2             0e+00              0.0013              0.0013               0e+00
## 3             0e+00              0.0000              0.0000               0e+00
## 4             5e-04              0.0062              0.0065               4e-04
##   pseudo12.CoxSnell pseudo13.CoxSnell pseudo23.CoxSnell df12 df13 df23
## 1            0.0002            0.0003             1e-04    1    2    1
## 2            0.0011            0.0011             0e+00    1    2    1
## 3            0.0000            0.0000             0e+00    1    2    1
## 4            0.0051            0.0054             3e-04    1    2    1

7, Unidimensionality: Martin-Loef test

## 
## Martin-Loef-Test (split criterion: median)
## LR-value: 92.498 
## Chi-square df: 63 
## p-value: 0.009

The Martin-Loef test gives significant evidence against unidimensionality (p = 0.009).

8, Local dependency

##        mhq5a1 mhq5a2 mhq5a3 mhq5a4
## mhq5a1   1.00  -0.44  -0.48  -0.61
## mhq5a2  -0.44   1.00  -0.17  -0.22
## mhq5a3  -0.48  -0.17   1.00   0.13
## mhq5a4  -0.61  -0.22   0.13   1.00
## 
## n= 145 
## 
## 
## P
##        mhq5a1 mhq5a2 mhq5a3 mhq5a4
## mhq5a1        0.0000 0.0000 0.0000
## mhq5a2 0.0000        0.0356 0.0072
## mhq5a3 0.0000 0.0356        0.1061
## mhq5a4 0.0000 0.0072 0.1061

Local dependence does not usually impact the ordering of the measures, only their spacing.

Accordingly, any statistical tests based on differences between these Rasch measures should be interpreted conservatively, so that differences between measures need to be slightly larger than, say, a t-test would ordinarily require in order to be declared “significant”.