Department of Industrial Psychology
Stellenbosch University
South Africa
This is a demonstration of using R for the analysis of scales that employ items with ordinal response scales (e.g. Likert-type items). I use the partial credit model (Wright & Masters, 1982), which is one of a family of Rasch models, to analyse the responses of 1377 persons to the items of the General Work Stress Scale (de Bruin & Taylor, 2006). The analysis is performed with the TAM package (Robitzsch et al., 2022). It is assumed that readers have a basic understanding of Rasch models.
The R code below can be used to perform a basic partial credit model analysis of any unidimensional scale with ordinal response categories. Users should import their data and store it as a data frame named raschdata. The data frame should include only the items of interest; square brackets can be used to select items for inclusion or exclusion, as in the sketch below.
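For example, assuming a raw data frame mydata that also contains an id column and hypothetical item names i1 to i9, either of the following keeps only the item columns:
mydata <- read.csv("mydata.csv")   # hypothetical file name
raschdata <- mydata[, c("i1", "i2", "i3", "i4", "i5", "i6", "i7", "i8", "i9")]  # select by name
raschdata <- mydata[, -1]          # or drop, e.g., the id column by position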
Users should ensure that the items are all scored in the same direction (i.e. if required, items should be reverse scored before the analysis begins). Items should be scored such that the lowest category = 0 (e.g. responses on a three-point scale should be scored as 0, 1, or 2).
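A minimal, hypothetical sketch (assuming a negatively worded item i4 and items originally coded 1 to 5):
raschdata$i4 <- 6 - raschdata$i4   # reverse score a 1-5 item
raschdata <- raschdata - 1         # shift all items so that the lowest category = 0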
Users must have the TAM and psych packages installed on their machines.
library(TAM)
library(psych)
raschdata <- read.csv("stressdata.csv")
nitems <- ncol(raschdata)
describe(raschdata)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## i1 1 1377 1.60 1.06 2 1.56 1.48 0 4 4 0.25 -0.35 0.03
## i2 2 1377 1.15 1.11 1 1.01 1.48 0 4 4 0.74 -0.19 0.03
## i3 3 1377 1.23 1.06 1 1.11 1.48 0 4 4 0.69 -0.07 0.03
## i4 4 1377 1.24 1.09 1 1.14 1.48 0 4 4 0.54 -0.48 0.03
## i5 5 1377 1.04 0.93 1 0.95 1.48 0 4 4 0.70 0.13 0.03
## i6 6 1377 1.08 0.90 1 1.01 1.48 0 4 4 0.65 0.18 0.02
## i7 7 1377 1.67 1.05 2 1.66 1.48 0 4 4 0.16 -0.66 0.03
## i8 8 1377 0.97 0.94 1 0.87 1.48 0 4 4 0.86 0.49 0.03
## i9 9 1377 0.99 0.96 1 0.88 1.48 0 4 4 0.91 0.59 0.03
alpha(raschdata)$response.freq
## 0 1 2 3 4 miss
## i1 0.1728395 0.2716049 0.3907044 0.11111111 0.05374001 0
## i2 0.3565723 0.2984749 0.2265795 0.07988381 0.03848947 0
## i3 0.2817720 0.3645606 0.2360203 0.08133624 0.03631082 0
## i4 0.3071895 0.3028322 0.2599855 0.10021786 0.02977487 0
## i5 0.3202614 0.3972404 0.2171387 0.05228758 0.01307190 0
## i6 0.2824982 0.4306463 0.2244009 0.05083515 0.01161946 0
## i7 0.1408860 0.3144517 0.3195352 0.18736383 0.03776325 0
## i8 0.3623820 0.3812636 0.1982571 0.04066812 0.01742919 0
## i9 0.3536674 0.3870733 0.1931736 0.04429920 0.02178649 0
The TAM package can be used to fit the Rasch partial credit model to the observed data. Item and person parameters are estimated via marginal maximum likelihood (joint maximum likelihood can also be used if desired).
The partial credit model is suitable for the analysis of items where persons receive partial credit for partially correct answers (e.g. on an item scored from 0 to 3, a completely correct response receives a score of three, a partially correct response that is close to the correct answer receives a score of two, a mostly incorrect response receives a score of one, and a completely incorrect response receives a score of zero). The partial credit model is also suitable for the analysis of rating-type data (e.g. Likert-type items). Note that the number of response categories may differ across items.
myTAM <- tam.mml(raschdata,
irtmodel = "PCM")
The partial credit model yields for every item an overall location parameter and two or more category threshold parameters. Higher location parameters indicate that an item is more difficult to solve (in the context of ability testing) or that the item is more difficult to agree with (in the context of attitude or personality testing). An item with \(k\) response categories will have \(k - 1\) thresholds. The thresholds reflect the relative “difficulty” in choosing a response category rather than the one directly preceding it.
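In the parameterisation reported by item_irt below (an overall item location \(\beta_i\) plus category thresholds \(\tau_{ij}\)), the partial credit model can be written as

\[
P(X_{ni} = x) = \frac{\exp\left(\sum_{j=1}^{x} (\theta_n - \beta_i - \tau_{ij})\right)}{\sum_{k=0}^{m_i} \exp\left(\sum_{j=1}^{k} (\theta_n - \beta_i - \tau_{ij})\right)}, \qquad x = 0, 1, \ldots, m_i,
\]

where \(\theta_n\) is the location of person \(n\), \(m_i\) is the number of thresholds of item \(i\), and the empty sum for \(x = 0\) is defined as zero (Wright & Masters, 1982).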
It is expected that the category thresholds will be ordered (i.e. the first threshold should have a lower location than the second threshold, and so on). In the context of attitude testing, disordered thresholds may occur when one or more response categories have very low endorsement rates. Andrich argues that disordered thresholds indicate that persons do not use the response categories as intended by the test developer. In such cases it might be necessary to collapse some response categories, as sketched below. Further on, the rows of item_irt with disordered thresholds are extracted; an empty result (character(0)) indicates that all thresholds are ordered.
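A hypothetical sketch of collapsing categories (not needed for the present data): the two highest categories of item i1 could be merged by recoding before refitting the model.
collapsed <- raschdata
collapsed$i1[collapsed$i1 == 4] <- 3   # merge categories 3 and 4 of item i1
# myTAM2 <- tam.mml(collapsed, irtmodel = "PCM")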
g <- myTAM$item_irt
g
## item alpha beta tau.Cat1 tau.Cat2 tau.Cat3 tau.Cat4
## 1 i1 1 0.6771950 -2.182549 -1.0935296 1.4089742 1.867104
## 2 i2 1 1.3402854 -1.696564 -0.6860079 0.8859069 1.496665
## 3 i3 1 1.2455477 -2.159559 -0.4988067 0.9807419 1.677623
## 4 i4 1 1.3020437 -1.935436 -0.8656406 0.7744098 2.026667
## 5 i5 1 1.8027228 -2.542314 -0.6988677 1.0925751 2.148606
## 6 i6 1 1.7858654 -2.794740 -0.6649302 1.1721855 2.287484
## 7 i7 1 0.6586941 -2.594977 -0.7904606 0.6420925 2.743345
## 8 i8 1 1.8131038 -2.302253 -0.5855767 1.2953295 1.592500
## 9 i9 1 1.7055158 -2.251534 -0.4618804 1.2389116 1.474503
# Note: with irtmodel = "PCM", myTAM$xsi contains the item-category (step) parameters;
# the overall item locations are the beta values in myTAM$item_irt (stored in g above)
gg <- g[g$tau.Cat1 > g$tau.Cat2 | g$tau.Cat2 > g$tau.Cat3 | g$tau.Cat3 > g$tau.Cat4, ]
rownames(gg)
## character(0)
# Plot the option characteristic curves
plot(myTAM,
type = "items",
export = FALSE,
package = "graphics",
observed = TRUE,
low = -6,
high = 6)
plot(myTAM,
type = "expected",
ngroups = 6,
low = -6,
high = 6,
package = "lattice", overlay = FALSE)
plot(myTAM,
type = "expected",
ngroups = 6,
low = -6,
high = 6,
package = "lattice",
overlay = TRUE,
observed = FALSE)
## Iteration in WLE/MLE estimation 1 | Maximal change 2.145
## Iteration in WLE/MLE estimation 2 | Maximal change 0.8445
## Iteration in WLE/MLE estimation 3 | Maximal change 0.489
## Iteration in WLE/MLE estimation 4 | Maximal change 0.1344
## Iteration in WLE/MLE estimation 5 | Maximal change 0.0047
## Iteration in WLE/MLE estimation 6 | Maximal change 1e-04
## Iteration in WLE/MLE estimation 7 | Maximal change 0
## ----
## WLE Reliability= 0.867
## ....................................................
## Plots exported in png format into folder:
## C:/Users/deondb/OneDrive - Stellenbosch University/Documents/Rpubs/Plots
fit <- msq.itemfit(myTAM)$itemfit
(fit <- data.frame(fit[1], round(fit[3:8], 2)))
## item Outfit Outfit_t Outfit_p Infit Infit_t Infit_p
## 1 i1 1.00 -0.09 0.93 0.98 -0.47 0.64
## 2 i2 0.85 -3.33 0.00 0.86 -3.74 0.00
## 3 i3 1.00 0.13 0.90 0.99 -0.13 0.90
## 4 i4 1.10 2.30 0.02 1.09 2.33 0.02
## 5 i5 1.04 0.85 0.40 1.06 1.42 0.16
## 6 i6 0.85 -3.95 0.00 0.86 -3.67 0.00
## 7 i7 1.13 3.46 0.00 1.14 3.57 0.00
## 8 i8 0.92 -1.79 0.07 0.94 -1.49 0.14
## 9 i9 1.12 2.51 0.01 1.13 3.19 0.00
Yen’s (1984) Q3 statistic is the correlation between a pair of items after the latent trait has been partialled out (i.e. the correlation between the residuals of the analysis). The Q3 statistic is used to detect violations of the assumption of local independence. Item pairs with large Q3 statistics might be seen as locally dependent. Note that a Q3 value of zero does not necessarily indicate that two items are locally independent, because they might be non-linearly related to each other (de Ayala, 2009).
The expected value of Q3 is slightly smaller than zero. The adjusted Q3 statistic corrects for this by subtracting the average of all the Q3 statistics from Q3.
The MADaQ3 statistic is the mean of the absolute values of the adjusted Q3 statistics and might be seen as an overall measure of model-data fit.
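A sketch of these definitions, where \(\overline{Q_3}\) denotes the mean of the \(Q_3\) statistics over all item pairs and \(J\) the number of item pairs:

\[
aQ_{3,jk} = Q_{3,jk} - \overline{Q_3}, \qquad \mathrm{MADaQ3} = \frac{1}{J} \sum_{j<k} \left| aQ_{3,jk} \right|.
\]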
q3 <- tam.modelfit(myTAM)$stat.itempair
## **** Calculate Residuals
## **** Calculate Counts
## **** Calculate Covariances
q3 <- data.frame(q3[1:2], round(q3[5:6], 2), round(q3[7:8], 4))
q3
## item1 item2 Q3 aQ3 p p.holm
## 1 i1 i2 0.41 0.52 0.0000 0.0000
## 27 i5 i6 0.24 0.35 0.0000 0.0000
## 24 i4 i7 0.17 0.28 0.0000 0.0000
## 13 i2 i7 -0.34 -0.23 0.0000 0.0000
## 9 i2 i3 0.09 0.20 0.0000 0.0000
## 6 i1 i7 -0.30 -0.19 0.0000 0.0000
## 3 i1 i4 -0.29 -0.18 0.0000 0.0000
## 10 i2 i4 -0.28 -0.17 0.0000 0.0000
## 2 i1 i3 0.04 0.15 0.0000 0.0000
## 11 i2 i5 -0.25 -0.14 0.0000 0.0000
## 4 i1 i5 -0.23 -0.12 0.0000 0.0002
## 21 i3 i9 -0.22 -0.11 0.0001 0.0018
## 32 i6 i8 -0.01 0.10 0.0001 0.0029
## 19 i3 i7 -0.21 -0.10 0.0002 0.0044
## 20 i3 i8 -0.21 -0.10 0.0004 0.0080
## 34 i7 i8 -0.01 0.10 0.0004 0.0084
## 26 i4 i9 -0.20 -0.09 0.0006 0.0112
## 5 i1 i6 -0.20 -0.09 0.0015 0.0285
## 36 i8 i9 -0.03 0.08 0.0023 0.0410
## 18 i3 i6 -0.19 -0.08 0.0032 0.0548
## 30 i5 i9 -0.05 0.06 0.0167 0.2674
## 12 i2 i6 -0.17 -0.06 0.0179 0.2688
## 23 i4 i6 -0.17 -0.06 0.0320 0.4480
## 35 i7 i9 -0.17 -0.06 0.0354 0.4601
## 33 i6 i9 -0.06 0.05 0.0427 0.5126
## 7 i1 i8 -0.16 -0.05 0.0733 0.8064
## 17 i3 i5 -0.15 -0.04 0.1057 1.0000
## 8 i1 i9 -0.15 -0.04 0.1437 1.0000
## 16 i3 i4 -0.07 0.04 0.1563 1.0000
## 29 i5 i8 -0.14 -0.03 0.2190 1.0000
## 28 i5 i7 -0.14 -0.03 0.2709 1.0000
## 31 i6 i7 -0.08 0.03 0.2957 1.0000
## 22 i4 i5 -0.13 -0.02 0.4510 1.0000
## 15 i2 i9 -0.09 0.02 0.5310 1.0000
## 25 i4 i8 -0.10 0.01 0.6276 1.0000
## 14 i2 i8 -0.12 -0.01 0.7569 1.0000
(MADaQ3 <- tam.modelfit(myTAM)$stat.MADaQ3)
## **** Calculate Residuals
## **** Calculate Counts
## **** Calculate Covariances
## MADaQ3 maxaQ3 p
## 1 0.1108286 0.5239566 1.376596e-101
The RMSD (also referred to as the RMSEA in some publications) is used as a measure of the degree of item misfit. Values closer to zero indicate better fit; values greater than 0.06 might reflect unacceptable fit.
(item.RMSD <- IRT.itemfit(myTAM)$RMSD_bc)
## item Group1
## 1 i1 0.02367909
## 2 i2 0.03675970
## 3 i3 0.02452907
## 4 i4 0.01693367
## 5 i5 0.01825815
## 6 i6 0.03868853
## 7 i7 0.02512242
## 8 i8 0.02238762
## 9 i9 0.01952306
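Given the column names in the output above (item and Group1), items that exceed the 0.06 rule of thumb could be flagged directly; for the present data none do.
subset(item.RMSD, Group1 > 0.06)   # flag items with RMSD > 0.06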
myTAMresid <- IRT.residuals(myTAM)$stand_residuals
psych::scree(myTAMresid,
factors = FALSE)
It is expected that there should be no structure among the item residuals. Against this background, an unrotated principal components analysis of the standardized residuals should yield a weakly defined first component that is not much stronger than the subsequent components. In practice the focus falls on the eigenvalues of the components. As a rule of thumb, the first eigenvalue should be < 2, which would roughly indicate that the first residual component carries less information than two items.
The size and signs of the component loadings can be used to identify substrands of items that represent constructs above and beyond the trait of interest. In particular, one would want to examine the items with large positive loadings and contrast them with the items with large negative loadings.
(pcaresid <- pca(myTAMresid,
nfactors = ncol(myTAMresid),
rotate = "none"))
## Principal Components Analysis
## Call: principal(r = r, nfactors = nfactors, residuals = residuals,
## rotate = rotate, n.obs = n.obs, covar = covar, scores = scores,
## missing = missing, impute = impute, oblique.scores = oblique.scores,
## method = method, use = use, cor = cor, correct = 0.5, weight = NULL)
## Standardized loadings (pattern matrix) based upon correlation matrix
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 h2 u2 com
## i1 0.76 -0.01 -0.07 0.17 -0.37 0.04 -0.10 -0.49 0.06 1 -6.7e-16 2.5
## i2 0.78 0.02 -0.07 0.11 -0.14 0.10 0.10 0.58 0.05 1 -5.3e-15 2.1
## i3 0.37 -0.30 0.49 -0.14 0.65 -0.30 0.02 -0.07 0.06 1 -4.4e-16 3.8
## i4 -0.41 -0.61 0.20 -0.13 -0.08 0.58 0.23 -0.02 0.06 1 -2.0e-15 3.5
## i5 -0.32 0.56 0.50 -0.02 -0.07 0.18 -0.53 0.09 0.06 1 -8.9e-16 3.9
## i6 -0.32 0.59 0.22 0.38 -0.03 -0.14 0.57 -0.06 0.05 1 -1.8e-15 3.8
## i7 -0.54 -0.48 -0.14 0.12 -0.35 -0.54 -0.14 0.12 0.07 1 -2.2e-15 4.2
## i8 -0.20 0.06 -0.66 0.45 0.50 0.19 -0.17 -0.02 0.05 1 -6.7e-16 3.4
## i9 -0.07 0.39 -0.44 -0.79 0.03 -0.06 0.12 -0.03 0.06 1 -1.1e-15 2.2
##
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9
## SS loadings 2.03 1.51 1.23 1.06 0.96 0.81 0.75 0.61 0.03
## Proportion Var 0.23 0.17 0.14 0.12 0.11 0.09 0.08 0.07 0.00
## Cumulative Var 0.23 0.39 0.53 0.65 0.76 0.85 0.93 1.00 1.00
## Proportion Explained 0.23 0.17 0.14 0.12 0.11 0.09 0.08 0.07 0.00
## Cumulative Proportion 0.23 0.39 0.53 0.65 0.76 0.85 0.93 1.00 1.00
##
## Mean item complexity = 3.3
## Test of the hypothesis that 9 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0
## with the empirical chi square 0 with prob < NA
##
## Fit based upon off diagonal values = 1
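Assuming the PCA result is stored in pcaresid as above, the eigenvalue rule of thumb and the loading contrasts can be inspected directly:
pcaresid$values[1]             # first eigenvalue of the residuals (rule of thumb: < 2)
sort(pcaresid$loadings[, 1])   # items ordered by their loadings on the first component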
### Plot the test information curve
info.myTAM <- IRT.informationCurves(myTAM)
plot(info.myTAM, xlab = "Theta",
ylab = "Test information",
main = "Test Information Curve")
plot(info.myTAM,
curve_type = "se",
xlab = "Theta",
ylab = "Standard error",
main = "Test Standard Error Curve")
Person measures can be estimated with the IRT.factor.scores() function. Users can choose type = “MLE”, “WLE” or “EAP”.
pmod1 <- IRT.factor.scores(myTAM, type = "WLE")
## ***** Estimate WLE *****
## Iteration in WLE/MLE estimation 1 | Maximal change 2.145
## Iteration in WLE/MLE estimation 2 | Maximal change 0.8445
## Iteration in WLE/MLE estimation 3 | Maximal change 0.489
## Iteration in WLE/MLE estimation 4 | Maximal change 0.1344
## Iteration in WLE/MLE estimation 5 | Maximal change 0.0047
## Iteration in WLE/MLE estimation 6 | Maximal change 1e-04
## Iteration in WLE/MLE estimation 7 | Maximal change 0
## ----
## WLE Reliability= 0.867
#pmod1 <- IRT.factor.scores(myTAM, type = "EAP")
#pmod1 <- IRT.factor.scores(myTAM, type = "MLE")
### Histogram and boxplot of the WLE person measures
hist(pmod1$theta,
xlab = "Person measures in logits")
boxplot(pmod1$theta,
ylab = "Person measures in logits")
### Histogram and boxplot of the EAP person measures
#hist(pmod1$EAP,
# xlab = "Person measures in logits")
#boxplot(pmod1$EAP,
# ylab = "Person measures in logits")
The correlations of items with the estimated latent trait (theta) might also serve as indicators of item quality. In general, it is expected that each item should correlate moderately to strongly with the latent trait. These correlations are seen as indicators of the discrimination ability of the items. Items with stronger correlations with the latent trait discriminate better between persons with different trait levels.
Although Rasch models require that all items discriminate equally well, the correlations between items and the latent trait are also influenced by the distance between the locations of the items and the persons on the latent trait. Items with extremely high or low locations will have low standard deviations, which curtails the strength of their correlations with the trait. Yet these items might discriminate well between persons who are located close to them on the latent trait. Items with low item-trait correlations might therefore still be useful, provided that their infit and outfit statistics are satisfactory.
The table below contains the correlations between (a) the items and the latent trait, and (b) the items and the summated total score.
The item-total correlations are expected to be somewhat larger than the item-trait correlations. The item-total correlation reflects the correlation between an item (including its error variance) and a total score of which that item forms part. The item-total correlation is therefore inflated, because the error variance of the item is also contained in the total score.
pps <- cor(raschdata, pmod1$theta, use = "complete.obs")
it <- matrix(cor(rowSums(raschdata), raschdata, use = "complete.obs"))
item.trait <- cbind(pps,it)
colnames(item.trait) <- c("Item.trait", "Item.total")
item.trait
## Item.trait Item.total
## i1 0.7586157 0.7712468
## i2 0.7747364 0.8176713
## i3 0.7385467 0.7659369
## i4 0.7164540 0.7371995
## i5 0.6915165 0.7109106
## i6 0.7573744 0.7789909
## i7 0.7125007 0.7018259
## i8 0.7249905 0.7587248
## i9 0.6717893 0.6975067
A construct alley plot reflects both the location of an item on the latent trait continuum (on the y-axis) and the outfit or infit mean square of the item (on the x-axis). The expected value of the mean squares is one. Items with mean squares < 0.70 or > 1.30 show potentially problematic overfit or underfit, respectively.
### Plot a construct alley using outfit
# Item locations on the y-axis are the beta values (g$beta) from item_irt
plot(msq.itemfit(myTAM)$itemfit$Outfit,
     g$beta,
     xlim = c(0.5, 1.5),
     type = "n",
     xlab = "Outfit mean square",
     ylab = "Item location in logits")
text(msq.itemfit(myTAM)$itemfit$Outfit,
     g$beta,
     labels = colnames(raschdata),
     cex = 0.9,
     font = 2)
abline(v = 1, lty = "longdash")
abline(v = 1.30, lty = "dashed")
abline(v = 0.70, lty = "dashed")
### Plot a construct alley using infit
plot(msq.itemfit(myTAM)$itemfit$Infit,
     g$beta,
     xlim = c(0.5, 1.5),
     type = "n",
     xlab = "Infit mean square",
     ylab = "Item location in logits")
text(msq.itemfit(myTAM)$itemfit$Infit,
     g$beta,
     labels = colnames(raschdata),
     cex = 0.9,
     font = 2)
abline(v = 1, lty = "longdash")
abline(v = 1.30, lty = "dashed")
abline(v = 0.70, lty = "dashed")
A Wright map of a partial credit model jointly reflects the locations of the persons and the Thurstonian thresholds of the items on the latent trait continuum. The map allows for a visual evaluation of the degree to which the items are targeted at the persons.
IRT.WrightMap(myTAM)
psych::describe(pmod1$theta)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1377 0 1.54 0 0.03 1.3 -3.97 6.31 10.28 -0.14 0.77 0.04
An overall reliability coefficient, similar to Cronbach’s coefficient alpha, can be estimated from the weighted likelihood person measures (theta) and their associated standard errors. The variance of the person measures, \(v\), reflects the observed test variance. The mean of the squared standard errors across persons, \(s\), reflects the error variance. The true variance equals \(v - s\). The WLE reliability, i.e. the proportion of the test variance that is true variance, then equals \((v - s)/v\) (Adams, 2005; Wright & Masters, 1982).
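A sketch of this computation by hand, which should closely reproduce the value returned by WLErel() below:
v <- var(pmod1$theta)      # observed variance of the person measures
s <- mean(pmod1$error^2)   # mean error variance (squared standard errors)
(v - s) / v                # WLE reliability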
WLErel(pmod1$theta,
pmod1$error)
## [1] 0.8673504
The reliability() function of the psych package yields three estimates of test reliability: Cronbach’s alpha, McDonald’s omega, and Revelle’s beta coefficients (Zinbarg et al., 2005). Coefficient beta is the minimum split-half reliability. The function also yields three experimental unidimensionality statistics: (a) the fit of a single-factor model, (b) the fit of a model where the mean item correlation is used as the “model”, and (c) a unidimensionality statistic that is the product of (a) and (b). See the help pages of the psych package for more information (Revelle, 2023).
If the number of factors is changed to two or more, coefficient omega_hierarchical represents the proportion of observed variance that is explained by the general factor of a Schmid-Leiman transformation of the oblique factor solution. Omega_total represents the proportion of observed variance that is accounted for by all the common factors (Revelle & Condon, 2019).
reliability(raschdata, nfactors = 1)
## Measures of reliability
## reliability(keys = raschdata, nfactors = 1)
## omega_h alpha omega.tot Uni r.fit fa.fit max.split min.split mean.r
## All_items 0.9 0.9 0.9 0.96 0.98 0.99 0.92 0.84 0.51
## med.r n.items
## All_items 0.5 9
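For the hierarchical decomposition described above, the same function can be called with two or more factors; for example (not run here):
# reliability(raschdata, nfactors = 2)   # omega_h via a Schmid-Leiman transformation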
Adams, R. J. (2005). Reliability as a measurement design effect. Studies in Educational Evaluation, 31, 162-172.
de Bruin, G. P. & Taylor, N. (2005). Development of the Sources of Work Stress Inventory. South African Journal of Psychology, 35, 748-765.
de Bruin, G. P. (2006). Dimensionality of the General Work Stress Scale. SA Journal of Industrial Psychology, 32(4), 68-75.
Revelle, W. (2023). psych: Procedures for Psychological, Psychometric, and Personality Research. Northwestern University, Evanston, Illinois. R package version 2.3.3, https://CRAN.R-project.org/package=psych.
Revelle, W., & Condon, D. (2019). Reliability from \(\alpha\) to \(\omega\). Psychological Assessment, 31, 1395-1411.
Robitzsch A., Kiefer, T., & Wu, M. (2022). TAM: Test Analysis Modules. R package version 4.1-4, https://CRAN.R-project.org/package=TAM.
Wright, B. D. & Masters, G. M. (1982). Rating scale analysis. Chicago: Mesa Press.
Zinbarg, R. E., Revelle, W., Yovel, I., & Li, W. (2005). Cronbach’s \(\alpha\), Revelle’s \(\beta\), and McDonald’s \(\omega_H\): Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70, 123-133. http://dx.doi.org/10.1007/s11336-003-0974-7