library(lordif)
data(Anxiety)
Maximising coefficient omega
The Step.Omega function
This document contains R code that can be used to select items from a pool of items, such that the reliability of the set of items will be maximized. The code can be used with any data set containing polytomous items (ideally the items should have five or more response options).
The pool of items
As an example, we have a pool of 29 Anxiety items to which 766 people have responded. The items employ a five-point Likert-type rating scale. We want to select the 12 items that will maximise the reliability (coefficient omega) of the scale.
The data for this example are available from the lordif package. The name of the data frame is Anxiety. Users can specify any data frame they want.
Assumptions
The item selection approach proceeds on the assumption that a single factor model with uncorrelated unique variances fits the data. Unfortunately, this assumption is often violated with personality data. In such cases, one might proceed if it can be shown that the items measure a strong general factor that dominates all other factors.
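One rough way to screen for a dominant general factor is to compare the first two eigenvalues of the item correlation matrix. The sketch below is an illustration only: it uses simulated data with hypothetical loadings, not the Anxiety items, and the eigenvalue ratio is an informal heuristic rather than a formal test of the single factor model.

```r
# Simulated data for illustration only: 500 respondents, 8 items, one factor.
set.seed(1)
n <- 500
theta <- rnorm(n)                              # latent trait scores
loadings <- seq(0.5, 0.85, length.out = 8)     # hypothetical standardized loadings
items <- sapply(loadings, function(l) l * theta + sqrt(1 - l^2) * rnorm(n))
colnames(items) <- paste0("X", 1:8)

# Eigenvalues of the item correlation matrix, largest first
ev <- eigen(cor(items), symmetric = TRUE, only.values = TRUE)$values

# A first eigenvalue that is several times larger than the second
# is consistent with one factor dominating the others.
ratio <- ev[1] / ev[2]
round(ev[1:3], 2)
ratio
```

With real data, the same two lines (eigen() and the ratio) can be applied to the items-only data frame.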
Activating the packages we need
The analysis requires the psych and MBESS packages.
library(psych)
library(MBESS)
Preparing the data for analysis
The Anxiety data frame contains three columns with demographic variables that we don’t need now. We create a new data frame (simply called “data”) that contains the items only. We print the names of the variables to check that the data frame contains the variables we are interested in.
Users can specify any data frame containing any set of items, on condition that the data frame contains the items only.
data <- Anxiety[-c(1:3)]
names(data)
[1] "R1" "R2" "R3" "R4" "R5" "R6" "R7" "R8" "R9" "R10" "R11" "R12"
[13] "R13" "R14" "R15" "R16" "R17" "R18" "R19" "R20" "R21" "R22" "R23" "R24"
[25] "R25" "R26" "R27" "R28" "R29"
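When the demographic columns are not neatly at the start of the data frame, the items can be selected by name instead of by position. The following is a minimal sketch on a small hypothetical data frame (the object raw and its values are invented for illustration); it assumes that item names follow the pattern "R" plus a number, as in the Anxiety data.

```r
# Hypothetical data frame: two demographic columns followed by three items
raw <- data.frame(age    = c(34, 51, 27),
                  gender = c(1, 0, 1),
                  R1     = c(1, 2, 1),
                  R2     = c(3, 1, 2),
                  R3     = c(2, 2, 5))

# Keep only the columns whose names look like item names (R1, R2, ...)
item_cols  <- grepl("^R[0-9]+$", names(raw))
items_only <- raw[, item_cols, drop = FALSE]
names(items_only)

# Basic sanity checks: all items numeric, responses within the rating scale
stopifnot(all(sapply(items_only, is.numeric)))
rng <- range(unlist(items_only))   # observed minimum and maximum response
rng
```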
Descriptive statistics
As a first step we check descriptive statistics of the items.
library(psych)
describe(data)
vars n mean sd median trimmed mad min max range skew kurtosis se
R1 1 766 1.49 0.83 1 1.31 0.00 1 5 4 1.79 2.87 0.03
R2 2 766 1.43 0.75 1 1.25 0.00 1 5 4 1.87 3.32 0.03
R3 3 766 1.42 0.81 1 1.23 0.00 1 5 4 2.03 3.82 0.03
R4 4 766 1.87 1.03 2 1.70 1.48 1 5 4 1.04 0.27 0.04
R5 5 766 1.46 0.88 1 1.26 0.00 1 5 4 1.97 3.27 0.03
R6 6 766 1.57 0.94 1 1.37 0.00 1 5 4 1.68 2.14 0.03
R7 7 766 1.99 0.98 2 1.89 1.48 1 5 4 0.64 -0.44 0.04
R8 8 766 1.52 0.84 1 1.36 0.00 1 5 4 1.53 1.68 0.03
R9 9 766 1.67 0.95 1 1.51 0.00 1 5 4 1.34 1.12 0.03
R10 10 766 1.41 0.79 1 1.22 0.00 1 5 4 2.05 3.88 0.03
R11 11 766 1.65 0.92 1 1.48 0.00 1 5 4 1.50 1.96 0.03
R12 12 766 1.88 0.96 2 1.76 1.48 1 5 4 0.91 0.24 0.03
R13 13 766 1.70 1.06 1 1.50 0.00 1 5 4 1.32 0.61 0.04
R14 14 766 1.80 0.99 1 1.65 0.00 1 5 4 0.99 0.00 0.04
R15 15 766 1.50 0.86 1 1.32 0.00 1 5 4 1.82 2.88 0.03
R16 16 766 2.04 1.05 2 1.90 1.48 1 5 4 0.73 -0.35 0.04
R17 17 766 1.23 0.60 1 1.08 0.00 1 5 4 3.06 10.57 0.02
R18 18 766 1.93 1.07 2 1.76 1.48 1 5 4 1.00 0.26 0.04
R19 19 766 1.40 0.75 1 1.21 0.00 1 5 4 2.04 4.04 0.03
R20 20 766 1.54 0.91 1 1.36 0.00 1 5 4 1.69 2.21 0.03
R21 21 766 1.54 0.86 1 1.37 0.00 1 5 4 1.57 1.78 0.03
R22 22 766 1.80 0.94 2 1.67 1.48 1 5 4 0.95 0.09 0.03
R23 23 766 1.90 1.03 2 1.76 1.48 1 5 4 0.91 -0.02 0.04
R24 24 766 1.83 1.00 2 1.67 1.48 1 5 4 1.13 0.63 0.04
R25 25 766 2.40 1.21 2 2.31 1.48 1 5 4 0.40 -0.84 0.04
R26 26 766 2.03 1.05 2 1.89 1.48 1 5 4 0.69 -0.49 0.04
R27 27 766 1.83 0.98 2 1.69 1.48 1 5 4 0.99 0.22 0.04
R28 28 766 2.06 1.07 2 1.93 1.48 1 5 4 0.66 -0.51 0.04
R29 29 766 1.55 0.85 1 1.39 0.00 1 5 4 1.53 1.81 0.03
The item selection function
McDonald (1999) describes an item selection procedure that can be used to maximize the reliability of a scale. According to McDonald the reliability of a scale (as reflected by coefficient omega) of length k can be maximized by selecting from a pool of items the k items with the highest information statistics.
The function first obtains a single factor solution of the entire pool of items, using the covariance matrix of the items as input. This yields a solution where the factor is standardized, but the factor loadings and unique variances are unstandardized.
Second, the unstandardized factor loadings and unique variances are used to calculate item information statistics, where information is equal to the ratio of the squared factor loading (signal) to the unique variance (noise).
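The information statistic can be illustrated with a small worked example. The loadings and unique variances below are hypothetical values chosen for illustration, and the omega() helper is not part of the original function: it applies the standard single factor expression, omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of unique variances), with the factor variance fixed at 1.

```r
# Hypothetical unstandardized loadings and unique variances for four items
lambda <- c(A = 0.80, B = 0.60, C = 0.50, D = 0.30)
psi    <- c(A = 0.36, B = 0.64, C = 0.50, D = 0.90)

# Item information: squared loading (signal) over unique variance (noise)
info <- lambda^2 / psi
sort(info)   # item D has the lowest information

# Coefficient omega for a single factor model with a standardized factor
omega <- function(lambda, psi) sum(lambda)^2 / (sum(lambda)^2 + sum(psi))

omega_all <- omega(lambda, psi)

# Keep the three items with the highest information and recompute omega
keep <- names(sort(info, decreasing = TRUE))[1:3]
omega_top3 <- omega(lambda[keep], psi[keep])

c(all_items = omega_all, top3 = omega_top3)
```

In this constructed example, dropping the least informative item raises omega, which is the logic the selection procedure exploits.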
Third, the item with the lowest information statistic is removed from the dataset and coefficient omega is recalculated for the remaining items. The process is repeated until the specified number of items remains in the scale. Coefficient alpha is calculated for the final scale only.
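The three steps can be sketched with base R alone. This is an illustration under stated assumptions, not the function described in this document: it uses simulated data with hypothetical loadings, and it fits the single factor model with factanal(), which uses maximum likelihood on the correlation matrix, so its estimates need not match those obtained from fa() with the covariance matrix.

```r
# Simulated data for illustration only: 600 respondents, 6 items, one factor.
set.seed(42)
n <- 600
theta <- rnorm(n)
true_loadings <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3)   # hypothetical values
sim <- as.data.frame(sapply(true_loadings,
                            function(l) l * theta + sqrt(1 - l^2) * rnorm(n)))
names(sim) <- paste0("I", 1:6)

items_wanted <- 4
removed <- character(0)

while (ncol(sim) > items_wanted) {
  # Step 1: single factor solution for the items still in the pool
  fit <- factanal(sim, factors = 1)
  lambda <- fit$loadings[, 1]
  psi <- fit$uniquenesses

  # Step 2: item information = squared loading / unique variance
  info <- lambda^2 / psi

  # Step 3: drop the item with the lowest information
  worst <- names(which.min(info))
  removed <- c(removed, worst)
  sim <- sim[, names(sim) != worst, drop = FALSE]
}

removed      # items dropped, in order of removal
names(sim)   # items retained
```

The model is refitted after every removal because the loadings and unique variances of the remaining items can change once an item leaves the pool.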
Activating the function
Users should paste the code that follows in an R script window and then run it. This will activate the Step.Omega function, which will remain active until the R session is closed. Users can delete the code once the function has been activated.
Step.Omega <- function(data, items.wanted) {
  ## Data frame that tracks omega as items are removed iteratively
  StepOmega <- data.frame(Step = 0, Item.removed = "None",
                          Omega = ci.reliability(data, type = "hierarchical")$est)
  ## Remove one item per step until items.wanted items remain
  for (i in seq_len(ncol(data) - items.wanted)) {
    ## Fit a single factor model to the covariance matrix of the remaining items
    fit <- fa(data, 1, cor = "cov")
    ## Item information: squared loading (communality) over unique variance
    info <- sort(fit$communality / fit$uniqueness)
    ## Identify the item with the lowest information
    low.info <- names(info)[1]
    ## Remove that item from the data frame
    data <- data[names(data) != low.info]
    ## Calculate omega with the MBESS package
    rel <- ci.reliability(data, type = "hierarchical")$est
    ## Store the step, the name of the item removed and omega
    StepOmega <- rbind(StepOmega,
                       data.frame(Step = i, Item.removed = low.info, Omega = rel))
  }
  ## psych::alpha is called explicitly in case another package masks alpha()
  results <- list(Steps = StepOmega,
                  Items.in.scale = names(data),
                  Omega = ci.reliability(data, type = "hierarchical")[1],
                  Alpha = psych::alpha(data)$total[1])
  print(results)
}
Performing the analysis
The analysis is run by typing the name of the function (Step.Omega). The function takes two arguments, namely the name of the data frame containing the items, and the number of items that should remain in the scale.
The output contains (a) a table that shows at each step which item was removed (i.e. the item with the lowest information), along with the reliability of the scale without that item; (b) the names of the items that are included in the final scale; (c) coefficient omega of the final scale; and (d) coefficient alpha of the final scale.
Step.Omega(data, items.wanted = 12)
$Steps
Step Item.removed Omega
1 0 None 0.9679626
2 1 R21 0.9682411
3 2 R25 0.9698631
4 3 R8 0.9694007
5 4 R11 0.9691310
6 5 R9 0.9690738
7 6 R13 0.9690519
8 7 R14 0.9688353
9 8 R18 0.9690265
10 9 R12 0.9689736
11 10 R17 0.9684700
12 11 R23 0.9678633
13 12 R26 0.9672829
14 13 R6 0.9661940
15 14 R5 0.9646160
16 15 R15 0.9628851
17 16 R2 0.9608687
18 17 R3 0.9585747
$Items.in.scale
[1] "R1" "R4" "R7" "R10" "R16" "R19" "R20" "R22" "R24" "R27" "R28" "R29"
$Omega
$Omega$est
[1] 0.9585747
$Alpha
raw_alpha
0.9575593
Using the function for selecting items
The function can be used to find the set of items that will maximise coefficient omega for a scale of a specified length k. The output lists (a) the items that are retained in the scale, (b) coefficients alpha and omega of the scale, and (c) a table showing which item is removed at each successive step and the scale reliability at that step. A purely mechanical approach would dictate that the k items with the highest information statistics be retained. However, items should rarely be retained or rejected on the basis of statistics alone. It is recommended that the content of an item that is removed by the function be carefully considered before a final decision about retaining or removing it is made. Items that contain essential content should likely be considered for retention, even if their information is lower than that of other items.
Another way in which the function can be used is to specify a minimum reliability that would be acceptable. The results can then be used to identify the point at which that reliability is reached and to note which items were identified for removal. Again, careful consideration should also be given to the item content.
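The minimum-reliability approach can be sketched as a simple lookup in the table of steps. The code below re-enters the table printed above as a data frame; the threshold of 0.965 is a hypothetical value chosen for illustration, not a recommendation.

```r
# Steps table from the example analysis above, entered by hand
steps <- data.frame(
  Step = 0:17,
  Item.removed = c("None", "R21", "R25", "R8", "R11", "R9", "R13", "R14", "R18",
                   "R12", "R17", "R23", "R26", "R6", "R5", "R15", "R2", "R3"),
  Omega = c(0.9679626, 0.9682411, 0.9698631, 0.9694007, 0.9691310, 0.9690738,
            0.9690519, 0.9688353, 0.9690265, 0.9689736, 0.9684700, 0.9678633,
            0.9672829, 0.9661940, 0.9646160, 0.9628851, 0.9608687, 0.9585747)
)

min_omega <- 0.965   # hypothetical minimum acceptable reliability

# Last step at which omega is still at or above the threshold
last_ok <- max(steps$Step[steps$Omega >= min_omega])

# Number of items left at that step (the pool started with 29 items)
items_left <- 29 - last_ok

# Items flagged for removal up to that step; review their content before dropping
dropped <- steps$Item.removed[steps$Step >= 1 & steps$Step <= last_ok]

last_ok
items_left
dropped
```

The items in dropped are candidates for removal only; their content should still be reviewed before a final decision is made.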