Maximising coefficient omega

Author

Deon de Bruin

The Step.Omega function

This document contains R code that can be used to select items from a pool of items, such that the reliability of the set of items will be maximized. The code can be used with any data set containing polytomous items (ideally the items should have five or more response options).

The pool of items

As an example, we have a pool of 29 Anxiety items to which 766 people have responded. The items employ a five-point Likert-type rating scale. We want to select the 12 items that will maximise the reliability (coefficient omega) of the scale.

The data for this example are available from the lordif package. The name of the data frame is Anxiety. Users can specify any data frame they want.

library(lordif)
data(Anxiety)

Assumptions

The item selection approach proceeds on the assumption that a single factor model with uncorrelated unique variances fits the data. Unfortunately, this assumption is often violated with personality data. In such cases, one might proceed if it can be shown that the items measure a strong general factor that dominates all other factors.
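One rough way to gauge whether a single general factor dominates is to compare the first and second eigenvalues of the item correlation matrix. The sketch below uses base R only. The helper name eigen_ratio and the simulated data are assumptions made for illustration, and a large ratio is only an informal indication, not a test of model fit.

```r
# Hypothetical helper: ratio of the first to the second eigenvalue
# of the item correlation matrix
eigen_ratio <- function(items) {
  ev <- eigen(cor(items, use = "pairwise.complete.obs"),
              symmetric = TRUE, only.values = TRUE)$values
  ev[1] / ev[2]
}

# Simulated stand-in data: 8 items driven by one common factor
# (loading 0.7, unique variance 0.51, so each item has variance of about 1)
set.seed(123)
n   <- 500
eta <- rnorm(n)
sim <- as.data.frame(sapply(1:8, function(j) 0.7 * eta + rnorm(n, sd = sqrt(0.51))))

# A ratio well above 1 suggests that one factor dominates
eigen_ratio(sim)
```

With the data used in this document, the same helper could be applied to the data frame that contains the items only.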

Activating the packages we need

The analysis requires the psych and MBESS packages.

library(psych)
library(MBESS)

Preparing the data for analysis

The Anxiety data frame contains three columns with demographic variables that we don’t need now. We create a new data frame (simply called “data”) that contains the items only. We print the names of the variables to check that the data frame contains the variables we are interested in.

Users can specify any data frame containing any set of items, on condition that the data frame contains the items only.
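When the item names follow a pattern, selecting columns by name can be less error-prone than selecting them by position. The following is a minimal sketch with a hypothetical toy data frame; the column names and values are made up for illustration.

```r
# Toy data frame standing in for a data set with demographics plus items
toy <- data.frame(age    = c(34, 51, 27),
                  gender = c(1, 0, 1),
                  R1     = c(1, 3, 2),
                  R2     = c(2, 2, 5),
                  R3     = c(1, 4, 4))

# Keep only the columns whose names look like R1, R2, ...
items <- toy[grepl("^R[0-9]+$", names(toy))]

# Basic check before any factor analysis: every item column is numeric
stopifnot(all(vapply(items, is.numeric, logical(1))))

names(items)
```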

data <- Anxiety[-c(1:3)]
names(data)

 [1] "R1"  "R2"  "R3"  "R4"  "R5"  "R6"  "R7"  "R8"  "R9"  "R10" "R11" "R12"
[13] "R13" "R14" "R15" "R16" "R17" "R18" "R19" "R20" "R21" "R22" "R23" "R24"
[25] "R25" "R26" "R27" "R28" "R29"

Descriptive statistics

As a first step we check descriptive statistics of the items.

library(psych)
describe(data)

    vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
R1     1 766 1.49 0.83      1    1.31 0.00   1   5     4 1.79     2.87 0.03
R2     2 766 1.43 0.75      1    1.25 0.00   1   5     4 1.87     3.32 0.03
R3     3 766 1.42 0.81      1    1.23 0.00   1   5     4 2.03     3.82 0.03
R4     4 766 1.87 1.03      2    1.70 1.48   1   5     4 1.04     0.27 0.04
R5     5 766 1.46 0.88      1    1.26 0.00   1   5     4 1.97     3.27 0.03
R6     6 766 1.57 0.94      1    1.37 0.00   1   5     4 1.68     2.14 0.03
R7     7 766 1.99 0.98      2    1.89 1.48   1   5     4 0.64    -0.44 0.04
R8     8 766 1.52 0.84      1    1.36 0.00   1   5     4 1.53     1.68 0.03
R9     9 766 1.67 0.95      1    1.51 0.00   1   5     4 1.34     1.12 0.03
R10   10 766 1.41 0.79      1    1.22 0.00   1   5     4 2.05     3.88 0.03
R11   11 766 1.65 0.92      1    1.48 0.00   1   5     4 1.50     1.96 0.03
R12   12 766 1.88 0.96      2    1.76 1.48   1   5     4 0.91     0.24 0.03
R13   13 766 1.70 1.06      1    1.50 0.00   1   5     4 1.32     0.61 0.04
R14   14 766 1.80 0.99      1    1.65 0.00   1   5     4 0.99     0.00 0.04
R15   15 766 1.50 0.86      1    1.32 0.00   1   5     4 1.82     2.88 0.03
R16   16 766 2.04 1.05      2    1.90 1.48   1   5     4 0.73    -0.35 0.04
R17   17 766 1.23 0.60      1    1.08 0.00   1   5     4 3.06    10.57 0.02
R18   18 766 1.93 1.07      2    1.76 1.48   1   5     4 1.00     0.26 0.04
R19   19 766 1.40 0.75      1    1.21 0.00   1   5     4 2.04     4.04 0.03
R20   20 766 1.54 0.91      1    1.36 0.00   1   5     4 1.69     2.21 0.03
R21   21 766 1.54 0.86      1    1.37 0.00   1   5     4 1.57     1.78 0.03
R22   22 766 1.80 0.94      2    1.67 1.48   1   5     4 0.95     0.09 0.03
R23   23 766 1.90 1.03      2    1.76 1.48   1   5     4 0.91    -0.02 0.04
R24   24 766 1.83 1.00      2    1.67 1.48   1   5     4 1.13     0.63 0.04
R25   25 766 2.40 1.21      2    2.31 1.48   1   5     4 0.40    -0.84 0.04
R26   26 766 2.03 1.05      2    1.89 1.48   1   5     4 0.69    -0.49 0.04
R27   27 766 1.83 0.98      2    1.69 1.48   1   5     4 0.99     0.22 0.04
R28   28 766 2.06 1.07      2    1.93 1.48   1   5     4 0.66    -0.51 0.04
R29   29 766 1.55 0.85      1    1.39 0.00   1   5     4 1.53     1.81 0.03

The item selection function

McDonald (1999) describes an item selection procedure that can be used to maximize the reliability of a scale. According to McDonald, the reliability (coefficient omega) of a scale of length k can be maximized by selecting from the pool the k items with the highest information statistics.

The function first obtains a single factor solution of the entire pool of items, using the covariance matrix of the items as input. This yields a solution where the factor is standardized, but the factor loadings and unique variances are unstandardized.

Second, the unstandardized factor loadings and unique variances are used to calculate item information statistics, where information is equal to the ratio of the squared factor loading (signal) to the unique variance (noise).
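The calculation can be illustrated with a small worked example. The loadings and unique variances below are made-up values for four hypothetical items (A to D), and omega_from_model is a hypothetical helper that applies the usual single factor formula for omega: the squared sum of the loadings divided by the squared sum of the loadings plus the sum of the unique variances.

```r
# Made-up unstandardized loadings and unique variances for four items
lambda <- c(A = 0.8, B = 0.6, C = 0.3, D = 0.7)
psi    <- c(A = 0.36, B = 0.64, C = 0.91, D = 0.51)

# Item information: squared loading (signal) over unique variance (noise)
info <- lambda^2 / psi
sort(info)

# Hypothetical helper: coefficient omega under a single factor model
omega_from_model <- function(lambda, psi) {
  sum(lambda)^2 / (sum(lambda)^2 + sum(psi))
}

# Omega for all four items, and after dropping the least informative item
omega_all     <- omega_from_model(lambda, psi)
weakest       <- names(which.min(info))
keep          <- names(lambda) != weakest
omega_reduced <- omega_from_model(lambda[keep], psi[keep])

weakest
c(all = omega_all, reduced = omega_reduced)
```

In this toy example item C has the lowest information, and omega is higher for the three remaining items than for all four.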

Third, the item with the lowest information statistic is removed from the data set. The factor model is then refitted to the remaining items, and the process is repeated until the specified number of items remains in the scale. Coefficient omega is calculated at each step; coefficient alpha is calculated for the final scale only.

Activating the function

Users should paste the code that follows in an R script window and then run it. This will activate the Step.Omega function, which will remain active until the R session is closed. Users can delete the code once the function has been activated.

   Step.Omega <- function(data, items.wanted) {
      ## Data frame that tracks omega as items are removed iteratively
      ## (step 0 is the full item pool)
      StepOmega         <- as.data.frame(c(0, "None",
                                           ci.reliability(data, type = "hierarchical")[1]))
      names(StepOmega)  <- c('Step', 'Item.removed', 'Omega')
      
      ## seq_len() skips the loop when no items need to be removed
      for (i in seq_len(max(ncol(data) - items.wanted, 0))) {
        
        # Fit the single factor model once to the covariance matrix of the
        # remaining items and compute the item information statistics
        fit  <- fa(data, 1, cor = "cov")
        info <- sort(fit$communality / fit$uniquenesses)
        
        # Identify item with lowest information
        low.info        <- names(info[1])
        item.out        <- names(data) %in% c(low.info)
        
        # Remove item with the lowest information from data frame
        data            <- data[!item.out]
        
        # Calculate omega with MBESS package
        rel             <- ci.reliability(data, type = "hierarchical")[1]
        
        # Store the step, name of item removed and omega in a data frame
        LoopOut         <- as.data.frame(c(i, low.info, rel))
        names(LoopOut)  <- c('Step', 'Item.removed', 'Omega')
        
        StepOmega       <- rbind(StepOmega, LoopOut)
      }
      
      ## Omega and alpha of the final scale
      results   <- list(Steps          = StepOmega,
                        Items.in.scale = names(data),
                        Omega          = ci.reliability(data, type = "hierarchical")[1],
                        Alpha          = alpha(data)$total[1])
      print(results)
   }

Performing the analysis

The analysis is run by typing the name of the function (Step.Omega). The function takes two arguments, namely the name of the data frame containing the items, and the number of items that should remain in the scale.

The output contains (a) a table that shows at each step which item was removed (i.e. the item with the lowest information), along with the reliability of the scale without that item; (b) the names of the items that are included in the final scale; (c) coefficient omega of the final scale; and (d) coefficient alpha of the final scale.

Step.Omega(data, items.wanted = 12)

$Steps
   Step Item.removed     Omega
1     0         None 0.9679626
2     1          R21 0.9682411
3     2          R25 0.9698631
4     3           R8 0.9694007
5     4          R11 0.9691310
6     5           R9 0.9690738
7     6          R13 0.9690519
8     7          R14 0.9688353
9     8          R18 0.9690265
10    9          R12 0.9689736
11   10          R17 0.9684700
12   11          R23 0.9678633
13   12          R26 0.9672829
14   13           R6 0.9661940
15   14           R5 0.9646160
16   15          R15 0.9628851
17   16           R2 0.9608687
18   17           R3 0.9585747

$Items.in.scale
 [1] "R1"  "R4"  "R7"  "R10" "R16" "R19" "R20" "R22" "R24" "R27" "R28" "R29"

$Omega
$Omega$est
[1] 0.9585747


$Alpha
 raw_alpha
 0.9575593

Using the function for selecting items

The function can be used to find the set of items that will maximise coefficient omega for a scale of a specified length k. The output lists (a) the items that are retained in the scale, (b) coefficients alpha and omega of the scale, and (c) a table showing which item is removed at each successive step and the scale reliability at that step. A purely mechanical approach would dictate that the k items with the highest information statistics be retained. However, items should rarely be retained or rejected on the basis of statistics alone. It is recommended that the content of an item that is removed by the function be carefully considered before a final decision about retaining or removing it is made. Items that contain essential content should likely be considered for retention, even if their information is lower than that of other items.

Another way to use the function is to specify the minimum reliability that would be acceptable. The results can then be used to identify the point at which that reliability is reached and to note which items were identified for removal. Again, the item content should be carefully considered before any item is removed.
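As a sketch of this second use, the code below copies the $Steps table from the example output into a data frame and finds the last step at which omega is still at or above a chosen minimum. The threshold of 0.965 is a hypothetical value chosen for illustration, and the variable names are assumptions.

```r
# Steps table copied from the example output above
steps <- data.frame(
  Step         = 0:17,
  Item.removed = c("None", "R21", "R25", "R8", "R11", "R9", "R13", "R14", "R18",
                   "R12", "R17", "R23", "R26", "R6", "R5", "R15", "R2", "R3"),
  Omega        = c(0.9679626, 0.9682411, 0.9698631, 0.9694007, 0.9691310,
                   0.9690738, 0.9690519, 0.9688353, 0.9690265, 0.9689736,
                   0.9684700, 0.9678633, 0.9672829, 0.9661940, 0.9646160,
                   0.9628851, 0.9608687, 0.9585747)
)

# Hypothetical minimum acceptable reliability
min.omega <- 0.965

# Last step at which omega is still at or above the minimum
last.ok <- max(steps$Step[steps$Omega >= min.omega])

# Items flagged for removal up to that step, and the resulting scale length
removed    <- steps$Item.removed[steps$Step >= 1 & steps$Step <= last.ok]
items.left <- 29 - last.ok

last.ok
removed
items.left
```

Because omega does not necessarily decrease at every step, taking the last qualifying step gives the shortest scale in the table that still meets the minimum. The flagged items remain candidates only; their content should be reviewed before they are dropped.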