Reliability Analysis

This tutorial is based on Field et al. (2012). Reliability analysis can be done with the alpha() function in the psych package.

Preparing the data

Load packages and data:

library(dplyr)
#url contains the data set
url <- "http://www.uk.sagepub.com/dsur/study/DSUR%20Data%20Files/Chapter%2017/raq.dat"
dat <- read.table(url, header = TRUE)

The dataset contains responses to a 23-item questionnaire with four subscales, each measuring a different type of fear:

Subscale 1 (fear of computers): items 6, 7, 10, 13, 14, 15, 18
Subscale 2 (fear of statistics): items 1, 3 (reverse-scored), 4, 5, 12, 16, 20, 21
Subscale 3 (fear of maths): items 8, 11, 17
Subscale 4 (fear of negative peer evaluation): items 2, 9, 19, 22, 23

tbl_df(dat)

## Source: local data frame [2,571 x 23]
## 
##    Q01 Q02 Q03 Q04 Q05 Q06 Q07 Q08 Q09 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18
## 1    4   5   2   4   4   4   3   5   5   4   5   4   4   4   4   3   5   4
## 2    5   5   2   3   4   4   4   4   1   4   4   3   5   3   2   3   4   4
## 3    4   3   4   4   2   5   4   4   4   4   3   3   4   2   4   3   4   3
## 4    3   5   5   2   3   3   2   4   4   2   4   4   4   3   3   3   4   2
## 5    4   5   3   4   4   3   3   4   2   4   4   3   3   4   4   4   4   3
## 6    4   5   3   4   2   2   2   4   2   3   4   2   3   3   1   4   3   1
## 7    4   3   3   4   4   4   4   4   3   4   4   4   4   4   4   4   4   4
## 8    4   4   3   4   4   4   4   4   2   4   4   3   4   4   3   4   4   4
## 9    3   3   5   2   1   3   1   1   3   3   1   1   1   1   1   1   1   1
## 10   4   2   2   3   4   5   4   4   3   4   4   3   4   5   4   3   4   4
## .. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
## Variables not shown: Q19 (int), Q20 (int), Q21 (int), Q22 (int), Q23 (int)

Create a new variable to store the items for each subscale. This makes things easier later on. I’m using select() from the dplyr package to pick the columns by position.

computerFear <- select(dat, 6, 7, 10, 13, 14, 15, 18) #each number refers to the column
statsFear <- select(dat, 1, 3, 4, 5, 12, 16, 20, 21)
mathsFear <- select(dat, 8, 11, 17)
peerFear <- select(dat, 2, 9, 19, 22, 23)
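
Selecting by position works, but it is easy to mistype a column number. As an alternative, here is a minimal sketch that builds the column names from the item numbers and selects by name in base R. The data frame toy is made up for illustration and only mimics the Q01 to Q23 naming of the real data.

```r
# Toy stand-in for the questionnaire: 23 columns named Q01..Q23 (made-up data)
set.seed(42)
toy <- as.data.frame(matrix(sample(1:5, 10 * 23, replace = TRUE), ncol = 23))
names(toy) <- sprintf("Q%02d", 1:23)

# Build the item names from the item numbers, then select by name
maths_items <- sprintf("Q%02d", c(8, 11, 17))
mathsFearToy <- toy[, maths_items]

names(mathsFearToy)
```

Selecting by name gives the same columns as selecting by position here, but it fails loudly if a name is missing.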

To do the reliability analysis, you’ll need to load the psych package and use its alpha() function. However, ggplot2 also has a function called alpha(). If both packages are loaded, the one loaded last masks the other, so a bare call to alpha() may run the ggplot2 version instead. To avoid this, specify the package explicitly with psych::alpha().

library(psych)

If your scale contains reverse-scored items, you need to say which ones they are. The keys argument takes one value per item: 1 for a positively scored item and -1 for a reverse-scored item.

alpha(statsFear, keys = c(1, -1, 1, 1, 1, 1, 1, 1)) #Q03, the second column of statsFear, is reverse-scored
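
To see what reverse-scoring does to the numbers, here is a small base R sketch that reverses an item by hand. The responses in q03 are made up, and the 1 to 5 range is an assumption about the response scale; this only illustrates the idea and is not necessarily how psych does it internally.

```r
# Hypothetical responses to a reverse-scored item on a 1-5 scale
q03 <- c(1, 2, 3, 4, 5, 2)

scale_min <- 1
scale_max <- 5

# Reversing maps 1 -> 5, 2 -> 4, 3 -> 3, 4 -> 2, 5 -> 1
q03_reversed <- (scale_max + scale_min) - q03
q03_reversed
# 5 4 3 2 1 4
```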

If all your items are positively scored (not reverse-scored), you can leave out the keys argument, because alpha() assumes by default that every item is positively scored:

alpha(computerFear)
alpha(mathsFear)
alpha(peerFear)

Interpreting the output

Fear of computers subscale

alpha(computerFear)

## 
## Reliability analysis   
## Call: alpha(x = computerFear)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd
##       0.82      0.82    0.81       0.4 4.6 0.0094  3.4 0.71
## 
##  lower alpha upper     95% confidence boundaries
## 0.8 0.82 0.84 
## 
##  Reliability if an item is dropped:
##     raw_alpha std.alpha G6(smc) average_r S/N alpha se
## Q06      0.79      0.79    0.77      0.38 3.7    0.011
## Q07      0.79      0.79    0.77      0.38 3.7    0.011
## Q10      0.82      0.82    0.80      0.44 4.7    0.010
## Q13      0.79      0.79    0.77      0.39 3.8    0.011
## Q14      0.80      0.80    0.77      0.39 3.9    0.011
## Q15      0.81      0.81    0.79      0.41 4.2    0.011
## Q18      0.79      0.78    0.76      0.38 3.6    0.011
## 
##  Item statistics 
##        n raw.r std.r r.cor r.drop mean   sd
## Q06 2571  0.75  0.74  0.68   0.62  3.8 1.12
## Q07 2571  0.75  0.73  0.68   0.62  3.1 1.10
## Q10 2571  0.54  0.57  0.44   0.40  3.7 0.88
## Q13 2571  0.72  0.73  0.67   0.61  3.6 0.95
## Q14 2571  0.70  0.70  0.64   0.58  3.1 1.00
## Q15 2571  0.64  0.64  0.54   0.49  3.2 1.01
## Q18 2571  0.76  0.76  0.72   0.65  3.4 1.05
## 
## Non missing response frequency for each item
##        1    2    3    4    5 miss
## Q06 0.06 0.10 0.13 0.44 0.27    0
## Q07 0.09 0.24 0.26 0.34 0.07    0
## Q10 0.02 0.10 0.18 0.57 0.14    0
## Q13 0.03 0.12 0.25 0.48 0.12    0
## Q14 0.07 0.18 0.38 0.31 0.06    0
## Q15 0.06 0.18 0.30 0.39 0.07    0
## Q18 0.06 0.12 0.31 0.37 0.14    0

What do the summary statistics mean?

raw_alpha: Cronbach’s α, computed from the item covariances (values ≥ .7 or .8 indicate good reliability; Kline, 1999)
std.alpha: standardized α, computed from the item correlations; it should be similar to raw_alpha (we only need the raw alpha here)
G6: Guttman’s lambda 6 (calculated from the squared multiple correlation or ‘smc’)
average_r: average inter-item correlation (this is used to calculate std.alpha)
mean: scale mean (the mean, across individuals, of each person’s mean item score)
sd: scale standard deviation
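
To make these statistics less of a black box, here is a base R sketch that computes raw alpha, the average inter-item correlation, and standardized alpha by hand using the usual textbook formulas. The data are simulated (five made-up items sharing one latent trait), so the numbers will not match the questionnaire output, and this is not a copy of what psych does internally.

```r
set.seed(123)
n <- 300
latent <- rnorm(n)

# Five made-up items that each reflect the same latent trait plus noise
items <- sapply(1:5, function(i) latent + rnorm(n))
colnames(items) <- paste0("item", 1:5)

k <- ncol(items)

# Raw alpha: based on item variances and the variance of the total score
item_vars <- apply(items, 2, var)
total_var <- var(rowSums(items))
raw_alpha <- (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Average inter-item correlation: mean of the off-diagonal correlations
r <- cor(items)
average_r <- mean(r[lower.tri(r)])

# Standardized alpha: based only on k and the average correlation
std_alpha <- (k * average_r) / (1 + (k - 1) * average_r)

round(c(raw_alpha = raw_alpha, average_r = average_r, std_alpha = std_alpha), 2)
```

As a sanity check against the output above, plugging k = 7 and average_r = 0.4 into the standardized formula gives 7 × 0.4 / (1 + 6 × 0.4) ≈ 0.82, which agrees with the reported std.alpha.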

How to interpret ‘Reliability if an item is dropped’?

The overall α (raw_alpha) is .82. Each row refers to one item, and its raw_alpha is the α the scale would have if that item were dropped. For example, the first row refers to Q06: if it is dropped, the overall α becomes .79, which reflects worse reliability, so we want to keep Q06.
We are checking whether any of these raw alpha values are greater than the overall α of .82; if so, dropping that particular item would increase the overall α of the scale.
The other columns of this table show how the other statistics would change if that particular item were dropped.
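
The same check can be sketched in base R: recompute α with each item left out and compare it with the overall α. The helper cronbach_alpha() and the simulated data are my own illustration (four related items plus one pure-noise item), not part of psych.

```r
# Hypothetical helper: Cronbach's alpha from the textbook formula
cronbach_alpha <- function(x) {
  k <- ncol(x)
  (k / (k - 1)) * (1 - sum(apply(x, 2, var)) / var(rowSums(x)))
}

set.seed(123)
n <- 300
latent <- rnorm(n)

# Four made-up items share a latent trait; the fifth is unrelated noise
items <- sapply(1:4, function(i) latent + rnorm(n))
items <- cbind(items, rnorm(n))
colnames(items) <- paste0("item", 1:5)

overall <- cronbach_alpha(items)

# Alpha of the scale with each item removed in turn
alpha_if_dropped <- sapply(seq_len(ncol(items)), function(j) {
  cronbach_alpha(items[, -j, drop = FALSE])
})
names(alpha_if_dropped) <- colnames(items)

round(overall, 2)
round(alpha_if_dropped, 2)

# TRUE marks items whose removal would raise alpha
alpha_if_dropped > overall
```

In this simulation only the noise item should be flagged, which mirrors the logic of scanning the table for values above the overall α.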

How to interpret ‘Item statistics’?

raw.r: correlation between the item and the total score from the scale (i.e., item-total correlations); the problem with raw.r is that the item itself is included in the total, so we are partly correlating the item with itself and the correlation is inflated (r.cor and r.drop solve this problem; see ?alpha for details)
r.drop: item-total correlation without that item itself (i.e., item-rest correlation or corrected item-total correlation); low values indicate that the item doesn’t correlate well with the rest of the scale
r.cor: item-total correlation corrected for item overlap and scale reliability
mean and sd: mean and standard deviation of that item
All items should correlate with the total score, so we’re looking for items that don’t. If an r.drop value is less than about .3, that particular item doesn’t correlate very well with the scale overall.
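
Here is a short base R sketch of the difference between raw.r and r.drop, computed by hand on simulated data. The items are made up, and r.cor is left out because its correction is more involved.

```r
set.seed(123)
n <- 300
latent <- rnorm(n)

# Five made-up items that each reflect the same latent trait plus noise
items <- sapply(1:5, function(i) latent + rnorm(n))
colnames(items) <- paste0("item", 1:5)

total <- rowSums(items)

# Item-total correlation: the item is part of the total (inflated)
raw_r <- sapply(seq_len(ncol(items)), function(j) cor(items[, j], total))

# Item-rest correlation: the item is removed from the total first
r_drop <- sapply(seq_len(ncol(items)), function(j) {
  cor(items[, j], total - items[, j])
})

item_stats <- cbind(raw_r = raw_r, r_drop = r_drop)
rownames(item_stats) <- colnames(items)
round(item_stats, 2)

# Items falling under the rule-of-thumb threshold of .3
rownames(item_stats)[r_drop < 0.3]
```

Because the item no longer contributes to the score it is compared with, r_drop comes out lower than raw_r for every item here.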

How to interpret the final frequency table?

This table tells us what proportion of people gave each response to each item (i.e., on a 5-point scale, it shows the proportion of responses that were 1, 2, 3, 4, or 5; multiply by 100 for percentages). The miss column gives the proportion of missing responses.
This helps you check the distribution of responses and whether nearly everyone is giving the same response (which will lead to low reliability).
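
One row of this table can be reproduced in base R with table() and prop.table(). The responses below are made up; wrapping them in factor() with levels 1 to 5 keeps categories that nobody chose in the table as zeros.

```r
# Hypothetical responses to a single item on a 1-5 scale
responses <- c(4, 5, 4, 3, 4, 2, 4, 5, 1, 4)

# Proportion of respondents choosing each response option
freq <- prop.table(table(factor(responses, levels = 1:5)))
round(freq, 2)
```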

Fear of statistics subscale

alpha(statsFear, keys = c(1, -1, 1, 1, 1, 1, 1, 1)) #Q03, the second column of statsFear, is reverse-scored

## 
## Reliability analysis   
## Call: alpha(x = statsFear, keys = c(1, -1, 1, 1, 1, 1, 1, 1))
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd
##       0.82      0.82    0.81      0.37 4.7 0.0089    3 0.64
## 
##  lower alpha upper     95% confidence boundaries
## 0.8 0.82 0.84 
## 
##  Reliability if an item is dropped:
##      raw_alpha std.alpha G6(smc) average_r S/N alpha se
## Q01       0.80      0.80    0.79      0.37 4.1   0.0101
## Q03-      0.80      0.80    0.79      0.37 4.1   0.0101
## Q04       0.80      0.80    0.78      0.36 4.0   0.0102
## Q05       0.81      0.81    0.80      0.38 4.2   0.0099
## Q12       0.80      0.80    0.79      0.36 4.0   0.0102
## Q16       0.79      0.80    0.78      0.36 3.9   0.0103
## Q20       0.82      0.82    0.80      0.40 4.6   0.0096
## Q21       0.79      0.80    0.78      0.36 3.9   0.0104
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop mean   sd
## Q01  2571  0.65  0.67  0.60   0.54  3.6 0.83
## Q03- 2571  0.69  0.67  0.60   0.55  2.6 1.08
## Q04  2571  0.69  0.70  0.64   0.58  3.2 0.95
## Q05  2571  0.63  0.63  0.55   0.49  3.3 0.96
## Q12  2571  0.69  0.69  0.63   0.57  2.8 0.92
## Q16  2571  0.71  0.71  0.67   0.60  3.1 0.92
## Q20  2571  0.58  0.56  0.47   0.42  2.4 1.04
## Q21  2571  0.72  0.71  0.67   0.61  2.8 0.98
## 
## Non missing response frequency for each item
##        1    2    3    4    5 miss
## Q01 0.02 0.07 0.29 0.52 0.11    0
## Q03 0.03 0.17 0.34 0.26 0.19    0
## Q04 0.05 0.17 0.36 0.37 0.05    0
## Q05 0.04 0.18 0.29 0.43 0.06    0
## Q12 0.09 0.23 0.46 0.20 0.02    0
## Q16 0.06 0.16 0.42 0.33 0.04    0
## Q20 0.22 0.37 0.25 0.15 0.02    0
## Q21 0.09 0.29 0.34 0.26 0.02    0

How to interpret?

Same as before. Just note that reverse-scored items are marked with a minus sign after their name in the output (e.g., Q03-) to indicate that they have been reversed.

Fear of statistics subscale (without specifying reverse-scored item: WRONG!)

This demonstrates what happens when you forget to reverse your items using the keys parameter:

#alpha(statsFear, keys = c(1, -1, 1, 1, 1, 1, 1, 1)) #correct call: Q03, the second column of statsFear, is reverse-scored
alpha(statsFear, keys = c(1, 1, 1, 1, 1, 1, 1, 1))

## 
## Reliability analysis   
## Call: alpha(x = statsFear, keys = c(1, 1, 1, 1, 1, 1, 1, 1))
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase mean  sd
##       0.61      0.64    0.71      0.18 1.8 0.014  3.1 0.5
## 
##  lower alpha upper     95% confidence boundaries
## 0.58 0.61 0.63 
## 
##  Reliability if an item is dropped:
##     raw_alpha std.alpha G6(smc) average_r S/N alpha se
## Q01      0.52      0.56    0.64      0.15 1.3    0.017
## Q03      0.80      0.80    0.79      0.37 4.1    0.010
## Q04      0.50      0.55    0.64      0.15 1.2    0.017
## Q05      0.52      0.57    0.66      0.16 1.3    0.017
## Q12      0.52      0.56    0.65      0.15 1.3    0.017
## Q16      0.51      0.55    0.63      0.15 1.2    0.017
## Q20      0.56      0.60    0.68      0.18 1.5    0.016
## Q21      0.50      0.55    0.63      0.15 1.2    0.017
## 
##  Item statistics 
##        n raw.r std.r r.cor r.drop mean   sd
## Q01 2571  0.65  0.68  0.62   0.51  3.6 0.83
## Q03 2571 -0.35 -0.37 -0.64  -0.55  3.4 1.08
## Q04 2571  0.69  0.69  0.65   0.53  3.2 0.95
## Q05 2571  0.65  0.65  0.57   0.47  3.3 0.96
## Q12 2571  0.66  0.67  0.62   0.50  2.8 0.92
## Q16 2571  0.69  0.70  0.66   0.53  3.1 0.92
## Q20 2571  0.57  0.55  0.45   0.35  2.4 1.04
## Q21 2571  0.70  0.70  0.66   0.54  2.8 0.98
## 
## Non missing response frequency for each item
##        1    2    3    4    5 miss
## Q01 0.02 0.07 0.29 0.52 0.11    0
## Q03 0.03 0.17 0.34 0.26 0.19    0
## Q04 0.05 0.17 0.36 0.37 0.05    0
## Q05 0.04 0.18 0.29 0.43 0.06    0
## Q12 0.09 0.23 0.46 0.20 0.02    0
## Q16 0.06 0.16 0.42 0.33 0.04    0
## Q20 0.22 0.37 0.25 0.15 0.02    0
## Q21 0.09 0.29 0.34 0.26 0.02    0

What’s wrong?

raw_alpha has dropped from .82 to .61.
Reliability if an item is dropped for Q03 is .80, much higher than the overall scale α of .61. This makes Q03 look like an item that should be deleted, when in fact it only needs to be reverse-scored.
Q03 has a negative item-total correlation (this is a good way to spot a reverse-scored item that hasn’t been reversed).
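
The negative item-rest correlation can be turned into a quick screening step. The base R sketch below simulates five items on a 1 to 5 scale, deliberately leaves one of them un-reversed, flags it, and compares α before and after fixing it. The data, the helper functions, and the 6 - x reversal (which assumes a 1 to 5 scale) are all illustrative assumptions. A negative correlation is only a hint, so always check the item wording before reversing anything.

```r
# Hypothetical helper: Cronbach's alpha from the textbook formula
cronbach_alpha <- function(x) {
  k <- ncol(x)
  (k / (k - 1)) * (1 - sum(apply(x, 2, var)) / var(rowSums(x)))
}

# Hypothetical helper: turn continuous scores into 1-5 responses
to_likert <- function(z) pmin(pmax(round(3 + z), 1), 5)

set.seed(2024)
n <- 400
latent <- rnorm(n)
items <- sapply(1:5, function(i) to_likert(latent + rnorm(n, sd = 0.8)))
colnames(items) <- paste0("item", 1:5)

# item2 is worded in the opposite direction and has not been reversed
items[, 2] <- 6 - items[, 2]

# Item-rest correlations; negative values are suspicious
r_drop <- sapply(seq_len(ncol(items)), function(j) {
  cor(items[, j], rowSums(items[, -j]))
})
suspects <- colnames(items)[r_drop < 0]
suspects

# Reverse the flagged items and compare alpha before and after
fixed <- items
fixed[, suspects] <- 6 - fixed[, suspects]
round(c(before = cronbach_alpha(items), after = cronbach_alpha(fixed)), 2)
```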

How to report a reliability analysis in APA style?

The fear of computers subscale was reliable, α = .82, but the fear of negative peer evaluation subscale had relatively low reliability, α = .57.
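
APA style reports α to two decimal places without a leading zero. Here is a small base R sketch for formatting a value that way; format_apa_alpha() is a hypothetical helper of my own, not a function from psych.

```r
# Hypothetical helper: two decimals, leading zero removed
format_apa_alpha <- function(alpha_value) {
  rounded <- sprintf("%.2f", alpha_value)
  sub("^0\\.", ".", rounded)
}

paste0("\u03b1 = ", format_apa_alpha(0.8213))
# "α = .82"
```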

References

Field, A., Miles, J., & Field, Z. (2012). Discovering statistics using R. London: Sage Publications.
Kline, P. (1999). The handbook of psychological testing (2nd ed.). London: Routledge.