Psychometrics HW #3

Intro to Factor Analysis

Packages needed for this assignment:

library(psych)
library(GPArotation)
library(polycor)
library(prettydoc)

Let’s go ahead and read in our data. This file contains item responses from 500 individuals to 30 categorical (binary) items.

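data <- read.delim("/Users/alliechoate/Desktop/Psychometric Materials/Psychometrics/Homework3joon.txt", header = TRUE) # header = TRUE treats the first row of the file as variable names
# Note: the output below reports n = 499 and column names such as X0 and X1.1, which suggests
# the first row of the file may hold responses rather than names; if so, header = FALSE
# would keep all 500 respondents.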

1. Perform an exploratory factor analysis (EFA) using principal axis factoring with an oblique rotation (direct oblimin).

We will use the psych package to perform a basic EFA. The ‘fa’ command lets us specify the factoring method along with a variety of orthogonal and oblique rotations. In this case, we want principal axis factoring, so we set fm to ‘pa’. We also want an oblique factor rotation. The oblique rotations available in the psych package are: “Promax”, “promax”, “oblimin”, “simplimax”, “bentlerQ”, “geominQ”, “biquartimin”, and “cluster”.

The ‘fa’ command defaults to extracting one factor. It may be beneficial in this case to look at the eigenvalue table or the scree plot before deciding how many factors to extract. You can then set the number of factors with ‘nfactors = x’ in the ‘fa’ command.

library(psych)

## Warning: package 'psych' was built under R version 3.4.4

EFA <- fa(data, nfactors = 3, fm = "pa", rotate = "oblimin") # 3 factors, principal axis factoring, oblimin rotation

## Loading required namespace: GPArotation

a. How many factors are indicated by the scree plot, eigenvalue table, and pattern of loadings? Explain your answer.

The pattern matrix suggests a three-factor solution, in which items ‘X0’ through ‘X0.9’ load strongly on ‘PA1’, items ‘X0.10’ through ‘X0.19’ load strongly on ‘PA2’, and items ‘X1’ through ‘X0.21’ load strongly on ‘PA3’.
To draw the scree plot, we can use the command below.

scree(data)

A common rule of thumb is to retain factors with an eigenvalue at or above 1. We can also look for the largest dip in the scree plot, which often points to a similar number. Above, the largest dip in the scree plot follows the first eigenvalue, and the plot as a whole suggests that a four-factor solution should be sufficient. Remember, however, that the scree plot often overestimates the number of factors needed. In this case, it may be safer to extract three factors or to look at other components of our factor analysis to decide how many factors to extract.
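The eigenvalue table itself is not printed in this write-up. As a minimal sketch of how one could be produced in base R, the block below simulates binary item responses and counts the eigenvalues of their correlation matrix that are at or above 1. The simulated data (`sim_items`) and the helper `kaiser_count` are assumptions made for illustration; they are not the homework file or a psych function.

```r
# Simulate 300 respondents on 9 binary items driven by 3 latent traits
# (hypothetical data for illustration only, not the homework file).
set.seed(123)
n <- 300
theta <- matrix(rnorm(n * 3), ncol = 3)        # latent trait scores
trait_of_item <- rep(1:3, each = 3)            # items 1-3 -> trait 1, and so on
latent <- 0.8 * theta[, trait_of_item] + 0.6 * matrix(rnorm(n * 9), ncol = 9)
sim_items <- as.data.frame((latent > 0) * 1)   # dichotomize at 0
names(sim_items) <- paste0("item", 1:9)

# Eigenvalues of the Pearson correlation matrix, largest first
ev <- eigen(cor(sim_items))$values

# Eigenvalue rule: count eigenvalues at or above 1
kaiser_count <- function(eigenvalues) sum(eigenvalues >= 1)

round(ev, 2)       # the eigenvalue table
kaiser_count(ev)   # number of factors the rule would retain
```

On the homework data, the same two lines could be applied to `cor(data)`; the eigenvalues always sum to the number of items, which is why 1 serves as the "average" cutoff.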

We can also quickly visualize our factor analysis if we save it as an object and then use the default ‘plot’ function in R to see our results. As expected, we see a clustering effect for our three factors.

plot(EFA)

Based on the above results, it is probably safe to say a 3-factor solution is appropriate given our data.

b. Examine the factor loadings and identify the most discriminating item and the least discriminating item.

The most discriminating item is the item with the highest factor loading, and the least discriminating item is the item with the lowest. We can find these items by looking at the pattern matrix of loadings.

Most discriminating = X0.4 (factor loading of ‘.74’ on factor 1)
Least discriminating = X0.3 (factor loading of ‘0’ on factor 3)
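One way to scan a loading matrix is to take each item's primary loading, meaning its largest absolute loading across factors, and then compare items on that value. The sketch below does this for a three-row excerpt copied from the geominQ loading table printed in question 4 (only rows with all three loadings shown are used), so it illustrates the mechanics rather than the full 30-item answer. The object names are assumptions for illustration.

```r
# Three rows copied from the geominQ loading table in question 4
# (an excerpt only; the full table has 30 items).
loadings_excerpt <- matrix(
  c( 0.600, 0.162, 0.110,    # X0.2
    -0.114, 0.186, 0.407,    # X1.2
     0.229, 0.168, 0.345),   # X0.21
  ncol = 3, byrow = TRUE,
  dimnames = list(c("X0.2", "X1.2", "X0.21"), c("MR1", "MR2", "MR3"))
)

# Primary loading = largest absolute loading in each row
primary <- apply(abs(loadings_excerpt), 1, max)
primary_factor <- colnames(loadings_excerpt)[apply(abs(loadings_excerpt), 1, which.max)]

data.frame(item = names(primary), factor = primary_factor, loading = primary)

names(which.max(primary))  # item with the largest primary loading in this excerpt
names(which.min(primary))  # item with the smallest primary loading in this excerpt
```

With a fitted object, the full matrix can be obtained with `unclass(EFA$loadings)` and passed through the same steps.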

2. Run a classical test theory (CTT) reliability analysis on the combined set of 30 items. Then run reliability analyses on the item subsets associated with each of the factors identified above. Why do the reliabilities for the subsets and full measure differ? Are the “subscale” reliabilities “adequate”?

To get alpha for the full dataset, run the ‘alpha’ command from the psych package, as in the line of code below. In the output, we can see that alpha for the full dataset is 0.86.

alpha(data)

## 
## Reliability analysis   
## Call: alpha(x = data)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean  sd median_r
##       0.86      0.85    0.89      0.16 5.8 0.0092  0.5 0.2     0.13
## 
##  lower alpha upper     95% confidence boundaries
## 0.84 0.86 0.87 
## 
##  Reliability if an item is dropped:
##       raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## X0         0.85      0.85    0.89      0.16 5.7   0.0094 0.023  0.14
## X0.1       0.85      0.85    0.89      0.16 5.6   0.0096 0.023  0.13
## X0.2       0.84      0.84    0.89      0.16 5.4   0.0100 0.023  0.12
## X0.3       0.85      0.85    0.89      0.16 5.5   0.0098 0.023  0.13
## X0.4       0.85      0.84    0.89      0.16 5.5   0.0098 0.022  0.13
## X0.5       0.85      0.85    0.89      0.16 5.6   0.0096 0.024  0.13
## X0.6       0.85      0.84    0.89      0.16 5.4   0.0098 0.023  0.13
## X0.7       0.85      0.85    0.89      0.17 5.7   0.0094 0.023  0.14
## X0.8       0.85      0.85    0.89      0.16 5.7   0.0094 0.023  0.14
## X0.9       0.85      0.85    0.89      0.16 5.6   0.0096 0.023  0.13
## X0.10      0.85      0.85    0.89      0.17 5.8   0.0094 0.022  0.14
## X0.11      0.86      0.85    0.89      0.17 5.8   0.0093 0.023  0.14
## X0.12      0.86      0.85    0.89      0.17 5.8   0.0093 0.023  0.14
## X0.13      0.85      0.85    0.89      0.16 5.7   0.0095 0.024  0.13
## X0.14      0.85      0.85    0.89      0.16 5.7   0.0094 0.023  0.14
## X0.15      0.85      0.85    0.89      0.16 5.6   0.0095 0.023  0.14
## X0.16      0.85      0.85    0.89      0.16 5.5   0.0096 0.023  0.13
## X0.17      0.85      0.85    0.89      0.16 5.5   0.0096 0.023  0.13
## X0.18      0.85      0.85    0.89      0.16 5.6   0.0096 0.023  0.13
## X0.19      0.85      0.85    0.89      0.16 5.5   0.0097 0.024  0.13
## X1         0.85      0.85    0.89      0.16 5.7   0.0094 0.023  0.14
## X1.1       0.85      0.85    0.89      0.16 5.5   0.0096 0.024  0.13
## X1.2       0.86      0.85    0.89      0.17 5.8   0.0093 0.024  0.13
## X1.3       0.85      0.85    0.89      0.17 5.8   0.0093 0.022  0.14
## X1.4       0.85      0.85    0.89      0.16 5.7   0.0094 0.023  0.13
## X1.5       0.86      0.86    0.89      0.17 6.0   0.0091 0.022  0.14
## X0.20      0.85      0.85    0.89      0.16 5.5   0.0096 0.023  0.13
## X1.6       0.85      0.85    0.89      0.16 5.6   0.0095 0.024  0.13
## X1.7       0.85      0.85    0.89      0.17 5.7   0.0094 0.023  0.14
## X0.21      0.85      0.85    0.89      0.16 5.5   0.0097 0.024  0.12
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop mean   sd
## X0    499  0.38  0.38  0.35   0.32 0.22 0.42
## X0.1  499  0.48  0.48  0.46   0.41 0.45 0.50
## X0.2  499  0.64  0.64  0.63   0.59 0.53 0.50
## X0.3  499  0.56  0.56  0.55   0.50 0.34 0.48
## X0.4  499  0.58  0.57  0.57   0.52 0.53 0.50
## X0.5  499  0.49  0.49  0.46   0.43 0.67 0.47
## X0.6  499  0.58  0.58  0.57   0.53 0.39 0.49
## X0.7  499  0.35  0.36  0.32   0.29 0.26 0.44
## X0.8  499  0.41  0.40  0.37   0.33 0.42 0.49
## X0.9  499  0.49  0.49  0.47   0.43 0.26 0.44
## X0.10 499  0.35  0.34  0.32   0.28 0.27 0.45
## X0.11 499  0.33  0.33  0.29   0.26 0.44 0.50
## X0.12 499  0.27  0.28  0.24   0.22 0.15 0.36
## X0.13 499  0.41  0.41  0.38   0.35 0.28 0.45
## X0.14 499  0.40  0.39  0.36   0.33 0.46 0.50
## X0.15 499  0.46  0.45  0.44   0.40 0.43 0.50
## X0.16 499  0.50  0.50  0.49   0.44 0.39 0.49
## X0.17 499  0.51  0.50  0.49   0.44 0.52 0.50
## X0.18 499  0.48  0.48  0.46   0.42 0.36 0.48
## X0.19 499  0.53  0.52  0.51   0.47 0.48 0.50
## X1    499  0.36  0.37  0.34   0.29 0.77 0.42
## X1.1  499  0.49  0.50  0.48   0.44 0.74 0.44
## X1.2  499  0.31  0.31  0.27   0.24 0.68 0.47
## X1.3  499  0.33  0.34  0.32   0.26 0.77 0.42
## X1.4  499  0.38  0.40  0.37   0.32 0.75 0.43
## X1.5  499  0.19  0.20  0.15   0.12 0.73 0.45
## X0.20 499  0.51  0.52  0.51   0.45 0.74 0.44
## X1.6  499  0.43  0.44  0.41   0.37 0.73 0.44
## X1.7  499  0.35  0.36  0.33   0.28 0.75 0.43
## X0.21 499  0.52  0.52  0.49   0.46 0.61 0.49
## 
## Non missing response frequency for each item
##          0    1 miss
## X0    0.78 0.22    0
## X0.1  0.55 0.45    0
## X0.2  0.47 0.53    0
## X0.3  0.66 0.34    0
## X0.4  0.47 0.53    0
## X0.5  0.33 0.67    0
## X0.6  0.61 0.39    0
## X0.7  0.74 0.26    0
## X0.8  0.58 0.42    0
## X0.9  0.74 0.26    0
## X0.10 0.73 0.27    0
## X0.11 0.56 0.44    0
## X0.12 0.85 0.15    0
## X0.13 0.72 0.28    0
## X0.14 0.54 0.46    0
## X0.15 0.57 0.43    0
## X0.16 0.61 0.39    0
## X0.17 0.48 0.52    0
## X0.18 0.64 0.36    0
## X0.19 0.52 0.48    0
## X1    0.23 0.77    0
## X1.1  0.26 0.74    0
## X1.2  0.32 0.68    0
## X1.3  0.23 0.77    0
## X1.4  0.25 0.75    0
## X1.5  0.27 0.73    0
## X0.20 0.26 0.74    0
## X1.6  0.27 0.73    0
## X1.7  0.25 0.75    0
## X0.21 0.39 0.61    0
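For readers curious about what the raw_alpha column measures, the following is a minimal base-R sketch of Cronbach's alpha: the number of items k, divided by k - 1, times one minus the ratio of summed item variances to the variance of the total score. The function name `cronbach_alpha` and the toy data are assumptions for illustration, and the sketch assumes complete data with no reverse-keyed items.

```r
# Cronbach's alpha from a data frame or matrix of item scores
# (assumes complete cases and items keyed in the same direction)
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)                     # number of items
  item_vars <- apply(items, 2, var)    # variance of each item
  total_var <- var(rowSums(items))     # variance of the sum score
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Hypothetical toy data: 6 respondents, 4 binary items
toy <- data.frame(
  i1 = c(1, 1, 0, 1, 0, 0),
  i2 = c(1, 1, 0, 1, 1, 0),
  i3 = c(1, 0, 0, 1, 0, 0),
  i4 = c(1, 1, 1, 1, 0, 0)
)
cronbach_alpha(toy)
```

The formula shows why alpha rises both when items covary more strongly (the total-score variance grows relative to the item variances) and when more items are added.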

To get alpha for each of our factors, let’s first subset the data into three new data frames of 10 items each, one for each of the three factors we extracted.

Conveniently, the items are already arranged in factor order. Subsetting becomes especially easy in this scenario, as we can just tell R which columns we would like to take from our original dataset.

factor1<- data[c(1:10)] 
factor2<- data[c(11:20)]
factor3<- data[c(21:30)]

Now that we have subsetted our data, we can again use the ‘alpha’ command for each of our ‘factors’. The three outputs follow in order.

alpha(factor1)
alpha(factor2)
alpha(factor3)

## 
## Reliability analysis   
## Call: alpha(x = factor1)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd median_r
##       0.86      0.86    0.86      0.38 6.2 0.0091 0.41 0.32     0.39
## 
##  lower alpha upper     95% confidence boundaries
## 0.84 0.86 0.88 
## 
##  Reliability if an item is dropped:
##      raw_alpha std.alpha G6(smc) average_r S/N alpha se  var.r med.r
## X0        0.86      0.86    0.85      0.40 6.0   0.0095 0.0060  0.40
## X0.1      0.85      0.84    0.84      0.38 5.4   0.0103 0.0067  0.38
## X0.2      0.85      0.85    0.84      0.38 5.5   0.0102 0.0068  0.38
## X0.3      0.84      0.84    0.84      0.37 5.3   0.0104 0.0074  0.37
## X0.4      0.84      0.84    0.83      0.37 5.2   0.0107 0.0054  0.37
## X0.5      0.86      0.85    0.85      0.39 5.8   0.0096 0.0062  0.39
## X0.6      0.84      0.84    0.84      0.37 5.3   0.0104 0.0068  0.37
## X0.7      0.86      0.86    0.85      0.40 6.0   0.0095 0.0066  0.41
## X0.8      0.85      0.85    0.84      0.39 5.6   0.0098 0.0081  0.39
## X0.9      0.85      0.85    0.84      0.38 5.6   0.0099 0.0072  0.38
## 
##  Item statistics 
##        n raw.r std.r r.cor r.drop mean   sd
## X0   499  0.57  0.58  0.51   0.46 0.22 0.42
## X0.1 499  0.71  0.70  0.67   0.62 0.45 0.50
## X0.2 499  0.70  0.69  0.65   0.60 0.53 0.50
## X0.3 499  0.73  0.73  0.70   0.65 0.34 0.48
## X0.4 499  0.76  0.75  0.73   0.68 0.53 0.50
## X0.5 499  0.61  0.60  0.54   0.50 0.67 0.47
## X0.6 499  0.72  0.72  0.69   0.64 0.39 0.49
## X0.7 499  0.57  0.58  0.50   0.46 0.26 0.44
## X0.8 499  0.65  0.65  0.59   0.55 0.42 0.49
## X0.9 499  0.65  0.66  0.60   0.56 0.26 0.44
## 
## Non missing response frequency for each item
##         0    1 miss
## X0   0.78 0.22    0
## X0.1 0.55 0.45    0
## X0.2 0.47 0.53    0
## X0.3 0.66 0.34    0
## X0.4 0.47 0.53    0
## X0.5 0.33 0.67    0
## X0.6 0.61 0.39    0
## X0.7 0.74 0.26    0
## X0.8 0.58 0.42    0
## X0.9 0.74 0.26    0

## 
## Reliability analysis   
## Call: alpha(x = factor2)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase mean  sd median_r
##       0.84      0.84    0.84      0.34 5.2 0.011 0.38 0.3     0.32
## 
##  lower alpha upper     95% confidence boundaries
## 0.82 0.84 0.86 
## 
##  Reliability if an item is dropped:
##       raw_alpha std.alpha G6(smc) average_r S/N alpha se  var.r med.r
## X0.10      0.82      0.82    0.82      0.34 4.6    0.012 0.0136  0.31
## X0.11      0.83      0.83    0.84      0.35 4.9    0.011 0.0138  0.35
## X0.12      0.83      0.83    0.84      0.36 5.0    0.011 0.0120  0.35
## X0.13      0.84      0.84    0.84      0.36 5.1    0.011 0.0114  0.35
## X0.14      0.83      0.83    0.83      0.35 4.8    0.011 0.0125  0.31
## X0.15      0.81      0.81    0.82      0.32 4.3    0.012 0.0115  0.30
## X0.16      0.81      0.81    0.82      0.32 4.3    0.012 0.0107  0.31
## X0.17      0.81      0.81    0.81      0.32 4.3    0.012 0.0098  0.31
## X0.18      0.82      0.82    0.83      0.34 4.7    0.012 0.0118  0.32
## X0.19      0.82      0.82    0.83      0.34 4.6    0.012 0.0128  0.32
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop mean   sd
## X0.10 499  0.65  0.66  0.62   0.55 0.27 0.45
## X0.11 499  0.57  0.57  0.49   0.45 0.44 0.50
## X0.12 499  0.51  0.55  0.47   0.42 0.15 0.36
## X0.13 499  0.52  0.53  0.45   0.40 0.28 0.45
## X0.14 499  0.61  0.61  0.54   0.49 0.46 0.50
## X0.15 499  0.74  0.73  0.71   0.65 0.43 0.50
## X0.16 499  0.74  0.73  0.71   0.65 0.39 0.49
## X0.17 499  0.74  0.73  0.71   0.65 0.52 0.50
## X0.18 499  0.64  0.63  0.58   0.53 0.36 0.48
## X0.19 499  0.65  0.64  0.59   0.53 0.48 0.50
## 
## Non missing response frequency for each item
##          0    1 miss
## X0.10 0.73 0.27    0
## X0.11 0.56 0.44    0
## X0.12 0.85 0.15    0
## X0.13 0.72 0.28    0
## X0.14 0.54 0.46    0
## X0.15 0.57 0.43    0
## X0.16 0.61 0.39    0
## X0.17 0.48 0.52    0
## X0.18 0.64 0.36    0
## X0.19 0.52 0.48    0

## 
## Reliability analysis   
## Call: alpha(x = factor3)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase mean   sd median_r
##       0.81      0.81    0.81       0.3 4.3 0.013 0.73 0.27     0.29
## 
##  lower alpha upper     95% confidence boundaries
## 0.78 0.81 0.83 
## 
##  Reliability if an item is dropped:
##       raw_alpha std.alpha G6(smc) average_r S/N alpha se  var.r med.r
## X1         0.79      0.79    0.79      0.30 3.8    0.014 0.0090  0.29
## X1.1       0.79      0.79    0.79      0.30 3.8    0.014 0.0090  0.29
## X1.2       0.81      0.81    0.81      0.32 4.2    0.013 0.0085  0.32
## X1.3       0.78      0.78    0.78      0.29 3.6    0.015 0.0072  0.28
## X1.4       0.78      0.79    0.78      0.29 3.7    0.015 0.0086  0.29
## X1.5       0.80      0.80    0.80      0.31 4.1    0.013 0.0089  0.31
## X0.20      0.78      0.78    0.78      0.29 3.6    0.015 0.0076  0.28
## X1.6       0.79      0.79    0.79      0.30 3.8    0.014 0.0105  0.27
## X1.7       0.79      0.79    0.79      0.29 3.7    0.014 0.0085  0.28
## X0.21      0.80      0.80    0.80      0.31 4.1    0.013 0.0097  0.32
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop mean   sd
## X1    499  0.62  0.63  0.57   0.51 0.77 0.42
## X1.1  499  0.61  0.61  0.55   0.49 0.74 0.44
## X1.2  499  0.50  0.49  0.39   0.35 0.68 0.47
## X1.3  499  0.68  0.69  0.66   0.58 0.77 0.42
## X1.4  499  0.66  0.66  0.62   0.55 0.75 0.43
## X1.5  499  0.53  0.53  0.44   0.39 0.73 0.45
## X0.20 499  0.68  0.68  0.65   0.58 0.74 0.44
## X1.6  499  0.62  0.62  0.56   0.50 0.73 0.44
## X1.7  499  0.63  0.64  0.59   0.52 0.75 0.43
## X0.21 499  0.54  0.53  0.44   0.40 0.61 0.49
## 
## Non missing response frequency for each item
##          0    1 miss
## X1    0.23 0.77    0
## X1.1  0.26 0.74    0
## X1.2  0.32 0.68    0
## X1.3  0.23 0.77    0
## X1.4  0.25 0.75    0
## X1.5  0.27 0.73    0
## X0.20 0.26 0.74    0
## X1.6  0.27 0.73    0
## X1.7  0.25 0.75    0
## X0.21 0.39 0.61    0

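Overall, the alphas for the three factors are similar to the alpha for the full dataset, with F1 being identical (.86). Each subset has only a third of the items, and it is not uncommon for alpha to decrease when the number of items in a scale is reduced. Here that loss is offset because the items within each subset are more homogeneous: the average inter-item correlation is .38, .34, and .30 for the three subsets, compared with .16 for the full set of 30 items, which mixes items from three different factors. F2 and F3 likely have slightly lower alphas than F1 because their items tended to have more ‘cross-loadings’ than many of the items specified for F1. All three subscale alphas are above .80, so the subscale reliabilities are adequate by conventional standards.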

Alpha for F1 = .86
Alpha for F2 = .84
Alpha for F3 = .81
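The trade-off between scale length and item homogeneity can be checked with the standardized alpha formula, alpha = k * r / (1 + (k - 1) * r), where k is the number of items and r is the average inter-item correlation. The sketch below plugs in the k and average_r values reported in the alpha() output above; the helper `std_alpha` is an assumption for illustration, not a psych function.

```r
# Standardized alpha from the number of items (k) and the average
# inter-item correlation (r_bar): alpha = k * r_bar / (1 + (k - 1) * r_bar)
std_alpha <- function(k, r_bar) k * r_bar / (1 + (k - 1) * r_bar)

# k and average_r as reported in the alpha() output above
scales <- data.frame(
  scale = c("Full (30 items)", "F1", "F2", "F3"),
  k     = c(30, 10, 10, 10),
  r_bar = c(0.16, 0.38, 0.34, 0.30)
)
scales$std_alpha <- round(std_alpha(scales$k, scales$r_bar), 2)
scales
```

The computed values (0.85, 0.86, 0.84, 0.81) agree with the std.alpha column reported for each scale: the full measure has three times as many items but a much lower average correlation, so it ends up with about the same alpha as the 10-item subsets.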

3. Using the EFA and CTT results, identify the best five items measuring each factor (explain your choices) and compute the reliability of the new five-item measures. Explain the findings.

Factor 1:

Items with the highest loadings on this factor include X0.1, X0.3, X0.4, X0.6, & X0.8. We can subset our dataset further to include just these items.
Our alpha for these 5 items is: 0.80

new.f1<- factor1[,c("X0.1", "X0.3","X0.4","X0.6", "X0.8")]
alpha(new.f1)

## 
## Reliability analysis   
## Call: alpha(x = new.f1)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase mean   sd median_r
##        0.8       0.8    0.77      0.45 4.1 0.014 0.43 0.37     0.43
## 
##  lower alpha upper     95% confidence boundaries
## 0.77 0.8 0.83 
## 
##  Reliability if an item is dropped:
##      raw_alpha std.alpha G6(smc) average_r S/N alpha se  var.r med.r
## X0.1      0.77      0.77    0.72      0.45 3.3    0.017 0.0027  0.45
## X0.3      0.76      0.76    0.71      0.44 3.2    0.018 0.0060  0.42
## X0.4      0.74      0.74    0.68      0.42 2.9    0.019 0.0019  0.42
## X0.6      0.76      0.76    0.71      0.44 3.2    0.017 0.0045  0.43
## X0.8      0.79      0.79    0.74      0.48 3.8    0.015 0.0023  0.48
## 
##  Item statistics 
##        n raw.r std.r r.cor r.drop mean   sd
## X0.1 499  0.74  0.74  0.64   0.57 0.45 0.50
## X0.3 499  0.75  0.76  0.67   0.60 0.34 0.48
## X0.4 499  0.80  0.80  0.74   0.66 0.53 0.50
## X0.6 499  0.75  0.75  0.66   0.59 0.39 0.49
## X0.8 499  0.69  0.69  0.56   0.50 0.42 0.49
## 
## Non missing response frequency for each item
##         0    1 miss
## X0.1 0.55 0.45    0
## X0.3 0.66 0.34    0
## X0.4 0.47 0.53    0
## X0.6 0.61 0.39    0
## X0.8 0.58 0.42    0

Factor 2:

Items with the highest loadings on this factor include X0.10, X0.15, X0.16, X0.17, & X0.18.
Our alpha for these 5 items is: 0.81

new.f2<- factor2[,c("X0.10", "X0.15","X0.16","X0.17", "X0.18")]
alpha(new.f2)

## 
## Reliability analysis   
## Call: alpha(x = new.f2)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase mean   sd median_r
##       0.81      0.81    0.79      0.45 4.2 0.014 0.39 0.36     0.46
## 
##  lower alpha upper     95% confidence boundaries
## 0.78 0.81 0.83 
## 
##  Reliability if an item is dropped:
##       raw_alpha std.alpha G6(smc) average_r S/N alpha se  var.r med.r
## X0.10      0.81      0.81    0.77      0.51 4.2    0.014 0.0065  0.49
## X0.15      0.75      0.75    0.71      0.43 3.0    0.018 0.0155  0.39
## X0.16      0.76      0.76    0.71      0.44 3.1    0.018 0.0078  0.46
## X0.17      0.75      0.75    0.70      0.42 2.9    0.018 0.0071  0.43
## X0.18      0.78      0.78    0.74      0.47 3.5    0.016 0.0114  0.47
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop mean   sd
## X0.10 499  0.65  0.66  0.51   0.46 0.27 0.45
## X0.15 499  0.79  0.79  0.72   0.65 0.43 0.50
## X0.16 499  0.78  0.78  0.72   0.63 0.39 0.49
## X0.17 499  0.80  0.80  0.75   0.66 0.52 0.50
## X0.18 499  0.73  0.73  0.62   0.56 0.36 0.48
## 
## Non missing response frequency for each item
##          0    1 miss
## X0.10 0.73 0.27    0
## X0.15 0.57 0.43    0
## X0.16 0.61 0.39    0
## X0.17 0.48 0.52    0
## X0.18 0.64 0.36    0

Factor 3:

Items with the highest loadings on this factor include X1.7, X0.20, X1.3, X1.4, & X1. These items also have good alpha-if-item-dropped values and item-total correlations.
Our alpha for these 5 items is: 0.77

new.f3<- factor3[,c("X1.7", "X0.20","X1.3","X1.4", "X1")]
alpha(new.f3)

## 
## Reliability analysis   
## Call: alpha(x = new.f3)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase mean   sd median_r
##       0.77      0.77    0.74       0.4 3.4 0.016 0.76 0.31     0.41
## 
##  lower alpha upper     95% confidence boundaries
## 0.74 0.77 0.8 
## 
##  Reliability if an item is dropped:
##       raw_alpha std.alpha G6(smc) average_r S/N alpha se   var.r med.r
## X1.7       0.75      0.75    0.69      0.43 3.0    0.018 0.00094  0.43
## X0.20      0.73      0.73    0.69      0.41 2.7    0.020 0.00787  0.43
## X1.3       0.71      0.71    0.65      0.38 2.4    0.021 0.00576  0.37
## X1.4       0.73      0.73    0.68      0.40 2.7    0.020 0.00543  0.40
## X1         0.73      0.73    0.69      0.41 2.8    0.020 0.00832  0.41
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop mean   sd
## X1.7  499  0.68  0.68  0.57   0.49 0.75 0.43
## X0.20 499  0.72  0.72  0.61   0.54 0.74 0.44
## X1.3  499  0.77  0.77  0.70   0.61 0.77 0.42
## X1.4  499  0.73  0.73  0.63   0.55 0.75 0.43
## X1    499  0.72  0.72  0.61   0.54 0.77 0.42
## 
## Non missing response frequency for each item
##          0    1 miss
## X1.7  0.25 0.75    0
## X0.20 0.26 0.74    0
## X1.3  0.23 0.77    0
## X1.4  0.25 0.75    0
## X1    0.23 0.77    0
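To help explain these findings, the five-item alphas can be compared with what the Spearman-Brown formula would predict if each 10-item scale were simply cut in half using comparable items. This is a minimal sketch; the helper `spearman_brown` is an assumption for illustration, and the formula assumes the retained items are interchangeable with the dropped ones.

```r
# Spearman-Brown prophecy: predicted reliability when test length is
# multiplied by length_factor (0.5 = half as many comparable items)
spearman_brown <- function(reliability, length_factor) {
  length_factor * reliability / (1 + (length_factor - 1) * reliability)
}

# 10-item alphas and observed 5-item alphas reported above
ten_item  <- c(F1 = 0.86, F2 = 0.84, F3 = 0.81)
five_item <- c(F1 = 0.80, F2 = 0.81, F3 = 0.77)

predicted <- round(spearman_brown(ten_item, 0.5), 2)
rbind(predicted, observed = five_item)
```

The predicted values are 0.75, 0.72, and 0.68, all lower than the observed 0.80, 0.81, and 0.77. Keeping the five highest-loading items raised the average inter-item correlation (from .38, .34, and .30 to .45, .45, and .40 in the output above), which recovered much of the reliability that shortening a scale would otherwise cost.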

4. Now perform an EFA with ‘GEOMIN rotation,’ treating the data as categorical. How many factors do the results suggest? Justify your answer using multiple sources of information from the output. Did the analysis lead to the same conclusion as above?

For this question, we can again use the psych package and specify the rotation as ‘geominQ’. You’ll notice that the psych package offers two GEOMIN rotations: ‘geominT’ is an orthogonal rotation, while ‘geominQ’ is an oblique rotation. Since the data are binary, we want to base the analysis on tetrachoric correlations. Using the polycor package, we can create a correlation matrix to then feed into the psych package factor analysis command.

Once we have our matrix, we can feed the correlations directly into ‘fa’ and just specify the number of factors and the rotation method. You can also have ‘fa’ compute the correlations itself by specifying ‘cor =’ (with options like ‘poly’ for polychoric correlations or ‘tet’ for tetrachoric correlations).
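As a minimal sketch of the ‘cor =’ route, the block below simulates binary items and lets psych compute the tetrachoric correlations, once explicitly with tetrachoric() and once inside fa() through cor = "tet". The simulated data (`sim_items`) and the two-factor structure are assumptions for illustration only; this is not the analysis run on the homework file.

```r
library(psych)
library(GPArotation)  # needed for the geominQ rotation

# Hypothetical binary data: 500 respondents, 6 items, 2 latent traits
set.seed(42)
n <- 500
theta <- matrix(rnorm(n * 2), ncol = 2)
trait_of_item <- rep(1:2, each = 3)            # items 1-3 -> trait 1, items 4-6 -> trait 2
latent <- 0.8 * theta[, trait_of_item] + 0.6 * matrix(rnorm(n * 6), ncol = 6)
sim_items <- as.data.frame((latent > 0) * 1)   # dichotomize at 0
names(sim_items) <- paste0("item", 1:6)

# Tetrachoric correlations computed directly by psych
tet <- tetrachoric(sim_items)
round(tet$rho, 2)

# Pearson correlations on the same 0/1 data, for comparison
round(cor(sim_items), 2)

# Let fa() compute the tetrachoric correlations itself
fit <- fa(sim_items, nfactors = 2, rotate = "geominQ", cor = "tet")
print(fit$loadings, cutoff = 0.3)
```

Comparing the two printed matrices shows how the choice of correlation changes the input to the factor analysis, which is the reason for treating binary items as categorical in the first place.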

library(polycor)

## 
## Attaching package: 'polycor'

## The following object is masked from 'package:psych':
## 
##     polyserial

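# pd = TRUE adjusts the matrix to be positive definite if it is not already.
# Note: hetcor() computes Pearson correlations for numeric columns, so 0/1 items
# need to be stored as factors for it to return tetrachoric correlations.
tet.cor <- hetcor(data, pd = TRUE)
# r = correlation matrix; fm is left at its default (minres), hence the MR labels below
faPC <- fa(r = tet.cor$correlations, nfactors = 3, rotate = "geominQ")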
summary(faPC)

## 
## Factor analysis with Call: fa(r = tet.cor$correlations, nfactors = 3, rotate = "geominQ")
## 
## Test of the hypothesis that 3 factors are sufficient.
## The degrees of freedom for the model is 348  and the objective function was  1.63 
## 
## The root mean square of the residuals (RMSA) is  0.04 
## The df corrected root mean square of the residuals is  0.04 
## 
##  With factor correlations of 
##      MR1  MR2  MR3
## MR1 1.00 0.21 0.21
## MR2 0.21 1.00 0.14
## MR3 0.21 0.14 1.00

faPC$loadings # extracts just the loadings (printing hides loadings below 0.1 in absolute value)

## 
## Loadings:
##       MR1    MR2    MR3   
## X0     0.539        -0.120
## X0.1   0.672 -0.101       
## X0.2   0.600  0.162  0.110
## X0.3   0.693              
## X0.4   0.741              
## X0.5   0.503         0.150
## X0.6   0.681              
## X0.7   0.534        -0.139
## X0.8   0.628        -0.135
## X0.9   0.593              
## X0.10         0.623 -0.198
## X0.11         0.483 -0.138
## X0.12         0.476 -0.227
## X0.13  0.204  0.383       
## X0.14         0.555       
## X0.15         0.731       
## X0.16         0.713       
## X0.17         0.721       
## X0.18         0.593  0.165
## X0.19  0.170  0.525       
## X1                   0.599
## X1.1   0.215         0.468
## X1.2  -0.114  0.186  0.407
## X1.3         -0.112  0.689
## X1.4                 0.621
## X1.5         -0.179  0.484
## X0.20  0.146         0.593
## X1.6   0.119         0.522
## X1.7         -0.138  0.579
## X0.21  0.229  0.168  0.345
## 
##                  MR1   MR2   MR3
## SS loadings    4.121 3.677 3.157
## Proportion Var 0.137 0.123 0.105
## Cumulative Var 0.137 0.260 0.365

plot(faPC)

Conclusion

Overall, the results from the geomin rotation in this question are fairly comparable to those from the oblimin rotation in question 1. Based on the scree plot, factor analysis plot, pattern matrix, and eigenvalue table, we conclude that a three-factor solution is sufficient for these data.