STAT 360: Computational Methods in STAT

Lab 6: Orthogonal Factor Rotations, Communalities, and Variances

Load R Libraries, Import and Attach Relevant Data, and Specify Seed

library(rmarkdown); library(knitr)

library(dplyr)
library(psych); library(corrplot); library(GPArotation)
library(lavaan); library(moments)

ACADEMICDATA <- read.csv("https://www.dropbox.com/s/4qtq7noyk76axk4/?dl=1")
attach(ACADEMICDATA)

set.seed(37)

Use some of the available variables to explore the interrelations between instances of common obstacles to the academic success of college students

Exercise 01:

Use pca() to generate the unrotated (factor) loading matrix, A, for an exploratory factor analysis of the interaction between PHYSICAL, SEXUAL, STDI, ILLNESS, and PAIN

r <- cor(cbind(PHYSICAL, SEXUAL, STDI, ILLNESS, PAIN), use = "pairwise.complete.obs")
#e <- eigen(r)$values
#v <- eigen(r)$vectors
#A <- v[,1:2] %*% diag(sqrt(e[1:2]))

(load <- pca(r, nfactors = 2, rotate = "none")$loadings[])
##                PC1        PC2
## PHYSICAL 0.6849157  0.4313925
## SEXUAL   0.6342701  0.4462439
## STDI     0.5494984  0.2996891
## ILLNESS  0.6333489 -0.5878605
## PAIN     0.6382143 -0.5810972
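
The commented-out lines above point at where the unrotated loadings come from: the eigenvectors of the correlation matrix, scaled by the square roots of their eigenvalues. A minimal base-R sketch, using a small made-up correlation matrix (R_demo is hypothetical and is not the lab data), might look like this. Column signs from eigen() are arbitrary, so they need not match what pca() prints.

```r
# Hypothetical 3 x 3 correlation matrix (illustration only, not ACADEMICDATA)
R_demo <- matrix(c(1.0, 0.6, 0.3,
                   0.6, 1.0, 0.2,
                   0.3, 0.2, 1.0), nrow = 3, byrow = TRUE)

eig <- eigen(R_demo)   # eigenvalues are returned in decreasing order
k <- 2                 # number of components to keep

# Loadings: eigenvectors scaled by the square roots of their eigenvalues
A_demo <- eig$vectors[, 1:k] %*% diag(sqrt(eig$values[1:k]))

# Each column's sum of squared loadings equals the matching eigenvalue
colSums(A_demo^2)
eig$values[1:k]
```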

Exercise 02:

Use plot() to generate a scatter plot of the loadings of the original variables onto each of the factors extracted in the exploratory factor analysis above

plot(load, xlim = c(-2, 2), ylim = c(-1, 1))
abline(h = 0, v = 0)

Exercise 03:

Compute the rotation matrix, PSI37, for a 37 degree clockwise rotation of the loadings onto the extracted factors

psi <- -37*pi/180
PSI37 <- matrix(c(cos(psi), -sin(psi), sin(psi), cos(psi)), byrow = TRUE, nrow = 2)
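
As a quick sanity check, a rotation matrix of this form should be orthogonal with determinant 1. The sketch below rebuilds PSI37 exactly as above and checks both properties; it is an illustration, not part of the assigned exercise.

```r
psi <- -37 * pi / 180   # negative angle, as in the exercise above
PSI37 <- matrix(c(cos(psi), -sin(psi),
                  sin(psi),  cos(psi)), nrow = 2, byrow = TRUE)

# An orthogonal rotation satisfies PSI %*% t(PSI) = I and det(PSI) = 1
round(PSI37 %*% t(PSI37), 10)
det(PSI37)
```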

Exercise 04:

Use A and PSI37 to compute the rotated (by 37 degrees) loading matrix

(A37 <- load %*% PSI37)
##               [,1]        [,2]
## PHYSICAL 0.2873795  0.75671792
## SEXUAL   0.2379944  0.73809951
## STDI     0.2584915  0.57003882
## ILLNESS  0.8595982 -0.08832740
## PAIN     0.8594137 -0.07999791
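
Because the rotation is orthogonal, it moves variance between the two factors without changing any variable's communality. The sketch below copies the unrotated loadings from the Exercise 01 output (load_demo and A37_demo are names introduced here for illustration) and compares row sums of squared loadings before and after rotation.

```r
# Unrotated loadings copied from the Exercise 01 output
load_demo <- matrix(c(0.6849157,  0.4313925,
                      0.6342701,  0.4462439,
                      0.5494984,  0.2996891,
                      0.6333489, -0.5878605,
                      0.6382143, -0.5810972),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(c("PHYSICAL", "SEXUAL", "STDI", "ILLNESS", "PAIN"),
                                    c("PC1", "PC2")))

psi <- -37 * pi / 180
PSI37 <- matrix(c(cos(psi), -sin(psi),
                  sin(psi),  cos(psi)), nrow = 2, byrow = TRUE)

A37_demo <- load_demo %*% PSI37

# Communalities (row sums of squared loadings) before and after rotation
h2_before <- rowSums(load_demo^2)
h2_after  <- rowSums(A37_demo^2)
all.equal(h2_before, h2_after)
```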

Exercise 05:

Explain what the values in each row of A37 indicate about whether any complex variables remain and, therefore, whether the rotation succeeded in decreasing complexity among the variables

The rotation was successful: no complex variables remain. Each variable now loads strongly on one factor (0.57 or higher in absolute value) and only weakly on the other (below 0.29 in absolute value).

Exercise 06:

Use plot() to generate a scatter plot of the rotated loadings above

plot(A37, xlim = c(-2, 2), ylim = c(-1, 1))
abline(h = 0, v = 0)

Exercise 07:

Explain what the scatter plot above means in terms of whether or not the rotation helped to decrease complexity among variables

The rotation helped: each point now lies closer to a single axis than it did before rotation, so it is much more obvious which factor each variable corresponds to.

Exercise 08:

Use pca() to generate the optimally rotated (factor) loading matrix, AVARIMAX, for the original variables based on the varimax algorithm

(Avar <- pca(r, nfactors = 2, rotate = "varimax"))$loadings[]
##                RC1        RC2
## PHYSICAL 0.8039160 0.09448853
## SEXUAL   0.7738338 0.05112269
## STDI     0.6158211 0.11192182
## ILLNESS  0.1236224 0.85523580
## PAIN     0.1316563 0.85302883
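
Base R also provides a varimax() function in the stats package. As a rough cross-check, the sketch below applies it to the unrotated loadings printed in Exercise 01 (copied in as load_demo, a name introduced here). It is assumed, not confirmed by the lab, that this gives a simple structure comparable to the pca() output; column order and signs may differ.

```r
# Unrotated loadings copied from the Exercise 01 output
load_demo <- matrix(c(0.6849157,  0.4313925,
                      0.6342701,  0.4462439,
                      0.5494984,  0.2996891,
                      0.6333489, -0.5878605,
                      0.6382143, -0.5810972),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(c("PHYSICAL", "SEXUAL", "STDI", "ILLNESS", "PAIN"),
                                    c("PC1", "PC2")))

vm <- varimax(load_demo)           # stats::varimax, Kaiser-normalized by default
vm_load <- unclass(vm$loadings)    # plain numeric matrix of rotated loadings

round(vm_load, 4)
vm$rotmat                          # the orthogonal rotation matrix it settled on
```

The rotation matrix in vm$rotmat plays the same role as PSI37, except that the angle is chosen by the algorithm rather than by hand.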

Exercise 09:

Use pca() to generate the optimally rotated (factor) loading matrix, AEQUAMAX, for the original variables based on the equamax algorithm

(Aequ <- pca(r, nfactors = 2, rotate = "equamax"))$loadings[]
##                 RC1        RC2
## PHYSICAL 0.79976600 0.12483329
## SEXUAL   0.77134584 0.08036047
## STDI     0.61114625 0.13513841
## ILLNESS  0.09118004 0.85930028
## PAIN     0.09929169 0.85739881

Exercise 10:

Explain what the two rotated loading matrices above mean in terms of whether the varimax or equamax algorithms were more effective in decreasing complexity among variables

Both algorithms decreased complexity by roughly the same amount and produced very similar matrices: corresponding loadings differ by less than 0.04.

Exercise 11:

Use the optimally rotated (factor) loading matrix based on the varimax algorithm, AVARIMAX, above to compute the communalities for each of the variables

Avar$communality
##  PHYSICAL    SEXUAL      STDI   ILLNESS      PAIN 
## 0.6552090 0.6014322 0.3917621 0.7467108 0.7449916
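
The same communalities can be recovered directly from the loading matrix as row sums of squared loadings. The sketch below copies the varimax loadings printed in Exercise 08 (Avar_demo is a name introduced here) and also reports the uniqueness, 1 minus the communality, for each variable.

```r
# Varimax-rotated loadings copied from the Exercise 08 output
Avar_demo <- matrix(c(0.8039160, 0.09448853,
                      0.7738338, 0.05112269,
                      0.6158211, 0.11192182,
                      0.1236224, 0.85523580,
                      0.1316563, 0.85302883),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(c("PHYSICAL", "SEXUAL", "STDI", "ILLNESS", "PAIN"),
                                    c("RC1", "RC2")))

h2 <- rowSums(Avar_demo^2)   # communality: variance retained by the two factors
u2 <- 1 - h2                 # uniqueness: variance not captured by the factors

round(cbind(communality = h2, uniqueness = u2), 4)
```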

Exercise 12:

Explain what the communalities above mean in terms of how much of the variability in PHYSICAL was lost by performing the exploratory factor analysis

The communality of PHYSICAL is 0.6552, so the two factors retain about 65.5% of its variability and roughly 34.5% (1 - 0.6552) is lost.

Exercise 13:

Use the optimally rotated (factor) loading matrix based on the varimax algorithm, AVARIMAX, above to compute the variance accounted for by each rotated factor

colSums(Avar$loadings[]^2)/5
##       RC1       RC2 
## 0.3313902 0.2966309

Exercise 14:

Explain what the variances above mean in terms of how much variability in the original variables is explained by the first rotated factor

The first rotated factor explains 33.14% of the variability in the original variables.

Exercise 15:

Explain what the variances above mean in terms of how much additional variability in the original variables is explained by the addition of the second rotated factor

The second rotated factor explains an additional 29.66% of the variability in the original variables.

Exercise 16:

Explain what the variances above mean in terms of how much variability in the original variables is explained by the exploratory factor analysis as a whole

Together, the two rotated factors explain 62.80% of the variability in the original variables (33.14% + 29.66%).
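
The total can be cross-checked against the communalities: with standardized variables, the proportion of variance explained by all factors together equals the mean communality. The sketch below uses the values printed in Exercises 11 and 13 (h2 and prop_var are names introduced here).

```r
# Communalities copied from the Exercise 11 output
h2 <- c(PHYSICAL = 0.6552090, SEXUAL = 0.6014322, STDI = 0.3917621,
        ILLNESS = 0.7467108, PAIN = 0.7449916)

# Proportion of variance per rotated factor, copied from the Exercise 13 output
prop_var <- c(RC1 = 0.3313902, RC2 = 0.2966309)

cumsum(prop_var)   # running total as each factor is added
sum(prop_var)      # total proportion explained by both factors
mean(h2)           # should agree with the total, up to rounding
```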

Exercise 17:

Compute the transpose of the rotated (factor) loading matrix based on the varimax algorithm, AVARIMAX, above

(tAvar <- t(Avar$loadings[]))
##       PHYSICAL     SEXUAL      STDI   ILLNESS      PAIN
## RC1 0.80391599 0.77383376 0.6158211 0.1236224 0.1316563
## RC2 0.09448853 0.05112269 0.1119218 0.8552358 0.8530288

Exercise 18:

Use the optimally rotated (factor) loading matrix based on the varimax algorithm, AVARIMAX, and its transpose above to compute the reproduced correlation matrix, RR, for the exploratory factor analysis

(rr <- Avar$loadings[] %*% tAvar)
##           PHYSICAL    SEXUAL      STDI   ILLNESS      PAIN
## PHYSICAL 0.6552090 0.6269278 0.5056438 0.1801920 0.1864420
## SEXUAL   0.6269278 0.6014322 0.4822649 0.1393851 0.1454892
## STDI     0.5056438 0.4822649 0.3917621 0.1718488 0.1765493
## ILLNESS  0.1801920 0.1393851 0.1718488 0.7467108 0.7458165
## PAIN     0.1864420 0.1454892 0.1765493 0.7458165 0.7449916

Exercise 19:

Use the original correlation matrix, R, and the reproduced correlation matrix, RR, above to compute the residual correlation matrix, RES

(res <- r - rr)
##              PHYSICAL      SEXUAL        STDI      ILLNESS         PAIN
## PHYSICAL  0.344790999 -0.19066527 -0.21075232 -0.003490247  0.004385892
## SEXUAL   -0.190665274  0.39856779 -0.25831362  0.020204505  0.010868493
## STDI     -0.210752324 -0.25831362  0.60823787 -0.017708486 -0.023224057
## ILLNESS  -0.003490247  0.02020450 -0.01770849  0.253289232 -0.252445365
## PAIN      0.004385892  0.01086849 -0.02322406 -0.252445365  0.255008438

Exercise 20:

Explain what the residual correlation matrix above means in terms of how successful the exploratory factor analysis was in reproducing the original correlation structure between PHYSICAL, SEXUAL, STDI, ILLNESS, and PAIN

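The exploratory factor analysis was only somewhat successful. Residuals between variables that load on different factors are small (all under 0.03 in absolute value), but residuals between variables that share a factor are fairly large, ranging from about -0.19 (PHYSICAL/SEXUAL) to -0.26 (SEXUAL/STDI), with ILLNESS/PAIN at about -0.25.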
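
One way to make this judgment more concrete is to summarize the off-diagonal residuals numerically. The sketch below copies them from the Exercise 19 output (off_diag is a name introduced here); the 0.05 cutoff is an assumed rule of thumb, not something the lab specifies.

```r
# Off-diagonal residuals copied from the Exercise 19 output (one per variable pair)
off_diag <- c(PHYSICAL_SEXUAL  = -0.190665274,
              PHYSICAL_STDI    = -0.210752324,
              PHYSICAL_ILLNESS = -0.003490247,
              PHYSICAL_PAIN    =  0.004385892,
              SEXUAL_STDI      = -0.258313620,
              SEXUAL_ILLNESS   =  0.020204505,
              SEXUAL_PAIN      =  0.010868493,
              STDI_ILLNESS     = -0.017708486,
              STDI_PAIN        = -0.023224057,
              ILLNESS_PAIN     = -0.252445365)

cutoff <- 0.05                            # assumed threshold for a "large" residual
large <- names(off_diag)[abs(off_diag) > cutoff]

large                                     # pairs the two factors reproduce poorly
sqrt(mean(off_diag^2))                    # root mean square off-diagonal residual
```

Every pair flagged this way consists of two variables that load on the same rotated factor.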