library(rmarkdown); library(knitr)
library(dplyr)
library(psych); library(corrplot); library(GPArotation)
library(lavaan); library(moments)
HEALTHDATA <- read.csv("https://www.dropbox.com/s/sb8s1i4h1z7g7fh/?dl=1")
attach(HEALTHDATA)
set.seed(37)
Compute the eigenvalues of the correlation matrix, R_yy, among the dependent variables (DVs) ISOLATED, TROUBLED, SELFHARM, and THOUGHTS, using pairwise-complete observations to handle missing values
r_yy <- cor(cbind(ISOLATED, TROUBLED, SELFHARM, THOUGHTS), use = "pairwise.complete.obs")
eigen(r_yy)$values
## [1] 1.6866483 1.0983928 0.6333929 0.5815660
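The source does not state why two components are retained in the next step. One common justification, offered here only as an assumption, is the Kaiser criterion (keep components with eigenvalues above 1). A minimal sketch using the eigenvalues printed above:

```r
# Eigenvalues copied from the printed output above
ev <- c(1.6866483, 1.0983928, 0.6333929, 0.5815660)

# Kaiser criterion (an assumed retention rule, not stated in the source):
# retain components whose eigenvalue exceeds 1
n_keep <- sum(ev > 1)

# Eigenvalues of a correlation matrix sum to the number of variables,
# so each eigenvalue over that sum is a proportion of total variance
prop_var <- ev / sum(ev)
cum_var  <- cumsum(prop_var)

n_keep                      # number of components retained
round(cum_var[n_keep], 3)   # cumulative variance explained by those components
```

Under this rule the first two components are kept, and together they account for a little under 70% of the total variance in the four DVs.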
Use pca() to generate the optimally rotated (factor) loading matrix, AVARIMAX, for the original DVs based on the (orthogonal) varimax algorithm
(avar <- pca(r_yy, nfactors = 2, rotate = "varimax"))$loadings[]
## RC1 RC2
## ISOLATED 0.11368314 0.81427621
## TROUBLED 0.04250767 0.83279093
## SELFHARM 0.84336224 0.06346262
## THOUGHTS 0.82454601 0.13623659
Explain what the rotated loading matrix above means in terms of which factor is more likely measuring “suicidality”
Factor 1 (RC1) seems to be measuring suicidality: SELFHARM (0.84) and THOUGHTS (0.82) load highly on it, while ISOLATED and TROUBLED load near zero on it and highly on RC2 instead.
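The reading of the loading matrix can be made mechanical. The sketch below re-enters the printed varimax loadings by hand (so it does not need the data set) and reports, for each DV, the component with the largest absolute loading and the communality. The object names `dominant` and `communality` are illustrative only.

```r
# Varimax loadings copied from the printed output above
A <- matrix(c(0.11368314, 0.81427621,
              0.04250767, 0.83279093,
              0.84336224, 0.06346262,
              0.82454601, 0.13623659),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("ISOLATED", "TROUBLED", "SELFHARM", "THOUGHTS"),
                            c("RC1", "RC2")))

# Component on which each variable has its largest absolute loading
dominant <- colnames(A)[apply(abs(A), 1, which.max)]
names(dominant) <- rownames(A)

# Communality: share of each variable's variance captured by the two components
communality <- rowSums(A^2)

dominant
round(communality, 3)
```

SELFHARM and THOUGHTS are assigned to RC1 and ISOLATED and TROUBLED to RC2, which is the basis for calling RC1 the suicidality component.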
Use solve() to compute the inverse of the correlation matrix, R_yy, above
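R_inv <- solve(r_yy)  # inverse of R_yy (computed but not printed)
# Reproduced correlation matrix implied by the rotated loadings, A %*% t(A)
avar$loadings[] %*% t(avar$loadings[])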
## ISOLATED TROUBLED SELFHARM THOUGHTS
## ISOLATED 0.6759696 0.68295424 0.14755217 0.2046712
## TROUBLED 0.6829542 0.69534763 0.08870046 0.1485061
## SELFHARM 0.1475522 0.08870046 0.71528738 0.7040369
## THOUGHTS 0.2046712 0.14850613 0.70403690 0.6984365
Use the inverse of the correlation matrix and the rotated loading matrix above to compute the matrix of regression coefficients, B1
(B <- solve(r_yy) %*% avar$loadings[])
## RC1 RC2
## ISOLATED -0.04451342 0.59989176
## TROUBLED -0.10043662 0.62523036
## SELFHARM 0.61777784 -0.08559891
## THOUGHTS 0.59222802 -0.02738934
Explain how to find the standardized score on the first rotated factor for the first college student in the data
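First standardize the student's scores on the four DVs (subtract each variable's mean and divide by its standard deviation). Then multiply each z-score by the corresponding RC1 coefficient from the regression coefficient matrix above, and sum:
F_RC1 = -0.045 * z_ISOLATED + -0.100 * z_TROUBLED + 0.618 * z_SELFHARM + 0.592 * z_THOUGHTS
Applying the coefficients to the student's raw scores (1, 1, 0, 1) instead gives -0.045 * 1 + -0.1 * 1 + 0.618 * 0 + 0.592 * 1 = 0.447, which is not on the standardized scale.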
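The weighted-sum rule can be checked on made-up data. The sketch below uses hypothetical simulated variables (`x1` to `x4`, not the HEALTHDATA survey) and unrotated principal-component loadings computed with base R, so it illustrates the arithmetic only, not the rotated solution above.

```r
set.seed(1)  # hypothetical toy data, NOT the HEALTHDATA survey
n  <- 200
g1 <- rnorm(n)
g2 <- rnorm(n)
X  <- cbind(x1 = g1 + rnorm(n), x2 = g1 + rnorm(n),
            x3 = g2 + rnorm(n), x4 = g2 + rnorm(n))

R <- cor(X)
e <- eigen(R)

# Unrotated loadings for the first two principal components
A <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2]))

B <- solve(R) %*% A   # regression coefficients, same formula as B above
Z <- scale(X)         # z-scores: centered and divided by the standard deviation

# Score of the first case on the first component, two equivalent ways
by_hand   <- sum(Z[1, ] * B[, 1])   # weighted sum of that case's z-scores
by_matrix <- (Z %*% B)[1, 1]        # first entry of the full score matrix

all.equal(by_hand, by_matrix)
```

The two values agree, and because the scores are built from z-scores they have mean 0 and standard deviation 1, which is what makes them interpretable in standard-deviation units.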
Use cbind() and scale() to compute (but do not print) the matrix of standardized scores, Z, on ISOLATED, TROUBLED, SELFHARM, and THOUGHTS
Z <- scale(cbind(ISOLATED, TROUBLED, SELFHARM, THOUGHTS))
Use the matrix of standardized scores and the matrix of regression coefficients to compute (but do not print) the standardized factor scores, F1, for the data
FF <- Z %*% B
Use matrix notation to display the standardized factor scores for the first college student in the data
FF[1,]
## RC1 RC2
## 0.4132606 0.4159632
Explain what the first standardized factor score (z-score) for the first college student above means in terms of standard deviations
The first college student scores 0.4133 standard deviations above the sample mean on RC1, the suicidality factor.
Use plot() to generate a scatter plot of the rotated loadings based on the varimax algorithm above
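# Loadings lie in [-1, 1], so tighten the axes and label each variable
plot(avar$loadings[], xlim = c(-1, 1), ylim = c(-1, 1))
text(avar$loadings[], labels = rownames(avar$loadings[]), pos = 3)
abline(h = 0, v = 0)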
Explain what the scatter plot above means in terms of whether or not an oblique (rather than an orthogonal) rotation may be necessary
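All of the points fall in quadrant 1, and each cluster sits slightly off its axis rather than on it (every variable has a small positive loading on its secondary factor). That pattern suggests the factors may be correlated, so an oblique rotation may increase the purity of the loading matrix.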
Use pca() to generate the optimally rotated (pattern) loading matrix, AOBLIMIN, for the original variables based on the (oblique) oblimin algorithm
(AOBLIMIN <- pca(r_yy, nfactors = 2, rotate = "oblimin"))$loadings[]
## TC1 TC2
## ISOLATED 0.03962165 0.81308212
## TROUBLED -0.03454497 0.84031903
## SELFHARM 0.85237898 -0.03561774
## THOUGHTS 0.82643016 0.04048665
Compute the matrix of regression coefficients, B2, for the oblique rotation above
(B2 <- solve(r_yy) %*% AOBLIMIN$loadings[])
## TC1 TC2
## ISOLATED -0.1013489 0.6142267
## TROUBLED -0.1606305 0.6465813
## SELFHARM 0.6367233 -0.1601708
## THOUGHTS 0.6052823 -0.0980519
Compute (but do not print) the standardized factor scores, F2, for the data based on the oblique rotation above
F2 <- Z %*% B2
Compute the matrix of correlations, PHI, between the obliquely rotated factors above
(PHI <- cor(F2))
## TC1 TC2
## TC1 1.0000000 -0.2063777
## TC2 -0.2063777 1.0000000
Explain what the correlation matrix above means in terms of whether or not oblique rotation was, after all, necessary
Oblique rotation does not appear to have been necessary: the correlation between the two factors has a magnitude of about 0.21, below the 0.3 threshold, so the factors are close to orthogonal.
Use the rotated and oblique factor loading matrix, AOBLIMIN, and the factor correlation matrix, PHI, above to compute the structure matrix, C
(C <- AOBLIMIN$loadings[] %*% PHI)
## TC1 TC2
## ISOLATED -0.1281804 0.8049051
## TROUBLED -0.2079681 0.8474483
## SELFHARM 0.8597297 -0.2115297
## THOUGHTS 0.8180746 -0.1300701
Explain what the structure matrix above means in terms of which factor is more likely measuring “suicidality”
TC1 is more likely measuring suicidality because SELFHARM and THOUGHTS both correlate above 0.8 with it, while ISOLATED and TROUBLED correlate with it only weakly.
Use the structure matrix, C, and the oblique loading matrix, AOBLIMIN, above to compute the reproduced correlation matrix, RR, for the exploratory factor analysis (based on the oblimin algorithm)
(RR <- C %*% t(AOBLIMIN$loadings[]))
## ISOLATED TROUBLED SELFHARM THOUGHTS
## ISOLATED 0.64937523 0.6808051 -0.1379271 -0.07334421
## TROUBLED 0.68080505 0.7193112 -0.2074518 -0.13756074
## SELFHARM -0.13792714 -0.2074518 0.7403497 0.70194242
## THOUGHTS -0.07334421 -0.1375607 0.7019424 0.67081544
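A consistency check is worth running here. Rotating the same two components, orthogonally or obliquely, should not change the correlations they reproduce, yet the matrix above has negative entries between the two variable pairs where the varimax-based reproduced matrix has positive ones. The sketch below re-enters the printed loadings and compares both signs of the inter-factor correlation. The helper `reproduce()` is hypothetical, written only for this check.

```r
vars <- c("ISOLATED", "TROUBLED", "SELFHARM", "THOUGHTS")

# Varimax loadings (A) and oblimin pattern loadings (P), copied from the output above
A <- matrix(c(0.11368314, 0.81427621,
              0.04250767, 0.83279093,
              0.84336224, 0.06346262,
              0.82454601, 0.13623659),
            ncol = 2, byrow = TRUE, dimnames = list(vars, c("RC1", "RC2")))
P <- matrix(c( 0.03962165,  0.81308212,
              -0.03454497,  0.84031903,
               0.85237898, -0.03561774,
               0.82643016,  0.04048665),
            ncol = 2, byrow = TRUE, dimnames = list(vars, c("TC1", "TC2")))

# Hypothetical helper: reproduced correlations P %*% PHI %*% t(P)
# for a given inter-factor correlation phi
reproduce <- function(P, phi) {
  PHI <- matrix(c(1, phi, phi, 1), nrow = 2)
  P %*% PHI %*% t(P)
}

target  <- A %*% t(A)  # reproduced correlations from the orthogonal solution
gap_neg <- max(abs(reproduce(P, -0.2063777) - target))  # sign as printed in PHI
gap_pos <- max(abs(reproduce(P,  0.2063777) - target))  # sign flipped

c(negative = gap_neg, positive = gap_pos)
```

With the printed loadings, only the positive value reproduces the varimax result to within rounding. One possible explanation, stated as an assumption rather than a diagnosis, is that correlating scores built from `solve(r_yy) %*% pattern` yields a rescaled inverse of the factor correlation matrix, which flips the sign of the off-diagonal entry in the two-factor case. If the fitted object exposes the rotation's own factor correlations (in psych this is usually `AOBLIMIN$Phi`, which should be verified against the installed version), that would be the safer input for C and RR. The magnitude, and therefore the conclusion about whether oblique rotation was needed, is unaffected.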