STAT 360: Computational Statistics and Data Analysis

Load R Libraries, Import and Attach Relevant Data, and Specify Seed

library(rmarkdown); library(knitr); library(readxl)
set.seed(37)

EXERCISE 01

Part (a)

DistressData <- read_excel("C:/Users/Sarah Chock/OneDrive - University of St. Thomas/Senior Year/STAT 360 Comp Stat and Data Analysis/Exploratory Data Analysis/DistressData.xlsx")
dd <- as.matrix(DistressData)
# Swap response codes 3 and 5, using 6 (unused on the 1-5 scale) as a temporary placeholder
dd[dd == 3] <- 6
dd[dd == 5] <- 3
dd[dd == 6] <- 5
# Preview the raw (not yet recoded) data; head(dd) would show the recoded matrix
head(DistressData)
## # A tibble: 6 x 10
##   Hopelessness Overwhelmed Exhausted VeryLonely VerySad Depressed Anxiety
##          <dbl>       <dbl>     <dbl>      <dbl>   <dbl>     <dbl>   <dbl>
## 1            5           3         3          5       5         5       4
## 2            4           4         4          5       5         2       3
## 3            1           4         3          4       2         2       2
## 4            5           3         4          3       4         2       3
## 5            5           5         5          5       5         1       1
## 6            3           4         4          3       3         4       3
## # ... with 3 more variables: SelfHarm <dbl>, SuicidalThoughts <dbl>,
## #   SuicidalAttempts <dbl>
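The placeholder swap above works because 6 never occurs on the 1-5 response scale. An equivalent approach, sketched here assuming every response is a whole number from 1 to 5, recodes through a lookup vector in one step. The toy matrix is made up for illustration and is not the DistressData.

```r
# Hypothetical toy responses on a 1-5 scale (not the real DistressData)
toy <- matrix(c(1, 3, 5,
                5, 2, 3), nrow = 2, byrow = TRUE)

# Position k of the lookup vector holds the new code for old code k:
# 1 -> 1, 2 -> 2, 3 -> 5, 4 -> 4, 5 -> 3
swap35 <- c(1, 2, 5, 4, 3)

recoded <- toy
recoded[] <- swap35[as.vector(toy)]   # [] keeps the matrix dimensions

# Same result as the three-step placeholder swap used in Part (a)
placeholder <- toy
placeholder[placeholder == 3] <- 6
placeholder[placeholder == 5] <- 3
placeholder[placeholder == 6] <- 5
identical(recoded, placeholder)   # TRUE
```

The lookup version avoids relying on a placeholder value that must never appear in the data.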

Part (b)

R <- cor(dd)
eigen(R)$values
##  [1] 4.8390360 1.6895478 0.8790759 0.4963882 0.4727502 0.4058604 0.3471259
##  [8] 0.3220688 0.2873694 0.2607774
# Two eigenvalues exceed 1, so Kaiser's criterion gives an intrinsic dimensionality of 2
library(psych)
## Warning: package 'psych' was built under R version 4.0.5
A <- pca(R, 2, rotate = "varimax")$loadings[]
A
##                          RC1         RC2
## Hopelessness     0.736782679  0.35542955
## Overwhelmed      0.750335749 -0.10891644
## Exhausted        0.766728049 -0.04801078
## VeryLonely       0.763674783  0.25509207
## VerySad          0.801576138  0.26866823
## Depressed        0.691210438  0.47498740
## Anxiety          0.742612047  0.22977641
## SelfHarm         0.162446158  0.78686905
## SuicidalThoughts 0.232894533  0.82454863
## SuicidalAttempts 0.005108308  0.80251514
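The Kaiser count can be double-checked from the eigenvalues printed above, along with the share of total variance the two retained components explain (for a correlation matrix the eigenvalues sum to the number of variables, here 10). The values are copied from the output rather than recomputed, so this is only an arithmetic sketch.

```r
# Eigenvalues of R, copied from the eigen(R)$values output above
ev <- c(4.8390360, 1.6895478, 0.8790759, 0.4963882, 0.4727502,
        0.4058604, 0.3471259, 0.3220688, 0.2873694, 0.2607774)

# Kaiser's criterion: keep components whose eigenvalue exceeds 1
k <- sum(ev > 1)
k                          # 2

# Proportion of total variance explained by the retained components
sum(ev[1:k]) / sum(ev)     # about 0.653
```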

Part (c)

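Even after the varimax rotation, the loadings do not have simple structure. Hopelessness (0.74 on RC1, 0.36 on RC2) and Depressed (0.69 on RC1, 0.47 on RC2) are both complex: each loads substantially on both components.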
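One way to make "complex" concrete is to count, for each variable, how many loadings exceed a cutoff in absolute value. The 0.3 cutoff below is an assumed rule of thumb rather than something fixed by the assignment, and the loadings are copied from the varimax output in Part (b).

```r
# Varimax loadings copied from Part (b)
A_varimax <- matrix(c(
  0.736782679,  0.35542955,   # Hopelessness
  0.750335749, -0.10891644,   # Overwhelmed
  0.766728049, -0.04801078,   # Exhausted
  0.763674783,  0.25509207,   # VeryLonely
  0.801576138,  0.26866823,   # VerySad
  0.691210438,  0.47498740,   # Depressed
  0.742612047,  0.22977641,   # Anxiety
  0.162446158,  0.78686905,   # SelfHarm
  0.232894533,  0.82454863,   # SuicidalThoughts
  0.005108308,  0.80251514),  # SuicidalAttempts
  ncol = 2, byrow = TRUE,
  dimnames = list(c("Hopelessness", "Overwhelmed", "Exhausted", "VeryLonely",
                    "VerySad", "Depressed", "Anxiety", "SelfHarm",
                    "SuicidalThoughts", "SuicidalAttempts"),
                  c("RC1", "RC2")))

cutoff <- 0.3   # assumed threshold for a "substantial" loading
n_salient <- rowSums(abs(A_varimax) > cutoff)   # salient loadings per variable
complex_vars <- names(n_salient)[n_salient > 1]
complex_vars   # "Hopelessness" "Depressed"
```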

Part (d)

Looking at the complex variables, Hopelessness and Depressed seem to relate both to the theme of sadness and negative emotions in factor 1 and to the self-harm and suicidal-ideation items in factor 2. I think factor 1 captures negative emotions, while factor 2 captures the more severe impacts of poor mental health (self-harm, suicidal thoughts, and suicide attempts). Even in the rows that aren't complex, the cross-loadings are not negligible (for example, VeryLonely at 0.26 and VerySad at 0.27 on RC2), so it's possible we would benefit from an oblique rotation.

Part (e)

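# Plot each variable's RC1 loading (x) against its RC2 loading (y)
plot(A, xlim = c(-1, 1), ylim = c(-1, 1))
abline(v = 0, h = 0)  # draw the component axes through the origin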

Part (f)

It looks like the axes would pass through the centers of our clusters better if we did an oblique rotation. By eye, rotating the x axis (RC1) about 30 degrees counterclockwise and the y axis (RC2) about 15 degrees clockwise (that is, by about -15 degrees) would line the axes up with the clusters better.
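To put numbers on the eyeballed rotation, one can compute the angle of each cluster's centroid in the loading plot. This sketch assumes the clusters are the seven emotion items (Hopelessness through Anxiety) and the three self-harm/suicide items, and it reuses the varimax loadings printed in Part (b); positive angles are counterclockwise.

```r
# Varimax loadings from Part (b): columns RC1 and RC2
A_varimax <- matrix(c(
  0.736782679,  0.35542955,  0.750335749, -0.10891644,
  0.766728049, -0.04801078,  0.763674783,  0.25509207,
  0.801576138,  0.26866823,  0.691210438,  0.47498740,
  0.742612047,  0.22977641,  0.162446158,  0.78686905,
  0.232894533,  0.82454863,  0.005108308,  0.80251514),
  ncol = 2, byrow = TRUE, dimnames = list(NULL, c("RC1", "RC2")))

# Assumed cluster membership: rows 1-7 emotions, rows 8-10 self-harm/suicide
emotion_centroid <- colMeans(A_varimax[1:7, ])
harm_centroid    <- colMeans(A_varimax[8:10, ])

to_degrees <- function(rad) rad * 180 / pi

# Angle that turns the RC1 axis (at 0 degrees) toward the emotion cluster
x_axis_turn <- to_degrees(atan2(emotion_centroid[["RC2"]], emotion_centroid[["RC1"]]))
# Angle that turns the RC2 axis (at 90 degrees) toward the self-harm cluster
y_axis_turn <- to_degrees(atan2(harm_centroid[["RC2"]], harm_centroid[["RC1"]])) - 90

round(c(x_axis = x_axis_turn, y_axis = y_axis_turn), 1)
```

Centroids are only one summary of "the center of a cluster"; the oblimin rotation in Part (g) instead optimizes its own simplicity criterion, so its implied axes need not match these angles exactly.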

Part (g)

library(GPArotation)  # psych uses GPArotation to perform the oblimin rotation
Aoblique <- pca(R, 2, rotate = "oblimin")$loadings[]
Aoblique
##                          TC1         TC2
## Hopelessness      0.72687711  0.20895915
## Overwhelmed       0.80001663 -0.27378107
## Exhausted         0.80946160 -0.21436714
## VeryLonely        0.76779000  0.09949767
## VerySad           0.80577935  0.10538218
## Depressed         0.66395180  0.34229585
## Anxiety           0.74893402  0.07786394
## SelfHarm          0.07033278  0.77819767
## SuicidalThoughts  0.13936505  0.80199550
## SuicidalAttempts -0.09650970  0.82817441
plot(Aoblique, xlim = c(-1,1), ylim = c(-1,1))
abline(v = 0, h = 0)

Part (h)

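Z <- scale(dd)                  # standardize each item
B <- solve(R) %*% Aoblique      # regression-method weights: R^{-1} times the loadings
factorscores <- Z %*% B         # estimated factor scores for each respondent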
head(factorscores)
##             TC1        TC2
## [1,]  0.2541373 -0.1937682
## [2,]  0.1713971 -0.6322462
## [3,] -0.3687477 -0.7357213
## [4,]  0.7575780 -0.9600768
## [5,] -1.0760793 -0.1615432
## [6,]  1.0521125 -0.6240366
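The regression method above computes weights B = R^{-1} A and scores Z B. As a sanity check on the formula, this sketch applies it to simulated data (the data and seed are hypothetical, not DistressData) with unrotated principal-component loadings A = V sqrt(lambda). In that orthogonal case the scores should come out uncorrelated with unit standard deviation, up to rounding error.

```r
set.seed(37)  # hypothetical seed for the simulated example
n <- 200
# Simulated data: four variables, three of them sharing a common source
x1 <- rnorm(n)
X_sim <- cbind(x1, x1 + rnorm(n), x1 - rnorm(n), rnorm(n))
Z_sim <- scale(X_sim)
R_sim <- cor(X_sim)

# Unrotated loadings for the first 2 components: eigenvectors times sqrt(eigenvalues)
e <- eigen(R_sim)
A_sim <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2]))

# Regression-method weights and scores, as in Part (h)
B_sim <- solve(R_sim) %*% A_sim
scores_sim <- Z_sim %*% B_sim

round(cor(scores_sim), 6)           # numerically the 2 x 2 identity matrix
round(apply(scores_sim, 2, sd), 6)  # both standard deviations equal 1
```

Variable names carry a `_sim` suffix so the sketch does not overwrite `Z`, `R`, or `B` from the analysis above.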

Part (i)

phi <- cor(factorscores)
phi
##            TC1        TC2
## TC1  1.0000000 -0.3198569
## TC2 -0.3198569  1.0000000

Part (j)

Yes, this rotation was necessary, but only by the skin of its teeth! For an oblique rotation to be worthwhile, the absolute value of the correlation between the factors needs to be bigger than 0.3, and here it is 0.32. The factors are therefore correlated enough to justify sticking with the oblique rotation.
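Squaring the factor correlation gives the proportion of variance the two factors share, which is one way to read the 0.3 cutoff: a correlation of 0.3 corresponds to 9% shared variance, and the observed value corresponds to roughly 10%.

```r
phi_12 <- -0.3198569   # factor correlation from Part (i)
cutoff <- 0.3          # threshold used in Part (j)

abs(phi_12) > cutoff   # TRUE
phi_12^2               # shared variance, about 0.102
cutoff^2               # 0.09
```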
library(cats)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.0.5
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
stonks <- matrix(c(0.00023, 0.00038, 0.00022, 0.00007, 0.00006,
                   0.00038, 0.00134, 0.00041, 0.00013, 0.00008, 
                   0.00022, 0.00041, 0.00093, 0.00019, 0.00002,
                   0.00007, 0.00013, 0.00019, 0.00068, 0.00037,
                   0.00006, 0.00008, 0.00002, 0.00037, 0.00053), nrow = 5,ncol = 5)
stonks <- as.data.frame(stonks)
ggplot(stonks, aes(x = V1, y = V2)) + add_cat() + geom_point()

here_kitty()

## meow