STAT 360: Computational Statistics and Data Analysis

Load R Libraries, Import and Attach Relevant Data, and Specify Seed

library(rmarkdown); library(knitr); library(readxl)
set.seed(37)

EXERCISE 01

Data Time

eigenvals <- c(2.232,1.423,.705,.363,.277)
stonks <- matrix(c(-.551,-.505,-.444,-.379,-.316,.285,.328,.231,-.584,-.645,-.247,-.443,.798,.217,-.244,.354,-.442,.190,-.595,.538,.655,-.495,-.277,.337,-.367), nrow = 5, ncol = 5)
rownames(stonks) <- c("SNDL", "WRN", "NGD", "UPH", "WISH")
colnames(stonks) <- c("F1","F2","F3","F4","F5")
stonks
##          F1     F2     F3     F4     F5
## SNDL -0.551  0.285 -0.247  0.354  0.655
## WRN  -0.505  0.328 -0.443 -0.442 -0.495
## NGD  -0.444  0.231  0.798  0.190 -0.277
## UPH  -0.379 -0.584  0.217 -0.595  0.337
## WISH -0.316 -0.645 -0.244  0.538 -0.367

Part a

Our intrinsic dimensionality using Kaiser is 2
mystonks <- stonks[,1:2]%*%sqrt(diag(eigenvals)[1:2,1:2])
mystonks
##            [,1]       [,2]
## SNDL -0.8231874  0.3399753
## WRN  -0.7544639  0.3912698
## NGD  -0.6633307  0.2755589
## UPH  -0.5662214 -0.6966511
## WISH -0.4721002 -0.7694177

Part (b)

Yes, we have complexity with all of the rows except for NGD

Part (c)

plot(mystonks, xlim = c(-1,1), ylim = c(-1,1))
abline(v = 0, h = 0)

Part (d)

I think we can do orthogonal rotation, because it looks like rotating maybe 60 degrees would fix our problem (aka decrease complexity)

Part (e)

psi <- matrix(c(cos(-25*pi/180), sin(-25*pi/180), -sin(-25*pi/180), cos(-25*pi/180)), nrow = 2, ncol = 2)
arot <- mystonks%*%psi
arot
##            [,1]        [,2]
## SNDL -0.8897409 -0.03977179
## WRN  -0.8491343  0.03576061
## NGD  -0.7176380 -0.03059448
## UPH  -0.2187534 -0.87067579
## WISH -0.1026981 -0.89684740

Part (f)

plot(arot, xlim = c(-1,1), ylim = c(-1,1))
abline(v = 0, h = 0)

Part (g)

This rotation was absolutely fabulous and eliminated all of the complexity. What a great day.

Part (h)

psi2 <- matrix(c(cos(-37*pi/180), sin(-37*pi/180), -sin(-37*pi/180), cos(-37*pi/180)), nrow = 2, ncol = 2)
arot2 <- mystonks%*%psi2
arot2
##             [,1]       [,2]
## SNDL -0.86202888 -0.2238902
## WRN  -0.83801370 -0.1415658
## NGD  -0.69559489 -0.1791312
## UPH  -0.03294947 -0.8971308
## WISH  0.08601114 -0.8986013
plot(arot2, xlim = c(-1,1), ylim = c(-1,1))
abline(v = 0, h = 0)

Part (i)

I think that the 25 degree rotation is more effective at reducing complexity than the 37 degree rotation. If we look at the graphs, the points are closer to axes for the 25 rather than the 37. Also, if we compare the correlations, the 25 degree rotation is better in 3 of the rows and 37 in only 2 I think.

Part (j)

var(arot[,2]^2) + var(arot[,1]^2)
## [1] 0.3187261
var(arot2[,2]^2) + var(arot2[,1]^2)
## [1] 0.3110978

Part (k)

We were right! The varimax for the 25 degree rotation was marginally better than the 37 degree rotation. (We definitely didn't do this wrong initially and believe that our initial conclusions were wrong.... We never doubted ourselves.) 

Part (l)

rotator3001 <- function(degrees)
{
    psi <- matrix(c(cos(-degrees*pi/180), sin(-degrees*pi/180),
                    -sin(-degrees*pi/180), cos(-degrees*pi/180)), 
                    nrow = 2, ncol = 2)
    arot <- mystonks%*%psi
    varimax <- var(arot[,2]^2) + var(arot[,1]^2)
    return(-varimax)
}

Part (m)

optim(par = 0, fn = rotator3001, method = "Nelder-Mead")
## Warning in optim(par = 0, fn = rotator3001, method = "Nelder-Mead"): one-dimensional optimization by Nelder-Mead is unreliable:
## use "Brent" or optimize() directly
## $par
## [1] 30.15156
## 
## $value
## [1] -0.32887
## 
## $counts
## function gradient 
##       42       NA 
## 
## $convergence
## [1] 0
## 
## $message
## NULL

Part (n)

psi <- matrix(c(cos(-30.15156*pi/180), sin(-30.15156*pi/180),
                    -sin(-30.15156*pi/180), cos(-30.15156*pi/180)), 
                    nrow = 2, ncol = 2)
arot <- mystonks%*%psi
arot
##             [,1]        [,2]
## SNDL -0.88257578 -0.11950150
## WRN  -0.84891530 -0.04062811
## NGD  -0.71199208 -0.09490803
## UPH  -0.13969133 -0.88680084
## WISH -0.02175483 -0.90244606

Part (o)

Well, I don’t think that we can increase this using a different angle of rotation since we already applied our optimization function…

var(arot[,2]^2) + var(arot[,1]^2)
## [1] 0.32887

EXERCISE 02

Part (a)

library(readxl)
DistressData <- read_excel("C:/Users/Sarah Chock/OneDrive - University of St. Thomas/Senior Year/STAT 360 Comp Stat and Data Analysis/Exploratory Data Analysis/DistressData.xlsx")
dd <- as.matrix(DistressData)
dd[which(dd==3)] = 6
dd[which(dd==5)] = 3
dd[which(dd==6)] = 5

Part (b)

R <- cor(dd)
val <- eigen(R)$values
vec <- eigen(R)$vectors
Lreduced <- diag(val[1:2])
Vreduced <- vec[,(1:2)]
loading <- Vreduced%*%sqrt(Lreduced)
loading
##             [,1]        [,2]
##  [1,] -0.8152669 -0.06722211
##  [2,] -0.5915083 -0.47432527
##  [3,] -0.6365186 -0.43014068
##  [4,] -0.7875711 -0.16734074
##  [5,] -0.8271228 -0.17485648
##  [6,] -0.8366072  0.05893453
##  [7,] -0.7565806 -0.17848135
##  [8,] -0.5389876  0.59585573
##  [9,] -0.6188137  0.59261280
## [10,] -0.4113073  0.68911750

Part (c)

plot(loading, xlim = c(-1,1), ylim = c(-1,1))
abline(v = 0, h = 0)

Part (d)

Visually, this looks very complex because there are a lot of points hanging out in the middle of the second and third quadrants. 

Part (e)

We don't think that orthogonal rotation will fix our problems (aka reduce complexity), for several reasons. For one, while we were in class, you said we would have a problem where rotating wouldn't fix it. For two, the point clusters are a distance away from each other where moving the axes won't put them in a better place.