STAT 360: Computational Statistics and Data Analysis

Load R Libraries, Import and Attach Relevant Data, and Specify Seed

library(rmarkdown); library(knitr); library(readxl)
set.seed(37)

EXERCISE 01

Part (a)

stonks <- matrix(c(0.00023, 0.00038, 0.00022, 0.00007, 0.00006,
                   0.00038, 0.00134, 0.00041, 0.00013, 0.00008,
                   0.00022, 0.00041, 0.00093, 0.00019, 0.00002,
                   0.00007, 0.00013, 0.00019, 0.00068, 0.00037,
                   0.00006, 0.00008, 0.00002, 0.00037, 0.00053),
                 nrow = 5, ncol = 5)
rownames(stonks) <- c("SNDL", "WRN", "NGD", "UPH", "WISH")
colnames(stonks) <- c("SNDL", "WRN", "NGD", "UPH", "WISH")
stonks
##         SNDL     WRN     NGD     UPH    WISH
## SNDL 0.00023 0.00038 0.00022 0.00007 0.00006
## WRN  0.00038 0.00134 0.00041 0.00013 0.00008
## NGD  0.00022 0.00041 0.00093 0.00019 0.00002
## UPH  0.00007 0.00013 0.00019 0.00068 0.00037
## WISH 0.00006 0.00008 0.00002 0.00037 0.00053

Part (b)

R <- solve(sqrt(diag(diag(stonks)))) %*% stonks %*%
  t(solve(sqrt(diag(diag(stonks)))))
R
##           [,1]       [,2]       [,3]      [,4]       [,5]
## [1,] 1.0000000 0.68449027 0.47568263 0.1770026 0.17184995
## [2,] 0.6844903 1.00000000 0.36727383 0.1361873 0.09492916
## [3,] 0.4756826 0.36727383 1.00000000 0.2389228 0.02848725
## [4,] 0.1770026 0.13618726 0.23892284 1.0000000 0.61632436
## [5,] 0.1718499 0.09492916 0.02848725 0.6163244 1.00000000
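
As a cross-check on the matrix algebra above, here is a minimal sketch (not part of the assigned solution) that rebuilds the same correlation matrix two other ways: by dividing each covariance by the product of the two standard deviations, and with base R's `cov2cor()`. The names `tickers`, `d`, `R_manual`, and `R_builtin` are only illustrative. Unlike the `solve()` version, both of these keep the stock names on the rows and columns.

```r
# Rebuild the covariance matrix from Part (a) so this block runs on its own.
tickers <- c("SNDL", "WRN", "NGD", "UPH", "WISH")
stonks <- matrix(c(0.00023, 0.00038, 0.00022, 0.00007, 0.00006,
                   0.00038, 0.00134, 0.00041, 0.00013, 0.00008,
                   0.00022, 0.00041, 0.00093, 0.00019, 0.00002,
                   0.00007, 0.00013, 0.00019, 0.00068, 0.00037,
                   0.00006, 0.00008, 0.00002, 0.00037, 0.00053),
                 nrow = 5, ncol = 5, dimnames = list(tickers, tickers))

# Standard deviations are the square roots of the diagonal (the variances).
d <- sqrt(diag(stonks))

# r_ij = cov_ij / (sd_i * sd_j), done elementwise with an outer product.
R_manual <- stonks / outer(d, d)

# Base R's built-in conversion from covariance to correlation.
R_builtin <- cov2cor(stonks)

# TRUE if the two versions agree up to floating-point tolerance.
all.equal(R_manual, R_builtin)
round(R_builtin, 4)
```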

Part (c)

library(corrplot)
## corrplot 0.92 loaded
corrplot(R, method = "color")

Part (d)

From the heat map, there appear to be two clusters, similar to the clusters we saw in class! There's one 3x3 block in the top left (SNDL, WRN, NGD) and one 2x2 block in the bottom right (UPH, WISH).
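
The exercise only asks for a visual read of the heat map, but one hedged way to double-check the eyeballed grouping is to run hierarchical clustering on the correlations. The sketch below assumes `1 - |r|` as the distance between two stocks and complete linkage as the merging rule. Both are my own choices for illustration, not something the exercise specifies.

```r
# Rebuild the correlation matrix so this block runs on its own.
tickers <- c("SNDL", "WRN", "NGD", "UPH", "WISH")
stonks <- matrix(c(0.00023, 0.00038, 0.00022, 0.00007, 0.00006,
                   0.00038, 0.00134, 0.00041, 0.00013, 0.00008,
                   0.00022, 0.00041, 0.00093, 0.00019, 0.00002,
                   0.00007, 0.00013, 0.00019, 0.00068, 0.00037,
                   0.00006, 0.00008, 0.00002, 0.00037, 0.00053),
                 nrow = 5, ncol = 5, dimnames = list(tickers, tickers))
R <- cov2cor(stonks)

# Assumed distance: strongly correlated stocks are "close" (1 - |r| near 0).
stock_dist <- as.dist(1 - abs(R))

# Complete-linkage hierarchical clustering (an assumption, see lead-in).
hc <- hclust(stock_dist, method = "complete")

# Cut the tree into the two clusters guessed from the heat map.
groups <- cutree(hc, k = 2)
groups
```

With this distance and linkage, SNDL, WRN, and NGD land in one group and UPH and WISH in the other, which agrees with the two blocks in the heat map.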

Part (e)

eigen(R)$values
## [1] 2.2315142 1.4234986 0.7052701 0.3628567 0.2768604
eigen(R)$vectors
##            [,1]       [,2]       [,3]       [,4]       [,5]
## [1,] -0.5510097  0.2846241 -0.2465648  0.3535425  0.6554319
## [2,] -0.5050987  0.3282943 -0.4429402 -0.4423280 -0.4952253
## [3,] -0.4443831  0.2314799  0.7976778  0.1901316 -0.2765878
## [4,] -0.3794004 -0.5840746  0.2169151 -0.5950724  0.3372668
## [5,] -0.3159789 -0.6453572 -0.2442697  0.5376732 -0.3673027

Part (f)

Using Kaiser's criterion, there should be two clusters (so the intrinsic dimensionality is 2). This is because two of the eigenvalues, 2.23 and 1.42, are greater than 1.
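
For a little background on why the cutoff is 1: the eigenvalues of a correlation matrix add up to the number of variables (here 5), so the average eigenvalue is exactly 1, and Kaiser's criterion keeps the dimensions that are above average. The sketch below checks that and also reports the share of total variance carried by the retained dimensions. The names `ev` and `prop_cumulative` are only illustrative.

```r
# Rebuild the correlation matrix so this block runs on its own.
tickers <- c("SNDL", "WRN", "NGD", "UPH", "WISH")
stonks <- matrix(c(0.00023, 0.00038, 0.00022, 0.00007, 0.00006,
                   0.00038, 0.00134, 0.00041, 0.00013, 0.00008,
                   0.00022, 0.00041, 0.00093, 0.00019, 0.00002,
                   0.00007, 0.00013, 0.00019, 0.00068, 0.00037,
                   0.00006, 0.00008, 0.00002, 0.00037, 0.00053),
                 nrow = 5, ncol = 5, dimnames = list(tickers, tickers))
R <- cov2cor(stonks)

# Eigenvalues only; symmetric = TRUE because R is a correlation matrix.
ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values

sum(ev)        # equals the number of variables, so the mean eigenvalue is 1
sum(ev > 1)    # Kaiser's criterion: count of above-average eigenvalues

# Cumulative share of total variance explained by the first k dimensions.
prop_cumulative <- cumsum(ev) / sum(ev)
round(prop_cumulative, 3)
```

Here the first two dimensions together account for about 73% of the total variance.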

Part (g)

Wow! Kaiser's criterion gives 2 dimensions, the same as the number of clusters I guessed from the heat map. Heat maps really can be good.

EXERCISE 02

Part (a)

southAfrica <- c(20.225,8.432,2.356,2.156,1.835,.895,.779,.701,.653,.601,.552,.492,.452,.401,.369,.301,.215,.198,.182,.173,.167,.161,.157,.152,.107,.101,.040,.037,.034)
southAfrica
##  [1] 20.225  8.432  2.356  2.156  1.835  0.895  0.779  0.701  0.653  0.601
## [11]  0.552  0.492  0.452  0.401  0.369  0.301  0.215  0.198  0.182  0.173
## [21]  0.167  0.161  0.157  0.152  0.107  0.101  0.040  0.037  0.034

Part (b)

plot(southAfrica, pch = 16)

Part (c)

We think that the inflection point is at point 5, which means the intrinsic dimensionality would be 4 based on this point.

Part (d)

Using Kaiser's criterion, the intrinsic dimensionality would be 5.

Part (e)

Using Jolliffe's criterion, the intrinsic dimensionality would be 8.
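
The two counts in Parts (d) and (e) can be reproduced directly from the eigenvalue vector. This is a small sketch that uses the same cutoffs as the rest of this assignment (greater than 1 for Kaiser, greater than 0.7 for Jolliffe). The names `kaiser_dim` and `jolliffe_dim` are only illustrative.

```r
# Eigenvalues from Part (a), repeated so this block runs on its own.
southAfrica <- c(20.225, 8.432, 2.356, 2.156, 1.835, .895, .779, .701, .653,
                 .601, .552, .492, .452, .401, .369, .301, .215, .198, .182,
                 .173, .167, .161, .157, .152, .107, .101, .040, .037, .034)

# Kaiser's criterion: keep eigenvalues greater than 1.
kaiser_dim <- sum(southAfrica > 1)

# Jolliffe's criterion: keep eigenvalues greater than 0.7.
jolliffe_dim <- sum(southAfrica > 0.7)

c(kaiser = kaiser_dim, jolliffe = jolliffe_dim)

# Which eigenvalues does Jolliffe keep that Kaiser drops?
southAfrica[southAfrica > 0.7 & southAfrica <= 1]
```

The last line shows the three eigenvalues (0.895, 0.779, 0.701) that fall between the two cutoffs, which is the whole gap between 5 and 8.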

Part (f)

There are so many differences between the inflection point, Kaiser, and Jolliffe! I'm thinking that Jolliffe missed the 8am team Zoom calls to discuss intrinsic dimensionality, since that number of 8 was way different from 4 and 5. Since the three answers differ, we would have to look more closely at what the clusters are to see if they make sense, and then choose from there.

EXERCISE 03

Part (a)

epilepsy <- read.csv("C:/Users/Sarah Chock/OneDrive - University of St. Thomas/Senior Year/STAT 360 Comp Stat and Data Analysis/Exploratory Factor Analysis/Epilepsy Detection.csv")

Part (b)

corrEpilepsy <- cor(epilepsy)

Part (c)

corrplot(corrEpilepsy, method = "color", tl.pos = 'n')

Part (d)

I think there are 8 clusters based on this data, after rigorous discussion with Connor. We counted up the clusters along the diagonal and split them up wherever there were light sections in between the dark sections.

Part (e)

eigenvalues <- eigen(corrEpilepsy)$values

Part (f)

sum(eigenvalues > 1)   # Kaiser's criterion
## [1] 54
sum(eigenvalues > 0.7) # Jolliffe's criterion
## [1] 72

Part (g)

Well, needless to say, there certainly are differences between my answer of 8 and the criteria values of 54 and 72. I think this is because my eyes are not very discerning of all of these clusters, since there are so many dimensions to begin with.
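
One thing that might help the eyes with this many variables is to reorder the correlation matrix so that related variables sit next to each other before plotting. The sketch below does this on simulated data, not the epilepsy file: three hypothetical latent factors with four noisy variables each, shuffled so the blocks are hidden. The distance `1 - |r|` and complete linkage are my own assumptions. The final `corrplot()` call uses that package's `order = "hclust"` and `addrect` options and is skipped if the package is not installed.

```r
set.seed(1)
n <- 200

# Simulated stand-in data: 3 hypothetical latent factors, 4 variables each.
f <- matrix(rnorm(n * 3), nrow = n, ncol = 3)
block <- rep(1:3, each = 4)
x <- f[, block] + matrix(rnorm(n * 12, sd = 0.5), nrow = n, ncol = 12)

# Shuffle the columns so the block structure is not visible in raw order.
perm <- sample(ncol(x))
x <- x[, perm]
block <- block[perm]
colnames(x) <- paste0("V", seq_len(ncol(x)))

cor_x <- cor(x)

# Cluster variables using 1 - |r| as the distance (an assumption).
hc <- hclust(as.dist(1 - abs(cor_x)), method = "complete")
groups <- cutree(hc, k = 3)

# Cross-tabulate the true blocks against the recovered groups.
table(block, groups)

# Reorder the matrix so clustered variables are adjacent.
cor_ordered <- cor_x[hc$order, hc$order]

# Optional plot: corrplot can do the reordering and draw cluster rectangles.
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cor_x, method = "color", order = "hclust",
                     addrect = 3, tl.pos = "n")
}
```

On real data the number of rectangles is itself a guess, so this only makes the heat map easier to read; it does not replace the eigenvalue criteria.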