Bruno Wu
March 4, 2014
Using USArrests data set
states = row.names(USArrests)
head(states)
[1] "Alabama" "Alaska" "Arizona" "Arkansas" "California"
[6] "Colorado"
apply(USArrests, 2, mean)
Murder Assault UrbanPop Rape
7.788 170.760 65.540 21.232
apply(USArrests, 2, var)
Murder Assault UrbanPop Rape
18.97 6945.17 209.52 87.73
Assault has the largest mean and variance. So if we don't scale the variables, Assault will dominate the principal components.pr.out = prcomp(USArrests, scale=T)
names(pr.out)
[1] "sdev" "rotation" "center" "scale" "x"
pr.out$center
Murder Assault UrbanPop Rape
7.788 170.760 65.540 21.232
pr.out$scale
Murder Assault UrbanPop Rape
4.356 83.338 14.475 9.366
pr.out$rotation
PC1 PC2 PC3 PC4
Murder -0.5359 0.4182 -0.3412 0.64923
Assault -0.5832 0.1880 -0.2681 -0.74341
UrbanPop -0.2782 -0.8728 -0.3780 0.13388
Rape -0.5434 -0.1673 0.8178 0.08902
n observations and p variables.scale=0 ensures that the arrows are scaled to represent the loadings.
pr.out$rotation = -pr.out$rotation
pr.out$x = -pr.out$x
biplot(pr.out, scale=0)
pr.out$sdev
[1] 1.5749 0.9949 0.5971 0.4164
pr.var = pr.out$sdev^2
pr.var
[1] 2.4802 0.9898 0.3566 0.1734
pve = pr.var / sum(pr.var)
pve
[1] 0.62006 0.24744 0.08914 0.04336
cumsum() calculates the cumulative sum of a numeric vector.