Lab 10 from ISLR Book (Stats 216 Stanford)

Bruno Wu
March 4, 2014

Principal Component Analysis

Principal Component Analysis Exercise

Using USArrests data set

This is Lab 10 from the ISLR book.
Testing out how to use R Presentation in RStudio.
See if this can create 2 columns.

Check out the state names

states = row.names(USArrests)
head(states)

[1] "Alabama"    "Alaska"     "Arizona"    "Arkansas"   "California"
[6] "Colorado"

Examine the data

apply(USArrests, 2, mean)

  Murder  Assault UrbanPop     Rape 
   7.788  170.760   65.540   21.232

apply(USArrests, 2, var)

  Murder  Assault UrbanPop     Rape 
   18.97  6945.17   209.52    87.73

Assault has by far the largest mean and variance. Because PCA is driven by variance, Assault would dominate the principal components if the variables were not scaled.
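To see the effect of skipping the scaling step, here is a minimal sketch that is not part of the original slides. It fits PCA on the unscaled data and inspects the first loading vector. The name pr.unscaled is introduced here for illustration only.

```r
# PCA on the centered but unscaled variables
pr.unscaled = prcomp(USArrests, scale. = FALSE)

# First loading vector: the weight each variable gets in PC1
round(pr.unscaled$rotation[, 1], 3)

# Variable with the largest absolute loading on PC1
names(which.max(abs(pr.unscaled$rotation[, 1])))
```

Without scaling, the largest absolute PC1 loading should fall on Assault, simply because its variance is far larger than that of the other variables.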

Using the prcomp() function

pr.out = prcomp(USArrests, scale = TRUE)
names(pr.out)

[1] "sdev"     "rotation" "center"   "scale"    "x"

pr.out$center

  Murder  Assault UrbanPop     Rape 
   7.788  170.760   65.540   21.232

pr.out$scale

  Murder  Assault UrbanPop     Rape 
   4.356   83.338   14.475    9.366
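The center and scale components hold the column means and standard deviations that prcomp() used to standardize the variables. A quick check, refitting the same model so the snippet runs on its own:

```r
pr.out = prcomp(USArrests, scale. = TRUE)

# center should match the column means computed earlier
all.equal(pr.out$center, apply(USArrests, 2, mean))

# scale should match the column standard deviations
# (the square roots of the variances computed earlier)
all.equal(pr.out$scale, apply(USArrests, 2, sd))
```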

Rotation matrix (loadings)

pr.out$rotation

             PC1     PC2     PC3      PC4
Murder   -0.5359  0.4182 -0.3412  0.64923
Assault  -0.5832  0.1880 -0.2681 -0.74341
UrbanPop -0.2782 -0.8728 -0.3780  0.13388
Rape     -0.5434 -0.1673  0.8178  0.08902

There are 4 principal components.
In general, there are \( \min(n-1, p) \) informative principal components in a data set with n observations and p variables.
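The count and the role of the rotation matrix can be checked directly. The sketch below is an addition to the slides, and the variable names n, p and scores are illustrative. It computes \( \min(n-1, p) \) for USArrests and reconstructs the principal component scores in pr.out$x from the standardized data and the loadings.

```r
pr.out = prcomp(USArrests, scale. = TRUE)

n = nrow(USArrests)  # number of observations (states)
p = ncol(USArrests)  # number of variables
min(n - 1, p)        # number of informative components

# The scores in pr.out$x are the standardized data times the rotation matrix
scores = scale(USArrests) %*% pr.out$rotation
all.equal(unname(scores), unname(pr.out$x))

# The loading vectors are orthonormal, so this is (numerically) the identity
round(crossprod(pr.out$rotation), 10)
```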

Biplot

In biplot(), scale=0 ensures that the arrows are scaled to represent the loadings.

Sign change doesn't change the analysis

Principal components are only unique up to a sign change.

pr.out$rotation = -pr.out$rotation
pr.out$x = -pr.out$x
biplot(pr.out, scale=0)

[Figure: biplot of the first two principal components, after the sign flip]
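One way to convince yourself that the sign flip is harmless is the sketch below, which is not from the original slides. It checks that negating both the loadings and the scores leaves the component variances and the reconstructed data unchanged. The names flipped.rotation and flipped.x are illustrative.

```r
pr.out = prcomp(USArrests, scale. = TRUE)

# Flip the sign of the loadings and of the scores
flipped.rotation = -pr.out$rotation
flipped.x = -pr.out$x

# Variance along each component is unaffected by the flip
all.equal(apply(flipped.x, 2, var), apply(pr.out$x, 2, var))

# Scores times transposed loadings give back the same standardized data
all.equal(flipped.x %*% t(flipped.rotation),
          pr.out$x %*% t(pr.out$rotation))
```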

Proportion of Variance Explained (PVE)

pr.out$sdev

[1] 1.5749 0.9949 0.5971 0.4164

pr.var = pr.out$sdev^2
pr.var

[1] 2.4802 0.9898 0.3566 0.1734

pve = pr.var / sum(pr.var)
pve

[1] 0.62006 0.24744 0.08914 0.04336
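As a sanity check on these numbers, here is a small sketch added to the slides. Because the variables were standardized, each one contributes variance 1, so the component variances should sum to the number of variables, and the PVE values should sum to 1.

```r
pr.out = prcomp(USArrests, scale. = TRUE)
pr.var = pr.out$sdev^2

# Total variance of the standardized data vs. the number of variables
sum(pr.var)
ncol(USArrests)

# The proportions of variance explained sum to 1
pve = pr.var / sum(pr.var)
sum(pve)
```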

PVE plot - "Scree plot"

[Figure: scree plot of the PVE for each principal component]
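The slide does not show the code behind this figure. A sketch that should produce a comparable scree plot with base R graphics is given below. The axis labels and the y-axis limits are assumptions, not taken from the slide.

```r
pr.out = prcomp(USArrests, scale. = TRUE)
pve = pr.out$sdev^2 / sum(pr.out$sdev^2)

# Scree plot: PVE of each component, points joined by lines (type = "b")
plot(pve, xlab = "Principal Component",
     ylab = "Proportion of Variance Explained",
     ylim = c(0, 1), type = "b")
```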

Cumulative PVE plot

[Figure: cumulative PVE across the principal components]

cumsum() calculates the cumulative sum of a numeric vector.
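A short illustration of cumsum(), followed by a sketch of how the cumulative PVE plot might be drawn. The plotting code is not shown on the slide, so the labels and axis limits here are assumptions.

```r
# cumsum() on a small vector
a = c(1, 2, 8, -3)
cumsum(a)
# [1]  1  3 11  8

pr.out = prcomp(USArrests, scale. = TRUE)
pve = pr.out$sdev^2 / sum(pr.out$sdev^2)

# Cumulative PVE: the last value is 1 because all components are included
plot(cumsum(pve), xlab = "Principal Component",
     ylab = "Cumulative Proportion of Variance Explained",
     ylim = c(0, 1), type = "b")
```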

Thank you