Lab 10 from ISLR Book (Stats 216 Stanford)

Bruno Wu
March 4, 2014

Principal Component Analysis

Principal Component Analysis Exercise

Using USArrests data set

  • This is from Lab 10 of the ISLR book
  • Testing out how to use R Presentation in RStudio
  • See if this can create 2 columns

Check out the row names (the state names)

states = row.names(USArrests)
head(states)
[1] "Alabama"    "Alaska"     "Arizona"    "Arkansas"   "California"
[6] "Colorado"  
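A quick sketch, assuming only base R (USArrests ships with the datasets package), that confirms the number of observations n and variables p used later in the lab. The names n and p are just illustrative.

```r
# USArrests is built into R's datasets package
dim(USArrests)          # 50 states (rows) by 4 variables (columns)
names(USArrests)        # the four variable names
n <- nrow(USArrests)    # number of observations
p <- ncol(USArrests)    # number of variables
```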

Examine the data

apply(USArrests, 2, mean)
  Murder  Assault UrbanPop     Rape 
   7.788  170.760   65.540   21.232 
apply(USArrests, 2, var)
  Murder  Assault UrbanPop     Rape 
   18.97  6945.17   209.52    87.73 
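  • Assault has by far the largest mean and variance. The variables are also on different scales: Murder, Assault and Rape are arrests per 100,000 residents, while UrbanPop is the percentage of the population living in urban areas. So if we don't scale the variables to have standard deviation one, Assault will dominate the principal components.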
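To see the effect, here is a sketch that runs PCA without scaling (the name pr.unscaled is only illustrative). With the raw variances, the first loading vector should put nearly all of its weight on Assault.

```r
# PCA on the centered but unscaled variables
pr.unscaled <- prcomp(USArrests, scale = FALSE)
# First loading vector: weight of each variable in PC1
round(pr.unscaled$rotation[, 1], 3)
# Variable with the largest absolute weight on PC1
top.var <- names(which.max(abs(pr.unscaled$rotation[, 1])))
top.var
```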

Using the prcomp() function

pr.out = prcomp(USArrests, scale=TRUE)
names(pr.out)
[1] "sdev"     "rotation" "center"   "scale"    "x"       
pr.out$center
  Murder  Assault UrbanPop     Rape 
   7.788  170.760   65.540   21.232 
pr.out$scale
  Murder  Assault UrbanPop     Rape 
   4.356   83.338   14.475    9.366 
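A small check, assuming nothing beyond base R, that center and scale are simply the column means and column standard deviations of the data, i.e. the quantities prcomp() used to standardize each variable.

```r
pr.out <- prcomp(USArrests, scale = TRUE)
# center holds the column means used to center each variable
all.equal(pr.out$center, colMeans(USArrests))
# scale holds the column standard deviations used to scale each variable
all.equal(pr.out$scale, apply(USArrests, 2, sd))
```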

Rotation matrix (each column is a principal component loading vector)

pr.out$rotation
             PC1     PC2     PC3      PC4
Murder   -0.5359  0.4182 -0.3412  0.64923
Assault  -0.5832  0.1880 -0.2681 -0.74341
UrbanPop -0.2782 -0.8728 -0.3780  0.13388
Rape     -0.5434 -0.1673  0.8178  0.08902
  • There are 4 principal components.
  • In general, a data set with n observations and p variables has \( \min(n-1, p) \) informative principal components. Here n = 50 and p = 4, so all 4 are informative.
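A sketch of what the rotation matrix means, assuming only base R (V and Z are illustrative names): the loading vectors are orthonormal, and the score matrix pr.out$x is the standardized data multiplied by the loadings.

```r
pr.out <- prcomp(USArrests, scale = TRUE)
V <- pr.out$rotation
# The loading vectors (columns of V) are orthonormal: t(V) %*% V is the identity
all.equal(crossprod(V), diag(ncol(V)), check.attributes = FALSE)
# Scores are the standardized data projected onto the loading vectors
Z <- scale(USArrests)
all.equal(Z %*% V, pr.out$x, check.attributes = FALSE)
# Here min(n - 1, p) = min(49, 4) = 4, matching the 4 columns of V
ncol(V) == min(nrow(USArrests) - 1, ncol(USArrests))
```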

Biplot

  • Setting scale=0 in biplot() ensures that the arrows are scaled to represent the loadings.

[Plot: biplot of the first two principal components]

Sign change doesn't change the analysis

  • Principal components are only unique up to a sign change, so flipping the signs of both the loadings and the scores describes the same components.
# Flip the signs of the loadings and the scores, then redraw the biplot
pr.out$rotation = -pr.out$rotation
pr.out$x = -pr.out$x
biplot(pr.out, scale=0)

[Plot: biplot after flipping the signs of the loadings and scores]
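One way to convince yourself the sign flip changes nothing, sketched under the assumption that base R is all we need (flipped, fit.orig and fit.flip are illustrative names): multiplying the scores by the transposed loadings rebuilds the standardized data, and the result is the same whether or not the signs are flipped.

```r
pr.out <- prcomp(USArrests, scale = TRUE)
# Flip the signs of both the loadings and the scores
flipped <- pr.out
flipped$rotation <- -pr.out$rotation
flipped$x <- -pr.out$x
# Scores times transposed loadings rebuild the standardized data either way
fit.orig <- pr.out$x %*% t(pr.out$rotation)
fit.flip <- flipped$x %*% t(flipped$rotation)
all.equal(fit.orig, fit.flip)
all.equal(fit.orig, scale(USArrests), check.attributes = FALSE)
```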

Proportion of Variance Explained (PVE)

pr.out$sdev
[1] 1.5749 0.9949 0.5971 0.4164
pr.var = pr.out$sdev^2
pr.var
[1] 2.4802 0.9898 0.3566 0.1734
pve = pr.var / sum(pr.var)
pve
[1] 0.62006 0.24744 0.08914 0.04336
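A sketch that cross-checks the PVE, assuming only base R (pve.direct is an illustrative name). Because every standardized variable has variance one, the component variances add up to p = 4. The PVE can also be computed directly as the sum of squared scores for each component divided by the total sum of squares of the standardized data.

```r
pr.out <- prcomp(USArrests, scale = TRUE)
pr.var <- pr.out$sdev^2
pve <- pr.var / sum(pr.var)
# Standardized variables each have variance 1, so the total variance is p = 4
sum(pr.var)
# PVE from the scores: squared scores per component over the total sum of squares
Z <- scale(USArrests)
pve.direct <- colSums(pr.out$x^2) / sum(Z^2)
all.equal(unname(pve.direct), pve)
```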

PVE plot (scree plot)

[Plot: proportion of variance explained by each principal component]
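The code behind this plot is not shown on the slide. A plausible sketch, along the lines of the ISLR lab, plots the PVE of each component with points joined by lines; the axis labels are assumptions.

```r
pr.out <- prcomp(USArrests, scale = TRUE)
pr.var <- pr.out$sdev^2
pve <- pr.var / sum(pr.var)
# Scree plot: PVE of each component, drawn as points joined by lines
plot(pve, xlab = "Principal Component",
     ylab = "Proportion of Variance Explained",
     ylim = c(0, 1), type = "b")
```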

Cumulative PVE plot

[Plot: cumulative proportion of variance explained]

  • cumsum() calculates the cumulative sum of a numeric vector.
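A sketch of the cumulative PVE plot, again assuming ISLR-style plotting code since the slide hides it (cum.pve and the axis labels are illustrative). A small vector shows what cumsum() does, and the last cumulative value is 1 because all 4 components together explain all of the variance.

```r
pr.out <- prcomp(USArrests, scale = TRUE)
pve <- pr.out$sdev^2 / sum(pr.out$sdev^2)
# cumsum() returns running totals: 1, 3, 11, 8
cumsum(c(1, 2, 8, -3))
cum.pve <- cumsum(pve)
# Cumulative PVE: reaches 1 once all components are included
plot(cum.pve, xlab = "Principal Component",
     ylab = "Cumulative Proportion of Variance Explained",
     ylim = c(0, 1), type = "b")
```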

Thank you