Principal Components Analysis

Data Compression Application

Li Zhang

PCA

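Principal component analysis (PCA) is a statistical procedure that yields a sequence of best linear approximations to high-dimensional data: the first k principal components give the rank-k approximation with the smallest squared error. This example applies PCA to a classic test image from image processing (lena) to illustrate dimension reduction. Each pixel column of the image is treated as a variable and each pixel row as an observation.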
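The "best linear approximation" property can be checked on a small example. The sketch below is not part of the original analysis: it uses a made-up random matrix (`X`, `Xc`, `Xk` are illustrative names) and assumes only base R. It truncates the SVD of the centred data to k components and compares the squared reconstruction error with the sum of the discarded squared singular values.

```r
# Hypothetical toy data: 6 observations of 4 variables (not from the original post).
set.seed(1)
X <- matrix(rnorm(24), nrow = 6, ncol = 4)

# Centre each column, as PCA assumes.
Xc <- scale(X, center = TRUE, scale = FALSE)

s <- svd(Xc)
k <- 2

# Rank-k reconstruction from the first k singular triplets.
Xk <- s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], nrow = k) %*% t(s$v[, 1:k, drop = FALSE])

# The squared error of the reconstruction equals the sum of the
# squared singular values that were discarded.
err_direct <- sum((Xc - Xk)^2)
err_from_d <- sum(s$d[(k + 1):length(s$d)]^2)
```

The two error values should agree up to floating-point rounding, which is why the squared singular values can be read as the variance captured or lost by each component.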

The Original Image

[Image: lena]

Principal Components

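library(jpeg)
# Read the image; this assumes a greyscale JPEG, so readJPEG() returns a matrix
lena <- readJPEG('lena.jpg')
# Centre and scale each pixel column before the decomposition
sc <- scale(lena)
svd1 <- svd(sc)
# Squared singular values are proportional to the variance of each component
plot(100 * svd1$d^2 / sum(svd1$d^2), pch = 19, xlab = "Principal components", ylab = "% Variance explained")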

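The variance plot suggests a natural follow-up question: how many components are needed to reach a given share of the total variance? A minimal sketch follows. `components_needed` and the 0.95 default are hypothetical choices for illustration, not part of the original analysis, and the singular values in the example are made up.

```r
# Hypothetical helper (not from the original post): smallest number of
# components whose cumulative share of variance reaches `threshold`.
components_needed <- function(d, threshold = 0.95) {
  share <- d^2 / sum(d^2)              # proportion of variance per component
  which(cumsum(share) >= threshold)[1] # first index that reaches the threshold
}

# Toy singular values, chosen only for illustration.
d_toy <- c(4, 2, 1, 1)
k_toy <- components_needed(d_toy, threshold = 0.9)
```

With the objects defined above, `components_needed(svd1$d)` would apply the same rule to the image.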

Compressed Image

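k <- 10  # number of principal components to keep
# Rank-k reconstruction of the scaled image from the first k singular triplets
approx <- svd1$u[, 1:k] %*% diag(svd1$d[1:k], nrow = k, ncol = k) %*% t(svd1$v[, 1:k])
# Undo scale(): multiply each column by its scale and add back its centre
for (icol in 1:ncol(approx)) {
  approx[, icol] <- approx[, icol] * attr(sc, 'scaled:scale')[icol] + attr(sc, 'scaled:center')[icol]
}
# Flip the rows so the image displays upright, then draw it in 256 grey levels
image(t(approx)[, nrow(approx):1], col = grey(seq(0, 1, length.out = 256)), xaxt = 'n', yaxt = 'n', asp = 1, main = paste("Image Using the First", k, "Principal Components"))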

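To put a number on the compression, one can count how many values the truncated decomposition has to store compared with the full image. The sketch below is an illustration under stated assumptions: `storage_ratio` is a hypothetical helper, and the 512 x 512 size is assumed for the example only (the original post does not state the image dimensions).

```r
# Hypothetical helper (not from the original post): ratio of numbers stored
# for the rank-k approximation to numbers stored for the full m x n image.
storage_ratio <- function(m, n, k) {
  truncated <- k * (m + n + 1)  # k columns of u, k singular values, k columns of v
  rescale   <- 2 * n            # per-column centre and scale needed to undo scale()
  (truncated + rescale) / (m * n)
}

# Assumed dimensions, for illustration only.
ratio_toy <- storage_ratio(m = 512, n = 512, k = 10)
```

For the actual image, `storage_ratio(nrow(lena), ncol(lena), 10)` would give the corresponding figure, assuming `lena` was read as a matrix.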