Eigen Image Variability



Eigen Images (or Eigenfaces) is a classification approach often used for tasks such as facial recognition.


This assignment will calculate eigenvectors for a set of sneaker jpegs and look at the variability.


Along the way, it will review some of the key concepts of eigendecomposition.



Consume The Images



Create a matrix in which each image is loaded into one row.


The RGB colors form 3 separate layers, so we will stack them into one long vector.


Note:

  • length(names) is 17 for the 17 images in the local directory
  • prod(dim(img_template)) is the product of 3 dimensions: rows, columns, and the color composite layers (RGB)


library(jpeg)                        # provides readJPEG()

setwd("C:\\Users\\arono\\CUNY\\DATA 605 Computational Math\\jpgs")

path <- 'C:\\Users\\arono\\CUNY\\DATA 605 Computational Math\\jpgs'
pic  <- '\\RC_2500x1200_2014_us_54106.jpg'

pathnpic <- paste0(path, pic)

img_template <- readJPEG(pathnpic)   # jpeg package

# imageShow(img_template)            # OpenImageR package



names <- list.files(path, pattern = "jpg")

# create a matrix with one row per image and one column per pixel value
data <- matrix(0, length(names), prod(dim(img_template)))

for (i in 1:length(names)) {
  im <- readJPEG(names[i])     # the working directory is already set to path
  r  <- as.vector(im[,,1])     # red layer
  g  <- as.vector(im[,,2])     # green layer
  b  <- as.vector(im[,,3])     # blue layer

  data[i,] <- c(r, g, b)       # stack the three layers into one row
}


sneakers <- data.frame(x = data)





Principal Component Analysis



PCA is usually referred to as a dimensionality-reduction method.


It reflects the idea that a set of faces, or similar objects, shares a common set of visual attributes, and that it is useful to “normalize” the image data before assessing variability.


The specific steps are [1] (a toy R sketch follows the list):

  1. Standardize the range of continuous initial variables
  2. Compute the covariance matrix to identify correlations
  3. Compute the eigenvectors and eigenvalues of the covariance matrix to identify the principal components
  4. Create a feature vector to decide which principal components to keep
  5. Recast the data along the principal components axes
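
A minimal sketch of these five steps in R, using a small random matrix (not the sneaker data) and manual eigen() calls rather than prcomp():


# toy walk-through of the 5 PCA steps on a small random matrix
set.seed(1)
X <- matrix(rnorm(40), nrow = 10, ncol = 4)

Xs   <- scale(X)                                   # 1. standardize
C    <- cov(Xs)                                    # 2. covariance matrix
e    <- eigen(C)                                   # 3. eigenvalues / eigenvectors
keep <- min(which(cumsum(e$values) / sum(e$values) > .85))
W    <- e$vectors[, 1:keep]                        # 4. feature vector of kept components
Z    <- Xs %*% W                                   # 5. recast data onto the new axes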


The PCA process isolates the “principal components” by ordering the eigenvalues from most to least variability explained.


Note: SVD, or Singular Value Decomposition, refers to the factorization of a matrix A where:

\[A \ = \ U\Sigma V^{T}\]



where [2]

  • U = an m×n matrix of the orthonormal eigenvectors of \(AA^T\)
  • \(\Sigma\) = an n×n diagonal matrix of the singular values, which are the square roots of the eigenvalues of \(A^TA\)
  • \(V^T\) = the transpose of an n×n matrix containing the orthonormal eigenvectors of \(A^TA\)
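
A quick check of that relationship on an arbitrary small matrix: the singular values from R's svd() should equal the square roots of the eigenvalues of \(A^TA\).


set.seed(1)
A <- matrix(rnorm(12), nrow = 4, ncol = 3)

svd(A)$d                              # singular values of A
sqrt(eigen(t(A) %*% A)$values)        # square roots of eigenvalues of A^T A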



We won't be explicitly factoring, but we will use scale(), a covariance calculation, and eigen() to perform the PCA and variance analysis.





Scale()




scale() normalizes your data in 2 ways:


  • when center=TRUE (the default) it subtracts the column mean from each value
  • when scale=TRUE (the default) it divides each value by the column's root mean square


The root mean square is


\[rms \ = \ \sqrt{\sum_{i=1}^{n} \frac{v_{i}^2}{n-1}}\]


Note that scaling a matrix is a column-by-column process…
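
A small sanity check on made-up data that the manual center-then-divide-by-rms computation matches scale():


set.seed(1)
x <- matrix(rnorm(12), nrow = 4, ncol = 3)

centered <- sweep(x, 2, colMeans(x))                    # subtract each column mean
rms      <- sqrt(colSums(centered^2) / (nrow(x) - 1))   # per-column root mean square
manual   <- sweep(centered, 2, rms, "/")

all.equal(manual, scale(x), check.attributes = FALSE)   # TRUE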


scaled_sneakers <- scale(sneakers)





Covariance()



Recall that covariance is the average product of the X deviations and Y deviations from their means, and correlation is the covariance divided by the product of the X and Y standard deviations.


The Covariance equation is:

\[ cov(x,y) \ = \ \sum_{i=1}^{n}{\frac{(x_{i} \ - \ \bar{x} )(y_{i} \ - \ \bar{y} )}{n-1}} \]


Applying covariance to a matrix is also a column by column process.


Thus the diagonal (in which x and y are the same column) is just the variance of that column.


Also worth noting: the off-diagonal elements are symmetric, that is, position [i,j] equals position [j,i].
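
Both properties are easy to confirm on a small made-up matrix:


set.seed(1)
m  <- matrix(rnorm(20), nrow = 5, ncol = 4)
cm <- cov(m)

all.equal(diag(cm), apply(m, 2, var))   # diagonal = per-column variances
isSymmetric(cm)                         # TRUE: [i,j] equals [j,i]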


The R cov() function can throw memory errors on large matrices, but we can avoid them by calculating covariance directly.

So calculate the covariance of the scaled sneakers matrix by taking the matrix product of our scaled matrix and its transpose. Since the rows are images, this produces a compact 17×17 matrix whose nonzero eigenvalues match those of the full pixel-by-pixel covariance matrix…


# the NAs are NaNs from scale(): pixel positions that are constant across
# all 17 images have zero variance, so scale() divides 0 by 0

scaled_sneakers[is.na(scaled_sneakers)] = 1


cov_scaled_sneakers <- scaled_sneakers %*% t(scaled_sneakers) / (nrow(scaled_sneakers) - 1)
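
This rows-times-transpose shortcut is the classic eigenfaces trick: for a column-centered matrix X, the nonzero eigenvalues of X %*% t(X) / (n-1) match those of the full (and much larger) t(X) %*% X / (n-1). A sketch on a small random matrix, not the sneaker data:


set.seed(1)
X <- scale(matrix(rnorm(3 * 8), nrow = 3, ncol = 8))    # 3 "images", 8 "pixels"

small <- X %*% t(X) / (nrow(X) - 1)                     # 3x3, as computed above
big   <- t(X) %*% X / (nrow(X) - 1)                     # 8x8 full covariance

round(eigen(small)$values, 6)                           # leading eigenvalues...
round(head(eigen(big)$values, 3), 6)                    # ...match; the rest are ~0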



Variability



Now calculate some of the key variability metrics…


eig          <- eigen(cov_scaled_sneakers)
eigenvalues  <- eig$values
eigenvectors <- eig$vectors

prop.var <- eigenvalues / sum(eigenvalues)           # proportion of variance per component
cum.var  <- cumsum(eigenvalues) / sum(eigenvalues)   # cumulative proportion of variance
thres    <- min(which(cum.var > .85))                # components needed to pass 85%


Note how the eigen() function ordered the eigenvalues from largest to smallest…


eigenvalues
##  [1] 2953243.19  931160.55  287326.28  157137.35  142832.33  125512.95
##  [7]   90041.28   81462.18   75325.26   65911.23   63225.39   59913.37
## [13]   52261.85   50153.97   44327.96   37528.76   18856.30


The cum.var vector is useful for assessing how many components account for a given share of the total variability.


cum.var
##  [1] 0.5640029 0.7418335 0.7967064 0.8267161 0.8539938 0.8779640 0.8951598
##  [8] 0.9107173 0.9251027 0.9376902 0.9497649 0.9612070 0.9711878 0.9807661
## [15] 0.9892317 0.9963989 1.0000000


The thres variable is essentially telling us that the first 5 components account for more than 85% of the variability.


thres 
## [1] 5
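
As a closing sketch (an assumed next step, not part of the original output), the retained eigenvectors of the small 17×17 matrix can be mapped back to pixel space to form the “eigen images” themselves, reusing the variables defined above:


# each column of eigen_images is one pixel-space eigen image
eigen_images <- t(scaled_sneakers) %*% eigenvectors[, 1:thres]

# rescale the first one to [0,1] and restore the row x column x RGB shape
ei1 <- eigen_images[, 1]
ei1 <- (ei1 - min(ei1)) / (max(ei1) - min(ei1))
dim(ei1) <- dim(img_template)
# imageShow(ei1)                       # OpenImageR, as earlier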