Introduction

With the attached data file, build and visualize eigen imagery that accounts for 80% of the variability. Provide full R code and discussion.

Method

  1. We first download the data file and visualize one of the images. What the heck type of shoe is this?
library(jpeg)        # provides readJPEG()
library(OpenImageR)  # provides imageShow() and resizeImage()
test <- readJPEG("/Users/williamaiken/Downloads/jpg/RC_2500x1200_2014_us_53446.jpg")
imageShow(test)

Not a golfer

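  1. Next we read in all the images, shrink each one by a factor of 20 (to 60 x 125 pixels), and flatten it into one row of a matrix
path <- "/Users/williamaiken/Downloads/jpg/"

# list.files() expects a regular expression, not a shell glob
filenames <- list.files(path = path, pattern = "\\.jpg$")

# Dimensions of each resized image: 60 rows x 125 columns x 3 color channels
n_rows <- 1200 / 20
n_cols <- 2500 / 20

# One row per image. Size the matrix for the resized images:
# prod(dim(test)) would size it for the full-resolution image,
# and the row assignment below would then silently recycle values.
data <- matrix(0, length(filenames), n_rows * n_cols * 3)
for (i in seq_along(filenames)) {
  # Sizes are passed positionally, in the same order as the original call
  im <- resizeImage(
    readJPEG(paste0(path, filenames[i])),
    n_rows,
    n_cols
  )
  r <- as.vector(im[, , 1])
  g <- as.vector(im[, , 2])
  b <- as.vector(im[, , 3])

  data[i, ] <- c(r, g, b)
}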
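To see why this flattening can be undone later, here is a minimal sketch on a synthetic array (the tiny 2 x 3 x 3 `img` is a made-up stand-in for a resized shoe image, not part of the data set). Because R stores arrays in column-major order, concatenating the three channel vectors gives the same ordering as `as.vector()` on the whole array, and `array()` with the original dimensions restores the image.

```r
# Hypothetical 2 x 3 RGB "image" standing in for a resized shoe photo
set.seed(1)
img <- array(runif(2 * 3 * 3), dim = c(2, 3, 3))

# Flatten channel by channel, as in the loop above
flat <- c(as.vector(img[, , 1]), as.vector(img[, , 2]), as.vector(img[, , 3]))

# Same ordering as flattening the whole array at once
same_order <- identical(flat, as.vector(img))

# array() with the original dimensions undoes the flattening
restored   <- array(flat, dim = dim(img))
round_trip <- identical(restored, img)
```

This is the same round trip that the final display step relies on when it reshapes a long vector back to `c(60, 125, 3)`.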
  1. Then we transpose the resulting matrix, so that each column holds one image, and cast it as a data frame
shoes <- data.frame(t(data))
  1. Before extracting the eigenvalues and eigenvectors, we center and scale each image (each column) to mean 0 and standard deviation 1
scaled <- scale(shoes, center = TRUE, scale = TRUE)
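As a quick sanity check of what `scale()` does, the sketch below uses a small random matrix (`m` is hypothetical and merely stands in for `shoes`) and confirms that every column ends up with mean 0 and standard deviation 1.

```r
# Hypothetical data: 50 "pixels" (rows) for 4 "images" (columns)
set.seed(42)
m <- matrix(rnorm(50 * 4, mean = 5, sd = 3), nrow = 50, ncol = 4)

# Center and scale each column, as done for the shoe matrix
z <- scale(m, center = TRUE, scale = TRUE)

col_means <- colMeans(z)      # numerically zero for every column
col_sds   <- apply(z, 2, sd)  # one for every column
```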
  1. Next we compute the correlation matrix of our scaled matrix, which has one row and one column per image
Sigma_ <- cor(scaled)
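The link between the scaled matrix and its correlation matrix can be checked on toy data. In this sketch (all names are hypothetical), the correlation matrix of a column-scaled matrix `z` equals `crossprod(z) / (n - 1)`, and scaling does not change the correlations at all. The first identity is what makes the normalization in the later projection step work.

```r
# Hypothetical data: 30 rows, 3 columns
set.seed(11)
m <- matrix(rnorm(30 * 3, mean = 2), nrow = 30, ncol = 3)
z <- scale(m)
n <- nrow(z)

from_cor       <- cor(z)
from_crossprod <- crossprod(z) / (n - 1)  # t(z) %*% z / (n - 1)

# Largest absolute differences; both should be at rounding-error level
diff_crossprod <- max(abs(from_cor - from_crossprod))
diff_unscaled  <- max(abs(from_cor - cor(m)))
```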
  1. Then we can use our Sigma to extract the eigenvalues and eigenvectors
eig          <- eigen(Sigma_)
eigenvalues  <- eig$values
eigenvectors <- eig$vectors

cumsum(eigenvalues) / sum(eigenvalues)
##  [1] 0.6833796 0.7836119 0.8350410 0.8629807 0.8827157 0.8996343 0.9143339
##  [8] 0.9269998 0.9375131 0.9474672 0.9565156 0.9650405 0.9734219 0.9805532
## [15] 0.9875810 0.9943436 1.0000000
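To make the cumulative proportions easier to interpret, the following sketch runs the same decomposition on random data (the matrix `m` is hypothetical). The eigenvalues of a correlation matrix sum to the number of columns, `eigen()` returns them in decreasing order for a symmetric matrix, and the eigenvectors are orthonormal, so the running sum divided by the total always climbs to 1.

```r
# Hypothetical data: 100 rows, 5 columns (5 "images")
set.seed(7)
m <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)

e <- eigen(cor(m))

total     <- sum(e$values)             # equals ncol(m), the trace of cor(m)
cum_share <- cumsum(e$values) / total  # cumulative proportion of variance

# Eigenvectors are orthonormal: t(V) %*% V is the identity matrix
ortho_err <- max(abs(crossprod(e$vectors) - diag(5)))
```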
  1. We can look at the cumulative proportion of variation and see how many eigenvectors we need to account for 80% of it. It turns out we only need the first 3 of the 17 to achieve this (83.5%)!
prop.var <- eigenvalues / sum(eigenvalues)
cum.var  <- cumsum(eigenvalues) / sum(eigenvalues)
thres    <- min(which(cum.var > .80))
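As a worked check of the threshold rule, this sketch applies it to the first four cumulative proportions printed earlier (the vector `cum.var.head` is just those printed values typed in by hand). The second value, 0.784, falls short of 0.80 and the third, 0.835, exceeds it, so the rule picks 3.

```r
# First four cumulative proportions, copied from the printed output
cum.var.head <- c(0.6833796, 0.7836119, 0.8350410, 0.8629807)

# Index of the first component whose cumulative share exceeds 80%
thres.check <- min(which(cum.var.head > 0.80))
thres.check
# [1] 3
```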
  1. Next we multiply our scaled matrix by our first 3 eigenvectors and by a scaling matrix that normalizes each resulting eigenimage to unit length
scaling  <- diag(eig$values[1:thres]^(-1/2)) / sqrt(nrow(scaled) - 1)
eigshoes <- scaled %*% eig$vectors[, 1:thres] %*% scaling
par(mfrow = c(2, 3))
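The effect of the scaling matrix can be verified on synthetic data. In this sketch (the matrix `x` and the choice `k <- 3` are assumptions for illustration), projecting a column-scaled matrix onto its leading eigenvectors and applying the same scaling yields columns that are orthonormal, so their cross-product is the identity matrix.

```r
# Hypothetical scaled data: 200 rows ("pixels"), 6 columns ("images")
set.seed(3)
x <- scale(matrix(rnorm(200 * 6), nrow = 200, ncol = 6))

e <- eigen(cor(x))
k <- 3  # number of components kept, chosen arbitrarily here

# Same construction as above: eigenvalues^(-1/2), divided by sqrt(n - 1)
scaling.toy <- diag(e$values[1:k]^(-1/2)) / sqrt(nrow(x) - 1)
u <- x %*% e$vectors[, 1:k] %*% scaling.toy

# t(u) %*% u should be the k x k identity matrix
gram <- crossprod(u)
gram_err <- max(abs(gram - diag(k)))
```

The identity follows because `cor(x)` equals `crossprod(x) / (n - 1)` for a scaled matrix, so the factor `sqrt(n - 1)` and the inverse square roots of the eigenvalues cancel exactly.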
  1. Lastly we look at the 3 resulting eigenimages
imageShow(array(eigshoes[,1], c(60,125,3)))

imageShow(array(eigshoes[,2], c(60,125,3)))

imageShow(array(eigshoes[,3], c(60,125,3)))
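Eigenimage values are signed and generally not confined to the range [0, 1]. Depending on how the display function treats out-of-range values, it may help to rescale each eigenimage before plotting. The helper `rescale01()` below is a hypothetical addition, not part of the original analysis, and the vector `v` is made-up data standing in for one column of `eigshoes`.

```r
# Hypothetical helper: min-max rescale a numeric vector to [0, 1]
rescale01 <- function(v) {
  rng <- range(v)
  (v - rng[1]) / (rng[2] - rng[1])
}

# Made-up signed values standing in for one eigenimage column
v <- c(-0.02, 0.00, 0.01, 0.03)
scaled_v <- rescale01(v)

# With the real data this could be used as, for example:
# imageShow(array(rescale01(eigshoes[, 1]), c(60, 125, 3)))
```

Rescaling changes only the brightness and contrast of the displayed picture, not the eigenimage itself.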

Conclusion

It’s amazing how few eigenvectors are required: just 3 of the 17 are enough to create eigenimages accounting for more than 80% of the variation in the initial data set.

This assignment was incredibly challenging and could not have been accomplished without reviewing the work of two other data scientists:

Diego Herrero

R-Minator