The following webpages were referenced to preform this analysis, as was the professor’s blackboard example:
https://rpubs.com/tahmad/DATA605Assignment4
https://rpubs.com/R-Minator/eigenshoes
https://rpubs.com/jefflittlejohn/Data605_HW4

The following exercise uses Principle Component Analysis(PCA) to ‘build and visualize eigenimagery that accounts for 80% of the variability’ of 17 images of shoes.

Load Packages
The following packages were used in this analysis.

library(tidyverse)
library(utils)
library(jpeg)
library(OpenImageR)
library(EBImage)

Downloading Images
A zip file containing the images was placed in a github repository. The zip file was then downloaded via R and contents unzipped.

url <- r"(https://github.com/greggmaloy/Data_605/blob/main/jpg.zip?raw=TRUE","jpg.zip")"
zipfilename <- r"(eigenjpg.zip)"
unzippath <- r"(eigenjpg)"
download.file(url, zipfilename, mode = "wb")
unzip(zipfilename, exdir = unzippath)

Image Matrix
Below individual file names are placed in a new variable ‘names’. Image dimensions specified via ‘height’, ‘width’ and ‘scale’ variable creation. The array ‘im’ is initialized and finally the images are resized and placed in the ’im’array.

names <- list.files(path = unzippath, full.names = TRUE)

height <- 1200
width <- 2500
scale <- 20


im <- array(rep(0,length(names)*height/scale*width/scale*3), dim = c(length(names), height/scale, width/scale,3))

for (i in 1:length(names)){
  temp=resize(readJPEG(names[i]),height/scale, width/scale)
  im[i,,,]=array(temp,dim=c(1, height/scale, width/scale,3))}

The Images
Below the images plotted.

plot_jpeg = function(path, add=FALSE)
{ jpg = readJPEG(path, native=T) 
  res = dim(jpg)[2:1]
  if (!add) 
    plot(1,1,xlim=c(1,res[1]),ylim=c(1,res[2]),asp=1,type='n',xaxs='i',yaxs='i',xaxt='n',yaxt='n',xlab='',ylab='',bty='n')
  rasterImage(jpg,1,1,res[1],res[2])
}

par(mfrow=c(6,3))
par(mai=c(.01,.01,.01,.01))
for (i in 1:length(names)){  
  plot_jpeg(writeJPEG(im[i,,,]))
}

Rasterization
Next the images are rasterized so the images can be represented mathematically. The result is a dataframe with 17 columns and 382,000 rows with each column representing a shoe.

flat <- matrix(0, length(names), prod(dim(im))) 
for (i in 1:length(names)) {
  newim <- readJPEG(names[i])
  r <- as.vector(im[i,,,1])
  g <- as.vector(im[i,,,2])
  b <- as.vector(im[i,,,3])
  flat[i,] <- t(c(r, g, b))
}
shoes <- as.data.frame(t(flat))
shoes

Covariance Matrix Calculation
Below we prepare for principle component analysis by calculating the covariance matrix. The data is standardized by centering the data (subtracting each observation from the mean and then dividing by the standard deviation via the scale function). Finally the covariance is calculated for each shoe and represented in the matix ‘Sigma_’.

scaled=scale(shoes, center = TRUE, scale = TRUE)
Sigma_=cor(scaled)

Calculating Eigenvalues and Eigenvectors
Eigenvalues and eigenvectors are computed using the above covariance. The eigenvalues here represent the amount of variance explained by each component and the eigenvectors represent the direction of the variance.

myeigen=eigen(Sigma_)

PCA
Below the variance for each component is calulcated. The first three images account for 80% of the variability. (I believe this is correct)

variance <- myeigen$values / sum(myeigen$values)
variance
##  [1] 0.692820228 0.101224662 0.051062366 0.027277491 0.018999399 0.016249609
##  [7] 0.013994442 0.012060722 0.009698242 0.009058322 0.008458196 0.007987044
## [13] 0.007632793 0.006697401 0.006182362 0.005955453 0.004641268
sum(variance[1:2]) 
## [1] 0.7940449
sum(variance[1:3]) 
## [1] 0.8451073

Validating the PCA
Below the above PCA is validated using the code below which I found online. The results are identical. THe first three images account for 80% of variability.

newdata=im
dim(newdata)=c(length(names),height*width*3/scale^2)
mypca=princomp(t(as.matrix(newdata)), scores=TRUE, cor=TRUE)

mycomponents=mypca$sdev^2/sum(mypca$sdev^2)
sum(mycomponents[1:3]) #first 19 components account for 80% of variability
## [1] 0.8451073

Visualizing the Three Principle Components
Below are the three principle components which account for 80% of variability.

mypca2 <- t(mypca$scores)
dim(mypca2) <- c(length(names), height/scale, width/scale, 3)
par(mfrow=c(5, 5))
par(mai=c(.001,.001,.001,.001))

# Loop to plot the first 3 Eigenshoes
for (i in 1:min(3, dim(mypca2)[1])) { 
    plot_jpeg(writeJPEG(mypca2[i,,,], bg="white")) 
}