The following webpages were referenced to preform this analysis, as
was the professor’s blackboard example:
https://rpubs.com/tahmad/DATA605Assignment4
https://rpubs.com/R-Minator/eigenshoes
https://rpubs.com/jefflittlejohn/Data605_HW4
The following exercise uses Principle Component Analysis(PCA) to ‘build and visualize eigenimagery that accounts for 80% of the variability’ of 17 images of shoes.
Load Packages
The following packages were used in this analysis.
library(tidyverse)
library(utils)
library(jpeg)
library(OpenImageR)
library(EBImage)
Downloading Images
A zip file containing the images was placed in a github repository. The
zip file was then downloaded via R and contents unzipped.
url <- r"(https://github.com/greggmaloy/Data_605/blob/main/jpg.zip?raw=TRUE","jpg.zip")"
zipfilename <- r"(eigenjpg.zip)"
unzippath <- r"(eigenjpg)"
download.file(url, zipfilename, mode = "wb")
unzip(zipfilename, exdir = unzippath)
Image Matrix
Below individual file names are placed in a new variable ‘names’. Image
dimensions specified via ‘height’, ‘width’ and ‘scale’ variable
creation. The array ‘im’ is initialized and finally the images are
resized and placed in the ’im’array.
names <- list.files(path = unzippath, full.names = TRUE)
height <- 1200
width <- 2500
scale <- 20
im <- array(rep(0,length(names)*height/scale*width/scale*3), dim = c(length(names), height/scale, width/scale,3))
for (i in 1:length(names)){
temp=resize(readJPEG(names[i]),height/scale, width/scale)
im[i,,,]=array(temp,dim=c(1, height/scale, width/scale,3))}
The Images
Below the images plotted.
plot_jpeg = function(path, add=FALSE)
{ jpg = readJPEG(path, native=T)
res = dim(jpg)[2:1]
if (!add)
plot(1,1,xlim=c(1,res[1]),ylim=c(1,res[2]),asp=1,type='n',xaxs='i',yaxs='i',xaxt='n',yaxt='n',xlab='',ylab='',bty='n')
rasterImage(jpg,1,1,res[1],res[2])
}
par(mfrow=c(6,3))
par(mai=c(.01,.01,.01,.01))
for (i in 1:length(names)){
plot_jpeg(writeJPEG(im[i,,,]))
}
Rasterization
Next the images are rasterized so the images can be represented
mathematically. The result is a dataframe with 17 columns and 382,000
rows with each column representing a shoe.
flat <- matrix(0, length(names), prod(dim(im)))
for (i in 1:length(names)) {
newim <- readJPEG(names[i])
r <- as.vector(im[i,,,1])
g <- as.vector(im[i,,,2])
b <- as.vector(im[i,,,3])
flat[i,] <- t(c(r, g, b))
}
shoes <- as.data.frame(t(flat))
shoes
Covariance Matrix Calculation
Below we prepare for principle component analysis by calculating the
covariance matrix. The data is standardized by centering the data
(subtracting each observation from the mean and then dividing by the
standard deviation via the scale function). Finally the covariance is
calculated for each shoe and represented in the matix ‘Sigma_’.
scaled=scale(shoes, center = TRUE, scale = TRUE)
Sigma_=cor(scaled)
Calculating Eigenvalues and Eigenvectors
Eigenvalues and eigenvectors are computed using the above covariance.
The eigenvalues here represent the amount of variance explained by each
component and the eigenvectors represent the direction of the
variance.
myeigen=eigen(Sigma_)
PCA
Below the variance for each component is calulcated. The first three
images account for 80% of the variability. (I believe this is
correct)
variance <- myeigen$values / sum(myeigen$values)
variance
## [1] 0.692820228 0.101224662 0.051062366 0.027277491 0.018999399 0.016249609
## [7] 0.013994442 0.012060722 0.009698242 0.009058322 0.008458196 0.007987044
## [13] 0.007632793 0.006697401 0.006182362 0.005955453 0.004641268
sum(variance[1:2])
## [1] 0.7940449
sum(variance[1:3])
## [1] 0.8451073
Validating the PCA
Below the above PCA is validated using the code below which I found
online. The results are identical. THe first three images account for
80% of variability.
newdata=im
dim(newdata)=c(length(names),height*width*3/scale^2)
mypca=princomp(t(as.matrix(newdata)), scores=TRUE, cor=TRUE)
mycomponents=mypca$sdev^2/sum(mypca$sdev^2)
sum(mycomponents[1:3]) #first 19 components account for 80% of variability
## [1] 0.8451073
Visualizing the Three Principle Components
Below are the three principle components which account for 80% of
variability.
mypca2 <- t(mypca$scores)
dim(mypca2) <- c(length(names), height/scale, width/scale, 3)
par(mfrow=c(5, 5))
par(mai=c(.001,.001,.001,.001))
# Loop to plot the first 3 Eigenshoes
for (i in 1:min(3, dim(mypca2)[1])) {
plot_jpeg(writeJPEG(mypca2[i,,,], bg="white"))
}