Assignment

Using the seventeen provided JPEG files, build and visualize eigenimagery that accounts for 80% of the variability.


References

For a roadmap and code, we relied heavily on Doctor Larry’s rpubs.com/R-Minator/eigenshoes.


Preamble

In this project we encode images of 17 distinct shoes into a matrix of data, one column per shoe. The first eigenvector is the direction through this 17-dimensional space (one dimension per shoe) that captures the largest share of the variability among the 17 shoes. The second eigenvector is the direction, orthogonal to the first, that captures as much of the remaining variability as possible. (Orthogonal basically means perpendicular, generalized to higher dimensions.) We find the 3rd, the 4th, and so on down the line until we have 17 eigenvectors that together capture all of the variability of the 17 shoes. If one shoe can be made up entirely of a combination of the other 16, the 17th eigenvector will have an eigenvalue of zero and add nothing, so we will capture 100% of the variability of the system with just 16 eigenvectors.
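To make the last point concrete, here is a minimal sketch on hypothetical toy data (shoe1, shoe2, shoe3 and toy are illustrative names, not objects from this analysis). The third "shoe" is built exactly from the other two, so the correlation matrix has a zero eigenvalue and two eigenvectors already capture all of the variability. The check at the end shows that the eigenvectors returned by eigen() are mutually orthogonal.

```r
# Hypothetical toy data: 3 "shoes" measured on 100 "pixels".
# shoe3 is an exact combination of shoe1 and shoe2, so it adds no new information.
set.seed(1)
shoe1 = rnorm(100)
shoe2 = rnorm(100)
shoe3 = 0.5*shoe1 + 2*shoe2
toy = cbind(shoe1, shoe2, shoe3)

e = eigen(cor(toy))
e$values                              # the last eigenvalue is zero up to rounding error
cumsum(e$values) / sum(e$values)      # the first two components already reach 100%
round(unname(crossprod(e$vectors)), 10)  # identity matrix: the eigenvectors are orthogonal
```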

An eigenvector derived from a matrix of images, once mapped back into pixel space, is itself an image, and it usually looks like a ghostly, ethereal version of the images it is a composite of. Once we have a set of these eigenvectors (eigenimages), we can recreate any of the 17 shoes like a recipe: 0.3 units of the first eigenimage, 0.1 of the second, 0 of the third, 0.8 of the fourth, and so on down the line.
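The recipe idea can be sketched in a few lines, assuming a set of orthonormal eigenimages. Here a hypothetical basis of four 12-pixel "eigenimages" is taken from the QR decomposition of a random matrix (eig_imgs, recipe and shoe are illustrative names). Mixing them with the weights 0.3, 0.1, 0 and 0.8 produces a "shoe", and because the basis is orthonormal, projecting the shoe back onto it recovers exactly those weights.

```r
# Hypothetical setup: 4 orthonormal "eigenimages" of 12 pixels each.
set.seed(42)
eig_imgs = qr.Q(qr(matrix(rnorm(12 * 4), nrow = 12)))   # 12 x 4, orthonormal columns

recipe = c(0.3, 0.1, 0, 0.8)     # units of eigenimage 1, 2, 3 and 4
shoe = eig_imgs %*% recipe       # mix the eigenimages into one "shoe"

# With an orthonormal basis, the recipe is recovered by simple projection.
recovered = drop(crossprod(eig_imgs, shoe))
round(recovered, 10)
```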

The value of this is that we can closely approximate the image of an 18th shoe, one not included in our original set, just by specifying how much of each eigenimage to add together, provided the new shoe resembles the ones the eigenimages were built from. This is the idea behind transform-based image compression: if everyone agrees to use the same large set of eigenimages, an image that would normally take megabytes can be sent as just a short list of numbers, the multiples of each eigenimage needed to recreate it.
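A minimal sketch of the compression idea, assuming sender and receiver share the same orthonormal basis. Everything here is hypothetical: basis is a random 50-pixel, 10-image basis, and compress() and decompress() are illustrative helpers, not functions from any package. Sending only the first k coefficients instead of 50 pixels gives a reconstruction error that shrinks as k grows. Note that this toy new_shoe lies entirely inside the basis, which a real unseen shoe generally would not, so in practice some error would remain even with all components.

```r
# Hypothetical shared basis: 10 orthonormal "eigenimages" of 50 pixels each.
set.seed(7)
basis = qr.Q(qr(matrix(rnorm(50 * 10), nrow = 50)))        # 50 x 10
new_shoe = basis %*% c(5, 3, 1, 0.5, rnorm(6, sd = 0.1))   # mostly built from the first few

# Hypothetical helpers: keep k coefficients, then rebuild the 50 pixels from them.
compress   = function(img, k) drop(crossprod(basis[, 1:k, drop = FALSE], img))
decompress = function(coefs) basis[, seq_along(coefs), drop = FALSE] %*% coefs

# Euclidean reconstruction error for k = 1, ..., 10 coefficients sent
errors = sapply(1:10, function(k) sqrt(sum((new_shoe - decompress(compress(new_shoe, k)))^2)))
round(errors, 3)   # non-increasing as more eigenimages are kept
```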

Because all of our starting images are closely related, we’ll find that the first few eigenimages alone already account for the majority of the variability among them.

Let’s check it out!


Library

Note that EBImage is distributed through Bioconductor rather than CRAN, so we couldn’t download it directly with install.packages(); instead we installed it through the intermediary package BiocManager.

Note that I had to reinstall R to get the latest version. You can check which version of R you are running by typing R.Version() in the console.

While I was doing that, I also downloaded XQuartz, which was linked from the same page where I downloaded the current version of R for my operating system. XQuartz provides the X11 window system, which recent versions of macOS no longer include and which some R packages still rely on. I needed it for the imager library, which turned out not to be necessary for the final file.

library(jpeg)        # readJPEG(), writeJPEG()
library(OpenImageR)  # imageShow()
#install.packages("BiocManager")
#BiocManager::install("EBImage")   # EBImage comes from Bioconductor, not CRAN
library(EBImage)     # resize()



Read and Preprocess Images


Load Images

num=17
files=list.files("./D605 Homework Week4 jpg", pattern="\\.jpg$")[1:num]   # the 17 shoe images


Read Images

height=1200; width=2500; scale=20   # original image size in pixels, and the downscaling factor
plot_jpeg = function(path, add=FALSE)
{ jpg = readJPEG(path, native=T) # read the file
  res = dim(jpg)[2:1] # get the resolution, [x, y]
  if (!add) # initialize an empty plot area if add==FALSE
    plot(1,1,xlim=c(1,res[1]),ylim=c(1,res[2]),asp=1,type='n',xaxs='i',yaxs='i',xaxt='n',yaxt='n',xlab='',ylab='',bty='n')
  rasterImage(jpg,1,1,res[1],res[2])
}


Scale Images

im=array(0, dim=c(length(files), height/scale, width/scale, 3))   # empty 17 x 60 x 125 x 3 array for the downscaled images



Stack into Single Matrix


Array Images

First we read each image and shrink it by the scale factor of 20, from 1200 x 2500 down to 60 x 125 pixels, storing the results in im. To build the matrix we’ll then cut each color channel of an image into vertical strips one pixel wide and stack those strips end to end, so each image becomes one long vector. It won’t look like anything to you or me, but this is how machines read images: the pixels don’t have to stay next to each other for the patterns to be detected, only the order has to stay fixed.
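R’s as.vector() flattens a matrix column by column, which is exactly the "vertical strips stacked end to end" described above. A minimal sketch on a hypothetical 2 x 3 single-channel image (img, v and back are illustrative names) shows the order and that restoring the dimensions rebuilds the image exactly.

```r
# A tiny hypothetical 2 x 3 single-channel "image".
img = matrix(1:6, nrow = 2)
img
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

v = as.vector(img)   # column-major: the vertical strips stacked end to end
v                    # 1 2 3 4 5 6

back = v
dim(back) = c(2, 3)  # the same fixed order lets us rebuild the image exactly
identical(back, img) # TRUE
```

In the shoe code the same flattening is applied to the red, green and blue channels separately, and the three vectors are then concatenated.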

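for (i in 1:17){
  # read one shoe and shrink it to height/scale x width/scale (60 x 125) with EBImage::resize()
  temp=resize(readJPEG(paste0("./D605 Homework Week4 jpg/", files[i])),height/scale, width/scale)
  im[i,,,]=array(temp,dim=c(1, height/scale, width/scale,3))}   # store it as the i-th image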


Stack Matrix

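Each image is then flattened into a single column: its red, green and blue strips are joined end to end, and the 17 flattened images become the 17 columns of the data frame shoes.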

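# note: prod(dim(im)) = 17*60*125*3, i.e. 17 times one image's length, so each row below
# is filled by recycling its image 17 times; prod(dim(im)[-1]) is the exact per-image length
flat=matrix(0, 17, prod(dim(im)))
for (i in 1:17) {
  # flatten each color channel column by column, then join red, green and blue
  r=as.vector(im[i,,,1]); g=as.vector(im[i,,,2]); b=as.vector(im[i,,,3])
  flat[i,] <- c(r, g, b)
}
shoes=as.data.frame(t(flat))   # one column (V1 ... V17) per shoe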


Display Images

Here we display the images that the matrix represents.

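par(mfrow=c(3,3))         # 3 x 3 grid, so the 17 shoes span two pages
par(mai=c(.3,.3,.3,.3))
for (i in 1:17){  # plot all 17 downscaled shoes
plot_jpeg(writeJPEG(im[i,,,]))  # writeJPEG() without a target returns raw JPEG bytes, which plot_jpeg() reads back
}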




Principal Component Analysis


Get Eigencomponents from Correlation Structure

scaled=scale(shoes, center = TRUE, scale = TRUE)
mean.shoe=attr(scaled, "scaled:center") #saving for classification
std.shoe=attr(scaled, "scaled:scale")  #saving for classification...later


Calculate Covariance (Correlation)

Sigma_=cor(scaled)
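The heading says "Covariance (Correlation)" because, once every column has been centered and scaled to variance 1, the covariance matrix and the correlation matrix are the same thing. A minimal check on hypothetical toy data (toy and toy_scaled are illustrative names) whose columns start out on very different scales:

```r
# Hypothetical toy data: 200 "pixels" for 4 "shoes" on very different scales.
set.seed(3)
toy = matrix(rnorm(200 * 4), ncol = 4) %*% diag(c(1, 10, 100, 1000))
toy_scaled = scale(toy, center = TRUE, scale = TRUE)

# After standardizing, covariance and correlation coincide,
# and standardizing does not change the correlation itself.
all.equal(cov(toy_scaled), cor(toy))   # TRUE
all.equal(cor(toy_scaled), cor(toy))   # TRUE
```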


Get the Eigencomponents

myeigen=eigen(Sigma_)
cumsum(myeigen$values) / sum(myeigen$values)
##  [1] 0.6928202 0.7940449 0.8451073 0.8723847 0.8913841 0.9076338 0.9216282
##  [8] 0.9336889 0.9433872 0.9524455 0.9609037 0.9688907 0.9765235 0.9832209
## [15] 0.9894033 0.9953587 1.0000000
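The assignment asks for 80% of the variability, and the cumulative proportions above cross that line at the third component (0.7940 after two, 0.8451 after three). A minimal sketch that reads the threshold off programmatically, using the printed values copied into a hypothetical vector cum_var; in the live session the same test could be applied directly to cumsum(myeigen$values) / sum(myeigen$values).

```r
# Cumulative proportions copied from the eigen() output above
cum_var = c(0.6928202, 0.7940449, 0.8451073, 0.8723847, 0.8913841, 0.9076338, 0.9216282,
            0.9336889, 0.9433872, 0.9524455, 0.9609037, 0.9688907, 0.9765235, 0.9832209,
            0.9894033, 0.9953587, 1.0000000)

k80 = which(cum_var >= 0.80)[1]   # first component at which 80% is reached
k80                               # 3
```

The eigenimage step that follows keeps five components, comfortably above this minimum.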


Create the Eigenimages

scaling=diag(myeigen$values[1:5]^(-1/2)) / (sqrt(nrow(scaled)-1))
eigenshoes=scaled%*%myeigen$vectors[,1:5]%*%scaling
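The scaling factor diag(λ^(-1/2)) / sqrt(n - 1) is what turns each projected image into a unit-length eigenimage: since the columns of scaled are standardized, crossprod(scaled) equals (n - 1) times the correlation matrix, and the scaling cancels both that factor and the eigenvalue. A minimal check on hypothetical toy data (toy, toy_scaled, toy_eigen and toy_eigenshoes are illustrative names) that the resulting columns are orthonormal:

```r
# Hypothetical toy data: 500 "pixels" x 6 "shoes" sharing a common pattern.
set.seed(11)
base = rnorm(500)
toy = sapply(1:6, function(j) base + rnorm(500, sd = 0.5))
toy_scaled = scale(toy)

toy_eigen = eigen(cor(toy_scaled))
k = 3   # k >= 2 here; diag() of a single number would build an identity matrix instead
toy_scaling = diag(toy_eigen$values[1:k]^(-1/2)) / sqrt(nrow(toy_scaled) - 1)
toy_eigenshoes = toy_scaled %*% toy_eigen$vectors[, 1:k] %*% toy_scaling

round(crossprod(toy_eigenshoes), 10)   # identity: unit length and mutually orthogonal
```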


Display the Eigenimages

par(mfrow=c(2,3))
imageShow(array(eigenshoes[,1], c(60,125,3)))
imageShow(array(eigenshoes[,2], c(60,125,3)))
imageShow(array(eigenshoes[,3], c(60,125,3)))
imageShow(array(eigenshoes[,4], c(60,125,3)))
imageShow(array(eigenshoes[,5], c(60,125,3)))


Generate Principal Components

Transform the Images

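height=1200; width=2500; scale=20   # same values as above
newdata=im
dim(newdata)=c(length(files),height*width*3/scale^2)   # 17 rows, one flattened 60 x 125 x 3 image per row
mypca=princomp(t(as.matrix(newdata)), scores=TRUE, cor=TRUE)   # pixels as observations, shoes as variables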


Generate the Eigenimages

mypca2=t(mypca$scores)                                   # 17 x 22,500: one eigenshoe per row
dim(mypca2)=c(length(files),height/scale,width/scale,3)   # back to 17 images of 60 x 125 x 3
par(mfrow=c(5,5))
par(mai=c(.001,.001,.001,.001))
for (i in 1:17){  # plot all 17 eigenshoes (17 shoes give only 17 components)
plot_jpeg(writeJPEG(mypca2[i,,,], bg="white"))  # every component, with no reduction
}


Variance Capture

a=round(mypca$sdev[1:17]^2/ sum(mypca$sdev^2),3)
cumsum(a)
##  Comp.1  Comp.2  Comp.3  Comp.4  Comp.5  Comp.6  Comp.7  Comp.8  Comp.9 Comp.10 
##   0.693   0.794   0.845   0.872   0.891   0.907   0.921   0.933   0.943   0.952 
## Comp.11 Comp.12 Comp.13 Comp.14 Comp.15 Comp.16 Comp.17 
##   0.960   0.968   0.976   0.983   0.989   0.995   1.000
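These proportions agree with the ones from eigen() earlier, as they should: princomp(cor = TRUE) works from the correlation matrix, and its squared standard deviations are that matrix's eigenvalues. A minimal check on hypothetical toy data (toy, pc and ev are illustrative names):

```r
# Hypothetical toy data: 300 observations of 5 correlated variables.
set.seed(5)
toy = matrix(rnorm(300 * 5), ncol = 5) %*% matrix(runif(25), 5, 5)

pc = princomp(toy, cor = TRUE)
ev = eigen(cor(toy))$values

all.equal(unname(pc$sdev^2), ev)                  # same eigenvalues
round(cumsum(pc$sdev^2) / sum(pc$sdev^2), 4)      # same cumulative proportions
```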


New Data Set

x = t(t(eigenshoes)%*%scaled)   # each shoe's coordinates on the 5 eigenshoes, one row per shoe
x
##          [,1]       [,2]       [,3]        [,4]       [,5]
## V1  -533.9339  -48.37640  -81.33150  159.855435  115.42295
## V2  -544.3537  186.36373  -54.64147   97.133316  -11.47307
## V3  -419.1762 -280.11383 -141.61570  274.657551  -60.35971
## V4  -507.5895  247.57965  -78.40193  -51.879459 -115.03020
## V5  -535.9770  193.86407  -35.12973   -6.376342  112.55948
## V6  -445.0731 -282.14147 -243.88403 -139.376322    1.82313
## V7  -471.2906 -261.05226 -212.76211 -108.978466   11.58090
## V8  -551.3154  112.45512 -157.66897  -62.116821  -55.99322
## V9  -476.0269  316.47423 -101.85977  -64.190442  -84.28868
## V10 -535.6992  218.56391   15.24172   33.205128   94.64374
## V11 -531.5352  193.19703   84.00358   30.854541   79.74317
## V12 -539.4412 -130.33163   97.80495   53.383974  -91.59271
## V13 -504.0171 -206.41993  107.98056  -97.673234  127.59146
## V14 -516.1920 -138.98540  201.63255  -61.599422  -93.37514
## V15 -537.4005  -50.20620  187.06406    1.204948 -116.46975
## V16 -545.7370  -97.20100  140.50135  -77.357369   86.19848
## V17 -533.5235  -85.24431  176.37984   26.613904  -21.62229
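Because the eigenshoes are orthonormal, this projection has a closed form: each score equals sqrt(n - 1) * sqrt(λ_k) times the shoe's loading on eigenvector k, where n is the number of rows of scaled. That explains the first column above: every shoe loads with the same sign on the first eigenvector (whose overall sign is arbitrary), so all 17 first scores are large and negative. A minimal check on hypothetical toy data (toy_scaled, E, x_toy and x_formula are illustrative names, not objects from this analysis):

```r
# Hypothetical toy data: 400 "pixels" x 5 "shoes" sharing one dominant pattern.
set.seed(9)
base = rnorm(400)
toy_scaled = scale(sapply(1:5, function(j) base + rnorm(400, sd = 0.7)))
n = nrow(toy_scaled); k = 3

ev = eigen(cor(toy_scaled))
E = toy_scaled %*% ev$vectors[, 1:k] %*% (diag(ev$values[1:k]^(-1/2)) / sqrt(n - 1))
x_toy = t(t(E) %*% toy_scaled)   # same projection as x above, one row per shoe

# Closed form: sqrt(n - 1) * eigenvector loadings * sqrt(eigenvalue)
x_formula = sqrt(n - 1) * ev$vectors[, 1:k] %*% diag(sqrt(ev$values[1:k]))
all.equal(unname(x_toy), x_formula)   # TRUE
```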