With seventeen provided jpg data files, build and visualize eigenimagery that accounts for 80% of the variability.
For a roadmap and code know-how, we relied heavily on Doctor Larry’s rpubs.com/R-Minator/eigenshoes.
In this project we encode images of 17 distinct shoes into a matrix of data. The first eigenvector is a direction through the 17-dimensional space (one dimension per image) that captures as much of the variability among the 17 shoes as possible. The second eigenvector is a direction orthogonal to the first that best captures the remaining variability. (Note that orthogonal basically means perpendicular, but for higher dimensions.) We continue with the third, the fourth, and so on down the line until we have 17 eigenvectors that together capture all of the variability of the 17 shoes. If one shoe can be made up entirely of the other 16 shoes, then the 17th eigenvector adds nothing (its eigenvalue is zero), and we can capture 100% of the variability with just 16 eigenvectors.
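To make the orthogonality and the "one shoe made from the others" case concrete, here is a minimal sketch on made-up data. The matrix toy and its sizes are assumptions for illustration only; they are not the shoe images.

```r
# Toy stand-in for the shoe data: 200 "pixels" (rows) by 4 "images" (columns).
set.seed(1)
toy <- matrix(rnorm(200 * 4), nrow = 200, ncol = 4)

# Eigen-decomposition of the 4 x 4 correlation matrix between images.
e <- eigen(cor(toy))

# Eigenvectors of a symmetric matrix are orthogonal unit vectors,
# so t(V) %*% V is (numerically) the identity matrix.
ortho_check <- crossprod(e$vectors)
print(round(ortho_check, 10))

# Add a 5th image that is an exact blend of images 1 and 2.
# It brings no new direction, so the smallest eigenvalue is (numerically) zero.
toy5 <- cbind(toy, 0.5 * toy[, 1] + 0.5 * toy[, 2])
vals5 <- eigen(cor(toy5))$values
print(round(vals5, 10))
```

The last eigenvalue comes out as zero up to rounding error, which is the 16-eigenvectors-for-17-shoes situation in miniature.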
Each eigenvector, once mapped back into pixel space, is itself an image (an eigenimage) and usually looks like a ghostly, ethereal version of the images it is a composite of. Once we have a set of these eigenimages we’ll be able to recreate any of the 17 shoes like a recipe: 0.3 units of the first eigenimage, then 0.1, 0 and 0.8 units of the second, third and fourth eigenimages, and so on down the line.
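The recipe idea can be sketched on made-up data as follows. The names toy, eigenimages and weights are hypothetical, and the eigenimages here are left unnormalized to keep the arithmetic short.

```r
# Four standardized toy "images", one per column, 300 "pixels" each.
set.seed(2)
toy <- scale(matrix(rnorm(300 * 4), nrow = 300, ncol = 4))

V <- eigen(cor(toy))$vectors   # 4 x 4 matrix of eigenvectors
eigenimages <- toy %*% V       # each column is one (unnormalized) eigenimage

# The recipe for image 1 is row 1 of V: how much of each eigenimage to add.
weights <- V[1, ]
rebuilt <- eigenimages %*% weights

# Largest pixel-wise difference between the rebuilt and the original image.
max_diff <- max(abs(rebuilt - toy[, 1]))
print(max_diff)
```

With all four eigenimages in the mix the rebuild is exact up to rounding, because V %*% t(V) is the identity.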
The value of this is that we can closely approximate the image of an 18th shoe, not included in our original set, just by specifying how much of each eigenimage to add together. This is the idea behind basis-based image compression. If everyone agrees to use the same large set of eigenimages, then an image that would normally take megabytes of data can be passed along as just a few numbers: the multiples of each eigenimage needed to recreate it.
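Here is a rough sketch of approximating an unseen image from a handful of weights. Everything in it is an assumption for illustration: the "images" are noisy sine curves, centering is skipped for brevity, and k = 2 is an arbitrary choice. It is not the procedure used on the shoes.

```r
set.seed(3)
n_pix <- 500
base <- sin(seq(0, 2 * pi, length.out = n_pix))   # shape shared by every toy image

# Six training images: the shared shape at slightly different strengths, plus noise.
train <- sapply(1:6, function(i) base * runif(1, 0.8, 1.2) + rnorm(n_pix, sd = 0.1))

# A seventh image that was never part of the training set.
new_image <- base * 1.1 + rnorm(n_pix, sd = 0.1)

# Eigen-decompose the small 6 x 6 matrix of inner products between images.
e <- eigen(crossprod(train))
k <- 2

# Map the first k eigenvectors into pixel space and rescale to unit length.
U <- train %*% e$vectors[, 1:k] %*% diag(e$values[1:k]^(-1/2), nrow = k)

# k numbers now describe the new image; multiplying back gives the approximation.
weights <- drop(crossprod(U, new_image))
approx <- drop(U %*% weights)

rel_error <- sqrt(sum((new_image - approx)^2) / sum(new_image^2))
print(weights)
print(rel_error)
```

The 500-pixel image is summarized by two weights, and the relative error shows how much detail is lost in the process.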
Since all of our starting images are closely related, we’ll find that the first few eigenimages already account for the majority of the variability among our images.
Let’s check it out!
Note that while we couldn’t download EBImage directly, we were able to install it through the intermediary package, BiocManager.
Note that I had to reinstall the latest version of R. You can check what version of R you are running by typing R.Version() in the console. While I was doing that I also downloaded XQuartz, which was linked on the same page where I downloaded the current version of R for my operating system. XQuartz lets you use the X11 library to access some libraries that are no longer supported on newer Mac operating systems. I needed it for the imager library, which turned out not to be necessary for the final file.
library(jpeg)
library(OpenImageR)
#install.packages("BiocManager")
#BiocManager::install("EBImage")
library(EBImage)
num=17
files=list.files("./D605 Homework Week4 jpg", pattern="\\.jpg")[1:num]
height=1200; width=2500;scale=20
plot_jpeg = function(path, add=FALSE)
{
  jpg = readJPEG(path, native=TRUE) # read the file
  res = dim(jpg)[2:1] # get the resolution, [x, y]
  if (!add) # initialize an empty plot area if add==FALSE
    plot(1,1,xlim=c(1,res[1]),ylim=c(1,res[2]),asp=1,type='n',xaxs='i',yaxs='i',xaxt='n',yaxt='n',xlab='',ylab='',bty='n')
  rasterImage(jpg,1,1,res[1],res[2])
}
im=array(rep(0,length(files)*height/scale*width/scale*3), dim=c(length(files), height/scale, width/scale,3))
In order to build the matrix we’ll first read each image and shrink it, then cut it into vertical strips one pixel wide and stack those strips end to end so we get one long vector per image. It won’t look like anything to you or me, but this is how machines read images. It doesn’t matter whether pixels are next to each other or not for the patterns to be detected, only that the order is the same for every image.
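The strip-stacking can be traced on a tiny made-up array. The 2 x 3 "image" named img below is an assumption chosen so that the ordering is easy to follow by eye.

```r
# A tiny 2 x 3 "image" with 3 colour channels, filled with 1..18 for easy tracing.
img <- array(1:18, dim = c(2, 3, 3))

# as.vector() walks a matrix column by column, i.e. it stacks the vertical strips.
red <- as.vector(img[, , 1])
print(red)

# One long vector per image: all of red, then all of green, then all of blue.
flat_img <- c(as.vector(img[, , 1]), as.vector(img[, , 2]), as.vector(img[, , 3]))

# Because the order is fixed, the original shape can always be restored.
restored <- array(flat_img, dim = dim(img))
round_trip_ok <- identical(restored, img)
print(round_trip_ok)
```

The round trip succeeding is what lets us later reshape an eigenvector of pixel values back into a picture.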
for (i in 1:17){
temp=resize(readJPEG(paste0("./D605 Homework Week4 jpg/", files[i])),height/scale, width/scale)
im[i,,,]=array(temp,dim=c(1, height/scale, width/scale,3))}
Each image, now a single long vector, fills one row of flat. Transposing at the end gives a data frame with one column per shoe, 17 columns in all.
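flat=matrix(0, 17, prod(dim(im))) # note: prod(dim(im)) is 17 times the length of one image, so each row holds 17 recycled copies of its image; the correlations are unaffected
for (i in 1:17) {
r=as.vector(im[i,,,1]); g=as.vector(im[i,,,2]);b=as.vector(im[i,,,3]) # flatten each colour channel column by column
flat[i,] <- t(c(r, g, b)) # red, then green, then blue
}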
shoes=as.data.frame(t(flat))
Here we display the images that the matrix represents.
par(mfrow=c(3,3))
par(mai=c(.3,.3,.3,.3))
for (i in 1:17){ #plot all 17 images
plot_jpeg(writeJPEG(im[i,,,]))
}
scaled=scale(shoes, center = TRUE, scale = TRUE)
mean.shoe=attr(scaled, "scaled:center") #saving for classification
std.shoe=attr(scaled, "scaled:scale") #saving for classification...later
Sigma_=cor(scaled)
myeigen=eigen(Sigma_)
cumsum(myeigen$values) / sum(myeigen$values)
## [1] 0.6928202 0.7940449 0.8451073 0.8723847 0.8913841 0.9076338 0.9216282
## [8] 0.9336889 0.9433872 0.9524455 0.9609037 0.9688907 0.9765235 0.9832209
## [15] 0.9894033 0.9953587 1.0000000
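Reading the cumulative proportions printed above, two components reach about 79.4%, just short of the 80% target, while three reach about 84.5%. A small sketch of picking that number programmatically, with the first five printed values copied in by hand (cum_var and n_needed are names made up for this example):

```r
# First five cumulative proportions, copied from the output above.
cum_var <- c(0.6928202, 0.7940449, 0.8451073, 0.8723847, 0.8913841)

# Index of the first component at which the running total reaches 80%.
n_needed <- which(cum_var >= 0.80)[1]
print(n_needed)
```

In the real script the same line would work on cumsum(myeigen$values) / sum(myeigen$values) directly.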
scaling=diag(myeigen$values[1:5]^(-1/2)) / (sqrt(nrow(scaled)-1))
eigenshoes=scaled%*%myeigen$vectors[,1:5]%*%scaling
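As far as we can tell, the point of the scaling matrix is to give every eigenshoe unit length. Here is a sketch that checks this on made-up data, using the same formula with hypothetical names (toy, eig_img, gram):

```r
# Five standardized toy "images", one per column, 400 "pixels" each.
set.seed(4)
toy <- scale(matrix(rnorm(400 * 5), nrow = 400, ncol = 5))

e <- eigen(cor(toy))
k <- 3

# Same scaling as in the eigenshoes code: eigenvalue^(-1/2) / sqrt(n - 1).
scaling <- diag(e$values[1:k]^(-1/2)) / sqrt(nrow(toy) - 1)
eig_img <- toy %*% e$vectors[, 1:k] %*% scaling

# Inner products between the scaled eigenimages.
gram <- crossprod(eig_img)
print(round(gram, 10))
```

The result is the identity matrix up to rounding: each column has length one and the columns are mutually orthogonal.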
par(mfrow=c(2,3))
imageShow(array(eigenshoes[,1], c(60,125,3)))
imageShow(array(eigenshoes[,2], c(60,125,3)))
imageShow(array(eigenshoes[,3], c(60,125,3)))
imageShow(array(eigenshoes[,4], c(60,125,3)))
imageShow(array(eigenshoes[,5], c(60,125,3)))
height=1200
width=2500
scale=20
newdata=im
dim(newdata)=c(length(files),height*width*3/scale^2)
mypca=princomp(t(as.matrix(newdata)), scores=TRUE, cor=TRUE)
mypca2=t(mypca$scores)
dim(mypca2)=c(length(files),height/scale,width/scale,3)
par(mfrow=c(5,5))
par(mai=c(.001,.001,.001,.001))
for (i in 1:17){#plot all 17 eigenshoes
plot_jpeg(writeJPEG(mypca2[i,,,], bg="white")) #complete without reduction
}
a=round(mypca$sdev[1:17]^2/ sum(mypca$sdev^2),3)
cumsum(a)
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9 Comp.10
## 0.693 0.794 0.845 0.872 0.891 0.907 0.921 0.933 0.943 0.952
## Comp.11 Comp.12 Comp.13 Comp.14 Comp.15 Comp.16 Comp.17
## 0.960 0.968 0.976 0.983 0.989 0.995 1.000
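These proportions match the earlier eigen() output because princomp with cor=TRUE works from the same correlation matrix, and its squared sdev values are that matrix's eigenvalues. A minimal sketch on made-up data (toy and the other names are assumptions for the example):

```r
# 100 observations of 4 toy variables, with some correlation built in.
set.seed(5)
toy <- matrix(rnorm(100 * 4), nrow = 100, ncol = 4)
toy[, 2] <- toy[, 1] + 0.5 * toy[, 2]

# Route 1: eigenvalues of the correlation matrix.
from_eigen <- eigen(cor(toy))$values

# Route 2: squared standard deviations of the principal components.
from_princomp <- unname(princomp(toy, cor = TRUE)$sdev^2)

prop_eigen <- cumsum(from_eigen) / sum(from_eigen)
prop_princomp <- cumsum(from_princomp) / sum(from_princomp)
print(round(rbind(prop_eigen, prop_princomp), 6))
```

Both rows agree, so either route can be used to decide how many components are needed.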
x = t(t(eigenshoes)%*%scaled)
x
## [,1] [,2] [,3] [,4] [,5]
## V1 -533.9339 -48.37640 -81.33150 159.855435 115.42295
## V2 -544.3537 186.36373 -54.64147 97.133316 -11.47307
## V3 -419.1762 -280.11383 -141.61570 274.657551 -60.35971
## V4 -507.5895 247.57965 -78.40193 -51.879459 -115.03020
## V5 -535.9770 193.86407 -35.12973 -6.376342 112.55948
## V6 -445.0731 -282.14147 -243.88403 -139.376322 1.82313
## V7 -471.2906 -261.05226 -212.76211 -108.978466 11.58090
## V8 -551.3154 112.45512 -157.66897 -62.116821 -55.99322
## V9 -476.0269 316.47423 -101.85977 -64.190442 -84.28868
## V10 -535.6992 218.56391 15.24172 33.205128 94.64374
## V11 -531.5352 193.19703 84.00358 30.854541 79.74317
## V12 -539.4412 -130.33163 97.80495 53.383974 -91.59271
## V13 -504.0171 -206.41993 107.98056 -97.673234 127.59146
## V14 -516.1920 -138.98540 201.63255 -61.599422 -93.37514
## V15 -537.4005 -50.20620 187.06406 1.204948 -116.46975
## V16 -545.7370 -97.20100 140.50135 -77.357369 86.19848
## V17 -533.5235 -85.24431 176.37984 26.613904 -21.62229