For this assignment, the goal was to take a data set of 17 shoe images and represent each image as a matrix. The images are plotted twice: first as loaded, and then as "eigenshoes", the principal components of the image set, which capture the directions of greatest variance and therefore the features that most distinguish one shoe from another.
library(jpeg)
library(EBImage)
image_directory<-'/Users/renida/Desktop/jpg'
files=list.files(path=image_directory, pattern="\\.jpg")
height=1200
width=2500
scale=20
plot_jpeg=function(path,add=FALSE) #initialize fxn
{
require('jpeg')
jpg=readJPEG(path,native=TRUE) #reads the file (or a raw vector of JPEG data)
res=dim(jpg)[2:1] #gets the resolution, 2=x, 1=y
if(!add) #initialize an empty plot area if add==FALSE
plot(1,1,xlim=c(1,res[1]),ylim=c(1,res[2]), #set the X limits by size
asp=1, #aspect ratio
type='n', #dont plot
xaxs='i', yaxs='i', #prevents expanding axis windows +6% as normal
xaxt='n', yaxt='n', xlab='',ylab='', #no axes or labels
bty='n') #no box around graph
rasterImage(jpg,1,1,res[1],res[2]) #image, xleft,ybottom,xright,ytop
}
im=array(rep(0,length(files)*height/scale*width/scale*3),
#set dimension to N,x,y,3colors, 4d array
dim=c(length(files),height/scale,width/scale,3))
for(i in 1:length(files)){
#define file to be read
tmp <- file.path(image_directory, files[i])
#read the file
temp=EBImage::resize(readJPEG(tmp),height/scale, width/scale)
#assign to the array
im[i,,,]=array(temp,dim =c(1,height/scale,width/scale,3))
}
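The 4-D array filled above is later reshaped into a matrix with one row per image by assigning to `dim`. A minimal sketch of why that works, using a small random array as a stand-in for the shoe data (the names `toy`, `flat`, and `restored` are illustrative and not part of the original code):

```r
# Toy stand-in for the image array: 2 "images", 3 x 4 pixels, 3 channels
n <- 2; h <- 3; w <- 4
toy <- array(runif(n * h * w * 3), dim = c(n, h, w, 3))

# Flatten to one row per image. R stores arrays column-major, so the
# first index (the image) varies fastest and each image stays in its own row.
flat <- toy
dim(flat) <- c(n, h * w * 3)

# Row 1 of the flat matrix holds exactly the values of image 1
stopifnot(identical(flat[1, ], as.vector(toy[1, , , ])))

# Assigning the original dimensions back recovers the 4-D array unchanged
restored <- flat
dim(restored) <- c(n, h, w, 3)
stopifnot(identical(restored, toy))
```

Because no values are moved, the same `dim` assignment can be reversed later to turn PCA output back into image-shaped arrays.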
Here all 17 shoe images are plotted. Most of them look quite different, but a few look very similar. Eigenshoes give a more useful comparison of how different the shoes truly are, because they make the most distinguishing patterns easier to see.
par(mfrow=c(3,3)) #set graphics to 3x3 table
par(mai=c(.3,.3,.3,.3)) #set margins
for(i in 1:17){ #plot all 17 images (the 3x3 grid continues onto a second page)
plot_jpeg(writeJPEG(im[i,,,]))
}
height=1200
width=2500
scale=20
newdata=im
dim(newdata)=c(length(files),height*width*3/scale^2) #redimension my data
mypca=princomp(t(as.matrix(newdata)),scores=TRUE,cor=TRUE) #eigenscores of the images
sum(mypca$sdev^2/sum(mypca$sdev^2)) #verify that sum of variance=1
## [1] 1
ycomponents=mypca$sdev^2/sum(mypca$sdev^2)
sum(ycomponents[1:17]) #with 17 images there are only 17 components, so together they account for 100% of variability
## [1] 1
sum(ycomponents[1:17]) #repeated check: all 17 components again, so the result is again 1
## [1] 1
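Summing all 17 components always gives 1, so it does not show how many components are needed to reach a given share of the variance. A running total does. The sketch below uses synthetic data rather than the shoe images, and the names `x`, `share`, `cum_share`, and `k80` are illustrative assumptions:

```r
# Synthetic stand-in: 200 observations of 5 variables, two of them correlated
set.seed(1)
base <- rnorm(200)
x <- cbind(base + rnorm(200, sd = 0.1),
           base + rnorm(200, sd = 0.5),
           rnorm(200), rnorm(200), rnorm(200))

pca <- princomp(x, cor = TRUE)

# Share of total variance per component, and the running total
share <- pca$sdev^2 / sum(pca$sdev^2)
cum_share <- cumsum(share)

# Smallest number of components whose running total reaches 80%
k80 <- which(cum_share >= 0.80)[1]

print(round(cum_share, 3))
print(k80)
```

Applied to this report, `cumsum(ycomponents)` would show how many of the 17 eigenshoes are needed to reach 80% or 90% of the variability.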
Here we see the same 17 shoes again, this time as eigenshoes, with their colors broken down to the most elemental form: red, green, and blue. More than half of the shoes have a design across the side in varying colors. Almost all of them have a colorful sole, and only one appears to be a single color apart from its outline. A larger sample would give a better idea of which designs this company is overproducing, given how each shoe style is performing.
mypca2=t(mypca$scores)
dim(mypca2)=c(length(files),height/scale,width/scale,3)
par(mfrow=c(5,5))
par(mai=c(.001,.001,.001,.001))
for(i in 1:17){ #plot the first 17 shoes
plot_jpeg(writeJPEG(mypca2[i,,,],quality=1,bg="white"))
}
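PCA scores are centred on zero, while `writeJPEG` expects values between 0 and 1 and clips anything outside that range, so part of each eigenshoe may be flattened to pure black or white. One possible adjustment, not used in the original code, is to rescale each eigenshoe to [0, 1] first. `rescale01` below is a hypothetical helper, shown on a random array rather than the real scores:

```r
# Hypothetical helper (not in the original code): min-max rescale to [0, 1]
rescale01 <- function(x) {
  rng <- range(x)
  if (rng[1] == rng[2]) return(x * 0)  # constant input: avoid dividing by zero
  (x - rng[1]) / (rng[2] - rng[1])
}

# Toy stand-in for one eigenshoe: values centred on 0, 4 x 5 pixels, 3 channels
set.seed(1)
eig <- array(rnorm(4 * 5 * 3), dim = c(4, 5, 3))

scaled <- rescale01(eig)

# Dimensions are preserved and every value now lies in [0, 1]
stopifnot(identical(dim(scaled), dim(eig)))
stopifnot(min(scaled) == 0, max(scaled) == 1)
```

In the plotting loop this would mean passing `rescale01(mypca2[i,,,])` to `writeJPEG`. The resulting images would look different from the clipped versions, so this is a presentation choice rather than a correction.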