For this assignment, the goal was to take a data set of 17 different shoe images and represent each image as a matrix of pixel values. The first set is the original images, which carry all of the variation between the shoes. The second set is an "eigen" version of those images (eigenshoes), built with principal component analysis to capture the most essential elements of what makes the shoes differ from one another.

Load all packages

library(jpeg)
library(EBImage)

Load all shoes

image_directory<-'/Users/renida/Desktop/jpg'

files=list.files(path=image_directory, pattern="\\.jpg$") #list only the .jpg files
height=1200 #original image height in pixels
width=2500 #original image width in pixels
scale=20 #downscaling factor: each image becomes 60 x 125 pixels
plot_jpeg=function(path,add=FALSE)    #draw a JPEG (file path or raw JPEG data) in the current plot
{
jpg=readJPEG(path,native=TRUE) #reads the file or raw vector
res=dim(jpg)[2:1] #gets the resolution, 2=x, 1=y
if(!add) #initialize an empty plot area if add==FALSE
    plot(1,1,xlim=c(1,res[1]),ylim=c(1,res[2]), #set the X limits by size
        asp=1, #aspect ratio
        type='n', #don't plot
        xaxs='i', yaxs='i', #prevents expanding axis windows +6% as normal
        xaxt='n', yaxt='n', xlab='',ylab='', #no axes or labels
        bty='n') #no box around graph
rasterImage(jpg,1,1,res[1],res[2]) #always draw the image: xleft,ybottom,xright,ytop
}

Load shoes into array

im=array(0, #preallocate with zeros
    #4d array: N shoes x rows x columns x 3 color channels
    dim=c(length(files),height/scale,width/scale,3))
for(i in seq_along(files)){
#define file to be read
tmp <- file.path(image_directory, files[i])
#read the file and shrink it; EBImage treats the first array dimension as w,
#so the result is (height/scale) x (width/scale) x 3
temp=EBImage::resize(readJPEG(tmp),height/scale, width/scale)
#assign to the array
im[i,,,]=array(temp,dim =c(1,height/scale,width/scale,3))
}
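Before running PCA, the 4-d array `im` (shoe x row x column x channel) is flattened with `dim()` into a matrix with one row per shoe. The base-R sketch below uses a small hypothetical array (2 images of 3 x 4 pixels instead of 17 images of 60 x 125) to check that resetting `dim()` keeps each image's pixels together in a single row. This works because R stores arrays in column-major order, with the first index (the shoe) varying fastest.

```r
set.seed(42)
# Hypothetical toy data: 2 "shoes", each 3 x 4 pixels with 3 color channels
n_img <- 2
h <- 3
w <- 4
toy <- array(runif(n_img * h * w * 3), dim = c(n_img, h, w, 3))

# Flatten the same way as the PCA step: one row per image, h * w * 3 columns
flat <- toy
dim(flat) <- c(n_img, h * w * 3)

# Row i holds exactly the pixels of image i, in (row, column, channel) order
for (i in seq_len(n_img)) {
  stopifnot(identical(flat[i, ], as.vector(toy[i, , , ])))
}
print(dim(flat))
```

With the real data the same reshape gives a 17 x 22,500 matrix, since 60 * 125 * 3 = 22,500.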

Shoe images

Here are all 17 shoe images. Most of them look quite different, but a few look very similar. Building eigenshoes should give a better sense of how different these shoes really are, because the eigenshoes isolate the patterns that distinguish the shoes the most.

par(mfrow=c(3,3)) #set graphics to a 3x3 grid (17 images span two pages)
par(mai=c(.3,.3,.3,.3)) #set margins
for(i in 1:17){ #plot all 17 shoes
  #writeJPEG with no target returns raw JPEG data, which readJPEG can decode
  plot_jpeg(writeJPEG(im[i,,,]))
}

General Principal Components

height=1200
width=2500
scale=20
newdata=im
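dim(newdata)=c(length(files),height*width*3/scale^2) #one row per shoe, 60*125*3 = 22,500 columns
mypca=princomp(t(as.matrix(newdata)),scores=TRUE,cor=TRUE) #PCA on the 17 x 17 correlation matrix of the shoes
sum(mypca$sdev^2/sum(mypca$sdev^2)) #verify that sum of variance=1 
## [1] 1
ycomponents=mypca$sdev^2/sum(mypca$sdev^2) #proportion of variance explained by each component
sum(ycomponents[1:17]) #all 17 components together account for 100% of the variability
## [1] 1
which(cumsum(ycomponents)>=0.80)[1] #number of components needed to reach 80% of the variability
which(cumsum(ycomponents)>=0.90)[1] #number of components needed to reach 90% of the variability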
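`princomp(..., cor = TRUE)` performs PCA on the correlation matrix of its columns, so with the transposed data each shoe is a variable and there are at most 17 components. As a sketch on hypothetical random data (500 "pixels" by 5 "images", with two images made deliberately similar), the component variances from `princomp` should match the eigenvalues of `cor()`, and the cumulative proportion of variance shows how many components are needed to reach a given threshold.

```r
set.seed(1)
# Hypothetical data: 500 pixel rows, 5 image columns (stand-in for t(newdata))
X <- matrix(rnorm(500 * 5), nrow = 500, ncol = 5)
X[, 2] <- X[, 1] + rnorm(500, sd = 0.3)  # make two "images" strongly correlated

pca <- princomp(X, cor = TRUE)

# Eigenvalues of the correlation matrix equal the component variances
ev <- eigen(cor(X), symmetric = TRUE)$values
stopifnot(isTRUE(all.equal(unname(pca$sdev^2), ev)))

# Proportion and cumulative proportion of variance explained
prop <- pca$sdev^2 / sum(pca$sdev^2)
cum_prop <- cumsum(prop)
print(round(cum_prop, 3))

# Smallest number of components reaching 80% and 90% of the variance
k80 <- which(cum_prop >= 0.80)[1]
k90 <- which(cum_prop >= 0.90)[1]
print(c(k80 = k80, k90 = k90))
```

Because the proportions always sum to 1, summing all of them says nothing about how many components matter; the cumulative sum is what answers the 80% and 90% questions.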

Eigenshoes

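Here we see 17 images again, but these are not the original shoes: each one is an eigenshoe, the pixel scores on one principal component, ordered from the component that explains the most variance to the one that explains the least. The scores are standardized projections rather than pixel intensities, and they are not confined to the 0 to 1 range that writeJPEG expects, so the strong reds, greens, and blues show where the shoes vary rather than the shoes' actual colors. More than half of the eigenshoes show a design across the side in varying colors. Almost all of them highlight the sole, and, apart from its outline, only one of them appears to be a single color. Looking at a larger sample would give an even better idea of which designs this company is producing too much of, based on how each shoe style is performing.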

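mypca2=t(mypca$scores) #one row of scores per principal component
dim(mypca2)=c(length(files),height/scale,width/scale,3) #reshape each row back into a 60 x 125 x 3 image
par(mfrow=c(5,5))
par(mai=c(.001,.001,.001,.001))
for(i in 1:17){ #plot all 17 eigenshoes, from most to least variance explained
  plot_jpeg(writeJPEG(mypca2[i,,,],quality=1,bg="white"))
}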
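Since the component scores in `mypca2` can fall outside the 0 to 1 range that `writeJPEG` expects, one option is to min-max rescale each eigenshoe separately before plotting. This is a sketch under that assumption, not the method used above: the helper `rescale01` is hypothetical, and the example runs on a random stand-in array with the same layout as `mypca2` (17 x 60 x 125 x 3).

```r
# Hypothetical helper: linearly map an array onto [0, 1]
rescale01 <- function(x) {
  rng <- range(x)
  if (rng[2] == rng[1]) return(array(0, dim = dim(x)))  # flat input: avoid dividing by zero
  (x - rng[1]) / (rng[2] - rng[1])
}

set.seed(7)
# Stand-in for mypca2: 17 components x 60 rows x 125 columns x 3 channels
scores <- array(rnorm(17 * 60 * 125 * 3, sd = 3), dim = c(17, 60, 125, 3))

# Rescale each eigenshoe on its own so each uses the full intensity range
eigenshoes <- scores
for (i in seq_len(dim(scores)[1])) {
  eigenshoes[i, , , ] <- rescale01(scores[i, , , ])
}

print(range(scores))      # well outside [0, 1]
print(range(eigenshoes))  # now exactly 0 and 1
```

With the real data, the plotting call would become `plot_jpeg(writeJPEG(rescale01(mypca2[i,,,])))`. Rescaling each component separately keeps the weaker, later components visible instead of letting the first component's range dominate.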