Eigenvectors & Image Recognition

1. With the attached data file, build and visualize eigenimagery that accounts for 80% of the variability. Provide full R code and discussion.

Since Dr. Fulton went over this in class, I broke down the code for a single shoe in a 2nd section beneath the “All Shoes” section to ensure I had a complete understanding, as it felt.. not beneficial to just copy and paste his code. I will attempt an alternative method if time allows.
To see the break down and discussion, skip to the “Single Shoe & Discussion” section.

Answer:
Two eigenvectors of the first two shoes account for 79% of variance and three eigenvectors of the first three shoes account for 84% of variance. I have plotted imagery of the three shoes in the Eigenshoes section and established variance in the Variance section.

Notes from Lecture on 9/20:

The process of Eigenshoes is going to be:

  1. Represent the images as a matrix of pixel values
  2. For the first image, create weights for each of the columns that would absorb the most variance. For the second image, repeat this but make the second set of weights orthogonal to the first that captures the variance that is left after the first eigenvector of weights we created.
  3. Note that both of the set of weights created in steps 3 and 4 are Eigenvectors
  4. We repeat step 2 for each image of shoes

Note that by doing this, we can re-create the images of the shoes by reverse weighting our eigenvectors.

library(jpeg) #for readJPEG()
library(EBImage) #for the resize() function

All Shoes

Loading Shoes

filepath <- "C:\\Users\\dcrai\\source\\repos\\DATA605\\Week 4\\jpg\\"

files = list.files(filepath, pattern = "\\.jpg")
height <- 1200
width <-2500
scale <- 20

plot_jpeg = function(path, add = FALSE) {
  require('jpeg') #ensure jpeg package loaded
  jpg <- readJPEG(path, native=T)
  res = dim(jpg)[2:1] #x is 2nd element, y is 1st
  if (!add) #initializing emply plot if add ==False which we set in our function intialization as an option
    plot(1,1,xlim = c(1,res[1]),ylim=c(1,res[2]),
                      asp = 1, #asp ratio
                      type = 'n', #don't plot
                      xaxs = 'i', yaxs = 'i', #prevents expanding axis
                      xaxt = 'n', yaxt = 'n', xlab = '',ylab = '', #no axes or labels
                      bty = 'n') #no box
  rasterImage(jpg,1,1,res[1],res[2])
                      
}

Load Data into Array

im <- array(rep(0, length(files)*height/scale*width/scale*3), 
            #set dims to N, x, y, 3 color, 4D Array
            dim=c(length(files), height/scale,width/scale,3))

for(i in 1:length(files)) { #for each file
  tmp <- paste0(filepath, files[i]) #define the filepath by concatenating our previous variables
  temp <- EBImage::resize(readJPEG(tmp),height/scale, width/scale)
  
  im[i,,,] <- array(temp,dim=c(1, height/scale, width/scale,3))
}

Generate Principal Components

height <- 1200
width <- 2500
scale <- 20
newdata <- im

dim(newdata) <- c(length(files),height*width*3/scale^2)
mypca <- princomp(t(as.matrix(newdata)), scores = TRUE, cor = TRUE)
sum(mypca$sdev^2/sum(mypca$sdev^2)) #verify sum of variance =1
## [1] 1

Variance

We check below to find that only 3 shoes are needed to crack 80% variance. 2 shoes only get us to 79%

mycomponents <- mypca$sdev^2/sum(mypca$sdev^2)
sum(mycomponents[1:3])
## [1] 0.8451073
sum(mycomponents[1:2])
## [1] 0.7940449

Eigenshoes

mypca2 <- t(mypca$scores)
dim(mypca2) <- c(length(files),height/scale,width/scale,3)
par(mfrow=c(1,3))
par(mai=c(.001,.001,.001,.001))

for (i in 3:1) { #plot first 3 Eigenshoes only
 plot_jpeg(writeJPEG(mypca2[i,,,], quality = 1, bg = "white")) 
}

Single Shoe & Discussion

For now I am going to attempt a re-creation of Dr. Fulton’s code while explaining what’s going on to show I have understanding. I will search for alternative methods and attempt a re-creation where the “average” shoe is removed before PCA, using a different function to calculate PCA, and see if I can “re-build” the shoes by adding average shoe eigenvectors together.
I will be breaking down the code line by line and highlighting important aspects so I can use this technique in the future.
Notes on Raster Array:

  • readJPEG() will import the jpeg into a raster array
  • Raster Arrays is data in a grid format that distinguishes itself from matrices by being regularized
    • Regularized data means it’s been spaced uniformly. Typically, this is used to represent geographic locations in each cell of the grid, and each cell is spaced uniformly from its adjacent cells.
  • rasterImage(): draws a raster image at given locations
    • Effectively, this is applying the uniform spacing from a raster array to represent each of the cells in a pixelated image such as our shoes
height <- 1200 #not sure why we set these as the dims from readJPEG() pull out as 1200 and 2500
width <-2500
scale <- 20

single <- "C:\\Users\\dcrai\\source\\repos\\DATA605\\Week 4\\jpg\\RC_2500x1200_2014_us_53446.jpg"

single_jpg <- readJPEG(single, native=T)
attributes(single_jpg)
## $dim
## [1] 1200 2500
## 
## $class
## [1] "nativeRaster"
## 
## $channels
## [1] 3

We can see that readJPEG created a raster object with 3 channels, one for each of RGB pixel coloration.
Below we are grabbing the resolution from the image so we can create a plot with similar dimensions so we can accurately display each image on a plot that represents each pixel value within the raster array. This will create a human interpretable image of the shoe.

single_res <- dim(single_jpg)[2:1] #We set the dimensions height = 1200, and width = 2500 earlier. Height will correspond with y, and width to x so we flip them here via "[2:1]" to represent them in the order of "x,y" like a coordinate plane
single_res
## [1] 2500 1200

The below code is creating a white background that matches the dimensions of our image.
rasterImage() needs to be called directly after the plot function to work.
rasterImage() will “plot” the cell values on the plot while matching the cell values in the raster array’s to locations on the coordinate plane and then representing the color of the RBG pixel at that location. Effectively re-creating the image for us.

plot(1,1, #sets the x and y coordinates of the plot to begin at 1,1
     xlim = c(1,single_res[1]), #set our x and y limits equal to the dimensions of each jpeg img
     ylim=c(1,single_res[2]),  # single_res[2] = 1200 or height
                      asp = 1, #asp ratio
                      type = 'n', #don't plot
                      xaxs = 'i', yaxs = 'i', #prevents expanding axis
                      xaxt = 'n', yaxt = 'n', xlab = '',ylab = '', #no axes or labels
                      bty = 'n') #no box

rasterImage(single_jpg,1,1,single_res[1],single_res[2])

Loading into Array

Below we are going to create an array to store the individual pixel/cell values into an array to later use PCA to create our eigenvector. I believe we are scaling to perform operations on a smaller array.

tmp <- paste0(filepath, files[1]) #define the filepath by concatenating our previous variables
temp <- EBImage::resize(readJPEG(tmp), #declare file to resize
                          height/scale,  #declare width resize factor
                          width/scale) #declare height resize factor
  
single_im_filled <- array(temp,dim=c(1, height/scale, width/scale,3))



Generate Principal Components

Below we store our array of pixel, channel, and RGB values into a new object to flatten the array from 4D to 1D. Imagine appending each matrix column to the end of the first. We do this to prep the data for princomp(). The important part of this is that when performing PCA, the process creates eigenvectors from the covariance matrices when the method is searching for pixels that explain the most variance. That is how the eigenvectors are generated. We are cleverly “hijacking” PCA’s eigenvectors for our own purposes.


height <- 1200
width <- 2500
scale <- 20
single_newdata <- single_im_filled


#we change the dim to flatten the array to 1D, but it keeps the values
# imagine appending each matrix column to the end of the each other for one long series of values
dim(single_newdata) <- c(1,height*width*3/scale^2)

single_mypca <- princomp(t(as.matrix(single_newdata)), scores = TRUE, cor = TRUE) #scores =TRUE gives access to values for each column to create the eigenvector
head(unique(single_mypca$scores))
##          Comp.1
## [1,]  0.6510137
## [2,]  0.3782734
## [3,]  0.6414439
## [4,]  0.6462288
## [5,]  0.3017148
## [6,] -0.4208077


Below we are checking to ensure our variance adds up to 1 to ensure no issues are ocurring

sum(single_mypca$sdev^2/sum(single_mypca$sdev^2)) #verify sum of variance =1
## [1] 1

Eigenshoe

single_mypca2 <- t(single_mypca$scores) #transpose the score
dim(single_mypca2) <- c(1,height/scale,width/scale,3) #change the dims back to 4D Array

par(mai=c(.001,.001,.001,.001)) # par is a function used to set params for the graphics package and plotting, mai sets margin size for plots

dim(single_mypca2)
## [1]   1  60 125   3
plot_jpeg(writeJPEG(mypca2[i,,,], quality = 1, bg = "white")) 


Since there is only one shoe, the variance will be 1.

(single_mycomponents <- single_mypca$sdev^2/sum(single_mypca$sdev^2))
## Comp.1 
##      1