Eigenvectors & Image Recognition
1. With the attached data file, build and visualize eigenimagery that accounts for 80% of the variability. Provide full R code and discussion.
Since Dr. Fulton went over this in class, I broke down the code for a
single shoe in a 2nd section beneath the “All Shoes” section to ensure I
had a complete understanding, as it felt.. not beneficial to just copy
and paste his code. I will attempt an alternative method if time allows.
To see the break down and discussion, skip to the “Single
Shoe & Discussion” section.
Answer:
Two eigenvectors of the first two shoes
account for 79% of variance and three eigenvectors of the first three
shoes account for 84% of variance. I have plotted imagery of the three
shoes in the Eigenshoes section and established variance in the Variance
section.
Notes from Lecture on 9/20:
The process of Eigenshoes is going to be:
- Represent the images as a matrix of pixel values
- For the first image, create weights for each of the columns that
would absorb the most variance. For the second image, repeat this but
make the second set of weights orthogonal to the first that captures the
variance that is left after the first eigenvector of weights we created.
- Note that both of the set of weights created in steps 3 and 4 are
Eigenvectors
- We repeat step 2 for each image of shoes
Note that by doing this, we can re-create the images of the shoes by reverse weighting our eigenvectors.
All Shoes
Loading Shoes
filepath <- "C:\\Users\\dcrai\\source\\repos\\DATA605\\Week 4\\jpg\\"
files = list.files(filepath, pattern = "\\.jpg")height <- 1200
width <-2500
scale <- 20
plot_jpeg = function(path, add = FALSE) {
require('jpeg') #ensure jpeg package loaded
jpg <- readJPEG(path, native=T)
res = dim(jpg)[2:1] #x is 2nd element, y is 1st
if (!add) #initializing emply plot if add ==False which we set in our function intialization as an option
plot(1,1,xlim = c(1,res[1]),ylim=c(1,res[2]),
asp = 1, #asp ratio
type = 'n', #don't plot
xaxs = 'i', yaxs = 'i', #prevents expanding axis
xaxt = 'n', yaxt = 'n', xlab = '',ylab = '', #no axes or labels
bty = 'n') #no box
rasterImage(jpg,1,1,res[1],res[2])
}Load Data into Array
im <- array(rep(0, length(files)*height/scale*width/scale*3),
#set dims to N, x, y, 3 color, 4D Array
dim=c(length(files), height/scale,width/scale,3))
for(i in 1:length(files)) { #for each file
tmp <- paste0(filepath, files[i]) #define the filepath by concatenating our previous variables
temp <- EBImage::resize(readJPEG(tmp),height/scale, width/scale)
im[i,,,] <- array(temp,dim=c(1, height/scale, width/scale,3))
}Generate Principal Components
height <- 1200
width <- 2500
scale <- 20
newdata <- im
dim(newdata) <- c(length(files),height*width*3/scale^2)
mypca <- princomp(t(as.matrix(newdata)), scores = TRUE, cor = TRUE)
sum(mypca$sdev^2/sum(mypca$sdev^2)) #verify sum of variance =1## [1] 1
Variance
We check below to find that only 3 shoes are needed to crack 80% variance. 2 shoes only get us to 79%
## [1] 0.8451073
## [1] 0.7940449
Single Shoe & Discussion
For now I am going to attempt a re-creation of Dr. Fulton’s code
while explaining what’s going on to show I have understanding. I will
search for alternative methods and attempt a re-creation where the
“average” shoe is removed before PCA, using a different function to
calculate PCA, and see if I can “re-build” the shoes by adding average
shoe eigenvectors together.
I will be breaking down the code line
by line and highlighting important aspects so I can use this technique
in the future.
Notes on Raster Array:
readJPEG()will import the jpeg into a raster array- Raster Arrays is data in a grid format that distinguishes itself
from matrices by being regularized
- Regularized data means it’s been spaced uniformly. Typically, this is used to represent geographic locations in each cell of the grid, and each cell is spaced uniformly from its adjacent cells.
rasterImage(): draws a raster image at given locations- Effectively, this is applying the uniform spacing from a raster array to represent each of the cells in a pixelated image such as our shoes
height <- 1200 #not sure why we set these as the dims from readJPEG() pull out as 1200 and 2500
width <-2500
scale <- 20
single <- "C:\\Users\\dcrai\\source\\repos\\DATA605\\Week 4\\jpg\\RC_2500x1200_2014_us_53446.jpg"
single_jpg <- readJPEG(single, native=T)
attributes(single_jpg)## $dim
## [1] 1200 2500
##
## $class
## [1] "nativeRaster"
##
## $channels
## [1] 3
We can see that readJPEG created a raster object with 3 channels, one
for each of RGB pixel coloration.
Below we are grabbing the
resolution from the image so we can create a plot with similar
dimensions so we can accurately display each image on a plot that
represents each pixel value within the raster array. This will create a
human interpretable image of the shoe.
single_res <- dim(single_jpg)[2:1] #We set the dimensions height = 1200, and width = 2500 earlier. Height will correspond with y, and width to x so we flip them here via "[2:1]" to represent them in the order of "x,y" like a coordinate plane
single_res## [1] 2500 1200
The below code is creating a white background that matches the
dimensions of our image.
rasterImage() needs to be called directly
after the plot function to work.
rasterImage() will “plot” the cell
values on the plot while matching the cell values in the raster array’s
to locations on the coordinate plane and then representing the color of
the RBG pixel at that location. Effectively re-creating the image for
us.
plot(1,1, #sets the x and y coordinates of the plot to begin at 1,1
xlim = c(1,single_res[1]), #set our x and y limits equal to the dimensions of each jpeg img
ylim=c(1,single_res[2]), # single_res[2] = 1200 or height
asp = 1, #asp ratio
type = 'n', #don't plot
xaxs = 'i', yaxs = 'i', #prevents expanding axis
xaxt = 'n', yaxt = 'n', xlab = '',ylab = '', #no axes or labels
bty = 'n') #no box
rasterImage(single_jpg,1,1,single_res[1],single_res[2])Loading into Array
Below we are going to create an array to store the individual pixel/cell values into an array to later use PCA to create our eigenvector. I believe we are scaling to perform operations on a smaller array.
tmp <- paste0(filepath, files[1]) #define the filepath by concatenating our previous variables
temp <- EBImage::resize(readJPEG(tmp), #declare file to resize
height/scale, #declare width resize factor
width/scale) #declare height resize factor
single_im_filled <- array(temp,dim=c(1, height/scale, width/scale,3))
Generate Principal Components
Below we store our array of pixel, channel, and RGB values into a new object to flatten the array from 4D to 1D. Imagine appending each matrix column to the end of the first. We do this to prep the data for princomp(). The important part of this is that when performing PCA, the process creates eigenvectors from the covariance matrices when the method is searching for pixels that explain the most variance. That is how the eigenvectors are generated. We are cleverly “hijacking” PCA’s eigenvectors for our own purposes.
height <- 1200
width <- 2500
scale <- 20
single_newdata <- single_im_filled
#we change the dim to flatten the array to 1D, but it keeps the values
# imagine appending each matrix column to the end of the each other for one long series of values
dim(single_newdata) <- c(1,height*width*3/scale^2)
single_mypca <- princomp(t(as.matrix(single_newdata)), scores = TRUE, cor = TRUE) #scores =TRUE gives access to values for each column to create the eigenvector
head(unique(single_mypca$scores))## Comp.1
## [1,] 0.6510137
## [2,] 0.3782734
## [3,] 0.6414439
## [4,] 0.6462288
## [5,] 0.3017148
## [6,] -0.4208077
Below we are checking to ensure our variance adds up to 1 to
ensure no issues are ocurring
## [1] 1
Eigenshoe
single_mypca2 <- t(single_mypca$scores) #transpose the score
dim(single_mypca2) <- c(1,height/scale,width/scale,3) #change the dims back to 4D Array
par(mai=c(.001,.001,.001,.001)) # par is a function used to set params for the graphics package and plotting, mai sets margin size for plots
dim(single_mypca2)## [1] 1 60 125 3
Since there is only one shoe, the variance will be 1.
## Comp.1
## 1