Introduction

In many fields, such as computer vision, face recognition, and data compression, images are represented as matrices of pixel values. These matrices can be used to extract features and to apply dimensionality reduction techniques such as Principal Component Analysis (PCA). In this context, eigenshoes are a set of orthogonal vectors that capture the most important variations in the provided dataset of shoe images.

–Written by ChatGPT

In this project, a set of image files was uploaded to build and visualize “eigenimagery” that accounts for 80% of the variability. Each image was read into an array and then flattened into a single column vector by splitting it into its R, G, and B channels and concatenating the three channels. These vectors were placed as the columns of a data frame, named shoes in this code. The data frame was then standardized (subtracting each column's mean and dividing by its standard deviation), and the covariance matrix was computed; for standardized data this is the same as the correlation matrix. The eigenvectors and eigenvalues of this matrix identify the principal components of the images. Since there are 17 images, there are 17 principal components. The first three principal components (PC1, PC2, and PC3) were needed to account for 80% of the variability in the dataset. The results were visualized with the imageShow() function from the OpenImageR package.

R Code

Reading Pixels

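library(jpeg)        # readJPEG()
library(OpenImageR)  # readImage(), imageShow()

# Full paths of every .jpg file in the image folder (pattern is a regular expression)
file_names <- list.files("C:/Users/Melissa/OneDrive/Documents/CUNY/Spring 2023/Data 605/HW4/jpg", pattern = "\\.jpg$", full.names = TRUE)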

Load the Data into an Array

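# Read every image; each is a height x width x 3 (RGB) array.
# All images are assumed to share the same dimensions.
images <- lapply(file_names, readImage)
img <- images[[1]]  # reference image; dim(img) is used when reshaping below
dim(img)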
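The vectorization step assumes every image has the same height, width, and number of channels, because each image must flatten to a vector of the same length. A minimal check, sketched with a hypothetical helper `all_same_dims()` (not part of the original code) and synthetic arrays standing in for the shoe images:

```r
# Hypothetical helper: TRUE when every array in the list has exactly
# the same dimensions as the first one
all_same_dims <- function(imgs) {
  ref <- dim(imgs[[1]])
  all(vapply(imgs, function(a) identical(dim(a), ref), logical(1)))
}

# Synthetic stand-ins for images: height x width x RGB channels
a <- array(runif(4 * 5 * 3), dim = c(4, 5, 3))
b <- array(runif(4 * 5 * 3), dim = c(4, 5, 3))
c_small <- array(runif(2 * 5 * 3), dim = c(2, 5, 3))

all_same_dims(list(a, b))           # TRUE
all_same_dims(list(a, b, c_small))  # FALSE
```

On the real data this would be called as all_same_dims(lapply(file_names, readImage)) before building the flat matrix.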

Vectorize

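# One row per image; each row holds the R, G, and B channels concatenated
flat <- matrix(0, length(file_names), prod(dim(img)))
for (i in seq_along(file_names)) {
  im <- readJPEG(file_names[i])
  r  <- as.vector(im[, , 1])  # red channel, flattened column by column
  g  <- as.vector(im[, , 2])  # green channel
  b  <- as.vector(im[, , 3])  # blue channel

  flat[i, ] <- c(r, g, b)
}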

shoes <- as.data.frame(t(flat))  # one column per image, one row per pixel value
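Because as.vector() flattens an array in column-major order, the c(r, g, b) vector stores the whole red channel first, then green, then blue, and array(v, dim(img)) restores the original layout exactly. That round trip is what later lets a column of pixel values be displayed as an image. A small check on a synthetic 2 × 3 × 3 array (an assumption standing in for a real image):

```r
# Synthetic "image": 2 rows x 3 columns x 3 channels (R, G, B)
img_toy <- array(seq_len(2 * 3 * 3), dim = c(2, 3, 3))

# Flatten the same way as the loop above: red, then green, then blue
r <- as.vector(img_toy[, , 1])
g <- as.vector(img_toy[, , 2])
b <- as.vector(img_toy[, , 3])
v <- c(r, g, b)

# Concatenating the channels is the same as flattening the whole array
identical(v, as.vector(img_toy))   # TRUE

# Reshaping with the original dimensions recovers the image exactly
back <- array(v, dim(img_toy))
identical(back, img_toy)           # TRUE
```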

Standardization

scaled <- scale(shoes, center = TRUE, scale = TRUE)
mean.shoes <- attr(scaled, "scaled:center")
std.shoes  <- attr(scaled, "scaled:scale")
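scale() standardizes each column (here, each image) separately and stores the column means and standard deviations as attributes, which is what mean.shoes and std.shoes capture. A quick check on a small synthetic matrix (50 "pixels" by 4 "images", values chosen arbitrarily):

```r
set.seed(1)
X <- matrix(rnorm(50 * 4, mean = 10, sd = 3), nrow = 50)

Z <- scale(X, center = TRUE, scale = TRUE)

round(colMeans(Z), 10)   # every column mean is 0
apply(Z, 2, sd)          # every column standard deviation is 1

# The stored attributes undo the transformation
X_back <- sweep(sweep(Z, 2, attr(Z, "scaled:scale"), "*"),
                2, attr(Z, "scaled:center"), "+")
all.equal(as.vector(X_back), as.vector(X))   # TRUE
```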

Calculate Covariance (Correlation)

Sigma_ <- cor(scaled)  # one row and one column per image
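For standardized columns the covariance matrix and the correlation matrix coincide, which is why cor(scaled) can be treated as the covariance matrix here. Because the columns are images, the matrix is images × images (17 × 17 in this project), not pixels × pixels. A check on synthetic data:

```r
set.seed(2)
X <- matrix(rnorm(100 * 5), nrow = 100)  # 100 "pixels" x 5 "images"
Z <- scale(X)

# Covariance of standardized data equals the correlation of the raw data
all.equal(cov(Z), cor(X))   # TRUE
dim(cor(Z))                 # 5 5: one row and column per image
```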

Calculate the eigenvalues and eigenvectors of the covariance matrix.

eig          <- eigen(Sigma_)
eigenvalues  <- eig$values
eigenvectors <- eig$vectors
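eigen() returns the eigenvalues of a symmetric matrix in decreasing order, with unit-length eigenvectors in the matching columns. For a correlation matrix the eigenvalues also sum to the number of variables (its trace), so eigenvalues / sum(eigenvalues) is each component's share of the total variance. These properties can be verified on a synthetic correlation matrix:

```r
set.seed(3)
X <- matrix(rnorm(200 * 6), nrow = 200)
R <- cor(X)                       # 6 x 6 correlation matrix

e <- eigen(R, symmetric = TRUE)
V <- e$vectors
lambda <- e$values

# Each column satisfies R v = lambda v
all.equal(R %*% V, V %*% diag(lambda))   # TRUE
# Eigenvectors are orthonormal
all.equal(crossprod(V), diag(6))         # TRUE
# Eigenvalues are in decreasing order and sum to the trace (= 6 variables)
!is.unsorted(rev(lambda))                # TRUE
all.equal(sum(lambda), 6)                # TRUE
```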

Choose the number of principal components.

prop.var <- eigenvalues / sum(eigenvalues)
cum.var  <- cumsum(eigenvalues) / sum(eigenvalues)
thres    <- min(which(cum.var > .80))
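With hypothetical eigenvalues (not the shoe results), the selection rule works like this: cum.var is the running share of total variance, and thres is the first component at which that share exceeds 80%.

```r
eigenvalues <- c(6, 2.5, 1, 0.5)            # hypothetical values, sum = 10
prop.var <- eigenvalues / sum(eigenvalues)  # 0.60 0.25 0.10 0.05
cum.var  <- cumsum(prop.var)                # 0.60 0.85 0.95 1.00
thres    <- min(which(cum.var > .80))
thres                                       # 2
```

Because the comparison is strict (>), a set of components whose cumulative share lands exactly on 80% would not be enough; using >= instead would accept it.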

Eigenshoes that account for 80% of the variability

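# Project the standardized data onto the first thres eigenvectors and rescale
# each column to unit length; nrow = thres keeps diag() correct when thres == 1
scaling <- diag(eig$values[1:thres]^(-1/2), nrow = thres) / sqrt(nrow(scaled) - 1)
eigenshoes <- scaled %*% eig$vectors[, 1:thres, drop = FALSE] %*% scaling

# Display the first eigenshoe, min-max rescaled to [0, 1] for viewing
e1 <- eigenshoes[, 1]
imageShow(array((e1 - min(e1)) / (max(e1) - min(e1)), dim(img)))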
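The scaling line follows from the eigen-decomposition. If X is the standardized pixels × images matrix with n rows, then Sigma = t(X) %*% X / (n - 1). For a unit eigenvector v_k of Sigma with eigenvalue lambda_k, the projection X v_k has squared length (n - 1) * lambda_k, so dividing it by sqrt((n - 1) * lambda_k) gives a unit-length eigenshoe in pixel space, and different eigenshoes are orthogonal. A check on synthetic data (sizes chosen arbitrarily):

```r
set.seed(4)
n <- 500; p <- 5
X <- scale(matrix(rnorm(n * p), nrow = n))  # standardized: n "pixels" x p "images"

eig <- eigen(cor(X), symmetric = TRUE)
k <- 3                                       # number of components kept

# Same scaling as above; nrow = k keeps diag() correct even when k == 1
scaling <- diag(eig$values[1:k]^(-1/2), nrow = k) / sqrt(n - 1)
U <- X %*% eig$vectors[, 1:k, drop = FALSE] %*% scaling

# Columns of U are orthonormal eigenshoes in pixel space
all.equal(crossprod(U), diag(k), check.attributes = FALSE)   # TRUE
```

Working in the small images × images space and then projecting back is much cheaper than decomposing a pixels × pixels covariance matrix, which is the reason for this construction.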

Second Eigenshoe

e2 <- eigenshoes[, 2]
imageShow(array((e2 - min(e2)) / (max(e2) - min(e2)), dim(img)))  # rescaled to [0, 1] for viewing

Scree plot showing the percentage of variation that each principal component accounts for in the data.

pca <- prcomp(shoes, center = TRUE, scale. = TRUE)  # scale. = TRUE standardizes each column, as scale() did above

pca.var <- pca$sdev^2
pca.var.per <- round(pca.var/sum(pca.var)*100,1)
barplot(pca.var.per, main = "Scree Plot", xlab = "Principal Component", ylab = "Percent Variation")
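prcomp() with center = TRUE and scale. = TRUE performs the same decomposition through a singular value decomposition, so the squared standard deviations it reports should equal the eigenvalues from eigen(cor(...)), and the scree plot shows the same proportions as prop.var. A check on synthetic data:

```r
set.seed(5)
X <- matrix(rnorm(300 * 4), nrow = 300)

pca <- prcomp(X, center = TRUE, scale. = TRUE)
eig <- eigen(cor(X), symmetric = TRUE)

# Component variances from prcomp match the correlation-matrix eigenvalues
all.equal(pca$sdev^2, eig$values)   # TRUE

# Percent variation per component, as plotted in the scree graph
round(pca$sdev^2 / sum(pca$sdev^2) * 100, 1)
```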

In conclusion, we uploaded a set of image files and applied principal component analysis (PCA) to extract the features that explain the most variability within the dataset. Using the techniques described in the introduction, we built and visualized “eigenimagery” that accounts for 80% of the total variability. By examining these eigenshoes, we can identify the features that most strongly differentiate the images. This technique has wide applications in image processing, including facial recognition and object detection. Overall, PCA is a powerful tool for analyzing image data and extracting useful information from it.