Shoe Image Analysis

This analysis involves loading, processing, and analyzing a collection of shoe images to understand patterns and variations.

Set Up Environment

knitr::opts_chunk$set(echo = TRUE)
library(doParallel)
library(foreach)
library(jpeg)
library(EBImage)

image_path <- "/Users/haigbedros/Desktop/MSDS/Spring 24/605/HW/HW4/Eigenimagery-Variability-Analysis/jpg"
files <- list.files(path=image_path, pattern="\\.jpg", full.names = TRUE)

View Shoes

Adjustment parameters are set to standardize image sizes for analysis. A function to plot images from their paths is defined.

height <- 1200
width <- 2500
scale <- 20

plot_jpeg <- function(path, add=FALSE) {
  jpg <- readJPEG(path, native=TRUE)
  res <- dim(jpg)[2:1]
  if (!add) {
    plot(1, 1, xlim=c(1, res[1]), ylim=c(1, res[2]), 
         asp=1, 
         type='n', 
         xaxt='n', yaxt='n', 
         xlab='', ylab='', 
         bty='n')
  }
  rasterImage(jpg, 1, 1, res[1], res[2])
}

Load Data

Images are loaded, resized, and stored in an array for further analysis.

im <- array(rep(0, length(files) * height / scale * width / scale * 3), 
            dim=c(length(files), height / scale, width / scale, 3))

for (i in seq_along(files)) {
  temp <- EBImage::resize(readJPEG(files[i]), height / scale, width / scale)
  im[i, , , ] <- array(temp, dim=c(1, height / scale, width / scale, 3))
}

Display Images

A subset of images is displayed to verify the loading and resizing process.

par(mfrow=c(3, 3), mai=c(.3, .3, .3, .3))

# I adjusted the loop to have a dynamic range, ensuring it adapts to the actual number of available images, and combined the par settings for efficiency and conciseness.
for (i in 1:min(length(files), dim(im)[1])) { 
  plot_jpeg(writeJPEG(im[i, , , ]))
}

Principal Component Analysis

PCA is performed to identify major patterns in the image data, reducing dimensionality.

newdata <- im
dim(newdata) <- c(length(files), height * width * 3 / scale^2)
mypca <- princomp(t(as.matrix(newdata)), scores=TRUE, cor=TRUE)
mycomponents <- mypca$sdev^2 / sum(mypca$sdev^2)

Eigenshoes

Principal components are visualized as ‘Eigenshoes’, representing major variation patterns.

mypca2 <- t(mypca$scores)
dim(mypca2) <- c(length(files), height / scale, width / scale, 3)
par(mfrow=c(5, 5), mai=c(.001, .001, .001, .001))

# I adjusted the loop to have a dynamic range, ensuring it adapts to the actual number of available images, and combined the par settings for efficiency and conciseness.
for (i in 1:min(length(files), dim(mypca2)[1])) {  
  plot_jpeg(writeJPEG(mypca2[i, , , ], quality=1, bg="white"))
}

Discussion

In this analysis, we demonstrate a comprehensive approach to processing and analyzing a dataset of shoe images. Initially, we set up the environment by loading necessary libraries and setting global chunk options for consistent output. We utilize the list.files function with full.names = TRUE to directly obtain the full path of each image, simplifying subsequent file reading operations.
In the “View Shoes” section, we define a function to plot images from their paths, streamlining the visualization process in later stages.
The “Load Data” section involves resizing images to a standard dimension, crucial for analysis consistency and reducing computational load. Employing EBImage::resize within a loop ensures that all images are uniformly processed.
Displaying a subset of images allows for a visual check of the preprocessing steps, ensuring that images are correctly loaded and resized.
Principal Component Analysis (PCA) is a critical step in understanding the underlying patterns in the dataset. By transforming the image data into a set of principal components, this method highlights the major sources of variation among the images, which is particularly useful in identifying distinguishing features of the shoes.
Finally, visualizing the principal components as “Eigenshoes” provides intuitive insights into the most significant patterns captured by PCA. This includes aspects like shape, color distribution, and design features that differentiate the shoes in the dataset.
Overall, our analysis efficiently handles image data, from preprocessing to advanced analysis, providing valuable insights into the dataset’s underlying structure.

DATA 605 - Assignment 4

Haig Bedros

2024-02-16