This first R chunk loads the required libraries and collects the names of the first 17 JPEG files in the working directory.
library(jpeg)
library(magick)
## Warning: package 'magick' was built under R version 4.3.2
## Linking to ImageMagick 6.9.12.98
## Enabled features: cairo, freetype, fftw, ghostscript, heic, lcms, pango, raw, rsvg, webp
## Disabled features: fontconfig, x11
library(OpenImageR)
## Warning: package 'OpenImageR' was built under R version 4.3.2
files <- list.files("~/hw4unzipped/",pattern="\\.jpg")[1:17]
Next, the target dimensions and the scale factor for the images are stored as variables, since they are reused several times throughout the code. A helper function, plot_jpeg, is then defined to display the images.
Inside the function, each image is read, its resolution is determined, and the image is drawn on an empty plot area of matching size.
height=1200
width=2500
scale=20
plot_jpeg = function(path, add=FALSE)
{
  jpg = readJPEG(path, native=TRUE) # read the file
  res = dim(jpg)[2:1] # get the resolution, [x, y]
  if (!add) # initialize an empty plot area if add==FALSE
    plot(1,1,xlim=c(1,res[1]),ylim=c(1,res[2]),asp=1,type='n',xaxs='i',yaxs='i',xaxt='n',yaxt='n',xlab='',ylab='',bty='n')
  rasterImage(jpg,1,1,res[1],res[2])
}
In this section, the path is set to the directory that contains the files, and an array is initialized to store the resized images. Each JPEG file is then read and resized in a loop. The resized image had to be converted to numeric pixel data before it could be stored, so each one is converted and reshaped to the array dimensions. The resulting images are added to the array im.
path <- '~/hw4unzipped/'
# Create an empty array to store the resized images
im <- array(0, dim = c(17, height/scale, width/scale, 3))
# Loop through each file
for (i in 1:17) {
  # Read the JPEG image
  temp <- image_read(paste0(path, files[i]))
  # Resize the image
  temp_resized <- image_scale(temp, geometry = paste0(width/scale, "x", height/scale, "!"))
  # Get the pixel data as a matrix and convert to numeric, if not numeric it doesn't work
  temp_data <- as.numeric(image_data(temp_resized))
  # Reshape the matrix to match the array dimensions
  temp_array <- array(temp_data, dim = c(height/scale, width/scale, 3))
  # Assign the resized image to the array
  im[i,,,] <- temp_array
}
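Then, the image array was flattened into a two-dimensional matrix with one row per image, holding the red, green, and blue pixel values in sequence. The shoes data frame is the transpose of this flat matrix, so each image becomes a column.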
# One row per image; each row holds that image's height x width x 3 pixel values
flat=matrix(0, 17, prod(dim(im)[-1]))
for (i in 1:17) {
  r=as.vector(im[i,,,1]); g=as.vector(im[i,,,2]); b=as.vector(im[i,,,3])
  flat[i,] <- c(r, g, b)
}
shoes=as.data.frame(t(flat))
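The flattening step can be illustrated on a toy array. The sketch below is a minimal example under assumed dimensions (2 images of 2 x 3 pixels with 3 channels); the names toy, toy_flat, and toy_shoes are hypothetical and are not part of the analysis.

```r
# Toy stand-in for `im`: 2 images, 2 x 3 pixels, 3 colour channels (assumed sizes)
toy <- array(seq_len(2 * 2 * 3 * 3), dim = c(2, 2, 3, 3))

# One row per image, one column per pixel-channel value
toy_flat <- matrix(0, nrow = dim(toy)[1], ncol = prod(dim(toy)[-1]))
for (i in seq_len(dim(toy)[1])) {
  # as.vector() on the 3-D slice unrolls all red values, then green, then blue
  toy_flat[i, ] <- as.vector(toy[i, , , ])
}

# Transpose so that each image becomes a column (variables = images)
toy_shoes <- as.data.frame(t(toy_flat))
dim(toy_shoes)
```

Because R stores arrays in column-major order, unrolling the whole slice at once gives the same ordering as concatenating the three channels by hand.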
In this section, the plots are shown. You can see all 17 shoes.
par(mfrow=c(3,3))
par(mai=c(.3,.3,.3,.3))
for (i in 1:17){ #plot all 17 images
  plot_jpeg(writeJPEG(im[i,,,]))
}
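Here, each column of shoes (one column per image) is centered and scaled to unit variance. The column means and standard deviations used for the standardization are saved for later use.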
scaled=scale(shoes, center = TRUE, scale = TRUE)
mean.shoe=attr(scaled, "scaled:center") #saving for classification
std.shoe=attr(scaled, "scaled:scale") #saving for classification...later
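As a minimal sketch of what these saved attributes contain, the example below standardizes a small synthetic data frame and then uses the stored means and standard deviations to undo the scaling. The data and the names toy, ctr, sdv, and restored are assumptions made for illustration only.

```r
set.seed(1)
# Synthetic stand-in for `shoes`: 10 rows (pixels), 4 columns (images)
toy <- as.data.frame(matrix(runif(40), nrow = 10, ncol = 4))

toy_scaled <- scale(toy, center = TRUE, scale = TRUE)
ctr <- attr(toy_scaled, "scaled:center")  # column means
sdv <- attr(toy_scaled, "scaled:scale")   # column standard deviations

# Undo the standardization: multiply by the sd, then add the mean back
restored <- sweep(sweep(toy_scaled, 2, sdv, `*`), 2, ctr, `+`)
```

Keeping the two attributes makes it possible to apply the same transformation to other data later, or to map standardized values back to the original pixel scale.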
The correlation matrix of the scaled data, Sigma_, is calculated. It has one row and one column per image.
Sigma_=cor(scaled)
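The eigenvalues and eigenvectors of the correlation matrix are calculated, and the cumulative proportion of variance explained by the eigenvalues is displayed here.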
myeigen=eigen(Sigma_)
cumsum(myeigen$values) / sum(myeigen$values)
## [1] 0.7356451 0.8383330 0.8888012 0.9123082 0.9285333 0.9417408 0.9525551
## [8] 0.9612776 0.9681671 0.9738915 0.9791163 0.9841289 0.9881985 0.9918082
## [15] 0.9953036 0.9983300 1.0000000
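The cumulative proportions above can be turned into a component count for a chosen variance target. The sketch below uses the first five values printed in the output; the helper n_components is a hypothetical name introduced only for this example.

```r
# Hypothetical helper: index of the first cumulative proportion at or above `threshold`
n_components <- function(cum_prop, threshold = 0.80) {
  which(cum_prop >= threshold)[1]
}

# First five cumulative proportions as printed in the output above
reported <- c(0.7356451, 0.8383330, 0.8888012, 0.9123082, 0.9285333)

n_80 <- n_components(reported)        # components needed to reach 80%
n_90 <- n_components(reported, 0.90)  # components needed to reach 90%
```

With these values, the 80% target is reached at the second component and the 90% target at the fourth.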
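After that, the eigenshoes matrix is formed by projecting the scaled data onto the first five eigenvectors and rescaling each projection by the inverse square root of its eigenvalue, which is what the diagonal matrix in the code does. The first eigenshoe is then displayed. It is a single composite image that captures the features shared across all the shoe images.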
scaling=diag(myeigen$values[1:5]^(-1/2)) / (sqrt(nrow(scaled)-1))
eigenshoes=scaled%*%myeigen$vectors[,1:5]%*%scaling
par(mfrow=c(2,3))
imageShow(array(eigenshoes[,1], c(60,125,3)))
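One consequence of this scaling is that the resulting columns are orthonormal. The sketch below checks that property on synthetic data, assuming the same construction as the code above; the matrix X, the choice k = 3, and the name E are illustrative assumptions and not part of the analysis.

```r
set.seed(7)
n <- 500  # rows (plays the role of pixels)
p <- 6    # columns (plays the role of images)

# Synthetic correlated columns, standardized like `scaled`
X <- scale(matrix(rnorm(n * p), n, p) + rnorm(n))

e <- eigen(cor(X))
k <- 3  # number of components kept (assumed for the example)

# Same construction as above: inverse square-root eigenvalues, divided by sqrt(n - 1)
scaling <- diag(e$values[1:k]^(-1/2)) / sqrt(nrow(X) - 1)
E <- X %*% e$vectors[, 1:k] %*% scaling

# Cross-product of the columns; approximately the k x k identity matrix
round(crossprod(E), 6)
```

The identity follows because, for standardized columns, the cross-product of X divided by n - 1 equals the correlation matrix, whose eigenvectors diagonalize it.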
In this section we perform principal component analysis with princomp, using the correlation matrix. This will be used to show the differences between the shoes.
height=1200
width=2500
scale=20
newdata=im
dim(newdata)=c(17,height*width*3/scale^2)
mypca=princomp(t(as.matrix(newdata)), scores=TRUE, cor=TRUE)
Now, the principal component scores are reshaped back into image dimensions and plotted, which shows the differences between the shoes.
mypca2=t(mypca$scores)
dim(mypca2)=c(17,height/scale,width/scale,3)
par(mfrow=c(5,5))
par(mai=c(.001,.001,.001,.001))
for (i in 1:17){ #plot the Eigenshoes
  plot_jpeg(writeJPEG(mypca2[i,,,], bg="white")) #complete without reduction
}
This shows the cumulative proportion of variance explained by the principal components. The first two components (eigenshoes) account for 83.9% of the variability, so two components are enough to meet the 80% target. The final value of 1.001 exceeds 1 only because each proportion is rounded to three decimals before the cumulative sum is taken.
a=round(mypca$sdev[1:17]^2/ sum(mypca$sdev^2),3)
cumsum(a)
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9 Comp.10
## 0.736 0.839 0.889 0.913 0.929 0.942 0.953 0.962 0.969 0.975
## Comp.11 Comp.12 Comp.13 Comp.14 Comp.15 Comp.16 Comp.17
## 0.980 0.985 0.989 0.993 0.996 0.999 1.001