Loading the necessary packages and listing the first 17 JPEG files in the source directory.
# Loading the necessary packages
library(jpeg)
library(OpenImageR)
library(EBImage)
##
## Attaching package: 'EBImage'
## The following objects are masked from 'package:OpenImageR':
##
## readImage, writeImage
library(foreach)
files<-list.files("/Users/selina/Downloads/jpg/", pattern="\\.jpg")[1:17]
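Defining a function plot_jpeg that draws a JPEG at its native pixel
dimensions with a 1:1 aspect ratio, using the readJPEG function from the
jpeg package. The constants height, width, and scale set the target size
used when the images are resized in the next step.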
height = 1200
width = 2500
scale = 20
plot_jpeg = function(path, add=FALSE) {
  jpg = readJPEG(path, native=TRUE)  # read the image as a native raster
  res = dim(jpg)[2:1]                # c(width, height) in pixels
  if (!add) {
    # empty plot sized to the image, without axes or a box
    plot(1,1,xlim=c(1,res[1]), ylim=c(1, res[2]), asp=1, type='n', xaxs='i', yaxs='i', xaxt='n', yaxt='n', xlab='', ylab='', bty='n')
  }
  rasterImage(jpg, 1, 1, res[1], res[2])
}
Initializing an array to store resized images from JPEG files located in the specified directory.
# One slot per image: 17 x 60 x 125 x 3 (image, rows, columns, RGB channel)
images<-array(rep(0, length(files)*height/scale*width/scale*3), dim=c(length(files), height/scale, width/scale, 3))
for (i in 1:17) {
  # Read each JPEG and shrink it by the scale factor before storing it
  temp<-resize(readJPEG(paste0("/Users/selina/Downloads/jpg/", files[i])), height/scale, width/scale)
  images[i,,,]=array(temp, dim=c(1, height/scale, width/scale, 3))
}
dim(images)
## [1] 17 60 125 3
Vectorizing each image (red, then green, then blue channel) and storing the result as a data frame with one column per image for further processing.
# prod(dim(images)) is 17 * 22500 = 382500, so each 22,500-value image vector
# is recycled 17 times along its row
flat<-matrix(0, 17, prod(dim(images)))
for (i in 1:17) {
  r=as.vector(images[i,,,1]); g=as.vector(images[i,,,2]); b=as.vector(images[i,,,3])
  flat[i,] <- t(c(r, g, b))
}
shoes<-as.data.frame(t(flat))  # 382500 rows, one column per image
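To make the flattening step concrete, here is a minimal sketch on a small synthetic array. The array `toy` and its sizes are illustrative assumptions, not the shoe data. Each image becomes one row that holds its red, green, and blue planes end to end, and the row length is the number of values in a single image.

```r
# Toy stand-in for `images`: 2 images, 3 x 4 pixels, 3 channels (assumed sizes).
set.seed(1)
toy <- array(runif(2 * 3 * 4 * 3), dim = c(2, 3, 4, 3))

# One row per image, one column per pixel-channel value of a single image.
n_px <- prod(dim(toy)[2:4])
flat_toy <- matrix(0, nrow = dim(toy)[1], ncol = n_px)
for (i in seq_len(dim(toy)[1])) {
  r <- as.vector(toy[i, , , 1])
  g <- as.vector(toy[i, , , 2])
  b <- as.vector(toy[i, , , 3])
  flat_toy[i, ] <- c(r, g, b)
}

# Concatenating the channels gives the same order as flattening the slice at once.
same_order <- all.equal(flat_toy[1, ], as.vector(toy[1, , , ]))
dim(flat_toy)
```

Because R stores arrays in column-major order, `as.vector(toy[i, , , ])` would be an equivalent one-line way to build each row.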
Plotting the images in a 5-by-4 grid.
par(mfrow=c(5,4))
par(mai=c(.03,.03,.03,.03))
for (i in 1:17) {
  plot_jpeg(writeJPEG(images[i,,,]))
}
Scaling the “shoes” data frame centers each column (one per image) on its mean and rescales it to unit variance, which puts all images on a common scale. The column means and standard deviations are saved as mean.shoe and std.shoe, and the dimensions of the scaled matrix are then checked.
scaled<-scale(shoes, center=TRUE, scale=TRUE)
mean.shoe<-attr(scaled, "scaled:center")
std.shoe<-attr(scaled, "scaled:scale")
dim(scaled)
## [1] 382500 17
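As a minimal sketch of what `scale()` returns, the example below uses a made-up matrix `m` (an assumption for illustration, not the shoe data). It checks that the scaled columns have mean 0 and standard deviation 1, and that the stored `scaled:center` and `scaled:scale` attributes are enough to recover the original values.

```r
# Synthetic data: 200 observations of 3 variables (illustrative only).
set.seed(42)
m <- matrix(rnorm(200 * 3, mean = 5, sd = 2), ncol = 3)

z <- scale(m, center = TRUE, scale = TRUE)
centers <- attr(z, "scaled:center")  # column means of m
spreads <- attr(z, "scaled:scale")   # column standard deviations of m

# Each scaled column has mean ~0 and standard deviation 1.
col_means <- colMeans(z)
col_sds <- apply(z, 2, sd)

# Multiplying by the spreads and adding the centers undoes the scaling.
restored <- sweep(sweep(z, 2, spreads, "*"), 2, centers, "+")
```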
Calculating the correlation matrix of the scaled data measures the linear relationship between each pair of images (the columns). The result should be 17 by 17, one row and one column per image, which is checked before continuing.
Sigma_<-cor(scaled)
dim(Sigma_)
## [1] 17 17
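For data that has already been centered and scaled, the correlation matrix can also be written as a cross-product divided by n - 1. The sketch below checks this on a synthetic matrix `m` (an illustrative assumption), which is the identity the later eigenshoe scaling relies on.

```r
# Synthetic data: 100 observations of 4 variables (illustrative only).
set.seed(7)
m <- matrix(rnorm(100 * 4), ncol = 4)
z <- scale(m)

by_cor <- cor(z)                               # built-in correlation
by_crossprod <- crossprod(z) / (nrow(z) - 1)   # t(z) %*% z / (n - 1)

max_gap <- max(abs(by_cor - by_crossprod))
```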
Calculating the cumulative proportion of variance explained by the eigenvalues of the correlation matrix: the eigenvalues are summed progressively and divided by their total. In the output below, the first two components explain about 79.4% of the variance and the first three about 84.5%.
myeigen<-eigen(Sigma_)
cumsum(myeigen$values) / sum(myeigen$values)
## [1] 0.6928202 0.7940449 0.8451073 0.8723847 0.8913841 0.9076338 0.9216282
## [8] 0.9336889 0.9433872 0.9524455 0.9609037 0.9688907 0.9765235 0.9832209
## [15] 0.9894033 0.9953587 1.0000000
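A small sketch of how the number of components could be chosen from these cumulative proportions. The values are copied from the first five entries of the output above, and the 80% cutoff is the threshold this write-up refers to later; the variable names are hypothetical.

```r
# First five cumulative proportions, copied from the printed output.
cum_var <- c(0.6928202, 0.7940449, 0.8451073, 0.8723847, 0.8913841)

# Smallest number of components whose cumulative proportion reaches 80%.
k <- which(cum_var >= 0.80)[1]
k           # 3
cum_var[k]  # 0.8451073
```

Two components fall just short of the cutoff (79.4%), so three is the smallest number that reaches it.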
The scaled data are multiplied by the first five eigenvectors, and each resulting eigenshoe is normalized to unit length by dividing by the square root of its eigenvalue and by sqrt(n - 1), where n is the number of rows. The second eigenshoe is then reshaped to 60 x 125 x 3 and displayed as an image, with the plotting device set to a 2x3 grid layout.
scaling<-diag(myeigen$values[1:5]^(-1/2)) / (sqrt(nrow(scaled)-1))
eigenshoes<-scaled%*%myeigen$vectors[,1:5]%*%scaling
par(mfrow=c(2,3))
imageShow(array(eigenshoes[,2], c(60, 125, 3)))
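The effect of the scaling matrix can be checked numerically. The sketch below repeats the same construction on synthetic data (the matrix `m` and the choice `k <- 3` are assumptions for illustration) and confirms that the resulting columns are orthonormal, that is, their cross-product is the identity matrix.

```r
# Synthetic data: 500 observations of 6 variables (illustrative only).
set.seed(3)
m <- matrix(rnorm(500 * 6), ncol = 6)
z <- scale(m)
e <- eigen(cor(z))

k <- 3
# Same form as the scaling matrix in the text: diag(lambda^(-1/2)) / sqrt(n - 1).
s <- diag(e$values[1:k]^(-1/2)) / sqrt(nrow(z) - 1)
basis <- z %*% e$vectors[, 1:k] %*% s

# Columns of `basis` have unit length and are mutually orthogonal.
gram <- crossprod(basis)
max_gap <- max(abs(gram - diag(k)))
```

This works because `crossprod(z)` equals (n - 1) times the correlation matrix, so each unscaled column has squared length (n - 1) times its eigenvalue.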

height=1200
width=2500
scale=20
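newimages=images
# Reshape to 17 x 22500 (one row per image), then run PCA with the images as variables
dim(newimages)=c(length(files), height*width*3/scale^2)
pca<-princomp(t(as.matrix(newimages)), scores=TRUE, cor=TRUE)
# Scores are 22500 x 17; transpose and reshape them back into 17 image-shaped arrays
pca2<-t(pca$scores)
dim(pca2)=c(length(files), height/scale, width/scale, 3)
par(mfrow=c(5,5))
par(mai=c(.01,.01,.01,.01))
for (i in 1:17) {
  plot_jpeg(writeJPEG(pca2[i,,,]))
}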
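With `cor=TRUE`, `princomp` works from the correlation matrix, so its squared standard deviations should give the same variance proportions as the eigenvalues computed earlier. A minimal sketch on synthetic data (the matrix `m` is an assumption for illustration):

```r
# Synthetic correlated data: 300 observations of 4 variables (illustrative only).
set.seed(11)
m <- matrix(rnorm(300 * 4), ncol = 4) %*% matrix(runif(16), 4, 4)

pc <- princomp(m, cor = TRUE)
ev <- eigen(cor(m))$values

# Proportion of variance per component, computed both ways.
prop_princomp <- unname(pc$sdev^2 / sum(pc$sdev^2))
prop_eigen <- ev / sum(ev)

max_gap <- max(abs(prop_princomp - prop_eigen))
```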
For the first part, “a” is the proportion of variance explained by each principal component: the squared standard deviations divided by their sum, rounded to three decimal places. The cumulative sum of “a” gives the cumulative proportion of variance explained, which agrees with the eigenvalue calculation shown earlier.
In the second part, “x” holds the coordinates of each image on the five retained eigenshoes, which together explain about 89% of the variability and so exceed the 80% threshold. It is obtained by multiplying the transpose of the eigenshoes matrix by the scaled data and transposing the result, giving one row per image.
a<-round(pca$sdev[1:17]^2 / sum(pca$sdev^2), 3)
cumsum(a)
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9 Comp.10
## 0.693 0.794 0.845 0.872 0.891 0.907 0.921 0.933 0.943 0.952
## Comp.11 Comp.12 Comp.13 Comp.14 Comp.15 Comp.16 Comp.17
## 0.960 0.968 0.976 0.983 0.989 0.995 1.000
x <- t(t(eigenshoes)%*%scaled)
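To illustrate what these coordinates represent, the sketch below projects synthetic data onto k basis vectors built the same way as the eigenshoes, then maps the coordinates back and measures the reconstruction error. The data `m` and the helper `project()` are hypothetical names introduced only for this example.

```r
# Synthetic data: 400 observations of 5 variables (illustrative only).
set.seed(5)
m <- matrix(rnorm(400 * 5), ncol = 5)
z <- scale(m)
e <- eigen(cor(z))

# Hypothetical helper: project onto the first k unit-length basis vectors.
project <- function(k) {
  s <- diag(e$values[1:k]^(-1/2), nrow = k) / sqrt(nrow(z) - 1)
  basis <- z %*% e$vectors[, 1:k, drop = FALSE] %*% s
  coords <- t(t(basis) %*% z)    # one row per column of z, one column per basis vector
  approx <- basis %*% t(coords)  # reconstruction in the original space
  list(coords = coords, error = sqrt(sum((z - approx)^2)))
}

# Reconstruction error for k = 1, ..., 5 retained components.
err <- sapply(1:5, function(k) project(k)$error)
coords_dim <- dim(project(2)$coords)
```

The error shrinks as more components are kept and is numerically zero when all five are used, which is the sense in which a few components "capture" most of the variability.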