Dimension reduction on the example of food items

Introduction

The paper aims to review different algorithms and methods used for dimension reduction. Three methods are used: Multidimensional Scaling (MDS), Principal Component Analysis (PCA) and Weighted Least Squares Factor Analysis. The analysis, as in the paper regarding the clustering techniques, is based on the data set called “nutrition”, which can be found here: https://www.kaggle.com/trolukovich/nutritional-values-for-common-foods-and-products?select=nutrition.csv. This data set contains nutrition data for almost 9 thousand different food items.

Data preparation

First of all, necessary libraries are loaded and the database is imported.

library(tidyverse)
library(dplyr)
library(lubridate)
library(ggplot2)
library(datasets)
library(readxl)
library(xlsx)
library(cluster)
library(factoextra)
library(flexclust)
library(fpc)
library(clustertend)
library(ClusterR)
library(grid)
library(gridExtra)
library(lattice)
library(ppclust)
library(fclust)
library(wesanderson)
library(corrplot)
library(psych)
library(ggfortify)
library(pca3d)
library(knitr)
library(rgl)
library(smacof)
library(labdsv)
library(vegan)
library(MASS)
library(ape)
library(pls)

nutrition <- read.csv2("nutrition.csv", header=TRUE, sep = ",", stringsAsFactors = TRUE)

Next, the rows with NA values are omitted, irrelevant columns are deleted, and the food item names are stored in a separate variable.

nutrition <- na.omit(nutrition)
namesfood <- nutrition[1:175,2]

# drop the first three columns and the sixth one (same result as removing them one by one)
nutrition <- nutrition[, -c(1, 2, 3, 6)]
dim(nutrition)

## [1] 8789   73

Furthermore, all columns that are not yet numeric are converted with as.numeric().

for(i in 1:ncol(nutrition)) {
  nutrition[,i] <- as.numeric(nutrition[,i])
}
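One caveat is worth keeping in mind here: when a column is a factor (which is what stringsAsFactors = TRUE produces for text columns), as.numeric() returns the integer level codes rather than the numbers written in the file. The sketch below shows the difference and one possible way to extract the quantities, assuming the raw values look like "9.17 g" or "4 mg". The helper name parse_amount and the toy values are hypothetical, and the helper ignores the unit itself.

```r
# Toy column mimicking values with unit suffixes (hypothetical data)
raw <- factor(c("9.17 g", "4 mg", "0.5 g", "12"))

# as.numeric() on a factor yields the level codes, not the quantities
codes <- as.numeric(raw)

# Hypothetical helper: keep only digits, dots and minus signs,
# then convert the remaining text to a number (units are ignored)
parse_amount <- function(x) {
  as.numeric(gsub("[^0-9.-]", "", as.character(x)))
}

amounts <- parse_amount(raw)
print(codes)
print(amounts)
```

If the units differ within a column (for example g and mg), they would additionally have to be brought to a common scale before any analysis.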

Also, in order to obtain more interpretable results, the whole data set is standardised (each column is centred and scaled to unit standard deviation).

nutrition2<-nutrition 
nutrition <- scale(nutrition)
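For reference, a small sketch on made-up numbers illustrates what scale() does with its default arguments: it subtracts the column mean and divides by the sample standard deviation. The matrix toy is purely illustrative.

```r
# Made-up data: two columns on very different scales
toy <- cbind(a = c(1, 2, 3, 4), b = c(10, 20, 40, 80))

# Manual standardisation: centre each column, then divide by its sd
centred <- sweep(toy, 2, colMeans(toy), "-")
manual  <- sweep(centred, 2, apply(toy, 2, sd), "/")

# Built-in equivalent
scaled <- scale(toy)

# Both approaches should agree (attributes aside)
print(all.equal(manual, scaled, check.attributes = FALSE))
```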

After the initial data processing we obtain a standardised matrix containing data for 8789 food products, each described by 73 nutritional values. Finally, to keep the analysis small, only the first 175 food items and the first 15 nutritional characteristics are selected.

nutritiontrim <- as.matrix(nutrition[1:175, 1:15])
nutrition2<-as.matrix(nutrition2[1:175, 1:15])
names_nutr<- as.matrix(colnames(nutritiontrim))
summary(nutritiontrim)

##     calories          total_fat        cholesterol           sodium       
##  Min.   :-1.26152   Min.   :-1.0124   Min.   :-0.89088   Min.   :-1.9078  
##  1st Qu.:-0.96127   1st Qu.:-0.9158   1st Qu.:-0.89088   1st Qu.:-1.0133  
##  Median :-0.18417   Median :-0.4971   Median :-0.89088   Median :-0.2183  
##  Mean   :-0.01106   Mean   :-0.1256   Mean   :-0.36797   Mean   :-0.2200  
##  3rd Qu.: 0.62236   3rd Qu.: 0.3295   3rd Qu.:-0.06053   3rd Qu.: 0.6415  
##  Max.   : 3.82496   Max.   : 2.7234   Max.   : 1.63663   Max.   : 1.8090  
##     choline             folate           folic_acid          niacin       
##  Min.   :-0.78718   Min.   :-1.17831   Min.   :-0.3548   Min.   :-1.2115  
##  1st Qu.:-0.78718   1st Qu.:-1.09848   1st Qu.:-0.3353   1st Qu.:-1.0396  
##  Median :-0.48857   Median :-0.22032   Median :-0.3353   Median :-0.5094  
##  Mean   : 0.01346   Mean   :-0.03467   Mean   :-0.1089   Mean   :-0.2006  
##  3rd Qu.: 0.74183   3rd Qu.: 0.98858   3rd Qu.:-0.3353   3rd Qu.: 0.5672  
##  Max.   : 2.22285   Max.   : 1.58923   Max.   : 4.6595   Max.   : 1.9358  
##  pantothenic_acid     riboflavin           thiamin           vitamin_a       
##  Min.   :-0.96531   Min.   :-1.061069   Min.   :-0.80429   Min.   :-0.97765  
##  1st Qu.:-0.94700   1st Qu.:-0.854052   1st Qu.:-0.67142   1st Qu.:-0.97535  
##  Median :-0.36370   Median :-0.289718   Median :-0.36460   Median :-0.03169  
##  Mean   : 0.07939   Mean   : 0.003946   Mean   : 0.04274   Mean   : 0.07498  
##  3rd Qu.: 0.85000   3rd Qu.: 0.484469   3rd Qu.: 0.45923   3rd Qu.: 1.03051  
##  Max.   : 2.80656   Max.   : 3.379871   Max.   : 3.52501   Max.   : 2.06049  
##  vitamin_a_rae      carotene_alpha    carotene_beta    
##  Min.   :-0.75618   Min.   :-0.2591   Min.   :-0.4945  
##  1st Qu.:-0.75032   1st Qu.:-0.2591   1st Qu.:-0.4945  
##  Median :-0.74446   Median :-0.2267   Median :-0.4879  
##  Mean   :-0.07657   Mean   : 0.1852   Mean   : 0.2510  
##  3rd Qu.: 0.52687   3rd Qu.:-0.2267   3rd Qu.: 0.8318  
##  Max.   : 2.49539   Max.   : 5.9855   Max.   : 3.2059

In the next step we will examine the correlation matrix for the analysed data.

#two correlation plots
corr_nutritiontrim = cor(nutritiontrim, method='pearson')

corrplot.mixed(cor(nutritiontrim), bg="white", upper="pie",lower="number", order="hclust", tl.col="black", tl.pos="lt", diag="l", number.font=0.5, tl.cex=1, number.cex=0.55)

#pls, loaded after corrplot, also exports a corrplot() function that masks this one,
#so the call is namespaced to reach the corrplot package
corrplot::corrplot(corr_nutritiontrim, tl.col="black")

Looking at both plots we can clearly see that some variables are correlated, while most pairs of variables have a correlation close to 0.

MDS

The first method used is Multidimensional Scaling (MDS). This algorithm projects a multidimensional data set onto a two-dimensional plane while trying to preserve the distances between the objects. In our case there are 15 nutritional values for each food item. The data matrix is transposed before the distances are computed, so the 15 nutritional variables are the objects placed on the 2D plane, each described by its values across the 175 food items.

distance<-dist(t(nutritiontrim)) 
mds<-cmdscale(distance, k=2)
#let's plot the results with labels
plot(mds, type='n') 
text(mds, labels=names_nutr, cex=0.6, adj=0.5)
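To illustrate what cmdscale() does, here is a small sketch on made-up points that already lie in two dimensions, so classical MDS can reproduce their pairwise distances exactly (the configuration itself is recovered only up to rotation and reflection). All object names are illustrative.

```r
# Four made-up points forming a 3 x 4 rectangle
pts <- rbind(A = c(0, 0), B = c(3, 0), C = c(3, 4), D = c(0, 4))

# Pairwise Euclidean distances between the points
d_orig <- dist(pts)

# Classical MDS into two dimensions
fit <- cmdscale(d_orig, k = 2)

# Distances in the recovered configuration
d_fit <- dist(fit)

# Largest absolute difference between original and recovered distances
print(max(abs(as.vector(d_orig) - as.vector(d_fit))))
```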

On the plot we can see all 15 nutritional values with their relative positions on the two-dimensional plane. Although there are no obvious outliers, some variables, such as carotene alpha or pantothenic acid, lie further away than most points. To avoid having to deal with outliers, we use a slightly different approach: this time the distance matrix is derived from the correlation matrix.

#calculate the correlation matrix
simmilaritymatr<-cor(nutritiontrim)
#calculate the dissimilarity matrix
dissimilaritymatr<-sim2diss(simmilaritymatr, method=1, to.dist=TRUE)
#perform MDS on the calculated dissimilarity matrix
mds2<-mds(dissimilaritymatr, ndim=2,  type="ratio") # from smacof::
#plot the results
plot(mds2)
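With a numeric method argument, sim2diss() is understood here to subtract each similarity from that number, so method=1 would correspond to 1 - r. This reading is an assumption that should be checked against ?sim2diss. A base R sketch of that conversion on a built-in data set:

```r
# Correlation matrix of a few columns of the built-in mtcars data
r <- cor(mtcars[, c("mpg", "hp", "wt", "qsec")])

# Turn correlations into dissimilarities:
# r = 1 gives 0, r = 0 gives 1, r = -1 gives 2
diss <- as.dist(1 - r)
print(diss)
```

Under this conversion strongly negatively correlated variables end up furthest apart, which is a design choice: if negative and positive correlations should count as equally "close", a transformation based on the absolute correlation would be needed instead.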

This time all variables are close to each other and no outliers are visible. We can also look at the contribution of each variable to the STRESS function, which measures how well the distances on the 2D plane reproduce the original dissimilarities.

plot(mds2, pch=21, cex=as.numeric(mds2$spp), bg="red")

We can see that some of the variables have bigger impact on the STRESS function than others. Finally, we shall examine the quality of performed MDS.

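#stress obtained for random dissimilarities (baseline); nrep=1 uses a single replication
stressfunction<-randomstress(n=15, ndim=2, nrep=1) 
#stress of the fitted MDS solution
empirical<-mds2$stress
#ratio of the fitted stress to the baseline stress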
result <- empirical/mean(stressfunction)
result

## [1] 0.6524064

The ratio of the two stress values is approximately 0.65. The closer this ratio is to 1, the less the solution improves on random data, so, following Kruskal (1964), the obtained MDS results are judged to be poor. Thus, we should apply other methods to extract information from our data set.
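As an illustration of what a stress value measures, the sketch below computes Kruskal's stress-1 for a given configuration on made-up points. It omits the optimal scaling step, so it is not guaranteed to match the value stored in mds2$stress. The function name stress1 is hypothetical.

```r
# Kruskal's stress-1 for a given configuration (no optimal scaling step)
stress1 <- function(diss, config) {
  d_obs <- as.vector(as.dist(diss))   # target dissimilarities
  d_fit <- as.vector(dist(config))    # distances in the low-dimensional map
  sqrt(sum((d_obs - d_fit)^2) / sum(d_fit^2))
}

# Four made-up points forming a 3 x 4 rectangle
pts <- rbind(A = c(0, 0), B = c(3, 0), C = c(3, 4), D = c(0, 4))
d <- dist(pts)

stress_2d <- stress1(d, cmdscale(d, k = 2))  # exact 2D representation
stress_1d <- stress1(d, cmdscale(d, k = 1))  # points forced onto a line
print(c(two_dims = stress_2d, one_dim = stress_1d))
```

The two-dimensional map reproduces the distances, so its stress is essentially zero, while squeezing the same points onto a line distorts the distances and yields a clearly positive stress.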

PCA

The second method used to reduce the analysed data set is Principal Component Analysis (PCA). PCA reduces the number of variables while retaining as much of the original information as possible. It does so by constructing principal components, which are linear combinations of the original variables chosen to have the highest possible variance. The first principal component explains the highest percentage of the total variance in the data, the second explains the highest percentage of the variance not explained by the first, and so on. This property is associated with the orthogonality of the components; orthogonality can be thought of as the generalisation of perpendicularity to higher-dimensional spaces (Górniak 1998).
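The link between principal components and variance can be sketched with base R on a built-in data set: the squared standard deviations reported by prcomp() equal the eigenvalues of the covariance matrix, and the loadings equal its eigenvectors up to sign. The data set and object names are illustrative only.

```r
# Standardise a few columns of the built-in mtcars data
X <- scale(mtcars[, c("mpg", "disp", "hp", "wt", "qsec")])

# Eigen-decomposition of the covariance matrix
eig <- eigen(cov(X))

# PCA via prcomp (which uses a singular value decomposition internally)
pca_demo <- prcomp(X, center = TRUE, scale. = FALSE)

# Variances of the components equal the eigenvalues
print(all.equal(pca_demo$sdev^2, eig$values))

# Loadings equal the eigenvectors, up to the sign of each column
sign_free_gap <- max(abs(abs(unname(pca_demo$rotation)) - abs(eig$vectors)))
print(sign_free_gap)

# Share of the total variance explained by each component
print(round(eig$values / sum(eig$values), 3))
```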

pca<-prcomp(nutritiontrim, center=TRUE, scale.=FALSE) 
pca

## Standard deviations (1, .., p=15):
##  [1] 1.8457612 1.5338108 1.2021381 1.1280718 1.0732342 1.0289359 0.9820252
##  [8] 0.9159694 0.8090402 0.7748769 0.6929745 0.6571189 0.6457515 0.6107240
## [15] 0.5229643
## 
## Rotation (n x k) = (15 x 15):
##                          PC1         PC2          PC3         PC4         PC5
## calories         -0.27302813 -0.11156558  0.610080784 -0.10125758  0.11818808
## total_fat        -0.23582437 -0.21320868  0.527635012 -0.02897626 -0.19367552
## cholesterol      -0.14406565 -0.07095549 -0.083396784  0.11524353 -0.48231343
## sodium           -0.07242794  0.05679978 -0.007889995  0.45141426  0.54124791
## choline           0.08599310 -0.25004100 -0.159103902  0.09732101  0.13254513
## folate           -0.06495034 -0.14543061 -0.258987933  0.24122284  0.13642134
## folic_acid       -0.13134323 -0.08301567  0.076413442  0.06380616  0.26993769
## niacin           -0.32906420 -0.17915130 -0.195692392 -0.03069413 -0.05073449
## pantothenic_acid -0.34078549 -0.33158394 -0.318295065 -0.06362482 -0.32547014
## riboflavin       -0.33305249 -0.31137478 -0.199353682  0.10339677  0.19715379
## thiamin          -0.33561653 -0.23588007  0.072206066 -0.13791248  0.25977458
## vitamin_a         0.16347016 -0.28278460  0.119599878  0.48628498 -0.15285821
## vitamin_a_rae     0.19134693 -0.26902804  0.123719670  0.28297197 -0.15056768
## carotene_alpha    0.39479334 -0.51290330 -0.080669257 -0.55862037  0.22414125
## carotene_beta     0.38444641 -0.36499637  0.153220618  0.19187597 -0.05260858
##                          PC6           PC7         PC8         PC9         PC10
## calories         -0.11018887  0.0015081457 -0.17795499  0.23353763 -0.150158468
## total_fat         0.16360123 -0.3633090618  0.13287123  0.01079130 -0.003831069
## cholesterol       0.25331796 -0.2862792525 -0.14395705 -0.34627570  0.160173404
## sodium            0.61187040 -0.1006024839 -0.12870623  0.06642504 -0.197334750
## choline          -0.33757182 -0.2818999891 -0.53782974 -0.21267369 -0.452220572
## folate           -0.32417983 -0.5782226879  0.29207545  0.46858222  0.239061362
## folic_acid       -0.20444712 -0.1528121378 -0.05524841 -0.43491158  0.305525877
## niacin            0.04544823 -0.0792876673  0.03645436 -0.34154796 -0.015108050
## pantothenic_acid  0.13388771  0.1472416164 -0.19001373  0.42365893 -0.261301142
## riboflavin        0.06813964  0.3736129060  0.08052513 -0.04563109  0.300898060
## thiamin          -0.20511282  0.2170974335  0.07617543 -0.07460581  0.045302111
## vitamin_a        -0.19365595  0.2226183843  0.51395177 -0.19231392 -0.435943468
## vitamin_a_rae     0.23939104  0.0009782173 -0.02496509  0.03298645  0.254585345
## carotene_alpha    0.29378543 -0.1662290179  0.22669196 -0.06557853 -0.089406364
## carotene_beta    -0.11775157  0.2165419950 -0.41217350  0.14461563  0.365908493
##                         PC11          PC12         PC13        PC14        PC15
## calories          0.39281872 -0.0568560056  0.003348553 -0.46128505  0.15923312
## total_fat        -0.50445328 -0.0388447490  0.130045184  0.35883322  0.06309202
## cholesterol       0.14770521 -0.0999371085  0.182675741 -0.39897968 -0.42962401
## sodium           -0.02644088  0.1726832757  0.056122630 -0.02759144 -0.12604398
## choline          -0.05168794 -0.3377800006  0.064635287  0.15066951  0.04084531
## folate            0.10825041  0.0214699008  0.092987247 -0.08585684 -0.03314396
## folic_acid       -0.24623533  0.2237656657 -0.615926459 -0.22635639  0.03284080
## niacin            0.35775766  0.4551513597  0.254036157  0.20296550  0.50036287
## pantothenic_acid -0.17104162  0.2110314131 -0.400628883 -0.05602250 -0.04145774
## riboflavin       -0.23608888 -0.5097680503  0.234584878 -0.18694313  0.23217324
## thiamin           0.22600781  0.0677936316  0.056887857  0.37119040 -0.66384344
## vitamin_a        -0.02850743  0.0881389911 -0.011192010 -0.17369311 -0.04530423
## vitamin_a_rae     0.45707234 -0.3219781708 -0.429306958  0.36978893  0.12446879
## carotene_alpha    0.03368478 -0.0005076429 -0.008356985 -0.18031789 -0.03033686
## carotene_beta    -0.13695630  0.4040161396  0.294216556 -0.02660041 -0.03486261

summary(pca)

## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6     PC7
## Standard deviation     1.8458 1.5338 1.20214 1.12807 1.07323 1.02894 0.98203
## Proportion of Variance 0.2167 0.1496 0.09192 0.08094 0.07326 0.06734 0.06134
## Cumulative Proportion  0.2167 0.3663 0.45826 0.53920 0.61247 0.67981 0.74115
##                            PC8     PC9    PC10    PC11    PC12    PC13    PC14
## Standard deviation     0.91597 0.80904 0.77488 0.69297 0.65712 0.64575 0.61072
## Proportion of Variance 0.05337 0.04163 0.03819 0.03055 0.02747 0.02652 0.02372
## Cumulative Proportion  0.79452 0.83615 0.87434 0.90489 0.93236 0.95888 0.98260
##                          PC15
## Standard deviation     0.5230
## Proportion of Variance 0.0174
## Cumulative Proportion  1.0000

As we can see, the first two principal components explain around 37% of the total variance. Now we need to determine the proper number of principal components. We will do that using two different approaches.
The first one is based on the scree plot.

fviz_eig(pca, choice='eigenvalue')

This plot presents the eigenvalue of each principal component. Kaiser’s rule tells us that we should keep all components whose eigenvalues are greater than 1 (Rea 2016). Here, the first six components have eigenvalues greater than 1, therefore 6 components should be chosen.
The second approach requires that the chosen components explain at least 70% of the variance.

#table for each component 
eig.val<-get_eigenvalue(pca)
eig.val

##        eigenvalue variance.percent cumulative.variance.percent
## Dim.1   3.4068344        21.669941                    21.66994
## Dim.2   2.3525755        14.964089                    36.63403
## Dim.3   1.4451359         9.192114                    45.82614
## Dim.4   1.2725461         8.094317                    53.92046
## Dim.5   1.1518316         7.326485                    61.24695
## Dim.6   1.0587091         6.734158                    67.98110
## Dim.7   0.9643735         6.134115                    74.11522
## Dim.8   0.8389999         5.336648                    79.45187
## Dim.9   0.6545461         4.163388                    83.61526
## Dim.10  0.6004342         3.819198                    87.43445
## Dim.11  0.4802137         3.054508                    90.48896
## Dim.12  0.4318052         2.746595                    93.23556
## Dim.13  0.4169950         2.652391                    95.88795
## Dim.14  0.3729838         2.372448                    98.26039
## Dim.15  0.2734916         1.739605                   100.00000

We can see that in order to explain at least 70% of the total variance, we should take into account 7 components (74% of variance explained). Thus, I have decided to choose 7 principal components for further analysis.
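Both selection rules can be written down in a few lines. The sketch below reuses the eigenvalues printed in the table above; the variable names are illustrative.

```r
# Eigenvalues copied from the get_eigenvalue(pca) table above
eigenvalues <- c(3.4068344, 2.3525755, 1.4451359, 1.2725461, 1.1518316,
                 1.0587091, 0.9643735, 0.8389999, 0.6545461, 0.6004342,
                 0.4802137, 0.4318052, 0.4169950, 0.3729838, 0.2734916)

# Kaiser's rule: keep components with an eigenvalue greater than 1
n_kaiser <- sum(eigenvalues > 1)

# Variance rule: smallest number of components explaining at least 70%
cum_share <- cumsum(eigenvalues) / sum(eigenvalues)
n_70 <- which(cum_share >= 0.70)[1]

print(c(kaiser = n_kaiser, seventy_percent = n_70))
# kaiser = 6, seventy_percent = 7
```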

fviz_pca_var(pca, col.var="contrib", gradient.cols = wes_palette("Rushmore1"))

We can see that carotene alpha, carotene beta and sodium are amongst the most influential variables. The next set of graphs shows the contribution of individual variables to each of the seven principal components.

# contributions of individual variables to PC
var<-get_pca_var(pca)
a<-fviz_contrib(pca, "var", color="navy blue", axes=1, xtickslab.rt=90, ggtheme=theme_classic(), palette="Set1", title = NULL) 
b<-fviz_contrib(pca, "var", color="navy blue", axes=2, xtickslab.rt=90, ggtheme=theme_classic(), palette="Set1")
c<-fviz_contrib(pca, "var", color="navy blue", axes=3, xtickslab.rt=90, ggtheme=theme_classic(), palette="Set1")
d<-fviz_contrib(pca, "var", color="navy blue", axes=4, xtickslab.rt=90, ggtheme=theme_classic(), palette="Set1")
e<-fviz_contrib(pca, "var", color="navy blue", axes=5, xtickslab.rt=90, ggtheme=theme_classic(), palette="Set1")
f<-fviz_contrib(pca, "var", color="navy blue", axes=6, xtickslab.rt=90, ggtheme=theme_classic(), palette="Set1")
g<-fviz_contrib(pca, "var", color="navy blue", axes=7, xtickslab.rt=90, ggtheme=theme_classic(), palette="Set1")
grid.arrange(a,b,c,d,e,f,g, top='Contribution to the first seven Principal Components')

To draw meaningful conclusions from the analysis, it is useful to apply a rotated PCA. A varimax rotation redistributes the loadings so that each variable loads strongly on as few components as possible, which makes it easier to see which variables contribute to which component.
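The effect of a varimax rotation can be sketched with base R alone, using stats::varimax() on loadings derived from prcomp(). This is an illustration on a built-in data set, not a reproduction of what psych::principal() does internally; the choice of two components and all object names are assumptions made for the example.

```r
# Standardise a few columns of the built-in mtcars data
X <- scale(mtcars[, c("mpg", "disp", "hp", "wt", "qsec", "drat")])
pca_demo <- prcomp(X)

# Unrotated loadings: eigenvectors scaled by the component standard deviations
k <- 2
L <- pca_demo$rotation[, 1:k] %*% diag(pca_demo$sdev[1:k])

# Varimax rotation of the retained loadings
rot <- varimax(L)
L_rot <- unclass(rot$loadings)

# An orthogonal rotation redistributes loadings between components
# but leaves each variable's communality (row sum of squares) unchanged
h2_before <- rowSums(L^2)
h2_after  <- rowSums(L_rot^2)
print(round(cbind(h2_before, h2_after), 3))
```

Because the communalities do not change, the rotation does not alter how much of each variable is explained by the retained components; it only makes the loading pattern easier to read.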

#rotated pca
pca_rotated<-principal(nutritiontrim, nfactors=7, rotate="varimax")
pca_rotated

## Principal Components Analysis
## Call: principal(r = nutritiontrim, nfactors = 7, rotate = "varimax")
## Standardized loadings (pattern matrix) based upon correlation matrix
##                    RC1   RC2   RC3   RC7   RC6   RC4   RC5   h2    u2 com
## calories          0.15 -0.08  0.83 -0.07 -0.03 -0.10 -0.01 0.74 0.260 1.1
## total_fat         0.11 -0.06  0.78  0.13  0.01  0.32 -0.03 0.74 0.264 1.4
## cholesterol       0.15 -0.08  0.07 -0.02  0.02  0.90  0.00 0.84 0.163 1.1
## sodium            0.04 -0.09 -0.03 -0.01  0.03 -0.01  0.95 0.91 0.088 1.0
## choline           0.04  0.59 -0.08 -0.15  0.57  0.08 -0.04 0.70 0.296 2.2
## folate            0.12 -0.16 -0.17  0.30  0.73  0.08 -0.07 0.71 0.287 1.7
## folic_acid        0.11  0.00  0.39 -0.16  0.60 -0.14  0.19 0.61 0.389 2.3
## niacin            0.70 -0.15  0.11 -0.13  0.19  0.29  0.00 0.66 0.337 1.8
## pantothenic_acid  0.76  0.03  0.00 -0.02 -0.04  0.31 -0.12 0.69 0.309 1.4
## riboflavin        0.82 -0.03  0.08  0.10  0.05 -0.08  0.19 0.74 0.257 1.2
## thiamin           0.71 -0.07  0.42 -0.07  0.13 -0.29 -0.06 0.79 0.215 2.1
## vitamin_a         0.00  0.15  0.00  0.85  0.09 -0.08 -0.06 0.76 0.243 1.1
## vitamin_a_rae    -0.10  0.51  0.06  0.55 -0.12  0.26  0.24 0.71 0.292 3.0
## carotene_alpha    0.01  0.75 -0.10  0.05 -0.08 -0.10 -0.11 0.60 0.405 1.1
## carotene_beta    -0.19  0.72 -0.02  0.35 -0.02 -0.08 -0.02 0.69 0.314 1.6
## 
##                        RC1  RC2  RC3  RC7  RC6  RC4  RC5
## SS loadings           2.38 1.79 1.70 1.34 1.31 1.31 1.07
## Proportion Var        0.16 0.12 0.11 0.09 0.09 0.09 0.07
## Cumulative Var        0.16 0.28 0.39 0.48 0.57 0.65 0.73
## Proportion Explained  0.22 0.16 0.16 0.12 0.12 0.12 0.10
## Cumulative Proportion 0.22 0.38 0.54 0.66 0.78 0.90 1.00
## 
## Mean item complexity =  1.6
## Test of the hypothesis that 7 components are sufficient.
## 
## The root mean square of the residuals (RMSR) is  0.07 
##  with the empirical chi square  193.7  with prob <  6.3e-30 
## 
## Fit based upon off diagonal values = 0.86

summary(pca_rotated)

## 
## Factor analysis with Call: principal(r = nutritiontrim, nfactors = 7, rotate = "varimax")
## 
## Test of the hypothesis that 7 factors are sufficient.
## The degrees of freedom for the model is 21  and the objective function was  2.11 
## The number of observations was  175  with Chi Square =  345.77  with prob <  1.4e-60 
## 
## The root mean square of the residuals (RMSA) is  0.07

# we can try to look at the data and group nutritional values into groups
#threshold was chosen arbitrarily at the 0.45 level
print(loadings(pca_rotated), digits=3, cutoff=0.45, sort=TRUE)

## 
## Loadings:
##                  RC1    RC2    RC3    RC7    RC6    RC4    RC5   
## niacin            0.699                                          
## pantothenic_acid  0.759                                          
## riboflavin        0.824                                          
## thiamin           0.708                                          
## choline                  0.588                0.566              
## carotene_alpha           0.746                                   
## carotene_beta            0.723                                   
## calories                        0.835                            
## total_fat                       0.777                            
## vitamin_a                              0.847                     
## vitamin_a_rae            0.508         0.549                     
## folate                                        0.734              
## folic_acid                                    0.604              
## cholesterol                                          0.895       
## sodium                                                      0.949
## 
##                  RC1   RC2   RC3   RC7   RC6   RC4   RC5
## SS loadings    2.375 1.785 1.696 1.337 1.311 1.305 1.072
## Proportion Var 0.158 0.119 0.113 0.089 0.087 0.087 0.071
## Cumulative Var 0.158 0.277 0.390 0.480 0.567 0.654 0.725

My knowledge of nutritional values is limited, and it is hard to assess the results without expert knowledge. Still, we can see, for example, that the amount of total fat is connected with the amount of calories, and that choline occurs together with folate and folic acid (which means that vitamins B4 and B9 can often be found in the same products).

Finally, we can visualize the results.

#visualisation of pca 
# let's choose matching colors - all colors can be displayed by the function colors() 
fviz_pca_ind(pca, col.ind="cos2", geom="point", gradient.cols=c("darkslategray1", "skyblue", "blue1", "navy blue" ))

#3D plot of the food items on the first three principal components
pca3d(pca, palette=c("darkslategray1", "salmon", "navy blue" ))

## [1] 0.09998575 0.09066615 0.10272537
## Creating new device

rglwidget()

The 3D plot is not very informative; therefore, we assign the points to the clusters obtained in the paper on clustering. The most effective way to group the data there was the fuzzy k-means algorithm with 3 clusters.

# 3D plot with points assigned to clusters
fuzzykm <- fcm(nutritiontrim, centers=3, m=1.5)
fuzzykm2 <- ppclust2(fuzzykm, "kmeans")
pca3d(pca, group=fuzzykm2$cluster, palette=c("darkslategray1", "salmon", "navy blue" ))

## [1] 0.09998575 0.09066615 0.10272537

rglwidget()

We can see that the food items fall into three main groups, and these groups correspond to the previously obtained clusters. Each group is mostly described by a different principal component.
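The colouring above relies on a crisp cluster label for every food item. A fuzzy partition is commonly hardened by assigning each item to the cluster with its largest membership degree; whether ppclust2() does exactly this internally is an assumption here. A sketch on a made-up membership matrix:

```r
# Made-up membership matrix: 4 items (rows) x 3 clusters (columns),
# each row sums to 1
U <- rbind(c(0.80, 0.15, 0.05),
           c(0.10, 0.70, 0.20),
           c(0.25, 0.30, 0.45),
           c(0.55, 0.40, 0.05))

# Crisp label: index of the largest membership in each row
hard <- max.col(U, ties.method = "first")
print(hard)
# 1 2 3 1
```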

Weighted Least Squares Factor Analysis

The third method used to extract information from the data set and to group the variables is Weighted Least Squares Factor Analysis. In this approach we decompose the correlation matrix and calculate the communalities (the sum of squared loadings of each variable). Then we assign weights to the correlation coefficients (between the original data and the matrix of created factors) in such a way that the most unique variables are assigned low weights. As a result, we value commonly occurring variables more than rare ones (Revelle 2021).
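The communality of a variable is the sum of its squared loadings, and the uniqueness is one minus that sum. As a worked example, the sketch below recomputes both for two variables from the rounded loadings in the rotated PCA output shown earlier (the h2 and u2 columns there); small differences from the printed values are due to rounding of the loadings.

```r
# Rounded loadings of two variables, copied from the rotated PCA output
loadings_demo <- rbind(
  calories = c(0.15, -0.08,  0.83, -0.07, -0.03, -0.10, -0.01),
  sodium   = c(0.04, -0.09, -0.03, -0.01,  0.03, -0.01,  0.95)
)
colnames(loadings_demo) <- c("RC1", "RC2", "RC3", "RC7", "RC6", "RC4", "RC5")

# Communality: sum of squared loadings per variable
h2 <- rowSums(loadings_demo^2)

# Uniqueness: the part of the variance not shared with the components
u2 <- 1 - h2

print(round(cbind(h2, u2), 3))
```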

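To perform the Weighted Least Squares Factor Analysis I have decided to use the fa() function from the psych package; the GPArotation library is loaded because fa() relies on it for its default oblimin rotation. The fa.diagram() function then visualises the contribution of each variable to each group.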

library(GPArotation)
#WLSFA for our data set
f3wtest <- fa(nutritiontrim, 3, n.obs = 175, fm="wls")
#visualization of results 
fa.diagram(f3wtest)

On the diagram we can see the three groups to which the variables were assigned, as well as the loading of each variable on its factor. The first and largest group consists of variables such as thiamin, calories and total fat. The second one consists mainly of vitamins. Contrary to the previous method, sodium was assigned to a group of its own, while cholesterol and folate were not assigned to any group.

Summary

An analysis of 175 food items and 15 nutritional values was performed. After the initial data preparation, three methods were applied. The first one, Multidimensional Scaling, allowed us to reduce the 15-dimensional data set to two dimensions and plot the results. However, the obtained results were of poor quality. The second one, Principal Component Analysis, allowed us to reduce the number of variables in order to represent the original information with a smaller matrix. Two methods were used to determine the number of components. Moreover, the rotated PCA allowed us to distinguish nutritional elements that commonly occur within the same food items. Finally, we visualised the results on a 3D plot, using the results obtained in the paper regarding clustering to colour the food items from the same clusters. The third method used was Weighted Least Squares Factor Analysis. This approach allowed us to group the variables into 3 categories, which differed from those obtained in the PCA.

Bibliography

Kruskal, J. B. (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29, 1–27.
Rea, Alethea & Rea, William. (2016). How Many Components should be Retained from a Multivariate Time Series PCA?
Górniak, Jarosław. (1998). Analiza czynnikowa i analiza głównych składowych (see: https://kb.osu.edu/bitstream/handle/1811/69494/ASK_1998_83_102.pdf).
Revelle, William. (2021). How To: Use the psych package for Factor Analysis and data reduction. Department of Psychology, Northwestern University (see: https://cran.r-project.org/web/packages/psychTools/vignettes/factor.pdf).