Introduction and project overview

The purpose of this project is to investigate and carry out dimension reduction on the Food Nutrition dataset from Kaggle (https://www.kaggle.com/datasets/utsavdey1410/food-nutrition-dataset). The project analyzes a comprehensive nutritional food database that provides detailed information on the macro- and micronutrient content of various food items. The dataset includes essential nutritional values such as caloric content, fat, carbohydrates, proteins, vitamins, and minerals, which are crucial for understanding dietary needs. With this data, one can explore the nutritional value of foods and identify patterns in food composition, aiding healthier dietary planning.

Principal Component Analysis (PCA)

Principal Component Analysis is a dimensionality reduction technique that simplifies complex datasets by transforming correlated variables into a smaller set of uncorrelated components. It retains the most important information while reducing redundancy, making it ideal for visualizing and analyzing high-dimensional data. PCA will be used in this project to reduce the dimensionality of the nutritional data, helping to identify the most significant factors that explain the variance in food composition.
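
To make the idea concrete, here is a minimal sketch on simulated data (not the food dataset; the variables `a`, `b` and `c` are invented for illustration). It shows that the component scores returned by `prcomp` are uncorrelated even though the input variables are not.

```r
# Simulated data (not the food dataset): a and b share a common source z.
set.seed(1)
n <- 200
z <- rnorm(n)
x <- data.frame(
  a = z + rnorm(n, sd = 0.3),
  b = 2 * z + rnorm(n, sd = 0.3),
  c = rnorm(n)
)

pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Correlations between the original variables: a and b are strongly correlated.
round(cor(x), 2)

# Correlations between the component scores: off-diagonal entries are zero
# up to floating-point error.
round(cor(pca$x), 2)

# Share of the total variance carried by each component.
pca$sdev^2 / sum(pca$sdev^2)
```

Because `a` and `b` carry almost the same information, the first component should absorb most of their joint variance, which is the redundancy reduction described above.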

Preprocessing

The first step is to load and investigate the dataset.

food <- read.csv("FOOD-DATA-GROUP1.csv")
head(food,1)
##   X Unnamed..0         food Caloric.Value Fat Saturated.Fats
## 1 0          0 cream cheese            51   5            2.9
##   Monounsaturated.Fats Polyunsaturated.Fats Carbohydrates Sugars Protein
## 1                  1.3                  0.2           0.8    0.5     0.9
##   Dietary.Fiber Cholesterol Sodium Water Vitamin.A Vitamin.B1 Vitamin.B11
## 1             0        14.6  0.016   7.6       0.2      0.033       0.064
##   Vitamin.B12 Vitamin.B2 Vitamin.B3 Vitamin.B5 Vitamin.B6 Vitamin.C Vitamin.D
## 1       0.092      0.097      0.084      0.052      0.096     0.004         0
##   Vitamin.E Vitamin.K Calcium Copper  Iron Magnesium Manganese Phosphorus
## 1         0       0.1   0.008   14.1 0.082     0.027       1.3      0.091
##   Potassium Selenium  Zinc Nutrition.Density
## 1      15.5     19.1 0.039              7.07
str(food)
## 'data.frame':    551 obs. of  37 variables:
##  $ X                   : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ Unnamed..0          : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ food                : chr  "cream cheese" "neufchatel cheese" "requeijao cremoso light catupiry" "ricotta cheese" ...
##  $ Caloric.Value       : int  51 215 49 30 30 19 116 113 71 19 ...
##  $ Fat                 : num  5 19.4 3.6 2 2.3 0.2 9.1 9.3 4.5 1.3 ...
##  $ Saturated.Fats      : num  2.9 10.9 2.3 1.3 1.4 0.1 5.3 5.3 2.7 0.9 ...
##  $ Monounsaturated.Fats: num  1.3 4.9 0.9 0.5 0.6 0.091 2.8 2.6 1.4 0.4 ...
##  $ Polyunsaturated.Fats: num  0.2 0.8 0 0.002 0.042 0.075 0.5 0.3 0.1 0.035 ...
##  $ Carbohydrates       : num  0.8 3.1 0.9 1.5 1.2 1.4 0.1 0.9 0.6 0.2 ...
##  $ Sugars              : num  0.5 2.7 3.4 0.091 0.9 1 0.1 0.1 0.046 0.088 ...
##  $ Protein             : num  0.9 7.8 0.8 1.5 1.2 2.8 8.3 6.4 6.4 1.6 ...
##  $ Dietary.Fiber       : num  0 0 0.1 0 0 0 0 0 0 0 ...
##  $ Cholesterol         : num  14.6 62.9 0 9.8 8.1 2.2 30.8 27.7 12.2 5.2 ...
##  $ Sodium              : num  0.016 0.3 0 0.017 0.046 0.1 0.2 0.2 0.2 0.008 ...
##  $ Water               : num  7.6 53.6 0 14.7 10 12.9 9.3 10.3 5.4 1.5 ...
##  $ Vitamin.A           : num  0.2 0.2 0 0.075 0.016 0.063 0.061 0.054 0.067 0.064 ...
##  $ Vitamin.B1          : num  0.033 0.099 0 0.019 0.08 0.02 0.021 0.031 0.062 0.058 ...
##  $ Vitamin.B11         : num  0.064 0.079 0 0.079 0.062 0.089 0.072 0.005 0.099 0.026 ...
##  $ Vitamin.B12         : num  0.092 0.09 0 0.091 0.049 0.092 0.078 0.073 0.059 0.045 ...
##  $ Vitamin.B2          : num  0.097 0.1 0 0.027 0.026 0.021 0.004 0.1 0.057 0.059 ...
##  $ Vitamin.B3          : num  0.084 0.2 0 0.041 0.08 0.025 0.043 0.01 0.039 0.055 ...
##  $ Vitamin.B5          : num  0.052 0.5 0 0.016 0.1 0.2 0.2 0.1 0.06 0.025 ...
##  $ Vitamin.B6          : num  0.096 0.078 0 0.007 0.003 0.038 0.051 0.005 0.066 0.029 ...
##  $ Vitamin.C           : num  0.004 0 0 0.006 0 0 0 0 0 0 ...
##  $ Vitamin.D           : num  0 0 0 0 0.036 0 0.034 0.06 0.095 0.073 ...
##  $ Vitamin.E           : num  0 0.3 0 0.001 0.009 0.049 0.035 0.2 0.018 0.078 ...
##  $ Vitamin.K           : num  0.1 0.045 0 0.011 0.019 0.059 0.048 0.035 0.021 0.004 ...
##  $ Calcium             : num  0.008 99.5 0 0.097 22.2 ...
##  $ Copper              : num  14.1 0.034 0 41.2 0.072 0.039 0.033 0.099 0.051 0.046 ...
##  $ Iron                : num  0.082 0.1 0 0.097 0.008 0.053 0.094 0.077 0.1 0.03 ...
##  $ Magnesium           : num  0.027 8.5 0 0.096 1.2 4 10.1 7.6 7.9 2.1 ...
##  $ Manganese           : num  1.3 0.088 0 4 0.098 0.028 0.002 0.063 0.073 0.002 ...
##  $ Phosphorus          : num  0.091 117.3 0 0.024 22.8 ...
##  $ Potassium           : num  15.5 129.2 0 30.8 37.1 ...
##  $ Selenium            : num  19.1 0.054 0 43.8 0.034 0.013 0.079 0.009 0.045 0.087 ...
##  $ Zinc                : num  0.039 0.7 0 0.035 0.053 0.3 1.1 1 0.5 0.1 ...
##  $ Nutrition.Density   : num  7.07 130.1 5.4 5.2 27.01 ...
food <- food[3:length(food)]

The caloric values need to be converted to a numeric variable.

food$Caloric.Value <- as.numeric(gsub("[^0-9.]", "", food$Caloric.Value))
dim(food)
## [1] 551  35

Let’s check for NAs in the dataset.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
sum(is.na(food))
## [1] 0

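The data needs to be scaled. The first column (the food name) is dropped at the same time, because only numeric variables can be standardized.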

food <- scale(food[, 2:length(food)])
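
As a small illustration of what `scale` does, the sketch below standardizes the first four Caloric.Value and Fat entries shown in the `str` output above and compares the result with a manual z-score computation. The object names `m`, `manual` and `scaled` are only for this example.

```r
# First four Caloric.Value and Fat entries from the str() output above.
m <- cbind(Caloric.Value = c(51, 215, 49, 30),
           Fat           = c(5, 19.4, 3.6, 2))

# Manual z-scores: subtract the column mean, then divide by the column sd.
centered <- sweep(m, 2, colMeans(m), "-")
manual   <- sweep(centered, 2, apply(m, 2, sd), "/")

# scale() performs the same centering and scaling by default.
scaled <- scale(m)

all.equal(manual, scaled, check.attributes = FALSE)
colMeans(scaled)      # approximately 0 for every column
apply(scaled, 2, sd)  # exactly 1 up to rounding
```

After this step every variable has mean 0 and standard deviation 1, so variables measured in large units do not dominate the components.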

Next, the scaled data is summarized.

summary(food)
##  Caloric.Value          Fat          Saturated.Fats    Monounsaturated.Fats
##  Min.   :-1.1763   Min.   :-0.8583   Min.   :-0.6898   Min.   :-0.7222     
##  1st Qu.:-0.7170   1st Qu.:-0.6590   1st Qu.:-0.5971   1st Qu.:-0.6139     
##  Median :-0.2578   Median :-0.3322   Median :-0.3563   Median :-0.3252     
##  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000     
##  3rd Qu.: 0.5001   3rd Qu.: 0.2896   3rd Qu.: 0.1996   3rd Qu.: 0.2073     
##  Max.   : 6.7289   Max.   : 6.1170   Max.   : 7.3702   Max.   : 7.9411     
##  Polyunsaturated.Fats Carbohydrates         Sugars            Protein       
##  Min.   :-0.6254      Min.   :-0.7797   Min.   :-0.36334   Min.   :-0.9728  
##  1st Qu.:-0.5102      1st Qu.:-0.7797   1st Qu.:-0.36334   1st Qu.:-0.6929  
##  Median :-0.3085      Median :-0.5283   Median :-0.36334   Median :-0.3443  
##  Mean   : 0.0000      Mean   : 0.0000   Mean   : 0.00000   Mean   : 0.0000  
##  3rd Qu.: 0.1237      3rd Qu.: 0.6349   3rd Qu.:-0.06826   3rd Qu.: 0.3820  
##  Max.   :10.9280      Max.   : 5.5439   Max.   : 9.13280   Max.   : 3.6172  
##  Dietary.Fiber      Cholesterol          Sodium            Water        
##  Min.   :-0.5092   Min.   :-0.8781   Min.   :-0.9011   Min.   :-1.1487  
##  1st Qu.:-0.5092   1st Qu.:-0.7391   1st Qu.:-0.7439   1st Qu.:-0.8560  
##  Median :-0.5092   Median :-0.3948   Median :-0.2723   Median :-0.2820  
##  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000  
##  3rd Qu.: 0.1863   3rd Qu.: 0.3002   3rd Qu.: 0.5137   3rd Qu.: 0.7615  
##  Max.   : 7.6051   Max.   : 4.1474   Max.   : 8.6882   Max.   : 4.9055  
##    Vitamin.A          Vitamin.B1       Vitamin.B11       Vitamin.B12      
##  Min.   :-0.47732   Min.   :-0.7381   Min.   :-0.7148   Min.   :-1.10820  
##  1st Qu.:-0.47732   1st Qu.:-0.5810   1st Qu.:-0.4862   1st Qu.:-0.89470  
##  Median :-0.22663   Median :-0.3286   Median :-0.1597   Median :-0.06737  
##  Mean   : 0.00000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.00000  
##  3rd Qu.: 0.04495   3rd Qu.: 0.2143   3rd Qu.: 0.1886   3rd Qu.: 0.70657  
##  Max.   :12.05729   Max.   : 8.3095   Max.   :13.4342   Max.   : 9.56695  
##    Vitamin.B2        Vitamin.B3         Vitamin.B5        Vitamin.B6     
##  Min.   :-0.6796   Min.   :-0.61805   Min.   :-0.4468   Min.   :-0.6283  
##  1st Qu.:-0.5085   1st Qu.:-0.55090   1st Qu.:-0.4058   1st Qu.:-0.5350  
##  Median :-0.3409   Median :-0.31589   Median :-0.2416   Median :-0.4254  
##  Mean   : 0.0000   Mean   : 0.00000   Mean   : 0.0000   Mean   : 0.0000  
##  3rd Qu.: 0.3367   3rd Qu.: 0.08698   3rd Qu.: 0.0456   3rd Qu.: 0.1832  
##  Max.   :12.1936   Max.   : 9.08453   Max.   :12.4375   Max.   : 8.0955  
##    Vitamin.C          Vitamin.D         Vitamin.E          Vitamin.K       
##  Min.   :-0.41544   Min.   :-0.1226   Min.   :-0.44666   Min.   :-0.06039  
##  1st Qu.:-0.41544   1st Qu.:-0.1226   1st Qu.:-0.44666   1st Qu.:-0.06039  
##  Median :-0.40122   Median :-0.1226   Median :-0.41321   Median :-0.05912  
##  Mean   : 0.00000   Mean   : 0.0000   Mean   : 0.00000   Mean   : 0.00000  
##  3rd Qu.:-0.08958   3rd Qu.:-0.1070   3rd Qu.: 0.08356   3rd Qu.:-0.05158  
##  Max.   : 8.05683   Max.   :20.7554   Max.   :11.05506   Max.   :23.38605  
##     Calcium             Copper             Iron           Magnesium      
##  Min.   :-0.55607   Min.   :-0.1933   Min.   :-0.8127   Min.   :-0.7627  
##  1st Qu.:-0.49950   1st Qu.:-0.1922   1st Qu.:-0.6534   1st Qu.:-0.6192  
##  Median :-0.33153   Median :-0.1911   Median :-0.2818   Median :-0.2969  
##  Mean   : 0.00000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000  
##  3rd Qu.: 0.03841   3rd Qu.:-0.1866   3rd Qu.: 0.3553   3rd Qu.: 0.2771  
##  Max.   : 6.96876   Max.   :14.8723   Max.   :10.3892   Max.   : 7.5426  
##    Manganese         Phosphorus        Potassium          Selenium      
##  Min.   :-0.2808   Min.   :-0.9022   Min.   :-0.9134   Min.   :-0.2288  
##  1st Qu.:-0.2770   1st Qu.:-0.6959   1st Qu.:-0.6767   1st Qu.:-0.2286  
##  Median :-0.2729   Median :-0.3516   Median :-0.3263   Median :-0.2284  
##  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000  
##  3rd Qu.:-0.2411   3rd Qu.: 0.4012   3rd Qu.: 0.2864   3rd Qu.:-0.2282  
##  Max.   : 8.5792   Max.   : 4.7849   Max.   : 7.5003   Max.   :10.7426  
##       Zinc           Nutrition.Density
##  Min.   :-0.253536   Min.   :-0.7679  
##  1st Qu.:-0.207521   1st Qu.:-0.5599  
##  Median :-0.130830   Median :-0.3238  
##  Mean   : 0.000000   Mean   : 0.0000  
##  3rd Qu.: 0.007215   3rd Qu.: 0.1392  
##  Max.   :22.339749   Max.   : 6.3745

Correlation matrix and KMO

The Kaiser-Meyer-Olkin (KMO) measure assesses the adequacy of sample data for factor analysis. It evaluates the proportion of variance in variables that might be caused by underlying factors. A KMO value closer to 1 indicates that the data is suitable for factor analysis, while values below 0.5 suggest that factor analysis may not be appropriate.
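
The overall KMO compares squared correlations with squared partial correlations. The sketch below is a simplified base R version of that formula on simulated data; `kmo_overall` is a hypothetical helper written for this illustration, not the `psych` implementation.

```r
# Hypothetical helper: overall KMO from the correlation matrix.
kmo_overall <- function(x) {
  r <- cor(x)
  r_inv <- solve(r)
  # Partial correlations of each pair given all other variables,
  # obtained from the inverse correlation matrix.
  p <- -r_inv / sqrt(outer(diag(r_inv), diag(r_inv)))
  off <- row(r) != col(r)  # off-diagonal entries only
  sum(r[off]^2) / (sum(r[off]^2) + sum(p[off]^2))
}

set.seed(2)
n <- 300
f <- rnorm(n)

# Four variables driven by one common factor: KMO should be high.
shared <- sapply(1:4, function(i) f + rnorm(n, sd = 0.5))

# Four independent variables: KMO should be close to 0.5.
noise <- matrix(rnorm(n * 4), ncol = 4)

kmo_overall(shared)
kmo_overall(noise)
```

When the variables share a common source, the partial correlations are small relative to the raw correlations, which pushes the ratio toward 1.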

library("psych")
## Warning: package 'psych' was built under R version 4.4.2
KMO(food)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = food)
## Overall MSA =  0.51
## MSA for each item = 
##        Caloric.Value                  Fat       Saturated.Fats 
##                 0.82                 0.40                 0.75 
## Monounsaturated.Fats Polyunsaturated.Fats        Carbohydrates 
##                 0.70                 0.80                 0.25 
##               Sugars              Protein        Dietary.Fiber 
##                 0.87                 0.39                 0.13 
##          Cholesterol               Sodium                Water 
##                 0.95                 0.84                 0.87 
##            Vitamin.A           Vitamin.B1          Vitamin.B11 
##                 0.14                 0.89                 0.78 
##          Vitamin.B12           Vitamin.B2           Vitamin.B3 
##                 0.79                 0.88                 0.93 
##           Vitamin.B5           Vitamin.B6            Vitamin.C 
##                 0.79                 0.83                 0.10 
##            Vitamin.D            Vitamin.E            Vitamin.K 
##                 0.58                 0.92                 0.21 
##              Calcium               Copper                 Iron 
##                 0.24                 0.72                 0.24 
##            Magnesium            Manganese           Phosphorus 
##                 0.96                 0.59                 0.95 
##            Potassium             Selenium                 Zinc 
##                 0.92                 0.59                 0.84 
##    Nutrition.Density 
##                 0.31

As the overall KMO turns out to be rather low (0.51), I decided to remove several variables with low MSA: Carbohydrates, Dietary Fiber, Vitamin A, Vitamin C, Vitamin K and Calcium.

food <- as.data.frame(food)
food <- food %>% select(-Carbohydrates, -Dietary.Fiber, -Vitamin.A, -Vitamin.C, -Vitamin.K, -Calcium)

Let’s check KMO once again.

KMO(food)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = food)
## Overall MSA =  0.81
## MSA for each item = 
##        Caloric.Value                  Fat       Saturated.Fats 
##                 0.84                 0.85                 0.83 
## Monounsaturated.Fats Polyunsaturated.Fats               Sugars 
##                 0.87                 0.83                 0.53 
##              Protein          Cholesterol               Sodium 
##                 0.85                 0.90                 0.74 
##                Water           Vitamin.B1          Vitamin.B11 
##                 0.83                 0.79                 0.84 
##          Vitamin.B12           Vitamin.B2           Vitamin.B3 
##                 0.74                 0.80                 0.91 
##           Vitamin.B5           Vitamin.B6            Vitamin.D 
##                 0.82                 0.87                 0.62 
##            Vitamin.E               Copper                 Iron 
##                 0.89                 0.71                 0.79 
##            Magnesium            Manganese           Phosphorus 
##                 0.94                 0.65                 0.87 
##            Potassium             Selenium                 Zinc 
##                 0.88                 0.64                 0.52 
##    Nutrition.Density 
##                 0.80

The overall KMO value has increased to 0.81, indicating that the dataset is now more suitable for factor analysis. This represents a notable improvement compared to the previous result, suggesting that the removal of certain variables has enhanced the adequacy of the data.

dim(food)
## [1] 551  28

The data has 551 observations and 28 variables.

The Bartlett test is a statistical test used to determine whether the variables in a dataset are sufficiently correlated.

cortest.bartlett(food, n = nrow(food))
## R was not square, finding R from data
## $chisq
## [1] 13059.71
## 
## $p.value
## [1] 0
## 
## $df
## [1] 378

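The p-value of Bartlett’s test is reported as 0, meaning it is numerically indistinguishable from zero. The null hypothesis that the correlation matrix is an identity matrix is therefore rejected: the variables are sufficiently correlated for dimension reduction.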
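
For reference, the test statistic can be sketched in base R. The function below uses the standard chi-square approximation for Bartlett's test of sphericity; `bartlett_sphericity` is a hypothetical helper, and the data are simulated rather than taken from the food dataset.

```r
# Hypothetical helper: Bartlett's test of sphericity on a numeric matrix.
bartlett_sphericity <- function(x) {
  n <- nrow(x)
  p <- ncol(x)
  # Log-determinant of the correlation matrix, computed in a stable way.
  log_det <- as.numeric(determinant(cor(x), logarithm = TRUE)$modulus)
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log_det
  df <- p * (p - 1) / 2
  list(chisq = chisq,
       df = df,
       p.value = pchisq(chisq, df, lower.tail = FALSE))
}

set.seed(3)
n <- 200
f <- rnorm(n)
correlated <- sapply(1:4, function(i) f + rnorm(n))

res <- bartlett_sphericity(correlated)
res

# With 28 variables the degrees of freedom are 28 * 27 / 2,
# which matches the $df value printed above.
28 * 27 / 2
```

The statistic grows with the sample size and with how far the determinant of the correlation matrix is from 1, so strongly correlated data yield a very small p-value.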

Next, the correlation matrix is plotted to give deeper insight into the relationships between variables and to confirm that the data is suitable for further analysis.

library(corrplot)
## Warning: package 'corrplot' was built under R version 4.4.2
## corrplot 0.95 loaded
cor <- cor(food)
corrplot(cor, type = "lower", order = "hclust", tl.col = "black", tl.cex = 0.5)

Principal Component Analysis

The prcomp function is used to compute the principal components by singular value decomposition.

food.pca <- prcomp(food, center = TRUE, scale. = TRUE)
summary(food.pca)
## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5    PC6     PC7
## Standard deviation     2.8457 2.1296 1.67068 1.24872 1.16317 1.1099 1.05851
## Proportion of Variance 0.2892 0.1620 0.09969 0.05569 0.04832 0.0440 0.04002
## Cumulative Proportion  0.2892 0.4512 0.55087 0.60656 0.65488 0.6989 0.73890
##                            PC8     PC9    PC10   PC11   PC12    PC13    PC14
## Standard deviation     0.98296 0.92370 0.84311 0.8025 0.7216 0.68295 0.65136
## Proportion of Variance 0.03451 0.03047 0.02539 0.0230 0.0186 0.01666 0.01515
## Cumulative Proportion  0.77340 0.80387 0.82926 0.8523 0.8709 0.88752 0.90267
##                           PC15    PC16    PC17    PC18    PC19    PC20    PC21
## Standard deviation     0.63161 0.59210 0.56209 0.54600 0.52445 0.48373 0.46051
## Proportion of Variance 0.01425 0.01252 0.01128 0.01065 0.00982 0.00836 0.00757
## Cumulative Proportion  0.91692 0.92944 0.94072 0.95137 0.96119 0.96955 0.97712
##                           PC22    PC23    PC24    PC25    PC26    PC27    PC28
## Standard deviation     0.42289 0.39125 0.32154 0.27063 0.25526 0.19170 0.17340
## Proportion of Variance 0.00639 0.00547 0.00369 0.00262 0.00233 0.00131 0.00107
## Cumulative Proportion  0.98351 0.98898 0.99267 0.99529 0.99761 0.99893 1.00000

PC1 has the highest standard deviation and explains approximately 28.92% of the total variance.

There is also another approach: spectral decomposition, which is what the princomp function uses.

food.pca2<-princomp(food)
loadings(food.pca2)
## 
## Loadings:
##                      Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
## Caloric.Value         0.316         0.203                                   
## Fat                   0.275         0.311         0.165                     
## Saturated.Fats        0.188         0.368 -0.277  0.171        -0.154       
## Monounsaturated.Fats  0.207         0.287 -0.109  0.232         0.115  0.154
## Polyunsaturated.Fats  0.234         0.174  0.290                0.286  0.172
## Sugars                              0.219  0.349        -0.121  0.176       
## Protein               0.287        -0.277                                   
## Cholesterol           0.248        -0.125 -0.185  0.102                0.155
## Sodium                0.115         0.241  0.308 -0.196 -0.143 -0.431 -0.268
## Water                 0.190        -0.148  0.374 -0.147 -0.142 -0.339 -0.137
## Vitamin.B1            0.169                      -0.235  0.374  0.361 -0.281
## Vitamin.B11                  0.378               -0.176 -0.132              
## Vitamin.B12                  0.163        -0.376 -0.307 -0.293  0.182       
## Vitamin.B2            0.174               -0.207 -0.256  0.281        -0.361
## Vitamin.B3            0.246 -0.109 -0.232                       0.219       
## Vitamin.B5            0.131  0.337 -0.123         0.156  0.250              
## Vitamin.B6            0.211  0.146 -0.300                0.163              
## Vitamin.D                    0.334               -0.248 -0.297              
## Vitamin.E             0.164                0.297        -0.283  0.178  0.199
## Copper                       0.376               -0.178 -0.217              
## Iron                  0.178 -0.113  0.150  0.108 -0.422  0.310              
## Magnesium             0.236 -0.125 -0.224        -0.101 -0.103              
## Manganese                    0.412                       0.266              
## Phosphorus            0.272 -0.155 -0.173 -0.176        -0.150 -0.111       
## Potassium             0.261        -0.270                      -0.117       
## Selenium                     0.409                0.123  0.247 -0.133       
## Zinc                                             -0.453  0.173 -0.248  0.721
## Nutrition.Density     0.196         0.189 -0.251               -0.378 -0.101
##                      Comp.9 Comp.10 Comp.11 Comp.12 Comp.13 Comp.14 Comp.15
## Caloric.Value                0.140                                         
## Fat                                                                        
## Saturated.Fats                              -0.118          -0.288         
## Monounsaturated.Fats -0.310  0.117   0.258  -0.256          -0.103         
## Polyunsaturated.Fats                         0.147           0.391         
## Sugars                0.688  0.384                          -0.180         
## Protein                                      0.206                         
## Cholesterol           0.171          0.137   0.401  -0.624  -0.215         
## Sodium               -0.202 -0.270           0.257                  -0.117 
## Water                        0.145   0.290  -0.103          -0.266         
## Vitamin.B1           -0.134         -0.296   0.287   0.144  -0.462   0.162 
## Vitamin.B11                                                          0.401 
## Vitamin.B12           0.227 -0.271   0.520   0.145   0.345  -0.123         
## Vitamin.B2            0.268 -0.218          -0.458  -0.234   0.159  -0.230 
## Vitamin.B3           -0.124         -0.125           0.175   0.197  -0.388 
## Vitamin.B5            0.142 -0.111                                   0.260 
## Vitamin.B6                                           0.234  -0.132  -0.424 
## Vitamin.D            -0.153  0.137  -0.253          -0.231          -0.138 
## Vitamin.E                   -0.653  -0.201  -0.295  -0.214  -0.163   0.187 
## Copper               -0.145  0.142  -0.216          -0.189          -0.155 
## Iron                                 0.191   0.139  -0.217   0.349   0.158 
## Magnesium                    0.136          -0.291   0.197   0.110   0.433 
## Manganese                   -0.117                           0.140         
## Phosphorus                          -0.187                                 
## Potassium                    0.205          -0.226                   0.118 
## Selenium                                                                   
## Zinc                                -0.225  -0.103   0.111  -0.139  -0.124 
## Nutrition.Density     0.262 -0.127  -0.369   0.104   0.203   0.205         
##                      Comp.16 Comp.17 Comp.18 Comp.19 Comp.20 Comp.21 Comp.22
## Caloric.Value                                                         0.121 
## Fat                                           0.108           0.131         
## Saturated.Fats                        0.158           0.340   0.445         
## Monounsaturated.Fats -0.164          -0.140  -0.258  -0.299  -0.478         
## Polyunsaturated.Fats  0.269   0.222           0.438  -0.111                 
## Sugars                       -0.180          -0.176                         
## Protein                                      -0.103                   0.202 
## Cholesterol           0.173  -0.150  -0.128          -0.128   0.133  -0.170 
## Sodium                0.286  -0.237  -0.290  -0.142   0.167  -0.182         
## Water                -0.222   0.298   0.232          -0.399   0.232         
## Vitamin.B1                           -0.110   0.124  -0.166           0.140 
## Vitamin.B11           0.286           0.366  -0.276  -0.192  -0.105  -0.371 
## Vitamin.B12          -0.137                                           0.120 
## Vitamin.B2            0.317   0.143                  -0.143                 
## Vitamin.B3                            0.108  -0.572           0.234   0.196 
## Vitamin.B5            0.121   0.307          -0.203   0.212                 
## Vitamin.B6                   -0.313           0.206                  -0.575 
## Vitamin.D            -0.276   0.329  -0.463           0.104          -0.212 
## Vitamin.E            -0.172                                                 
## Copper                0.109  -0.307   0.314   0.228                   0.344 
## Iron                 -0.447  -0.170   0.205           0.271          -0.134 
## Magnesium                    -0.411  -0.403          -0.122   0.339         
## Manganese            -0.105  -0.262                  -0.143           0.288 
## Phosphorus                                                   -0.326         
## Potassium             0.102   0.166           0.255   0.491  -0.314   0.176 
## Selenium             -0.171          -0.282                   0.143   0.215 
## Zinc                  0.214                                                 
## Nutrition.Density    -0.268           0.105   0.112  -0.234                 
##                      Comp.23 Comp.24 Comp.25 Comp.26 Comp.27 Comp.28
## Caloric.Value                         0.297   0.304           0.781 
## Fat                                           0.658   0.122  -0.514 
## Saturated.Fats        0.169   0.216  -0.134  -0.385                 
## Monounsaturated.Fats -0.122                  -0.214                 
## Polyunsaturated.Fats                 -0.172  -0.371                 
## Sugars                                                       -0.127 
## Protein                       0.485   0.593  -0.195          -0.282 
## Cholesterol                  -0.250                                 
## Sodium                                                              
## Water                                -0.115                         
## Vitamin.B1                                                          
## Vitamin.B11           0.326                                         
## Vitamin.B12                                                         
## Vitamin.B2                    0.104                                 
## Vitamin.B3                   -0.305  -0.154                         
## Vitamin.B5           -0.582          -0.225           0.173         
## Vitamin.B6                            0.114                         
## Vitamin.D                                             0.223         
## Vitamin.E                                                           
## Copper               -0.414          -0.120          -0.193         
## Iron                                                                
## Magnesium            -0.153                                         
## Manganese             0.310                           0.616         
## Phosphorus            0.256   0.433  -0.578   0.207                 
## Potassium             0.182  -0.421          -0.122                 
## Selenium              0.238                          -0.667         
## Zinc                                                                
## Nutrition.Density    -0.213  -0.370   0.111  -0.118                 
## 
##                Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
## SS loadings     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
## Proportion Var  0.036  0.036  0.036  0.036  0.036  0.036  0.036  0.036  0.036
## Cumulative Var  0.036  0.071  0.107  0.143  0.179  0.214  0.250  0.286  0.321
##                Comp.10 Comp.11 Comp.12 Comp.13 Comp.14 Comp.15 Comp.16 Comp.17
## SS loadings      1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000
## Proportion Var   0.036   0.036   0.036   0.036   0.036   0.036   0.036   0.036
## Cumulative Var   0.357   0.393   0.429   0.464   0.500   0.536   0.571   0.607
##                Comp.18 Comp.19 Comp.20 Comp.21 Comp.22 Comp.23 Comp.24 Comp.25
## SS loadings      1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000
## Proportion Var   0.036   0.036   0.036   0.036   0.036   0.036   0.036   0.036
## Cumulative Var   0.643   0.679   0.714   0.750   0.786   0.821   0.857   0.893
##                Comp.26 Comp.27 Comp.28
## SS loadings      1.000   1.000   1.000
## Proportion Var   0.036   0.036   0.036
## Cumulative Var   0.929   0.964   1.000

Loadings represent the contribution of each original variable to the principal components. Each column in the loadings matrix corresponds to a principal component and each row corresponds to an original variable. The higher the absolute value of a loading, the more strongly the original variable contributes to that principal component.
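
The following sketch, on simulated data, illustrates two properties of the loadings. Each loading vector has unit length, which is why the "SS loadings" row above is 1.000 for every component and "Proportion Var" is simply 1/28 (about 0.036) rather than the variance explained. The loadings also coincide, up to sign, with the eigenvectors of the correlation matrix.

```r
# Simulated data (not the food dataset) with one correlated pair of columns.
set.seed(4)
x <- matrix(rnorm(100 * 4), ncol = 4)
x[, 2] <- x[, 1] + 0.5 * x[, 2]
x <- scale(x)

pca <- prcomp(x)     # singular value decomposition
e <- eigen(cor(x))   # spectral decomposition

# Sum of squared loadings per component: 1 for every column.
colSums(pca$rotation^2)

# Largest difference between the two sets of loadings once signs are ignored.
max(abs(abs(pca$rotation) - abs(e$vectors)))

# Eigenvalues are the squared standard deviations of the components.
cbind(eigenvalue = e$values, sdev_squared = pca$sdev^2)
```

The sign of a loading vector is arbitrary, so the two approaches can return components that are mirror images of each other without changing the interpretation.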

The next step is to create a scree plot of eigenvalues.

library("factoextra")
## Warning: package 'factoextra' was built under R version 4.4.2
## Loading required package: ggplot2
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
fviz_eig(food.pca, choice = "eigenvalue", ncp = 22,
         barfill = "skyblue", barcolor = "darkblue", linecolor = "darkgreen",
         addlabels = TRUE, main = "Scree Plot of Eigenvalues",
         xlab = "Principal Components", ylab = "Eigenvalue",
         line.size = 1.2, bar.width = 0.8, fill.alpha = 0.6)

Kaiser’s rule is a common heuristic for deciding how many components or factors to keep. It suggests retaining only those with eigenvalues greater than 1, that is, components that explain more variance than a single standardized variable.

For this analysis selecting 7 components is suitable as their eigenvalues exceed 1, according to Kaiser’s rule.

eig.val<-get_eigenvalue(food.pca)
eig.val
##        eigenvalue variance.percent cumulative.variance.percent
## Dim.1  8.09798717       28.9213827                    28.92138
## Dim.2  4.53521877       16.1972099                    45.11859
## Dim.3  2.79118359        9.9685128                    55.08711
## Dim.4  1.55930387        5.5689424                    60.65605
## Dim.5  1.35295689        4.8319889                    65.48804
## Dim.6  1.23195981        4.3998565                    69.88789
## Dim.7  1.12045288        4.0016174                    73.88951
## Dim.8  0.96621219        3.4507578                    77.34027
## Dim.9  0.85321321        3.0471900                    80.38746
## Dim.10 0.71083911        2.5387111                    82.92617
## Dim.11 0.64403720        2.3001329                    85.22630
## Dim.12 0.52073642        1.8597729                    87.08608
## Dim.13 0.46642541        1.6658050                    88.75188
## Dim.14 0.42427369        1.5152632                    90.26714
## Dim.15 0.39893423        1.4247651                    91.69191
## Dim.16 0.35058411        1.2520861                    92.94399
## Dim.17 0.31594403        1.1283715                    94.07237
## Dim.18 0.29811203        1.0646858                    95.13705
## Dim.19 0.27504842        0.9823158                    96.11937
## Dim.20 0.23399230        0.8356868                    96.95505
## Dim.21 0.21206717        0.7573827                    97.71244
## Dim.22 0.17883894        0.6387105                    98.35115
## Dim.23 0.15307679        0.5467028                    98.89785
## Dim.24 0.10338690        0.3692389                    99.26709
## Dim.25 0.07324001        0.2615715                    99.52866
## Dim.26 0.06515735        0.2327048                    99.76137
## Dim.27 0.03674901        0.1312465                    99.89261
## Dim.28 0.03006851        0.1073876                   100.00000

The results show that the first seven components together explain 73.89% of the variation.
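
Both numbers can be reproduced directly from the table. The sketch below copies the first eight eigenvalues from the output above and assumes, as holds for standardized data, that the total variance equals the number of variables (28).

```r
# First eight eigenvalues copied from the get_eigenvalue() table above.
eig <- c(8.09798717, 4.53521877, 2.79118359, 1.55930387,
         1.35295689, 1.23195981, 1.12045288, 0.96621219)

# Number of standardized variables; their total variance equals p.
p <- 28

# Kaiser's rule: keep components whose eigenvalue exceeds 1.
n_keep <- sum(eig > 1)
n_keep

# Cumulative share of variance (in %) up to each component.
cum_pct <- 100 * cumsum(eig) / p
round(cum_pct[n_keep], 2)
```

The eighth eigenvalue (0.966) falls just below the threshold, so the cut-off at seven components is a close call under this rule.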

Components analysis and correlation circle

fviz_pca_ind(food.pca, col.ind="cos2", geom = "point", gradient.cols = c("blue", "purple", "red"), title = "PCA - Individual points")

The graph shows a dense area of points. Keep in mind that the plot displays only the first two dimensions, whereas seven dimensions are retained.

fviz_pca_var(food.pca, col.var = "blue", labelsize = 3)

The PCA variable plot indicates that most of the variables point toward the right half of the correlation circle, that is, they are positively correlated with the first dimension.

The graphs below present the contribution of each variable to each of the seven dimensions.

library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
var<-get_pca_var(food.pca)
PC1<-fviz_contrib(food.pca, "var", axes=1, xtickslab.rt=90)
grid.arrange(PC1,ncol = 1,top='Contribution to Principal Components')

PC2<-fviz_contrib(food.pca, "var", axes=2, xtickslab.rt=90)
grid.arrange(PC2,ncol = 1,top='Contribution to Principal Components')

PC3<-fviz_contrib(food.pca, "var", axes=3, xtickslab.rt=90)
grid.arrange(PC3,ncol = 1,top='Contribution to Principal Components')

PC4<-fviz_contrib(food.pca, "var", axes=4, xtickslab.rt=90)
grid.arrange(PC4,ncol = 1,top='Contribution to Principal Components')

PC5<-fviz_contrib(food.pca, "var", axes=5, xtickslab.rt=90)
grid.arrange(PC5,ncol = 1,top='Contribution to Principal Components')

PC6<-fviz_contrib(food.pca, "var", axes=6, xtickslab.rt=90)
grid.arrange(PC6,ncol = 1,top='Contribution to Principal Components')

PC7<-fviz_contrib(food.pca, "var", axes=7, xtickslab.rt=90)
grid.arrange(PC7,ncol = 1,top='Contribution to Principal Components')
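
As a rough guide to reading these plots: to my understanding, the contribution of a variable to a component is its squared loading expressed as a percentage. The sketch below computes this in base R on simulated data; the variable names `v1` to `v5` are made up, and the assumption that this matches what `fviz_contrib` displays has not been verified against its source.

```r
# Simulated data (not the food dataset) with five made-up variables.
set.seed(5)
x <- scale(matrix(rnorm(150 * 5), ncol = 5,
                  dimnames = list(NULL, paste0("v", 1:5))))

pca <- prcomp(x)

# Squared loadings as percentages: each column sums to 100.
contrib <- 100 * pca$rotation^2
colSums(contrib)

# Contribution each variable would have if all contributed equally.
100 / ncol(x)

# Variables ordered by their contribution to the first component.
sort(contrib[, "PC1"], decreasing = TRUE)
```

Variables whose contribution exceeds the equal-share value can be read as the ones that matter most for that dimension.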

Hierarchical clustering

Let’s present the hierarchical clustering dendrogram for the variables.

distance.m<-dist(t(food))
hc<-hclust(distance.m, method="complete") 
plot(hc, hang=-1)
rect.hclust(hc, k = 7, border='#3399FF')

sub_grp<-cutree(hc, k=7) 
fviz_cluster(list(data = distance.m, cluster = sub_grp), palette=c("blue", "purple","green", "yellow", "red", "brown", "orange" ))

Combining the hierarchical clustering dendrogram and the cluster plot, it can be said that the branches of the tree coincide with the clusters.
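
The mechanics of clustering variables rather than observations can be sketched on simulated data. Here two made-up pairs of variables (`a1`, `a2` and `b1`, `b2`) are each driven by their own latent factor, and transposing the matrix makes `dist` measure distances between variables.

```r
# Simulated data (not the food dataset): two groups of related variables.
set.seed(6)
n <- 100
f1 <- rnorm(n)
f2 <- rnorm(n)
x <- scale(cbind(
  a1 = f1 + rnorm(n, sd = 0.2),
  a2 = f1 + rnorm(n, sd = 0.2),
  b1 = f2 + rnorm(n, sd = 0.2),
  b2 = f2 + rnorm(n, sd = 0.2)
))

# t(x) puts variables in rows, so distances are between variables.
d <- dist(t(x))
hc <- hclust(d, method = "complete")

# Cut the tree into two groups and inspect the assignment.
groups <- cutree(hc, k = 2)
groups
```

Variables that move together across observations end up close in this distance and are merged early in the tree, which is the same redundancy that PCA summarizes in a single component.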

Conclusion

This study aimed to explore whether the complexity of the food nutrition data can be reduced to a smaller set of key dimensions using PCA. The results suggest that 7 principal components, which together explain about 73.9% of the variance, are sufficient to represent the data. The comparison suggests that PCA and hierarchical clustering lead to similar conclusions, which supports the robustness of the results.