The purpose of this project is to investigate and perform dimension reduction on the Food Nutrition dataset from Kaggle (https://www.kaggle.com/datasets/utsavdey1410/food-nutrition-dataset). The project analyzes a comprehensive nutritional food database that provides detailed information on the macro- and micronutrient content of various food items. The dataset includes essential nutritional values such as caloric content, fat, carbohydrates, proteins, vitamins, and minerals, which are crucial for understanding dietary needs. With this data, one can explore the nutritional value of foods and identify patterns in food composition, which can aid healthier dietary planning.
Principal Component Analysis is a dimensionality reduction technique that simplifies complex datasets by transforming correlated variables into a smaller set of uncorrelated components. It retains the most important information while reducing redundancy, making it ideal for visualizing and analyzing high-dimensional data. PCA will be used in this project to reduce the dimensionality of the nutritional data, helping to identify the most significant factors that explain the variance in food composition.
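As a minimal sketch of what "transforming correlated variables into uncorrelated components" means, the snippet below runs the decomposition by hand. It uses the built-in iris measurements purely as a stand-in (an assumption for illustration, not the food data), and the object names are hypothetical.

```r
# Illustrative only: built-in iris measurements, not the food data.
X <- scale(iris[, 1:4])              # standardize four correlated variables
eig <- eigen(cor(X))                 # spectral decomposition of the correlation matrix
scores <- X %*% eig$vectors          # project observations onto the components

# Correlations between different component scores are numerically zero.
round(cor(scores), 3)

# Each eigenvalue equals the variance of the corresponding component scores.
rbind(eigenvalue = eig$values, score_variance = apply(scores, 2, var))
```

The eigenvalues are sorted in decreasing order, so the first few components carry most of the variance.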
The first step is to load and investigate the dataset.
food <- read.csv("FOOD-DATA-GROUP1.csv")
head(food,1)
## X Unnamed..0 food Caloric.Value Fat Saturated.Fats
## 1 0 0 cream cheese 51 5 2.9
## Monounsaturated.Fats Polyunsaturated.Fats Carbohydrates Sugars Protein
## 1 1.3 0.2 0.8 0.5 0.9
## Dietary.Fiber Cholesterol Sodium Water Vitamin.A Vitamin.B1 Vitamin.B11
## 1 0 14.6 0.016 7.6 0.2 0.033 0.064
## Vitamin.B12 Vitamin.B2 Vitamin.B3 Vitamin.B5 Vitamin.B6 Vitamin.C Vitamin.D
## 1 0.092 0.097 0.084 0.052 0.096 0.004 0
## Vitamin.E Vitamin.K Calcium Copper Iron Magnesium Manganese Phosphorus
## 1 0 0.1 0.008 14.1 0.082 0.027 1.3 0.091
## Potassium Selenium Zinc Nutrition.Density
## 1 15.5 19.1 0.039 7.07
str(food)
## 'data.frame': 551 obs. of 37 variables:
## $ X : int 0 1 2 3 4 5 6 7 8 9 ...
## $ Unnamed..0 : int 0 1 2 3 4 5 6 7 8 9 ...
## $ food : chr "cream cheese" "neufchatel cheese" "requeijao cremoso light catupiry" "ricotta cheese" ...
## $ Caloric.Value : int 51 215 49 30 30 19 116 113 71 19 ...
## $ Fat : num 5 19.4 3.6 2 2.3 0.2 9.1 9.3 4.5 1.3 ...
## $ Saturated.Fats : num 2.9 10.9 2.3 1.3 1.4 0.1 5.3 5.3 2.7 0.9 ...
## $ Monounsaturated.Fats: num 1.3 4.9 0.9 0.5 0.6 0.091 2.8 2.6 1.4 0.4 ...
## $ Polyunsaturated.Fats: num 0.2 0.8 0 0.002 0.042 0.075 0.5 0.3 0.1 0.035 ...
## $ Carbohydrates : num 0.8 3.1 0.9 1.5 1.2 1.4 0.1 0.9 0.6 0.2 ...
## $ Sugars : num 0.5 2.7 3.4 0.091 0.9 1 0.1 0.1 0.046 0.088 ...
## $ Protein : num 0.9 7.8 0.8 1.5 1.2 2.8 8.3 6.4 6.4 1.6 ...
## $ Dietary.Fiber : num 0 0 0.1 0 0 0 0 0 0 0 ...
## $ Cholesterol : num 14.6 62.9 0 9.8 8.1 2.2 30.8 27.7 12.2 5.2 ...
## $ Sodium : num 0.016 0.3 0 0.017 0.046 0.1 0.2 0.2 0.2 0.008 ...
## $ Water : num 7.6 53.6 0 14.7 10 12.9 9.3 10.3 5.4 1.5 ...
## $ Vitamin.A : num 0.2 0.2 0 0.075 0.016 0.063 0.061 0.054 0.067 0.064 ...
## $ Vitamin.B1 : num 0.033 0.099 0 0.019 0.08 0.02 0.021 0.031 0.062 0.058 ...
## $ Vitamin.B11 : num 0.064 0.079 0 0.079 0.062 0.089 0.072 0.005 0.099 0.026 ...
## $ Vitamin.B12 : num 0.092 0.09 0 0.091 0.049 0.092 0.078 0.073 0.059 0.045 ...
## $ Vitamin.B2 : num 0.097 0.1 0 0.027 0.026 0.021 0.004 0.1 0.057 0.059 ...
## $ Vitamin.B3 : num 0.084 0.2 0 0.041 0.08 0.025 0.043 0.01 0.039 0.055 ...
## $ Vitamin.B5 : num 0.052 0.5 0 0.016 0.1 0.2 0.2 0.1 0.06 0.025 ...
## $ Vitamin.B6 : num 0.096 0.078 0 0.007 0.003 0.038 0.051 0.005 0.066 0.029 ...
## $ Vitamin.C : num 0.004 0 0 0.006 0 0 0 0 0 0 ...
## $ Vitamin.D : num 0 0 0 0 0.036 0 0.034 0.06 0.095 0.073 ...
## $ Vitamin.E : num 0 0.3 0 0.001 0.009 0.049 0.035 0.2 0.018 0.078 ...
## $ Vitamin.K : num 0.1 0.045 0 0.011 0.019 0.059 0.048 0.035 0.021 0.004 ...
## $ Calcium : num 0.008 99.5 0 0.097 22.2 ...
## $ Copper : num 14.1 0.034 0 41.2 0.072 0.039 0.033 0.099 0.051 0.046 ...
## $ Iron : num 0.082 0.1 0 0.097 0.008 0.053 0.094 0.077 0.1 0.03 ...
## $ Magnesium : num 0.027 8.5 0 0.096 1.2 4 10.1 7.6 7.9 2.1 ...
## $ Manganese : num 1.3 0.088 0 4 0.098 0.028 0.002 0.063 0.073 0.002 ...
## $ Phosphorus : num 0.091 117.3 0 0.024 22.8 ...
## $ Potassium : num 15.5 129.2 0 30.8 37.1 ...
## $ Selenium : num 19.1 0.054 0 43.8 0.034 0.013 0.079 0.009 0.045 0.087 ...
## $ Zinc : num 0.039 0.7 0 0.035 0.053 0.3 1.1 1 0.5 0.1 ...
## $ Nutrition.Density : num 7.07 130.1 5.4 5.2 27.01 ...
food <- food[3:length(food)]
Caloric.Value needs to be converted into a numeric variable.
food$Caloric.Value <- as.numeric(gsub("[^0-9.]", "", food$Caloric.Value))
dim(food)
## [1] 551 35
Let’s check for NAs in the dataset.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
sum(is.na(food))
## [1] 0
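The data needs to be scaled. The first column (the food name) is dropped, and the remaining numeric variables are standardized to zero mean and unit variance.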
food <- scale(food[, 2:length(food)])
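To make the scaling step concrete, here is a small sketch of what scale() computes for a single column. The five values are the first Caloric.Value entries printed by str() above; the variable names are hypothetical.

```r
# First five Caloric.Value entries from the str() output, used as a toy vector.
x <- c(51, 215, 49, 30, 30)

z_manual <- (x - mean(x)) / sd(x)    # subtract the mean, divide by the standard deviation
z_scale <- as.numeric(scale(x))      # scale() does the same, column by column

all.equal(z_manual, z_scale)
c(mean = mean(z_scale), sd = sd(z_scale))
```

Standardizing matters here because the nutrients are measured on very different scales, and PCA would otherwise be dominated by the variables with the largest variance.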
Next, the scaled data is summarized.
summary(food)
## Caloric.Value Fat Saturated.Fats Monounsaturated.Fats
## Min. :-1.1763 Min. :-0.8583 Min. :-0.6898 Min. :-0.7222
## 1st Qu.:-0.7170 1st Qu.:-0.6590 1st Qu.:-0.5971 1st Qu.:-0.6139
## Median :-0.2578 Median :-0.3322 Median :-0.3563 Median :-0.3252
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.5001 3rd Qu.: 0.2896 3rd Qu.: 0.1996 3rd Qu.: 0.2073
## Max. : 6.7289 Max. : 6.1170 Max. : 7.3702 Max. : 7.9411
## Polyunsaturated.Fats Carbohydrates Sugars Protein
## Min. :-0.6254 Min. :-0.7797 Min. :-0.36334 Min. :-0.9728
## 1st Qu.:-0.5102 1st Qu.:-0.7797 1st Qu.:-0.36334 1st Qu.:-0.6929
## Median :-0.3085 Median :-0.5283 Median :-0.36334 Median :-0.3443
## Mean : 0.0000 Mean : 0.0000 Mean : 0.00000 Mean : 0.0000
## 3rd Qu.: 0.1237 3rd Qu.: 0.6349 3rd Qu.:-0.06826 3rd Qu.: 0.3820
## Max. :10.9280 Max. : 5.5439 Max. : 9.13280 Max. : 3.6172
## Dietary.Fiber Cholesterol Sodium Water
## Min. :-0.5092 Min. :-0.8781 Min. :-0.9011 Min. :-1.1487
## 1st Qu.:-0.5092 1st Qu.:-0.7391 1st Qu.:-0.7439 1st Qu.:-0.8560
## Median :-0.5092 Median :-0.3948 Median :-0.2723 Median :-0.2820
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.1863 3rd Qu.: 0.3002 3rd Qu.: 0.5137 3rd Qu.: 0.7615
## Max. : 7.6051 Max. : 4.1474 Max. : 8.6882 Max. : 4.9055
## Vitamin.A Vitamin.B1 Vitamin.B11 Vitamin.B12
## Min. :-0.47732 Min. :-0.7381 Min. :-0.7148 Min. :-1.10820
## 1st Qu.:-0.47732 1st Qu.:-0.5810 1st Qu.:-0.4862 1st Qu.:-0.89470
## Median :-0.22663 Median :-0.3286 Median :-0.1597 Median :-0.06737
## Mean : 0.00000 Mean : 0.0000 Mean : 0.0000 Mean : 0.00000
## 3rd Qu.: 0.04495 3rd Qu.: 0.2143 3rd Qu.: 0.1886 3rd Qu.: 0.70657
## Max. :12.05729 Max. : 8.3095 Max. :13.4342 Max. : 9.56695
## Vitamin.B2 Vitamin.B3 Vitamin.B5 Vitamin.B6
## Min. :-0.6796 Min. :-0.61805 Min. :-0.4468 Min. :-0.6283
## 1st Qu.:-0.5085 1st Qu.:-0.55090 1st Qu.:-0.4058 1st Qu.:-0.5350
## Median :-0.3409 Median :-0.31589 Median :-0.2416 Median :-0.4254
## Mean : 0.0000 Mean : 0.00000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.3367 3rd Qu.: 0.08698 3rd Qu.: 0.0456 3rd Qu.: 0.1832
## Max. :12.1936 Max. : 9.08453 Max. :12.4375 Max. : 8.0955
## Vitamin.C Vitamin.D Vitamin.E Vitamin.K
## Min. :-0.41544 Min. :-0.1226 Min. :-0.44666 Min. :-0.06039
## 1st Qu.:-0.41544 1st Qu.:-0.1226 1st Qu.:-0.44666 1st Qu.:-0.06039
## Median :-0.40122 Median :-0.1226 Median :-0.41321 Median :-0.05912
## Mean : 0.00000 Mean : 0.0000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.:-0.08958 3rd Qu.:-0.1070 3rd Qu.: 0.08356 3rd Qu.:-0.05158
## Max. : 8.05683 Max. :20.7554 Max. :11.05506 Max. :23.38605
## Calcium Copper Iron Magnesium
## Min. :-0.55607 Min. :-0.1933 Min. :-0.8127 Min. :-0.7627
## 1st Qu.:-0.49950 1st Qu.:-0.1922 1st Qu.:-0.6534 1st Qu.:-0.6192
## Median :-0.33153 Median :-0.1911 Median :-0.2818 Median :-0.2969
## Mean : 0.00000 Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.03841 3rd Qu.:-0.1866 3rd Qu.: 0.3553 3rd Qu.: 0.2771
## Max. : 6.96876 Max. :14.8723 Max. :10.3892 Max. : 7.5426
## Manganese Phosphorus Potassium Selenium
## Min. :-0.2808 Min. :-0.9022 Min. :-0.9134 Min. :-0.2288
## 1st Qu.:-0.2770 1st Qu.:-0.6959 1st Qu.:-0.6767 1st Qu.:-0.2286
## Median :-0.2729 Median :-0.3516 Median :-0.3263 Median :-0.2284
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.:-0.2411 3rd Qu.: 0.4012 3rd Qu.: 0.2864 3rd Qu.:-0.2282
## Max. : 8.5792 Max. : 4.7849 Max. : 7.5003 Max. :10.7426
## Zinc Nutrition.Density
## Min. :-0.253536 Min. :-0.7679
## 1st Qu.:-0.207521 1st Qu.:-0.5599
## Median :-0.130830 Median :-0.3238
## Mean : 0.000000 Mean : 0.0000
## 3rd Qu.: 0.007215 3rd Qu.: 0.1392
## Max. :22.339749 Max. : 6.3745
The Kaiser-Meyer-Olkin (KMO) measure assesses the adequacy of sample data for factor analysis. It evaluates the proportion of variance in variables that might be caused by underlying factors. A KMO value closer to 1 indicates that the data is suitable for factor analysis, while values below 0.5 suggest that factor analysis may not be appropriate.
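The measure can be sketched by hand from the usual definition: squared correlations are compared with squared partial correlations, overall and per variable (MSA). The example below uses a few built-in mtcars columns as a stand-in (an assumption for illustration); it is a sketch of the standard formula, not a replacement for psych::KMO.

```r
# Illustrative only: built-in mtcars columns, not the food data.
R <- cor(mtcars[, c("mpg", "disp", "hp", "wt", "qsec")])
R_inv <- solve(R)

# Partial correlations of each pair given all other variables.
P <- -R_inv / sqrt(outer(diag(R_inv), diag(R_inv)))

# Only off-diagonal elements enter the measure.
diag(R) <- 0
diag(P) <- 0
r2 <- R^2
p2 <- P^2

kmo_overall <- sum(r2) / (sum(r2) + sum(p2))            # overall MSA
msa_item <- colSums(r2) / (colSums(r2) + colSums(p2))   # MSA for each variable

round(kmo_overall, 2)
round(msa_item, 2)
```

When partial correlations are small relative to ordinary correlations, the ratio approaches 1, which is why high values suggest shared underlying factors.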
library("psych")
## Warning: package 'psych' was built under R version 4.4.2
KMO(food)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = food)
## Overall MSA = 0.51
## MSA for each item =
## Caloric.Value Fat Saturated.Fats
## 0.82 0.40 0.75
## Monounsaturated.Fats Polyunsaturated.Fats Carbohydrates
## 0.70 0.80 0.25
## Sugars Protein Dietary.Fiber
## 0.87 0.39 0.13
## Cholesterol Sodium Water
## 0.95 0.84 0.87
## Vitamin.A Vitamin.B1 Vitamin.B11
## 0.14 0.89 0.78
## Vitamin.B12 Vitamin.B2 Vitamin.B3
## 0.79 0.88 0.93
## Vitamin.B5 Vitamin.B6 Vitamin.C
## 0.79 0.83 0.10
## Vitamin.D Vitamin.E Vitamin.K
## 0.58 0.92 0.21
## Calcium Copper Iron
## 0.24 0.72 0.24
## Magnesium Manganese Phosphorus
## 0.96 0.59 0.95
## Potassium Selenium Zinc
## 0.92 0.59 0.84
## Nutrition.Density
## 0.31
Because the overall KMO (0.51) is only marginally above the 0.5 threshold, I decided to remove six variables with very low MSA (0.25 or below): Carbohydrates, Dietary Fiber, Vitamin A, Vitamin C, Vitamin K and Calcium.
food <- as.data.frame(food)
food <- food %>%
  select(-Carbohydrates, -Dietary.Fiber, -Vitamin.A, -Vitamin.C, -Vitamin.K, -Calcium)
Let’s check KMO once again.
KMO(food)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = food)
## Overall MSA = 0.81
## MSA for each item =
## Caloric.Value Fat Saturated.Fats
## 0.84 0.85 0.83
## Monounsaturated.Fats Polyunsaturated.Fats Sugars
## 0.87 0.83 0.53
## Protein Cholesterol Sodium
## 0.85 0.90 0.74
## Water Vitamin.B1 Vitamin.B11
## 0.83 0.79 0.84
## Vitamin.B12 Vitamin.B2 Vitamin.B3
## 0.74 0.80 0.91
## Vitamin.B5 Vitamin.B6 Vitamin.D
## 0.82 0.87 0.62
## Vitamin.E Copper Iron
## 0.89 0.71 0.79
## Magnesium Manganese Phosphorus
## 0.94 0.65 0.87
## Potassium Selenium Zinc
## 0.88 0.64 0.52
## Nutrition.Density
## 0.80
The overall KMO value has increased to 0.81, indicating that the dataset is now more suitable for factor analysis. This represents a notable improvement compared to the previous result, suggesting that the removal of certain variables has enhanced the adequacy of the data.
dim(food)
## [1] 551 28
The data has 551 observations and 28 variables.
Bartlett’s test of sphericity checks whether the variables in a dataset are sufficiently correlated. Its null hypothesis is that the correlation matrix is an identity matrix, i.e. that the variables are uncorrelated.
# n is the number of observations (551 rows), not rows x columns
cortest.bartlett(food, n = nrow(food))
## R was not square, finding R from data
## $chisq
## [1] 13059.71
##
## $p.value
## [1] 0
##
## $df
## [1] 378
The p-value of Bartlett’s test is reported as 0 (i.e. below numerical precision), so the null hypothesis is rejected: our correlation matrix differs significantly from the identity matrix.
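For reference, the test statistic can be sketched by hand with the usual chi-square approximation. The snippet uses a few built-in mtcars columns as a stand-in (an assumption for illustration). With p = 28 variables, the same degrees-of-freedom formula gives 28 * 27 / 2 = 378, which matches the output above.

```r
# Illustrative only: built-in mtcars columns, not the food data.
X <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]
n <- nrow(X)                         # number of observations
p <- ncol(X)                         # number of variables
R <- cor(X)

# Chi-square approximation for the hypothesis that R is an identity matrix.
chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
df <- p * (p - 1) / 2
p_value <- pchisq(chisq, df, lower.tail = FALSE)

c(chisq = chisq, df = df, p.value = p_value)
```

The statistic grows as the determinant of the correlation matrix approaches zero, that is, as the variables become more strongly correlated.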
Next, the correlation matrix is plotted to gain deeper insight into the relationships between variables and to check the suitability of the data for further analysis.
library(corrplot)
## Warning: package 'corrplot' was built under R version 4.4.2
## corrplot 0.95 loaded
# Use a distinct name so the cor() function is not shadowed
cor_matrix <- cor(food)
corrplot(cor_matrix, type = "lower", order = "hclust", tl.col = "black", tl.cex = 0.5)
The prcomp function, which uses singular value decomposition, is used to compute the principal components.
food.pca <- prcomp(food, center = TRUE, scale. = TRUE)
summary(food.pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.8457 2.1296 1.67068 1.24872 1.16317 1.1099 1.05851
## Proportion of Variance 0.2892 0.1620 0.09969 0.05569 0.04832 0.0440 0.04002
## Cumulative Proportion 0.2892 0.4512 0.55087 0.60656 0.65488 0.6989 0.73890
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 0.98296 0.92370 0.84311 0.8025 0.7216 0.68295 0.65136
## Proportion of Variance 0.03451 0.03047 0.02539 0.0230 0.0186 0.01666 0.01515
## Cumulative Proportion 0.77340 0.80387 0.82926 0.8523 0.8709 0.88752 0.90267
## PC15 PC16 PC17 PC18 PC19 PC20 PC21
## Standard deviation 0.63161 0.59210 0.56209 0.54600 0.52445 0.48373 0.46051
## Proportion of Variance 0.01425 0.01252 0.01128 0.01065 0.00982 0.00836 0.00757
## Cumulative Proportion 0.91692 0.92944 0.94072 0.95137 0.96119 0.96955 0.97712
## PC22 PC23 PC24 PC25 PC26 PC27 PC28
## Standard deviation 0.42289 0.39125 0.32154 0.27063 0.25526 0.19170 0.17340
## Proportion of Variance 0.00639 0.00547 0.00369 0.00262 0.00233 0.00131 0.00107
## Cumulative Proportion 0.98351 0.98898 0.99267 0.99529 0.99761 0.99893 1.00000
PC1 has the highest standard deviation and explains approximately 28.92% of the total variance.
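The "Proportion of Variance" row follows directly from the standard deviations: each squared standard deviation is an eigenvalue, divided by their total. For PC1 above, 2.8457^2 / 28 is approximately 0.2892. A minimal sketch on the built-in USArrests data (a stand-in, not the food data):

```r
# Illustrative only: built-in USArrests data, not the food data.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

eigenvalues <- pca$sdev^2                    # variance of each component
prop_var <- eigenvalues / sum(eigenvalues)   # proportion of variance
cum_var <- cumsum(prop_var)                  # cumulative proportion

round(rbind(prop_var, cum_var), 4)
```

With standardized variables the eigenvalues sum to the number of variables, which is why the divisor is 28 in this project.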
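There is also another approach: spectral (eigen) decomposition of the covariance matrix, implemented in the princomp function. Because the data were standardized beforehand, this is effectively a PCA on the correlation matrix.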
food.pca2<-princomp(food)
loadings(food.pca2)
##
## Loadings:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
## Caloric.Value 0.316 0.203
## Fat 0.275 0.311 0.165
## Saturated.Fats 0.188 0.368 -0.277 0.171 -0.154
## Monounsaturated.Fats 0.207 0.287 -0.109 0.232 0.115 0.154
## Polyunsaturated.Fats 0.234 0.174 0.290 0.286 0.172
## Sugars 0.219 0.349 -0.121 0.176
## Protein 0.287 -0.277
## Cholesterol 0.248 -0.125 -0.185 0.102 0.155
## Sodium 0.115 0.241 0.308 -0.196 -0.143 -0.431 -0.268
## Water 0.190 -0.148 0.374 -0.147 -0.142 -0.339 -0.137
## Vitamin.B1 0.169 -0.235 0.374 0.361 -0.281
## Vitamin.B11 0.378 -0.176 -0.132
## Vitamin.B12 0.163 -0.376 -0.307 -0.293 0.182
## Vitamin.B2 0.174 -0.207 -0.256 0.281 -0.361
## Vitamin.B3 0.246 -0.109 -0.232 0.219
## Vitamin.B5 0.131 0.337 -0.123 0.156 0.250
## Vitamin.B6 0.211 0.146 -0.300 0.163
## Vitamin.D 0.334 -0.248 -0.297
## Vitamin.E 0.164 0.297 -0.283 0.178 0.199
## Copper 0.376 -0.178 -0.217
## Iron 0.178 -0.113 0.150 0.108 -0.422 0.310
## Magnesium 0.236 -0.125 -0.224 -0.101 -0.103
## Manganese 0.412 0.266
## Phosphorus 0.272 -0.155 -0.173 -0.176 -0.150 -0.111
## Potassium 0.261 -0.270 -0.117
## Selenium 0.409 0.123 0.247 -0.133
## Zinc -0.453 0.173 -0.248 0.721
## Nutrition.Density 0.196 0.189 -0.251 -0.378 -0.101
## Comp.9 Comp.10 Comp.11 Comp.12 Comp.13 Comp.14 Comp.15
## Caloric.Value 0.140
## Fat
## Saturated.Fats -0.118 -0.288
## Monounsaturated.Fats -0.310 0.117 0.258 -0.256 -0.103
## Polyunsaturated.Fats 0.147 0.391
## Sugars 0.688 0.384 -0.180
## Protein 0.206
## Cholesterol 0.171 0.137 0.401 -0.624 -0.215
## Sodium -0.202 -0.270 0.257 -0.117
## Water 0.145 0.290 -0.103 -0.266
## Vitamin.B1 -0.134 -0.296 0.287 0.144 -0.462 0.162
## Vitamin.B11 0.401
## Vitamin.B12 0.227 -0.271 0.520 0.145 0.345 -0.123
## Vitamin.B2 0.268 -0.218 -0.458 -0.234 0.159 -0.230
## Vitamin.B3 -0.124 -0.125 0.175 0.197 -0.388
## Vitamin.B5 0.142 -0.111 0.260
## Vitamin.B6 0.234 -0.132 -0.424
## Vitamin.D -0.153 0.137 -0.253 -0.231 -0.138
## Vitamin.E -0.653 -0.201 -0.295 -0.214 -0.163 0.187
## Copper -0.145 0.142 -0.216 -0.189 -0.155
## Iron 0.191 0.139 -0.217 0.349 0.158
## Magnesium 0.136 -0.291 0.197 0.110 0.433
## Manganese -0.117 0.140
## Phosphorus -0.187
## Potassium 0.205 -0.226 0.118
## Selenium
## Zinc -0.225 -0.103 0.111 -0.139 -0.124
## Nutrition.Density 0.262 -0.127 -0.369 0.104 0.203 0.205
## Comp.16 Comp.17 Comp.18 Comp.19 Comp.20 Comp.21 Comp.22
## Caloric.Value 0.121
## Fat 0.108 0.131
## Saturated.Fats 0.158 0.340 0.445
## Monounsaturated.Fats -0.164 -0.140 -0.258 -0.299 -0.478
## Polyunsaturated.Fats 0.269 0.222 0.438 -0.111
## Sugars -0.180 -0.176
## Protein -0.103 0.202
## Cholesterol 0.173 -0.150 -0.128 -0.128 0.133 -0.170
## Sodium 0.286 -0.237 -0.290 -0.142 0.167 -0.182
## Water -0.222 0.298 0.232 -0.399 0.232
## Vitamin.B1 -0.110 0.124 -0.166 0.140
## Vitamin.B11 0.286 0.366 -0.276 -0.192 -0.105 -0.371
## Vitamin.B12 -0.137 0.120
## Vitamin.B2 0.317 0.143 -0.143
## Vitamin.B3 0.108 -0.572 0.234 0.196
## Vitamin.B5 0.121 0.307 -0.203 0.212
## Vitamin.B6 -0.313 0.206 -0.575
## Vitamin.D -0.276 0.329 -0.463 0.104 -0.212
## Vitamin.E -0.172
## Copper 0.109 -0.307 0.314 0.228 0.344
## Iron -0.447 -0.170 0.205 0.271 -0.134
## Magnesium -0.411 -0.403 -0.122 0.339
## Manganese -0.105 -0.262 -0.143 0.288
## Phosphorus -0.326
## Potassium 0.102 0.166 0.255 0.491 -0.314 0.176
## Selenium -0.171 -0.282 0.143 0.215
## Zinc 0.214
## Nutrition.Density -0.268 0.105 0.112 -0.234
## Comp.23 Comp.24 Comp.25 Comp.26 Comp.27 Comp.28
## Caloric.Value 0.297 0.304 0.781
## Fat 0.658 0.122 -0.514
## Saturated.Fats 0.169 0.216 -0.134 -0.385
## Monounsaturated.Fats -0.122 -0.214
## Polyunsaturated.Fats -0.172 -0.371
## Sugars -0.127
## Protein 0.485 0.593 -0.195 -0.282
## Cholesterol -0.250
## Sodium
## Water -0.115
## Vitamin.B1
## Vitamin.B11 0.326
## Vitamin.B12
## Vitamin.B2 0.104
## Vitamin.B3 -0.305 -0.154
## Vitamin.B5 -0.582 -0.225 0.173
## Vitamin.B6 0.114
## Vitamin.D 0.223
## Vitamin.E
## Copper -0.414 -0.120 -0.193
## Iron
## Magnesium -0.153
## Manganese 0.310 0.616
## Phosphorus 0.256 0.433 -0.578 0.207
## Potassium 0.182 -0.421 -0.122
## Selenium 0.238 -0.667
## Zinc
## Nutrition.Density -0.213 -0.370 0.111 -0.118
##
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
## SS loadings 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
## Proportion Var 0.036 0.036 0.036 0.036 0.036 0.036 0.036 0.036 0.036
## Cumulative Var 0.036 0.071 0.107 0.143 0.179 0.214 0.250 0.286 0.321
## Comp.10 Comp.11 Comp.12 Comp.13 Comp.14 Comp.15 Comp.16 Comp.17
## SS loadings 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
## Proportion Var 0.036 0.036 0.036 0.036 0.036 0.036 0.036 0.036
## Cumulative Var 0.357 0.393 0.429 0.464 0.500 0.536 0.571 0.607
## Comp.18 Comp.19 Comp.20 Comp.21 Comp.22 Comp.23 Comp.24 Comp.25
## SS loadings 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
## Proportion Var 0.036 0.036 0.036 0.036 0.036 0.036 0.036 0.036
## Cumulative Var 0.643 0.679 0.714 0.750 0.786 0.821 0.857 0.893
## Comp.26 Comp.27 Comp.28
## SS loadings 1.000 1.000 1.000
## Proportion Var 0.036 0.036 0.036
## Cumulative Var 0.929 0.964 1.000
Loadings represent the contribution of each original variable to the principal components. Each column in the loadings matrix corresponds to a principal component and each row corresponds to an original variable. The higher the absolute value of a loading, the more strongly the original variable contributes to that principal component.
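Two details of the printed loadings are easy to misread. Blank cells are small loadings hidden by the default print cutoff of 0.1, not zeros. The "SS loadings" and "Proportion Var" rows are always 1 and 1/28 (about 0.036) here because each loading vector has unit length, so they do not describe explained variance. A short sketch on the built-in USArrests data (a stand-in, not the food data):

```r
# Illustrative only: built-in USArrests data, not the food data.
pca2 <- princomp(USArrests, cor = TRUE)
L <- unclass(loadings(pca2))         # plain matrix: variables in rows, components in columns

colSums(L^2)                         # every column has unit length (the "SS loadings" row)

# Variables ordered by the absolute size of their loading on the first component.
names(sort(abs(L[, 1]), decreasing = TRUE))
```

The explained variance of each component comes from the squared standard deviations of the fitted object, not from these rows.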
The next step is to create a scree plot of eigenvalues.
library("factoextra")
## Warning: package 'factoextra' was built under R version 4.4.2
## Loading required package: ggplot2
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
fviz_eig(food.pca, choice = "eigenvalue", ncp = 22,
         barfill = "skyblue", barcolor = "darkblue", linecolor = "darkgreen",
         addlabels = TRUE, main = "Scree Plot of Eigenvalues",
         xlab = "Principal Components", ylab = "Eigenvalue",
         line.size = 1.2, bar.width = 0.8, fill.alpha = 0.6)
Kaiser’s rule is recommended for deciding how many factors to keep in
factor analysis. It suggests retaining only those components or factors
that have eigenvalues greater than 1.
For this analysis selecting 7 components is suitable as their eigenvalues exceed 1, according to Kaiser’s rule.
eig.val<-get_eigenvalue(food.pca)
eig.val
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 8.09798717 28.9213827 28.92138
## Dim.2 4.53521877 16.1972099 45.11859
## Dim.3 2.79118359 9.9685128 55.08711
## Dim.4 1.55930387 5.5689424 60.65605
## Dim.5 1.35295689 4.8319889 65.48804
## Dim.6 1.23195981 4.3998565 69.88789
## Dim.7 1.12045288 4.0016174 73.88951
## Dim.8 0.96621219 3.4507578 77.34027
## Dim.9 0.85321321 3.0471900 80.38746
## Dim.10 0.71083911 2.5387111 82.92617
## Dim.11 0.64403720 2.3001329 85.22630
## Dim.12 0.52073642 1.8597729 87.08608
## Dim.13 0.46642541 1.6658050 88.75188
## Dim.14 0.42427369 1.5152632 90.26714
## Dim.15 0.39893423 1.4247651 91.69191
## Dim.16 0.35058411 1.2520861 92.94399
## Dim.17 0.31594403 1.1283715 94.07237
## Dim.18 0.29811203 1.0646858 95.13705
## Dim.19 0.27504842 0.9823158 96.11937
## Dim.20 0.23399230 0.8356868 96.95505
## Dim.21 0.21206717 0.7573827 97.71244
## Dim.22 0.17883894 0.6387105 98.35115
## Dim.23 0.15307679 0.5467028 98.89785
## Dim.24 0.10338690 0.3692389 99.26709
## Dim.25 0.07324001 0.2615715 99.52866
## Dim.26 0.06515735 0.2327048 99.76137
## Dim.27 0.03674901 0.1312465 99.89261
## Dim.28 0.03006851 0.1073876 100.00000
The results show that the first 7 principal components together explain 73.89% of the variation; PC7 on its own contributes about 4.00%.
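Kaiser’s rule and the cumulative share can be checked directly from the eigenvalues listed in the table above. The sketch below copies those 28 values into a vector; the variable names are hypothetical.

```r
# Eigenvalues copied from the get_eigenvalue() table (Dim.1 to Dim.28).
eig <- c(8.09798717, 4.53521877, 2.79118359, 1.55930387, 1.35295689,
         1.23195981, 1.12045288, 0.96621219, 0.85321321, 0.71083911,
         0.64403720, 0.52073642, 0.46642541, 0.42427369, 0.39893423,
         0.35058411, 0.31594403, 0.29811203, 0.27504842, 0.23399230,
         0.21206717, 0.17883894, 0.15307679, 0.10338690, 0.07324001,
         0.06515735, 0.03674901, 0.03006851)

n_keep <- sum(eig > 1)                       # Kaiser's rule: eigenvalue greater than 1
cum_pct <- 100 * cumsum(eig) / sum(eig)      # cumulative percent of variance

n_keep                                       # 7
round(cum_pct[n_keep], 2)                    # 73.89
```

The eighth eigenvalue (0.966) falls just below the cutoff, so the choice of 7 components is sensitive to where the threshold is drawn.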
fviz_pca_ind(food.pca, col.ind="cos2", geom = "point", gradient.cols = c("blue", "purple", "red"), title = "PCA - Individual points")
The graph shows a dense area of points. Note that the plot displays only the first two dimensions (about 45.12% of the variance), whereas 7 components are retained.
fviz_pca_var(food.pca, col.var = "blue", labelsize = 3)
The PCA variable plot indicates that most of the variable labels are clustered in the right half, i.e. most variables have positive coordinates on the first dimension.
The graphs below present the contribution of each variable to each of the 7 dimensions.
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
pca_var <- get_pca_var(food.pca)
# One contribution plot per retained component (PC1 to PC7)
for (i in 1:7) {
  contrib_plot <- fviz_contrib(food.pca, choice = "var", axes = i, xtickslab.rt = 90)
  grid.arrange(contrib_plot, ncol = 1, top = "Contribution to Principal Components")
}
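As a rough guide to what these bars measure: for a PCA on standardized data, the percent contribution of a variable to a component is its squared loading times 100. To my understanding this corresponds to the values plotted by fviz_contrib, but treat it as an assumption. The sketch uses the built-in USArrests data as a stand-in (not the food data).

```r
# Illustrative only: built-in USArrests data, not the food data.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

contrib <- 100 * pca$rotation^2      # percent contribution of each variable to each component
round(contrib, 2)

colSums(contrib)                     # contributions to each component sum to 100
100 / nrow(pca$rotation)             # value expected if all variables contributed equally
```

Variables whose contribution exceeds the equal-contribution value can be read as the main drivers of that component.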
Let’s present the hierarchical clustering dendrogram for the variables.
distance.m<-dist(t(food))
hc<-hclust(distance.m, method="complete")
plot(hc, hang=-1)
rect.hclust(hc, k = 7, border='#3399FF')
sub_grp<-cutree(hc, k=7)
fviz_cluster(list(data = distance.m, cluster = sub_grp), palette=c("blue", "purple","green", "yellow", "red", "brown", "orange" ))
Combining the hierarchical clustering dendrogram and the cluster plot, it can be said that the branches of the tree coincide with the clusters.
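The key step in clustering variables rather than observations is the transpose before computing distances. A minimal sketch on the built-in mtcars data (a stand-in, not the food data), with an arbitrary number of groups chosen only for illustration:

```r
# Illustrative only: built-in mtcars data, not the food data.
X <- scale(mtcars)                   # standardize so distances are comparable
d <- dist(t(X))                      # transpose: distances between variables, not rows
hc <- hclust(d, method = "complete")

groups <- cutree(hc, k = 3)          # k = 3 is an arbitrary choice for this toy example
split(names(groups), groups)         # variable names listed per cluster
```

Listing the members of each cluster this way makes it easier to compare the groups with the variables that dominate each principal component.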
This study aimed to explore whether the complexity of the nutritional data could be reduced by using PCA to identify a smaller set of components. The results show that 7 principal components, which together explain about 73.89% of the variance, are sufficient to represent the data. Hierarchical clustering of the variables into 7 groups led to similar conclusions, which supports the robustness of the results.