The dataset contains 77 rows and 14 columns. The first 70 rows are used to train the model and the remaining 7 rows are held out for testing (prediction). The PCA() function, from the FactoMineR package (factoextra supplies the fviz_* plotting helpers), has the following syntax: PCA(X, scale.unit = TRUE, ncp = 5, graph = TRUE).
X: a data frame. Rows are individuals and columns are numeric variables.
scale.unit: a logical value. If TRUE, the data are scaled to unit variance before the analysis. Standardizing to a common scale prevents variables from dominating the analysis merely because they are measured in large units, and it makes the variables comparable.
ncp: the number of dimensions kept in the final results.
graph: a logical value. If TRUE, a graph is displayed.
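The split and the effect of scaling can be illustrated with a minimal base R sketch. It does not use the cereals data or FactoMineR: the data frame `cereals_demo` and its values are made up for illustration, and `prcomp()` from base R stands in for `PCA()` so that the example runs without extra packages.

```r
# Synthetic stand-in for the data: 77 rows, two numeric variables measured
# on very different scales (the values are invented for illustration only).
set.seed(1)
cereals_demo <- data.frame(
  sodium  = rnorm(77, mean = 160, sd = 80),  # large measurement units
  protein = rnorm(77, mean = 2.5, sd = 1)    # small measurement units
)

# First 70 rows for training, remaining 7 rows for testing.
df_train_demo <- cereals_demo[1:70, ]
df_test_demo  <- cereals_demo[71:77, ]

# Standardizing gives every column unit variance.
scaled <- scale(df_train_demo)
apply(scaled, 2, var)

# PCA without and with scaling (prcomp is a base R substitute for PCA()).
pca_raw    <- prcomp(df_train_demo, center = TRUE, scale. = FALSE)
pca_scaled <- prcomp(df_train_demo, center = TRUE, scale. = TRUE)

# Unscaled: the first component is almost entirely the large-unit variable.
abs(pca_raw$rotation[, 1])
# Scaled: both variables load with equal magnitude on the first component.
abs(pca_scaled$rotation[, 1])
```

Without scaling, the first component mostly tracks the variable with the largest variance, which is the situation scale.unit = TRUE is meant to avoid.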
## record name mfr type protein fat sodium fiber carbo sugars
## 1 1 100%_Bran N C 4 1 130 10 5 6
## 2 2 100%_Natural_Bran Q C 3 5 15 2 8 8
## 3 3 All-Bran K C 4 1 260 9 7 5
## potass vitamins calories rating
## 1 280 25 70 68.40
## 2 135 0 120 33.98
## 3 320 25 70 59.43
## [,1]
## [1,] "record"
## [2,] "name"
## [3,] "mfr"
## [4,] "type"
## [5,] "protein"
## [6,] "fat"
## [7,] "sodium"
## [8,] "fiber"
## [9,] "carbo"
## [10,] "sugars"
## [11,] "potass"
## [12,] "vitamins"
## [13,] "calories"
## [14,] "rating"


## **Results for the Principal Component Analysis (PCA)**
## The analysis was performed on 70 individuals, described by 9 variables
## *The results are available in the following objects:
##
## name description
## 1 "$eig" "eigenvalues"
## 2 "$var" "results for the variables"
## 3 "$var$coord" "coord. for the variables"
## 4 "$var$cor" "correlations variables - dimensions"
## 5 "$var$cos2" "cos2 for the variables"
## 6 "$var$contrib" "contributions of the variables"
## 7 "$ind" "results for the individuals"
## 8 "$ind$coord" "coord. for the individuals"
## 9 "$ind$cos2" "cos2 for the individuals"
## 10 "$ind$contrib" "contributions of the individuals"
## 11 "$call" "summary statistics"
## 12 "$call$centre" "mean of the variables"
## 13 "$call$ecart.type" "standard error of the variables"
## 14 "$call$row.w" "weights for the individuals"
## 15 "$call$col.w" "weights for the variables"
##
## Call:
## PCA(X = df.train, graph = TRUE)
##
##
## Eigenvalues
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6 Dim.7
## Variance 2.715 2.062 1.590 1.045 0.622 0.505 0.374
## % of var. 30.172 22.912 17.663 11.615 6.906 5.612 4.157
## Cumulative % of var. 30.172 53.084 70.746 82.361 89.267 94.880 99.036
## Dim.8 Dim.9
## Variance 0.059 0.027
## % of var. 0.660 0.304
## Cumulative % of var. 99.696 100.000
##
## Individuals (the 10 first)
## Dist Dim.1 ctr cos2 Dim.2 ctr cos2 Dim.3 ctr
## 1 | 5.320 | 5.005 13.179 0.885 | 0.119 0.010 0.000 | 0.127 0.014
## 2 | 4.764 | 1.667 1.461 0.122 | 2.615 4.737 0.301 | -1.961 3.456
## 3 | 5.338 | 4.631 11.281 0.752 | 0.196 0.027 0.001 | 1.183 1.257
## 4 | 7.135 | 6.354 21.238 0.793 | -1.677 1.949 0.055 | 1.269 1.446
## 5 | 1.319 | -0.476 0.119 0.130 | 0.776 0.417 0.346 | -0.223 0.045
## 6 | 1.748 | -0.293 0.045 0.028 | 1.158 0.929 0.439 | -0.927 0.772
## 7 | 2.445 | -0.875 0.403 0.128 | 0.403 0.113 0.027 | -1.540 2.132
## 8 | 1.903 | -0.545 0.156 0.082 | 1.140 0.900 0.359 | 0.931 0.779
## 9 | 1.415 | 0.555 0.162 0.154 | -0.473 0.155 0.112 | 0.198 0.035
## 10 | 2.383 | 1.697 1.515 0.507 | -0.716 0.355 0.090 | 0.763 0.524
## cos2
## 1 0.001 |
## 2 0.169 |
## 3 0.049 |
## 4 0.032 |
## 5 0.028 |
## 6 0.281 |
## 7 0.397 |
## 8 0.239 |
## 9 0.019 |
## 10 0.103 |
##
## Variables
## Dim.1 ctr cos2 Dim.2 ctr cos2 Dim.3 ctr cos2
## protein | 0.604 13.449 0.365 | 0.104 0.527 0.011 | 0.501 15.774 0.251 |
## fat | 0.128 0.607 0.016 | 0.764 28.313 0.584 | -0.050 0.155 0.002 |
## sodium | -0.341 4.284 0.116 | 0.237 2.729 0.056 | 0.591 21.986 0.350 |
## fiber | 0.893 29.362 0.797 | 0.046 0.103 0.002 | 0.246 3.810 0.061 |
## carbo | -0.549 11.098 0.301 | -0.375 6.832 0.141 | 0.576 20.880 0.332 |
## sugars | -0.193 1.370 0.037 | 0.781 29.543 0.609 | -0.384 9.280 0.148 |
## potass | 0.873 28.082 0.763 | 0.247 2.970 0.061 | 0.269 4.551 0.072 |
## vitamins | -0.356 4.674 0.127 | 0.177 1.527 0.031 | 0.561 19.804 0.315 |
## calories | -0.438 7.075 0.192 | 0.752 27.455 0.566 | 0.245 3.761 0.060 |
Eigenvalues / Variances
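Each eigenvalue in the summary output is the variance carried by one dimension. As a small worked example, the percentages in that output can be recomputed from the eigenvalues alone; the numbers below are copied from the summary (rounded to three decimals), so the results agree with the printed ones only up to rounding.

```r
# Eigenvalues copied from the summary output (rounded to 3 decimals).
eig <- c(2.715, 2.062, 1.590, 1.045, 0.622, 0.505, 0.374, 0.059, 0.027)

# With 9 standardized variables the eigenvalues sum to (about) 9.
total <- sum(eig)

# Percentage of variance per dimension and its running total.
pct_var <- 100 * eig / total
cum_pct <- cumsum(pct_var)

round(data.frame(eigenvalue = eig, pct_var = pct_var, cum_pct = cum_pct), 2)
```

The first four dimensions together account for roughly 82% of the total variance, in line with the "Cumulative % of var." row of the summary.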
To plot the variables on the first two dimensions:
fviz_pca_var(train.pca, col.var = "black", title = "Correlation Plot of Variables")

Quality of Representation
The quality of representation of the variables on the factor map is called cos2 (squared cosine, squared coordinates). A high cos2 indicates a good representation of the variable on the principal component; in that case the variable lies close to the circumference of the correlation circle. A low cos2 indicates that the variable is poorly represented by the component and lies closer to the centre of the circle. The cos2 values are stored in train.pca$var$cos2; the first rows are:
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## protein 0.36520203 0.010861453 0.250749201 0.16858307 0.014811137
## fat 0.01647943 0.583837641 0.002456503 0.17550950 0.005857528
## sodium 0.11633348 0.056265847 0.349505101 0.16109243 0.222579334
## fiber 0.79731576 0.002131563 0.060564172 0.06535682 0.003298728
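The relationship between coordinates and cos2 can be checked by hand. The sketch below squares the Dim.1 coordinates of the same four variables, copied from the "Variables" table of the summary (rounded to three decimals), and recovers the Dim.1 column shown above up to rounding.

```r
# Dim.1 coordinates of four variables, copied from the "Variables" table.
coord_dim1 <- c(protein = 0.604, fat = 0.128, sodium = -0.341, fiber = 0.893)

# cos2 is the squared coordinate of the variable on that dimension.
cos2_dim1 <- coord_dim1^2
round(cos2_dim1, 3)
```

fiber has a cos2 close to 0.8 on Dim.1, so it is well represented by that dimension, while fat is barely represented on it.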
Contribution of variables to PC
Contributions are expressed as percentages: the larger the value, the more the variable contributes to the component. The corrplot() function (from the corrplot package) can be used to highlight the most contributing variables for each dimension.
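A contribution is a variable's cos2 on a dimension divided by the total cos2 of all variables on that dimension (which equals the dimension's eigenvalue). As a worked example, the Dim.1 contributions can be recomputed from the cos2 values in the "Variables" table of the summary; the inputs are rounded, so the results match the printed ctr column only approximately.

```r
# cos2 of all nine variables on Dim.1, copied from the "Variables" table.
cos2_dim1 <- c(protein = 0.365, fat = 0.016, sodium = 0.116, fiber = 0.797,
               carbo = 0.301, sugars = 0.037, potass = 0.763,
               vitamins = 0.127, calories = 0.192)

# The cos2 values on a dimension add up to its eigenvalue (about 2.715 here).
sum(cos2_dim1)

# Contribution (%) = cos2 of the variable / total cos2 on the dimension.
contrib_dim1 <- 100 * cos2_dim1 / sum(cos2_dim1)
round(sort(contrib_dim1, decreasing = TRUE), 1)

# With the fitted object from the text, the full matrix could be drawn with
# corrplot::corrplot(train.pca$var$contrib, is.corr = FALSE)
```

fiber and potass together supply more than half of Dim.1, which is why they stand out in the contribution plots.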

Contribution of Variables to PC1


The most important contributing variables can be highlighted in the correlation plot as follows:
fviz_pca_var(train.pca, col.var = "contrib", gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"))

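dimdesc() can be used to identify the variables most significantly associated with a given principal component. Only variables whose correlation with the dimension is significant at the proba threshold are reported; in the output below they are listed from the highest to the lowest correlation. The function can be used as follows: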
desc <- dimdesc(train.pca, axes = c(1,2), proba = 0.05)
# Description of dimension 1
desc$Dim.1
## $quanti
## correlation p.value
## fiber 0.8929254 2.907970e-25
## potass 0.8732524 6.445535e-23
## protein 0.6043195 3.035490e-08
## sodium -0.3410769 3.859072e-03
## vitamins -0.3562426 2.471195e-03
## calories -0.4382992 1.476686e-04
## carbo -0.5489537 8.629781e-07
##
## attr(,"class")
## [1] "condes" "list"
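The p-values in this output are consistent with an ordinary Pearson correlation test on the 70 training individuals. The sketch below recomputes two of them from the printed correlations; that dimdesc() derives its p-values in exactly this way is an assumption of the sketch, not something stated in the output.

```r
# Correlations with dimension 1, copied from the dimdesc() output above.
r <- c(fiber = 0.8929254, sodium = -0.3410769)
n <- 70  # number of individuals used to fit the PCA

# t statistic and two-sided p-value of a Pearson correlation test
# (assumed to be the test behind the printed p-values).
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)
signif(p_value, 3)
```

fat and sugars do not appear in the description of dimension 1, which fits the reading that variables not significant at proba = 0.05 are dropped.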