With the following commands, we load the package and the data:
library(R4SPISE2022)
data("sixBeers12Descriptors10Judges", package = "data4PCCAR")
beers <- sixBeers12Descriptors10Judges$ratingsIntensity
beerColors <- sixBeers12Descriptors10Judges$color4Products
descColors <- sixBeers12Descriptors10Judges$color4Descriptors
These commands give us:

- beers: the ratings;
- beerColors: colors for the beers;
- descColors: colors for the descriptors.

| | Bitter | Acid | Sweet | Astringent | Alcoholic | Hop | Cereal | Toasted | Yeast | Hay | Malt | Fermented |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bruxelles | 3.190103 | 2.250000 | 2.002791 | 2.486309 | 3.237597 | 3.813985 | 3.341958 | 2.858587 | 3.305860 | 2.774995 | 3.693717 | 3.314378 |
| FranzDunkel | 3.898867 | 2.116649 | 2.690365 | 3.419777 | 4.316269 | 4.480046 | 3.718364 | 2.958749 | 3.419777 | 2.970115 | 3.685813 | 3.505336 |
| FranzNatur | 4.277778 | 2.075397 | 3.085289 | 3.426917 | 4.407831 | 4.536122 | 4.138244 | 3.895192 | 3.496122 | 3.261656 | 3.940780 | 2.805860 |
| Hofbrau | 4.399007 | 3.087549 | 0.000000 | 3.158053 | 3.984467 | 4.674603 | 3.706632 | 3.395192 | 3.630654 | 3.616398 | 4.023809 | 3.731262 |
| Paulaner | 3.697283 | 2.404388 | 1.028142 | 3.326382 | 3.910490 | 4.749270 | 3.786054 | 3.144253 | 3.444692 | 3.278671 | 3.792303 | 3.281101 |
| BlueMoon | 3.200000 | 3.300000 | 4.600000 | 3.300000 | 5.200000 | 5.200000 | 2.500000 | 2.500000 | 2.100000 | 2.900000 | 4.200000 | 4.100000 |
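As a quick sanity check, we can inspect the dimensions and names of the ratings with base R:

dim(beers)      # number of beers (rows) by number of descriptors (columns)
rownames(beers) # the six beers
colnames(beers) # the twelve descriptors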
res.pca <- epPCA(beers,
center = TRUE, # Center the data
scale = FALSE, # **Do not** scale the variables
graphs = FALSE)
By default, this function centers (means equal to 0) and scales (standard deviations equal to 1) all variables; here, we center the data but explicitly turn scaling off.
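To see what these options do, we can reproduce the preprocessing with base R: centering sets every column mean to 0, while the standard deviations stay untouched because scaling is off. A minimal check:

beersCentered <- scale(beers, center = TRUE, scale = FALSE) # same preprocessing as above
round(colMeans(beersCentered), 10) # all means are (numerically) 0
apply(beersCentered, 2, sd)        # standard deviations are unchanged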
res.plot.pca <- OTAplot(
resPCA = res.pca,
data = beers
)
The output of OTAplot includes:

- results.stats: useful statistics from the PCA, among which one can find the factor scores, loadings, eigenvalues, and contributions;
- results.graphs: all the PCA graphs (heatmaps, scree plot, factor maps, correlation circles, loading maps, etc.);
- description.graphs: the titles of these graphs, used as titles in the PPTX file.

We can check the data by plotting the correlation matrix between the variables. This is the matrix from which the components are extracted.
res.plot.pca$results.graphs$correlation
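If we want the numbers behind this heatmap, the same matrix can be computed directly with base R:

round(cor(beers), 2) # Pearson correlations between the 12 descriptors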
The scree plot shows the eigenvalue of each component. These eigenvalues give the variance of each component (called dimensions in the figure); equivalently, the singular values, which are the square roots of the eigenvalues, give the standard deviations of these components. The sum of the eigenvalues equals the total inertia of the data.
res.plot.pca$results.graphs$scree
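These quantities can be recovered from the output of epPCA; the sketch below assumes the standard ExPosition output structure (an eigs slot inside ExPosition.Data):

eigs <- res.pca$ExPosition.Data$eigs # eigenvalues: variance of each component
sqrt(eigs)                           # singular values: SD of each component
round(100 * eigs / sum(eigs), 2)     # percentage of the total inertia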
According to an elbow test, there are four dimensions in these data.
Here, the observation map shows how the observations (i.e., rows) are
represented in the component space. If center = TRUE when
running PCA, the origin should be at the center of all observations. (If
it’s not, something is off.)
res.plot.pca$results.graphs$factorScoresI12
This plot shows that the first dimension separates BlueMoon from Hofbrau, and the second dimension separates BlueMoon and Hofbrau from FranzDunkel and FranzNatur.
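As noted above, because the data were centered, the origin of this map should be the barycenter of the observations. We can verify this numerically by checking that the row factor scores average to 0 on every dimension (assuming they are stored in the fi slot, as in the standard ExPosition output):

Fi <- res.pca$ExPosition.Data$fi # row factor scores
round(colMeans(Fi), 10)          # approximately 0 on every dimension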
The circle of correlations illustrates how the variables are correlated with each other and with the dimensions. In this figure, the length of an arrow indicates how much of a variable is explained by the two given dimensions, the cosine between any two arrows gives their correlation, and the cosine between a variable and an axis gives the correlation between that variable and the corresponding dimension.
An angle close to 0° indicates a correlation close to 1, an angle close to 180° indicates a correlation close to -1, and a 90° angle indicates a correlation of 0. However, it is worth noting that this reading only holds within the given dimensions: when a variable is far from the circle, it is not fully explained by these dimensions, and other dimensions might be characterized by other patterns of relationships between this variable and the others.
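These cosines can also be computed numerically: the correlation between a variable and a dimension is the correlation between that column of the data and the corresponding column of the row factor scores. A minimal sketch, again assuming the fi slot of the ExPosition output:

round(cor(beers, res.pca$ExPosition.Data$fi[, 1:2]), 2) # variables x first two dimensions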
res.plot.pca$results.graphs$cosineCircleArrowJ12
This circle of correlations illustrates that Dimension 1 is
Sweet and Alcoholic versus Yeast,
Hay, Bitter, Cereal, and
Toasted; Dimension 2 is Acid and
Fermented versus Cereal and
Toasted. Interestingly, Acid and
Fermented are orthogonal to Sweet and
Hay (which are anti-correlated with each other) on the first two dimensions. On these two dimensions, Cereal and
Toasted are negatively correlated with Malt
and Hop.
The inference analysis of PCA (performed by OTAplotInference) includes a bootstrap test of the proportion of variance explained, permutation tests of the eigenvalues, and bootstrap tests of the column factor scores \(\mathbf{G}\) (i.e., the right singular vectors scaled so that the variance of each dimension equals the corresponding eigenvalue; \(\mathbf{G} = \mathbf{Q}\boldsymbol{\Delta}\) with \(\mathbf{X} = \mathbf{P}\boldsymbol{\Delta}\mathbf{Q}^\top\)). These column factor scores are stored as fj in the output of epPCA.
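The relation \(\mathbf{G} = \mathbf{Q}\boldsymbol{\Delta}\) can be checked numerically; the sketch below assumes the SVD is stored in the pdq slot of the ExPosition output, with q the right singular vectors and Dv the singular values:

pdq <- res.pca$ExPosition.Data$pdq
G <- pdq$q %*% diag(pdq$Dv) # Q %*% Delta
all.equal(unname(G), unname(res.pca$ExPosition.Data$fj)) # should be TRUE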
res.plot.pca.inference <- OTAplotInference(
resPCA = res.pca,
data = beers
) %>% suppressMessages()
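To list the inference graphs produced by this call:

names(res.plot.pca.inference$results.graphs)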
The inference scree plot shows the 95% bootstrap confidence intervals of the percentage of variance explained by each eigenvalue. If an interval excludes 0, the component explains a percentage of variance reliably larger than 0%.
res.plot.pca.inference$results.graphs$scree
The bootstrap test identifies one significant dimension that explains a percentage of variance reliably larger than 0%. The permutation test also identifies one significant dimension, with an eigenvalue significantly larger than 0.
The bar plots illustrate the bootstrap ratios, which equal \[\frac{M_{g_{j},\text{boot}}}{SD_{g_{j},\text{boot}}},\] where \(M_{g_{j},\text{boot}}\) is the mean of the bootstrapped samples of the \(j\)th column factor score and \(SD_{g_{j},\text{boot}}\) is their bootstrapped standard deviation. A bootstrap ratio is equivalent to a t-statistic testing \(\mathrm{H}_0\colon g_j = 0\) for the column factor score. The threshold is set to 2 to approximate the critical t-value of 1.96 at \(\alpha = .05\).
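For intuition, here is a minimal sketch of one common way to compute such bootstrap ratios: resample the rows (beers) with replacement, recompute the column factor scores by projecting the bootstrapped data, and divide the bootstrap mean by the bootstrap standard deviation. (OTAplotInference may differ in its implementation details; the pdq slot name below follows the standard ExPosition output.)

set.seed(42)
X <- scale(as.matrix(beers), center = TRUE, scale = FALSE) # same preprocessing as epPCA
P <- res.pca$ExPosition.Data$pdq$p                         # left singular vectors
nBoot <- 1000
G.boot <- array(NA, dim = c(ncol(X), ncol(P), nBoot))
for (b in seq_len(nBoot)) {
  idx <- sample(nrow(X), replace = TRUE)    # bootstrap the observations
  G.boot[, , b] <- t(X[idx, ]) %*% P[idx, ] # bootstrapped column factor scores (G = X'P)
}
# bootstrap ratio = bootstrap mean / bootstrap SD, per descriptor and dimension
BR <- apply(G.boot, c(1, 2), mean) / apply(G.boot, c(1, 2), sd)
round(BR[, 1:2], 2) # compare with the bar plots below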
res.plot.pca.inference$results.graphs$BR1
res.plot.pca.inference$results.graphs$BR2
The results show that, on both dimensions, no column factor score is reliably different from 0.