3.3 - PCA implementation

Where we’re headed

We’ve explored the concepts behind PCA
Time now to see how to implement PCA in R, including:
- Functions for executing PCA
- Packages and functions for creating useful plots to gaining insight into our PCA

6D example: Prehistoric goblets

Suppose you have been asked to analyze data on 25 prehistoric goblets from Thailand (Professor C.F.W. Higham, University of Otago, as taken from Manly, B.F.J. 1986. Multivariate Statistical Methods: A Primer. Chapman and Hall, London, 159pp.)
Let’s consider three of these:

\(X_1\) = mouth width
\(X_2\) = total width
\(X_3\) = total height
\(X_4\) = base width
\(X_5\) = stem width
\(X_6\) = stem height

Reading and plotting

goblets <- read.csv('Data/goblets.csv')
head(goblets)

  MouthWidth TotalWidth TotalHeight BaseWidth StemWidth StemHeight
1         13         21          23        14         7          8
2         14         14          24        19         5          9
3         19         23          24        20         6         12
4         17         18          16        16        11          8
5         19         20          16        16        10          7
6         12         20          24        17         6          9

Recall questions:
- These variables are correlated. Do we need all 6 to characterize the goblets?
- Is there a linear combination of these variables that reduces these variables to 2 dimensions that works well to characterize these goblets?

library(GGally)
ggpairs(goblets) + theme_classic(base_size = 14)

The `prcomp` function

We will use prcomp() (comes with base R) as our PCA workhorse
Uses SVD on the correlation matrix (assuming scale.=TRUE, which it always should be)

goblets_pca <- prcomp(goblets, scale. = TRUE)
goblets_pca

Standard deviations (1, .., p=6):
[1] 2.0668279 1.0450729 0.6202804 0.3773544 0.2555262 0.2088231

Rotation (n x k) = (6 x 6):
                  PC1         PC2        PC3         PC4         PC5        PC6
MouthWidth  0.3660233  0.48592912 -0.6179335 -0.32436829  0.27835629  0.2556581
TotalWidth  0.4515367 -0.03412653  0.3752732 -0.67427405 -0.08391876 -0.4386709
TotalHeight 0.4111609 -0.44135161  0.3163501  0.02019451  0.38254463  0.6239630
BaseWidth   0.4618586 -0.11457532 -0.1588367  0.54119094  0.38182563 -0.5564635
StemWidth   0.2963653  0.68277080  0.4914536  0.35921044 -0.22136144  0.1625790
StemHeight  0.4381125 -0.29768029 -0.3324080  0.13346207 -0.75785442  0.1295892

`prcomp()` objects

prcomp() returns the following important objects
- rotation: matrix of loadings (\(p \times p\))
- x: matrix of scores/PCs (\(n \times p\))
- sdev: standard deviations of the PCs (also, square-root of eigenvalues of correlation matrix)
- center, scale: mean/standard deviation vectors of original variables

Loadings

goblets_pca$rotation

                  PC1         PC2        PC3         PC4         PC5        PC6
MouthWidth  0.3660233  0.48592912 -0.6179335 -0.32436829  0.27835629  0.2556581
TotalWidth  0.4515367 -0.03412653  0.3752732 -0.67427405 -0.08391876 -0.4386709
TotalHeight 0.4111609 -0.44135161  0.3163501  0.02019451  0.38254463  0.6239630
BaseWidth   0.4618586 -0.11457532 -0.1588367  0.54119094  0.38182563 -0.5564635
StemWidth   0.2963653  0.68277080  0.4914536  0.35921044 -0.22136144  0.1625790
StemHeight  0.4381125 -0.29768029 -0.3324080  0.13346207 -0.75785442  0.1295892

Scores

goblets_pca$x

             PC1          PC2         PC3          PC4         PC5           PC6
 [1,]  0.4724326 -0.025587670  0.64572404 -0.429551128 -0.01788425  0.0432723489
 [2,]  0.3551757 -0.747416633 -0.86646955  0.916962515  0.69767990  0.1581123362
 [3,]  2.5682176 -0.005956072 -1.24324748 -0.597104969  0.25186054 -0.2623550978
 [4,]  0.9455738  2.363098749  0.03405265  0.509258860 -0.25627655  0.0007467682
 [5,]  1.1203843  2.448935880 -0.32572756 -0.226395798  0.23407762 -0.1481410898
 [6,]  0.6508633 -0.746100727  0.36812287  0.105018303  0.10955649 -0.2748903014
 [7,]  0.4394648 -0.660112674  0.10994051  0.165737274 -0.33050862 -0.2035319181
 [8,]  0.5621160 -0.273163556  1.10939014 -0.382406693  0.32961800 -0.1131009926
 [9,] -1.6920335  0.185212389  0.42995824  0.013300574  0.08552813  0.0678216794
[10,] -2.1054544  0.830051502  0.43430673  0.438252249  0.07413410 -0.0038557355
[11,]  1.1112325 -1.447189946 -0.16347544  0.200009437 -0.35680227 -0.2580418446
[12,]  0.8585801  0.579560252  1.06287388  0.033989365 -0.13088938  0.0596473254
[13,] -1.3213315 -0.237320609 -0.04334162 -0.081254259  0.25738747  0.1898312175
[14,]  1.3930101 -0.523917666  0.56171497 -0.098199041 -0.05460262 -0.0702028196
[15,]  1.1522576 -0.212222372  0.53876372 -0.509804137  0.09499890  0.2421534568
[16,]  0.5217002 -0.535724932 -0.67122923 -0.092477558 -0.07655724 -0.4485183366
[17,] -0.0994333  1.617947532 -0.09140099  0.526583088 -0.11130036 -0.1273051019
[18,]  1.7763959  1.549731588 -0.57626998 -0.407535958 -0.15492219  0.2099148349
[19,]  0.9515815 -0.642113521  0.63246486  0.190009305 -0.20061919  0.1813956360
[20,]  2.2715217 -0.657717663 -1.07074847 -0.081194252 -0.35335799  0.5302830258
[21,]  0.9755721 -0.802412037  0.31882396  0.007125104  0.39098498  0.1181806411
[22,] -4.0334505  0.085097503 -0.13313632  0.198251934 -0.10493998  0.0899205262
[23,] -5.0994230 -0.333415717 -0.44332583 -0.185035550 -0.10481801 -0.1253355582
[24,] -4.9173947 -0.224676863 -0.47187689 -0.575677054 -0.06080661  0.0955162116
[25,]  1.1424410 -1.584586738 -0.14588719  0.362138390 -0.21154087  0.0484827883

Variability of PCs

Running summary() on the PC analysis shows us some important information:

summary(goblets_pca)

Importance of components:
                         PC1   PC2     PC3     PC4     PC5     PC6
Standard deviation     2.067 1.045 0.62028 0.37735 0.25553 0.20882
Proportion of Variance 0.712 0.182 0.06412 0.02373 0.01088 0.00727
Cumulative Proportion  0.712 0.894 0.95812 0.98185 0.99273 1.00000

Note that the first two PCs explain 71+18 = 89% of total variability in 6-dimensional space!
\(\implies\) There’s a lot of shared information in the 6 goblet variables (as we know from the pairs plot)

The `factoextra` package

The factoextra package contains several excellent functions for plotting PCA results.
Input: PCA result from prcomp
Produces ggplot-based plot

library(factoextra)

Scree plot with `fviz_eig()`

A scree plot is a common plot in PCA that visualizing the variability explained by each PCA.

Base usage:

fviz_eig(goblets_pca)

Scree is a geological term for a collection of broken rock fragments at the base of steep rocky mass:

Image source: Wikipedia

Scree plot interpretation

Look for the spot where the cliff ends and the scree begins (point of diminishing returns)
Also called “elbow” or “kink” in the plot
These are the number of PCs that adequately approximate the full-dimensional data
Any PCs beyond the sharp elbow add little additional information

fviz_eig(goblets_pca)

Interpretation:

We see the cliff ends at PC 2
Scree begins at PC 3
2 dimensions is sufficient for representing this 6D data set

Scree plot options

Scree plots are often visualized just with a line for a simpler look
Sometimes represent the eigenvalues instead of the % variability explained
Both versions easy to get with fviz_eig(); note also the seamless ggplot integration:

fviz_eig(goblets_pca, geom='line') + 
  labs(title='Scree plot, % variability explained')+
  theme_classic(base_size = 16)

fviz_eig(goblets_pca, geom='line', 
         choice = 'eigenvalue') +
  labs(title='Scree plot, eigenvalues') +
  theme_classic(base_size = 16)

Scree plot with lines only, eigenvalues instead of cumulative variability

Variable contributions with `fviz_contrib()`

“Variable contributions” to each PC are measured by the squared loadings
fviz_contrib() plots the squared loadings, which must sum to 1 (remember, each vector of loadings is normalized)
A reference line is placed at \(100\times 1/p\), 16.67% for us: the contribution of each variable if each contributed equally to that PC.
Variables above this threshold are weighted more heavily in that direction than variables below.

fviz_contrib(goblets_pca, choice = 'var', axes = 1) + 
  theme_classic(base_size = 16) +
  labs(x = 'Variable', 
       title = 'Contribution to first dimension')

fviz_contrib(goblets_pca, choice = 'var', axes = 2) + 
  theme_classic(base_size = 16)+
  labs(x = 'Variable', 
       title = 'Contribution to second dimension')

Interpreting variable contribution plot: 1st PC

fviz_contrib(goblets_pca, choice = 'var', axes = 1) + 
  theme_classic(base_size = 16) +
  labs(x = 'Variable', 
       title = 'Contribution to first dimension')

sort(goblets_pca$rotation[,1], decreasing=TRUE)

  BaseWidth  TotalWidth  StemHeight TotalHeight  MouthWidth   StemWidth 
  0.4618586   0.4515367   0.4381125   0.4111609   0.3660233   0.2963653

sort(goblets_pca$rotation[,1]^2, decreasing=TRUE)

  BaseWidth  TotalWidth  StemHeight TotalHeight  MouthWidth   StemWidth 
 0.21331336  0.20388536  0.19194254  0.16905330  0.13397304  0.08783241

sum(goblets_pca$rotation[,1]^2)

[1] 1

Base width, total width, stem height, and total height contribute the most to the first PC.
Mouth width and stem width are less for separating goblets in the first PC direction
As all loadings are \(>0\), generally large goblets will have large PC1 values and small goblets will have small PC1 values

Interpreting variable contribution plot: 2nd PC

fviz_contrib(goblets_pca, choice = 'var', axes = 2) + 
  theme_classic(base_size = 16)+
  labs(x = 'Variable', 
       title = 'Contribution to second dimension')

sort(goblets_pca$rotation[,2], decreasing=TRUE)

  StemWidth  MouthWidth  TotalWidth   BaseWidth  StemHeight TotalHeight 
 0.68277080  0.48592912 -0.03412653 -0.11457532 -0.29768029 -0.44135161

sort(goblets_pca$rotation[,2]^2, decreasing=TRUE)

  StemWidth  MouthWidth TotalHeight  StemHeight   BaseWidth  TotalWidth 
 0.46617596  0.23612711  0.19479125  0.08861356  0.01312750  0.00116462

sum(goblets_pca$rotation[,2]^2)

[1] 1

Stem width and mouth width (least important for the first PC) are now the two most important variables for separating goblets in the second PC direction.
Total height is third most important, but has a negative loading
Goblets with large stem and mouth widths, and small heights (i.e., short, squat goblets) will have large PC2 values

Loadings plot with `fviz_pca_var()`

While the variable contribution plot shows us importance, it doesn’t convey how the variables explain variability in the first 2 PC directions.
That’s the job of the loadings plot, which plots the loadings as arrow vectors.

fviz_pca_var(goblets_pca, axes= c(1,2))

goblets_pca$rotation[,1:2]

                  PC1         PC2
MouthWidth  0.3660233  0.48592912
TotalWidth  0.4515367 -0.03412653
TotalHeight 0.4111609 -0.44135161
BaseWidth   0.4618586 -0.11457532
StemWidth   0.2963653  0.68277080
StemHeight  0.4381125 -0.29768029

Total width and base width point almost directly horizontal, as we would expect given the variable contribution plots: the first PC is determined primarily by these variables
Stem width and mouth width point more in the vertical direction than any other variable, consistent with these being the highest contributors to PC2.
All arrows point in somewhat similar directions, consistent with notable positive correlation between all of these variables

Biplot with `fviz_pca()`

We can put this all together, adding the arrows on top of a plot of the first two PC scores with fviz_pca()
One point = one goblet’s first two PC scores
One arrow = one variable’s loading

fviz_pca(goblets_pca, axes = 1:2)

goblets_pca$x[,1:2]

             PC1          PC2
 [1,]  0.4724326 -0.025587670
 [2,]  0.3551757 -0.747416633
 [3,]  2.5682176 -0.005956072
 [4,]  0.9455738  2.363098749
 [5,]  1.1203843  2.448935880
 [6,]  0.6508633 -0.746100727
 [7,]  0.4394648 -0.660112674
 [8,]  0.5621160 -0.273163556
 [9,] -1.6920335  0.185212389
[10,] -2.1054544  0.830051502
[11,]  1.1112325 -1.447189946
[12,]  0.8585801  0.579560252
[13,] -1.3213315 -0.237320609
[14,]  1.3930101 -0.523917666
[15,]  1.1522576 -0.212222372
[16,]  0.5217002 -0.535724932
[17,] -0.0994333  1.617947532
[18,]  1.7763959  1.549731588
[19,]  0.9515815 -0.642113521
[20,]  2.2715217 -0.657717663
[21,]  0.9755721 -0.802412037
[22,] -4.0334505  0.085097503
[23,] -5.0994230 -0.333415717
[24,] -4.9173947 -0.224676863
[25,]  1.1424410 -1.584586738

Interpreting biplot

fviz_pca(goblets_pca, axes = 1:2)

Goblets like 3 and 20 have very wide Total and Base widths
Goblets like 23 and 24 have very narrow Total and Base widths
Goblet 20 is much taller than Goblet 10
Goblets like 4 and 5 have similar Total and Base width to goblets like 15 and 14, but have much widers Stems and Mouths.

Fast food nutrition

Time for a new example on nutritional “value” of various fast food items

fastfood <- read.csv('Data/fast_food_nutrition.csv')
head(fastfood)

  Restaurant                                Item     Type Breakfast Calories TotalFat SaturatedFat Cholesterol Sodium TotalCarbs Fiber Sugar Protein
1     Wendys            10 piece Chicken Nuggets  Chicken        No      450       29            6          65    930         26     2     0      21
2     Wendys             5 piece Chicken Nuggets  Chicken        No      220       14            3          35    460         13     1     0      10
3  McDonalds     Angus Bacon & Cheese Snack Wrap     Wrap        No      390       21            9          75   1080         28     1     4      21
4  McDonalds             Angus Deluxe Snack Wrap     Wrap        No      410       25           10          75    990         27     2     3      20
5  McDonalds              Angus Mushroom & Swiss Sandwich        No      770       40           17         135   1170         59     4     8      44
6     Wendys Apple Pecan Chicken Salad full-Size    Salad        No      570       27            8         115   1340         51     6    39      37

tail(fastfood)

    Restaurant                Item     Type Breakfast Calories TotalFat SaturatedFat Cholesterol Sodium TotalCarbs Fiber Sugar Protein
75 Burger King  Spicy ChickN Crisp Sandwich        No      450       30            5          30    830         34     2     4      13
76 Burger King Tendercrisp Chicken Sandwich        No      800       46            8          70   1560         68     3     9      32
77 Burger King Tendergrill Chicken Sandwich        No      490       21            4          85   1330         51     2     7      26
78 Burger King      Triple Whopper Sandwich        No     1160       76           27         205   1110         51     3    11      68
79 Burger King             Whopper Sandwich        No      680       40           11          75    980         51     3    11      40
80 Burger King          Whopper JR Sandwich        No      370       21            5          40    520         31     1     6      14

Selecting the numeric variables and performing PCA (making sure to scale!):

library(tidyverse)
numeric_variables <- fastfood %>% select(Calories:Protein)
rownames(numeric_variables) <- fastfood$Item #Useful for subsequent point labeling
food_pca <- prcomp(numeric_variables, scale. = TRUE)

Pairs plot

What kind of correlations are we dealing with?

ggpairs(numeric_variables) + 
  theme_classic(base_size = 14)

calories, fat (both total and saturated) and protein all correlated with each other
Cholesteral highly correlated with protein and fat
Sugar, fiber, carbs, and cholesterol are kind of “on their own”

How many PCs?

summary(food_pca)

Importance of components:
                          PC1    PC2     PC3     PC4     PC5     PC6     PC7     PC8     PC9
Standard deviation     2.2601 1.1632 0.93582 0.91378 0.61020 0.50978 0.34841 0.26973 0.04262
Proportion of Variance 0.5676 0.1503 0.09731 0.09278 0.04137 0.02888 0.01349 0.00808 0.00020
Cumulative Proportion  0.5676 0.7179 0.81520 0.90798 0.94935 0.97823 0.99171 0.99980 1.00000

fviz_eig(food_pca, geom='line') +
  theme_classic(base_size = 16)

Could argue the “scree” starts at 5 and above
Consider retaining first 4 PCs to approximate data? (Explains 90.8% of total variability)

What’s contributing to the PCs?

library(patchwork)
c1 <- fviz_contrib(food_pca, choice='var', axes = 1)
c2 <- fviz_contrib(food_pca, choice='var', axes = 2)
c3 <- fviz_contrib(food_pca, choice='var', axes = 3)
c4 <- fviz_contrib(food_pca, choice='var', axes = 4)

(c1+c2)/(c3+c4)

Variable contribution plots for food nutrition PCA

food_pca$rotation[,1:4]

                    PC1         PC2         PC3         PC4
Calories     0.42671188 -0.06019828 -0.04565786 -0.23226760
TotalFat     0.41555181  0.10272038  0.02777273 -0.19041622
SaturatedFat 0.39977015  0.07660561  0.22907804  0.08035464
Cholesterol  0.35766578  0.19339534  0.16139158  0.32754323
Sodium       0.36317299  0.10118529 -0.23634847 -0.10139831
TotalCarbs   0.18874417 -0.59221814 -0.24040894 -0.54335240
Fiber        0.13019795 -0.46002736 -0.53815934  0.65806472
Sugar        0.06708248 -0.59171566  0.71730148  0.16443492
Protein      0.41051448  0.14085855  0.02874194  0.17450080

Note the “equal contribution” threshold = \(100\times 1/10 =10\%\)
Calories, fat, and protein are driving PC1
PC2 is driven by carbs, sugar and fiber. Sugar and fiber are main drivers of PC3.
Cholesterol third highest contributor to PC4.

Sign matters!

It takes a bit more work to plot the raw loadings (must create out own data frame), but in this way we can see their signs which provides insight the squared loadings don’t.

Variable loading plots for food nutrition PCA

loadings_df <- data.frame(food_pca$rotation) %>% 
  rownames_to_column('Variable')

base_plot <- ggplot(loadings_df) + 
  theme_classic(base_size = 12) +
  theme(axis.text.x = element_text(angle = 45,hjust=1,vjust=1)) + 
  labs(x = 'Variable') + 
  geom_hline(aes(yintercept= 0), linetype=2)

l1 <- base_plot + 
  geom_col(aes(x = reorder(Variable,PC1, decreasing=TRUE), y = PC1))
l2 <- base_plot+
  geom_col(aes(x = reorder(Variable,PC2, decreasing=TRUE), y = PC2))  
l3 <- base_plot + 
  geom_col(aes(x = reorder(Variable,PC3, decreasing=TRUE), y = PC3))  
l4 <- base_plot + 
  geom_col(aes(x = reorder(Variable,PC4, decreasing=TRUE), y = PC4))  


(l1+l2)/(l3+l4) +
  plot_annotation(title = 'Loadings plots for fast food PCA',
                  theme=theme(plot.title = element_text(hjust = 0.5)))

Foods low in fiber, sugar, carbs will have high PC2 scores.
PC3 contrasts foods high in sugar and low in fiber.
PC4 contrasts foods high in fiber and low in carbs.

PCA biplot + bells and whistles

We can easily incorporate more information (ggrepeled-point labeling, color-coding) in PCA biplots
Default graph: ugh!!

fviz_pca(food_pca, axes = c(1,2),
         col.ind = fastfood$Type,
         repel = TRUE)

Biplot of first 2 PCs from fast food example

Cleaning it up

Here we are also suppressing the point labeling, labeling only the variable arrows (even repelling doesn’t save us if we label everything!), and cleaning up the legend

fviz_pca(food_pca, axes = c(1,2), 
         col.ind = fastfood$Type, 
         label ='var',   #  Only label variable arrows, not 'ind' too
         repel = TRUE, 
         pointsize = 3) + 
  scale_shape_manual(values = c(15:19,8)) + #Control point shapes; see ?pch
  labs(color='Food type', shape = 'Food type') #Clean up legend title

Interpretation: axes 1 and 2

The “big points” are the mean vectors of the PCs for each food type
Protein, sodium, saturated fat, total fat, cholesterol all highly correlated
- Food items that are high in these are obviously sandwiches
- Desserts are low in these
These are not that correlated with carbs, sugar, fiber
- Low in sugar/fiber/carbs include chicken and wraps

What about PCs 3 and 4?

Want to produce a plot of PC3 vs PC4
Want to label points with more control over the repelling and text size
Need to set up a data frame with the PCs and information for point labelling (food type, item name) to feed to geom_text_repel()

PCS_with_labels <- bind_cols(food_pca$x[,1:4], Item=fastfood$Item, 
                             type = fastfood$Type)
PCS_with_labels

# A tibble: 80 × 6
      PC1    PC2     PC3     PC4 Item                                type    
    <dbl>  <dbl>   <dbl>   <dbl> <chr>                               <chr>   
 1 -0.339  1.19  -0.442   0.100  10 piece Chicken Nuggets            Chicken 
 2 -2.51   1.57   0.116   0.416  5 piece Chicken Nuggets             Chicken 
 3 -0.326  1.14   0.236  -0.0293 Angus Bacon & Cheese Snack Wrap     Wrap    
 4 -0.155  1.00  -0.0511  0.302  Angus Deluxe Snack Wrap             Wrap    
 5  3.19  -0.577 -0.396   0.0642 Angus Mushroom & Swiss              Sandwich
 6  1.96  -3.26   1.37    1.75   Apple Pecan Chicken Salad full-Size Salad   
 7 -0.871 -1.18   0.886   1.41   Apple Pecan Chicken Salad Half-Size Salad   
 8  2.92  -0.726 -0.898  -0.281  Asiago Ranch Chicken Club           Sandwich
 9  1.23  -2.02  -2.05    0.521  Bacon Cheese Potato                 Side    
10  6.08   1.35   0.848   0.0783 Baconator                           Sandwich
# ℹ 70 more rows

Code for plot production:

p1 <-  fviz_pca(food_pca, axes = c(1,2), 
         col.ind = fastfood$Type, 
         label ='var',   
         repel=TRUE,
         select.var = list(contrib=5) # Can control num arrows!  Top 5 per PC combo
         ) + 
  scale_shape_manual(values = c(15:19,8)) + 
  labs(color='Food type', shape = 'Food type', title='')  +
  geom_text_repel(aes(x = PC1, y = PC2, label = Item, color = type), 
                  data = PCS_with_labels, 
                  max.overlaps = 7, 
                  size = 3)

p2 <-  fviz_pca(food_pca, axes = c(3,4), 
         col.ind = fastfood$Type, 
         label ='var',   
         repel=TRUE,
         select.var = list(contrib=5) # Can control num arrows!  Top 5 per PC combo
         ) + 
  scale_shape_manual(values = c(15:19,8)) + 
  guides(color='none', shape = 'none') + # eliminate color legend; already have one
  labs(color='Food type', shape = 'Food type', title='')  + 
  geom_text_repel(aes(x = PC3, y = PC4, label = Item, color = type), 
                  data = PCS_with_labels, 
                  max.overlaps = 7, 
                  size = 3)
p1 + p2

Plot result

Biplots of PC1 v PC2 and PC3 v PC4 with point labeling

Notice how PCs 3 and 4 are much more focused on the sugar, cholesterol, fiber, and carb content - the “low-correlation” variables in our initial pairs plot
PCs 3 and 4 highlight variability in the salads, desserts, and sides - notice how clumped in the middle the sandwiches and chicken are in the PC3v4 plot, when they were the big story in the PC1v2 plot
Notice how PCs 3 and 4 contrast fiber, sugar, and carbs, which we knew from the loadings plots)

3.3 - PCA implementation

Where we’re headed

6D example: Prehistoric goblets

Reading and plotting

The prcomp function

prcomp() objects

Loadings

Scores

Variability of PCs

The factoextra package

Scree plot with fviz_eig()

Scree plot interpretation

Scree plot options

Variable contributions with fviz_contrib()

Interpreting variable contribution plot: 1st PC

Interpreting variable contribution plot: 2nd PC

Loadings plot with fviz_pca_var()

Biplot with fviz_pca()

Interpreting biplot

Fast food nutrition

Pairs plot

How many PCs?

What’s contributing to the PCs?

Sign matters!

PCA biplot + bells and whistles

Cleaning it up

Interpretation: axes 1 and 2

What about PCs 3 and 4?

Code for plot production:

Plot result

The `prcomp` function

`prcomp()` objects

The `factoextra` package

Scree plot with `fviz_eig()`

Variable contributions with `fviz_contrib()`

Loadings plot with `fviz_pca_var()`

Biplot with `fviz_pca()`