3.3 - PCA implementation

Where we’re headed

  • We’ve explored the concepts behind PCA
  • Time now to see how to implement PCA in R, including:
    • Functions for executing PCA
    • Packages and functions for creating useful plots to gaining insight into our PCA

6D example: Prehistoric goblets

  • Suppose you have been asked to analyze data on 25 prehistoric goblets from Thailand (Professor C.F.W. Higham, University of Otago, as taken from Manly, B.F.J. 1986. Multivariate Statistical Methods: A Primer. Chapman and Hall, London, 159pp.)

  • Let’s consider three of these:

  • \(X_1\) = mouth width
  • \(X_2\) = total width
  • \(X_3\) = total height
  • \(X_4\) = base width
  • \(X_5\) = stem width
  • \(X_6\) = stem height

Reading and plotting

goblets <- read.csv('Data/goblets.csv')
head(goblets)
  MouthWidth TotalWidth TotalHeight BaseWidth StemWidth StemHeight
1         13         21          23        14         7          8
2         14         14          24        19         5          9
3         19         23          24        20         6         12
4         17         18          16        16        11          8
5         19         20          16        16        10          7
6         12         20          24        17         6          9
  • Recall questions:
    • These variables are correlated. Do we need all 6 to characterize the goblets?
    • Is there a linear combination of these variables that reduces these variables to 2 dimensions that works well to characterize these goblets?
library(GGally)
ggpairs(goblets) + theme_classic(base_size = 14)

Pairs plot of goblets data

The prcomp function

  • We will use prcomp() (comes with base R) as our PCA workhorse
  • Uses SVD on the correlation matrix (assuming scale.=TRUE, which it always should be)
goblets_pca <- prcomp(goblets, scale. = TRUE)
goblets_pca
Standard deviations (1, .., p=6):
[1] 2.0668279 1.0450729 0.6202804 0.3773544 0.2555262 0.2088231

Rotation (n x k) = (6 x 6):
                  PC1         PC2        PC3         PC4         PC5        PC6
MouthWidth  0.3660233  0.48592912 -0.6179335 -0.32436829  0.27835629  0.2556581
TotalWidth  0.4515367 -0.03412653  0.3752732 -0.67427405 -0.08391876 -0.4386709
TotalHeight 0.4111609 -0.44135161  0.3163501  0.02019451  0.38254463  0.6239630
BaseWidth   0.4618586 -0.11457532 -0.1588367  0.54119094  0.38182563 -0.5564635
StemWidth   0.2963653  0.68277080  0.4914536  0.35921044 -0.22136144  0.1625790
StemHeight  0.4381125 -0.29768029 -0.3324080  0.13346207 -0.75785442  0.1295892

prcomp() objects

  • prcomp() returns the following important objects
    • rotation: matrix of loadings (\(p \times p\))
    • x: matrix of scores/PCs (\(n \times p\))
    • sdev: standard deviations of the PCs (also, square-root of eigenvalues of correlation matrix)
    • center, scale: mean/standard deviation vectors of original variables

Loadings

goblets_pca$rotation
                  PC1         PC2        PC3         PC4         PC5        PC6
MouthWidth  0.3660233  0.48592912 -0.6179335 -0.32436829  0.27835629  0.2556581
TotalWidth  0.4515367 -0.03412653  0.3752732 -0.67427405 -0.08391876 -0.4386709
TotalHeight 0.4111609 -0.44135161  0.3163501  0.02019451  0.38254463  0.6239630
BaseWidth   0.4618586 -0.11457532 -0.1588367  0.54119094  0.38182563 -0.5564635
StemWidth   0.2963653  0.68277080  0.4914536  0.35921044 -0.22136144  0.1625790
StemHeight  0.4381125 -0.29768029 -0.3324080  0.13346207 -0.75785442  0.1295892

Scores

goblets_pca$x
             PC1          PC2         PC3          PC4         PC5           PC6
 [1,]  0.4724326 -0.025587670  0.64572404 -0.429551128 -0.01788425  0.0432723489
 [2,]  0.3551757 -0.747416633 -0.86646955  0.916962515  0.69767990  0.1581123362
 [3,]  2.5682176 -0.005956072 -1.24324748 -0.597104969  0.25186054 -0.2623550978
 [4,]  0.9455738  2.363098749  0.03405265  0.509258860 -0.25627655  0.0007467682
 [5,]  1.1203843  2.448935880 -0.32572756 -0.226395798  0.23407762 -0.1481410898
 [6,]  0.6508633 -0.746100727  0.36812287  0.105018303  0.10955649 -0.2748903014
 [7,]  0.4394648 -0.660112674  0.10994051  0.165737274 -0.33050862 -0.2035319181
 [8,]  0.5621160 -0.273163556  1.10939014 -0.382406693  0.32961800 -0.1131009926
 [9,] -1.6920335  0.185212389  0.42995824  0.013300574  0.08552813  0.0678216794
[10,] -2.1054544  0.830051502  0.43430673  0.438252249  0.07413410 -0.0038557355
[11,]  1.1112325 -1.447189946 -0.16347544  0.200009437 -0.35680227 -0.2580418446
[12,]  0.8585801  0.579560252  1.06287388  0.033989365 -0.13088938  0.0596473254
[13,] -1.3213315 -0.237320609 -0.04334162 -0.081254259  0.25738747  0.1898312175
[14,]  1.3930101 -0.523917666  0.56171497 -0.098199041 -0.05460262 -0.0702028196
[15,]  1.1522576 -0.212222372  0.53876372 -0.509804137  0.09499890  0.2421534568
[16,]  0.5217002 -0.535724932 -0.67122923 -0.092477558 -0.07655724 -0.4485183366
[17,] -0.0994333  1.617947532 -0.09140099  0.526583088 -0.11130036 -0.1273051019
[18,]  1.7763959  1.549731588 -0.57626998 -0.407535958 -0.15492219  0.2099148349
[19,]  0.9515815 -0.642113521  0.63246486  0.190009305 -0.20061919  0.1813956360
[20,]  2.2715217 -0.657717663 -1.07074847 -0.081194252 -0.35335799  0.5302830258
[21,]  0.9755721 -0.802412037  0.31882396  0.007125104  0.39098498  0.1181806411
[22,] -4.0334505  0.085097503 -0.13313632  0.198251934 -0.10493998  0.0899205262
[23,] -5.0994230 -0.333415717 -0.44332583 -0.185035550 -0.10481801 -0.1253355582
[24,] -4.9173947 -0.224676863 -0.47187689 -0.575677054 -0.06080661  0.0955162116
[25,]  1.1424410 -1.584586738 -0.14588719  0.362138390 -0.21154087  0.0484827883

Variability of PCs

  • Running summary() on the PC analysis shows us some important information:
summary(goblets_pca)
Importance of components:
                         PC1   PC2     PC3     PC4     PC5     PC6
Standard deviation     2.067 1.045 0.62028 0.37735 0.25553 0.20882
Proportion of Variance 0.712 0.182 0.06412 0.02373 0.01088 0.00727
Cumulative Proportion  0.712 0.894 0.95812 0.98185 0.99273 1.00000
  • Note that the first two PCs explain 71+18 = 89% of total variability in 6-dimensional space!
  • \(\implies\) There’s a lot of shared information in the 6 goblet variables (as we know from the pairs plot)

The factoextra package

  • The factoextra package contains several excellent functions for plotting PCA results.
  • Input: PCA result from prcomp
  • Produces ggplot-based plot
library(factoextra)

Scree plot with fviz_eig()

  • A scree plot is a common plot in PCA that visualizing the variability explained by each PCA.
  • Base usage:
fviz_eig(goblets_pca) 

Scree plot

  • Scree is a geological term for a collection of broken rock fragments at the base of steep rocky mass:

Photo of scree

Image source: Wikipedia

Scree plot interpretation

  • Look for the spot where the cliff ends and the scree begins (point of diminishing returns)
  • Also called “elbow” or “kink” in the plot
  • These are the number of PCs that adequately approximate the full-dimensional data
  • Any PCs beyond the sharp elbow add little additional information
fviz_eig(goblets_pca) 

Scree plot

Interpretation:

  • We see the cliff ends at PC 2
  • Scree begins at PC 3
  • 2 dimensions is sufficient for representing this 6D data set

Scree plot options

  • Scree plots are often visualized just with a line for a simpler look
  • Sometimes represent the eigenvalues instead of the % variability explained
  • Both versions easy to get with fviz_eig(); note also the seamless ggplot integration:
fviz_eig(goblets_pca, geom='line') + 
  labs(title='Scree plot, % variability explained')+
  theme_classic(base_size = 16) 

Scree plot with lines only

fviz_eig(goblets_pca, geom='line', 
         choice = 'eigenvalue') +
  labs(title='Scree plot, eigenvalues') +
  theme_classic(base_size = 16) 

Scree plot with lines only, eigenvalues instead of cumulative variability

Variable contributions with fviz_contrib()

  • “Variable contributions” to each PC are measured by the squared loadings
  • fviz_contrib() plots the squared loadings, which must sum to 1 (remember, each vector of loadings is normalized)
  • A reference line is placed at \(100\times 1/p\), 16.67% for us: the contribution of each variable if each contributed equally to that PC.
  • Variables above this threshold are weighted more heavily in that direction than variables below.
fviz_contrib(goblets_pca, choice = 'var', axes = 1) + 
  theme_classic(base_size = 16) +
  labs(x = 'Variable', 
       title = 'Contribution to first dimension')

Variable cotribution to first PC

fviz_contrib(goblets_pca, choice = 'var', axes = 2) + 
  theme_classic(base_size = 16)+
  labs(x = 'Variable', 
       title = 'Contribution to second dimension')

Variable cotribution to second PC

Interpreting variable contribution plot: 1st PC

fviz_contrib(goblets_pca, choice = 'var', axes = 1) + 
  theme_classic(base_size = 16) +
  labs(x = 'Variable', 
       title = 'Contribution to first dimension')

Variable cotribution to first PC

sort(goblets_pca$rotation[,1], decreasing=TRUE)
  BaseWidth  TotalWidth  StemHeight TotalHeight  MouthWidth   StemWidth 
  0.4618586   0.4515367   0.4381125   0.4111609   0.3660233   0.2963653 
sort(goblets_pca$rotation[,1]^2, decreasing=TRUE)
  BaseWidth  TotalWidth  StemHeight TotalHeight  MouthWidth   StemWidth 
 0.21331336  0.20388536  0.19194254  0.16905330  0.13397304  0.08783241 
sum(goblets_pca$rotation[,1]^2)
[1] 1
  • Base width, total width, stem height, and total height contribute the most to the first PC.
  • Mouth width and stem width are less for separating goblets in the first PC direction
  • As all loadings are \(>0\), generally large goblets will have large PC1 values and small goblets will have small PC1 values

Interpreting variable contribution plot: 2nd PC

fviz_contrib(goblets_pca, choice = 'var', axes = 2) + 
  theme_classic(base_size = 16)+
  labs(x = 'Variable', 
       title = 'Contribution to second dimension')

Variable cotribution to second PC

sort(goblets_pca$rotation[,2], decreasing=TRUE)
  StemWidth  MouthWidth  TotalWidth   BaseWidth  StemHeight TotalHeight 
 0.68277080  0.48592912 -0.03412653 -0.11457532 -0.29768029 -0.44135161 
sort(goblets_pca$rotation[,2]^2, decreasing=TRUE)
  StemWidth  MouthWidth TotalHeight  StemHeight   BaseWidth  TotalWidth 
 0.46617596  0.23612711  0.19479125  0.08861356  0.01312750  0.00116462 
sum(goblets_pca$rotation[,2]^2)
[1] 1
  • Stem width and mouth width (least important for the first PC) are now the two most important variables for separating goblets in the second PC direction.
  • Total height is third most important, but has a negative loading
  • Goblets with large stem and mouth widths, and small heights (i.e., short, squat goblets) will have large PC2 values

Loadings plot with fviz_pca_var()

  • While the variable contribution plot shows us importance, it doesn’t convey how the variables explain variability in the first 2 PC directions.
  • That’s the job of the loadings plot, which plots the loadings as arrow vectors.
fviz_pca_var(goblets_pca, axes= c(1,2))

Variable PCA plot

goblets_pca$rotation[,1:2]
                  PC1         PC2
MouthWidth  0.3660233  0.48592912
TotalWidth  0.4515367 -0.03412653
TotalHeight 0.4111609 -0.44135161
BaseWidth   0.4618586 -0.11457532
StemWidth   0.2963653  0.68277080
StemHeight  0.4381125 -0.29768029
  • Total width and base width point almost directly horizontal, as we would expect given the variable contribution plots: the first PC is determined primarily by these variables
  • Stem width and mouth width point more in the vertical direction than any other variable, consistent with these being the highest contributors to PC2.
  • All arrows point in somewhat similar directions, consistent with notable positive correlation between all of these variables

Biplot with fviz_pca()

  • We can put this all together, adding the arrows on top of a plot of the first two PC scores with fviz_pca()
  • One point = one goblet’s first two PC scores
  • One arrow = one variable’s loading
fviz_pca(goblets_pca, axes = 1:2)

Biplot of the goblet PCA

goblets_pca$x[,1:2]
             PC1          PC2
 [1,]  0.4724326 -0.025587670
 [2,]  0.3551757 -0.747416633
 [3,]  2.5682176 -0.005956072
 [4,]  0.9455738  2.363098749
 [5,]  1.1203843  2.448935880
 [6,]  0.6508633 -0.746100727
 [7,]  0.4394648 -0.660112674
 [8,]  0.5621160 -0.273163556
 [9,] -1.6920335  0.185212389
[10,] -2.1054544  0.830051502
[11,]  1.1112325 -1.447189946
[12,]  0.8585801  0.579560252
[13,] -1.3213315 -0.237320609
[14,]  1.3930101 -0.523917666
[15,]  1.1522576 -0.212222372
[16,]  0.5217002 -0.535724932
[17,] -0.0994333  1.617947532
[18,]  1.7763959  1.549731588
[19,]  0.9515815 -0.642113521
[20,]  2.2715217 -0.657717663
[21,]  0.9755721 -0.802412037
[22,] -4.0334505  0.085097503
[23,] -5.0994230 -0.333415717
[24,] -4.9173947 -0.224676863
[25,]  1.1424410 -1.584586738

Interpreting biplot

fviz_pca(goblets_pca, axes = 1:2)

Biplot of the goblet PCA

  • Goblets like 3 and 20 have very wide Total and Base widths
  • Goblets like 23 and 24 have very narrow Total and Base widths
  • Goblet 20 is much taller than Goblet 10
  • Goblets like 4 and 5 have similar Total and Base width to goblets like 15 and 14, but have much widers Stems and Mouths.

Fast food nutrition

  • Time for a new example on nutritional “value” of various fast food items
fastfood <- read.csv('Data/fast_food_nutrition.csv')
head(fastfood)
  Restaurant                                Item     Type Breakfast Calories TotalFat SaturatedFat Cholesterol Sodium TotalCarbs Fiber Sugar Protein
1     Wendys            10 piece Chicken Nuggets  Chicken        No      450       29            6          65    930         26     2     0      21
2     Wendys             5 piece Chicken Nuggets  Chicken        No      220       14            3          35    460         13     1     0      10
3  McDonalds     Angus Bacon & Cheese Snack Wrap     Wrap        No      390       21            9          75   1080         28     1     4      21
4  McDonalds             Angus Deluxe Snack Wrap     Wrap        No      410       25           10          75    990         27     2     3      20
5  McDonalds              Angus Mushroom & Swiss Sandwich        No      770       40           17         135   1170         59     4     8      44
6     Wendys Apple Pecan Chicken Salad full-Size    Salad        No      570       27            8         115   1340         51     6    39      37
tail(fastfood)
    Restaurant                Item     Type Breakfast Calories TotalFat SaturatedFat Cholesterol Sodium TotalCarbs Fiber Sugar Protein
75 Burger King  Spicy ChickN Crisp Sandwich        No      450       30            5          30    830         34     2     4      13
76 Burger King Tendercrisp Chicken Sandwich        No      800       46            8          70   1560         68     3     9      32
77 Burger King Tendergrill Chicken Sandwich        No      490       21            4          85   1330         51     2     7      26
78 Burger King      Triple Whopper Sandwich        No     1160       76           27         205   1110         51     3    11      68
79 Burger King             Whopper Sandwich        No      680       40           11          75    980         51     3    11      40
80 Burger King          Whopper JR Sandwich        No      370       21            5          40    520         31     1     6      14
  • Selecting the numeric variables and performing PCA (making sure to scale!):
library(tidyverse)
numeric_variables <- fastfood %>% select(Calories:Protein)
rownames(numeric_variables) <- fastfood$Item #Useful for subsequent point labeling
food_pca <- prcomp(numeric_variables, scale. = TRUE)

Pairs plot

  • What kind of correlations are we dealing with?
ggpairs(numeric_variables) + 
  theme_classic(base_size = 14)

Pairs plot of fast food data

  • calories, fat (both total and saturated) and protein all correlated with each other
  • Cholesteral highly correlated with protein and fat
  • Sugar, fiber, carbs, and cholesterol are kind of “on their own”

How many PCs?

summary(food_pca)
Importance of components:
                          PC1    PC2     PC3     PC4     PC5     PC6     PC7     PC8     PC9
Standard deviation     2.2601 1.1632 0.93582 0.91378 0.61020 0.50978 0.34841 0.26973 0.04262
Proportion of Variance 0.5676 0.1503 0.09731 0.09278 0.04137 0.02888 0.01349 0.00808 0.00020
Cumulative Proportion  0.5676 0.7179 0.81520 0.90798 0.94935 0.97823 0.99171 0.99980 1.00000
fviz_eig(food_pca, geom='line') +
  theme_classic(base_size = 16)

Scree plot for fast food PCA

  • Could argue the “scree” starts at 5 and above
  • Consider retaining first 4 PCs to approximate data? (Explains 90.8% of total variability)

What’s contributing to the PCs?

library(patchwork)
c1 <- fviz_contrib(food_pca, choice='var', axes = 1)
c2 <- fviz_contrib(food_pca, choice='var', axes = 2)
c3 <- fviz_contrib(food_pca, choice='var', axes = 3)
c4 <- fviz_contrib(food_pca, choice='var', axes = 4)

(c1+c2)/(c3+c4)

Variable contribution plots for food nutrition PCA

food_pca$rotation[,1:4]
                    PC1         PC2         PC3         PC4
Calories     0.42671188 -0.06019828 -0.04565786 -0.23226760
TotalFat     0.41555181  0.10272038  0.02777273 -0.19041622
SaturatedFat 0.39977015  0.07660561  0.22907804  0.08035464
Cholesterol  0.35766578  0.19339534  0.16139158  0.32754323
Sodium       0.36317299  0.10118529 -0.23634847 -0.10139831
TotalCarbs   0.18874417 -0.59221814 -0.24040894 -0.54335240
Fiber        0.13019795 -0.46002736 -0.53815934  0.65806472
Sugar        0.06708248 -0.59171566  0.71730148  0.16443492
Protein      0.41051448  0.14085855  0.02874194  0.17450080
  • Note the “equal contribution” threshold = \(100\times 1/10 =10\%\)
  • Calories, fat, and protein are driving PC1
  • PC2 is driven by carbs, sugar and fiber. Sugar and fiber are main drivers of PC3.
  • Cholesterol third highest contributor to PC4.

Sign matters!

  • It takes a bit more work to plot the raw loadings (must create out own data frame), but in this way we can see their signs which provides insight the squared loadings don’t.

Variable loading plots for food nutrition PCA

loadings_df <- data.frame(food_pca$rotation) %>% 
  rownames_to_column('Variable')

base_plot <- ggplot(loadings_df) + 
  theme_classic(base_size = 12) +
  theme(axis.text.x = element_text(angle = 45,hjust=1,vjust=1)) + 
  labs(x = 'Variable') + 
  geom_hline(aes(yintercept= 0), linetype=2)

l1 <- base_plot + 
  geom_col(aes(x = reorder(Variable,PC1, decreasing=TRUE), y = PC1))
l2 <- base_plot+
  geom_col(aes(x = reorder(Variable,PC2, decreasing=TRUE), y = PC2))  
l3 <- base_plot + 
  geom_col(aes(x = reorder(Variable,PC3, decreasing=TRUE), y = PC3))  
l4 <- base_plot + 
  geom_col(aes(x = reorder(Variable,PC4, decreasing=TRUE), y = PC4))  


(l1+l2)/(l3+l4) +
  plot_annotation(title = 'Loadings plots for fast food PCA',
                  theme=theme(plot.title = element_text(hjust = 0.5)))
  • Foods low in fiber, sugar, carbs will have high PC2 scores.
  • PC3 contrasts foods high in sugar and low in fiber.
  • PC4 contrasts foods high in fiber and low in carbs.

PCA biplot + bells and whistles

  • We can easily incorporate more information (ggrepeled-point labeling, color-coding) in PCA biplots
  • Default graph: ugh!!
fviz_pca(food_pca, axes = c(1,2),
         col.ind = fastfood$Type,
         repel = TRUE) 
Biplot of first 2 PCs from fast food example

Cleaning it up

  • Here we are also suppressing the point labeling, labeling only the variable arrows (even repelling doesn’t save us if we label everything!), and cleaning up the legend
fviz_pca(food_pca, axes = c(1,2), 
         col.ind = fastfood$Type, 
         label ='var',   #  Only label variable arrows, not 'ind' too
         repel = TRUE, 
         pointsize = 3) + 
  scale_shape_manual(values = c(15:19,8)) + #Control point shapes; see ?pch
  labs(color='Food type', shape = 'Food type') #Clean up legend title
Biplot of first 2 PCs from fast food example

Interpretation: axes 1 and 2

Biplot of first 2 PCs from fast food example

  • The “big points” are the mean vectors of the PCs for each food type
  • Protein, sodium, saturated fat, total fat, cholesterol all highly correlated
    • Food items that are high in these are obviously sandwiches
    • Desserts are low in these
  • These are not that correlated with carbs, sugar, fiber
    • Low in sugar/fiber/carbs include chicken and wraps

What about PCs 3 and 4?

  • Want to produce a plot of PC3 vs PC4
  • Want to label points with more control over the repelling and text size
  • Need to set up a data frame with the PCs and information for point labelling (food type, item name) to feed to geom_text_repel()
PCS_with_labels <- bind_cols(food_pca$x[,1:4], Item=fastfood$Item, 
                             type = fastfood$Type)
PCS_with_labels
# A tibble: 80 × 6
      PC1    PC2     PC3     PC4 Item                                type    
    <dbl>  <dbl>   <dbl>   <dbl> <chr>                               <chr>   
 1 -0.339  1.19  -0.442   0.100  10 piece Chicken Nuggets            Chicken 
 2 -2.51   1.57   0.116   0.416  5 piece Chicken Nuggets             Chicken 
 3 -0.326  1.14   0.236  -0.0293 Angus Bacon & Cheese Snack Wrap     Wrap    
 4 -0.155  1.00  -0.0511  0.302  Angus Deluxe Snack Wrap             Wrap    
 5  3.19  -0.577 -0.396   0.0642 Angus Mushroom & Swiss              Sandwich
 6  1.96  -3.26   1.37    1.75   Apple Pecan Chicken Salad full-Size Salad   
 7 -0.871 -1.18   0.886   1.41   Apple Pecan Chicken Salad Half-Size Salad   
 8  2.92  -0.726 -0.898  -0.281  Asiago Ranch Chicken Club           Sandwich
 9  1.23  -2.02  -2.05    0.521  Bacon Cheese Potato                 Side    
10  6.08   1.35   0.848   0.0783 Baconator                           Sandwich
# ℹ 70 more rows

Code for plot production:

p1 <-  fviz_pca(food_pca, axes = c(1,2), 
         col.ind = fastfood$Type, 
         label ='var',   
         repel=TRUE,
         select.var = list(contrib=5) # Can control num arrows!  Top 5 per PC combo
         ) + 
  scale_shape_manual(values = c(15:19,8)) + 
  labs(color='Food type', shape = 'Food type', title='')  +
  geom_text_repel(aes(x = PC1, y = PC2, label = Item, color = type), 
                  data = PCS_with_labels, 
                  max.overlaps = 7, 
                  size = 3)

p2 <-  fviz_pca(food_pca, axes = c(3,4), 
         col.ind = fastfood$Type, 
         label ='var',   
         repel=TRUE,
         select.var = list(contrib=5) # Can control num arrows!  Top 5 per PC combo
         ) + 
  scale_shape_manual(values = c(15:19,8)) + 
  guides(color='none', shape = 'none') + # eliminate color legend; already have one
  labs(color='Food type', shape = 'Food type', title='')  + 
  geom_text_repel(aes(x = PC3, y = PC4, label = Item, color = type), 
                  data = PCS_with_labels, 
                  max.overlaps = 7, 
                  size = 3)
p1 + p2

Plot result

Biplots of PC1 v PC2 and PC3 v PC4 with point labeling
  • Notice how PCs 3 and 4 are much more focused on the sugar, cholesterol, fiber, and carb content - the “low-correlation” variables in our initial pairs plot
  • PCs 3 and 4 highlight variability in the salads, desserts, and sides - notice how clumped in the middle the sandwiches and chicken are in the PC3v4 plot, when they were the big story in the PC1v2 plot
  • Notice how PCs 3 and 4 contrast fiber, sugar, and carbs, which we knew from the loadings plots)