- Principal component analysis (PCA) distributes the variation in a multivariate dataset across a set of uncorrelated components.
- It can reveal patterns that would not otherwise be apparent.
- Linear algebra is at the heart of PCA (a brief sketch follows below).
- This discussion will be light on mathematical theory.
2019-02-12
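A brief sketch of the linear algebra involved (the notation here is introduced for illustration, not taken from the dataset): if $Z$ is the $n \times p$ matrix of standardized variables, PCA rests on the eigendecomposition of the correlation matrix $R$, and the component scores are the projections of $Z$ onto the eigenvectors:

$$
R = \frac{1}{n-1} Z^{\top} Z, \qquad R\,v_j = \lambda_j v_j, \qquad \text{scores} = Z \, [\,v_1 \; \cdots \; v_p\,],
$$

where the eigenvalues $\lambda_1 \ge \dots \ge \lambda_p$ give the variance captured by each successive component. The code below carries out these steps on the morphometric data.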
| | interoc | cwidth | clength | T.weight |
|---|---|---|---|---|
| median | 0.798 | 2.975 | 3.706 | 1.740 |
| mean | 0.799 | 2.991 | 3.691 | 1.742 |
| SE.mean | 0.005 | 0.020 | 0.020 | 0.004 |
| CI.mean.0.95 | 0.011 | 0.039 | 0.039 | 0.008 |
| var | 0.002 | 0.028 | 0.028 | 0.001 |
| std.dev | 0.046 | 0.166 | 0.167 | 0.033 |
| coef.var | 0.058 | 0.055 | 0.045 | 0.019 |
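The statistic names in this table (SE.mean, CI.mean.0.95, coef.var) match the output of stat.desc() from the pastecs package; assuming that is how the table was generated, a minimal sketch would be:

library(pastecs)  # assumption: the package is not named in the original text
round(stat.desc(morpho), digits = 3)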
# Standardize each variable to mean 0 and standard deviation 1 (z-scores)
standardize <- function(x) {(x - mean(x))/sd(x)}
# Eliminate factor variables & untransformed weights
my.scaled.data <- as.data.frame(apply(morpho, 2, standardize))
# Calculate correlation matrix
my.cor <- cor(my.scaled.data)
# Save the eigenvalues and eigenvectors of the correlation matrix
my.eigen <- eigen(my.cor)
# Rename matrix rows and columns for easier interpretation
rownames(my.eigen$vectors) <- c("interoc", "cwidth",
"clength", "T.weight")
colnames(my.eigen$vectors) <- c("PC1", "PC2", "PC3", "PC4")
| | PC1 | PC2 | PC3 | PC4 |
|---|---|---|---|---|
| interoc | -0.4973 | -0.2504 | 0.8251 | -0.0960 |
| cwidth | -0.5319 | -0.3465 | -0.4948 | -0.5935 |
| clength | -0.5760 | -0.0463 | -0.2716 | 0.7696 |
| T.weight | -0.3716 | 0.9028 | 0.0250 | -0.2150 |
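As a quick sanity check (a sketch using the my.eigen object above), the eigenvectors of a correlation matrix are orthonormal, so crossing the matrix of loadings with itself should return the identity matrix:

# t(V) %*% V should be the 4 x 4 identity matrix
round(t(my.eigen$vectors) %*% my.eigen$vectors, digits = 3)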
| PC | eigenvalues |
|---|---|
| PC1 | 2.7104 |
| PC2 | 0.7608 |
| PC3 | 0.4128 |
| PC4 | 0.1160 |
The sum of the eigenvalues equals the total variance of the scaled data:
sum(my.eigen$values)
## [1] 4
# Equivalently, each standardized column has variance 1, so the four variances sum to 4
sum(var(my.scaled.data[, 1]), var(my.scaled.data[, 2]),
    var(my.scaled.data[, 3]), var(my.scaled.data[, 4]))
## [1] 4
pc1.var <- 100*round(my.eigen$values[1]/
sum(my.eigen$values), digits = 3)
pc2.var <- 100*round(my.eigen$values[2]/
sum(my.eigen$values), digits = 3)
pc3.var <- 100*round(my.eigen$values[3]/
sum(my.eigen$values), digits = 3)
pc4.var <- 100*round(my.eigen$values[4]/
sum(my.eigen$values), digits = 3)
pc <- data.frame(PC = c("PC1", "PC2", "PC3", "PC4"),
Percentage = c(pc1.var, pc2.var,
pc3.var, pc4.var))
| PC | Percentage |
|---|---|
| PC1 | 67.8 |
| PC2 | 19.0 |
| PC3 | 10.3 |
| PC4 | 2.9 |
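The same eigenvalues also give the cumulative percentage of variance directly; a quick sketch using the objects created above:

# Cumulative percentage of variance captured by the first k components;
# this matches the cumulative proportions reported by prcomp() below
round(100 * cumsum(my.eigen$values) / sum(my.eigen$values), digits = 1)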
The percentages of total variation should sum to approximately 100%, allowing for rounding error.
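One way to confirm this with the pc data frame built above (the original check is not shown, so this is a reconstruction):

sum(pc$Percentage)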
## [1] 100
loadings <- my.eigen$vectors
my.scaled.matrix <- as.matrix(my.scaled.data)
# The function %*% is matrix multiplication:
# project the scaled data onto the loadings to get the component scores
scores <- my.scaled.matrix %*% loadings
# Standard deviation of each component (note: this masks base R's sd())
sd <- sqrt(my.eigen$values)
rownames(loadings) <- colnames(my.scaled.data)
| PC1 | PC2 | PC3 | PC4 |
|---|---|---|---|
| -2.2600 | 0.4796 | 0.1963 | -0.0511 |
| -1.2141 | 1.0328 | 0.0185 | 0.0859 |
| -2.9230 | 0.7367 | -0.3902 | 0.5873 |
| -1.0990 | 1.0164 | 0.0283 | 0.0654 |
| -0.5060 | 1.8150 | 0.8230 | 0.6192 |
| -0.3074 | 0.8778 | 0.0583 | -0.0498 |
The function prcomp() is the primary tool for PCA in base R:
pca_morpho <- prcomp(morpho, center = TRUE, scale. = TRUE)
# Show the variables in the class "prcomp"
ls(pca_morpho)
## [1] "center" "rotation" "scale" "sdev" "x"
# The %>% pipe requires magrittr (or a package that re-exports it, such as dplyr)
pca_summary <- summary(pca_morpho)$importance %>%
  as.data.frame() %>%
  round(., digits = 3)
| | PC1 | PC2 | PC3 | PC4 |
|---|---|---|---|---|
| Standard deviation | 1.646 | 0.872 | 0.642 | 0.341 |
| Proportion of Variance | 0.678 | 0.190 | 0.103 | 0.029 |
| Cumulative Proportion | 0.678 | 0.868 | 0.971 | 1.000 |
Squaring the standard deviations stored in the prcomp object recovers the eigenvalues computed earlier with eigen():
pca_morpho$sdev^2
## [1] 2.7104296 0.7607956 0.4127988 0.1159759
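As a further cross-check (a sketch assuming the same four-column morpho data feed both approaches), the hand-rolled loadings and scores should match prcomp()'s rotation and x components up to an arbitrary sign flip in each column:

# Compare absolute values, since eigen() and prcomp() may flip the sign
# of individual components
all.equal(abs(unname(loadings)), abs(unname(pca_morpho$rotation)))
all.equal(abs(unname(scores)), abs(unname(pca_morpho$x)))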
The approach used here is designed to reduce the subjectivity of interpreting a scree plot when deciding how many components to retain.
## Using eigendecomposition of correlation matrix.
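For reference, a minimal base-R scree plot of the eigenvalues (a sketch using the my.eigen object from above; the dashed line marks the common Kaiser criterion of eigenvalue = 1):

# Scree plot: eigenvalue of each principal component
plot(my.eigen$values, type = "b",
     xlab = "Principal component", ylab = "Eigenvalue",
     main = "Scree plot")
abline(h = 1, lty = 2)  # retain components with eigenvalue > 1 (Kaiser criterion)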
In summary, a PCA of any type may not be an appropriate statistical approach for this dataset.