library(tidyverse)
library(ggplot2)
library(MASS)
library(GGally)
library(plot3D)
At the end of this session you should be able to:
explore PCA through multivariate data;
understand how to apply PCA using covariance;
understand how to apply PCA using correlation (standardising/scaling);
understand how to interpret the R output using princomp and pcomp in R.
Relates to lecture 2.
Libraries ggplot2, MASS, tidyverse, GGally may be useful. Others may be mentioned below in the hints.
For Q3 you will also need the library mvtnorm.
You might like to set up a project for each lab if you are using RStudio. Then you can copy a .Rmd file into that directory and write your answers in that file.
Consider the aircraft data (aircraft.csv) with variables Year, Period, Power, Span, Length, Weight, Speed and Range. The data are available under Week 1 folder or in the Data Sets folder on LMS.
These data record six characteristics of aircraft designs which appeared during the twentieth century.
The variables are: - Year: year of first manufacture - Period: a code to indicate one of three broad time periods - Power: total engine power (kW) - Span: wing span (m) - Length: length (m) - Weight: maximum take-off weight (kg) - Speed: maximum speed (km/h) - Range: range (km)
Source: The data were collected by P. Saviotti and are described in detail in Saviotti (1996), “Technological Evolution, Variety and Economy”, Edward Elgar: Cheltenham.
The dataframe aircraft refers to the aircraft data from
the previous lab, the version with the logged variables. In this lab we
work only with the logged variables plus Year and
Period.
Use prcomp for this question.
Year Period Power Span Length Weight Speed Range
1 14 1 82.0 12.8 7.60 1070 105 400
2 14 1 82.0 11.0 9.00 830 145 402
3 14 1 223.6 17.9 10.35 2200 135 500
4 15 1 164.0 14.5 9.80 1946 138 500
5 15 1 119.0 12.9 7.90 1190 140 400
6 15 1 74.5 7.5 6.30 653 177 350
Lab Method:
# Creating a new dataframe with logged variables
aircraft_logged <- mutate(aircraft, Period = factor(Period), logPower = log10(Power),
logSpan = log10(Span), logLength = log10(Length), logWeight = log10(Weight),
logSpeed = log10(Speed), logRange = log10(Range)) %>%
dplyr::select(Year, Period, starts_with("log"))
# Method 1 pca.aircraft=prcomp(select(aircraft, -Year, -Period)) Method 2
noms = names(aircraft_logged)[3:8]
rhs = paste0(noms, collapse = " + ")
fmla = as.formula(paste("~ ", rhs))
print(fmla)
~logPower + logSpan + logLength + logWeight + logSpeed + logRange
pca.aircraft = prcomp(fmla, data = aircraft_logged)
# Display the eigenvectors (rotation matrix)
print(pca.aircraft$rotation)
PC1 PC2 PC3 PC4 PC5 PC6
logPower 0.6789041 -0.3370631 0.1337237 0.63152740 -0.092702676 0.01323106
logSpan 0.1475060 0.4778834 0.2150124 0.06127707 -0.048909835 -0.83515987
logLength 0.2128740 0.2307281 0.1568771 -0.27071156 -0.863486982 0.24071613
logWeight 0.5880074 0.2578687 0.2481639 -0.48182127 0.480207328 0.25182325
logSpeed 0.2291724 -0.6301924 -0.2741496 -0.53235281 -0.112767421 -0.42315895
logRange 0.2715571 0.3756882 -0.8800761 0.09264058 -0.009418365 0.04370613
# Display the eigenvalues (squared singular values)
eigenvalues <- (pca.aircraft$sdev)^2
print(eigenvalues)
[1] 0.900183940 0.086144332 0.044225537 0.005585039 0.002357161 0.002158438
Eigenvectors:
logPower 0.6789041 -0.3370631 0.1337237 0.63152740 -0.092702676 0.01323106 logSpan 0.1475060 0.4778834 0.2150124 0.06127707 -0.048909835 -0.83515987 logLength 0.2128740 0.2307281 0.1568771 -0.27071156 -0.863486982 0.24071613 logWeight 0.5880074 0.2578687 0.2481639 -0.48182127 0.480207328 0.25182325 logSpeed 0.2291724 -0.6301924 -0.2741496 -0.53235281 -0.112767421 -0.42315895 logRange 0.2715571 0.3756882 -0.8800761 0.09264058 -0.009418365 0.04370613
Eigenvalues:
[1] 0.900183940 0.086144332 0.044225537 0.005585039 0.002357161 0.002158438
print, summary and str
to the object produced by prcomp and work out the
correspondence between these outputs and the terminology of PCA concepts
used in the lecture notes. Note that, unlike the other two commands,
str shows what the object actually is (i.e. its
structure).cat("*****print(pca_result)****: This command prints the standard deviations of the principal components and the rotation matrix, which contains the eigenvectors.
The standard deviations are related to the square roots of the eigenvalues of the covariance matrix. The larger the standard deviation, the more variation that principal component explains.
The rotation matrix shows how the original variables are weighted to form each principal component (PC). Each column corresponds to a principal component, and each row corresponds to the original variables.
",
sep = "\n")
*****print(pca_result)****: This command prints the standard deviations of the principal components and the rotation matrix, which contains the eigenvectors.
The standard deviations are related to the square roots of the eigenvalues of the covariance matrix. The larger the standard deviation, the more variation that principal component explains.
The rotation matrix shows how the original variables are weighted to form each principal component (PC). Each column corresponds to a principal component, and each row corresponds to the original variables.
print(pca.aircraft)
Standard deviations (1, .., p=6):
[1] 0.94878024 0.29350355 0.21029869 0.07473312 0.04855061 0.04645899
Rotation (n x k) = (6 x 6):
PC1 PC2 PC3 PC4 PC5 PC6
logPower 0.6789041 -0.3370631 0.1337237 0.63152740 -0.092702676 0.01323106
logSpan 0.1475060 0.4778834 0.2150124 0.06127707 -0.048909835 -0.83515987
logLength 0.2128740 0.2307281 0.1568771 -0.27071156 -0.863486982 0.24071613
logWeight 0.5880074 0.2578687 0.2481639 -0.48182127 0.480207328 0.25182325
logSpeed 0.2291724 -0.6301924 -0.2741496 -0.53235281 -0.112767421 -0.42315895
logRange 0.2715571 0.3756882 -0.8800761 0.09264058 -0.009418365 0.04370613
cat("****summary(pca_result)****: This provides a summary of the PCA, including the importance of components.The 'Standard deviation' section reiterates the standard deviations of the principal components.
'Proportion of Variance' indicates the fraction of the total variance explained by each principal component.
'Cumulative Proportion' shows the cumulative variance explained when you consider the components together up to that point.
",
sep = "\n")
****summary(pca_result)****: This provides a summary of the PCA, including the importance of components.The 'Standard deviation' section reiterates the standard deviations of the principal components.
'Proportion of Variance' indicates the fraction of the total variance explained by each principal component.
'Cumulative Proportion' shows the cumulative variance explained when you consider the components together up to that point.
summary(pca.aircraft)
Importance of components:
PC1 PC2 PC3 PC4 PC5 PC6
Standard deviation 0.9488 0.29350 0.2103 0.07473 0.04855 0.04646
Proportion of Variance 0.8650 0.08278 0.0425 0.00537 0.00227 0.00207
Cumulative Proportion 0.8650 0.94780 0.9903 0.99566 0.99793 1.00000
cat("****str(pca_result)*****: This function reveals the structure of the prcomp object.
$ sdev: Numerical vector of the standard deviations (square roots of the eigenvalues) of the principal components.
$ rotation: The matrix of eigenvectors (each column is an eigenvector).
$ center: The means of the original variables, which are subtracted from the data if center = TRUE.
$ scale: Logical value indicating whether the variables have been scaled to unit variance before the analysis.
$ x: The matrix of the principal component scores, which are the original data projected onto the principal components.
",
sep = "\n")
****str(pca_result)*****: This function reveals the structure of the prcomp object.
$ sdev: Numerical vector of the standard deviations (square roots of the eigenvalues) of the principal components.
$ rotation: The matrix of eigenvectors (each column is an eigenvector).
$ center: The means of the original variables, which are subtracted from the data if center = TRUE.
$ scale: Logical value indicating whether the variables have been scaled to unit variance before the analysis.
$ x: The matrix of the principal component scores, which are the original data projected onto the principal components.
str(pca.aircraft)
List of 6
$ sdev : num [1:6] 0.9488 0.2935 0.2103 0.0747 0.0486 ...
$ rotation: num [1:6, 1:6] 0.679 0.148 0.213 0.588 0.229 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:6] "logPower" "logSpan" "logLength" "logWeight" ...
.. ..$ : chr [1:6] "PC1" "PC2" "PC3" "PC4" ...
$ center : Named num [1:6] 3.08 1.2 1.13 3.8 2.64 ...
..- attr(*, "names")= chr [1:6] "logPower" "logSpan" "logLength" "logWeight" ...
$ scale : logi FALSE
$ x : num [1:709, 1:6] -1.61 -1.64 -1.03 -1.17 -1.44 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:709] "1" "2" "3" "4" ...
.. ..$ : chr [1:6] "PC1" "PC2" "PC3" "PC4" ...
$ call : language prcomp(formula = fmla, data = aircraft_logged)
- attr(*, "class")= chr "prcomp"
cumsum.)plot(pca.aircraft)
# Assuming pca.aircraft is the result from prcomp Extract eigenvalues (squared
# singular values) eigenvalues <- pca.aircraft$sdev^2
# Create a sequence for the x-axis representing each principal component pc_seq
# <- 1:length(eigenvalues)
# Create the scree plot using base R plot function plot(pc_seq, eigenvalues,
# type = 'b', xlab = 'Principal Component', ylab = 'Eigenvalue', main = 'Scree
# Plot', pch = 19, col = 'blue')
# Create the scree plot using base R plot function
plot(1:length(eigenvalues), pca.aircraft$sdev^2, type = "b", xlab = "Principal Component",
ylab = "Eigenvalue", main = "Scree Plot", col = "black")
plot(cumsum(pca.aircraft$sdev^2)/sum(pca.aircraft$sdev^2), type = "b", ylab = "Cumulative variance")
forplot = cbind(aircraft, pca.aircraft$x)
names(forplot)
[1] "Year" "Period" "Power" "Span" "Length" "Weight" "Speed" "Range"
[9] "PC1" "PC2" "PC3" "PC4" "PC5" "PC6"
ggplot(forplot, aes(PC1)) + geom_density()
ggplot(forplot, aes(PC2)) + geom_density()
ggplot(forplot, aes(PC3)) + geom_density()
ggplot(forplot, aes(PC1, PC2)) + geom_point() + geom_density_2d()
ggplot(forplot, aes(PC1, PC2, colour = Period)) + geom_point() + geom_density_2d()
Warning: The following aesthetics were dropped during statistical transformation: colour
ℹ This can happen when ggplot fails to infer the correct grouping structure in
the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
variable into a factor?
ggplot(forplot, aes(PC1, PC3, colour = Period)) + geom_point() + geom_density_2d()
Warning: The following aesthetics were dropped during statistical transformation: colour
ℹ This can happen when ggplot fails to infer the correct grouping structure in
the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
variable into a factor?
ggpairs(forplot, columns = 9:14)
biplot, with
col = c("white", "blue").)biplot(pca.aircraft, col = c("white", "blue"), choices = 1:2)
biplot(pca.aircraft, col = c("red", "blue"), choices = 1:2)
ChatGPT Method:
# Assuming 'aircraft' is already loaded with read.csv()
# Note: Ensure that you only take the log of positive, non-zero values.
# The log function cannot be applied to zero or negative values.
# Creating a new dataframe with logged variables
# Creating a new dataframe with logged variables
aircraft_logged <- mutate(aircraft,
# Convert the 'Period' variable to a factor (categorical variable)
Period = factor(Period),
# Apply logarithmic transformation (base 10) to the 'Power' variable
logPower = log10(Power),
# Apply logarithmic transformation (base 10) to the 'Span' variable
logSpan = log10(Span),
# Apply logarithmic transformation (base 10) to the 'Length' variable
logLength = log10(Length),
# Apply logarithmic transformation (base 10) to the 'Weight' variable
logWeight = log10(Weight),
# Apply logarithmic transformation (base 10) to the 'Speed' variable
logSpeed = log10(Speed),
# Apply logarithmic transformation (base 10) to the 'Range' variable
logRange = log10(Range)
) %>%
# Select only specific columns: 'Year', 'Period', and those that start with "log"
dplyr::select(Year, Period, starts_with("log"))
# Perform PCA on the logged variables
# Select only the logged variables for PCA
pca_result <- prcomp(aircraft_logged[,c('logPower', 'logSpan', 'logLength', 'logWeight', 'logSpeed', 'logRange')],
center = TRUE, scale. = FALSE)
# Display the eigenvectors (rotation matrix)
print(pca_result$rotation)
PC1 PC2 PC3 PC4 PC5 PC6
logPower 0.6789041 -0.3370631 0.1337237 0.63152740 -0.092702676 0.01323106
logSpan 0.1475060 0.4778834 0.2150124 0.06127707 -0.048909835 -0.83515987
logLength 0.2128740 0.2307281 0.1568771 -0.27071156 -0.863486982 0.24071613
logWeight 0.5880074 0.2578687 0.2481639 -0.48182127 0.480207328 0.25182325
logSpeed 0.2291724 -0.6301924 -0.2741496 -0.53235281 -0.112767421 -0.42315895
logRange 0.2715571 0.3756882 -0.8800761 0.09264058 -0.009418365 0.04370613
# Display the eigenvalues (squared singular values)
eigenvalues <- (pca_result$sdev)^2
print(eigenvalues)
[1] 0.900183940 0.086144332 0.044225537 0.005585039 0.002357161 0.002158438
Eigenvectors Rotation matrix
PC1 PC2 PC3 PC4 PC5 PC6 logPower 0.6789041 -0.3370631 0.1337237 0.63152740 -0.092702676 0.01323106 logSpan 0.1475060 0.4778834 0.2150124 0.06127707 -0.048909835 -0.83515987 logLength 0.2128740 0.2307281 0.1568771 -0.27071156 -0.863486982 0.24071613 logWeight 0.5880074 0.2578687 0.2481639 -0.48182127 0.480207328 0.25182325 logSpeed 0.2291724 -0.6301924 -0.2741496 -0.53235281 -0.112767421 -0.42315895 logRange 0.2715571 0.3756882 -0.8800761 0.09264058 -0.009418365 0.04370613
Eigenvalues
[1] 0.900183940 0.086144332 0.044225537 0.005585039 0.002357161 0.002158438
print, summary and str
to the object produced by prcomp and work out the
correspondence between these outputs and the terminology of PCA concepts
used in the lecture notes. Note that, unlike the other two commands,
str shows what the object actually is (i.e. its
structure).cat("*****print(pca_result)****: This command prints the standard deviations of the principal components and the rotation matrix, which contains the eigenvectors.
The standard deviations are related to the square roots of the eigenvalues of the covariance matrix. The larger the standard deviation, the more variation that principal component explains.
The rotation matrix shows how the original variables are weighted to form each principal component (PC). Each column corresponds to a principal component, and each row corresponds to the original variables.
",
sep = "\n")
*****print(pca_result)****: This command prints the standard deviations of the principal components and the rotation matrix, which contains the eigenvectors.
The standard deviations are related to the square roots of the eigenvalues of the covariance matrix. The larger the standard deviation, the more variation that principal component explains.
The rotation matrix shows how the original variables are weighted to form each principal component (PC). Each column corresponds to a principal component, and each row corresponds to the original variables.
print(pca_result)
Standard deviations (1, .., p=6):
[1] 0.94878024 0.29350355 0.21029869 0.07473312 0.04855061 0.04645899
Rotation (n x k) = (6 x 6):
PC1 PC2 PC3 PC4 PC5 PC6
logPower 0.6789041 -0.3370631 0.1337237 0.63152740 -0.092702676 0.01323106
logSpan 0.1475060 0.4778834 0.2150124 0.06127707 -0.048909835 -0.83515987
logLength 0.2128740 0.2307281 0.1568771 -0.27071156 -0.863486982 0.24071613
logWeight 0.5880074 0.2578687 0.2481639 -0.48182127 0.480207328 0.25182325
logSpeed 0.2291724 -0.6301924 -0.2741496 -0.53235281 -0.112767421 -0.42315895
logRange 0.2715571 0.3756882 -0.8800761 0.09264058 -0.009418365 0.04370613
cat("****summary(pca_result)****: This provides a summary of the PCA, including the importance of components.The 'Standard deviation' section reiterates the standard deviations of the principal components.
'Proportion of Variance' indicates the fraction of the total variance explained by each principal component.
'Cumulative Proportion' shows the cumulative variance explained when you consider the components together up to that point.
",
sep = "\n")
****summary(pca_result)****: This provides a summary of the PCA, including the importance of components.The 'Standard deviation' section reiterates the standard deviations of the principal components.
'Proportion of Variance' indicates the fraction of the total variance explained by each principal component.
'Cumulative Proportion' shows the cumulative variance explained when you consider the components together up to that point.
summary(pca_result)
Importance of components:
PC1 PC2 PC3 PC4 PC5 PC6
Standard deviation 0.9488 0.29350 0.2103 0.07473 0.04855 0.04646
Proportion of Variance 0.8650 0.08278 0.0425 0.00537 0.00227 0.00207
Cumulative Proportion 0.8650 0.94780 0.9903 0.99566 0.99793 1.00000
cat("****str(pca_result)*****: This function reveals the structure of the prcomp object.
$ sdev: Numerical vector of the standard deviations (square roots of the eigenvalues) of the principal components.
$ rotation: The matrix of eigenvectors (each column is an eigenvector).
$ center: The means of the original variables, which are subtracted from the data if center = TRUE.
$ scale: Logical value indicating whether the variables have been scaled to unit variance before the analysis.
$ x: The matrix of the principal component scores, which are the original data projected onto the principal components.
",
sep = "\n")
****str(pca_result)*****: This function reveals the structure of the prcomp object.
$ sdev: Numerical vector of the standard deviations (square roots of the eigenvalues) of the principal components.
$ rotation: The matrix of eigenvectors (each column is an eigenvector).
$ center: The means of the original variables, which are subtracted from the data if center = TRUE.
$ scale: Logical value indicating whether the variables have been scaled to unit variance before the analysis.
$ x: The matrix of the principal component scores, which are the original data projected onto the principal components.
str(pca_result)
List of 5
$ sdev : num [1:6] 0.9488 0.2935 0.2103 0.0747 0.0486 ...
$ rotation: num [1:6, 1:6] 0.679 0.148 0.213 0.588 0.229 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:6] "logPower" "logSpan" "logLength" "logWeight" ...
.. ..$ : chr [1:6] "PC1" "PC2" "PC3" "PC4" ...
$ center : Named num [1:6] 3.08 1.2 1.13 3.8 2.64 ...
..- attr(*, "names")= chr [1:6] "logPower" "logSpan" "logLength" "logWeight" ...
$ scale : logi FALSE
$ x : num [1:709, 1:6] -1.61 -1.64 -1.03 -1.17 -1.44 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:6] "PC1" "PC2" "PC3" "PC4" ...
- attr(*, "class")= chr "prcomp"
library(ggplot2)
# Assuming pca_result is already obtained from the prcomp function and contains
# the principal component scores in pca_result$x
# Convert the PC scores to a dataframe for easier manipulation with ggplot2
pc_scores <- as.data.frame(pca_result$x)
# Plot smoothed histogram (density) for PC1
ggplot(pc_scores, aes(x = PC1)) + geom_density(fill = "blue", alpha = 0.5) + labs(title = "Density Plot of PC1 Scores",
x = "PC1 Score", y = "Density") + theme_minimal()
# Plot smoothed histogram (density) for PC2
ggplot(pc_scores, aes(x = PC2)) + geom_density(fill = "red", alpha = 0.5) + labs(title = "Density Plot of PC2 Scores",
x = "PC2 Score", y = "Density") + theme_minimal()
library(ggplot2)
# Assuming pca_result is already obtained from the prcomp function
# Extract eigenvalues
eigenvalues <- (pca_result$sdev)^2 # Squared singular values are the eigenvalues
# Create a dataframe for plotting the scree plot
pca_df <- data.frame(PC = paste0('PC', seq_along(eigenvalues)),
Eigenvalues = eigenvalues)
# Generate the scree plot
ggplot(pca_df, aes(x = PC, y = Eigenvalues)) +
geom_point() + # Add points for each principal component
geom_line(aes(group = 1)) + # Connect points with a line
theme_minimal() + # Use a minimal theme
labs(title = "Scree Plot", x = "Principal Component", y = "Eigenvalue")
# Calculate the cumulative variance
pca_df$CumulativeVariance <- cumsum(pca_df$Eigenvalues) / sum(pca_df$Eigenvalues)
# Generate the plot for cumulative contribution to total variance
ggplot(pca_df, aes(x = PC, y = CumulativeVariance)) +
geom_point() + # Add points for each principal component
geom_line(aes(group = 1)) + # Connect points with a line
theme_minimal() + # Use a minimal theme
labs(title = "Cumulative Contribution to Total Variance",
x = "Principal Component", y = "Cumulative Proportion of Variance Explained")
Your turn to answer this question.
Repeat Question 1 but with scaled log data. How do the two analyses differ? Why do we see these differences?
We work with the matrix \(\Sigma_0\) from Q1 of Lab 2 which is given again below. As in Q2 of Lab 2 we generate random samples, with \(n = 250\), from each of the following distributions
(i) the multivariate normal distribution ${\cal N}(0,\Sigma_0 )$
(ii) the multivariate $t$-distribution $t_6(0,\Sigma_0)$ in $\nu =6$ degrees of freedom.
Start with the random sample of the \({\cal N}(0,\Sigma_0 )\) distribution and calculate the covariance matrix of the random sample.
Calculate the eigenvalues and eigenvectors of the covariance matrix and compare to those of the population matrix \(\Sigma_0\) (from Lab 2).
Show a graph of the eigenvalues and of the cumulative contribution to total variance against the index.
Plot a smoothed histogram (density) of all six PC scores.
Make a scatterplot of the PC1 scores (x-axis) against the PC2 scores (y-axis). Overlay contours of a 2D density estimate.
Repeat parts (a) to (e) for the random sample of the \(t_6(0,\Sigma_0)\) distribution.
Compare \(\Sigma_0\) with the two sample covariance matrices: how different are the eigenvalues and eigenvectors of the three matrices? Which are most similar? Why do they differ?
Sigma0 = matrix(c(3.1386518, 0.38872659, 0.6178228, 1.7576847, 0.77433973, 0.7508789,
0.3887266, 1.21417949, 0.1941453, 0.451892, 0.01236855, 0.2155466, 0.6178228,
0.19414529, 1.2437919, 0.597032, 0.15151088, 0.2665018, 1.7576847, 0.45189196,
0.597032, 1.7083497, 0.52685668, 0.7109476, 0.7743397, 0.01236855, 0.1515109,
0.5268567, 0.53406192, 0.2299193, 0.7508789, 0.21554658, 0.2665018, 0.7109476,
0.22991933, 0.6642375), byrow = TRUE, nrow = 6)
Remember that you will need the library mvtnorm here.
Sigma0sym = 0.5 * (Sigma0 + t(Sigma0))
Xmat = rmvt(100, sigma = Sigma0sym, df = 6)
Suppose the random variables \(X_1, X_2, X_3\) have the covariance matrix
\[ \Sigma=\begin{pmatrix} 1 & -2 & 0 \\ -2 & 5 & 0 \\ 0 &0 &2 \end{pmatrix}.\]
Compute the eigenvalues and eigenvector.
Write the principal component equations.
Calculate \(Var(Y_1)\) and \(Cov(Y_1, Y_2)\) and the total variance.
Calculate the proportions of the total variance of the first PC and the first two PC. Comment.
The headsize data provided in (a) give the head lengths and head breadths (in millimetres) for each of the first two adult sons in 25 families. Here we shall use only the head lengths (head1 and head2).
(a). Read the data set in R and assign an appropriate name to the data object.
"headsize" <- +matrix(c(191, 195, 181, 183, 176, 208, 189, 197, 188, 192, 179, 183,
174, 190, 188, 163, 195, 186, 181, 175, 192, 174, +176, 197, 190, 155, 149, 148,
153, 144, 157, 150, 159, 152, 150, 158, 147, 150, 159, 151, 137, 155, 153, +145,
140, 154, 143, 139, 167, 163, 179, 201, 185, 188, 171, 192, 190, 189, 197, 187,
186, 174, 185, 195, +187, 161, 183, 173, 182, 165, 185, 178, 176, 200, 187, 145,
152, 149, 149, 142, 152, 149, 152, 159, 151, +148, 147, 152, 157, 158, 130, 158,
148, 146, 137, 152, 147, 143, 158, 150), nrow = 25, ncol = 4, dimnames = list(character(0),
c("head1", "breadth1", "head2", "breadth2")))
(b). Compute the array of sample means, variance-covariance matrix \(S\)
(c). Perform PCA on the dataset based on \(S\).
(d). Write the equations in (c) and determine the proportion of the total sample variance explained by the first two principal components.
(e). Interpret the first two sample PCA components, obtained in (d).