speed dist
1 4 2
2 4 10
3 7 4
4 7 22
5 8 16
6 9 10
\[PC_{ik} = v_{k1} X_{i1} + v_{k2} X_{i2} + ... + v_{kp} X_{ip}\]
\[ \scriptsize \begin{pmatrix} \color{black}{PC_{11}} & \color{lightgray}{PC_{12}} & \dots & \color{lightgray}{PC_{1k}} \\ \color{lightgray}{PC_{21}} & \color{lightgray}{PC_{22}} & \dots & \color{lightgray}{PC_{2k}} \\ \vdots & \vdots & \ddots & \vdots \\ \color{lightgray}{PC_{n1}} & \color{lightgray}{PC_{n2}} & \dots & \color{lightgray}{PC_{nk}} \end{pmatrix}_{n \times k} = \begin{pmatrix} \color{black}{X_{11}} & \color{black}{X_{12}} & \dots & \color{black}{X_{1p}} \\ \color{lightgray}{X_{21}} & \color{lightgray}{X_{22}} & \dots & \color{lightgray}{X_{2p}} \\ \vdots & \vdots & \ddots & \vdots \\ \color{lightgray}{X_{n1}} & \color{lightgray}{X_{n2}} & \dots & \color{lightgray}{X_{np}} \end{pmatrix}_{n \times p} \begin{pmatrix} \color{black}{v_{11}} & \color{lightgray}{v_{21}} & \dots & \color{lightgray}{v_{k1}} \\ \color{black}{v_{12}} & \color{lightgray}{v_{22}} & \dots & \color{lightgray}{v_{k2}} \\ \vdots & \vdots & \ddots & \vdots \\ \color{black}{v_{1p}} & \color{lightgray}{v_{2p}} & \dots & \color{lightgray}{v_{kp}} \end{pmatrix}_{p \times k} \]
\[ \scriptsize \begin{pmatrix} \color{lightgray}{PC_{11}} & \color{lightgray}{PC_{12}} & \dots & \color{lightgray}{PC_{1k}} \\ \color{black}{PC_{21}} & \color{lightgray}{PC_{22}} & \dots & \color{lightgray}{PC_{2k}} \\ \vdots & \vdots & \ddots & \vdots \\ \color{lightgray}{PC_{n1}} & \color{lightgray}{PC_{n2}} & \dots & \color{lightgray}{PC_{nk}} \end{pmatrix}_{n \times k} = \begin{pmatrix} \color{lightgray}{X_{11}} & \color{lightgray}{X_{12}} & \dots & \color{lightgray}{X_{1p}} \\ \color{black}{X_{21}} & \color{black}{X_{22}} & \dots & \color{black}{X_{2p}} \\ \vdots & \vdots & \ddots & \vdots \\ \color{lightgray}{X_{n1}} & \color{lightgray}{X_{n2}} & \dots & \color{lightgray}{X_{np}} \end{pmatrix}_{n \times p} \begin{pmatrix} \color{black}{v_{11}} & \color{lightgray}{v_{21}} & \dots & \color{lightgray}{v_{k1}} \\ \color{black}{v_{12}} & \color{lightgray}{v_{22}} & \dots & \color{lightgray}{v_{k2}} \\ \vdots & \vdots & \ddots & \vdots \\ \color{black}{v_{1p}} & \color{lightgray}{v_{2p}} & \dots & \color{lightgray}{v_{kp}} \end{pmatrix}_{p \times k} \]
\[ \scriptsize \begin{pmatrix} \color{lightgray}{PC_{11}} & \color{lightgray}{PC_{12}} & \dots & \color{lightgray}{PC_{1k}} \\ \color{lightgray}{PC_{21}} & \color{lightgray}{PC_{22}} & \dots & \color{black}{PC_{2k}} \\ \vdots & \vdots & \ddots & \vdots \\ \color{lightgray}{PC_{n1}} & \color{lightgray}{PC_{n2}} & \dots & \color{lightgray}{PC_{nk}} \end{pmatrix}_{n \times k} = \begin{pmatrix} \color{lightgray}{X_{11}} & \color{lightgray}{X_{12}} & \dots & \color{lightgray}{X_{1p}} \\ \color{black}{X_{21}} & \color{black}{X_{22}} & \dots & \color{black}{X_{2p}} \\ \vdots & \vdots & \ddots & \vdots \\ \color{lightgray}{X_{n1}} & \color{lightgray}{X_{n2}} & \dots & \color{lightgray}{X_{np}} \end{pmatrix}_{n \times p} \begin{pmatrix} \color{lightgray}{v_{11}} & \color{lightgray}{v_{21}} & \dots & \color{black}{v_{k1}} \\ \color{lightgray}{v_{12}} & \color{lightgray}{v_{22}} & \dots & \color{black}{v_{k2}} \\ \vdots & \vdots & \ddots & \vdots \\ \color{lightgray}{v_{1p}} & \color{lightgray}{v_{2p}} & \dots & \color{black}{v_{kp}} \end{pmatrix}_{p \times k} \]
\[\scriptsize \begin{pmatrix} \color{blue}{PC_{11}} & \color{orange}{PC_{12}} & \dots & PC_{1k} \\ \color{blue}{PC_{21}} & \color{orange}{PC_{22}} & \dots & PC_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \color{blue}{PC_{n1}} & \color{orange}{PC_{n2}} & \dots & PC_{nk} \end{pmatrix} = X \begin{pmatrix} \color{blue}{v_{11}} & \color{orange}{v_{21}} & \dots & v_{k1} \\ \color{blue}{v_{12}} & \color{orange}{v_{22}} & \dots & v_{k2} \\ \vdots & \vdots & \ddots & \vdots \\ \color{blue}{v_{1p}} & \color{orange}{v_{2p}} & \dots & v_{kp} \end{pmatrix}\]
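The matrix form above can be sketched in R with a tiny made-up example. The matrices `X` and `V` below are hypothetical illustrations, not taken from any data set in these notes; the only point is that each score is one row of the data times one column of loadings.

```r
# Hypothetical 3 x 2 data matrix (n = 3 observations, p = 2 variables)
X <- matrix(c(1, 2,
              3, 4,
              5, 6), nrow = 3, byrow = TRUE)

# Hypothetical 2 x 2 loading matrix; column k holds the loadings for PC k
V <- matrix(c(1,  1,
              1, -1), nrow = 2, byrow = TRUE) / sqrt(2)

# All scores at once: (n x p) %*% (p x k) gives an n x k matrix
PC <- X %*% V

# A single score, e.g. observation 2 on PC 1, is row 2 of X times column 1 of V
pc_21 <- sum(X[2, ] * V[, 1])
all.equal(PC[2, 1], pc_21)
```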
Find loadings to produce PCs that:
cars data set.
Variables: speed, distance
2D plot of speed and distance:
1D principal components:
Distance is just measured on a larger scale than speed!
Weighting distance more than speed will explain a larger proportion of variability.
With speed measured on a larger scale instead, speed explains more total variance:
           speed      dist
speed 100653.061 6596.8163
dist    6596.816  664.0608
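To see how measurement scale drives the covariance matrix, here is a minimal sketch using R's built-in `cars` data. The factor of 60 is an assumption inferred from the matrix above (its speed variance is 3600 times the variance of the original speed), and the name `cars_rescaled` is hypothetical.

```r
# Covariance matrix on the original scale: dist has far more variance than speed
cov(cars)

# Rescale speed by a factor of 60 (assumed factor, inferred from the matrix above)
cars_rescaled <- transform(cars, speed = speed * 60)
cov(cars_rescaled)

# After centering and scaling, both variables have variance 1,
# so the choice of units no longer affects the result
cov(scale(cars))
```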
… So it looks like equal weights on the scaled speed and distance variables (\(v_3\)) explain the most variability.
Variables: speed, distance
For the scaled speed and distance data:
\[X_S = UDV^T\]
\[S_k = U[,k]\cdot D[k]\]
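# SVD of the scaled cars data
svd_cars_scaled <- svd(cars_scaled)
U <- svd_cars_scaled$u  # left singular vectors
D <- svd_cars_scaled$d  # singular values
V <- svd_cars_scaled$v  # right singular vectors (loadings)
# Scores on the first PC
head(U[,1] * D[1])
[1] -2.648984 -2.429466 -2.192920 -1.699003 -1.729914 -1.760825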
[,1]
[1,] -2.648984
[2,] -2.429466
[3,] -2.192920
[4,] -1.699003
[5,] -1.729914
[6,] -1.760825
\[S_{1:k} = U[,1:k]diag(D[1:k])\]
\[S_{1:k} = X_SV[,1:k]\]
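The two score formulas above can be checked against each other. This sketch assumes `cars_scaled` is the centered and scaled built-in `cars` data, since its definition is not shown here; the names `S_from_UD` and `S_from_XV` are hypothetical.

```r
# Assumption: cars_scaled is the centered and scaled built-in cars data
cars_scaled <- scale(cars)

svd_cars_scaled <- svd(cars_scaled)
U <- svd_cars_scaled$u
D <- svd_cars_scaled$d
V <- svd_cars_scaled$v

# Scores from U and D: S = U diag(D)
S_from_UD <- U %*% diag(D)

# Scores from the scaled data and the loadings: S = X_S V
S_from_XV <- cars_scaled %*% V

# The two routes agree up to floating-point error
all.equal(S_from_UD, S_from_XV, check.attributes = FALSE)
```

Note that the sign of each column of `U` and `V` is arbitrary, so scores may come out with flipped signs on a different platform without changing the variance they explain.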
Suppose you have been asked to analyze data on 25 prehistoric goblets from Thailand (Professor C.F.W. Higham, University of Otago, as taken from Manly, B.F.J. 1986. Multivariate Statistical Methods: A Primer. Chapman and Hall, London, 159pp.)
Let’s consider three of these:
MouthWidth TotalWidth TotalHeight
1 13 21 23
2 14 14 24
3 19 23 24
4 17 18 16
5 19 20 16
6 12 20 24
Step 3: Find the scores
U and D
        X1       X2         X3
1 2.228046 0.672099 0.09985496
Takeaways:
MouthWidth, TotalWidth, and TotalHeight
MouthWidth with TotalHeight
MouthWidth but small TotalHeight: short, stout goblets
Let’s see if we’re right by plotting just the first 2 PCs, using size and color to represent MouthWidth and TotalHeight (scatterplot created with ggplot, ellipses/arrows/annotations added with PPT):
Consider the below plots of four mean-centered and scaled data sets:
Consider the plot below:
Four possible sets of loadings for the first PC are:
A. (0.43, 0.90)
B. (0.90, -0.43)
C. (0.71, 0.71)
D. (-0.43, 0.90)
Which of these are the loadings for the first PC from this data set?
Consider the following scaled data and loadings:
X1 X2 X3
1.2 -0.8 0.5
0.0 1.5 1.8
Variable V1 V2 V3
X1 0.707 -0.500 0.500
X2 0.500 0.707 0.500
X3 0.500 0.500 -0.707
Reduce these data to one dimension using the first PC.
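One way to check a hand calculation is to enter the two tables above as matrices and multiply the data by the first column of loadings. This is a minimal sketch; the object names `X` and `V` are chosen here for illustration.

```r
# Scaled data from above: 2 observations, 3 variables
X <- matrix(c(1.2, -0.8, 0.5,
              0.0,  1.5, 1.8),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("X1", "X2", "X3")))

# Loadings from above: one row per variable, one column per PC
V <- matrix(c(0.707, -0.500,  0.500,
              0.500,  0.707,  0.500,
              0.500,  0.500, -0.707),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("X1", "X2", "X3"), c("V1", "V2", "V3")))

# One-dimensional representation: each observation's score on the first PC
X %*% V[, "V1", drop = FALSE]
```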
Consider the following correlation matrix and subsequent eigendecomposition:
Math Reading Science Hours Attendance
Math 1.00 0.72 0.85 0.65 0.43
Reading 0.72 1.00 0.68 0.58 0.39
Science 0.85 0.68 1.00 0.71 0.47
Hours 0.65 0.58 0.71 1.00 0.34
Attendance 0.43 0.39 0.47 0.34 1.00
eigen() decomposition
$values
[1] 3.3829121 0.7271795 0.4280074 0.3227359 0.1391651
$vectors
[,1] [,2] [,3] [,4] [,5]
[1,] -0.4943116 -0.14078526 -0.19307822 0.49961607 0.67002993
[2,] -0.4527383 -0.14549796 -0.64466890 -0.58799530 -0.11190154
[3,] -0.5009088 -0.09446515 0.07063200 0.46830772 -0.71823792
[4,] -0.4406478 -0.27485999 0.73046871 -0.42044049 0.14116213
[5,] -0.3249676 0.93516746 0.09246291 -0.09253642 0.05239731
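The eigenvalues of a correlation matrix are the variances of the PCs, so dividing each by their total gives the proportion of variability each PC explains. A short sketch with the values copied from the output above; the names `values` and `pve` are hypothetical.

```r
# Eigenvalues copied from the eigen() output above
values <- c(3.3829121, 0.7271795, 0.4280074, 0.3227359, 0.1391651)

# For a correlation matrix the eigenvalues sum to the number of variables (5 here)
sum(values)

# Proportion of variance explained by each PC, and the running total
pve <- values / sum(values)
round(pve, 3)
round(cumsum(pve), 3)
```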