PCA in R


Data

df <- readxl::read_excel("data/1.2 data.xlsx")
original_data <- df[,-8]

Download data

Eigen decomposition

From covariance matrix

cov_mat <- cov(original_data)
cov_mat |> round(digits = 3)
      x1     x2    x3     x4     x5    x6    x7
x1 1.344  0.904 0.934  0.985  0.355 0.527 0.396
x2 0.904  1.284 1.180  1.451 -0.092 0.580 0.177
x3 0.934  1.180 1.355  1.522  0.060 0.589 0.267
x4 0.985  1.451 1.522  2.080 -0.113 0.795 0.243
x5 0.355 -0.092 0.060 -0.113  0.456 0.047 0.266
x6 0.527  0.580 0.589  0.795  0.047 0.382 0.158
x7 0.396  0.177 0.267  0.243  0.266 0.158 0.272
ei_cov <- eigen(cov_mat)
ei_cov$values |> round(3)
[1] 5.480 1.077 0.291 0.164 0.085 0.045 0.032
ei_cov$vectors |> round(3)
       [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]
[1,] -0.389  0.621  0.521  0.213 -0.349  0.146 -0.051
[2,] -0.457 -0.163  0.439 -0.290  0.650 -0.180 -0.182
[3,] -0.476 -0.047 -0.253 -0.675 -0.373  0.041  0.332
[4,] -0.585 -0.340 -0.372  0.453 -0.140  0.073 -0.416
[5,] -0.020  0.589 -0.451 -0.131  0.134 -0.550 -0.334
[6,] -0.242  0.025 -0.082  0.438  0.181 -0.406  0.738
[7,] -0.105  0.350 -0.351  0.025  0.496  0.687  0.156

Here, values shows the variances of the principal components.

vectors shows the component weights, i.e. the coefficients used to compute the principal components.
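
As a quick sanity check (a minimal sketch, using the cov_mat and ei_cov objects created above), the eigenvalues should add up to the total variance in the data, i.e. the trace of the covariance matrix:

sum(diag(cov_mat))   # total variance of the original variables
sum(ei_cov$values)   # the eigenvalues partition the same total variance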

Component loadings are computed as - \[ \frac{g_{ij} \sqrt{l_j}}{\sqrt{S_{ii}}} \]

where \(g_{ij}\) = component weight of the i-th variable on the j-th component,

\(l_j\) = j-th eigenvalue, i.e. the variance of the j-th principal component,

\(S_{ii}\) = i-th diagonal element of the covariance matrix (1 if the correlation matrix is used).

gij = ei_cov$vectors
lj = sqrt(ei_cov$values)
sii = sqrt(diag(cov_mat))
n = length(sii)

round(gij * matrix(rep(lj, each = n), nrow = n) / matrix(rep(sii, times = n), nrow = n), 3)
       [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]
[1,] -0.786  0.556  0.243  0.074 -0.088  0.027 -0.008
[2,] -0.945 -0.149  0.209 -0.104  0.167 -0.034 -0.029
[3,] -0.958 -0.042 -0.117 -0.235 -0.093  0.007  0.051
[4,] -0.949 -0.245 -0.139  0.127 -0.028  0.011 -0.051
[5,] -0.068  0.905 -0.361 -0.078  0.058 -0.172 -0.088
[6,] -0.916  0.042 -0.072  0.287  0.085 -0.139  0.212
[7,] -0.473  0.697 -0.363  0.020  0.278  0.279  0.053
round(cor(original_data, as.matrix(original_data) %*% as.matrix(gij)), 3)
     [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]
x1 -0.786  0.556  0.243  0.074 -0.088  0.027 -0.008
x2 -0.945 -0.149  0.209 -0.104  0.167 -0.034 -0.029
x3 -0.958 -0.042 -0.117 -0.235 -0.093  0.007  0.051
x4 -0.949 -0.245 -0.139  0.127 -0.028  0.011 -0.051
x5 -0.068  0.905 -0.361 -0.078  0.058 -0.172 -0.088
x6 -0.916  0.042 -0.072  0.287  0.085 -0.139  0.212
x7 -0.473  0.697 -0.363  0.020  0.278  0.279  0.053

The component loading denotes the importance of the i-th variable in forming the j-th component. It is also the correlation coefficient between the original variables and the principal components.

prin_comps <- as.matrix(original_data) %*% as.matrix(gij)
cor(original_data, prin_comps) |> round(3)
     [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]
x1 -0.786  0.556  0.243  0.074 -0.088  0.027 -0.008
x2 -0.945 -0.149  0.209 -0.104  0.167 -0.034 -0.029
x3 -0.958 -0.042 -0.117 -0.235 -0.093  0.007  0.051
x4 -0.949 -0.245 -0.139  0.127 -0.028  0.011 -0.051
x5 -0.068  0.905 -0.361 -0.078  0.058 -0.172 -0.088
x6 -0.916  0.042 -0.072  0.287  0.085 -0.139  0.212
x7 -0.473  0.697 -0.363  0.020  0.278  0.279  0.053
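
The same loadings can also be computed more compactly with sweep(), which scales the columns of the eigenvector matrix by the square roots of the eigenvalues and the rows by the square roots of the diagonal of the covariance matrix. This is only an alternative sketch of the calculation above, reusing the gij, lj and sii objects (loadings_alt is an illustrative name):

loadings_alt <- sweep(gij, 2, lj, "*")           # lj already holds sqrt(eigenvalues)
loadings_alt <- sweep(loadings_alt, 1, sii, "/") # sii already holds sqrt(diag(cov_mat))
round(loadings_alt, 3)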

The calculation of principal components is affected by the scales of measurement of the original data, because the covariance matrix is not independent of changes in scale.

So we need to standardize the original data before performing the eigen decomposition of the covariance matrix.

ei_cov_scaled <- original_data |> scale() |> cov() |> eigen()
ei_cov_scaled$values |> round(3)
[1] 4.502 1.774 0.283 0.193 0.146 0.066 0.035
ei_cov_scaled$vectors |> round(3)
       [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]
[1,] -0.402  0.223  0.778  0.114 -0.097 -0.398 -0.055
[2,] -0.423 -0.263  0.218 -0.253 -0.268  0.726 -0.202
[3,] -0.438 -0.137 -0.157 -0.619  0.225 -0.246  0.519
[4,] -0.423 -0.271 -0.352  0.071  0.176 -0.340 -0.688
[5,] -0.110  0.706 -0.027 -0.158  0.574  0.297 -0.213
[6,] -0.439 -0.084 -0.139  0.713  0.251  0.210  0.407
[7,] -0.291  0.532 -0.423  0.027 -0.667 -0.072  0.053
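
To see the scale dependence concretely, here is a small illustration (the variable x1 and the factor 100 are arbitrary choices for this sketch): rescaling one variable changes the eigen decomposition of the covariance matrix, while the correlation matrix is unaffected.

rescaled <- original_data                 # illustrative copy of the data
rescaled$x1 <- rescaled$x1 * 100          # hypothetical change of measurement unit

eigen(cov(rescaled))$values |> round(3)   # differs from ei_cov$values
eigen(cor(rescaled))$values |> round(3)   # same as the eigenvalues of cor(original_data)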

From correlation matrix

cor_mat <- cor(original_data)
cor_mat |> round(digits = 3)
      x1     x2    x3     x4     x5    x6    x7
x1 1.000  0.688 0.692  0.589  0.454 0.735 0.655
x2 0.688  1.000 0.894  0.888 -0.120 0.828 0.300
x3 0.692  0.894 1.000  0.906  0.076 0.819 0.441
x4 0.589  0.888 0.906  1.000 -0.116 0.891 0.324
x5 0.454 -0.120 0.076 -0.116  1.000 0.114 0.756
x6 0.735  0.828 0.819  0.891  0.114 1.000 0.491
x7 0.655  0.300 0.441  0.324  0.756 0.491 1.000
ei_cor <- eigen(cor_mat)
ei_cor$values |> round(3)
[1] 4.502 1.774 0.283 0.193 0.146 0.066 0.035
ei_cor$vectors |> round(3)
       [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]
[1,] -0.402  0.223  0.778  0.114  0.097  0.398 -0.055
[2,] -0.423 -0.263  0.218 -0.253  0.268 -0.726 -0.202
[3,] -0.438 -0.137 -0.157 -0.619 -0.225  0.246  0.519
[4,] -0.423 -0.271 -0.352  0.071 -0.176  0.340 -0.688
[5,] -0.110  0.706 -0.027 -0.158 -0.574 -0.297 -0.213
[6,] -0.439 -0.084 -0.139  0.713 -0.251 -0.210  0.407
[7,] -0.291  0.532 -0.423  0.027  0.667  0.072  0.053

Here, values shows the variances of the principal components.

vectors shows the component weights, i.e. the coefficients used to compute the principal components.

The decomposition of the correlation matrix is not affected by the scales of measurement of the original data, because the correlation coefficient is independent of changes in scale.
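
As a quick check (a sketch reusing the objects defined above), the covariance matrix of the standardized data equals the correlation matrix of the original data, so the two decompositions agree, apart from the arbitrary signs of the eigenvectors:

all.equal(cov(scale(original_data)), cor(original_data))
all.equal(ei_cov_scaled$values, ei_cor$values)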

Principal Components

Transformed principal components can be found by matrix multiplication of the original data and component weights -

prin_comps <- as.matrix(original_data) %*% as.matrix(ei_cor$vectors)
apply(prin_comps, 2, var)
[1] 4.93782054 1.11117124 0.40369931 0.26798003 0.09599546 0.13447374 0.22308983
cor(original_data, prin_comps) |> round(3)
     [,1]   [,2]   [,3]   [,4]   [,5]  [,6]   [,7]
x1 -0.828  0.070  0.677 -0.425 -0.230 0.600 -0.527
x2 -0.926 -0.593  0.187 -0.712 -0.179 0.309 -0.801
x3 -0.949 -0.459  0.053 -0.841 -0.470 0.568 -0.710
x4 -0.924 -0.624 -0.107 -0.616 -0.438 0.573 -0.920
x5 -0.154  0.823  0.360 -0.036 -0.384 0.235  0.139
x6 -0.924 -0.375  0.109 -0.395 -0.435 0.529 -0.795
x7 -0.545  0.471  0.226 -0.261 -0.175 0.441 -0.259

The object prin_comps contains the transformed variables, called the principal components. Note that these eigenvectors come from the correlation matrix while the data here are unstandardized, which is why the component variances above do not equal the eigenvalues. When the standardized data are multiplied by the same eigenvectors, the component variances equal the eigenvalues and the correlation matrix of the components is the identity, meaning the components are orthogonal (uncorrelated) to each other.
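
A minimal sketch of that last point, reusing ei_cor from above (scaled_pcs is an illustrative name): with standardized data the component variances equal the eigenvalues and the components are mutually uncorrelated.

scaled_pcs <- scale(original_data) %*% ei_cor$vectors

apply(scaled_pcs, 2, var) |> round(3)   # matches ei_cor$values
cor(scaled_pcs) |> round(3)             # approximately the identity matrix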

Using prcomp()

p <- prcomp(original_data, center = TRUE, scale. = TRUE)
summary(p)
Importance of components:
                          PC1    PC2     PC3     PC4     PC5     PC6     PC7
Standard deviation     2.1218 1.3320 0.53204 0.43966 0.38258 0.25684 0.18685
Proportion of Variance 0.6432 0.2535 0.04044 0.02761 0.02091 0.00942 0.00499
Cumulative Proportion  0.6432 0.8966 0.93706 0.96468 0.98559 0.99501 1.00000
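
The Proportion of Variance row can be reproduced by hand from the standard deviations (a small sketch, assuming the object p from the chunk above):

round(p$sdev^2 / sum(p$sdev^2), 5)           # proportion of variance
round(cumsum(p$sdev^2) / sum(p$sdev^2), 5)   # cumulative proportion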

Variances explained by each principal component -

p$sdev^2
[1] 4.50218058 1.77420306 0.28306214 0.19330296 0.14637109 0.06596874 0.03491141
apply(p$x, 2, var)
       PC1        PC2        PC3        PC4        PC5        PC6        PC7 
4.50218058 1.77420306 0.28306214 0.19330296 0.14637109 0.06596874 0.03491141 

Component weights or eigenvectors -

p$rotation |> round(3)
      PC1    PC2    PC3    PC4    PC5    PC6    PC7
x1 -0.402  0.223 -0.778  0.114 -0.097  0.398 -0.055
x2 -0.423 -0.263 -0.218 -0.253 -0.268 -0.726 -0.202
x3 -0.438 -0.137  0.157 -0.619  0.225  0.246  0.519
x4 -0.423 -0.271  0.352  0.071  0.176  0.340 -0.688
x5 -0.110  0.706  0.027 -0.158  0.574 -0.297 -0.213
x6 -0.439 -0.084  0.139  0.713  0.251 -0.210  0.407
x7 -0.291  0.532  0.423  0.027 -0.667  0.072  0.053
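
These match the eigenvectors of the correlation matrix computed earlier, apart from the arbitrary sign of each column (a quick sketch for comparison):

# compare element-wise, ignoring the sign ambiguity of eigenvectors
round(abs(p$rotation) - abs(ei_cor$vectors), 3)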

Principal components -

head(p$x, 15) |> round(3)
         PC1    PC2    PC3    PC4    PC5    PC6    PC7
 [1,] -1.657 -1.608  0.453  0.404 -0.320  0.344 -0.132
 [2,] -0.564 -1.875  0.553  0.303 -0.092  0.350 -0.194
 [3,] -2.667 -1.305 -0.051  0.678 -0.223  0.307  0.046
 [4,]  0.218 -2.565  0.968 -0.024 -0.136  0.299  0.143
 [5,]  1.803 -2.708  0.558 -0.450  0.142 -0.008 -0.404
 [6,]  0.529 -1.851 -0.512 -0.275  0.179  0.057  0.058
 [7,] -2.490 -0.640 -0.087  0.061 -0.160 -0.384  0.276
 [8,] -1.242 -1.938  0.180 -0.141  0.395 -0.537  0.001
 [9,] -0.361 -1.494 -0.421  0.267 -0.270 -0.285 -0.119
[10,] -1.316 -1.818 -1.183 -0.240  0.286 -0.078  0.096
[11,] -1.208 -1.385 -0.262 -0.144  0.026 -0.207  0.310
[12,] -2.883 -0.993 -0.902  0.201 -0.200 -0.099 -0.017
[13,] -2.280  1.265 -0.510  0.074 -0.091  0.484 -0.169
[14,] -2.162  1.300 -0.447  0.681  0.514  0.269 -0.012
[15,] -1.413  1.229 -0.501  0.091  0.595  0.019 -0.333
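
Finally, p$x is just the standardized data multiplied by the rotation matrix, the same matrix multiplication used in the manual approach above (a sketch; manual_scores is an illustrative name):

manual_scores <- scale(original_data) %*% p$rotation
all.equal(unname(manual_scores), unname(p$x))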