We will use the Places Rated Almanac data (Boyer and Savageau), which
rates 329 communities according to nine criteria:
1. Climate and Terrain
2. Housing
3. Health Care & Environment
4. Crime
5. Transportation
6. Education
7. The Arts
8. Recreation
9. Economics
places <- read.table("C:/Users/63906/Downloads/places.txt")
places
head(places)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 521 6200 237 923 4031 2757 996 1405 7633 1
2 575 8138 1656 886 4883 2438 5564 2632 4350 2
3 468 7339 618 970 2531 2560 237 859 5250 3
4 476 7908 1431 610 6883 3399 4655 1617 5864 4
5 659 8393 1853 1483 6558 3026 4496 2612 5727 5
6 520 5819 640 727 2444 2972 334 1018 5254 6
# Exclude V10: it is only an ID (1 to 329) identifying each community, not one of the nine criteria
places1 <- places[, 1:9]
places1
# Log Transform
log.places <- log(places1)
log.places
# Apply PCA: center = TRUE (the default) subtracts each variable's mean; scale. = TRUE (default FALSE) divides by its standard deviation and is highly advisable here
places.pca <- prcomp(log.places, center = TRUE, scale. = TRUE)
Since both the skewness and the magnitude (scale) of the variables
influence the resulting PCs, it is good practice to apply a
transformation that reduces skewness (here, the log) and to center
and scale the variables before applying PCA.
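The two claims in the paragraph above can be checked on a small example. The sketch below is only an illustration: it uses simulated log-normal data as a stand-in for places.txt (the names `x`, `skew`, `pca_scaled` and `pca_unscaled` are hypothetical), and a hand-written moment-based skewness so that no extra package is needed. It shows that the log pulls skewness towards zero, that with `scale. = TRUE` the PC variances equal the eigenvalues of the correlation matrix, and that without scaling the variable with the largest variance dominates PC1.

```r
# Simulated stand-in for the places data (hypothetical values, NOT the real ratings):
# three positive, right-skewed variables on very different scales.
set.seed(1)
n <- 329
x <- data.frame(
  a = rlnorm(n, meanlog = 6, sdlog = 0.5),
  b = rlnorm(n, meanlog = 9, sdlog = 0.4),
  c = rlnorm(n, meanlog = 7, sdlog = 0.8)
)

# Moment-based sample skewness, written out so no extra package is needed
skew <- function(v) mean((v - mean(v))^3) / mean((v - mean(v))^2)^1.5

# 1) The log transform pulls the skewness of each variable towards zero
raw_skew <- sapply(x, skew)
log_skew <- sapply(log(x), skew)
print(round(rbind(raw = raw_skew, log = log_skew), 3))

# 2) With center = TRUE and scale. = TRUE, the PC variances are the
#    eigenvalues of the correlation matrix of the logged variables
pca_scaled <- prcomp(log(x), center = TRUE, scale. = TRUE)
print(all.equal(pca_scaled$sdev^2, eigen(cor(log(x)))$values))

# 3) Without scaling, PC1 is dominated by the variable with the largest variance (b)
pca_unscaled <- prcomp(x, center = TRUE, scale. = FALSE)
print(round(pca_unscaled$rotation[, 1], 3))
```

Scaling makes every variable contribute variance 1, so PCA on scaled data is PCA on the correlation matrix rather than the covariance matrix.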
# Print method: shows the standard deviation of each of the nine PCs and the rotation matrix, whose columns (the loadings) hold the coefficients of each PC as a linear combination of the scaled variables
print(places.pca)
Standard deviations (1, .., p=9):
[1] 1.8159827 1.1016178 1.0514418 0.9525124 0.9277008 0.7497905 0.6955721
[8] 0.5639789 0.5011269
Rotation (n x k) = (9 x 9):
PC1 PC2 PC3 PC4 PC5 PC6
V1 0.1579414 0.06862938 -0.79970997 0.37680952 -0.04104588 0.2166949681
V2 0.3844053 0.13920883 -0.07961647 0.19654301 0.57986793 -0.0822200812
V3 0.4099096 -0.37181203 0.01947537 0.11252206 -0.02956935 -0.5348756017
V4 0.2591017 0.47413246 -0.12846722 -0.04229962 -0.69217100 -0.1399009169
V5 0.3748890 -0.14148642 0.14106828 -0.43007675 -0.19141608 0.3238913974
V6 0.2743254 -0.45235526 0.24105584 0.45694297 -0.22474374 0.5265827320
V7 0.4738471 -0.10441020 -0.01102628 -0.14688130 -0.01193024 -0.3210570706
V8 0.3534118 0.29194243 -0.04181639 -0.40401889 0.30565371 0.3941387718
V9 0.1640135 0.54045312 0.50731026 0.47578009 0.03710776 -0.0009737383
PC7 PC8 PC9
V1 -0.1513516 -0.3411282 -0.03009755
V2 -0.2751971 0.6061010 0.04226906
V3 0.1349750 -0.1500575 -0.59412763
V4 0.1095036 0.4201255 -0.05101188
V5 -0.6785670 -0.1188325 -0.13584327
V6 0.2620958 0.2111749 0.11012420
V7 0.1204986 -0.2598673 0.74672678
V8 0.5530938 -0.1377181 -0.22636544
V9 -0.1468669 -0.4147736 -0.04790278
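To make "coefficients of the linear combinations" concrete, the sketch below checks two properties of the rotation matrix on simulated stand-in data (hypothetical; `log.places` would take the place of `log_z` in the real analysis, and `R`, `Zs`, `scores` are illustrative names). The columns of the rotation matrix are orthonormal, and the PC scores stored in `pca$x` are the standardized data multiplied by the loadings.

```r
# Simulated stand-in data (hypothetical; substitute log.places for the real analysis)
set.seed(2)
z <- as.data.frame(matrix(rlnorm(329 * 4, meanlog = 5, sdlog = 0.5), ncol = 4))
log_z <- log(z)
pca <- prcomp(log_z, center = TRUE, scale. = TRUE)
R <- pca$rotation

# The columns of the rotation matrix are orthonormal, so t(R) %*% R is the identity
print(round(crossprod(R), 10))

# Standardize with the centers and scales stored by prcomp
Zs <- scale(log_z, center = pca$center, scale = pca$scale)

# Scores = standardized data times the loadings
scores <- Zs %*% R
print(all.equal(unname(scores), unname(pca$x)))

# PC1 score of the first observation, written out as an explicit linear combination
pc1_obs1 <- sum(Zs[1, ] * R[, 1])
print(all.equal(pc1_obs1, unname(pca$x[1, 1])))
```

Read column PC1 of the printed rotation in the same way: the PC1 score of a community is 0.158 times its standardized log V1, plus 0.384 times its standardized log V2, and so on down the column.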
# Plot method: a scree plot of the variance of each PC (y-axis) against the PC number (x-axis)
plot(places.pca, type = "l")
The scree plot above helps decide how many PCs to retain for further
analysis. With nine variables there are nine PCs in total; their
variances drop sharply after PC1 and then decline gradually.
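One common heuristic for standardized data, the Kaiser rule, keeps the PCs whose variance exceeds 1, i.e. those carrying more variance than a single standardized variable. The sketch below applies it to the standard deviations printed by `prcomp` above; it is an illustration of that rule only, not the author's retention choice, and other heuristics (the elbow of the scree plot, a cumulative-variance target) can give a different number.

```r
# Standard deviations of the nine PCs, copied from the prcomp output above
sdev <- c(1.8159827, 1.1016178, 1.0514418, 0.9525124, 0.9277008,
          0.7497905, 0.6955721, 0.5639789, 0.5011269)
variances <- sdev^2  # variance (eigenvalue) of each PC

# Kaiser rule: keep the PCs whose variance exceeds 1
n_keep <- sum(variances > 1)
print(n_keep)  # 3

# Scree plot drawn from the same numbers, with the cut-off shown as a dashed line
plot(variances, type = "b", xlab = "Principal component", ylab = "Variance")
abline(h = 1, lty = 2)
```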
# The Summary
summary(places.pca)
Importance of components:
PC1 PC2 PC3 PC4 PC5 PC6 PC7
Standard deviation 1.8160 1.1016 1.0514 0.9525 0.92770 0.74979 0.69557
Proportion of Variance 0.3664 0.1348 0.1228 0.1008 0.09563 0.06247 0.05376
Cumulative Proportion 0.3664 0.5013 0.6241 0.7249 0.82053 0.88300 0.93676
PC8 PC9
Standard deviation 0.56398 0.5011
Proportion of Variance 0.03534 0.0279
Cumulative Proportion 0.97210 1.0000
The first row gives the standard deviation of each PC (the square
root of its variance, i.e. of the corresponding eigenvalue).
The second row gives the proportion of the total variance explained
by each component.
The third row gives the cumulative proportion of explained variance;
for example, the first three PCs together explain 62.41% of it.
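The three rows of `summary()` can be reproduced from the standard deviations alone: each PC's variance is its standard deviation squared, the proportion divides it by the total (about 9 here, one unit per standardized variable), and the cumulative row is a running sum. The sketch below recomputes them from the printed values; `importance` and `n_80` are illustrative names, and the 80% threshold is an arbitrary example, not a recommendation from the source.

```r
# Standard deviations copied from the summary output above
sdev <- c(1.8159827, 1.1016178, 1.0514418, 0.9525124, 0.9277008,
          0.7497905, 0.6955721, 0.5639789, 0.5011269)
variances  <- sdev^2
proportion <- variances / sum(variances)   # second row of summary()
cumulative <- cumsum(proportion)           # third row of summary()

importance <- rbind(
  "Standard deviation"     = sdev,
  "Proportion of Variance" = proportion,
  "Cumulative Proportion"  = cumulative
)
colnames(importance) <- paste0("PC", seq_along(sdev))
print(round(importance, 4))

# Smallest number of PCs that together explain at least 80% of the variance
n_80 <- which(cumulative >= 0.80)[1]
print(n_80)  # 5
```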
# Project observations onto the PCs (here, the last two communities)
predict(places.pca, newdata = tail(log.places, 2))
PC1 PC2 PC3 PC4 PC5 PC6 PC7
[1,] -0.3577229 -1.332915 -1.044542 -0.2574776 -0.6787995 -0.3418898 0.5519518
[2,] -2.9065986 1.253365 -1.345258 0.2585665 -0.4538613 0.1515577 -0.8514209
PC8 PC9
[1,] 0.3463322 0.43894223
[2,] 1.1935587 -0.08967011
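`predict()` standardizes the new rows with the means and standard deviations stored in the fitted object (not those of the new rows) and then multiplies by the rotation matrix. Because `tail(log.places, 2)` are rows that were also used to fit the PCA, the output above should coincide with the last two rows of `places.pca$x`. The sketch below checks both facts on simulated stand-in data; `train`, `new_rows`, `auto` and `manual` are hypothetical names.

```r
# Simulated stand-in data (hypothetical values, not the places ratings)
set.seed(3)
train <- as.data.frame(matrix(rlnorm(100 * 3, meanlog = 4, sdlog = 0.6), ncol = 3))
log_train <- log(train)
pca <- prcomp(log_train, center = TRUE, scale. = TRUE)

# Mirror the tutorial: project the last two (training) rows
new_rows <- tail(log_train, 2)
auto <- predict(pca, newdata = new_rows)

# Manual projection using the TRAINING centers and scales stored in the fit
manual <- scale(new_rows, center = pca$center, scale = pca$scale) %*% pca$rotation

print(all.equal(unname(auto), unname(manual)))
# Rows used in fitting get exactly their stored scores back
print(all.equal(unname(auto), unname(tail(pca$x, 2))))
```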
# Biplot of the first two PCs (community scores and variable loadings)
biplot(places.pca)