Principal Component Analysis

Will Perform and visualize PCA in the given mtcars data set.

df <- mtcars
head(df)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Selecting the numerical data (excluding the categorical variables vs and am)

df <- mtcars[,c(1:7,10,11)]
head(df)
##                    mpg cyl disp  hp drat    wt  qsec gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22    3    1

We then pass df to the prcomp(). We also set two arguments, center and scale, to be TRUE then preview our object with summary

mtcars.pca <- prcomp(df, center = TRUE, scale. = TRUE)
summary(mtcars.pca)
## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6     PC7
## Standard deviation     2.3782 1.4429 0.71008 0.51481 0.42797 0.35184 0.32413
## Proportion of Variance 0.6284 0.2313 0.05602 0.02945 0.02035 0.01375 0.01167
## Cumulative Proportion  0.6284 0.8598 0.91581 0.94525 0.96560 0.97936 0.99103
##                           PC8     PC9
## Standard deviation     0.2419 0.14896
## Proportion of Variance 0.0065 0.00247
## Cumulative Proportion  0.9975 1.00000

As a result we obtain 9 principal components, each which explain a percentage of the total variation of the data set. PC1 explains 63% of the total variance, which means that nearly two-thirds of the information in the data set (9 variables) can be encapsulated by just that one Principal Component. PC2 explains 23% of the variance. etc

Calling str() to have a look at your PCA object

str(mtcars.pca)
## List of 5
##  $ sdev    : num [1:9] 2.378 1.443 0.71 0.515 0.428 ...
##  $ rotation: num [1:9, 1:9] -0.393 0.403 0.397 0.367 -0.312 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:9] "mpg" "cyl" "disp" "hp" ...
##   .. ..$ : chr [1:9] "PC1" "PC2" "PC3" "PC4" ...
##  $ center  : Named num [1:9] 20.09 6.19 230.72 146.69 3.6 ...
##   ..- attr(*, "names")= chr [1:9] "mpg" "cyl" "disp" "hp" ...
##  $ scale   : Named num [1:9] 6.027 1.786 123.939 68.563 0.535 ...
##   ..- attr(*, "names")= chr [1:9] "mpg" "cyl" "disp" "hp" ...
##  $ x       : num [1:32, 1:9] -0.664 -0.637 -2.3 -0.215 1.587 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
##   .. ..$ : chr [1:9] "PC1" "PC2" "PC3" "PC4" ...
##  - attr(*, "class")= chr "prcomp"

Here we note that our pca object: The center point (center), scaling (scale), standard deviation(sdev) of each principal component. The relationship (correlation or anti correlation, etc) between the initial variables and the principal components (rotation). The values of each sample in terms of the principal components (x)

View the principal component loading

mtcars.pca$rotation
##             PC1         PC2         PC3          PC4        PC5         PC6
## mpg  -0.3931477  0.02753861 -0.22119309 -0.006126378 -0.3207620  0.72015586
## cyl   0.4025537  0.01570975 -0.25231615  0.040700251  0.1171397  0.22432550
## disp  0.3973528 -0.08888469 -0.07825139  0.339493732 -0.4867849 -0.01967516
## hp    0.3670814  0.26941371 -0.01721159  0.068300993 -0.2947317  0.35394225
## drat -0.3118165  0.34165268  0.14995507  0.845658485  0.1619259 -0.01536794
## wt    0.3734771 -0.17194306  0.45373418  0.191260029 -0.1874822 -0.08377237
## qsec -0.2243508 -0.48404435  0.62812782 -0.030329127 -0.1482495  0.25752940
## gear -0.2094749  0.55078264  0.20658376 -0.282381831 -0.5624860 -0.32298239
## carb  0.2445807  0.48431310  0.46412069 -0.214492216  0.3997820  0.35706914
##              PC7         PC8         PC9
## mpg  -0.38138068 -0.12465987  0.11492862
## cyl  -0.15893251  0.81032177  0.16266295
## disp -0.18233095 -0.06416707 -0.66190812
## hp    0.69620751 -0.16573993  0.25177306
## drat  0.04767957  0.13505066  0.03809096
## wt   -0.42777608 -0.19839375  0.56918844
## qsec  0.27622581  0.35613350 -0.16873731
## gear -0.08555707  0.31636479  0.04719694
## carb -0.20604210 -0.10832772 -0.32045892

See the principal components

dim(mtcars.pca$x)
## [1] 32  9

We will now plot our pca. This will provide us with some very useful insights i.e. which cars are most similar to each other

library(dplyr)
## Warning: package 'dplyr' was built under R version 4.1.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(rpart)
## Warning: package 'rpart' was built under R version 4.1.3
library(rpart.plot)
## Warning: package 'rpart.plot' was built under R version 4.1.3

Plotting the resultant principal components

biplot(mtcars.pca, main = "Biplot", scale = 0.01)

From the graph we will see that the variables hp, cyl and disp contribute to PC1,with higher values in those variables moving the samples to the right on the plot.

Adding more detail to the plot, we provide arguments row names as labels

biplot(mtcars.pca,  obs.scale = 1, var.scale = 1)
## Warning in plot.window(...): "obs.scale" is not a graphical parameter
## Warning in plot.window(...): "var.scale" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "obs.scale" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "var.scale" is not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "obs.scale" is not
## a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "var.scale" is not
## a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "obs.scale" is not
## a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "var.scale" is not
## a graphical parameter
## Warning in box(...): "obs.scale" is not a graphical parameter
## Warning in box(...): "var.scale" is not a graphical parameter
## Warning in title(...): "obs.scale" is not a graphical parameter
## Warning in title(...): "var.scale" is not a graphical parameter
## Warning in text.default(x, xlabs, cex = cex[1L], col = col[1L], ...):
## "obs.scale" is not a graphical parameter
## Warning in text.default(x, xlabs, cex = cex[1L], col = col[1L], ...):
## "var.scale" is not a graphical parameter
## Warning in plot.window(...): "obs.scale" is not a graphical parameter
## Warning in plot.window(...): "var.scale" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "obs.scale" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "var.scale" is not a graphical parameter
## Warning in title(...): "obs.scale" is not a graphical parameter
## Warning in title(...): "var.scale" is not a graphical parameter
## Warning in axis(3, col = col[2L], ...): "obs.scale" is not a graphical parameter
## Warning in axis(3, col = col[2L], ...): "var.scale" is not a graphical parameter
## Warning in axis(4, col = col[2L], ...): "obs.scale" is not a graphical parameter
## Warning in axis(4, col = col[2L], ...): "var.scale" is not a graphical parameter
## Warning in text.default(y, labels = ylabs, cex = cex[2L], col = col[2L], :
## "obs.scale" is not a graphical parameter
## Warning in text.default(y, labels = ylabs, cex = cex[2L], col = col[2L], :
## "var.scale" is not a graphical parameter

We now see which cars are similar to one another. The sports cars Maserati Bora, Ferrari Dino and Ford Pantera L all cluster together at the top

We can also look at the origin of each of the cars by putting them into one of three categories i.e. US, Japanese and European cars.

suppressWarnings(expr)
## function (expr) 
## {
##     enexpr(expr)
## }
## <bytecode: 0x000000001c434fd8>
## <environment: namespace:rlang>
mtcars.country <- c(rep("Japan", 3), rep("US",4), rep("Europe", 7),rep("US",3), "Europe", rep("Japan", 3), rep("US",4), rep("Europe", 3), "US", rep("Europe", 3))

biplot(mtcars.pca,ellipse=TRUE,  groups=mtcars.country, obs.scale = 1, var.scale = 1)
## Warning in plot.window(...): "ellipse" is not a graphical parameter
## Warning in plot.window(...): "groups" is not a graphical parameter
## Warning in plot.window(...): "obs.scale" is not a graphical parameter
## Warning in plot.window(...): "var.scale" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "ellipse" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "groups" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "obs.scale" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "var.scale" is not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "ellipse" is not a
## graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "groups" is not a
## graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "obs.scale" is not
## a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "var.scale" is not
## a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "ellipse" is not a
## graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "groups" is not a
## graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "obs.scale" is not
## a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "var.scale" is not
## a graphical parameter
## Warning in box(...): "ellipse" is not a graphical parameter
## Warning in box(...): "groups" is not a graphical parameter
## Warning in box(...): "obs.scale" is not a graphical parameter
## Warning in box(...): "var.scale" is not a graphical parameter
## Warning in title(...): "ellipse" is not a graphical parameter
## Warning in title(...): "groups" is not a graphical parameter
## Warning in title(...): "obs.scale" is not a graphical parameter
## Warning in title(...): "var.scale" is not a graphical parameter
## Warning in text.default(x, xlabs, cex = cex[1L], col = col[1L], ...): "ellipse"
## is not a graphical parameter
## Warning in text.default(x, xlabs, cex = cex[1L], col = col[1L], ...): "groups"
## is not a graphical parameter
## Warning in text.default(x, xlabs, cex = cex[1L], col = col[1L], ...):
## "obs.scale" is not a graphical parameter
## Warning in text.default(x, xlabs, cex = cex[1L], col = col[1L], ...):
## "var.scale" is not a graphical parameter
## Warning in plot.window(...): "ellipse" is not a graphical parameter
## Warning in plot.window(...): "groups" is not a graphical parameter
## Warning in plot.window(...): "obs.scale" is not a graphical parameter
## Warning in plot.window(...): "var.scale" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "ellipse" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "groups" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "obs.scale" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "var.scale" is not a graphical parameter
## Warning in title(...): "ellipse" is not a graphical parameter
## Warning in title(...): "groups" is not a graphical parameter
## Warning in title(...): "obs.scale" is not a graphical parameter
## Warning in title(...): "var.scale" is not a graphical parameter
## Warning in axis(3, col = col[2L], ...): "ellipse" is not a graphical parameter
## Warning in axis(3, col = col[2L], ...): "groups" is not a graphical parameter
## Warning in axis(3, col = col[2L], ...): "obs.scale" is not a graphical parameter
## Warning in axis(3, col = col[2L], ...): "var.scale" is not a graphical parameter
## Warning in axis(4, col = col[2L], ...): "ellipse" is not a graphical parameter
## Warning in axis(4, col = col[2L], ...): "groups" is not a graphical parameter
## Warning in axis(4, col = col[2L], ...): "obs.scale" is not a graphical parameter
## Warning in axis(4, col = col[2L], ...): "var.scale" is not a graphical parameter
## Warning in text.default(y, labels = ylabs, cex = cex[2L], col = col[2L], :
## "ellipse" is not a graphical parameter
## Warning in text.default(y, labels = ylabs, cex = cex[2L], col = col[2L], :
## "groups" is not a graphical parameter
## Warning in text.default(y, labels = ylabs, cex = cex[2L], col = col[2L], :
## "obs.scale" is not a graphical parameter
## Warning in text.default(y, labels = ylabs, cex = cex[2L], col = col[2L], :
## "var.scale" is not a graphical parameter

We get to see that US cars for a cluster on the right. This cluster is characterized by high values for cyl, disp and wt. Japanese cars are characterized by high mpg. European cars are somewhat in the middle and less tightly clustered that either group.

Let’s plot PC3 and PC4

biplot(mtcars.pca,ellipse=TRUE,choices=c(3,4),   groups=mtcars.country)
## Warning in plot.window(...): "ellipse" is not a graphical parameter
## Warning in plot.window(...): "groups" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "ellipse" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "groups" is not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "ellipse" is not a
## graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "groups" is not a
## graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "ellipse" is not a
## graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "groups" is not a
## graphical parameter
## Warning in box(...): "ellipse" is not a graphical parameter
## Warning in box(...): "groups" is not a graphical parameter
## Warning in title(...): "ellipse" is not a graphical parameter
## Warning in title(...): "groups" is not a graphical parameter
## Warning in text.default(x, xlabs, cex = cex[1L], col = col[1L], ...): "ellipse"
## is not a graphical parameter
## Warning in text.default(x, xlabs, cex = cex[1L], col = col[1L], ...): "groups"
## is not a graphical parameter
## Warning in plot.window(...): "ellipse" is not a graphical parameter
## Warning in plot.window(...): "groups" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "ellipse" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "groups" is not a graphical parameter
## Warning in title(...): "ellipse" is not a graphical parameter
## Warning in title(...): "groups" is not a graphical parameter
## Warning in axis(3, col = col[2L], ...): "ellipse" is not a graphical parameter
## Warning in axis(3, col = col[2L], ...): "groups" is not a graphical parameter
## Warning in axis(4, col = col[2L], ...): "ellipse" is not a graphical parameter
## Warning in axis(4, col = col[2L], ...): "groups" is not a graphical parameter
## Warning in text.default(y, labels = ylabs, cex = cex[2L], col = col[2L], :
## "ellipse" is not a graphical parameter
## Warning in text.default(y, labels = ylabs, cex = cex[2L], col = col[2L], :
## "groups" is not a graphical parameter

We find it difficult to derive insights from the given plot mainly because PC3 and PC4 explain very small percentages of the total variation, thus it would be surprising if we found that they were very informative and separated the groups or revealed apparent patterns.

Having performed PCA using this data set, if we were to build a classification model to identify the origin of a car (i.e. European, Japanese, US), the variables cyl, disp, wt and mpg would be significant variables as seen in our PCA analysis.

Challenge 1

Perform and plot PCA to the give Iris dataset. Reduce 4 dimensinal data into 2 or three dimensions. Provide remarks on your analysis. Dataset url = http://bit.ly/IrisDataset

Loading the data set

library(data.table)
## Warning: package 'data.table' was built under R version 4.1.2
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
iris <- fread('http://bit.ly/IrisDataset')
iris
##      sepal_length sepal_width petal_length petal_width        species
##   1:          5.1         3.5          1.4         0.2    Iris-setosa
##   2:          4.9         3.0          1.4         0.2    Iris-setosa
##   3:          4.7         3.2          1.3         0.2    Iris-setosa
##   4:          4.6         3.1          1.5         0.2    Iris-setosa
##   5:          5.0         3.6          1.4         0.2    Iris-setosa
##  ---                                                                 
## 146:          6.7         3.0          5.2         2.3 Iris-virginica
## 147:          6.3         2.5          5.0         1.9 Iris-virginica
## 148:          6.5         3.0          5.2         2.0 Iris-virginica
## 149:          6.2         3.4          5.4         2.3 Iris-virginica
## 150:          5.9         3.0          5.1         1.8 Iris-virginica

Selecting the numerical attribute

drf <- iris[,c(1:3)]
head(drf)
##    sepal_length sepal_width petal_length
## 1:          5.1         3.5          1.4
## 2:          4.9         3.0          1.4
## 3:          4.7         3.2          1.3
## 4:          4.6         3.1          1.5
## 5:          5.0         3.6          1.4
## 6:          5.4         3.9          1.7

Applying PCA using prcomp function, Need to scale / Normalize as PCA depends on distance measure

my_pca <- prcomp(drf, scale = TRUE,
                center = TRUE, retx = T)
names(my_pca)
## [1] "sdev"     "rotation" "center"   "scale"    "x"
summary(my_pca)
## Importance of components:
##                           PC1    PC2     PC3
## Standard deviation     1.4191 0.9565 0.26710
## Proportion of Variance 0.6713 0.3049 0.02378
## Cumulative Proportion  0.6713 0.9762 1.00000

View the principal component loading

my_pca$rotation
##                     PC1         PC2        PC3
## sepal_length  0.6313798 -0.42771052 -0.6468564
## sepal_width  -0.3542423 -0.90110216  0.2500546
## petal_length  0.6898347 -0.07126444  0.7204509

See the principal components

dim(my_pca$x)
## [1] 150   3

Plotting the resultant principal components

biplot(my_pca, main = "Biplot", scale = 0)

Compute standard deviation

my_pca$sdev
## [1] 1.4190935 0.9564678 0.2671011

Compute variance

my_pca.var <- my_pca$sdev ^ 2
my_pca.var
## [1] 2.01382631 0.91483072 0.07134297

Proportion of variance for a scree plot

propve <- my_pca.var / sum(my_pca.var)
propve
## [1] 0.67127544 0.30494357 0.02378099

Plot variance explained for each principal component

plot(propve, xlab = "principal component",
            ylab = "Proportion of Variance Explained",
            ylim = c(0, 1), type = "b",
            main = "Scree Plot")

Plot the cumulative proportion of variance explained

plot(cumsum(propve),
    xlab = "Principal Component",
    ylab = "Cumulative Proportion of Variance Explained",
    ylim = c(0, 1), type = "b")

Challenge 2

Perform and plot the given housing dataset. Provide remarks to your analysis. Data set url = http://bit.ly/BostonHousingDataset

Loading the data set

library(data.table)
boston <- fread('http://bit.ly/BostonHousingDataset')
boston
##         crim zn indus chas   nox    rm  age    dis rad tax ptratio      b lstat
##   1: 0.00632 18  2.31    0 0.538 6.575 65.2 4.0900   1 296    15.3 396.90  4.98
##   2: 0.02731  0  7.07    0 0.469 6.421 78.9 4.9671   2 242    17.8 396.90  9.14
##   3: 0.02729  0  7.07    0 0.469 7.185 61.1 4.9671   2 242    17.8 392.83  4.03
##   4: 0.03237  0  2.18    0 0.458 6.998 45.8 6.0622   3 222    18.7 394.63  2.94
##   5: 0.06905  0  2.18    0 0.458 7.147 54.2 6.0622   3 222    18.7 396.90  5.33
##  ---                                                                           
## 502: 0.06263  0 11.93    0 0.573 6.593 69.1 2.4786   1 273    21.0 391.99  9.67
## 503: 0.04527  0 11.93    0 0.573 6.120 76.7 2.2875   1 273    21.0 396.90  9.08
## 504: 0.06076  0 11.93    0 0.573 6.976 91.0 2.1675   1 273    21.0 396.90  5.64
## 505: 0.10959  0 11.93    0 0.573 6.794 89.3 2.3889   1 273    21.0 393.45  6.48
## 506: 0.04741  0 11.93    0 0.573 6.030 80.8 2.5050   1 273    21.0 396.90  7.88
##      medv
##   1: 24.0
##   2: 21.6
##   3: 34.7
##   4: 33.4
##   5: 36.2
##  ---     
## 502: 22.4
## 503: 20.6
## 504: 23.9
## 505: 22.0
## 506: 11.9

checking for nulls

is.null(boston)
## [1] FALSE

Applying PCA function

bpca <- prcomp(boston, scale = TRUE,
                center = TRUE, retx = T)
names(bpca)
## [1] "sdev"     "rotation" "center"   "scale"    "x"

summary

summary(bpca)
## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6     PC7
## Standard deviation     2.5585 1.2843 1.16142 0.94156 0.92244 0.81241 0.73172
## Proportion of Variance 0.4676 0.1178 0.09635 0.06332 0.06078 0.04714 0.03824
## Cumulative Proportion  0.4676 0.5854 0.68174 0.74507 0.80585 0.85299 0.89123
##                            PC8    PC9    PC10   PC11    PC12    PC13    PC14
## Standard deviation     0.63488 0.5266 0.50225 0.4613 0.42777 0.36607 0.24561
## Proportion of Variance 0.02879 0.0198 0.01802 0.0152 0.01307 0.00957 0.00431
## Cumulative Proportion  0.92003 0.9398 0.95785 0.9730 0.98612 0.99569 1.00000

View the principal component loading

bpca$rotation
##                  PC1          PC2          PC3          PC4          PC5
## crim     0.242284451 -0.065873108  0.395077419 -0.100366211  0.004957659
## zn      -0.245435005 -0.148002653  0.394545713 -0.342958421  0.114495002
## indus    0.331859746  0.127075668 -0.066081913  0.009626936 -0.022583692
## chas    -0.005027133  0.410668763 -0.125305293 -0.700406497 -0.535197817
## nox      0.325193880  0.254276363 -0.046475549 -0.053707583  0.194605570
## rm      -0.202816554  0.434005810  0.353406095  0.293357309 -0.008320481
## age      0.296976574  0.260303205 -0.200823078  0.078426326  0.149750092
## dis     -0.298169809 -0.359149977  0.157068710 -0.184747787 -0.106219480
## rad      0.303412754  0.031149596  0.418510334  0.051374381 -0.230352185
## tax      0.324033052  0.008851406  0.343232194  0.026810695 -0.163425820
## ptratio  0.207679535 -0.314623061  0.000399092  0.342036328 -0.615707380
## b       -0.196638358  0.026481032 -0.361375914  0.201741185 -0.367460674
## lstat    0.311397955 -0.201245177 -0.161060336 -0.242621217  0.178358870
## medv    -0.266636396  0.444924411  0.163188735  0.180297553 -0.050659893
##                 PC6          PC7         PC8          PC9         PC10
## crim    -0.22462703  0.777083366 -0.15740140  0.254211798 -0.071384615
## zn      -0.33574694 -0.274178365  0.38031404  0.382899480  0.245579673
## indus   -0.08082495 -0.340273839 -0.17174578  0.627048264 -0.254827026
## chas     0.16264906  0.074075775  0.03292700 -0.018642967 -0.041706916
## nox     -0.14899191 -0.198092965 -0.04745838 -0.043024391 -0.211620349
## rm       0.13108056  0.074084938  0.43761566 -0.003666947 -0.526133916
## age     -0.06086960  0.118580363  0.58810569 -0.043265822  0.245647942
## dis      0.01162335 -0.104397844  0.12823060 -0.175802196 -0.299412026
## rad     -0.13493732 -0.137080107 -0.07464872 -0.463439397  0.115793486
## tax     -0.18847146 -0.313984433 -0.07099212 -0.179446555 -0.008366413
## ptratio  0.27901731  0.001485608  0.28346960  0.274525949  0.160474164
## b       -0.78590728  0.074842780  0.04444175 -0.060975651 -0.146292237
## lstat   -0.09197211  0.083213083  0.35748247 -0.171810921  0.066647267
## medv    -0.05402804 -0.009964973 -0.15230879  0.070751083  0.575547284
##                 PC11        PC12         PC13         PC14
## crim    -0.071068781  0.06327612  0.097032312  0.059114176
## zn      -0.127709065 -0.22112210 -0.132375830 -0.096296807
## indus    0.273797614  0.34840828  0.083716854 -0.235472877
## chas    -0.009968402 -0.01903975 -0.049917454  0.023488966
## nox     -0.437475550 -0.44909357  0.524974687  0.087649148
## rm       0.223951923 -0.12560554 -0.049893596  0.007190515
## age     -0.329630928  0.48633905 -0.051462562 -0.038227027
## dis     -0.114600078  0.49356822  0.552292172  0.047124029
## rad      0.042213348  0.01863641 -0.006278474 -0.634975332
## tax      0.042794054  0.17042179 -0.242987756  0.698822190
## ptratio -0.099991841 -0.23214842  0.188347079  0.055738160
## b        0.039194858 -0.04152885 -0.021078199 -0.016165280
## lstat    0.683032690 -0.18189209  0.249489863  0.083143795
## medv     0.242001064  0.09828580  0.469629324  0.134127182

See the principal components

dim(bpca$x)
## [1] 506  14

Plotting the resultant principal components

biplot(bpca, main = "Biplot", scale = 0)

Compute standard deviation

bpca$sdev
##  [1] 2.5585132 1.2843410 1.1614241 0.9415625 0.9224421 0.8124105 0.7317177
##  [8] 0.6348831 0.5265582 0.5022524 0.4612919 0.4277704 0.3660733 0.2456149

Compute variance

bpca.var <- bpca$sdev ^ 2
bpca.var
##  [1] 6.54598958 1.64953191 1.34890592 0.88653987 0.85089944 0.66001077
##  [7] 0.53541080 0.40307658 0.27726358 0.25225744 0.21279025 0.18298750
## [13] 0.13400970 0.06032666

Proportion of variance for a scree plot

propve <- bpca.var / sum(bpca.var)
propve
##  [1] 0.467570684 0.117823708 0.096350423 0.063324276 0.060778531 0.047143627
##  [7] 0.038243629 0.028791184 0.019804542 0.018018389 0.015199304 0.013070536
## [13] 0.009572121 0.004309047

Plot variance explained for each principal component

plot(propve, xlab = "principal component",
            ylab = "Proportion of Variance Explained",
            ylim = c(0, 1), type = "b",
            main = "Scree Plot")

Plot the cumulative proportion of variance explained

plot(cumsum(propve),
    xlab = "Principal Component",
    ylab = "Cumulative Proportion of Variance Explained",
    ylim = c(0, 1), type = "b")

Find Top n principal component

which(cumsum(propve) >= 0.9)[1]
## [1] 8