Will Perform and visualize PCA in the given mtcars data set.
df <- mtcars
head(df)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Selecting the numerical data (excluding the categorical variables vs and am)
df <- mtcars[,c(1:7,10,11)]
head(df)
## mpg cyl disp hp drat wt qsec gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 3 1
We then pass df to the prcomp(). We also set two arguments, center and scale, to be TRUE then preview our object with summary
mtcars.pca <- prcomp(df, center = TRUE, scale. = TRUE)
summary(mtcars.pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.3782 1.4429 0.71008 0.51481 0.42797 0.35184 0.32413
## Proportion of Variance 0.6284 0.2313 0.05602 0.02945 0.02035 0.01375 0.01167
## Cumulative Proportion 0.6284 0.8598 0.91581 0.94525 0.96560 0.97936 0.99103
## PC8 PC9
## Standard deviation 0.2419 0.14896
## Proportion of Variance 0.0065 0.00247
## Cumulative Proportion 0.9975 1.00000
As a result we obtain 9 principal components, each which explain a percentage of the total variation of the data set. PC1 explains 63% of the total variance, which means that nearly two-thirds of the information in the data set (9 variables) can be encapsulated by just that one Principal Component. PC2 explains 23% of the variance. etc
Calling str() to have a look at your PCA object
str(mtcars.pca)
## List of 5
## $ sdev : num [1:9] 2.378 1.443 0.71 0.515 0.428 ...
## $ rotation: num [1:9, 1:9] -0.393 0.403 0.397 0.367 -0.312 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:9] "mpg" "cyl" "disp" "hp" ...
## .. ..$ : chr [1:9] "PC1" "PC2" "PC3" "PC4" ...
## $ center : Named num [1:9] 20.09 6.19 230.72 146.69 3.6 ...
## ..- attr(*, "names")= chr [1:9] "mpg" "cyl" "disp" "hp" ...
## $ scale : Named num [1:9] 6.027 1.786 123.939 68.563 0.535 ...
## ..- attr(*, "names")= chr [1:9] "mpg" "cyl" "disp" "hp" ...
## $ x : num [1:32, 1:9] -0.664 -0.637 -2.3 -0.215 1.587 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
## .. ..$ : chr [1:9] "PC1" "PC2" "PC3" "PC4" ...
## - attr(*, "class")= chr "prcomp"
Here we note that our pca object: The center point (center), scaling (scale), standard deviation(sdev) of each principal component. The relationship (correlation or anti correlation, etc) between the initial variables and the principal components (rotation). The values of each sample in terms of the principal components (x)
View the principal component loading
mtcars.pca$rotation
## PC1 PC2 PC3 PC4 PC5 PC6
## mpg -0.3931477 0.02753861 -0.22119309 -0.006126378 -0.3207620 0.72015586
## cyl 0.4025537 0.01570975 -0.25231615 0.040700251 0.1171397 0.22432550
## disp 0.3973528 -0.08888469 -0.07825139 0.339493732 -0.4867849 -0.01967516
## hp 0.3670814 0.26941371 -0.01721159 0.068300993 -0.2947317 0.35394225
## drat -0.3118165 0.34165268 0.14995507 0.845658485 0.1619259 -0.01536794
## wt 0.3734771 -0.17194306 0.45373418 0.191260029 -0.1874822 -0.08377237
## qsec -0.2243508 -0.48404435 0.62812782 -0.030329127 -0.1482495 0.25752940
## gear -0.2094749 0.55078264 0.20658376 -0.282381831 -0.5624860 -0.32298239
## carb 0.2445807 0.48431310 0.46412069 -0.214492216 0.3997820 0.35706914
## PC7 PC8 PC9
## mpg -0.38138068 -0.12465987 0.11492862
## cyl -0.15893251 0.81032177 0.16266295
## disp -0.18233095 -0.06416707 -0.66190812
## hp 0.69620751 -0.16573993 0.25177306
## drat 0.04767957 0.13505066 0.03809096
## wt -0.42777608 -0.19839375 0.56918844
## qsec 0.27622581 0.35613350 -0.16873731
## gear -0.08555707 0.31636479 0.04719694
## carb -0.20604210 -0.10832772 -0.32045892
See the principal components
dim(mtcars.pca$x)
## [1] 32 9
We will now plot our pca. This will provide us with some very useful insights i.e. which cars are most similar to each other
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.1.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(rpart)
## Warning: package 'rpart' was built under R version 4.1.3
library(rpart.plot)
## Warning: package 'rpart.plot' was built under R version 4.1.3
Plotting the resultant principal components
biplot(mtcars.pca, main = "Biplot", scale = 0.01)
From the graph we will see that the variables hp, cyl and disp contribute to PC1,with higher values in those variables moving the samples to the right on the plot.
Adding more detail to the plot, we provide arguments row names as labels
biplot(mtcars.pca, obs.scale = 1, var.scale = 1)
## Warning in plot.window(...): "obs.scale" is not a graphical parameter
## Warning in plot.window(...): "var.scale" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "obs.scale" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "var.scale" is not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "obs.scale" is not
## a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "var.scale" is not
## a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "obs.scale" is not
## a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "var.scale" is not
## a graphical parameter
## Warning in box(...): "obs.scale" is not a graphical parameter
## Warning in box(...): "var.scale" is not a graphical parameter
## Warning in title(...): "obs.scale" is not a graphical parameter
## Warning in title(...): "var.scale" is not a graphical parameter
## Warning in text.default(x, xlabs, cex = cex[1L], col = col[1L], ...):
## "obs.scale" is not a graphical parameter
## Warning in text.default(x, xlabs, cex = cex[1L], col = col[1L], ...):
## "var.scale" is not a graphical parameter
## Warning in plot.window(...): "obs.scale" is not a graphical parameter
## Warning in plot.window(...): "var.scale" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "obs.scale" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "var.scale" is not a graphical parameter
## Warning in title(...): "obs.scale" is not a graphical parameter
## Warning in title(...): "var.scale" is not a graphical parameter
## Warning in axis(3, col = col[2L], ...): "obs.scale" is not a graphical parameter
## Warning in axis(3, col = col[2L], ...): "var.scale" is not a graphical parameter
## Warning in axis(4, col = col[2L], ...): "obs.scale" is not a graphical parameter
## Warning in axis(4, col = col[2L], ...): "var.scale" is not a graphical parameter
## Warning in text.default(y, labels = ylabs, cex = cex[2L], col = col[2L], :
## "obs.scale" is not a graphical parameter
## Warning in text.default(y, labels = ylabs, cex = cex[2L], col = col[2L], :
## "var.scale" is not a graphical parameter
We now see which cars are similar to one another. The sports cars
Maserati Bora, Ferrari Dino and Ford Pantera L all cluster together at
the top
We can also look at the origin of each of the cars by putting them into one of three categories i.e. US, Japanese and European cars.
suppressWarnings(expr)
## function (expr)
## {
## enexpr(expr)
## }
## <bytecode: 0x000000001c434fd8>
## <environment: namespace:rlang>
mtcars.country <- c(rep("Japan", 3), rep("US",4), rep("Europe", 7),rep("US",3), "Europe", rep("Japan", 3), rep("US",4), rep("Europe", 3), "US", rep("Europe", 3))
biplot(mtcars.pca,ellipse=TRUE, groups=mtcars.country, obs.scale = 1, var.scale = 1)
## Warning in plot.window(...): "ellipse" is not a graphical parameter
## Warning in plot.window(...): "groups" is not a graphical parameter
## Warning in plot.window(...): "obs.scale" is not a graphical parameter
## Warning in plot.window(...): "var.scale" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "ellipse" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "groups" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "obs.scale" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "var.scale" is not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "ellipse" is not a
## graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "groups" is not a
## graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "obs.scale" is not
## a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "var.scale" is not
## a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "ellipse" is not a
## graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "groups" is not a
## graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "obs.scale" is not
## a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "var.scale" is not
## a graphical parameter
## Warning in box(...): "ellipse" is not a graphical parameter
## Warning in box(...): "groups" is not a graphical parameter
## Warning in box(...): "obs.scale" is not a graphical parameter
## Warning in box(...): "var.scale" is not a graphical parameter
## Warning in title(...): "ellipse" is not a graphical parameter
## Warning in title(...): "groups" is not a graphical parameter
## Warning in title(...): "obs.scale" is not a graphical parameter
## Warning in title(...): "var.scale" is not a graphical parameter
## Warning in text.default(x, xlabs, cex = cex[1L], col = col[1L], ...): "ellipse"
## is not a graphical parameter
## Warning in text.default(x, xlabs, cex = cex[1L], col = col[1L], ...): "groups"
## is not a graphical parameter
## Warning in text.default(x, xlabs, cex = cex[1L], col = col[1L], ...):
## "obs.scale" is not a graphical parameter
## Warning in text.default(x, xlabs, cex = cex[1L], col = col[1L], ...):
## "var.scale" is not a graphical parameter
## Warning in plot.window(...): "ellipse" is not a graphical parameter
## Warning in plot.window(...): "groups" is not a graphical parameter
## Warning in plot.window(...): "obs.scale" is not a graphical parameter
## Warning in plot.window(...): "var.scale" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "ellipse" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "groups" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "obs.scale" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "var.scale" is not a graphical parameter
## Warning in title(...): "ellipse" is not a graphical parameter
## Warning in title(...): "groups" is not a graphical parameter
## Warning in title(...): "obs.scale" is not a graphical parameter
## Warning in title(...): "var.scale" is not a graphical parameter
## Warning in axis(3, col = col[2L], ...): "ellipse" is not a graphical parameter
## Warning in axis(3, col = col[2L], ...): "groups" is not a graphical parameter
## Warning in axis(3, col = col[2L], ...): "obs.scale" is not a graphical parameter
## Warning in axis(3, col = col[2L], ...): "var.scale" is not a graphical parameter
## Warning in axis(4, col = col[2L], ...): "ellipse" is not a graphical parameter
## Warning in axis(4, col = col[2L], ...): "groups" is not a graphical parameter
## Warning in axis(4, col = col[2L], ...): "obs.scale" is not a graphical parameter
## Warning in axis(4, col = col[2L], ...): "var.scale" is not a graphical parameter
## Warning in text.default(y, labels = ylabs, cex = cex[2L], col = col[2L], :
## "ellipse" is not a graphical parameter
## Warning in text.default(y, labels = ylabs, cex = cex[2L], col = col[2L], :
## "groups" is not a graphical parameter
## Warning in text.default(y, labels = ylabs, cex = cex[2L], col = col[2L], :
## "obs.scale" is not a graphical parameter
## Warning in text.default(y, labels = ylabs, cex = cex[2L], col = col[2L], :
## "var.scale" is not a graphical parameter
We get to see that US cars for a cluster on the right. This cluster is characterized by high values for cyl, disp and wt. Japanese cars are characterized by high mpg. European cars are somewhat in the middle and less tightly clustered that either group.
Let’s plot PC3 and PC4
biplot(mtcars.pca,ellipse=TRUE,choices=c(3,4), groups=mtcars.country)
## Warning in plot.window(...): "ellipse" is not a graphical parameter
## Warning in plot.window(...): "groups" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "ellipse" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "groups" is not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "ellipse" is not a
## graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "groups" is not a
## graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "ellipse" is not a
## graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "groups" is not a
## graphical parameter
## Warning in box(...): "ellipse" is not a graphical parameter
## Warning in box(...): "groups" is not a graphical parameter
## Warning in title(...): "ellipse" is not a graphical parameter
## Warning in title(...): "groups" is not a graphical parameter
## Warning in text.default(x, xlabs, cex = cex[1L], col = col[1L], ...): "ellipse"
## is not a graphical parameter
## Warning in text.default(x, xlabs, cex = cex[1L], col = col[1L], ...): "groups"
## is not a graphical parameter
## Warning in plot.window(...): "ellipse" is not a graphical parameter
## Warning in plot.window(...): "groups" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "ellipse" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "groups" is not a graphical parameter
## Warning in title(...): "ellipse" is not a graphical parameter
## Warning in title(...): "groups" is not a graphical parameter
## Warning in axis(3, col = col[2L], ...): "ellipse" is not a graphical parameter
## Warning in axis(3, col = col[2L], ...): "groups" is not a graphical parameter
## Warning in axis(4, col = col[2L], ...): "ellipse" is not a graphical parameter
## Warning in axis(4, col = col[2L], ...): "groups" is not a graphical parameter
## Warning in text.default(y, labels = ylabs, cex = cex[2L], col = col[2L], :
## "ellipse" is not a graphical parameter
## Warning in text.default(y, labels = ylabs, cex = cex[2L], col = col[2L], :
## "groups" is not a graphical parameter
We find it difficult to derive insights from the given plot mainly because PC3 and PC4 explain very small percentages of the total variation, thus it would be surprising if we found that they were very informative and separated the groups or revealed apparent patterns.
Having performed PCA using this data set, if we were to build a classification model to identify the origin of a car (i.e. European, Japanese, US), the variables cyl, disp, wt and mpg would be significant variables as seen in our PCA analysis.
Perform and plot PCA to the give Iris dataset. Reduce 4 dimensinal data into 2 or three dimensions. Provide remarks on your analysis. Dataset url = http://bit.ly/IrisDataset
Loading the data set
library(data.table)
## Warning: package 'data.table' was built under R version 4.1.2
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
##
## between, first, last
iris <- fread('http://bit.ly/IrisDataset')
iris
## sepal_length sepal_width petal_length petal_width species
## 1: 5.1 3.5 1.4 0.2 Iris-setosa
## 2: 4.9 3.0 1.4 0.2 Iris-setosa
## 3: 4.7 3.2 1.3 0.2 Iris-setosa
## 4: 4.6 3.1 1.5 0.2 Iris-setosa
## 5: 5.0 3.6 1.4 0.2 Iris-setosa
## ---
## 146: 6.7 3.0 5.2 2.3 Iris-virginica
## 147: 6.3 2.5 5.0 1.9 Iris-virginica
## 148: 6.5 3.0 5.2 2.0 Iris-virginica
## 149: 6.2 3.4 5.4 2.3 Iris-virginica
## 150: 5.9 3.0 5.1 1.8 Iris-virginica
Selecting the numerical attribute
drf <- iris[,c(1:3)]
head(drf)
## sepal_length sepal_width petal_length
## 1: 5.1 3.5 1.4
## 2: 4.9 3.0 1.4
## 3: 4.7 3.2 1.3
## 4: 4.6 3.1 1.5
## 5: 5.0 3.6 1.4
## 6: 5.4 3.9 1.7
Applying PCA using prcomp function, Need to scale / Normalize as PCA depends on distance measure
my_pca <- prcomp(drf, scale = TRUE,
center = TRUE, retx = T)
names(my_pca)
## [1] "sdev" "rotation" "center" "scale" "x"
summary(my_pca)
## Importance of components:
## PC1 PC2 PC3
## Standard deviation 1.4191 0.9565 0.26710
## Proportion of Variance 0.6713 0.3049 0.02378
## Cumulative Proportion 0.6713 0.9762 1.00000
View the principal component loading
my_pca$rotation
## PC1 PC2 PC3
## sepal_length 0.6313798 -0.42771052 -0.6468564
## sepal_width -0.3542423 -0.90110216 0.2500546
## petal_length 0.6898347 -0.07126444 0.7204509
See the principal components
dim(my_pca$x)
## [1] 150 3
Plotting the resultant principal components
biplot(my_pca, main = "Biplot", scale = 0)
Compute standard deviation
my_pca$sdev
## [1] 1.4190935 0.9564678 0.2671011
Compute variance
my_pca.var <- my_pca$sdev ^ 2
my_pca.var
## [1] 2.01382631 0.91483072 0.07134297
Proportion of variance for a scree plot
propve <- my_pca.var / sum(my_pca.var)
propve
## [1] 0.67127544 0.30494357 0.02378099
Plot variance explained for each principal component
plot(propve, xlab = "principal component",
ylab = "Proportion of Variance Explained",
ylim = c(0, 1), type = "b",
main = "Scree Plot")
Plot the cumulative proportion of variance explained
plot(cumsum(propve),
xlab = "Principal Component",
ylab = "Cumulative Proportion of Variance Explained",
ylim = c(0, 1), type = "b")
Perform and plot the given housing dataset. Provide remarks to your analysis. Data set url = http://bit.ly/BostonHousingDataset
Loading the data set
library(data.table)
boston <- fread('http://bit.ly/BostonHousingDataset')
boston
## crim zn indus chas nox rm age dis rad tax ptratio b lstat
## 1: 0.00632 18 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98
## 2: 0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14
## 3: 0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03
## 4: 0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94
## 5: 0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33
## ---
## 502: 0.06263 0 11.93 0 0.573 6.593 69.1 2.4786 1 273 21.0 391.99 9.67
## 503: 0.04527 0 11.93 0 0.573 6.120 76.7 2.2875 1 273 21.0 396.90 9.08
## 504: 0.06076 0 11.93 0 0.573 6.976 91.0 2.1675 1 273 21.0 396.90 5.64
## 505: 0.10959 0 11.93 0 0.573 6.794 89.3 2.3889 1 273 21.0 393.45 6.48
## 506: 0.04741 0 11.93 0 0.573 6.030 80.8 2.5050 1 273 21.0 396.90 7.88
## medv
## 1: 24.0
## 2: 21.6
## 3: 34.7
## 4: 33.4
## 5: 36.2
## ---
## 502: 22.4
## 503: 20.6
## 504: 23.9
## 505: 22.0
## 506: 11.9
checking for nulls
is.null(boston)
## [1] FALSE
Applying PCA function
bpca <- prcomp(boston, scale = TRUE,
center = TRUE, retx = T)
names(bpca)
## [1] "sdev" "rotation" "center" "scale" "x"
summary
summary(bpca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.5585 1.2843 1.16142 0.94156 0.92244 0.81241 0.73172
## Proportion of Variance 0.4676 0.1178 0.09635 0.06332 0.06078 0.04714 0.03824
## Cumulative Proportion 0.4676 0.5854 0.68174 0.74507 0.80585 0.85299 0.89123
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 0.63488 0.5266 0.50225 0.4613 0.42777 0.36607 0.24561
## Proportion of Variance 0.02879 0.0198 0.01802 0.0152 0.01307 0.00957 0.00431
## Cumulative Proportion 0.92003 0.9398 0.95785 0.9730 0.98612 0.99569 1.00000
View the principal component loading
bpca$rotation
## PC1 PC2 PC3 PC4 PC5
## crim 0.242284451 -0.065873108 0.395077419 -0.100366211 0.004957659
## zn -0.245435005 -0.148002653 0.394545713 -0.342958421 0.114495002
## indus 0.331859746 0.127075668 -0.066081913 0.009626936 -0.022583692
## chas -0.005027133 0.410668763 -0.125305293 -0.700406497 -0.535197817
## nox 0.325193880 0.254276363 -0.046475549 -0.053707583 0.194605570
## rm -0.202816554 0.434005810 0.353406095 0.293357309 -0.008320481
## age 0.296976574 0.260303205 -0.200823078 0.078426326 0.149750092
## dis -0.298169809 -0.359149977 0.157068710 -0.184747787 -0.106219480
## rad 0.303412754 0.031149596 0.418510334 0.051374381 -0.230352185
## tax 0.324033052 0.008851406 0.343232194 0.026810695 -0.163425820
## ptratio 0.207679535 -0.314623061 0.000399092 0.342036328 -0.615707380
## b -0.196638358 0.026481032 -0.361375914 0.201741185 -0.367460674
## lstat 0.311397955 -0.201245177 -0.161060336 -0.242621217 0.178358870
## medv -0.266636396 0.444924411 0.163188735 0.180297553 -0.050659893
## PC6 PC7 PC8 PC9 PC10
## crim -0.22462703 0.777083366 -0.15740140 0.254211798 -0.071384615
## zn -0.33574694 -0.274178365 0.38031404 0.382899480 0.245579673
## indus -0.08082495 -0.340273839 -0.17174578 0.627048264 -0.254827026
## chas 0.16264906 0.074075775 0.03292700 -0.018642967 -0.041706916
## nox -0.14899191 -0.198092965 -0.04745838 -0.043024391 -0.211620349
## rm 0.13108056 0.074084938 0.43761566 -0.003666947 -0.526133916
## age -0.06086960 0.118580363 0.58810569 -0.043265822 0.245647942
## dis 0.01162335 -0.104397844 0.12823060 -0.175802196 -0.299412026
## rad -0.13493732 -0.137080107 -0.07464872 -0.463439397 0.115793486
## tax -0.18847146 -0.313984433 -0.07099212 -0.179446555 -0.008366413
## ptratio 0.27901731 0.001485608 0.28346960 0.274525949 0.160474164
## b -0.78590728 0.074842780 0.04444175 -0.060975651 -0.146292237
## lstat -0.09197211 0.083213083 0.35748247 -0.171810921 0.066647267
## medv -0.05402804 -0.009964973 -0.15230879 0.070751083 0.575547284
## PC11 PC12 PC13 PC14
## crim -0.071068781 0.06327612 0.097032312 0.059114176
## zn -0.127709065 -0.22112210 -0.132375830 -0.096296807
## indus 0.273797614 0.34840828 0.083716854 -0.235472877
## chas -0.009968402 -0.01903975 -0.049917454 0.023488966
## nox -0.437475550 -0.44909357 0.524974687 0.087649148
## rm 0.223951923 -0.12560554 -0.049893596 0.007190515
## age -0.329630928 0.48633905 -0.051462562 -0.038227027
## dis -0.114600078 0.49356822 0.552292172 0.047124029
## rad 0.042213348 0.01863641 -0.006278474 -0.634975332
## tax 0.042794054 0.17042179 -0.242987756 0.698822190
## ptratio -0.099991841 -0.23214842 0.188347079 0.055738160
## b 0.039194858 -0.04152885 -0.021078199 -0.016165280
## lstat 0.683032690 -0.18189209 0.249489863 0.083143795
## medv 0.242001064 0.09828580 0.469629324 0.134127182
See the principal components
dim(bpca$x)
## [1] 506 14
Plotting the resultant principal components
biplot(bpca, main = "Biplot", scale = 0)
Compute standard deviation
bpca$sdev
## [1] 2.5585132 1.2843410 1.1614241 0.9415625 0.9224421 0.8124105 0.7317177
## [8] 0.6348831 0.5265582 0.5022524 0.4612919 0.4277704 0.3660733 0.2456149
Compute variance
bpca.var <- bpca$sdev ^ 2
bpca.var
## [1] 6.54598958 1.64953191 1.34890592 0.88653987 0.85089944 0.66001077
## [7] 0.53541080 0.40307658 0.27726358 0.25225744 0.21279025 0.18298750
## [13] 0.13400970 0.06032666
Proportion of variance for a scree plot
propve <- bpca.var / sum(bpca.var)
propve
## [1] 0.467570684 0.117823708 0.096350423 0.063324276 0.060778531 0.047143627
## [7] 0.038243629 0.028791184 0.019804542 0.018018389 0.015199304 0.013070536
## [13] 0.009572121 0.004309047
Plot variance explained for each principal component
plot(propve, xlab = "principal component",
ylab = "Proportion of Variance Explained",
ylim = c(0, 1), type = "b",
main = "Scree Plot")
Plot the cumulative proportion of variance explained
plot(cumsum(propve),
xlab = "Principal Component",
ylab = "Cumulative Proportion of Variance Explained",
ylim = c(0, 1), type = "b")
Find Top n principal component
which(cumsum(propve) >= 0.9)[1]
## [1] 8