About: This document is also available at http://rpubs.com/sherloconan/615424
In this assignment you will be working with the dataset from your 699 project. You will perform principal component analysis (PCA).
Establish the optimal number of components: visualize the scree plot and explain your decision. [ - 10pts]
Visualize PCA1 and PCA2 and describe which variables contribute to the PCA. [ - 10pts]
Reflect how you could use the reduced dimensionality in your final paper. [ - 10pts]
Writing style. [ - 10pts]
The EDA parts are available at RPubs - part I and RPubs - part II. The modeling part is available at RPubs - part III.
The project dataset is the Bitcoin price as a time series, so PCA cannot be conducted on it. Hence, the built-in dataset “mtcars” is used instead. Fig. 1 shows the raw data for two of its variables, mpg and disp, and Fig. 2 shows the same data after standardization.
library(ggplot2)  # provides ggplot(), geom_point(), labs(), ggtitle(), theme_classic()
data <- mtcars[,c(1,3)]  # columns 1 and 3 are mpg and disp
f1 <- ggplot(data,aes(mpg,disp))+geom_point()+labs(x="Fuel economy (miles per gallon)",y="Engine displacement (cubic inch)",subtitle="Raw data")+ggtitle("Fig. 1. Scatter plot of MPG and DISP")+theme_classic()
data <- data.frame(scale(data))  # center each column to mean 0 and scale to unit variance
f2 <- ggplot(data,aes(mpg,disp))+geom_point()+labs(x="Fuel economy (standardized)",y="Engine displacement (standardized)",subtitle="Scaled data")+ggtitle("Fig. 2. Scatter plot of MPG and DISP")+theme_classic()
gridExtra::grid.arrange(f1,f2,nrow=1,bottom="Before PCA")
The summary shows the importance of the components. The object returned by “prcomp” contains three main elements. (1) sdev: the standard deviations of the principal components, i.e., the square roots of the eigenvalues of the covariance/correlation matrix, although the calculation is actually done with the singular values of the data matrix. (2) rotation: the matrix of variable loadings, i.e., a matrix whose columns contain the eigenvectors. The function “princomp” returns this in the element loadings. (3) x: if retx is true, the rotated data, i.e., the centred (and scaled if requested) data multiplied by the rotation matrix. Hence, cov(x) is the diagonal matrix diag(sdev^2). For the formula method, napredict() is applied to handle the treatment of values omitted by the na.action. Because the data were already standardized, “prcomp” is called without further centering or scaling, and mpg and disp in the two components below denote the standardized variables.
\(\mathrm{PC1} = -0.7071 \times \mathrm{mpg} + 0.7071 \times \mathrm{disp}\)
\(\mathrm{PC2} = 0.7071 \times \mathrm{mpg} + 0.7071 \times \mathrm{disp}\)
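The equal-magnitude loadings are not a coincidence. As a short worked derivation, assume both variables are standardized and write \(r\) for their sample correlation (the symbol \(r\) and the value \(r \approx -0.848\) are not printed in the original output, but the value is consistent with the standard deviations in the summary). The correlation matrix, its eigenvalues, and its eigenvectors are then:
\(R = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}, \quad \lambda_{1} = 1 - r, \quad \lambda_{2} = 1 + r \quad (r < 0)\)
\(v_{1} = \tfrac{1}{\sqrt{2}}(-1,\ 1)^{\top}, \quad v_{2} = \tfrac{1}{\sqrt{2}}(1,\ 1)^{\top}, \quad \tfrac{1}{\sqrt{2}} \approx 0.7071\)
\(\sqrt{\lambda_{1}} \approx \sqrt{1.848} \approx 1.359, \quad \sqrt{\lambda_{2}} \approx \sqrt{0.152} \approx 0.390, \quad \lambda_{1} / (\lambda_{1} + \lambda_{2}) = (1 - r)/2 \approx 0.924\)
So for any pair of standardized variables the loadings are always \(\pm 0.7071\); only the split of variance between the two components depends on \(r\).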
PCA <- prcomp(data,center=FALSE,scale.=FALSE)  # data are already standardized, so no further centering or scaling
summary(PCA)
## Importance of components:
##                           PC1     PC2
## Standard deviation     1.3592 0.39045
## Proportion of Variance 0.9238 0.07622
## Cumulative Proportion  0.9238 1.00000
PCA$rotation
##             PC1       PC2
## mpg  -0.7071068 0.7071068
## disp  0.7071068 0.7071068
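The properties of sdev, rotation, and x described earlier can be checked numerically. The following is a minimal sketch, not part of the original analysis; the names X, pca, eig, and the *_ok flags are introduced here for illustration only.

```r
# Standardize mpg and disp, as in the document
X <- scale(mtcars[, c("mpg", "disp")])
pca <- prcomp(X, center = FALSE, scale. = FALSE)

# Eigen-decomposition of the correlation matrix of the two variables
eig <- eigen(cor(mtcars[, c("mpg", "disp")]))

# (1) sdev equals the square roots of the eigenvalues
sdev_ok <- isTRUE(all.equal(pca$sdev, sqrt(eig$values)))

# (2) the columns of rotation are the eigenvectors, up to sign
rot_ok <- isTRUE(all.equal(abs(unname(pca$rotation)), abs(eig$vectors)))

# (3) the covariance matrix of the scores is diag(sdev^2)
cov_ok <- isTRUE(all.equal(unname(cov(pca$x)), diag(pca$sdev^2)))

print(c(sdev = sdev_ok, rotation = rot_ok, cov = cov_ok))
```

The comparison in (2) uses absolute values because the sign of an eigenvector is arbitrary, so “prcomp” and “eigen” may legitimately return columns that differ by a factor of -1.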
#plot(PCA$x)
data_PCA <- data.frame(PCA$x)
ggplot(data_PCA,aes(PC1,PC2))+geom_point()+labs(subtitle="After PCA")+ggtitle("Fig. 3. Scatter plot of PC1 and PC2")+theme_classic()
# PCA on all 11 variables of mtcars in raw units (no centering or scaling)
data2 <- prcomp(mtcars,center=FALSE,scale.=FALSE,retx=TRUE)
plot(data2$x[,1:2],main="Principal component analysis")
data2$x[1:5,1:5]
##                         PC1       PC2       PC3         PC4        PC5
## Mazda RX4         -195.4586 -12.82442 11.366763 -0.01644411  2.1681553
## Mazda RX4 Wag     -195.4900 -12.85837 11.672530  0.47938929  2.1123201
## Datsun 710        -142.4803 -25.93604 16.034034  1.33669483 -1.1818054
## Hornet 4 Drive    -279.1129  38.27291 14.032390 -0.15698678 -0.8169072
## Hornet Sportabout -399.4494  37.33958  1.384863 -2.55678873 -0.4435470
data2$sdev
## [1] 310.1170486 40.8849807 15.8494620 2.1406948 1.0130078 0.7559841
## [7] 0.4637388 0.2914478 0.2518935 0.2107261 0.1985110
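The assignment asks for a scree plot, and the sdev values above are all that is needed to draw one. The following is a hedged sketch rather than the author's own figure; pca_all, var_share, and cum_share are illustrative names. Because the data are neither centered nor scaled here, the shares describe the total (uncentered) sum of squares rather than variance around the mean.

```r
# Same call as in the document: raw mtcars, no centering or scaling
pca_all <- prcomp(mtcars, center = FALSE, scale. = FALSE)

# Share of the total captured by each component, and the running total
var_share <- pca_all$sdev^2 / sum(pca_all$sdev^2)
cum_share <- cumsum(var_share)
print(round(head(cbind(var_share, cum_share), 4), 4))

# Scree plot: look for the "elbow" where the curve flattens out
plot(var_share, type = "b", xlab = "Principal component",
     ylab = "Proportion of total", main = "Scree plot (mtcars)")
```

With the sdev values printed above, the first component alone accounts for roughly 98% of the total, so the elbow appears immediately after PC1. Standardizing the variables first (center = TRUE, scale. = TRUE) would likely give a less extreme picture, since disp and hp would no longer dominate through their units.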
data2$rotation[1:5,1:5]
##              PC1         PC2          PC3           PC4          PC5
## mpg  -0.05192570 -0.12168895  0.816770206 -0.5384199012  0.014048862
## cyl  -0.02055752 -0.01353632  0.068072076  0.0965395213  0.220032664
## disp -0.85225865  0.52234494  0.009452163 -0.0223072691  0.007183881
## hp   -0.51719303 -0.84049922 -0.157726895  0.0009687559 -0.033835232
## drat -0.01010078 -0.02135314  0.107698845  0.0342661459  0.167587424
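One way to describe which variables contribute to each component is to square the loadings. This is a minimal sketch under the assumption that squared loadings are an acceptable measure of contribution; contrib and pca_all are names introduced here, not part of the original analysis.

```r
# Same call as in the document: raw mtcars, no centering or scaling
pca_all <- prcomp(mtcars, center = FALSE, scale. = FALSE)

# Each column of rotation has unit length, so its squared loadings sum to 1
# and can be read as each variable's share of that component
contrib <- pca_all$rotation[, 1:2]^2
print(colSums(contrib))

# Variables ranked by their share of PC1 and PC2, in percent
print(round(100 * sort(contrib[, "PC1"], decreasing = TRUE), 1))
print(round(100 * sort(contrib[, "PC2"], decreasing = TRUE), 1))
```

Consistent with the loadings printed above, disp carries the largest share of PC1 and hp the largest share of PC2, which reflects their large raw magnitudes in the unscaled data.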
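# Reproduce the scores by hand: without centering or scaling, x equals the data matrix times the rotation matrix
recover <- as.matrix(mtcars) %*% as.matrix(data2$rotation)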
recover[1:5,1:5]
##                         PC1       PC2       PC3         PC4        PC5
## Mazda RX4         -195.4586 -12.82442 11.366763 -0.01644411  2.1681553
## Mazda RX4 Wag     -195.4900 -12.85837 11.672530  0.47938929  2.1123201
## Datsun 710        -142.4803 -25.93604 16.034034  1.33669483 -1.1818054
## Hornet 4 Drive    -279.1129  38.27291 14.032390 -0.15698678 -0.8169072
## Hornet Sportabout -399.4494  37.33958  1.384863 -2.55678873 -0.4435470
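The recovered values match data2$x only because no centering or scaling was requested. The sketch below, which is an illustration rather than part of the original analysis, contrasts that case with a standardized PCA, where the same transformation has to be applied to the data before multiplying by the rotation matrix. The names p0, p1, Z, and the logical flags are assumptions made for this example.

```r
X <- as.matrix(mtcars)

# Without centering or scaling, the scores are simply X %*% rotation
p0 <- prcomp(X, center = FALSE, scale. = FALSE)
same_raw <- isTRUE(all.equal(unname(X %*% p0$rotation), unname(p0$x)))

# With centering and scaling, the stored center and scale must be applied first
p1 <- prcomp(X, center = TRUE, scale. = TRUE)
Z <- scale(X, center = p1$center, scale = p1$scale)
same_std <- isTRUE(all.equal(unname(Z %*% p1$rotation), unname(p1$x)))

# Multiplying the raw data by p1$rotation does not reproduce p1$x
raw_mismatch <- !isTRUE(all.equal(unname(X %*% p1$rotation), unname(p1$x)))

print(c(same_raw = same_raw, same_std = same_std, raw_mismatch = raw_mismatch))
```

For new observations, predict() on the “prcomp” object applies the stored centering and scaling before the rotation, which avoids doing this by hand.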