This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:
data("USArrests")
summary(USArrests)
##      Murder          Assault         UrbanPop          Rape
##  Min.   : 0.800   Min.   : 45.0   Min.   :32.00   Min.   : 7.30
##  1st Qu.: 4.075   1st Qu.:109.0   1st Qu.:54.50   1st Qu.:15.07
##  Median : 7.250   Median :159.0   Median :66.00   Median :20.10
##  Mean   : 7.788   Mean   :170.8   Mean   :65.54   Mean   :21.23
##  3rd Qu.:11.250   3rd Qu.:249.0   3rd Qu.:77.75   3rd Qu.:26.18
##  Max.   :17.400   Max.   :337.0   Max.   :91.00   Max.   :46.00
arrests.pca<-prcomp(USArrests,center = TRUE, scale. = TRUE)
names(arrests.pca)
## [1] "sdev" "rotation" "center" "scale" "x"
print(arrests.pca)
## Standard deviations (1, .., p=4):
## [1] 1.5748783 0.9948694 0.5971291 0.4164494
##
## Rotation (n x k) = (4 x 4):
##                 PC1        PC2        PC3         PC4
## Murder   -0.5358995  0.4181809 -0.3412327  0.64922780
## Assault  -0.5831836  0.1879856 -0.2681484 -0.74340748
## UrbanPop -0.2781909 -0.8728062 -0.3780158  0.13387773
## Rape     -0.5434321 -0.1673186  0.8177779  0.08902432
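The `rotation` matrix printed above holds the loadings, and the `x` element holds the scores. As a minimal sketch of how the two relate, the scores can be rebuilt by centering and scaling the data with the stored `center` and `scale` values and multiplying by the loadings. The names `pca`, `scaled`, `scores`, `scores_match`, and `ortho` are illustrative and not part of the original analysis.

```r
data("USArrests")
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Center and scale the data with the values stored in the prcomp object
scaled <- scale(USArrests, center = pca$center, scale = pca$scale)

# Project the scaled data onto the loadings to obtain the scores
scores <- scaled %*% pca$rotation

# Compare the rebuilt scores with the ones prcomp returned
scores_match <- isTRUE(all.equal(scores, pca$x, check.attributes = FALSE))
print(scores_match)

# The loadings should be orthonormal: t(R) %*% R is the identity matrix
ortho <- isTRUE(all.equal(crossprod(pca$rotation), diag(4),
                          check.attributes = FALSE))
print(ortho)
```

Both checks should print `TRUE` if the relationship holds as described.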
summary(arrests.pca)
## Importance of components:
##                           PC1    PC2     PC3     PC4
## Standard deviation     1.5749 0.9949 0.59713 0.41645
## Proportion of Variance 0.6201 0.2474 0.08914 0.04336
## Cumulative Proportion  0.6201 0.8675 0.95664 1.00000
From the summary, we can see that PC1 explains 62% of the variance, PC2 explains about 25%, and so on. A common rule of thumb is to keep the principal components that together explain about 95% of the variance for use in models; here the first three reach 95.7%. The summary also reports the cumulative proportion of variance explained by the principal components.
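The proportions in the summary come directly from the `sdev` element, so the choice of how many components to keep can be computed rather than read off the table. The following is a minimal sketch, assuming a 95% cutoff; the names `pca`, `threshold`, `n_keep`, and `reduced` are illustrative, and the cutoff is a rule of thumb rather than a fixed requirement.

```r
data("USArrests")
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Variance of each component as a share of the total variance
pvar <- pca$sdev^2 / sum(pca$sdev^2)
cum_pvar <- cumsum(pvar)
print(round(cum_pvar, 4))

# Assumed cutoff: keep the smallest number of components reaching 95%
threshold <- 0.95
n_keep <- which(cum_pvar >= threshold)[1]
print(n_keep)

# Scores restricted to the retained components, ready for a downstream model
reduced <- pca$x[, seq_len(n_keep), drop = FALSE]
print(dim(reduced))
```

With the cumulative proportions shown in the summary, this selects three components.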
A good way to inspect a PCA is to plot it with several types of scree plot. The ‘pcaCharts’ function defined below draws several of them.
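pcaCharts <- function(x) {
  # Variance of each component and its share of the total variance
  x.var <- x$sdev^2
  x.pvar <- x.var / sum(x.var)
  print("proportions of variance:")
  print(x.pvar)
  # Arrange the four plots in a 2 x 2 grid
  par(mfrow = c(2, 2))
  # Proportion and cumulative proportion of variance per component
  plot(x.pvar, xlab = "Principal component", ylab = "Proportion of variance explained", ylim = c(0, 1), type = 'b')
  plot(cumsum(x.pvar), xlab = "Principal component", ylab = "Cumulative Proportion of variance explained", ylim = c(0, 1), type = 'b')
  # Built-in scree plots: bar chart and line chart
  screeplot(x)
  screeplot(x, type = "l")
  # Restore the single-plot layout
  par(mfrow = c(1, 1))
}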
pcaCharts(arrests.pca)
## [1] "proportions of variance:"
## [1] 0.62006039 0.24744129 0.08914080 0.04335752
biplot(arrests.pca,scale = 0,cex=0.7)
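pca.out <- arrests.pca
# Flip the sign of both the loadings and the scores; components are only defined up to sign
pca.out$rotation <- -pca.out$rotation
pca.out$x <- -pca.out$x
biplot(pca.out, scale = 0, cex = 0.7)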
pca.out$rotation[, 1:2]
##                PC1        PC2
## Murder   0.5358995 -0.4181809
## Assault  0.5831836 -0.1879856
## UrbanPop 0.2781909  0.8728062
## Rape     0.5434321  0.1673186
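The sign of a principal component is arbitrary: negating the loadings together with the scores describes the same decomposition. A minimal sketch of that check follows, using a hypothetical copy named `flipped`; it compares the data reconstructed from each version and the variance of the scores.

```r
data("USArrests")
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Hypothetical copy with both the loadings and the scores negated
flipped <- pca
flipped$rotation <- -pca$rotation
flipped$x <- -pca$x

# Scores times the transposed loadings recover the centered, scaled data
recon_orig <- pca$x %*% t(pca$rotation)
recon_flip <- flipped$x %*% t(flipped$rotation)

# The reconstruction and the per-component variance should be unchanged
same_recon <- isTRUE(all.equal(recon_orig, recon_flip))
same_var <- isTRUE(all.equal(apply(pca$x, 2, var), apply(flipped$x, 2, var)))
print(c(same_reconstruction = same_recon, same_variance = same_var))
```

Flipping only one of the two elements would break the first check, which is why loadings and scores are negated together.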
library(devtools)
install_github("vqv/ggbiplot")
## Downloading GitHub repo vqv/ggbiplot@master
## from URL https://api.github.com/repos/vqv/ggbiplot/zipball/master
## Installing ggbiplot
## "C:/Users/supre/Desktop/R-34~1.3/bin/x64/R" --no-site-file --no-environ \
## --no-save --no-restore --quiet CMD INSTALL \
## "C:/Users/supre/AppData/Local/Temp/Rtmpq0Gxtu/devtools5fa02fc81b32/vqv-ggbiplot-7325e88" \
## --library="C:/Users/supre/Desktop/R-3.4.3/library" --install-tests
##
library(ggbiplot)
## Loading required package: ggplot2
## Loading required package: plyr
## Loading required package: scales
## Loading required package: grid
P.out <- ggbiplot(pca.out, obs.scale = 1, var.scale = 1,
                  labels = row.names(USArrests), force = TRUE,
                  ellipse = TRUE,
                  circle = TRUE)
P.out <- P.out + scale_color_discrete(name = '')
P.out <- P.out + theme(legend.direction = 'horizontal',
                       legend.position = 'top')
print(P.out)