R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:

data("USArrests")
summary(USArrests)
##      Murder          Assault         UrbanPop          Rape      
##  Min.   : 0.800   Min.   : 45.0   Min.   :32.00   Min.   : 7.30  
##  1st Qu.: 4.075   1st Qu.:109.0   1st Qu.:54.50   1st Qu.:15.07  
##  Median : 7.250   Median :159.0   Median :66.00   Median :20.10  
##  Mean   : 7.788   Mean   :170.8   Mean   :65.54   Mean   :21.23  
##  3rd Qu.:11.250   3rd Qu.:249.0   3rd Qu.:77.75   3rd Qu.:26.18  
##  Max.   :17.400   Max.   :337.0   Max.   :91.00   Max.   :46.00
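The summary shows that the four variables sit on very different scales (Assault ranges from 45 to 337, Murder only from 0.8 to 17.4). This is why prcomp() is called with scale. = TRUE below. As a small illustrative check (not part of the original analysis), compare the raw variances:

```r
data("USArrests")
# variance of each raw (unscaled) variable
raw.var <- apply(USArrests, 2, var)
print(round(raw.var, 2))
# without scaling, the highest-variance variable would dominate PC1
names(which.max(raw.var))  # "Assault"
```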
arrests.pca<-prcomp(USArrests,center = TRUE, scale. = TRUE)
names(arrests.pca)
## [1] "sdev"     "rotation" "center"   "scale"    "x"
print(arrests.pca)
## Standard deviations (1, .., p=4):
## [1] 1.5748783 0.9948694 0.5971291 0.4164494
## 
## Rotation (n x k) = (4 x 4):
##                 PC1        PC2        PC3         PC4
## Murder   -0.5358995  0.4181809 -0.3412327  0.64922780
## Assault  -0.5831836  0.1879856 -0.2681484 -0.74340748
## UrbanPop -0.2781909 -0.8728062 -0.3780158  0.13387773
## Rape     -0.5434321 -0.1673186  0.8177779  0.08902432
summary(arrests.pca)
## Importance of components:
##                           PC1    PC2     PC3     PC4
## Standard deviation     1.5749 0.9949 0.59713 0.41645
## Proportion of Variance 0.6201 0.2474 0.08914 0.04336
## Cumulative Proportion  0.6201 0.8675 0.95664 1.00000

From the summary, we can see that PC1 explains about 62% of the variance, PC2 about 24.7%, and so on. Usually, enough principal components to explain about 95% of the variance are kept for use in models; here the first three components together explain 95.7%. The summary also reports the cumulative proportion of variance explained by the components.
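The proportions in the summary come from the component standard deviations: each component's variance is sdev^2, divided by the total variance. A minimal sketch that recomputes them and finds how many components reach the 95% threshold mentioned above (the names pc.var, pve, cum.pve and n.keep are only illustrative):

```r
data("USArrests")
arrests.pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)
# variance explained by each component
pc.var <- arrests.pca$sdev^2
# proportion and cumulative proportion of the total variance
pve <- pc.var / sum(pc.var)
cum.pve <- cumsum(pve)
print(round(cum.pve, 4))
# smallest number of components whose cumulative proportion reaches 95%
n.keep <- which(cum.pve >= 0.95)[1]
n.keep  # 3 for USArrests
```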

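It also helps to plot the PCA results as scree plots. The ‘pcaCharts’ function defined below prints the proportion of variance explained by each component and draws four variants of the scree plot: proportion of variance, cumulative proportion, and the bar and line versions produced by screeplot().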

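pcaCharts <- function(x) {
    # variance of each component = squared standard deviation
    x.var <- x$sdev ^ 2
    # proportion of the total variance explained by each component
    x.pvar <- x.var/sum(x.var)
    print("proportions of variance:")
    print(x.pvar)
    
    # arrange the four plots in a 2 x 2 grid
    par(mfrow=c(2,2))
    plot(x.pvar,xlab="Principal component", ylab="Proportion of variance explained", ylim=c(0,1), type='b')
    plot(cumsum(x.pvar),xlab="Principal component", ylab="Cumulative Proportion of variance explained", ylim=c(0,1), type='b')
    # screeplot() draws the component variances as bars (default) or as a line
    screeplot(x)
    screeplot(x,type="l")
    # restore the default single-plot layout
    par(mfrow=c(1,1))
}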
pcaCharts(arrests.pca)
## [1] "proportions of variance:"
## [1] 0.62006039 0.24744129 0.08914080 0.04335752
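The scree plots are drawn from the component variances (sdev^2). When the data are standardized, these are the eigenvalues of the correlation matrix, and because each scaled variable has variance 1 they sum to the number of variables (4 here). A small check, offered as an illustration rather than part of the original analysis:

```r
data("USArrests")
arrests.pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)
# component variances, i.e. the heights shown by screeplot()
pc.var <- arrests.pca$sdev^2
# eigenvalues of the correlation matrix of the raw data
eig <- eigen(cor(USArrests))$values
all.equal(pc.var, eig)  # TRUE up to numerical tolerance
sum(pc.var)             # approximately ncol(USArrests), i.e. 4
```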

biplot(arrests.pca,scale = 0,cex=0.7)

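# PCA loadings are only defined up to sign; flip both the loadings and the
# scores so that the biplot is mirrored consistently
pca.out <- arrests.pca
pca.out$rotation <- -pca.out$rotation
pca.out$x <- -pca.out$x
biplot(pca.out, scale = 0, cex = 0.7)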
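Flipping signs is only safe when loadings and scores are flipped together. prcomp() computes the scores as the standardized data multiplied by the rotation matrix, so scores %*% t(rotation) gives back the standardized data. A quick sketch checking that the flipped object reproduces the same data (the object names flipped, z, recon.orig and recon.flip are illustrative):

```r
data("USArrests")
arrests.pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)
flipped <- arrests.pca
flipped$rotation <- -flipped$rotation
flipped$x <- -flipped$x
# the standardized data that prcomp() decomposed
z <- scale(USArrests)
# scores times transposed loadings recovers the standardized data
recon.orig <- arrests.pca$x %*% t(arrests.pca$rotation)
recon.flip <- flipped$x %*% t(flipped$rotation)
max(abs(recon.orig - z))                   # essentially zero
isTRUE(all.equal(recon.orig, recon.flip))  # both sign conventions agree
# flipping only the loadings would instead reconstruct -z
max(abs(arrests.pca$x %*% t(flipped$rotation) + z))  # essentially zero
```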

pca.out$rotation[, 1:2]
##                PC1        PC2
## Murder   0.5358995 -0.4181809
## Assault  0.5831836 -0.1879856
## UrbanPop 0.2781909  0.8728062
## Rape     0.5434321  0.1673186
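After the flip, PC1 loads roughly equally on Murder, Assault and Rape (about 0.54 to 0.58) and less on UrbanPop (0.28), while PC2 is dominated by UrbanPop (0.87). One way to summarize this is to report, for each component, the variable with the largest absolute loading. A small sketch (the helper top.var is hypothetical, introduced only for illustration):

```r
data("USArrests")
pca.out <- prcomp(USArrests, center = TRUE, scale. = TRUE)
pca.out$rotation <- -pca.out$rotation
pca.out$x <- -pca.out$x
# hypothetical helper: name of the variable with the largest absolute loading
top.var <- function(loadings) {
  rownames(loadings)[apply(abs(loadings), 2, which.max)]
}
top <- top.var(pca.out$rotation[, 1:2])
names(top) <- colnames(pca.out$rotation)[1:2]
top  # PC1 "Assault", PC2 "UrbanPop"
```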
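# install ggbiplot from GitHub only if it is not already available
if (!requireNamespace("ggbiplot", quietly = TRUE)) devtools::install_github("vqv/ggbiplot")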
library(ggbiplot)
## Loading required package: ggplot2
## Loading required package: plyr
## Loading required package: scales
## Loading required package: grid
P.out <- ggbiplot(pca.out, obs.scale = 1, var.scale = 1,
                  labels = row.names(USArrests), force = TRUE,
                  ellipse = TRUE, circle = TRUE)
P.out <- P.out + scale_color_discrete(name = '')
P.out <- P.out + theme(legend.direction = 'horizontal',
                       legend.position = 'top')
print(P.out)
