PCA (Principal Components Analysis) is easy in R, but the standard biplot() function is a little clunky. The vegan package can do PCA using the rda() function (normally for redundancy analysis) and has some nice plotting functions. (Note that ggplot is also developing biplot tools).

Fisher’s Irises

data("iris")
summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 

Visualize variables in 2D

pairs(iris,
      lower.panel = NULL, 
      col = as.numeric(iris$Species))

Note: can also be done in ggplot using GGalley::ggpairs; however, not updated for most recent version of R.



Do PCA with base R function

Run PCA

  • NOTE: there is also a similar function to princomp(), prcomp()
#"[,-5]" drops that last columns
##the species code
iris.pca <- princomp(iris[,-5])

Plot Biplot in base graphics

biplot(iris.pca)

There are is also ggplot biplot code somewhere outthere on the internet.

Do PCA with vegan::rda

run a vegan PCA with rda()

library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.4-0
my.rda <- rda(iris[,-5])


#"[,-5]" drops that last columns
##the species code



Plot the RDA

Note to get help when doing and RDA biplot, call ?biplot.rda; however, use biplot() for the actual plotting.

Default biplot of RDA output

biplot(my.rda)

Note that vegan’s scaling is different than a normal biplot.



biplot.rda options

  • vegan is package for ecological analysis and so its functions are oriented towards particular kinds of analyses
  • the “display” arguement is not in the normal biplot comman used with princomp
    • “display = ‘sites’” corresonds to the rows of data; one datapoint per row
    • “display = ‘species’” corresponds to the columns of data; one line/arrow per column
  • “type” is an argument also not in biplots originating from princomp
    • “type = ‘text’” plots row numbers and column names as labels
    • “type = points’” plots points
biplot(my.rda,
       display = c("sites", 
                   "species"),
       type = c("text",
                "points"))

vegan’s ordination plotting functions

Add “hulls” around each species

#make basic plot
biplot(my.rda,
       display = c("sites", 
                   "species"),
       type = c("text",
                "points"))

#Add "hulls"
ordihull(my.rda,
         group = iris$Species)

Make plot pretty

#make basic plot
biplot(my.rda,
       display = c("sites", 
                   "species"),
       type = c("text",
                "points"))

#get names of species
## levels() extracts fator levels
## from the column
spp.names <- levels(iris$Species)

#Add hulls
ordihull(my.rda,
         group = iris$Species,
         col = c(1,2,3))


#Add legend
legend("topright",
       col = c(1,2,3), 
       lty = 1,
       legend = spp.names)