Only do this once, then comment out of the script. You probably already did this in the previous Code Checkpoint.
#install.packages("ggplot2")
#install.package("vegan")
library(ggplot2)
library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.5-6
The mammals dataset is a classic dataset in the MASS package. msleep is an updated version of the data that includes more numeric data (e.g. hours of sleep) and categorical data (e.g. if a species is endangered)
data(msleep)
msleep has a mixture of numeric and cateogrical varibles. We want only the numeric data and a few key labels, which I’ll also have to set up a special way. Don’t worry about the details
First, all the columns we may want
msleep_num <- msleep[,c("sleep_total","sleep_rem",
"sleep_cycle","awake","brainwt" ,
"bodywt","vore","order")]
All data should be continuous numeric that can take on decimal values, except the last 2
summary(msleep_num[,-c(7:8)])
## sleep_total sleep_rem sleep_cycle awake
## Min. : 1.90 Min. :0.100 Min. :0.1167 Min. : 4.10
## 1st Qu.: 7.85 1st Qu.:0.900 1st Qu.:0.1833 1st Qu.:10.25
## Median :10.10 Median :1.500 Median :0.3333 Median :13.90
## Mean :10.43 Mean :1.875 Mean :0.4396 Mean :13.57
## 3rd Qu.:13.75 3rd Qu.:2.400 3rd Qu.:0.5792 3rd Qu.:16.15
## Max. :19.90 Max. :6.600 Max. :1.5000 Max. :22.10
## NA's :22 NA's :51
## brainwt bodywt
## Min. :0.00014 Min. : 0.005
## 1st Qu.:0.00290 1st Qu.: 0.174
## Median :0.01240 Median : 1.670
## Mean :0.28158 Mean : 166.136
## 3rd Qu.:0.12550 3rd Qu.: 41.750
## Max. :5.71200 Max. :6654.000
## NA's :27
We can look at these data with a scatterplot matrix
plot(msleep_num[,-c(7:8)])
Convert to dataframe
msleep_num <- data.frame(msleep_num)
Remove missing values
msleep_num <- na.omit(msleep_num)
Principal component analysis is typically done using base R functions.
Run the PCA
pca.out <- prcomp(msleep_num[,-c(7,8)], scale = TRUE)
Plot the output
biplot(pca.out)
The base R PCA output isn’t very flexible. The R package vegan is has a function rda() which does PCA and has many more nice features.
For another example see https://rpubs.com/brouwern/veganpca
Run the PCA using rda()
rda.out <- vegan::rda(msleep_num[,-c(7,8)], scale = TRUE)
Extract what are known as the PCA score - don’t worry about what this means
rda_scores <- scores(rda.out)
This displays the 2D PCA plot without the arrows. For more info on what this code does, see the RPubs document linked above
biplot(rda.out, display = "sites")
vegan has some nice tools for groups things.
In this dataset I don’t expect there to be any interest groups, but I’ll check anyway. I will supply this code if needed.
PCA is an exploratory method. First I’ll see if there are any groupsing based on diet (“vore”). Not really
biplot(rda.out, display = "sites")
vegan::ordihull(rda.out,
group = msleep_num$vore,
col = 1:7,
lty = 1:7,
lwd = c(3,6))
Copy the prevous code chunk and change the group so that the taxonomic order is plotted. Upload the image to this assignment. Consider if there are any meaninginful groups.
biplot(rda.out, display = "sites")
vegan::ordihull(rda.out,
group = msleep_num$order,
col = 1:7,
lty = 1:7,
lwd = c(3,6))