Only do this once, then comment out of the script. You may have already done this this for a previous assignment.
#install.packages("vegan")
library(ggplot2)
library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.6-4
The mammals dataset is a classic dataset in the MASS package. msleep is an updated version of the data in the ggplot2 package that includes more numeric data (e.g. hours of sleep) and categorical data (e.g. if a species is endangered).
#make sure you've loaded ggplot2 w/ library(ggplot2_
data(msleep)
msleep has a mixture of numeric and categorical variables. We want only the numeric data and a few key labels, which I’ll also have to set up a special way. Don’t worry about the details.
First, we will subset all the columns we may want:
msleep_num <- msleep[,c("sleep_total","sleep_rem",
"sleep_cycle","awake","brainwt" ,
"bodywt",
"vore","order")]
All data should be continuous numeric that can take on decimal values, except the last 2
summary(msleep_num[,-c(7:8)])
## sleep_total sleep_rem sleep_cycle awake
## Min. : 1.90 Min. :0.100 Min. :0.1167 Min. : 4.10
## 1st Qu.: 7.85 1st Qu.:0.900 1st Qu.:0.1833 1st Qu.:10.25
## Median :10.10 Median :1.500 Median :0.3333 Median :13.90
## Mean :10.43 Mean :1.875 Mean :0.4396 Mean :13.57
## 3rd Qu.:13.75 3rd Qu.:2.400 3rd Qu.:0.5792 3rd Qu.:16.15
## Max. :19.90 Max. :6.600 Max. :1.5000 Max. :22.10
## NA's :22 NA's :51
## brainwt bodywt
## Min. :0.00014 Min. : 0.005
## 1st Qu.:0.00290 1st Qu.: 0.174
## Median :0.01240 Median : 1.670
## Mean :0.28158 Mean : 166.136
## 3rd Qu.:0.12550 3rd Qu.: 41.750
## Max. :5.71200 Max. :6654.000
## NA's :27
We can look at these data with a scatterplot matrix
# add pairs()
pairs(msleep_num[,-c(7:8)])
The data are in tibble format (tbl_df)
is(msleep_num)
## [1] "tbl_df" "tbl" "data.frame" "list" "oldClass"
## [6] "vector"
Convert to dataframe because I don’t like tibbles!
# add data.frame()
data.frame(msleep_num <- (msleep_num))
## sleep_total sleep_rem sleep_cycle awake brainwt bodywt vore
## 1 12.1 NA NA 11.90 NA 50.000 carni
## 2 17.0 1.8 NA 7.00 0.01550 0.480 omni
## 3 14.4 2.4 NA 9.60 NA 1.350 herbi
## 4 14.9 2.3 0.1333333 9.10 0.00029 0.019 omni
## 5 4.0 0.7 0.6666667 20.00 0.42300 600.000 herbi
## 6 14.4 2.2 0.7666667 9.60 NA 3.850 herbi
## 7 8.7 1.4 0.3833333 15.30 NA 20.490 carni
## 8 7.0 NA NA 17.00 NA 0.045 <NA>
## 9 10.1 2.9 0.3333333 13.90 0.07000 14.000 carni
## 10 3.0 NA NA 21.00 0.09820 14.800 herbi
## 11 5.3 0.6 NA 18.70 0.11500 33.500 herbi
## 12 9.4 0.8 0.2166667 14.60 0.00550 0.728 herbi
## 13 10.0 0.7 NA 14.00 NA 4.750 omni
## 14 12.5 1.5 0.1166667 11.50 0.00640 0.420 herbi
## 15 10.3 2.2 NA 13.70 0.00100 0.060 omni
## 16 8.3 2.0 NA 15.70 0.00660 1.000 omni
## 17 9.1 1.4 0.1500000 14.90 0.00014 0.005 omni
## 18 17.4 3.1 0.3833333 6.60 0.01080 3.500 carni
## 19 5.3 0.5 NA 18.70 0.01230 2.950 herbi
## 20 18.0 4.9 0.3333333 6.00 0.00630 1.700 omni
## 21 3.9 NA NA 20.10 4.60300 2547.000 herbi
## 22 19.7 3.9 0.1166667 4.30 0.00030 0.023 insecti
## 23 2.9 0.6 1.0000000 21.10 0.65500 521.000 herbi
## 24 3.1 0.4 NA 20.90 0.41900 187.000 herbi
## 25 10.1 3.5 0.2833333 13.90 0.00350 0.770 omni
## 26 10.9 1.1 NA 13.10 0.11500 10.000 omni
## 27 14.9 NA NA 9.10 NA 0.071 herbi
## 28 12.5 3.2 0.4166667 11.50 0.02560 3.300 carni
## 29 9.8 1.1 0.5500000 14.20 0.00500 0.200 omni
## 30 1.9 0.4 NA 22.10 NA 899.995 herbi
## 31 2.7 0.1 NA 21.35 NA 800.000 carni
## 32 6.2 1.5 NA 17.80 0.32500 85.000 carni
## 33 6.3 0.6 NA 17.70 0.01227 2.625 herbi
## 34 8.0 1.9 1.5000000 16.00 1.32000 62.000 omni
## 35 9.5 0.9 NA 14.50 NA 1.670 herbi
## 36 3.3 NA NA 20.70 5.71200 6654.000 herbi
## 37 19.4 6.6 NA 4.60 NA 0.370 carni
## 38 10.1 1.2 0.7500000 13.90 0.17900 6.800 omni
## 39 14.2 1.9 NA 9.80 NA 0.053 herbi
## 40 14.3 3.1 0.2000000 9.70 0.00100 0.120 herbi
## 41 12.8 NA NA 11.20 NA 0.035 herbi
## 42 12.5 1.4 0.1833333 11.50 0.00040 0.022 herbi
## 43 19.9 2.0 0.2000000 4.10 0.00025 0.010 insecti
## 44 14.6 NA NA 9.40 NA 0.266 herbi
## 45 11.0 NA NA 13.00 0.01250 1.400 carni
## 46 7.7 0.9 NA 16.30 NA 0.210 herbi
## 47 14.5 NA NA 9.50 NA 0.028 carni
## 48 8.4 0.9 0.4166667 15.60 0.01210 2.500 herbi
## 49 3.8 0.6 NA 20.20 0.17500 55.500 herbi
## 50 9.7 1.4 1.4166667 14.30 0.44000 52.200 omni
## 51 15.8 NA NA 8.20 NA 162.564 carni
## 52 10.4 NA NA 13.60 0.15700 100.000 carni
## 53 13.5 NA NA 10.50 NA 161.499 carni
## 54 9.4 1.0 0.6666667 14.60 0.18000 25.235 omni
## 55 10.3 2.7 NA 13.70 0.00240 0.550 <NA>
## 56 11.0 NA NA 13.00 NA 1.100 omni
## 57 11.5 NA NA 12.50 NA 0.021 <NA>
## 58 13.7 1.8 NA 10.30 0.01140 1.620 <NA>
## 59 3.5 0.4 NA 20.50 NA 86.000 carni
## 60 5.6 NA NA 18.45 NA 53.180 carni
## 61 11.1 1.5 NA 12.90 NA 1.100 herbi
## 62 18.1 6.1 NA 5.90 0.08100 60.000 insecti
## 63 5.4 0.5 NA 18.60 0.02100 3.600 <NA>
## 64 13.0 2.4 0.1833333 11.00 0.00190 0.320 herbi
## 65 8.7 NA NA 15.30 NA 0.044 omni
## 66 9.6 1.4 NA 14.40 0.02000 0.743 omni
## 67 8.4 2.1 0.1666667 15.60 0.00120 0.075 insecti
## 68 11.3 1.1 0.1500000 12.70 0.00118 0.148 herbi
## 69 10.6 2.4 NA 13.40 0.00300 0.122 <NA>
## 70 16.6 NA NA 7.40 0.00570 0.920 herbi
## 71 13.8 3.4 0.2166667 10.20 0.00400 0.101 herbi
## 72 15.9 3.0 NA 8.10 NA 0.205 herbi
## 73 12.8 2.0 0.1833333 11.20 0.00033 0.048 <NA>
## 74 9.1 2.4 0.5000000 14.90 0.18000 86.250 omni
## 75 8.6 NA NA 15.40 0.02500 4.500 insecti
## 76 15.8 NA NA 8.20 NA 0.112 herbi
## 77 4.4 1.0 0.9000000 19.60 0.16900 207.501 herbi
## 78 15.6 2.3 NA 8.40 0.00260 0.900 omni
## 79 8.9 2.6 0.2333333 15.10 0.00250 0.104 omni
## 80 5.2 NA NA 18.80 NA 173.330 carni
## 81 6.3 1.3 NA 17.70 0.01750 2.000 carni
## 82 12.5 NA NA 11.50 0.04450 3.380 carni
## 83 9.8 2.4 0.3500000 14.20 0.05040 4.230 carni
## order
## 1 Carnivora
## 2 Primates
## 3 Rodentia
## 4 Soricomorpha
## 5 Artiodactyla
## 6 Pilosa
## 7 Carnivora
## 8 Rodentia
## 9 Carnivora
## 10 Artiodactyla
## 11 Artiodactyla
## 12 Rodentia
## 13 Primates
## 14 Rodentia
## 15 Soricomorpha
## 16 Rodentia
## 17 Soricomorpha
## 18 Cingulata
## 19 Hyracoidea
## 20 Didelphimorphia
## 21 Proboscidea
## 22 Chiroptera
## 23 Perissodactyla
## 24 Perissodactyla
## 25 Erinaceomorpha
## 26 Primates
## 27 Rodentia
## 28 Carnivora
## 29 Primates
## 30 Artiodactyla
## 31 Cetacea
## 32 Carnivora
## 33 Hyracoidea
## 34 Primates
## 35 Primates
## 36 Proboscidea
## 37 Didelphimorphia
## 38 Primates
## 39 Rodentia
## 40 Rodentia
## 41 Rodentia
## 42 Rodentia
## 43 Chiroptera
## 44 Rodentia
## 45 Primates
## 46 Rodentia
## 47 Rodentia
## 48 Lagomorpha
## 49 Artiodactyla
## 50 Primates
## 51 Carnivora
## 52 Carnivora
## 53 Carnivora
## 54 Primates
## 55 Erinaceomorpha
## 56 Primates
## 57 Rodentia
## 58 Diprotodontia
## 59 Carnivora
## 60 Cetacea
## 61 Diprotodontia
## 62 Cingulata
## 63 Hyracoidea
## 64 Rodentia
## 65 Rodentia
## 66 Primates
## 67 Soricomorpha
## 68 Rodentia
## 69 Rodentia
## 70 Rodentia
## 71 Rodentia
## 72 Rodentia
## 73 Soricomorpha
## 74 Artiodactyla
## 75 Monotremata
## 76 Rodentia
## 77 Perissodactyla
## 78 Afrosoricida
## 79 Scandentia
## 80 Cetacea
## 81 Carnivora
## 82 Carnivora
## 83 Carnivora
Remove missing values with na.omit()
#add na.omit()
msleep_num <- na.omit(msleep_num)
Principal component analysis is typically done using base R functions using the prcomp() function.
Run the PCA.
# add prcomp()
pca.out <- prcomp(msleep_num[,-c(7,8)], scale = TRUE)
Plot the biplot with biplot()
# add biplot()
biplot(pca.out)
The base R PCA output isn’t very flexible. The R package vegan is has a function rda() which does PCA and has many more nice features.
For another example see https://rpubs.com/brouwern/veganpca
Run the PCA using vegan::rda()
rda.out <- vegan::rda(msleep_num[,-c(7,8)], scale = TRUE)
Extract what are known as the PCA scores using the scores() function.
# add scores()
rda_scores <- scores(rda.out)
vegan let’s you plot a scatterplot of the scores or a full biplot with the arrows (vectors). The displays the 2D scatterplot plot without the arrows. For more info on what this code does, see the RPubs document linked above.
# add biplot()
biplot(rda.out,
display = "sites")
vegan has some nice tools for grouping things.
In this dataset I don’t expect there to be any interesting groups, but I’ll check anyway.
PCA is an exploratory method. First I’ll see if there are any groupings based on diet (“vore”). Not really.
#make the plot with biplot()
# add biplot()
biplot(rda.out, display = "sites")
#add "hulls" with the ordihull()function
# add ordihull()
ordihull(rda.out,
group = msleep_num$vore,
col = 1:7,
lty = 1:7,
lwd = c(3,6))
Copy the previous code chunk and change the group so that the taxonomic order is plotted (column “order”). Upload the image to this assignment. Consider if there are any meaningful groups.
## add your code here
#make the plot with biplot()
# add biplot()
biplot(rda.out, display = "sites")
#add "hulls" with the ordihull()function
# add ordihull()
ordihull(rda.out,
group = msleep_num$order,
col = 1:7,
lty = 1:7,
lwd = c(3,6))