I will show how to conduct PCA with princomp(), a function in base R (part of the stats package). prcomp() is another base R function that is also used for PCA.
We’ll use the “palmerpenguins” package (https://allisonhorst.github.io/palmerpenguins/) to address this question. If you have not done so before, install the package with install.packages("palmerpenguins"); then call library(palmerpenguins) and load the data with data(penguins).
#install.packages("palmerpenguins")
library(palmerpenguins)
data(penguins)
penguins
## # A tibble: 344 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Torgersen 39.1 18.7 181 3750
## 2 Adelie Torgersen 39.5 17.4 186 3800
## 3 Adelie Torgersen 40.3 18 195 3250
## 4 Adelie Torgersen NA NA NA NA
## 5 Adelie Torgersen 36.7 19.3 193 3450
## 6 Adelie Torgersen 39.3 20.6 190 3650
## 7 Adelie Torgersen 38.9 17.8 181 3625
## 8 Adelie Torgersen 39.2 19.6 195 4675
## 9 Adelie Torgersen 34.1 18.1 193 3475
## 10 Adelie Torgersen 42 20.2 190 4250
## # … with 334 more rows, and 2 more variables: sex <fct>, year <int>
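Here I’m keeping only the four numeric measurement columns of penguins. I first select them together with year and sex, then drop those two by position (columns 5 and 6) so that only the measurements remain.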
penguins.numeric <- penguins[,c("bill_length_mm","bill_depth_mm","flipper_length_mm","body_mass_g","year","sex")]
penguins.numeric.mod <- penguins.numeric[,-c(5,6)]
summary(penguins.numeric.mod)
## bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## Min. :32.10 Min. :13.10 Min. :172.0 Min. :2700
## 1st Qu.:39.23 1st Qu.:15.60 1st Qu.:190.0 1st Qu.:3550
## Median :44.45 Median :17.30 Median :197.0 Median :4050
## Mean :43.92 Mean :17.15 Mean :200.9 Mean :4202
## 3rd Qu.:48.50 3rd Qu.:18.70 3rd Qu.:213.0 3rd Qu.:4750
## Max. :59.60 Max. :21.50 Max. :231.0 Max. :6300
## NA's :2 NA's :2 NA's :2 NA's :2
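The column selection above is done by name and position. As an alternative, here is a minimal sketch that picks numeric columns programmatically, using a three-row data frame built from values in the penguins printout (the names `toy`, `is.num`, and `toy.numeric` are illustrative and not part of the original analysis).

```r
# Toy data: a factor column plus three numeric columns, with values taken
# from the first rows of the penguins printout.
toy <- data.frame(
  species           = factor(c("Adelie", "Adelie", "Adelie")),
  bill_length_mm    = c(39.1, 39.5, 40.3),
  flipper_length_mm = c(181L, 186L, 195L),
  body_mass_g       = c(3750L, 3800L, 3250L)
)

# Flag every column that is numeric (double or integer); factors are not.
is.num <- vapply(toy, is.numeric, logical(1))

# Keep only the flagged columns; drop = FALSE preserves the data frame.
toy.numeric <- toy[, is.num, drop = FALSE]
print(names(toy.numeric))
```

On the full penguins data this approach would also keep year, because it is stored as an integer, so it would still have to be dropped by hand.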
Now, I’m using plot() to produce a scatterplot matrix of the four measurements.
plot(penguins.numeric.mod)
Next, I convert the tibble into a plain data frame and use na.omit() to drop every row that contains an “NA”.
penguins.numeric.mod <- data.frame(penguins.numeric.mod)
penguins.numeric.mod <- na.omit(penguins.numeric.mod)
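To make the effect of na.omit() concrete, here is a minimal sketch on a four-row data frame built from values in the penguins printout above (the names `toy`, `toy.complete`, and `dropped.rows` are illustrative, not part of the original analysis). It shows that na.omit() removes whole rows rather than individual values, and that it records which rows were dropped.

```r
# Toy data: the third row is missing in both columns, like row 4 of penguins.
toy <- data.frame(
  bill_length_mm = c(39.1, 39.5, NA, 36.7),
  body_mass_g    = c(3750, 3800, NA, 3450)
)

# na.omit() drops every row that contains at least one NA.
toy.complete <- na.omit(toy)

# The indices of the dropped rows are stored in the "na.action" attribute.
dropped.rows <- as.integer(attr(toy.complete, "na.action"))

print(nrow(toy) - nrow(toy.complete))  # number of rows removed
print(dropped.rows)                    # which rows were removed
```

Checking the number of dropped rows before running the PCA is a cheap way to confirm that only the expected observations were removed.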
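Using princomp(): prcomp() and princomp() differ in how scaling is requested. prcomp() takes a scale. argument, which is FALSE by default, whereas princomp() has no scale argument at all. Passing scale = TRUE to princomp() only triggers a warning that the extra argument will be disregarded, and the PCA is then run on the unscaled covariance matrix, which here would be dominated by body_mass_g because of its much larger variance. To run the PCA on standardized variables with princomp(), I set cor = TRUE, which uses the correlation matrix instead. Both functions center the data by default.
pca.penguins <- princomp(penguins.numeric.mod, cor = TRUE)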
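As a quick check of how the two functions relate, the sketch below runs princomp() with cor = TRUE and prcomp() with scale. = TRUE on the same data and compares the results. It uses the built-in iris measurements as a stand-in (an assumption, made so the code runs without extra packages), and the object names are illustrative rather than taken from the analysis above.

```r
# Stand-in data: four numeric measurement columns from the built-in iris data.
x <- iris[, 1:4]

# princomp() standardizes by working on the correlation matrix when cor = TRUE.
pca.princomp <- princomp(x, cor = TRUE)

# prcomp() standardizes when scale. = TRUE (note the trailing dot).
pca.prcomp <- prcomp(x, center = TRUE, scale. = TRUE)

# Compare the component standard deviations of the two fits.
sdev.match <- isTRUE(all.equal(unname(pca.princomp$sdev),
                               unname(pca.prcomp$sdev)))

# Compare the loadings, ignoring the arbitrary sign of each component.
loadings.match <- isTRUE(all.equal(
  unname(abs(unclass(pca.princomp$loadings))),
  unname(abs(pca.prcomp$rotation))
))

print(c(sdev.match = sdev.match, loadings.match = loadings.match))
```

The scores from the two functions can still differ by a small constant factor, because princomp() computes variances with divisor n while prcomp() uses n - 1.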
## Displays the PCA
biplot(pca.penguins)
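Alongside the biplot, it is often useful to see how much of the total variance each component accounts for. Here is a minimal sketch, again using the built-in iris measurements as a stand-in (an assumption, made so the code runs without extra packages); the names `pca`, `prop.var`, and `cum.var` are illustrative.

```r
# Stand-in data: four numeric measurement columns from the built-in iris data.
x <- iris[, 1:4]

# PCA on the correlation matrix, i.e. on standardized variables.
pca <- princomp(x, cor = TRUE)

# The variance of each component is its squared standard deviation.
variances <- pca$sdev^2

# Proportion and cumulative proportion of the total variance.
prop.var <- variances / sum(variances)
cum.var  <- cumsum(prop.var)

print(round(rbind(proportion = prop.var, cumulative = cum.var), 3))
```

The same proportions are reported by summary(pca), and screeplot(pca) draws the component variances as a bar chart.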
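For comparison, the same kind of analysis can be run with rda() from the vegan package (which needs to be installed). Called without constraints, rda() performs a PCA, and scale = TRUE standardizes the variables.
rda.out <- vegan::rda(penguins.numeric.mod, scale = TRUE)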
biplot(rda.out, display = "sites")
For more information on this topic, see https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/princomp