Introduction to Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a statistical technique used to reduce the dimensionality of a dataset while preserving as much variability as possible. PCA achieves this by transforming the original variables into a new set of uncorrelated variables called principal components. Each principal component is a linear combination of the original variables and represents a portion of the total variance in the dataset.

The first principal component captures the maximum variance. Each subsequent component captures the largest share of the remaining variance while staying uncorrelated with the components before it. PCA is widely used in data visualization, noise reduction, and feature extraction.
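As a minimal sketch of the idea, the R code below runs PCA on synthetic data by eigendecomposing the sample covariance matrix. The data, the variable names (`x`, `scores`, `comp_var`), and the seed are illustrative assumptions and are not part of the turtle analysis.

```r
# Synthetic data: three correlated variables (illustrative only)
set.seed(1)
n <- 200
size <- rnorm(n, sd = 10)
x <- cbind(
  a = size + rnorm(n),
  b = 0.6 * size + rnorm(n),
  c = 0.4 * size + rnorm(n)
)

# Centre each column, then eigendecompose the sample covariance matrix
xc <- scale(x, center = TRUE, scale = FALSE)
eig <- eigen(cov(xc))

# Columns of eig$vectors are the loadings; scores are the projected data
scores <- xc %*% eig$vectors

# Component variances equal the eigenvalues, which eigen() returns
# in decreasing order
comp_var <- apply(scores, 2, var)

# Off-diagonal covariances of the scores are zero up to rounding error,
# which is what "uncorrelated components" means in practice
max_offdiag <- max(abs(cov(scores)[upper.tri(diag(3))]))
```

Inspecting `comp_var` and `max_offdiag` shows the two properties described above: variances decrease from one component to the next, and the components are mutually uncorrelated.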

Turtle Data and PCA Analysis

We used a dataset of turtle shells that includes measurements of length, width, and height. The dataset is divided into two groups: female and male turtles. The goal of applying PCA to both groups is to understand the primary directions of variability in the shell measurements.

Here is the symmetric turtle shell image used in the analysis:

## # A tibble: 1 × 7
##   format width height colorspace matte filesize density
##   <chr>  <int>  <int> <chr>      <lgl>    <int> <chr>  
## 1 PNG      250    200 sRGB       TRUE         0 118x118

PCA Analysis Results

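PCA was performed separately on the female and male turtle data. The tables below summarize the importance of each principal component for each group: its standard deviation, the proportion of total variance it explains, and the cumulative proportion. The components are derived from the original variables (length, width, and height).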
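The tables in this section have the layout that `summary()` prints for a `princomp` object. The sketch below shows one way such a per-group analysis could be set up. The data frame `turtles`, its column names, the helper `make_group`, and all values are hypothetical stand-ins, since the original data and script are not reproduced here. Note that `princomp` works on the covariance matrix by default (`cor = FALSE`).

```r
# Hypothetical stand-in for the turtle data; the values are synthetic
set.seed(42)
make_group <- function(n, mean_length) {
  len <- rnorm(n, mean = mean_length, sd = 15)
  data.frame(
    length = len,
    width  = 0.6 * len + rnorm(n, sd = 2),
    height = 0.4 * len + rnorm(n, sd = 1.5)
  )
}
turtles <- rbind(
  cbind(sex = "female", make_group(24, 135)),
  cbind(sex = "male",   make_group(24, 115))
)

# Split by sex and fit a covariance-based PCA to each group separately
measurements <- c("length", "width", "height")
fits <- lapply(split(turtles[measurements], turtles$sex), princomp)

# Print the "Importance of components" table for each group
for (group in names(fits)) {
  cat(group, "\n")
  print(summary(fits[[group]]))
}
```

Because the data here are synthetic, the printed numbers will not match the reported results. Only the structure of the output is comparable.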

Female Turtles PCA

## Importance of components:
##                            Comp.1      Comp.2      Comp.3
## Standard deviation     25.4970668 2.547081962 1.653745717
## Proportion of Variance  0.9860122 0.009839832 0.004148005
## Cumulative Proportion   0.9860122 0.995851995 1.000000000

Male Turtles PCA

## Importance of components:
##                           Comp.1     Comp.2      Comp.3
## Standard deviation     13.679846 1.88012597 1.028513282
## Proportion of Variance  0.976046 0.01843664 0.005517314
## Cumulative Proportion   0.976046 0.99448269 1.000000000
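The "Proportion of Variance" rows follow directly from the standard deviations: each component's variance (its standard deviation squared) is divided by the sum of all component variances. The short check below recomputes the proportions from the reported standard deviations. The helper name `prop_var` is an illustrative choice.

```r
# Standard deviations copied from the tables above
sdev_female <- c(25.4970668, 2.547081962, 1.653745717)
sdev_male   <- c(13.679846, 1.88012597, 1.028513282)

# Proportion of variance = component variance / total variance
prop_var <- function(sdev) sdev^2 / sum(sdev^2)

prop_female <- prop_var(sdev_female)
prop_male   <- prop_var(sdev_male)

# Proportions for females, and cumulative proportions for males
print(round(prop_female, 4))
print(round(cumsum(prop_male), 4))
```

The recomputed values agree with the tables to the precision shown there, which confirms that the first component dominates in both groups.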

Scree plots
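A scree plot shows the variance of each component in decreasing order, which makes the drop after the first component easy to see. As a sketch, one could be drawn from the reported standard deviations with base R graphics. The panel layout and styling are assumptions and may differ from the original figures.

```r
# Component variances from the reported standard deviations
var_female <- c(25.4970668, 2.547081962, 1.653745717)^2
var_male   <- c(13.679846, 1.88012597, 1.028513282)^2

# One panel per group; type = "b" draws points joined by lines
op <- par(mfrow = c(1, 2))
plot(var_female, type = "b", xaxt = "n", xlab = "Component",
     ylab = "Variance", main = "Female turtles")
axis(1, at = 1:3, labels = paste0("Comp.", 1:3))
plot(var_male, type = "b", xaxt = "n", xlab = "Component",
     ylab = "Variance", main = "Male turtles")
axis(1, at = 1:3, labels = paste0("Comp.", 1:3))
par(op)  # restore the previous graphics settings
```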

Heatmap of Loadings for Turtles PCA
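The loadings are the coefficients of each principal component's linear combination of length, width, and height. The sketch below shows one way to extract them from a `princomp` fit and draw them as a heatmap with base R. The data frame `shell` and its values are synthetic stand-ins, and the colour palette is an assumption.

```r
# Synthetic measurements standing in for one group of turtles
set.seed(7)
len <- rnorm(30, mean = 120, sd = 15)
shell <- data.frame(
  length = len,
  width  = 0.6 * len + rnorm(30, sd = 2),
  height = 0.4 * len + rnorm(30, sd = 1.5)
)

fit <- princomp(shell)

# Loadings: rows are the original variables, columns are the components
L <- unclass(loadings(fit))

# image() maps matrix rows to the x axis, so variables go on x here
image(x = 1:3, y = 1:3, z = L, axes = FALSE,
      xlab = "Variable", ylab = "Component", col = hcl.colors(20))
axis(1, at = 1:3, labels = rownames(L))
axis(2, at = 1:3, labels = colnames(L))
```

Each column of `L` has unit length, so the cell colours can be compared across components.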

Visual Representation of PCA on Turtle Shells

The composite image below visually represents the PCA-transformed turtle shells. The transformations are applied separately for female and male turtles, with each row corresponding to the principal component transformations of one group. In the heatmap, brighter colors indicate higher values for the height variable, while darker colors indicate lower values.

## # A tibble: 1 × 7
##   format width height colorspace matte filesize density
##   <chr>  <int>  <int> <chr>      <lgl>    <int> <chr>  
## 1 PNG      499    360 sRGB       TRUE         0 118x118

Conclusion

PCA identifies the primary modes of variation in the shell dimensions of both female and male turtles. In both groups the first principal component accounts for nearly all of the variance: about 98.6% for females and 97.6% for males. The visual transformations based on PCA loadings provide an intuitive understanding of how each principal component influences the shape and appearance of the turtle shells.

Access the R Markdown Script

You can access the R Markdown (RMD) script used to generate this analysis on GitHub. Click the link below to view or download the source code: