Preface

This project aims to be as educational and detailed as possible to showcase my understanding of, and proficiency in, the R programming language, R Markdown and the topics at hand: data science (exploration, visualization, PCA…) and machine learning (through several high-level libraries).
Note that the R Markdown file outputs an .HTML document that has been uploaded to RPubs, where it can be properly visualized. Both files are also available in their dedicated GitHub repository.

The goal of this project is to perform a complete exploratory data analysis upon the Wisconsin Breast Cancer Dataset, describing and detailing each step of the process. The document and its content are heavily based upon the work of Miri Choi on Kaggle, although with a much larger scope, much more detail (thoroughly commenting each step of the process) and further development (using additional machine learning libraries and approaches).

1. Setup

1.1. Project libraries

As previously stated, this project uses the R programming language along with several libraries which, as is the norm in most non-basic R projects, complement R with additional (and specific) functions. The following code snippet installs each required library only if it is not already available: the built-in require() function tries to load a package and returns FALSE when it cannot, in which case install.packages() performs the installation. The libraries are presented in alphabetical order for the sake of convenience.

if(!require(ade4)) {
  install.packages("ade4")
}
if(!require(caret)) {
  install.packages("caret")
}
if(!require(C50)) {
  install.packages("C50")
}
if(!require(corrplot)) {
  install.packages("corrplot")
}
if(!require(data.table)) {
  install.packages("data.table")
}
if(!require(dplyr)) {
  install.packages("dplyr")
}
if(!require(ExPosition)) {
  install.packages("ExPosition")
}
if(!require(factoextra)) {
  install.packages("factoextra")
}
if(!require(FactoMineR)) {
  install.packages("FactoMineR")
}
if(!require(GGally)) {
  install.packages("GGally")
}
if(!require(ggplot2)) {
  install.packages("ggplot2")
}
if(!require(gridExtra)) {
  install.packages("gridExtra")
}
if(!require(highcharter)) {
  install.packages("highcharter")
}
if(!require(PerformanceAnalytics)) {
  install.packages("PerformanceAnalytics")
}
if(!require(PST)) {
  install.packages("PST")
}
if(!require(psych)) {
  install.packages("psych")
}
if(!require(RCurl)) {
  install.packages("RCurl")
}
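
The same install-if-missing logic can also be written more compactly by looping over a vector of package names. The following is only a sketch of that idea, not the approach used in the rest of this document: missing_packages() and install_missing() are hypothetical helper names introduced here, and requireNamespace() is used instead of require() so that checking for a package does not attach it.

```r
# Packages required by this project (same list as in the snippet above)
project_pkgs <- c("ade4", "caret", "C50", "corrplot", "data.table", "dplyr",
                  "ExPosition", "factoextra", "FactoMineR", "GGally", "ggplot2",
                  "gridExtra", "highcharter", "PerformanceAnalytics", "PST",
                  "psych", "RCurl")

# Hypothetical helper: returns the packages in `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: installs only what is missing
install_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# install_missing(project_pkgs)  # uncomment to install whatever is absent
```

Keeping the names in a single vector means that adding or removing a dependency is a one-line change.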

Installing a given package does not mean that the package (and its associated functions) is ready to be used. For that, it needs to be loaded into the R session, which is what the built-in library() function does (require() already attaches the packages it finds, but a package that has just been installed is not attached yet). The following code snippet makes use of library() to load all of the project’s required libraries (once again, in alphabetical order for the sake of convenience).

library(ade4)
library(caret)
library(C50)
library(corrplot)
library(data.table)
library(dplyr)
library(ExPosition)
library(factoextra)
library(FactoMineR)
library(GGally)
library(ggplot2)
library(gridExtra)
library(highcharter)
library(PerformanceAnalytics)
library(PST)
library(psych)
library(RCurl)

Note that some of these libraries (such as dplyr and ggplot2) are also included in the tidyverse package. However, I would rather understand the use case of each library than rely on library bundles.

1.2. Importing the dataset

The next step after installing and loading all of the required packages is to load the data itself. To do so, an optional variable (referred to as urlfile in the upcoming code snippet) can be created to hold the URL (a string) from which the data is read. The reading itself is done by the read.csv() function (or an equivalent function from an alternative library).

It is highly recommended to explore the .data file before importing its content into the work environment in order to check whether or not the data is preceded by a header. In this case it is not, so using the argument header = FALSE ensures the data is read properly.

More information about the read.csv() function, its behavior and its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/utils/versions/3.6.2/topics/read.table

The function head() prints the very first rows of the loaded dataset, which is useful to observe how the data has been interpreted by R.

urlfile = "https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data"
wbcd <- read.csv(urlfile, header = FALSE)
head(wbcd)
##         V1 V2    V3    V4     V5     V6      V7      V8     V9     V10    V11
## 1   842302  M 17.99 10.38 122.80 1001.0 0.11840 0.27760 0.3001 0.14710 0.2419
## 2   842517  M 20.57 17.77 132.90 1326.0 0.08474 0.07864 0.0869 0.07017 0.1812
## 3 84300903  M 19.69 21.25 130.00 1203.0 0.10960 0.15990 0.1974 0.12790 0.2069
## 4 84348301  M 11.42 20.38  77.58  386.1 0.14250 0.28390 0.2414 0.10520 0.2597
## 5 84358402  M 20.29 14.34 135.10 1297.0 0.10030 0.13280 0.1980 0.10430 0.1809
## 6   843786  M 12.45 15.70  82.57  477.1 0.12780 0.17000 0.1578 0.08089 0.2087
##       V12    V13    V14   V15    V16      V17     V18     V19     V20     V21
## 1 0.07871 1.0950 0.9053 8.589 153.40 0.006399 0.04904 0.05373 0.01587 0.03003
## 2 0.05667 0.5435 0.7339 3.398  74.08 0.005225 0.01308 0.01860 0.01340 0.01389
## 3 0.05999 0.7456 0.7869 4.585  94.03 0.006150 0.04006 0.03832 0.02058 0.02250
## 4 0.09744 0.4956 1.1560 3.445  27.23 0.009110 0.07458 0.05661 0.01867 0.05963
## 5 0.05883 0.7572 0.7813 5.438  94.44 0.011490 0.02461 0.05688 0.01885 0.01756
## 6 0.07613 0.3345 0.8902 2.217  27.19 0.007510 0.03345 0.03672 0.01137 0.02165
##        V22   V23   V24    V25    V26    V27    V28    V29    V30    V31     V32
## 1 0.006193 25.38 17.33 184.60 2019.0 0.1622 0.6656 0.7119 0.2654 0.4601 0.11890
## 2 0.003532 24.99 23.41 158.80 1956.0 0.1238 0.1866 0.2416 0.1860 0.2750 0.08902
## 3 0.004571 23.57 25.53 152.50 1709.0 0.1444 0.4245 0.4504 0.2430 0.3613 0.08758
## 4 0.009208 14.91 26.50  98.87  567.7 0.2098 0.8663 0.6869 0.2575 0.6638 0.17300
## 5 0.005115 22.54 16.67 152.20 1575.0 0.1374 0.2050 0.4000 0.1625 0.2364 0.07678
## 6 0.005082 15.47 23.75 103.40  741.6 0.1791 0.5249 0.5355 0.1741 0.3985 0.12440
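
The effect of the header argument can be checked on a small in-memory sample. The sketch below uses two truncated rows in the same layout as the .data file (the sample_text string is an assumption made for illustration, not the full file) and reads them both ways:

```r
# Two truncated records laid out like wdbc.data: id, diagnosis, two measurements
sample_text <- "842302,M,17.99,10.38\n842517,M,20.57,17.77"

with_header    <- read.csv(text = sample_text, header = TRUE)
without_header <- read.csv(text = sample_text, header = FALSE)

nrow(with_header)      # the first record was consumed as column names
nrow(without_header)   # both records are kept as data
names(without_header)  # default names V1, V2, ... are assigned
```

With header = TRUE the first observation is silently lost, which is why inspecting the raw file beforehand is worthwhile.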

Assigning a character vector of length “N” to the colnames() of a dataset with “N” columns makes the elements of the vector the dataset’s header. Since the lack of content-descriptive column headers in the Wisconsin Breast Cancer Dataset makes the data hard to understand, the following code snippet makes use of colnames() to properly name each of the columns.

More information about the colnames() function, its behavior and its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/row%2Bcolnames

colnames(wbcd) <- c("id","diagnosis","radius_mean","texture_mean","perimeter_mean","area_mean","smoothness_mean","compactness_mean","concavity_mean",
                    "concave points_mean","symmetry_mean","fractal_dimension_mean","radius_se","texture_se","perimeter_se","area_se","smoothness_se",
                    "compactness_se","concavity_se","concave points_se","symmetry_se","fractal_dimension_se","radius_worst","texture_worst","perimeter_worst",
                    "area_worst","smoothness_worst","compactness_worst","concavity_worst","concave points_worst","symmetry_worst","fractal_dimension_worst")
head(wbcd)
##         id diagnosis radius_mean texture_mean perimeter_mean area_mean
## 1   842302         M       17.99        10.38         122.80    1001.0
## 2   842517         M       20.57        17.77         132.90    1326.0
## 3 84300903         M       19.69        21.25         130.00    1203.0
## 4 84348301         M       11.42        20.38          77.58     386.1
## 5 84358402         M       20.29        14.34         135.10    1297.0
## 6   843786         M       12.45        15.70          82.57     477.1
##   smoothness_mean compactness_mean concavity_mean concave points_mean
## 1         0.11840          0.27760         0.3001             0.14710
## 2         0.08474          0.07864         0.0869             0.07017
## 3         0.10960          0.15990         0.1974             0.12790
## 4         0.14250          0.28390         0.2414             0.10520
## 5         0.10030          0.13280         0.1980             0.10430
## 6         0.12780          0.17000         0.1578             0.08089
##   symmetry_mean fractal_dimension_mean radius_se texture_se perimeter_se
## 1        0.2419                0.07871    1.0950     0.9053        8.589
## 2        0.1812                0.05667    0.5435     0.7339        3.398
## 3        0.2069                0.05999    0.7456     0.7869        4.585
## 4        0.2597                0.09744    0.4956     1.1560        3.445
## 5        0.1809                0.05883    0.7572     0.7813        5.438
## 6        0.2087                0.07613    0.3345     0.8902        2.217
##   area_se smoothness_se compactness_se concavity_se concave points_se
## 1  153.40      0.006399        0.04904      0.05373           0.01587
## 2   74.08      0.005225        0.01308      0.01860           0.01340
## 3   94.03      0.006150        0.04006      0.03832           0.02058
## 4   27.23      0.009110        0.07458      0.05661           0.01867
## 5   94.44      0.011490        0.02461      0.05688           0.01885
## 6   27.19      0.007510        0.03345      0.03672           0.01137
##   symmetry_se fractal_dimension_se radius_worst texture_worst perimeter_worst
## 1     0.03003             0.006193        25.38         17.33          184.60
## 2     0.01389             0.003532        24.99         23.41          158.80
## 3     0.02250             0.004571        23.57         25.53          152.50
## 4     0.05963             0.009208        14.91         26.50           98.87
## 5     0.01756             0.005115        22.54         16.67          152.20
## 6     0.02165             0.005082        15.47         23.75          103.40
##   area_worst smoothness_worst compactness_worst concavity_worst
## 1     2019.0           0.1622            0.6656          0.7119
## 2     1956.0           0.1238            0.1866          0.2416
## 3     1709.0           0.1444            0.4245          0.4504
## 4      567.7           0.2098            0.8663          0.6869
## 5     1575.0           0.1374            0.2050          0.4000
## 6      741.6           0.1791            0.5249          0.5355
##   concave points_worst symmetry_worst fractal_dimension_worst
## 1               0.2654         0.4601                 0.11890
## 2               0.1860         0.2750                 0.08902
## 3               0.2430         0.3613                 0.08758
## 4               0.2575         0.6638                 0.17300
## 5               0.1625         0.2364                 0.07678
## 6               0.1741         0.3985                 0.12440
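
Note that three of the assigned names ("concave points_mean", "concave points_se" and "concave points_worst") contain a space. Such names are valid, but they must be wrapped in backticks when used with the $ operator. The sketch below uses a made-up two-column data frame (toy) to show both how to access such a column and how the spaces could optionally be replaced with underscores; the rest of this document keeps the original names.

```r
# Made-up data frame with one column name that contains a space
toy <- data.frame(a = c(0.1471, 0.0702), b = c(1001, 1326))
colnames(toy) <- c("concave points_mean", "area_mean")

# Names with spaces need backticks with `$` (or a string with `[[`)
toy$`concave points_mean`
toy[["concave points_mean"]]

# Optional clean-up: replace every space with an underscore
colnames(toy) <- gsub(" ", "_", colnames(toy), fixed = TRUE)
colnames(toy)
toy$concave_points_mean
```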

After checking that the data is now properly referenced by the new header, it is time to examine the dataset through the str() and summary() functions. The class() function is redundant in this case, as str() already returns the class of the dataset alongside the class and first elements of each of its columns/variables (in other words, str() compactly displays the internal structure of an R object). However, class() is included in the upcoming code snippet due to its usefulness in other scenarios (and given that, as stated previously, this extensive analysis aims to be educational and informative).

The summary() function, on the other hand, is a generic function that summarizes whatever object it receives. For a data frame, it returns the minimum, maximum, mean and median along with the first and third quartiles of every numeric column/variable. For factor columns it returns the number of occurrences of each level; note that diagnosis is still a character column at this point, so only its length, class and mode are reported.

More information about these functions can be found in their associated RDocumentation pages.

class(wbcd)
## [1] "data.frame"
str(wbcd)
## 'data.frame':    569 obs. of  32 variables:
##  $ id                     : int  842302 842517 84300903 84348301 84358402 843786 844359 84458202 844981 84501001 ...
##  $ diagnosis              : chr  "M" "M" "M" "M" ...
##  $ radius_mean            : num  18 20.6 19.7 11.4 20.3 ...
##  $ texture_mean           : num  10.4 17.8 21.2 20.4 14.3 ...
##  $ perimeter_mean         : num  122.8 132.9 130 77.6 135.1 ...
##  $ area_mean              : num  1001 1326 1203 386 1297 ...
##  $ smoothness_mean        : num  0.1184 0.0847 0.1096 0.1425 0.1003 ...
##  $ compactness_mean       : num  0.2776 0.0786 0.1599 0.2839 0.1328 ...
##  $ concavity_mean         : num  0.3001 0.0869 0.1974 0.2414 0.198 ...
##  $ concave points_mean    : num  0.1471 0.0702 0.1279 0.1052 0.1043 ...
##  $ symmetry_mean          : num  0.242 0.181 0.207 0.26 0.181 ...
##  $ fractal_dimension_mean : num  0.0787 0.0567 0.06 0.0974 0.0588 ...
##  $ radius_se              : num  1.095 0.543 0.746 0.496 0.757 ...
##  $ texture_se             : num  0.905 0.734 0.787 1.156 0.781 ...
##  $ perimeter_se           : num  8.59 3.4 4.58 3.44 5.44 ...
##  $ area_se                : num  153.4 74.1 94 27.2 94.4 ...
##  $ smoothness_se          : num  0.0064 0.00522 0.00615 0.00911 0.01149 ...
##  $ compactness_se         : num  0.049 0.0131 0.0401 0.0746 0.0246 ...
##  $ concavity_se           : num  0.0537 0.0186 0.0383 0.0566 0.0569 ...
##  $ concave points_se      : num  0.0159 0.0134 0.0206 0.0187 0.0188 ...
##  $ symmetry_se            : num  0.03 0.0139 0.0225 0.0596 0.0176 ...
##  $ fractal_dimension_se   : num  0.00619 0.00353 0.00457 0.00921 0.00511 ...
##  $ radius_worst           : num  25.4 25 23.6 14.9 22.5 ...
##  $ texture_worst          : num  17.3 23.4 25.5 26.5 16.7 ...
##  $ perimeter_worst        : num  184.6 158.8 152.5 98.9 152.2 ...
##  $ area_worst             : num  2019 1956 1709 568 1575 ...
##  $ smoothness_worst       : num  0.162 0.124 0.144 0.21 0.137 ...
##  $ compactness_worst      : num  0.666 0.187 0.424 0.866 0.205 ...
##  $ concavity_worst        : num  0.712 0.242 0.45 0.687 0.4 ...
##  $ concave points_worst   : num  0.265 0.186 0.243 0.258 0.163 ...
##  $ symmetry_worst         : num  0.46 0.275 0.361 0.664 0.236 ...
##  $ fractal_dimension_worst: num  0.1189 0.089 0.0876 0.173 0.0768 ...
summary(wbcd)
##        id             diagnosis          radius_mean      texture_mean  
##  Min.   :     8670   Length:569         Min.   : 6.981   Min.   : 9.71  
##  1st Qu.:   869218   Class :character   1st Qu.:11.700   1st Qu.:16.17  
##  Median :   906024   Mode  :character   Median :13.370   Median :18.84  
##  Mean   : 30371831                      Mean   :14.127   Mean   :19.29  
##  3rd Qu.:  8813129                      3rd Qu.:15.780   3rd Qu.:21.80  
##  Max.   :911320502                      Max.   :28.110   Max.   :39.28  
##  perimeter_mean     area_mean      smoothness_mean   compactness_mean 
##  Min.   : 43.79   Min.   : 143.5   Min.   :0.05263   Min.   :0.01938  
##  1st Qu.: 75.17   1st Qu.: 420.3   1st Qu.:0.08637   1st Qu.:0.06492  
##  Median : 86.24   Median : 551.1   Median :0.09587   Median :0.09263  
##  Mean   : 91.97   Mean   : 654.9   Mean   :0.09636   Mean   :0.10434  
##  3rd Qu.:104.10   3rd Qu.: 782.7   3rd Qu.:0.10530   3rd Qu.:0.13040  
##  Max.   :188.50   Max.   :2501.0   Max.   :0.16340   Max.   :0.34540  
##  concavity_mean    concave points_mean symmetry_mean    fractal_dimension_mean
##  Min.   :0.00000   Min.   :0.00000     Min.   :0.1060   Min.   :0.04996       
##  1st Qu.:0.02956   1st Qu.:0.02031     1st Qu.:0.1619   1st Qu.:0.05770       
##  Median :0.06154   Median :0.03350     Median :0.1792   Median :0.06154       
##  Mean   :0.08880   Mean   :0.04892     Mean   :0.1812   Mean   :0.06280       
##  3rd Qu.:0.13070   3rd Qu.:0.07400     3rd Qu.:0.1957   3rd Qu.:0.06612       
##  Max.   :0.42680   Max.   :0.20120     Max.   :0.3040   Max.   :0.09744       
##    radius_se        texture_se      perimeter_se       area_se       
##  Min.   :0.1115   Min.   :0.3602   Min.   : 0.757   Min.   :  6.802  
##  1st Qu.:0.2324   1st Qu.:0.8339   1st Qu.: 1.606   1st Qu.: 17.850  
##  Median :0.3242   Median :1.1080   Median : 2.287   Median : 24.530  
##  Mean   :0.4052   Mean   :1.2169   Mean   : 2.866   Mean   : 40.337  
##  3rd Qu.:0.4789   3rd Qu.:1.4740   3rd Qu.: 3.357   3rd Qu.: 45.190  
##  Max.   :2.8730   Max.   :4.8850   Max.   :21.980   Max.   :542.200  
##  smoothness_se      compactness_se      concavity_se     concave points_se 
##  Min.   :0.001713   Min.   :0.002252   Min.   :0.00000   Min.   :0.000000  
##  1st Qu.:0.005169   1st Qu.:0.013080   1st Qu.:0.01509   1st Qu.:0.007638  
##  Median :0.006380   Median :0.020450   Median :0.02589   Median :0.010930  
##  Mean   :0.007041   Mean   :0.025478   Mean   :0.03189   Mean   :0.011796  
##  3rd Qu.:0.008146   3rd Qu.:0.032450   3rd Qu.:0.04205   3rd Qu.:0.014710  
##  Max.   :0.031130   Max.   :0.135400   Max.   :0.39600   Max.   :0.052790  
##   symmetry_se       fractal_dimension_se  radius_worst   texture_worst  
##  Min.   :0.007882   Min.   :0.0008948    Min.   : 7.93   Min.   :12.02  
##  1st Qu.:0.015160   1st Qu.:0.0022480    1st Qu.:13.01   1st Qu.:21.08  
##  Median :0.018730   Median :0.0031870    Median :14.97   Median :25.41  
##  Mean   :0.020542   Mean   :0.0037949    Mean   :16.27   Mean   :25.68  
##  3rd Qu.:0.023480   3rd Qu.:0.0045580    3rd Qu.:18.79   3rd Qu.:29.72  
##  Max.   :0.078950   Max.   :0.0298400    Max.   :36.04   Max.   :49.54  
##  perimeter_worst    area_worst     smoothness_worst  compactness_worst
##  Min.   : 50.41   Min.   : 185.2   Min.   :0.07117   Min.   :0.02729  
##  1st Qu.: 84.11   1st Qu.: 515.3   1st Qu.:0.11660   1st Qu.:0.14720  
##  Median : 97.66   Median : 686.5   Median :0.13130   Median :0.21190  
##  Mean   :107.26   Mean   : 880.6   Mean   :0.13237   Mean   :0.25427  
##  3rd Qu.:125.40   3rd Qu.:1084.0   3rd Qu.:0.14600   3rd Qu.:0.33910  
##  Max.   :251.20   Max.   :4254.0   Max.   :0.22260   Max.   :1.05800  
##  concavity_worst  concave points_worst symmetry_worst   fractal_dimension_worst
##  Min.   :0.0000   Min.   :0.00000      Min.   :0.1565   Min.   :0.05504        
##  1st Qu.:0.1145   1st Qu.:0.06493      1st Qu.:0.2504   1st Qu.:0.07146        
##  Median :0.2267   Median :0.09993      Median :0.2822   Median :0.08004        
##  Mean   :0.2722   Mean   :0.11461      Mean   :0.2901   Mean   :0.08395        
##  3rd Qu.:0.3829   3rd Qu.:0.16140      3rd Qu.:0.3179   3rd Qu.:0.09208        
##  Max.   :1.2520   Max.   :0.29100      Max.   :0.6638   Max.   :0.20750

The code snippet right above showcases the dataset’s class, structure and overall summary, as expected from the functions at play. For the purposes of this document, the observations worth highlighting at this point are the following:
  • There is an id column which holds no valuable information.
  • Every tumor measurement appears thrice: as its mean, its standard error and its “worst” (largest) value (e.g., there is a perimeter_mean, a perimeter_se and a perimeter_worst column).
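
One more detail of the summary() output is worth a short illustration: the diagnosis column is reported only by length, class and mode because it is stored as character data. The sketch below uses a made-up vector (diagnosis_chr) to show how the summary changes once the same values are treated as a factor, and how table() provides the counts either way.

```r
# Made-up diagnosis codes, stored as plain character data
diagnosis_chr <- c("M", "B", "B", "M", "B")

summary(diagnosis_chr)          # only Length / Class / Mode
summary(factor(diagnosis_chr))  # number of occurrences per level
table(diagnosis_chr)            # counts, regardless of the storage type
```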

1.3. Data wrangling

Data wrangling, sometimes referred to as data munging, is the process of transforming and mapping data from one “raw” data form into another format with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics.

Since the dataset at hand is already of class data.frame, there is no need to transform its class in any way (the functions to be used work with data.frame objects). Given that, the next procedure is to check for empty values (NULLs and NAs, through the use of the is.null() and is.na() functions) which could meddle with later operations. Let that be the first wrangling operation to be performed:

# Check for NULL data
null_check <- c()
for (i in 1:dim(wbcd)[1]) {
  for (j in 1:dim(wbcd)[2]) {
    # print(wbcd[i,j]) # This print() snippet allows checking each individual value passed onto this loop
    # append() returns a new vector, so its result must be assigned back
    null_check <- append(null_check, is.null(wbcd[i,j]))
  }
}
any(null_check) # FALSE means there is no NULL data
## [1] FALSE

# Check for NA data
na_check <- c()
for (i in 1:dim(wbcd)[1]) {
  for (j in 1:dim(wbcd)[2]) {
    # print(wbcd[i,j]) # This print() snippet allows checking each individual value passed onto this loop
    na_check <- append(na_check, is.na(wbcd[i,j]))
  }
}
any(na_check) # FALSE means there is no NA data
## [1] FALSE
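
The same kind of check can also be expressed without explicit loops, since is.na() is vectorized over a whole data frame. The following sketch uses a small made-up data frame (toy, with deliberately missing values) rather than wbcd, so that the functions have something to find:

```r
# Made-up data frame with one NA in each column
toy <- data.frame(radius    = c(17.99, NA, 19.69),
                  diagnosis = c("M", "B", NA))

anyNA(toy)                          # is there any NA at all?
colSums(is.na(toy))                 # number of NAs per column
which(is.na(toy), arr.ind = TRUE)   # row/column position of each NA

# Keep only the rows without any missing value
complete <- toy[complete.cases(toy), ]
nrow(complete)
```

On a dataset the size of wbcd this runs noticeably faster than a cell-by-cell loop, and colSums() also tells which columns are affected.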

As stated previously, there is no need for the id column within our dataset (since it holds no valuable information), so the next (and final) wrangling procedure is to get rid of it. The following code snippet also showcases an additional optional step, albeit a recommended one: the diagnosis codes are converted into a factor whose levels are renamed to better convey their meaning (this helps those unfamiliar with the dataset to interpret it).

wbcd <- wbcd[,-1]
wbcd$diagnosis <- factor(ifelse(wbcd$diagnosis=="B","Benign","Malignant"))
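
As a side note, the same recoding can be written with the levels and labels arguments of factor(). This is a sketch of an alternative, not the approach used above; it is shown on a made-up vector (diagnosis) and has the advantage that an unexpected code becomes NA instead of being silently labelled as "Malignant".

```r
# Made-up diagnosis codes, including one unexpected value
diagnosis <- c("M", "B", "B", "M", "?")

recoded <- factor(diagnosis,
                  levels = c("B", "M"),
                  labels = c("Benign", "Malignant"))

levels(recoded)                  # "Benign" "Malignant"
table(recoded, useNA = "ifany")  # the unexpected "?" shows up as NA
```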

2. Data Exploration

2.1. Correlation Charts

Data analysts/scientists aim to study and understand a given set of data. Correlation charts facilitate said study by clearly showing which pairs of variables are strongly related and which are not. What’s more, these correlations are core to the Principal Component Analysis (PCA) which is to be performed (and detailed) later in this document.

As stated previously, examining the data allows the user to appreciate that every tumor measurement appears thrice: as its mean, its standard error and its “worst” (largest) value (e.g., there is a perimeter_mean, a perimeter_se and a perimeter_worst variable). Given that, this correlation chart exercise is performed upon 3 distinct groups:
  • One with mean variables.
  • One with standard error (se) variables.
  • One with worst variables.
There are also multiple functions which can be used to plot the correlation charts - this document covers the following:
  • The chart.Correlation() function from the PerformanceAnalytics package.
  • The pairs.panels() function from the psych package.
  • The ggpairs() function from the GGally package.
  • The ggcorr() function from the GGally package.
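
The three variable groups described above can be selected either by position or by name suffix. The sketch below illustrates the suffix-based option on a made-up data frame (toy) that mimics the dataset's layout with only two measurements; selecting by suffix keeps working even if the column order changes.

```r
# Made-up data frame laid out like wbcd: diagnosis, then mean, se and worst columns
toy <- data.frame(diagnosis    = c("Malignant", "Benign"),
                  radius_mean  = c(1, 2), area_mean  = c(3, 4),
                  radius_se    = c(5, 6), area_se    = c(7, 8),
                  radius_worst = c(9, 10), area_worst = c(11, 12))

# Column indices of each group, found through the name suffix
mean_idx  <- grep("_mean$",  names(toy))
se_idx    <- grep("_se$",    names(toy))
worst_idx <- grep("_worst$", names(toy))

names(toy)[mean_idx]   # columns that would go into the "mean" chart
toy[, worst_idx]       # subset that would go into the "worst" chart
```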

2.1.1. chart.Correlation()

The chart.Correlation() function from the PerformanceAnalytics package allows the user to plot a correlation chart based on the arguments at play, which are the following:
  • R: data to correlate against itself (it can either be a vector, a matrix or a timeseries).
  • histogram: TRUE or FALSE, determines whether or not a histogram is displayed.
  • method: a character string indicating which correlation coefficient (or covariance) is to be computed. Options are “pearson” (default), “kendall” and “spearman”.
# Do not run this code snippet, as it is only here for illustration purposes
library(PerformanceAnalytics)
chart.Correlation(R, 
                  histogram = TRUE, 
                  method = "pearson", 
                  ...)

Additional arguments can be passed through in order to better define the aesthetic elements of the scatter plots and the optional histogram: the function accepts any argument that can be passed to pairs(). Further information regarding the function chart.Correlation(), its behavior and its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/PerformanceAnalytics/versions/2.0.4/topics/chart.Correlation
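
To make the method argument more concrete, the sketch below compares the three coefficients with the base R cor() function on made-up data (x and y are assumptions chosen for illustration). Since y grows monotonically but not linearly with x, the rank-based coefficients (Kendall and Spearman) report a perfect association while Pearson's does not.

```r
# Made-up data: y increases with x, but not along a straight line
x <- 1:10
y <- x^3

methods <- c("pearson", "kendall", "spearman")
coefs <- vapply(methods, function(m) cor(x, y, method = m), numeric(1))

round(coefs, 3)  # Pearson is below 1; Kendall and Spearman are equal to 1
```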

The following code snippet showcases the function at hand with all three correlation methods, although only the Pearson correlation charts are shown for the sake of simplicity and readability.

library(PerformanceAnalytics)

# Analysis of the Pearson correlation between variables
chart.Correlation(wbcd[,c(2:11)], 
                  histogram=TRUE, 
                  method = "pearson",
                  col="grey10", 
                  pch=1,
                  main="Correlation chart for mean data")

chart.Correlation(wbcd[,c(12:21)], 
                  histogram=TRUE, 
                  method = "pearson",
                  col="grey10", 
                  pch=1,
                  main="Correlation chart for se data")

chart.Correlation(wbcd[,c(22:31)], 
                  histogram=TRUE,
                  method = "pearson",
                  col="grey10", 
                  pch=1,
                  main="Correlation chart for worst data")

# Analysis of the Kendall correlation between variables
chart.Correlation(wbcd[,c(2:11)], 
                  method = "kendall", 
                  histogram=TRUE, 
                  col="grey10", 
                  pch=1,
                  main="Correlation chart for mean data")

chart.Correlation(wbcd[,c(12:21)], 
                  method = "kendall", 
                  histogram=TRUE, 
                  col="grey10", 
                  pch=1,
                  main="Correlation chart for se data")

chart.Correlation(wbcd[,c(22:31)], 
                  method = "kendall", 
                  histogram=TRUE, 
                  col="grey10", 
                  pch=1,
                  main="Correlation chart for worst data")

# Analysis of the Spearman correlation between variables
chart.Correlation(wbcd[,c(2:11)],
                  method = "spearman", 
                  histogram=TRUE, 
                  col="grey10", 
                  pch=1,
                  main="Correlation chart for mean data")

chart.Correlation(wbcd[,c(12:21)], 
                  method = "spearman", 
                  histogram=TRUE, 
                  col="grey10", 
                  pch=1,
                  main="Correlation chart for se data")

chart.Correlation(wbcd[,c(22:31)],
                  method = "spearman", 
                  histogram=TRUE, 
                  col="grey10", 
                  pch=1,
                  main="Correlation chart for worst data")

[Figures: correlation charts for the mean, se and worst variable groups]

2.1.2. pairs.panels()

The pairs.panels() function from the psych package allows the user to plot a correlation chart based on the arguments at play, which are the following:
  • x: data to correlate against itself (a matrix or data.frame).
  • smooth: TRUE draws loess smooths.
  • scale: TRUE scales the correlation font by the size of the absolute correlation.
  • density: TRUE shows the density plots as well as histograms.
  • ellipses: TRUE draws correlation ellipses.
  • lm: TRUE plots the linear fit rather than the LOESS smoothed fits.
  • digits: the number of digits to show.
  • method: a character string indicating which correlation coefficient (or covariance) is to be computed. Options are “pearson” (default), “kendall” and “spearman”.
  • pch: the plot character (defaults to 20 which is a ‘.’).
  • cor: TRUE or FALSE determines whether or not to report correlations if plotting regressions.
  • jiggle: TRUE or FALSE determines whether or not the data points are jittered before being plotted.
  • factor: factor for jittering (1-5)
  • hist.col: defines the histogram’s color.
  • show.points: if FALSE, do not show the data points, just the data ellipses and smoothed functions.
  • rug: TRUE or FALSE determines whether or not a rug is drawn under the histograms.
  • breaks: if specified, allows control for the number of breaks in the histogram.
  • cex.cor: if specified, sets the size of the text in the correlation boxes, in which case cex can be used to change the size of the points (if only cex is specified, it changes the character size).
  • wt: if specified, then weight the correlations by a weights matrix.
  • smoother: if TRUE, then smooth.scatter is applied upon the data points (which is slow but pretty when lots of subjects are to be plotted).
  • stars: TRUE or FALSE determines whether or not to show the significance of correlations by using asterisks [*].
  • ci: TRUE or FALSE determines whether or not to draw confidence intervals for the linear model or for the loess fit. If confidence intervals are not drawn, the fitting function is lowess.
  • alpha: the alpha level for the confidence regions.
# Do not run this code snippet, as it is only here for illustration purposes
library(psych)
pairs.panels(x, 
             smooth = TRUE, 
             scale = FALSE, 
             density = TRUE, 
             ellipses = TRUE, 
             lm = FALSE, 
             digits = 2, 
             method = "pearson",
             pch = 20,
             cor = TRUE,
             jiggle = FALSE, 
             factor = 2, 
             hist.col = "cyan", 
             show.points = TRUE, 
             rug = TRUE,
             breaks = "Sturges",
             cex.cor = 1,
             wt = NULL,
             smoother = FALSE, 
             stars = FALSE, 
             ci = FALSE, 
             alpha = .05, 
             ...)

Like with the chart.Correlation() function, additional arguments can be passed through in order to better define the scatter plot and the histogram, which is not optional with this function. Besides that, the function pairs.panels() can be considered more customizable than chart.Correlation() due to the sheer amount of specific arguments that can be used with it. Note that, as was the case with chart.Correlation(), the function pairs.panels() also accepts any argument that can be passed to pairs().

For more information about the function itself, here’s the rdocumentation.org related page: https://www.rdocumentation.org/packages/psych/versions/2.1.6/topics/pairs.panels
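
The idea behind the stars argument listed above can be illustrated with base R alone. The sketch below runs cor.test() on two columns of the built-in mtcars dataset (used here only as a stand-in for wbcd) and maps the resulting p-value to asterisks. Both the p_to_stars() helper and its thresholds (0.05, 0.01 and 0.001, the conventional ones) are assumptions made for illustration, not psych's own code.

```r
# Significance test of a Pearson correlation on a built-in dataset
ct <- cor.test(mtcars$mpg, mtcars$wt, method = "pearson")

# Hypothetical helper: conventional mapping from p-value to asterisks
p_to_stars <- function(p) {
  if (p < 0.001) {
    "***"
  } else if (p < 0.01) {
    "**"
  } else if (p < 0.05) {
    "*"
  } else {
    ""
  }
}

# Correlation estimate followed by its significance marker
paste0(round(unname(ct$estimate), 2), p_to_stars(ct$p.value))
```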

The following code snippet showcases the function at hand with all three correlation methods, although only the Pearson correlation charts are shown for the sake of simplicity and readability.

library(psych)

# Analysis of the Pearson correlation between variables
pairs.panels(wbcd[,c(2:11)], 
             method="pearson", 
             hist.col = "#cccccc", 
             pch=1, 
             lm=TRUE, 
             stars = TRUE, 
             main="Correlation chart for mean data")

pairs.panels(wbcd[,c(12:21)],
             method="pearson",
             hist.col = "#cccccc", 
             pch=1, 
             lm=TRUE, 
             stars = TRUE, 
             main="Correlation chart for se data")

pairs.panels(wbcd[,c(22:31)], 
             method="pearson", 
             hist.col = "#cccccc", 
             pch=1, 
             lm=TRUE, 
             stars = TRUE, 
             main="Correlation chart for worst data")

# Analysis of the Kendall correlation between variables
pairs.panels(wbcd[,c(2:11)], 
             method="kendall",
             hist.col = "#cccccc",
             pch=1, 
             lm=TRUE, 
             stars = TRUE, 
             main="Correlation chart for mean data")

pairs.panels(wbcd[,c(12:21)], 
             method="kendall", 
             hist.col = "#cccccc", 
             pch=1, 
             lm=TRUE, 
             stars = TRUE, 
             main="Correlation chart for se data")

pairs.panels(wbcd[,c(22:31)], 
             method="kendall", 
             hist.col = "#cccccc", 
             pch=1, 
             lm=TRUE, 
             stars = TRUE, 
             main="Correlation chart for worst data")

# Analysis of the Spearman correlation between variables
pairs.panels(wbcd[,c(2:11)], 
             method="spearman", 
             hist.col = "#cccccc",
             pch=1, 
             lm=TRUE, 
             stars = TRUE, 
             main="Correlation chart for mean data")

pairs.panels(wbcd[,c(12:21)], 
             method="spearman", 
             hist.col = "#cccccc",
             pch=1, 
             lm=TRUE, 
             stars = TRUE, 
             main="Correlation chart for se data")

pairs.panels(wbcd[,c(22:31)], 
             method="spearman", 
             hist.col = "#cccccc", 
             pch=1, 
             lm=TRUE, 
             stars = TRUE,
             main="Correlation chart for worst data")

[Figures: correlation charts for the mean, se and worst variable groups]

2.1.3. ggpairs()

The ggpairs() function from the GGally package plots a correlation chart (a scatter plot matrix) according to the arguments supplied, the most noteworthy being:
  • data: dataset to correlate against itself (can have both numerical and categorical data).
  • mapping: aesthetic mapping.
  • columns: which columns are used to make plots. Defaults to all columns.
  • title, xlab, ylab: title for the graph and labels for the x and y axis (respectively).

# Do not run this code snippet, as it is only here for illustration purposes
library(GGally)
ggpairs(data, 
        mapping = NULL, 
        columns = 1:ncol(data),
        title = NULL,
        xlab = NULL,
        ylab = NULL,
        ...)

As with the previous functions, additional arguments can be passed through. Because ggpairs() is built upon ggplot2, it is compatible with many of the arguments available for ggplot2 plots, and covering them all is beyond the scope of this document. Some of them are showcased in the upcoming code snippets; more information about the function and its arguments can be found in its associated RDocumentation page: https://www.rdocumentation.org/packages/GGally/versions/1.5.0/topics/ggpairs

Note that ggplot2 is one of the most powerful R tools for creating graphics (arguably the most powerful one). The following RDocumentation page overviews its installation and usage: https://www.rdocumentation.org/packages/ggplot2/versions/3.3.5

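The following code snippet showcases the function at hand. Note that, as opposed to the previously detailed functions, ggpairs() has no top-level argument to select the correlation method: its correlation panels report Pearson’s coefficient by default, and a different method has to be passed to the panel function itself (for instance through the upper argument together with GGally’s wrap() helper).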

library(GGally)

ggpairs(wbcd[,c(2:11)]) + 
  theme_bw() +
  labs(title = "Correlation chart for mean data") +
  theme(plot.title = element_text(face = 'bold', color = '#000000', hjust = 0.5, size = 14))

ggpairs(wbcd[,c(12:21)]) + 
  theme_bw() +
  labs(title = "Correlation chart for se data") +
  theme(plot.title = element_text(face = 'bold', color = '#000000', hjust = 0.5, size = 14))

ggpairs(wbcd[,c(22:31)]) + 
  theme_bw() +
  labs(title = "Correlation chart for worst data") +
  theme(plot.title = element_text(face = 'bold', color = '#000000', hjust = 0.5, size = 14))
Mean

SE

Worst
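
The charts above rely on the default Pearson coefficient, but the correlation panels can be asked for a different one. The following is a minimal sketch, assuming a GGally version whose "cor" panel accepts a method argument forwarded through wrap(). It uses R's built-in iris data instead of wbcd so that it runs on its own, and the object names num_data and p_spearman are purely illustrative.

```r
library(ggplot2)
library(GGally)

# Built-in iris data stands in for wbcd so the snippet is self-contained
num_data <- iris[, 1:4]

# wrap() pre-sets arguments of a panel function; here the upper-triangle
# correlation panels are asked for Spearman's rho instead of Pearson's r
p_spearman <- ggpairs(num_data,
                      upper = list(continuous = wrap("cor", method = "spearman"))) +
  theme_bw() +
  labs(title = "Spearman correlation chart (iris, illustrative)")

p_spearman
```

Replacing "spearman" with "kendall" should switch the panels to Kendall's tau in the same way.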


Being based upon ggplot2 is a strong point in favor of ggpairs(), and creativity can go a long way: data science is not only about understanding the data at hand but also about making it understandable for others. The many possibilities ggplot2 brings along can dramatically increase a plot’s readability, making it easier to understand for non-specialists (that is, people who are not data scientists). The upcoming code snippet slightly modifies the previous one by adding diagnosis-based coloring, which makes the graph far easier to interpret (while also highlighting the striking similarities between this function’s structure and ggplot()’s).

library(GGally)

ggpairs(wbcd[,c(2:11,1)], aes(color = diagnosis, alpha = 0.75), lower = list(continuous = "smooth")) + 
  theme_bw() +
  labs(title = "Correlation chart for mean data") +
  theme(plot.title = element_text(face = 'bold', color = '#000000', hjust = 0.5, size = 14))

ggpairs(wbcd[,c(12:21,1)], aes(color = diagnosis, alpha = 0.75), lower = list(continuous = "smooth")) + 
  theme_bw() +
  labs(title = "Correlation chart for se data") +
  theme(plot.title = element_text(face = 'bold', color = '#000000', hjust = 0.5, size = 14))

ggpairs(wbcd[,c(22:31,1)], aes(color = diagnosis, alpha = 0.75), lower = list(continuous = "smooth")) + 
  theme_bw() +
  labs(title = "Correlation chart for worst data") +
  theme(plot.title = element_text(face = 'bold', color = '#000000', hjust = 0.5, size = 14))

Mean

SE

Worst

2.1.4. ggcorr()

The ggcorr() function from the GGally package allows the user to plot a simplified correlation chart focused solely on correlation values, which increases their visibility and readability (helping to illustrate and showcase certain points) at the cost of omitting the data itself (neither the datapoints and their scatter plot nor the histograms are plotted with this function).

As with previous functions, the plot is based on the arguments at play which are the following:
  • data: a data frame or matrix containing numeric (continuous) data.
  • method: a vector of two character strings. The first value gives the method for computing covariances in the presence of missing values and must be one of “everything”, “all.obs”, “complete.obs”, “na.or.complete” or “pairwise.complete.obs” (abbreviations work); the second value gives the type of correlation coefficient to compute, and must be one of “pearson”, “kendall” or “spearman”.
  • cor_matrix: the named correlation matrix to use for calculations. Defaults to the correlation matrix of data when data is supplied.
  • nbreaks: the number of breaks to apply to the correlation coefficients, which results in a categorical color scale. Defaults to NULL which implies no breaks (continuous scaling).
  • digits: the number of digits to show in the breaks of the correlation coefficients.
  • low: the lower color of the gradient for continuous scaling of the correlation coefficients.
  • mid: the midpoint color of the gradient for continuous scaling of the correlation coefficients.
  • high: the upper color of the gradient for continuous scaling of the correlation coefficients.
  • midpoint: the midpoint value for continuous scaling of the correlation coefficients.
  • palette: if nbreaks is used, a ColorBrewer palette to use instead of the colors specified by low, mid and high.
  • geom: the geom object to use. Accepts either “tile”, “circle”, “text” or “blank”.
  • min_size: when geom has been set to “circle”, the minimum size of the circles.
  • max_size: when geom has been set to “circle”, the maximum size of the circles.
  • label: TRUE or FALSE determines whether or not to add correlation coefficients to the plot.
  • label_alpha, label_color, label_round, label_size: aesthetic components for the label.
  • limits: bounding of color scaling for correlations, set limits = NULL or FALSE to remove.
  • drop: if using nbreaks, TRUE or FALSE determines whether or not to drop unused breaks from the color scale.
  • layout.exp: a multiplier to expand the horizontal axis to the left if variable names get clipped.
  • legend.position: where to put the legend of the correlation coefficients.
  • legend.size: the size of the legend title and labels.

As was the case with ggpairs(), this function is also based upon ggplot2 making it compatible with many of ggplot2’s arguments. More information regarding the function and its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/GGally/versions/1.5.0/topics/ggcorr

# Do not run this code snippet, as it is only here for illustration purposes
library(GGally)
ggcorr(data,
       method = c("pairwise", "pearson"),
       cor_matrix = NULL,
       nbreaks = NULL,
       digits = 2,
       name = "",
       low = "#3B9AB2",
       mid = "#EEEEEE",
       high = "#F21A00",
       midpoint = 0,
       palette = NULL,
       geom = "tile",
       min_size = 2,
       max_size = 6,
       label = FALSE,
       label_alpha = FALSE,
       label_color = "black",
       label_round = 1,
       label_size = 4,
       limits = c(-1, 1),
       drop = is.null(limits) || identical(limits, FALSE),
       layout.exp = 0,
       legend.position = "right",
       legend.size = 9,
       ...)

The following code snippets apply the function to the Wisconsin Breast Cancer Dataset using the Pearson, Kendall and Spearman methods, although only the Pearson charts are shown for the sake of simplicity and readability.

library(GGally)

# Analysis of the Pearson correlation between variables using the ggcorr() function from the GGally package
ggcorr(wbcd[,c(2:11)],
       method = c("pairwise", "pearson"),
       name = "corr",
       geom = "tile",
       label = TRUE) +
  theme(legend.position = "none") +
  labs(title = "Correlation chart for mean data") +
  theme(plot.title = element_text(face = 'bold', color = 'black', hjust = 0.5, size = 12))

ggcorr(wbcd[,c(12:21)],
       method = c("pairwise", "pearson"),
       name = "corr",
       geom = "tile",
       label = TRUE) +
  theme(legend.position = "none") +
  labs(title = "Correlation chart for se data") +
  theme(plot.title = element_text(face = 'bold', color = 'black', hjust = 0.5, size = 12))

ggcorr(wbcd[,c(22:31)],
       method = c("pairwise", "pearson"),
       name = "corr",
       geom = "tile",
       label = TRUE) +
  theme(legend.position = "none") +
  labs(title = "Correlation chart for worst data") +
  theme(plot.title = element_text(face = 'bold', color = 'black', hjust = 0.5, size = 12))

# Analysis of the Kendall correlation between variables using the ggcorr() function from the GGally package
ggcorr(wbcd[,c(2:11)],
       method = c("pairwise", "kendall"),
       name = "corr",
       geom = "tile",
       label = TRUE) +
  theme(legend.position = "none") +
  labs(title = "Correlation chart for mean data") +
  theme(plot.title = element_text(face = 'bold', color = 'black', hjust = 0.5, size = 12))

ggcorr(wbcd[,c(12:21)],
       method = c("pairwise", "kendall"),
       name = "corr",
       geom = "tile",
       label = TRUE) +
  theme(legend.position = "none") +
  labs(title = "Correlation chart for se data") +
  theme(plot.title = element_text(face = 'bold', color = 'black', hjust = 0.5, size = 12))

ggcorr(wbcd[,c(22:31)],
       method = c("pairwise", "kendall"),
       name = "corr",
       geom = "tile",
       label = TRUE) +
  theme(legend.position = "none") +
  labs(title = "Correlation chart for worst data") +
  theme(plot.title = element_text(face = 'bold', color = 'black', hjust = 0.5, size = 12))

# Analysis of the Spearman correlation between variables using the ggcorr() function from the GGally package
ggcorr(wbcd[,c(2:11)],
       method = c("pairwise", "spearman"),
       name = "corr",
       geom = "tile",
       label = TRUE) +
  theme(legend.position = "none") +
  labs(title = "Correlation chart for mean data") +
  theme(plot.title = element_text(face = 'bold', color = 'black', hjust = 0.5, size = 12))

ggcorr(wbcd[,c(12:21)],
       method = c("pairwise", "spearman"),
       name = "corr",
       geom = "tile",
       label = TRUE) +
  theme(legend.position = "none") +
  labs(title = "Correlation chart for se data") +
  theme(plot.title = element_text(face = 'bold', color = 'black', hjust = 0.5, size = 12))

ggcorr(wbcd[,c(22:31)],
       method = c("pairwise", "spearman"),
       name = "corr",
       geom = "tile",
       label = TRUE) +
  theme(legend.position = "none") +
  labs(title = "Correlation chart for worst data") +
  theme(plot.title = element_text(face = 'bold', color = 'black', hjust = 0.5, size = 12))

Mean

SE

Worst
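
The coefficients that ggcorr() draws can also be inspected numerically. As a hedged sketch, base R's cor() accepts the same two settings described for the method argument (the handling of missing values and the type of coefficient). The built-in mtcars data is used in place of wbcd so that the snippet is self-contained, and the name cor_by_method is hypothetical.

```r
# Built-in mtcars data stands in for a block of wbcd columns
num_data <- mtcars[, c("mpg", "disp", "hp", "wt")]

methods <- c("pearson", "kendall", "spearman")

# One correlation matrix per method, computed on pairwise-complete observations
cor_by_method <- lapply(methods, function(m) {
  cor(num_data, use = "pairwise.complete.obs", method = m)
})
names(cor_by_method) <- methods

# Rounded to one decimal, in line with ggcorr()'s default label_round = 1
round(cor_by_method$pearson, 1)
```

Each element of cor_by_method is a symmetric matrix with ones on the diagonal, which is the information a tile chart of this kind displays for its lower triangle.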

2.2. Principal Component Analysis

Principal Component Analysis, commonly known as PCA, is one of the most popular approaches to dimensionality reduction. It is widely used to summarize and visualize the information within a dataset described by multiple inter-correlated quantitative variables. Treating each variable as a dimension turns any dataset into a multi-dimensional space, which cannot be plotted directly when the dataset is defined by more than 3 variables (since only 3 spatial dimensions are available) - that is where dimensionality reduction (and thus PCA) comes into play.

Through PCA, the important information from a multivariate dataset can be extracted and expressed through a set of new variables called Principal Components (PC - singular; PCs - plural). These PCs are linear combinations of the original variables, ordered by the amount of variance they capture, so keeping only the first few of them reduces the number of variables needed to understand the data. Using this approach, PCA reduces the dimensionality of a multivariate dataset in order to identify patterns and better understand these highly complex matrices. It also makes it possible to visualize multivariate datasets with as little loss of information as possible by keeping only the first 2 PCs (for a two-dimensional plot) or the first 3 PCs (for a three-dimensional one).
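
To make the idea of a linear combination concrete, here is a minimal sketch, assuming R's built-in USArrests data (4 numeric variables) as a stand-in for wbcd. The names z, pc1_manual and scores_2d are illustrative only.

```r
# Built-in USArrests data stands in for wbcd
pca <- prcomp(USArrests, scale. = TRUE)

# Standardize the data exactly as prcomp() did
z <- scale(USArrests, center = pca$center, scale = pca$scale)

# PC1 score of the first observation written as an explicit weighted sum of
# its standardized variables; the weights are the first column of the loadings
pc1_manual <- sum(z[1, ] * pca$rotation[, 1])
all.equal(pc1_manual, unname(pca$x[1, 1]))

# Keeping only the first two PCs gives a 2-D representation of the 4-D data
scores_2d <- pca$x[, 1:2]
dim(scores_2d)
```

The same projection onto the first two (or three) components is what allows a dataset with many variables to be drawn on a flat plot.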

There’s a fantastic video by content creator Josh Starmer which explains PCA step by step, detailing the intricacies of this technique and all of the elements at play. Many aspects of PCA fall outside the scope of this document due to time constraints and to keep its overall size within reasonable limits, so the video in question is highly recommended as an excellent starting point: https://www.youtube.com/watch?v=FgakZw6K1QQ

Several functions from different packages can be used to perform a PCA:
  • prcomp() and princomp(), which are R built-in functions
  • PCA() from the FactoMineR package
  • dudi.pca() from the ade4 package
  • epPCA() from the ExPosition package

2.2.1. PCA with R built-in functions

As already stated, there are two built-in R functions (from within R’s built-in stats package) that allow the user to perform a Principal Component Analysis: prcomp() and princomp(). Despite the similarity in name, their approaches to PCA differ: prcomp() performs a singular value decomposition (SVD) of the centered (and optionally scaled) data matrix, whereas princomp() performs a spectral (eigen) decomposition of the covariance or correlation matrix of the variables. princomp() also uses the divisor n rather than n - 1 when computing variances.

It is worth noting that, according to the R help pages, SVD has slightly better numerical accuracy, which makes prcomp() the preferred approach (although there could exist a scenario under which princomp() yields better results).
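
The two routes can be compared side by side. The following is a small sketch, assuming the built-in USArrests data in place of wbcd and a correlation-based analysis (scale. = TRUE and cor = TRUE, respectively); all object names are illustrative.

```r
x <- scale(USArrests)   # centered data with unit variance (divisor n - 1)
n <- nrow(x)

# Route 1: singular values of the standardized data matrix (the SVD route)
sdev_svd <- svd(x)$d / sqrt(n - 1)

# Route 2: square roots of the eigenvalues of the correlation matrix
sdev_eigen <- sqrt(eigen(cor(USArrests))$values)

sdev_prcomp   <- prcomp(USArrests, scale. = TRUE)$sdev
sdev_princomp <- unname(princomp(USArrests, cor = TRUE)$sdev)

all.equal(sdev_svd, sdev_prcomp)        # SVD route reproduces prcomp()
all.equal(sdev_eigen, sdev_princomp)    # eigen route reproduces princomp()
all.equal(sdev_prcomp, sdev_princomp)   # both agree on the correlation scale
```

When the analysis is based on the correlation matrix, both functions report the same standard deviations up to numerical precision; the loadings and scores may still differ in sign, since the direction of each component is arbitrary.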

Let’s take a look at the main arguments for both functions:

# Do not run this code snippet, as it is only here for illustration purposes
prcomp(x, 
       retx = TRUE, 
       center = TRUE, 
       scale. = FALSE,
       tol = NULL, 
       rank. = NULL, 
       ...)

princomp(x, 
         cor = FALSE, 
         scores = TRUE, ...)
Let’s detail the arguments for prcomp():
  • x: the numeric matrix or dataset/dataframe upon which to perform the PCA.
  • retx: TRUE or FALSE determines whether or not to return the rotated variables.
  • center: TRUE or FALSE determines if the variables should be shifted to be zero centered.
  • scale.: TRUE or FALSE determines whether the variables are scaled to unit variance before the analysis takes place (defaults to FALSE). In Principal Component Analysis, variables are often scaled (i.e. standardized). This is particularly recommended when variables are measured in different units (e.g. kilograms, kilometers, centimeters…); otherwise, the variables with the largest variances will dominate the PCA output. Generally, variables are scaled to have a standard deviation of 1 and a mean of 0, which is achieved by subtracting the variable’s mean from each of its elements and then dividing the result by the variable’s standard deviation.
    Scaling is useful under most scenarios and, to further emphasize its relevance, every PCA code snippet sets the scaling argument explicitly, even where scaling is already the function’s default behavior (which is not the case for prcomp()).
  • tol: stands for tolerance; a value indicating the magnitude below which components should be omitted (defaults to tol = NULL with which no components are omitted).
  • rank.: a number specifying the maximal rank, i.e., maximal number of principal components to be used.
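
The effect of the scale. argument can be illustrated with a short sketch. It assumes the built-in USArrests data (whose Assault variable has a much larger variance than the others) instead of wbcd, and the helper pve() is a hypothetical name introduced only for this example.

```r
m <- as.matrix(USArrests)

# Manual standardization: subtract each column's mean, divide by its sd
z_manual <- sweep(m, 2, colMeans(m), "-")
z_manual <- sweep(z_manual, 2, apply(m, 2, sd), "/")
all.equal(z_manual, scale(m), check.attributes = FALSE)

# Hypothetical helper: proportion of variance explained by each component
pve <- function(p) p$sdev^2 / sum(p$sdev^2)

pca_raw    <- prcomp(m, scale. = FALSE)
pca_scaled <- prcomp(m, scale. = TRUE)

# Share of variance per component, without and with scaling
round(rbind(raw = pve(pca_raw), scaled = pve(pca_scaled)), 3)

# Variable with the largest absolute loading on PC1 in the unscaled analysis
names(which.max(abs(pca_raw$rotation[, 1])))
```

Without scaling, the first component is driven almost entirely by the variable measured on the largest scale, which is the distortion that scaling is meant to avoid.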

More information regarding the prcomp() function and its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/prcomp

Let’s detail the arguments for princomp():
  • x: the numeric matrix or dataset/dataframe upon which to perform the PCA.
  • cor: a logical value indicating whether the calculation should use the correlation matrix (TRUE) or the covariance matrix (FALSE, the default).
  • scores: a logical value indicating whether the score on each principal component should be calculated.
  • covmat: an optional covariance matrix to be used rather than the covariance matrix of x.
  • subset: an optional vector used to select rows (observations) of the data matrix x.
  • fix_sign: a logical value indicating whether to choose the signs of the loadings and scores so that the first element of each loading is non-negative.

More information regarding the princomp() function and its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/princomp

Let’s now evaluate the results when applying prcomp() to the Wisconsin Breast Cancer Dataset:

All
all_pca_1 <- prcomp(wbcd[,-1], scale. = TRUE)
class(all_pca_1)
## [1] "prcomp"
str(all_pca_1)
## List of 5
##  $ sdev    : num [1:30] 3.64 2.39 1.68 1.41 1.28 ...
##  $ rotation: num [1:30, 1:30] -0.219 -0.104 -0.228 -0.221 -0.143 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   .. ..$ : chr [1:30] "PC1" "PC2" "PC3" "PC4" ...
##  $ center  : Named num [1:30] 14.1273 19.2896 91.969 654.8891 0.0964 ...
##   ..- attr(*, "names")= chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##  $ scale   : Named num [1:30] 3.524 4.301 24.299 351.9141 0.0141 ...
##   ..- attr(*, "names")= chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##  $ x       : num [1:569, 1:30] -9.18 -2.39 -5.73 -7.12 -3.93 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:30] "PC1" "PC2" "PC3" "PC4" ...
##  - attr(*, "class")= chr "prcomp"
summary(all_pca_1)
## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6     PC7
## Standard deviation     3.6444 2.3857 1.67867 1.40735 1.28403 1.09880 0.82172
## Proportion of Variance 0.4427 0.1897 0.09393 0.06602 0.05496 0.04025 0.02251
## Cumulative Proportion  0.4427 0.6324 0.72636 0.79239 0.84734 0.88759 0.91010
##                            PC8    PC9    PC10   PC11    PC12    PC13    PC14
## Standard deviation     0.69037 0.6457 0.59219 0.5421 0.51104 0.49128 0.39624
## Proportion of Variance 0.01589 0.0139 0.01169 0.0098 0.00871 0.00805 0.00523
## Cumulative Proportion  0.92598 0.9399 0.95157 0.9614 0.97007 0.97812 0.98335
##                           PC15    PC16    PC17    PC18    PC19    PC20   PC21
## Standard deviation     0.30681 0.28260 0.24372 0.22939 0.22244 0.17652 0.1731
## Proportion of Variance 0.00314 0.00266 0.00198 0.00175 0.00165 0.00104 0.0010
## Cumulative Proportion  0.98649 0.98915 0.99113 0.99288 0.99453 0.99557 0.9966
##                           PC22    PC23   PC24    PC25    PC26    PC27    PC28
## Standard deviation     0.16565 0.15602 0.1344 0.12442 0.09043 0.08307 0.03987
## Proportion of Variance 0.00091 0.00081 0.0006 0.00052 0.00027 0.00023 0.00005
## Cumulative Proportion  0.99749 0.99830 0.9989 0.99942 0.99969 0.99992 0.99997
##                           PC29    PC30
## Standard deviation     0.02736 0.01153
## Proportion of Variance 0.00002 0.00000
## Cumulative Proportion  1.00000 1.00000
Mean
mean_pca_1 <- prcomp(wbcd[,c(2:11)], scale. = TRUE)
class(mean_pca_1)
## [1] "prcomp"
str(mean_pca_1)
## List of 5
##  $ sdev    : num [1:10] 2.341 1.587 0.938 0.706 0.61 ...
##  $ rotation: num [1:10, 1:10] -0.364 -0.154 -0.376 -0.364 -0.232 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   .. ..$ : chr [1:10] "PC1" "PC2" "PC3" "PC4" ...
##  $ center  : Named num [1:10] 14.1273 19.2896 91.969 654.8891 0.0964 ...
##   ..- attr(*, "names")= chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##  $ scale   : Named num [1:10] 3.524 4.301 24.299 351.9141 0.0141 ...
##   ..- attr(*, "names")= chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##  $ x       : num [1:569, 1:10] -5.22 -1.73 -3.97 -3.59 -3.15 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:10] "PC1" "PC2" "PC3" "PC4" ...
##  - attr(*, "class")= chr "prcomp"
summary(mean_pca_1)
## Importance of components:
##                           PC1    PC2     PC3    PC4     PC5     PC6     PC7
## Standard deviation     2.3406 1.5870 0.93841 0.7064 0.61036 0.35234 0.28299
## Proportion of Variance 0.5479 0.2519 0.08806 0.0499 0.03725 0.01241 0.00801
## Cumulative Proportion  0.5479 0.7997 0.88779 0.9377 0.97495 0.98736 0.99537
##                            PC8     PC9    PC10
## Standard deviation     0.18679 0.10552 0.01680
## Proportion of Variance 0.00349 0.00111 0.00003
## Cumulative Proportion  0.99886 0.99997 1.00000
SE
se_pca_1 <- prcomp(wbcd[,c(12:21)], scale. = TRUE)
class(se_pca_1)
## [1] "prcomp"
str(se_pca_1)
## List of 5
##  $ sdev    : num [1:10] 2.178 1.441 1.124 0.771 0.76 ...
##  $ rotation: num [1:10, 1:10] -0.346 -0.189 -0.357 -0.304 -0.212 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
##   .. ..$ : chr [1:10] "PC1" "PC2" "PC3" "PC4" ...
##  $ center  : Named num [1:10] 0.40517 1.21685 2.86606 40.33708 0.00704 ...
##   ..- attr(*, "names")= chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
##  $ scale   : Named num [1:10] 0.277 0.552 2.022 45.491 0.003 ...
##   ..- attr(*, "names")= chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
##  $ x       : num [1:569, 1:10] -4.05 0.34 -1.96 -3.79 -2.22 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:10] "PC1" "PC2" "PC3" "PC4" ...
##  - attr(*, "class")= chr "prcomp"
summary(se_pca_1)
## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5     PC6     PC7
## Standard deviation     2.1779 1.4406 1.1245 0.77095 0.75991 0.57939 0.43512
## Proportion of Variance 0.4743 0.2075 0.1264 0.05944 0.05775 0.03357 0.01893
## Cumulative Proportion  0.4743 0.6819 0.8083 0.86774 0.92548 0.95905 0.97798
##                           PC8     PC9    PC10
## Standard deviation     0.3962 0.20436 0.14635
## Proportion of Variance 0.0157 0.00418 0.00214
## Cumulative Proportion  0.9937 0.99786 1.00000
Worst
worst_pca_1 <- prcomp(wbcd[,c(22:31)], scale. = TRUE)
class(worst_pca_1)
## [1] "prcomp"
str(worst_pca_1)
## List of 5
##  $ sdev    : num [1:10] 2.387 1.444 0.896 0.735 0.717 ...
##  $ rotation: num [1:10, 1:10] -0.336 -0.201 -0.348 -0.325 -0.249 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
##   .. ..$ : chr [1:10] "PC1" "PC2" "PC3" "PC4" ...
##  $ center  : Named num [1:10] 16.269 25.677 107.261 880.583 0.132 ...
##   ..- attr(*, "names")= chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
##  $ scale   : Named num [1:10] 4.8332 6.1463 33.6025 569.357 0.0228 ...
##   ..- attr(*, "names")= chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
##  $ x       : num [1:569, 1:10] -5.97 -1.82 -3.4 -6.3 -1.15 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:10] "PC1" "PC2" "PC3" "PC4" ...
##  - attr(*, "class")= chr "prcomp"
summary(worst_pca_1)
## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6     PC7
## Standard deviation     2.3869 1.4443 0.89597 0.73531 0.71741 0.42862 0.28959
## Proportion of Variance 0.5697 0.2086 0.08028 0.05407 0.05147 0.01837 0.00839
## Cumulative Proportion  0.5697 0.7783 0.85860 0.91267 0.96413 0.98251 0.99089
##                            PC8     PC9    PC10
## Standard deviation     0.26802 0.12343 0.06326
## Proportion of Variance 0.00718 0.00152 0.00040
## Cumulative Proportion  0.99808 0.99960 1.00000

The function class() showcases that the object created by using the prcomp() function is a list of class “prcomp”. Said object/list contains the following components, as illustrated by the function str():
  • sdev: the standard deviations of the principal components (i.e., the square roots of the eigenvalues of the covariance/correlation matrix, though the calculation is actually done with the singular values of the data matrix).
  • rotation: the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors); it is the equivalent of the loadings component of the princomp() function.
  • center: the centering used, if any, or FALSE otherwise.
  • scale: the scaling used, if any, or FALSE otherwise.
  • x: if the argument retx is TRUE, then this component holds the value of the rotated data in the form of the centered (and scaled if requested) data multiplied by the rotation matrix.
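
The relationships between these components can be checked directly. This is a minimal sketch, assuming the built-in USArrests data rather than wbcd; each all.equal() call is expected to confirm the corresponding description in the list above.

```r
pca <- prcomp(USArrests, scale. = TRUE)

# center and scale store the column means and standard deviations that were used
all.equal(pca$center, colMeans(USArrests))
all.equal(pca$scale, sapply(USArrests, sd))

# The columns of rotation are orthonormal: t(R) %*% R is the identity matrix
all.equal(crossprod(pca$rotation), diag(ncol(USArrests)),
          check.attributes = FALSE)

# x holds the centered and scaled data multiplied by the rotation matrix
all.equal(pca$x, scale(USArrests) %*% pca$rotation,
          check.attributes = FALSE)
```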

Applying the summary() function to the object returns the standard deviation of each Principal Component as well as two core elements of the PCA: the proportion of variance and the cumulative proportion. The former indicates the share of the total variance in the data explained by each Principal Component, whereas the latter adds up the proportions of variance of all the Principal Components up to and including the one being observed. Taking all_pca_1 as an example, this means that PC1 explains about 44.27% of the variance, PC1 and PC2 together cover 63.24%, PC1~PC3 cover 72.64% and so forth.
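
Both rows of the summary can be derived from sdev alone. The following sketch assumes the built-in USArrests data in place of wbcd; the names variances, prop_var, cum_prop and manual are illustrative.

```r
pca <- prcomp(USArrests, scale. = TRUE)

variances <- pca$sdev^2                   # variance captured by each PC
prop_var  <- variances / sum(variances)   # "Proportion of Variance" row
cum_prop  <- cumsum(prop_var)             # "Cumulative Proportion" row

manual <- rbind("Standard deviation"     = pca$sdev,
                "Proportion of Variance" = prop_var,
                "Cumulative Proportion"  = cum_prop)
colnames(manual) <- colnames(pca$x)

round(manual, 4)
summary(pca)$importance   # same table, as reported by summary()

# Smallest number of components needed to reach 90% of the variance
which(cum_prop >= 0.9)[1]
```

The last line shows one common use of the cumulative proportion: choosing how many components to keep for a given share of the variance.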

Let’s now repeat this exercise with the princomp() function and examine the results:

All
all_pca_2 <- princomp(wbcd[,-1], cor = TRUE)
class(all_pca_2)
## [1] "princomp"
str(all_pca_2)
## List of 7
##  $ sdev    : Named num [1:30] 3.64 2.39 1.68 1.41 1.28 ...
##   ..- attr(*, "names")= chr [1:30] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
##  $ loadings: 'loadings' num [1:30, 1:30] 0.219 0.104 0.228 0.221 0.143 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   .. ..$ : chr [1:30] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
##  $ center  : Named num [1:30] 14.1273 19.2896 91.969 654.8891 0.0964 ...
##   ..- attr(*, "names")= chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##  $ scale   : Named num [1:30] 3.521 4.2973 24.2776 351.6048 0.0141 ...
##   ..- attr(*, "names")= chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##  $ n.obs   : int 569
##  $ scores  : num [1:569, 1:30] 9.19 2.39 5.73 7.12 3.94 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:30] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
##  $ call    : language princomp(x = wbcd[, -1], cor = TRUE)
##  - attr(*, "class")= chr "princomp"
summary(all_pca_2)
## Importance of components:
##                           Comp.1    Comp.2     Comp.3     Comp.4     Comp.5
## Standard deviation     3.6443940 2.3856560 1.67867477 1.40735229 1.28402903
## Proportion of Variance 0.4427203 0.1897118 0.09393163 0.06602135 0.05495768
## Cumulative Proportion  0.4427203 0.6324321 0.72636371 0.79238506 0.84734274
##                            Comp.6     Comp.7     Comp.8     Comp.9    Comp.10
## Standard deviation     1.09879780 0.82171778 0.69037464 0.64567392 0.59219377
## Proportion of Variance 0.04024522 0.02250734 0.01588724 0.01389649 0.01168978
## Cumulative Proportion  0.88758796 0.91009530 0.92598254 0.93987903 0.95156881
##                           Comp.11     Comp.12    Comp.13     Comp.14
## Standard deviation     0.54213992 0.511039500 0.49128148 0.396244525
## Proportion of Variance 0.00979719 0.008705379 0.00804525 0.005233657
## Cumulative Proportion  0.96136600 0.970071383 0.97811663 0.983350291
##                            Comp.15     Comp.16     Comp.17     Comp.18
## Standard deviation     0.306814219 0.282600072 0.243719178 0.229387845
## Proportion of Variance 0.003137832 0.002662093 0.001979968 0.001753959
## Cumulative Proportion  0.986488123 0.989150216 0.991130184 0.992884143
##                            Comp.19     Comp.20      Comp.21      Comp.22
## Standard deviation     0.222435590 0.176520261 0.1731268145 0.1656484305
## Proportion of Variance 0.001649253 0.001038647 0.0009990965 0.0009146468
## Cumulative Proportion  0.994533397 0.995572043 0.9965711397 0.9974857865
##                             Comp.23      Comp.24      Comp.25     Comp.26
## Standard deviation     0.1560155049 0.1343689213 0.1244237573 0.090430304
## Proportion of Variance 0.0008113613 0.0006018336 0.0005160424 0.000272588
## Cumulative Proportion  0.9982971477 0.9988989813 0.9994150237 0.999687612
##                             Comp.27      Comp.28      Comp.29      Comp.30
## Standard deviation     0.0830690308 3.986650e-02 0.0273642668 1.153451e-02
## Proportion of Variance 0.0002300155 5.297793e-05 0.0000249601 4.434827e-06
## Cumulative Proportion  0.9999176271 9.999706e-01 0.9999955652 1.000000e+00
Mean
mean_pca_2 <- princomp(wbcd[,c(2:11)], cor = TRUE)
class(mean_pca_2)
## [1] "princomp"
str(mean_pca_2)
## List of 7
##  $ sdev    : Named num [1:10] 2.341 1.587 0.938 0.706 0.61 ...
##   ..- attr(*, "names")= chr [1:10] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
##  $ loadings: 'loadings' num [1:10, 1:10] 0.364 0.154 0.376 0.364 0.232 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   .. ..$ : chr [1:10] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
##  $ center  : Named num [1:10] 14.1273 19.2896 91.969 654.8891 0.0964 ...
##   ..- attr(*, "names")= chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##  $ scale   : Named num [1:10] 3.521 4.2973 24.2776 351.6048 0.0141 ...
##   ..- attr(*, "names")= chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##  $ n.obs   : int 569
##  $ scores  : num [1:569, 1:10] 5.22 1.73 3.97 3.6 3.15 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:10] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
##  $ call    : language princomp(x = wbcd[, c(2:11)], cor = TRUE)
##  - attr(*, "class")= chr "princomp"
summary(mean_pca_2)
## Importance of components:
##                           Comp.1    Comp.2     Comp.3     Comp.4     Comp.5
## Standard deviation     2.3406384 1.5870456 0.93841099 0.70640600 0.61035989
## Proportion of Variance 0.5478588 0.2518714 0.08806152 0.04990094 0.03725392
## Cumulative Proportion  0.5478588 0.7997302 0.88779168 0.93769262 0.97494654
##                            Comp.6      Comp.7      Comp.8      Comp.9
## Standard deviation     0.35233755 0.282993481 0.186788096 0.105524692
## Proportion of Variance 0.01241417 0.008008531 0.003488979 0.001113546
## Cumulative Proportion  0.98736071 0.995369244 0.998858223 0.999971769
##                             Comp.10
## Standard deviation     1.680196e-02
## Proportion of Variance 2.823059e-05
## Cumulative Proportion  1.000000e+00
SE
se_pca_2 <- princomp(wbcd[,c(12:21)], cor = TRUE)
class(se_pca_2)
## [1] "princomp"
str(se_pca_2)
## List of 7
##  $ sdev    : Named num [1:10] 2.178 1.441 1.124 0.771 0.76 ...
##   ..- attr(*, "names")= chr [1:10] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
##  $ loadings: 'loadings' num [1:10, 1:10] 0.346 0.189 0.357 0.304 0.212 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
##   .. ..$ : chr [1:10] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
##  $ center  : Named num [1:10] 0.40517 1.21685 2.86606 40.33708 0.00704 ...
##   ..- attr(*, "names")= chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
##  $ scale   : Named num [1:10] 0.277 0.551 2.02 45.451 0.003 ...
##   ..- attr(*, "names")= chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
##  $ n.obs   : int 569
##  $ scores  : num [1:569, 1:10] 4.053 -0.341 1.961 3.795 2.219 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:10] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
##  $ call    : language princomp(x = wbcd[, c(12:21)], cor = TRUE)
##  - attr(*, "class")= chr "princomp"
summary(se_pca_2)
## Importance of components:
##                          Comp.1    Comp.2    Comp.3     Comp.4     Comp.5
## Standard deviation     2.177928 1.4405579 1.1244649 0.77094730 0.75991287
## Proportion of Variance 0.474337 0.2075207 0.1264421 0.05943597 0.05774676
## Cumulative Proportion  0.474337 0.6818577 0.8082998 0.86773580 0.92548255
##                            Comp.6     Comp.7     Comp.8     Comp.9     Comp.10
## Standard deviation     0.57939475 0.43511509 0.39619334 0.20436292 0.146347852
## Proportion of Variance 0.03356983 0.01893251 0.01569692 0.00417642 0.002141769
## Cumulative Proportion  0.95905238 0.97798489 0.99368181 0.99785823 1.000000000
Worst
worst_pca_2 <- princomp(wbcd[,c(22:31)], cor = TRUE)
class(worst_pca_2)
## [1] "princomp"
str(worst_pca_2)
## List of 7
##  $ sdev    : Named num [1:10] 2.387 1.444 0.896 0.735 0.717 ...
##   ..- attr(*, "names")= chr [1:10] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
##  $ loadings: 'loadings' num [1:10, 1:10] 0.336 0.201 0.348 0.325 0.249 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
##   .. ..$ : chr [1:10] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
##  $ center  : Named num [1:10] 16.269 25.677 107.261 880.583 0.132 ...
##   ..- attr(*, "names")= chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
##  $ scale   : Named num [1:10] 4.829 6.1409 33.573 568.8565 0.0228 ...
##   ..- attr(*, "names")= chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
##  $ n.obs   : int 569
##  $ scores  : num [1:569, 1:10] 5.97 1.82 3.41 6.3 1.15 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:10] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
##  $ call    : language princomp(x = wbcd[, c(22:31)], cor = TRUE)
##  - attr(*, "class")= chr "princomp"
summary(worst_pca_2)
## Importance of components:
##                           Comp.1    Comp.2     Comp.3     Comp.4     Comp.5
## Standard deviation     2.3868885 1.4442930 0.89597293 0.73531379 0.71740732
## Proportion of Variance 0.5697237 0.2085982 0.08027675 0.05406864 0.05146733
## Cumulative Proportion  0.5697237 0.7783219 0.85859864 0.91266728 0.96413461
##                            Comp.6      Comp.7     Comp.8      Comp.9
## Standard deviation     0.42862478 0.289591321 0.26801978 0.123428309
## Proportion of Variance 0.01837192 0.008386313 0.00718346 0.001523455
## Cumulative Proportion  0.98250653 0.990892839 0.99807630 0.999599754
##                             Comp.10
## Standard deviation     0.0632649639
## Proportion of Variance 0.0004002456
## Cumulative Proportion  1.0000000000

The class() function shows that the object created by princomp() is a list of class “princomp”. As str() illustrates, this list contains the following components:
  • sdev: the standard deviations of the principal components.
  • loadings: the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors); it is the equivalent of the rotation component of the prcomp() function.
  • center: the means that were subtracted.
  • scale: the scalings applied to each variable.
  • n.obs: the number of observations.
  • scores: the scores of the supplied data on the principal components (only present if scores = TRUE, which is the default).
  • call: the matched call.
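  • na.action: the action taken on missing values, present only if relevant (it is absent here, which is why str() reports a list of 7).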

As can be observed, the objects created by prcomp() and princomp() differ both in class and components.
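
The difference can be checked directly. The following is a minimal sketch that uses the built-in USArrests data set as a stand-in for wbcd (an assumption made only so that the snippet is self-contained); the object names pr_obj and pc_obj are likewise illustrative.

```r
# Illustrative data: USArrests ships with R (stand-in for wbcd)
pr_obj <- prcomp(USArrests, center = TRUE, scale. = TRUE)
pc_obj <- princomp(USArrests, cor = TRUE)

# The two objects have different classes...
class(pr_obj)  # "prcomp"
class(pc_obj)  # "princomp"

# ...and different component names
names(pr_obj)
names(pc_obj)

# 'rotation' (prcomp) and 'loadings' (princomp) hold the same eigenvectors;
# individual columns may differ in sign, so absolute values are compared
rotation_abs <- abs(unname(pr_obj$rotation))
loadings_abs <- abs(unname(unclass(pc_obj$loadings)))
all.equal(rotation_abs, loadings_abs)
```

Comparing absolute values is a deliberate simplification: the sign of an eigenvector is arbitrary, so two correct implementations may return columns that are mirror images of each other.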

Once again, applying summary() to the object returns the standard deviation of each Principal Component, along with its proportion of variance and the cumulative proportion. It is worth noting that the cumulative proportion obtained with princomp() is identical to the one obtained with prcomp(), as it should be.
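
A quick way to verify this agreement is sketched below, again on the built-in USArrests data (an illustrative stand-in for wbcd; the names prop_var, cum_prcomp and cum_princomp are hypothetical). It assumes that prcomp() is called with scale. = TRUE and princomp() with cor = TRUE, so that both functions decompose the correlation matrix.

```r
pr_obj <- prcomp(USArrests, center = TRUE, scale. = TRUE)
pc_obj <- princomp(USArrests, cor = TRUE)

# Proportion of variance = squared standard deviation / total variance
prop_var <- function(sdev) sdev^2 / sum(sdev^2)

cum_prcomp   <- cumsum(prop_var(unname(pr_obj$sdev)))
cum_princomp <- cumsum(prop_var(unname(pc_obj$sdev)))

# TRUE when both functions report the same cumulative proportions
all.equal(cum_prcomp, cum_princomp)

# The same figures (rounded) are stored in the summary of the prcomp object
summary(pr_obj)$importance["Cumulative Proportion", ]
```

Computing the proportions by hand from sdev, rather than reading them off the printed summaries, avoids depending on how each summary() method formats its output.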

2.2.2. PCA() from FactoMineR

The PCA() function from the FactoMineR package has the following signature:

# Do not run this code snippet, as it is only here for illustration purposes
library(FactoMineR)
PCA(X, 
    scale.unit = TRUE, 
    ncp = 5,
    ind.sup = NULL, 
    quanti.sup = NULL, 
    quali.sup = NULL, 
    row.w = NULL, 
    col.w = NULL,
    graph = TRUE,
    axes = c(1,2)
    )
Let’s detail its most notable arguments:
  • X: the numeric matrix or dataset/dataframe upon which to perform the PCA.
  • scale.unit: TRUE or FALSE determines whether or not to scale (i.e. standardize) the dataset/dataframe variables.
  • ncp: the number of dimensions kept in the final results. For illustration purposes, all the PCA() calls below request as many dimensions (Principal Components) as there are original variables, since that is how the functions covered so far behave. Because Principal Components are sorted by relevance, the extra PCs can always be ignored later on, so choosing a large-ish number at this point makes little difference (a value larger than the number of variables is simply reduced to that number, as the ncp entry in the str() output of the 10-variable subsets shows).
  • ind.sup: a vector indicating the indexes of the supplementary individuals.
  • quanti.sup: a vector indicating the indexes of the quantitative supplementary variables.
  • quali.sup: a vector indicating the indexes of the categorical supplementary variables.
  • row.w: optional row weights.
  • col.w: optional column weights.
  • graph: TRUE or FALSE determines whether or not to display the PCA’s associated graph.
  • axes: a length 2 vector specifying the components to plot.

More information regarding the PCA() function and its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/FactoMineR/versions/2.4/topics/PCA
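
As a small, hedged illustration of these arguments, the sketch below runs PCA() on the built-in iris data set rather than on wbcd (an assumption made to keep the snippet self-contained; the object name iris_pca is hypothetical). The categorical Species column is declared as a supplementary qualitative variable through quali.sup, so it does not take part in the computation of the components.

```r
library(FactoMineR)

# Columns 1:4 of iris are numeric; column 5 (Species) is a factor
iris_pca <- PCA(iris,
                scale.unit = TRUE,  # standardize the numeric variables
                ncp = 2,            # keep only the first two dimensions
                quali.sup = 5,      # Species is supplementary, not active
                graph = FALSE)      # do not draw the default plots

# Eigenvalues with their percentage and cumulative percentage of variance
iris_pca$eig

# Coordinates of the individuals on the retained dimensions
dim(iris_pca$ind$coord)
```

Note that ncp only limits how many dimensions are stored for the individuals and variables; the eig table still lists every eigenvalue, so the share of variance left out by a small ncp remains visible.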

Let’s now evaluate the results when applying PCA() to the Wisconsin Breast Cancer Dataset:

All
all_pca_3 <- PCA(wbcd[,-1], scale.unit = TRUE, ncp = 30, graph = FALSE)
class(all_pca_3)
## [1] "PCA"  "list"
str(all_pca_3)
## List of 5
##  $ eig : num [1:30, 1:3] 13.28 5.69 2.82 1.98 1.65 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:30] "comp 1" "comp 2" "comp 3" "comp 4" ...
##   .. ..$ : chr [1:3] "eigenvalue" "percentage of variance" "cumulative percentage of variance"
##  $ var :List of 4
##   ..$ coord  : num [1:30, 1:30] 0.798 0.378 0.829 0.805 0.52 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   .. .. ..$ : chr [1:30] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##   ..$ cor    : num [1:30, 1:30] 0.798 0.378 0.829 0.805 0.52 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   .. .. ..$ : chr [1:30] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##   ..$ cos2   : num [1:30, 1:30] 0.636 0.143 0.688 0.649 0.27 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   .. .. ..$ : chr [1:30] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##   ..$ contrib: num [1:30, 1:30] 4.79 1.08 5.18 4.88 2.03 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   .. .. ..$ : chr [1:30] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##  $ ind :List of 4
##   ..$ coord  : num [1:569, 1:30] 9.19 2.39 5.73 7.12 3.94 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
##   .. .. ..$ : chr [1:30] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##   ..$ cos2   : num [1:569, 1:30] 0.737 0.216 0.878 0.259 0.45 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
##   .. .. ..$ : chr [1:30] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##   ..$ contrib: num [1:569, 1:30] 1.1182 0.0754 0.435 0.6714 0.2049 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
##   .. .. ..$ : chr [1:30] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##   ..$ dist   : Named num [1:569] 10.71 5.13 6.12 13.99 5.87 ...
##   .. ..- attr(*, "names")= chr [1:569] "1" "2" "3" "4" ...
##  $ svd :List of 3
##   ..$ vs: num [1:30] 3.64 2.39 1.68 1.41 1.28 ...
##   ..$ U : num [1:569, 1:30] 2.522 0.655 1.573 1.954 1.08 ...
##   ..$ V : num [1:30, 1:30] 0.219 0.104 0.228 0.221 0.143 ...
##  $ call:List of 9
##   ..$ row.w     : num [1:569] 0.00176 0.00176 0.00176 0.00176 0.00176 ...
##   ..$ col.w     : num [1:30] 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ scale.unit: logi TRUE
##   ..$ ncp       : num 30
##   ..$ centre    : num [1:30] 14.1273 19.2896 91.969 654.8891 0.0964 ...
##   ..$ ecart.type: num [1:30] 3.521 4.2973 24.2776 351.6048 0.0141 ...
##   ..$ X         :'data.frame':   569 obs. of  30 variables:
##   .. ..$ radius_mean            : num [1:569] 18 20.6 19.7 11.4 20.3 ...
##   .. ..$ texture_mean           : num [1:569] 10.4 17.8 21.2 20.4 14.3 ...
##   .. ..$ perimeter_mean         : num [1:569] 122.8 132.9 130 77.6 135.1 ...
##   .. ..$ area_mean              : num [1:569] 1001 1326 1203 386 1297 ...
##   .. ..$ smoothness_mean        : num [1:569] 0.1184 0.0847 0.1096 0.1425 0.1003 ...
##   .. ..$ compactness_mean       : num [1:569] 0.2776 0.0786 0.1599 0.2839 0.1328 ...
##   .. ..$ concavity_mean         : num [1:569] 0.3001 0.0869 0.1974 0.2414 0.198 ...
##   .. ..$ concave points_mean    : num [1:569] 0.1471 0.0702 0.1279 0.1052 0.1043 ...
##   .. ..$ symmetry_mean          : num [1:569] 0.242 0.181 0.207 0.26 0.181 ...
##   .. ..$ fractal_dimension_mean : num [1:569] 0.0787 0.0567 0.06 0.0974 0.0588 ...
##   .. ..$ radius_se              : num [1:569] 1.095 0.543 0.746 0.496 0.757 ...
##   .. ..$ texture_se             : num [1:569] 0.905 0.734 0.787 1.156 0.781 ...
##   .. ..$ perimeter_se           : num [1:569] 8.59 3.4 4.58 3.44 5.44 ...
##   .. ..$ area_se                : num [1:569] 153.4 74.1 94 27.2 94.4 ...
##   .. ..$ smoothness_se          : num [1:569] 0.0064 0.00522 0.00615 0.00911 0.01149 ...
##   .. ..$ compactness_se         : num [1:569] 0.049 0.0131 0.0401 0.0746 0.0246 ...
##   .. ..$ concavity_se           : num [1:569] 0.0537 0.0186 0.0383 0.0566 0.0569 ...
##   .. ..$ concave points_se      : num [1:569] 0.0159 0.0134 0.0206 0.0187 0.0188 ...
##   .. ..$ symmetry_se            : num [1:569] 0.03 0.0139 0.0225 0.0596 0.0176 ...
##   .. ..$ fractal_dimension_se   : num [1:569] 0.00619 0.00353 0.00457 0.00921 0.00511 ...
##   .. ..$ radius_worst           : num [1:569] 25.4 25 23.6 14.9 22.5 ...
##   .. ..$ texture_worst          : num [1:569] 17.3 23.4 25.5 26.5 16.7 ...
##   .. ..$ perimeter_worst        : num [1:569] 184.6 158.8 152.5 98.9 152.2 ...
##   .. ..$ area_worst             : num [1:569] 2019 1956 1709 568 1575 ...
##   .. ..$ smoothness_worst       : num [1:569] 0.162 0.124 0.144 0.21 0.137 ...
##   .. ..$ compactness_worst      : num [1:569] 0.666 0.187 0.424 0.866 0.205 ...
##   .. ..$ concavity_worst        : num [1:569] 0.712 0.242 0.45 0.687 0.4 ...
##   .. ..$ concave points_worst   : num [1:569] 0.265 0.186 0.243 0.258 0.163 ...
##   .. ..$ symmetry_worst         : num [1:569] 0.46 0.275 0.361 0.664 0.236 ...
##   .. ..$ fractal_dimension_worst: num [1:569] 0.1189 0.089 0.0876 0.173 0.0768 ...
##   ..$ row.w.init: num [1:569] 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ call      : language PCA(X = wbcd[, -1], scale.unit = TRUE, ncp = 30, graph = FALSE)
##  - attr(*, "class")= chr [1:2] "PCA" "list"
summary(all_pca_3)
## 
## Call:
## PCA(X = wbcd[, -1], scale.unit = TRUE, ncp = 30, graph = FALSE) 
## 
## 
## Eigenvalues
##                        Dim.1   Dim.2   Dim.3   Dim.4   Dim.5   Dim.6   Dim.7
## Variance              13.282   5.691   2.818   1.981   1.649   1.207   0.675
## % of var.             44.272  18.971   9.393   6.602   5.496   4.025   2.251
## Cumulative % of var.  44.272  63.243  72.636  79.239  84.734  88.759  91.010
##                        Dim.8   Dim.9  Dim.10  Dim.11  Dim.12  Dim.13  Dim.14
## Variance               0.477   0.417   0.351   0.294   0.261   0.241   0.157
## % of var.              1.589   1.390   1.169   0.980   0.871   0.805   0.523
## Cumulative % of var.  92.598  93.988  95.157  96.137  97.007  97.812  98.335
##                       Dim.15  Dim.16  Dim.17  Dim.18  Dim.19  Dim.20  Dim.21
## Variance               0.094   0.080   0.059   0.053   0.049   0.031   0.030
## % of var.              0.314   0.266   0.198   0.175   0.165   0.104   0.100
## Cumulative % of var.  98.649  98.915  99.113  99.288  99.453  99.557  99.657
##                       Dim.22  Dim.23  Dim.24  Dim.25  Dim.26  Dim.27  Dim.28
## Variance               0.027   0.024   0.018   0.015   0.008   0.007   0.002
## % of var.              0.091   0.081   0.060   0.052   0.027   0.023   0.005
## Cumulative % of var.  99.749  99.830  99.890  99.942  99.969  99.992  99.997
##                       Dim.29  Dim.30
## Variance               0.001   0.000
## % of var.              0.002   0.000
## Cumulative % of var. 100.000 100.000
## 
## Individuals (the 10 first)
##                             Dist    Dim.1    ctr   cos2    Dim.2    ctr   cos2
## 1                       | 10.710 |  9.193  1.118  0.737 |  1.949  0.117  0.033
## 2                       |  5.132 |  2.388  0.075  0.216 | -3.768  0.438  0.539
## 3                       |  6.119 |  5.734  0.435  0.878 | -1.075  0.036  0.031
## 4                       | 13.986 |  7.123  0.671  0.259 | 10.276  3.261  0.540
## 5                       |  5.868 |  3.935  0.205  0.450 | -1.948  0.117  0.110
## 6                       |  5.735 |  2.380  0.075  0.172 |  3.950  0.482  0.474
## 7                       |  3.970 |  2.239  0.066  0.318 | -2.690  0.223  0.459
## 8                       |  4.195 |  2.143  0.061  0.261 |  2.340  0.169  0.311
## 9                       |  6.017 |  3.175  0.133  0.278 |  3.392  0.355  0.318
## 10                      | 12.163 |  6.352  0.534  0.273 |  7.727  1.844  0.404
##                            Dim.3    ctr   cos2  
## 1                       | -1.123  0.079  0.011 |
## 2                       | -0.529  0.017  0.011 |
## 3                       | -0.552  0.019  0.008 |
## 4                       | -3.233  0.652  0.053 |
## 5                       |  1.390  0.120  0.056 |
## 6                       | -2.935  0.537  0.262 |
## 7                       | -1.640  0.168  0.171 |
## 8                       | -0.872  0.047  0.043 |
## 9                       | -3.120  0.607  0.269 |
## 10                      | -4.342  1.176  0.127 |
## 
## Variables (the 10 first)
##                            Dim.1    ctr   cos2    Dim.2    ctr   cos2    Dim.3
## radius_mean             |  0.798  4.792  0.636 | -0.558  5.469  0.311 | -0.014
## texture_mean            |  0.378  1.076  0.143 | -0.142  0.356  0.020 |  0.108
## perimeter_mean          |  0.829  5.177  0.688 | -0.513  4.630  0.264 | -0.016
## area_mean               |  0.805  4.884  0.649 | -0.551  5.340  0.304 |  0.048
## smoothness_mean         |  0.520  2.033  0.270 |  0.444  3.464  0.197 | -0.175
## compactness_mean        |  0.872  5.726  0.760 |  0.362  2.307  0.131 | -0.124
## concavity_mean          |  0.942  6.677  0.887 |  0.144  0.362  0.021 |  0.005
## concave points_mean     |  0.951  6.804  0.904 | -0.083  0.121  0.007 | -0.043
## symmetry_mean           |  0.504  1.909  0.254 |  0.454  3.623  0.206 | -0.068
## fractal_dimension_mean  |  0.235  0.414  0.055 |  0.875 13.438  0.765 | -0.038
##                            ctr   cos2  
## radius_mean              0.007  0.000 |
## texture_mean             0.417  0.012 |
## perimeter_mean           0.009  0.000 |
## area_mean                0.082  0.002 |
## smoothness_mean          1.088  0.031 |
## compactness_mean         0.549  0.015 |
## concavity_mean           0.001  0.000 |
## concave points_mean      0.065  0.002 |
## symmetry_mean            0.162  0.005 |
## fractal_dimension_mean   0.051  0.001 |
Mean
mean_pca_3 <- PCA(wbcd[,c(2:11)], scale.unit = TRUE, ncp = 30, graph = FALSE)
class(mean_pca_3)
## [1] "PCA"  "list"
str(mean_pca_3)
## List of 5
##  $ eig : num [1:10, 1:3] 5.479 2.519 0.881 0.499 0.373 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:10] "comp 1" "comp 2" "comp 3" "comp 4" ...
##   .. ..$ : chr [1:3] "eigenvalue" "percentage of variance" "cumulative percentage of variance"
##  $ var :List of 4
##   ..$ coord  : num [1:10, 1:10] 0.852 0.362 0.88 0.852 0.544 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##   ..$ cor    : num [1:10, 1:10] 0.852 0.362 0.88 0.852 0.544 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##   ..$ cos2   : num [1:10, 1:10] 0.726 0.131 0.775 0.726 0.296 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##   ..$ contrib: num [1:10, 1:10] 13.25 2.39 14.14 13.26 5.4 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##  $ ind :List of 4
##   ..$ coord  : num [1:569, 1:10] 5.22 1.73 3.97 3.6 3.15 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
##   .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##   ..$ cos2   : num [1:569, 1:10] 0.609 0.25 0.947 0.208 0.641 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
##   .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##   ..$ contrib: num [1:569, 1:10] 0.8755 0.0958 0.5055 0.415 0.3185 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
##   .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##   ..$ dist   : Named num [1:569] 6.69 3.45 4.08 7.88 3.94 ...
##   .. ..- attr(*, "names")= chr [1:569] "1" "2" "3" "4" ...
##  $ svd :List of 3
##   ..$ vs: num [1:10] 2.341 1.587 0.938 0.706 0.61 ...
##   ..$ U : num [1:569, 1:10] 2.232 0.738 1.696 1.537 1.346 ...
##   ..$ V : num [1:10, 1:10] 0.364 0.154 0.376 0.364 0.232 ...
##  $ call:List of 9
##   ..$ row.w     : num [1:569] 0.00176 0.00176 0.00176 0.00176 0.00176 ...
##   ..$ col.w     : num [1:10] 1 1 1 1 1 1 1 1 1 1
##   ..$ scale.unit: logi TRUE
##   ..$ ncp       : num 10
##   ..$ centre    : num [1:10] 14.1273 19.2896 91.969 654.8891 0.0964 ...
##   ..$ ecart.type: num [1:10] 3.521 4.2973 24.2776 351.6048 0.0141 ...
##   ..$ X         :'data.frame':   569 obs. of  10 variables:
##   .. ..$ radius_mean           : num [1:569] 18 20.6 19.7 11.4 20.3 ...
##   .. ..$ texture_mean          : num [1:569] 10.4 17.8 21.2 20.4 14.3 ...
##   .. ..$ perimeter_mean        : num [1:569] 122.8 132.9 130 77.6 135.1 ...
##   .. ..$ area_mean             : num [1:569] 1001 1326 1203 386 1297 ...
##   .. ..$ smoothness_mean       : num [1:569] 0.1184 0.0847 0.1096 0.1425 0.1003 ...
##   .. ..$ compactness_mean      : num [1:569] 0.2776 0.0786 0.1599 0.2839 0.1328 ...
##   .. ..$ concavity_mean        : num [1:569] 0.3001 0.0869 0.1974 0.2414 0.198 ...
##   .. ..$ concave points_mean   : num [1:569] 0.1471 0.0702 0.1279 0.1052 0.1043 ...
##   .. ..$ symmetry_mean         : num [1:569] 0.242 0.181 0.207 0.26 0.181 ...
##   .. ..$ fractal_dimension_mean: num [1:569] 0.0787 0.0567 0.06 0.0974 0.0588 ...
##   ..$ row.w.init: num [1:569] 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ call      : language PCA(X = wbcd[, c(2:11)], scale.unit = TRUE, ncp = 30, graph = FALSE)
##  - attr(*, "class")= chr [1:2] "PCA" "list"
summary(mean_pca_3)
## 
## Call:
## PCA(X = wbcd[, c(2:11)], scale.unit = TRUE, ncp = 30, graph = FALSE) 
## 
## 
## Eigenvalues
##                        Dim.1   Dim.2   Dim.3   Dim.4   Dim.5   Dim.6   Dim.7
## Variance               5.479   2.519   0.881   0.499   0.373   0.124   0.080
## % of var.             54.786  25.187   8.806   4.990   3.725   1.241   0.801
## Cumulative % of var.  54.786  79.973  88.779  93.769  97.495  98.736  99.537
##                        Dim.8   Dim.9  Dim.10
## Variance               0.035   0.011   0.000
## % of var.              0.349   0.111   0.003
## Cumulative % of var.  99.886  99.997 100.000
## 
## Individuals (the 10 first)
##                            Dist    Dim.1    ctr   cos2    Dim.2    ctr   cos2  
## 1                      |  6.692 |  5.224  0.875  0.609 |  3.204  0.716  0.229 |
## 2                      |  3.455 |  1.728  0.096  0.250 | -2.541  0.450  0.541 |
## 3                      |  4.079 |  3.970  0.506  0.947 | -0.550  0.021  0.018 |
## 4                      |  7.878 |  3.597  0.415  0.208 |  6.905  3.327  0.768 |
## 5                      |  3.935 |  3.151  0.319  0.641 | -1.358  0.129  0.119 |
## 6                      |  3.728 |  1.381  0.061  0.137 |  3.314  0.767  0.790 |
## 7                      |  2.238 |  1.602  0.082  0.512 | -1.499  0.157  0.448 |
## 8                      |  2.980 |  1.257  0.051  0.178 |  2.495  0.434  0.701 |
## 9                      |  4.179 |  2.390  0.183  0.327 |  3.275  0.748  0.614 |
## 10                     |  4.815 |  2.445  0.192  0.258 |  3.626  0.917  0.567 |
##                         Dim.3    ctr   cos2  
## 1                      -2.171  0.941  0.105 |
## 2                      -1.020  0.208  0.087 |
## 3                      -0.324  0.021  0.006 |
## 4                       0.793  0.125  0.010 |
## 5                      -1.862  0.692  0.224 |
## 6                      -0.698  0.097  0.035 |
## 7                      -0.353  0.025  0.025 |
## 8                       0.414  0.034  0.019 |
## 9                       0.622  0.077  0.022 |
## 10                      1.449  0.419  0.091 |
## 
## Variables
##                           Dim.1    ctr   cos2    Dim.2    ctr   cos2    Dim.3
## radius_mean            |  0.852 13.245  0.726 | -0.498  9.855  0.248 | -0.117
## texture_mean           |  0.362  2.386  0.131 | -0.234  2.166  0.055 |  0.892
## perimeter_mean         |  0.880 14.141  0.775 | -0.452  8.103  0.204 | -0.107
## area_mean              |  0.852 13.256  0.726 | -0.484  9.293  0.234 | -0.116
## smoothness_mean        |  0.544  5.405  0.296 |  0.638 16.157  0.407 | -0.156
## compactness_mean       |  0.853 13.282  0.728 |  0.422  7.076  0.178 |  0.055
## concavity_mean         |  0.926 15.662  0.858 |  0.166  1.088  0.027 |  0.039
## concave points_mean    |  0.978 17.476  0.957 |  0.011  0.005  0.000 | -0.064
## symmetry_mean          |  0.504  4.633  0.254 |  0.585 13.565  0.342 |  0.034
## fractal_dimension_mean |  0.168  0.516  0.028 |  0.907 32.692  0.823 |  0.107
##                           ctr   cos2  
## radius_mean             1.548  0.014 |
## texture_mean           90.451  0.797 |
## perimeter_mean          1.302  0.011 |
## area_mean               1.522  0.013 |
## smoothness_mean         2.773  0.024 |
## compactness_mean        0.340  0.003 |
## concavity_mean          0.169  0.001 |
## concave points_mean     0.470  0.004 |
## symmetry_mean           0.135  0.001 |
## fractal_dimension_mean  1.290  0.011 |
SE
se_pca_3 <- PCA(wbcd[,c(12:21)], scale.unit = TRUE, ncp = 30, graph = FALSE)
class(se_pca_3)
## [1] "PCA"  "list"
str(se_pca_3)
## List of 5
##  $ eig : num [1:10, 1:3] 4.743 2.075 1.264 0.594 0.577 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:10] "comp 1" "comp 2" "comp 3" "comp 4" ...
##   .. ..$ : chr [1:3] "eigenvalue" "percentage of variance" "cumulative percentage of variance"
##  $ var :List of 4
##   ..$ coord  : num [1:10, 1:10] 0.753 0.411 0.779 0.662 0.463 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
##   .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##   ..$ cor    : num [1:10, 1:10] 0.753 0.411 0.779 0.662 0.463 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
##   .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##   ..$ cos2   : num [1:10, 1:10] 0.567 0.169 0.606 0.438 0.214 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
##   .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##   ..$ contrib: num [1:10, 1:10] 11.94 3.56 12.78 9.24 4.51 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
##   .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##  $ ind :List of 4
##   ..$ coord  : num [1:569, 1:10] 4.053 -0.341 1.961 3.795 2.219 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
##   .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##   ..$ cos2   : num [1:569, 1:10] 0.6413 0.0341 0.5336 0.3914 0.4996 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
##   .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##   ..$ contrib: num [1:569, 1:10] 0.6086 0.0043 0.1425 0.5336 0.1824 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
##   .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##   ..$ dist   : Named num [1:569] 5.06 1.85 2.68 6.07 3.14 ...
##   .. ..- attr(*, "names")= chr [1:569] "1" "2" "3" "4" ...
##  $ svd :List of 3
##   ..$ vs: num [1:10] 2.178 1.441 1.124 0.771 0.76 ...
##   ..$ U : num [1:569, 1:10] 1.861 -0.156 0.9 1.742 1.019 ...
##   ..$ V : num [1:10, 1:10] 0.346 0.189 0.357 0.304 0.212 ...
##  $ call:List of 9
##   ..$ row.w     : num [1:569] 0.00176 0.00176 0.00176 0.00176 0.00176 ...
##   ..$ col.w     : num [1:10] 1 1 1 1 1 1 1 1 1 1
##   ..$ scale.unit: logi TRUE
##   ..$ ncp       : num 10
##   ..$ centre    : num [1:10] 0.40517 1.21685 2.86606 40.33708 0.00704 ...
##   ..$ ecart.type: num [1:10] 0.277 0.551 2.02 45.451 0.003 ...
##   ..$ X         :'data.frame':   569 obs. of  10 variables:
##   .. ..$ radius_se           : num [1:569] 1.095 0.543 0.746 0.496 0.757 ...
##   .. ..$ texture_se          : num [1:569] 0.905 0.734 0.787 1.156 0.781 ...
##   .. ..$ perimeter_se        : num [1:569] 8.59 3.4 4.58 3.44 5.44 ...
##   .. ..$ area_se             : num [1:569] 153.4 74.1 94 27.2 94.4 ...
##   .. ..$ smoothness_se       : num [1:569] 0.0064 0.00522 0.00615 0.00911 0.01149 ...
##   .. ..$ compactness_se      : num [1:569] 0.049 0.0131 0.0401 0.0746 0.0246 ...
##   .. ..$ concavity_se        : num [1:569] 0.0537 0.0186 0.0383 0.0566 0.0569 ...
##   .. ..$ concave points_se   : num [1:569] 0.0159 0.0134 0.0206 0.0187 0.0188 ...
##   .. ..$ symmetry_se         : num [1:569] 0.03 0.0139 0.0225 0.0596 0.0176 ...
##   .. ..$ fractal_dimension_se: num [1:569] 0.00619 0.00353 0.00457 0.00921 0.00511 ...
##   ..$ row.w.init: num [1:569] 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ call      : language PCA(X = wbcd[, c(12:21)], scale.unit = TRUE, ncp = 30, graph = FALSE)
##  - attr(*, "class")= chr [1:2] "PCA" "list"
summary(se_pca_3)
## 
## Call:
## PCA(X = wbcd[, c(12:21)], scale.unit = TRUE, ncp = 30, graph = FALSE) 
## 
## 
## Eigenvalues
##                        Dim.1   Dim.2   Dim.3   Dim.4   Dim.5   Dim.6   Dim.7
## Variance               4.743   2.075   1.264   0.594   0.577   0.336   0.189
## % of var.             47.434  20.752  12.644   5.944   5.775   3.357   1.893
## Cumulative % of var.  47.434  68.186  80.830  86.774  92.548  95.905  97.798
##                        Dim.8   Dim.9  Dim.10
## Variance               0.157   0.042   0.021
## % of var.              1.570   0.418   0.214
## Cumulative % of var.  99.368  99.786 100.000
## 
## Individuals (the 10 first)
##                          Dist    Dim.1    ctr   cos2    Dim.2    ctr   cos2  
## 1                    |  5.061 |  4.053  0.609  0.641 | -2.587  0.567  0.261 |
## 2                    |  1.845 | -0.341  0.004  0.034 | -1.442  0.176  0.611 |
## 3                    |  2.685 |  1.961  0.142  0.534 | -1.172  0.116  0.190 |
## 4                    |  6.066 |  3.795  0.534  0.391 |  2.660  0.599  0.192 |
## 5                    |  3.139 |  2.219  0.182  0.500 | -1.030  0.090  0.108 |
## 6                    |  1.054 |  0.019  0.000  0.000 |  0.681  0.039  0.417 |
## 7                    |  1.801 | -0.986  0.036  0.300 | -1.279  0.139  0.504 |
## 8                    |  1.517 |  0.873  0.028  0.331 | -0.274  0.006  0.033 |
## 9                    |  0.981 | -0.187  0.001  0.036 |  0.430  0.016  0.192 |
## 10                   |  3.993 |  2.127  0.168  0.284 |  2.428  0.499  0.370 |
##                       Dim.3    ctr   cos2  
## 1                    -0.385  0.021  0.006 |
## 2                    -0.772  0.083  0.175 |
## 3                    -0.966  0.130  0.129 |
## 4                     0.748  0.078  0.015 |
## 5                    -0.403  0.023  0.016 |
## 6                    -0.510  0.036  0.234 |
## 7                    -0.769  0.082  0.182 |
## 8                     0.010  0.000  0.000 |
## 9                    -0.613  0.052  0.390 |
## 10                   -1.471  0.301  0.136 |
## 
## Variables
##                         Dim.1    ctr   cos2    Dim.2    ctr   cos2    Dim.3
## radius_se            |  0.753 11.943  0.567 | -0.634 19.391  0.402 |  0.091
## texture_se           |  0.411  3.557  0.169 |  0.221  2.353  0.049 |  0.665
## perimeter_se         |  0.779 12.779  0.606 | -0.605 17.665  0.367 |  0.066
## area_se              |  0.662  9.243  0.438 | -0.721 25.021  0.519 |  0.028
## smoothness_se        |  0.463  4.514  0.214 |  0.390  7.342  0.152 |  0.481
## compactness_se       |  0.816 14.047  0.666 |  0.350  5.887  0.122 | -0.289
## concavity_se         |  0.774 12.642  0.600 |  0.330  5.250  0.109 | -0.380
## concave points_se    |  0.840 14.880  0.706 |  0.122  0.722  0.015 | -0.258
## symmetry_se          |  0.515  5.585  0.265 |  0.286  3.943  0.082 |  0.494
## fractal_dimension_se |  0.716 10.810  0.513 |  0.508 12.426  0.258 | -0.197
##                         ctr   cos2  
## radius_se             0.653  0.008 |
## texture_se           34.991  0.442 |
## perimeter_se          0.345  0.004 |
## area_se               0.062  0.001 |
## smoothness_se        18.274  0.231 |
## compactness_se        6.595  0.083 |
## concavity_se         11.438  0.145 |
## concave points_se     5.270  0.067 |
## symmetry_se          19.300  0.244 |
## fractal_dimension_se  3.073  0.039 |
Worst
worst_pca_3 <- PCA(wbcd[,c(22:31)], scale.unit = TRUE, ncp = 30, graph = FALSE)
class(worst_pca_3)
## [1] "PCA"  "list"
str(worst_pca_3)
## List of 5
##  $ eig : num [1:10, 1:3] 5.697 2.086 0.803 0.541 0.515 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:10] "comp 1" "comp 2" "comp 3" "comp 4" ...
##   .. ..$ : chr [1:3] "eigenvalue" "percentage of variance" "cumulative percentage of variance"
##  $ var :List of 4
##   ..$ coord  : num [1:10, 1:10] 0.802 0.479 0.831 0.775 0.593 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
##   .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##   ..$ cor    : num [1:10, 1:10] 0.802 0.479 0.831 0.775 0.593 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
##   .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##   ..$ cos2   : num [1:10, 1:10] 0.643 0.23 0.691 0.601 0.352 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
##   .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##   ..$ contrib: num [1:10, 1:10] 11.28 4.03 12.12 10.55 6.18 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
##   .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##  $ ind :List of 4
##   ..$ coord  : num [1:569, 1:10] 5.97 1.82 3.41 6.3 1.15 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
##   .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##   ..$ cos2   : num [1:569, 1:10] 0.805 0.301 0.854 0.411 0.145 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
##   .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##   ..$ contrib: num [1:569, 1:10] 1.1011 0.102 0.3582 1.2262 0.0406 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
##   .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
##   ..$ dist   : Named num [1:569] 6.66 3.32 3.69 9.84 3.01 ...
##   .. ..- attr(*, "names")= chr [1:569] "1" "2" "3" "4" ...
##  $ svd :List of 3
##   ..$ vs: num [1:10] 2.387 1.444 0.896 0.735 0.717 ...
##   ..$ U : num [1:569, 1:10] 2.503 0.762 1.428 2.641 0.48 ...
##   ..$ V : num [1:10, 1:10] 0.336 0.201 0.348 0.325 0.249 ...
##  $ call:List of 9
##   ..$ row.w     : num [1:569] 0.00176 0.00176 0.00176 0.00176 0.00176 ...
##   ..$ col.w     : num [1:10] 1 1 1 1 1 1 1 1 1 1
##   ..$ scale.unit: logi TRUE
##   ..$ ncp       : num 10
##   ..$ centre    : num [1:10] 16.269 25.677 107.261 880.583 0.132 ...
##   ..$ ecart.type: num [1:10] 4.829 6.1409 33.573 568.8565 0.0228 ...
##   ..$ X         :'data.frame':   569 obs. of  10 variables:
##   .. ..$ radius_worst           : num [1:569] 25.4 25 23.6 14.9 22.5 ...
##   .. ..$ texture_worst          : num [1:569] 17.3 23.4 25.5 26.5 16.7 ...
##   .. ..$ perimeter_worst        : num [1:569] 184.6 158.8 152.5 98.9 152.2 ...
##   .. ..$ area_worst             : num [1:569] 2019 1956 1709 568 1575 ...
##   .. ..$ smoothness_worst       : num [1:569] 0.162 0.124 0.144 0.21 0.137 ...
##   .. ..$ compactness_worst      : num [1:569] 0.666 0.187 0.424 0.866 0.205 ...
##   .. ..$ concavity_worst        : num [1:569] 0.712 0.242 0.45 0.687 0.4 ...
##   .. ..$ concave points_worst   : num [1:569] 0.265 0.186 0.243 0.258 0.163 ...
##   .. ..$ symmetry_worst         : num [1:569] 0.46 0.275 0.361 0.664 0.236 ...
##   .. ..$ fractal_dimension_worst: num [1:569] 0.1189 0.089 0.0876 0.173 0.0768 ...
##   ..$ row.w.init: num [1:569] 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ call      : language PCA(X = wbcd[, c(22:31)], scale.unit = TRUE, ncp = 30, graph = FALSE)
##  - attr(*, "class")= chr [1:2] "PCA" "list"
summary(worst_pca_3)
## 
## Call:
## PCA(X = wbcd[, c(22:31)], scale.unit = TRUE, ncp = 30, graph = FALSE) 
## 
## 
## Eigenvalues
##                        Dim.1   Dim.2   Dim.3   Dim.4   Dim.5   Dim.6   Dim.7
## Variance               5.697   2.086   0.803   0.541   0.515   0.184   0.084
## % of var.             56.972  20.860   8.028   5.407   5.147   1.837   0.839
## Cumulative % of var.  56.972  77.832  85.860  91.267  96.413  98.251  99.089
##                        Dim.8   Dim.9  Dim.10
## Variance               0.072   0.015   0.004
## % of var.              0.718   0.152   0.040
## Cumulative % of var.  99.808  99.960 100.000
## 
## Individuals (the 10 first)
##                             Dist    Dim.1    ctr   cos2    Dim.2    ctr   cos2
## 1                       |  6.657 |  5.975  1.101  0.805 |  0.672  0.038  0.010
## 2                       |  3.316 |  1.818  0.102  0.301 | -2.315  0.452  0.487
## 3                       |  3.687 |  3.408  0.358  0.854 | -0.780  0.051  0.045
## 4                       |  9.836 |  6.305  1.226  0.411 |  6.966  4.088  0.502
## 5                       |  3.014 |  1.147  0.041  0.145 | -1.878  0.297  0.388
## 6                       |  4.228 |  2.740  0.232  0.420 |  3.100  0.810  0.538
## 7                       |  2.740 |  2.280  0.160  0.693 | -1.334  0.150  0.237
## 8                       |  2.534 |  1.602  0.079  0.400 |  1.483  0.185  0.343
## 9                       |  4.217 |  3.053  0.288  0.524 |  2.637  0.586  0.391
## 10                      | 10.431 |  7.126  1.567  0.467 |  6.786  3.880  0.423
##                            Dim.3    ctr   cos2  
## 1                       | -2.545  1.418  0.146 |
## 2                       | -0.881  0.170  0.071 |
## 3                       | -0.775  0.132  0.044 |
## 4                       | -0.817  0.146  0.007 |
## 5                       | -1.839  0.740  0.372 |
## 6                       | -0.748  0.122  0.031 |
## 7                       | -0.225  0.011  0.007 |
## 8                       |  0.113  0.003  0.002 |
## 9                       |  0.327  0.023  0.006 |
## 10                      |  1.393  0.425  0.018 |
## 
## Variables
##                            Dim.1    ctr   cos2    Dim.2    ctr   cos2    Dim.3
## radius_worst            |  0.802 11.284  0.643 | -0.582 16.252  0.339 | -0.068
## texture_worst           |  0.479  4.029  0.230 | -0.061  0.181  0.004 |  0.875
## perimeter_worst         |  0.831 12.121  0.691 | -0.542 14.101  0.294 | -0.075
## area_worst              |  0.775 10.546  0.601 | -0.600 17.244  0.360 | -0.071
## smoothness_worst        |  0.593  6.181  0.352 |  0.488 11.416  0.238 | -0.046
## compactness_worst       |  0.870 13.291  0.757 |  0.362  6.278  0.131 | -0.034
## concavity_worst         |  0.894 14.043  0.800 |  0.201  1.934  0.040 | -0.052
## concave points_worst    |  0.949 15.812  0.901 | -0.060  0.174  0.004 | -0.118
## symmetry_worst          |  0.596  6.238  0.355 |  0.446  9.524  0.199 | -0.019
## fractal_dimension_worst |  0.606  6.456  0.368 |  0.691 22.896  0.478 | -0.032
##                            ctr   cos2  
## radius_worst             0.580  0.005 |
## texture_worst           95.418  0.766 |
## perimeter_worst          0.703  0.006 |
## area_worst               0.624  0.005 |
## smoothness_worst         0.265  0.002 |
## compactness_worst        0.145  0.001 |
## concavity_worst          0.343  0.003 |
## concave points_worst     1.747  0.014 |
## symmetry_worst           0.046  0.000 |
## fractal_dimension_worst  0.130  0.001 |

The class() function shows that the object created by PCA() has the classes “PCA” and “list”. It is also worth noting that the cumulative proportions of variance obtained with PCA() (shown when applying the summary() function to an object of this kind) are identical to the ones obtained with the R built-in functions, as they should be.
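The agreement between the functions can be checked directly. The following is a minimal sketch that uses the built-in iris measurements as a stand-in dataset (an assumption made so that the snippet is self-contained; it is not the wbcd data) and compares the cumulative proportions of variance reported by prcomp() and princomp(). The comparison with PCA() only runs if FactoMineR is installed.

```r
# Stand-in data (assumption): the four numeric columns of the built-in iris dataset
X <- iris[, 1:4]

# Standardized PCA with the two built-in functions
pr <- prcomp(X, center = TRUE, scale. = TRUE)
pc <- princomp(X, cor = TRUE)

# Cumulative proportion of variance: cumulative sum of the eigenvalues (sdev^2)
# divided by their total
cum_pr <- cumsum(pr$sdev^2) / sum(pr$sdev^2)
cum_pc <- unname(cumsum(pc$sdev^2) / sum(pc$sdev^2))

same_builtin <- isTRUE(all.equal(cum_pr, cum_pc))
print(same_builtin)

# Optional comparison with FactoMineR::PCA(), skipped if the package is missing
if (requireNamespace("FactoMineR", quietly = TRUE)) {
  fm <- FactoMineR::PCA(X, scale.unit = TRUE, ncp = ncol(X), graph = FALSE)
  cum_fm <- unname(fm$eig[, "cumulative percentage of variance"]) / 100
  print(isTRUE(all.equal(cum_pr, cum_fm)))
}
```

The same check should carry over to wbcd[,-1] or to any of the three variable subsets, since only the input data changes.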

Another aspect worth highlighting is how well organized this function’s results are. Even though the components of the object returned by PCA() can be read in str()’s output (as was the case with the R built-in functions), print() does a better job of showing the object’s components and how the resulting data is stored within them. The following code snippets compare the output of print() when applied to the PCA objects described so far:

prcomp()
print(all_pca_1)
## Standard deviations (1, .., p=30):
##  [1] 3.64439401 2.38565601 1.67867477 1.40735229 1.28402903 1.09879780
##  [7] 0.82171778 0.69037464 0.64567392 0.59219377 0.54213992 0.51103950
## [13] 0.49128148 0.39624453 0.30681422 0.28260007 0.24371918 0.22938785
## [19] 0.22243559 0.17652026 0.17312681 0.16564843 0.15601550 0.13436892
## [25] 0.12442376 0.09043030 0.08306903 0.03986650 0.02736427 0.01153451
## 
## Rotation (n x k) = (30 x 30):
##                                 PC1          PC2          PC3          PC4
## radius_mean             -0.21890244  0.233857132 -0.008531243  0.041408962
## texture_mean            -0.10372458  0.059706088  0.064549903 -0.603050001
## perimeter_mean          -0.22753729  0.215181361 -0.009314220  0.041983099
## area_mean               -0.22099499  0.231076711  0.028699526  0.053433795
## smoothness_mean         -0.14258969 -0.186113023 -0.104291904  0.159382765
## compactness_mean        -0.23928535 -0.151891610 -0.074091571  0.031794581
## concavity_mean          -0.25840048 -0.060165363  0.002733838  0.019122753
## concave points_mean     -0.26085376  0.034767500 -0.025563541  0.065335944
## symmetry_mean           -0.13816696 -0.190348770 -0.040239936  0.067124984
## fractal_dimension_mean  -0.06436335 -0.366575471 -0.022574090  0.048586765
## radius_se               -0.20597878  0.105552152  0.268481387  0.097941242
## texture_se              -0.01742803 -0.089979682  0.374633665 -0.359855528
## perimeter_se            -0.21132592  0.089457234  0.266645367  0.088992415
## area_se                 -0.20286964  0.152292628  0.216006528  0.108205039
## smoothness_se           -0.01453145 -0.204430453  0.308838979  0.044664180
## compactness_se          -0.17039345 -0.232715896  0.154779718 -0.027469363
## concavity_se            -0.15358979 -0.197207283  0.176463743  0.001316880
## concave points_se       -0.18341740 -0.130321560  0.224657567  0.074067335
## symmetry_se             -0.04249842 -0.183848000  0.288584292  0.044073351
## fractal_dimension_se    -0.10256832 -0.280092027  0.211503764  0.015304750
## radius_worst            -0.22799663  0.219866379 -0.047506990  0.015417240
## texture_worst           -0.10446933  0.045467298 -0.042297823 -0.632807885
## perimeter_worst         -0.23663968  0.199878428 -0.048546508  0.013802794
## area_worst              -0.22487053  0.219351858 -0.011902318  0.025894749
## smoothness_worst        -0.12795256 -0.172304352 -0.259797613  0.017652216
## compactness_worst       -0.21009588 -0.143593173 -0.236075625 -0.091328415
## concavity_worst         -0.22876753 -0.097964114 -0.173057335 -0.073951180
## concave points_worst    -0.25088597  0.008257235 -0.170344076  0.006006996
## symmetry_worst          -0.12290456 -0.141883349 -0.271312642 -0.036250695
## fractal_dimension_worst -0.13178394 -0.275339469 -0.232791313 -0.077053470
##                                  PC5           PC6           PC7          PC8
## radius_mean             -0.037786354  0.0187407904 -0.1240883403  0.007452296
## texture_mean             0.049468850 -0.0321788366  0.0113995382 -0.130674825
## perimeter_mean          -0.037374663  0.0173084449 -0.1144770573  0.018687258
## area_mean               -0.010331251 -0.0018877480 -0.0516534275 -0.034673604
## smoothness_mean          0.365088528 -0.2863744966 -0.1406689928  0.288974575
## compactness_mean        -0.011703971 -0.0141309489  0.0309184960  0.151396350
## concavity_mean          -0.086375412 -0.0093441809 -0.1075204434  0.072827285
## concave points_mean      0.043861025 -0.0520499505 -0.1504822142  0.152322414
## symmetry_mean            0.305941428  0.3564584607 -0.0938911345  0.231530989
## fractal_dimension_mean   0.044424360 -0.1194306679  0.2957600240  0.177121441
## radius_se                0.154456496 -0.0256032561  0.3124900373 -0.022539967
## texture_se               0.191650506 -0.0287473145 -0.0907553556  0.475413139
## perimeter_se             0.120990220  0.0018107150  0.3146403902  0.011896690
## area_se                  0.127574432 -0.0428639079  0.3466790028 -0.085805135
## smoothness_se            0.232065676 -0.3429173935 -0.2440240556 -0.573410232
## compactness_se          -0.279968156  0.0691975186  0.0234635340 -0.117460157
## concavity_se            -0.353982091  0.0563432386 -0.2088237897 -0.060566501
## concave points_se       -0.195548089 -0.0312244482 -0.3696459369  0.108319309
## symmetry_se              0.252868765  0.4902456426 -0.0803822539 -0.220149279
## fractal_dimension_se    -0.263297438 -0.0531952674  0.1913949726 -0.011168188
## radius_worst             0.004406592 -0.0002906849 -0.0097099360 -0.042619416
## texture_worst            0.092883400 -0.0500080613  0.0098707439 -0.036251636
## perimeter_worst         -0.007454151  0.0085009872 -0.0004457267 -0.030558534
## area_worst               0.027390903 -0.0251643821  0.0678316595 -0.079394246
## smoothness_worst         0.324435445 -0.3692553703 -0.1088308865 -0.205852191
## compactness_worst       -0.121804107  0.0477057929  0.1404729381 -0.084019659
## concavity_worst         -0.188518727  0.0283792555 -0.0604880561 -0.072467871
## concave points_worst    -0.043332069 -0.0308734498 -0.1679666187  0.036170795
## symmetry_worst           0.244558663  0.4989267845 -0.0184906298 -0.228225053
## fractal_dimension_worst -0.094423351 -0.0802235245  0.3746576261 -0.048360667
##                                  PC9         PC10        PC11         PC12
## radius_mean             -0.223109764  0.095486443 -0.04147149  0.051067457
## texture_mean             0.112699390  0.240934066  0.30224340  0.254896423
## perimeter_mean          -0.223739213  0.086385615 -0.01678264  0.038926106
## area_mean               -0.195586014  0.074956489 -0.11016964  0.065437508
## smoothness_mean          0.006424722 -0.069292681  0.13702184  0.316727211
## compactness_mean        -0.167841425  0.012936200  0.30800963 -0.104017044
## concavity_mean           0.040591006 -0.135602298 -0.12419024  0.065653480
## concave points_mean     -0.111971106  0.008054528  0.07244603  0.042589267
## symmetry_mean            0.256040084  0.572069479 -0.16305408 -0.288865504
## fractal_dimension_mean  -0.123740789  0.081103207  0.03804827  0.236358988
## radius_se                0.249985002 -0.049547594  0.02535702 -0.016687915
## texture_se              -0.246645397 -0.289142742 -0.34494446 -0.306160423
## perimeter_se             0.227154024 -0.114508236  0.16731877 -0.101446828
## area_se                  0.229160015 -0.091927889 -0.05161946 -0.017679218
## smoothness_se           -0.141924890  0.160884609 -0.08420621 -0.294710053
## compactness_se          -0.145322810  0.043504866  0.20688568 -0.263456509
## concavity_se             0.358107079 -0.141276243 -0.34951794  0.251146975
## concave points_se        0.272519886  0.086240847  0.34237591 -0.006458751
## symmetry_se             -0.304077200 -0.316529830  0.18784404  0.320571348
## fractal_dimension_se    -0.213722716  0.367541918 -0.25062479  0.276165974
## radius_worst            -0.112141463  0.077361643 -0.10506733  0.039679665
## texture_worst            0.103341204  0.029550941 -0.01315727  0.079797450
## perimeter_worst         -0.109614364  0.050508334 -0.05107628 -0.008987738
## area_worst              -0.080732461  0.069921152 -0.18459894  0.048088657
## smoothness_worst         0.112315904 -0.128304659 -0.14389035  0.056514866
## compactness_worst       -0.100677822 -0.172133632  0.19742047 -0.371662503
## concavity_worst          0.161908621 -0.311638520 -0.18501676 -0.087034532
## concave points_worst     0.060488462 -0.076648291  0.11777205 -0.068125354
## symmetry_worst           0.064637806 -0.029563075 -0.15756025  0.044033503
## fractal_dimension_worst -0.134174175  0.012609579 -0.11828355 -0.034731693
##                                PC13         PC14         PC15        PC16
## radius_mean              0.01196721  0.059506135 -0.051118775 -0.15058388
## texture_mean             0.20346133 -0.021560100 -0.107922421 -0.15784196
## perimeter_mean           0.04410950  0.048513812 -0.039902936 -0.11445396
## area_mean                0.06737574  0.010830829  0.013966907 -0.13244803
## smoothness_mean          0.04557360  0.445064860 -0.118143364 -0.20461325
## compactness_mean         0.22928130  0.008101057  0.230899962  0.17017837
## concavity_mean           0.38709081 -0.189358699 -0.128283732  0.26947021
## concave points_mean      0.13213810 -0.244794768 -0.217099194  0.38046410
## symmetry_mean            0.18993367  0.030738856 -0.073961707 -0.16466159
## fractal_dimension_mean   0.10623908 -0.377078865  0.517975705 -0.04079279
## radius_se               -0.06819523  0.010347413 -0.110050711  0.05890572
## texture_se              -0.16822238 -0.010849347  0.032752721 -0.03450040
## perimeter_se            -0.03784399 -0.045523718 -0.008268089  0.02651665
## area_se                  0.05606493  0.083570718 -0.046024366  0.04115323
## smoothness_se            0.15044143 -0.201152530  0.018559465 -0.05803906
## compactness_se           0.01004017  0.491755932  0.168209315  0.18983090
## concavity_se             0.15878319  0.134586924  0.250471408 -0.12542065
## concave points_se       -0.49402674 -0.199666719  0.062079344 -0.19881035
## symmetry_se              0.01033274 -0.046864383 -0.113383199 -0.15771150
## fractal_dimension_se    -0.24045832  0.145652466 -0.353232211  0.26855388
## radius_worst            -0.13789053  0.023101281  0.166567074 -0.08156057
## texture_worst           -0.08014543  0.053430792  0.101115399  0.18555785
## perimeter_worst         -0.09696571  0.012219382  0.182755198 -0.05485705
## area_worst              -0.10116061 -0.006685465  0.314993600 -0.09065339
## smoothness_worst        -0.20513034  0.162235443  0.046125866  0.14555166
## compactness_worst        0.01227931  0.166470250 -0.049956014 -0.15373486
## concavity_worst          0.21798433 -0.066798931 -0.204835886 -0.21502195
## concave points_worst    -0.25438749 -0.276418891 -0.169499607  0.17814174
## symmetry_worst          -0.25653491  0.005355574  0.139888394  0.25789401
## fractal_dimension_worst -0.17281424 -0.212104110 -0.256173195 -0.40555649
##                                 PC17          PC18        PC19         PC20
## radius_mean              0.202924255  0.1467123385  0.22538466 -0.049698664
## texture_mean            -0.038706119 -0.0411029851  0.02978864 -0.244134993
## perimeter_mean           0.194821310  0.1583174548  0.23959528 -0.017665012
## area_mean                0.255705763  0.2661681046 -0.02732219 -0.090143762
## smoothness_mean          0.167929914 -0.3522268017 -0.16456584  0.017100960
## compactness_mean        -0.020307708  0.0077941384  0.28422236  0.488686329
## concavity_mean          -0.001598353 -0.0269681105  0.00226636 -0.033387086
## concave points_mean      0.034509509 -0.0828277367 -0.15497236 -0.235407606
## symmetry_mean           -0.191737848  0.1733977905 -0.05881116  0.026069156
## fractal_dimension_mean   0.050225246  0.0878673570 -0.05815705 -0.175637222
## radius_se               -0.139396866 -0.2362165319  0.17588331 -0.090800503
## texture_se               0.043963016 -0.0098586620  0.03600985 -0.071659988
## perimeter_se            -0.024635639 -0.0259288003  0.36570154 -0.177250625
## area_se                  0.334418173  0.3049069032 -0.41657231  0.274201148
## smoothness_se            0.139595006 -0.2312599432 -0.01326009  0.090061477
## compactness_se          -0.008246477  0.1004742346 -0.24244818 -0.461098220
## concavity_se             0.084616716 -0.0001954852  0.12638102  0.066946174
## concave points_se        0.108132263  0.0460549116 -0.01216430  0.068868294
## symmetry_se             -0.274059129  0.1870147640 -0.08903929  0.107385289
## fractal_dimension_se    -0.122733398 -0.0598230982  0.08660084  0.222345297
## radius_worst            -0.240049982 -0.2161013526  0.01366130 -0.005626909
## texture_worst            0.069365185  0.0583984505 -0.07586693  0.300599798
## perimeter_worst         -0.234164147 -0.1885435919  0.09081325  0.011003858
## area_worst              -0.273399584 -0.1420648558 -0.41004720  0.060047387
## smoothness_worst        -0.278030197  0.5015516751  0.23451384 -0.129723903
## compactness_worst       -0.004037123 -0.0735745143  0.02020070  0.229280589
## concavity_worst         -0.191313419 -0.1039079796 -0.04578612 -0.046482792
## concave points_worst    -0.075485316  0.0758138963 -0.26022962  0.033022340
## symmetry_worst           0.430658116 -0.2787138431  0.11725053 -0.116759236
## fractal_dimension_worst  0.159394300  0.0235647497 -0.01149448 -0.104991974
##                                  PC21        PC22          PC23        PC24
## radius_mean             -0.0685700057 -0.07292890 -0.0985526942 -0.18257944
## texture_mean             0.4483694667 -0.09480063 -0.0005549975  0.09878679
## perimeter_mean          -0.0697690429 -0.07516048 -0.0402447050 -0.11664888
## area_mean               -0.0184432785 -0.09756578  0.0077772734  0.06984834
## smoothness_mean         -0.1194917473 -0.06382295 -0.0206657211  0.06869742
## compactness_mean         0.1926213963  0.09807756  0.0523603957 -0.10413552
## concavity_mean           0.0055717533  0.18521200  0.3248703785  0.04474106
## concave points_mean     -0.0094238187  0.31185243 -0.0514087968  0.08402770
## symmetry_mean           -0.0869384844  0.01840673 -0.0512005770  0.01933947
## fractal_dimension_mean  -0.0762718362 -0.28786888 -0.0846898562 -0.13326055
## radius_se                0.0863867747  0.15027468 -0.2641253170 -0.55870157
## texture_se               0.2170719674 -0.04845693 -0.0008738805  0.02426730
## perimeter_se            -0.3049501584 -0.15935280  0.0900742110  0.51675039
## area_se                  0.1925877857 -0.06423262  0.0982150746 -0.02246072
## smoothness_se           -0.0720987261 -0.05054490 -0.0598177179  0.01563119
## compactness_se          -0.1403865724  0.04528769  0.0091038710 -0.12177779
## concavity_se             0.0630479298  0.20521269 -0.3875423290  0.18820504
## concave points_se        0.0343753236  0.07254538  0.3517550738 -0.10966898
## symmetry_se             -0.0976995265  0.08465443 -0.0423628949  0.00322620
## fractal_dimension_se     0.0628432814 -0.24470508  0.0857810992  0.07519442
## radius_worst             0.0072938995  0.09629821 -0.0556767923 -0.15683037
## texture_worst           -0.5944401434  0.11111202 -0.0089228997 -0.11848460
## perimeter_worst         -0.0920235990 -0.01722163  0.0633448296  0.23711317
## area_worst               0.1467901315  0.09695982  0.1908896250  0.14406303
## smoothness_worst         0.1648492374  0.06825409  0.0936901494 -0.01099014
## compactness_worst        0.1813748671 -0.02967641 -0.1479209247  0.18674995
## concavity_worst         -0.1321005945 -0.46042619  0.2864331353 -0.28885257
## concave points_worst     0.0008860815 -0.29984056 -0.5675277966  0.10734024
## symmetry_worst           0.1627085487 -0.09714484  0.1213434508 -0.01438181
## fractal_dimension_worst -0.0923439434  0.46947115  0.0076253382  0.03782545
##                                PC25         PC26         PC27          PC28
## radius_mean             -0.01922650 -0.129476396 -0.131526670  2.111940e-01
## texture_mean             0.08474593 -0.024556664 -0.017357309 -6.581146e-05
## perimeter_mean           0.02701541 -0.125255946 -0.115415423  8.433827e-02
## area_mean               -0.21004078  0.362727403  0.466612477 -2.725083e-01
## smoothness_mean          0.02895489 -0.037003686  0.069689923  1.479269e-03
## compactness_mean         0.39662323  0.262808474  0.097748705 -5.462767e-03
## concavity_mean          -0.09697732 -0.548876170  0.364808397  4.553864e-02
## concave points_mean     -0.18645160  0.387643377 -0.454699351 -8.883097e-03
## symmetry_mean           -0.02458369 -0.016044038 -0.015164835  1.433026e-03
## fractal_dimension_mean  -0.20722186 -0.097404839 -0.101244946 -6.311687e-03
## radius_se               -0.17493043  0.049977080  0.212982901 -1.922239e-01
## texture_se               0.05698648 -0.011237242 -0.010092889 -5.622611e-03
## perimeter_se             0.07292764  0.103653282  0.041691553  2.631919e-01
## area_se                  0.13185041 -0.155304589 -0.313358657 -4.206811e-02
## smoothness_se            0.03121070 -0.007717557 -0.009052154  9.792963e-03
## compactness_se           0.17316455 -0.049727632  0.046536088 -1.539555e-02
## concavity_se             0.01593998  0.091454968 -0.084224797  5.820978e-03
## concave points_se       -0.12954655 -0.017941919 -0.011165509 -2.900930e-02
## symmetry_se             -0.01951493 -0.017267849 -0.019975983 -7.636526e-03
## fractal_dimension_se    -0.08417120  0.035488974 -0.012036564  1.975646e-02
## radius_worst             0.07070972 -0.197054744 -0.178666740  4.126396e-01
## texture_worst           -0.11818972  0.036469433  0.021410694 -3.902509e-04
## perimeter_worst          0.11803403 -0.244103670 -0.241031046 -7.286809e-01
## area_worst              -0.03828995  0.231359525  0.237162466  2.389603e-01
## smoothness_worst        -0.04796476  0.012602464 -0.040853568 -1.535248e-03
## compactness_worst       -0.62438494 -0.100463424 -0.070505414  4.869182e-02
## concavity_worst          0.11577034  0.266853781 -0.142905801 -1.764090e-02
## concave points_worst     0.26319634 -0.133574507  0.230901389  2.247567e-02
## symmetry_worst           0.04529962  0.028184296  0.022790444  4.920481e-03
## fractal_dimension_worst  0.28013348  0.004520482  0.059985998 -2.356214e-02
##                                  PC29          PC30
## radius_mean              2.114605e-01  0.7024140910
## texture_mean            -1.053393e-02  0.0002736610
## perimeter_mean           3.838261e-01 -0.6898969685
## area_mean               -4.227949e-01 -0.0329473482
## smoothness_mean         -3.434667e-03 -0.0048474577
## compactness_mean        -4.101677e-02  0.0446741863
## concavity_mean          -1.001479e-02  0.0251386661
## concave points_mean     -4.206949e-03 -0.0010772653
## symmetry_mean           -7.569862e-03 -0.0012803794
## fractal_dimension_mean   7.301433e-03 -0.0047556848
## radius_se                1.184421e-01 -0.0087110937
## texture_se              -8.776279e-03 -0.0010710392
## perimeter_se            -6.100219e-03  0.0137293906
## area_se                 -8.592591e-02  0.0011053260
## smoothness_se            1.776386e-03 -0.0016082109
## compactness_se           3.158134e-03  0.0019156224
## concavity_se             1.607852e-02 -0.0089265265
## concave points_se       -2.393779e-02 -0.0021601973
## symmetry_se             -5.223292e-03  0.0003293898
## fractal_dimension_se    -8.341912e-03  0.0017989568
## radius_worst            -6.357249e-01 -0.1356430561
## texture_worst            1.723549e-02  0.0010205360
## perimeter_worst          2.292180e-02  0.0797438536
## area_worst               4.449359e-01  0.0397422838
## smoothness_worst         7.385492e-03  0.0045832773
## compactness_worst        3.566904e-06 -0.0128415624
## concavity_worst         -1.267572e-02  0.0004021392
## concave points_worst     3.524045e-02 -0.0022884418
## symmetry_worst           1.340423e-02  0.0003954435
## fractal_dimension_worst  1.147766e-02  0.0018942925
princomp()
print(all_pca_2)
## Call:
## princomp(x = wbcd[, -1], cor = TRUE)
## 
## Standard deviations:
##     Comp.1     Comp.2     Comp.3     Comp.4     Comp.5     Comp.6     Comp.7 
## 3.64439401 2.38565601 1.67867477 1.40735229 1.28402903 1.09879780 0.82171778 
##     Comp.8     Comp.9    Comp.10    Comp.11    Comp.12    Comp.13    Comp.14 
## 0.69037464 0.64567392 0.59219377 0.54213992 0.51103950 0.49128148 0.39624453 
##    Comp.15    Comp.16    Comp.17    Comp.18    Comp.19    Comp.20    Comp.21 
## 0.30681422 0.28260007 0.24371918 0.22938785 0.22243559 0.17652026 0.17312681 
##    Comp.22    Comp.23    Comp.24    Comp.25    Comp.26    Comp.27    Comp.28 
## 0.16564843 0.15601550 0.13436892 0.12442376 0.09043030 0.08306903 0.03986650 
##    Comp.29    Comp.30 
## 0.02736427 0.01153451 
## 
##  30  variables and  569 observations.
PCA()
print(all_pca_3)
## **Results for the Principal Component Analysis (PCA)**
## The analysis was performed on 569 individuals, described by 30 variables
## *The results are available in the following objects:
## 
##    name               description                          
## 1  "$eig"             "eigenvalues"                        
## 2  "$var"             "results for the variables"          
## 3  "$var$coord"       "coord. for the variables"           
## 4  "$var$cor"         "correlations variables - dimensions"
## 5  "$var$cos2"        "cos2 for the variables"             
## 6  "$var$contrib"     "contributions of the variables"     
## 7  "$ind"             "results for the individuals"        
## 8  "$ind$coord"       "coord. for the individuals"         
## 9  "$ind$cos2"        "cos2 for the individuals"           
## 10 "$ind$contrib"     "contributions of the individuals"   
## 11 "$call"            "summary statistics"                 
## 12 "$call$centre"     "mean of the variables"              
## 13 "$call$ecart.type" "standard error of the variables"    
## 14 "$call$row.w"      "weights for the individuals"        
## 15 "$call$col.w"      "weights for the variables"

Printing an object of class “PCA” gives an organized overview of the object’s components, more so than the alternatives do.
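For the objects returned by the built-in functions, a similarly compact overview can be approximated with names() or with str() restricted to the top level. The sketch below uses the built-in iris measurements as stand-in data (an assumption, so that the snippet runs on its own) rather than the wbcd data.

```r
# Stand-in data (assumption): the four numeric columns of the built-in iris dataset
X <- iris[, 1:4]

pr <- prcomp(X, center = TRUE, scale. = TRUE)
pc <- princomp(X, cor = TRUE)

# Top-level components of each object
pr_names <- names(pr)
pc_names <- names(pc)
print(pr_names)
print(pc_names)

# One line per component, without descending into nested structures
str(pr, max.level = 1, give.attr = FALSE)
```

This lists the components but, unlike printing a “PCA” object, it does not describe what each of them contains.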

As was already stated, many aspects of PCA are not covered in this document, both because of time constraints and to keep its overall size within reasonable limits. The components of prcomp() and princomp() objects are not directly useful for the tasks that are yet to be performed, whereas PCA()’s are: certain components (like the eigenvalues, the coordinates, squared cosines and contributions of both variables and individuals, and the correlations between variables and dimensions) are core to the following procedures, although those will be detailed later in the document. For now, let’s just state that PCA() is in most cases more convenient than both prcomp() and princomp() because of these particular components (the same information can be obtained with the other functions, but doing so requires additional steps, whereas PCA() makes the task more straightforward).
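To illustrate what those additional steps look like, the sketch below derives the variable coordinates, squared cosines and contributions from a prcomp() object using base R only. It assumes a standardized PCA and uses the built-in iris measurements as stand-in data; the object names (var_coord, var_cos2, var_contrib) are chosen here for illustration and are not part of any package.

```r
# Stand-in data (assumption): the four numeric columns of the built-in iris dataset
X <- iris[, 1:4]
pr <- prcomp(X, center = TRUE, scale. = TRUE)

# Eigenvalues are the squared standard deviations of the components
eig <- pr$sdev^2

# Variable coordinates: each loading column multiplied by its component's sdev.
# For a standardized PCA these equal the variable-component correlations.
var_coord <- sweep(pr$rotation, 2, pr$sdev, `*`)

# Squared cosines: quality of representation of each variable on each component
var_cos2 <- var_coord^2

# Contributions (in %): share of each variable in the variance of each component
var_contrib <- 100 * sweep(var_cos2, 2, colSums(var_cos2), `/`)

print(round(var_coord, 3))
print(round(var_contrib, 3))
```

In a “PCA” object the corresponding quantities are stored directly in $var$coord, $var$cos2 and $var$contrib. The sign of a whole component may differ between functions, which does not affect the squared cosines or the contributions.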

2.2.3. dudi.pca() from ade4

The dudi.pca() function from the ade4 package is formatted as follows:

# Do not run this code snippet, as it is only here for illustration purposes
library(ade4)
dudi.pca(df,
         row.w = rep(1, nrow(df))/nrow(df),
         col.w = rep(1, ncol(df)),
         center = TRUE, 
         scale = TRUE,
         scannf = TRUE, 
         nf = 2,
         ...)
Let’s detail its most notable arguments:
  • df: a data frame with n rows (individuals) and p columns (numeric variables).
  • row.w: optional row weights (by default, uniform row weights).
  • col.w: optional column weights (by default, unit column weights).
  • center: TRUE or FALSE determines whether or not to center the data by the column means (a numeric vector with one value per column can be supplied instead to center by those values).
  • scale: TRUE or FALSE determines whether or not to scale/standardize the data.
  • scannf: TRUE or FALSE determines whether or not to display a screeplot and interactively ask for the number of axes to keep (the topic is properly explained later in the document).
  • nf: an integer indicating the number of axes to keep if scannf = FALSE.

More information regarding the dudi.pca() function and its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/ade4/versions/1.7-15/topics/dudi.pca
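
To make the roles of row.w, center and scale more concrete, the following base R sketch reproduces by hand the preprocessing these arguments describe (weighted centering and scaling with uniform row weights) and the eigenvalues that result from it. It is an illustration under the assumption of the default weights, using the built-in iris measurements as stand-in data; it is not ade4’s actual implementation. The comparison with dudi.pca() only runs if ade4 is installed.

```r
# Stand-in data (assumption): the four numeric columns of the built-in iris dataset
X <- as.matrix(iris[, 1:4])
n <- nrow(X)

# Uniform row weights, as in the default value of row.w
row_w <- rep(1, n) / n

# center = TRUE: subtract the weighted column means
col_means <- colSums(X * row_w)
Xc <- sweep(X, 2, col_means, `-`)

# scale = TRUE: divide by the weighted standard deviations (divisor n, not n - 1)
col_sd <- sqrt(colSums(Xc^2 * row_w))
Z <- sweep(Xc, 2, col_sd, `/`)

# Eigenvalues of the weighted cross-product matrix (here, the correlation matrix)
eig_manual <- eigen(crossprod(Z * sqrt(row_w)), symmetric = TRUE)$values
print(eig_manual)

# Optional comparison with ade4::dudi.pca(), skipped if the package is missing
if (requireNamespace("ade4", quietly = TRUE)) {
  dudi <- ade4::dudi.pca(as.data.frame(X), center = TRUE, scale = TRUE,
                         scannf = FALSE, nf = ncol(X))
  print(isTRUE(all.equal(eig_manual, dudi$eig)))
}
```

Setting scannf = FALSE together with an explicit nf keeps the call non-interactive, which is what a knitted R Markdown document needs.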

Let’s now evaluate the results of applying dudi.pca() to the Wisconsin Breast Cancer Dataset:

All
all_pca_4 <- dudi.pca(wbcd[,-1], scale = TRUE, scannf = FALSE, nf = 30)
class(all_pca_4)
## [1] "pca"  "dudi"
str(all_pca_4)
## List of 13
##  $ tab :'data.frame':    569 obs. of  30 variables:
##   ..$ radius_mean            : num [1:569] 1.097 1.83 1.58 -0.769 1.75 ...
##   ..$ texture_mean           : num [1:569] -2.073 -0.354 0.456 0.254 -1.152 ...
##   ..$ perimeter_mean         : num [1:569] 1.27 1.686 1.567 -0.593 1.777 ...
##   ..$ area_mean              : num [1:569] 0.984 1.909 1.559 -0.764 1.826 ...
##   ..$ smoothness_mean        : num [1:569] 1.568 -0.827 0.942 3.284 0.28 ...
##   ..$ compactness_mean       : num [1:569] 3.284 -0.487 1.053 3.403 0.539 ...
##   ..$ concavity_mean         : num [1:569] 2.6529 -0.0238 1.3635 1.9159 1.371 ...
##   ..$ concave points_mean    : num [1:569] 2.532 0.548 2.037 1.452 1.428 ...
##   ..$ symmetry_mean          : num [1:569] 2.21752 0.00139 0.93968 2.86738 -0.00956 ...
##   ..$ fractal_dimension_mean : num [1:569] 2.256 -0.869 -0.398 4.911 -0.562 ...
##   ..$ radius_se              : num [1:569] 2.49 0.499 1.229 0.326 1.271 ...
##   ..$ texture_se             : num [1:569] -0.565 -0.876 -0.78 -0.11 -0.79 ...
##   ..$ perimeter_se           : num [1:569] 2.833 0.263 0.851 0.287 1.273 ...
##   ..$ area_se                : num [1:569] 2.488 0.742 1.181 -0.288 1.19 ...
##   ..$ smoothness_se          : num [1:569] -0.214 -0.605 -0.297 0.69 1.483 ...
##   ..$ compactness_se         : num [1:569] 1.3169 -0.6929 0.815 2.7443 -0.0485 ...
##   ..$ concavity_se           : num [1:569] 0.724 -0.441 0.213 0.82 0.828 ...
##   ..$ concave points_se      : num [1:569] 0.661 0.26 1.425 1.115 1.144 ...
##   ..$ symmetry_se            : num [1:569] 1.149 -0.805 0.237 4.733 -0.361 ...
##   ..$ fractal_dimension_se   : num [1:569] 0.9071 -0.0994 0.2936 2.0475 0.4993 ...
##   ..$ radius_worst           : num [1:569] 1.887 1.806 1.512 -0.281 1.299 ...
##   ..$ texture_worst          : num [1:569] -1.359 -0.369 -0.024 0.134 -1.467 ...
##   ..$ perimeter_worst        : num [1:569] 2.3 1.54 1.35 -0.25 1.34 ...
##   ..$ area_worst             : num [1:569] 2 1.89 1.46 -0.55 1.22 ...
##   ..$ smoothness_worst       : num [1:569] 1.308 -0.376 0.527 3.394 0.221 ...
##   ..$ compactness_worst      : num [1:569] 2.617 -0.43 1.083 3.893 -0.313 ...
##   ..$ concavity_worst        : num [1:569] 2.11 -0.147 0.855 1.99 0.613 ...
##   ..$ concave points_worst   : num [1:569] 2.296 1.087 1.955 2.176 0.729 ...
##   ..$ symmetry_worst         : num [1:569] 2.751 -0.244 1.152 6.046 -0.868 ...
##   ..$ fractal_dimension_worst: num [1:569] 1.937 0.281 0.201 4.935 -0.397 ...
##  $ cw  : num [1:30] 1 1 1 1 1 1 1 1 1 1 ...
##  $ lw  : num [1:569] 0.00176 0.00176 0.00176 0.00176 0.00176 ...
##  $ eig : num [1:30] 13.28 5.69 2.82 1.98 1.65 ...
##  $ rank: int 30
##  $ nf  : num 30
##  $ c1  :'data.frame':    30 obs. of  30 variables:
##   ..$ CS1 : num [1:30] -0.219 -0.104 -0.228 -0.221 -0.143 ...
##   ..$ CS2 : num [1:30] -0.2339 -0.0597 -0.2152 -0.2311 0.1861 ...
##   ..$ CS3 : num [1:30] -0.00853 0.06455 -0.00931 0.0287 -0.10429 ...
##   ..$ CS4 : num [1:30] 0.0414 -0.6031 0.042 0.0534 0.1594 ...
##   ..$ CS5 : num [1:30] 0.0378 -0.0495 0.0374 0.0103 -0.3651 ...
##   ..$ CS6 : num [1:30] 0.01874 -0.03218 0.01731 -0.00189 -0.28637 ...
##   ..$ CS7 : num [1:30] -0.1241 0.0114 -0.1145 -0.0517 -0.1407 ...
##   ..$ CS8 : num [1:30] 0.00745 -0.13067 0.01869 -0.03467 0.28897 ...
##   ..$ CS9 : num [1:30] -0.22311 0.1127 -0.22374 -0.19559 0.00642 ...
##   ..$ CS10: num [1:30] 0.0955 0.2409 0.0864 0.075 -0.0693 ...
##   ..$ CS11: num [1:30] 0.0415 -0.3022 0.0168 0.1102 -0.137 ...
##   ..$ CS12: num [1:30] 0.0511 0.2549 0.0389 0.0654 0.3167 ...
##   ..$ CS13: num [1:30] 0.012 0.2035 0.0441 0.0674 0.0456 ...
##   ..$ CS14: num [1:30] -0.0595 0.0216 -0.0485 -0.0108 -0.4451 ...
##   ..$ CS15: num [1:30] 0.0511 0.1079 0.0399 -0.014 0.1181 ...
##   ..$ CS16: num [1:30] 0.151 0.158 0.114 0.132 0.205 ...
##   ..$ CS17: num [1:30] -0.2029 0.0387 -0.1948 -0.2557 -0.1679 ...
##   ..$ CS18: num [1:30] -0.1467 0.0411 -0.1583 -0.2662 0.3522 ...
##   ..$ CS19: num [1:30] 0.2254 0.0298 0.2396 -0.0273 -0.1646 ...
##   ..$ CS20: num [1:30] -0.0497 -0.2441 -0.0177 -0.0901 0.0171 ...
##   ..$ CS21: num [1:30] -0.0686 0.4484 -0.0698 -0.0184 -0.1195 ...
##   ..$ CS22: num [1:30] 0.0729 0.0948 0.0752 0.0976 0.0638 ...
##   ..$ CS23: num [1:30] -0.098553 -0.000555 -0.040245 0.007777 -0.020666 ...
##   ..$ CS24: num [1:30] 0.1826 -0.0988 0.1166 -0.0698 -0.0687 ...
##   ..$ CS25: num [1:30] -0.0192 0.0847 0.027 -0.21 0.029 ...
##   ..$ CS26: num [1:30] -0.1295 -0.0246 -0.1253 0.3627 -0.037 ...
##   ..$ CS27: num [1:30] 0.1315 0.0174 0.1154 -0.4666 -0.0697 ...
##   ..$ CS28: num [1:30] -2.11e-01 6.58e-05 -8.43e-02 2.73e-01 -1.48e-03 ...
##   ..$ CS29: num [1:30] 0.21146 -0.01053 0.38383 -0.42279 -0.00343 ...
##   ..$ CS30: num [1:30] 0.702414 0.000274 -0.689897 -0.032947 -0.004847 ...
##  $ li  :'data.frame':    569 obs. of  30 variables:
##   ..$ Axis1 : num [1:569] -9.19 -2.39 -5.73 -7.12 -3.94 ...
##   ..$ Axis2 : num [1:569] 1.95 -3.77 -1.08 10.28 -1.95 ...
##   ..$ Axis3 : num [1:569] -1.123 -0.529 -0.552 -3.233 1.39 ...
##   ..$ Axis4 : num [1:569] 3.634 1.118 0.912 0.153 2.941 ...
##   ..$ Axis5 : num [1:569] -1.195 0.622 -0.177 -2.961 0.547 ...
##   ..$ Axis6 : num [1:569] 1.4114 0.0287 0.5415 3.0534 -1.2265 ...
##   ..$ Axis7 : num [1:569] 2.1594 0.0134 -0.6682 1.4299 -0.9362 ...
##   ..$ Axis8 : num [1:569] 0.3984 -0.241 -0.0974 -1.0596 -0.6364 ...
##   ..$ Axis9 : num [1:569] -0.1571 -0.7119 0.0241 -1.4054 -0.2638 ...
##   ..$ Axis10: num [1:569] -0.877 1.107 0.454 -1.117 0.378 ...
##   ..$ Axis11: num [1:569] 0.263 0.813 -0.606 -1.152 0.651 ...
##   ..$ Axis12: num [1:569] -0.859 0.158 0.124 1.011 -0.111 ...
##   ..$ Axis13: num [1:569] 0.103 -0.944 -0.411 -0.933 0.388 ...
##   ..$ Axis14: num [1:569] 0.6908 0.6535 -0.0167 0.4874 0.5392 ...
##   ..$ Axis15: num [1:569] -0.60179 0.00897 0.48342 -0.16885 0.31032 ...
##   ..$ Axis16: num [1:569] -0.7451 0.6488 -0.3251 -0.0514 0.1526 ...
##   ..$ Axis17: num [1:569] 0.2655 0.0172 -0.1909 -0.4826 -0.1331 ...
##   ..$ Axis18: num [1:569] 0.5496 -0.3183 0.088 0.0359 0.0187 ...
##   ..$ Axis19: num [1:569] 0.1338 -0.2476 -0.3926 -0.0267 0.4614 ...
##   ..$ Axis20: num [1:569] 0.3456 -0.1141 -0.2045 -0.4647 0.0655 ...
##   ..$ Axis21: num [1:569] 0.0965 -0.0773 0.3111 0.4342 -0.1165 ...
##   ..$ Axis22: num [1:569] 0.0688 -0.0946 -0.0603 -0.2033 -0.0176 ...
##   ..$ Axis23: num [1:569] 0.0845 -0.2177 -0.0743 -0.1241 0.1395 ...
##   ..$ Axis24: num [1:569] -0.17526 0.01129 0.10276 0.15343 -0.00533 ...
##   ..$ Axis25: num [1:569] 0.15102 0.17051 -0.17116 -0.0775 -0.00306 ...
##   ..$ Axis26: num [1:569] -0.2015 -0.04113 0.00474 -0.27522 0.03925 ...
##   ..$ Axis27: num [1:569] 0.2526 -0.1813 -0.0496 -0.1835 -0.0322 ...
##   ..$ Axis28: num [1:569] 0.0339 -0.0326 -0.047 -0.0425 0.0348 ...
##   ..$ Axis29: num [1:569] 0.04565 -0.00569 0.00315 -0.06929 0.00504 ...
##   ..$ Axis30: num [1:569] 0.047169 0.001868 -0.000751 0.019937 -0.021214 ...
##  $ co  :'data.frame':    30 obs. of  30 variables:
##   ..$ Comp1 : num [1:30] -0.798 -0.378 -0.829 -0.805 -0.52 ...
##   ..$ Comp2 : num [1:30] -0.558 -0.142 -0.513 -0.551 0.444 ...
##   ..$ Comp3 : num [1:30] -0.0143 0.1084 -0.0156 0.0482 -0.1751 ...
##   ..$ Comp4 : num [1:30] 0.0583 -0.8487 0.0591 0.0752 0.2243 ...
##   ..$ Comp5 : num [1:30] 0.0485 -0.0635 0.048 0.0133 -0.4688 ...
##   ..$ Comp6 : num [1:30] 0.02059 -0.03536 0.01902 -0.00207 -0.31467 ...
##   ..$ Comp7 : num [1:30] -0.10197 0.00937 -0.09407 -0.04244 -0.11559 ...
##   ..$ Comp8 : num [1:30] 0.00514 -0.09021 0.0129 -0.02394 0.1995 ...
##   ..$ Comp9 : num [1:30] -0.14406 0.07277 -0.14446 -0.12628 0.00415 ...
##   ..$ Comp10: num [1:30] 0.0565 0.1427 0.0512 0.0444 -0.041 ...
##   ..$ Comp11: num [1:30] 0.0225 -0.1639 0.0091 0.0597 -0.0743 ...
##   ..$ Comp12: num [1:30] 0.0261 0.1303 0.0199 0.0334 0.1619 ...
##   ..$ Comp13: num [1:30] 0.00588 0.09996 0.02167 0.0331 0.02239 ...
##   ..$ Comp14: num [1:30] -0.02358 0.00854 -0.01922 -0.00429 -0.17635 ...
##   ..$ Comp15: num [1:30] 0.01568 0.03311 0.01224 -0.00429 0.03625 ...
##   ..$ Comp16: num [1:30] 0.0426 0.0446 0.0323 0.0374 0.0578 ...
##   ..$ Comp17: num [1:30] -0.04946 0.00943 -0.04748 -0.06232 -0.04093 ...
##   ..$ Comp18: num [1:30] -0.03365 0.00943 -0.03632 -0.06106 0.0808 ...
##   ..$ Comp19: num [1:30] 0.05013 0.00663 0.05329 -0.00608 -0.03661 ...
##   ..$ Comp20: num [1:30] -0.00877 -0.04309 -0.00312 -0.01591 0.00302 ...
##   ..$ Comp21: num [1:30] -0.01187 0.07762 -0.01208 -0.00319 -0.02069 ...
##   ..$ Comp22: num [1:30] 0.0121 0.0157 0.0125 0.0162 0.0106 ...
##   ..$ Comp23: num [1:30] -1.54e-02 -8.66e-05 -6.28e-03 1.21e-03 -3.22e-03 ...
##   ..$ Comp24: num [1:30] 0.02453 -0.01327 0.01567 -0.00939 -0.00923 ...
##   ..$ Comp25: num [1:30] -0.00239 0.01054 0.00336 -0.02613 0.0036 ...
##   ..$ Comp26: num [1:30] -0.01171 -0.00222 -0.01133 0.0328 -0.00335 ...
##   ..$ Comp27: num [1:30] 0.01093 0.00144 0.00959 -0.03876 -0.00579 ...
##   ..$ Comp28: num [1:30] -8.42e-03 2.62e-06 -3.36e-03 1.09e-02 -5.90e-05 ...
##   ..$ Comp29: num [1:30] 0.005786 -0.000288 0.010503 -0.011569 -0.000094 ...
##   ..$ Comp30: num [1:30] 8.10e-03 3.16e-06 -7.96e-03 -3.80e-04 -5.59e-05 ...
##  $ l1  :'data.frame':    569 obs. of  30 variables:
##   ..$ RS1 : num [1:569] -2.522 -0.655 -1.573 -1.954 -1.08 ...
##   ..$ RS2 : num [1:569] 0.817 -1.58 -0.451 4.307 -0.817 ...
##   ..$ RS3 : num [1:569] -0.669 -0.315 -0.329 -1.926 0.828 ...
##   ..$ RS4 : num [1:569] 2.582 0.795 0.648 0.108 2.089 ...
##   ..$ RS5 : num [1:569] -0.931 0.484 -0.138 -2.306 0.426 ...
##   ..$ RS6 : num [1:569] 1.2845 0.0261 0.4928 2.7789 -1.1162 ...
##   ..$ RS7 : num [1:569] 2.6279 0.0163 -0.8131 1.7401 -1.1393 ...
##   ..$ RS8 : num [1:569] 0.577 -0.349 -0.141 -1.535 -0.922 ...
##   ..$ RS9 : num [1:569] -0.2433 -1.1026 0.0373 -2.1767 -0.4086 ...
##   ..$ RS10: num [1:569] -1.482 1.869 0.767 -1.886 0.638 ...
##   ..$ RS11: num [1:569] 0.485 1.5 -1.117 -2.124 1.201 ...
##   ..$ RS12: num [1:569] -1.681 0.309 0.243 1.979 -0.216 ...
##   ..$ RS13: num [1:569] 0.21 -1.921 -0.836 -1.9 0.79 ...
##   ..$ RS14: num [1:569] 1.7434 1.6492 -0.0421 1.2301 1.3607 ...
##   ..$ RS15: num [1:569] -1.9614 0.0293 1.5756 -0.5503 1.0114 ...
##   ..$ RS16: num [1:569] -2.637 2.296 -1.15 -0.182 0.54 ...
##   ..$ RS17: num [1:569] 1.0892 0.0706 -0.7834 -1.9803 -0.5463 ...
##   ..$ RS18: num [1:569] 2.3958 -1.3876 0.3835 0.1564 0.0816 ...
##   ..$ RS19: num [1:569] 0.601 -1.113 -1.765 -0.12 2.074 ...
##   ..$ RS20: num [1:569] 1.958 -0.647 -1.159 -2.633 0.371 ...
##   ..$ RS21: num [1:569] 0.557 -0.447 1.797 2.508 -0.673 ...
##   ..$ RS22: num [1:569] 0.416 -0.571 -0.364 -1.227 -0.107 ...
##   ..$ RS23: num [1:569] 0.542 -1.395 -0.476 -0.795 0.894 ...
##   ..$ RS24: num [1:569] -1.3043 0.084 0.7648 1.1419 -0.0397 ...
##   ..$ RS25: num [1:569] 1.2138 1.3704 -1.3756 -0.6228 -0.0246 ...
##   ..$ RS26: num [1:569] -2.2283 -0.4548 0.0524 -3.0435 0.4341 ...
##   ..$ RS27: num [1:569] 3.041 -2.182 -0.597 -2.209 -0.387 ...
##   ..$ RS28: num [1:569] 0.851 -0.818 -1.18 -1.066 0.873 ...
##   ..$ RS29: num [1:569] 1.668 -0.208 0.115 -2.532 0.184 ...
##   ..$ RS30: num [1:569] 4.0894 0.1619 -0.0651 1.7285 -1.8392 ...
##  $ call: language dudi.pca(df = wbcd[, -1], scale = TRUE, scannf = FALSE, nf = 30)
##  $ cent: Named num [1:30] 14.1273 19.2896 91.969 654.8891 0.0964 ...
##   ..- attr(*, "names")= chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##  $ norm: Named num [1:30] 3.521 4.2973 24.2776 351.6048 0.0141 ...
##   ..- attr(*, "names")= chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##  - attr(*, "class")= chr [1:2] "pca" "dudi"
summary(all_pca_4)
## Class: pca dudi
## Call: dudi.pca(df = wbcd[, -1], scale = TRUE, scannf = FALSE, nf = 30)
## 
## Total inertia: 30
## 
## Eigenvalues:
##     Ax1     Ax2     Ax3     Ax4     Ax5 
##  13.282   5.691   2.818   1.981   1.649 
## 
## Projected inertia (%):
##     Ax1     Ax2     Ax3     Ax4     Ax5 
##  44.272  18.971   9.393   6.602   5.496 
## 
## Cumulative projected inertia (%):
##     Ax1   Ax1:2   Ax1:3   Ax1:4   Ax1:5 
##   44.27   63.24   72.64   79.24   84.73 
## 
## (Only 5 dimensions (out of 30) are shown)
Mean
mean_pca_4 <- dudi.pca(wbcd[,c(2:11)], scale = TRUE, scannf = FALSE, nf = 30)
class(mean_pca_4)
## [1] "pca"  "dudi"
str(mean_pca_4)
## List of 13
##  $ tab :'data.frame':    569 obs. of  10 variables:
##   ..$ radius_mean           : num [1:569] 1.097 1.83 1.58 -0.769 1.75 ...
##   ..$ texture_mean          : num [1:569] -2.073 -0.354 0.456 0.254 -1.152 ...
##   ..$ perimeter_mean        : num [1:569] 1.27 1.686 1.567 -0.593 1.777 ...
##   ..$ area_mean             : num [1:569] 0.984 1.909 1.559 -0.764 1.826 ...
##   ..$ smoothness_mean       : num [1:569] 1.568 -0.827 0.942 3.284 0.28 ...
##   ..$ compactness_mean      : num [1:569] 3.284 -0.487 1.053 3.403 0.539 ...
##   ..$ concavity_mean        : num [1:569] 2.6529 -0.0238 1.3635 1.9159 1.371 ...
##   ..$ concave points_mean   : num [1:569] 2.532 0.548 2.037 1.452 1.428 ...
##   ..$ symmetry_mean         : num [1:569] 2.21752 0.00139 0.93968 2.86738 -0.00956 ...
##   ..$ fractal_dimension_mean: num [1:569] 2.256 -0.869 -0.398 4.911 -0.562 ...
##  $ cw  : num [1:10] 1 1 1 1 1 1 1 1 1 1
##  $ lw  : num [1:569] 0.00176 0.00176 0.00176 0.00176 0.00176 ...
##  $ eig : num [1:10] 5.479 2.519 0.881 0.499 0.373 ...
##  $ rank: int 10
##  $ nf  : int 10
##  $ c1  :'data.frame':    10 obs. of  10 variables:
##   ..$ CS1 : num [1:10] -0.364 -0.154 -0.376 -0.364 -0.232 ...
##   ..$ CS2 : num [1:10] -0.314 -0.147 -0.285 -0.305 0.402 ...
##   ..$ CS3 : num [1:10] -0.124 0.951 -0.114 -0.123 -0.167 ...
##   ..$ CS4 : num [1:10] 0.02956 0.00892 0.01346 0.01344 -0.1078 ...
##   ..$ CS5 : num [1:10] -0.03107 -0.21992 -0.00595 -0.01934 -0.84375 ...
##   ..$ CS6 : num [1:10] -0.2642 -0.0322 -0.2378 -0.3317 0.0622 ...
##   ..$ CS7 : num [1:10] 0.0442 -0.0206 0.0834 -0.2612 -0.0113 ...
##   ..$ CS8 : num [1:10] -0.08483 0.00713 -0.08926 -0.14461 -0.1705 ...
##   ..$ CS9 : num [1:10] -0.47443 -0.00421 -0.38017 0.74735 -0.00585 ...
##   ..$ CS10: num [1:10] 0.66907 -0.00025 -0.74049 0.03236 -0.00369 ...
##  $ li  :'data.frame':    569 obs. of  10 variables:
##   ..$ Axis1 : num [1:569] -5.22 -1.73 -3.97 -3.6 -3.15 ...
##   ..$ Axis2 : num [1:569] 3.2 -2.54 -0.55 6.91 -1.36 ...
##   ..$ Axis3 : num [1:569] -2.171 -1.02 -0.324 0.793 -1.862 ...
##   ..$ Axis4 : num [1:569] -0.169 0.548 0.398 -0.605 -0.185 ...
##   ..$ Axis5 : num [1:569] 1.514 0.312 -0.323 0.243 0.311 ...
##   ..$ Axis6 : num [1:569] 0.1131 -0.9356 0.2715 -0.617 0.0908 ...
##   ..$ Axis7 : num [1:569] 0.3447 -0.4209 -0.0765 0.0681 -0.3081 ...
##   ..$ Axis8 : num [1:569] 0.23193 0.00834 0.35505 0.10016 -0.09906 ...
##   ..$ Axis9 : num [1:569] -0.022 -0.0562 0.0201 -0.0435 -0.0266 ...
##   ..$ Axis10: num [1:569] 0.0113 0.023 0.0227 0.0535 -0.0341 ...
##  $ co  :'data.frame':    10 obs. of  10 variables:
##   ..$ Comp1 : num [1:10] -0.852 -0.362 -0.88 -0.852 -0.544 ...
##   ..$ Comp2 : num [1:10] -0.498 -0.234 -0.452 -0.484 0.638 ...
##   ..$ Comp3 : num [1:10] -0.117 0.892 -0.107 -0.116 -0.156 ...
##   ..$ Comp4 : num [1:10] 0.02088 0.0063 0.00951 0.0095 -0.07615 ...
##   ..$ Comp5 : num [1:10] -0.01896 -0.13423 -0.00363 -0.01181 -0.51499 ...
##   ..$ Comp6 : num [1:10] -0.0931 -0.0113 -0.0838 -0.1169 0.0219 ...
##   ..$ Comp7 : num [1:10] 0.01251 -0.00582 0.02359 -0.07391 -0.0032 ...
##   ..$ Comp8 : num [1:10] -0.01585 0.00133 -0.01667 -0.02701 -0.03185 ...
##   ..$ Comp9 : num [1:10] -0.050064 -0.000445 -0.040117 0.078864 -0.000617 ...
##   ..$ Comp10: num [1:10] 1.12e-02 -4.20e-06 -1.24e-02 5.44e-04 -6.20e-05 ...
##  $ l1  :'data.frame':    569 obs. of  10 variables:
##   ..$ RS1 : num [1:569] -2.232 -0.738 -1.696 -1.537 -1.346 ...
##   ..$ RS2 : num [1:569] 2.019 -1.601 -0.347 4.351 -0.856 ...
##   ..$ RS3 : num [1:569] -2.314 -1.087 -0.345 0.845 -1.984 ...
##   ..$ RS4 : num [1:569] -0.24 0.775 0.563 -0.856 -0.262 ...
##   ..$ RS5 : num [1:569] 2.481 0.512 -0.529 0.398 0.51 ...
##   ..$ RS6 : num [1:569] 0.321 -2.656 0.771 -1.751 0.258 ...
##   ..$ RS7 : num [1:569] 1.22 -1.49 -0.27 0.24 -1.09 ...
##   ..$ RS8 : num [1:569] 1.2417 0.0447 1.9008 0.5362 -0.5303 ...
##   ..$ RS9 : num [1:569] -0.208 -0.532 0.191 -0.412 -0.252 ...
##   ..$ RS10: num [1:569] 0.67 1.37 1.35 3.18 -2.03 ...
##  $ call: language dudi.pca(df = wbcd[, c(2:11)], scale = TRUE, scannf = FALSE, nf = 30)
##  $ cent: Named num [1:10] 14.1273 19.2896 91.969 654.8891 0.0964 ...
##   ..- attr(*, "names")= chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##  $ norm: Named num [1:10] 3.521 4.2973 24.2776 351.6048 0.0141 ...
##   ..- attr(*, "names")= chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##  - attr(*, "class")= chr [1:2] "pca" "dudi"
summary(mean_pca_4)
## Class: pca dudi
## Call: dudi.pca(df = wbcd[, c(2:11)], scale = TRUE, scannf = FALSE, 
##     nf = 30)
## 
## Total inertia: 10
## 
## Eigenvalues:
##     Ax1     Ax2     Ax3     Ax4     Ax5 
##  5.4786  2.5187  0.8806  0.4990  0.3725 
## 
## Projected inertia (%):
##     Ax1     Ax2     Ax3     Ax4     Ax5 
##  54.786  25.187   8.806   4.990   3.725 
## 
## Cumulative projected inertia (%):
##     Ax1   Ax1:2   Ax1:3   Ax1:4   Ax1:5 
##   54.79   79.97   88.78   93.77   97.49 
## 
## (Only 5 dimensions (out of 10) are shown)
SE
se_pca_4 <- dudi.pca(wbcd[,c(12:21)], scale = TRUE, scannf = FALSE, nf = 30)
class(se_pca_4)
## [1] "pca"  "dudi"
str(se_pca_4)
## List of 13
##  $ tab :'data.frame':    569 obs. of  10 variables:
##   ..$ radius_se           : num [1:569] 2.49 0.499 1.229 0.326 1.271 ...
##   ..$ texture_se          : num [1:569] -0.565 -0.876 -0.78 -0.11 -0.79 ...
##   ..$ perimeter_se        : num [1:569] 2.833 0.263 0.851 0.287 1.273 ...
##   ..$ area_se             : num [1:569] 2.488 0.742 1.181 -0.288 1.19 ...
##   ..$ smoothness_se       : num [1:569] -0.214 -0.605 -0.297 0.69 1.483 ...
##   ..$ compactness_se      : num [1:569] 1.3169 -0.6929 0.815 2.7443 -0.0485 ...
##   ..$ concavity_se        : num [1:569] 0.724 -0.441 0.213 0.82 0.828 ...
##   ..$ concave points_se   : num [1:569] 0.661 0.26 1.425 1.115 1.144 ...
##   ..$ symmetry_se         : num [1:569] 1.149 -0.805 0.237 4.733 -0.361 ...
##   ..$ fractal_dimension_se: num [1:569] 0.9071 -0.0994 0.2936 2.0475 0.4993 ...
##  $ cw  : num [1:10] 1 1 1 1 1 1 1 1 1 1
##  $ lw  : num [1:569] 0.00176 0.00176 0.00176 0.00176 0.00176 ...
##  $ eig : num [1:10] 4.743 2.075 1.264 0.594 0.577 ...
##  $ rank: int 10
##  $ nf  : int 10
##  $ c1  :'data.frame':    10 obs. of  10 variables:
##   ..$ CS1 : num [1:10] -0.346 -0.189 -0.357 -0.304 -0.212 ...
##   ..$ CS2 : num [1:10] -0.44 0.153 -0.42 -0.5 0.271 ...
##   ..$ CS3 : num [1:10] -0.0808 -0.5915 -0.0588 -0.0248 -0.4275 ...
##   ..$ CS4 : num [1:10] -0.0486 0.263 0.01 -0.0728 -0.7962 ...
##   ..$ CS5 : num [1:10] -0.0162 0.7188 -0.0174 -0.0249 -0.1821 ...
##   ..$ CS6 : num [1:10] 0.08864 -0.00945 0.03959 0.14303 -0.09 ...
##   ..$ CS7 : num [1:10] 0.02138 0.00784 -0.10094 0.17863 0.10052 ...
##   ..$ CS8 : num [1:10] -0.1255 0.0486 0.0336 0.0657 0.1111 ...
##   ..$ CS9 : num [1:10] 0.3192 -0.0511 0.5182 -0.7596 0.0233 ...
##   ..$ CS10: num [1:10] 0.74268 0.00286 -0.64051 -0.13073 -0.02422 ...
##  $ li  :'data.frame':    569 obs. of  10 variables:
##   ..$ Axis1 : num [1:569] -4.053 0.341 -1.961 -3.795 -2.219 ...
##   ..$ Axis2 : num [1:569] -2.59 -1.44 -1.17 2.66 -1.03 ...
##   ..$ Axis3 : num [1:569] 0.385 0.772 0.966 -0.748 0.403 ...
##   ..$ Axis4 : num [1:569] 0.4463 -0.3489 0.0336 2.0225 -1.7073 ...
##   ..$ Axis5 : num [1:569] -1.1229 -0.0348 -0.58 -3.0803 -0.4694 ...
##   ..$ Axis6 : num [1:569] 0.88039 -0.00367 -0.31629 0.54353 -0.41467 ...
##   ..$ Axis7 : num [1:569] -0.0375 -0.2149 -0.6578 -0.85 0.3203 ...
##   ..$ Axis8 : num [1:569] 0.126 -0.6072 -0.0941 0.305 -0.5156 ...
##   ..$ Axis9 : num [1:569] 0.2351 -0.2678 -0.2899 -0.0379 0.1376 ...
##   ..$ Axis10: num [1:569] -0.204 0.055 0.307 0.262 -0.117 ...
##  $ co  :'data.frame':    10 obs. of  10 variables:
##   ..$ Comp1 : num [1:10] -0.753 -0.411 -0.779 -0.662 -0.463 ...
##   ..$ Comp2 : num [1:10] -0.634 0.221 -0.605 -0.721 0.39 ...
##   ..$ Comp3 : num [1:10] -0.0908 -0.6652 -0.0661 -0.0279 -0.4807 ...
##   ..$ Comp4 : num [1:10] -0.0375 0.20274 0.00773 -0.05613 -0.61379 ...
##   ..$ Comp5 : num [1:10] -0.0123 0.5462 -0.0132 -0.0189 -0.1384 ...
##   ..$ Comp6 : num [1:10] 0.05136 -0.00548 0.02294 0.08287 -0.05214 ...
##   ..$ Comp7 : num [1:10] 0.0093 0.00341 -0.04392 0.07772 0.04374 ...
##   ..$ Comp8 : num [1:10] -0.0497 0.0193 0.0133 0.026 0.044 ...
##   ..$ Comp9 : num [1:10] 0.06522 -0.01045 0.10591 -0.15524 0.00477 ...
##   ..$ Comp10: num [1:10] 0.108689 0.000418 -0.093737 -0.019132 -0.003544 ...
##  $ l1  :'data.frame':    569 obs. of  10 variables:
##   ..$ RS1 : num [1:569] -1.861 0.156 -0.9 -1.742 -1.019 ...
##   ..$ RS2 : num [1:569] -1.796 -1.001 -0.813 1.846 -0.715 ...
##   ..$ RS3 : num [1:569] 0.343 0.687 0.859 -0.665 0.358 ...
##   ..$ RS4 : num [1:569] 0.5789 -0.4526 0.0436 2.6234 -2.2146 ...
##   ..$ RS5 : num [1:569] -1.4777 -0.0459 -0.7632 -4.0535 -0.6177 ...
##   ..$ RS6 : num [1:569] 1.5195 -0.00633 -0.5459 0.9381 -0.71569 ...
##   ..$ RS7 : num [1:569] -0.0861 -0.4939 -1.5117 -1.9535 0.7361 ...
##   ..$ RS8 : num [1:569] 0.318 -1.533 -0.238 0.77 -1.301 ...
##   ..$ RS9 : num [1:569] 1.15 -1.31 -1.418 -0.186 0.673 ...
##   ..$ RS10: num [1:569] -1.397 0.376 2.098 1.79 -0.796 ...
##  $ call: language dudi.pca(df = wbcd[, c(12:21)], scale = TRUE, scannf = FALSE, nf = 30)
##  $ cent: Named num [1:10] 0.40517 1.21685 2.86606 40.33708 0.00704 ...
##   ..- attr(*, "names")= chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
##  $ norm: Named num [1:10] 0.277 0.551 2.02 45.451 0.003 ...
##   ..- attr(*, "names")= chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
##  - attr(*, "class")= chr [1:2] "pca" "dudi"
summary(se_pca_4)
## Class: pca dudi
## Call: dudi.pca(df = wbcd[, c(12:21)], scale = TRUE, scannf = FALSE, 
##     nf = 30)
## 
## Total inertia: 10
## 
## Eigenvalues:
##     Ax1     Ax2     Ax3     Ax4     Ax5 
##  4.7434  2.0752  1.2644  0.5944  0.5775 
## 
## Projected inertia (%):
##     Ax1     Ax2     Ax3     Ax4     Ax5 
##  47.434  20.752  12.644   5.944   5.775 
## 
## Cumulative projected inertia (%):
##     Ax1   Ax1:2   Ax1:3   Ax1:4   Ax1:5 
##   47.43   68.19   80.83   86.77   92.55 
## 
## (Only 5 dimensions (out of 10) are shown)
Worst
worst_pca_4 <- dudi.pca(wbcd[,c(22:31)], scale = TRUE, scannf = FALSE, nf = 30)
class(worst_pca_4)
## [1] "pca"  "dudi"
str(worst_pca_4)
## List of 13
##  $ tab :'data.frame':    569 obs. of  10 variables:
##   ..$ radius_worst           : num [1:569] 1.887 1.806 1.512 -0.281 1.299 ...
##   ..$ texture_worst          : num [1:569] -1.359 -0.369 -0.024 0.134 -1.467 ...
##   ..$ perimeter_worst        : num [1:569] 2.3 1.54 1.35 -0.25 1.34 ...
##   ..$ area_worst             : num [1:569] 2 1.89 1.46 -0.55 1.22 ...
##   ..$ smoothness_worst       : num [1:569] 1.308 -0.376 0.527 3.394 0.221 ...
##   ..$ compactness_worst      : num [1:569] 2.617 -0.43 1.083 3.893 -0.313 ...
##   ..$ concavity_worst        : num [1:569] 2.11 -0.147 0.855 1.99 0.613 ...
##   ..$ concave points_worst   : num [1:569] 2.296 1.087 1.955 2.176 0.729 ...
##   ..$ symmetry_worst         : num [1:569] 2.751 -0.244 1.152 6.046 -0.868 ...
##   ..$ fractal_dimension_worst: num [1:569] 1.937 0.281 0.201 4.935 -0.397 ...
##  $ cw  : num [1:10] 1 1 1 1 1 1 1 1 1 1
##  $ lw  : num [1:569] 0.00176 0.00176 0.00176 0.00176 0.00176 ...
##  $ eig : num [1:10] 5.697 2.086 0.803 0.541 0.515 ...
##  $ rank: int 10
##  $ nf  : int 10
##  $ c1  :'data.frame':    10 obs. of  10 variables:
##   ..$ CS1 : num [1:10] -0.336 -0.201 -0.348 -0.325 -0.249 ...
##   ..$ CS2 : num [1:10] -0.4031 -0.0426 -0.3755 -0.4153 0.3379 ...
##   ..$ CS3 : num [1:10] -0.0761 0.9768 -0.0838 -0.079 -0.0514 ...
##   ..$ CS4 : num [1:10] 0.07096 -0.00233 0.03361 0.0661 0.31184 ...
##   ..$ CS5 : num [1:10] -0.026914 -0.029027 0.000677 -0.069245 -0.826364 ...
##   ..$ CS6 : num [1:10] 0.1738 -0.0151 0.1317 0.2944 -0.0711 ...
##   ..$ CS7 : num [1:10] 0.0258 -0.0265 -0.0265 0.2488 0.0908 ...
##   ..$ CS8 : num [1:10] -0.015 0.0431 -0.0922 -0.0317 -0.1624 ...
##   ..$ CS9 : num [1:10] 0.42612 -0.00619 0.45915 -0.74526 0.03946 ...
##   ..$ CS10: num [1:10] 0.70741 -0.006 -0.7016 -0.04175 -0.00681 ...
##  $ li  :'data.frame':    569 obs. of  10 variables:
##   ..$ Axis1 : num [1:569] -5.97 -1.82 -3.41 -6.3 -1.15 ...
##   ..$ Axis2 : num [1:569] 0.672 -2.315 -0.78 6.966 -1.878 ...
##   ..$ Axis3 : num [1:569] -2.545 -0.881 -0.775 -0.817 -1.839 ...
##   ..$ Axis4 : num [1:569] 0.70743 0.00988 0.5648 2.15604 -0.39309 ...
##   ..$ Axis5 : num [1:569] 0.918 -0.139 0.247 1.314 -0.694 ...
##   ..$ Axis6 : num [1:569] 0.491 0.927 -0.215 1.024 -0.131 ...
##   ..$ Axis7 : num [1:569] -0.039 -0.0995 -0.5089 -0.5796 0.3704 ...
##   ..$ Axis8 : num [1:569] -0.241 0.809 0.174 0.169 0.167 ...
##   ..$ Axis9 : num [1:569] -0.00564 -0.05416 -0.19504 -0.04306 0.20087 ...
##   ..$ Axis10: num [1:569] -0.2178 0.0948 0.1363 0.1227 -0.0626 ...
##  $ co  :'data.frame':    10 obs. of  10 variables:
##   ..$ Comp1 : num [1:10] -0.802 -0.479 -0.831 -0.775 -0.593 ...
##   ..$ Comp2 : num [1:10] -0.5822 -0.0615 -0.5424 -0.5998 0.488 ...
##   ..$ Comp3 : num [1:10] -0.0682 0.8752 -0.0751 -0.0708 -0.0461 ...
##   ..$ Comp4 : num [1:10] 0.05218 -0.00172 0.02471 0.0486 0.2293 ...
##   ..$ Comp5 : num [1:10] -0.019308 -0.020824 0.000486 -0.049677 -0.59284 ...
##   ..$ Comp6 : num [1:10] 0.07448 -0.00647 0.05646 0.12618 -0.0305 ...
##   ..$ Comp7 : num [1:10] 0.00747 -0.00766 -0.00769 0.07204 0.02629 ...
##   ..$ Comp8 : num [1:10] -0.00401 0.01156 -0.02472 -0.0085 -0.04351 ...
##   ..$ Comp9 : num [1:10] 0.052595 -0.000764 0.056672 -0.091986 0.00487 ...
##   ..$ Comp10: num [1:10] 0.044754 -0.00038 -0.044387 -0.002642 -0.000431 ...
##  $ l1  :'data.frame':    569 obs. of  10 variables:
##   ..$ RS1 : num [1:569] -2.503 -0.762 -1.428 -2.641 -0.48 ...
##   ..$ RS2 : num [1:569] 0.465 -1.603 -0.54 4.823 -1.3 ...
##   ..$ RS3 : num [1:569] -2.841 -0.983 -0.865 -0.912 -2.053 ...
##   ..$ RS4 : num [1:569] 0.9621 0.0134 0.7681 2.9321 -0.5346 ...
##   ..$ RS5 : num [1:569] 1.279 -0.194 0.345 1.831 -0.968 ...
##   ..$ RS6 : num [1:569] 1.146 2.162 -0.502 2.39 -0.306 ...
##   ..$ RS7 : num [1:569] -0.135 -0.344 -1.757 -2.002 1.279 ...
##   ..$ RS8 : num [1:569] -0.9 3.02 0.649 0.63 0.622 ...
##   ..$ RS9 : num [1:569] -0.0457 -0.4388 -1.5802 -0.3488 1.6274 ...
##   ..$ RS10: num [1:569] -3.44 1.5 2.15 1.94 -0.99 ...
##  $ call: language dudi.pca(df = wbcd[, c(22:31)], scale = TRUE, scannf = FALSE, nf = 30)
##  $ cent: Named num [1:10] 16.269 25.677 107.261 880.583 0.132 ...
##   ..- attr(*, "names")= chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
##  $ norm: Named num [1:10] 4.829 6.1409 33.573 568.8565 0.0228 ...
##   ..- attr(*, "names")= chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
##  - attr(*, "class")= chr [1:2] "pca" "dudi"
summary(worst_pca_4)
## Class: pca dudi
## Call: dudi.pca(df = wbcd[, c(22:31)], scale = TRUE, scannf = FALSE, 
##     nf = 30)
## 
## Total inertia: 10
## 
## Eigenvalues:
##     Ax1     Ax2     Ax3     Ax4     Ax5 
##  5.6972  2.0860  0.8028  0.5407  0.5147 
## 
## Projected inertia (%):
##     Ax1     Ax2     Ax3     Ax4     Ax5 
##  56.972  20.860   8.028   5.407   5.147 
## 
## Cumulative projected inertia (%):
##     Ax1   Ax1:2   Ax1:3   Ax1:4   Ax1:5 
##   56.97   77.83   85.86   91.27   96.41 
## 
## (Only 5 dimensions (out of 10) are shown)

The class() function shows that the object created by dudi.pca() is of class “pca” (not “PCA”, as with PCA()’s objects) and “dudi”. It is also worth noting that the cumulative proportions obtained with dudi.pca() (seen when applying the summary() function to an object of this kind) are identical to the ones obtained with the PCA functions covered up to this point, as they should be.
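
The percentages reported by summary() can be recomputed by hand from the eigenvalues. The following is a minimal sketch in base R that only uses the rounded values printed above (the first five eigenvalues and the total inertia of 30); the variable names are illustrative and are not part of ade4.

```r
# First five eigenvalues, copied (rounded) from summary(all_pca_4) above
eig_first5 <- c(13.282, 5.691, 2.818, 1.981, 1.649)

# Total inertia reported by summary(); with standardized data it equals
# the number of variables (30 here)
total_inertia <- 30

# Share of the total inertia carried by each axis, in percent
projected_pct <- 100 * eig_first5 / total_inertia

# Running total of those shares
cumulative_pct <- cumsum(projected_pct)

print(round(projected_pct, 2))
print(round(cumulative_pct, 2))
```

Because the inputs are already rounded, the last digit may differ slightly from summary()’s own output. With the full object at hand, the same computation would presumably be applied to all_pca_4$eig, dividing by its sum.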

Once again, even though the components of the object returned by dudi.pca() can be read in str()’s output, the print() function does a better job of showcasing the object’s components and how the resulting data is stored within them.

print(all_pca_4)
## Duality diagramm
## class: pca dudi
## $call: dudi.pca(df = wbcd[, -1], scale = TRUE, scannf = FALSE, nf = 30)
## 
## $nf: 30 axis-components saved
## $rank: 30
## eigen values: 13.28 5.691 2.818 1.981 1.649 ...
##   vector length mode    content       
## 1 $cw    30     numeric column weights
## 2 $lw    569    numeric row weights   
## 3 $eig   30     numeric eigen values  
## 
##   data.frame nrow ncol content             
## 1 $tab       569  30   modified array      
## 2 $li        569  30   row coordinates     
## 3 $l1        569  30   row normed scores   
## 4 $co        30   30   column coordinates  
## 5 $c1        30   30   column normed scores
## other elements: cent norm
The resulting output isn’t as impressive as it was with PCA()’s objects, but it’s a cleaner way to inspect the components than str() and definitely an improvement over R’s built-in functions. These components are the following:
  • tab: the data frame to be analyzed, as modified by the transformation arguments (center and scale).
  • cw: the column weights.
  • lw: the row weights.
  • eig: the eigenvalues.
  • rank: the rank of the analyzed matrix.
  • nf: the number of kept factors.
  • c1: the column normed scores i.e. the principal axes.
  • l1: the row normed scores.
  • co: the column coordinates.
  • li: the row coordinates i.e. the principal components.
  • call: the matched function call.
  • cent: a vector of length p (the number of variables) containing the means of the variables.
  • norm: a vector of length p containing the standard deviations of the variables, i.e. the square root of the sum of squared deviations of the values from their means, divided by n.
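
The way these components relate to each other can be checked by hand. The following is a minimal base-R sketch, not ade4’s actual implementation: it uses the built-in USArrests dataset as a stand-in for wbcd, and it assumes uniform row weights (1/n) and unit column weights, which is what the lw and cw values printed above suggest. The variable names mirror the component names purely for readability.

```r
# Stand-in data: 50 rows, 4 numeric variables (built-in dataset)
X <- as.matrix(USArrests)
n <- nrow(X)

# cent and norm: column means and standard deviations computed with n
# (not n - 1) in the denominator
cent <- colMeans(X)
norm <- sqrt(colSums(sweep(X, 2, cent)^2) / n)

# tab: the centered and scaled table
tab <- sweep(sweep(X, 2, cent), 2, norm, "/")

# Eigen-decomposition of the weighted cross-product matrix
# (row weights 1/n, column weights 1); eigenvector signs are arbitrary
dec <- eigen(crossprod(tab) / n, symmetric = TRUE)
eig <- dec$values    # eigenvalues
c1  <- dec$vectors   # column normed scores (principal axes)

li <- tab %*% c1                   # row coordinates (principal components)
l1 <- sweep(li, 2, sqrt(eig), "/") # row normed scores
co <- sweep(c1, 2, sqrt(eig), "*") # column coordinates

# Total inertia equals the number of variables for standardized data
print(sum(eig))
# The weighted variance of each column of li equals its eigenvalue
print(cbind(eig = eig, var_li = colSums(li^2) / n))
```

Under these assumptions, co also coincides with the correlations between the original variables and the principal components, which is why its entries stay between -1 and 1 in the output above.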

Having the eigenvalues stored as a component is an improvement with respect to R’s built-in functions, but PCA()’s arguments are more convenient under most circumstances.

2.2.4. epPCA() from ExPosition

The epPCA() function from the ExPosition package is formatted as follows:

# Do not run this code snippet, as it is only here for illustration purposes
library(ExPosition)
epPCA(DATA, 
      scale = TRUE, 
      center = TRUE, 
      DESIGN = NULL, 
      make_design_nominal = TRUE, 
      graphs = TRUE, 
      k = 0)
Let’s detail its arguments:
  • DATA: the numeric matrix or dataset/dataframe upon which to perform the PCA.
  • scale: TRUE or FALSE determines whether or not to scale/standardize the data.
  • center: TRUE or FALSE determines whether or not to center the data (i.e. subtract each column’s mean).
  • DESIGN: a design matrix to indicate if rows belong to groups.
  • make_design_nominal: if TRUE (default) then DESIGN is a vector that indicates groups (and will be dummy-coded); if FALSE then DESIGN is a dummy-coded matrix.
  • graphs: TRUE or FALSE determines whether or not to display the PCA’s associated graphs.
  • k: number of components to return.
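
To make the effect of the center and scale arguments concrete, the sketch below applies the same two transformations with base R’s scale() function on the built-in USArrests dataset. This is an illustration only: scale() is used as a stand-in, and whether epPCA() performs exactly these steps internally is an assumption. It also shows that scale() divides by n - 1 when computing standard deviations, whereas the norm component of dudi.pca() described earlier divides by n.

```r
# Stand-in data: 50 rows, 4 numeric variables (built-in dataset)
X <- as.matrix(USArrests)
n <- nrow(X)

# Centering only: subtract each column's mean
centered <- scale(X, center = TRUE, scale = FALSE)

# Centering and scaling: also divide by each column's standard deviation
standardized <- scale(X, center = TRUE, scale = TRUE)

# Standard deviations used by scale() (denominator n - 1) ...
sd_sample <- attr(standardized, "scaled:scale")
# ... versus standard deviations with denominator n
sd_population <- sqrt(colSums(centered^2) / n)

# The two differ by a constant factor of sqrt(n / (n - 1))
print(sd_sample / sd_population)
print(sqrt(n / (n - 1)))
```

A difference of this kind would be consistent with the slightly different scale values the two functions store for radius_mean (3.521 in dudi.pca()’s norm versus 3.524 in epPCA()’s output), although the percentages of explained variance are unaffected by it.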

More information regarding the epPCA() function and its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/ExPosition/versions/2.8.23/topics/epPCA

Let’s now evaluate the results when applying epPCA() to the Wisconsin Breast Cancer Dataset:

All
all_pca_5 <- epPCA(wbcd[,-1], scale = TRUE, graphs = FALSE, k = 30)
class(all_pca_5)
## [1] "expoOutput" "list"
str(all_pca_5)
## List of 2
##  $ ExPosition.Data:List of 16
##   ..$ fi    : num [1:569, 1:30] -9.18 -2.39 -5.73 -7.12 -3.93 ...
##   ..$ di    : num [1:569, 1] 114.5 26.3 37.4 195.3 34.4 ...
##   ..$ ci    : num [1:569, 1:30] 0.011182 0.000754 0.00435 0.006714 0.002049 ...
##   ..$ ri    : num [1:569, 1:30] 0.737 0.216 0.878 0.259 0.45 ...
##   ..$ fj    : num [1:30, 1:30] -19.01 -9.01 -19.76 -19.19 -12.38 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   .. .. ..$ : NULL
##   ..$ cj    : num [1:30, 1:30] 0.0479 0.0108 0.0518 0.0488 0.0203 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   .. .. ..$ : NULL
##   ..$ rj    : num [1:30, 1:30] 0.636 0.143 0.688 0.649 0.27 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   .. .. ..$ : NULL
##   ..$ dj    : num [1:30, 1] 568 568 568 568 568 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   .. .. ..$ : NULL
##   ..$ t     : num [1:30] 44.27 18.97 9.39 6.6 5.5 ...
##   ..$ eigs  : num [1:30] 7544 3233 1601 1125 936 ...
##   ..$ pdq   :List of 8
##   .. ..$ p   : num [1:569, 1:30] -0.1057 -0.0275 -0.066 -0.0819 -0.0453 ...
##   .. ..$ q   : num [1:30, 1:30] -0.219 -0.104 -0.228 -0.221 -0.143 ...
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   .. .. .. ..$ : NULL
##   .. ..$ Dv  : num [1:30] 86.9 56.9 40 33.5 30.6 ...
##   .. ..$ Dd  : num [1:30, 1:30] 86.9 0 0 0 0 ...
##   .. ..$ ng  : int 30
##   .. ..$ rank: int 30
##   .. ..$ tau : num [1:30] 44.27 18.97 9.39 6.6 5.5 ...
##   .. ..$ eigs: num [1:30] 7544 3233 1601 1125 936 ...
##   .. ..- attr(*, "class")= chr [1:2] "epSVD" "list"
##   ..$ X     : num [1:569, 1:30] 1.096 1.828 1.578 -0.768 1.749 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : NULL
##   .. .. ..$ : chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   .. ..- attr(*, "scaled:center")= Named num [1:30] 14.1273 19.2896 91.969 654.8891 0.0964 ...
##   .. .. ..- attr(*, "names")= chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   .. ..- attr(*, "scaled:scale")= Named num [1:30] 3.524 4.301 24.299 351.9141 0.0141 ...
##   .. .. ..- attr(*, "names")= chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   ..$ M     : num [1:569] 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ W     : num [1:30] 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ center: Named num [1:30] 14.1273 19.2896 91.969 654.8891 0.0964 ...
##   .. ..- attr(*, "names")= chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   ..$ scale : Named num [1:30] 3.524 4.301 24.299 351.9141 0.0141 ...
##   .. ..- attr(*, "names")= chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   ..- attr(*, "class")= chr [1:2] "epPCA" "list"
##  $ Plotting.Data  :List of 5
##   ..$ fi.col     : chr [1:569, 1] "#305ABF" "#305ABF" "#305ABF" "#305ABF" ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
##   .. .. ..$ : NULL
##   ..$ fi.pch     : num [1:569, 1] 21 21 21 21 21 21 21 21 21 21 ...
##   ..$ fj.col     : chr [1:30, 1] "mediumorchid4" "mediumorchid4" "mediumorchid4" "mediumorchid4" ...
##   ..$ fj.pch     : num [1:30, 1] 21 21 21 21 21 21 21 21 21 21 ...
##   ..$ constraints:List of 4
##   .. ..$ minx: num -26.1
##   .. ..$ miny: num -24
##   .. ..$ maxx: num 6.39
##   .. ..$ maxy: num 15.3
##   ..- attr(*, "class")= chr [1:2] "epGraphs" "list"
##  - attr(*, "class")= chr [1:2] "expoOutput" "list"
summary(all_pca_5)
##                 Length Class    Mode
## ExPosition.Data 16     epPCA    list
## Plotting.Data    5     epGraphs list
Mean
mean_pca_5 <- epPCA(wbcd[,c(2:11)], scale = TRUE, graphs = FALSE, k = 30)
class(mean_pca_5)
## [1] "expoOutput" "list"
str(mean_pca_5)
## List of 2
##  $ ExPosition.Data:List of 16
##   ..$ fi    : num [1:569, 1:10] -5.22 -1.73 -3.97 -3.59 -3.15 ...
##   ..$ di    : num [1:569, 1] 44.7 11.9 16.6 62 15.5 ...
##   ..$ ci    : num [1:569, 1:10] 0.008755 0.000958 0.005055 0.00415 0.003185 ...
##   ..$ ri    : num [1:569, 1:10] 0.609 0.25 0.947 0.208 0.641 ...
##   ..$ fj    : num [1:10, 1:10] -20.3 -8.62 -20.98 -20.31 -12.97 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   .. .. ..$ : NULL
##   ..$ cj    : num [1:10, 1:10] 0.1325 0.0239 0.1414 0.1326 0.054 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   .. .. ..$ : NULL
##   ..$ rj    : num [1:10, 1:10] 0.726 0.131 0.775 0.726 0.296 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   .. .. ..$ : NULL
##   ..$ dj    : num [1:10, 1] 568 568 568 568 568 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   .. .. ..$ : NULL
##   ..$ t     : num [1:10] 54.79 25.19 8.81 4.99 3.73 ...
##   ..$ eigs  : num [1:10] 3112 1431 500 283 212 ...
##   ..$ pdq   :List of 8
##   .. ..$ p   : num [1:569, 1:10] -0.0936 -0.031 -0.0711 -0.0644 -0.0564 ...
##   .. ..$ q   : num [1:10, 1:10] -0.364 -0.154 -0.376 -0.364 -0.232 ...
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   .. .. .. ..$ : NULL
##   .. ..$ Dv  : num [1:10] 55.8 37.8 22.4 16.8 14.5 ...
##   .. ..$ Dd  : num [1:10, 1:10] 55.8 0 0 0 0 ...
##   .. ..$ ng  : int 10
##   .. ..$ rank: int 10
##   .. ..$ tau : num [1:10] 54.79 25.19 8.81 4.99 3.73 ...
##   .. ..$ eigs: num [1:10] 3112 1431 500 283 212 ...
##   .. ..- attr(*, "class")= chr [1:2] "epSVD" "list"
##   ..$ X     : num [1:569, 1:10] 1.096 1.828 1.578 -0.768 1.749 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : NULL
##   .. .. ..$ : chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   .. ..- attr(*, "scaled:center")= Named num [1:10] 14.1273 19.2896 91.969 654.8891 0.0964 ...
##   .. .. ..- attr(*, "names")= chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   .. ..- attr(*, "scaled:scale")= Named num [1:10] 3.524 4.301 24.299 351.9141 0.0141 ...
##   .. .. ..- attr(*, "names")= chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   ..$ M     : num [1:569] 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ W     : num [1:10] 1 1 1 1 1 1 1 1 1 1
##   ..$ center: Named num [1:10] 14.1273 19.2896 91.969 654.8891 0.0964 ...
##   .. ..- attr(*, "names")= chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   ..$ scale : Named num [1:10] 3.524 4.301 24.299 351.9141 0.0141 ...
##   .. ..- attr(*, "names")= chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
##   ..- attr(*, "class")= chr [1:2] "epPCA" "list"
##  $ Plotting.Data  :List of 5
##   ..$ fi.col     : chr [1:569, 1] "#305ABF" "#305ABF" "#305ABF" "#305ABF" ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
##   .. .. ..$ : NULL
##   ..$ fi.pch     : num [1:569, 1] 21 21 21 21 21 21 21 21 21 21 ...
##   ..$ fj.col     : chr [1:10, 1] "mediumorchid4" "mediumorchid4" "mediumorchid4" "mediumorchid4" ...
##   ..$ fj.pch     : num [1:10, 1] 21 21 21 21 21 21 21 21 21 21
##   ..$ constraints:List of 4
##   .. ..$ minx: num -26.8
##   .. ..$ miny: num -24.9
##   .. ..$ maxx: num 4.56
##   .. ..$ maxy: num 13.7
##   ..- attr(*, "class")= chr [1:2] "epGraphs" "list"
##  - attr(*, "class")= chr [1:2] "expoOutput" "list"
summary(mean_pca_5)
##                 Length Class    Mode
## ExPosition.Data 16     epPCA    list
## Plotting.Data    5     epGraphs list
SE
se_pca_5 <- epPCA(wbcd[,c(12:21)], scale = TRUE, graphs = FALSE, k = 30)
class(se_pca_5)
## [1] "expoOutput" "list"
str(se_pca_5)
## List of 2
##  $ ExPosition.Data:List of 16
##   ..$ fi    : num [1:569, 1:10] -4.05 0.34 -1.96 -3.79 -2.22 ...
##   ..$ di    : num [1:569, 1] 25.57 3.4 7.2 36.73 9.84 ...
##   ..$ ci    : num [1:569, 1:10] 0.006086 0.000043 0.001425 0.005336 0.001824 ...
##   ..$ ri    : num [1:569, 1:10] 0.6413 0.0341 0.5336 0.3914 0.4996 ...
##   ..$ fj    : num [1:10, 1:10] -17.94 -9.79 -18.56 -15.78 -11.03 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
##   .. .. ..$ : NULL
##   ..$ cj    : num [1:10, 1:10] 0.1194 0.0356 0.1278 0.0924 0.0451 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
##   .. .. ..$ : NULL
##   ..$ rj    : num [1:10, 1:10] 0.567 0.169 0.606 0.438 0.214 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
##   .. .. ..$ : NULL
##   ..$ dj    : num [1:10, 1] 568 568 568 568 568 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
##   .. .. ..$ : NULL
##   ..$ t     : num [1:10] 47.43 20.75 12.64 5.94 5.77 ...
##   ..$ eigs  : num [1:10] 2694 1179 718 338 328 ...
##   ..$ pdq   :List of 8
##   .. ..$ p   : num [1:569, 1:10] -0.07801 0.00656 -0.03775 -0.07305 -0.04271 ...
##   .. ..$ q   : num [1:10, 1:10] -0.346 -0.189 -0.357 -0.304 -0.212 ...
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
##   .. .. .. ..$ : NULL
##   .. ..$ Dv  : num [1:10] 51.9 34.3 26.8 18.4 18.1 ...
##   .. ..$ Dd  : num [1:10, 1:10] 51.9 0 0 0 0 ...
##   .. ..$ ng  : int 10
##   .. ..$ rank: int 10
##   .. ..$ tau : num [1:10] 47.43 20.75 12.64 5.94 5.77 ...
##   .. ..$ eigs: num [1:10] 2694 1179 718 338 328 ...
##   .. ..- attr(*, "class")= chr [1:2] "epSVD" "list"
##   ..$ X     : num [1:569, 1:10] 2.488 0.499 1.228 0.326 1.269 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : NULL
##   .. .. ..$ : chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
##   .. ..- attr(*, "scaled:center")= Named num [1:10] 0.40517 1.21685 2.86606 40.33708 0.00704 ...
##   .. .. ..- attr(*, "names")= chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
##   .. ..- attr(*, "scaled:scale")= Named num [1:10] 0.277 0.552 2.022 45.491 0.003 ...
##   .. .. ..- attr(*, "names")= chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
##   ..$ M     : num [1:569] 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ W     : num [1:10] 1 1 1 1 1 1 1 1 1 1
##   ..$ center: Named num [1:10] 0.40517 1.21685 2.86606 40.33708 0.00704 ...
##   .. ..- attr(*, "names")= chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
##   ..$ scale : Named num [1:10] 0.277 0.552 2.022 45.491 0.003 ...
##   .. ..- attr(*, "names")= chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
##   ..- attr(*, "class")= chr [1:2] "epPCA" "list"
##  $ Plotting.Data  :List of 5
##   ..$ fi.col     : chr [1:569, 1] "#305ABF" "#305ABF" "#305ABF" "#305ABF" ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
##   .. .. ..$ : NULL
##   ..$ fi.pch     : num [1:569, 1] 21 21 21 21 21 21 21 21 21 21 ...
##   ..$ fj.col     : chr [1:10, 1] "mediumorchid4" "mediumorchid4" "mediumorchid4" "mediumorchid4" ...
##   ..$ fj.pch     : num [1:10, 1] 21 21 21 21 21 21 21 21 21 21
##   ..$ constraints:List of 4
##   .. ..$ minx: num -23
##   .. ..$ miny: num -13.9
##   .. ..$ maxx: num 3.51
##   .. ..$ maxy: num 19.7
##   ..- attr(*, "class")= chr [1:2] "epGraphs" "list"
##  - attr(*, "class")= chr [1:2] "expoOutput" "list"
summary(se_pca_5)
##                 Length Class    Mode
## ExPosition.Data 16     epPCA    list
## Plotting.Data    5     epGraphs list
Worst
worst_pca_5 <- epPCA(wbcd[,c(22:31)], scale = TRUE, graphs = FALSE, k = 30)
class(worst_pca_5)
## [1] "expoOutput" "list"
str(worst_pca_5)
## List of 2
##  $ ExPosition.Data:List of 16
##   ..$ fi    : num [1:569, 1:10] -5.97 -1.82 -3.4 -6.3 -1.15 ...
##   ..$ di    : num [1:569, 1] 44.24 10.98 13.57 96.57 9.07 ...
##   ..$ ci    : num [1:569, 1:10] 0.011011 0.00102 0.003582 0.012262 0.000406 ...
##   ..$ ri    : num [1:569, 1:10] 0.805 0.301 0.854 0.411 0.145 ...
##   ..$ fj    : num [1:10, 1:10] -19.1 -11.4 -19.8 -18.5 -14.1 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
##   .. .. ..$ : NULL
##   ..$ cj    : num [1:10, 1:10] 0.1128 0.0403 0.1212 0.1055 0.0618 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
##   .. .. ..$ : NULL
##   ..$ rj    : num [1:10, 1:10] 0.643 0.23 0.691 0.601 0.352 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
##   .. .. ..$ : NULL
##   ..$ dj    : num [1:10, 1] 568 568 568 568 568 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
##   .. .. ..$ : NULL
##   ..$ t     : num [1:10] 56.97 20.86 8.03 5.41 5.15 ...
##   ..$ eigs  : num [1:10] 3236 1185 456 307 292 ...
##   ..$ pdq   :List of 8
##   .. ..$ p   : num [1:569, 1:10] -0.1049 -0.0319 -0.0599 -0.1107 -0.0201 ...
##   .. ..$ q   : num [1:10, 1:10] -0.336 -0.201 -0.348 -0.325 -0.249 ...
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
##   .. .. .. ..$ : NULL
##   .. ..$ Dv  : num [1:10] 56.9 34.4 21.4 17.5 17.1 ...
##   .. ..$ Dd  : num [1:10, 1:10] 56.9 0 0 0 0 ...
##   .. ..$ ng  : int 10
##   .. ..$ rank: int 10
##   .. ..$ tau : num [1:10] 56.97 20.86 8.03 5.41 5.15 ...
##   .. ..$ eigs: num [1:10] 3236 1185 456 307 292 ...
##   .. ..- attr(*, "class")= chr [1:2] "epSVD" "list"
##   ..$ X     : num [1:569, 1:10] 1.885 1.804 1.511 -0.281 1.297 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : NULL
##   .. .. ..$ : chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
##   .. ..- attr(*, "scaled:center")= Named num [1:10] 16.269 25.677 107.261 880.583 0.132 ...
##   .. .. ..- attr(*, "names")= chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
##   .. ..- attr(*, "scaled:scale")= Named num [1:10] 4.8332 6.1463 33.6025 569.357 0.0228 ...
##   .. .. ..- attr(*, "names")= chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
##   ..$ M     : num [1:569] 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ W     : num [1:10] 1 1 1 1 1 1 1 1 1 1
##   ..$ center: Named num [1:10] 16.269 25.677 107.261 880.583 0.132 ...
##   .. ..- attr(*, "names")= chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
##   ..$ scale : Named num [1:10] 4.8332 6.1463 33.6025 569.357 0.0228 ...
##   .. ..- attr(*, "names")= chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
##   ..- attr(*, "class")= chr [1:2] "epPCA" "list"
##  $ Plotting.Data  :List of 5
##   ..$ fi.col     : chr [1:569, 1] "#305ABF" "#305ABF" "#305ABF" "#305ABF" ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
##   .. .. ..$ : NULL
##   ..$ fi.pch     : num [1:569, 1] 21 21 21 21 21 21 21 21 21 21 ...
##   ..$ fj.col     : chr [1:10, 1] "mediumorchid4" "mediumorchid4" "mediumorchid4" "mediumorchid4" ...
##   ..$ fj.pch     : num [1:10, 1] 21 21 21 21 21 21 21 21 21 21
##   ..$ constraints:List of 4
##   .. ..$ minx: num -26
##   .. ..$ miny: num -18.9
##   .. ..$ maxx: num 5.09
##   .. ..$ maxy: num 16.4
##   ..- attr(*, "class")= chr [1:2] "epGraphs" "list"
##  - attr(*, "class")= chr [1:2] "expoOutput" "list"
summary(worst_pca_5)
##                 Length Class    Mode
## ExPosition.Data 16     epPCA    list
## Plotting.Data    5     epGraphs list

The class() function shows that the object created by epPCA() is of class “expoOutput” and “list”. It is also worth noting that the cumulative proportions, which were easy to observe with the previous functions, are not readily available for evaluation: no component of the object reports them directly.

The outputs of both str() and summary() show that the core components are nested within two “parent” components named ExPosition.Data and Plotting.Data. Applying the print() function to the object confirms this:

print(all_pca_5)
## **ExPosition output data**
## *Contains the following objects:
## 
##   name              
## 1 "$ExPosition.Data"
## 2 "$Plotting.Data"  
##   description                                                                 
## 1 "All ExPosition classes output (data, factor scores, contributions, etc...)"
## 2 "All ExPosition & prettyGraphs plotting data (constraints, colors, etc...)"

The brief descriptions in print()’s output imply that the components equivalent to those seen in previous functions are stored inside ExPosition.Data, whereas Plotting.Data holds the data for an optional plot (as the names suggest).

The following code snippet details the components stored inside ExPosition.Data:

All
print(all_pca_5$ExPosition.Data)
## **Results for Principal Component Analysis**
## The analysis was performed on  569 individuals, described by 30 variables
## *The results are available in the following objects:
## 
##    name      description                      
## 1  "$fi"     "Factor scores of the rows"      
## 2  "$di"     "Squared distances of the rows"  
## 3  "$ci"     "Contributions of the rows"      
## 4  "$ri"     "Cosines of the rows"            
## 5  "$fj"     "Factor scores of the columns"   
## 6  "$dj"     "square distances of the columns"
## 7  "$cj"     "Contributions for the columns"  
## 8  "$rj"     "Cosines of the columns"         
## 9  "$t"      "Explained Variance"             
## 10 "$eigs"   "Eigenvalues"                    
## 11 "$pdq"    "SVD data"                       
## 12 "$X"      "X matrix to decompose"          
## 13 "$M"      "Masses - each set to 1"         
## 14 "$W"      "Weights - each set to 1"        
## 15 "$center" "Center of X"                    
## 16 "$scale"  "Scale factor of X"
summary(all_pca_5$ExPosition.Data)
##        Length Class  Mode   
## fi     17070  -none- numeric
## di       569  -none- numeric
## ci     17070  -none- numeric
## ri     17070  -none- numeric
## fj       900  -none- numeric
## cj       900  -none- numeric
## rj       900  -none- numeric
## dj        30  -none- numeric
## t         30  -none- numeric
## eigs      30  -none- numeric
## pdq        8  epSVD  list   
## X      17070  -none- numeric
## M        569  -none- numeric
## W         30  -none- numeric
## center    30  -none- numeric
## scale     30  -none- numeric
Mean
print(mean_pca_5$ExPosition.Data)
## **Results for Principal Component Analysis**
## The analysis was performed on  569 individuals, described by 10 variables
## *The results are available in the following objects:
## 
##    name      description                      
## 1  "$fi"     "Factor scores of the rows"      
## 2  "$di"     "Squared distances of the rows"  
## 3  "$ci"     "Contributions of the rows"      
## 4  "$ri"     "Cosines of the rows"            
## 5  "$fj"     "Factor scores of the columns"   
## 6  "$dj"     "square distances of the columns"
## 7  "$cj"     "Contributions for the columns"  
## 8  "$rj"     "Cosines of the columns"         
## 9  "$t"      "Explained Variance"             
## 10 "$eigs"   "Eigenvalues"                    
## 11 "$pdq"    "SVD data"                       
## 12 "$X"      "X matrix to decompose"          
## 13 "$M"      "Masses - each set to 1"         
## 14 "$W"      "Weights - each set to 1"        
## 15 "$center" "Center of X"                    
## 16 "$scale"  "Scale factor of X"
summary(mean_pca_5$ExPosition.Data)
##        Length Class  Mode   
## fi     5690   -none- numeric
## di      569   -none- numeric
## ci     5690   -none- numeric
## ri     5690   -none- numeric
## fj      100   -none- numeric
## cj      100   -none- numeric
## rj      100   -none- numeric
## dj       10   -none- numeric
## t        10   -none- numeric
## eigs     10   -none- numeric
## pdq       8   epSVD  list   
## X      5690   -none- numeric
## M       569   -none- numeric
## W        10   -none- numeric
## center   10   -none- numeric
## scale    10   -none- numeric
SE
print(se_pca_5$ExPosition.Data)
## **Results for Principal Component Analysis**
## The analysis was performed on  569 individuals, described by 10 variables
## *The results are available in the following objects:
## 
##    name      description                      
## 1  "$fi"     "Factor scores of the rows"      
## 2  "$di"     "Squared distances of the rows"  
## 3  "$ci"     "Contributions of the rows"      
## 4  "$ri"     "Cosines of the rows"            
## 5  "$fj"     "Factor scores of the columns"   
## 6  "$dj"     "square distances of the columns"
## 7  "$cj"     "Contributions for the columns"  
## 8  "$rj"     "Cosines of the columns"         
## 9  "$t"      "Explained Variance"             
## 10 "$eigs"   "Eigenvalues"                    
## 11 "$pdq"    "SVD data"                       
## 12 "$X"      "X matrix to decompose"          
## 13 "$M"      "Masses - each set to 1"         
## 14 "$W"      "Weights - each set to 1"        
## 15 "$center" "Center of X"                    
## 16 "$scale"  "Scale factor of X"
summary(se_pca_5$ExPosition.Data)
##        Length Class  Mode   
## fi     5690   -none- numeric
## di      569   -none- numeric
## ci     5690   -none- numeric
## ri     5690   -none- numeric
## fj      100   -none- numeric
## cj      100   -none- numeric
## rj      100   -none- numeric
## dj       10   -none- numeric
## t        10   -none- numeric
## eigs     10   -none- numeric
## pdq       8   epSVD  list   
## X      5690   -none- numeric
## M       569   -none- numeric
## W        10   -none- numeric
## center   10   -none- numeric
## scale    10   -none- numeric
Worst
print(worst_pca_5$ExPosition.Data)
## **Results for Principal Component Analysis**
## The analysis was performed on  569 individuals, described by 10 variables
## *The results are available in the following objects:
## 
##    name      description                      
## 1  "$fi"     "Factor scores of the rows"      
## 2  "$di"     "Squared distances of the rows"  
## 3  "$ci"     "Contributions of the rows"      
## 4  "$ri"     "Cosines of the rows"            
## 5  "$fj"     "Factor scores of the columns"   
## 6  "$dj"     "square distances of the columns"
## 7  "$cj"     "Contributions for the columns"  
## 8  "$rj"     "Cosines of the columns"         
## 9  "$t"      "Explained Variance"             
## 10 "$eigs"   "Eigenvalues"                    
## 11 "$pdq"    "SVD data"                       
## 12 "$X"      "X matrix to decompose"          
## 13 "$M"      "Masses - each set to 1"         
## 14 "$W"      "Weights - each set to 1"        
## 15 "$center" "Center of X"                    
## 16 "$scale"  "Scale factor of X"
summary(worst_pca_5$ExPosition.Data)
##        Length Class  Mode   
## fi     5690   -none- numeric
## di      569   -none- numeric
## ci     5690   -none- numeric
## ri     5690   -none- numeric
## fj      100   -none- numeric
## cj      100   -none- numeric
## rj      100   -none- numeric
## dj       10   -none- numeric
## t        10   -none- numeric
## eigs     10   -none- numeric
## pdq       8   epSVD  list   
## X      5690   -none- numeric
## M       569   -none- numeric
## W        10   -none- numeric
## center   10   -none- numeric
## scale    10   -none- numeric

As was the case with dudi.pca(), having the eigenvalues stored as a component is an improvement over R’s built-in functions, but PCA()’s arguments are more convenient under most circumstances. What is more, epPCA() is the only PCA function covered here with no direct access to cumulative variance.
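
Although epPCA() exposes no cumulative variance component, it can be derived from the explained variance stored in $ExPosition.Data$t with the base R function cumsum(). The following is a minimal sketch: to keep it runnable on its own, the vector explained (an illustrative name, not part of ExPosition) is typed in by hand from the first five percentages of variance obtained for the full dataset, rather than being read from all_pca_5.

```r
# Percentage of variance explained by the first five PCs of the full dataset,
# copied by hand from the outputs in this document (illustrative stand-in
# for all_pca_5$ExPosition.Data$t[1:5]).
explained <- c(44.27203, 18.97118, 9.393163, 6.602135, 5.495768)

# Cumulative percentage of variance: running sum of the individual percentages.
cumulative <- cumsum(explained)

# Present both columns side by side.
variance_table <- data.frame(
  component = paste0("PC", seq_along(explained)),
  explained = explained,
  cumulative = cumulative
)
print(variance_table)
```

With the actual object, the same result should be obtained through cumsum(all_pca_5$ExPosition.Data$t).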

2.2.5. Eigenvalues

The aforementioned video essay by Josh Starmer (https://www.youtube.com/watch?v=FgakZw6K1QQ) helps with the understanding of eigenvalues, since it includes a graphical explanation detailing where they come from. In summary, each principal component has an eigenvector (also known as a singular vector), a unit-length vector that gives the direction of the component, and an eigenvalue, which measures the amount of variation retained along that direction. As such, eigenvalues are large for the first PCs and small for the subsequent ones, meaning that the first PCs correspond to the directions with the maximum amount of variation in the dataset (which makes sense, since the first PC is the one that best explains the dataset).
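
The relationship between eigenvectors, eigenvalues and principal components can be checked numerically. The sketch below is not part of the original analysis: it uses the built-in iris measurements as a small stand-in for wbcd (an assumption made only to keep the example self-contained) and compares the eigendecomposition of the correlation matrix with the output of prcomp().

```r
# Stand-in data: the four numeric columns of the built-in iris dataset.
x <- iris[, 1:4]

# Eigendecomposition of the correlation matrix (equivalent to PCA on scaled data).
decomposition <- eigen(cor(x))

# PCA on centered and scaled data with the built-in prcomp().
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Each eigenvector has unit length (its squared entries sum to 1).
vector_lengths <- sqrt(colSums(decomposition$vectors^2))
print(vector_lengths)

# The eigenvalues match the variances of the principal components ...
print(cbind(eigenvalue = decomposition$values, pc_variance = pca$sdev^2))

# ... and, for scaled data, they add up to the number of variables.
print(sum(decomposition$values))
```

Since eigen() returns the eigenvalues in decreasing order, the printed table also shows the pattern described above: the first component retains the most variation.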

Some of the previously described functions make the eigenvalues readily accessible; retrieving them requires addressing the proper component of the object that each function returns:

PCA()
all_pca_3$eig
##           eigenvalue percentage of variance cumulative percentage of variance
## comp 1  1.328161e+01           4.427203e+01                          44.27203
## comp 2  5.691355e+00           1.897118e+01                          63.24321
## comp 3  2.817949e+00           9.393163e+00                          72.63637
## comp 4  1.980640e+00           6.602135e+00                          79.23851
## comp 5  1.648731e+00           5.495768e+00                          84.73427
## comp 6  1.207357e+00           4.024522e+00                          88.75880
## comp 7  6.752201e-01           2.250734e+00                          91.00953
## comp 8  4.766171e-01           1.588724e+00                          92.59825
## comp 9  4.168948e-01           1.389649e+00                          93.98790
## comp 10 3.506935e-01           1.168978e+00                          95.15688
## comp 11 2.939157e-01           9.797190e-01                          96.13660
## comp 12 2.611614e-01           8.705379e-01                          97.00714
## comp 13 2.413575e-01           8.045250e-01                          97.81166
## comp 14 1.570097e-01           5.233657e-01                          98.33503
## comp 15 9.413497e-02           3.137832e-01                          98.64881
## comp 16 7.986280e-02           2.662093e-01                          98.91502
## comp 17 5.939904e-02           1.979968e-01                          99.11302
## comp 18 5.261878e-02           1.753959e-01                          99.28841
## comp 19 4.947759e-02           1.649253e-01                          99.45334
## comp 20 3.115940e-02           1.038647e-01                          99.55720
## comp 21 2.997289e-02           9.990965e-02                          99.65711
## comp 22 2.743940e-02           9.146468e-02                          99.74858
## comp 23 2.434084e-02           8.113613e-02                          99.82971
## comp 24 1.805501e-02           6.018336e-02                          99.88990
## comp 25 1.548127e-02           5.160424e-02                          99.94150
## comp 26 8.177640e-03           2.725880e-02                          99.96876
## comp 27 6.900464e-03           2.300155e-02                          99.99176
## comp 28 1.589338e-03           5.297793e-03                          99.99706
## comp 29 7.488031e-04           2.496010e-03                          99.99956
## comp 30 1.330448e-04           4.434827e-04                         100.00000
dudi.pca()
all_pca_4$eig
##  [1] 1.328161e+01 5.691355e+00 2.817949e+00 1.980640e+00 1.648731e+00
##  [6] 1.207357e+00 6.752201e-01 4.766171e-01 4.168948e-01 3.506935e-01
## [11] 2.939157e-01 2.611614e-01 2.413575e-01 1.570097e-01 9.413497e-02
## [16] 7.986280e-02 5.939904e-02 5.261878e-02 4.947759e-02 3.115940e-02
## [21] 2.997289e-02 2.743940e-02 2.434084e-02 1.805501e-02 1.548127e-02
## [26] 8.177640e-03 6.900464e-03 1.589338e-03 7.488031e-04 1.330448e-04
epPCA()
all_pca_5$ExPosition.Data$eigs
##  [1] 7.543953e+03 3.232689e+03 1.600595e+03 1.125004e+03 9.364790e+02
##  [6] 6.857786e+02 3.835250e+02 2.707185e+02 2.367963e+02 1.991939e+02
## [11] 1.669441e+02 1.483397e+02 1.370911e+02 8.918152e+01 5.346866e+01
## [16] 4.536207e+01 3.373865e+01 2.988747e+01 2.810327e+01 1.769854e+01
## [21] 1.702460e+01 1.558558e+01 1.382560e+01 1.025524e+01 8.793362e+00
## [26] 4.644899e+00 3.919463e+00 9.027439e-01 4.253202e-01 7.556946e-02

Note that, for example, accessing the eigenvalues of the objects created with the dudi.pca() and epPCA() functions does not return the variances associated with each of the eigenvalues, which is a useful metric worth observing and evaluating. In epPCA()’s case these variances can be accessed via $ExPosition.Data$t, but as was already stated there is no component for cumulative variances. In dudi.pca()’s case, one can observe these variances through the summary() function, but format-wise that is less than ideal. Fortunately, there are alternative ways to obtain the eigenvalues: the package factoextra includes several functions to extract and visualize them along with their variances. More information about them can be found in their associated RDocumentation page: https://www.rdocumentation.org/packages/factoextra/versions/1.0.7/topics/eigenvalue
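
Comparing the outputs above also shows that the eigenvalues returned by epPCA() are on a different scale from those of PCA() and dudi.pca(). Judging from the printed values, they appear to differ by a factor of 568, that is, the number of observations minus one (569 - 1). The following sketch checks this on the first three eigenvalues, typed in by hand from the outputs above; the variable names are illustrative.

```r
# First three eigenvalues as printed above (copied by hand, so rounded).
eigs_pca <- c(13.28161, 5.691355, 2.817949)    # all_pca_3$eig[1:3, 1]
eigs_ep  <- c(7543.953, 3232.689, 1600.595)    # all_pca_5$ExPosition.Data$eigs[1:3]

# Number of observations in the dataset.
n_obs <- 569

# Dividing the epPCA() eigenvalues by (n - 1) brings them to the same scale.
eigs_ep_rescaled <- eigs_ep / (n_obs - 1)
print(round(eigs_ep_rescaled, 4))

# The ratio between both sets of eigenvalues is (approximately) constant.
print(eigs_ep / eigs_pca)
```

The percentages of explained variance are unaffected by this scaling, which is why the values in $ExPosition.Data$t match those reported by the other functions.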

Let’s focus on the eigenvalues themselves, which can be extracted through the functions get_eig() and get_eigenvalue(). Note that both functions are identical: one is simply an alias of the other.

library(factoextra)
# pca is a placeholder for a previously computed PCA object (e.g. the result of prcomp())
get_eig(pca)
get_eigenvalue(pca)

Let’s observe the results of applying said functions to the previously constructed PCAs:

prcomp()
get_eig(all_pca_1)
##          eigenvalue variance.percent cumulative.variance.percent
## Dim.1  1.328161e+01     4.427203e+01                    44.27203
## Dim.2  5.691355e+00     1.897118e+01                    63.24321
## Dim.3  2.817949e+00     9.393163e+00                    72.63637
## Dim.4  1.980640e+00     6.602135e+00                    79.23851
## Dim.5  1.648731e+00     5.495768e+00                    84.73427
## Dim.6  1.207357e+00     4.024522e+00                    88.75880
## Dim.7  6.752201e-01     2.250734e+00                    91.00953
## Dim.8  4.766171e-01     1.588724e+00                    92.59825
## Dim.9  4.168948e-01     1.389649e+00                    93.98790
## Dim.10 3.506935e-01     1.168978e+00                    95.15688
## Dim.11 2.939157e-01     9.797190e-01                    96.13660
## Dim.12 2.611614e-01     8.705379e-01                    97.00714
## Dim.13 2.413575e-01     8.045250e-01                    97.81166
## Dim.14 1.570097e-01     5.233657e-01                    98.33503
## Dim.15 9.413497e-02     3.137832e-01                    98.64881
## Dim.16 7.986280e-02     2.662093e-01                    98.91502
## Dim.17 5.939904e-02     1.979968e-01                    99.11302
## Dim.18 5.261878e-02     1.753959e-01                    99.28841
## Dim.19 4.947759e-02     1.649253e-01                    99.45334
## Dim.20 3.115940e-02     1.038647e-01                    99.55720
## Dim.21 2.997289e-02     9.990965e-02                    99.65711
## Dim.22 2.743940e-02     9.146468e-02                    99.74858
## Dim.23 2.434084e-02     8.113613e-02                    99.82971
## Dim.24 1.805501e-02     6.018336e-02                    99.88990
## Dim.25 1.548127e-02     5.160424e-02                    99.94150
## Dim.26 8.177640e-03     2.725880e-02                    99.96876
## Dim.27 6.900464e-03     2.300155e-02                    99.99176
## Dim.28 1.589338e-03     5.297793e-03                    99.99706
## Dim.29 7.488031e-04     2.496010e-03                    99.99956
## Dim.30 1.330448e-04     4.434827e-04                   100.00000
get_eigenvalue(all_pca_1)
##          eigenvalue variance.percent cumulative.variance.percent
## Dim.1  1.328161e+01     4.427203e+01                    44.27203
## Dim.2  5.691355e+00     1.897118e+01                    63.24321
## Dim.3  2.817949e+00     9.393163e+00                    72.63637
## Dim.4  1.980640e+00     6.602135e+00                    79.23851
## Dim.5  1.648731e+00     5.495768e+00                    84.73427
## Dim.6  1.207357e+00     4.024522e+00                    88.75880
## Dim.7  6.752201e-01     2.250734e+00                    91.00953
## Dim.8  4.766171e-01     1.588724e+00                    92.59825
## Dim.9  4.168948e-01     1.389649e+00                    93.98790
## Dim.10 3.506935e-01     1.168978e+00                    95.15688
## Dim.11 2.939157e-01     9.797190e-01                    96.13660
## Dim.12 2.611614e-01     8.705379e-01                    97.00714
## Dim.13 2.413575e-01     8.045250e-01                    97.81166
## Dim.14 1.570097e-01     5.233657e-01                    98.33503
## Dim.15 9.413497e-02     3.137832e-01                    98.64881
## Dim.16 7.986280e-02     2.662093e-01                    98.91502
## Dim.17 5.939904e-02     1.979968e-01                    99.11302
## Dim.18 5.261878e-02     1.753959e-01                    99.28841
## Dim.19 4.947759e-02     1.649253e-01                    99.45334
## Dim.20 3.115940e-02     1.038647e-01                    99.55720
## Dim.21 2.997289e-02     9.990965e-02                    99.65711
## Dim.22 2.743940e-02     9.146468e-02                    99.74858
## Dim.23 2.434084e-02     8.113613e-02                    99.82971
## Dim.24 1.805501e-02     6.018336e-02                    99.88990
## Dim.25 1.548127e-02     5.160424e-02                    99.94150
## Dim.26 8.177640e-03     2.725880e-02                    99.96876
## Dim.27 6.900464e-03     2.300155e-02                    99.99176
## Dim.28 1.589338e-03     5.297793e-03                    99.99706
## Dim.29 7.488031e-04     2.496010e-03                    99.99956
## Dim.30 1.330448e-04     4.434827e-04                   100.00000
identical(get_eig(all_pca_1), get_eigenvalue(all_pca_1))
## [1] TRUE
princomp()
get_eig(all_pca_2)
##          eigenvalue variance.percent cumulative.variance.percent
## Dim.1  1.328161e+01     4.427203e+01                    44.27203
## Dim.2  5.691355e+00     1.897118e+01                    63.24321
## Dim.3  2.817949e+00     9.393163e+00                    72.63637
## Dim.4  1.980640e+00     6.602135e+00                    79.23851
## Dim.5  1.648731e+00     5.495768e+00                    84.73427
## Dim.6  1.207357e+00     4.024522e+00                    88.75880
## Dim.7  6.752201e-01     2.250734e+00                    91.00953
## Dim.8  4.766171e-01     1.588724e+00                    92.59825
## Dim.9  4.168948e-01     1.389649e+00                    93.98790
## Dim.10 3.506935e-01     1.168978e+00                    95.15688
## Dim.11 2.939157e-01     9.797190e-01                    96.13660
## Dim.12 2.611614e-01     8.705379e-01                    97.00714
## Dim.13 2.413575e-01     8.045250e-01                    97.81166
## Dim.14 1.570097e-01     5.233657e-01                    98.33503
## Dim.15 9.413497e-02     3.137832e-01                    98.64881
## Dim.16 7.986280e-02     2.662093e-01                    98.91502
## Dim.17 5.939904e-02     1.979968e-01                    99.11302
## Dim.18 5.261878e-02     1.753959e-01                    99.28841
## Dim.19 4.947759e-02     1.649253e-01                    99.45334
## Dim.20 3.115940e-02     1.038647e-01                    99.55720
## Dim.21 2.997289e-02     9.990965e-02                    99.65711
## Dim.22 2.743940e-02     9.146468e-02                    99.74858
## Dim.23 2.434084e-02     8.113613e-02                    99.82971
## Dim.24 1.805501e-02     6.018336e-02                    99.88990
## Dim.25 1.548127e-02     5.160424e-02                    99.94150
## Dim.26 8.177640e-03     2.725880e-02                    99.96876
## Dim.27 6.900464e-03     2.300155e-02                    99.99176
## Dim.28 1.589338e-03     5.297793e-03                    99.99706
## Dim.29 7.488031e-04     2.496010e-03                    99.99956
## Dim.30 1.330448e-04     4.434827e-04                   100.00000
get_eigenvalue(all_pca_2)
##          eigenvalue variance.percent cumulative.variance.percent
## Dim.1  1.328161e+01     4.427203e+01                    44.27203
## Dim.2  5.691355e+00     1.897118e+01                    63.24321
## Dim.3  2.817949e+00     9.393163e+00                    72.63637
## Dim.4  1.980640e+00     6.602135e+00                    79.23851
## Dim.5  1.648731e+00     5.495768e+00                    84.73427
## Dim.6  1.207357e+00     4.024522e+00                    88.75880
## Dim.7  6.752201e-01     2.250734e+00                    91.00953
## Dim.8  4.766171e-01     1.588724e+00                    92.59825
## Dim.9  4.168948e-01     1.389649e+00                    93.98790
## Dim.10 3.506935e-01     1.168978e+00                    95.15688
## Dim.11 2.939157e-01     9.797190e-01                    96.13660
## Dim.12 2.611614e-01     8.705379e-01                    97.00714
## Dim.13 2.413575e-01     8.045250e-01                    97.81166
## Dim.14 1.570097e-01     5.233657e-01                    98.33503
## Dim.15 9.413497e-02     3.137832e-01                    98.64881
## Dim.16 7.986280e-02     2.662093e-01                    98.91502
## Dim.17 5.939904e-02     1.979968e-01                    99.11302
## Dim.18 5.261878e-02     1.753959e-01                    99.28841
## Dim.19 4.947759e-02     1.649253e-01                    99.45334
## Dim.20 3.115940e-02     1.038647e-01                    99.55720
## Dim.21 2.997289e-02     9.990965e-02                    99.65711
## Dim.22 2.743940e-02     9.146468e-02                    99.74858
## Dim.23 2.434084e-02     8.113613e-02                    99.82971
## Dim.24 1.805501e-02     6.018336e-02                    99.88990
## Dim.25 1.548127e-02     5.160424e-02                    99.94150
## Dim.26 8.177640e-03     2.725880e-02                    99.96876
## Dim.27 6.900464e-03     2.300155e-02                    99.99176
## Dim.28 1.589338e-03     5.297793e-03                    99.99706
## Dim.29 7.488031e-04     2.496010e-03                    99.99956
## Dim.30 1.330448e-04     4.434827e-04                   100.00000
identical(get_eig(all_pca_2), get_eigenvalue(all_pca_2))
## [1] TRUE
PCA()
get_eig(all_pca_3)
##          eigenvalue variance.percent cumulative.variance.percent
## Dim.1  1.328161e+01     4.427203e+01                    44.27203
## Dim.2  5.691355e+00     1.897118e+01                    63.24321
## Dim.3  2.817949e+00     9.393163e+00                    72.63637
## Dim.4  1.980640e+00     6.602135e+00                    79.23851
## Dim.5  1.648731e+00     5.495768e+00                    84.73427
## Dim.6  1.207357e+00     4.024522e+00                    88.75880
## Dim.7  6.752201e-01     2.250734e+00                    91.00953
## Dim.8  4.766171e-01     1.588724e+00                    92.59825
## Dim.9  4.168948e-01     1.389649e+00                    93.98790
## Dim.10 3.506935e-01     1.168978e+00                    95.15688
## Dim.11 2.939157e-01     9.797190e-01                    96.13660
## Dim.12 2.611614e-01     8.705379e-01                    97.00714
## Dim.13 2.413575e-01     8.045250e-01                    97.81166
## Dim.14 1.570097e-01     5.233657e-01                    98.33503
## Dim.15 9.413497e-02     3.137832e-01                    98.64881
## Dim.16 7.986280e-02     2.662093e-01                    98.91502
## Dim.17 5.939904e-02     1.979968e-01                    99.11302
## Dim.18 5.261878e-02     1.753959e-01                    99.28841
## Dim.19 4.947759e-02     1.649253e-01                    99.45334
## Dim.20 3.115940e-02     1.038647e-01                    99.55720
## Dim.21 2.997289e-02     9.990965e-02                    99.65711
## Dim.22 2.743940e-02     9.146468e-02                    99.74858
## Dim.23 2.434084e-02     8.113613e-02                    99.82971
## Dim.24 1.805501e-02     6.018336e-02                    99.88990
## Dim.25 1.548127e-02     5.160424e-02                    99.94150
## Dim.26 8.177640e-03     2.725880e-02                    99.96876
## Dim.27 6.900464e-03     2.300155e-02                    99.99176
## Dim.28 1.589338e-03     5.297793e-03                    99.99706
## Dim.29 7.488031e-04     2.496010e-03                    99.99956
## Dim.30 1.330448e-04     4.434827e-04                   100.00000
get_eigenvalue(all_pca_3)
##          eigenvalue variance.percent cumulative.variance.percent
## Dim.1  1.328161e+01     4.427203e+01                    44.27203
## Dim.2  5.691355e+00     1.897118e+01                    63.24321
## Dim.3  2.817949e+00     9.393163e+00                    72.63637
## Dim.4  1.980640e+00     6.602135e+00                    79.23851
## Dim.5  1.648731e+00     5.495768e+00                    84.73427
## Dim.6  1.207357e+00     4.024522e+00                    88.75880
## Dim.7  6.752201e-01     2.250734e+00                    91.00953
## Dim.8  4.766171e-01     1.588724e+00                    92.59825
## Dim.9  4.168948e-01     1.389649e+00                    93.98790
## Dim.10 3.506935e-01     1.168978e+00                    95.15688
## Dim.11 2.939157e-01     9.797190e-01                    96.13660
## Dim.12 2.611614e-01     8.705379e-01                    97.00714
## Dim.13 2.413575e-01     8.045250e-01                    97.81166
## Dim.14 1.570097e-01     5.233657e-01                    98.33503
## Dim.15 9.413497e-02     3.137832e-01                    98.64881
## Dim.16 7.986280e-02     2.662093e-01                    98.91502
## Dim.17 5.939904e-02     1.979968e-01                    99.11302
## Dim.18 5.261878e-02     1.753959e-01                    99.28841
## Dim.19 4.947759e-02     1.649253e-01                    99.45334
## Dim.20 3.115940e-02     1.038647e-01                    99.55720
## Dim.21 2.997289e-02     9.990965e-02                    99.65711
## Dim.22 2.743940e-02     9.146468e-02                    99.74858
## Dim.23 2.434084e-02     8.113613e-02                    99.82971
## Dim.24 1.805501e-02     6.018336e-02                    99.88990
## Dim.25 1.548127e-02     5.160424e-02                    99.94150
## Dim.26 8.177640e-03     2.725880e-02                    99.96876
## Dim.27 6.900464e-03     2.300155e-02                    99.99176
## Dim.28 1.589338e-03     5.297793e-03                    99.99706
## Dim.29 7.488031e-04     2.496010e-03                    99.99956
## Dim.30 1.330448e-04     4.434827e-04                   100.00000
identical(get_eig(all_pca_3), get_eigenvalue(all_pca_3))
## [1] TRUE
dudi.pca()
get_eig(all_pca_4)
##          eigenvalue variance.percent cumulative.variance.percent
## Dim.1  1.328161e+01     4.427203e+01                    44.27203
## Dim.2  5.691355e+00     1.897118e+01                    63.24321
## Dim.3  2.817949e+00     9.393163e+00                    72.63637
## Dim.4  1.980640e+00     6.602135e+00                    79.23851
## Dim.5  1.648731e+00     5.495768e+00                    84.73427
## Dim.6  1.207357e+00     4.024522e+00                    88.75880
## Dim.7  6.752201e-01     2.250734e+00                    91.00953
## Dim.8  4.766171e-01     1.588724e+00                    92.59825
## Dim.9  4.168948e-01     1.389649e+00                    93.98790
## Dim.10 3.506935e-01     1.168978e+00                    95.15688
## Dim.11 2.939157e-01     9.797190e-01                    96.13660
## Dim.12 2.611614e-01     8.705379e-01                    97.00714
## Dim.13 2.413575e-01     8.045250e-01                    97.81166
## Dim.14 1.570097e-01     5.233657e-01                    98.33503
## Dim.15 9.413497e-02     3.137832e-01                    98.64881
## Dim.16 7.986280e-02     2.662093e-01                    98.91502
## Dim.17 5.939904e-02     1.979968e-01                    99.11302
## Dim.18 5.261878e-02     1.753959e-01                    99.28841
## Dim.19 4.947759e-02     1.649253e-01                    99.45334
## Dim.20 3.115940e-02     1.038647e-01                    99.55720
## Dim.21 2.997289e-02     9.990965e-02                    99.65711
## Dim.22 2.743940e-02     9.146468e-02                    99.74858
## Dim.23 2.434084e-02     8.113613e-02                    99.82971
## Dim.24 1.805501e-02     6.018336e-02                    99.88990
## Dim.25 1.548127e-02     5.160424e-02                    99.94150
## Dim.26 8.177640e-03     2.725880e-02                    99.96876
## Dim.27 6.900464e-03     2.300155e-02                    99.99176
## Dim.28 1.589338e-03     5.297793e-03                    99.99706
## Dim.29 7.488031e-04     2.496010e-03                    99.99956
## Dim.30 1.330448e-04     4.434827e-04                   100.00000
get_eigenvalue(all_pca_4)
##          eigenvalue variance.percent cumulative.variance.percent
## Dim.1  1.328161e+01     4.427203e+01                    44.27203
## Dim.2  5.691355e+00     1.897118e+01                    63.24321
## Dim.3  2.817949e+00     9.393163e+00                    72.63637
## Dim.4  1.980640e+00     6.602135e+00                    79.23851
## Dim.5  1.648731e+00     5.495768e+00                    84.73427
## Dim.6  1.207357e+00     4.024522e+00                    88.75880
## Dim.7  6.752201e-01     2.250734e+00                    91.00953
## Dim.8  4.766171e-01     1.588724e+00                    92.59825
## Dim.9  4.168948e-01     1.389649e+00                    93.98790
## Dim.10 3.506935e-01     1.168978e+00                    95.15688
## Dim.11 2.939157e-01     9.797190e-01                    96.13660
## Dim.12 2.611614e-01     8.705379e-01                    97.00714
## Dim.13 2.413575e-01     8.045250e-01                    97.81166
## Dim.14 1.570097e-01     5.233657e-01                    98.33503
## Dim.15 9.413497e-02     3.137832e-01                    98.64881
## Dim.16 7.986280e-02     2.662093e-01                    98.91502
## Dim.17 5.939904e-02     1.979968e-01                    99.11302
## Dim.18 5.261878e-02     1.753959e-01                    99.28841
## Dim.19 4.947759e-02     1.649253e-01                    99.45334
## Dim.20 3.115940e-02     1.038647e-01                    99.55720
## Dim.21 2.997289e-02     9.990965e-02                    99.65711
## Dim.22 2.743940e-02     9.146468e-02                    99.74858
## Dim.23 2.434084e-02     8.113613e-02                    99.82971
## Dim.24 1.805501e-02     6.018336e-02                    99.88990
## Dim.25 1.548127e-02     5.160424e-02                    99.94150
## Dim.26 8.177640e-03     2.725880e-02                    99.96876
## Dim.27 6.900464e-03     2.300155e-02                    99.99176
## Dim.28 1.589338e-03     5.297793e-03                    99.99706
## Dim.29 7.488031e-04     2.496010e-03                    99.99956
## Dim.30 1.330448e-04     4.434827e-04                   100.00000
identical(get_eig(all_pca_4), get_eigenvalue(all_pca_4))
## [1] TRUE
epPCA()
get_eig(all_pca_5)
##          eigenvalue variance.percent cumulative.variance.percent
## Dim.1  7.543953e+03     4.427203e+01                    44.27203
## Dim.2  3.232689e+03     1.897118e+01                    63.24321
## Dim.3  1.600595e+03     9.393163e+00                    72.63637
## Dim.4  1.125004e+03     6.602135e+00                    79.23851
## Dim.5  9.364790e+02     5.495768e+00                    84.73427
## Dim.6  6.857786e+02     4.024522e+00                    88.75880
## Dim.7  3.835250e+02     2.250734e+00                    91.00953
## Dim.8  2.707185e+02     1.588724e+00                    92.59825
## Dim.9  2.367963e+02     1.389649e+00                    93.98790
## Dim.10 1.991939e+02     1.168978e+00                    95.15688
## Dim.11 1.669441e+02     9.797190e-01                    96.13660
## Dim.12 1.483397e+02     8.705379e-01                    97.00714
## Dim.13 1.370911e+02     8.045250e-01                    97.81166
## Dim.14 8.918152e+01     5.233657e-01                    98.33503
## Dim.15 5.346866e+01     3.137832e-01                    98.64881
## Dim.16 4.536207e+01     2.662093e-01                    98.91502
## Dim.17 3.373865e+01     1.979968e-01                    99.11302
## Dim.18 2.988747e+01     1.753959e-01                    99.28841
## Dim.19 2.810327e+01     1.649253e-01                    99.45334
## Dim.20 1.769854e+01     1.038647e-01                    99.55720
## Dim.21 1.702460e+01     9.990965e-02                    99.65711
## Dim.22 1.558558e+01     9.146468e-02                    99.74858
## Dim.23 1.382560e+01     8.113613e-02                    99.82971
## Dim.24 1.025524e+01     6.018336e-02                    99.88990
## Dim.25 8.793362e+00     5.160424e-02                    99.94150
## Dim.26 4.644899e+00     2.725880e-02                    99.96876
## Dim.27 3.919463e+00     2.300155e-02                    99.99176
## Dim.28 9.027439e-01     5.297793e-03                    99.99706
## Dim.29 4.253202e-01     2.496010e-03                    99.99956
## Dim.30 7.556946e-02     4.434827e-04                   100.00000
get_eigenvalue(all_pca_5)
##          eigenvalue variance.percent cumulative.variance.percent
## Dim.1  7.543953e+03     4.427203e+01                    44.27203
## Dim.2  3.232689e+03     1.897118e+01                    63.24321
## Dim.3  1.600595e+03     9.393163e+00                    72.63637
## Dim.4  1.125004e+03     6.602135e+00                    79.23851
## Dim.5  9.364790e+02     5.495768e+00                    84.73427
## Dim.6  6.857786e+02     4.024522e+00                    88.75880
## Dim.7  3.835250e+02     2.250734e+00                    91.00953
## Dim.8  2.707185e+02     1.588724e+00                    92.59825
## Dim.9  2.367963e+02     1.389649e+00                    93.98790
## Dim.10 1.991939e+02     1.168978e+00                    95.15688
## Dim.11 1.669441e+02     9.797190e-01                    96.13660
## Dim.12 1.483397e+02     8.705379e-01                    97.00714
## Dim.13 1.370911e+02     8.045250e-01                    97.81166
## Dim.14 8.918152e+01     5.233657e-01                    98.33503
## Dim.15 5.346866e+01     3.137832e-01                    98.64881
## Dim.16 4.536207e+01     2.662093e-01                    98.91502
## Dim.17 3.373865e+01     1.979968e-01                    99.11302
## Dim.18 2.988747e+01     1.753959e-01                    99.28841
## Dim.19 2.810327e+01     1.649253e-01                    99.45334
## Dim.20 1.769854e+01     1.038647e-01                    99.55720
## Dim.21 1.702460e+01     9.990965e-02                    99.65711
## Dim.22 1.558558e+01     9.146468e-02                    99.74858
## Dim.23 1.382560e+01     8.113613e-02                    99.82971
## Dim.24 1.025524e+01     6.018336e-02                    99.88990
## Dim.25 8.793362e+00     5.160424e-02                    99.94150
## Dim.26 4.644899e+00     2.725880e-02                    99.96876
## Dim.27 3.919463e+00     2.300155e-02                    99.99176
## Dim.28 9.027439e-01     5.297793e-03                    99.99706
## Dim.29 4.253202e-01     2.496010e-03                    99.99956
## Dim.30 7.556946e-02     4.434827e-04                   100.00000
identical(get_eig(all_pca_5), get_eigenvalue(all_pca_5))
## [1] TRUE

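Every PCA object returns the same variance percentages, as it should. The eigenvalues themselves also match across prcomp(), princomp(), PCA() and dudi.pca(), whereas epPCA() reports them on a different scale: each one is about 568 times larger (569 observations minus one), which leaves the percentages untouched. It is also worth noting that both get_eig() and get_eigenvalue() yield the same results - the function identical() returns TRUE in each scenario, indicating equality between both functions’ outputs.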
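The relationship between the three columns of these tables can be sketched in base R. The snippet below is a minimal illustration rather than the code used in this project: it assumes the built-in USArrests dataset as a stand-in for the breast cancer data, and eig_table() is a hypothetical helper name. It also shows why multiplying every eigenvalue by the same constant changes the first column only.

```r
# Stand-in data: the built-in USArrests dataset (an assumption for illustration).
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Hypothetical helper: build an eigenvalue table from a vector of eigenvalues.
eig_table <- function(eigenvalue) {
  variance_percent <- 100 * eigenvalue / sum(eigenvalue)
  data.frame(
    eigenvalue = eigenvalue,
    variance.percent = variance_percent,
    cumulative.variance.percent = cumsum(variance_percent),
    row.names = paste0("Dim.", seq_along(eigenvalue))
  )
}

# For prcomp(), the eigenvalues are the squared standard deviations.
eig <- eig_table(pca$sdev^2)

# Rescaling all eigenvalues by a constant changes the first column only.
eig_scaled <- eig_table(pca$sdev^2 * (nrow(USArrests) - 1))

print(eig)
all.equal(eig$variance.percent, eig_scaled$variance.percent)
```

Since the percentages are ratios of each eigenvalue to their sum, any common scaling factor cancels out.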

Despite the usefulness of having the eigenvalues and variances in a tidy dataframe such as the ones obtained in the previous code snippets, being able to visualize the differences graphically helps to illustrate it all - that is when the screeplot comes into play.

2.2.6. Screeplot

The RDocumentation page detailing both get_eig() and get_eigenvalue() also mentions the fviz_eig() and fviz_screeplot() functions which, as was the case with the previous functions, are identical in behavior - one is but an alias of the other.

These functions draw what is known as a “screeplot” or “scree plot”, which is a graph of eigenvalues ordered from largest to smallest. It can also be drawn as the percentage of variance associated with each Principal Component or dimension, since each eigenvalue measures the variance captured by its component and the percentage is simply that eigenvalue divided by the sum of all eigenvalues.
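To make the idea concrete, a scree plot can be approximated with base R graphics alone. This is only a rough sketch under stated assumptions: it uses the built-in USArrests dataset as a stand-in and does not try to reproduce the look of the factoextra functions.

```r
# Stand-in data: the built-in USArrests dataset (an assumption for illustration).
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Eigenvalues, already ordered from largest to smallest by prcomp().
eigenvalue <- pca$sdev^2
variance_percent <- 100 * eigenvalue / sum(eigenvalue)

# Bars for the percentage of variance, with a line and points on top.
bar_mid <- barplot(variance_percent,
                   names.arg = seq_along(variance_percent),
                   ylim = c(0, 100),
                   col = "steelblue", border = "steelblue",
                   main = "Scree plot",
                   xlab = "Dimensions",
                   ylab = "Percentage of explained variance")
lines(bar_mid, variance_percent, type = "b", pch = 19)
```

barplot() returns the horizontal midpoints of the bars, which is what allows the line to be drawn through their centers.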

Unlike the previous factoextra functions, these ones accept multiple arguments in order to customize the screeplot:
  • X: an object obtained through any of the PCA functions previously detailed.
  • choice: a text specifying the data to be plotted. Allowed values are “variance” or “eigenvalue”.
  • geom: a text specifying the geometry to be used for the graph. Allowed values are “bar” for barplot, “line” for lineplot or c(“bar”, “line”) to use both types.
  • barfill: fill color for bar plot.
  • barcolor: outline color for bar plot.
  • linecolor: color for line plot.
  • ncp: a numeric value specifying the number of dimensions to be shown.
  • addlabels: TRUE or FALSE; determines whether or not labels showing the information retained by each dimension are added at the top of bars or points.
  • hjust: horizontal adjustment of the labels.
  • main, xlab, ylab: plot main and axis titles.
  • ggtheme: allows the user to set a ggplot2 theme.
library(factoextra)
fviz_eig(X,
         choice = "variance",
         geom = c("bar", "line"),
         barfill = "steelblue",
         barcolor = "steelblue",
         linecolor = "black",
         ncp = 10,
         addlabels = FALSE,
         hjust = 0,
         main = NULL,
         xlab = NULL,
         ylab = NULL,
         ggtheme = theme_minimal(),
         ...)
fviz_screeplot(pca, ...)

These functions also accept optional arguments to be passed on to the function ggpar(), upon which they are based. More information regarding these functions and their arguments is available at the very same RDocumentation page as the functions get_eig() and get_eigenvalue() (https://www.rdocumentation.org/packages/factoextra/versions/1.0.7/topics/eigenvalue).

Let’s now observe the results of applying said functions to the previously constructed PCA objects:

prcomp()
fviz_eig(all_pca_1, 
         addlabels = TRUE, 
         ylim = c(0, 50), 
         barfill = "#81d4fa", 
         barcolor = "#81d4fa", 
         linecolor = "red")

princomp()
fviz_eig(all_pca_2, 
         addlabels = TRUE, 
         ylim = c(0, 50), 
         barfill = "#81d4fa", 
         barcolor = "#81d4fa", 
         linecolor = "red")

PCA()
fviz_eig(all_pca_3, 
         addlabels = TRUE, 
         ylim = c(0, 50), 
         barfill = "#81d4fa", 
         barcolor = "#81d4fa", 
         linecolor = "red")

dudi.pca()
fviz_eig(all_pca_4, 
         addlabels = TRUE, 
         ylim = c(0, 50), 
         barfill = "#81d4fa", 
         barcolor = "#81d4fa", 
         linecolor = "red")

epPCA()
fviz_eig(all_pca_5, 
         addlabels = TRUE, 
         ylim = c(0, 50), 
         barfill = "#81d4fa", 
         barcolor = "#81d4fa", 
         linecolor = "red")

Unsurprisingly, all of the screeplots are identical.

As a side note, the scannf argument of the function dudi.pca() allows for a screeplot to be plotted. Setting said argument to TRUE would yield a plot akin to the ones obtained earlier with the function fviz_eig(), albeit considerably less complete/polished.

These plots help to visualize the variance results obtained when performing a PCA upon the Wisconsin Breast Cancer Dataset. Unfortunately, there is no well-accepted objective way to decide how many Principal Components are enough - this will depend on the specific field of application and the specific dataset (biomedical scenarios tend to require high cumulative variance since people’s health is at play). In practice, the first few principal components capture most of the variance and are therefore the most useful ones, both for finding interesting patterns in the data and for representing it.
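One simple way to turn a required level of cumulative variance into a number of components is sketched below. It is a heuristic illustration, not a rule endorsed by this analysis: the percentages are copied from the first ten rows of the get_eig() output shown earlier, and n_components() is a hypothetical helper name.

```r
# Percentage of variance for the first ten dimensions, copied from the
# get_eig() output shown earlier.
variance_percent <- c(44.27203, 18.97118, 9.393163, 6.602135, 5.495768,
                      4.024522, 2.250734, 1.588724, 1.389649, 1.168978)
cumulative_percent <- cumsum(variance_percent)

# Hypothetical helper: smallest number of dimensions whose cumulative
# variance reaches a given threshold (in percent).
n_components <- function(cumulative, threshold) {
  which(cumulative >= threshold)[1]
}

# Number of dimensions needed for three example thresholds.
chosen <- sapply(c(80, 90, 95),
                 function(threshold) n_components(cumulative_percent, threshold))
chosen
```

With these figures, 80% of the variance is reached with 5 dimensions, 90% with 7 and 95% with 10, which matches the cumulative column of the eigenvalue tables.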

2.2.7. Variables and individuals

Navigating through the PCA outputs previously coded is not an easy task. As was the case with the eigenvalues, some of the PCA functions yield an object whose components include the results for variables and individuals, namely PCA() and epPCA().

PCA()
print(all_pca_3)
## **Results for the Principal Component Analysis (PCA)**
## The analysis was performed on 569 individuals, described by 30 variables
## *The results are available in the following objects:
## 
##    name               description                          
## 1  "$eig"             "eigenvalues"                        
## 2  "$var"             "results for the variables"          
## 3  "$var$coord"       "coord. for the variables"           
## 4  "$var$cor"         "correlations variables - dimensions"
## 5  "$var$cos2"        "cos2 for the variables"             
## 6  "$var$contrib"     "contributions of the variables"     
## 7  "$ind"             "results for the individuals"        
## 8  "$ind$coord"       "coord. for the individuals"         
## 9  "$ind$cos2"        "cos2 for the individuals"           
## 10 "$ind$contrib"     "contributions of the individuals"   
## 11 "$call"            "summary statistics"                 
## 12 "$call$centre"     "mean of the variables"              
## 13 "$call$ecart.type" "standard error of the variables"    
## 14 "$call$row.w"      "weights for the individuals"        
## 15 "$call$col.w"      "weights for the variables"
epPCA()
print(all_pca_5$ExPosition.Data)
## **Results for Principal Component Analysis**
## The analysis was performed on  569 individuals, described by 30 variables
## *The results are available in the following objects:
## 
##    name      description                      
## 1  "$fi"     "Factor scores of the rows"      
## 2  "$di"     "Squared distances of the rows"  
## 3  "$ci"     "Contributions of the rows"      
## 4  "$ri"     "Cosines of the rows"            
## 5  "$fj"     "Factor scores of the columns"   
## 6  "$dj"     "square distances of the columns"
## 7  "$cj"     "Contributions for the columns"  
## 8  "$rj"     "Cosines of the columns"         
## 9  "$t"      "Explained Variance"             
## 10 "$eigs"   "Eigenvalues"                    
## 11 "$pdq"    "SVD data"                       
## 12 "$X"      "X matrix to decompose"          
## 13 "$M"      "Masses - each set to 1"         
## 14 "$W"      "Weights - each set to 1"        
## 15 "$center" "Center of X"                    
## 16 "$scale"  "Scale factor of X"

In PCA()’s case, the results for variables and individuals are accessed with $var and $ind respectively, whereas epPCA() has a unique address for every result (less ideal, but manageable).

A simpler method to extract the results for variables and individuals from a PCA output is to use the functions get_pca_var() and get_pca_ind() respectively. There’s also the option of using the function get_pca() with the argument element = “var” for the variable results or element = “ind” for the individual results.

All of these functions come from the factoextra package and provide a list of matrices containing all the results for either the active variables or individuals (more information regarding these functions is available in their associated RDocumentation page: https://www.rdocumentation.org/packages/factoextra/versions/1.0.7/topics/get_pca).
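To give an idea of what the variable-level matrices (coord, cor, cos2 and contrib) contain, they can be derived by hand from a prcomp() fit. The following is a sketch under stated assumptions, not necessarily what factoextra does internally: it uses the built-in USArrests dataset as a stand-in, standardizes the variables, and relies on the usual definitions of these quantities.

```r
# Stand-in data: the built-in USArrests dataset (an assumption for illustration).
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Coordinates of the variables: loadings scaled by each component's
# standard deviation.
var_coord <- sweep(pca$rotation, 2, pca$sdev, `*`)
colnames(var_coord) <- paste0("Dim.", seq_len(ncol(var_coord)))

# With standardized variables these coordinates are also the correlations
# between the original variables and the principal components.
var_cor <- cor(USArrests, pca$x)

# Quality of representation and contributions (in percent, per dimension).
var_cos2 <- var_coord^2
var_contrib <- sweep(var_cos2, 2, colSums(var_cos2), `/`) * 100

all.equal(unname(var_coord), unname(var_cor))
```

Under these assumptions the coordinates and the correlations coincide, the cos2 values of each variable add up to 1 across all dimensions, and the contributions within each dimension add up to 100.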

Let’s first examine the outputs of applying get_pca_var() to the PCA objects previously created:

prcomp()
all_pca_var_1 <- get_pca_var(all_pca_1)
all_pca_var_1
## Principal Component Analysis Results for variables
##  ===================================================
##   Name       Description                                    
## 1 "$coord"   "Coordinates for the variables"                
## 2 "$cor"     "Correlations between variables and dimensions"
## 3 "$cos2"    "Cos2 for the variables"                       
## 4 "$contrib" "contributions of the variables"
head(all_pca_var_1$coord)
##                       Dim.1      Dim.2       Dim.3       Dim.4       Dim.5
## radius_mean      -0.7977668  0.5579027 -0.01432118  0.05827700 -0.04851878
## texture_mean     -0.3780132  0.1424382  0.10835829 -0.84870380  0.06351944
## perimeter_mean   -0.8292355  0.5133487 -0.01563555  0.05908501 -0.04799015
## area_mean        -0.8053928  0.5512695  0.04817717  0.07520017 -0.01326563
## smoothness_mean  -0.5196530 -0.4440017 -0.17507219  0.22430770  0.46878427
## compactness_mean -0.8720501 -0.3623611 -0.12437565  0.04474618 -0.01502824
##                         Dim.6        Dim.7        Dim.8        Dim.9
## radius_mean       0.020592339 -0.101965596  0.005144876 -0.144056156
## texture_mean     -0.035358035  0.009367203 -0.090214585  0.072767057
## perimeter_mean    0.019018481 -0.094067834  0.012901209 -0.144462575
## area_mean        -0.002074253 -0.042444540 -0.023937777 -0.126284788
## smoothness_mean  -0.314667668 -0.115590213  0.199500717  0.004148275
## compactness_mean -0.015527056  0.025406278  0.104520200 -0.108370831
##                        Dim.10       Dim.11      Dim.12      Dim.13       Dim.14
## radius_mean       0.056546476 -0.022483349  0.02609749 0.005879269  0.023578980
## texture_mean      0.142679652  0.163858215  0.13026214 0.099956785 -0.008543071
## perimeter_mean    0.051157023 -0.009098538  0.01989278 0.021670182  0.019223333
## area_mean         0.044388765 -0.059727362  0.03344115 0.033100452  0.004291657
## smoothness_mean  -0.041034694  0.074285011  0.16186012 0.022389467  0.176354514
## compactness_mean  0.007660737  0.166984319 -0.05315682 0.112641659  0.003210000
##                        Dim.15      Dim.16       Dim.17       Dim.18
## radius_mean      -0.015683967 -0.04255502  0.049456533  0.033654027
## texture_mean     -0.033112133 -0.04460615 -0.009433423 -0.009428525
## perimeter_mean   -0.012242788 -0.03234470  0.047481690  0.036316100
## area_mean         0.004285246 -0.03742982  0.062320398  0.061055728
## smoothness_mean  -0.036248064 -0.05782372  0.040927741 -0.080796547
## compactness_mean  0.070843391  0.04809242 -0.004949378  0.001787881
##                        Dim.19       Dim.20       Dim.21      Dim.22
## radius_mean       0.050133570 -0.008772821 -0.011871307 -0.01208056
## texture_mean      0.006626055 -0.043094773  0.077624778 -0.01570358
## perimeter_mean    0.053294517 -0.003118233 -0.012078892 -0.01245022
## area_mean        -0.006077427 -0.015912200 -0.003193026 -0.01616162
## smoothness_mean  -0.036605301  0.003018666 -0.020687226 -0.01057217
## compactness_mean  0.063221168  0.086263038  0.033347929  0.01624639
##                         Dim.23       Dim.24       Dim.25       Dim.26
## radius_mean      -1.537575e-02 -0.024533003 -0.002392233 -0.011708590
## texture_mean     -8.658821e-05  0.013273874  0.010544407 -0.002220667
## perimeter_mean   -6.278798e-03 -0.015673984  0.003361359 -0.011326933
## area_mean         1.213375e-03  0.009385446 -0.026134063  0.032801549
## smoothness_mean  -3.224173e-03  0.009230799  0.003602676 -0.003346255
## compactness_mean  8.169034e-03 -0.013992577  0.049349353  0.023765850
##                        Dim.27        Dim.28        Dim.29        Dim.30
## radius_mean      -0.010925793  8.419566e-03  5.786460e-03  8.101999e-03
## texture_mean     -0.001441855 -2.623673e-06 -2.882534e-04  3.156545e-06
## perimeter_mean   -0.009587447  3.362272e-03  1.050312e-02 -7.957621e-03
## area_mean         0.038761046 -1.086395e-02 -1.156947e-02 -3.800314e-04
## smoothness_mean   0.005789074  5.897327e-05 -9.398714e-05 -5.591303e-05
## compactness_mean  0.008119890 -2.177814e-04 -1.122394e-03  5.152947e-04
head(all_pca_var_1$cor)
##                       Dim.1      Dim.2       Dim.3       Dim.4       Dim.5
## radius_mean      -0.7977668  0.5579027 -0.01432118  0.05827700 -0.04851878
## texture_mean     -0.3780132  0.1424382  0.10835829 -0.84870380  0.06351944
## perimeter_mean   -0.8292355  0.5133487 -0.01563555  0.05908501 -0.04799015
## area_mean        -0.8053928  0.5512695  0.04817717  0.07520017 -0.01326563
## smoothness_mean  -0.5196530 -0.4440017 -0.17507219  0.22430770  0.46878427
## compactness_mean -0.8720501 -0.3623611 -0.12437565  0.04474618 -0.01502824
##                         Dim.6        Dim.7        Dim.8        Dim.9
## radius_mean       0.020592339 -0.101965596  0.005144876 -0.144056156
## texture_mean     -0.035358035  0.009367203 -0.090214585  0.072767057
## perimeter_mean    0.019018481 -0.094067834  0.012901209 -0.144462575
## area_mean        -0.002074253 -0.042444540 -0.023937777 -0.126284788
## smoothness_mean  -0.314667668 -0.115590213  0.199500717  0.004148275
## compactness_mean -0.015527056  0.025406278  0.104520200 -0.108370831
##                        Dim.10       Dim.11      Dim.12      Dim.13       Dim.14
## radius_mean       0.056546476 -0.022483349  0.02609749 0.005879269  0.023578980
## texture_mean      0.142679652  0.163858215  0.13026214 0.099956785 -0.008543071
## perimeter_mean    0.051157023 -0.009098538  0.01989278 0.021670182  0.019223333
## area_mean         0.044388765 -0.059727362  0.03344115 0.033100452  0.004291657
## smoothness_mean  -0.041034694  0.074285011  0.16186012 0.022389467  0.176354514
## compactness_mean  0.007660737  0.166984319 -0.05315682 0.112641659  0.003210000
##                        Dim.15      Dim.16       Dim.17       Dim.18
## radius_mean      -0.015683967 -0.04255502  0.049456533  0.033654027
## texture_mean     -0.033112133 -0.04460615 -0.009433423 -0.009428525
## perimeter_mean   -0.012242788 -0.03234470  0.047481690  0.036316100
## area_mean         0.004285246 -0.03742982  0.062320398  0.061055728
## smoothness_mean  -0.036248064 -0.05782372  0.040927741 -0.080796547
## compactness_mean  0.070843391  0.04809242 -0.004949378  0.001787881
##                        Dim.19       Dim.20       Dim.21      Dim.22
## radius_mean       0.050133570 -0.008772821 -0.011871307 -0.01208056
## texture_mean      0.006626055 -0.043094773  0.077624778 -0.01570358
## perimeter_mean    0.053294517 -0.003118233 -0.012078892 -0.01245022
## area_mean        -0.006077427 -0.015912200 -0.003193026 -0.01616162
## smoothness_mean  -0.036605301  0.003018666 -0.020687226 -0.01057217
## compactness_mean  0.063221168  0.086263038  0.033347929  0.01624639
##                         Dim.23       Dim.24       Dim.25       Dim.26
## radius_mean      -1.537575e-02 -0.024533003 -0.002392233 -0.011708590
## texture_mean     -8.658821e-05  0.013273874  0.010544407 -0.002220667
## perimeter_mean   -6.278798e-03 -0.015673984  0.003361359 -0.011326933
## area_mean         1.213375e-03  0.009385446 -0.026134063  0.032801549
## smoothness_mean  -3.224173e-03  0.009230799  0.003602676 -0.003346255
## compactness_mean  8.169034e-03 -0.013992577  0.049349353  0.023765850
##                        Dim.27        Dim.28        Dim.29        Dim.30
## radius_mean      -0.010925793  8.419566e-03  5.786460e-03  8.101999e-03
## texture_mean     -0.001441855 -2.623673e-06 -2.882534e-04  3.156545e-06
## perimeter_mean   -0.009587447  3.362272e-03  1.050312e-02 -7.957621e-03
## area_mean         0.038761046 -1.086395e-02 -1.156947e-02 -3.800314e-04
## smoothness_mean   0.005789074  5.897327e-05 -9.398714e-05 -5.591303e-05
## compactness_mean  0.008119890 -2.177814e-04 -1.122394e-03  5.152947e-04
head(all_pca_var_1$cos2)
##                      Dim.1      Dim.2        Dim.3       Dim.4        Dim.5
## radius_mean      0.6364318 0.31125539 0.0002050963 0.003396209 0.0023540715
## texture_mean     0.1428940 0.02028864 0.0117415199 0.720298141 0.0040347193
## perimeter_mean   0.6876316 0.26352690 0.0002444703 0.003491038 0.0023030547
## area_mean        0.6486576 0.30389811 0.0023210397 0.005655066 0.0001759769
## smoothness_mean  0.2700393 0.19713747 0.0306502709 0.050313944 0.2197586899
## compactness_mean 0.7604714 0.13130559 0.0154693024 0.002002220 0.0002258480
##                         Dim.6        Dim.7        Dim.8        Dim.9
## radius_mean      4.240444e-04 0.0103969827 2.646975e-05 2.075218e-02
## texture_mean     1.250191e-03 0.0000877445 8.138671e-03 5.295045e-03
## perimeter_mean   3.617026e-04 0.0088487573 1.664412e-04 2.086944e-02
## area_mean        4.302527e-06 0.0018015390 5.730172e-04 1.594785e-02
## smoothness_mean  9.901574e-02 0.0133610973 3.980054e-02 1.720819e-05
## compactness_mean 2.410895e-04 0.0006454790 1.092447e-02 1.174424e-02
##                        Dim.10       Dim.11       Dim.12       Dim.13
## radius_mean      3.197504e-03 5.055010e-04 0.0006810789 3.456581e-05
## texture_mean     2.035748e-02 2.684951e-02 0.0169682252 9.991359e-03
## perimeter_mean   2.617041e-03 8.278339e-05 0.0003957226 4.695968e-04
## area_mean        1.970362e-03 3.567358e-03 0.0011183106 1.095640e-03
## smoothness_mean  1.683846e-03 5.518263e-03 0.0261986970 5.012882e-04
## compactness_mean 5.868689e-05 2.788376e-02 0.0028256473 1.268814e-02
##                        Dim.14       Dim.15      Dim.16       Dim.17
## radius_mean      5.559683e-04 2.459868e-04 0.001810929 2.445949e-03
## texture_mean     7.298407e-05 1.096413e-03 0.001989709 8.898948e-05
## perimeter_mean   3.695365e-04 1.498859e-04 0.001046179 2.254511e-03
## area_mean        1.841832e-05 1.836333e-05 0.001400992 3.883832e-03
## smoothness_mean  3.110091e-02 1.313922e-03 0.003343582 1.675080e-03
## compactness_mean 1.030410e-05 5.018786e-03 0.002312881 2.449634e-05
##                        Dim.18       Dim.19       Dim.20       Dim.21
## radius_mean      1.132594e-03 2.513375e-03 7.696239e-05 1.409279e-04
## texture_mean     8.889709e-05 4.390460e-05 1.857159e-03 6.025606e-03
## perimeter_mean   1.318859e-03 2.840305e-03 9.723374e-06 1.458996e-04
## area_mean        3.727802e-03 3.693512e-05 2.531981e-04 1.019542e-05
## smoothness_mean  6.528082e-03 1.339948e-03 9.112344e-06 4.279613e-04
## compactness_mean 3.196517e-06 3.996916e-03 7.441312e-03 1.112084e-03
##                        Dim.22       Dim.23       Dim.24       Dim.25
## radius_mean      0.0001459399 2.364136e-04 6.018682e-04 5.722780e-06
## texture_mean     0.0002466023 7.497518e-09 1.761957e-04 1.111845e-04
## perimeter_mean   0.0001550079 3.942330e-05 2.456738e-04 1.129874e-05
## area_mean        0.0002611979 1.472279e-06 8.808659e-05 6.829892e-04
## smoothness_mean  0.0001117708 1.039529e-05 8.520764e-05 1.297927e-05
## compactness_mean 0.0002639453 6.673311e-05 1.957922e-04 2.435359e-03
##                        Dim.26       Dim.27       Dim.28       Dim.29
## radius_mean      1.370911e-04 1.193730e-04 7.088910e-05 3.348312e-05
## texture_mean     4.931360e-06 2.078945e-06 6.883658e-12 8.309002e-08
## perimeter_mean   1.282994e-04 9.191915e-05 1.130487e-05 1.103155e-04
## area_mean        1.075942e-03 1.502419e-03 1.180255e-04 1.338527e-04
## smoothness_mean  1.119742e-05 3.351338e-05 3.477847e-09 8.833583e-09
## compactness_mean 5.648156e-04 6.593262e-05 4.742874e-08 1.259768e-06
##                        Dim.30
## radius_mean      6.564239e-05
## texture_mean     9.963774e-12
## perimeter_mean   6.332372e-05
## area_mean        1.444238e-07
## smoothness_mean  3.126267e-09
## compactness_mean 2.655286e-07
head(all_pca_var_1$contrib)
##                     Dim.1     Dim.2       Dim.3      Dim.4       Dim.5
## radius_mean      4.791828 5.4689158 0.007278210  0.1714702  0.14278085
## texture_mean     1.075879 0.3564817 0.416669002 36.3669303  0.24471672
## perimeter_mean   5.177322 4.6303018 0.008675469  0.1762581  0.13968654
## area_mean        4.883878 5.3396446 0.082366279  0.2855170  0.01067348
## smoothness_mean  2.033182 3.4638057 1.087680124  2.5402866 13.32896332
## compactness_mean 5.725748 2.3071061 0.548956087  0.1010895  0.01369829
##                         Dim.6      Dim.7       Dim.8       Dim.9     Dim.10
## radius_mean      0.0351217226 1.53979162 0.005553672 4.977796701 0.91176608
## texture_mean     0.1035477523 0.01299495 1.707590993 1.270115259 5.80492242
## perimeter_mean   0.0299582265 1.31049967 0.034921362 5.005923564 0.74624745
## area_mean        0.0003563592 0.26680766 0.120225880 3.825388875 0.56184752
## smoothness_mean  8.2010352310 1.97877655 8.350630500 0.004127705 0.48014757
## compactness_mean 0.0199683716 0.09559534 2.292085472 2.817074391 0.01673453
##                      Dim.11     Dim.12     Dim.13       Dim.14     Dim.15
## radius_mean      0.17198842  0.2607885 0.01432142  0.354098008 0.26131291
## texture_mean     9.13510741  6.4972186 4.13965139  0.046483789 1.16472490
## perimeter_mean   0.02816569  0.1515242 0.19456483  0.235358999 0.15922443
## area_mean        1.21373501  0.4282067 0.45394900  0.011730686 0.01950745
## smoothness_mean  1.87749853 10.0316126 0.20769532 19.808272945 1.39578544
## compactness_mean 9.48699343  1.0819546 5.25699162  0.006562713 5.33147923
##                    Dim.16    Dim.17       Dim.18     Dim.19      Dim.20
## radius_mean      2.267551 4.1178253  2.152451027 5.07982446  0.24699572
## texture_mean     2.491408 0.1498164  0.168945538 0.08873633  5.96018948
## perimeter_mean   1.309971 3.7955343  2.506441650 5.74058961  0.03120527
## area_mean        1.754248 6.5385437  7.084545992 0.07465020  0.81258978
## smoothness_mean  4.186658 2.8200456 12.406371982 2.70819168  0.02924428
## compactness_mean 2.896068 0.0412403  0.006074859 8.07823486 23.88143277
##                       Dim.21    Dim.22       Dim.23    Dim.24      Dim.25
## radius_mean       0.47018457 0.5318625 9.712634e-01 3.3335252  0.03696583
## texture_mean     20.10351786 0.8987160 3.080222e-05 0.9758830  0.71818728
## perimeter_mean    0.48677194 0.5649097 1.619636e-01 1.3606960  0.07298326
## area_mean         0.03401545 0.9519081 6.048598e-03 0.4878790  4.41171292
## smoothness_mean   1.42782777 0.4073369 4.270720e-02 0.4719336  0.08383854
## compactness_mean  3.71030023 0.9619207 2.741611e-01 1.0844206 15.73099874
##                       Dim.26      Dim.27       Dim.28       Dim.29       Dim.30
## radius_mean       1.67641371  1.72992648 4.460291e+00  4.471552424 4.933856e+01
## texture_mean      0.06030297  0.03012762 4.331148e-07  0.011096377 7.489035e-06
## perimeter_mean    1.56890519  1.33207198 7.112943e-01 14.732247365 4.759578e+01
## area_mean        13.15711689 21.77272040 7.426079e+00 17.875554424 1.085528e-01
## smoothness_mean   0.13692728  0.48566854 2.188236e-04  0.001179694 2.349785e-03
## compactness_mean  6.90682942  0.95548094 2.984182e-03  0.168237574 1.995783e-01
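The four tables above are closely related, and the relationship can be checked by hand. The following is a minimal sketch, not the project's own code: it uses only base R and the built-in `iris` data as a stand-in for the breast cancer data, and the names `pca`, `coord`, `cor_vs`, `cos2` and `contrib` are illustrative. It assumes a centered and scaled PCA, in which case (to my understanding of how `get_pca_var()` treats `prcomp` objects) the coordinates equal the variable-component correlations, `cos2` is the squared coordinate, and `contrib` is each `cos2` value expressed as a percentage of its column total.

```r
# Base R only; iris replaces the project's data for illustration.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Variable coordinates: each loading column multiplied by the
# standard deviation of its component.
coord <- sweep(pca$rotation, 2, pca$sdev, "*")

# Correlations between the original variables and the component scores.
# For a scaled PCA these should match the coordinates.
cor_vs <- cor(iris[, 1:4], pca$x)

# Quality of representation: squared coordinates.
cos2 <- coord^2

# Contributions: cos2 as a percentage of the column (component) total.
contrib <- sweep(cos2, 2, colSums(cos2), "/") * 100

print(all.equal(coord, cor_vs, check.attributes = FALSE))
print(round(rowSums(cos2), 6))    # each variable is fully represented across all components
print(round(colSums(contrib), 6)) # each component's contributions add up to 100
```

The same relationship is visible in the output above: for `radius_mean` on `Dim.1`, the coordinate -0.7977668 squared gives approximately 0.6364318, which is the value reported in `$cos2`.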
princomp()
all_pca_var_2 <- get_pca_var(all_pca_2)
all_pca_var_2
## Principal Component Analysis Results for variables
##  ===================================================
##   Name       Description                                    
## 1 "$coord"   "Coordinates for the variables"                
## 2 "$cor"     "Correlations between variables and dimensions"
## 3 "$cos2"    "Cos2 for the variables"                       
## 4 "$contrib" "contributions of the variables"
head(all_pca_var_2$coord)
##                      Dim.1      Dim.2       Dim.3       Dim.4       Dim.5
## radius_mean      0.7977668  0.5579027  0.01432118  0.05827700  0.04851878
## texture_mean     0.3780132  0.1424382 -0.10835829 -0.84870380 -0.06351944
## perimeter_mean   0.8292355  0.5133487  0.01563555  0.05908501  0.04799015
## area_mean        0.8053928  0.5512695 -0.04817717  0.07520017  0.01326563
## smoothness_mean  0.5196530 -0.4440017  0.17507219  0.22430770 -0.46878427
## compactness_mean 0.8720501 -0.3623611  0.12437565  0.04474618  0.01502824
##                         Dim.6        Dim.7        Dim.8        Dim.9
## radius_mean       0.020592339  0.101965596  0.005144876  0.144056156
## texture_mean     -0.035358035 -0.009367203 -0.090214585 -0.072767057
## perimeter_mean    0.019018481  0.094067834  0.012901209  0.144462575
## area_mean        -0.002074253  0.042444540 -0.023937777  0.126284788
## smoothness_mean  -0.314667668  0.115590213  0.199500717 -0.004148275
## compactness_mean -0.015527056 -0.025406278  0.104520200  0.108370831
##                        Dim.10       Dim.11      Dim.12      Dim.13       Dim.14
## radius_mean       0.056546476  0.022483349  0.02609749 0.005879269  0.023578980
## texture_mean      0.142679652 -0.163858215  0.13026214 0.099956785 -0.008543071
## perimeter_mean    0.051157023  0.009098538  0.01989278 0.021670182  0.019223333
## area_mean         0.044388765  0.059727362  0.03344115 0.033100452  0.004291657
## smoothness_mean  -0.041034694 -0.074285011  0.16186012 0.022389467  0.176354514
## compactness_mean  0.007660737 -0.166984319 -0.05315682 0.112641659  0.003210000
##                        Dim.15      Dim.16       Dim.17       Dim.18
## radius_mean       0.015683967  0.04255502  0.049456533  0.033654027
## texture_mean      0.033112133  0.04460615 -0.009433423 -0.009428525
## perimeter_mean    0.012242788  0.03234470  0.047481690  0.036316100
## area_mean        -0.004285246  0.03742982  0.062320398  0.061055728
## smoothness_mean   0.036248064  0.05782372  0.040927741 -0.080796547
## compactness_mean -0.070843391 -0.04809242 -0.004949378  0.001787881
##                        Dim.19       Dim.20       Dim.21      Dim.22
## radius_mean       0.050133570  0.008772821  0.011871307  0.01208056
## texture_mean      0.006626055  0.043094773 -0.077624778  0.01570358
## perimeter_mean    0.053294517  0.003118233  0.012078892  0.01245022
## area_mean        -0.006077427  0.015912200  0.003193026  0.01616162
## smoothness_mean  -0.036605301 -0.003018666  0.020687226  0.01057217
## compactness_mean  0.063221168 -0.086263038 -0.033347929 -0.01624639
##                         Dim.23       Dim.24       Dim.25       Dim.26
## radius_mean       1.537575e-02  0.024533003  0.002392233  0.011708590
## texture_mean      8.658821e-05 -0.013273874 -0.010544407  0.002220667
## perimeter_mean    6.278798e-03  0.015673984 -0.003361359  0.011326933
## area_mean        -1.213375e-03 -0.009385446  0.026134063 -0.032801549
## smoothness_mean   3.224173e-03 -0.009230799 -0.003602676  0.003346255
## compactness_mean -8.169034e-03  0.013992577 -0.049349353 -0.023765850
##                        Dim.27        Dim.28        Dim.29        Dim.30
## radius_mean       0.010925793  8.419566e-03  5.786460e-03  8.101999e-03
## texture_mean      0.001441855 -2.623673e-06 -2.882534e-04  3.156545e-06
## perimeter_mean    0.009587447  3.362272e-03  1.050312e-02 -7.957621e-03
## area_mean        -0.038761046 -1.086395e-02 -1.156947e-02 -3.800314e-04
## smoothness_mean  -0.005789074  5.897327e-05 -9.398714e-05 -5.591303e-05
## compactness_mean -0.008119890 -2.177814e-04 -1.122394e-03  5.152947e-04
head(all_pca_var_2$cor)
##                      Dim.1      Dim.2       Dim.3       Dim.4       Dim.5
## radius_mean      0.7977668  0.5579027  0.01432118  0.05827700  0.04851878
## texture_mean     0.3780132  0.1424382 -0.10835829 -0.84870380 -0.06351944
## perimeter_mean   0.8292355  0.5133487  0.01563555  0.05908501  0.04799015
## area_mean        0.8053928  0.5512695 -0.04817717  0.07520017  0.01326563
## smoothness_mean  0.5196530 -0.4440017  0.17507219  0.22430770 -0.46878427
## compactness_mean 0.8720501 -0.3623611  0.12437565  0.04474618  0.01502824
##                         Dim.6        Dim.7        Dim.8        Dim.9
## radius_mean       0.020592339  0.101965596  0.005144876  0.144056156
## texture_mean     -0.035358035 -0.009367203 -0.090214585 -0.072767057
## perimeter_mean    0.019018481  0.094067834  0.012901209  0.144462575
## area_mean        -0.002074253  0.042444540 -0.023937777  0.126284788
## smoothness_mean  -0.314667668  0.115590213  0.199500717 -0.004148275
## compactness_mean -0.015527056 -0.025406278  0.104520200  0.108370831
##                        Dim.10       Dim.11      Dim.12      Dim.13       Dim.14
## radius_mean       0.056546476  0.022483349  0.02609749 0.005879269  0.023578980
## texture_mean      0.142679652 -0.163858215  0.13026214 0.099956785 -0.008543071
## perimeter_mean    0.051157023  0.009098538  0.01989278 0.021670182  0.019223333
## area_mean         0.044388765  0.059727362  0.03344115 0.033100452  0.004291657
## smoothness_mean  -0.041034694 -0.074285011  0.16186012 0.022389467  0.176354514
## compactness_mean  0.007660737 -0.166984319 -0.05315682 0.112641659  0.003210000
##                        Dim.15      Dim.16       Dim.17       Dim.18
## radius_mean       0.015683967  0.04255502  0.049456533  0.033654027
## texture_mean      0.033112133  0.04460615 -0.009433423 -0.009428525
## perimeter_mean    0.012242788  0.03234470  0.047481690  0.036316100
## area_mean        -0.004285246  0.03742982  0.062320398  0.061055728
## smoothness_mean   0.036248064  0.05782372  0.040927741 -0.080796547
## compactness_mean -0.070843391 -0.04809242 -0.004949378  0.001787881
##                        Dim.19       Dim.20       Dim.21      Dim.22
## radius_mean       0.050133570  0.008772821  0.011871307  0.01208056
## texture_mean      0.006626055  0.043094773 -0.077624778  0.01570358
## perimeter_mean    0.053294517  0.003118233  0.012078892  0.01245022
## area_mean        -0.006077427  0.015912200  0.003193026  0.01616162
## smoothness_mean  -0.036605301 -0.003018666  0.020687226  0.01057217
## compactness_mean  0.063221168 -0.086263038 -0.033347929 -0.01624639
##                         Dim.23       Dim.24       Dim.25       Dim.26
## radius_mean       1.537575e-02  0.024533003  0.002392233  0.011708590
## texture_mean      8.658821e-05 -0.013273874 -0.010544407  0.002220667
## perimeter_mean    6.278798e-03  0.015673984 -0.003361359  0.011326933
## area_mean        -1.213375e-03 -0.009385446  0.026134063 -0.032801549
## smoothness_mean   3.224173e-03 -0.009230799 -0.003602676  0.003346255
## compactness_mean -8.169034e-03  0.013992577 -0.049349353 -0.023765850
##                        Dim.27        Dim.28        Dim.29        Dim.30
## radius_mean       0.010925793  8.419566e-03  5.786460e-03  8.101999e-03
## texture_mean      0.001441855 -2.623673e-06 -2.882534e-04  3.156545e-06
## perimeter_mean    0.009587447  3.362272e-03  1.050312e-02 -7.957621e-03
## area_mean        -0.038761046 -1.086395e-02 -1.156947e-02 -3.800314e-04
## smoothness_mean  -0.005789074  5.897327e-05 -9.398714e-05 -5.591303e-05
## compactness_mean -0.008119890 -2.177814e-04 -1.122394e-03  5.152947e-04
head(all_pca_var_2$cos2)
##                      Dim.1      Dim.2        Dim.3       Dim.4        Dim.5
## radius_mean      0.6364318 0.31125539 0.0002050963 0.003396209 0.0023540715
## texture_mean     0.1428940 0.02028864 0.0117415199 0.720298141 0.0040347193
## perimeter_mean   0.6876316 0.26352690 0.0002444703 0.003491038 0.0023030547
## area_mean        0.6486576 0.30389811 0.0023210397 0.005655066 0.0001759769
## smoothness_mean  0.2700393 0.19713747 0.0306502709 0.050313944 0.2197586899
## compactness_mean 0.7604714 0.13130559 0.0154693024 0.002002220 0.0002258480
##                         Dim.6        Dim.7        Dim.8        Dim.9
## radius_mean      4.240444e-04 0.0103969827 2.646975e-05 2.075218e-02
## texture_mean     1.250191e-03 0.0000877445 8.138671e-03 5.295045e-03
## perimeter_mean   3.617026e-04 0.0088487573 1.664412e-04 2.086944e-02
## area_mean        4.302527e-06 0.0018015390 5.730172e-04 1.594785e-02
## smoothness_mean  9.901574e-02 0.0133610973 3.980054e-02 1.720819e-05
## compactness_mean 2.410895e-04 0.0006454790 1.092447e-02 1.174424e-02
##                        Dim.10       Dim.11       Dim.12       Dim.13
## radius_mean      3.197504e-03 5.055010e-04 0.0006810789 3.456581e-05
## texture_mean     2.035748e-02 2.684951e-02 0.0169682252 9.991359e-03
## perimeter_mean   2.617041e-03 8.278339e-05 0.0003957226 4.695968e-04
## area_mean        1.970362e-03 3.567358e-03 0.0011183106 1.095640e-03
## smoothness_mean  1.683846e-03 5.518263e-03 0.0261986970 5.012882e-04
## compactness_mean 5.868689e-05 2.788376e-02 0.0028256473 1.268814e-02
##                        Dim.14       Dim.15      Dim.16       Dim.17
## radius_mean      5.559683e-04 2.459868e-04 0.001810929 2.445949e-03
## texture_mean     7.298407e-05 1.096413e-03 0.001989709 8.898948e-05
## perimeter_mean   3.695365e-04 1.498859e-04 0.001046179 2.254511e-03
## area_mean        1.841832e-05 1.836333e-05 0.001400992 3.883832e-03
## smoothness_mean  3.110091e-02 1.313922e-03 0.003343582 1.675080e-03
## compactness_mean 1.030410e-05 5.018786e-03 0.002312881 2.449634e-05
##                        Dim.18       Dim.19       Dim.20       Dim.21
## radius_mean      1.132594e-03 2.513375e-03 7.696239e-05 1.409279e-04
## texture_mean     8.889709e-05 4.390460e-05 1.857159e-03 6.025606e-03
## perimeter_mean   1.318859e-03 2.840305e-03 9.723374e-06 1.458996e-04
## area_mean        3.727802e-03 3.693512e-05 2.531981e-04 1.019542e-05
## smoothness_mean  6.528082e-03 1.339948e-03 9.112344e-06 4.279613e-04
## compactness_mean 3.196517e-06 3.996916e-03 7.441312e-03 1.112084e-03
##                        Dim.22       Dim.23       Dim.24       Dim.25
## radius_mean      0.0001459399 2.364136e-04 6.018682e-04 5.722780e-06
## texture_mean     0.0002466023 7.497518e-09 1.761957e-04 1.111845e-04
## perimeter_mean   0.0001550079 3.942330e-05 2.456738e-04 1.129874e-05
## area_mean        0.0002611979 1.472279e-06 8.808659e-05 6.829892e-04
## smoothness_mean  0.0001117708 1.039529e-05 8.520764e-05 1.297927e-05
## compactness_mean 0.0002639453 6.673311e-05 1.957922e-04 2.435359e-03
##                        Dim.26       Dim.27       Dim.28       Dim.29
## radius_mean      1.370911e-04 1.193730e-04 7.088910e-05 3.348312e-05
## texture_mean     4.931360e-06 2.078945e-06 6.883658e-12 8.309002e-08
## perimeter_mean   1.282994e-04 9.191915e-05 1.130487e-05 1.103155e-04
## area_mean        1.075942e-03 1.502419e-03 1.180255e-04 1.338527e-04
## smoothness_mean  1.119742e-05 3.351338e-05 3.477847e-09 8.833583e-09
## compactness_mean 5.648156e-04 6.593262e-05 4.742874e-08 1.259768e-06
##                        Dim.30
## radius_mean      6.564239e-05
## texture_mean     9.963774e-12
## perimeter_mean   6.332372e-05
## area_mean        1.444238e-07
## smoothness_mean  3.126267e-09
## compactness_mean 2.655286e-07
head(all_pca_var_2$contrib)
##                     Dim.1     Dim.2       Dim.3      Dim.4       Dim.5
## radius_mean      4.791828 5.4689158 0.007278210  0.1714702  0.14278085
## texture_mean     1.075879 0.3564817 0.416669002 36.3669303  0.24471672
## perimeter_mean   5.177322 4.6303018 0.008675469  0.1762581  0.13968654
## area_mean        4.883878 5.3396446 0.082366279  0.2855170  0.01067348
## smoothness_mean  2.033182 3.4638057 1.087680124  2.5402866 13.32896332
## compactness_mean 5.725748 2.3071061 0.548956087  0.1010895  0.01369829
##                         Dim.6      Dim.7       Dim.8       Dim.9     Dim.10
## radius_mean      0.0351217226 1.53979162 0.005553672 4.977796701 0.91176608
## texture_mean     0.1035477523 0.01299495 1.707590993 1.270115259 5.80492242
## perimeter_mean   0.0299582265 1.31049967 0.034921362 5.005923564 0.74624745
## area_mean        0.0003563592 0.26680766 0.120225880 3.825388875 0.56184752
## smoothness_mean  8.2010352310 1.97877655 8.350630500 0.004127705 0.48014757
## compactness_mean 0.0199683716 0.09559534 2.292085472 2.817074391 0.01673453
##                      Dim.11     Dim.12     Dim.13       Dim.14     Dim.15
## radius_mean      0.17198842  0.2607885 0.01432142  0.354098008 0.26131291
## texture_mean     9.13510741  6.4972186 4.13965139  0.046483789 1.16472490
## perimeter_mean   0.02816569  0.1515242 0.19456483  0.235358999 0.15922443
## area_mean        1.21373501  0.4282067 0.45394900  0.011730686 0.01950745
## smoothness_mean  1.87749853 10.0316126 0.20769532 19.808272945 1.39578544
## compactness_mean 9.48699343  1.0819546 5.25699162  0.006562713 5.33147923
##                    Dim.16    Dim.17       Dim.18     Dim.19      Dim.20
## radius_mean      2.267551 4.1178253  2.152451027 5.07982446  0.24699572
## texture_mean     2.491408 0.1498164  0.168945538 0.08873633  5.96018948
## perimeter_mean   1.309971 3.7955343  2.506441650 5.74058961  0.03120527
## area_mean        1.754248 6.5385437  7.084545992 0.07465020  0.81258978
## smoothness_mean  4.186658 2.8200456 12.406371982 2.70819168  0.02924428
## compactness_mean 2.896068 0.0412403  0.006074859 8.07823486 23.88143277
##                       Dim.21    Dim.22       Dim.23    Dim.24      Dim.25
## radius_mean       0.47018457 0.5318625 9.712634e-01 3.3335252  0.03696583
## texture_mean     20.10351786 0.8987160 3.080222e-05 0.9758830  0.71818728
## perimeter_mean    0.48677194 0.5649097 1.619636e-01 1.3606960  0.07298326
## area_mean         0.03401545 0.9519081 6.048598e-03 0.4878790  4.41171292
## smoothness_mean   1.42782777 0.4073369 4.270720e-02 0.4719336  0.08383854
## compactness_mean  3.71030023 0.9619207 2.741611e-01 1.0844206 15.73099874
##                       Dim.26      Dim.27       Dim.28       Dim.29       Dim.30
## radius_mean       1.67641371  1.72992648 4.460291e+00  4.471552424 4.933856e+01
## texture_mean      0.06030297  0.03012762 4.331148e-07  0.011096377 7.489035e-06
## perimeter_mean    1.56890519  1.33207198 7.112943e-01 14.732247365 4.759578e+01
## area_mean        13.15711689 21.77272040 7.426079e+00 17.875554424 1.085528e-01
## smoothness_mean   0.13692728  0.48566854 2.188236e-04  0.001179694 2.349785e-03
## compactness_mean  6.90682942  0.95548094 2.984182e-03  0.168237574 1.995783e-01
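Comparing the `$coord` tables of `all_pca_var_1` and `all_pca_var_2` shows the same magnitudes with opposite signs on some dimensions (for example `Dim.1`), while `$cos2` and `$contrib` are identical. This is expected: a principal axis is only defined up to its sign, so different implementations may orient it differently. The sketch below illustrates this with base R on the built-in `iris` data; it assumes that `all_pca_1` was produced by `prcomp()` (its creation is not shown in this part of the document), and the object names are illustrative.

```r
# Two base R implementations of a PCA on standardized variables.
p_prcomp   <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
p_princomp <- princomp(iris[, 1:4], cor = TRUE)

load_prcomp   <- p_prcomp$rotation
load_princomp <- unclass(p_princomp$loadings)  # drop the "loadings" class, keep the matrix

# The loadings agree in absolute value, whatever sign each function picked.
same_up_to_sign <- isTRUE(all.equal(abs(load_prcomp), abs(load_princomp),
                                    check.attributes = FALSE))
print(same_up_to_sign)

# Align the orientation: flip every princomp column that points the
# opposite way to its prcomp counterpart.
flip    <- sign(colSums(load_prcomp * load_princomp))
aligned <- sweep(load_princomp, 2, flip, "*")
print(all.equal(aligned, load_prcomp, check.attributes = FALSE))
```

Because `cos2` and `contrib` are built from squared coordinates, the sign choice does not affect them, which is why those tables match across the different functions.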
PCA()
all_pca_var_3 <- get_pca_var(all_pca_3)
all_pca_var_3
## Principal Component Analysis Results for variables
##  ===================================================
##   Name       Description                                    
## 1 "$coord"   "Coordinates for the variables"                
## 2 "$cor"     "Correlations between variables and dimensions"
## 3 "$cos2"    "Cos2 for the variables"                       
## 4 "$contrib" "contributions of the variables"
head(all_pca_var_3$coord)
##                      Dim.1      Dim.2       Dim.3       Dim.4       Dim.5
## radius_mean      0.7977668 -0.5579027 -0.01432118 -0.05827700 -0.04851878
## texture_mean     0.3780132 -0.1424382  0.10835829  0.84870380  0.06351944
## perimeter_mean   0.8292355 -0.5133487 -0.01563555 -0.05908501 -0.04799015
## area_mean        0.8053928 -0.5512695  0.04817717 -0.07520017 -0.01326563
## smoothness_mean  0.5196530  0.4440017 -0.17507219 -0.22430770  0.46878427
## compactness_mean 0.8720501  0.3623611 -0.12437565 -0.04474618 -0.01502824
##                         Dim.6        Dim.7        Dim.8        Dim.9
## radius_mean      -0.020592339  0.101965596 -0.005144876  0.144056156
## texture_mean      0.035358035 -0.009367203  0.090214585 -0.072767057
## perimeter_mean   -0.019018481  0.094067834 -0.012901209  0.144462575
## area_mean         0.002074253  0.042444540  0.023937777  0.126284788
## smoothness_mean   0.314667668  0.115590213 -0.199500717 -0.004148275
## compactness_mean  0.015527056 -0.025406278 -0.104520200  0.108370831
##                        Dim.10       Dim.11      Dim.12       Dim.13
## radius_mean       0.056546476  0.022483349  0.02609749 -0.005879269
## texture_mean      0.142679652 -0.163858215  0.13026214 -0.099956785
## perimeter_mean    0.051157023  0.009098538  0.01989278 -0.021670182
## area_mean         0.044388765  0.059727362  0.03344115 -0.033100452
## smoothness_mean  -0.041034694 -0.074285011  0.16186012 -0.022389467
## compactness_mean  0.007660737 -0.166984319 -0.05315682 -0.112641659
##                        Dim.14       Dim.15      Dim.16       Dim.17
## radius_mean      -0.023578980 -0.015683967  0.04255502  0.049456533
## texture_mean      0.008543071 -0.033112133  0.04460615 -0.009433423
## perimeter_mean   -0.019223333 -0.012242788  0.03234470  0.047481690
## area_mean        -0.004291657  0.004285246  0.03742982  0.062320398
## smoothness_mean  -0.176354514 -0.036248064  0.05782372  0.040927741
## compactness_mean -0.003210000  0.070843391 -0.04809242 -0.004949378
##                        Dim.18       Dim.19       Dim.20       Dim.21
## radius_mean       0.033654027  0.050133570  0.008772821  0.011871307
## texture_mean     -0.009428525  0.006626055  0.043094773 -0.077624778
## perimeter_mean    0.036316100  0.053294517  0.003118233  0.012078892
## area_mean         0.061055728 -0.006077427  0.015912200  0.003193026
## smoothness_mean  -0.080796547 -0.036605301 -0.003018666  0.020687226
## compactness_mean  0.001787881  0.063221168 -0.086263038 -0.033347929
##                       Dim.22        Dim.23       Dim.24       Dim.25
## radius_mean       0.01208056  1.537575e-02  0.024533003  0.002392233
## texture_mean      0.01570358  8.658821e-05 -0.013273874 -0.010544407
## perimeter_mean    0.01245022  6.278798e-03  0.015673984 -0.003361359
## area_mean         0.01616162 -1.213375e-03 -0.009385446  0.026134063
## smoothness_mean   0.01057217  3.224173e-03 -0.009230799 -0.003602676
## compactness_mean -0.01624639 -8.169034e-03  0.013992577 -0.049349353
##                        Dim.26       Dim.27        Dim.28        Dim.29
## radius_mean       0.011708590  0.010925793  8.419566e-03  5.786460e-03
## texture_mean      0.002220667  0.001441855 -2.623673e-06 -2.882534e-04
## perimeter_mean    0.011326933  0.009587447  3.362272e-03  1.050312e-02
## area_mean        -0.032801549 -0.038761046 -1.086395e-02 -1.156947e-02
## smoothness_mean   0.003346255 -0.005789074  5.897327e-05 -9.398714e-05
## compactness_mean -0.023765850 -0.008119890 -2.177814e-04 -1.122394e-03
##                         Dim.30
## radius_mean       8.101999e-03
## texture_mean      3.156545e-06
## perimeter_mean   -7.957621e-03
## area_mean        -3.800314e-04
## smoothness_mean  -5.591303e-05
## compactness_mean  5.152947e-04
head(all_pca_var_3$cor)
##                      Dim.1      Dim.2       Dim.3       Dim.4       Dim.5
## radius_mean      0.7977668 -0.5579027 -0.01432118 -0.05827700 -0.04851878
## texture_mean     0.3780132 -0.1424382  0.10835829  0.84870380  0.06351944
## perimeter_mean   0.8292355 -0.5133487 -0.01563555 -0.05908501 -0.04799015
## area_mean        0.8053928 -0.5512695  0.04817717 -0.07520017 -0.01326563
## smoothness_mean  0.5196530  0.4440017 -0.17507219 -0.22430770  0.46878427
## compactness_mean 0.8720501  0.3623611 -0.12437565 -0.04474618 -0.01502824
##                         Dim.6        Dim.7        Dim.8        Dim.9
## radius_mean      -0.020592339  0.101965596 -0.005144876  0.144056156
## texture_mean      0.035358035 -0.009367203  0.090214585 -0.072767057
## perimeter_mean   -0.019018481  0.094067834 -0.012901209  0.144462575
## area_mean         0.002074253  0.042444540  0.023937777  0.126284788
## smoothness_mean   0.314667668  0.115590213 -0.199500717 -0.004148275
## compactness_mean  0.015527056 -0.025406278 -0.104520200  0.108370831
##                        Dim.10       Dim.11      Dim.12       Dim.13
## radius_mean       0.056546476  0.022483349  0.02609749 -0.005879269
## texture_mean      0.142679652 -0.163858215  0.13026214 -0.099956785
## perimeter_mean    0.051157023  0.009098538  0.01989278 -0.021670182
## area_mean         0.044388765  0.059727362  0.03344115 -0.033100452
## smoothness_mean  -0.041034694 -0.074285011  0.16186012 -0.022389467
## compactness_mean  0.007660737 -0.166984319 -0.05315682 -0.112641659
##                        Dim.14       Dim.15      Dim.16       Dim.17
## radius_mean      -0.023578980 -0.015683967  0.04255502  0.049456533
## texture_mean      0.008543071 -0.033112133  0.04460615 -0.009433423
## perimeter_mean   -0.019223333 -0.012242788  0.03234470  0.047481690
## area_mean        -0.004291657  0.004285246  0.03742982  0.062320398
## smoothness_mean  -0.176354514 -0.036248064  0.05782372  0.040927741
## compactness_mean -0.003210000  0.070843391 -0.04809242 -0.004949378
##                        Dim.18       Dim.19       Dim.20       Dim.21
## radius_mean       0.033654027  0.050133570  0.008772821  0.011871307
## texture_mean     -0.009428525  0.006626055  0.043094773 -0.077624778
## perimeter_mean    0.036316100  0.053294517  0.003118233  0.012078892
## area_mean         0.061055728 -0.006077427  0.015912200  0.003193026
## smoothness_mean  -0.080796547 -0.036605301 -0.003018666  0.020687226
## compactness_mean  0.001787881  0.063221168 -0.086263038 -0.033347929
##                       Dim.22        Dim.23       Dim.24       Dim.25
## radius_mean       0.01208056  1.537575e-02  0.024533003  0.002392233
## texture_mean      0.01570358  8.658821e-05 -0.013273874 -0.010544407
## perimeter_mean    0.01245022  6.278798e-03  0.015673984 -0.003361359
## area_mean         0.01616162 -1.213375e-03 -0.009385446  0.026134063
## smoothness_mean   0.01057217  3.224173e-03 -0.009230799 -0.003602676
## compactness_mean -0.01624639 -8.169034e-03  0.013992577 -0.049349353
##                        Dim.26       Dim.27        Dim.28        Dim.29
## radius_mean       0.011708590  0.010925793  8.419566e-03  5.786460e-03
## texture_mean      0.002220667  0.001441855 -2.623673e-06 -2.882534e-04
## perimeter_mean    0.011326933  0.009587447  3.362272e-03  1.050312e-02
## area_mean        -0.032801549 -0.038761046 -1.086395e-02 -1.156947e-02
## smoothness_mean   0.003346255 -0.005789074  5.897327e-05 -9.398714e-05
## compactness_mean -0.023765850 -0.008119890 -2.177814e-04 -1.122394e-03
##                         Dim.30
## radius_mean       8.101999e-03
## texture_mean      3.156545e-06
## perimeter_mean   -7.957621e-03
## area_mean        -3.800314e-04
## smoothness_mean  -5.591303e-05
## compactness_mean  5.152947e-04
head(all_pca_var_3$cos2)
##                      Dim.1      Dim.2        Dim.3       Dim.4        Dim.5
## radius_mean      0.6364318 0.31125539 0.0002050963 0.003396209 0.0023540715
## texture_mean     0.1428940 0.02028864 0.0117415199 0.720298141 0.0040347193
## perimeter_mean   0.6876316 0.26352690 0.0002444703 0.003491038 0.0023030547
## area_mean        0.6486576 0.30389811 0.0023210397 0.005655066 0.0001759769
## smoothness_mean  0.2700393 0.19713747 0.0306502709 0.050313944 0.2197586899
## compactness_mean 0.7604714 0.13130559 0.0154693024 0.002002220 0.0002258480
##                         Dim.6        Dim.7        Dim.8        Dim.9
## radius_mean      4.240444e-04 0.0103969827 2.646975e-05 2.075218e-02
## texture_mean     1.250191e-03 0.0000877445 8.138671e-03 5.295045e-03
## perimeter_mean   3.617026e-04 0.0088487573 1.664412e-04 2.086944e-02
## area_mean        4.302527e-06 0.0018015390 5.730172e-04 1.594785e-02
## smoothness_mean  9.901574e-02 0.0133610973 3.980054e-02 1.720819e-05
## compactness_mean 2.410895e-04 0.0006454790 1.092447e-02 1.174424e-02
##                        Dim.10       Dim.11       Dim.12       Dim.13
## radius_mean      3.197504e-03 5.055010e-04 0.0006810789 3.456581e-05
## texture_mean     2.035748e-02 2.684951e-02 0.0169682252 9.991359e-03
## perimeter_mean   2.617041e-03 8.278339e-05 0.0003957226 4.695968e-04
## area_mean        1.970362e-03 3.567358e-03 0.0011183106 1.095640e-03
## smoothness_mean  1.683846e-03 5.518263e-03 0.0261986970 5.012882e-04
## compactness_mean 5.868689e-05 2.788376e-02 0.0028256473 1.268814e-02
##                        Dim.14       Dim.15      Dim.16       Dim.17
## radius_mean      5.559683e-04 2.459868e-04 0.001810929 2.445949e-03
## texture_mean     7.298407e-05 1.096413e-03 0.001989709 8.898948e-05
## perimeter_mean   3.695365e-04 1.498859e-04 0.001046179 2.254511e-03
## area_mean        1.841832e-05 1.836333e-05 0.001400992 3.883832e-03
## smoothness_mean  3.110091e-02 1.313922e-03 0.003343582 1.675080e-03
## compactness_mean 1.030410e-05 5.018786e-03 0.002312881 2.449634e-05
##                        Dim.18       Dim.19       Dim.20       Dim.21
## radius_mean      1.132594e-03 2.513375e-03 7.696239e-05 1.409279e-04
## texture_mean     8.889709e-05 4.390460e-05 1.857159e-03 6.025606e-03
## perimeter_mean   1.318859e-03 2.840305e-03 9.723374e-06 1.458996e-04
## area_mean        3.727802e-03 3.693512e-05 2.531981e-04 1.019542e-05
## smoothness_mean  6.528082e-03 1.339948e-03 9.112344e-06 4.279613e-04
## compactness_mean 3.196517e-06 3.996916e-03 7.441312e-03 1.112084e-03
##                        Dim.22       Dim.23       Dim.24       Dim.25
## radius_mean      0.0001459399 2.364136e-04 6.018682e-04 5.722780e-06
## texture_mean     0.0002466023 7.497518e-09 1.761957e-04 1.111845e-04
## perimeter_mean   0.0001550079 3.942330e-05 2.456738e-04 1.129874e-05
## area_mean        0.0002611979 1.472279e-06 8.808659e-05 6.829892e-04
## smoothness_mean  0.0001117708 1.039529e-05 8.520764e-05 1.297927e-05
## compactness_mean 0.0002639453 6.673311e-05 1.957922e-04 2.435359e-03
##                        Dim.26       Dim.27       Dim.28       Dim.29
## radius_mean      1.370911e-04 1.193730e-04 7.088910e-05 3.348312e-05
## texture_mean     4.931360e-06 2.078945e-06 6.883658e-12 8.309002e-08
## perimeter_mean   1.282994e-04 9.191915e-05 1.130487e-05 1.103155e-04
## area_mean        1.075942e-03 1.502419e-03 1.180255e-04 1.338527e-04
## smoothness_mean  1.119742e-05 3.351338e-05 3.477847e-09 8.833583e-09
## compactness_mean 5.648156e-04 6.593262e-05 4.742874e-08 1.259768e-06
##                        Dim.30
## radius_mean      6.564239e-05
## texture_mean     9.963774e-12
## perimeter_mean   6.332372e-05
## area_mean        1.444238e-07
## smoothness_mean  3.126267e-09
## compactness_mean 2.655286e-07
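The `$cos2` values printed above are the squares of the matching `$coord` values (for example, radius_mean on Dim.1: (-0.7977668)^2 ≈ 0.6364318). The following is a minimal sketch of that relationship in base R. It assumes the built-in USArrests data as a stand-in for the breast cancer data, and the names `pca_demo`, `coord_demo` and `cos2_demo` are hypothetical, chosen only for this illustration.

```r
# Illustrative data only: the built-in USArrests set, not the breast cancer data.
pca_demo <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Variable coordinates: each loading column scaled by the standard
# deviation of its principal component.
coord_demo <- sweep(pca_demo$rotation, 2, pca_demo$sdev, `*`)

# Squared cosines: the element-wise square of the coordinates.
cos2_demo <- coord_demo^2

# With standardized variables, the cos2 values of each variable add up
# to 1 when summed over all components.
print(rowSums(cos2_demo))
```

Because every variable's cos2 values sum to 1 across all components, a cos2 close to 1 on a single dimension means that the variable is almost entirely represented by that dimension.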
head(all_pca_var_3$contrib)
##                     Dim.1     Dim.2       Dim.3      Dim.4       Dim.5
## radius_mean      4.791828 5.4689158 0.007278210  0.1714702  0.14278085
## texture_mean     1.075879 0.3564817 0.416669002 36.3669303  0.24471672
## perimeter_mean   5.177322 4.6303018 0.008675469  0.1762581  0.13968654
## area_mean        4.883878 5.3396446 0.082366279  0.2855170  0.01067348
## smoothness_mean  2.033182 3.4638057 1.087680124  2.5402866 13.32896332
## compactness_mean 5.725748 2.3071061 0.548956087  0.1010895  0.01369829
##                         Dim.6      Dim.7       Dim.8       Dim.9     Dim.10
## radius_mean      0.0351217226 1.53979162 0.005553672 4.977796701 0.91176608
## texture_mean     0.1035477523 0.01299495 1.707590993 1.270115259 5.80492242
## perimeter_mean   0.0299582265 1.31049967 0.034921362 5.005923564 0.74624745
## area_mean        0.0003563592 0.26680766 0.120225880 3.825388875 0.56184752
## smoothness_mean  8.2010352310 1.97877655 8.350630500 0.004127705 0.48014757
## compactness_mean 0.0199683716 0.09559534 2.292085472 2.817074391 0.01673453
##                      Dim.11     Dim.12     Dim.13       Dim.14     Dim.15
## radius_mean      0.17198842  0.2607885 0.01432142  0.354098008 0.26131291
## texture_mean     9.13510741  6.4972186 4.13965139  0.046483789 1.16472490
## perimeter_mean   0.02816569  0.1515242 0.19456483  0.235358999 0.15922443
## area_mean        1.21373501  0.4282067 0.45394900  0.011730686 0.01950745
## smoothness_mean  1.87749853 10.0316126 0.20769532 19.808272945 1.39578544
## compactness_mean 9.48699343  1.0819546 5.25699162  0.006562713 5.33147923
##                    Dim.16    Dim.17       Dim.18     Dim.19      Dim.20
## radius_mean      2.267551 4.1178253  2.152451027 5.07982446  0.24699572
## texture_mean     2.491408 0.1498164  0.168945538 0.08873633  5.96018948
## perimeter_mean   1.309971 3.7955343  2.506441650 5.74058961  0.03120527
## area_mean        1.754248 6.5385437  7.084545992 0.07465020  0.81258978
## smoothness_mean  4.186658 2.8200456 12.406371982 2.70819168  0.02924428
## compactness_mean 2.896068 0.0412403  0.006074859 8.07823486 23.88143277
##                       Dim.21    Dim.22       Dim.23    Dim.24      Dim.25
## radius_mean       0.47018457 0.5318625 9.712634e-01 3.3335252  0.03696583
## texture_mean     20.10351786 0.8987160 3.080222e-05 0.9758830  0.71818728
## perimeter_mean    0.48677194 0.5649097 1.619636e-01 1.3606960  0.07298326
## area_mean         0.03401545 0.9519081 6.048598e-03 0.4878790  4.41171292
## smoothness_mean   1.42782777 0.4073369 4.270720e-02 0.4719336  0.08383854
## compactness_mean  3.71030023 0.9619207 2.741611e-01 1.0844206 15.73099874
##                       Dim.26      Dim.27       Dim.28       Dim.29       Dim.30
## radius_mean       1.67641371  1.72992648 4.460291e+00  4.471552424 4.933856e+01
## texture_mean      0.06030297  0.03012762 4.331148e-07  0.011096377 7.489035e-06
## perimeter_mean    1.56890519  1.33207198 7.112943e-01 14.732247365 4.759578e+01
## area_mean        13.15711689 21.77272040 7.426079e+00 17.875554424 1.085528e-01
## smoothness_mean   0.13692728  0.48566854 2.188236e-04  0.001179694 2.349785e-03
## compactness_mean  6.90682942  0.95548094 2.984182e-03  0.168237574 1.995783e-01
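The `$contrib` values above express each variable's cos2 as a percentage of the total cos2 of its dimension. A minimal sketch of that computation in base R follows. It again assumes the built-in USArrests data as a stand-in, and every `*_demo` name is hypothetical.

```r
# Illustrative data only: the built-in USArrests set, not the breast cancer data.
pca_demo <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Coordinates and squared cosines of the variables.
coord_demo <- sweep(pca_demo$rotation, 2, pca_demo$sdev, `*`)
cos2_demo <- coord_demo^2

# Contribution (in percent) of each variable to each component:
# its cos2 divided by the column total of cos2, times 100.
contrib_demo <- sweep(cos2_demo, 2, colSums(cos2_demo), `/`) * 100

# Every component's contributions add up to 100.
print(colSums(contrib_demo))
```

Note that `head()` only prints the first six of the thirty variables, which is why the columns shown in the output above do not add up to 100 by themselves.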
dudi.pca()
all_pca_var_4 <- get_pca_var(all_pca_4)
all_pca_var_4
## Principal Component Analysis Results for variables
##  ===================================================
##   Name       Description                                    
## 1 "$coord"   "Coordinates for the variables"                
## 2 "$cor"     "Correlations between variables and dimensions"
## 3 "$cos2"    "Cos2 for the variables"                       
## 4 "$contrib" "contributions of the variables"
head(all_pca_var_4$coord)
##                       Dim.1      Dim.2       Dim.3       Dim.4       Dim.5
## radius_mean      -0.7977668 -0.5579027 -0.01432118  0.05827700  0.04851878
## texture_mean     -0.3780132 -0.1424382  0.10835829 -0.84870380 -0.06351944
## perimeter_mean   -0.8292355 -0.5133487 -0.01563555  0.05908501  0.04799015
## area_mean        -0.8053928 -0.5512695  0.04817717  0.07520017  0.01326563
## smoothness_mean  -0.5196530  0.4440017 -0.17507219  0.22430770 -0.46878427
## compactness_mean -0.8720501  0.3623611 -0.12437565  0.04474618  0.01502824
##                         Dim.6        Dim.7        Dim.8        Dim.9
## radius_mean       0.020592339 -0.101965596  0.005144876 -0.144056156
## texture_mean     -0.035358035  0.009367203 -0.090214585  0.072767057
## perimeter_mean    0.019018481 -0.094067834  0.012901209 -0.144462575
## area_mean        -0.002074253 -0.042444540 -0.023937777 -0.126284788
## smoothness_mean  -0.314667668 -0.115590213  0.199500717  0.004148275
## compactness_mean -0.015527056  0.025406278  0.104520200 -0.108370831
##                        Dim.10       Dim.11      Dim.12      Dim.13       Dim.14
## radius_mean       0.056546476  0.022483349  0.02609749 0.005879269 -0.023578980
## texture_mean      0.142679652 -0.163858215  0.13026214 0.099956785  0.008543071
## perimeter_mean    0.051157023  0.009098538  0.01989278 0.021670182 -0.019223333
## area_mean         0.044388765  0.059727362  0.03344115 0.033100452 -0.004291657
## smoothness_mean  -0.041034694 -0.074285011  0.16186012 0.022389467 -0.176354514
## compactness_mean  0.007660737 -0.166984319 -0.05315682 0.112641659 -0.003210000
##                        Dim.15      Dim.16       Dim.17       Dim.18
## radius_mean       0.015683967  0.04255502 -0.049456533 -0.033654027
## texture_mean      0.033112133  0.04460615  0.009433423  0.009428525
## perimeter_mean    0.012242788  0.03234470 -0.047481690 -0.036316100
## area_mean        -0.004285246  0.03742982 -0.062320398 -0.061055728
## smoothness_mean   0.036248064  0.05782372 -0.040927741  0.080796547
## compactness_mean -0.070843391 -0.04809242  0.004949378 -0.001787881
##                        Dim.19       Dim.20       Dim.21      Dim.22
## radius_mean       0.050133570 -0.008772821 -0.011871307  0.01208056
## texture_mean      0.006626055 -0.043094773  0.077624778  0.01570358
## perimeter_mean    0.053294517 -0.003118233 -0.012078892  0.01245022
## area_mean        -0.006077427 -0.015912200 -0.003193026  0.01616162
## smoothness_mean  -0.036605301  0.003018666 -0.020687226  0.01057217
## compactness_mean  0.063221168  0.086263038  0.033347929 -0.01624639
##                         Dim.23       Dim.24       Dim.25       Dim.26
## radius_mean      -1.537575e-02  0.024533003 -0.002392233 -0.011708590
## texture_mean     -8.658821e-05 -0.013273874  0.010544407 -0.002220667
## perimeter_mean   -6.278798e-03  0.015673984  0.003361359 -0.011326933
## area_mean         1.213375e-03 -0.009385446 -0.026134063  0.032801549
## smoothness_mean  -3.224173e-03 -0.009230799  0.003602676 -0.003346255
## compactness_mean  8.169034e-03  0.013992577  0.049349353  0.023765850
##                        Dim.27        Dim.28        Dim.29        Dim.30
## radius_mean       0.010925793 -8.419566e-03  5.786460e-03  8.101999e-03
## texture_mean      0.001441855  2.623673e-06 -2.882534e-04  3.156545e-06
## perimeter_mean    0.009587447 -3.362272e-03  1.050312e-02 -7.957621e-03
## area_mean        -0.038761046  1.086395e-02 -1.156947e-02 -3.800314e-04
## smoothness_mean  -0.005789074 -5.897327e-05 -9.398714e-05 -5.591303e-05
## compactness_mean -0.008119890  2.177814e-04 -1.122394e-03  5.152947e-04
head(all_pca_var_4$cor)
##                       Dim.1      Dim.2       Dim.3       Dim.4       Dim.5
## radius_mean      -0.7977668 -0.5579027 -0.01432118  0.05827700  0.04851878
## texture_mean     -0.3780132 -0.1424382  0.10835829 -0.84870380 -0.06351944
## perimeter_mean   -0.8292355 -0.5133487 -0.01563555  0.05908501  0.04799015
## area_mean        -0.8053928 -0.5512695  0.04817717  0.07520017  0.01326563
## smoothness_mean  -0.5196530  0.4440017 -0.17507219  0.22430770 -0.46878427
## compactness_mean -0.8720501  0.3623611 -0.12437565  0.04474618  0.01502824
##                         Dim.6        Dim.7        Dim.8        Dim.9
## radius_mean       0.020592339 -0.101965596  0.005144876 -0.144056156
## texture_mean     -0.035358035  0.009367203 -0.090214585  0.072767057
## perimeter_mean    0.019018481 -0.094067834  0.012901209 -0.144462575
## area_mean        -0.002074253 -0.042444540 -0.023937777 -0.126284788
## smoothness_mean  -0.314667668 -0.115590213  0.199500717  0.004148275
## compactness_mean -0.015527056  0.025406278  0.104520200 -0.108370831
##                        Dim.10       Dim.11      Dim.12      Dim.13       Dim.14
## radius_mean       0.056546476  0.022483349  0.02609749 0.005879269 -0.023578980
## texture_mean      0.142679652 -0.163858215  0.13026214 0.099956785  0.008543071
## perimeter_mean    0.051157023  0.009098538  0.01989278 0.021670182 -0.019223333
## area_mean         0.044388765  0.059727362  0.03344115 0.033100452 -0.004291657
## smoothness_mean  -0.041034694 -0.074285011  0.16186012 0.022389467 -0.176354514
## compactness_mean  0.007660737 -0.166984319 -0.05315682 0.112641659 -0.003210000
##                        Dim.15      Dim.16       Dim.17       Dim.18
## radius_mean       0.015683967  0.04255502 -0.049456533 -0.033654027
## texture_mean      0.033112133  0.04460615  0.009433423  0.009428525
## perimeter_mean    0.012242788  0.03234470 -0.047481690 -0.036316100
## area_mean        -0.004285246  0.03742982 -0.062320398 -0.061055728
## smoothness_mean   0.036248064  0.05782372 -0.040927741  0.080796547
## compactness_mean -0.070843391 -0.04809242  0.004949378 -0.001787881
##                        Dim.19       Dim.20       Dim.21      Dim.22
## radius_mean       0.050133570 -0.008772821 -0.011871307  0.01208056
## texture_mean      0.006626055 -0.043094773  0.077624778  0.01570358
## perimeter_mean    0.053294517 -0.003118233 -0.012078892  0.01245022
## area_mean        -0.006077427 -0.015912200 -0.003193026  0.01616162
## smoothness_mean  -0.036605301  0.003018666 -0.020687226  0.01057217
## compactness_mean  0.063221168  0.086263038  0.033347929 -0.01624639
##                         Dim.23       Dim.24       Dim.25       Dim.26
## radius_mean      -1.537575e-02  0.024533003 -0.002392233 -0.011708590
## texture_mean     -8.658821e-05 -0.013273874  0.010544407 -0.002220667
## perimeter_mean   -6.278798e-03  0.015673984  0.003361359 -0.011326933
## area_mean         1.213375e-03 -0.009385446 -0.026134063  0.032801549
## smoothness_mean  -3.224173e-03 -0.009230799  0.003602676 -0.003346255
## compactness_mean  8.169034e-03  0.013992577  0.049349353  0.023765850
##                        Dim.27        Dim.28        Dim.29        Dim.30
## radius_mean       0.010925793 -8.419566e-03  5.786460e-03  8.101999e-03
## texture_mean      0.001441855  2.623673e-06 -2.882534e-04  3.156545e-06
## perimeter_mean    0.009587447 -3.362272e-03  1.050312e-02 -7.957621e-03
## area_mean        -0.038761046  1.086395e-02 -1.156947e-02 -3.800314e-04
## smoothness_mean  -0.005789074 -5.897327e-05 -9.398714e-05 -5.591303e-05
## compactness_mean -0.008119890  2.177814e-04 -1.122394e-03  5.152947e-04
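The `$coord` and `$cor` blocks above are identical. This is expected when the PCA is run on standardized variables, since the coordinate of a variable on a component is then its correlation with that component's scores. Below is a minimal sketch of that check in base R, assuming the built-in USArrests data as a stand-in (the `*_demo` names are hypothetical).

```r
# Illustrative data only: the built-in USArrests set, not the breast cancer data.
pca_demo <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Coordinates: loadings scaled by the component standard deviations.
coord_demo <- sweep(pca_demo$rotation, 2, pca_demo$sdev, `*`)

# Correlations between the original variables and the component scores.
cor_demo <- cor(USArrests, pca_demo$x)

# Both matrices hold the same values (up to floating point error).
print(all.equal(coord_demo, cor_demo, check.attributes = FALSE))
```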
head(all_pca_var_4$cos2)
##                      Dim.1      Dim.2        Dim.3       Dim.4        Dim.5
## radius_mean      0.6364318 0.31125539 0.0002050963 0.003396209 0.0023540715
## texture_mean     0.1428940 0.02028864 0.0117415199 0.720298141 0.0040347193
## perimeter_mean   0.6876316 0.26352690 0.0002444703 0.003491038 0.0023030547
## area_mean        0.6486576 0.30389811 0.0023210397 0.005655066 0.0001759769
## smoothness_mean  0.2700393 0.19713747 0.0306502709 0.050313944 0.2197586899
## compactness_mean 0.7604714 0.13130559 0.0154693024 0.002002220 0.0002258480
##                         Dim.6        Dim.7        Dim.8        Dim.9
## radius_mean      4.240444e-04 0.0103969827 2.646975e-05 2.075218e-02
## texture_mean     1.250191e-03 0.0000877445 8.138671e-03 5.295045e-03
## perimeter_mean   3.617026e-04 0.0088487573 1.664412e-04 2.086944e-02
## area_mean        4.302527e-06 0.0018015390 5.730172e-04 1.594785e-02
## smoothness_mean  9.901574e-02 0.0133610973 3.980054e-02 1.720819e-05
## compactness_mean 2.410895e-04 0.0006454790 1.092447e-02 1.174424e-02
##                        Dim.10       Dim.11       Dim.12       Dim.13
## radius_mean      3.197504e-03 5.055010e-04 0.0006810789 3.456581e-05
## texture_mean     2.035748e-02 2.684951e-02 0.0169682252 9.991359e-03
## perimeter_mean   2.617041e-03 8.278339e-05 0.0003957226 4.695968e-04
## area_mean        1.970362e-03 3.567358e-03 0.0011183106 1.095640e-03
## smoothness_mean  1.683846e-03 5.518263e-03 0.0261986970 5.012882e-04
## compactness_mean 5.868689e-05 2.788376e-02 0.0028256473 1.268814e-02
##                        Dim.14       Dim.15      Dim.16       Dim.17
## radius_mean      5.559683e-04 2.459868e-04 0.001810929 2.445949e-03
## texture_mean     7.298407e-05 1.096413e-03 0.001989709 8.898948e-05
## perimeter_mean   3.695365e-04 1.498859e-04 0.001046179 2.254511e-03
## area_mean        1.841832e-05 1.836333e-05 0.001400992 3.883832e-03
## smoothness_mean  3.110091e-02 1.313922e-03 0.003343582 1.675080e-03
## compactness_mean 1.030410e-05 5.018786e-03 0.002312881 2.449634e-05
##                        Dim.18       Dim.19       Dim.20       Dim.21
## radius_mean      1.132594e-03 2.513375e-03 7.696239e-05 1.409279e-04
## texture_mean     8.889709e-05 4.390460e-05 1.857159e-03 6.025606e-03
## perimeter_mean   1.318859e-03 2.840305e-03 9.723374e-06 1.458996e-04
## area_mean        3.727802e-03 3.693512e-05 2.531981e-04 1.019542e-05
## smoothness_mean  6.528082e-03 1.339948e-03 9.112344e-06 4.279613e-04
## compactness_mean 3.196517e-06 3.996916e-03 7.441312e-03 1.112084e-03
##                        Dim.22       Dim.23       Dim.24       Dim.25
## radius_mean      0.0001459399 2.364136e-04 6.018682e-04 5.722780e-06
## texture_mean     0.0002466023 7.497518e-09 1.761957e-04 1.111845e-04
## perimeter_mean   0.0001550079 3.942330e-05 2.456738e-04 1.129874e-05
## area_mean        0.0002611979 1.472279e-06 8.808659e-05 6.829892e-04
## smoothness_mean  0.0001117708 1.039529e-05 8.520764e-05 1.297927e-05
## compactness_mean 0.0002639453 6.673311e-05 1.957922e-04 2.435359e-03
##                        Dim.26       Dim.27       Dim.28       Dim.29
## radius_mean      1.370911e-04 1.193730e-04 7.088910e-05 3.348312e-05
## texture_mean     4.931360e-06 2.078945e-06 6.883658e-12 8.309002e-08
## perimeter_mean   1.282994e-04 9.191915e-05 1.130487e-05 1.103155e-04
## area_mean        1.075942e-03 1.502419e-03 1.180255e-04 1.338527e-04
## smoothness_mean  1.119742e-05 3.351338e-05 3.477847e-09 8.833583e-09
## compactness_mean 5.648156e-04 6.593262e-05 4.742874e-08 1.259768e-06
##                        Dim.30
## radius_mean      6.564239e-05
## texture_mean     9.963774e-12
## perimeter_mean   6.332372e-05
## area_mean        1.444238e-07
## smoothness_mean  3.126267e-09
## compactness_mean 2.655286e-07
head(all_pca_var_4$contrib)
##                     Dim.1     Dim.2       Dim.3      Dim.4       Dim.5
## radius_mean      4.791828 5.4689158 0.007278210  0.1714702  0.14278085
## texture_mean     1.075879 0.3564817 0.416669002 36.3669303  0.24471672
## perimeter_mean   5.177322 4.6303018 0.008675469  0.1762581  0.13968654
## area_mean        4.883878 5.3396446 0.082366279  0.2855170  0.01067348
## smoothness_mean  2.033182 3.4638057 1.087680124  2.5402866 13.32896332
## compactness_mean 5.725748 2.3071061 0.548956087  0.1010895  0.01369829
##                         Dim.6      Dim.7       Dim.8       Dim.9     Dim.10
## radius_mean      0.0351217226 1.53979162 0.005553672 4.977796701 0.91176608
## texture_mean     0.1035477523 0.01299495 1.707590993 1.270115259 5.80492242
## perimeter_mean   0.0299582265 1.31049967 0.034921362 5.005923564 0.74624745
## area_mean        0.0003563592 0.26680766 0.120225880 3.825388875 0.56184752
## smoothness_mean  8.2010352310 1.97877655 8.350630500 0.004127705 0.48014757
## compactness_mean 0.0199683716 0.09559534 2.292085472 2.817074391 0.01673453
##                      Dim.11     Dim.12     Dim.13       Dim.14     Dim.15
## radius_mean      0.17198842  0.2607885 0.01432142  0.354098008 0.26131291
## texture_mean     9.13510741  6.4972186 4.13965139  0.046483789 1.16472490
## perimeter_mean   0.02816569  0.1515242 0.19456483  0.235358999 0.15922443
## area_mean        1.21373501  0.4282067 0.45394900  0.011730686 0.01950745
## smoothness_mean  1.87749853 10.0316126 0.20769532 19.808272945 1.39578544
## compactness_mean 9.48699343  1.0819546 5.25699162  0.006562713 5.33147923
##                    Dim.16    Dim.17       Dim.18     Dim.19      Dim.20
## radius_mean      2.267551 4.1178253  2.152451027 5.07982446  0.24699572
## texture_mean     2.491408 0.1498164  0.168945538 0.08873633  5.96018948
## perimeter_mean   1.309971 3.7955343  2.506441650 5.74058961  0.03120527
## area_mean        1.754248 6.5385437  7.084545992 0.07465020  0.81258978
## smoothness_mean  4.186658 2.8200456 12.406371982 2.70819168  0.02924428
## compactness_mean 2.896068 0.0412403  0.006074859 8.07823486 23.88143277
##                       Dim.21    Dim.22       Dim.23    Dim.24      Dim.25
## radius_mean       0.47018457 0.5318625 9.712634e-01 3.3335252  0.03696583
## texture_mean     20.10351786 0.8987160 3.080222e-05 0.9758830  0.71818728
## perimeter_mean    0.48677194 0.5649097 1.619636e-01 1.3606960  0.07298326
## area_mean         0.03401545 0.9519081 6.048598e-03 0.4878790  4.41171292
## smoothness_mean   1.42782777 0.4073369 4.270720e-02 0.4719336  0.08383854
## compactness_mean  3.71030023 0.9619207 2.741611e-01 1.0844206 15.73099874
##                       Dim.26      Dim.27       Dim.28       Dim.29       Dim.30
## radius_mean       1.67641371  1.72992648 4.460291e+00  4.471552424 4.933856e+01
## texture_mean      0.06030297  0.03012762 4.331148e-07  0.011096377 7.489035e-06
## perimeter_mean    1.56890519  1.33207198 7.112943e-01 14.732247365 4.759578e+01
## area_mean        13.15711689 21.77272040 7.426079e+00 17.875554424 1.085528e-01
## smoothness_mean   0.13692728  0.48566854 2.188236e-04  0.001179694 2.349785e-03
## compactness_mean  6.90682942  0.95548094 2.984182e-03  0.168237574 1.995783e-01
epPCA()
all_pca_var_5 <- get_pca_var(all_pca_5)
all_pca_var_5
## Principal Component Analysis Results for variables
##  ===================================================
##   Name       Description                                    
## 1 "$coord"   "Coordinates for the variables"                
## 2 "$cor"     "Correlations between variables and dimensions"
## 3 "$cos2"    "Cos2 for the variables"                       
## 4 "$contrib" "contributions of the variables"
head(all_pca_var_5$coord)
##                        [,1]       [,2]        [,3]        [,4]        [,5]
## radius_mean      -0.7977668  0.5579027 -0.01432118  0.05827700 -0.04851878
## texture_mean     -0.3780132  0.1424382  0.10835829 -0.84870380  0.06351944
## perimeter_mean   -0.8292355  0.5133487 -0.01563555  0.05908501 -0.04799015
## area_mean        -0.8053928  0.5512695  0.04817717  0.07520017 -0.01326563
## smoothness_mean  -0.5196530 -0.4440017 -0.17507219  0.22430770  0.46878427
## compactness_mean -0.8720501 -0.3623611 -0.12437565  0.04474618 -0.01502824
##                          [,6]         [,7]         [,8]         [,9]
## radius_mean       0.020592339 -0.101965596  0.005144876 -0.144056156
## texture_mean     -0.035358035  0.009367203 -0.090214585  0.072767057
## perimeter_mean    0.019018481 -0.094067834  0.012901209 -0.144462575
## area_mean        -0.002074253 -0.042444540 -0.023937777 -0.126284788
## smoothness_mean  -0.314667668 -0.115590213  0.199500717  0.004148275
## compactness_mean -0.015527056  0.025406278  0.104520200 -0.108370831
##                         [,10]        [,11]       [,12]       [,13]        [,14]
## radius_mean       0.056546476 -0.022483349  0.02609749 0.005879269  0.023578980
## texture_mean      0.142679652  0.163858215  0.13026214 0.099956785 -0.008543071
## perimeter_mean    0.051157023 -0.009098538  0.01989278 0.021670182  0.019223333
## area_mean         0.044388765 -0.059727362  0.03344115 0.033100452  0.004291657
## smoothness_mean  -0.041034694  0.074285011  0.16186012 0.022389467  0.176354514
## compactness_mean  0.007660737  0.166984319 -0.05315682 0.112641659  0.003210000
##                         [,15]       [,16]        [,17]        [,18]
## radius_mean      -0.015683967 -0.04255502  0.049456533  0.033654027
## texture_mean     -0.033112133 -0.04460615 -0.009433423 -0.009428525
## perimeter_mean   -0.012242788 -0.03234470  0.047481690  0.036316100
## area_mean         0.004285246 -0.03742982  0.062320398  0.061055728
## smoothness_mean  -0.036248064 -0.05782372  0.040927741 -0.080796547
## compactness_mean  0.070843391  0.04809242 -0.004949378  0.001787881
##                         [,19]        [,20]        [,21]       [,22]
## radius_mean       0.050133570 -0.008772821 -0.011871307 -0.01208056
## texture_mean      0.006626055 -0.043094773  0.077624778 -0.01570358
## perimeter_mean    0.053294517 -0.003118233 -0.012078892 -0.01245022
## area_mean        -0.006077427 -0.015912200 -0.003193026 -0.01616162
## smoothness_mean  -0.036605301  0.003018666 -0.020687226 -0.01057217
## compactness_mean  0.063221168  0.086263038  0.033347929  0.01624639
##                          [,23]        [,24]        [,25]        [,26]
## radius_mean      -1.537575e-02 -0.024533003 -0.002392233 -0.011708590
## texture_mean     -8.658821e-05  0.013273874  0.010544407 -0.002220667
## perimeter_mean   -6.278798e-03 -0.015673984  0.003361359 -0.011326933
## area_mean         1.213375e-03  0.009385446 -0.026134063  0.032801549
## smoothness_mean  -3.224173e-03  0.009230799  0.003602676 -0.003346255
## compactness_mean  8.169034e-03 -0.013992577  0.049349353  0.023765850
##                         [,27]         [,28]         [,29]         [,30]
## radius_mean      -0.010925793  8.419566e-03  5.786460e-03  8.101999e-03
## texture_mean     -0.001441855 -2.623673e-06 -2.882534e-04  3.156545e-06
## perimeter_mean   -0.009587447  3.362272e-03  1.050312e-02 -7.957621e-03
## area_mean         0.038761046 -1.086395e-02 -1.156947e-02 -3.800314e-04
## smoothness_mean   0.005789074  5.897327e-05 -9.398714e-05 -5.591303e-05
## compactness_mean  0.008119890 -2.177814e-04 -1.122394e-03  5.152947e-04
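Some columns above have the opposite sign to the dudi.pca() output (radius_mean on the second dimension is 0.5579027 here and -0.5579027 there). The sign of a principal component is arbitrary, so both results describe the same solution. The sketch below illustrates this in base R by flipping one component by hand. It assumes the built-in USArrests data as a stand-in, and the names `pca_demo`, `coord_demo` and `flipped_demo` are hypothetical.

```r
# Illustrative data only: the built-in USArrests set, not the breast cancer data.
pca_demo <- prcomp(USArrests, center = TRUE, scale. = TRUE)
coord_demo <- sweep(pca_demo$rotation, 2, pca_demo$sdev, `*`)

# Mimic a different implementation by reversing the sign of component 2.
flipped_demo <- coord_demo
flipped_demo[, 2] <- -flipped_demo[, 2]

# The squared cosines are unaffected by the sign flip...
print(all.equal(coord_demo^2, flipped_demo^2))

# ...and so is the correlation matrix rebuilt from the coordinates.
print(all.equal(coord_demo %*% t(coord_demo),
                flipped_demo %*% t(flipped_demo)))
```

This is why the `$cos2` and `$contrib` tables agree across the different PCA functions even where the `$coord` signs differ.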
head(all_pca_var_5$cor)
##                        [,1]       [,2]        [,3]        [,4]        [,5]
## radius_mean      -0.7977668  0.5579027 -0.01432118  0.05827700 -0.04851878
## texture_mean     -0.3780132  0.1424382  0.10835829 -0.84870380  0.06351944
## perimeter_mean   -0.8292355  0.5133487 -0.01563555  0.05908501 -0.04799015
## area_mean        -0.8053928  0.5512695  0.04817717  0.07520017 -0.01326563
## smoothness_mean  -0.5196530 -0.4440017 -0.17507219  0.22430770  0.46878427
## compactness_mean -0.8720501 -0.3623611 -0.12437565  0.04474618 -0.01502824
##                          [,6]         [,7]         [,8]         [,9]
## radius_mean       0.020592339 -0.101965596  0.005144876 -0.144056156
## texture_mean     -0.035358035  0.009367203 -0.090214585  0.072767057
## perimeter_mean    0.019018481 -0.094067834  0.012901209 -0.144462575
## area_mean        -0.002074253 -0.042444540 -0.023937777 -0.126284788
## smoothness_mean  -0.314667668 -0.115590213  0.199500717  0.004148275
## compactness_mean -0.015527056  0.025406278  0.104520200 -0.108370831
##                         [,10]        [,11]       [,12]       [,13]        [,14]
## radius_mean       0.056546476 -0.022483349  0.02609749 0.005879269  0.023578980
## texture_mean      0.142679652  0.163858215  0.13026214 0.099956785 -0.008543071
## perimeter_mean    0.051157023 -0.009098538  0.01989278 0.021670182  0.019223333
## area_mean         0.044388765 -0.059727362  0.03344115 0.033100452  0.004291657
## smoothness_mean  -0.041034694  0.074285011  0.16186012 0.022389467  0.176354514
## compactness_mean  0.007660737  0.166984319 -0.05315682 0.112641659  0.003210000
##                         [,15]       [,16]        [,17]        [,18]
## radius_mean      -0.015683967 -0.04255502  0.049456533  0.033654027
## texture_mean     -0.033112133 -0.04460615 -0.009433423 -0.009428525
## perimeter_mean   -0.012242788 -0.03234470  0.047481690  0.036316100
## area_mean         0.004285246 -0.03742982  0.062320398  0.061055728
## smoothness_mean  -0.036248064 -0.05782372  0.040927741 -0.080796547
## compactness_mean  0.070843391  0.04809242 -0.004949378  0.001787881
##                         [,19]        [,20]        [,21]       [,22]
## radius_mean       0.050133570 -0.008772821 -0.011871307 -0.01208056
## texture_mean      0.006626055 -0.043094773  0.077624778 -0.01570358
## perimeter_mean    0.053294517 -0.003118233 -0.012078892 -0.01245022
## area_mean        -0.006077427 -0.015912200 -0.003193026 -0.01616162
## smoothness_mean  -0.036605301  0.003018666 -0.020687226 -0.01057217
## compactness_mean  0.063221168  0.086263038  0.033347929  0.01624639
##                          [,23]        [,24]        [,25]        [,26]
## radius_mean      -1.537575e-02 -0.024533003 -0.002392233 -0.011708590
## texture_mean     -8.658821e-05  0.013273874  0.010544407 -0.002220667
## perimeter_mean   -6.278798e-03 -0.015673984  0.003361359 -0.011326933
## area_mean         1.213375e-03  0.009385446 -0.026134063  0.032801549
## smoothness_mean  -3.224173e-03  0.009230799  0.003602676 -0.003346255
## compactness_mean  8.169034e-03 -0.013992577  0.049349353  0.023765850
##                         [,27]         [,28]         [,29]         [,30]
## radius_mean      -0.010925793  8.419566e-03  5.786460e-03  8.101999e-03
## texture_mean     -0.001441855 -2.623673e-06 -2.882534e-04  3.156545e-06
## perimeter_mean   -0.009587447  3.362272e-03  1.050312e-02 -7.957621e-03
## area_mean         0.038761046 -1.086395e-02 -1.156947e-02 -3.800314e-04
## smoothness_mean   0.005789074  5.897327e-05 -9.398714e-05 -5.591303e-05
## compactness_mean  0.008119890 -2.177814e-04 -1.122394e-03  5.152947e-04
head(all_pca_var_5$cos2)
##                       [,1]       [,2]         [,3]        [,4]         [,5]
## radius_mean      0.6364318 0.31125539 0.0002050963 0.003396209 0.0023540715
## texture_mean     0.1428940 0.02028864 0.0117415199 0.720298141 0.0040347193
## perimeter_mean   0.6876316 0.26352690 0.0002444703 0.003491038 0.0023030547
## area_mean        0.6486576 0.30389811 0.0023210397 0.005655066 0.0001759769
## smoothness_mean  0.2700393 0.19713747 0.0306502709 0.050313944 0.2197586899
## compactness_mean 0.7604714 0.13130559 0.0154693024 0.002002220 0.0002258480
##                          [,6]         [,7]         [,8]         [,9]
## radius_mean      4.240444e-04 0.0103969827 2.646975e-05 2.075218e-02
## texture_mean     1.250191e-03 0.0000877445 8.138671e-03 5.295045e-03
## perimeter_mean   3.617026e-04 0.0088487573 1.664412e-04 2.086944e-02
## area_mean        4.302527e-06 0.0018015390 5.730172e-04 1.594785e-02
## smoothness_mean  9.901574e-02 0.0133610973 3.980054e-02 1.720819e-05
## compactness_mean 2.410895e-04 0.0006454790 1.092447e-02 1.174424e-02
##                         [,10]        [,11]        [,12]        [,13]
## radius_mean      3.197504e-03 5.055010e-04 0.0006810789 3.456581e-05
## texture_mean     2.035748e-02 2.684951e-02 0.0169682252 9.991359e-03
## perimeter_mean   2.617041e-03 8.278339e-05 0.0003957226 4.695968e-04
## area_mean        1.970362e-03 3.567358e-03 0.0011183106 1.095640e-03
## smoothness_mean  1.683846e-03 5.518263e-03 0.0261986970 5.012882e-04
## compactness_mean 5.868689e-05 2.788376e-02 0.0028256473 1.268814e-02
##                         [,14]        [,15]       [,16]        [,17]
## radius_mean      5.559683e-04 2.459868e-04 0.001810929 2.445949e-03
## texture_mean     7.298407e-05 1.096413e-03 0.001989709 8.898948e-05
## perimeter_mean   3.695365e-04 1.498859e-04 0.001046179 2.254511e-03
## area_mean        1.841832e-05 1.836333e-05 0.001400992 3.883832e-03
## smoothness_mean  3.110091e-02 1.313922e-03 0.003343582 1.675080e-03
## compactness_mean 1.030410e-05 5.018786e-03 0.002312881 2.449634e-05
##                         [,18]        [,19]        [,20]        [,21]
## radius_mean      1.132594e-03 2.513375e-03 7.696239e-05 1.409279e-04
## texture_mean     8.889709e-05 4.390460e-05 1.857159e-03 6.025606e-03
## perimeter_mean   1.318859e-03 2.840305e-03 9.723374e-06 1.458996e-04
## area_mean        3.727802e-03 3.693512e-05 2.531981e-04 1.019542e-05
## smoothness_mean  6.528082e-03 1.339948e-03 9.112344e-06 4.279613e-04
## compactness_mean 3.196517e-06 3.996916e-03 7.441312e-03 1.112084e-03
##                         [,22]        [,23]        [,24]        [,25]
## radius_mean      0.0001459399 2.364136e-04 6.018682e-04 5.722780e-06
## texture_mean     0.0002466023 7.497518e-09 1.761957e-04 1.111845e-04
## perimeter_mean   0.0001550079 3.942330e-05 2.456738e-04 1.129874e-05
## area_mean        0.0002611979 1.472279e-06 8.808659e-05 6.829892e-04
## smoothness_mean  0.0001117708 1.039529e-05 8.520764e-05 1.297927e-05
## compactness_mean 0.0002639453 6.673311e-05 1.957922e-04 2.435359e-03
##                         [,26]        [,27]        [,28]        [,29]
## radius_mean      1.370911e-04 1.193730e-04 7.088910e-05 3.348312e-05
## texture_mean     4.931360e-06 2.078945e-06 6.883658e-12 8.309002e-08
## perimeter_mean   1.282994e-04 9.191915e-05 1.130487e-05 1.103155e-04
## area_mean        1.075942e-03 1.502419e-03 1.180255e-04 1.338527e-04
## smoothness_mean  1.119742e-05 3.351338e-05 3.477847e-09 8.833583e-09
## compactness_mean 5.648156e-04 6.593262e-05 4.742874e-08 1.259768e-06
##                         [,30]
## radius_mean      6.564239e-05
## texture_mean     9.963774e-12
## perimeter_mean   6.332372e-05
## area_mean        1.444238e-07
## smoothness_mean  3.126267e-09
## compactness_mean 2.655286e-07
head(all_pca_var_5$contrib)
##                      [,1]      [,2]        [,3]       [,4]        [,5]
## radius_mean      4.791828 5.4689158 0.007278210  0.1714702  0.14278085
## texture_mean     1.075879 0.3564817 0.416669002 36.3669303  0.24471672
## perimeter_mean   5.177322 4.6303018 0.008675469  0.1762581  0.13968654
## area_mean        4.883878 5.3396446 0.082366279  0.2855170  0.01067348
## smoothness_mean  2.033182 3.4638057 1.087680124  2.5402866 13.32896332
## compactness_mean 5.725748 2.3071061 0.548956087  0.1010895  0.01369829
##                          [,6]       [,7]        [,8]        [,9]      [,10]
## radius_mean      0.0351217226 1.53979162 0.005553672 4.977796701 0.91176608
## texture_mean     0.1035477523 0.01299495 1.707590993 1.270115259 5.80492242
## perimeter_mean   0.0299582265 1.31049967 0.034921362 5.005923564 0.74624745
## area_mean        0.0003563592 0.26680766 0.120225880 3.825388875 0.56184752
## smoothness_mean  8.2010352310 1.97877655 8.350630500 0.004127705 0.48014757
## compactness_mean 0.0199683716 0.09559534 2.292085472 2.817074391 0.01673453
##                       [,11]      [,12]      [,13]        [,14]      [,15]
## radius_mean      0.17198842  0.2607885 0.01432142  0.354098008 0.26131291
## texture_mean     9.13510741  6.4972186 4.13965139  0.046483789 1.16472490
## perimeter_mean   0.02816569  0.1515242 0.19456483  0.235358999 0.15922443
## area_mean        1.21373501  0.4282067 0.45394900  0.011730686 0.01950745
## smoothness_mean  1.87749853 10.0316126 0.20769532 19.808272945 1.39578544
## compactness_mean 9.48699343  1.0819546 5.25699162  0.006562713 5.33147923
##                     [,16]     [,17]        [,18]      [,19]       [,20]
## radius_mean      2.267551 4.1178253  2.152451027 5.07982446  0.24699572
## texture_mean     2.491408 0.1498164  0.168945538 0.08873633  5.96018948
## perimeter_mean   1.309971 3.7955343  2.506441650 5.74058961  0.03120527
## area_mean        1.754248 6.5385437  7.084545992 0.07465020  0.81258978
## smoothness_mean  4.186658 2.8200456 12.406371982 2.70819168  0.02924428
## compactness_mean 2.896068 0.0412403  0.006074859 8.07823486 23.88143277
##                        [,21]     [,22]        [,23]     [,24]       [,25]
## radius_mean       0.47018457 0.5318625 9.712634e-01 3.3335252  0.03696583
## texture_mean     20.10351786 0.8987160 3.080222e-05 0.9758830  0.71818728
## perimeter_mean    0.48677194 0.5649097 1.619636e-01 1.3606960  0.07298326
## area_mean         0.03401545 0.9519081 6.048598e-03 0.4878790  4.41171292
## smoothness_mean   1.42782777 0.4073369 4.270720e-02 0.4719336  0.08383854
## compactness_mean  3.71030023 0.9619207 2.741611e-01 1.0844206 15.73099874
##                        [,26]       [,27]        [,28]        [,29]        [,30]
## radius_mean       1.67641371  1.72992648 4.460291e+00  4.471552424 4.933856e+01
## texture_mean      0.06030297  0.03012762 4.331148e-07  0.011096377 7.489035e-06
## perimeter_mean    1.56890519  1.33207198 7.112943e-01 14.732247365 4.759578e+01
## area_mean        13.15711689 21.77272040 7.426079e+00 17.875554424 1.085528e-01
## smoothness_mean   0.13692728  0.48566854 2.188236e-04  0.001179694 2.349785e-03
## compactness_mean  6.90682942  0.95548094 2.984182e-03  0.168237574 1.995783e-01

Unsurprisingly, each object yields the same output. Let’s now examine the output of get_pca_ind() when applied to each of the PCA objects created previously:

prcomp()
all_pca_ind_1 <- get_pca_ind(all_pca_1)
all_pca_ind_1
## Principal Component Analysis Results for individuals
##  ===================================================
##   Name       Description                       
## 1 "$coord"   "Coordinates for the individuals" 
## 2 "$cos2"    "Cos2 for the individuals"        
## 3 "$contrib" "contributions of the individuals"
head(all_pca_ind_1$coord)
##       Dim.1      Dim.2      Dim.3     Dim.4      Dim.5       Dim.6       Dim.7
## 1 -9.184755  -1.946870 -1.1221788 3.6305364  1.1940595  1.41018364  2.15747152
## 2 -2.385703   3.764859 -0.5288274 1.1172808 -0.6212284  0.02863116  0.01334635
## 3 -5.728855   1.074229 -0.5512625 0.9112808  0.1769302  0.54097615 -0.66757908
## 4 -7.116691 -10.266556 -3.2299475 0.1524129  2.9582754  3.05073750  1.42865363
## 5 -3.931842   1.946359  1.3885450 2.9380542 -0.5462667 -1.22541641 -0.93538950
## 6 -2.378155  -3.946456 -2.9322967 0.9402096  1.0551135 -0.45064213  0.49001396
##         Dim.8       Dim.9     Dim.10     Dim.11     Dim.12      Dim.13
## 1  0.39805698 -0.15698023 -0.8766305 -0.2627243 -0.8582593  0.10329677
## 2 -0.24077660 -0.71127897  1.1060218 -0.8124048  0.1577838 -0.94269981
## 3 -0.09728813  0.02404449  0.4538760  0.6050715  0.1242777 -0.41026561
## 4 -1.05863376 -1.40420412 -1.1159933  1.1505012  1.0104267 -0.93245070
## 5 -0.63581661 -0.26357355  0.3773724 -0.6507870 -0.1104183  0.38760691
## 6  0.16529843 -0.13335576 -0.5299649 -0.1096698  0.0813699 -0.02625135
##         Dim.14       Dim.15      Dim.16      Dim.17      Dim.18     Dim.19
## 1 -0.690196797  0.601264078  0.74446075 -0.26523740 -0.54907956  0.1336499
## 2 -0.652900844 -0.008966977 -0.64823831 -0.01719707  0.31801756 -0.2473470
## 3  0.016665095 -0.482994760  0.32482472  0.19075064 -0.08789759 -0.3922812
## 4 -0.486988399  0.168699395  0.05132509  0.48220960 -0.03584323 -0.0267241
## 5 -0.538706543 -0.310046684 -0.15247165  0.13302526 -0.01869779  0.4610302
## 6  0.003133944 -0.178447576 -0.01270566  0.19671335 -0.29727706 -0.1297265
##        Dim.20       Dim.21      Dim.22      Dim.23       Dim.24       Dim.25
## 1  0.34526111  0.096430045 -0.06878939  0.08444429  0.175102213  0.150887294
## 2 -0.11403274 -0.077259494  0.09449530 -0.21752666 -0.011280193  0.170360355
## 3 -0.20435242  0.310793246  0.06025601 -0.07422581 -0.102671419 -0.171007656
## 4 -0.46432511  0.433811661  0.20308706 -0.12399554 -0.153294780 -0.077427574
## 5  0.06543782 -0.116442469  0.01763433  0.13933105  0.005327110 -0.003059371
## 6 -0.07117453 -0.002400178  0.10108043  0.03344819 -0.002837749 -0.122282765
##         Dim.26      Dim.27        Dim.28       Dim.29        Dim.30
## 1 -0.201326305 -0.25236294 -0.0338846387  0.045607590  0.0471277407
## 2 -0.041092627  0.18111081  0.0325955021 -0.005682424  0.0018662342
## 3  0.004731249  0.04952586  0.0469844833  0.003143131 -0.0007498749
## 4 -0.274982822  0.18330078  0.0424469831 -0.069233868  0.0199198881
## 5  0.039219780  0.03213957 -0.0347556386  0.005033481 -0.0211951203
## 6 -0.030272333 -0.08438081  0.0007296587 -0.019703996 -0.0034564331
head(all_pca_ind_1$cos2)
##       Dim.1      Dim.2       Dim.3        Dim.4        Dim.5        Dim.6
## 1 0.7366868 0.03309951 0.010996938 0.1151037016 0.0124508677 1.736597e-02
## 2 0.2164877 0.53913561 0.010637227 0.0474815863 0.0146792249 3.118017e-05
## 3 0.8781764 0.03087731 0.008131356 0.0222203296 0.0008376258 7.830730e-03
## 4 0.2593764 0.53978871 0.053427543 0.0001189646 0.0448178978 4.766328e-02
## 5 0.4498315 0.11023095 0.056101900 0.2511755035 0.0086829482 4.369433e-02
## 6 0.1722700 0.47439927 0.261905853 0.0269264275 0.0339099994 6.185759e-03
##          Dim.7        Dim.8        Dim.9      Dim.10       Dim.11       Dim.12
## 1 4.064787e-02 0.0013836880 2.151977e-04 0.006710902 0.0006027652 0.0064325731
## 2 6.775254e-06 0.0022051042 1.924334e-02 0.046529449 0.0251041522 0.0009469458
## 3 1.192481e-02 0.0002532595 1.546953e-05 0.005512144 0.0097962404 0.0004132687
## 4 1.045269e-02 0.0057393901 1.009799e-02 0.006378190 0.0067787311 0.0052285821
## 5 2.545908e-02 0.0117630901 2.021442e-03 0.004143792 0.0123235390 0.0003547641
## 6 7.313854e-03 0.0008322750 5.416927e-04 0.008555071 0.0003663563 0.0002016773
##         Dim.13       Dim.14       Dim.15       Dim.16       Dim.17       Dim.18
## 1 9.317968e-05 4.160002e-03 3.157026e-03 4.839843e-03 6.143519e-04 2.632802e-03
## 2 3.380239e-02 1.621418e-02 3.058389e-06 1.598343e-02 1.124889e-05 3.846828e-03
## 3 4.503770e-03 7.431246e-06 6.242102e-03 2.823216e-03 9.735943e-04 2.067283e-04
## 4 4.452727e-03 1.214539e-03 1.457476e-04 1.349067e-05 1.190820e-03 6.579435e-06
## 5 4.371603e-03 8.444271e-03 2.797125e-03 6.764503e-04 5.149036e-04 1.017275e-05
## 6 2.099098e-05 2.991656e-07 9.699530e-04 4.917266e-06 1.178683e-03 2.691858e-03
##         Dim.19       Dim.20       Dim.21       Dim.22       Dim.23       Dim.24
## 1 1.559858e-04 0.0010409815 8.120307e-05 4.132289e-05 6.227135e-05 2.677509e-04
## 2 2.327093e-03 0.0004946064 2.270410e-04 3.396417e-04 1.799805e-03 4.839869e-06
## 3 4.117570e-03 0.0011173921 2.584575e-03 9.715089e-05 1.474198e-04 2.820624e-04
## 4 3.657467e-06 0.0011041259 9.637773e-04 2.112218e-04 7.873837e-05 1.203453e-04
## 5 6.184670e-03 0.0001245992 3.945304e-04 9.048481e-06 5.648765e-04 8.257356e-07
## 6 5.126095e-04 0.0001543045 1.754755e-07 3.112171e-04 3.407804e-05 2.452886e-07
##         Dim.25       Dim.26       Dim.27       Dim.28       Dim.29       Dim.30
## 1 1.988168e-04 3.539556e-04 5.561589e-04 1.002659e-05 1.816444e-05 1.939550e-05
## 2 1.103920e-03 6.422859e-05 1.247640e-03 4.041252e-05 1.228197e-06 1.324747e-07
## 3 7.824870e-04 5.989596e-07 6.563115e-05 5.906836e-05 2.643449e-07 1.504609e-08
## 4 3.070192e-05 3.872446e-04 1.720691e-04 9.227157e-06 2.454774e-05 2.032114e-06
## 5 2.723465e-07 4.475772e-05 3.005646e-05 3.514862e-05 7.372157e-07 1.307162e-05
## 6 4.554701e-04 2.791394e-05 2.168786e-04 1.621694e-08 1.182600e-05 3.639031e-07
head(all_pca_ind_1$contrib)
##        Dim.1      Dim.2      Dim.3       Dim.4       Dim.5        Dim.6
## 1 1.11627772 0.11704315 0.07853779 1.169563151 0.151981235 0.2894699673
## 2 0.07531296 0.43769293 0.01744145 0.110766067 0.041137756 0.0001193246
## 3 0.43428299 0.03563408 0.01895272 0.073686268 0.003336891 0.0425998823
## 4 0.67018289 3.25477984 0.65064717 0.002061226 0.932857403 1.3547583890
## 5 0.20456405 0.11698171 0.12024707 0.765952223 0.031808821 0.2185845940
## 6 0.07483715 0.48093536 0.53625385 0.078438889 0.118668771 0.0295607711
##          Dim.7      Dim.8        Dim.9     Dim.10      Dim.11      Dim.12
## 1 1.211525e+00 0.05842632 0.0103884603 0.38511751 0.041272939 0.495696475
## 2 4.636256e-05 0.02137699 0.2132756007 0.61303803 0.394648035 0.016753415
## 3 1.159973e-01 0.00349010 0.0002437206 0.10323681 0.218916436 0.010393583
## 4 5.312467e-01 0.41324685 0.8312309885 0.62414179 0.791478446 0.687050176
## 5 2.277337e-01 0.14906710 0.0292863255 0.07136748 0.253246070 0.008204663
## 6 6.249701e-02 0.01007524 0.0074969532 0.14075191 0.007191827 0.004455602
##         Dim.13       Dim.14       Dim.15       Dim.16       Dim.17      Dim.18
## 1 0.0077696323 5.332208e-01 0.6749432895 1.2196263567 0.2081506802 1.006972211
## 2 0.6471035437 4.771508e-01 0.0001501167 0.9247249753 0.0008750187 0.337791822
## 3 0.1225623765 3.108685e-04 0.4355335406 0.2321888338 0.1076565195 0.025804823
## 4 0.6331093156 2.654596e-01 0.0531329362 0.0057969902 0.6879866799 0.004291027
## 5 0.1093981490 3.248371e-01 0.1794696227 0.0511589256 0.0523572209 0.001167690
## 6 0.0005017996 1.099369e-05 0.0594508507 0.0003552528 0.1144922276 0.295168313
##        Dim.19     Dim.20       Dim.21      Dim.22     Dim.23       Dim.24
## 1 0.063447786 0.67234773 5.452351e-02 0.030307915 0.05148643 2.984512e-01
## 2 0.217316368 0.07334285 3.499958e-02 0.057191771 0.34164668 1.238577e-03
## 3 0.546605705 0.23553649 5.663726e-01 0.023254867 0.03977976 1.026099e-01
## 4 0.002536795 1.21602625 1.103472e+00 0.264166418 0.11101058 2.287414e-01
## 5 0.754984443 0.02415217 7.950270e-02 0.001991733 0.14016772 2.762316e-04
## 6 0.059777324 0.02857248 3.377895e-05 0.065440591 0.00807788 7.838593e-05
##        Dim.25       Dim.26     Dim.27       Dim.28      Dim.29       Dim.30
## 1 0.258455901 0.8710855293 1.62203643 1.269630e-01 0.488196148 2.9338843773
## 2 0.329471749 0.0362900507 0.83540714 1.174862e-01 0.007578573 0.0046006800
## 3 0.331980227 0.0004810733 0.06247028 2.441071e-01 0.002318703 0.0007427924
## 4 0.068056939 1.6250655464 0.85573255 1.992348e-01 1.125012348 0.5241595992
## 5 0.000106254 0.0330575041 0.02630811 1.335740e-01 0.005946440 0.5934191099
## 6 0.169750714 0.0196947983 0.18134133 5.887231e-05 0.091123153 0.0157814195
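The three components listed above ($coord, $cos2 and $contrib) are closely related, and the relationship can be reproduced with base R alone. The following is a minimal sketch, not the code used to build this document. It assumes the usual definitions: cos2 as the squared coordinate divided by the individual's squared distance to the origin, and contrib as 100 times the squared coordinate divided by n times the eigenvalue. It uses the built-in USArrests dataset as a stand-in for the breast cancer data, and the names pca_demo, ind_coord, ind_cos2 and ind_contrib are illustrative only.

```r
# Illustrative stand-in data: USArrests ships with base R (50 rows, 4 columns)
pca_demo <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Coordinates of the individuals on every principal component
ind_coord <- pca_demo$x
n_ind <- nrow(ind_coord)

# Squared distance of each individual to the origin of the standardized space;
# with all components kept this equals the row sum of squared coordinates
dist2 <- rowSums(ind_coord^2)

# cos2: share of an individual's squared distance captured by each component
ind_cos2 <- ind_coord^2 / dist2

# contrib: percentage contribution of each individual to each component,
# assuming the formula 100 * coord^2 / (n * eigenvalue)
eigenvalues <- pca_demo$sdev^2
ind_contrib <- 100 * sweep(ind_coord^2, 2, n_ind * eigenvalues, "/")

# Each row of cos2 sums to 1 when every component is retained
range(rowSums(ind_cos2))

# With prcomp() the eigenvalues use an n - 1 divisor, so under this formula
# each column of contrib sums to 100 * (n - 1) / n rather than exactly 100
colSums(ind_contrib)
```

Read this way, the cos2 of 0.7366868 shown above for individual 1 on Dim.1 means that roughly 74% of its squared distance to the origin lies along the first component.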
princomp()
all_pca_ind_2 <- get_pca_ind(all_pca_2)
all_pca_ind_2
## Principal Component Analysis Results for individuals
##  ===================================================
##   Name       Description                       
## 1 "$coord"   "Coordinates for the individuals" 
## 2 "$cos2"    "Cos2 for the individuals"        
## 3 "$contrib" "contributions of the individuals"
head(all_pca_ind_2$coord)
##      Dim.1      Dim.2      Dim.3     Dim.4      Dim.5       Dim.6       Dim.7
## 1 9.192837  -1.948583  1.1231662 3.6337309 -1.1951101  1.41142445 -2.15936987
## 2 2.387802   3.768172  0.5292927 1.1182639  0.6217750  0.02865635 -0.01335809
## 3 5.733896   1.075174  0.5517476 0.9120827 -0.1770859  0.54145215  0.66816648
## 4 7.122953 -10.275589  3.2327895 0.1525470 -2.9608784  3.05342182 -1.42991070
## 5 3.935302   1.948072 -1.3897667 2.9406393  0.5467474 -1.22649464  0.93621255
## 6 2.380247  -3.949929  2.9348768 0.9410369 -1.0560419 -0.45103865 -0.49044512
##         Dim.8       Dim.9     Dim.10     Dim.11     Dim.12      Dim.13
## 1  0.39840723  0.15711836 -0.8774019  0.2629555 -0.8590145  0.10338766
## 2 -0.24098846  0.71190482  1.1069949  0.8131197  0.1579226 -0.94352928
## 3 -0.09737374 -0.02406564  0.4542754 -0.6056039  0.1243871 -0.41062660
## 4 -1.05956524  1.40543967 -1.1169753 -1.1515135  1.0113158 -0.93327116
## 5 -0.63637606  0.26380546  0.3777045  0.6513596 -0.1105154  0.38794797
## 6  0.16544388  0.13347310 -0.5304312  0.1097663  0.0814415 -0.02627445
##         Dim.14       Dim.15      Dim.16     Dim.17      Dim.18      Dim.19
## 1 -0.690804097 -0.601793127 -0.74511579 -0.2654708 -0.54956269  0.13376750
## 2 -0.653475327  0.008974867  0.64880869 -0.0172122  0.31829738 -0.24756463
## 3  0.016679759  0.483419744 -0.32511053  0.1909185 -0.08797493 -0.39262636
## 4 -0.487416897 -0.168847832 -0.05137025  0.4826339 -0.03587477 -0.02674762
## 5 -0.539180548  0.310319492  0.15260581  0.1331423 -0.01871424  0.46143590
## 6  0.003136701  0.178604591  0.01271684  0.1968864 -0.29753864 -0.12984063
##        Dim.20      Dim.21      Dim.22      Dim.23       Dim.24       Dim.25
## 1 -0.34556490 -0.09651489  0.06884992 -0.08451859 -0.175256284 -0.151020059
## 2  0.11413308  0.07732747 -0.09457845  0.21771806  0.011290118 -0.170510254
## 3  0.20453223 -0.31106671 -0.06030903  0.07429112  0.102761759  0.171158125
## 4  0.46473366 -0.43419337 -0.20326576  0.12410465  0.153429663  0.077495702
## 5 -0.06549539  0.11654493 -0.01764985 -0.13945364 -0.005331797  0.003062062
## 6  0.07123716  0.00240229 -0.10116937 -0.03347762  0.002840246  0.122390361
##         Dim.26      Dim.27        Dim.28       Dim.29        Dim.30
## 1  0.201503451  0.25258499 -0.0339144536  0.045647720  0.0471692081
## 2  0.041128785 -0.18127017  0.0326241827 -0.005687424  0.0018678763
## 3 -0.004735412 -0.04956943  0.0470258247  0.003145897 -0.0007505348
## 4  0.275224778 -0.18346206  0.0424843320 -0.069294786  0.0199374155
## 5 -0.039254289 -0.03216785 -0.0347862199  0.005037910 -0.0212137698
## 6  0.030298969  0.08445505  0.0007303007 -0.019721334 -0.0034594744
head(all_pca_ind_2$cos2)
##       Dim.1      Dim.2       Dim.3        Dim.4        Dim.5        Dim.6
## 1 0.7366868 0.03309951 0.010996938 0.1151037016 0.0124508677 1.736597e-02
## 2 0.2164877 0.53913561 0.010637227 0.0474815863 0.0146792249 3.118017e-05
## 3 0.8781764 0.03087731 0.008131356 0.0222203296 0.0008376258 7.830730e-03
## 4 0.2593764 0.53978871 0.053427543 0.0001189646 0.0448178978 4.766328e-02
## 5 0.4498315 0.11023095 0.056101900 0.2511755035 0.0086829482 4.369433e-02
## 6 0.1722700 0.47439927 0.261905853 0.0269264275 0.0339099994 6.185759e-03
##          Dim.7        Dim.8        Dim.9      Dim.10       Dim.11       Dim.12
## 1 4.064787e-02 0.0013836880 2.151977e-04 0.006710902 0.0006027652 0.0064325731
## 2 6.775254e-06 0.0022051042 1.924334e-02 0.046529449 0.0251041522 0.0009469458
## 3 1.192481e-02 0.0002532595 1.546953e-05 0.005512144 0.0097962404 0.0004132687
## 4 1.045269e-02 0.0057393901 1.009799e-02 0.006378190 0.0067787311 0.0052285821
## 5 2.545908e-02 0.0117630901 2.021442e-03 0.004143792 0.0123235390 0.0003547641
## 6 7.313854e-03 0.0008322750 5.416927e-04 0.008555071 0.0003663563 0.0002016773
##         Dim.13       Dim.14       Dim.15       Dim.16       Dim.17       Dim.18
## 1 9.317968e-05 4.160002e-03 3.157026e-03 4.839843e-03 6.143519e-04 2.632802e-03
## 2 3.380239e-02 1.621418e-02 3.058389e-06 1.598343e-02 1.124889e-05 3.846828e-03
## 3 4.503770e-03 7.431246e-06 6.242102e-03 2.823216e-03 9.735943e-04 2.067283e-04
## 4 4.452727e-03 1.214539e-03 1.457476e-04 1.349067e-05 1.190820e-03 6.579435e-06
## 5 4.371603e-03 8.444271e-03 2.797125e-03 6.764503e-04 5.149036e-04 1.017275e-05
## 6 2.099098e-05 2.991656e-07 9.699530e-04 4.917266e-06 1.178683e-03 2.691858e-03
##         Dim.19       Dim.20       Dim.21       Dim.22       Dim.23       Dim.24
## 1 1.559858e-04 0.0010409815 8.120307e-05 4.132289e-05 6.227135e-05 2.677509e-04
## 2 2.327093e-03 0.0004946064 2.270410e-04 3.396417e-04 1.799805e-03 4.839869e-06
## 3 4.117570e-03 0.0011173921 2.584575e-03 9.715089e-05 1.474198e-04 2.820624e-04
## 4 3.657467e-06 0.0011041259 9.637773e-04 2.112218e-04 7.873837e-05 1.203453e-04
## 5 6.184670e-03 0.0001245992 3.945304e-04 9.048481e-06 5.648765e-04 8.257356e-07
## 6 5.126095e-04 0.0001543045 1.754755e-07 3.112171e-04 3.407804e-05 2.452886e-07
##         Dim.25       Dim.26       Dim.27       Dim.28       Dim.29       Dim.30
## 1 1.988168e-04 3.539556e-04 5.561589e-04 1.002659e-05 1.816444e-05 1.939550e-05
## 2 1.103920e-03 6.422859e-05 1.247640e-03 4.041252e-05 1.228197e-06 1.324747e-07
## 3 7.824870e-04 5.989596e-07 6.563115e-05 5.906836e-05 2.643449e-07 1.504609e-08
## 4 3.070192e-05 3.872446e-04 1.720691e-04 9.227157e-06 2.454774e-05 2.032114e-06
## 5 2.723465e-07 4.475772e-05 3.005646e-05 3.514862e-05 7.372157e-07 1.307162e-05
## 6 4.554701e-04 2.791394e-05 2.168786e-04 1.621694e-08 1.182600e-05 3.639031e-07
head(all_pca_ind_2$contrib)
##        Dim.1      Dim.2      Dim.3       Dim.4       Dim.5        Dim.6
## 1 1.11824300 0.11724921 0.07867607 1.171622241 0.152248808 0.2899795975
## 2 0.07544555 0.43846352 0.01747215 0.110961077 0.041210182 0.0001195347
## 3 0.43504757 0.03569681 0.01898609 0.073815998 0.003342766 0.0426748821
## 4 0.67136279 3.26051009 0.65179267 0.002064855 0.934499758 1.3571435270
## 5 0.20492419 0.11718767 0.12045877 0.767300731 0.031864823 0.2189694260
## 6 0.07496891 0.48178208 0.53719796 0.078576986 0.118877695 0.0296128148
##          Dim.7       Dim.8        Dim.9     Dim.10      Dim.11      Dim.12
## 1 1.213658e+00 0.058529188 0.0104067498 0.38579553 0.041345602 0.496569180
## 2 4.644418e-05 0.021414629 0.2136510859 0.61411732 0.395342837 0.016782911
## 3 1.162015e-01 0.003496244 0.0002441497 0.10341857 0.219301852 0.010411881
## 4 5.321820e-01 0.413974398 0.8326944234 0.62524063 0.792871893 0.688259771
## 5 2.281347e-01 0.149329546 0.0293378859 0.07149313 0.253691925 0.008219107
## 6 6.260704e-02 0.010092981 0.0075101521 0.14099971 0.007204489 0.004463446
##        Dim.13       Dim.14       Dim.15       Dim.16       Dim.17      Dim.18
## 1 0.007783311 5.341595e-01 0.6761315700 1.2217735862 0.2085171427 1.008745049
## 2 0.648242810 4.779908e-01 0.0001503809 0.9263530122 0.0008765592 0.338386526
## 3 0.122778155 3.114159e-04 0.4363003250 0.2325976169 0.1078460556 0.025850254
## 4 0.634223945 2.659269e-01 0.0532264801 0.0058071962 0.6891979240 0.004298581
## 5 0.109590751 3.254090e-01 0.1797855904 0.0512489941 0.0524493991 0.001169745
## 6 0.000502683 1.101305e-05 0.0595555177 0.0003558783 0.1146937985 0.295687975
##        Dim.19     Dim.20       Dim.21      Dim.22      Dim.23       Dim.24
## 1 0.063559490 0.67353144 5.461950e-02 0.030361274 0.051577078 2.989766e-01
## 2 0.217698967 0.07347197 3.506119e-02 0.057292461 0.342248175 1.240758e-03
## 3 0.547568039 0.23595117 5.673697e-01 0.023295809 0.039849794 1.027905e-01
## 4 0.002541261 1.21816714 1.105415e+00 0.264631499 0.111206018 2.291441e-01
## 5 0.756313641 0.02419469 7.964267e-02 0.001995239 0.140414495 2.767180e-04
## 6 0.059882566 0.02862278 3.383842e-05 0.065555803 0.008092102 7.852393e-05
##         Dim.25       Dim.26     Dim.27       Dim.28      Dim.29       Dim.30
## 1 0.2589109285 0.8726191306 1.62489212 1.271865e-01 0.489055649 2.9390496668
## 2 0.3300518050 0.0363539416 0.83687792 1.176930e-01 0.007591915 0.0046087798
## 3 0.3325646996 0.0004819203 0.06258026 2.445369e-01 0.002322785 0.0007441001
## 4 0.0681767579 1.6279265773 0.85723913 1.995855e-01 1.126993004 0.5250824154
## 5 0.0001064411 0.0331157040 0.02635442 1.338092e-01 0.005956909 0.5944638618
## 6 0.1700495704 0.0197294722 0.18166059 5.897596e-05 0.091283581 0.0158092037
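Compared with the prcomp() output, the cos2 values are identical as printed, while the coordinates differ in sign on some dimensions and very slightly in magnitude (9.192837 against 9.184755 for individual 1 on Dim.1). The sign of a principal component is arbitrary, and the small scale gap is what one would expect if prcomp() standardizes the data with an n - 1 divisor and princomp() with an n divisor. The sketch below checks this on the built-in USArrests dataset. It assumes the PCA objects were created with scale. = TRUE and cor = TRUE respectively, which is not shown in this section, and the names p1, p2 and ratio are illustrative.

```r
# Illustrative stand-in data; the real analysis uses the breast cancer features
p1 <- prcomp(USArrests, center = TRUE, scale. = TRUE)  # assumed call
p2 <- princomp(USArrests, cor = TRUE)                  # assumed call

n <- nrow(USArrests)

# Component signs are arbitrary, so compare absolute coordinates only
abs_prcomp   <- abs(unname(p1$x))
abs_princomp <- abs(unname(p2$scores))

# Expected scale factor between the two sets of coordinates
ratio <- sqrt(n / (n - 1))

# TRUE if the coordinates agree once the scale factor is applied
isTRUE(all.equal(abs_princomp, abs_prcomp * ratio))
```

Assuming the 569 observations of the Wisconsin dataset, the factor is sqrt(569 / 568), approximately 1.00088, which takes 9.184755 to 9.192837 and matches the two Dim.1 values reported for individual 1.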
PCA()
all_pca_ind_3 <- get_pca_ind(all_pca_3)
all_pca_ind_3
## Principal Component Analysis Results for individuals
##  ===================================================
##   Name       Description                       
## 1 "$coord"   "Coordinates for the individuals" 
## 2 "$cos2"    "Cos2 for the individuals"        
## 3 "$contrib" "contributions of the individuals"
head(all_pca_ind_3$coord)
##      Dim.1     Dim.2      Dim.3      Dim.4      Dim.5       Dim.6       Dim.7
## 1 9.192837  1.948583 -1.1231662 -3.6337309  1.1951101 -1.41142445 -2.15936987
## 2 2.387802 -3.768172 -0.5292927 -1.1182639 -0.6217750 -0.02865635 -0.01335809
## 3 5.733896 -1.075174 -0.5517476 -0.9120827  0.1770859 -0.54145215  0.66816648
## 4 7.122953 10.275589 -3.2327895 -0.1525470  2.9608784 -3.05342182 -1.42991070
## 5 3.935302 -1.948072  1.3897667 -2.9406393 -0.5467474  1.22649464  0.93621255
## 6 2.380247  3.949929 -2.9348768 -0.9410369  1.0560419  0.45103865 -0.49044512
##         Dim.8       Dim.9     Dim.10     Dim.11     Dim.12      Dim.13
## 1 -0.39840723  0.15711836 -0.8774019  0.2629555 -0.8590145 -0.10338766
## 2  0.24098846  0.71190482  1.1069949  0.8131197  0.1579226  0.94352928
## 3  0.09737374 -0.02406564  0.4542754 -0.6056039  0.1243871  0.41062660
## 4  1.05956524  1.40543967 -1.1169753 -1.1515135  1.0113158  0.93327116
## 5  0.63637606  0.26380546  0.3777045  0.6513596 -0.1105154 -0.38794797
## 6 -0.16544388  0.13347310 -0.5304312  0.1097663  0.0814415  0.02627445
##         Dim.14       Dim.15      Dim.16     Dim.17      Dim.18      Dim.19
## 1  0.690804097  0.601793127 -0.74511579 -0.2654708 -0.54956269  0.13376750
## 2  0.653475327 -0.008974867  0.64880869 -0.0172122  0.31829738 -0.24756463
## 3 -0.016679759 -0.483419744 -0.32511053  0.1909185 -0.08797493 -0.39262636
## 4  0.487416897  0.168847832 -0.05137025  0.4826339 -0.03587477 -0.02674762
## 5  0.539180548 -0.310319492  0.15260581  0.1331423 -0.01871424  0.46143590
## 6 -0.003136701 -0.178604591  0.01271684  0.1968864 -0.29753864 -0.12984063
##        Dim.20      Dim.21      Dim.22      Dim.23       Dim.24       Dim.25
## 1 -0.34556490 -0.09651489  0.06884992 -0.08451859 -0.175256284 -0.151020059
## 2  0.11413308  0.07732747 -0.09457845  0.21771806  0.011290118 -0.170510254
## 3  0.20453223 -0.31106671 -0.06030903  0.07429112  0.102761759  0.171158125
## 4  0.46473366 -0.43419337 -0.20326576  0.12410465  0.153429663  0.077495702
## 5 -0.06549539  0.11654493 -0.01764985 -0.13945364 -0.005331797  0.003062062
## 6  0.07123716  0.00240229 -0.10116937 -0.03347762  0.002840246  0.122390361
##         Dim.26      Dim.27        Dim.28       Dim.29        Dim.30
## 1  0.201503451  0.25258499 -0.0339144536  0.045647720  0.0471692081
## 2  0.041128785 -0.18127017  0.0326241827 -0.005687424  0.0018678763
## 3 -0.004735412 -0.04956943  0.0470258247  0.003145897 -0.0007505348
## 4  0.275224778 -0.18346206  0.0424843320 -0.069294786  0.0199374155
## 5 -0.039254289 -0.03216785 -0.0347862199  0.005037910 -0.0212137698
## 6  0.030298969  0.08445505  0.0007303007 -0.019721334 -0.0034594744
head(all_pca_ind_3$cos2)
##       Dim.1      Dim.2       Dim.3        Dim.4        Dim.5        Dim.6
## 1 0.7366868 0.03309951 0.010996938 0.1151037016 0.0124508677 1.736597e-02
## 2 0.2164877 0.53913561 0.010637227 0.0474815863 0.0146792249 3.118017e-05
## 3 0.8781764 0.03087731 0.008131356 0.0222203296 0.0008376258 7.830730e-03
## 4 0.2593764 0.53978871 0.053427543 0.0001189646 0.0448178978 4.766328e-02
## 5 0.4498315 0.11023095 0.056101900 0.2511755035 0.0086829482 4.369433e-02
## 6 0.1722700 0.47439927 0.261905853 0.0269264275 0.0339099994 6.185759e-03
##          Dim.7        Dim.8        Dim.9      Dim.10       Dim.11       Dim.12
## 1 4.064787e-02 0.0013836880 2.151977e-04 0.006710902 0.0006027652 0.0064325731
## 2 6.775254e-06 0.0022051042 1.924334e-02 0.046529449 0.0251041522 0.0009469458
## 3 1.192481e-02 0.0002532595 1.546953e-05 0.005512144 0.0097962404 0.0004132687
## 4 1.045269e-02 0.0057393901 1.009799e-02 0.006378190 0.0067787311 0.0052285821
## 5 2.545908e-02 0.0117630901 2.021442e-03 0.004143792 0.0123235390 0.0003547641
## 6 7.313854e-03 0.0008322750 5.416927e-04 0.008555071 0.0003663563 0.0002016773
##         Dim.13       Dim.14       Dim.15       Dim.16       Dim.17       Dim.18
## 1 9.317968e-05 4.160002e-03 3.157026e-03 4.839843e-03 6.143519e-04 2.632802e-03
## 2 3.380239e-02 1.621418e-02 3.058389e-06 1.598343e-02 1.124889e-05 3.846828e-03
## 3 4.503770e-03 7.431246e-06 6.242102e-03 2.823216e-03 9.735943e-04 2.067283e-04
## 4 4.452727e-03 1.214539e-03 1.457476e-04 1.349067e-05 1.190820e-03 6.579435e-06
## 5 4.371603e-03 8.444271e-03 2.797125e-03 6.764503e-04 5.149036e-04 1.017275e-05
## 6 2.099098e-05 2.991656e-07 9.699530e-04 4.917266e-06 1.178683e-03 2.691858e-03
##         Dim.19       Dim.20       Dim.21       Dim.22       Dim.23       Dim.24
## 1 1.559858e-04 0.0010409815 8.120307e-05 4.132289e-05 6.227135e-05 2.677509e-04
## 2 2.327093e-03 0.0004946064 2.270410e-04 3.396417e-04 1.799805e-03 4.839869e-06
## 3 4.117570e-03 0.0011173921 2.584575e-03 9.715089e-05 1.474198e-04 2.820624e-04
## 4 3.657467e-06 0.0011041259 9.637773e-04 2.112218e-04 7.873837e-05 1.203453e-04
## 5 6.184670e-03 0.0001245992 3.945304e-04 9.048481e-06 5.648765e-04 8.257356e-07
## 6 5.126095e-04 0.0001543045 1.754755e-07 3.112171e-04 3.407804e-05 2.452886e-07
##         Dim.25       Dim.26       Dim.27       Dim.28       Dim.29       Dim.30
## 1 1.988168e-04 3.539556e-04 5.561589e-04 1.002659e-05 1.816444e-05 1.939550e-05
## 2 1.103920e-03 6.422859e-05 1.247640e-03 4.041252e-05 1.228197e-06 1.324747e-07
## 3 7.824870e-04 5.989596e-07 6.563115e-05 5.906836e-05 2.643449e-07 1.504609e-08
## 4 3.070192e-05 3.872446e-04 1.720691e-04 9.227157e-06 2.454774e-05 2.032114e-06
## 5 2.723465e-07 4.475772e-05 3.005646e-05 3.514862e-05 7.372157e-07 1.307162e-05
## 6 4.554701e-04 2.791394e-05 2.168786e-04 1.621694e-08 1.182600e-05 3.639031e-07
head(all_pca_ind_3$contrib)
##        Dim.1      Dim.2      Dim.3       Dim.4       Dim.5        Dim.6
## 1 1.11824300 0.11724921 0.07867607 1.171622241 0.152248808 0.2899795975
## 2 0.07544555 0.43846352 0.01747215 0.110961077 0.041210182 0.0001195347
## 3 0.43504757 0.03569681 0.01898609 0.073815998 0.003342766 0.0426748821
## 4 0.67136279 3.26051009 0.65179267 0.002064855 0.934499758 1.3571435270
## 5 0.20492419 0.11718767 0.12045877 0.767300731 0.031864823 0.2189694260
## 6 0.07496891 0.48178208 0.53719796 0.078576986 0.118877695 0.0296128148
##          Dim.7       Dim.8        Dim.9     Dim.10      Dim.11      Dim.12
## 1 1.213658e+00 0.058529188 0.0104067498 0.38579553 0.041345602 0.496569180
## 2 4.644418e-05 0.021414629 0.2136510859 0.61411732 0.395342837 0.016782911
## 3 1.162015e-01 0.003496244 0.0002441497 0.10341857 0.219301852 0.010411881
## 4 5.321820e-01 0.413974398 0.8326944234 0.62524063 0.792871893 0.688259771
## 5 2.281347e-01 0.149329546 0.0293378859 0.07149313 0.253691925 0.008219107
## 6 6.260704e-02 0.010092981 0.0075101521 0.14099971 0.007204489 0.004463446
##        Dim.13       Dim.14       Dim.15       Dim.16       Dim.17      Dim.18
## 1 0.007783311 5.341595e-01 0.6761315700 1.2217735862 0.2085171427 1.008745049
## 2 0.648242810 4.779908e-01 0.0001503809 0.9263530122 0.0008765592 0.338386526
## 3 0.122778155 3.114159e-04 0.4363003250 0.2325976169 0.1078460556 0.025850254
## 4 0.634223945 2.659269e-01 0.0532264801 0.0058071962 0.6891979240 0.004298581
## 5 0.109590751 3.254090e-01 0.1797855904 0.0512489941 0.0524493991 0.001169745
## 6 0.000502683 1.101305e-05 0.0595555177 0.0003558783 0.1146937985 0.295687975
##        Dim.19     Dim.20       Dim.21      Dim.22      Dim.23       Dim.24
## 1 0.063559490 0.67353144 5.461950e-02 0.030361274 0.051577078 2.989766e-01
## 2 0.217698967 0.07347197 3.506119e-02 0.057292461 0.342248175 1.240758e-03
## 3 0.547568039 0.23595117 5.673697e-01 0.023295809 0.039849794 1.027905e-01
## 4 0.002541261 1.21816714 1.105415e+00 0.264631499 0.111206018 2.291441e-01
## 5 0.756313641 0.02419469 7.964267e-02 0.001995239 0.140414495 2.767180e-04
## 6 0.059882566 0.02862278 3.383842e-05 0.065555803 0.008092102 7.852393e-05
##         Dim.25       Dim.26     Dim.27       Dim.28      Dim.29       Dim.30
## 1 0.2589109285 0.8726191306 1.62489212 1.271865e-01 0.489055649 2.9390496667
## 2 0.3300518050 0.0363539416 0.83687792 1.176930e-01 0.007591915 0.0046087798
## 3 0.3325646996 0.0004819203 0.06258026 2.445369e-01 0.002322785 0.0007441001
## 4 0.0681767579 1.6279265773 0.85723913 1.995855e-01 1.126993004 0.5250824154
## 5 0.0001064411 0.0331157040 0.02635442 1.338092e-01 0.005956909 0.5944638618
## 6 0.1700495704 0.0197294722 0.18166059 5.897596e-05 0.091283581 0.0158092037
dudi.pca()
all_pca_ind_4 <- get_pca_ind(all_pca_4)
all_pca_ind_4
## Principal Component Analysis Results for individuals
##  ===================================================
##   Name       Description                       
## 1 "$coord"   "Coordinates for the individuals" 
## 2 "$cos2"    "Cos2 for the individuals"        
## 3 "$contrib" "contributions of the individuals"
head(all_pca_ind_4$coord)
##       Dim.1     Dim.2      Dim.3     Dim.4      Dim.5       Dim.6       Dim.7
## 1 -9.192837  1.948583 -1.1231662 3.6337309 -1.1951101  1.41142445  2.15936987
## 2 -2.387802 -3.768172 -0.5292927 1.1182639  0.6217750  0.02865635  0.01335809
## 3 -5.733896 -1.075174 -0.5517476 0.9120827 -0.1770859  0.54145215 -0.66816648
## 4 -7.122953 10.275589 -3.2327895 0.1525470 -2.9608784  3.05342182  1.42991070
## 5 -3.935302 -1.948072  1.3897667 2.9406393  0.5467474 -1.22649464 -0.93621255
## 6 -2.380247  3.949929 -2.9348768 0.9410369 -1.0560419 -0.45103865  0.49044512
##         Dim.8       Dim.9     Dim.10     Dim.11     Dim.12      Dim.13
## 1  0.39840723 -0.15711836 -0.8774019  0.2629555 -0.8590145  0.10338766
## 2 -0.24098846 -0.71190482  1.1069949  0.8131197  0.1579226 -0.94352928
## 3 -0.09737374  0.02406564  0.4542754 -0.6056039  0.1243871 -0.41062660
## 4 -1.05956524 -1.40543967 -1.1169753 -1.1515135  1.0113158 -0.93327116
## 5 -0.63637606 -0.26380546  0.3777045  0.6513596 -0.1105154  0.38794797
## 6  0.16544388 -0.13347310 -0.5304312  0.1097663  0.0814415 -0.02627445
##         Dim.14       Dim.15      Dim.16     Dim.17      Dim.18      Dim.19
## 1  0.690804097 -0.601793127 -0.74511579  0.2654708  0.54956269  0.13376750
## 2  0.653475327  0.008974867  0.64880869  0.0172122 -0.31829738 -0.24756463
## 3 -0.016679759  0.483419744 -0.32511053 -0.1909185  0.08797493 -0.39262636
## 4  0.487416897 -0.168847832 -0.05137025 -0.4826339  0.03587477 -0.02674762
## 5  0.539180548  0.310319492  0.15260581 -0.1331423  0.01871424  0.46143590
## 6 -0.003136701  0.178604591  0.01271684 -0.1968864  0.29753864 -0.12984063
##        Dim.20      Dim.21      Dim.22      Dim.23       Dim.24       Dim.25
## 1  0.34556490  0.09651489  0.06884992  0.08451859 -0.175256284  0.151020059
## 2 -0.11413308 -0.07732747 -0.09457845 -0.21771806  0.011290118  0.170510254
## 3 -0.20453223  0.31106671 -0.06030903 -0.07429112  0.102761759 -0.171158125
## 4 -0.46473366  0.43419337 -0.20326576 -0.12410465  0.153429663 -0.077495702
## 5  0.06549539 -0.11654493 -0.01764985  0.13945364 -0.005331797 -0.003062062
## 6 -0.07123716 -0.00240229 -0.10116937  0.03347762  0.002840246 -0.122390361
##         Dim.26      Dim.27        Dim.28       Dim.29        Dim.30
## 1 -0.201503451  0.25258499  0.0339144536  0.045647720  0.0471692081
## 2 -0.041128785 -0.18127017 -0.0326241827 -0.005687424  0.0018678763
## 3  0.004735412 -0.04956943 -0.0470258247  0.003145897 -0.0007505348
## 4 -0.275224778 -0.18346206 -0.0424843320 -0.069294786  0.0199374155
## 5  0.039254289 -0.03216785  0.0347862199  0.005037910 -0.0212137698
## 6 -0.030298969  0.08445505 -0.0007303007 -0.019721334 -0.0034594744
head(all_pca_ind_4$cos2)
##       Dim.1      Dim.2       Dim.3        Dim.4        Dim.5        Dim.6
## 1 0.7366868 0.03309951 0.010996938 0.1151037016 0.0124508677 1.736597e-02
## 2 0.2164877 0.53913561 0.010637227 0.0474815863 0.0146792249 3.118017e-05
## 3 0.8781764 0.03087731 0.008131356 0.0222203296 0.0008376258 7.830730e-03
## 4 0.2593764 0.53978871 0.053427543 0.0001189646 0.0448178978 4.766328e-02
## 5 0.4498315 0.11023095 0.056101900 0.2511755035 0.0086829482 4.369433e-02
## 6 0.1722700 0.47439927 0.261905853 0.0269264275 0.0339099994 6.185759e-03
##          Dim.7        Dim.8        Dim.9      Dim.10       Dim.11       Dim.12
## 1 4.064787e-02 0.0013836880 2.151977e-04 0.006710902 0.0006027652 0.0064325731
## 2 6.775254e-06 0.0022051042 1.924334e-02 0.046529449 0.0251041522 0.0009469458
## 3 1.192481e-02 0.0002532595 1.546953e-05 0.005512144 0.0097962404 0.0004132687
## 4 1.045269e-02 0.0057393901 1.009799e-02 0.006378190 0.0067787311 0.0052285821
## 5 2.545908e-02 0.0117630901 2.021442e-03 0.004143792 0.0123235390 0.0003547641
## 6 7.313854e-03 0.0008322750 5.416927e-04 0.008555071 0.0003663563 0.0002016773
##         Dim.13       Dim.14       Dim.15       Dim.16       Dim.17       Dim.18
## 1 9.317968e-05 4.160002e-03 3.157026e-03 4.839843e-03 6.143519e-04 2.632802e-03
## 2 3.380239e-02 1.621418e-02 3.058389e-06 1.598343e-02 1.124889e-05 3.846828e-03
## 3 4.503770e-03 7.431246e-06 6.242102e-03 2.823216e-03 9.735943e-04 2.067283e-04
## 4 4.452727e-03 1.214539e-03 1.457476e-04 1.349067e-05 1.190820e-03 6.579435e-06
## 5 4.371603e-03 8.444271e-03 2.797125e-03 6.764503e-04 5.149036e-04 1.017275e-05
## 6 2.099098e-05 2.991656e-07 9.699530e-04 4.917266e-06 1.178683e-03 2.691858e-03
##         Dim.19       Dim.20       Dim.21       Dim.22       Dim.23       Dim.24
## 1 1.559858e-04 0.0010409815 8.120307e-05 4.132289e-05 6.227135e-05 2.677509e-04
## 2 2.327093e-03 0.0004946064 2.270410e-04 3.396417e-04 1.799805e-03 4.839869e-06
## 3 4.117570e-03 0.0011173921 2.584575e-03 9.715089e-05 1.474198e-04 2.820624e-04
## 4 3.657467e-06 0.0011041259 9.637773e-04 2.112218e-04 7.873837e-05 1.203453e-04
## 5 6.184670e-03 0.0001245992 3.945304e-04 9.048481e-06 5.648765e-04 8.257356e-07
## 6 5.126095e-04 0.0001543045 1.754755e-07 3.112171e-04 3.407804e-05 2.452886e-07
##         Dim.25       Dim.26       Dim.27       Dim.28       Dim.29       Dim.30
## 1 1.988168e-04 3.539556e-04 5.561589e-04 1.002659e-05 1.816444e-05 1.939550e-05
## 2 1.103920e-03 6.422859e-05 1.247640e-03 4.041252e-05 1.228197e-06 1.324747e-07
## 3 7.824870e-04 5.989596e-07 6.563115e-05 5.906836e-05 2.643449e-07 1.504609e-08
## 4 3.070192e-05 3.872446e-04 1.720691e-04 9.227157e-06 2.454774e-05 2.032114e-06
## 5 2.723465e-07 4.475772e-05 3.005646e-05 3.514862e-05 7.372157e-07 1.307162e-05
## 6 4.554701e-04 2.791394e-05 2.168786e-04 1.621694e-08 1.182600e-05 3.639031e-07
head(all_pca_ind_4$contrib)
##        Dim.1      Dim.2      Dim.3       Dim.4       Dim.5        Dim.6
## 1 1.11824300 0.11724921 0.07867607 1.171622241 0.152248808 0.2899795975
## 2 0.07544555 0.43846352 0.01747215 0.110961077 0.041210182 0.0001195347
## 3 0.43504757 0.03569681 0.01898609 0.073815998 0.003342766 0.0426748821
## 4 0.67136279 3.26051009 0.65179267 0.002064855 0.934499758 1.3571435270
## 5 0.20492419 0.11718767 0.12045877 0.767300731 0.031864823 0.2189694260
## 6 0.07496891 0.48178208 0.53719796 0.078576986 0.118877695 0.0296128148
##          Dim.7       Dim.8        Dim.9     Dim.10      Dim.11      Dim.12
## 1 1.213658e+00 0.058529188 0.0104067498 0.38579553 0.041345602 0.496569180
## 2 4.644418e-05 0.021414629 0.2136510859 0.61411732 0.395342837 0.016782911
## 3 1.162015e-01 0.003496244 0.0002441497 0.10341857 0.219301852 0.010411881
## 4 5.321820e-01 0.413974398 0.8326944234 0.62524063 0.792871893 0.688259771
## 5 2.281347e-01 0.149329546 0.0293378859 0.07149313 0.253691925 0.008219107
## 6 6.260704e-02 0.010092981 0.0075101521 0.14099971 0.007204489 0.004463446
##        Dim.13       Dim.14       Dim.15       Dim.16       Dim.17      Dim.18
## 1 0.007783311 5.341595e-01 0.6761315700 1.2217735862 0.2085171427 1.008745049
## 2 0.648242810 4.779908e-01 0.0001503809 0.9263530122 0.0008765592 0.338386526
## 3 0.122778155 3.114159e-04 0.4363003250 0.2325976169 0.1078460556 0.025850254
## 4 0.634223945 2.659269e-01 0.0532264801 0.0058071962 0.6891979240 0.004298581
## 5 0.109590751 3.254090e-01 0.1797855904 0.0512489941 0.0524493991 0.001169745
## 6 0.000502683 1.101305e-05 0.0595555177 0.0003558783 0.1146937985 0.295687975
##        Dim.19     Dim.20       Dim.21      Dim.22      Dim.23       Dim.24
## 1 0.063559490 0.67353144 5.461950e-02 0.030361274 0.051577078 2.989766e-01
## 2 0.217698967 0.07347197 3.506119e-02 0.057292461 0.342248175 1.240758e-03
## 3 0.547568039 0.23595117 5.673697e-01 0.023295809 0.039849794 1.027905e-01
## 4 0.002541261 1.21816714 1.105415e+00 0.264631499 0.111206018 2.291441e-01
## 5 0.756313641 0.02419469 7.964267e-02 0.001995239 0.140414495 2.767180e-04
## 6 0.059882566 0.02862278 3.383842e-05 0.065555803 0.008092102 7.852393e-05
##         Dim.25       Dim.26     Dim.27       Dim.28      Dim.29       Dim.30
## 1 0.2589109285 0.8726191306 1.62489212 1.271865e-01 0.489055649 2.9390496667
## 2 0.3300518050 0.0363539416 0.83687792 1.176930e-01 0.007591915 0.0046087798
## 3 0.3325646996 0.0004819203 0.06258026 2.445369e-01 0.002322785 0.0007441001
## 4 0.0681767579 1.6279265773 0.85723913 1.995855e-01 1.126993004 0.5250824154
## 5 0.0001064411 0.0331157040 0.02635442 1.338092e-01 0.005956909 0.5944638618
## 6 0.1700495704 0.0197294722 0.18166059 5.897596e-05 0.091283581 0.0158092037
epPCA()
all_pca_ind_5 <- get_pca_ind(all_pca_5)
all_pca_ind_5
## Principal Component Analysis Results for individuals
##  ===================================================
##   Name       Description                       
## 1 "$coord"   "Coordinates for the individuals" 
## 2 "$cos2"    "Cos2 for the individuals"        
## 3 "$contrib" "contributions of the individuals"
head(all_pca_ind_5$coord)
##           [,1]       [,2]       [,3]      [,4]       [,5]        [,6]
## [1,] -9.184755  -1.946870 -1.1221788 3.6305364  1.1940595  1.41018364
## [2,] -2.385703   3.764859 -0.5288274 1.1172808 -0.6212284  0.02863116
## [3,] -5.728855   1.074229 -0.5512625 0.9112808  0.1769302  0.54097615
## [4,] -7.116691 -10.266556 -3.2299475 0.1524129  2.9582754  3.05073750
## [5,] -3.931842   1.946359  1.3885450 2.9380542 -0.5462667 -1.22541641
## [6,] -2.378155  -3.946456 -2.9322967 0.9402096  1.0551135 -0.45064213
##             [,7]        [,8]        [,9]      [,10]      [,11]      [,12]
## [1,]  2.15747152  0.39805698 -0.15698023 -0.8766305 -0.2627243 -0.8582593
## [2,]  0.01334635 -0.24077660 -0.71127897  1.1060218 -0.8124048  0.1577838
## [3,] -0.66757908 -0.09728813  0.02404449  0.4538760  0.6050715  0.1242777
## [4,]  1.42865363 -1.05863376 -1.40420412 -1.1159933  1.1505012  1.0104267
## [5,] -0.93538950 -0.63581661 -0.26357355  0.3773724 -0.6507870 -0.1104183
## [6,]  0.49001396  0.16529843 -0.13335576 -0.5299649 -0.1096698  0.0813699
##            [,13]        [,14]        [,15]       [,16]       [,17]       [,18]
## [1,]  0.10329677 -0.690196797  0.601264078  0.74446075 -0.26523740 -0.54907956
## [2,] -0.94269981 -0.652900844 -0.008966977 -0.64823831 -0.01719707  0.31801756
## [3,] -0.41026561  0.016665095 -0.482994760  0.32482472  0.19075064 -0.08789759
## [4,] -0.93245070 -0.486988399  0.168699395  0.05132509  0.48220960 -0.03584323
## [5,]  0.38760691 -0.538706543 -0.310046684 -0.15247165  0.13302526 -0.01869779
## [6,] -0.02625135  0.003133944 -0.178447576 -0.01270566  0.19671335 -0.29727706
##           [,19]       [,20]        [,21]       [,22]       [,23]        [,24]
## [1,]  0.1336499  0.34526111  0.096430045 -0.06878939  0.08444429  0.175102213
## [2,] -0.2473470 -0.11403274 -0.077259494  0.09449530 -0.21752666 -0.011280193
## [3,] -0.3922812 -0.20435242  0.310793246  0.06025601 -0.07422581 -0.102671419
## [4,] -0.0267241 -0.46432511  0.433811661  0.20308706 -0.12399554 -0.153294780
## [5,]  0.4610302  0.06543782 -0.116442469  0.01763433  0.13933105  0.005327110
## [6,] -0.1297265 -0.07117453 -0.002400178  0.10108043  0.03344819 -0.002837749
##             [,25]        [,26]       [,27]         [,28]        [,29]
## [1,]  0.150887294 -0.201326305 -0.25236294 -0.0338846387  0.045607590
## [2,]  0.170360355 -0.041092627  0.18111081  0.0325955021 -0.005682424
## [3,] -0.171007656  0.004731249  0.04952586  0.0469844833  0.003143131
## [4,] -0.077427574 -0.274982822  0.18330078  0.0424469831 -0.069233868
## [5,] -0.003059371  0.039219780  0.03213957 -0.0347556386  0.005033481
## [6,] -0.122282765 -0.030272333 -0.08438081  0.0007296587 -0.019703996
##              [,30]
## [1,]  0.0471277407
## [2,]  0.0018662342
## [3,] -0.0007498749
## [4,]  0.0199198881
## [5,] -0.0211951203
## [6,] -0.0034564331
head(all_pca_ind_5$cos2)
##           [,1]       [,2]        [,3]         [,4]         [,5]         [,6]
## [1,] 0.7366868 0.03309951 0.010996938 0.1151037016 0.0124508677 1.736597e-02
## [2,] 0.2164877 0.53913561 0.010637227 0.0474815863 0.0146792249 3.118017e-05
## [3,] 0.8781764 0.03087731 0.008131356 0.0222203296 0.0008376258 7.830730e-03
## [4,] 0.2593764 0.53978871 0.053427543 0.0001189646 0.0448178978 4.766328e-02
## [5,] 0.4498315 0.11023095 0.056101900 0.2511755035 0.0086829482 4.369433e-02
## [6,] 0.1722700 0.47439927 0.261905853 0.0269264275 0.0339099994 6.185759e-03
##              [,7]         [,8]         [,9]       [,10]        [,11]
## [1,] 4.064787e-02 0.0013836880 2.151977e-04 0.006710902 0.0006027652
## [2,] 6.775254e-06 0.0022051042 1.924334e-02 0.046529449 0.0251041522
## [3,] 1.192481e-02 0.0002532595 1.546953e-05 0.005512144 0.0097962404
## [4,] 1.045269e-02 0.0057393901 1.009799e-02 0.006378190 0.0067787311
## [5,] 2.545908e-02 0.0117630901 2.021442e-03 0.004143792 0.0123235390
## [6,] 7.313854e-03 0.0008322750 5.416927e-04 0.008555071 0.0003663563
##             [,12]        [,13]        [,14]        [,15]        [,16]
## [1,] 0.0064325731 9.317968e-05 4.160002e-03 3.157026e-03 4.839843e-03
## [2,] 0.0009469458 3.380239e-02 1.621418e-02 3.058389e-06 1.598343e-02
## [3,] 0.0004132687 4.503770e-03 7.431246e-06 6.242102e-03 2.823216e-03
## [4,] 0.0052285821 4.452727e-03 1.214539e-03 1.457476e-04 1.349067e-05
## [5,] 0.0003547641 4.371603e-03 8.444271e-03 2.797125e-03 6.764503e-04
## [6,] 0.0002016773 2.099098e-05 2.991656e-07 9.699530e-04 4.917266e-06
##             [,17]        [,18]        [,19]        [,20]        [,21]
## [1,] 6.143519e-04 2.632802e-03 1.559858e-04 0.0010409815 8.120307e-05
## [2,] 1.124889e-05 3.846828e-03 2.327093e-03 0.0004946064 2.270410e-04
## [3,] 9.735943e-04 2.067283e-04 4.117570e-03 0.0011173921 2.584575e-03
## [4,] 1.190820e-03 6.579435e-06 3.657467e-06 0.0011041259 9.637773e-04
## [5,] 5.149036e-04 1.017275e-05 6.184670e-03 0.0001245992 3.945304e-04
## [6,] 1.178683e-03 2.691858e-03 5.126095e-04 0.0001543045 1.754755e-07
##             [,22]        [,23]        [,24]        [,25]        [,26]
## [1,] 4.132289e-05 6.227135e-05 2.677509e-04 1.988168e-04 3.539556e-04
## [2,] 3.396417e-04 1.799805e-03 4.839869e-06 1.103920e-03 6.422859e-05
## [3,] 9.715089e-05 1.474198e-04 2.820624e-04 7.824870e-04 5.989596e-07
## [4,] 2.112218e-04 7.873837e-05 1.203453e-04 3.070192e-05 3.872446e-04
## [5,] 9.048481e-06 5.648765e-04 8.257356e-07 2.723465e-07 4.475772e-05
## [6,] 3.112171e-04 3.407804e-05 2.452886e-07 4.554701e-04 2.791394e-05
##             [,27]        [,28]        [,29]        [,30]
## [1,] 5.561589e-04 1.002659e-05 1.816444e-05 1.939550e-05
## [2,] 1.247640e-03 4.041252e-05 1.228197e-06 1.324747e-07
## [3,] 6.563115e-05 5.906836e-05 2.643449e-07 1.504609e-08
## [4,] 1.720691e-04 9.227157e-06 2.454774e-05 2.032114e-06
## [5,] 3.005646e-05 3.514862e-05 7.372157e-07 1.307162e-05
## [6,] 2.168786e-04 1.621694e-08 1.182600e-05 3.639031e-07
head(all_pca_ind_5$contrib)
##            [,1]       [,2]       [,3]        [,4]        [,5]         [,6]
## [1,] 1.11824300 0.11724921 0.07867607 1.171622241 0.152248808 0.2899795975
## [2,] 0.07544555 0.43846352 0.01747215 0.110961077 0.041210182 0.0001195347
## [3,] 0.43504757 0.03569681 0.01898609 0.073815998 0.003342766 0.0426748821
## [4,] 0.67136279 3.26051009 0.65179267 0.002064855 0.934499758 1.3571435270
## [5,] 0.20492419 0.11718767 0.12045877 0.767300731 0.031864823 0.2189694260
## [6,] 0.07496891 0.48178208 0.53719796 0.078576986 0.118877695 0.0296128148
##              [,7]        [,8]         [,9]      [,10]       [,11]       [,12]
## [1,] 1.213658e+00 0.058529188 0.0104067498 0.38579553 0.041345602 0.496569180
## [2,] 4.644418e-05 0.021414629 0.2136510859 0.61411732 0.395342837 0.016782911
## [3,] 1.162015e-01 0.003496244 0.0002441497 0.10341857 0.219301852 0.010411881
## [4,] 5.321820e-01 0.413974398 0.8326944234 0.62524063 0.792871893 0.688259771
## [5,] 2.281347e-01 0.149329546 0.0293378859 0.07149313 0.253691925 0.008219107
## [6,] 6.260704e-02 0.010092981 0.0075101521 0.14099971 0.007204489 0.004463446
##            [,13]        [,14]        [,15]        [,16]        [,17]
## [1,] 0.007783311 5.341595e-01 0.6761315700 1.2217735862 0.2085171427
## [2,] 0.648242810 4.779908e-01 0.0001503809 0.9263530122 0.0008765592
## [3,] 0.122778155 3.114159e-04 0.4363003250 0.2325976169 0.1078460556
## [4,] 0.634223945 2.659269e-01 0.0532264801 0.0058071962 0.6891979240
## [5,] 0.109590751 3.254090e-01 0.1797855904 0.0512489941 0.0524493991
## [6,] 0.000502683 1.101305e-05 0.0595555177 0.0003558783 0.1146937985
##            [,18]       [,19]      [,20]        [,21]       [,22]       [,23]
## [1,] 1.008745049 0.063559490 0.67353144 5.461950e-02 0.030361274 0.051577078
## [2,] 0.338386526 0.217698967 0.07347197 3.506119e-02 0.057292461 0.342248175
## [3,] 0.025850254 0.547568039 0.23595117 5.673697e-01 0.023295809 0.039849794
## [4,] 0.004298581 0.002541261 1.21816714 1.105415e+00 0.264631499 0.111206018
## [5,] 0.001169745 0.756313641 0.02419469 7.964267e-02 0.001995239 0.140414495
## [6,] 0.295687975 0.059882566 0.02862278 3.383842e-05 0.065555803 0.008092102
##             [,24]        [,25]        [,26]      [,27]        [,28]       [,29]
## [1,] 2.989766e-01 0.2589109285 0.8726191306 1.62489212 1.271865e-01 0.489055649
## [2,] 1.240758e-03 0.3300518050 0.0363539416 0.83687792 1.176930e-01 0.007591915
## [3,] 1.027905e-01 0.3325646996 0.0004819203 0.06258026 2.445369e-01 0.002322785
## [4,] 2.291441e-01 0.0681767579 1.6279265773 0.85723913 1.995855e-01 1.126993004
## [5,] 2.767180e-04 0.0001064411 0.0331157040 0.02635442 1.338092e-01 0.005956909
## [6,] 7.852393e-05 0.1700495704 0.0197294722 0.18166059 5.897596e-05 0.091283581
##             [,30]
## [1,] 2.9390496667
## [2,] 0.0046087798
## [3,] 0.0007441001
## [4,] 0.5250824154
## [5,] 0.5944638618
## [6,] 0.0158092037

Each of these results is useful in its own right, although more often than not a plot is required to properly interpret the information they hold. Some of these plots and applications are covered from this point onward.

2.2.8. Contributions

It was already stated that Principal Components are linear combinations of the dataset's original features/variables. These linear combinations depend more on certain variables than on others; this distribution is stored within the results for variables under the $contrib index and expressed as a percentage (that is, how much of a given Principal Component is determined by the variable in question). Note that each of the individuals also contributes to the Principal Components (albeit in a less direct way due to their larger number); these individual contributions are stored within the results for individuals under the $contrib index (also expressed as a percentage).
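
To make the percentages concrete, the following minimal sketch rebuilds both kinds of contribution by hand from a prcomp() object. It is only an illustration: it assumes the built-in USArrests dataset instead of the Wisconsin Breast Cancer Dataset, the object names are hypothetical, and packages such as factoextra may normalize the individuals' values slightly differently.

```r
# Illustrative data only: USArrests ships with base R (an assumption,
# not the dataset analysed in this document)
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Variables: the columns of $rotation are unit-length eigenvectors, so the
# squared loadings of each component sum to 1 (100 once scaled)
var_contrib <- 100 * pca$rotation^2

# Individuals: share of each component's total squared score
ind_contrib <- 100 * sweep(pca$x^2, 2, colSums(pca$x^2), "/")

round(var_contrib, 2)
round(head(ind_contrib), 2)

# Each column (component) adds up to 100, up to floating-point error
colSums(var_contrib)
colSums(ind_contrib)
```

Reading a column of var_contrib from top to bottom answers the question "which variables build this Principal Component?", which is exactly what the upcoming barplots display.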

The function fviz_contrib() from the factoextra package can be used to draw a barplot of these contributions.

# Do not run this code snippet, as it is only here for illustration purposes
library(factoextra)
fviz_contrib(X,
             choice = c("row", "col", "var", "ind", "quanti.var", "quali.var",
                        "group", "partial.axes"),
             axes = 1,
             fill = "steelblue",
             color = "steelblue",
             sort.val = c("desc", "asc", "none"),
             top = Inf,
             ggtheme = theme_minimal(),
             ...)
Let’s detail the function’s arguments:
  • X: an object of class PCA, CA, MCA, FAMD, MFA and HMFA (from the FactoMineR package); prcomp and princomp (from R built-in functions); dudi, pca, coa and acm (from the ade4 package); ca (from the ca package).
  • choice: allowed values are “row” and “col” for CA objects; “var” and “ind” for PCA or MCA objects; “var”, “ind”, “quanti.var”, “quali.var” and “group” for FAMD, MFA and HMFA objects.
  • axes: a numeric vector specifying the dimension(s) of interest (it can be used to evaluate contributions either in a single dimension or across multiple dimensions, as is showcased in the upcoming code snippet).
  • fill: a fill color for the bar plot.
  • color: an outline color for the bar plot.
  • sort.val: a string specifying whether the value should be sorted. Allowed values are “none” (no sorting), “asc” (for ascending) or “desc” (for descending).
  • top: a numeric value specifying the number of top elements to be shown.
  • ggtheme: allows the user to tweak the plot’s aesthetic through a ggplot2-based theme customization.

More information regarding the fviz_contrib() function and its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/factoextra/versions/1.0.7/topics/fviz_contrib

Let’s now observe the results of applying the function at hand to the previously constructed PCA objects. Note that several graphs are plotted to showcase an evaluation of these contributions in different scenarios: two single-dimension ones and a combination of both (a multi-dimension scenario). To do so, the function grid.arrange() from the gridExtra package is used; more information regarding this function and its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/gridExtra/versions/2.3/topics/arrangeGrob

The following code snippets and plots cover the contribution of variables to PCs:

prcomp()
barplot_1 <- fviz_contrib(all_pca_1, 
                          choice = "var", 
                          axes = 1) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_2 <- fviz_contrib(all_pca_1, 
                          choice = "var", 
                          axes = 2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_3 <- fviz_contrib(all_pca_1, 
                          choice = "var", 
                          axes = 1:2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)

princomp()
barplot_1 <- fviz_contrib(all_pca_2, 
                          choice = "var", 
                          axes = 1) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_2 <- fviz_contrib(all_pca_2, 
                          choice = "var", 
                          axes = 2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_3 <- fviz_contrib(all_pca_2, 
                          choice = "var", 
                          axes = 1:2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)

PCA()
barplot_1 <- fviz_contrib(all_pca_3, 
                          choice = "var", 
                          axes = 1) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_2 <- fviz_contrib(all_pca_3, 
                          choice = "var", 
                          axes = 2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_3 <- fviz_contrib(all_pca_3, 
                          choice = "var", 
                          axes = 1:2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)

dudi.pca()
barplot_1 <- fviz_contrib(all_pca_4, 
                          choice = "var", 
                          axes = 1) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_2 <- fviz_contrib(all_pca_4, 
                          choice = "var", 
                          axes = 2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_3 <- fviz_contrib(all_pca_4, 
                          choice = "var", 
                          axes = 1:2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)

epPCA()
barplot_1 <- fviz_contrib(all_pca_5, 
                          choice = "var", 
                          axes = 1) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_2 <- fviz_contrib(all_pca_5, 
                          choice = "var", 
                          axes = 2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_3 <- fviz_contrib(all_pca_5, 
                          choice = "var", 
                          axes = 1:2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)


Unsurprisingly, all of the resulting plots are identical - as they should be. Let’s now evaluate the contribution of individuals to PCs:

prcomp()
barplot_1 <- fviz_contrib(all_pca_1, 
                          choice = "ind", 
                          axes = 1) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_2 <- fviz_contrib(all_pca_1, 
                          choice = "ind", 
                          axes = 2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_3 <- fviz_contrib(all_pca_1, 
                          choice = "ind", 
                          axes = 1:2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)

princomp()
barplot_1 <- fviz_contrib(all_pca_2, 
                          choice = "ind", 
                          axes = 1) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_2 <- fviz_contrib(all_pca_2, 
                          choice = "ind", 
                          axes = 2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_3 <- fviz_contrib(all_pca_2, 
                          choice = "ind", 
                          axes = 1:2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)

PCA()
barplot_1 <- fviz_contrib(all_pca_3, 
                          choice = "ind", 
                          axes = 1) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_2 <- fviz_contrib(all_pca_3, 
                          choice = "ind", 
                          axes = 2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_3 <- fviz_contrib(all_pca_3, 
                          choice = "ind", 
                          axes = 1:2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)

dudi.pca()
barplot_1 <- fviz_contrib(all_pca_4, 
                          choice = "ind", 
                          axes = 1) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_2 <- fviz_contrib(all_pca_4, 
                          choice = "ind", 
                          axes = 2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_3 <- fviz_contrib(all_pca_4, 
                          choice = "ind", 
                          axes = 1:2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)

epPCA()
barplot_1 <- fviz_contrib(all_pca_5, 
                          choice = "ind", 
                          axes = 1) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_2 <- fviz_contrib(all_pca_5, 
                          choice = "ind", 
                          axes = 2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_3 <- fviz_contrib(all_pca_5, 
                          choice = "ind", 
                          axes = 1:2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)


Once again, identical outputs - as they should be. Note that the number of individuals makes it impossible to read the X-axis labels (without an extreme zoom-in). A more illustrative approach to visualizing these contributions involves the corrplot() function from the corrplot package, which showcases every contribution (either variables’ or individuals’) to every Principal Component.

# Do not run this code snippet, as it is only here for illustration purposes
library(corrplot)
corrplot(corr, 
         is.corr = FALSE,
         ...)
corrplot() is a very complete function with a large number of arguments to tweak its behavior and output, far too many for this document to cover. Given that most of them are of negligible relevance for the task at hand, only the key arguments are detailed:
  • corr: the correlation matrix to visualize.
  • method: the visualization method to be used for the correlation matrix. It currently supports seven methods, named ‘circle’ (default), ‘square’, ‘ellipse’, ‘number’, ‘pie’, ‘shade’ and ‘color’.
  • type: a character string that determines whether to display the full matrix (‘full’), the lower triangular matrix (‘lower’) or the upper triangular matrix (‘upper’).
  • add: TRUE or FALSE; determines whether or not to add the graph to an existing plot.
  • title: the title of the graph.
  • is.corr: TRUE or FALSE; determines whether the input matrix is a correlation matrix or not.
  • diag: TRUE or FALSE; determines whether to display the correlation coefficients on the principal diagonal.

More information regarding the corrplot() function and all of its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/corrplot/versions/0.90/topics/corrplot

Let’s now observe the results of applying the function at hand to the Wisconsin Breast Cancer Dataset. Note that the function’s very first argument corresponds to the contributions themselves (previous functions used the PCA objects instead) - let’s evaluate the variables first:

prcomp()
corrplot(all_pca_var_1$contrib, is.corr=FALSE)

princomp()
corrplot(all_pca_var_2$contrib, is.corr=FALSE)

PCA()
corrplot(all_pca_var_3$contrib, is.corr=FALSE)

dudi.pca()
corrplot(as.matrix(all_pca_var_4$contrib), is.corr=FALSE)

epPCA()
corrplot(all_pca_var_5$contrib, is.corr=FALSE)

Once again, all of the resulting plots are identical - as they should be. However, note the use of the function as.matrix() within the dudi.pca() tab: the particular structure of the objects returned by dudi.pca() leads to some oddities when applying functions such as get_pca_var() or get_pca_ind(). The objects obtained via said functions are usually of class matrix, but in this case they are data.frames instead; as.matrix() converts the data.frame into a matrix so that corrplot() accepts the input.
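
The conversion itself can be checked in isolation. The following sketch uses a tiny, made-up contribution table (the values and the names var_a, var_b and var_c are hypothetical, not real output) to show what as.matrix() changes and what it preserves:

```r
# Hypothetical contribution table stored as a data.frame (toy values)
contrib_df <- data.frame(
  Dim.1 = c(55, 30, 15),
  Dim.2 = c(10, 20, 70),
  row.names = c("var_a", "var_b", "var_c")
)

class(contrib_df)          # "data.frame"

# as.matrix() keeps the values and the row/column names, only the class changes
contrib_mat <- as.matrix(contrib_df)

is.matrix(contrib_mat)     # TRUE
is.numeric(contrib_mat)    # TRUE
dimnames(contrib_mat)
```

Because every column is numeric, the result is a numeric matrix with the same row and column names, so the labels of the resulting plot are not affected by the conversion.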

Applying corrplot() upon the individuals’ contributions would yield a massive plot due to the sheer number of individuals. Given that, the following code snippets are not rendered in order to keep the document clean and readable.

prcomp()
corrplot(all_pca_ind_1$contrib, is.corr=FALSE)
princomp()
corrplot(all_pca_ind_2$contrib, is.corr=FALSE)
PCA()
corrplot(all_pca_ind_3$contrib, is.corr=FALSE)
dudi.pca()
corrplot(as.matrix(all_pca_ind_4$contrib), is.corr=FALSE)
epPCA()
corrplot(all_pca_ind_5$contrib, is.corr=FALSE)
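
One way to keep such a plot readable is to restrict it to the individuals that contribute the most. The following is a minimal sketch of that idea, not part of the original analysis: it assumes the built-in USArrests dataset, computes the contributions by hand, ranks individuals with a simple heuristic (the sum of their contributions to the first two components), and only calls corrplot() if the package is installed. The names top_n, score and ind_contrib_top are hypothetical.

```r
# Illustrative data only (assumption): USArrests from base R
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Individuals' contributions (%) to each component, computed by hand
ind_contrib <- 100 * sweep(pca$x^2, 2, colSums(pca$x^2), "/")

# Simple ranking heuristic: combined contribution to the first two components
top_n <- 10
score <- rowSums(ind_contrib[, 1:2])
top_rows <- names(sort(score, decreasing = TRUE))[seq_len(top_n)]

# Keep only the top contributors; drop = FALSE guarantees a matrix result
ind_contrib_top <- ind_contrib[top_rows, , drop = FALSE]

# Plot the reduced matrix only when corrplot is available
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(ind_contrib_top, is.corr = FALSE)
}
```

The same subsetting could be applied to any of the $contrib matrices above before handing them to corrplot().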

2.2.9. Quality of representation

The quality of representation (cos2) measures how well a given variable is represented within a given Principal Component (or within a set of them). The same logic applies to the individuals: cos2 measures how well each of them is represented within a given Principal Component (or within a set of them). It is worth noting that, for any given variable or individual, the sum of the cos2 values across all the Principal Components is equal to one.
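
The sum-to-one property can be verified by hand. The sketch below is an illustration under stated assumptions: it uses the built-in USArrests dataset, standardized variables (scale. = TRUE) and all of the components, and the object names are hypothetical.

```r
# Illustrative data only (assumption): USArrests from base R
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Variables: coordinates are the loadings scaled by each component's
# standard deviation; cos2 is the square of those coordinates
var_coord <- sweep(pca$rotation, 2, pca$sdev, "*")
var_cos2 <- var_coord^2

# Individuals: squared score on each component divided by the squared
# distance of the individual to the origin (sum over all components)
ind_cos2 <- pca$x^2 / rowSums(pca$x^2)

# Each row (variable or individual) adds up to 1 across all components
rowSums(var_cos2)
head(rowSums(ind_cos2))
```

A cos2 close to one on a single component therefore means that the variable (or individual) is almost entirely described by that component alone.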

The function fviz_cos2() from the factoextra package helps to visualize through a barplot which of the PCA variables and/or individuals are best represented within a certain Principal Component (or within a set of Principal Components).

# Do not run this code snippet, as it is only here for illustration purposes
library(factoextra)
fviz_cos2(X,
          choice = c("row", "col", "var", "ind", "quanti.var", "quali.var", "group"),
          axes = 1,
          fill = "steelblue",
          color = "steelblue",
          sort.val = c("desc", "asc", "none"),
          top = Inf,
          xtickslab.rt = 45,
          ggtheme = theme_minimal(),
          ...)
Let’s detail the function’s arguments:
  • X: an object of class PCA, CA, MCA, FAMD, MFA and HMFA (from the FactoMineR package); prcomp and princomp (from R built-in functions); dudi, pca, coa and acm (from the ade4 package); ca (from the ca package).
  • choice: allowed values are “row” and “col” for CA objects; “var” and “ind” for PCA or MCA objects; “var”, “ind”, “quanti.var”, “quali.var” and “group” for FAMD, MFA and HMFA objects.
  • axes: a numeric vector specifying the dimension(s) of interest (it can be used to evaluate cos2 either in a single dimension or across multiple dimensions, as is showcased in the upcoming code snippets).
  • fill: a fill color for the bar plot.
  • color: an outline color for the bar plot.
  • sort.val: a string specifying whether the value should be sorted. Allowed values are “none” (no sorting), “asc” (for ascending) or “desc” (for descending).
  • top: a numeric value specifying the number of top elements to be shown.
  • ggtheme: allows the user to tweak the plot’s aesthetic through a ggplot2-based theme customization.

More information regarding the fviz_cos2() function and its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/factoextra/versions/1.0.7/topics/fviz_cos2

Let’s now observe the results of applying the function at hand to the previously constructed PCA objects. Note that several graphs are plotted to showcase an evaluation of cos2 in various scenarios: two single-dimension ones and a combination of both (a multi-dimension scenario) - to do so, the function grid.arrange() from the gridExtra package is used once again.

The following code snippets and plots cover the quality of representation (cos2) of the variables within the PCs:

prcomp()
barplot_1 <- fviz_cos2(all_pca_1, 
                       choice = "var", 
                       axes = 1) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_2 <- fviz_cos2(all_pca_1, 
                       choice = "var", 
                       axes = 2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_3 <- fviz_cos2(all_pca_1, 
                       choice = "var", 
                       axes = 1:2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)

princomp()
barplot_1 <- fviz_cos2(all_pca_2, 
                       choice = "var", 
                       axes = 1) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_2 <- fviz_cos2(all_pca_2, 
                       choice = "var", 
                       axes = 2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_3 <- fviz_cos2(all_pca_2, 
                       choice = "var", 
                       axes = 1:2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)

PCA()
barplot_1 <- fviz_cos2(all_pca_3, 
                       choice = "var", 
                       axes = 1) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_2 <- fviz_cos2(all_pca_3, 
                       choice = "var", 
                       axes = 2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_3 <- fviz_cos2(all_pca_3, 
                       choice = "var", 
                       axes = 1:2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)

dudi.pca()
barplot_1 <- fviz_cos2(all_pca_4, 
                       choice = "var", 
                       axes = 1) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_2 <- fviz_cos2(all_pca_4, 
                       choice = "var", 
                       axes = 2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_3 <- fviz_cos2(all_pca_4, 
                       choice = "var", 
                       axes = 1:2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)

epPCA()
barplot_1 <- fviz_cos2(all_pca_5, 
                       choice = "var", 
                       axes = 1) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_2 <- fviz_cos2(all_pca_5, 
                       choice = "var", 
                       axes = 2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_3 <- fviz_cos2(all_pca_5, 
                       choice = "var", 
                       axes = 1:2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)


Unsurprisingly, all of the resulting plots are identical - as should be. Let’s now evaluate the cos2 for the individuals:

prcomp()
barplot_1 <- fviz_cos2(all_pca_1, 
                       choice = "ind", 
                       axes = 1) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_2 <- fviz_cos2(all_pca_1, 
                       choice = "ind", 
                       axes = 2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_3 <- fviz_cos2(all_pca_1, 
                       choice = "ind", 
                       axes = 1:2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)

princomp()
barplot_1 <- fviz_cos2(all_pca_2, 
                       choice = "ind", 
                       axes = 1) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_2 <- fviz_cos2(all_pca_2, 
                       choice = "ind", 
                       axes = 2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_3 <- fviz_cos2(all_pca_2, 
                       choice = "ind", 
                       axes = 1:2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)

PCA()
barplot_1 <- fviz_cos2(all_pca_3, 
                       choice = "ind", 
                       axes = 1) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_2 <- fviz_cos2(all_pca_3, 
                       choice = "ind", 
                       axes = 2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_3 <- fviz_cos2(all_pca_3, 
                       choice = "ind", 
                       axes = 1:2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)

dudi.pca()
barplot_1 <- fviz_cos2(all_pca_4, 
                       choice = "ind", 
                       axes = 1) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_2 <- fviz_cos2(all_pca_4, 
                       choice = "ind", 
                       axes = 2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_3 <- fviz_cos2(all_pca_4, 
                       choice = "ind", 
                       axes = 1:2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)

epPCA()
barplot_1 <- fviz_cos2(all_pca_5, 
                       choice = "ind", 
                       axes = 1) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_2 <- fviz_cos2(all_pca_5, 
                       choice = "ind", 
                       axes = 2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

barplot_3 <- fviz_cos2(all_pca_5, 
                       choice = "ind", 
                       axes = 1:2) + 
  theme(text = element_text(size = 10),
        axis.text.x = element_text(angle = 90))

grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)

Once again, identical outputs - as should be. Note that, once again, the number of individuals makes it impossible to appreciate the X-axis’ labels (without an extreme zoom-in).

As was already stated, a more illustrative approach to visualizing a data array such as the contributions’ one or this one (with the quality of representations for both variables and individuals) can be achieved through the use of the corrplot() function from the corrplot package, which in this case showcases the quality of representation of every variable or individual within every Principal Component.

Let’s now observe the results of applying said function to the Wisconsin Breast Cancer Dataset. Note that the function’s very first argument corresponds to the cos2 results themselves (some previous functions used the PCA objects instead) - let’s evaluate the variables first:

prcomp()
corrplot(all_pca_var_1$cos2, is.corr=FALSE)

princomp()
corrplot(all_pca_var_2$cos2, is.corr=FALSE)

PCA()
corrplot(all_pca_var_3$cos2, is.corr=FALSE)

dudi.pca()
corrplot(as.matrix(all_pca_var_4$cos2), is.corr=FALSE)

epPCA()
corrplot(all_pca_var_5$cos2, is.corr=FALSE)

Once again, all of the resulting plots are identical - as should be. Note the use of the function as.matrix() within the dudi.pca() tab to transform once again the object of class data.frame to one of class matrix so that the function corrplot() can use it as an argument without issues.

Applying corrplot() upon the individuals’ cos2 would yield a massive plot due to the sheer amount of individuals. Given that, the following code snippets are not rendered in order to keep the document clean and readable.

prcomp()
corrplot(all_pca_ind_1$cos2, is.corr=FALSE)
princomp()
corrplot(all_pca_ind_2$cos2, is.corr=FALSE)
PCA()
corrplot(all_pca_ind_3$cos2, is.corr=FALSE)
dudi.pca()
corrplot(as.matrix(all_pca_ind_4$cos2), is.corr=FALSE)
epPCA()
corrplot(all_pca_ind_5$cos2, is.corr=FALSE)

2.2.10. Correlation Circle

As was stated previously, there is no well-accepted objective way to decide how many Principal Components are enough - this will depend on the specific field of application and the specific dataset (biomedical scenarios tend to require a high cumulative variance since people’s health is at stake). However, the first few Principal Components are the most important ones, both for finding interesting patterns in the data and for representing it.
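One common heuristic is to keep the smallest number of Principal Components whose cumulative explained variance reaches a chosen threshold. The sketch below illustrates it in base R; the 90% threshold is an arbitrary assumption, and the built-in USArrests dataset is used as a stand-in for the dataset at hand.

```r
# Stand-in data: the built-in USArrests dataset (not the WBCD)
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each PC, and its running total
explained  <- pca$sdev^2 / sum(pca$sdev^2)
cumulative <- cumsum(explained)

# Assumed threshold: keep enough PCs to explain at least 90% of the variance
threshold <- 0.90
n_keep <- which(cumulative >= threshold)[1]

round(cumulative, 3)
n_keep
```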

The correlation circle showcases the correlation between the original dataset features/variables and the Principal Components via coordinates within a 2D circle: the dimension with the most explained variance is the first Principal Component (PC1) and is plotted on the horizontal axis, whereas the second most explanatory dimension is the second Principal Component (PC2) and placed on the vertical axis; the original features/variables are then projected upon this bi-dimensional factor space.

“The observations are represented by their projections, but the variables are represented by their correlations.”
Abdi and Williams, 2010

The correlation circle makes it easy to visualize said correlations: if two given lines point in the same direction, their associated features/variables are highly correlated; if they are orthogonal, they are mostly unrelated; and if they point in opposite directions, they are negatively correlated.
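The link between coordinates and correlations can be verified numerically. The following sketch assumes standardized data and uses the built-in USArrests dataset as a stand-in; cos_angle() is a hypothetical helper written for this illustration.

```r
# Stand-in data: the built-in USArrests dataset, standardized beforehand
X   <- scale(USArrests)
pca <- prcomp(X)

# Variable coordinates: loadings scaled by each PC's standard deviation
var_coord <- sweep(pca$rotation, 2, pca$sdev, `*`)

# They match the correlations between the variables and the PC scores
var_cor <- cor(X, pca$x)
all.equal(var_coord, var_cor, check.attributes = FALSE)

# Hypothetical helper: cosine of the angle between two variable vectors
cos_angle <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Across all PCs the cosine equals the correlation between the two
# variables; on the PC1-PC2 plane it is only an approximation
cos_angle(var_coord["Murder", ], var_coord["Assault", ])
cor(USArrests$Murder, USArrests$Assault)
```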

Plotting correlation circles within R requires the use of the fviz_pca_var() function from the factoextra package.

# Do not run this code snippet, as it is only here for illustration purposes
library(factoextra)
fviz_pca_var(X,
             axes = c(1, 2),
             geom = c("arrow", "text"),
             geom.var = geom,
             repel = FALSE,
             col.var = "black",
             fill.var = "white",
             alpha.var = 1,
             col.quanti.sup = "blue",
             col.circle = "grey70",
             select.var = list(name = NULL, cos2 = NULL, contrib = NULL),
             gradient.cols = NULL,
             ...)
Let’s detail its arguments:
  • X: a PCA object.
  • axes: a numeric vector of length 2 specifying the dimensions to be plotted.
  • geom: a text specifying the geometry to be used for the graph - allowed values are the combination of c(“point”, “arrow”, “text”).
  • geom.var: as geom but for variables.
  • repel: TRUE or FALSE determines whether to use ggrepel to avoid overplotting text labels or not.
  • col.var: a color for variables.
  • fill.var: a fill color for variables.
  • alpha.var: controls the transparency of the variables’ colors.
  • col.quanti.sup: a color for the quantitative supplementary variables.
  • col.circle: a color for the correlation circle.
  • select.var: a selection of variables to be drawn.
  • gradient.cols: vector of colors to use for n-colour gradient. Allowed values include brewer and ggsci color palettes.

More information regarding the fviz_pca_var() function and all of its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/factoextra/versions/1.0.7/topics/fviz_pca

Let’s observe the results of applying the function at hand to the previously constructed PCA objects:

prcomp()
fviz_pca_var(all_pca_1, 
             repel = TRUE
             )

princomp()
fviz_pca_var(all_pca_2, 
             repel = TRUE
             )

PCA()
fviz_pca_var(all_pca_3, 
             repel = TRUE
             )

dudi.pca()
fviz_pca_var(all_pca_4, 
             repel = TRUE
             )

epPCA()
fviz_pca_var(all_pca_5, 
             repel = TRUE
             )

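Note that some of the resulting plots appear flipped or rotated with respect to the others: the sign of each Principal Component is arbitrary, so different functions may mirror an axis. Apart from that, the plots are identical - as should be.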

The function’s arguments allow the colors of the correlation circle to be based upon results for variables obtained through the get_pca_var() function, such as the contribution and quality of representation - the following code snippets showcase said examples (using fviz_pca_var() upon PCA()’s resulting object).

Contribution-based
fviz_pca_var(all_pca_3, 
             col.var = "contrib",
             repel = TRUE,
             gradient.cols = c("#FF0000", "#00FF00", "#0000FF")
             # The higher the contrib values, the closer to the last color (blue)
             )

Cos2-based
fviz_pca_var(all_pca_3, 
             col.var = "cos2",
             repel = TRUE,
             gradient.cols = c("#FF0000", "#00FF00", "#0000FF")
             # The higher the cos2 values, the closer to the last color (blue)
             )

It’s also possible to change the color of variables by groups defined by a qualitative/categorical variable, commonly known as a factor (and thus, factor-based coloring) - in a correlation circle, this can help to illustrate which groups of variables are highly correlated and which are not.

There are multiple approaches to creating appropriate clusters; this document and the following code snippets showcase the k-means clustering algorithm, which aims to partition the points (in this case, the variables) into “k” groups (hence the name) so that the sum of squares from the points to their assigned cluster centers is minimized. The function that performs this clustering is kmeans(), which belongs to R’s built-in stats package (like the prcomp() and princomp() functions previously detailed).

# Do not run this code snippet, as it is only here for illustration purposes
kmeans(x, 
       centers, 
       iter.max = 10, 
       nstart = 1,
       algorithm = c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen"), 
       trace=FALSE)
Let’s detail the function’s arguments:
  • x: a numeric matrix of data.
  • centers: “k” - the number of clusters (or a set of cluster centers).
  • iter.max: the maximum number of iterations allowed.
  • nstart: if centers is a number (meaning that it is not a set of cluster centers), this argument determines how many random sets of initial centers are tried.
  • algorithm: a character to determine the underlying algorithm of the k-means clustering - must be one of “Hartigan-Wong”, “Lloyd”, “Forgy” or “MacQueen”.
  • trace (only used in the default method, “Hartigan-Wong”): either a logical or an integer number - if positive (or true), tracing information on the progress of the algorithm is produced. Higher values may produce more tracing information.

More information about this function, its behavior and its arguments can be found in its associated RDocumentation page: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/kmeans
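As a quick illustration of the arguments above, the following sketch clusters the variable coordinates of a small PCA. It assumes the built-in mtcars dataset as a stand-in and an arbitrary k = 3; the seed value is also arbitrary and only there to make the run reproducible.

```r
# kmeans() starts from random centers, so the seed is fixed
# (the value 123 is arbitrary)
set.seed(123)

# Stand-in data: variable coordinates from a PCA of the built-in mtcars dataset
pca       <- prcomp(mtcars, center = TRUE, scale. = TRUE)
var_coord <- sweep(pca$rotation, 2, pca$sdev, `*`)

# Partition the 11 variables into k = 3 clusters, trying 25 random starts
# and keeping the solution with the lowest total within-cluster sum of squares
res <- kmeans(var_coord, centers = 3, nstart = 25)
grp <- as.factor(res$cluster)

table(grp)          # cluster sizes
res$tot.withinss    # total within-cluster sum of squares
```

The resulting factor grp has one entry per variable, which is the shape expected when a grouping is used to color variables.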

The following code snippets showcase this function’s use in order to create “k” different clusters (the different tabs are meant to illustrate different values of “k”), which are later used within fviz_pca_var() to create a correlation circle with factor-based coloring. Note that the main argument of kmeans() uses the coordinates from the results for variables previously obtained through get_pca_var() (also note that the object and variables at play are the ones obtained with the PCA() function).

k = 6
# Cluster Creation
res.all <- kmeans(all_pca_var_3$coord, centers = 6, nstart = 25)
grp <- as.factor(res.all$cluster)

# Correlation Circle
fviz_pca_var(all_pca_3, 
             col.var = grp,
             repel = TRUE,
             palette = "jco",
             legend.title = "Clusters")

k = 3
# Cluster Creation
res.all <- kmeans(all_pca_var_3$coord, centers = 3, nstart = 25)
grp <- as.factor(res.all$cluster)

# Correlation Circle
fviz_pca_var(all_pca_3, 
             col.var = grp,
             repel = TRUE,
             palette = "jco",
             legend.title = "Clusters")

k = 2
# Cluster Creation
res.all <- kmeans(all_pca_var_3$coord, centers = 2, nstart = 25)
grp <- as.factor(res.all$cluster)

# Correlation Circle
fviz_pca_var(all_pca_3, 
             col.var = grp,
             repel = TRUE,
             palette = "jco",
             legend.title = "Clusters")

2.2.11. Plot of individuals

Whereas the correlation circle showcases the correlation between the original dataset features/variables and the Principal Components through a projection upon a bi-dimensional factor space defined by the two most relevant PCs, the plot of individuals applies the same approach to the individuals themselves. As such, the plot of individuals is useful to easily visualize how similar the individuals are based on their position within the graph: individuals that lie close together have similar profiles, whereas distant individuals are mostly dissimilar (data-wise).
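Why position reflects similarity can be checked with distances: the PCA scores are a rotation of the standardized data, so distances between individuals are preserved across all PCs and only approximated on the PC1-PC2 plane. A minimal sketch, assuming the built-in USArrests dataset as a stand-in:

```r
# Stand-in data: the built-in USArrests dataset, standardized beforehand
X   <- scale(USArrests)
pca <- prcomp(X)

# Pairwise distances in the original (standardized) space and across all PCs
d_original <- dist(X)
d_all_pcs  <- dist(pca$x)
all.equal(as.vector(d_original), as.vector(d_all_pcs))

# Distances on the PC1-PC2 plane can only shrink with respect to the
# full space, but they usually remain strongly related to them
d_two_pcs <- dist(pca$x[, 1:2])
cor(as.vector(d_original), as.vector(d_two_pcs))
```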

Plotting the individuals within R requires the use of the fviz_pca_ind() function from the factoextra package.

# Do not run this code snippet, as it is only here for illustration purposes
library(factoextra)
fviz_pca_ind(X,
             axes = c(1, 2),
             geom = c("point", "text"),
             geom.ind = geom,
             repel = FALSE,
             habillage = "none",
             palette = NULL,
             addEllipses = FALSE,
             col.ind = "black",
             fill.ind = "white",
             col.ind.sup = "blue",
             alpha.ind = 1,
             select.ind = list(name = NULL, cos2 = NULL, contrib = NULL),
             ...)
Let’s detail its arguments:
  • X: a PCA object.
  • axes: a numeric vector of length 2 specifying the dimensions to be plotted.
  • geom: a text specifying the geometry to be used for the graph - allowed values are the combination of c(“point”, “arrow”, “text”).
  • geom.ind: as geom but for the individuals.
  • repel: TRUE or FALSE determines whether to use ggrepel to avoid overplotting text labels or not.
  • habillage: an optional factor variable for coloring the observations by groups.
  • palette: the color palette to be used for coloring or filling by groups.
  • addEllipses: TRUE or FALSE determines whether to draw ellipses around the individuals or not (only when habillage is not “none”).
  • col.ind: a color for individuals.
  • fill.ind: a fill color for individuals.
  • col.ind.sup: color for supplementary individuals.
  • alpha.ind: controls the transparency of individuals’ colors.
  • select.ind: a selection of individuals to be drawn.
  • gradient.cols: vector of colors to use for n-colour gradient. Allowed values include brewer and ggsci color palettes.

Note how the arguments are almost identical to those of the fviz_pca_var() function, albeit not exactly equal. More information regarding the fviz_pca_ind() function and all of its arguments is available in its associated RDocumentation page (which is the same as fviz_pca_var()‘s, holding both functions’ information): https://www.rdocumentation.org/packages/factoextra/versions/1.0.7/topics/fviz_pca

Let’s observe the results of applying the function at hand to the previously constructed PCA objects:

prcomp()
fviz_pca_ind(all_pca_1, 
             repel = TRUE
             )

princomp()
fviz_pca_ind(all_pca_2, 
             repel = TRUE
             )

PCA()
fviz_pca_ind(all_pca_3, 
             repel = TRUE
             )

dudi.pca()
fviz_pca_ind(all_pca_4, 
             repel = TRUE
             )

epPCA()
fviz_pca_ind(all_pca_5, 
             repel = TRUE
             )

Once again, the resulting plots are identical - as should be.

The function’s arguments allow the colors of the plot of individuals to be based upon the results for individuals obtained through the get_pca_ind() function, such as the contribution and quality of representation - the following code snippets showcase said examples (using fviz_pca_ind() upon PCA()’s resulting object).

Contribution-based
fviz_pca_ind(all_pca_3, 
             col.ind = "contrib",
             repel = TRUE,
             gradient.cols = c("#FF0000", "#00FF00", "#0000FF"),
             # The higher the contrib values, the closer to the last color (blue)
             label = "none" # hide individual labels - no sensible information
             )

Cos2-based
fviz_pca_ind(all_pca_3, 
             col.ind = "cos2",
             repel = TRUE,
             gradient.cols = c("#FF0000", "#00FF00", "#0000FF"),
             # The higher the cos2 values, the closer to the last color (blue)
             label = "none" # hide individual labels - no sensible information
             )

It’s also possible to change the color of individuals by groups defined by a qualitative/categorical variable, commonly known as a factor (and thus, factor-based coloring) - in a plot of individuals, this can help to illustrate which individuals are similar and which are not.

As was the case with the variables, the clusters that define the factors are determined through the use of the kmeans() function. The following code snippets showcase this function’s use in order to create “k” different clusters (the different tabs are meant to illustrate different values of “k”), which are later used within fviz_pca_ind() to create a plot of individuals with factor-based coloring. Note that the main argument of kmeans() uses the coordinates from the results for individuals previously obtained through get_pca_ind() (also note that the object and individuals at play are the ones obtained with the PCA() function).

k = 6
# Cluster Creation
res.all <- kmeans(all_pca_ind_3$coord, centers = 6, nstart = 25)
grp <- as.factor(res.all$cluster)

# Plot of individuals
fviz_pca_ind(all_pca_3, 
             col.ind = grp,
             repel = TRUE,
             palette = "jco",
             legend.title = "Clusters",
             label = "none" # hide individual labels - no sensible information
             )

k = 3
# Cluster Creation
res.all <- kmeans(all_pca_ind_3$coord, centers = 3, nstart = 25)
grp <- as.factor(res.all$cluster)

# Plot of individuals
fviz_pca_ind(all_pca_3, 
             col.ind = grp,
             repel = TRUE,
             palette = "jco",
             legend.title = "Clusters",
             label = "none" # hide individual labels - no sensible information
             )

k = 2
# Cluster Creation
res.all <- kmeans(all_pca_ind_3$coord, centers = 2, nstart = 25)
grp <- as.factor(res.all$cluster)

# Plot of individuals
fviz_pca_ind(all_pca_3, 
             col.ind = grp,
             repel = TRUE,
             palette = "jco",
             legend.title = "Clusters",
             label = "none" # hide individual labels - no sensible information
             )

Adding ellipses helps to visualize the clusters - the argument addEllipses controls that with a boolean, as previously stated and as shown in the following code snippets.

k = 6
# Cluster Creation
res.all <- kmeans(all_pca_ind_3$coord, centers = 6, nstart = 25)
grp <- as.factor(res.all$cluster)

# Plot of individuals
fviz_pca_ind(all_pca_3, 
             col.ind = grp,
             repel = TRUE,
             palette = "jco",
             addEllipses = TRUE,
             legend.title = "Clusters",
             label = "none" # hide individual labels - no sensible information
             )

k = 3
# Cluster Creation
res.all <- kmeans(all_pca_ind_3$coord, centers = 3, nstart = 25)
grp <- as.factor(res.all$cluster)

# Plot of individuals
fviz_pca_ind(all_pca_3, 
             col.ind = grp,
             repel = TRUE,
             palette = "jco",
             addEllipses = TRUE,
             legend.title = "Clusters",
             label = "none" # hide individual labels - no sensible information
             )

k = 2
# Cluster Creation
res.all <- kmeans(all_pca_ind_3$coord, centers = 2, nstart = 25)
grp <- as.factor(res.all$cluster)

# Plot of individuals
fviz_pca_ind(all_pca_3, 
             col.ind = grp,
             repel = TRUE,
             palette = "jco",
             addEllipses = TRUE,
             legend.title = "Clusters",
             label = "none" # hide individual labels - no sensible information
             )

2.2.12. Biplot

Like the correlation circle and the plot of individuals, the biplot is a graphing method which approximates the multi-dimensional dataset (along with its data points) on a bi-dimensional plane defined by the two most relevant Principal Components; in fact, a biplot is a combination of both of those plots within a single graph and, as such, it is drawn with functions from the factoextra package: fviz_pca_biplot() and fviz_pca(), whose behavior is identical (one is but an alias of the other).

# Do not run this code snippet, as it is only here for illustration purposes
library(factoextra)
fviz_pca(X, ...)
fviz_pca_biplot(X,
                axes = c(1, 2),
                geom = c("point", "text"),
                geom.ind = geom,
                geom.var = c("arrow", "text"),
                col.ind = "black",
                fill.ind = "white",
                col.var = "steelblue",
                fill.var = "white",
                gradient.cols = NULL,
                label = "all",
                invisible = "none",
                repel = FALSE,
                habillage = "none",
                palette = NULL,
                addEllipses = FALSE,
                title = "PCA - Biplot",
                ...)

It can be observed that its arguments are a combination of those available for the fviz_pca_var() and fviz_pca_ind() functions - there are arguments to tweak either, which makes sense given that the biplot itself is a combination of the correlation circle (for variables) and the plot of individuals. As such, it is also possible to create a biplot based upon the results obtained through get_pca_var() and get_pca_ind(). The following code snippets showcase that: the first tab is the most basic biplot (a colorless one) whereas the second and third tabs illustrate a biplot based upon contributions and quality of representation (cos2) respectively (using fviz_pca_biplot() upon PCA()’s resulting object).

Colorless
fviz_pca_biplot(all_pca_3, 
                col.ind = wbcd$diagnosis, 
                col="black",
                palette = "jco",
                geom = "point",
                repel=TRUE,
                legend.title="Diagnosis", 
                addEllipses = TRUE)

Contribution-based
fviz_pca_biplot(all_pca_3, 
                col.ind = wbcd$diagnosis, 
                col="black",
                palette = "jco",
                geom = "point",
                repel=TRUE,
                legend.title="Diagnosis", 
                addEllipses = TRUE)

Cos2-based
fviz_pca_biplot(all_pca_3, 
                col.ind = wbcd$diagnosis, 
                col="black",
                palette = "jco",
                geom = "point",
                repel=TRUE,
                legend.title="Diagnosis", 
                addEllipses = TRUE)

It is worth noting that the RDocumentation page documenting both fviz_pca_var() and fviz_pca_ind() also details the biplot functions at hand, so any additional information regarding these functions, their behavior and their arguments is available at https://www.rdocumentation.org/packages/factoextra/versions/1.0.7/topics/fviz_pca

Let’s observe the results of applying the function fviz_pca_biplot() to the previously constructed PCA objects.

3. Machine Learning

Machine learning is a branch/subset of artificial intelligence (AI) and an important component of the growing field of data science. It mimics the way humans learn by using algorithms that improve their accuracy through training: a process that loops through the sampled data, contrasting guesses with real values to evaluate the algorithm’s accuracy, so that it can develop a statistical model which maximizes said accuracy and best fits the supplied data. These models can be used in classification and regression-based scenarios (to predict discrete classes/factors and continuous values, respectively), as they uncover key insights and relationships within the data that are hidden to the human eye and could take years to grasp.

This chapter aims to apply a variety of machine learning approaches to the dataset at hand so that the algorithms can determine whether any given patient’s cancer is benign or malignant. The goals of this classification exercise include the exploration of the machine learning approaches, their application within R and a comparison of their accuracy in order to choose the one that best fits this scenario.

3.1. Machine Learning Algorithms

Machine learning algorithms are often categorized as either supervised learning or unsupervised learning. The former, as the name suggests, requires a supervisor/user that feeds the algorithm with well-labeled data so that the algorithm can learn/train, whereas the latter (unsupervised) is fed information that is neither classified nor labeled, allowing the algorithm to act without guidance and to group such information in clusters according to the similarities, patterns and differences found in the data. Note that due to the nature of this clustering process, unsupervised machine learning approaches are rarely seen outside of classification scenarios, but since this exercise is of such kind it works well with both supervised and unsupervised algorithms.

The very first step is to divide the dataset at hand into two subsets: a training one, which will be used to train/educate the algorithm, and a testing one, which will be used to evaluate the algorithm’s effectiveness. This subdivision can be achieved using the R built-in functions nrow() and sample() to randomly select row indexes from within the dataset and thus create both the training and the testing sets pseudo-randomly.

train_index <- sample(1:nrow(wbcd), 0.7*nrow(wbcd))
train_set <- wbcd[train_index,]
test_set <- wbcd[-train_index,]

dim(wbcd) # 569 observations in the full dataset
## [1] 569  31
dim(train_set) # 398/569 observations in the training set (70%)
## [1] 398  31
dim(test_set) # 171/569 observations in the testing set (30%)
## [1] 171  31
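
Because sample() draws pseudo-randomly, each run yields a different split unless the random seed is fixed. The following minimal sketch uses a hypothetical row count in place of the wbcd data frame and an arbitrary seed value to illustrate how set.seed() makes the split reproducible:

```r
# Hypothetical row count standing in for nrow(wbcd)
n_rows <- 569
train_size <- floor(0.7 * n_rows) # make the number of sampled rows a whole number

set.seed(123) # arbitrary seed, chosen only for illustration
first_draw <- sample(seq_len(n_rows), train_size)

set.seed(123) # resetting the same seed reproduces the same draw
second_draw <- sample(seq_len(n_rows), train_size)

identical(first_draw, second_draw) # TRUE when both index vectors match
length(first_draw)                 # number of rows in the training set
```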

The previous code snippet showcases the construction of the training and testing sets, although built-in R functions are not the usual/preferred approach. The caret package (short for Classification And Regression Training) contains functions to streamline many machine learning tasks and, as such, it is one of the most popular libraries for the matter. Among its many functions lies createDataPartition(), which divides the working dataset into the training and testing subsets while keeping the class ratio approximately constant within each set. That means that if the dataset is divided into a training set that holds 70% of the data and a testing set that holds the remaining 30%, then each will have roughly the same factor distribution (from which the algorithm will learn to classify) as the original dataset. This avoids unfavorable scenarios where the training set does not contain enough items of a given factor for the algorithm to develop a fitting model.

# Do not run this code snippet, as it is only here for illustration purposes
library(caret)
createDataPartition(
  y,
  times = 1,
  p = 0.5,
  list = TRUE,
  groups = min(5, length(y))
)
The arguments of the function are as follows:
  • y: a vector of outcomes.
  • times: the number of partitions to create.
  • p: the percentage of data that goes to training.
  • list: whether to hold the results within a list or within a matrix.
  • groups: if y (the vector of outcomes) is numerical, then this argument defines the number of breaks in the quantiles.

Note that, as opposed to the previous approach, the main argument of the function does not ask for the dataset and uses a vector of outcomes instead (in the case of this exercise said vector of outcomes is the diagnosis array). More information about the function, its behavior and its arguments can be found in its associated RDocumentation page: https://www.rdocumentation.org/packages/caret/versions/6.0-90/topics/createDataPartition
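
To make the idea of a class-preserving split more tangible, here is a minimal base R sketch of stratified sampling. It is not caret's implementation: the toy outcome vector, its class counts, the seed and the choice of rounding up within each class are all assumptions made for illustration.

```r
set.seed(42) # arbitrary seed for reproducibility

# Toy outcome vector (hypothetical class counts, not read from wbcd)
diagnosis <- factor(rep(c("Benign", "Malignant"), times = c(357, 212)))

# Sample 70% of the row indexes within each class separately
rows_by_class <- split(seq_along(diagnosis), diagnosis)
train_index <- unlist(
  lapply(rows_by_class, function(rows) sample(rows, ceiling(0.7 * length(rows)))),
  use.names = FALSE
)

# Class proportions in the full vector and in the stratified training subset
full_ratio <- prop.table(table(diagnosis))
train_ratio <- prop.table(table(diagnosis[train_index]))
round(rbind(full = full_ratio, train = train_ratio), 3)
```

Sampling inside each class is what keeps the two rows of the resulting table nearly identical, which a plain sample() over all rows does not guarantee.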

The following code snippet showcases the subset construction using the createDataPartition() function. Note that the balance is 70% for the training set and 30% for the testing one, which is a commonly used ratio for the train-test split (and in fact such was the ratio previously used with the R built-in functions).

library(caret)
train_index <- createDataPartition(wbcd$diagnosis, times = 1, p = 0.7, list = FALSE)
train_set <- wbcd[train_index,]
test_set <- wbcd[-train_index,]

dim(wbcd) # 569 observations in the full dataset
## [1] 569  31
dim(train_set) # 399/569 observations in the training set (~70%)
## [1] 399  31
dim(test_set) # 170/569 observations in the testing set (~30%)
## [1] 170  31

Once the training and testing sets are constructed, the next step is to apply the machine learning algorithm of choice. Many of these algorithms are covered within this chapter, and every single one of them deserves a document of its own detailing the intricacies of its behavior and inner workings. Such a task goes beyond the scope of this project, although each of these machine learning approaches is briefly described.

3.1.1. Linear Regression

Linear regression can be considered a machine learning algorithm despite its simplicity and rigidity (which make it unreliable in most cases). It works rather well under certain scenarios and can be used to obtain a quick reference given the algorithm’s speed (processing-wise it is less demanding than most other machine learning approaches).

The R built-in stats package already bundles a set of functions to build linear regression models, namely lm() and glm(). Let’s detail the former, which is the one related to linear regression (the latter corresponds to logistic regression, which will be detailed later on):

# Do not run this code snippet, as it is only here for illustration purposes
library(stats)
lm(formula, 
   data,
   subset,
   weights,
   na.action,
   method = "qr",
   model = TRUE,
   x = FALSE,
   y = FALSE,
   qr = TRUE,
   singular.ok = TRUE,
   contrasts = NULL,
   offset,
   ...
)
Let’s detail its arguments:
  • formula: an object of class formula (or one that can be coerced to that class).
  • data: the data frame containing the variables in the model.
  • subset: an optional vector specifying a subset of observations to be used in the fitting process.
  • weights: an optional vector of weights to be used in the fitting process.
  • na.action: a function which indicates what should happen when the data contain NA.
  • method: the method to be used for fitting, although at the time of writing only method = “qr” is supported (method = “model.frame” returns the model frame without fitting).
  • model, x, y, qr: TRUE or FALSE determines whether to return the corresponding components of the fit (the model frame, the model matrix, the response, the QR decomposition respectively).
  • singular.ok: if FALSE a singular fit yields an error.
  • contrasts: an optional list of contrasts to use with the model matrix.
  • offset: this can be used to specify an a priori known component to be included in the linear predictor during fitting.

More information regarding the lm() function and all of its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/lm

After feeding the function with the proper information, a model is built:

library(stats)
model_linear <- lm(diagnosis ~ ., data=train_set)

Once the model is built, the function predict() is used to apply the constructed model upon the testing subset’s features in order to classify the set’s items. Note that the predicted data needs to be properly factored, for which one can use the factor() function along with a cutoff value (in this case the mean predicted value is used) to tag the guesses.
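
The cutoff step can be sketched in isolation. The example below uses the built-in mtcars data as a stand-in (its am column is a 0/1 indicator of transmission type), so the model, the predictors and the label names are illustrative assumptions rather than part of this project's analysis.

```r
# Built-in mtcars data as a stand-in: am is 0 (automatic) or 1 (manual)
model_toy <- lm(am ~ wt + hp, data = mtcars)
scores <- predict(model_toy, mtcars) # continuous values, not class labels

# Tag each score using a cutoff; here the mean score, as described in the text
cutoff <- mean(scores)
labels_toy <- factor(ifelse(scores > cutoff, "Manual", "Automatic"),
                     levels = c("Automatic", "Manual"))

# Compare the tagged guesses with the real values
table(predicted = labels_toy, actual = mtcars$am)
```

Fixing the levels explicitly keeps the factor consistent even if one of the classes happens not to be predicted at all.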

The function confusionMatrix() from the caret package can be used to visualize the accuracy/success of the algorithm. Confusion matrices are widely used in the data science and machine learning fields since they showcase said accuracy/success through various metrics in an easy-to-understand table. The following code snippet illustrates the use of these functions (predict() and confusionMatrix()), both of which are core functions within the data science and machine learning fields.

library(caret)
prediction_linear <- predict(model_linear, test_set)
prediction_linear <- factor(ifelse(prediction_linear > mean(prediction_linear), "Malignant", "Benign"))
cm_linear <- confusionMatrix(prediction_linear, test_set$diagnosis)
cm_linear
## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  Benign Malignant
##   Benign       106         5
##   Malignant      1        58
##                                           
##                Accuracy : 0.9647          
##                  95% CI : (0.9248, 0.9869)
##     No Information Rate : 0.6294          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.9233          
##                                           
##  Mcnemar's Test P-Value : 0.2207          
##                                           
##             Sensitivity : 0.9907          
##             Specificity : 0.9206          
##          Pos Pred Value : 0.9550          
##          Neg Pred Value : 0.9831          
##              Prevalence : 0.6294          
##          Detection Rate : 0.6235          
##    Detection Prevalence : 0.6529          
##       Balanced Accuracy : 0.9556          
##                                           
##        'Positive' Class : Benign          
## 

There are numerous metrics to measure a machine learning algorithm’s accuracy/success, but it is practical/useful to have a one number summary. The overall accuracy can be misleading since it does not account for the sensitivity and specificity of the model, so alternatives such as the balanced accuracy (the average of specificity and sensitivity) or the F1 score (the harmonic mean of precision and recall) are preferred.
Accessing these values from the confusion matrix can be achieved using $byClass, as showcased in the following code snippet:

acc_linear <- cm_linear$byClass['Balanced Accuracy']
F1_linear <- cm_linear$byClass['F1']
print(c(acc_linear, F1_linear))
## Balanced Accuracy                F1 
##         0.9556446         0.9724771
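
As a sanity check, both summary metrics can be recomputed by hand from the four cells of the linear model's confusion matrix shown above. The variable names below are introduced only for this worked example.

```r
# Counts copied from the confusion matrix above ('Positive' class: Benign)
tp <- 106 # predicted Benign, actually Benign
fn <- 1   # predicted Malignant, actually Benign
fp <- 5   # predicted Benign, actually Malignant
tn <- 58  # predicted Malignant, actually Malignant

sensitivity <- tp / (tp + fn) # also known as recall
specificity <- tn / (tn + fp)
precision   <- tp / (tp + fp)

balanced_accuracy <- (sensitivity + specificity) / 2
f1 <- 2 * precision * sensitivity / (precision + sensitivity) # harmonic mean

round(c(balanced_accuracy = balanced_accuracy, f1 = f1), 7)
```

The results agree with the 0.9556446 and 0.9724771 reported by $byClass.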

3.1.2. Logistic Regression

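A slightly more advanced version of the linear model just described is found within logistic regression. Rather than fitting a straight line to the outcome, it models the probability of belonging to a class through a logistic (S-shaped) curve, which makes it better suited to binary outcomes such as the one at hand.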

As was the case with the lm() function, the function glm() comes from the built-in stats package, meaning that no additional libraries need to be imported.

# Do not run this code snippet, as it is only here for illustration purposes
library(stats)
glm(formula,
    data,
    family = gaussian,
    weights,
    subset,
    na.action,
    start = NULL,
    etastart,
    mustart,
    offset,
    control = list(...),
    method = "glm.fit",
    model = TRUE,
    x = FALSE,
    y = TRUE,
    singular.ok = TRUE,
    contrasts = NULL,
    ...
)
Let’s detail its arguments:
  • formula: an object of class formula (or one that can be coerced to that class).
  • data: the data frame containing the variables in the model.
  • family: a description of the error distribution and link function to be used in the model (e.g. a character string naming a family function, a family function or the result of a call to a family function).
  • weights: an optional vector of weights to be used in the fitting process.
  • subset: an optional vector specifying a subset of observations to be used in the fitting process.
  • na.action: a function which indicates what should happen when the data contain NA.
  • start: starting values for the parameters in the linear predictor.
  • etastart: starting values for the linear predictor.
  • mustart: starting values for the vector of means.
  • offset: this can be used to specify an a priori known component to be included in the linear predictor during fitting.
  • control: a list of parameters for controlling the fitting process.
  • method: the method to be used for fitting. The default method “glm.fit” uses iteratively reweighted least squares (IWLS) whereas the alternative “model.frame” returns the model frame and does no fitting.
  • model: a logical value indicating whether model frame should be included as a component of the returned value.
  • x, y: logical values indicating whether the response vector and model matrix used in the fitting process should be returned as components of the returned value.
  • singular.ok: if FALSE a singular fit yields an error.
  • contrasts: an optional list of contrasts to use with the model matrix.

More information regarding the glm() function and all of its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/glm

The following code snippet showcases the construction of the model, its associated prediction and confusion matrix and the most relevant one number summary metrics:

library(stats)
library(caret)

model_logistic <- glm(diagnosis ~ ., data=train_set, family = "binomial")
prediction_logistic <- predict(model_logistic, test_set)
prediction_logistic <- factor(ifelse(prediction_logistic > mean(prediction_logistic), "Malignant", "Benign"))
cm_logistic <- confusionMatrix(prediction_logistic, test_set$diagnosis)
cm_logistic
## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  Benign Malignant
##   Benign       104         7
##   Malignant      3        56
##                                           
##                Accuracy : 0.9412          
##                  95% CI : (0.8945, 0.9714)
##     No Information Rate : 0.6294          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.8722          
##                                           
##  Mcnemar's Test P-Value : 0.3428          
##                                           
##             Sensitivity : 0.9720          
##             Specificity : 0.8889          
##          Pos Pred Value : 0.9369          
##          Neg Pred Value : 0.9492          
##              Prevalence : 0.6294          
##          Detection Rate : 0.6118          
##    Detection Prevalence : 0.6529          
##       Balanced Accuracy : 0.9304          
##                                           
##        'Positive' Class : Benign          
## 

acc_logistic <- cm_logistic$byClass['Balanced Accuracy']
F1_logistic <- cm_logistic$byClass['F1']
print(c(acc_logistic, F1_logistic))
## Balanced Accuracy                F1 
##         0.9304258         0.9541284
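
By default, predict() on a glm object returns values on the scale of the linear predictor (log-odds) rather than probabilities. The sketch below uses the built-in mtcars data as a stand-in and a 0.5 probability cutoff (both assumptions, not the approach used above) to show how type = "response" yields probabilities that can then be tagged:

```r
# Built-in mtcars data as a stand-in: am is 0 (automatic) or 1 (manual)
model_toy <- glm(am ~ wt, data = mtcars, family = binomial)

log_odds <- predict(model_toy, mtcars)                        # default: type = "link"
probabilities <- predict(model_toy, mtcars, type = "response") # values between 0 and 1

# The logistic function (plogis) maps log-odds onto probabilities
all.equal(plogis(log_odds), probabilities)

labels_toy <- factor(ifelse(probabilities > 0.5, "Manual", "Automatic"),
                     levels = c("Automatic", "Manual"))
table(predicted = labels_toy, actual = mtcars$am)
```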

3.1.3. C5.0

The C50 package contains an interface to the C5.0 classification model. More information regarding this package, its functions and their inner workings can be found in https://cran.r-project.org/web/packages/C50/vignettes/C5.0.html

The most important note about the package is that it uses decision trees as its core. Decision trees can be used to visually and explicitly represent decisions and decision making through, as the name implies, a tree-like model. Visually speaking, these trees are drawn upside down with their root at the top and branch through conditionals as they are read downwards - the end of a branch is known as the “leaf” and represents the algorithm’s decision.
Decision trees are used in machine learning for both classification and regression scenarios: classification trees predict the class/factor of an item given a set of features, whereas regression trees behave in the same manner but predict continuous values instead. The C50 package is built around the C5.0() function, which fits a classification tree model upon the dataset in order to train the algorithm so that the tree model learns which features to choose and what conditions to use for splitting/branching (by constantly looping through the constructed tree and comparing the obtained hypothetical results with the real ones provided by the training set).
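
To give a feel for how a tree might decide where to split, here is a minimal base R sketch of one common entropy-based criterion (information gain), in the spirit of the C4.5/C5.0 family. It is not the package's actual implementation; the function names and the toy values are assumptions made for illustration.

```r
# Shannon entropy (in bits) of a vector of class labels
entropy <- function(labels) {
  if (length(labels) == 0) return(0) # an empty branch carries no uncertainty
  p <- as.numeric(table(labels)) / length(labels)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Information gain obtained by splitting a numeric feature at a threshold
info_gain <- function(feature, labels, threshold) {
  left <- labels[feature <= threshold]
  right <- labels[feature > threshold]
  weighted_child_entropy <- (length(left) * entropy(left) +
                             length(right) * entropy(right)) / length(labels)
  entropy(labels) - weighted_child_entropy
}

# Toy feature and labels (made-up values for illustration only)
radius <- c(1, 2, 3, 4, 5, 6, 7, 8)
diagnosis <- c(rep("Benign", 4), rep("Malignant", 4))

gain_good <- info_gain(radius, diagnosis, threshold = 4.5) # separates the classes perfectly
gain_poor <- info_gain(radius, diagnosis, threshold = 2.5) # leaves a mixed right branch
c(good = gain_good, poor = gain_poor)
```

A tree-building algorithm would evaluate many candidate features and thresholds in this way and branch on the one with the highest score.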

# Do not run this code snippet, as it is only here for illustration purposes
library(C50)
C5.0(
  x,
  y,
  trials = 1,
  rules = FALSE,
  weights = NULL,
  control = C5.0Control(),
  costs = NULL,
  ...
)
Let’s detail its arguments:
  • x: the predictors (the data features required for the algorithm so that it can develop the decision tree).
  • y: a factor vector with 2 or more levels; note that the C5.0() function fits a classification tree, and this argument clearly shows that the function cannot fit a regression model since it only accepts factors as input.
  • trials: an integer specifying the number of boosting iterations. A value of one indicates that a single model is used.
  • rules: a logical determining whether or not the tree should be decomposed into a rule-based model.
  • weights: an optional numeric vector of case weights.
  • control: a list of control parameters (see https://www.rdocumentation.org/packages/C50/versions/0.1.5/topics/C5.0Control)
  • costs: a matrix of costs associated with the possible errors.

More information regarding the C5.0() function and all of its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/C50/versions/0.1.5/topics/C5.0.default

The following code snippet showcases the construction of the model, its associated prediction and confusion matrix and the most relevant one number summary metrics:

library(C50)
library(caret)

model_C50 <- C5.0(train_set[,-1], train_set$diagnosis)
prediction_C50 <- predict(model_C50, test_set[,-1])
cm_C50 <- confusionMatrix(prediction_C50, test_set$diagnosis)
cm_C50
## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  Benign Malignant
##   Benign       104         5
##   Malignant      3        58
##                                           
##                Accuracy : 0.9529          
##                  95% CI : (0.9094, 0.9795)
##     No Information Rate : 0.6294          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.8985          
##                                           
##  Mcnemar's Test P-Value : 0.7237          
##                                           
##             Sensitivity : 0.9720          
##             Specificity : 0.9206          
##          Pos Pred Value : 0.9541          
##          Neg Pred Value : 0.9508          
##              Prevalence : 0.6294          
##          Detection Rate : 0.6118          
##    Detection Prevalence : 0.6412          
##       Balanced Accuracy : 0.9463          
##                                           
##        'Positive' Class : Benign          
## 

acc_C50 <- cm_C50$byClass['Balanced Accuracy']
F1_C50 <- cm_C50$byClass['F1']
print(c(acc_C50, F1_C50))
## Balanced Accuracy                F1 
##         0.9462988         0.9629630

Note that the prediction metrics can be improved by changing the arguments passed to the function, namely the trials input. The following code snippet showcases a loop that records the balanced accuracy and F1 score obtained with various trials values (between 1 and 50) so that they can be compared. This process is usually known as tuning, and consists of tweaking the model so that it works better with the data it is being fed and a higher accuracy can be achieved.

acc_C50_array <- NULL
F1_C50_array <- NULL

for(i in 1:50){
    model_C50_temp <- C5.0(train_set[,-1], train_set$diagnosis, trials = i)      
    prediction_C50_temp <- predict(model_C50_temp, test_set[,-1]) 
    cm_C50_temp <- confusionMatrix(prediction_C50_temp, test_set$diagnosis)
    acc_C50_array[i] <- cm_C50_temp$byClass['Balanced Accuracy']
    F1_C50_array[i] <- cm_C50_temp$byClass['F1']
}

acc_C50_df <- data.frame(trials = seq(1,50), acc = acc_C50_array)
acc_C50_optimal <- subset(acc_C50_df, acc == max(acc))[1,]

F1_C50_df <- data.frame(trials = seq(1,50), F1 = F1_C50_array)
F1_C50_optimal <- subset(F1_C50_df, F1 == max(F1))[1,]

print(c(acc_C50_optimal, F1_C50_optimal)) # The optimal trials value can differ between the two metrics, as is the case here
## $trials
## [1] 39
## 
## $acc
## [1] 0.9827177
## 
## $trials
## [1] 5
## 
## $F1
## [1] 0.9861751

tuning_C50_df <- data.frame(trials = seq(1,50), success = 0.5 * (acc_C50_array + F1_C50_array)) # We average F1 and balanced accuracy to measure success
library(dplyr) # For the mutate() function used to add balanced accuracy and F1 values to the dataframe
tuning_C50 <- subset(tuning_C50_df, success == max(success))[1,] %>% mutate(acc = acc_C50_array[trials], F1 = F1_C50_array[trials]) # Look up both metrics at the optimal trials value
print(tuning_C50)
##    trials   success       acc        F1
## 39     39 0.9843166 0.9827177 0.9859155
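
The subset(... == max(...))[1,] idiom used above can also be written with which.max(), which returns the position of the first maximum. A small sketch with made-up scores (hypothetical values, not results from this dataset):

```r
# Hypothetical success scores for trials = 1..5 (made-up numbers)
scores_df <- data.frame(trials = 1:5,
                        success = c(0.95, 0.97, 0.96, 0.97, 0.94))

# which.max() returns the index of the first maximum,
# so ties resolve to the smallest trials value
best_row <- scores_df[which.max(scores_df$success), ]
best_row
```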

Through tuning, a higher accuracy/success can be achieved. Using a value of 39 for the algorithm trials increases the balanced accuracy from 0.9462988 to 0.9827177 and the F1 score from 0.962963 to 0.9859155, which is a considerable improvement upon the previous results. To see how the number of trials affects the accuracy values a graph can be plotted - for illustration purposes, said plot is drawn with two different libraries: highcharter and ggplot2.

highcharter
sub_C50 <- paste("Optimal number of trials is", tuning_C50$trials, "with an averaged success (balanced accuracy and F1 score) of ", tuning_C50$success)

library(highcharter)
hchart(tuning_C50_df, 'line', hcaes(trials, success)) %>%
  hc_title(text = "Averaged success with varying trials (C5.0)") %>%
  hc_subtitle(text = sub_C50) %>%
  hc_add_theme(hc_theme_google()) %>%
  hc_xAxis(title = list(text = "Number of trials")) %>%
  hc_yAxis(title = list(text = "Averaged success"))
ggplot2
sub_C50 <- paste("Optimal number of trials is", tuning_C50$trials, "with an averaged success (balanced accuracy and F1 score) of ", tuning_C50$success)

library(ggplot2)
ggplot(tuning_C50_df, aes(trials, success)) + 
  geom_line() + 
  geom_point() + theme_minimal() + 
  labs(title = "Averaged success with varying trials (C5.0)",
       subtitle = sub_C50,
       x = "Number of trials",
       y = "Averaged success")

3.1.4. rpart

The rpart package makes it easy to build classification or regression models with a very general structure (meaning that they can be applied in multiple cases/scenarios) using a two-stage procedure that follows a decision tree behavior (already detailed within the C5.0 section). More information regarding this package can be found in its associated RDocumentation page: https://www.rdocumentation.org/packages/rpart

The core of the package lies within its main function: rpart().

# Do not run this code snippet, as it is only here for illustration purposes
library(rpart)
rpart(formula, 
      data,
      weights,
      subset,
      na.action = na.rpart,
      method,
      model = FALSE,
      x = FALSE, 
      y = TRUE, 
      parms, 
      control, 
      cost,
      ...)
The function’s arguments are as follows:
  • formula: the model formula (should look like outcome ~ predictors, e.g. diagnosis ~ .).
  • data: an optional data frame in which to interpret the variables named in the formula.
  • weights: optional case weights.
  • subset: optional expression saying that only a subset of the rows of the data should be used in the fit.
  • na.action: the default action deletes all observations for which y is missing, but keeps those in which one or more predictors are missing.
  • method: one of “anova”, “poisson”, “class” or “exp”. If method is missing then the routine tries to make an intelligent guess, which is one of the strengths of this function/package:
    • If y is a survival object, then method = “exp” is assumed.
    • If y has 2 columns, then method = “poisson” is assumed.
    • If y is a factor, then method = “class” is assumed.
    • Otherwise method = “anova” is assumed.
  • model: can be a boolean, where TRUE or FALSE determine whether to keep a copy of the model frame in the result or not; can also be a model frame, in which case said frame is used rather than constructing new data.
  • x: TRUE or FALSE determine whether to keep a copy of the x matrix in the result or not.
  • y: TRUE or FALSE determine whether to keep a copy of the dependent variable in the result or not. If missing and model is supplied this defaults to FALSE.
  • parms: optional parameters for the splitting function.
    • Anova splitting has no parameters.
    • Poisson splitting has a single parameter, the coefficient of variation of the prior distribution on the rates (default is 1).
    • Exponential splitting has the same parameter as Poisson.
    • For classification splitting, the list can contain any of: the vector of prior probabilities (component prior), the loss matrix (component loss) or the splitting index (component split).
      • The priors must be positive and sum to 1. Note that the default priors are proportional to the data counts.
      • The loss matrix must have zeros on the diagonal and positive off-diagonal elements. Note that the losses default to 1.
      • The splitting index can be gini or information.
  • control: the function accepts additional arguments that can be passed to rpart.control; see https://www.rdocumentation.org/packages/rpart/versions/4.1-15/topics/rpart.control
  • cost: a vector of non-negative costs, one for each variable in the model (defaults to one for all variables). These are scalings to be applied when considering splits, so the improvement on splitting on a variable is divided by its cost in deciding which split to choose.

More information about this function, its behavior and its arguments can be found in its associated RDocumentation page: https://www.rdocumentation.org/packages/rpart/versions/4.1-15/topics/rpart

Although the function allows a considerable amount of customization, the following code snippet showcases its use with the default arguments, along with the prediction, confusion matrix and relevant one number summary metrics:

library(rpart)
library(caret)
model_rpart <- rpart(diagnosis ~ ., data = train_set)
prediction_rpart <- predict(model_rpart, test_set[,-1], type = "class")
cm_rpart <- confusionMatrix(prediction_rpart, test_set$diagnosis)
cm_rpart
## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  Benign Malignant
##   Benign       103         4
##   Malignant      4        59
##                                           
##                Accuracy : 0.9529          
##                  95% CI : (0.9094, 0.9795)
##     No Information Rate : 0.6294          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.8991          
##                                           
##  Mcnemar's Test P-Value : 1               
##                                           
##             Sensitivity : 0.9626          
##             Specificity : 0.9365          
##          Pos Pred Value : 0.9626          
##          Neg Pred Value : 0.9365          
##              Prevalence : 0.6294          
##          Detection Rate : 0.6059          
##    Detection Prevalence : 0.6294          
##       Balanced Accuracy : 0.9496          
##                                           
##        'Positive' Class : Benign          
## 

acc_rpart <- cm_rpart$byClass['Balanced Accuracy']
F1_rpart <- cm_rpart$byClass['F1']
print(c(acc_rpart, F1_rpart))
## Balanced Accuracy                F1 
##         0.9495624         0.9626168

At the time of writing, the results obtained with this function are somewhat unsatisfactory. Tuning this model could theoretically yield better results, but my tinkering with its arguments has only led to worse accuracy values. There is an interesting article about the function’s behavior that could help anyone tune the function to their needs and data, but this document needs to move on to the following machine learning approach (time is but the most valuable currency).
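
For readers who want a starting point, one common tuning lever is the complexity parameter cp. The following hedged sketch uses the built-in iris data as a stand-in; the seed and the control values are arbitrary assumptions, and nothing here implies that this would improve the results reported above.

```r
library(rpart)

set.seed(7) # arbitrary seed: the cross-validated error in cptable is random

# Grow a deliberately large tree by lowering cp and minsplit
model_toy <- rpart(Species ~ ., data = iris, method = "class",
                   control = rpart.control(cp = 0.001, minsplit = 5))

# cptable lists candidate cp values with their cross-validated error (xerror)
cp_table <- model_toy$cptable
best_cp <- cp_table[which.min(cp_table[, "xerror"]), "CP"]

# Prune the overgrown tree back to the selected complexity
model_pruned <- prune(model_toy, cp = best_cp)
printcp(model_pruned)
```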

3.1.5. RWeka

Weka is a collection of machine learning algorithms for data mining tasks written in Java, containing tools for data pre-processing, classification, regression, clustering, association rules, and visualization. The package RWeka is but an R interface to said collection, bringing the Weka toolset to the R environment.

# Do not run this code snippet, as it is only here for illustration purposes
library("RWeka")
JRip(formula, 
     data, 
     subset, 
     na.action,
     control = Weka_control(), 
     options = NULL)

M5Rules(formula, 
        data, 
        subset, 
        na.action,
        control = Weka_control(), 
        options = NULL)

OneR(formula, 
     data, 
     subset, 
     na.action,
     control = Weka_control(), 
     options = NULL)

PART(formula, 
     data, 
     subset, 
     na.action,
     control = Weka_control(), 
     options = NULL)
The functions’ arguments are as follows:
  • formula: a symbolic description of the model to be fit.
  • data: an optional data frame containing the variables in the model.
  • subset: an optional vector specifying a subset of observations to be used in the fitting process.
  • na.action: a function which indicates what should happen when the data contain NA.
  • control: an object of class Weka_control giving options to be passed to the Weka learner (see the associated documentation).
  • options: a named list of further options, or NULL (default).

Note that additional information about these functions, their behavior and their arguments can be found in their associated RDocumentation page.

The JRip() function implements “RIPPER”, an acronym standing for “Repeated Incremental Pruning to Produce Error Reduction”. It is a propositional rule learner which, as the name suggests, repeatedly grows rules and prunes them to avoid overfitting and reduce (potential) error. Its foundations are detailed in Cohen’s work (its author/creator).

The M5Rules() function generates a decision list using a “separate-and-conquer” approach where each iteration builds a model tree and turns its “best” leaf into a rule. However, M5Rules() is strictly used within regression exercises, so it cannot be applied in classification tasks such as the ones being performed throughout this document.

The OneR() function builds a simple yet effective and useful “one-rule” classifier, also known as Holte’s classifier or Holte’s 1R classifier after its creator/developer (see the original paper or Nevill-Manning et al.’s take on it). The name is slightly misleading, since the classifier is built upon “one feature” rather than “one rule”: it finds exactly one feature (and one or more values of that feature) to classify data instances. This makes it a fast and simple approach, although it is not known for its prediction performance (it is rather recommended for teaching purposes and as a lower-bound performance baseline in real-world applications).
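
The “one feature” idea can be sketched in a few lines of base R. This is a simplified illustration restricted to categorical features (it skips the discretization of numeric features) and is not the Weka implementation; the one_r() helper and the toy data frame are hypothetical.

```r
# Pick the single feature whose values best predict the target
one_r <- function(data, target) {
  features <- setdiff(names(data), target)
  errors <- sapply(features, function(f) {
    counts <- table(data[[f]], data[[target]])
    # Each feature value predicts its majority class; the rest are errors
    sum(counts) - sum(apply(counts, 1, max))
  })
  best <- names(which.min(errors))
  counts <- table(data[[best]], data[[target]])
  rule <- colnames(counts)[apply(counts, 1, which.max)]
  names(rule) <- rownames(counts)
  list(feature = best, rule = rule, training_errors = errors[[best]])
}

# Toy data (made-up values for illustration only)
toy <- data.frame(
  size      = c("small", "small", "small", "large", "large", "large"),
  texture   = c("smooth", "rough", "smooth", "rough", "smooth", "rough"),
  diagnosis = c("Benign", "Benign", "Benign", "Malignant", "Malignant", "Malignant")
)

model_toy <- one_r(toy, "diagnosis")
model_toy$feature # the feature chosen for the single rule
model_toy$rule    # the class predicted for each value of that feature
```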

The PART() function generates decision lists following the approach of Frank and Witten: each iteration builds a partial decision tree and turns its “best” leaf into a rule, combining the “separate-and-conquer” strategy with decision tree learning. Unlike “RIPPER”, it does not require a global optimization step to produce its rule set.

The following code snippet showcases the previously detailed functions except M5Rules(), which is used in regression exercises and does not fit classification exercises such as this one. The showcase illustrates the construction of the models, their associated predictions and confusion matrices as well as the most relevant one number summary metrics (which can be used to compare their predictions’ success):

library("RWeka")

model_JRip <- JRip(diagnosis~., data = train_set)
predict_JRip <- predict(model_JRip, test_set[,-1])
cm_JRip  <- confusionMatrix(predict_JRip, test_set$diagnosis)   
cm_JRip
## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  Benign Malignant
##   Benign       101         3
##   Malignant      6        60
##                                           
##                Accuracy : 0.9471          
##                  95% CI : (0.9019, 0.9755)
##     No Information Rate : 0.6294          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.8876          
##                                           
##  Mcnemar's Test P-Value : 0.505           
##                                           
##             Sensitivity : 0.9439          
##             Specificity : 0.9524          
##          Pos Pred Value : 0.9712          
##          Neg Pred Value : 0.9091          
##              Prevalence : 0.6294          
##          Detection Rate : 0.5941          
##    Detection Prevalence : 0.6118          
##       Balanced Accuracy : 0.9482          
##                                           
##        'Positive' Class : Benign          
## 

model_OneR <- OneR(diagnosis~., data = train_set)
predict_OneR <- predict(model_OneR, test_set[,-1])
cm_OneR  <- confusionMatrix(predict_OneR, test_set$diagnosis)   
cm_OneR
## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  Benign Malignant
##   Benign        97         6
##   Malignant     10        57
##                                           
##                Accuracy : 0.9059          
##                  95% CI : (0.8517, 0.9452)
##     No Information Rate : 0.6294          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.8008          
##                                           
##  Mcnemar's Test P-Value : 0.4533          
##                                           
##             Sensitivity : 0.9065          
##             Specificity : 0.9048          
##          Pos Pred Value : 0.9417          
##          Neg Pred Value : 0.8507          
##              Prevalence : 0.6294          
##          Detection Rate : 0.5706          
##    Detection Prevalence : 0.6059          
##       Balanced Accuracy : 0.9057          
##                                           
##        'Positive' Class : Benign          
## 

model_PART <- PART(diagnosis~., data = train_set)
predict_PART <- predict(model_PART, test_set[,-1])
cm_PART  <- confusionMatrix(predict_PART, test_set$diagnosis)
cm_PART
## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  Benign Malignant
##   Benign       102         3
##   Malignant      5        60
##                                           
##                Accuracy : 0.9529          
##                  95% CI : (0.9094, 0.9795)
##     No Information Rate : 0.6294          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.8998          
##                                           
##  Mcnemar's Test P-Value : 0.7237          
##                                           
##             Sensitivity : 0.9533          
##             Specificity : 0.9524          
##          Pos Pred Value : 0.9714          
##          Neg Pred Value : 0.9231          
##              Prevalence : 0.6294          
##          Detection Rate : 0.6000          
##    Detection Prevalence : 0.6176          
##       Balanced Accuracy : 0.9528          
##                                           
##        'Positive' Class : Benign          
## 

RWeka_df <- data.frame("Balanced Accuracy" = c(cm_JRip$byClass['Balanced Accuracy'],
                                               cm_OneR$byClass['Balanced Accuracy'],
                                               cm_PART$byClass['Balanced Accuracy']),
                       "F1" = c(cm_JRip$byClass['F1'],
                                cm_OneR$byClass['F1'],
                                cm_PART$byClass['F1']),
                       row.names = c("JRip", "OneR", "PART"))
RWeka_df
##      Balanced.Accuracy        F1
## JRip         0.9481531 0.9573460
## OneR         0.9056520 0.9238095
## PART         0.9528260 0.9622642

As can be seen, PART yields the best results out of all of these functions with default settings (perhaps tuning each of these functions could change these results).
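For reference, both summary metrics can be recomputed by hand from a confusion matrix. The following minimal sketch uses the JRip counts reported above and base R only; the variable names are illustrative and not part of the original analysis.

```r
# Confusion matrix counts taken from the JRip output above
# (rows = prediction, columns = reference; 'Benign' is the positive class)
cm <- matrix(c(101, 6, 3, 60), nrow = 2,
             dimnames = list(Prediction = c("Benign", "Malignant"),
                             Reference  = c("Benign", "Malignant")))

tp <- cm["Benign", "Benign"]        # benign cases predicted as benign
fn <- cm["Malignant", "Benign"]     # benign cases predicted as malignant
fp <- cm["Benign", "Malignant"]     # malignant cases predicted as benign
tn <- cm["Malignant", "Malignant"]  # malignant cases predicted as malignant

sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
precision   <- tp / (tp + fp)

# Balanced accuracy averages sensitivity and specificity;
# F1 is the harmonic mean of precision and sensitivity (recall)
balanced_accuracy <- (sensitivity + specificity) / 2
f1 <- 2 * precision * sensitivity / (precision + sensitivity)

round(c(balanced_accuracy = balanced_accuracy, F1 = f1), 7)
```

The two values agree with the JRip row of the RWeka_df table (0.9481531 and 0.9573460).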

3.1.6. Naive Bayes

The Naive Bayes algorithm is one of the most popular machine learning approaches. It is based upon Bayes’ theorem, which describes the probability of an event based on prior knowledge of conditions related to said event. The mathematical notation is as follows: \[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\] Where:
  • P(A|B): probability of A given B.
  • P(B|A): probability of B given A.
  • P(A): probability of A.
  • P(B): probability of B.
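
As a quick numeric illustration of the formula above, the following sketch applies Bayes’ theorem to made-up probabilities. All of the numbers are hypothetical and chosen only for the example; they are not derived from the dataset.

```r
# Hypothetical numbers, chosen only to illustrate the formula
p_A      <- 0.37   # P(A): prior probability of event A
p_B_A    <- 0.90   # P(B|A): probability of observing B when A holds
p_B_notA <- 0.10   # P(B|not A): probability of observing B when A does not hold

# The law of total probability gives P(B)
p_B <- p_B_A * p_A + p_B_notA * (1 - p_A)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_A_B <- p_B_A * p_A / p_B
p_A_B
```

Observing B raises the probability of A from its prior value (0.37) to a noticeably higher posterior value, which is the kind of update the classifier performs for every class.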

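The Naive Bayes algorithm uses this principle to classify observations: for each class, it combines the prior probability of the class with the probability of the observed feature values given that class (both estimated from the training data) and predicts the class with the highest resulting posterior probability. It does not build a decision tree, and several variants exist depending on how the features are modeled, with Multinomial Naive Bayes being one of the best known. Josh Starmer’s video on it visually explains and exemplifies this concept (he has published an even more simplified video where the naivety aspect of the algorithm is explained, which briefly speaking is due to the algorithm ignoring potential relationships between features).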
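
The idea can be sketched by hand for numeric features. The snippet below is a minimal Gaussian Naive Bayes written in base R, using the built-in iris dataset purely as a stand-in for the project’s data; the function and variable names are illustrative assumptions, and this is not how any particular package implements it internally.

```r
# Minimal Gaussian naive Bayes on the built-in iris data (illustration only)
features <- iris[, 1:4]
classes  <- iris$Species

# Per-class prior, and per-class mean / standard deviation of every feature
priors <- table(classes) / length(classes)
means  <- sapply(features, function(f) tapply(f, classes, mean))
sds    <- sapply(features, function(f) tapply(f, classes, sd))

# Log-posterior (up to a constant) for one observation: log prior plus the sum
# of per-feature Gaussian log-densities, i.e. features are treated as independent
log_posterior <- function(obs) {
  sapply(levels(classes), function(cl) {
    log(priors[[cl]]) +
      sum(dnorm(as.numeric(obs), mean = means[cl, ], sd = sds[cl, ], log = TRUE))
  })
}

# One row of scores per observation, one column per class
scores <- t(apply(features, 1, log_posterior))
predicted <- factor(levels(classes)[max.col(scores, ties.method = "first")],
                    levels = levels(classes))
mean(predicted == classes)  # training accuracy of this sketch
```

Summing the per-feature log-densities is exactly the “naive” step: it is only valid if the features are independent given the class.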

There are many libraries which include a Naive Bayes machine learning function but, for the sake of simplicity, this document focuses on one: the e1071 package, which holds miscellaneous statistics-based functions (from Fourier transforms to several clustering approaches and machine learning algorithms) and provides this algorithm via the naiveBayes() function.

# Do not run this code snippet, as it is only here for illustration purposes
library(e1071)
naiveBayes(x, 
           y, 
           laplace = 0,
           ...
)
The function’s arguments are as follows:
  • x: a numeric matrix, or a data frame of categorical and/or numeric variables.
  • y: class vector.
  • laplace: positive double controlling Laplace smoothing. The default (0) disables Laplace smoothing.

More information about this function, its behavior and its arguments can be found in its associated RDocumentation page: https://www.rdocumentation.org/packages/e1071/versions/1.7-9/topics/naiveBayes

The following code snippet showcases the construction of the model, its associated prediction and confusion matrix and the most relevant one number summary metrics:

library(e1071)
library(caret)

model_naiveBayes <- naiveBayes(train_set[,-1], train_set$diagnosis)
prediction_naiveBayes <- predict(model_naiveBayes, test_set[,-1], type = "class")
cm_naiveBayes <- confusionMatrix(prediction_naiveBayes, test_set$diagnosis)
cm_naiveBayes
## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  Benign Malignant
##   Benign       106         3
##   Malignant      1        60
##                                           
##                Accuracy : 0.9765          
##                  95% CI : (0.9409, 0.9936)
##     No Information Rate : 0.6294          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.9492          
##                                           
##  Mcnemar's Test P-Value : 0.6171          
##                                           
##             Sensitivity : 0.9907          
##             Specificity : 0.9524          
##          Pos Pred Value : 0.9725          
##          Neg Pred Value : 0.9836          
##              Prevalence : 0.6294          
##          Detection Rate : 0.6235          
##    Detection Prevalence : 0.6412          
##       Balanced Accuracy : 0.9715          
##                                           
##        'Positive' Class : Benign          
## 

acc_naiveBayes <- cm_naiveBayes$byClass['Balanced Accuracy']
F1_naiveBayes <- cm_naiveBayes$byClass['F1']
print(c(acc_naiveBayes, F1_naiveBayes))
## Balanced Accuracy                F1 
##         0.9715176         0.9814815

The laplace argument can be tuned in an attempt to achieve a better prediction. The following code snippet evaluates every integer value from 0 to 50:

library(e1071)
library(caret)

acc_naiveBayes_array <- NULL
F1_naiveBayes_array <- NULL

for(i in 0:50){
    model_naiveBayes_temp <- naiveBayes(train_set[,-1], train_set$diagnosis, laplace = i) 
    prediction_naiveBayes_temp <- predict(model_naiveBayes_temp, test_set[,-1]) 
    cm_naiveBayes_temp <- confusionMatrix(prediction_naiveBayes_temp, test_set$diagnosis)
    acc_naiveBayes_array[i+1] <- cm_naiveBayes_temp$byClass['Balanced Accuracy']
    F1_naiveBayes_array[i+1] <- cm_naiveBayes_temp$byClass['F1']
}

acc_naiveBayes_df <- data.frame(laplace = seq(0,50), acc = acc_naiveBayes_array)
acc_naiveBayes_optimal <- subset(acc_naiveBayes_df, acc == max(acc))[1,]

F1_naiveBayes_df <- data.frame(laplace = seq(0,50), F1 = F1_naiveBayes_array)
F1_naiveBayes_optimal <- subset(F1_naiveBayes_df, F1 == max(F1))[1,]

print(c(acc_naiveBayes_optimal, F1_naiveBayes_optimal))
## $laplace
## [1] 0
## 
## $acc
## [1] 0.9715176
## 
## $laplace
## [1] 0
## 
## $F1
## [1] 0.9814815

# We average F1 and balanced accuracy to measure success
tuning_naiveBayes_df <- data.frame(laplace = seq(0,50), success = 0.5 * (acc_naiveBayes_array + F1_naiveBayes_array))
tuning_naiveBayes <- subset(tuning_naiveBayes_df, success == max(success))[1,]
print(tuning_naiveBayes)
##   laplace   success
## 1       0 0.9764995

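In this case, Laplace smoothing does not affect the one number summary metrics whatsoever. This is to be expected when all predictors are numeric: in e1071’s naiveBayes(), Laplace smoothing is only applied to categorical predictors, whereas numeric predictors are modeled through Gaussian distributions. To appreciate this, a graph can be plotted; for illustration purposes, said plot is performed via two different libraries: highcharter and ggplot2.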

highcharter
sub_naiveBayes <- paste("Optimal Laplace value is", tuning_naiveBayes$laplace, "with an averaged success (balanced accuracy and F1 score) of ", tuning_naiveBayes$success)

library(highcharter)
hchart(tuning_naiveBayes_df, 'line', hcaes(laplace, success)) %>%
  hc_title(text = "Averaged success with varying Laplace values (Naive Bayes)") %>%
  hc_subtitle(text = sub_naiveBayes) %>%
  hc_add_theme(hc_theme_google()) %>%
  hc_xAxis(title = list(text = "Laplace value")) %>%
  hc_yAxis(title = list(text = "Averaged success"))
ggplot2
sub_naiveBayes <- paste("Optimal Laplace value is", tuning_naiveBayes$laplace, "with an averaged success (balanced accuracy and F1 score) of ", tuning_naiveBayes$success)

library(ggplot2)
ggplot(tuning_naiveBayes_df, aes(laplace, success)) + 
  geom_line() + 
  geom_point() + theme_minimal() + 
  labs(title = "Averaged success with varying Laplace values (Naive Bayes)",
       subtitle = sub_naiveBayes,
       x = "Laplace value",
       y = "Averaged success")
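
To see what the laplace argument is meant to do, the following base R sketch applies additive (Laplace) smoothing to the level counts of a categorical predictor within a single class. The predictor, its levels and the smooth() helper are hypothetical and only serve the illustration.

```r
# Hypothetical categorical predictor with three levels, observed within one class
observed <- factor(c("low", "low", "medium", "low"),
                   levels = c("low", "medium", "high"))
counts <- table(observed)   # "high" never occurs, so its count is 0

# Additive (Laplace) smoothing: add `laplace` pseudo-counts to every level
smooth <- function(counts, laplace) {
  (counts + laplace) / (sum(counts) + laplace * length(counts))
}

smooth(counts, laplace = 0)  # unsmoothed: the probability of "high" is exactly 0
smooth(counts, laplace = 1)  # smoothed: every level gets a non-zero probability
```

Without smoothing, a level that was never seen for a class would force the whole posterior of that class to zero. Since this mechanism acts on level counts, it has nothing to act upon when every predictor is numeric, which is consistent with the flat line obtained above.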

3.1.7. Conditional Inference Trees

Conditional inference trees, also referred to as unbiased recursive partitioning, are a non-parametric class of decision trees that select the splitting variable through statistical theory (permutation-based significance tests) instead of picking the variable that maximizes an information measure (Gini coefficient or Information Gain), thereby removing the potential variable selection bias of CART and similar decision trees.
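
The variable selection step can be sketched with a plain permutation test in base R. This is a deliberately simplified illustration: the test statistic, the perm_p_value() helper, the added noise column and the use of the built-in iris dataset are all assumptions made for the example, and the actual ctree() implementation relies on a more general conditional inference framework.

```r
set.seed(1)  # arbitrary seed so that the permutations are reproducible

# Permutation p-value for the association between a numeric feature and a class
# label, using the variance of the per-class means as test statistic
perm_p_value <- function(feature, class, n_perm = 999) {
  statistic <- function(cl) var(tapply(feature, cl, mean))
  observed  <- statistic(class)
  permuted  <- replicate(n_perm, statistic(sample(class)))
  (sum(permuted >= observed) + 1) / (n_perm + 1)
}

# Built-in iris features plus a pure-noise column unrelated to the class
candidates <- cbind(iris[, 1:4], noise = rnorm(nrow(iris)))

p_values <- sapply(candidates, perm_p_value, class = iris$Species)
p_values
```

Features that are truly associated with the class obtain very small p-values, whereas the noise column does not. A conditional inference tree splits on the variable with the strongest evidence of association and stops splitting once no variable is significant, instead of greedily maximizing an information measure.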

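Its usage within R comes from the party package (and from partykit, its more recent reimplementation, which is the one most documentation refers to) and the associated ctree() function. The signature detailed below follows the partykit documentation; note that party’s ctree() names its control argument controls rather than control.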

# Do not run this code snippet, as it is only here for illustration purposes
library(party)
ctree(formula,
      data,
      subset,
      weights,
      na.action = na.pass,
      offset, cluster,
      control = ctree_control(...),
      ytrafo = NULL,
      converged = NULL,
      scores = NULL,
      doFit = TRUE,
      ...
)
The arguments are as follows:
  • formula: a symbolic description of the model to be fit.
  • data: a data frame containing the variables in the model.
  • subset: an optional vector specifying a subset of observations to be used in the fitting process.
  • weights: an optional vector of weights to be used in the fitting process. Only non-negative integer valued weights are allowed.
  • offset: an optional vector of offset values.
  • cluster: an EXPERIMENTAL and optional factor indicating independent clusters.
  • na.action: a function which indicates what should happen when the data contain missing values.
  • control: a list with control parameters.
  • ytrafo: an optional named list of functions to be applied to the response variable(s) before testing their association with the explanatory variables.
  • converged: an optional function for checking user-defined criteria before splits are implemented.
  • scores: an optional named list of scores to be attached to ordered factors.
  • doFit: if FALSE, the tree is not fitted.

More information about this function, its behavior and its arguments can be found in its associated RDocumentation page: https://www.rdocumentation.org/packages/partykit/versions/1.2-15/topics/ctree

The following code snippet showcases the construction of the model, its associated prediction and confusion matrix and the most relevant one number summary metrics:

library(party)
library(caret)

model_ctree <- ctree(diagnosis~., data=train_set)
prediction_ctree <- predict(model_ctree, test_set[,-1])
cm_ctree <- confusionMatrix(prediction_ctree, test_set$diagnosis)
cm_ctree
## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  Benign Malignant
##   Benign       100         4
##   Malignant      7        59
##                                           
##                Accuracy : 0.9353          
##                  95% CI : (0.8872, 0.9673)
##     No Information Rate : 0.6294          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.8626          
##                                           
##  Mcnemar's Test P-Value : 0.5465          
##                                           
##             Sensitivity : 0.9346          
##             Specificity : 0.9365          
##          Pos Pred Value : 0.9615          
##          Neg Pred Value : 0.8939          
##              Prevalence : 0.6294          
##          Detection Rate : 0.5882          
##    Detection Prevalence : 0.6118          
##       Balanced Accuracy : 0.9355          
##                                           
##        'Positive' Class : Benign          
## 

acc_ctree <- cm_ctree$byClass['Balanced Accuracy']
F1_ctree <- cm_ctree$byClass['F1']
print(c(acc_ctree, F1_ctree))
## Balanced Accuracy                F1 
##         0.9355437         0.9478673

The controls argument allows users to tweak the function through the ctree_control() function and, among others, its maxdepth argument. Said argument sets the maximum depth of the tree (the maximum number of consecutive splits between the root and any leaf), and as such the metrics should plateau once the tree is deep enough to capture all significant splits.
The following code snippet loops through values to evaluate how the one number summary metrics change with the tree’s depth.

acc_ctree_array <- NULL
F1_ctree_array <- NULL

for(i in 1:50){
    model_ctree_temp <- ctree(diagnosis~., data=train_set, controls=ctree_control(maxdepth=i)) 
    prediction_ctree_temp <- predict(model_ctree_temp, test_set[,-1]) 
    cm_ctree_temp <- confusionMatrix(prediction_ctree_temp, test_set$diagnosis)
    acc_ctree_array[i] <- cm_ctree_temp$byClass['Balanced Accuracy']
    F1_ctree_array[i] <- cm_ctree_temp$byClass['F1']
}

acc_ctree_df <- data.frame(depth = seq(1,50), acc = acc_ctree_array)
acc_ctree_optimal <- subset(acc_ctree_df, acc == max(acc))[1,]

F1_ctree_df <- data.frame(depth = seq(1,50), F1 = F1_ctree_array)
F1_ctree_optimal <- subset(F1_ctree_df, F1 == max(F1))[1,]

print(c(acc_ctree_optimal, F1_ctree_optimal)) # The optimal depth for each metric does not necessarily coincide
## $depth
## [1] 2
## 
## $acc
## [1] 0.9420709
## 
## $depth
## [1] 3
## 
## $F1
## [1] 0.9478673

tuning_ctree_df <- data.frame(depth = seq(1,50), success = 0.5 * (acc_ctree_array + F1_ctree_array)) # We average F1 and balanced accuracy to measure success
library(dplyr) # For the mutate() function used to add balanced accuracy and F1 values to the dataframe
tuning_ctree <- subset(tuning_ctree_df, success == max(success))[1,] %>% mutate(acc = acc_ctree_df[max(depth), 2], F1 = F1_ctree_df[max(depth), 2])
print(tuning_ctree)
##   depth   success       acc        F1
## 2     2 0.9444654 0.9420709 0.9468599

A graph can be plotted to appreciate the depth’s effect on the algorithm’s success. For illustration purposes, said plot is performed via two different libraries: highcharter and ggplot2.

highcharter
sub_ctree <- paste("Optimal depth value is", tuning_ctree$depth, "with an averaged success (balanced accuracy and F1 score) of ", tuning_ctree$success)

library(highcharter)
hchart(tuning_ctree_df, 'line', hcaes(depth, success)) %>%
  hc_title(text = "Averaged success with varying depth (Conditional Inference Trees)") %>%
  hc_subtitle(text = sub_ctree) %>%
  hc_add_theme(hc_theme_google()) %>%
  hc_xAxis(title = list(text = "Depth value")) %>%
  hc_yAxis(title = list(text = "Averaged success"))
ggplot2
sub_ctree <- paste("Optimal depth value is", tuning_ctree$depth, "with an averaged success (balanced accuracy and F1 score) of ", tuning_ctree$success)

library(ggplot2)
ggplot(tuning_ctree_df, aes(depth, success)) + 
  geom_line() + 
  geom_point() + theme_minimal() + 
  labs(title = "Averaged success with varying depth (Conditional Inference Trees)",
       subtitle = sub_ctree,
       x = "Depth value",
       y = "Averaged success")

These results show that the most reasonable value for the maxdepth argument is 2, since any further increase in said value does not improve the averaged success.

There are many other possible tweaks to better tune the function’s behavior. However, they fall outside the current scope of this document (which might be updated in the future). For now, refer to the provided links to better understand the function and its arguments.

3.1.8. randomForest

Decision trees tend to suffer from high variance: if the dataset is split into two halves, fitting a decision tree to each half could yield quite different results. One way to reduce the variance of a single decision tree is to use a random forest model. With the random forest approach a large number of decision trees are created, each one grown on a bootstrap sample of the training data and considering only a random subset of the features at each split. Every new observation is then fed into all of the trees, and the most common outcome (a majority vote in which each and every tree takes part) is used as the final prediction.
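
The voting step can be sketched in a few lines of base R. The votes below are hypothetical (three observations, three trees) and the majority_vote() helper is an illustrative name, not part of any library.

```r
# Hypothetical votes: one row per observation, one column per tree
votes <- matrix(c("Benign",    "Benign",    "Malignant",
                  "Malignant", "Malignant", "Malignant",
                  "Benign",    "Malignant", "Benign"),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("obs", 1:3), paste0("tree", 1:3)))

# The forest's prediction is the most frequent class in each row
majority_vote <- function(row) names(which.max(table(row)))
forest_prediction <- apply(votes, 1, majority_vote)
forest_prediction
```

Individual trees may disagree (as in the first and third observations), but the aggregated prediction is more stable than that of any single tree.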

The go-to library for random forest machine learning usage is randomForest, with its core function being randomForest().

# Do not run this code snippet, as it is only here for illustration purposes
randomForest(x, 
             y=NULL,  
             xtest=NULL, 
             ytest=NULL, 
             ntree=500,
             na.action=na.fail,
             mtry = if (!is.null(y) && !is.factor(y))
                        max(floor(ncol(x)/3), 1)
                    else
                        floor(sqrt(ncol(x))),
             replace=TRUE, 
             classwt=NULL, 
             cutoff, 
             strata,
             sampsize = if (replace) nrow(x) else ceiling(.632*nrow(x)),
             nodesize = if (!is.null(y) && !is.factor(y)) 5 else 1,
             maxnodes = NULL,
             importance=FALSE, 
             localImp=FALSE, 
             nPerm=1,
             proximity, 
             oob.prox=proximity,
             norm.votes=TRUE, 
             do.trace=FALSE,
             keep.forest=!is.null(y) && is.null(xtest), 
             corr.bias=FALSE,
             keep.inbag=FALSE, 
             ...
)
The arguments are as follows:
  • x: a data frame or matrix of predictors.
  • y: a response vector. If a factor, classification is assumed, otherwise regression is assumed. If omitted, the function will run in unsupervised mode.
  • xtest: a data frame or matrix containing predictors for the test set.
  • ytest: response for the test set.
  • ntree: number of trees to “grow” (to be used in the forest for the prediction voting process).
  • na.action: a function specifying the action to be taken if NAs are found. The default (na.fail) stops with an error if the data contain missing values.
  • mtry: number of variables randomly sampled as candidates at each split.
  • replace: TRUE or FALSE determines whether to sample the cases with replacement or not.
  • classwt: priors of the classes (ignored for regression).
  • cutoff: a vector of length equal to number of classes (and thus only useful in classification exercises). The winning class for an observation is the one with the maximum ratio of proportion of votes to cutoff. Default is 1/k where k is the number of classes (i.e., majority vote wins).
  • strata: a (factor) variable that is used for stratified sampling.
  • sampsize: size(s) of sample to draw.
  • nodesize: minimum size of terminal nodes. Setting this number larger causes smaller trees to be grown (and thus take less time). Note that the default values are different for classification (1) and regression (5).
  • maxnodes: maximum number of terminal nodes trees in the forest can have. If not given, trees are grown to the maximum possible (limited by nodes’ size).
  • importance: TRUE or FALSE determines whether to assess the importance of predictors.
  • localImp: TRUE or FALSE determines whether to compute casewise importance measures.
  • nPerm: number of times the out-of-bag data are permuted per tree for assessing variable importance.
  • proximity: TRUE or FALSE determines whether to measure the proximity between rows.
  • oob.prox: TRUE or FALSE determines whether to calculate proximity using only out-of-bag data.
  • norm.votes: if TRUE (default), the final result of votes are expressed as fractions. If FALSE, raw vote counts are returned (useful for combining results from different runs). Ignored for regression exercises.
  • do.trace: if TRUE, a more verbose output is printed while the forest is grown.
  • keep.forest: if FALSE, the forest will not be retained in the output object.
  • corr.bias: an EXPERIMENTAL argument to perform bias correction in regression exercises.
  • keep.inbag: TRUE or FALSE determines whether to return a matrix with dimensions n by ntree that keeps track of which samples are “in-bag” in which trees.

As can be observed, random forests (and randomForest() particularly) offer an overwhelming amount of customizability, mainly due to their potential complexity. If needed, refer to the associated RDocumentation page for more information about this function, its behavior and its arguments: https://www.rdocumentation.org/packages/randomForest/versions/4.6-14/topics/randomForest

The following code snippet showcases the construction of the model, its associated prediction and confusion matrix and the most relevant one number summary metrics (arguments set at default):

library(randomForest)
library(caret)

model_randomForest <- randomForest(x = train_set[,-1], y = train_set$diagnosis)
prediction_randomForest <- predict(model_randomForest, test_set[,-1], type = "class")
cm_randomForest <- confusionMatrix(prediction_randomForest, test_set$diagnosis)
cm_randomForest
## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  Benign Malignant
##   Benign       106         1
##   Malignant      1        62
##                                           
##                Accuracy : 0.9882          
##                  95% CI : (0.9581, 0.9986)
##     No Information Rate : 0.6294          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.9748          
##                                           
##  Mcnemar's Test P-Value : 1               
##                                           
##             Sensitivity : 0.9907          
##             Specificity : 0.9841          
##          Pos Pred Value : 0.9907          
##          Neg Pred Value : 0.9841          
##              Prevalence : 0.6294          
##          Detection Rate : 0.6235          
##    Detection Prevalence : 0.6294          
##       Balanced Accuracy : 0.9874          
##                                           
##        'Positive' Class : Benign          
## 

acc_randomForest <- cm_randomForest$byClass['Balanced Accuracy']
F1_randomForest <- cm_randomForest$byClass['F1']
print(c(acc_randomForest, F1_randomForest))
## Balanced Accuracy                F1 
##         0.9873906         0.9906542

The aforementioned complexity of random forests can make the process of tuning the algorithm’s behavior overwhelming, and detailing every intricacy goes beyond the scope of this document. However, it is worth noting that predictive performance tends to improve with the number of trees up to a given point, beyond which there are no significant accuracy gains and only the required computing time increases. Selecting the lowest number of trees that yields acceptable results can therefore save precious processing time. Note that the prevalence reported by confusionMatrix() is the proportion of the positive class among the reference labels, so it does not depend on the model and stays constant across the loop below.
The following code snippet aims to find said number of trees:

library(randomForest)
library(caret)

acc_randomForest_array <- NULL
F1_randomForest_array <- NULL
prevalence_randomForest_array <- NULL

for(i in 0:50){
    # ntree takes the values 1, 16, 31, ..., 751
    model_randomForest_temp <- randomForest(x = train_set[,-1], y = train_set$diagnosis, ntree = i*15 + 1, importance = TRUE, proximity = TRUE) 
    prediction_randomForest_temp <- predict(model_randomForest_temp, test_set[,-1]) 
    cm_randomForest_temp <- confusionMatrix(prediction_randomForest_temp, test_set$diagnosis)
    acc_randomForest_array[i+1] <- cm_randomForest_temp$byClass['Balanced Accuracy']
    F1_randomForest_array[i+1] <- cm_randomForest_temp$byClass['F1']
    prevalence_randomForest_array[i+1] <- cm_randomForest_temp$byClass['Prevalence']
}

acc_randomForest_df <- data.frame(ntree = seq(1, length.out = 51, by = 15), acc = acc_randomForest_array)
acc_randomForest_optimal <- subset(acc_randomForest_df, acc == max(acc))[1,]

F1_randomForest_df <- data.frame(ntree = seq(1, length.out = 51, by = 15), F1 = F1_randomForest_array)
F1_randomForest_optimal <- subset(F1_randomForest_df, F1 == max(F1))[1,]

prevalence_randomForest_df <- data.frame(ntree = seq(1, length.out = 51, by = 15), prevalence = prevalence_randomForest_array)
prevalence_randomForest_optimal <- subset(prevalence_randomForest_df, prevalence == max(prevalence))[1,]

print(c(acc_randomForest_optimal, F1_randomForest_optimal, prevalence_randomForest_optimal))
# Output not shown: random forests are grown from random samples, so these values vary between runs unless a seed is fixed beforehand with set.seed()

# We average F1, balanced accuracy and prevalence to measure success
# (prevalence only depends on the reference labels, so it is identical for every ntree and does not alter which value is optimal)
tuning_randomForest_df <- data.frame(ntree = seq(1, length.out = 51, by = 15), success = (acc_randomForest_array + F1_randomForest_array + prevalence_randomForest_array) / 3)
library(dplyr) # For the mutate() function used to add balanced accuracy, F1 and prevalence values to the dataframe
# match() looks up the row that corresponds to the optimal ntree (ntree values are not row numbers here)
tuning_randomForest <- subset(tuning_randomForest_df, success == max(success))[1,] %>% 
  mutate(acc = acc_randomForest_df$acc[match(ntree, acc_randomForest_df$ntree)], F1 = F1_randomForest_df$F1[match(ntree, F1_randomForest_df$ntree)], prevalence = prevalence_randomForest_df$prevalence[match(ntree, prevalence_randomForest_df$ntree)])
print(tuning_randomForest)

3.1.9. K-Nearest Neighbors

K-Nearest Neighbors (KNN) is built around Euclidean distances. Using KNN, for any point (x1, x2) for which an estimate p(x1, x2) is wanted, the algorithm looks for the K nearest points to (x1, x2) and computes an average of the 0s and 1s associated with these points. The set of points used to compute the average is referred to as the neighborhood. Larger values of K result in smoother estimates, while smaller values of K result in more flexible and wiggly estimates.
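
The description above translates almost literally into base R. The following sketch uses a handful of made-up training points; the data and the knn_estimate() helper are hypothetical and only meant to make the neighborhood idea concrete.

```r
# Hypothetical training points (two features) with 0/1 labels
train_x <- matrix(c(1, 1,
                    1, 2,
                    2, 1,
                    6, 6,
                    7, 6,
                    6, 7), ncol = 2, byrow = TRUE)
train_y <- c(0, 0, 0, 1, 1, 1)

# Estimate p(x1, x2) as the average label of the k nearest training points
knn_estimate <- function(point, k) {
  distances <- sqrt(rowSums(sweep(train_x, 2, point)^2))  # Euclidean distances
  neighborhood <- order(distances)[seq_len(k)]             # indices of the k nearest
  mean(train_y[neighborhood])
}

knn_estimate(c(2, 2), k = 3)  # 0: all three neighbors have label 0
knn_estimate(c(5, 5), k = 5)  # 0.6: three neighbors with label 1, two with label 0
```

A classifier is obtained by thresholding the estimate (for instance, predicting class 1 whenever it exceeds 0.5). Because the distances are computed on the raw feature values, features measured on larger scales weigh more in the neighborhood.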

As with any of the other machine learning approaches covered by this document, KNN can be used within R via different libraries. The ones to be used are the following:
  • The caret package, which includes the knn3() function.
  • The class package, which includes the knn() function.

Let’s detail the former:

# Do not run this code snippet, as it is only here for illustration purposes
library(caret)

# For formula
knn3(formula,
     data,
     k = 5,
     subset,
     na.action,
     ...
)

# For dataframes and matrices
knn3(x,
     y,
     k = 5,
     ...
)
Let’s detail its arguments:
  • x: a data frame or matrix of predictors.
  • y: a factor vector of training set classes.
  • formula: a formula of the form lhs ~ rhs where lhs is the response variable and rhs a set of predictors.
  • data: optional data frame containing the variables in the model formula.
  • k: number of neighbors considered.
  • subset: optional vector specifying a subset of observations to be used.
  • na.action: a function which indicates what should happen when the data contain NA data.

More information about this function, its behavior and its arguments can be found in its associated RDocumentation page: https://www.rdocumentation.org/packages/caret/versions/6.0-90/topics/knn3

On the other hand, the function knn() from the class library goes as follows:

# Do not run this code snippet, as it is only here for illustration purposes
library(class)
knn(train,
    test,
    cl,
    k = 1,
    l = 0,
    prob = FALSE,
    use.all = TRUE)
Let’s detail its arguments:
  • train: matrix or data frame of training set cases.
  • test: matrix or data frame of test set cases. A vector will be interpreted as a row vector for a single case.
  • cl: factor of true classifications of training set.
  • k: number of neighbors considered.
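  • l: minimum vote for definite decision, otherwise doubt (more precisely, less than k-l dissenting votes are allowed).
  • prob: if TRUE, the proportion of the votes for the winning class are returned as attribute prob.
  • use.all: controls the handling of ties. If TRUE, all distances equal to the k-th largest are included; if FALSE, a random selection of distances equal to the k-th is chosen to use exactly k neighbors.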

More information about this function, its behavior and its arguments can be found in its associated RDocumentation page: https://www.rdocumentation.org/packages/class/versions/7.3-20/topics/knn

The following code snippet showcases both functions, where the number of neighbors has been set at 5 in both cases in order to properly compare their results. Note that knn() returns the prediction directly whereas previous functions build a model which is then used to construct a prediction array.

library(caret)

model_knn3 <- knn3(x = train_set[,-1], y = train_set$diagnosis, k = 5)
prediction_knn3 <- predict(model_knn3, test_set[,-1], type = "class")
cm_knn3 <- confusionMatrix(prediction_knn3, test_set$diagnosis)
cm_knn3
## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  Benign Malignant
##   Benign       100         7
##   Malignant      7        56
##                                           
##                Accuracy : 0.9176          
##                  95% CI : (0.8657, 0.9542)
##     No Information Rate : 0.6294          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.8235          
##                                           
##  Mcnemar's Test P-Value : 1               
##                                           
##             Sensitivity : 0.9346          
##             Specificity : 0.8889          
##          Pos Pred Value : 0.9346          
##          Neg Pred Value : 0.8889          
##              Prevalence : 0.6294          
##          Detection Rate : 0.5882          
##    Detection Prevalence : 0.6294          
##       Balanced Accuracy : 0.9117          
##                                           
##        'Positive' Class : Benign          
## 

acc_knn3 <- cm_knn3$byClass['Balanced Accuracy']
F1_knn3 <- cm_knn3$byClass['F1']
print(c(acc_knn3, F1_knn3))
## Balanced Accuracy                F1 
##         0.9117342         0.9345794

library(class)

prediction_knn <- knn(train = train_set[,-1], test = test_set[,-1], cl = train_set$diagnosis, k = 5)
cm_knn <- confusionMatrix(prediction_knn, test_set$diagnosis)
cm_knn
## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  Benign Malignant
##   Benign       100         7
##   Malignant      7        56
##                                           
##                Accuracy : 0.9176          
##                  95% CI : (0.8657, 0.9542)
##     No Information Rate : 0.6294          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.8235          
##                                           
##  Mcnemar's Test P-Value : 1               
##                                           
##             Sensitivity : 0.9346          
##             Specificity : 0.8889          
##          Pos Pred Value : 0.9346          
##          Neg Pred Value : 0.8889          
##              Prevalence : 0.6294          
##          Detection Rate : 0.5882          
##    Detection Prevalence : 0.6294          
##       Balanced Accuracy : 0.9117          
##                                           
##        'Positive' Class : Benign          
## 

acc_knn <- cm_knn$byClass['Balanced Accuracy']
F1_knn <- cm_knn$byClass['F1']
print(c(acc_knn, F1_knn))
## Balanced Accuracy                F1 
##         0.9117342         0.9345794

Note that both functions yield identical results, as expected, since at their core they implement the same algorithm.
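The two metrics reported above can be reproduced by hand from the counts in the confusion matrix. The following is a minimal sketch using the standard definitions of sensitivity, specificity and precision; the variable names (tp, fn, fp, tn) are introduced here for illustration and do not appear elsewhere in the project.

```r
# Counts taken from the confusion matrix printed above
# (rows = prediction, columns = reference; 'Benign' is the positive class)
tp <- 100  # benign predicted as benign
fp <- 7    # malignant predicted as benign
fn <- 7    # benign predicted as malignant
tn <- 56   # malignant predicted as malignant

sensitivity <- tp / (tp + fn)   # recall for the positive class
specificity <- tn / (tn + fp)
precision   <- tp / (tp + fp)   # positive predictive value

# Balanced accuracy is the mean of sensitivity and specificity;
# F1 is the harmonic mean of precision and sensitivity
balanced_accuracy <- (sensitivity + specificity) / 2
f1 <- 2 * precision * sensitivity / (precision + sensitivity)

print(round(c(balanced_accuracy = balanced_accuracy, F1 = f1), 7))
```

Both values agree with the Balanced Accuracy and F1 entries returned by confusionMatrix() for this table.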

KNN can be fine-tuned by selecting an optimal number of neighbors. Said tuning process is showcased in the following code snippet (only knn3() is used now):

library(caret)
acc_knn3_array <- NULL
F1_knn3_array <- NULL

for(i in 1:50){
    model_knn3_temp <- knn3(x = train_set[,-1], y = train_set$diagnosis, k = i)     
    prediction_knn3_temp <- predict(model_knn3_temp, test_set[,-1], type = "class") 
    cm_knn3_temp <- confusionMatrix(prediction_knn3_temp, test_set$diagnosis)
    acc_knn3_array[i] <- cm_knn3_temp$byClass['Balanced Accuracy']
    F1_knn3_array[i] <- cm_knn3_temp$byClass['F1']
}

acc_knn3_df <- data.frame(k = seq(1,50), acc = acc_knn3_array)
acc_knn3_optimal <- subset(acc_knn3_df, acc == max(acc))[1,]

F1_knn3_df <- data.frame(k = seq(1,50), F1 = F1_knn3_array)
F1_knn3_optimal <- subset(F1_knn3_df, F1 == max(F1))[1,]

print(c(acc_knn3_optimal, F1_knn3_optimal)) # At the time of writing these values coincide, but that might not be the case
## $k
## [1] 20
## 
## $acc
## [1] 0.9224892
## 
## $k
## [1] 20
## 
## $F1
## [1] 0.9497717

tuning_knn3_df <- data.frame(k = seq(1,50), success = 0.5 * (acc_knn3_array + F1_knn3_array)) # We average F1 and balanced accuracy to measure success
library(dplyr) # For the mutate() function used to add balanced accuracy and F1 values to the dataframe
tuning_knn3 <- subset(tuning_knn3_df, success == max(success))[1,] %>% mutate(acc = acc_knn3_array[k], F1 = F1_knn3_array[k]) # Look up both metrics at the selected k
print(tuning_knn3)
##     k   success       acc        F1
## 20 20 0.9361305 0.9224892 0.9497717
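The selection step above can be illustrated in isolation. The sketch below uses a small data frame of made-up metric values (they are not results from the dataset) and shows how ties are resolved: which.max() returns the first maximum, which matches the behaviour of subset(...)[1,] in the snippet above and therefore favours the smallest k.

```r
# Hypothetical per-k metrics (illustrative values, not results from the dataset)
tuning <- data.frame(
  k   = 1:5,
  acc = c(0.90, 0.92, 0.94, 0.94, 0.91),
  F1  = c(0.91, 0.93, 0.95, 0.95, 0.92)
)

# Averaged success, as in the tuning snippet above
tuning$success <- 0.5 * (tuning$acc + tuning$F1)

# which.max() returns the index of the first maximum,
# so a tie between k = 3 and k = 4 resolves to the smaller k
best <- tuning[which.max(tuning$success), ]
print(best)
```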

The best results are obtained with k = 20. This might seem lower than expected, but it makes sense since, as already stated, smaller values of k result in more flexible (wiggly) estimates.
These values and their associated results can be interpreted graphically with a plot. For illustration purposes, the plot is produced with two different libraries: highcharter and ggplot2.

highcharter
sub_knn3 <- paste("Optimal number of neighbors is", tuning_knn3$k, "with an averaged success (balanced accuracy and F1 score) of ", tuning_knn3$success)

library(highcharter)
hchart(tuning_knn3_df, 'line', hcaes(k, success)) %>%
  hc_title(text = "Averaged success with varying k (KNN)") %>%
  hc_subtitle(text = sub_knn3) %>%
  hc_add_theme(hc_theme_google()) %>%
  hc_xAxis(title = list(text = "K neighbors")) %>%
  hc_yAxis(title = list(text = "Averaged success"))
ggplot2
sub_knn3 <- paste("Optimal number of neighbors is", tuning_knn3$k, "with an averaged success (balanced accuracy and F1 score) of ", tuning_knn3$success)

library(ggplot2)
ggplot(tuning_knn3_df, aes(k, success)) + 
  geom_line() + 
  geom_point() + theme_minimal() + 
  labs(title = "Averaged success with varying k (KNN)",
       subtitle = sub_knn3,
       x = "K neighbors",
       y = "Averaged success")

More machine learning approaches/algorithms to be included in a later update.

3.2. Results

The following code snippet plots the overall success of each approach using a “four fold” plot, in which each confusion matrix can be evaluated visually. The four folds represent the four cells of the confusion matrix table: the blue sections are correct guesses (benign predicted as benign, and malignant predicted as malignant), while the red sections are incorrect guesses (benign predicted as malignant and vice versa).
The overall success of each approach is represented by the average of its balanced accuracy and its F1 score (as a percentage).
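Averaging two scalar metrics in R requires some care: mean() averages only its first argument, and a second positional argument is matched to trim rather than being included in the average. The sketch below uses made-up metric values to show the difference; the two numbers have to be combined with c() first.

```r
# Hypothetical metric values, for illustration only
acc <- 0.90
f1  <- 0.50

wrong <- mean(acc, f1)     # f1 is matched to the `trim` argument; returns acc unchanged
right <- mean(c(acc, f1))  # averages the two values

print(c(wrong = wrong, right = right))
```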

# Visualize the confusion matrices to compare the success of all methods
# Note: the two metrics are combined with c() before averaging, since mean(a, b) would pass b as `trim` and return a alone
col <- c("#ed3b3b", "#0099ff")
par(mfrow=c(4,4))
fourfoldplot(cm_linear$table, color = col, conf.level = 0, margin = 1, main=paste("Linear Model (",round(mean(c(acc_linear, F1_linear))*100, 2),"%)",sep=""))
fourfoldplot(cm_logistic$table, color = col, conf.level = 0, margin = 1, main=paste("Logistic Model (",round(mean(c(acc_logistic, F1_logistic))*100, 2),"%)",sep=""))
fourfoldplot(cm_C50$table, color = col, conf.level = 0, margin = 1, main=paste("C5.0 (",round(mean(c(acc_C50, F1_C50))*100, 2),"%)",sep=""))
fourfoldplot(cm_rpart$table, color = col, conf.level = 0, margin = 1, main=paste("rpart (",round(mean(c(acc_rpart, F1_rpart))*100, 2),"%)",sep=""))

fourfoldplot(cm_JRip$table, color = col, conf.level = 0, margin = 1, main=paste("RWeka JRip (",round(mean(c(cm_JRip$byClass['Balanced Accuracy'], cm_JRip$byClass['F1']))*100, 2),"%)",sep=""))
fourfoldplot(cm_OneR$table, color = col, conf.level = 0, margin = 1, main=paste("RWeka OneR (",round(mean(c(cm_OneR$byClass['Balanced Accuracy'], cm_OneR$byClass['F1']))*100, 2),"%)",sep=""))
fourfoldplot(cm_PART$table, color = col, conf.level = 0, margin = 1, main=paste("RWeka PART (",round(mean(c(cm_PART$byClass['Balanced Accuracy'], cm_PART$byClass['F1']))*100, 2),"%)",sep=""))
fourfoldplot(cm_naiveBayes$table, color = col, conf.level = 0, margin = 1, main=paste("Naive Bayes (",round(mean(c(acc_naiveBayes, F1_naiveBayes))*100, 2),"%)",sep=""))

fourfoldplot(cm_ctree$table, color = col, conf.level = 0, margin = 1, main=paste("ctree (",round(mean(c(acc_ctree, F1_ctree))*100, 2),"%)",sep=""))
fourfoldplot(cm_randomForest$table, color = col, conf.level = 0, margin = 1, main=paste("Random Forest (",round(mean(c(acc_randomForest, F1_randomForest))*100, 2),"%)",sep=""))
fourfoldplot(cm_knn3$table, color = col, conf.level = 0, margin = 1, main=paste("KNN3 (",round(mean(c(acc_knn3, F1_knn3))*100, 2),"%)",sep=""))
fourfoldplot(cm_knn$table, color = col, conf.level = 0, margin = 1, main=paste("KNN (",round(mean(c(acc_knn, F1_knn))*100, 2),"%)",sep=""))
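The reading of the plot described above (diagonal cells are correct guesses, off-diagonal cells are incorrect ones) can be checked numerically on a single table. The sketch below rebuilds the KNN confusion matrix shown earlier as a plain matrix; the name cm_table is introduced here for illustration only.

```r
# Counts from the KNN confusion matrix shown earlier
# (rows = prediction, columns = reference); matrix() fills column by column
cm_table <- matrix(
  c(100, 7,    # first column: reference Benign
    7, 56),    # second column: reference Malignant
  nrow = 2,
  dimnames = list(Prediction = c("Benign", "Malignant"),
                  Reference  = c("Benign", "Malignant"))
)

correct   <- sum(diag(cm_table))       # diagonal: correct guesses
incorrect <- sum(cm_table) - correct   # off-diagonal: incorrect guesses
accuracy  <- correct / sum(cm_table)

# Same plotting call as above, applied to the hand-built table
fourfoldplot(cm_table, color = c("#ed3b3b", "#0099ff"), conf.level = 0, margin = 1,
             main = paste0("KNN (accuracy ", round(accuracy * 100, 2), "%)"))
```

The resulting accuracy matches the Accuracy entry reported by confusionMatrix() for the same table.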

The following code snippet goes through the evaluated approaches and picks the one with the highest success.

# Select the best prediction model according to the highest averaged success
# Note: the two metrics are combined with c() before averaging, since mean(a, b) would pass b as `trim` and return a alone
opt_predict <- c(mean(c(acc_linear, F1_linear))*100, 
mean(c(acc_logistic, F1_logistic))*100,
mean(c(acc_C50, F1_C50))*100,
mean(c(acc_rpart, F1_rpart))*100,
mean(c(cm_JRip$byClass['Balanced Accuracy'], cm_JRip$byClass['F1']))*100,
mean(c(cm_OneR$byClass['Balanced Accuracy'], cm_OneR$byClass['F1']))*100,
mean(c(cm_PART$byClass['Balanced Accuracy'], cm_PART$byClass['F1']))*100,
mean(c(acc_naiveBayes, F1_naiveBayes))*100,
mean(c(acc_ctree, F1_ctree))*100,
mean(c(acc_randomForest, F1_randomForest))*100,
mean(c(acc_knn3, F1_knn3))*100,
mean(c(acc_knn, F1_knn))*100)

names(opt_predict) <- c("Linear Model",
"Logistic Model",
"C5.0",
"rpart",
"RWeka JRip",
"RWeka OneR",
"RWeka PART",
"Naive Bayes",
"ctree",
"Random Forest",
"KNN3",
"KNN")

best_predict_model <- subset(opt_predict, opt_predict==max(opt_predict))
best_predict_model
## Random Forest 
##      98.73906
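When the full ranking is of interest rather than only the winner, the same named vector can be sorted. The sketch below uses made-up model names and scores (not results from this project) to show the idea.

```r
# Hypothetical success scores (percentages), for illustration only
scores <- c("Model A" = 91.2, "Model B" = 98.7, "Model C" = 95.4)

ranking <- sort(scores, decreasing = TRUE)  # full ranking, best first
best    <- names(ranking)[1]                # name of the top-ranked model

print(ranking)
print(best)
```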