This project aims to be as educational and detailed as possible to showcase my understanding and proficiency of the R programming language, R Markdown and the topics at hand: data science (exploration, visualization, PCA…) and machine learning (through several high level libraries).
Note that the R Markdown file outputs an .html document that has been uploaded to RPubs, where it can be properly visualized. Both files are also available in their dedicated GitHub repository.
The goal of this project is to perform a complete exploratory data analysis of the Wisconsin Breast Cancer Dataset, describing and detailing each step of the process. The document and its content are heavily based upon the work of Miri Choi on Kaggle, although with a much larger scope, much more detail (thoroughly commenting each step of the process) and further development (using additional machine learning libraries and approaches).
As previously stated, this project uses the R programming language along with several libraries which, as is the norm in most non-basic R projects, complement R with additional (and specific) functions. The following code snippet installs each required library only if it is not installed already: the built-in require() function tries to load the package and returns FALSE when it is unavailable, in which case install.packages() performs the installation. The libraries are presented in alphabetical order for the sake of convenience.
if(!require(ade4)) {
install.packages("ade4")
}
if(!require(caret)) {
install.packages("caret")
}
if(!require(C50)) {
install.packages("C50")
}
if(!require(corrplot)) {
install.packages("corrplot")
}
if(!require(data.table)) {
install.packages("data.table")
}
if(!require(dplyr)) {
install.packages("dplyr")
}
if(!require(ExPosition)) {
install.packages("ExPosition")
}
if(!require(factoextra)) {
install.packages("factoextra")
}
if(!require(FactoMineR)) {
install.packages("FactoMineR")
}
if(!require(GGally)) {
install.packages("GGally")
}
if(!require(ggplot2)) {
install.packages("ggplot2")
}
if(!require(gridExtra)) {
install.packages("gridExtra")
}
if(!require(highcharter)) {
install.packages("highcharter")
}
if(!require(PerformanceAnalytics)) {
install.packages("PerformanceAnalytics")
}
if(!require(PST)) {
install.packages("PST")
}
if(!require(psych)) {
install.packages("psych")
}
if(!require(RCurl)) {
install.packages("RCurl")
}
Installing a given package does not mean that the package (and its associated functions) is ready to be used. For that, it needs to be loaded into the R session, which is what the built-in library() function does. The following code snippet makes use of said function to load all of the project’s required libraries (once again, in alphabetical order for the sake of convenience).
library(ade4)
library(caret)
library(C50)
library(corrplot)
library(data.table)
library(dplyr)
library(ExPosition)
library(factoextra)
library(FactoMineR)
library(GGally)
library(ggplot2)
library(gridExtra)
library(highcharter)
library(PerformanceAnalytics)
library(PST)
library(psych)
library(RCurl)
Note that some of these libraries are also included in the tidyverse package. However, I would rather understand the use case of each library than rely on library bundles.
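The install-if-missing and load steps above can also be sketched as a single loop over a vector of package names. This is only a minimal sketch: load_or_install() and its install_missing argument are hypothetical names introduced here for illustration, not part of the original project.

```r
# Hypothetical helper (not part of the original project): install a package
# only when it is missing, then attach it.
load_or_install <- function(pkgs, install_missing = TRUE) {
  for (pkg in pkgs) {
    # requireNamespace() checks availability without attaching the package
    if (!requireNamespace(pkg, quietly = TRUE)) {
      if (!install_missing) {
        stop("Package not installed: ", pkg)
      }
      install.packages(pkg)
    }
    # character.only = TRUE lets library() accept the name as a string
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# Demonstrated with packages that ship with R, so nothing is downloaded
loaded <- load_or_install(c("stats", "utils"), install_missing = FALSE)
```

The same call would presumably work with the project's own package names (for example c("ade4", "caret", "C50")), at the cost of hiding the one-package-per-step layout that the snippets above make explicit.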
The next step after installing and loading all of the required packages is to load the data itself. To do so, an optional variable (referred to as urlfile in the upcoming code snippet) can be created to hold the URL (a string) from which to fetch the data. Reading the data requires the read.csv() function (or an equivalent function from an alternative library).
It is highly recommended to explore the .data file before importing its content into the work environment in order to check whether or not the data is preceded by a header. In this case it is not, so using the argument header = FALSE ensures the data is read properly.
More information about the read.csv() function, its behavior and its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/utils/versions/3.6.2/topics/read.table
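To see why the header argument matters, the following sketch reads the same header-less text twice. The two records are made up for illustration and merely stand in for the first lines of a .data file; the text argument of read.csv() is used so that no download is needed.

```r
# Two rows of made-up, header-less CSV text standing in for a .data file
raw_text <- "101,M,17.99\n102,B,13.54"

with_header    <- read.csv(text = raw_text, header = TRUE)
without_header <- read.csv(text = raw_text, header = FALSE)

nrow(with_header)      # 1: the first record was consumed as column names
nrow(without_header)   # 2: both records are kept as data
names(without_header)  # default names V1, V2, V3
```

With header = TRUE the first observation is silently lost, which is why checking the raw file beforehand is worth the effort.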
The function head() prints the very first rows of the loaded dataset, which is useful to observe how the data has been interpreted by R.
urlfile = "https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data"
wbcd <- read.csv(urlfile, header = FALSE)
head(wbcd)
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
## 1 842302 M 17.99 10.38 122.80 1001.0 0.11840 0.27760 0.3001 0.14710 0.2419
## 2 842517 M 20.57 17.77 132.90 1326.0 0.08474 0.07864 0.0869 0.07017 0.1812
## 3 84300903 M 19.69 21.25 130.00 1203.0 0.10960 0.15990 0.1974 0.12790 0.2069
## 4 84348301 M 11.42 20.38 77.58 386.1 0.14250 0.28390 0.2414 0.10520 0.2597
## 5 84358402 M 20.29 14.34 135.10 1297.0 0.10030 0.13280 0.1980 0.10430 0.1809
## 6 843786 M 12.45 15.70 82.57 477.1 0.12780 0.17000 0.1578 0.08089 0.2087
## V12 V13 V14 V15 V16 V17 V18 V19 V20 V21
## 1 0.07871 1.0950 0.9053 8.589 153.40 0.006399 0.04904 0.05373 0.01587 0.03003
## 2 0.05667 0.5435 0.7339 3.398 74.08 0.005225 0.01308 0.01860 0.01340 0.01389
## 3 0.05999 0.7456 0.7869 4.585 94.03 0.006150 0.04006 0.03832 0.02058 0.02250
## 4 0.09744 0.4956 1.1560 3.445 27.23 0.009110 0.07458 0.05661 0.01867 0.05963
## 5 0.05883 0.7572 0.7813 5.438 94.44 0.011490 0.02461 0.05688 0.01885 0.01756
## 6 0.07613 0.3345 0.8902 2.217 27.19 0.007510 0.03345 0.03672 0.01137 0.02165
## V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32
## 1 0.006193 25.38 17.33 184.60 2019.0 0.1622 0.6656 0.7119 0.2654 0.4601 0.11890
## 2 0.003532 24.99 23.41 158.80 1956.0 0.1238 0.1866 0.2416 0.1860 0.2750 0.08902
## 3 0.004571 23.57 25.53 152.50 1709.0 0.1444 0.4245 0.4504 0.2430 0.3613 0.08758
## 4 0.009208 14.91 26.50 98.87 567.7 0.2098 0.8663 0.6869 0.2575 0.6638 0.17300
## 5 0.005115 22.54 16.67 152.20 1575.0 0.1374 0.2050 0.4000 0.1625 0.2364 0.07678
## 6 0.005082 15.47 23.75 103.40 741.6 0.1791 0.5249 0.5355 0.1741 0.3985 0.12440
The function colnames() assigns a character vector of length “N” to a dataset of “N” columns so that the elements of the vector become the dataset’s header. Since the lack of content-descriptive column headers in the Wisconsin Breast Cancer Dataset makes the data hard to understand, the following code snippet makes use of colnames() to properly name each of the columns.
More information about the colnames() function, its behavior and its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/row%2Bcolnames
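Before applying it to the full dataset, the mechanics can be sketched on a toy data frame (made-up values). The sketch also illustrates a side effect of names that contain a space, such as "concave points_mean": they are not syntactically valid R names, so they need backticks or quoted access.

```r
# Toy data frame with the default names read.csv() would have produced
toy <- data.frame(V1 = c(1, 2), V2 = c(0.10, 0.20))
colnames(toy) <- c("id", "concave points_mean")

# A name containing a space needs backticks with $ ...
toy$`concave points_mean`
# ... or a quoted string with [[ ]]
toy[["concave points_mean"]]

# make.names() shows the syntactically valid equivalent of each name
make.names(colnames(toy))
```

Keeping the space works, but replacing it (for instance with an underscore) would arguably make later column access less error-prone.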
colnames(wbcd) <- c("id","diagnosis","radius_mean","texture_mean","perimeter_mean","area_mean","smoothness_mean","compactness_mean","concavity_mean",
"concave points_mean","symmetry_mean","fractal_dimension_mean","radius_se","texture_se","perimeter_se","area_se","smoothness_se",
"compactness_se","concavity_se","concave points_se","symmetry_se","fractal_dimension_se","radius_worst","texture_worst","perimeter_worst",
"area_worst","smoothness_worst","compactness_worst","concavity_worst","concave points_worst","symmetry_worst","fractal_dimension_worst")
head(wbcd)
## id diagnosis radius_mean texture_mean perimeter_mean area_mean
## 1 842302 M 17.99 10.38 122.80 1001.0
## 2 842517 M 20.57 17.77 132.90 1326.0
## 3 84300903 M 19.69 21.25 130.00 1203.0
## 4 84348301 M 11.42 20.38 77.58 386.1
## 5 84358402 M 20.29 14.34 135.10 1297.0
## 6 843786 M 12.45 15.70 82.57 477.1
## smoothness_mean compactness_mean concavity_mean concave points_mean
## 1 0.11840 0.27760 0.3001 0.14710
## 2 0.08474 0.07864 0.0869 0.07017
## 3 0.10960 0.15990 0.1974 0.12790
## 4 0.14250 0.28390 0.2414 0.10520
## 5 0.10030 0.13280 0.1980 0.10430
## 6 0.12780 0.17000 0.1578 0.08089
## symmetry_mean fractal_dimension_mean radius_se texture_se perimeter_se
## 1 0.2419 0.07871 1.0950 0.9053 8.589
## 2 0.1812 0.05667 0.5435 0.7339 3.398
## 3 0.2069 0.05999 0.7456 0.7869 4.585
## 4 0.2597 0.09744 0.4956 1.1560 3.445
## 5 0.1809 0.05883 0.7572 0.7813 5.438
## 6 0.2087 0.07613 0.3345 0.8902 2.217
## area_se smoothness_se compactness_se concavity_se concave points_se
## 1 153.40 0.006399 0.04904 0.05373 0.01587
## 2 74.08 0.005225 0.01308 0.01860 0.01340
## 3 94.03 0.006150 0.04006 0.03832 0.02058
## 4 27.23 0.009110 0.07458 0.05661 0.01867
## 5 94.44 0.011490 0.02461 0.05688 0.01885
## 6 27.19 0.007510 0.03345 0.03672 0.01137
## symmetry_se fractal_dimension_se radius_worst texture_worst perimeter_worst
## 1 0.03003 0.006193 25.38 17.33 184.60
## 2 0.01389 0.003532 24.99 23.41 158.80
## 3 0.02250 0.004571 23.57 25.53 152.50
## 4 0.05963 0.009208 14.91 26.50 98.87
## 5 0.01756 0.005115 22.54 16.67 152.20
## 6 0.02165 0.005082 15.47 23.75 103.40
## area_worst smoothness_worst compactness_worst concavity_worst
## 1 2019.0 0.1622 0.6656 0.7119
## 2 1956.0 0.1238 0.1866 0.2416
## 3 1709.0 0.1444 0.4245 0.4504
## 4 567.7 0.2098 0.8663 0.6869
## 5 1575.0 0.1374 0.2050 0.4000
## 6 741.6 0.1791 0.5249 0.5355
## concave points_worst symmetry_worst fractal_dimension_worst
## 1 0.2654 0.4601 0.11890
## 2 0.1860 0.2750 0.08902
## 3 0.2430 0.3613 0.08758
## 4 0.2575 0.6638 0.17300
## 5 0.1625 0.2364 0.07678
## 6 0.1741 0.3985 0.12440
After checking that the data is now properly referenced by the new header, it is time to examine the dataset through the str() and summary() functions. The function class() is redundant in this case, as str() already returns the class of the dataset alongside the class and first elements of each of its columns/variables (in other words, str() displays, in a compact manner, the internal structure of an R object). However, it is included in the upcoming code snippet due to its usefulness in other scenarios (and given that, as stated previously, this extensive analysis aims to be educational and informative).
The summary() function, on the other hand, summarizes each column/variable: for numeric columns it returns the minimum, maximum, mean and median along with the first and third quartiles. For factor columns it returns the number of occurrences of each level; note, however, that the diagnosis column has been read as a character vector here, so summary() only reports its length, class and mode.
class(): https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/class
str(): https://www.rdocumentation.org/packages/utils/versions/3.6.2/topics/str
summary(): https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/summary
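The difference between how summary() treats character, factor and numeric columns can be sketched on a toy data frame (made-up values, not taken from the dataset):

```r
# Toy data: one character column and one numeric column
toy <- data.frame(diagnosis = c("M", "B", "B"),
                  radius    = c(18.0, 12.5, 11.4),
                  stringsAsFactors = FALSE)

summary(toy$diagnosis)          # character: length, class and mode only
summary(factor(toy$diagnosis))  # factor: number of occurrences per level
summary(toy$radius)             # numeric: min, quartiles, median, mean, max

factor_counts <- summary(factor(toy$diagnosis))
```

This is why converting diagnosis into a factor, as done later in the wrangling step, makes its summary more informative.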
class(wbcd)
## [1] "data.frame"
str(wbcd)
## 'data.frame': 569 obs. of 32 variables:
## $ id : int 842302 842517 84300903 84348301 84358402 843786 844359 84458202 844981 84501001 ...
## $ diagnosis : chr "M" "M" "M" "M" ...
## $ radius_mean : num 18 20.6 19.7 11.4 20.3 ...
## $ texture_mean : num 10.4 17.8 21.2 20.4 14.3 ...
## $ perimeter_mean : num 122.8 132.9 130 77.6 135.1 ...
## $ area_mean : num 1001 1326 1203 386 1297 ...
## $ smoothness_mean : num 0.1184 0.0847 0.1096 0.1425 0.1003 ...
## $ compactness_mean : num 0.2776 0.0786 0.1599 0.2839 0.1328 ...
## $ concavity_mean : num 0.3001 0.0869 0.1974 0.2414 0.198 ...
## $ concave points_mean : num 0.1471 0.0702 0.1279 0.1052 0.1043 ...
## $ symmetry_mean : num 0.242 0.181 0.207 0.26 0.181 ...
## $ fractal_dimension_mean : num 0.0787 0.0567 0.06 0.0974 0.0588 ...
## $ radius_se : num 1.095 0.543 0.746 0.496 0.757 ...
## $ texture_se : num 0.905 0.734 0.787 1.156 0.781 ...
## $ perimeter_se : num 8.59 3.4 4.58 3.44 5.44 ...
## $ area_se : num 153.4 74.1 94 27.2 94.4 ...
## $ smoothness_se : num 0.0064 0.00522 0.00615 0.00911 0.01149 ...
## $ compactness_se : num 0.049 0.0131 0.0401 0.0746 0.0246 ...
## $ concavity_se : num 0.0537 0.0186 0.0383 0.0566 0.0569 ...
## $ concave points_se : num 0.0159 0.0134 0.0206 0.0187 0.0188 ...
## $ symmetry_se : num 0.03 0.0139 0.0225 0.0596 0.0176 ...
## $ fractal_dimension_se : num 0.00619 0.00353 0.00457 0.00921 0.00511 ...
## $ radius_worst : num 25.4 25 23.6 14.9 22.5 ...
## $ texture_worst : num 17.3 23.4 25.5 26.5 16.7 ...
## $ perimeter_worst : num 184.6 158.8 152.5 98.9 152.2 ...
## $ area_worst : num 2019 1956 1709 568 1575 ...
## $ smoothness_worst : num 0.162 0.124 0.144 0.21 0.137 ...
## $ compactness_worst : num 0.666 0.187 0.424 0.866 0.205 ...
## $ concavity_worst : num 0.712 0.242 0.45 0.687 0.4 ...
## $ concave points_worst : num 0.265 0.186 0.243 0.258 0.163 ...
## $ symmetry_worst : num 0.46 0.275 0.361 0.664 0.236 ...
## $ fractal_dimension_worst: num 0.1189 0.089 0.0876 0.173 0.0768 ...
summary(wbcd)
## id diagnosis radius_mean texture_mean
## Min. : 8670 Length:569 Min. : 6.981 Min. : 9.71
## 1st Qu.: 869218 Class :character 1st Qu.:11.700 1st Qu.:16.17
## Median : 906024 Mode :character Median :13.370 Median :18.84
## Mean : 30371831 Mean :14.127 Mean :19.29
## 3rd Qu.: 8813129 3rd Qu.:15.780 3rd Qu.:21.80
## Max. :911320502 Max. :28.110 Max. :39.28
## perimeter_mean area_mean smoothness_mean compactness_mean
## Min. : 43.79 Min. : 143.5 Min. :0.05263 Min. :0.01938
## 1st Qu.: 75.17 1st Qu.: 420.3 1st Qu.:0.08637 1st Qu.:0.06492
## Median : 86.24 Median : 551.1 Median :0.09587 Median :0.09263
## Mean : 91.97 Mean : 654.9 Mean :0.09636 Mean :0.10434
## 3rd Qu.:104.10 3rd Qu.: 782.7 3rd Qu.:0.10530 3rd Qu.:0.13040
## Max. :188.50 Max. :2501.0 Max. :0.16340 Max. :0.34540
## concavity_mean concave points_mean symmetry_mean fractal_dimension_mean
## Min. :0.00000 Min. :0.00000 Min. :0.1060 Min. :0.04996
## 1st Qu.:0.02956 1st Qu.:0.02031 1st Qu.:0.1619 1st Qu.:0.05770
## Median :0.06154 Median :0.03350 Median :0.1792 Median :0.06154
## Mean :0.08880 Mean :0.04892 Mean :0.1812 Mean :0.06280
## 3rd Qu.:0.13070 3rd Qu.:0.07400 3rd Qu.:0.1957 3rd Qu.:0.06612
## Max. :0.42680 Max. :0.20120 Max. :0.3040 Max. :0.09744
## radius_se texture_se perimeter_se area_se
## Min. :0.1115 Min. :0.3602 Min. : 0.757 Min. : 6.802
## 1st Qu.:0.2324 1st Qu.:0.8339 1st Qu.: 1.606 1st Qu.: 17.850
## Median :0.3242 Median :1.1080 Median : 2.287 Median : 24.530
## Mean :0.4052 Mean :1.2169 Mean : 2.866 Mean : 40.337
## 3rd Qu.:0.4789 3rd Qu.:1.4740 3rd Qu.: 3.357 3rd Qu.: 45.190
## Max. :2.8730 Max. :4.8850 Max. :21.980 Max. :542.200
## smoothness_se compactness_se concavity_se concave points_se
## Min. :0.001713 Min. :0.002252 Min. :0.00000 Min. :0.000000
## 1st Qu.:0.005169 1st Qu.:0.013080 1st Qu.:0.01509 1st Qu.:0.007638
## Median :0.006380 Median :0.020450 Median :0.02589 Median :0.010930
## Mean :0.007041 Mean :0.025478 Mean :0.03189 Mean :0.011796
## 3rd Qu.:0.008146 3rd Qu.:0.032450 3rd Qu.:0.04205 3rd Qu.:0.014710
## Max. :0.031130 Max. :0.135400 Max. :0.39600 Max. :0.052790
## symmetry_se fractal_dimension_se radius_worst texture_worst
## Min. :0.007882 Min. :0.0008948 Min. : 7.93 Min. :12.02
## 1st Qu.:0.015160 1st Qu.:0.0022480 1st Qu.:13.01 1st Qu.:21.08
## Median :0.018730 Median :0.0031870 Median :14.97 Median :25.41
## Mean :0.020542 Mean :0.0037949 Mean :16.27 Mean :25.68
## 3rd Qu.:0.023480 3rd Qu.:0.0045580 3rd Qu.:18.79 3rd Qu.:29.72
## Max. :0.078950 Max. :0.0298400 Max. :36.04 Max. :49.54
## perimeter_worst area_worst smoothness_worst compactness_worst
## Min. : 50.41 Min. : 185.2 Min. :0.07117 Min. :0.02729
## 1st Qu.: 84.11 1st Qu.: 515.3 1st Qu.:0.11660 1st Qu.:0.14720
## Median : 97.66 Median : 686.5 Median :0.13130 Median :0.21190
## Mean :107.26 Mean : 880.6 Mean :0.13237 Mean :0.25427
## 3rd Qu.:125.40 3rd Qu.:1084.0 3rd Qu.:0.14600 3rd Qu.:0.33910
## Max. :251.20 Max. :4254.0 Max. :0.22260 Max. :1.05800
## concavity_worst concave points_worst symmetry_worst fractal_dimension_worst
## Min. :0.0000 Min. :0.00000 Min. :0.1565 Min. :0.05504
## 1st Qu.:0.1145 1st Qu.:0.06493 1st Qu.:0.2504 1st Qu.:0.07146
## Median :0.2267 Median :0.09993 Median :0.2822 Median :0.08004
## Mean :0.2722 Mean :0.11461 Mean :0.2901 Mean :0.08395
## 3rd Qu.:0.3829 3rd Qu.:0.16140 3rd Qu.:0.3179 3rd Qu.:0.09208
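## Max. :1.2520 Max. :0.29100 Max. :0.6638 Max. :0.20750
From these results, two observations stand out:
There is an id column which holds no valuable information.
Each measured feature is described by three variables (for example, there is a perimeter_mean, a perimeter_se and a perimeter_worst, each with its own mean, standard error and maximum and minimum values).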
Data wrangling, sometimes referred to as data munging, is the process of transforming and mapping data from one “raw” data form into another format with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics.
Since the dataset at hand is already of class data.frame, there is no need to transform its class in any way (the functions to be used work with data.frame class objects). Given that, the next procedure is to clean any empty values (NULL and NAs through the use of the is.null() and is.na() functions) which could meddle with later operations - let that be the first wrangling operation to be performed:
# Check for NULL data (a data frame cell is never NULL, so the check is made per column)
null_check <- any(sapply(wbcd, is.null))
null_check # FALSE: no NULL data
## [1] FALSE
# Check for NA data
na_check <- sum(is.na(wbcd))
na_check # 0: no NA data
## [1] 0
As stated previously, there is no need for the id column within our dataset (since it holds no valuable information), so the next (and final) wrangling procedure is to get rid of it. The following code snippet also showcases an additional step, optional but recommended: the diagnosis labels are renamed and turned into a factor to better convey their meaning (this helps those unfamiliar with the dataset to interpret it).
wbcd <- wbcd[,-1]
wbcd$diagnosis <- factor(ifelse(wbcd$diagnosis=="B","Benign","Malignant"))
Data analysts/scientists aim to study and understand a given set of data. Correlation charts facilitate said study by clearly showing which variables are correlated and which are not. What’s more, these correlations are core to the Principal Component Analysis (PCA) which is to be performed (and detailed) later in this document.
As noted earlier, each measured feature is described by three variables (for example, there is a perimeter_mean, a perimeter_se and a perimeter_worst variable, each with its own mean, standard error and maximum and minimum values). Given that, this correlation chart exercise is performed upon 3 distinct groups: the mean, the se and the worst variables.
Four functions are used to plot the correlation charts:
The chart.Correlation() function from the PerformanceAnalytics package.
The pairs.panels() function from the psych package.
The ggpairs() function from the GGally package.
The ggcorr() function from the GGally package.
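All of these functions report one of three correlation coefficients (Pearson, Kendall or Spearman). As a rough sketch of how they differ, the base R cor() function can be applied to simulated data (not the dataset itself) in which one variable is a monotonic but strongly non-linear function of the other:

```r
set.seed(42)  # arbitrary seed, only for reproducibility of the toy data
x <- runif(50, 1, 10)
y <- exp(x)   # monotonic but strongly non-linear in x

methods <- c("pearson", "kendall", "spearman")
coefs <- sapply(methods, function(m) cor(x, y, method = m))
round(coefs, 3)
```

The rank-based coefficients (Kendall and Spearman) equal 1 because the relationship is perfectly monotonic, whereas Pearson’s stays below 1 because it measures linear association only.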
The chart.Correlation() function from the PerformanceAnalytics package allows the user to plot a correlation chart based on the following arguments:
histogram: TRUE or FALSE, whether or not to display a histogram.
method: the correlation method, one of “pearson” (default), “kendall” or “spearman”.
# Do not run this code snippet, as it is only here for illustration purposes
library(PerformanceAnalytics)
chart.Correlation(R,
histogram = TRUE,
method = "pearson",
...)
Additional arguments can be passed in order to better define the aesthetic elements of the scatter plots and the optional histogram: the function accepts any arguments that can be passed to pairs(). Further information regarding the function chart.Correlation(), its behavior and its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/PerformanceAnalytics/versions/2.0.4/topics/chart.Correlation
The following code snippet showcases the function at hand, although only Pearson correlation is plotted for the sake of simplicity and readability.
library(PerformanceAnalytics)
# Analysis of the Pearson correlation between variables
chart.Correlation(wbcd[,c(2:11)],
histogram=TRUE,
method = "pearson",
col="grey10",
pch=1,
main="Correlation chart for mean data")
chart.Correlation(wbcd[,c(12:21)],
histogram=TRUE,
method = "pearson",
col="grey10",
pch=1,
main="Correlation chart for se data")
chart.Correlation(wbcd[,c(22:31)],
histogram=TRUE,
method = "pearson",
col="grey10",
pch=1,
main="Correlation chart for worst data")
# Analysis of the Kendall correlation between variables
chart.Correlation(wbcd[,c(2:11)],
method = "kendall",
histogram=TRUE,
col="grey10",
pch=1,
main="Correlation chart for mean data")
chart.Correlation(wbcd[,c(12:21)],
method = "kendall",
histogram=TRUE,
col="grey10",
pch=1,
main="Correlation chart for se data")
chart.Correlation(wbcd[,c(22:31)],
method = "kendall",
histogram=TRUE,
col="grey10",
pch=1,
main="Correlation chart for worst data")
# Analysis of the Spearman correlation between variables
chart.Correlation(wbcd[,c(2:11)],
method = "spearman",
histogram=TRUE,
col="grey10",
pch=1,
main="Correlation chart for mean data")
chart.Correlation(wbcd[,c(12:21)],
method = "spearman",
histogram=TRUE,
col="grey10",
pch=1,
main="Correlation chart for se data")
chart.Correlation(wbcd[,c(22:31)],
method = "spearman",
histogram=TRUE,
col="grey10",
pch=1,
main="Correlation chart for worst data")
The pairs.panels() function from the psych package allows the user to plot a correlation chart based on the following arguments:
x: the data to plot (a matrix or data.frame).
smooth: TRUE draws loess smooths.
scale: TRUE scales the correlation font by the size of the absolute correlation.
density: TRUE shows the density plots as well as histograms.
ellipses: TRUE draws correlation ellipses.
lm: TRUE plots the linear fit rather than the LOESS smoothed fits.
method: the correlation method, one of “pearson” (default), “kendall” or “spearman”.
cor: TRUE or FALSE, whether or not to report correlations if plotting regressions.
jiggle: TRUE or FALSE, whether or not the data points are jittered before being plotted.
show.points: if FALSE, the data points are not shown, just the data ellipses and smoothed functions.
rug: TRUE or FALSE, whether or not a rug is drawn under the histograms.
cex.cor: if only cex is specified, it determines the size of the text in the correlation boxes; if cex.cor is also specified, cex changes the size of the points instead.
smoother: if TRUE, smooth.scatter is applied to the data points (which is slow but pretty when lots of subjects are to be plotted).
stars: TRUE or FALSE, whether or not to show the significance of correlations by using asterisks [*].
ci: TRUE or FALSE, whether or not to draw confidence intervals for the linear model or for the loess fit. If confidence intervals are not drawn, the fitting function is lowess.
# Do not run this code snippet, as it is only here for illustration purposes
library(psych)
pairs.panels(x,
smooth = TRUE,
scale = FALSE,
density = TRUE,
ellipses = TRUE,
lm = FALSE,
digits = 2,
method = "pearson",
pch = 20,
cor = TRUE,
jiggle = FALSE,
factor = 2,
hist.col = "cyan",
show.points = TRUE,
rug = TRUE,
breaks = "Sturges",
cex.cor = 1,
wt = NULL,
smoother = FALSE,
stars = FALSE,
ci = FALSE,
alpha = .05,
...)
As with the chart.Correlation() function, additional arguments can be passed in order to better define the scatter plot and the histogram, which is not optional with this function. Besides that, pairs.panels() can be considered more customizable than chart.Correlation() due to the sheer number of specific arguments it accepts. Note that, as was the case with chart.Correlation(), pairs.panels() also accepts any arguments that can be passed to pairs().
For more information about the function itself, here’s the rdocumentation.org related page: https://www.rdocumentation.org/packages/psych/versions/2.1.6/topics/pairs.panels
The following code snippet showcases the function at hand, although only Pearson correlation is plotted for the sake of simplicity and readability.
library(psych)
# Analysis of the Pearson correlation between variables
pairs.panels(wbcd[,c(2:11)],
method="pearson",
hist.col = "#cccccc",
pch=1, lm=TRUE,
stars = TRUE,
main="Correlation chart for mean data")
pairs.panels(wbcd[,c(12:21)],
method="pearson",
hist.col = "#cccccc",
pch=1,
lm=TRUE,
stars = TRUE,
main="Correlation chart for se data")
pairs.panels(wbcd[,c(22:31)],
method="pearson",
hist.col = "#cccccc",
pch=1,
lm=TRUE,
stars = TRUE,
main="Correlation chart for worst data")
# Analysis of the Kendall correlation between variables
pairs.panels(wbcd[,c(2:11)],
method="kendall",
hist.col = "#cccccc",
pch=1,
lm=TRUE,
stars = TRUE,
main="Correlation chart for mean data")
pairs.panels(wbcd[,c(12:21)],
method="kendall",
hist.col = "#cccccc",
pch=1,
lm=TRUE,
stars = TRUE,
main="Correlation chart for se data")
pairs.panels(wbcd[,c(22:31)],
method="kendall",
hist.col = "#cccccc",
pch=1,
lm=TRUE,
stars = TRUE,
main="Correlation chart for worst data")
# Analysis of the Spearman correlation between variables
pairs.panels(wbcd[,c(2:11)],
method="spearman",
hist.col = "#cccccc",
pch=1,
lm=TRUE,
stars = TRUE,
main="Correlation chart for mean data")
pairs.panels(wbcd[,c(12:21)],
method="spearman",
hist.col = "#cccccc",
pch=1,
lm=TRUE,
stars = TRUE,
main="Correlation chart for se data")
pairs.panels(wbcd[,c(22:31)],
method="spearman",
hist.col = "#cccccc",
pch=1,
lm=TRUE,
stars = TRUE,
main="Correlation chart for worst data")
The ggpairs() function from the GGally package allows the user to plot a correlation chart based on a number of arguments, the most noteworthy of which appear in the following signature:
# Do not run this code snippet, as it is only here for illustration purposes
library(GGally)
ggpairs(data,
mapping = NULL,
columns = 1:ncol(data),
title = NULL,
xlab = NULL,
ylab = NULL,
...)
As with the previous functions, additional arguments can be passed. Since ggpairs() is based upon ggplot2, it is compatible with many of the arguments available for ggplot2 plots, and covering them all is beyond the scope of this document. Some of them are showcased in the upcoming code snippet, and more information about the function and its arguments can be found in its associated RDocumentation page: https://www.rdocumentation.org/packages/GGally/versions/1.5.0/topics/ggpairs
Note that ggplot2 is one of the most powerful R tools to create any sort of graphics (arguably the most powerful one). The following RDocumentation page overviews its installation and usage: https://www.rdocumentation.org/packages/ggplot2/versions/3.3.5
The following code snippet showcases the function at hand. Note that, as opposed to the previously detailed functions, ggpairs() has no dedicated argument to choose the correlation method: it applies Pearson’s by default (a different method has to be passed through the correlation panel itself, for example with upper = list(continuous = wrap("cor", method = "spearman"))).
library(GGally)
ggpairs(wbcd[,c(2:11)]) +
theme_bw() +
labs(title = "Correlation chart for mean data") +
theme(plot.title = element_text(face = 'bold', color = '#000000', hjust = 0.5, size = 14))
ggpairs(wbcd[,c(12:21)]) +
theme_bw() +
labs(title = "Correlation chart for se data") +
theme(plot.title = element_text(face = 'bold', color = '#000000', hjust = 0.5, size = 14))
ggpairs(wbcd[,c(22:31)]) +
theme_bw() +
labs(title = "Correlation chart for worst data") +
theme(plot.title = element_text(face = 'bold', color = '#000000', hjust = 0.5, size = 14))
Being based upon ggplot2 is a strong point in favor of ggpairs(), and creativity can go a long way: data science is not only about understanding the data at hand but also about making it understandable for others. The many possibilities ggplot2 brings along can dramatically increase a plot’s readability, making it easier to understand for non-specialists (as in not data scientists). The upcoming code snippet slightly modifies the previous one, showcasing a diagnosis-based coloring which makes the graph much easier to interpret (while also highlighting the striking similarities between this function’s structure and ggplot()’s).
library(GGally)
ggpairs(wbcd[,c(2:11,1)], aes(color = diagnosis, alpha = 0.75), lower = list(continuous = "smooth")) +
theme_bw() +
labs(title = "Correlation chart for mean data") +
theme(plot.title = element_text(face = 'bold', color = '#000000', hjust = 0.5, size = 14))
ggpairs(wbcd[,c(12:21,1)], aes(color = diagnosis, alpha = 0.75), lower = list(continuous = "smooth")) +
theme_bw() +
labs(title = "Correlation chart for se data") +
theme(plot.title = element_text(face = 'bold', color = '#000000', hjust = 0.5, size = 14))
ggpairs(wbcd[,c(22:31,1)], aes(color = diagnosis, alpha = 0.75), lower = list(continuous = "smooth")) +
theme_bw() +
labs(title = "Correlation chart for worst data") +
theme(plot.title = element_text(face = 'bold', color = '#000000', hjust = 0.5, size = 14))
The ggcorr() function from the GGally package allows the user to plot a simplified correlation chart focused solely on correlation values, which increases their visibility and readability (helping to illustrate and showcase certain points) at the cost of omitting the data itself (neither the data points and their scatter plots nor the histograms are plotted with this function). Its most relevant arguments are the following:
method: a vector of two character strings. The first value gives the method for computing covariances in the presence of missing values, and must be one of “everything”, “all.obs”, “complete.obs”, “na.or.complete” or “pairwise.complete.obs” (abbreviations work); the second value gives the type of correlation coefficient to compute, and must be one of “pearson”, “kendall” or “spearman”.
nbreaks: the number of breaks to apply to the correlation coefficients. Defaults to NULL, which implies no breaks (continuous scaling).
palette: if nbreaks is used, a ColorBrewer palette to use instead of the colors specified by low, mid and high.
geom: the geom object to use, one of “tile”, “circle”, “text” or “blank”.
min_size: when geom has been set to “circle”, the minimum size of the circles.
max_size: when geom has been set to “circle”, the maximum size of the circles.
label: TRUE or FALSE, whether or not to add correlation coefficients to the plot.
limits: the bounds of the color scale; set limits = NULL or FALSE to remove them.
drop: TRUE or FALSE, whether or not to drop unused breaks from the color scale.
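The two values of the method argument mirror the use and method arguments of base R’s cor() function, which is assumed here (not verified against the GGally source) to perform the underlying computation. A minimal sketch with made-up data containing missing values shows the effect of each choice:

```r
# Toy data with a few missing values (made-up numbers)
toy <- data.frame(a = c(1, 2, 3, 4, NA),
                  b = c(2, 4, 5, 9, 10),
                  c = c(5, 3, NA, 2, 1))

# "everything": any NA in a pair of columns propagates to their coefficient
m_everything <- cor(toy, use = "everything", method = "pearson")

# "pairwise.complete.obs": each coefficient uses the rows complete for that pair
m_pairwise <- cor(toy, use = "pairwise.complete.obs", method = "spearman")

m_everything
m_pairwise
```

Since the Wisconsin Breast Cancer Dataset is reported above to have no missing values, the first element of method should make little practical difference in this analysis; the second element is the one that changes the results.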
As was the case with ggpairs(), this function is also based upon ggplot2 making it compatible with many of ggplot2’s arguments. More information regarding the function and its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/GGally/versions/1.5.0/topics/ggcorr
# Do not run this code snippet, as it is only here for illustration purposes
library(GGally)
ggcorr(data,
method = c("pairwise", "pearson"),
cor_matrix = NULL,
nbreaks = NULL,
digits = 2,
name = "",
low = "#3B9AB2",
mid = "#EEEEEE",
high = "#F21A00",
midpoint = 0,
palette = NULL,
geom = "tile",
min_size = 2,
max_size = 6,
label = FALSE,
label_alpha = FALSE,
label_color = "black",
label_round = 1,
label_size = 4,
limits = c(-1, 1),
drop = is.null(limits) || identical(limits, FALSE),
layout.exp = 0,
legend.position = "right",
legend.size = 9,
...)
The following code snippets showcase the function at hand being applied to the Wisconsin Breast Cancer Dataset, although only Pearson correlation is plotted for the sake of simplicity and readability.
library(GGally)
# Analysis of the Pearson correlation between variables using the ggcorr() function from the GGally package
ggcorr(wbcd[,c(2:11)],
method = c("pairwise", "pearson"),
name = "corr",
geom = "tile",
label = TRUE) +
theme(legend.position = "none") +
labs(title = "Correlation chart for mean data") +
theme(plot.title = element_text(face = 'bold', color = 'black', hjust = 0.5, size = 12))
ggcorr(wbcd[,c(12:21)],
method = c("pairwise", "pearson"),
name = "corr",
geom = "tile",
label = TRUE) +
theme(legend.position = "none") +
labs(title = "Correlation chart for se data") +
theme(plot.title = element_text(face = 'bold', color = 'black', hjust = 0.5, size = 12))
ggcorr(wbcd[,c(22:31)],
method = c("pairwise", "pearson"),
name = "corr",
geom = "tile",
label = TRUE) +
theme(legend.position = "none") +
labs(title = "Correlation chart for worst data") +
theme(plot.title = element_text(face = 'bold', color = 'black', hjust = 0.5, size = 12))
# Analysis of the Kendall correlation between variables using the ggcorr() function from the GGally package
ggcorr(wbcd[,c(2:11)],
method = c("pairwise", "kendall"),
name = "corr",
geom = "tile",
label = TRUE) +
theme(legend.position = "none") +
labs(title = "Correlation chart for mean data") +
theme(plot.title = element_text(face = 'bold', color = 'black', hjust = 0.5, size = 12))
ggcorr(wbcd[,c(12:21)],
method = c("pairwise", "kendall"),
name = "corr",
geom = "tile",
label = TRUE) +
theme(legend.position = "none") +
labs(title = "Correlation chart for se data") +
theme(plot.title = element_text(face = 'bold', color = 'black', hjust = 0.5, size = 12))
ggcorr(wbcd[,c(22:31)],
method = c("pairwise", "kendall"),
name = "corr",
geom = "tile",
label = TRUE) +
theme(legend.position = "none") +
labs(title = "Correlation chart for worst data") +
theme(plot.title = element_text(face = 'bold', color = 'black', hjust = 0.5, size = 12))
# Analysis of the Spearman correlation between variables using the ggcorr() function from the GGally package
ggcorr(wbcd[,c(2:11)],
method = c("pairwise", "spearman"),
name = "corr",
geom = "tile",
label = TRUE) +
theme(legend.position = "none") +
labs(title = "Correlation chart for mean data") +
theme(plot.title = element_text(face = 'bold', color = 'black', hjust = 0.5, size = 12))
ggcorr(wbcd[,c(12:21)],
method = c("pairwise", "spearman"),
name = "corr",
geom = "tile",
label = TRUE) +
theme(legend.position = "none") +
labs(title = "Correlation chart for se data") +
theme(plot.title = element_text(face = 'bold', color = 'black', hjust = 0.5, size = 12))
ggcorr(wbcd[,c(22:31)],
method = c("pairwise", "spearman"),
name = "corr",
geom = "tile",
label = TRUE) +
theme(legend.position = "none") +
labs(title = "Correlation chart for worst data") +
theme(plot.title = element_text(face = 'bold', color = 'black', hjust = 0.5, size = 12))
Principal Component Analysis, commonly known as PCA, is one of the most popular approaches to dimensionality reduction. It is widely used to summarize and visualize the information within a dataset, often described through multiple inter-correlated quantitative variables. Considering each variable a dimension turns any dataset into a multi-dimensional space, which is impossible to visualize directly when said dataset is defined by more than 3 variables (since there exist only 3 spatial dimensions). That is where dimensionality reduction (and thus PCA) comes into play.
Through PCA, the important information from a multivariate dataset can be extracted and expressed through a set of new variables called Principal Components (PC - singular; PCs - plural). These PCs are linear combinations of the original variables, and because the first few PCs capture most of the variance, expressing the dataset through them reduces the overall number of variables needed to understand the data. Using this approach, PCA reduces the dimensionality of a multivariate dataset in order to identify patterns and better understand these highly complex matrices. This process also makes it possible to visualize said multivariate datasets with minimal loss of information by keeping only the first 2 PCs (for a bi-dimensional plot) or the first 3 PCs (for a three-dimensional one).
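To make the "linear combination" idea concrete, the following minimal sketch rebuilds the PC scores by hand. It is only an illustration under stated assumptions: it uses the USArrests dataset that ships with base R as a stand-in for the Wisconsin Breast Cancer Dataset, and the object names (X, pca, manual_scores) are hypothetical.

```r
# Assumption: USArrests (bundled with base R) stands in for wbcd
X <- scale(USArrests)                      # center and standardize each variable
pca <- prcomp(USArrests, scale. = TRUE)

# Each PC score is a weighted sum of the standardized variables;
# the weights are the columns of the rotation matrix
manual_scores <- X %*% pca$rotation

# The hand-made scores should match the ones stored by prcomp()
all.equal(unname(manual_scores), unname(pca$x))

# Keeping only the first two PCs gives a 2-D view of a 4-D dataset
head(pca$x[, 1:2])
```

The same reasoning applies to any numeric data frame: the rotation matrix holds the weights and the x component holds the resulting coordinates.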
There’s a fantastic video by content creator Josh Starmer that explains PCA step by step, detailing the intricacies of this technique and all of the elements at play. There are many aspects of PCA that will not be covered within the scope of this document due to time constraints and to keep the overall size within reasonable limits, so the video essay in question is highly recommended as an excellent starting point: https://www.youtube.com/watch?v=FgakZw6K1QQ
Several R functions can be used to perform a PCA, including:
prcomp() and princomp(), which are R built-in functions
PCA() from the FactoMineR package
dudi.pca() from the ade4 package
epPCA() from the ExPosition package
As already stated, there are two built-in R functions (from R’s built-in stats package) that allow the user to perform a Principal Component Analysis: prcomp() and princomp(). Despite the similarity in name, their approaches to PCA are quite different: prcomp() performs a singular value decomposition (SVD) of the centered (and optionally scaled) data matrix, whereas princomp() performs a spectral decomposition (eigendecomposition) of the covariance or correlation matrix of the variables.
It is worth noting that, according to the R help pages, SVD has slightly better numerical accuracy, which makes prcomp() the preferred approach (although there could exist a scenario under which princomp() yields better results).
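The two routes can be reproduced by hand with svd() and eigen(). The sketch below is illustrative only: it assumes the USArrests dataset from base R as a stand-in for wbcd, and the variable names are hypothetical.

```r
# Assumption: USArrests (bundled with base R) stands in for wbcd
X <- scale(USArrests)          # centered and standardized data matrix
n <- nrow(X)

# prcomp() route: singular value decomposition of the data matrix
sv <- svd(X)
sdev_svd <- sv$d / sqrt(n - 1)

# princomp() route: eigendecomposition of the correlation matrix
ev <- eigen(cor(USArrests), symmetric = TRUE)
sdev_eigen <- sqrt(ev$values)

# Both routes should agree with each other and with the built-in functions
all.equal(sdev_svd, prcomp(USArrests, scale. = TRUE)$sdev)
all.equal(sdev_eigen, unname(princomp(USArrests, cor = TRUE)$sdev))
all.equal(sdev_svd, sdev_eigen)
```

On well-conditioned data such as this the two approaches agree up to floating point error; the accuracy advantage of SVD matters mainly for nearly collinear variables.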
Let’s take a look at the main arguments for both functions:
# Do not run this code snippet, as it is only here for illustration purposes
prcomp(x,
retx = TRUE,
center = TRUE,
scale. = FALSE,
tol = NULL,
rank. = NULL,
...)
princomp(x,
cor = FALSE,
scores = TRUE, ...)
prcomp():
retx: TRUE or FALSE determines whether or not to return the rotated variables.
center: TRUE or FALSE determines if the variables should be shifted to be zero centered.
tol: a value indicating the magnitude below which components should be omitted (the default is tol = NULL, with which no components are omitted).
More information regarding the prcomp() function and its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/prcomp
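As a quick illustration of how these arguments change the result, the sketch below compares an unscaled and a scaled PCA and then applies tol. It assumes the USArrests dataset from base R as a stand-in for wbcd; the object names and the tol value of 0.5 are arbitrary choices made for the example.

```r
# Assumption: USArrests (bundled with base R) stands in for wbcd
pca_raw    <- prcomp(USArrests)                  # scale. = FALSE (the default)
pca_scaled <- prcomp(USArrests, scale. = TRUE)   # unit-variance variables

# Share of the total variance captured by PC1 in each case
prop_raw    <- summary(pca_raw)$importance["Proportion of Variance", "PC1"]
prop_scaled <- summary(pca_scaled)$importance["Proportion of Variance", "PC1"]
c(raw = prop_raw, scaled = prop_scaled)

# tol omits components whose standard deviation is less than or equal to
# tol times the standard deviation of the first component
pca_tol <- prcomp(USArrests, scale. = TRUE, tol = 0.5)
ncol(pca_tol$rotation)   # number of components kept
ncol(pca_tol$x)
```

Without scaling, the variable with the largest variance tends to dominate PC1, which is why the document calls prcomp() with scale. = TRUE.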
princomp():
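cor: a logical value indicating whether the calculation should use the correlation matrix (TRUE) or the covariance matrix (FALSE) of x.
scores: a logical value indicating whether the score on each principal component should be calculated for x.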
More information regarding the princomp() function and its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/princomp
Let’s now evaluate the results when applying prcomp() to the Wisconsin Breast Cancer Dataset:
all_pca_1 <- prcomp(wbcd[,-1], scale. = TRUE)
class(all_pca_1)
## [1] "prcomp"
str(all_pca_1)
## List of 5
## $ sdev : num [1:30] 3.64 2.39 1.68 1.41 1.28 ...
## $ rotation: num [1:30, 1:30] -0.219 -0.104 -0.228 -0.221 -0.143 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## .. ..$ : chr [1:30] "PC1" "PC2" "PC3" "PC4" ...
## $ center : Named num [1:30] 14.1273 19.2896 91.969 654.8891 0.0964 ...
## ..- attr(*, "names")= chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## $ scale : Named num [1:30] 3.524 4.301 24.299 351.9141 0.0141 ...
## ..- attr(*, "names")= chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## $ x : num [1:569, 1:30] -9.18 -2.39 -5.73 -7.12 -3.93 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:30] "PC1" "PC2" "PC3" "PC4" ...
## - attr(*, "class")= chr "prcomp"
summary(all_pca_1)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 3.6444 2.3857 1.67867 1.40735 1.28403 1.09880 0.82172
## Proportion of Variance 0.4427 0.1897 0.09393 0.06602 0.05496 0.04025 0.02251
## Cumulative Proportion 0.4427 0.6324 0.72636 0.79239 0.84734 0.88759 0.91010
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 0.69037 0.6457 0.59219 0.5421 0.51104 0.49128 0.39624
## Proportion of Variance 0.01589 0.0139 0.01169 0.0098 0.00871 0.00805 0.00523
## Cumulative Proportion 0.92598 0.9399 0.95157 0.9614 0.97007 0.97812 0.98335
## PC15 PC16 PC17 PC18 PC19 PC20 PC21
## Standard deviation 0.30681 0.28260 0.24372 0.22939 0.22244 0.17652 0.1731
## Proportion of Variance 0.00314 0.00266 0.00198 0.00175 0.00165 0.00104 0.0010
## Cumulative Proportion 0.98649 0.98915 0.99113 0.99288 0.99453 0.99557 0.9966
## PC22 PC23 PC24 PC25 PC26 PC27 PC28
## Standard deviation 0.16565 0.15602 0.1344 0.12442 0.09043 0.08307 0.03987
## Proportion of Variance 0.00091 0.00081 0.0006 0.00052 0.00027 0.00023 0.00005
## Cumulative Proportion 0.99749 0.99830 0.9989 0.99942 0.99969 0.99992 0.99997
## PC29 PC30
## Standard deviation 0.02736 0.01153
## Proportion of Variance 0.00002 0.00000
## Cumulative Proportion 1.00000 1.00000
mean_pca_1 <- prcomp(wbcd[,c(2:11)], scale. = TRUE)
class(mean_pca_1)
## [1] "prcomp"
str(mean_pca_1)
## List of 5
## $ sdev : num [1:10] 2.341 1.587 0.938 0.706 0.61 ...
## $ rotation: num [1:10, 1:10] -0.364 -0.154 -0.376 -0.364 -0.232 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## .. ..$ : chr [1:10] "PC1" "PC2" "PC3" "PC4" ...
## $ center : Named num [1:10] 14.1273 19.2896 91.969 654.8891 0.0964 ...
## ..- attr(*, "names")= chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## $ scale : Named num [1:10] 3.524 4.301 24.299 351.9141 0.0141 ...
## ..- attr(*, "names")= chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## $ x : num [1:569, 1:10] -5.22 -1.73 -3.97 -3.59 -3.15 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:10] "PC1" "PC2" "PC3" "PC4" ...
## - attr(*, "class")= chr "prcomp"
summary(mean_pca_1)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.3406 1.5870 0.93841 0.7064 0.61036 0.35234 0.28299
## Proportion of Variance 0.5479 0.2519 0.08806 0.0499 0.03725 0.01241 0.00801
## Cumulative Proportion 0.5479 0.7997 0.88779 0.9377 0.97495 0.98736 0.99537
## PC8 PC9 PC10
## Standard deviation 0.18679 0.10552 0.01680
## Proportion of Variance 0.00349 0.00111 0.00003
## Cumulative Proportion 0.99886 0.99997 1.00000
se_pca_1 <- prcomp(wbcd[,c(12:21)], scale. = TRUE)
class(se_pca_1)
## [1] "prcomp"
str(se_pca_1)
## List of 5
## $ sdev : num [1:10] 2.178 1.441 1.124 0.771 0.76 ...
## $ rotation: num [1:10, 1:10] -0.346 -0.189 -0.357 -0.304 -0.212 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
## .. ..$ : chr [1:10] "PC1" "PC2" "PC3" "PC4" ...
## $ center : Named num [1:10] 0.40517 1.21685 2.86606 40.33708 0.00704 ...
## ..- attr(*, "names")= chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
## $ scale : Named num [1:10] 0.277 0.552 2.022 45.491 0.003 ...
## ..- attr(*, "names")= chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
## $ x : num [1:569, 1:10] -4.05 0.34 -1.96 -3.79 -2.22 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:10] "PC1" "PC2" "PC3" "PC4" ...
## - attr(*, "class")= chr "prcomp"
summary(se_pca_1)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.1779 1.4406 1.1245 0.77095 0.75991 0.57939 0.43512
## Proportion of Variance 0.4743 0.2075 0.1264 0.05944 0.05775 0.03357 0.01893
## Cumulative Proportion 0.4743 0.6819 0.8083 0.86774 0.92548 0.95905 0.97798
## PC8 PC9 PC10
## Standard deviation 0.3962 0.20436 0.14635
## Proportion of Variance 0.0157 0.00418 0.00214
## Cumulative Proportion 0.9937 0.99786 1.00000
worst_pca_1 <- prcomp(wbcd[,c(22:31)], scale. = TRUE)
class(worst_pca_1)
## [1] "prcomp"
str(worst_pca_1)
## List of 5
## $ sdev : num [1:10] 2.387 1.444 0.896 0.735 0.717 ...
## $ rotation: num [1:10, 1:10] -0.336 -0.201 -0.348 -0.325 -0.249 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
## .. ..$ : chr [1:10] "PC1" "PC2" "PC3" "PC4" ...
## $ center : Named num [1:10] 16.269 25.677 107.261 880.583 0.132 ...
## ..- attr(*, "names")= chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
## $ scale : Named num [1:10] 4.8332 6.1463 33.6025 569.357 0.0228 ...
## ..- attr(*, "names")= chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
## $ x : num [1:569, 1:10] -5.97 -1.82 -3.4 -6.3 -1.15 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:10] "PC1" "PC2" "PC3" "PC4" ...
## - attr(*, "class")= chr "prcomp"
summary(worst_pca_1)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.3869 1.4443 0.89597 0.73531 0.71741 0.42862 0.28959
## Proportion of Variance 0.5697 0.2086 0.08028 0.05407 0.05147 0.01837 0.00839
## Cumulative Proportion 0.5697 0.7783 0.85860 0.91267 0.96413 0.98251 0.99089
## PC8 PC9 PC10
## Standard deviation 0.26802 0.12343 0.06326
## Proportion of Variance 0.00718 0.00152 0.00040
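## Cumulative Proportion 0.99808 0.99960 1.00000
class() showcases that the object created by using the prcomp() function is a list of class “prcomp”. Said object/list contains the following components, as illustrated by the function str():
sdev: the standard deviations of the principal components.
rotation: the matrix of variable loadings (its columns contain the eigenvectors), equivalent to the loadings component of the princomp() function.
center: the centering used (the mean of each variable), or FALSE otherwise.
scale: the scaling used (the standard deviation of each variable), or FALSE otherwise.
x: if retx is TRUE, then this component holds the value of the rotated data in the form of the centered (and scaled if requested) data multiplied by the rotation matrix.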
Applying the summary() function to the object returns the standard deviation of each Principal Component as well as two core elements of the PCA: the proportion of variance and the cumulative proportion. The former indicates the share of the total variance in the data explained by each Principal Component, whereas the latter sums the proportions of variance of all the Principal Components up to and including the one being observed. Taking all_pca_1 as an example, this means that the first PC explains about 44.27% of the variance, PC1 and PC2 together cover 63.24%, PC1 to PC3 cover 72.64%, and so forth.
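Both quantities can be derived directly from the sdev component, which may help clarify where the numbers in summary() come from. This is a minimal sketch that assumes the USArrests dataset from base R as a stand-in for wbcd; the 90% threshold is an arbitrary example value.

```r
# Assumption: USArrests (bundled with base R) stands in for wbcd
pca <- prcomp(USArrests, scale. = TRUE)

variance <- pca$sdev^2                 # variance carried by each PC
prop_var <- variance / sum(variance)   # proportion of variance
cum_prop <- cumsum(prop_var)           # cumulative proportion

# Same layout as the "Importance of components" table printed by summary()
round(rbind("Standard deviation"     = pca$sdev,
            "Proportion of Variance" = prop_var,
            "Cumulative Proportion"  = cum_prop), 4)

# Smallest number of PCs needed to explain at least 90% of the variance
n_pcs_90 <- which(cum_prop >= 0.90)[1]
n_pcs_90
```

A cumulative threshold like this one is a common, if somewhat arbitrary, rule of thumb for deciding how many PCs to keep.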
Let’s now repeat this exercise with the princomp() function and examine the results:
all_pca_2 <- princomp(wbcd[,-1], cor = TRUE)
class(all_pca_2)
## [1] "princomp"
str(all_pca_2)
## List of 7
## $ sdev : Named num [1:30] 3.64 2.39 1.68 1.41 1.28 ...
## ..- attr(*, "names")= chr [1:30] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
## $ loadings: 'loadings' num [1:30, 1:30] 0.219 0.104 0.228 0.221 0.143 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## .. ..$ : chr [1:30] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
## $ center : Named num [1:30] 14.1273 19.2896 91.969 654.8891 0.0964 ...
## ..- attr(*, "names")= chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## $ scale : Named num [1:30] 3.521 4.2973 24.2776 351.6048 0.0141 ...
## ..- attr(*, "names")= chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## $ n.obs : int 569
## $ scores : num [1:569, 1:30] 9.19 2.39 5.73 7.12 3.94 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:30] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
## $ call : language princomp(x = wbcd[, -1], cor = TRUE)
## - attr(*, "class")= chr "princomp"
summary(all_pca_2)
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Standard deviation 3.6443940 2.3856560 1.67867477 1.40735229 1.28402903
## Proportion of Variance 0.4427203 0.1897118 0.09393163 0.06602135 0.05495768
## Cumulative Proportion 0.4427203 0.6324321 0.72636371 0.79238506 0.84734274
## Comp.6 Comp.7 Comp.8 Comp.9 Comp.10
## Standard deviation 1.09879780 0.82171778 0.69037464 0.64567392 0.59219377
## Proportion of Variance 0.04024522 0.02250734 0.01588724 0.01389649 0.01168978
## Cumulative Proportion 0.88758796 0.91009530 0.92598254 0.93987903 0.95156881
## Comp.11 Comp.12 Comp.13 Comp.14
## Standard deviation 0.54213992 0.511039500 0.49128148 0.396244525
## Proportion of Variance 0.00979719 0.008705379 0.00804525 0.005233657
## Cumulative Proportion 0.96136600 0.970071383 0.97811663 0.983350291
## Comp.15 Comp.16 Comp.17 Comp.18
## Standard deviation 0.306814219 0.282600072 0.243719178 0.229387845
## Proportion of Variance 0.003137832 0.002662093 0.001979968 0.001753959
## Cumulative Proportion 0.986488123 0.989150216 0.991130184 0.992884143
## Comp.19 Comp.20 Comp.21 Comp.22
## Standard deviation 0.222435590 0.176520261 0.1731268145 0.1656484305
## Proportion of Variance 0.001649253 0.001038647 0.0009990965 0.0009146468
## Cumulative Proportion 0.994533397 0.995572043 0.9965711397 0.9974857865
## Comp.23 Comp.24 Comp.25 Comp.26
## Standard deviation 0.1560155049 0.1343689213 0.1244237573 0.090430304
## Proportion of Variance 0.0008113613 0.0006018336 0.0005160424 0.000272588
## Cumulative Proportion 0.9982971477 0.9988989813 0.9994150237 0.999687612
## Comp.27 Comp.28 Comp.29 Comp.30
## Standard deviation 0.0830690308 3.986650e-02 0.0273642668 1.153451e-02
## Proportion of Variance 0.0002300155 5.297793e-05 0.0000249601 4.434827e-06
## Cumulative Proportion 0.9999176271 9.999706e-01 0.9999955652 1.000000e+00
mean_pca_2 <- princomp(wbcd[,c(2:11)], cor = TRUE)
class(mean_pca_2)
## [1] "princomp"
str(mean_pca_2)
## List of 7
## $ sdev : Named num [1:10] 2.341 1.587 0.938 0.706 0.61 ...
## ..- attr(*, "names")= chr [1:10] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
## $ loadings: 'loadings' num [1:10, 1:10] 0.364 0.154 0.376 0.364 0.232 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## .. ..$ : chr [1:10] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
## $ center : Named num [1:10] 14.1273 19.2896 91.969 654.8891 0.0964 ...
## ..- attr(*, "names")= chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## $ scale : Named num [1:10] 3.521 4.2973 24.2776 351.6048 0.0141 ...
## ..- attr(*, "names")= chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## $ n.obs : int 569
## $ scores : num [1:569, 1:10] 5.22 1.73 3.97 3.6 3.15 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:10] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
## $ call : language princomp(x = wbcd[, c(2:11)], cor = TRUE)
## - attr(*, "class")= chr "princomp"
summary(mean_pca_2)
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Standard deviation 2.3406384 1.5870456 0.93841099 0.70640600 0.61035989
## Proportion of Variance 0.5478588 0.2518714 0.08806152 0.04990094 0.03725392
## Cumulative Proportion 0.5478588 0.7997302 0.88779168 0.93769262 0.97494654
## Comp.6 Comp.7 Comp.8 Comp.9
## Standard deviation 0.35233755 0.282993481 0.186788096 0.105524692
## Proportion of Variance 0.01241417 0.008008531 0.003488979 0.001113546
## Cumulative Proportion 0.98736071 0.995369244 0.998858223 0.999971769
## Comp.10
## Standard deviation 1.680196e-02
## Proportion of Variance 2.823059e-05
## Cumulative Proportion 1.000000e+00
se_pca_2 <- princomp(wbcd[,c(12:21)], cor = TRUE)
class(se_pca_2)
## [1] "princomp"
str(se_pca_2)
## List of 7
## $ sdev : Named num [1:10] 2.178 1.441 1.124 0.771 0.76 ...
## ..- attr(*, "names")= chr [1:10] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
## $ loadings: 'loadings' num [1:10, 1:10] 0.346 0.189 0.357 0.304 0.212 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
## .. ..$ : chr [1:10] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
## $ center : Named num [1:10] 0.40517 1.21685 2.86606 40.33708 0.00704 ...
## ..- attr(*, "names")= chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
## $ scale : Named num [1:10] 0.277 0.551 2.02 45.451 0.003 ...
## ..- attr(*, "names")= chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
## $ n.obs : int 569
## $ scores : num [1:569, 1:10] 4.053 -0.341 1.961 3.795 2.219 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:10] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
## $ call : language princomp(x = wbcd[, c(12:21)], cor = TRUE)
## - attr(*, "class")= chr "princomp"
summary(se_pca_2)
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Standard deviation 2.177928 1.4405579 1.1244649 0.77094730 0.75991287
## Proportion of Variance 0.474337 0.2075207 0.1264421 0.05943597 0.05774676
## Cumulative Proportion 0.474337 0.6818577 0.8082998 0.86773580 0.92548255
## Comp.6 Comp.7 Comp.8 Comp.9 Comp.10
## Standard deviation 0.57939475 0.43511509 0.39619334 0.20436292 0.146347852
## Proportion of Variance 0.03356983 0.01893251 0.01569692 0.00417642 0.002141769
## Cumulative Proportion 0.95905238 0.97798489 0.99368181 0.99785823 1.000000000
worst_pca_2 <- princomp(wbcd[,c(22:31)], cor = TRUE)
class(worst_pca_2)
## [1] "princomp"
str(worst_pca_2)
## List of 7
## $ sdev : Named num [1:10] 2.387 1.444 0.896 0.735 0.717 ...
## ..- attr(*, "names")= chr [1:10] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
## $ loadings: 'loadings' num [1:10, 1:10] 0.336 0.201 0.348 0.325 0.249 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
## .. ..$ : chr [1:10] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
## $ center : Named num [1:10] 16.269 25.677 107.261 880.583 0.132 ...
## ..- attr(*, "names")= chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
## $ scale : Named num [1:10] 4.829 6.1409 33.573 568.8565 0.0228 ...
## ..- attr(*, "names")= chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
## $ n.obs : int 569
## $ scores : num [1:569, 1:10] 5.97 1.82 3.41 6.3 1.15 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:10] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
## $ call : language princomp(x = wbcd[, c(22:31)], cor = TRUE)
## - attr(*, "class")= chr "princomp"
summary(worst_pca_2)
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Standard deviation 2.3868885 1.4442930 0.89597293 0.73531379 0.71740732
## Proportion of Variance 0.5697237 0.2085982 0.08027675 0.05406864 0.05146733
## Cumulative Proportion 0.5697237 0.7783219 0.85859864 0.91266728 0.96413461
## Comp.6 Comp.7 Comp.8 Comp.9
## Standard deviation 0.42862478 0.289591321 0.26801978 0.123428309
## Proportion of Variance 0.01837192 0.008386313 0.00718346 0.001523455
## Cumulative Proportion 0.98250653 0.990892839 0.99807630 0.999599754
## Comp.10
## Standard deviation 0.0632649639
## Proportion of Variance 0.0004002456
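## Cumulative Proportion 1.0000000000
class() showcases that the object created by using the princomp() function is a list of class “princomp”. Said object/list contains the following components, as illustrated by the function str():
sdev: the standard deviations of the principal components.
loadings: the matrix of variable loadings (its columns contain the eigenvectors), equivalent to the rotation component of the prcomp() function.
center: the means that were subtracted from each variable.
scale: the scalings applied to each variable.
n.obs: the number of observations.
scores: if scores = TRUE then this component holds the scores of the supplied data on the principal components.
call: the matched call.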
As can be observed, the objects created by prcomp() and princomp() differ both in class and components.
Once again, applying the summary() function to the object returns the standard deviation of each Principal Component as well as both the proportion of variance and the cumulative proportion. It is worth noting that the cumulative proportion obtained with princomp() is identical to the one obtained with prcomp(), as it should be.
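The agreement between the two functions, and the small differences visible in the outputs above (opposite signs, slightly different scale values), can be checked programmatically. The sketch below is an illustration under the assumption that the USArrests dataset from base R stands in for wbcd; p1 and p2 are hypothetical names.

```r
# Assumption: USArrests (bundled with base R) stands in for wbcd
p1 <- prcomp(USArrests, scale. = TRUE)
p2 <- princomp(USArrests, cor = TRUE)

# Same standard deviations, hence the same (cumulative) proportions of variance
all.equal(p1$sdev, unname(p2$sdev))

# rotation (prcomp) and loadings (princomp) match up to the sign of each column
all.equal(abs(unclass(p2$loadings)), abs(p1$rotation), check.attributes = FALSE)

# The stored scale differs slightly: prcomp() uses the divisor n - 1,
# whereas princomp() uses the divisor n
n <- nrow(USArrests)
all.equal(unname(p2$scale), unname(p1$scale) * sqrt((n - 1) / n))
```

The sign of a principal component is arbitrary, so a flipped column describes the same axis; the divisor difference also explains why the princomp() scores are marginally larger in absolute value than those in the x component of prcomp().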
The PCA() function from the FactoMineR package is formatted as follows:
# Do not run this code snippet, as it is only here for illustration purposes
library(FactoMineR)
PCA(X,
scale.unit = TRUE,
ncp = 5,
ind.sup = NULL,
quanti.sup = NULL,
quali.sup = NULL,
row.w = NULL,
col.w = NULL,
graph = TRUE,
axes = c(1,2)
)
scale.unit: TRUE or FALSE determines whether or not to scale (i.e. standardize) the dataset/dataframe variables.
graph: TRUE or FALSE determines whether or not to display the PCA’s associated graph.
More information regarding the PCA() function and its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/FactoMineR/versions/2.4/topics/PCA
Let’s now evaluate the results when applying PCA() to the Wisconsin Breast Cancer Dataset:
all_pca_3 <- PCA(wbcd[,-1], scale.unit = TRUE, ncp = 30, graph = FALSE)
class(all_pca_3)
## [1] "PCA" "list"
str(all_pca_3)
## List of 5
## $ eig : num [1:30, 1:3] 13.28 5.69 2.82 1.98 1.65 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:30] "comp 1" "comp 2" "comp 3" "comp 4" ...
## .. ..$ : chr [1:3] "eigenvalue" "percentage of variance" "cumulative percentage of variance"
## $ var :List of 4
## ..$ coord : num [1:30, 1:30] 0.798 0.378 0.829 0.805 0.52 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## .. .. ..$ : chr [1:30] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## ..$ cor : num [1:30, 1:30] 0.798 0.378 0.829 0.805 0.52 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## .. .. ..$ : chr [1:30] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## ..$ cos2 : num [1:30, 1:30] 0.636 0.143 0.688 0.649 0.27 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## .. .. ..$ : chr [1:30] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## ..$ contrib: num [1:30, 1:30] 4.79 1.08 5.18 4.88 2.03 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## .. .. ..$ : chr [1:30] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## $ ind :List of 4
## ..$ coord : num [1:569, 1:30] 9.19 2.39 5.73 7.12 3.94 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
## .. .. ..$ : chr [1:30] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## ..$ cos2 : num [1:569, 1:30] 0.737 0.216 0.878 0.259 0.45 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
## .. .. ..$ : chr [1:30] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## ..$ contrib: num [1:569, 1:30] 1.1182 0.0754 0.435 0.6714 0.2049 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
## .. .. ..$ : chr [1:30] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## ..$ dist : Named num [1:569] 10.71 5.13 6.12 13.99 5.87 ...
## .. ..- attr(*, "names")= chr [1:569] "1" "2" "3" "4" ...
## $ svd :List of 3
## ..$ vs: num [1:30] 3.64 2.39 1.68 1.41 1.28 ...
## ..$ U : num [1:569, 1:30] 2.522 0.655 1.573 1.954 1.08 ...
## ..$ V : num [1:30, 1:30] 0.219 0.104 0.228 0.221 0.143 ...
## $ call:List of 9
## ..$ row.w : num [1:569] 0.00176 0.00176 0.00176 0.00176 0.00176 ...
## ..$ col.w : num [1:30] 1 1 1 1 1 1 1 1 1 1 ...
## ..$ scale.unit: logi TRUE
## ..$ ncp : num 30
## ..$ centre : num [1:30] 14.1273 19.2896 91.969 654.8891 0.0964 ...
## ..$ ecart.type: num [1:30] 3.521 4.2973 24.2776 351.6048 0.0141 ...
## ..$ X :'data.frame': 569 obs. of 30 variables:
## .. ..$ radius_mean : num [1:569] 18 20.6 19.7 11.4 20.3 ...
## .. ..$ texture_mean : num [1:569] 10.4 17.8 21.2 20.4 14.3 ...
## .. ..$ perimeter_mean : num [1:569] 122.8 132.9 130 77.6 135.1 ...
## .. ..$ area_mean : num [1:569] 1001 1326 1203 386 1297 ...
## .. ..$ smoothness_mean : num [1:569] 0.1184 0.0847 0.1096 0.1425 0.1003 ...
## .. ..$ compactness_mean : num [1:569] 0.2776 0.0786 0.1599 0.2839 0.1328 ...
## .. ..$ concavity_mean : num [1:569] 0.3001 0.0869 0.1974 0.2414 0.198 ...
## .. ..$ concave points_mean : num [1:569] 0.1471 0.0702 0.1279 0.1052 0.1043 ...
## .. ..$ symmetry_mean : num [1:569] 0.242 0.181 0.207 0.26 0.181 ...
## .. ..$ fractal_dimension_mean : num [1:569] 0.0787 0.0567 0.06 0.0974 0.0588 ...
## .. ..$ radius_se : num [1:569] 1.095 0.543 0.746 0.496 0.757 ...
## .. ..$ texture_se : num [1:569] 0.905 0.734 0.787 1.156 0.781 ...
## .. ..$ perimeter_se : num [1:569] 8.59 3.4 4.58 3.44 5.44 ...
## .. ..$ area_se : num [1:569] 153.4 74.1 94 27.2 94.4 ...
## .. ..$ smoothness_se : num [1:569] 0.0064 0.00522 0.00615 0.00911 0.01149 ...
## .. ..$ compactness_se : num [1:569] 0.049 0.0131 0.0401 0.0746 0.0246 ...
## .. ..$ concavity_se : num [1:569] 0.0537 0.0186 0.0383 0.0566 0.0569 ...
## .. ..$ concave points_se : num [1:569] 0.0159 0.0134 0.0206 0.0187 0.0188 ...
## .. ..$ symmetry_se : num [1:569] 0.03 0.0139 0.0225 0.0596 0.0176 ...
## .. ..$ fractal_dimension_se : num [1:569] 0.00619 0.00353 0.00457 0.00921 0.00511 ...
## .. ..$ radius_worst : num [1:569] 25.4 25 23.6 14.9 22.5 ...
## .. ..$ texture_worst : num [1:569] 17.3 23.4 25.5 26.5 16.7 ...
## .. ..$ perimeter_worst : num [1:569] 184.6 158.8 152.5 98.9 152.2 ...
## .. ..$ area_worst : num [1:569] 2019 1956 1709 568 1575 ...
## .. ..$ smoothness_worst : num [1:569] 0.162 0.124 0.144 0.21 0.137 ...
## .. ..$ compactness_worst : num [1:569] 0.666 0.187 0.424 0.866 0.205 ...
## .. ..$ concavity_worst : num [1:569] 0.712 0.242 0.45 0.687 0.4 ...
## .. ..$ concave points_worst : num [1:569] 0.265 0.186 0.243 0.258 0.163 ...
## .. ..$ symmetry_worst : num [1:569] 0.46 0.275 0.361 0.664 0.236 ...
## .. ..$ fractal_dimension_worst: num [1:569] 0.1189 0.089 0.0876 0.173 0.0768 ...
## ..$ row.w.init: num [1:569] 1 1 1 1 1 1 1 1 1 1 ...
## ..$ call : language PCA(X = wbcd[, -1], scale.unit = TRUE, ncp = 30, graph = FALSE)
## - attr(*, "class")= chr [1:2] "PCA" "list"
summary(all_pca_3)
##
## Call:
## PCA(X = wbcd[, -1], scale.unit = TRUE, ncp = 30, graph = FALSE)
##
##
## Eigenvalues
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6 Dim.7
## Variance 13.282 5.691 2.818 1.981 1.649 1.207 0.675
## % of var. 44.272 18.971 9.393 6.602 5.496 4.025 2.251
## Cumulative % of var. 44.272 63.243 72.636 79.239 84.734 88.759 91.010
## Dim.8 Dim.9 Dim.10 Dim.11 Dim.12 Dim.13 Dim.14
## Variance 0.477 0.417 0.351 0.294 0.261 0.241 0.157
## % of var. 1.589 1.390 1.169 0.980 0.871 0.805 0.523
## Cumulative % of var. 92.598 93.988 95.157 96.137 97.007 97.812 98.335
## Dim.15 Dim.16 Dim.17 Dim.18 Dim.19 Dim.20 Dim.21
## Variance 0.094 0.080 0.059 0.053 0.049 0.031 0.030
## % of var. 0.314 0.266 0.198 0.175 0.165 0.104 0.100
## Cumulative % of var. 98.649 98.915 99.113 99.288 99.453 99.557 99.657
## Dim.22 Dim.23 Dim.24 Dim.25 Dim.26 Dim.27 Dim.28
## Variance 0.027 0.024 0.018 0.015 0.008 0.007 0.002
## % of var. 0.091 0.081 0.060 0.052 0.027 0.023 0.005
## Cumulative % of var. 99.749 99.830 99.890 99.942 99.969 99.992 99.997
## Dim.29 Dim.30
## Variance 0.001 0.000
## % of var. 0.002 0.000
## Cumulative % of var. 100.000 100.000
##
## Individuals (the 10 first)
## Dist Dim.1 ctr cos2 Dim.2 ctr cos2
## 1 | 10.710 | 9.193 1.118 0.737 | 1.949 0.117 0.033
## 2 | 5.132 | 2.388 0.075 0.216 | -3.768 0.438 0.539
## 3 | 6.119 | 5.734 0.435 0.878 | -1.075 0.036 0.031
## 4 | 13.986 | 7.123 0.671 0.259 | 10.276 3.261 0.540
## 5 | 5.868 | 3.935 0.205 0.450 | -1.948 0.117 0.110
## 6 | 5.735 | 2.380 0.075 0.172 | 3.950 0.482 0.474
## 7 | 3.970 | 2.239 0.066 0.318 | -2.690 0.223 0.459
## 8 | 4.195 | 2.143 0.061 0.261 | 2.340 0.169 0.311
## 9 | 6.017 | 3.175 0.133 0.278 | 3.392 0.355 0.318
## 10 | 12.163 | 6.352 0.534 0.273 | 7.727 1.844 0.404
## Dim.3 ctr cos2
## 1 | -1.123 0.079 0.011 |
## 2 | -0.529 0.017 0.011 |
## 3 | -0.552 0.019 0.008 |
## 4 | -3.233 0.652 0.053 |
## 5 | 1.390 0.120 0.056 |
## 6 | -2.935 0.537 0.262 |
## 7 | -1.640 0.168 0.171 |
## 8 | -0.872 0.047 0.043 |
## 9 | -3.120 0.607 0.269 |
## 10 | -4.342 1.176 0.127 |
##
## Variables (the 10 first)
## Dim.1 ctr cos2 Dim.2 ctr cos2 Dim.3
## radius_mean | 0.798 4.792 0.636 | -0.558 5.469 0.311 | -0.014
## texture_mean | 0.378 1.076 0.143 | -0.142 0.356 0.020 | 0.108
## perimeter_mean | 0.829 5.177 0.688 | -0.513 4.630 0.264 | -0.016
## area_mean | 0.805 4.884 0.649 | -0.551 5.340 0.304 | 0.048
## smoothness_mean | 0.520 2.033 0.270 | 0.444 3.464 0.197 | -0.175
## compactness_mean | 0.872 5.726 0.760 | 0.362 2.307 0.131 | -0.124
## concavity_mean | 0.942 6.677 0.887 | 0.144 0.362 0.021 | 0.005
## concave points_mean | 0.951 6.804 0.904 | -0.083 0.121 0.007 | -0.043
## symmetry_mean | 0.504 1.909 0.254 | 0.454 3.623 0.206 | -0.068
## fractal_dimension_mean | 0.235 0.414 0.055 | 0.875 13.438 0.765 | -0.038
## ctr cos2
## radius_mean 0.007 0.000 |
## texture_mean 0.417 0.012 |
## perimeter_mean 0.009 0.000 |
## area_mean 0.082 0.002 |
## smoothness_mean 1.088 0.031 |
## compactness_mean 0.549 0.015 |
## concavity_mean 0.001 0.000 |
## concave points_mean 0.065 0.002 |
## symmetry_mean 0.162 0.005 |
## fractal_dimension_mean 0.051 0.001 |
mean_pca_3 <- PCA(wbcd[,c(2:11)], scale.unit = TRUE, ncp = 30, graph = FALSE)
class(mean_pca_3)
## [1] "PCA" "list"
str(mean_pca_3)
## List of 5
## $ eig : num [1:10, 1:3] 5.479 2.519 0.881 0.499 0.373 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:10] "comp 1" "comp 2" "comp 3" "comp 4" ...
## .. ..$ : chr [1:3] "eigenvalue" "percentage of variance" "cumulative percentage of variance"
## $ var :List of 4
## ..$ coord : num [1:10, 1:10] 0.852 0.362 0.88 0.852 0.544 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## ..$ cor : num [1:10, 1:10] 0.852 0.362 0.88 0.852 0.544 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## ..$ cos2 : num [1:10, 1:10] 0.726 0.131 0.775 0.726 0.296 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## ..$ contrib: num [1:10, 1:10] 13.25 2.39 14.14 13.26 5.4 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## $ ind :List of 4
## ..$ coord : num [1:569, 1:10] 5.22 1.73 3.97 3.6 3.15 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
## .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## ..$ cos2 : num [1:569, 1:10] 0.609 0.25 0.947 0.208 0.641 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
## .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## ..$ contrib: num [1:569, 1:10] 0.8755 0.0958 0.5055 0.415 0.3185 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
## .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## ..$ dist : Named num [1:569] 6.69 3.45 4.08 7.88 3.94 ...
## .. ..- attr(*, "names")= chr [1:569] "1" "2" "3" "4" ...
## $ svd :List of 3
## ..$ vs: num [1:10] 2.341 1.587 0.938 0.706 0.61 ...
## ..$ U : num [1:569, 1:10] 2.232 0.738 1.696 1.537 1.346 ...
## ..$ V : num [1:10, 1:10] 0.364 0.154 0.376 0.364 0.232 ...
## $ call:List of 9
## ..$ row.w : num [1:569] 0.00176 0.00176 0.00176 0.00176 0.00176 ...
## ..$ col.w : num [1:10] 1 1 1 1 1 1 1 1 1 1
## ..$ scale.unit: logi TRUE
## ..$ ncp : num 10
## ..$ centre : num [1:10] 14.1273 19.2896 91.969 654.8891 0.0964 ...
## ..$ ecart.type: num [1:10] 3.521 4.2973 24.2776 351.6048 0.0141 ...
## ..$ X :'data.frame': 569 obs. of 10 variables:
## .. ..$ radius_mean : num [1:569] 18 20.6 19.7 11.4 20.3 ...
## .. ..$ texture_mean : num [1:569] 10.4 17.8 21.2 20.4 14.3 ...
## .. ..$ perimeter_mean : num [1:569] 122.8 132.9 130 77.6 135.1 ...
## .. ..$ area_mean : num [1:569] 1001 1326 1203 386 1297 ...
## .. ..$ smoothness_mean : num [1:569] 0.1184 0.0847 0.1096 0.1425 0.1003 ...
## .. ..$ compactness_mean : num [1:569] 0.2776 0.0786 0.1599 0.2839 0.1328 ...
## .. ..$ concavity_mean : num [1:569] 0.3001 0.0869 0.1974 0.2414 0.198 ...
## .. ..$ concave points_mean : num [1:569] 0.1471 0.0702 0.1279 0.1052 0.1043 ...
## .. ..$ symmetry_mean : num [1:569] 0.242 0.181 0.207 0.26 0.181 ...
## .. ..$ fractal_dimension_mean: num [1:569] 0.0787 0.0567 0.06 0.0974 0.0588 ...
## ..$ row.w.init: num [1:569] 1 1 1 1 1 1 1 1 1 1 ...
## ..$ call : language PCA(X = wbcd[, c(2:11)], scale.unit = TRUE, ncp = 30, graph = FALSE)
## - attr(*, "class")= chr [1:2] "PCA" "list"
summary(mean_pca_3)
##
## Call:
## PCA(X = wbcd[, c(2:11)], scale.unit = TRUE, ncp = 30, graph = FALSE)
##
##
## Eigenvalues
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6 Dim.7
## Variance 5.479 2.519 0.881 0.499 0.373 0.124 0.080
## % of var. 54.786 25.187 8.806 4.990 3.725 1.241 0.801
## Cumulative % of var. 54.786 79.973 88.779 93.769 97.495 98.736 99.537
## Dim.8 Dim.9 Dim.10
## Variance 0.035 0.011 0.000
## % of var. 0.349 0.111 0.003
## Cumulative % of var. 99.886 99.997 100.000
##
## Individuals (the 10 first)
## Dist Dim.1 ctr cos2 Dim.2 ctr cos2
## 1 | 6.692 | 5.224 0.875 0.609 | 3.204 0.716 0.229 |
## 2 | 3.455 | 1.728 0.096 0.250 | -2.541 0.450 0.541 |
## 3 | 4.079 | 3.970 0.506 0.947 | -0.550 0.021 0.018 |
## 4 | 7.878 | 3.597 0.415 0.208 | 6.905 3.327 0.768 |
## 5 | 3.935 | 3.151 0.319 0.641 | -1.358 0.129 0.119 |
## 6 | 3.728 | 1.381 0.061 0.137 | 3.314 0.767 0.790 |
## 7 | 2.238 | 1.602 0.082 0.512 | -1.499 0.157 0.448 |
## 8 | 2.980 | 1.257 0.051 0.178 | 2.495 0.434 0.701 |
## 9 | 4.179 | 2.390 0.183 0.327 | 3.275 0.748 0.614 |
## 10 | 4.815 | 2.445 0.192 0.258 | 3.626 0.917 0.567 |
## Dim.3 ctr cos2
## 1 -2.171 0.941 0.105 |
## 2 -1.020 0.208 0.087 |
## 3 -0.324 0.021 0.006 |
## 4 0.793 0.125 0.010 |
## 5 -1.862 0.692 0.224 |
## 6 -0.698 0.097 0.035 |
## 7 -0.353 0.025 0.025 |
## 8 0.414 0.034 0.019 |
## 9 0.622 0.077 0.022 |
## 10 1.449 0.419 0.091 |
##
## Variables
## Dim.1 ctr cos2 Dim.2 ctr cos2 Dim.3
## radius_mean | 0.852 13.245 0.726 | -0.498 9.855 0.248 | -0.117
## texture_mean | 0.362 2.386 0.131 | -0.234 2.166 0.055 | 0.892
## perimeter_mean | 0.880 14.141 0.775 | -0.452 8.103 0.204 | -0.107
## area_mean | 0.852 13.256 0.726 | -0.484 9.293 0.234 | -0.116
## smoothness_mean | 0.544 5.405 0.296 | 0.638 16.157 0.407 | -0.156
## compactness_mean | 0.853 13.282 0.728 | 0.422 7.076 0.178 | 0.055
## concavity_mean | 0.926 15.662 0.858 | 0.166 1.088 0.027 | 0.039
## concave points_mean | 0.978 17.476 0.957 | 0.011 0.005 0.000 | -0.064
## symmetry_mean | 0.504 4.633 0.254 | 0.585 13.565 0.342 | 0.034
## fractal_dimension_mean | 0.168 0.516 0.028 | 0.907 32.692 0.823 | 0.107
## ctr cos2
## radius_mean 1.548 0.014 |
## texture_mean 90.451 0.797 |
## perimeter_mean 1.302 0.011 |
## area_mean 1.522 0.013 |
## smoothness_mean 2.773 0.024 |
## compactness_mean 0.340 0.003 |
## concavity_mean 0.169 0.001 |
## concave points_mean 0.470 0.004 |
## symmetry_mean 0.135 0.001 |
## fractal_dimension_mean 1.290 0.011 |
se_pca_3 <- PCA(wbcd[,c(12:21)], scale.unit = TRUE, ncp = 30, graph = FALSE)
class(se_pca_3)
## [1] "PCA" "list"
str(se_pca_3)
## List of 5
## $ eig : num [1:10, 1:3] 4.743 2.075 1.264 0.594 0.577 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:10] "comp 1" "comp 2" "comp 3" "comp 4" ...
## .. ..$ : chr [1:3] "eigenvalue" "percentage of variance" "cumulative percentage of variance"
## $ var :List of 4
## ..$ coord : num [1:10, 1:10] 0.753 0.411 0.779 0.662 0.463 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
## .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## ..$ cor : num [1:10, 1:10] 0.753 0.411 0.779 0.662 0.463 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
## .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## ..$ cos2 : num [1:10, 1:10] 0.567 0.169 0.606 0.438 0.214 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
## .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## ..$ contrib: num [1:10, 1:10] 11.94 3.56 12.78 9.24 4.51 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
## .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## $ ind :List of 4
## ..$ coord : num [1:569, 1:10] 4.053 -0.341 1.961 3.795 2.219 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
## .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## ..$ cos2 : num [1:569, 1:10] 0.6413 0.0341 0.5336 0.3914 0.4996 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
## .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## ..$ contrib: num [1:569, 1:10] 0.6086 0.0043 0.1425 0.5336 0.1824 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
## .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## ..$ dist : Named num [1:569] 5.06 1.85 2.68 6.07 3.14 ...
## .. ..- attr(*, "names")= chr [1:569] "1" "2" "3" "4" ...
## $ svd :List of 3
## ..$ vs: num [1:10] 2.178 1.441 1.124 0.771 0.76 ...
## ..$ U : num [1:569, 1:10] 1.861 -0.156 0.9 1.742 1.019 ...
## ..$ V : num [1:10, 1:10] 0.346 0.189 0.357 0.304 0.212 ...
## $ call:List of 9
## ..$ row.w : num [1:569] 0.00176 0.00176 0.00176 0.00176 0.00176 ...
## ..$ col.w : num [1:10] 1 1 1 1 1 1 1 1 1 1
## ..$ scale.unit: logi TRUE
## ..$ ncp : num 10
## ..$ centre : num [1:10] 0.40517 1.21685 2.86606 40.33708 0.00704 ...
## ..$ ecart.type: num [1:10] 0.277 0.551 2.02 45.451 0.003 ...
## ..$ X :'data.frame': 569 obs. of 10 variables:
## .. ..$ radius_se : num [1:569] 1.095 0.543 0.746 0.496 0.757 ...
## .. ..$ texture_se : num [1:569] 0.905 0.734 0.787 1.156 0.781 ...
## .. ..$ perimeter_se : num [1:569] 8.59 3.4 4.58 3.44 5.44 ...
## .. ..$ area_se : num [1:569] 153.4 74.1 94 27.2 94.4 ...
## .. ..$ smoothness_se : num [1:569] 0.0064 0.00522 0.00615 0.00911 0.01149 ...
## .. ..$ compactness_se : num [1:569] 0.049 0.0131 0.0401 0.0746 0.0246 ...
## .. ..$ concavity_se : num [1:569] 0.0537 0.0186 0.0383 0.0566 0.0569 ...
## .. ..$ concave points_se : num [1:569] 0.0159 0.0134 0.0206 0.0187 0.0188 ...
## .. ..$ symmetry_se : num [1:569] 0.03 0.0139 0.0225 0.0596 0.0176 ...
## .. ..$ fractal_dimension_se: num [1:569] 0.00619 0.00353 0.00457 0.00921 0.00511 ...
## ..$ row.w.init: num [1:569] 1 1 1 1 1 1 1 1 1 1 ...
## ..$ call : language PCA(X = wbcd[, c(12:21)], scale.unit = TRUE, ncp = 30, graph = FALSE)
## - attr(*, "class")= chr [1:2] "PCA" "list"
summary(se_pca_3)
##
## Call:
## PCA(X = wbcd[, c(12:21)], scale.unit = TRUE, ncp = 30, graph = FALSE)
##
##
## Eigenvalues
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6 Dim.7
## Variance 4.743 2.075 1.264 0.594 0.577 0.336 0.189
## % of var. 47.434 20.752 12.644 5.944 5.775 3.357 1.893
## Cumulative % of var. 47.434 68.186 80.830 86.774 92.548 95.905 97.798
## Dim.8 Dim.9 Dim.10
## Variance 0.157 0.042 0.021
## % of var. 1.570 0.418 0.214
## Cumulative % of var. 99.368 99.786 100.000
##
## Individuals (the 10 first)
## Dist Dim.1 ctr cos2 Dim.2 ctr cos2
## 1 | 5.061 | 4.053 0.609 0.641 | -2.587 0.567 0.261 |
## 2 | 1.845 | -0.341 0.004 0.034 | -1.442 0.176 0.611 |
## 3 | 2.685 | 1.961 0.142 0.534 | -1.172 0.116 0.190 |
## 4 | 6.066 | 3.795 0.534 0.391 | 2.660 0.599 0.192 |
## 5 | 3.139 | 2.219 0.182 0.500 | -1.030 0.090 0.108 |
## 6 | 1.054 | 0.019 0.000 0.000 | 0.681 0.039 0.417 |
## 7 | 1.801 | -0.986 0.036 0.300 | -1.279 0.139 0.504 |
## 8 | 1.517 | 0.873 0.028 0.331 | -0.274 0.006 0.033 |
## 9 | 0.981 | -0.187 0.001 0.036 | 0.430 0.016 0.192 |
## 10 | 3.993 | 2.127 0.168 0.284 | 2.428 0.499 0.370 |
## Dim.3 ctr cos2
## 1 -0.385 0.021 0.006 |
## 2 -0.772 0.083 0.175 |
## 3 -0.966 0.130 0.129 |
## 4 0.748 0.078 0.015 |
## 5 -0.403 0.023 0.016 |
## 6 -0.510 0.036 0.234 |
## 7 -0.769 0.082 0.182 |
## 8 0.010 0.000 0.000 |
## 9 -0.613 0.052 0.390 |
## 10 -1.471 0.301 0.136 |
##
## Variables
## Dim.1 ctr cos2 Dim.2 ctr cos2 Dim.3
## radius_se | 0.753 11.943 0.567 | -0.634 19.391 0.402 | 0.091
## texture_se | 0.411 3.557 0.169 | 0.221 2.353 0.049 | 0.665
## perimeter_se | 0.779 12.779 0.606 | -0.605 17.665 0.367 | 0.066
## area_se | 0.662 9.243 0.438 | -0.721 25.021 0.519 | 0.028
## smoothness_se | 0.463 4.514 0.214 | 0.390 7.342 0.152 | 0.481
## compactness_se | 0.816 14.047 0.666 | 0.350 5.887 0.122 | -0.289
## concavity_se | 0.774 12.642 0.600 | 0.330 5.250 0.109 | -0.380
## concave points_se | 0.840 14.880 0.706 | 0.122 0.722 0.015 | -0.258
## symmetry_se | 0.515 5.585 0.265 | 0.286 3.943 0.082 | 0.494
## fractal_dimension_se | 0.716 10.810 0.513 | 0.508 12.426 0.258 | -0.197
## ctr cos2
## radius_se 0.653 0.008 |
## texture_se 34.991 0.442 |
## perimeter_se 0.345 0.004 |
## area_se 0.062 0.001 |
## smoothness_se 18.274 0.231 |
## compactness_se 6.595 0.083 |
## concavity_se 11.438 0.145 |
## concave points_se 5.270 0.067 |
## symmetry_se 19.300 0.244 |
## fractal_dimension_se 3.073 0.039 |
worst_pca_3 <- PCA(wbcd[,c(22:31)], scale.unit = TRUE, ncp = 30, graph = FALSE)
class(worst_pca_3)
## [1] "PCA" "list"
str(worst_pca_3)
## List of 5
## $ eig : num [1:10, 1:3] 5.697 2.086 0.803 0.541 0.515 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:10] "comp 1" "comp 2" "comp 3" "comp 4" ...
## .. ..$ : chr [1:3] "eigenvalue" "percentage of variance" "cumulative percentage of variance"
## $ var :List of 4
## ..$ coord : num [1:10, 1:10] 0.802 0.479 0.831 0.775 0.593 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
## .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## ..$ cor : num [1:10, 1:10] 0.802 0.479 0.831 0.775 0.593 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
## .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## ..$ cos2 : num [1:10, 1:10] 0.643 0.23 0.691 0.601 0.352 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
## .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## ..$ contrib: num [1:10, 1:10] 11.28 4.03 12.12 10.55 6.18 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
## .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## $ ind :List of 4
## ..$ coord : num [1:569, 1:10] 5.97 1.82 3.41 6.3 1.15 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
## .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## ..$ cos2 : num [1:569, 1:10] 0.805 0.301 0.854 0.411 0.145 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
## .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## ..$ contrib: num [1:569, 1:10] 1.1011 0.102 0.3582 1.2262 0.0406 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
## .. .. ..$ : chr [1:10] "Dim.1" "Dim.2" "Dim.3" "Dim.4" ...
## ..$ dist : Named num [1:569] 6.66 3.32 3.69 9.84 3.01 ...
## .. ..- attr(*, "names")= chr [1:569] "1" "2" "3" "4" ...
## $ svd :List of 3
## ..$ vs: num [1:10] 2.387 1.444 0.896 0.735 0.717 ...
## ..$ U : num [1:569, 1:10] 2.503 0.762 1.428 2.641 0.48 ...
## ..$ V : num [1:10, 1:10] 0.336 0.201 0.348 0.325 0.249 ...
## $ call:List of 9
## ..$ row.w : num [1:569] 0.00176 0.00176 0.00176 0.00176 0.00176 ...
## ..$ col.w : num [1:10] 1 1 1 1 1 1 1 1 1 1
## ..$ scale.unit: logi TRUE
## ..$ ncp : num 10
## ..$ centre : num [1:10] 16.269 25.677 107.261 880.583 0.132 ...
## ..$ ecart.type: num [1:10] 4.829 6.1409 33.573 568.8565 0.0228 ...
## ..$ X :'data.frame': 569 obs. of 10 variables:
## .. ..$ radius_worst : num [1:569] 25.4 25 23.6 14.9 22.5 ...
## .. ..$ texture_worst : num [1:569] 17.3 23.4 25.5 26.5 16.7 ...
## .. ..$ perimeter_worst : num [1:569] 184.6 158.8 152.5 98.9 152.2 ...
## .. ..$ area_worst : num [1:569] 2019 1956 1709 568 1575 ...
## .. ..$ smoothness_worst : num [1:569] 0.162 0.124 0.144 0.21 0.137 ...
## .. ..$ compactness_worst : num [1:569] 0.666 0.187 0.424 0.866 0.205 ...
## .. ..$ concavity_worst : num [1:569] 0.712 0.242 0.45 0.687 0.4 ...
## .. ..$ concave points_worst : num [1:569] 0.265 0.186 0.243 0.258 0.163 ...
## .. ..$ symmetry_worst : num [1:569] 0.46 0.275 0.361 0.664 0.236 ...
## .. ..$ fractal_dimension_worst: num [1:569] 0.1189 0.089 0.0876 0.173 0.0768 ...
## ..$ row.w.init: num [1:569] 1 1 1 1 1 1 1 1 1 1 ...
## ..$ call : language PCA(X = wbcd[, c(22:31)], scale.unit = TRUE, ncp = 30, graph = FALSE)
## - attr(*, "class")= chr [1:2] "PCA" "list"
summary(worst_pca_3)
##
## Call:
## PCA(X = wbcd[, c(22:31)], scale.unit = TRUE, ncp = 30, graph = FALSE)
##
##
## Eigenvalues
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6 Dim.7
## Variance 5.697 2.086 0.803 0.541 0.515 0.184 0.084
## % of var. 56.972 20.860 8.028 5.407 5.147 1.837 0.839
## Cumulative % of var. 56.972 77.832 85.860 91.267 96.413 98.251 99.089
## Dim.8 Dim.9 Dim.10
## Variance 0.072 0.015 0.004
## % of var. 0.718 0.152 0.040
## Cumulative % of var. 99.808 99.960 100.000
##
## Individuals (the 10 first)
## Dist Dim.1 ctr cos2 Dim.2 ctr cos2
## 1 | 6.657 | 5.975 1.101 0.805 | 0.672 0.038 0.010
## 2 | 3.316 | 1.818 0.102 0.301 | -2.315 0.452 0.487
## 3 | 3.687 | 3.408 0.358 0.854 | -0.780 0.051 0.045
## 4 | 9.836 | 6.305 1.226 0.411 | 6.966 4.088 0.502
## 5 | 3.014 | 1.147 0.041 0.145 | -1.878 0.297 0.388
## 6 | 4.228 | 2.740 0.232 0.420 | 3.100 0.810 0.538
## 7 | 2.740 | 2.280 0.160 0.693 | -1.334 0.150 0.237
## 8 | 2.534 | 1.602 0.079 0.400 | 1.483 0.185 0.343
## 9 | 4.217 | 3.053 0.288 0.524 | 2.637 0.586 0.391
## 10 | 10.431 | 7.126 1.567 0.467 | 6.786 3.880 0.423
## Dim.3 ctr cos2
## 1 | -2.545 1.418 0.146 |
## 2 | -0.881 0.170 0.071 |
## 3 | -0.775 0.132 0.044 |
## 4 | -0.817 0.146 0.007 |
## 5 | -1.839 0.740 0.372 |
## 6 | -0.748 0.122 0.031 |
## 7 | -0.225 0.011 0.007 |
## 8 | 0.113 0.003 0.002 |
## 9 | 0.327 0.023 0.006 |
## 10 | 1.393 0.425 0.018 |
##
## Variables
## Dim.1 ctr cos2 Dim.2 ctr cos2 Dim.3
## radius_worst | 0.802 11.284 0.643 | -0.582 16.252 0.339 | -0.068
## texture_worst | 0.479 4.029 0.230 | -0.061 0.181 0.004 | 0.875
## perimeter_worst | 0.831 12.121 0.691 | -0.542 14.101 0.294 | -0.075
## area_worst | 0.775 10.546 0.601 | -0.600 17.244 0.360 | -0.071
## smoothness_worst | 0.593 6.181 0.352 | 0.488 11.416 0.238 | -0.046
## compactness_worst | 0.870 13.291 0.757 | 0.362 6.278 0.131 | -0.034
## concavity_worst | 0.894 14.043 0.800 | 0.201 1.934 0.040 | -0.052
## concave points_worst | 0.949 15.812 0.901 | -0.060 0.174 0.004 | -0.118
## symmetry_worst | 0.596 6.238 0.355 | 0.446 9.524 0.199 | -0.019
## fractal_dimension_worst | 0.606 6.456 0.368 | 0.691 22.896 0.478 | -0.032
## ctr cos2
## radius_worst 0.580 0.005 |
## texture_worst 95.418 0.766 |
## perimeter_worst 0.703 0.006 |
## area_worst 0.624 0.005 |
## smoothness_worst 0.265 0.002 |
## compactness_worst 0.145 0.001 |
## concavity_worst 0.343 0.003 |
## concave points_worst 1.747 0.014 |
## symmetry_worst 0.046 0.000 |
## fractal_dimension_worst 0.130 0.001 |
The class() function shows that the object returned by PCA() has the classes “PCA” and “list”. Note also that the cumulative percentages of variance reported by PCA() (visible in the summary() output for an object of this kind) are identical to those obtained with R’s built-in functions, as they should be.
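The agreement between the two approaches can be checked on a small example. The sketch below is an illustration, not part of the original analysis: it uses the built-in USArrests data set instead of wbcd (an assumption made so that the snippet runs on its own). It computes the cumulative percentage of variance from prcomp() and from the eigenvalues of the correlation matrix and, only if FactoMineR is installed, compares both with PCA().

```r
# Illustrative data set (assumption): USArrests ships with base R.
X <- USArrests

# Base R route: prcomp() on centred, standardized variables.
base_pca <- prcomp(X, center = TRUE, scale. = TRUE)
cum_base <- cumsum(base_pca$sdev^2) / sum(base_pca$sdev^2) * 100

# Eigenvalues of the correlation matrix, which a scaled PCA decomposes.
eig <- eigen(cor(X), symmetric = TRUE)$values
cum_eig <- cumsum(eig) / sum(eig) * 100

# FactoMineR route, run only when the package is available.
if (requireNamespace("FactoMineR", quietly = TRUE)) {
  fm_pca <- FactoMineR::PCA(X, scale.unit = TRUE, ncp = ncol(X), graph = FALSE)
  cum_fm <- unname(fm_pca$eig[, "cumulative percentage of variance"])
  print(all.equal(cum_base, cum_fm))
}

# Cumulative percentages from prcomp(), and the check against eigen().
print(round(cum_base, 3))
print(all.equal(cum_base, cum_eig))
```

The same comparison should carry over to the wbcd subsets by replacing X with the corresponding columns, for example wbcd[, c(2:11)].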
Another aspect worth highlighting is how well organized this function’s results are. Although the components of the object returned by PCA() can be read from the output of str() (as was the case with R’s built-in functions), print() does a better job of showing the object’s components and how the resulting data is stored within them. The following code snippets compare the output of print() when applied to the PCA objects described so far:
print(all_pca_1)
## Standard deviations (1, .., p=30):
## [1] 3.64439401 2.38565601 1.67867477 1.40735229 1.28402903 1.09879780
## [7] 0.82171778 0.69037464 0.64567392 0.59219377 0.54213992 0.51103950
## [13] 0.49128148 0.39624453 0.30681422 0.28260007 0.24371918 0.22938785
## [19] 0.22243559 0.17652026 0.17312681 0.16564843 0.15601550 0.13436892
## [25] 0.12442376 0.09043030 0.08306903 0.03986650 0.02736427 0.01153451
##
## Rotation (n x k) = (30 x 30):
## PC1 PC2 PC3 PC4
## radius_mean -0.21890244 0.233857132 -0.008531243 0.041408962
## texture_mean -0.10372458 0.059706088 0.064549903 -0.603050001
## perimeter_mean -0.22753729 0.215181361 -0.009314220 0.041983099
## area_mean -0.22099499 0.231076711 0.028699526 0.053433795
## smoothness_mean -0.14258969 -0.186113023 -0.104291904 0.159382765
## compactness_mean -0.23928535 -0.151891610 -0.074091571 0.031794581
## concavity_mean -0.25840048 -0.060165363 0.002733838 0.019122753
## concave points_mean -0.26085376 0.034767500 -0.025563541 0.065335944
## symmetry_mean -0.13816696 -0.190348770 -0.040239936 0.067124984
## fractal_dimension_mean -0.06436335 -0.366575471 -0.022574090 0.048586765
## radius_se -0.20597878 0.105552152 0.268481387 0.097941242
## texture_se -0.01742803 -0.089979682 0.374633665 -0.359855528
## perimeter_se -0.21132592 0.089457234 0.266645367 0.088992415
## area_se -0.20286964 0.152292628 0.216006528 0.108205039
## smoothness_se -0.01453145 -0.204430453 0.308838979 0.044664180
## compactness_se -0.17039345 -0.232715896 0.154779718 -0.027469363
## concavity_se -0.15358979 -0.197207283 0.176463743 0.001316880
## concave points_se -0.18341740 -0.130321560 0.224657567 0.074067335
## symmetry_se -0.04249842 -0.183848000 0.288584292 0.044073351
## fractal_dimension_se -0.10256832 -0.280092027 0.211503764 0.015304750
## radius_worst -0.22799663 0.219866379 -0.047506990 0.015417240
## texture_worst -0.10446933 0.045467298 -0.042297823 -0.632807885
## perimeter_worst -0.23663968 0.199878428 -0.048546508 0.013802794
## area_worst -0.22487053 0.219351858 -0.011902318 0.025894749
## smoothness_worst -0.12795256 -0.172304352 -0.259797613 0.017652216
## compactness_worst -0.21009588 -0.143593173 -0.236075625 -0.091328415
## concavity_worst -0.22876753 -0.097964114 -0.173057335 -0.073951180
## concave points_worst -0.25088597 0.008257235 -0.170344076 0.006006996
## symmetry_worst -0.12290456 -0.141883349 -0.271312642 -0.036250695
## fractal_dimension_worst -0.13178394 -0.275339469 -0.232791313 -0.077053470
## PC5 PC6 PC7 PC8
## radius_mean -0.037786354 0.0187407904 -0.1240883403 0.007452296
## texture_mean 0.049468850 -0.0321788366 0.0113995382 -0.130674825
## perimeter_mean -0.037374663 0.0173084449 -0.1144770573 0.018687258
## area_mean -0.010331251 -0.0018877480 -0.0516534275 -0.034673604
## smoothness_mean 0.365088528 -0.2863744966 -0.1406689928 0.288974575
## compactness_mean -0.011703971 -0.0141309489 0.0309184960 0.151396350
## concavity_mean -0.086375412 -0.0093441809 -0.1075204434 0.072827285
## concave points_mean 0.043861025 -0.0520499505 -0.1504822142 0.152322414
## symmetry_mean 0.305941428 0.3564584607 -0.0938911345 0.231530989
## fractal_dimension_mean 0.044424360 -0.1194306679 0.2957600240 0.177121441
## radius_se 0.154456496 -0.0256032561 0.3124900373 -0.022539967
## texture_se 0.191650506 -0.0287473145 -0.0907553556 0.475413139
## perimeter_se 0.120990220 0.0018107150 0.3146403902 0.011896690
## area_se 0.127574432 -0.0428639079 0.3466790028 -0.085805135
## smoothness_se 0.232065676 -0.3429173935 -0.2440240556 -0.573410232
## compactness_se -0.279968156 0.0691975186 0.0234635340 -0.117460157
## concavity_se -0.353982091 0.0563432386 -0.2088237897 -0.060566501
## concave points_se -0.195548089 -0.0312244482 -0.3696459369 0.108319309
## symmetry_se 0.252868765 0.4902456426 -0.0803822539 -0.220149279
## fractal_dimension_se -0.263297438 -0.0531952674 0.1913949726 -0.011168188
## radius_worst 0.004406592 -0.0002906849 -0.0097099360 -0.042619416
## texture_worst 0.092883400 -0.0500080613 0.0098707439 -0.036251636
## perimeter_worst -0.007454151 0.0085009872 -0.0004457267 -0.030558534
## area_worst 0.027390903 -0.0251643821 0.0678316595 -0.079394246
## smoothness_worst 0.324435445 -0.3692553703 -0.1088308865 -0.205852191
## compactness_worst -0.121804107 0.0477057929 0.1404729381 -0.084019659
## concavity_worst -0.188518727 0.0283792555 -0.0604880561 -0.072467871
## concave points_worst -0.043332069 -0.0308734498 -0.1679666187 0.036170795
## symmetry_worst 0.244558663 0.4989267845 -0.0184906298 -0.228225053
## fractal_dimension_worst -0.094423351 -0.0802235245 0.3746576261 -0.048360667
## PC9 PC10 PC11 PC12
## radius_mean -0.223109764 0.095486443 -0.04147149 0.051067457
## texture_mean 0.112699390 0.240934066 0.30224340 0.254896423
## perimeter_mean -0.223739213 0.086385615 -0.01678264 0.038926106
## area_mean -0.195586014 0.074956489 -0.11016964 0.065437508
## smoothness_mean 0.006424722 -0.069292681 0.13702184 0.316727211
## compactness_mean -0.167841425 0.012936200 0.30800963 -0.104017044
## concavity_mean 0.040591006 -0.135602298 -0.12419024 0.065653480
## concave points_mean -0.111971106 0.008054528 0.07244603 0.042589267
## symmetry_mean 0.256040084 0.572069479 -0.16305408 -0.288865504
## fractal_dimension_mean -0.123740789 0.081103207 0.03804827 0.236358988
## radius_se 0.249985002 -0.049547594 0.02535702 -0.016687915
## texture_se -0.246645397 -0.289142742 -0.34494446 -0.306160423
## perimeter_se 0.227154024 -0.114508236 0.16731877 -0.101446828
## area_se 0.229160015 -0.091927889 -0.05161946 -0.017679218
## smoothness_se -0.141924890 0.160884609 -0.08420621 -0.294710053
## compactness_se -0.145322810 0.043504866 0.20688568 -0.263456509
## concavity_se 0.358107079 -0.141276243 -0.34951794 0.251146975
## concave points_se 0.272519886 0.086240847 0.34237591 -0.006458751
## symmetry_se -0.304077200 -0.316529830 0.18784404 0.320571348
## fractal_dimension_se -0.213722716 0.367541918 -0.25062479 0.276165974
## radius_worst -0.112141463 0.077361643 -0.10506733 0.039679665
## texture_worst 0.103341204 0.029550941 -0.01315727 0.079797450
## perimeter_worst -0.109614364 0.050508334 -0.05107628 -0.008987738
## area_worst -0.080732461 0.069921152 -0.18459894 0.048088657
## smoothness_worst 0.112315904 -0.128304659 -0.14389035 0.056514866
## compactness_worst -0.100677822 -0.172133632 0.19742047 -0.371662503
## concavity_worst 0.161908621 -0.311638520 -0.18501676 -0.087034532
## concave points_worst 0.060488462 -0.076648291 0.11777205 -0.068125354
## symmetry_worst 0.064637806 -0.029563075 -0.15756025 0.044033503
## fractal_dimension_worst -0.134174175 0.012609579 -0.11828355 -0.034731693
## PC13 PC14 PC15 PC16
## radius_mean 0.01196721 0.059506135 -0.051118775 -0.15058388
## texture_mean 0.20346133 -0.021560100 -0.107922421 -0.15784196
## perimeter_mean 0.04410950 0.048513812 -0.039902936 -0.11445396
## area_mean 0.06737574 0.010830829 0.013966907 -0.13244803
## smoothness_mean 0.04557360 0.445064860 -0.118143364 -0.20461325
## compactness_mean 0.22928130 0.008101057 0.230899962 0.17017837
## concavity_mean 0.38709081 -0.189358699 -0.128283732 0.26947021
## concave points_mean 0.13213810 -0.244794768 -0.217099194 0.38046410
## symmetry_mean 0.18993367 0.030738856 -0.073961707 -0.16466159
## fractal_dimension_mean 0.10623908 -0.377078865 0.517975705 -0.04079279
## radius_se -0.06819523 0.010347413 -0.110050711 0.05890572
## texture_se -0.16822238 -0.010849347 0.032752721 -0.03450040
## perimeter_se -0.03784399 -0.045523718 -0.008268089 0.02651665
## area_se 0.05606493 0.083570718 -0.046024366 0.04115323
## smoothness_se 0.15044143 -0.201152530 0.018559465 -0.05803906
## compactness_se 0.01004017 0.491755932 0.168209315 0.18983090
## concavity_se 0.15878319 0.134586924 0.250471408 -0.12542065
## concave points_se -0.49402674 -0.199666719 0.062079344 -0.19881035
## symmetry_se 0.01033274 -0.046864383 -0.113383199 -0.15771150
## fractal_dimension_se -0.24045832 0.145652466 -0.353232211 0.26855388
## radius_worst -0.13789053 0.023101281 0.166567074 -0.08156057
## texture_worst -0.08014543 0.053430792 0.101115399 0.18555785
## perimeter_worst -0.09696571 0.012219382 0.182755198 -0.05485705
## area_worst -0.10116061 -0.006685465 0.314993600 -0.09065339
## smoothness_worst -0.20513034 0.162235443 0.046125866 0.14555166
## compactness_worst 0.01227931 0.166470250 -0.049956014 -0.15373486
## concavity_worst 0.21798433 -0.066798931 -0.204835886 -0.21502195
## concave points_worst -0.25438749 -0.276418891 -0.169499607 0.17814174
## symmetry_worst -0.25653491 0.005355574 0.139888394 0.25789401
## fractal_dimension_worst -0.17281424 -0.212104110 -0.256173195 -0.40555649
## PC17 PC18 PC19 PC20
## radius_mean 0.202924255 0.1467123385 0.22538466 -0.049698664
## texture_mean -0.038706119 -0.0411029851 0.02978864 -0.244134993
## perimeter_mean 0.194821310 0.1583174548 0.23959528 -0.017665012
## area_mean 0.255705763 0.2661681046 -0.02732219 -0.090143762
## smoothness_mean 0.167929914 -0.3522268017 -0.16456584 0.017100960
## compactness_mean -0.020307708 0.0077941384 0.28422236 0.488686329
## concavity_mean -0.001598353 -0.0269681105 0.00226636 -0.033387086
## concave points_mean 0.034509509 -0.0828277367 -0.15497236 -0.235407606
## symmetry_mean -0.191737848 0.1733977905 -0.05881116 0.026069156
## fractal_dimension_mean 0.050225246 0.0878673570 -0.05815705 -0.175637222
## radius_se -0.139396866 -0.2362165319 0.17588331 -0.090800503
## texture_se 0.043963016 -0.0098586620 0.03600985 -0.071659988
## perimeter_se -0.024635639 -0.0259288003 0.36570154 -0.177250625
## area_se 0.334418173 0.3049069032 -0.41657231 0.274201148
## smoothness_se 0.139595006 -0.2312599432 -0.01326009 0.090061477
## compactness_se -0.008246477 0.1004742346 -0.24244818 -0.461098220
## concavity_se 0.084616716 -0.0001954852 0.12638102 0.066946174
## concave points_se 0.108132263 0.0460549116 -0.01216430 0.068868294
## symmetry_se -0.274059129 0.1870147640 -0.08903929 0.107385289
## fractal_dimension_se -0.122733398 -0.0598230982 0.08660084 0.222345297
## radius_worst -0.240049982 -0.2161013526 0.01366130 -0.005626909
## texture_worst 0.069365185 0.0583984505 -0.07586693 0.300599798
## perimeter_worst -0.234164147 -0.1885435919 0.09081325 0.011003858
## area_worst -0.273399584 -0.1420648558 -0.41004720 0.060047387
## smoothness_worst -0.278030197 0.5015516751 0.23451384 -0.129723903
## compactness_worst -0.004037123 -0.0735745143 0.02020070 0.229280589
## concavity_worst -0.191313419 -0.1039079796 -0.04578612 -0.046482792
## concave points_worst -0.075485316 0.0758138963 -0.26022962 0.033022340
## symmetry_worst 0.430658116 -0.2787138431 0.11725053 -0.116759236
## fractal_dimension_worst 0.159394300 0.0235647497 -0.01149448 -0.104991974
## PC21 PC22 PC23 PC24
## radius_mean -0.0685700057 -0.07292890 -0.0985526942 -0.18257944
## texture_mean 0.4483694667 -0.09480063 -0.0005549975 0.09878679
## perimeter_mean -0.0697690429 -0.07516048 -0.0402447050 -0.11664888
## area_mean -0.0184432785 -0.09756578 0.0077772734 0.06984834
## smoothness_mean -0.1194917473 -0.06382295 -0.0206657211 0.06869742
## compactness_mean 0.1926213963 0.09807756 0.0523603957 -0.10413552
## concavity_mean 0.0055717533 0.18521200 0.3248703785 0.04474106
## concave points_mean -0.0094238187 0.31185243 -0.0514087968 0.08402770
## symmetry_mean -0.0869384844 0.01840673 -0.0512005770 0.01933947
## fractal_dimension_mean -0.0762718362 -0.28786888 -0.0846898562 -0.13326055
## radius_se 0.0863867747 0.15027468 -0.2641253170 -0.55870157
## texture_se 0.2170719674 -0.04845693 -0.0008738805 0.02426730
## perimeter_se -0.3049501584 -0.15935280 0.0900742110 0.51675039
## area_se 0.1925877857 -0.06423262 0.0982150746 -0.02246072
## smoothness_se -0.0720987261 -0.05054490 -0.0598177179 0.01563119
## compactness_se -0.1403865724 0.04528769 0.0091038710 -0.12177779
## concavity_se 0.0630479298 0.20521269 -0.3875423290 0.18820504
## concave points_se 0.0343753236 0.07254538 0.3517550738 -0.10966898
## symmetry_se -0.0976995265 0.08465443 -0.0423628949 0.00322620
## fractal_dimension_se 0.0628432814 -0.24470508 0.0857810992 0.07519442
## radius_worst 0.0072938995 0.09629821 -0.0556767923 -0.15683037
## texture_worst -0.5944401434 0.11111202 -0.0089228997 -0.11848460
## perimeter_worst -0.0920235990 -0.01722163 0.0633448296 0.23711317
## area_worst 0.1467901315 0.09695982 0.1908896250 0.14406303
## smoothness_worst 0.1648492374 0.06825409 0.0936901494 -0.01099014
## compactness_worst 0.1813748671 -0.02967641 -0.1479209247 0.18674995
## concavity_worst -0.1321005945 -0.46042619 0.2864331353 -0.28885257
## concave points_worst 0.0008860815 -0.29984056 -0.5675277966 0.10734024
## symmetry_worst 0.1627085487 -0.09714484 0.1213434508 -0.01438181
## fractal_dimension_worst -0.0923439434 0.46947115 0.0076253382 0.03782545
## PC25 PC26 PC27 PC28
## radius_mean -0.01922650 -0.129476396 -0.131526670 2.111940e-01
## texture_mean 0.08474593 -0.024556664 -0.017357309 -6.581146e-05
## perimeter_mean 0.02701541 -0.125255946 -0.115415423 8.433827e-02
## area_mean -0.21004078 0.362727403 0.466612477 -2.725083e-01
## smoothness_mean 0.02895489 -0.037003686 0.069689923 1.479269e-03
## compactness_mean 0.39662323 0.262808474 0.097748705 -5.462767e-03
## concavity_mean -0.09697732 -0.548876170 0.364808397 4.553864e-02
## concave points_mean -0.18645160 0.387643377 -0.454699351 -8.883097e-03
## symmetry_mean -0.02458369 -0.016044038 -0.015164835 1.433026e-03
## fractal_dimension_mean -0.20722186 -0.097404839 -0.101244946 -6.311687e-03
## radius_se -0.17493043 0.049977080 0.212982901 -1.922239e-01
## texture_se 0.05698648 -0.011237242 -0.010092889 -5.622611e-03
## perimeter_se 0.07292764 0.103653282 0.041691553 2.631919e-01
## area_se 0.13185041 -0.155304589 -0.313358657 -4.206811e-02
## smoothness_se 0.03121070 -0.007717557 -0.009052154 9.792963e-03
## compactness_se 0.17316455 -0.049727632 0.046536088 -1.539555e-02
## concavity_se 0.01593998 0.091454968 -0.084224797 5.820978e-03
## concave points_se -0.12954655 -0.017941919 -0.011165509 -2.900930e-02
## symmetry_se -0.01951493 -0.017267849 -0.019975983 -7.636526e-03
## fractal_dimension_se -0.08417120 0.035488974 -0.012036564 1.975646e-02
## radius_worst 0.07070972 -0.197054744 -0.178666740 4.126396e-01
## texture_worst -0.11818972 0.036469433 0.021410694 -3.902509e-04
## perimeter_worst 0.11803403 -0.244103670 -0.241031046 -7.286809e-01
## area_worst -0.03828995 0.231359525 0.237162466 2.389603e-01
## smoothness_worst -0.04796476 0.012602464 -0.040853568 -1.535248e-03
## compactness_worst -0.62438494 -0.100463424 -0.070505414 4.869182e-02
## concavity_worst 0.11577034 0.266853781 -0.142905801 -1.764090e-02
## concave points_worst 0.26319634 -0.133574507 0.230901389 2.247567e-02
## symmetry_worst 0.04529962 0.028184296 0.022790444 4.920481e-03
## fractal_dimension_worst 0.28013348 0.004520482 0.059985998 -2.356214e-02
## PC29 PC30
## radius_mean 2.114605e-01 0.7024140910
## texture_mean -1.053393e-02 0.0002736610
## perimeter_mean 3.838261e-01 -0.6898969685
## area_mean -4.227949e-01 -0.0329473482
## smoothness_mean -3.434667e-03 -0.0048474577
## compactness_mean -4.101677e-02 0.0446741863
## concavity_mean -1.001479e-02 0.0251386661
## concave points_mean -4.206949e-03 -0.0010772653
## symmetry_mean -7.569862e-03 -0.0012803794
## fractal_dimension_mean 7.301433e-03 -0.0047556848
## radius_se 1.184421e-01 -0.0087110937
## texture_se -8.776279e-03 -0.0010710392
## perimeter_se -6.100219e-03 0.0137293906
## area_se -8.592591e-02 0.0011053260
## smoothness_se 1.776386e-03 -0.0016082109
## compactness_se 3.158134e-03 0.0019156224
## concavity_se 1.607852e-02 -0.0089265265
## concave points_se -2.393779e-02 -0.0021601973
## symmetry_se -5.223292e-03 0.0003293898
## fractal_dimension_se -8.341912e-03 0.0017989568
## radius_worst -6.357249e-01 -0.1356430561
## texture_worst 1.723549e-02 0.0010205360
## perimeter_worst 2.292180e-02 0.0797438536
## area_worst 4.449359e-01 0.0397422838
## smoothness_worst 7.385492e-03 0.0045832773
## compactness_worst 3.566904e-06 -0.0128415624
## concavity_worst -1.267572e-02 0.0004021392
## concave points_worst 3.524045e-02 -0.0022884418
## symmetry_worst 1.340423e-02 0.0003954435
## fractal_dimension_worst 1.147766e-02 0.0018942925
print(all_pca_2)
## Call:
## princomp(x = wbcd[, -1], cor = TRUE)
##
## Standard deviations:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7
## 3.64439401 2.38565601 1.67867477 1.40735229 1.28402903 1.09879780 0.82171778
## Comp.8 Comp.9 Comp.10 Comp.11 Comp.12 Comp.13 Comp.14
## 0.69037464 0.64567392 0.59219377 0.54213992 0.51103950 0.49128148 0.39624453
## Comp.15 Comp.16 Comp.17 Comp.18 Comp.19 Comp.20 Comp.21
## 0.30681422 0.28260007 0.24371918 0.22938785 0.22243559 0.17652026 0.17312681
## Comp.22 Comp.23 Comp.24 Comp.25 Comp.26 Comp.27 Comp.28
## 0.16564843 0.15601550 0.13436892 0.12442376 0.09043030 0.08306903 0.03986650
## Comp.29 Comp.30
## 0.02736427 0.01153451
##
## 30 variables and 569 observations.
print(all_pca_3)
## **Results for the Principal Component Analysis (PCA)**
## The analysis was performed on 569 individuals, described by 30 variables
## *The results are available in the following objects:
##
## name description
## 1 "$eig" "eigenvalues"
## 2 "$var" "results for the variables"
## 3 "$var$coord" "coord. for the variables"
## 4 "$var$cor" "correlations variables - dimensions"
## 5 "$var$cos2" "cos2 for the variables"
## 6 "$var$contrib" "contributions of the variables"
## 7 "$ind" "results for the individuals"
## 8 "$ind$coord" "coord. for the individuals"
## 9 "$ind$cos2" "cos2 for the individuals"
## 10 "$ind$contrib" "contributions of the individuals"
## 11 "$call" "summary statistics"
## 12 "$call$centre" "mean of the variables"
## 13 "$call$ecart.type" "standard error of the variables"
## 14 "$call$row.w" "weights for the individuals"
## 15 "$call$col.w" "weights for the variables"
Printing a “PCA” class object gives an organized overview (more so than the alternatives) of the object’s components.
As already stated, many aspects of PCA fall outside the scope of this document, both because of time constraints and to keep its overall size within reasonable limits. The components of prcomp() and princomp() are not directly useful for the tasks that are yet to be performed, whereas those of PCA() are: certain components (such as the eigenvalues and the coordinates, correlations, squared cosines and contributions of both variables and individuals) are core to the procedures that follow, and they will be detailed later in the document. For now, let’s just state that PCA() is in most cases more convenient than both prcomp() and princomp() because of these particular components (the same information can be obtained with the other functions, but that requires additional computation, whereas PCA() makes the task more straightforward).
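To make the last point concrete, the following is a minimal sketch (not part of the original analysis) of how the eigenvalues, variable coordinates, squared cosines and contributions can be derived by hand from a prcomp() object. It assumes standardized variables and uses the built-in USArrests dataset as a stand-in for wbcd; all object names are illustrative.

```r
# Illustrative stand-in data: USArrests ships with base R (wbcd is not assumed here)
X <- USArrests

# Standardized PCA with base R
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Eigenvalues are the squared standard deviations of the components
eigenvalues <- pca$sdev^2

# Variable coordinates: loadings scaled by the component standard deviations
# (for standardized data these equal the variable-component correlations)
var_coord <- sweep(pca$rotation, 2, pca$sdev, "*")

# Squared cosines: quality of representation of each variable on each component
var_cos2 <- var_coord^2

# Contributions (%) of each variable to each component
var_contrib <- sweep(var_cos2, 2, colSums(var_cos2), "/") * 100

round(eigenvalues, 3)
round(var_contrib[, 1:2], 2)
```

Each of these quantities takes an extra line of arithmetic here, which is the additional work referred to above.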
The dudi.pca() function from the ade4 package is formatted as follows:
# Do not run this code snippet, as it is only here for illustration purposes
library(ade4)
dudi.pca(df,
row.w = rep(1, nrow(df))/nrow(df),
col.w = rep(1, ncol(df)),
center = TRUE,
scale = TRUE,
scannf = TRUE,
nf = 2,
...)
row.w: an optional vector of row weights (by default, uniform row weights).
scale: TRUE or FALSE; determines whether or not to scale/standardize the data.
scannf: TRUE or FALSE; determines whether or not to display a screeplot (the topic is properly explained later in the document).
nf: the number of axes to keep when scannf = FALSE.
More information regarding the dudi.pca() function and its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/ade4/versions/1.7-15/topics/dudi.pca
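As a rough illustration of what the default arguments imply, the sketch below reproduces the centring and scaling step in base R under uniform row weights. It was written for this illustration and is not ade4’s actual implementation; it uses the built-in USArrests dataset rather than wbcd, and all object names are hypothetical.

```r
# Illustrative stand-in data (not the wbcd dataset used in this document)
df <- as.matrix(USArrests)
n <- nrow(df)

# Default weights from the signature above: uniform rows, unit columns
row_w <- rep(1, n) / n
col_w <- rep(1, ncol(df))

# Weighted column means; with uniform weights these are the ordinary means
centers <- colSums(df * row_w)

# Weighted standard deviations: squared deviations are divided by n, not n - 1
centered <- sweep(df, 2, centers, "-")
norms <- sqrt(colSums(centered^2 * row_w))

# Centered and scaled table
tab <- sweep(centered, 2, norms, "/")

round(centers, 3)
round(norms, 3)
```

With uniform weights each row weighs 1/n, which is consistent with the value 0.00176 (that is, 1/569) shown for $lw in the str() output further below.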
Let’s now evaluate the results when applying dudi.pca() to the Wisconsin Breast Cancer Dataset:
all_pca_4 <- dudi.pca(wbcd[,-1], scale = TRUE, scannf = FALSE, nf = 30)
class(all_pca_4)
## [1] "pca" "dudi"
str(all_pca_4)
## List of 13
## $ tab :'data.frame': 569 obs. of 30 variables:
## ..$ radius_mean : num [1:569] 1.097 1.83 1.58 -0.769 1.75 ...
## ..$ texture_mean : num [1:569] -2.073 -0.354 0.456 0.254 -1.152 ...
## ..$ perimeter_mean : num [1:569] 1.27 1.686 1.567 -0.593 1.777 ...
## ..$ area_mean : num [1:569] 0.984 1.909 1.559 -0.764 1.826 ...
## ..$ smoothness_mean : num [1:569] 1.568 -0.827 0.942 3.284 0.28 ...
## ..$ compactness_mean : num [1:569] 3.284 -0.487 1.053 3.403 0.539 ...
## ..$ concavity_mean : num [1:569] 2.6529 -0.0238 1.3635 1.9159 1.371 ...
## ..$ concave points_mean : num [1:569] 2.532 0.548 2.037 1.452 1.428 ...
## ..$ symmetry_mean : num [1:569] 2.21752 0.00139 0.93968 2.86738 -0.00956 ...
## ..$ fractal_dimension_mean : num [1:569] 2.256 -0.869 -0.398 4.911 -0.562 ...
## ..$ radius_se : num [1:569] 2.49 0.499 1.229 0.326 1.271 ...
## ..$ texture_se : num [1:569] -0.565 -0.876 -0.78 -0.11 -0.79 ...
## ..$ perimeter_se : num [1:569] 2.833 0.263 0.851 0.287 1.273 ...
## ..$ area_se : num [1:569] 2.488 0.742 1.181 -0.288 1.19 ...
## ..$ smoothness_se : num [1:569] -0.214 -0.605 -0.297 0.69 1.483 ...
## ..$ compactness_se : num [1:569] 1.3169 -0.6929 0.815 2.7443 -0.0485 ...
## ..$ concavity_se : num [1:569] 0.724 -0.441 0.213 0.82 0.828 ...
## ..$ concave points_se : num [1:569] 0.661 0.26 1.425 1.115 1.144 ...
## ..$ symmetry_se : num [1:569] 1.149 -0.805 0.237 4.733 -0.361 ...
## ..$ fractal_dimension_se : num [1:569] 0.9071 -0.0994 0.2936 2.0475 0.4993 ...
## ..$ radius_worst : num [1:569] 1.887 1.806 1.512 -0.281 1.299 ...
## ..$ texture_worst : num [1:569] -1.359 -0.369 -0.024 0.134 -1.467 ...
## ..$ perimeter_worst : num [1:569] 2.3 1.54 1.35 -0.25 1.34 ...
## ..$ area_worst : num [1:569] 2 1.89 1.46 -0.55 1.22 ...
## ..$ smoothness_worst : num [1:569] 1.308 -0.376 0.527 3.394 0.221 ...
## ..$ compactness_worst : num [1:569] 2.617 -0.43 1.083 3.893 -0.313 ...
## ..$ concavity_worst : num [1:569] 2.11 -0.147 0.855 1.99 0.613 ...
## ..$ concave points_worst : num [1:569] 2.296 1.087 1.955 2.176 0.729 ...
## ..$ symmetry_worst : num [1:569] 2.751 -0.244 1.152 6.046 -0.868 ...
## ..$ fractal_dimension_worst: num [1:569] 1.937 0.281 0.201 4.935 -0.397 ...
## $ cw : num [1:30] 1 1 1 1 1 1 1 1 1 1 ...
## $ lw : num [1:569] 0.00176 0.00176 0.00176 0.00176 0.00176 ...
## $ eig : num [1:30] 13.28 5.69 2.82 1.98 1.65 ...
## $ rank: int 30
## $ nf : num 30
## $ c1 :'data.frame': 30 obs. of 30 variables:
## ..$ CS1 : num [1:30] -0.219 -0.104 -0.228 -0.221 -0.143 ...
## ..$ CS2 : num [1:30] -0.2339 -0.0597 -0.2152 -0.2311 0.1861 ...
## ..$ CS3 : num [1:30] -0.00853 0.06455 -0.00931 0.0287 -0.10429 ...
## ..$ CS4 : num [1:30] 0.0414 -0.6031 0.042 0.0534 0.1594 ...
## ..$ CS5 : num [1:30] 0.0378 -0.0495 0.0374 0.0103 -0.3651 ...
## ..$ CS6 : num [1:30] 0.01874 -0.03218 0.01731 -0.00189 -0.28637 ...
## ..$ CS7 : num [1:30] -0.1241 0.0114 -0.1145 -0.0517 -0.1407 ...
## ..$ CS8 : num [1:30] 0.00745 -0.13067 0.01869 -0.03467 0.28897 ...
## ..$ CS9 : num [1:30] -0.22311 0.1127 -0.22374 -0.19559 0.00642 ...
## ..$ CS10: num [1:30] 0.0955 0.2409 0.0864 0.075 -0.0693 ...
## ..$ CS11: num [1:30] 0.0415 -0.3022 0.0168 0.1102 -0.137 ...
## ..$ CS12: num [1:30] 0.0511 0.2549 0.0389 0.0654 0.3167 ...
## ..$ CS13: num [1:30] 0.012 0.2035 0.0441 0.0674 0.0456 ...
## ..$ CS14: num [1:30] -0.0595 0.0216 -0.0485 -0.0108 -0.4451 ...
## ..$ CS15: num [1:30] 0.0511 0.1079 0.0399 -0.014 0.1181 ...
## ..$ CS16: num [1:30] 0.151 0.158 0.114 0.132 0.205 ...
## ..$ CS17: num [1:30] -0.2029 0.0387 -0.1948 -0.2557 -0.1679 ...
## ..$ CS18: num [1:30] -0.1467 0.0411 -0.1583 -0.2662 0.3522 ...
## ..$ CS19: num [1:30] 0.2254 0.0298 0.2396 -0.0273 -0.1646 ...
## ..$ CS20: num [1:30] -0.0497 -0.2441 -0.0177 -0.0901 0.0171 ...
## ..$ CS21: num [1:30] -0.0686 0.4484 -0.0698 -0.0184 -0.1195 ...
## ..$ CS22: num [1:30] 0.0729 0.0948 0.0752 0.0976 0.0638 ...
## ..$ CS23: num [1:30] -0.098553 -0.000555 -0.040245 0.007777 -0.020666 ...
## ..$ CS24: num [1:30] 0.1826 -0.0988 0.1166 -0.0698 -0.0687 ...
## ..$ CS25: num [1:30] -0.0192 0.0847 0.027 -0.21 0.029 ...
## ..$ CS26: num [1:30] -0.1295 -0.0246 -0.1253 0.3627 -0.037 ...
## ..$ CS27: num [1:30] 0.1315 0.0174 0.1154 -0.4666 -0.0697 ...
## ..$ CS28: num [1:30] -2.11e-01 6.58e-05 -8.43e-02 2.73e-01 -1.48e-03 ...
## ..$ CS29: num [1:30] 0.21146 -0.01053 0.38383 -0.42279 -0.00343 ...
## ..$ CS30: num [1:30] 0.702414 0.000274 -0.689897 -0.032947 -0.004847 ...
## $ li :'data.frame': 569 obs. of 30 variables:
## ..$ Axis1 : num [1:569] -9.19 -2.39 -5.73 -7.12 -3.94 ...
## ..$ Axis2 : num [1:569] 1.95 -3.77 -1.08 10.28 -1.95 ...
## ..$ Axis3 : num [1:569] -1.123 -0.529 -0.552 -3.233 1.39 ...
## ..$ Axis4 : num [1:569] 3.634 1.118 0.912 0.153 2.941 ...
## ..$ Axis5 : num [1:569] -1.195 0.622 -0.177 -2.961 0.547 ...
## ..$ Axis6 : num [1:569] 1.4114 0.0287 0.5415 3.0534 -1.2265 ...
## ..$ Axis7 : num [1:569] 2.1594 0.0134 -0.6682 1.4299 -0.9362 ...
## ..$ Axis8 : num [1:569] 0.3984 -0.241 -0.0974 -1.0596 -0.6364 ...
## ..$ Axis9 : num [1:569] -0.1571 -0.7119 0.0241 -1.4054 -0.2638 ...
## ..$ Axis10: num [1:569] -0.877 1.107 0.454 -1.117 0.378 ...
## ..$ Axis11: num [1:569] 0.263 0.813 -0.606 -1.152 0.651 ...
## ..$ Axis12: num [1:569] -0.859 0.158 0.124 1.011 -0.111 ...
## ..$ Axis13: num [1:569] 0.103 -0.944 -0.411 -0.933 0.388 ...
## ..$ Axis14: num [1:569] 0.6908 0.6535 -0.0167 0.4874 0.5392 ...
## ..$ Axis15: num [1:569] -0.60179 0.00897 0.48342 -0.16885 0.31032 ...
## ..$ Axis16: num [1:569] -0.7451 0.6488 -0.3251 -0.0514 0.1526 ...
## ..$ Axis17: num [1:569] 0.2655 0.0172 -0.1909 -0.4826 -0.1331 ...
## ..$ Axis18: num [1:569] 0.5496 -0.3183 0.088 0.0359 0.0187 ...
## ..$ Axis19: num [1:569] 0.1338 -0.2476 -0.3926 -0.0267 0.4614 ...
## ..$ Axis20: num [1:569] 0.3456 -0.1141 -0.2045 -0.4647 0.0655 ...
## ..$ Axis21: num [1:569] 0.0965 -0.0773 0.3111 0.4342 -0.1165 ...
## ..$ Axis22: num [1:569] 0.0688 -0.0946 -0.0603 -0.2033 -0.0176 ...
## ..$ Axis23: num [1:569] 0.0845 -0.2177 -0.0743 -0.1241 0.1395 ...
## ..$ Axis24: num [1:569] -0.17526 0.01129 0.10276 0.15343 -0.00533 ...
## ..$ Axis25: num [1:569] 0.15102 0.17051 -0.17116 -0.0775 -0.00306 ...
## ..$ Axis26: num [1:569] -0.2015 -0.04113 0.00474 -0.27522 0.03925 ...
## ..$ Axis27: num [1:569] 0.2526 -0.1813 -0.0496 -0.1835 -0.0322 ...
## ..$ Axis28: num [1:569] 0.0339 -0.0326 -0.047 -0.0425 0.0348 ...
## ..$ Axis29: num [1:569] 0.04565 -0.00569 0.00315 -0.06929 0.00504 ...
## ..$ Axis30: num [1:569] 0.047169 0.001868 -0.000751 0.019937 -0.021214 ...
## $ co :'data.frame': 30 obs. of 30 variables:
## ..$ Comp1 : num [1:30] -0.798 -0.378 -0.829 -0.805 -0.52 ...
## ..$ Comp2 : num [1:30] -0.558 -0.142 -0.513 -0.551 0.444 ...
## ..$ Comp3 : num [1:30] -0.0143 0.1084 -0.0156 0.0482 -0.1751 ...
## ..$ Comp4 : num [1:30] 0.0583 -0.8487 0.0591 0.0752 0.2243 ...
## ..$ Comp5 : num [1:30] 0.0485 -0.0635 0.048 0.0133 -0.4688 ...
## ..$ Comp6 : num [1:30] 0.02059 -0.03536 0.01902 -0.00207 -0.31467 ...
## ..$ Comp7 : num [1:30] -0.10197 0.00937 -0.09407 -0.04244 -0.11559 ...
## ..$ Comp8 : num [1:30] 0.00514 -0.09021 0.0129 -0.02394 0.1995 ...
## ..$ Comp9 : num [1:30] -0.14406 0.07277 -0.14446 -0.12628 0.00415 ...
## ..$ Comp10: num [1:30] 0.0565 0.1427 0.0512 0.0444 -0.041 ...
## ..$ Comp11: num [1:30] 0.0225 -0.1639 0.0091 0.0597 -0.0743 ...
## ..$ Comp12: num [1:30] 0.0261 0.1303 0.0199 0.0334 0.1619 ...
## ..$ Comp13: num [1:30] 0.00588 0.09996 0.02167 0.0331 0.02239 ...
## ..$ Comp14: num [1:30] -0.02358 0.00854 -0.01922 -0.00429 -0.17635 ...
## ..$ Comp15: num [1:30] 0.01568 0.03311 0.01224 -0.00429 0.03625 ...
## ..$ Comp16: num [1:30] 0.0426 0.0446 0.0323 0.0374 0.0578 ...
## ..$ Comp17: num [1:30] -0.04946 0.00943 -0.04748 -0.06232 -0.04093 ...
## ..$ Comp18: num [1:30] -0.03365 0.00943 -0.03632 -0.06106 0.0808 ...
## ..$ Comp19: num [1:30] 0.05013 0.00663 0.05329 -0.00608 -0.03661 ...
## ..$ Comp20: num [1:30] -0.00877 -0.04309 -0.00312 -0.01591 0.00302 ...
## ..$ Comp21: num [1:30] -0.01187 0.07762 -0.01208 -0.00319 -0.02069 ...
## ..$ Comp22: num [1:30] 0.0121 0.0157 0.0125 0.0162 0.0106 ...
## ..$ Comp23: num [1:30] -1.54e-02 -8.66e-05 -6.28e-03 1.21e-03 -3.22e-03 ...
## ..$ Comp24: num [1:30] 0.02453 -0.01327 0.01567 -0.00939 -0.00923 ...
## ..$ Comp25: num [1:30] -0.00239 0.01054 0.00336 -0.02613 0.0036 ...
## ..$ Comp26: num [1:30] -0.01171 -0.00222 -0.01133 0.0328 -0.00335 ...
## ..$ Comp27: num [1:30] 0.01093 0.00144 0.00959 -0.03876 -0.00579 ...
## ..$ Comp28: num [1:30] -8.42e-03 2.62e-06 -3.36e-03 1.09e-02 -5.90e-05 ...
## ..$ Comp29: num [1:30] 0.005786 -0.000288 0.010503 -0.011569 -0.000094 ...
## ..$ Comp30: num [1:30] 8.10e-03 3.16e-06 -7.96e-03 -3.80e-04 -5.59e-05 ...
## $ l1 :'data.frame': 569 obs. of 30 variables:
## ..$ RS1 : num [1:569] -2.522 -0.655 -1.573 -1.954 -1.08 ...
## ..$ RS2 : num [1:569] 0.817 -1.58 -0.451 4.307 -0.817 ...
## ..$ RS3 : num [1:569] -0.669 -0.315 -0.329 -1.926 0.828 ...
## ..$ RS4 : num [1:569] 2.582 0.795 0.648 0.108 2.089 ...
## ..$ RS5 : num [1:569] -0.931 0.484 -0.138 -2.306 0.426 ...
## ..$ RS6 : num [1:569] 1.2845 0.0261 0.4928 2.7789 -1.1162 ...
## ..$ RS7 : num [1:569] 2.6279 0.0163 -0.8131 1.7401 -1.1393 ...
## ..$ RS8 : num [1:569] 0.577 -0.349 -0.141 -1.535 -0.922 ...
## ..$ RS9 : num [1:569] -0.2433 -1.1026 0.0373 -2.1767 -0.4086 ...
## ..$ RS10: num [1:569] -1.482 1.869 0.767 -1.886 0.638 ...
## ..$ RS11: num [1:569] 0.485 1.5 -1.117 -2.124 1.201 ...
## ..$ RS12: num [1:569] -1.681 0.309 0.243 1.979 -0.216 ...
## ..$ RS13: num [1:569] 0.21 -1.921 -0.836 -1.9 0.79 ...
## ..$ RS14: num [1:569] 1.7434 1.6492 -0.0421 1.2301 1.3607 ...
## ..$ RS15: num [1:569] -1.9614 0.0293 1.5756 -0.5503 1.0114 ...
## ..$ RS16: num [1:569] -2.637 2.296 -1.15 -0.182 0.54 ...
## ..$ RS17: num [1:569] 1.0892 0.0706 -0.7834 -1.9803 -0.5463 ...
## ..$ RS18: num [1:569] 2.3958 -1.3876 0.3835 0.1564 0.0816 ...
## ..$ RS19: num [1:569] 0.601 -1.113 -1.765 -0.12 2.074 ...
## ..$ RS20: num [1:569] 1.958 -0.647 -1.159 -2.633 0.371 ...
## ..$ RS21: num [1:569] 0.557 -0.447 1.797 2.508 -0.673 ...
## ..$ RS22: num [1:569] 0.416 -0.571 -0.364 -1.227 -0.107 ...
## ..$ RS23: num [1:569] 0.542 -1.395 -0.476 -0.795 0.894 ...
## ..$ RS24: num [1:569] -1.3043 0.084 0.7648 1.1419 -0.0397 ...
## ..$ RS25: num [1:569] 1.2138 1.3704 -1.3756 -0.6228 -0.0246 ...
## ..$ RS26: num [1:569] -2.2283 -0.4548 0.0524 -3.0435 0.4341 ...
## ..$ RS27: num [1:569] 3.041 -2.182 -0.597 -2.209 -0.387 ...
## ..$ RS28: num [1:569] 0.851 -0.818 -1.18 -1.066 0.873 ...
## ..$ RS29: num [1:569] 1.668 -0.208 0.115 -2.532 0.184 ...
## ..$ RS30: num [1:569] 4.0894 0.1619 -0.0651 1.7285 -1.8392 ...
## $ call: language dudi.pca(df = wbcd[, -1], scale = TRUE, scannf = FALSE, nf = 30)
## $ cent: Named num [1:30] 14.1273 19.2896 91.969 654.8891 0.0964 ...
## ..- attr(*, "names")= chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## $ norm: Named num [1:30] 3.521 4.2973 24.2776 351.6048 0.0141 ...
## ..- attr(*, "names")= chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## - attr(*, "class")= chr [1:2] "pca" "dudi"
summary(all_pca_4)
## Class: pca dudi
## Call: dudi.pca(df = wbcd[, -1], scale = TRUE, scannf = FALSE, nf = 30)
##
## Total inertia: 30
##
## Eigenvalues:
## Ax1 Ax2 Ax3 Ax4 Ax5
## 13.282 5.691 2.818 1.981 1.649
##
## Projected inertia (%):
## Ax1 Ax2 Ax3 Ax4 Ax5
## 44.272 18.971 9.393 6.602 5.496
##
## Cumulative projected inertia (%):
## Ax1 Ax1:2 Ax1:3 Ax1:4 Ax1:5
## 44.27 63.24 72.64 79.24 84.73
##
## (Only 5 dimensions (out of 30) are shown)
mean_pca_4 <- dudi.pca(wbcd[,c(2:11)], scale = TRUE, scannf = FALSE, nf = 30)
class(mean_pca_4)
## [1] "pca" "dudi"
str(mean_pca_4)
## List of 13
## $ tab :'data.frame': 569 obs. of 10 variables:
## ..$ radius_mean : num [1:569] 1.097 1.83 1.58 -0.769 1.75 ...
## ..$ texture_mean : num [1:569] -2.073 -0.354 0.456 0.254 -1.152 ...
## ..$ perimeter_mean : num [1:569] 1.27 1.686 1.567 -0.593 1.777 ...
## ..$ area_mean : num [1:569] 0.984 1.909 1.559 -0.764 1.826 ...
## ..$ smoothness_mean : num [1:569] 1.568 -0.827 0.942 3.284 0.28 ...
## ..$ compactness_mean : num [1:569] 3.284 -0.487 1.053 3.403 0.539 ...
## ..$ concavity_mean : num [1:569] 2.6529 -0.0238 1.3635 1.9159 1.371 ...
## ..$ concave points_mean : num [1:569] 2.532 0.548 2.037 1.452 1.428 ...
## ..$ symmetry_mean : num [1:569] 2.21752 0.00139 0.93968 2.86738 -0.00956 ...
## ..$ fractal_dimension_mean: num [1:569] 2.256 -0.869 -0.398 4.911 -0.562 ...
## $ cw : num [1:10] 1 1 1 1 1 1 1 1 1 1
## $ lw : num [1:569] 0.00176 0.00176 0.00176 0.00176 0.00176 ...
## $ eig : num [1:10] 5.479 2.519 0.881 0.499 0.373 ...
## $ rank: int 10
## $ nf : int 10
## $ c1 :'data.frame': 10 obs. of 10 variables:
## ..$ CS1 : num [1:10] -0.364 -0.154 -0.376 -0.364 -0.232 ...
## ..$ CS2 : num [1:10] -0.314 -0.147 -0.285 -0.305 0.402 ...
## ..$ CS3 : num [1:10] -0.124 0.951 -0.114 -0.123 -0.167 ...
## ..$ CS4 : num [1:10] 0.02956 0.00892 0.01346 0.01344 -0.1078 ...
## ..$ CS5 : num [1:10] -0.03107 -0.21992 -0.00595 -0.01934 -0.84375 ...
## ..$ CS6 : num [1:10] -0.2642 -0.0322 -0.2378 -0.3317 0.0622 ...
## ..$ CS7 : num [1:10] 0.0442 -0.0206 0.0834 -0.2612 -0.0113 ...
## ..$ CS8 : num [1:10] -0.08483 0.00713 -0.08926 -0.14461 -0.1705 ...
## ..$ CS9 : num [1:10] -0.47443 -0.00421 -0.38017 0.74735 -0.00585 ...
## ..$ CS10: num [1:10] 0.66907 -0.00025 -0.74049 0.03236 -0.00369 ...
## $ li :'data.frame': 569 obs. of 10 variables:
## ..$ Axis1 : num [1:569] -5.22 -1.73 -3.97 -3.6 -3.15 ...
## ..$ Axis2 : num [1:569] 3.2 -2.54 -0.55 6.91 -1.36 ...
## ..$ Axis3 : num [1:569] -2.171 -1.02 -0.324 0.793 -1.862 ...
## ..$ Axis4 : num [1:569] -0.169 0.548 0.398 -0.605 -0.185 ...
## ..$ Axis5 : num [1:569] 1.514 0.312 -0.323 0.243 0.311 ...
## ..$ Axis6 : num [1:569] 0.1131 -0.9356 0.2715 -0.617 0.0908 ...
## ..$ Axis7 : num [1:569] 0.3447 -0.4209 -0.0765 0.0681 -0.3081 ...
## ..$ Axis8 : num [1:569] 0.23193 0.00834 0.35505 0.10016 -0.09906 ...
## ..$ Axis9 : num [1:569] -0.022 -0.0562 0.0201 -0.0435 -0.0266 ...
## ..$ Axis10: num [1:569] 0.0113 0.023 0.0227 0.0535 -0.0341 ...
## $ co :'data.frame': 10 obs. of 10 variables:
## ..$ Comp1 : num [1:10] -0.852 -0.362 -0.88 -0.852 -0.544 ...
## ..$ Comp2 : num [1:10] -0.498 -0.234 -0.452 -0.484 0.638 ...
## ..$ Comp3 : num [1:10] -0.117 0.892 -0.107 -0.116 -0.156 ...
## ..$ Comp4 : num [1:10] 0.02088 0.0063 0.00951 0.0095 -0.07615 ...
## ..$ Comp5 : num [1:10] -0.01896 -0.13423 -0.00363 -0.01181 -0.51499 ...
## ..$ Comp6 : num [1:10] -0.0931 -0.0113 -0.0838 -0.1169 0.0219 ...
## ..$ Comp7 : num [1:10] 0.01251 -0.00582 0.02359 -0.07391 -0.0032 ...
## ..$ Comp8 : num [1:10] -0.01585 0.00133 -0.01667 -0.02701 -0.03185 ...
## ..$ Comp9 : num [1:10] -0.050064 -0.000445 -0.040117 0.078864 -0.000617 ...
## ..$ Comp10: num [1:10] 1.12e-02 -4.20e-06 -1.24e-02 5.44e-04 -6.20e-05 ...
## $ l1 :'data.frame': 569 obs. of 10 variables:
## ..$ RS1 : num [1:569] -2.232 -0.738 -1.696 -1.537 -1.346 ...
## ..$ RS2 : num [1:569] 2.019 -1.601 -0.347 4.351 -0.856 ...
## ..$ RS3 : num [1:569] -2.314 -1.087 -0.345 0.845 -1.984 ...
## ..$ RS4 : num [1:569] -0.24 0.775 0.563 -0.856 -0.262 ...
## ..$ RS5 : num [1:569] 2.481 0.512 -0.529 0.398 0.51 ...
## ..$ RS6 : num [1:569] 0.321 -2.656 0.771 -1.751 0.258 ...
## ..$ RS7 : num [1:569] 1.22 -1.49 -0.27 0.24 -1.09 ...
## ..$ RS8 : num [1:569] 1.2417 0.0447 1.9008 0.5362 -0.5303 ...
## ..$ RS9 : num [1:569] -0.208 -0.532 0.191 -0.412 -0.252 ...
## ..$ RS10: num [1:569] 0.67 1.37 1.35 3.18 -2.03 ...
## $ call: language dudi.pca(df = wbcd[, c(2:11)], scale = TRUE, scannf = FALSE, nf = 30)
## $ cent: Named num [1:10] 14.1273 19.2896 91.969 654.8891 0.0964 ...
## ..- attr(*, "names")= chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## $ norm: Named num [1:10] 3.521 4.2973 24.2776 351.6048 0.0141 ...
## ..- attr(*, "names")= chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## - attr(*, "class")= chr [1:2] "pca" "dudi"
summary(mean_pca_4)
## Class: pca dudi
## Call: dudi.pca(df = wbcd[, c(2:11)], scale = TRUE, scannf = FALSE,
## nf = 30)
##
## Total inertia: 10
##
## Eigenvalues:
## Ax1 Ax2 Ax3 Ax4 Ax5
## 5.4786 2.5187 0.8806 0.4990 0.3725
##
## Projected inertia (%):
## Ax1 Ax2 Ax3 Ax4 Ax5
## 54.786 25.187 8.806 4.990 3.725
##
## Cumulative projected inertia (%):
## Ax1 Ax1:2 Ax1:3 Ax1:4 Ax1:5
## 54.79 79.97 88.78 93.77 97.49
##
## (Only 5 dimensions (out of 10) are shown)
se_pca_4 <- dudi.pca(wbcd[,c(12:21)], scale = TRUE, scannf = FALSE, nf = 30)
class(se_pca_4)
## [1] "pca" "dudi"
str(se_pca_4)
## List of 13
## $ tab :'data.frame': 569 obs. of 10 variables:
## ..$ radius_se : num [1:569] 2.49 0.499 1.229 0.326 1.271 ...
## ..$ texture_se : num [1:569] -0.565 -0.876 -0.78 -0.11 -0.79 ...
## ..$ perimeter_se : num [1:569] 2.833 0.263 0.851 0.287 1.273 ...
## ..$ area_se : num [1:569] 2.488 0.742 1.181 -0.288 1.19 ...
## ..$ smoothness_se : num [1:569] -0.214 -0.605 -0.297 0.69 1.483 ...
## ..$ compactness_se : num [1:569] 1.3169 -0.6929 0.815 2.7443 -0.0485 ...
## ..$ concavity_se : num [1:569] 0.724 -0.441 0.213 0.82 0.828 ...
## ..$ concave points_se : num [1:569] 0.661 0.26 1.425 1.115 1.144 ...
## ..$ symmetry_se : num [1:569] 1.149 -0.805 0.237 4.733 -0.361 ...
## ..$ fractal_dimension_se: num [1:569] 0.9071 -0.0994 0.2936 2.0475 0.4993 ...
## $ cw : num [1:10] 1 1 1 1 1 1 1 1 1 1
## $ lw : num [1:569] 0.00176 0.00176 0.00176 0.00176 0.00176 ...
## $ eig : num [1:10] 4.743 2.075 1.264 0.594 0.577 ...
## $ rank: int 10
## $ nf : int 10
## $ c1 :'data.frame': 10 obs. of 10 variables:
## ..$ CS1 : num [1:10] -0.346 -0.189 -0.357 -0.304 -0.212 ...
## ..$ CS2 : num [1:10] -0.44 0.153 -0.42 -0.5 0.271 ...
## ..$ CS3 : num [1:10] -0.0808 -0.5915 -0.0588 -0.0248 -0.4275 ...
## ..$ CS4 : num [1:10] -0.0486 0.263 0.01 -0.0728 -0.7962 ...
## ..$ CS5 : num [1:10] -0.0162 0.7188 -0.0174 -0.0249 -0.1821 ...
## ..$ CS6 : num [1:10] 0.08864 -0.00945 0.03959 0.14303 -0.09 ...
## ..$ CS7 : num [1:10] 0.02138 0.00784 -0.10094 0.17863 0.10052 ...
## ..$ CS8 : num [1:10] -0.1255 0.0486 0.0336 0.0657 0.1111 ...
## ..$ CS9 : num [1:10] 0.3192 -0.0511 0.5182 -0.7596 0.0233 ...
## ..$ CS10: num [1:10] 0.74268 0.00286 -0.64051 -0.13073 -0.02422 ...
## $ li :'data.frame': 569 obs. of 10 variables:
## ..$ Axis1 : num [1:569] -4.053 0.341 -1.961 -3.795 -2.219 ...
## ..$ Axis2 : num [1:569] -2.59 -1.44 -1.17 2.66 -1.03 ...
## ..$ Axis3 : num [1:569] 0.385 0.772 0.966 -0.748 0.403 ...
## ..$ Axis4 : num [1:569] 0.4463 -0.3489 0.0336 2.0225 -1.7073 ...
## ..$ Axis5 : num [1:569] -1.1229 -0.0348 -0.58 -3.0803 -0.4694 ...
## ..$ Axis6 : num [1:569] 0.88039 -0.00367 -0.31629 0.54353 -0.41467 ...
## ..$ Axis7 : num [1:569] -0.0375 -0.2149 -0.6578 -0.85 0.3203 ...
## ..$ Axis8 : num [1:569] 0.126 -0.6072 -0.0941 0.305 -0.5156 ...
## ..$ Axis9 : num [1:569] 0.2351 -0.2678 -0.2899 -0.0379 0.1376 ...
## ..$ Axis10: num [1:569] -0.204 0.055 0.307 0.262 -0.117 ...
## $ co :'data.frame': 10 obs. of 10 variables:
## ..$ Comp1 : num [1:10] -0.753 -0.411 -0.779 -0.662 -0.463 ...
## ..$ Comp2 : num [1:10] -0.634 0.221 -0.605 -0.721 0.39 ...
## ..$ Comp3 : num [1:10] -0.0908 -0.6652 -0.0661 -0.0279 -0.4807 ...
## ..$ Comp4 : num [1:10] -0.0375 0.20274 0.00773 -0.05613 -0.61379 ...
## ..$ Comp5 : num [1:10] -0.0123 0.5462 -0.0132 -0.0189 -0.1384 ...
## ..$ Comp6 : num [1:10] 0.05136 -0.00548 0.02294 0.08287 -0.05214 ...
## ..$ Comp7 : num [1:10] 0.0093 0.00341 -0.04392 0.07772 0.04374 ...
## ..$ Comp8 : num [1:10] -0.0497 0.0193 0.0133 0.026 0.044 ...
## ..$ Comp9 : num [1:10] 0.06522 -0.01045 0.10591 -0.15524 0.00477 ...
## ..$ Comp10: num [1:10] 0.108689 0.000418 -0.093737 -0.019132 -0.003544 ...
## $ l1 :'data.frame': 569 obs. of 10 variables:
## ..$ RS1 : num [1:569] -1.861 0.156 -0.9 -1.742 -1.019 ...
## ..$ RS2 : num [1:569] -1.796 -1.001 -0.813 1.846 -0.715 ...
## ..$ RS3 : num [1:569] 0.343 0.687 0.859 -0.665 0.358 ...
## ..$ RS4 : num [1:569] 0.5789 -0.4526 0.0436 2.6234 -2.2146 ...
## ..$ RS5 : num [1:569] -1.4777 -0.0459 -0.7632 -4.0535 -0.6177 ...
## ..$ RS6 : num [1:569] 1.5195 -0.00633 -0.5459 0.9381 -0.71569 ...
## ..$ RS7 : num [1:569] -0.0861 -0.4939 -1.5117 -1.9535 0.7361 ...
## ..$ RS8 : num [1:569] 0.318 -1.533 -0.238 0.77 -1.301 ...
## ..$ RS9 : num [1:569] 1.15 -1.31 -1.418 -0.186 0.673 ...
## ..$ RS10: num [1:569] -1.397 0.376 2.098 1.79 -0.796 ...
## $ call: language dudi.pca(df = wbcd[, c(12:21)], scale = TRUE, scannf = FALSE, nf = 30)
## $ cent: Named num [1:10] 0.40517 1.21685 2.86606 40.33708 0.00704 ...
## ..- attr(*, "names")= chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
## $ norm: Named num [1:10] 0.277 0.551 2.02 45.451 0.003 ...
## ..- attr(*, "names")= chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
## - attr(*, "class")= chr [1:2] "pca" "dudi"
summary(se_pca_4)
## Class: pca dudi
## Call: dudi.pca(df = wbcd[, c(12:21)], scale = TRUE, scannf = FALSE,
## nf = 30)
##
## Total inertia: 10
##
## Eigenvalues:
## Ax1 Ax2 Ax3 Ax4 Ax5
## 4.7434 2.0752 1.2644 0.5944 0.5775
##
## Projected inertia (%):
## Ax1 Ax2 Ax3 Ax4 Ax5
## 47.434 20.752 12.644 5.944 5.775
##
## Cumulative projected inertia (%):
## Ax1 Ax1:2 Ax1:3 Ax1:4 Ax1:5
## 47.43 68.19 80.83 86.77 92.55
##
## (Only 5 dimensions (out of 10) are shown)
worst_pca_4 <- dudi.pca(wbcd[,c(22:31)], scale = TRUE, scannf = FALSE, nf = 30)
class(worst_pca_4)
## [1] "pca" "dudi"
str(worst_pca_4)
## List of 13
## $ tab :'data.frame': 569 obs. of 10 variables:
## ..$ radius_worst : num [1:569] 1.887 1.806 1.512 -0.281 1.299 ...
## ..$ texture_worst : num [1:569] -1.359 -0.369 -0.024 0.134 -1.467 ...
## ..$ perimeter_worst : num [1:569] 2.3 1.54 1.35 -0.25 1.34 ...
## ..$ area_worst : num [1:569] 2 1.89 1.46 -0.55 1.22 ...
## ..$ smoothness_worst : num [1:569] 1.308 -0.376 0.527 3.394 0.221 ...
## ..$ compactness_worst : num [1:569] 2.617 -0.43 1.083 3.893 -0.313 ...
## ..$ concavity_worst : num [1:569] 2.11 -0.147 0.855 1.99 0.613 ...
## ..$ concave points_worst : num [1:569] 2.296 1.087 1.955 2.176 0.729 ...
## ..$ symmetry_worst : num [1:569] 2.751 -0.244 1.152 6.046 -0.868 ...
## ..$ fractal_dimension_worst: num [1:569] 1.937 0.281 0.201 4.935 -0.397 ...
## $ cw : num [1:10] 1 1 1 1 1 1 1 1 1 1
## $ lw : num [1:569] 0.00176 0.00176 0.00176 0.00176 0.00176 ...
## $ eig : num [1:10] 5.697 2.086 0.803 0.541 0.515 ...
## $ rank: int 10
## $ nf : int 10
## $ c1 :'data.frame': 10 obs. of 10 variables:
## ..$ CS1 : num [1:10] -0.336 -0.201 -0.348 -0.325 -0.249 ...
## ..$ CS2 : num [1:10] -0.4031 -0.0426 -0.3755 -0.4153 0.3379 ...
## ..$ CS3 : num [1:10] -0.0761 0.9768 -0.0838 -0.079 -0.0514 ...
## ..$ CS4 : num [1:10] 0.07096 -0.00233 0.03361 0.0661 0.31184 ...
## ..$ CS5 : num [1:10] -0.026914 -0.029027 0.000677 -0.069245 -0.826364 ...
## ..$ CS6 : num [1:10] 0.1738 -0.0151 0.1317 0.2944 -0.0711 ...
## ..$ CS7 : num [1:10] 0.0258 -0.0265 -0.0265 0.2488 0.0908 ...
## ..$ CS8 : num [1:10] -0.015 0.0431 -0.0922 -0.0317 -0.1624 ...
## ..$ CS9 : num [1:10] 0.42612 -0.00619 0.45915 -0.74526 0.03946 ...
## ..$ CS10: num [1:10] 0.70741 -0.006 -0.7016 -0.04175 -0.00681 ...
## $ li :'data.frame': 569 obs. of 10 variables:
## ..$ Axis1 : num [1:569] -5.97 -1.82 -3.41 -6.3 -1.15 ...
## ..$ Axis2 : num [1:569] 0.672 -2.315 -0.78 6.966 -1.878 ...
## ..$ Axis3 : num [1:569] -2.545 -0.881 -0.775 -0.817 -1.839 ...
## ..$ Axis4 : num [1:569] 0.70743 0.00988 0.5648 2.15604 -0.39309 ...
## ..$ Axis5 : num [1:569] 0.918 -0.139 0.247 1.314 -0.694 ...
## ..$ Axis6 : num [1:569] 0.491 0.927 -0.215 1.024 -0.131 ...
## ..$ Axis7 : num [1:569] -0.039 -0.0995 -0.5089 -0.5796 0.3704 ...
## ..$ Axis8 : num [1:569] -0.241 0.809 0.174 0.169 0.167 ...
## ..$ Axis9 : num [1:569] -0.00564 -0.05416 -0.19504 -0.04306 0.20087 ...
## ..$ Axis10: num [1:569] -0.2178 0.0948 0.1363 0.1227 -0.0626 ...
## $ co :'data.frame': 10 obs. of 10 variables:
## ..$ Comp1 : num [1:10] -0.802 -0.479 -0.831 -0.775 -0.593 ...
## ..$ Comp2 : num [1:10] -0.5822 -0.0615 -0.5424 -0.5998 0.488 ...
## ..$ Comp3 : num [1:10] -0.0682 0.8752 -0.0751 -0.0708 -0.0461 ...
## ..$ Comp4 : num [1:10] 0.05218 -0.00172 0.02471 0.0486 0.2293 ...
## ..$ Comp5 : num [1:10] -0.019308 -0.020824 0.000486 -0.049677 -0.59284 ...
## ..$ Comp6 : num [1:10] 0.07448 -0.00647 0.05646 0.12618 -0.0305 ...
## ..$ Comp7 : num [1:10] 0.00747 -0.00766 -0.00769 0.07204 0.02629 ...
## ..$ Comp8 : num [1:10] -0.00401 0.01156 -0.02472 -0.0085 -0.04351 ...
## ..$ Comp9 : num [1:10] 0.052595 -0.000764 0.056672 -0.091986 0.00487 ...
## ..$ Comp10: num [1:10] 0.044754 -0.00038 -0.044387 -0.002642 -0.000431 ...
## $ l1 :'data.frame': 569 obs. of 10 variables:
## ..$ RS1 : num [1:569] -2.503 -0.762 -1.428 -2.641 -0.48 ...
## ..$ RS2 : num [1:569] 0.465 -1.603 -0.54 4.823 -1.3 ...
## ..$ RS3 : num [1:569] -2.841 -0.983 -0.865 -0.912 -2.053 ...
## ..$ RS4 : num [1:569] 0.9621 0.0134 0.7681 2.9321 -0.5346 ...
## ..$ RS5 : num [1:569] 1.279 -0.194 0.345 1.831 -0.968 ...
## ..$ RS6 : num [1:569] 1.146 2.162 -0.502 2.39 -0.306 ...
## ..$ RS7 : num [1:569] -0.135 -0.344 -1.757 -2.002 1.279 ...
## ..$ RS8 : num [1:569] -0.9 3.02 0.649 0.63 0.622 ...
## ..$ RS9 : num [1:569] -0.0457 -0.4388 -1.5802 -0.3488 1.6274 ...
## ..$ RS10: num [1:569] -3.44 1.5 2.15 1.94 -0.99 ...
## $ call: language dudi.pca(df = wbcd[, c(22:31)], scale = TRUE, scannf = FALSE, nf = 30)
## $ cent: Named num [1:10] 16.269 25.677 107.261 880.583 0.132 ...
## ..- attr(*, "names")= chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
## $ norm: Named num [1:10] 4.829 6.1409 33.573 568.8565 0.0228 ...
## ..- attr(*, "names")= chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
## - attr(*, "class")= chr [1:2] "pca" "dudi"
summary(worst_pca_4)
## Class: pca dudi
## Call: dudi.pca(df = wbcd[, c(22:31)], scale = TRUE, scannf = FALSE,
## nf = 30)
##
## Total inertia: 10
##
## Eigenvalues:
## Ax1 Ax2 Ax3 Ax4 Ax5
## 5.6972 2.0860 0.8028 0.5407 0.5147
##
## Projected inertia (%):
## Ax1 Ax2 Ax3 Ax4 Ax5
## 56.972 20.860 8.028 5.407 5.147
##
## Cumulative projected inertia (%):
## Ax1 Ax1:2 Ax1:3 Ax1:4 Ax1:5
## 56.97 77.83 85.86 91.27 96.41
##
## (Only 5 dimensions (out of 10) are shown)
The class() function shows that the object created by dudi.pca() has the classes “pca” (not “PCA”, as with PCA()’s objects) and “dudi”. Note also that the cumulative proportions obtained with dudi.pca(), visible when summary() is applied to an object of this kind, are identical to those obtained with the PCA functions covered so far, as they should be.
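The agreement between functions can be checked numerically. What follows is a minimal sketch rather than part of the original analysis: it assumes the built-in USArrests dataset as a stand-in for wbcd, uses the base R functions prcomp() and princomp() as two example PCA implementations, and relies on a hypothetical helper named cum_pct().

```r
# Stand-in data: USArrests ships with base R (assumption: wbcd is not loaded here)
X <- USArrests

# Hypothetical helper: cumulative percentage of variance from component variances
cum_pct <- function(variances) cumsum(variances) / sum(variances) * 100

# Two PCA implementations, both working on standardized variables
pca_a <- prcomp(X, scale. = TRUE)   # based on the SVD of the scaled data
pca_b <- princomp(X, cor = TRUE)    # based on the correlation matrix

cum_a <- cum_pct(pca_a$sdev^2)
cum_b <- cum_pct(unname(pca_b$sdev)^2)

# The cumulative proportions agree up to floating point error
all.equal(cum_a, cum_b)
```

The component variances are normalized by their sum, so any constant factor that one implementation applies to all of them cancels out of the proportions.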
Once again, even though the components of the PCA object obtained through dudi.pca() can be read in str()’s output, the print() function does a better job of showing the object’s components and how the resulting data is stored within them.
print(all_pca_4)
## Duality diagramm
## class: pca dudi
## $call: dudi.pca(df = wbcd[, -1], scale = TRUE, scannf = FALSE, nf = 30)
##
## $nf: 30 axis-components saved
## $rank: 30
## eigen values: 13.28 5.691 2.818 1.981 1.649 ...
## vector length mode content
## 1 $cw 30 numeric column weights
## 2 $lw 569 numeric row weights
## 3 $eig 30 numeric eigen values
##
## data.frame nrow ncol content
## 1 $tab 569 30 modified array
## 2 $li 569 30 row coordinates
## 3 $l1 569 30 row normed scores
## 4 $co 30 30 column coordinates
## 5 $c1 30 30 column normed scores
## other elements: cent norm
Its layout differs from that of PCA()’s objects, but it’s a cleaner way to observe the components than through str() and definitely an improvement over R’s built-in functions. These components are the following:
cent: a vector of length p (the number of variables) containing the means of the variables.
norm: a vector of length p containing the standard deviations of the variables, i.e. the square root of the sum of squared deviations of the values from their means divided by n.
Having the eigenvalues stored as a component is an improvement with respect to R built-in functions, but PCA()’s arguments are more convenient under most circumstances.
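The “divided by n” detail in the description of norm matters because R’s sd() divides by n - 1 instead. A minimal sketch of the relationship, assuming a small made-up vector x that is not taken from wbcd:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up values for illustration only
n <- length(x)

sd_sample     <- sd(x)                           # divides by n - 1
sd_population <- sqrt(sum((x - mean(x))^2) / n)  # divides by n, as described for norm

# The two differ only by the factor sqrt((n - 1) / n)
all.equal(sd_population, sd_sample * sqrt((n - 1) / n))
```

With n = 569 this factor is sqrt(568 / 569), roughly 0.9991, which is consistent with radius_worst appearing as 4.829 in dudi.pca()’s norm and as 4.8332 in the scale component that epPCA() reports.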
The epPCA() function from the ExPosition package is formatted as follows:
# Do not run this code snippet, as it is only here for illustration purposes
library(ExPosition)
epPCA(DATA,
scale = TRUE,
center = TRUE,
DESIGN = NULL,
make_design_nominal = TRUE,
graphs = TRUE,
k = 0)
scale: TRUE or FALSE determines whether or not to scale/standardize the data.
TRUE or FALSE determines whether to perform an optional row weighting (by default, uniform row weights).
make_design_nominal: if TRUE (default), DESIGN is a vector that indicates groups (and will be dummy-coded); if FALSE, DESIGN is a dummy-coded matrix.
graphs: TRUE or FALSE determines whether or not to display the PCA’s associated graphs.
More information regarding the epPCA() function and its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/ExPosition/versions/2.8.23/topics/epPCA
Let’s now evaluate the results when applying epPCA() to the Wisconsin Breast Cancer Dataset:
all_pca_5 <- epPCA(wbcd[,-1], scale = TRUE, graphs = FALSE, k = 30)
class(all_pca_5)
## [1] "expoOutput" "list"
str(all_pca_5)
## List of 2
## $ ExPosition.Data:List of 16
## ..$ fi : num [1:569, 1:30] -9.18 -2.39 -5.73 -7.12 -3.93 ...
## ..$ di : num [1:569, 1] 114.5 26.3 37.4 195.3 34.4 ...
## ..$ ci : num [1:569, 1:30] 0.011182 0.000754 0.00435 0.006714 0.002049 ...
## ..$ ri : num [1:569, 1:30] 0.737 0.216 0.878 0.259 0.45 ...
## ..$ fj : num [1:30, 1:30] -19.01 -9.01 -19.76 -19.19 -12.38 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## .. .. ..$ : NULL
## ..$ cj : num [1:30, 1:30] 0.0479 0.0108 0.0518 0.0488 0.0203 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## .. .. ..$ : NULL
## ..$ rj : num [1:30, 1:30] 0.636 0.143 0.688 0.649 0.27 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## .. .. ..$ : NULL
## ..$ dj : num [1:30, 1] 568 568 568 568 568 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## .. .. ..$ : NULL
## ..$ t : num [1:30] 44.27 18.97 9.39 6.6 5.5 ...
## ..$ eigs : num [1:30] 7544 3233 1601 1125 936 ...
## ..$ pdq :List of 8
## .. ..$ p : num [1:569, 1:30] -0.1057 -0.0275 -0.066 -0.0819 -0.0453 ...
## .. ..$ q : num [1:30, 1:30] -0.219 -0.104 -0.228 -0.221 -0.143 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## .. .. .. ..$ : NULL
## .. ..$ Dv : num [1:30] 86.9 56.9 40 33.5 30.6 ...
## .. ..$ Dd : num [1:30, 1:30] 86.9 0 0 0 0 ...
## .. ..$ ng : int 30
## .. ..$ rank: int 30
## .. ..$ tau : num [1:30] 44.27 18.97 9.39 6.6 5.5 ...
## .. ..$ eigs: num [1:30] 7544 3233 1601 1125 936 ...
## .. ..- attr(*, "class")= chr [1:2] "epSVD" "list"
## ..$ X : num [1:569, 1:30] 1.096 1.828 1.578 -0.768 1.749 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : NULL
## .. .. ..$ : chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## .. ..- attr(*, "scaled:center")= Named num [1:30] 14.1273 19.2896 91.969 654.8891 0.0964 ...
## .. .. ..- attr(*, "names")= chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## .. ..- attr(*, "scaled:scale")= Named num [1:30] 3.524 4.301 24.299 351.9141 0.0141 ...
## .. .. ..- attr(*, "names")= chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## ..$ M : num [1:569] 1 1 1 1 1 1 1 1 1 1 ...
## ..$ W : num [1:30] 1 1 1 1 1 1 1 1 1 1 ...
## ..$ center: Named num [1:30] 14.1273 19.2896 91.969 654.8891 0.0964 ...
## .. ..- attr(*, "names")= chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## ..$ scale : Named num [1:30] 3.524 4.301 24.299 351.9141 0.0141 ...
## .. ..- attr(*, "names")= chr [1:30] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## ..- attr(*, "class")= chr [1:2] "epPCA" "list"
## $ Plotting.Data :List of 5
## ..$ fi.col : chr [1:569, 1] "#305ABF" "#305ABF" "#305ABF" "#305ABF" ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
## .. .. ..$ : NULL
## ..$ fi.pch : num [1:569, 1] 21 21 21 21 21 21 21 21 21 21 ...
## ..$ fj.col : chr [1:30, 1] "mediumorchid4" "mediumorchid4" "mediumorchid4" "mediumorchid4" ...
## ..$ fj.pch : num [1:30, 1] 21 21 21 21 21 21 21 21 21 21 ...
## ..$ constraints:List of 4
## .. ..$ minx: num -26.1
## .. ..$ miny: num -24
## .. ..$ maxx: num 6.39
## .. ..$ maxy: num 15.3
## ..- attr(*, "class")= chr [1:2] "epGraphs" "list"
## - attr(*, "class")= chr [1:2] "expoOutput" "list"
summary(all_pca_5)
## Length Class Mode
## ExPosition.Data 16 epPCA list
## Plotting.Data 5 epGraphs list
mean_pca_5 <- epPCA(wbcd[,c(2:11)], scale = TRUE, graphs = FALSE, k = 30)
class(mean_pca_5)
## [1] "expoOutput" "list"
str(mean_pca_5)
## List of 2
## $ ExPosition.Data:List of 16
## ..$ fi : num [1:569, 1:10] -5.22 -1.73 -3.97 -3.59 -3.15 ...
## ..$ di : num [1:569, 1] 44.7 11.9 16.6 62 15.5 ...
## ..$ ci : num [1:569, 1:10] 0.008755 0.000958 0.005055 0.00415 0.003185 ...
## ..$ ri : num [1:569, 1:10] 0.609 0.25 0.947 0.208 0.641 ...
## ..$ fj : num [1:10, 1:10] -20.3 -8.62 -20.98 -20.31 -12.97 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## .. .. ..$ : NULL
## ..$ cj : num [1:10, 1:10] 0.1325 0.0239 0.1414 0.1326 0.054 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## .. .. ..$ : NULL
## ..$ rj : num [1:10, 1:10] 0.726 0.131 0.775 0.726 0.296 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## .. .. ..$ : NULL
## ..$ dj : num [1:10, 1] 568 568 568 568 568 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## .. .. ..$ : NULL
## ..$ t : num [1:10] 54.79 25.19 8.81 4.99 3.73 ...
## ..$ eigs : num [1:10] 3112 1431 500 283 212 ...
## ..$ pdq :List of 8
## .. ..$ p : num [1:569, 1:10] -0.0936 -0.031 -0.0711 -0.0644 -0.0564 ...
## .. ..$ q : num [1:10, 1:10] -0.364 -0.154 -0.376 -0.364 -0.232 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## .. .. .. ..$ : NULL
## .. ..$ Dv : num [1:10] 55.8 37.8 22.4 16.8 14.5 ...
## .. ..$ Dd : num [1:10, 1:10] 55.8 0 0 0 0 ...
## .. ..$ ng : int 10
## .. ..$ rank: int 10
## .. ..$ tau : num [1:10] 54.79 25.19 8.81 4.99 3.73 ...
## .. ..$ eigs: num [1:10] 3112 1431 500 283 212 ...
## .. ..- attr(*, "class")= chr [1:2] "epSVD" "list"
## ..$ X : num [1:569, 1:10] 1.096 1.828 1.578 -0.768 1.749 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : NULL
## .. .. ..$ : chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## .. ..- attr(*, "scaled:center")= Named num [1:10] 14.1273 19.2896 91.969 654.8891 0.0964 ...
## .. .. ..- attr(*, "names")= chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## .. ..- attr(*, "scaled:scale")= Named num [1:10] 3.524 4.301 24.299 351.9141 0.0141 ...
## .. .. ..- attr(*, "names")= chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## ..$ M : num [1:569] 1 1 1 1 1 1 1 1 1 1 ...
## ..$ W : num [1:10] 1 1 1 1 1 1 1 1 1 1
## ..$ center: Named num [1:10] 14.1273 19.2896 91.969 654.8891 0.0964 ...
## .. ..- attr(*, "names")= chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## ..$ scale : Named num [1:10] 3.524 4.301 24.299 351.9141 0.0141 ...
## .. ..- attr(*, "names")= chr [1:10] "radius_mean" "texture_mean" "perimeter_mean" "area_mean" ...
## ..- attr(*, "class")= chr [1:2] "epPCA" "list"
## $ Plotting.Data :List of 5
## ..$ fi.col : chr [1:569, 1] "#305ABF" "#305ABF" "#305ABF" "#305ABF" ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
## .. .. ..$ : NULL
## ..$ fi.pch : num [1:569, 1] 21 21 21 21 21 21 21 21 21 21 ...
## ..$ fj.col : chr [1:10, 1] "mediumorchid4" "mediumorchid4" "mediumorchid4" "mediumorchid4" ...
## ..$ fj.pch : num [1:10, 1] 21 21 21 21 21 21 21 21 21 21
## ..$ constraints:List of 4
## .. ..$ minx: num -26.8
## .. ..$ miny: num -24.9
## .. ..$ maxx: num 4.56
## .. ..$ maxy: num 13.7
## ..- attr(*, "class")= chr [1:2] "epGraphs" "list"
## - attr(*, "class")= chr [1:2] "expoOutput" "list"
summary(mean_pca_5)
## Length Class Mode
## ExPosition.Data 16 epPCA list
## Plotting.Data 5 epGraphs list
se_pca_5 <- epPCA(wbcd[,c(12:21)], scale = TRUE, graphs = FALSE, k = 30)
class(se_pca_5)
## [1] "expoOutput" "list"
str(se_pca_5)
## List of 2
## $ ExPosition.Data:List of 16
## ..$ fi : num [1:569, 1:10] -4.05 0.34 -1.96 -3.79 -2.22 ...
## ..$ di : num [1:569, 1] 25.57 3.4 7.2 36.73 9.84 ...
## ..$ ci : num [1:569, 1:10] 0.006086 0.000043 0.001425 0.005336 0.001824 ...
## ..$ ri : num [1:569, 1:10] 0.6413 0.0341 0.5336 0.3914 0.4996 ...
## ..$ fj : num [1:10, 1:10] -17.94 -9.79 -18.56 -15.78 -11.03 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
## .. .. ..$ : NULL
## ..$ cj : num [1:10, 1:10] 0.1194 0.0356 0.1278 0.0924 0.0451 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
## .. .. ..$ : NULL
## ..$ rj : num [1:10, 1:10] 0.567 0.169 0.606 0.438 0.214 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
## .. .. ..$ : NULL
## ..$ dj : num [1:10, 1] 568 568 568 568 568 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
## .. .. ..$ : NULL
## ..$ t : num [1:10] 47.43 20.75 12.64 5.94 5.77 ...
## ..$ eigs : num [1:10] 2694 1179 718 338 328 ...
## ..$ pdq :List of 8
## .. ..$ p : num [1:569, 1:10] -0.07801 0.00656 -0.03775 -0.07305 -0.04271 ...
## .. ..$ q : num [1:10, 1:10] -0.346 -0.189 -0.357 -0.304 -0.212 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
## .. .. .. ..$ : NULL
## .. ..$ Dv : num [1:10] 51.9 34.3 26.8 18.4 18.1 ...
## .. ..$ Dd : num [1:10, 1:10] 51.9 0 0 0 0 ...
## .. ..$ ng : int 10
## .. ..$ rank: int 10
## .. ..$ tau : num [1:10] 47.43 20.75 12.64 5.94 5.77 ...
## .. ..$ eigs: num [1:10] 2694 1179 718 338 328 ...
## .. ..- attr(*, "class")= chr [1:2] "epSVD" "list"
## ..$ X : num [1:569, 1:10] 2.488 0.499 1.228 0.326 1.269 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : NULL
## .. .. ..$ : chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
## .. ..- attr(*, "scaled:center")= Named num [1:10] 0.40517 1.21685 2.86606 40.33708 0.00704 ...
## .. .. ..- attr(*, "names")= chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
## .. ..- attr(*, "scaled:scale")= Named num [1:10] 0.277 0.552 2.022 45.491 0.003 ...
## .. .. ..- attr(*, "names")= chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
## ..$ M : num [1:569] 1 1 1 1 1 1 1 1 1 1 ...
## ..$ W : num [1:10] 1 1 1 1 1 1 1 1 1 1
## ..$ center: Named num [1:10] 0.40517 1.21685 2.86606 40.33708 0.00704 ...
## .. ..- attr(*, "names")= chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
## ..$ scale : Named num [1:10] 0.277 0.552 2.022 45.491 0.003 ...
## .. ..- attr(*, "names")= chr [1:10] "radius_se" "texture_se" "perimeter_se" "area_se" ...
## ..- attr(*, "class")= chr [1:2] "epPCA" "list"
## $ Plotting.Data :List of 5
## ..$ fi.col : chr [1:569, 1] "#305ABF" "#305ABF" "#305ABF" "#305ABF" ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
## .. .. ..$ : NULL
## ..$ fi.pch : num [1:569, 1] 21 21 21 21 21 21 21 21 21 21 ...
## ..$ fj.col : chr [1:10, 1] "mediumorchid4" "mediumorchid4" "mediumorchid4" "mediumorchid4" ...
## ..$ fj.pch : num [1:10, 1] 21 21 21 21 21 21 21 21 21 21
## ..$ constraints:List of 4
## .. ..$ minx: num -23
## .. ..$ miny: num -13.9
## .. ..$ maxx: num 3.51
## .. ..$ maxy: num 19.7
## ..- attr(*, "class")= chr [1:2] "epGraphs" "list"
## - attr(*, "class")= chr [1:2] "expoOutput" "list"
summary(se_pca_5)
## Length Class Mode
## ExPosition.Data 16 epPCA list
## Plotting.Data 5 epGraphs list
worst_pca_5 <- epPCA(wbcd[,c(22:31)], scale = TRUE, graphs = FALSE, k = 30)
class(worst_pca_5)
## [1] "expoOutput" "list"
str(worst_pca_5)
## List of 2
## $ ExPosition.Data:List of 16
## ..$ fi : num [1:569, 1:10] -5.97 -1.82 -3.4 -6.3 -1.15 ...
## ..$ di : num [1:569, 1] 44.24 10.98 13.57 96.57 9.07 ...
## ..$ ci : num [1:569, 1:10] 0.011011 0.00102 0.003582 0.012262 0.000406 ...
## ..$ ri : num [1:569, 1:10] 0.805 0.301 0.854 0.411 0.145 ...
## ..$ fj : num [1:10, 1:10] -19.1 -11.4 -19.8 -18.5 -14.1 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
## .. .. ..$ : NULL
## ..$ cj : num [1:10, 1:10] 0.1128 0.0403 0.1212 0.1055 0.0618 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
## .. .. ..$ : NULL
## ..$ rj : num [1:10, 1:10] 0.643 0.23 0.691 0.601 0.352 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
## .. .. ..$ : NULL
## ..$ dj : num [1:10, 1] 568 568 568 568 568 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
## .. .. ..$ : NULL
## ..$ t : num [1:10] 56.97 20.86 8.03 5.41 5.15 ...
## ..$ eigs : num [1:10] 3236 1185 456 307 292 ...
## ..$ pdq :List of 8
## .. ..$ p : num [1:569, 1:10] -0.1049 -0.0319 -0.0599 -0.1107 -0.0201 ...
## .. ..$ q : num [1:10, 1:10] -0.336 -0.201 -0.348 -0.325 -0.249 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
## .. .. .. ..$ : NULL
## .. ..$ Dv : num [1:10] 56.9 34.4 21.4 17.5 17.1 ...
## .. ..$ Dd : num [1:10, 1:10] 56.9 0 0 0 0 ...
## .. ..$ ng : int 10
## .. ..$ rank: int 10
## .. ..$ tau : num [1:10] 56.97 20.86 8.03 5.41 5.15 ...
## .. ..$ eigs: num [1:10] 3236 1185 456 307 292 ...
## .. ..- attr(*, "class")= chr [1:2] "epSVD" "list"
## ..$ X : num [1:569, 1:10] 1.885 1.804 1.511 -0.281 1.297 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : NULL
## .. .. ..$ : chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
## .. ..- attr(*, "scaled:center")= Named num [1:10] 16.269 25.677 107.261 880.583 0.132 ...
## .. .. ..- attr(*, "names")= chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
## .. ..- attr(*, "scaled:scale")= Named num [1:10] 4.8332 6.1463 33.6025 569.357 0.0228 ...
## .. .. ..- attr(*, "names")= chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
## ..$ M : num [1:569] 1 1 1 1 1 1 1 1 1 1 ...
## ..$ W : num [1:10] 1 1 1 1 1 1 1 1 1 1
## ..$ center: Named num [1:10] 16.269 25.677 107.261 880.583 0.132 ...
## .. ..- attr(*, "names")= chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
## ..$ scale : Named num [1:10] 4.8332 6.1463 33.6025 569.357 0.0228 ...
## .. ..- attr(*, "names")= chr [1:10] "radius_worst" "texture_worst" "perimeter_worst" "area_worst" ...
## ..- attr(*, "class")= chr [1:2] "epPCA" "list"
## $ Plotting.Data :List of 5
## ..$ fi.col : chr [1:569, 1] "#305ABF" "#305ABF" "#305ABF" "#305ABF" ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:569] "1" "2" "3" "4" ...
## .. .. ..$ : NULL
## ..$ fi.pch : num [1:569, 1] 21 21 21 21 21 21 21 21 21 21 ...
## ..$ fj.col : chr [1:10, 1] "mediumorchid4" "mediumorchid4" "mediumorchid4" "mediumorchid4" ...
## ..$ fj.pch : num [1:10, 1] 21 21 21 21 21 21 21 21 21 21
## ..$ constraints:List of 4
## .. ..$ minx: num -26
## .. ..$ miny: num -18.9
## .. ..$ maxx: num 5.09
## .. ..$ maxy: num 16.4
## ..- attr(*, "class")= chr [1:2] "epGraphs" "list"
## - attr(*, "class")= chr [1:2] "expoOutput" "list"
summary(worst_pca_5)
## Length Class Mode
## ExPosition.Data 16 epPCA list
## Plotting.Data 5 epGraphs list
The class() function shows that the object created by epPCA() has the classes “expoOutput” and “list”. Note also that the cumulative proportions, easily observed with the previous functions, are not readily available here for evaluation.
The output of both str() and summary() suggests that the core components are nested within two “parent” components named ExPosition.Data and Plotting.Data. Applying the print() function to the object confirms this:
print(all_pca_5)
## **ExPosition output data**
## *Contains the following objects:
##
## name
## 1 "$ExPosition.Data"
## 2 "$Plotting.Data"
## description
## 1 "All ExPosition classes output (data, factor scores, contributions, etc...)"
## 2 "All ExPosition & prettyGraphs plotting data (constraints, colors, etc...)"
The brief descriptions given by print()’s output imply that the components equivalent to those seen in previous functions are stored inside ExPosition.Data, whereas Plotting.Data holds data regarding an optional plot (as their names suggest).
The following code snippet details the components stored inside ExPosition.Data:
print(all_pca_5$ExPosition.Data)
## **Results for Principal Component Analysis**
## The analysis was performed on 569 individuals, described by 30 variables
## *The results are available in the following objects:
##
## name description
## 1 "$fi" "Factor scores of the rows"
## 2 "$di" "Squared distances of the rows"
## 3 "$ci" "Contributions of the rows"
## 4 "$ri" "Cosines of the rows"
## 5 "$fj" "Factor scores of the columns"
## 6 "$dj" "square distances of the columns"
## 7 "$cj" "Contributions for the columns"
## 8 "$rj" "Cosines of the columns"
## 9 "$t" "Explained Variance"
## 10 "$eigs" "Eigenvalues"
## 11 "$pdq" "SVD data"
## 12 "$X" "X matrix to decompose"
## 13 "$M" "Masses - each set to 1"
## 14 "$W" "Weights - each set to 1"
## 15 "$center" "Center of X"
## 16 "$scale" "Scale factor of X"
summary(all_pca_5$ExPosition.Data)
## Length Class Mode
## fi 17070 -none- numeric
## di 569 -none- numeric
## ci 17070 -none- numeric
## ri 17070 -none- numeric
## fj 900 -none- numeric
## cj 900 -none- numeric
## rj 900 -none- numeric
## dj 30 -none- numeric
## t 30 -none- numeric
## eigs 30 -none- numeric
## pdq 8 epSVD list
## X 17070 -none- numeric
## M 569 -none- numeric
## W 30 -none- numeric
## center 30 -none- numeric
## scale 30 -none- numeric
print(mean_pca_5$ExPosition.Data)
## **Results for Principal Component Analysis**
## The analysis was performed on 569 individuals, described by 10 variables
## *The results are available in the following objects:
##
## name description
## 1 "$fi" "Factor scores of the rows"
## 2 "$di" "Squared distances of the rows"
## 3 "$ci" "Contributions of the rows"
## 4 "$ri" "Cosines of the rows"
## 5 "$fj" "Factor scores of the columns"
## 6 "$dj" "square distances of the columns"
## 7 "$cj" "Contributions for the columns"
## 8 "$rj" "Cosines of the columns"
## 9 "$t" "Explained Variance"
## 10 "$eigs" "Eigenvalues"
## 11 "$pdq" "SVD data"
## 12 "$X" "X matrix to decompose"
## 13 "$M" "Masses - each set to 1"
## 14 "$W" "Weights - each set to 1"
## 15 "$center" "Center of X"
## 16 "$scale" "Scale factor of X"
summary(mean_pca_5$ExPosition.Data)
## Length Class Mode
## fi 5690 -none- numeric
## di 569 -none- numeric
## ci 5690 -none- numeric
## ri 5690 -none- numeric
## fj 100 -none- numeric
## cj 100 -none- numeric
## rj 100 -none- numeric
## dj 10 -none- numeric
## t 10 -none- numeric
## eigs 10 -none- numeric
## pdq 8 epSVD list
## X 5690 -none- numeric
## M 569 -none- numeric
## W 10 -none- numeric
## center 10 -none- numeric
## scale 10 -none- numeric
print(se_pca_5$ExPosition.Data)
## **Results for Principal Component Analysis**
## The analysis was performed on 569 individuals, described by 10 variables
## *The results are available in the following objects:
##
## name description
## 1 "$fi" "Factor scores of the rows"
## 2 "$di" "Squared distances of the rows"
## 3 "$ci" "Contributions of the rows"
## 4 "$ri" "Cosines of the rows"
## 5 "$fj" "Factor scores of the columns"
## 6 "$dj" "square distances of the columns"
## 7 "$cj" "Contributions for the columns"
## 8 "$rj" "Cosines of the columns"
## 9 "$t" "Explained Variance"
## 10 "$eigs" "Eigenvalues"
## 11 "$pdq" "SVD data"
## 12 "$X" "X matrix to decompose"
## 13 "$M" "Masses - each set to 1"
## 14 "$W" "Weights - each set to 1"
## 15 "$center" "Center of X"
## 16 "$scale" "Scale factor of X"
summary(se_pca_5$ExPosition.Data)
## Length Class Mode
## fi 5690 -none- numeric
## di 569 -none- numeric
## ci 5690 -none- numeric
## ri 5690 -none- numeric
## fj 100 -none- numeric
## cj 100 -none- numeric
## rj 100 -none- numeric
## dj 10 -none- numeric
## t 10 -none- numeric
## eigs 10 -none- numeric
## pdq 8 epSVD list
## X 5690 -none- numeric
## M 569 -none- numeric
## W 10 -none- numeric
## center 10 -none- numeric
## scale 10 -none- numeric
print(worst_pca_5$ExPosition.Data)
## **Results for Principal Component Analysis**
## The analysis was performed on 569 individuals, described by 10 variables
## *The results are available in the following objects:
##
## name description
## 1 "$fi" "Factor scores of the rows"
## 2 "$di" "Squared distances of the rows"
## 3 "$ci" "Contributions of the rows"
## 4 "$ri" "Cosines of the rows"
## 5 "$fj" "Factor scores of the columns"
## 6 "$dj" "square distances of the columns"
## 7 "$cj" "Contributions for the columns"
## 8 "$rj" "Cosines of the columns"
## 9 "$t" "Explained Variance"
## 10 "$eigs" "Eigenvalues"
## 11 "$pdq" "SVD data"
## 12 "$X" "X matrix to decompose"
## 13 "$M" "Masses - each set to 1"
## 14 "$W" "Weights - each set to 1"
## 15 "$center" "Center of X"
## 16 "$scale" "Scale factor of X"
summary(worst_pca_5$ExPosition.Data)
## Length Class Mode
## fi 5690 -none- numeric
## di 569 -none- numeric
## ci 5690 -none- numeric
## ri 5690 -none- numeric
## fj 100 -none- numeric
## cj 100 -none- numeric
## rj 100 -none- numeric
## dj 10 -none- numeric
## t 10 -none- numeric
## eigs 10 -none- numeric
## pdq 8 epSVD list
## X 5690 -none- numeric
## M 569 -none- numeric
## W 10 -none- numeric
## center 10 -none- numeric
## scale 10 -none- numeric
As was the case with dudi.pca(), having the eigenvalues stored as a component is an improvement over R’s built-in functions, but PCA()’s arguments are more convenient under most circumstances. What’s more, this is the only PCA function with no direct access to cumulative variance.
The aforementioned video essay by Josh Starmer (https://www.youtube.com/watch?v=FgakZw6K1QQ) helps with understanding the eigenvalues, since it explains graphically where they come from. In summary, each principal component has an eigenvector (also known as a singular vector), which is a unit-length vector giving the direction of the component, and an eigenvalue, which measures the amount of variation retained by that component. Eigenvalues are therefore large for the first PCs and small for the subsequent ones, meaning that the first PCs correspond to the directions with the maximum amount of variation in the dataset (which makes sense, since the first PC is the one that best explains the dataset).
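The relationship between singular vectors, singular values and eigenvalues can be verified directly. This is a sketch under stated assumptions: it uses the built-in USArrests dataset instead of wbcd, and base R’s svd() and eigen() rather than any of the PCA functions discussed above.

```r
# Stand-in data (assumption): centered columns with unit variance (n - 1 denominator)
X <- scale(USArrests)
n <- nrow(X)

decomp <- svd(X)

# Each right singular vector (eigenvector) has unit length
vector_lengths <- sqrt(colSums(decomp$v^2))

# Squared singular values are sums of squares along each component;
# dividing by n - 1 turns them into variances, i.e. the usual eigenvalues
eig_from_svd <- decomp$d^2 / (n - 1)
eig_from_cor <- eigen(cor(USArrests))$values

all.equal(eig_from_svd, eig_from_cor)

# Percentage of variance retained by each component
round(eig_from_svd / sum(eig_from_svd) * 100, 2)
```

The same scaling seems to be at play in the outputs of this document: the first value of epPCA()’s eigs (about 7544) is 568 times the first eigenvalue reported by dudi.pca() (13.28), which suggests that epPCA() stores the squared singular values without dividing them by n - 1.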
Some of the previously described functions made the eigenvalues easily available and accessible - doing so requires addressing the proper component within each of said functions:
all_pca_3$eig
## eigenvalue percentage of variance cumulative percentage of variance
## comp 1 1.328161e+01 4.427203e+01 44.27203
## comp 2 5.691355e+00 1.897118e+01 63.24321
## comp 3 2.817949e+00 9.393163e+00 72.63637
## comp 4 1.980640e+00 6.602135e+00 79.23851
## comp 5 1.648731e+00 5.495768e+00 84.73427
## comp 6 1.207357e+00 4.024522e+00 88.75880
## comp 7 6.752201e-01 2.250734e+00 91.00953
## comp 8 4.766171e-01 1.588724e+00 92.59825
## comp 9 4.168948e-01 1.389649e+00 93.98790
## comp 10 3.506935e-01 1.168978e+00 95.15688
## comp 11 2.939157e-01 9.797190e-01 96.13660
## comp 12 2.611614e-01 8.705379e-01 97.00714
## comp 13 2.413575e-01 8.045250e-01 97.81166
## comp 14 1.570097e-01 5.233657e-01 98.33503
## comp 15 9.413497e-02 3.137832e-01 98.64881
## comp 16 7.986280e-02 2.662093e-01 98.91502
## comp 17 5.939904e-02 1.979968e-01 99.11302
## comp 18 5.261878e-02 1.753959e-01 99.28841
## comp 19 4.947759e-02 1.649253e-01 99.45334
## comp 20 3.115940e-02 1.038647e-01 99.55720
## comp 21 2.997289e-02 9.990965e-02 99.65711
## comp 22 2.743940e-02 9.146468e-02 99.74858
## comp 23 2.434084e-02 8.113613e-02 99.82971
## comp 24 1.805501e-02 6.018336e-02 99.88990
## comp 25 1.548127e-02 5.160424e-02 99.94150
## comp 26 8.177640e-03 2.725880e-02 99.96876
## comp 27 6.900464e-03 2.300155e-02 99.99176
## comp 28 1.589338e-03 5.297793e-03 99.99706
## comp 29 7.488031e-04 2.496010e-03 99.99956
## comp 30 1.330448e-04 4.434827e-04 100.00000
all_pca_4$eig
## [1] 1.328161e+01 5.691355e+00 2.817949e+00 1.980640e+00 1.648731e+00
## [6] 1.207357e+00 6.752201e-01 4.766171e-01 4.168948e-01 3.506935e-01
## [11] 2.939157e-01 2.611614e-01 2.413575e-01 1.570097e-01 9.413497e-02
## [16] 7.986280e-02 5.939904e-02 5.261878e-02 4.947759e-02 3.115940e-02
## [21] 2.997289e-02 2.743940e-02 2.434084e-02 1.805501e-02 1.548127e-02
## [26] 8.177640e-03 6.900464e-03 1.589338e-03 7.488031e-04 1.330448e-04
all_pca_5$ExPosition.Data$eigs
## [1] 7.543953e+03 3.232689e+03 1.600595e+03 1.125004e+03 9.364790e+02
## [6] 6.857786e+02 3.835250e+02 2.707185e+02 2.367963e+02 1.991939e+02
## [11] 1.669441e+02 1.483397e+02 1.370911e+02 8.918152e+01 5.346866e+01
## [16] 4.536207e+01 3.373865e+01 2.988747e+01 2.810327e+01 1.769854e+01
## [21] 1.702460e+01 1.558558e+01 1.382560e+01 1.025524e+01 8.793362e+00
## [26] 4.644899e+00 3.919463e+00 9.027439e-01 4.253202e-01 7.556946e-02
Note that accessing the eigenvalues of the objects created by dudi.pca() and epPCA() does not return the percentage of variance associated with each eigenvalue, which is a useful metric to have at hand. In epPCA()’s case these percentages can be accessed via $ExPosition.Data$t, but, as already stated, there is no component for cumulative variances. In dudi.pca()’s case, they can be observed through the summary() function, but format-wise that is less than ideal. Fortunately, there are alternative ways to obtain the eigenvalues: the factoextra package includes several functions to extract and visualize them together with these variances. More information about them can be found in their associated RDocumentation page: https://www.rdocumentation.org/packages/factoextra/versions/1.0.7/topics/eigenvalue
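When a function only returns the eigenvalues, the missing percentages can also be derived by hand. A minimal sketch, assuming the eigenvalues are typed in from the all_pca_4$eig output printed above (so they carry its rounding) rather than read from the object itself:

```r
# Eigenvalues copied from the all_pca_4$eig output above
eig <- c(1.328161e+01, 5.691355e+00, 2.817949e+00, 1.980640e+00, 1.648731e+00,
         1.207357e+00, 6.752201e-01, 4.766171e-01, 4.168948e-01, 3.506935e-01,
         2.939157e-01, 2.611614e-01, 2.413575e-01, 1.570097e-01, 9.413497e-02,
         7.986280e-02, 5.939904e-02, 5.261878e-02, 4.947759e-02, 3.115940e-02,
         2.997289e-02, 2.743940e-02, 2.434084e-02, 1.805501e-02, 1.548127e-02,
         8.177640e-03, 6.900464e-03, 1.589338e-03, 7.488031e-04, 1.330448e-04)

# Percentage of variance per component and its running total
variance_pct   <- eig / sum(eig) * 100
cumulative_pct <- cumsum(variance_pct)

head(data.frame(eigenvalue = eig, variance_pct, cumulative_pct), 5)
```

The same two lines would work on epPCA()’s eigs component, since the percentages do not depend on the overall scale of the eigenvalues.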
Let’s focus on the eigenvalues themselves, which can be extracted through the functions get_eig() and get_eigenvalue(). Note that both functions are identical; one is but an alias of the other.
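# Do not run this code snippet, as it is only here for illustration purposes (pca stands for any PCA object)
library(factoextra)
get_eig(pca)
get_eigenvalue(pca)
Let’s observe the results of applying said functions to the previously constructed PCAs: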
get_eig(all_pca_1)
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 1.328161e+01 4.427203e+01 44.27203
## Dim.2 5.691355e+00 1.897118e+01 63.24321
## Dim.3 2.817949e+00 9.393163e+00 72.63637
## Dim.4 1.980640e+00 6.602135e+00 79.23851
## Dim.5 1.648731e+00 5.495768e+00 84.73427
## Dim.6 1.207357e+00 4.024522e+00 88.75880
## Dim.7 6.752201e-01 2.250734e+00 91.00953
## Dim.8 4.766171e-01 1.588724e+00 92.59825
## Dim.9 4.168948e-01 1.389649e+00 93.98790
## Dim.10 3.506935e-01 1.168978e+00 95.15688
## Dim.11 2.939157e-01 9.797190e-01 96.13660
## Dim.12 2.611614e-01 8.705379e-01 97.00714
## Dim.13 2.413575e-01 8.045250e-01 97.81166
## Dim.14 1.570097e-01 5.233657e-01 98.33503
## Dim.15 9.413497e-02 3.137832e-01 98.64881
## Dim.16 7.986280e-02 2.662093e-01 98.91502
## Dim.17 5.939904e-02 1.979968e-01 99.11302
## Dim.18 5.261878e-02 1.753959e-01 99.28841
## Dim.19 4.947759e-02 1.649253e-01 99.45334
## Dim.20 3.115940e-02 1.038647e-01 99.55720
## Dim.21 2.997289e-02 9.990965e-02 99.65711
## Dim.22 2.743940e-02 9.146468e-02 99.74858
## Dim.23 2.434084e-02 8.113613e-02 99.82971
## Dim.24 1.805501e-02 6.018336e-02 99.88990
## Dim.25 1.548127e-02 5.160424e-02 99.94150
## Dim.26 8.177640e-03 2.725880e-02 99.96876
## Dim.27 6.900464e-03 2.300155e-02 99.99176
## Dim.28 1.589338e-03 5.297793e-03 99.99706
## Dim.29 7.488031e-04 2.496010e-03 99.99956
## Dim.30 1.330448e-04 4.434827e-04 100.00000
get_eigenvalue(all_pca_1)
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 1.328161e+01 4.427203e+01 44.27203
## Dim.2 5.691355e+00 1.897118e+01 63.24321
## Dim.3 2.817949e+00 9.393163e+00 72.63637
## Dim.4 1.980640e+00 6.602135e+00 79.23851
## Dim.5 1.648731e+00 5.495768e+00 84.73427
## Dim.6 1.207357e+00 4.024522e+00 88.75880
## Dim.7 6.752201e-01 2.250734e+00 91.00953
## Dim.8 4.766171e-01 1.588724e+00 92.59825
## Dim.9 4.168948e-01 1.389649e+00 93.98790
## Dim.10 3.506935e-01 1.168978e+00 95.15688
## Dim.11 2.939157e-01 9.797190e-01 96.13660
## Dim.12 2.611614e-01 8.705379e-01 97.00714
## Dim.13 2.413575e-01 8.045250e-01 97.81166
## Dim.14 1.570097e-01 5.233657e-01 98.33503
## Dim.15 9.413497e-02 3.137832e-01 98.64881
## Dim.16 7.986280e-02 2.662093e-01 98.91502
## Dim.17 5.939904e-02 1.979968e-01 99.11302
## Dim.18 5.261878e-02 1.753959e-01 99.28841
## Dim.19 4.947759e-02 1.649253e-01 99.45334
## Dim.20 3.115940e-02 1.038647e-01 99.55720
## Dim.21 2.997289e-02 9.990965e-02 99.65711
## Dim.22 2.743940e-02 9.146468e-02 99.74858
## Dim.23 2.434084e-02 8.113613e-02 99.82971
## Dim.24 1.805501e-02 6.018336e-02 99.88990
## Dim.25 1.548127e-02 5.160424e-02 99.94150
## Dim.26 8.177640e-03 2.725880e-02 99.96876
## Dim.27 6.900464e-03 2.300155e-02 99.99176
## Dim.28 1.589338e-03 5.297793e-03 99.99706
## Dim.29 7.488031e-04 2.496010e-03 99.99956
## Dim.30 1.330448e-04 4.434827e-04 100.00000
identical(get_eig(all_pca_1), get_eigenvalue(all_pca_1))
## [1] TRUE
get_eig(all_pca_2)
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 1.328161e+01 4.427203e+01 44.27203
## Dim.2 5.691355e+00 1.897118e+01 63.24321
## Dim.3 2.817949e+00 9.393163e+00 72.63637
## Dim.4 1.980640e+00 6.602135e+00 79.23851
## Dim.5 1.648731e+00 5.495768e+00 84.73427
## Dim.6 1.207357e+00 4.024522e+00 88.75880
## Dim.7 6.752201e-01 2.250734e+00 91.00953
## Dim.8 4.766171e-01 1.588724e+00 92.59825
## Dim.9 4.168948e-01 1.389649e+00 93.98790
## Dim.10 3.506935e-01 1.168978e+00 95.15688
## Dim.11 2.939157e-01 9.797190e-01 96.13660
## Dim.12 2.611614e-01 8.705379e-01 97.00714
## Dim.13 2.413575e-01 8.045250e-01 97.81166
## Dim.14 1.570097e-01 5.233657e-01 98.33503
## Dim.15 9.413497e-02 3.137832e-01 98.64881
## Dim.16 7.986280e-02 2.662093e-01 98.91502
## Dim.17 5.939904e-02 1.979968e-01 99.11302
## Dim.18 5.261878e-02 1.753959e-01 99.28841
## Dim.19 4.947759e-02 1.649253e-01 99.45334
## Dim.20 3.115940e-02 1.038647e-01 99.55720
## Dim.21 2.997289e-02 9.990965e-02 99.65711
## Dim.22 2.743940e-02 9.146468e-02 99.74858
## Dim.23 2.434084e-02 8.113613e-02 99.82971
## Dim.24 1.805501e-02 6.018336e-02 99.88990
## Dim.25 1.548127e-02 5.160424e-02 99.94150
## Dim.26 8.177640e-03 2.725880e-02 99.96876
## Dim.27 6.900464e-03 2.300155e-02 99.99176
## Dim.28 1.589338e-03 5.297793e-03 99.99706
## Dim.29 7.488031e-04 2.496010e-03 99.99956
## Dim.30 1.330448e-04 4.434827e-04 100.00000
get_eigenvalue(all_pca_2)
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 1.328161e+01 4.427203e+01 44.27203
## Dim.2 5.691355e+00 1.897118e+01 63.24321
## Dim.3 2.817949e+00 9.393163e+00 72.63637
## Dim.4 1.980640e+00 6.602135e+00 79.23851
## Dim.5 1.648731e+00 5.495768e+00 84.73427
## Dim.6 1.207357e+00 4.024522e+00 88.75880
## Dim.7 6.752201e-01 2.250734e+00 91.00953
## Dim.8 4.766171e-01 1.588724e+00 92.59825
## Dim.9 4.168948e-01 1.389649e+00 93.98790
## Dim.10 3.506935e-01 1.168978e+00 95.15688
## Dim.11 2.939157e-01 9.797190e-01 96.13660
## Dim.12 2.611614e-01 8.705379e-01 97.00714
## Dim.13 2.413575e-01 8.045250e-01 97.81166
## Dim.14 1.570097e-01 5.233657e-01 98.33503
## Dim.15 9.413497e-02 3.137832e-01 98.64881
## Dim.16 7.986280e-02 2.662093e-01 98.91502
## Dim.17 5.939904e-02 1.979968e-01 99.11302
## Dim.18 5.261878e-02 1.753959e-01 99.28841
## Dim.19 4.947759e-02 1.649253e-01 99.45334
## Dim.20 3.115940e-02 1.038647e-01 99.55720
## Dim.21 2.997289e-02 9.990965e-02 99.65711
## Dim.22 2.743940e-02 9.146468e-02 99.74858
## Dim.23 2.434084e-02 8.113613e-02 99.82971
## Dim.24 1.805501e-02 6.018336e-02 99.88990
## Dim.25 1.548127e-02 5.160424e-02 99.94150
## Dim.26 8.177640e-03 2.725880e-02 99.96876
## Dim.27 6.900464e-03 2.300155e-02 99.99176
## Dim.28 1.589338e-03 5.297793e-03 99.99706
## Dim.29 7.488031e-04 2.496010e-03 99.99956
## Dim.30 1.330448e-04 4.434827e-04 100.00000
identical(get_eig(all_pca_2), get_eigenvalue(all_pca_2))
## [1] TRUE
get_eig(all_pca_3)
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 1.328161e+01 4.427203e+01 44.27203
## Dim.2 5.691355e+00 1.897118e+01 63.24321
## Dim.3 2.817949e+00 9.393163e+00 72.63637
## Dim.4 1.980640e+00 6.602135e+00 79.23851
## Dim.5 1.648731e+00 5.495768e+00 84.73427
## Dim.6 1.207357e+00 4.024522e+00 88.75880
## Dim.7 6.752201e-01 2.250734e+00 91.00953
## Dim.8 4.766171e-01 1.588724e+00 92.59825
## Dim.9 4.168948e-01 1.389649e+00 93.98790
## Dim.10 3.506935e-01 1.168978e+00 95.15688
## Dim.11 2.939157e-01 9.797190e-01 96.13660
## Dim.12 2.611614e-01 8.705379e-01 97.00714
## Dim.13 2.413575e-01 8.045250e-01 97.81166
## Dim.14 1.570097e-01 5.233657e-01 98.33503
## Dim.15 9.413497e-02 3.137832e-01 98.64881
## Dim.16 7.986280e-02 2.662093e-01 98.91502
## Dim.17 5.939904e-02 1.979968e-01 99.11302
## Dim.18 5.261878e-02 1.753959e-01 99.28841
## Dim.19 4.947759e-02 1.649253e-01 99.45334
## Dim.20 3.115940e-02 1.038647e-01 99.55720
## Dim.21 2.997289e-02 9.990965e-02 99.65711
## Dim.22 2.743940e-02 9.146468e-02 99.74858
## Dim.23 2.434084e-02 8.113613e-02 99.82971
## Dim.24 1.805501e-02 6.018336e-02 99.88990
## Dim.25 1.548127e-02 5.160424e-02 99.94150
## Dim.26 8.177640e-03 2.725880e-02 99.96876
## Dim.27 6.900464e-03 2.300155e-02 99.99176
## Dim.28 1.589338e-03 5.297793e-03 99.99706
## Dim.29 7.488031e-04 2.496010e-03 99.99956
## Dim.30 1.330448e-04 4.434827e-04 100.00000
get_eigenvalue(all_pca_3)
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 1.328161e+01 4.427203e+01 44.27203
## Dim.2 5.691355e+00 1.897118e+01 63.24321
## Dim.3 2.817949e+00 9.393163e+00 72.63637
## Dim.4 1.980640e+00 6.602135e+00 79.23851
## Dim.5 1.648731e+00 5.495768e+00 84.73427
## Dim.6 1.207357e+00 4.024522e+00 88.75880
## Dim.7 6.752201e-01 2.250734e+00 91.00953
## Dim.8 4.766171e-01 1.588724e+00 92.59825
## Dim.9 4.168948e-01 1.389649e+00 93.98790
## Dim.10 3.506935e-01 1.168978e+00 95.15688
## Dim.11 2.939157e-01 9.797190e-01 96.13660
## Dim.12 2.611614e-01 8.705379e-01 97.00714
## Dim.13 2.413575e-01 8.045250e-01 97.81166
## Dim.14 1.570097e-01 5.233657e-01 98.33503
## Dim.15 9.413497e-02 3.137832e-01 98.64881
## Dim.16 7.986280e-02 2.662093e-01 98.91502
## Dim.17 5.939904e-02 1.979968e-01 99.11302
## Dim.18 5.261878e-02 1.753959e-01 99.28841
## Dim.19 4.947759e-02 1.649253e-01 99.45334
## Dim.20 3.115940e-02 1.038647e-01 99.55720
## Dim.21 2.997289e-02 9.990965e-02 99.65711
## Dim.22 2.743940e-02 9.146468e-02 99.74858
## Dim.23 2.434084e-02 8.113613e-02 99.82971
## Dim.24 1.805501e-02 6.018336e-02 99.88990
## Dim.25 1.548127e-02 5.160424e-02 99.94150
## Dim.26 8.177640e-03 2.725880e-02 99.96876
## Dim.27 6.900464e-03 2.300155e-02 99.99176
## Dim.28 1.589338e-03 5.297793e-03 99.99706
## Dim.29 7.488031e-04 2.496010e-03 99.99956
## Dim.30 1.330448e-04 4.434827e-04 100.00000
identical(get_eig(all_pca_3), get_eigenvalue(all_pca_3))
## [1] TRUE
get_eig(all_pca_4)
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 1.328161e+01 4.427203e+01 44.27203
## Dim.2 5.691355e+00 1.897118e+01 63.24321
## Dim.3 2.817949e+00 9.393163e+00 72.63637
## Dim.4 1.980640e+00 6.602135e+00 79.23851
## Dim.5 1.648731e+00 5.495768e+00 84.73427
## Dim.6 1.207357e+00 4.024522e+00 88.75880
## Dim.7 6.752201e-01 2.250734e+00 91.00953
## Dim.8 4.766171e-01 1.588724e+00 92.59825
## Dim.9 4.168948e-01 1.389649e+00 93.98790
## Dim.10 3.506935e-01 1.168978e+00 95.15688
## Dim.11 2.939157e-01 9.797190e-01 96.13660
## Dim.12 2.611614e-01 8.705379e-01 97.00714
## Dim.13 2.413575e-01 8.045250e-01 97.81166
## Dim.14 1.570097e-01 5.233657e-01 98.33503
## Dim.15 9.413497e-02 3.137832e-01 98.64881
## Dim.16 7.986280e-02 2.662093e-01 98.91502
## Dim.17 5.939904e-02 1.979968e-01 99.11302
## Dim.18 5.261878e-02 1.753959e-01 99.28841
## Dim.19 4.947759e-02 1.649253e-01 99.45334
## Dim.20 3.115940e-02 1.038647e-01 99.55720
## Dim.21 2.997289e-02 9.990965e-02 99.65711
## Dim.22 2.743940e-02 9.146468e-02 99.74858
## Dim.23 2.434084e-02 8.113613e-02 99.82971
## Dim.24 1.805501e-02 6.018336e-02 99.88990
## Dim.25 1.548127e-02 5.160424e-02 99.94150
## Dim.26 8.177640e-03 2.725880e-02 99.96876
## Dim.27 6.900464e-03 2.300155e-02 99.99176
## Dim.28 1.589338e-03 5.297793e-03 99.99706
## Dim.29 7.488031e-04 2.496010e-03 99.99956
## Dim.30 1.330448e-04 4.434827e-04 100.00000
get_eigenvalue(all_pca_4)
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 1.328161e+01 4.427203e+01 44.27203
## Dim.2 5.691355e+00 1.897118e+01 63.24321
## Dim.3 2.817949e+00 9.393163e+00 72.63637
## Dim.4 1.980640e+00 6.602135e+00 79.23851
## Dim.5 1.648731e+00 5.495768e+00 84.73427
## Dim.6 1.207357e+00 4.024522e+00 88.75880
## Dim.7 6.752201e-01 2.250734e+00 91.00953
## Dim.8 4.766171e-01 1.588724e+00 92.59825
## Dim.9 4.168948e-01 1.389649e+00 93.98790
## Dim.10 3.506935e-01 1.168978e+00 95.15688
## Dim.11 2.939157e-01 9.797190e-01 96.13660
## Dim.12 2.611614e-01 8.705379e-01 97.00714
## Dim.13 2.413575e-01 8.045250e-01 97.81166
## Dim.14 1.570097e-01 5.233657e-01 98.33503
## Dim.15 9.413497e-02 3.137832e-01 98.64881
## Dim.16 7.986280e-02 2.662093e-01 98.91502
## Dim.17 5.939904e-02 1.979968e-01 99.11302
## Dim.18 5.261878e-02 1.753959e-01 99.28841
## Dim.19 4.947759e-02 1.649253e-01 99.45334
## Dim.20 3.115940e-02 1.038647e-01 99.55720
## Dim.21 2.997289e-02 9.990965e-02 99.65711
## Dim.22 2.743940e-02 9.146468e-02 99.74858
## Dim.23 2.434084e-02 8.113613e-02 99.82971
## Dim.24 1.805501e-02 6.018336e-02 99.88990
## Dim.25 1.548127e-02 5.160424e-02 99.94150
## Dim.26 8.177640e-03 2.725880e-02 99.96876
## Dim.27 6.900464e-03 2.300155e-02 99.99176
## Dim.28 1.589338e-03 5.297793e-03 99.99706
## Dim.29 7.488031e-04 2.496010e-03 99.99956
## Dim.30 1.330448e-04 4.434827e-04 100.00000
identical(get_eig(all_pca_4), get_eigenvalue(all_pca_4))
## [1] TRUE
get_eig(all_pca_5)
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 7.543953e+03 4.427203e+01 44.27203
## Dim.2 3.232689e+03 1.897118e+01 63.24321
## Dim.3 1.600595e+03 9.393163e+00 72.63637
## Dim.4 1.125004e+03 6.602135e+00 79.23851
## Dim.5 9.364790e+02 5.495768e+00 84.73427
## Dim.6 6.857786e+02 4.024522e+00 88.75880
## Dim.7 3.835250e+02 2.250734e+00 91.00953
## Dim.8 2.707185e+02 1.588724e+00 92.59825
## Dim.9 2.367963e+02 1.389649e+00 93.98790
## Dim.10 1.991939e+02 1.168978e+00 95.15688
## Dim.11 1.669441e+02 9.797190e-01 96.13660
## Dim.12 1.483397e+02 8.705379e-01 97.00714
## Dim.13 1.370911e+02 8.045250e-01 97.81166
## Dim.14 8.918152e+01 5.233657e-01 98.33503
## Dim.15 5.346866e+01 3.137832e-01 98.64881
## Dim.16 4.536207e+01 2.662093e-01 98.91502
## Dim.17 3.373865e+01 1.979968e-01 99.11302
## Dim.18 2.988747e+01 1.753959e-01 99.28841
## Dim.19 2.810327e+01 1.649253e-01 99.45334
## Dim.20 1.769854e+01 1.038647e-01 99.55720
## Dim.21 1.702460e+01 9.990965e-02 99.65711
## Dim.22 1.558558e+01 9.146468e-02 99.74858
## Dim.23 1.382560e+01 8.113613e-02 99.82971
## Dim.24 1.025524e+01 6.018336e-02 99.88990
## Dim.25 8.793362e+00 5.160424e-02 99.94150
## Dim.26 4.644899e+00 2.725880e-02 99.96876
## Dim.27 3.919463e+00 2.300155e-02 99.99176
## Dim.28 9.027439e-01 5.297793e-03 99.99706
## Dim.29 4.253202e-01 2.496010e-03 99.99956
## Dim.30 7.556946e-02 4.434827e-04 100.00000
get_eigenvalue(all_pca_5)
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 7.543953e+03 4.427203e+01 44.27203
## Dim.2 3.232689e+03 1.897118e+01 63.24321
## Dim.3 1.600595e+03 9.393163e+00 72.63637
## Dim.4 1.125004e+03 6.602135e+00 79.23851
## Dim.5 9.364790e+02 5.495768e+00 84.73427
## Dim.6 6.857786e+02 4.024522e+00 88.75880
## Dim.7 3.835250e+02 2.250734e+00 91.00953
## Dim.8 2.707185e+02 1.588724e+00 92.59825
## Dim.9 2.367963e+02 1.389649e+00 93.98790
## Dim.10 1.991939e+02 1.168978e+00 95.15688
## Dim.11 1.669441e+02 9.797190e-01 96.13660
## Dim.12 1.483397e+02 8.705379e-01 97.00714
## Dim.13 1.370911e+02 8.045250e-01 97.81166
## Dim.14 8.918152e+01 5.233657e-01 98.33503
## Dim.15 5.346866e+01 3.137832e-01 98.64881
## Dim.16 4.536207e+01 2.662093e-01 98.91502
## Dim.17 3.373865e+01 1.979968e-01 99.11302
## Dim.18 2.988747e+01 1.753959e-01 99.28841
## Dim.19 2.810327e+01 1.649253e-01 99.45334
## Dim.20 1.769854e+01 1.038647e-01 99.55720
## Dim.21 1.702460e+01 9.990965e-02 99.65711
## Dim.22 1.558558e+01 9.146468e-02 99.74858
## Dim.23 1.382560e+01 8.113613e-02 99.82971
## Dim.24 1.025524e+01 6.018336e-02 99.88990
## Dim.25 8.793362e+00 5.160424e-02 99.94150
## Dim.26 4.644899e+00 2.725880e-02 99.96876
## Dim.27 3.919463e+00 2.300155e-02 99.99176
## Dim.28 9.027439e-01 5.297793e-03 99.99706
## Dim.29 4.253202e-01 2.496010e-03 99.99956
## Dim.30 7.556946e-02 4.434827e-04 100.00000
identical(get_eig(all_pca_5), get_eigenvalue(all_pca_5))
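## [1] TRUE
The first four PCA objects return exactly the same eigenvalues and variances, as they should. The fifth, all_pca_5 (built with epPCA()), reports eigenvalues that are larger by a constant factor of roughly 568, which equals n - 1 for the 569 observations, yet its variance percentages and cumulative percentages match the others exactly. It is also worth noting that get_eig() and get_eigenvalue() yield the same results: identical() returns TRUE in every case, indicating that both functions’ outputs are equal.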
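The scale difference seen for all_pca_5 can be reproduced in a minimal base R sketch. It assumes, without having checked ExPosition's source, that the larger eigenvalues are squared singular values of the standardized data that have not been divided by n - 1. The built-in USArrests data set and the helper variance_percent() are stand-ins chosen for illustration, not part of the original analysis.

```r
# Standardize a built-in data set so that the PCA is correlation-based.
x <- scale(USArrests)
n <- nrow(x)

# Eigenvalues expressed as variances of the components (divisor n - 1).
eig_var <- prcomp(x)$sdev^2

# Eigenvalues expressed as squared singular values (no divisor).
eig_ss <- svd(x)$d^2

# Hypothetical helper: percentage of total variance per component.
variance_percent <- function(eig) 100 * eig / sum(eig)

# The two sets of eigenvalues differ by the constant factor n - 1 ...
all.equal(eig_ss / eig_var, rep(n - 1, length(eig_var)))

# ... so the variance percentages are the same either way.
all.equal(variance_percent(eig_var), variance_percent(eig_ss))
```

Because the percentages are ratios, any constant rescaling of the eigenvalues cancels out, which is why the variance.percent columns agree across all five objects.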
Despite the usefulness of having the eigenvalues and variances in a tidy dataframe such as the ones obtained in the previous code snippets, being able to visualize the differences graphically helps to illustrate it all - that is when the screeplot comes into play.
The RDocumentation page detailing both get_eig() and get_eigenvalue() also mentions the fviz_eig() and fviz_screeplot() functions which, as was the case with the previous functions, are identical in behavior: one is but an alias of the other.
These functions draw what is known as a “screeplot” or “scree plot”, which is a graph of eigenvalues ordered from largest to smallest. It can also be read as a plot of the percentage of variance associated with each Principal Component or dimension, since each eigenvalue is proportional to the variance captured by its component.
The main arguments of these functions are:
choice: “variance” or “eigenvalue”, the quantity to be plotted.
geom: “bar” for barplot, “line” for lineplot or c(“bar”, “line”) to use both types.
addlabels: TRUE or FALSE, which determines whether labels showing the information retained by each dimension are added at the top of the bars or points.
library(factoextra)
fviz_eig(X,
choice = "variance",
geom = c("bar", "line"),
barfill = "steelblue",
barcolor = "steelblue",
linecolor = "black",
ncp = 10,
addlabels = FALSE,
hjust = 0,
main = NULL,
xlab = NULL,
ylab = NULL,
ggtheme = theme_minimal(),
...)
fviz_screeplot(pca, ...)
These functions also accept optional arguments to be passed on to the function ggpar() upon which they are based. More information regarding these functions and their arguments is available at the very same RDocumentation page as the functions get_eig() and get_eigenvalue() (https://www.rdocumentation.org/packages/factoextra/versions/1.0.7/topics/eigenvalue).
Let’s now observe the results of applying said functions to the previously constructed PCA objects:
fviz_eig(all_pca_1,
addlabels = TRUE,
ylim = c(0, 50),
barfill = "#81d4fa",
barcolor = "#81d4fa",
linecolor = "red")
fviz_eig(all_pca_2,
addlabels = TRUE,
ylim = c(0, 50),
barfill = "#81d4fa",
barcolor = "#81d4fa",
linecolor = "red")
fviz_eig(all_pca_3,
addlabels = TRUE,
ylim = c(0, 50),
barfill = "#81d4fa",
barcolor = "#81d4fa",
linecolor = "red")
fviz_eig(all_pca_4,
addlabels = TRUE,
ylim = c(0, 50),
barfill = "#81d4fa",
barcolor = "#81d4fa",
linecolor = "red")
fviz_eig(all_pca_5,
addlabels = TRUE,
ylim = c(0, 50),
barfill = "#81d4fa",
barcolor = "#81d4fa",
linecolor = "red")
Unsurprisingly, all of the screeplots are identical.
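For readers who cannot see the rendered figures, a comparable scree plot can be rebuilt from the eigenvalue table alone. The following is a rough base R sketch, not the fviz_eig() implementation: the percentages are copied from the variance.percent column above (rounded to two decimals), and the variable names are illustrative.

```r
# Percentage of variance for the first ten dimensions, copied from the
# variance.percent column of the get_eig() output (rounded).
variance_percent <- c(44.27, 18.97, 9.39, 6.60, 5.50,
                      4.02, 2.25, 1.59, 1.39, 1.17)
names(variance_percent) <- seq_along(variance_percent)

# Draw the bars and keep their x positions so that the line and the
# labels can be overlaid on top of them.
bar_mid <- barplot(variance_percent,
                   ylim = c(0, 50),
                   col = "#81d4fa",
                   border = "#81d4fa",
                   xlab = "Dimensions",
                   ylab = "Percentage of explained variance",
                   main = "Scree plot")

# Overlay the line and the percentage labels.
lines(bar_mid, variance_percent, type = "b", col = "red", pch = 19)
text(bar_mid, variance_percent,
     labels = paste0(round(variance_percent, 1), "%"),
     pos = 3, cex = 0.8)
```

The sharp drop after the first two or three bars is the “elbow” that scree plots are usually inspected for.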
As a side note, the scannf argument of the function dudi.pca() allows for a screeplot to be plotted. Setting said argument to TRUE would yield a plot akin to the ones obtained earlier with the function fviz_eig(), albeit considerably less complete/polished.
These plots help to visualize the variance results obtained when performing a PCA upon the Wisconsin Breast Cancer Dataset. Unfortunately, there is no well-accepted objective way to decide how many Principal Components are enough: this depends on the field of application and on the dataset itself (biomedical scenarios tend to require a high cumulative variance, since people’s health is at stake). In practice, the first few principal components are the most useful ones for finding interesting patterns in the data and undoubtedly the most important ones when it comes to representing it.
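Two common heuristics can at least make the choice explicit: keeping enough components to reach a target cumulative variance, and keeping components whose eigenvalue exceeds 1 (often called the Kaiser criterion, which only makes sense for a PCA on standardized variables). The sketch below applies both to the values printed for all_pca_1; the thresholds and the helper components_for() are illustrative assumptions, not recommendations made by the original analysis.

```r
# First ten eigenvalues and cumulative percentages, copied from the
# get_eig(all_pca_1) output.
eigenvalue <- c(13.28161, 5.691355, 2.817949, 1.980640, 1.648731,
                1.207357, 0.6752201, 0.4766171, 0.4168948, 0.3506935)
cumulative_percent <- c(44.27203, 63.24321, 72.63637, 79.23851, 84.73427,
                        88.75880, 91.00953, 92.59825, 93.98790, 95.15688)

# Hypothetical helper: smallest number of components whose cumulative
# variance reaches the requested threshold (in percent).
components_for <- function(threshold) {
  which(cumulative_percent >= threshold)[1]
}

components_for(80)   # 5
components_for(90)   # 7
components_for(95)   # 10

# Kaiser criterion: number of components with an eigenvalue above 1.
sum(eigenvalue > 1)  # 6
```

The two rules disagree here (6 components versus 7 or 10, depending on the threshold), which illustrates why the decision ultimately depends on the application.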
Navigating through the PCA outputs previously coded is not an easy task. As was the case with the eigenvalues, some of the PCA functions yield an object whose components include the results for variables and individuals, namely PCA() and epPCA().
print(all_pca_3)
## **Results for the Principal Component Analysis (PCA)**
## The analysis was performed on 569 individuals, described by 30 variables
## *The results are available in the following objects:
##
## name description
## 1 "$eig" "eigenvalues"
## 2 "$var" "results for the variables"
## 3 "$var$coord" "coord. for the variables"
## 4 "$var$cor" "correlations variables - dimensions"
## 5 "$var$cos2" "cos2 for the variables"
## 6 "$var$contrib" "contributions of the variables"
## 7 "$ind" "results for the individuals"
## 8 "$ind$coord" "coord. for the individuals"
## 9 "$ind$cos2" "cos2 for the individuals"
## 10 "$ind$contrib" "contributions of the individuals"
## 11 "$call" "summary statistics"
## 12 "$call$centre" "mean of the variables"
## 13 "$call$ecart.type" "standard error of the variables"
## 14 "$call$row.w" "weights for the individuals"
## 15 "$call$col.w" "weights for the variables"
print(all_pca_5$ExPosition.Data)
## **Results for Principal Component Analysis**
## The analysis was performed on 569 individuals, described by 30 variables
## *The results are available in the following objects:
##
## name description
## 1 "$fi" "Factor scores of the rows"
## 2 "$di" "Squared distances of the rows"
## 3 "$ci" "Contributions of the rows"
## 4 "$ri" "Cosines of the rows"
## 5 "$fj" "Factor scores of the columns"
## 6 "$dj" "square distances of the columns"
## 7 "$cj" "Contributions for the columns"
## 8 "$rj" "Cosines of the columns"
## 9 "$t" "Explained Variance"
## 10 "$eigs" "Eigenvalues"
## 11 "$pdq" "SVD data"
## 12 "$X" "X matrix to decompose"
## 13 "$M" "Masses - each set to 1"
## 14 "$W" "Weights - each set to 1"
## 15 "$center" "Center of X"
## 16 "$scale" "Scale factor of X"
In PCA()’s case, the results for variables and individuals are accessed with $var and $ind respectively, whereas epPCA() has a unique address for every result (less ideal, but manageable).
A simpler method to extract the results for variables and individuals from a PCA output is to use the functions get_pca_var() and get_pca_ind() respectively. There’s also the option of using the function get_pca() with the argument element = “var” for the results for variables or with the argument element = “ind” for the results for individuals.
All of these functions come from the factoextra package and provide a list of matrices containing all the results for either the active variables or individuals (more information regarding these functions is available in their associated RDocumentation page: https://www.rdocumentation.org/packages/factoextra/versions/1.0.7/topics/get_pca).
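The four matrices returned for the variables ($coord, $cor, $cos2 and $contrib) are closely related. The outputs that follow suggest as much: for radius_mean on Dim.1, the coordinate and the correlation are both -0.7977668, its square is about 0.6364 (the cos2) and 100 × 0.6364318 / 13.28161 ≈ 4.79 (the contribution). The base R sketch below reproduces these relationships with prcomp() on the built-in USArrests data; it is an illustration under the assumption of a standardized PCA, not the code factoextra runs internally.

```r
# Standardized PCA on a built-in data set.
x <- scale(USArrests)
pca <- prcomp(x)

# Coordinates of the variables: loadings scaled by the standard
# deviation of each component.
coord <- sweep(pca$rotation, 2, pca$sdev, `*`)

# Correlations between the original variables and the component scores.
cor_var <- cor(x, pca$x)

# Quality of representation and contributions (in percent).
cos2 <- coord^2
contrib <- 100 * sweep(cos2, 2, colSums(cos2), `/`)

# With standardized variables, coordinates and correlations coincide.
all.equal(coord, cor_var, check.attributes = FALSE)

# Each variable is fully represented once all components are kept,
# and the contributions to each component add up to 100.
rowSums(cos2)
colSums(contrib)
```

Note that colSums(cos2) returns the eigenvalues, which is why dividing cos2 by the eigenvalue of a dimension gives the contribution to that dimension.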
Let’s first examine the outputs of applying get_pca_var() to the PCA objects previously created:
all_pca_var_1 <- get_pca_var(all_pca_1)
all_pca_var_1
## Principal Component Analysis Results for variables
## ===================================================
## Name Description
## 1 "$coord" "Coordinates for the variables"
## 2 "$cor" "Correlations between variables and dimensions"
## 3 "$cos2" "Cos2 for the variables"
## 4 "$contrib" "contributions of the variables"
head(all_pca_var_1$coord)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## radius_mean -0.7977668 0.5579027 -0.01432118 0.05827700 -0.04851878
## texture_mean -0.3780132 0.1424382 0.10835829 -0.84870380 0.06351944
## perimeter_mean -0.8292355 0.5133487 -0.01563555 0.05908501 -0.04799015
## area_mean -0.8053928 0.5512695 0.04817717 0.07520017 -0.01326563
## smoothness_mean -0.5196530 -0.4440017 -0.17507219 0.22430770 0.46878427
## compactness_mean -0.8720501 -0.3623611 -0.12437565 0.04474618 -0.01502824
## Dim.6 Dim.7 Dim.8 Dim.9
## radius_mean 0.020592339 -0.101965596 0.005144876 -0.144056156
## texture_mean -0.035358035 0.009367203 -0.090214585 0.072767057
## perimeter_mean 0.019018481 -0.094067834 0.012901209 -0.144462575
## area_mean -0.002074253 -0.042444540 -0.023937777 -0.126284788
## smoothness_mean -0.314667668 -0.115590213 0.199500717 0.004148275
## compactness_mean -0.015527056 0.025406278 0.104520200 -0.108370831
## Dim.10 Dim.11 Dim.12 Dim.13 Dim.14
## radius_mean 0.056546476 -0.022483349 0.02609749 0.005879269 0.023578980
## texture_mean 0.142679652 0.163858215 0.13026214 0.099956785 -0.008543071
## perimeter_mean 0.051157023 -0.009098538 0.01989278 0.021670182 0.019223333
## area_mean 0.044388765 -0.059727362 0.03344115 0.033100452 0.004291657
## smoothness_mean -0.041034694 0.074285011 0.16186012 0.022389467 0.176354514
## compactness_mean 0.007660737 0.166984319 -0.05315682 0.112641659 0.003210000
## Dim.15 Dim.16 Dim.17 Dim.18
## radius_mean -0.015683967 -0.04255502 0.049456533 0.033654027
## texture_mean -0.033112133 -0.04460615 -0.009433423 -0.009428525
## perimeter_mean -0.012242788 -0.03234470 0.047481690 0.036316100
## area_mean 0.004285246 -0.03742982 0.062320398 0.061055728
## smoothness_mean -0.036248064 -0.05782372 0.040927741 -0.080796547
## compactness_mean 0.070843391 0.04809242 -0.004949378 0.001787881
## Dim.19 Dim.20 Dim.21 Dim.22
## radius_mean 0.050133570 -0.008772821 -0.011871307 -0.01208056
## texture_mean 0.006626055 -0.043094773 0.077624778 -0.01570358
## perimeter_mean 0.053294517 -0.003118233 -0.012078892 -0.01245022
## area_mean -0.006077427 -0.015912200 -0.003193026 -0.01616162
## smoothness_mean -0.036605301 0.003018666 -0.020687226 -0.01057217
## compactness_mean 0.063221168 0.086263038 0.033347929 0.01624639
## Dim.23 Dim.24 Dim.25 Dim.26
## radius_mean -1.537575e-02 -0.024533003 -0.002392233 -0.011708590
## texture_mean -8.658821e-05 0.013273874 0.010544407 -0.002220667
## perimeter_mean -6.278798e-03 -0.015673984 0.003361359 -0.011326933
## area_mean 1.213375e-03 0.009385446 -0.026134063 0.032801549
## smoothness_mean -3.224173e-03 0.009230799 0.003602676 -0.003346255
## compactness_mean 8.169034e-03 -0.013992577 0.049349353 0.023765850
## Dim.27 Dim.28 Dim.29 Dim.30
## radius_mean -0.010925793 8.419566e-03 5.786460e-03 8.101999e-03
## texture_mean -0.001441855 -2.623673e-06 -2.882534e-04 3.156545e-06
## perimeter_mean -0.009587447 3.362272e-03 1.050312e-02 -7.957621e-03
## area_mean 0.038761046 -1.086395e-02 -1.156947e-02 -3.800314e-04
## smoothness_mean 0.005789074 5.897327e-05 -9.398714e-05 -5.591303e-05
## compactness_mean 0.008119890 -2.177814e-04 -1.122394e-03 5.152947e-04
head(all_pca_var_1$cor)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## radius_mean -0.7977668 0.5579027 -0.01432118 0.05827700 -0.04851878
## texture_mean -0.3780132 0.1424382 0.10835829 -0.84870380 0.06351944
## perimeter_mean -0.8292355 0.5133487 -0.01563555 0.05908501 -0.04799015
## area_mean -0.8053928 0.5512695 0.04817717 0.07520017 -0.01326563
## smoothness_mean -0.5196530 -0.4440017 -0.17507219 0.22430770 0.46878427
## compactness_mean -0.8720501 -0.3623611 -0.12437565 0.04474618 -0.01502824
## Dim.6 Dim.7 Dim.8 Dim.9
## radius_mean 0.020592339 -0.101965596 0.005144876 -0.144056156
## texture_mean -0.035358035 0.009367203 -0.090214585 0.072767057
## perimeter_mean 0.019018481 -0.094067834 0.012901209 -0.144462575
## area_mean -0.002074253 -0.042444540 -0.023937777 -0.126284788
## smoothness_mean -0.314667668 -0.115590213 0.199500717 0.004148275
## compactness_mean -0.015527056 0.025406278 0.104520200 -0.108370831
## Dim.10 Dim.11 Dim.12 Dim.13 Dim.14
## radius_mean 0.056546476 -0.022483349 0.02609749 0.005879269 0.023578980
## texture_mean 0.142679652 0.163858215 0.13026214 0.099956785 -0.008543071
## perimeter_mean 0.051157023 -0.009098538 0.01989278 0.021670182 0.019223333
## area_mean 0.044388765 -0.059727362 0.03344115 0.033100452 0.004291657
## smoothness_mean -0.041034694 0.074285011 0.16186012 0.022389467 0.176354514
## compactness_mean 0.007660737 0.166984319 -0.05315682 0.112641659 0.003210000
## Dim.15 Dim.16 Dim.17 Dim.18
## radius_mean -0.015683967 -0.04255502 0.049456533 0.033654027
## texture_mean -0.033112133 -0.04460615 -0.009433423 -0.009428525
## perimeter_mean -0.012242788 -0.03234470 0.047481690 0.036316100
## area_mean 0.004285246 -0.03742982 0.062320398 0.061055728
## smoothness_mean -0.036248064 -0.05782372 0.040927741 -0.080796547
## compactness_mean 0.070843391 0.04809242 -0.004949378 0.001787881
## Dim.19 Dim.20 Dim.21 Dim.22
## radius_mean 0.050133570 -0.008772821 -0.011871307 -0.01208056
## texture_mean 0.006626055 -0.043094773 0.077624778 -0.01570358
## perimeter_mean 0.053294517 -0.003118233 -0.012078892 -0.01245022
## area_mean -0.006077427 -0.015912200 -0.003193026 -0.01616162
## smoothness_mean -0.036605301 0.003018666 -0.020687226 -0.01057217
## compactness_mean 0.063221168 0.086263038 0.033347929 0.01624639
## Dim.23 Dim.24 Dim.25 Dim.26
## radius_mean -1.537575e-02 -0.024533003 -0.002392233 -0.011708590
## texture_mean -8.658821e-05 0.013273874 0.010544407 -0.002220667
## perimeter_mean -6.278798e-03 -0.015673984 0.003361359 -0.011326933
## area_mean 1.213375e-03 0.009385446 -0.026134063 0.032801549
## smoothness_mean -3.224173e-03 0.009230799 0.003602676 -0.003346255
## compactness_mean 8.169034e-03 -0.013992577 0.049349353 0.023765850
## Dim.27 Dim.28 Dim.29 Dim.30
## radius_mean -0.010925793 8.419566e-03 5.786460e-03 8.101999e-03
## texture_mean -0.001441855 -2.623673e-06 -2.882534e-04 3.156545e-06
## perimeter_mean -0.009587447 3.362272e-03 1.050312e-02 -7.957621e-03
## area_mean 0.038761046 -1.086395e-02 -1.156947e-02 -3.800314e-04
## smoothness_mean 0.005789074 5.897327e-05 -9.398714e-05 -5.591303e-05
## compactness_mean 0.008119890 -2.177814e-04 -1.122394e-03 5.152947e-04
head(all_pca_var_1$cos2)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## radius_mean 0.6364318 0.31125539 0.0002050963 0.003396209 0.0023540715
## texture_mean 0.1428940 0.02028864 0.0117415199 0.720298141 0.0040347193
## perimeter_mean 0.6876316 0.26352690 0.0002444703 0.003491038 0.0023030547
## area_mean 0.6486576 0.30389811 0.0023210397 0.005655066 0.0001759769
## smoothness_mean 0.2700393 0.19713747 0.0306502709 0.050313944 0.2197586899
## compactness_mean 0.7604714 0.13130559 0.0154693024 0.002002220 0.0002258480
## Dim.6 Dim.7 Dim.8 Dim.9
## radius_mean 4.240444e-04 0.0103969827 2.646975e-05 2.075218e-02
## texture_mean 1.250191e-03 0.0000877445 8.138671e-03 5.295045e-03
## perimeter_mean 3.617026e-04 0.0088487573 1.664412e-04 2.086944e-02
## area_mean 4.302527e-06 0.0018015390 5.730172e-04 1.594785e-02
## smoothness_mean 9.901574e-02 0.0133610973 3.980054e-02 1.720819e-05
## compactness_mean 2.410895e-04 0.0006454790 1.092447e-02 1.174424e-02
## Dim.10 Dim.11 Dim.12 Dim.13
## radius_mean 3.197504e-03 5.055010e-04 0.0006810789 3.456581e-05
## texture_mean 2.035748e-02 2.684951e-02 0.0169682252 9.991359e-03
## perimeter_mean 2.617041e-03 8.278339e-05 0.0003957226 4.695968e-04
## area_mean 1.970362e-03 3.567358e-03 0.0011183106 1.095640e-03
## smoothness_mean 1.683846e-03 5.518263e-03 0.0261986970 5.012882e-04
## compactness_mean 5.868689e-05 2.788376e-02 0.0028256473 1.268814e-02
## Dim.14 Dim.15 Dim.16 Dim.17
## radius_mean 5.559683e-04 2.459868e-04 0.001810929 2.445949e-03
## texture_mean 7.298407e-05 1.096413e-03 0.001989709 8.898948e-05
## perimeter_mean 3.695365e-04 1.498859e-04 0.001046179 2.254511e-03
## area_mean 1.841832e-05 1.836333e-05 0.001400992 3.883832e-03
## smoothness_mean 3.110091e-02 1.313922e-03 0.003343582 1.675080e-03
## compactness_mean 1.030410e-05 5.018786e-03 0.002312881 2.449634e-05
## Dim.18 Dim.19 Dim.20 Dim.21
## radius_mean 1.132594e-03 2.513375e-03 7.696239e-05 1.409279e-04
## texture_mean 8.889709e-05 4.390460e-05 1.857159e-03 6.025606e-03
## perimeter_mean 1.318859e-03 2.840305e-03 9.723374e-06 1.458996e-04
## area_mean 3.727802e-03 3.693512e-05 2.531981e-04 1.019542e-05
## smoothness_mean 6.528082e-03 1.339948e-03 9.112344e-06 4.279613e-04
## compactness_mean 3.196517e-06 3.996916e-03 7.441312e-03 1.112084e-03
## Dim.22 Dim.23 Dim.24 Dim.25
## radius_mean 0.0001459399 2.364136e-04 6.018682e-04 5.722780e-06
## texture_mean 0.0002466023 7.497518e-09 1.761957e-04 1.111845e-04
## perimeter_mean 0.0001550079 3.942330e-05 2.456738e-04 1.129874e-05
## area_mean 0.0002611979 1.472279e-06 8.808659e-05 6.829892e-04
## smoothness_mean 0.0001117708 1.039529e-05 8.520764e-05 1.297927e-05
## compactness_mean 0.0002639453 6.673311e-05 1.957922e-04 2.435359e-03
## Dim.26 Dim.27 Dim.28 Dim.29
## radius_mean 1.370911e-04 1.193730e-04 7.088910e-05 3.348312e-05
## texture_mean 4.931360e-06 2.078945e-06 6.883658e-12 8.309002e-08
## perimeter_mean 1.282994e-04 9.191915e-05 1.130487e-05 1.103155e-04
## area_mean 1.075942e-03 1.502419e-03 1.180255e-04 1.338527e-04
## smoothness_mean 1.119742e-05 3.351338e-05 3.477847e-09 8.833583e-09
## compactness_mean 5.648156e-04 6.593262e-05 4.742874e-08 1.259768e-06
## Dim.30
## radius_mean 6.564239e-05
## texture_mean 9.963774e-12
## perimeter_mean 6.332372e-05
## area_mean 1.444238e-07
## smoothness_mean 3.126267e-09
## compactness_mean 2.655286e-07
head(all_pca_var_1$contrib)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## radius_mean 4.791828 5.4689158 0.007278210 0.1714702 0.14278085
## texture_mean 1.075879 0.3564817 0.416669002 36.3669303 0.24471672
## perimeter_mean 5.177322 4.6303018 0.008675469 0.1762581 0.13968654
## area_mean 4.883878 5.3396446 0.082366279 0.2855170 0.01067348
## smoothness_mean 2.033182 3.4638057 1.087680124 2.5402866 13.32896332
## compactness_mean 5.725748 2.3071061 0.548956087 0.1010895 0.01369829
## Dim.6 Dim.7 Dim.8 Dim.9 Dim.10
## radius_mean 0.0351217226 1.53979162 0.005553672 4.977796701 0.91176608
## texture_mean 0.1035477523 0.01299495 1.707590993 1.270115259 5.80492242
## perimeter_mean 0.0299582265 1.31049967 0.034921362 5.005923564 0.74624745
## area_mean 0.0003563592 0.26680766 0.120225880 3.825388875 0.56184752
## smoothness_mean 8.2010352310 1.97877655 8.350630500 0.004127705 0.48014757
## compactness_mean 0.0199683716 0.09559534 2.292085472 2.817074391 0.01673453
## Dim.11 Dim.12 Dim.13 Dim.14 Dim.15
## radius_mean 0.17198842 0.2607885 0.01432142 0.354098008 0.26131291
## texture_mean 9.13510741 6.4972186 4.13965139 0.046483789 1.16472490
## perimeter_mean 0.02816569 0.1515242 0.19456483 0.235358999 0.15922443
## area_mean 1.21373501 0.4282067 0.45394900 0.011730686 0.01950745
## smoothness_mean 1.87749853 10.0316126 0.20769532 19.808272945 1.39578544
## compactness_mean 9.48699343 1.0819546 5.25699162 0.006562713 5.33147923
## Dim.16 Dim.17 Dim.18 Dim.19 Dim.20
## radius_mean 2.267551 4.1178253 2.152451027 5.07982446 0.24699572
## texture_mean 2.491408 0.1498164 0.168945538 0.08873633 5.96018948
## perimeter_mean 1.309971 3.7955343 2.506441650 5.74058961 0.03120527
## area_mean 1.754248 6.5385437 7.084545992 0.07465020 0.81258978
## smoothness_mean 4.186658 2.8200456 12.406371982 2.70819168 0.02924428
## compactness_mean 2.896068 0.0412403 0.006074859 8.07823486 23.88143277
## Dim.21 Dim.22 Dim.23 Dim.24 Dim.25
## radius_mean 0.47018457 0.5318625 9.712634e-01 3.3335252 0.03696583
## texture_mean 20.10351786 0.8987160 3.080222e-05 0.9758830 0.71818728
## perimeter_mean 0.48677194 0.5649097 1.619636e-01 1.3606960 0.07298326
## area_mean 0.03401545 0.9519081 6.048598e-03 0.4878790 4.41171292
## smoothness_mean 1.42782777 0.4073369 4.270720e-02 0.4719336 0.08383854
## compactness_mean 3.71030023 0.9619207 2.741611e-01 1.0844206 15.73099874
## Dim.26 Dim.27 Dim.28 Dim.29 Dim.30
## radius_mean 1.67641371 1.72992648 4.460291e+00 4.471552424 4.933856e+01
## texture_mean 0.06030297 0.03012762 4.331148e-07 0.011096377 7.489035e-06
## perimeter_mean 1.56890519 1.33207198 7.112943e-01 14.732247365 4.759578e+01
## area_mean 13.15711689 21.77272040 7.426079e+00 17.875554424 1.085528e-01
## smoothness_mean 0.13692728 0.48566854 2.188236e-04 0.001179694 2.349785e-03
## compactness_mean 6.90682942 0.95548094 2.984182e-03 0.168237574 1.995783e-01
all_pca_var_2 <- get_pca_var(all_pca_2)
all_pca_var_2
## Principal Component Analysis Results for variables
## ===================================================
## Name Description
## 1 "$coord" "Coordinates for the variables"
## 2 "$cor" "Correlations between variables and dimensions"
## 3 "$cos2" "Cos2 for the variables"
## 4 "$contrib" "contributions of the variables"
head(all_pca_var_2$coord)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## radius_mean 0.7977668 0.5579027 0.01432118 0.05827700 0.04851878
## texture_mean 0.3780132 0.1424382 -0.10835829 -0.84870380 -0.06351944
## perimeter_mean 0.8292355 0.5133487 0.01563555 0.05908501 0.04799015
## area_mean 0.8053928 0.5512695 -0.04817717 0.07520017 0.01326563
## smoothness_mean 0.5196530 -0.4440017 0.17507219 0.22430770 -0.46878427
## compactness_mean 0.8720501 -0.3623611 0.12437565 0.04474618 0.01502824
## Dim.6 Dim.7 Dim.8 Dim.9
## radius_mean 0.020592339 0.101965596 0.005144876 0.144056156
## texture_mean -0.035358035 -0.009367203 -0.090214585 -0.072767057
## perimeter_mean 0.019018481 0.094067834 0.012901209 0.144462575
## area_mean -0.002074253 0.042444540 -0.023937777 0.126284788
## smoothness_mean -0.314667668 0.115590213 0.199500717 -0.004148275
## compactness_mean -0.015527056 -0.025406278 0.104520200 0.108370831
## Dim.10 Dim.11 Dim.12 Dim.13 Dim.14
## radius_mean 0.056546476 0.022483349 0.02609749 0.005879269 0.023578980
## texture_mean 0.142679652 -0.163858215 0.13026214 0.099956785 -0.008543071
## perimeter_mean 0.051157023 0.009098538 0.01989278 0.021670182 0.019223333
## area_mean 0.044388765 0.059727362 0.03344115 0.033100452 0.004291657
## smoothness_mean -0.041034694 -0.074285011 0.16186012 0.022389467 0.176354514
## compactness_mean 0.007660737 -0.166984319 -0.05315682 0.112641659 0.003210000
## Dim.15 Dim.16 Dim.17 Dim.18
## radius_mean 0.015683967 0.04255502 0.049456533 0.033654027
## texture_mean 0.033112133 0.04460615 -0.009433423 -0.009428525
## perimeter_mean 0.012242788 0.03234470 0.047481690 0.036316100
## area_mean -0.004285246 0.03742982 0.062320398 0.061055728
## smoothness_mean 0.036248064 0.05782372 0.040927741 -0.080796547
## compactness_mean -0.070843391 -0.04809242 -0.004949378 0.001787881
## Dim.19 Dim.20 Dim.21 Dim.22
## radius_mean 0.050133570 0.008772821 0.011871307 0.01208056
## texture_mean 0.006626055 0.043094773 -0.077624778 0.01570358
## perimeter_mean 0.053294517 0.003118233 0.012078892 0.01245022
## area_mean -0.006077427 0.015912200 0.003193026 0.01616162
## smoothness_mean -0.036605301 -0.003018666 0.020687226 0.01057217
## compactness_mean 0.063221168 -0.086263038 -0.033347929 -0.01624639
## Dim.23 Dim.24 Dim.25 Dim.26
## radius_mean 1.537575e-02 0.024533003 0.002392233 0.011708590
## texture_mean 8.658821e-05 -0.013273874 -0.010544407 0.002220667
## perimeter_mean 6.278798e-03 0.015673984 -0.003361359 0.011326933
## area_mean -1.213375e-03 -0.009385446 0.026134063 -0.032801549
## smoothness_mean 3.224173e-03 -0.009230799 -0.003602676 0.003346255
## compactness_mean -8.169034e-03 0.013992577 -0.049349353 -0.023765850
## Dim.27 Dim.28 Dim.29 Dim.30
## radius_mean 0.010925793 8.419566e-03 5.786460e-03 8.101999e-03
## texture_mean 0.001441855 -2.623673e-06 -2.882534e-04 3.156545e-06
## perimeter_mean 0.009587447 3.362272e-03 1.050312e-02 -7.957621e-03
## area_mean -0.038761046 -1.086395e-02 -1.156947e-02 -3.800314e-04
## smoothness_mean -0.005789074 5.897327e-05 -9.398714e-05 -5.591303e-05
## compactness_mean -0.008119890 -2.177814e-04 -1.122394e-03 5.152947e-04
head(all_pca_var_2$cor)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## radius_mean 0.7977668 0.5579027 0.01432118 0.05827700 0.04851878
## texture_mean 0.3780132 0.1424382 -0.10835829 -0.84870380 -0.06351944
## perimeter_mean 0.8292355 0.5133487 0.01563555 0.05908501 0.04799015
## area_mean 0.8053928 0.5512695 -0.04817717 0.07520017 0.01326563
## smoothness_mean 0.5196530 -0.4440017 0.17507219 0.22430770 -0.46878427
## compactness_mean 0.8720501 -0.3623611 0.12437565 0.04474618 0.01502824
## Dim.6 Dim.7 Dim.8 Dim.9
## radius_mean 0.020592339 0.101965596 0.005144876 0.144056156
## texture_mean -0.035358035 -0.009367203 -0.090214585 -0.072767057
## perimeter_mean 0.019018481 0.094067834 0.012901209 0.144462575
## area_mean -0.002074253 0.042444540 -0.023937777 0.126284788
## smoothness_mean -0.314667668 0.115590213 0.199500717 -0.004148275
## compactness_mean -0.015527056 -0.025406278 0.104520200 0.108370831
## Dim.10 Dim.11 Dim.12 Dim.13 Dim.14
## radius_mean 0.056546476 0.022483349 0.02609749 0.005879269 0.023578980
## texture_mean 0.142679652 -0.163858215 0.13026214 0.099956785 -0.008543071
## perimeter_mean 0.051157023 0.009098538 0.01989278 0.021670182 0.019223333
## area_mean 0.044388765 0.059727362 0.03344115 0.033100452 0.004291657
## smoothness_mean -0.041034694 -0.074285011 0.16186012 0.022389467 0.176354514
## compactness_mean 0.007660737 -0.166984319 -0.05315682 0.112641659 0.003210000
## Dim.15 Dim.16 Dim.17 Dim.18
## radius_mean 0.015683967 0.04255502 0.049456533 0.033654027
## texture_mean 0.033112133 0.04460615 -0.009433423 -0.009428525
## perimeter_mean 0.012242788 0.03234470 0.047481690 0.036316100
## area_mean -0.004285246 0.03742982 0.062320398 0.061055728
## smoothness_mean 0.036248064 0.05782372 0.040927741 -0.080796547
## compactness_mean -0.070843391 -0.04809242 -0.004949378 0.001787881
## Dim.19 Dim.20 Dim.21 Dim.22
## radius_mean 0.050133570 0.008772821 0.011871307 0.01208056
## texture_mean 0.006626055 0.043094773 -0.077624778 0.01570358
## perimeter_mean 0.053294517 0.003118233 0.012078892 0.01245022
## area_mean -0.006077427 0.015912200 0.003193026 0.01616162
## smoothness_mean -0.036605301 -0.003018666 0.020687226 0.01057217
## compactness_mean 0.063221168 -0.086263038 -0.033347929 -0.01624639
## Dim.23 Dim.24 Dim.25 Dim.26
## radius_mean 1.537575e-02 0.024533003 0.002392233 0.011708590
## texture_mean 8.658821e-05 -0.013273874 -0.010544407 0.002220667
## perimeter_mean 6.278798e-03 0.015673984 -0.003361359 0.011326933
## area_mean -1.213375e-03 -0.009385446 0.026134063 -0.032801549
## smoothness_mean 3.224173e-03 -0.009230799 -0.003602676 0.003346255
## compactness_mean -8.169034e-03 0.013992577 -0.049349353 -0.023765850
## Dim.27 Dim.28 Dim.29 Dim.30
## radius_mean 0.010925793 8.419566e-03 5.786460e-03 8.101999e-03
## texture_mean 0.001441855 -2.623673e-06 -2.882534e-04 3.156545e-06
## perimeter_mean 0.009587447 3.362272e-03 1.050312e-02 -7.957621e-03
## area_mean -0.038761046 -1.086395e-02 -1.156947e-02 -3.800314e-04
## smoothness_mean -0.005789074 5.897327e-05 -9.398714e-05 -5.591303e-05
## compactness_mean -0.008119890 -2.177814e-04 -1.122394e-03 5.152947e-04
head(all_pca_var_2$cos2)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## radius_mean 0.6364318 0.31125539 0.0002050963 0.003396209 0.0023540715
## texture_mean 0.1428940 0.02028864 0.0117415199 0.720298141 0.0040347193
## perimeter_mean 0.6876316 0.26352690 0.0002444703 0.003491038 0.0023030547
## area_mean 0.6486576 0.30389811 0.0023210397 0.005655066 0.0001759769
## smoothness_mean 0.2700393 0.19713747 0.0306502709 0.050313944 0.2197586899
## compactness_mean 0.7604714 0.13130559 0.0154693024 0.002002220 0.0002258480
## Dim.6 Dim.7 Dim.8 Dim.9
## radius_mean 4.240444e-04 0.0103969827 2.646975e-05 2.075218e-02
## texture_mean 1.250191e-03 0.0000877445 8.138671e-03 5.295045e-03
## perimeter_mean 3.617026e-04 0.0088487573 1.664412e-04 2.086944e-02
## area_mean 4.302527e-06 0.0018015390 5.730172e-04 1.594785e-02
## smoothness_mean 9.901574e-02 0.0133610973 3.980054e-02 1.720819e-05
## compactness_mean 2.410895e-04 0.0006454790 1.092447e-02 1.174424e-02
## Dim.10 Dim.11 Dim.12 Dim.13
## radius_mean 3.197504e-03 5.055010e-04 0.0006810789 3.456581e-05
## texture_mean 2.035748e-02 2.684951e-02 0.0169682252 9.991359e-03
## perimeter_mean 2.617041e-03 8.278339e-05 0.0003957226 4.695968e-04
## area_mean 1.970362e-03 3.567358e-03 0.0011183106 1.095640e-03
## smoothness_mean 1.683846e-03 5.518263e-03 0.0261986970 5.012882e-04
## compactness_mean 5.868689e-05 2.788376e-02 0.0028256473 1.268814e-02
## Dim.14 Dim.15 Dim.16 Dim.17
## radius_mean 5.559683e-04 2.459868e-04 0.001810929 2.445949e-03
## texture_mean 7.298407e-05 1.096413e-03 0.001989709 8.898948e-05
## perimeter_mean 3.695365e-04 1.498859e-04 0.001046179 2.254511e-03
## area_mean 1.841832e-05 1.836333e-05 0.001400992 3.883832e-03
## smoothness_mean 3.110091e-02 1.313922e-03 0.003343582 1.675080e-03
## compactness_mean 1.030410e-05 5.018786e-03 0.002312881 2.449634e-05
## Dim.18 Dim.19 Dim.20 Dim.21
## radius_mean 1.132594e-03 2.513375e-03 7.696239e-05 1.409279e-04
## texture_mean 8.889709e-05 4.390460e-05 1.857159e-03 6.025606e-03
## perimeter_mean 1.318859e-03 2.840305e-03 9.723374e-06 1.458996e-04
## area_mean 3.727802e-03 3.693512e-05 2.531981e-04 1.019542e-05
## smoothness_mean 6.528082e-03 1.339948e-03 9.112344e-06 4.279613e-04
## compactness_mean 3.196517e-06 3.996916e-03 7.441312e-03 1.112084e-03
## Dim.22 Dim.23 Dim.24 Dim.25
## radius_mean 0.0001459399 2.364136e-04 6.018682e-04 5.722780e-06
## texture_mean 0.0002466023 7.497518e-09 1.761957e-04 1.111845e-04
## perimeter_mean 0.0001550079 3.942330e-05 2.456738e-04 1.129874e-05
## area_mean 0.0002611979 1.472279e-06 8.808659e-05 6.829892e-04
## smoothness_mean 0.0001117708 1.039529e-05 8.520764e-05 1.297927e-05
## compactness_mean 0.0002639453 6.673311e-05 1.957922e-04 2.435359e-03
## Dim.26 Dim.27 Dim.28 Dim.29
## radius_mean 1.370911e-04 1.193730e-04 7.088910e-05 3.348312e-05
## texture_mean 4.931360e-06 2.078945e-06 6.883658e-12 8.309002e-08
## perimeter_mean 1.282994e-04 9.191915e-05 1.130487e-05 1.103155e-04
## area_mean 1.075942e-03 1.502419e-03 1.180255e-04 1.338527e-04
## smoothness_mean 1.119742e-05 3.351338e-05 3.477847e-09 8.833583e-09
## compactness_mean 5.648156e-04 6.593262e-05 4.742874e-08 1.259768e-06
## Dim.30
## radius_mean 6.564239e-05
## texture_mean 9.963774e-12
## perimeter_mean 6.332372e-05
## area_mean 1.444238e-07
## smoothness_mean 3.126267e-09
## compactness_mean 2.655286e-07
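The tables printed above are closely related: `$coord` and `$cor` are identical, and each `$cos2` entry is the square of the matching coordinate (for `radius_mean` on Dim.1, 0.7977668 squared is roughly 0.6364). The sketch below reproduces these relationships by hand with base R only. It is an illustration, not the code used in this report: the built-in `USArrests` data set is a stand-in for the breast cancer data, and the names `pca`, `coord`, `cor_mat`, `cos2` and `contrib` are chosen here for the example.

```r
# Assumption: USArrests (shipped with R) stands in for the breast cancer data.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Coordinates: loadings multiplied by the standard deviation of each component.
coord <- sweep(pca$rotation, 2, pca$sdev, `*`)

# With standardized variables, the coordinates equal the correlations
# between the original variables and the component scores.
cor_mat <- cor(scale(USArrests), pca$x)

# cos2: squared coordinates (quality of representation of each variable).
cos2 <- coord^2

# contrib: share (in percent) of each variable in a component's total cos2.
contrib <- sweep(cos2, 2, colSums(cos2), `/`) * 100

print(all.equal(coord, cor_mat, check.attributes = FALSE))
print(round(rowSums(cos2), 6))    # one value per variable
print(round(colSums(contrib), 6)) # one value per component
```

When every component is kept, the cos2 values of a variable add up to 1 and the contributions within a component add up to 100, which is a quick sanity check for tables like the ones in this section.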
head(all_pca_var_2$contrib)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## radius_mean 4.791828 5.4689158 0.007278210 0.1714702 0.14278085
## texture_mean 1.075879 0.3564817 0.416669002 36.3669303 0.24471672
## perimeter_mean 5.177322 4.6303018 0.008675469 0.1762581 0.13968654
## area_mean 4.883878 5.3396446 0.082366279 0.2855170 0.01067348
## smoothness_mean 2.033182 3.4638057 1.087680124 2.5402866 13.32896332
## compactness_mean 5.725748 2.3071061 0.548956087 0.1010895 0.01369829
## Dim.6 Dim.7 Dim.8 Dim.9 Dim.10
## radius_mean 0.0351217226 1.53979162 0.005553672 4.977796701 0.91176608
## texture_mean 0.1035477523 0.01299495 1.707590993 1.270115259 5.80492242
## perimeter_mean 0.0299582265 1.31049967 0.034921362 5.005923564 0.74624745
## area_mean 0.0003563592 0.26680766 0.120225880 3.825388875 0.56184752
## smoothness_mean 8.2010352310 1.97877655 8.350630500 0.004127705 0.48014757
## compactness_mean 0.0199683716 0.09559534 2.292085472 2.817074391 0.01673453
## Dim.11 Dim.12 Dim.13 Dim.14 Dim.15
## radius_mean 0.17198842 0.2607885 0.01432142 0.354098008 0.26131291
## texture_mean 9.13510741 6.4972186 4.13965139 0.046483789 1.16472490
## perimeter_mean 0.02816569 0.1515242 0.19456483 0.235358999 0.15922443
## area_mean 1.21373501 0.4282067 0.45394900 0.011730686 0.01950745
## smoothness_mean 1.87749853 10.0316126 0.20769532 19.808272945 1.39578544
## compactness_mean 9.48699343 1.0819546 5.25699162 0.006562713 5.33147923
## Dim.16 Dim.17 Dim.18 Dim.19 Dim.20
## radius_mean 2.267551 4.1178253 2.152451027 5.07982446 0.24699572
## texture_mean 2.491408 0.1498164 0.168945538 0.08873633 5.96018948
## perimeter_mean 1.309971 3.7955343 2.506441650 5.74058961 0.03120527
## area_mean 1.754248 6.5385437 7.084545992 0.07465020 0.81258978
## smoothness_mean 4.186658 2.8200456 12.406371982 2.70819168 0.02924428
## compactness_mean 2.896068 0.0412403 0.006074859 8.07823486 23.88143277
## Dim.21 Dim.22 Dim.23 Dim.24 Dim.25
## radius_mean 0.47018457 0.5318625 9.712634e-01 3.3335252 0.03696583
## texture_mean 20.10351786 0.8987160 3.080222e-05 0.9758830 0.71818728
## perimeter_mean 0.48677194 0.5649097 1.619636e-01 1.3606960 0.07298326
## area_mean 0.03401545 0.9519081 6.048598e-03 0.4878790 4.41171292
## smoothness_mean 1.42782777 0.4073369 4.270720e-02 0.4719336 0.08383854
## compactness_mean 3.71030023 0.9619207 2.741611e-01 1.0844206 15.73099874
## Dim.26 Dim.27 Dim.28 Dim.29 Dim.30
## radius_mean 1.67641371 1.72992648 4.460291e+00 4.471552424 4.933856e+01
## texture_mean 0.06030297 0.03012762 4.331148e-07 0.011096377 7.489035e-06
## perimeter_mean 1.56890519 1.33207198 7.112943e-01 14.732247365 4.759578e+01
## area_mean 13.15711689 21.77272040 7.426079e+00 17.875554424 1.085528e-01
## smoothness_mean 0.13692728 0.48566854 2.188236e-04 0.001179694 2.349785e-03
## compactness_mean 6.90682942 0.95548094 2.984182e-03 0.168237574 1.995783e-01
all_pca_var_3 <- get_pca_var(all_pca_3)
all_pca_var_3
## Principal Component Analysis Results for variables
## ===================================================
## Name Description
## 1 "$coord" "Coordinates for the variables"
## 2 "$cor" "Correlations between variables and dimensions"
## 3 "$cos2" "Cos2 for the variables"
## 4 "$contrib" "contributions of the variables"
head(all_pca_var_3$coord)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## radius_mean 0.7977668 -0.5579027 -0.01432118 -0.05827700 -0.04851878
## texture_mean 0.3780132 -0.1424382 0.10835829 0.84870380 0.06351944
## perimeter_mean 0.8292355 -0.5133487 -0.01563555 -0.05908501 -0.04799015
## area_mean 0.8053928 -0.5512695 0.04817717 -0.07520017 -0.01326563
## smoothness_mean 0.5196530 0.4440017 -0.17507219 -0.22430770 0.46878427
## compactness_mean 0.8720501 0.3623611 -0.12437565 -0.04474618 -0.01502824
## Dim.6 Dim.7 Dim.8 Dim.9
## radius_mean -0.020592339 0.101965596 -0.005144876 0.144056156
## texture_mean 0.035358035 -0.009367203 0.090214585 -0.072767057
## perimeter_mean -0.019018481 0.094067834 -0.012901209 0.144462575
## area_mean 0.002074253 0.042444540 0.023937777 0.126284788
## smoothness_mean 0.314667668 0.115590213 -0.199500717 -0.004148275
## compactness_mean 0.015527056 -0.025406278 -0.104520200 0.108370831
## Dim.10 Dim.11 Dim.12 Dim.13
## radius_mean 0.056546476 0.022483349 0.02609749 -0.005879269
## texture_mean 0.142679652 -0.163858215 0.13026214 -0.099956785
## perimeter_mean 0.051157023 0.009098538 0.01989278 -0.021670182
## area_mean 0.044388765 0.059727362 0.03344115 -0.033100452
## smoothness_mean -0.041034694 -0.074285011 0.16186012 -0.022389467
## compactness_mean 0.007660737 -0.166984319 -0.05315682 -0.112641659
## Dim.14 Dim.15 Dim.16 Dim.17
## radius_mean -0.023578980 -0.015683967 0.04255502 0.049456533
## texture_mean 0.008543071 -0.033112133 0.04460615 -0.009433423
## perimeter_mean -0.019223333 -0.012242788 0.03234470 0.047481690
## area_mean -0.004291657 0.004285246 0.03742982 0.062320398
## smoothness_mean -0.176354514 -0.036248064 0.05782372 0.040927741
## compactness_mean -0.003210000 0.070843391 -0.04809242 -0.004949378
## Dim.18 Dim.19 Dim.20 Dim.21
## radius_mean 0.033654027 0.050133570 0.008772821 0.011871307
## texture_mean -0.009428525 0.006626055 0.043094773 -0.077624778
## perimeter_mean 0.036316100 0.053294517 0.003118233 0.012078892
## area_mean 0.061055728 -0.006077427 0.015912200 0.003193026
## smoothness_mean -0.080796547 -0.036605301 -0.003018666 0.020687226
## compactness_mean 0.001787881 0.063221168 -0.086263038 -0.033347929
## Dim.22 Dim.23 Dim.24 Dim.25
## radius_mean 0.01208056 1.537575e-02 0.024533003 0.002392233
## texture_mean 0.01570358 8.658821e-05 -0.013273874 -0.010544407
## perimeter_mean 0.01245022 6.278798e-03 0.015673984 -0.003361359
## area_mean 0.01616162 -1.213375e-03 -0.009385446 0.026134063
## smoothness_mean 0.01057217 3.224173e-03 -0.009230799 -0.003602676
## compactness_mean -0.01624639 -8.169034e-03 0.013992577 -0.049349353
## Dim.26 Dim.27 Dim.28 Dim.29
## radius_mean 0.011708590 0.010925793 8.419566e-03 5.786460e-03
## texture_mean 0.002220667 0.001441855 -2.623673e-06 -2.882534e-04
## perimeter_mean 0.011326933 0.009587447 3.362272e-03 1.050312e-02
## area_mean -0.032801549 -0.038761046 -1.086395e-02 -1.156947e-02
## smoothness_mean 0.003346255 -0.005789074 5.897327e-05 -9.398714e-05
## compactness_mean -0.023765850 -0.008119890 -2.177814e-04 -1.122394e-03
## Dim.30
## radius_mean 8.101999e-03
## texture_mean 3.156545e-06
## perimeter_mean -7.957621e-03
## area_mean -3.800314e-04
## smoothness_mean -5.591303e-05
## compactness_mean 5.152947e-04
head(all_pca_var_3$cor)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## radius_mean 0.7977668 -0.5579027 -0.01432118 -0.05827700 -0.04851878
## texture_mean 0.3780132 -0.1424382 0.10835829 0.84870380 0.06351944
## perimeter_mean 0.8292355 -0.5133487 -0.01563555 -0.05908501 -0.04799015
## area_mean 0.8053928 -0.5512695 0.04817717 -0.07520017 -0.01326563
## smoothness_mean 0.5196530 0.4440017 -0.17507219 -0.22430770 0.46878427
## compactness_mean 0.8720501 0.3623611 -0.12437565 -0.04474618 -0.01502824
## Dim.6 Dim.7 Dim.8 Dim.9
## radius_mean -0.020592339 0.101965596 -0.005144876 0.144056156
## texture_mean 0.035358035 -0.009367203 0.090214585 -0.072767057
## perimeter_mean -0.019018481 0.094067834 -0.012901209 0.144462575
## area_mean 0.002074253 0.042444540 0.023937777 0.126284788
## smoothness_mean 0.314667668 0.115590213 -0.199500717 -0.004148275
## compactness_mean 0.015527056 -0.025406278 -0.104520200 0.108370831
## Dim.10 Dim.11 Dim.12 Dim.13
## radius_mean 0.056546476 0.022483349 0.02609749 -0.005879269
## texture_mean 0.142679652 -0.163858215 0.13026214 -0.099956785
## perimeter_mean 0.051157023 0.009098538 0.01989278 -0.021670182
## area_mean 0.044388765 0.059727362 0.03344115 -0.033100452
## smoothness_mean -0.041034694 -0.074285011 0.16186012 -0.022389467
## compactness_mean 0.007660737 -0.166984319 -0.05315682 -0.112641659
## Dim.14 Dim.15 Dim.16 Dim.17
## radius_mean -0.023578980 -0.015683967 0.04255502 0.049456533
## texture_mean 0.008543071 -0.033112133 0.04460615 -0.009433423
## perimeter_mean -0.019223333 -0.012242788 0.03234470 0.047481690
## area_mean -0.004291657 0.004285246 0.03742982 0.062320398
## smoothness_mean -0.176354514 -0.036248064 0.05782372 0.040927741
## compactness_mean -0.003210000 0.070843391 -0.04809242 -0.004949378
## Dim.18 Dim.19 Dim.20 Dim.21
## radius_mean 0.033654027 0.050133570 0.008772821 0.011871307
## texture_mean -0.009428525 0.006626055 0.043094773 -0.077624778
## perimeter_mean 0.036316100 0.053294517 0.003118233 0.012078892
## area_mean 0.061055728 -0.006077427 0.015912200 0.003193026
## smoothness_mean -0.080796547 -0.036605301 -0.003018666 0.020687226
## compactness_mean 0.001787881 0.063221168 -0.086263038 -0.033347929
## Dim.22 Dim.23 Dim.24 Dim.25
## radius_mean 0.01208056 1.537575e-02 0.024533003 0.002392233
## texture_mean 0.01570358 8.658821e-05 -0.013273874 -0.010544407
## perimeter_mean 0.01245022 6.278798e-03 0.015673984 -0.003361359
## area_mean 0.01616162 -1.213375e-03 -0.009385446 0.026134063
## smoothness_mean 0.01057217 3.224173e-03 -0.009230799 -0.003602676
## compactness_mean -0.01624639 -8.169034e-03 0.013992577 -0.049349353
## Dim.26 Dim.27 Dim.28 Dim.29
## radius_mean 0.011708590 0.010925793 8.419566e-03 5.786460e-03
## texture_mean 0.002220667 0.001441855 -2.623673e-06 -2.882534e-04
## perimeter_mean 0.011326933 0.009587447 3.362272e-03 1.050312e-02
## area_mean -0.032801549 -0.038761046 -1.086395e-02 -1.156947e-02
## smoothness_mean 0.003346255 -0.005789074 5.897327e-05 -9.398714e-05
## compactness_mean -0.023765850 -0.008119890 -2.177814e-04 -1.122394e-03
## Dim.30
## radius_mean 8.101999e-03
## texture_mean 3.156545e-06
## perimeter_mean -7.957621e-03
## area_mean -3.800314e-04
## smoothness_mean -5.591303e-05
## compactness_mean 5.152947e-04
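Compared with `all_pca_var_2`, several columns here (Dim.2 to Dim.6, for instance) carry the opposite sign while their absolute values are unchanged. This is expected: a principal component is only defined up to its sign, so different PCA routines may return mirrored axes without changing cos2 or the contributions. The sketch below illustrates the point with two base R routes to the same decomposition. It is an assumption-laden example: `USArrests` stands in for the breast cancer data, and the two routes shown are not claimed to be the ones behind `all_pca_2` and `all_pca_3`.

```r
# Assumption: USArrests (shipped with R) stands in for the breast cancer data.
X <- scale(USArrests)

# Route 1: singular value decomposition, as performed by prcomp().
p_svd <- prcomp(X)
coord_svd <- sweep(p_svd$rotation, 2, p_svd$sdev, `*`)

# Route 2: eigendecomposition of the correlation matrix.
e <- eigen(cor(USArrests))
coord_eig <- sweep(e$vectors, 2, sqrt(e$values), `*`)

# Individual columns may be mirrored between the two routes,
# but the squared coordinates (cos2) agree up to rounding error.
sign_match <- sign(coord_svd[1, ]) == sign(coord_eig[1, ])
max_cos2_diff <- max(abs(coord_svd^2 - coord_eig^2))

print(sign_match)
print(max_cos2_diff)
```

Because only the orientation of an axis changes, plots built from mirrored coordinates are reflections of each other and lead to the same interpretation.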
head(all_pca_var_3$cos2)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## radius_mean 0.6364318 0.31125539 0.0002050963 0.003396209 0.0023540715
## texture_mean 0.1428940 0.02028864 0.0117415199 0.720298141 0.0040347193
## perimeter_mean 0.6876316 0.26352690 0.0002444703 0.003491038 0.0023030547
## area_mean 0.6486576 0.30389811 0.0023210397 0.005655066 0.0001759769
## smoothness_mean 0.2700393 0.19713747 0.0306502709 0.050313944 0.2197586899
## compactness_mean 0.7604714 0.13130559 0.0154693024 0.002002220 0.0002258480
## Dim.6 Dim.7 Dim.8 Dim.9
## radius_mean 4.240444e-04 0.0103969827 2.646975e-05 2.075218e-02
## texture_mean 1.250191e-03 0.0000877445 8.138671e-03 5.295045e-03
## perimeter_mean 3.617026e-04 0.0088487573 1.664412e-04 2.086944e-02
## area_mean 4.302527e-06 0.0018015390 5.730172e-04 1.594785e-02
## smoothness_mean 9.901574e-02 0.0133610973 3.980054e-02 1.720819e-05
## compactness_mean 2.410895e-04 0.0006454790 1.092447e-02 1.174424e-02
## Dim.10 Dim.11 Dim.12 Dim.13
## radius_mean 3.197504e-03 5.055010e-04 0.0006810789 3.456581e-05
## texture_mean 2.035748e-02 2.684951e-02 0.0169682252 9.991359e-03
## perimeter_mean 2.617041e-03 8.278339e-05 0.0003957226 4.695968e-04
## area_mean 1.970362e-03 3.567358e-03 0.0011183106 1.095640e-03
## smoothness_mean 1.683846e-03 5.518263e-03 0.0261986970 5.012882e-04
## compactness_mean 5.868689e-05 2.788376e-02 0.0028256473 1.268814e-02
## Dim.14 Dim.15 Dim.16 Dim.17
## radius_mean 5.559683e-04 2.459868e-04 0.001810929 2.445949e-03
## texture_mean 7.298407e-05 1.096413e-03 0.001989709 8.898948e-05
## perimeter_mean 3.695365e-04 1.498859e-04 0.001046179 2.254511e-03
## area_mean 1.841832e-05 1.836333e-05 0.001400992 3.883832e-03
## smoothness_mean 3.110091e-02 1.313922e-03 0.003343582 1.675080e-03
## compactness_mean 1.030410e-05 5.018786e-03 0.002312881 2.449634e-05
## Dim.18 Dim.19 Dim.20 Dim.21
## radius_mean 1.132594e-03 2.513375e-03 7.696239e-05 1.409279e-04
## texture_mean 8.889709e-05 4.390460e-05 1.857159e-03 6.025606e-03
## perimeter_mean 1.318859e-03 2.840305e-03 9.723374e-06 1.458996e-04
## area_mean 3.727802e-03 3.693512e-05 2.531981e-04 1.019542e-05
## smoothness_mean 6.528082e-03 1.339948e-03 9.112344e-06 4.279613e-04
## compactness_mean 3.196517e-06 3.996916e-03 7.441312e-03 1.112084e-03
## Dim.22 Dim.23 Dim.24 Dim.25
## radius_mean 0.0001459399 2.364136e-04 6.018682e-04 5.722780e-06
## texture_mean 0.0002466023 7.497518e-09 1.761957e-04 1.111845e-04
## perimeter_mean 0.0001550079 3.942330e-05 2.456738e-04 1.129874e-05
## area_mean 0.0002611979 1.472279e-06 8.808659e-05 6.829892e-04
## smoothness_mean 0.0001117708 1.039529e-05 8.520764e-05 1.297927e-05
## compactness_mean 0.0002639453 6.673311e-05 1.957922e-04 2.435359e-03
## Dim.26 Dim.27 Dim.28 Dim.29
## radius_mean 1.370911e-04 1.193730e-04 7.088910e-05 3.348312e-05
## texture_mean 4.931360e-06 2.078945e-06 6.883658e-12 8.309002e-08
## perimeter_mean 1.282994e-04 9.191915e-05 1.130487e-05 1.103155e-04
## area_mean 1.075942e-03 1.502419e-03 1.180255e-04 1.338527e-04
## smoothness_mean 1.119742e-05 3.351338e-05 3.477847e-09 8.833583e-09
## compactness_mean 5.648156e-04 6.593262e-05 4.742874e-08 1.259768e-06
## Dim.30
## radius_mean 6.564239e-05
## texture_mean 9.963774e-12
## perimeter_mean 6.332372e-05
## area_mean 1.444238e-07
## smoothness_mean 3.126267e-09
## compactness_mean 2.655286e-07
head(all_pca_var_3$contrib)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## radius_mean 4.791828 5.4689158 0.007278210 0.1714702 0.14278085
## texture_mean 1.075879 0.3564817 0.416669002 36.3669303 0.24471672
## perimeter_mean 5.177322 4.6303018 0.008675469 0.1762581 0.13968654
## area_mean 4.883878 5.3396446 0.082366279 0.2855170 0.01067348
## smoothness_mean 2.033182 3.4638057 1.087680124 2.5402866 13.32896332
## compactness_mean 5.725748 2.3071061 0.548956087 0.1010895 0.01369829
## Dim.6 Dim.7 Dim.8 Dim.9 Dim.10
## radius_mean 0.0351217226 1.53979162 0.005553672 4.977796701 0.91176608
## texture_mean 0.1035477523 0.01299495 1.707590993 1.270115259 5.80492242
## perimeter_mean 0.0299582265 1.31049967 0.034921362 5.005923564 0.74624745
## area_mean 0.0003563592 0.26680766 0.120225880 3.825388875 0.56184752
## smoothness_mean 8.2010352310 1.97877655 8.350630500 0.004127705 0.48014757
## compactness_mean 0.0199683716 0.09559534 2.292085472 2.817074391 0.01673453
## Dim.11 Dim.12 Dim.13 Dim.14 Dim.15
## radius_mean 0.17198842 0.2607885 0.01432142 0.354098008 0.26131291
## texture_mean 9.13510741 6.4972186 4.13965139 0.046483789 1.16472490
## perimeter_mean 0.02816569 0.1515242 0.19456483 0.235358999 0.15922443
## area_mean 1.21373501 0.4282067 0.45394900 0.011730686 0.01950745
## smoothness_mean 1.87749853 10.0316126 0.20769532 19.808272945 1.39578544
## compactness_mean 9.48699343 1.0819546 5.25699162 0.006562713 5.33147923
## Dim.16 Dim.17 Dim.18 Dim.19 Dim.20
## radius_mean 2.267551 4.1178253 2.152451027 5.07982446 0.24699572
## texture_mean 2.491408 0.1498164 0.168945538 0.08873633 5.96018948
## perimeter_mean 1.309971 3.7955343 2.506441650 5.74058961 0.03120527
## area_mean 1.754248 6.5385437 7.084545992 0.07465020 0.81258978
## smoothness_mean 4.186658 2.8200456 12.406371982 2.70819168 0.02924428
## compactness_mean 2.896068 0.0412403 0.006074859 8.07823486 23.88143277
## Dim.21 Dim.22 Dim.23 Dim.24 Dim.25
## radius_mean 0.47018457 0.5318625 9.712634e-01 3.3335252 0.03696583
## texture_mean 20.10351786 0.8987160 3.080222e-05 0.9758830 0.71818728
## perimeter_mean 0.48677194 0.5649097 1.619636e-01 1.3606960 0.07298326
## area_mean 0.03401545 0.9519081 6.048598e-03 0.4878790 4.41171292
## smoothness_mean 1.42782777 0.4073369 4.270720e-02 0.4719336 0.08383854
## compactness_mean 3.71030023 0.9619207 2.741611e-01 1.0844206 15.73099874
## Dim.26 Dim.27 Dim.28 Dim.29 Dim.30
## radius_mean 1.67641371 1.72992648 4.460291e+00 4.471552424 4.933856e+01
## texture_mean 0.06030297 0.03012762 4.331148e-07 0.011096377 7.489035e-06
## perimeter_mean 1.56890519 1.33207198 7.112943e-01 14.732247365 4.759578e+01
## area_mean 13.15711689 21.77272040 7.426079e+00 17.875554424 1.085528e-01
## smoothness_mean 0.13692728 0.48566854 2.188236e-04 0.001179694 2.349785e-03
## compactness_mean 6.90682942 0.95548094 2.984182e-03 0.168237574 1.995783e-01
all_pca_var_4 <- get_pca_var(all_pca_4)
all_pca_var_4
## Principal Component Analysis Results for variables
## ===================================================
## Name Description
## 1 "$coord" "Coordinates for the variables"
## 2 "$cor" "Correlations between variables and dimensions"
## 3 "$cos2" "Cos2 for the variables"
## 4 "$contrib" "contributions of the variables"
head(all_pca_var_4$coord)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## radius_mean -0.7977668 -0.5579027 -0.01432118 0.05827700 0.04851878
## texture_mean -0.3780132 -0.1424382 0.10835829 -0.84870380 -0.06351944
## perimeter_mean -0.8292355 -0.5133487 -0.01563555 0.05908501 0.04799015
## area_mean -0.8053928 -0.5512695 0.04817717 0.07520017 0.01326563
## smoothness_mean -0.5196530 0.4440017 -0.17507219 0.22430770 -0.46878427
## compactness_mean -0.8720501 0.3623611 -0.12437565 0.04474618 0.01502824
## Dim.6 Dim.7 Dim.8 Dim.9
## radius_mean 0.020592339 -0.101965596 0.005144876 -0.144056156
## texture_mean -0.035358035 0.009367203 -0.090214585 0.072767057
## perimeter_mean 0.019018481 -0.094067834 0.012901209 -0.144462575
## area_mean -0.002074253 -0.042444540 -0.023937777 -0.126284788
## smoothness_mean -0.314667668 -0.115590213 0.199500717 0.004148275
## compactness_mean -0.015527056 0.025406278 0.104520200 -0.108370831
## Dim.10 Dim.11 Dim.12 Dim.13 Dim.14
## radius_mean 0.056546476 0.022483349 0.02609749 0.005879269 -0.023578980
## texture_mean 0.142679652 -0.163858215 0.13026214 0.099956785 0.008543071
## perimeter_mean 0.051157023 0.009098538 0.01989278 0.021670182 -0.019223333
## area_mean 0.044388765 0.059727362 0.03344115 0.033100452 -0.004291657
## smoothness_mean -0.041034694 -0.074285011 0.16186012 0.022389467 -0.176354514
## compactness_mean 0.007660737 -0.166984319 -0.05315682 0.112641659 -0.003210000
## Dim.15 Dim.16 Dim.17 Dim.18
## radius_mean 0.015683967 0.04255502 -0.049456533 -0.033654027
## texture_mean 0.033112133 0.04460615 0.009433423 0.009428525
## perimeter_mean 0.012242788 0.03234470 -0.047481690 -0.036316100
## area_mean -0.004285246 0.03742982 -0.062320398 -0.061055728
## smoothness_mean 0.036248064 0.05782372 -0.040927741 0.080796547
## compactness_mean -0.070843391 -0.04809242 0.004949378 -0.001787881
## Dim.19 Dim.20 Dim.21 Dim.22
## radius_mean 0.050133570 -0.008772821 -0.011871307 0.01208056
## texture_mean 0.006626055 -0.043094773 0.077624778 0.01570358
## perimeter_mean 0.053294517 -0.003118233 -0.012078892 0.01245022
## area_mean -0.006077427 -0.015912200 -0.003193026 0.01616162
## smoothness_mean -0.036605301 0.003018666 -0.020687226 0.01057217
## compactness_mean 0.063221168 0.086263038 0.033347929 -0.01624639
## Dim.23 Dim.24 Dim.25 Dim.26
## radius_mean -1.537575e-02 0.024533003 -0.002392233 -0.011708590
## texture_mean -8.658821e-05 -0.013273874 0.010544407 -0.002220667
## perimeter_mean -6.278798e-03 0.015673984 0.003361359 -0.011326933
## area_mean 1.213375e-03 -0.009385446 -0.026134063 0.032801549
## smoothness_mean -3.224173e-03 -0.009230799 0.003602676 -0.003346255
## compactness_mean 8.169034e-03 0.013992577 0.049349353 0.023765850
## Dim.27 Dim.28 Dim.29 Dim.30
## radius_mean 0.010925793 -8.419566e-03 5.786460e-03 8.101999e-03
## texture_mean 0.001441855 2.623673e-06 -2.882534e-04 3.156545e-06
## perimeter_mean 0.009587447 -3.362272e-03 1.050312e-02 -7.957621e-03
## area_mean -0.038761046 1.086395e-02 -1.156947e-02 -3.800314e-04
## smoothness_mean -0.005789074 -5.897327e-05 -9.398714e-05 -5.591303e-05
## compactness_mean -0.008119890 2.177814e-04 -1.122394e-03 5.152947e-04
head(all_pca_var_4$cor)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## radius_mean -0.7977668 -0.5579027 -0.01432118 0.05827700 0.04851878
## texture_mean -0.3780132 -0.1424382 0.10835829 -0.84870380 -0.06351944
## perimeter_mean -0.8292355 -0.5133487 -0.01563555 0.05908501 0.04799015
## area_mean -0.8053928 -0.5512695 0.04817717 0.07520017 0.01326563
## smoothness_mean -0.5196530 0.4440017 -0.17507219 0.22430770 -0.46878427
## compactness_mean -0.8720501 0.3623611 -0.12437565 0.04474618 0.01502824
## Dim.6 Dim.7 Dim.8 Dim.9
## radius_mean 0.020592339 -0.101965596 0.005144876 -0.144056156
## texture_mean -0.035358035 0.009367203 -0.090214585 0.072767057
## perimeter_mean 0.019018481 -0.094067834 0.012901209 -0.144462575
## area_mean -0.002074253 -0.042444540 -0.023937777 -0.126284788
## smoothness_mean -0.314667668 -0.115590213 0.199500717 0.004148275
## compactness_mean -0.015527056 0.025406278 0.104520200 -0.108370831
## Dim.10 Dim.11 Dim.12 Dim.13 Dim.14
## radius_mean 0.056546476 0.022483349 0.02609749 0.005879269 -0.023578980
## texture_mean 0.142679652 -0.163858215 0.13026214 0.099956785 0.008543071
## perimeter_mean 0.051157023 0.009098538 0.01989278 0.021670182 -0.019223333
## area_mean 0.044388765 0.059727362 0.03344115 0.033100452 -0.004291657
## smoothness_mean -0.041034694 -0.074285011 0.16186012 0.022389467 -0.176354514
## compactness_mean 0.007660737 -0.166984319 -0.05315682 0.112641659 -0.003210000
## Dim.15 Dim.16 Dim.17 Dim.18
## radius_mean 0.015683967 0.04255502 -0.049456533 -0.033654027
## texture_mean 0.033112133 0.04460615 0.009433423 0.009428525
## perimeter_mean 0.012242788 0.03234470 -0.047481690 -0.036316100
## area_mean -0.004285246 0.03742982 -0.062320398 -0.061055728
## smoothness_mean 0.036248064 0.05782372 -0.040927741 0.080796547
## compactness_mean -0.070843391 -0.04809242 0.004949378 -0.001787881
## Dim.19 Dim.20 Dim.21 Dim.22
## radius_mean 0.050133570 -0.008772821 -0.011871307 0.01208056
## texture_mean 0.006626055 -0.043094773 0.077624778 0.01570358
## perimeter_mean 0.053294517 -0.003118233 -0.012078892 0.01245022
## area_mean -0.006077427 -0.015912200 -0.003193026 0.01616162
## smoothness_mean -0.036605301 0.003018666 -0.020687226 0.01057217
## compactness_mean 0.063221168 0.086263038 0.033347929 -0.01624639
## Dim.23 Dim.24 Dim.25 Dim.26
## radius_mean -1.537575e-02 0.024533003 -0.002392233 -0.011708590
## texture_mean -8.658821e-05 -0.013273874 0.010544407 -0.002220667
## perimeter_mean -6.278798e-03 0.015673984 0.003361359 -0.011326933
## area_mean 1.213375e-03 -0.009385446 -0.026134063 0.032801549
## smoothness_mean -3.224173e-03 -0.009230799 0.003602676 -0.003346255
## compactness_mean 8.169034e-03 0.013992577 0.049349353 0.023765850
## Dim.27 Dim.28 Dim.29 Dim.30
## radius_mean 0.010925793 -8.419566e-03 5.786460e-03 8.101999e-03
## texture_mean 0.001441855 2.623673e-06 -2.882534e-04 3.156545e-06
## perimeter_mean 0.009587447 -3.362272e-03 1.050312e-02 -7.957621e-03
## area_mean -0.038761046 1.086395e-02 -1.156947e-02 -3.800314e-04
## smoothness_mean -0.005789074 -5.897327e-05 -9.398714e-05 -5.591303e-05
## compactness_mean -0.008119890 2.177814e-04 -1.122394e-03 5.152947e-04
head(all_pca_var_4$cos2)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## radius_mean 0.6364318 0.31125539 0.0002050963 0.003396209 0.0023540715
## texture_mean 0.1428940 0.02028864 0.0117415199 0.720298141 0.0040347193
## perimeter_mean 0.6876316 0.26352690 0.0002444703 0.003491038 0.0023030547
## area_mean 0.6486576 0.30389811 0.0023210397 0.005655066 0.0001759769
## smoothness_mean 0.2700393 0.19713747 0.0306502709 0.050313944 0.2197586899
## compactness_mean 0.7604714 0.13130559 0.0154693024 0.002002220 0.0002258480
## Dim.6 Dim.7 Dim.8 Dim.9
## radius_mean 4.240444e-04 0.0103969827 2.646975e-05 2.075218e-02
## texture_mean 1.250191e-03 0.0000877445 8.138671e-03 5.295045e-03
## perimeter_mean 3.617026e-04 0.0088487573 1.664412e-04 2.086944e-02
## area_mean 4.302527e-06 0.0018015390 5.730172e-04 1.594785e-02
## smoothness_mean 9.901574e-02 0.0133610973 3.980054e-02 1.720819e-05
## compactness_mean 2.410895e-04 0.0006454790 1.092447e-02 1.174424e-02
## Dim.10 Dim.11 Dim.12 Dim.13
## radius_mean 3.197504e-03 5.055010e-04 0.0006810789 3.456581e-05
## texture_mean 2.035748e-02 2.684951e-02 0.0169682252 9.991359e-03
## perimeter_mean 2.617041e-03 8.278339e-05 0.0003957226 4.695968e-04
## area_mean 1.970362e-03 3.567358e-03 0.0011183106 1.095640e-03
## smoothness_mean 1.683846e-03 5.518263e-03 0.0261986970 5.012882e-04
## compactness_mean 5.868689e-05 2.788376e-02 0.0028256473 1.268814e-02
## Dim.14 Dim.15 Dim.16 Dim.17
## radius_mean 5.559683e-04 2.459868e-04 0.001810929 2.445949e-03
## texture_mean 7.298407e-05 1.096413e-03 0.001989709 8.898948e-05
## perimeter_mean 3.695365e-04 1.498859e-04 0.001046179 2.254511e-03
## area_mean 1.841832e-05 1.836333e-05 0.001400992 3.883832e-03
## smoothness_mean 3.110091e-02 1.313922e-03 0.003343582 1.675080e-03
## compactness_mean 1.030410e-05 5.018786e-03 0.002312881 2.449634e-05
## Dim.18 Dim.19 Dim.20 Dim.21
## radius_mean 1.132594e-03 2.513375e-03 7.696239e-05 1.409279e-04
## texture_mean 8.889709e-05 4.390460e-05 1.857159e-03 6.025606e-03
## perimeter_mean 1.318859e-03 2.840305e-03 9.723374e-06 1.458996e-04
## area_mean 3.727802e-03 3.693512e-05 2.531981e-04 1.019542e-05
## smoothness_mean 6.528082e-03 1.339948e-03 9.112344e-06 4.279613e-04
## compactness_mean 3.196517e-06 3.996916e-03 7.441312e-03 1.112084e-03
## Dim.22 Dim.23 Dim.24 Dim.25
## radius_mean 0.0001459399 2.364136e-04 6.018682e-04 5.722780e-06
## texture_mean 0.0002466023 7.497518e-09 1.761957e-04 1.111845e-04
## perimeter_mean 0.0001550079 3.942330e-05 2.456738e-04 1.129874e-05
## area_mean 0.0002611979 1.472279e-06 8.808659e-05 6.829892e-04
## smoothness_mean 0.0001117708 1.039529e-05 8.520764e-05 1.297927e-05
## compactness_mean 0.0002639453 6.673311e-05 1.957922e-04 2.435359e-03
## Dim.26 Dim.27 Dim.28 Dim.29
## radius_mean 1.370911e-04 1.193730e-04 7.088910e-05 3.348312e-05
## texture_mean 4.931360e-06 2.078945e-06 6.883658e-12 8.309002e-08
## perimeter_mean 1.282994e-04 9.191915e-05 1.130487e-05 1.103155e-04
## area_mean 1.075942e-03 1.502419e-03 1.180255e-04 1.338527e-04
## smoothness_mean 1.119742e-05 3.351338e-05 3.477847e-09 8.833583e-09
## compactness_mean 5.648156e-04 6.593262e-05 4.742874e-08 1.259768e-06
## Dim.30
## radius_mean 6.564239e-05
## texture_mean 9.963774e-12
## perimeter_mean 6.332372e-05
## area_mean 1.444238e-07
## smoothness_mean 3.126267e-09
## compactness_mean 2.655286e-07
head(all_pca_var_4$contrib)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## radius_mean 4.791828 5.4689158 0.007278210 0.1714702 0.14278085
## texture_mean 1.075879 0.3564817 0.416669002 36.3669303 0.24471672
## perimeter_mean 5.177322 4.6303018 0.008675469 0.1762581 0.13968654
## area_mean 4.883878 5.3396446 0.082366279 0.2855170 0.01067348
## smoothness_mean 2.033182 3.4638057 1.087680124 2.5402866 13.32896332
## compactness_mean 5.725748 2.3071061 0.548956087 0.1010895 0.01369829
## Dim.6 Dim.7 Dim.8 Dim.9 Dim.10
## radius_mean 0.0351217226 1.53979162 0.005553672 4.977796701 0.91176608
## texture_mean 0.1035477523 0.01299495 1.707590993 1.270115259 5.80492242
## perimeter_mean 0.0299582265 1.31049967 0.034921362 5.005923564 0.74624745
## area_mean 0.0003563592 0.26680766 0.120225880 3.825388875 0.56184752
## smoothness_mean 8.2010352310 1.97877655 8.350630500 0.004127705 0.48014757
## compactness_mean 0.0199683716 0.09559534 2.292085472 2.817074391 0.01673453
## Dim.11 Dim.12 Dim.13 Dim.14 Dim.15
## radius_mean 0.17198842 0.2607885 0.01432142 0.354098008 0.26131291
## texture_mean 9.13510741 6.4972186 4.13965139 0.046483789 1.16472490
## perimeter_mean 0.02816569 0.1515242 0.19456483 0.235358999 0.15922443
## area_mean 1.21373501 0.4282067 0.45394900 0.011730686 0.01950745
## smoothness_mean 1.87749853 10.0316126 0.20769532 19.808272945 1.39578544
## compactness_mean 9.48699343 1.0819546 5.25699162 0.006562713 5.33147923
## Dim.16 Dim.17 Dim.18 Dim.19 Dim.20
## radius_mean 2.267551 4.1178253 2.152451027 5.07982446 0.24699572
## texture_mean 2.491408 0.1498164 0.168945538 0.08873633 5.96018948
## perimeter_mean 1.309971 3.7955343 2.506441650 5.74058961 0.03120527
## area_mean 1.754248 6.5385437 7.084545992 0.07465020 0.81258978
## smoothness_mean 4.186658 2.8200456 12.406371982 2.70819168 0.02924428
## compactness_mean 2.896068 0.0412403 0.006074859 8.07823486 23.88143277
## Dim.21 Dim.22 Dim.23 Dim.24 Dim.25
## radius_mean 0.47018457 0.5318625 9.712634e-01 3.3335252 0.03696583
## texture_mean 20.10351786 0.8987160 3.080222e-05 0.9758830 0.71818728
## perimeter_mean 0.48677194 0.5649097 1.619636e-01 1.3606960 0.07298326
## area_mean 0.03401545 0.9519081 6.048598e-03 0.4878790 4.41171292
## smoothness_mean 1.42782777 0.4073369 4.270720e-02 0.4719336 0.08383854
## compactness_mean 3.71030023 0.9619207 2.741611e-01 1.0844206 15.73099874
## Dim.26 Dim.27 Dim.28 Dim.29 Dim.30
## radius_mean 1.67641371 1.72992648 4.460291e+00 4.471552424 4.933856e+01
## texture_mean 0.06030297 0.03012762 4.331148e-07 0.011096377 7.489035e-06
## perimeter_mean 1.56890519 1.33207198 7.112943e-01 14.732247365 4.759578e+01
## area_mean 13.15711689 21.77272040 7.426079e+00 17.875554424 1.085528e-01
## smoothness_mean 0.13692728 0.48566854 2.188236e-04 0.001179694 2.349785e-03
## compactness_mean 6.90682942 0.95548094 2.984182e-03 0.168237574 1.995783e-01
all_pca_var_5 <- get_pca_var(all_pca_5)
all_pca_var_5
## Principal Component Analysis Results for variables
## ===================================================
## Name Description
## 1 "$coord" "Coordinates for the variables"
## 2 "$cor" "Correlations between variables and dimensions"
## 3 "$cos2" "Cos2 for the variables"
## 4 "$contrib" "contributions of the variables"
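The four elements listed above are closely related, and the relationship can be reproduced by hand. The following is a minimal sketch in base R, assuming the usual definitions of these quantities; the built-in USArrests dataset and the names var_coord, var_cor, var_cos2 and var_contrib are stand-ins chosen for illustration, and whether get_pca_var() follows exactly these steps internally is an assumption. The pattern is at least consistent with the output in this section, where the squared coordinate of radius_mean on the first dimension (0.7977668^2) matches its cos2 (0.6364318).

```r
# Illustrative sketch: USArrests is a stand-in dataset, not the project's data.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Coordinates of the variables: loadings scaled by the standard deviation
# of each principal component.
var_coord <- sweep(pca$rotation, 2, pca$sdev, "*")

# For a PCA on standardized data, these coordinates equal the correlations
# between the original variables and the principal components.
var_cor <- cor(USArrests, pca$x)

# cos2: squared coordinates (quality of representation of each variable).
var_cos2 <- var_coord^2

# contrib: percentage share of each variable in the cos2 total of a dimension.
var_contrib <- sweep(var_cos2, 2, colSums(var_cos2), "/") * 100

round(var_contrib, 3)
```

When every component is kept, each row of var_cos2 sums to 1 and each column of var_contrib sums to 100, which is a quick sanity check for tables such as the ones shown here.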
head(all_pca_var_5$coord)
## [,1] [,2] [,3] [,4] [,5]
## radius_mean -0.7977668 0.5579027 -0.01432118 0.05827700 -0.04851878
## texture_mean -0.3780132 0.1424382 0.10835829 -0.84870380 0.06351944
## perimeter_mean -0.8292355 0.5133487 -0.01563555 0.05908501 -0.04799015
## area_mean -0.8053928 0.5512695 0.04817717 0.07520017 -0.01326563
## smoothness_mean -0.5196530 -0.4440017 -0.17507219 0.22430770 0.46878427
## compactness_mean -0.8720501 -0.3623611 -0.12437565 0.04474618 -0.01502824
## [,6] [,7] [,8] [,9]
## radius_mean 0.020592339 -0.101965596 0.005144876 -0.144056156
## texture_mean -0.035358035 0.009367203 -0.090214585 0.072767057
## perimeter_mean 0.019018481 -0.094067834 0.012901209 -0.144462575
## area_mean -0.002074253 -0.042444540 -0.023937777 -0.126284788
## smoothness_mean -0.314667668 -0.115590213 0.199500717 0.004148275
## compactness_mean -0.015527056 0.025406278 0.104520200 -0.108370831
## [,10] [,11] [,12] [,13] [,14]
## radius_mean 0.056546476 -0.022483349 0.02609749 0.005879269 0.023578980
## texture_mean 0.142679652 0.163858215 0.13026214 0.099956785 -0.008543071
## perimeter_mean 0.051157023 -0.009098538 0.01989278 0.021670182 0.019223333
## area_mean 0.044388765 -0.059727362 0.03344115 0.033100452 0.004291657
## smoothness_mean -0.041034694 0.074285011 0.16186012 0.022389467 0.176354514
## compactness_mean 0.007660737 0.166984319 -0.05315682 0.112641659 0.003210000
## [,15] [,16] [,17] [,18]
## radius_mean -0.015683967 -0.04255502 0.049456533 0.033654027
## texture_mean -0.033112133 -0.04460615 -0.009433423 -0.009428525
## perimeter_mean -0.012242788 -0.03234470 0.047481690 0.036316100
## area_mean 0.004285246 -0.03742982 0.062320398 0.061055728
## smoothness_mean -0.036248064 -0.05782372 0.040927741 -0.080796547
## compactness_mean 0.070843391 0.04809242 -0.004949378 0.001787881
## [,19] [,20] [,21] [,22]
## radius_mean 0.050133570 -0.008772821 -0.011871307 -0.01208056
## texture_mean 0.006626055 -0.043094773 0.077624778 -0.01570358
## perimeter_mean 0.053294517 -0.003118233 -0.012078892 -0.01245022
## area_mean -0.006077427 -0.015912200 -0.003193026 -0.01616162
## smoothness_mean -0.036605301 0.003018666 -0.020687226 -0.01057217
## compactness_mean 0.063221168 0.086263038 0.033347929 0.01624639
## [,23] [,24] [,25] [,26]
## radius_mean -1.537575e-02 -0.024533003 -0.002392233 -0.011708590
## texture_mean -8.658821e-05 0.013273874 0.010544407 -0.002220667
## perimeter_mean -6.278798e-03 -0.015673984 0.003361359 -0.011326933
## area_mean 1.213375e-03 0.009385446 -0.026134063 0.032801549
## smoothness_mean -3.224173e-03 0.009230799 0.003602676 -0.003346255
## compactness_mean 8.169034e-03 -0.013992577 0.049349353 0.023765850
## [,27] [,28] [,29] [,30]
## radius_mean -0.010925793 8.419566e-03 5.786460e-03 8.101999e-03
## texture_mean -0.001441855 -2.623673e-06 -2.882534e-04 3.156545e-06
## perimeter_mean -0.009587447 3.362272e-03 1.050312e-02 -7.957621e-03
## area_mean 0.038761046 -1.086395e-02 -1.156947e-02 -3.800314e-04
## smoothness_mean 0.005789074 5.897327e-05 -9.398714e-05 -5.591303e-05
## compactness_mean 0.008119890 -2.177814e-04 -1.122394e-03 5.152947e-04
head(all_pca_var_5$cor)
## [,1] [,2] [,3] [,4] [,5]
## radius_mean -0.7977668 0.5579027 -0.01432118 0.05827700 -0.04851878
## texture_mean -0.3780132 0.1424382 0.10835829 -0.84870380 0.06351944
## perimeter_mean -0.8292355 0.5133487 -0.01563555 0.05908501 -0.04799015
## area_mean -0.8053928 0.5512695 0.04817717 0.07520017 -0.01326563
## smoothness_mean -0.5196530 -0.4440017 -0.17507219 0.22430770 0.46878427
## compactness_mean -0.8720501 -0.3623611 -0.12437565 0.04474618 -0.01502824
## [,6] [,7] [,8] [,9]
## radius_mean 0.020592339 -0.101965596 0.005144876 -0.144056156
## texture_mean -0.035358035 0.009367203 -0.090214585 0.072767057
## perimeter_mean 0.019018481 -0.094067834 0.012901209 -0.144462575
## area_mean -0.002074253 -0.042444540 -0.023937777 -0.126284788
## smoothness_mean -0.314667668 -0.115590213 0.199500717 0.004148275
## compactness_mean -0.015527056 0.025406278 0.104520200 -0.108370831
## [,10] [,11] [,12] [,13] [,14]
## radius_mean 0.056546476 -0.022483349 0.02609749 0.005879269 0.023578980
## texture_mean 0.142679652 0.163858215 0.13026214 0.099956785 -0.008543071
## perimeter_mean 0.051157023 -0.009098538 0.01989278 0.021670182 0.019223333
## area_mean 0.044388765 -0.059727362 0.03344115 0.033100452 0.004291657
## smoothness_mean -0.041034694 0.074285011 0.16186012 0.022389467 0.176354514
## compactness_mean 0.007660737 0.166984319 -0.05315682 0.112641659 0.003210000
## [,15] [,16] [,17] [,18]
## radius_mean -0.015683967 -0.04255502 0.049456533 0.033654027
## texture_mean -0.033112133 -0.04460615 -0.009433423 -0.009428525
## perimeter_mean -0.012242788 -0.03234470 0.047481690 0.036316100
## area_mean 0.004285246 -0.03742982 0.062320398 0.061055728
## smoothness_mean -0.036248064 -0.05782372 0.040927741 -0.080796547
## compactness_mean 0.070843391 0.04809242 -0.004949378 0.001787881
## [,19] [,20] [,21] [,22]
## radius_mean 0.050133570 -0.008772821 -0.011871307 -0.01208056
## texture_mean 0.006626055 -0.043094773 0.077624778 -0.01570358
## perimeter_mean 0.053294517 -0.003118233 -0.012078892 -0.01245022
## area_mean -0.006077427 -0.015912200 -0.003193026 -0.01616162
## smoothness_mean -0.036605301 0.003018666 -0.020687226 -0.01057217
## compactness_mean 0.063221168 0.086263038 0.033347929 0.01624639
## [,23] [,24] [,25] [,26]
## radius_mean -1.537575e-02 -0.024533003 -0.002392233 -0.011708590
## texture_mean -8.658821e-05 0.013273874 0.010544407 -0.002220667
## perimeter_mean -6.278798e-03 -0.015673984 0.003361359 -0.011326933
## area_mean 1.213375e-03 0.009385446 -0.026134063 0.032801549
## smoothness_mean -3.224173e-03 0.009230799 0.003602676 -0.003346255
## compactness_mean 8.169034e-03 -0.013992577 0.049349353 0.023765850
## [,27] [,28] [,29] [,30]
## radius_mean -0.010925793 8.419566e-03 5.786460e-03 8.101999e-03
## texture_mean -0.001441855 -2.623673e-06 -2.882534e-04 3.156545e-06
## perimeter_mean -0.009587447 3.362272e-03 1.050312e-02 -7.957621e-03
## area_mean 0.038761046 -1.086395e-02 -1.156947e-02 -3.800314e-04
## smoothness_mean 0.005789074 5.897327e-05 -9.398714e-05 -5.591303e-05
## compactness_mean 0.008119890 -2.177814e-04 -1.122394e-03 5.152947e-04
head(all_pca_var_5$cos2)
## [,1] [,2] [,3] [,4] [,5]
## radius_mean 0.6364318 0.31125539 0.0002050963 0.003396209 0.0023540715
## texture_mean 0.1428940 0.02028864 0.0117415199 0.720298141 0.0040347193
## perimeter_mean 0.6876316 0.26352690 0.0002444703 0.003491038 0.0023030547
## area_mean 0.6486576 0.30389811 0.0023210397 0.005655066 0.0001759769
## smoothness_mean 0.2700393 0.19713747 0.0306502709 0.050313944 0.2197586899
## compactness_mean 0.7604714 0.13130559 0.0154693024 0.002002220 0.0002258480
## [,6] [,7] [,8] [,9]
## radius_mean 4.240444e-04 0.0103969827 2.646975e-05 2.075218e-02
## texture_mean 1.250191e-03 0.0000877445 8.138671e-03 5.295045e-03
## perimeter_mean 3.617026e-04 0.0088487573 1.664412e-04 2.086944e-02
## area_mean 4.302527e-06 0.0018015390 5.730172e-04 1.594785e-02
## smoothness_mean 9.901574e-02 0.0133610973 3.980054e-02 1.720819e-05
## compactness_mean 2.410895e-04 0.0006454790 1.092447e-02 1.174424e-02
## [,10] [,11] [,12] [,13]
## radius_mean 3.197504e-03 5.055010e-04 0.0006810789 3.456581e-05
## texture_mean 2.035748e-02 2.684951e-02 0.0169682252 9.991359e-03
## perimeter_mean 2.617041e-03 8.278339e-05 0.0003957226 4.695968e-04
## area_mean 1.970362e-03 3.567358e-03 0.0011183106 1.095640e-03
## smoothness_mean 1.683846e-03 5.518263e-03 0.0261986970 5.012882e-04
## compactness_mean 5.868689e-05 2.788376e-02 0.0028256473 1.268814e-02
## [,14] [,15] [,16] [,17]
## radius_mean 5.559683e-04 2.459868e-04 0.001810929 2.445949e-03
## texture_mean 7.298407e-05 1.096413e-03 0.001989709 8.898948e-05
## perimeter_mean 3.695365e-04 1.498859e-04 0.001046179 2.254511e-03
## area_mean 1.841832e-05 1.836333e-05 0.001400992 3.883832e-03
## smoothness_mean 3.110091e-02 1.313922e-03 0.003343582 1.675080e-03
## compactness_mean 1.030410e-05 5.018786e-03 0.002312881 2.449634e-05
## [,18] [,19] [,20] [,21]
## radius_mean 1.132594e-03 2.513375e-03 7.696239e-05 1.409279e-04
## texture_mean 8.889709e-05 4.390460e-05 1.857159e-03 6.025606e-03
## perimeter_mean 1.318859e-03 2.840305e-03 9.723374e-06 1.458996e-04
## area_mean 3.727802e-03 3.693512e-05 2.531981e-04 1.019542e-05
## smoothness_mean 6.528082e-03 1.339948e-03 9.112344e-06 4.279613e-04
## compactness_mean 3.196517e-06 3.996916e-03 7.441312e-03 1.112084e-03
## [,22] [,23] [,24] [,25]
## radius_mean 0.0001459399 2.364136e-04 6.018682e-04 5.722780e-06
## texture_mean 0.0002466023 7.497518e-09 1.761957e-04 1.111845e-04
## perimeter_mean 0.0001550079 3.942330e-05 2.456738e-04 1.129874e-05
## area_mean 0.0002611979 1.472279e-06 8.808659e-05 6.829892e-04
## smoothness_mean 0.0001117708 1.039529e-05 8.520764e-05 1.297927e-05
## compactness_mean 0.0002639453 6.673311e-05 1.957922e-04 2.435359e-03
## [,26] [,27] [,28] [,29]
## radius_mean 1.370911e-04 1.193730e-04 7.088910e-05 3.348312e-05
## texture_mean 4.931360e-06 2.078945e-06 6.883658e-12 8.309002e-08
## perimeter_mean 1.282994e-04 9.191915e-05 1.130487e-05 1.103155e-04
## area_mean 1.075942e-03 1.502419e-03 1.180255e-04 1.338527e-04
## smoothness_mean 1.119742e-05 3.351338e-05 3.477847e-09 8.833583e-09
## compactness_mean 5.648156e-04 6.593262e-05 4.742874e-08 1.259768e-06
## [,30]
## radius_mean 6.564239e-05
## texture_mean 9.963774e-12
## perimeter_mean 6.332372e-05
## area_mean 1.444238e-07
## smoothness_mean 3.126267e-09
## compactness_mean 2.655286e-07
head(all_pca_var_5$contrib)
## [,1] [,2] [,3] [,4] [,5]
## radius_mean 4.791828 5.4689158 0.007278210 0.1714702 0.14278085
## texture_mean 1.075879 0.3564817 0.416669002 36.3669303 0.24471672
## perimeter_mean 5.177322 4.6303018 0.008675469 0.1762581 0.13968654
## area_mean 4.883878 5.3396446 0.082366279 0.2855170 0.01067348
## smoothness_mean 2.033182 3.4638057 1.087680124 2.5402866 13.32896332
## compactness_mean 5.725748 2.3071061 0.548956087 0.1010895 0.01369829
## [,6] [,7] [,8] [,9] [,10]
## radius_mean 0.0351217226 1.53979162 0.005553672 4.977796701 0.91176608
## texture_mean 0.1035477523 0.01299495 1.707590993 1.270115259 5.80492242
## perimeter_mean 0.0299582265 1.31049967 0.034921362 5.005923564 0.74624745
## area_mean 0.0003563592 0.26680766 0.120225880 3.825388875 0.56184752
## smoothness_mean 8.2010352310 1.97877655 8.350630500 0.004127705 0.48014757
## compactness_mean 0.0199683716 0.09559534 2.292085472 2.817074391 0.01673453
## [,11] [,12] [,13] [,14] [,15]
## radius_mean 0.17198842 0.2607885 0.01432142 0.354098008 0.26131291
## texture_mean 9.13510741 6.4972186 4.13965139 0.046483789 1.16472490
## perimeter_mean 0.02816569 0.1515242 0.19456483 0.235358999 0.15922443
## area_mean 1.21373501 0.4282067 0.45394900 0.011730686 0.01950745
## smoothness_mean 1.87749853 10.0316126 0.20769532 19.808272945 1.39578544
## compactness_mean 9.48699343 1.0819546 5.25699162 0.006562713 5.33147923
## [,16] [,17] [,18] [,19] [,20]
## radius_mean 2.267551 4.1178253 2.152451027 5.07982446 0.24699572
## texture_mean 2.491408 0.1498164 0.168945538 0.08873633 5.96018948
## perimeter_mean 1.309971 3.7955343 2.506441650 5.74058961 0.03120527
## area_mean 1.754248 6.5385437 7.084545992 0.07465020 0.81258978
## smoothness_mean 4.186658 2.8200456 12.406371982 2.70819168 0.02924428
## compactness_mean 2.896068 0.0412403 0.006074859 8.07823486 23.88143277
## [,21] [,22] [,23] [,24] [,25]
## radius_mean 0.47018457 0.5318625 9.712634e-01 3.3335252 0.03696583
## texture_mean 20.10351786 0.8987160 3.080222e-05 0.9758830 0.71818728
## perimeter_mean 0.48677194 0.5649097 1.619636e-01 1.3606960 0.07298326
## area_mean 0.03401545 0.9519081 6.048598e-03 0.4878790 4.41171292
## smoothness_mean 1.42782777 0.4073369 4.270720e-02 0.4719336 0.08383854
## compactness_mean 3.71030023 0.9619207 2.741611e-01 1.0844206 15.73099874
## [,26] [,27] [,28] [,29] [,30]
## radius_mean 1.67641371 1.72992648 4.460291e+00 4.471552424 4.933856e+01
## texture_mean 0.06030297 0.03012762 4.331148e-07 0.011096377 7.489035e-06
## perimeter_mean 1.56890519 1.33207198 7.112943e-01 14.732247365 4.759578e+01
## area_mean 13.15711689 21.77272040 7.426079e+00 17.875554424 1.085528e-01
## smoothness_mean 0.13692728 0.48566854 2.188236e-04 0.001179694 2.349785e-03
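## compactness_mean 6.90682942 0.95548094 2.984182e-03 0.168237574 1.995783e-01
Unsurprisingly, each object yields the same results: the cos2 and contrib values coincide, while the coordinates and correlations may differ only in the sign of a dimension (as in Dim.11 above), which is arbitrary in PCA. Let’s now examine the output of applying get_pca_ind() to the PCA objects created previously: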
all_pca_ind_1 <- get_pca_ind(all_pca_1)
all_pca_ind_1
## Principal Component Analysis Results for individuals
## ===================================================
## Name Description
## 1 "$coord" "Coordinates for the individuals"
## 2 "$cos2" "Cos2 for the individuals"
## 3 "$contrib" "contributions of the individuals"
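The three elements listed above can also be derived by hand from the coordinates of the individuals. The sketch below uses base R only and assumes the usual definitions; the USArrests dataset and the names ind_coord, ind_cos2 and ind_contrib are illustrative stand-ins. The contributions are normalised here so that each dimension sums to exactly 100, whereas a packaged function may use a slightly different normalising constant.

```r
# Illustrative sketch: USArrests is a stand-in dataset, not the project's data.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Coordinates of the individuals: their scores on each principal component.
ind_coord <- pca$x

# cos2: squared coordinate divided by the squared distance of the individual
# to the origin (all components are kept, so the row sum gives that distance).
ind_cos2 <- ind_coord^2 / rowSums(ind_coord^2)

# contrib: percentage share of each individual in the total of squared
# coordinates of a dimension.
ind_contrib <- sweep(ind_coord^2, 2, colSums(ind_coord^2), "/") * 100

round(head(ind_cos2), 4)
round(head(ind_contrib), 4)
```

Because cos2 is a ratio of squared quantities, it is unaffected by the sign or by a uniform rescaling of the coordinates.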
head(all_pca_ind_1$coord)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6 Dim.7
## 1 -9.184755 -1.946870 -1.1221788 3.6305364 1.1940595 1.41018364 2.15747152
## 2 -2.385703 3.764859 -0.5288274 1.1172808 -0.6212284 0.02863116 0.01334635
## 3 -5.728855 1.074229 -0.5512625 0.9112808 0.1769302 0.54097615 -0.66757908
## 4 -7.116691 -10.266556 -3.2299475 0.1524129 2.9582754 3.05073750 1.42865363
## 5 -3.931842 1.946359 1.3885450 2.9380542 -0.5462667 -1.22541641 -0.93538950
## 6 -2.378155 -3.946456 -2.9322967 0.9402096 1.0551135 -0.45064213 0.49001396
## Dim.8 Dim.9 Dim.10 Dim.11 Dim.12 Dim.13
## 1 0.39805698 -0.15698023 -0.8766305 -0.2627243 -0.8582593 0.10329677
## 2 -0.24077660 -0.71127897 1.1060218 -0.8124048 0.1577838 -0.94269981
## 3 -0.09728813 0.02404449 0.4538760 0.6050715 0.1242777 -0.41026561
## 4 -1.05863376 -1.40420412 -1.1159933 1.1505012 1.0104267 -0.93245070
## 5 -0.63581661 -0.26357355 0.3773724 -0.6507870 -0.1104183 0.38760691
## 6 0.16529843 -0.13335576 -0.5299649 -0.1096698 0.0813699 -0.02625135
## Dim.14 Dim.15 Dim.16 Dim.17 Dim.18 Dim.19
## 1 -0.690196797 0.601264078 0.74446075 -0.26523740 -0.54907956 0.1336499
## 2 -0.652900844 -0.008966977 -0.64823831 -0.01719707 0.31801756 -0.2473470
## 3 0.016665095 -0.482994760 0.32482472 0.19075064 -0.08789759 -0.3922812
## 4 -0.486988399 0.168699395 0.05132509 0.48220960 -0.03584323 -0.0267241
## 5 -0.538706543 -0.310046684 -0.15247165 0.13302526 -0.01869779 0.4610302
## 6 0.003133944 -0.178447576 -0.01270566 0.19671335 -0.29727706 -0.1297265
## Dim.20 Dim.21 Dim.22 Dim.23 Dim.24 Dim.25
## 1 0.34526111 0.096430045 -0.06878939 0.08444429 0.175102213 0.150887294
## 2 -0.11403274 -0.077259494 0.09449530 -0.21752666 -0.011280193 0.170360355
## 3 -0.20435242 0.310793246 0.06025601 -0.07422581 -0.102671419 -0.171007656
## 4 -0.46432511 0.433811661 0.20308706 -0.12399554 -0.153294780 -0.077427574
## 5 0.06543782 -0.116442469 0.01763433 0.13933105 0.005327110 -0.003059371
## 6 -0.07117453 -0.002400178 0.10108043 0.03344819 -0.002837749 -0.122282765
## Dim.26 Dim.27 Dim.28 Dim.29 Dim.30
## 1 -0.201326305 -0.25236294 -0.0338846387 0.045607590 0.0471277407
## 2 -0.041092627 0.18111081 0.0325955021 -0.005682424 0.0018662342
## 3 0.004731249 0.04952586 0.0469844833 0.003143131 -0.0007498749
## 4 -0.274982822 0.18330078 0.0424469831 -0.069233868 0.0199198881
## 5 0.039219780 0.03213957 -0.0347556386 0.005033481 -0.0211951203
## 6 -0.030272333 -0.08438081 0.0007296587 -0.019703996 -0.0034564331
head(all_pca_ind_1$cos2)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6
## 1 0.7366868 0.03309951 0.010996938 0.1151037016 0.0124508677 1.736597e-02
## 2 0.2164877 0.53913561 0.010637227 0.0474815863 0.0146792249 3.118017e-05
## 3 0.8781764 0.03087731 0.008131356 0.0222203296 0.0008376258 7.830730e-03
## 4 0.2593764 0.53978871 0.053427543 0.0001189646 0.0448178978 4.766328e-02
## 5 0.4498315 0.11023095 0.056101900 0.2511755035 0.0086829482 4.369433e-02
## 6 0.1722700 0.47439927 0.261905853 0.0269264275 0.0339099994 6.185759e-03
## Dim.7 Dim.8 Dim.9 Dim.10 Dim.11 Dim.12
## 1 4.064787e-02 0.0013836880 2.151977e-04 0.006710902 0.0006027652 0.0064325731
## 2 6.775254e-06 0.0022051042 1.924334e-02 0.046529449 0.0251041522 0.0009469458
## 3 1.192481e-02 0.0002532595 1.546953e-05 0.005512144 0.0097962404 0.0004132687
## 4 1.045269e-02 0.0057393901 1.009799e-02 0.006378190 0.0067787311 0.0052285821
## 5 2.545908e-02 0.0117630901 2.021442e-03 0.004143792 0.0123235390 0.0003547641
## 6 7.313854e-03 0.0008322750 5.416927e-04 0.008555071 0.0003663563 0.0002016773
## Dim.13 Dim.14 Dim.15 Dim.16 Dim.17 Dim.18
## 1 9.317968e-05 4.160002e-03 3.157026e-03 4.839843e-03 6.143519e-04 2.632802e-03
## 2 3.380239e-02 1.621418e-02 3.058389e-06 1.598343e-02 1.124889e-05 3.846828e-03
## 3 4.503770e-03 7.431246e-06 6.242102e-03 2.823216e-03 9.735943e-04 2.067283e-04
## 4 4.452727e-03 1.214539e-03 1.457476e-04 1.349067e-05 1.190820e-03 6.579435e-06
## 5 4.371603e-03 8.444271e-03 2.797125e-03 6.764503e-04 5.149036e-04 1.017275e-05
## 6 2.099098e-05 2.991656e-07 9.699530e-04 4.917266e-06 1.178683e-03 2.691858e-03
## Dim.19 Dim.20 Dim.21 Dim.22 Dim.23 Dim.24
## 1 1.559858e-04 0.0010409815 8.120307e-05 4.132289e-05 6.227135e-05 2.677509e-04
## 2 2.327093e-03 0.0004946064 2.270410e-04 3.396417e-04 1.799805e-03 4.839869e-06
## 3 4.117570e-03 0.0011173921 2.584575e-03 9.715089e-05 1.474198e-04 2.820624e-04
## 4 3.657467e-06 0.0011041259 9.637773e-04 2.112218e-04 7.873837e-05 1.203453e-04
## 5 6.184670e-03 0.0001245992 3.945304e-04 9.048481e-06 5.648765e-04 8.257356e-07
## 6 5.126095e-04 0.0001543045 1.754755e-07 3.112171e-04 3.407804e-05 2.452886e-07
## Dim.25 Dim.26 Dim.27 Dim.28 Dim.29 Dim.30
## 1 1.988168e-04 3.539556e-04 5.561589e-04 1.002659e-05 1.816444e-05 1.939550e-05
## 2 1.103920e-03 6.422859e-05 1.247640e-03 4.041252e-05 1.228197e-06 1.324747e-07
## 3 7.824870e-04 5.989596e-07 6.563115e-05 5.906836e-05 2.643449e-07 1.504609e-08
## 4 3.070192e-05 3.872446e-04 1.720691e-04 9.227157e-06 2.454774e-05 2.032114e-06
## 5 2.723465e-07 4.475772e-05 3.005646e-05 3.514862e-05 7.372157e-07 1.307162e-05
## 6 4.554701e-04 2.791394e-05 2.168786e-04 1.621694e-08 1.182600e-05 3.639031e-07
head(all_pca_ind_1$contrib)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6
## 1 1.11627772 0.11704315 0.07853779 1.169563151 0.151981235 0.2894699673
## 2 0.07531296 0.43769293 0.01744145 0.110766067 0.041137756 0.0001193246
## 3 0.43428299 0.03563408 0.01895272 0.073686268 0.003336891 0.0425998823
## 4 0.67018289 3.25477984 0.65064717 0.002061226 0.932857403 1.3547583890
## 5 0.20456405 0.11698171 0.12024707 0.765952223 0.031808821 0.2185845940
## 6 0.07483715 0.48093536 0.53625385 0.078438889 0.118668771 0.0295607711
## Dim.7 Dim.8 Dim.9 Dim.10 Dim.11 Dim.12
## 1 1.211525e+00 0.05842632 0.0103884603 0.38511751 0.041272939 0.495696475
## 2 4.636256e-05 0.02137699 0.2132756007 0.61303803 0.394648035 0.016753415
## 3 1.159973e-01 0.00349010 0.0002437206 0.10323681 0.218916436 0.010393583
## 4 5.312467e-01 0.41324685 0.8312309885 0.62414179 0.791478446 0.687050176
## 5 2.277337e-01 0.14906710 0.0292863255 0.07136748 0.253246070 0.008204663
## 6 6.249701e-02 0.01007524 0.0074969532 0.14075191 0.007191827 0.004455602
## Dim.13 Dim.14 Dim.15 Dim.16 Dim.17 Dim.18
## 1 0.0077696323 5.332208e-01 0.6749432895 1.2196263567 0.2081506802 1.006972211
## 2 0.6471035437 4.771508e-01 0.0001501167 0.9247249753 0.0008750187 0.337791822
## 3 0.1225623765 3.108685e-04 0.4355335406 0.2321888338 0.1076565195 0.025804823
## 4 0.6331093156 2.654596e-01 0.0531329362 0.0057969902 0.6879866799 0.004291027
## 5 0.1093981490 3.248371e-01 0.1794696227 0.0511589256 0.0523572209 0.001167690
## 6 0.0005017996 1.099369e-05 0.0594508507 0.0003552528 0.1144922276 0.295168313
## Dim.19 Dim.20 Dim.21 Dim.22 Dim.23 Dim.24
## 1 0.063447786 0.67234773 5.452351e-02 0.030307915 0.05148643 2.984512e-01
## 2 0.217316368 0.07334285 3.499958e-02 0.057191771 0.34164668 1.238577e-03
## 3 0.546605705 0.23553649 5.663726e-01 0.023254867 0.03977976 1.026099e-01
## 4 0.002536795 1.21602625 1.103472e+00 0.264166418 0.11101058 2.287414e-01
## 5 0.754984443 0.02415217 7.950270e-02 0.001991733 0.14016772 2.762316e-04
## 6 0.059777324 0.02857248 3.377895e-05 0.065440591 0.00807788 7.838593e-05
## Dim.25 Dim.26 Dim.27 Dim.28 Dim.29 Dim.30
## 1 0.258455901 0.8710855293 1.62203643 1.269630e-01 0.488196148 2.9338843773
## 2 0.329471749 0.0362900507 0.83540714 1.174862e-01 0.007578573 0.0046006800
## 3 0.331980227 0.0004810733 0.06247028 2.441071e-01 0.002318703 0.0007427924
## 4 0.068056939 1.6250655464 0.85573255 1.992348e-01 1.125012348 0.5241595992
## 5 0.000106254 0.0330575041 0.02630811 1.335740e-01 0.005946440 0.5934191099
## 6 0.169750714 0.0196947983 0.18134133 5.887231e-05 0.091123153 0.0157814195
all_pca_ind_2 <- get_pca_ind(all_pca_2)
all_pca_ind_2
## Principal Component Analysis Results for individuals
## ===================================================
## Name Description
## 1 "$coord" "Coordinates for the individuals"
## 2 "$cos2" "Cos2 for the individuals"
## 3 "$contrib" "contributions of the individuals"
head(all_pca_ind_2$coord)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6 Dim.7
## 1 9.192837 -1.948583 1.1231662 3.6337309 -1.1951101 1.41142445 -2.15936987
## 2 2.387802 3.768172 0.5292927 1.1182639 0.6217750 0.02865635 -0.01335809
## 3 5.733896 1.075174 0.5517476 0.9120827 -0.1770859 0.54145215 0.66816648
## 4 7.122953 -10.275589 3.2327895 0.1525470 -2.9608784 3.05342182 -1.42991070
## 5 3.935302 1.948072 -1.3897667 2.9406393 0.5467474 -1.22649464 0.93621255
## 6 2.380247 -3.949929 2.9348768 0.9410369 -1.0560419 -0.45103865 -0.49044512
## Dim.8 Dim.9 Dim.10 Dim.11 Dim.12 Dim.13
## 1 0.39840723 0.15711836 -0.8774019 0.2629555 -0.8590145 0.10338766
## 2 -0.24098846 0.71190482 1.1069949 0.8131197 0.1579226 -0.94352928
## 3 -0.09737374 -0.02406564 0.4542754 -0.6056039 0.1243871 -0.41062660
## 4 -1.05956524 1.40543967 -1.1169753 -1.1515135 1.0113158 -0.93327116
## 5 -0.63637606 0.26380546 0.3777045 0.6513596 -0.1105154 0.38794797
## 6 0.16544388 0.13347310 -0.5304312 0.1097663 0.0814415 -0.02627445
## Dim.14 Dim.15 Dim.16 Dim.17 Dim.18 Dim.19
## 1 -0.690804097 -0.601793127 -0.74511579 -0.2654708 -0.54956269 0.13376750
## 2 -0.653475327 0.008974867 0.64880869 -0.0172122 0.31829738 -0.24756463
## 3 0.016679759 0.483419744 -0.32511053 0.1909185 -0.08797493 -0.39262636
## 4 -0.487416897 -0.168847832 -0.05137025 0.4826339 -0.03587477 -0.02674762
## 5 -0.539180548 0.310319492 0.15260581 0.1331423 -0.01871424 0.46143590
## 6 0.003136701 0.178604591 0.01271684 0.1968864 -0.29753864 -0.12984063
## Dim.20 Dim.21 Dim.22 Dim.23 Dim.24 Dim.25
## 1 -0.34556490 -0.09651489 0.06884992 -0.08451859 -0.175256284 -0.151020059
## 2 0.11413308 0.07732747 -0.09457845 0.21771806 0.011290118 -0.170510254
## 3 0.20453223 -0.31106671 -0.06030903 0.07429112 0.102761759 0.171158125
## 4 0.46473366 -0.43419337 -0.20326576 0.12410465 0.153429663 0.077495702
## 5 -0.06549539 0.11654493 -0.01764985 -0.13945364 -0.005331797 0.003062062
## 6 0.07123716 0.00240229 -0.10116937 -0.03347762 0.002840246 0.122390361
## Dim.26 Dim.27 Dim.28 Dim.29 Dim.30
## 1 0.201503451 0.25258499 -0.0339144536 0.045647720 0.0471692081
## 2 0.041128785 -0.18127017 0.0326241827 -0.005687424 0.0018678763
## 3 -0.004735412 -0.04956943 0.0470258247 0.003145897 -0.0007505348
## 4 0.275224778 -0.18346206 0.0424843320 -0.069294786 0.0199374155
## 5 -0.039254289 -0.03216785 -0.0347862199 0.005037910 -0.0212137698
## 6 0.030298969 0.08445505 0.0007303007 -0.019721334 -0.0034594744
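Compared with all_pca_ind_1, the coordinates above have the opposite sign on some dimensions (Dim.1, for instance) and are larger in absolute value by a small constant factor (9.192837 against 9.184755 for the first individual, a ratio of roughly 1.0009). One possible explanation, sketched below, is that the two objects come from functions that standardize the data with different divisors. This is an assumption, since the construction of all_pca_1 and all_pca_2 is not shown in this section; the sketch uses base R's prcomp() and princomp() on the stand-in USArrests dataset purely to illustrate the effect.

```r
# Illustrative sketch: USArrests is a stand-in dataset, not the project's data.
x <- USArrests
n <- nrow(x)

p1 <- prcomp(x, center = TRUE, scale. = TRUE)  # standard deviations use divisor n - 1
p2 <- princomp(x, cor = TRUE)                  # standard deviations use divisor n

scores_1 <- unname(p1$x)
scores_2 <- unname(unclass(p2$scores))

# The magnitudes differ by a constant factor close to 1 ...
range(abs(scores_2) / abs(scores_1))
sqrt(n / (n - 1))

# ... and whole components may be flipped, because the sign of a principal
# component is arbitrary.
flipped <- sign(scores_1[1, ]) != sign(scores_2[1, ])
flipped
```

Neither difference affects the interpretation: distances between individuals along each component and their relative positions stay the same.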
head(all_pca_ind_2$cos2)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6
## 1 0.7366868 0.03309951 0.010996938 0.1151037016 0.0124508677 1.736597e-02
## 2 0.2164877 0.53913561 0.010637227 0.0474815863 0.0146792249 3.118017e-05
## 3 0.8781764 0.03087731 0.008131356 0.0222203296 0.0008376258 7.830730e-03
## 4 0.2593764 0.53978871 0.053427543 0.0001189646 0.0448178978 4.766328e-02
## 5 0.4498315 0.11023095 0.056101900 0.2511755035 0.0086829482 4.369433e-02
## 6 0.1722700 0.47439927 0.261905853 0.0269264275 0.0339099994 6.185759e-03
## Dim.7 Dim.8 Dim.9 Dim.10 Dim.11 Dim.12
## 1 4.064787e-02 0.0013836880 2.151977e-04 0.006710902 0.0006027652 0.0064325731
## 2 6.775254e-06 0.0022051042 1.924334e-02 0.046529449 0.0251041522 0.0009469458
## 3 1.192481e-02 0.0002532595 1.546953e-05 0.005512144 0.0097962404 0.0004132687
## 4 1.045269e-02 0.0057393901 1.009799e-02 0.006378190 0.0067787311 0.0052285821
## 5 2.545908e-02 0.0117630901 2.021442e-03 0.004143792 0.0123235390 0.0003547641
## 6 7.313854e-03 0.0008322750 5.416927e-04 0.008555071 0.0003663563 0.0002016773
## Dim.13 Dim.14 Dim.15 Dim.16 Dim.17 Dim.18
## 1 9.317968e-05 4.160002e-03 3.157026e-03 4.839843e-03 6.143519e-04 2.632802e-03
## 2 3.380239e-02 1.621418e-02 3.058389e-06 1.598343e-02 1.124889e-05 3.846828e-03
## 3 4.503770e-03 7.431246e-06 6.242102e-03 2.823216e-03 9.735943e-04 2.067283e-04
## 4 4.452727e-03 1.214539e-03 1.457476e-04 1.349067e-05 1.190820e-03 6.579435e-06
## 5 4.371603e-03 8.444271e-03 2.797125e-03 6.764503e-04 5.149036e-04 1.017275e-05
## 6 2.099098e-05 2.991656e-07 9.699530e-04 4.917266e-06 1.178683e-03 2.691858e-03
## Dim.19 Dim.20 Dim.21 Dim.22 Dim.23 Dim.24
## 1 1.559858e-04 0.0010409815 8.120307e-05 4.132289e-05 6.227135e-05 2.677509e-04
## 2 2.327093e-03 0.0004946064 2.270410e-04 3.396417e-04 1.799805e-03 4.839869e-06
## 3 4.117570e-03 0.0011173921 2.584575e-03 9.715089e-05 1.474198e-04 2.820624e-04
## 4 3.657467e-06 0.0011041259 9.637773e-04 2.112218e-04 7.873837e-05 1.203453e-04
## 5 6.184670e-03 0.0001245992 3.945304e-04 9.048481e-06 5.648765e-04 8.257356e-07
## 6 5.126095e-04 0.0001543045 1.754755e-07 3.112171e-04 3.407804e-05 2.452886e-07
## Dim.25 Dim.26 Dim.27 Dim.28 Dim.29 Dim.30
## 1 1.988168e-04 3.539556e-04 5.561589e-04 1.002659e-05 1.816444e-05 1.939550e-05
## 2 1.103920e-03 6.422859e-05 1.247640e-03 4.041252e-05 1.228197e-06 1.324747e-07
## 3 7.824870e-04 5.989596e-07 6.563115e-05 5.906836e-05 2.643449e-07 1.504609e-08
## 4 3.070192e-05 3.872446e-04 1.720691e-04 9.227157e-06 2.454774e-05 2.032114e-06
## 5 2.723465e-07 4.475772e-05 3.005646e-05 3.514862e-05 7.372157e-07 1.307162e-05
## 6 4.554701e-04 2.791394e-05 2.168786e-04 1.621694e-08 1.182600e-05 3.639031e-07
head(all_pca_ind_2$contrib)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6
## 1 1.11824300 0.11724921 0.07867607 1.171622241 0.152248808 0.2899795975
## 2 0.07544555 0.43846352 0.01747215 0.110961077 0.041210182 0.0001195347
## 3 0.43504757 0.03569681 0.01898609 0.073815998 0.003342766 0.0426748821
## 4 0.67136279 3.26051009 0.65179267 0.002064855 0.934499758 1.3571435270
## 5 0.20492419 0.11718767 0.12045877 0.767300731 0.031864823 0.2189694260
## 6 0.07496891 0.48178208 0.53719796 0.078576986 0.118877695 0.0296128148
## Dim.7 Dim.8 Dim.9 Dim.10 Dim.11 Dim.12
## 1 1.213658e+00 0.058529188 0.0104067498 0.38579553 0.041345602 0.496569180
## 2 4.644418e-05 0.021414629 0.2136510859 0.61411732 0.395342837 0.016782911
## 3 1.162015e-01 0.003496244 0.0002441497 0.10341857 0.219301852 0.010411881
## 4 5.321820e-01 0.413974398 0.8326944234 0.62524063 0.792871893 0.688259771
## 5 2.281347e-01 0.149329546 0.0293378859 0.07149313 0.253691925 0.008219107
## 6 6.260704e-02 0.010092981 0.0075101521 0.14099971 0.007204489 0.004463446
## Dim.13 Dim.14 Dim.15 Dim.16 Dim.17 Dim.18
## 1 0.007783311 5.341595e-01 0.6761315700 1.2217735862 0.2085171427 1.008745049
## 2 0.648242810 4.779908e-01 0.0001503809 0.9263530122 0.0008765592 0.338386526
## 3 0.122778155 3.114159e-04 0.4363003250 0.2325976169 0.1078460556 0.025850254
## 4 0.634223945 2.659269e-01 0.0532264801 0.0058071962 0.6891979240 0.004298581
## 5 0.109590751 3.254090e-01 0.1797855904 0.0512489941 0.0524493991 0.001169745
## 6 0.000502683 1.101305e-05 0.0595555177 0.0003558783 0.1146937985 0.295687975
## Dim.19 Dim.20 Dim.21 Dim.22 Dim.23 Dim.24
## 1 0.063559490 0.67353144 5.461950e-02 0.030361274 0.051577078 2.989766e-01
## 2 0.217698967 0.07347197 3.506119e-02 0.057292461 0.342248175 1.240758e-03
## 3 0.547568039 0.23595117 5.673697e-01 0.023295809 0.039849794 1.027905e-01
## 4 0.002541261 1.21816714 1.105415e+00 0.264631499 0.111206018 2.291441e-01
## 5 0.756313641 0.02419469 7.964267e-02 0.001995239 0.140414495 2.767180e-04
## 6 0.059882566 0.02862278 3.383842e-05 0.065555803 0.008092102 7.852393e-05
## Dim.25 Dim.26 Dim.27 Dim.28 Dim.29 Dim.30
## 1 0.2589109285 0.8726191306 1.62489212 1.271865e-01 0.489055649 2.9390496668
## 2 0.3300518050 0.0363539416 0.83687792 1.176930e-01 0.007591915 0.0046087798
## 3 0.3325646996 0.0004819203 0.06258026 2.445369e-01 0.002322785 0.0007441001
## 4 0.0681767579 1.6279265773 0.85723913 1.995855e-01 1.126993004 0.5250824154
## 5 0.0001064411 0.0331157040 0.02635442 1.338092e-01 0.005956909 0.5944638618
## 6 0.1700495704 0.0197294722 0.18166059 5.897596e-05 0.091283581 0.0158092037
all_pca_ind_3 <- get_pca_ind(all_pca_3)
all_pca_ind_3
## Principal Component Analysis Results for individuals
## ===================================================
## Name Description
## 1 "$coord" "Coordinates for the individuals"
## 2 "$cos2" "Cos2 for the individuals"
## 3 "$contrib" "contributions of the individuals"
head(all_pca_ind_3$coord)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6 Dim.7
## 1 9.192837 1.948583 -1.1231662 -3.6337309 1.1951101 -1.41142445 -2.15936987
## 2 2.387802 -3.768172 -0.5292927 -1.1182639 -0.6217750 -0.02865635 -0.01335809
## 3 5.733896 -1.075174 -0.5517476 -0.9120827 0.1770859 -0.54145215 0.66816648
## 4 7.122953 10.275589 -3.2327895 -0.1525470 2.9608784 -3.05342182 -1.42991070
## 5 3.935302 -1.948072 1.3897667 -2.9406393 -0.5467474 1.22649464 0.93621255
## 6 2.380247 3.949929 -2.9348768 -0.9410369 1.0560419 0.45103865 -0.49044512
## Dim.8 Dim.9 Dim.10 Dim.11 Dim.12 Dim.13
## 1 -0.39840723 0.15711836 -0.8774019 0.2629555 -0.8590145 -0.10338766
## 2 0.24098846 0.71190482 1.1069949 0.8131197 0.1579226 0.94352928
## 3 0.09737374 -0.02406564 0.4542754 -0.6056039 0.1243871 0.41062660
## 4 1.05956524 1.40543967 -1.1169753 -1.1515135 1.0113158 0.93327116
## 5 0.63637606 0.26380546 0.3777045 0.6513596 -0.1105154 -0.38794797
## 6 -0.16544388 0.13347310 -0.5304312 0.1097663 0.0814415 0.02627445
## Dim.14 Dim.15 Dim.16 Dim.17 Dim.18 Dim.19
## 1 0.690804097 0.601793127 -0.74511579 -0.2654708 -0.54956269 0.13376750
## 2 0.653475327 -0.008974867 0.64880869 -0.0172122 0.31829738 -0.24756463
## 3 -0.016679759 -0.483419744 -0.32511053 0.1909185 -0.08797493 -0.39262636
## 4 0.487416897 0.168847832 -0.05137025 0.4826339 -0.03587477 -0.02674762
## 5 0.539180548 -0.310319492 0.15260581 0.1331423 -0.01871424 0.46143590
## 6 -0.003136701 -0.178604591 0.01271684 0.1968864 -0.29753864 -0.12984063
## Dim.20 Dim.21 Dim.22 Dim.23 Dim.24 Dim.25
## 1 -0.34556490 -0.09651489 0.06884992 -0.08451859 -0.175256284 -0.151020059
## 2 0.11413308 0.07732747 -0.09457845 0.21771806 0.011290118 -0.170510254
## 3 0.20453223 -0.31106671 -0.06030903 0.07429112 0.102761759 0.171158125
## 4 0.46473366 -0.43419337 -0.20326576 0.12410465 0.153429663 0.077495702
## 5 -0.06549539 0.11654493 -0.01764985 -0.13945364 -0.005331797 0.003062062
## 6 0.07123716 0.00240229 -0.10116937 -0.03347762 0.002840246 0.122390361
## Dim.26 Dim.27 Dim.28 Dim.29 Dim.30
## 1 0.201503451 0.25258499 -0.0339144536 0.045647720 0.0471692081
## 2 0.041128785 -0.18127017 0.0326241827 -0.005687424 0.0018678763
## 3 -0.004735412 -0.04956943 0.0470258247 0.003145897 -0.0007505348
## 4 0.275224778 -0.18346206 0.0424843320 -0.069294786 0.0199374155
## 5 -0.039254289 -0.03216785 -0.0347862199 0.005037910 -0.0212137698
## 6 0.030298969 0.08445505 0.0007303007 -0.019721334 -0.0034594744
head(all_pca_ind_3$cos2)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6
## 1 0.7366868 0.03309951 0.010996938 0.1151037016 0.0124508677 1.736597e-02
## 2 0.2164877 0.53913561 0.010637227 0.0474815863 0.0146792249 3.118017e-05
## 3 0.8781764 0.03087731 0.008131356 0.0222203296 0.0008376258 7.830730e-03
## 4 0.2593764 0.53978871 0.053427543 0.0001189646 0.0448178978 4.766328e-02
## 5 0.4498315 0.11023095 0.056101900 0.2511755035 0.0086829482 4.369433e-02
## 6 0.1722700 0.47439927 0.261905853 0.0269264275 0.0339099994 6.185759e-03
## Dim.7 Dim.8 Dim.9 Dim.10 Dim.11 Dim.12
## 1 4.064787e-02 0.0013836880 2.151977e-04 0.006710902 0.0006027652 0.0064325731
## 2 6.775254e-06 0.0022051042 1.924334e-02 0.046529449 0.0251041522 0.0009469458
## 3 1.192481e-02 0.0002532595 1.546953e-05 0.005512144 0.0097962404 0.0004132687
## 4 1.045269e-02 0.0057393901 1.009799e-02 0.006378190 0.0067787311 0.0052285821
## 5 2.545908e-02 0.0117630901 2.021442e-03 0.004143792 0.0123235390 0.0003547641
## 6 7.313854e-03 0.0008322750 5.416927e-04 0.008555071 0.0003663563 0.0002016773
## Dim.13 Dim.14 Dim.15 Dim.16 Dim.17 Dim.18
## 1 9.317968e-05 4.160002e-03 3.157026e-03 4.839843e-03 6.143519e-04 2.632802e-03
## 2 3.380239e-02 1.621418e-02 3.058389e-06 1.598343e-02 1.124889e-05 3.846828e-03
## 3 4.503770e-03 7.431246e-06 6.242102e-03 2.823216e-03 9.735943e-04 2.067283e-04
## 4 4.452727e-03 1.214539e-03 1.457476e-04 1.349067e-05 1.190820e-03 6.579435e-06
## 5 4.371603e-03 8.444271e-03 2.797125e-03 6.764503e-04 5.149036e-04 1.017275e-05
## 6 2.099098e-05 2.991656e-07 9.699530e-04 4.917266e-06 1.178683e-03 2.691858e-03
## Dim.19 Dim.20 Dim.21 Dim.22 Dim.23 Dim.24
## 1 1.559858e-04 0.0010409815 8.120307e-05 4.132289e-05 6.227135e-05 2.677509e-04
## 2 2.327093e-03 0.0004946064 2.270410e-04 3.396417e-04 1.799805e-03 4.839869e-06
## 3 4.117570e-03 0.0011173921 2.584575e-03 9.715089e-05 1.474198e-04 2.820624e-04
## 4 3.657467e-06 0.0011041259 9.637773e-04 2.112218e-04 7.873837e-05 1.203453e-04
## 5 6.184670e-03 0.0001245992 3.945304e-04 9.048481e-06 5.648765e-04 8.257356e-07
## 6 5.126095e-04 0.0001543045 1.754755e-07 3.112171e-04 3.407804e-05 2.452886e-07
## Dim.25 Dim.26 Dim.27 Dim.28 Dim.29 Dim.30
## 1 1.988168e-04 3.539556e-04 5.561589e-04 1.002659e-05 1.816444e-05 1.939550e-05
## 2 1.103920e-03 6.422859e-05 1.247640e-03 4.041252e-05 1.228197e-06 1.324747e-07
## 3 7.824870e-04 5.989596e-07 6.563115e-05 5.906836e-05 2.643449e-07 1.504609e-08
## 4 3.070192e-05 3.872446e-04 1.720691e-04 9.227157e-06 2.454774e-05 2.032114e-06
## 5 2.723465e-07 4.475772e-05 3.005646e-05 3.514862e-05 7.372157e-07 1.307162e-05
## 6 4.554701e-04 2.791394e-05 2.168786e-04 1.621694e-08 1.182600e-05 3.639031e-07
head(all_pca_ind_3$contrib)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6
## 1 1.11824300 0.11724921 0.07867607 1.171622241 0.152248808 0.2899795975
## 2 0.07544555 0.43846352 0.01747215 0.110961077 0.041210182 0.0001195347
## 3 0.43504757 0.03569681 0.01898609 0.073815998 0.003342766 0.0426748821
## 4 0.67136279 3.26051009 0.65179267 0.002064855 0.934499758 1.3571435270
## 5 0.20492419 0.11718767 0.12045877 0.767300731 0.031864823 0.2189694260
## 6 0.07496891 0.48178208 0.53719796 0.078576986 0.118877695 0.0296128148
## Dim.7 Dim.8 Dim.9 Dim.10 Dim.11 Dim.12
## 1 1.213658e+00 0.058529188 0.0104067498 0.38579553 0.041345602 0.496569180
## 2 4.644418e-05 0.021414629 0.2136510859 0.61411732 0.395342837 0.016782911
## 3 1.162015e-01 0.003496244 0.0002441497 0.10341857 0.219301852 0.010411881
## 4 5.321820e-01 0.413974398 0.8326944234 0.62524063 0.792871893 0.688259771
## 5 2.281347e-01 0.149329546 0.0293378859 0.07149313 0.253691925 0.008219107
## 6 6.260704e-02 0.010092981 0.0075101521 0.14099971 0.007204489 0.004463446
## Dim.13 Dim.14 Dim.15 Dim.16 Dim.17 Dim.18
## 1 0.007783311 5.341595e-01 0.6761315700 1.2217735862 0.2085171427 1.008745049
## 2 0.648242810 4.779908e-01 0.0001503809 0.9263530122 0.0008765592 0.338386526
## 3 0.122778155 3.114159e-04 0.4363003250 0.2325976169 0.1078460556 0.025850254
## 4 0.634223945 2.659269e-01 0.0532264801 0.0058071962 0.6891979240 0.004298581
## 5 0.109590751 3.254090e-01 0.1797855904 0.0512489941 0.0524493991 0.001169745
## 6 0.000502683 1.101305e-05 0.0595555177 0.0003558783 0.1146937985 0.295687975
## Dim.19 Dim.20 Dim.21 Dim.22 Dim.23 Dim.24
## 1 0.063559490 0.67353144 5.461950e-02 0.030361274 0.051577078 2.989766e-01
## 2 0.217698967 0.07347197 3.506119e-02 0.057292461 0.342248175 1.240758e-03
## 3 0.547568039 0.23595117 5.673697e-01 0.023295809 0.039849794 1.027905e-01
## 4 0.002541261 1.21816714 1.105415e+00 0.264631499 0.111206018 2.291441e-01
## 5 0.756313641 0.02419469 7.964267e-02 0.001995239 0.140414495 2.767180e-04
## 6 0.059882566 0.02862278 3.383842e-05 0.065555803 0.008092102 7.852393e-05
## Dim.25 Dim.26 Dim.27 Dim.28 Dim.29 Dim.30
## 1 0.2589109285 0.8726191306 1.62489212 1.271865e-01 0.489055649 2.9390496667
## 2 0.3300518050 0.0363539416 0.83687792 1.176930e-01 0.007591915 0.0046087798
## 3 0.3325646996 0.0004819203 0.06258026 2.445369e-01 0.002322785 0.0007441001
## 4 0.0681767579 1.6279265773 0.85723913 1.995855e-01 1.126993004 0.5250824154
## 5 0.0001064411 0.0331157040 0.02635442 1.338092e-01 0.005956909 0.5944638618
## 6 0.1700495704 0.0197294722 0.18166059 5.897596e-05 0.091283581 0.0158092037
all_pca_ind_4 <- get_pca_ind(all_pca_4)
all_pca_ind_4
## Principal Component Analysis Results for individuals
## ===================================================
## Name Description
## 1 "$coord" "Coordinates for the individuals"
## 2 "$cos2" "Cos2 for the individuals"
## 3 "$contrib" "contributions of the individuals"
head(all_pca_ind_4$coord)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6 Dim.7
## 1 -9.192837 1.948583 -1.1231662 3.6337309 -1.1951101 1.41142445 2.15936987
## 2 -2.387802 -3.768172 -0.5292927 1.1182639 0.6217750 0.02865635 0.01335809
## 3 -5.733896 -1.075174 -0.5517476 0.9120827 -0.1770859 0.54145215 -0.66816648
## 4 -7.122953 10.275589 -3.2327895 0.1525470 -2.9608784 3.05342182 1.42991070
## 5 -3.935302 -1.948072 1.3897667 2.9406393 0.5467474 -1.22649464 -0.93621255
## 6 -2.380247 3.949929 -2.9348768 0.9410369 -1.0560419 -0.45103865 0.49044512
## Dim.8 Dim.9 Dim.10 Dim.11 Dim.12 Dim.13
## 1 0.39840723 -0.15711836 -0.8774019 0.2629555 -0.8590145 0.10338766
## 2 -0.24098846 -0.71190482 1.1069949 0.8131197 0.1579226 -0.94352928
## 3 -0.09737374 0.02406564 0.4542754 -0.6056039 0.1243871 -0.41062660
## 4 -1.05956524 -1.40543967 -1.1169753 -1.1515135 1.0113158 -0.93327116
## 5 -0.63637606 -0.26380546 0.3777045 0.6513596 -0.1105154 0.38794797
## 6 0.16544388 -0.13347310 -0.5304312 0.1097663 0.0814415 -0.02627445
## Dim.14 Dim.15 Dim.16 Dim.17 Dim.18 Dim.19
## 1 0.690804097 -0.601793127 -0.74511579 0.2654708 0.54956269 0.13376750
## 2 0.653475327 0.008974867 0.64880869 0.0172122 -0.31829738 -0.24756463
## 3 -0.016679759 0.483419744 -0.32511053 -0.1909185 0.08797493 -0.39262636
## 4 0.487416897 -0.168847832 -0.05137025 -0.4826339 0.03587477 -0.02674762
## 5 0.539180548 0.310319492 0.15260581 -0.1331423 0.01871424 0.46143590
## 6 -0.003136701 0.178604591 0.01271684 -0.1968864 0.29753864 -0.12984063
## Dim.20 Dim.21 Dim.22 Dim.23 Dim.24 Dim.25
## 1 0.34556490 0.09651489 0.06884992 0.08451859 -0.175256284 0.151020059
## 2 -0.11413308 -0.07732747 -0.09457845 -0.21771806 0.011290118 0.170510254
## 3 -0.20453223 0.31106671 -0.06030903 -0.07429112 0.102761759 -0.171158125
## 4 -0.46473366 0.43419337 -0.20326576 -0.12410465 0.153429663 -0.077495702
## 5 0.06549539 -0.11654493 -0.01764985 0.13945364 -0.005331797 -0.003062062
## 6 -0.07123716 -0.00240229 -0.10116937 0.03347762 0.002840246 -0.122390361
## Dim.26 Dim.27 Dim.28 Dim.29 Dim.30
## 1 -0.201503451 0.25258499 0.0339144536 0.045647720 0.0471692081
## 2 -0.041128785 -0.18127017 -0.0326241827 -0.005687424 0.0018678763
## 3 0.004735412 -0.04956943 -0.0470258247 0.003145897 -0.0007505348
## 4 -0.275224778 -0.18346206 -0.0424843320 -0.069294786 0.0199374155
## 5 0.039254289 -0.03216785 0.0347862199 0.005037910 -0.0212137698
## 6 -0.030298969 0.08445505 -0.0007303007 -0.019721334 -0.0034594744
head(all_pca_ind_4$cos2)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6
## 1 0.7366868 0.03309951 0.010996938 0.1151037016 0.0124508677 1.736597e-02
## 2 0.2164877 0.53913561 0.010637227 0.0474815863 0.0146792249 3.118017e-05
## 3 0.8781764 0.03087731 0.008131356 0.0222203296 0.0008376258 7.830730e-03
## 4 0.2593764 0.53978871 0.053427543 0.0001189646 0.0448178978 4.766328e-02
## 5 0.4498315 0.11023095 0.056101900 0.2511755035 0.0086829482 4.369433e-02
## 6 0.1722700 0.47439927 0.261905853 0.0269264275 0.0339099994 6.185759e-03
## Dim.7 Dim.8 Dim.9 Dim.10 Dim.11 Dim.12
## 1 4.064787e-02 0.0013836880 2.151977e-04 0.006710902 0.0006027652 0.0064325731
## 2 6.775254e-06 0.0022051042 1.924334e-02 0.046529449 0.0251041522 0.0009469458
## 3 1.192481e-02 0.0002532595 1.546953e-05 0.005512144 0.0097962404 0.0004132687
## 4 1.045269e-02 0.0057393901 1.009799e-02 0.006378190 0.0067787311 0.0052285821
## 5 2.545908e-02 0.0117630901 2.021442e-03 0.004143792 0.0123235390 0.0003547641
## 6 7.313854e-03 0.0008322750 5.416927e-04 0.008555071 0.0003663563 0.0002016773
## Dim.13 Dim.14 Dim.15 Dim.16 Dim.17 Dim.18
## 1 9.317968e-05 4.160002e-03 3.157026e-03 4.839843e-03 6.143519e-04 2.632802e-03
## 2 3.380239e-02 1.621418e-02 3.058389e-06 1.598343e-02 1.124889e-05 3.846828e-03
## 3 4.503770e-03 7.431246e-06 6.242102e-03 2.823216e-03 9.735943e-04 2.067283e-04
## 4 4.452727e-03 1.214539e-03 1.457476e-04 1.349067e-05 1.190820e-03 6.579435e-06
## 5 4.371603e-03 8.444271e-03 2.797125e-03 6.764503e-04 5.149036e-04 1.017275e-05
## 6 2.099098e-05 2.991656e-07 9.699530e-04 4.917266e-06 1.178683e-03 2.691858e-03
## Dim.19 Dim.20 Dim.21 Dim.22 Dim.23 Dim.24
## 1 1.559858e-04 0.0010409815 8.120307e-05 4.132289e-05 6.227135e-05 2.677509e-04
## 2 2.327093e-03 0.0004946064 2.270410e-04 3.396417e-04 1.799805e-03 4.839869e-06
## 3 4.117570e-03 0.0011173921 2.584575e-03 9.715089e-05 1.474198e-04 2.820624e-04
## 4 3.657467e-06 0.0011041259 9.637773e-04 2.112218e-04 7.873837e-05 1.203453e-04
## 5 6.184670e-03 0.0001245992 3.945304e-04 9.048481e-06 5.648765e-04 8.257356e-07
## 6 5.126095e-04 0.0001543045 1.754755e-07 3.112171e-04 3.407804e-05 2.452886e-07
## Dim.25 Dim.26 Dim.27 Dim.28 Dim.29 Dim.30
## 1 1.988168e-04 3.539556e-04 5.561589e-04 1.002659e-05 1.816444e-05 1.939550e-05
## 2 1.103920e-03 6.422859e-05 1.247640e-03 4.041252e-05 1.228197e-06 1.324747e-07
## 3 7.824870e-04 5.989596e-07 6.563115e-05 5.906836e-05 2.643449e-07 1.504609e-08
## 4 3.070192e-05 3.872446e-04 1.720691e-04 9.227157e-06 2.454774e-05 2.032114e-06
## 5 2.723465e-07 4.475772e-05 3.005646e-05 3.514862e-05 7.372157e-07 1.307162e-05
## 6 4.554701e-04 2.791394e-05 2.168786e-04 1.621694e-08 1.182600e-05 3.639031e-07
head(all_pca_ind_4$contrib)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6
## 1 1.11824300 0.11724921 0.07867607 1.171622241 0.152248808 0.2899795975
## 2 0.07544555 0.43846352 0.01747215 0.110961077 0.041210182 0.0001195347
## 3 0.43504757 0.03569681 0.01898609 0.073815998 0.003342766 0.0426748821
## 4 0.67136279 3.26051009 0.65179267 0.002064855 0.934499758 1.3571435270
## 5 0.20492419 0.11718767 0.12045877 0.767300731 0.031864823 0.2189694260
## 6 0.07496891 0.48178208 0.53719796 0.078576986 0.118877695 0.0296128148
## Dim.7 Dim.8 Dim.9 Dim.10 Dim.11 Dim.12
## 1 1.213658e+00 0.058529188 0.0104067498 0.38579553 0.041345602 0.496569180
## 2 4.644418e-05 0.021414629 0.2136510859 0.61411732 0.395342837 0.016782911
## 3 1.162015e-01 0.003496244 0.0002441497 0.10341857 0.219301852 0.010411881
## 4 5.321820e-01 0.413974398 0.8326944234 0.62524063 0.792871893 0.688259771
## 5 2.281347e-01 0.149329546 0.0293378859 0.07149313 0.253691925 0.008219107
## 6 6.260704e-02 0.010092981 0.0075101521 0.14099971 0.007204489 0.004463446
## Dim.13 Dim.14 Dim.15 Dim.16 Dim.17 Dim.18
## 1 0.007783311 5.341595e-01 0.6761315700 1.2217735862 0.2085171427 1.008745049
## 2 0.648242810 4.779908e-01 0.0001503809 0.9263530122 0.0008765592 0.338386526
## 3 0.122778155 3.114159e-04 0.4363003250 0.2325976169 0.1078460556 0.025850254
## 4 0.634223945 2.659269e-01 0.0532264801 0.0058071962 0.6891979240 0.004298581
## 5 0.109590751 3.254090e-01 0.1797855904 0.0512489941 0.0524493991 0.001169745
## 6 0.000502683 1.101305e-05 0.0595555177 0.0003558783 0.1146937985 0.295687975
## Dim.19 Dim.20 Dim.21 Dim.22 Dim.23 Dim.24
## 1 0.063559490 0.67353144 5.461950e-02 0.030361274 0.051577078 2.989766e-01
## 2 0.217698967 0.07347197 3.506119e-02 0.057292461 0.342248175 1.240758e-03
## 3 0.547568039 0.23595117 5.673697e-01 0.023295809 0.039849794 1.027905e-01
## 4 0.002541261 1.21816714 1.105415e+00 0.264631499 0.111206018 2.291441e-01
## 5 0.756313641 0.02419469 7.964267e-02 0.001995239 0.140414495 2.767180e-04
## 6 0.059882566 0.02862278 3.383842e-05 0.065555803 0.008092102 7.852393e-05
## Dim.25 Dim.26 Dim.27 Dim.28 Dim.29 Dim.30
## 1 0.2589109285 0.8726191306 1.62489212 1.271865e-01 0.489055649 2.9390496667
## 2 0.3300518050 0.0363539416 0.83687792 1.176930e-01 0.007591915 0.0046087798
## 3 0.3325646996 0.0004819203 0.06258026 2.445369e-01 0.002322785 0.0007441001
## 4 0.0681767579 1.6279265773 0.85723913 1.995855e-01 1.126993004 0.5250824154
## 5 0.0001064411 0.0331157040 0.02635442 1.338092e-01 0.005956909 0.5944638618
## 6 0.1700495704 0.0197294722 0.18166059 5.897596e-05 0.091283581 0.0158092037
all_pca_ind_5 <- get_pca_ind(all_pca_5)
all_pca_ind_5
## Principal Component Analysis Results for individuals
## ===================================================
## Name Description
## 1 "$coord" "Coordinates for the individuals"
## 2 "$cos2" "Cos2 for the individuals"
## 3 "$contrib" "contributions of the individuals"
head(all_pca_ind_5$coord)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -9.184755 -1.946870 -1.1221788 3.6305364 1.1940595 1.41018364
## [2,] -2.385703 3.764859 -0.5288274 1.1172808 -0.6212284 0.02863116
## [3,] -5.728855 1.074229 -0.5512625 0.9112808 0.1769302 0.54097615
## [4,] -7.116691 -10.266556 -3.2299475 0.1524129 2.9582754 3.05073750
## [5,] -3.931842 1.946359 1.3885450 2.9380542 -0.5462667 -1.22541641
## [6,] -2.378155 -3.946456 -2.9322967 0.9402096 1.0551135 -0.45064213
## [,7] [,8] [,9] [,10] [,11] [,12]
## [1,] 2.15747152 0.39805698 -0.15698023 -0.8766305 -0.2627243 -0.8582593
## [2,] 0.01334635 -0.24077660 -0.71127897 1.1060218 -0.8124048 0.1577838
## [3,] -0.66757908 -0.09728813 0.02404449 0.4538760 0.6050715 0.1242777
## [4,] 1.42865363 -1.05863376 -1.40420412 -1.1159933 1.1505012 1.0104267
## [5,] -0.93538950 -0.63581661 -0.26357355 0.3773724 -0.6507870 -0.1104183
## [6,] 0.49001396 0.16529843 -0.13335576 -0.5299649 -0.1096698 0.0813699
## [,13] [,14] [,15] [,16] [,17] [,18]
## [1,] 0.10329677 -0.690196797 0.601264078 0.74446075 -0.26523740 -0.54907956
## [2,] -0.94269981 -0.652900844 -0.008966977 -0.64823831 -0.01719707 0.31801756
## [3,] -0.41026561 0.016665095 -0.482994760 0.32482472 0.19075064 -0.08789759
## [4,] -0.93245070 -0.486988399 0.168699395 0.05132509 0.48220960 -0.03584323
## [5,] 0.38760691 -0.538706543 -0.310046684 -0.15247165 0.13302526 -0.01869779
## [6,] -0.02625135 0.003133944 -0.178447576 -0.01270566 0.19671335 -0.29727706
## [,19] [,20] [,21] [,22] [,23] [,24]
## [1,] 0.1336499 0.34526111 0.096430045 -0.06878939 0.08444429 0.175102213
## [2,] -0.2473470 -0.11403274 -0.077259494 0.09449530 -0.21752666 -0.011280193
## [3,] -0.3922812 -0.20435242 0.310793246 0.06025601 -0.07422581 -0.102671419
## [4,] -0.0267241 -0.46432511 0.433811661 0.20308706 -0.12399554 -0.153294780
## [5,] 0.4610302 0.06543782 -0.116442469 0.01763433 0.13933105 0.005327110
## [6,] -0.1297265 -0.07117453 -0.002400178 0.10108043 0.03344819 -0.002837749
## [,25] [,26] [,27] [,28] [,29]
## [1,] 0.150887294 -0.201326305 -0.25236294 -0.0338846387 0.045607590
## [2,] 0.170360355 -0.041092627 0.18111081 0.0325955021 -0.005682424
## [3,] -0.171007656 0.004731249 0.04952586 0.0469844833 0.003143131
## [4,] -0.077427574 -0.274982822 0.18330078 0.0424469831 -0.069233868
## [5,] -0.003059371 0.039219780 0.03213957 -0.0347556386 0.005033481
## [6,] -0.122282765 -0.030272333 -0.08438081 0.0007296587 -0.019703996
## [,30]
## [1,] 0.0471277407
## [2,] 0.0018662342
## [3,] -0.0007498749
## [4,] 0.0199198881
## [5,] -0.0211951203
## [6,] -0.0034564331
head(all_pca_ind_5$cos2)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.7366868 0.03309951 0.010996938 0.1151037016 0.0124508677 1.736597e-02
## [2,] 0.2164877 0.53913561 0.010637227 0.0474815863 0.0146792249 3.118017e-05
## [3,] 0.8781764 0.03087731 0.008131356 0.0222203296 0.0008376258 7.830730e-03
## [4,] 0.2593764 0.53978871 0.053427543 0.0001189646 0.0448178978 4.766328e-02
## [5,] 0.4498315 0.11023095 0.056101900 0.2511755035 0.0086829482 4.369433e-02
## [6,] 0.1722700 0.47439927 0.261905853 0.0269264275 0.0339099994 6.185759e-03
## [,7] [,8] [,9] [,10] [,11]
## [1,] 4.064787e-02 0.0013836880 2.151977e-04 0.006710902 0.0006027652
## [2,] 6.775254e-06 0.0022051042 1.924334e-02 0.046529449 0.0251041522
## [3,] 1.192481e-02 0.0002532595 1.546953e-05 0.005512144 0.0097962404
## [4,] 1.045269e-02 0.0057393901 1.009799e-02 0.006378190 0.0067787311
## [5,] 2.545908e-02 0.0117630901 2.021442e-03 0.004143792 0.0123235390
## [6,] 7.313854e-03 0.0008322750 5.416927e-04 0.008555071 0.0003663563
## [,12] [,13] [,14] [,15] [,16]
## [1,] 0.0064325731 9.317968e-05 4.160002e-03 3.157026e-03 4.839843e-03
## [2,] 0.0009469458 3.380239e-02 1.621418e-02 3.058389e-06 1.598343e-02
## [3,] 0.0004132687 4.503770e-03 7.431246e-06 6.242102e-03 2.823216e-03
## [4,] 0.0052285821 4.452727e-03 1.214539e-03 1.457476e-04 1.349067e-05
## [5,] 0.0003547641 4.371603e-03 8.444271e-03 2.797125e-03 6.764503e-04
## [6,] 0.0002016773 2.099098e-05 2.991656e-07 9.699530e-04 4.917266e-06
## [,17] [,18] [,19] [,20] [,21]
## [1,] 6.143519e-04 2.632802e-03 1.559858e-04 0.0010409815 8.120307e-05
## [2,] 1.124889e-05 3.846828e-03 2.327093e-03 0.0004946064 2.270410e-04
## [3,] 9.735943e-04 2.067283e-04 4.117570e-03 0.0011173921 2.584575e-03
## [4,] 1.190820e-03 6.579435e-06 3.657467e-06 0.0011041259 9.637773e-04
## [5,] 5.149036e-04 1.017275e-05 6.184670e-03 0.0001245992 3.945304e-04
## [6,] 1.178683e-03 2.691858e-03 5.126095e-04 0.0001543045 1.754755e-07
## [,22] [,23] [,24] [,25] [,26]
## [1,] 4.132289e-05 6.227135e-05 2.677509e-04 1.988168e-04 3.539556e-04
## [2,] 3.396417e-04 1.799805e-03 4.839869e-06 1.103920e-03 6.422859e-05
## [3,] 9.715089e-05 1.474198e-04 2.820624e-04 7.824870e-04 5.989596e-07
## [4,] 2.112218e-04 7.873837e-05 1.203453e-04 3.070192e-05 3.872446e-04
## [5,] 9.048481e-06 5.648765e-04 8.257356e-07 2.723465e-07 4.475772e-05
## [6,] 3.112171e-04 3.407804e-05 2.452886e-07 4.554701e-04 2.791394e-05
## [,27] [,28] [,29] [,30]
## [1,] 5.561589e-04 1.002659e-05 1.816444e-05 1.939550e-05
## [2,] 1.247640e-03 4.041252e-05 1.228197e-06 1.324747e-07
## [3,] 6.563115e-05 5.906836e-05 2.643449e-07 1.504609e-08
## [4,] 1.720691e-04 9.227157e-06 2.454774e-05 2.032114e-06
## [5,] 3.005646e-05 3.514862e-05 7.372157e-07 1.307162e-05
## [6,] 2.168786e-04 1.621694e-08 1.182600e-05 3.639031e-07
head(all_pca_ind_5$contrib)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1.11824300 0.11724921 0.07867607 1.171622241 0.152248808 0.2899795975
## [2,] 0.07544555 0.43846352 0.01747215 0.110961077 0.041210182 0.0001195347
## [3,] 0.43504757 0.03569681 0.01898609 0.073815998 0.003342766 0.0426748821
## [4,] 0.67136279 3.26051009 0.65179267 0.002064855 0.934499758 1.3571435270
## [5,] 0.20492419 0.11718767 0.12045877 0.767300731 0.031864823 0.2189694260
## [6,] 0.07496891 0.48178208 0.53719796 0.078576986 0.118877695 0.0296128148
## [,7] [,8] [,9] [,10] [,11] [,12]
## [1,] 1.213658e+00 0.058529188 0.0104067498 0.38579553 0.041345602 0.496569180
## [2,] 4.644418e-05 0.021414629 0.2136510859 0.61411732 0.395342837 0.016782911
## [3,] 1.162015e-01 0.003496244 0.0002441497 0.10341857 0.219301852 0.010411881
## [4,] 5.321820e-01 0.413974398 0.8326944234 0.62524063 0.792871893 0.688259771
## [5,] 2.281347e-01 0.149329546 0.0293378859 0.07149313 0.253691925 0.008219107
## [6,] 6.260704e-02 0.010092981 0.0075101521 0.14099971 0.007204489 0.004463446
## [,13] [,14] [,15] [,16] [,17]
## [1,] 0.007783311 5.341595e-01 0.6761315700 1.2217735862 0.2085171427
## [2,] 0.648242810 4.779908e-01 0.0001503809 0.9263530122 0.0008765592
## [3,] 0.122778155 3.114159e-04 0.4363003250 0.2325976169 0.1078460556
## [4,] 0.634223945 2.659269e-01 0.0532264801 0.0058071962 0.6891979240
## [5,] 0.109590751 3.254090e-01 0.1797855904 0.0512489941 0.0524493991
## [6,] 0.000502683 1.101305e-05 0.0595555177 0.0003558783 0.1146937985
## [,18] [,19] [,20] [,21] [,22] [,23]
## [1,] 1.008745049 0.063559490 0.67353144 5.461950e-02 0.030361274 0.051577078
## [2,] 0.338386526 0.217698967 0.07347197 3.506119e-02 0.057292461 0.342248175
## [3,] 0.025850254 0.547568039 0.23595117 5.673697e-01 0.023295809 0.039849794
## [4,] 0.004298581 0.002541261 1.21816714 1.105415e+00 0.264631499 0.111206018
## [5,] 0.001169745 0.756313641 0.02419469 7.964267e-02 0.001995239 0.140414495
## [6,] 0.295687975 0.059882566 0.02862278 3.383842e-05 0.065555803 0.008092102
## [,24] [,25] [,26] [,27] [,28] [,29]
## [1,] 2.989766e-01 0.2589109285 0.8726191306 1.62489212 1.271865e-01 0.489055649
## [2,] 1.240758e-03 0.3300518050 0.0363539416 0.83687792 1.176930e-01 0.007591915
## [3,] 1.027905e-01 0.3325646996 0.0004819203 0.06258026 2.445369e-01 0.002322785
## [4,] 2.291441e-01 0.0681767579 1.6279265773 0.85723913 1.995855e-01 1.126993004
## [5,] 2.767180e-04 0.0001064411 0.0331157040 0.02635442 1.338092e-01 0.005956909
## [6,] 7.852393e-05 0.1700495704 0.0197294722 0.18166059 5.897596e-05 0.091283581
## [,30]
## [1,] 2.9390496667
## [2,] 0.0046087798
## [3,] 0.0007441001
## [4,] 0.5250824154
## [5,] 0.5944638618
## [6,] 0.0158092037
Each of these results is useful in its own right, although a plot is usually needed to interpret the information they hold. Some of these plots and their applications are covered from this point onward.
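The tables printed above differ between PCA objects only in the sign (and, for the last object, very slightly in the scale) of the coordinates, whereas the cos2 and contrib values agree. The following base R sketch illustrates why: both quantities are built from squared coordinates divided by a sum of squares, so a sign flip or a common scale factor cancels out. It uses the built-in USArrests dataset as a stand-in, the helpers cos2_of() and contrib_of() are hypothetical names introduced for this example, and the prcomp()/princomp() pairing is only an illustration, not a description of how the five objects were built.

```r
# Two PCA fits of the same (stand-in) data with different conventions
fit_a <- prcomp(USArrests, scale. = TRUE)   # standardises with divisor n - 1
fit_b <- princomp(USArrests, cor = TRUE)    # standardises with divisor n

coord_a <- fit_a$x
coord_b <- fit_b$scores

# Coordinates may differ in sign and differ slightly in scale
head(cbind(a = coord_a[, 1], b = coord_b[, 1]))

# cos2: share of an individual's squared distance to the origin
# that is captured by each component (rows sum to 1)
cos2_of <- function(coord) coord^2 / rowSums(coord^2)

# contrib: share (in %) of a component's total squared coordinates
# that is due to each individual (columns sum to 100)
contrib_of <- function(coord) sweep(coord^2, 2, colSums(coord^2), "/") * 100

# Signs and the common scale factor cancel, so both checks should be TRUE
all.equal(cos2_of(coord_a), cos2_of(coord_b), check.attributes = FALSE)
all.equal(contrib_of(coord_a), contrib_of(coord_b), check.attributes = FALSE)
```

The same reasoning applies to any plot derived from cos2 or contrib, which is why such plots do not depend on the sign convention of the function that produced the PCA.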
As stated earlier, Principal Components are linear combinations of the dataset's original features/variables. Each of these linear combinations depends more heavily on some variables than on others; this distribution is stored within the results for variables under the $contrib index and is expressed as a percentage (that is, how much of a given Principal Component is determined by the variable in question). Note that each individual also contributes to the Principal Components (albeit less noticeably, since there are far more individuals than variables); these individual contributions are stored within the results for individuals under the $contrib index, also expressed as a percentage.
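To make these percentages more concrete, the following base R sketch recomputes them by hand from a prcomp() fit. It is only an illustration under stated assumptions: it uses the built-in USArrests dataset rather than the breast cancer data, the helper name contrib_pct() is made up for this example, and individuals are assumed to be equally weighted.

```r
# Hand-computed contributions, assuming equally weighted individuals.
# USArrests is a stand-in dataset; contrib_pct() is a hypothetical helper.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Share (in %) of each row in the squared values of every column
contrib_pct <- function(m) {
  sweep(m^2, 2, colSums(m^2), "/") * 100
}

# Variables: based on squared loadings (each eigenvector has unit length)
var_contrib <- contrib_pct(pca$rotation)

# Individuals: squared coordinates relative to the column total
ind_contrib <- contrib_pct(pca$x)

round(var_contrib, 2)
round(head(ind_contrib), 2)

# Within each Principal Component the contributions add up to 100 %
colSums(var_contrib)
colSums(ind_contrib)
```

Because each column adds up to 100, a variable (or individual) whose contribution is well above the uniform share of 100 divided by the number of rows can be read as weighing more than average on that component.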
The function fviz_contrib() from the factoextra package can be used to draw a barplot of these contributions.
# Do not run this code snippet, as it is only here for illustration purposes
library(factoextra)
fviz_contrib(X,
choice = c("row", "col", "var", "ind", "quanti.var", "quali.var",
"group", "partial.axes"),
axes = 1,
fill = "steelblue",
color = "steelblue",
sort.val = c("desc", "asc", "none"),
top = Inf,
ggtheme = theme_minimal(),
...)
X: an object of class PCA, CA, MCA, FAMD, MFA and HMFA (from the FactoMineR package); prcomp and princomp (from R built-in functions); dudi, pca, coa and acm (from the ade4 package); ca (from the ca package).
choice: “row” and “col” for CA objects; “var” and “ind” for PCA or MCA objects; “var”, “ind”, “quanti.var”, “quali.var” and “group” for FAMD, MFA and HMFA objects.
sort.val: “none” (no sorting), “asc” (for ascending) or “desc” (for descending).
More information regarding the fviz_contrib() function and its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/factoextra/versions/1.0.7/topics/fviz_contrib
Let’s now observe the results of applying this function to the previously constructed PCA objects. Several graphs are plotted to show these contributions in different scenarios: two single-dimension ones and a combination of both (a multi-dimension scenario). To do so, the function grid.arrange() from the gridExtra package is used; more information regarding this function and its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/gridExtra/versions/2.3/topics/arrangeGrob
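Since the same three barplots are built for every PCA object, the pattern can be wrapped in a small helper. The sketch below is only a suggestion: the function name contrib_panels() is hypothetical, it is demonstrated on a prcomp() fit of the built-in USArrests dataset instead of the PCA objects built earlier, and it assumes that the factoextra, ggplot2 and gridExtra packages are installed.

```r
library(factoextra)
library(ggplot2)
library(gridExtra)

# Hypothetical helper: one barplot per axis selection (PC1, PC2 and PC1 + PC2)
contrib_panels <- function(pca_object, choice = "var") {
  axes_list <- list(1, 2, 1:2)
  lapply(axes_list, function(axes) {
    fviz_contrib(pca_object, choice = choice, axes = axes) +
      theme(text = element_text(size = 10),
            axis.text.x = element_text(angle = 90))
  })
}

# Stand-in PCA object; any object accepted by fviz_contrib() should work
demo_pca <- prcomp(USArrests, scale. = TRUE)
panels <- contrib_panels(demo_pca, choice = "var")

# Stack the three barplots in a single column
grid.arrange(grobs = panels, nrow = 3, ncol = 1)
```

Under the same assumptions, a call such as contrib_panels(all_pca_1, choice = "ind") should produce the corresponding panels for individuals.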
The following code snippets and plots cover the contribution of variables to PCs:
barplot_1 <- fviz_contrib(all_pca_1,
choice = "var",
axes = 1) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_2 <- fviz_contrib(all_pca_1,
choice = "var",
axes = 2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_3 <- fviz_contrib(all_pca_1,
choice = "var",
axes = 1:2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)
barplot_1 <- fviz_contrib(all_pca_2,
choice = "var",
axes = 1) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_2 <- fviz_contrib(all_pca_2,
choice = "var",
axes = 2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_3 <- fviz_contrib(all_pca_2,
choice = "var",
axes = 1:2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)
barplot_1 <- fviz_contrib(all_pca_3,
choice = "var",
axes = 1) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_2 <- fviz_contrib(all_pca_3,
choice = "var",
axes = 2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_3 <- fviz_contrib(all_pca_3,
choice = "var",
axes = 1:2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)
barplot_1 <- fviz_contrib(all_pca_4,
choice = "var",
axes = 1) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_2 <- fviz_contrib(all_pca_4,
choice = "var",
axes = 2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_3 <- fviz_contrib(all_pca_4,
choice = "var",
axes = 1:2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)
barplot_1 <- fviz_contrib(all_pca_5,
choice = "var",
axes = 1) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_2 <- fviz_contrib(all_pca_5,
choice = "var",
axes = 2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_3 <- fviz_contrib(all_pca_5,
choice = "var",
axes = 1:2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)
Unsurprisingly, all of the resulting plots are identical, as they should be. Let’s now evaluate the contribution of individuals to PCs:
barplot_1 <- fviz_contrib(all_pca_1,
choice = "ind",
axes = 1) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_2 <- fviz_contrib(all_pca_1,
choice = "ind",
axes = 2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_3 <- fviz_contrib(all_pca_1,
choice = "ind",
axes = 1:2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)
barplot_1 <- fviz_contrib(all_pca_2,
choice = "ind",
axes = 1) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_2 <- fviz_contrib(all_pca_2,
choice = "ind",
axes = 2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_3 <- fviz_contrib(all_pca_2,
choice = "ind",
axes = 1:2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)
barplot_1 <- fviz_contrib(all_pca_3,
choice = "ind",
axes = 1) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_2 <- fviz_contrib(all_pca_3,
choice = "ind",
axes = 2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_3 <- fviz_contrib(all_pca_3,
choice = "ind",
axes = 1:2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)
barplot_1 <- fviz_contrib(all_pca_4,
choice = "ind",
axes = 1) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_2 <- fviz_contrib(all_pca_4,
choice = "ind",
axes = 2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_3 <- fviz_contrib(all_pca_4,
choice = "ind",
axes = 1:2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)
barplot_1 <- fviz_contrib(all_pca_5,
choice = "ind",
axes = 1) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_2 <- fviz_contrib(all_pca_5,
choice = "ind",
axes = 2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_3 <- fviz_contrib(all_pca_5,
choice = "ind",
axes = 1:2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)
Once again, identical outputs, as they should be. Note that the number of individuals makes it impossible to appreciate the X-axis’ labels (without an extreme zoom-in). A more illustrative approach to visualizing these contributions involves the use of the corrplot() function from the corrplot package, which showcases every contribution (either variables’ or individuals’) on every Principal Component.
# Do not run this code snippet, as it is only here for illustration purposes
library(corrplot)
corrplot(corr,
is.corr = FALSE,
...)
corrplot() is a very complete function with far too many arguments for this document to cover; given that most of them are of negligible relevance for the task at hand, only the key arguments are detailed:
method: the visualization method, one of ‘circle’ (default), ‘square’, ‘ellipse’, ‘number’, ‘pie’, ‘shade’ and ‘color’.
type: whether to display the full matrix (‘full’), the lower triangular matrix (‘lower’) or the upper triangular matrix (‘upper’).
add: TRUE or FALSE determines whether or not to add the graph to an existing plot.
is.corr: TRUE or FALSE determines whether the input matrix is a correlation matrix or not.
diag: TRUE or FALSE determines whether to display the correlation coefficient on the principal diagonal.
More information regarding the corrplot() function and all of its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/corrplot/versions/0.90/topics/corrplot
Let’s now observe the results of applying the function at hand to the Wisconsin Breast Cancer Dataset. Note that the function’s very first argument corresponds to the contributions themselves (previous functions used the PCA objects instead) - let’s evaluate the variables first:
corrplot(all_pca_var_1$contrib, is.corr=FALSE)
corrplot(all_pca_var_2$contrib, is.corr=FALSE)
corrplot(all_pca_var_3$contrib, is.corr=FALSE)
corrplot(as.matrix(all_pca_var_4$contrib), is.corr=FALSE)
corrplot(all_pca_var_5$contrib, is.corr=FALSE)
Once again, all of the resulting plots are identical, as they should be. However, note the use of the function as.matrix() within the dudi.pca() tab: it is needed because of the particular structure of the objects returned by dudi.pca(), which creates oddities when applying certain functions such as get_pca_var() or get_pca_ind(). The objects obtained via said functions are usually of class matrix, but in this case they are of class data.frame instead; as.matrix() converts the data.frame so that the function corrplot() accepts the input.
Applying corrplot() upon the individuals’ contributions would yield a massive plot due to the sheer amount of individuals. Given that, the following code snippets are not rendered in order to keep the document clean and readable.
corrplot(all_pca_ind_1$contrib, is.corr=FALSE)
corrplot(all_pca_ind_2$contrib, is.corr=FALSE)
corrplot(all_pca_ind_3$contrib, is.corr=FALSE)
corrplot(as.matrix(all_pca_ind_4$contrib), is.corr=FALSE)
corrplot(all_pca_ind_5$contrib, is.corr=FALSE)
The quality of representation (cos2) measures how well a given variable is represented within a given Principal Component (or within a set of them). The same logic applies to the individuals: cos2 measures how well each of them is represented within a given Principal Component (or within a set of them). It is worth noting that for any given variable or individual the sum of the cos2 across all the Principal Components is equal to one.
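Both the contributions and the cos2 values can be reproduced by hand, which helps to see what the plots are actually showing. The following base-R sketch uses the built-in USArrests dataset as a stand-in and a standardized prcomp() fit; the formulas are the usual textbook definitions and are assumed (not verified here) to match what get_pca_var() and get_pca_ind() report for such an object.

```r
# Stand-in dataset shipped with R; standardized PCA keeping every component
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Variables: coordinate = loading * standard deviation of the PC, which for
# standardized data is the correlation between the variable and the PC
var_coord <- sweep(pca$rotation, 2, pca$sdev, `*`)
var_cos2 <- var_coord^2
# Contribution (%) = share of a variable in the total cos2 of each PC
var_contrib <- 100 * sweep(var_cos2, 2, colSums(var_cos2), `/`)

# Individuals: squared coordinate over the squared distance to the origin
ind_cos2 <- pca$x^2 / rowSums(pca$x^2)
# Contribution (%) = share of an individual in each PC's sum of squares
ind_contrib <- 100 * sweep(pca$x^2, 2, colSums(pca$x^2), `/`)

rowSums(var_cos2)      # one value per variable, summed across all PCs
colSums(var_contrib)   # one value per PC, summed across all variables
head(rowSums(ind_cos2))
```

The row sums of cos2 equal one only because every Principal Component is kept; restricting the computation to the first two PCs yields the partial quality of representation shown in the multi-dimensional barplots.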
The function fviz_cos2() from the factoextra package helps to visualize through a barplot which of the PCA variables and/or individuals are best represented within a certain Principal Component (or within a set of Principal Components).
# Do not run this code snippet, as it is only here for illustration purposes
library(factoextra)
fviz_cos2(X,
choice = c("row", "col", "var", "ind", "quanti.var", "quali.var", "group"),
axes = 1,
fill = "steelblue",
color = "steelblue",
sort.val = c("desc", "asc", "none"),
top = Inf,
xtickslab.rt = 45,
ggtheme = theme_minimal(),
...)
X: an object of class PCA, CA, MCA, FAMD, MFA and HMFA (from the FactoMineR package); prcomp and princomp (from R built-in functions); dudi, pca, coa and acm (from the ade4 package); ca (from the ca package).
choice: “row” and “col” for CA objects; “var” and “ind” for PCA or MCA objects; “var”, “ind”, “quanti.var”, “quali.var” and “group” for FAMD, MFA and HMFA objects.
sort.val: “none” (no sorting), “asc” (for ascending) or “desc” (for descending).
More information regarding the fviz_cos2() function and its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/factoextra/versions/1.0.7/topics/fviz_cos2
Let’s now observe the results of applying the function at hand to the previously constructed PCA objects. Several graphs are plotted to showcase an evaluation of cos2 in different scenarios: one for each of the first two dimensions and one for both of them combined (a multi-dimensional scenario). To stack these graphs, the function grid.arrange() from the gridExtra package is used once again.
The following code snippets and plots cover the quality of representation (cos2) of variables on PCs:
barplot_1 <- fviz_cos2(all_pca_1,
choice = "var",
axes = 1) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_2 <- fviz_cos2(all_pca_1,
choice = "var",
axes = 2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_3 <- fviz_cos2(all_pca_1,
choice = "var",
axes = 1:2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)
barplot_1 <- fviz_cos2(all_pca_2,
choice = "var",
axes = 1) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_2 <- fviz_cos2(all_pca_2,
choice = "var",
axes = 2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_3 <- fviz_cos2(all_pca_2,
choice = "var",
axes = 1:2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)
barplot_1 <- fviz_cos2(all_pca_3,
choice = "var",
axes = 1) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_2 <- fviz_cos2(all_pca_3,
choice = "var",
axes = 2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_3 <- fviz_cos2(all_pca_3,
choice = "var",
axes = 1:2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)
barplot_1 <- fviz_cos2(all_pca_4,
choice = "var",
axes = 1) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_2 <- fviz_cos2(all_pca_4,
choice = "var",
axes = 2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_3 <- fviz_cos2(all_pca_4,
choice = "var",
axes = 1:2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)
barplot_1 <- fviz_cos2(all_pca_5,
choice = "var",
axes = 1) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_2 <- fviz_cos2(all_pca_5,
choice = "var",
axes = 2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_3 <- fviz_cos2(all_pca_5,
choice = "var",
axes = 1:2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)
Unsurprisingly, all of the resulting plots are identical, as they should be. Let’s now evaluate the cos2 for the individuals:
barplot_1 <- fviz_cos2(all_pca_1,
choice = "ind",
axes = 1) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_2 <- fviz_cos2(all_pca_1,
choice = "ind",
axes = 2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_3 <- fviz_cos2(all_pca_1,
choice = "ind",
axes = 1:2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)
barplot_1 <- fviz_cos2(all_pca_2,
choice = "ind",
axes = 1) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_2 <- fviz_cos2(all_pca_2,
choice = "ind",
axes = 2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_3 <- fviz_cos2(all_pca_2,
choice = "ind",
axes = 1:2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)
barplot_1 <- fviz_cos2(all_pca_3,
choice = "ind",
axes = 1) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_2 <- fviz_cos2(all_pca_3,
choice = "ind",
axes = 2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_3 <- fviz_cos2(all_pca_3,
choice = "ind",
axes = 1:2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)
barplot_1 <- fviz_cos2(all_pca_4,
choice = "ind",
axes = 1) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_2 <- fviz_cos2(all_pca_4,
choice = "ind",
axes = 2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_3 <- fviz_cos2(all_pca_4,
choice = "ind",
axes = 1:2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)
barplot_1 <- fviz_cos2(all_pca_5,
choice = "ind",
axes = 1) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_2 <- fviz_cos2(all_pca_5,
choice = "ind",
axes = 2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
barplot_3 <- fviz_cos2(all_pca_5,
choice = "ind",
axes = 1:2) +
theme(text = element_text(size = 10),
axis.text.x = element_text(angle = 90))
grid.arrange(barplot_1, barplot_2, barplot_3, nrow = 3, ncol = 1)
Once again, identical outputs, as they should be. Note that, once again, the number of individuals makes it impossible to appreciate the X-axis’ labels (without an extreme zoom-in).
As was already stated, a more illustrative approach to visualizing a data array such as the contributions’ one or this one (with the quality of representations for both variables and individuals) can be achieved through the use of the corrplot() function from the corrplot package, which in this case showcases the quality of representation of every variable or individual within every Principal Component.
Let’s now observe the results of applying said function to the Wisconsin Breast Cancer Dataset. Note that the function’s very first argument corresponds to the cos2 results themselves (some previous functions used the PCA objects instead) - let’s evaluate the variables first:
corrplot(all_pca_var_1$cos2, is.corr=FALSE)
corrplot(all_pca_var_2$cos2, is.corr=FALSE)
corrplot(all_pca_var_3$cos2, is.corr=FALSE)
corrplot(as.matrix(all_pca_var_4$cos2), is.corr=FALSE)
corrplot(all_pca_var_5$cos2, is.corr=FALSE)
Once again, all of the resulting plots are identical, as they should be. Note the use of the function as.matrix() within the dudi.pca() tab to convert, once again, the object of class data.frame into one of class matrix so that the function corrplot() can use it as an argument without issues.
Applying corrplot() upon the individuals’ cos2 would yield a massive plot due to the sheer amount of individuals. Given that, the following code snippets are not rendered in order to keep the document clean and readable.
corrplot(all_pca_ind_1$cos2, is.corr=FALSE)
corrplot(all_pca_ind_2$cos2, is.corr=FALSE)
corrplot(all_pca_ind_3$cos2, is.corr=FALSE)
corrplot(as.matrix(all_pca_ind_4$cos2), is.corr=FALSE)
corrplot(all_pca_ind_5$cos2, is.corr=FALSE)
As was stated previously, there is no well-accepted objective way to decide how many Principal Components are enough: this depends on the specific field of application and the specific dataset (biomedical scenarios tend to require a high cumulative variance since people’s health is at stake). However, the first few Principal Components are the most important ones, both for finding interesting patterns in the data and for representing it.
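A common rule of thumb is to keep the smallest number of Principal Components whose cumulative explained variance reaches a chosen target. The following base-R sketch illustrates the computation on the built-in USArrests dataset (a stand-in); the 90% threshold is a hypothetical value chosen only for illustration, not a recommendation.

```r
# Stand-in dataset shipped with R; standardized PCA
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each PC and its running total
explained <- pca$sdev^2 / sum(pca$sdev^2)
cumulative <- cumsum(explained)

# Hypothetical target: keep enough PCs to explain at least 90% of the variance
threshold <- 0.90
n_components <- which(cumulative >= threshold)[1]

round(cumulative, 3)
n_components
```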
The correlation circle showcases the correlation between the original dataset features/variables and the Principal Components via coordinates within a 2D circle: the dimension with the most explained variance is the first Principal Component (PC1) and is plotted on the horizontal axis, whereas the second most explanatory dimension is the second Principal Component (PC2) and placed on the vertical axis; the original features/variables are then projected upon this bi-dimensional factor space.
“The observations are represented by their projections, but the variables are represented by their correlations.”
Abdi and Williams, 2010
The correlation circle makes it easy to visualize said correlations: if two given arrows point in the same direction, their associated features/variables are highly correlated; if they are orthogonal, they are mostly unrelated; and if they point in opposite directions, they are negatively correlated.
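The coordinates behind a correlation circle can be computed directly, which makes the geometric reading above more concrete. The following base-R sketch uses the built-in USArrests dataset as a stand-in; the helper cos_angle() is a hypothetical name introduced only for this illustration, and the drawing step runs only in an interactive session.

```r
# Stand-in dataset shipped with R; standardized PCA
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Each variable is placed at its correlation with PC1 (x) and PC2 (y)
circle_coord <- cor(USArrests, pca$x[, 1:2])

# Arrow length: close to 1 means the variable is well represented on the plane
arrow_length <- sqrt(rowSums(circle_coord^2))

# Hypothetical helper: cosine of the angle between two arrows
cos_angle <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# For well-represented variables the cosine approximates their correlation
cos_angle(circle_coord["Murder", ], circle_coord["Assault", ])
cor(USArrests$Murder, USArrests$Assault)

if (interactive()) {
  # Minimal base-graphics version of the correlation circle
  plot(circle_coord, type = "n", asp = 1,
       xlim = c(-1, 1), ylim = c(-1, 1), xlab = "PC1", ylab = "PC2")
  symbols(0, 0, circles = 1, inches = FALSE, add = TRUE)
  arrows(0, 0, circle_coord[, 1], circle_coord[, 2], length = 0.1)
  text(circle_coord, labels = rownames(circle_coord), pos = 3)
}
```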
Plotting correlation circles within R requires the use of the fviz_pca_var() function from the factoextra package.
# Do not run this code snippet, as it is only here for illustration purposes
library(factoextra)
fviz_pca_var(X,
axes = c(1, 2),
geom = c("arrow", "text"),
geom.var = geom,
repel = FALSE,
col.var = "black",
fill.var = "white",
alpha.var = 1,
col.quanti.sup = "blue",
col.circle = "grey70",
select.var = list(name = NULL, cos2 = NULL, contrib = NULL),
gradient.cols = NULL,
...)
geom: c(“point”, “arrow”, “text”).
geom.var: same as geom but for variables.
repel: TRUE or FALSE determines whether to use ggrepel to avoid overplotting text labels or not.
More information regarding the fviz_pca_var() function and all of its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/factoextra/versions/1.0.7/topics/fviz_pca
Let’s observe the results of applying the function at hand to the previously constructed PCA objects:
fviz_pca_var(all_pca_1,
repel = TRUE
)
fviz_pca_var(all_pca_2,
repel = TRUE
)
fviz_pca_var(all_pca_3,
repel = TRUE
)
fviz_pca_var(all_pca_4,
repel = TRUE
)
fviz_pca_var(all_pca_5,
repel = TRUE
)
Note that, despite some of them being rotated or mirrored (the sign of each Principal Component is arbitrary), the resulting plots are equivalent, as they should be.
The function’s arguments allow the colors of the correlation circle to be based upon results for variables obtained through the get_pca_var() function, such as the contribution and quality of representation - the following code snippets showcase said examples (using fviz_pca_var() upon PCA()’s resulting object).
fviz_pca_var(all_pca_3,
col.var = "contrib",
repel = TRUE,
gradient.cols = c("#FF0000", "#00FF00", "#0000FF")
# The higher the contrib values, the closer to the last color (blue)
)
fviz_pca_var(all_pca_3,
col.var = "cos2",
repel = TRUE,
gradient.cols = c("#FF0000", "#00FF00", "#0000FF")
# The higher the cos2 values, the closer to the last color (blue)
)
It’s also possible to change the color of variables by groups defined by a qualitative/categorical variable, commonly known as a factor (and thus, factor-based coloring) - in a correlation circle, this can help to illustrate which groups of variables are highly correlated and which are not.
There are multiple approaches to creating appropriate clusters; this document and the following code snippets showcase the k-means clustering algorithm, which aims to partition the points (in this case, the variables) into “k” groups (hence the name) so that the sum of squares from the points to the assigned cluster centers is minimized. The function that performs this clustering algorithm is called kmeans() and is a built-in R function (from R’s built-in stats package, like the prcomp() and princomp() functions previously detailed).
# Do not run this code snippet, as it is only here for illustration purposes
kmeans(x,
centers,
iter.max = 10,
nstart = 1,
algorithm = c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen"),
trace = FALSE)
nstart: if centers is a number (meaning that it is not a set of cluster centers), then this argument determines the number of random sets chosen.
algorithm: “Hartigan-Wong”, “Lloyd”, “Forgy” or “MacQueen”.
trace (only used in the default method, “Hartigan-Wong”): either a logical or an integer number - if positive (or true), tracing information on the progress of the algorithm is produced. Higher values may produce more tracing information.
More information about this function, its behavior and its arguments can be found in its associated RDocumentation page: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/kmeans
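As a quick illustration of the arguments above, the following base-R sketch runs kmeans() on the standardized numeric columns of the built-in iris dataset (a stand-in for the PCA coordinates used later). The seed value and the choice of three clusters are arbitrary assumptions made only for this example.

```r
# Fix the random seed so that the random starting centers are reproducible
set.seed(123)

# Stand-in data: the four numeric iris measurements, standardized
x <- scale(iris[, 1:4])

# 25 random starts; kmeans() keeps the solution with the lowest within-cluster
# sum of squares
fit <- kmeans(x, centers = 3, nstart = 25)

# Cluster membership as a factor, ready to be used for factor-based coloring
grp <- as.factor(fit$cluster)
table(grp)

# The total sum of squares splits into a within-cluster and a between-cluster part
all.equal(fit$totss, fit$tot.withinss + fit$betweenss)
```

Using several random starts (nstart greater than one) makes the result less dependent on an unlucky initialization, which is why the snippets in this document set nstart = 25.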
The following code snippet showcases this function’s use in order to create “k” different clusters (the different tabs are meant to illustrate different values of “k”), which are later used within fviz_pca_var() to create a correlation circle with factor-based coloring. Note that the main argument of kmeans() uses the coordinates from the results for variables previously obtained through get_pca_var() (also note that the object and variables at play are the ones obtained with the PCA() function).
# Cluster Creation
res.all <- kmeans(all_pca_var_3$coord, centers = 6, nstart = 25)
grp <- as.factor(res.all$cluster)
# Correlation Circle
fviz_pca_var(all_pca_3,
col.var = grp,
repel = TRUE,
palette = "jco",
legend.title = "Clusters")
# Cluster Creation
res.all <- kmeans(all_pca_var_3$coord, centers = 3, nstart = 25)
grp <- as.factor(res.all$cluster)
# Correlation Circle
fviz_pca_var(all_pca_3,
col.var = grp,
repel = TRUE,
palette = "jco",
legend.title = "Clusters")
# Cluster Creation
res.all <- kmeans(all_pca_var_3$coord, centers = 2, nstart = 25)
grp <- as.factor(res.all$cluster)
# Correlation Circle
fviz_pca_var(all_pca_3,
col.var = grp,
repel = TRUE,
palette = "jco",
legend.title = "Clusters")
Whereas the correlation circle showcases the correlation between the original dataset features/variables and the Principal Components through a projection upon a bi-dimensional factor space defined by the two most relevant PCs, the plot of individuals applies the same approach to the individuals themselves. As such, this plot is also useful to easily visualize how similar the individuals are based on their position within the graph: proximity means that the individuals have similar profiles whereas distant individuals are mostly unrelated (data-wise).
Plotting individuals within R requires the use of the fviz_pca_ind() function from the factoextra package.
# Do not run this code snippet, as it is only here for illustration purposes
library(factoextra)
fviz_pca_ind(X,
axes = c(1, 2),
geom = c("point", "text"),
geom.ind = geom,
repel = FALSE,
habillage = "none",
palette = NULL,
addEllipses = FALSE,
col.ind = "black",
fill.ind = "white",
col.ind.sup = "blue",
alpha.ind = 1,
select.ind = list(name = NULL, cos2 = NULL, contrib = NULL),
...)
geom: c(“point”, “arrow”, “text”).
geom.ind: same as geom but for the individuals.
repel: TRUE or FALSE determines whether to use ggrepel to avoid overplotting text labels or not.
addEllipses: TRUE or FALSE determines whether to draw ellipses around the individuals or not (only when habillage is not “none”).
Note how the arguments are almost identical to those of the fviz_pca_var() function, albeit not exactly equal. More information regarding the fviz_pca_ind() function and all of its arguments is available in its associated RDocumentation page (which is the same as fviz_pca_var()‘s, holding both functions’ information): https://www.rdocumentation.org/packages/factoextra/versions/1.0.7/topics/fviz_pca
Let’s observe the results of applying the function at hand to the previously constructed PCA objects:
fviz_pca_ind(all_pca_1,
repel = TRUE
)
fviz_pca_ind(all_pca_2,
repel = TRUE
)
fviz_pca_ind(all_pca_3,
repel = TRUE
)
fviz_pca_ind(all_pca_4,
repel = TRUE
)
fviz_pca_ind(all_pca_5,
repel = TRUE
)
Once again, the resulting plots are identical - as should be.
The function’s arguments allow the colors of the plot of individuals to be based upon the results for individuals obtained through the get_pca_ind() function, such as the contribution and quality of representation - the following code snippets showcase said examples (using fviz_pca_ind() upon PCA()’s resulting object).
fviz_pca_ind(all_pca_3,
col.ind = "contrib",
repel = TRUE,
gradient.cols = c("#FF0000", "#00FF00", "#0000FF"),
# The higher the contrib values, the closer to the last color (blue)
label = "none" # hide individual labels - no sensible information
)
fviz_pca_ind(all_pca_3,
col.ind = "cos2",
repel = TRUE,
gradient.cols = c("#FF0000", "#00FF00", "#0000FF"),
# The higher the cos2 values, the closer to the last color (blue)
label = "none" # hide individual labels - no sensible information
)
It’s also possible to change the color of individuals by groups defined by a qualitative/categorical variable, commonly known as a factor (and thus, factor-based coloring) - in a plot of individuals, this can help to illustrate which individuals are similar and which are not.
As was the case with the variables, the clusters that define the factors are determined through the use of the kmeans() function. The following code snippet showcases this function’s use in order to create “k” different clusters (the different tabs are meant to illustrate different values of “k”), which are later used within fviz_pca_ind() to create a plot of individuals with factor-based coloring. Note that the main argument of kmeans() uses the coordinates from the results for individuals previously obtained through get_pca_ind() (also note that the object and individuals at play are the ones obtained with the PCA() function).
# Cluster Creation
res.all <- kmeans(all_pca_ind_3$coord, centers = 6, nstart = 25)
grp <- as.factor(res.all$cluster)
# Plot of Individuals
fviz_pca_ind(all_pca_3,
col.ind = grp,
repel = TRUE,
palette = "jco",
legend.title = "Clusters",
label = "none" # hide individual labels - no sensible information
)
# Cluster Creation
res.all <- kmeans(all_pca_ind_3$coord, centers = 3, nstart = 25)
grp <- as.factor(res.all$cluster)
# Plot of Individuals
fviz_pca_ind(all_pca_3,
col.ind = grp,
repel = TRUE,
palette = "jco",
legend.title = "Clusters",
label = "none" # hide individual labels - no sensible information
)
# Cluster Creation
res.all <- kmeans(all_pca_ind_3$coord, centers = 2, nstart = 25)
grp <- as.factor(res.all$cluster)
# Plot of Individuals
fviz_pca_ind(all_pca_3,
col.ind = grp,
repel = TRUE,
palette = "jco",
legend.title = "Clusters",
label = "none" # hide individual labels - no sensible information
)
Adding ellipses helps to visualize the clusters - the argument addEllipses controls that with a boolean, as previously stated and as shown in the following code snippets.
# Cluster Creation
res.all <- kmeans(all_pca_ind_3$coord, centers = 6, nstart = 25)
grp <- as.factor(res.all$cluster)
# Plot of Individuals
fviz_pca_ind(all_pca_3,
col.ind = grp,
repel = TRUE,
palette = "jco",
addEllipses = TRUE,
legend.title = "Clusters",
label = "none" # hide individual labels - no sensible information
)
# Cluster Creation
res.all <- kmeans(all_pca_ind_3$coord, centers = 3, nstart = 25)
grp <- as.factor(res.all$cluster)
# Plot of Individuals
fviz_pca_ind(all_pca_3,
col.ind = grp,
repel = TRUE,
palette = "jco",
addEllipses = TRUE,
legend.title = "Clusters",
label = "none" # hide individual labels - no sensible information
)
# Cluster Creation
res.all <- kmeans(all_pca_ind_3$coord, centers = 2, nstart = 25)
grp <- as.factor(res.all$cluster)
# Plot of Individuals
fviz_pca_ind(all_pca_3,
col.ind = grp,
repel = TRUE,
palette = "jco",
addEllipses = TRUE,
legend.title = "Clusters",
label = "none" # hide individual labels - no sensible information
)
Like the correlation circle and the plot of individuals, the biplot is a graphing method which approximates the multi-dimensional dataset (along with its data points) by a bi-dimensional space defined by the two most relevant Principal Components; in fact, a biplot is a combination of both of those plots within a single graph and, as such, it relies on functions from the factoextra package: fviz_pca_biplot() and fviz_pca(), whose behavior is identical (one is but an alias of the other).
# Do not run this code snippet, as it is only here for illustration purposes
library(factoextra)
fviz_pca(X, ...)
fviz_pca_biplot(X,
axes = c(1, 2),
geom = c("point", "text"),
geom.ind = geom,
geom.var = c("arrow", "text"),
col.ind = "black",
fill.ind = "white",
col.var = "steelblue",
fill.var = "white",
gradient.cols = NULL,
label = "all",
invisible = "none",
repel = FALSE,
habillage = "none",
palette = NULL,
addEllipses = FALSE,
title = "PCA - Biplot",
...)
It can be observed that its arguments are a combination of those available for the fviz_pca_var() and fviz_pca_ind() functions - there are arguments to tweak either, which makes sense given that the biplot itself is a combination of the correlation circle (for variables) and the plot of individuals. As such, it is also possible to create a biplot based upon the results obtained through get_pca_var() and get_pca_ind(). The following code snippets showcase that: the first tab is the most basic biplot (a colorless one) whereas the second and third tab illustrate a biplot based upon contributions and quality of representation (cos2) respectively (using fviz_pca_biplot() upon PCA()’s resulting object).
fviz_pca_biplot(all_pca_3,
col.ind = wbcd$diagnosis,
col="black",
palette = "jco",
geom = "point",
repel=TRUE,
legend.title="Diagnosis",
addEllipses = TRUE)
fviz_pca_biplot(all_pca_3,
col.ind = wbcd$diagnosis,
col="black",
palette = "jco",
geom = "point",
repel=TRUE,
legend.title="Diagnosis",
addEllipses = TRUE)
fviz_pca_biplot(all_pca_3,
col.ind = wbcd$diagnosis,
col="black",
palette = "jco",
geom = "point",
repel=TRUE,
legend.title="Diagnosis",
addEllipses = TRUE)
It is worth noting that the RDocumentation page documenting both fviz_pca_var() and fviz_pca_ind() also details the biplot functions at hand, so any additional information regarding these functions, their behavior and their arguments is available at https://www.rdocumentation.org/packages/factoextra/versions/1.0.7/topics/fviz_pca
Let’s observe the results of applying the function fviz_pca_biplot() to the previously constructed PCA objects.
Machine learning is a branch/subset of artificial intelligence (AI) and an important component of the growing field of data science. It mimics the way humans learn by using algorithms that improve their accuracy through training: a process that loops through the sampled data, contrasting guesses with real values to evaluate the algorithm’s accuracy, so that it can develop a statistical model which maximizes said accuracy and best fits the supplied data. These models can be used in classification and regression scenarios (to predict classes/factors and continuous values, respectively), as they uncover key insights and relationships within the data that are hidden to the human eye and could take years to grasp.
This chapter aims to apply a variety of machine learning approaches to the dataset at hand so that the algorithms can determine whether any given patient’s cancer is benign or malignant. The goals of this classification exercise include the exploration of the machine learning approaches, their application within R and a comparison of their accuracy in order to choose the one that best fits this scenario.
Machine learning algorithms are often categorized as either supervised learning or unsupervised learning. The former, as the name suggests, requires a supervisor/user that feeds the algorithm with well labeled data so that the algorithm can learn/train, whereas the latter (unsupervised) is fed information that is neither classified nor labeled, allowing the algorithm to act without guidance and to group such information in clusters according to similarities, patterns and differences found in the data. Note that due to the nature of this clustering process, unsupervised machine learning approaches are rarely seen outside of classification scenarios, but since this exercise is of such kind it works well with both supervised and unsupervised algorithms.
The very first step is to divide the dataset at hand into two subsets: a training one, which will be used to train/educate the algorithm, and a testing one, which will be used to evaluate the algorithm’s effectiveness. This subdivision can be achieved using the R built-in functions nrow() and sample() to randomly select row indexes from within the dataset and thus create both the training and the testing sets pseudo-randomly.
train_index <- sample(1:nrow(wbcd), 0.7*nrow(wbcd))
train_set <- wbcd[train_index,]
test_set <- wbcd[-train_index,]
dim(wbcd) # 569 observations in the full dataset
## [1] 569 31
dim(train_set) # 398/569 observations (70%) in the training set
## [1] 398 31
dim(test_set) # 171/569 observations (30%) in the testing set
## [1] 171 31
The previous code snippet showcases the construction of the training and testing sets, although built-in R functions are not the usual/preferred approach. The caret package (short for Classification And Regression Training) contains functions to streamline many machine learning tasks and, as such, it is undoubtedly one of the most popular libraries for the matter. Among its many functions lies createDataPartition(), which divides the working dataset into the training and testing subsets while keeping the class ratio constant within each set - that means that if the dataset is to be divided into a training set that holds 70% of the data and a testing set that holds the remaining 30%, then each will have the same class distribution (from which the algorithm will learn to classify) as the original dataset, avoiding certain unfavorable scenarios where there might not be enough items of a given class within the training set to allow the algorithm to develop a fitting model.
# Do not run this code snippet, as it is only here for illustration purposes
library(caret)
createDataPartition(
y,
times = 1,
p = 0.5,
list = TRUE,
groups = min(5, length(y))
)
groups: if y (the vector of outcomes) is numerical, then this argument defines the number of breaks in the quantiles.
Note that, as opposed to the previous approach, the main argument of the function does not ask for the dataset and uses a vector of outcomes instead (in the case of this exercise said vector of outcomes is the diagnosis array). More information about the function, its behavior and its arguments can be found in its associated RDocumentation page: https://www.rdocumentation.org/packages/caret/versions/6.0-90/topics/createDataPartition
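To make the idea of keeping the class ratio constant more concrete, here is a rough base R sketch of a stratified split. It is not how createDataPartition() is implemented; the helper name stratified_index() is hypothetical, and the toy outcome vector simply assumes the usual class counts of this dataset (357 benign and 212 malignant cases).

```r
# Hypothetical helper: sample a fraction p of the row indexes within each class
stratified_index <- function(y, p = 0.7) {
  idx_by_class <- split(seq_along(y), y)
  picked <- lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), size = round(p * length(idx)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(123)  # makes the pseudo-random split reproducible
# Toy outcome vector (assumed class counts: 357 benign, 212 malignant)
y <- factor(rep(c("Benign", "Malignant"), times = c(357, 212)))
train_index <- stratified_index(y, p = 0.7)

prop.table(table(y))               # class ratio in the full data
prop.table(table(y[train_index]))  # nearly the same ratio in the training subset
```

Because the sampling happens inside each class, the ratio is preserved up to rounding, which a plain sample() over all rows does not guarantee.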
The following code snippet showcases the subset construction using the createDataPartition() function. Note that the balance is 70% for the training set and 30% for the testing one, which is a commonly used ratio for the train-test split (and in fact such was the ratio previously used with the R built-in functions).
library(caret)
train_index <- createDataPartition(wbcd$diagnosis, times = 1, p = 0.7, list = FALSE)
train_set <- wbcd[train_index,]
test_set <- wbcd[-train_index,]
dim(wbcd) # 569 observations in the full dataset
## [1] 569 31
dim(train_set) # 399/569 observations (roughly 70%) in the training set
## [1] 399 31
dim(test_set) # 170/569 observations (roughly 30%) in the testing set
## [1] 170 31
Once the training and testing sets are constructed, the next step is to apply the machine learning algorithm of choice. Many of these algorithms are covered within this chapter, and every single one of them is worthy of a document of its own detailing the intricacies of its behavior and inner workings - such a task goes beyond the scope of this project, although there will be an overview briefly describing each of these machine learning approaches.
Linear regression can be considered a machine learning algorithm despite its simplicity and rigidity (which makes it unreliable in most cases). It works rather well under certain scenarios and can be used to obtain a quick reference given the algorithm’s speed (processing-wise it is less demanding than most other machine learning approaches).
The R built-in stats package already bundles a set of functions to build linear regression models, namely lm() and glm(). Let’s detail the former, which is the one related to linear regression (the latter corresponds to logistic regression, which will be detailed later on):
# Do not run this code snippet, as it is only here for illustration purposes
library(stats)
lm(formula,
data,
subset,
weights,
na.action,
method = "qr",
model = TRUE,
x = FALSE,
y = FALSE,
qr = TRUE,
singular.ok = TRUE,
contrasts = NULL,
offset,
...
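)
formula: an object of class formula (or one that can be coerced to that class).
na.action: a function that indicates what should happen when the data contain NA.
method: the method to be used; for fitting, only method = “qr” is supported.
model, x, y, qr: TRUE or FALSE determines whether to return the corresponding components of the fit (the model frame, the model matrix, the response and the QR decomposition, respectively).
singular.ok: if FALSE a singular fit yields an error.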
More information regarding the lm() function and all of its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/lm
After feeding the function with the proper information, a model is built:
library(stats)
model_linear <- lm(diagnosis ~ ., data=train_set)
Once the model is built, the function predict() is used to apply the constructed model to the testing subset’s features in order to classify the set’s items. Note that the predictions are numeric and need to be converted into a factor, for which one can use the factor() function along with a cutoff value (in this case the mean predicted value is used) to tag the guesses.
The function confusionMatrix() from the caret package can be used to visualize the accuracy/success of the algorithm. Confusion matrices are widely used in the data science and machine learning fields since they showcase said accuracy/success through various metrics in an easy-to-understand table. The following code snippet illustrates the use of these functions (predict() and confusionMatrix()), both of which are core functions within the data science and machine learning fields.
library(caret)
prediction_linear <- predict(model_linear, test_set)
prediction_linear <- factor(ifelse(prediction_linear > mean(prediction_linear), "Malignant", "Benign"))
cm_linear <- confusionMatrix(prediction_linear, test_set$diagnosis)
cm_linear
## Confusion Matrix and Statistics
##
## Reference
## Prediction Benign Malignant
## Benign 106 5
## Malignant 1 58
##
## Accuracy : 0.9647
## 95% CI : (0.9248, 0.9869)
## No Information Rate : 0.6294
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.9233
##
## Mcnemar's Test P-Value : 0.2207
##
## Sensitivity : 0.9907
## Specificity : 0.9206
## Pos Pred Value : 0.9550
## Neg Pred Value : 0.9831
## Prevalence : 0.6294
## Detection Rate : 0.6235
## Detection Prevalence : 0.6529
## Balanced Accuracy : 0.9556
##
## 'Positive' Class : Benign
##
There are numerous metrics to measure a machine learning algorithm’s accuracy/success, but it is practical/useful to have a one number summary. The overall accuracy is somewhat misleading since it does not account for the sensitivity and specificity of the model, so alternatives such as the balanced accuracy (which is the average of specificity and sensitivity) or the F1 score (the harmonic average of precision and recall) are preferred.
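As a worked example, both summaries can be recomputed by hand from the four counts of the linear model’s confusion matrix shown above, taking Benign as the positive class (as caret does here). The variable names are only illustrative.

```r
# Counts read off the confusion matrix above ('Positive' class: Benign)
TP <- 106  # predicted Benign, actually Benign
FP <- 5    # predicted Benign, actually Malignant
FN <- 1    # predicted Malignant, actually Benign
TN <- 58   # predicted Malignant, actually Malignant

sensitivity <- TP / (TP + FN)   # also known as recall
specificity <- TN / (TN + FP)
precision   <- TP / (TP + FP)   # also known as positive predictive value

# Balanced accuracy: plain average of sensitivity and specificity
balanced_accuracy <- (sensitivity + specificity) / 2
# F1 score: harmonic average of precision and recall
F1 <- 2 * precision * sensitivity / (precision + sensitivity)

round(c(balanced_accuracy = balanced_accuracy, F1 = F1), 7)
```

These hand-computed values agree with the Balanced Accuracy line reported in the matrix itself.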
Accessing these values from the confusion matrix can be achieved using $byClass as is showcased in the following code snippet:
acc_linear <- cm_linear$byClass['Balanced Accuracy']
F1_linear <- cm_linear$byClass['F1']
print(c(acc_linear, F1_linear))
## Balanced Accuracy F1
## 0.9556446 0.9724771
A slightly more advanced version of the linear model just described is found within logistic regression. Its fit follows a logistic curve pattern allowing it to better represent non-linear data distributions.
As was the case with the lm() function, the function glm() comes from the built-in stats package, meaning that no additional libraries need to be imported.
# Do not run this code snippet, as it is only here for illustration purposes
library(stats)
glm(formula,
data,
family = gaussian,
weights,
subset,
na.action,
start = NULL,
etastart,
mustart,
offset,
control = list(...),
method = "glm.fit",
model = TRUE,
x = FALSE,
y = TRUE,
singular.ok = TRUE,
contrasts = NULL,
...
)
formula: an object of class formula (or one that can be coerced to that class).
na.action: a function that indicates what should happen when the data contain NA.
method: the default “glm.fit” uses iteratively reweighted least squares (IWLS) whereas the alternative “model.frame” returns the model frame and does no fitting.
singular.ok: if FALSE a singular fit yields an error.
More information regarding the glm() function and all of its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/glm
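One detail worth sketching about binomial models: predict() returns log-odds by default (type = "link"), whereas type = "response" returns probabilities. The snippet below uses the built-in mtcars data as a stand-in (an assumption for illustration, not the dataset of this document) to show that the two scales are related through the logistic function, so that a 0.5 probability cutoff is equivalent to a cutoff of 0 on the log-odds scale.

```r
# Illustrative only: predict a binary outcome (am: 0 = automatic, 1 = manual) from weight
fit <- glm(am ~ wt, data = mtcars, family = binomial)

log_odds <- predict(fit, mtcars)                     # default type = "link"
prob     <- predict(fit, mtcars, type = "response")  # probabilities between 0 and 1

# The logistic function (plogis) maps log-odds to probabilities
all.equal(unname(plogis(log_odds)), unname(prob))

# Thresholding the probability at 0.5 equals thresholding the log-odds at 0
predicted <- factor(ifelse(prob > 0.5, "manual", "automatic"))
table(predicted, actual = mtcars$am)
```

A fixed 0.5 cutoff is one common choice; other cutoffs (such as the mean of the predictions) trade sensitivity against specificity differently.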
The following code snippet showcases the construction of the model, its associated prediction and confusion matrix and the most relevant one number summary metrics:
library(stats)
library(caret)
model_logistic <- glm(diagnosis ~ ., data=train_set, family = "binomial")
prediction_logistic <- predict(model_logistic, test_set)
prediction_logistic <- factor(ifelse(prediction_logistic > mean(prediction_logistic), "Malignant", "Benign"))
cm_logistic <- confusionMatrix(prediction_logistic, test_set$diagnosis)
cm_logistic
## Confusion Matrix and Statistics
##
## Reference
## Prediction Benign Malignant
## Benign 104 7
## Malignant 3 56
##
## Accuracy : 0.9412
## 95% CI : (0.8945, 0.9714)
## No Information Rate : 0.6294
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.8722
##
## Mcnemar's Test P-Value : 0.3428
##
## Sensitivity : 0.9720
## Specificity : 0.8889
## Pos Pred Value : 0.9369
## Neg Pred Value : 0.9492
## Prevalence : 0.6294
## Detection Rate : 0.6118
## Detection Prevalence : 0.6529
## Balanced Accuracy : 0.9304
##
## 'Positive' Class : Benign
##
acc_logistic <- cm_logistic$byClass['Balanced Accuracy']
F1_logistic <- cm_logistic$byClass['F1']
print(c(acc_logistic, F1_logistic))
## Balanced Accuracy F1
## 0.9304258 0.9541284
The C50 package contains an interface to the C5.0 classification model. More information regarding this package, its functions and their inner workings can be found in https://cran.r-project.org/web/packages/C50/vignettes/C5.0.html
The most important note about the package is that it uses decision trees at its core. Decision trees can be used to visually and explicitly represent decisions and decision making through, as the name implies, a tree-like model. Visually speaking, these trees are drawn upside down with the root at the top, branching downwards through conditionals - the end of a branch is known as a “leaf” and represents the algorithm’s decision.
Decision trees are used in machine learning for both classification and regression scenarios: classification trees predict the class/factor of an item given a set of features, whereas regression trees behave in the same manner but predict continuous values instead. The C50 package is built around the C5.0() function, which fits a classification tree model to the dataset: training teaches the tree which features to choose and what conditions to use for splitting/branching (by constantly looping through the constructed tree and comparing the obtained hypothetical results with the real ones provided by the training set).
# Do not run this code snippet, as it is only here for illustration purposes
library(C50)
C5.0(
x,
y,
trials = 1,
rules = FALSE,
weights = NULL,
control = C5.0Control(),
costs = NULL,
...
)
y: a factor vector of outcomes. The C5.0() function fits a classification tree, and this argument makes clear that the function cannot fit a regression model since it only accepts factors as input.
More information regarding the C5.0() function and all of its arguments is available in its associated RDocumentation page: https://www.rdocumentation.org/packages/C50/versions/0.1.5/topics/C5.0.default
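To illustrate what learning a split condition means, here is a deliberately tiny base R sketch of a single-split tree (a decision stump) on one numeric feature. It is not the C5.0 algorithm, which relies on entropy-based criteria and grows full trees; the helper name best_split() and the use of the built-in iris data are assumptions made for illustration.

```r
# Hypothetical helper: find the threshold on x that misclassifies the fewest items
best_split <- function(x, y) {
  candidates <- sort(unique(x))
  errors <- sapply(candidates, function(threshold) {
    left  <- y[x <= threshold]
    right <- y[x >  threshold]
    # Each side of the split predicts its majority class; the rest count as errors
    (length(left) - max(table(left), 0)) + (length(right) - max(table(right), 0))
  })
  list(threshold = candidates[which.min(errors)], errors = min(errors))
}

# Two-class toy problem: versicolor vs virginica, split on petal length
two_class <- droplevels(subset(iris, Species != "setosa"))
stump <- best_split(two_class$Petal.Length, two_class$Species)
print(stump)
```

A full tree repeats this search over every feature and then recursively within each resulting branch.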
The following code snippet showcases the construction of the model, its associated prediction and confusion matrix and the most relevant one number summary metrics:
library(C50)
library(caret)
model_C50 <- C5.0(train_set[,-1], train_set$diagnosis)
prediction_C50 <- predict(model_C50, test_set[,-1])
cm_C50 <- confusionMatrix(prediction_C50, test_set$diagnosis)
cm_C50
## Confusion Matrix and Statistics
##
## Reference
## Prediction Benign Malignant
## Benign 104 5
## Malignant 3 58
##
## Accuracy : 0.9529
## 95% CI : (0.9094, 0.9795)
## No Information Rate : 0.6294
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.8985
##
## Mcnemar's Test P-Value : 0.7237
##
## Sensitivity : 0.9720
## Specificity : 0.9206
## Pos Pred Value : 0.9541
## Neg Pred Value : 0.9508
## Prevalence : 0.6294
## Detection Rate : 0.6118
## Detection Prevalence : 0.6412
## Balanced Accuracy : 0.9463
##
## 'Positive' Class : Benign
##
acc_C50 <- cm_C50$byClass['Balanced Accuracy']
F1_C50 <- cm_C50$byClass['F1']
print(c(acc_C50, F1_C50))
## Balanced Accuracy F1
## 0.9462988 0.9629630
Note that the prediction metrics can be improved by changing the arguments passed to the function, namely the trials input (the number of boosting iterations) - the following code snippet showcases a loop within which a table is created to compare the accuracy results obtained with various trials values (between 1 and 50). Note that this process is usually known as tuning, and consists of tweaking the model so that it works better with the data it is being fed and a higher accuracy can be achieved.
acc_C50_array <- NULL
F1_C50_array <- NULL
for(i in 1:50){
model_C50_temp <- C5.0(train_set[,-1], train_set$diagnosis, trials = i)
prediction_C50_temp <- predict(model_C50_temp, test_set[,-1])
cm_C50_temp <- confusionMatrix(prediction_C50_temp, test_set$diagnosis)
acc_C50_array[i] <- cm_C50_temp$byClass['Balanced Accuracy']
F1_C50_array[i] <- cm_C50_temp$byClass['F1']
}
acc_C50_df <- data.frame(trials = seq(1,50), acc = acc_C50_array)
acc_C50_optimal <- subset(acc_C50_df, acc == max(acc))[1,]
F1_C50_df <- data.frame(trials = seq(1,50), F1 = F1_C50_array)
F1_C50_optimal <- subset(F1_C50_df, F1 == max(F1))[1,]
print(c(acc_C50_optimal, F1_C50_optimal)) # The optimal number of trials may differ between the two metrics, as it does here
## $trials
## [1] 39
##
## $acc
## [1] 0.9827177
##
## $trials
## [1] 5
##
## $F1
## [1] 0.9861751
tuning_C50_df <- data.frame(trials = seq(1,50), success = 0.5 * (acc_C50_array + F1_C50_array)) # We average F1 and balanced accuracy to measure success
library(dplyr) # For the mutate() function used to add balanced accuracy and F1 values to the dataframe
tuning_C50 <- subset(tuning_C50_df, success == max(success))[1,] %>% mutate(acc = acc_C50_df[max(trials), 2], F1 = F1_C50_df[max(trials), 2])
print(tuning_C50)
## trials success acc F1
## 39 39 0.9843166 0.9827177 0.9859155
Through tuning, a higher accuracy/success can be achieved. Using a value of 39 for the algorithm’s trials increases the balanced accuracy from 0.9462988 to 0.9827177 and the F1 score from 0.962963 to 0.9859155, which is a considerable improvement upon the previous results. To see how the number of trials affects the accuracy values a graph can be plotted - for illustration purposes, said plot is drawn with two different libraries: highcharter and ggplot2.
sub_C50 <- paste("Optimal number of trials is", tuning_C50$trials, "with an averaged success (balanced accuracy and F1 score) of ", tuning_C50$success)
library(highcharter)
hchart(tuning_C50_df, 'line', hcaes(trials, success)) %>%
hc_title(text = "Averaged success with varying trials (C5.0)") %>%
hc_subtitle(text = sub_C50) %>%
hc_add_theme(hc_theme_google()) %>%
hc_xAxis(title = list(text = "Number of trials")) %>%
hc_yAxis(title = list(text = "Averaged success"))
sub_C50 <- paste("Optimal number of trials is", tuning_C50$trials, "with an averaged success (balanced accuracy and F1 score) of ", tuning_C50$success)
library(ggplot2)
ggplot(tuning_C50_df, aes(trials, success)) +
geom_line() +
geom_point() + theme_minimal() +
labs(title = "Averaged success with varying trials (C5.0)",
subtitle = sub_C50,
x = "Number of trials",
y = "Averaged success")
The rpart package makes it easy to build classification or regression models with a very general structure (meaning that they can be applied in multiple cases/scenarios) using a two-stage procedure that follows a decision tree behavior (already detailed within the C5.0 chapter). More information regarding this package can be found in its associated RDocumentation page: https://www.rdocumentation.org/packages/rpart
The core of the package lies within its main function: rpart().
# Do not run this code snippet, as it is only here for illustration purposes
library(rpart)
rpart(formula,
data,
weights,
subset,
na.action = na.rpart,
method,
model = FALSE,
x = FALSE,
y = TRUE,
parms,
control,
cost,
...)
formula: a formula with a response but no interaction terms (response ~ predictors).
na.action: the default action deletes all observations for which y is missing, but keeps those in which one or more predictors are missing.
method: one of “anova”, “poisson”, “class” or “exp”. If method is missing then the routine tries to make an intelligent guess, which is one of the strengths of this function/package:
If y is a survival object, then method = “exp” is assumed.
If y has 2 columns, then method = “poisson” is assumed.
If y is a factor, then method = “class” is assumed.
Otherwise, method = “anova” is assumed.
model: TRUE or FALSE determines whether to keep a copy of the model frame in the result or not; can also be a model frame, in which case said frame is used rather than constructing new data.
x: TRUE or FALSE determines whether to keep a copy of the x matrix in the result or not.
y: TRUE or FALSE determines whether to keep a copy of the dependent variable in the result or not. If missing and model is supplied this defaults to FALSE.
parms: optional parameters for the splitting function. For classification splitting, the list can contain the vector of prior probabilities (component prior), the loss matrix (component loss) or the splitting index (component split). The splitting index can be gini or information.
control: a list of options that control the details of the algorithm, as built by rpart.control; see https://www.rdocumentation.org/packages/rpart/versions/4.1-15/topics/rpart.control
More information about this function, its behavior and its arguments can be found in its associated RDocumentation page: https://www.rdocumentation.org/packages/rpart/versions/4.1-15/topics/rpart
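The way rpart() guesses its method from the outcome can be checked directly, since the fitted object stores the method it settled on. The sketch below uses the built-in iris data as a stand-in (an assumption for illustration); the maxdepth and cp values are arbitrary examples rather than recommended settings.

```r
library(rpart)

# A factor outcome: rpart is left to guess the method
fit_factor <- rpart(Species ~ ., data = iris)
fit_factor$method    # method chosen for a factor outcome

# A numeric outcome: the guess changes accordingly
fit_numeric <- rpart(Sepal.Length ~ ., data = iris)
fit_numeric$method   # method chosen for a numeric outcome

# The method and the tree's growth can also be set explicitly
fit_shallow <- rpart(Species ~ ., data = iris,
                     method = "class",
                     control = rpart.control(maxdepth = 2, cp = 0.01))
```

Stating method explicitly is a cheap safeguard whenever the outcome column might not have the type one expects.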
Despite the amount of customization this function allows, the following code snippet showcases its use with default settings, along with its prediction, confusion matrix and relevant one number summary metrics:
library(rpart)
library(caret)
model_rpart <- rpart(diagnosis ~ ., data = train_set)
prediction_rpart <- predict(model_rpart, test_set[,-1], type = "class")
cm_rpart <- confusionMatrix(prediction_rpart, test_set$diagnosis)
cm_rpart
## Confusion Matrix and Statistics
##
## Reference
## Prediction Benign Malignant
## Benign 103 4
## Malignant 4 59
##
## Accuracy : 0.9529
## 95% CI : (0.9094, 0.9795)
## No Information Rate : 0.6294
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.8991
##
## Mcnemar's Test P-Value : 1
##
## Sensitivity : 0.9626
## Specificity : 0.9365
## Pos Pred Value : 0.9626
## Neg Pred Value : 0.9365
## Prevalence : 0.6294
## Detection Rate : 0.6059
## Detection Prevalence : 0.6294
## Balanced Accuracy : 0.9496
##
## 'Positive' Class : Benign
##
acc_rpart <- cm_rpart$byClass['Balanced Accuracy']
F1_rpart <- cm_rpart$byClass['F1']
print(c(acc_rpart, F1_rpart))
## Balanced Accuracy F1
## 0.9495624 0.9626168
At the time of writing, the results obtained with this function are somewhat unsatisfactory. Tuning this model could theoretically yield better results, but my tinkering with its arguments has only led to worse accuracy values - here’s an interesting article about the function’s behavior that could help anyone tune the function to their needs and data, but this document needs to move on to the following machine learning approach (time is but the most valuable currency).
Weka is a collection of machine learning algorithms for data mining tasks written in Java, containing tools for data pre-processing, classification, regression, clustering, association rules, and visualization. The RWeka package is an R interface to said collection, bringing the Weka toolset to the R environment.
# Do not run this code snippet, as it is only here for illustration purposes
library("RWeka")
JRip(formula,
data,
subset,
na.action,
control = Weka_control(),
options = NULL)
M5Rules(formula,
data,
subset,
na.action,
control = Weka_control(),
options = NULL)
OneR(formula,
data,
subset,
na.action,
control = Weka_control(),
options = NULL)
PART(formula,
data,
subset,
na.action,
control = Weka_control(),
options = NULL)
na.action: by default, deletes all observations for which y is missing, but keeps those in which one or more predictors are missing.
control: an object of class Weka_control giving options to be passed to the Weka learner (see the associated documentation).
Note that additional information about these functions, their behavior and their arguments can be found in their associated RDocumentation page.
The JRip() function is based upon “RIPPER”, an acronym standing for “Repeated Incremental Pruning to Produce Error Reduction”. It is a rule learner which, as the name suggests, repeatedly grows and prunes classification rules to avoid overfitting and reduce (potential) error. Its foundations are detailed in Cohen’s work (its author/creator).
The M5Rules() function generates a decision list using a “separate-and-conquer” approach where each iteration builds a model tree (using M5) and turns the “best” leaf into a rule. However, M5Rules() is strictly used within regression exercises, so it cannot be applied in classification tasks such as the ones being performed throughout this document.
The OneR() function builds a simple yet effective and useful “one-rule” classifier, also known as Holte’s classifier or Holte’s 1R classifier after its creator/developer (see the original paper or Nevill-Manning et al.’s take on it). While technically based upon “one feature” and not “one rule”, its name comes from the fact that it finds exactly one feature (and one or more values for that feature) to classify data instances, which makes it a fast and simple approach, although it is worth noting that it is not known for its good prediction performance (it is rather recommended for teaching purposes and for lower-bound performance baselines in real-world applications).
The PART() function generates a decision list using the same “separate-and-conquer” approach: each iteration builds a partial C4.5 decision tree and turns the “best” leaf into a rule. Its foundations are detailed in Frank and Witten’s work (its authors/creators).
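The “one feature” idea behind OneR() described above can be sketched in a few lines of base R. This is only a rough illustration: the helper name one_feature_error() is hypothetical, the equal-width binning is a simplification of how Weka discretizes numeric features, and the built-in iris data is used as a stand-in.

```r
# Hypothetical helper: training error of a rule built on a single feature
one_feature_error <- function(x, y, bins = 4) {
  binned <- cut(x, breaks = bins)   # discretize the numeric feature into equal-width bins
  counts <- table(binned, y)
  # Within each bin the rule predicts the majority class; everything else is an error
  sum(counts) - sum(apply(counts, 1, max))
}

features <- iris[, 1:4]
errors <- sapply(features, one_feature_error, y = iris$Species)
errors                     # misclassified training items per feature
names(which.min(errors))   # the single feature the rule would be built on
```

The classifier then consists of nothing more than the chosen feature and the majority class of each of its bins, which explains both its speed and its modest accuracy.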
The following code snippet showcases the previously detailed functions except for M5Rules(), which is used in regression exercises and does not fit classification exercises such as this one. The showcase illustrates the construction of the models, their associated predictions and confusion matrices as well as the most relevant one number summary metrics (which can be used to compare their predictions’ success):
library("RWeka")
model_JRip <- JRip(diagnosis~., data = train_set)
predict_JRip <- predict(model_JRip, test_set[,-1])
cm_JRip <- confusionMatrix(predict_JRip, test_set$diagnosis)
cm_JRip
## Confusion Matrix and Statistics
##
## Reference
## Prediction Benign Malignant
## Benign 101 3
## Malignant 6 60
##
## Accuracy : 0.9471
## 95% CI : (0.9019, 0.9755)
## No Information Rate : 0.6294
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.8876
##
## Mcnemar's Test P-Value : 0.505
##
## Sensitivity : 0.9439
## Specificity : 0.9524
## Pos Pred Value : 0.9712
## Neg Pred Value : 0.9091
## Prevalence : 0.6294
## Detection Rate : 0.5941
## Detection Prevalence : 0.6118
## Balanced Accuracy : 0.9482
##
## 'Positive' Class : Benign
##
model_OneR <- OneR(diagnosis~., data = train_set)
predict_OneR <- predict(model_OneR, test_set[,-1])
cm_OneR <- confusionMatrix(predict_OneR, test_set$diagnosis)
cm_OneR
## Confusion Matrix and Statistics
##
## Reference
## Prediction Benign Malignant
## Benign 97 6
## Malignant 10 57
##
## Accuracy : 0.9059
## 95% CI : (0.8517, 0.9452)
## No Information Rate : 0.6294
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.8008
##
## Mcnemar's Test P-Value : 0.4533
##
## Sensitivity : 0.9065
## Specificity : 0.9048
## Pos Pred Value : 0.9417
## Neg Pred Value : 0.8507
## Prevalence : 0.6294
## Detection Rate : 0.5706
## Detection Prevalence : 0.6059
## Balanced Accuracy : 0.9057
##
## 'Positive' Class : Benign
##
model_PART <- PART(diagnosis~., data = train_set)
predict_PART <- predict(model_PART, test_set[,-1])
cm_PART <- confusionMatrix(predict_PART, test_set$diagnosis)
cm_PART
## Confusion Matrix and Statistics
##
## Reference
## Prediction Benign Malignant
## Benign 102 3
## Malignant 5 60
##
## Accuracy : 0.9529
## 95% CI : (0.9094, 0.9795)
## No Information Rate : 0.6294
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.8998
##
## Mcnemar's Test P-Value : 0.7237
##
## Sensitivity : 0.9533
## Specificity : 0.9524
## Pos Pred Value : 0.9714
## Neg Pred Value : 0.9231
## Prevalence : 0.6294
## Detection Rate : 0.6000
## Detection Prevalence : 0.6176
## Balanced Accuracy : 0.9528
##
## 'Positive' Class : Benign
##
RWeka_df <- data.frame("Balanced Accuracy" = c(cm_JRip$byClass['Balanced Accuracy'],
cm_OneR$byClass['Balanced Accuracy'],
cm_PART$byClass['Balanced Accuracy']),
"F1" = c(cm_JRip$byClass['F1'],
cm_OneR$byClass['F1'],
cm_PART$byClass['F1']),
row.names = c("JRip", "OneR", "PART"))
RWeka_df
## Balanced.Accuracy F1
## JRip 0.9481531 0.9573460
## OneR 0.9056520 0.9238095
## PART 0.9528260 0.9622642
As can be seen, PART yields the best results out of all of these functions with default settings (perhaps tuning each of these functions could change these results).
The Naive Bayes algorithm applies Bayes’ theorem to classification: it estimates the probability of each class given the observed features from prior (training) knowledge and picks the most probable one (the Multinomial Naive Bayes classifier is a popular variant). Josh Starmer’s video on it visually explains and exemplifies this concept (he has published an even more simplified video where the naivety aspect of the algorithm is explained, which briefly speaking is due to the algorithm ignoring potential relationships between features).
There are many libraries which include a Naive Bayes machine learning function but, for the sake of simplicity, this document will focus on one: the naiveBayes() function from the e1071 package.
# Do not run this code snippet, as it is only here for illustration purposes
library(e1071)
naiveBayes(x,
y,
laplace = 0,
...
)
More information about this function, its behavior and its arguments can be found in its associated RDocumentation page: https://www.rdocumentation.org/packages/e1071/versions/1.7-9/topics/naiveBayes
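To show where the “naive” assumption enters the computation, here is a hand-rolled base R sketch of a Gaussian Naive Bayes posterior for a single observation, in the spirit of how numeric predictors are commonly handled. It is not the e1071 implementation; the helper name naive_bayes_posterior() is hypothetical and the built-in iris data is used as a stand-in.

```r
# Hypothetical helper: class probabilities for one new observation
naive_bayes_posterior <- function(train_x, train_y, new_x) {
  scores <- sapply(levels(train_y), function(cl) {
    rows  <- train_x[train_y == cl, , drop = FALSE]
    prior <- nrow(rows) / nrow(train_x)
    # "Naive" step: features are treated as independent given the class,
    # so their (normal) densities are simply multiplied together
    likelihood <- prod(dnorm(unlist(new_x),
                             mean = sapply(rows, mean),
                             sd   = sapply(rows, sd)))
    prior * likelihood
  })
  scores / sum(scores)   # normalize so the class probabilities sum to 1
}

new_flower <- iris[100, 1:4]   # row 100 is a versicolor flower
posterior <- naive_bayes_posterior(iris[, 1:4], iris$Species, new_flower)
round(posterior, 3)
names(which.max(posterior))    # predicted class
```

Multiplying per-feature densities is exactly what ignoring the relationships between features amounts to.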
The following code snippet showcases the construction of the model, its associated prediction and confusion matrix and the most relevant one number summary metrics:
library(e1071)
library(caret)
model_naiveBayes <- naiveBayes(train_set[,-1], train_set$diagnosis)
prediction_naiveBayes <- predict(model_naiveBayes, test_set[,-1], type = "class")
cm_naiveBayes <- confusionMatrix(prediction_naiveBayes, test_set$diagnosis)
cm_naiveBayes
## Confusion Matrix and Statistics
##
## Reference
## Prediction Benign Malignant
## Benign 106 3
## Malignant 1 60
##
## Accuracy : 0.9765
## 95% CI : (0.9409, 0.9936)
## No Information Rate : 0.6294
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.9492
##
## Mcnemar's Test P-Value : 0.6171
##
## Sensitivity : 0.9907
## Specificity : 0.9524
## Pos Pred Value : 0.9725
## Neg Pred Value : 0.9836
## Prevalence : 0.6294
## Detection Rate : 0.6235
## Detection Prevalence : 0.6412
## Balanced Accuracy : 0.9715
##
## 'Positive' Class : Benign
##
acc_naiveBayes <- cm_naiveBayes$byClass['Balanced Accuracy']
F1_naiveBayes <- cm_naiveBayes$byClass['F1']
print(c(acc_naiveBayes, F1_naiveBayes))
## Balanced Accuracy F1
## 0.9715176 0.9814815
Laplace smoothing can be tuned to achieve a better prediction.
library(e1071)
library(caret)
acc_naiveBayes_array <- NULL
F1_naiveBayes_array <- NULL
for(i in 0:50){
model_naiveBayes_temp <- naiveBayes(train_set[,-1], train_set$diagnosis, laplace = i)
prediction_naiveBayes_temp <- predict(model_naiveBayes_temp, test_set[,-1])
cm_naiveBayes_temp <- confusionMatrix(prediction_naiveBayes_temp, test_set$diagnosis)
acc_naiveBayes_array[i+1] <- cm_naiveBayes_temp$byClass['Balanced Accuracy']
F1_naiveBayes_array[i+1] <- cm_naiveBayes_temp$byClass['F1']
}
acc_naiveBayes_df <- data.frame(laplace = seq(0,50), acc = acc_naiveBayes_array)
acc_naiveBayes_optimal <- subset(acc_naiveBayes_df, acc == max(acc))[1,]
F1_naiveBayes_df <- data.frame(laplace = seq(0,50), F1 = F1_naiveBayes_array)
F1_naiveBayes_optimal <- subset(F1_naiveBayes_df, F1 == max(F1))[1,]
print(c(acc_naiveBayes_optimal, F1_naiveBayes_optimal))
## $laplace
## [1] 0
##
## $acc
## [1] 0.9715176
##
## $laplace
## [1] 0
##
## $F1
## [1] 0.9814815
# We average F1 and balanced accuracy to measure success
tuning_naiveBayes_df <- data.frame(laplace = seq(0,50), success = 0.5 * (acc_naiveBayes_array + F1_naiveBayes_array))
tuning_naiveBayes <- subset(tuning_naiveBayes_df, success == max(success))[1,]
print(tuning_naiveBayes)
## laplace success
## 1 0 0.9764995
In this case, Laplace smoothing does not affect the one number summary metrics whatsoever. This is to be expected: in naiveBayes() Laplace smoothing only applies to categorical predictors, and every predictor in this dataset is numeric. To appreciate this, a graph can be plotted - for illustration purposes, said plot is drawn with two different libraries: highcharter and ggplot2.
sub_naiveBayes <- paste("Optimal Laplace value is", tuning_naiveBayes$laplace, "with an averaged success (balanced accuracy and F1 score) of ", tuning_naiveBayes$success)
library(highcharter)
hchart(tuning_naiveBayes_df, 'line', hcaes(laplace, success)) %>%
hc_title(text = "Averaged success with varying Laplace values (Naive Bayes)") %>%
hc_subtitle(text = sub_naiveBayes) %>%
hc_add_theme(hc_theme_google()) %>%
hc_xAxis(title = list(text = "Laplace value")) %>%
hc_yAxis(title = list(text = "Averaged success"))
sub_naiveBayes <- paste("Optimal Laplace value is", tuning_naiveBayes$laplace, "with an averaged success (balanced accuracy and F1 score) of ", tuning_naiveBayes$success)
library(ggplot2)
ggplot(tuning_naiveBayes_df, aes(laplace, success)) +
geom_line() +
geom_point() + theme_minimal() +
labs(title = "Averaged success with varying Laplace values (Naive Bayes)",
subtitle = sub_naiveBayes,
x = "Laplace value",
y = "Averaged success")
Conditional Inference trees, also referred to as unbiased recursive partitioning, are a non-parametric class of decision trees that use statistical theory (selection by permutation-based significance tests) to select variables, instead of selecting the variable that maximizes an information measure (Gini coefficient or Information Gain), thereby removing the potential bias of CART or similar decision trees.
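Its usage within R comes from the party package (whose newer reimplementation, partykit, is the one most documentation refers to) and its associated ctree() function. The signature below is the one documented for partykit; the party version is similar but names its control argument controls.
# Do not run this code snippet, as it is only here for illustration purposes
library(partykit)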
ctree(formula,
data,
subset,
weights,
na.action = na.pass,
offset, cluster,
control = ctree_control(...),
ytrafo = NULL,
converged = NULL,
scores = NULL,
doFit = TRUE,
...
)
doFit: a logical; if FALSE, the tree is not fitted.
More information about this function, its behavior and its arguments can be found in its associated RDocumentation page: https://www.rdocumentation.org/packages/partykit/versions/1.2-15/topics/ctree
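The idea of selecting a split variable through permutation-based significance tests can be sketched in base R as follows. This is a rough illustration only: ctree() relies on a formal conditional inference framework rather than this simplified test, the helper name permutation_p() is hypothetical, and the data are synthetic (one informative feature and two noise features).

```r
set.seed(1)  # permutations are random; the seed makes the sketch reproducible

# Synthetic data (an assumption for illustration): only "signal" depends on the class
n <- 100
y <- factor(rep(c("Benign", "Malignant"), each = n / 2))
toy <- data.frame(
  signal  = rnorm(n, mean = ifelse(y == "Malignant", 3, 0)),
  noise_a = rnorm(n),
  noise_b = rnorm(n)
)

# Hypothetical helper: permutation p-value for the difference in class means
permutation_p <- function(x, y, n_perm = 199) {
  statistic <- function(labels) abs(diff(tapply(x, labels, mean)))
  observed  <- statistic(y)
  permuted  <- replicate(n_perm, statistic(sample(y)))  # shuffle the labels
  (1 + sum(permuted >= observed)) / (n_perm + 1)
}

p_values <- sapply(toy, permutation_p, y = y)
p_values
names(which.min(p_values))   # the variable a significance-based split would favor
```

Because the choice rests on p-values rather than on an impurity measure, a feature is not favored merely for offering many possible split points.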
The following code snippet showcases the construction of the model, its associated prediction and confusion matrix and the most relevant one number summary metrics:
library(party)
library(caret)
model_ctree <- ctree(diagnosis~., data=train_set)
prediction_ctree <- predict(model_ctree, test_set[,-1])
cm_ctree <- confusionMatrix(prediction_ctree, test_set$diagnosis)
cm_ctree
## Confusion Matrix and Statistics
##
## Reference
## Prediction Benign Malignant
## Benign 100 4
## Malignant 7 59
##
## Accuracy : 0.9353
## 95% CI : (0.8872, 0.9673)
## No Information Rate : 0.6294
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.8626
##
## Mcnemar's Test P-Value : 0.5465
##
## Sensitivity : 0.9346
## Specificity : 0.9365
## Pos Pred Value : 0.9615
## Neg Pred Value : 0.8939
## Prevalence : 0.6294
## Detection Rate : 0.5882
## Detection Prevalence : 0.6118
## Balanced Accuracy : 0.9355
##
## 'Positive' Class : Benign
##
acc_ctree <- cm_ctree$byClass['Balanced Accuracy']
F1_ctree <- cm_ctree$byClass['F1']
print(c(acc_ctree, F1_ctree))
## Balanced Accuracy F1
## 0.9355437 0.9478673
The controls argument (named control in partykit) allows users to tweak the function through ctree_control() and, among others, its maxdepth argument. Said argument caps the depth of the tree, that is, the number of successive splits between the root and any terminal node. Since a deeper tree can only add splits that pass the significance test, the metrics should plateau once the tree stops growing.
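As a rough guide to what a given depth allows, and assuming binary splits (which is what ctree() produces), a tree of depth d can hold at most 2^d terminal nodes and 2^d - 1 splits. The short sketch below simply tabulates those upper bounds; the variable names are illustrative.

```r
# Upper bounds for a binary tree of a given depth
depth <- 1:5
max_terminal_nodes <- 2^depth
max_splits <- 2^depth - 1

capacity <- data.frame(depth, max_terminal_nodes, max_splits)
print(capacity)
```

These are only ceilings: a conditional inference tree stops splitting as soon as no remaining variable passes the significance test, so large depth values usually leave the fitted tree unchanged.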
The following code snippet loops through values to evaluate how the one number summary metrics change with the tree’s depth.
acc_ctree_array <- NULL
F1_ctree_array <- NULL
for(i in 1:50){
model_ctree_temp <- ctree(diagnosis~., data=train_set, controls=ctree_control(maxdepth=i))
prediction_ctree_temp <- predict(model_ctree_temp, test_set[,-1])
cm_ctree_temp <- confusionMatrix(prediction_ctree_temp, test_set$diagnosis)
acc_ctree_array[i] <- cm_ctree_temp$byClass['Balanced Accuracy']
F1_ctree_array[i] <- cm_ctree_temp$byClass['F1']
}
acc_ctree_df <- data.frame(depth = seq(1,50), acc = acc_ctree_array)
acc_ctree_optimal <- subset(acc_ctree_df, acc == max(acc))[1,]
F1_ctree_df <- data.frame(depth = seq(1,50), F1 = F1_ctree_array)
F1_ctree_optimal <- subset(F1_ctree_df, F1 == max(F1))[1,]
print(c(acc_ctree_optimal, F1_ctree_optimal)) # The optimal depth may differ between the two metrics, as it does in the output below
## $depth
## [1] 2
##
## $acc
## [1] 0.9420709
##
## $depth
## [1] 3
##
## $F1
## [1] 0.9478673
tuning_ctree_df <- data.frame(depth = seq(1,50), success = 0.5 * (acc_ctree_array + F1_ctree_array)) # We average F1 and balanced accuracy to measure success
library(dplyr) # For the mutate() function used to add balanced accuracy and F1 values to the dataframe
tuning_ctree <- subset(tuning_ctree_df, success == max(success))[1,] %>% mutate(acc = acc_ctree_df[max(depth), 2], F1 = F1_ctree_df[max(depth), 2])
print(tuning_ctree)
## depth success acc F1
## 2 2 0.9444654 0.9420709 0.9468599
A graph can be plotted to appreciate the effect of depth on the algorithm’s success. For illustration purposes, said plot is produced with two different libraries: highcharter and ggplot2.
sub_ctree <- paste("Optimal depth value is", tuning_ctree$depth, "with an averaged success (balanced accuracy and F1 score) of ", tuning_ctree$success)
library(highcharter)
hchart(tuning_ctree_df, 'line', hcaes(depth, success)) %>%
hc_title(text = "Averaged success with varying depth (Conditional Inference Trees)") %>%
hc_subtitle(text = sub_ctree) %>%
hc_add_theme(hc_theme_google()) %>%
hc_xAxis(title = list(text = "Depth value")) %>%
hc_yAxis(title = list(text = "Averaged success"))
sub_ctree <- paste("Optimal depth value is", tuning_ctree$depth, "with an averaged success (balanced accuracy and F1 score) of ", tuning_ctree$success)
library(ggplot2)
ggplot(tuning_ctree_df, aes(depth, success)) +
geom_line() +
geom_point() + theme_minimal() +
labs(title = "Averaged success with varying depth (Conditional Inference Trees)",
subtitle = sub_ctree,
x = "Depth value",
y = "Averaged success")
These results show that the most reasonable value for the maxdepth argument is 2, and that any further increase in said value does not yield an improvement.
There are many other possible tweaks to better tune the function’s behavior. However, they fall outside the current scope of this document (it might be updated in the future). For now, refer to the provided links to better understand the function and its arguments.
Decision trees tend to suffer from high variance: if the dataset is split into two halves, fitting a decision tree to each half could yield quite different results. One way to reduce the variance of a single decision tree is to use a random forest model. With the random forest approach a large number of decision trees are created, every observation is fed into every tree, and the most common outcome across trees is used as the final output. Predictions are therefore built upon a majority vote in which each and every tree takes part.
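The majority vote can be sketched in a few lines of base R. The votes below are made up for illustration (five hypothetical trees classifying three observations), and majority_vote() is a helper name introduced here, not a function from the randomForest package.

```r
# Rows are trees, columns are observations (hypothetical votes)
tree_votes <- rbind(
  tree1 = c("Benign",    "Malignant", "Benign"),
  tree2 = c("Benign",    "Benign",    "Malignant"),
  tree3 = c("Malignant", "Malignant", "Malignant"),
  tree4 = c("Benign",    "Malignant", "Malignant"),
  tree5 = c("Benign",    "Malignant", "Benign")
)

# The most frequent class among the trees wins
majority_vote <- function(votes) names(which.max(table(votes)))

forest_prediction <- apply(tree_votes, 2, majority_vote)
print(forest_prediction)
```

An odd number of trees is used so that a two-class vote cannot end in a tie; with ties, which.max() would simply return the first class in alphabetical order.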
The go-to library for random forest machine learning usage is randomForest, with its core function being randomForest().
# Do not run this code snippet, as it is only here for illustration purposes
randomForest(x,
y=NULL,
xtest=NULL,
ytest=NULL,
ntree=500,
na.action=na.fail,
mtry = if (!is.null(y) && !is.factor(y))
max(floor(ncol(x)/3), 1)
else
floor(sqrt(ncol(x))),
replace=TRUE,
classwt=NULL,
cutoff,
strata,
sampsize = if (replace) nrow(x) else ceiling(.632*nrow(x)),
nodesize = if (!is.null(y) && !is.factor(y)) 5 else 1,
maxnodes = NULL,
importance=FALSE,
localImp=FALSE,
nPerm=1,
proximity,
oob.prox=proximity,
norm.votes=TRUE,
do.trace=FALSE,
keep.forest=!is.null(y) && is.null(xtest),
corr.bias=FALSE,
keep.inbag=FALSE,
...
)
na.action: ... y is missing, but keeps those in which one or more predictors are missing.
replace: TRUE or FALSE determines whether to sample the cases with replacement or not.
cutoff: defaults to 1/k where k is the number of classes (i.e., majority vote wins).
importance: TRUE or FALSE determines whether to assess the importance of predictors.
localImp: TRUE or FALSE determines whether to compute casewise importance measures.
proximity: TRUE or FALSE determines whether to measure the proximity between rows.
oob.prox: TRUE or FALSE determines whether to calculate proximity using only out-of-bag data.
norm.votes: if TRUE (default), the final result of votes are expressed as fractions. If FALSE, raw vote counts are returned (useful for combining results from different runs). Ignored for regression exercises.
do.trace: if TRUE, a more verbose output is produced.
keep.forest: if FALSE, the forest will not be retained in the output object.
keep.inbag: TRUE or FALSE determines whether to return a matrix with dimensions n x ntree that keeps track of which samples are “in-bag” in which trees.
As can be observed, random forests (and randomForest() in particular) offer an overwhelming amount of customizability, mainly due to their potential complexity. If interested, refer to the associated RDocumentation page for more information about this function, its behavior and its arguments: https://www.rdocumentation.org/packages/randomForest/versions/4.6-14/topics/randomForest
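Some of the defaults in the signature above are expressions rather than constants. As a small worked example, the sketch below evaluates the default mtry (the number of predictors tried at each split) and nodesize for classification and regression. The figure of 30 predictors is an assumption made for illustration; substitute ncol(train_set) - 1 for the actual data.

```r
n_predictors <- 30  # assumed number of predictor columns

# Classification (y is a factor): square root of the number of predictors
mtry_classification <- floor(sqrt(n_predictors))

# Regression (y is numeric): one third of the predictors, at least 1
mtry_regression <- max(floor(n_predictors / 3), 1)

# Minimum size of terminal nodes under each setting
nodesize_classification <- 1
nodesize_regression <- 5

print(c(mtry_classification = mtry_classification,
        mtry_regression = mtry_regression))
```

Under that assumption each split of a classification forest would consider 5 randomly chosen predictors, which is what decorrelates the trees from one another.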
The following code snippet showcases the construction of the model, its associated prediction and confusion matrix and the most relevant one number summary metrics (arguments set at default):
library(randomForest)
library(caret)
model_randomForest <- randomForest(x = train_set[,-1], y = train_set$diagnosis)
prediction_randomForest <- predict(model_randomForest, test_set[,-1], type = "class")
cm_randomForest <- confusionMatrix(prediction_randomForest, test_set$diagnosis)
cm_randomForest
## Confusion Matrix and Statistics
##
## Reference
## Prediction Benign Malignant
## Benign 106 1
## Malignant 1 62
##
## Accuracy : 0.9882
## 95% CI : (0.9581, 0.9986)
## No Information Rate : 0.6294
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.9748
##
## Mcnemar's Test P-Value : 1
##
## Sensitivity : 0.9907
## Specificity : 0.9841
## Pos Pred Value : 0.9907
## Neg Pred Value : 0.9841
## Prevalence : 0.6294
## Detection Rate : 0.6235
## Detection Prevalence : 0.6294
## Balanced Accuracy : 0.9874
##
## 'Positive' Class : Benign
##
acc_randomForest <- cm_randomForest$byClass['Balanced Accuracy']
F1_randomForest <- cm_randomForest$byClass['F1']
print(c(acc_randomForest, F1_randomForest))
## Balanced Accuracy F1
## 0.9873906 0.9906542
The aforementioned complexity of random forests can make tuning the algorithm overwhelming, and detailing every intricacy goes beyond the scope of this document. However, it is worth noting that predictive performance tends to improve with the number of trees up to a given number, beyond which there are no significant accuracy gains, only an increasing computing time requirement. That number can be fairly low, and selecting the lowest value that still performs acceptably can save precious processing time.
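The plateau can be illustrated with an idealized calculation. Assuming, for the sake of the sketch, that every tree is right with the same probability p and that trees err independently (real trees in a forest are correlated, so actual gains are smaller), the probability that the majority vote is correct follows from the binomial distribution. The function majority_accuracy() is a hypothetical helper written for this example.

```r
# Probability that more than half of n_trees independent voters are right
majority_accuracy <- function(n_trees, p) {
  1 - pbinom(floor(n_trees / 2), size = n_trees, prob = p)
}

n_trees <- c(1, 5, 15, 31, 101)  # odd sizes avoid ties
acc <- majority_accuracy(n_trees, p = 0.7)

print(data.frame(n_trees, accuracy = round(acc, 4)))
```

Most of the improvement happens within the first few dozen trees, after which adding more trees mostly adds computing time.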
The following code snippet searches for the number of trees beyond which the metrics stop improving:
library(randomForest)
library(caret)
acc_randomForest_array <- NULL
F1_randomForest_array <- NULL
prevalence_randomForest_array <- NULL
for(i in 0:50){
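# randomForest() is the function being tuned; the output printed further below came from a run that called naiveBayes() on this line, which ignores ntree
model_randomForest_temp <- randomForest(x = train_set[,-1], y = train_set$diagnosis, ntree = i*15 + 1, importance = TRUE, proximity = TRUE)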
prediction_randomForest_temp <- predict(model_randomForest_temp, test_set[,-1])
cm_randomForest_temp <- confusionMatrix(prediction_randomForest_temp, test_set$diagnosis)
acc_randomForest_array[i+1] <- cm_randomForest_temp$byClass['Balanced Accuracy']
F1_randomForest_array[i+1] <- cm_randomForest_temp$byClass['F1']
prevalence_randomForest_array[i+1] <- cm_randomForest_temp$byClass['Prevalence']
}
acc_randomForest_df <- data.frame(ntree = seq(1, length.out = 51, by = 15), acc = acc_randomForest_array)
acc_randomForest_optimal <- subset(acc_randomForest_df, acc == max(acc))[1,]
F1_randomForest_df <- data.frame(ntree = seq(1, length.out = 51, by = 15), F1 = F1_randomForest_array)
F1_randomForest_optimal <- subset(F1_randomForest_df, F1 == max(F1))[1,]
prevalence_randomForest_df <- data.frame(ntree = seq(1, length.out = 51, by = 15), prevalence = prevalence_randomForest_array)
prevalence_randomForest_optimal <- subset(prevalence_randomForest_df, prevalence == max(prevalence))[1,]
print(c(acc_randomForest_optimal, F1_randomForest_optimal, prevalence_randomForest_optimal))
## $ntree
## [1] 1
##
## $acc
## [1] 0.9715176
##
## $ntree
## [1] 1
##
## $F1
## [1] 0.9814815
##
## $ntree
## [1] 1
##
## $prevalence
## [1] 0.6294118
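# We average F1, balanced accuracy and prevalence to measure success (0.33 approximates 1/3)
# Prevalence is the share of the positive class in the reference labels, so it is the same for every model
tuning_randomForest_df <- data.frame(ntree = seq(1, length.out = 51, by = 15), success = 0.33 * (acc_randomForest_array + F1_randomForest_array + prevalence_randomForest_array))
library(dplyr) # For the mutate() function used to add balanced accuracy, F1 and prevalence values to the dataframe
tuning_randomForest <- subset(tuning_randomForest_df, success == max(success))[1,] %>%
mutate(acc = acc_randomForest_df$acc[match(ntree, acc_randomForest_df$ntree)], F1 = F1_randomForest_df$F1[match(ntree, F1_randomForest_df$ntree)], prevalence = prevalence_randomForest_df$prevalence[match(ntree, prevalence_randomForest_df$ntree)]) # Rows are looked up by their ntree value, not by position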
print(tuning_randomForest)
## ntree success acc F1 prevalence
## 1 1 0.8521956 0.9715176 0.9814815 0.6294118
K-Nearest Neighbors (KNN) is built around Euclidean distances. For any point (x1, x2) for which an estimate p(x1, x2) is wanted, the algorithm looks for the K nearest points to (x1, x2) and computes the average of the 0s and 1s associated with these points. The set of points used to compute the average is referred to as the neighborhood. Larger values of K result in smoother estimates, while smaller values of K result in more flexible and wiggly estimates.
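The mechanics described above fit in a few lines of base R. The following is a minimal sketch on a toy dataset, not the implementation used by caret or class; knn_predict() and the six training points are hypothetical and exist only for this illustration.

```r
# Classify one new point by majority vote among its k nearest neighbors
knn_predict <- function(train_x, train_y, new_x, k = 5) {
  # Euclidean distance from new_x to every training row
  distances <- sqrt(rowSums(sweep(train_x, 2, new_x)^2))
  # Indices of the k closest training points (the neighborhood)
  neighborhood <- order(distances)[seq_len(k)]
  votes <- table(train_y[neighborhood])
  names(votes)[which.max(votes)]
}

# Toy training data: two well separated groups
train_x <- matrix(c(1, 1,  1, 2,  2, 1,
                    6, 6,  6, 7,  7, 6), ncol = 2, byrow = TRUE)
train_y <- factor(c("Benign", "Benign", "Benign",
                    "Malignant", "Malignant", "Malignant"))

pred_low  <- knn_predict(train_x, train_y, c(1.5, 1.5), k = 3)
pred_high <- knn_predict(train_x, train_y, c(6.5, 6.5), k = 3)
print(c(pred_low, pred_high))
```

Replacing the vote with mean(train_y[neighborhood] == "Malignant") would return the averaged 0s and 1s mentioned in the text, that is, an estimated probability instead of a class.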
Two packages are used here to run KNN in R:
The caret package, which includes the knn3() function.
The class package, which includes the knn() function.
Let’s detail the former:
# Do not run this code snippet, as it is only here for illustration purposes
library(caret)
# For formula
knn3(formula,
data,
k = 5,
subset,
na.action,
...
)
# For dataframes and matrices
knn3(x,
y,
k = 5,
...
)
formula: a formula of the form lhs ~ rhs where lhs is the response variable and rhs a set of predictors.
na.action: determines how to handle NA data.
More information about this function, its behavior and its arguments can be found in its associated RDocumentation page: https://www.rdocumentation.org/packages/caret/versions/6.0-90/topics/knn3
On the other hand, the function knn() from the class library goes as follows:
# Do not run this code snippet, as it is only here for illustration purposes
library(class)
knn(train,
test,
cl,
k = 1,
l = 0,
prob = FALSE,
use.all = TRUE)
l: minimum vote for a definite decision, otherwise doubt (more precisely, fewer than k-l dissenting votes are allowed).
prob: if TRUE, the proportion of the votes for the winning class are returned as attribute prob.
More information about this function, its behavior and its arguments can be found in its associated RDocumentation page: https://www.rdocumentation.org/packages/class/versions/7.3-20/topics/knn
The following code snippet showcases both functions, where the number of neighbors has been set at 5 in both cases in order to properly compare their results. Note that knn() returns the prediction directly whereas previous functions build a model which is then used to construct a prediction array.
library(caret)
model_knn3 <- knn3(x = train_set[,-1], y = train_set$diagnosis, k = 5)
prediction_knn3 <- predict(model_knn3, test_set[,-1], type = "class")
cm_knn3 <- confusionMatrix(prediction_knn3, test_set$diagnosis)
cm_knn3
## Confusion Matrix and Statistics
##
## Reference
## Prediction Benign Malignant
## Benign 100 7
## Malignant 7 56
##
## Accuracy : 0.9176
## 95% CI : (0.8657, 0.9542)
## No Information Rate : 0.6294
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.8235
##
## Mcnemar's Test P-Value : 1
##
## Sensitivity : 0.9346
## Specificity : 0.8889
## Pos Pred Value : 0.9346
## Neg Pred Value : 0.8889
## Prevalence : 0.6294
## Detection Rate : 0.5882
## Detection Prevalence : 0.6294
## Balanced Accuracy : 0.9117
##
## 'Positive' Class : Benign
##
acc_knn3 <- cm_knn3$byClass['Balanced Accuracy']
F1_knn3 <- cm_knn3$byClass['F1']
print(c(acc_knn3, F1_knn3))
## Balanced Accuracy F1
## 0.9117342 0.9345794
library(class)
prediction_knn <- knn(train = train_set[,-1], test = test_set[,-1], cl = train_set$diagnosis, k = 5)
cm_knn <- confusionMatrix(prediction_knn, test_set$diagnosis)
cm_knn
## Confusion Matrix and Statistics
##
## Reference
## Prediction Benign Malignant
## Benign 100 7
## Malignant 7 56
##
## Accuracy : 0.9176
## 95% CI : (0.8657, 0.9542)
## No Information Rate : 0.6294
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.8235
##
## Mcnemar's Test P-Value : 1
##
## Sensitivity : 0.9346
## Specificity : 0.8889
## Pos Pred Value : 0.9346
## Neg Pred Value : 0.8889
## Prevalence : 0.6294
## Detection Rate : 0.5882
## Detection Prevalence : 0.6294
## Balanced Accuracy : 0.9117
##
## 'Positive' Class : Benign
##
acc_knn <- cm_knn$byClass['Balanced Accuracy']
F1_knn <- cm_knn$byClass['F1']
print(c(acc_knn, F1_knn))
## Balanced Accuracy F1
## 0.9117342 0.9345794
Note that the results are identical with both functions, as expected, since both implement the same algorithm.
KNN can be fine-tuned by selecting an optimal number of neighbors. Said tuning process is showcased in the following code snippet (only knn3() is used now):
library(caret)
acc_knn3_array <- NULL
F1_knn3_array <- NULL
for(i in 1:50){
model_knn3_temp <- knn3(x = train_set[,-1], y = train_set$diagnosis, k = i)
prediction_knn3_temp <- predict(model_knn3_temp, test_set[,-1], type = "class")
cm_knn3_temp <- confusionMatrix(prediction_knn3_temp, test_set$diagnosis)
acc_knn3_array[i] <- cm_knn3_temp$byClass['Balanced Accuracy']
F1_knn3_array[i] <- cm_knn3_temp$byClass['F1']
}
acc_knn3_df <- data.frame(k = seq(1,50), acc = acc_knn3_array)
acc_knn3_optimal <- subset(acc_knn3_df, acc == max(acc))[1,]
F1_knn3_df <- data.frame(k = seq(1,50), F1 = F1_knn3_array)
F1_knn3_optimal <- subset(F1_knn3_df, F1 == max(F1))[1,]
print(c(acc_knn3_optimal, F1_knn3_optimal)) # At the time of writing these values coincide, but that might not be the case
## $k
## [1] 20
##
## $acc
## [1] 0.9224892
##
## $k
## [1] 20
##
## $F1
## [1] 0.9497717
tuning_knn3_df <- data.frame(k = seq(1,50), success = 0.5 * (acc_knn3_array + F1_knn3_array)) # We average F1 and balanced accuracy to measure success
library(dplyr) # For the mutate() function used to add balanced accuracy and F1 values to the dataframe
tuning_knn3 <- subset(tuning_knn3_df, success == max(success))[1,] %>% mutate(acc = acc_knn3_df[max(k), 2], F1 = F1_knn3_df[max(k), 2])
print(tuning_knn3)
## k success acc F1
## 20 20 0.9361305 0.9224892 0.9497717
The best results are obtained with k = 20, which might seem lower than expected, but it does make sense since, as already stated, smaller values of K result in more flexible and wiggly estimates.
These values and their associated results can be graphically interpreted with a plot. For illustration purposes, said plot is performed via two different libraries: highcharter and ggplot2.
sub_knn3 <- paste("Optimal number of neighbors is", tuning_knn3$k, "with an averaged success (balanced accuracy and F1 score) of ", tuning_knn3$success)
library(highcharter)
hchart(tuning_knn3_df, 'line', hcaes(k, success)) %>%
hc_title(text = "Averaged success with varying k (KNN)") %>%
hc_subtitle(text = sub_knn3) %>%
hc_add_theme(hc_theme_google()) %>%
hc_xAxis(title = list(text = "K neighbors")) %>%
hc_yAxis(title = list(text = "Averaged success"))
sub_knn3 <- paste("Optimal number of neighbors is", tuning_knn3$k, "with an averaged success (balanced accuracy and F1 score) of ", tuning_knn3$success)
library(ggplot2)
ggplot(tuning_knn3_df, aes(k, success)) +
geom_line() +
geom_point() + theme_minimal() +
labs(title = "Averaged success with varying k (KNN)",
subtitle = sub_knn3,
x = "K neighbors",
y = "Averaged success")
More machine learning approaches/algorithms are to be included in a later update.
The following code snippet plots the overall success of each approach. It does so with a “four fold” plot in which each confusion matrix can be visually evaluated: the four folds represent the cells of the confusion matrix table, with the blue sections being correct guesses (on one side, benign predicted as benign; on the other, malignant predicted as malignant) and the red sections being incorrect guesses (benign predicted as malignant and vice versa).
The overall success of each approach is represented by the average of their balanced accuracy and their F1 score (as a percentage).
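Both metrics entering that average can be reproduced by hand from the four cells of a confusion matrix. The sketch below does so with the counts reported earlier for the conditional inference tree (with Benign as the positive class); the variable names are chosen for this example only.

```r
# Confusion matrix cells reported for the conditional inference tree
tp <- 100  # benign predicted as benign
fp <- 4    # malignant predicted as benign
fn <- 7    # benign predicted as malignant
tn <- 59   # malignant predicted as malignant

sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
precision   <- tp / (tp + fp)

balanced_accuracy <- (sensitivity + specificity) / 2
f1 <- 2 * precision * sensitivity / (precision + sensitivity)

# Both values are wrapped in c(): mean(a, b) would read b as the trim argument
averaged_success <- mean(c(balanced_accuracy, f1))

print(round(c(balanced_accuracy = balanced_accuracy, F1 = f1,
              averaged = averaged_success), 7))
```

The first two values match the balanced accuracy (0.9355437) and F1 score (0.9478673) printed by confusionMatrix() for that model.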
# Visualize to compare the accuracy of all methods
# mean(c(a, b)) is needed to average both metrics, since mean(a, b) reads b as the trim argument
col <- c("#ed3b3b", "#0099ff")
par(mfrow=c(4,4))
fourfoldplot(cm_linear$table, color = col, conf.level = 0, margin = 1, main=paste("Linear Model (",round(mean(c(acc_linear, F1_linear))*100, 2),"%)",sep=""))
fourfoldplot(cm_logistic$table, color = col, conf.level = 0, margin = 1, main=paste("Logistic Model (",round(mean(c(acc_logistic, F1_logistic))*100, 2),"%)",sep=""))
fourfoldplot(cm_C50$table, color = col, conf.level = 0, margin = 1, main=paste("C5.0 (",round(mean(c(acc_C50, F1_C50))*100, 2),"%)",sep=""))
fourfoldplot(cm_rpart$table, color = col, conf.level = 0, margin = 1, main=paste("rpart (",round(mean(c(acc_rpart, F1_rpart))*100, 2),"%)",sep=""))
fourfoldplot(cm_JRip$table, color = col, conf.level = 0, margin = 1, main=paste("RWeka JRip (",round(mean(c(cm_JRip$byClass['Balanced Accuracy'], cm_JRip$byClass['F1']))*100, 2),"%)",sep=""))
fourfoldplot(cm_OneR$table, color = col, conf.level = 0, margin = 1, main=paste("RWeka OneR (",round(mean(c(cm_OneR$byClass['Balanced Accuracy'], cm_OneR$byClass['F1']))*100, 2),"%)",sep=""))
fourfoldplot(cm_PART$table, color = col, conf.level = 0, margin = 1, main=paste("RWeka PART (",round(mean(c(cm_PART$byClass['Balanced Accuracy'], cm_PART$byClass['F1']))*100, 2),"%)",sep=""))
fourfoldplot(cm_naiveBayes$table, color = col, conf.level = 0, margin = 1, main=paste("Naive Bayes (",round(mean(c(acc_naiveBayes, F1_naiveBayes))*100, 2),"%)",sep=""))
fourfoldplot(cm_ctree$table, color = col, conf.level = 0, margin = 1, main=paste("ctree (",round(mean(c(acc_ctree, F1_ctree))*100, 2),"%)",sep=""))
fourfoldplot(cm_randomForest$table, color = col, conf.level = 0, margin = 1, main=paste("Random Forest (",round(mean(c(acc_randomForest, F1_randomForest))*100, 2),"%)",sep=""))
fourfoldplot(cm_knn3$table, color = col, conf.level = 0, margin = 1, main=paste("KNN3 (",round(mean(c(acc_knn3, F1_knn3))*100, 2),"%)",sep=""))
fourfoldplot(cm_knn$table, color = col, conf.level = 0, margin = 1, main=paste("KNN (",round(mean(c(acc_knn, F1_knn))*100, 2),"%)",sep=""))
The following code snippet goes through the evaluated approaches and picks the one with the highest success.
# Select the best prediction model according to the average of balanced accuracy and F1
# mean(c(a, b)) averages both metrics; mean(a, b) reads b as the trim argument and returns a alone, which is what the output shown below reflects
opt_predict <- c(mean(c(acc_linear, F1_linear))*100,
mean(c(acc_logistic, F1_logistic))*100,
mean(c(acc_C50, F1_C50))*100,
mean(c(acc_rpart, F1_rpart))*100,
mean(c(cm_JRip$byClass['Balanced Accuracy'], cm_JRip$byClass['F1']))*100,
mean(c(cm_OneR$byClass['Balanced Accuracy'], cm_OneR$byClass['F1']))*100,
mean(c(cm_PART$byClass['Balanced Accuracy'], cm_PART$byClass['F1']))*100,
mean(c(acc_naiveBayes, F1_naiveBayes))*100,
mean(c(acc_ctree, F1_ctree))*100,
mean(c(acc_randomForest, F1_randomForest))*100,
mean(c(acc_knn3, F1_knn3))*100,
mean(c(acc_knn, F1_knn))*100)
names(opt_predict) <- c("Linear Model",
"Logistic Model",
"C5.0",
"rpart",
"RWeka JRip",
"RWeka OneR",
"RWeka PART",
"Naive Bayes",
"ctree",
"Random Forest",
"KNN3",
"KNN")
best_predict_model <- subset(opt_predict, opt_predict==max(opt_predict))
best_predict_model
## Random Forest
## 98.73906
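The selection step can also be written with which.max(), which returns the position of the first maximum of a vector and keeps its name. The scores below are hypothetical placeholders rather than results from this analysis.

```r
# Hypothetical success scores (percentages) for three models
scores <- c("Model A" = 91.2, "Model B" = 98.9, "Model C" = 94.5)

# Best model: name and score are kept together
best <- scores[which.max(scores)]
print(best)

# Full ranking, from best to worst
ranking <- sort(scores, decreasing = TRUE)
print(ranking)
```

Unlike subset(x, x == max(x)), which returns every element tied for the maximum, which.max() always returns a single element, so ties would need to be handled explicitly if they matter.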