In this project I will use Dimension Reduction approach for popular UCI Wine data. Basically, Dimensionality reduction eliminates some features of the dataset and creates a restricted set of features that contains all of the information needed to predict the target variables more efficiently and accurately.
In this part, you can observe Exploratory Data Analysis of the given data with some visualization parts to get an insight about the distribution of the data. We will analyze the data by using specific R codes. We will add visualization parts as well to observe data deeply.
The data set was found on UC Irvine Machine Learning Repository.
The data set consists of 13 variables.The columns of the Dataset are as following;
Alcohol
Malic acid
Ash
Alcalinity of ash
Magnesium
Total phenols
Flavanoids
Nonflavanoid phenols
Proanthocyanins
Color intensity
Hue
OD
Proline
Firstly, we should install the necessary libraries to our environment.
requiredPackages = c("tidyverse", "dplyr", "ggplot2","stats","factoextra","knitr","gridExtra","corrplot","readr","psych")
for(i in requiredPackages){if(!require(i,character.only = TRUE)) install.packages(i)}
## Loading required package: tidyverse
## Warning: package 'tidyverse' was built under R version 4.3.2
## Warning: package 'ggplot2' was built under R version 4.3.2
## Warning: package 'tidyr' was built under R version 4.3.2
## Warning: package 'purrr' was built under R version 4.3.2
## Warning: package 'dplyr' was built under R version 4.3.2
## Warning: package 'forcats' was built under R version 4.3.2
## Warning: package 'lubridate' was built under R version 4.3.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## âś” dplyr 1.1.3 âś” readr 2.1.4
## âś” forcats 1.0.0 âś” stringr 1.5.0
## âś” ggplot2 3.4.4 âś” tibble 3.2.1
## âś” lubridate 1.9.3 âś” tidyr 1.3.0
## âś” purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## âś– dplyr::filter() masks stats::filter()
## âś– dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## Loading required package: factoextra
## Warning: package 'factoextra' was built under R version 4.3.2
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
## Loading required package: knitr
## Loading required package: gridExtra
## Warning: package 'gridExtra' was built under R version 4.3.2
##
## Attaching package: 'gridExtra'
##
## The following object is masked from 'package:dplyr':
##
## combine
##
## Loading required package: corrplot
## Warning: package 'corrplot' was built under R version 4.3.2
## corrplot 0.92 loaded
## Loading required package: psych
##
## Attaching package: 'psych'
##
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
for(i in requiredPackages){if(!require(i,character.only = TRUE)) library(i,character.only = TRUE) }
library(clusterSim)
## Warning: package 'clusterSim' was built under R version 4.3.2
## Loading required package: cluster
## Warning: package 'cluster' was built under R version 4.3.2
## Loading required package: MASS
##
## Attaching package: 'MASS'
##
## The following object is masked from 'package:dplyr':
##
## select
library(smacof)
## Warning: package 'smacof' was built under R version 4.3.2
## Loading required package: plotrix
## Warning: package 'plotrix' was built under R version 4.3.2
##
## Attaching package: 'plotrix'
##
## The following object is masked from 'package:psych':
##
## rescale
##
## Loading required package: colorspace
## Loading required package: e1071
## Warning: package 'e1071' was built under R version 4.3.2
##
## Attaching package: 'smacof'
##
## The following object is masked from 'package:psych':
##
## Procrustes
##
## The following object is masked from 'package:base':
##
## transform
wine <- read.csv("C:\\Users\\mfati\\Desktop\\Master\\Semester1\\UL\\Projects\\DimensionReduction\\wine.csv", sep = ",", header=TRUE)
wine <- wine[,-1]
wine$Mg <- as.numeric(wine$Mg)
wine$Proline <- as.numeric(wine$Proline)
dim(wine)
## [1] 178 13
str(wine)
## 'data.frame': 178 obs. of 13 variables:
## $ Alcohol : num 14.2 13.2 13.2 14.4 13.2 ...
## $ Malic.acid : num 1.71 1.78 2.36 1.95 2.59 1.76 1.87 2.15 1.64 1.35 ...
## $ Ash : num 2.43 2.14 2.67 2.5 2.87 2.45 2.45 2.61 2.17 2.27 ...
## $ Acl : num 15.6 11.2 18.6 16.8 21 15.2 14.6 17.6 14 16 ...
## $ Mg : num 127 100 101 113 118 112 96 121 97 98 ...
## $ Phenols : num 2.8 2.65 2.8 3.85 2.8 3.27 2.5 2.6 2.8 2.98 ...
## $ Flavanoids : num 3.06 2.76 3.24 3.49 2.69 3.39 2.52 2.51 2.98 3.15 ...
## $ Nonflavanoid.phenols: num 0.28 0.26 0.3 0.24 0.39 0.34 0.3 0.31 0.29 0.22 ...
## $ Proanth : num 2.29 1.28 2.81 2.18 1.82 1.97 1.98 1.25 1.98 1.85 ...
## $ Color.int : num 5.64 4.38 5.68 7.8 4.32 6.75 5.25 5.05 5.2 7.22 ...
## $ Hue : num 1.04 1.05 1.03 0.86 1.04 1.05 1.02 1.06 1.08 1.01 ...
## $ OD : num 3.92 3.4 3.17 3.45 2.93 2.85 3.58 3.58 2.85 3.55 ...
## $ Proline : num 1065 1050 1185 1480 735 ...
summary(wine)
## Alcohol Malic.acid Ash Acl
## Min. :11.03 Min. :0.740 Min. :1.360 Min. :10.60
## 1st Qu.:12.36 1st Qu.:1.603 1st Qu.:2.210 1st Qu.:17.20
## Median :13.05 Median :1.865 Median :2.360 Median :19.50
## Mean :13.00 Mean :2.336 Mean :2.367 Mean :19.49
## 3rd Qu.:13.68 3rd Qu.:3.083 3rd Qu.:2.558 3rd Qu.:21.50
## Max. :14.83 Max. :5.800 Max. :3.230 Max. :30.00
## Mg Phenols Flavanoids Nonflavanoid.phenols
## Min. : 70.00 Min. :0.980 Min. :0.340 Min. :0.1300
## 1st Qu.: 88.00 1st Qu.:1.742 1st Qu.:1.205 1st Qu.:0.2700
## Median : 98.00 Median :2.355 Median :2.135 Median :0.3400
## Mean : 99.74 Mean :2.295 Mean :2.029 Mean :0.3619
## 3rd Qu.:107.00 3rd Qu.:2.800 3rd Qu.:2.875 3rd Qu.:0.4375
## Max. :162.00 Max. :3.880 Max. :5.080 Max. :0.6600
## Proanth Color.int Hue OD
## Min. :0.410 Min. : 1.280 Min. :0.4800 Min. :1.270
## 1st Qu.:1.250 1st Qu.: 3.220 1st Qu.:0.7825 1st Qu.:1.938
## Median :1.555 Median : 4.690 Median :0.9650 Median :2.780
## Mean :1.591 Mean : 5.058 Mean :0.9574 Mean :2.612
## 3rd Qu.:1.950 3rd Qu.: 6.200 3rd Qu.:1.1200 3rd Qu.:3.170
## Max. :3.580 Max. :13.000 Max. :1.7100 Max. :4.000
## Proline
## Min. : 278.0
## 1st Qu.: 500.5
## Median : 673.5
## Mean : 746.9
## 3rd Qu.: 985.0
## Max. :1680.0
kable(head(wine,10))
| Alcohol | Malic.acid | Ash | Acl | Mg | Phenols | Flavanoids | Nonflavanoid.phenols | Proanth | Color.int | Hue | OD | Proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.80 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 |
| 13.20 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.40 | 1050 |
| 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.80 | 3.24 | 0.30 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 |
| 14.37 | 1.95 | 2.50 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.80 | 0.86 | 3.45 | 1480 |
| 13.24 | 2.59 | 2.87 | 21.0 | 118 | 2.80 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 |
| 14.20 | 1.76 | 2.45 | 15.2 | 112 | 3.27 | 3.39 | 0.34 | 1.97 | 6.75 | 1.05 | 2.85 | 1450 |
| 14.39 | 1.87 | 2.45 | 14.6 | 96 | 2.50 | 2.52 | 0.30 | 1.98 | 5.25 | 1.02 | 3.58 | 1290 |
| 14.06 | 2.15 | 2.61 | 17.6 | 121 | 2.60 | 2.51 | 0.31 | 1.25 | 5.05 | 1.06 | 3.58 | 1295 |
| 14.83 | 1.64 | 2.17 | 14.0 | 97 | 2.80 | 2.98 | 0.29 | 1.98 | 5.20 | 1.08 | 2.85 | 1045 |
| 13.86 | 1.35 | 2.27 | 16.0 | 98 | 2.98 | 3.15 | 0.22 | 1.85 | 7.22 | 1.01 | 3.55 | 1045 |
sapply(wine, function(x) sum(is.na(x)))
## Alcohol Malic.acid Ash
## 0 0 0
## Acl Mg Phenols
## 0 0 0
## Flavanoids Nonflavanoid.phenols Proanth
## 0 0 0
## Color.int Hue OD
## 0 0 0
## Proline
## 0
winenormal<-data.Normalization(wine, type="n1",normalization="column")
head(winenormal)
## Alcohol Malic.acid Ash Acl Mg Phenols Flavanoids
## 1 1.5143408 -0.56066822 0.2313998 -1.1663032 1.90852151 0.8067217 1.0319081
## 2 0.2455968 -0.49800856 -0.8256672 -2.4838405 0.01809398 0.5670481 0.7315653
## 3 0.1963252 0.02117152 1.1062139 -0.2679823 0.08810981 0.8067217 1.2121137
## 4 1.6867914 -0.34583508 0.4865539 -0.8069748 0.92829983 2.4844372 1.4623994
## 5 0.2948684 0.22705328 1.8352256 0.4506745 1.27837900 0.8067217 0.6614853
## 6 1.4773871 -0.51591132 0.3043010 -1.2860793 0.85828399 1.5576991 1.3622851
## Nonflavanoid.phenols Proanth Color.int Hue OD Proline
## 1 -0.6577078 1.2214385 0.2510088 0.3611585 1.8427215 1.01015939
## 2 -0.8184106 -0.5431887 -0.2924962 0.4049085 1.1103172 0.96252635
## 3 -0.4970050 2.1299594 0.2682629 0.3174085 0.7863692 1.39122370
## 4 -0.9791134 1.0292513 1.1827317 -0.4263410 1.1807407 2.32800680
## 5 0.2261576 0.4002753 -0.3183774 0.3611585 0.4483365 -0.03776747
## 6 -0.1755994 0.6623487 0.7298108 0.4049085 0.3356589 2.23274072
corrplot(cor(winenormal[sapply(winenormal, is.numeric)]), type = 'upper', method = 'number', tl.cex = 0.9,number.cex=0.65,)
From correlation analysis, there is a high correlation between Phenols and Flavanoids.
If we want we can reduce dimensionality by removing highly correlated features because if two features are highly correlated, then they have same effect for machine learning model.
Let’s see behavior of these two variables and others at MDS and PCA
MDS (multidimensional scaling) is the machine learning approach which is used in purpose to reduce dimensions of the data in order to be able to plot some meaningful results from the data.
Let’s create a dissimilarity matrix and MDS plot
w1 = dist(t(winenormal))
mds = smacofSym(w1, ndim = 2 )
mds1<-cmdscale(w1, k=2)
summary(mds1)
## V1 V2
## Min. :-11.1063 Min. :-8.798
## 1st Qu.: -7.1703 1st Qu.:-3.536
## Median : 0.1172 Median : 1.068
## Mean : 0.0000 Mean : 0.000
## 3rd Qu.: 7.3681 3rd Qu.: 4.423
## Max. : 9.3303 Max. : 8.006
mdsplot <- data.frame(mds1)
ggplot(data = mdsplot, aes(x = X1, y = X2) ) +
geom_point(color = "red") +
geom_text(aes(label = rownames(mdsplot)), vjust = 1.8, color = "black") +
labs(x = "Dim1", y = "Dim2", title = "MDS Plot of Wine") +
theme_bw()
We are seing dispersion of parameters
Now we will check the MDS Goodness
gdns<-randomstress(n=13, ndim=2, nrep=5)
mean(gdns)
## [1] 0.3516239
It shows the average of randomly generated MDS stress values for 13 objects and 2 dimensions. The average stress value is 0.3586362
PCA is a dimensionality-reduction approach for reducing the dimensionality of big data sets by converting a large collection of variables into a smaller set that nevertheless includes the majority of the information in the large set.
We can perform PCA using prcomp() function
pca <- prcomp(winenormal, center=FALSE, scale=FALSE)
summarypca <- summary(pca)
summarypca
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.169 1.5802 1.2025 0.95863 0.92370 0.80103 0.74231
## Proportion of Variance 0.362 0.1921 0.1112 0.07069 0.06563 0.04936 0.04239
## Cumulative Proportion 0.362 0.5541 0.6653 0.73599 0.80162 0.85098 0.89337
## PC8 PC9 PC10 PC11 PC12 PC13
## Standard deviation 0.59034 0.53748 0.5009 0.47517 0.41082 0.32152
## Proportion of Variance 0.02681 0.02222 0.0193 0.01737 0.01298 0.00795
## Cumulative Proportion 0.92018 0.94240 0.9617 0.97907 0.99205 1.00000
The quality of this PCA is good enough. We have 7 principal components amounting to 90 % of the variance explained, the first one explaining 36 %. However, later validation steps will show what is the optimal number of components. We can continue
Let’s find eigenvalue of the data and show on graph
get_eig(pca)
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 4.7058503 36.1988481 36.19885
## Dim.2 2.4969737 19.2074903 55.40634
## Dim.3 1.4460720 11.1236305 66.52997
## Dim.4 0.9189739 7.0690302 73.59900
## Dim.5 0.8532282 6.5632937 80.16229
## Dim.6 0.6416570 4.9358233 85.09812
## Dim.7 0.5510283 4.2386793 89.33680
## Dim.8 0.3484974 2.6807489 92.01754
## Dim.9 0.2888799 2.2221534 94.23970
## Dim.10 0.2509025 1.9300191 96.16972
## Dim.11 0.2257886 1.7368357 97.90655
## Dim.12 0.1687702 1.2982326 99.20479
## Dim.13 0.1033779 0.7952149 100.00000
fviz_eig(pca, addlabels = TRUE,ylim = c(0, 50))
Now let’s merge these eigenvalue results with variable names and see which variable are more important than the others.
coordpca <- get_pca_var(pca)
# Coordinates of variables
kable(coordpca$coord)
| Dim.1 | Dim.2 | Dim.3 | Dim.4 | Dim.5 | Dim.6 | Dim.7 | Dim.8 | Dim.9 | Dim.10 | Dim.11 | Dim.12 | Dim.13 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alcohol | -0.3130934 | -0.7642573 | -0.2493833 | -0.0171176 | 0.2453944 | -0.1710519 | -0.0418637 | -0.2338555 | -0.2733703 | -0.1059932 | 0.1073495 | 0.1093949 | -0.0048132 |
| Malic.acid | 0.5318847 | -0.3554317 | 0.1070404 | 0.5146798 | -0.0325270 | -0.4300067 | 0.3121603 | -0.0388599 | 0.0404628 | 0.1548187 | -0.0363438 | -0.0499947 | -0.0083480 |
| Ash | 0.0044494 | -0.4994461 | 0.7530514 | -0.2053154 | 0.1321131 | -0.1237396 | -0.1107313 | 0.1005107 | 0.1653782 | 0.0135872 | 0.2369643 | 0.0203857 | 0.0454050 |
| Acl | 0.5191571 | 0.0167349 | 0.7360433 | 0.0583417 | -0.0610595 | 0.0807640 | -0.2130209 | -0.2526465 | -0.1077366 | -0.0264473 | -0.2277566 | 0.0229001 | -0.0294783 |
| Mg | -0.3080229 | -0.4734761 | 0.1572388 | -0.3372432 | -0.6715773 | -0.0305546 | 0.2396804 | 0.0923059 | -0.1458722 | -0.0339963 | -0.0338745 | -0.0255610 | -0.0182543 |
| Phenols | -0.8561367 | -0.1027742 | 0.1757842 | 0.1898745 | 0.1379259 | 0.0673849 | -0.0207291 | 0.2396378 | -0.1537366 | 0.1603543 | -0.1446145 | 0.1248399 | 0.1491577 |
| Flavanoids | -0.9174702 | 0.0053091 | 0.1811991 | 0.1459946 | 0.1007075 | 0.0151556 | -0.0450474 | 0.1105378 | -0.0266472 | 0.0817224 | 0.0122091 | 0.0176235 | -0.2675909 |
| Nonflavanoid.phenols | 0.6476070 | -0.0454768 | 0.2048724 | -0.1948907 | 0.4625011 | 0.2071428 | 0.4420082 | 0.1377165 | -0.1050772 | -0.1079619 | -0.0555457 | -0.0173990 | -0.0366666 |
| Proanth | -0.6799217 | -0.0621039 | 0.1797229 | 0.3825481 | -0.1264179 | 0.4275888 | 0.2762438 | -0.2173777 | 0.1124102 | -0.0672129 | 0.1127881 | 0.0392548 | 0.0375917 |
| Color.int | 0.1922360 | -0.8374894 | -0.1651145 | 0.0631984 | 0.0706049 | 0.3353486 | -0.1690336 | 0.0199516 | -0.0302155 | 0.1456498 | -0.0151289 | -0.2482242 | 0.0038560 |
| Hue | -0.6436621 | 0.4412422 | 0.1024817 | -0.4100751 | 0.1603683 | -0.0848959 | 0.1722727 | -0.2577549 | -0.0461307 | 0.2616705 | 0.0229090 | -0.1064894 | 0.0289015 |
| OD | -0.8160189 | 0.2599338 | 0.1996251 | 0.1765039 | 0.0934428 | -0.2129560 | -0.0332287 | 0.0461100 | -0.0737561 | -0.2623251 | -0.0220591 | -0.2468838 | 0.0503887 |
| Proline | -0.6220508 | -0.5766127 | -0.1524154 | -0.2224704 | 0.1458240 | -0.0959044 | 0.0570130 | -0.0708538 | 0.3094709 | -0.0812042 | -0.2562460 | 0.0326195 | -0.0046452 |
# Contribution of variables
kable(coordpca$contrib)
| Dim.1 | Dim.2 | Dim.3 | Dim.4 | Dim.5 | Dim.6 | Dim.7 | Dim.8 | Dim.9 | Dim.10 | Dim.11 | Dim.12 | Dim.13 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alcohol | 2.0830974 | 23.3918820 | 4.3007553 | 0.0318848 | 7.0577176 | 4.5598754 | 0.3180549 | 15.6926312 | 25.8693405 | 4.4776561 | 5.1038474 | 7.0908474 | 0.0224100 |
| Malic.acid | 6.0116950 | 5.0593925 | 0.7923294 | 28.8251177 | 0.1240000 | 28.8169110 | 17.6840358 | 0.4333160 | 0.5667537 | 9.5530412 | 0.5850038 | 1.4809925 | 0.0674116 |
| Ash | 0.0004207 | 9.9899495 | 39.2156374 | 4.5871171 | 2.0456286 | 2.3862422 | 2.2251871 | 2.8988475 | 9.4675872 | 0.0735787 | 24.8693128 | 0.2462380 | 1.9942532 |
| Acl | 5.7274256 | 0.0112159 | 37.4642355 | 0.3703868 | 0.4369599 | 1.0165581 | 8.2351285 | 18.3158477 | 4.0179927 | 0.2787779 | 22.9741700 | 0.3107267 | 0.8405746 |
| Mg | 2.0161740 | 8.9780536 | 1.7097376 | 12.3760834 | 52.8599543 | 0.1454960 | 10.4253627 | 2.4448898 | 7.3659356 | 0.4606367 | 0.5082109 | 0.3871342 | 0.3223312 |
| Phenols | 15.5757183 | 0.4230138 | 2.1368289 | 3.9231070 | 2.2295987 | 0.7076561 | 0.0779804 | 16.4782482 | 8.1815747 | 10.2484083 | 9.2623559 | 9.2344544 | 21.5210553 |
| Flavanoids | 17.8873419 | 0.0011288 | 2.2705035 | 2.3193704 | 1.1886633 | 0.0357967 | 0.3682695 | 3.5060824 | 0.2458027 | 2.6618089 | 0.0660186 | 0.1840310 | 69.2651822 |
| Nonflavanoid.phenols | 8.9122014 | 0.0828259 | 2.9025311 | 4.1331305 | 25.0703477 | 6.6870863 | 35.4557474 | 5.4421728 | 3.8220766 | 4.6455367 | 1.3664643 | 0.1793708 | 1.3005087 |
| Proanth | 9.8238044 | 0.1544625 | 2.2336591 | 15.9246116 | 1.8730611 | 28.4937517 | 13.8487693 | 13.5590940 | 4.3741576 | 1.8005319 | 5.6340987 | 0.9130381 | 1.3669601 |
| Color.int | 0.7852920 | 28.0895412 | 1.8852996 | 0.4346196 | 0.5842581 | 17.5262917 | 5.1852819 | 0.1142232 | 0.3160410 | 8.4550208 | 0.1013709 | 36.5083773 | 0.0143827 |
| Hue | 8.8039532 | 7.7972268 | 0.7262776 | 18.2988379 | 3.0142002 | 1.1232342 | 5.3859103 | 19.0640190 | 0.7366513 | 27.2900597 | 0.2324398 | 6.7191895 | 0.8080004 |
| OD | 14.1501921 | 2.7058997 | 2.7557523 | 3.3900448 | 1.0233546 | 7.0676794 | 0.2003789 | 0.6100843 | 1.8831223 | 27.4267836 | 0.2155123 | 36.1151384 | 2.4560574 |
| Proline | 8.2226840 | 13.3154077 | 1.6064528 | 5.3856884 | 2.4922558 | 1.4334213 | 0.5898932 | 1.4405441 | 33.1529642 | 2.6281597 | 29.0811947 | 0.6304618 | 0.0208726 |
# Control variable colors using their contributions
fviz_pca_var(pca, col.var="contrib",
gradient.cols = c("orange", "green", "#CC0066"),
repel = TRUE # Avoid text overlapping
)
According to the chart above, variables with the darkest shading are the most significant, while those with the lightest shading are the least important.
# Contributions of variables to PC1
fviz_contrib(pca, choice = "var", axes = 1, top = 10)
# Contributions of variables to PC2
fviz_contrib(pca, choice = "var", axes = 2, top = 10)
Now find results for individuals.
indpca <- get_pca_ind(pca)
# Coordinates of individuals
kable(head(indpca$coord))
| Dim.1 | Dim.2 | Dim.3 | Dim.4 | Dim.5 | Dim.6 | Dim.7 | Dim.8 | Dim.9 | Dim.10 | Dim.11 | Dim.12 | Dim.13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| -3.307421 | -1.4394023 | -0.1652728 | -0.2150246 | -0.6910933 | -0.2232504 | 0.5947488 | 0.0649559 | -0.6396384 | -1.0180840 | 0.4502932 | -0.5392891 | 0.0660523 |
| -2.203250 | 0.3324551 | -2.0207571 | -0.2905387 | 0.2569299 | -0.9245123 | 0.0536243 | 1.0215343 | 0.3079780 | -0.1592521 | 0.1422560 | -0.3871456 | -0.0036263 |
| -2.509661 | -1.0282507 | 0.9800541 | 0.7228632 | 0.2503270 | 0.5477310 | 0.4230122 | -0.3432479 | 1.1745213 | -0.1130420 | 0.2858665 | -0.0005819 | -0.0216554 |
| -3.746497 | -2.7486184 | -0.1756962 | 0.5663856 | 0.3109644 | 0.1141091 | -0.3822590 | 0.6417831 | -0.0523966 | -0.2387392 | -0.7574476 | 0.2413388 | 0.3684442 |
| -1.006071 | -0.8673840 | 2.0209873 | -0.4086131 | -0.2976180 | -0.4053761 | 0.4428253 | 0.4155283 | -0.3258998 | 0.0781460 | 0.5244656 | 0.2160547 | 0.0791403 |
| -3.041674 | -2.1164309 | -0.6276254 | -0.5141870 | 0.6302409 | 0.1230834 | 0.4005239 | 0.3937826 | 0.1517181 | 0.1017089 | -0.4044444 | 0.3783654 | -0.1447470 |
# Contribution of variables
kable(head(indpca$contrib))
| Dim.1 | Dim.2 | Dim.3 | Dim.4 | Dim.5 | Dim.6 | Dim.7 | Dim.8 | Dim.9 | Dim.10 | Dim.11 | Dim.12 | Dim.13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.3059329 | 0.4661550 | 0.0106119 | 0.0282653 | 0.3144765 | 0.0436377 | 0.3606395 | 0.0068017 | 0.7956675 | 2.3208241 | 0.5045087 | 0.9681162 | 0.0237098 |
| 0.5795213 | 0.0248675 | 1.5864204 | 0.0516042 | 0.0434654 | 0.7483460 | 0.0029318 | 1.6822341 | 0.1844599 | 0.0567866 | 0.0503523 | 0.4989223 | 0.0000715 |
| 0.7519205 | 0.2378834 | 0.3731558 | 0.3194398 | 0.0412601 | 0.2626707 | 0.1824365 | 0.1899312 | 2.6827759 | 0.0286124 | 0.2033313 | 0.0000011 | 0.0025485 |
| 1.6756863 | 1.6997886 | 0.0119926 | 0.1961107 | 0.0636702 | 0.0114003 | 0.1489778 | 0.6639832 | 0.0053391 | 0.1276211 | 1.4275231 | 0.1938824 | 0.7377268 |
| 0.1208367 | 0.1692735 | 1.5867819 | 0.1020708 | 0.0583221 | 0.1438776 | 0.1999268 | 0.2783439 | 0.2065527 | 0.0136738 | 0.6844031 | 0.1553859 | 0.0340367 |
| 1.1045036 | 1.0077997 | 0.1530351 | 0.1616289 | 0.2615339 | 0.0132640 | 0.1635547 | 0.2499733 | 0.0447649 | 0.0231629 | 0.4070011 | 0.4765483 | 0.1138601 |
var<-get_pca_ind(pca)
a<-fviz_contrib(pca, "var", axes=1, xtickslab.rt=90)
b<-fviz_contrib(pca, "var", axes=2, xtickslab.rt=90)
grid.arrange(a,b,top='Contribution of the Principal Components')
Result for PCA
As a results from the graphs above, both in results that we executed for all approach, We don’t have very strong evidence but we might drop
Malic acidandAclfeatures from the dataset.
wine <- wine[!colnames(wine) %in% c("Malic acid", "Acl")]
kable(head(wine))
| Alcohol | Malic.acid | Ash | Mg | Phenols | Flavanoids | Nonflavanoid.phenols | Proanth | Color.int | Hue | OD | Proline |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 14.23 | 1.71 | 2.43 | 127 | 2.80 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 |
| 13.20 | 1.78 | 2.14 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.40 | 1050 |
| 13.16 | 2.36 | 2.67 | 101 | 2.80 | 3.24 | 0.30 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 |
| 14.37 | 1.95 | 2.50 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.80 | 0.86 | 3.45 | 1480 |
| 13.24 | 2.59 | 2.87 | 118 | 2.80 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 |
| 14.20 | 1.76 | 2.45 | 112 | 3.27 | 3.39 | 0.34 | 1.97 | 6.75 | 1.05 | 2.85 | 1450 |
REFERENCES
Lecture notes of Jacek Lewkowicz - Unsupervised Learning-University of Warsaw.
https://www.displayr.com/goodness-of-fit-in-mds-and-t-sne-with-shepard-diagrams/
https://www.geeksforgeeks.org/multidimensional-scaling-using-r/
https://medium.com/geekculture/principal-component-analysis-3d2b3a0bb93e
https://medium.com/swlh/principal-component-analysis-3b746802766