Objective:

The goal of this work is to practice the R language and to present unsupervised learning models. We cheat a little, since we do have the variable that labels the data (class), which lets us compare the clusters against the known classes. A second goal is to apply dimensionality reduction methods to the data.

Introduction to the dataset

This dataset contains information about different oils, described by the variables listed below.

It has 96 records and no missing values.

## [1] 96  8
## Rows: 96
## Columns: 8
## $ palmitic   <dbl> 9.7, 11.1, 11.5, 10.0, 12.2, 9.8, 10.5, 10.5, 11.5, 10.0, 1…
## $ stearic    <dbl> 5.2, 5.0, 5.2, 4.8, 5.0, 4.2, 5.0, 5.0, 5.2, 4.8, 5.0, 4.4,…
## $ oleic      <dbl> 31.0, 32.9, 35.0, 30.4, 31.1, 43.0, 31.8, 31.8, 35.0, 30.4,…
## $ linoleic   <dbl> 52.7, 49.8, 47.2, 53.5, 50.5, 39.2, 51.3, 51.3, 47.2, 53.5,…
## $ linolenic  <dbl> 0.4, 0.3, 0.2, 0.3, 0.3, 2.4, 0.4, 0.4, 0.2, 0.3, 0.3, 2.3,…
## $ eicosanoic <dbl> 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4,…
## $ eicosenoic <dbl> 0.1, 0.1, 0.1, 0.1, 0.1, 0.5, 0.1, 0.1, 0.1, 0.1, 0.1, 0.5,…
## $ class      <fct> pumpkin, pumpkin, pumpkin, pumpkin, pumpkin, pumpkin, pumpk…
## # A tibble: 6 × 8
##   palmitic stearic oleic linoleic linolenic eicosanoic eicosenoic class  
##      <dbl>   <dbl> <dbl>    <dbl>     <dbl>      <dbl>      <dbl> <fct>  
## 1      9.7     5.2  31       52.7       0.4        0.4        0.1 pumpkin
## 2     11.1     5    32.9     49.8       0.3        0.4        0.1 pumpkin
## 3     11.5     5.2  35       47.2       0.2        0.4        0.1 pumpkin
## 4     10       4.8  30.4     53.5       0.3        0.4        0.1 pumpkin
## 5     12.2     5    31.1     50.5       0.3        0.4        0.1 pumpkin
## 6      9.8     4.2  43       39.2       2.4        0.4        0.5 pumpkin
##     palmitic        stearic          oleic          linoleic    
##  Min.   : 4.50   Min.   :1.700   Min.   :22.80   Min.   : 7.90  
##  1st Qu.: 6.20   1st Qu.:3.475   1st Qu.:26.30   1st Qu.:43.10  
##  Median : 9.85   Median :4.200   Median :30.70   Median :50.80  
##  Mean   : 9.04   Mean   :4.200   Mean   :36.73   Mean   :46.49  
##  3rd Qu.:11.12   3rd Qu.:5.000   3rd Qu.:38.62   3rd Qu.:58.08  
##  Max.   :14.90   Max.   :6.700   Max.   :76.70   Max.   :66.10  
##                                                                 
##    linolenic       eicosanoic      eicosenoic           class   
##  Min.   :0.100   Min.   :0.100   Min.   :0.1000   corn     : 2  
##  1st Qu.:0.375   1st Qu.:0.100   1st Qu.:0.1000   olive    : 7  
##  Median :0.800   Median :0.400   Median :0.1000   peanut   : 3  
##  Mean   :2.272   Mean   :0.399   Mean   :0.3115   pumpkin  :37  
##  3rd Qu.:2.650   3rd Qu.:0.400   3rd Qu.:0.3000   rapeseed :10  
##  Max.   :9.500   Max.   :2.800   Max.   :1.8000   soybean  :11  
##                                                   sunflower:26

We can see that there are 7 classes. The most common is pumpkin (37 records) and the least common is corn (2 records).
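
The class frequencies can be checked directly from the `summary()` output. This is a minimal sketch: the vector `class_counts` is typed in by hand from the counts printed above (it is not read from the original data file), and the names `oil_class`, `most_common` and `least_common` are illustrative.

```r
# Class frequencies copied from the summary() output above.
class_counts <- c(corn = 2, olive = 7, peanut = 3, pumpkin = 37,
                  rapeseed = 10, soybean = 11, sunflower = 26)

# Rebuild a factor with one entry per record.
oil_class <- factor(rep(names(class_counts), times = class_counts))

tab <- table(oil_class)
most_common  <- names(which.max(tab))   # class with the largest count
least_common <- names(which.min(tab))   # class with the smallest count

print(sort(tab, decreasing = TRUE))
cat("Most common:", most_common, "| least common:", least_common, "\n")
```

With the real data, `table(datos$class)` would give the same counts, where `datos` is a hypothetical name for the data frame.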

Overview by class

## [1] "mean"
## # A tibble: 7 × 8
##   class     palmitic stearic oleic linoleic linolenic eicosanoic eicosenoic
##   <fct>        <dbl>   <dbl> <dbl>    <dbl>     <dbl>      <dbl>      <dbl>
## 1 corn         10.4     2.05  33.6     51.3     1.55       0.5        0.4  
## 2 olive        11.5     2.79  72.6     10.7     1.16       0.2        0.214
## 3 peanut        9.77    3.33  59       20.8     0.167      1.5        1.43 
## 4 pumpkin      11.0     5.37  33.3     48.8     0.873      0.403      0.162
## 5 rapeseed      5.22    1.95  58.2     23.8     8.38       0.42       0.96 
## 6 soybean      10.5     3.97  25.7     52.2     6.72       0.327      0.227
## 7 sunflower     6.27    4.14  26.0     61.7     0.631      0.335      0.2  
## [1] "min"
## # A tibble: 7 × 8
##   class     palmitic stearic oleic linoleic linolenic eicosanoic eicosenoic
##   <fct>        <dbl>   <dbl> <dbl>    <dbl>     <dbl>      <dbl>      <dbl>
## 1 corn          10       1.8  30.2     47.1       0.9        0.5        0.3
## 2 olive          9.3     2.6  65        7.9       0.6        0.1        0.1
## 3 peanut         9.6     3.3  57.7     20.5       0.1        1.5        1.2
## 4 pumpkin        7.6     4.2  25.8     39.2       0.2        0.1        0.1
## 5 rapeseed       4.5     1.7  52.2     18.6       6.8        0.1        0.1
## 6 soybean        9.6     3.5  23.1     49.2       5.5        0.1        0.1
## 7 sunflower      5.6     2.8  22.8     57.7       0.1        0.1        0.1
## [1] "max"
## # A tibble: 7 × 8
##   class     palmitic stearic oleic linoleic linolenic eicosanoic eicosenoic
##   <fct>        <dbl>   <dbl> <dbl>    <dbl>     <dbl>      <dbl>      <dbl>
## 1 corn          10.7     2.3  36.9     55.5       2.2        0.5        0.5
## 2 olive         14.9     3    76.7     17         3.9        0.5        0.7
## 3 peanut        10       3.4  60       21.3       0.2        1.5        1.8
## 4 pumpkin       13.1     6.7  43.3     56.1       4.2        1.3        0.7
## 5 rapeseed       6.2     2.3  64.9     29         9.5        0.8        1.6
## 6 soybean       11.9     4.3  30.3     55.1       7.8        0.7        0.9
## 7 sunflower      7.2     4.9  29.9     66.1       1.7        2.8        0.9
## [1] "sd"
## # A tibble: 7 × 8
##   class     palmitic stearic oleic linoleic linolenic eicosanoic eicosenoic
##   <fct>        <dbl>   <dbl> <dbl>    <dbl>     <dbl>      <dbl>      <dbl>
## 1 corn         0.495  0.354   4.74    5.94     0.919       0          0.141
## 2 olive        1.73   0.135   4.37    3.28     1.21        0.173      0.227
## 3 peanut       0.208  0.0577  1.18    0.416    0.0577      0          0.321
## 4 pumpkin      1.29   0.618   4.17    4.35     0.912       0.252      0.134
## 5 rapeseed     0.496  0.201   4.19    3.88     0.937       0.286      0.669
## 6 soybean      0.638  0.272   1.85    1.81     0.844       0.200      0.245
## 7 sunflower    0.342  0.421   1.87    2.11     0.468       0.518      0.179
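
Per-class tables like the ones above can be produced with a grouped summary. The sketch below uses base R's `aggregate()` on the built-in `iris` data as a stand-in, because the oil data are not bundled with this text; `iris$Species` plays the role of `class`. The original output is printed as tibbles, so it was probably produced with dplyr; that is an assumption, and the base R version here is only one way to get the same kind of table.

```r
# Stand-in data: iris replaces the oil data set (assumption for illustration).
datos <- iris
names(datos)[names(datos) == "Species"] <- "class"

# One table per statistic, mirroring the "mean", "min", "max", "sd" blocks.
stats <- c("mean", "min", "max", "sd")
by_class <- lapply(stats, function(f) {
  aggregate(. ~ class, data = datos, FUN = match.fun(f))
})
names(by_class) <- stats

for (f in stats) {
  print(f)
  print(by_class[[f]])
}
```

A standard deviation of 0, as reported for eicosanoic in corn and peanut, means every record of that class has the same value for that variable.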

Visual analysis

Dimensionality reduction

In this section we review dimensionality reduction methods.

Correlation analysis

Several pairwise correlations are far from zero, so the variables carry overlapping information and principal component analysis should be useful.
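
A correlation matrix makes this check concrete. The following is a minimal sketch on the numeric columns of the built-in `iris` data, used as a stand-in for the seven fatty-acid columns; the variable names `R` and `max_abs_cor` are illustrative.

```r
# Stand-in data: numeric columns of iris replace the seven fatty-acid columns.
X <- iris[, sapply(iris, is.numeric)]

# Pearson correlation matrix.
R <- cor(X)
print(round(R, 2))

# Largest absolute off-diagonal correlation: values near 1 mean the variables
# share information that a few principal components can capture.
off_diag <- abs(R[upper.tri(R)])
max_abs_cor <- max(off_diag)
cat("Largest |correlation| between two different variables:",
    round(max_abs_cor, 3), "\n")
```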

Principal component analysis

## [1] 0.4558454 0.6664997 0.8452196 0.9384298 0.9727448 0.9998312 1.0000000

The output of the prcomp function includes, among other elements:
- sdev, the standard deviations of the principal components (the square roots of the eigenvalues),
- rotation, the eigenvectors (loadings),
- x, the transformed data (scores).

Outside this function, the eigenvalues can be computed in the usual way from the covariance matrix (or from the correlation matrix when the variables are standardized). To obtain x, we multiply the centered data matrix by the matrix of eigenvectors (that is, centered matrix * eigenvectors).

From the cumulative proportions above, we can conclude that the first 3 principal components explain more than 80% of the variability in the data (about 84.5%).
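
The relationship between `prcomp` and the eigendecomposition can be verified numerically. This sketch uses the built-in `iris` data as a stand-in (the oil data are not included here) and the default `prcomp` settings, which center the columns without rescaling them; whether the original analysis also standardized the variables is not stated, so that part is an assumption.

```r
# Stand-in data: numeric columns of iris (the oil data are not included here).
X <- as.matrix(iris[, 1:4])

# prcomp centers the columns by default and does not rescale them.
pca <- prcomp(X)

# 1) sdev^2 matches the eigenvalues of the sample covariance matrix.
eig <- eigen(cov(X))
same_eigenvalues <- isTRUE(all.equal(pca$sdev^2, eig$values))

# 2) x is the centered data multiplied by the eigenvector matrix.
Xc <- scale(X, center = TRUE, scale = FALSE)
scores <- Xc %*% pca$rotation
same_scores <- isTRUE(all.equal(scores, pca$x, check.attributes = FALSE))

# 3) Cumulative proportion of variance explained by each component.
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
n_comp_80 <- which(cum_var >= 0.80)[1]  # fewest components reaching 80%

print(round(cum_var, 4))
cat("Eigenvalues match:", same_eigenvalues,
    "| scores match:", same_scores,
    "| components for 80%:", n_comp_80, "\n")
```

The vector `cum_var` is the same kind of quantity as the cumulative proportions printed at the start of this section.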

## Warning: No shared levels found between `names(values)` of the manual scale and the
## data's colour values.

t-SNE

We chose these parameters because they produce the most clearly separated groups.

UMAP

Clustering methods

Dendrograms

AGNES methods

DIANA

Other methods for determining the number of clusters.

## *** : The Hubert index is a graphical method of determining the number of clusters.
##                 In the plot of Hubert index, we seek a significant knee that corresponds to a 
##                 significant increase of the value of the measure i.e the significant peak in Hubert
##                 index second differences plot. 
## 

## *** : The D index is a graphical method of determining the number of clusters. 
##                 In the plot of D index, we seek a significant knee (the significant peak in Dindex
##                 second differences plot) that corresponds to a significant increase of the value of
##                 the measure. 
##  
## ******************************************************************* 
## * Among all indices:                                                
## * 1 proposed 2 as the best number of clusters 
## * 7 proposed 3 as the best number of clusters 
## * 4 proposed 4 as the best number of clusters 
## * 3 proposed 5 as the best number of clusters 
## * 5 proposed 8 as the best number of clusters 
## * 3 proposed 10 as the best number of clusters 
## 
##                    ***** Conclusion *****                            
##  
## * According to the majority rule, the best number of clusters is  3 
##  
##  
## *******************************************************************

K-means

Following the majority rule above, we choose k = 3.

Next, we look at how these clusters map onto the original classes.

##    
##     corn olive peanut pumpkin rapeseed soybean sunflower
##   1    0     7      3       0       10       0         0
##   2    1     0      0       5        0       9        26
##   3    1     0      0      32        0       2         0
## 
##  1  2  3 
## 20 41 35
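
A cross-tabulation like the one above takes only a few lines. This is a minimal sketch on the built-in `iris` data as a stand-in, with `Species` playing the role of `class`. Standardizing the variables and using `nstart = 25` are assumptions made for the illustration, not settings taken from the original analysis.

```r
# Stand-in data: iris replaces the oil data; Species plays the role of class.
X <- scale(iris[, 1:4])      # standardize so no variable dominates distances
true_class <- iris$Species

set.seed(123)                # k-means starts from random centers
km <- kmeans(X, centers = 3, nstart = 25)

# Rows: k-means cluster, columns: known class.
cross_tab <- table(cluster = km$cluster, class = true_class)
print(cross_tab)

# Cluster sizes, as in the second table above.
print(table(km$cluster))
```

Cluster numbers are arbitrary labels, so a row is read by the class that dominates it. In the table above, cluster 1 gathers olive, peanut and rapeseed, cluster 2 is mostly sunflower and soybean, and cluster 3 is mostly pumpkin.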

Now we look at the same clusters using another dimensionality reduction method.

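library(dplyr)    # filter() and %>%
library(ggplot2)  # plotting

# `d` (embeddings for a grid of min_dist / n_neighbors values) and `datos3d`
# (data carrying the k-means labels) are built earlier in the analysis.
d6 <- d %>% filter(min_dist == 0.5 & n_neighbors == 15)
# Attach the k-means labels; this relies on both objects sharing row order.
# factor() makes the labels discrete, as scale_color_manual() requires.
d6$kmeans <- factor(datos3d$kmeans)
d6 %>%
  ggplot(aes(x = x, y = y, col = kmeans)) +
  geom_point() +
  theme_bw() +
  scale_color_manual(values = c('#740938', '#133E87', '#FFBF61')) +
  labs(title = 'Final choice',
       subtitle = 'min_dist: 0.5, n_neighbors: 15')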

DBSCAN

Analysis of the overall outliers

## Warning: Some values were outside the color scale and will be treated as NA
## Some values were outside the color scale and will be treated as NA
##        
##         corn olive peanut pumpkin rapeseed soybean sunflower
##   FALSE    0     6      3       0        6       0         1
##   TRUE     2     1      0      37        4      11        25
## 
## FALSE  TRUE 
##    16    80
## 
##      corn     olive    peanut   pumpkin  rapeseed   soybean sunflower 
##         2         7         3        37        10        11        26
##        
##          1  2  3
##   FALSE 15  1  0
##   TRUE   5 40 35
Points inside the oval
Distances between neighbors
With 8 neighbors
##       70%       75%       80%       90%       95%       99%     99.5%     99.9% 
##  3.520937  4.029888  4.341659  7.804358 10.404326 16.801245 18.609567 20.918081
With 14 neighbors
##       70%       75%       80%       90%       95%       99%     99.5%     99.9% 
##  4.860345  5.728001  6.677724 10.745458 14.297523 21.492696 22.703095 24.366512
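
Quantiles of k-nearest-neighbour distances are a common guide for choosing eps. The sketch below computes them in base R on the built-in `iris` data as a stand-in; `knn_dist` is a hypothetical helper written for this illustration. The dbscan package offers `kNNdist()` for the same purpose, and whether the original analysis used it is an assumption.

```r
# Stand-in data: iris replaces the oil data (assumption for illustration).
X <- as.matrix(iris[, 1:4])

# Distance from every point to its k-th nearest neighbour (self excluded).
knn_dist <- function(X, k) {
  D <- as.matrix(dist(X))                  # full Euclidean distance matrix
  apply(D, 1, function(d) sort(d)[k + 1])  # position 1 is the point itself
}

probs <- c(0.70, 0.75, 0.80, 0.90, 0.95, 0.99, 0.995, 0.999)
q8  <- quantile(knn_dist(X, 8),  probs = probs)
q14 <- quantile(knn_dist(X, 14), probs = probs)

print(q8)
print(q14)
```

In the output above, the values eps = 4.4 and eps = 6.5 used in the runs that follow sit close to the 80% quantiles for 8 and 14 neighbors (4.34 and 6.68).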
dbscan with minPts = 14, eps = 6.5
## DBSCAN clustering for 96 objects.
## Parameters: eps = 6.5, minPts = 14
## Using euclidean distances and borderpoints = TRUE
## The clustering contains 1 cluster(s) and 20 noise points.
## 
##  0  1 
## 20 76 
## 
## Available fields: cluster, eps, minPts, metric, borderPoints
dbscan with minPts = 8, eps = 4.4
## DBSCAN clustering for 96 objects.
## Parameters: eps = 4.4, minPts = 8
## Using euclidean distances and borderpoints = TRUE
## The clustering contains 1 cluster(s) and 22 noise points.
## 
##  0  1 
## 22 74 
## 
## Available fields: cluster, eps, minPts, metric, borderPoints
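
For reference, a DBSCAN run of this kind can be sketched as follows. It assumes the dbscan package is installed and uses the built-in `iris` data as a stand-in; the values of `eps` and `minPts` are illustrative choices for `iris`, not the ones reported above.

```r
library(dbscan)

# Stand-in data: iris replaces the oil data (assumption for illustration).
X <- as.matrix(iris[, 1:4])

# eps and minPts here are illustrative values for iris, not the ones above.
db <- dbscan(X, eps = 0.5, minPts = 8)
print(db)

# Cluster 0 collects the noise points.
n_noise <- sum(db$cluster == 0)
print(table(db$cluster))
cat("Noise points:", n_noise, "\n")
```

A result with one cluster plus noise, as in both runs above, usually means that eps is large enough to connect the dense regions into a single group.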
Guidance from k-means.
Distance between the points of k-means cluster 1
## [1] 19.87461
Distance between the points of k-means cluster 2
## [1] 17.70282
Distance between the points of k-means cluster 3
## [1] 33.46102
Mean maximum distance between the points
## [1] 23.67667
Mean number of elements per cluster
## [1] 32
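
One way to obtain figures of this kind is to measure the largest pairwise distance inside each k-means cluster and then average. The sketch below is a plausible reading of the idea, not a reconstruction of the original computation. It uses the built-in `iris` data as a stand-in, and the names `diameters`, `eps_guess` and `minPts_guess` are hypothetical.

```r
# Stand-in data: iris replaces the oil data (assumption for illustration).
X <- as.matrix(iris[, 1:4])

set.seed(123)                 # k-means starts from random centers
km <- kmeans(X, centers = 3, nstart = 25)

# Largest pairwise distance inside each cluster (its diameter).
diameters <- sapply(split(as.data.frame(X), km$cluster),
                    function(g) max(dist(g)))

eps_guess    <- mean(diameters)        # average cluster diameter
minPts_guess <- round(mean(km$size))   # average cluster size

print(diameters)
cat("eps guess:", round(eps_guess, 3), "| minPts guess:", minPts_guess, "\n")
```

Values derived this way describe whole clusters rather than local neighbourhoods, so they tend to be large for DBSCAN; the runs that follow report a single cluster.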
dbscan eps = 25, minPts = 32
## DBSCAN clustering for 96 objects.
## Parameters: eps = 25, minPts = 32
## Using euclidean distances and borderpoints = TRUE
## The clustering contains 1 cluster(s) and 11 noise points.
## 
##  0  1 
## 11 85 
## 
## Available fields: cluster, eps, minPts, metric, borderPoints
dbscan with eps = 25, minPts = 32, using the first 3 principal components from PCA
## DBSCAN clustering for 96 objects.
## Parameters: eps = 25, minPts = 32
## Using euclidean distances and borderpoints = TRUE
## The clustering contains 1 cluster(s) and 0 noise points.
## 
##  1 
## 96 
## 
## Available fields: cluster, eps, minPts, metric, borderPoints
HDBSCAN
## HDBSCAN clustering for 96 objects.
## Parameters: minPts = 32
## The clustering contains 0 cluster(s) and 96 noise points.
## 
##  0 
## 96 
## 
## Available fields: cluster, minPts, coredist, cluster_scores,
##                   membership_prob, outlier_scores, hc
## HDBSCAN clustering for 96 objects.
## Parameters: minPts = 14
## The clustering contains 2 cluster(s) and 43 noise points.
## 
##  0  1  2 
## 43 25 28 
## 
## Available fields: cluster, minPts, coredist, cluster_scores,
##                   membership_prob, outlier_scores, hc
## HDBSCAN clustering for 96 objects.
## Parameters: minPts = 8
## The clustering contains 2 cluster(s) and 5 noise points.
## 
##  0  1  2 
##  5 15 76 
## 
## Available fields: cluster, minPts, coredist, cluster_scores,
##                   membership_prob, outlier_scores, hc
dbscan minPts = 14
dbscan minPts = 8
Counts of the response variable against the cluster models.
##    
##     corn olive peanut pumpkin rapeseed soybean sunflower
##   0    2     7      3       9       10      11         1
##   1    0     0      0       0        0       0        25
##   2    0     0      0      28        0       0         0
##    
##     corn olive peanut pumpkin rapeseed soybean sunflower
##   0    0     5      0       0        0       0         0
##   1    0     2      3       0       10       0         0
##   2    2     0      0      37        0      11        26
##    
##      0  1  2
##   0  5 15 23
##   1  0  0 25
##   2  0  0 28
##    
##     c1 c2 c3
##   0 20 14  9
##   1  0 25  0
##   2  0  2 26
##    
##     c1 c2 c3
##   0  5  0  0
##   1 15  0  0
##   2  0 41 35

OPTICS

## # A tibble: 6 × 6
##   class     PC1    PC2   PC3    PC4     PC5
##   <fct>   <dbl>  <dbl> <dbl>  <dbl>   <dbl>
## 1 pumpkin -61.7 -7.14  1.74  -0.456 -0.355 
## 2 pumpkin -60.7 -3.83  2.86   0.356  0.0972
## 3 pumpkin -60.0 -0.579 3.37   0.498 -0.0818
## 4 pumpkin -62.0 -8.09  1.88  -0.355  0.146 
## 5 pumpkin -60.4 -5.64  3.63   1.27   0.412 
## 6 pumpkin -58.2 10.7   0.728  0.417 -0.113

## # A tibble: 6 × 5
##       PC1     PC2   PC3    PC4    PC5
##     <dbl>   <dbl> <dbl>  <dbl>  <dbl>
## 1 -0.387  -0.394  0.546 -0.210 -0.506
## 2 -0.152  -0.238  0.896  0.111  0.145
## 3  0.0262 -0.0858 1.06   0.167 -0.113
## 4 -0.457  -0.438  0.588 -0.170  0.215
## 5 -0.0642 -0.323  1.14   0.471  0.598
## 6  0.461   0.442  0.228  0.135 -0.158

## OPTICS ordering/clustering for 96 objects.
## Parameters: minPts = 5, eps = 3.88284864127684, eps_cl = NA, xi = 0.09
## The clustering contains 8 cluster(s) and 0 noise points.
## 
## Available fields: order, reachdist, coredist, predecessor, minPts, eps,
##                   eps_cl, xi, clusters_xi, cluster

## 
##  1  2  3  4  5  6  7  8 
## 26  2  2 12 23 11 10 10
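
An OPTICS run with xi-based cluster extraction can be sketched as follows. It assumes the dbscan package is installed and uses the built-in `iris` data, standardized, as a stand-in. `minPts = 5` and `xi = 0.09` mirror the parameters printed above; the input data differ from the original.

```r
library(dbscan)

# Stand-in data: standardized iris replaces the principal components above.
X <- scale(as.matrix(iris[, 1:4]))

# Build the OPTICS ordering, then cut it into clusters with the xi method.
opt <- optics(X, minPts = 5)
opt <- extractXi(opt, xi = 0.09)

print(opt)
print(table(opt$cluster))   # cluster sizes; 0, if present, marks noise
```

Smaller values of `xi` accept shallower drops in reachability as cluster boundaries, so the number of clusters found is sensitive to it.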
## # A tibble: 160 × 4
##          db     G1     dunn clusters
##       <dbl>  <dbl>    <dbl>    <int>
##   1   1.15   -4.83   0             8
##   2   1.15   -4.83   0             8
##   3   1.15   -5.70   0             8
##   4   1.01   -6.62   0             7
##   5   1.01   -6.62   0             7
##   6   1.01   -6.67   0             7
##   7   1.01   -6.67   0             7
##   8   1.01   -6.67   0             7
##   9   1.01   -6.67   0             7
##  10   0.864  -7.73   0             6
##  11   0.864  -7.73   0             6
##  12   0.864  -7.73   0             6
##  13   0.864  -7.73   0             6
##  14 NaN     Inf    Inf             2
##  15 NaN     Inf    Inf             2
##  16 NaN     Inf    Inf             2
##  17   1.45   -3.03   0            12
##  18   1.45   -3.03   0            12
##  19   1.45   -3.03   0            12
##  20   1.45   -3.03   0            12
##  21   2.37    2.63   0             9
##  22   2.37    2.63   0             9
##  23   2.37    2.63   0             9
##  24   1.30    6.20   0             7
##  25   1.35   13.1    0             6
##  26   1.03   18.7    0             5
##  27   1.02   15.4    0             5
##  28   1.02   15.4    0             5
##  29   1.02   15.4    0             5
##  30   1.02   15.4    0             5
##  31   1.02   15.4    0             5
##  32   1.21   33.3    0             4
##  33   1.53   34.8    0             9
##  34   1.52   35.0    0             9
##  35   1.86   32.1    0             9
##  36   1.86   32.1    0             9
##  37   1.29   41.2    0             7
##  38   1.12   49.4    0.0733        5
##  39   1.12   49.4    0.0733        5
##  40   1.12   49.4    0.0733        5
##  41   0.924  37.4    0             4
##  42   0.924  37.4    0             4
##  43   0.924  37.4    0             4
##  44   0.924  37.4    0             4
##  45   0.924  37.4    0             4
##  46   0.924  37.4    0             4
##  47   0.924  37.4    0             4
##  48   0.924  37.4    0             4
##  49   1.24   -2.29   0            13
##  50   1.15   -2.54   0            12
##  51   1.15   -2.54   0            12
##  52   1.15   -2.54   0            12
##  53   1.15   -2.54   0            12
##  54   1.15   -2.54   0            12
##  55   1.19   -2.71   0            11
##  56   1.19   -2.71   0            11
##  57   0.526  -3.69   0             8
##  58   0.526  -3.69   0             8
##  59   0.526  -3.69   0             8
##  60   0.526  -3.69   0             8
##  61   0.526  -3.69   0             8
##  62   0.552  -4.04   0             8
##  63   0.374  -4.86   0.444         7
##  64   0.362  -4.65   0.444         6
##  65   1.62   41.4    0            10
##  66   1.62   41.1    0            10
##  67   1.72   37.7    0            11
##  68   1.72   37.7    0            11
##  69   1.64   36.2    0            10
##  70   1.64   36.2    0            10
##  71   1.98   23.4    0             9
##  72   1.98   23.4    0             9
##  73   1.98   23.4    0             9
##  74   2.82   26.8    0             8
##  75   2.82   26.8    0             8
##  76   2.82   26.8    0             8
##  77   2.82   26.8    0             8
##  78   2.55   30.2    0.0332        7
##  79   1.33   -3.90   0.140         7
##  80   1.33   -3.90   0.140         7
##  81   1.39   -3.80   0            10
##  82   1.33   -3.87   0            10
##  83   1.41   -5.10   0             9
##  84   1.41   -5.10   0             9
##  85   1.32   -6.00   0             8
##  86   1.32   -6.00   0             8
##  87   0.831  -6.75   0             7
##  88   0.831  -6.75   0             7
##  89   0.831  -6.75   0             7
##  90   0.831  -6.75   0             7
##  91   0.831  -6.75   0             7
##  92   0.831  -6.75   0             7
##  93   0.894  -7.88   0.0524        6
##  94   0.894  -7.88   0.0524        6
##  95   0.885  -7.85   0             6
##  96   0.885  -7.85   0             6
##  97   0.935  71.5    0             8
##  98   0.938  70.3    0             8
##  99   1.17   67.4    0             7
## 100   1.17   67.4    0             7
## # ℹ 60 more rows
## # A tibble: 11 × 2
##    clusters     n
##       <int> <int>
##  1        2     3
##  2        4    17
##  3        5    17
##  4        6    16
##  5        7    31
##  6        8    26
##  7        9    21
##  8       10    15
##  9       11     4
## 10       12     9
## 11       13     1
## # A tibble: 480 × 3
##    clusters variable value
##       <int> <chr>    <dbl>
##  1        8 db       1.15 
##  2        8 db       1.15 
##  3        8 db       1.15 
##  4        7 db       1.01 
##  5        7 db       1.01 
##  6        7 db       1.01 
##  7        7 db       1.01 
##  8        7 db       1.01 
##  9        7 db       1.01 
## 10        6 db       0.864
## # ℹ 470 more rows

Gaussian mixture models

## Best BIC values:
##              VEV,8      VEV,7      VEV,9
## BIC      -641.4895 -648.43081 -683.59914
## BIC diff    0.0000   -6.94128  -42.10961
## ---------------------------------------------------- 
## Gaussian finite mixture model fitted by EM algorithm 
## ---------------------------------------------------- 
## 
## Mclust VEV (ellipsoidal, equal shape) model with 9 components: 
## 
##  log-likelihood  n  df       BIC       ICL
##        285.7983 96 275 -683.5991 -683.6065
## 
## Clustering table:
##  1  2  3  4  5  6  7  8  9 
## 14 15 13 17  8 14  4  5  6 
## 
## Mixing probabilities:
##          1          2          3          4          5          6          7 
## 0.14583310 0.15622485 0.13541667 0.17707819 0.08332876 0.14586842 0.04166667 
##          8          9 
## 0.05208333 0.06250000 
## 
## Means:
##                    [,1]        [,2]        [,3]        [,4]        [,5]
## palmitic    0.678001586  0.95660881  0.58084185 -1.09147352 -0.88782551
## stearic     0.711495172  1.18356316 -0.09886782 -0.09450105  0.36148137
## oleic      -0.317784336 -0.12933146 -0.70773229 -0.65965900 -0.73148979
## linoleic    0.270891242  0.02685227  0.34170325  0.92746264  0.89973897
## linolenic  -0.655717612 -0.43396761  1.36384333 -0.54634376 -0.57895596
## eicosanoic  0.002672754 -0.37366595 -0.15522530 -0.51048909 -0.09351621
## eicosenoic -0.481998285 -0.33766460 -0.21605271 -0.38749963 -0.24189617
##                  [,6]       [,7]       [,8]       [,9]
## palmitic    0.3272693 -1.4056631  0.8670520 -1.5047547
## stearic    -0.2923862 -1.7070150 -1.1085556 -1.8743694
## oleic       0.3765327  1.3582034  2.5795344  1.5085694
## linoleic   -0.3870366 -1.3284660 -2.3659284 -1.5063227
## linolenic  -0.4509769  1.8775913 -0.5294786  2.2216209
## eicosanoic  1.4500978 -0.5104960 -0.7670804  0.4303134
## eicosenoic  0.7051054 -0.2724654 -0.5169203  2.8239638
## 
## Variances:
## [,,1]
##                 palmitic       stearic         oleic      linoleic
## palmitic    6.494794e-02 -6.843086e-03 -1.229226e-04 -7.982912e-03
## stearic    -6.843086e-03  1.484479e-02  6.916348e-03 -7.150058e-03
## oleic      -1.229226e-04  6.916348e-03  1.041817e-02 -1.017566e-02
## linoleic   -7.982912e-03 -7.150058e-03 -1.017566e-02  1.161607e-02
## linolenic  -6.101915e-03  1.985611e-03  1.778975e-04  2.146903e-04
## eicosanoic  3.688522e-35 -1.177498e-35 -4.861202e-36  1.596136e-36
## eicosenoic -8.390667e-03  5.685010e-03  2.057716e-03 -1.586625e-03
##                linolenic    eicosanoic    eicosenoic
## palmitic   -6.101915e-03  3.688522e-35 -8.390667e-03
## stearic     1.985611e-03 -1.177498e-35  5.685010e-03
## oleic       1.778975e-04 -4.861202e-36  2.057716e-03
## linoleic    2.146903e-04  1.596136e-36 -1.586625e-03
## linolenic   2.147563e-03 -1.022264e-35  1.896699e-03
## eicosanoic -1.022264e-35  9.801348e-06 -1.293142e-35
## eicosenoic  1.896699e-03 -1.293142e-35  4.697768e-03
## [,,2]
##              palmitic      stearic       oleic    linoleic   linolenic
## palmitic    0.2379380  0.224655513 -0.16928156  0.14275640 -0.09425830
## stearic     0.2246555  0.348510263 -0.15161113  0.11796119 -0.14531702
## oleic      -0.1692816 -0.151611133  0.13983328 -0.11825511  0.06204172
## linoleic    0.1427564  0.117961188 -0.11825511  0.10250578 -0.04833637
## linolenic  -0.0942583 -0.145317015  0.06204172 -0.04833637  0.07591222
## eicosanoic -0.1021566  0.003550707  0.05467996 -0.06873647 -0.02814230
## eicosenoic -0.1445860 -0.167104098  0.09829037 -0.08734596  0.07653596
##              eicosanoic  eicosenoic
## palmitic   -0.102156602 -0.14458598
## stearic     0.003550707 -0.16710410
## oleic       0.054679957  0.09829037
## linoleic   -0.068736471 -0.08734596
## linolenic  -0.028142304  0.07653596
## eicosanoic  0.358993611  0.09114322
## eicosenoic  0.091143220  0.13304574
## [,,3]
##                palmitic      stearic       oleic     linoleic   linolenic
## palmitic    0.062513456  0.022156999  0.02785286 -0.017769096 -0.12686684
## stearic     0.022156999  0.101292884  0.01034257  0.002358465 -0.11040346
## oleic       0.027852857  0.010342568  0.03924834 -0.027861957 -0.12678997
## linoleic   -0.017769096  0.002358465 -0.02786196  0.025656483  0.06748354
## linolenic  -0.126866835 -0.110403456 -0.12678997  0.067483539  0.56155261
## eicosanoic  0.002355782 -0.037174710  0.06948293 -0.088455775 -0.12053212
## eicosenoic  0.032404568 -0.135244153  0.08821772 -0.120218538 -0.06029618
##              eicosanoic  eicosenoic
## palmitic    0.002355782  0.03240457
## stearic    -0.037174710 -0.13524415
## oleic       0.069482928  0.08821772
## linoleic   -0.088455775 -0.12021854
## linolenic  -0.120532121 -0.06029618
## eicosanoic  0.638848393  0.63086465
## eicosenoic  0.630864649  0.91011190
## [,,4]
##                 palmitic      stearic         oleic      linoleic    linolenic
## palmitic    0.0100354658  0.010646932 -0.0006206978  0.0009732605  0.013408155
## stearic     0.0106469324  0.052652080 -0.0078137113  0.0098231233  0.004193706
## oleic      -0.0006206978 -0.007813711  0.0385078740 -0.0456710834  0.046746614
## linoleic    0.0009732605  0.009823123 -0.0456710834  0.0574166217 -0.057306193
## linolenic   0.0134081552  0.004193706  0.0467466142 -0.0573061928  0.089682603
## eicosanoic -0.0370081943 -0.064546249 -0.0094036087 -0.0078354085 -0.071928901
## eicosenoic -0.0213444449 -0.067152342  0.0214746636 -0.0358466644  0.003871237
##              eicosanoic   eicosenoic
## palmitic   -0.037008194 -0.021344445
## stearic    -0.064546249 -0.067152342
## oleic      -0.009403609  0.021474664
## linoleic   -0.007835409 -0.035846664
## linolenic  -0.071928901  0.003871237
## eicosanoic  0.303585200  0.153320987
## eicosenoic  0.153320987  0.132155425
## [,,5]
##                palmitic       stearic        oleic      linoleic     linolenic
## palmitic    0.059989510  0.0014756400  0.038414085 -0.0437109412  0.0013983941
## stearic     0.001475640  0.0130844612  0.001276909 -0.0009754307  0.0008428665
## oleic       0.038414085  0.0012769087  0.029421062 -0.0344982202  0.0050401904
## linoleic   -0.043710941 -0.0009754307 -0.034498220  0.0424736428 -0.0077560807
## linolenic   0.001398394  0.0008428665  0.005040190 -0.0077560807  0.0070501263
## eicosanoic -0.012919820 -0.0245793570  0.001183505 -0.0116040966  0.0044039134
## eicosenoic -0.037277886 -0.0206364745 -0.010355252 -0.0031469818  0.0142821510
##              eicosanoic   eicosenoic
## palmitic   -0.012919820 -0.037277886
## stearic    -0.024579357 -0.020636475
## oleic       0.001183505 -0.010355252
## linoleic   -0.011604097 -0.003146982
## linolenic   0.004403913  0.014282151
## eicosanoic  0.132500187  0.123350564
## eicosenoic  0.123350564  0.171453996
## [,,6]
##                palmitic     stearic      oleic    linoleic    linolenic
## palmitic    0.842755009 -0.17050235  0.6592271 -0.70129241  0.003352692
## stearic    -0.170502349  1.34740807 -0.6662530  0.58744476 -0.078370864
## oleic       0.659227140 -0.66625296  1.9885539 -2.13349632  0.093102496
## linoleic   -0.701292410  0.58744476 -2.1334963  2.31391601 -0.090931690
## linolenic   0.003352692 -0.07837086  0.0931025 -0.09093169  0.085831115
## eicosanoic -1.007270622  0.89351625  0.8855371 -1.22902567 -0.005768646
## eicosenoic -0.263874676 -0.35767166  1.7663914 -2.00665568  0.002265431
##              eicosanoic   eicosenoic
## palmitic   -1.007270622 -0.263874676
## stearic     0.893516255 -0.357671664
## oleic       0.885537075  1.766391381
## linoleic   -1.229025669 -2.006655676
## linolenic  -0.005768646  0.002265431
## eicosanoic  7.487630418  4.001442223
## eicosenoic  4.001442223  3.529866165
## [,,7]
##                palmitic      stearic        oleic     linoleic    linolenic
## palmitic    0.025002204  0.006516459 -0.013716832  0.012766184  0.003972187
## stearic     0.006516459  0.012972375 -0.014471234  0.011113402 -0.009260494
## oleic      -0.013716832 -0.014471234  0.031096863 -0.022413114  0.008093337
## linoleic    0.012766184  0.011113402 -0.022413114  0.018814678 -0.007616418
## linolenic   0.003972187 -0.009260494  0.008093337 -0.007616418  0.015142683
## eicosanoic -0.005439128  0.015006920 -0.005896383  0.011315925 -0.026855259
## eicosenoic -0.005449080  0.014155764 -0.006168275  0.010591188 -0.024461254
##              eicosanoic   eicosenoic
## palmitic   -0.005439128 -0.005449080
## stearic     0.015006920  0.014155764
## oleic      -0.005896383 -0.006168275
## linoleic    0.011315925  0.010591188
## linolenic  -0.026855259 -0.024461254
## eicosanoic  0.063350330  0.059973940
## eicosenoic  0.059973940  0.058027297
## [,,8]
##                 palmitic       stearic         oleic      linoleic
## palmitic    1.290500e-02  0.0003329687 -2.230888e-03  0.0020041144
## stearic     3.329687e-04  0.0033412449 -2.158170e-03  0.0021210625
## oleic      -2.230888e-03 -0.0021581696  3.029151e-03 -0.0015274031
## linoleic    2.004114e-03  0.0021210625 -1.527403e-03  0.0019938988
## linolenic  -8.025464e-05 -0.0007451578 -1.875934e-05 -0.0005113572
## eicosanoic  0.000000e+00  0.0000000000  0.000000e+00  0.0000000000
## eicosenoic  0.000000e+00  0.0000000000  0.000000e+00  0.0000000000
##                linolenic   eicosanoic   eicosenoic
## palmitic   -8.025464e-05 0.000000e+00 0.000000e+00
## stearic    -7.451578e-04 0.000000e+00 0.000000e+00
## oleic      -1.875934e-05 0.000000e+00 0.000000e+00
## linoleic   -5.113572e-04 0.000000e+00 0.000000e+00
## linolenic   8.264923e-04 0.000000e+00 0.000000e+00
## eicosanoic  0.000000e+00 5.034339e-05 0.000000e+00
## eicosenoic  0.000000e+00 0.000000e+00 1.997399e-06
## [,,9]
##                palmitic      stearic        oleic     linoleic    linolenic
## palmitic    0.006874564 -0.001287151 -0.009063949  0.005874276 -0.002011160
## stearic    -0.001287151  0.004047970 -0.004442296  0.005421042 -0.007620419
## oleic      -0.009063949 -0.004442296  0.022894849 -0.019555992  0.018265527
## linoleic    0.005874276  0.005421042 -0.019555992  0.019241713 -0.020152944
## linolenic  -0.002011160 -0.007620419  0.018265527 -0.020152944  0.044387654
## eicosanoic  0.007938304  0.007777060 -0.018285124  0.012956265 -0.034900732
## eicosenoic -0.002117548 -0.003028478  0.012110401 -0.013792566  0.020370530
##              eicosanoic   eicosenoic
## palmitic    0.007938304 -0.002117548
## stearic     0.007777060 -0.003028478
## oleic      -0.018285124  0.012110401
## linoleic    0.012956265 -0.013792566
## linolenic  -0.034900732  0.020370530
## eicosanoic  0.168779677  0.028985557
## eicosenoic  0.028985557  0.029960841

## ----------------------------------------------------------------- 
## Dimension reduction for model-based clustering and classification 
## ----------------------------------------------------------------- 
## 
## Mixture model type: Mclust (VEV, 9) 
##         
## Clusters  n
##        1 14
##        2 15
##        3 13
##        4 17
##        5  8
##        6 14
##        7  4
##        8  5
##        9  6
## 
## Estimated basis vectors: 
##                  Dir1       Dir2      Dir3      Dir4      Dir5      Dir6
## palmitic    0.0398550  0.0274799 -0.042044 -0.158205  0.110356 -0.191597
## stearic     0.0513903 -0.0396812 -0.136271 -0.058337 -0.618839  0.094103
## oleic       0.7035073 -0.7456268 -0.594721 -0.678730 -0.566964 -0.622176
## linoleic    0.6720310 -0.6585070 -0.751601 -0.700154 -0.186876 -0.746142
## linolenic  -0.2212336 -0.0871214 -0.104507 -0.106642 -0.243924 -0.078350
## eicosanoic  0.0160890 -0.0045188  0.070768 -0.014710  0.434002  0.041523
## eicosenoic  0.0043908 -0.0216544 -0.212451 -0.095338 -0.025151 -0.052364
##                 Dir7
## palmitic    0.095723
## stearic     0.047857
## oleic       0.679406
## linoleic    0.709572
## linolenic   0.140340
## eicosanoic  0.057310
## eicosenoic -0.021957
## 
##                Dir1    Dir2    Dir3    Dir4     Dir5      Dir6       Dir7
## Eigenvalues  1.7729  1.5348  1.3844  1.2009  0.58992  0.036647 2.0756e-04
## Cum. %      27.1933 50.7343 71.9677 90.3865 99.43472 99.996816 1.0000e+02

## Class_GMM
##  1  2  3  4  5  6  7  8  9 
## 14 15 13 17  8 14  4  5  6
##          
## Class_GMM corn olive peanut pumpkin rapeseed soybean sunflower
##         1    0     0      0      14        0       0         0
##         2    0     0      0      15        0       0         0
##         3    0     0      0       2        0      11         0
##         4    0     0      0       0        0       0        17
##         5    0     0      0       2        0       0         6
##         6    2     2      3       4        0       0         3
##         7    0     0      0       0        4       0         0
##         8    0     5      0       0        0       0         0
##         9    0     0      0       0        6       0         0
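
One way to summarise the cross-tabulation above is cluster purity: assign each cluster to its most frequent oil class and count the fraction of samples that agree with that assignment. The sketch below retypes the counts from the table by hand; the names `tab_gmm`, `cluster_purity` and `majority` are illustrative and do not come from the original analysis.

```r
# Counts copied by hand from the Class_GMM vs. class table above
# (rows = GMM clusters 1..9, columns = oil classes).
classes <- c("corn", "olive", "peanut", "pumpkin",
             "rapeseed", "soybean", "sunflower")
tab_gmm <- matrix(
  c(0, 0, 0, 14, 0,  0,  0,
    0, 0, 0, 15, 0,  0,  0,
    0, 0, 0,  2, 0, 11,  0,
    0, 0, 0,  0, 0,  0, 17,
    0, 0, 0,  2, 0,  0,  6,
    2, 2, 3,  4, 0,  0,  3,
    0, 0, 0,  0, 4,  0,  0,
    0, 5, 0,  0, 0,  0,  0,
    0, 0, 0,  0, 6,  0,  0),
  nrow = 9, byrow = TRUE,
  dimnames = list(cluster = as.character(1:9), class = classes)
)

# Hypothetical helper: share of samples that fall in the majority
# class of their cluster.
cluster_purity <- function(tab) sum(apply(tab, 1, max)) / sum(tab)

purity_gmm <- cluster_purity(tab_gmm)

# Majority class of each cluster (first maximum wins on ties).
majority <- classes[apply(tab_gmm, 1, which.max)]

print(round(purity_gmm, 3))  # 82 of the 96 samples match
print(majority)
```

Purity tends to rise mechanically as the number of clusters grows, so it is most informative when comparing solutions with a similar number of components.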

## Best BIC values:
##              VEI,7      VEI,9       VEI,5
## BIC      -559.1728 -561.98725 -563.379201
## BIC diff    0.0000   -2.81443   -4.206377
## ---------------------------------------------------- 
## Gaussian finite mixture model fitted by EM algorithm 
## ---------------------------------------------------- 
## 
## Mclust VEI (diagonal, equal shape) model with 7 components: 
## 
##  log-likelihood  n df       BIC       ICL
##       -197.4281 96 36 -559.1728 -564.7159
## 
## Clustering table:
##  1  2  3  4  5  6  7 
## 29 22 23  6  6  6  4 
## 
## Mixing probabilities:
##          1          2          3          4          5          6          7 
## 0.30691940 0.22028499 0.23959707 0.06193194 0.06747129 0.06214134 0.04165398 
## 
## Means:
##           [,1]        [,2]       [,3]      [,4]       [,5]      [,6]       [,7]
## PC1 -0.7052117 -0.09433951 -0.5756359 1.7897450  1.2076003  1.155634  2.6651009
## PC2 -0.7695881  0.32359665  1.0339965 0.7456446 -0.4795708 -2.234596  1.0134525
## PC3 -0.2100309  0.14579246  0.2230587 1.2108666 -2.4301516  1.301180 -0.3116263
## 
## Variances:
## [,,1]
##            PC1        PC2       PC3
## PC1 0.04246937 0.00000000 0.0000000
## PC2 0.00000000 0.07457073 0.0000000
## PC3 0.00000000 0.00000000 0.2123919
## [,,2]
##            PC1       PC2       PC3
## PC1 0.07573322 0.0000000 0.0000000
## PC2 0.00000000 0.1329778 0.0000000
## PC3 0.00000000 0.0000000 0.3787465
## [,,3]
##            PC1        PC2        PC3
## PC1 0.01172287 0.00000000 0.00000000
## PC2 0.00000000 0.02058385 0.00000000
## PC3 0.00000000 0.00000000 0.05862679
## [,,4]
##           PC1       PC2       PC3
## PC1 0.1247447 0.0000000 0.0000000
## PC2 0.0000000 0.2190356 0.0000000
## PC3 0.0000000 0.0000000 0.6238559
## [,,5]
##           PC1       PC2      PC3
## PC1 0.4914034 0.0000000 0.000000
## PC2 0.0000000 0.8628411 0.000000
## PC3 0.0000000 0.0000000 2.457539
## [,,6]
##            PC1        PC2        PC3
## PC1 0.01596124 0.00000000 0.00000000
## PC2 0.00000000 0.02802589 0.00000000
## PC3 0.00000000 0.00000000 0.07982318
## [,,7]
##             PC1         PC2        PC3
## PC1 0.003531958 0.000000000 0.00000000
## PC2 0.000000000 0.006201662 0.00000000
## PC3 0.000000000 0.000000000 0.01766354
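
As a sanity check, the reported BIC can be reproduced from the other figures in the summary. mclust reports BIC on the scale where larger values are better, so with the log-likelihood, `df` and `n` printed above:

$$
\mathrm{BIC} = 2\log L - \mathrm{df}\,\log n = 2(-197.4281) - 36\,\log 96 \approx -394.856 - 164.317 = -559.173
$$

This agrees with the printed value of -559.1728. The 36 free parameters are also consistent with a 7-component VEI model in 3 dimensions: 21 for the means, 6 for the mixing proportions, and 9 for the diagonal covariances (7 volumes plus 2 for the shared shape).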

## ----------------------------------------------------------------- 
## Dimension reduction for model-based clustering and classification 
## ----------------------------------------------------------------- 
## 
## Mixture model type: Mclust (VEI, 7) 
##         
## Clusters  n
##        1 29
##        2 22
##        3 23
##        4  6
##        5  6
##        6  6
##        7  4
## 
## Estimated basis vectors: 
##          Dir1     Dir2      Dir3
## PC1 -0.938582  0.34482 -0.012735
## PC2 -0.333815 -0.91673 -0.219473
## PC3 -0.087353 -0.20174  0.975535
## 
##                Dir1    Dir2      Dir3
## Eigenvalues  1.7092  1.5513   0.77493
## Cum. %      42.3546 80.7966 100.00000
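
The `Cum. %` row above can be read as the running share of the eigenvalue total. A minimal sketch, assuming the rounded eigenvalues printed in the summary (so the last digits may differ slightly from the output); the variable names are illustrative:

```r
# Eigenvalues copied (rounded) from the summary above.
eigenvalues <- c(Dir1 = 1.7092, Dir2 = 1.5513, Dir3 = 0.77493)

# Cumulative share of the eigenvalue total, in percent.
cum_pct <- 100 * cumsum(eigenvalues) / sum(eigenvalues)

print(round(cum_pct, 2))
```

Read this way, the first two directions account for roughly 81% of the total, which is why a two-dimensional plot of `Dir1` against `Dir2` already shows most of the cluster separation.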

## Class_GMM_pca
##  1  2  3  4  5  6  7 
## 29 22 23  6  6  6  4
##              
## Class_GMM_pca corn olive peanut pumpkin rapeseed soybean sunflower
##             1    0     0      0      29        0       0         0
##             2    2     0      0       7        0      11         2
##             3    0     0      0       0        0       0        23
##             4    0     0      0       0        6       0         0
##             5    0     1      3       1        0       0         1
##             6    0     6      0       0        0       0         0
##             7    0     0      0       0        4       0         0
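
Agreement between the PCA-based clusters and the known oil classes can also be condensed into a single number, the adjusted Rand index (ARI), which equals 1 for identical partitions and is close to 0 for random ones. The sketch below computes it in base R directly from the counts in the table above, retyped by hand; `tab_pca` and `ari_from_table` are hypothetical names introduced only for this illustration.

```r
# Counts copied by hand from the Class_GMM_pca vs. class table above
# (rows = clusters 1..7, columns = oil classes).
classes <- c("corn", "olive", "peanut", "pumpkin",
             "rapeseed", "soybean", "sunflower")
tab_pca <- matrix(
  c(0, 0, 0, 29, 0,  0,  0,
    2, 0, 0,  7, 0, 11,  2,
    0, 0, 0,  0, 0,  0, 23,
    0, 0, 0,  0, 6,  0,  0,
    0, 1, 3,  1, 0,  0,  1,
    0, 6, 0,  0, 0,  0,  0,
    0, 0, 0,  0, 4,  0,  0),
  nrow = 7, byrow = TRUE,
  dimnames = list(cluster = as.character(1:7), class = classes)
)

# Hypothetical helper: adjusted Rand index from a contingency table.
ari_from_table <- function(tab) {
  pairs_cells <- sum(choose(tab, 2))           # pairs together in both partitions
  pairs_rows  <- sum(choose(rowSums(tab), 2))  # pairs together within clusters
  pairs_cols  <- sum(choose(colSums(tab), 2))  # pairs together within classes
  expected    <- pairs_rows * pairs_cols / choose(sum(tab), 2)
  (pairs_cells - expected) / ((pairs_rows + pairs_cols) / 2 - expected)
}

ari_pca <- ari_from_table(tab_pca)
print(round(ari_pca, 3))
```

When the two label vectors themselves are available, `mclust::adjustedRandIndex(x, y)` computes the same quantity without retyping the table.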