Intro

I want to cluster coffees based on their characteristics.

Read Dataset

#> 'data.frame':    1082 obs. of  13 variables:
#>  $ coffeeId     : int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ Aroma        : num  7.83 8 7.92 8 8.33 8 7.67 7.67 7.67 7.67 ...
#>  $ Flavor       : num  8.08 7.75 7.83 7.92 7.83 7.92 7.75 7.75 7.75 7.83 ...
#>  $ Aftertaste   : num  7.75 7.92 7.92 7.92 7.83 7.67 7.83 7.83 7.58 7.83 ...
#>  $ Acidity      : num  7.92 8 8 7.75 7.75 8 7.83 7.67 7.83 7.83 ...
#>  $ Body         : num  8.25 7.92 7.83 7.83 8.25 7.75 7.92 7.92 7.83 7.92 ...
#>  $ Balance      : num  7.92 7.92 7.92 7.75 7.75 7.92 7.75 7.83 8 7.75 ...
#>  $ Uniformity   : num  10 10 10 10 10 10 10 10 10 10 ...
#>  $ Clean.Cup    : num  10 10 10 10 10 10 10 10 10 10 ...
#>  $ Sweetness    : num  8 8 7.83 7.75 7.58 7.75 8 7.92 7.92 7.75 ...
#>  $ Cupper.Points: num  8 8 8 8.08 7.67 7.75 7.83 7.92 7.92 7.83 ...
#>  $ Moisture     : num  0.12 0 0 0.12 0.12 0 0 0.1 0.09 0.12 ...
#>  $ Quakers      : int  0 0 0 0 0 0 0 0 0 0 ...

Principal Component Analysis

Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The transformation is defined so that the first principal component has the largest possible variance (that is, it accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components. The resulting vectors (each a linear combination of the variables, with one value per observation) form an uncorrelated orthogonal basis set. PCA is sensitive to the relative scaling of the original variables, which is why the analysis below is run on the scaled data (coffee_scale) with scale.unit = F.
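To make the description above concrete, here is a minimal sketch in base R. It uses prcomp() rather than the PCA() call shown in the output below, and the data frame toy is made-up data standing in for the coffee scores, so the numbers it prints are not results from this analysis. It scales the variables first, then builds a variance table with the same three rows as the eigenvalue summary (variance, % of variance, cumulative %).

```r
# Toy data standing in for the coffee scores (hypothetical values, not the real dataset)
set.seed(42)
n <- 200
quality <- rnorm(n)  # shared latent factor that makes the three scores correlated
toy <- data.frame(
  Aroma    = 7.5 + 0.3 * quality + rnorm(n, sd = 0.1),
  Flavor   = 7.5 + 0.3 * quality + rnorm(n, sd = 0.1),
  Body     = 7.5 + 0.3 * quality + rnorm(n, sd = 0.1),
  Moisture = runif(n, 0, 0.15)
)

# Standardize first: PCA is sensitive to the relative scale of the variables
toy_scale <- scale(toy)

# The data are already centered and scaled, so prcomp() does neither again
pca <- prcomp(toy_scale, center = FALSE, scale. = FALSE)

# Eigenvalues are the variances of the principal components
eig <- pca$sdev^2
var_table <- data.frame(
  variance = eig,
  pct      = 100 * eig / sum(eig),
  cum_pct  = 100 * cumsum(eig) / sum(eig)
)
print(round(var_table, 3))

# The component scores are linearly uncorrelated with each other
score_cor <- cor(pca$x)
print(round(score_cor, 3))
```

With standardized input the eigenvalues sum to the number of variables, so each eigenvalue can be read as "how many original variables' worth of variance" a component carries.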

#> 
#> Call:
#> PCA(X = coffee_scale, scale.unit = F, ncp = 13, graph = F) 
#> 
#> 
#> Eigenvalues
#>                        Dim.1   Dim.2   Dim.3   Dim.4   Dim.5   Dim.6
#> Variance               6.938   1.443   0.996   0.941   0.630   0.474
#> % of var.             53.418  11.114   7.670   7.244   4.853   3.653
#> Cumulative % of var.  53.418  64.531  72.202  79.446  84.299  87.951
#>                        Dim.7   Dim.8   Dim.9  Dim.10  Dim.11  Dim.12
#> Variance               0.353   0.313   0.247   0.230   0.175   0.155
#> % of var.              2.716   2.406   1.902   1.771   1.346   1.192
#> Cumulative % of var.  90.668  93.074  94.976  96.746  98.092  99.284
#>                       Dim.13
#> Variance               0.093
#> % of var.              0.716
#> Cumulative % of var. 100.000
#> 
#> Individuals (the 10 first)
#>                   Dist    Dim.1    ctr   cos2    Dim.2    ctr   cos2  
#> 1             |  4.858 |  2.793  0.104  0.331 | -2.668  0.456  0.302 |
#> 2             |  5.010 |  2.743  0.100  0.300 | -3.450  0.762  0.474 |
#> 3             |  5.123 |  2.623  0.092  0.262 | -3.604  0.832  0.495 |
#> 4             |  4.770 |  2.278  0.069  0.228 | -2.830  0.513  0.352 |
#> 5             |  5.331 |  2.434  0.079  0.208 | -2.997  0.575  0.316 |
#> 6             |  5.049 |  2.281  0.069  0.204 | -3.586  0.823  0.504 |
#> 7             |  4.597 |  1.978  0.052  0.185 | -3.284  0.691  0.510 |
#> 8             |  4.232 |  1.801  0.043  0.181 | -2.679  0.460  0.401 |
#> 9             |  4.208 |  1.792  0.043  0.181 | -2.740  0.481  0.424 |
#> 10            |  4.503 |  1.805  0.043  0.161 | -2.717  0.473  0.364 |
#>                Dim.3    ctr   cos2  
#> 1              0.367  0.012  0.006 |
#> 2              0.076  0.001  0.000 |
#> 3              0.106  0.001  0.000 |
#> 4              0.432  0.017  0.008 |
#> 5              0.446  0.018  0.007 |
#> 6              0.109  0.001  0.000 |
#> 7              0.050  0.000  0.000 |
#> 8              0.315  0.009  0.006 |
#> 9              0.280  0.007  0.004 |
#> 10             0.392  0.014  0.008 |
#> 
#> Variables (the 10 first)
#>                  Dim.1    ctr   cos2    Dim.2    ctr   cos2    Dim.3
#> coffeeId      | -0.746  8.025  0.557 |  0.384 10.215  0.148 | -0.083
#> Aroma         |  0.855 10.542  0.732 | -0.066  0.303  0.004 |  0.021
#> Flavor        |  0.940 12.742  0.885 | -0.075  0.388  0.006 |  0.018
#> Aftertaste    |  0.933 12.536  0.871 | -0.087  0.520  0.008 |  0.013
#> Acidity       |  0.874 11.012  0.765 | -0.083  0.473  0.007 | -0.004
#> Body          |  0.854 10.518  0.730 | -0.084  0.490  0.007 | -0.007
#> Balance       |  0.890 11.416  0.793 | -0.076  0.401  0.006 | -0.005
#> Uniformity    |  0.596  5.122  0.356 |  0.519 18.670  0.270 | -0.051
#> Clean.Cup     |  0.534  4.107  0.285 |  0.523 18.952  0.274 | -0.067
#> Sweetness     |  0.412  2.445  0.170 |  0.733 37.229  0.538 | -0.104
#>                  ctr   cos2  
#> coffeeId       0.687  0.007 |
#> Aroma          0.045  0.000 |
#> Flavor         0.031  0.000 |
#> Aftertaste     0.018  0.000 |
#> Acidity        0.001  0.000 |
#> Body           0.005  0.000 |
#> Balance        0.003  0.000 |
#> Uniformity     0.261  0.003 |
#> Clean.Cup      0.455  0.005 |
#> Sweetness      1.077  0.011 |

Plot the PCA as well to check for outliers.

#> $Dim.1
#> $Dim.1$quanti
#>               correlation       p.value
#> Flavor          0.9406636  0.000000e+00
#> Aftertaste      0.9330372  0.000000e+00
#> Balance         0.8903672  0.000000e+00
#> Cupper.Points   0.8775643  0.000000e+00
#> Acidity         0.8744702  0.000000e+00
#> Aroma           0.8555946 4.061215e-311
#> Body            0.8546484 1.049902e-309
#> Uniformity      0.5963953 3.319503e-105
#> Clean.Cup       0.5340393  8.258183e-81
#> Sweetness       0.4120568  1.347442e-45
#> Moisture       -0.1751626  6.631645e-09
#> coffeeId       -0.7464989 2.702566e-193
#> 
#> 
#> $Dim.2
#> $Dim.2$quanti
#>               correlation       p.value
#> Sweetness      0.73340077 3.081049e-183
#> Clean.Cup      0.52327594  4.265727e-77
#> Uniformity     0.51937303  8.785091e-76
#> coffeeId       0.38416983  2.236144e-39
#> Moisture       0.37555249  1.424046e-37
#> Quakers        0.13601584  7.129601e-06
#> Aroma         -0.06616688  2.952929e-02
#> Flavor        -0.07489075  1.373761e-02
#> Balance       -0.07615240  1.222151e-02
#> Acidity       -0.08264565  6.527425e-03
#> Body          -0.08411397  5.630897e-03
#> Aftertaste    -0.08666795  4.332094e-03
#> Cupper.Points -0.13789572  5.302953e-06
#> 
#> 
#> $Dim.3
#> $Dim.3$quanti
#>           correlation      p.value
#> Quakers    0.97902816 0.0000000000
#> Moisture   0.11155973 0.0002361216
#> Clean.Cup -0.06733053 0.0267800937
#> coffeeId  -0.08273990 0.0064662377
#> Sweetness -0.10363297 0.0006398488

The plot shows that observation 1082 is an outlier, so I remove it before clustering.
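Removing a single row by index can be sketched as follows. This is an illustration on made-up data, not the code used for this analysis: toy and the injected extreme row are assumptions, and the outlier is located by its distance from the origin in PCA space instead of by reading it off a plot.

```r
# Toy data with one artificial extreme row (hypothetical, not the coffee dataset)
set.seed(1)
toy <- data.frame(
  Aroma  = rnorm(50, mean = 7.5, sd = 0.3),
  Flavor = rnorm(50, mean = 7.5, sd = 0.3)
)
toy[50, ] <- c(0, 0)  # inject an obviously extreme observation

# Distance of each individual from the centre of the PCA space
pca <- prcomp(scale(toy))
dist_origin <- sqrt(rowSums(pca$x^2))

# The row furthest from the centre is the outlier candidate
outlier_row <- unname(which.max(dist_origin))
print(outlier_row)

# Drop that row by negative indexing
toy_clean <- toy[-outlier_row, , drop = FALSE]
print(nrow(toy_clean))
```

On the real data the equivalent step would be a negative index on row 1082, for example coffee[-1082, ] if the data frame were named coffee (the name is an assumption).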

Making Cluster
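
The clustering code is not shown here, so the following is only a sketch of one common way to form 3 clusters, assuming k-means on the scaled variables. The data frame toy is made up, and the choice of kmeans() with nstart = 25 is an assumption, not necessarily the method used for the coffee data.

```r
# Toy data with three well-separated groups (hypothetical values)
set.seed(123)
toy <- data.frame(
  Flavor    = c(rnorm(30, 6.5, 0.1), rnorm(30, 7.5, 0.1), rnorm(30, 8.5, 0.1)),
  Sweetness = c(rnorm(30, 9.0, 0.1), rnorm(30, 10.0, 0.1), rnorm(30, 9.5, 0.1))
)

# Scale so that no single variable dominates the Euclidean distances
toy_scale <- scale(toy)

# k-means with 3 centres; several random starts make the result more stable
km <- kmeans(toy_scale, centers = 3, nstart = 25)

# Attach the cluster label to the original (unscaled) data
toy$cluster <- factor(km$cluster)
print(table(toy$cluster))
```

The cluster numbers returned by kmeans() are arbitrary labels, so "cluster 1" in one run can correspond to "cluster 3" in another unless the seed is fixed.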

Visualizing the cluster

Conclusion

After clustering the coffees into 3 classes, I want to see the characteristics of each cluster based on aroma, sweetness, flavor, body, and acidity. Coffees in cluster 3 have the highest values for all of these characteristics except sweetness. Coffees in cluster 1 have the lowest values for all of them.
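
A per-cluster comparison like this can be computed by averaging each characteristic within each cluster. The sketch below uses base R's aggregate() on a small made-up table; the values are illustrative only and are not the coffee results.

```r
# Made-up scores with cluster labels already assigned (hypothetical values)
toy <- data.frame(
  cluster   = factor(c(1, 1, 2, 2, 3, 3)),
  Aroma     = c(7.0, 7.2, 7.5, 7.7, 8.0, 8.2),
  Sweetness = c(9.0, 9.2, 10.0, 9.8, 9.5, 9.7)
)

# Mean of each characteristic within each cluster
profile <- aggregate(cbind(Aroma, Sweetness) ~ cluster, data = toy, FUN = mean)
print(profile)

# Which cluster scores highest on each characteristic
best_aroma     <- as.character(profile$cluster[which.max(profile$Aroma)])
best_sweetness <- as.character(profile$cluster[which.max(profile$Sweetness)])
print(c(Aroma = best_aroma, Sweetness = best_sweetness))
```

Reading the profile table row by row gives the kind of statement made above: which cluster is highest or lowest on each characteristic.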