The wine example using K-Means (Clustering technique)

Author : Edurne Alonso Moran

Introduction

A dataset containing 13 chemical measurements (Alcohol, Malic acid, Ash, Alcalinity of ash, Magnesium, Total phenols, Flavanoids, Nonflavanoid phenols, Proanthocyanins, Color intensity, Hue, OD280/OD315 of diluted wines and Proline) on 178 Italian wine samples is analyzed.

These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars.

Methodology

1) K-means clustering is applied to the data. K-means is one of the most common partitioning methods: it partitions n observations into k clusters so that each observation belongs to the cluster with the nearest mean (centroid), which serves as the prototype of the cluster. The result is a partition of the data space into Voronoi cells.

2) Because K-means starts from k randomly chosen centroids, a different solution can be obtained each time the function is invoked. In addition, the variables differ widely in range, so they are standardized prior to clustering.

3) In the Shiny app, the number of clusters (k) is chosen by the user, and the cluster graph is redrawn each time the user changes it.
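Steps 1 and 2 can be illustrated with a small sketch. It does not use the wine data: the matrix `x` below is synthetic stand-in data invented for illustration, and the choice of `nstart = 25` is an assumption, not a setting taken from this project. The sketch standardizes the variables, runs K-means from several random starts, and then checks the nearest-mean property described in step 1.

```r
# Synthetic stand-in for the wine measurements (assumed data): two variables
# on very different scales, three groups of 50 observations each.
set.seed(42)
group <- rep(1:3, each = 50)
x <- cbind(
  small = rnorm(150, mean = c(0, 3, 6)[group], sd = 0.5),
  large = rnorm(150, mean = c(100, 800, 400)[group], sd = 50)
)

# Standardize so that each variable has mean 0 and standard deviation 1.
z <- scale(x)

# nstart = 25 runs K-means from 25 random initializations and keeps the
# solution with the lowest total within-cluster sum of squares, which makes
# the result less dependent on the random starting centroids.
fit <- kmeans(z, centers = 3, nstart = 25)

# Check the defining property: every observation is assigned to the cluster
# whose centroid (mean) is nearest in Euclidean distance.
d <- as.matrix(dist(rbind(fit$centers, z)))[-(1:3), 1:3]
nearest <- unname(apply(d, 1, which.min))
all(nearest == fit$cluster)
```

Without standardization, the variable called `large` here would dominate the Euclidean distances, which is the reason for scaling the wine measurements before clustering.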

R code and plot

R code

## Loading required package: MASS
## This is vegan 2.3-1

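# Assumes vegan and the package providing `wine` are already loaded
# (see the startup messages above).
data(wine)
# Drop the first column (assumed to be the cultivar label) and standardize
# the 13 measurements.
df <- scale(wine[-1])
# K-means with k = 3 and randomly chosen starting centroids.
km <- kmeans(df, 3)
clusters <- factor(km$cluster)
groups <- levels(clusters)
# Project the samples onto two dimensions with classical MDS.
cmd <- cmdscale(dist(df))
ordiplot(cmd)
# Color the points by cluster.
for (i in seq_along(groups)) {
  points(cmd[clusters == groups[i], ], col = i, pch = 16)
}
# Join each point to its cluster centroid and outline each cluster.
ordispider(cmd, clusters, label = TRUE)
ordihull(cmd, clusters, lty = "dotted")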

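R plot (fixed to 3 clusters)

[Figure: two-dimensional MDS ordination of the standardized wine data, with points colored by K-means cluster, spider lines to each cluster centroid and dotted convex hulls]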

My Shiny App

I then implemented the code shown above as an interactive web application using Shiny. This is a new package from RStudio that makes it incredibly easy to build interactive web applications with R.

The application designed for this project can be found at the following URL: https://edurnita.shinyapps.io/my_app
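The source of the deployed app is not reproduced here. As a rough idea of how such an app could be structured, the following is a minimal sketch, not the actual implementation. The helper `cluster_coords`, the input id `k`, the output id `clusters`, the slider range and the use of the built-in `iris` measurements in place of the wine data are all assumptions made for illustration.

```r
# Hypothetical helper: standardize a numeric data set, run K-means with k
# clusters, and return 2-D classical MDS coordinates plus cluster labels.
cluster_coords <- function(data, k) {
  z <- scale(data)
  fit <- kmeans(z, centers = k, nstart = 10)
  xy <- cmdscale(dist(z))
  data.frame(x = xy[, 1], y = xy[, 2], cluster = factor(fit$cluster))
}

# The app is launched only in an interactive session with shiny installed.
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  ui <- shiny::fluidPage(
    shiny::sliderInput("k", "Number of clusters (k)",
                       min = 2, max = 8, value = 3),
    shiny::plotOutput("clusters")
  )

  server <- function(input, output) {
    # renderPlot re-runs whenever input$k changes, so the cluster graph
    # is redrawn each time the user moves the slider.
    output$clusters <- shiny::renderPlot({
      # iris is a stand-in here; the real app would use the wine data.
      res <- cluster_coords(iris[, 1:4], input$k)
      plot(res$x, res$y, col = res$cluster, pch = 16,
           xlab = "MDS 1", ylab = "MDS 2")
    })
  }

  shiny::runApp(shiny::shinyApp(ui, server))
}
```

Keeping the clustering in a plain function separate from the Shiny code makes it possible to check the computation outside the app, for example with `cluster_coords(iris[, 1:4], 3)`.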