Clustering example using Wholesale Spending data

Nimish Sanghi
05 November, 2016

Description of Data

The data comes from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Wholesale+customers

  • The variables used in this demonstration record each wholesale customer's annual spending, in monetary units (m.u.), under six major product categories:

    • FRESH: fresh products
    • MILK: milk products
    • GROCERY: grocery products
    • FROZEN: frozen products
    • DETERGENTS_PAPER: detergents and paper products
    • DELICATESSEN: delicatessen products
  • The dataset contains two other variables, which are not used in this demonstration:

    • CHANNEL: the customer's channel
    • REGION: the customer's region

Descriptive Statistics

The descriptive statistics of the data are:

    Channel          Region          Fresh             Milk      
 Min.   :1.000   Min.   :1.000   Min.   :     3   Min.   :   55  
 1st Qu.:1.000   1st Qu.:2.000   1st Qu.:  3128   1st Qu.: 1533  
 Median :1.000   Median :3.000   Median :  8504   Median : 3627  
 Mean   :1.323   Mean   :2.543   Mean   : 12000   Mean   : 5796  
 3rd Qu.:2.000   3rd Qu.:3.000   3rd Qu.: 16934   3rd Qu.: 7190  
 Max.   :2.000   Max.   :3.000   Max.   :112151   Max.   :73498  
    Grocery          Frozen        Detergents_Paper    Delicassen     
 Min.   :    3   Min.   :   25.0   Min.   :    3.0   Min.   :    3.0  
 1st Qu.: 2153   1st Qu.:  742.2   1st Qu.:  256.8   1st Qu.:  408.2  
 Median : 4756   Median : 1526.0   Median :  816.5   Median :  965.5  
 Mean   : 7951   Mean   : 3071.9   Mean   : 2881.5   Mean   : 1524.9  
 3rd Qu.:10656   3rd Qu.: 3554.2   3rd Qu.: 3922.0   3rd Qu.: 1820.2  
 Max.   :92780   Max.   :60869.0   Max.   :40827.0   Max.   :47943.0  
  • Only the last six dimensions (Fresh, Milk, Grocery, Frozen, Detergents_Paper, Delicassen) were used for the clustering analysis.
  • The data was further preprocessed to remove some outliers, so that the remaining points are more spread out on the plot.
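
The source does not say which rule was used to remove outliers. As a minimal sketch only, assuming a simple per-column quantile cutoff (an assumption, not the author's stated method), rows whose spending in any category exceeds the cutoff could be dropped as follows. The function names, the 0.95 default, and the sample rows are hypothetical, and the example is written in Python purely for illustration.

```python
import math
from typing import List, Sequence


def column_cutoffs(rows: Sequence[Sequence[float]], q: float) -> List[float]:
    """Return the nearest-rank q-quantile of every column."""
    n = len(rows)
    # Nearest-rank position, clamped to a valid index.
    idx = min(n - 1, max(0, math.ceil(q * n) - 1))
    return [sorted(col)[idx] for col in zip(*rows)]


def drop_outliers(rows: Sequence[Sequence[float]], q: float = 0.95) -> List[List[float]]:
    """Keep only rows whose value in every column is at or below that column's cutoff."""
    cutoffs = column_cutoffs(rows, q)
    return [list(r) for r in rows if all(v <= c for v, c in zip(r, cutoffs))]


if __name__ == "__main__":
    # Made-up (Fresh, Milk) spending rows, not values taken from the dataset.
    sample = [[3000, 1500], [8500, 3600], [12000, 5800], [17000, 7200], [112000, 9000]]
    # With q=0.8 the last row exceeds both column cutoffs and is dropped.
    print(drop_outliers(sample, q=0.8))
```

A lower quantile trims more aggressively, which spreads the remaining points further across the plot at the cost of discarding more customers.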

A sample clustering of the data is shown below.

[Figure: sample clustering of the data]

How to Use

  • Go to https://nsanghi.shinyapps.io/Wholesale_clustering/
  • Choose the two dimensions you want to plot and use for clustering.
  • Choose the number of clusters you want to create on the plotted data.
  • The data points are then plotted and color-coded by cluster assignment.
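
The steps above can be sketched in code. The source does not name the clustering algorithm, so this sketch assumes plain k-means (Lloyd's algorithm) run on the two chosen columns. The function names, the seed, and the sample records are hypothetical, and this is an illustration in Python rather than the app's own code.

```python
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def select_dimensions(records: Sequence[Dict[str, float]], x: str, y: str) -> List[Point]:
    """Project each record onto the two columns chosen for plotting."""
    return [(float(r[x]), float(r[y])) for r in records]


def kmeans(points: Sequence[Point], k: int, iterations: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Cluster 2-D points with Lloyd's algorithm; return (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(list(points), k)  # start from k distinct data points
    labels = [-1] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2
                                        + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's center where it is
                centers[c] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return labels, centers


if __name__ == "__main__":
    # Made-up records using column names from the summary table.
    records = [
        {"Fresh": 3000, "Milk": 1500}, {"Fresh": 3200, "Milk": 1400},
        {"Fresh": 16000, "Milk": 7000}, {"Fresh": 17000, "Milk": 7200},
    ]
    points = select_dimensions(records, "Fresh", "Milk")
    labels, centers = kmeans(points, k=2)
    # Each label is a cluster index that a plot could map to a color.
    print(labels, centers)
```

The returned labels play the role of the color coding: points that share a label would be drawn in the same color.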