12/17/2021

Why Cluster?

Clustering is an unsupervised learning technique. It groups a set of objects so that objects in the same cluster are more similar to each other than to objects in other clusters. Similarity is a numerical measure of how closely two data objects are related; it is often derived from a distance, such as the Euclidean distance between their feature values.
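To make the idea of similarity concrete, here is a minimal sketch that measures it as Euclidean distance with base R's `dist()`. The three rows are made-up values chosen for illustration, not data from this report.

```r
# Three hypothetical diners: total bill, tip, party size (made-up values).
diners <- rbind(
  a = c(total_bill = 16.0, tip = 2.5, size = 2),
  b = c(total_bill = 17.0, tip = 2.5, size = 2),
  c = c(total_bill = 40.0, tip = 6.5, size = 4)
)

# Pairwise Euclidean distances; a smaller distance means more similar.
d <- as.matrix(dist(diners, method = "euclidean"))
print(round(d, 2))
```

Rows `a` and `b` differ only by one unit of total bill, so they sit much closer to each other than either does to `c`, and a clustering method would tend to group them together.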

Clustering is mainly used for exploratory data mining. It appears in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics.

Clustering can be broadly divided into two subgroups:

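1. Hard clustering: each object belongs to exactly one cluster. K-means, used below, is a hard clustering method.

2. Soft clustering: each object is assigned a degree or probability of membership in every cluster rather than a single label.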
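The difference between the two subgroups can be sketched in a few lines of base R. The data and the two cluster centers below are hypothetical, and the soft memberships use a fuzzy c-means style weighting (fuzzifier m = 2) as one possible choice, not the only way to do soft clustering.

```r
# Hypothetical 1-D data: two obvious groups plus one point in between (6).
x <- c(1, 2, 3, 10, 11, 12, 6)
centers <- c(2, 11)  # assumed, fixed cluster centers

# Squared distance from every point (rows) to every center (columns).
d2 <- outer(x, centers, function(p, ctr) (p - ctr)^2)

# Hard clustering: each point gets exactly one label, the nearest center.
hard <- max.col(-d2, ties.method = "first")

# Soft clustering: weights proportional to 1 / squared distance,
# normalised so that each row sums to 1.
w <- 1 / pmax(d2, 1e-12)  # guard against division by zero at a center
soft <- w / rowSums(w)

print(hard)
print(round(soft, 3))
```

The in-between point gets a single label under hard clustering, while its soft memberships are split between the two clusters, which shows how uncertain that assignment is.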

Clustering on Data Set
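
The report shows only output, not the call that produced it. As a rough sketch, a fit of this shape could come from base R's `kmeans()` applied to three numeric columns. The data frame `bills` is a small made-up stand-in, and the seed and `nstart` value are assumptions.

```r
# Hypothetical stand-in data; the report's actual data set is not shown.
bills <- data.frame(
  total_bill = c(10.3, 12.5, 15.0, 16.9, 18.2, 29.8, 34.3, 38.0, 41.2, 48.3),
  tip        = c(1.5, 2.0, 2.5, 3.0, 2.8, 4.2, 5.0, 4.5, 6.0, 9.0),
  size       = c(2, 2, 2, 3, 2, 4, 4, 3, 4, 4)
)

# k-means starts from random centers, so fix the seed (assumed value)
# and try several random starts to avoid a poor local optimum.
set.seed(42)
km <- kmeans(bills[, c("total_bill", "tip", "size")], centers = 2, nstart = 25)

print(km$size)     # number of rows in each cluster
print(km$centers)  # per-cluster mean of each column
```

The columns are clustered in their raw units here, so the one with the largest spread (the bill) dominates the distance. Passing the columns through `scale()` first is a common alternative when every variable should count equally.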

K-means with two clusters on total_bill, tip, and size produces the following:

K-means clustering with 2 clusters of sizes 68, 176

Cluster means:
  total_bill      tip     size
1   31.45132 4.191471 3.411765
2   15.27886 2.537273 2.244318

Clustering vector:
  [1] 2 2 2 1 1 1 2 1 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 1 2
 [38] 2 2 1 2 2 2 2 1 2 2 1 1 2 2 2 1 2 1 2 1 1 2 1 2 2 2 2 2 2 2 2 2 2 2 2 1 1
 [75] 2 2 2 1 2 2 2 2 2 1 2 1 2 2 1 2 1 2 2 2 2 1 1 2 2 2 2 2 1 2 2 2 2 1 2 2 2
[112] 2 1 1 1 2 1 2 2 1 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 2 2 2 2
[149] 2 2 2 2 2 1 2 1 1 1 2 2 2 2 2 2 2 1 2 1 2 2 1 2 2 1 2 1 2 2 2 1 1 1 1 1 1
[186] 2 2 1 2 2 2 2 1 2 2 2 2 1 2 2 2 2 2 2 2 2 1 1 1 2 1 1 1 2 1 2 1 2 2 1 2 2
[223] 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 1 1 1 1 2 2 2

Within cluster sum of squares by cluster:
[1] 3682.184 3231.503
 (between_SS / total_SS =  65.3 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"    "size"         "iter"         "ifault"
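
The `between_SS / total_SS` line reports how much of the total variation is explained by the split into clusters: it is `betweenss / totss`, where `totss` equals `tot.withinss + betweenss`. In the output above, `tot.withinss` is 3682.184 + 3231.503 = 6913.687. The sketch below recomputes the ratio by hand on made-up data and can be checked against the components that `kmeans()` returns. The matrix `x` and the seed are assumptions.

```r
# Hypothetical data with two well-separated groups.
x <- cbind(a = c(1, 2, 3, 11, 12, 13), b = c(1, 1, 2, 8, 9, 9))

set.seed(1)
km <- kmeans(x, centers = 2, nstart = 10)

# Total SS: squared distances of all rows from the overall column means.
totss <- sum(sweep(x, 2, colMeans(x))^2)

# Within SS: squared distances of rows from their own cluster center.
withinss <- sum((x - km$centers[km$cluster, ])^2)

# Between SS is what remains; the reported figure is its share of the total.
ratio <- (totss - withinss) / totss
cat(sprintf("between_SS / total_SS = %.1f %%\n", 100 * ratio))
```

A ratio near 100 % means the clusters are tight relative to the overall spread. The ratio can only rise as more clusters are added, so it is not a sufficient criterion for choosing the number of clusters on its own.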

Cross-tabulation of cluster (rows) against tip amount (columns):

     1 1.01 1.1 1.17 1.25 1.32 1.36 1.44 1.45 1.47 1.48 1.5 1.56 1.57 1.58 1.61
  1  0    0   0    1    0    0    0    0    0    0    0   1    0    0    0    0
  2  4    1   1    0    3    1    1    2    1    1    1   8    1    1    1    1

    1.63 1.64 1.66 1.67 1.68 1.71 1.73 1.75 1.76 1.8 1.83 1.92 1.96 1.97 1.98
  1    0    0    0    0    0    0    0    0    0   0    0    0    0    0    0
  2    1    1    1    1    1    1    1    1    1   1    1    1    1    1    1

     2 2.01 2.02 2.03 2.05 2.09 2.18 2.2 2.23 2.24 2.3 2.31 2.34 2.45 2.47 2.5
  1  5    0    0    1    1    0    0   0    0    0   0    0    0    0    0   1
  2 28    2    1    1    0    1    1   2    2    2   1    2    1    1    1   9

    2.52 2.54 2.55 2.56 2.6 2.61 2.64 2.71 2.72 2.74 2.75 2.83 2.88 2.92  3
  1    0    0    1    1   0    0    0    0    0    0    0    0    0    1  5
  2    1    1    0    0   1    1    1    1    1    1    2    1    1    0 18

    3.02 3.06 3.07 3.08 3.09 3.11 3.12 3.14 3.15 3.16 3.18 3.21 3.23 3.25 3.27
  1    0    0    0    0    1    1    1    1    0    0    1    0    0    0    0
  2    1    1    1    1    0    0    0    0    1    1    1    1    2    2    1

    3.31 3.35 3.39 3.4 3.41 3.48 3.5 3.51 3.55 3.6 3.61 3.68 3.71 3.75 3.76
  1    1    0    0   0    1    1   1    0    1   1    1    1    0    1    0
  2    0    1    1   1    0    2   8    1    0   0    0    0    1    0    2

    3.92  4 4.06 4.08 4.19 4.2 4.29 4.3 4.34 4.5 4.67 4.71 4.73  5 5.07 5.14
  1    0  4    0    0    0   1    1   0    1   1    1    1    1  8    1    1
  2    1  8    1    2    1   0    0   2    0   0    0    0    0  2    0    0

    5.15 5.16 5.17 5.2 5.6 5.65 5.85 5.92  6 6.5 6.7 6.73 7.58  9 10
  1    0    1    1   1   1    1    1    1  1   2   1    1    1  1  1
  2    1    0    0   0   0    0    0    0  0   0   0    0    0  0  0
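
A cross-tabulation with one column per distinct value is hard to read. One option, sketched here under assumptions, is to bin the continuous variable with `cut()` before calling `table()`. The vectors `tip` and `cluster` and the bin edges are hypothetical, not the report's data.

```r
# Hypothetical tips and cluster labels (not the report's data).
tip     <- c(1.00, 1.50, 2.00, 2.50, 3.00, 3.50, 4.00, 5.00, 6.50, 10.00)
cluster <- c(2, 2, 2, 2, 2, 1, 2, 1, 1, 1)

# Bin the continuous variable into right-closed intervals (assumed edges),
# then cross-tabulate cluster against the bins.
tip_bin <- cut(tip, breaks = c(0, 2, 4, 6, Inf), right = TRUE)
tab <- table(cluster = cluster, tip = tip_bin)
print(tab)
```

With only a few columns, the pattern becomes visible at a glance: here the larger tips fall in cluster 1 and the smaller ones in cluster 2.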

Total bill cluster

Size cluster