Clustering

Kmeans

rm(list =ls())
#install.packages("factoextra")
#install.packages("NbClust")
library(cluster)

package ‘cluster’ was built under R version 3.5.2

library(factoextra)

Loading required package: ggplot2

library(dplyr)


Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

library(NbClust)
setwd("/Users/jayavarshini/Desktop/ms/sem1/dmm/Assing3/")

The working directory was changed to /Users/jayavarshini/Desktop/ms/sem1/dmm/Assing3 inside a notebook chunk. The working directory will be reset when the chunk is finished running. Use the knitr root.dir option in the setup chunk to change the working directory for notebook chunks.

data_buddy_move <- read.csv("buddymove_holidayiq.csv", header=T, sep=",", comment.char = '#')
head(data_buddy_move)

buddy_move=data_buddy_move[,2:7]
head(buddy_move)

# Compute basic statistics before k-means to decide whether standardization is needed.
stats<- data.frame(
  Min = apply(buddy_move, 2, min), # minimum
  Med = apply(buddy_move, 2, median), # median
  Mean = apply(buddy_move, 2, mean), # mean
  SD = apply(buddy_move, 2, sd), # Standard deviation
  Max = apply(buddy_move, 2, max)
)
stats <- round(stats, 1)
head(stats)

The minimum and maximum of the Sports values are much smaller than those of the other variables, so I am scaling the data.

x<-scale(buddy_move)
head(x)

        Sports  Religious     Nature    Theatre   Shopping     Picnic
[1,] -1.509552 -1.0100142 -0.9973422 -1.4744331 -1.0740003 -0.7783943
[2,] -1.509552 -1.4722052 -1.0630749 -1.2565864 -1.0499404 -1.6057691
[3,] -1.509552 -1.8419580 -0.6029459 -0.9142560 -1.5070790 -1.3912645
[4,] -1.509552 -1.2873288 -1.0411640 -0.6652884 -0.8815209 -1.8202736
[5,] -1.509552 -0.3629468 -1.5451149 -1.7856426 -0.4243823 -1.0541859
[6,] -1.358415 -1.7803325 -0.3400150 -0.7275303 -1.4589591 -1.3606210

stats<- data.frame(
  Min = apply(x, 2, min), # minimum
  Med = apply(x, 2, median), # median
  Mean = apply(x, 2, mean), # mean
  SD = apply(x, 2, sd), # Standard deviation
  Max = apply(x, 2, max)
)
stats <- round(stats, 1)
head(stats)
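As a quick check of what scale() does, here is a minimal sketch on made-up data (the `toy` data frame and its values are assumptions for illustration, not the buddymove data). Each column is centered on its mean and divided by its standard deviation, so every scaled column ends up with mean 0 and SD 1.

```r
# Toy data with columns on very different ranges (values are made up).
toy <- data.frame(Sports = c(2, 3, 5, 6),
                  Nature = c(60, 120, 180, 240))

# scale() standardizes each column: (value - column mean) / column sd.
z <- scale(toy)

# The same computation written out by hand, column by column.
manual <- sapply(toy, function(col) (col - mean(col)) / sd(col))

round(colMeans(z), 10)  # column means after scaling
apply(z, 2, sd)         # column standard deviations after scaling
```

After scaling, no single variable dominates the Euclidean distances that k-means relies on.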

Part a

# Initializing total within sum of squares error: wss
wss <- 0
# For 1 to 15 cluster centers
for (i in 1:15) {
  kmn <- kmeans(x, centers = i, nstart = 20)
  # Saving the total within sum of squares to wss variable
  wss[i] <- kmn$tot.withinss
}
# Plot total within sum of squares vs. number of clusters
plot(1:15, wss, type = "b", 
     xlab = "Number of Clusters", 
     ylab = "Within groups sum of squares")

The plot has an elbow: beyond it, the quality measure improves more slowly as the number of clusters increases, which shows that the quality of the model is no longer improving substantially as model complexity increases.
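The elbow can also be read numerically. The sketch below uses made-up data (three synthetic blobs, an arbitrary seed, and the hypothetical names `toy`, `ks`, and `rel_drop`) and reports the relative drop in total within-cluster sum of squares for each extra cluster. The elbow is roughly where that drop becomes small.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Made-up data: three well-separated 2-D blobs of 30 points each.
toy <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
             matrix(rnorm(60, mean = 5), ncol = 2),
             matrix(rnorm(60, mean = 10), ncol = 2))

# Total within-cluster sum of squares for k = 1..6.
ks <- 1:6
wss <- vapply(ks, function(k) {
  kmeans(toy, centers = k, nstart = 20)$tot.withinss
}, numeric(1))

# Relative improvement gained by going from k - 1 to k clusters.
rel_drop <- -diff(wss) / head(wss, -1)
data.frame(k = ks[-1], wss = round(wss[-1], 1), rel_drop = round(rel_drop, 3))
```

With k = 1 the within-cluster sum of squares equals the total sum of squares of the data, which is a handy sanity check on the first value.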

The same check can be done more easily with fviz_nbclust():

# How many clusters? Two ways to visualize it.
# Elbow method: looks at the total within-cluster sum of squares (wss)
# as k grows. The resulting plot is also called a "scree" plot.
fviz_nbclust(x, kmeans, method="wss")

# Silhouette measures the quality of a clustering, i.e., how well each
# point lies within its cluster.
fviz_nbclust(x, kmeans, method="silhouette")
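For reference, the silhouette width behind the second plot is usually defined as follows. The symbols are my own notation, not taken from the notebook: a(i) is the mean distance from point i to the other points in its own cluster, b(i) is the smallest mean distance from point i to the points of any other cluster, and n is the number of points.

```latex
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1,
\qquad
\bar{s} = \frac{1}{n} \sum_{i=1}^{n} s(i)
```

Values of s(i) near 1 mean the point sits well inside its cluster, values near 0 mean it lies between two clusters, and negative values suggest it may be in the wrong cluster. The silhouette method favors the k with the largest average width.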

k <- 3
kmean <- kmeans(x, centers = k, nstart = 25)

Part b

fviz_cluster(kmean, data=x)

Part c

Number of observations in each cluster

kmean$size

[1]  39 113  97

Part d

Total SSE of the clusters

print(kmean$tot.withinss)

[1] 649.9424

Part e

SSE of each cluster

print(kmean$withinss)

[1]  96.53409 222.64518 330.76311
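The per-cluster values add up to the total reported in Part d: 96.53409 + 222.64518 + 330.76311 = 649.94238. The sketch below shows where these numbers come from, on made-up data (the `toy` matrix, the seed, and the name `manual_sse` are assumptions for illustration): each cluster's SSE is the sum of squared distances from its members to its centroid.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Made-up data: two 2-D blobs of 20 points each.
toy <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 6), ncol = 2))
km <- kmeans(toy, centers = 2, nstart = 10)

# SSE of cluster j: squared distances of its members to centroid j, summed.
manual_sse <- vapply(seq_len(nrow(km$centers)), function(j) {
  members <- toy[km$cluster == j, , drop = FALSE]
  sum(sweep(members, 2, km$centers[j, ])^2)
}, numeric(1))

all.equal(manual_sse, km$withinss)            # per-cluster SSE matches
all.equal(sum(km$withinss), km$tot.withinss)  # total is the sum of the parts
```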

Part f

for(i in 1:3)
{
  print(i)
  print(which(kmean$cluster==i))
}

[1] 1
 [1]  51  59  71  91  98 116 117 124 130 137 141 144 146 149 159 161 162 163 166 169 176 180 184 186 191 195
[27] 200 203 205 210 213 214 215 218 225 228 232 237 241
[1] 2
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26
 [27]  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  52  53
 [53]  54  55  56  57  58  60  61  62  63  64  65  66  67  68  69  70  72  73  74  75  76  77  78  79  80  81
 [79]  82  83  84  85  86  87  88  89  90  92  93  94  95  96  97  99 100 101 102 103 104 105 106 107 108 109
[105] 125 129 134 140 142 145 148 156 238
[1] 3
 [1] 110 111 112 113 114 115 118 119 120 121 122 123 126 127 128 131 132 133 135 136 138 139 143 147 150 151
[27] 152 153 154 155 157 158 160 164 165 167 168 170 171 172 173 174 175 177 178 179 181 182 183 185 187 188
[53] 189 190 192 193 194 196 197 198 199 201 202 204 206 207 208 209 211 212 216 217 219 220 221 222 223 224
[79] 226 227 229 230 231 233 234 235 236 239 240 242 243 244 245 246 247 248 249

In cluster one, users who have written more reviews about nature and picnics are grouped together. This is plausible: people who enjoy nature may also like spending time with family on picnics. For instance, for points 73, 84, and 94 the nature and picnic values are higher than the rest, so they are clustered together. (Cluster numbers are arbitrary and can change between runs because no seed is set before kmeans(), so the numbering in this discussion may not match the listing above.)

In cluster two, something I found interesting is that the users may be mothers/women of the family. I say so because the ratings for Religious, Shopping, and Picnic are notably high.

In cluster three, the sports ratings play a role. Looking at the cluster in more detail, we may find users who prefer watching movies and sports on TV to outdoor activities.
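One way to back up readings like these is to compare the per-cluster means of the original, unscaled variables. A minimal sketch on made-up data follows (the `toy` data frame, its values, and the name `profile` are assumptions, and only two of the six variables are mimicked).

```r
set.seed(7)  # arbitrary seed so the sketch is reproducible

# Made-up review counts for two variables and two kinds of users.
toy <- data.frame(Nature = c(rnorm(15, 80, 5), rnorm(15, 120, 5)),
                  Picnic = c(rnorm(15, 70, 5), rnorm(15, 130, 5)))

# Cluster on the scaled data, as in the notebook.
km <- kmeans(scale(toy), centers = 2, nstart = 25)

# Mean of each original (unscaled) column within each cluster.
profile <- aggregate(toy, by = list(cluster = km$cluster), FUN = mean)
profile
```

A cluster whose row in `profile` is high on Nature and Picnic relative to the other rows supports the kind of description given above. `km$centers` gives the same comparison on the scaled values.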

Hierarchical clustering

set.seed(1122)
setwd("/Users/jayavarshini/Desktop/ms/sem1/dmm/Assing3/")


DataSet <- read.csv("buddymove_holidayiq.csv", header=T, sep=",", comment.char = '#')
DataSet

SubSet<-sample_n(DataSet, 50)
rownames(SubSet)<-SubSet$User.Id
SubSet<-SubSet[2:7]
SubSet

x<-scale(SubSet)
head(x)

             Sports  Religious     Nature     Theatre   Shopping     Picnic
User 20  -0.8438782  0.2067001 -0.9068927 -1.32468472  0.0397644 -0.5215410
User 4   -1.2561965 -1.2040127 -1.0526953 -0.47224925 -0.8705050 -1.6589443
User 167  1.4925923  0.6360474  0.9156408  2.02822813  1.0937605  0.2552222
User 36  -0.9813176 -0.8666683 -0.8339913 -0.04603152 -0.4632792 -1.3260458
User 116  0.1181979  0.7893858 -0.3965833  0.20969913  0.3990813  0.3384468
User 115  0.3930768 -0.4066533  1.0371431 -0.35859119 -0.4393248  0.8932777

Part a

Complete Linkage

complete_linkage<- eclust(x, "hclust", hc_method = "complete",k=1)

fviz_dend(complete_linkage, show_labels=T, palette="jco")

The number of singleton cluster pairs: 19

single_linkage<- eclust(x, "hclust", hc_method = "single",k=1)

fviz_dend(single_linkage, show_labels=T, palette="jco",main='Single Linkage')

The number of singleton cluster pairs: 15

average_linkage<- eclust(x ,"hclust", hc_method ="average",k=1)

fviz_dend(average_linkage ,show_labels=T,palette="jco", main='Average Linkage')

The total number of singleton cluster pairs: 18

Part b

Complete Linkage: The number of singleton cluster pairs: 19

{User 71, User 98}, {User 12, User 11}, {User 72, User 73}, {User 115, User 56}, {User 18, User 41}, {User 23, User 58}, {User 43, User 35}, {User 36, User 60}, {User 4, User 37}, {User 200, User 195}, {User 199, User 168}, {User 197, User 217}, {User 224, User 221}, {User 167, User 240}, {User 140, User 145}, {User 155, User 136}, {User 170, User 225}, {User 157, User 131}, {User 116, User 139}

Single Linkage: The number of singleton cluster pairs: 15

{User 200, User 195}, {User 224, User 221}, {User 167, User 240}, {User 71, User 98}, {User 12, User 11}, {User 18, User 41}, {User 23, User 38}, {User 43, User 35}, {User 157, User 131}, {User 116, User 139}, {User 40, User 45}, {User 140, User 145}, {User 155, User 136}, {User 197, User 217}, {User 199, User 168}

Average Linkage: The total number of singleton cluster pairs: 18

{User 12, User 11}, {User 43, User 35}, {User 36, User 60}, {User 4, User 37}, {User 72, User 73}, {User 18, User 41}, {User 23, User 38}, {User 71, User 98}, {User 140, User 145}, {User 155, User 136}, {User 157, User 131}, {User 116, User 139}, {User 200, User 195}, {User 224, User 221}, {User 167, User 240}, {User 197, User 217}, {User 170, User 225}, {User 199, User 168}
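Pairs like these can also be counted programmatically instead of being read off the dendrogram. In an hclust result, each row of `$merge` describes one merge, and negative entries denote single observations, so a row with two negative entries is a merge of two singletons. The sketch below uses a tiny made-up data set (the `pts` matrix and its labels are assumptions). The same idea should apply to the objects returned by eclust(), on the assumption that they keep the `merge` and `labels` components of the underlying hclust object.

```r
# Five made-up 1-D points with labels in the style of the notebook.
pts <- matrix(c(0, 1, 10, 12, 50), ncol = 1,
              dimnames = list(paste("User", 1:5), "score"))
hc <- hclust(dist(pts), method = "complete")

# Rows of hc$merge where both entries are negative join two singletons.
both_singletons <- hc$merge[, 1] < 0 & hc$merge[, 2] < 0
n_pairs <- sum(both_singletons)

# Translate the negative indices back into observation labels.
pairs <- apply(hc$merge[both_singletons, , drop = FALSE], 1,
               function(row) paste(hc$labels[-row], collapse = ", "))
n_pairs
pairs
```

Here the two closest pairs (0 with 1, and 10 with 12) merge as singletons, while the point at 50 only joins an existing cluster, so the count is 2.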

Part c

Under the assumption I am making, that fewer singleton pairs means a purer clustering, single linkage has the smallest number of singleton pairs (15), so I consider it the purest.

Part d

#cutree(single_linkage,)
cutree(single_linkage,h=1.7)

 User 20   User 4 User 167  User 36 User 116 User 115 User 237  User 43  User 71   User 1 User 200  User 23 
       1        1        1        1        1        1        2        1        1        1        2        1 
 User 72 User 140 User 197 User 170 User 225  User 35  User 17  User 56 User 155 User 145  User 53 User 240 
       1        1        1        1        1        1        1        1        1        1        1        1 
 User 37 User 157  User 18  User 98 User 195 User 185  User 14  User 73 User 217 User 224  User 38 User 199 
       1        1        1        1        2        3        1        1        1        1        1        3 
 User 76 User 239 User 168 User 139   User 9  User 41  User 31  User 60  User 12 User 221  User 11 User 131 
       1        1        3        1        1        1        1        1        1        1        1        1 
User 136 User 248 
       1        3

plot(single_linkage)
abline(h=1.7,col="red")
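Cutting a dendrogram at a height keeps every merge that happened below that height and undoes the ones above it. A minimal sketch on made-up data (the `pts` matrix and the cut height 5 are assumptions chosen for the toy example, not the notebook's values):

```r
# Five made-up 1-D points.
pts <- matrix(c(0, 1, 10, 12, 50), ncol = 1,
              dimnames = list(paste("User", 1:5), "score"))
hc <- hclust(dist(pts), method = "single")

hc$height  # heights at which the merges happen

# Cut at height 5: only merges below 5 are kept.
groups <- cutree(hc, h = 5)
n_clusters <- length(unique(groups))
table(groups)
```

In this toy case, two merges happen below height 5, which leaves three clusters. Counting the distinct labels returned by cutree() is how the number of clusters at a given height can be confirmed.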

I'm getting 3 clusters at height 1.7.

Part e

complete_linkage2<- eclust(x, "hclust", hc_method = "complete",k=3)
fviz_dend(complete_linkage2, show_labels=T, palette="jco")

single_linkage2<- eclust(x, "hclust", hc_method = "single",k=3)
fviz_dend(single_linkage2, show_labels=T, palette="jco")

average_linkage2<- eclust(x, "hclust", hc_method  = "average",k=3)
fviz_dend(average_linkage2, show_labels=T, palette="jco")

Silhouette index for all types of linkage.

complete_statastics <- fpc::cluster.stats(dist(x), complete_linkage2$cluster)
complete_statastics$avg.silwidth

[1] 0.3533173

single_statastics <- fpc::cluster.stats(dist(x), single_linkage2$cluster)
single_statastics$avg.silwidth

[1] 0.2703044

average_statastics <- fpc::cluster.stats(dist(x), average_linkage2$cluster)
average_statastics$avg.silwidth

[1] 0.3775509
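The three values above can be gathered in one pass, which makes the comparison easier to read. The sketch below uses made-up data and silhouette() from the cluster package rather than fpc::cluster.stats(). The `toy` matrix, the seed, and the name `avg_sil` are assumptions for illustration.

```r
library(cluster)  # provides silhouette()

set.seed(3)  # arbitrary seed so the sketch is reproducible

# Made-up data: three 2-D blobs of 20 points each.
toy <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 4), ncol = 2),
             matrix(rnorm(40, mean = 8), ncol = 2))
d <- dist(toy)

# Average silhouette width at k = 3 for each linkage method.
methods <- c("complete", "single", "average")
avg_sil <- vapply(methods, function(m) {
  labels <- cutree(hclust(d, method = m), k = 3)
  mean(silhouette(labels, d)[, "sil_width"])
}, numeric(1))

sort(avg_sil, decreasing = TRUE)  # best linkage first
```

Sorting the named vector puts the linkage with the highest average silhouette width first.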

According to the average silhouette width, average linkage is the best (0.378), followed by complete linkage (0.353) and single linkage (0.270).

Part f

NbClust(x,method = "complete")

*** : The Hubert index is a graphical method of determining the number of clusters.
                In the plot of Hubert index, we seek a significant knee that corresponds to a 
                significant increase of the value of the measure i.e the significant peak in Hubert
                index second differences plot.

*** : The D index is a graphical method of determining the number of clusters. 
                In the plot of D index, we seek a significant knee (the significant peak in Dindex
                second differences plot) that corresponds to a significant increase of the value of
                the measure. 
 
******************************************************************* 
* Among all indices:                                                
* 9 proposed 2 as the best number of clusters 
* 3 proposed 3 as the best number of clusters 
* 3 proposed 5 as the best number of clusters 
* 1 proposed 7 as the best number of clusters 
* 1 proposed 11 as the best number of clusters 
* 1 proposed 12 as the best number of clusters 
* 2 proposed 13 as the best number of clusters 
* 1 proposed 14 as the best number of clusters 
* 2 proposed 15 as the best number of clusters 

                   ***** Conclusion *****                            
 
* According to the majority rule, the best number of clusters is  2 
 
 
******************************************************************* 
$All.index
        KL      CH Hartigan     CCC    Scott    Marriot    TrCovW   TraceW Friedman   Rubin Cindex     DB
2  14.4914 46.9075   9.1933 -0.0306  80.1498 6636265.36 1058.1850 148.6922  40.2032  1.9772 0.3928 1.0072
3   0.6058 31.8644   8.4750 -1.7867 108.2655 8509357.51  658.9692 124.7913  49.2698  2.3559 0.4817 0.9674
4   0.4633 27.3049  12.7285 -2.6062 147.8576 6853013.19  494.0157 105.7267  56.4402  2.7808 0.4534 1.1392
5   2.8163 28.6898   6.1854 -1.5870 191.3229 4489177.14  207.0485  82.8121  65.7713  3.5502 0.4378 1.1284
6   0.4082 26.7361  11.9718 -1.7656 217.0269 3866042.84  190.8952  72.8048  70.3882  4.0382 0.4135 1.1115
7   2.4003 29.6480   6.2310 -1.4929 266.7837 1945264.45  137.4747  57.2326  77.3620  5.1369 0.4383 1.0173
8   0.7864 29.2879   7.6881 -1.1590 300.4442 1295956.95   99.8483  49.9888  87.3907  5.8813 0.4737 1.0032
9   2.0405 30.5342   4.5339 -0.3742 333.6766  843806.23   64.7258  42.2542  99.6265  6.9579 0.4885 0.9831
10  1.2209 29.8992   3.8888 -0.3057 357.4652  647340.45   56.6232  38.0469 106.6295  7.7273 0.4844 0.9425
11  1.0673 29.1664   3.6622 -0.3375 395.9761  362590.85   48.7869  34.6757 119.3246  8.4786 0.5319 0.7863
12  0.4254 28.5854   7.7659 -0.3723 425.0465  241263.48   43.7133  31.6991 134.3862  9.2747 0.5645 0.7906
13  5.0649 31.3580   2.2639  0.6995 473.9297  106517.70   28.5004  26.3201 154.6270 11.1702 0.5598 0.8439
14  1.0959 30.0561   2.0633  0.3537 511.2982   58507.67   26.8222  24.8026 183.9262 11.8536 0.5547 0.8136
15  0.2971 28.8325   5.2722 -0.0146 531.4134   44917.98   25.2488  23.4581 203.5508 12.5330 0.5518 0.7759
   Silhouette   Duda Pseudot2   Beale Ratkowsky    Ball Ptbiserial   Frey McClain   Dunn Hubert SDindex
2      0.3992 0.7376   7.8279  1.3094    0.4930 74.3461     0.6241 0.3151  0.5910 0.2437 0.0052  1.5176
3      0.3533 0.7044   7.9744  1.5340    0.4357 41.5971     0.6550 0.8284  0.6888 0.3175 0.0062  1.3622
4      0.3340 0.6023  15.8469  2.4387    0.3992 26.4317     0.6447 0.8799  0.8979 0.2885 0.0067  1.6247
5      0.2888 0.6402   9.5562  2.0425    0.3785 16.5624     0.5934 1.0301  1.4004 0.2675 0.0071  1.4764
6      0.2561 0.5212  11.9433  3.2821    0.3536 12.1341     0.5504 0.2625  1.7957 0.2671 0.0073  1.6565
7      0.2964 0.4385   5.1213  3.9406    0.3389  8.1761     0.5455 0.1703  2.1019 0.3231 0.0078  1.5675
8      0.3020 0.4920   8.2601  3.5310    0.3219  6.2486     0.5465 0.3411  2.1494 0.3603 0.0079  1.4893
9      0.3347 0.3897   7.8290  5.0201    0.3083  4.6949     0.5333 0.5025  2.3646 0.3987 0.0079  1.4925
10     0.3307 0.7936   0.2601  0.5004    0.2950  3.8047     0.5238 0.1855  2.4876 0.4056 0.0080  1.4612
11     0.3495 0.3693   6.8301  5.2555    0.2831  3.1523     0.5234 0.5418  2.5019 0.4464 0.0080  1.3589
12     0.3495 0.6123   8.2315  2.2621    0.2726  2.6416     0.5146 0.5471  2.6150 0.4805 0.0080  1.5709
13     0.3492 3.2551  -0.6928 -1.3327    0.2646  2.0246     0.4344 0.2383  3.8803 0.4575 0.0086  1.9501
14     0.3549 4.1310  -0.7579 -1.4580    0.2557  1.7716     0.4321 0.3750  3.9290 0.4593 0.0087  1.9432
15     0.3593 0.4727   6.6932  3.6787    0.2476  1.5639     0.4288 0.3128  3.9933 0.4612 0.0087  1.8913
   Dindex   SDbw
2  1.6200 1.2013
3  1.5004 0.7141
4  1.3910 0.3662
5  1.2288 0.3043
6  1.1601 0.2679
7  1.0239 0.2223
8  0.9609 0.1963
9  0.8870 0.1766
10 0.8312 0.1504
11 0.7841 0.1082
12 0.7513 0.1058
13 0.6862 0.1043
14 0.6602 0.0958
15 0.6358 0.0855

$All.CriticalValues
   CritValue_Duda CritValue_PseudoT2 Fvalue_Beale
2          0.5432            18.5029       0.2573
3          0.5190            17.6120       0.1733
4          0.5569            19.0934       0.0283
5          0.4997            17.0194       0.0667
6          0.4503            15.8722       0.0062
7          0.1924            16.7852       0.0070
8          0.3506            14.8210       0.0056
9          0.2445            15.4517       0.0011
10        -0.0981           -11.1930       0.7899
11         0.1924            16.7852       0.0014
12         0.4503            15.8722       0.0459
13        -0.0981           -11.1930       1.0000
14        -0.0981           -11.1930       1.0000
15         0.2864            14.9482       0.0059

$Best.nc
                     KL      CH Hartigan     CCC   Scott Marriot   TrCovW  TraceW Friedman  Rubin Cindex
Number_clusters  2.0000  2.0000   5.0000 13.0000  7.0000       5   3.0000  5.0000  14.0000 13.000 2.0000
Value_Index     14.4914 46.9075   6.5431  0.6995 49.7567 1740702 399.2158 12.9075  29.2992 -1.212 0.3928
                     DB Silhouette   Duda PseudoT2  Beale Ratkowsky   Ball PtBiserial Frey McClain    Dunn
Number_clusters 15.0000     2.0000 2.0000   2.0000 2.0000     2.000  3.000      3.000    1   2.000 12.0000
Value_Index      0.7759     0.3992 0.7376   7.8279 1.3094     0.493 32.749      0.655   NA   0.591  0.4805
                Hubert SDindex Dindex    SDbw
Number_clusters      0 11.0000      0 15.0000
Value_Index          0  1.3589      0  0.0855

$Best.partition
 User 20   User 4 User 167  User 36 User 116 User 115 User 237  User 43  User 71   User 1 User 200  User 23 
       1        1        2        1        2        1        2        1        1        1        2        1 
 User 72 User 140 User 197 User 170 User 225  User 35  User 17  User 56 User 155 User 145  User 53 User 240 
       1        2        2        2        2        1        1        1        2        2        1        2 
 User 37 User 157  User 18  User 98 User 195 User 185  User 14  User 73 User 217 User 224  User 38 User 199 
       1        2        1        1        2        2        1        1        2        2        1        2 
 User 76 User 239 User 168 User 139   User 9  User 41  User 31  User 60  User 12 User 221  User 11 User 131 
       1        2        2        2        1        1        1        1        1        2        1        2 
User 136 User 248 
       2        2
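The majority-rule tally printed for complete linkage can be reproduced by hand from the Number_clusters row of $Best.nc. In the sketch below the values are copied from that output. Dropping entries below 2 is my assumption, chosen because it reproduces the printed counts: Hubert and Dindex are graphical methods reported as 0, and Frey reports 1 with an NA index value.

```r
# Number_clusters row of $Best.nc for complete linkage, copied from the output.
best_nc <- c(KL = 2, CH = 2, Hartigan = 5, CCC = 13, Scott = 7, Marriot = 5,
             TrCovW = 3, TraceW = 5, Friedman = 14, Rubin = 13, Cindex = 2,
             DB = 15, Silhouette = 2, Duda = 2, PseudoT2 = 2, Beale = 2,
             Ratkowsky = 2, Ball = 3, PtBiserial = 3, Frey = 1, McClain = 2,
             Dunn = 12, Hubert = 0, SDindex = 11, Dindex = 0, SDbw = 15)

# Count how many indices propose each k (assumption: ignore values below 2).
votes <- table(best_nc[best_nc >= 2])
votes

# The k with the most votes is the majority-rule choice.
majority_k <- as.integer(names(votes)[which.max(votes)])
majority_k
```

The tally gives 9 votes for 2 clusters, 3 each for 3 and 5, and 2 each for 13 and 15, matching the "Among all indices" summary in the output.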

NbClust(x,method = "single")

NaNs produced

*** : The Hubert index is a graphical method of determining the number of clusters.
                In the plot of Hubert index, we seek a significant knee that corresponds to a 
                significant increase of the value of the measure i.e the significant peak in Hubert
                index second differences plot.

*** : The D index is a graphical method of determining the number of clusters. 
                In the plot of D index, we seek a significant knee (the significant peak in Dindex
                second differences plot) that corresponds to a significant increase of the value of
                the measure. 
 
******************************************************************* 
* Among all indices:                                                
* 6 proposed 2 as the best number of clusters 
* 3 proposed 3 as the best number of clusters 
* 1 proposed 4 as the best number of clusters 
* 7 proposed 9 as the best number of clusters 
* 1 proposed 11 as the best number of clusters 
* 2 proposed 13 as the best number of clusters 
* 1 proposed 14 as the best number of clusters 
* 3 proposed 15 as the best number of clusters 

                   ***** Conclusion *****                            
 
* According to the majority rule, the best number of clusters is  9 
 
 
******************************************************************* 
$All.index
         KL      CH Hartigan      CCC    Scott    Marriot   TrCovW   TraceW Friedman  Rubin Cindex     DB
2    1.1780  9.3980   9.4276  -5.8551  30.8464 17789656.0 740.4079 245.8621  13.8019 1.1958 0.4863 0.7795
3    0.5695 10.1204   0.9298  -7.8771  72.3644 17447362.3 564.9911 205.5003  26.1819 1.4307 0.4546 0.9154
4    1.2489  7.0374   0.7992 -11.0882  97.1881 18879523.5 527.0235 201.5136  30.6947 1.4590 0.4549 0.7606
5    1.0190  5.4484   0.3474 -12.1885 107.6349 23937066.0 511.0490 198.0724  32.5858 1.4843 0.4550 0.8901
6    1.1123  4.3627   0.2627 -13.2035 127.8665 22998754.0 504.8444 196.5548  37.6323 1.4958 0.4554 0.8208
7    0.1831  3.6170  16.3558 -16.8886 139.0082 25050847.7 498.4411 195.3883  39.9606 1.5047 0.4557 0.7427
8    0.7618  6.4622  33.3652 -13.8740 180.1798 14361328.6 347.0451 141.5481  58.8712 2.0770 0.4297 0.6351
9  108.7004 13.9761   1.1840  -7.8376 256.2659  3968475.9 311.4268  78.8828  89.4201 3.7270 0.4256 0.7333
10   3.1020 12.5986   1.3796  -8.5311 267.4485  3917492.8 291.5743  76.6687  97.1484 3.8347 0.4242 0.7555
11   0.0129 11.5711   9.1190  -9.1150 276.7544  3935171.9 283.2820  74.1125  99.5336 3.9669 0.4198 0.7621
12   1.3075 13.4537   8.0153  -7.6581 307.3097  2541796.1 154.8078  60.0675 112.6929 4.8945 0.5095 0.7327
13   3.0653 15.1912   3.2781  -6.4375 374.5395   777525.0 147.3928  49.6045 157.3419 5.9269 0.4893 0.6257
14   8.2436 15.0978   1.1168  -6.4226 393.5879   616071.1 124.0001  45.5674 162.8986 6.4520 0.5012 0.6164
15   0.8121 14.1303   1.0981  -7.0034 414.6928   463706.4 122.3676  44.1963 169.6937 6.6521 0.5015 0.6007
   Silhouette   Duda Pseudot2   Beale Ratkowsky     Ball Ptbiserial      Frey McClain   Dunn Hubert SDindex
2      0.3169 0.8340   8.9553  0.7490    0.2566 122.9310     0.3797    1.8204  0.0837 0.3341 0.0073  1.2977
3      0.2703 0.7695   0.5990  0.7682    0.3013  68.5001     0.4695  -13.0696  0.2235 0.2920 0.0069  1.1548
4      0.2329 0.9942   0.2377  0.0218    0.2680  50.3784     0.4648 -308.6086  0.2275 0.2837 0.0068  1.1520
5      0.0397 3.6404  -0.7253 -1.3952    0.2441  39.6145     0.4258   -5.2215  0.2818 0.2812 0.0066  1.3125
6      0.0400 9.1165   0.0000  0.0000    0.2246  32.7591     0.4213   -4.8152  0.2852 0.2626 0.0066  1.4555
7      0.0753 0.7216  15.4304  1.4479    0.2105  27.9126     0.4188    0.2326  0.2870 0.2586 0.0065  1.3535
8      0.1975 0.5454  29.1740  3.1178    0.2540  17.6935     0.5370    0.4097  0.5029 0.2429 0.0064  1.2653
9      0.3254 1.0038  -0.0378 -0.0132    0.2847   8.7648     0.6213    3.8977  1.1216 0.3236 0.0069  1.4260
10     0.3285 0.9951   0.1135  0.0182    0.2715   7.6669     0.6109    2.5233  1.1734 0.3226 0.0069  1.4502
11     0.2371 0.7093   9.0186  1.5086    0.2604   6.7375     0.5904    0.2228  1.2937 0.3217 0.0069  1.4260
12     0.2861 0.5266   8.0901  3.1125    0.2574   5.0056     0.6031    0.4024  1.4387 0.4147 0.0071  1.4147
13     0.3332 0.9559   0.9218  0.1689    0.2527   3.8157     0.5987    0.5265  1.5496 0.4123 0.0073  1.4474
14     0.3309 0.4259   1.3479  2.5929    0.2455   3.2548     0.5893   -4.8499  1.6702 0.4254 0.0074  1.5205
15     0.3382 4.1310  -0.7579 -1.4580    0.2379   2.9464     0.5861   -6.3823  1.6899 0.4069 0.0074  1.6189
   Dindex   SDbw
2  2.0890 0.5304
3  1.8907 0.4208
4  1.8480 0.2721
5  1.8100 0.2191
6  1.7840 0.1800
7  1.7535 0.1321
8  1.4531 0.1042
9  1.1105 0.1035
10 1.0803 0.0933
11 1.0467 0.0850
12 0.9442 0.0717
13 0.8568 0.0724
14 0.8149 0.0658
15 0.7884 0.0553

$All.CriticalValues
   CritValue_Duda CritValue_PseudoT2 Fvalue_Beale
2          0.6433            24.9549       0.6107
3          0.0348            55.4763       0.6091
4          0.6319            23.8864       1.0000
5         -0.0981           -11.1930       1.0000
6         -0.3211             0.0000          NaN
7          0.6288            23.6160       0.1971
8          0.6114            22.2432       0.0060
9          0.3979            15.1322       1.0000
10         0.5503            18.7987       1.0000
11         0.5432            18.5029       0.1801
12         0.3758            14.9464       0.0108
13         0.5276            17.9093       0.9846
14        -0.0981           -11.1930       0.1356
15        -0.0981           -11.1930       1.0000

$Best.nc
                      KL      CH Hartigan     CCC   Scott  Marriot   TrCovW  TraceW Friedman   Rubin  Cindex
Number_clusters   9.0000 13.0000   9.0000  2.0000  9.0000        9   3.0000  9.0000   13.000  9.0000 11.0000
Value_Index     108.7004 15.1912  32.1812 -5.8551 76.0862 10341870 175.4168 60.4512   44.649 -1.5424  0.4198
                     DB Silhouette  Duda PseudoT2 Beale Ratkowsky   Ball PtBiserial   Frey McClain    Dunn
Number_clusters 15.0000    15.0000 2.000   2.0000 2.000    3.0000  3.000     9.0000 2.0000  2.0000 14.0000
Value_Index      0.6007     0.3382 0.834   8.9553 0.749    0.3013 54.431     0.6213 1.8204  0.0837  0.4254
                Hubert SDindex Dindex    SDbw
Number_clusters      0   4.000      0 15.0000
Value_Index          0   1.152      0  0.0553

$Best.partition
 User 20   User 4 User 167  User 36 User 116 User 115 User 237  User 43  User 71   User 1 User 200  User 23 
       1        1        2        1        3        4        5        1        1        1        6        1 
 User 72 User 140 User 197 User 170 User 225  User 35  User 17  User 56 User 155 User 145  User 53 User 240 
       1        3        3        3        3        1        1        1        3        3        1        2 
 User 37 User 157  User 18  User 98 User 195 User 185  User 14  User 73 User 217 User 224  User 38 User 199 
       1        3        1        1        7        8        1        1        3        2        1        8 
 User 76 User 239 User 168 User 139   User 9  User 41  User 31  User 60  User 12 User 221  User 11 User 131 
       1        2        8        3        1        1        1        1        1        2        1        3 
User 136 User 248 
       3        9

NbClust(x,method = "average")

NaNs produced

*** : The Hubert index is a graphical method of determining the number of clusters.
                In the plot of Hubert index, we seek a significant knee that corresponds to a 
                significant increase of the value of the measure i.e the significant peak in Hubert
                index second differences plot.

*** : The D index is a graphical method of determining the number of clusters. 
                In the plot of D index, we seek a significant knee (the significant peak in Dindex
                second differences plot) that corresponds to a significant increase of the value of
                the measure. 
 
******************************************************************* 
* Among all indices:                                                
* 8 proposed 2 as the best number of clusters 
* 3 proposed 3 as the best number of clusters 
* 1 proposed 4 as the best number of clusters 
* 4 proposed 5 as the best number of clusters 
* 1 proposed 7 as the best number of clusters 
* 1 proposed 10 as the best number of clusters 
* 1 proposed 13 as the best number of clusters 
* 2 proposed 14 as the best number of clusters 
* 2 proposed 15 as the best number of clusters 

                   ***** Conclusion *****                            
 
* According to the majority rule, the best number of clusters is  2 
 
 
******************************************************************* 
$All.index
        KL      CH Hartigan     CCC    Scott    Marriot    TrCovW   TraceW Friedman   Rubin Cindex     DB
2  42.7536 43.6317   7.6164 -0.4374  85.6793 5941492.96 1062.7221 154.0078  49.2029  1.9090 0.3961 1.0019
3   0.3419 28.4798   6.6439 -2.5570 109.3701 8323428.09  632.8272 132.9170  56.8605  2.2119 0.4971 0.9333
4   0.1333 23.3769  22.5870 -3.8772 144.9029 7270186.32  451.9154 116.4551  60.9979  2.5246 0.4871 0.8918
5   4.9560 31.0973   6.8376 -0.8754 223.4252 2362270.56  376.5825  78.1042  76.1266  3.7642 0.4676 1.0482
6   9.1609 29.3582   2.7488 -0.9457 247.4681 2103089.68  266.2071  67.8019  83.1374  4.3362 0.4924 0.9924
7   0.1251 25.8505   5.5486 -2.8580 277.7597 1561859.04  241.2669  63.8152  93.7086  4.6070 0.5223 0.8880
8   0.8923 25.2092   5.9773 -2.6594 301.7204 1263299.18  174.9706  56.5218 101.0233  5.2015 0.5098 0.8824
9   0.4078 25.3267  14.9595 -2.2616 333.6827  843704.50  137.2253  49.4800 108.3780  5.9418 0.4953 0.8287
10  9.9168 31.5989   2.6581  0.2614 385.6633  368304.77   55.8571  36.2526 120.4288  8.1098 0.4509 0.8747
11  0.5897 29.8298   3.5354 -0.1079 402.1022  320779.20   45.6249  33.9937 130.4411  8.6487 0.4430 0.8201
12  2.5897 29.1311   1.9449 -0.1797 429.3941  221171.41   38.7454  31.1682 141.2071  9.4327 0.4365 0.7523
13  0.1525 27.4893   7.8387 -0.6423 465.5747  125890.23   35.7438  29.6507 178.2512  9.9155 0.4339 0.7267
14  4.0987 30.5062   2.5521  0.5054 499.8400   73576.17   18.9938  24.4671 187.8930 12.0161 0.5193 0.7423
15  1.5299 29.6699   1.8832  0.2765 514.4226   63095.82   18.5137  22.8474 192.1438 12.8680 0.5085 0.7160
   Silhouette   Duda Pseudot2   Beale Ratkowsky    Ball Ptbiserial   Frey McClain   Dunn Hubert SDindex
2      0.4173 0.6441   7.7367  1.9844    0.4830 77.0039     0.6973 0.8723  0.4332 0.2335 0.0063  1.4708
3      0.3776 0.5360   9.5212  3.0526    0.4250 44.3057     0.7039 1.7861  0.4849 0.2994 0.0068  1.2863
4      0.3739 0.5953  21.7589  2.5368    0.3874 29.1138     0.6930 0.8460  0.5450 0.2994 0.0069  1.2588
5      0.3407 0.6544   5.8094  1.8625    0.3819 15.6208     0.6405 0.5383  1.2381 0.3639 0.0073  1.4103
6      0.3227 0.7695   0.5990  0.7682    0.3574 11.3003     0.6319 0.5830  1.3598 0.4007 0.0073  1.3603
7      0.3161 0.4413   8.8639  4.2628    0.3338  9.1165     0.6308 0.9574  1.3734 0.4254 0.0073  1.2350
8      0.3334 0.5155   7.5204  3.2148    0.3174  7.0652     0.6164 0.9971  1.4850 0.4254 0.0074  1.3756
9      0.3400 0.5604  14.9024  2.8667    0.3037  5.4978     0.5968 0.5477  1.6419 0.3739 0.0073  1.4359
10     0.3528 1.0084  -0.0332 -0.0256    0.2960  3.6253     0.5147 0.3954  2.6140 0.3477 0.0082  1.4663
11     0.3573 0.2998   4.6705  5.9897    0.2834  3.0903     0.5100 0.3912  2.6811 0.3477 0.0081  1.4690
12     0.3625 3.6404  -0.7253 -1.3952    0.2729  2.5974     0.5062 0.5096  2.7372 0.3477 0.0082  1.4462
13     0.3682 0.5759   8.1000  2.5969    0.2629  2.2808     0.5038 0.4049  2.7696 0.3477 0.0082  1.5428
14     0.3721 0.8469   0.5424  0.5217    0.2559  1.7477     0.4588 0.3465  3.4606 0.4329 0.0087  1.6594
15     0.3852 5.1205   0.0000  0.0000    0.2479  1.5232     0.4532 0.3601  3.5560 0.4329 0.0087  1.6812
   Dindex   SDbw
2  1.6730 1.0874
3  1.5618 0.3977
4  1.4462 0.3209
5  1.1904 0.2911
6  1.1109 0.2493
7  1.0682 0.1889
8  0.9972 0.1678
9  0.9303 0.1506
10 0.8036 0.1378
11 0.7711 0.1223
12 0.7361 0.1015
13 0.7101 0.0925
14 0.6528 0.0872
15 0.6253 0.0815

$All.CriticalValues
   CritValue_Duda CritValue_PseudoT2 Fvalue_Beale
2          0.4643            16.1499       0.0769
3          0.4174            15.3565       0.0107
4          0.5992            21.4021       0.0219
5          0.4174            15.3565       0.1005
6          0.0348            55.4763       0.6091
7          0.3212            14.7957       0.0019
8          0.3506            14.8210       0.0098
9          0.5190            17.6120       0.0122
10         0.1924            16.7852       1.0000
11         0.0348            55.4763       0.0043
12        -0.0981           -11.1930       1.0000
13         0.4174            15.3565       0.0255
14         0.1255            20.9054       0.7845
15        -0.3211             0.0000          NaN

$Best.nc
                     KL      CH Hartigan     CCC   Scott Marriot  TrCovW  TraceW Friedman  Rubin Cindex
Number_clusters  2.0000  2.0000   4.0000 14.0000  5.0000       5   3.000  5.0000  13.0000 10.000 2.0000
Value_Index     42.7536 43.6317  15.9432  0.5054 78.5223 4648735 429.895 28.0486  37.0441 -1.629 0.3961
                    DB Silhouette   Duda PseudoT2  Beale Ratkowsky    Ball PtBiserial Frey McClain    Dunn
Number_clusters 15.000     2.0000 2.0000   2.0000 5.0000     2.000  3.0000     3.0000    1  2.0000 14.0000
Value_Index      0.716     0.4173 0.6441   7.7367 1.8625     0.483 32.6982     0.7039   NA  0.4332  0.4329
                Hubert SDindex Dindex    SDbw
Number_clusters      0   7.000      0 15.0000
Value_Index          0   1.235      0  0.0815

$Best.partition
 User 20   User 4 User 167  User 36 User 116 User 115 User 237  User 43  User 71   User 1 User 200  User 23 
       1        1        2        1        1        1        2        1        1        1        2        1 
 User 72 User 140 User 197 User 170 User 225  User 35  User 17  User 56 User 155 User 145  User 53 User 240 
       1        1        2        2        2        1        1        1        1        1        1        2 
 User 37 User 157  User 18  User 98 User 195 User 185  User 14  User 73 User 217 User 224  User 38 User 199 
       1        1        1        1        2        2        1        1        2        2        1        2 
 User 76 User 239 User 168 User 139   User 9  User 41  User 31  User 60  User 12 User 221  User 11 User 131 
       1        2        2        1        1        1        1        1        1        2        1        1 
User 136 User 248 
       1        2
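
The `$Best.nc` table above can be read as a vote, since each index proposes its own number of clusters. A minimal sketch of tallying that vote in base R follows, with the `Number_clusters` row copied by hand from the output above; the names `best_nc`, `votes`, and `best_k` are only illustrative. Hubert and Dindex are graphical methods that NbClust reports as 0, so they are left out of the count.

```r
# Number_clusters row copied by hand from the $Best.nc output above.
best_nc <- c(KL = 2, CH = 2, Hartigan = 4, CCC = 14, Scott = 5, Marriot = 5,
             TrCovW = 3, TraceW = 5, Friedman = 13, Rubin = 10, Cindex = 2,
             DB = 15, Silhouette = 2, Duda = 2, PseudoT2 = 2, Beale = 5,
             Ratkowsky = 2, Ball = 3, PtBiserial = 3, Frey = 1, McClain = 2,
             Dunn = 14, Hubert = 0, SDindex = 7, Dindex = 0, SDbw = 15)

# Drop the graphical indices (reported as 0) and count votes per k.
votes <- table(best_nc[best_nc > 0])
print(votes)

# The k proposed by the largest number of indices.
best_k <- as.integer(names(votes)[which.max(votes)])
print(best_k)
```

With a live result, the same tally should work on `res$Best.nc["Number_clusters", ]`, where `res` is a hypothetical variable holding the value returned by `NbClust()`.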

Part g

plot(silhouette(cutree(complete_linkage2,3),dist(x)))

plot(silhouette(cutree(single_linkage2,3),dist(x)))

plot(silhouette(cutree(average_linkage2,5),dist(x)))
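
The silhouette plots above can also be reduced to one number per linkage, the average silhouette width, which makes the three methods easier to compare side by side. A minimal sketch follows, assuming the built-in `USArrests` data as a stand-in for the scaled sample `x` (the CSV is not bundled here) and a hypothetical helper `avg_sil()`. It uses plain `hclust()` rather than `eclust()` so that only the `cluster` package is needed.

```r
library(cluster)  # provides silhouette()

# Stand-in data (assumption): built-in USArrests, scaled like x in the notebook.
x_demo <- scale(USArrests)
d <- dist(x_demo)

# Hypothetical helper: average silhouette width for one linkage and one k.
avg_sil <- function(method, k) {
  hc <- hclust(d, method = method)          # build the tree
  sil <- silhouette(cutree(hc, k = k), d)   # per-observation silhouette
  mean(sil[, "sil_width"])
}

# Same cuts as the plots above: 3, 3, and 5 clusters.
widths <- c(
  complete = avg_sil("complete", 3),
  single   = avg_sil("single", 3),
  average  = avg_sil("average", 5)
)
print(round(widths, 3))
```

Replacing `x_demo` with the notebook's `x` should give the numbers that summarize the three plots.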

Part h

Based on purity (the lowest number of singleton pairs), single linkage comes out best. The clustering evaluated with NbClust gave a good silhouette index for complete linkage. In the plots, NbClust shows an elbow-shaped drop at 3 clusters for complete linkage, 3 for single linkage, and 5 for average linkage. The higher the silhouette index, the better defined the cluster structure.
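
For reference, the silhouette width behind that index is usually defined per observation $i$ as below, where $a(i)$ is the mean distance from $i$ to the other members of its own cluster and $b(i)$ is the smallest mean distance from $i$ to the members of any other cluster. These symbols are introduced here for illustration and do not come from the notebook.

$$
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1
$$

Values near 1 mean the observation sits well inside its cluster, values near 0 mean it lies between two clusters, and negative values suggest it may be assigned to the wrong cluster. The average of $s(i)$ over all observations is the average silhouette width.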

I think complete linkage suits this dataset best, since it separates the clusters cleanly and shows the strongest structure by silhouette index.


Clustering

Jayavarshini Ilarajan, Illinois Institute of Technology
