The objective is to draw insights from the Wholesale Customers data set located here: https://archive.ics.uci.edu/ml/datasets/wholesale+customers#
This project will be guided by examples in the “Machine Learning in R Cookbook”.
We’ll use k-means clustering and hierarchical cluster analysis (HCA).
The first step is to load the data into R.
customers <- read.csv(url("https://archive.ics.uci.edu/ml/machine-learning-databases/00292/Wholesale%20customers%20data.csv"), header = TRUE, sep = ",")
head(customers)
The attributes, taken from the dataset page, are annual spending (in monetary units) on Fresh, Milk, Grocery, Frozen, Detergents_Paper, and Delicassen products, plus two categorical variables: Channel (1 = Horeca, 2 = Retail) and Region (1 = Lisbon, 2 = Oporto, 3 = Other).
Some descriptive stats…
summary(customers)
Channel Region Fresh Milk
Min. :1.000 Min. :1.000 Min. : 3 Min. : 55
1st Qu.:1.000 1st Qu.:2.000 1st Qu.: 3128 1st Qu.: 1533
Median :1.000 Median :3.000 Median : 8504 Median : 3627
Mean :1.323 Mean :2.543 Mean : 12000 Mean : 5796
3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.: 16934 3rd Qu.: 7190
Max. :2.000 Max. :3.000 Max. :112151 Max. :73498
Grocery Frozen Detergents_Paper
Min. : 3 Min. : 25.0 Min. : 3.0
1st Qu.: 2153 1st Qu.: 742.2 1st Qu.: 256.8
Median : 4756 Median : 1526.0 Median : 816.5
Mean : 7951 Mean : 3071.9 Mean : 2881.5
3rd Qu.:10656 3rd Qu.: 3554.2 3rd Qu.: 3922.0
Max. :92780 Max. :60869.0 Max. :40827.0
Delicassen
Min. : 3.0
1st Qu.: 408.2
Median : 965.5
Mean : 1524.9
3rd Qu.: 1820.2
Max. :47943.0
table(customers$Channel)
1 2
298 142
table(customers$Region)
1 2 3
77 47 316
sum(is.na(customers))
[1] 0
Because the Region and Channel variables are categorical (numeric codes rather than quantities), Euclidean distance on them isn’t meaningful, so let’s remove them from the clustering for now; they don’t directly measure spending habits. We’ll also convert Channel and Region to factors, in case we want to use them later in the analysis.
customers$Region <- as.factor(customers$Region)
customers$Channel <- as.factor(customers$Channel)
newCust <- customers
newCust$Region <- NULL
newCust$Channel <- NULL
Let’s also normalize the data.
normalize <- function(x) {
return ((x - min(x)) / (max(x) - min(x)))
}
newCust[1:6] <- as.data.frame(lapply(newCust[1:6], normalize))
summary(newCust)
Fresh Milk Grocery
Min. :0.00000 Min. :0.00000 Min. :0.00000
1st Qu.:0.02786 1st Qu.:0.02012 1st Qu.:0.02317
Median :0.07580 Median :0.04864 Median :0.05122
Mean :0.10698 Mean :0.07817 Mean :0.08567
3rd Qu.:0.15097 3rd Qu.:0.09715 3rd Qu.:0.11482
Max. :1.00000 Max. :1.00000 Max. :1.00000
Frozen Detergents_Paper Delicassen
Min. :0.00000 Min. :0.000000 Min. :0.000000
1st Qu.:0.01179 1st Qu.:0.006216 1st Qu.:0.008453
Median :0.02467 Median :0.019927 Median :0.020077
Mean :0.05008 Mean :0.070510 Mean :0.031745
3rd Qu.:0.05800 3rd Qu.:0.095997 3rd Qu.:0.037907
Max. :1.00000 Max. :1.000000 Max. :1.000000
As seen above, all of the values now fall between 0 and 1, so each variable carries roughly equal weight in the distance calculations; i.e., spending on every product is put on the same scale, which helps show which products should be marketed to which customers. We’ll also build clusters without normalizing the data later in this project.
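Min-max scaling is just one option; as an aside, here’s a minimal sketch (not used in the rest of this analysis) of z-score standardization with base R’s scale(), which centers each variable at 0 with unit variance.
# A hedged alternative to min-max scaling: z-score standardization.
# Shown for illustration only; the analysis below keeps the min-max version.
scaledCust <- as.data.frame(scale(customers[, c("Fresh", "Milk", "Grocery",
                                                "Frozen", "Detergents_Paper",
                                                "Delicassen")]))
summary(scaledCust)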
Let’s find the optimal number of clusters for this data set.
clus <- 2:12
set.seed(22)
WSS <- sapply(clus, function(k) {
kmeans(newCust, centers = k)$tot.withinss
})
WSS
[1] 17.245406 13.895508 10.766948 9.011122 7.964424 7.140254
[7] 6.938415 6.204416 5.666592 5.760167 4.842321
plot(clus, WSS, type = "l", xlab = "number of clusters k", ylab = "within sum of squares")
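One caveat worth noting: kmeans() starts from random centroids, so the WSS values above can shift a bit from run to run. Here is a small sketch (not part of the original run) that repeats each fit with nstart = 25 and keeps the best of those starts, which makes the elbow curve more stable.
set.seed(22)
# Re-run the elbow calculation with multiple random starts per k.
WSS_stable <- sapply(clus, function(k) {
  kmeans(newCust, centers = k, nstart = 25)$tot.withinss
})
plot(clus, WSS_stable, type = "b",
     xlab = "number of clusters k", ylab = "within sum of squares")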
Next, calculate the average silhouette width for various numbers of clusters.
library(fpc)
sw = sapply(clus, function(k) {
cluster.stats(dist(newCust), kmeans(newCust, centers = k)$cluster)$avg.silwidth
})
sw
[1] 0.5729735 0.4405736 0.3989539 0.3906469 0.3498627 0.3413289
[7] 0.3321986 0.2906882 0.2714457 0.2124369 0.2641221
plot(clus, sw, type = "l", xlab = "number of clusters", ylab = "average silhouette width")
The graph above indicates that 2 clusters would be optimal.
clus[which.max(sw)]
[1] 2
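As a sanity check, the average silhouette width for k = 2 can also be computed directly with the cluster package; this sketch should give a value close to the first entry of sw above (output not shown).
library(cluster)
set.seed(22)
km2 <- kmeans(newCust, centers = 2)
# silhouette() takes the cluster assignments and a distance matrix.
sil <- silhouette(km2$cluster, dist(newCust))
mean(sil[, "sil_width"])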
Let’s create the two clusters.
set.seed(22)
fit = kmeans(newCust, 2)
fit
K-means clustering with 2 clusters of sizes 394, 46
Cluster means:
Fresh Milk Grocery Frozen Detergents_Paper
1 0.10783586 0.05593035 0.06023356 0.05026197 0.04094287
2 0.09962418 0.26868693 0.30354600 0.04850001 0.32375723
Delicassen
1 0.02718838
2 0.07077642
Clustering vector:
[1] 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 1
[31] 1 1 1 1 1 1 1 1 2 1 1 1 1 2 1 2 2 2 1 2 1 1 1 1 1 1 2 1 1 1
[61] 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2 2 1 1 1
[91] 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1
[121] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1
[151] 1 1 1 1 1 2 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1 2 1 2 1 1 1 1 1 1
[181] 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 2 1 1 1 2
[211] 1 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[241] 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[271] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[301] 1 2 1 1 2 1 2 1 1 2 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1
[331] 1 2 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1 2 1 2 1 1 1 1 1 1
[361] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[391] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1
[421] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1
Within cluster sum of squares by cluster:
[1] 10.170431 7.074975
(between_SS / total_SS = 30.8 %)
Available components:
[1] "cluster" "centers" "totss" "withinss"
[5] "tot.withinss" "betweenss" "size" "iter"
[9] "ifault"
barplot(t(fit$centers), beside = TRUE, xlab="cluster", ylab="value")
This indicates that the customers in cluster 2 purchase Milk, Grocery, and Detergents_Paper products at a much higher rate than those in cluster 1.
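It can be hard to tell which bar belongs to which product, so here’s a sketch of the same bar plot with a legend added (only the presentation changes, not the values):
# Same grouped bar plot, with bar colors mapped to product names.
barplot(t(fit$centers), beside = TRUE, xlab = "cluster", ylab = "value",
        legend.text = colnames(fit$centers),
        args.legend = list(x = "topleft", cex = 0.7))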
Let’s check out what a cluster plot of this data looks like.
install.packages("cluster")   # skip if the package is already installed
library(cluster)
clusplot(newCust, fit$cluster, color = TRUE, shade = TRUE)
The two clusters above contain 394 and 46 observations, respectively. These clusters might not be optimal for drawing valuable insights, since most of the data is lumped into one cluster and the remaining scattered observations into the other. Let’s try some other clustering approaches and then adjust our clusters if necessary.
mds = cmdscale(dist(newCust), k = 2)
plot(mds, col = fit$cluster)
From this graph we see that the data is split into two distinct groups. Let’s see whether the Region and Channel variables can be predicted from our clusters.
table(customers$Channel, fit$cluster)
1 2
1 296 2
2 98 44
table(customers$Region, fit$cluster)
1 2
1 70 7
2 39 8
3 285 31
These tables suggest that the clusters created from the spending data are not strongly indicative of the region or the channel through which the money was spent.
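If we want a single number for how well the clusters line up with each categorical variable, the adjusted Rand index is one option (0 means no better than chance, 1 means identical labelings). A sketch, assuming the mclust package is available; it wasn’t part of the original analysis:
library(mclust)
# Agreement between the spending-based clusters and each categorical variable.
adjustedRandIndex(customers$Channel, fit$cluster)
adjustedRandIndex(customers$Region, fit$cluster)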
Let’s look at a correlation matrix to see how the variables are related in the original data set.
install.packages("Hmisc")
Error in install.packages : Updating loaded packages
library(Hmisc)
y <- rcorr(as.matrix(newCust))
y
Fresh Milk Grocery Frozen Detergents_Paper
Fresh 1.00 0.10 -0.01 0.35 -0.10
Milk 0.10 1.00 0.73 0.12 0.66
Grocery -0.01 0.73 1.00 -0.04 0.92
Frozen 0.35 0.12 -0.04 1.00 -0.13
Detergents_Paper -0.10 0.66 0.92 -0.13 1.00
Delicassen 0.24 0.41 0.21 0.39 0.07
Delicassen
Fresh 0.24
Milk 0.41
Grocery 0.21
Frozen 0.39
Detergents_Paper 0.07
Delicassen 1.00
n= 440
P
Fresh Milk Grocery Frozen Detergents_Paper
Fresh 0.0351 0.8042 0.0000 0.0325
Milk 0.0351 0.0000 0.0092 0.0000
Grocery 0.8042 0.0000 0.4003 0.0000
Frozen 0.0000 0.0092 0.4003 0.0057
Detergents_Paper 0.0325 0.0000 0.0000 0.0057
Delicassen 0.0000 0.0000 0.0000 0.0000 0.1468
Delicassen
Fresh 0.0000
Milk 0.0000
Grocery 0.0000
Frozen 0.0000
Detergents_Paper 0.1468
Delicassen
“Milk” and “Fresh” share a weak correlation. Let’s see how they compare in the clusters we created.
plot(newCust[c("Fresh", "Milk")], col = fit$cluster, cex = .5)
points(fit$centers[,c("Fresh", "Milk")], col=5, pch="X")
As shown in the graph above, based on the clusters we created, there is some overlap between customers who purchase a significant amount of milk and those who purchase a lot of fresh produce. However, as spending on either product increases, the relationship weakens; in particular, customers who buy lots of fresh produce tend not to be in the market for milk. The bright “X” marks the center of each cluster in this graph and in the subsequent graphs in this project.
plot(newCust[c("Detergents_Paper", "Grocery")], col = fit$cluster, cex = .5)
points(fit$centers[,c("Detergents_Paper", "Grocery")], col=5, pch="X")
In this graph, we see that the “Grocery” and “Detergents_Paper” variables are strongly correlated. It would be wise for the company to market both detergents/paper and grocery products to the customers shown in red, since they tend to spend money on both.
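For a quick confirmation of that relationship, here is a short sketch that tests the correlation and overlays a fitted line on the scatter plot above:
# Pearson correlation test for Detergents_Paper vs. Grocery.
cor.test(newCust$Detergents_Paper, newCust$Grocery)
# Dashed regression line drawn over the existing scatter plot.
abline(lm(Grocery ~ Detergents_Paper, data = newCust), lty = 2)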
Let’s look at the data again, but this time without normalizing before we create the clusters. This lets the products with the largest spending dominate the distance calculations.
customers <- read.csv(url("https://archive.ics.uci.edu/ml/machine-learning-databases/00292/Wholesale%20customers%20data.csv"), header = TRUE, sep = ",")
customers$Region <- as.factor(customers$Region)
customers$Channel <- as.factor(customers$Channel)
newCust <- customers
newCust$Region <- NULL
newCust$Channel <- NULL
clus <- 2:12
set.seed(22)
WSS <- sapply(clus, function(k) {
kmeans(newCust, centers = k)$tot.withinss
})
WSS
[1] 113217528521 80332413843 67315940100 53206131318
[5] 47493122543 41922735726 36317892575 33601567798
[9] 31888233451 28781999756 27499697494
plot(clus, WSS, type = "l", xlab = "number of clusters k", ylab = "within sum of squares")
library(fpc)
sw = sapply(clus, function(k) {
cluster.stats(dist(newCust), kmeans(newCust, centers = k)$cluster)$avg.silwidth
})
sw
[1] 0.4508716 0.4783511 0.3975973 0.3711958 0.3185849 0.3226306
[7] 0.3195372 0.3061333 0.2799187 0.3056202 0.2643999
plot(clus, sw, type = "l", xlab = "number of clusters", ylab = "average silhouette width")
The graph above indicates that 3 clusters would be optimal.
clus[which.max(sw)]
[1] 3
Let’s create the three clusters.
set.seed(22)
fit = kmeans(newCust, 3)
fit
K-means clustering with 3 clusters of sizes 330, 50, 60
Cluster means:
Fresh Milk Grocery Frozen Detergents_Paper
1 8253.47 3824.603 5280.455 2572.661 1773.058
2 8000.04 18511.420 27573.900 1996.680 12407.360
3 35941.40 6044.450 6288.617 6713.967 1039.667
Delicassen
1 1137.497
2 2252.020
3 3049.467
Clustering vector:
[1] 1 1 1 1 3 1 1 1 1 2 1 1 3 1 3 1 1 1 1 1 1 1 3 2 3 1 1 1 2 3
[31] 1 1 1 3 1 1 3 1 2 3 3 1 1 2 1 2 2 2 1 2 1 1 3 1 3 1 2 1 1 1
[61] 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2 2 3 1 3
[91] 1 1 2 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1
[121] 1 1 1 1 3 3 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 3 3 1 1 2 1 1 1 3
[151] 1 1 1 1 1 2 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1 2 1 2 1 1 3 1 1 1
[181] 1 3 1 3 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 2 2 3 1 1 2 1 1 1 2
[211] 1 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 3
[241] 3 3 1 1 1 1 1 1 1 1 1 2 1 3 1 3 1 1 3 3 1 1 3 1 1 2 2 1 2 1
[271] 1 1 1 3 1 1 3 1 1 1 1 1 3 3 3 3 1 1 1 3 1 1 1 1 1 1 1 1 1 1
[301] 1 2 1 1 2 1 2 1 1 2 1 3 2 1 1 1 1 1 1 2 1 1 1 1 3 3 1 1 1 1
[331] 1 2 1 2 1 3 1 1 1 1 1 1 1 2 1 1 1 3 1 2 1 2 1 2 1 1 1 1 1 1
[361] 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 3 1 1 3 1 3 1 2 1 1 1 1 1
[391] 1 1 1 3 1 1 1 1 1 1 1 3 3 3 1 1 3 2 1 1 1 1 1 1 1 1 1 1 2 1
[421] 1 1 3 1 1 1 1 3 1 1 1 1 1 1 1 3 3 2 1 1
Within cluster sum of squares by cluster:
[1] 28184318853 26382784678 25765310312
(between_SS / total_SS = 49.0 %)
Available components:
[1] "cluster" "centers" "totss" "withinss"
[5] "tot.withinss" "betweenss" "size" "iter"
[9] "ifault"
barplot(t(fit$centers), beside = TRUE, xlab="cluster", ylab="value")
As shown above, the vast majority of customers who spend heavily on fresh produce were grouped into cluster 3, so the company would be wise to market its fresh produce to that group. It might also benefit from marketing its grocery products to the customers in cluster 2.
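Because the raw spending levels differ so much by product, it can also help to express each cluster center relative to the overall average. This is just a sketch; ratios well above 1 mean the cluster spends more than average on that product.
# Divide each cluster center by the overall mean spending per product.
round(sweep(fit$centers, 2, colMeans(newCust), "/"), 2)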
library(cluster)
clusplot(newCust, fit$cluster, color = TRUE, shade = TRUE)
The clusters above contain 330, 50, and 60 observations for clusters 1, 2, and 3, respectively.
mds = cmdscale(dist(newCust), k = 2)
plot(mds, col = fit$cluster)
Let’s see if this clustering can predict region and channel. I sort of doubt that these variables are related.
table(customers$Channel, fit$cluster)
1 2 3
1 244 2 52
2 86 48 8
table(customers$Region, fit$cluster)
1 2 3
1 56 10 11
2 35 8 4
3 239 32 45
These tables also suggest that the clusters created from the spending data are not strongly indicative of the region or the channel through which the money was spent.
Looking again at the correlation matrix, let’s pick out a couple of variable pairs and see how they fall out across the clusters.
library(Hmisc)
y <- rcorr(as.matrix(newCust))
y
Fresh Milk Grocery Frozen Detergents_Paper
Fresh 1.00 0.10 -0.01 0.35 -0.10
Milk 0.10 1.00 0.73 0.12 0.66
Grocery -0.01 0.73 1.00 -0.04 0.92
Frozen 0.35 0.12 -0.04 1.00 -0.13
Detergents_Paper -0.10 0.66 0.92 -0.13 1.00
Delicassen 0.24 0.41 0.21 0.39 0.07
Delicassen
Fresh 0.24
Milk 0.41
Grocery 0.21
Frozen 0.39
Detergents_Paper 0.07
Delicassen 1.00
n= 440
P
Fresh Milk Grocery Frozen Detergents_Paper
Fresh 0.0351 0.8042 0.0000 0.0325
Milk 0.0351 0.0000 0.0092 0.0000
Grocery 0.8042 0.0000 0.4003 0.0000
Frozen 0.0000 0.0092 0.4003 0.0057
Detergents_Paper 0.0325 0.0000 0.0000 0.0057
Delicassen 0.0000 0.0000 0.0000 0.0000 0.1468
Delicassen
Fresh 0.0000
Milk 0.0000
Grocery 0.0000
Frozen 0.0000
Detergents_Paper 0.1468
Delicassen
First, let’s plot “Detergents_Paper” against “Milk”.
plot(newCust[c("Detergents_Paper", "Milk")], col = fit$cluster, cex = .5)
points(fit$centers[,c("Detergents_Paper", "Milk")], col=5, pch="X")
This graph doesn’t seem to lead to much valuable insight.
plot(newCust[c("Frozen", "Grocery")], col = fit$cluster, cex = .5)
points(fit$centers[,c("Frozen", "Grocery")], col=5, pch="X")
It looks as though the people in the red cluster tend to buy groceries at a much higher rate than frozen foods.
While there are many more explorations we could perform on our clusters, let’s shift gears to HCA clustering.
HCA CLUSTERING
First we load the data.
data <- read.csv(url("https://archive.ics.uci.edu/ml/machine-learning-databases/00292/Wholesale%20customers%20data.csv"), header = TRUE, sep = ",")
Let’s again remove the region and channel variables and we’ll need to normalize the data.
data$Channel <- NULL
data$Region <-NULL
normalize <- function(x) {
return ((x - min(x)) / (max(x) - min(x)))
}
data[1:6] <- as.data.frame(lapply(data[1:6], normalize))
We’ll use packages “NbClust” and “factoextra” to choose the optimal number of clusters for our HCA analysis.
install.packages("NbClust")
Error in install.packages : Updating loaded packages
install.packages("factoextra")
Error in install.packages : Updating loaded packages
library(NbClust)
library(factoextra)
For the first analysis we’ll use ward.D2 linkage.
nb <- NbClust(data, distance = "euclidean", min.nc = 2,
max.nc = 10, method = "ward.D2")
NaNs produced
*** : The Hubert index is a graphical method of determining the number of clusters.
In the plot of Hubert index, we seek a significant knee that corresponds to a
significant increase of the value of the measure i.e the significant peak in Hubert
index second differences plot.
*** : The D index is a graphical method of determining the number of clusters.
In the plot of D index, we seek a significant knee (the significant peak in Dindex
second differences plot) that corresponds to a significant increase of the value of
the measure.
*******************************************************************
* Among all indices:
* 2 proposed 2 as the best number of clusters
* 5 proposed 3 as the best number of clusters
* 2 proposed 4 as the best number of clusters
* 4 proposed 5 as the best number of clusters
* 3 proposed 6 as the best number of clusters
* 4 proposed 7 as the best number of clusters
* 3 proposed 10 as the best number of clusters
***** Conclusion *****
* According to the majority rule, the best number of clusters is 3
*******************************************************************
fviz_nbclust(nb)
Among all indices:
===================
* 2 proposed 0 as the best number of clusters
* 1 proposed 1 as the best number of clusters
* 2 proposed 2 as the best number of clusters
* 5 proposed 3 as the best number of clusters
* 2 proposed 4 as the best number of clusters
* 4 proposed 5 as the best number of clusters
* 3 proposed 6 as the best number of clusters
* 4 proposed 7 as the best number of clusters
* 3 proposed 10 as the best number of clusters
Conclusion
=========================
* According to the majority rule, the best number of clusters is 3 .
As shown above, the optimal number of clusters is 3.
hc = hclust(dist(data, method="euclidean"), method="ward.D2")
hc
Call:
hclust(d = dist(data, method = "euclidean"), method = "ward.D2")
Cluster method : ward.D2
Distance : euclidean
Number of objects: 440
plot(hc, hang = -0.01, cex = 0.7)
We’ll now cut the dendrogram into three clusters.
fit <- cutree(hc, k = 3)
table(fit)
fit
1 2 3
310 125 5
As seen above, 310 of the observations are contained in cluster 1, 125 in cluster 2, and 5 in cluster 3.
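To get a feel for what each of these three clusters actually buys, here is a short sketch that averages the normalized spending by cluster (output not shown):
# Mean normalized spending per hierarchical cluster.
aggregate(data, by = list(cluster = fit), FUN = mean)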
plot(hc)
rect.hclust(hc, k = 3, border = "red")
The clusters are shown above.
Let’s also create clusters with “single” linkage.
nb2 <- NbClust(data, distance = "euclidean", min.nc = 2,
max.nc = 10, method = "single")
NaNs produced
*** : The Hubert index is a graphical method of determining the number of clusters.
In the plot of Hubert index, we seek a significant knee that corresponds to a
significant increase of the value of the measure i.e the significant peak in Hubert
index second differences plot.
*** : The D index is a graphical method of determining the number of clusters.
In the plot of D index, we seek a significant knee (the significant peak in Dindex
second differences plot) that corresponds to a significant increase of the value of
the measure.
*******************************************************************
* Among all indices:
* 8 proposed 2 as the best number of clusters
* 3 proposed 3 as the best number of clusters
* 4 proposed 5 as the best number of clusters
* 3 proposed 7 as the best number of clusters
* 5 proposed 8 as the best number of clusters
***** Conclusion *****
* According to the majority rule, the best number of clusters is 2
*******************************************************************
fviz_nbclust(nb2)
Among all indices:
===================
* 2 proposed 0 as the best number of clusters
* 1 proposed 1 as the best number of clusters
* 8 proposed 2 as the best number of clusters
* 3 proposed 3 as the best number of clusters
* 4 proposed 5 as the best number of clusters
* 3 proposed 7 as the best number of clusters
* 5 proposed 8 as the best number of clusters
Conclusion
=========================
* According to the majority rule, the best number of clusters is 2 .
2 clusters is optimal for this method.
hc2 <- hclust(dist(data), method = "single")
plot(hc2, hang = -.01, cex = .7)
hc2
Call:
hclust(d = dist(data), method = "single")
Cluster method : single
Distance : euclidean
Number of objects: 440
fit2 <- cutree(hc2, k = 2)
table(fit2)
fit2
1 2
439 1
As shown above, single linkage put 439 of the 440 observations into one cluster (a chaining effect that single linkage is prone to), so it didn’t produce meaningful groups we can work with. Let’s analyze our ward.D2 linkage clusters instead.
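For reference, other linkages can be compared the same way; here is a sketch (not part of the original analysis) using complete linkage, which is generally less prone to chaining:
hc_complete <- hclust(dist(data), method = "complete")
# Cluster sizes from a three-cluster cut of the complete-linkage tree.
table(cutree(hc_complete, k = 3))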
plot(hc)
rect.hclust(hc, k = 4, border = "red")
fit5 <- cutree(hc, h = 3)
table(fit5)
fit5
1 2
310 130
plot(hc)
rect.hclust(hc, h = 3, border = "red")
fit5 <- cutree(hc, h = .5)
table(fit5)
fit5
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
39 48 25 19 53 47 76 32 1 19 16 36 2 5 2 1 1 7 3 1 1 4
23 24
1 1
plot(hc)
rect.hclust(hc, h = .5, border = "red")
Regardless of where we choose to cut the dendrogram into clusters, there are simply too many observations for the plot to stay legible: with 440 customers each labeled only by a row number, the dendrogram itself doesn’t convey much. HCA would likely have more value on a smaller data set whose observations are distinguished by name. I’m also aware that my knowledge of how to build and edit dendrograms is limited, so perhaps there is a way to obtain more value from this data set using HCA.
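That said, one partial remedy for the readability problem is to draw the dendrogram without the 440 individual labels and simply mark the chosen cut; a small sketch:
# Dendrogram with leaf labels suppressed, plus boxes around the k = 3 cut.
plot(hc, labels = FALSE, hang = -1)
rect.hclust(hc, k = 3, border = "red")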