November 24, 2019

The Organization of Hierarchy

Introduction to Hierarchical Clustering

Decision on Cluster Level

Hierarchical Cluster Analysis: Agglomerative

Hierarchical Cluster Analysis: Divisive

Agglomerative vs Divisive Intuition

Agglomerative vs Divisive

Steps to Perform Hierarchical Clustering (Agglomerative)

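  • view each data point as an individual "cluster" with just that one point as a member
  • calculate the Euclidean distance between the centroids of every pair of clusters (this is centroid linkage; other linkage methods, such as the complete linkage used in the R code below, define the distance between clusters differently)
  • merge the two closest clusters into one
  • repeat the previous two steps until a single cluster contains all the data in your set
  • plot a dendrogram of the merge history
  • decide on the level at which to cut the dendrogram, which determines the number of clusters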
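The steps above can be sketched in plain Python. This is a minimal, unoptimized sketch that assumes centroid linkage and small 2-D inputs. The function names and sample points are hypothetical. It records each merge together with the distance at which it happened, and that is the information a dendrogram draws.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Component-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def agglomerate(points):
    """Centroid-linkage agglomerative clustering (hypothetical helper).

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a sorted tuple of point indices.
    """
    # Start with every point as its own single-member cluster.
    clusters = [[i] for i in range(len(points))]
    merges = []
    # Keep merging until one cluster holds all the data.
    while len(clusters) > 1:
        cents = [centroid([points[i] for i in c]) for c in clusters]
        best = None
        # Distance between the centroids of every pair of clusters.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = euclidean(cents[a], cents[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Merge the closest pair into a single cluster.
        merges.append((tuple(sorted(clusters[a])), tuple(sorted(clusters[b])), d))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges

# Hypothetical example points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
for a, b, d in agglomerate(points):
    print(a, b, round(d, 3))
```

Each loop recomputes every centroid pair, so the sketch is cubic in the number of points. Real implementations update distances incrementally instead.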


Hierarchical Cluster Analysis (Agglomerative) in R

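```r
# `data` is assumed to be a numeric matrix or data frame (one row per observation)

# Compute pairwise Euclidean distances between observations
distances = dist(data, method = "euclidean")

# Apply agglomerative hierarchical clustering with complete linkage
clusterData = hclust(distances, method = "complete")

# Plot the dendrogram
plot(clusterData)

# Decision on cluster level: cut the tree into k = 10 groups
clusterGroups = cutree(clusterData, k = 10)
```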
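The same pipeline (Euclidean distances, complete linkage, cut into k groups) can be sketched in Python for comparison. This is an illustrative re-implementation, not `hclust` itself. The sample `data` and the helper names are hypothetical, and ties between equally close clusters may be broken differently than `hclust` breaks them.

```python
import math
from itertools import combinations

def pairwise_distances(data):
    """Euclidean distance matrix, analogous to dist(data, method = "euclidean")."""
    n = len(data)
    return [[math.dist(data[i], data[j]) for j in range(n)] for i in range(n)]

def complete_linkage_cut(data, k):
    """Merge clusters with complete linkage until k clusters remain (hypothetical helper).

    Complete linkage scores two clusters by their largest pairwise point
    distance. Its merge heights never decrease, so stopping at k clusters
    gives the same partition as cutting the full tree into k groups
    (ignoring ties).
    """
    dist = pairwise_distances(data)
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        # Pick the pair of clusters whose farthest members are closest.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: max(dist[i][j] for i in clusters[ab[0]] for j in clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    # Label observations 1..k, numbering groups by first appearance.
    labels = [0] * len(data)
    next_label = 1
    for i in range(len(data)):
        if labels[i] == 0:
            group = next(c for c in clusters if i in c)
            for j in group:
                labels[j] = next_label
            next_label += 1
    return labels

# Hypothetical 2-D data with two obvious groups.
data = [(1.0, 2.0), (1.5, 1.8), (8.0, 8.0), (8.5, 7.5), (1.2, 0.9), (9.0, 9.5)]
print(complete_linkage_cut(data, k=2))
```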

Agglomerative vs Divisive

Agglomerative

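  • lower computational complexity

  • may be misled by local neighbors, because early merges depend only on nearby points and cannot be undone

  • does not take the global structure of the clusters into account when deciding what to merge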

Divisive

  • considers the entire data distribution when deciding how to split

  • can be more accurate

  • higher complexity (can decrease stability and increase runtime)
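The source does not say which divisive algorithm it has in mind. One well-known choice is DIANA (DIvisive ANAlysis), and the code below is a minimal sketch of it. It starts with every point in one cluster, repeatedly picks the cluster with the largest diameter, and splits that cluster by growing a "splinter group" from its most dissimilar point. Because each split looks at every point in the cluster, the sketch shows why divisive methods see the whole distribution but cost more. The function names and sample points are hypothetical.

```python
import math

def avg_dist(p, group, points):
    """Average distance from point index p to the other members of group."""
    others = [q for q in group if q != p]
    if not others:
        return 0.0
    return sum(math.dist(points[p], points[q]) for q in others) / len(others)

def diameter(group, points):
    """Largest pairwise distance within a cluster of two or more points."""
    return max(math.dist(points[a], points[b]) for a in group for b in group)

def split(group, points):
    """Split one cluster in two by growing a DIANA-style splinter group."""
    remaining = list(group)
    # Seed the splinter group with the point most dissimilar to the rest.
    seed = max(remaining, key=lambda p: avg_dist(p, remaining, points))
    remaining.remove(seed)
    splinter = [seed]
    while len(remaining) > 1:
        # Positive gain: the point is, on average, closer to the splinter group.
        def gain(p):
            return avg_dist(p, remaining, points) - avg_dist(p, splinter, points)
        best = max(remaining, key=gain)
        if gain(best) <= 0:
            break  # no point prefers the splinter group; the split is final
        remaining.remove(best)
        splinter.append(best)
    return splinter, remaining

def divisive(points, k):
    """Top-down clustering: split the widest cluster until k clusters exist."""
    clusters = [list(range(len(points)))]
    while len(clusters) < k:
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break  # every cluster is already a single point
        widest = max(splittable, key=lambda c: diameter(c, points))
        clusters.remove(widest)
        clusters.extend(split(widest, points))
    return sorted(sorted(c) for c in clusters)

# Hypothetical example: two well-separated groups of three points.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(divisive(pts, 2))
```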

k-means vs Hierarchical Cluster Analysis

k-means

  • simplicity

  • initializing random centroids and repeatedly finding the closest points can be time-consuming
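For comparison, the k-means procedure described above can be sketched as Lloyd's algorithm. It picks random initial centroids, assigns each point to its closest centroid, and then moves each centroid to the mean of its points. Unlike hierarchical clustering, it needs `k` up front. The function name, the `iters` and `seed` parameters, and the sample points are assumptions for illustration.

```python
import math
import random

def kmeans(points, k, iters=100, seed=None):
    """Lloyd's k-means (hypothetical helper): returns (centroids, labels)."""
    rng = random.Random(seed)
    # k is fixed in advance; initial centroids are k distinct random data points.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: distance from every point to every centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(
                tuple(sum(xs) / len(members) for xs in zip(*members)) if members else centroids[c]
            )
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return centroids, labels

# Hypothetical example points.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(kmeans(pts, 2, seed=0))
```

Different seeds can give different results on harder data, because Lloyd's algorithm only finds a local optimum.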

Hierarchical Cluster Analysis

  • no need to specify the number of clusters "k" in advance

  • has more parameters to tune (e.g., the distance metric and the linkage method)

  • the number of clusters can be chosen subjectively by inspecting the dendrogram
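The flexibility in choosing clusters comes from building the tree once and cutting it afterwards at whatever height looks reasonable on the dendrogram. R's `cutree` supports this through its `h` argument, as an alternative to `k`. The code below is a minimal sketch of that idea, assuming complete linkage. The function names and sample points are hypothetical.

```python
import math
from itertools import combinations

def merge_history(points):
    """Complete-linkage merges as (cluster_a, cluster_b, height), built once."""
    def linkage(a, b):
        return max(math.dist(points[i], points[j]) for i in a for j in b)
    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        history.append((clusters[a], clusters[b], linkage(clusters[a], clusters[b])))
        clusters[a] = clusters[a] + clusters[b]  # new list; history keeps the old ones
        del clusters[b]
    return history

def cut_at_height(history, n, h):
    """Clusters left after applying only the merges at height <= h."""
    groups = [{i} for i in range(n)]
    for a, b, height in history:
        if height > h:
            break  # complete-linkage heights never decrease, so later merges are higher
        ga = next(g for g in groups if a[0] in g)
        gb = next(g for g in groups if b[0] in g)
        groups.remove(gb)
        ga |= gb
    return sorted(sorted(g) for g in groups)

# Hypothetical 2-D points on a line; the tree is built once and cut several times.
pts = [(0, 0), (1, 0), (5, 0), (6, 0), (20, 0)]
tree = merge_history(pts)
for h in (0.5, 2, 10, 25):
    print(h, cut_at_height(tree, len(pts), h))
```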

Conclusions

Cluster Analysis for Recommender Systems:

  • works at a group level

  • generates less personalized recommendations

  • often yields lower accuracy than nearest-neighbor algorithms

  • runs faster than nearest-neighbor algorithms

  • is effective at narrowing down the set of relevant neighbors in a collaborative filtering algorithm
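The last point can be sketched as follows. Once users are clustered (for example with `cutree`), the neighbor search in collaborative filtering scans only the users in the target user's cluster instead of every user. The ratings, user names, cluster labels, and function names are hypothetical. Cosine similarity is an assumed choice of similarity measure, and 0 stands for "unrated" only to keep the example short.

```python
import math

def cosine(u, v):
    """Cosine similarity between two rating vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def neighbors_in_cluster(target, ratings, labels, n=2):
    """Top-n most similar users, searching only the target's own cluster."""
    candidates = [u for u in ratings if u != target and labels[u] == labels[target]]
    candidates.sort(key=lambda u: cosine(ratings[target], ratings[u]), reverse=True)
    return candidates[:n]

# Hypothetical user-by-item ratings and precomputed cluster labels.
ratings = {
    "ann": [5, 4, 0, 1],
    "bob": [4, 5, 1, 0],
    "cat": [5, 5, 0, 2],
    "dan": [1, 0, 5, 4],
    "eve": [0, 1, 4, 5],
}
labels = {"ann": 1, "bob": 1, "cat": 1, "dan": 2, "eve": 2}
print(neighbors_in_cluster("ann", ratings, labels))
```

Restricting the search to one cluster is what makes the method faster. It can also make it less accurate, because a good neighbor that ended up in another cluster is never considered.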

Sources

Benjamin Johnston, Aaron Jones, Christopher Kruger, "Applied Unsupervised Learning with Python", May 2019

https://www.researchgate.net/publication/303870754_FHCC_A_Soft_Hierarchical_Clustering_Approach_for_Collaborative_Filtering_Recommendation

https://rpubs.com/kismetk/Netflix-recommendation

https://www.displayr.com/what-is-hierarchical-clustering/

https://towardsdatascience.com/understanding-the-concept-of-hierarchical-clustering-technique-c6e8243758ec

https://www.sciencedirect.com/topics/computer-science/hierarchical-cluster-analysis

Image Sources