November 24, 2019
# Compute pairwise Euclidean distances ('data' is assumed to be a numeric data frame or matrix)
distances = dist(data, method = "euclidean")

# Apply hierarchical clustering with complete linkage
clusterData = hclust(distances, method = "complete")

# Plot the dendrogram
plot(clusterData)

# Decide on the cluster level: cut the tree into k = 10 groups
clusterGroups = cutree(clusterData, k = 10)
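A minimal, self-contained version of the same steps, assuming the built-in USArrests dataset stands in for data (scaled so no single variable dominates the Euclidean distances) and an arbitrary cut at k = 4:

# Built-in USArrests data used here purely as a stand-in for 'data'
scaledData = scale(USArrests)                          # standardize each variable
distances = dist(scaledData, method = "euclidean")     # pairwise Euclidean distances
clusterData = hclust(distances, method = "complete")   # complete-linkage clustering
plot(clusterData)                                      # inspect the dendrogram
clusterGroups = cutree(clusterData, k = 4)             # cut into 4 groups (placeholder choice)
table(clusterGroups)                                   # observations per cluster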
Agglomerative
lower complexity
may be fooled by local neighbors, since merges are made greedily from the bottom up
may not see the larger structure implied by the clusters
Divisive
sees the entire data distribution from the start
tends to be more accurate
higher complexity (can decrease stability and increase runtime); an R sketch comparing both approaches follows below
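As a rough comparison of the two directions, a sketch using agnes() (agglomerative) and diana() (divisive) from the cluster package; the USArrests data and the 4-cluster cut are assumptions for illustration only:

library(cluster)                        # provides agnes() and diana()

scaledData = scale(USArrests)           # standardized stand-in data

# Agglomerative: every observation starts as its own cluster and is merged upward
aggTree = agnes(scaledData, method = "complete")
aggTree$ac                              # agglomerative coefficient (closer to 1 = clearer structure)

# Divisive: one all-inclusive cluster is split downward
divTree = diana(scaledData)
divTree$dc                              # divisive coefficient

# Both trees can be plotted and cut like hclust output
plot(as.hclust(divTree))
divGroups = cutree(as.hclust(divTree), k = 4)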
k-means
simplicity
initializing random centroids and repeatedly assigning each point to its closest centroid can be time-consuming (a short kmeans() sketch follows below)
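For contrast, a minimal k-means sketch with base R's kmeans(); the USArrests data, k = 4, and 25 random starts are assumptions chosen only for illustration:

set.seed(42)                                              # k-means depends on random centroid initialization
scaledData = scale(USArrests)
kmResult = kmeans(scaledData, centers = 4, nstart = 25)   # 25 random starts for a more stable result
kmResult$cluster                                          # cluster assignment for each observation
kmResult$centers                                          # final centroid positions
kmResult$tot.withinss                                     # total within-cluster sum of squares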
Hierarchical Cluster Analysis
no need to pass in an explicit "k" number of clusters
has more parameters to tweak (e.g. the distance metric and the linkage method)
the number of clusters can be chosen subjectively by evaluating the dendrogram plot, as sketched below
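One sketch of that subjective choice, reusing the clusterData tree built in the sketches above; the values k = 4 and h = 3 are placeholders, not recommendations:

plot(clusterData)                                 # full dendrogram
rect.hclust(clusterData, k = 4, border = "red")   # outline a candidate 4-cluster cut for visual inspection
heightGroups = cutree(clusterData, h = 3)         # or cut at a chosen merge height instead of a fixed k
table(heightGroups)                               # resulting group sizes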
Clustering Analysis for Recommender Systems:
works at a group level
generates less personalized recommendations
often leads to worse accuracy than nearest neighbor algorithms
works faster
effective at shrinking the set of candidate neighbors in a collaborative filtering algorithm (see the sketch below)
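A minimal sketch of that neighbor-shrinking idea, built on an assumed toy rating matrix (not real data): cluster users on their rating profiles, then compute similarities only within the target user's cluster.

set.seed(1)
# Toy user-item rating matrix: 20 users x 8 items with ratings 1-5 (purely illustrative)
ratings = matrix(sample(1:5, 20 * 8, replace = TRUE), nrow = 20,
                 dimnames = list(paste0("user", 1:20), paste0("item", 1:8)))

# Step 1: hierarchical clustering of users on their rating profiles
userDist = dist(ratings, method = "euclidean")
userTree = hclust(userDist, method = "complete")
userGroups = cutree(userTree, k = 4)

# Step 2: restrict the candidate neighbors of a target user to the same cluster
target = "user1"
sameCluster = names(userGroups)[userGroups == userGroups[target] & names(userGroups) != target]

# Step 3: rank the within-cluster neighbors by similarity (Pearson correlation here)
sims = apply(ratings[sameCluster, , drop = FALSE], 1,
             function(u) cor(u, ratings[target, ]))
sort(sims, decreasing = TRUE)                     # closest neighbors to use for recommendations

In a full collaborative filter, the ratings of these within-cluster neighbors would then be weighted by their similarities to predict the target user's missing ratings.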
Benjamin Johnston, Aaron Jones, Christopher Kruger, "Applied Unsupervised Learning with Python", Packt Publishing, May 2019
https://rpubs.com/kismetk/Netflix-recommendation
https://www.displayr.com/what-is-hierarchical-clustering/
https://www.sciencedirect.com/topics/computer-science/hierarchical-cluster-analysis
https://www.bbvadata.com/recommender-systems-marketing-gets-personal/
https://subscription.packtpub.com/book/data/9781785884856/4/ch04lvl1sec25/clustering-techniques
https://towardsdatascience.com/unsupervised-learning-k-means-vs-hierarchical-clustering-5fe2da7c9554