## Clustering – Part 2 – Assignment 2
## Second Submission
## 3, 5, 8, 9, 11, 14
## 1. List the methods of clustering
## • Partitioning methods
## • Hierarchical methods
## • Density-based methods
## • Grid-based methods
## 2. What are partitioning methods?
## • They use the information on the distances between the cases (or observations) in the dataset to obtain the k “best” groups, according to a certain criterion.
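As a hedged illustration of a partitioning method, the sketch below uses base R's kmeans(); the iris data and the choice of k = 3 are illustrative assumptions, not part of the original notes.

```r
# Minimal sketch of a partitioning method with base R's kmeans();
# the iris data and k = 3 are only illustrative choices.
data(iris)
feats <- iris[, 1:4]              # keep the numeric columns
set.seed(1234)                    # k-means starts from random centroids
km <- kmeans(feats, centers = 3)  # ask for k = 3 "best" groups
table(km$cluster, iris$Species)   # compare the groups with the known species
```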
## 3. What are hierarchical methods?
## • They obtain a hierarchy of alternative clustering solutions, known as dendrograms.
## 4. For a dendrogram, given the relationship,
## h(d) ≤ h(g) ⟺ d ⊆ g ,
## which node is higher on the tree, d or g?
## • g is higher: since d ⊆ g, the relationship gives h(d) ≤ h(g), so g sits at a greater (or equal) height in the tree.
## 5. The dendrogram follows which approaches for the task of building the hierarchy of clustering solutions?
## • Dendrograms follow either a divisive or an agglomerative approach to the task of building the hierarchy.
## 6. Explain the divisive clustering approach.
## • You start with one, all-inclusive cluster and, at each step, split a cluster until only singleton clusters of individual points remain.
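One common divisive implementation in R is diana() (DIvisive ANAlysis) from the cluster package; the sketch below assumes that package is installed and uses iris purely as demo data.

```r
# Sketch of divisive hierarchical clustering with cluster::diana();
# assumes the "cluster" package is installed; iris is only demo data.
library(cluster)
d  <- dist(iris[, 1:4])   # distance matrix of the numeric features
dv <- diana(d)            # start from one all-inclusive cluster and keep splitting
pltree(dv, main = "Divisive (DIANA) dendrogram")  # plot the clustering tree
```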
## 7. What are density-based methods?
## • Clusters are dense regions in the data space, separated by regions of lower object density. A cluster is defined as a maximal set of density-connected points, and this approach discovers clusters of arbitrary shape.
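A widely used density-based method is DBSCAN; the sketch below assumes the dbscan package, and the eps / minPts values are illustrative guesses rather than tuned settings.

```r
# Sketch of density-based clustering with dbscan::dbscan(); assumes the
# "dbscan" package is installed; eps and minPts are untuned example values.
library(dbscan)
x  <- as.matrix(iris[, 1:4])
db <- dbscan(x, eps = 0.5, minPts = 5)  # dense regions become clusters
table(db$cluster)                       # cluster 0 holds points flagged as noise
```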
## 8. What are two criteria used to evaluate a clustering solution?
## • Compactness and Separation.
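One way to score both criteria at once is the silhouette width; the sketch below uses cluster::silhouette() on a k-means solution, with the data and the choice of k picked only for illustration.

```r
# Sketch of evaluating compactness and separation via silhouette widths;
# assumes the "cluster" package; data and k are illustrative only.
library(cluster)
set.seed(1234)
d   <- dist(iris[, 1:4])
km  <- kmeans(iris[, 1:4], centers = 3)
sil <- silhouette(km$cluster, d)   # per-observation silhouette values
summary(sil)$avg.width             # nearer 1 = compact and well separated
```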
## 9. Explain agglomerative clustering.
## • You start with the points as individual clusters and, at each step, merge the closest pair of clusters.
## • Equivalently, you start with as many groups as there are cases (points, observations, etc.) in the dataset. At each iteration, the pair of groups that is most similar is merged into a single group.
## 10. What is the goal of the hierarchical clustering methods?
## • The goal of the hierarchical clustering methods is to obtain a hierarchy of possible solutions ranging from one single group to n groups, where n is the number of observations in the dataset.
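The sketch below illustrates that range of solutions: cutree() cuts a single hclust tree into any number of groups from 1 up to n (iris is assumed here only as demo data).

```r
# Sketch: one hierarchy, many solutions. cutree() cuts the same tree
# into any number of groups between 1 and n; iris is demo data only.
hc <- hclust(dist(iris[, 1:4]))   # full hierarchy over the n observations
cutree(hc, k = 1)[1:5]            # the trivial solution: one single group
cutree(hc, k = 3)[1:5]            # a 3-group solution from the same tree
cutree(hc, k = nrow(iris))[1:5]   # n groups: every observation on its own
```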
## 11. What is data noise?
## • Parts of the data, which can be referred to as noise, can disturb the clustering of the remaining domain points when one attempts to cluster them.
## 12. Based on agglomerative hierarchical clustering methods, name three criteria that select the pair of groups that is most similar and is merged into a single group.
## • The single linkage criterion
## • The complete linkage criterion
## • The average linkage criterion
## 13. Explain the single linkage criterion.
## • In single linkage, the distance between two clusters is the minimum distance between members of the two clusters.
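The toy sketch below spells out that definition with two tiny one-dimensional clusters; the data values are made up purely for illustration.

```r
# Toy sketch of the single linkage definition: the distance between two
# clusters is the minimum over all pairs of members (made-up 1-D data).
a <- c(1.0, 1.5, 2.0)        # members of cluster A
b <- c(4.0, 4.5)             # members of cluster B
min(abs(outer(a, b, "-")))   # single linkage distance: |2.0 - 4.0| = 2.0
```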
## 14. Which type of clustering is implemented in the function, hclust( )?
## • Agglomerative Hierarchical clustering
## 15. For the function, hclust( ), what are the first two arguments (in Torgo’s text)?
## • The function hclust() takes as its first argument the distance matrix of the dataset, while the second argument specifies the criterion (single, complete, …) used to select the two groups for merging at each step.
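A minimal usage sketch matching that description follows; the iris data and the single linkage choice are illustrative assumptions.

```r
# Minimal usage sketch of hclust(): first argument is a distance matrix,
# second is the merging criterion; iris and "single" are demo choices.
d  <- dist(iris[, 1:4])             # distance matrix of the dataset
hc <- hclust(d, method = "single")  # criterion used to pick the groups to merge
plot(hc, labels = FALSE)            # draw the resulting dendrogram
```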
## 16. Explain average linkage.
## • In average linkage, the distance between two clusters is the average of all distances between members of the two clusters.
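For symmetry with the single linkage sketch above, the toy example below computes an average linkage distance on the same made-up one-dimensional clusters.

```r
# Toy sketch of the average linkage definition: the distance between two
# clusters is the mean over all pairwise member distances (made-up data).
a <- c(1.0, 1.5, 2.0)
b <- c(4.0, 4.5)
mean(abs(outer(a, b, "-")))   # average linkage distance = 2.75
```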