Question 1. List the methods of clustering

  1. Partitioning methods

  2. Hierarchical methods

  3. Density-based methods

  4. Grid-based methods

Question 2. What are partitioning methods?

Partitioning methods are iterative: the data are divided into a set number of clusters, and at every step points/observations can be shifted between clusters until the overall quality of the solution stops improving.
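
As an aside, a minimal sketch of one common partitioning method, k-means, using base R's kmeans() (the dataset and the choice of k = 3 are arbitrary and not part of the original question):

```r
# k-means repeatedly reassigns points to the nearest of k cluster centers
# until the within-cluster sum of squares stops improving.
data(iris)
km <- kmeans(iris[, 1:4], centers = 3, nstart = 25)
table(km$cluster, iris$Species)  # compare the 3 clusters to the known species
```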

Question 3. What are hierarchical methods?

Hierarchical methods obtain clusters by building a tree-like structure (a dendrogram) that displays the distances at which points/observations are joined together. Once the data are organized in this tree form, the clusters can be read off by cutting the tree at a chosen height.
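
A minimal sketch of such a tree visualization, assuming the built-in USArrests data (the dataset choice is arbitrary):

```r
# Build a hierarchical clustering and draw its dendrogram; branch heights
# show the distance at which points/groups are joined.
d  <- dist(scale(USArrests))   # pairwise distances between observations
hc <- hclust(d)                # default merging criterion is "complete"
plot(hc, cex = 0.6)            # the tree from which clusters can be read off
```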

Question 4.

For a dendrogram, given the relationship,

h(d) <= h(g) <=> d is strictly contained in g, which node is higher on the tree, d or g?

Because the height of g is greater than the height of d, the node for g is higher on the tree.
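
A small sketch of this containment property, again assuming the USArrests data: clusters obtained by cutting lower in the tree are nested inside the clusters obtained by cutting higher up.

```r
hc   <- hclust(dist(scale(USArrests)))
low  <- cutree(hc, k = 6)   # cut lower in the tree: more, smaller clusters
high <- cutree(hc, k = 3)   # cut higher in the tree: fewer, larger clusters
table(low, high)            # each "low" cluster falls entirely inside one "high" cluster
```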

Question 5. The dendrogram follows which approaches for the task of building the hierarchy of clustering solutions?

The dendrogram can be built following either an agglomerative (bottom-up) approach or a divisive (top-down) approach.

Question 6. Explain the divisive clustering approach.

Divisive clustering begins with a single cluster, so all points/observations start out together. At each iteration a cluster is split into smaller clusters of points that are close to one another (based on distance/proximity). The process ends with singleton clusters of individual points.
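
A hedged sketch of divisive clustering in R, assuming the cluster package is installed (its diana() function implements a divisive algorithm; the dataset choice is arbitrary):

```r
library(cluster)               # provides diana() (DIvisive ANAlysis)
dv <- diana(scale(USArrests))  # start from one cluster and split repeatedly
pltree(dv, main = "DIANA dendrogram")  # tree of the successive splits
```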

Question 7. What are density-based methods?

Density-based methods find regions of high density in the feature space, i.e., the space defined by the columns/features of the dataset. Points in the same dense region usually have something in common, so each dense region, separated from the others by sparser areas, is treated as a cluster.
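
A hedged sketch of one well-known density-based method, DBSCAN, assuming the dbscan package is installed (the eps and minPts values here are arbitrary choices for illustration):

```r
library(dbscan)
x  <- scale(iris[, 1:4])                # feature space = the numeric columns
db <- dbscan(x, eps = 0.6, minPts = 5)  # dense regions become clusters
table(db$cluster)                       # cluster 0 collects the points flagged as noise
```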

Question 8. What are the two criteria used to evaluate a clustering solution?

Question 9. Explain agglomerative clustering.

Agglomerative clustering is, in a sense, the opposite of divisive clustering. We start with every point/observation as its own cluster, then at each iteration the two closest clusters are merged into a bigger cluster. The iteration continues until the desired number of clusters is reached (or until everything is merged into a single cluster).
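
A small sketch of the merging process, assuming the USArrests data: hclust() records which two clusters were merged at each iteration and at what distance.

```r
hc <- hclust(dist(scale(USArrests)), method = "average")
head(hc$merge)   # row i: the two clusters merged at step i (negative = single observation)
head(hc$height)  # the distance at which each merge happened
```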

Question 10. What is the goal of the hierarchical clustering methods?

The goal of the hierarchical clustering methods is to provide the full range of possible solutions, ordered by group count, from a single group containing all observations up to n singleton groups, where n is the number of observations in the dataset.
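
A sketch of that range of solutions, assuming the USArrests data: cutree() can return the group memberships for every group count from 1 up to n.

```r
hc   <- hclust(dist(scale(USArrests)))
n    <- nrow(USArrests)
sols <- cutree(hc, k = 1:n)  # one column per solution: 1 group, 2 groups, ..., n groups
dim(sols)                    # n observations by n candidate solutions
```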

Question 11. What is data noise?

Data noise is data that does not follow the typical patterns of the majority of the dataset, which can lead to clusters that are too small. To keep this from happening, we need to set appropriate parameters when building the clusters.

Question 12.

Based on agglomerative hierarchical clustering methods, name three criteria that select the pair of groups that is most similar and is merged into a single group.

  1. Single linkage

  2. Complete linkage

  3. Average linkage

Question 13. Explain the single linkage criterion.

The single linkage criterion takes the two closest points, one from each group, and uses their distance as the distance between the two groups.
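
A toy, hand-rolled illustration (the points are made up): the single linkage distance between two groups is the smallest of all pairwise distances between their members.

```r
g1 <- matrix(c(0, 0,
               1, 0), ncol = 2, byrow = TRUE)        # two points in group 1
g2 <- matrix(c(4, 0,
               6, 0), ncol = 2, byrow = TRUE)        # two points in group 2
pairwise <- as.matrix(dist(rbind(g1, g2)))[1:2, 3:4] # cross-group distances
min(pairwise)  # single linkage distance: 3, between (1,0) and (4,0)
```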

Question 14. Which type of clustering is implemented in the function, hclust( )?

The type of clustering is agglomerative hierarchical clustering.

Question 15. For the function, hclust( ), what are the first two arguments?

  1. The distance matrix of the dataset (as produced by dist())

  2. The merging criterion (single, complete, average, etc.); a minimal call is sketched below
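
A minimal sketch of the call, assuming the USArrests data:

```r
d  <- dist(scale(USArrests))        # first argument: the distance matrix
hc <- hclust(d, method = "single")  # second argument: the merging criterion
plot(hc, cex = 0.6)
```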

Question 16. Explain the average linkage criterion.

Average linkage is a merging criterion for hierarchical clustering that uses the average of the distances between the points of two clusters as the measure of proximity between them. That is, it summarizes the block of the distance matrix formed by pairs of points from the two clusters into a single average value.
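
A toy, hand-rolled counterpart to the single linkage example above (the points are made up): average linkage takes the mean of all pairwise distances between members of the two groups.

```r
g1 <- matrix(c(0, 0,
               1, 0), ncol = 2, byrow = TRUE)        # two points in group 1
g2 <- matrix(c(4, 0,
               6, 0), ncol = 2, byrow = TRUE)        # two points in group 2
pairwise <- as.matrix(dist(rbind(g1, g2)))[1:2, 3:4] # cross-group distances
mean(pairwise)  # average linkage distance: (4 + 6 + 3 + 5) / 4 = 4.5
```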