#                Clustering - Part 2 - Assignment 2

 

#1. List the methods of clustering. Partitioning, density-based,
#   hierarchical, and grid-based.

#2. What are partitioning methods? They use the information on the
#   distances between the cases in the dataset to obtain the k “best”
#   groups, according to a certain criterion.

#3. What are hierarchical methods? They obtain a hierarchy of alternative
#   clustering solutions, known as a dendrogram. These methods can follow a divisive or an
#   agglomerative approach to the task of building the hierarchy.

#4. For a dendrogram, given the relationship,
#            h(d) <= h(g) <=> d is a subset of g
#           which node is higher on the tree, d or g? g. Because d is a subset of g,
#           g is formed by a later merge and so sits at least as high as d.

 

#5. Which approaches can be followed to build the hierarchy of clustering solutions
#     shown in a dendrogram? Divisive and agglomerative.

#6. Explain the divisive clustering approach. You start with one all-inclusive cluster and, at
#   each step, split a cluster until only singleton clusters of individual points remain. In this
#   case, we need to decide which cluster to split at each step and how to do the splitting.
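
#   A minimal sketch of the divisive approach, assuming the `cluster` package (a recommended
#   package bundled with most R installations) is available. The toy matrix `x` and the
#   name `groups_div` are hypothetical and only serve the illustration; diana() is one
#   divisive algorithm among several, not necessarily the one the text has in mind.

```r
library(cluster)

# Toy data (hypothetical): two well-separated groups of three points each.
x <- rbind(c(0, 0), c(0, 1), c(1, 0),
           c(10, 10), c(10, 11), c(11, 10))

# diana() starts from one all-inclusive cluster and keeps splitting.
dv <- diana(x)

# Convert to an hclust object and keep only the two-group level of the hierarchy.
groups_div <- cutree(as.hclust(dv), k = 2)
print(groups_div)
```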

#7. What are density-based methods? These methods try to find regions of the feature
#   space where cases are packed together with high density. Because of this, they are
#   frequently also used as a way of finding outliers, as these are by definition rather
#   different from other cases and thus should not belong to these high-density regions of
#   the feature space.

#8. What are two criteria used to evaluate a clustering solution? (i) compactness: how
#   similar are the cases in each cluster; and (ii) separation: how different is a cluster
#   from the others.
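
#   One simple way to put numbers on these two criteria, sketched in base R. The data `x`,
#   the cluster labels `labels`, and the choice of mean pairwise Euclidean distance are
#   assumptions made for illustration; other measures (for example silhouette widths)
#   are common as well.

```r
# Toy data (hypothetical) with a known two-cluster assignment.
x <- rbind(c(0, 0), c(0, 1), c(1, 0),
           c(10, 10), c(10, 11), c(11, 10))
labels <- c(1, 1, 1, 2, 2, 2)

d <- as.matrix(dist(x))                 # all pairwise Euclidean distances
same <- outer(labels, labels, "==")     # TRUE where both cases share a cluster
pairs <- upper.tri(d)                   # count each pair of cases once

compactness <- mean(d[pairs & same])    # mean within-cluster distance (smaller is better)
separation  <- mean(d[pairs & !same])   # mean between-cluster distance (larger is better)
print(c(compactness = compactness, separation = separation))
```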

#9. Explain agglomerative clustering. Start with as many groups as there are cases in the 
#   dataset. At each iteration, the pair of groups that is most similar is merged into a single
#   group.

#10. What is the goal of the hierarchical clustering methods? Their goal is to obtain a
#    hierarchy of possible solutions ranging from one single group to n groups, where n is
#    the number of observations in the dataset.
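
#    A short sketch of that range of solutions, using base R's hclust() and cutree().
#    The five one-dimensional cases in `x` are made up for illustration.

```r
# Five hypothetical cases measured on a single variable.
x <- matrix(c(1, 2, 6, 7, 15), ncol = 1)
n <- nrow(x)

hc <- hclust(dist(x))

# One column per solution: k = 1 (a single group) up to k = n (all singletons).
solutions <- cutree(hc, k = 1:n)

# Number of distinct groups found in each column.
n_groups <- apply(solutions, 2, function(g) length(unique(g)))
print(n_groups)
```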

#11. What is data noise? Data that carries meaningless information, such as random errors
#    or corrupted values that do not reflect the underlying structure.

#12. Based on agglomerative hierarchical clustering methods, name three criteria that 
#    select the pair of groups that is most similar and is merged into a single group. Single, 
#    Complete and Average Linkage.

#13. Explain the single linkage criterion. The single linkage criterion measures the
#    difference between two groups by the smallest distance between any two
#    observations, one taken from each group.
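
#    The criterion can be worked through by hand in base R. The two small groups `a` and
#    `b` below are hypothetical, and Euclidean distance is assumed.

```r
# Two hypothetical groups of two cases each.
a <- rbind(c(0, 0), c(1, 0))
b <- rbind(c(4, 0), c(9, 0))

# Distances between every case in a (rows) and every case in b (columns).
d_all <- as.matrix(dist(rbind(a, b)))
cross <- d_all[1:2, 3:4]

# Single linkage: the smallest of those cross-group distances.
single_link <- min(cross)
print(single_link)
```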

#14. Which type of clustering is implemented in the function, hclust( )? Agglomerative 
#    hierarchical clustering.

#15. For the function, hclust( ), what are the first two arguments (in Torgo’s text)? The first
#    argument is the distance matrix and the second is the merging method, which defaults
#    to complete linkage.
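
#    A brief sketch of those two arguments in use. The data `x` are made up; `d` and
#    `method` are the argument names used by hclust() in base R's stats package.

```r
# Five hypothetical cases measured on a single variable.
x <- matrix(c(1, 2, 6, 7, 15), ncol = 1)
d <- dist(x)                                   # first argument: the distances

hc_default  <- hclust(d)                       # method left at its default
hc_complete <- hclust(d, method = "complete")  # the default, spelled out
hc_single   <- hclust(d, method = "single")    # a different merging criterion

# The default and the explicit "complete" call give the same merge heights.
print(identical(hc_default$height, hc_complete$height))
print(hc_single$height)
```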

#16. Explain average linkage. Average linkage measures the difference between two groups
#    by the average distance over all pairs of observations, one taken from each group.
#    At each iteration, the pair of groups with the smallest such average is merged into
#    a single group.
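
#    The same kind of hand calculation works for average linkage. The groups `a` and `b`
#    are hypothetical and Euclidean distance is assumed.

```r
# Two hypothetical groups of two cases each.
a <- rbind(c(0, 0), c(1, 0))
b <- rbind(c(4, 0), c(9, 0))

# Distances between every case in a (rows) and every case in b (columns).
d_all <- as.matrix(dist(rbind(a, b)))
cross <- d_all[1:2, 3:4]

# Average linkage: the mean of all cross-group distances.
average_link <- mean(cross)
print(average_link)
```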
#