1.  List the methods of clustering
-->
A. Partitioning methods
B. Hierarchical methods 
C. Density-based methods
D. Grid-based methods

2.  What are partitioning methods?
-->
K-means clustering: partitions the data into k groups, each represented by the centroid (mean) of its members.

k-medoids / PAM (Partitioning Around Medoids): like k-means, but each group is represented by a medoid (an actual representative data point of the group) instead of a centroid.
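
A minimal R sketch of both methods, assuming the cluster package is installed for pam() and using the built-in iris measurements as toy data:

   data(iris)
   x <- scale(iris[, 1:4])          # standardize so no variable dominates

   set.seed(42)                     # k-means starts from random centroids
   km <- kmeans(x, centers = 3)     # k-means: groups represented by centroids
   km$centers                       # the 3 centroids (means, not data points)

   library(cluster)                 # provides pam()
   pm <- pam(x, k = 3)              # k-medoids: groups represented by medoids
   pm$medoids                       # the 3 medoids (actual observations)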

 3. What are hierarchical methods?
 -->
Hierarchical methods group the records so that observations with more similarities fall in the same group and dissimilar observations fall in different groups. The result is a tree-like structure whose visualization is called a dendrogram. One simple example is the dendrogram (tree) of the evolution of life on Earth.
 
 4. For a dendrogram, given the relationship,
h(d) ≤ h(g) ⟺ d ⊆ g ,
which node is higher on the tree, d or g?
-->
A dendrogram is an n-tree in which each internal node is associated with a height satisfying the condition

h(d) ≤ h(g) ⟺ d ⊆ g

for all subsets of data points d and g, provided d ∩ g ≠ ∅, where h(d) and h(g) denote the heights of d and g, respectively. Since the relation says d is contained in g, the node g represents the larger (later-merged) group and therefore sits higher on the tree.
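
A small R sketch illustrating the height condition with hclust() (the function used in Q14-15); $merge and $height are standard components of the object it returns:

   d <- dist(matrix(c(0, 1, 4, 9), ncol = 1))  # four 1-d points as toy data
   h <- hclust(d, method = "single")
   h$merge   # which groups were joined at each step
   h$height  # heights never decrease here: a group containing another sits higher
   plot(h)   # the dendrogram makes the ordering visible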



 5. The dendrogram follows which approaches for the task of building the hierarchy of clustering solutions?
 -->
 These dendrograms follow
(i) a divisive, or
(ii) an agglomerative
approach to the task of building the hierarchy of clustering solutions.

 6. Explain the divisive clustering approach.
 -->
 The divisive (top-down) approach starts with a single group containing all the observations and then iteratively keeps splitting one of the current groups into two separate clusters according to some criterion, until n groups with a single observation each are obtained, where n is the number of cases (or observations) in the dataset.
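
A hedged R sketch of this approach, assuming the cluster package, whose diana() function (DIvisive ANAlysis) implements the top-down strategy:

   library(cluster)                 # provides diana() and pltree()
   dv <- diana(scale(iris[, 1:4]))  # start from one all-inclusive group, split down
   pltree(dv)                       # dendrogram of the divisive hierarchy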
 
7.  What are density-based methods?
-->

These methods try to find regions of the variable space where observations are packed together with high density, identifying clusters as dense regions separated by sparse areas.
Example:
DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
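
A minimal R sketch, assuming the dbscan package (one common implementation); the eps and minPts values are illustrative, not tuned:

   library(dbscan)                          # provides dbscan()
   x <- as.matrix(scale(iris[, 1:4]))
   res <- dbscan(x, eps = 0.8, minPts = 5)  # dense regions become clusters
   table(res$cluster)                       # label 0 marks noise points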


 8. What are two criteria used to evaluate a clustering solution?
 -->
 
1. Measures of cluster cohesion (compactness): decrease the dispersion within each cluster,
   and
2. Measures of cluster separation (isolation): increase the dispersion between clusters.
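
A brief R sketch, assuming the cluster package: the silhouette score combines both criteria, comparing each observation's within-cluster cohesion against its separation from the nearest other cluster:

   library(cluster)                        # provides silhouette()
   x <- scale(iris[, 1:4])
   km <- kmeans(x, centers = 3)
   sil <- silhouette(km$cluster, dist(x))  # cohesion vs. separation per observation
   summary(sil)$avg.width                  # closer to 1 = compact and well separated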
 
 9. Explain agglomerative clustering.
 -->
 In agglomerative hierarchical clustering (the bottom-up approach), you start with the points as individual clusters and, at each step, merge the closest pair of clusters. This requires defining a notion of cluster proximity.

10. What is the goal of the hierarchical clustering methods?
-->
Hierarchical clustering methods organize the data into a hierarchy of nested clusters, represented by a dendrogram tree structure. Cutting the dendrogram at a particular height also determines the number of clusters at that height.
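
A short R sketch of reading a clustering solution off the tree with the standard cutree() function, either for a chosen number of clusters or at a chosen height (both values here are illustrative):

   h <- hclust(dist(scale(iris[, 1:4])), method = "complete")
   cutree(h, k = 3)  # cut to obtain exactly 3 nested clusters
   cutree(h, h = 4)  # or cut at height 4; the height fixes the cluster count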

 11.    What is data noise?
 -->
 The definition of noise in data can differ by field of study. In density-based clustering, noise points are observations that neither have enough other observations within their radius nor are sufficiently close to any core point.
The algorithm starts by separating the noise points into their own group, since these cases are too different from the rest, to the point of not making sense to use them in the cluster formation.
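
Continuing the dbscan sketch from Q7 (same package assumption), the noise points can be pulled out directly, since that implementation labels them with cluster 0:

   library(dbscan)
   x <- as.matrix(scale(iris[, 1:4]))
   res <- dbscan(x, eps = 0.8, minPts = 5)
   noise <- x[res$cluster == 0, , drop = FALSE]  # cluster 0 = noise in this package
   nrow(noise)                                   # cases set aside from cluster formation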

 12.    Based on agglomerative hierarchical clustering methods, name three criteria that select the pair of groups that is most similar and is merged into a single group.
-->
(i). Single linkage criterion
(ii). Complete linkage
(iii). Average linkage


13. Explain the single linkage criteria.
--> 
The single linkage criterion measures the difference between two groups by the smallest distance between any two observations, one taken from each group.
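
A tiny base-R sketch making single linkage (and, for contrast, the other two criteria) concrete on two toy groups; the coordinates are arbitrary:

   g1 <- matrix(c(0, 0,
                  1, 1), ncol = 2, byrow = TRUE)
   g2 <- matrix(c(4, 4,
                  5, 5), ncol = 2, byrow = TRUE)
   cross <- as.matrix(dist(rbind(g1, g2)))[1:2, 3:4]  # cross-group distances
   min(cross)   # single linkage: smallest cross-group distance
   max(cross)   # complete linkage: largest cross-group distance
   mean(cross)  # average linkage: mean cross-group distance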


14. Which type of clustering is implemented in the function, hclust( )?
-->
Hierarchical (agglomerative) clustering.
 
 15.    For the function, hclust( ), what are the first two arguments (in Torgo’s text)?
 -->
 hclust(d, method = "complete", members = NULL)
 
The first argument is a dissimilarity (distance) matrix, typically produced by the dist() function.
The second argument is the agglomeration method, e.g., "single", "complete", or "average".
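
A minimal usage sketch following that signature (the data and method choice are illustrative):

   d <- dist(scale(iris[, 1:4]))        # first argument: a dist() dissimilarity matrix
   h <- hclust(d, method = "average")   # second argument: the agglomeration method
   plot(h)                              # draw the resulting dendrogram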
 
 16.    Explain average linking.
 -->
The average linkage criterion uses the average distance between any two observations, one from each of the two groups. At each iteration, the pair of groups at the smallest such distance is merged into a single group.