#VARIOUS DISTANCES
clusters_euc <- hclust(dist(MLBdata, method = "euclidean"))
## Warning in dist(MLBdata, method = "euclidean"): NAs introduced by coercion
plot(clusters_euc)
clusters_max <- hclust(dist(MLBdata, method = "maximum"))
## Warning in dist(MLBdata, method = "maximum"): NAs introduced by coercion
plot(clusters_max)
clusters_man <- hclust(dist(MLBdata, method = "manhattan"))
## Warning in dist(MLBdata, method = "manhattan"): NAs introduced by coercion
plot(clusters_man)
-All three dendrograms show 2 clear clusters, although the maximum and Manhattan distance metrics could arguably be read as producing 3
-The leftmost cluster is about the same for the Euclidean distance and the Manhattan distance
-At first I assumed the teams were grouped into the best and the worst. After looking at the standings, it is not clear what the teams are grouped by. I do not know much about the teams or baseball statistics, so it is difficult to say what they may be clustered by.
-The maximum distance produces a noticeably different clustering from the other two
-COL, DET, and LAA are grouped together in all three dendrograms
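The "NAs introduced by coercion" warning usually means `dist()` was handed a non-numeric column (most likely the team names), and looking at per-cluster averages is one way to see what a grouping reflects. The sketch below illustrates both ideas on synthetic data. The data frame `toy` and its columns (`Team`, `R`, `HR`, `ERA`) are hypothetical stand-ins, not the real `MLBdata`, and the `scale()` step is an assumption that the original calls do not make.

```r
# Synthetic stand-in for MLBdata (hypothetical columns and made-up values)
set.seed(1)
toy <- data.frame(
  Team = paste0("T", 1:10),
  R    = round(rnorm(10, 700, 60)),
  HR   = round(rnorm(10, 180, 30)),
  ERA  = round(rnorm(10, 4.2, 0.5), 2)
)

# Keep only the numeric columns so dist() has nothing to coerce
num <- toy[sapply(toy, is.numeric)]
rownames(num) <- toy$Team   # dendrogram leaves are labelled by team

# Standardize so that no single large-scale variable dominates the distance
# (an assumption of this sketch; the original code does not scale)
d  <- dist(scale(num), method = "euclidean")
hc <- hclust(d)             # complete linkage is the default
plot(hc)

# Cut the tree into 2 groups and compare the variable means per group
groups <- cutree(hc, k = 2)
aggregate(num, by = list(cluster = groups), FUN = mean)
```

If the per-cluster means differ sharply on one or two variables, those variables are plausible drivers of the split.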
## Warning in dist(MLBdata, method = "euclidean"): NAs introduced by coercion
From the different linkages with the Euclidean distance, we notice:
-The first three linkages create 2 separate clusters, while the centroid linkage creates a very strange dendrogram that technically has 3 clusters but looks like just one big one
-COL and DET are next to each other in all 4
-CIN and SDP are next to each other in all 4 (in the centroid dendrogram they fall in different clusters, but just barely)
-The complete and McQuitty linkages produce the most similar clusters
-Honestly, though, I have no idea which variables are creating these clusters
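The code that produced the four linkage dendrograms is not shown in this output. A minimal sketch of how such a comparison could be set up follows, assuming the four linkages are `"complete"`, `"average"`, `"mcquitty"`, and `"centroid"` (the ones named in these notes; their order here is a guess). The matrix `toy` is synthetic and only stands in for the numeric part of `MLBdata`.

```r
# Synthetic numeric data (hypothetical stand-in for MLBdata)
set.seed(2)
toy <- matrix(rnorm(40), nrow = 10,
              dimnames = list(paste0("T", 1:10), NULL))

d <- dist(toy, method = "euclidean")

# Fit one tree per linkage on the same distance matrix
linkages <- c("complete", "average", "mcquitty", "centroid")
fits <- lapply(linkages, function(m) hclust(d, method = m))
names(fits) <- linkages

# Draw the four dendrograms in a 2 x 2 grid for side-by-side comparison
op <- par(mfrow = c(2, 2))
for (m in linkages) plot(fits[[m]], main = m, xlab = "", sub = "")
par(op)
```

One possible reason the centroid dendrogram looks odd: the `hclust` documentation notes that the centroid and median methods can produce inversions (merge heights that are not monotone), and its own example applies centroid linkage to squared Euclidean distances rather than plain ones.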
## Warning in dist(MLBdata, method = "maximum"): NAs introduced by coercion
From the different linkages with the maximum distance, we notice:
-BAL and MIA both branch off into their own groups very early in the first three linkages
-Again, the dendrogram created by the centroid linkage is very different from the rest, as it does not produce distinct clusters
-The average and McQuitty linkages produce similar-looking dendrograms
## Warning in dist(MLBdata, method = "manhattan"): NAs introduced by coercion
From the different linkages with the Manhattan distance, we notice:
-The dendrograms produced by the complete and McQuitty linkages are extremely similar
-The dendrogram produced by the centroid linkage is yet again very different from the other 3
-SDP seems to branch off very quickly in the average linkage and the centroid linkage
-Again, I am not sure how to interpret the clusterings; most have two clusters
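Judging "extremely similar" by eye can be backed up with a number. One option, sketched below on synthetic data, is to correlate the cophenetic distances of each pair of trees: values near 1 mean two dendrograms merge the observations in nearly the same way. The matrix `toy` is a hypothetical stand-in for the numeric part of `MLBdata`, and the linkage list is assumed from the notes above.

```r
# Synthetic numeric data (hypothetical stand-in for MLBdata)
set.seed(3)
toy <- matrix(rnorm(40), nrow = 10,
              dimnames = list(paste0("T", 1:10), NULL))

d <- dist(toy, method = "manhattan")
linkages <- c("complete", "average", "mcquitty", "centroid")

# One column per linkage: the cophenetic distance for every pair of rows
coph <- sapply(linkages, function(m) {
  as.vector(cophenetic(hclust(d, method = m)))
})

# Pairwise correlation between the trees' cophenetic distances
sim <- cor(coph)
round(sim, 2)
```

A high entry for the complete/mcquitty pair and lower entries in the centroid row would be consistent with the visual impressions listed above.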