Reading in the data:
data_cereals <- read.table("./data/T11-9_Fixed_JB (1).DAT", header = FALSE, row.names = 1)
dist_cereals <- dist(data_cereals[,-1])
Displaying a subset of the distance matrix:
print(round(dist_cereals_sub <- dist(data_cereals[1:6,-1]),2))
## ACCheerios Cheerios CocoaPuffs CountChocula GoldenGrahams
## Cheerios 116.04
## CocoaPuffs 15.51 121.65
## CountChocula 6.36 117.89 10.00
## GoldenGrahams 103.20 61.63 100.62 102.10
## HoneyNutCheerios 72.82 44.12 78.36 74.43 54.26
clust_cer1 <- hclust(dist_cereals,method = "single")
clust_cer2 <- hclust(dist_cereals,method = "complete")
plot(clust_cer1)
plot(clust_cer2)
COMMENT: The two dendrograms are quite different. For example, in the first (single linkage), All Bran forms a cluster of its own, while in the second (complete linkage) it is part of a larger cluster.
Running K-means with 2, 3 and 4 centers respectively and keeping only the cluster assignments:
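One way to put a number on how different two dendrograms are is the correlation between their cophenetic distances. The following is only a minimal sketch: it assumes the built-in USArrests data as a stand-in for the cereal data, and the object names are illustrative rather than those used above.

```r
# Stand-in data (an assumption): the built-in USArrests data set, not the cereal data
d <- dist(USArrests)

hc_single   <- hclust(d, method = "single")
hc_complete <- hclust(d, method = "complete")

# Cophenetic distance = height at which two observations are first merged
coph_single   <- cophenetic(hc_single)
coph_complete <- cophenetic(hc_complete)

# Values near 1 mean the two trees imply similar between-observation distances
coph_cor <- cor(coph_single, coph_complete)
print(round(coph_cor, 2))

# Side-by-side plots make the structural differences visible
op <- par(mfrow = c(1, 2))
plot(hc_single, main = "Single linkage", cex = 0.6)
plot(hc_complete, main = "Complete linkage", cex = 0.6)
par(op)
```

Applied to the objects above, the same comparison would read cor(cophenetic(clust_cer1), cophenetic(clust_cer2)), provided it is run before those names are reused.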
kmeans_cer2 <- kmeans(x = data_cereals[,2:10], centers = 2)
kmeans_cer3 <- kmeans(x = data_cereals[,2:10], centers = 3)
kmeans_cer4 <- kmeans(x = data_cereals[,2:10], centers = 4)
kmeans2_cer_clust <- kmeans_cer2$cluster
kmeans3_cer_clust <- kmeans_cer3$cluster
kmeans4_cer_clust <- kmeans_cer4$cluster
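Because kmeans() starts from randomly chosen centers, the assignments above can change from run to run. A small sketch of how the run could be made repeatable, assuming synthetic two-group data (the data, seeds and nstart value are illustrative choices, not part of the original analysis):

```r
# Synthetic stand-in data (an assumption): two well-separated groups in two dimensions
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))

# Fix the seed for repeatable output; nstart = 25 runs 25 random starts and
# keeps the solution with the lowest total within-cluster sum of squares
set.seed(42)
km_a <- kmeans(x, centers = 2, nstart = 25)
set.seed(42)
km_b <- kmeans(x, centers = 2, nstart = 25)

# Same seed, same assignments
identical(km_a$cluster, km_b$cluster)

# Cluster sizes and total within-cluster sum of squares
km_a$size
km_a$tot.withinss
```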
Comparing these results with the results from Question 1 above:
First, create cuts for 2, 3 and 4 clusters on the two dendrograms:
dendo_cer1_2 <- cutree(tree=clust_cer1, k = 2)
dendo_cer1_3 <- cutree(tree=clust_cer1, k = 3)
dendo_cer1_4 <- cutree(tree=clust_cer1, k = 4)
dendo_cer2_2 <- cutree(tree=clust_cer2, k = 2)
dendo_cer2_3 <- cutree(tree=clust_cer2, k = 3)
dendo_cer2_4 <- cutree(tree=clust_cer2, k = 4)
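The six calls above could also be written more compactly, since cutree() accepts a vector for k. A brief sketch, again assuming the built-in USArrests data as a stand-in and illustrative object names:

```r
# Stand-in data (an assumption): the built-in USArrests data set
hc <- hclust(dist(USArrests), method = "complete")

# k may be a vector; the result is then a matrix with one column per value of k
cuts <- cutree(hc, k = 2:4)
head(cuts)

# Cluster sizes for each cut
apply(cuts, 2, table)
```

Individual cuts are then available as columns, for example cuts[, "3"].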
Binding the dendrogram cuts for each number of clusters with the respective K-means clustering:
clust_cer2 <- cbind(dendo_cer1_2,dendo_cer2_2, kmeans2_cer_clust)
clust_cer3 <- cbind(dendo_cer1_3,dendo_cer2_3, kmeans3_cer_clust)
clust_cer4 <- cbind(dendo_cer1_4,dendo_cer2_4, kmeans4_cer_clust)
Now comparing these:
round(cor(clust_cer2),2)
## dendo_cer1_2 dendo_cer2_2 kmeans2_cer_clust
## dendo_cer1_2 1.00 -0.08 -0.08
## dendo_cer2_2 -0.08 1.00 1.00
## kmeans2_cer_clust -0.08 1.00 1.00
round(cor(clust_cer3),2)
## dendo_cer1_3 dendo_cer2_3 kmeans3_cer_clust
## dendo_cer1_3 1.00 0.57 -0.45
## dendo_cer2_3 0.57 1.00 -0.41
## kmeans3_cer_clust -0.45 -0.41 1.00
round(cor(clust_cer4),2)
## dendo_cer1_4 dendo_cer2_4 kmeans4_cer_clust
## dendo_cer1_4 1.00 0.56 -0.59
## dendo_cer2_4 0.56 1.00 -0.47
## kmeans4_cer_clust -0.59 -0.47 1.00
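COMMENT: From the correlations, the following:
* K-means and the complete-linkage dendrogram give exactly the same two clusters (correlation 1.00), whereas the single-linkage cut is essentially unrelated to both (-0.08).
* For three and four clusters no pair agrees this closely: the two dendrogram cuts correlate at about 0.56 to 0.57, and K-means correlates with either cut at between -0.41 and -0.59.
* Cluster labels are arbitrary, so the sign of a correlation carries no meaning, and for more than two clusters the correlation is only a rough indicator of agreement.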
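Since cluster numbers are arbitrary labels, a correlation between label vectors can understate agreement. The toy sketch below (values invented for illustration; rand_index is a hypothetical helper, not a base R function) shows two identical partitions with a negative correlation, and two label-free alternatives: a cross-tabulation and the Rand index.

```r
# Two label vectors describing the SAME partition with different cluster numbers
# (toy values chosen for illustration)
a <- c(1, 1, 2, 2, 3, 3)
b <- c(2, 2, 3, 3, 1, 1)

# The correlation suggests disagreement even though the partitions are identical
cor(a, b)   # -0.5

# A cross-tabulation is unaffected by relabelling: one non-zero cell per row
# and per column means the two partitions coincide
table(a, b)

# Rand index: share of observation pairs on which the two partitions agree
# (both together or both apart); 1 means identical partitions
rand_index <- function(x, y) {
  same_x <- outer(x, x, "==")
  same_y <- outer(y, y, "==")
  pairs  <- upper.tri(same_x)
  mean(same_x[pairs] == same_y[pairs])
}
rand_index(a, b)   # 1
```

With the objects above, the same check would be, for example, table(dendo_cer2_3, kmeans3_cer_clust).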
data_records <- read.table("./data/T1-9.dat", header = FALSE, row.names = 1, sep ="")
NOTE: The spaces in the row names "Korea, N" and "Korea, S" had to be removed (giving "Korea,N" and "Korea,S"), because with sep = "" read.table splits fields on whitespace.
dist_records <- dist(data_records)
Displaying a subset of the distance matrix:
print(round(dist_records_sub <- dist(data_records[1:6,-1]),2))
## ARG AUS AUT BEL BER
## AUS 7.89
## AUT 4.48 11.03
## BEL 7.37 2.88 11.33
## BER 23.88 31.06 20.04 31.21
## BRA 3.49 4.42 6.95 4.45 26.92
clust_records1 <- hclust(dist_records,method = "single")
clust_records2 <- hclust(dist_records,method = "complete")
plot(clust_records1)
plot(clust_records2)
COMMENT: Again, as is quite apparent from the dendrograms, the two results are really quite different.
kmeans_records2 <- kmeans(x = data_records, centers = 2)
kmeans_records3 <- kmeans(x = data_records, centers = 3)
kmeans_records4 <- kmeans(x = data_records, centers = 4)
kmeans2_rec_clust <- kmeans_records2$cluster
kmeans3_rec_clust <- kmeans_records3$cluster
kmeans4_rec_clust <- kmeans_records4$cluster
First, create cuts for 2, 3 and 4 clusters on the two dendrograms:
dendo_rec1_2 <- cutree(tree=clust_records1, k = 2)
dendo_rec1_3 <- cutree(tree=clust_records1, k = 3)
dendo_rec1_4 <- cutree(tree=clust_records1, k = 4)
dendo_rec2_2 <- cutree(tree=clust_records2, k = 2)
dendo_rec2_3 <- cutree(tree=clust_records2, k = 3)
dendo_rec2_4 <- cutree(tree=clust_records2, k = 4)
Binding the dendrogram cuts for each number of clusters with the respective K-means clustering:
clust_records2 <- cbind(dendo_rec1_2,dendo_rec2_2, kmeans2_rec_clust)
clust_records3 <- cbind(dendo_rec1_3,dendo_rec2_3, kmeans3_rec_clust)
clust_records4 <- cbind(dendo_rec1_4,dendo_rec2_4, kmeans4_rec_clust)
Now comparing these:
round(cor(clust_records2),2)
## dendo_rec1_2 dendo_rec2_2 kmeans2_rec_clust
## dendo_rec1_2 1.00 0.81 -0.44
## dendo_rec2_2 0.81 1.00 -0.54
## kmeans2_rec_clust -0.44 -0.54 1.00
round(cor(clust_records3),2)
## dendo_rec1_3 dendo_rec2_3 kmeans3_rec_clust
## dendo_rec1_3 1.00 0.60 0.52
## dendo_rec2_3 0.60 1.00 -0.26
## kmeans3_rec_clust 0.52 -0.26 1.00
round(cor(clust_records4),2)
## dendo_rec1_4 dendo_rec2_4 kmeans4_rec_clust
## dendo_rec1_4 1.00 0.7 -0.16
## dendo_rec2_4 0.70 1.0 -0.50
## kmeans4_rec_clust -0.16 -0.5 1.00
See hand-written work
Using the data_cereals dataset and working with three K-means centers, that is, with kmeans_cer3.
Consider the centers:
centers <- kmeans_cer3$centers
star <- stars(centers,len=0.6,lwd=2, col.lines=1:6)
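A star plot of cluster centers is easier to read with cluster labels and a key naming the rays. A minimal sketch, assuming standardized columns of the built-in mtcars data as a stand-in; the variable selection, seed and key position are illustrative assumptions:

```r
# Stand-in data (an assumption): standardized columns of the built-in mtcars data
x <- scale(mtcars[, c("mpg", "hp", "wt", "qsec")])

set.seed(1)
km <- kmeans(x, centers = 3, nstart = 25)

# One star per cluster center; each ray is one variable, rescaled to [0, 1]
# across the centers (scale = TRUE is the default)
stars(km$centers,
      labels  = paste("Cluster", 1:3),
      key.loc = c(6, 2),       # position of the key naming each ray; adjust as needed
      main    = "K-means centers")
```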
For the Faces:
install.packages("aplpack")
library(aplpack)
## Warning: package 'aplpack' was built under R version 3.1.3
## Loading required package: tcltk
face <- faces(data_cereals[,2:10])
## effect of variables:
## modified item Var
## "height of face " "V3"
## "width of face " "V4"
## "structure of face" "V5"
## "height of mouth " "V6"
## "width of mouth " "V7"
## "smiling " "V8"
## "height of eyes " "V9"
## "width of eyes " "V10"
## "height of hair " "V11"
## "width of hair " "V3"
## "style of hair " "V4"
## "height of nose " "V5"
## "width of nose " "V6"
## "width of ear " "V7"
## "height of ear " "V8"
The heatmap:
heatmap(sapply(data_cereals,as.numeric))
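Note that sapply(..., as.numeric) returns a matrix without row names and converts any non-numeric column to factor codes or NA. A hedged alternative sketch, assuming the built-in mtcars data as a stand-in, passes a numeric matrix directly and scales the columns for display:

```r
# Stand-in data (an assumption): the built-in mtcars data, all columns numeric
m <- as.matrix(mtcars)

# scale = "column" standardizes each variable for the colour display, so variables
# on large scales do not dominate the colours (the dendrograms are still computed
# from the unscaled values); row names are kept as labels
hm <- heatmap(m, scale = "column", margins = c(5, 8))

# heatmap() invisibly returns the row and column orderings from the dendrograms
str(hm$rowInd)
```

For the cereal data the analogous call would be heatmap(as.matrix(data_cereals[, 2:10]), scale = "column"), assuming columns 2 to 10 are the numeric variables, as in the K-means calls above.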