library(cluster)
olive <- read.csv("~/Desktop/碩一下/多變量/olive.csv", header = TRUE)
head(olive)
## Region Area Palmitic Palmitoleic Stearic Oleic Linoleic Linolenic
## 1 1 1 1075 75 226 7823 672 36
## 2 1 1 1088 73 224 7709 781 31
## 3 1 1 911 54 246 8113 549 31
## 4 1 1 966 57 240 7952 619 50
## 5 1 1 1051 67 259 7771 672 50
## 6 1 1 911 49 268 7924 678 51
## Arachidic Eicosenoic
## 1 60 29
## 2 61 29
## 3 63 29
## 4 78 35
## 5 80 46
## 6 70 44
newolive <- olive[, 3:10]  # keep only the eight fatty-acid columns
x <- daisy(newolive, stand = TRUE)  # Euclidean dissimilarities on standardized variables
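For reference, `daisy()` with `stand = TRUE` centers each column on its mean and divides by its mean absolute deviation, not its standard deviation. The sketch below reproduces the same dissimilarities by hand. It uses made-up data: the `toy` data frame and its columns are hypothetical, not the olive data.

```r
library(cluster)

# Hypothetical toy data standing in for the fatty-acid columns
set.seed(1)
toy <- data.frame(a = rnorm(10, 1000, 100), b = rnorm(10, 50, 10))

# Standardize by hand: subtract the column mean, divide by the mean absolute deviation
centered <- sweep(as.matrix(toy), 2, colMeans(toy))
mad_scale <- colMeans(abs(centered))
toy_std <- sweep(centered, 2, mad_scale, "/")

d_daisy <- daisy(toy, stand = TRUE)  # same call pattern as in the walkthrough
d_manual <- dist(toy_std)            # Euclidean distance on hand-standardized data

same_dissimilarities <- isTRUE(all.equal(as.vector(d_daisy), as.vector(d_manual)))
print(same_dissimilarities)
```

Standardizing matters here because the fatty-acid columns are on very different scales (Oleic is in the thousands, Linolenic in the tens), so unscaled distances would be dominated by the largest column.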
# x is already a dissimilarity object, so agnes() ignores the metric argument
agn <- agnes(x, metric = "euclidean", method = "single")
# which.plots = 2 draws the dendrogram; which.plots = 1 draws the banner plot.
# (Calling plot(agn) without which.plots offers both from an interactive menu.)
plot(agn, which.plots = 2)
# Check for outliers (e.g., observations 522 and 79); outliers affect where the tree is cut.
plot(agn, which.plots = 1)
# The banner plot is just a horizontal version of the dendrogram. With more than 500 objects
# it is too crowded to read clearly.
# However, the output reports an AC (agglomerative coefficient) of 0.73,
# which suggests a strong clustering structure.
# The AC can also be extracted directly:
agn$ac
## [1] 0.7346398
# This shows a pretty good clustering structure.
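The AC is the average of 1 - m(i), where m(i) is the dissimilarity at which observation i is first merged, divided by the dissimilarity of the final merge. As a rough sketch on made-up data (`toy_pts`, `toy_agn`, and `ac_manual` are illustrative names), the same number can be rebuilt from the `height` component, which holds the merge heights between neighbours in dendrogram order:

```r
library(cluster)

set.seed(2)
toy_pts <- matrix(rnorm(60), ncol = 2)  # hypothetical data, 30 points
toy_agn <- agnes(daisy(toy_pts, stand = TRUE), method = "single")

h <- toy_agn$height          # n - 1 heights, between neighbours in dendrogram order
n <- length(toy_agn$order)

# Each object first merges at the smaller of the heights to its left and right;
# the two end objects have only one neighbour.
left  <- c(h[1], h)
right <- c(h, h[n - 1])
ac_manual <- mean(1 - pmin(left, right) / max(h))

print(c(manual = ac_manual, agnes = toy_agn$ac))
```

The `agnes` documentation notes that the AC grows with the number of observations, so it should not be used to compare data sets of very different sizes.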
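# Check whether the resulting grouping agrees with the original “Regions”.
# Labels follow the tree from left to right: the leftmost group is all region 1,
# but a few region-1 outliers appear within the region-3 part.
olive[, 1][agn$order]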
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [71] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [106] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [141] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [176] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [211] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [246] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [281] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [316] 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [351] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [386] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3
## [421] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [456] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [491] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [526] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [561] 3 3 3 3 3 3 3 3 1 1 1 3
# I would say “yes”, except for three region-1 observations in the last line.
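Reading several hundred labels by eye is error-prone. One way to summarize the same comparison is to cut the tree with `cutree()` and cross-tabulate the clusters against the known labels. The sketch below uses simulated data with known groups; the `truth` labels, the `toy_*` names, and the choice `k = 3` are assumptions made for the example.

```r
library(cluster)

# Three well-separated simulated groups of 20 points each
set.seed(3)
truth <- rep(1:3, each = 20)
toy_xy <- cbind(rnorm(60, mean = 10 * truth, sd = 0.5),
                rnorm(60, mean = 10 * truth, sd = 0.5))

toy_tree <- agnes(daisy(toy_xy, stand = TRUE), method = "single")

# Convert to hclust so that cutree() can cut the tree into k groups
toy_groups <- cutree(as.hclust(toy_tree), k = 3)

# Rows are clusters found, columns are the known labels
agreement <- table(cluster = toy_groups, truth = truth)
print(agreement)
```

For the olive data the analogous call would be `table(cutree(as.hclust(agn), k = 3), olive[, 1])`. With single linkage, a cut at a small `k` may split off outliers as their own tiny clusters, so the table is worth checking for several values of `k`.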
# Check whether the resulting grouping agrees with the original “Areas”.
# Again, labels follow the tree from left to right.
olive[, 2][agn$order]
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 4 1 1 1 4 4 1 1 1 1 1 1 2 2 2 3 3 3 3 3
## [36] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [71] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [106] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [141] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 3 3 3 2 2 2 2 2 2 4 2
## [176] 2 2 2 2 2 2 2 4 2 2 2 2 2 3 3 3 2 2 2 3 3 3 3 4 4 4 4 3 3 3 3 2 3 3 2
## [211] 2 2 2 2 2 2 3 3 4 3 3 3 3 3 3 2 2 2 3 3 2 3 3 3 3 3 2 3 4 3 3 3 3 3 4
## [246] 4 2 2 2 3 3 4 4 4 3 3 2 2 3 3 4 4 3 3 3 3 4 4 4 4 3 3 3 3 4 3 4 4 3 4
## [281] 4 4 4 3 3 3 2 2 4 4 3 3 3 1 2 3 3 3 3 3 3 1 4 3 2 1 3 4 3 3 2 2 2 2 3
## [316] 3 3 3 3 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [351] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 5 5 5 5 6
## [386] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 5 6 5 6 5 9 9
## [421] 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
## [456] 9 9 9 9 9 9 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 9 9 9 7 7 7 7 7 7 7
## [491] 8 8 8 8 8 8 8 8 8 8 8 8 8 7 8 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
## [526] 8 8 8 8 7 8 7 7 8 8 8 8 7 7 8 7 7 8 9 9 9 9 9 7 7 7 8 7 7 7 7 7 8 8 8
## [561] 8 8 8 8 8 8 7 7 3 3 2 7
# I would say “no” here.
Q: How about using other linkages?
agn <- agnes(x, metric = "euclidean", method = "complete")
plot(agn, which.plots = 2)
# Complete linkage gives a stronger clustering structure, with AC = 0.93.
# This is higher than with single linkage, so complete linkage looks like the better method here.
agn <- agnes(x, metric = "euclidean", method = "ward")
plot(agn, which.plots = 2)
# Ward's method gives an even larger AC of 0.99.
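The linkage comparison can also be collected in one pass. The sketch below loops over a few `method` values on simulated data and returns the AC for each; `toy_d`, `linkages`, and `ac_by_linkage` are hypothetical names, and the numbers it prints are unrelated to the olive results.

```r
library(cluster)

# Hypothetical data: 50 observations on 4 variables
set.seed(4)
toy_d <- daisy(matrix(rnorm(200), ncol = 4), stand = TRUE)

# Fit agnes() once per linkage and keep only the agglomerative coefficient
linkages <- c("single", "complete", "average", "ward")
ac_by_linkage <- sapply(linkages, function(m) agnes(toy_d, method = m)$ac)

print(round(ac_by_linkage, 3))
```

A larger AC indicates more pronounced structure in the tree, but it does not by itself show that the clusters match the known regions; that still needs the label check used earlier.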
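# Divisive clustering (DIANA) on the same dissimilarities
di <- diana(x, metric = "euclidean")
plot(di, which.plots = 2)  # dendrogram
plot(di, which.plots = 1)  # banner plot
di$dc
## [1] 0.924267
Note that the divisive coefficient DC = 0.924267 indicates a fairly strong clustering structure. Check whether the resulting grouping agrees with the original “Regions”: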
olive[,1][di$order]
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [36] 3 3 3 3 3 3 3 3 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [71] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [106] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [141] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [176] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [211] 1 1 1 1 1 1 3 3 3 3 3 3 3 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [246] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [281] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [316] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [351] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [386] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [421] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [456] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2
## [491] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [526] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [561] 2 2 2 2 2 2 2 2 2 2 2 2