The following work is a cluster analysis based on a dissimilarity matrix.
The dataset (`agriculture`, shipped with the R package cluster) gives the Gross National Product (GNP) per capita and the percentage of the population working in agriculture for each of the 12 countries of the European Union in 1993 (see the R documentation, `?agriculture`, for details).
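# load the cluster package and its agriculture dataset (columns x and y)
library(cluster)
data(agriculture)
mydata <- agriculture
# give the two columns descriptive names
names(mydata) <- c("GNP", "Agriculture")
summary(mydata)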
##       GNP         Agriculture
##  Min.   : 5.90   Min.   : 2.300
##  1st Qu.:11.28   1st Qu.: 3.500
##  Median :16.50   Median : 5.850
##  Mean   :14.88   Mean   : 8.417
##  3rd Qu.:18.02   3rd Qu.:11.675
##  Max.   :21.30   Max.   :22.200
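# scatterplot of the two variables, with a lowess smoother and country labels
plot(mydata$GNP, mydata$Agriculture, pch = 21, bg = "skyblue",
     xlab = "GNP per capita", ylab = "% of population in agriculture",
     main = "Percentage working in agriculture vs. per capita GNP")
grid()
lines(lowess(mydata$GNP, mydata$Agriculture), col = 2, lwd = 2)
text(mydata$GNP, mydata$Agriculture, rownames(mydata), cex = 0.7, pos = 3, col = "darkgreen")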
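The scatterplot shows the relationship between the two variables only visually. As an optional numeric companion, the sketch below computes the Pearson (linear) and Spearman (rank-based) correlations with base R's `cor()`. The variable names `r_pearson` and `r_spearman` are illustrative additions, not part of the original analysis.

```r
library(cluster)

# Rebuild the working data frame used above (columns renamed from x, y)
mydata <- agriculture
names(mydata) <- c("GNP", "Agriculture")

# Linear (Pearson) and rank-based (Spearman) association between the two variables
r_pearson  <- cor(mydata$GNP, mydata$Agriculture)
r_spearman <- cor(mydata$GNP, mydata$Agriculture, method = "spearman")
print(round(c(pearson = r_pearson, spearman = r_spearman), 3))
```

The Spearman coefficient is less sensitive to the few extreme countries visible in the upper left of the plot.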
# Euclidean dissimilarity matrix between countries (variables left unstandardized)
mydiss <- daisy(x = mydata, metric = "euclidean")
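With two numeric variables and `metric = "euclidean"`, each entry of the dissimilarity matrix is $d(i,j) = \sqrt{(\mathrm{GNP}_i - \mathrm{GNP}_j)^2 + (\mathrm{Agr}_i - \mathrm{Agr}_j)^2}$. The sketch below checks this for one pair of countries (Belgium, `B`, and Denmark, `DK`, chosen arbitrarily) and compares `daisy()` against base R's `dist()`, which should agree for purely numeric data with no standardization.

```r
library(cluster)

mydata <- agriculture
names(mydata) <- c("GNP", "Agriculture")

# Euclidean dissimilarities from daisy() and from base R's dist()
d_daisy <- as.matrix(daisy(mydata, metric = "euclidean"))
d_base  <- as.matrix(dist(mydata, method = "euclidean"))

# The same quantity computed by hand for one pair: Belgium (B) vs Denmark (DK)
by_hand <- sqrt((mydata["B", "GNP"] - mydata["DK", "GNP"])^2 +
                (mydata["B", "Agriculture"] - mydata["DK", "Agriculture"])^2)

print(c(daisy = d_daisy["B", "DK"], dist = d_base["B", "DK"], by_hand = by_hand))
```

Because the variables are not standardized, GNP and the agriculture percentage contribute on their original scales; `daisy(..., stand = TRUE)` would rescale them first.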
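The analysis uses both families of clustering methods, each applied directly to the dissimilarity matrix: partitioning (PAM, partitioning around medoids) and agglomerative hierarchical clustering (AGNES).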
# partitioning around medoids (PAM) into k = 2 clusters
mypam <- pam(x = mydiss, k = 2, diss = TRUE)
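The number of clusters is fixed at k = 2 above. One common way to check such a choice, offered here as an assumption rather than the author's procedure, is to compare the average silhouette width that `pam()` stores in `$silinfo$avg.width` across several values of k and look for the maximum. The range `2:6` is arbitrary.

```r
library(cluster)

mydata <- agriculture
names(mydata) <- c("GNP", "Agriculture")
mydiss <- daisy(mydata, metric = "euclidean")

# Average silhouette width of PAM solutions for a range of k (k = 1 has no silhouette)
ks <- 2:6
avg_sil <- sapply(ks, function(k) pam(mydiss, k = k, diss = TRUE)$silinfo$avg.width)
names(avg_sil) <- ks
print(round(avg_sil, 3))

# k with the largest average silhouette width
best_k <- ks[which.max(avg_sil)]
print(best_k)
```

Silhouette widths range from -1 to 1; values closer to 1 indicate observations that sit well inside their own cluster.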
# bivariate cluster plot
clusplot(mypam, shade = TRUE, color = TRUE, labels = 2)
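Because PAM was run on a dissimilarity matrix rather than on the raw data, `clusplot()` has to build a two-dimensional map from the dissimilarities; according to `?clusplot.default`, it does so with classical multidimensional scaling. The sketch below reproduces that idea with base R's `cmdscale()` and a plain `plot()`. It is a simplified illustration without the ellipses and shading of `clusplot()`.

```r
library(cluster)

mydata <- agriculture
names(mydata) <- c("GNP", "Agriculture")
mydiss <- daisy(mydata, metric = "euclidean")
mypam  <- pam(mydiss, k = 2, diss = TRUE)

# Classical (metric) multidimensional scaling of the dissimilarities into 2 dimensions
coords <- cmdscale(mydiss, k = 2)

# Plot the embedded points, coloured by PAM cluster and labelled by country code
plot(coords, col = mypam$clustering, pch = 19,
     xlab = "Coordinate 1", ylab = "Coordinate 2", main = "PAM clusters on an MDS map")
text(coords, labels = rownames(coords), pos = 3, cex = 0.7)
```

Since the data are already two-dimensional and the distances are Euclidean, the MDS map reproduces the original distances exactly; it is the centered scatterplot up to a rotation or reflection.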
# agglomerative nesting (AGNES); the default linkage method is "average"
myagn <- agnes(x = mydiss, diss = TRUE)
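`agnes()` uses average linkage unless told otherwise. As a quick comparison, the sketch below fits a few of the linkage methods that `agnes()` supports and extracts the agglomerative coefficient `$ac`, which lies between 0 and 1, with larger values suggesting a stronger clustering structure. The choice of methods to compare is ours.

```r
library(cluster)

mydata <- agriculture
names(mydata) <- c("GNP", "Agriculture")
mydiss <- daisy(mydata, metric = "euclidean")

# Agglomerative coefficient for several linkage methods (agnes() default is "average")
methods <- c("average", "single", "complete", "ward")
ac <- sapply(methods, function(m) agnes(mydiss, diss = TRUE, method = m)$ac)
print(round(ac, 3))
```

The coefficient tends to grow with the number of observations, so it is best used to compare methods on the same data, as here.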
# clustering tree (dendrogram) plot
pltree(myagn)
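The dendrogram can also be cut into a fixed number of groups, which makes the hierarchical result directly comparable with the two-cluster PAM solution. The sketch below converts the `agnes` object with `as.hclust()` (provided by the cluster package for `twins` objects), cuts it with base R's `cutree()`, and cross-tabulates the two partitions. Cluster numbers are arbitrary labels, so agreement shows up as a table with counts concentrated in one cell per row.

```r
library(cluster)

mydata <- agriculture
names(mydata) <- c("GNP", "Agriculture")
mydiss <- daisy(mydata, metric = "euclidean")
mypam  <- pam(mydiss, k = 2, diss = TRUE)
myagn  <- agnes(mydiss, diss = TRUE)

# Cut the AGNES tree into two groups, via its hclust representation
agn2 <- cutree(as.hclust(myagn), k = 2)

# Cross-tabulate the hierarchical and the partitioning solutions
tab <- table(AGNES = agn2, PAM = mypam$clustering)
print(tab)
```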
# banner plot of the AGNES merging heights
bannerplot(myagn)