Cluster Analysis Example

Intro

This example runs a cluster analysis on a dissimilarity matrix.
The dataset is agriculture, shipped with the R package cluster. It records the Gross National Product (GNP) per capita and the percentage of the population working in agriculture for each of the 12 countries belonging to the European Union in 1993 (see ?agriculture in R for details).

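library(cluster)   # provides daisy(), pam(), agnes() and the agriculture data
data(agriculture)
mydata <- agriculture
# the original columns are x (GNP per capita) and y (% working in agriculture)
names(mydata) <- c("GNP", "Agriculture")
summary(mydata)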

##       GNP         Agriculture    
##  Min.   : 5.90   Min.   : 2.300  
##  1st Qu.:11.28   1st Qu.: 3.500  
##  Median :16.50   Median : 5.850  
##  Mean   :14.88   Mean   : 8.417  
##  3rd Qu.:18.02   3rd Qu.:11.675  
##  Max.   :21.30   Max.   :22.200

# Scatterplot of the two variables, with a lowess smoother and country labels
plot(mydata$GNP, mydata$Agriculture, pch = 21, bg = "skyblue",
     xlab = "GNP per capita", ylab = "% working in agriculture",
     main = "Percentage working in agriculture vs. GNP per capita")
grid()
lines(lowess(mydata$GNP, mydata$Agriculture), col = 2, lwd = 2)
text(mydata$GNP, mydata$Agriculture, rownames(mydata), cex = 0.7, pos = 3, col = "darkgreen")

1 - Calculate the dissimilarity matrix

mydiss <- daisy(x = mydata, metric = "euclidean")
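
As a quick sanity check on what daisy() returns, the sketch below compares it with base R's dist() on the same data and prints a corner of the full matrix. It assumes the unrenamed agriculture data frame and the default stand = FALSE (no standardization); the names d_daisy and d_dist are introduced here for illustration only.

```r
library(cluster)
data(agriculture)

# Euclidean dissimilarities computed two ways on the same numeric data
d_daisy <- daisy(agriculture, metric = "euclidean")
d_dist  <- dist(agriculture, method = "euclidean")

# One value per unordered pair of the 12 countries: 12 * 11 / 2
length(d_daisy)

# The two results should agree up to floating-point error
all.equal(as.vector(d_daisy), as.vector(d_dist))

# Top-left corner of the full symmetric matrix, labelled by country code
round(as.matrix(d_daisy)[1:4, 1:4], 2)
```

For all-numeric data the two functions should give the same distances; daisy() becomes the more useful choice when the variables are of mixed type or need standardizing.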

The analysis applies two kinds of clustering method to this matrix: partitioning (PAM) and hierarchical (AGNES).

2 - Partitioning Around Medoids

mypam <- pam(x = mydiss, k = 2, diss = TRUE)
# bivariate cluster plot
clusplot(mypam, shade = TRUE, color = TRUE, labels = 2)
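
The plot alone does not show which countries act as medoids or whether k = 2 is a sensible choice. The following is a minimal sketch of how one might inspect the fit and compare a few values of k by average silhouette width. The range 2 to 5 and the names d, fit and avg_sil are assumptions made for illustration, not part of the original analysis.

```r
library(cluster)
data(agriculture)

d   <- daisy(agriculture, metric = "euclidean")
fit <- pam(d, k = 2, diss = TRUE)

# Medoids: the countries chosen as cluster representatives
fit$medoids

# Cluster membership of each country
fit$clustering

# Average silhouette width for several candidate values of k
# (higher values suggest better-separated clusters)
avg_sil <- sapply(2:5, function(k) pam(d, k = k, diss = TRUE)$silinfo$avg.width)
names(avg_sil) <- 2:5
round(avg_sil, 3)
```

Silhouette width is only one heuristic for choosing k; with 12 observations it is best read alongside the scatterplot.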

3 - Agglomerative Nesting (hierarchical clustering)

myagn <- agnes(x = mydiss, diss = TRUE)
# clustering tree (dendrogram) plot
pltree(myagn)

# bannerplot
bannerplot(myagn)
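
The dendrogram and banner are visual summaries; to get explicit group labels the tree has to be cut. The sketch below is one way to do that, assuming a cut into two groups so the result can be set against the PAM solution. The names d, tree and groups are illustrative.

```r
library(cluster)
data(agriculture)

d    <- daisy(agriculture, metric = "euclidean")
tree <- agnes(d, diss = TRUE)   # agnes() defaults to average linkage

# Agglomerative coefficient: values near 1 indicate stronger clustering structure
tree$ac

# Convert to an hclust object and cut the tree into two groups
groups <- cutree(as.hclust(tree), k = 2)
table(groups)

# Cross-tabulate against the two-cluster PAM solution
table(agnes = groups, pam = pam(d, k = 2, diss = TRUE)$clustering)
```

The conversion through as.hclust() matters because cutree() expects the merge heights in the layout that hclust uses.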

Giovanni Valentini

Tuesday, September 27, 2016
