Importing the data from Excel and setting the working directory. The matrix below shows my perceived personal distances between 11 countries. The matrix is symmetric.
# read the matrix copied from Excel to the clipboard (pbpaste is macOS-only)
GL <- read.table(pipe("pbpaste"))
print(GL)
## AR BR CA CN DE ES IT NL PL TR US
## Argentina 0 1 6 9 8 4 5 8 8 8 6
## Brazil 1 0 5 8 7 4 5 8 8 9 7
## Canada 6 5 0 7 5 5 5 6 7 8 2
## China 9 8 7 0 6 6 7 4 5 4 7
## Germany 8 7 5 6 0 4 4 2 4 5 8
## Spain 4 4 5 6 4 0 2 4 5 5 6
## Italy 5 5 5 7 4 2 0 4 5 5 8
## Netherlands 8 8 6 4 2 4 4 0 4 4 7
## Poland 8 8 7 5 4 5 5 4 0 3 7
## Turkey 8 9 8 4 5 5 5 4 3 0 6
## USA 6 7 2 7 8 6 8 7 7 6 0
setwd("~/Desktop/UPF - MSc Management and BuA/THIRD TERM/Advanced Statistical Methods")
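Since `pbpaste` only works on macOS and depends on what is currently in the clipboard, a clipboard-free way of loading the same matrix can be sketched as below. The values are copied from the printed output above. The symmetry check is my own addition; it uses `unname()` because `isSymmetric()` also compares row and column names, which differ here (full country names versus two-letter codes).

```r
# Sketch: type the matrix inline instead of reading it from the clipboard.
# The header row has one field fewer than the data rows, so read.table()
# uses the first column (country names) as row names.
GL <- read.table(text = "AR BR CA CN DE ES IT NL PL TR US
Argentina 0 1 6 9 8 4 5 8 8 8 6
Brazil 1 0 5 8 7 4 5 8 8 9 7
Canada 6 5 0 7 5 5 5 6 7 8 2
China 9 8 7 0 6 6 7 4 5 4 7
Germany 8 7 5 6 0 4 4 2 4 5 8
Spain 4 4 5 6 4 0 2 4 5 5 6
Italy 5 5 5 7 4 2 0 4 5 5 8
Netherlands 8 8 6 4 2 4 4 0 4 4 7
Poland 8 8 7 5 4 5 5 4 0 3 7
Turkey 8 9 8 4 5 5 5 4 3 0 6
USA 6 7 2 7 8 6 8 7 7 6 0", header = TRUE)

M <- as.matrix(GL)

# A distance matrix should be square, symmetric and zero on the diagonal.
is_square    <- nrow(M) == ncol(M)
is_symmetric <- isSymmetric(unname(M))  # drop dimnames before comparing
zero_diag    <- all(diag(M) == 0)

print(c(square = is_square, symmetric = is_symmetric, zero_diag = zero_diag))
```

If any of the three checks fails, the matrix was probably mistyped or pasted incompletely, and `as.dist()` would then silently use only the lower triangle.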
The dendrogram is the most common representation of a hierarchical cluster analysis, so the analysis begins from it, using complete linkage (the default method of hclust).
I decided to cut the tree at a height of 4.5 because that is where I saw the most spacing between clusters. Reading the plot, the clusters this method creates are: Argentina-Brazil, Canada-USA, Germany-Netherlands-Spain-Italy, China, Poland-Turkey.
I agree with the results of this analysis, as it separates South America, North America, Europe and Eastern Europe, with China on its own.
It seems that my perception of distances has been strongly influenced by geography and the related culture.
# complete linkage clustering (default)
GL.clust <- hclust(as.dist(GL))
plot(GL.clust)
abline(h=4.5, col="red", lty=2)
The second step of this cluster analysis is running the hierarchical cluster analysis with the average linkage method, which measures the distance between two clusters as the average of all pairwise distances between their countries.
Here the cut had to be made much lower, because I ran into a tangency problem when cutting: at the previous height the cut line would touch a merge instead of passing cleanly between two merges. The results show the clusters: Germany-Netherlands, Spain-Italy, China, Poland-Turkey, Argentina-Brazil, Canada-USA. The results are coherent with the complete linkage dendrogram, with the difference that here the lower cut produces subgroups that can be interpreted as cultural clusters more than as location-related clusters. For example, Germany and the Netherlands form one cluster and Spain and Italy form another; I reckon the countries within each pair are also more similar in culture and language.
# average linkage clustering
plot(hclust(as.dist(GL), method="average"), main="Average linkage")
abline(h=3.5,col="blue", lty=2)
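To make the average linkage rule concrete, the small sketch below computes two between-cluster distances by hand, using only the pairwise values from the matrix printed at the top. The variable names are mine and purely illustrative. If I read the matrix correctly, these are the two merges that come right after the cut at 3.5.

```r
# Average linkage: distance between two clusters = mean of all pairwise
# distances between their members (values taken from the matrix above).

# {Germany, Netherlands} versus {Spain, Italy}
denl_vs_esit <- c(DE_ES = 4, DE_IT = 4, NL_ES = 4, NL_IT = 4)

# {China} versus {Poland, Turkey}
cn_vs_pltr <- c(CN_PL = 5, CN_TR = 4)

h_europe <- mean(denl_vs_esit)  # 4
h_east   <- mean(cn_vs_pltr)    # 4.5

print(c(europe = h_europe, china_poland_turkey = h_east))
```

A cut at 3.5 lies below both values, which is why Germany-Netherlands and Spain-Italy stay separate and China stays on its own, while a cut at exactly 4.5 would sit on the second merge height.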
As a last step I performed Ward clustering, a method that at each step merges the pair of clusters giving the smallest increase in its objective function, the total within-cluster variance. I decided to cut the tree at height 6, and the results show: China-Poland-Turkey, Germany-Netherlands-Spain-Italy, Argentina-Brazil and Canada-USA.
This last analysis leaves me with some doubts since, for example, China-Poland-Turkey does not seem to be an optimal group in my opinion.
# Ward clustering (next class)
plot(hclust(as.dist(GL), method="ward.D2"), main="Ward",
xlab="Countries", hang=-1)
abline(h=6, col="pink", lty=2)
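The clusters above were read off the plots. As a cross-check, the memberships can also be extracted with `cutree()`, which cuts a tree at a given height. The sketch below retypes the matrix so that it runs on its own, and reuses the three cut heights chosen in the text (4.5, 3.5 and 6); the object names `cuts` and `n_clusters` are mine.

```r
# Sketch: extract cluster memberships at the cut heights used in the text.
GL <- read.table(text = "AR BR CA CN DE ES IT NL PL TR US
Argentina 0 1 6 9 8 4 5 8 8 8 6
Brazil 1 0 5 8 7 4 5 8 8 9 7
Canada 6 5 0 7 5 5 5 6 7 8 2
China 9 8 7 0 6 6 7 4 5 4 7
Germany 8 7 5 6 0 4 4 2 4 5 8
Spain 4 4 5 6 4 0 2 4 5 5 6
Italy 5 5 5 7 4 2 0 4 5 5 8
Netherlands 8 8 6 4 2 4 4 0 4 4 7
Poland 8 8 7 5 4 5 5 4 0 3 7
Turkey 8 9 8 4 5 5 5 4 3 0 6
USA 6 7 2 7 8 6 8 7 7 6 0", header = TRUE)

d <- as.dist(as.matrix(GL))

# One column per method; each entry is the cluster label of that country.
cuts <- data.frame(
  complete = cutree(hclust(d, method = "complete"), h = 4.5),
  average  = cutree(hclust(d, method = "average"),  h = 3.5),
  ward     = cutree(hclust(d, method = "ward.D2"),  h = 6)
)
print(cuts)

# Number of clusters produced by each method at its cut height.
n_clusters <- sapply(cuts, function(x) length(unique(x)))
print(n_clusters)

# Cross-tabulate two partitions to see where they disagree.
print(table(complete = cuts$complete, ward = cuts$ward))
```

Countries sharing a label within a column belong to the same cluster, so the table makes it easy to see, for instance, that Ward places China with Poland and Turkey while complete linkage keeps it apart.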
Additionally I compared my results with the ones of
Dendrogram
Average linkage
Ward