Cluster Analysis Application

Developing Data Products Class

Giovanni Valentini, 19/03/2016

Motivation and Source of Data

The application performs a Cluster Analysis of the 28 European Union Countries.

  • The analysis examines 3 economic indicators (year 2014):
    1. Gross Domestic Product (Euro per capita)
    2. Government Debt (Percentage of Gross Domestic Product)
    3. Unemployment Rate.
  • The dataset of the economic indicators is obtained from: Eurostat
  • The indicators come from 3 distinct datasets which I cleaned and assembled together.
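As a rough illustration of the assembly step, the sketch below joins three small tables on a shared country code. The data frame names, column names, codes, and values are made-up placeholders, not Eurostat figures, and the actual cleaning code in the repository may differ.

```r
# Toy stand-ins for the three indicator tables.
# All codes and values are placeholders, NOT Eurostat data.
gdp   <- data.frame(code = c("AA", "BB", "CC"), gdp   = c(30000, 20000, 10000))
debt  <- data.frame(code = c("BB", "CC", "AA"), debt  = c(60, 90, 40))
unemp <- data.frame(code = c("CC", "AA", "BB"), unemp = c(12, 5, 8))

# Join the tables one after another on the shared country code.
indicators <- Reduce(function(a, b) merge(a, b, by = "code"),
                     list(gdp, debt, unemp))

# Keep only countries with all three indicators, since kmeans() rejects NA values.
indicators <- indicators[complete.cases(indicators), ]
print(indicators)
```

Joining on a key rather than binding columns side by side keeps each row consistent even when the source tables list the countries in a different order.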

The server.R and ui.R files and the associated supporting documentation are in the following GitHub repository: https://github.com/GValentini/Data_Products.git

The application is available here: https://gvalentini.shinyapps.io/ClustersEur-App/

How the Application Works

  • The user chooses:
    1. the economic indicators to plot on the 2 axes
    2. the number of centres
  • The points plotted represent the ordered pairs of the chosen indicators.
  • There is one point for each country.
  • The K-Means method partitions the points into k groups (called clusters) so that the sum of squared distances from each point to its assigned cluster centre is minimized.
  • Each cluster is identified by a distinct color.
  • The cross-shaped symbol represents the cluster centre.
  • More details on K-Means clustering are available here: https://en.wikipedia.org/wiki/K-means_clustering
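The steps above can be sketched in a few lines of R. This is a minimal illustration on random toy data, not the code from server.R: the helper cluster_pairs(), its arguments, and the columns a, b and c are hypothetical names introduced here.

```r
# Hypothetical helper: cluster a data frame on two chosen columns.
cluster_pairs <- function(df, xvar, yvar, k, seed = 1) {
  set.seed(seed)                     # k-means starts from randomly chosen centres
  pts <- df[, c(xvar, yvar)]         # one ordered pair per row
  fit <- kmeans(pts, centers = k)
  list(points = pts, fit = fit)
}

# Toy data: 28 random points with three made-up indicators.
set.seed(42)
toy <- data.frame(a = rnorm(28), b = rnorm(28), c = rnorm(28))

# The "user" picks two indicators and the number of centres.
res <- cluster_pairs(toy, "a", "b", k = 4)

# Recompute the quantity k-means tries to minimize: the sum of squared
# distances from each point to the centre of its assigned cluster.
assigned  <- res$fit$centers[res$fit$cluster, , drop = FALSE]
manual_ss <- sum((as.matrix(res$points) - assigned)^2)

print(nrow(res$fit$centers))                          # number of centres
print(all.equal(manual_ss, res$fit$tot.withinss))     # compare with kmeans()
```

The manual sum should agree with the tot.withinss component returned by kmeans(), which is the value the method reports for its own objective.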

Plot of GDP versus Government Debt with 4 centres

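eurdf <- readRDS("eurodata.rds")
# Fix the random seed so the clusters are reproducible (the seed value is arbitrary)
set.seed(1)
# Cluster on columns 4 and 5, the two indicators plotted here
result <- kmeans(eurdf[, 4:5], centers = 4)
par(mfrow = c(2, 1), pin = c(7, 2), mar = c(4, 4, 0, 2))
# One point per country, coloured by cluster
plot(eurdf[, 4:5], col = result$cluster, pch = 19, cex = 1.8)
grid(lty = 1)
text(eurdf[, 4:5], labels = eurdf[, 2], pos = 3, cex = 0.8, col = result$cluster)
# Mark the cluster centres with crosses
points(result$centers, col = 1:4, pch = 3, cex = 3, lwd = 3)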

[Figure: scatter plot of GDP versus Government Debt, coloured by cluster, with 4 centres]

I hope you will enjoy my Shiny application.

In the meantime you may take this quiz:

Interactive Question

What does K-Means refer to?

  1. An economic indicator about European Union Countries.
  2. A set of points on a plot.
  3. A method used in cluster analysis.
  4. A plotting system in R.

Hint: Think of K as the number of centres of the partition.

Answer: It is a method used in cluster analysis.