This analysis was initially published here; I have redone it for my own practice.
\[ \text{Happy Planet Index (HPI)} = \frac{\text{Life Expectancy} \times \text{Experienced Wellbeing} \times \text{Inequality of Outcomes}}{\text{Ecological Footprint}} \]
HPI tells us “how well nations are doing at achieving long, happy, sustainable lives”. The index is weighted to give progressively higher scores to nations with lower ecological footprints.
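To make the formula concrete, here is a minimal sketch in R. The function name `hpi_sketch`, its argument names, and the input values are all hypothetical. The published index applies further adjustments, so this only illustrates how the footprint in the denominator moves the score.

```r
# Simplified sketch of the formula above; NOT the official HPI calculation.
hpi_sketch <- function(life_expectancy, wellbeing, inequality_adjustment, footprint) {
  (life_expectancy * wellbeing * inequality_adjustment) / footprint
}

# Two hypothetical countries that differ only in ecological footprint.
low_footprint  <- hpi_sketch(75, 6, 0.8, footprint = 2)
high_footprint <- hpi_sketch(75, 6, 0.8, footprint = 4)

# In this simplified form, halving the footprint doubles the score.
low_footprint / high_footprint
```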
library(dplyr)
library(plotly)
library(stringr)
library(cluster)
library(FactoMineR)
library(factoextra)
library(ggplot2)
library(reshape2)
library(ggthemes)
library(NbClust)
library(readxl)
library(GGally)
library(maps)
# Map size inside geom_point() so that geom_smooth() does not inherit it
ggplot(hpi, aes(x = gdp, y = life_expectancy)) +
  geom_point(aes(size = population, color = region)) +
  coord_trans(x = 'log10') +
  geom_smooth(method = 'loess') +
  ggtitle('Life Expectancy and GDP per Capita in USD (log10)') +
  theme_classic()
ggplot(hpi, aes(x = life_expectancy, y = hpi_index)) +
  geom_point(aes(size = population, color = region)) +
  geom_smooth(method = 'loess') +
  ggtitle('Life Expectancy and Happy Planet Index Score') +
  theme_classic()
ggplot(hpi, aes(x = gdp, y = hpi_index)) +
  geom_point(aes(size = population, color = region)) +
  geom_smooth(method = 'loess') +
  ggtitle('GDP per Capita (log10) and Happy Planet Index Score') +
  coord_trans(x = 'log10') +
  theme_classic()
hpi[, 4:13] <- scale(hpi[, 4:13])
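# qplot() is deprecated in recent ggplot2 releases, so build the heatmap with ggplot()
ggplot(melt(cor(hpi[, 4:13], use = "pairwise.complete.obs")),
       aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(limits = c(-1, 1)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "Heatmap of Correlation Matrix", x = NULL, y = NULL)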
PCA is a procedure for identifying a smaller number of uncorrelated variables, called “principal components”, from a large set of data. The goal of principal components analysis is to explain the maximum amount of variance with the minimum number of principal components.
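As a small illustration of that idea, the sketch below runs a PCA with base R's `prcomp()` on the built-in `USArrests` data rather than on the HPI data, so the numbers say nothing about happiness. It shows the two properties described above: the share of variance each component explains, and the fact that the component scores are uncorrelated.

```r
# Base-R sketch on the built-in USArrests data (not the HPI data).
pca <- prcomp(USArrests, scale. = TRUE)   # standardise the variables, then rotate

# Share of total variance explained by each component, and the running total.
var_share <- pca$sdev^2 / sum(pca$sdev^2)
round(cumsum(var_share), 3)

# The component scores are uncorrelated: off-diagonal entries are ~0.
round(cor(pca$x), 3)
```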
hpi.pca <- PCA(hpi[, 4:13], graph=FALSE)
eigenvalues <- hpi.pca$eig
fviz_screeplot(hpi.pca, addlabels = TRUE, ylim = c(0, 65))
(hpi.pca$var$contrib)
## Dim.1 Dim.2 Dim.3 Dim.4
## life_expectancy 12.27500107 2.298157e+00 0.002516184 18.4965447
## wellbeing 12.31846893 7.472989e-02 0.198445432 22.1593907
## happy_years 14.79371047 1.288175e-02 0.027105103 0.7180341
## footprint 9.02127688 2.471162e+01 2.982449522 0.4891428
## inequality_outcomes 13.36365052 3.049462e-01 0.010038818 9.7957329
## adj_life_expectancy 12.67789150 9.580098e-01 0.001525891 19.5589843
## adj_wellbeing 12.28394580 3.206506e-04 0.135269065 23.5949300
## hpi_index 3.57121564 5.096355e+01 5.368971166 2.1864830
## gdp 9.68826525 1.157381e+01 1.003632002 2.3980025
## population 0.00657393 9.101975e+00 90.270046817 0.6027549
## Dim.5
## life_expectancy 3.179724e-01
## wellbeing 6.376146e+00
## happy_years 3.254368e-02
## footprint 7.629671e+00
## inequality_outcomes 2.976993e+00
## adj_life_expectancy 7.545215e-02
## adj_wellbeing 4.808546e+00
## hpi_index 5.284314e+00
## gdp 7.249799e+01
## population 3.689407e-04
fviz_pca_var(hpi.pca, col.var="contrib",gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"), repel = TRUE )
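This highlights the variables that contribute most to the variation retained by the principal components. In the contribution table above, the first dimension draws fairly evenly on the life expectancy, wellbeing and inequality measures, the second is dominated by hpi_index and footprint, and the third is almost entirely population.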
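The contributions are percentages, so each column of the table sums to 100. A minimal sketch of where such numbers come from, again using base R's `prcomp()` on the built-in `USArrests` data as a stand-in: for a standardised PCA, a variable's contribution to a component should equal its squared loading times 100. Treat the equivalence with FactoMineR's `var$contrib` as an assumption, since it is not checked against the package here.

```r
# Stand-in data; the HPI data frame is not needed for this illustration.
pca <- prcomp(USArrests, scale. = TRUE)

# Squared loadings, expressed as a percentage of each component.
contrib <- pca$rotation^2 * 100
round(contrib, 2)

# Every component's contributions add up to 100 percent.
colSums(contrib)
```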
number <- NbClust(hpi[, 4:13], distance="euclidean",
min.nc=2, max.nc=15, method='kmeans', index='all', alphaBeale = 0.1)
## *** : The Hubert index is a graphical method of determining the number of clusters.
## In the plot of Hubert index, we seek a significant knee that corresponds to a
## significant increase of the value of the measure i.e the significant peak in Hubert
## index second differences plot.
##
## *** : The D index is a graphical method of determining the number of clusters.
## In the plot of D index, we seek a significant knee (the significant peak in Dindex
## second differences plot) that corresponds to a significant increase of the value of
## the measure.
##
## *******************************************************************
## * Among all indices:
## * 7 proposed 2 as the best number of clusters
## * 7 proposed 3 as the best number of clusters
## * 1 proposed 6 as the best number of clusters
## * 1 proposed 7 as the best number of clusters
## * 1 proposed 10 as the best number of clusters
## * 2 proposed 12 as the best number of clusters
## * 4 proposed 15 as the best number of clusters
##
## ***** Conclusion *****
##
## * According to the majority rule, the best number of clusters is 2
##
##
## *******************************************************************
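Note that the majority rule is breaking a tie here. Tallying the votes from the summary above shows that 2 and 3 clusters received the same number of votes; the vector below is copied by hand from that output.

```r
# Votes copied from the NbClust summary (number of clusters -> number of indices).
votes <- c("2" = 7, "3" = 7, "6" = 1, "7" = 1, "10" = 1, "12" = 2, "15" = 4)

# Every candidate that reaches the maximum vote count.
top <- names(votes)[votes == max(votes)]
top
```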
set.seed(2017)
# The NbClust vote is tied between 2 and 3 clusters (7 indices each); 3 is used here
pam <- pam(hpi[, 4:13], k = 3, diss = FALSE, keep.data = TRUE)
fviz_silhouette(pam)
## cluster size ave.sil.width
## 1 1 43 0.46
## 2 2 66 0.32
## 3 3 31 0.37
hpi$country[pam$id.med]
## [1] "Liberia" "Romania" "Ireland"
fviz_cluster(pam, stand = FALSE, geom = "point",ellipse.type = "norm")
hpi['cluster'] <- as.factor(pam$clustering)
map <- map_data("world")
map <- left_join(map, hpi, by = c('region' = 'country'))
ggplot() +
  geom_polygon(data = map, aes(x = long, y = lat, group = group, fill = cluster, color = cluster)) +
  labs(title = "Clustering Happy Planet Index", subtitle = "Based on data from: http://happyplanetindex.org/", x = NULL, y = NULL) +
  theme_minimal()
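Because the join matches on country names, any country spelled differently in the two sources ends up without a cluster and is drawn as missing on the map. A hedged sketch of how to check for this with `setdiff()`: the name vectors below are hypothetical stand-ins for `unique(map_data("world")$region)` and `hpi$country`, and the recoding table is only an example.

```r
# Hypothetical stand-ins for the map's region names and the HPI country names.
map_names <- c("USA", "UK", "France", "Ireland")
hpi_names <- c("United States of America", "United Kingdom", "France", "Ireland")

# Countries in the HPI data that would not find a match on the map.
unmatched <- setdiff(hpi_names, map_names)
unmatched

# One way to align them: recode the mismatched names before joining.
recode <- c("United States of America" = "USA", "United Kingdom" = "UK")
fixed <- ifelse(hpi_names %in% names(recode), recode[hpi_names], hpi_names)
setdiff(fixed, map_names)   # empty once every name matches
```

Running the same `setdiff()` on the real columns before the `left_join()` would show which countries need recoding.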