1 The Happy Planet Index 2016

The Happy Planet Index (HPI) is an index of human well-being and environmental impact, created by Nic Marks, that tells us how well nations are doing at achieving long, happy, sustainable lives. The index combines four elements to show how efficiently residents of different countries are using environmental resources to lead long, happy lives. I downloaded the 2016 dataset from the Happy Planet Index website.

My goal is to find correlations between several variables, then use a clustering technique to separate these 140 countries into different clusters according to wellbeing, wealth (GDP), life expectancy and carbon emissions.

1.1 Library

Load the packages needed for this analysis.

library(tidyverse)
library(plotly)
library(stringr)
library(cluster)
library(FactoMineR)
library(factoextra)
library(reshape2)
library(ggthemes)
library(NbClust)
library(readxl)

1.2 Exploratory Data Analysis

I imported the data with read_xlsx() from the readxl package and dropped the rows that do not hold country data (rows 1 to 4 and 145 to 161), leaving the 140 countries used in the visualization and analysis.

hpi <- read_xlsx("hpi-data-2016.xlsx", sheet = 5, col_names = TRUE)

hpi <- hpi[-c(1:4, 145:161), ]

I renamed the columns (variables) to make the analysis easier.

names(hpi) <- c("Rank", "Country", "Region", "Avg.Life.Expectancy", "Avg.Wellbeing", "Happy.Life.Years", "Footprint.gha", "Inequality", "Inequality.LE", "Inequality.W", "HPI", "GDP", "Population", "GINI.Index")

head(hpi)

## # A tibble: 6 x 14
##   Rank  Country Region Avg.Life.Expect… Avg.Wellbeing Happy.Life.Years
##   <chr> <chr>   <chr>  <chr>            <chr>         <chr>           
## 1 110   Afghan… Middl… 59.667999999999… 3.8           12.396023808740…
## 2 13    Albania Post-… 77.346999999999… 5.5           34.414736010872…
## 3 30    Algeria Middl… 74.313000000000… 5.6           30.469461311230…
## 4 19    Argent… Ameri… 75.927000000000… 6.5           40.166673874579…
## 5 73    Armenia Post-… 74.445999999999… 4.3           24.018760060702…
## 6 105   Austra… Asia … 82.052000000000… 7.2           53.069497709526…
## # ... with 8 more variables: Footprint.gha <chr>, Inequality <chr>,
## #   Inequality.LE <chr>, Inequality.W <chr>, HPI <chr>, GDP <chr>,
## #   Population <chr>, GINI.Index <chr>

All columns were read in as character (chr), so I convert each variable to its proper class. GINI.Index stays as chr because some of its entries are the text "Data unavailable".

hpi$Rank <- as.integer(hpi$Rank)
hpi$Region <- as.factor(hpi$Region)
hpi$Avg.Life.Expectancy <- as.numeric(hpi$Avg.Life.Expectancy)
hpi$Avg.Wellbeing <- as.numeric(hpi$Avg.Wellbeing)
hpi$Happy.Life.Years <- as.numeric(hpi$Happy.Life.Years)
hpi$Footprint.gha <- as.numeric(hpi$Footprint.gha)
hpi$Inequality <- as.numeric(hpi$Inequality)
hpi$Inequality.LE <- as.numeric(hpi$Inequality.LE)
hpi$Inequality.W <- as.numeric(hpi$Inequality.W)
hpi$HPI <- as.numeric(hpi$HPI)
hpi$GDP <- as.numeric(hpi$GDP)
hpi$Population <- as.numeric(hpi$Population)

glimpse(hpi)

## Observations: 140
## Variables: 14
## $ Rank                <int> 110, 13, 30, 19, 73, 105, 43, 8, 102, 87, ...
## $ Country             <chr> "Afghanistan", "Albania", "Algeria", "Arge...
## $ Region              <fct> Middle East and North Africa, Post-communi...
## $ Avg.Life.Expectancy <dbl> 59.668, 77.347, 74.313, 75.927, 74.446, 82...
## $ Avg.Wellbeing       <dbl> 3.800000, 5.500000, 5.600000, 6.500000, 4....
## $ Happy.Life.Years    <dbl> 12.396024, 34.414736, 30.469461, 40.166674...
## $ Footprint.gha       <dbl> 0.79000, 2.21000, 2.12000, 3.14000, 2.2300...
## $ Inequality          <dbl> 0.42655744, 0.16513372, 0.24486175, 0.1642...
## $ Inequality.LE       <dbl> 38.34882, 69.67116, 60.47454, 68.34958, 66...
## $ Inequality.W        <dbl> 3.390494, 5.097650, 5.196449, 6.034707, 3....
## $ HPI                 <dbl> 20.22535, 36.76687, 33.30054, 35.19024, 25...
## $ GDP                 <dbl> 690.8426, 4247.4854, 5583.6162, 14357.4116...
## $ Population          <dbl> 29726803, 2900489, 37439427, 42095224, 297...
## $ GINI.Index          <chr> "Data unavailable", "28.96", "Data unavail...
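The column-by-column conversions above can be condensed. The following is a minimal sketch on a two-row stand-in data frame (the name `toy` and the choice of columns are assumptions for illustration, not the real spreadsheet). It converts several character columns in one step and turns the "Data unavailable" marker into NA before converting GINI.Index:

```r
# Toy stand-in for the imported sheet: every column arrives as character.
toy <- data.frame(
  Country    = c("Afghanistan", "Albania"),
  GDP        = c("690.8426", "4247.4854"),
  Population = c("29726803", "2900489"),
  GINI.Index = c("Data unavailable", "28.96"),
  stringsAsFactors = FALSE
)

# Convert a set of columns to numeric in one step.
num_cols <- c("GDP", "Population")
toy[num_cols] <- lapply(toy[num_cols], as.numeric)

# Recode the missing-value marker explicitly, then convert.
toy$GINI.Index[toy$GINI.Index == "Data unavailable"] <- NA
toy$GINI.Index <- as.numeric(toy$GINI.Index)

str(toy)
```

Recoding the marker first avoids the coercion warning that as.numeric() would otherwise raise for the non-numeric text.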

Let’s see the statistics for this dataset, with summary().

summary(hpi[, 3:12])

##                           Region   Avg.Life.Expectancy Avg.Wellbeing  
##  Americas                    :25   Min.   :48.91       Min.   :2.867  
##  Asia Pacific                :21   1st Qu.:65.04       1st Qu.:4.575  
##  Europe                      :20   Median :73.50       Median :5.250  
##  Middle East and North Africa:14   Mean   :70.93       Mean   :5.408  
##  Post-communist              :26   3rd Qu.:77.02       3rd Qu.:6.225  
##  Sub Saharan Africa          :34   Max.   :83.57       Max.   :7.800  
##  Happy.Life.Years Footprint.gha      Inequality      Inequality.LE  
##  Min.   : 8.97    Min.   : 0.610   Min.   :0.04322   Min.   :27.32  
##  1st Qu.:18.69    1st Qu.: 1.425   1st Qu.:0.13353   1st Qu.:48.21  
##  Median :29.40    Median : 2.680   Median :0.21174   Median :63.41  
##  Mean   :30.25    Mean   : 3.258   Mean   :0.23291   Mean   :60.34  
##  3rd Qu.:39.71    3rd Qu.: 4.482   3rd Qu.:0.32932   3rd Qu.:72.57  
##  Max.   :59.32    Max.   :15.820   Max.   :0.50734   Max.   :81.26  
##   Inequality.W        HPI             GDP          
##  Min.   :2.421   Min.   :12.78   Min.   :   244.2  
##  1st Qu.:4.047   1st Qu.:21.21   1st Qu.:  1628.1  
##  Median :4.816   Median :26.29   Median :  5691.1  
##  Mean   :4.973   Mean   :26.41   Mean   : 13911.1  
##  3rd Qu.:5.704   3rd Qu.:31.54   3rd Qu.: 15159.1  
##  Max.   :7.625   Max.   :44.71   Max.   :105447.1

1.2.1 GDP vs Average Life Expectancy

Let’s see whether there is a connection between GDP and average life expectancy.

ggplot(hpi, aes(x=GDP, y=Avg.Life.Expectancy)) + 
  geom_point(aes(size=Population, color=Region)) +
  coord_trans(x = 'log10') +
  geom_smooth(method = 'loess') + 
  ggtitle('Life Expectancy and GDP per Capita in USD') +
  theme_classic() +
  theme(legend.justification = "left", legend.title = element_text(face = "bold")) +
  ylim(40, 90) +
  labs(x = "GDP in USD (with log transformation)",
       y = "Average Life Expectancy (year)")

With GDP on a log scale, the relationship between GDP per capita and life expectancy is clearer and looks relatively strong: the two variables rise together. The Pearson correlation between these two variables, computed below on the untransformed GDP values, is reasonably high at approximately 0.62.

cor.test(hpi$GDP, hpi$Avg.Life.Expectancy)

## 
##  Pearson's product-moment correlation
## 
## data:  hpi$GDP and hpi$Avg.Life.Expectancy
## t = 9.3042, df = 138, p-value = 2.766e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5072215 0.7133067
## sample estimates:
##       cor 
## 0.6208781
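Because the plot uses a log scale while cor.test() above uses raw GDP, it can help to compare the two. The sketch below uses made-up numbers (not the HPI data) in which life expectancy rises roughly linearly with log10(GDP). It shows that Pearson's r is higher on the log scale, and that Spearman's rank correlation is unaffected by the transformation:

```r
# Made-up values: life expectancy rises roughly linearly in log10(GDP).
gdp  <- c(300, 700, 1500, 4000, 9000, 20000, 45000, 100000)
life <- 30 + 10 * log10(gdp) + c(1, -1, 0.5, -0.5, 1, -1, 0.5, -0.5)

r_raw      <- cor(gdp, life)                       # Pearson on raw GDP
r_log      <- cor(log10(gdp), life)                # Pearson on log10(GDP)
r_spearman <- cor(gdp, life, method = "spearman")  # rank-based, unchanged by log

c(raw = r_raw, log10 = r_log, spearman = r_spearman)
```

On the real data the same comparison would be cor.test(log10(hpi$GDP), hpi$Avg.Life.Expectancy). Its result is not shown here because it was not part of the original analysis.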

1.2.2 Average Life Expectancy vs HPI

Let’s see whether there is a connection between average life expectancy and the Happy Planet Index score.

ggplot(hpi, aes(x=Avg.Life.Expectancy, y=HPI)) + 
  geom_point(aes(size=Population, color=Region)) + 
  geom_smooth(method = 'loess') + 
  ggtitle('Average Life Expectancy and Happy Planet Index Score') +
  theme_classic() +
  theme(legend.justification = "left", legend.title = element_text(face = "bold")) +
  ylim(0, 50) +
  labs(x = "Average Life Expectancy (year)",
       y = "Happy Planet Index Score")

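Many countries in Europe and the Americas end up with a middle-to-low HPI score despite long life expectancy, probably because of their large carbon footprints. Overall, the Pearson correlation between life expectancy and HPI is moderate, at about 0.54.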

cor.test(hpi$Avg.Life.Expectancy, hpi$HPI)

## 
##  Pearson's product-moment correlation
## 
## data:  hpi$Avg.Life.Expectancy and hpi$HPI
## t = 7.5519, df = 138, p-value = 5.314e-12
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4118020 0.6484859
## sample estimates:
##       cor 
## 0.5407609

1.2.3 GDP vs HPI

Let’s see whether there is a connection between GDP and the Happy Planet Index score.

ggplot(hpi, aes(x=GDP, y=HPI)) + 
  geom_point(aes(size=Population, color=Region)) + 
  coord_trans(x = 'log10') +
  geom_smooth(method = 'loess') + 
  ggtitle('GDP per Capita in USD and Happy Planet Index Score') +
  theme_classic() +
  theme(legend.justification = "left", legend.title = element_text(face = "bold")) +
  ylim(0, 50) +
  labs(x = "GDP in USD (with log transformation)",
       y = "Happy Planet Index Score")

Apparently, money (GDP) can’t buy happiness. The correlation between GDP and the Happy Planet Index score is indeed very low, at about 0.11, and it is not statistically significant (p = 0.18; the 95% confidence interval includes zero).

cor.test(hpi$GDP, hpi$HPI)

## 
##  Pearson's product-moment correlation
## 
## data:  hpi$GDP and hpi$HPI
## t = 1.3507, df = 138, p-value = 0.179
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.05267424  0.27492060
## sample estimates:
##       cor 
## 0.1142272
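The numbers in this output can be reproduced by hand from the correlation and the degrees of freedom. The sketch below starts from the rounded values printed above and uses the standard t statistic for a Pearson correlation and the Fisher z-transformation for the interval. Nothing here is specific to the HPI data beyond r and df:

```r
# Reported by cor.test() above for GDP vs HPI.
r  <- 0.1142272
df <- 138                 # n - 2, with n = 140 countries
n  <- df + 2

# t statistic and two-sided p-value for H0: correlation = 0.
t_stat  <- r * sqrt(df) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval via the Fisher z-transformation.
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

round(c(t = t_stat, p = p_value, lower = ci[1], upper = ci[2]), 4)
```

The interval spans zero, which is why the test does not reject the hypothesis of no linear association.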

2 Unsupervised Learning

2.1 Always (almost) scale the data.

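An important step for meaningful clustering is to transform the variables so that each has mean zero and standard deviation one. Otherwise variables measured on large scales, such as GDP and Population, would dominate the distance calculations.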

hpi_scale <- hpi[, 4:13]
hpi_scale  <- scale(hpi_scale)
summary(hpi_scale)

##  Avg.Life.Expectancy Avg.Wellbeing     Happy.Life.Years  
##  Min.   :-2.5153     Min.   :-2.2128   Min.   :-1.60493  
##  1st Qu.:-0.6729     1st Qu.:-0.7252   1st Qu.:-0.87191  
##  Median : 0.2939     Median :-0.1374   Median :-0.06378  
##  Mean   : 0.0000     Mean   : 0.0000   Mean   : 0.00000  
##  3rd Qu.: 0.6968     3rd Qu.: 0.7116   3rd Qu.: 0.71388  
##  Max.   : 1.4449     Max.   : 2.0831   Max.   : 2.19247  
##  Footprint.gha       Inequality      Inequality.LE      Inequality.W    
##  Min.   :-1.1493   Min.   :-1.5692   Min.   :-2.2192   Min.   :-2.1491  
##  1st Qu.:-0.7955   1st Qu.:-0.8222   1st Qu.:-0.8152   1st Qu.:-0.7795  
##  Median :-0.2507   Median :-0.1751   Median : 0.2060   Median :-0.1317  
##  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000  
##  3rd Qu.: 0.5317   3rd Qu.: 0.7976   3rd Qu.: 0.8221   3rd Qu.: 0.6162  
##  Max.   : 5.4532   Max.   : 2.2702   Max.   : 1.4059   Max.   : 2.2339  
##       HPI                GDP            Population     
##  Min.   :-1.86308   Min.   :-0.6921   Min.   :-0.2990  
##  1st Qu.:-0.71120   1st Qu.:-0.6220   1st Qu.:-0.2740  
##  Median :-0.01653   Median :-0.4163   Median :-0.2339  
##  Mean   : 0.00000   Mean   : 0.0000   Mean   : 0.0000  
##  3rd Qu.: 0.70106   3rd Qu.: 0.0632   3rd Qu.:-0.0913  
##  Max.   : 2.50110   Max.   : 4.6356   Max.   : 8.1562
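As a small illustration of what scale() does to each column, the sketch below standardizes a made-up vector by hand and compares the result with scale(). The values are arbitrary and not taken from the HPI data:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)      # made-up values

z_manual <- (x - mean(x)) / sd(x)   # centre, then divide by the sample SD
z_scale  <- as.numeric(scale(x))    # scale() does the same per column

all.equal(z_manual, z_scale)
c(mean = mean(z_manual), sd = sd(z_manual))
```

Note that scale() divides by the sample standard deviation (n - 1 in the denominator), the same quantity that sd() returns.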

2.1.1 A simple correlation heatmap

From this heatmap, we can see the correlation of the variables with each other.

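# qplot() is deprecated in recent ggplot2 releases; ggplot() + geom_tile() draws the same plot
ggplot(melt(cor(hpi_scale, use = "pairwise.complete.obs")),
       aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(limits = c(-1, 1)) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "Heatmap of Correlation Matrix",
       x = NULL, y = NULL)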
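A heatmap gives the overall picture, but a ranked list of variable pairs is easier to quote. The sketch below shows one way to build such a list from a correlation matrix, using the built-in mtcars data as a stand-in (an assumption for illustration). With the HPI data, the matrix would be cor(hpi_scale):

```r
# Correlation matrix of a few built-in columns (stand-in for cor(hpi_scale)).
cor_mat <- cor(mtcars[, c("mpg", "wt", "hp", "qsec")])

# Keep each pair once by taking the upper triangle.
idx <- which(upper.tri(cor_mat), arr.ind = TRUE)
cor_pairs <- data.frame(
  var1 = rownames(cor_mat)[idx[, "row"]],
  var2 = colnames(cor_mat)[idx[, "col"]],
  r    = cor_mat[idx]
)

# Strongest correlations first, regardless of sign.
cor_pairs <- cor_pairs[order(-abs(cor_pairs$r)), ]
cor_pairs
```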

2.1.2 Principal Component Analysis (PCA)

PCA is a procedure for identifying a smaller number of uncorrelated variables, called principal components, from a large set of data. The goal of principal components analysis is to explain the maximum amount of variance with the minimum number of principal components.

hpi.pca <- PCA(hpi_scale, graph=FALSE)
print(hpi.pca)

## **Results for the Principal Component Analysis (PCA)**
## The analysis was performed on 140 individuals, described by 10 variables
## *The results are available in the following objects:
## 
##    name               description                          
## 1  "$eig"             "eigenvalues"                        
## 2  "$var"             "results for the variables"          
## 3  "$var$coord"       "coord. for the variables"           
## 4  "$var$cor"         "correlations variables - dimensions"
## 5  "$var$cos2"        "cos2 for the variables"             
## 6  "$var$contrib"     "contributions of the variables"     
## 7  "$ind"             "results for the individuals"        
## 8  "$ind$coord"       "coord. for the individuals"         
## 9  "$ind$cos2"        "cos2 for the individuals"           
## 10 "$ind$contrib"     "contributions of the individuals"   
## 11 "$call"            "summary statistics"                 
## 12 "$call$centre"     "mean of the variables"              
## 13 "$call$ecart.type" "standard error of the variables"    
## 14 "$call$row.w"      "weights for the individuals"        
## 15 "$call$col.w"      "weights for the variables"

eigenvalues <- hpi.pca$eig
head(eigenvalues)

##        eigenvalue percentage of variance cumulative percentage of variance
## comp 1 6.66741533             66.6741533                          66.67415
## comp 2 1.31161290             13.1161290                          79.79028
## comp 3 0.97036077              9.7036077                          89.49389
## comp 4 0.70128270              7.0128270                          96.50672
## comp 5 0.24150648              2.4150648                          98.92178
## comp 6 0.05229306              0.5229306                          99.44471

2.1.3 Interpretation

The proportion of variation retained by the principal components was extracted above. An eigenvalue is the amount of variation retained by each principal component (PC). The first PC corresponds to the direction of maximum variation in the data set. In this case, the first two principal components are worth considering: a commonly used criterion for the number of components to retain is the eigenvalue-greater-than-one rule proposed by Kaiser (1960), and only the first two eigenvalues (6.67 and 1.31) exceed one.
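The Kaiser rule and the variance percentages can be checked directly from the eigenvalues printed above. The sketch below copies the six eigenvalues shown by head() and relies on the fact that a PCA on 10 standardized variables has a total variance of 10:

```r
# First six eigenvalues as printed above (the PCA has 10 components in total).
eig <- c(6.66741533, 1.31161290, 0.97036077, 0.70128270, 0.24150648, 0.05229306)

n_vars <- 10                   # standardized variables, so total variance = 10
pct    <- 100 * eig / n_vars   # percentage of variance per component
cum    <- cumsum(pct)          # cumulative percentage

sum(eig > 1)                   # number of components kept by the Kaiser rule
round(cum[1:2], 2)             # cumulative variance after PC1 and PC2
```

The third eigenvalue (0.97) falls just short of the cutoff, so the rule is close to retaining three components. A cutoff like this is best treated as a guide rather than a hard boundary.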

fviz_screeplot(hpi.pca, addlabels = TRUE, ylim = c(0, 70)) +
  theme_classic()

The scree plot shows which components explain most of the variability in the data. In this case, almost 80% of the variance in the data is retained by the first two principal components.

head(hpi.pca$var$contrib)

##                         Dim.1       Dim.2       Dim.3      Dim.4
## Avg.Life.Expectancy 12.275001  2.29815687 0.002516184 18.4965447
## Avg.Wellbeing       12.318469  0.07472989 0.198445432 22.1593907
## Happy.Life.Years    14.793710  0.01288175 0.027105103  0.7180341
## Footprint.gha        9.021277 24.71161977 2.982449522  0.4891428
## Inequality          13.363651  0.30494623 0.010038818  9.7957329
## Inequality.LE       12.677892  0.95800977 0.001525891 19.5589843
##                          Dim.5
## Avg.Life.Expectancy 0.31797242
## Avg.Wellbeing       6.37614629
## Happy.Life.Years    0.03254368
## Footprint.gha       7.62967135
## Inequality          2.97699333
## Inequality.LE       0.07545215

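The contribution of each variable to each component was extracted above: the larger the value, the more the variable contributes to that component. Variables that are strongly correlated with PC1 and PC2 are the most important in explaining the variability in the data set. Among the six variables shown, Happy.Life.Years contributes most to PC1 (14.8%) and Footprint.gha dominates PC2 (24.7%).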

fviz_pca_var(hpi.pca, col.var="contrib",
             gradient.cols = c("red", "yellow", "blue"),
             repel = TRUE 
             )

This highlights the most important variables in explaining the variations retained by the principal components.
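To make the idea of a contribution concrete, here is a minimal sketch that computes contributions with base R's prcomp() on the built-in USArrests data. Both are stand-ins chosen for illustration, not the HPI analysis. For a PCA on standardized variables, these percentages should correspond to what FactoMineR reports in $var$contrib:

```r
# prcomp() on a built-in data set, standing in for FactoMineR::PCA().
pca <- prcomp(USArrests, scale. = TRUE)

# Each column of loadings has unit length, so the squared loadings
# times 100 are the percentage contributions to that component.
contrib <- 100 * pca$rotation^2

round(contrib[, 1:2], 2)
colSums(contrib)                      # each component sums to 100

# Contribution each variable would have if all contributed equally.
threshold <- 100 / nrow(contrib)
rownames(contrib)[contrib[, "PC1"] > threshold]
```

With 10 variables, as in the HPI data, the equal-contribution threshold would be 10%, which gives a simple yardstick for reading the table above.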

2.2 Using PAM Clustering Analysis to group countries by wealth, development, carbon emissions, and happiness

With partitioning algorithms such as PAM, the number of clusters k must be specified by the analyst. I use the following method to help find the best k.

The NbClust package provides 30 indices for determining the number of clusters and proposes the best clustering scheme to the user from the results obtained by varying all combinations of number of clusters, distance measures, and clustering methods.

number <- NbClust(hpi_scale, 
                  distance="euclidean",
                  min.nc=2, 
                  max.nc=15, # By default, max.nc=15
                  method='ward.D', 
                  index='all', 
                  alphaBeale = 0.1)

## *** : The Hubert index is a graphical method of determining the number of clusters.
##                 In the plot of Hubert index, we seek a significant knee that corresponds to a 
##                 significant increase of the value of the measure i.e the significant peak in Hubert
##                 index second differences plot. 
##

## *** : The D index is a graphical method of determining the number of clusters. 
##                 In the plot of D index, we seek a significant knee (the significant peak in Dindex
##                 second differences plot) that corresponds to a significant increase of the value of
##                 the measure. 
##  
## ******************************************************************* 
## * Among all indices:                                                
## * 4 proposed 2 as the best number of clusters 
## * 7 proposed 3 as the best number of clusters 
## * 1 proposed 5 as the best number of clusters 
## * 5 proposed 6 as the best number of clusters 
## * 3 proposed 10 as the best number of clusters 
## * 3 proposed 15 as the best number of clusters 
## 
##                    ***** Conclusion *****                            
##  
## * According to the majority rule, the best number of clusters is  3 
##  
##  
## *******************************************************************
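The "majority rule" in this summary is a simple vote count. The sketch below tallies the votes exactly as listed in the output above and reports the winner and its share:

```r
# Votes as reported in the NbClust summary above.
votes <- c("2" = 4, "3" = 7, "5" = 1, "6" = 5, "10" = 3, "15" = 3)

best_k <- names(votes)[which.max(votes)]
share  <- max(votes) / sum(votes)

best_k                  # number of clusters with the most votes
round(100 * share, 1)   # share of the listed votes behind the winner, in percent
```

Three clusters win with 7 of the 23 listed votes, which is a plurality rather than an outright majority, and six clusters come second with 5 votes. It is therefore worth checking that the chosen k also produces interpretable groups.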

I will apply K = 3 in the following steps.

set.seed(2018)
pam <- pam(hpi_scale, diss=FALSE, k = 3, keep.data=TRUE)

The silhouette plot, along with the number of countries assigned to each cluster and each cluster's average silhouette width.

fviz_silhouette(pam)

##   cluster size ave.sil.width
## 1       1   43          0.46
## 2       2   66          0.32
## 3       3   31          0.37
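The silhouette width measures how well a point sits in its own cluster compared with the nearest other cluster: values near 1 indicate a good fit and values near 0 a borderline case. The sketch below computes it by hand for four made-up points. The helper `sil_width` is a hypothetical name for illustration, not a function from the cluster package. It then combines the rounded per-cluster averages from the table above into an overall average:

```r
# Four made-up points on a line, in two clusters.
x  <- c(0, 1, 5, 6)
cl <- c(1, 1, 2, 2)
d  <- as.matrix(dist(x))

# Silhouette width of point i: (b - a) / max(a, b), where
#   a = mean distance to the other members of its own cluster,
#   b = smallest mean distance to the members of any other cluster.
# Assumes every cluster has at least two members.
sil_width <- function(i) {
  own <- cl == cl[i]
  own[i] <- FALSE
  a <- mean(d[i, own])
  b <- min(sapply(setdiff(unique(cl), cl[i]),
                  function(k) mean(d[i, cl == k])))
  (b - a) / max(a, b)
}
s <- sapply(seq_along(x), sil_width)
round(s, 3)

# Overall average for the PAM result, from the (rounded) table above.
overall <- weighted.mean(c(0.46, 0.32, 0.37), w = c(43, 66, 31))
round(overall, 2)
```

An overall average of roughly 0.37 suggests the three clusters are distinguishable but overlap, with cluster 2 the least compact.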

This prints the medoid of each cluster, that is, the country that best represents it.

hpi$Country[pam$id.med]

## [1] "Liberia" "Romania" "Ireland"
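Unlike a k-means centroid, a medoid is an actual observation, which is why pam$id.med can be used to look up real country names. As a minimal sketch on five made-up points treated as a single cluster, the medoid is the member with the smallest total distance to all the others:

```r
# Five made-up points forming one cluster.
pts <- rbind(c(0, 0), c(1, 0), c(0, 1), c(1, 1), c(0.5, 0.4))
d   <- as.matrix(dist(pts))     # pairwise Euclidean distances

# The medoid is the member with the smallest total distance to the others.
total_dist <- rowSums(d)
medoid <- unname(which.min(total_dist))

medoid          # index of the medoid
pts[medoid, ]   # its coordinates
```

PAM repeats this idea across k clusters at once, swapping candidate medoids until the total distance from each observation to its nearest medoid stops improving.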

2.2.1 The visualization of PAM Clustering Analysis

I use fviz_cluster() which provides ggplot2-based elegant visualization of partitioning methods including kmeans, pam, HCPC (from FactoMineR), etc. Observations are represented by points in the plot, using principal components if ncol(data) > 2. An ellipse is drawn around each cluster.

fviz_cluster(pam, stand = FALSE, geom = "point",
             ellipse.type = "norm", ggtheme = theme_classic())

Hmm, the plot is easy to read, but which country belongs to which cluster? I think we’ll get a better idea if we use a world map to visualize the clusters.

2.3 A World map of three clusters

I join the world map from map_data() with our Happy Planet Index dataset using left_join() from the dplyr package, after renaming “USA” in the map data to “United States of America” so that it matches the HPI country name.

hpi['Cluster'] <- as.factor(pam$clustering)
map <- map_data("world")

## 
## Attaching package: 'maps'

## The following object is masked from 'package:cluster':
## 
##     votes.repub

## The following object is masked from 'package:purrr':
## 
##     map

map$region[map$region == "USA"] <- "United States of America"
map1 <- left_join(map, hpi[, c("Region", "Country", "Avg.Life.Expectancy", "Avg.Wellbeing", "Footprint.gha", "Inequality", "HPI", "GDP", "Population", "Cluster")], by = c("region" = "Country"))

glimpse(map1)

## Observations: 99,338
## Variables: 15
## $ long                <dbl> -69.89912, -69.89571, -69.94219, -70.00415...
## $ lat                 <dbl> 12.45200, 12.42300, 12.43853, 12.50049, 12...
## $ group               <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, ...
## $ order               <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14,...
## $ region              <chr> "Aruba", "Aruba", "Aruba", "Aruba", "Aruba...
## $ subregion           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ Region              <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, Mi...
## $ Avg.Life.Expectancy <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 59...
## $ Avg.Wellbeing       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 3....
## $ Footprint.gha       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0....
## $ Inequality          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0....
## $ HPI                 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 20...
## $ GDP                 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 69...
## $ Population          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 29...
## $ Cluster             <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1,...
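Because left_join() keeps every map row, an HPI country whose name is spelled differently in the map data silently ends up with NA values and no cluster colour. The sketch below shows one way to detect and fix such mismatches. The name vectors and the lookup table are illustrative assumptions, and the actual spellings should be checked with unique(map$region):

```r
# Illustrative names only; check the real map with unique(map$region).
hpi_names <- c("France", "United States of America", "United Kingdom")
map_names <- c("France", "USA", "UK")

setdiff(hpi_names, map_names)     # HPI countries with no polygon to join

# Hypothetical lookup: map spelling -> HPI spelling.
recode <- c("USA" = "United States of America", "UK" = "United Kingdom")
hit <- map_names %in% names(recode)
map_names[hit] <- unname(recode[map_names[hit]])

setdiff(hpi_names, map_names)     # empty once all names line up
```

In the analysis itself the first check would be setdiff(hpi$Country, map$region), run before the join.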

HPI_map <- ggplot(map1) + 
  geom_polygon(aes(x = long, 
                   y = lat, 
                   group = group, 
                   fill = Cluster,
                   colour = Cluster)) +
  coord_equal() +
  labs(title = "Clustering Happy Planet Index",
       subtitle = "Based on data from: http://happyplanetindex.org/",
       x = NULL, 
       y = NULL) +
  theme_classic() +
  theme(plot.title = element_text(face = "bold"),
        plot.subtitle = element_text(face = "italic"),
        legend.position = "bottom",
        legend.justification = "center", 
        legend.title = element_text(face = "bold"))

HPI_map

2.4 Typical characteristics from every cluster

2.4.1 Cluster 1

C1 <- hpi[hpi$Cluster == 1, ]
C1 <- C1[, c("Country", "Region", "Avg.Life.Expectancy", "Avg.Wellbeing", "Footprint.gha", "Inequality", "HPI", "GDP", "Population")]

C1$Country

##  [1] "Afghanistan"       "Benin"             "Bolivia"          
##  [4] "Botswana"          "Burkina Faso"      "Burundi"          
##  [7] "Cambodia"          "Cameroon"          "Chad"             
## [10] "Comoros"           "Cote d'Ivoire"     "Djibouti"         
## [13] "Ethiopia"          "Gabon"             "Ghana"            
## [16] "Guinea"            "Haiti"             "India"            
## [19] "Kenya"             "Lesotho"           "Liberia"          
## [22] "Malawi"            "Mauritania"        "Mozambique"       
## [25] "Myanmar"           "Namibia"           "Niger"            
## [28] "Nigeria"           "Pakistan"          "Republic of Congo"
## [31] "Rwanda"            "Senegal"           "Sierra Leone"     
## [34] "South Africa"      "Swaziland"         "Syria"            
## [37] "Tanzania"          "Togo"              "Turkmenistan"     
## [40] "Uganda"            "Yemen"             "Zambia"           
## [43] "Zimbabwe"

summary(C1[, c(2:9)])

##                           Region   Avg.Life.Expectancy Avg.Wellbeing  
##  Americas                    : 2   Min.   :48.91       Min.   :2.867  
##  Asia Pacific                : 4   1st Qu.:56.71       1st Qu.:3.900  
##  Europe                      : 0   Median :60.31       Median :4.400  
##  Middle East and North Africa: 3   Mean   :59.88       Mean   :4.340  
##  Post-communist              : 1   3rd Qu.:63.44       3rd Qu.:4.850  
##  Sub Saharan Africa          :33   Max.   :70.39       Max.   :6.000  
##  Footprint.gha     Inequality          HPI             GDP         
##  Min.   :0.610   Min.   :0.2630   Min.   :12.78   Min.   :  244.2  
##  1st Qu.:1.030   1st Qu.:0.3529   1st Qu.:16.52   1st Qu.:  670.2  
##  Median :1.240   Median :0.3873   Median :19.63   Median : 1158.8  
##  Mean   :1.559   Mean   :0.3867   Mean   :20.11   Mean   : 1917.2  
##  3rd Qu.:1.610   3rd Qu.:0.4268   3rd Qu.:22.93   3rd Qu.: 1664.2  
##  Max.   :5.470   Max.   :0.5073   Max.   :31.50   Max.   :10642.4  
##    Population       
##  Min.   :7.337e+05  
##  1st Qu.:5.608e+06  
##  Median :1.457e+07  
##  Mean   :5.414e+07  
##  3rd Qu.:2.564e+07  
##  Max.   :1.264e+09

Cluster 1 consists mostly of countries in Sub-Saharan Africa (33 of 43), along with countries experiencing conflict such as Afghanistan, Syria, and Myanmar.

With low income (average GDP of USD 1,917), a low wellbeing score (average 4.34) and low life expectancy (about 60 years on average), the countries in this cluster have an average HPI of 20.11, the lowest of the three.

2.4.2 Cluster 2

C2 <- hpi[hpi$Cluster == 2, ]
C2 <- C2[, c("Country", "Region", "Avg.Life.Expectancy", "Avg.Wellbeing", "Footprint.gha", "Inequality", "HPI", "GDP", "Population")]

C2$Country

##  [1] "Albania"                "Algeria"               
##  [3] "Argentina"              "Armenia"               
##  [5] "Bangladesh"             "Belarus"               
##  [7] "Belize"                 "Bhutan"                
##  [9] "Bosnia and Herzegovina" "Brazil"                
## [11] "Bulgaria"               "China"                 
## [13] "Colombia"               "Croatia"               
## [15] "Dominican Republic"     "Ecuador"               
## [17] "Egypt"                  "El Salvador"           
## [19] "Estonia"                "Georgia"               
## [21] "Greece"                 "Guatemala"             
## [23] "Honduras"               "Hungary"               
## [25] "Indonesia"              "Iran"                  
## [27] "Iraq"                   "Jamaica"               
## [29] "Kazakhstan"             "Kyrgyzstan"            
## [31] "Latvia"                 "Lebanon"               
## [33] "Lithuania"              "Macedonia"             
## [35] "Malaysia"               "Malta"                 
## [37] "Mauritius"              "Mongolia"              
## [39] "Montenegro"             "Morocco"               
## [41] "Nepal"                  "Nicaragua"             
## [43] "Palestine"              "Panama"                
## [45] "Paraguay"               "Peru"                  
## [47] "Philippines"            "Poland"                
## [49] "Portugal"               "Romania"               
## [51] "Russia"                 "Serbia"                
## [53] "Slovakia"               "Sri Lanka"             
## [55] "Suriname"               "Tajikistan"            
## [57] "Thailand"               "Trinidad and Tobago"   
## [59] "Tunisia"                "Turkey"                
## [61] "Ukraine"                "Uruguay"               
## [63] "Uzbekistan"             "Vanuatu"               
## [65] "Venezuela"              "Vietnam"

summary(C2[, c(2:9)])

##                           Region   Avg.Life.Expectancy Avg.Wellbeing  
##  Americas                    :18   Min.   :67.95       Min.   :4.200  
##  Asia Pacific                :12   1st Qu.:70.85       1st Qu.:4.800  
##  Europe                      : 3   Median :74.03       Median :5.400  
##  Middle East and North Africa: 9   Mean   :73.48       Mean   :5.404  
##  Post-communist              :23   3rd Qu.:75.36       3rd Qu.:5.900  
##  Sub Saharan Africa          : 1   Max.   :80.50       Max.   :7.100  
##  Footprint.gha     Inequality          HPI             GDP         
##  Min.   :0.720   Min.   :0.1018   Min.   :14.27   Min.   :  685.5  
##  1st Qu.:1.890   1st Qu.:0.1652   1st Qu.:25.17   1st Qu.: 3599.3  
##  Median :2.750   Median :0.1904   Median :29.01   Median : 5942.5  
##  Mean   :3.022   Mean   :0.1983   Mean   :29.29   Mean   : 7698.1  
##  3rd Qu.:3.825   3rd Qu.:0.2292   3rd Qu.:34.44   3rd Qu.:11798.9  
##  Max.   :7.920   Max.   :0.3111   Max.   :40.70   Max.   :22242.7  
##    Population       
##  Min.   :2.475e+05  
##  1st Qu.:3.484e+06  
##  Median :9.692e+06  
##  Mean   :4.958e+07  
##  3rd Qu.:3.293e+07  
##  Max.   :1.351e+09

Cluster 2 is dominated by post-communist countries and developing countries in Asia Pacific and the Americas.

The average HPI of the countries in this cluster is 29.29, considerably higher than cluster 1’s average of 20.11.

2.4.3 Cluster 3

C3 <- hpi[hpi$Cluster == 3, ]
C3 <- C3[, c("Country", "Region", "Avg.Life.Expectancy", "Avg.Wellbeing", "Footprint.gha", "Inequality", "HPI", "GDP", "Population")]

C3$Country

##  [1] "Australia"                "Austria"                 
##  [3] "Belgium"                  "Canada"                  
##  [5] "Chile"                    "Costa Rica"              
##  [7] "Cyprus"                   "Czech Republic"          
##  [9] "Denmark"                  "Finland"                 
## [11] "France"                   "Germany"                 
## [13] "Hong Kong"                "Iceland"                 
## [15] "Ireland"                  "Israel"                  
## [17] "Italy"                    "Japan"                   
## [19] "Luxembourg"               "Mexico"                  
## [21] "Netherlands"              "New Zealand"             
## [23] "Norway"                   "Oman"                    
## [25] "Slovenia"                 "South Korea"             
## [27] "Spain"                    "Sweden"                  
## [29] "Switzerland"              "United Kingdom"          
## [31] "United States of America"

summary(C3[, c(2:9)])

##                           Region   Avg.Life.Expectancy Avg.Wellbeing  
##  Americas                    : 5   Min.   :76.30       Min.   :5.500  
##  Asia Pacific                : 5   1st Qu.:80.17       1st Qu.:6.450  
##  Europe                      :17   Median :81.11       Median :7.000  
##  Middle East and North Africa: 2   Mean   :80.80       Mean   :6.897  
##  Post-communist              : 2   3rd Qu.:81.89       3rd Qu.:7.400  
##  Sub Saharan Africa          : 0   Max.   :83.57       Max.   :7.800  
##  Footprint.gha      Inequality           HPI             GDP        
##  Min.   : 2.840   Min.   :0.04322   Min.   :13.15   Min.   :  9703  
##  1st Qu.: 5.000   1st Qu.:0.07118   1st Qu.:24.71   1st Qu.: 28758  
##  Median : 5.600   Median :0.08537   Median :30.02   Median : 44011  
##  Mean   : 6.114   Mean   :0.09316   Mean   :29.03   Mean   : 43775  
##  3rd Qu.: 6.838   3rd Qu.:0.10641   3rd Qu.:31.79   3rd Qu.: 50466  
##  Max.   :15.820   Max.   :0.18770   Max.   :44.71   Max.   :105447  
##    Population       
##  Min.   :   320716  
##  1st Qu.:  4836360  
##  Median :  9519374  
##  Mean   : 36172836  
##  3rd Qu.: 48388748  
##  Max.   :314112078

Cluster 3 is the happiest of them all, with the highest average wellbeing (6.897). The average HPI of the countries in this cluster is 29.03, almost the same as cluster 2, but there are big differences in average life expectancy (73.48 vs 80.80), average wellbeing (5.404 vs 6.897), inequality (19.83% vs 9.32%), and carbon footprint (3.022 gha vs 6.114 gha). Countries in cluster 3 have about twice the per-capita footprint of countries in cluster 2, and that is what brings their HPI score down to practically the same level as cluster 2.
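The three per-cluster summaries can also be gathered into a single comparison table. The sketch below uses a small made-up data frame (the values are invented for illustration). In the analysis, the same call would be applied to hpi with its Cluster column:

```r
# Made-up rows standing in for the hpi data with its Cluster column.
toy <- data.frame(
  Cluster       = factor(c(1, 1, 2, 2, 3, 3)),
  HPI           = c(18, 22, 28, 30, 27, 31),
  Footprint.gha = c(1.0, 2.0, 2.5, 3.5, 5.0, 7.0)
)

# One row per cluster, one column per averaged variable.
cluster_means <- aggregate(cbind(HPI, Footprint.gha) ~ Cluster,
                           data = toy, FUN = mean)
cluster_means
```

Putting the means side by side makes the pattern described above easy to see: similar HPI in two clusters, but a very different footprint.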

3 Final Visualization for Happy Planet Index 2016

I will use ggplotly() from the plotly package to make the visualization interactive, so that hovering over a country shows other data such as life expectancy, wellbeing, etc.

ggplotly(
ggplot(map1, aes(text = paste("Country : ", region, "\n",
                              "Life Exp : ", floor(Avg.Life.Expectancy), "\n",
                              "Wellbeing : ", round(Avg.Wellbeing, 2), "\n",
                              "Footprint : ", round(Footprint.gha, 2), " gha", "\n",
                              "GDP : USD ", floor(GDP), "\n",
                              "Population : ", format(Population, big.mark = ","), "\n",
                              "HPI : ", round(HPI, 2), "\n",
                              sep = ""))
       ) + 
  geom_polygon(aes(x = long, 
                   y = lat, 
                   group = group, 
                   fill = Cluster)) +
  coord_equal() +
  ggtitle("Clustering Happy Planet Index") +
  labs(x = "Longitude",
       y = "Latitude"),
tooltip ="text"
)

Happy Planet Index 2016 - Unsupervised Learning

Alvernia Eka Poetry

1/4/2019