Introduction

The World Happiness Report is an annual publication of the United Nations Sustainable Development Solutions Network. The first World Happiness Report was prepared in 2012 in support of the United Nations High-Level Meeting on ‘Well-Being and Happiness: Defining a New Economic Paradigm’, which was held on 2 April 2012. The report ranks countries by how happy their citizens perceive themselves to be.

It stresses that the well-being of citizens in a country is defined not only by macro-economic indicators such as Gross Domestic Product (GDP) but also by social indicators such as freedom to make life choices and healthy life expectancy. The World Happiness Report 2020 focuses mainly on the impact of social environments on the well-being of citizens.

We will study how indicators such as Ladder score, Logged GDP per capita, Perceptions of corruption, Social support, Generosity, Healthy life expectancy and Freedom to make life choices are related to each other.

Through this Exploratory Data Analysis (EDA), we also aim to answer the below questions:

  • Are there indicators other than GDP per capita that can be used to evaluate the well-being of a country's citizens? Can these social indicators be used alongside macro-economic indicators when developing social policies?

  • Does having a high GDP per capita always mean a higher level of happiness perceived by the people?

  • What factors influence the Ladder scores (happiness levels) of each country or across regions?

  • What factors are correlated to each other?

  • Which countries have reported the highest and the lowest happiness scores?

  • How does the average life expectancy of the citizens vary across the regions?

Data and Design Challenges and Solutions

Sourcing geospatial data and data cleaning

I initially wanted to compare happiness scores across the years. However, inspecting the datasets from previous years revealed some challenges. For example, certain countries such as Hong Kong had undergone name changes, so when the datasets from different years were combined, some countries appeared as duplicates.

Hence, the scope of this analysis is limited to the World Happiness 2020 dataset. It was also important to understand how the variables and their scores were calculated. The World Happiness 2020 dataset was sourced from Kaggle (https://www.kaggle.com/mathurinache/world-happiness-report).

The geospatial data was sourced from https://hub.arcgis.com/datasets/a21fdb46d23e4ef896f31475217cbb08_1/data

  • 251 countries

Below are the variables from the combined dataset:

  • CNTRY_NAME - Country name (141 countries)

  • Regional.indicator - Region (10 regions)

  • Ladder.score - Life evaluation score

  • Logged.GDP.per.capita - Natural logarithm of GDP per capita; the extent to which GDP contributes to the Ladder score is given separately by Explained.by..Log.GDP.per.capita

  • Healthy.life.expectancy - Healthy life expectancy at birth, based on data extracted from the World Health Organisation (WHO) data repository

  • Social.support - National average of binary (0 or 1) responses on whether the respondent has someone to count on in times of trouble, so the score ranges from 0 to 1

  • Freedom.to.make.life.choices - National average of responses to the Gallup World Poll question “Are you satisfied or dissatisfied with your freedom to choose what you do with your life?”

  • Generosity - Based on the national average of responses to the question “Have you donated money to a charity in the past month?”, adjusted for GDP per capita (which is why some values are negative)

  • Perceptions.of.corruption - National average of responses to the questions “Is corruption widespread throughout the government or not?” and “Is corruption widespread within businesses or not?”

Details on the metadata of the dataset are provided under the References section.

When the World Happiness 2020 dataset (153 countries) was combined with the geospatial dataset (251 countries), 12 countries were dropped by the join. After some investigation, I realised that this was because some countries were labelled differently in the two datasets (for example, Hong Kong and Hong Kong SAR), so their names did not match and they were not included in the join.

As a result, only 141 countries were taken into consideration in this analysis. This part of the pre-processing was particularly challenging.
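A quick way to find the countries dropped by the join is to compare the two name columns before merging and then harmonise the mismatches with a lookup table. The sketch below is a minimal base-R illustration, not the code used in this analysis: the small data frames are hypothetical stand-ins for the two datasets, the lookup vector name_fix is a hypothetical helper, and the assumption that the happiness data spells the name “Hong Kong SAR” while the shapefile spells it “Hong Kong” is for illustration only.

```r
# Minimal sketch: find countries lost in a join and repair them with a lookup.
# Both data frames are hypothetical stand-ins; only the column names follow
# the real datasets (Country.name in 2020.csv, CNTRY_NAME in the shapefile).
happiness <- data.frame(
  Country.name       = c("Finland", "Denmark", "Hong Kong SAR"),
  Regional.indicator = c("Western Europe", "Western Europe", "East Asia"),
  stringsAsFactors   = FALSE
)
geo <- data.frame(
  CNTRY_NAME       = c("Aruba", "Denmark", "Finland", "Hong Kong"),
  stringsAsFactors = FALSE
)

# 1. Names in the happiness data with no exact match in the geospatial data
unmatched <- setdiff(happiness$Country.name, geo$CNTRY_NAME)
print(unmatched)

# 2. Hypothetical lookup table: happiness spelling -> shapefile spelling
name_fix <- c("Hong Kong SAR" = "Hong Kong")
to_fix <- happiness$Country.name %in% names(name_fix)
happiness$Country.name[to_fix] <- unname(name_fix[happiness$Country.name[to_fix]])

# 3. Inner join on the harmonised names; merge() drops non-matching rows by default
combined <- merge(geo, happiness, by.x = "CNTRY_NAME", by.y = "Country.name")
print(combined)
```

With dplyr loaded, anti_join() answers the same "which rows have no match?" question. Because merge() defaults to all = FALSE (an inner join), unmatched countries disappear silently unless they are checked for explicitly.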

Proposed Sketched Design

Data Visualisation Preparation

This section discusses the steps taken to prepare the various visualisations.

Loading datasets

Load the following datasets:

  • ‘2020.csv’ using read.csv()
  • ‘99bfd9e7-bb42-4728-87b5-07f8c8ac631c2020328-1-1vef4ev.lu5nk.shp’ using read_sf(), as it contains spatial vector data. This dataset contains the geometry coordinates and is combined with the World Happiness 2020 dataset so that maps can be constructed.

The World Happiness 2020 dataset contains 153 countries and categorises them into 10 regions, namely Western Europe, North America and ANZ, Middle East and North Africa, Latin America and Caribbean, Central and Eastern Europe, East Asia, Southeast Asia, Commonwealth of Independent States, Sub-Saharan Africa and South Asia.

The geospatial data was sourced from ArcGIS Hub (see the link above). After combining the two datasets, it was observed that 12 countries did not match. This is one of the challenges typical of data-driven problems, and it is discussed in the ‘Data and Design Challenges and Solutions’ section.

With the combined dataset in place, this exercise studies the impact of the various social indicators.

## 'data.frame':    153 obs. of  20 variables:
##  $ Country.name                              : Factor w/ 153 levels "Afghanistan",..: 43 36 132 57 106 100 131 101 7 81 ...
##  $ Regional.indicator                        : Factor w/ 10 levels "Central and Eastern Europe",..: 10 10 10 10 10 10 10 6 10 10 ...
##  $ Ladder.score                              : num  7.81 7.65 7.56 7.5 7.49 ...
##  $ Standard.error.of.ladder.score            : num  0.0312 0.0335 0.035 0.0596 0.0348 ...
##  $ upperwhisker                              : num  7.87 7.71 7.63 7.62 7.56 ...
##  $ lowerwhisker                              : num  7.75 7.58 7.49 7.39 7.42 ...
##  $ Logged.GDP.per.capita                     : num  10.6 10.8 11 10.8 11.1 ...
##  $ Social.support                            : num  0.954 0.956 0.943 0.975 0.952 ...
##  $ Healthy.life.expectancy                   : num  71.9 72.4 74.1 73 73.2 ...
##  $ Freedom.to.make.life.choices              : num  0.949 0.951 0.921 0.949 0.956 ...
##  $ Generosity                                : num  -0.0595 0.0662 0.1059 0.2469 0.1345 ...
##  $ Perceptions.of.corruption                 : num  0.195 0.168 0.304 0.712 0.263 ...
##  $ Ladder.score.in.Dystopia                  : num  1.97 1.97 1.97 1.97 1.97 ...
##  $ Explained.by..Log.GDP.per.capita          : num  1.29 1.33 1.39 1.33 1.42 ...
##  $ Explained.by..Social.support              : num  1.5 1.5 1.47 1.55 1.5 ...
##  $ Explained.by..Healthy.life.expectancy     : num  0.961 0.979 1.041 1.001 1.008 ...
##  $ Explained.by..Freedom.to.make.life.choices: num  0.662 0.665 0.629 0.662 0.67 ...
##  $ Explained.by..Generosity                  : num  0.16 0.243 0.269 0.362 0.288 ...
##  $ Explained.by..Perceptions.of.corruption   : num  0.478 0.495 0.408 0.145 0.434 ...
##  $ Dystopia...residual                       : num  2.76 2.43 2.35 2.46 2.17 ...
## tibble [251 x 3] (S3: sf/tbl_df/tbl/data.frame)
##  $ OBJECTID  : int [1:251] 1 2 3 4 5 6 7 8 9 10 ...
##  $ CNTRY_NAME: chr [1:251] "Aruba" "Antigua and Barbuda" "Afghanistan" "Algeria" ...
##  $ geometry  :sfc_MULTIPOLYGON of length 251; first list element: List of 1
##   ..$ :List of 1
##   .. ..$ : num [1:11, 1:2] -69.9 -69.9 -70.1 -70.1 -70 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA
##   ..- attr(*, "names")= chr [1:2] "OBJECTID" "CNTRY_NAME"
##  [1] "OBJECTID"                                  
##  [2] "CNTRY_NAME"                                
##  [3] "geometry"                                  
##  [4] "Regional.indicator"                        
##  [5] "Ladder.score"                              
##  [6] "Standard.error.of.ladder.score"            
##  [7] "upperwhisker"                              
##  [8] "lowerwhisker"                              
##  [9] "Logged.GDP.per.capita"                     
## [10] "Social.support"                            
## [11] "Healthy.life.expectancy"                   
## [12] "Freedom.to.make.life.choices"              
## [13] "Generosity"                                
## [14] "Perceptions.of.corruption"                 
## [15] "Ladder.score.in.Dystopia"                  
## [16] "Explained.by..Log.GDP.per.capita"          
## [17] "Explained.by..Social.support"              
## [18] "Explained.by..Healthy.life.expectancy"     
## [19] "Explained.by..Freedom.to.make.life.choices"
## [20] "Explained.by..Generosity"                  
## [21] "Explained.by..Perceptions.of.corruption"   
## [22] "Dystopia...residual"

Correlation Plot

Social support is highly correlated to Ladder Score (0.76), Healthy Life Expectancy (0.73) and the Logged GDP per Capita (0.78).

Logged GDP per Capita is highly correlated to Ladder Score (0.77), Healthy Life Expectancy (0.83) and Social support (0.78).

The Happiness 2020 dataset is quite different from the datasets of previous years (2019, 2018, 2017). The happiness level of each country was previously referred to as its happiness score. In the 2020 dataset, happiness scores have been renamed Ladder scores, which better describes them as life evaluation scores.

The Ladder score represents the happiness level of each country. According to the World Happiness Report, it is the national average response to a life evaluation question. Respondents in each country were given the survey question “Please imagine a ladder, with steps numbered from 0 at the bottom to 10 at the top”, where the top of the ladder represents the best possible life and the bottom the worst possible life, and were asked to rate where they stand on this scale. The Ladder scores were computed from these ratings. A respondent who feels they have the best opportunity to improve their life in their country would give a higher rating.
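As a minimal illustration of what a “national average” of ladder ratings means, the sketch below averages hypothetical respondent-level ratings per country. The data frame responses and its columns are invented for illustration; the Kaggle file only contains the published country-level scores, and the official computation may involve further steps (for example survey weighting) that this sketch ignores.

```r
# Hypothetical respondent-level answers on the 0-10 ladder
# (0 = worst possible life, 10 = best possible life). These are invented
# values; the Kaggle file only holds the published country-level scores.
responses <- data.frame(
  country = c("A", "A", "A", "B", "B", "B", "B"),
  rating  = c(8, 7, 9, 4, 5, 3, 6),
  stringsAsFactors = FALSE
)

# Country-level score as the plain (unweighted) mean of the ratings
ladder <- aggregate(rating ~ country, data = responses, FUN = mean)
names(ladder)[names(ladder) == "rating"] <- "Ladder.score"
print(ladder)  # A: 8.0, B: 4.5
```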

As observed from the correlation plot, Ladder score is most strongly associated with Healthy Life Expectancy, Logged GDP per Capita and Social support.

The heatmaply package was loaded to plot the interactive heatmap, and the RColorBrewer package was loaded to apply the colour palette. To plot the heatmap, the values passed to the cor() function had to be numeric. Hence, the dataset was converted from a tibble to a data frame using the as.data.frame() function.
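The correlation step can be sketched in base R as follows. It assumes a plain data frame (for an sf object, sf::st_drop_geometry() removes the geometry column first) and keeps only the numeric columns before calling cor(). The values are copied from the first five rows of the str() output above purely to make the example runnable; five rows are far too few to reproduce the correlations quoted in this section.

```r
# First five rows of the str() output above, used only to make the
# example runnable; the real analysis uses all 141 matched countries.
df <- data.frame(
  Regional.indicator      = rep("Western Europe", 5),
  Ladder.score            = c(7.81, 7.65, 7.56, 7.50, 7.49),
  Logged.GDP.per.capita   = c(10.6, 10.8, 11.0, 10.8, 11.1),
  Social.support          = c(0.954, 0.956, 0.943, 0.975, 0.952),
  Healthy.life.expectancy = c(71.9, 72.4, 74.1, 73.0, 73.2),
  stringsAsFactors        = FALSE
)

# cor() needs numeric input, so drop the character/factor columns first
num_cols <- vapply(df, is.numeric, logical(1))
cor_mat  <- cor(df[, num_cols])
round(cor_mat, 2)
```

The resulting matrix can then be passed to heatmaply::heatmaply_cor() to draw an interactive heatmap; the exact palette arguments used for the plot in this report are not shown here.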

Data Visualisation charts

Scatter plot

From our correlation matrix plot above, we can observe that the variables Ladder Score and Social Support are closely correlated.

In the scatter plots below, the countries are grouped by region.

Ladder Score vs Social Support

  • From the above chart, we can observe that Social support scores and Ladder scores have a roughly linear, positive relationship. Most of the countries in the Western Europe region score high in terms of both Social support and Ladder score. Afghanistan has the lowest Ladder score and a Social support score of 0.47. The Central African Republic in the Sub-Saharan Africa region has the lowest Social support score.

  • The countries in Sub-Saharan Africa show a larger spread of Ladder scores and Social support scores. Mauritius performs very well in both Ladder score (6.10) and Social support score (0.91). At the same time, there are countries such as Benin, which has a relatively high Ladder score (5.21) but a low Social support score (0.47).

  • This could help countries identify the social policies to focus on in order to improve their Social support scores.
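To check the roughly linear relationship numerically, a simple least-squares fit can be added. This sketch uses only points quoted in this report (Mauritius and Benin from the bullets above, plus the first five rows of the str() output), so the fitted coefficients are illustrative rather than the result for all 141 countries.

```r
# Points quoted in this report: Mauritius and Benin from the bullets above,
# plus the first five rows of the str() output (illustrative subset only)
pts <- data.frame(
  Social.support = c(0.91, 0.47, 0.954, 0.956, 0.943, 0.975, 0.952),
  Ladder.score   = c(6.10, 5.21, 7.81, 7.65, 7.56, 7.50, 7.49)
)

# Ordinary least-squares fit of Ladder score on Social support
fit <- lm(Ladder.score ~ Social.support, data = pts)
coef(fit)               # intercept and slope
summary(fit)$r.squared  # share of variance explained in this small subset
```

In a ggplot2 scatter plot grouped by region, geom_smooth(method = "lm") overlays the same kind of fitted line.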

Logged GDP per capita vs Healthy Life Expectancy

We observed from the correlation plot that the GDP per capita is also correlated to the Healthy Life Expectancy.

Hence, we wanted to explore whether countries with a high GDP per capita also have a higher healthy life expectancy.

  • The countries in Western Europe, which have a high GDP per capita, generally have a higher life expectancy than countries in other regions.

  • We can also observe from this chart that Singapore, one of the countries with the highest GDP per capita, also has a high healthy life expectancy of about 76.8 years.

  • The Central African Republic in the Sub-Saharan Africa region has one of the lowest logged GDP per capita values (6.63) and the lowest healthy life expectancy (45). Violence and the displacement of its people are among the leading causes of this low life expectancy. The Central African Republic has faced decades of political instability since it gained independence from France in 1960, and it has also been grappling with diseases such as HIV/AIDS, influenza, pneumonia and diarrhoeal diseases.

  • More details on the situation in the Central African Republic can be found at https://borgenproject.org.
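Because Logged.GDP.per.capita is on a log scale, differences between countries are easier to interpret after back-transforming with exp(). The sketch assumes the values are natural logarithms of GDP per capita (PPP), as used in the World Happiness Report; the two inputs are the Central African Republic value quoted above (6.63) and the first row of the str() output (10.6), which belongs to the top-ranked country.

```r
# Assumes Logged.GDP.per.capita is the natural log of GDP per capita (PPP)
logged_gdp <- c(central_african_republic = 6.63, first_str_row = 10.6)

gdp <- exp(logged_gdp)  # back-transform to the original scale
round(gdp)

# A gap of 10.6 - 6.63 = 3.97 log units is a ratio of exp(3.97), about 53-fold
ratio <- exp(logged_gdp[["first_str_row"]] - logged_gdp[["central_african_republic"]])
ratio
```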

Dot plot

Top 10 and Bottom 10 Happiest Countries

In this section, we look at the ten happiest and the ten least happy countries by their Ladder scores. The arrange() function was applied to combined_dataset to sort the records by Ladder score (in decreasing order for the top 10 and in increasing order for the bottom 10), and the head() function was then used to extract the first 10 records in each case.

When plotting the dot plot, we wanted to show the countries in decreasing order of Ladder score. Hence, reorder(CNTRY_NAME, -Ladder.score) was applied so that the countries appear in decreasing order of Ladder score on the plot.

  • From the dot plot, we can observe that most of the top 10 countries are from the Western Europe region.
  • Finland has reported the highest Ladder score and is seen to be one of the happiest countries in the world.
  • From the above plot, we can observe that most of the countries that have reported low ladder scores are from the Sub-Saharan Africa region.
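The sorting and reordering steps described in this section can be sketched in base R; dplyr's arrange() with desc(), followed by head(), does the same job. The data frame below is a hypothetical stand-in for combined_dataset with placeholder countries and scores, and n is set to 3 only to keep the example short.

```r
# Hypothetical stand-in for combined_dataset (placeholder names and scores)
combined_dataset <- data.frame(
  CNTRY_NAME   = paste("Country", LETTERS[1:12]),
  Ladder.score = c(5.1, 7.8, 3.2, 6.4, 4.9, 2.6, 7.2, 5.8, 3.9, 6.9, 4.4, 6.1),
  stringsAsFactors = FALSE
)
n <- 3  # 10 in the report; 3 keeps the example short

# Decreasing sort + head(): the happiest countries
happiest <- head(combined_dataset[order(-combined_dataset$Ladder.score), ], n)
# Increasing sort + head(): the least happy countries
least_happy <- head(combined_dataset[order(combined_dataset$Ladder.score), ], n)

# reorder() turns CNTRY_NAME into a factor whose levels follow -Ladder.score,
# i.e. decreasing Ladder score, so ggplot2 lays the axis out in that order
happiest$CNTRY_NAME <- reorder(happiest$CNTRY_NAME, -happiest$Ladder.score)
levels(happiest$CNTRY_NAME)
```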

Box plot

How do Ladder scores vary across the different regions?

It can be observed that, in general, the countries in the North America and ANZ region and in the Western Europe region have reported high average Ladder scores. However, the spread of Ladder scores is wider in the Western Europe region, where the maximum Ladder score is 7.8 and the minimum is 5.51.
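The statistics a box plot summarises can also be computed directly. The sketch below uses hypothetical placeholder scores for two regions and base R's fivenum(), whose hinges and median are what base R's boxplot() draws as the box and its middle line. Note that this middle line marks the median rather than the mean.

```r
# Hypothetical placeholder scores for two regions (not the report's values)
scores <- data.frame(
  Regional.indicator = rep(c("Region 1", "Region 2"), each = 5),
  Ladder.score       = c(7.8, 7.3, 6.9, 6.0, 5.5,
                         7.2, 7.1, 7.0, 6.9, 6.8),
  stringsAsFactors   = FALSE
)
by_region <- split(scores$Ladder.score, scores$Regional.indicator)

# Five-number summary per region: min, lower hinge, median, upper hinge, max
lapply(by_region, fivenum)

# Range (max - min) as a simple measure of spread within each region
spread <- vapply(by_region, function(x) diff(range(x)), numeric(1))
spread
```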

What is the average life expectancy across the different regions?

  • The mean life expectancy is the highest in the Western Europe region with an average life expectancy of 72.8.
  • We can also observe that the mean life expectancy is lowest in the Sub-Saharan Africa region, at an average of 55.
  • This could be attributed to the fact that many of the countries in the Sub-Saharan Africa region are experiencing political unrest and have limited healthcare resources.
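Regional averages like the ones quoted above can be obtained with aggregate(). The sketch uses hypothetical placeholder regions and values, and sorts the regions from highest to lowest mean healthy life expectancy.

```r
# Hypothetical placeholder values (not the report's figures)
hle <- data.frame(
  Regional.indicator      = c("Region 1", "Region 1", "Region 2", "Region 2", "Region 3"),
  Healthy.life.expectancy = c(72, 74, 55, 57, 63),
  stringsAsFactors        = FALSE
)

# Mean healthy life expectancy per region, highest first
region_means <- aggregate(Healthy.life.expectancy ~ Regional.indicator,
                          data = hle, FUN = mean)
region_means <- region_means[order(-region_means$Healthy.life.expectancy), ]
region_means
```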

Map visualisation

The tmap package was loaded to create the maps below. In this analysis, we focus on the countries in the Southeast Asia region. The tm_text() function was used to display each country's name over its area on the map.

The various social indicators, such as Ladder score, Logged GDP per capita, Perceptions of corruption, Social support, Generosity, Healthy life expectancy and Freedom to make life choices, were plotted for the countries in Southeast Asia.
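Restricting the maps to Southeast Asia amounts to filtering on Regional.indicator before plotting. The runnable part below uses a hypothetical four-row stand-in for the combined dataset; the commented tmap lines are a hedged sketch in tmap version 3 syntax of how the filtered sf object could be mapped, not the exact code behind the maps in this report.

```r
# Hypothetical four-row stand-in for the combined dataset
combined_dataset <- data.frame(
  CNTRY_NAME         = c("Thailand", "Vietnam", "Finland", "Singapore"),
  Regional.indicator = c("Southeast Asia", "Southeast Asia",
                         "Western Europe", "Southeast Asia"),
  stringsAsFactors   = FALSE
)

# Keep only the Southeast Asian countries (row subsetting also works on an sf object)
sea <- combined_dataset[combined_dataset$Regional.indicator == "Southeast Asia", ]
sea$CNTRY_NAME

# Sketch of the mapping step in tmap v3 syntax (not run here; needs the real sf data):
# library(tmap)
# tmap_mode("view")
# tmap_style("watercolor")
# tm_shape(sea) + tm_polygons("Ladder.score") + tm_text("CNTRY_NAME")
```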

## tmap mode set to interactive viewing
## tmap style set to "watercolor"
## other available styles are: "white", "gray", "natural", "cobalt", "col_blind", "albatross", "beaver", "bw", "classic"
## Text size will be constant in view mode. Set tm_view(text.size.variable = TRUE) to enable variable text sizes.

The following observations can be made from the maps above.

  • Countries with high GDP per capita generally report a higher perception of corruption.
  • Thailand has reported the highest social support score in the region.
  • Thailand and Vietnam are among the countries that have reported a higher level of healthy life expectancy.
  • All countries in Southeast Asia report a score of at least 0.871 for Freedom to make life choices.

Insights drawn

  • A high GDP per capita does not always mean that the happiness level/satisfaction level perceived among its citizens is high as well.

  • The low life expectancy is very concerning in countries such as the Central African Republic.

  • Finland reported the highest Ladder score and is one of the happiest countries to live in.

  • Countries with high GDP per capita tend to have a higher perception of corruption, so more trust has to be built with their citizens.

  • In most cases, countries with a high GDP per capita also have a higher life expectancy.