1 Introduction

The World Happiness Report (WHR) is a publication of the United Nations Sustainable Development Solutions Network. It contains articles and rankings of national happiness, based on respondent ratings of their own lives, which the report also correlates with various (quality of) life factors.

The first World Happiness Report was released on April 1, 2012 as a foundational text for the UN High Level Meeting: Well-being and Happiness: Defining a New Economic Paradigm, drawing international attention. The first report outlined the state of world happiness, causes of happiness and misery, and policy implications highlighted by case studies. In 2013, the second World Happiness Report was issued, and in 2015 the third. Since 2016, it has been issued on an annual basis on the 20th of March, to coincide with the UN’s International Day of Happiness.

The WHR usually uses six key variables to explain the variation in national annual average scores: GDP per capita, social support, healthy life expectancy, freedom, generosity, and absence of corruption.

Since the World Happiness Report focuses on ranking, in this exploration we will compare each of the metrics used, rank countries on each one of them, and contrast the earliest available data (2008) with the latest (2020, which is the data used for the 2021 report).

Load Library

library(tidyverse)
library(dplyr)
library(ggpubr)
library(ggrepel)
library(GGally)

Load The Data

wh21_raw <- read.csv(file = "world-happiness-report-2021.csv")
wh_raw <- read.csv("world-happiness-report.csv")

2 Basic Information

wh_raw

str(wh_raw)
## 'data.frame':    1949 obs. of  11 variables:
##  $ ï..Country.name                 : chr  "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ year                            : int  2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 ...
##  $ Life.Ladder                     : num  3.72 4.4 4.76 3.83 3.78 ...
##  $ Log.GDP.per.capita              : num  7.37 7.54 7.65 7.62 7.71 ...
##  $ Social.support                  : num  0.451 0.552 0.539 0.521 0.521 0.484 0.526 0.529 0.559 0.491 ...
##  $ Healthy.life.expectancy.at.birth: num  50.8 51.2 51.6 51.9 52.2 ...
##  $ Freedom.to.make.life.choices    : num  0.718 0.679 0.6 0.496 0.531 0.578 0.509 0.389 0.523 0.427 ...
##  $ Generosity                      : num  0.168 0.19 0.121 0.162 0.236 0.061 0.104 0.08 0.042 -0.121 ...
##  $ Perceptions.of.corruption       : num  0.882 0.85 0.707 0.731 0.776 0.823 0.871 0.881 0.793 0.954 ...
##  $ Positive.affect                 : num  0.518 0.584 0.618 0.611 0.71 0.621 0.532 0.554 0.565 0.496 ...
##  $ Negative.affect                 : num  0.258 0.237 0.275 0.267 0.268 0.273 0.375 0.339 0.348 0.371 ...
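The `ï..` prefix on the first column name above is most likely a UTF-8 byte order mark (BOM) at the start of the CSV file, read under an encoding that does not strip it. A minimal sketch of one way to avoid it, on a temporary file created only for illustration (the file and its contents are invented, not the report data), is to pass `fileEncoding = "UTF-8-BOM"` to `read.csv`:

```r
# Write a tiny CSV that starts with a UTF-8 byte order mark (EF BB BF).
# The file and its contents are made up for this illustration.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("Country name,year\nFinland,2020\n"), con)
close(con)

# Telling read.csv about the BOM removes it before the header is parsed
demo <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(demo)
```

The rest of this notebook keeps the original column name `ï..Country.name`, so the renaming steps below are unchanged.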

Changing the column names of wh_raw and naming the dataframe wh

wh <- 
wh_raw %>% 
  select(country = ï..Country.name, 
         life_ladder = Life.Ladder,
         log_gdp_capita = Log.GDP.per.capita,
         social_support = Social.support, 
         health_birth_expectancy = Healthy.life.expectancy.at.birth,
         freedom = Freedom.to.make.life.choices,
         generosity = Generosity,
         corruption = Perceptions.of.corruption, 
         year,
         positive_affect = Positive.affect,
         negative_affect = Negative.affect)

wh21_raw

str(wh21_raw)
## 'data.frame':    149 obs. of  20 variables:
##  $ ï..Country.name                           : chr  "Finland" "Denmark" "Switzerland" "Iceland" ...
##  $ Regional.indicator                        : chr  "Western Europe" "Western Europe" "Western Europe" "Western Europe" ...
##  $ Ladder.score                              : num  7.84 7.62 7.57 7.55 7.46 ...
##  $ Standard.error.of.ladder.score            : num  0.032 0.035 0.036 0.059 0.027 0.035 0.036 0.037 0.04 0.036 ...
##  $ upperwhisker                              : num  7.9 7.69 7.64 7.67 7.52 ...
##  $ lowerwhisker                              : num  7.78 7.55 7.5 7.44 7.41 ...
##  $ Logged.GDP.per.capita                     : num  10.8 10.9 11.1 10.9 10.9 ...
##  $ Social.support                            : num  0.954 0.954 0.942 0.983 0.942 0.954 0.934 0.908 0.948 0.934 ...
##  $ Healthy.life.expectancy                   : num  72 72.7 74.4 73 72.4 73.3 72.7 72.6 73.4 73.3 ...
##  $ Freedom.to.make.life.choices              : num  0.949 0.946 0.919 0.955 0.913 0.96 0.945 0.907 0.929 0.908 ...
##  $ Generosity                                : num  -0.098 0.03 0.025 0.16 0.175 0.093 0.086 -0.034 0.134 0.042 ...
##  $ Perceptions.of.corruption                 : num  0.186 0.179 0.292 0.673 0.338 0.27 0.237 0.386 0.242 0.481 ...
##  $ Ladder.score.in.Dystopia                  : num  2.43 2.43 2.43 2.43 2.43 2.43 2.43 2.43 2.43 2.43 ...
##  $ Explained.by..Log.GDP.per.capita          : num  1.45 1.5 1.57 1.48 1.5 ...
##  $ Explained.by..Social.support              : num  1.11 1.11 1.08 1.17 1.08 ...
##  $ Explained.by..Healthy.life.expectancy     : num  0.741 0.763 0.816 0.772 0.753 0.782 0.763 0.76 0.785 0.782 ...
##  $ Explained.by..Freedom.to.make.life.choices: num  0.691 0.686 0.653 0.698 0.647 0.703 0.685 0.639 0.665 0.64 ...
##  $ Explained.by..Generosity                  : num  0.124 0.208 0.204 0.293 0.302 0.249 0.244 0.166 0.276 0.215 ...
##  $ Explained.by..Perceptions.of.corruption   : num  0.481 0.485 0.413 0.17 0.384 0.427 0.448 0.353 0.445 0.292 ...
##  $ Dystopia...residual                       : num  3.25 2.87 2.84 2.97 2.8 ...

Again, changing the column names of wh21_raw and naming the dataframe wh21

wh21 <- 
wh21_raw %>% 
  select(country = ï..Country.name, 
         regional_indicator = Regional.indicator,
         life_ladder = Ladder.score,
         se_ladder_score = Standard.error.of.ladder.score,
         upperwhisker,
         lowerwhisker,
         log_gdp_capita = Logged.GDP.per.capita,
         social_support = Social.support, 
         health_birth_expectancy = Healthy.life.expectancy,
         freedom = Freedom.to.make.life.choices,
         generosity = Generosity,
         corruption = Perceptions.of.corruption, 
         dystopia_ladder_score = Ladder.score.in.Dystopia,
         exp_log_gdp = Explained.by..Log.GDP.per.capita,
         exp_social_support = Explained.by..Social.support, 
         exp_health_expectancy = Explained.by..Healthy.life.expectancy,
         exp_freedom = Explained.by..Freedom.to.make.life.choices,
         exp_generosity = Explained.by..Generosity,
         exp_corruption = Explained.by..Perceptions.of.corruption,
         dystopia_residual = Dystopia...residual
         )

Then selecting only the key metrics for each dataframe available.

wh_alt <- 
wh %>% 
  select(country,life_ladder, log_gdp_capita, social_support, health_birth_expectancy, freedom, generosity, corruption, year) %>% 
  filter(year!=2020)


wh21_alt <-  
wh21%>% 
  select(country,life_ladder, log_gdp_capita, social_support, health_birth_expectancy, freedom, generosity, corruption, regional_indicator ) %>% 
  mutate(year = 2020)

#Selecting 'country' and 'regional_indicator' columns in 'wh21_alt' and naming it continent
continent <- 
wh21_alt %>% 
  select(country, regional_indicator)

#Full join 'wh_alt' and continent by country that creates a new 'wh_alt' (overwrite)
wh_alt <- 
full_join(wh_alt, continent, by = 'country')
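To see what a full join does with countries that appear in only one of the two tables, here is a minimal sketch on made-up data (`hist_toy` and `region_toy` are hypothetical stand-ins for `wh_alt` and `continent`; the values are invented). `full_join` keeps unmatched rows from both sides and fills the gaps with NA, whereas `left_join` keeps only the rows of the first table.

```r
library(dplyr)

# Toy stand-ins for wh_alt and continent (values invented for illustration)
hist_toy <- data.frame(country = c("A", "A", "B"),
                       year = c(2008, 2009, 2008),
                       life_ladder = c(5.1, 5.3, 6.0))
region_toy <- data.frame(country = c("A", "C"),
                         regional_indicator = c("Region 1", "Region 2"))

full_toy <- full_join(hist_toy, region_toy, by = "country")
left_toy <- left_join(hist_toy, region_toy, by = "country")

nrow(full_toy)   # rows from both tables, matched or not
nrow(left_toy)   # only the rows of hist_toy

# Countries that have history but no region label
anti_join(hist_toy, region_toy, by = "country")
```

A check like the `anti_join` call is a quick way to list the countries that would end up without a `regional_indicator`.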

2.1 NA Checking

colSums(is.na(wh_alt))
##                 country             life_ladder          log_gdp_capita 
##                       0                       0                      29 
##          social_support health_birth_expectancy                 freedom 
##                      13                      52                      31 
##              generosity              corruption                    year 
##                      82                     104                       0 
##      regional_indicator 
##                       0
colSums(is.na(wh21_alt))
##                 country             life_ladder          log_gdp_capita 
##                       0                       0                       0 
##          social_support health_birth_expectancy                 freedom 
##                       0                       0                       0 
##              generosity              corruption      regional_indicator 
##                       0                       0                       0 
##                    year 
##                       0

We see that there are several NA values in the wh_alt metrics, so we will have to deal with them separately later.
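The NA handling itself is not shown in this section. As one possible approach (an assumption, not necessarily what is done later), missing values can be skipped metric by metric with `na.rm = TRUE` instead of dropping whole rows. A small sketch on invented numbers:

```r
library(dplyr)

# Invented values; each metric is missing for a different row
toy <- data.frame(year = c(2008, 2008, 2008),
                  generosity = c(0.10, NA, 0.30),
                  corruption = c(0.80, 0.60, NA))

# Average each metric over the rows where that metric is available
toy_means <- toy %>%
  group_by(year) %>%
  summarise(across(c(generosity, corruption), ~ mean(.x, na.rm = TRUE)))
toy_means
```

This keeps a country-year in the average for every metric it does report, which matters here because the NA counts differ a lot between metrics.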

Combining the two dataframes.

wh_all <- 
bind_rows(wh_alt, wh21_alt) %>% 
  mutate(year = as.character(year))

tail(wh_all, 5)

3 Global Dynamics of Each Metric From 2008 to 2020

What a difference 12 years make.

  • GDP: GDP per capita is a measure used by economists to gauge the prosperity of a country based on its economic output; the dataset stores it in logged form. Over 12 years, the average indicator grew by 2.1%.
  • Social support: Social support covers people’s relationships and their willingness to help each other. The average indicator has increased by 3.5%.
  • Life expectancy: Healthy life expectancy is constructed from data in the World Health Organization (WHO) Global Health Observatory data repository. The world average has increased by 5.8%.
  • Freedom to make life choices: This is people’s freedom to choose what they do with their lives. It has increased by 16.3%, a bigger increase than the other factors, which is not surprising: the spread of the internet over this period has brought more transparency and better access to information to support decisions.
  • Generosity: Generosity is measured by recent donations. Over 12 years it has decreased by 143.3%. Because the indicator can take negative values (as in the data above), a drop of more than 100% implies that the average went from positive to negative: fewer people donated money to charity.
  • Perceptions of corruption: This is the response to the question “Is corruption widespread throughout the government and business or not?” We may agree that a decrease in people’s perceptions of corruption is a good sign.
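The code behind these percentages is not included in this section. A minimal sketch of how such figures could be computed, assuming each one is the percent change of the global mean between 2008 and 2020, is shown below on invented numbers (`toy` and `pct_change` are hypothetical names). It also shows how a metric that crosses zero, like generosity, can fall by more than 100%.

```r
library(dplyr)
library(tidyr)

# Invented values for two metrics in the two years being compared
toy <- data.frame(
  year = c("2008", "2008", "2020", "2020"),
  freedom = c(0.60, 0.70, 0.75, 0.81),
  generosity = c(0.04, 0.02, -0.01, -0.02)
)

pct_change <- toy %>%
  group_by(year) %>%
  # global mean of each metric per year, ignoring missing values
  summarise(across(c(freedom, generosity), ~ mean(.x, na.rm = TRUE))) %>%
  # one row per metric, one column per year
  pivot_longer(-year, names_to = "metric", values_to = "avg") %>%
  pivot_wider(names_from = year, values_from = avg) %>%
  mutate(change_pct = round((`2020` - `2008`) / `2008` * 100, 1))
pct_change
```

The same formula is used for the regional ladder scores in section 5.3, where `difference` is computed as `(2020 - 2008) / 2008 * 100`.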

4 Best and Worst Possible Countries by Metrics 2008 v. 2020

4.1 GDP per Capita

4.2 Social Support

4.3 Life Expectancy

4.4 Freedom

4.5 Generosity

4.6 Perception of Corruption

5 Happiest and Least Happy Regions and Countries in 2021

5.1 Rankings of Happiness by Regions

5.2 Happiest and Least Happy Countries

Many researchers have noted that there is a correlation between economic growth and reductions in happiness inequality, even when income inequality is increasing at the same time. One possibility is that economic growth in rich countries has translated into a more diverse society in terms of cultural expressions (e.g. through the emergence of alternative lifestyles), which has allowed people to converge in happiness even if they diverge in incomes, tastes and consumption.

For example, Zimbabwe has been dealing with disrupted livelihoods, especially among the extreme poor. In addition, supply-side challenges facing the health system have contributed to a decline in the coverage and quality of essential health services, and the country is also facing an economic crisis exacerbated by the COVID-19 (coronavirus) pandemic.

As for regions, the main problems that worry the South Asian states are unemployment and the low standard of living of a significant part of the population. Severe overpopulation is also a big problem. In addition, the countries of South Asia suffer from hunger and a lack of clean drinking water.

5.3 Dynamics of The Ladder Score by Region

# Average ladder score per region in 2008 and 2020, plus the percent change
ladder_dynamics <- 
wh_all %>% 
  select(regional_indicator, life_ladder, year) %>% 
  # year is stored as character in wh_all
  filter(year %in% c("2008", "2020")) %>% 
  group_by(regional_indicator, year) %>% 
  summarise(life_ladder = mean(life_ladder), .groups = "drop") %>% 
  pivot_wider(names_from = "year", values_from = "life_ladder") %>% 
  mutate(difference = round((`2020`-`2008`)/`2008`*100,1)) %>% 
  pivot_longer(cols=`2008`:`2020`, names_to = "year", values_to = "ladder")
ladder_dynamics %>% 
  ggplot(aes(x=factor(year),
             y = ladder,
             color = regional_indicator))+
  geom_line(aes(group = regional_indicator))+
  geom_point(alpha =0.8) +
  geom_text_repel(data = . %>%
                    filter(year == 2008),
                  aes(x = year, y = ladder, 
                      label = regional_indicator),
                  xlim = c(NA, 1),
                  ylim = c(0.1, NA),
                  size = 2.9)+
  geom_text(data = . %>% 
              filter (year == 2020),
            mapping = aes(x = year, y = ladder,
                          label = paste(difference,"%", sep="")),
            hjust = -.25, size = 3, fontface ="bold")+
  scale_color_manual( values = c( "#097168","#097168", "#097168", "#097168",  "#878787", 
                                "#878787", "#878787", "#097168", "#097168", "#878787" ))+
  ylab ("Average Ladder Score") +
  theme_minimal() +
  theme(legend.position = "none",
        axis.title.x  =element_blank())

Here we can see how each region’s average Ladder Score has changed. Middle East and North Africa, South Asia, North America and ANZ, and Western Europe are the regions whose average Happiness Score was lower in 2020 than in 2008.

6 Indonesia v. Southeast Asia

Since the writer is an Indonesian citizen and curious about how Indonesia’s happiness compares with that of the neighbouring countries, the next step is to compare Indonesia with the rest of its own region, Southeast Asia.

sea <- 
  wh_all %>% 
  mutate(year = as.numeric(year)) %>% 
  filter(regional_indicator == "Southeast Asia",
         year >2007)%>% 
  group_by(year) %>% 
  mutate(yrrank = row_number(-life_ladder))
ggplot(sea, 
       aes(x=year, y = life_ladder, label = yrrank)) + 
  geom_point(aes(color = country), size =10) + 
  theme_minimal() +
  geom_line(aes(color = country)) + 
  scale_x_continuous(breaks = seq(2008,2020,1)) + 
  scale_y_continuous(breaks = seq(0,8,0.5)) + 
  geom_text(color = 'white', size =5) + 
  scale_color_manual(values = c("#e4d1a4", "#750000", "#41b6c4","#1d91c0", "#bcc98e",
                                "#7fcdbb","#253494","#081d58","#2a7a78")) +
  labs(title = "Indonesia v. Southeast Asia", 
     subtitle = "Ranking the Happiness Score of Southeast Asian Countries Across The Years",
     x = "", y = "Happiness Score") + 
  theme(plot.title = element_text(size = 30, face = "bold"),
        plot.subtitle = element_text(size = 15),
        axis.title = element_text(size= 15),
        axis.text = element_text(size = 15),
        legend.direction = "horizontal", 
        legend.position = "bottom",
        legend.title = element_blank(),
        panel.grid.minor = element_blank()
        )

We can see that Indonesia is in the middle of the ranking most of the time. We can also say that the Philippines has consistently managed to improve its Happiness Score: it went from 7th in 2008 to the top 3 in the last five years (2016 - 2020).
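The numbers printed inside the points come from `row_number(-life_ladder)` computed within each year. A small sketch on invented scores (`toy` and `ranked` are hypothetical names) shows how this behaves: `row_number()` breaks ties by order of appearance, while `min_rank()` would give tied countries the same rank.

```r
library(dplyr)

# Invented scores; countries A and C are tied in 2019
toy <- data.frame(country = c("A", "B", "C", "A", "B", "C"),
                  year = c(2019, 2019, 2019, 2020, 2020, 2020),
                  life_ladder = c(5.0, 6.2, 5.0, 5.5, 5.4, 6.1))

ranked <- toy %>%
  group_by(year) %>%
  mutate(yrrank = row_number(-life_ladder),      # 1 = highest score in that year
         yrrank_ties = min_rank(-life_ladder)) %>%  # tied scores share a rank
  ungroup()
ranked
```

With real survey averages exact ties are unlikely, so the two functions would usually agree; the difference only matters when two countries have identical scores in a year.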