Data Visualization

World Happiness Report 2021

The World Happiness Report is a landmark survey of the state of global happiness. The report continues to gain global recognition as governments, organizations, and civil society increasingly use happiness indicators to inform their policy-making decisions. The reports review the state of happiness in the world today and show how the new science of happiness explains personal and national variations in happiness. The data were collected from http://worldhappiness.report and http://www.gallup.com/corporate/212381/who-we-are.aspx

Besides the happiness score itself, the dataset has columns that estimate the extent to which each of six factors – economic production / GDP per capita, social support, life expectancy, freedom, absence of corruption, and generosity – contributes to making life evaluations higher in each country than they are in Dystopia, a hypothetical country that has values equal to the world’s lowest national averages for each of the six factors.

Below is a description of the columns in the World Happiness Report dataset:

  • Ladder score: Happiness score or subjective well-being. This is the national average response to the question of life evaluations.
  • Logged GDP per capita: Natural logarithm of GDP per capita, in purchasing power parity terms.
  • Social support: Assistance or support provided to an individual by members of their social network. In the report it is measured as the national average of binary responses to the question of whether one has relatives or friends to count on in times of trouble.
  • Healthy life expectancy: The average number of years a person can expect to live in good health, that is, without irreversible limitation of activity in daily life or incapacity, for a fictitious generation subject to the mortality and morbidity conditions prevailing that year.
  • Freedom to make life choices: The national average of binary responses to the question “Are you satisfied or dissatisfied with your freedom to choose what you do with your life?”
  • Generosity: The residual of regressing the national average of responses to the GWP (Gallup World Poll) question “Have you donated money to a charity in the past month?” on GDP per capita.
  • Perceptions of corruption: The national average of the survey responses to two questions: “Is corruption widespread throughout the government or not?” and “Is corruption widespread within businesses or not?”
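  • Dystopia.residual: The Ladder score of Dystopia plus the country’s residual, the part of its score that the six factors do not explain. Dystopia is an imaginary country that has the world’s least-happy people, with values equal to the world’s lowest national averages for each factor; it serves as a benchmark against which to compare the contributions of the six factors.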
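
As a quick arithmetic check of how these columns relate, the sketch below adds up Finland's six "Explained by" values and its Dystopia residual, all copied from the dataset preview in the Read Data section, and compares the total with Finland's Ladder score. The variable names are illustrative assumptions, and the check covers this one row only.

```r
# Finland's values, copied from the dataset preview in this document.
explained <- c(
  gdp        = 1.446,
  support    = 1.106,
  health     = 0.741,
  freedom    = 0.691,
  generosity = 0.124,
  corruption = 0.481
)
dystopia_residual <- 3.253
ladder_score <- 7.842

# Sum of the six explained parts plus the Dystopia residual
reconstructed <- sum(explained) + dystopia_residual

# all.equal() compares with a small numeric tolerance
matches <- isTRUE(all.equal(reconstructed, ladder_score))
print(reconstructed)
print(matches)
```

For this row the parts appear to add up to the Ladder score, which is consistent with reading the Dystopia residual as the benchmark score plus the unexplained remainder.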

Import Library

library(tidyverse) 
library(plotly) 
library(scales)
library(hrbrthemes)
library(dplyr)
library(tidyr)
library(reshape2)
library(ggcorrplot)
library(rnaturalearth)
library(rnaturalearthdata)
library(sf)
library(tmap)

Read Data

whr <- read.csv("data/world-happiness-report-2021.csv")
# Check the structure of the data
glimpse(whr)
Rows: 149
Columns: 20
$ ï..Country.name                            <chr> "Finland", "Denmark", "Swit~
$ Regional.indicator                         <chr> "Western Europe", "Western ~
$ Ladder.score                               <dbl> 7.842, 7.620, 7.571, 7.554,~
$ Standard.error.of.ladder.score             <dbl> 0.032, 0.035, 0.036, 0.059,~
$ upperwhisker                               <dbl> 7.904, 7.687, 7.643, 7.670,~
$ lowerwhisker                               <dbl> 7.780, 7.552, 7.500, 7.438,~
$ Logged.GDP.per.capita                      <dbl> 10.775, 10.933, 11.117, 10.~
$ Social.support                             <dbl> 0.954, 0.954, 0.942, 0.983,~
$ Healthy.life.expectancy                    <dbl> 72.000, 72.700, 74.400, 73.~
$ Freedom.to.make.life.choices               <dbl> 0.949, 0.946, 0.919, 0.955,~
$ Generosity                                 <dbl> -0.098, 0.030, 0.025, 0.160~
$ Perceptions.of.corruption                  <dbl> 0.186, 0.179, 0.292, 0.673,~
$ Ladder.score.in.Dystopia                   <dbl> 2.43, 2.43, 2.43, 2.43, 2.4~
$ Explained.by..Log.GDP.per.capita           <dbl> 1.446, 1.502, 1.566, 1.482,~
$ Explained.by..Social.support               <dbl> 1.106, 1.108, 1.079, 1.172,~
$ Explained.by..Healthy.life.expectancy      <dbl> 0.741, 0.763, 0.816, 0.772,~
$ Explained.by..Freedom.to.make.life.choices <dbl> 0.691, 0.686, 0.653, 0.698,~
$ Explained.by..Generosity                   <dbl> 0.124, 0.208, 0.204, 0.293,~
$ Explained.by..Perceptions.of.corruption    <dbl> 0.481, 0.485, 0.413, 0.170,~
$ Dystopia...residual                        <dbl> 3.253, 2.868, 2.839, 2.967,~
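
The `ï..` prefix on the first column name above is what typically appears when a CSV file starts with a UTF-8 byte-order mark and is read without declaring the encoding. The sketch below uses a throwaway file, not the real dataset, to show how read.csv() could be told to skip the mark. The file contents and the names `tmp`, `csv_bytes` and `demo` are illustrative assumptions.

```r
# Write a tiny CSV that starts with a UTF-8 byte-order mark (EF BB BF).
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
csv_bytes <- charToRaw("Country name,Ladder score\nFinland,7.842\n")
writeBin(c(bom, csv_bytes), tmp)

# "UTF-8-BOM" tells R to drop the mark while decoding the file.
demo <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
print(names(demo))

# Clean up the temporary file
unlink(tmp)
```

If the real file were read this way, the first column would come back as Country.name, so the later rename step would need to refer to that name instead.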

Preprocessing Data

Data Cleaning and Wrangling

# Check whether any column contains missing values:
anyNA(whr)
[1] FALSE

There are no missing values in the whr dataset. Next, I rename some columns to make the analysis easier.

whr <- whr %>%
  rename(
    # "ï.." is left over from a byte-order mark at the start of the CSV
    Country = ï..Country.name,
    Region = Regional.indicator,
    Score = Ladder.score,
    GDP.per.capita = Logged.GDP.per.capita,
    Dystopia.Residual = Dystopia...residual
  )

Remove the columns that are not used in the analysis:

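# remove columns by index: 4-6 are the standard error and whiskers,
# 13-19 are the Dystopia ladder score and the "Explained by" columns
whr <- whr[-c(4, 5, 6, 13, 14, 15, 16, 17, 18, 19)]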
names(whr)
 [1] "Country"                      "Region"                      
 [3] "Score"                        "GDP.per.capita"              
 [5] "Social.support"               "Healthy.life.expectancy"     
 [7] "Freedom.to.make.life.choices" "Generosity"                  
 [9] "Perceptions.of.corruption"    "Dystopia.Residual"           
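
Dropping columns by position works here, but it silently removes the wrong columns if the file layout ever changes. One possible alternative is to drop them by name, sketched below. The one-row data frame `toy` is a hypothetical stand-in that carries only a subset of the column names shown above, with Finland's values copied from the preview.

```r
library(dplyr)

# Hypothetical one-row frame with a subset of the original column names.
toy <- data.frame(
  Country = "Finland",
  Score = 7.842,
  Standard.error.of.ladder.score = 0.032,
  upperwhisker = 7.904,
  lowerwhisker = 7.780,
  Ladder.score.in.Dystopia = 2.43,
  Explained.by..Generosity = 0.124,
  Dystopia.Residual = 3.253
)

# Drop the unused columns by name; starts_with() catches every
# "Explained.by.." column without listing each one.
trimmed <- toy %>%
  select(
    -Standard.error.of.ladder.score,
    -upperwhisker,
    -lowerwhisker,
    -Ladder.score.in.Dystopia,
    -starts_with("Explained.by")
  )

print(names(trimmed))
```

Selecting by name keeps the intent readable and fails loudly if an expected column is missing.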

One column still has an inappropriate data type: Region should be converted to a factor:

# convert the Region column to a factor
whr$Region <- as.factor(whr$Region)
glimpse(whr)
Rows: 149
Columns: 10
$ Country                      <chr> "Finland", "Denmark", "Switzerland", "Ice~
$ Region                       <fct> Western Europe, Western Europe, Western E~
$ Score                        <dbl> 7.842, 7.620, 7.571, 7.554, 7.464, 7.392,~
$ GDP.per.capita               <dbl> 10.775, 10.933, 11.117, 10.878, 10.932, 1~
$ Social.support               <dbl> 0.954, 0.954, 0.942, 0.983, 0.942, 0.954,~
$ Healthy.life.expectancy      <dbl> 72.000, 72.700, 74.400, 73.000, 72.400, 7~
$ Freedom.to.make.life.choices <dbl> 0.949, 0.946, 0.919, 0.955, 0.913, 0.960,~
$ Generosity                   <dbl> -0.098, 0.030, 0.025, 0.160, 0.175, 0.093~
$ Perceptions.of.corruption    <dbl> 0.186, 0.179, 0.292, 0.673, 0.338, 0.270,~
$ Dystopia.Residual            <dbl> 3.253, 2.868, 2.839, 2.967, 2.798, 2.580,~

The Region column has now been converted. Next, let's get more information from the data using summary(). The summary() function returns descriptive statistics for the numeric columns and counts for factor columns, which gives quick insight into the data.

summary(whr)
   Country                                         Region       Score      
 Length:149         Sub-Saharan Africa                :36   Min.   :2.523  
 Class :character   Western Europe                    :21   1st Qu.:4.852  
 Mode  :character   Latin America and Caribbean       :20   Median :5.534  
                    Central and Eastern Europe        :17   Mean   :5.533  
                    Middle East and North Africa      :17   3rd Qu.:6.255  
                    Commonwealth of Independent States:12   Max.   :7.842  
                    (Other)                           :26                  
 GDP.per.capita   Social.support   Healthy.life.expectancy
 Min.   : 6.635   Min.   :0.4630   Min.   :48.48          
 1st Qu.: 8.541   1st Qu.:0.7500   1st Qu.:59.80          
 Median : 9.569   Median :0.8320   Median :66.60          
 Mean   : 9.432   Mean   :0.8147   Mean   :64.99          
 3rd Qu.:10.421   3rd Qu.:0.9050   3rd Qu.:69.60          
 Max.   :11.647   Max.   :0.9830   Max.   :76.95          
                                                          
 Freedom.to.make.life.choices   Generosity       Perceptions.of.corruption
 Min.   :0.3820               Min.   :-0.28800   Min.   :0.0820           
 1st Qu.:0.7180               1st Qu.:-0.12600   1st Qu.:0.6670           
 Median :0.8040               Median :-0.03600   Median :0.7810           
 Mean   :0.7916               Mean   :-0.01513   Mean   :0.7274           
 3rd Qu.:0.8770               3rd Qu.: 0.07900   3rd Qu.:0.8450           
 Max.   :0.9700               Max.   : 0.54200   Max.   :0.9390           
                                                                          
 Dystopia.Residual
 Min.   :0.648    
 1st Qu.:2.138    
 Median :2.509    
 Mean   :2.430    
 3rd Qu.:2.794    
 Max.   :3.482    
                  

Interpretation:

  1. Country has length 149: the dataset covers 149 countries, one per row.
  2. Region: Sub-Saharan Africa has the most countries (36). summary() lists only the six largest regions, down to the Commonwealth of Independent States (12); the remaining regions are grouped under (Other), 26 countries in total.
  3. The happiness score ranges from 2.523 to 7.842, with a mean of about 5.533.
  4. Logged GDP per capita ranges from 6.635 to 11.647, with a mean of 9.432.
  5. Social.support ranges from 0.4630 to 0.9830.
  6. Healthy.life.expectancy ranges from 48.48 to 76.95.
  7. Freedom.to.make.life.choices ranges from 0.3820 to 0.9700.
  8. Generosity is interesting because it can be negative (it is a regression residual): it ranges from -0.28800 to 0.54200.
  9. Perceptions.of.corruption ranges from 0.0820 to 0.9390.
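
Because summary() on a data frame folds the smaller factor levels into (Other), a separate count is one way to see every region. The sketch below uses a hypothetical six-row data frame with placeholder region labels; on the real data the same call would be applied to whr.

```r
library(dplyr)

# Hypothetical mini data set; country and region labels are placeholders.
toy <- data.frame(
  Country = paste0("Country", 1:6),
  Region  = factor(c("A", "A", "A", "B", "B", "C"))
)

# One row per region, largest group first; no level is hidden in "(Other)".
region_counts <- count(toy, Region, sort = TRUE)
print(region_counts)
```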

Processing Data and Plotting

# Create a manual color palette for plotting:
pallete <- c("#5bdcdc", "#7a0651", "#06517a",
             "#808080", "#bc8f8f", "#8fa6bc", 
             "#bc8fbc", "#8fbcbc","#e3aaa3","#9695a4")

Check how many countries are in each region:

# Count the countries in each region:
ggplot(whr, aes(y = Region,fill = Region))+
  geom_bar()+
  scale_fill_manual(values = pallete)+
  theme_ipsum()

Correlation

First, check the correlation between the happiness score and the six factors: GDP.per.capita, Social.support, Healthy.life.expectancy, Freedom.to.make.life.choices, Generosity, and Perceptions.of.corruption. From the visualizations we can find out which factor is the strongest.

# plot correlation Happiness Score & GDP per capita
gdp_plot <- ggplot(data = whr, aes(x = GDP.per.capita, y = Score)) +
  geom_point(colour = "gray", shape = 21, size =4, aes(fill = Region)) +
  scale_fill_brewer(type = "seq", palette = "Spectral") +
  theme_minimal() +
  labs(title = "Does GDP bring Happiness?")
gdp_plot

# plot correlation Happiness Score & Social Support
ss_plot <- ggplot(data = whr, aes(x = Social.support, y = Score)) +
  geom_point(colour = "gray", shape = 21, size =4, aes(fill = Region)) +
  scale_fill_brewer(type = "seq", palette = "Spectral") +
  theme_minimal() +
  labs(title = "Does Social Support bring Happiness?")
ss_plot

# plot correlation Happiness Score & Healthy.life.expectancy
healthy_plot <- ggplot(data = whr, aes(x = Healthy.life.expectancy, y = Score)) +
  geom_point(colour = "gray", shape = 21, size =4, aes(fill = Region)) +
  scale_fill_brewer(type = "seq", palette = "Spectral") +
  theme_minimal() +
  labs(title = "Does Healthy life bring Happiness?")
healthy_plot

# plot correlation Happiness Score & Freedom 
freedom_plot <- ggplot(data = whr, aes(x = Freedom.to.make.life.choices, y = Score)) +
  geom_point(colour = "gray", shape = 21, size =4, aes(fill = Region)) +
  scale_fill_brewer(type = "seq", palette = "Spectral") +
  theme_minimal() +
  labs(title = "Does Freedom bring Happiness?")
freedom_plot

# plot correlation Happiness Score & Generosity
generosity_plot <- ggplot(data = whr, aes(x = Generosity, y = Score)) +
  geom_point(colour = "gray", shape = 21, size =4, aes(fill = Region)) +
  scale_fill_brewer(type = "seq", palette = "Spectral") +
  theme_minimal() +
  labs(title = "Does Generosity bring Happiness?")
generosity_plot

# plot correlation Happiness Score & Corruption
corruption_plot <- ggplot(data = whr, aes(x = Perceptions.of.corruption, y = Score)) +
  geom_point(colour = "gray", shape = 21, size =4, aes(fill = Region)) +
  scale_fill_brewer(type = "seq", palette = "Spectral") +
  theme_minimal() +
  labs(title = "Does Corruption bring Happiness?")

corruption_plot
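
The six scatter plots above differ only in the x variable and the title, so the repeated code could be folded into a small helper. The sketch below is one way to do that; `score_scatter` is a hypothetical name, not part of the original analysis, and the three-row data frame only mirrors the first rows of the dataset preview.

```r
library(ggplot2)

# Hypothetical helper: scatter plot of Score against one chosen column.
# xvar is the column name as a string; .data[[xvar]] looks it up in the data.
score_scatter <- function(data, xvar, title) {
  ggplot(data, aes(x = .data[[xvar]], y = Score)) +
    geom_point(aes(fill = Region), colour = "gray", shape = 21, size = 4) +
    scale_fill_brewer(palette = "Spectral") +
    theme_minimal() +
    labs(title = title, x = xvar)
}

# Three rows copied from the dataset preview in this document.
toy <- data.frame(
  Score = c(7.842, 7.620, 7.571),
  GDP.per.capita = c(10.775, 10.933, 11.117),
  Region = "Western Europe"
)

p <- score_scatter(toy, "GDP.per.capita", "Does GDP bring Happiness?")
```

With the real data, each of the six plots would then be a single call such as score_scatter(whr, "Social.support", "Does Social Support bring Happiness?").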

# Select the numeric columns and store them in corr_data
corr_data <-  whr %>% 
  select(Score,GDP.per.capita,Social.support,Healthy.life.expectancy,Freedom.to.make.life.choices,Generosity,Perceptions.of.corruption)
corr_data
   Score GDP.per.capita Social.support Healthy.life.expectancy
1  7.842         10.775          0.954                    72.0
2  7.620         10.933          0.954                    72.7
3  7.571         11.117          0.942                    74.4
4  7.554         10.878          0.983                    73.0
5  7.464         10.932          0.942                    72.4
6  7.392         11.053          0.954                    73.3
7  7.363         10.867          0.934                    72.7
8  7.324         11.647          0.908                    72.6
9  7.277         10.643          0.948                    73.4
10 7.268         10.906          0.934                    73.3
   Freedom.to.make.life.choices Generosity Perceptions.of.corruption
1                         0.949     -0.098                     0.186
2                         0.946      0.030                     0.179
3                         0.919      0.025                     0.292
4                         0.955      0.160                     0.673
5                         0.913      0.175                     0.338
6                         0.960      0.093                     0.270
7                         0.945      0.086                     0.237
8                         0.907     -0.034                     0.386
9                         0.929      0.134                     0.242
10                        0.908      0.042                     0.481
 [ reached 'max' / getOption("max.print") -- omitted 139 rows ]
# convert the data to a matrix
corr_data <- as.matrix(corr_data)
# compute the correlation matrix
corr_data <- cor(corr_data)
# plot the correlation matrix
corrplot <- ggcorrplot(corr_data, method = "circle", lab = TRUE, lab_size = 3,
                       colors = c("brown", "pink", "blue")) +
  ggtitle("Correlation Graph between variables")
corrplot

Interpretation:

From the output above we can see which variables are strongly correlated with the happiness score:

  • The happiness score is strongly correlated with GDP per capita, Healthy life expectancy, and Social support.
  • GDP per capita is strongly correlated with Social support and Healthy life expectancy.
  • The happiness score and Freedom to make life choices are moderately correlated.
  • The happiness score is only weakly correlated with Generosity and Perceptions of corruption, and both correlations are negative.
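
Reading strengths off a correlation plot can be backed up by extracting the Score column of the correlation matrix and ranking it. The sketch below does this on a hypothetical toy data frame whose columns A, B and C are placeholders; on the real data the same steps would apply to the matrix returned by cor().

```r
# Hypothetical toy data; columns other than Score are placeholders.
toy <- data.frame(
  Score = c(1, 2, 3, 4, 5),
  A     = c(1.1, 2.0, 2.9, 4.2, 5.0),
  B     = c(5, 3, 4, 1, 2),
  C     = c(2, 1, 3, 1, 2)
)

corr_matrix <- cor(toy)

# Take the Score column and drop Score's correlation with itself
with_score <- corr_matrix[rownames(corr_matrix) != "Score", "Score"]

# Order by absolute strength, strongest first; the sign is kept
ranked <- with_score[order(abs(with_score), decreasing = TRUE)]
print(ranked)
```

Ranking by absolute value keeps strong negative correlations near the top instead of burying them at the bottom of the list.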

Distribution

Boxplot

I use a boxplot to visualize “Social support”, “Freedom to make life choices”, “Generosity”, and “Perceptions of corruption”. I chose these four columns because their values fall in a similar range (roughly between -0.3 and 1), and a boxplot makes outliers easier to see.

# Select the four columns and store them in the object dist
dist <- whr %>% 
  select(Social.support,Freedom.to.make.life.choices,Generosity,Perceptions.of.corruption)
dist
   Social.support Freedom.to.make.life.choices Generosity
1           0.954                        0.949     -0.098
2           0.954                        0.946      0.030
3           0.942                        0.919      0.025
4           0.983                        0.955      0.160
5           0.942                        0.913      0.175
6           0.954                        0.960      0.093
7           0.934                        0.945      0.086
8           0.908                        0.907     -0.034
9           0.948                        0.929      0.134
10          0.934                        0.908      0.042
11          0.940                        0.914      0.159
12          0.939                        0.800      0.031
13          0.903                        0.875      0.011
14          0.926                        0.915      0.089
15          0.947                        0.879      0.077
16          0.891                        0.934     -0.126
17          0.934                        0.859      0.233
18          0.947                        0.858     -0.208
   Perceptions.of.corruption
1                      0.186
2                      0.179
3                      0.292
4                      0.673
5                      0.338
6                      0.270
7                      0.237
8                      0.386
9                      0.242
10                     0.481
11                     0.442
12                     0.753
13                     0.460
14                     0.415
15                     0.363
16                     0.809
17                     0.459
18                     0.868
 [ reached 'max' / getOption("max.print") -- omitted 131 rows ]
  • Before plotting the boxplot, the data must be transformed from wide to long with pivot_longer(), as below:
# transform the data from wide to long:
dist <- pivot_longer(data = dist,
             cols = everything(),  # all four selected columns
             names_to = "variabel",
             values_to = "value")
dist
# A tibble: 596 x 2
   variabel                      value
   <chr>                         <dbl>
 1 Social.support                0.954
 2 Freedom.to.make.life.choices  0.949
 3 Generosity                   -0.098
 4 Perceptions.of.corruption     0.186
 5 Social.support                0.954
 6 Freedom.to.make.life.choices  0.946
 7 Generosity                    0.03 
 8 Perceptions.of.corruption     0.179
 9 Social.support                0.942
10 Freedom.to.make.life.choices  0.919
# ... with 586 more rows
# Plotting the data 
ggplot(dist) +
      geom_boxplot(aes(x=variabel, y=value, fill=variabel))+
  scale_fill_manual(values = pallete)+
  coord_flip()+
  theme_ipsum()

  • Some outliers are noticeable, mainly in Perceptions of corruption and Generosity. “Perceptions of corruption” has more of them than the other variables, meaning that only a few countries sit far from the majority on this indicator.
  • Generosity has a slightly different range from the other three variables.
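
To list the outlying values rather than only see them as points, the usual boxplot rule can be applied directly: a value is flagged when it lies more than 1.5 times the interquartile range beyond the quartiles. The sketch below assumes that rule; `is_outlier` is a hypothetical helper and the numbers are made up for illustration. geom_boxplot() computes its quartiles slightly differently, so borderline points may not match exactly.

```r
# Flag values more than 1.5 * IQR below the first or above the third quartile.
is_outlier <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

# Hypothetical values for illustration only
x <- c(0.70, 0.75, 0.80, 0.82, 0.85, 0.90, 0.10)
flagged <- x[is_outlier(x)]
print(flagged)
```

On the real data the same helper could be applied per variable, for example to whr$Perceptions.of.corruption, to see which countries produce the outlying points.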

# distribution between happiness Score & GDP per capita
dist2 <- whr %>% 
  select(Score,GDP.per.capita)
dist2
   Score GDP.per.capita
1  7.842         10.775
2  7.620         10.933
3  7.571         11.117
4  7.554         10.878
5  7.464         10.932
6  7.392         11.053
7  7.363         10.867
8  7.324         11.647
9  7.277         10.643
10 7.268         10.906
11 7.183         10.796
12 7.157         10.575
13 7.155         10.873
14 7.103         10.776
15 7.085         11.342
16 7.069          9.880
17 7.064         10.707
18 6.965         10.556
19 6.951         11.023
20 6.834         10.823
21 6.690         10.704
22 6.647         10.669
23 6.602         10.674
24 6.584         10.871
25 6.561         11.085
26 6.494         10.743
27 6.491         10.571
28 6.483         10.623
29 6.461         10.529
30 6.435          9.053
31 6.431          9.966
32 6.377         11.488
33 6.372          9.318
34 6.331         10.369
35 6.330          9.577
36 6.317          9.859
37 6.309          9.186
 [ reached 'max' / getOption("max.print") -- omitted 112 rows ]
 dist2 <-  pivot_longer(data = dist2, 
             cols =Score:GDP.per.capita, 
             names_to = "variabel", 
             values_to = "value")
 dist2
# A tibble: 298 x 2
   variabel       value
   <chr>          <dbl>
 1 Score           7.84
 2 GDP.per.capita 10.8 
 3 Score           7.62
 4 GDP.per.capita 10.9 
 5 Score           7.57
 6 GDP.per.capita 11.1 
 7 Score           7.55
 8 GDP.per.capita 10.9 
 9 Score           7.46
10 GDP.per.capita 10.9 
# ... with 288 more rows
# Plotting the data 
ggplot(dist2) +
      geom_boxplot(aes(x=variabel, y=value, fill=variabel))+
  scale_fill_manual(values = pallete)+
  theme_ipsum()

  • Neither variable has outliers, and GDP per capita has the higher mean, about 9.4.

dist3 <- whr %>% 
  select(Healthy.life.expectancy)
dist3
   Healthy.life.expectancy
1                   72.000
2                   72.700
3                   74.400
4                   73.000
5                   72.400
6                   73.300
7                   72.700
8                   72.600
9                   73.400
10                  73.300
11                  73.900
12                  73.503
13                  72.500
14                  73.800
15                  72.400
16                  71.400
17                  72.500
18                  70.807
19                  68.200
20                  72.199
21                  74.000
22                  69.495
23                  72.200
24                  69.600
25                  67.333
26                  66.603
27                  74.700
28                  73.800
29                  71.400
30                  64.958
31                  69.100
32                  76.953
33                  63.813
34                  69.201
35                  66.601
36                  68.597
37                  67.500
38                  67.906
39                  73.898
40                  68.800
41                  69.652
42                  65.255
43                  70.000
44                  69.702
45                  65.200
46                  67.355
47                  66.900
48                  68.600
49                  66.402
50                  66.701
51                  67.100
52                  68.001
53                  68.000
54                  67.401
55                  67.657
56                  75.100
57                  69.000
58                  72.600
59                  67.300
60                  70.799
61                  62.000
62                  73.900
63                  68.250
64                  68.098
65                  65.699
66                  68.800
67                  64.401
68                  72.600
69                  63.901
70                  62.500
71                  65.900
72                  68.699
73                  66.102
74                  73.898
75                  66.253
 [ reached 'max' / getOption("max.print") -- omitted 74 rows ]
dist3 <-  pivot_longer(data = dist3, 
             cols =Healthy.life.expectancy, 
             names_to = "variabel", 
             values_to = "value")
 dist3
# A tibble: 149 x 2
   variabel                value
   <chr>                   <dbl>
 1 Healthy.life.expectancy  72  
 2 Healthy.life.expectancy  72.7
 3 Healthy.life.expectancy  74.4
 4 Healthy.life.expectancy  73  
 5 Healthy.life.expectancy  72.4
 6 Healthy.life.expectancy  73.3
 7 Healthy.life.expectancy  72.7
 8 Healthy.life.expectancy  72.6
 9 Healthy.life.expectancy  73.4
10 Healthy.life.expectancy  73.3
# ... with 139 more rows
# Plotting the data 
ggplot(dist3) +
      geom_boxplot(aes(x=variabel, y=value, fill=variabel))+
  scale_fill_manual(values = pallete)+
  theme_ipsum()

  • I plotted this boxplot separately because the values of Healthy life expectancy are much larger than those of the other variables. The output shows no outliers in Healthy life expectancy either, and the median is about 66.6.

Density

I use a density plot to show the distribution of the happiness score by the geographical region of the countries.

ggplot(whr, aes(Score, fill = Region)) +
  scale_fill_manual(values = pallete)+
  geom_density() +
  xlim(1, 9)+
  labs(title = "Happiness Score Distribution by Region")+
  theme_ipsum()+
  theme(plot.title = element_text(size = 14))

  • According to the density plot, the happiest region is Western Europe: its happiness scores range from about 5 to about 7.8, with most of the density at the high end. It is followed by North America and ANZ.
  • The plot also shows that the unhappiest region is South Asia, followed by Sub-Saharan Africa.
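
Overlapping density curves can be hard to compare by eye, so a per-region summary table is one way to support the ranking. The sketch below groups a hypothetical toy data frame by region and orders the regions by median score; the region labels and scores are placeholders, not values from the dataset.

```r
library(dplyr)

# Hypothetical scores; region names are placeholders.
toy <- data.frame(
  Region = c("A", "A", "B", "B", "B"),
  Score  = c(7, 8, 3, 5, 4)
)

# Size, median and range of the score per region, highest median first
region_summary <- toy %>%
  group_by(Region) %>%
  summarise(
    n = n(),
    median_score = median(Score),
    min_score = min(Score),
    max_score = max(Score)
  ) %>%
  arrange(desc(median_score))

print(region_summary)
```

The group size matters when reading the result: a region with only a handful of countries can rank high or low on very little data.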

Top 10 Happiest & Unhappiest Countries

# Subset top 10 happiest countries
happiest <- whr %>% 
  select(Country,Region,Score) %>% 
  arrange(desc(Score)) %>% 
  head(10)
# Plotting top 10 Happiest Countries
top10_happiest <- happiest %>% 
  ggplot(aes(x = Score, 
             y = reorder(Country, Score)))+
  geom_col(fill = "#993366") +
  labs(title = "Top 10 Happiest Countries",
       x = "Score",
       y = NULL) +
  theme_ipsum() +
  theme(axis.title.x = element_text(size = 8 ))

top10_happiest

  • From the output we find that the happiest country is Finland, followed by Denmark and Switzerland.

# Subset unhappiest countries data :
unhappiest <- whr %>% 
  select(Country,Region, Score) %>% 
  arrange(Score) %>% 
  head(10)
# Plotting top 10 Unhappiest Countries
top10_unhappiest <- unhappiest %>% 
  ggplot(aes(x = Score,  
             y = reorder(Country, Score))) +
  geom_col(fill = "#d396a3") +
  labs(title = "Top 10 Unhappiest Countries",
       x = "Score",
       y = NULL) +
  theme_ipsum() +
  theme(axis.title.x = element_text(size = 10 ))

top10_unhappiest

  • From the output we find that the unhappiest country is Afghanistan, followed by Zimbabwe and Rwanda.
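
The arrange() plus head() pattern used for both rankings can also be written with dplyr's slice_max() and slice_min(), which state the intent in a single call. The sketch below is a possible alternative, shown on three rows copied from the dataset preview and entered out of order on purpose.

```r
library(dplyr)

# Three rows copied from the dataset preview in this document.
toy <- data.frame(
  Country = c("Denmark", "Switzerland", "Finland"),
  Score   = c(7.620, 7.571, 7.842)
)

# Highest-scoring rows, returned in descending order of Score
top2 <- slice_max(toy, Score, n = 2)

# Lowest-scoring row
bottom1 <- slice_min(toy, Score, n = 1)

print(top2)
print(bottom1)
```

By default both functions keep tied rows, so they may return more than n rows when scores are equal; with_ties = FALSE restores a strict cut-off.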

Map Visualisation

The tmap package is used to create the map below. In this visualization, I simply map the happiness score of each country. With the help of the tm_text() function, the respective country names are drawn over the map.

# create data for map location
maplocation <-
  read_sf("Longitude_Graticules_and_World_Countries_Boundaries-shp/99bfd9e7-bb42-4728-87b5-07f8c8ac631c2020328-1-1vef4ev.lu5nk.shp")
str(maplocation)
sf [251 x 3] (S3: sf/tbl_df/tbl/data.frame)
 $ OBJECTID  : int [1:251] 1 2 3 4 5 6 7 8 9 10 ...
 $ CNTRY_NAME: chr [1:251] "Aruba" "Antigua and Barbuda" "Afghanistan" "Algeria" ...
 $ geometry  :sfc_MULTIPOLYGON of length 251; first list element: List of 1
  ..$ :List of 1
  .. ..$ : num [1:11, 1:2] -69.9 -69.9 -70.1 -70.1 -70 ...
  ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA
  ..- attr(*, "names")= chr [1:2] "OBJECTID" "CNTRY_NAME"
# join the map locations with the dataset
combined_dataset <- left_join(maplocation,whr , 
                              by = c("CNTRY_NAME" = "Country"))

names(combined_dataset)
 [1] "OBJECTID"                     "CNTRY_NAME"                  
 [3] "geometry"                     "Region"                      
 [5] "Score"                        "GDP.per.capita"              
 [7] "Social.support"               "Healthy.life.expectancy"     
 [9] "Freedom.to.make.life.choices" "Generosity"                  
[11] "Perceptions.of.corruption"    "Dystopia.Residual"           
# drop rows without a Score and keep selected columns
combined_dataset <- combined_dataset[!is.na(combined_dataset$Score),] %>%
  select(CNTRY_NAME,Region,Score,GDP.per.capita,Social.support,
         Healthy.life.expectancy,Freedom.to.make.life.choices,
         Generosity,Perceptions.of.corruption)
# plotting map 
tmap_mode("view")

tm_shape(combined_dataset) +
  tm_fill("Score",
           style = "quantile", 
           palette = "Greens") +
  tm_borders(alpha = 0.5) +
  tm_text("CNTRY_NAME", size="CNTRY_NAME")+
  tm_style("classic")  # layer form of the style; tmap_style() only sets a global option
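
A join on country names drops any country whose spelling differs between the two sources, and filtering out rows without a Score hides this. One way to see what would be lost is an anti_join() before the real join, sketched below with plain data frames. The map-side names are the first entries shown in the str() output above, while "Examplestan" is a made-up name standing in for a country that is spelled differently in the shapefile.

```r
library(dplyr)

# Names on the map side (first entries of CNTRY_NAME shown above)
map_names <- data.frame(
  CNTRY_NAME = c("Aruba", "Antigua and Barbuda", "Afghanistan", "Algeria")
)

# Names on the happiness side; "Examplestan" is a made-up non-matching name
scores <- data.frame(
  Country = c("Afghanistan", "Algeria", "Examplestan")
)

# Rows of scores that have no partner on the map side
unmatched <- anti_join(scores, map_names, by = c("Country" = "CNTRY_NAME"))
print(unmatched)
```

Any names returned this way could be recoded on one side before the join so that those countries still appear on the map.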

Conclusion

  • The happiness score is strongly correlated with GDP per capita, Healthy life expectancy, and Social support. High values in these categories point to a high overall happiness score.
  • The interesting variables are Generosity and Perceptions of corruption: these two have the weakest correlation with the happiness score in this analysis, which suggests that generosity does not affect happiness much.
  • The happiest countries tend to have longer healthy life expectancy and higher GDP per capita, and most of them are in Western Europe.
  • Finland is the happiest country, with the highest score reported, followed by Denmark and Switzerland.
  • Meanwhile, Afghanistan is the unhappiest country, with the lowest score reported.
  • Most Sub-Saharan African countries have a lower level of happiness.
  • From this analysis we can interpret what makes countries and their citizens happier, which allows us to focus on prioritizing and improving these aspects in each nation.