Part 1 - Introduction

The World Happiness Report is an annual publication of the United Nations Sustainable Development Solutions Network. It contains articles, and rankings of national happiness based on respondent ratings of their own lives, which the report also correlates with various life factors. Each year, ~1,000 individuals are sampled from each of more than 150 countries.

It is fair to say that few people are happy during the current COVID pandemic. This analysis may nonetheless help in thinking about how to shape a post-COVID world, or at least in looking at the brighter side of life.

The research questions I hope to gain some insight into are:

  1. Do countries ranked in the top 10 for happiness also have a higher life expectancy (years)?
  2. Does being generous make people happy?

We will check the above for the calendar year 2019.

Part 2 - Data

  • Data collection:

    The data was downloaded from Kaggle; the original source of the data is World Bank Data. The two datasets are:

    1. Life Expectancy (WHO) data is found here.
    2. World Happiness Report data is found here.
  • Cases:

  1. Each case in the happiness dataset is a country (or region) and records its rank, its overall score and the scores of the contributing factors. There are 156 cases in each of the 2018 and 2019 datasets.

  2. Each case in the life expectancy dataset is a country and year, and records the life expectancy for that year. There are 19,028 cases in the dataset.

  • Variables:
  1. Dependent variable: the response variable is Score, which is quantitative (numerical).
  2. Independent variables: Life expectancy (years) from the life expectancy dataset and Generosity from the happiness dataset; both are quantitative (numerical).
  • Type of study:

    This is an observational study. We are looking at data for the years 2018 and 2019 for countries around the world.

## [1] 19028     4
Entity       Code  Year  Life expectancy (years)
Afghanistan  AFG   1950  27.638
Afghanistan  AFG   1951  27.878
Afghanistan  AFG   1952  28.361
Afghanistan  AFG   1953  28.852
Afghanistan  AFG   1954  29.350
Afghanistan  AFG   1955  29.854
Afghanistan  AFG   1956  30.365
Afghanistan  AFG   1957  30.882
Afghanistan  AFG   1958  31.403
Afghanistan  AFG   1959  31.925
## [1] 156   9
Overall rank  Country or region  Score  GDP per capita  Social support  Healthy life expectancy  Freedom to make life choices  Generosity  Perceptions of corruption
1             Finland            7.632  1.305           1.592           0.874                    0.681                         0.202       0.393
2             Norway             7.594  1.456           1.582           0.861                    0.686                         0.286       0.340
3             Denmark            7.555  1.351           1.590           0.868                    0.683                         0.284       0.408
4             Iceland            7.495  1.343           1.644           0.914                    0.677                         0.353       0.138
5             Switzerland        7.487  1.420           1.549           0.927                    0.660                         0.256       0.357
6             Netherlands        7.441  1.361           1.488           0.878                    0.638                         0.333       0.295
7             Canada             7.328  1.330           1.532           0.896                    0.653                         0.321       0.291
8             New Zealand        7.324  1.268           1.601           0.876                    0.669                         0.365       0.389
9             Sweden             7.314  1.355           1.501           0.913                    0.659                         0.285       0.383
10            Australia          7.272  1.340           1.573           0.910                    0.647                         0.361       0.302
## [1] 156   9
Overall rank  Country or region  Score  GDP per capita  Social support  Healthy life expectancy  Freedom to make life choices  Generosity  Perceptions of corruption
1             Finland            7.769  1.340           1.587           0.986                    0.596                         0.153       0.393
2             Denmark            7.600  1.383           1.573           0.996                    0.592                         0.252       0.410
3             Norway             7.554  1.488           1.582           1.028                    0.603                         0.271       0.341
4             Iceland            7.494  1.380           1.624           1.026                    0.591                         0.354       0.118
5             Netherlands        7.488  1.396           1.522           0.999                    0.557                         0.322       0.298
6             Switzerland        7.480  1.452           1.526           1.052                    0.572                         0.263       0.343
7             Sweden             7.343  1.387           1.487           1.009                    0.574                         0.267       0.373
8             New Zealand        7.307  1.303           1.557           1.026                    0.585                         0.330       0.380
9             Canada             7.278  1.365           1.505           1.039                    0.584                         0.285       0.308
10            Austria            7.246  1.376           1.475           1.016                    0.532                         0.244       0.226
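
The dimensions and first rows shown above could be produced along the following lines. This is only a sketch: the original report presumably reads the CSV files downloaded from Kaggle, whose file names are not given here, so three rows of the 2019 table are inlined as text to keep the example self-contained. The data frame name `happ.2019` follows the name used later in the regression call.

```r
# Sketch only: three rows of the 2019 happiness table, inlined as CSV text
# (an assumption made so the example runs without the Kaggle files).
csv_text <- 'Overall rank,Country or region,Score,GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption
1,Finland,7.769,1.340,1.587,0.986,0.596,0.153,0.393
2,Denmark,7.600,1.383,1.573,0.996,0.592,0.252,0.410
3,Norway,7.554,1.488,1.582,1.028,0.603,0.271,0.341'

# check.names = FALSE keeps the column names with spaces, as in the output above
happ.2019 <- read.csv(text = csv_text, check.names = FALSE,
                      stringsAsFactors = FALSE)

dim(happ.2019)       # number of rows and columns
head(happ.2019, 10)  # first rows (at most 10)
```

With the real file, the `text = csv_text` argument would be replaced by the path to the downloaded CSV.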

Part 3 - Exploratory data analysis

Six key variables contribute to each country’s happiness score: GDP per capita, Social support, Healthy life expectancy, Freedom to make life choices, Generosity, and Perceptions of corruption.

  • Summary statistics for each data frame are shown below:
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 19028 obs. of  4 variables:
##  $ Entity                 : chr  "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ Code                   : chr  "AFG" "AFG" "AFG" "AFG" ...
##  $ Year                   : num  1950 1951 1952 1953 1954 ...
##  $ Life expectancy (years): num  27.6 27.9 28.4 28.9 29.4 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Entity = col_character(),
##   ..   Code = col_character(),
##   ..   Year = col_double(),
##   ..   `Life expectancy (years)` = col_double()
##   .. )
##     Entity              Code                Year      Life expectancy (years)
##  Length:19028       Length:19028       Min.   :1543   Min.   :17.76          
##  Class :character   Class :character   1st Qu.:1961   1st Qu.:52.31          
##  Mode  :character   Mode  :character   Median :1980   Median :64.71          
##                                        Mean   :1975   Mean   :61.75          
##                                        3rd Qu.:2000   3rd Qu.:71.98          
##                                        Max.   :2019   Max.   :86.75
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 156 obs. of  9 variables:
##  $ Overall rank                : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Country or region           : chr  "Finland" "Norway" "Denmark" "Iceland" ...
##  $ Score                       : num  7.63 7.59 7.55 7.5 7.49 ...
##  $ GDP per capita              : num  1.3 1.46 1.35 1.34 1.42 ...
##  $ Social support              : num  1.59 1.58 1.59 1.64 1.55 ...
##  $ Healthy life expectancy     : num  0.874 0.861 0.868 0.914 0.927 0.878 0.896 0.876 0.913 0.91 ...
##  $ Freedom to make life choices: num  0.681 0.686 0.683 0.677 0.66 0.638 0.653 0.669 0.659 0.647 ...
##  $ Generosity                  : num  0.202 0.286 0.284 0.353 0.256 0.333 0.321 0.365 0.285 0.361 ...
##  $ Perceptions of corruption   : num  0.393 0.34 0.408 0.138 0.357 0.295 0.291 0.389 0.383 0.302 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   `Overall rank` = col_double(),
##   ..   `Country or region` = col_character(),
##   ..   Score = col_double(),
##   ..   `GDP per capita` = col_double(),
##   ..   `Social support` = col_double(),
##   ..   `Healthy life expectancy` = col_double(),
##   ..   `Freedom to make life choices` = col_double(),
##   ..   Generosity = col_double(),
##   ..   `Perceptions of corruption` = col_double()
##   .. )
##   Overall rank    Country or region      Score       GDP per capita  
##  Min.   :  1.00   Length:156         Min.   :2.905   Min.   :0.0000  
##  1st Qu.: 39.75   Class :character   1st Qu.:4.454   1st Qu.:0.6162  
##  Median : 78.50   Mode  :character   Median :5.378   Median :0.9495  
##  Mean   : 78.50                      Mean   :5.376   Mean   :0.8914  
##  3rd Qu.:117.25                      3rd Qu.:6.168   3rd Qu.:1.1978  
##  Max.   :156.00                      Max.   :7.632   Max.   :2.0960  
##                                                                      
##  Social support  Healthy life expectancy Freedom to make life choices
##  Min.   :0.000   Min.   :0.0000          Min.   :0.0000              
##  1st Qu.:1.067   1st Qu.:0.4223          1st Qu.:0.3560              
##  Median :1.255   Median :0.6440          Median :0.4870              
##  Mean   :1.213   Mean   :0.5973          Mean   :0.4545              
##  3rd Qu.:1.463   3rd Qu.:0.7772          3rd Qu.:0.5785              
##  Max.   :1.644   Max.   :1.0300          Max.   :0.7240              
##                                                                      
##    Generosity     Perceptions of corruption
##  Min.   :0.0000   Min.   :0.000            
##  1st Qu.:0.1095   1st Qu.:0.051            
##  Median :0.1740   Median :0.082            
##  Mean   :0.1810   Mean   :0.112            
##  3rd Qu.:0.2390   3rd Qu.:0.137            
##  Max.   :0.5980   Max.   :0.457            
##                   NA's   :1
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 156 obs. of  9 variables:
##  $ Overall rank                : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Country or region           : chr  "Finland" "Denmark" "Norway" "Iceland" ...
##  $ Score                       : num  7.77 7.6 7.55 7.49 7.49 ...
##  $ GDP per capita              : num  1.34 1.38 1.49 1.38 1.4 ...
##  $ Social support              : num  1.59 1.57 1.58 1.62 1.52 ...
##  $ Healthy life expectancy     : num  0.986 0.996 1.028 1.026 0.999 ...
##  $ Freedom to make life choices: num  0.596 0.592 0.603 0.591 0.557 0.572 0.574 0.585 0.584 0.532 ...
##  $ Generosity                  : num  0.153 0.252 0.271 0.354 0.322 0.263 0.267 0.33 0.285 0.244 ...
##  $ Perceptions of corruption   : num  0.393 0.41 0.341 0.118 0.298 0.343 0.373 0.38 0.308 0.226 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   `Overall rank` = col_double(),
##   ..   `Country or region` = col_character(),
##   ..   Score = col_double(),
##   ..   `GDP per capita` = col_double(),
##   ..   `Social support` = col_double(),
##   ..   `Healthy life expectancy` = col_double(),
##   ..   `Freedom to make life choices` = col_double(),
##   ..   Generosity = col_double(),
##   ..   `Perceptions of corruption` = col_double()
##   .. )
##   Overall rank    Country or region      Score       GDP per capita  
##  Min.   :  1.00   Length:156         Min.   :2.853   Min.   :0.0000  
##  1st Qu.: 39.75   Class :character   1st Qu.:4.545   1st Qu.:0.6028  
##  Median : 78.50   Mode  :character   Median :5.380   Median :0.9600  
##  Mean   : 78.50                      Mean   :5.407   Mean   :0.9051  
##  3rd Qu.:117.25                      3rd Qu.:6.184   3rd Qu.:1.2325  
##  Max.   :156.00                      Max.   :7.769   Max.   :1.6840  
##  Social support  Healthy life expectancy Freedom to make life choices
##  Min.   :0.000   Min.   :0.0000          Min.   :0.0000              
##  1st Qu.:1.056   1st Qu.:0.5477          1st Qu.:0.3080              
##  Median :1.272   Median :0.7890          Median :0.4170              
##  Mean   :1.209   Mean   :0.7252          Mean   :0.3926              
##  3rd Qu.:1.452   3rd Qu.:0.8818          3rd Qu.:0.5072              
##  Max.   :1.624   Max.   :1.1410          Max.   :0.6310              
##    Generosity     Perceptions of corruption
##  Min.   :0.0000   Min.   :0.0000           
##  1st Qu.:0.1087   1st Qu.:0.0470           
##  Median :0.1775   Median :0.0855           
##  Mean   :0.1848   Mean   :0.1106           
##  3rd Qu.:0.2482   3rd Qu.:0.1412           
##  Max.   :0.5660   Max.   :0.4530

From the summary of the 2019 happiness data above, the minimum score is 2.853, the maximum is 7.769 and the median is 5.380.

  • Data analysis to explore whether countries with a higher life expectancy (years) are also the happiest (for 2019):
Entity          Code  Year  Life expectancy (years)  Overall rank  Score  GDP per capita  Social support  Healthy life expectancy  Freedom to make life choices  Generosity  Perceptions of corruption
Andorra         AND   2019  83.732                   NA            NA     NA              NA              NA                       NA                            NA          NA
Cayman Islands  CYM   2019  83.924                   NA            NA     NA              NA              NA                       NA                            NA          NA
Hong Kong       HKG   2019  84.857                   76            5.430  1.438           1.277           1.122                    0.440                         0.258       0.287
Japan           JPN   2019  84.629                   58            5.886  1.327           1.419           1.088                    0.445                         0.069       0.140
Macao           MAC   2019  84.244                   NA            NA     NA              NA              NA                       NA                            NA          NA
Monaco          MCO   2019  86.751                   NA            NA     NA              NA              NA                       NA                            NA          NA
San Marino      SMR   2019  84.972                   NA            NA     NA              NA              NA                       NA                            NA          NA
Singapore       SGP   2019  83.620                   34            6.262  1.572           1.463           1.141                    0.556                         0.271       0.453
Spain           ESP   2019  83.565                   30            6.354  1.286           1.484           1.062                    0.362                         0.153       0.079
Switzerland     CHE   2019  83.779                   6             7.480  1.452           1.526           1.052                    0.572                         0.263       0.343
## Warning: Removed 5 rows containing missing values (geom_point).

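=> Of the ten high-life-expectancy countries and territories listed above, only five appear in the happiness dataset: Hong Kong (rank 76), Japan (58), Singapore (34), Spain (30) and Switzerland (6). Only Switzerland is in the top 10 for happiness, so in this data a higher life expectancy does not go together with a higher happiness rank.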
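
The comparison above amounts to a left join of the life expectancy rows for 2019 onto the happiness table, which leaves NA wherever a country has no happiness record. A minimal sketch in base R follows, using a few of the values from the table above. The data frame names (`life`, `happ`) and the shortened column names (`life_expectancy`, `rank`) are illustrative assumptions, not the names used in the original analysis.

```r
# Life expectancy rows for 2019 (values copied from the table above)
life <- data.frame(
  Entity = c("Hong Kong", "Japan", "Monaco", "Singapore", "Spain", "Switzerland"),
  Year = 2019,
  life_expectancy = c(84.857, 84.629, 86.751, 83.620, 83.565, 83.779),
  stringsAsFactors = FALSE
)

# Matching happiness rows for 2019 (Monaco has no happiness record)
happ <- data.frame(
  Country = c("Hong Kong", "Japan", "Singapore", "Spain", "Switzerland"),
  rank = c(76, 58, 34, 30, 6),
  Score = c(5.430, 5.886, 6.262, 6.354, 7.480),
  stringsAsFactors = FALSE
)

# Left join: keep every life expectancy row, fill NA where there is no match
merged <- merge(life, happ, by.x = "Entity", by.y = "Country", all.x = TRUE)
merged <- merged[order(-merged$life_expectancy), ]
merged

# Rows without a happiness record, and entries ranked in the happiness top 10
n_missing <- sum(is.na(merged$Score))
in_top10 <- merged$Entity[!is.na(merged$rank) & merged$rank <= 10]
```

The rows with NA are the ones a plotting function would drop, which is consistent with the warning about removed rows shown above.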

Part 4 - Inference

  • Hypothesis:

    H(Null): There is no association between generosity and the happiness score.
    H(Alternate): There is an association between generosity and the happiness score.

  • Conditions:

    The sample size is greater than 30. The data are approximately unimodal and normally distributed. The samples are random.

    Hence, the conditions for inference seem to be satisfied.

  • ANOVA:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.853   4.545   5.380   5.407   6.184   7.769

## 
## Call:
## lm(formula = happ.2019$Score ~ happ.2019$"Freedom to make life choices" + 
##     happ.2019$Generosity + happ.2019$"Social support" + happ.2019$"GDP per capita" + 
##     happ.2019$"Perceptions of corruption" + happ.2019$"Healthy life expectancy")
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.75304 -0.35306  0.05703  0.36695  1.19059 
## 
## Coefficients:
##                                          Estimate Std. Error t value Pr(>|t|)
## (Intercept)                                1.7952     0.2111   8.505 1.77e-14
## happ.2019$"Freedom to make life choices"   1.4548     0.3753   3.876 0.000159
## happ.2019$Generosity                       0.4898     0.4977   0.984 0.326709
## happ.2019$"Social support"                 1.1242     0.2369   4.745 4.83e-06
## happ.2019$"GDP per capita"                 0.7754     0.2182   3.553 0.000510
## happ.2019$"Perceptions of corruption"      0.9723     0.5424   1.793 0.075053
## happ.2019$"Healthy life expectancy"        1.0781     0.3345   3.223 0.001560
##                                             
## (Intercept)                              ***
## happ.2019$"Freedom to make life choices" ***
## happ.2019$Generosity                        
## happ.2019$"Social support"               ***
## happ.2019$"GDP per capita"               ***
## happ.2019$"Perceptions of corruption"    .  
## happ.2019$"Healthy life expectancy"      ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5335 on 149 degrees of freedom
## Multiple R-squared:  0.7792, Adjusted R-squared:  0.7703 
## F-statistic: 87.62 on 6 and 149 DF,  p-value: < 2.2e-16
## Analysis of Variance Table
## 
## Response: happ.2019$Score
##                                           Df Sum Sq Mean Sq  F value  Pr(>F)
## happ.2019$"Freedom to make life choices"   1 61.686  61.686 216.7138 < 2e-16
## happ.2019$Generosity                       1  1.230   1.230   4.3199 0.03938
## happ.2019$"Social support"                 1 64.819  64.819 227.7193 < 2e-16
## happ.2019$"GDP per capita"                 1 17.774  17.774  62.4442 5.6e-13
## happ.2019$"Perceptions of corruption"      1  1.174   1.174   4.1248 0.04404
## happ.2019$"Healthy life expectancy"        1  2.956   2.956  10.3863 0.00156
## Residuals                                149 42.412   0.285                 
##                                             
## happ.2019$"Freedom to make life choices" ***
## happ.2019$Generosity                     *  
## happ.2019$"Social support"               ***
## happ.2019$"GDP per capita"               ***
## happ.2019$"Perceptions of corruption"    *  
## happ.2019$"Healthy life expectancy"      ** 
## Residuals                                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

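=> From the ANOVA table, the p-value for Generosity is 0.03938, which is less than 0.05 and indicates an association with the response variable. On this basis we reject the null hypothesis in favor of the alternative hypothesis. Note, however, that the ANOVA table for a linear model reports sequential tests, so this p-value adjusts only for Freedom to make life choices, the one term entered before Generosity. In the regression summary above, where every other predictor is accounted for, the Generosity coefficient has a p-value of 0.3267 and is not significant at the 0.05 level.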
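
Because `anova()` on an `lm` fit tests each term after only the terms listed before it, the p-value of a predictor can depend on where it appears in the formula. The sketch below illustrates this on simulated stand-in data (not the happiness data; the variables `x1`, `x2`, `x3` and `y` are hypothetical) and compares the sequential table with the marginal tests from `drop1()`, which adjust each term for all the others.

```r
set.seed(1)
n <- 156

# Simulated stand-in data with correlated predictors (NOT the happiness data)
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)
x3 <- 0.5 * x2 + rnorm(n)
y  <- 1 + x1 + 0.5 * x3 + rnorm(n)
sim <- data.frame(y, x1, x2, x3)

# Same model, predictors entered in a different order
fit_a <- lm(y ~ x1 + x2 + x3, data = sim)
fit_b <- lm(y ~ x3 + x2 + x1, data = sim)

# Sequential tests: each term is adjusted only for the terms before it,
# so the two tables can give different p-values for the same predictor
anova(fit_a)
anova(fit_b)

# Marginal tests: each term is adjusted for all other terms in the model
marginal <- drop1(fit_a, test = "F")
marginal

# For a single-coefficient term the marginal F test matches the t test
p_t <- summary(fit_a)$coefficients["x2", "Pr(>|t|)"]
p_F <- marginal["x2", "Pr(>F)"]
```

Only the last term in a sequential table is adjusted for every other predictor, so only its p-value is guaranteed to match the one in the regression summary.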

  • Diagnostic Plots to Verify that the conditions for this model are reasonable:

=> The residuals are nearly normal: the histogram looks roughly normal, although it is skewed to the left. Based on the diagnostic plots, the conditions for this model appear reasonable, and the variables appear linearly related to the outcome.
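
Diagnostic plots of this kind can be drawn with base R graphics. The sketch below uses a simulated one-predictor model as a stand-in (the data and the names `sim`, `fit` and `res` are assumptions); with the real data, `fit` would be the model fitted on the 2019 happiness data.

```r
set.seed(2)

# Simulated stand-in data (NOT the happiness data)
sim <- data.frame(x = runif(156))
sim$y <- 2 + 1.5 * sim$x + rnorm(156, sd = 0.5)

fit <- lm(y ~ x, data = sim)
res <- residuals(fit)

# Null graphics device so the sketch also runs non-interactively;
# drop this line (and dev.off) to see the plots on screen
pdf(NULL)

# Histogram of residuals: should look roughly symmetric and bell-shaped
hist(res, main = "Histogram of residuals", xlab = "Residual")

# Normal Q-Q plot: points close to the line suggest near-normal residuals
qqnorm(res)
qqline(res)

# Residuals vs fitted values: no pattern suggests linearity and constant spread
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

invisible(dev.off())
```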

Part 5 - Conclusion

We know that the different attributes collectively influence the score. Of these attributes, Generosity shows a statistically significant association with the score in the ANOVA (p = 0.039), although its coefficient is not significant in the full regression model (p = 0.327).

As a next step, it would be useful to look closely at neighboring countries whose scores differ widely, to see what other factors influence happiness, or to compare countries within a continent. It would also be interesting to extend this analysis with a data set from the 2020 COVID pandemic.