What factors contribute to overall happiness around the world?

Saayed Alam
December 12, 2018

Introduction

  • The suicide rate in the United States is the highest it has been in decades.
  • This raises a question: what makes a country happy?
  • The purpose of this project is to determine which factors matter most for living a happier life.
  • The dataset comes from the United Nations and is hosted on Kaggle.

CDC Report

Getting The Data

Observations: 155
Variables: 12
$ Country                       <fct> Norway, Denmark, Iceland, Switze...
$ Happiness.Rank                <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1...
$ Happiness.Score               <dbl> 7.537, 7.522, 7.504, 7.494, 7.46...
$ Whisker.high                  <dbl> 7.594445, 7.581728, 7.622030, 7....
$ Whisker.low                   <dbl> 7.479556, 7.462272, 7.385970, 7....
$ Economy..GDP.per.Capita.      <dbl> 1.616463, 1.482383, 1.480633, 1....
$ Family                        <dbl> 1.533524, 1.551122, 1.610574, 1....
$ Health..Life.Expectancy.      <dbl> 0.7966665, 0.7925655, 0.8335521,...
$ Freedom                       <dbl> 0.6354226, 0.6260067, 0.6271626,...
$ Generosity                    <dbl> 0.36201224, 0.35528049, 0.475540...
$ Trust..Government.Corruption. <dbl> 0.31596383, 0.40077007, 0.153526...
$ Dystopia.Residual             <dbl> 2.277027, 2.313707, 2.322715, 2....

The ranking of the countries is based on seven factors: family, life expectancy, economy, generosity, trust in government, freedom, and dystopia residual. The sum of these seven factors equals the happiness score.
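
The additive relationship can be checked against the first three rows printed above. This is a minimal sketch: the values are copied from the output shown in this deck, while the shortened column names and the `top3` data frame are illustrative choices rather than part of the original analysis.

```r
# Values for the top three countries, copied from the printed output above.
top3 <- data.frame(
  Country           = c("Norway", "Denmark", "Iceland"),
  Happiness.Score   = c(7.537, 7.522, 7.504),
  Economy           = c(1.616463, 1.482383, 1.480633),
  Family            = c(1.533524, 1.551122, 1.610574),
  Life.Expectancy   = c(0.7966665, 0.7925655, 0.8335521),
  Freedom           = c(0.6354226, 0.6260067, 0.6271626),
  Generosity        = c(0.36201224, 0.35528049, 0.47554022),
  Trust             = c(0.31596383, 0.40077007, 0.15352656),
  Dystopia.Residual = c(2.277027, 2.313707, 2.322715)
)

# Every column except the country name and the score is a factor.
factor_cols <- setdiff(names(top3), c("Country", "Happiness.Score"))

# Sum the seven factors and compare with the reported score.
top3$Factor.Sum <- rowSums(top3[factor_cols])
top3$Difference <- top3$Happiness.Score - top3$Factor.Sum

print(top3[c("Country", "Happiness.Score", "Factor.Sum", "Difference")])
```

The differences should be within rounding error, since the published score is rounded to three decimals.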

Tidying and Transforming Data

  • Some columns are renamed, and others are dropped for the purposes of this project.
  • One dropped column is Economy; its data is later replaced by GDP per capita figures scraped from Wikipedia.
  • A new variable, Continent, is created for the 155 countries in the dataset.
Observations: 155
Variables: 10
$ Country           <fct> Norway, Denmark, Iceland, Switzerland, Finla...
$ Continent         <fct> Europe, Europe, Europe, Europe, Europe, Euro...
$ Happiness.Rank    <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1...
$ Happiness.Score   <dbl> 7.537, 7.522, 7.504, 7.494, 7.469, 7.377, 7....
$ Family            <dbl> 1.533524, 1.551122, 1.610574, 1.516912, 1.54...
$ Life.Expectancy   <dbl> 0.7966665, 0.7925655, 0.8335521, 0.8581313, ...
$ Freedom           <dbl> 0.6354226, 0.6260067, 0.6271626, 0.6200706, ...
$ Generosity        <dbl> 0.36201224, 0.35528049, 0.47554022, 0.290549...
$ Trust             <dbl> 0.31596383, 0.40077007, 0.15352656, 0.367007...
$ Dystopia.Residual <dbl> 2.277027, 2.313707, 2.322715, 2.276716, 2.43...
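
The slides do not show how the Continent variable was built. One possible approach, sketched here in base R, is a named lookup vector keyed by country. The `continent_lookup` vector, the `happiness` data frame, and the placeholder country "Atlantis" are all hypothetical; a real lookup would need an entry for each of the 155 countries.

```r
# Hypothetical miniature of the dataset: four countries from the output above
# plus one made-up name to show what happens when the lookup has no entry.
happiness <- data.frame(
  Country = c("Norway", "Denmark", "Iceland", "Switzerland", "Atlantis"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: names are countries, values are continents.
continent_lookup <- c(
  Norway      = "Europe",
  Denmark     = "Europe",
  Iceland     = "Europe",
  Switzerland = "Europe"
)

# Indexing a named vector by country returns NA for names it does not contain.
happiness$Continent <- factor(unname(continent_lookup[happiness$Country]))

# Countries that still need a continent assigned.
unmatched <- happiness$Country[is.na(happiness$Continent)]
print(unmatched)
```

Listing the unmatched countries before continuing helps catch spelling differences early.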

Statistical Analysis

Descriptive statistics for the happiness score and the six remaining factors.

 Happiness.Score     Family      Life.Expectancy     Freedom      
 Min.   :2.693   Min.   :0.000   Min.   :0.0000   Min.   :0.0000  
 1st Qu.:4.505   1st Qu.:1.043   1st Qu.:0.3699   1st Qu.:0.3037  
 Median :5.279   Median :1.254   Median :0.6060   Median :0.4375  
 Mean   :5.354   Mean   :1.189   Mean   :0.5513   Mean   :0.4088  
 3rd Qu.:6.101   3rd Qu.:1.414   3rd Qu.:0.7230   3rd Qu.:0.5166  
 Max.   :7.537   Max.   :1.611   Max.   :0.9495   Max.   :0.6582  
   Generosity         Trust         Dystopia.Residual
 Min.   :0.0000   Min.   :0.00000   Min.   :0.3779   
 1st Qu.:0.1541   1st Qu.:0.05727   1st Qu.:1.5913   
 Median :0.2315   Median :0.08985   Median :1.8329   
 Mean   :0.2469   Mean   :0.12312   Mean   :1.8502   
 3rd Qu.:0.3238   3rd Qu.:0.15330   3rd Qu.:2.1447   
 Max.   :0.8381   Max.   :0.46431   Max.   :3.1175   

Visualizing The Data

Happiness rank falls as happiness score rises: the country ranked 1 has the highest score.

plot of chunk unnamed-chunk-5
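
The link between score and rank can be reproduced in a few lines, because the rank is simply a country's position when scores are sorted in descending order. This sketch uses the first six scores from the output shown earlier; the variable names are illustrative.

```r
# First six happiness scores, copied from the printed output earlier in the deck.
scores <- c(7.537, 7.522, 7.504, 7.494, 7.469, 7.377)

# Ranking the negated scores gives rank 1 to the highest score.
ranks <- rank(-scores, ties.method = "min")
print(ranks)

# A Spearman correlation of -1 means rank decreases strictly as score increases.
rho <- cor(scores, ranks, method = "spearman")
print(rho)
```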

Europe ranks the highest in happiness score.

plot of chunk unnamed-chunk-6

Visualizing The Data

Factors that play the most significant role in contributing to happiness score.

plot of chunk unnamed-chunk-7

The correlation plot above shows Life Expectancy and Family play a major role in a continent's happiness level.
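
The numbers behind a correlation plot like this can be listed directly. The sketch below ranks factors by their correlation with the happiness score. It runs on synthetic data (the `toy` data frame and its coefficients are invented for illustration), so its output says nothing about the real dataset; only the procedure carries over.

```r
set.seed(1)
n <- 155

# Synthetic stand-in for the tidied dataset; these are NOT the real values.
toy <- data.frame(
  Family          = runif(n),
  Life.Expectancy = runif(n),
  Generosity      = runif(n)
)
toy$Happiness.Score <- 3 + 2 * toy$Family + 1.5 * toy$Life.Expectancy +
  0.2 * toy$Generosity + rnorm(n, sd = 0.3)

# Pearson correlation of each factor with the score.
factor_cols <- setdiff(names(toy), "Happiness.Score")
cors <- sapply(toy[factor_cols], function(x) cor(x, toy$Happiness.Score))

# Strongest positive correlation first.
cors_sorted <- sort(cors, decreasing = TRUE)
print(round(cors_sorted, 3))
```

With the real data, `toy` would be replaced by the tidied data frame restricted to its numeric columns.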

Getting The Data From A New Source

What about the economy? To examine this question, a list of countries by GDP per capita, published by the World Bank for 2017, is scraped from Wikipedia.

#packages used in this chunk
library(rvest)
library(dplyr)
library(stringr)
library(tidyr)

#scraping data from wikipedia
wiki_gdp_capita <- "https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(PPP)_per_capita"
wiki_gdp_capita <- read_html(wiki_gdp_capita)
gdp_capita <- wiki_gdp_capita %>%
  html_nodes(xpath = '//*[@id="mw-content-text"]/div/table[1]/tbody/tr[2]/td[2]/table') %>%
  html_table()

#getting the data frame from the list
gdp_capita <- gdp_capita[[1]] 

#renaming the column names and removing rank
gdp_capita <- gdp_capita %>% 
  dplyr::rename(Country = 'Country/Territory', 
                USD = 'Int$') %>%
  select(-Rank) 

#removing commas from dollar amount  
gdp_capita$USD <- str_replace_all(gdp_capita$USD, "[:punct:]", '') %>%
  as.numeric()

#all rows in happiness rank that do not have a match in gdp per capita
#anti_join(happiness_rank, gdp_capita) %>%
# select(Country)

happiness_rank_gdp <- happiness_rank

#changing the value of the rows which are spelled differently
#(the quoted 'old' = 'new' specification is the syntax of car::recode, so that function is assumed here)
happiness_rank_gdp$Country <- car::recode(happiness_rank_gdp$Country, "'North Cyprus' = 'Cyprus'")
happiness_rank_gdp$Country <- car::recode(happiness_rank_gdp$Country, "'Hong Kong S.A.R., China' = 'Hong Kong'")
#Kinshasa is the capital of the Democratic Republic of the Congo, Brazzaville of the Republic of the Congo
happiness_rank_gdp$Country <- car::recode(happiness_rank_gdp$Country, "'Congo (Kinshasa)' = 'Congo, Dem. Rep.'")
happiness_rank_gdp$Country <- car::recode(happiness_rank_gdp$Country, "'Congo (Brazzaville)' = 'Congo, Rep.'")
happiness_rank_gdp$Country <- car::recode(happiness_rank_gdp$Country, "'South Sudan' = 'Sudan'")

#joining both datasets on Country
happiness_rank_gdp <- left_join(happiness_rank_gdp, gdp_capita, by = "Country")

#converting USD to a percentile rank between 0 and 1, storing it as Economy, and dropping USD
happiness_rank_gdp <- happiness_rank_gdp %>%
  mutate(Economy = percent_rank(USD)) %>%
  select(-USD) %>%
  drop_na()
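
The Economy column above comes from `percent_rank()`, which rescales a variable to the range 0 to 1 according to its rank rather than its magnitude. A base R equivalent makes the arithmetic explicit. The helper name `percent_rank_base` and the dollar amounts are hypothetical, chosen only to illustrate the calculation.

```r
# Base R version of a percentile rank: (minimum rank - 1) / (non-missing count - 1).
percent_rank_base <- function(x) {
  (rank(x, ties.method = "min", na.last = "keep") - 1) / (sum(!is.na(x)) - 1)
}

# Hypothetical GDP per capita figures, not taken from the scraped table.
usd <- c(1000, 5000, 20000, 50000, 100000)

economy <- percent_rank_base(usd)
print(economy)  # 0.00 0.25 0.50 0.75 1.00
```

Because only the ordering matters, a country with ten times the GDP per capita of its neighbour in the ranking sits just one step above it, which dampens the effect of extreme values.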

Visualizing The Data

Economy is correlated with happiness score, and the correlation is stronger than for the life expectancy and family factors.

plot of chunk unnamed-chunk-9

Prediction

A multiple linear regression on these factors predicts happiness score on unseen data with high accuracy.

plot of chunk unnamed-chunk-10
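
The slides do not include the modelling code, so the following is only a sketch of how such a prediction could be set up: hold out part of the data, fit `lm()` on the rest, and measure the error on the held-out rows. The data is synthetic (generated below with invented coefficients), and the 80/20 split and the RMSE metric are assumptions, not the author's documented choices.

```r
set.seed(42)
n <- 155

# Synthetic stand-in for the joined dataset; these are NOT the real values.
toy <- data.frame(
  Family          = runif(n, 0, 1.6),
  Life.Expectancy = runif(n, 0, 0.95),
  Freedom         = runif(n, 0, 0.66),
  Economy         = runif(n, 0, 1)
)
toy$Happiness.Score <- 2.5 + toy$Family + toy$Life.Expectancy +
  toy$Freedom + toy$Economy + rnorm(n, sd = 0.3)

# Hold out 20% of the rows as unseen data.
train_idx <- sample(n, size = round(0.8 * n))
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

# Fit a multiple linear regression on the training rows only.
fit <- lm(Happiness.Score ~ ., data = train)

# Predict the held-out rows and compute the root mean squared error.
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$Happiness.Score - pred)^2))
print(rmse)
```

Reporting the error on rows the model never saw is what supports a claim about performance on unseen data.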