The relationship between Happiness and Obesity

Required packages

# This is the R chunk for the required packages
library(readr)      # Reading CSV files (read_csv)
library(ggplot2)    # Creating plots
library(dplyr)      # Data manipulation (joins, mutate, filter)
library(tidyr)      # Tidying and reshaping (separate, pivot_longer)
library(knitr)      # Creating nice tables (kable)
library(kableExtra) # Table styling; masks dplyr::group_rows
library(magrittr)   # Pipes, including %<>%; masks tidyr::extract
library(Hmisc)      # Masks dplyr::summarize and dplyr::src, so call dplyr::summarize() explicitly
library(outliers)   # Univariate outlier tests
library(MVN)        # Multivariate normality and outlier checks
library(infotheo)   # Discretization and mutual information
library(forecast)   # BoxCox() and BoxCox.lambda()

Executive Summary

The purpose of wrangling the datasets in this assignment is to compare the effect of happiness on obesity in the two years 2015 and 2016. To obtain a clean dataset, the work is divided into seven stages, described below:

Data stage: There are three datasets: “Obesity among adults by country, 1975-2016”, “The World Happiness Report 2015” and “The World Happiness Report 2016”. In this stage, I import all the datasets, subset them to keep only the 2015 and 2016 data, and rename columns where necessary. I then join the two World Happiness Reports together (the result is named WHP_join) and rename its columns where necessary. Finally, I join all the datasets into a single dataset called “all_join”.

Understand stage: I checked the data type of every variable to decide which conversions are needed. However, all data conversion and factorizing is done in the next stage.

Tidy & Manipulate Data 1 stage: I found two variables that need to be tidied. I separated the value in each of these columns from its interval and removed the unnecessary interval data. In this stage I also converted the cleaned variables to numeric, transformed the data from wide format to long format and, finally, factorized them.

Tidy & Manipulate Data 2 stage: From the cleaned dataset, I used mutate() to create two new variables called “Difference_happiness” and “Difference_Obesity”.

Scan 1 stage: I scanned for missing values, counted them, and filtered them out so that the dataset has no missing values. I also checked for special values and did not find any.

Scan 2 stage: I used box plots to check for outliers, then used the Mahalanobis distance method to identify specific outliers. Finally, I chose to keep all outliers.

Transform stage: I plotted histograms to see how the variables are distributed, then decided to use a Box-Cox transformation to make the distributions more symmetric.
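The Scan 2 stage relies on Mahalanobis distance. As a rough illustration of how that method flags multivariate outliers, here is a minimal base-R sketch. The helper name flag_mahalanobis_outliers(), the toy data and the 97.5% chi-square cutoff are all assumptions for illustration, not the settings used in this report (MVN, loaded above, also includes multivariate outlier detection).

```r
# Flag multivariate outliers using the squared Mahalanobis distance.
# Assumes the variables are roughly multivariate normal, so the squared
# distances approximately follow a chi-square distribution with
# ncol(x) degrees of freedom.
flag_mahalanobis_outliers <- function(x, level = 0.975) {
  x <- as.matrix(x)
  d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))
  cutoff <- qchisq(level, df = ncol(x))
  data.frame(row = seq_len(nrow(x)), d2 = d2, outlier = d2 > cutoff)
}

# Hypothetical toy data: happiness score vs. obesity percentage,
# with the last row deliberately placed away from the others
toy <- data.frame(
  happiness = c(5.1, 5.4, 6.0, 6.3, 5.8, 4.9, 5.6, 6.1, 5.2, 5.9, 3.0),
  obesity   = c(20, 22, 25, 27, 24, 19, 23, 26, 21, 25, 60)
)
res <- flag_mahalanobis_outliers(toy)
res[res$outlier, ]  # only row 11 exceeds the cutoff
```

With few observations, a single extreme point inflates the classical covariance estimate, which is why robust covariance estimators are often preferred; the classical estimate is used here only to keep the sketch short.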
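For the Transform stage, forecast (loaded above) provides BoxCox() and BoxCox.lambda(); BoxCox.lambda() uses Guerrero's method by default, so it may choose a different λ from the sketch below. To make the transformation itself concrete, here is a base-R sketch of the Box-Cox formula with λ chosen by a simple grid search over the profile log-likelihood. The helper names, the grid and the example values are hypothetical.

```r
# Box-Cox transform for strictly positive values:
#   (x^lambda - 1) / lambda   when lambda != 0
#   log(x)                    when lambda == 0
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

# Pick lambda on a grid by maximizing the Box-Cox profile log-likelihood
# (normal model with no covariates, variance replaced by its MLE).
choose_lambda <- function(x, grid = seq(-2, 2, by = 0.01)) {
  n <- length(x)
  loglik <- sapply(grid, function(l) {
    y <- box_cox(x, l)
    -n / 2 * log(mean((y - mean(y))^2)) + (l - 1) * sum(log(x))
  })
  grid[which.max(loglik)]
}

# Hypothetical right-skewed values
x <- c(1.2, 1.5, 2.0, 2.4, 3.1, 4.5, 6.0, 9.8, 15.2)
lambda <- choose_lambda(x)
summary(box_cox(x, lambda))
```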

Data

Three datasets are used in this assignment:

The first is Obesity among adults by country, 1975-2016, collected from https://www.kaggle.com/amanarora/obesity-among-adults-by-country-19752016. The data originally come from the WHO. This dataset has 198 observations and 127 variables and contains the prevalence of obesity among adults in countries around the world from 1975 to 2016.
The second dataset is the World Happiness Report 2015, collected from https://www.kaggle.com/mathurinache/world-happiness-report. It has 12 variables and 158 observations (one per country), ranking the happiness levels of 158 countries based on six main factors plus a seventh component, Dystopia Residual.
The third dataset is the World Happiness Report 2016, collected from the same source, https://www.kaggle.com/mathurinache/world-happiness-report. Its structure is similar to the 2015 report (the Standard Error column is replaced by Lower and Upper Confidence Interval columns), but it covers a different year.

Because the obesity dataset is used to analyse the relationship between obesity and happiness levels in 2015 and 2016, only the data for these two years is kept and all other years are excluded.

The World Happiness Reports 2015 and 2016 are used to compare happiness scores between the two years. The Happiness Score is the sum of the scores of the other seven factors.
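The claim that the Happiness Score is the sum of the seven components can be checked directly on a raw report file before it is subset. A minimal sketch, assuming the component column names shown in the read_csv() column specifications in this report and a rounding tolerance of 0.001 (an assumption); the helper name and the toy rows are hypothetical.

```r
# Component columns as named in the World Happiness Report 2015/2016 files
component_cols <- c("Economy (GDP per Capita)", "Family",
                    "Health (Life Expectancy)", "Freedom",
                    "Trust (Government Corruption)", "Generosity",
                    "Dystopia Residual")

# Hypothetical helper: TRUE where Happiness Score equals the sum of the
# seven components, within a rounding tolerance (tol is an assumption)
score_matches_components <- function(df, tol = 1e-3) {
  total <- rowSums(df[, component_cols])
  abs(total - df[["Happiness Score"]]) < tol
}

# Hypothetical toy rows with made-up numbers: the first adds up, the second does not
toy <- data.frame(
  check.names = FALSE,
  `Happiness Score` = c(5.0, 6.0),
  `Economy (GDP per Capita)` = c(1.0, 1.0),
  Family = c(1.0, 1.0),
  `Health (Life Expectancy)` = c(0.5, 0.5),
  Freedom = c(0.5, 0.5),
  `Trust (Government Corruption)` = c(0.2, 0.2),
  Generosity = c(0.3, 0.3),
  `Dystopia Residual` = c(1.5, 1.5)
)
score_matches_components(toy)  # TRUE FALSE
```

In this report, all(score_matches_components(WHP_2015)) could be run immediately after read_csv(), before the columns are subset away.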

In this stage, I rename the columns of the datasets where necessary and, after subsetting the three datasets, merge all the data together, checking the head of each result.

# Read the Obesity dataset (it is subset to 2015 and 2016 below)
Obesity<-read.csv("D:/RMIT/2. Data Wrangling/Assignment 2/Dataset/Group 1/obesity among adult by countries/data.csv")

knitr::kable(head(Obesity))

X	X2016	X2016.1	X2016.2	X2015	X2015.1	X2015.2	…
	Prevalence of obesity among adults, BMI ≥ 30 (age-standardized estimate) (%)	Prevalence of obesity among adults, BMI ≥ 30 (age-standardized estimate) (%)	Prevalence of obesity among adults, BMI ≥ 30 (age-standardized estimate) (%)	Prevalence of obesity among adults, BMI ≥ 30 (age-standardized estimate) (%)	Prevalence of obesity among adults, BMI ≥ 30 (age-standardized estimate) (%)	Prevalence of obesity among adults, BMI ≥ 30 (age-standardized estimate) (%)	…
	18+ years	18+ years	18+ years	18+ years	18+ years	18+ years	…
Country	Both sexes	Male	Female	Both sexes	Male	Female	…
Afghanistan	5.5 [3.4-8.1]	3.2 [1.3-6.4]	7.6 [4.3-12.4]	5.2 [3.3-7.7]	3.0 [1.3-6.0]	7.3 [4.1-11.8]	…
Albania	21.7 [17.0-26.7]	21.6 [14.8-29.0]	21.8 [15.3-28.9]	21.1 [16.6-26.0]	20.9 [14.4-28.1]	21.3 [15.1-28.1]	…
Algeria	27.4 [22.5-32.7]	19.9 [13.6-27.1]	34.9 [27.6-42.7]	26.7 [21.9-31.8]	19.2 [13.2-26.1]	34.2 [27.1-41.7]	…
(Only the first 7 of the 127 columns are shown; the remaining columns repeat the Both sexes/Male/Female pattern for each earlier year down to 1975.)

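# Subset the Obesity dataset for 2015 and 2016:
# drop the three header rows and keep the country column (1)
# plus the "Both sexes" columns for 2016 (2) and 2015 (5)
Obesity<-Obesity[-c(1:3), c(1,2,5)]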

head(Obesity)

# Changing column name of Obesity
colnames(Obesity)<- c("Country","Obesity_2016_percentage","Obesity_2015_percentage")
head(Obesity)

# Read World Happiness Report 2015 (Happiness Score is subset below)
WHP_2015<-read_csv("D:/RMIT/2. Data Wrangling/Assignment 2/Dataset/Group 1/World Happiness Report/2015.csv")

## Parsed with column specification:
## cols(
##   Country = col_character(),
##   Region = col_character(),
##   `Happiness Rank` = col_double(),
##   `Happiness Score` = col_double(),
##   `Standard Error` = col_double(),
##   `Economy (GDP per Capita)` = col_double(),
##   Family = col_double(),
##   `Health (Life Expectancy)` = col_double(),
##   Freedom = col_double(),
##   `Trust (Government Corruption)` = col_double(),
##   Generosity = col_double(),
##   `Dystopia Residual` = col_double()
## )

head(WHP_2015)

# Keep Country, Region and Happiness Score (columns 1, 2 and 4)
WHP_2015<-WHP_2015[,c(1,2,4)]
head(WHP_2015)

# Read World Happiness Report 2016 
WHP_2016<-read_csv("D:/RMIT/2. Data Wrangling/Assignment 2/Dataset/Group 1/World Happiness Report/2016.csv")

## Parsed with column specification:
## cols(
##   Country = col_character(),
##   Region = col_character(),
##   `Happiness Rank` = col_double(),
##   `Happiness Score` = col_double(),
##   `Lower Confidence Interval` = col_double(),
##   `Upper Confidence Interval` = col_double(),
##   `Economy (GDP per Capita)` = col_double(),
##   Family = col_double(),
##   `Health (Life Expectancy)` = col_double(),
##   Freedom = col_double(),
##   `Trust (Government Corruption)` = col_double(),
##   Generosity = col_double(),
##   `Dystopia Residual` = col_double()
## )

head(WHP_2016)

# Keep Country and Happiness Score (columns 1 and 4); Region is already in the 2015 data
WHP_2016<-WHP_2016[,c(1,4)]
head(WHP_2016)

# Join the World Happiness Reports for 2015 and 2016
WHP_join<-WHP_2015 %>% left_join(WHP_2016, by="Country")
head(WHP_join,10)

# Rename the columns of the joined World Happiness Report
colnames(WHP_join)<-c("Country","Region","Score_2015","Score_2016")
head(WHP_join)

# Join the Obesity dataset and the World Happiness Report datasets together
all_join<-left_join(WHP_join,Obesity, by = "Country")
head(all_join)
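Because the join matches country names from two different sources, any country spelled differently in the WHO data and the World Happiness Report would silently receive NA obesity values. One way to list such countries, equivalent to dplyr::anti_join(WHP_join, Obesity, by = "Country"), is sketched here in base R; the helper and the spellings in the toy data are hypothetical.

```r
# Hypothetical helper: key values in x with no match in y, i.e. the rows a
# left join would fill with NA (base-R analogue of dplyr::anti_join())
unmatched_keys <- function(x, y, key = "Country") {
  setdiff(unique(x[[key]]), y[[key]])
}

# Hypothetical toy data: one country is spelled differently in each source
happiness <- data.frame(Country = c("Switzerland", "United States", "Iceland"),
                        stringsAsFactors = FALSE)
obesity <- data.frame(Country = c("Switzerland", "United States of America", "Iceland"),
                      stringsAsFactors = FALSE)
unmatched_keys(happiness, obesity)  # "United States"
```

In this report, unmatched_keys(WHP_join, Obesity) could be run before the join; any names it returns would need to be harmonized before joining so that those countries keep their obesity values.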

Understand

Check the structure of the joined dataset to see its data types and dimensions.

# Structure of the joined dataset
str(all_join)

## tibble [158 x 6] (S3: tbl_df/tbl/data.frame)
##  $ Country                : chr [1:158] "Switzerland" "Iceland" "Denmark" "Norway" ...
##  $ Region                 : chr [1:158] "Western Europe" "Western Europe" "Western Europe" "Western Europe" ...
##  $ Score_2015             : num [1:158] 7.59 7.56 7.53 7.52 7.43 ...
##  $ Score_2016             : num [1:158] 7.51 7.5 7.53 7.5 7.4 ...
##  $ Obesity_2016_percentage: chr [1:158] "19.5 [16.0-23.3]" "21.9 [18.0-26.0]" "19.7 [16.2-23.3]" "23.1 [19.3-27.1]" ...
##  $ Obesity_2015_percentage: chr [1:158] "19.1 [15.7-22.7]" "21.5 [17.8-25.4]" "19.3 [16.1-22.7]" "22.6 [19.0-26.4]" ...

At this stage, the all_join dataset contains two main data types, character and numeric, and has 158 observations and 6 variables.

The six variables are Country (character), Region (character), Score_2015 (numeric), Score_2016 (numeric), Obesity_2016_percentage (character) and Obesity_2015_percentage (character).
However, to compare obesity in 2015 and 2016, the Obesity_2015_percentage and Obesity_2016_percentage columns must be converted to numeric before new variables can be created from them with mutate().
As the structure shows, these two columns cannot be converted to numeric yet, because each value is followed by an interval that must be tidied away first. Conversion and factorizing are therefore performed in the next stage.

Tidy & Manipulate Data I

Firstly, I check the dataset for any problems.

# Checking head

head(all_join)

As the first six observations above show, the Obesity_2016_percentage and Obesity_2015_percentage columns are untidy because each cell also contains an interval. Before these columns can be converted to numeric, they need to be tidied, and the dataset then reshaped.

In this stage, I use the separate() function to split each of the two columns into its value and its interval, and then remove the intervals.
Next, I convert the Obesity_2016_percentage and Obesity_2015_percentage columns to numeric.
Then, I factorize and label the Score_2015 and Score_2016 columns.
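As an alternative to separate() followed by dropping the interval columns, the point estimate could be pulled out with a regular expression in one step. This is a sketch, not the method used in this report; the helper extract_estimate() and the "No data" example string are hypothetical.

```r
# Hypothetical helper: keep the point estimate that precedes the interval,
# e.g. "19.5 [16.0-23.3]" -> 19.5. Entries that do not start with a number
# become NA (the coercion warning is suppressed, so count NAs afterwards).
extract_estimate <- function(x) {
  estimate <- sub("^[[:space:]]*([0-9.]+)[[:space:]]*\\[.*\\][[:space:]]*$", "\\1", x)
  suppressWarnings(as.numeric(estimate))
}

extract_estimate(c("19.5 [16.0-23.3]", "5.5 [3.4-8.1]", "No data", NA))
# gives 19.5 5.5 NA NA
```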

# Using separate function to split "Obesity_2016_percentage" and its interval
all_join %<>% separate("Obesity_2016_percentage",into= c("Obesity_2016_percentage", "Intervals_1"), sep=" ") 
knitr::kable(head(all_join,10))

Country	Region	Score_2015	Score_2016	Obesity_2016_percentage	Intervals_1	Obesity_2015_percentage
Switzerland	Western Europe	7.587	7.509	19.5	[16.0-23.3]	19.1 [15.7-22.7]
Iceland	Western Europe	7.561	7.501	21.9	[18.0-26.0]	21.5 [17.8-25.4]
Denmark	Western Europe	7.527	7.526	19.7	[16.2-23.3]	19.3 [16.1-22.7]
Norway	Western Europe	7.522	7.498	23.1	[19.3-27.1]	22.6 [19.0-26.4]
Canada	North America	7.427	7.404	29.4	[25.7-33.3]	28.8 [25.3-32.4]
Finland	Western Europe	7.406	7.413	22.2	[19.0-25.7]	21.8 [18.8-25.1]
Netherlands	Western Europe	7.378	7.339	20.4	[16.9-24.2]	20.0 [16.7-23.6]
Sweden	Western Europe	7.364	7.291	20.6	[17.1-24.3]	20.2 [16.9-23.6]
New Zealand	Australia and New Zealand	7.286	7.334	30.8	[27.3-34.3]	30.2 [26.9-33.5]
Australia	Australia and New Zealand	7.284	7.313	29.0	[25.3-32.9]	28.4 [24.9-32.1]

# Using separate function to split "Obesity_2015_percentage" and its interval
all_join %<>% separate("Obesity_2015_percentage",into= c("Obesity_2015_percentage", "Intervals_2"), sep=" ") 
knitr::kable(head(all_join,10))

Country	Region	Score_2015	Score_2016	Obesity_2016_percentage	Intervals_1	Obesity_2015_percentage	Intervals_2
Switzerland	Western Europe	7.587	7.509	19.5	[16.0-23.3]	19.1	[15.7-22.7]
Iceland	Western Europe	7.561	7.501	21.9	[18.0-26.0]	21.5	[17.8-25.4]
Denmark	Western Europe	7.527	7.526	19.7	[16.2-23.3]	19.3	[16.1-22.7]
Norway	Western Europe	7.522	7.498	23.1	[19.3-27.1]	22.6	[19.0-26.4]
Canada	North America	7.427	7.404	29.4	[25.7-33.3]	28.8	[25.3-32.4]
Finland	Western Europe	7.406	7.413	22.2	[19.0-25.7]	21.8	[18.8-25.1]
Netherlands	Western Europe	7.378	7.339	20.4	[16.9-24.2]	20.0	[16.7-23.6]
Sweden	Western Europe	7.364	7.291	20.6	[17.1-24.3]	20.2	[16.9-23.6]
New Zealand	Australia and New Zealand	7.286	7.334	30.8	[27.3-34.3]	30.2	[26.9-33.5]
Australia	Australia and New Zealand	7.284	7.313	29.0	[25.3-32.9]	28.4	[24.9-32.1]
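The two `separate()` calls above follow the same pattern. The minimal sketch below runs that pattern on a small hypothetical tibble, using values copied from the first two rows above, and converts the estimate to numeric in the same pipeline. It assumes, as the printed output suggests, that a single space separates each estimate from its interval.

```r
library(tidyr)
library(dplyr)

# Hypothetical two-row example in the same "value [lower-upper]" format
toy <- tibble(
  Country = c("Switzerland", "Iceland"),
  Obesity_2016_percentage = c("19.5 [16.0-23.3]", "21.9 [18.0-26.0]")
)

toy <- toy %>%
  # Split the point estimate from its interval at the space
  separate(Obesity_2016_percentage,
           into = c("Obesity_2016_percentage", "Intervals_1"), sep = " ") %>%
  # Convert the estimate to numeric straight away
  mutate(Obesity_2016_percentage = as.numeric(Obesity_2016_percentage))

print(toy)
```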

# Remove the two interval columns (Intervals_1 and Intervals_2) by position
all_join<-all_join[ , -c(6,8)]
head(all_join)
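Dropping columns by position breaks silently if the column order ever changes. A name-based alternative removes every column whose name starts with `Intervals`. It is sketched here on a hypothetical one-row tibble with the same column names.

```r
library(dplyr)

# Hypothetical one-row tibble mirroring the column layout after separate()
toy <- tibble(
  Country = "Switzerland",
  Obesity_2016_percentage = "19.5",
  Intervals_1 = "[16.0-23.3]",
  Obesity_2015_percentage = "19.1",
  Intervals_2 = "[15.7-22.7]"
)

# A negated tidyselect helper drops the interval columns by name, not position
toy <- toy %>% select(-starts_with("Intervals"))
names(toy)
```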

All interval columns have now been removed. Next, I will convert the Obesity_2016_percentage and Obesity_2015_percentage columns to numeric.

# Convert data type from character to numeric
all_join$Obesity_2016_percentage <- as.numeric(all_join$Obesity_2016_percentage)

## Warning: NAs introduced by coercion

all_join$Obesity_2015_percentage <- as.numeric(all_join$Obesity_2015_percentage)

## Warning: NAs introduced by coercion

str(all_join)

## tibble [158 x 6] (S3: tbl_df/tbl/data.frame)
##  $ Country                : chr [1:158] "Switzerland" "Iceland" "Denmark" "Norway" ...
##  $ Region                 : chr [1:158] "Western Europe" "Western Europe" "Western Europe" "Western Europe" ...
##  $ Score_2015             : num [1:158] 7.59 7.56 7.53 7.52 7.43 ...
##  $ Score_2016             : num [1:158] 7.51 7.5 7.53 7.5 7.4 ...
##  $ Obesity_2016_percentage: num [1:158] 19.5 21.9 19.7 23.1 29.4 22.2 20.4 20.6 30.8 29 ...
##  $ Obesity_2015_percentage: num [1:158] 19.1 21.5 19.3 22.6 28.8 21.8 20 20.2 30.2 28.4 ...

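The Obesity_2016_percentage and Obesity_2015_percentage columns are now numeric and ready for the mutation steps that follow. The coercion warnings mean that some entries could not be parsed as numbers and were set to NA. These are handled in the missing-value scan below.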
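To see which raw entries trigger the "NAs introduced by coercion" warning, a small helper can list them before converting. The function `unparseable()` is hypothetical, and the example input is made up for illustration.

```r
# Hypothetical helper: return the non-missing strings that as.numeric() cannot parse
unparseable <- function(x) {
  parsed <- suppressWarnings(as.numeric(x))
  unique(x[is.na(parsed) & !is.na(x)])
}

raw <- c("19.5", "21.9", "missing", NA, "23.1")  # illustrative values only
unparseable(raw)
```

On the real data this would be called as, for example, `unparseable(all_join$Obesity_2016_percentage)` before the `as.numeric()` step.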
In the following steps, I will reshape the dataset from wide to long format, then factorise and label the year variable.

# Reshape the World Happiness Report scores into long format
all_join1 <- all_join %>% 
  pivot_longer(c(Score_2015, Score_2016), names_to = "Years", values_to = "happiness_score")
knitr::kable(head(all_join1))

Country	Region	Obesity_2016_percentage	Obesity_2015_percentage	Years	happiness_score
Switzerland	Western Europe	19.5	19.1	Score_2015	7.587
Switzerland	Western Europe	19.5	19.1	Score_2016	7.509
Iceland	Western Europe	21.9	21.5	Score_2015	7.561
Iceland	Western Europe	21.9	21.5	Score_2016	7.501
Denmark	Western Europe	19.7	19.3	Score_2015	7.527
Denmark	Western Europe	19.7	19.3	Score_2016	7.526

# Factorise Year variable
all_join1$Years <- all_join1$Years %>% factor(levels = c("Score_2015", "Score_2016"), labels = c("2015", "2016"))
head(all_join1)
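The reshape and the relabelling can also be combined into one step. The `names_prefix` argument of `pivot_longer()` strips the `Score_` prefix, which leaves `2015` and `2016` ready to become factor levels. The sketch below uses a hypothetical two-country tibble with values taken from the table above.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide-format example with two countries
toy <- tibble(
  Country = c("Switzerland", "Iceland"),
  Score_2015 = c(7.587, 7.561),
  Score_2016 = c(7.509, 7.501)
)

toy_long <- toy %>%
  pivot_longer(c(Score_2015, Score_2016),
               names_to = "Years", values_to = "happiness_score",
               names_prefix = "Score_") %>%   # "Score_2015" becomes "2015"
  mutate(Years = factor(Years, levels = c("2015", "2016")))

toy_long
```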

Tidy & Manipulate Data II

At this stage the dataset is tidy. I will now create new columns holding the change in happiness score and in obesity between 2015 and 2016.

# Mutating the combined dataset to see the changes between 2015 and 2016
all_join<-mutate(all_join,Difference_hapiness= Score_2016 - Score_2015, 
                 Difference_obesity=(Obesity_2016_percentage - Obesity_2015_percentage) )
knitr::kable(head(all_join))

Country	Region	Score_2015	Score_2016	Obesity_2016_percentage	Obesity_2015_percentage	Difference_hapiness	Difference_obesity
Switzerland	Western Europe	7.587	7.509	19.5	19.1	-0.078	0.4
Iceland	Western Europe	7.561	7.501	21.9	21.5	-0.060	0.4
Denmark	Western Europe	7.527	7.526	19.7	19.3	-0.001	0.4
Norway	Western Europe	7.522	7.498	23.1	22.6	-0.024	0.5
Canada	North America	7.427	7.404	29.4	28.8	-0.023	0.6
Finland	Western Europe	7.406	7.413	22.2	21.8	0.007	0.4
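The sign convention is 2016 minus 2015, so a negative value means the measure fell. As a quick check, here is the Switzerland row recomputed by hand from the scores and percentages shown above.

```r
# Switzerland, values from the table above
score_2015 <- 7.587
score_2016 <- 7.509
obesity_2015 <- 19.1
obesity_2016 <- 19.5

diff_happiness <- score_2016 - score_2015      # happiness fell slightly
diff_obesity   <- obesity_2016 - obesity_2015  # obesity rose by 0.4 points
round(c(diff_happiness, diff_obesity), 3)
```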

The new columns Difference_hapiness and Difference_obesity have been created. A negative value means that the measure fell between 2015 and 2016.

Scan I

In this stage, I will scan the columns Difference_hapiness and Difference_obesity for missing values. Wrapping is.na() in sum() counts how many missing values each variable contains.

If there are any missing values, I will exclude the affected observations. I do not impute them with the mean, median or mode because each value belongs to a specific country and changes from year to year with that country's circumstances. For example, during the Covid-19 period happiness scores could differ markedly from earlier years, so a value imputed from other countries or years would not be a reliable substitute.

# Count missing values in the happiness and obesity differences
sum(is.na(all_join$Difference_hapiness))

## [1] 7

sum(is.na(all_join$Difference_obesity))

## [1] 25

The Difference_hapiness column has 7 missing values, while Difference_obesity has 25. In the next step, I will remove the rows with missing values by applying complete.cases() to each variable.
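Instead of calling `sum(is.na())` once per column, `colSums(is.na(df))` counts the missing values of every column in one call. Here is a sketch on a small hypothetical data frame.

```r
# Hypothetical data frame with a few missing values
toy <- data.frame(
  Difference_hapiness = c(-0.078, NA, 0.007),
  Difference_obesity  = c(0.4, NA, NA)
)

# is.na() on a data frame returns a logical matrix; colSums() counts TRUEs per column
na_counts <- colSums(is.na(toy))
na_counts
```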

# Excluding missing data
all_join<-all_join[complete.cases(all_join$Difference_hapiness),]
all_join<-all_join[complete.cases(all_join$Difference_obesity),]
sum(is.na(all_join))

## [1] 0

The dataset now has no missing values.
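The two `complete.cases()` filters can also be written as one call. Passing both columns keeps only the rows where neither value is missing, which is also what `tidyr::drop_na()` does when given the same columns. The sketch below uses hypothetical data to check that the two approaches agree.

```r
library(tidyr)

# Hypothetical data with a missing value in each column
toy <- data.frame(
  Difference_hapiness = c(-0.078, NA, 0.007, -0.024),
  Difference_obesity  = c(0.4, 0.5, NA, 0.5)
)

# Base R: one complete.cases() call over both columns
base_result <- toy[complete.cases(toy$Difference_hapiness, toy$Difference_obesity), ]

# tidyr: drop rows with NA in either named column
tidy_result <- drop_na(toy, Difference_hapiness, Difference_obesity)

nrow(base_result)
```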
Next, I will write a function that flags infinite and NaN values, then use a function from the apply family (sapply()) to apply it to each difference column.

# Create function to check for special values (Inf, -Inf, NaN) in numeric vectors
is.special <- function(x) {
  if (is.numeric(x)) (is.infinite(x) | is.nan(x))
}

# Applying function
sum(sapply(all_join$Difference_hapiness, is.special))

## [1] 0

sum(sapply(all_join$Difference_obesity, is.special))

## [1] 0

Both counts are 0, so neither difference column contains infinite or NaN values.
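Because `is.infinite()` and `is.nan()` are vectorised, `is.special()` can be applied to a whole column without `sapply()`. The sketch below extends the function with an explicit non-numeric branch (an addition, not part of the original) and checks it on a hand-made vector. Note that a plain `NA` is not counted as special.

```r
# Variant of is.special() that returns FALSE for non-numeric input
is.special <- function(x) {
  if (is.numeric(x)) is.infinite(x) | is.nan(x) else rep(FALSE, length(x))
}

x <- c(1.5, Inf, NaN, NA, -Inf)
is.special(x)        # vectorised: one logical per element
sum(is.special(x))
```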

Scan II

In this stage, because the analysis involves two variables, I will first inspect a scatter plot of the two differences and then use a multivariate method to detect outliers.

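# Scatter plot of the two differences to scan for outliers
all_join %>% plot(Difference_obesity ~ Difference_hapiness, data = ., main = "Relationship between changes in happiness and obesity",
                 xlab = "Change in happiness score", ylab = "Change in obesity (percentage points)")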

The scatter plot above suggests a few possible outliers in the lower-left, upper-left and upper-right regions.
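For comparison, the univariate box-plot rule flags points beyond 1.5 times the IQR from the hinges. It can be applied to each variable on its own with base R's `boxplot.stats()`. Here it is on a made-up vector.

```r
x <- c(1, 2, 3, 4, 5, 100)  # illustrative values only

# $out holds the values outside the whiskers (coef = 1.5 by default)
boxplot.stats(x)$out
```

On the real data this would be, for example, `boxplot.stats(all_join$Difference_obesity)$out`. Because it looks at one variable at a time, it cannot detect points that are unusual only in combination.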
Next, I will use the Mahalanobis distance method to identify the specific outliers. Before doing so, I subset the dataset to the two variables of interest.

# Subsetting data for outlier check
Happiness_Obesity<-all_join%>%select(Difference_hapiness,Difference_obesity)
head(Happiness_Obesity)

# Multivariate outlier detection using Mahalanobis distance with QQ plots
results <- mvn(data = Happiness_Obesity, multivariateOutlierMethod = "quan", showOutliers = TRUE)

results$multivariateOutliers

The Mahalanobis distance method flags 10 outliers. However, each outlier is a genuine observation for a particular country and therefore carries information, so I will not remove, impute or otherwise alter these individual values.
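The following base-R sketch illustrates what the Mahalanobis approach measures. It uses the classical mean and covariance with a chi-squared cutoff (`qchisq(0.975, df = 2)` for two variables). It is a simplified stand-in: as I understand the MVN documentation, its `"quan"` method relies on robust location and scatter estimates, so it may flag a different set of points. The toy data are a tight cluster plus one distant point.

```r
# Illustrative data: eight points at (+/-1, +/-1), one at the origin, one far away
toy <- data.frame(
  Difference_hapiness = c(-1, 1, -1, 1, -1, 1, -1, 1, 0, 10),
  Difference_obesity  = c(-1, -1, 1, 1, -1, -1, 1, 1, 0, 10)
)

# mahalanobis() returns squared distances from the centre
d2 <- mahalanobis(toy, center = colMeans(toy), cov = cov(toy))

# Under approximate bivariate normality, d2 follows a chi-squared distribution with 2 df
cutoff <- qchisq(0.975, df = ncol(toy))
which(d2 > cutoff)
```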

Transform

In this stage, I will check the distributions of the Difference_hapiness and Difference_obesity variables.

# Check the distribution of Difference_hapiness
all_join$Difference_hapiness %>% hist(col = "grey", xlab = "Change in happiness score", main = "Histogram of difference in happiness")

The Difference_hapiness variable is slightly skewed to the left, so I will apply a Box-Cox transformation to make it more symmetrical.
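The Box-Cox transformation is (x^lambda - 1) / lambda for lambda not equal to 0, and log(x) for lambda = 0. In its classical form it is defined only for strictly positive x. The differences here include negative values (for example -0.078 for Switzerland), so it is worth checking the transformed output for NA or NaN values. The minimal sketch below implements the classical formula. The helper `shift_positive()` is hypothetical: it moves the minimum to 1 before transforming, which is an assumption of this sketch and not a step in the original analysis.

```r
# Classical Box-Cox transform; defined only for x > 0
box_cox <- function(x, lambda) {
  if (any(x <= 0, na.rm = TRUE)) stop("Box-Cox requires strictly positive values")
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

# Hypothetical helper: shift a vector so that its minimum equals `offset`
shift_positive <- function(x, offset = 1) {
  x - min(x, na.rm = TRUE) + offset
}

d <- c(-0.078, -0.060, -0.001, -0.024, -0.023, 0.007)  # first six differences above
y <- box_cox(shift_positive(d), lambda = 0.5)
y
```

In practice, lambda could then be estimated on the shifted values, for example with `forecast::BoxCox.lambda()`.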

# Applying BoxCox transformation
boxcox_happiness<-BoxCox(all_join$Difference_hapiness, lambda="auto")
hist(boxcox_happiness)

Difference_hapiness is now more symmetrical.
Next, I will check the Difference_obesity variable.

# Check the distribution of Difference_obesity
all_join$Difference_obesity %>% hist(col = "grey", xlab = "Change in obesity (percentage points)", main = "Histogram of difference in obesity")

I will apply the Box-Cox transformation to this variable as well.

# Applying BoxCox transformation
boxcox_obesity<-BoxCox(all_join$Difference_obesity,lambda="auto")
hist(boxcox_obesity)

Difference_obesity is now more symmetrical as well.
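"More symmetrical" can be measured rather than judged by eye from a histogram. The minimal sketch below computes the moment-based sample skewness: the third central moment divided by the second central moment raised to the power 3/2. Values near 0 indicate symmetry. The helper name `skewness_m` is hypothetical; packages such as e1071 and moments provide similar `skewness()` functions.

```r
# Moment-based sample skewness: m3 / m2^(3/2)
skewness_m <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  m2 <- mean((x - m)^2)
  m3 <- mean((x - m)^3)
  m3 / m2^(3 / 2)
}

skewness_m(c(-1, 0, 1))     # symmetric, so 0
skewness_m(c(0, 0, 0, 10))  # long right tail, so positive
```

For this analysis, one could compare `skewness_m(all_join$Difference_obesity)` with `skewness_m(boxcox_obesity)`.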

Reference

Obesity among adults by country, 1975-2016 https://www.kaggle.com/amanarora/obesity-among-adults-by-country-19752016
World Happiness Report 2015 https://www.kaggle.com/mathurinache/world-happiness-report
World Happiness Report 2016 https://www.kaggle.com/mathurinache/world-happiness-report