# This is the R chunk for the required packages
library(readr)
library(ggplot2) # Useful for creating plots
library(dplyr) # Useful for data manipulation
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(knitr) # Useful for creating nice tables
library(kableExtra)
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
library(magrittr)
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:tidyr':
##
## extract
library(Hmisc)
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
##
## src, summarize
## The following objects are masked from 'package:base':
##
## format.pval, units
library(outliers)
library(MVN)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
## sROC 0.1-2 loaded
library(infotheo)
library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
The purpose of wrangling the datasets in this assignment is to compare the effect of happiness on obesity in two years, 2015 and 2016. To obtain a clean dataset, I worked through 7 essential stages, described below:
Data stage: There are 3 datasets: “Obesity among adults by country, 1975-2016”, “The World Happiness Report 2015” and “The World Happiness Report 2016”. In this stage, I imported all the datasets and subset them to keep only the data for 2015 and 2016. Then, I changed the column names where necessary. After that, I joined the two World Happiness Reports together (the result is named WHP_join) and renamed its columns. Finally, I joined all the datasets together into one dataset called “all_join”.
Understand stage: I checked the data type of every variable to see which conversions are needed. The conversions and factorizing themselves are done in the next stage.
Tidy & Manipulate Data 1 stage: I found two variables that needed tidying. I separated the data in each of these columns and removed the unnecessary parts. I then converted the cleaned variables to the numeric data type, transformed the data from wide format to long format, and finally factorized them.
Tidy & Manipulate Data 2 stage: From the cleaned dataset, I used mutate to create 2 new variables called “Difference_happiness” and “Difference_Obesity”.
Scan 1 stage: I counted the missing values and filtered out the affected rows so that no missing values remain in the dataset. I also checked for special values and did not find any.
Scan 2 stage: I used box plots to check for outliers. Then, I used the Mahalanobis distance method to identify specific outliers. Finally, I chose to keep all the outliers.
Transform stage: I plotted histograms to see how the variables are distributed. Then I decided to use the Box-Cox transformation to make the distributions more symmetrical.
There are three datasets used in my assignment 2:
The first one is Obesity among adults by country, 1975-2016, which was collected from https://www.kaggle.com/amanarora/obesity-among-adults-by-country-19752016. The data originally come from the WHO. This dataset has 198 observations and 127 variables and contains the percentage of the population that is obese in countries around the world from 1975 to 2016.
The second dataset is the World Happiness Report 2015, which was collected from “https://www.kaggle.com/mathurinache/world-happiness-report”. This dataset has 12 variables and 158 observations; it ranks the happiness levels of 158 countries based on 6 main factors plus a 7th factor called Dystopia (the residual).
The third dataset is the World Happiness Report 2016, which was collected from “https://www.kaggle.com/mathurinache/world-happiness-report”. It is similar in form to the second dataset but was collected in a different year.
The obesity dataset is used to analyse the relationship between obesity and happiness levels in 2015 and 2016, so only the data for these 2 years are kept and all other years are excluded.
The World Happiness Report 2015 and 2016 are used to see the difference in happiness score. The Happiness Score is the sum of the scores of the 7 factors.
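The statement that the Happiness Score is the sum of the 7 factors can be checked row by row. The following is only a minimal sketch on a toy data frame: the two rows and all their values are made up for illustration, and only the column names are taken from the column specification that `read_csv()` prints for the 2015 file.

```r
# Toy rows with made-up values (not taken from the real report).
# check.names = FALSE keeps column names that contain spaces and brackets.
toy_whp <- data.frame(
  Country = c("A", "B"),
  `Happiness Score` = c(7.2, 4.2),
  `Economy (GDP per Capita)` = c(1.2, 0.5),
  Family = c(1.3, 0.8),
  `Health (Life Expectancy)` = c(0.9, 0.4),
  Freedom = c(0.6, 0.3),
  `Trust (Government Corruption)` = c(0.4, 0.1),
  Generosity = c(0.3, 0.2),
  `Dystopia Residual` = c(2.5, 1.9),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# The 7 factor columns that are expected to add up to the score
factor_cols <- c("Economy (GDP per Capita)", "Family",
                 "Health (Life Expectancy)", "Freedom",
                 "Trust (Government Corruption)", "Generosity",
                 "Dystopia Residual")

# Sum the factor columns for each row
toy_whp$factor_sum <- rowSums(toy_whp[, factor_cols])

# Compare with a small tolerance, since published figures are rounded
toy_whp$matches <- abs(toy_whp$`Happiness Score` - toy_whp$factor_sum) < 1e-3
toy_whp[, c("Country", "Happiness Score", "factor_sum", "matches")]
```

On the real data the same check would be run on the data frame returned by `read_csv()`, before the factor columns are dropped by subsetting.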
In this stage, I will rename the columns of the datasets where necessary, and then merge all the data together after subsetting the 3 datasets. After that, I will check the head of the result.
# Read Obesity dataset and subset for year_2015 and year_2016
Obesity<-read.csv("D:/RMIT/2. Data Wrangling/Assignment 2/Dataset/Group 1/obesity among adult by countries/data.csv")
# Preview only the first 7 of the 127 columns; the remaining columns (X2014 to X1975.2) repeat the same three-column pattern for each earlier year
knitr::kable(head(Obesity[, 1:7]))
| X | X2016 | X2016.1 | X2016.2 | X2015 | X2015.1 | X2015.2 |
|---|---|---|---|---|---|---|
| | Prevalence of obesity among adults, BMI ≥ 30 (age-standardized estimate) (%) | Prevalence of obesity among adults, BMI ≥ 30 (age-standardized estimate) (%) | Prevalence of obesity among adults, BMI ≥ 30 (age-standardized estimate) (%) | Prevalence of obesity among adults, BMI ≥ 30 (age-standardized estimate) (%) | Prevalence of obesity among adults, BMI ≥ 30 (age-standardized estimate) (%) | Prevalence of obesity among adults, BMI ≥ 30 (age-standardized estimate) (%) |
| | 18+ years | 18+ years | 18+ years | 18+ years | 18+ years | 18+ years |
| Country | Both sexes | Male | Female | Both sexes | Male | Female |
| Afghanistan | 5.5 [3.4-8.1] | 3.2 [1.3-6.4] | 7.6 [4.3-12.4] | 5.2 [3.3-7.7] | 3.0 [1.3-6.0] | 7.3 [4.1-11.8] |
| Albania | 21.7 [17.0-26.7] | 21.6 [14.8-29.0] | 21.8 [15.3-28.9] | 21.1 [16.6-26.0] | 20.9 [14.4-28.1] | 21.3 [15.1-28.1] |
| Algeria | 27.4 [22.5-32.7] | 19.9 [13.6-27.1] | 34.9 [27.6-42.7] | 26.7 [21.9-31.8] | 19.2 [13.2-26.1] | 34.2 [27.1-41.7] |
# Subset Obesity dataset for year_2015 and year_2016
Obesity<-Obesity[-c(1:3), c(1,2,5)]
head(Obesity)
# Changing column name of Obesity
colnames(Obesity)<- c("Country","Obesity_2016_percentage","Obesity_2015_percentage")
head(Obesity)
# Read World Happiness Report 2015 and subset Happiness Score
WHP_2015<-read_csv("D:/RMIT/2. Data Wrangling/Assignment 2/Dataset/Group 1/World Happiness Report/2015.csv")
## Parsed with column specification:
## cols(
## Country = col_character(),
## Region = col_character(),
## `Happiness Rank` = col_double(),
## `Happiness Score` = col_double(),
## `Standard Error` = col_double(),
## `Economy (GDP per Capita)` = col_double(),
## Family = col_double(),
## `Health (Life Expectancy)` = col_double(),
## Freedom = col_double(),
## `Trust (Government Corruption)` = col_double(),
## Generosity = col_double(),
## `Dystopia Residual` = col_double()
## )
head(WHP_2015)
# Subset World Happiness Report 2015 to keep Country, Region and Happiness Score
WHP_2015<-WHP_2015[,c(1,2,4)]
head(WHP_2015)
# Read World Happiness Report 2016
WHP_2016<-read_csv("D:/RMIT/2. Data Wrangling/Assignment 2/Dataset/Group 1/World Happiness Report/2016.csv")
## Parsed with column specification:
## cols(
## Country = col_character(),
## Region = col_character(),
## `Happiness Rank` = col_double(),
## `Happiness Score` = col_double(),
## `Lower Confidence Interval` = col_double(),
## `Upper Confidence Interval` = col_double(),
## `Economy (GDP per Capita)` = col_double(),
## Family = col_double(),
## `Health (Life Expectancy)` = col_double(),
## Freedom = col_double(),
## `Trust (Government Corruption)` = col_double(),
## Generosity = col_double(),
## `Dystopia Residual` = col_double()
## )
head(WHP_2016)
# Subset World Happiness Report 2016 to keep Country and Happiness Score
WHP_2016<-WHP_2016[,c(1,4)]
head(WHP_2016)
# Joining World Happiness Report in 2015 and 2016
WHP_join<-WHP_2015 %>% left_join(WHP_2016, by="Country")
head(WHP_join,10)
# Changing column names of the joined World Happiness Report data
colnames(WHP_join)<-c("Country","Region","Score_2015","Score_2016")
head(WHP_join)
# Joining Obesity dataset and World Happiness Report datasets together
all_join<-left_join(WHP_join,Obesity, by = "Country")
head(all_join)
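A left join keeps every row of the left table and fills the columns from the right table with NA wherever the country names do not match exactly. The sketch below is a hypothetical illustration of how such unmatched countries could be listed with `anti_join()`; the country names and values in the two toy tables are invented, and only the column names follow the ones used in this report.

```r
library(dplyr)

# Invented toy tables: "Beta" is spelled differently in the two sources
toy_whp <- data.frame(
  Country = c("Alpha", "Beta", "Gamma"),
  Score_2015 = c(7.1, 5.2, 6.3),
  stringsAsFactors = FALSE
)
toy_obesity <- data.frame(
  Country = c("Alpha", "Beta Republic", "Gamma"),
  Obesity_2015_percentage = c("20.1 [16.0-24.0]", "15.3 [12.0-19.0]",
                              "25.4 [21.0-30.0]"),
  stringsAsFactors = FALSE
)

# Same kind of join as in the report: all rows of the left table are kept
joined <- left_join(toy_whp, toy_obesity, by = "Country")

# Rows of the left table that found no partner in the right table
unmatched <- anti_join(toy_whp, toy_obesity, by = "Country")
unmatched$Country
```

Countries listed this way end up with missing obesity values after the join, so checking them early shows whether a missing value comes from the source data or from a spelling difference between the sources.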
Checking the structure of the joined dataset to see its data types and dimensions
# Structure of all_join
str(all_join)
## tibble [158 x 6] (S3: tbl_df/tbl/data.frame)
## $ Country : chr [1:158] "Switzerland" "Iceland" "Denmark" "Norway" ...
## $ Region : chr [1:158] "Western Europe" "Western Europe" "Western Europe" "Western Europe" ...
## $ Score_2015 : num [1:158] 7.59 7.56 7.53 7.52 7.43 ...
## $ Score_2016 : num [1:158] 7.51 7.5 7.53 7.5 7.4 ...
## $ Obesity_2016_percentage: chr [1:158] "19.5 [16.0-23.3]" "21.9 [18.0-26.0]" "19.7 [16.2-23.3]" "23.1 [19.3-27.1]" ...
## $ Obesity_2015_percentage: chr [1:158] "19.1 [15.7-22.7]" "21.5 [17.8-25.4]" "19.3 [16.1-22.7]" "22.6 [19.0-26.4]" ...
-At this stage, the structure shows that the all_join dataset has two main data types, character and numeric. The dataset has 158 observations and 6 variables.
The 6 variables are: Country (character), Region (character), Score_2015 (numeric), Score_2016 (numeric), Obesity_2016_percentage (character), Obesity_2015_percentage (character).
However, in order to compare the difference in obesity between 2015 and 2016, the Obesity_2015_percentage and Obesity_2016_percentage columns need to be converted to the numeric data type before mutating.
As can be seen from the structure, I cannot convert them to numeric at this stage because the Obesity_2015_percentage and Obesity_2016_percentage columns need to be tidied up first. Therefore, the conversions and factorizing will be performed in the next stages.
-Firstly, I need to check the dataset to find any problems.
# Checking head
head(all_join)
As can be seen from the first 6 observations above, the columns named Obesity_2016_percentage and Obesity_2015_percentage are untidy because each cell contains an interval alongside the estimate. Before they can be converted to numeric, the columns need to be tidied and the dataset reshaped.
In this stage, I will use the separate() function to split the interval from the value in each of the two columns and then remove the interval.
Next, I will convert the Obesity_2016_percentage and Obesity_2015_percentage columns to numeric.
Then, I will factorize and label the Score_2015 and Score_2016 columns.
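The split-then-convert plan described above can be sketched on a small example first. This is a minimal illustration, not the full pipeline: the two rows reuse the Switzerland and Iceland values shown in this report, while the data frame name `toy` and the column name `Interval` are chosen only for this sketch.

```r
library(tidyr)

# Two example rows in the same "estimate [lower-upper]" layout as the joined data
toy <- data.frame(
  Country = c("Switzerland", "Iceland"),
  Obesity_2016_percentage = c("19.5 [16.0-23.3]", "21.9 [18.0-26.0]"),
  stringsAsFactors = FALSE
)

# Split at the single space: the estimate stays in the original column,
# the bracketed interval moves to a column of its own
toy <- separate(toy, Obesity_2016_percentage,
                into = c("Obesity_2016_percentage", "Interval"), sep = " ")

# Drop the interval, then convert the estimate from character to numeric
toy$Interval <- NULL
toy$Obesity_2016_percentage <- as.numeric(toy$Obesity_2016_percentage)
str(toy)
```

Splitting on a space relies on every cell containing exactly one space between the estimate and the interval; cells with a different layout would need to be handled separately.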
# Using separate function to split "Obesity_2016_percentage" and its interval
all_join %<>% separate("Obesity_2016_percentage",into= c("Obesity_2016_percentage", "Intervals_1"), sep=" ")
knitr::kable(head(all_join,10))
| Country | Region | Score_2015 | Score_2016 | Obesity_2016_percentage | Intervals_1 | Obesity_2015_percentage |
|---|---|---|---|---|---|---|
| Switzerland | Western Europe | 7.587 | 7.509 | 19.5 | [16.0-23.3] | 19.1 [15.7-22.7] |
| Iceland | Western Europe | 7.561 | 7.501 | 21.9 | [18.0-26.0] | 21.5 [17.8-25.4] |
| Denmark | Western Europe | 7.527 | 7.526 | 19.7 | [16.2-23.3] | 19.3 [16.1-22.7] |
| Norway | Western Europe | 7.522 | 7.498 | 23.1 | [19.3-27.1] | 22.6 [19.0-26.4] |
| Canada | North America | 7.427 | 7.404 | 29.4 | [25.7-33.3] | 28.8 [25.3-32.4] |
| Finland | Western Europe | 7.406 | 7.413 | 22.2 | [19.0-25.7] | 21.8 [18.8-25.1] |
| Netherlands | Western Europe | 7.378 | 7.339 | 20.4 | [16.9-24.2] | 20.0 [16.7-23.6] |
| Sweden | Western Europe | 7.364 | 7.291 | 20.6 | [17.1-24.3] | 20.2 [16.9-23.6] |
| New Zealand | Australia and New Zealand | 7.286 | 7.334 | 30.8 | [27.3-34.3] | 30.2 [26.9-33.5] |
| Australia | Australia and New Zealand | 7.284 | 7.313 | 29.0 | [25.3-32.9] | 28.4 [24.9-32.1] |
# Using separate function to split "Obesity_2015_percentage" and its interval
all_join %<>% separate("Obesity_2015_percentage",into= c("Obesity_2015_percentage", "Intervals_2"), sep=" ")
knitr::kable(head(all_join,10))
| Country | Region | Score_2015 | Score_2016 | Obesity_2016_percentage | Intervals_1 | Obesity_2015_percentage | Intervals_2 |
|---|---|---|---|---|---|---|---|
| Switzerland | Western Europe | 7.587 | 7.509 | 19.5 | [16.0-23.3] | 19.1 | [15.7-22.7] |
| Iceland | Western Europe | 7.561 | 7.501 | 21.9 | [18.0-26.0] | 21.5 | [17.8-25.4] |
| Denmark | Western Europe | 7.527 | 7.526 | 19.7 | [16.2-23.3] | 19.3 | [16.1-22.7] |
| Norway | Western Europe | 7.522 | 7.498 | 23.1 | [19.3-27.1] | 22.6 | [19.0-26.4] |
| Canada | North America | 7.427 | 7.404 | 29.4 | [25.7-33.3] | 28.8 | [25.3-32.4] |
| Finland | Western Europe | 7.406 | 7.413 | 22.2 | [19.0-25.7] | 21.8 | [18.8-25.1] |
| Netherlands | Western Europe | 7.378 | 7.339 | 20.4 | [16.9-24.2] | 20.0 | [16.7-23.6] |
| Sweden | Western Europe | 7.364 | 7.291 | 20.6 | [17.1-24.3] | 20.2 | [16.9-23.6] |
| New Zealand | Australia and New Zealand | 7.286 | 7.334 | 30.8 | [27.3-34.3] | 30.2 | [26.9-33.5] |
| Australia | Australia and New Zealand | 7.284 | 7.313 | 29.0 | [25.3-32.9] | 28.4 | [24.9-32.1] |
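The same split can be sketched on a toy tibble. The object names `toy` and `toy_split` are illustrative, and the two rows are copied from the table above. The sketch also sets `convert = TRUE`, an argument of `separate()` that converts the new columns to a suitable type in the same step. This is shown only as a possible alternative, not as the approach used in this report.

```r
library(tidyr)

# Toy data in the same "value [low-high]" layout as the obesity columns
toy <- tibble::tibble(
  Country = c("Switzerland", "Iceland"),
  Obesity_2015_percentage = c("19.1 [15.7-22.7]", "21.5 [17.8-25.4]")
)

# Split on the space; convert = TRUE turns the percentage into a number
toy_split <- separate(
  toy,
  col = "Obesity_2015_percentage",
  into = c("Obesity_2015_percentage", "Intervals_2"),
  sep = " ",
  convert = TRUE
)

str(toy_split)
```

The interval column stays as character because its bracketed text cannot be read as a number.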
# Removing intervals
all_join<-all_join[ , -c(6,8)]
head(all_join)
# Convert data type from character to numeric
all_join[ , c(5)]<-as.numeric(unlist(all_join[ , c(5)]))
## Warning: NAs introduced by coercion
all_join[ , c(6)]<-as.numeric(unlist(all_join[ , c(6)]))
## Warning: NAs introduced by coercion
str(all_join)
## tibble [158 x 6] (S3: tbl_df/tbl/data.frame)
## $ Country : chr [1:158] "Switzerland" "Iceland" "Denmark" "Norway" ...
## $ Region : chr [1:158] "Western Europe" "Western Europe" "Western Europe" "Western Europe" ...
## $ Score_2015 : num [1:158] 7.59 7.56 7.53 7.52 7.43 ...
## $ Score_2016 : num [1:158] 7.51 7.5 7.53 7.5 7.4 ...
## $ Obesity_2016_percentage: num [1:158] 19.5 21.9 19.7 23.1 29.4 22.2 20.4 20.6 30.8 29 ...
## $ Obesity_2015_percentage: num [1:158] 19.1 21.5 19.3 22.6 28.8 21.8 20 20.2 30.2 28.4 ...
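Now the Obesity_2016_percentage and Obesity_2015_percentage columns have been converted to the numeric data type and are ready for mutating in the next stages. The coercion warnings indicate that some entries could not be parsed as numbers and were set to NA; these are handled in the missing-value stage.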
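A small sketch of what the "NAs introduced by coercion" warning means, using a hypothetical vector `raw` (the placeholder string "No" is an assumption for illustration and is not taken from the dataset): any string that `as.numeric()` cannot parse becomes NA, and comparing against the original values shows which entries were affected.

```r
# Hypothetical raw values: two parseable numbers and one placeholder string
raw <- c("19.5", "21.9", "No")

# as.numeric() returns NA (with a warning) for strings it cannot parse
converted <- suppressWarnings(as.numeric(raw))

# Which original entries were lost in the conversion?
lost <- raw[is.na(converted) & !is.na(raw)]
lost

# How many NA values did the conversion produce?
sum(is.na(converted))
```

Inspecting the lost entries before discarding them helps confirm that only genuine placeholders, rather than mis-formatted numbers, were turned into NA.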
In the following steps, I will convert the dataset from wide format to long format and then factorize and label the year variable.
# Reshape the World Happiness Report scores to long format
all_join1 <- all_join %>%
  pivot_longer(3:4, names_to = "Years", values_to = "happiness_score")
knitr::kable(head(all_join1))
| Country | Region | Obesity_2016_percentage | Obesity_2015_percentage | Years | happiness_score |
|---|---|---|---|---|---|
| Switzerland | Western Europe | 19.5 | 19.1 | Score_2015 | 7.587 |
| Switzerland | Western Europe | 19.5 | 19.1 | Score_2016 | 7.509 |
| Iceland | Western Europe | 21.9 | 21.5 | Score_2015 | 7.561 |
| Iceland | Western Europe | 21.9 | 21.5 | Score_2016 | 7.501 |
| Denmark | Western Europe | 19.7 | 19.3 | Score_2015 | 7.527 |
| Denmark | Western Europe | 19.7 | 19.3 | Score_2016 | 7.526 |
# Factorise Year variable
all_join1$Years <- all_join1$Years %>% factor(levels = c("Score_2015", "Score_2016"), labels = c("2015", "2016"))
head(all_join1)
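The reshape and labelling steps can be sketched on a toy tibble (`toy` and `toy_long` are illustrative names, with scores copied from the table above). In this sketch the score columns are selected by name rather than by position, which is an assumption about style and not what the code above does; it keeps working if the column order changes.

```r
library(tidyr)

# Toy wide data: one row per country, one score column per year
toy <- tibble::tibble(
  Country = c("Switzerland", "Iceland"),
  Score_2015 = c(7.587, 7.561),
  Score_2016 = c(7.509, 7.501)
)

# Wide to long: one row per country and year
toy_long <- pivot_longer(
  toy,
  cols = c(Score_2015, Score_2016),
  names_to = "Years",
  values_to = "happiness_score"
)

# Turn the former column names into a labelled factor
toy_long$Years <- factor(
  toy_long$Years,
  levels = c("Score_2015", "Score_2016"),
  labels = c("2015", "2016")
)

levels(toy_long$Years)
nrow(toy_long)
```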
# Mutating the combined dataset to see the changes between 2015 and 2016
all_join<-mutate(all_join,Difference_hapiness= Score_2016 - Score_2015,
Difference_obesity=(Obesity_2016_percentage - Obesity_2015_percentage) )
knitr::kable(head(all_join))
| Country | Region | Score_2015 | Score_2016 | Obesity_2016_percentage | Obesity_2015_percentage | Difference_hapiness | Difference_obesity |
|---|---|---|---|---|---|---|---|
| Switzerland | Western Europe | 7.587 | 7.509 | 19.5 | 19.1 | -0.078 | 0.4 |
| Iceland | Western Europe | 7.561 | 7.501 | 21.9 | 21.5 | -0.060 | 0.4 |
| Denmark | Western Europe | 7.527 | 7.526 | 19.7 | 19.3 | -0.001 | 0.4 |
| Norway | Western Europe | 7.522 | 7.498 | 23.1 | 22.6 | -0.024 | 0.5 |
| Canada | North America | 7.427 | 7.404 | 29.4 | 28.8 | -0.023 | 0.6 |
| Finland | Western Europe | 7.406 | 7.413 | 22.2 | 21.8 | 0.007 | 0.4 |
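As a quick arithmetic check of the two difference columns, the sketch below recomputes the first two rows by hand. The vector names are illustrative, and the values are copied from the table above.

```r
# Values copied from the first two rows of the table above
score_2015 <- c(Switzerland = 7.587, Iceland = 7.561)
score_2016 <- c(Switzerland = 7.509, Iceland = 7.501)
obesity_2015 <- c(Switzerland = 19.1, Iceland = 21.5)
obesity_2016 <- c(Switzerland = 19.5, Iceland = 21.9)

# Negative values mean the measure fell from 2015 to 2016
diff_happiness <- round(score_2016 - score_2015, 3)
diff_obesity <- round(obesity_2016 - obesity_2015, 1)

diff_happiness
diff_obesity
```

For both countries the happiness score fell slightly while the obesity percentage rose by 0.4 points, matching the table.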
In this stage, I will scan the Difference_hapiness and Difference_obesity columns for missing values. Applying sum() to the result of is.na() gives the number of missing values in each variable.
If there are any missing values, I will exclude the observations that contain them. I do not replace missing values with the mean, median or mode because each country has its own value, and that value changes every year depending on the country’s circumstances. For example, during the Covid-19 period the happiness score would be significantly different, so a missing value cannot be reliably predicted or replaced.
# Scanning for missing values in the happiness and obesity differences
sum(is.na(all_join$Difference_hapiness))
## [1] 7
sum(is.na(all_join$Difference_obesity))
## [1] 25
Now we can see that the “Difference_hapiness” column has 7 missing values, while the “Difference_obesity” column contains 25 missing values.
In the next step, I will remove the rows with missing values by applying the complete.cases() method to each variable.
# Excluding missing data
all_join<-all_join[complete.cases(all_join$Difference_hapiness),]
all_join<-all_join[complete.cases(all_join$Difference_obesity),]
sum(is.na(all_join))
## [1] 0
The dataset now has no missing values.
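The row filter can be sketched in base R on a toy data frame (`toy`, `keep` and `toy_complete` are illustrative names, and the values are made up). Here `complete.cases()` is applied to both difference columns at once, which should give the same rows as filtering on each column in turn.

```r
# Toy data frame with missing values in either difference column
toy <- data.frame(
  Country = c("A", "B", "C", "D"),
  Difference_hapiness = c(0.1, NA, -0.2, 0.3),
  Difference_obesity = c(0.4, 0.5, NA, 0.6),
  stringsAsFactors = FALSE
)

# TRUE only for rows with no NA in either difference column
keep <- complete.cases(toy[, c("Difference_hapiness", "Difference_obesity")])
toy_complete <- toy[keep, ]

toy_complete$Country
sum(is.na(toy_complete))
```

Rows B and C are dropped because each has an NA in one of the two columns.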
In the following steps, I will create a function to check for infinite and NaN values. Then, I will use a function from the apply family to apply it to the dataset.
# Create a function to flag special values (infinite or NaN) in numeric input
is.special <- function(x){
  if (is.numeric(x)) (is.infinite(x) | is.nan(x)) else rep(FALSE, length(x))
}
# Applying function
sum(sapply(all_join$Difference_hapiness, is.special))
## [1] 0
sum(sapply(all_join$Difference_obesity, is.special))
## [1] 0
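A small sketch of what the check looks for, using a made-up vector `x`: `is.infinite()` and `is.nan()` are both vectorised, so they can be applied to a whole column directly. The last line shows one way such values can arise.

```r
# Toy vector containing one infinite and one NaN value
x <- c(0.4, Inf, NaN, -0.078)

# Flag infinite or NaN entries across the whole vector at once
special <- is.infinite(x) | is.nan(x)
sum(special)

# Dividing by zero is a common source of such values
c(1, 0, -1) / 0
```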
In this stage, because there are two variables, I will use a scatter plot together with a multivariate approach (Mahalanobis distance from the MVN package) to detect any outliers.
# Scatter plot to scan for outliers
all_join %>% plot(Difference_obesity ~ Difference_hapiness, data = ., main = "Relationship of Happiness levels and Obesity",
  xlab = "Happiness", ylab = "Obesity")
# Subsetting data for outlier check
Happiness_Obesity<-all_join%>%select(Difference_hapiness,Difference_obesity)
head(Happiness_Obesity)
# Multivariate outlier detection using Mahalanobis distance with QQ plots
results <- mvn(data = Happiness_Obesity, multivariateOutlierMethod = "quan", showOutliers = TRUE)
results$multivariateOutliers
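To illustrate the idea behind distance-based outlier detection, the sketch below uses the classical Mahalanobis distance from base R on simulated data. It is a simplified stand-in and not the method that `mvn()` applies with `multivariateOutlierMethod = "quan"`. The toy data, the planted outlier in row 51 and the 97.5% chi-square cutoff are all assumptions made for illustration.

```r
set.seed(1)

# Toy bivariate data: 50 ordinary points plus one planted outlier in row 51
toy <- data.frame(
  Difference_hapiness = c(rnorm(50), 10),
  Difference_obesity = c(rnorm(50), 10)
)
toy_matrix <- as.matrix(toy)

# Squared Mahalanobis distance of each row from the sample centre
d2 <- mahalanobis(toy_matrix, center = colMeans(toy_matrix), cov = cov(toy_matrix))

# Under bivariate normality d2 is approximately chi-square with 2 df
cutoff <- qchisq(0.975, df = ncol(toy_matrix))
flagged <- which(d2 > cutoff)
flagged
```

Because the outlier itself inflates the sample mean and covariance, the classical distance can mask outliers in real data, which is one reason robust estimates are often preferred.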
In this stage, I need to check the distribution of the Difference_hapiness and Difference_obesity variables and apply a Box-Cox transformation to each.
# Checking distribution for Difference_hapiness variable
all_join$Difference_hapiness %>% hist(col = "grey", xlab = "Happiness", main = "Histogram of difference in happiness")
# Applying BoxCox transformation
boxcox_happiness<-BoxCox(all_join$Difference_hapiness, lambda="auto")
hist(boxcox_happiness)
# Checking distribution for Difference_obesity variable
all_join$Difference_obesity %>% hist(col="grey",xlab = "Obesity",main = "Histogram of difference in Obesity")
# Applying BoxCox transformation
boxcox_obesity<-BoxCox(all_join$Difference_obesity,lambda="auto")
hist(boxcox_obesity)
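For reference, the classical Box-Cox transform can be written by hand in a few lines. The helper `box_cox()` below is a hypothetical function for illustration and is not the `forecast::BoxCox()` implementation. The classical definition assumes strictly positive input, so the sketch uses a positive toy vector; how `forecast::BoxCox()` treats the negative differences in this dataset is not shown here.

```r
# Classical Box-Cox transform for strictly positive x (illustrative helper)
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

# Toy right-skewed positive values (not taken from the dataset)
x <- c(1, 2, 4, 8, 16)

box_cox(x, lambda = 0)    # log transform
box_cox(x, lambda = 0.5)  # square-root-like transform
box_cox(x, lambda = 1)    # shifts the data by -1, shape unchanged
```

Smaller values of lambda compress the right tail more strongly, which is why the transform is commonly tried on right-skewed data.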
Obesity among adults by country, 1975-2016 https://www.kaggle.com/amanarora/obesity-among-adults-by-country-19752016
World Happiness Report 2015 https://www.kaggle.com/mathurinache/world-happiness-report
World Happiness Report 2016 https://www.kaggle.com/mathurinache/world-happiness-report