Saayed Alam
December 12, 2018
Observations: 155
Variables: 12
$ Country <fct> Norway, Denmark, Iceland, Switze...
$ Happiness.Rank <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1...
$ Happiness.Score <dbl> 7.537, 7.522, 7.504, 7.494, 7.46...
$ Whisker.high <dbl> 7.594445, 7.581728, 7.622030, 7....
$ Whisker.low <dbl> 7.479556, 7.462272, 7.385970, 7....
$ Economy..GDP.per.Capita. <dbl> 1.616463, 1.482383, 1.480633, 1....
$ Family <dbl> 1.533524, 1.551122, 1.610574, 1....
$ Health..Life.Expectancy. <dbl> 0.7966665, 0.7925655, 0.8335521,...
$ Freedom <dbl> 0.6354226, 0.6260067, 0.6271626,...
$ Generosity <dbl> 0.36201224, 0.35528049, 0.475540...
$ Trust..Government.Corruption. <dbl> 0.31596383, 0.40077007, 0.153526...
$ Dystopia.Residual <dbl> 2.277027, 2.313707, 2.322715, 2....
The countries are ranked on seven factors: economy, family, life expectancy, freedom, generosity, trust in government, and the dystopia residual. The seven factor values sum to the happiness score.
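As a quick check of that sum, the sketch below adds up the seven factor values shown for Norway in the data preview above and compares the total with Norway's happiness score. The names `norway_factors`, `norway_score`, and `factor_sum` are made up for this illustration, and the preview values are rounded, so only approximate agreement should be expected.

```r
# Factor values for Norway, copied from the data preview above
norway_factors <- c(
  Economy           = 1.616463,
  Family            = 1.533524,
  Life.Expectancy   = 0.7966665,
  Freedom           = 0.6354226,
  Generosity        = 0.36201224,
  Trust             = 0.31596383,
  Dystopia.Residual = 2.277027
)
norway_score <- 7.537

# The seven factors should add up to the score, up to rounding
factor_sum <- sum(norway_factors)
print(round(factor_sum, 3))
print(abs(factor_sum - norway_score) < 1e-3)
```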
Observations: 155
Variables: 10
$ Country <fct> Norway, Denmark, Iceland, Switzerland, Finla...
$ Continent <fct> Europe, Europe, Europe, Europe, Europe, Euro...
$ Happiness.Rank <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1...
$ Happiness.Score <dbl> 7.537, 7.522, 7.504, 7.494, 7.469, 7.377, 7....
$ Family <dbl> 1.533524, 1.551122, 1.610574, 1.516912, 1.54...
$ Life.Expectancy <dbl> 0.7966665, 0.7925655, 0.8335521, 0.8581313, ...
$ Freedom <dbl> 0.6354226, 0.6260067, 0.6271626, 0.6200706, ...
$ Generosity <dbl> 0.36201224, 0.35528049, 0.47554022, 0.290549...
$ Trust <dbl> 0.31596383, 0.40077007, 0.15352656, 0.367007...
$ Dystopia.Residual <dbl> 2.277027, 2.313707, 2.322715, 2.276716, 2.43...
Descriptive statistics for the happiness score and the six factors:
 Happiness.Score     Family       Life.Expectancy     Freedom
 Min.   :2.693   Min.   :0.000   Min.   :0.0000   Min.   :0.0000
 1st Qu.:4.505   1st Qu.:1.043   1st Qu.:0.3699   1st Qu.:0.3037
 Median :5.279   Median :1.254   Median :0.6060   Median :0.4375
 Mean   :5.354   Mean   :1.189   Mean   :0.5513   Mean   :0.4088
 3rd Qu.:6.101   3rd Qu.:1.414   3rd Qu.:0.7230   3rd Qu.:0.5166
 Max.   :7.537   Max.   :1.611   Max.   :0.9495   Max.   :0.6582
   Generosity         Trust          Dystopia.Residual
 Min.   :0.0000   Min.   :0.00000   Min.   :0.3779
 1st Qu.:0.1541   1st Qu.:0.05727   1st Qu.:1.5913
 Median :0.2315   Median :0.08985   Median :1.8329
 Mean   :0.2469   Mean   :0.12312   Mean   :1.8502
 3rd Qu.:0.3238   3rd Qu.:0.15330   3rd Qu.:2.1447
 Max.   :0.8381   Max.   :0.46431   Max.   :3.1175
Happiness score and happiness rank are inversely related: the higher the score, the lower (better) the rank number.
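To make the link between score and rank concrete, the sketch below recomputes the rank from the first six happiness scores in the data preview. It assumes the rank is simply the position of each score in descending order; the variable names are illustrative.

```r
# First six happiness scores, copied from the data preview above
score <- c(7.537, 7.522, 7.504, 7.494, 7.469, 7.377)

# Ranking the negated scores assigns rank 1 to the highest score
derived_rank <- rank(-score, ties.method = "min")
print(derived_rank)

# Spearman correlation between score and rank for these six rows
score_rank_cor <- cor(score, derived_rank, method = "spearman")
print(score_rank_cor)
```

Because the rank is a strictly decreasing function of the score, the rank correlation for these rows is exactly -1.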
Europe ranks the highest in happiness score.
Factors that play the most significant role in contributing to happiness score.
The correlation plot above shows that Life Expectancy and Family are strongly correlated with a continent's happiness level.
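The numbers behind a correlation plot like this can be computed with `cor()`. The sketch below is only an illustration of the mechanics: `cor_with_score` is a hypothetical helper, and `excerpt` holds just the first four rows visible in the data preview, which is far too few to reproduce or support the conclusions drawn from the full 155-row dataset.

```r
# Hypothetical helper: correlation of every other numeric column with the score
cor_with_score <- function(df, score_col = "Happiness.Score") {
  numeric_cols <- names(df)[sapply(df, is.numeric)]
  factor_cols <- setdiff(numeric_cols, score_col)
  cors <- sapply(factor_cols, function(col) {
    cor(df[[col]], df[[score_col]], use = "complete.obs")
  })
  # Strongest positive correlation first
  sort(cors, decreasing = TRUE)
}

# Four-row excerpt copied from the data preview (illustration only)
excerpt <- data.frame(
  Happiness.Score = c(7.537, 7.522, 7.504, 7.494),
  Family          = c(1.533524, 1.551122, 1.610574, 1.516912),
  Life.Expectancy = c(0.7966665, 0.7925655, 0.8335521, 0.8581313),
  Freedom         = c(0.6354226, 0.6260067, 0.6271626, 0.6200706)
)

excerpt_cors <- cor_with_score(excerpt)
print(excerpt_cors)
```

On the full data, the same helper could be applied to the numeric columns of `happiness_rank` to list the factors in order of correlation with the score.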
What about economy? To examine this question, a list of countries by GDP (PPP) per capita, published by the World Bank in 2017, is scraped from Wikipedia.
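#loading the packages used below; recode() is called with car-style strings, so car is assumed
library(rvest)
library(dplyr)
library(stringr)
library(tidyr)
library(car)
#scraping data from wikipedia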
wiki_gdp_capita <- "https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(PPP)_per_capita"
wiki_gdp_capita <- read_html(wiki_gdp_capita)
gdp_capita <- wiki_gdp_capita %>%
  html_nodes(xpath = '//*[@id="mw-content-text"]/div/table[1]/tbody/tr[2]/td[2]/table') %>%
  html_table()
#getting the data frame from the list
gdp_capita <- gdp_capita[[1]]
#renaming the column names and removing rank
gdp_capita <- gdp_capita %>%
  dplyr::rename(Country = 'Country/Territory',
                USD = 'Int$') %>%
  select(-Rank)
#removing commas from dollar amount
gdp_capita$USD <- str_replace_all(gdp_capita$USD, "[:punct:]", '') %>%
  as.numeric()
#all rows in happiness rank that do not have a match in gdp per capita
#anti_join(happiness_rank, gdp_capita) %>%
# select(Country)
happiness_rank_gdp <- happiness_rank
#recoding the countries that are spelled differently in the two datasets
#(the quoted-string syntax is car::recode's, so the calls are namespaced here)
happiness_rank_gdp$Country <- car::recode(happiness_rank_gdp$Country, "'North Cyprus' = 'Cyprus'")
happiness_rank_gdp$Country <- car::recode(happiness_rank_gdp$Country, "'Hong Kong S.A.R., China' = 'Hong Kong'")
#Kinshasa is the capital of the Democratic Republic of the Congo, Brazzaville of the Republic of the Congo
happiness_rank_gdp$Country <- car::recode(happiness_rank_gdp$Country, "'Congo (Kinshasa)' = 'Congo, Dem. Rep.'")
happiness_rank_gdp$Country <- car::recode(happiness_rank_gdp$Country, "'Congo (Brazzaville)' = 'Congo, Rep.'")
happiness_rank_gdp$Country <- car::recode(happiness_rank_gdp$Country, "'South Sudan' = 'Sudan'")
#joining both datasets on Country
happiness_rank_gdp <- left_join(happiness_rank_gdp, gdp_capita, by = "Country")
#standardizing the USD column as a percentile rank named Economy, then dropping USD
happiness_rank_gdp <- happiness_rank_gdp %>%
  mutate(Economy = percent_rank(USD)) %>%
  select(-USD) %>%
  drop_na()
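The `percent_rank()` step rescales GDP per capita to the 0 to 1 range by rank rather than by dollar value, so the new Economy column is not dominated by a few very rich countries. The sketch below reproduces the underlying formula, (rank - 1) / (n - 1), in base R. The helper name `percent_rank_base` and the dollar amounts are made up for illustration; they are not taken from the scraped table.

```r
# Hypothetical helper mirroring the percentile-rank formula: (rank - 1) / (n - 1)
percent_rank_base <- function(x) {
  # Ties share the smallest rank; missing values stay missing
  ranks <- rank(x, ties.method = "min", na.last = "keep")
  (ranks - 1) / (sum(!is.na(x)) - 1)
}

# Made-up GDP per capita values, for illustration only
usd <- c(1000, 5000, 20000, 60000, 120000)
economy <- percent_rank_base(usd)
print(economy)
```

The lowest value maps to 0 and the highest to 1, with the remaining values spaced evenly by rank regardless of how far apart the dollar amounts are.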
Economy is also correlated with happiness score, and this correlation is stronger than those of the life expectancy and family factors.
Given these factors, a multiple linear regression model should be able to predict happiness score on unseen data with high accuracy.
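One way to test that expectation is to fit the model on part of the data and measure the error on the held-out rest. The sketch below shows the mechanics only. It runs on synthetic stand-in data: the data frame `toy`, its coefficients, and the noise level are all invented for the illustration, so the printed error says nothing about the real dataset. With the real data, `toy` would be replaced by `happiness_rank_gdp`.

```r
set.seed(1)

# Synthetic stand-in data (invented values, not the happiness dataset)
n <- 155
toy <- data.frame(
  Family          = runif(n, 0, 1.6),
  Life.Expectancy = runif(n, 0, 0.95),
  Economy         = runif(n, 0, 1)
)
toy$Happiness.Score <- 2.5 + 1.2 * toy$Family + 1.5 * toy$Life.Expectancy +
  1.0 * toy$Economy + rnorm(n, sd = 0.4)

# Hold out 20% of the rows as unseen data
train_idx <- sample(n, size = floor(0.8 * n))
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

# Fit the multiple linear regression on the training rows only
fit <- lm(Happiness.Score ~ Family + Life.Expectancy + Economy, data = train)

# Root mean squared error on the held-out rows
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$Happiness.Score - pred)^2))
print(rmse)
```

A held-out error close to the training error would support the claim; a much larger one would suggest overfitting.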