Saayed-Project

#loading libraries
library(tidyverse)
library(ggthemes)
library(corrplot)
library(RColorBrewer)
library(Hmisc)
library(ggpubr)

Part 1 - Introduction

According to a new report by the Centers for Disease Control and Prevention (CDC), the suicide rate in the United States is the highest it has been in decades. This led to a question: what makes a country happy? The purpose of this project is to determine which factors are most important for living a happier life, and whether those factors differ in how much they contribute to happiness. The findings could help people and countries focus on the factors needed to attain a higher level of happiness, or at least push them in the right direction.

Part 2 - Data

The dataset comes from the Sustainable Development Solutions Network, a United Nations initiative to promote sustainable development around the world. The data, from the 2017 World Happiness Report, has 155 cases, one per country, ranked by happiness level. Each country's happiness score is explained by six variables: family, life expectancy, economy, generosity, trust in government and freedom. All of the variables are numerical. Together with a dystopia residual (the Dystopia.Residual column, which is dropped during cleaning below), these six variables add up to the happiness score.
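In the raw 2017 file, each country's happiness score should equal, up to rounding, the six factor contributions plus the `Dystopia.Residual` column. Below is a minimal sketch for checking that decomposition. `check_score_sum()` is a hypothetical helper that assumes the renamed column names used in the cleaning step. The toy row is made up and only shows the call.

```r
# Hypothetical helper: do the six factors plus the dystopia residual
# reproduce the happiness score (within a small tolerance)?
check_score_sum <- function(df,
                            factors = c("Economy", "Family", "Life.Expectancy",
                                        "Freedom", "Generosity", "Trust"),
                            residual = "Dystopia.Residual",
                            score = "Happiness.Score",
                            tol = 1e-3) {
  total <- rowSums(df[, c(factors, residual)])
  # TRUE for each row whose parts add up to the score
  abs(total - df[[score]]) < tol
}

# Illustrative toy row with made-up numbers (not taken from the dataset)
toy <- data.frame(Economy = 1.0, Family = 1.2, Life.Expectancy = 0.6,
                  Freedom = 0.4, Generosity = 0.2, Trust = 0.1,
                  Dystopia.Residual = 2.0, Happiness.Score = 5.5)
check_score_sum(toy)
```

On the project data, run this after the `rename()` step and before `Dystopia.Residual` is removed. `all(...)` should then return TRUE.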

This is an observational study. The population of interest is all of the world's countries and the people living in them, and the sample consists of 155 countries.

The results of the study should be interpreted with caution. This is an observational study, and the data comes from the Gallup World Poll. Respondents are asked to imagine a ladder on which the best possible life for them is a 10 and the worst possible life is a 0, and then to rate their own current lives on that scale. Because the scores are self-reported answers rather than the outcome of a controlled experiment, these data cannot be used to establish causal links between the variables of interest. Nonetheless, the report continues to gain global recognition as governments, organizations and civil society increasingly use happiness indicators to inform their policy-making decisions.

#loading data
happiness_rank <- read.csv("https://raw.githubusercontent.com/saayedalam/Data/master/happiness_rank_2017.csv")

#renaming some columns for readability,
#removing columns that are not needed for the purpose of this project,
#and creating a new Continent variable for analysis
happiness_rank <- happiness_rank %>%
  dplyr::rename(Life.Expectancy = Health..Life.Expectancy.,
                Economy = Economy..GDP.per.Capita.,
                Trust = Trust..Government.Corruption.) %>%
  select(-Whisker.high, -Whisker.low, -Dystopia.Residual, -Happiness.Rank) %>%
  mutate(Continent = case_when(
    Country %in% c("Israel", "United Arab Emirates", "Singapore", "Thailand", "Taiwan Province of China", "Qatar", "Saudi Arabia", "Kuwait", "Bahrain", "Malaysia", "Uzbekistan", "Japan", "South Korea", "Turkmenistan",
                   "Kazakhstan", "Turkey", "Hong Kong S.A.R., China", "Philippines", "Jordan", "China", "Pakistan", "Indonesia", "Azerbaijan", "Lebanon", "Vietnam", "Tajikistan", "Bhutan", "Kyrgyzstan", "Nepal", "Mongolia",
                   "Palestinian Territories", "Iran", "Bangladesh", "Myanmar", "Iraq", "Sri Lanka", "Armenia", "India", "Georgia", "Cambodia", "Afghanistan", "Yemen", "Syria") ~ "Asia",
    Country %in%  c("Norway", "Denmark", "Iceland", "Switzerland", "Finland", "Netherlands", "Sweden", "Austria", "Ireland", "Germany", "Belgium", "Luxembourg", "United Kingdom", "Czech Republic", "Malta", "France", 
                    "Spain","Slovakia", "Poland", "Italy", "Russia", "Lithuania", "Latvia", "Moldova", "Romania", "Slovenia", "North Cyprus", "Cyprus", "Estonia", "Belarus", "Serbia", "Hungary", "Croatia", "Kosovo",
                    "Montenegro", "Greece", "Portugal", "Bosnia and Herzegovina", "Macedonia", "Bulgaria", "Albania", "Ukraine") ~ "Europe",
    Country %in%  c("Canada", "Costa Rica", "United States", "Mexico", "Panama","Trinidad and Tobago", "El Salvador", "Belize", "Guatemala", "Jamaica", "Nicaragua", "Dominican Republic", "Honduras", "Haiti") ~ "North America",
    Country %in%  c("Chile", "Brazil", "Argentina", "Uruguay", "Colombia", "Ecuador", "Bolivia", "Peru", "Paraguay", "Venezuela") ~ "South America",
    Country %in%  c("New Zealand", "Australia") ~ "Australia",
    TRUE ~ "Africa")) %>%
  mutate(Continent = as.factor(Continent)) %>%
  select(Country, Continent, everything()) 

glimpse(happiness_rank)

## Observations: 155
## Variables: 9
## $ Country         <fct> Norway, Denmark, Iceland, Switzerland, Finland...
## $ Continent       <fct> Europe, Europe, Europe, Europe, Europe, Europe...
## $ Happiness.Score <dbl> 7.537, 7.522, 7.504, 7.494, 7.469, 7.377, 7.31...
## $ Economy         <dbl> 1.616463, 1.482383, 1.480633, 1.564980, 1.4435...
## $ Family          <dbl> 1.533524, 1.551122, 1.610574, 1.516912, 1.5402...
## $ Life.Expectancy <dbl> 0.7966665, 0.7925655, 0.8335521, 0.8581313, 0....
## $ Freedom         <dbl> 0.6354226, 0.6260067, 0.6271626, 0.6200706, 0....
## $ Generosity      <dbl> 0.36201224, 0.35528049, 0.47554022, 0.29054928...
## $ Trust           <dbl> 0.31596383, 0.40077007, 0.15352656, 0.36700729...
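The final `TRUE ~ "Africa"` branch catches every name that is not in one of the lists. As a result, a misspelled or missing country would silently be labelled Africa. One way to guard against this is to give Africa its own list and let unmatched names fall through to `NA`, then inspect them. The sketch below uses hypothetical, shortened country lists for illustration only.

```r
library(dplyr)

# Hypothetical, shortened country lists (illustration only)
europe <- c("Norway", "Denmark")
africa <- c("Nigeria", "Kenya")

countries <- tibble(Country = c("Norway", "Kenya", "Untied States"))  # deliberate typo

labelled <- countries %>%
  mutate(Continent = case_when(
    Country %in% europe ~ "Europe",
    Country %in% africa ~ "Africa",
    TRUE ~ NA_character_  # unmatched names stay NA instead of becoming "Africa"
  ))

# Any row returned here is a name that matched no list and needs attention
filter(labelled, is.na(Continent))
```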

Part 3 - Exploratory data analysis

Descriptive statistics for the six factors and the happiness score, computed with describe() from the Hmisc package:

happiness_rank %>%
  select(-Country, -Continent) %>%
  Hmisc::describe()

## . 
## 
##  7  Variables      155  Observations
## ---------------------------------------------------------------------------
## Happiness.Score 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      155        0      151        1    5.354    1.301    3.574    3.800 
##      .25      .50      .75      .90      .95 
##    4.506    5.279    6.102    6.927    7.293 
## 
## lowest : 2.693 2.905 3.349 3.462 3.471, highest: 7.469 7.494 7.504 7.522 7.537
## ---------------------------------------------------------------------------
## Economy 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      155        0      155        1   0.9847   0.4802   0.2415   0.3687 
##      .25      .50      .75      .90      .95 
##   0.6634   1.0646   1.3180   1.4860   1.5479 
## 
## lowest : 0.00000000 0.02264318 0.09162257 0.09210235 0.11904179
## highest: 1.62634337 1.63295245 1.69227767 1.74194360 1.87076569
## ---------------------------------------------------------------------------
## Family 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      155        0      155        1    1.189   0.3106   0.6213   0.7814 
##      .25      .50      .75      .90      .95 
##   1.0426   1.2539   1.4143   1.4856   1.5215 
## 
## lowest : 0.0000000 0.3961026 0.4318825 0.4352998 0.5125688
## highest: 1.5481951 1.5489691 1.5511216 1.5582311 1.6105740
## ---------------------------------------------------------------------------
## Life.Expectancy 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      155        0      155        1   0.5513   0.2677   0.1118   0.1925 
##      .25      .50      .75      .90      .95 
##   0.3699   0.6060   0.7230   0.8273   0.8448 
## 
## lowest : 0.000000000 0.005564754 0.018772686 0.041134715 0.048642170
## highest: 0.888960600 0.900214076 0.913475871 0.943062425 0.949492395
## ---------------------------------------------------------------------------
## Freedom 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      155        0      155        1   0.4088   0.1691   0.1179   0.2007 
##      .25      .50      .75      .90      .95 
##   0.3037   0.4375   0.5166   0.5874   0.6133 
## 
## lowest : 0.00000000 0.01499586 0.03036986 0.05990075 0.08153944
## highest: 0.62600672 0.62716264 0.63337582 0.63542259 0.65824866
## ---------------------------------------------------------------------------
## Generosity 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      155        0      155        1   0.2469   0.1482  0.05149  0.08534 
##      .25      .50      .75      .90      .95 
##  0.15411  0.23154  0.32376  0.42829  0.48970 
## 
## lowest : 0.00000000 0.01016466 0.02880684 0.03220996 0.04378538
## highest: 0.50000513 0.57212311 0.57473058 0.61170459 0.83807516
## ---------------------------------------------------------------------------
## Trust 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      155        0      155        1   0.1231   0.1047  0.02072  0.03213 
##      .25      .50      .75      .90      .95 
##  0.05727  0.08985  0.15330  0.28256  0.33724 
## 
## lowest : 0.000000000 0.004387901 0.008964816 0.010091286 0.011051531
## highest: 0.384398729 0.400770068 0.439299256 0.455220014 0.464307785
## ---------------------------------------------------------------------------
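The `Gmd` column in the output above is Gini's mean difference. It measures spread as the average absolute difference between two distinct observations. Below is a base R sketch of that definition. `gini_md()` is a hypothetical helper, and Hmisc's own `GiniMd()` should agree with it.

```r
# Gini's mean difference: average |x_i - x_j| over all pairs with i != j
gini_md <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  # outer() builds the n x n matrix of pairwise differences; its diagonal is 0
  sum(abs(outer(x, x, "-"))) / (n * (n - 1))
}

gini_md(c(1, 2, 3))  # (1 + 2 + 1) * 2 / 6 = 4/3
```

For example, `gini_md(happiness_rank$Happiness.Score)` should come out close to the 1.301 reported for Happiness.Score.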

A correlation plot shows that Economy, Life Expectancy and Family are the factors most strongly correlated with a country's happiness score.

happiness_corr <- cor(happiness_rank[c(3:9)])
corrplot(happiness_corr, method = "pie", type = "upper", order = "FPC",
         col = brewer.pal(n = 7, name = "GnBu"),
         tl.col = "black", cl.align = "r", cl.ratio = 0.3)
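The plot ranks the factors visually. The same information can be read off numerically by sorting each factor's correlation with `Happiness.Score`. The helper below, `rank_correlations()`, is a hypothetical sketch. The toy data is made up and only demonstrates the call. On the project data the call would be `rank_correlations(happiness_rank)`, and Country and Continent would be skipped because they are not numeric.

```r
# Sort the Pearson correlations of every numeric column with a target column
rank_correlations <- function(df, target = "Happiness.Score") {
  num <- df[sapply(df, is.numeric)]   # keep numeric columns only
  r <- cor(num)[, target]             # correlation of each column with the target
  sort(r[names(r) != target], decreasing = TRUE)
}

# Toy data (made up) just to show the call
toy <- data.frame(
  Happiness.Score = c(3, 4, 5, 6, 7),
  Economy         = c(0.2, 0.6, 0.9, 1.2, 1.5),
  Generosity      = c(0.3, 0.1, 0.4, 0.2, 0.3)
)
rank_correlations(toy)
```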

Part 4 - Inference

The sample meets the conditions for inference, i.e., the observations are independent and the sample size is large enough. I will use one-way analysis of variance (ANOVA) to compare the means of the six factors.
Null hypothesis: the means of the six factors are all equal.
Alternative hypothesis: at least one factor's mean differs from the others.
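Written in symbols, let each $\mu$ denote the population mean of one factor's contribution. With $k = 6$ factors and $N = 6 \times 155 = 930$ scores, the test has 5 and 924 degrees of freedom, which are the values that appear in the ANOVA table.

```latex
\begin{aligned}
H_0 &: \mu_{\text{Economy}} = \mu_{\text{Family}} = \mu_{\text{Life.Expectancy}}
      = \mu_{\text{Freedom}} = \mu_{\text{Generosity}} = \mu_{\text{Trust}} \\
H_A &: \mu_i \neq \mu_j \ \text{for at least one pair of factors } (i, j) \\
F &= \frac{SS_{\text{Factors}} / (k - 1)}{SS_{\text{Residuals}} / (N - k)},
      \qquad k - 1 = 5, \quad N - k = 924
\end{aligned}
```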

#reshaping to long format for analysis: one row per country and factor
test_anova <- happiness_rank %>%
  gather(Factors, Score, -c("Country", "Continent", "Happiness.Score")) %>%
  select(-Country, -Happiness.Score)

#statistical analysis
group_by(test_anova, Factors) %>%
  summarise(
    count = n(),
    mean = mean(Score, na.rm = TRUE),
    sd = sd(Score, na.rm = TRUE)
  )

## # A tibble: 6 x 4
##   Factors         count  mean    sd
##   <chr>           <int> <dbl> <dbl>
## 1 Economy           155 0.985 0.421
## 2 Family            155 1.19  0.287
## 3 Freedom           155 0.409 0.150
## 4 Generosity        155 0.247 0.135
## 5 Life.Expectancy   155 0.551 0.237
## 6 Trust             155 0.123 0.102
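The reshaping step above uses `gather()`. In tidyr 1.0.0 and later, `gather()` is marked as superseded in favour of `pivot_longer()`. The sketch below shows the equivalent call. It assumes a recent tidyr and uses a made-up two-country data frame with the same column layout as `happiness_rank`.

```r
library(tidyr)

# Toy wide data with the same column layout as happiness_rank (values made up)
wide <- data.frame(
  Country = c("A", "B"), Continent = c("Europe", "Asia"), Happiness.Score = c(7, 5),
  Economy = c(1.5, 0.9), Family = c(1.5, 1.1), Life.Expectancy = c(0.8, 0.6),
  Freedom = c(0.6, 0.4), Generosity = c(0.3, 0.2), Trust = c(0.3, 0.1)
)

# One row per country and factor, keeping Country, Continent and Happiness.Score
long <- pivot_longer(wide,
                     cols = -c(Country, Continent, Happiness.Score),
                     names_to = "Factors", values_to = "Score")
```

Each country contributes six rows, one per factor. This matches what the `gather()` call produces before `select()` drops Country and Happiness.Score.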

#visualizing the data
ggboxplot(test_anova, x = "Factors", y = "Score", 
          color = "Factors", palette = brewer.pal(n = 6, name = "Dark2"),
          ylab = "Score", xlab = "Factors of Happiness Score")
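One-way ANOVA also assumes roughly equal variances across groups. A common rule of thumb is that the largest group standard deviation should be no more than about twice the smallest. Using the standard deviations from the summary table, the ratio is about 4. The sketch below computes that check. It then calls Welch's one-way ANOVA (`oneway.test()` with `var.equal = FALSE` in base R), a swapped-in alternative that does not assume equal variances. The toy groups are simulated and only illustrate the call.

```r
# Standard deviations copied from the summary table above
sds <- c(Economy = 0.421, Family = 0.287, Freedom = 0.150,
         Generosity = 0.135, Life.Expectancy = 0.237, Trust = 0.102)
sd_ratio <- max(sds) / min(sds)   # 0.421 / 0.102, roughly 4.1

# Welch's one-way ANOVA (stats package) does not assume equal variances.
# On the project data the call would be:
#   oneway.test(Score ~ Factors, data = test_anova, var.equal = FALSE)
set.seed(1)
toy <- data.frame(
  Factors = factor(rep(c("a", "b", "c"), each = 20)),
  Score   = c(rnorm(20, 1.0, 0.4), rnorm(20, 0.5, 0.1), rnorm(20, 0.2, 0.1))
)
welch <- oneway.test(Score ~ Factors, data = toy, var.equal = FALSE)
welch$p.value
```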

#computing the analysis of variance
res.aov <- aov(Score ~ Factors, data = test_anova)

#summary of the analysis
summary(res.aov)

##              Df Sum Sq Mean Sq F value Pr(>F)    
## Factors       5 137.07  27.413   448.4 <2e-16 ***
## Residuals   924  56.49   0.061                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
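The F value in the table is the ratio of the two mean squares. The p-value is the upper tail of the F distribution with 5 and 924 degrees of freedom. The sketch below recomputes both from the rounded sums of squares in the table.

```r
# Values copied from the ANOVA table above
ss_factors <- 137.07; df_factors <- 5    # k - 1 with k = 6 factors
ss_resid   <- 56.49;  df_resid   <- 924  # N - k with N = 6 * 155

ms_factors <- ss_factors / df_factors    # mean square between factors
ms_resid   <- ss_resid / df_resid        # mean square within factors (residual)
f_stat     <- ms_factors / ms_resid      # about 448.4
p_value    <- pf(f_stat, df_factors, df_resid, lower.tail = FALSE)
```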

Since the p-value is extremely low (< 2e-16), we reject the null hypothesis and conclude that the six factors do not all have the same mean. In other words, the factors differ in how much they contribute, on average, to a country's overall happiness score.
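A significant F test only says that at least one mean differs. It does not say which ones. Tukey's honest significant difference test compares every pair of groups and is available as `TukeyHSD()` in base R's stats package. On the project data the call would be `TukeyHSD(res.aov)`, which gives 15 pairwise comparisons for the six factors. The sketch below runs it on simulated toy data with three groups.

```r
# Simulated toy data with three groups (not the project data)
set.seed(42)
toy <- data.frame(
  Factors = factor(rep(c("Economy", "Family", "Trust"), each = 30)),
  Score   = c(rnorm(30, 1.0, 0.3), rnorm(30, 1.2, 0.3), rnorm(30, 0.1, 0.3))
)
toy_aov <- aov(Score ~ Factors, data = toy)
tukey <- TukeyHSD(toy_aov)

# One row per pair of factors: difference in means, 95% CI and adjusted p-value
tukey$Factors
```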

Part 5 - Conclusion

Based on the UN data, the statistical analysis and the inference test, the six factors differ in how much they contribute, on average, to a country's happiness score. The correlation analysis shows that at least three of the factors, i.e. economy, life expectancy and family, are strongly associated with a higher level of happiness. Trust in government and generosity, on the other hand, are only weakly correlated with the happiness score and say little about a country's happiness. For future research, one could collect independent data on other factors, such as marriage and divorce rates, to better explain the happiness score.

References

One-Way ANOVA Test in R