#loading libraries
library(tidyverse)
library(ggthemes)
library(corrplot)
library(RColorBrewer)
library(Hmisc)
library(ggpubr)
According to a report by the Centers for Disease Control and Prevention (CDC), the suicide rate in the United States is the highest it has been in decades. This raises a question: what makes a country happy? The purpose of this project is to determine which factors matter most for living a happier life and whether there is a difference among those factors. The findings could help people and countries focus on the factors needed to attain a higher level of happiness, or at least push them in the right direction.
The dataset comes from the Sustainable Development Solutions Network, an initiative by the United Nations to promote sustainable development around the world. The report was first published in 2012; the 2017 edition used here has 155 cases, each a country with its happiness level. The ranking of the countries is based on six variables: family, life expectancy, economy, generosity, trust in government and freedom. The variables are all numerical. Together with a residual term (the Dystopia.Residual column, which is dropped during cleaning), they sum to the happiness score.
This is an observational study. The population of interest is all people worldwide, and the sample consists of 155 countries.
The results of the study cannot be generalized to the population because this is an observational study and the data come from the Gallup World Poll. Respondents are asked to imagine a ladder on which the best possible life for them is a 10 and the worst possible life is a 0, and to rate their own current lives on that scale. Since these are self-reported answers, the data cannot be used to establish causal links between the variables of interest. Nonetheless, the report continues to gain global recognition as governments, organizations and civil society increasingly use happiness indicators to inform their policy-making decisions.
#loading data
happiness_rank <- read.csv("https://raw.githubusercontent.com/saayedalam/Data/master/happiness_rank_2017.csv")
#renaming some columns for readability
#removing some columns because they are not needed for the purpose of this project
#creating a new variable for analysis
happiness_rank <- happiness_rank %>%
  dplyr::rename(Life.Expectancy = Health..Life.Expectancy.,
                Economy = Economy..GDP.per.Capita.,
                Trust = Trust..Government.Corruption.) %>%
  select(-Whisker.high, -Whisker.low, -Dystopia.Residual, -Happiness.Rank) %>%
  mutate(Continent = case_when(
    Country %in% c("Israel", "United Arab Emirates", "Singapore", "Thailand", "Taiwan Province of China", "Qatar", "Saudi Arabia", "Kuwait", "Bahrain", "Malaysia", "Uzbekistan", "Japan", "South Korea", "Turkmenistan",
                   "Kazakhstan", "Turkey", "Hong Kong S.A.R., China", "Philippines", "Jordan", "China", "Pakistan", "Indonesia", "Azerbaijan", "Lebanon", "Vietnam", "Tajikistan", "Bhutan", "Kyrgyzstan", "Nepal", "Mongolia",
                   "Palestinian Territories", "Iran", "Bangladesh", "Myanmar", "Iraq", "Sri Lanka", "Armenia", "India", "Georgia", "Cambodia", "Afghanistan", "Yemen", "Syria") ~ "Asia",
    Country %in% c("Norway", "Denmark", "Iceland", "Switzerland", "Finland", "Netherlands", "Sweden", "Austria", "Ireland", "Germany", "Belgium", "Luxembourg", "United Kingdom", "Czech Republic", "Malta", "France",
                   "Spain", "Slovakia", "Poland", "Italy", "Russia", "Lithuania", "Latvia", "Moldova", "Romania", "Slovenia", "North Cyprus", "Cyprus", "Estonia", "Belarus", "Serbia", "Hungary", "Croatia", "Kosovo",
                   "Montenegro", "Greece", "Portugal", "Bosnia and Herzegovina", "Macedonia", "Bulgaria", "Albania", "Ukraine") ~ "Europe",
    Country %in% c("Canada", "Costa Rica", "United States", "Mexico", "Panama", "Trinidad and Tobago", "El Salvador", "Belize", "Guatemala", "Jamaica", "Nicaragua", "Dominican Republic", "Honduras", "Haiti") ~ "North America",
    Country %in% c("Chile", "Brazil", "Argentina", "Uruguay", "Colombia", "Ecuador", "Bolivia", "Peru", "Paraguay", "Venezuela") ~ "South America",
    Country %in% c("New Zealand", "Australia") ~ "Australia",
    TRUE ~ "Africa")) %>%
  mutate(Continent = as.factor(Continent)) %>%
  select(Country, Continent, everything())
glimpse(happiness_rank)
## Observations: 155
## Variables: 9
## $ Country <fct> Norway, Denmark, Iceland, Switzerland, Finland...
## $ Continent <fct> Europe, Europe, Europe, Europe, Europe, Europe...
## $ Happiness.Score <dbl> 7.537, 7.522, 7.504, 7.494, 7.469, 7.377, 7.31...
## $ Economy <dbl> 1.616463, 1.482383, 1.480633, 1.564980, 1.4435...
## $ Family <dbl> 1.533524, 1.551122, 1.610574, 1.516912, 1.5402...
## $ Life.Expectancy <dbl> 0.7966665, 0.7925655, 0.8335521, 0.8581313, 0....
## $ Freedom <dbl> 0.6354226, 0.6260067, 0.6271626, 0.6200706, 0....
## $ Generosity <dbl> 0.36201224, 0.35528049, 0.47554022, 0.29054928...
## $ Trust <dbl> 0.31596383, 0.40077007, 0.15352656, 0.36700729...
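As a quick sanity check on how the six factor columns relate to Happiness.Score, the sketch below sums the factor values for Norway, copied from the first entries of the glimpse() output above. The six values add up to less than the score. The gap is presumably the share carried by the Dystopia.Residual column removed during cleaning; that reading is an assumption here, since the residual values themselves are not shown.

```r
# Factor values for Norway, copied from the first entries of glimpse() above
norway <- c(
  Economy         = 1.616463,
  Family          = 1.533524,
  Life.Expectancy = 0.7966665,
  Freedom         = 0.6354226,
  Generosity      = 0.36201224,
  Trust           = 0.31596383
)
norway_score <- 7.537

factor_sum <- sum(norway)
# Part of the score not covered by the six factors
# (assumed to correspond to the dropped Dystopia.Residual column)
gap <- norway_score - factor_sum

print(round(c(factor_sum = factor_sum, gap = gap), 3))
```

Running the same check on every row of the raw file, before Dystopia.Residual is dropped, would confirm whether the relationship holds for all countries.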
The Hmisc package provides a descriptive statistical summary of the six factors and the happiness score.
happiness_rank %>%
  select(-Country, -Continent) %>%
  Hmisc::describe()
## .
##
## 7 Variables 155 Observations
## ---------------------------------------------------------------------------
## Happiness.Score
## n missing distinct Info Mean Gmd .05 .10
## 155 0 151 1 5.354 1.301 3.574 3.800
## .25 .50 .75 .90 .95
## 4.506 5.279 6.102 6.927 7.293
##
## lowest : 2.693 2.905 3.349 3.462 3.471, highest: 7.469 7.494 7.504 7.522 7.537
## ---------------------------------------------------------------------------
## Economy
## n missing distinct Info Mean Gmd .05 .10
## 155 0 155 1 0.9847 0.4802 0.2415 0.3687
## .25 .50 .75 .90 .95
## 0.6634 1.0646 1.3180 1.4860 1.5479
##
## lowest : 0.00000000 0.02264318 0.09162257 0.09210235 0.11904179
## highest: 1.62634337 1.63295245 1.69227767 1.74194360 1.87076569
## ---------------------------------------------------------------------------
## Family
## n missing distinct Info Mean Gmd .05 .10
## 155 0 155 1 1.189 0.3106 0.6213 0.7814
## .25 .50 .75 .90 .95
## 1.0426 1.2539 1.4143 1.4856 1.5215
##
## lowest : 0.0000000 0.3961026 0.4318825 0.4352998 0.5125688
## highest: 1.5481951 1.5489691 1.5511216 1.5582311 1.6105740
## ---------------------------------------------------------------------------
## Life.Expectancy
## n missing distinct Info Mean Gmd .05 .10
## 155 0 155 1 0.5513 0.2677 0.1118 0.1925
## .25 .50 .75 .90 .95
## 0.3699 0.6060 0.7230 0.8273 0.8448
##
## lowest : 0.000000000 0.005564754 0.018772686 0.041134715 0.048642170
## highest: 0.888960600 0.900214076 0.913475871 0.943062425 0.949492395
## ---------------------------------------------------------------------------
## Freedom
## n missing distinct Info Mean Gmd .05 .10
## 155 0 155 1 0.4088 0.1691 0.1179 0.2007
## .25 .50 .75 .90 .95
## 0.3037 0.4375 0.5166 0.5874 0.6133
##
## lowest : 0.00000000 0.01499586 0.03036986 0.05990075 0.08153944
## highest: 0.62600672 0.62716264 0.63337582 0.63542259 0.65824866
## ---------------------------------------------------------------------------
## Generosity
## n missing distinct Info Mean Gmd .05 .10
## 155 0 155 1 0.2469 0.1482 0.05149 0.08534
## .25 .50 .75 .90 .95
## 0.15411 0.23154 0.32376 0.42829 0.48970
##
## lowest : 0.00000000 0.01016466 0.02880684 0.03220996 0.04378538
## highest: 0.50000513 0.57212311 0.57473058 0.61170459 0.83807516
## ---------------------------------------------------------------------------
## Trust
## n missing distinct Info Mean Gmd .05 .10
## 155 0 155 1 0.1231 0.1047 0.02072 0.03213
## .25 .50 .75 .90 .95
## 0.05727 0.08985 0.15330 0.28256 0.33724
##
## lowest : 0.000000000 0.004387901 0.008964816 0.010091286 0.011051531
## highest: 0.384398729 0.400770068 0.439299256 0.455220014 0.464307785
## ---------------------------------------------------------------------------
A correlation plot shows that Economy, Life Expectancy and Family are the factors most strongly associated with a country's happiness level.
happiness_corr <- cor(happiness_rank[c(3:9)])
corrplot(happiness_corr, method = "pie", type = "upper", order = "FPC",
         col = brewer.pal(n = 7, name = "GnBu"),
         tl.col = "black", cl.align = "r", cl.ratio = 0.3)
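The ordering read off the correlation plot can also be obtained numerically. The sketch below defines a hypothetical helper, rank_by_correlation() (not part of the original analysis), that orders the numeric columns of a data frame by the absolute value of their Pearson correlation with a target column. To stay self-contained it runs on only the first four rows shown in the glimpse() output, so the printed values are purely illustrative and say nothing about the full dataset.

```r
# Hypothetical helper (not part of the original analysis): rank numeric
# columns by the absolute Pearson correlation with a target column.
rank_by_correlation <- function(df, target) {
  numeric_cols <- names(df)[sapply(df, is.numeric)]
  predictors <- setdiff(numeric_cols, target)
  r <- sapply(predictors, function(p) cor(df[[p]], df[[target]]))
  r[order(abs(r), decreasing = TRUE)]
}

# Toy input: only the first four rows printed by glimpse() above
first_rows <- data.frame(
  Happiness.Score = c(7.537, 7.522, 7.504, 7.494),
  Economy         = c(1.616463, 1.482383, 1.480633, 1.564980),
  Family          = c(1.533524, 1.551122, 1.610574, 1.516912),
  Life.Expectancy = c(0.7966665, 0.7925655, 0.8335521, 0.8581313),
  Freedom         = c(0.6354226, 0.6260067, 0.6271626, 0.6200706),
  Generosity      = c(0.36201224, 0.35528049, 0.47554022, 0.29054928),
  Trust           = c(0.31596383, 0.40077007, 0.15352656, 0.36700729)
)

ranked <- rank_by_correlation(first_rows, "Happiness.Score")
print(round(ranked, 2))
```

On the real data, the call would be rank_by_correlation(happiness_rank, "Happiness.Score"); the helper skips the non-numeric Country and Continent columns on its own.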
The sample meets the required conditions for inference, i.e. independence and a sufficiently large sample size. I will use a one-way analysis of variance (ANOVA) to compare the means.
Null hypothesis: The means of the six variables are all equal.
Alternative hypothesis: At least one mean differs from the others.
#transforming data for analysis
test_anova <- happiness_rank %>%
  gather(Factors, Score, -c("Country", "Continent", "Happiness.Score")) %>%
  select(-Country, -Happiness.Score)
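The reshaping step above turns one column per factor into one row per country and factor, which is the layout aov() expects. A minimal base R sketch of the same idea, using stack() in place of gather(); the data frame `wide` and its values are made up for illustration and are not taken from the dataset:

```r
# Made-up wide data: one column per factor (illustrative values only)
wide <- data.frame(
  Continent = c("Europe", "Asia"),
  Economy   = c(1.5, 1.0),
  Family    = c(1.4, 1.1)
)

# stack() returns two columns: `values` (the scores) and `ind` (the source column)
stacked <- stack(wide[c("Economy", "Family")])

# Repeat the identifier column once per stacked factor column
long <- cbind(Continent = rep(wide$Continent, times = 2), stacked)
names(long) <- c("Continent", "Score", "Factors")

print(long)
```

The result has four rows, two per factor, with the factor name in its own column.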
#statistical analysis
group_by(test_anova, Factors) %>%
  summarise(
    count = n(),
    mean = mean(Score, na.rm = TRUE),
    sd = sd(Score, na.rm = TRUE)
  )
## # A tibble: 6 x 4
## Factors count mean sd
## <chr> <int> <dbl> <dbl>
## 1 Economy 155 0.985 0.421
## 2 Family 155 1.19 0.287
## 3 Freedom 155 0.409 0.150
## 4 Generosity 155 0.247 0.135
## 5 Life.Expectancy 155 0.551 0.237
## 6 Trust 155 0.123 0.102
#visualizing the data
ggboxplot(test_anova, x = "Factors", y = "Score",
          color = "Factors", palette = brewer.pal(n = 6, name = "Dark2"),
          ylab = "Score", xlab = "Factors of Happiness Score")
#computing the analysis of variance
res.aov <- aov(Score ~ Factors, data = test_anova)
#summary of the analysis
summary(res.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## Factors 5 137.07 27.413 448.4 <2e-16 ***
## Residuals 924 56.49 0.061
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since the p-value is extremely low, we reject the null hypothesis and conclude that the mean scores of the six factors are not all equal, i.e. there are differences among the factors that determine the overall happiness of a country.
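To make the ANOVA table less of a black box, the F statistic can be rebuilt from the per-factor counts, means and standard deviations in the summary table above. The sketch below does this with the rounded values from that table, so it should land close to, but not exactly on, the reported sums of squares and F value. The variable names are illustrative.

```r
# Group summaries copied from the summarise() output above (rounded values)
n     <- rep(155, 6)
means <- c(Economy = 0.985, Family = 1.19, Freedom = 0.409,
           Generosity = 0.247, Life.Expectancy = 0.551, Trust = 0.123)
sds   <- c(0.421, 0.287, 0.150, 0.135, 0.237, 0.102)

k          <- length(means)
grand_mean <- sum(n * means) / sum(n)

# Variation of the factor means around the grand mean
ss_between <- sum(n * (means - grand_mean)^2)
# Variation of the scores around their own factor mean
ss_within  <- sum((n - 1) * sds^2)

df_between <- k - 1        # 5, as in the ANOVA table
df_within  <- sum(n) - k   # 924, as in the ANOVA table

f_stat  <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

print(round(c(ss_between = ss_between, ss_within = ss_within, F = f_stat), 2))
```

The between-group sum of squares is far larger than the within-group one relative to their degrees of freedom, which is what drives the very large F value and the tiny p-value.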
Based on the data gathered by the UN, the statistical analysis and the inference test, the factors vary enough to measure the happiness of a country. The analysis shows that at least three of the factors, namely economy, life expectancy and family, play a major role in attaining a higher level of happiness. The correlations for the rest, namely trust in government and generosity, do not say much about a country's happiness. For future research, one could collect independent data on other factors, such as marriage and divorce rates, to get a better happiness score.