Are citizens of Western countries significantly happier than citizens from the rest of the world? Let's find out!
The World Happiness Report is an annual report on the state of global happiness; the 2019 data set used in this analysis covers 156 countries and regions.
The Western world (West) in the context of this analysis refers to the countries of Western Europe, Australasia, and the Americas; these include (4):
The aim of this analysis is to determine whether there is a statistically significant difference between the mean happiness scores of countries in the West and those of the rest of the world.
As happiness scores are independent for each nation (and are therefore treated as independent between the two groups of nations, i.e. the West vs the rest of the world), a two-sample t-test will be used to compare the two population means. Before the results of the t-test are interpreted, the distribution and variance of the two groups will be investigated.
The hypotheses for the test are as follows:
\[H_0: \mu_1 - \mu_2 = 0 \]
- Null Hypothesis: The difference in the mean happiness score between the West and the rest of the world is 0
\[H_A: \mu_1 - \mu_2 \ne 0\]
- Alternate Hypothesis: The difference in the mean happiness score between the West and the rest of the world is not 0
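For reference, if the two groups are assumed to share a common variance (an assumption checked later with Levene's test), the standard pooled two-sample t statistic takes the form below. The symbols \(\bar{x}_i\), \(s_i\) and \(n_i\) (sample mean, standard deviation and size of group \(i\)) are introduced here for illustration only and do not appear in the original analysis:
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]
Under \(H_0\), and provided the sample means are approximately normally distributed, this statistic follows a t distribution with \(n_1 + n_2 - 2\) degrees of freedom.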
The analysis follows these high-level steps:
The data set contained 9 variables:
Per the description provided by the data source, Kaggle (www.kaggle.com), the columns that follow the happiness score estimate the extent to which each of six factors contributes to making life evaluations higher in each country than in a hypothetical country called ‘Dystopia’ (2). These factor estimates have no impact on the total score reported for each country, but they do help explain why some countries rank higher than others. The happiness rankings are determined from nationally representative samples, with a typical sample size of 1,000 people per nation, and use the Gallup weights (https://www.gallup.com/178667/gallup-world-poll-work.aspx) to make the estimates representative (1). The happiness score itself is reported on a scale from 0 to 10.
To ready the data for analysis, the following steps were followed:
## Rows: 156
## Columns: 9
## $ `Overall rank` <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,…
## $ `Country or region` <chr> "Finland", "Denmark", "Norway", "Icela…
## $ Score <dbl> 7.769, 7.600, 7.554, 7.494, 7.488, 7.4…
## $ `GDP per capita` <dbl> 1.340, 1.383, 1.488, 1.380, 1.396, 1.4…
## $ `Social support` <dbl> 1.587, 1.573, 1.582, 1.624, 1.522, 1.5…
## $ `Healthy life expectancy` <dbl> 0.986, 0.996, 1.028, 1.026, 0.999, 1.0…
## $ `Freedom to make life choices` <dbl> 0.596, 0.592, 0.603, 0.591, 0.557, 0.5…
## $ Generosity <dbl> 0.153, 0.252, 0.271, 0.354, 0.322, 0.2…
## $ `Perceptions of corruption` <dbl> 0.393, 0.410, 0.341, 0.118, 0.298, 0.3…
## Rows: 253
## Columns: 2
## $ Country <chr> "Andorra", "United Arab Emirates", "Afghanistan", "Ant…
## $ Classification <chr> "West", "Rest of the world", "Rest of the world", "Res…
#1.
countryclass$Country <- factor(countryclass$Country)
countryclass$Classification <- factor(countryclass$Classification)
WorldHappinessReport2019$`Country or region` <- factor(WorldHappinessReport2019$`Country or region`)
#2.
colSums(is.na(WorldHappinessReport2019)) # there are no missing values in the original data set
## Overall rank Country or region
## 0 0
## Score GDP per capita
## 0 0
## Social support Healthy life expectancy
## 0 0
## Freedom to make life choices Generosity
## 0 0
## Perceptions of corruption
## 0
#3.
WorldHappinessReport2019 <- WorldHappinessReport2019 %>% rename(Country = "Country or region")
WHI_2019 <- WorldHappinessReport2019 %>% left_join(countryclass, by = "Country")
#4.
WHI_2019 <- WHI_2019 %>% select(Country, Score,Classification)
head(WHI_2019)
Summary statistics of the happiness scores for the two groups are displayed below:
WHI_2019 %>% group_by(Classification) %>% summarise(Min = min(Score,na.rm = TRUE),
Q1 = quantile(Score,probs = .25,na.rm = TRUE),
Median = median(Score,na.rm = TRUE),
Q3 = quantile(Score,probs = .75,na.rm = TRUE),
Max = max(Score,na.rm = TRUE),
Mean = mean(Score, na.rm = TRUE),
SD = sd(Score, na.rm = TRUE),
n = n(),
Missing = sum(is.na(Score))) -> table1
knitr::kable(table1)

| Classification | Min | Q1 | Median | Q3 | Max | Mean | SD | n | Missing |
|---|---|---|---|---|---|---|---|---|---|
| Rest of the world | 2.853 | 4.456 | 5.191 | 5.886 | 7.554 | 5.119656 | 1.000460 | 125 | 0 |
| West | 5.011 | 6.058 | 6.726 | 7.159 | 7.769 | 6.566129 | 0.729489 | 31 | 0 |
A review of the summary statistics indicates that the mean score of the West is higher than that of the rest of the world.
The Missing column confirms again that there is no missing data in the data set.
The median for the West is about 0.16 points higher than its mean, indicating a slight left skew in the distribution.
The minimum for the Rest of the world is more than 2 points lower than the minimum for the West, indicating a longer lower tail.
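The gaps discussed above can be recomputed directly from the summary table. The sketch below is a minimal cross-check in base R; the variable names are illustrative and the values are copied from the table rather than taken from the raw data.

```r
# Values copied from the summary table above
west <- c(min = 5.011, median = 6.726, mean = 6.566129)
rest <- c(min = 2.853, median = 5.191, mean = 5.119656)

# A median that sits above the mean suggests a left (negative) skew
west_gap <- unname(west["median"] - west["mean"])
rest_gap <- unname(rest["median"] - rest["mean"])

# Distance between the two group minimums
min_gap <- unname(west["min"] - rest["min"])

round(c(west_gap = west_gap, rest_gap = rest_gap, min_gap = min_gap), 3)
```

Both groups have a median above the mean, with the larger gap in the West.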
Let's investigate the distribution visually through a pair of box plots:
WHI_2019 %>% boxplot(Score ~ Classification, data = ., ylab = "Happiness Score", col=c('pink', 'sky blue'))
A visual inspection of the box plots also indicates a difference in central tendency between the two groups: the median score for the West is visibly higher.
The box plot for the West highlights the left skew
There are no outliers in the scores for the two groups
While not necessary for this analysis due to the sample sizes being greater than 30 (n = 31 and n = 125 for West and Rest of the World respectively), we will plot Q-Q plots for both groups to further investigate for normality.
HS_west <- WHI_2019 %>% filter(Classification == "West")
HS_west$Score %>% qqPlot(dist="norm", main = "QQ plot - Happiness Scores - West", col= 'dark blue', col.lines = 'sky blue')
## [1] 31 30
HS_ROW<- WHI_2019 %>% filter(Classification == "Rest of the world")
HS_ROW$Score %>% qqPlot(dist="norm", main = "QQ plot - Happiness Scores -Rest of the World", col= 'red', col.lines = 'pink')
## [1] 1 2
The data points fall close to the diagonal line for both groups, indicating that the scores are approximately normally distributed overall.
However, a slight ‘s’ shape is observed in both groups, indicating some departure from normality in the tails.
This is not a major concern: per the Central Limit Theorem, when the sample size is large, the sampling distribution of the mean is approximately normal regardless of the underlying population distribution.
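The Central Limit Theorem argument can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not part of the original analysis: the population is a hypothetical left-skewed Beta distribution rescaled to 0-10, and `skewness()` is a simple helper defined here.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Hypothetical left-skewed population on a 0-10 scale (not the report's data)
population <- 10 * rbeta(100000, shape1 = 5, shape2 = 2)

# Draw many samples of size 31 (the size of the West group) and keep each mean
sample_means <- replicate(5000, mean(sample(population, size = 31)))

# Simple moment-based skewness, to avoid extra packages
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

skewness(population)    # clearly negative: the population is left-skewed
skewness(sample_means)  # much closer to 0: the sample means are nearly symmetric
```

A histogram of `sample_means` (for example with `hist(sample_means)`) should look roughly bell-shaped even though the population itself is skewed.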
Levene’s test is used to test the assumption of equal variance.
We will use the leveneTest() function in R to compare the variances of the two groups.
If the variances of the two groups are not equal, we would expect a statistically significant result (p-value < 0.05) in the output of leveneTest().
The p-value for the Levene’s test of equal variance is 0.0826 (> 0.05)
As the p-value > 0.05, we assume equal variance.
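To show what the test does, the sketch below computes a median-centred Levene statistic in base R. It assumes the `leveneTest()` used above is the one from the `car` package, whose default centring is the median. The `toy` data frame is hypothetical and is not the report's data.

```r
# Hypothetical toy scores for two groups (not the report's data)
toy <- data.frame(
  score = c(6.1, 6.8, 7.2, 7.0, 6.5, 5.9,
            3.2, 4.8, 5.5, 6.9, 4.1, 5.0, 2.9, 6.2),
  group = factor(c(rep("West", 6), rep("Rest of the world", 8)))
)

# Absolute deviation of each score from its own group median
group_median <- ave(toy$score, toy$group, FUN = median)
toy$abs_dev <- abs(toy$score - group_median)

# A one-way ANOVA on the absolute deviations gives the Levene statistic
levene_fit <- anova(lm(abs_dev ~ group, data = toy))
levene_F <- levene_fit[["F value"]][1]
levene_p <- levene_fit[["Pr(>F)"]][1]

# Equal variance is assumed when the p-value exceeds 0.05
assume_equal_variance <- levene_p > 0.05
```

A large F value (small p-value) means that typical deviations from the group median differ between the groups, which is evidence against equal variance.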
The hypotheses for the test are as follows:
\[H_0: \mu_1 - \mu_2 = 0 \]
- Null Hypothesis: The difference in the mean happiness score between the West and the rest of the world is 0
\[H_A: \mu_1 - \mu_2 \ne 0\]
- Alternate Hypothesis: The difference in the mean happiness score between the West and the rest of the world is not 0
We will perform a two-sample t-test with equal variance and a two-sided hypothesis test.
##
## Two Sample t-test
##
## data: Score by Classification
## t = -7.5589, df = 154, p-value = 3.402e-12
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.824503 -1.068443
## sample estimates:
## mean in group Rest of the world mean in group West
## 5.119656 6.566129
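As a cross-check, the statistic, degrees of freedom and confidence interval in the output above can be reproduced from the group summary statistics alone. The sketch below uses the standard pooled-variance formulas in base R. The variable names are illustrative, and because the inputs are the rounded values from the summary table, the results agree with the output only to about three decimal places.

```r
# Summary statistics copied from the summary table earlier in the analysis
n_rest <- 125; mean_rest <- 5.119656; sd_rest <- 1.000460
n_west <- 31;  mean_west <- 6.566129; sd_west <- 0.729489

# Pooled variance under the equal-variance assumption
df <- n_rest + n_west - 2
pooled_var <- ((n_rest - 1) * sd_rest^2 + (n_west - 1) * sd_west^2) / df

# Standard error of the difference in means (Rest of the world minus West)
se_diff <- sqrt(pooled_var * (1 / n_rest + 1 / n_west))
mean_diff <- mean_rest - mean_west

t_stat <- mean_diff / se_diff                  # test statistic
p_value <- 2 * pt(-abs(t_stat), df)            # two-sided p-value
ci_95 <- mean_diff + c(-1, 1) * qt(0.975, df) * se_diff  # 95% CI for the difference
```

The difference is negative because R orders the groups alphabetically, so the estimate is the Rest of the world mean minus the West mean.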
Using the p-value method to test the hypothesis, as the p-value = 3.402e-12 < 0.05, we reject H0.
There is a statistically significant difference between the mean happiness scores of the West and the Rest of the World.
A two-sample t-test was used to test for a significant difference between the mean happiness scores of the West and the rest of the world.
While the happiness scores for the West exhibited evidence of non-normality upon inspection of the normal Q-Q plot, the central limit theorem supports applying the t-test given the large (n > 30) sample size in each group.
The Levene’s test of homogeneity of variance indicated that equal variance could be assumed.
The results of the two-sample t-test assuming equal variance found a statistically significant difference between the mean happiness scores of the West and the rest of the world, t(df=154)=−7.56, p=3.402e-12, 95% CI for the difference in means (Rest of the world minus West) [-1.825, -1.068].
The results of the investigation suggest that the West has a significantly higher average happiness score than the rest of the world.
(1) Worldhappiness.report. (2019). Home. [online] Available at: https://worldhappiness.report/.
(2) kaggle.com. (n.d.). World Happiness Report. [online] Available at: https://www.kaggle.com/unsdsn/world-happiness?select=2019.csv.
(3) Gallup, Inc. (2014). How Does the Gallup World Poll Work? [online] Gallup.com. Available at: https://www.gallup.com/178667/gallup-world-poll-work.aspx.
(4) Wikipedia Contributors (2019). East–West dichotomy. [online] Wikipedia. Available at: https://en.wikipedia.org/wiki/East%E2%80%93West_dichotomy [Accessed 22 Oct. 2020].