Adam Xie, s3769982
Last updated: 25 October, 2020
What makes us happy? Isn’t that what everything is about?
To help find an answer, the World Happiness Report has been published regularly since 2012. It was created in response to the UN meeting on “Wellbeing and Happiness: Defining a New Economic Paradigm”. Each edition reviews the latest findings on happiness and examines global data on quality of life to understand where happiness comes from.
This report explores the 2019 World Happiness Report dataset, which examined 8 different predictors of people’s perceived happiness. The World Happiness Report ranks the 156 participating countries based on how their citizens respond to the survey’s questions. The primary measure of happiness is the Cantril ladder question. It asks respondents to evaluate their lives on a scale from 0 (the worst possible life) to 10 (the best possible life). The dataset covers the years 2005 to 2018.
This report aims to identify the effect of 2 of the 8 predictors, social support and generosity, on national responses to the Cantril ladder question between 2016 and 2018. It examines whether the relationship between each predictor and the ladder response is linear, and which predictor is the better measure of happiness.
As mentioned, all data were sourced from the 2019 World Happiness Report’s Chapter 2 Online Data. According to the 2020 FAQ page, the data for the 2020 report were obtained from “nationally representative samples” (Helliwell et al., 2020), with a typical annual sample of 1,000 people for each country. It is inferred that a cluster sampling method was used to obtain these samples. Because no change in procedure is mentioned, it is also inferred that the same sampling methods were used for the 2019 report.
The values within the dataset have already been adjusted for population size in each country (Helliwell et al., 2019).
Some earlier samples (for example, 2005) were smaller than the most recent ones (Helliwell et al., 2019). For this reason, the report only examines the most recent period, 2016-2018.
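The report subsets the 2019 World Happiness Report dataset (Helliwell et al., 2019) to observations from 2016-2018. The analysis is at the global level, so the specific countries are not relevant. Only the following variables are used:

- `Year`: the survey year
- `Life Ladder`: the national average response to the Cantril ladder question (0 to 10)
- `Social support`: the national average of binary (0 or 1) responses to the social support question
- `Generosity`: a measure of charitable donation adjusted for GDP per capita, which is why it can be negative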
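# Packages: readxl for read_excel(), dplyr for %>%, select() and filter(), tidyr for reshaping
library(readxl)
library(dplyr)
library(tidyr)

# Reading in the dataset
happiness <- read_excel("Chapter2OnlineData.xls")
# Keeping only the columns required for the report
happiness_2v <- happiness %>% select(Year, `Life Ladder`, `Social support`, Generosity)
# Filtering observations to the years 2016-2018
happiness_2v_16_18 <- happiness_2v %>% filter(Year >= 2016, Year <= 2018)

Completing this filtering process resulted in 425 observations.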
Counting the missing values in each column of the filtered data gives:

##           Year    Life Ladder Social support     Generosity
##              0              0              1             19
To handle the missing data (1 value for social support and 19 for generosity), each missing value was imputed with the median of its variable. The median was chosen because the individual values are population-weighted averages.
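The code used to count and impute the missing values is not shown in the report. The sketch below shows one way to do both in base R. It uses a small made-up data frame: `toy` and the helper `impute_median` are hypothetical names, and the values are invented for illustration.

```r
# Made-up data standing in for happiness_2v_16_18 (values are invented)
toy <- data.frame(
  Year = c(2016, 2017, 2018, 2016, 2017),
  `Social support` = c(0.81, NA, 0.90, 0.75, 0.88),
  Generosity = c(-0.05, 0.12, NA, NA, 0.02),
  check.names = FALSE  # keep the space in "Social support"
)

# Count missing values per column
missing_counts <- colSums(is.na(toy))
print(missing_counts)

# Hypothetical helper: replace each NA with the median of the observed values
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Apply column by column; `[] <-` keeps the data frame structure
toy_imputed <- toy
toy_imputed[] <- lapply(toy_imputed, impute_median)
print(colSums(is.na(toy_imputed)))  # all zero after imputation
```

Columns without missing values, such as `Year` here, pass through unchanged. On the real data, the same helper could be applied to the `Social support` and `Generosity` columns of `happiness_2v_16_18`.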
Visualising the relationships of social support and generosity with the life ladder response:
# Side-by-side scatter plots of each predictor against the life ladder score
par(mfrow = c(1, 2))
plot(`Life Ladder` ~ `Social support`, data = happiness_2v_16_18,
     main = "Social Support v Life Ladder")
plot(`Life Ladder` ~ `Generosity`, data = happiness_2v_16_18,
     main = "Generosity v Life Ladder")

Converting the data frame into tidy (long) format for descriptive statistics:
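# Stacking the two predictors into long format: one row per observation and predictor
happiness_tidy <- happiness_2v_16_18 %>%
  pivot_longer(c(`Social support`, Generosity),
               names_to = "Predictor",
               values_to = "Result")

# Printing results to 2 significant digits
options(digits = 2)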
happiness_tidy %>% group_by(Predictor) %>% summarise(Min = min(Result,na.rm = TRUE),
Q1 = quantile(Result,probs = .25,na.rm = TRUE),
Median = median(Result, na.rm = TRUE),
Q3 = quantile(Result,probs = .75,na.rm = TRUE),
Max = max(Result,na.rm = TRUE),
Mean = mean(Result, na.rm = TRUE),
SD = sd(Result, na.rm = TRUE),
n = n(),
Missing = sum(is.na(Result))) -> table1
knitr::kable(table1)

| Predictor | Min | Q1 | Median | Q3 | Max | Mean | SD | n | Missing |
|---|---|---|---|---|---|---|---|---|---|
| Generosity | -0.34 | -0.13 | -0.03 | 0.08 | 0.66 | -0.01 | 0.15 | 425 | 0 |
| Social support | 0.29 | 0.74 | 0.84 | 0.91 | 0.98 | 0.81 | 0.12 | 425 | 0 |
The scatter plots suggest that social support has a linear relationship with the life ladder score, whereas generosity does not. We therefore fit a linear regression model for social support only, using a significance level of:

\[ \alpha = 0.05. \]

The hypotheses for the overall linear regression model are:

\[H_0: \text{The data do not fit the linear regression model} \]
\[H_A: \text{The data fit the linear regression model} \]

# Fitting a simple linear regression of life ladder on social support
life_social <- lm(`Life Ladder` ~ `Social support`, data = happiness_2v_16_18)
life_social %>% summary()

##
## Call:
## lm(formula = `Life Ladder` ~ `Social support`, data = happiness_2v_16_18)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.2592 -0.4765 -0.0181 0.5308 2.4761
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.131 0.249 -0.53 0.6
## `Social support` 6.901 0.304 22.68 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.76 on 423 degrees of freedom
## Multiple R-squared: 0.549, Adjusted R-squared: 0.548
## F-statistic: 514 on 1 and 423 DF, p-value: <2e-16
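The p-value for the F-test is very small, F(1, 423) = 514, p < 0.001. The model is therefore statistically significant, and we reject the null hypothesis. The coefficient of determination is \( R^2 = 0.549 \), so social support explains about 55% of the variation in life ladder scores. The correlation between social support and life ladder scores is:

## [1] 0.74

That is, r = 0.74, a strong positive correlation.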
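In simple linear regression, two numbers in the summary output above are tied together:

- the overall F-statistic equals the square of the slope's t-statistic;
- the magnitude of the correlation equals the square root of \( R^2 \).

The short check below uses only the rounded values printed by `summary(life_social)`. The variable names are hypothetical.

```r
# Values copied from the summary(life_social) output (rounded as printed)
slope_est <- 6.901   # estimate for `Social support`
t_slope   <- 22.68   # t value for `Social support`
f_stat    <- 514     # overall F-statistic
r_squared <- 0.549   # Multiple R-squared

# With one predictor, F equals t^2
print(t_slope^2)     # about 514.4, matching F = 514 up to rounding

# With one predictor, |r| = sqrt(R^2); r takes the sign of the slope
r <- sign(slope_est) * sqrt(r_squared)
print(round(r, 2))   # 0.74
```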
Next, we test the individual model parameters, the intercept \( \alpha \) and the slope \( \beta \).

Intercept:
\[H_0: \alpha = 0 \]
\[H_A: \alpha \ne 0 \]

Slope:
\[H_0: \beta = 0 \]
\[H_A: \beta \ne 0 \]
The 95% confidence intervals for the parameters are:
## 2.5 % 97.5 %
## (Intercept) -0.62 0.36
## `Social support` 6.30 7.50
The interval for the intercept contains 0, so we fail to reject \( H_0: \alpha = 0 \) at the 5% level. The interval for the slope (6.30 to 7.50) does not contain 0.
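Each interval above has the form estimate \( \pm\, t_{0.975,\,423} \times \) standard error. The table was presumably produced with `confint(life_social)`. The sketch below rebuilds both intervals by hand from the rounded estimates and standard errors in the summary output. The variable names are hypothetical.

```r
# Estimates and standard errors copied from the summary output (rounded)
est <- c(intercept = -0.131, slope = 6.901)
se  <- c(intercept = 0.249,  slope = 0.304)
df_resid <- 423  # residual degrees of freedom

# Two-sided 95% interval: estimate +/- t critical value * SE
t_crit <- qt(0.975, df = df_resid)
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(round(ci, 2))
```

The rounded results match the table above: about (-0.62, 0.36) for the intercept and (6.30, 7.50) for the slope.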
To check for the slope hypothesis, we calculate the two-tailed p-value for the slope:
## [1] 4.3e-75
As p < 0.001 < 0.05, we reject \( H_0: \beta = 0 \). We conclude that there is a statistically significant positive relationship between social support and respondents’ responses to the life ladder scale.
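The code that produced the slope p-value is not shown. A standard base-R way to get a two-tailed p-value from a t statistic is `2 * pt(-|t|, df)`. The sketch below applies it to the rounded t value and residual degrees of freedom from the summary output. Because t is rounded, the printed value may differ slightly from the one reported above.

```r
# Slope t value and residual degrees of freedom from the summary output
t_slope  <- 22.68
df_resid <- 423

# Two-tailed p-value: twice the upper-tail area beyond |t|
p_two_tailed <- 2 * pt(abs(t_slope), df = df_resid, lower.tail = FALSE)
print(p_two_tailed)                 # an extremely small number
print(p_two_tailed < 0.05)          # TRUE, so H0: beta = 0 is rejected
```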
The relationship between social support and life ladder scores currently satisfies two of the four assumptions of linear regression:

- independence, given how the data were obtained;
- linearity, as seen in the scatter plot.

The normality of residuals and homoscedasticity (constant residual variance) still need to be checked.
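The report assesses these two assumptions from diagnostic plots. The original analysis does not use a numeric test, but two can complement the plots:

- Koenker's studentised Breusch-Pagan test for homoscedasticity. It regresses the squared residuals on the predictor and compares n times the R² of that auxiliary regression with a chi-squared distribution on 1 degree of freedom.
- Base R's `shapiro.test()` for residual normality.

The sketch below runs both on simulated data with deliberately non-constant variance, not on the World Happiness Report data. `x`, `y`, `fit` and the simulation settings are hypothetical stand-ins for `happiness_2v_16_18` and `life_social`.

```r
set.seed(2020)

# Simulated data (NOT the WHR data): the error spread shrinks as x grows,
# producing a fan-shaped residual plot
n <- 425
x <- runif(n, min = 0.3, max = 1)
y <- 7 * x + rnorm(n, sd = 0.2 + 1.5 * (1 - x))
fit <- lm(y ~ x)

# Studentised Breusch-Pagan: regress squared residuals on the predictor,
# then n * R^2 follows chi-squared(1) under constant variance
e2 <- resid(fit)^2
aux <- lm(e2 ~ x)
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
cat("BP statistic:", round(bp_stat, 2), " p-value:", signif(bp_p, 3), "\n")

# Shapiro-Wilk test of residual normality (base R, valid for 3 to 5000 points)
print(shapiro.test(resid(fit)))
```

To apply this to the report's model, `fit` would become `life_social` and `x` the `Social support` column. The `lmtest` package's `bptest()` computes the same studentised statistic by default.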
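# Diagnostic plots for the fitted model
par(mfrow = c(2, 2))
life_social %>% plot(which = 1)  # residuals vs fitted values (linearity, constant variance)
life_social %>% plot(which = 2)  # normal Q-Q plot of residuals (normality)
life_social %>% plot(which = 5)  # residuals vs leverage (influential points)

To conclude, generosity did not appear to have a linear relationship with life ladder scores, so no linear model was fitted for it.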
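For social support, it was found that:

- the overall model was statistically significant, F(1, 423) = 514, p < 0.001, with \( R^2 = 0.549 \) and r = 0.74;
- the intercept was not significantly different from 0 (95% CI -0.62 to 0.36);
- the slope was significantly positive, 6.90 (95% CI 6.30 to 7.50), p < 0.001;
- the independence and linearity assumptions were met, but the homoscedasticity assumption was not.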
We can conclude that social support is a more meaningful measure of overall happiness than generosity. However, the social support model fails the homoscedasticity assumption. We therefore cannot confidently conclude that there is a statistically significant positive linear relationship between social support and life ladder responses in the 2019 World Happiness Report dataset.
In future, measuring social support with a multi-valued response rather than a binary one may help address the current failure of the homoscedasticity assumption.
Helliwell, J., Layard, R., & Sachs, J. (2019). Chapter 2: Online Data (World Happiness Report 2019), New York: Sustainable Development Solutions Network. https://s3.amazonaws.com/happiness-report/2019/Chapter2OnlineData.xls
Helliwell, J., Layard, R., & Sachs, J. (2019). World Happiness Report 2019, New York: Sustainable Development Solutions Network. https://worldhappiness.report/ed/2019/changing-world-happiness/
Helliwell, J., Layard, R., Sachs, J., & De Neve, J.-E. (2020). FAQ (World Happiness Report 2020), New York: Sustainable Development Solutions Network. https://worldhappiness.report/faq/