Introduction

What makes us happy? Isn’t that what everything is about?

To help find an answer, the World Happiness Report, prepared in response to the UN meeting on “Wellbeing and Happiness: Defining a New Economic Paradigm”, has released editions on the topic since 2012. Each report reviews the latest research on happiness and examines global data on quality of life to understand what drives happiness.

This report explores the 2019 World Happiness Report dataset, which examines 8 different predictors of people’s perceived happiness. The World Happiness Report ranks the 156 participating countries based on their citizens’ responses to the survey. The primary measure of happiness is the Cantril ladder question, which asks respondents to evaluate their lives on a scale from 0 (the worst possible life) to 10 (the best possible life). The dataset covers the years 2005-2018.

Problem Statement

This report aims to identify the effect of 2 of the 8 predictors, social support and generosity, on national average responses to the Cantril ladder question between 2016 and 2018. It examines whether the relationship between each of these variables and the ladder response is linear, and which of the two is the better indicator of happiness.

Data

All data was sourced from the 2019 World Happiness Report’s Chapter 2 Online Data. According to the 2020 FAQ page, the data for the 2020 report was obtained from “nationally representative samples” (Helliwell et al., 2020), with a typical annual sample of 1000 people. It is inferred that a cluster sampling method was used to obtain these samples and, as no change of procedure is mentioned, that the same sampling methods were used for the 2019 report.

The values within the dataset have already been adjusted for population size in each country (Helliwell et al., 2019).

Because some earlier years (for example, 2005) had smaller samples than the most recent years (Helliwell et al., 2019), this report examines only the most recent period, 2016-2018.

Data Cont.

The report subsets the 2019 World Happiness Report dataset (Helliwell et al., 2019) to observations from 2016-2018. Individual countries are not the focus of this report, which examines the following variables at the global level:

Year: Year when the survey was conducted
Life Ladder: The population-weighted average life evaluation on a scale of 0 to 10
Social support: The national average of binary (0 = no, 1 = yes) responses to the question “If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?”
Generosity: The residual from regressing the national average of responses to the question “Have you donated money to a charity in the past month?” on GDP per capita
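In the World Happiness Report, Generosity is the residual from regressing national average donation responses on GDP per capita, which is why the variable can take negative values. The sketch below illustrates that idea on simulated data only; the names `log_gdp` and `donated`, the use of log GDP, and every number in it are assumptions for illustration, not the WHR procedure.

```r
# Simulated data only (not the WHR data): a generosity-style score is the
# residual from regressing the national share of donors on log GDP per capita.
set.seed(2019)
n_countries <- 140                                     # hypothetical number of countries
log_gdp <- rnorm(n_countries, mean = 9.3, sd = 1.2)    # hypothetical log GDP per capita
donated <- plogis(-4 + 0.4 * log_gdp + rnorm(n_countries, sd = 0.5))  # share answering "yes"

# Regress the donation share on log GDP and keep the residuals
gdp_fit <- lm(donated ~ log_gdp)
generosity <- residuals(gdp_fit)

# Residuals are centred on zero, so many countries score below 0
summary(generosity)
```

Because the regression includes an intercept, the residuals average to zero, which is consistent with the near-zero mean reported for Generosity in the descriptive statistics.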

Data Cont.

# Loading readxl and reading in the dataset
library(readxl)
happiness <- read_excel("Chapter2OnlineData.xls")

# Only obtaining required columns for report (selected by name)
library(dplyr)
happiness_2v <- happiness %>% select(Year, `Life Ladder`, `Social support`, Generosity)

# Filtering observations to be between 2016-2018
happiness_2v_16_18 <- happiness_2v %>% filter(Year >= 2016, Year <= 2018)

Completing this filtering process resulted in 425 observations.

# Checking for missing data
colSums(is.na(happiness_2v_16_18))

##           Year    Life Ladder Social support     Generosity 
##              0              0              1             19

Data Cont.

To handle the missing data, the missing values were imputed with the median of each variable. This was deemed appropriate as the individual values are population-weighted national averages.

# Impute missing values with the median (impute() is from the Hmisc package)
library(Hmisc)
happiness_2v_16_18$`Social support` <- impute(happiness_2v_16_18$`Social support`, fun = median)
happiness_2v_16_18$Generosity <- impute(happiness_2v_16_18$Generosity, fun = median)
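For readers without Hmisc, the same median imputation can be sketched in base R. The helper `impute_median()` is hypothetical (it is not part of any package), the example vector is made up, and unlike `Hmisc::impute()` it does not record which entries were imputed.

```r
# Hypothetical helper: replace NA values with the median of the observed values
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Made-up example vector with two missing values
generosity_example <- c(-0.12, 0.05, NA, 0.30, -0.02, NA)
res <- impute_median(generosity_example)
res   # both NAs become 0.015, the median of the four observed values
```

Applied to the report's data, this would read `happiness_2v_16_18$Generosity <- impute_median(happiness_2v_16_18$Generosity)`.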

Descriptive Statistics and Visualisation

Visualising the relationships of social support and generosity with life ladder responses:

par(mfrow=c(1,2))
plot(`Life Ladder` ~ `Social support`, data = happiness_2v_16_18,
     main = "Social Support v Life Ladder")
plot(`Life Ladder` ~ `Generosity`, data = happiness_2v_16_18,
     main = "Generosity v Life Ladder")

Descriptive Statistics and Visualisation Cont.

Reshaping the data frame into long (tidy) format for descriptive statistics:

library(tidyr)
happiness_tidy <- happiness_2v_16_18 %>% gather(`Social support`,
                                                Generosity,
                                                key="Predictor", 
                                                value="Result")

options(digits=2)
happiness_tidy %>% group_by(Predictor) %>% summarise(Min = min(Result,na.rm = TRUE),
                                           Q1 = quantile(Result,probs = .25,na.rm = TRUE),
                                           Median = median(Result, na.rm = TRUE),
                                           Q3 = quantile(Result,probs = .75,na.rm = TRUE),
                                           Max = max(Result,na.rm = TRUE),
                                           Mean = mean(Result, na.rm = TRUE),
                                           SD = sd(Result, na.rm = TRUE),
                                           n = n(),
                                           Missing = sum(is.na(Result))) -> table1
knitr::kable(table1)

Predictor	Min	Q1	Median	Q3	Max	Mean	SD	n	Missing
Generosity	-0.34	-0.13	-0.03	0.08	0.66	-0.01	0.15	425	0
Social support	0.29	0.74	0.84	0.91	0.98	0.81	0.12	425	0

Hypothesis Testing

The scatter plots suggest that social support has a linear relationship with life ladder scores, while generosity shows no clear linear pattern. As such, a linear regression model is fitted for social support only and tested at the significance level

\[ \alpha = 0.05. \]

The hypotheses for the overall linear regression model are:

H0: The model does not explain a significant amount of the variance in life ladder scores
HA: The model explains a significant amount of the variance in life ladder scores

Hypothesis Testing Cont.

# Fitting the simple linear regression of life ladder on social support
life_social <- lm(`Life Ladder` ~ `Social support`, data = happiness_2v_16_18)
life_social %>% summary()

## 
## Call:
## lm(formula = `Life Ladder` ~ `Social support`, data = happiness_2v_16_18)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.2592 -0.4765 -0.0181  0.5308  2.4761 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        -0.131      0.249   -0.53      0.6    
## `Social support`    6.901      0.304   22.68   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.76 on 423 degrees of freedom
## Multiple R-squared:  0.549,  Adjusted R-squared:  0.548 
## F-statistic:  514 on 1 and 423 DF,  p-value: <2e-16

Hypothesis Testing Cont.

As the p-value for the F-test is very small, F(1, 423) = 514, p < 0.001, the model is statistically significant and we reject the null hypothesis. The model explains about 55% of the variance in life ladder scores (R² = 0.549).
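With a single predictor, the overall F-test and the t-test on the slope are equivalent, because F = t². The sketch below checks this against the reported output (22.68² is about 514) and on simulated data; the simulated variables `x` and `y` are illustrative only.

```r
# Reported slope t value from the summary() output
t_slope <- 22.68
t_slope^2          # about 514.4, matching F(1, 423) = 514

# Check the identity on simulated data with one predictor
set.seed(1)
x <- runif(100)
y <- 1 + 2 * x + rnorm(100)
fit <- lm(y ~ x)
f_stat <- unname(summary(fit)$fstatistic["value"])
t_stat <- coef(summary(fit))["x", "t value"]
all.equal(f_stat, t_stat^2)   # TRUE
```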

cor(happiness_2v_16_18$`Life Ladder`, happiness_2v_16_18$`Social support`)

## [1] 0.74

The Pearson correlation between life ladder and social support is r = 0.74, indicating a strong positive relationship.
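With a single predictor, R² is the square of the Pearson correlation, so 0.74² ≈ 0.55 agrees with the reported R² of 0.549 up to rounding. A quick check on simulated data (the variables `social` and `ladder` here are made up, not the WHR values):

```r
set.seed(42)
social <- runif(200, 0.3, 1)                   # simulated predictor
ladder <- 7 * social + rnorm(200, sd = 0.75)   # simulated response
fit <- lm(ladder ~ social)

r <- cor(ladder, social)
all.equal(r^2, summary(fit)$r.squared)   # TRUE
sqrt(0.549)                              # about 0.741, the reported r
```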

Now we test the parameters of the model

\[ \text{Life Ladder} = \alpha + \beta \times \text{Social support} + \varepsilon. \]

Intercept:

\[ H_0: \alpha = 0 \] \[ H_A: \alpha \ne 0 \]

Slope:

\[ H_0: \beta = 0 \] \[ H_A: \beta \ne 0 \]

Hypothesis Testing Cont.

The 95% confidence intervals for the intercept and slope are:

life_social %>% confint()

##                  2.5 % 97.5 %
## (Intercept)      -0.62   0.36
## `Social support`  6.30   7.50

As the 95% confidence interval for the intercept, (-0.62, 0.36), contains 0, we fail to reject the null hypothesis H0: α = 0 (consistent with p = 0.6 in the summary output).
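Each interval is the estimate ± t(0.975, 423) × standard error. As a sketch, the intervals can be reproduced by hand from the rounded estimates and standard errors in the summary output, so the last digit may differ slightly from `confint()`:

```r
# Rounded estimates and standard errors from the summary() output
est <- c(intercept = -0.131, slope = 6.901)
se  <- c(intercept =  0.249, slope = 0.304)

# Critical t value for a two-sided 95% interval with 423 residual df
t_crit <- qt(0.975, df = 423)   # about 1.966

lower <- est - t_crit * se
upper <- est + t_crit * se
round(cbind(lower, upper), 2)
# intercept: about (-0.62, 0.36); slope: about (6.30, 7.50)
```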

To test the slope hypothesis, we calculate the two-tailed p-value for the slope's t statistic:

2*pt(q=22.68, df=425-2, lower.tail=FALSE)

## [1] 4.3e-75

As p < 0.001 < 0.05, we reject the null hypothesis. Since the slope estimate is positive (6.901), we conclude that there is a positive relationship between social support and respondents’ life ladder responses.
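The t value of 22.68 is the slope estimate divided by its standard error. A sketch using the rounded figures from the summary output (which gives about 22.70 rather than 22.68):

```r
# Rounded slope estimate and standard error from the summary() output
est_slope <- 6.901
se_slope  <- 0.304

t_value <- est_slope / se_slope                            # about 22.70
p_value <- 2 * pt(t_value, df = 423, lower.tail = FALSE)   # two-tailed p-value
p_value < 0.05                                             # TRUE: reject H0: beta = 0
```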

Hypothesis Testing Cont.

Of the four assumptions of linear regression, the relationship between social support and life ladder scores currently satisfies two: independence (through the way the data was obtained) and linearity (via the scatter plot). The normality of residuals and homoscedasticity still need to be checked with the diagnostic plots:

# Diagnostic plots: Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage
par(mfrow=c(2,2))
life_social %>% plot(which=c(1, 2, 3, 5))
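The plots can be complemented by formal tests. Below is a sketch of Koenker's studentised Breusch-Pagan test for heteroscedasticity, written in base R so it needs no extra packages (it should agree with the default of `lmtest::bptest()`), alongside base R's Shapiro-Wilk test for residual normality. The helper `bp_test()` is hypothetical, and the example runs on simulated data whose error spread changes with the predictor; to apply it to the report's model, pass `life_social` instead of `fit`.

```r
# Hypothetical helper: Koenker's studentised Breusch-Pagan test.
# Regress squared residuals on the predictors; LM = n * R^2 ~ chi-squared(k)
bp_test <- function(fit) {
  e2 <- residuals(fit)^2
  X <- model.matrix(fit)[, -1, drop = FALSE]   # predictors without the intercept
  aux <- lm(e2 ~ X)
  lm_stat <- length(e2) * summary(aux)$r.squared
  p <- pchisq(lm_stat, df = ncol(X), lower.tail = FALSE)
  c(statistic = lm_stat, p.value = p)
}

# Simulated data whose error spread shrinks as x grows (heteroscedastic)
set.seed(7)
x <- runif(300, 0.3, 1)
y <- 7 * x + rnorm(300, sd = 2 * (1.2 - x))
fit <- lm(y ~ x)

bp_test(fit)                    # small p-value suggests heteroscedasticity
shapiro.test(residuals(fit))    # Shapiro-Wilk test of residual normality
```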

Discussion

To conclude, generosity showed no clear linear relationship with life ladder scores.

For social support, it was found that:

Linearity was observed in the scatter plot and assumed
Independence was assumed from the data collection methods
No major deviations from normality were seen in the Q-Q plot
The data appeared heteroscedastic in the Residuals vs Fitted plot, so the assumptions of ordinary least squares linear regression are not met
r = 0.74, R² = 0.549
Intercept: estimate -0.131, 95% CI (-0.62, 0.36); H0: α = 0 could not be rejected
Slope: estimate 6.901, p < 0.001, 95% CI (6.30, 7.50)

We can conclude that social support is a more meaningful indicator of overall happiness than generosity. However, as the social support model fails the homoscedasticity assumption, we cannot conclude that there is a statistically significant positive linear relationship between social support and life ladder responses in the 2019 World Happiness Report dataset.

In future, measuring social support with a multi-valued rather than binary response may be more appropriate, and could help address the current violation of the homoscedasticity assumption.

References

Helliwell, J., Layard, R., & Sachs, J. (2019). Chapter 2: Online Data (World Happiness Report 2019), New York: Sustainable Development Solutions Network. https://s3.amazonaws.com/happiness-report/2019/Chapter2OnlineData.xls
Helliwell, J., Layard, R., & Sachs, J. (2019). World Happiness Report 2019, New York: Sustainable Development Solutions Network. https://worldhappiness.report/ed/2019/changing-world-happiness/
Helliwell, J., Layard, R., Sachs, J., & De Neve, J.-E. (2020). FAQ (World Happiness Report 2020), New York: Sustainable Development Solutions Network. https://worldhappiness.report/faq/

Which Principal Measure of Happiness affects it the most?

A snapshot from the 2019 World Happiness Report
