World Happiness Report 2015 to 2019

Assignment 2 - MATH1324

Edward Healey S3868807, Michael Hu S3599481, Clementine Girard-Foley S3864545, Jasper Koop s3610005

Last updated: 23 October, 2020

Introduction Cont.

The World Happiness Report examines the happiness levels of 155 countries, using survey data drawn from nationally representative samples of these countries' populations for 2013-2019. The survey data is used to estimate how much economic production, social support, life expectancy, freedom, absence of corruption and generosity contribute to happiness evaluations.

Our report will investigate whether there was a statistically significant change in the surveyed countries’ happiness scores between 2015 and 2019.

Measuring happiness levels is useful because it can help identify the policies that give populations the greatest wellbeing. This is presumably of interest to any politician, citizen or policy planner seeking to improve the society they belong to.

This report will use statistics to answer our research question. We will subset the data obtained from the World Happiness Report to keep each country's name and its reported happiness score for the years 2015 and 2019. Descriptive statistics will summarise the data, and a boxplot will visualise possible differences between the reported happiness levels in 2015 and 2019. To test whether any change in happiness scores is statistically significant, we will apply an independent samples t-test.

Data

The data was obtained from the Kaggle open data website (URL: https://www.kaggle.com/unsdsn/world-happiness?fbclid=IwAR30I6bEoTj8lUUB8et6CpVWFoEd3HAjwNCSHy0G_3UTPGSRNESG11SjBvo).

The data contains a happiness score for each country. These scores are a metric measure obtained from the Gallup World Poll, which asked respondents to rate the quality of their lives on a scale from 0 (least happy) to 10 (most happy). The responses were drawn from nationally representative samples in the 155 countries surveyed by the poll, and Gallup weights were used to turn them into representative national estimates.

Data Cont.: Read In and Subset

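library(readr)   # read_csv()
library(dplyr)   # select(), rename(), left_join(), %>%

# Read the two yearly files downloaded from Kaggle
whr_2015 <- read_csv("2015.csv")
whr_2019 <- read_csv("2019.csv")

# Keep only the country name and happiness score from each year
whr_2015_sub <- whr_2015 %>% select(Country, `Happiness Score`)
whr_2019_sub <- whr_2019 %>% select(`Country or region`, Score)

# Use a common key column and name each score column after its year
whr_2019_sub <- whr_2019_sub %>% rename("Country" = "Country or region", "2019" = "Score")
whr_2015_sub <- whr_2015_sub %>% rename("2015" = "Happiness Score")

# Keep every 2015 country; the 2019 score is NA where no country name matches
WHR <- whr_2015_sub %>% left_join(whr_2019_sub, by = "Country")
head(WHR)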
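Because left_join() keeps every 2015 row, any 2015 country that is absent from the 2019 file, or spelled differently there, receives an NA for 2019. A minimal base R sketch of how such unmatched names could be listed before joining; the country names below are made-up placeholders, not values from the Kaggle files.

```r
# Hypothetical country names standing in for the two files' key columns
countries_2015 <- c("Country A", "Country B", "Country C", "Country D")
countries_2019 <- c("Country A", "Country B (renamed)", "Country C")

# 2015 names with no exact match in 2019; a left join would give these NA scores
unmatched <- setdiff(countries_2015, countries_2019)
print(unmatched)
```

On the real data the equivalent call would be `setdiff(whr_2015_sub$Country, whr_2019_sub$Country)`, run after the rename; `dplyr::anti_join(whr_2015_sub, whr_2019_sub, by = "Country")` returns the unmatched rows rather than just their names.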

Data Cont.: Gather

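library(tidyr)   # gather()

# Reshape from wide (one column per year) to long (one row per country-year)
WHR <- WHR %>% gather("2015", "2019", key = "Year", value = "Happiness")

# Treat Year as a two-level grouping factor for the summaries and tests
WHR$Year <- as.factor(WHR$Year)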

str(WHR)
## tibble [316 x 3] (S3: tbl_df/tbl/data.frame)
##  $ Country  : chr [1:316] "Switzerland" "Iceland" "Denmark" "Norway" ...
##  $ Year     : Factor w/ 2 levels "2015","2019": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Happiness: num [1:316] 7.59 7.56 7.53 7.52 7.43 ...
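gather() still works, but current tidyr releases describe it as superseded by pivot_longer(). A minimal sketch of the equivalent reshape on a small made-up data frame shaped like WHR after the join (hypothetical countries and scores):

```r
library(tidyr)

# Toy wide data: one score column per year, as in WHR after the join (values made up)
toy <- data.frame(
  Country = c("Country A", "Country B"),
  `2015` = c(7.5, 4.2),
  `2019` = c(7.6, NA),
  check.names = FALSE
)

# Same reshape as gather("2015", "2019", key = "Year", value = "Happiness")
toy_long <- pivot_longer(toy, cols = c(`2015`, `2019`),
                         names_to = "Year", values_to = "Happiness")
toy_long$Year <- as.factor(toy_long$Year)
print(toy_long)
```

Unlike gather(), pivot_longer() orders the result country by country rather than year by year; the rows are the same, only their order differs, and missing scores are kept unless `values_drop_na = TRUE` is set.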

Variables:
1. Country: The country the data refers to.
2. Year: A factor with two levels, 2015 and 2019, giving the year of the recorded data.
3. Happiness: The happiness score for that country and year, sourced from the Gallup World Poll.

Descriptive Statistics and Visualisation

An important variable to consider is each country's happiness score, since comparing the scores in 2015 and 2019 is how this report tests for a statistically significant change in happiness levels. Although individual respondents answer on a scale from 0 (least happy) to 10 (most happy), each country's Happiness score is an average of those weighted responses (for example, the highest 2015 score is 7.587), so we treat it as a continuous numeric variable rather than an ordinal one.

The second important variable is Year, which allows the scores to be compared across time. This is why we kept only Country and the happiness score from the 2015 and 2019 datasets and joined them with left_join() before reshaping them into long format.

# Per-year summary statistics; na.rm = TRUE skips the missing 2019 scores
WHR %>% group_by(Year) %>% summarise(Min = min(Happiness, na.rm = TRUE),
                                     Q1 = quantile(Happiness, probs = .25, na.rm = TRUE),
                                     Median = median(Happiness, na.rm = TRUE),
                                     Q3 = quantile(Happiness, probs = .75, na.rm = TRUE),
                                     Max = max(Happiness, na.rm = TRUE),
                                     Mean = mean(Happiness, na.rm = TRUE),
                                     SD = sd(Happiness, na.rm = TRUE),
                                     n = n(),  # counts all rows, including missing scores
                                     Missing = sum(is.na(Happiness))) -> table1

Descriptive Statistics and Visualisation Cont.

knitr::kable(table1)
Year    Min     Q1  Median       Q3    Max      Mean        SD    n  Missing
2015  2.839  4.526  5.2325  6.24375  7.587  5.375734  1.145010  158        0
2019  3.083  4.548  5.4250  6.19800  7.769  5.433872  1.111244  158        9

Descriptive Statistics and Visualisation Cont.

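The summary statistics show that the mean happiness score was slightly higher in 2019 (5.434) than in 2015 (5.376). Note that n counts every row, including missing scores, so the 2019 statistics are based on 158 - 9 = 149 observed scores. A boxplot is created to visualise the difference, but descriptive statistics alone cannot tell us whether this change over time is statistically significant.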

We checked for missing or special values using the is.na(), is.nan() and is.infinite() functions.

sum(sapply(WHR, is.infinite))
## [1] 0
sum(sapply(WHR, is.nan))
## [1] 0
sum(is.na(WHR))
## [1] 9

Descriptive Statistics and Visualisation Cont.

colSums(is.na(WHR))
##   Country      Year Happiness 
##         0         0         9

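From the results, we can see that the Happiness column has 9 missing values. All of them are 2019 scores: left_join() kept every 2015 country and filled in NA wherever a 2015 country name had no match in the 2019 file. We remove the rows with a missing happiness score. Because complete.cases() works row by row on the long-format data, this drops only the nine 2019 rows and keeps the 2015 scores of the affected countries, which is consistent with the 305 degrees of freedom (158 + 149 - 2) reported by the t-test below.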

WHR_Complete <- WHR[complete.cases(WHR), ]
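To drop both years for any country with a missing score, the filter has to look across each country's rows rather than within a single row. A minimal dplyr sketch on made-up long-format data (hypothetical countries and scores):

```r
library(dplyr)

# Toy long-format data shaped like WHR (values made up); Country B lacks a 2019 score
toy_long <- data.frame(
  Country = c("Country A", "Country B", "Country A", "Country B"),
  Year = factor(c("2015", "2015", "2019", "2019")),
  Happiness = c(7.5, 4.2, 7.6, NA)
)

# Keep a country only if none of its yearly scores is missing
toy_both_years <- toy_long %>%
  group_by(Country) %>%
  filter(!any(is.na(Happiness))) %>%
  ungroup()
print(toy_both_years)
```

Applied to WHR, this would keep 149 countries in each year, rather than the 158 scores for 2015 and 149 for 2019 that complete.cases() leaves.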

WHR_Complete %>% boxplot(Happiness ~ Year, data = ., ylab = "Happiness", xlab= "Year", col = "blue", main = "Happiness by Year")

Hypothesis Testing

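We will use an independent two-sample t-test to test our hypothesis that the mean reported happiness score changed between 2015 and 2019. We chose this test because a country's demographic composition is assumed to change over time, meaning the samples used to measure happiness levels in 2015 are different from those used in 2019. The test assumes that each year's scores are approximately normally distributed and, because we set var.equal = TRUE, that both years have equal variances; we check these assumptions with QQ plots and Levene's test. With 158 and 149 observed scores, the t-test is also reasonably robust to moderate departures from normality.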

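\[H_0: \mu_{2015} = \mu_{2019} \quad \text{(the mean country happiness score did not change between 2015 and 2019)}\]
\[H_A: \mu_{2015} \neq \mu_{2019} \quad \text{(the mean country happiness score changed between 2015 and 2019)}\]
where \(\mu_{2015}\) and \(\mu_{2019}\) are the population mean country happiness scores in each year.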
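For reference, with var.equal = TRUE the test statistic is the pooled-variance two-sample t statistic, where \(\bar{x}\), \(s\) and \(n\) are each year's sample mean, standard deviation and number of non-missing scores:

```latex
\[
t = \frac{\bar{x}_{2015} - \bar{x}_{2019}}{s_p\sqrt{\frac{1}{n_{2015}} + \frac{1}{n_{2019}}}},
\qquad
s_p^2 = \frac{(n_{2015}-1)s_{2015}^2 + (n_{2019}-1)s_{2019}^2}{n_{2015}+n_{2019}-2},
\qquad
df = n_{2015}+n_{2019}-2
\]
```

Under \(H_0\), \(t\) follows a t distribution with \(df\) degrees of freedom, and the two-sided p-value is \(2P(T_{df} \geq |t|)\).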

Hypothesis Testing Cont.

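library(car)   # qqPlot(), leveneTest()

# Normal QQ plot of the 2015 scores; qqPlot() also prints the indices of the two most extreme points
WHR_2015 <- WHR %>% filter(Year == "2015")
WHR_2015$Happiness %>% qqPlot(dist = "norm", main = "QQ Plot for 2015 Happiness")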

## [1] 158 157

Hypothesis Testing Cont.

# Normal QQ plot of the 2019 scores
WHR_2019 <- WHR %>% filter(Year == "2019")
WHR_2019$Happiness %>% qqPlot(dist = "norm", main = "QQ Plot for 2019 Happiness")

## [1] 148   6

Hypothesis Testing Cont.

leveneTest(Happiness ~ Year, data = WHR)
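By default, car::leveneTest() centres each group on its median (the Brown-Forsythe variant) and then runs a one-way ANOVA on the absolute deviations. A minimal base R sketch of that computation on a tiny made-up example, not the WHR data:

```r
# Made-up scores for two groups (hypothetical values, not WHR data)
score <- c(1, 2, 3, 2, 4, 6)
group <- factor(c("a", "a", "a", "b", "b", "b"))

# Absolute deviation of each score from its own group's median
abs_dev <- abs(score - ave(score, group, FUN = median))

# One-way ANOVA F test on the deviations, as leveneTest(score ~ group) does by default
levene_table <- anova(lm(abs_dev ~ group))
levene_F <- levene_table[["F value"]][1]
levene_p <- levene_table[["Pr(>F)"]][1]
print(c(F = levene_F, p = levene_p))
```

A small p-value would suggest unequal variances, in which case setting var.equal = FALSE in t.test() (the Welch test) is the usual alternative.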

Hypothesis Testing Cont.

# Pooled-variance two-sample t-test; the formula method drops rows with missing scores
t.test(Happiness ~ Year,
       data = WHR,
       var.equal = TRUE,
       alternative = "two.sided")
## 
##  Two Sample t-test
## 
## data:  Happiness by Year
## t = -0.45104, df = 305, p-value = 0.6523
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.3117800  0.1955033
## sample estimates:
## mean in group 2015 mean in group 2019 
##           5.375734           5.433872
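As a check on the output above, a minimal base R sketch that recomputes the pooled two-sample t statistic, its p-value and the 95% confidence interval from the rounded summary statistics in the descriptive table. It assumes the 2019 group contributes 158 - 9 = 149 non-missing scores, which matches the reported df = 305.

```r
# Summary statistics from the descriptive table (2019 n excludes the 9 missing scores)
mean_2015 <- 5.375734; sd_2015 <- 1.145010; n_2015 <- 158
mean_2019 <- 5.433872; sd_2019 <- 1.111244; n_2019 <- 158 - 9

# Pooled variance and standard error of the difference in means (var.equal = TRUE)
dof <- n_2015 + n_2019 - 2
sp2 <- ((n_2015 - 1) * sd_2015^2 + (n_2019 - 1) * sd_2019^2) / dof
se <- sqrt(sp2 * (1 / n_2015 + 1 / n_2019))

# t statistic, two-sided p-value and 95% confidence interval for mean_2015 - mean_2019
mean_diff <- mean_2015 - mean_2019
t_stat <- mean_diff / se
p_value <- 2 * pt(-abs(t_stat), dof)
ci <- mean_diff + c(-1, 1) * qt(0.975, dof) * se
print(round(c(t = t_stat, df = dof, p = p_value, lower = ci[1], upper = ci[2]), 4))
```

The recomputed values agree with the t.test() output up to the rounding of the summary statistics, which shows that the reported result is the pooled-variance test on 158 scores for 2015 and 149 for 2019.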

Discussion

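Our investigation found no statistically significant change in the mean country happiness score between 2015 and 2019. The t-test gave t = -0.451 on 305 degrees of freedom with p = 0.6523, well above the conventional 0.05 significance level, so we fail to reject the null hypothesis. The 95% confidence interval for the difference in means (-0.312 to 0.196) also contains 0.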

The strengths of our investigation: The data had a large sample size, as most countries in the world were included in the Gallup World Poll, which strengthens the generalisability of our results. The data was also consistent, as all respondents were asked the same question when reporting their happiness levels.

The limitations of our investigation: The happiness scores are derived from survey answers. Survey responses can be untrustworthy sources of information, as respondents may not answer honestly, may omit answers, or may answer without careful thought. Happiness is also a contested concept whose meaning can vary with cultural context, and comparing life satisfaction across countries with a single metric does not account for these cultural differences. A similar problem arises from variation in what happiness means from one person to another, which makes it hard to compare objectively across members of a population.

Directions for future investigations: Since our investigation did not find a statistically significant change in happiness levels between 2015 and 2019, we suggest that future research explore the question of global happiness levels further using alternative data collection methods. For example, researchers could use focus groups and interviews to obtain more nuanced qualitative data on populations' self-reported happiness. This would remove the reliance on a single metric to quantify a term as ambiguous as happiness and give researchers more flexibility in accounting for cultural differences. Researchers could also examine a longer timeframe, as changes in happiness levels may be better evaluated over decades rather than years.

One take-home message: Though our investigation did not find a statistically significant change in happiness levels between 2015 and 2019, this should not discourage further research from being undertaken. Through the application of different statistical methods and the use of different kinds of data, happiness measures may indeed prove to be a useful tool for future policy-making.

References