Introduction

RPubs Link:

Developed nations with high levels of wealth and health can still have very different levels of happiness among their citizens
There are many factors at play
Understanding differences between similar regions can provide insight into what increases or decreases happiness
This insight can be used for policy making and development
Eastern and Western Europe have similar geographies, landscapes and climates
Cultural, historical and religious differences may have an impact

Problem Statement

- This study investigates whether happiness levels are higher in Western Europe than in Eastern Europe
- A two-sample t-test is used to determine whether there is a statistically significant difference between the mean scores of the two regions


Data

Data was collected from open source data sets available on Kaggle.com
Two data sets were merged to provide all the required variables
World Happiness data for 2018 (SDSN, 2018) comes from the Gallup World Poll
Stratified sampling by household, with either face-to-face or telephone interviews, is used to approximate the population
Information about which region each country belongs to comes from the Countries of the World data set (Lasso, 2018)
This data set uses data collected by the US Census Bureau
The data is subset to include only the observations from Eastern and Western Europe
The sample includes 11 Eastern European and 19 Western European countries
Preprocessing was applied as necessary, including:
- renaming columns
- trimming whitespace on values
- merging data frames on a common variable
- filtering to include only Eastern and Western European results

Code for Reading Data

# Load the packages used throughout this analysis
library(dplyr)    # inner_join, filter, group_by, summarise, %>%
library(stringr)  # str_trim, str_to_title
library(lattice)  # histogram
library(car)      # qqPlot, leveneTest

# Import and Preprocess Data
happy <- read.csv("2018.csv")
country_regions <- read.csv("countries_of_the_world.csv")
happy_score <- happy[, 2:3] # Subset for required columns
regions <- country_regions[, 1:2]
colnames(happy_score) <- c("Country", "Score") # Name columns
regions$Region <- str_trim(regions$Region, side = "right") # Trim trailing whitespace so values match
regions$Country <- str_trim(regions$Country, side = "right")
happy_region <- inner_join(happy_score, regions, by = "Country") # Merge into one data set
happy_region$Region <- str_to_title(happy_region$Region) # Fix all-caps region names
head(happy_region) # Print and check

str(happy_region)

## 'data.frame':    144 obs. of  3 variables:
##  $ Country: chr  "Finland" "Norway" "Denmark" "Iceland" ...
##  $ Score  : num  7.63 7.59 7.55 7.5 7.49 ...
##  $ Region : chr  "Western Europe" "Western Europe" "Western Europe" "Western Europe" ...

# Filter for only East and West Europe
east<- happy_region %>% filter(Region == "Eastern Europe")
west<- happy_region %>% filter(Region == "Western Europe")
east_west<- rbind(east, west)
head(east_west)
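
Because an inner join silently drops rows whose keys do not match, it can be worth listing the countries that fail to match before trusting the regional sample sizes. The following is a minimal sketch in base R, assuming made-up toy data: the country names and scores are hypothetical placeholders, not values from the Kaggle files, and `setdiff` stands in for the dplyr join used in the analysis.

```r
# Hypothetical toy data standing in for the two Kaggle files (not real values)
toy_scores <- data.frame(
  Country = c("Country A", "Country B", "Country C", "Country D"),
  Score   = c(7.1, 6.4, 5.9, 5.2)
)
toy_regions <- data.frame(
  Country = c("Country A ", "Country B ", "Country C "),  # trailing spaces, as in the raw file
  Region  = c("WESTERN EUROPE", "WESTERN EUROPE", "EASTERN EUROPE")
)

# Trim trailing whitespace so the join keys can match
toy_regions$Country <- trimws(toy_regions$Country, which = "right")

# Countries with a score but no region: these would be dropped by an inner join
unmatched <- setdiff(toy_scores$Country, toy_regions$Country)
print(unmatched)
```

Any name printed here would need a spelling fix in one of the data sets before it could be assigned to a region.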

Descriptive Statistics and Visualisation

A boxplot is used to visualise the median and inter-quartile range of each region. No outliers were identified for either region.
Median Happiness Score for Western Europe is higher than Eastern Europe.

#Box plot to show median, IQR and possible outliers 
east_west %>% boxplot(Score ~ Region, data = ., main="Box Plot of Happiness Score by Region", 
                     ylab="Region", xlab="Happiness Score", horizontal=TRUE, col = "skyblue")

Histograms

Side by side histograms allow visualisation of the happiness score distribution.
Eastern European scores are concentrated at lower values than Western European scores.
Western Europe appears negatively skewed while Eastern Europe appears positively skewed.

# Side by Side histogram to show distribution, using lattice package
east_west %>% histogram(~ Score|Region, col="skyblue", data=., xlab="Happiness Score", breaks = 7)

Summary Statistics

# Happiness Level Statistics by Region (East or West Europe)
east_west %>% group_by(Region) %>% summarise(Min = min(Score,na.rm = TRUE),
                                           Q1 = quantile(Score,probs = .25,na.rm = TRUE),
                                           Median = median(Score, na.rm = TRUE),
                                           Q3 = quantile(Score,probs = .75,na.rm = TRUE),
                                           Max = max(Score,na.rm = TRUE),
                                           Mean = mean(Score, na.rm = TRUE),
                                           SD = sd(Score, na.rm = TRUE),
                                           n = n(),
                                           Missing = sum(is.na(Score))) -> table1

knitr::kable(table1)

Region	Min	Q1	Median	Q3	Max	Mean	SD	n	Missing
Eastern Europe	4.586	5.253	5.620	6.0355	6.711	5.631182	0.6188673	11	0
Western Europe	5.358	6.558	6.977	7.4640	7.632	6.885263	0.6974815	19	0
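
As a rough numeric cross-check of the skew seen in the histograms, the mean and median in the table above can be compared directly. The sketch below uses Pearson's second skewness coefficient, 3 × (mean − median) / SD. This is a heuristic added for illustration, not a statistic computed in the original analysis, and the data frame name `summary_stats` is an assumption.

```r
# Values copied from the summary statistics table above
summary_stats <- data.frame(
  Region = c("Eastern Europe", "Western Europe"),
  Mean   = c(5.631182, 6.885263),
  Median = c(5.620, 6.977),
  SD     = c(0.6188673, 0.6974815)
)

# Pearson's second skewness coefficient: the sign shows the direction of skew
summary_stats$Skew <- 3 * (summary_stats$Mean - summary_stats$Median) / summary_stats$SD
print(summary_stats[, c("Region", "Skew")])
```

A positive value points to a longer right tail and a negative value to a longer left tail. With only 11 and 19 observations, the coefficient should be read as indicative only.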

Hypothesis Testing

A one-tailed two-sample t-test is used to analyse whether there is a statistically significant difference between the means
The assumption of normality is tested using Q-Q plots
Homogeneity of variance is tested using Levene’s test

Hypothesis Statement

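Null Hypothesis: \[H_0: \mu_1 = \mu_2 \] (There is no difference in means)
Alternative Hypothesis: \[H_A: \mu_1 < \mu_2 \] (The true difference in means is less than 0)
Here \(\mu_1\) is the population mean happiness score for Eastern Europe and \(\mu_2\) is that for Western Europe.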

The Alternative Hypothesis can be understood as the true mean for happiness score in Eastern Europe is less than that in Western Europe.
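
For reference, the test statistic for these hypotheses can be written out as follows. This is the standard pooled-variance form, which applies only when the two regions are assumed to have equal variances. The symbols \(\bar{x}_i\), \(s_i\) and \(n_i\) (sample mean, sample standard deviation and sample size of region \(i\)) are notation introduced here for illustration.

```latex
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
\text{df} = n_1 + n_2 - 2
```

Under \(H_0\) the statistic follows a t-distribution with \(n_1 + n_2 - 2\) degrees of freedom, and large negative values of \(t\) support \(H_A\).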

Assumption of Normality Test

Q-Q plots are used to check visually for normality
Eastern Europe values all fall within the 95% CI for normal quantiles, so normality can reasonably be assumed.
Western Europe has one value which falls outside the 95% CI for normal quantiles. However, the sample size is slightly larger, all other values are close to the diagonal line, and the value does not fall far outside the 95% CI, so normality is assumed in this case.

# Testing for normality using Q-Q Plot
east$Score %>% qqPlot(dist="norm", main = "Eastern Europe Normality Test") # Eastern Europe Countries

## [1]  1 11

west$Score %>% qqPlot(dist = "norm", main = "Western Europe Normality Test") # Western Europe Countries

## [1] 19 18

Homogeneity of Variance Test

# Testing for homogeneity of variance using Levene's test
leveneTest(Score ~ Region, data = east_west)

The p-value is 0.898. As p > 0.05, we fail to reject the null hypothesis of equal variances, so it is reasonable to assume equal variance.
Based on this result, a two-sample t-test assuming equal variance is performed.
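
To make clear what Levene's test measures, it can be reproduced in base R as a one-way ANOVA on the absolute deviations of each score from its group median, which is the default centring used by `car::leveneTest`. The sketch below is illustrative only: the data frame `toy` holds simulated scores, not the real happiness data, and the simulation parameters are assumptions chosen to resemble the two regions.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated stand-in data (hypothetical, not the real scores)
toy <- data.frame(
  Score  = c(rnorm(11, mean = 5.6, sd = 0.6), rnorm(19, mean = 6.9, sd = 0.7)),
  Region = factor(rep(c("Eastern Europe", "Western Europe"), times = c(11, 19)))
)

# Absolute deviation of each score from its own group's median
group_median <- ave(toy$Score, toy$Region, FUN = median)
toy$abs_dev  <- abs(toy$Score - group_median)

# One-way ANOVA on the deviations: a small p-value suggests unequal spread
levene  <- anova(lm(abs_dev ~ Region, data = toy))
p_value <- levene[["Pr(>F)"]][1]
print(levene)
```

If the groups have similar spread, their average absolute deviations are similar and the F statistic stays small.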

Two-Sample t-test

The two-sample t-test has been selected as it is appropriate for comparing two population means.
It is assumed that the populations tested are independent of each other.
The test is performed with the following R code

# Two-sample t-test
t.test(
  Score ~ Region,
  data = east_west,
  var.equal = TRUE,
  alternative = "less"
  )

## 
##  Two Sample t-test
## 
## data:  Score by Region
## t = -4.937, df = 28, p-value = 1.647e-05
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##       -Inf -0.821965
## sample estimates:
## mean in group Eastern Europe mean in group Western Europe 
##                     5.631182                     6.885263
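
The output above can be cross-checked by hand from the summary statistics alone. The sketch below recomputes the pooled standard error, the t-statistic, the one-sided p-value and the upper confidence bound. The variable names are assumptions introduced for illustration, and the inputs are the means, standard deviations and sample sizes reported in the summary table.

```r
# Summary statistics reported earlier (1 = Eastern Europe, 2 = Western Europe)
n1 <- 11; mean1 <- 5.631182; sd1 <- 0.6188673
n2 <- 19; mean2 <- 6.885263; sd2 <- 0.6974815

# Pooled variance and standard error of the difference in means
df         <- n1 + n2 - 2
pooled_var <- ((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / df
std_error  <- sqrt(pooled_var * (1 / n1 + 1 / n2))

# Test statistic and one-sided (lower-tail) p-value
mean_diff <- mean1 - mean2
t_stat    <- mean_diff / std_error
p_value   <- pt(t_stat, df = df, lower.tail = TRUE)

# Upper bound of the one-sided 95% confidence interval
ci_upper <- mean_diff + qt(0.95, df = df) * std_error

print(c(t = t_stat, p = p_value, upper = ci_upper))
```

The recomputed values should agree with the `t.test` output up to rounding of the inputs.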

Results from one-tailed two-sample t-test

The p-value is less than 0.001. As the p-value < 0.05, a decision is made to reject H0.

The t-statistic is -4.937. The critical t-value is found to be -1.701 using the following R function.

qt(p = 0.05, df = 28, lower.tail = TRUE)

## [1] -1.701131

As the t-statistic is more extreme than the critical t-value, a decision is made to reject H0.

Discussion

Based on both the p-value and the critical-value comparison, a decision is made to reject the null hypothesis. The one-sided 95% CI for the difference in means (Eastern minus Western) is reported as [-Inf, -0.822]. The sample mean difference is 5.63 - 6.89 = -1.25, and the interval does not contain 0, which is consistent with rejecting the null hypothesis.
Limitations include:
- The survey format relies on self-reporting, so respondents may over- or underestimate their happiness compared to others
- The head of the household may not report in a way that reflects other members
- The questionnaire was developed by people within Western European culture and may reflect values that do not accurately describe happiness for Eastern Europeans
Future investigation into the factors creating this difference in happiness levels would be interesting

Final Conclusion

A mean difference of -1.25 points (Eastern minus Western) is found between Happiness Scores in Eastern and Western Europe.
The two-sample t-test found this difference to be statistically significant.
This indicates that the population in Western Europe reported higher levels of overall happiness than the population in Eastern Europe, based on the 2018 Gallup World Poll

References

“World Happiness Report”, Sustainable Development Solutions Network, 2018, accessed from https://www.kaggle.com/unsdsn/world-happiness

“Countries of the World”, Fernando Lasso, 2018, accessed from https://www.kaggle.com/fernandol/countries-of-the-world

Map of Europe, Google Maps, 2020, accessed from https://www.google.com/maps/place/Europe/

Happiness Levels in Eastern Europe compared to Western Europe

Are mean happiness scores higher in Western Europe than in Eastern Europe?