Happiness Levels in Eastern Europe compared to Western Europe

Are mean happiness scores higher in Western Europe than in Eastern Europe?

Felicity Draper, s3742570

Last updated: 14 October, 2020

Introduction

RPubs Link:

Problem Statement

- This study aims to investigate if happiness levels are higher in Western Europe than in Eastern Europe.
- A two-sample t-test will be used to determine if there is a statistically significant difference between the mean scores of the two regions.

Data

Code for Reading Data

# Load the packages used throughout the analysis
library(stringr) # str_trim(), str_to_title()
library(dplyr)   # inner_join(), filter(), group_by(), summarise()
library(lattice) # histogram()
library(car)     # qqPlot(), leveneTest()

# Import and Preprocess Data
happy <- read.csv("2018.csv")
country_regions <- read.csv("countries_of_the_world.csv")
happy_score <- happy[, 2:3] # Subset for required columns
regions <- country_regions[, 1:2]
colnames(happy_score) <- c("Country", "Score") # Name columns
regions$Region <- str_trim(regions$Region, side = "right") # Remove trailing whitespace so the strings match
regions$Country <- str_trim(regions$Country, side = "right")
happy_region <- inner_join(happy_score, regions, by = "Country") # Merge into one data set
happy_region$Region <- str_to_title(happy_region$Region) # Fix all caps
head(happy_region) # print and check
str(happy_region)
## 'data.frame':    144 obs. of  3 variables:
##  $ Country: chr  "Finland" "Norway" "Denmark" "Iceland" ...
##  $ Score  : num  7.63 7.59 7.55 7.5 7.49 ...
##  $ Region : chr  "Western Europe" "Western Europe" "Western Europe" "Western Europe" ...
# Filter for only East and West Europe
east<- happy_region %>% filter(Region == "Eastern Europe")
west<- happy_region %>% filter(Region == "Western Europe")
east_west<- rbind(east, west)
head(east_west)
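
An inner join keeps only the countries whose names match exactly in both files, so rows can be dropped silently when the two sources spell a name differently. The following is a minimal sketch of how such mismatches could be listed in base R. The data frames `scores` and `lookup` and all values in them are made up for illustration and are not taken from the data sets.

```r
# Hypothetical toy data standing in for happy_score and regions
scores <- data.frame(
  Country = c("Finland", "Czech Republic", "Trinidad & Tobago"),
  Score   = c(7.6, 6.7, 6.2)
)
lookup <- data.frame(
  Country = c("Finland", "Czech Republic", "Trinidad and Tobago"),
  Region  = c("Western Europe", "Eastern Europe", "Latin America")
)

# Countries in the score table that have no exact match in the lookup
unmatched <- setdiff(scores$Country, lookup$Country)
print(unmatched)

# merge() with its default all = FALSE acts as an inner join on Country
joined <- merge(scores, lookup, by = "Country")
nrow(joined)
```

Running the same `setdiff()` check on `happy_score$Country` and `regions$Country` would show which countries, if any, were lost in the merge.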

Descriptive Statistics and Visualisation

#Box plot to show median, IQR and possible outliers 
east_west %>% boxplot(Score ~ Region, data = ., main="Box Plot of Happiness Score by Region", 
                     ylab="Region", xlab="Happiness Score", horizontal=TRUE, col = "skyblue")

Histograms

# Side by Side histogram to show distribution, using lattice package
east_west %>% histogram(~ Score|Region, col="skyblue", data=., xlab="Happiness Score", breaks = 7)

Summary Statistics

# Happiness Level Statistics by Region (East or West Europe)
east_west %>% group_by(Region) %>% summarise(Min = min(Score,na.rm = TRUE),
                                           Q1 = quantile(Score,probs = .25,na.rm = TRUE),
                                           Median = median(Score, na.rm = TRUE),
                                           Q3 = quantile(Score,probs = .75,na.rm = TRUE),
                                           Max = max(Score,na.rm = TRUE),
                                           Mean = mean(Score, na.rm = TRUE),
                                           SD = sd(Score, na.rm = TRUE),
                                           n = n(),
                                           Missing = sum(is.na(Score))) -> table1

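knitr::kable(table1)

| Region         | Min   | Q1    | Median | Q3     | Max   | Mean     | SD        | n  | Missing |
|:---------------|------:|------:|-------:|-------:|------:|---------:|----------:|---:|--------:|
| Eastern Europe | 4.586 | 5.253 | 5.620  | 6.0355 | 6.711 | 5.631182 | 0.6188673 | 11 | 0       |
| Western Europe | 5.358 | 6.558 | 6.977  | 7.4640 | 7.632 | 6.885263 | 0.6974815 | 19 | 0       |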

Hypothesis Testing

Hypothesis Statement

Null Hypothesis: \[H_0: \mu_1 = \mu_2 \] (There is no difference in means)

Alternative Hypothesis: \[H_A: \mu_1 < \mu_2 \] (The true difference in means is less than 0)

Here \(\mu_1\) is the true mean happiness score in Eastern Europe and \(\mu_2\) is the true mean in Western Europe, so the Alternative Hypothesis states that the true mean happiness score in Eastern Europe is less than that in Western Europe.

Assumption of Normality Test

# Testing for normality using Q-Q plots (qqPlot() from the car package prints the indices of the two most extreme points)
east$Score %>% qqPlot(dist="norm", main = "Eastern Europe Normality Test") # Eastern Europe Countries

## [1]  1 11
west$Score %>% qqPlot(dist = "norm", main = "Western Europe Normality Test") # Western Europe Countries

## [1] 19 18

Homogeneity of Variance Test

# Testing for homogeneity of variance using Levene's test
leveneTest(Score ~ Region, data = east_west)
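
As a rough illustration of what Levene's test does, the sketch below reproduces a median-centred version by hand in base R. It takes each score's absolute deviation from its group median and runs a one-way ANOVA on those deviations. The data frame `toy` and its values are made up for illustration, and the sketch assumes that `leveneTest()` centres on the median by default.

```r
# Hypothetical toy data: two groups with made-up scores
toy <- data.frame(
  Score  = c(4.6, 5.3, 5.6, 6.0, 6.7, 5.4, 6.6, 7.0, 7.5, 7.6),
  Region = factor(rep(c("Eastern Europe", "Western Europe"), each = 5))
)

# Absolute deviation of each score from its own group median
group_median <- ave(toy$Score, toy$Region, FUN = median)
toy$AbsDev <- abs(toy$Score - group_median)

# One-way ANOVA on the absolute deviations; its F test is the Levene statistic
levene_fit <- anova(lm(AbsDev ~ Region, data = toy))
levene_p <- levene_fit[["Pr(>F)"]][1]
print(levene_fit)

# TRUE here would be consistent with treating the variances as equal
levene_p > 0.05
```

A p-value above 0.05 means the null hypothesis of equal variances is not rejected, which is the condition under which `var.equal = TRUE` is a reasonable choice in the t-test.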

Two-Sample t-test

# Two-sample t-test
t.test(
  Score ~ Region,
  data = east_west,
  var.equal = TRUE,
  alternative = "less"
  )
## 
##  Two Sample t-test
## 
## data:  Score by Region
## t = -4.937, df = 28, p-value = 1.647e-05
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##       -Inf -0.821965
## sample estimates:
## mean in group Eastern Europe mean in group Western Europe 
##                     5.631182                     6.885263
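
The t statistic and the confidence bound can be reproduced from the summary statistics alone. The sketch below applies the pooled-variance formulas to the means, standard deviations and group sizes reported in the Summary Statistics table. The variable names are chosen here for illustration.

```r
# Summary statistics copied from the Summary Statistics table
n_east <- 11; mean_east <- 5.631182; sd_east <- 0.6188673
n_west <- 19; mean_west <- 6.885263; sd_west <- 0.6974815

# Pooled standard deviation under the equal-variance assumption
df <- n_east + n_west - 2
sd_pooled <- sqrt(((n_east - 1) * sd_east^2 + (n_west - 1) * sd_west^2) / df)

# Standard error of the difference in means, then the t statistic
se_diff <- sd_pooled * sqrt(1 / n_east + 1 / n_west)
t_stat <- (mean_east - mean_west) / se_diff

# Upper bound of the one-sided 95% confidence interval
upper <- (mean_east - mean_west) + qt(0.95, df) * se_diff

round(c(t = t_stat, df = df, upper = upper), 3)
```

Up to rounding, the values should agree with the `t.test()` output: t = -4.937 on 28 degrees of freedom and an upper bound of about -0.822.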

Results from one-tailed two-sample t-test

The p-value is less than 0.001. As the p-value is below the 0.05 significance level, the decision is made to reject H0.

The t statistic is -4.937. The critical t value is found to be -1.701 using the following R function.

qt(p = 0.05, df = 28, lower.tail = TRUE)
## [1] -1.701131

As the t statistic is more extreme than the critical t value, the decision is made to reject H0.
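
The critical-value rule and the p-value rule lead to the same decision for a given test. A short check in base R, using the rounded t statistic and degrees of freedom from the `t.test()` output (the variable names are illustrative):

```r
t_stat <- -4.937 # t statistic reported by t.test()
df <- 28         # degrees of freedom
alpha <- 0.05    # significance level

# Lower-tail critical value and p-value for the one-sided test
t_crit <- qt(alpha, df)
p_value <- pt(t_stat, df)

# Both rules should give the same decision
reject_by_critical <- t_stat < t_crit
reject_by_p <- p_value < alpha
c(reject_by_critical, reject_by_p)
```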

Discussion

Final Conclusion

References

“World Happiness Report”, Sustainable Development Solutions Network, 2018, accessed from https://www.kaggle.com/unsdsn/world-happiness

“Countries of the World”, Fernando Lasso, 2018, accessed from https://www.kaggle.com/fernandol/countries-of-the-world

Map of Europe, Google Maps, 2020, accessed from https://www.google.com/maps/place/Europe/