Introduction



Problem Statement

\[H_0: \mu_1 - \mu_2 = 0 \]

-   Null Hypothesis: The difference in the mean happiness score between the West and the Rest of the world is 0 (\(\mu_1\) and \(\mu_2\) denote the mean happiness scores of the two groups)

\[H_A: \mu_1 - \mu_2 \ne 0\]

-   Alternative Hypothesis: The difference in the mean happiness score between the West and the Rest of the world is not 0



Data

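library(readr)  # read_csv()
library(dplyr)  # glimpse(), rename(), left_join(), select(), summarise(), %>%
library(car)    # qqPlot(), leveneTest()
WorldHappinessReport2019 <- read_csv("2019.csv")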
glimpse(WorldHappinessReport2019)
## Rows: 156
## Columns: 9
## $ `Overall rank`                 <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,…
## $ `Country or region`            <chr> "Finland", "Denmark", "Norway", "Icela…
## $ Score                          <dbl> 7.769, 7.600, 7.554, 7.494, 7.488, 7.4…
## $ `GDP per capita`               <dbl> 1.340, 1.383, 1.488, 1.380, 1.396, 1.4…
## $ `Social support`               <dbl> 1.587, 1.573, 1.582, 1.624, 1.522, 1.5…
## $ `Healthy life expectancy`      <dbl> 0.986, 0.996, 1.028, 1.026, 0.999, 1.0…
## $ `Freedom to make life choices` <dbl> 0.596, 0.592, 0.603, 0.591, 0.557, 0.5…
## $ Generosity                     <dbl> 0.153, 0.252, 0.271, 0.354, 0.322, 0.2…
## $ `Perceptions of corruption`    <dbl> 0.393, 0.410, 0.341, 0.118, 0.298, 0.3…
countryclass <- read_csv( "countryclass.csv")
glimpse(countryclass)
## Rows: 253
## Columns: 2
## $ Country        <chr> "Andorra", "United Arab Emirates", "Afghanistan", "Ant…
## $ Classification <chr> "West", "Rest of the world", "Rest of the world", "Res…
# 1. Convert the country and classification columns to factors
countryclass$Country <- factor(countryclass$Country)
countryclass$Classification <- factor(countryclass$Classification)
WorldHappinessReport2019$`Country or region` <- factor(WorldHappinessReport2019$`Country or region`)
# 2. Count the missing values in each column
colSums(is.na(WorldHappinessReport2019)) # there are no missing values in the original data set
##                 Overall rank            Country or region 
##                            0                            0 
##                        Score               GDP per capita 
##                            0                            0 
##               Social support      Healthy life expectancy 
##                            0                            0 
## Freedom to make life choices                   Generosity 
##                            0                            0 
##    Perceptions of corruption 
##                            0
# 3. Rename the country column to match countryclass, then join the classification onto the scores
WorldHappinessReport2019 <- WorldHappinessReport2019 %>% rename(Country = "Country or region")
WHI_2019 <- WorldHappinessReport2019 %>% left_join(countryclass, by = "Country")
# 4. Keep only the columns needed for the analysis
WHI_2019 <- WHI_2019 %>% select(Country, Score, Classification)
head(WHI_2019)                   
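
A left join keeps every row of the happiness data, so a country whose name is spelled differently in countryclass.csv would silently receive a missing classification. The sketch below shows one way to list such unmatched rows with `anti_join()`. It uses small made-up tibbles: the `happiness` and `classes` objects and the country "Atlantis" are hypothetical and not part of the data.

```r
library(dplyr)

# Hypothetical toy data standing in for the two CSV files
happiness <- tibble(
  Country = c("Finland", "Denmark", "Atlantis"),
  Score   = c(7.769, 7.600, 5.000)
)
classes <- tibble(
  Country        = c("Finland", "Denmark"),
  Classification = c("West", "West")
)

# Rows of `happiness` with no matching Country in `classes`
unmatched <- anti_join(happiness, classes, by = "Country")
print(unmatched)

# After a left join, the same rows show up with a missing Classification
joined <- left_join(happiness, classes, by = "Country")
n_missing_class <- sum(is.na(joined$Classification))
```

Running the same `anti_join()` on the real data frames before the join would show whether any country names need to be harmonised.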



Descriptive Statistics and Visualisation

Summary Statistics

Summary statistics of the happiness scores for the two groups are displayed below:

table1 <- WHI_2019 %>%
  group_by(Classification) %>%
  summarise(
    Min = min(Score, na.rm = TRUE),
    Q1 = quantile(Score, probs = .25, na.rm = TRUE),
    Median = median(Score, na.rm = TRUE),
    Q3 = quantile(Score, probs = .75, na.rm = TRUE),
    Max = max(Score, na.rm = TRUE),
    Mean = mean(Score, na.rm = TRUE),
    SD = sd(Score, na.rm = TRUE),
    n = n(),
    Missing = sum(is.na(Score))
  )
knitr::kable(table1)
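| Classification    |   Min |    Q1 | Median |    Q3 |   Max |     Mean |       SD |   n | Missing |
|:------------------|------:|------:|-------:|------:|------:|---------:|---------:|----:|--------:|
| Rest of the world | 2.853 | 4.456 |  5.191 | 5.886 | 7.554 | 5.119656 | 1.000460 | 125 |       0 |
| West              | 5.011 | 6.058 |  6.726 | 7.159 | 7.769 | 6.566129 | 0.729489 |  31 |       0 |
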
  • A review of the summary statistics indicates that the mean score of the West (6.57) is higher than that of the Rest of the world (5.12).

  • The Missing column confirms again that there are no missing scores in either group

  • The median for the West is about 0.16 points higher than the mean, suggesting a left skew in the distribution

  • The minimum for the Rest of the world (2.853) is more than 2 points lower than the minimum for the West (5.011), indicating a longer lower tail in that group
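
The skew suggested by comparing mean and median can be put on a rough numeric scale. The sketch below is an illustrative addition rather than part of the original analysis: it recomputes the mean-minus-median difference from the table values and then Pearson's second (median) skewness coefficient, 3 × (mean − median) / SD. The data frame `stats` is a hypothetical helper holding values copied from the summary table.

```r
# Values copied from the summary table above
stats <- data.frame(
  Classification = c("Rest of the world", "West"),
  Mean   = c(5.119656, 6.566129),
  Median = c(5.191, 6.726),
  SD     = c(1.000460, 0.729489)
)

# Difference between mean and median: negative values suggest a left skew
stats$MeanMinusMedian <- stats$Mean - stats$Median

# Pearson's second (median) skewness coefficient: 3 * (mean - median) / SD
stats$PearsonSkew <- 3 * stats$MeanMinusMedian / stats$SD

print(stats)
```

Both coefficients come out negative, with the West further from zero, which is consistent with a left skew that is more pronounced in the smaller group.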



Box plots

Let’s investigate the distributions visually through a pair of box plots:

WHI_2019 %>% boxplot(Score ~ Classification, data = ., ylab = "Happiness Score", col=c('pink', 'sky blue'))

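  • A visual inspection of the box plots also indicates a difference in central tendency: the box plots display medians, and the median score for the West is clearly higher than that of the Rest of the world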

  • The box plot for the West highlights the left skew

  • There are no outliers in the scores for the two groups
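
The absence of outliers can be cross-checked against the usual box plot rule, which flags points more than 1.5 × IQR beyond the quartiles. The sketch below applies that rule to the quartiles in the summary table. It is an approximation under a stated assumption: `boxplot()` computes its hinges with `fivenum()`, which can differ slightly from the `quantile()` values used in the table. The data frame `q` is a hypothetical helper.

```r
# Quartiles, minima and maxima copied from the summary table
q <- data.frame(
  Classification = c("Rest of the world", "West"),
  Min = c(2.853, 5.011),
  Q1  = c(4.456, 6.058),
  Q3  = c(5.886, 7.159),
  Max = c(7.554, 7.769)
)

iqr <- q$Q3 - q$Q1
q$LowerFence <- q$Q1 - 1.5 * iqr
q$UpperFence <- q$Q3 + 1.5 * iqr

# TRUE if the most extreme value on either side lies beyond a fence
q$HasOutlier <- q$Min < q$LowerFence | q$Max > q$UpperFence

print(q)
```

For both groups the minimum and maximum fall inside the fences, which agrees with the box plots showing no outlying points.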



Q-Q plots

While not necessary for this analysis due to the sample sizes being greater than 30 (n = 31 and n = 125 for West and Rest of the World respectively), we will plot Q-Q plots for both groups to further investigate for normality.

HS_west <- WHI_2019 %>% filter(Classification == "West")
HS_west$Score %>% qqPlot(dist="norm", main = "QQ plot - Happiness Scores - West", col= 'dark blue', col.lines = 'sky blue')

## [1] 31 30
HS_ROW <- WHI_2019 %>% filter(Classification == "Rest of the world")
HS_ROW$Score %>% qqPlot(dist="norm", main = "QQ plot - Happiness Scores - Rest of the World", col= 'red', col.lines = 'pink')

## [1] 1 2
  • The data points fall close to the diagonal line in both groups, indicating that the scores are approximately normally distributed overall.

  • However, a slight ‘s’ shape is observed in both groups, indicating some departure from normality in the tails

  • This departure is not a concern here: by the Central Limit Theorem, when the sample size is large, the sampling distribution of the mean is approximately normal regardless of the underlying population distribution
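
The Central Limit Theorem argument can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not part of the original analysis: it draws from an exponential population (far more skewed than the happiness scores), uses an arbitrary seed and number of replications, and defines a hypothetical helper `skewness()`. It compares the skewness of raw draws with the skewness of means of samples of size 31, the size of the smaller group.

```r
set.seed(2019)  # arbitrary seed, for reproducibility only

# Moment-based sample skewness; close to 0 for a symmetric distribution
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

n_small <- 31      # size of the smaller group in this report
n_reps  <- 10000   # number of simulated samples (assumption)

# A strongly right-skewed population: exponential with rate 1
population_draws <- rexp(n_reps)

# Mean of each of n_reps samples of size n_small
sample_means <- replicate(n_reps, mean(rexp(n_small)))

skew_population <- skewness(population_draws)
skew_means      <- skewness(sample_means)

cat("Skewness of raw draws:   ", round(skew_population, 2), "\n")
cat("Skewness of sample means:", round(skew_means, 2), "\n")
```

The sample means are much less skewed than the raw draws. Because the happiness scores depart from normality far less than an exponential distribution does, the normal approximation for their group means should be better still.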



Homogeneity of Variance

Levene’s test is used to test the assumption of equal variance.

  • We will use the leveneTest() function from the car package to compare the variances of the two groups

  • The null hypothesis is that the two groups have equal variances; if the variances are not equal, we would expect a statistically significant result (p-value < 0.05) from leveneTest()

leveneTest(Score ~ Classification , data = WHI_2019)
  • The p-value for Levene’s test of equal variance is 0.0826 (> 0.05)

  • As the p-value > 0.05, we fail to reject the null hypothesis of equal variances, so it is reasonable to assume equal variance.
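
To make the mechanics of the test concrete: with its default `center = median`, `leveneTest()` amounts to a one-way ANOVA on the absolute deviations of each observation from its group median. The sketch below reproduces that idea in base R on made-up data. The function `levene_sketch()` and the vectors `y`, `y2` and `g` are hypothetical names introduced for illustration, not part of the original analysis.

```r
# Sketch of a median-centred Levene test using base R only
levene_sketch <- function(y, g) {
  g <- factor(g)
  # Absolute deviation of each observation from its own group median
  abs_dev <- abs(y - ave(y, g, FUN = median))
  # One-way ANOVA on the absolute deviations
  tab <- anova(lm(abs_dev ~ g))
  list(F = tab[1, "F value"], p = tab[1, "Pr(>F)"])
}

g <- rep(c("A", "B"), each = 5)

# Hypothetical data: same spread in both groups, different location
y <- c(1, 2, 3, 4, 5, 11, 12, 13, 14, 15)
res_equal <- levene_sketch(y, g)

# Hypothetical data: group B is far more spread out than group A
y2 <- c(1, 2, 3, 4, 5, 1, 6, 11, 16, 21)
res_unequal <- levene_sketch(y2, g)

print(res_equal)
print(res_unequal)
```

A shift in location alone leaves the test statistic at zero, while a difference in spread produces a larger F and a smaller p-value, which is the behaviour the assumption check relies on.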



Hypothesis Testing

Our hypotheses for the test are as follows:

\[H_0: \mu_1 - \mu_2 = 0 \]

-   Null Hypothesis: The difference in the mean happiness score between the West and the Rest of the world is 0

\[H_A: \mu_1 - \mu_2 \ne 0\]

-   Alternative Hypothesis: The difference in the mean happiness score between the West and the Rest of the world is not 0



Two-sample t-test - Assuming Equal Variance

We will perform a two-sample t-test with equal variance and a two-sided hypothesis test.

t.test(
  Score ~ Classification,
  data = WHI_2019,
  var.equal = TRUE,
  alternative = "two.sided"
  )
## 
##  Two Sample t-test
## 
## data:  Score by Classification
## t = -7.5589, df = 154, p-value = 3.402e-12
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.824503 -1.068443
## sample estimates:
## mean in group Rest of the world              mean in group West 
##                        5.119656                        6.566129
  • Using the p-value method to test the hypothesis, as the p-value = 3.402e-12 < 0.05, we reject H0.

  • There is a statistically significant difference between the mean happiness scores of the West and the Rest of the World. The 95% confidence interval for the difference (Rest of the world minus West) runs from -1.82 to -1.07 and does not contain 0.
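
The t statistic and confidence interval reported by t.test() can be reproduced by hand from the summary statistics, which makes the pooled-variance calculation explicit. The sketch below uses the group sizes, means and standard deviations from the summary table; the variable names are hypothetical.

```r
# Summary statistics copied from the table in this report
n1 <- 125; mean1 <- 5.119656; sd1 <- 1.000460   # Rest of the world
n2 <- 31;  mean2 <- 6.566129; sd2 <- 0.729489   # West

df <- n1 + n2 - 2

# Pooled standard deviation (equal-variance assumption)
sd_pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / df)

# Standard error of the difference in means
se_diff <- sd_pooled * sqrt(1 / n1 + 1 / n2)

# t statistic, two-sided p-value and 95% confidence interval
t_stat  <- (mean1 - mean2) / se_diff
p_value <- 2 * pt(-abs(t_stat), df)
ci      <- (mean1 - mean2) + c(-1, 1) * qt(0.975, df) * se_diff

cat("t =", round(t_stat, 4), " df =", df, " p =", signif(p_value, 4), "\n")
cat("95% CI:", round(ci, 4), "\n")
```

Up to rounding of the inputs, the result should agree with the t.test() output above (t = -7.5589 on 154 degrees of freedom).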



Discussion



References

  1. Worldhappiness.report. (2019). Home. [online] Available at: https://worldhappiness.report/.

  2. kaggle.com. (n.d.). World Happiness Report. [online] Available at: https://www.kaggle.com/unsdsn/world-happiness?select=2019.csv.

  3. Gallup, Inc. (2014). How Does the Gallup World Poll Work? [online] Gallup.com. Available at: https://www.gallup.com/178667/gallup-world-poll-work.aspx.

  4. Wikipedia Contributors (2019). East–West dichotomy. [online] Wikipedia. Available at: https://en.wikipedia.org/wiki/East%E2%80%93West_dichotomy [Accessed 22 Oct. 2020].