Introduction

A statistical investigation into the age-old adage “Money can’t buy happiness.”
In 2012 the United Nations adopted Bhutan’s use of Gross National Happiness rather than Gross Domestic Product as a main indicator of development (Centre for Bhutan Studies, 2017).
For each country, a sample of 2,000 to 3,000 respondents was used to collate data for the positive measures of wealth, social support, healthy life expectancy, freedom, generosity and trust in government, and for one negative measure, dystopia (Helliwell, Layard, & Sachs, 2017).
We became particularly interested in how strongly wealth correlates with the perception of happiness around the world, and whether the happiness of a country could be predicted from its GDP per capita.
This report is published on RPubs at the following location:
- http://rpubs.com/ThreeOfTheFive/321000

Problem Statement

The World Happiness 2017 report data was sourced from the United Nations Sustainable Development Solutions Network’s World Happiness Report on https://www.kaggle.com/unsdsn/world-happiness.
We focussed on two variables: Happiness.Score and Economy..GDP.per.Capita. (GDP per capita).
Tests were performed for correlation between these two variables.
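
The correlation test itself is not listed in this report. As a minimal sketch of how such a test could be run, base R's `cor.test()` can be applied to two numeric vectors. The vectors `gdp_toy` and `happy_toy` below are made-up illustrative values, not the World Happiness data.

```r
# Hypothetical toy values standing in for GDP per capita and happiness scores.
gdp_toy   <- c(0.3, 0.5, 0.8, 1.0, 1.2, 1.4, 1.5, 1.7)
happy_toy <- c(3.6, 4.1, 4.4, 5.2, 5.6, 6.0, 6.9, 7.3)

# Pearson correlation with a two-sided test of H0: rho = 0.
cor_result <- cor.test(gdp_toy, happy_toy, method = "pearson")

r_toy <- unname(cor_result$estimate)  # sample correlation coefficient
p_toy <- cor_result$p.value           # p-value for H0: rho = 0
print(c(r = r_toy, p = p_toy))
```

On the real data the same call would take the two dataset columns in place of the toy vectors.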

Data

The dataset published by The World Happiness Report was accessed through the following URL: https://www.kaggle.com/unsdsn/world-happiness
The dataset consists of the following variables: (Helliwell, Huang, and Wang, 2016)
- Rank - each country’s rank in the Happiness Report.
- Country - the country surveyed.
- Happiness.Score - the national average of responses to the life evaluation questions on the survey.
- Economy..GDP.per.Capita. - the Gross Domestic Product per capita for that country.
- Family - confidence in social support, and having friends or relatives available to call on in times of trouble. “If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?”
- Health..Life.Expectancy. - the healthy life expectancy at birth, calculated from World Health Organisation and World Development Indicators data.
- Freedom - freedom to make life choices. “Are you satisfied or dissatisfied with your freedom to choose what you do with your life?”
- Generosity - the national average of responses to “Have you donated money to a charity in the past month?”
- Trust..Government.Corruption. - a measure of the perception of corruption in society, measured from responses to the questions “Is corruption widespread throughout the government or not?” and “Is corruption widespread within businesses or not?”
- Dystopia.Residual - the negative effect derived from worry, sadness and anger, measured from responses to questions on how much each emotion was felt the previous day.
Our investigation focussed on the relationship between the variables Happiness.Score and Economy..GDP.per.Capita. (GDP per capita).

# Load readr (read_csv) and dplyr (%>%, summarise), then read the data file from the project folder.
library(readr)
library(dplyr)
Dataset_for_Assignment_4 <- read_csv("Dataset for Assignment 4.csv")

Descriptive Statistics and Visualisation

Summaries of the Happiness.Score and GDP variables were produced.

Happiness_Score_Summary <- Dataset_for_Assignment_4 %>% 
            summarise(
             Min = min(Happiness.Score,na.rm = TRUE),
             Q1 = quantile(Happiness.Score,probs = .25,na.rm = TRUE),
             Median = median(Happiness.Score, na.rm = TRUE),
             Q3 = quantile(Happiness.Score,probs = .75,na.rm = TRUE),
             Max = max(Happiness.Score,na.rm = TRUE), 
             Mean = mean(Happiness.Score, na.rm = TRUE),
             SD = sd(Happiness.Score, na.rm = TRUE),
             n = n())
Happiness_Score_Summary

There were no outliers found in either of the variables.

Happiness_Score_Outliers <- Dataset_for_Assignment_4 %>% summarise(
                LoFence = (quantile(Happiness.Score,probs = .25)-1.5*IQR(Happiness.Score)) %>% round(1),
                NumLoOuts = sum(Happiness.Score < LoFence) %>% round(0),
                SD_Min2Mean = ((mean(Happiness.Score) - min(Happiness.Score)) / sd(Happiness.Score)) %>% round(1),
                UpFence = (quantile(Happiness.Score,probs = .75)+1.5*IQR(Happiness.Score)) %>% round(1),
                NumUpOuts = sum(Happiness.Score > UpFence) %>% round(0),
                SD_Mean2Max = ((max(Happiness.Score) - mean(Happiness.Score)) / sd(Happiness.Score)) %>% round(1))
Happiness_Score_Outliers
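
The fence calculation is the same for both variables, so it can be written once as a helper. The sketch below uses base R only. The function name `count_outliers` and the vector `toy_scores` are hypothetical, and the fences are compared without the rounding to one decimal place used in this report.

```r
# Count values outside Tukey's k * IQR fences for a numeric vector.
count_outliers <- function(x, k = 1.5) {
  x <- x[!is.na(x)]
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  lo_fence <- q[1] - k * IQR(x)
  up_fence <- q[2] + k * IQR(x)
  c(lo_fence = lo_fence, n_low = sum(x < lo_fence),
    up_fence = up_fence, n_high = sum(x > up_fence))
}

# Made-up scores with one deliberately extreme value (9.9).
toy_scores <- c(4.8, 5.0, 5.1, 5.3, 5.4, 5.6, 5.7, 9.9)
fences_toy <- count_outliers(toy_scores)
print(fences_toy)
```

Calling the helper on each dataset column would give the same fence and count fields as the two summarise blocks.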

Descriptive Statistics Continued

Similarly, no outliers were found in the GDP variable.

Wealth_Score_Summary <- Dataset_for_Assignment_4 %>% 
            summarise(
             Min = min(Economy..GDP.per.Capita.,na.rm = TRUE),
             Q1 = quantile(Economy..GDP.per.Capita.,probs = .25,na.rm = TRUE),
             Median = median(Economy..GDP.per.Capita., na.rm = TRUE),
             Q3 = quantile(Economy..GDP.per.Capita.,probs = .75,na.rm = TRUE),
             Max = max(Economy..GDP.per.Capita.,na.rm = TRUE), 
             Mean = mean(Economy..GDP.per.Capita., na.rm = TRUE),
             SD = sd(Economy..GDP.per.Capita., na.rm = TRUE),
             n = n())
Wealth_Score_Summary

Wealth_Score_Outliers <- Dataset_for_Assignment_4 %>% summarise(
                LoFence = (quantile(Economy..GDP.per.Capita.,probs = .25)-1.5*IQR(Economy..GDP.per.Capita.)) %>% round(1),
                NumLoOuts = sum(Economy..GDP.per.Capita. < LoFence) %>% round(0),
                SD_Min2Mean = ((mean(Economy..GDP.per.Capita.) - min(Economy..GDP.per.Capita.)) 
                               / sd(Economy..GDP.per.Capita.)) %>% round(1),
                UpFence = (quantile(Economy..GDP.per.Capita.,probs = .75)+1.5*IQR(Economy..GDP.per.Capita.)) %>% round(1),
                NumUpOuts = sum(Economy..GDP.per.Capita. > UpFence) %>% round(0),
                SD_Mean2Max = ((  max(Economy..GDP.per.Capita.) - mean(Economy..GDP.per.Capita.)  ) 
                               / sd(Economy..GDP.per.Capita.)) %>% round(1))
Wealth_Score_Outliers

Descriptive Statistics Continued - Boxplot

boxplot(
  Dataset_for_Assignment_4$Economy..GDP.per.Capita.,
  Dataset_for_Assignment_4$Happiness.Score,
  ylab = "Scores Ratings",
  xlab = "Wealth and Happiness"
  )
axis(1, at = 1:2, labels = c("Wealth", "Happiness"))

Descriptive Statistics Continued - Matplot

matplot(t(data.frame(Dataset_for_Assignment_4$Economy..GDP.per.Capita., Dataset_for_Assignment_4$Happiness.Score)),
  type = "b",
  pch = 19,
  col = 1,
  lty = 1,
  xlab = "Comparison",
  ylab = "Score rating",
  xaxt = "n"
  )
axis(1, at = 1:2, labels = c("Wealth Score", "Happiness Score"))

The matplot shows a positive relationship between the Economy and Happiness scores: countries with a higher wealth score tend to have a higher happiness score as well.

Descriptive Statistics Continued - Graphical Anova.

granova.ds(
  data.frame(Dataset_for_Assignment_4$Economy..GDP.per.Capita., Dataset_for_Assignment_4$Happiness.Score),
  xlab = "Wealth Score",
  ylab = "Happiness Score"
  )

##             Summary Stats
## n                 155.000
## mean(x)             0.985
## mean(y)             5.354
## mean(D=x-y)        -4.369
## SD(D)               0.827
## ES(D)              -5.286
## r(x,y)              0.812
## r(x+y,d)           -0.893
## LL 95%CI           -4.500
## UL 95%CI           -4.238
## t(D-bar)          -65.809
## df.t              154.000
## pval.t              0.000

Descriptive Statistics Continued - QQPlot (Happiness)

The Happiness Score was found to be normally distributed as observed in the quantile-quantile plot below, making this variable suitable for a Paired Samples T-Test.

Dataset_for_Assignment_4$Happiness.Score  %>% qqPlot(dist="norm")

Descriptive Statistics Continued - QQPlot (GDP)

Despite some slight skew in the GDP variable, the sample size of 155 makes this variable also suitable for a Paired Samples T-Test.

Dataset_for_Assignment_4$Economy..GDP.per.Capita.  %>% qqPlot(dist="norm")
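
A Q-Q plot is a visual check. A numerical complement, which was not part of the original analysis, is the Shapiro-Wilk test available in base R as `shapiro.test()`. The sketch below runs it on simulated values only; the means, rates and seed are arbitrary assumptions.

```r
set.seed(1)
normal_toy <- rnorm(155, mean = 5, sd = 1)  # simulated, symmetric
skewed_toy <- rexp(155, rate = 1)           # simulated, strongly right-skewed

# H0 for each test: the sample comes from a normal distribution.
sw_normal <- shapiro.test(normal_toy)
sw_skewed <- shapiro.test(skewed_toy)

print(c(normal = sw_normal$p.value, skewed = sw_skewed$p.value))
```

A small p-value is evidence against normality. With 155 observations the test can flag mild departures that matter little for a t-test, so it is best read alongside the plot rather than instead of it.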

Descriptive Statistics Continued - T - Test

A paired-samples t-test was used to test for a significant mean difference between the Economy (GDP per capita) and Happiness scores. The mean difference (Economy minus Happiness) was -4.37 (SD = 0.827). Visual inspection of the Q-Q plots suggested that Happiness.Score was approximately normally distributed, while Economy..GDP.per.Capita. was less clearly so. The paired-samples t-test found a statistically significant mean difference between the two scores, t(154) = -65.8, p < .001, 95% CI [-4.50, -4.24]. The mean Happiness score was therefore significantly higher than the mean Economy score. Because the two variables are measured on different scales, this result alone does not show how happiness changes with wealth.

t.test(Dataset_for_Assignment_4$Economy..GDP.per.Capita., Dataset_for_Assignment_4$Happiness.Score,
       paired = TRUE,
       alternative = "two.sided")

## 
##  Paired t-test
## 
## data:  Dataset_for_Assignment_4$Economy..GDP.per.Capita. and Dataset_for_Assignment_4$Happiness.Score
## t = -65.809, df = 154, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4.500461 -4.238141
## sample estimates:
## mean of the differences 
##               -4.369301
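
A paired t-test is a one-sample t-test on the within-pair differences, which is why the output reports a single "mean of the differences". The sketch below checks this equivalence on simulated vectors; `x_toy` and `y_toy` are assumptions for illustration, not the report's data.

```r
set.seed(42)
x_toy <- rnorm(30, mean = 1, sd = 0.4)
y_toy <- x_toy + rnorm(30, mean = 4, sd = 0.8)

# Paired test on the two vectors versus one-sample test on their differences.
paired_fit <- t.test(x_toy, y_toy, paired = TRUE, alternative = "two.sided")
diff_fit   <- t.test(x_toy - y_toy, mu = 0, alternative = "two.sided")

print(c(paired = unname(paired_fit$statistic),
        differences = unname(diff_fit$statistic)))
```

Both calls give the same t statistic, degrees of freedom and confidence interval.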

Descriptive Statistics Continued - Linear Regression Model

y2 <- Dataset_for_Assignment_4$Happiness.Score^2
x2 <- Dataset_for_Assignment_4$Economy..GDP.per.Capita.^2
xy <- Dataset_for_Assignment_4$Happiness.Score*Dataset_for_Assignment_4$Economy..GDP.per.Capita.
sum_x <- sum(Dataset_for_Assignment_4$Economy..GDP.per.Capita.)
sum_y <- sum(Dataset_for_Assignment_4$Happiness.Score)
sum_x_sq <- sum(Dataset_for_Assignment_4$Economy..GDP.per.Capita.^2)
sum_y_sq <- sum(Dataset_for_Assignment_4$Happiness.Score^2)
sum_xy <- sum(Dataset_for_Assignment_4$Happiness.Score*Dataset_for_Assignment_4$Economy..GDP.per.Capita.)
n <- length(Dataset_for_Assignment_4$Economy..GDP.per.Capita.) #Sample size

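# Corrected sums of squares and cross-products (x = Economy, y = Happiness).
Lxx <- sum_x_sq - ((sum_x^2) / n)
Lyy <- sum_y_sq - ((sum_y^2) / n)
Lxy <- sum_xy - ((sum_x * sum_y) / n)
b <- Lxy / Lxx  # slope of Happiness on Economy; reused below for R-squared

# The fitted model regresses Economy on Happiness, so its slope divides by Lyy.
slope <- Lxy / Lyy
intercept <- mean(Dataset_for_Assignment_4$Economy..GDP.per.Capita.) -
  slope * mean(Dataset_for_Assignment_4$Happiness.Score)

plot(Economy..GDP.per.Capita. ~ Happiness.Score, 
     data = Dataset_for_Assignment_4, xlab = "Happiness Score", ylab = "Economy Score")

# The hand-computed line (red) and the lm() fit (black) should coincide.
abline(a = intercept, b = slope, col = "red")
abline(lm(Economy..GDP.per.Capita. ~ Happiness.Score, data = Dataset_for_Assignment_4))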

HapEconmodel <- lm( Economy..GDP.per.Capita. ~ Happiness.Score, data = Dataset_for_Assignment_4)
HapEconmodel %>% summary()

## 
## Call:
## lm(formula = Economy..GDP.per.Capita. ~ Happiness.Score, data = Dataset_for_Assignment_4)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.90072 -0.16663  0.00354  0.16685  0.61731 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -0.63338    0.09593  -6.603 6.27e-10 ***
## Happiness.Score  0.30222    0.01753  17.238  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2461 on 153 degrees of freedom
## Multiple R-squared:  0.6601, Adjusted R-squared:  0.6579 
## F-statistic: 297.1 on 1 and 153 DF,  p-value: < 2.2e-16

R2 <- (b*Lxy)/Lyy
R2

## [1] 0.6601055
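
The least-squares slope is the cross-product sum divided by the predictor's sum of squares, and the intercept is the response mean minus the slope times the predictor mean. The sketch below checks these hand formulas against `lm()` on simulated data. The names `predictor` and `response` and the generating coefficients are assumptions chosen only for illustration.

```r
set.seed(7)
predictor <- runif(40, min = 3, max = 8)
response  <- -0.6 + 0.3 * predictor + rnorm(40, sd = 0.25)

# Centred sums of squares and cross-products.
Sxx <- sum((predictor - mean(predictor))^2)
Syy <- sum((response - mean(response))^2)
Sxy <- sum((predictor - mean(predictor)) * (response - mean(response)))

slope_hand     <- Sxy / Sxx
intercept_hand <- mean(response) - slope_hand * mean(predictor)
r2_hand        <- Sxy^2 / (Sxx * Syy)  # symmetric in predictor and response

fit_toy <- lm(response ~ predictor)
print(rbind(hand = c(intercept_hand, slope_hand), lm = unname(coef(fit_toy))))
```

R-squared does not depend on which variable is treated as the predictor, whereas the slope and intercept do.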

Hypothesis Testing

H0: The data does not fit the linear regression model

pf(q = 297.1,1,153,lower.tail = FALSE)

## [1] 1.117922e-37

(R2/(1-R2)*(153/1))

## [1] 297.1396

HapEconmodel %>% anova()

We confirm the p-value reported in the summary to be p<.001. As the p-value is less than the 0.05 level of significance, we reject H0. There was statistically significant evidence that the data fits a linear regression model.
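
One way to read this F-test is as a comparison between an intercept-only model and the model with the predictor. The sketch below, on simulated data with assumed names `x_toy` and `y_toy`, shows that the model-comparison F, the F reported by `summary()`, and the F rebuilt from R-squared all agree.

```r
set.seed(11)
x_toy <- runif(50, min = 3, max = 8)
y_toy <- -0.6 + 0.3 * x_toy + rnorm(50, sd = 0.25)

null_fit <- lm(y_toy ~ 1)      # intercept only
full_fit <- lm(y_toy ~ x_toy)  # intercept plus one predictor

f_compare <- anova(null_fit, full_fit)$F[2]
f_summary <- unname(summary(full_fit)$fstatistic["value"])

# F rebuilt from R-squared with 1 and n - 2 degrees of freedom.
r2_toy    <- summary(full_fit)$r.squared
f_from_r2 <- (r2_toy / (1 - r2_toy)) * (full_fit$df.residual / 1)

print(c(compare = f_compare, summary = f_summary, from_r2 = f_from_r2))
```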

Hypothesis Testing Continued

HapEconmodel %>% summary() %>% coef()

##                   Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)     -0.6333762 0.09592844 -6.602591 6.271247e-10
## Happiness.Score  0.3022205 0.01753249 17.237739 1.110391e-37

The intercept/constant is reported as a = -0.633. This value represents the average Economy..GDP.per.Capita. score when the happiness score is equal to 0. To test the statistical significance of the constant, we set the following statistical hypotheses:
\(H_0: \alpha = 0\)
\(H_A: \alpha \neq 0\)
This hypothesis is tested using a t statistic, reported as t = -6.603, p < .001. The constant is statistically significant at the 0.05 level. This means that there is statistically significant evidence that the constant is not 0.

HapEconmodel %>% confint()

##                      2.5 %     97.5 %
## (Intercept)     -0.8228915 -0.4438609
## Happiness.Score  0.2675835  0.3368575

R reports the 95% CI for a to be [-0.8228915, -0.4438609]. The value \(\alpha = 0\) specified by H0 is clearly not captured by this interval, so H0 was rejected.

\(H_0: \beta = 0\), \(H_A: \beta \neq 0\). The slope of the regression line was reported as b = 0.302. A one-unit increase in Happiness Score was related to an average increase in Economy..GDP.per.Capita. of 0.302 units. This is a positive change. We confirm that p < .001. As p < .05, we reject H0. There was statistically significant evidence that Happiness Score was positively related to Economy..GDP.per.Capita.
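
The slope's t statistic and confidence interval can be rebuilt from the estimate and standard error alone. The sketch below uses the values from the coefficient table reported earlier in this section. The variable names are hypothetical, and small differences in the last digits are expected because the inputs are rounded.

```r
# Values copied from the reported coefficient table (Happiness.Score row).
slope_est <- 0.3022205
slope_se  <- 0.01753249
df_resid  <- 153  # n - 2 = 155 - 2

# t statistic, two-sided p-value and 95% confidence interval for the slope.
t_slope  <- slope_est / slope_se
p_slope  <- 2 * pt(abs(t_slope), df = df_resid, lower.tail = FALSE)
ci_slope <- slope_est + c(-1, 1) * qt(0.975, df = df_resid) * slope_se

print(t_slope)
print(ci_slope)
```

In a regression with a single predictor, the square of this t statistic equals the model's F statistic.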

Hypothesis Testing Continued

plot(HapEconmodel)

Discussion

The investigation found that there is a strong correlation between wealth (GDP per capita) and happiness.
It is not clear whether this is purely a correlation or whether there is indeed a causal relationship between wealth and happiness.
Does wealth alone bring increased happiness?
Is the increase in happiness related to other societal factors which improve as the economy of a country grows (without corruption), bringing improved social, educational and health services?
Questions arise from this study which themselves warrant further investigation:
What is the strength of correlation between each of the other variables surveyed and happiness?
Are any effects causal or simply correlational?
Is economic growth in itself correlated with the improvement of the other variables that contribute to the happiness score? Does happiness not come from the wealth alone, but from the flow-on impact of wealth on other factors in society?
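
The first of these questions could be explored by correlating every other measure with the happiness score in one pass. The sketch below does this on a simulated data frame. The column names and generating model are assumptions and do not reflect the survey data.

```r
set.seed(3)
n_toy <- 60
toy <- data.frame(gdp = runif(n_toy), family = runif(n_toy), freedom = runif(n_toy))
# Simulated outcome: depends on gdp and family, not on freedom.
toy$happiness <- 3 + 2 * toy$gdp + 1 * toy$family + rnorm(n_toy, sd = 0.5)

# Pearson correlation of each measure with the outcome.
predictors <- setdiff(names(toy), "happiness")
cors <- sapply(toy[predictors], function(col) cor(col, toy$happiness))
print(sort(cors, decreasing = TRUE))
```

Such a table only ranks linear associations. It does not separate the effect of one measure from that of the others, and it says nothing about causation.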

In summary, a strong correlation exists between wealth and happiness, and it was found that the Gross Domestic Product of a country is a statistically significant predictor of the happiness of that country. The question remains as to whether happiness is derived from wealth alone, or from the benefits to society that wealth has the potential to bring.

References

Centre for Bhutan Studies (2011). Gross National Happiness Index Explained in Detail. [online] Available at: http://www.grossnationalhappiness.com/docs/GNH/PDFs/Sabina_Alkire_method.pdf [Accessed 21 Oct. 2017].
Helliwell, J., Layard, R., & Sachs, J. (2017). World Happiness Report 2017. New York: Sustainable Development Solutions Network. [online] Available at: http://worldhappiness.report/ [Accessed 21 Oct. 2017].
Helliwell, J., Layard, R., & Sachs, J. (2017). World Happiness Report - Frequently Asked Questions. [online] Available at: http://worldhappiness.report/faq/ [Accessed 21 Oct. 2017].
Helliwell, J.F., Huang, H., & Wang, S. (2016). Statistical Appendix for “The Distribution of World Happiness”. [online] Available at: http://worldhappiness.report/wp-content/uploads/sites/2/2016/03/StatisticalAppendixWHR2016.pdf [Accessed 21 Oct. 2017].

MATH1324 Introduction to Statistics Assignment 4

An investigation into the correlation between money and happiness.
