MATH1324 Introduction to Statistics Assignment 4

An investigation into the correlation between money and happiness.

Ling Hou (s3637388), Scott Keene (s3686673), Kamalpreet Khangura (s3688108)

Last updated: 22 October, 2017

Introduction

Problem Statement

Data

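# Load the packages used throughout, then read the data file from the project folder.
library(readr); library(dplyr); library(car); library(granova)
Dataset_for_Assignment_4 <- read_csv("Dataset for Assignment 4.csv")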

Descriptive Statistics and Visualisation

Happiness_Score_Summary <- Dataset_for_Assignment_4 %>% 
            summarise(
             Min = min(Happiness.Score,na.rm = TRUE),
             Q1 = quantile(Happiness.Score,probs = .25,na.rm = TRUE),
             Median = median(Happiness.Score, na.rm = TRUE),
             Q3 = quantile(Happiness.Score,probs = .75,na.rm = TRUE),
             Max = max(Happiness.Score,na.rm = TRUE), 
             Mean = mean(Happiness.Score, na.rm = TRUE),
             SD = sd(Happiness.Score, na.rm = TRUE),
             n = n())
Happiness_Score_Summary

There were no outliers found in either of the variables.

Happiness_Score_Outliers <- Dataset_for_Assignment_4 %>% summarise(
                LoFence = (quantile(Happiness.Score,probs = .25)-1.5*IQR(Happiness.Score)) %>% round(1),
                NumLoOuts = sum(Happiness.Score < LoFence) %>% round(0),
                SD_Min2Mean = ((mean(Happiness.Score) - min(Happiness.Score)) / sd(Happiness.Score)) %>% round(1),
                UpFence = (quantile(Happiness.Score,probs = .75)+1.5*IQR(Happiness.Score)) %>% round(1),
                NumUpOuts = sum(Happiness.Score > UpFence) %>% round(0),
                SD_Mean2Max = ((max(Happiness.Score) - mean(Happiness.Score)) / sd(Happiness.Score)) %>% round(1))
Happiness_Score_Outliers
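
The fence calculation above follows Tukey's 1.5 x IQR rule. As a minimal sketch, the same check can be wrapped in a small helper. The name `tukey_outliers()` and the toy vector are assumptions introduced here for illustration; neither is part of the original analysis.

```r
# Hypothetical helper (not from the original report): flags values that
# fall outside the Tukey fences Q1 - 1.5*IQR and Q3 + 1.5*IQR.
tukey_outliers <- function(v) {
  v <- v[!is.na(v)]
  q <- quantile(v, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  lo_fence <- q[1] - 1.5 * iqr
  up_fence <- q[2] + 1.5 * iqr
  list(
    lo_fence = lo_fence,
    up_fence = up_fence,
    outliers = v[v < lo_fence | v > up_fence]
  )
}

# Toy data (illustrative values only): 100 lies far above the upper fence.
toy_scores <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
toy_result <- tukey_outliers(toy_scores)
toy_result$outliers
```

Unlike the summary code above, this sketch compares each value against the unrounded fences, which avoids misclassifying a value that sits very close to a fence.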

Descriptive Statistics Continued

Similarly, no outliers were found in the GDP variable.

Wealth_Score_Summary <- Dataset_for_Assignment_4 %>% 
            summarise(
             Min = min(Economy..GDP.per.Capita.,na.rm = TRUE),
             Q1 = quantile(Economy..GDP.per.Capita.,probs = .25,na.rm = TRUE),
             Median = median(Economy..GDP.per.Capita., na.rm = TRUE),
             Q3 = quantile(Economy..GDP.per.Capita.,probs = .75,na.rm = TRUE),
             Max = max(Economy..GDP.per.Capita.,na.rm = TRUE), 
             Mean = mean(Economy..GDP.per.Capita., na.rm = TRUE),
             SD = sd(Economy..GDP.per.Capita., na.rm = TRUE),
             n = n())
Wealth_Score_Summary 
Wealth_Score_Outliers <- Dataset_for_Assignment_4 %>% summarise(
                LoFence = (quantile(Economy..GDP.per.Capita.,probs = .25)-1.5*IQR(Economy..GDP.per.Capita.)) %>% round(1),
                NumLoOuts = sum(Economy..GDP.per.Capita. < LoFence) %>% round(0),
                SD_Min2Mean = ((mean(Economy..GDP.per.Capita.) - min(Economy..GDP.per.Capita.)) 
                               / sd(Economy..GDP.per.Capita.)) %>% round(1),
                UpFence = (quantile(Economy..GDP.per.Capita.,probs = .75)+1.5*IQR(Economy..GDP.per.Capita.)) %>% round(1),
                NumUpOuts = sum(Economy..GDP.per.Capita. > UpFence) %>% round(0),
                SD_Mean2Max = ((  max(Economy..GDP.per.Capita.) - mean(Economy..GDP.per.Capita.)  ) 
                               / sd(Economy..GDP.per.Capita.)) %>% round(1))
Wealth_Score_Outliers

Descriptive Statistics Continued - Boxplot

boxplot(
  Dataset_for_Assignment_4$Economy..GDP.per.Capita.,
  Dataset_for_Assignment_4$Happiness.Score,
  names = c("Wealth", "Happiness"),  # label the groups directly instead of overdrawing the default axis
  ylab = "Score",
  xlab = "Wealth and Happiness"
  )

Descriptive Statistics Continued - Matplot

matplot(t(data.frame(Dataset_for_Assignment_4$Economy..GDP.per.Capita., Dataset_for_Assignment_4$Happiness.Score)),
  type = "b",
  pch = 19,
  col = 1,
  lty = 1,
  xlab = "Comparison",
  ylab = "Score rating",
  xaxt = "n"
  )
axis(1, at = 1:2, labels = c("Wealth Score", "Happiness Score"))

The matplot suggests a positive relationship between the Economy and Happiness scores: countries with a higher wealth score tend to have a higher happiness score as well.

Descriptive Statistics Continued - Graphical Anova.

granova.ds(
  data.frame(Dataset_for_Assignment_4$Economy..GDP.per.Capita., Dataset_for_Assignment_4$Happiness.Score),
  xlab = "Wealth Score",
  ylab = "Happiness Score"
  )

##             Summary Stats
## n                 155.000
## mean(x)             0.985
## mean(y)             5.354
## mean(D=x-y)        -4.369
## SD(D)               0.827
## ES(D)              -5.286
## r(x,y)              0.812
## r(x+y,d)           -0.893
## LL 95%CI           -4.500
## UL 95%CI           -4.238
## t(D-bar)          -65.809
## df.t              154.000
## pval.t              0.000

Descriptive Statistics Continued - QQPlot (Happiness)

The Happiness Score was found to be normally distributed as observed in the quantile-quantile plot below, making this variable suitable for a Paired Samples T-Test.

Dataset_for_Assignment_4$Happiness.Score  %>% qqPlot(dist="norm")

Descriptive Statistics Continued - QQPlot (GDP)

Despite some slight skew in the GDP variable, the sample size of 155 makes this variable also suitable for a Paired Samples T-Test.

Dataset_for_Assignment_4$Economy..GDP.per.Capita.  %>% qqPlot(dist="norm")

Descriptive Statistics Continued - T - Test

A paired-samples t-test was used to test for a significant mean difference between the economy (GDP per capita) scores and the happiness scores. The mean difference (economy minus happiness) was found to be -4.37 (SD = 0.827). Visual inspection of the Q-Q plots suggested that Happiness.Score was approximately normally distributed, while Economy..GDP.per.Capita. was less clearly so. The paired-samples t-test found a statistically significant mean difference between the two scores, t(df = 154) = -65.8, p < .001, 95% CI [-4.50, -4.24]. Happiness scores were found to be significantly higher, on average, than economy scores.

t.test(Dataset_for_Assignment_4$Economy..GDP.per.Capita., Dataset_for_Assignment_4$Happiness.Score,
       paired = TRUE,
       alternative = "two.sided")
## 
##  Paired t-test
## 
## data:  Dataset_for_Assignment_4$Economy..GDP.per.Capita. and Dataset_for_Assignment_4$Happiness.Score
## t = -65.809, df = 154, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4.500461 -4.238141
## sample estimates:
## mean of the differences 
##               -4.369301
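
As a rough sketch of what `t.test(..., paired = TRUE)` computes, the statistic is the mean of the paired differences divided by its standard error, with n - 1 degrees of freedom. The vectors `toy_x` and `toy_y` below are made-up values chosen only for illustration, not the assignment data.

```r
# Toy paired measurements (illustrative values, not the assignment data).
toy_x <- c(1.2, 0.9, 1.5, 1.1, 0.7, 1.3)
toy_y <- c(5.1, 4.8, 6.0, 5.5, 4.2, 5.9)

toy_d  <- toy_x - toy_y                  # paired differences
toy_n  <- length(toy_d)
toy_df <- toy_n - 1                      # degrees of freedom

# t = mean difference / (SD of differences / sqrt(n))
toy_t <- mean(toy_d) / (sd(toy_d) / sqrt(toy_n))
toy_p <- 2 * pt(-abs(toy_t), toy_df)     # two-sided p-value

# Compare with the built-in paired test.
toy_fit <- t.test(toy_x, toy_y, paired = TRUE, alternative = "two.sided")
c(manual = toy_t, builtin = unname(toy_fit$statistic))
```

The two values should agree, which makes it easier to see where the reported t and df come from.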

Descriptive Statistics Continued - Linear Regression Model

# Predictor (x) is the happiness score and response (y) is GDP per capita,
# matching the lm() model Economy..GDP.per.Capita. ~ Happiness.Score fitted below.
x <- Dataset_for_Assignment_4$Happiness.Score
y <- Dataset_for_Assignment_4$Economy..GDP.per.Capita.
sum_x <- sum(x)
sum_y <- sum(y)
sum_x_sq <- sum(x^2)
sum_y_sq <- sum(y^2)
sum_xy <- sum(x * y)
n <- length(x) # Sample size

Lxx <- sum_x_sq - (sum_x^2) / n      # corrected sum of squares of x
Lyy <- sum_y_sq - (sum_y^2) / n      # corrected sum of squares of y
Lxy <- sum_xy - (sum_x * sum_y) / n  # corrected sum of cross-products
b <- Lxy / Lxx                       # slope of y on x
a <- mean(y) - b * mean(x)           # intercept

plot(Economy..GDP.per.Capita. ~ Happiness.Score, 
     data = Dataset_for_Assignment_4, xlab = "Happiness Score", ylab = "Economy Score")

abline(a = a, b = b, col= "red")
abline(lm(Dataset_for_Assignment_4$Economy..GDP.per.Capita. ~ Dataset_for_Assignment_4$Happiness.Score))

HapEconmodel <- lm( Economy..GDP.per.Capita. ~ Happiness.Score, data = Dataset_for_Assignment_4)
HapEconmodel %>% summary()
## 
## Call:
## lm(formula = Economy..GDP.per.Capita. ~ Happiness.Score, data = Dataset_for_Assignment_4)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.90072 -0.16663  0.00354  0.16685  0.61731 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -0.63338    0.09593  -6.603 6.27e-10 ***
## Happiness.Score  0.30222    0.01753  17.238  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2461 on 153 degrees of freedom
## Multiple R-squared:  0.6601, Adjusted R-squared:  0.6579 
## F-statistic: 297.1 on 1 and 153 DF,  p-value: < 2.2e-16
R2 <- (b*Lxy)/Lyy
R2
## [1] 0.6601055
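
The hand calculation can be checked against `lm()` on a small example. The following is a sketch under stated assumptions: `toy_x` (predictor) and `toy_y` (response) are invented values, and all `toy_`-prefixed names are introduced here purely for illustration.

```r
# Toy data (illustrative only): toy_x is the predictor, toy_y the response.
toy_x <- c(3.5, 4.2, 5.0, 5.8, 6.4, 7.1)
toy_y <- c(0.40, 0.75, 0.90, 1.10, 1.35, 1.45)
toy_n <- length(toy_x)

toy_Lxx <- sum(toy_x^2) - sum(toy_x)^2 / toy_n              # corrected SS of x
toy_Lyy <- sum(toy_y^2) - sum(toy_y)^2 / toy_n              # corrected SS of y
toy_Lxy <- sum(toy_x * toy_y) - sum(toy_x) * sum(toy_y) / toy_n  # cross-products

toy_b  <- toy_Lxy / toy_Lxx                 # slope
toy_a  <- mean(toy_y) - toy_b * mean(toy_x) # intercept
toy_R2 <- toy_b * toy_Lxy / toy_Lyy         # coefficient of determination

# The built-in fit should give the same intercept and slope.
toy_lm <- lm(toy_y ~ toy_x)
rbind(manual = c(toy_a, toy_b), lm = unname(coef(toy_lm)))
```

Because R-squared equals `Lxy^2 / (Lxx * Lyy)`, it is the squared correlation and does not depend on which variable is treated as the predictor, whereas the slope and intercept do.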

Hypothesis Testing

H0: The data does not fit the linear regression model

pf(q = 297.1,1,153,lower.tail = FALSE)
## [1] 1.117922e-37
(R2/(1-R2)*(153/1))
## [1] 297.1396
HapEconmodel %>% anova()

We confirm the p-value reported in the summary to be p < .001. As the p-value is less than the 0.05 level of significance, we reject H0. There was statistically significant evidence that the data fit a linear regression model.
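
The link between R-squared and the F statistic used above can be written out step by step. This sketch plugs in the values reported in the model summary; the variable names (`r2_reported`, `df_num`, `df_den`, `f_stat`, `f_pval`) are assumptions introduced for illustration.

```r
# Values taken from the model summary reported above.
r2_reported <- 0.6601055
df_num <- 1     # numerator degrees of freedom (one predictor)
df_den <- 153   # residual degrees of freedom (n - 2)

# F = (R^2 / (1 - R^2)) * (df_den / df_num)
f_stat <- (r2_reported / (1 - r2_reported)) * (df_den / df_num)

# Upper-tail probability of the F distribution gives the p-value.
f_pval <- pf(f_stat, df_num, df_den, lower.tail = FALSE)

c(F = f_stat, p = f_pval)
```

With a single predictor, this F test is equivalent to the two-sided t test on the slope, since F equals the square of the slope's t value.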

Hypothesis Testing Continued

HapEconmodel %>% summary() %>% coef()
##                   Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)     -0.6333762 0.09592844 -6.602591 6.271247e-10
## Happiness.Score  0.3022205 0.01753249 17.237739 1.110391e-37

The intercept/constant is reported as a = -0.633. This value represents the average Economy..GDP.per.Capita. score when the happiness score is equal to 0. To test the statistical significance of the constant, we set the following statistical hypotheses:
\(H_0: \alpha = 0\)
\(H_A: \alpha \neq 0\)
This hypothesis is tested using a t statistic, reported as t = -6.6026, p < .001. The constant is statistically significant at the 0.05 level. This means that there is statistically significant evidence that the constant is not 0.

HapEconmodel %>% confint()
##                      2.5 %     97.5 %
## (Intercept)     -0.8228915 -0.4438609
## Happiness.Score  0.2675835  0.3368575

R reports the 95% CI for a to be [-0.8228915, -0.4438609]. The null value \(\alpha = 0\) is clearly not captured by this interval, so H0 was rejected.
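
The interval from `confint()` can be reproduced by hand as estimate ± critical t value × standard error. The sketch below uses the intercept's estimate and standard error as printed by `coef()` above; the names `est`, `se`, `df_resid`, `t_crit` and `ci_manual` are assumptions introduced for illustration.

```r
# Intercept estimate and standard error, as reported by coef() above.
est <- -0.6333762
se  <- 0.09592844
df_resid <- 153   # residual degrees of freedom from the model summary

t_crit <- qt(0.975, df_resid)               # two-sided 95% critical value
ci_manual <- est + c(-1, 1) * t_crit * se   # lower and upper limits
ci_manual
```

The same recipe applies to the slope by swapping in its estimate and standard error.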

\(H_0: \beta = 0\)
\(H_A: \beta \neq 0\)
The slope of the regression line was reported as b = 0.302. A one-unit increase in Happiness Score was related to an average increase in Economy..GDP.per.Capita. of 0.302 units. This is a positive change. We confirm that p < .001. As p < .05, we reject H0. There was statistically significant evidence that Happiness Score was positively related to Economy..GDP.per.Capita.

Hypothesis Testing Continued

plot(HapEconmodel)

Discussion

In summary, a strong correlation exists between wealth and happiness, and it was found that the Gross Domestic Product of a country is a statistically significant predictor of the happiness of that country. The question remains as to whether happiness is derived from wealth alone, or from the benefits to society that wealth has the potential to bring.

References

  1. Centre for Bhutan Studies, (2011). Gross National Happiness Index Explained in Detail [online] Available at: http://www.grossnationalhappiness.com/docs/GNH/PDFs/Sabina_Alkire_method.pdf [Accessed 21 Oct. 2017].

  2. Helliwell, J., Layard, R., & Sachs, J. (2017). World Happiness Report 2017, New York: Sustainable Development Solutions Network. [online] Available at: http://worldhappiness.report/ [Accessed 21 Oct. 2017].

  3. Helliwell, J., Layard, R., & Sachs, J. (2017). World Happiness Report - Frequently Asked Questions. [online] Available at: http://worldhappiness.report/faq/ [Accessed 21 Oct. 2017].

  4. Helliwell, J.F., Huang, H., & Wang, S. (2016). Statistical Appendix for “The Distribution of World Happiness”. [online] Available at: http://worldhappiness.report/wp-content/uploads/sites/2/2016/03/StatisticalAppendixWHR2016.pdf [Accessed 21 Oct. 2017].