MATH1324 Introduction to Statistics Assignment 3

An analysis of the correlation between Happiness and Healthy Life Expectancy

DHACHAINEE MURUGAYAH (s3794334) RACHEAL RONALD COELHO (s3804448)

Last updated: 27 October, 2019

Introduction

10 Happiest Countries in the World

Problem Statement

Data

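# Packages: readr (read_csv), dplyr (%>%, mutate, summarise), outliers (scores), knitr (kable)
library(readr); library(dplyr); library(outliers); library(knitr)

happy <- read_csv("happy.csv")

# Shorten the two column names used in the analysis
names(happy)[names(happy) == "Happiness score"] <- "HappinessScore"
names(happy)[names(happy) == "Explained by: Healthy life expectancy"] <- "HealthyLifeExpectancy"
head(happy)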

Data Cont.

Descriptive Statistics and Visualisation

# Count missing values in each variable before imputation
x <- happy %>% summarise(MissingValuesInHealthyLifeExpectancy = sum(is.na(HealthyLifeExpectancy)))
y <- happy %>% summarise(MissingValuesInHappinessScore = sum(is.na(HappinessScore)))
data.frame(y, x)

# Replace missing values with the mean of the observed values
happy <- happy %>% mutate(HappinessScore = ifelse(is.na(HappinessScore), mean(HappinessScore, na.rm = TRUE), HappinessScore))
happy <- happy %>% mutate(HealthyLifeExpectancy = ifelse(is.na(HealthyLifeExpectancy), mean(HealthyLifeExpectancy, na.rm = TRUE), HealthyLifeExpectancy))

# Confirm that no missing values remain
x <- happy %>% summarise(MissingValuesInHealthyLifeExpectancy = sum(is.na(HealthyLifeExpectancy)))
y <- happy %>% summarise(MissingValuesInHappinessScore = sum(is.na(HappinessScore)))
data.frame(y, x)
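
The imputation step above replaces each missing value with the mean of the observed values. A minimal sketch of that idea on a made-up vector (the name `scores_toy` and its values are illustrative, not taken from happy.csv). It also shows a known side effect: the mean is preserved, but the standard deviation shrinks.

```r
# Toy vector standing in for a column with gaps (values are made up)
scores_toy <- c(5.2, NA, 6.1, 4.7, NA)

# Count the missing entries before imputing
n_missing_before <- sum(is.na(scores_toy))

# Replace each NA with the mean of the observed values
imputed <- ifelse(is.na(scores_toy), mean(scores_toy, na.rm = TRUE), scores_toy)
n_missing_after <- sum(is.na(imputed))

# Mean imputation keeps the mean but reduces the spread
sd_observed <- sd(scores_toy, na.rm = TRUE)
sd_imputed <- sd(imputed)

print(imputed)
print(c(before = n_missing_before, after = n_missing_after))
print(c(sd_observed = sd_observed, sd_imputed = sd_imputed))
```

Because the spread is understated after imputation, this approach is usually reasonable only when few values are missing.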

Descriptive Statistics cont.

#boxplot
# Side-by-side boxplots of the two variables (note that they are on different scales)
boxplot(happy$HappinessScore, happy$HealthyLifeExpectancy,
        names = c("HappinessScore", "HealthyLifeExpectancy"),
        main = "Boxplot of Happiness Score and Healthy Life Expectancy",
        xlab = "Variable", ylab = "Value", col = c("yellow", "green"))

Descriptive Statistics cont.

# Flag Happiness Score values whose z-score exceeds 3 in absolute value
HappinessScore <- happy$HappinessScore[!is.na(happy$HappinessScore)]
z_score <- HappinessScore %>% scores(type = "z")
HappinessScore[which(abs(z_score) > 3)]
## numeric(0)
# No outliers were found, so the mean replacement below changes nothing
HappinessScore[which(abs(z_score) > 3)] <- mean(HappinessScore, na.rm = TRUE)

boxplot(HappinessScore, main = "Box Plot of Happiness Score", ylab = "Happiness Score",
        horizontal = FALSE, col = "green")
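
The outlier rule used above flags any value lying more than three standard deviations from the mean. A minimal base R sketch of the same rule on made-up values, without the outliers package, assuming that `scores(type = "z")` computes `(x - mean(x)) / sd(x)`. The vector `values_toy` is hypothetical.

```r
# Made-up values: 19 typical scores plus one extreme value in position 20
values_toy <- c(4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 5.4, 4.7, 5.1, 5.0,
                4.9, 5.2, 5.3, 4.8, 5.0, 5.1, 4.9, 5.2, 5.0, 12.0)

# z-score: distance from the mean in units of the sample standard deviation
z_toy <- (values_toy - mean(values_toy)) / sd(values_toy)

# Positions of values with |z| > 3
outlier_idx <- which(abs(z_toy) > 3)

print(outlier_idx)
print(values_toy[outlier_idx])
```

With a sample standard deviation, the largest possible |z| in a sample of size n is (n - 1) / sqrt(n), so the rule cannot flag anything in very small samples.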

Descriptive Statistics cont.

# Summary statistics for each variable; the results are stored under new names
# so that they do not mask the column names
hle_summary <- happy %>% summarise(
  Min = min(HealthyLifeExpectancy, na.rm = TRUE),
  Q1 = quantile(HealthyLifeExpectancy, probs = .25, na.rm = TRUE),
  Median = median(HealthyLifeExpectancy, na.rm = TRUE),
  Q3 = quantile(HealthyLifeExpectancy, probs = .75, na.rm = TRUE),
  Max = max(HealthyLifeExpectancy, na.rm = TRUE),
  Mean = mean(HealthyLifeExpectancy, na.rm = TRUE),
  SD = sd(HealthyLifeExpectancy, na.rm = TRUE)
)

hs_summary <- happy %>% summarise(
  Min = min(HappinessScore, na.rm = TRUE),
  Q1 = quantile(HappinessScore, probs = .25, na.rm = TRUE),
  Median = median(HappinessScore, na.rm = TRUE),
  Q3 = quantile(HappinessScore, probs = .75, na.rm = TRUE),
  Max = max(HappinessScore, na.rm = TRUE),
  Mean = mean(HappinessScore, na.rm = TRUE),
  SD = sd(HappinessScore, na.rm = TRUE)
)

# as.data.frame() so that row names can be set before printing
combination <- as.data.frame(rbind(hs_summary, hle_summary))
rownames(combination) <- c("HappinessScore", "HealthyLifeExpectancy")
kable(round(combination, 2), caption = "Summary table of Happiness Score and Healthy Life Expectancy", row.names = TRUE)
Summary table of Happiness Score and Healthy Life Expectancy

                        Min    Q1  Median    Q3   Max  Mean    SD
HappinessScore         2.85  4.54    5.38  6.18  7.77  5.41  1.11
HealthyLifeExpectancy  0.00  0.55    0.78  0.88  1.14  0.72  0.24

Descriptive Statistics cont.

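# One line per country, joining its Happiness Score (left) to its Healthy Life Expectancy (right).
# The column order matches the axis labels set below.
matplot(t(data.frame(happy$HappinessScore, happy$HealthyLifeExpectancy)),
        type = "b",
        pch = 19,
        col = 1,
        lty = 1,
        xlab = "Comparison",
        ylab = "Value",
        xaxt = "n")
axis(1, at = 1:2, labels = c("HappinessScore", "HealthyLifeExpectancy"))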

Descriptive Statistics cont.

#scatterplot

x <- happy$HappinessScore
y <- happy$HealthyLifeExpectancy
plot(x, y, main = "Happiness Score vs Healthy Life Expectancy",
     xlab = "Happiness Score", ylab = "Healthy Life Expectancy",
     pch = 19, frame = FALSE)
# Overlay the least-squares line of y on x
abline(lm(y ~ x), col = "red", lty = 6)

Hypothesis Testing

Assumptions:

Independence: the observations (one per country) are assumed to be independent of one another.

Linearity: as shown in the scatter plot, there is a positive linear relationship between Happiness Score and Healthy Life Expectancy.
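
The tests that follow refer to H0 without writing it out. One way to state the model and hypotheses is sketched below. The intercept symbol α follows the notation used later in this deck; the slope symbol β, the error term ε and the subscript i (indexing countries) are assumptions introduced here for illustration.

```latex
\text{HappinessScore}_i = \alpha + \beta \,\text{HealthyLifeExpectancy}_i + \varepsilon_i

H_0 : \beta = 0 \quad \text{(no linear association)}
\qquad
H_A : \beta \neq 0 \quad \text{(a linear association exists)}
```

In a regression with a single predictor, the overall F-test and the t-test on the slope both assess this same pair of hypotheses.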

Testing the Overall Model

score <- lm(HappinessScore ~ HealthyLifeExpectancy, data = happy)
summary(score)
## 
## Call:
## lm(formula = HappinessScore ~ HealthyLifeExpectancy, data = happy)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6831 -0.4604  0.0743  0.5230  1.5904 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             2.7832     0.1771   15.72   <2e-16 ***
## HealthyLifeExpectancy   3.6382     0.2331   15.61   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6949 on 154 degrees of freedom
## Multiple R-squared:  0.6127, Adjusted R-squared:  0.6102 
## F-statistic: 243.7 on 1 and 154 DF,  p-value: < 2.2e-16

The p-value for the F-test is very small, F(1, 154) = 243.7, p < .001. Since the p-value is less than the 0.05 level of significance, H0 is rejected. Hence, there is statistically significant evidence of a linear association between Happiness Score and Healthy Life Expectancy.

Linear Regression - R-squared

pf(q = 239.1,1,154,lower.tail = FALSE)
## [1] 3.767232e-33

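The p-value is less than 0.001. Since p is less than the 0.05 level of significance, we reject H0. There was statistically significant evidence that the data fit a linear regression model. The model summary reports R-squared = 0.6127, so Healthy Life Expectancy explains about 61% of the variability in Happiness Score.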
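
In a simple linear regression, the F statistic can be recovered from R-squared and the degrees of freedom, and the correlation coefficient is the square root of R-squared, carrying the sign of the slope. A short sketch using the rounded figures from the model summary (R-squared = 0.6127, 1 and 154 degrees of freedom). The variable names are illustrative, and because the inputs are rounded, the results are approximate.

```r
# Values copied from the model summary output (rounded as printed there)
r_squared <- 0.6127
df_model <- 1
df_resid <- 154

# F = (R^2 / df_model) / ((1 - R^2) / df_resid)
f_stat <- (r_squared / df_model) / ((1 - r_squared) / df_resid)

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_stat, df_model, df_resid, lower.tail = FALSE)

# Pearson correlation: positive root, because the fitted slope is positive
r <- sqrt(r_squared)

print(round(f_stat, 1))
print(p_value < 0.001)
print(round(r, 3))
```

The recovered F of about 243.6 agrees with the reported 243.7 up to the rounding of R-squared.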

Testing Model Parameters

score %>% summary() %>% coef()
##                       Estimate Std. Error  t value     Pr(>|t|)
## (Intercept)           2.783194  0.1770647 15.71852 7.979033e-34
## HealthyLifeExpectancy 3.638231  0.2330765 15.60960 1.544574e-33
score %>% confint()
##                          2.5 %   97.5 %
## (Intercept)           2.433405 3.132984
## HealthyLifeExpectancy 3.177791 4.098671

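The intercept/constant is reported as a = 2.783194, with a 95% CI of [2.433405, 3.132984]. H0: α = 0 is clearly not captured by this interval, so H0 is rejected. The slope for Healthy Life Expectancy is 3.638231, with a 95% CI of [3.177791, 4.098671]; this interval also excludes 0, so the slope is significantly different from zero.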
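
The intervals returned by confint() can be reproduced by hand as estimate ± t critical value × standard error, with the t quantile taken on the residual degrees of freedom. A minimal sketch using the estimates and standard errors printed in the coefficient table above. The object names are illustrative.

```r
# Estimates and standard errors copied from the coefficient table
est <- c(Intercept = 2.783194, HealthyLifeExpectancy = 3.638231)
se <- c(Intercept = 0.1770647, HealthyLifeExpectancy = 0.2330765)
df_resid <- 154

# Two-sided 95% interval uses the 97.5th percentile of the t distribution
t_crit <- qt(0.975, df_resid)

ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(round(ci, 6))

# An interval that excludes 0 corresponds to rejecting H0 at the 5% level
excludes_zero <- ci[, "lower"] > 0 | ci[, "upper"] < 0
print(excludes_zero)
```

Both intervals match the confint() output to the precision of the printed inputs.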

Testing Assumptions

score <- lm(HappinessScore ~ HealthyLifeExpectancy, data = happy)
par(mfrow=c(2,2))
plot(score)

Discussion

Findings:

Strengths:

Limitations:

The directions for future research:

Conclusion:

Happy humans

References

  1. Kaggle, 2019. World Happiness Report 2019. [Online] Available at: https://www.kaggle.com/PromptCloudHQ/world-happiness-report-2019 [Accessed 8 October 2019].

  2. MacMillan, A., 2018. Happiness linked to longer life. [Online] Available at: https://edition.cnn.com/2011/10/31/health/happiness-linked-longer-life/index.html [Accessed 20 October 2019].

  3. McKenzie, D. J., 2014. Happiness - The Highest Form of Health. [Online] Available at: https://www.naturopathiccurrents.com/articles/happiness-highest-form-health [Accessed 10 October 2019].

  4. UNRIC, 2019. The UN and happiness. [Online] Available at: https://www.unric.org/en/happiness/27709-the-un-and-happiness [Accessed 10 October 2019].

  5. World Happiness Report, 2019. World Happiness Report 2019. [Online] Available at: https://worldhappiness.report/ed/2019/ [Accessed 10 October 2019].