MATH1324 Introduction to Statistics Assignment 3

Simple Linear Regression Model - Can happiness be predicted?

Shel Nee Gan (s3746473) Haojun Xu (s3685256) Tianbao Jin (s3696594)

Last updated: 28 October, 2018

Introduction

Problem Statement

Question: Can a country's GDP per capita be used to predict the happiness of its people?

Method:

Data

#read data
World_Happiness_Report<-read.csv("2017.csv")

Data Cont.

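#Subset the data: keep columns 1, 3 and 6 (country, happiness score, GDP per capita)
happiness <- World_Happiness_Report[, c(1, 3, 6)]

#Rename columns 2 and 3 so they are easier to refer to
colnames(happiness)[c(2, 3)] <- c("Happiness Score", "Economy")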

Variables Used in Data:

Descriptive Statistics and Visualisation

A scatter plot is used to visualise the relationship between Economy (GDP per capita) and Happiness Score.

plot(`Happiness Score` ~ Economy, data = happiness, xlab = "Economy", ylab = "Happiness Score")

Descriptive Statistics Cont.

library(magrittr)  # provides the %>% pipe used below
#Use lm() function to fit the linear regression model
happiness_model <- lm(happiness$`Happiness Score` ~ happiness$Economy, data = happiness)
happiness_model %>% summary()
## 
## Call:
## lm(formula = happiness$`Happiness Score` ~ happiness$Economy, 
##     data = happiness)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.88807 -0.45200 -0.05328  0.49425  1.89833 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         3.2032     0.1356   23.62   <2e-16 ***
## happiness$Economy   2.1842     0.1267   17.24   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6617 on 153 degrees of freedom
## Multiple R-squared:  0.6601, Adjusted R-squared:  0.6579 
## F-statistic: 297.1 on 1 and 153 DF,  p-value: < 2.2e-16

The line of best fit is \[Happiness\ Score = 3.203 + 2.184 \times Economy\]
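As a rough illustration of what lm() computes here, a simple regression line has closed-form least squares estimates, \(\hat\beta = S_{xy}/S_{xx}\) and \(\hat\alpha = \bar y - \hat\beta \bar x\). The sketch below checks these formulas against lm() on R's built-in cars data, which is an assumed stand-in chosen only because 2017.csv is not bundled with R; the variable names are illustrative.

```r
# Stand-in data: R's built-in `cars` data set (NOT the happiness data),
# used only to show what lm() computes for a simple linear regression.
x <- cars$speed
y <- cars$dist

# Closed-form ordinary least squares estimates
beta_hat  <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
alpha_hat <- mean(y) - beta_hat * mean(x)

# Fit the same model with lm(); both sets of estimates should agree
fit <- lm(dist ~ speed, data = cars)
manual <- c(alpha_hat, beta_hat)
print(rbind(manual = manual, lm = unname(coef(fit))))
```

The same two formulas, applied to Economy and Happiness Score, give the intercept and slope reported in the summary output.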

Hypothesis Testing for the overall linear regression model

\[H_0: \text{The data do not fit the linear regression model}\] \[H_A: \text{The data fit the linear regression model}\]

#calculate the p-value of the overall F-test (F = 297.1 on 1 and 153 df)
pf(q = 297.1, df1 = 1, df2 = 153, lower.tail = FALSE)
## [1] 1.117922e-37
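Because the model has a single predictor, the overall F statistic can also be recovered from the reported \(R^2\) and the sample size, using \(F = R^2 / \big((1 - R^2)/(n - 2)\big)\). The sketch below assumes n = 155 countries, inferred from the 153 residual degrees of freedom plus the two estimated parameters; any small gap from 297.1 comes from rounding \(R^2\) to four decimals.

```r
# Values taken from the summary() output
r_squared <- 0.6601   # Multiple R-squared
n <- 155              # assumed: 153 residual df + 2 estimated parameters

# Simple regression: F = (R^2 / 1) / ((1 - R^2) / (n - 2))
f_stat  <- (r_squared / 1) / ((1 - r_squared) / (n - 2))
p_value <- pf(f_stat, df1 = 1, df2 = n - 2, lower.tail = FALSE)

f_stat    # close to the reported 297.1
p_value   # far below 0.05, so H0 is rejected
```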

Hypothesis Testing - Model parameters (\(\alpha\)):

\[H_0:\alpha = 0 \] \[H_A: \alpha \ne 0\]

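The summary output reports p < 0.001 for \(\alpha\). As a further check, we calculate the 95% CI for \(\alpha\) using the confint() function; if the interval does not capture 0, H0 is rejected at the 0.05 level: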

happiness_model %>% confint()
##                      2.5 %   97.5 %
## (Intercept)       2.935283 3.471143
## happiness$Economy 1.933859 2.434511
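For an lm fit, confint() uses estimate \(\pm\ t_{0.975,\,153} \times SE\). The sketch below rebuilds the 95% intervals from the rounded estimates and standard errors in the summary output, then computes 99.9% intervals: a two-sided p-value below 0.001 corresponds to a 99.9% CI that excludes 0, so that is the interval that matches the p < 0.001 claim. Small differences from the confint() output come from rounding in the copied values.

```r
# Estimates and standard errors copied (rounded) from the summary() output
est <- c(alpha = 3.2032, beta = 2.1842)
se  <- c(alpha = 0.1356, beta = 0.1267)
df_resid <- 153

# 95% CI: estimate +/- t(0.975, df) * SE
t_95  <- qt(0.975, df = df_resid)
ci_95 <- cbind(lower = est - t_95 * se, upper = est + t_95 * se)

# 99.9% CI: if this also excludes 0, the two-sided p-value is below 0.001
t_999  <- qt(0.9995, df = df_resid)
ci_999 <- cbind(lower = est - t_999 * se, upper = est + t_999 * se)

ci_95
ci_999
```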

Hypothesis Testing - Model parameters (\(\beta\)):

The slope of the regression line was estimated as \(\beta\) = 2.18. This means that a one-unit increase in Economy (GDP per capita) is associated with an average increase of 2.18 units in happiness score. The positive slope indicates that happiness scores tend to rise with GDP per capita.

\[H_0: \beta = 0 \] \[H_A: \beta \ne 0 \]
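The summary output already contains the t-test for \(\beta\); the sketch below reproduces it by hand from the rounded estimate and standard error, with 153 residual degrees of freedom, to show where the t value of 17.24 and its p-value come from.

```r
# Slope estimate and standard error copied (rounded) from the summary() output
beta_hat <- 2.1842
se_beta  <- 0.1267
df_resid <- 153

# Test statistic for H0: beta = 0
t_stat <- (beta_hat - 0) / se_beta

# Two-sided p-value from the t distribution with 153 df
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

t_stat
p_value
```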

Assumptions

Independence is checked through the research design.

Linearity was checked at the beginning of the report using the scatter plot.

The scatter plot shows a positive relationship: as GDP per capita increases, so do happiness scores.

The plot() function is used to obtain a series of plots for checking the diagnostics of the fitted regression model, shown in the following slides.
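The four diagnostic plots can also be drawn together on a single 2 x 2 page. As a minimal sketch, the example below uses R's built-in cars data as an assumed stand-in model; with the real data, happiness_model would take the place of fit.

```r
# Stand-in model on R's built-in `cars` data (NOT the happiness data)
fit <- lm(dist ~ speed, data = cars)

# Residuals vs Fitted (1), Normal Q-Q (2), Scale-Location (3),
# Residuals vs Leverage (5), drawn on one 2 x 2 page
old_par <- par(mfrow = c(2, 2))
plot(fit, which = c(1, 2, 3, 5))
par(old_par)  # restore the previous single-panel layout
```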

Testing Assumptions - Residuals vs. Fitted

happiness_model %>% plot(which = 1)

Testing Assumptions - Normal Q-Q

happiness_model %>% plot(which = 2)

Testing Assumptions - Scale-Location

happiness_model %>% plot(which = 3)

Testing Assumptions - Residuals vs. Leverage

happiness_model %>% plot(which = 5)

Linear Regression - \(R^2\)

r <- cor(happiness$`Happiness Score`,happiness$Economy)
r
## [1] 0.8124688
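For simple linear regression, squaring the correlation coefficient gives the Multiple R-squared from the summary output. The sketch below squares the printed r and also computes the adjusted \(R^2\), assuming n = 155 countries (153 residual df + 2).

```r
r <- 0.8124688   # correlation printed above
n <- 155         # assumed: 153 residual df + 2 estimated parameters

# R^2 = r^2 in simple linear regression
r_squared <- r^2

# Adjusted R^2 with one predictor
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - 2)

# Rounds to 0.6601 and 0.6579, matching the summary() output
round(c(r_squared = r_squared, adjusted = adj_r_squared), 4)
```

About 66% of the variation in happiness score is therefore explained by Economy in this model.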

Interpretation

Summary:

Decision:

Discussion

\[Happiness\ Score = 3.203 + 2.184 \times Economy\]

The simple regression model should be re-estimated with new data in 2-3 years so that the results can be compared over time.

The happiness score is recommended for assessing the happiness level within a nation rather than for comparing nations.

Use the estimated linear regression model to predict the level of happiness of the countries that are not included in this report and compare the predicted happiness score with the actual happiness score.
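A minimal sketch of this comparison, using the rounded coefficients from the summary output. The economy and actual values in new_countries are hypothetical placeholders, not data from the report, and predict_happiness() is an illustrative helper. With the full data, predict(model, newdata = ...) is the usual route, but it only works as intended if the model is refitted without happiness$ inside the formula (for example lm(`Happiness Score` ~ Economy, data = happiness)); with happiness$Economy in the formula, R will typically warn and return predictions for the original 155 countries instead.

```r
# Rounded coefficients from the fitted model:
# Happiness Score = 3.2032 + 2.1842 * Economy
alpha_hat <- 3.2032
beta_hat  <- 2.1842

# Hypothetical helper: point prediction of happiness score from Economy
predict_happiness <- function(economy) {
  alpha_hat + beta_hat * economy
}

# HYPOTHETICAL inputs for illustration only; replace with real values
# for countries that are not in the 2017 data
new_countries <- data.frame(
  economy = c(0.5, 1.0, 1.5),
  actual  = c(4.6, 5.2, 6.4)
)
new_countries$predicted <- predict_happiness(new_countries$economy)
new_countries$residual  <- new_countries$actual - new_countries$predicted
new_countries
```

Large residuals for particular countries would suggest that factors other than GDP per capita matter for them.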
