library(readr)
library(ggplot2)
library(pander)
library(knitr)
library(magrittr)
library(DT)
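# Load the 2017 World Happiness Report data
X2017 <- read_csv("C:/Users/blake/Desktop/Math 325 Notebook/Math 325 Notebook/Data/2017.csv")
# Keep the identifying columns (1-3) and the six explanatory variables (6-11)
X2017 <- X2017[, c(1:3, 6:11)]
# Display copy: round every column except the first two to 2 decimals
X2017Table <- X2017
X2017Table[, c(-1, -2)] <- round(X2017[, c(-1, -2)], 2)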

Background

Every year the United Nations Sustainable Development Solutions Network publishes a World Happiness Report. It ranks each country’s happiness based on six variables: GDP per capita, family, health life expectancy, freedom, generosity, and government trust. In the 2017 report, Norway, Denmark, and Iceland were the three happiest countries, and the United States ranked 14th out of 155 countries. We want to know the order in which the six variables listed above affect a country’s happiness, from most impact to least. This analysis uses simple linear regression, with one model per variable, to answer this question.

Data

datatable(X2017Table, options=list(lengthMenu = c(5, 10, 20)), class = 'hover')

Mathematical Model

The mathematical model for a simple linear regression is as follows:

\(Y_i = \beta_0 + \beta_1X_i + \epsilon_i\), where \(\epsilon_{i} \sim N(0,\sigma^2)\) is the error term.

\(Y_i =\) Happiness Rank (the response used in the plots and regressions below, where 1 is the happiest country)

\(X_i =\) Explanatory Variable (one of the six variables listed in the background)

Formally, the null and alternative hypotheses for each of the six regressions are as follows:

\(H_0: \beta_1 = 0\)

\(H_a: \beta_1 \neq 0\)

The overall level of significance for this analysis is set at:
\(\alpha = 0.05\)

The level of significance for each of the six individual tests is set at:
\(\alpha = 0.008\)
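
The individual level of 0.008 is consistent with dividing the overall level evenly across the six tests (a Bonferroni correction). The source does not name the method, so the short sketch below is an assumption about where the number comes from. The variable names are illustrative only.

```r
# Assumption: the per-test level comes from a Bonferroni correction,
# i.e. the overall level divided by the number of tests.
alpha_overall <- 0.05   # overall (family-wise) significance level
n_tests <- 6            # one simple regression per explanatory variable
alpha_each <- alpha_overall / n_tests
round(alpha_each, 3)    # 0.008
```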

Visualization of Variables

GDP Per Capita

# Scatterplot of happiness rank against GDP per capita with a fitted regression line
ggplot(X2017, aes(Economy..GDP.per.Capita., Happiness.Rank)) +
  geom_point(aes(color = Happiness.Rank)) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(x = "GDP Per Capita", y = "Happiness Rank", title = "How Much GDP Per Capita Affects Happiness") +
  guides(colour = "none")

Family

# Scatterplot of happiness rank against family with a fitted regression line
ggplot(X2017, aes(Family, Happiness.Rank)) +
  geom_point(aes(color = Happiness.Rank)) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(x = "Family Quality", y = "Happiness Rank", title = "How Much Family Quality Affects Happiness") +
  guides(colour = "none")

Health Life Expectancy

# Scatterplot of happiness rank against health life expectancy with a fitted regression line
ggplot(X2017, aes(Health..Life.Expectancy., Happiness.Rank)) +
  geom_point(aes(color = Happiness.Rank)) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(x = "Health Life Expectancy", y = "Happiness Rank", title = "How Much Health Life Expectancy Affects Happiness") +
  guides(colour = "none")

Freedom

# Scatterplot of happiness rank against freedom with a fitted regression line
ggplot(X2017, aes(Freedom, Happiness.Rank)) +
  geom_point(aes(color = Happiness.Rank)) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(x = "Freedom", y = "Happiness Rank", title = "How Much Freedom Affects Happiness") +
  guides(colour = "none")

Generosity

# Scatterplot of happiness rank against generosity with a fitted regression line
ggplot(X2017, aes(Generosity, Happiness.Rank)) +
  geom_point(aes(color = Happiness.Rank)) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(x = "Generosity", y = "Happiness Rank", title = "How Much Generosity Affects Happiness") +
  guides(colour = "none")

Government Trust

# Scatterplot of happiness rank against government trust with a fitted regression line
ggplot(X2017, aes(Trust..Government.Corruption., Happiness.Rank)) +
  geom_point(aes(color = Happiness.Rank)) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(x = "Government Trust", y = "Happiness Rank", title = "How Much Government Trust Affects Happiness") +
  guides(colour = "none")

Quantitative Results

| | GDP Per Capita | Family | Health Life Expectancy | Freedom | Generosity | Government Trust |
|---|---|---|---|---|---|---|
| Intercept | 163.4 | 214.875 | 159.502 | 145.481 | 88.905 | 100.063 |
| Explanatory Variable Coefficient | -86.75 | -115.128 | -17.29 | -165.076 | -44.169 | -179.202 |
| P-value of Coefficient | <2e-16 | <2e-16 | <2e-16 | 1.02e-13 | 0.1 | 1.61e-07 |
| R-squared Value | 0.6614 | 0.5428 | 0.6095 | 0.3043 | 0.01759 | 0.1647 |
| Adjusted R-squared Value | 0.6592 | 0.5398 | 0.607 | 0.2997 | 0.01117 | 0.1592 |

It is worth noting that generosity had a p-value of 0.1, which is above the individual significance level of 0.008. For that regression we fail to reject the null hypothesis. In other words, there is insufficient evidence to conclude that, on its own, generosity has a linear relationship with happiness. It’s also worth noting that GDP per capita, health life expectancy, and family had the highest adjusted R-squared values (0.6592, 0.607, and 0.5398). Freedom and government trust had smaller adjusted R-squared values (0.2997 and 0.1592), while generosity had an adjusted R-squared value of almost zero (0.01117).
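
The code that produced these numbers is not shown, so the following is only a sketch of one way such a summary could be assembled: fit one simple regression per explanatory variable and collect the intercept, slope, slope p-value, and both R-squared values. The helper name `summarise_simple_lm` and its arguments are hypothetical. The commented usage assumes the column names that appear in the plotting code.

```r
# Hypothetical helper: one simple linear regression per predictor,
# returning one row of summary statistics for each.
summarise_simple_lm <- function(data, response, predictors) {
  rows <- lapply(predictors, function(p) {
    fit <- lm(reformulate(p, response = response), data = data)
    s <- summary(fit)
    data.frame(
      predictor     = p,
      intercept     = unname(coef(fit)[1]),
      slope         = unname(coef(fit)[2]),
      p_value       = s$coefficients[2, "Pr(>|t|)"],  # test of H0: slope = 0
      r_squared     = s$r.squared,
      adj_r_squared = s$adj.r.squared
    )
  })
  do.call(rbind, rows)
}

# Assumed usage with the data set loaded earlier (not run here):
# summarise_simple_lm(X2017, "Happiness.Rank",
#                     c("Economy..GDP.per.Capita.", "Family",
#                       "Health..Life.Expectancy.", "Freedom",
#                       "Generosity", "Trust..Government.Corruption."))
```

Comparing each `p_value` with the individual significance level then gives the reject or fail-to-reject decision for that variable.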

Which Variables Are Appropriate for Simple Linear Regression?

GDP Per Capita

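mylm1 <- lm(Happiness.Rank ~ Economy..GDP.per.Capita., data = X2017)  # assumed model; the fit is not shown in the source
par(mfrow = c(1, 2))
plot(mylm1, which = 1:2)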

Family

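mylm2 <- lm(Happiness.Rank ~ Family, data = X2017)  # assumed model; the fit is not shown in the source
par(mfrow = c(1, 2))
plot(mylm2, which = 1:2)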

Health Life Expectancy

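mylm3 <- lm(Happiness.Rank ~ Health..Life.Expectancy., data = X2017)  # assumed model; the fit is not shown in the source
par(mfrow = c(1, 2))
plot(mylm3, which = 1:2)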

Freedom

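mylm4 <- lm(Happiness.Rank ~ Freedom, data = X2017)  # assumed model; the fit is not shown in the source
par(mfrow = c(1, 2))
plot(mylm4, which = 1:2)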

Generosity

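mylm5 <- lm(Happiness.Rank ~ Generosity, data = X2017)  # assumed model; the fit is not shown in the source
par(mfrow = c(1, 2))
plot(mylm5, which = 1:2)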

Government Trust

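mylm6 <- lm(Happiness.Rank ~ Trust..Government.Corruption., data = X2017)  # assumed model; the fit is not shown in the source
par(mfrow = c(1, 2))
plot(mylm6, which = 1:2)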

Shown above, for each variable, are a residuals vs fitted plot and the corresponding normal Q-Q plot. The residuals vs fitted plot shows whether the relationship is linear and whether the variance of the residuals is constant. The normal Q-Q plot shows whether the error terms are approximately normally distributed. From these plots, GDP per capita appears to be the only variable that doesn’t push the boundaries of what simple linear regression is able to explain.
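
Because the same two diagnostic plots are drawn for every model, the repeated calls could be wrapped in a small helper. This is a sketch rather than the notebook’s own code; `diagnose_lm` is a hypothetical name, and it accepts any fitted `lm` object.

```r
# Hypothetical helper: draw the residuals vs fitted plot and the normal
# Q-Q plot side by side, then restore the previous plot layout.
diagnose_lm <- function(fit) {
  old_par <- par(mfrow = c(1, 2))
  on.exit(par(old_par))      # reset the layout even if plotting fails
  plot(fit, which = 1:2)     # 1 = residuals vs fitted, 2 = normal Q-Q
  invisible(fit)
}

# Assumed usage with a fitted model such as mylm1 (not run here):
# diagnose_lm(mylm1)
```

Restoring the layout with `on.exit()` keeps later plots in the notebook from inheriting the two-panel setting.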

Interpretation

So is money the source of all happiness? On its own, GDP per capita explains the most variation in a country’s happiness, with the highest adjusted R-squared value of the six variables. Health life expectancy and family also explain a large share of it. Surprisingly, freedom and government trust don’t make as much of an impact, while generosity makes virtually no impact on its own. A future analysis should consider how much all six variables, in combination, affect a country’s happiness.
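
As a starting point for that future analysis, the six variables could be entered together in a multiple linear regression. The sketch below is not part of this analysis and reports no results. `fit_combined_model` is a hypothetical helper, and the commented usage assumes the column names used in the plotting code.

```r
# Hypothetical helper: fit a single model with all predictors entered together.
fit_combined_model <- function(data, response, predictors) {
  lm(reformulate(predictors, response = response), data = data)
}

# Assumed usage with the data set loaded earlier (not run here):
# combined <- fit_combined_model(
#   X2017, "Happiness.Rank",
#   c("Economy..GDP.per.Capita.", "Family", "Health..Life.Expectancy.",
#     "Freedom", "Generosity", "Trust..Government.Corruption."))
# summary(combined)
```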