library(readr)
library(ggplot2)
library(pander)
library(knitr)
library(magrittr)
library(DT)
X2017 <- read_csv("C:/Users/blake/Desktop/Math 325 Notebook/Math 325 Notebook/Data/2017.csv")
X2017 <- X2017[, c(1:3, 6:11)]
X2017Table <- X2017
X2017Table[,c(-1,-2)] <- round(X2017[,c(-1,-2)], 2)
Every year, the United Nations Sustainable Development Solutions Network publishes a World Happiness Report. It ranks each country’s happiness level based on six variables: GDP per capita, family, health life expectancy, freedom, generosity, and government trust. In the 2017 report, Norway, Denmark, and Iceland were the three happiest countries, and the United States ranked 14th out of 155 countries. We want to know the order in which these six variables affect a country’s happiness. This analysis uses simple linear regression to answer that question.
datatable(X2017Table, options=list(lengthMenu = c(5, 10, 20)), class = 'hover')
The mathematical model for a simple linear regression is as follows:
\(Y_i = \beta_0 + \beta_1X_i + \epsilon_i\), where \(\epsilon_{i} \sim N(0,\sigma^2)\) is the error term.
\(Y_i =\) Happiness Rank (1 = happiest country)
\(X_i =\) Explanatory Variable (one of the six variables listed above)
Formally, the null and alternative hypotheses for the slope of each regression are as follows:
\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)
The overall level of significance, for this analysis, is set at: \(\alpha = 0.05\)
Each individual level of significance for all six tests will be set at: \(\alpha = 0.008\)
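The individual level of 0.008 is consistent with dividing the overall level evenly across the six tests (a Bonferroni-style adjustment). The analysis does not name the adjustment it used, so the sketch below is an assumption about where the number comes from:

```r
# Assumption: the per-test level comes from an even split of the overall level.
alpha_overall <- 0.05  # overall significance level stated above
n_tests <- 6           # one slope test per explanatory variable
alpha_each <- alpha_overall / n_tests
round(alpha_each, 3)   # 0.008
```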
# Scatterplots of happiness rank against each explanatory variable, with a fitted regression line
ggplot(X2017, aes(x = Economy..GDP.per.Capita., y = Happiness.Rank)) + geom_point(aes(color = Happiness.Rank)) + geom_smooth(method = "lm", se = TRUE) + xlab("GDP Per Capita") + ylab("Happiness Rank") + ggtitle("How Much GDP Per Capita Affects Happiness") + guides(colour = "none")
ggplot(X2017, aes(x = Family, y = Happiness.Rank)) + geom_point(aes(color = Happiness.Rank)) + geom_smooth(method = "lm", se = TRUE) + xlab("Family Quality") + ylab("Happiness Rank") + ggtitle("How Much Family Quality Affects Happiness") + guides(colour = "none")
ggplot(X2017, aes(x = Health..Life.Expectancy., y = Happiness.Rank)) + geom_point(aes(color = Happiness.Rank)) + geom_smooth(method = "lm", se = TRUE) + xlab("Health Life Expectancy") + ylab("Happiness Rank") + ggtitle("How Much Health Life Expectancy Affects Happiness") + guides(colour = "none")
ggplot(X2017, aes(x = Freedom, y = Happiness.Rank)) + geom_point(aes(color = Happiness.Rank)) + geom_smooth(method = "lm", se = TRUE) + xlab("Freedom") + ylab("Happiness Rank") + ggtitle("How Much Freedom Affects Happiness") + guides(colour = "none")
ggplot(X2017, aes(x = Generosity, y = Happiness.Rank)) + geom_point(aes(color = Happiness.Rank)) + geom_smooth(method = "lm", se = TRUE) + xlab("Generosity") + ylab("Happiness Rank") + ggtitle("How Much Generosity Affects Happiness") + guides(colour = "none")
ggplot(X2017, aes(x = Trust..Government.Corruption., y = Happiness.Rank)) + geom_point(aes(color = Happiness.Rank)) + geom_smooth(method = "lm", se = TRUE) + xlab("Government Trust") + ylab("Happiness Rank") + ggtitle("How Much Government Trust Affects Happiness") + guides(colour = "none")
| | GDP Per Capita | Family | Health Life Expectancy | Freedom | Generosity | Government Trust |
|---|---|---|---|---|---|---|
| Intercept | 163.4 | 214.875 | 159.502 | 145.481 | 88.905 | 100.063 |
| Explanatory Variable Coefficient | -86.75 | -115.128 | -17.29 | -165.076 | -44.169 | -179.202 |
| P-value of Coefficient | <2e-16 | <2e-16 | <2e-16 | 1.02e-13 | 0.1 | 1.61e-07 |
| R-squared Value | 0.6614 | 0.5428 | 0.6095 | 0.3043 | 0.01759 | 0.1647 |
| Adjusted R-squared Value | 0.6592 | 0.5398 | 0.607 | 0.2997 | 0.01117 | 0.1592 |
It is worth noting that generosity had a p-value of 0.1, which is above the individual significance level of 0.008. For that regression we therefore fail to reject the null hypothesis. In other words, there is insufficient evidence to conclude that, on its own, generosity has a linear relationship with happiness. It’s also worth noting that GDP per capita, health life expectancy, and family had the highest adjusted R-squared values. Freedom and government trust had small adjusted R-squared values, while generosity had an adjusted R-squared value of almost zero.
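The code that fit the six regressions is not shown, although the diagnostic plots refer to models named `mylm1` through `mylm6`. The sketch below shows one way such models and the table values could be produced. The helpers `fit_simple_lm` and `summarise_fit` are hypothetical, the response is assumed to be `Happiness.Rank` (the variable on the y-axis of the plots), and the toy data frame holds made-up values so the block runs on its own.

```r
# Hypothetical helper: simple linear regression of the response on one predictor.
fit_simple_lm <- function(data, predictor, response = "Happiness.Rank") {
  lm(reformulate(predictor, response = response), data = data)
}

# Hypothetical helper: pull out the quantities reported in the table above.
summarise_fit <- function(model) {
  s <- summary(model)
  coefs <- s$coefficients  # row 1 is the intercept, row 2 is the predictor
  data.frame(
    intercept     = coefs[1, "Estimate"],
    slope         = coefs[2, "Estimate"],
    p_value       = coefs[2, "Pr(>|t|)"],
    r_squared     = s$r.squared,
    adj_r_squared = s$adj.r.squared
  )
}

# Made-up toy values, only to show the helpers run; not from the 2017 report.
toy <- data.frame(
  Happiness.Rank = c(1, 2, 3, 4, 5, 6),
  Family         = c(1.5, 1.4, 1.45, 1.2, 1.1, 0.9)
)
toy_fit <- fit_simple_lm(toy, "Family")
summarise_fit(toy_fit)

# With the notebook's data, and assuming the models are numbered in plot order:
# mylm1 <- fit_simple_lm(X2017, "Economy..GDP.per.Capita.")
# mylm2 <- fit_simple_lm(X2017, "Family")
```

The p-value in each summary would then be compared against the individual significance level of 0.008.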
par(mfrow = c(1,2))
plot(mylm1, which=1:2)
par(mfrow = c(1,2))
plot(mylm2, which=1:2)
par(mfrow = c(1,2))
plot(mylm3, which=1:2)
par(mfrow = c(1,2))
plot(mylm4, which=1:2)
par(mfrow = c(1,2))
plot(mylm5, which=1:2)
par(mfrow = c(1,2))
plot(mylm6, which=1:2)
Shown above are the residuals vs. fitted plots for each regression, along with their respective normal Q-Q plots. The first plot in each pair shows whether the relationship is linear and whether the variance of the residuals is constant. The second plot shows whether the error terms are normally distributed. From these plots, GDP per capita appears to be the only variable whose regression does not strain the assumptions of simple linear regression.
So is money the source of all happiness? On its own, GDP per capita explains the most variation in a country’s happiness. Health life expectancy and family also explain a substantial share. Surprisingly, freedom and government trust explain much less, while generosity on its own explains virtually none. A future analysis should consider how much all six variables, in combination, affect a country’s happiness.
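As a starting point for that future analysis, a multiple regression could put all the explanatory variables into one model. The sketch below is only illustrative: `fit_combined` is a hypothetical helper, and the toy data frame holds made-up values so the block runs on its own.

```r
# Hypothetical helper: regress the response on several predictors at once.
fit_combined <- function(data, predictors, response = "Happiness.Rank") {
  lm(reformulate(predictors, response = response), data = data)
}

# Made-up toy values, only to show the call runs; not from the 2017 report.
toy_multi <- data.frame(
  Happiness.Rank = 1:8,
  Family         = c(1.5, 1.4, 1.45, 1.2, 1.1, 0.9, 1.0, 0.7),
  Freedom        = c(0.6, 0.62, 0.5, 0.55, 0.4, 0.45, 0.3, 0.35)
)
combined <- fit_combined(toy_multi, c("Family", "Freedom"))
coef(combined)  # intercept plus one slope per predictor

# With the notebook's data, the six column names would be passed instead:
# fit_combined(X2017, c("Economy..GDP.per.Capita.", "Family",
#                       "Health..Life.Expectancy.", "Freedom",
#                       "Generosity", "Trust..Government.Corruption."))
```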