Assignment 3 - Math 1324 - 2019

Investigation of the relationship between Wealth and Life Expectancy in 2017

Paul McKie - s3767130

Last updated: 02 June, 2019

Introduction

Do people from rich countries live longer? In this report I attempt to apply statistical testing techniques to that question.

Using two datasets from the World Bank, I evaluate the relationship between GDP per capita and life expectancy.

I used the most recent year with complete statistics for both variables: 2017.

Problem Statement

Data

  1. Download the datasets linked on the references page of this report.
  2. Remove the top three lines from the csv files.
  3. Keep only the country code column and the values for the year 2017.
  4. Label the year value as GDP or Life as appropriate.
  5. Combine the datasets into one csv file for loading.
  6. Remove every entry that is missing a value for either Life expectancy or GDP.
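
The steps above could be sketched in base R roughly as follows. This is an illustration only, not the script used for this report: the toy values are made up, the helper name keep_2017 is hypothetical, and the column names Country.Code and X2017 are assumptions about how read.csv() would name the World Bank columns.

```r
# Toy stand-ins for the two World Bank tables (values are made up for illustration).
gdp_raw <- data.frame(
  Country.Code = c("AAA", "BBB", "CCC"),
  X2016 = c(900, 4800, 39000),
  X2017 = c(1000, 5000, NA)
)
life_raw <- data.frame(
  Country.Code = c("AAA", "BBB", "CCC"),
  X2016 = c(59.5, 71.8, 81.0),
  X2017 = c(60.1, 72.3, 81.2)
)

# Hypothetical helper: keep the country code and the 2017 value, then rename them.
keep_2017 <- function(df, value_name) {
  out <- df[, c("Country.Code", "X2017")]
  names(out) <- c("Code", value_name)
  out
}

# Combine the two tables on country code, then drop rows missing either value.
Life_GDP <- merge(keep_2017(gdp_raw, "GDP"), keep_2017(life_raw, "Life"), by = "Code")
Life_GDP <- Life_GDP[complete.cases(Life_GDP[, c("GDP", "Life")]), ]
print(Life_GDP)
```

With the real files, step 2 could probably be handled at load time with the skip argument of read.csv() instead of editing the files by hand.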

Descriptive Statistics

  1. Mean life expectancy is 72 years in the 233 countries assessed.
  2. Mean GDP per capita is US $13,256.
# Summarise each column, then drop the first row (Min.) and keep the GDP and Life columns
data <- summary(Life_GDP)
knitr::kable(data[-1, 2:3])
GDP               Life
1st Qu.:  1921    1st Qu.:66.76
Median :  5594    Median :73.32
Mean   : 13256    Mean   :72.00
3rd Qu.: 15267    3rd Qu.:77.17
Max.   :104499    Max.   :84.68
NA                NA

Boxplot of Life Expectancy

Life_GDP$Life %>% boxplot(ylab = "Life Expectancy")

Normality of Life Expectancy

# qqPlot() is assumed here to come from the car package
Life_GDP$Life %>% qqPlot(dist = "norm", ylab = "Life Expectancy", xlab = "Quantiles",
                         main = "Q-Q plot - Life Expectancy by country - 2017", ylim = c(40, 90))

## [1] 183  30

Boxplot of GDP per capita

Life_GDP$GDP %>% boxplot(ylab = "Gross Domestic Product - per capita", ylim = c(0, 50000))

Normality of GDP per capita

Note: deviation from the normal distribution increases as wealth increases.

Life_GDP$GDP %>% qqPlot(dist = "norm", ylab = "GDP per capita", xlab = "Quantiles",
                        main = "Q-Q plot - GDP per capita by country - 2017", ylim = c(0, 50000))

## [1] 127 129

Plot of Life Expectancy versus GDP per capita

plot(Life ~ GDP, data = Life_GDP, xlim = c(0, 50000), ylim = c(50, 85),
     ylab = "Life Expectancy - Years", xlab = "GDP per capita $US",
     main = "Life Expectancy versus GDP per capita - 2017")

Hypothesis Testing

Null Hypothesis: The data do not fit the linear regression model.

Alternative Hypothesis: The data fit the linear regression model.

F-test …

leogdpmodel2017 <- lm(Life ~ GDP, data = Life_GDP)
leogdpmodel2017 %>% summary()
## 
## Call:
## lm(formula = Life ~ GDP, data = Life_GDP)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -16.309  -3.771   1.325   4.303   9.237 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 6.839e+01  4.565e-01   149.8   <2e-16 ***
## GDP         2.724e-04  2.033e-05    13.4   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.576 on 227 degrees of freedom
## Multiple R-squared:  0.4416, Adjusted R-squared:  0.4392 
## F-statistic: 179.5 on 1 and 227 DF,  p-value: < 2.2e-16
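
To make the fitted coefficients concrete, the sketch below evaluates the fitted line by hand. The intercept and slope are copied from the summary output above; the function name predict_life and the example GDP values are illustrative assumptions, not part of the original analysis.

```r
# Coefficients copied from the summary output above.
intercept <- 68.39      # 6.839e+01, life expectancy in years when GDP per capita is 0
slope     <- 2.724e-04  # years of life expectancy per additional US$1 of GDP per capita

# Hypothetical helper: evaluate the fitted line Life = intercept + slope * GDP.
predict_life <- function(gdp) intercept + slope * gdp

# Fitted life expectancy at US$10,000 per capita: 68.39 + 2.724
predict_life(10000)

# Change in fitted life expectancy per additional US$10,000 per capita
predict_life(20000) - predict_life(10000)
```

With the model object itself, predict(leogdpmodel2017, newdata = data.frame(GDP = 10000)) should give the same kind of value without rounding the coefficients.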

F-test results…

F-statistic: 179.5 on 1 and 227 DF, p-value: < 2.2e-16

This yields a p-value of less than 0.001.

RESULT: Reject Null Hypothesis. Hence there is support for a linear relationship existing between GDP per capita and Life Expectancy.
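
As a cross-check, the F-statistic for a simple linear regression can be recovered from R-squared and the degrees of freedom. The sketch below uses only figures copied from the summary output; the variable names are illustrative, and small differences from the reported 179.5 would come from rounding R-squared to four decimals.

```r
# Values copied from the regression summary above.
r_squared <- 0.4416  # Multiple R-squared
df_model  <- 1       # one predictor (GDP)
df_resid  <- 227     # residual degrees of freedom

# F = (explained variance per model df) / (unexplained variance per residual df)
f_stat <- (r_squared / df_model) / ((1 - r_squared) / df_resid)

# Upper-tail probability of the F distribution gives the p-value.
p_value <- pf(f_stat, df_model, df_resid, lower.tail = FALSE)

f_stat
p_value < 0.001
```

With a single predictor, the F-statistic is also the square of the slope's t value (13.4 in the coefficient table), which gives roughly the same figure.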

Pearson test

r = 0.665, p < 0.001, 95% CI [0.591, 0.727]. The positive correlation was statistically significant.

Result: Reject the Null Hypothesis of no correlation.

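# Pearson correlation between life expectancy and GDP per capita
r <- cor(Life_GDP$Life, Life_GDP$GDP)
r
## [1] 0.6645596
library(psychometric)
# Confidence interval for r; n should equal the number of complete observations
# (the regression above reports 227 residual degrees of freedom, i.e. 229 observations)
CIr(r, n = 263, level = .95)
## [1] 0.5911188 0.7270721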
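
The interval above can be reproduced by hand with the Fisher z-transformation, which is my understanding of what CIr() does internally. This is a minimal base R sketch under that assumption; the function name fisher_ci is hypothetical.

```r
# Hypothetical helper: confidence interval for a Pearson correlation
# using the Fisher z-transformation.
fisher_ci <- function(r, n, level = 0.95) {
  z    <- atanh(r)                    # transform r to the z scale
  se   <- 1 / sqrt(n - 3)             # standard error on the z scale
  crit <- qnorm(1 - (1 - level) / 2)  # two-sided critical value
  tanh(c(z - crit * se, z + crit * se))  # back-transform to the r scale
}

# Same inputs as the CIr() call above.
fisher_ci(0.6645596, n = 263)
```

Because the standard error depends on n, the interval widens slightly when a smaller sample size is supplied, so n is worth checking against the number of rows actually used.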

Plot of the Regression Line

# ggscatter() is from the ggpubr package
ggscatter(Life_GDP, y = "Life", x = "GDP",
          add = "reg.line", conf.int = TRUE,
          cor.coef = TRUE, cor.method = "pearson",
          xlab = "GDP per capita $US", ylab = "Life Expectancy - Years")

Results

The F-test yields a sufficiently low p-value that we can reject the null hypothesis.

Hence there is support for a linear relationship existing between GDP per capita and Life Expectancy.
The positive correlation was statistically significant: r = 0.665, p < 0.001, 95% CI [0.591, 0.727]. Result: Reject the Null Hypothesis of no correlation.

Discussion

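The F-test supports a linear relationship between GDP per capita and life expectancy.
A moderate, positive correlation of 0.66 with statistical significance was shown.

GDP per capita is a statistically significant predictor of life expectancy, although with an R-squared of 0.44 it accounts for less than half of the variation between countries.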

References