Assignment 2: Applied Analytics

Do citizens of wealthier countries live longer?

Rajib Debnath (s3856291), Lissette Salazar Rodriguez (s3866268)

Last updated: 23 October, 2020

Introduction

The gross domestic product per capita, or GDP per capita, is a measure of a country’s economic output that accounts for its number of people. It divides the country’s gross domestic product by its total population. GDP per capita is an important indicator of economic performance and a useful unit to make cross-country comparisons of average living standards and economic wellbeing.

Life expectancy at birth is defined as how long, on average, a newborn can expect to live, if current death rates do not change. Life expectancy at birth is one of the most frequently used health status indicators.

Life expectancy differs widely between countries. These large differences in health across the world are clearly visible in the following illustration:


Source: https://ourworldindata.org/life-expectancy


Introduction Cont.

Do citizens of wealthier countries live longer?

To answer this, we investigated the relationship between GDP per capita and the life expectancy of citizens in developed countries.

Problem Statement

The question that we want to investigate is:

“Is there any relationship between a country’s GDP per Capita and Life Expectancy?”

To answer this, we:

Data

The datasets considered for preprocessing, their sources, and their variables are described below:

Dataset Source

The datasets contained:

Data Cont.

library(readxl) # read_excel()
library(dplyr)  # %>%, summarise(), select()
newdata <- read_excel("gdp_lifeexp_data.xlsx")
head(newdata)

Descriptive Statistics

summary1 <- newdata %>% summarise(Min = min(GDP,na.rm = TRUE), Q1 = quantile(GDP,probs = .25,na.rm = TRUE),
                       Median = median(GDP, na.rm = TRUE), Q3 = quantile(GDP,probs = .75,na.rm = TRUE),
                       Max = max(GDP,na.rm = TRUE), Mean = mean(GDP, na.rm = TRUE),
                       SD = sd(GDP, na.rm = TRUE), n = n(),
                       Missing = sum(is.na(GDP)))
knitr::kable(summary1,caption="Summary Statistics for GDP per Capita")
Summary Statistics for GDP per Capita

Min        Q1         Median     Q3         Max        Mean       SD         n     Missing
6776.995   23859.79   31877.07   41360.25   116622.2   33850.01   15703.31   700   0

summary2 <- newdata %>% summarise(Min = min(LIFEEXP,na.rm = TRUE), Q1 = quantile(LIFEEXP,probs = .25,na.rm = TRUE),
                                  Median = median(LIFEEXP, na.rm = TRUE), Q3 = quantile(LIFEEXP,probs = .75,na.rm = TRUE),
                                  Max = max(LIFEEXP,na.rm = TRUE), Mean = mean(LIFEEXP, na.rm = TRUE),
                                  SD = sd(LIFEEXP, na.rm = TRUE), n = n(),
                                  Missing = sum(is.na(LIFEEXP)))
knitr::kable(summary2,caption="Summary Statistics for Life Expectancy")

Summary Statistics for Life Expectancy

Min    Q1     Median   Q3     Max    Mean       SD         n     Missing
70.1   77.1   79.7     81.4   84.2   78.95329   3.092587   700   0

Visualisation

library(ggplot2) # ggplot(), geom_histogram()
g1 <- ggplot(newdata,aes(GDP))+geom_histogram(bins=40,color = "yellow3", fill="yellow3")+ ylab('Frequency')+ggtitle('GDP Per Capita')
g2 <- ggplot(newdata,aes(LIFEEXP))+geom_histogram(bins=40,color = "brown", fill="brown")+ ylab('Frequency')+ggtitle('Life Expectancy')
cowplot::plot_grid(g1, g2, labels = "AUTO")

Visualisation 2

par(mfrow = c(1,2))
boxplot(newdata$GDP, main="Boxplot of GDP PER CAPITA",col='yellow3', notch = TRUE)
boxplot(newdata$LIFEEXP, main="Boxplot of Life Expectancy", col='brown', notch = TRUE)

##  [1]  69147  72018  78211  84575  68141  77891  83852  86592  82269  85579
## [11]  91814  91527  95246 100934 103788 110250 112702 116622  69358
## [1] 70.1 70.5 70.6 70.6
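
The code that printed the two vectors above is not shown. They look like the points flagged as outliers in the two boxplots. As a rough check, assuming the usual 1.5 × IQR rule, the sketch below recomputes the fences from the quartiles in the summary tables and compares them with the printed values. Note that boxplot() works from hinges rather than the quantile() defaults, so its fences can differ slightly; the variable names here are illustrative only.

```r
# Quartiles copied from the summary tables above
gdp_q1  <- 23859.79
gdp_q3  <- 41360.25
life_q1 <- 77.1
life_q3 <- 81.4

# Assumption: Tukey's rule, fences at 1.5 * IQR beyond the quartiles
gdp_upper_fence  <- gdp_q3 + 1.5 * (gdp_q3 - gdp_q1)
life_lower_fence <- life_q1 - 1.5 * (life_q3 - life_q1)

# Values copied from the printed output above
gdp_flagged <- c(69147, 72018, 78211, 84575, 68141, 77891, 83852, 86592,
                 82269, 85579, 91814, 91527, 95246, 100934, 103788,
                 110250, 112702, 116622, 69358)
life_flagged <- c(70.1, 70.5, 70.6, 70.6)

print(gdp_upper_fence)
print(life_lower_fence)

# TRUE if every printed value lies beyond the corresponding fence
print(all(gdp_flagged > gdp_upper_fence))
print(all(life_flagged < life_lower_fence))
```

Under this assumption, the GDP values are all high-end outliers and the life expectancy values are all low-end outliers.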

Visualisation 3

plot(LIFEEXP ~ GDP, data = newdata, main="Scatter of Life Expectancy and GDP",
        ylab="Life Expectancy", xlab="GDP Per Capita", col=c('brown', 'yellow3'))
abline(lm(LIFEEXP ~ GDP, data = newdata))

Hypothesis Testing 1

Correlation Analysis

Hypothesis Generation

Ho : There is no correlation between GDP per Capita of an OECD country and its Life Expectancy.

Ha : There is a correlation between GDP per Capita of an OECD country and its Life Expectancy.

Mathematically,

Ho : ρ = 0

Ha : ρ ≠ 0

where ρ is the population correlation, estimated by the sample correlation r.

Test Result 1

A Pearson’s correlation was calculated to measure the strength of the linear relationship between GDP per capita of an OECD country and its life expectancy. The positive correlation was statistically significant, r = .67, p < .001, 95% CI [.628, .710].

There is statistically significant evidence to reject Ho.

library(Hmisc) # rcorr()
# Creating a correlation matrix
corr<-as.matrix(dplyr::select(newdata, LIFEEXP, GDP))
rcorr(corr, type = "pearson")
##         LIFEEXP  GDP
## LIFEEXP    1.00 0.67
## GDP        0.67 1.00
## 
## n= 700 
## 
## 
## P
##         LIFEEXP GDP
## LIFEEXP          0 
## GDP      0
library(psychometric) # CIr()
r <- cor(newdata$LIFEEXP, newdata$GDP)
CIr(r = r, n = 700, level = .95) %>% round(3)
## [1] 0.628 0.710
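
The interval above can be approximated by hand with the Fisher z-transformation, which is the usual way a confidence interval for a Pearson correlation is built. The sketch below is an illustration, not the authors' code. It assumes that the unrounded r can be recovered as the square root of the R-squared value (0.4502) reported in the regression summary later in this document, because the rounded r = .67 is not precise enough to reproduce the third decimal.

```r
# Assumption: recover the unrounded r from the reported Multiple R-squared
r <- sqrt(0.4502)
n <- 700

z    <- atanh(r)         # Fisher z-transformation of r
se_z <- 1 / sqrt(n - 3)  # standard error of z
crit <- qnorm(0.975)     # two-sided 95% critical value

# Build the interval on the z scale, then transform back to the r scale
ci <- tanh(z + c(-1, 1) * crit * se_z)
print(round(ci, 3))
```

With these inputs the rounded bounds should agree with the CIr() output above.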

Hypothesis Testing 2

As the data demonstrated evidence of a positive linear relationship, a linear regression model was fitted to predict the dependent variable, life expectancy, using measures of GDP per capita. Other non-linear trends were ruled out.

Linear Regression Model

Hypothesis Generation

Ho: The data do not fit the linear regression model

Ha: The data fit the linear regression model

Mathematically,

Ho : β = 0

Ha : β ≠ 0

where β is the slope of GDP in the regression model.

Test Result 2

gdplifeexpmodel <- lm(LIFEEXP ~ GDP, data = newdata) # fitting the linear regression model using the lm() function
gdplifeexpmodel %>% summary()
## 
## Call:
## lm(formula = LIFEEXP ~ GDP, data = newdata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.4908 -1.4302  0.3848  1.6505  4.4314 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.448e+01  2.062e-01  361.17   <2e-16 ***
## GDP         1.321e-04  5.527e-06   23.91   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.295 on 698 degrees of freedom
## Multiple R-squared:  0.4502, Adjusted R-squared:  0.4494 
## F-statistic: 571.6 on 1 and 698 DF,  p-value: < 2.2e-16
gdplifeexpmodel %>% confint() # calculating 95% CI
##                    2.5 %       97.5 %
## (Intercept) 7.407545e+01 7.488522e+01
## GDP         1.212884e-04 1.429922e-04
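
To make the fitted model concrete, the sketch below plugs the rounded coefficients from the summary() output into the regression equation and runs two consistency checks against the other reported figures. The helper predict_lifeexp() and the example GDP value of 30000 are illustrative assumptions, not part of the original analysis.

```r
# Coefficients copied from the summary() output above (rounded as printed)
intercept <- 74.48      # (Intercept) estimate, in years
slope     <- 1.321e-04  # GDP estimate, in years per unit of GDP per capita

# Hypothetical helper: predicted life expectancy for a given GDP per capita
predict_lifeexp <- function(gdp) intercept + slope * gdp

pred_30k  <- predict_lifeexp(30000)     # arbitrary illustrative GDP value
pred_mean <- predict_lifeexp(33850.01)  # mean GDP from the summary table

# Consistency checks for a simple (one-predictor) regression
r_squared_from_r <- 0.67^2   # compare with Multiple R-squared: 0.4502
f_from_t         <- 23.91^2  # compare with F-statistic: 571.6

print(c(pred_30k = pred_30k, pred_mean = pred_mean))
print(c(r_squared_from_r = r_squared_from_r, f_from_t = f_from_t))
```

The prediction at the mean GDP lands close to the mean life expectancy of 78.95 years, as expected for a least-squares line. Read this way, the slope means that an extra 10,000 in GDP per capita is associated with about 1.3 more years of life expectancy.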

Assumptions validation for linear regression

par(mfrow=c(1,4))
plot(gdplifeexpmodel)

Discussion

Major Findings:

Discussion Cont.

References