MATH1324 Assignment 2

Comparing Life Expectancy between Developing and Developed Countries

Syed Nazmul Kabir (s3874060), Mohammad Khan (s3872060)

Last updated: 25 October, 2020

Introduction

Problem Statement

Data

Data cont.

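library(readr); library(dplyr); library(ggplot2); library(car)
# Note: `Year == c(2011:2015)` recycles the five years along the rows, so each row is
# compared with one year only; `Year %in% 2011:2015` would keep every 2011-2015 row.
life_expectancy <- read_csv("Life Expectancy Data.csv") %>%
  select(Country, Year, Status, `Life expectancy`) %>%
  filter(Year == c(2011:2015))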
str(life_expectancy)
## tibble [185 x 4] (S3: tbl_df/tbl/data.frame)
##  $ Country        : chr [1:185] "Afghanistan" "Albania" "Algeria" "Angola" ...
##  $ Year           : num [1:185] 2013 2011 2014 2012 2015 ...
##  $ Status         : chr [1:185] "Developing" "Developing" "Developing" "Developing" ...
##  $ Life expectancy: num [1:185] 59.9 76.6 75.4 56 76.4 76 73.9 82.7 88 72.7 ...
life_expectancy$Status<-factor(life_expectancy$Status,levels=c("Developing","Developed"))
levels(life_expectancy$`Status`)
## [1] "Developing" "Developed"
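The difference between `==` and `%in%` when filtering on several years can be seen on a small example. This is a minimal sketch on a made-up data frame (`toy`, `kept_eq` and `kept_in` are hypothetical names, not part of the assignment data); it only illustrates how R recycles the right-hand vector.

```r
# Toy data (hypothetical): two countries, years 2015 down to 2011 for each
toy <- data.frame(
  Country = rep(c("A", "B"), each = 5),
  Year    = rep(2015:2011, times = 2)
)

# `==` recycles 2011:2015 along the rows, so row i is compared with one year only
kept_eq <- toy[toy$Year == 2011:2015, ]

# `%in%` tests each row's year for membership in the whole set of years
kept_in <- toy[toy$Year %in% 2011:2015, ]

nrow(kept_eq)  # only the rows whose year happens to line up with the recycled vector
nrow(kept_in)  # every row, since all years fall in 2011-2015
```

With `==`, which rows survive depends on the row order of the data, so the choice between the two forms changes the sample that the later tests are run on.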

Data Cont. (Handling Missing Values)

# Flag NA, NaN and infinite values; non-numeric columns return NULL, which sum() counts as 0
is.specialorNA <- function(x) {
  if (is.numeric(x)) (is.infinite(x) | is.nan(x) | is.na(x))
}
sapply(life_expectancy, function(y) sum(is.specialorNA(y)))
##         Country            Year          Status Life expectancy 
##               0               0               0               2
life_expectancy[!complete.cases(life_expectancy),]
life_expectancy<-life_expectancy%>%na.omit()

Descriptive Statistics and Visualisation

# Summary statistics of life expectancy by country status
life_expectancy %>%
  group_by(Status) %>%
  summarise(
    Min     = min(`Life expectancy`, na.rm = TRUE),
    Q1      = round(quantile(`Life expectancy`, probs = .25, na.rm = TRUE), 2),
    Median  = round(median(`Life expectancy`, na.rm = TRUE), 2),
    Q3      = round(quantile(`Life expectancy`, probs = .75, na.rm = TRUE), 2),
    Max     = max(`Life expectancy`, na.rm = TRUE),
    IQR     = IQR(`Life expectancy`, na.rm = TRUE),
    Mean    = round(mean(`Life expectancy`, na.rm = TRUE), 2),
    SD      = round(sd(`Life expectancy`, na.rm = TRUE), 2),
    Range   = Max - Min,
    n       = n(),
    Missing = sum(is.na(`Life expectancy`))
  ) -> table1
knitr::kable(table1)
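|Status     |  Min|    Q1| Median|    Q3| Max|   IQR|  Mean|   SD| Range|   n| Missing|
|:----------|----:|-----:|------:|-----:|---:|-----:|-----:|----:|-----:|---:|-------:|
|Developing | 48.1| 63.15|   71.0| 75.05|  85| 11.90| 69.12| 7.73|  36.9| 151|       0|
|Developed  | 73.4| 78.47|   81.4| 82.53|  88|  4.05| 80.74| 3.67|  14.6|  32|       0|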

Observations

Descriptive Statistics: Boxplot of Life Expectancy

# `fun` replaces the deprecated `fun.y` argument of stat_summary() (ggplot2 >= 3.3.0)
qplot(`Life expectancy`, Status, data = life_expectancy, geom = 'boxplot') +
  stat_summary(fun = mean, shape = 1, col = 'red', geom = 'point') +
  labs(title = "Boxplot of 'Life expectancy' of Developing and Developed Countries",
       x = "Life Expectancy (years)", y = "Types of Countries") +
  geom_jitter(alpha = 1/8)

The boxplot shows no outliers in either group. The red points mark the group means, which differ by more than 10 years (69.12 years for developing countries against 80.74 years for developed countries). Life expectancy varies far more among developing countries (IQR 11.90, SD 7.73) than among developed countries (IQR 4.05, SD 3.67), and developed countries have the higher life expectancy overall.

Descriptive Statistics Cont. (Visualisation)

ggplot(life_expectancy , aes(x=`Life expectancy`, fill=Status))+geom_histogram(alpha=0.5,position = 'identity', colour='black',binwidth =1.8 )+ labs(title="    Histogram of Life Expectancy of Developing & Developed Countries",x="Life Expectancy (Years)", y="Frequency")

Visualisation: QQ Plot for Developing Countries

life_developing<-life_expectancy%>%filter(Status=="Developing")
life_developing$`Life expectancy`%>%car::qqPlot(dist="norm",ylab="Life expectancy(years)",main = "Developing Countries", col = "red")

## [1] 120  30

Visualisation: QQ Plot for Developed Countries

life_developed<-life_expectancy%>%filter(Status=="Developed")
life_developed$`Life expectancy`%>%qqPlot(dist="norm",ylab="Life expectancy(years)",main = "Developed Countries", col = "red")

## [1] 16  2

Visualisation: Important Features of Data Visualisation

Hypothesis Testing: Normality Checking

shapiro.test(life_developing$`Life expectancy`)
## 
##  Shapiro-Wilk normality test
## 
## data:  life_developing$`Life expectancy`
## W = 0.96354, p-value = 0.0005007
shapiro.test(life_developed$`Life expectancy`)
## 
##  Shapiro-Wilk normality test
## 
## data:  life_developed$`Life expectancy`
## W = 0.95602, p-value = 0.2132

Hypothesis Testing Cont.

Levene’s Test For Homogeneity Of Variance.

\[H_0: \sigma_1^2 = \sigma_2^2 \]

\[H_A: \sigma_1^2 \ne \sigma_2^2\]

Levene’s Test Cont.

Here is the R code for Levene’s test:

leveneTest(`Life expectancy` ~ as.factor(Status), data = life_expectancy)%>%knitr::kable()
|      |  Df|  F value|   Pr(>F)|
|:-----|---:|--------:|--------:|
|group |   1| 19.56093| 1.68e-05|
|      | 181|       NA|       NA|
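To show where an F value and its two degrees of freedom come from in this kind of test, the median-centred form of Levene's test can be reproduced by hand as a one-way ANOVA on each observation's absolute deviation from its group median. The sketch below uses simulated data, not the life expectancy data; `toy`, `abs_dev` and `levene_by_hand` are hypothetical names, and it assumes the median-centred variant, which is understood to be the default of `car::leveneTest`.

```r
# Simulated data (hypothetical): two groups with clearly different spread
set.seed(1)
toy <- data.frame(
  group = factor(rep(c("g1", "g2"), times = c(40, 20))),
  y     = c(rnorm(40, mean = 70, sd = 8), rnorm(20, mean = 80, sd = 3))
)

# Absolute deviation of each observation from its own group median
toy$abs_dev <- abs(toy$y - ave(toy$y, toy$group, FUN = median))

# One-way ANOVA on the absolute deviations: the F statistic tests equal spread
levene_by_hand <- anova(lm(abs_dev ~ group, data = toy))
levene_by_hand
```

The first row of the result has k - 1 degrees of freedom and the residual row has N - k, which matches the layout of the table above (1 and 181 for two groups and 183 observations).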

Explanation:

Welch Two Sample t-test: Unequal Variance

\[H_0: \mu_1 - \mu_2=0 \]

\[H_A: \mu_1 - \mu_2\ne0\]

Welch Two Sample t-test Cont.

t.test(`Life expectancy` ~ as.factor(Status), data = life_expectancy,var.equal = FALSE, alternative = "two.sided")
## 
##  Welch Two Sample t-test
## 
## data:  Life expectancy by as.factor(Status)
## t = -12.841, df = 98.577, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -13.407625  -9.818368
## sample estimates:
## mean in group Developing  mean in group Developed 
##                  69.1245                  80.7375
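As a rough check on the output above, the Welch statistic and its approximate degrees of freedom can be worked out from the group summaries. The values below plug in the group means from the test output and the rounded SDs and sample sizes from the descriptive table (7.73 and 3.67, n = 151 and 32), so they agree with R only approximately.

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \approx \frac{69.1245 - 80.7375}{\sqrt{\frac{7.73^2}{151} + \frac{3.67^2}{32}}} \approx -12.85 \]

\[ \nu \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} \approx 98.7 \]

The small differences from the reported t = -12.841 and df = 98.577 come from using SDs rounded to two decimals.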

Decision:

Explanation of Hypothesis Test

Discussion

References