Syed Nazmul Kabir (s3874060), Mohammad Khan (s3872060)
Last updated: 25 October, 2020
Life expectancy is a measure of how long a person is expected to live.
Previous studies show that life expectancy varies across countries because of differing socioeconomic and health factors (Khan, Khan, & Khan, 2010; Lin, Chen, Chien, & Chan, 2012).
A better understanding of the difference in life expectancy in developing and developed countries is of great importance for international and global development.
Despite poorer health facilities and lower standards of living and education, life expectancy in developing countries increased by 3.6 years per decade, compared with 1.8 years per decade in developed countries (United Nations, 2017).
It is therefore of great interest to examine the current state of life expectancy in developing and developed countries.
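library(readr); library(dplyr); library(ggplot2)
# Caution: `==` recycles 2011:2015 element-wise; `%in%` would keep every row from those years.
life_expectancy <- read_csv("Life Expectancy Data.csv") %>%
  select(Country, Year, Status, `Life expectancy`) %>%
  filter(Year == c(2011:2015))
str(life_expectancy)
## tibble [185 x 4] (S3: tbl_df/tbl/data.frame)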
## $ Country : chr [1:185] "Afghanistan" "Albania" "Algeria" "Angola" ...
## $ Year : num [1:185] 2013 2011 2014 2012 2015 ...
## $ Status : chr [1:185] "Developing" "Developing" "Developing" "Developing" ...
## $ Life expectancy: num [1:185] 59.9 76.6 75.4 56 76.4 76 73.9 82.7 88 72.7 ...
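The year filter above compares `Year` with a five-element vector using `==`, which R evaluates position by position after recycling the shorter vector. The sketch below illustrates how that differs from a membership test with `%in%`. The vector `year` is a hypothetical toy example, not taken from the WHO data.

```r
# Hypothetical toy vector of years (not taken from the WHO data).
year <- c(2015, 2014, 2013, 2012, 2011, 2010, 2015, 2014, 2013, 2012)
target <- 2011:2015

# `==` recycles `target` along `year` and compares position by position.
kept_by_equals <- year == target

# `%in%` tests each element of `year` for membership in `target`.
kept_by_in <- year %in% target

sum(kept_by_equals)  # number of rows kept by the element-wise comparison
sum(kept_by_in)      # number of rows kept by the membership test
```

On this toy vector the two counts differ, because the element-wise form only keeps a row when its year happens to line up with the recycled position.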
life_expectancy$Status <- factor(life_expectancy$Status, levels = c("Developing", "Developed"))
levels(life_expectancy$`Status`)
## [1] "Developing" "Developed"
is.specialorNA <- function(x) {if (is.numeric(x)) (is.infinite(x) | is.nan(x) | is.na(x))}
sapply(life_expectancy, function(y) sum(is.specialorNA(y)))
## Country Year Status Life expectancy
## 0 0 0 2
life_expectancy %>%
  group_by(Status) %>%
  summarise(
    Min = min(`Life expectancy`, na.rm = TRUE),
    Q1 = round(quantile(`Life expectancy`, probs = .25, na.rm = TRUE), 2),
    Median = round(median(`Life expectancy`, na.rm = TRUE), 2),
    Q3 = round(quantile(`Life expectancy`, probs = .75, na.rm = TRUE), 2),
    Max = max(`Life expectancy`, na.rm = TRUE),
    IQR = IQR(`Life expectancy`, na.rm = TRUE),
    Mean = round(mean(`Life expectancy`, na.rm = TRUE), 2),
    SD = round(sd(`Life expectancy`, na.rm = TRUE), 2),
    Range = Max - Min,
    n = n(),
    Missing = sum(is.na(`Life expectancy`))
  ) -> table1
knitr::kable(table1)

| Status | Min | Q1 | Median | Q3 | Max | IQR | Mean | SD | Range | n | Missing |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Developing | 48.1 | 63.15 | 71.0 | 75.05 | 85 | 11.90 | 69.12 | 7.73 | 36.9 | 151 | 0 |
| Developed | 73.4 | 78.47 | 81.4 | 82.53 | 88 | 4.05 | 80.74 | 3.67 | 14.6 | 32 | 0 |
qplot(`Life expectancy`, Status, data = life_expectancy, geom = 'boxplot') +
  stat_summary(fun = mean, shape = 1, col = 'red', geom = 'point') +  # `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
  labs(title = "Boxplot of 'Life expectancy' of Developing and Developed Countries", x = "Life Expectancy (years)", y = "Types of Countries") +
  geom_jitter(alpha = 1/8)

The box plot shows no outliers. The red points mark the group means, which differ by more than 10 years between the two types of countries. The box plot also shows that life expectancy varies much more among developing countries than among developed countries, and that developed countries have a higher life expectancy than developing countries.
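The absence of outliers can be cross-checked against the summary table above. The sketch below assumes the usual box plot convention of flagging points more than 1.5 × IQR beyond the quartiles, and uses the rounded quartiles from the table rather than the raw data, so it is an approximate check only. The data frame name `summary_tbl` and its added columns are illustrative.

```r
# Values copied from the summary table above (rounded as reported).
summary_tbl <- data.frame(
  Status = c("Developing", "Developed"),
  Min    = c(48.1, 73.4),
  Q1     = c(63.15, 78.47),
  Q3     = c(75.05, 82.53),
  Max    = c(85, 88)
)

# Fences under the assumed 1.5 * IQR rule.
iqr <- summary_tbl$Q3 - summary_tbl$Q1
summary_tbl$lower_fence <- summary_tbl$Q1 - 1.5 * iqr
summary_tbl$upper_fence <- summary_tbl$Q3 + 1.5 * iqr

# A group has an outlier if its minimum or maximum lies beyond a fence.
summary_tbl$any_outlier <- summary_tbl$Min < summary_tbl$lower_fence |
  summary_tbl$Max > summary_tbl$upper_fence

print(summary_tbl)
```

For both groups the minimum and maximum fall inside the fences, which agrees with the box plot.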
ggplot(life_expectancy, aes(x = `Life expectancy`, fill = Status)) +
  geom_histogram(alpha = 0.5, position = 'identity', colour = 'black', binwidth = 1.8) +
  labs(title = "Histogram of Life Expectancy of Developing & Developed Countries", x = "Life Expectancy (Years)", y = "Frequency")
life_developing <- life_expectancy %>% filter(Status == "Developing")
life_developing$`Life expectancy` %>% car::qqPlot(dist = "norm", ylab = "Life expectancy (years)", main = "Developing Countries", col = "red")
## [1] 120 30
life_developed <- life_expectancy %>% filter(Status == "Developed")
life_developed$`Life expectancy` %>% car::qqPlot(dist = "norm", ylab = "Life expectancy (years)", main = "Developed Countries", col = "red")
## [1] 16 2
The histogram for developed countries shows less variation among countries. All developed countries have a life expectancy above 73 years.
The histogram for developing countries shows much more variation, spread over a wider range than for developed countries. Some developing countries have a life expectancy below 55 years, while the middle half lie between about 63 and 75 years (the first and third quartiles in the summary table), based on observations from 2011 to 2015.
The visualisations make it clear that developed countries have a higher life expectancy, but they do not tell us whether this difference is statistically significant.
Both Q-Q plots show an S-shaped pattern. The Q-Q plot for developing countries clearly shows left skewness, and many points fall outside the dotted lines, so this distribution may not be normal.
For developed countries, almost all points fall inside the dotted lines, so we can assume normality based on the 95% CI of the normal quantiles.
##
## Shapiro-Wilk normality test
##
## data: life_developing$`Life expectancy`
## W = 0.96354, p-value = 0.0005007
##
## Shapiro-Wilk normality test
##
## data: life_developed$`Life expectancy`
## W = 0.95602, p-value = 0.2132
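The Shapiro-Wilk test takes normality as its null hypothesis, so a small p-value counts against normality. The output above has the form printed by `shapiro.test()` from base R. The sketch below shows the call on simulated stand-in samples, not on the report's data; the names `sim_skewed` and `sim_normal` and their distribution parameters are assumptions chosen only for illustration.

```r
set.seed(1)

# Simulated stand-ins for the two groups; NOT the report's data.
sim_skewed <- 85 - rexp(151, rate = 1 / 10)  # left-skewed sample of size 151
sim_normal <- rnorm(32, mean = 80, sd = 4)   # roughly normal sample of size 32

# shapiro.test() returns an "htest" object with the W statistic and p-value.
res_skewed <- shapiro.test(sim_skewed)
res_normal <- shapiro.test(sim_normal)

res_skewed$p.value
res_normal$p.value
```

With the report's data, the corresponding calls would be made on `life_developing$`Life expectancy`` and `life_developed$`Life expectancy``, as the `data:` lines in the output indicate.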
\[H_0: \sigma_1^2 = \sigma_2^2 \]
\[H_A: \sigma_1^2 \ne \sigma_2^2\]
Here is the R output of Levene's test:
| | Df | F value | Pr(>F) |
|---|---|---|---|
| group | 1 | 19.56093 | 1.68e-05 |
| | 181 | NA | NA |
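A table of this shape can be produced by a median-centred Levene test, which is a one-way ANOVA on the absolute deviations of each observation from its group median. The sketch below implements that idea in base R on simulated data with the same group sizes as the summary table. The helper `levene_median()` and the simulated data frame `sim` are hypothetical and are not the code used for the report; as far as we are aware, `car::leveneTest()` uses the same median centring by default.

```r
# Median-centred Levene test (Brown-Forsythe variant) in base R.
levene_median <- function(y, g) {
  g <- factor(g)
  z <- abs(y - ave(y, g, FUN = median))  # absolute deviation from each group's median
  anova(lm(z ~ g))                       # one-way ANOVA on those deviations
}

set.seed(42)
# Simulated data with the group sizes from the summary table; NOT the WHO data.
sim <- data.frame(
  status = rep(c("Developing", "Developed"), times = c(151, 32)),
  life   = c(rnorm(151, mean = 69, sd = 7.7), rnorm(32, mean = 81, sd = 3.7))
)

lev <- levene_median(sim$life, sim$status)
lev
```

With 183 observations in two groups, the degrees of freedom are 1 and 181, matching the table above. The F value and p-value will differ because the data are simulated.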
A Welch two-sample (independent) t-test, which does not assume equal variances, is used to determine whether the mean life expectancy of developing countries differs significantly from that of developed countries. The following statistical hypotheses are used:
Null hypothesis: denoted \(H_0\), it assumes that the difference between the mean life expectancy of developing and developed countries is 0.
Alternative hypothesis: denoted \(H_A\), it assumes that the difference between the mean life expectancy of developing and developed countries is not 0. The two hypotheses are expressed mathematically as follows.
Here, \(\mu_1\) and \(\mu_2\) refer to the mean life expectancy of developing and developed countries, respectively.
\[H_0: \mu_1 - \mu_2=0 \]
\[H_A: \mu_1 - \mu_2\ne0\]
t.test(`Life expectancy` ~ as.factor(Status), data = life_expectancy, var.equal = FALSE, alternative = "two.sided")
##
## Welch Two Sample t-test
##
## data: Life expectancy by as.factor(Status)
## t = -12.841, df = 98.577, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -13.407625 -9.818368
## sample estimates:
## mean in group Developing mean in group Developed
## 69.1245 80.7375
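The Welch statistic, its degrees of freedom, and the confidence interval can be approximately reproduced from the group summaries alone. The sketch below uses the means from the t-test output and the rounded standard deviations and sample sizes from the summary table, so small rounding differences from the reported values are expected. The variable names are illustrative.

```r
# Group summaries copied from the t-test output and the summary table above.
m1 <- 69.1245; s1 <- 7.73; n1 <- 151  # Developing
m2 <- 80.7375; s2 <- 3.67; n2 <- 32   # Developed

# Per-group variance of the sample mean and the standard error of the difference.
v1 <- s1^2 / n1
v2 <- s2^2 / n2
se <- sqrt(v1 + v2)

# Welch t statistic and Welch-Satterthwaite degrees of freedom.
t_stat   <- (m1 - m2) / se
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value and 95% confidence interval for the difference in means.
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci      <- (m1 - m2) + c(-1, 1) * qt(0.975, df_welch) * se

c(t = t_stat, df = df_welch, p = p_value)
ci
```

The results should land close to the reported t = -12.841, df = 98.577, and interval (-13.41, -9.82), with the remaining gap attributable to the rounded standard deviations.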
Kaggle. Life Expectancy (WHO): Statistical Analysis on factors influencing Life Expectancy. Retrieved on 28 September, 2020 from https://www.kaggle.com/kumarajarshi/life-expectancy-who?select=Life+Expectancy+Data.csv
Khan, A., Khan, S., & Khan, M. (2010). Factors effecting life expectency in developed and developing countries of the world (An approach to available literature). International Journal of Yoga, Physiotherapy and Physical Education, 1(1), 04-06.
Lin, R.-T., Chen, Y.-M., Chien, L.-C., & Chan, C.-C. (2012). Political and social determinants of life expectancy in less developed countries: a longitudinal study. BMC Public Health, 12(1), 85.
United Nations. (2017). Life expectancy at birth increasing in less developed regions. Retrieved on 28 September, 2020 from https://www.un.org/en/development/desa/population/publications/pdf/popfacts/PopFacts_2017-9.pdf