MATH1324 Assignment 2

Statistical analysis of life expectancy in developed and developing countries

Pandaree Tangyoocharoen (s3829627)

Last updated: 25 October, 2020

Introduction

Problem Statement

The statistical analysis includes:

Data

The important variables and their descriptions are as follows:

Year : Year of observation
Status : Status of countries (developed/developing)
Life expectancy : Life expectancy in years

Data Cont.

Before analysis, the data need to be preprocessed by:

Data Cont.

Subset the dataset to prepare it for analysis.

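library(dplyr)  # %>%, group_by(), summarise(), tibble()
library(car)    # qqPlot(), leveneTest()

# Load the full dataset
life_exp <- read.csv("Life Expectancy Data.csv")

# Keep only the 2015 observations and the two columns of interest
exp_status <- subset(life_exp, Year == 2015, select = c(Status, Life.expectancy))

# Split the 2015 data into the two groups
developed <- subset(exp_status, Status == "Developed")
developing <- subset(exp_status, Status == "Developing")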
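
After subsetting, it can help to confirm the group sizes and check for missing values before going further. The following is a minimal sketch on a made-up data frame (`toy` and its values are hypothetical, not taken from the dataset); the same checks could be run on `exp_status`.

```r
# Hypothetical toy data that mimics the columns used in this analysis
toy <- data.frame(
  Country = c("A", "B", "C", "D"),
  Year = c(2015, 2015, 2014, 2015),
  Status = c("Developed", "Developing", "Developed", "Developing"),
  Life.expectancy = c(81.2, 68.4, 80.9, NA)
)

# Same pattern as above: filter rows by year, keep two columns
toy_2015 <- subset(toy, Year == 2015, select = c(Status, Life.expectancy))

# Number of observations per group
table(toy_2015$Status)

# Number of missing values per column
colSums(is.na(toy_2015))
```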

Data Cont.

The data represent life expectancy in developed countries.

tibble(developed$Status, developed$Life.expectancy)
## # A tibble: 32 x 2
##    `developed$Status` `developed$Life.expectancy`
##    <chr>                                    <dbl>
##  1 Developed                                 82.8
##  2 Developed                                 81.5
##  3 Developed                                 81.1
##  4 Developed                                 74.5
##  5 Developed                                 78  
##  6 Developed                                 85  
##  7 Developed                                 78.8
##  8 Developed                                 86  
##  9 Developed                                 81  
## 10 Developed                                 75.8
## # … with 22 more rows

Data Cont.

The data represent life expectancy in developing countries.

tibble(developing$Status, developing$Life.expectancy)
## # A tibble: 151 x 2
##    `developing$Status` `developing$Life.expectancy`
##    <chr>                                      <dbl>
##  1 Developing                                  65  
##  2 Developing                                  77.8
##  3 Developing                                  75.6
##  4 Developing                                  52.4
##  5 Developing                                  76.4
##  6 Developing                                  76.3
##  7 Developing                                  74.8
##  8 Developing                                  72.7
##  9 Developing                                  76.1
## 10 Developing                                  76.9
## # … with 141 more rows

Data Cont.

Check the variable types, then convert Status to a factor.

str(exp_status)
## 'data.frame':    183 obs. of  2 variables:
##  $ Status         : chr  "Developing" "Developing" "Developing" "Developing" ...
##  $ Life.expectancy: num  65 77.8 75.6 52.4 76.4 76.3 74.8 82.8 81.5 72.7 ...
exp_status$Status <- as.factor(exp_status$Status)
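
The order of the factor levels matters later, because a formula-based test compares the first level against the second. A small sketch with a hypothetical vector (`status` is illustrative only) shows that `as.factor()` sorts the levels alphabetically, so "Developed" comes first.

```r
# Hypothetical character vector standing in for the Status column
status <- c("Developing", "Developed", "Developing")

# Convert to a factor; levels are sorted alphabetically by default
f <- as.factor(status)

# Inspect the level order and the underlying integer codes
levels(f)
as.integer(f)
```

With this ordering, a difference in means reported for `Life.expectancy ~ Status` is Developed minus Developing.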

Statistics and Visualisation

A box plot is used to compare life expectancy in developed and developing countries.

exp_status %>%  boxplot(Life.expectancy~Status, data = ., ylab = "Life Expectancy (year)")

Statistics and Visualisation Cont.

Q-Q plots are used to check the normality of life expectancy in each group.

par(mfrow = c(1, 2))
qqPlot(developed$Life.expectancy, dist = "norm", ylab = "Life expectancy (developed)")
## [1] 27 16
qqPlot(developing$Life.expectancy, dist = "norm", ylab = "Life expectancy (developing)")

## [1] 120   4
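
As a rough numeric companion to reading a Q-Q plot by eye, the correlation between theoretical and sample quantiles can be computed with base R. This is only a sketch on simulated data (the sample `x` and its parameters are assumptions, not the life expectancy values); a correlation close to 1 suggests the points lie near a straight line.

```r
set.seed(42)  # simulated sample, not the life expectancy data
x <- rnorm(151, mean = 70, sd = 7.5)

# Theoretical and sample quantiles, computed without drawing a plot
qq <- qqnorm(x, plot.it = FALSE)

# Correlation of the Q-Q points: near 1 when the points follow a line
qq_cor <- cor(qq$x, qq$y)
qq_cor

# To draw the plot itself with base R: qqnorm(x); qqline(x)
```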

Statistics and Visualisation Cont.

Statistics Cont.

Summary statistics of life expectancy for the two groups.

exp_status %>% group_by(Status) %>% summarise(Min = min(Life.expectancy, na.rm = TRUE),
                                           Q1 = quantile(Life.expectancy, probs = .25, na.rm = TRUE),
                                           Median = median(Life.expectancy, na.rm = TRUE),
                                           Q3 = round(quantile(Life.expectancy, probs = .75, na.rm = TRUE),2),
                                           Max = max(Life.expectancy, na.rm = TRUE),
                                           Mean =round(mean(Life.expectancy, na.rm = TRUE),2),
                                           SD = round(sd(Life.expectancy, na.rm = TRUE),2),
                                           n = n(),
                                           Missing = sum(is.na(Life.expectancy))) 
## # A tibble: 2 x 10
##   Status       Min    Q1 Median    Q3   Max  Mean    SD     n Missing
##   <fct>      <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <int>   <int>
## 1 Developed   73.6  78.6   81.6  82.7    88  80.7  3.46    32       0
## 2 Developing  51    64.6   71.6  75.5    85  69.7  7.5    151       0

Hypothesis Testing

Levene's test, run with the leveneTest() function, is used to compare the variances of life expectancy in developed and developing countries.

\[H_0: \sigma_1^2 = \sigma_2^2 \]

\[H_A: \sigma_1^2 \ne \sigma_2^2 \]

exp_status %>%  leveneTest(Life.expectancy ~ Status, data =. )
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value    Pr(>F)    
## group   1  19.469 1.753e-05 ***
##       181                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The result shows \(p < 0.05\), so \(H_0\) is rejected and the two groups are assumed to have unequal variances.
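
To illustrate what Levene's test with center = median does, the sketch below reproduces the idea in base R: take the absolute deviation of each value from its group median, then run a one-way ANOVA on those deviations. The data are simulated (group labels, means and SDs are assumptions chosen only to resemble the two groups), so the numbers will not match the output above.

```r
set.seed(1)  # simulated data, not the life expectancy dataset
g <- factor(rep(c("A", "B"), times = c(32, 151)))
y <- c(rnorm(32, mean = 80, sd = 3), rnorm(151, mean = 70, sd = 8))

# Absolute deviation of each value from its own group median
z <- abs(y - ave(y, g, FUN = median))

# One-way ANOVA on the deviations; the F test for g is the Levene statistic
lev <- anova(lm(z ~ g))
f_value <- lev[["F value"]][1]
p_value <- lev[["Pr(>F)"]][1]
c(F = f_value, p = p_value)
```

A large F value here means the typical spread around the median differs between groups, which is the same evidence leveneTest() summarises.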

Hypothesis Testing Cont.

Statistical hypotheses for two-sample t-test:

\[H_0: \mu_1 - \mu_2 = 0\]

\[H_A: \mu_1 - \mu_2 \ne 0\]

Because the variances are unequal, the two-sample t-test is run with the t.test() function and var.equal = FALSE. This version is known as the Welch two-sample t-test.

exp_status %>% t.test(Life.expectancy ~ Status, data =., var.equal = FALSE, alternative = "two.sided")
## 
##  Welch Two Sample t-test
## 
## data:  Life.expectancy by Status
## t = 12.753, df = 102.42, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   9.305573 12.733045
## sample estimates:
##  mean in group Developed mean in group Developing 
##                 80.70937                 69.69007
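
The Welch statistic, its degrees of freedom and the confidence interval can be recomputed by hand from the group summaries. The sketch below uses the means from the t-test output and the SDs and group sizes from the summary table; because the SDs are rounded to two decimals, the results should be close to the reported values but not identical in every digit. The variable names are illustrative.

```r
# Values copied from the summary table and t-test output (SDs are rounded)
m1 <- 80.70937; s1 <- 3.46; n1 <- 32    # Developed
m2 <- 69.69007; s2 <- 7.5;  n2 <- 151   # Developing

# Squared standard error of each group mean
v1 <- s1^2 / n1
v2 <- s2^2 / n2

# Welch t statistic: difference in means over its standard error
se <- sqrt(v1 + v2)
t_stat <- (m1 - m2) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# 95% confidence interval for the difference in means
ci <- (m1 - m2) + c(-1, 1) * qt(0.975, df) * se

c(t = t_stat, df = df, lower = ci[1], upper = ci[2])
```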

Interpretation

Discussion

References