Pandaree Tangyoocharoen (s3829627)
Last updated: 25 October, 2020
Figure 1: Comparison of developed and developing countries
Year : Year of observation
Status : Status of countries (developed/developing)
Life.expectancy : Life expectancy of the population (in years)
Subset the dataset to the variables needed for the analysis.
The data represent life expectancy in developed countries.
## # A tibble: 32 x 2
## `developed$Status` `developed$Life.expectancy`
## <chr> <dbl>
## 1 Developed 82.8
## 2 Developed 81.5
## 3 Developed 81.1
## 4 Developed 74.5
## 5 Developed 78
## 6 Developed 85
## 7 Developed 78.8
## 8 Developed 86
## 9 Developed 81
## 10 Developed 75.8
## # … with 22 more rows
The data represent life expectancy in developing countries.
## # A tibble: 151 x 2
## `developing$Status` `developing$Life.expectancy`
## <chr> <dbl>
## 1 Developing 65
## 2 Developing 77.8
## 3 Developing 75.6
## 4 Developing 52.4
## 5 Developing 76.4
## 6 Developing 76.3
## 7 Developing 74.8
## 8 Developing 72.7
## 9 Developing 76.1
## 10 Developing 76.9
## # … with 141 more rows
Check the variable types, then convert Status to a factor.
## 'data.frame': 183 obs. of 2 variables:
## $ Status : chr "Developing" "Developing" "Developing" "Developing" ...
## $ Life.expectancy: num 65 77.8 75.6 52.4 76.4 76.3 74.8 82.8 81.5 72.7 ...
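The subsetting and factor conversion described above could look roughly like the following sketch. The data frame name `life`, the filter on `Year == 2015`, and the single 2014 row are assumptions made for illustration; the remaining values are taken from the first rows printed above. Base R is used here so the block runs without extra packages.

```r
# Toy stand-in for the full dataset; `life` is a hypothetical name and the
# 2014 row is invented only to show the effect of the year filter.
life <- data.frame(
  Year = c(2015, 2015, 2015, 2015, 2014),
  Status = c("Developing", "Developing", "Developed", "Developed", "Developed"),
  Life.expectancy = c(65, 77.8, 82.8, 81.5, 81.0),
  stringsAsFactors = FALSE
)

# Keep only the 2015 observations and the two variables used in the analysis.
exp_status <- life[life$Year == 2015, c("Status", "Life.expectancy")]

# Convert Status from character to factor so it can act as a grouping variable.
exp_status$Status <- factor(exp_status$Status,
                            levels = c("Developed", "Developing"))

# Split into the two groups that are compared later.
developed  <- exp_status[exp_status$Status == "Developed", ]
developing <- exp_status[exp_status$Status == "Developing", ]

str(exp_status)
```

After the conversion, `str()` should report Status as a factor with two levels rather than as `chr`.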
In this step, a box plot is used to compare life expectancy in developed and developing countries.
Q-Q plots are used to check the normality of the data.
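A minimal sketch of such a box plot, using base R `boxplot()`, is shown here. It uses only the first ten rows of each group as printed earlier in the report, so the resulting plot will not match the figure produced from the full data; the data frame name `sample_rows` is hypothetical.

```r
# First ten rows of each group as printed in the report (not the full data).
sample_rows <- data.frame(
  Status = factor(rep(c("Developed", "Developing"), each = 10)),
  Life.expectancy = c(82.8, 81.5, 81.1, 74.5, 78, 85, 78.8, 86, 81, 75.8,
                      65, 77.8, 75.6, 52.4, 76.4, 76.3, 74.8, 72.7, 76.1, 76.9)
)

# Side-by-side box plots, one box per development status.
bp <- boxplot(Life.expectancy ~ Status, data = sample_rows,
              xlab = "Status", ylab = "Life expectancy (years)",
              main = "Life expectancy by development status")

# bp$n holds the number of observations behind each box.
bp$n
```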
par(mfrow = c(1, 2))
qqPlot(developed$Life.expectancy, dist = "norm", ylab = "Life expectancy (developed)")
## [1] 27 16
## [1] 120 4
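For readers without the car package, a similar normality check can be sketched with base R `qqnorm()` and `qqline()`. This is a substitute for `qqPlot()`, not the call used in the report: it draws no confidence envelope and does not label extreme points (the index pairs printed above appear to be the observations that `qqPlot()` flagged as most extreme). Only the first ten developed-country values from the report are used, and `developed_le` is a hypothetical name.

```r
# First ten developed-country values printed in the report (not the full data).
developed_le <- c(82.8, 81.5, 81.1, 74.5, 78, 85, 78.8, 86, 81, 75.8)

# Sample quantiles against theoretical normal quantiles.
qq <- qqnorm(developed_le, ylab = "Life expectancy (developed)")

# Reference line through the first and third quartiles.
qqline(developed_le)
```

Points that lie close to the reference line are consistent with a normal distribution; systematic curvature suggests skewness or heavy tails.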
Summary statistics of life expectancy for the two groups.
exp_status %>%
  group_by(Status) %>%
  summarise(Min = min(Life.expectancy, na.rm = TRUE),
            Q1 = quantile(Life.expectancy, probs = .25, na.rm = TRUE),
            Median = median(Life.expectancy, na.rm = TRUE),
            Q3 = round(quantile(Life.expectancy, probs = .75, na.rm = TRUE), 2),
            Max = max(Life.expectancy, na.rm = TRUE),
            Mean = round(mean(Life.expectancy, na.rm = TRUE), 2),
            SD = round(sd(Life.expectancy, na.rm = TRUE), 2),
            n = n(),
            Missing = sum(is.na(Life.expectancy)))
## # A tibble: 2 x 10
## Status Min Q1 Median Q3 Max Mean SD n Missing
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
## 1 Developed 73.6 78.6 81.6 82.7 88 80.7 3.46 32 0
## 2 Developing 51 64.6 71.6 75.5 85 69.7 7.5 151 0
The leveneTest() function is used to compare the variances of life expectancy in developed and developing countries.
\[H_0: \sigma_1^2 = \sigma_2^2\]
\[H_A: \sigma_1^2 \ne \sigma_2^2\]
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 19.469 1.753e-05 ***
## 181
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The result shows \(p\) < 0.05, so \(H_0\) is rejected and the variances of the two groups are assumed to be unequal.
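To make the test less of a black box: with `center = median`, Levene's test amounts to a one-way ANOVA on the absolute deviations of each observation from its own group's median. The sketch below reimplements that idea in base R; `levene_median` is a hypothetical helper, not a function from the car package. It is run on only the first ten rows of each group printed earlier, so its F value and p-value will differ from the output above.

```r
# Hypothetical helper: median-centred Levene test via ANOVA on absolute deviations.
levene_median <- function(y, group) {
  group <- as.factor(group)
  # Absolute deviation of each observation from its own group's median.
  dev <- abs(y - ave(y, group, FUN = median))
  # One-way ANOVA on those deviations; a large F suggests unequal spread.
  anova(lm(dev ~ group))
}

# First ten rows of each group as printed in the report (not the full data).
y <- c(82.8, 81.5, 81.1, 74.5, 78, 85, 78.8, 86, 81, 75.8,
       65, 77.8, 75.6, 52.4, 76.4, 76.3, 74.8, 72.7, 76.1, 76.9)
status <- rep(c("Developed", "Developing"), each = 10)

res <- levene_median(y, status)
res
```

The table has the same layout as the output above: one degree of freedom for the group term and the remainder for the residuals.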
Statistical hypotheses for two-sample t-test:
\[H_0: \mu_1 - \mu_2 = 0\]
\[H_A: \mu_1 - \mu_2 \ne 0\]
Two-sample t-test with unequal variances. The t.test() function can be used with var.equal = FALSE; this test is known as the Welch two-sample t-test.
exp_status %>% t.test(Life.expectancy ~ Status, data = ., var.equal = FALSE, alternative = "two.sided")
##
## Welch Two Sample t-test
##
## data: Life.expectancy by Status
## t = 12.753, df = 102.42, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 9.305573 12.733045
## sample estimates:
## mean in group Developed mean in group Developing
## 80.70937 69.69007
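As a cross-check, the Welch statistic, its degrees of freedom, and the confidence interval can be recomputed by hand from the group means, standard deviations, and sample sizes reported earlier. This is a sketch under the assumption that the rounded SDs (3.46 and 7.5) are accurate enough, so the results should agree with the output above only approximately; all variable names are illustrative.

```r
# Summary values taken from the report's output.
m1 <- 80.70937; s1 <- 3.46; n1 <- 32    # developed
m2 <- 69.69007; s2 <- 7.5;  n2 <- 151   # developing

# Per-group variance of the sample mean.
v1 <- s1^2 / n1
v2 <- s2^2 / n2

# Standard error of the difference in means (variances not pooled).
se <- sqrt(v1 + v2)

# Welch t statistic.
t_stat <- (m1 - m2) / se

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value and 95% confidence interval for the difference in means.
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- (m1 - m2) + c(-1, 1) * qt(0.975, df_welch) * se

round(c(t = t_stat, df = df_welch, lower = ci[1], upper = ci[2]), 3)
```

The recomputed values should land close to the reported t = 12.753, df = 102.42, and interval [9.306, 12.733]; small differences come from the rounding of the SDs.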
Levene’s test was statistically significant, so the variances of the two groups were assumed to be unequal.
Therefore, the Welch two-sample t-test was used to compare the two independent samples.
The hypothesis test showed a significant difference between the mean life expectancy of developed and developing countries.
The Welch two-sample t-test gave t = 12.753 (df = 102.42), \(p\) < 0.001, with a 95% CI for the difference in means of [9.306, 12.733], which does not contain the \(H_0\) value of 0.
The findings indicate that, in 2015, development status was significantly associated with life expectancy.
The mean life expectancy was 80.71 years in developed countries and 69.69 years in developing countries.
Therefore, it can be concluded that people in developed countries have a longer life expectancy.
The box plot shows no outliers in life expectancy for either developed or developing countries. However, if future data contain outliers, they will need to be handled before testing.
Nevertheless, the two groups in this investigation have very different sample sizes (32 developed and 151 developing countries). A future investigation should therefore increase the number of developed countries to match the number of developing countries, so that the comparison is more reliable.