The chosen dataset provides information about life expectancy in different countries. It consists of 22 columns and 2938 rows of data collected from the World Health Organization (WHO) and the United Nations Development Program (UNDP) for the years 2000 to 2015, and it covers factors related to life expectancy such as infant mortality rate, adult mortality rate, GDP per capita, education, and healthcare expenditure. The data can also be used to predict life expectancy in different countries from the factors that influence it. Researchers can use the dataset to study the impact of healthcare and education on life expectancy and to suggest policy changes that could improve people’s quality of life.

The dataset is relevant because life expectancy is a crucial indicator of a country’s development and well-being. Through visual analysis and data analytics, it can be used to explore trends in life expectancy over the years and to identify the countries that need more attention in healthcare and education.

I. ANOVA TECHNIQUE AND HYPOTHESIS 1

1.1. INTRO OF ANOVA TECHNIQUE

ANOVA, or Analysis of Variance, is a statistical technique used to compare the means of two or more groups or treatments. It compares the variation between the group means with the variation within the groups: if the between-group variation is large relative to the within-group variation (a large F-ratio), the group means are unlikely to be equal. ANOVA is most commonly used when there are more than two groups to compare and is particularly useful for designs with a categorical independent variable. A significant overall result shows that at least one group mean differs, although it does not by itself identify which groups differ.
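To make the between-group versus within-group comparison concrete, here is a minimal R sketch. It uses simulated data (the labels mimic the two country statuses, but every number is invented and none comes from the WHO dataset), computes the F-ratio by hand, and checks it against `aov()`.

```r
# Minimal sketch: one-way ANOVA computed by hand and with aov().
# The data are SIMULATED for illustration; they are not the WHO dataset.
set.seed(42)
status <- factor(rep(c("Developed", "Developing"), times = c(30, 120)))
life   <- ifelse(status == "Developed", 79, 67) + rnorm(length(status), sd = 8)

# Between-group sum of squares: group size times squared distance of group mean from grand mean
grand_mean  <- mean(life)
group_means <- tapply(life, status, mean)
group_sizes <- tapply(life, status, length)
ss_between  <- sum(group_sizes * (group_means - grand_mean)^2)

# Within-group (residual) sum of squares: distance of each value from its own group mean
ss_within <- sum((life - group_means[as.character(status)])^2)

df_between <- nlevels(status) - 1
df_within  <- length(life) - nlevels(status)

# F-ratio = between-group mean square / within-group mean square
f_manual <- (ss_between / df_between) / (ss_within / df_within)
p_manual <- pf(f_manual, df_between, df_within, lower.tail = FALSE)

# The same test with aov(); extracting the column gives the exact p-value,
# even when the printed table only shows "<2e-16"
tab   <- summary(aov(life ~ status))[[1]]
f_aov <- tab[["F value"]][1]
p_aov <- tab[["Pr(>F)"]][1]

# With exactly two groups, the F-statistic equals the squared pooled two-sample t-statistic
t_pooled <- unname(t.test(life ~ status, var.equal = TRUE)$statistic)

print(c(F_manual = f_manual, F_aov = f_aov, t_squared = t_pooled^2))
print(c(p_manual = p_manual, p_aov = p_aov))
```

Because `Status` has only two levels in this dataset (Df = 1 in the table in 1.3), the ANOVA in this section is equivalent to a pooled-variance two-sample t-test, as the last comparison in the sketch illustrates.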

1.2. INTRO OF HYPOTHESIS 1

- Null Hypothesis (H₀): There is no significant difference in life expectancy across different country statuses.

- Alternative Hypothesis (H₁): There is a significant difference in life expectancy across different country statuses.
In this hypothesis, we examine whether life expectancy varies significantly with the country status variable. ANOVA can determine whether mean life expectancy differs significantly between the categories of country status (i.e., developed and developing countries). The analysis yields an F-statistic and an associated p-value.

Possible Results Interpretation:

  • If the p-value is less than the predetermined significance level (e.g., 0.05), we reject the null hypothesis (H₀) and conclude that there is a significant difference in life expectancy among the country statuses. This indicates that country status is significantly associated with life expectancy.

  • Conversely, if the p-value exceeds the significance level, we fail to reject the null hypothesis. In such cases, there is insufficient evidence to conclude that life expectancy differs across country statuses.

1.3. R CODE

## [1] 2.465086e-170
## === Analysis of Variance (ANOVA) ===
##               Df Sum Sq Mean Sq F value Pr(>F)    
## Status         1  61715   61715   886.2 <2e-16 ***
## Residuals   2926 203776      70                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 10 observations deleted due to missingness
## The overall ANOVA is statistically significant (p <  0.01 ).
## There is evidence of a significant difference in life expectancy across different country statuses.
## The overall ANOVA is statistically significant (p < 0.05).
## There is evidence of a significant difference in life expectancy across different country statuses.

1.4. ANALYZE

The ANOVA investigated the relationship between the “Status” variable (country status) and the “Life” variable (life expectancy) and revealed a highly significant difference in life expectancy across country statuses. The summary table prints the p-value as “<2e-16” because R does not display values below machine precision; the exact p-value, printed on the first line of the output, is 2.465086e-170. This provides strong evidence against the null hypothesis and supports the alternative hypothesis.

F-value: The F-value obtained from this ANOVA was 886.2. The F-value is the ratio of the mean square for Status (61715) to the residual mean square (about 70), so the variation in life expectancy between country statuses is several hundred times larger than the variation within them. This indicates a strong association between a country’s status and its life expectancy.
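The F-value and the share of variation explained can be recomputed from the table in 1.3 as a quick check. This sketch uses only numbers printed in that table; because the table rounds the sums of squares, the recomputed p-value should be very close to, but not necessarily identical with, 2.465086e-170. Eta squared is not part of the original output and is added here only as an illustration.

```r
# Recompute the ANOVA quantities from the (rounded) sums of squares printed above.
ss_status <- 61715;  df_status <- 1
ss_resid  <- 203776; df_resid  <- 2926

ms_status <- ss_status / df_status
ms_resid  <- ss_resid / df_resid            # ~69.6 (printed as 70 in the table)
f_value   <- ms_status / ms_resid           # ~886.2, matching "F value"
p_value   <- pf(f_value, df_status, df_resid, lower.tail = FALSE)

# Eta squared: share of the total sum of squares attributable to Status
eta_sq <- ss_status / (ss_status + ss_resid)  # ~0.23

print(round(c(MS_resid = ms_resid, F = f_value, eta_squared = eta_sq), 3))
print(p_value)
```

With these inputs, eta squared is about 0.23, i.e. roughly 23% of the variation in life expectancy among the analysed observations is associated with country status.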

P-value: The obtained p-value (2.465086e-170) is far smaller than a significance level of 0.01, so the null hypothesis is rejected. Because any p-value below 0.01 is also below 0.05, the conclusion is the same at the 0.05 level. This supports the conclusion that there is a significant difference in life expectancy across country statuses.

In conclusion, the statistical analysis provides strong evidence to reject the null hypothesis. The findings support the alternative hypothesis: there is a substantial and statistically significant difference in life expectancy across country statuses.

II. PEARSON CORRELATION COEFFICIENT TECHNIQUE AND HYPOTHESIS 2

2.1. INTRO OF PEARSON CORRELATION COEFFICIENT TECHNIQUE

The Pearson correlation coefficient is a statistical measure used to quantify the strength and direction of the linear relationship between two continuous variables. It ranges from -1 to 1: values near 1 or -1 indicate a strong positive or negative linear relationship, and values near 0 indicate little or no linear relationship. A t-test on the coefficient indicates whether the observed correlation differs significantly from zero.
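A minimal R sketch of the coefficient and its significance test, on simulated data (the variable names echo GDP and Schooling, but the values are invented). It computes r from its definition and the test statistic t = r·√(n−2)/√(1−r²), then checks both against `cor()` and `cor.test()`.

```r
# Minimal sketch: Pearson's r from its definition, plus the t-test of H0: rho = 0.
# SIMULATED data for illustration only (not the dataset's GDP and Schooling columns).
set.seed(1)
gdp       <- rlnorm(200, meanlog = 7, sdlog = 1)
schooling <- 9 + 0.0002 * gdp + rnorm(200, sd = 2)

# r = sum of cross-products of centred values / product of their root sums of squares
dx <- gdp - mean(gdp)
dy <- schooling - mean(schooling)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Significance test: t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom
n        <- length(gdp)
t_manual <- r_manual * sqrt(n - 2) / sqrt(1 - r_manual^2)
p_manual <- 2 * pt(abs(t_manual), df = n - 2, lower.tail = FALSE)

# Built-in equivalents
ct <- cor.test(gdp, schooling, method = "pearson")
print(c(r_manual = r_manual, r_builtin = cor(gdp, schooling)))
print(c(t_manual = t_manual, t_builtin = unname(ct$statistic)))
print(c(p_manual = p_manual, p_builtin = ct$p.value))
```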

2.2. INTRO OF HYPOTHESIS 2

- H₀ (Null Hypothesis): There is no significant linear relationship between GDP and Schooling in low-income countries.

- H₁ (Alternative Hypothesis): A significant linear relationship exists between GDP and Schooling for low-income countries.

The hypothesis examines the potential linear relationship between GDP (Gross Domestic Product) and Schooling in low-income countries.

The null hypothesis (H₀) states that there is no significant linear relationship between GDP and Schooling in low-income countries; in other words, variation in GDP is not linearly associated with the level of schooling in these countries. The alternative hypothesis (H₁) states that a significant linear relationship exists, meaning that differences in GDP are associated with differences in the level of schooling. Such an association would be consistent with economic factors playing a role in educational outcomes, although a correlation alone cannot establish that GDP causes changes in schooling.

By testing these two hypotheses, we aim to gain insight into the relationship between GDP and Schooling in low-income countries and whether economic conditions are linked to educational opportunities.

Possible Results Interpretation:

  • If the correlation coefficient is close to 0, it suggests a weak or no linear relationship between GDP and Schooling.

  • If the correlation coefficient is positive, it indicates a positive linear relationship, implying that Schooling also tends to increase as GDP increases.

  • If the correlation coefficient is negative, it indicates a negative linear relationship, implying that as GDP increases, Schooling tends to decrease.

2.3. R CODE

## [1] "Pearson correlation coefficient: 0.108196033259806"

2.4. ANALYZE

  • P-value: The p-value from the correlation test is 0.000666236091774794 (about 0.00067), which is less than the significance level of 0.05. This is strong evidence against the null hypothesis (H₀) that there is no significant linear relationship between GDP and Schooling in low-income countries.

  • Coefficient: The Pearson correlation coefficient between GDP and Schooling in low-income countries is 0.108196033259806. Its positive sign indicates that Schooling tends to be higher where GDP is higher, but its small magnitude indicates a weak linear relationship: r² ≈ 0.012, so GDP accounts for only about 1.2% of the variation in Schooling.

In conclusion, the statistical analysis supports the alternative hypothesis (H₁) that a significant linear relationship exists between GDP and Schooling in low-income countries, although the relationship is weak.
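Why a coefficient of about 0.11 can be both weak and significant: r² measures the share of variance the two variables have in common, while the p-value also depends on the sample size. The output above does not report the number of low-income observations, so the sample sizes in this sketch are hypothetical and chosen only for illustration.

```r
# Pearson coefficient reported in the output above
r <- 0.108196033259806
r_squared <- r^2    # ~0.0117, i.e. about 1.2% of the variance in Schooling is shared with GDP

# Two-sided p-value of the Pearson test for a given r and sample size n
p_for_r <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)
}

# HYPOTHETICAL sample sizes, chosen only to show how n changes the conclusion
p_by_n <- sapply(c(n_50 = 50, n_1000 = 1000), function(n) p_for_r(r, n))
print(round(r_squared, 4))
print(p_by_n)
```

For the same r, the test is far from significant with 50 observations but significant with 1000, which is why a significant result here should not be read as a strong relationship.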

III. LINEAR REGRESSION TECHNIQUE AND HYPOTHESIS 3

3.1. INTRO OF LINEAR REGRESSION TECHNIQUE

The linear regression technique assesses the relationship between two variables by modelling a dependent variable as a linear function of an independent variable, which can then be used to predict the dependent variable. We use this technique to understand the relationship between BMI (Body Mass Index) and adult mortality rates in the top 5 wealthiest countries.
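For a single predictor, the least-squares estimates have a closed form: the slope is cov(x, y)/var(x) and the intercept is ȳ − slope·x̄. The R sketch below (simulated data, not the WHO values) computes both and checks them against `lm()`, the function used in the output in 3.3.

```r
# Minimal sketch: least-squares slope and intercept from the closed-form formulas,
# checked against lm(). SIMULATED data for illustration only.
set.seed(7)
bmi   <- runif(80, min = 15, max = 75)
adult <- 90 + 0.3 * bmi + rnorm(80, sd = 40)

b1 <- cov(bmi, adult) / var(bmi)     # slope: covariance of x and y over variance of x
b0 <- mean(adult) - b1 * mean(bmi)   # intercept: the line passes through (mean x, mean y)

fit <- lm(adult ~ bmi)
print(rbind(closed_form = c(b0, b1), lm = unname(coef(fit))))
```

The intercept formula is why a fitted simple regression line always passes through the point of means of the two variables.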

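The process begins with reading and cleaning the data: selecting the necessary columns and removing missing values. Next, we calculate GDP per capita and select the top 5 wealthiest countries based on this measure. The resulting list (Maldives, Georgia, Israel, Croatia, Tonga) reflects this procedure applied to the cleaned dataset and differs from commonly cited rankings of the world’s wealthiest countries.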
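The selection step could look like the following base-R sketch. It is hypothetical: the column names (`Country`, `GDP`, `bmi`, `Adult`) are assumed to match the variables in the regression output, the toy values are invented, and ranking countries by their average `GDP` is an assumption about how the wealth measure was computed, not a description of the original code.

```r
# Hypothetical sketch of the selection step. Column names and all values are
# assumptions for illustration; they are not taken from the dataset.
raw <- data.frame(
  Country = rep(c("A", "B", "C", "D", "E", "F", "G"), each = 2),
  GDP     = c(500, 520, 9000, 9500, NA, 3000, 150, 160, 40000, 42000, 7000, 7100, 800, 820),
  bmi     = c(20, 21, 55, 56, 40, 41, 18, 19, 60, 61, 50, NA, 25, 26),
  Adult   = c(200, 190, 80, 78, 120, 118, 250, 240, 60, 58, 90, 88, 150, 148)
)

# 1. Keep the needed columns and drop rows with missing values
clean <- na.omit(raw[, c("Country", "GDP", "bmi", "Adult")])

# 2. Average GDP per country and rank countries from highest to lowest
mean_gdp <- aggregate(GDP ~ Country, data = clean, FUN = mean)
top_5 <- head(mean_gdp[order(-mean_gdp$GDP), "Country"], 5)

# 3. Keep only the rows for those five countries
top_5_data <- clean[clean$Country %in% top_5, ]
print(top_5)
```

Because missing values are removed before ranking, the ranking in a sketch like this depends on which country-years survive cleaning.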

We use linear regression to examine the relationship between BMI and adult mortality rates, that is, how much of the variation in mortality rates can be explained by BMI. The model output reports the estimated intercept and slope with their standard errors, together with the p-value for the slope, which is used to evaluate the statistical significance of the relationship.

Finally, we use the fitted model to create a scatter plot of BMI against adult mortality rates in the top 5 wealthiest countries, with the regression line overlaid. This simple visualization shows the relationship between the two variables at a glance.
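A minimal sketch of such a plot in base R, assuming a data frame named `top_5_data` with columns `bmi` and `Adult` as in the `lm()` call shown in 3.3. The data here are simulated, and writing to a PDF file (with an arbitrary file name) is a choice made so the sketch also runs without a display.

```r
# Minimal sketch of the scatter plot with the fitted regression line overlaid.
# SIMULATED data for illustration; the output file name is an arbitrary choice.
set.seed(7)
top_5_data <- data.frame(bmi = runif(80, min = 15, max = 75))
top_5_data$Adult <- 90 + 0.3 * top_5_data$bmi + rnorm(80, sd = 40)

fit <- lm(Adult ~ bmi, data = top_5_data)

pdf("bmi_vs_adult_mortality.pdf", width = 7, height = 5)
plot(Adult ~ bmi, data = top_5_data,
     xlab = "BMI", ylab = "Adult mortality rate",
     main = "BMI vs adult mortality (simulated data)")
abline(fit, col = "red", lwd = 2)   # least-squares line from lm()
invisible(dev.off())
```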

3.2. INTRO OF HYPOTHESIS 3

- Null Hypothesis (H₀): "There is no significant relationship between BMI and adult mortality rates in the top 5 wealthiest countries."

- Alternative Hypothesis (H₁): "There is a significant relationship between BMI and adult mortality rates in the top 5 wealthiest countries."

In this hypothesis, we investigate whether BMI is significantly related to adult mortality rates in the top 5 wealthiest countries.

3.3. R CODE

Top 5 Wealthiest Countries
Country
Maldives
Georgia
Israel
Croatia
Tonga
Data for Top 5 Wealthiest Countries
Country bmi Adult
Croatia 63.7 95
Croatia 63.1 97
Croatia 62.5 97
Croatia 61.9 14
Croatia 61.3 14
Croatia 6.6 16
Croatia 6.0 19
Croatia 59.4 116
Croatia 58.7 114
Croatia 58.1 113
Croatia 57.5 116
Croatia 56.9 114
Croatia 56.3 122
Croatia 55.8 124
Croatia 55.2 126
Croatia 54.7 127
Georgia 56.2 129
Georgia 55.3 125
Georgia 54.4 128
Georgia 53.6 13
Georgia 52.8 127
Georgia 52.0 132
Georgia 51.3 133
Georgia 5.5 128
Georgia 49.9 12
Georgia 49.2 126
Georgia 48.6 128
Georgia 48.1 134
Georgia 47.5 132
Georgia 47.0 142
Georgia 46.5 121
Georgia 46.0 129
Israel 64.9 58
Israel 64.6 6
Israel 64.2 61
Israel 63.8 6
Israel 63.4 61
Israel 63.0 61
Israel 62.6 63
Israel 62.1 65
Israel 61.6 68
Israel 61.1 68
Israel 6.6 71
Israel 6.1 69
Israel 59.6 71
Israel 59.2 74
Israel 58.7 74
Israel 58.3 76
Maldives 27.4 61
Maldives 26.2 62
Maldives 25.1 64
Maldives 24.1 65
Maldives 23.1 67
Maldives 22.1 73
Maldives 21.2 75
Maldives 2.3 81
Maldives 19.5 82
Maldives 18.7 88
Maldives 18.0 93
Maldives 17.3 16
Maldives 16.7 112
Maldives 16.2 124
Maldives 15.6 129
Maldives 15.2 139
Tonga 75.2 133
Tonga 74.8 135
Tonga 74.3 137
Tonga 73.8 138
Tonga 73.3 14
Tonga 72.7 142
Tonga 72.1 147
Tonga 71.5 145
Tonga 7.8 146
Tonga 7.1 148
Tonga 69.4 15
Tonga 68.6 151
Tonga 67.8 153
Tonga 67.0 155
Tonga 66.2 157
Tonga 65.5 158
## 
## Call:
## lm(formula = Adult ~ bmi, data = top_5_data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -93.58 -28.20  15.49  35.13  62.95 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  83.2580    11.8600   7.020 7.21e-10 ***
## bmi           0.2526     0.2270   1.113    0.269    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 44.29 on 78 degrees of freedom
## Multiple R-squared:  0.01563,    Adjusted R-squared:  0.003006 
## F-statistic: 1.238 on 1 and 78 DF,  p-value: 0.2692

3.4. ANALYZE

The results from the linear regression model allow us to assess the relationship between BMI and adult mortality rates in the top 5 wealthiest countries.

  • Coefficients: The estimated BMI coefficient is 0.2526 with a standard error of 0.2270, meaning that each one-unit increase in BMI is associated with an increase of about 0.25 in the adult mortality rate. However, the standard error is almost as large as the estimate (t = 1.113), and the p-value for BMI (0.269) does not reach statistical significance at the 0.05 level, so this slope cannot be distinguished from zero.

  • R-squared: The R-squared value is 0.01563, meaning that only about 1.56% of the variation in mortality rates is explained by the linear model using BMI (the adjusted R-squared is 0.003). The model therefore explains very little of the variance in the data.

  • P-value: The p-value for the BMI coefficient is 0.269 (> 0.05), so we fail to reject the null hypothesis of no significant relationship between BMI and adult mortality rates in the top 5 wealthiest countries.
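These figures are internally consistent, which can be checked from the printed summary alone: the t value is the estimate divided by its standard error, and with a single predictor the overall F-statistic is the square of that t value, which is why both p-values are 0.269. The sketch uses only numbers copied from the output in 3.3.

```r
# Values copied from the lm() summary above
estimate  <- 0.2526
std_error <- 0.2270
df_resid  <- 78

t_value <- estimate / std_error                    # ~1.113, the printed "t value"
p_value <- 2 * pt(-abs(t_value), df = df_resid)    # ~0.269, the printed "Pr(>|t|)"
f_value <- t_value^2                               # ~1.238, the overall F-statistic

print(round(c(t = t_value, p = p_value, F = f_value), 3))
```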

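Based on these results, there is insufficient statistical evidence to conclude that BMI and adult mortality rates are significantly related in the top 5 wealthiest countries; within this dataset, no clear linear relationship between the two is apparent. The data themselves also call for caution: the table above includes BMI values above 60 and values that break otherwise smooth sequences (for example, Croatia’s BMI goes from 61.3 to 6.6 and back to 59.4, and its adult mortality drops from 97 to 14 before returning to 116), which look like recording errors and may distort the fit.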

IV. T-TEST TECHNIQUE AND HYPOTHESIS 4

4.1. INTRO OF T-TEST TECHNIQUE

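The t-test is a statistical technique used to compare the means of two groups and determine whether the observed difference is statistically significant, by calculating a t-value and comparing it to a critical value (or, equivalently, computing a p-value). This analysis uses the paired t-test, which applies when each observation in one group is matched with one in the other: here, each country’s life expectancy in one year is paired with the same country’s value five years later, and the test asks whether the mean of these within-country differences is greater than zero.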
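A minimal R sketch of a one-sided paired t-test, computed by hand and with `t.test()`. The ten "countries" and their values are simulated for illustration; only the structure (the same units measured twice, H₁: mean difference > 0) mirrors the analysis in this section.

```r
# Minimal sketch: one-sided paired t-test by hand and with t.test().
# SIMULATED life expectancies for 10 hypothetical countries, not the WHO data.
set.seed(3)
life_2000 <- rnorm(10, mean = 66, sd = 8)
life_2005 <- life_2000 + rnorm(10, mean = 1.5, sd = 1)   # same countries five years later

d        <- life_2005 - life_2000                        # within-country differences
t_manual <- mean(d) / (sd(d) / sqrt(length(d)))
p_manual <- pt(t_manual, df = length(d) - 1, lower.tail = FALSE)   # H1: mean difference > 0

# t.test(x, y, paired = TRUE) tests the mean of x - y, so argument order sets the sign
tt <- t.test(life_2005, life_2000, paired = TRUE, alternative = "greater")
print(c(t_manual = t_manual, t_builtin = unname(tt$statistic)))
print(c(p_manual = p_manual, p_builtin = tt$p.value))
```

Since `t.test(x, y, paired = TRUE)` works with `x - y`, listing the later year first, as in the output `data_2005 and data_2000`, makes `alternative = "greater"` test for an increase.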

4.2. INTRO OF HYPOTHESIS 4

- Null Hypothesis (H₀): There is no significant improvement in global life expectancy between consecutive five-year time points (2000, 2005, 2010, and 2015).

- Alternative Hypothesis (H₁): There is a significant improvement in global life expectancy between consecutive five-year time points.

With these hypotheses, we aim to evaluate whether there is evidence to reject the null hypothesis in favor of the alternative hypothesis, suggesting a significant improvement in global life expectancy over time.

4.3. R CODE

## 
##  Paired t-test
## 
## data:  data_2005 and data_2000
## t = 6.9778, df = 182, p-value = 2.695e-11
## alternative hypothesis: true mean difference is greater than 0
## 95 percent confidence interval:
##  1.113328      Inf
## sample estimates:
## mean difference 
##        1.459016
## 
##  Paired t-test
## 
## data:  data_2010 and data_2005
## t = 6.9744, df = 182, p-value = 2.747e-11
## alternative hypothesis: true mean difference is greater than 0
## 95 percent confidence interval:
##  1.403333      Inf
## sample estimates:
## mean difference 
##        1.839344
## 
##  Paired t-test
## 
## data:  data_2015 and data_2010
## t = 6.5621, df = 182, p-value = 2.673e-10
## alternative hypothesis: true mean difference is greater than 0
## 95 percent confidence interval:
##  1.173185      Inf
## sample estimates:
## mean difference 
##        1.568306

4.4. ANALYZE

All three p-values are extremely small, providing strong evidence to reject the null hypothesis in each comparison and indicating a significant increase in life expectancy over each five-year interval between 2000 and 2015. Specifically:

Paired t-test 2000-2005

  • t-value: 6.9778

  • Degrees of freedom (df): 182

  • p-value: 2.695e-11

  • Alternative hypothesis: The true mean difference is greater than 0.

  • One-sided 95% confidence interval: 1.113328 to Inf

  • Sample estimate of the mean difference: 1.459016

  • Interpretation: The p-value is extremely small (2.695e-11), indicating strong evidence to reject the null hypothesis. Hence, the results suggest a statistically significant increase in life expectancy between 2000 and 2005.

Paired t-test 2005-2010

  • t-value: 6.9744

  • df: 182

  • p-value: 2.747e-11

  • Alternative hypothesis: The true mean difference is greater than 0.

  • One-sided 95% confidence interval: 1.403333 to Inf

  • Sample estimate of the mean difference: 1.839344

  • Interpretation: The p-value is extremely small (2.747e-11), providing strong evidence to reject the null hypothesis. The results indicate a statistically significant increase in life expectancy between 2005 and 2010.

Paired t-test 2010-2015

  • t-value: 6.5621

  • df: 182

  • p-value: 2.673e-10

  • Alternative hypothesis: The true mean difference is greater than 0.

  • One-sided 95% confidence interval: 1.173185 to Inf

  • Sample estimate of the mean difference: 1.568306

  • Interpretation: The p-value is extremely small (2.673e-10), providing strong evidence to reject the null hypothesis. The results indicate a statistically significant increase in life expectancy between 2010 and 2015.
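As a consistency check on the three outputs, the lower bound of each one-sided 95% confidence interval can be recovered from the reported mean difference and t value alone, since the standard error equals the mean difference divided by t. The sketch uses only numbers printed in 4.3.

```r
# Values copied from the three paired t-test outputs above
mean_diff <- c(`2000-2005` = 1.459016, `2005-2010` = 1.839344, `2010-2015` = 1.568306)
t_value   <- c(6.9778, 6.9744, 6.5621)
df        <- 182   # 183 pairs minus 1

se    <- mean_diff / t_value                 # standard error of each mean difference
lower <- mean_diff - qt(0.95, df) * se       # lower bound of the one-sided 95% interval
print(round(lower, 4))
```

The recovered bounds agree with the printed ones (1.113328, 1.403333, 1.173185) up to the rounding of the t values.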

Overall, the statistical analysis shows solid evidence that life expectancy increased globally over the 15 years, with a significant increase in each of the three consecutive five-year intervals (an estimated mean gain of about 1.5 to 1.8 years per interval).