Introduction

According to the World Health Organization, global life expectancy increased from 66 years in 2000 to 72 years in 2019. A gain of 6 years on a base of 66 years is an improvement of about 9.09%, which is a substantial increase (Who.int, 2021). However, the variation in life expectancy across continents and across countries deserves attention, because different continents have different average life expectancies. This paper focuses on that variation. It is driven by a number of factors, among the most significant of which are GDP per capita and population size. Thus, this paper also aims to establish the impact of GDP per capita and population size on life expectancy.

Although the World Health Organization found in its 2020 report that average life expectancy increased from 66 years to 72 years, Gutin and Hummer (2021), in their study on social inequality and the future of US life expectancy, found that racial and ethnic identity is the primary determinant of life expectancy in the United States. African Americans had a lower average life expectancy than European Americans, and the study concluded that, holding all other factors constant, racial and ethnic identity determined the average number of years a person can expect to live in the United States. As discussed above, life expectancy is determined by a number of factors, which include but are not limited to GDP per capita and population. GDP per capita is a measure of the average income per head, calculated annually or quarterly.

The objective of the study

Like any other study, this study is guided by research objectives, which are as follows.
I. To establish the impact of GDP per capita on life expectancy.
II. To establish the impact of population size on life expectancy.
III. To establish the difference in the average life expectancy and GDP per capita across continents.

Research Questions

Every research study seeks to answer the problem under consideration, and the research questions set out what the researcher has to answer. The following are the research questions for this study.
I. What is the impact of GDP per capita on life expectancy?
II. What is the impact of population size on life expectancy?
III. Is there a statistically significant difference in the average life expectancy across various continents?
IV. Is there a statistically significant difference in the average GDP per capita across various continents?

Hypothesis

A hypothesis is a proposition with little or no statistical evidence to prove its validity. Testing a hypothesis helps answer the research question under consideration. The following are the null and alternative hypotheses for this study.

Null hypothesis

There is no statistically significant effect of GDP per capita on life expectancy at a 5% level of significance.

Alternative hypothesis

There is a statistically significant effect of GDP per capita on life expectancy at a 5% level of significance.

Literature Review

The literature review section discusses the theoretical and empirical literature related to GDP per capita, population size and life expectancy.

Empirical Literature

Miladinov (2020) studied the relationship between socioeconomic development and life expectancy. The author collected data from the EU accession countries, which include Macedonia, Serbia, Bosnia and Herzegovina, Montenegro, and Albania, and used pooled time series data from 1990 to 2017. GDP per capita was the socioeconomic variable used as the independent variable, and life expectancy was the dependent variable. The study established that higher GDP per capita results in reduced infant mortality and a significant increase in life expectancy. Further, Miladinov (2020) identified income as one of the key determinants of life expectancy. On the other hand, Guo (2016) argues that the relationship between GDP per capita and life expectancy is not as simple to interpret and understand as it might appear. In his paper examining the relationship between GDP and life expectancy, he found that when a socioeconomic indicator such as GDP per capita rises above the expected level, life expectancy also rises above the expected level. Therefore, despite the unclear linear association between GDP per capita and life expectancy, Guo (2016) found a significant and positive relationship between the two. Dayanikli, Gokare and Kincaid (2016) studied the effect of GDP per capita on national life expectancy. The study used cross-sectional data collected from various sources, with the primary purpose of establishing the correlation between GDP per capita, public health expenditure, average years of education and life expectancy. The study found a positive and significant correlation between GDP per capita and life expectancy. Further, the study identified individual level of income as a primary determinant of life expectancy. Gwatkin and Brandel (1982) focused on Third World countries to establish the effect of population size on life expectancy.
In their study, Gwatkin and Brandel (1982) found that population explosion, especially in the least developed countries, results in an increase in the unemployment rate, which ultimately affects the overall economy. When the population grows beyond the available job opportunities, households become unable to meet daily needs such as medication and education. This inability to meet daily needs such as medication in turn reduces life expectancy, especially in the least developed countries. Mackenbach (2002) aimed to establish the effect on life expectancy of the income inequality brought about by rapid population growth. He found a strong negative correlation between income inequality and life expectancy. Further, population growth was found to have a negative correlation with GDP per capita. In other words, an increase in population size results in a significantly lower GDP per capita, which in turn gives rise to a lower life expectancy, especially in middle- and lower-income countries. Torres, Canudas-Romo and Oeppen (2019) found a negative correlation between population growth and life expectancy. The authors measured the distribution of the population in terms of low mortality and high mortality. The data focused on the changes in Scottish life expectancy between 1861 and 1910. The variables under consideration were changes in mortality and changes in population composition and density. The study established a negative correlation between changes in population composition and density and life expectancy. The findings from the empirical literature above show that both population and GDP per capita significantly affect life expectancy.

Conceptual Framework

The figure above shows the functional relationship between the independent variables and the dependent variable, that is, GDP per capita and population as the independent variables and life expectancy as the dependent variable. GDP per capita is theoretically directly related to life expectancy, since an increase in GDP per capita leads to an increase in the average number of years an individual is expected to live. On the other hand, population growth has a negative relationship with life expectancy. A change in population composition and density significantly affects life expectancy.

Operationalization of the independent variables

Exogenous variable | Indicator       | Measure
GDP per capita     | GDP per capita  | National GDP divided by the total population
Population         | Population size | Total population size

Methodology

The methodology section discusses the approach and techniques used in this paper, from data collection and sampling to analysis. Further, the variables under consideration in this study (GDP per capita, population and life expectancy) are defined in this section.

Research Design

The study employed a quantitative research design in which descriptive statistical methods were used. According to Scirp.org (2015), “descriptive statistics help to measure the cause and effect relationship between the variables under consideration.” Thus, in this paper, we use descriptive statistics to evaluate the relationship between life expectancy, GDP per capita and population. Further, in order to establish the linear effect of GDP per capita and population on life expectancy, we used multiple linear regression analysis, in which the effects of GDP per capita and population were estimated.

Population of the study

The population of the study is the total set of elements from which our sample is drawn. In this paper, our population comprises the data on GDP per capita, population size and life expectancy for various countries in various continents.

Sampling Procedure

Different researchers employ different sampling techniques, including but not limited to simple random sampling, systematic sampling, convenience sampling and purposive sampling. This study used the purposive sampling technique, which is sometimes referred to as judgmental sampling. Purposive sampling is a sampling technique in which the researcher uses judgment to select, from the population of interest, the sample data expected to serve the purpose of the study. The sample data used in this study was downloaded from kaggle.com using the link: https://www.kaggle.com/tklimonova/gapminder-datacamp-2007.

Nature of the data

The data used in this paper was quantitative in nature. The variables of interest, namely GDP per capita, population size and life expectancy, were all quantitative.

Data Analysis

In order to establish the impact of GDP per capita and population size on life expectancy, both descriptive and inferential statistics were used. A multiple linear regression analysis was used to establish the effect of GDP per capita and population size on life expectancy. Further, an independent t-test was used to determine whether there exists a significant difference in the average life expectancy across continents.

Linearity Assumption of Regression Analysis

Linearity is one of the assumptions of classical linear regression modelling. Under this assumption, there is a linear association between the dependent and independent variables. In this paper, the linear association between the dependent variable and each individual independent variable was tested. Log-transforming the variables under consideration helps bring about a linear relationship between the dependent and independent variables.

Regression Analytical Model

Consider the sketch below representing the analytic model for this study:

$$\log(\text{life\_expectancy}) = \alpha + \beta_1 \log(\text{GDP\_per\_capita}) + \beta_2 \log(\text{population}) + \varepsilon$$
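One way to read this specification, assuming the logs are natural logarithms (as with R's `log()`), is that each slope acts as an elasticity. The short derivation below is an illustrative sketch: it holds population and the error term fixed, and primes denote values after a change in GDP per capita.

```latex
\begin{aligned}
\log(\text{life\_expectancy}') - \log(\text{life\_expectancy})
  &= \beta_1 \left[ \log(\text{GDP\_per\_capita}') - \log(\text{GDP\_per\_capita}) \right] \\
\frac{\text{life\_expectancy}'}{\text{life\_expectancy}}
  &= \left( \frac{\text{GDP\_per\_capita}'}{\text{GDP\_per\_capita}} \right)^{\beta_1}
\end{aligned}
```

For small changes, a 1% increase in GDP per capita therefore corresponds to a change of roughly β1 percent in life expectancy, and the same reading applies to β2 and population.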

Conducting the Analysis

Load the required packages before running the analysis.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.8
## ✓ tidyr   1.2.0     ✓ stringr 1.4.0
## ✓ readr   2.1.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(tidyr)
library(ggplot2)
library(ggthemes)
library(stargazer)
## 
## Please cite as:
##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
library(rmarkdown)
library(gapminder)
Import the data set.
data("gapminder")

View the data set

head(gapminder,10)

Attach the data set

attach(gapminder)

Descriptive Statistics

summary(gapminder)
##         country        continent        year         lifeExp     
##  Afghanistan:  12   Africa  :624   Min.   :1952   Min.   :23.60  
##  Albania    :  12   Americas:300   1st Qu.:1966   1st Qu.:48.20  
##  Algeria    :  12   Asia    :396   Median :1980   Median :60.71  
##  Angola     :  12   Europe  :360   Mean   :1980   Mean   :59.47  
##  Argentina  :  12   Oceania : 24   3rd Qu.:1993   3rd Qu.:70.85  
##  Australia  :  12                  Max.   :2007   Max.   :82.60  
##  (Other)    :1632                                                
##       pop              gdpPercap       
##  Min.   :6.001e+04   Min.   :   241.2  
##  1st Qu.:2.794e+06   1st Qu.:  1202.1  
##  Median :7.024e+06   Median :  3531.8  
##  Mean   :2.960e+07   Mean   :  7215.3  
##  3rd Qu.:1.959e+07   3rd Qu.:  9325.5  
##  Max.   :1.319e+09   Max.   :113523.1  
## 

In the summary output above, life expectancy has a mean of 59.47. In other words, the average life expectancy in the sample is 59.47 years. Life expectancy has a minimum and a maximum of 23.60 and 82.60, respectively. On the other hand, GDP per capita has a mean of $7215.3, with a minimum and a maximum of $241.2 and $113523.1, respectively. Population size has a mean of 2.960E+07, with a minimum and a maximum of 6.001E+04 and 1.319E+09, respectively.

Test the normality of the data

shapiro.test(lifeExp)
## 
##  Shapiro-Wilk normality test
## 
## data:  lifeExp
## W = 0.95248, p-value < 2.2e-16
shapiro.test(gdpPercap)
## 
##  Shapiro-Wilk normality test
## 
## data:  gdpPercap
## W = 0.6522, p-value < 2.2e-16
shapiro.test(pop)
## 
##  Shapiro-Wilk normality test
## 
## data:  pop
## W = 0.23598, p-value < 2.2e-16

Consider the null and alternative hypotheses below for the normality test.

Null hypothesis

The observations in the data set are normally distributed.

Alternative hypothesis

The observations in the data set are not normally distributed.

The null hypothesis is rejected if and only if the p-value is less than 0.05. Therefore, from the results above for the three variables, we reject the null hypothesis and conclude that the observations from the three variables (life expectancy, GDP per capita and population) are not normally distributed.
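The three tests above can also be collected into a single table, which makes the decision rule explicit. The following is a minimal sketch, assuming the `gapminder` package is installed; the names `vars`, `normality` and `reject_null` are illustrative and do not appear in the original analysis.

```r
library(gapminder)

# Variables tested for normality in this paper
vars <- c("lifeExp", "gdpPercap", "pop")

# Run the Shapiro-Wilk test on each variable and keep W and the p-value
normality <- sapply(vars, function(v) {
  test <- shapiro.test(gapminder[[v]])
  c(W = unname(test$statistic), p_value = test$p.value)
})

# Decision rule: reject normality when the p-value is below 0.05
reject_null <- normality["p_value", ] < 0.05

print(round(normality, 5))
print(reject_null)
```

Each column of `normality` corresponds to one variable, so the W statistics can be compared directly with the individual outputs above.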

Data visualization to test for normality
GDP Per Capita
hist(gdpPercap,breaks = 20, main="Histogram for the GDP Per capita",xlab="GDP Per Capita", ylab="Frequency")

The histogram above shows that the observations for GDP per capita are indeed not normally distributed. The graph shows that the observations are skewed to the right. However, the log-transformed variable (GDP per capita) gave a histogram showing approximately normally distributed observations. Consider the graph below.

hist(log(gdpPercap),breaks = 20, main="Histogram for the log of GDP Per capita",xlab="log of GDP Per Capita", ylab="Frequency")

The histogram above shows that the observations for the log-transformed GDP per capita are approximately normally distributed.

Population size
hist(pop, main="Histogram for the population size",xlab="Population Size", ylab="Frequency")

Similarly, the graph above shows that population size is not normally distributed. Consider the histogram below for the log-transformed population.

hist(log(pop), main="Histogram for the log of population size",xlab="log of Population Size", ylab="Frequency")

The histogram above shows approximately normal data once the variable “pop” has been transformed into its log equivalent.

Life Expectancy
hist(lifeExp, main="Histogram Showing the Distribution of life expectancy",xlab="Life Expectancy", ylab="Frequency")

The histogram above shows a slightly left-skewed distribution. In this case, log-transforming the variable (life expectancy) does not make the observations normally distributed. For example, consider the graph below.

hist(log(lifeExp), main="Histogram Showing the Distribution of the log of life expectancy",xlab="log of Life Expectancy", ylab="Frequency")

The graph above is evidence that log-transforming the variable “lifeExp” does not bring about normality of the data.
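The visual impressions above can be backed by a numeric summary. The sketch below computes the sample skewness (the third standardized moment) of each variable before and after the log transformation; values near zero indicate a roughly symmetric distribution. The helper `skewness()` is defined here for illustration and is not part of the original analysis, and the `gapminder` package is assumed to be installed.

```r
library(gapminder)

# Sample skewness: third central moment divided by the cubed standard deviation
skewness <- function(x) {
  centred <- x - mean(x)
  mean(centred^3) / mean(centred^2)^1.5
}

vars <- c("lifeExp", "gdpPercap", "pop")

# Skewness on the raw scale and on the log scale
raw_skew <- sapply(gapminder[vars], skewness)
log_skew <- sapply(gapminder[vars], function(x) skewness(log(x)))

# Positive values mean a right skew, negative values a left skew
print(round(rbind(raw = raw_skew, log = log_skew), 3))
```

Because the log function compresses large values more than small ones, it pulls in a long right tail, which helps right-skewed variables such as GDP per capita and population, but it pushes an already left-skewed variable such as life expectancy further to the left.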

Independent T-Test

An independent t-test is an important statistical test for examining the difference in the averages of two groups. This paper examines the difference in the average life expectancy between various continents. For example, consider the output below:

DF <- gapminder %>%
  dplyr::select(continent, lifeExp)%>%
  filter(continent == "Americas"|
           continent == "Asia")

head(DF,5)
tail(DF,5)
t.test(data = DF, lifeExp ~ continent)
## 
##  Welch Two Sample t-test
## 
## data:  lifeExp by continent
## t = 5.713, df = 692.94, p-value = 1.648e-08
## alternative hypothesis: true difference in means between group Americas and group Asia is not equal to 0
## 95 percent confidence interval:
##  3.015071 6.172596
## sample estimates:
## mean in group Americas     mean in group Asia 
##               64.65874               60.06490

The p-value of 1.648e-08, which is well below 0.0001, indicates that there is a statistically significant difference in the average life expectancy between Asia and the Americas. These results are consistent with findings from Statista (2021), which indicate that the average life expectancy in Asia is lower than the average life expectancy in Latin America and the Caribbean. This is one example showing that life expectancy varies significantly across continents. Consider the following second output for the independent t-test, showing the difference in the average life expectancy for Africa and Europe.

DF2 <- gapminder %>%
  dplyr::select(continent, lifeExp)%>%
  filter(continent == "Africa"|
           continent == "Europe")

head(DF2,5)
tail(DF2,5)
t.test(data = DF2, lifeExp ~ continent)
## 
##  Welch Two Sample t-test
## 
## data:  lifeExp by continent
## t = -49.551, df = 981.2, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group Africa and group Europe is not equal to 0
## 95 percent confidence interval:
##  -23.95076 -22.12595
## sample estimates:
## mean in group Africa mean in group Europe 
##             48.86533             71.90369

The p-value, which is below 2.2e-16, is less than 0.05. This indicates that the average life expectancy in Africa is significantly different from the average life expectancy in Europe. From the output, the average life expectancy in Africa is approximately 48.87 years, while the average life expectancy in Europe is approximately 71.90 years.
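To make the reported statistic less of a black box, the sketch below recomputes the Welch t statistic and its Welch-Satterthwaite degrees of freedom by hand for the Africa and Europe comparison. It assumes the `gapminder` package is installed; the variable names are illustrative and not part of the original analysis.

```r
library(gapminder)

africa <- gapminder$lifeExp[gapminder$continent == "Africa"]
europe <- gapminder$lifeExp[gapminder$continent == "Europe"]

# Squared standard error of each group mean
v_africa <- var(africa) / length(africa)
v_europe <- var(europe) / length(europe)

# Welch t statistic: difference in means over the combined standard error
t_manual <- (mean(africa) - mean(europe)) / sqrt(v_africa + v_europe)

# Welch-Satterthwaite approximation to the degrees of freedom
df_manual <- (v_africa + v_europe)^2 /
  (v_africa^2 / (length(africa) - 1) + v_europe^2 / (length(europe) - 1))

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

print(c(t = t_manual, df = df_manual, p = p_manual))
```

The hand-computed values should agree with the t statistic and degrees of freedom in the `t.test()` output above, since `t.test()` applies the Welch correction by default.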

Therefore, from the results in the two scenarios above, it is statistically convincing that the average life expectancy varies across various continents. Consider the following summary table showing the average life expectancy across various continents.

AVG_LIFE_EXP<-gapminder %>%
  dplyr::select(continent, lifeExp)%>%
  group_by(continent)%>%
  summarise(Average_life = mean(lifeExp))
AVG_LIFE_EXP

The results above show that Africa has the lowest average life expectancy and Oceania the highest. It is also argued that regions with a higher average GDP per capita experience a higher average life expectancy. According to Fatima (2019), who studied the relationship between GDP per capita, life expectancy and GDP growth rate, a higher average life expectancy is always associated with a higher GDP per capita. Consider the output below showing the average GDP per capita across continents.

AVG_GDP_PER_CAP<-gapminder %>%
  dplyr::select(continent, gdpPercap)%>%
  group_by(continent)%>%
  summarise(Average_gdppercap = mean(gdpPercap))
AVG_GDP_PER_CAP

From the results above, Oceania has the highest average GDP per capita and Africa the lowest. These results match the results for the average life expectancy across continents. From the two outputs, the continent with the highest average GDP per capita has the highest average life expectancy, and the continent with the lowest average GDP per capita has the lowest average life expectancy. This brings us to the conclusion that GDP per capita is the primary determinant of life expectancy.
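The two t-tests above each compare one pair of continents. As a complement that this paper does not itself report, a Welch one-way test (`oneway.test()` in base R, which by default does not assume equal variances) can compare all five continents at once, for both life expectancy and GDP per capita. This is a hedged sketch, assuming the `gapminder` package is installed; the object names are illustrative.

```r
library(gapminder)

# Welch one-way test of equal mean life expectancy across continents
life_test <- oneway.test(lifeExp ~ continent, data = gapminder)

# The same test for GDP per capita
gdp_test <- oneway.test(gdpPercap ~ continent, data = gapminder)

print(life_test)
print(gdp_test)

# A p-value below 0.05 indicates that at least one continent mean differs
significant <- c(lifeExp = life_test$p.value,
                 gdpPercap = gdp_test$p.value) < 0.05
print(significant)
```

A significant result only says that the continent means are not all equal; pairwise comparisons such as the t-tests above are still needed to see which continents differ.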

Linear Relationship Between GDP Per Capita and Life Expectancy

Before running the multiple linear regression analysis, it is appropriate to test whether each independent variable has a linear association with the dependent variable. Consider the scatter plots below.

gapminder %>%
  filter(gdpPercap < 50000)%>%
  ggplot(aes(x=gdpPercap, y=lifeExp, col=continent))+
  geom_point(alpha=0.5)+
   xlab("GDP Per Capita")+
  ylab("Life Expectancy")+
  labs(title = "A scatter plot showing the linear association between GDP Per capita and Life Expectancy")

gapminder %>%
  filter(gdpPercap < 50000)%>%
  ggplot(aes(x=log(gdpPercap), y=lifeExp, col=continent))+
  geom_point(alpha=0.5)+
  xlab("log of GDP Per Capita")+
  ylab("Life Expectancy")+
  labs(title = "A scatter plot showing the linear association between the log of GDP Per capita and Life Expectancy")

The graph above shows a positive linear association between the log of GDP per capita and life expectancy.

ggplot(data=gapminder,aes(x=pop,y=lifeExp))+
  geom_point(alpha=0.7)+
  xlab("Population size")+
  ylab("Life Expectancy")+
  labs(title = "A scatter plot showing the linear association between population size and Life Expectancy")

ggplot(data=gapminder,aes(x=log(pop),y=lifeExp))+
  geom_point(alpha=0.7)+
  xlab("Log of Population size")+
  ylab("Life Expectancy")+
  labs(title = "A scatter plot showing the linear association between the log of population size and Life Expectancy")
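As a numeric companion to the scatter plots, the sketch below compares the Pearson correlation of life expectancy with each predictor on the raw scale and on the log scale. A larger absolute correlation after the transformation suggests that the relationship is closer to linear. The `gapminder` package is assumed to be installed, and the object name `linearity_check` is illustrative.

```r
library(gapminder)

# Pearson correlations with life expectancy: raw versus log-transformed predictors
linearity_check <- with(gapminder, rbind(
  gdpPercap = c(raw = cor(lifeExp, gdpPercap), log = cor(lifeExp, log(gdpPercap))),
  pop       = c(raw = cor(lifeExp, pop),       log = cor(lifeExp, log(pop)))
))

print(round(linearity_check, 3))
```

Unlike the first two scatter plots, this check uses all observations rather than only those with GDP per capita below 50000, so the figures describe the full sample.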

Multiple Linear Regression

A multiple linear regression analysis is a statistical approach in which the effect of more than one independent variable on the dependent variable is assessed. In this paper, we assessed the impact of GDP per capita and population size on life expectancy. Consider the following results.

model <- lm(log(lifeExp)~log(gdpPercap)+log(pop), data=gapminder)
summary(model)
## 
## Call:
## lm(formula = log(lifeExp) ~ log(gdpPercap) + log(pop), data = gapminder)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.67318 -0.07028  0.01482  0.09189  0.35485 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    2.501815   0.039451   63.42   <2e-16 ***
## log(gdpPercap) 0.145427   0.002726   53.35   <2e-16 ***
## log(pop)       0.023564   0.002108   11.18   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1395 on 1701 degrees of freedom
## Multiple R-squared:  0.6397, Adjusted R-squared:  0.6392 
## F-statistic:  1510 on 2 and 1701 DF,  p-value: < 2.2e-16

The results above help answer our research questions and decide whether to reject or fail to reject the null hypothesis. According to Schneider (2014), the null hypothesis is rejected if and only if the p-value associated with the regression coefficient is less than 0.05. Because both the dependent and independent variables are log-transformed, the coefficients are elasticities: a 1% change in GDP per capita is associated with a change of about 0.145% in life expectancy in the same direction. The p-value, which is below 2e-16, indicates that the effect of GDP per capita on life expectancy is statistically significant at a 1% level of significance. On the other hand, a 1% change in population size is associated with a change of about 0.024% in life expectancy. The coefficient is positive, so the change is in the same direction rather than in the opposite direction suggested by the conceptual framework, and the effect is statistically significant, as indicated by the p-value below 2e-16, at a 1% level of significance. According to Fang and Yang (2019), the significance level is the probability of committing a type I error, that is, the probability of rejecting a true null hypothesis. Therefore, this test was conducted with a 1% chance of committing a type I error. The p-values in both cases are less than 0.01, indicating that both GDP per capita and population size have a statistically significant effect on life expectancy at a 1% level of significance. The regression results gave an adjusted R-squared of 0.6392. This shows that 63.92% of the variation in the log of life expectancy is explained by the logs of GDP per capita and population size. The F-test, F(2, 1701) = 1510, p < 2.2e-16, is also significant, implying that the model is fit for prediction.
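Because the model is fitted on logs, its slopes translate into relative changes in life expectancy. The sketch below is illustrative only: it copies the two slope estimates from the regression output above and uses a hypothetical helper, `relative_change()`, to convert a 10% increase in each predictor into a predicted relative change in life expectancy, holding the other predictor fixed.

```r
# Coefficients copied from the regression output reported above
beta_gdp <- 0.145427
beta_pop <- 0.023564

# In a log-log model, multiplying a predictor by k multiplies the
# predicted outcome by k^beta, holding the other predictor fixed
relative_change <- function(beta, pct_increase) {
  (1 + pct_increase / 100)^beta - 1
}

# Predicted relative change in life expectancy for a 10% increase
effect_gdp <- relative_change(beta_gdp, 10)
effect_pop <- relative_change(beta_pop, 10)

# Express the effects as percentages
print(round(100 * c(gdp = effect_gdp, pop = effect_pop), 2))
```

Under these estimates, a 10% rise in GDP per capita corresponds to a predicted increase in life expectancy of about 1.40%, against about 0.22% for a 10% rise in population, which shows how much larger the GDP effect is in practical terms even though both are statistically significant.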

Contribution of the Paper to the Social Sciences

According to Kim and Kim (2018), life expectancy is a social aspect that is defined as the average number of years that a person is expected to live. On the other hand, the variation in GDP per capita across countries and across continents unveils a picture of income inequality across countries and continents, which Kim and Kim (2018) defined as “inequality in the distribution of expected span of life-based on data from survival tables estimated using the Atkinson inequality index.” This paper aimed at unveiling the hidden truth about the variation in life expectancy across various regions as determined by factors such as GDP per capita and population. The findings in this paper are not only helpful in adding knowledge to the existing body of literature but would also help inform policy formulation regarding the improvement of life expectancy, especially for the least developed countries. It is, however, suggested that future research should focus on other aspects of socioeconomic status, such as education level, as suggested by Mirowsky and Ross (2000) in their paper on socioeconomic status and subjective life expectancy.