According to the World Health Organization, global life expectancy increased from 66 years in 2000 to 72 years in 2019. This 6-year gain corresponds to a relative improvement of about 9.09% (6/66), which is a substantial increase (Who.int, 2021). However, life expectancy still varies considerably across continents and across countries. This paper focuses on the variation in life expectancy across continents. The variation is driven by a number of factors, among the most significant of which are GDP per capita and population size. Thus, this paper also aims to establish the impact of GDP per capita and population size on life expectancy.
Although the World Health Organization found in their 2020 report that average life expectancy increased from 66 years to 72 years, Gutin and Hummer (2021), in their study of social inequality and the future of US life expectancy, found that racial and ethnic identity is a primary determinant of life expectancy in the United States. They found that African Americans had a lower average life expectancy than European Americans. The study concluded that, holding all other factors constant, racial and ethnic identity determines the average number of years a person can expect to live in the United States.
As discussed above, life expectancy is determined by a number of factors, which include but are not limited to GDP per capita and population. GDP per capita is a measure of average income per head, calculated annually or quarterly.
Like any other study, this study is guided by the following research objectives.
I. To establish the impact of GDP per capita on life expectancy
II. To establish the impact of population size on life expectancy
III. To establish the difference in average life expectancy and GDP per capita across continents
Every study seeks to answer the problem under consideration, and the research questions frame what the researcher has to answer. This research addresses the following research questions.
I. What is the impact of GDP per capita on life expectancy?
II. What is the impact of population size on life expectancy?
III. Is there a statistically significant difference in average life expectancy across continents?
IV. Is there a statistically significant difference in average GDP per capita across continents?
A hypothesis is a tentative proposition whose validity has yet to be established by statistical evidence. Testing the hypotheses helps answer the research questions under consideration. The following are the null and alternative hypotheses for this study.
H0: There is no statistically significant effect of GDP per capita on life expectancy at a 5% level of significance.
H1: There is a statistically significant effect of GDP per capita on life expectancy at a 5% level of significance.
The literature review section discusses the theoretical and empirical literature related to GDP per capita, population size and life expectancy.
Miladinov (2020) studied the relationship between socioeconomic development and life expectancy. The author collected data from the EU accession countries, namely Macedonia, Serbia, Bosnia and Herzegovina, Montenegro, and Albania, and used pooled time series data from 1990 to 2017. GDP per capita was the socioeconomic variable used as the independent variable, and life expectancy was the dependent variable. The study established that higher GDP per capita is associated with reduced infant mortality and a significant increase in life expectancy. Further, Miladinov (2020) identified income as one of the key determinants of life expectancy. On the other hand, Guo (2016) argues that the relationship between GDP per capita and life expectancy is not as simple to interpret as it might appear. Examining the relationship between GDP and life expectancy, the author found that when a socioeconomic indicator such as GDP per capita rises beyond the expected level, life expectancy also rises beyond the expected level. Therefore, although the linear association between the two is unclear, Guo (2016) found a significant and positive relationship between GDP per capita and life expectancy. Dayanikli, Gokare and Kincaid (2016) studied the effect of GDP per capita on national life expectancy. The study used cross-sectional data collected from various sources, with the primary purpose of establishing the correlation between GDP per capita, public health expenditure, average years of education and life expectancy. The study found a positive and significant correlation between GDP per capita and life expectancy, and identified individual income as a primary determinant of life expectancy. Gwatkin and Brandel (1982) focused on Third World countries to establish the effect of population size on life expectancy.
In their study, Gwatkin and Brandel (1982) found that a population explosion, especially in the least developed countries, raises the unemployment rate, which ultimately affects the overall economy. When the population grows beyond the available job opportunities, people become unable to meet daily needs such as medication and education. This inability to meet needs such as medication in turn reduces life expectancy, especially in the least developed countries. Mackenbach (2002) examined the effect on life expectancy of the income inequality brought about by rapid population growth. He found a strong negative correlation between income inequality and life expectancy. Further, population growth was found to be negatively correlated with GDP per capita. In other words, an increase in population size lowers GDP per capita, which in turn lowers life expectancy, especially in middle- and lower-income countries. Torres, Canudas-Romo and Oeppen (2019) found a negative correlation between population growth and life expectancy. The authors measured the distribution of the population between low-mortality and high-mortality groups, using data on changes in Scottish life expectancy between 1861 and 1910. The variables under consideration were changes in mortality and changes in population composition and density. The study established a negative correlation between changes in population composition and density and life expectancy. The findings from the empirical literature above show that both population and GDP per capita significantly affect life expectancy.
The figure above shows the functional relationship between the independent variables (GDP per capita and population) and the dependent variable (life expectancy). GDP per capita is theoretically positively related to life expectancy, since an increase in GDP per capita leads to an increase in the average number of years an individual is expected to live. On the other hand, population growth has a negative relationship with life expectancy. A change in population composition and density significantly affects life expectancy.
| Exogenous variable | Indicator | Measure |
|---|---|---|
| GDP per capita | GDP per capita | National GDP divided by the total population |
| Population | Population size | Total population size |
The methodology section discusses the approach and techniques used in this paper for data collection, sampling and analysis. The variables under consideration in this study (GDP per capita, population and life expectancy) are also defined in this section.
The study employed a quantitative research design in which descriptive statistical methods were used. According to Scirp.org (2015), “descriptive statistics help to measure the cause and effect relationship between the variables under consideration.” Thus, in this paper, we use descriptive statistics to evaluate the relationship between life expectancy, GDP per capita and population. Further, to establish the linear effect of GDP per capita and population on life expectancy, we used multiple linear regression analysis, in which the effects of GDP per capita and population were estimated.
The population of the study is the total set of elements from which the sample is drawn. In this paper, the population comprises data on GDP per capita, population size and life expectancy for countries across the continents.
Researchers employ different sampling techniques, including but not limited to simple random sampling, systematic sampling, convenience sampling and purposive sampling. This study used the purposive sampling technique, which is sometimes referred to as judgmental sampling. Purposive sampling is a technique in which the researcher uses their own judgment to select, from the population of interest, the sample data best suited to the purpose of the study. The sample data used in this study was downloaded from kaggle.com using the link: https://www.kaggle.com/tklimonova/gapminder-datacamp-2007.
The data used in this paper was quantitative in nature. The variables of interest, namely GDP per capita, population size and life expectancy, were all quantitative.
To establish the impact of GDP per capita and population size on life expectancy, both descriptive and inferential statistics were used. A multiple linear regression analysis was used to estimate the effect of GDP per capita and population size on life expectancy. Further, an independent t-test was used to determine whether there is a significant difference in average life expectancy across continents.
Linearity is one of the assumptions of the classical linear regression model. Under this assumption, there is a linear association between the dependent and independent variables. In this paper, the linear association between the dependent variable and each individual independent variable was tested. Taking logarithms of the variables under consideration helps bring about a linear relationship between the dependent and independent variables.
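As a hedged illustration of why a log transformation can help with linearity, the sketch below uses simulated data (not the study's data) in which the outcome depends on the logarithm of a right-skewed predictor. The names `x` and `y` and all parameter values are assumptions chosen only for illustration.

```r
# Simulated illustration only: none of these values come from the study data.
set.seed(123)

n <- 500
x <- rlnorm(n, meanlog = 8, sdlog = 1.2)   # hypothetical right-skewed predictor
y <- 30 + 5 * log(x) + rnorm(n, sd = 3)    # outcome that is linear in log(x)

# Pearson correlation measures the strength of a *linear* association.
cor_raw <- cor(x, y)        # predictor on its original scale
cor_log <- cor(log(x), y)   # predictor after the log transformation

round(c(raw = cor_raw, log = cor_log), 3)
```

In this simulated setting the correlation is stronger after the transformation, which is the pattern the scatter plots later in the paper are meant to check for the real variables.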
The analytic model for this study is:
$$\log(\text{life\_expectancy}) = \alpha + \beta_1 \log(\text{GDP\_per\_capita}) + \beta_2 \log(\text{population}) + \varepsilon$$
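Because both sides of the model above are in logarithms, the slope coefficients can be read as elasticities. The short derivation below is a sketch that uses only the symbols of the model and holds the other regressor fixed.

```latex
\beta_1
  = \frac{\partial \log(\text{life\_expectancy})}{\partial \log(\text{GDP\_per\_capita})}
  \approx \frac{\%\Delta\,\text{life\_expectancy}}{\%\Delta\,\text{GDP\_per\_capita}},
\qquad
\beta_2
  = \frac{\partial \log(\text{life\_expectancy})}{\partial \log(\text{population})}
  \approx \frac{\%\Delta\,\text{life\_expectancy}}{\%\Delta\,\text{population}}
```

In words, a 1% increase in GDP per capita is associated with a change of approximately β1 percent in life expectancy when population is held constant, and likewise for β2 and population.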
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.6 ✓ dplyr 1.0.8
## ✓ tidyr 1.2.0 ✓ stringr 1.4.0
## ✓ readr 2.1.2 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(tidyr)
library(ggplot2)
library(ggthemes)
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
library(rmarkdown)
library(gapminder)
data("gapminder")
head(gapminder,10)
attach(gapminder)
summary(gapminder)
##         country        continent        year         lifeExp
##  Afghanistan:  12   Africa  :624   Min.   :1952   Min.   :23.60
##  Albania    :  12   Americas:300   1st Qu.:1966   1st Qu.:48.20
##  Algeria    :  12   Asia    :396   Median :1980   Median :60.71
##  Angola     :  12   Europe  :360   Mean   :1980   Mean   :59.47
##  Argentina  :  12   Oceania : 24   3rd Qu.:1993   3rd Qu.:70.85
##  Australia  :  12                  Max.   :2007   Max.   :82.60
##  (Other)    :1632
##       pop              gdpPercap
##  Min.   :6.001e+04   Min.   :   241.2
##  1st Qu.:2.794e+06   1st Qu.:  1202.1
##  Median :7.024e+06   Median :  3531.8
##  Mean   :2.960e+07   Mean   :  7215.3
##  3rd Qu.:1.959e+07   3rd Qu.:  9325.5
##  Max.   :1.319e+09   Max.   :113523.1
##
In the summary output above, life expectancy has a mean of 59.47. In other words, the average number of years a person is expected to live, based on the sample collected, is 59.47 years. Life expectancy has a minimum of 23.60 and a maximum of 82.60 years. GDP per capita has a mean of $7215.3, with a minimum of $241.2 and a maximum of $113523.1. Population size has a mean of 2.960E+07, with a minimum of 6.001E+04 and a maximum of 1.319E+09.
shapiro.test(lifeExp)
##
## Shapiro-Wilk normality test
##
## data: lifeExp
## W = 0.95248, p-value < 2.2e-16
shapiro.test(gdpPercap)
##
## Shapiro-Wilk normality test
##
## data: gdpPercap
## W = 0.6522, p-value < 2.2e-16
shapiro.test(pop)
##
## Shapiro-Wilk normality test
##
## data: pop
## W = 0.23598, p-value < 2.2e-16
Consider the null and alternative hypotheses below for the normality test.
H0: The observations in the data set are normally distributed
H1: The observations in the data set are not normally distributed
The null hypothesis is rejected if the p-value is less than 0.05. From the results above, all three p-values are below 2.2e-16, so we reject the null hypothesis for each variable and conclude that the observations for life expectancy, GDP per capita and population are not normally distributed.
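The decision rule above can be sketched on simulated data. The example below is only an illustration under stated assumptions: the sample `skewed` is hypothetical right-skewed data rather than any gapminder variable, and `alpha` is set to the 5% level used in the text.

```r
# Simulated illustration only: the values below are not the gapminder data.
set.seed(42)

skewed <- rlnorm(500, meanlog = 8, sdlog = 1)  # hypothetical right-skewed sample

# H0: the sample comes from a normal distribution.
raw_test <- shapiro.test(skewed)       # test on the original scale
log_test <- shapiro.test(log(skewed))  # test after a log transformation

# Decision rule at the 5% level: reject H0 when the p-value is below alpha.
alpha <- 0.05
c(raw_rejected = raw_test$p.value < alpha,
  log_rejected = log_test$p.value < alpha)
```

With large samples such as the 1,704 observations used here, the Shapiro-Wilk test can flag even small departures from normality, so histograms remain a useful complement to the test.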
hist(gdpPercap,breaks = 20, main="Histogram for the GDP Per capita",xlab="GDP Per Capita", ylab="Frequency")
The histogram above shows that the observations for GDP per capita are indeed not normally distributed: they are skewed to the right. However, the log-transformed GDP per capita gives a histogram that is closer to a normal distribution. Consider the graph below.
hist(log(gdpPercap),breaks = 20, main="Histogram for the log of GDP Per capita",xlab="log of GDP Per Capita", ylab="Frequency")
The histogram above shows that the observations for the log-transformed GDP
per capita are approximately normally distributed.
hist(pop, main="Histogram for the population size",xlab="Population Size", ylab="Frequency")
Similarly, the graph above shows that population size is not normally
distributed. Consider the histogram below for the log-transformed
population.
hist(log(pop), main="Histogram for the log of population size",xlab="log of Population Size", ylab="Frequency")
The histogram above shows an approximately normal distribution after
transforming the variable “pop” into its log equivalent.
hist(lifeExp, main="Histogram Showing the Distribution of life expectancy",xlab="Life Expectancy", ylab="Frequency")
The histogram above shows a slightly left-skewed distribution. In this
case, log-transforming the variable (life expectancy) does not make the
observations normally distributed. For example, consider
the graph below.
hist(log(lifeExp), main="Histogram Showing the Distribution of the log of life expectancy",xlab="log of Life Expectancy", ylab="Frequency")
The graph above shows that log-transforming the variable “lifeExp”
does not bring about normality
of the data.
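The visual judgments of skewness made from the histograms above can be complemented with a numerical measure. The sketch below is an illustration on simulated data, not the study's data; `sample_skewness` is a hypothetical helper written for this example, not a function from any package used in the paper.

```r
# Simulated illustration only: not the study's data.
set.seed(2007)

# Moment-based sample skewness: close to 0 for a symmetric distribution,
# positive for a right-skewed one, negative for a left-skewed one.
sample_skewness <- function(v) {
  centred <- v - mean(v)
  mean(centred^3) / mean(centred^2)^1.5
}

income_like <- rlnorm(1000, meanlog = 8, sdlog = 1)  # hypothetical right-skewed variable

skew_raw <- sample_skewness(income_like)       # skewness on the original scale
skew_log <- sample_skewness(log(income_like))  # skewness after the log transformation

round(c(raw = skew_raw, log = skew_log), 2)
```

The same helper could be applied to `gdpPercap`, `pop` and `lifeExp` to check whether the numbers agree with what the histograms suggest.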
An independent t-test is used to examine the difference in means between two groups. This paper examines the difference in average life expectancy between continents. For example, consider the output below:
DF <- gapminder %>%
  dplyr::select(continent, lifeExp) %>%
  filter(continent == "Americas" |
           continent == "Asia")
head(DF, 5)
tail(DF, 5)
t.test(data = DF, lifeExp ~ continent)
##
## Welch Two Sample t-test
##
## data: lifeExp by continent
## t = 5.713, df = 692.94, p-value = 1.648e-08
## alternative hypothesis: true difference in means between group Americas and group Asia is not equal to 0
## 95 percent confidence interval:
## 3.015071 6.172596
## sample estimates:
## mean in group Americas mean in group Asia
## 64.65874 60.06490
The p-value of 1.648e-08 is far below 0.05, which indicates that there is a statistically significant difference in average life expectancy between Asia and the Americas. These results are consistent with findings from Statista (2021), which indicate that average life expectancy in Asia is lower than in Latin America and the Caribbean. This is one example showing that life expectancy varies significantly across continents. Consider the following second output for the independent t-test, showing the difference in average life expectancy between Africa and Europe.
DF2 <- gapminder %>%
  dplyr::select(continent, lifeExp) %>%
  filter(continent == "Africa" |
           continent == "Europe")
head(DF2, 5)
tail(DF2, 5)
t.test(data = DF2, lifeExp ~ continent)
##
## Welch Two Sample t-test
##
## data: lifeExp by continent
## t = -49.551, df = 981.2, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group Africa and group Europe is not equal to 0
## 95 percent confidence interval:
## -23.95076 -22.12595
## sample estimates:
## mean in group Africa mean in group Europe
## 48.86533 71.90369
The p-value, which is below 2.2e-16, is less than 0.05. This indicates that average life expectancy in Africa is significantly different from average life expectancy in Europe. From the output, average life expectancy in Africa is approximately 48.87 years, while average life expectancy in Europe is approximately 71.90 years.
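The Welch statistic reported in the two outputs above can be reproduced by hand. The sketch below does so on simulated data; the two groups and their parameters are hypothetical and are not the continent samples, and the only claim is that the manual calculation matches what `t.test()` reports by default.

```r
# Simulated illustration only: group names and values are hypothetical.
set.seed(1)

group_a <- rnorm(300, mean = 65, sd = 9)   # hypothetical life expectancies, group A
group_b <- rnorm(400, mean = 60, sd = 12)  # hypothetical life expectancies, group B

# Welch statistic: difference in means over the unpooled standard error.
se_a2 <- var(group_a) / length(group_a)
se_b2 <- var(group_b) / length(group_b)
t_manual <- (mean(group_a) - mean(group_b)) / sqrt(se_a2 + se_b2)

# Welch-Satterthwaite approximation to the degrees of freedom.
df_manual <- (se_a2 + se_b2)^2 /
  (se_a2^2 / (length(group_a) - 1) + se_b2^2 / (length(group_b) - 1))

# Two-sided p-value from the t distribution.
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

# t.test() uses the Welch version by default (var.equal = FALSE).
welch <- t.test(group_a, group_b)
all.equal(unname(welch$statistic), t_manual)
all.equal(unname(welch$parameter), df_manual)
```

The fractional degrees of freedom in the outputs above (692.94 and 981.2) come from this approximation, which does not assume equal variances in the two groups.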
Therefore, the results in the two scenarios above provide statistical evidence that average life expectancy varies across continents. Consider the following summary table showing the average life expectancy for each continent.
AVG_LIFE_EXP <- gapminder %>%
  dplyr::select(continent, lifeExp) %>%
  group_by(continent) %>%
  summarise(Average_life = mean(lifeExp))
AVG_LIFE_EXP
The results above show that Africa has the lowest average life expectancy and Oceania the highest. It is also argued that regions with higher average GDP per capita experience higher average life expectancy. According to Fatima (2019), who examined the relationship between GDP per capita, life expectancy and GDP growth rate, a higher average life expectancy is associated with a higher GDP per capita. Consider the output below showing the average GDP per capita across continents.
AVG_GDP_PER_CAP <- gapminder %>%
  dplyr::select(continent, gdpPercap) %>%
  group_by(continent) %>%
  summarise(Average_gdppercap = mean(gdpPercap))
AVG_GDP_PER_CAP
From the results above, Oceania has the highest average GDP per capita and Africa the lowest. These results match the results for average life expectancy across continents. From the two outputs, the continent with the highest average GDP per capita has the highest average life expectancy, and the continent with the lowest average GDP per capita has the lowest average life expectancy. This pattern is consistent with GDP per capita being a key determinant of life expectancy.
Before running the multiple linear regression analysis, it is appropriate to check whether each independent variable has a linear association with the dependent variable. Consider the scatter plots below.
gapminder %>%
  filter(gdpPercap < 50000) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, col = continent)) +
  geom_point(alpha = 0.5) +
  xlab("GDP Per Capita") +
  ylab("Life Expectancy") +
  labs(title = "A scatter plot showing the linear association between GDP Per capita and Life Expectancy")
gapminder %>%
  filter(gdpPercap < 50000) %>%
  ggplot(aes(x = log(gdpPercap), y = lifeExp, col = continent)) +
  geom_point(alpha = 0.5) +
  xlab("log of GDP Per Capita") +
  ylab("Life Expectancy") +
  labs(title = "A scatter plot showing the linear association between the log of GDP Per capita and Life Expectancy")
The graph above shows a positive linear association between the log of
GDP per capita and life expectancy.
ggplot(data = gapminder, aes(x = pop, y = lifeExp)) +
  geom_point(alpha = 0.7) +
  xlab("Population size") +
  ylab("Life Expectancy") +
  labs(title = "A scatter plot showing the linear association between population size and Life Expectancy")
ggplot(data = gapminder, aes(x = log(pop), y = lifeExp)) +
  geom_point(alpha = 0.7) +
  xlab("Log of Population size") +
  ylab("Life Expectancy") +
  labs(title = "A scatter plot showing the linear association between the log of population size and Life Expectancy")
Multiple linear regression analysis is a statistical approach in which the effect of more than one independent variable on a dependent variable is assessed. In this paper, we assessed the impact of GDP per capita and population size on life expectancy. Consider the following results.
model <- lm(log(lifeExp)~log(gdpPercap)+log(pop), data=gapminder)
summary(model)
##
## Call:
## lm(formula = log(lifeExp) ~ log(gdpPercap) + log(pop), data = gapminder)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.67318 -0.07028  0.01482  0.09189  0.35485
##
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)    2.501815   0.039451   63.42   <2e-16 ***
## log(gdpPercap) 0.145427   0.002726   53.35   <2e-16 ***
## log(pop)       0.023564   0.002108   11.18   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1395 on 1701 degrees of freedom
## Multiple R-squared: 0.6397, Adjusted R-squared: 0.6392
## F-statistic: 1510 on 2 and 1701 DF, p-value: < 2.2e-16
The results above provide a response to our research questions and help in deciding whether or not to reject the null hypothesis. According to Schneider (2014), the null hypothesis is rejected if the p-value associated with the regression coefficient is less than 0.05. Because both the dependent and independent variables are in logarithms, the coefficients are elasticities: a 1% increase in GDP per capita is associated with an increase of approximately 0.145% in life expectancy, holding population size constant. The p-value of less than 2e-16 indicates that the effect of GDP per capita on life expectancy is statistically significant at the 1% level of significance. Similarly, a 1% increase in population size is associated with an increase of approximately 0.024% in life expectancy, holding GDP per capita constant, and this effect is also statistically significant at the 1% level (p < 2e-16). The positive sign is contrary to the negative relationship suggested by the literature reviewed earlier. According to Fang and Yang (2019), the significance level is the probability of committing a type I error, that is, the probability of rejecting a true null hypothesis. Therefore, this test was conducted with a 1% chance of committing a type I error. Since the p-values in both cases are less than 0.01, both GDP per capita and population size have a statistically significant effect on life expectancy at the 1% level of significance, and therefore also at the 5% level stated in the hypotheses. The regression gave an adjusted R-squared of 0.6392, which shows that 63.92% of the variation in the log of life expectancy is explained by the logs of GDP per capita and population size. The overall F-test, F(2, 1701) = 1510, p < 2.2e-16, is also significant, implying that the two predictors jointly explain a significant share of the variation in life expectancy.
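To make the size of the estimated effects concrete, the sketch below plugs the coefficients from the regression output above into the log-log relationship and also recovers the F statistic from R-squared. The 10% increase (`growth`) is a hypothetical scenario chosen only for illustration; the other numbers are copied from the output.

```r
# Values copied from the regression output above.
b_gdp <- 0.145427    # coefficient on log(gdpPercap)
b_pop <- 0.023564    # coefficient on log(pop)
r2    <- 0.6397      # multiple R-squared
k     <- 2           # number of regressors
df_residual <- 1701  # residual degrees of freedom

# In a log-log model, multiplying a regressor by (1 + g) multiplies the
# predicted outcome by (1 + g)^beta, holding the other regressor fixed.
growth <- 0.10  # hypothetical 10% increase, chosen only for illustration
effect_gdp <- (1 + growth)^b_gdp - 1
effect_pop <- (1 + growth)^b_pop - 1
round(100 * c(gdp = effect_gdp, pop = effect_pop), 2)  # percent changes

# The overall F statistic can be recovered from R-squared.
f_stat <- (r2 / k) / ((1 - r2) / df_residual)
round(f_stat)
```

Under this model, a 10% rise in GDP per capita corresponds to a rise of roughly 1.4% in predicted life expectancy, and the recovered F statistic agrees with the reported value of 1510 up to rounding of R-squared.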