Human endeavors have made more progress in technology and science in our time than at any other point in history. The twenty-first century has already seen the rapid development and deployment of COVID-19 vaccines and the rise of virtual reality commerce in the metaverse, where avatars of our likeness meet in real time with business associates across the world. Despite these advancements, much of humanity has been left behind. The most basic human needs are not being met: clean water, sanitary living conditions, nutritious food, health care, and even COVID-19 vaccinations are not accessible to most of the world.1
Most of us are so unaware of what occurs within our own neighborhoods that a lack of knowledge of the world beyond our own zip codes is commonplace. Facts appear murky and vague.
For this project, I therefore draw from the Gapminder database of global macroeconomic and public health indicators to explore and analyse the progression of 142 countries on 5 continents, from 1952 to 2007, in 5-year increments. Specifically, I will provide an overview of the data set and then focus on two key indicators of a country’s wealth: life expectancy and GDP per capita (gross domestic product per capita).
Data for this project originates from the Gapminder Foundation, a Swedish-based non-profit whose mission “is to fight devastating ignorance with a fact-based worldview everyone can understand.”2 This dataset is a subset of the Gapminder database and is imported from the gapminder package in R. The data was collected over a 55-year period from 1952 to 2007 in 5-year increments. It comprises key macroeconomic and public health indicators for 142 countries on 5 continents: specifically, GDP per capita (gross domestic product per capita), life expectancy from birth, and population.
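# Importing the Gapminder data and the packages used in this analysis
library(gapminder)
library(dplyr)   # %>%, filter(), mutate(), rename(), ntile(), top_n()
library(tidyr)   # spread()
library(skimr)   # skim(), focus()
data(gapminder)
View(gapminder)  # opens the data viewer in an interactive session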
To learn about the dataset we take a snapshot of it: the column names, the data dimensions, and a statistical summary comprising the minimum, maximum, median, mean and quartiles.
# summary(), colnames() and dim() give descriptive statistics, column names and dimensions of the dataset
summary(gapminder)
## country continent year lifeExp
## Afghanistan: 12 Africa :624 Min. :1952 Min. :23.60
## Albania : 12 Americas:300 1st Qu.:1966 1st Qu.:48.20
## Algeria : 12 Asia :396 Median :1980 Median :60.71
## Angola : 12 Europe :360 Mean :1980 Mean :59.47
## Argentina : 12 Oceania : 24 3rd Qu.:1993 3rd Qu.:70.85
## Australia : 12 Max. :2007 Max. :82.60
## (Other) :1632
## pop gdpPercap
## Min. :6.001e+04 Min. : 241.2
## 1st Qu.:2.794e+06 1st Qu.: 1202.1
## Median :7.024e+06 Median : 3531.8
## Mean :2.960e+07 Mean : 7215.3
## 3rd Qu.:1.959e+07 3rd Qu.: 9325.5
## Max. :1.319e+09 Max. :113523.1
##
colnames(gapminder)
## [1] "country" "continent" "year" "lifeExp" "pop" "gdpPercap"
dim(gapminder)
## [1] 1704 6
The dataset contains 1704 observations and 6 variables.
Since this is a real-world dataset it may suffer from the problem of missing data. I use the skim and focus functions to ascertain if this is the case.
# skim() provides a snapshot of the dataset, such as means, missing values, dimensions and frequency of column types.
gapminder %>%
skim() %>%
focus(n_missing, numeric.mean)
| Name | Piped data |
| Number of rows | 1704 |
| Number of columns | 6 |
| _______________________ | |
| Column type frequency: | |
| factor | 2 |
| numeric | 4 |
| ________________________ | |
| Group variables | None |
Variable type: factor
| skim_variable | n_missing |
|---|---|
| country | 0 |
| continent | 0 |
Variable type: numeric
| skim_variable | n_missing | mean |
|---|---|---|
| year | 0 | 1979.50 |
| lifeExp | 0 | 59.47 |
| pop | 0 | 29601212.32 |
| gdpPercap | 0 | 7215.33 |
No factor or numeric variable has missing values.
For easier reading and more insight into the data, it is rendered as an interactive table.
# DT::datatable() renders the data as an interactive table
library(DT)
datatable(data = gapminder,
rownames = FALSE,
filter ="top",
options = list(autoWidth = TRUE))
To gain further insight I look at the first five rows, and last five rows of the dataset.
#head displays the first five rows of the data set
head(gapminder,5)
## # A tibble: 5 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
# tail displays the last five rows of data set
tail(gapminder,5)
## # A tibble: 5 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Zimbabwe Africa 1987 62.4 9216418 706.
## 2 Zimbabwe Africa 1992 60.4 10704340 693.
## 3 Zimbabwe Africa 1997 46.8 11404948 792.
## 4 Zimbabwe Africa 2002 40.0 11926563 672.
## 5 Zimbabwe Africa 2007 43.5 12311143 470.
We see that each observation of Gapminder represents a unique pair of a country and a year, in 5-year increments. For instance, the first observation represents country statistics for Afghanistan in the year 1952, the second for Afghanistan in 1957, and so on.
For each combination of a country and a year, the dataset also contains variables describing the country’s demographics over time: the continent (for example, Asia), the life expectancy in years, the population and the GDP per capita. GDP per capita measures a country’s total economic output per person. It is calculated by dividing the GDP of a country by its total population, and it is a common measure of a country’s wealth.10
Each variable has one consistent data type. Life expectancy, population, year and GDP per capita are numeric, whereas continent and country are factors (categorical).
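As a quick check of the GDP per capita definition, the division can be worked through for a single observation. The sketch below uses Afghanistan in 1952, with the population and total GDP figures copied from the outputs printed in this report; the variable names are illustrative and not part of the dataset. Because the totalGDP column is itself derived as pop * gdpPercap, this only inverts that multiplication.

```r
# Values for Afghanistan, 1952, copied from the outputs in this report
total_gdp  <- 6567086330   # totalGDP column (derived later as pop * gdpPercap)
population <- 8425333      # pop column

# GDP per capita = total GDP / total population
gdp_per_capita <- total_gdp / population
round(gdp_per_capita, 1)
```

The result is about 779, which agrees with the gdpPercap value shown for that row by head().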
Tidying the data
Here, I rename the variables year, pop, and gdpPercap for better clarity and insight.
# Renaming variables for more clarity and insight
gapminder_rename <- gapminder %>%
  rename(five_year_increment = year,
         total_country_population = pop,
         gdpPerCap_US_Dollars = gdpPercap)
I check again for any missing data within the renamed dataset.
#checking for missing data
gapminder_rename %>%
is.na() %>%
sum()
## [1] 0
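The pipeline above returns a single total, which is enough to confirm that nothing is missing. If the total were not zero, a per-column count would show where the gaps are. A minimal sketch, using a small made-up data frame (called toy here, with one NA inserted on purpose) rather than the Gapminder data:

```r
# Hypothetical toy data with one deliberately missing value
toy <- data.frame(
  country = c("A", "B", "C"),
  lifeExp = c(70.1, NA, 65.3),
  pop     = c(100L, 200L, 300L)
)

sum(is.na(toy))              # total number of missing values
colSums(is.na(toy))          # missing values per column
toy[!complete.cases(toy), ]  # rows containing at least one missing value
```

The same three calls can be applied to gapminder_rename in place of toy.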
Now that the Gapminder data has been imported, cleaned and tidied, it can be analysed. I begin with general questions of the data, such as the median life expectancy of each continent and the median GDP per capita of each continent from 1952 to 2007.
The variables I focus on are life expectancy (lifeExp) and GDP per capita (gdpPercap).
The aggregate function tells us the median life expectancy of each continent from 1952-2007.
# lifeExp disaggregated with continent for median lifeExp of continents
aggregate(lifeExp ~ continent, gapminder_rename, median)
## continent lifeExp
## 1 Africa 47.7920
## 2 Americas 67.0480
## 3 Asia 61.7915
## 4 Europe 72.2410
## 5 Oceania 73.6650
We see that the African continent has the shortest median life expectancy at about 47.8 years, whereas Oceania has the longest at about 73.7 years.
I next want to determine the top 20 countries above the 80th percentile for life expectancy in 2002.
# Using ntile(), mutate() and top_n() to determine the top 20 countries above the 80th percentile in 2002.
gapminder %>%
filter(year==2002) %>%
mutate(percent_rank=ntile(lifeExp,100)) %>%
filter(percent_rank>80) %>%
arrange(desc(percent_rank)) %>%
top_n(20,wt=percent_rank) %>%
select(continent,country,lifeExp,percent_rank)
## # A tibble: 20 x 4
## continent country lifeExp percent_rank
## <fct> <fct> <dbl> <int>
## 1 Asia Japan 82 100
## 2 Asia Hong Kong, China 81.5 99
## 3 Europe Switzerland 80.6 98
## 4 Europe Iceland 80.5 97
## 5 Oceania Australia 80.4 96
## 6 Europe Italy 80.2 95
## 7 Europe Sweden 80.0 94
## 8 Europe Spain 79.8 93
## 9 Americas Canada 79.8 92
## 10 Asia Israel 79.7 91
## 11 Europe France 79.6 90
## 12 Oceania New Zealand 79.1 89
## 13 Europe Norway 79.0 88
## 14 Europe Austria 79.0 87
## 15 Asia Singapore 78.8 86
## 16 Europe Germany 78.7 85
## 17 Europe Netherlands 78.5 84
## 18 Europe United Kingdom 78.5 83
## 19 Europe Finland 78.4 82
## 20 Europe Belgium 78.3 81
This analysis reveals that the countries above the 80th percentile in 2002 were primarily located in Europe, Asia and Oceania. Only one country from the Americas, Canada, appeared.
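Note that the column named percent_rank in this analysis is computed with dplyr's ntile(), which assigns rows to equally sized buckets, rather than with dplyr's percent_rank(), which rescales ranks to the range 0 to 1. The sketch below contrasts the two functions on five 2002 life expectancy values copied from the tables in this report; the vector name x is illustrative.

```r
library(dplyr)

# Five 2002 life expectancy values copied from the tables in this report
x <- c(39.2, 43.3, 46.6, 78.3, 82.0)

ntile(x, 5)        # bucket number for each value: 1 2 3 4 5
percent_rank(x)    # (rank - 1) / (n - 1): 0.00 0.25 0.50 0.75 1.00
```

With 142 countries and 100 buckets, some buckets hold two countries and others hold one, which is why the lowest ranks in the outputs are shared by two countries while the highest ranks are not.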
For comparison, I determine the countries below the 20th percentile for life expectancy during the same year.
# Using ntile(), mutate() and top_n() to determine the bottom 20 countries below the 20th percentile in 2002.
gapminder %>%
filter(year==2002) %>%
mutate(percent_rank=ntile(lifeExp,100)) %>%
filter(percent_rank<20) %>%
arrange(percent_rank) %>%
top_n(-20,wt=percent_rank) %>%
select(continent,country,lifeExp,percent_rank)
## # A tibble: 20 x 4
## continent country lifeExp percent_rank
## <fct> <fct> <dbl> <int>
## 1 Africa Zambia 39.2 1
## 2 Africa Zimbabwe 40.0 1
## 3 Africa Angola 41.0 2
## 4 Africa Sierra Leone 41.0 2
## 5 Asia Afghanistan 42.1 3
## 6 Africa Central African Republic 43.3 3
## 7 Africa Liberia 43.8 4
## 8 Africa Rwanda 43.4 4
## 9 Africa Mozambique 44.0 5
## 10 Africa Swaziland 43.9 5
## 11 Africa Congo, Dem. Rep. 45.0 6
## 12 Africa Lesotho 44.6 6
## 13 Africa Guinea-Bissau 45.5 7
## 14 Africa Malawi 45.0 7
## 15 Africa Nigeria 46.6 8
## 16 Africa Somalia 45.9 8
## 17 Africa Botswana 46.6 9
## 18 Africa Cote d'Ivoire 46.8 9
## 19 Africa Burundi 47.4 10
## 20 Africa Uganda 47.8 10
This analysis reveals that the 20 countries with the lowest life expectancy in 2002 were located almost entirely in Africa. Afghanistan, located in Asia, was the only exception. The lower life expectancy in these countries was primarily attributed to wars.12
The aggregate function shows us the median GDP per capita (US $) of each continent from 1952 to 2007.
# gdpPercap disaggregated with continent for median gdpPercap of continents
aggregate(gdpPerCap_US_Dollars~ continent, gapminder_rename,median)
## continent gdpPerCap_US_Dollars
## 1 Africa 1192.138
## 2 Americas 5465.510
## 3 Asia 2646.787
## 4 Europe 12081.749
## 5 Oceania 17983.304
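The two continent-level summaries can be compared directly. As a coarse check, the sketch below computes a Spearman rank correlation between the median life expectancy and the median GDP per capita of the five continents, with the values copied from the two aggregate() outputs in this report. With only five points this is an illustration of the idea and not a substitute for a country-level analysis; the data frame name continent_medians is illustrative.

```r
# Continent-level medians copied from the two aggregate() outputs in this report
continent_medians <- data.frame(
  continent = c("Africa", "Americas", "Asia", "Europe", "Oceania"),
  lifeExp   = c(47.7920, 67.0480, 61.7915, 72.2410, 73.6650),
  gdpPercap = c(1192.138, 5465.510, 2646.787, 12081.749, 17983.304)
)

# Spearman rank correlation: do the two orderings of continents agree?
rho <- cor(continent_medians$lifeExp, continent_medians$gdpPercap,
           method = "spearman")
rho

# Continents ordered from lowest to highest median GDP per capita
continent_medians[order(continent_medians$gdpPercap), ]
```

Here rho equals 1 because both variables order the continents identically: Africa, Asia, Americas, Europe, Oceania.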
We see that the African continent has the lowest median GDP per capita at $1,192, whereas Oceania has the highest at $17,983. The continent of Africa has consistently trended lower for median GDP per capita. This brings up another question of the data. Since the continents with a higher median GDP per capita also have a longer median life expectancy, does total GDP also relate to longer life expectancy? To tease this insight from the data we can derive a new column, totalGDP, using the mutate function.
# filter() and mutate() used to derive a new column, totalGDP
gapminder %>%
filter(year>1950) %>%
mutate(totalGDP=pop*gdpPercap)
## # A tibble: 1,704 x 7
## country continent year lifeExp pop gdpPercap totalGDP
## <fct> <fct> <int> <dbl> <int> <dbl> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779. 6567086330.
## 2 Afghanistan Asia 1957 30.3 9240934 821. 7585448670.
## 3 Afghanistan Asia 1962 32.0 10267083 853. 8758855797.
## 4 Afghanistan Asia 1967 34.0 11537966 836. 9648014150.
## 5 Afghanistan Asia 1972 36.1 13079460 740. 9678553274.
## 6 Afghanistan Asia 1977 38.4 14880372 786. 11697659231.
## 7 Afghanistan Asia 1982 39.9 12881816 978. 12598563401.
## 8 Afghanistan Asia 1987 40.8 13867957 852. 11820990309.
## 9 Afghanistan Asia 1992 41.7 16317921 649. 10595901589.
## 10 Afghanistan Asia 1997 41.8 22227415 635. 14121995875.
## # ... with 1,694 more rows
To further test my assumption that life expectancy is tied to GDP, I will calculate the percent_rank of each observation using the lifeExp variable. The rows are then sorted by GDP per capita in descending order, so that the life expectancy percent_rank of the highest-GDP observations appears first.
#Testing assumption that lifeExp is related to GDP_PerCapita.
gapminder %>%
filter(year>1950) %>%
mutate(percent_rank=ntile(lifeExp,100)) %>%
arrange(desc(gdpPercap))
## # A tibble: 1,704 x 7
## country continent year lifeExp pop gdpPercap percent_rank
## <fct> <fct> <int> <dbl> <int> <dbl> <int>
## 1 Kuwait Asia 1957 58.0 212846 113523. 45
## 2 Kuwait Asia 1972 67.7 841934 109348. 64
## 3 Kuwait Asia 1952 55.6 160000 108382. 40
## 4 Kuwait Asia 1962 60.5 358266 95458. 50
## 5 Kuwait Asia 1967 64.6 575003 80895. 57
## 6 Kuwait Asia 1977 69.3 1140357 59265. 68
## 7 Norway Europe 2007 80.2 4627926 49357. 99
## 8 Kuwait Asia 2007 77.6 2505559 47307. 96
## 9 Singapore Asia 2007 80.0 4553009 47143. 99
## 10 Norway Europe 2002 79.0 4535591 44684. 98
## # ... with 1,694 more rows
The first 10 observations of this data frame show the highest GDP per capita values together with their life expectancy percent_rank in the rightmost column. Kuwait dominates the list. Its life expectancy rose from the 40th percentile in 1952 to the 96th percentile in 2007, although its GDP per capita in this dataset was already very high in 1952 and was lower in 2007 than in the 1950s. I conducted external research on Kuwait for that time frame and learned that its high GDP per capita accompanied nearly forty years of modernization, from 1946 to 1982, following the discovery of commercial oil reserves.5
Before I can verify any claims that GDP per capita is correlated with life expectancy, I must also look at the percent_rank of countries with the lowest GDP per capita.
# percent_rank of observations sorted by ascending GDP per capita, all years from 1952
gapminder %>%
filter(year>1950) %>%
mutate(percent_rank=ntile(lifeExp,100)) %>%
arrange(gdpPercap)
## # A tibble: 1,704 x 7
## country continent year lifeExp pop gdpPercap percent_rank
## <fct> <fct> <int> <dbl> <int> <dbl> <int>
## 1 Congo, Dem. Rep. Africa 2002 45.0 55379852 241. 18
## 2 Congo, Dem. Rep. Africa 2007 46.5 64606759 278. 22
## 3 Lesotho Africa 1952 42.1 748747 299. 12
## 4 Guinea-Bissau Africa 1952 32.5 580653 300. 1
## 5 Congo, Dem. Rep. Africa 1997 42.6 47798986 312. 13
## 6 Eritrea Africa 1952 35.9 1438760 329. 3
## 7 Myanmar Asia 1952 36.3 20092996 331 3
## 8 Lesotho Africa 1957 45.0 813338 336. 18
## 9 Burundi Africa 1952 39.0 2445618 339. 6
## 10 Eritrea Africa 1957 38.0 1542611 344. 5
## # ... with 1,694 more rows
#ascending percent_rank of countries with low GDP_per capita in 2007
gapminder %>%
filter(year==2007) %>%
mutate(percent_rank=ntile(lifeExp,100)) %>%
arrange(gdpPercap)
## # A tibble: 142 x 7
## country continent year lifeExp pop gdpPercap percent_rank
## <fct> <fct> <int> <dbl> <int> <dbl> <int>
## 1 Congo, Dem. Rep. Africa 2007 46.5 6.46e7 278. 7
## 2 Liberia Africa 2007 45.7 3.19e6 415. 5
## 3 Burundi Africa 2007 49.6 8.39e6 430. 10
## 4 Zimbabwe Africa 2007 43.5 1.23e7 470. 4
## 5 Guinea-Bissau Africa 2007 46.4 1.47e6 579. 6
## 6 Niger Africa 2007 56.9 1.29e7 620. 18
## 7 Eritrea Africa 2007 58.0 4.91e6 641. 19
## 8 Ethiopia Africa 2007 52.9 7.65e7 691. 14
## 9 Central African Repub~ Africa 2007 44.7 4.37e6 706. 5
## 10 Gambia Africa 2007 59.4 1.69e6 753. 21
## # ... with 132 more rows
Looking at the countries with the lowest GDP per capita does reveal that they have a lower life expectancy percent_rank than the countries with the highest GDP per capita.
Here, a more granular analysis is conducted comparing two countries, South Africa and Ireland, focusing on the life expectancy of both. I chose South Africa because it has recently been in the headlines as the country where the Omicron variant of COVID-19 was first reported, and because it has one of the lowest vaccination rates in the world.3 By contrast, Ireland was chosen because it has one of the highest vaccination rates in the world.4
For the comparison between South Africa and Ireland, I analyze which country has the longer life expectancy from birth.
# Here I use the select, filter and group_by functions to compare the lifeExp of 2 countries.
gapminder_rename%>%
select(country,lifeExp)%>%
filter(country=="Ireland"|
country=="South Africa")%>%
group_by(country)%>%
summarise(Average_life=mean(lifeExp))
## # A tibble: 2 x 2
## country Average_life
## <fct> <dbl>
## 1 Ireland 73.0
## 2 South Africa 54.0
We observe that Ireland, at 73 years, has an average life expectancy nearly 20 years greater than South Africa, at 54 years. This difference raises a question: is it statistically significant? To make this determination I conduct a t-test.
A t-test is run to determine whether the difference in life expectancy between South Africa and Ireland is statistically significant.
# A new data frame gapminder_t is created to perform a t-test.
gapminder_t <- gapminder_rename %>%
  select(country, lifeExp) %>%        # select() narrows down the columns we work with
  filter(country == "South Africa" |
           country == "Ireland")      # filter() keeps only the two countries of interest
The t-test is used to compare average life expectancy between South Africa and Ireland.
# Apply a t-test to the newly created data frame.
t.test(data=gapminder_t,lifeExp~country)
##
## Welch Two Sample t-test
##
## data: lifeExp by country
## t = 10.067, df = 19.109, p-value = 4.466e-09
## alternative hypothesis: true difference in means between group Ireland and group South Africa is not equal to 0
## 95 percent confidence interval:
## 15.07022 22.97794
## sample estimates:
## mean in group Ireland mean in group South Africa
## 73.01725 53.99317
T-Test analysis
From this analysis we see that the average life expectancy from 1952 to 2007 is 54 years in South Africa and 73 years in Ireland, an observed difference of about 19 years. The null hypothesis is that there is no actual difference in the average life expectancy between these two countries, that is, that the true difference is 0. Our question is then: if the null hypothesis were true, what are the chances that our sample would show a difference at least as large as the one we observed?
That probability is the p-value, which is 4.466e-09 and extremely close to 0. Because a difference this large would be so unlikely under the null hypothesis, we can reject the assumption that the difference between the means of the countries is 0 and accept the only alternative, which is that the difference between the means is not 0 and that there is a real difference. To support this we have a 95 percent confidence interval for where the actual difference is likely to be, which is between about 15 and 23 years.
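The reported confidence interval and p-value can be reconstructed from the other numbers in the t-test output, which shows how the pieces fit together. This is a sketch that works from the rounded values printed in the output, so the results agree with it only up to rounding; the variable names are illustrative.

```r
# Values copied from the Welch t-test output
mean_ireland      <- 73.01725
mean_south_africa <- 53.99317
t_stat            <- 10.067
df                <- 19.109

# Observed difference in means
diff_means <- mean_ireland - mean_south_africa

# The t statistic is the difference divided by its standard error
std_error <- diff_means / t_stat

# 95% confidence interval: difference +/- critical t value * standard error
ci <- diff_means + c(-1, 1) * qt(0.975, df) * std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

round(diff_means, 2)
round(ci, 2)
p_value
```

The difference of about 19.02 years and the interval of roughly 15.07 to 22.98 years match the t.test() output.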
After conducting the t-test, I am interested in whether life expectancy in South Africa, despite being low, is on an upward trajectory. I use the years 1997, 2002 and 2007.
#filter for South African lifeExp over 10 years
gapminder_rename %>%
filter(
continent == "Africa",
country == "South Africa",
five_year_increment %in% c(1997, 2002, 2007)
)
## # A tibble: 3 x 6
## country continent five_year_increm~ lifeExp total_country_p~ gdpPerCap_US_Do~
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 South A~ Africa 1997 60.2 42835005 7479.
## 2 South A~ Africa 2002 53.4 44433622 7711.
## 3 South A~ Africa 2007 49.3 43997828 9270.
After running the chunk I am surprised to see that the life expectancy of South Africa is on a downward trajectory, going from 60 to 49 years in 10 years, despite an increase in population from 1997 to 2002. However, there is a decrease in population of 435,794 between the years 2002 and 2007. After conducting external research for that time period, I determined that the reduction in life expectancy was primarily attributed to the HIV/AIDS epidemic in South Africa.6
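The changes quoted here can be recomputed from the three rows of output. A minimal sketch, with the values copied from the South Africa table and an illustrative data frame name:

```r
# South Africa rows copied from the output in this report
south_africa <- data.frame(
  year    = c(1997, 2002, 2007),
  lifeExp = c(60.2, 53.4, 49.3),
  pop     = c(42835005, 44433622, 43997828)
)

# Change between consecutive observations
life_change <- diff(south_africa$lifeExp)  # change in life expectancy per 5-year step
pop_change  <- diff(south_africa$pop)      # change in population per 5-year step

life_change
pop_change
```

Both life expectancy changes are negative, while the population change is positive for 1997 to 2002 and negative (a drop of 435,794) for 2002 to 2007.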
I run the same analysis for Ireland to determine if life expectancy is going down from 1997-2007.
gapminder_rename %>%
filter(
continent == "Europe",
country == "Ireland",
five_year_increment %in% c(1997, 2002, 2007)
)
## # A tibble: 3 x 6
## country continent five_year_increm~ lifeExp total_country_po~ gdpPerCap_US_Do~
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Ireland Europe 1997 76.1 3667233 24522.
## 2 Ireland Europe 2002 77.8 3879155 34077.
## 3 Ireland Europe 2007 78.9 4109086 40676.
Life expectancy in Ireland is on an upward trajectory, from 76 years to almost 79 years during a 10-year window. Total population is also on an upward trend.
To gain insight into the data, it is helpful to represent the data visually for ease of comprehension and understanding of the dataset.
For the Gapminder dataset, I start with a boxplot. It effectively displays the distribution of the continuous variable lifeExp (Life Expectancy from Birth) against the continent variable.
# Boxplot showing the distribution of life expectancy, disaggregated by continent.
boxplot(lifeExp ~ continent, data = gapminder_rename,
        main = "Life Expectancy from Birth by Continent, 1952-2007",
        xlab = "Continent", ylab = "(lifeExp) Life Expectancy from Birth",
        col = "skyblue")
We observe that the median life expectancy for the African continent is approximately 48 years, compared to Oceania, which has a median life expectancy of approximately 74 years.
I now want to take a look at the relationship between two numeric variables using a scatter plot, with lifeExp as the dependent variable on the y-axis and gdpPerCap_US_Dollars as the independent variable on the x-axis. However, when I first plot the data, the relationship isn’t particularly linear and GDP per capita is very skewed; therefore, I try a log transformation.11 The log transformation is useful in that it reduces the skew of highly skewed data.
#plot of numeric variables lifeExp and gdpPerCap
plot(lifeExp~gdpPerCap_US_Dollars,data = gapminder_rename,col="green",main="Life Expectancy as a function of GDP-PerCapita")
#variables lifeExp and gdpPerCap plot with log transformation
plot(lifeExp~log(gdpPerCap_US_Dollars),data = gapminder_rename,col="green",main="Life Expectancy as a Function of GDP PerCapita - Log Scale")
This plot shows that countries with a higher GDP per capita seem to have a higher life expectancy. This makes sense, because countries with a higher GDP have more access to health care, as well as basic necessities such as clean drinking water, nutritious food and shelter, and less internal strife such as war.7
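The effect of the log transformation on skew can be illustrated with the gdpPercap quartiles from the summary() output alone. The sketch below compares the width of the upper half of the interquartile box to the width of the lower half, on the raw scale and on the log scale; a ratio near 1 indicates a roughly symmetric middle of the distribution. The variable names are illustrative.

```r
# Quartiles of gdpPercap copied from the summary() output
q1  <- 1202.1
med <- 3531.8
q3  <- 9325.5

# On the raw scale the upper half of the box is much wider than the lower half
raw_ratio <- (q3 - med) / (med - q1)

# On the log scale the two halves are nearly equal
log_ratio <- (log(q3) - log(med)) / (log(med) - log(q1))

round(c(raw = raw_ratio, log = log_ratio), 2)
```

The ratio drops from about 2.5 on the raw scale to about 0.9 on the log scale, which is why the log-scale scatter plot looks far less compressed towards the left.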
I now look at the 5 continents individually, to determine life expectancy as a function of GDP_perCapita and if there is a correlation.
library(ggplot2)
gapminder %>%
filter(gdpPercap<50000) %>% #filters what data is fed into graph
ggplot(aes(x= log(gdpPercap),y=lifeExp,col=continent,size=pop))+ # determines how the variables will be mapped onto the canvas
geom_point(alpha=0.1)+
geom_smooth(method=lm)+
facet_wrap(~continent)+ #produces 5 continents on a single canvas.
labs(title = "Life Expectancy vs.Gross Domestic Product Per Capita within 5 Continents")
## `geom_smooth()` using formula 'y ~ x'
This faceted graph represents life expectancy from birth as a function of GDP per capita across five separate panels, one for each continent. Europe and Oceania both have high life expectancies, with medians of about 72 and 74 years, followed by the Americas and Asia, which are high but more widely spread. Africa, unfortunately, has a median life expectancy of about 48 years. This outcome coincides with our findings in the previous boxplot.
I wanted to finish this paper on a positive note, so my last question of the Gapminder data is: which countries had population growth of at least five times (500 percent of) their original 1952 population by 2007?
The analysis focuses on the span of 55 years from 1952 to 2007. I define a new data frame called “population_growth”, starting with the select and filter functions, and then calculate the percentage change in population between the two selected years. I then arrange the output so that the countries with the largest population growth appear first, in descending order.
# Produces a subset data frame of countries whose population grew by at least 500 percent between 1952 and 2007
population_growth<-gapminder_rename %>%
select(country,five_year_increment,total_country_population,continent) %>%
filter(five_year_increment==1952|five_year_increment==2007) %>%
spread(key = five_year_increment,value = total_country_population) %>%
mutate(pop_increment_growth =`2007`-`1952`,
pop_increment_percent = round(pop_increment_growth/`1952`*100,2)) %>%
arrange(desc(pop_increment_percent)) %>%
filter(pop_increment_percent >= 500) %>%
mutate(country=factor(country,levels = country)) %>%
select(-pop_increment_growth)
I now have a new data_frame called population growth. Let’s take a look.
head(population_growth)
## # A tibble: 6 x 5
## country continent `1952` `2007` pop_increment_percent
## <fct> <fct> <int> <int> <dbl>
## 1 Kuwait Asia 160000 2505559 1466.
## 2 Jordan Asia 607914 6053193 896.
## 3 Djibouti Africa 63149 496374 686.
## 4 Saudi Arabia Asia 4005677 27601038 589.
## 5 Oman Asia 507833 3204897 531.
## 6 Cote d'Ivoire Africa 2977019 18013409 505.
tail(population_growth)
## # A tibble: 6 x 5
## country continent `1952` `2007` pop_increment_percent
## <fct> <fct> <int> <int> <dbl>
## 1 Kuwait Asia 160000 2505559 1466.
## 2 Jordan Asia 607914 6053193 896.
## 3 Djibouti Africa 63149 496374 686.
## 4 Saudi Arabia Asia 4005677 27601038 589.
## 5 Oman Asia 507833 3204897 531.
## 6 Cote d'Ivoire Africa 2977019 18013409 505.
glimpse(population_growth)
## Rows: 6
## Columns: 5
## $ country <fct> Kuwait, Jordan, Djibouti, Saudi Arabia, Oman, Co~
## $ continent <fct> Asia, Asia, Africa, Asia, Asia, Africa
## $ `1952` <int> 160000, 607914, 63149, 4005677, 507833, 2977019
## $ `2007` <int> 2505559, 6053193, 496374, 27601038, 3204897, 180~
## $ pop_increment_percent <dbl> 1465.97, 895.73, 686.04, 589.05, 531.09, 505.08
We see that from 1952 to 2007 Kuwait experienced population growth of over 1400 percent. This is attributed to increased immigration to the country.8
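The percentage figures can be verified by hand from the 1952 and 2007 populations. A minimal base R sketch for the first three countries, with the values copied from the population_growth output; the data frame name growth and the fold_change column are illustrative additions:

```r
# 1952 and 2007 populations copied from the population_growth output
growth <- data.frame(
  country  = c("Kuwait", "Jordan", "Djibouti"),
  pop_1952 = c(160000, 607914, 63149),
  pop_2007 = c(2505559, 6053193, 496374)
)

# Percentage increase relative to the 1952 population
growth$percent_increase <- round(
  (growth$pop_2007 - growth$pop_1952) / growth$pop_1952 * 100, 2
)

# Fold change: how many times larger the 2007 population is than the 1952 one
growth$fold_change <- round(growth$pop_2007 / growth$pop_1952, 2)

growth
```

The percent_increase values reproduce 1465.97, 895.73 and 686.04. A 500 percent increase corresponds to a fold change of 6, since the original population counts once on top of the growth.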
Here, I use a table to represent the countries that have experienced population growth five times greater than their original population.
#renaming data_frame for better clarity.
population_growth_rename<-population_growth %>%
rename(percent_increase=pop_increment_percent)
# Table showing the six countries in the data frame.
population_growth_rename %>%
knitr::kable(caption = "Table 1: Countries with a Five Fold Population Increase Between 1952 and 2007" )
| country | continent | 1952 | 2007 | percent_increase |
|---|---|---|---|---|
| Kuwait | Asia | 160000 | 2505559 | 1465.97 |
| Jordan | Asia | 607914 | 6053193 | 895.73 |
| Djibouti | Africa | 63149 | 496374 | 686.04 |
| Saudi Arabia | Asia | 4005677 | 27601038 | 589.05 |
| Oman | Asia | 507833 | 3204897 | 531.09 |
| Cote d’Ivoire | Africa | 2977019 | 18013409 | 505.08 |
Here, I use a lollipop plot to show which countries experienced population growth of five times (500%) their original population from 1952 to 2007. I chose the lollipop plot because it clearly and simply depicts the relationship between a numeric variable (percent_increase) and a categorical variable (country).
# Plot inspiration: see reference 9
ggplot(population_growth_rename,
aes(x=country,y=percent_increase,color = continent))+
geom_segment( aes(x=country,xend = country,y = 0,yend=percent_increase),
color = "skyblue")+
geom_point(size = 2)+
geom_text(aes(label=paste0(percent_increase,"%")),size = 2.5 ,nudge_y = 130)+
theme_light()+
coord_flip()+
theme(panel.grid.major.y=element_blank(),
panel.grid.minor =element_blank(),
panel.border = element_blank()
)+
labs(title = "Countries with a Five-Hundred Percent Population increase from 1952-2007")+
xlab("")+
ylab("Population Growth from 1952 to 2007 (Percentage)")
This graph clearly shows that Kuwait has had major population growth, as also noted in Table 1. It went from a population of 160,000 in 1952 to 2,505,559 in 2007, 55 years later. This represents an increase of almost 15 times the original population. As noted earlier, this increase is primarily attributed to immigration to the region. Immigrants represent approximately 70% of the population of Kuwait, while Kuwaiti citizens account for between 28% and 32%. As of 2011 the population growth rate was 1.986%. Expatriates are attracted to Kuwait primarily because of the employment opportunities.13
The Gapminder dataset is fascinating, and the mission of the Gapminder Foundation aligns with my sensibilities and values. As a new user of R, I found it compelling to explore this data while building my statistical and R skills. That said, all has not been rosy; this has been a steep learning curve.
Until I worked on this project, I didn’t appreciate the impact of the decisions I made regarding the variables, not just with the Gapminder data set but with all data sets. One could easily spend hours working on just a pair of variables in an attempt to tease insight from the data. As I reflect on this paper, I would be interested in analyzing more recent data from 2008 to 2021, to determine how key country wealth indicators such as GDP and GDP per capita are impacting countries in the twenty-first century and in the age of COVID-19.
In spite of the challenges, I enjoyed the process of working on this paper. Every decision I made had an impact on the outcomes of my analysis. I will continue to work on the Gapminder data as well as Gapminder-adjacent projects, such as life expectancy and health outcomes as a function of per capita income in the United States, so that I may continue to grow my skills in data science. It is a marathon, not a sprint.
The wealth of a country cannot be measured by its GDP alone. Just as important are the quality of life and health of its citizens, seen through the lens of longevity and GDP per capita.
This project examined key disparities in global wealth represented by the key indicators: life expectancy, total country population and GDP per capita. Although my analysis was an overview of the state of these indicators from 1952 to 2007, it provided much insight. For instance, we saw that South Africa and the African continent more broadly have among the lowest GDP per capita and life expectancy figures globally, whereas countries such as Kuwait have key indicators that have grown dramatically, due to the commercialization of oil and immigration. A few aspects of this data also surprised me. I was surprised to learn that life expectancy in South Africa decreased from 1997 to 2002, in part due to HIV/AIDS, and that Canada was the only country in the Americas in 2002 to have a life expectancy from birth above the 80th percentile.
Ultimately, this project taught me the importance of the data-driven analysis, methods, choices and techniques needed to gain insight into a dataset, and to never make assumptions or let bias seep in. Discovering the nuances of a dataset and tuning our approach accordingly is just as important as the data itself. It is more important to keep the goal in mind than to prioritize a routine method of achieving it.
1: https://www.kff.org/coronavirus-covid-19/issue-brief/tracking-global-covid-19-vaccine-equity/
2: https://www.gapminder.org/data/documentation/
3: https://www.bbc.com/news/59462647
4: https://www.bbc.com/news/world-europe-58522792
5: https://en.wikipedia.org/wiki/Kuwait
6: https://www.statista.com/statistics/1072248/life-expectancy-south-africa-historical/
7: https://www.visualcapitalist.com/mapped-gdp-per-capita-worldwide/
8: https://en.wikipedia.org/wiki/List_of_countries_by_population_growth_rate
9: https://www.r-graph-gallery.com/301-custom-lollipop-chart/
10: https://www.data.worldbank.org/indicator/NY.GDP.PCAP.CD
11: Grolemund, G., & Wickham, H. (2017). R for Data Science. O’Reilly Media.
12: https://en.wikipedia.org/wiki/List_of_ongoing_armed_conflicts