The Gapminder data set contains 1,704 observations covering 142 countries from Africa, the Americas, Asia, Europe, and Oceania. Life expectancy, population, and gross domestic product (GDP) per capita were recorded from 1952 to 2007 at 5-year intervals. The breadth of the Gapminder data set provides numerous avenues of investigation. For example, even if only two of its six variables are selected at a time, there are still 15 possible relationships to consider.
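These dimensions, and the count of 15 pairwise relationships, can be checked directly. This is a minimal sketch that assumes the `gapminder` and `dplyr` packages are installed. `n_distinct()` counts unique values, and base R's `combn()` enumerates every unordered pair of the six column names.

```r
library(gapminder)
library(dplyr)

# 1,704 rows = 142 countries x 12 survey years; 6 variables
dim(gapminder)                    # 1704 6
n_distinct(gapminder$country)     # 142
n_distinct(gapminder$year)        # 12 (1952, 1957, ..., 2007)

# Every unordered pair of the six variables: choose(6, 2) = 15
pairs <- combn(names(gapminder), 2)
ncol(pairs)                       # 15
```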
The current investigation focuses on economic outcomes within the continent of the Americas, so GDP per capita was selected as the primary dependent variable. GDP per capita is a country's total GDP divided by its mid-year population; in the gapminder package it is reported in inflation-adjusted US dollars. Because it is scaled by population, it can be compared across years, continents, and countries.
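Because GDP per capita is total GDP divided by population, total GDP can be recovered by multiplying the two columns back together. The sketch below assumes gapminder's `gdpPercap` and `pop` columns; `total_gdp` and `per_capita_again` are hypothetical column names used only for illustration.

```r
library(gapminder)
library(dplyr)

# gdpPercap = total GDP / population, so total GDP = gdpPercap * pop
gdp_check <- gapminder %>%
  mutate(total_gdp = gdpPercap * pop,          # hypothetical helper column
         per_capita_again = total_gdp / pop)   # divide population back out

# Dividing by population recovers the original per-capita figure
all.equal(gdp_check$per_capita_again, gdp_check$gdpPercap)
```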
To that end, three plots were created to investigate the relationship between GDP per capita and year (1952 to 2007). The first compares the GDP per capita of the Americas with that of the four other continents. The second considers the variability of GDP per capita growth from 1997 to 2007 across the different regions of the Americas. Finally, the third plot breaks down the region with the most variable GDP per capita growth, identifying which countries experienced high growth and which experienced low growth.
For each plot, the corresponding data transformation and plot code is given prior to the plot. The following packages were used to complete this data investigation:
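library(gapminder)  # Gapminder data set
library(ggplot2)    # plotting
library(scales)     # log 2 axis transforms and labels
library(dplyr)      # data manipulation
library(tidyverse)  # also attaches tidyr (reshaping) and forcats (fct_reorder)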
The first consideration was the global trend in GDP per capita over time. Specifically, how much does the GDP per capita of the Americas increase between 1952 and 2007, and how does it compare with that of the other continents?
To answer this, a line graph was created with year (1952 to 2007) on the x-axis and GDP per capita (on a log 2 scale) on the y-axis. Each continent is drawn as a separate line and is identified by its point symbol.
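On a log 2 axis, each step between labelled breaks marks a doubling, so equal vertical distances represent equal growth ratios rather than equal dollar amounts. A small base R sketch with illustrative values; the second part applies the same idea to the Americas' means of 4,079 (1952) and 11,003 (2007) reported in Table 1.

```r
# Illustrative values, each double the previous one
values <- c(1024, 2048, 4096, 8192)
log2(values)          # 10 11 12 13: evenly spaced on a log 2 axis

# Ratio between the Americas' 2007 and 1952 means
growth_ratio <- 11003 / 4079
growth_ratio          # roughly 2.7 times larger
log2(growth_ratio)    # roughly 1.4 doublings of height on the y-axis
```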
Global_GDP <- gapminder %>%
  ggplot(aes(x = year, y = gdpPercap, shape = continent)) +
  # Mean GDP per capita of each continent in each year, drawn as points and a line
  stat_summary(fun = mean, geom = "point", size = 2, color = "#F8766D") +
  # linewidth replaces size for lines (ggplot2 >= 3.4.0)
  stat_summary(fun = mean, geom = "line", linewidth = 1, color = "#F8766D", alpha = .6) +
  scale_x_continuous(breaks = seq(1952, 2007, 5)) +
  # Log 2 y-axis: each labelled break is a power of 2
  scale_y_continuous(
    trans = log2_trans(),
    breaks = trans_breaks("log2", function(x) 2^x),
    labels = trans_format("log2", math_format(2^.x))
  ) +
  # The legend comes from the shape aesthetic, so title it via `shape`
  labs(y = "Gross Domestic Product per capita (Log 2)", x = "Year", shape = "Continent") +
  theme_bw()
Global_GDP
Plot 1 illustrates that the GDP per capita of the Americas increases between 1952 and 2007, from a continental mean of 4,079 to 11,003. Compared with Africa, Asia, Europe, and Oceania, the Americas generally rank third, placing them in the middle of the five continents. Throughout 1952 to 2007, Oceania's GDP per capita is consistently the highest and Africa's is consistently the lowest.
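The ranking can be checked numerically by computing the same per-continent means that `stat_summary(fun = mean)` draws and ranking them within each year. This is a sketch assuming dplyr and tidyr; note that these are unweighted means of country values, matching the plot, not population-weighted averages.

```r
library(gapminder)
library(dplyr)
library(tidyr)

continent_rank <- gapminder %>%
  group_by(year, continent) %>%
  # .groups = "drop_last" leaves the result grouped by year only
  summarize(mean_gdp = mean(gdpPercap), .groups = "drop_last") %>%
  mutate(rank = min_rank(desc(mean_gdp))) %>%  # 1 = highest mean in that year
  ungroup()

# One row per year, one column per continent; cells hold the rank
continent_rank %>%
  select(year, continent, rank) %>%
  pivot_wider(names_from = continent, values_from = rank)
```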
It is important, however, to consider the spread of the data behind the Americas' mean GDP per capita. Although the mean increases between 1952 and 2007, there is a large gap between the minimum and maximum GDP per capita in each year. Moreover, Table 1 shows that the minimum GDP per capita in 2007 (1,202) is lower than the minimum in 1952 (1,398).
# Table 1: summary of GDP per capita in the Americas, by year
gapminder %>%
  filter(continent == "Americas") %>%
  group_by(year) %>%
  summarize(
    n = n(),
    mean = mean(gdpPercap),
    min = min(gdpPercap),
    max = max(gdpPercap))
## # A tibble: 12 × 5
## year n mean min max
## <int> <int> <dbl> <dbl> <dbl>
## 1 1952 25 4079. 1398. 13990.
## 2 1957 25 4616. 1544. 14847.
## 3 1962 25 4902. 1662. 16173.
## 4 1967 25 5668. 1452. 19530.
## 5 1972 25 6491. 1654. 21806.
## 6 1977 25 7352. 1874. 24073.
## 7 1982 25 7507. 2011. 25010.
## 8 1987 25 7793. 1823. 29884.
## 9 1992 25 8045. 1456. 32004.
## 10 1997 25 8889. 1342. 35767.
## 11 2002 25 9288. 1270. 39097.
## 12 2007 25 11003. 1202. 42952.
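To see which countries sit at the extremes of Table 1, the minimum and maximum can be labelled by country. This sketch uses dplyr's `slice_min()` and `slice_max()` (available since dplyr 1.0.0); `spread_ratio` is a hypothetical column that divides the maximum by the minimum to show how far apart the richest and poorest countries are in each year.

```r
library(gapminder)
library(dplyr)

americas <- gapminder %>% filter(continent == "Americas")

# Poorest and richest country in each year
poorest <- americas %>%
  group_by(year) %>%
  slice_min(gdpPercap, n = 1) %>%
  select(year, poorest = country, min = gdpPercap)
richest <- americas %>%
  group_by(year) %>%
  slice_max(gdpPercap, n = 1) %>%
  select(year, richest = country, max = gdpPercap)

extremes <- poorest %>%
  inner_join(richest, by = "year") %>%
  mutate(spread_ratio = max / min) %>%   # how many times richer the top country is
  ungroup()
extremes
```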
This variation is to be expected given the geographic scope of the Americas. North America is more developed than Central America, South America, and the Caribbean, and has a much higher GDP per capita. Graphing the variability of GDP per capita within the Americas should therefore give a clearer picture of the differences between regions, specifically GDP per capita growth as a percentage over the last 10 years of the data (1997 to 2007).
To develop this graph, a new data set was created that only included countries of the Americas.
Americas <- gapminder %>%
  filter(continent == "Americas")
A new Region variable was created that classified each country as belonging to North America, Central America, South America, or the Caribbean.
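The code for this step is not shown. One way to build such a variable is `dplyr::case_when()` with `%in%` lookups, sketched below. The grouping is an assumption: in particular, placing Mexico in North America rather than Central America is a judgment call that may differ from the classification actually used. Unmatched countries become `NA`, which makes a misspelled name easy to catch.

```r
library(gapminder)
library(dplyr)

# Hypothetical region lists covering the 25 Americas countries in gapminder
caribbean <- c("Cuba", "Dominican Republic", "Haiti", "Jamaica",
               "Puerto Rico", "Trinidad and Tobago")
central   <- c("Costa Rica", "El Salvador", "Guatemala", "Honduras",
               "Nicaragua", "Panama")
north     <- c("Canada", "Mexico", "United States")   # Mexico placement assumed
south     <- c("Argentina", "Bolivia", "Brazil", "Chile", "Colombia",
               "Ecuador", "Paraguay", "Peru", "Uruguay", "Venezuela")

Americas <- gapminder %>%
  filter(continent == "Americas") %>%
  mutate(Region = case_when(
    country %in% caribbean ~ "Caribbean",
    country %in% central   ~ "Central America",
    country %in% north     ~ "North America",
    country %in% south     ~ "South America"
  ))  # any country not listed is left as NA

# Countries per region (each country appears once per year, so use distinct)
region_counts <- Americas %>% distinct(country, Region) %>% count(Region)
region_counts
```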
To refine the data set further, only the relevant years (1997 and 2007) and GDP per capita were kept. The data were then reshaped from long to wide format, so that each country has a single row. Next, a new variable for GDP per capita growth was created: the percentage change between 1997 and 2007. Because growth is measured relative to each country's own starting point, regions can be compared while taking differences in wealth into account.
Americas_GDP <- Americas %>%
  filter(year == 1997 | year == 2007) %>%
  select(-lifeExp, -pop, -continent) %>%
  # Long to wide: one row per country, with columns `1997` and `2007`
  pivot_wider(names_from = year, values_from = gdpPercap) %>%
  # Percentage change in GDP per capita from 1997 to 2007
  mutate(GDP_Percent = (`2007` - `1997`) / `1997` * 100)
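Written out, the growth variable computed above is the percentage change relative to the 1997 value:

```latex
\text{GDP\_Percent} = \frac{\text{gdpPercap}_{2007} - \text{gdpPercap}_{1997}}{\text{gdpPercap}_{1997}} \times 100
```

For example, a hypothetical country rising from 5,000 to 7,500 has $(7500 - 5000)/5000 \times 100 = 50\%$, and one falling from 2,000 to 1,800 has $(1800 - 2000)/2000 \times 100 = -10\%$. Dividing by the starting value is what lets a poor and a rich country be compared on the same scale.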
Finally, a box plot was constructed with Region along the x-axis and the percent increase in GDP per capita along the y-axis.
Americas_GDP_Plot <- Americas_GDP %>%
  # Order regions by median growth (fct_reorder summarises with the median by default)
  mutate(Region = fct_reorder(Region, GDP_Percent)) %>%
  ggplot(aes(y = GDP_Percent, x = Region)) +
  geom_boxplot(fill = "gray", outlier.shape = NA, alpha = 0.3) +
  # One point per country; jitter horizontally only so growth values are not distorted
  geom_jitter(color = "#F8766D", alpha = 0.3, width = .13, height = 0) +
  scale_y_continuous(name = "Percent Increase of GDP Per Capita", limits = c(-20, 120),
                     breaks = seq(-20, 120, 20)) +
  labs(x = "Region") +
  theme_bw()
Americas_GDP_Plot
Plot 2 shows that the Caribbean had the highest increase in GDP per capita between 1997 and 2007, with a median of about 40%. South America, North America, and Central America all grouped around a median increase of about 20%. Importantly, Plot 2 also shows that the Caribbean has the greatest variability in GDP per capita growth: one country doubled its GDP per capita over the 10-year period, while another's fell by 10%. The third and final plot identifies which Caribbean countries experienced high growth and which experienced low growth.
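The medians and spreads read off Plot 2 can also be tabulated. This sketch rebuilds the growth variable with `pivot_wider()` and joins a hypothetical country-to-region lookup (Mexico assumed to be North American); `IQR()` is base R's interquartile range, a numeric counterpart to the height of each box.

```r
library(gapminder)
library(dplyr)
library(tidyr)

# Hypothetical country-to-region lookup for the 25 Americas countries
region_lookup <- tibble(
  country = c("Cuba", "Dominican Republic", "Haiti", "Jamaica", "Puerto Rico",
              "Trinidad and Tobago",
              "Costa Rica", "El Salvador", "Guatemala", "Honduras", "Nicaragua", "Panama",
              "Canada", "Mexico", "United States",
              "Argentina", "Bolivia", "Brazil", "Chile", "Colombia", "Ecuador",
              "Paraguay", "Peru", "Uruguay", "Venezuela"),
  Region = rep(c("Caribbean", "Central America", "North America", "South America"),
               times = c(6, 6, 3, 10))
)

region_growth <- gapminder %>%
  filter(continent == "Americas", year %in% c(1997, 2007)) %>%
  mutate(country = as.character(country)) %>%   # match the lookup's column type
  select(country, year, gdpPercap) %>%
  pivot_wider(names_from = year, values_from = gdpPercap) %>%
  mutate(GDP_Percent = (`2007` - `1997`) / `1997` * 100) %>%
  inner_join(region_lookup, by = "country")

region_summary <- region_growth %>%
  group_by(Region) %>%
  summarize(n = n(),
            median = median(GDP_Percent),
            IQR = IQR(GDP_Percent),
            min = min(GDP_Percent),
            max = max(GDP_Percent))
region_summary
```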
A lollipop chart was created to compare the percent increase in GDP per capita across the Caribbean countries. Percent increase was mapped onto the x-axis and country onto the y-axis.
Caribbean <- Americas_GDP %>%
  filter(Region == "Caribbean") %>%
  # Sort countries by growth so the lollipops are ordered
  mutate(country = fct_reorder(country, GDP_Percent)) %>%
  ggplot(aes(x = GDP_Percent, y = country)) +
  # Stick from 0 to each country's percent change
  geom_segment(aes(y = country, yend = country, x = 0, xend = GDP_Percent),
               color = "gray", linewidth = 1) +
  geom_vline(xintercept = 0) +
  geom_point(size = 2, color = "#F8766D") +
  # Apply the complete theme first; adding theme_bw() after theme() would
  # overwrite these customisations
  theme_bw() +
  theme(
    panel.grid.major.y = element_blank(),
    panel.border = element_blank(),
    axis.ticks.x = element_blank()
  ) +
  ylab("Country") +
  scale_x_continuous(name = "GDP Per Capita Percent Change", limits = c(-20, 120),
                     breaks = seq(-20, 120, 20))
Caribbean
As shown in Plot 3, Trinidad and Tobago experienced large growth in GDP per capita, an increase of over 100% between 1997 and 2007. This raises the question of why Trinidad and Tobago experienced such a high degree of growth: was it the effect of trade agreements, development projects, or new government initiatives? Haiti, on the other hand, saw its GDP per capita fall by 10% between 1997 and 2007. This too raises questions about political or geographic events that may have affected the country's economic growth.
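The countries at either end of Plot 3 can be listed numerically. The sketch below selects the Caribbean countries by name (an assumed list of six) and sorts them by percent change, so the largest gain and the only decline appear at the top and bottom of the table.

```r
library(gapminder)
library(dplyr)
library(tidyr)

# Assumed Caribbean membership among gapminder's Americas countries
caribbean <- c("Cuba", "Dominican Republic", "Haiti", "Jamaica",
               "Puerto Rico", "Trinidad and Tobago")

caribbean_growth <- gapminder %>%
  filter(country %in% caribbean, year %in% c(1997, 2007)) %>%
  select(country, year, gdpPercap) %>%
  pivot_wider(names_from = year, values_from = gdpPercap) %>%
  mutate(GDP_Percent = (`2007` - `1997`) / `1997` * 100) %>%
  arrange(desc(GDP_Percent))   # highest growth first

caribbean_growth
```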
All three plots showcase the variability in GDP per capita across and within continents, particularly within the Americas, which have diverse geographical, political, and developmental characteristics. They illustrate economic discrepancies not only between the regions of the Americas but also within them. Furthermore, they allow clear economic anomalies to be identified, presenting further avenues of investigation for other domains in the social sciences.