Nations Dataset Charts Assignment

Load packages

library(tidyverse)
library(RColorBrewer)
library(ggfortify)
library(plotly)
library(GGally)

Set the working directory and load the nations CSV file

setwd("C:/Users/jedi_/Documents/Academic/MC/Datasets")
nations <- read_csv("nations.csv")

Make headers lowercase and remove spaces, then view the data

names(nations) <- tolower(names(nations))
names(nations) <- gsub(" ","",names(nations))
head(nations)
## # A tibble: 6 × 10
##   iso2c iso3c country  year gdp_percap population birth_rate neonat_mortal_rate
##   <chr> <chr> <chr>   <dbl>      <dbl>      <dbl>      <dbl>              <dbl>
## 1 AD    AND   Andorra  1996         NA      64291       10.9                2.8
## 2 AD    AND   Andorra  1994         NA      62707       10.9                3.2
## 3 AD    AND   Andorra  2003         NA      74783       10.3                2  
## 4 AD    AND   Andorra  1990         NA      54511       11.9                4.3
## 5 AD    AND   Andorra  2009         NA      85474        9.9                1.7
## 6 AD    AND   Andorra  2011         NA      82326       NA                  1.6
## # ℹ 2 more variables: region <chr>, income <chr>
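
As a minimal sketch of what the two cleaning steps do, the snippet below applies them to a made-up vector of headers. The names in `raw_names` are hypothetical and are not the real headers of nations.csv. Note that `gsub(" ", "", ...)` deletes spaces rather than replacing them with underscores.

```r
# Hypothetical raw headers, for illustration only (not taken from nations.csv)
raw_names <- c("Country", "GDP PerCap", "Birth Rate")

# Same two steps as above: lowercase first, then delete every space
clean_names <- gsub(" ", "", tolower(raw_names))
print(clean_names)
```

Headers that are already lowercase and contain no spaces, such as gdp_percap, pass through both steps unchanged.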

View a statistical summary of this dataset

summary(nations)
##     iso2c              iso3c             country               year     
##  Length:5275        Length:5275        Length:5275        Min.   :1990  
##  Class :character   Class :character   Class :character   1st Qu.:1996  
##  Mode  :character   Mode  :character   Mode  :character   Median :2002  
##                                                           Mean   :2002  
##                                                           3rd Qu.:2008  
##                                                           Max.   :2014  
##                                                                         
##    gdp_percap         population          birth_rate    neonat_mortal_rate
##  Min.   :   239.7   Min.   :9.004e+03   Min.   : 6.90   Min.   : 0.70     
##  1st Qu.:  2263.6   1st Qu.:7.175e+05   1st Qu.:13.40   1st Qu.: 6.70     
##  Median :  6563.2   Median :5.303e+06   Median :21.60   Median :15.00     
##  Mean   : 12788.8   Mean   :2.958e+07   Mean   :24.16   Mean   :19.40     
##  3rd Qu.: 17195.0   3rd Qu.:1.757e+07   3rd Qu.:33.88   3rd Qu.:29.48     
##  Max.   :141968.1   Max.   :1.364e+09   Max.   :55.12   Max.   :73.10     
##  NA's   :766        NA's   :14          NA's   :295     NA's   :525       
##     region             income         
##  Length:5275        Length:5275       
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
##                                       
## 

Create a new column (variable) showing the GDP of each country in trillions of dollars

nations_gdp <- nations |>
  mutate(gdp_trillions = gdp_percap * population / 10^12)
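
The arithmetic behind gdp_trillions can be checked on a small toy table. This is only a sketch: the countries and figures in `toy` are invented for illustration, and the third row shows that a missing gdp_percap (as in the Andorra rows above) produces a missing GDP.

```r
library(dplyr)

# Invented values for illustration only
toy <- data.frame(
  country    = c("A", "B", "C"),
  gdp_percap = c(50000, 2000, NA),   # dollars per person
  population = c(3e8, 1e9, 5e4)      # number of people
)

# Total GDP = GDP per capita * population, rescaled to trillions of dollars
toy_gdp <- toy |>
  mutate(gdp_trillions = gdp_percap * population / 10^12)

print(toy_gdp)
```

Country A comes out at 15 trillion dollars (50,000 × 300 million), country B at 2 trillion, and country C is NA because its per-capita figure is missing.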

Filter the data for Malaysia, Singapore, Indonesia, and Thailand - four countries in Southeast Asia.

gdp_trillions_4 <- nations_gdp |>
  filter(country %in% c("Malaysia", "Singapore", "Indonesia", "Thailand"))

Plot GDP over time for these four countries.

p1 <- gdp_trillions_4 |>
  ggplot(aes(x = year, y = gdp_trillions, color = country)) +
  geom_point() +
  geom_line() +
  scale_color_brewer(palette = "Set1")
p1

Add titles, labels, change the theme, move the legend, and center the title

p1a <- gdp_trillions_4 |>
  ggplot(aes(x = year, y = gdp_trillions, color = country)) +
  geom_point() +
  geom_line() +
  scale_color_brewer(palette = "Set1") +
  labs(title = "Indonesia's Rising Economy",
       x = "Year",
       y = "GDP ($ Trillion)",
       color = "Country") +
  theme_classic(base_size = 12) +
  theme(legend.position = c(0.2, 0.7),
        plot.title = element_text(hjust = 0.5))

p1a

From about 2005 onwards, Indonesia’s GDP grew at a faster rate than that of the other three countries, whose growth rates remained similar to one another through 2014, the last year in the dataset. According to Wikipedia, a global economic downturn began in 2007 and slowed economic growth in most Southeast Asian countries. Indonesia’s economy was relatively unaffected because of strong domestic consumption, which at that time accounted for 75% of Indonesia’s GDP. As a result, poverty and unemployment rates in Indonesia actually decreased during the downturn.
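
One way to put a number on "grew at a faster rate" is to compute year-over-year percentage growth per country. The sketch below does this on invented figures: the `toy` table and its values are hypothetical, not taken from the nations data. The same `group_by()` and `lag()` pattern could be applied to gdp_trillions_4.

```r
library(dplyr)

# Invented GDP series for two hypothetical countries
toy <- data.frame(
  country       = rep(c("A", "B"), each = 3),
  year          = rep(2005:2007, times = 2),
  gdp_trillions = c(1.0, 1.1, 1.21, 0.5, 0.51, 0.5202)
)

# Percentage change from the previous year, computed within each country
growth <- toy |>
  group_by(country) |>
  arrange(year, .by_group = TRUE) |>
  mutate(growth_pct = 100 * (gdp_trillions / lag(gdp_trillions) - 1)) |>
  ungroup()

print(growth)
```

The first year of each country has no previous value, so its growth is NA. In this toy data, country A grows by about 10% per year and country B by about 2%.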

Group the data by region and year, then sum gdp_trillions (the variable created with mutate) within each group to get total GDP in trillions of dollars.

regyr_nations_gdp <- nations_gdp |>
  group_by(region, year) |>
  # .groups = "drop" returns an ungrouped tibble and silences the grouping message
  summarise(gdp = sum(gdp_trillions, na.rm = TRUE), .groups = "drop")
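
To see what `na.rm = TRUE` does inside the grouped sum, here is a small sketch on invented data (the regions and values in `toy` are hypothetical). It also shows one caveat: a region-year in which every value is missing sums to 0 rather than NA.

```r
library(dplyr)

# Invented values for illustration only
toy <- data.frame(
  region        = c("North", "North", "North", "South", "South"),
  year          = c(2000, 2000, 2001, 2000, 2001),
  gdp_trillions = c(1.5, NA, 2.0, 0.5, NA)
)

# One row per region-year; missing values are skipped when summing
toy_sum <- toy |>
  group_by(region, year) |>
  summarise(gdp = sum(gdp_trillions, na.rm = TRUE), .groups = "drop")

print(toy_sum)
```

North in 2000 sums to 1.5 because the NA is ignored, while South in 2001 comes out as 0 because its only value is missing. A regional total can therefore understate GDP in years where many countries lack data.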

Plot GDP over time for each region. Add interactivity to compare data across regions simultaneously.

p2 <- regyr_nations_gdp |>
  # text is not drawn by ggplot2; ggplotly() uses it for the hover tooltip
  ggplot(aes(x = year, y = gdp, fill = region,
             text = paste("Region:", region))) +
  # stacked areas only: an ungrouped geom_line() would zigzag across regions
  geom_area(colour = "white") +
  scale_fill_brewer(palette = "Set2") +
  labs(title = "GDP by World Bank Region",
       x = "Year",
       y = "GDP ($ Trillion)",
       fill = "Region") +
  theme_dark(base_size = 11) +
  theme(legend.position = c(0.16, 0.7),
        plot.title = element_text(hjust = 0.5))
p2 <- ggplotly(p2)
p2

The East Asia & Pacific region overtook the other regions in the 2000s to become the region with the highest GDP. This makes sense because, according to the data presented in the first sample chart of this assignment, China became the largest economy in the 2010s, and China’s GDP likely constitutes a large share of this region’s GDP. All World Bank regions show growth in total GDP over time, which is partly to be expected as populations increase. It would be interesting to look at GDP per capita to see whether the data could be interpreted differently.
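
A possible starting point for that follow-up is a population-weighted GDP per capita for each region and year, that is, total regional GDP divided by total regional population. The sketch below uses invented data, and the column name gdp_percap_region is an assumption introduced here. Rows with a missing gdp_percap or population are dropped first so that the numerator and denominator cover the same countries.

```r
library(dplyr)

# Invented values for illustration only
toy <- data.frame(
  region     = c("North", "North", "South", "South"),
  year       = c(2000, 2000, 2000, 2000),
  gdp_percap = c(40000, 20000, 5000, NA),
  population = c(1e6, 3e6, 1e7, 2e6)
)

# Regional GDP per capita = sum of country GDPs / sum of country populations
regional_percap <- toy |>
  filter(!is.na(gdp_percap), !is.na(population)) |>
  group_by(region, year) |>
  summarise(
    gdp_percap_region = sum(gdp_percap * population) / sum(population),
    .groups = "drop"
  )

print(regional_percap)
```

In the toy data, North works out to 25,000 dollars per person, which sits closer to the larger country's 20,000 than a simple average of the two countries would. South is 5,000 because the country with missing GDP per capita is excluded from both sums.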