Nations Charts

Author

O. Nseyo

Charting using the Nations Dataset

Using the given “Nations” dataset, we are going to plot charts with ggplot2.

First, we load our libraries and read in the dataset.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(plotly)

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout
library(ggfortify)
library(dplyr)
library(GGally)
Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2
Nations <- read_csv('/Users/oworenibanseyo/Desktop/Data 110 2025/Datasets/nations.csv')
Rows: 5275 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): iso2c, iso3c, country, region, income
dbl (5): year, gdp_percap, population, birth_rate, neonat_mortal_rate

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Looking in the Environment pane, I can see that the dataset has 5,275 observations and 10 variables to work with.
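The Environment pane is one way to read these dimensions. The same check can be done in code with `dim()`. `colSums(is.na(...))` also counts the missing values in each column, which becomes relevant later because `gdp_percap` is `NA` for some rows.

Below is a minimal sketch on a small made-up tibble. `toy_nations` and its values are hypothetical and only stand in for `Nations`.

```r
library(tibble)

# Hypothetical stand-in for Nations: made-up values, some of them missing
toy_nations <- tibble(
  country    = c("A", "A", "B"),
  year       = c(1996, 1994, 2000),
  gdp_percap = c(NA, NA, 100),
  population = c(10, 20, 30)
)

# Number of rows (observations) and columns (variables)
dim(toy_nations)

# Number of missing values in each column
colSums(is.na(toy_nations))
```

On the real data, `dim(Nations)` should report 5275 rows and 10 columns, matching the `read_csv()` message above.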

Let's look at the first few rows.

head(Nations)
# A tibble: 6 × 10
  iso2c iso3c country  year gdp_percap population birth_rate neonat_mortal_rate
  <chr> <chr> <chr>   <dbl>      <dbl>      <dbl>      <dbl>              <dbl>
1 AD    AND   Andorra  1996         NA      64291       10.9                2.8
2 AD    AND   Andorra  1994         NA      62707       10.9                3.2
3 AD    AND   Andorra  2003         NA      74783       10.3                2  
4 AD    AND   Andorra  1990         NA      54511       11.9                4.3
5 AD    AND   Andorra  2009         NA      85474        9.9                1.7
6 AD    AND   Andorra  2011         NA      82326       NA                  1.6
# ℹ 2 more variables: region <chr>, income <chr>

Now we create a new variable, “GDP”, by multiplying “gdp_percap” by “population” and then dividing by one trillion. This expresses GDP in trillions of dollars.

Nations_2 <- Nations |>
  mutate(GDP = (gdp_percap * population) / 1e12)

head(Nations_2)
# A tibble: 6 × 11
  iso2c iso3c country  year gdp_percap population birth_rate neonat_mortal_rate
  <chr> <chr> <chr>   <dbl>      <dbl>      <dbl>      <dbl>              <dbl>
1 AD    AND   Andorra  1996         NA      64291       10.9                2.8
2 AD    AND   Andorra  1994         NA      62707       10.9                3.2
3 AD    AND   Andorra  2003         NA      74783       10.3                2  
4 AD    AND   Andorra  1990         NA      54511       11.9                4.3
5 AD    AND   Andorra  2009         NA      85474        9.9                1.7
6 AD    AND   Andorra  2011         NA      82326       NA                  1.6
# ℹ 3 more variables: region <chr>, income <chr>, GDP <dbl>
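The new column scales the per-capita figure up by population and expresses the result in trillions: GDP = gdp_percap × population / 10^12. Missing values carry through this arithmetic, so rows such as the Andorra ones above, where `gdp_percap` is `NA`, also get an `NA` GDP. That is why a later step filters with `!is.na(GDP)`.

Here is a quick worked check with illustrative numbers that are not taken from the dataset:

```r
# Illustrative values only: 50,000 per person for 300 million people
gdp_percap <- 50000
population <- 3e8

# Same formula as the mutate() step: total GDP in trillions of dollars
gdp_trillions <- (gdp_percap * population) / 1e12
gdp_trillions

# A missing per-capita value makes the product missing too
(NA * 64291) / 1e12
```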

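Next, we create a new data frame that keeps only our four countries (China, Germany, Japan and the United States). It drops rows where GDP is missing and keeps just the GDP, year and country columns.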

Nations_df <- Nations_2 |>
  filter(country %in% c("China", "Germany", "Japan", "United States")) |>
  filter(!is.na(GDP)) |>
  select(GDP, year, country)

Nations_df
# A tibble: 100 × 3
     GDP  year country
   <dbl> <dbl> <chr>  
 1  1.47  1992 China  
 2  6.59  2005 China  
 3  3.68  2000 China  
 4  1.26  1991 China  
 5 16.6   2013 China  
 6  3.32  1999 China  
 7 18.1   2014 China  
 8  5.07  2003 China  
 9  5.73  2004 China  
10  1.71  1993 China  
# ℹ 90 more rows
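The printed preview shows only China rows, so it does not prove that the other three countries survived the filter. Counting rows per country is a quick way to check.

Below is a minimal sketch of the same `%in%` and `!is.na()` filtering on hypothetical toy data. `toy` and its GDP values are made up for illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data standing in for Nations_2 (made-up GDP values)
toy <- tibble(
  country = c("China", "Germany", "Japan", "United States", "France", "Japan"),
  year    = c(2000, 2000, 2000, 2000, 2000, 2001),
  GDP     = c(1, 2, 3, 4, 5, NA)
)

# Keep the four countries of interest, then drop rows with missing GDP
four <- toy |>
  filter(country %in% c("China", "Germany", "Japan", "United States")) |>
  filter(!is.na(GDP))

# One row per country, with the number of remaining observations
four |> count(country)
```

On the real data, `Nations_df |> count(country)` would list each of the four countries with its number of non-missing years.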

Now we plot the first chart using geom_line() and geom_point().

Plot_1 <- ggplot(Nations_df, aes(x = year, y = GDP, color = country)) +
  geom_line() +                          # one line per country
  geom_point() +                         # mark each yearly observation
  scale_color_brewer(palette = "Set1") +
  labs(title = "China's Rise to Become the Largest Economy",
       x = "Year",
       y = "GDP ($ trillion)") +
  theme_minimal(base_size = 12)
Plot_1

Next, as instructed, we use dplyr's group_by() and summarise() to total GDP for every region and year.

Regions_df <- Nations_2 |>
  group_by(region, year) |>
  summarise(GDP = sum(GDP, na.rm = TRUE), .groups = "drop")

Regions_df
# A tibble: 175 × 3
   region               year   GDP
   <chr>               <dbl> <dbl>
 1 East Asia & Pacific  1990  5.52
 2 East Asia & Pacific  1991  6.03
 3 East Asia & Pacific  1992  6.50
 4 East Asia & Pacific  1993  7.04
 5 East Asia & Pacific  1994  7.64
 6 East Asia & Pacific  1995  8.29
 7 East Asia & Pacific  1996  8.96
 8 East Asia & Pacific  1997  9.55
 9 East Asia & Pacific  1998  9.60
10 East Asia & Pacific  1999 10.1 
# ℹ 165 more rows
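`sum(..., na.rm = TRUE)` skips missing values. However, when every value in a group is missing, it returns 0 rather than `NA`. A region-year with no reported GDP would therefore appear as 0 in `Regions_df` and pull that band of the stacked area down to zero.

Below is a minimal sketch on made-up data. The regions "A" and "B" are hypothetical.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: region "B" has no GDP values at all in 1990
toy <- tibble(
  region = c("A", "A", "B", "B"),
  year   = c(1990, 1990, 1990, 1990),
  GDP    = c(1, NA, NA, NA)
)

# Same pattern as Regions_df: total GDP per region and year
totals <- toy |>
  group_by(region, year) |>
  summarise(GDP = sum(GDP, na.rm = TRUE), .groups = "drop")

# Region A sums to 1 (the NA is skipped); region B sums to 0, not NA
totals
```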

We then plot the second chart as a stacked area chart and make it interactive with ggplotly().

Plot_2 <- ggplot(Regions_df, aes(x = year, y = GDP, fill = region)) +
  geom_area(color = "white", linewidth = 0.2) +   # linewidth replaces the deprecated size for outlines (ggplot2 >= 3.4.0)
  labs(title = "Global GDP Growth by World Bank Region",
       x = "Year",
       y = "Total GDP ($ Trillion)") +
  scale_fill_brewer(palette = "Set2") +
  theme_minimal(base_size = 12)
ggplotly(Plot_2)
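By default, `ggplotly()` shows every mapped aesthetic in the hover text. Its `tooltip` argument restricts the hover text to the aesthetics you name.

Below is a minimal sketch on hypothetical toy data. `toy_regions`, its region names and its values are made up and only stand in for `Regions_df`.

```r
library(ggplot2)
library(plotly)
library(tibble)

# Hypothetical toy data standing in for Regions_df
toy_regions <- tibble(
  region = rep(c("Region A", "Region B"), each = 3),
  year   = rep(2000:2002, times = 2),
  GDP    = c(1, 2, 3, 2, 2, 2)
)

# Stacked area chart, built the same way as Plot_2
p <- ggplot(toy_regions, aes(x = year, y = GDP, fill = region)) +
  geom_area(color = "white", linewidth = 0.2)

# Only show year, GDP and region when hovering
w <- ggplotly(p, tooltip = c("x", "y", "fill"))
w
```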