Nations Charts Assignment

Author

Duchelle K

Load the library

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Load the data

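# setwd() points to the author's local folder; change the path (or use an RStudio project) to run this elsewhere
setwd("C:/Users/User/Downloads/Data 110 Projects and Assignments")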
nations <- read_csv("nations.csv")
Rows: 5275 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): iso2c, iso3c, country, region, income
dbl (5): year, gdp_percap, population, birth_rate, neonat_mortal_rate

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
spec(nations)
cols(
  iso2c = col_character(),
  iso3c = col_character(),
  country = col_character(),
  year = col_double(),
  gdp_percap = col_double(),
  population = col_double(),
  birth_rate = col_double(),
  neonat_mortal_rate = col_double(),
  region = col_character(),
  income = col_character()
)
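The message above suggests passing the column types explicitly. Doing so also makes it easy to control which strings `read_csv()` treats as missing. This is a minimal sketch on two illustrative lines of literal CSV text wrapped in `I()`, not on nations.csv itself.

It rests on one assumption: the file stores Namibia's ISO2 code as the literal text `NA`. I have not checked whether it does. If it does, the default `na = c("", "NA")` would read that code as a missing value. Setting `na = ""` keeps it as a string, but numeric columns would then need their missing values stored as empty fields.

```r
library(readr)

# Illustrative literal CSV text (wrapped in I()); Namibia's ISO2 code is "NA"
csv_text <- I("iso2c,country,year\nNA,Namibia,2014\nUS,United States,2014\n")

# Explicit column types, matching the spec() output for these columns,
# which also silences the column-specification message
col_spec <- cols(iso2c = col_character(), country = col_character(), year = col_double())

# Default na = c("", "NA"): the code "NA" is read as a missing value
default_read <- read_csv(csv_text, col_types = col_spec)

# Treat only empty fields as missing, so "NA" stays a country code
strict_read <- read_csv(csv_text, col_types = col_spec, na = "")

default_read$iso2c
strict_read$iso2c
```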

Check out the first few lines

head(nations)
# A tibble: 6 × 10
  iso2c iso3c country  year gdp_percap population birth_rate neonat_mortal_rate
  <chr> <chr> <chr>   <dbl>      <dbl>      <dbl>      <dbl>              <dbl>
1 AD    AND   Andorra  1996         NA      64291       10.9                2.8
2 AD    AND   Andorra  1994         NA      62707       10.9                3.2
3 AD    AND   Andorra  2003         NA      74783       10.3                2  
4 AD    AND   Andorra  1990         NA      54511       11.9                4.3
5 AD    AND   Andorra  2009         NA      85474        9.9                1.7
6 AD    AND   Andorra  2011         NA      82326       NA                  1.6
# ℹ 2 more variables: region <chr>, income <chr>

Clean the dataset

# check for missing values anywhere in the data
if (anyNA(nations)) "Found NA" else "No NA found"
[1] "Found NA"
# remove rows with NA in any of the analysis columns (iso2c and iso3c are not checked)
nations2 <- nations |>
  drop_na(country, year, gdp_percap, population, birth_rate,
          neonat_mortal_rate, region, income)

nations2
# A tibble: 4,328 × 10
   iso2c iso3c country  year gdp_percap population birth_rate neonat_mortal_rate
   <chr> <chr> <chr>   <dbl>      <dbl>      <dbl>      <dbl>              <dbl>
 1 AE    ARE   United…  1991     73037.    1913190       24.6                7.9
 2 AE    ARE   United…  1993     71960.    2127863       22.4                7.3
 3 AE    ARE   United…  2001     83534.    3217865       15.8                5.5
 4 AE    ARE   United…  1992     73154.    2019014       23.5                7.6
 5 AE    ARE   United…  1994     74684.    2238281       21.3                6.9
 6 AE    ARE   United…  2007     75427.    6010100       12.8                4.7
 7 AE    ARE   United…  2004     87844.    3975945       14.2                5.1
 8 AE    ARE   United…  1996     79480.    2467726       19.3                6.4
 9 AE    ARE   United…  2006     82754.    5171255       13.3                4.9
10 AE    ARE   United…  2000     84975.    3050128       16.4                5.6
# ℹ 4,318 more rows
# ℹ 2 more variables: region <chr>, income <chr>
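Counting the missing values in each column shows which variables cause rows to be dropped during cleaning. This is a minimal sketch on a small hypothetical tibble with made-up values, standing in for `nations`. It also shows that `tidyr::drop_na()` restricted to named columns keeps the same rows as chaining `!is.na()` conditions inside `filter()`.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for `nations` (values are made up)
toy_nations <- tibble(
  country    = c("A", "A", "B", "B"),
  year       = c(2000, 2001, 2000, 2001),
  gdp_percap = c(1000, NA, 2500, 2600),
  birth_rate = c(20.1, 19.8, NA, 15.2)
)

# Count missing values in every column
na_counts <- toy_nations |>
  summarise(across(everything(), ~ sum(is.na(.x))))
print(na_counts)

# Keep only rows that are complete in the listed columns
toy_clean <- toy_nations |>
  drop_na(gdp_percap, birth_rate)
print(toy_clean)

# The equivalent filter() written with explicit !is.na() conditions
toy_filtered <- toy_nations |>
  filter(!is.na(gdp_percap) & !is.na(birth_rate))
```

On the real data you would pass `nations` to the `summarise(across(...))` step instead of the toy tibble.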

Create a new variable in the data

The mutate() function creates a new column, gdp_in_trillions, which computes each country's total GDP for the year. It multiplies gdp_percap (GDP per capita) by population and then divides by 10^12 to express the result in trillions of dollars.

The .after = gdp_percap argument specifies that the new column should be placed immediately after the gdp_percap column.

# create a new variable 
nations3 <- nations2 |>
  mutate(gdp_in_trillions = gdp_percap*population/10^12 , .after = gdp_percap)

nations3
# A tibble: 4,328 × 11
   iso2c iso3c country    year gdp_percap gdp_in_trillions population birth_rate
   <chr> <chr> <chr>     <dbl>      <dbl>            <dbl>      <dbl>      <dbl>
 1 AE    ARE   United A…  1991     73037.            0.140    1913190       24.6
 2 AE    ARE   United A…  1993     71960.            0.153    2127863       22.4
 3 AE    ARE   United A…  2001     83534.            0.269    3217865       15.8
 4 AE    ARE   United A…  1992     73154.            0.148    2019014       23.5
 5 AE    ARE   United A…  1994     74684.            0.167    2238281       21.3
 6 AE    ARE   United A…  2007     75427.            0.453    6010100       12.8
 7 AE    ARE   United A…  2004     87844.            0.349    3975945       14.2
 8 AE    ARE   United A…  1996     79480.            0.196    2467726       19.3
 9 AE    ARE   United A…  2006     82754.            0.428    5171255       13.3
10 AE    ARE   United A…  2000     84975.            0.259    3050128       16.4
# ℹ 4,318 more rows
# ℹ 3 more variables: neonat_mortal_rate <dbl>, region <chr>, income <chr>
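As a sanity check, the formula can be applied by hand to the first printed row (United Arab Emirates, 1991). `gdp_trillions()` is a hypothetical helper that mirrors the `mutate()` expression. The input 73037 is the per-capita value as rounded in the tibble print-out, so the result is approximate.

```r
# Hypothetical helper mirroring the mutate() formula above:
# total GDP (in trillions of dollars) = GDP per capita * population / 10^12
gdp_trillions <- function(gdp_percap, population) {
  gdp_percap * population / 10^12
}

# Values taken from the first printed row of nations3
uae_1991 <- gdp_trillions(73037, 1913190)
round(uae_1991, 3)
```

The result is about 0.14 trillion dollars, which matches the 0.140 shown in the gdp_in_trillions column.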

Create a dot-and-line chart

The filter function from the dplyr package selects only the rows where the country column matches one of the specified countries: United States, China, Switzerland, or Nigeria.

geom_line(linewidth = 1.2) adds lines to the plot, with a line thickness of 1.2. geom_point(size = 3, alpha = 0.3) adds points to the plot with a size of 3 and an alpha (transparency) of 0.3, making the points semi-transparent.

theme_minimal() applies a minimal theme for a clean and simple look.

scale_y_continuous(limits = c(0, 20)) sets the y-axis limits to 0 and 20 trillion dollars. scale_color_brewer(palette = "Set1") uses the "Set1" palette from the RColorBrewer package for the line colors, giving each country a distinct, easily distinguishable color.

The ggtitle(), xlab(), and ylab() functions add a title and axis labels to the plot, and labs(caption = ...) adds the source caption.

nations_chart1 <- nations3 |>
# Filter Data for 4 Specific Countries  
  filter(country %in% c("United States", "China", "Switzerland", "Nigeria")) |> 
ggplot(aes(x= year, y = gdp_in_trillions, color = country)) +
# Add Line and Point Geometries
  geom_line(linewidth = 1.2)+
  geom_point(size = 3, alpha = 0.3) +
# Apply Minimal Theme
  theme_minimal(base_size = 12) +
# Customize Y-Axis and Color Scale
  scale_y_continuous(limits=c(0,20)) +
  scale_color_brewer(palette = "Set1") +
# Add Titles and Labels
  ggtitle("China's Economy is Rising") +
  xlab("Year") + 
  ylab("GDP in $ trillions") +
  labs(caption = "Source: World Bank" )+
# Center Title and Caption
  theme(plot.title = element_text(hjust = 0.5),
        plot.caption = element_text(hjust = 0.5)) # plot.title and plot.caption center both title and caption.

nations_chart1  
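Setting the limits on the scale, as `scale_y_continuous(limits = c(0, 20))` does in nations_chart1, has a side effect. ggplot2 replaces any value outside the limits with NA and drops it from the plot rather than drawing it past the edge. `coord_cartesian(ylim = ...)` instead zooms the view and keeps every data point. Whether this matters depends on whether any GDP value exceeds 20 trillion.

This is a minimal sketch with made-up yearly values in which the last point exceeds the limit. `layer_data()` is used to inspect what ggplot2 computed for each version.

```r
library(ggplot2)

# Hypothetical yearly values; the last one exceeds the 0-20 limit
toy_gdp <- data.frame(year = 2011:2014, gdp = c(15, 17, 19, 22))

# Limits set on the scale: out-of-range values are replaced by NA (censored)
p_scale <- ggplot(toy_gdp, aes(x = year, y = gdp)) +
  geom_line() +
  scale_y_continuous(limits = c(0, 20))

# Limits set on the coordinate system: the view is zoomed, data are kept
p_coord <- ggplot(toy_gdp, aes(x = year, y = gdp)) +
  geom_line() +
  coord_cartesian(ylim = c(0, 20))

# layer_data() returns the data ggplot2 computed for the first layer;
# count how many y values ended up missing in each version
censored_scale <- sum(is.na(layer_data(p_scale)$y))
censored_coord <- sum(is.na(layer_data(p_coord)$y))
c(scale = censored_scale, coord = censored_coord)
```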

On this line chart, I wanted to compare the GDP of some major economies (the United States, China, and Switzerland) with the GDP of a large African country, Nigeria. To keep the lines visible, I had to leave the chart static instead of adding interactivity with plotly.

I was surprised to see that Switzerland's GDP is much lower than that of China and the United States, even though Switzerland is considered one of the richest countries in Europe. Its GDP is also not very different from Nigeria's. The explanation is that the chart shows total GDP, which is GDP per capita multiplied by population. Switzerland is rich per person, but its population is small (around 8 million), so its total economy is small compared with those of much more populous countries. In 2013, China was at about the same level as the United States, and in 2014 its GDP was higher than that of the United States. China's rapid economic growth might be due to its technological development. Analyzing more recent data would help determine whether China's economy is now significantly larger than that of the United States. While the GDP of China and the United States increased over the years, the GDP of Switzerland and Nigeria stayed nearly flat, with a slight increase after 2010.

Create a second chart

The summarise function calculates the sum of gdp_in_trillions for each group (each region and year combination). The na.rm = TRUE argument ensures that any missing values are removed before summing.

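The geom_area() function creates an area plot in which the area under each curve is filled. By default the areas are stacked (position = "stack"), so the top edge of the chart shows the combined GDP of all regions. The color = "white" argument outlines each area with a white border for better visual separation.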

scale_fill_brewer(palette = "Set2") applies a color palette from the RColorBrewer package, giving each region a distinct color.

nations_chart2 <- nations |>
  # Calculate GDP in trillions for each country-year
  mutate(gdp_in_trillions = gdp_percap * population / 10^12, .after = gdp_percap) |>
  # Group data by region and year
  group_by(region, year) |>
  # Sum GDP within each region-year; .groups = "drop" returns an ungrouped tibble
  summarise(gdp_in_trillions = sum(gdp_in_trillions, na.rm = TRUE), .groups = "drop") |>
  # Create the stacked area plot
  ggplot(aes(x = year, y = gdp_in_trillions, fill = region)) +
  geom_area(color = "white") +
  # Customize theme and appearance
  theme_minimal() +
  scale_fill_brewer(palette = "Set2") +
  ggtitle("World GDP Per Region \n (1990-2014)") +
  xlab("Year") +
  ylab("GDP in $ trillions") +
  labs(caption = "Source: World Bank") +
  theme(plot.title = element_text(hjust = 0.5),
        plot.caption = element_text(hjust = 0.5))
nations_chart2
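`sum(..., na.rm = TRUE)` skips missing values. As a result, each regional total in nations_chart2 includes only the countries that report GDP in that year. A region-year in which no country reports GDP would sum to 0 rather than NA, which would appear as a dip in the area chart. Counting the non-missing values next to the sum is one way to check coverage.

This is a minimal sketch on hypothetical toy data; the column names `gdp_sum` and `n_reported` are illustrative.

```r
library(dplyr)

# Hypothetical toy data: region "B" has no GDP values at all in 2000
toy <- tibble(
  region = c("A", "A", "B", "B"),
  year   = c(2000, 2000, 2000, 2000),
  gdp    = c(1.5, NA, NA, NA)
)

toy_sums <- toy |>
  group_by(region, year) |>
  summarise(
    gdp_sum    = sum(gdp, na.rm = TRUE),  # NAs skipped; an all-NA group sums to 0
    n_reported = sum(!is.na(gdp)),        # how many values were actually summed
    .groups = "drop"
  )
print(toy_sums)
```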

The plot shows how the GDP of various regions has changed over time using different colors to signify each region. The area plot helps to understand each region’s contribution to the total world GDP over the years. This visualization is helpful for comparing the economic growth of different regions over time and understanding global economic trends.

The East Asia & Pacific region has experienced rapid GDP growth, especially after 2000. This points to a significant economic boom in countries such as China, which has had a major impact on the global economy, as highlighted in the first chart. Both North America and Europe & Central Asia have shown steady economic growth, with growth in Europe & Central Asia appearing more pronounced. Other regions, such as South Asia, Latin America & Caribbean, Middle East & North Africa, and Sub-Saharan Africa, also grew, but much less than East Asia & Pacific or North America.
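The stacked areas show absolute GDP, so comparing regional contributions from the chart is done by eye. One way to make each region's contribution to total world GDP explicit is to compute its share within each year. This is a sketch on hypothetical regional totals. On the real data, the input would be the region-year sums computed for nations_chart2.

```r
library(dplyr)

# Hypothetical regional totals (in $ trillions) for two years
regional <- tibble(
  region = c("A", "B", "C", "A", "B", "C"),
  year   = c(2000, 2000, 2000, 2010, 2010, 2010),
  gdp    = c(2, 3, 5, 6, 3, 6)
)

# Each region's share of the world total within the same year
shares <- regional |>
  group_by(year) |>
  mutate(share = gdp / sum(gdp)) |>
  ungroup()
print(shares)
```

In this toy example, region B's GDP stays at 3 trillion, but its share falls from 0.3 to 0.2 because the other regions grow. A share view separates this kind of change from absolute growth.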