Nations Dataset Charts Assignments

Author

Su Thet Hninn

Visualizing Data with the Nations Dataset

Source: https://howmuch.net/articles/the-world-economy-2019

Charts based on the World Bank’s GDP dataset were created using the following procedures.

Chart 1

Load required packages

The required packages, ‘tidyverse’ and ‘RColorBrewer’, have been loaded.

# load required packages
library (tidyverse)
library (RColorBrewer)

Load and process nations dataset

The dataset has been loaded using the ‘read_csv’ command.

# load the dataset
nations <- read_csv("~/Documents/1 - College/DATA 110/2 - Dataset/Nation/nations.csv") 

A new variable has been created in the dataset using the ‘mutate’ function from ‘dplyr’. This variable represents the GDP of each country in trillions of dollars, calculated by multiplying GDP per capita by the population and dividing by a trillion.

# create a new dataset with a new column showing GDP in trillions of dollars 
nations_nv <- mutate(nations, gdp_tn = gdp_percap*population/1000000000000)
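
As a quick check of the arithmetic, the sketch below applies the same mutate call to a small made-up table. The country labels and figures are hypothetical and are not taken from nations.csv; they are chosen only so the result is easy to verify by hand.

```r
library(dplyr)

# hypothetical values, not from nations.csv
toy <- tibble(
  country    = c("A", "B"),
  gdp_percap = c(50000, 2000),  # dollars per person
  population = c(300e6, 1e9)    # people
)

# same calculation as above: GDP per capita x population, scaled to trillions
toy_nv <- mutate(toy, gdp_tn = gdp_percap * population / 1e12)

toy_nv$gdp_tn
```

Country A works out to 50000 x 300 million = 15 trillion dollars, and country B to 2000 x 1 billion = 2 trillion dollars. Rows with a missing gdp_percap, such as the Andorra rows, would give NA for gdp_tn.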

The dataset has been checked using the head function to display the first six rows.

# check the dataset 
head (nations_nv)
# A tibble: 6 × 11
  iso2c iso3c country  year gdp_percap population birth_rate neonat_mortal_rate
  <chr> <chr> <chr>   <dbl>      <dbl>      <dbl>      <dbl>              <dbl>
1 AD    AND   Andorra  1996         NA      64291       10.9                2.8
2 AD    AND   Andorra  1994         NA      62707       10.9                3.2
3 AD    AND   Andorra  2003         NA      74783       10.3                2  
4 AD    AND   Andorra  1990         NA      54511       11.9                4.3
5 AD    AND   Andorra  2009         NA      85474        9.9                1.7
6 AD    AND   Andorra  2011         NA      82326       NA                  1.6
# ℹ 3 more variables: region <chr>, income <chr>, gdp_tn <dbl>

Prepare the dataset

A new dataset has been created by selecting four Southeast Asian nations: Malaysia, Indonesia, Thailand, and Singapore, to analyze their economic progress over a span of 24 years.

# filter the data for the desired four nations
ASEAN4 <- nations_nv %>% 
  filter(iso3c == "MYS" | iso3c == "IDN" | iso3c == "THA" | iso3c == "SGP") %>% arrange (year)
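
The same selection can also be written with %in%, which may be easier to extend when more countries are added. The sketch below uses a small hypothetical table (toy) and a helper vector (asean_codes) that are not part of the original analysis.

```r
library(dplyr)

# hypothetical mini-dataset; only iso3c and year matter for the illustration
toy <- tibble(
  iso3c = c("MYS", "USA", "SGP", "IDN", "THA", "FRA"),
  year  = c(2001, 2000, 1999, 2000, 2002, 2001)
)

# assumed helper vector holding the four country codes used above
asean_codes <- c("MYS", "IDN", "THA", "SGP")

# %in% keeps rows whose code appears in the vector,
# which matches the chained == / | comparisons
toy_asean <- toy %>%
  filter(iso3c %in% asean_codes) %>%
  arrange(year)

toy_asean
```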

I used the summary function to check for NA values in the dataset. The output shows none.

# check the summary of the dataset for NA values
summary (ASEAN4)
    iso2c              iso3c             country               year     
 Length:100         Length:100         Length:100         Min.   :1990  
 Class :character   Class :character   Class :character   1st Qu.:1996  
 Mode  :character   Mode  :character   Mode  :character   Median :2002  
                                                          Mean   :2002  
                                                          3rd Qu.:2008  
                                                          Max.   :2014  
   gdp_percap      population          birth_rate    neonat_mortal_rate
 Min.   : 2894   Min.   :  3047132   Min.   : 9.30   Min.   : 1.10     
 1st Qu.: 6911   1st Qu.: 15025754   1st Qu.:12.75   1st Qu.: 4.00     
 Median :11626   Median : 43242410   Median :17.09   Median : 7.75     
 Mean   :19733   Mean   : 77316923   Mean   :17.57   Mean   :10.29     
 3rd Qu.:23874   3rd Qu.: 96153690   3rd Qu.:21.49   3rd Qu.:16.45     
 Max.   :83689   Max.   :254454778   Max.   :28.22   Max.   :30.30     
    region             income              gdp_tn       
 Length:100         Length:100         Min.   :0.06755  
 Class :character   Class :character   1st Qu.:0.26028  
 Mode  :character   Mode  :character   Median :0.44776  
                                       Mean   :0.63286  
                                       3rd Qu.:0.82798  
                                       Max.   :2.68531  
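
Because summary reports NA counts only for non-character columns, a direct count per column can serve as a cross-check. The sketch below is one possible way to do this with base R, shown on a hypothetical table (toy) rather than on ASEAN4 itself.

```r
library(dplyr)

# hypothetical table with a few missing values
toy <- tibble(
  country    = c("A", "A", "B"),
  gdp_percap = c(NA, 1200, 3400),
  birth_rate = c(10.9, NA, NA)
)

# is.na() returns a logical matrix; colSums() counts the TRUE values per column
na_counts <- colSums(is.na(toy))

na_counts
```

Applied to ASEAN4, every count would be expected to be zero if the dataset has no missing values.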

Make the chart

I created Chart-1 using ggplot with the geom_line and geom_point functions and scale_color_brewer(palette = “Set1”). The x-axis represents years, and the y-axis represents GDP in trillions of dollars, illustrating the growth of the selected ASEAN economies.

# create the chart using ggplot with geom_line and geom_point

ASEAN4_chart <- ggplot (ASEAN4, aes(x = year, y = gdp_tn, color = country)) +
  geom_line(linewidth = 0.5) + # adjust the line width (linewidth replaces size for lines since ggplot2 3.4.0)
  geom_point(alpha = 0.9, size = 3, shape = 18) + # use a different point shape (18 = filled diamond)
  scale_color_brewer(name = "Selected Nations", palette = "Set1") +
  ylim(0, 3) +
  labs(
    x = "Year",
    y = "GDP ($ trillion)",
    title = "ASEAN's Economic Growth: A Rising Tide in Selected Nations",
    caption = "Source: World Bank",
    color = "Selected Nations"
  ) +
  theme_minimal(base_size = 9) +
  theme(
    legend.position = "right",
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8),
    plot.title = element_text(hjust = 0.6, face = "bold", margin = margin(b = 10, t = 10)),
    plot.caption = element_text(face = "italic")
  )

# print the chart
print(ASEAN4_chart)

Chart 2

A new dataset has been created by grouping the data by year and region and summing GDP within each group. In the chart, the x-axis represents years, and the y-axis represents GDP in trillions of dollars, illustrating the growth of regions worldwide.

# create a new data set 
regions <- nations_nv %>% 
  group_by (year, region) %>% 
  summarize (gdp_tn = sum (gdp_tn, na.rm = TRUE)) %>% 
  arrange (year, region)
`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
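
The message above is informational: after summarize, the result is still grouped by year. If an ungrouped result is preferred, the `.groups` argument can be set explicitly. The sketch below shows this on a hypothetical table (toy); it illustrates the option and is not the code used for the chart.

```r
library(dplyr)

# hypothetical data with one missing GDP value
toy <- tibble(
  year   = c(1990, 1990, 1990, 1991),
  region = c("North", "North", "South", "North"),
  gdp_tn = c(1.5, NA, 0.5, 2.0)
)

# .groups = "drop" returns an ungrouped tibble and suppresses the message
toy_regions <- toy %>%
  group_by(year, region) %>%
  summarize(gdp_tn = sum(gdp_tn, na.rm = TRUE), .groups = "drop") %>%
  arrange(year, region)

toy_regions
```

With `.groups = "drop"`, the head output would no longer show a "# Groups:" line.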

I used the head function to display the first ten rows of the dataset.

head (regions, 10)
# A tibble: 10 × 3
# Groups:   year [2]
    year region                     gdp_tn
   <dbl> <chr>                       <dbl>
 1  1990 East Asia & Pacific         5.52 
 2  1990 Europe & Central Asia       9.36 
 3  1990 Latin America & Caribbean   2.40 
 4  1990 Middle East & North Africa  1.66 
 5  1990 North America               6.54 
 6  1990 South Asia                  1.35 
 7  1990 Sub-Saharan Africa          0.787
 8  1991 East Asia & Pacific         6.03 
 9  1991 Europe & Central Asia       9.71 
10  1991 Latin America & Caribbean   2.55 

I created Chart-2 using ‘ggplot’ with geom_area and scale_fill_brewer(palette = “Set2”). The x-axis represents years, and the y-axis represents GDP in trillions of dollars, illustrating the growth of regions worldwide.

# create the area chart
Region_chart <- ggplot(regions, aes(x = year, y = gdp_tn, fill = region)) +
  geom_area() +
  scale_fill_brewer(name = "Regions", palette = "Set2") +
  labs(
    x = "Year",
    y = "GDP (trillion USD)",
    title = "Trends in Regional GDP (1990 - 2014)",
    caption = "Source: World Bank"
  ) +
  scale_x_continuous(breaks = seq(min(regions$year), max(regions$year), by = 4), 
                     limits = c(min(regions$year), 2015)) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 12, hjust = 0.5, margin = margin(t = 10, b = 10)),
    axis.title.x = element_text(size = 9),
    axis.title.y = element_text(size = 9), 
    plot.caption = element_text(face = "italic")  # Makes the caption italic
  )

# print the chart
print(Region_chart)

Conclusion

Chart-1 shows that, from 1990 to 2014, Indonesia had the largest growth in total GDP among the four selected ASEAN countries, followed by Thailand, Malaysia, and Singapore. Singapore’s lower position may be attributed to its smaller population relative to the other nations, since total GDP here is GDP per capita multiplied by population; a smaller population may also allow for more focused economic development strategies.

Regarding regional GDP growth, as depicted in Chart-2, the East Asia and Pacific region led globally. The remaining regions in the chart, listed in legend order, are Europe and Central Asia, Latin America and the Caribbean, the Middle East and North Africa, North America, South Asia, and Sub-Saharan Africa. These findings underscore the varied economic trajectories and regional dynamics within the global economy during this period.
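
A ranking of regions by GDP change can be checked numerically rather than read off the chart. The sketch below is a minimal illustration that uses only the 1990 and 1991 values for three regions copied from the head(regions, 10) output; the names regions_sample, gdp_first, gdp_last, and change are assumptions introduced here. Running the same steps on the full regions dataset would give the 1990 to 2014 change for all seven regions.

```r
library(dplyr)

# 1990 and 1991 values copied from the head(regions, 10) output
regions_sample <- tibble(
  year   = c(1990, 1990, 1990, 1991, 1991, 1991),
  region = rep(c("East Asia & Pacific",
                 "Europe & Central Asia",
                 "Latin America & Caribbean"), 2),
  gdp_tn = c(5.52, 9.36, 2.40, 6.03, 9.71, 2.55)
)

# for each region, compare GDP in the first and last available year
region_change <- regions_sample %>%
  group_by(region) %>%
  summarize(
    gdp_first = gdp_tn[year == min(year)],
    gdp_last  = gdp_tn[year == max(year)],
    change    = gdp_last - gdp_first
  ) %>%
  arrange(desc(change))

region_change
```

In this two-year sample, East Asia and Pacific shows the largest increase (about 0.51 trillion), ahead of Europe and Central Asia (about 0.35) and Latin America and the Caribbean (about 0.15).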