Assignment 2

The Original, Code and Reconstruction sections below describe the issues and how they were fixed.

Original

Source: howmuch.net/articles/the-world-economy-2019 (2020).

Objective

The analysed visualisation was built from World Bank data and published on howmuch.net. The website aims to provide financial information through reports and data, and so curates collections of data visualisations. In that context, the aim of the report and visualisation was to show world GDP before the coronavirus pandemic, specifically the dominance of the US economy. This would form a basis for further investigation and comparison with data from later years, charting the course of GDP through the pandemic. The target audience appears to be economists and anyone interested in the financial impact of the coronavirus pandemic.

The visualisation chosen had the following three main issues:

The visualisation used small polygons whose area encoded each country's GDP. Area is a poor channel for letting readers quickly and easily compare the GDP of different countries, and the inconsistent polygon shapes made this even harder; if area had to be used, simple squares would have been easier to compare. For example, ‘Singapore’ and ‘Hong Kong’ share an identical value of $0.37T, yet are drawn as completely different shapes.
The data labels were poorly formatted, splitting words such as ‘Switzerland’ across multiple lines and making them difficult to read. The labels were also inconsistent: ‘Australia’ was the only label set at an angle, and labels placed on top of the graph used different colours from those placed outside it.
The title was centred rather than left-aligned, and its text was smaller than the data values for the United States. This gave the chart no clear visual hierarchy, making its most important features harder to interpret.
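The first issue can be made concrete with a little arithmetic. The R sketch below uses two hypothetical values (illustrative numbers, not figures taken from the chart) to compare a length encoding with a square-area encoding. With proportional area, a square's side grows only with the square root of the value, so a fourfold difference in GDP appears as only a twofold difference in width.

```r
# Hypothetical GDP values in $USD trillion (illustrative only, not from the chart)
gdp_values <- c(country_a = 4, country_b = 1)

# How many times larger the first value is than the second
value_ratio <- unname(gdp_values["country_a"] / gdp_values["country_b"])

# Bar chart: bar length is proportional to the value, so lengths differ by the same ratio
bar_length_ratio <- value_ratio

# Square area encoding: area is proportional to the value,
# so the side length the reader compares grows only with the square root
square_side_ratio <- sqrt(value_ratio)

cat("Value ratio:", value_ratio, "\n")
cat("Bar length ratio:", bar_length_ratio, "\n")
cat("Square side ratio:", square_side_ratio, "\n")
```

Irregular polygons lose even this square-root relationship, because neither their width nor their height is tied directly to the value.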

Reference

Irena (2020). The World Economy in One Chart: GDP by Country. Retrieved November 10, 2022, from Howmuch.net website: https://howmuch.net/articles/the-world-economy-2019

Code

The following code was used to fix the issues identified in the original.

library(here)
library(magrittr)
library(readr)
library(dplyr)
library(ggplot2)
library(RColorBrewer)
library(knitr)

# Importing dataset into R
gdp_df <- read_csv(here("data", "API_NY.GDP.MKTP.CD_DS2_en_csv_v2_4701247.csv"), skip = 4)

# Importing the second dataset (country metadata, including Region)
countrycode <- read_csv(here("data", "Metadata_Country_API_NY.GDP.MKTP.CD_DS2_en_csv_v2_4701247.csv"))

# Viewing variable names
names(gdp_df)

##  [1] "Country Name"   "Country Code"   "Indicator Name" "Indicator Code"
##  [5] "1960"           "1961"           "1962"           "1963"          
##  [9] "1964"           "1965"           "1966"           "1967"          
## [13] "1968"           "1969"           "1970"           "1971"          
## [17] "1972"           "1973"           "1974"           "1975"          
## [21] "1976"           "1977"           "1978"           "1979"          
## [25] "1980"           "1981"           "1982"           "1983"          
## [29] "1984"           "1985"           "1986"           "1987"          
## [33] "1988"           "1989"           "1990"           "1991"          
## [37] "1992"           "1993"           "1994"           "1995"          
## [41] "1996"           "1997"           "1998"           "1999"          
## [45] "2000"           "2001"           "2002"           "2003"          
## [49] "2004"           "2005"           "2006"           "2007"          
## [53] "2008"           "2009"           "2010"           "2011"          
## [57] "2012"           "2013"           "2014"           "2015"          
## [61] "2016"           "2017"           "2018"           "2019"          
## [65] "2020"           "2021"           "...67"

# Joining the 2 datasets together
gdp_df %<>% left_join(countrycode, by = "Country Code")

# Inspecting which rows have an NA Region
gdp_dftest <- gdp_df[is.na(gdp_df$Region), ]

# Excluding aggregate rows to check whether any genuine countries would be lost if every row with an NA Region were dropped
gdp_dftest <- gdp_dftest[!grepl("aggregate", gdp_dftest$SpecialNotes), ]

# Removing rows with an NA Region (aggregates) or a missing 2019 GDP value
gdp_df <- gdp_df[!is.na(gdp_df$Region) & !is.na(gdp_df$`2019`), ]

# Creating a variable for each country's percentage of total world GDP
gdp_df %<>% mutate(Percent = round(100 * `2019` / sum(`2019`), 2))

# Selecting only the variables used in the visualisation
gdp_df %<>% select(`Country Name`, Region, `2019`, Percent)

# Keeping countries with at least 0.32% of world GDP
gdp_dffinal <- filter(gdp_df, Percent >= 0.32)

# Collapsing every country below 0.32% into a single Rest of the World row
othercountries <- tibble(`Country Name` = "Rest of the World",
                         Region = "Rest of the World",
                         `2019` = sum(gdp_df$`2019`[gdp_df$Percent < 0.32]),
                         Percent = round(sum(gdp_df$Percent[gdp_df$Percent < 0.32]), 2))

# Appending the Rest of the World row to the countries at or above 0.32%;
# bind_rows() keeps 2019 and Percent numeric, so no type conversion is needed
gdp_dffinal %<>% bind_rows(othercountries)

# Converting Region and Country Name variables to factor, to make them easier to deal with in ggplot
gdp_dffinal[c("Region", "Country Name")] %<>% lapply(as.factor)

# Converting GDP to trillions of US dollars, rounded to 2 decimal places
gdp_dffinal %<>% mutate(`2019` = round(`2019` / 1e12, 2))

# Changing variable names to be more reflective of the data
names(gdp_dffinal) <- c("Country", "Region", "GDP", "Percentage")

# Creating the ggplot: horizontal bars ranked by GDP, coloured by region and
# labelled with each country's percentage of world GDP
gdp <- ggplot(gdp_dffinal, aes(x = reorder(Country, GDP), y = GDP, fill = Region,
                               label = paste0(Percentage, "%"))) +
  geom_col() +
  coord_flip() +
  scale_fill_brewer(palette = "Dark2") +
  geom_text(hjust = -0.1, size = 3.5) +
  labs(title = "Countries and their Gross Domestic Product in $USD Trillion",
       subtitle = "Ranking Countries' GDP and their Percentage of the World's total GDP",
       x = "Country", y = "GDP ($USD Trillion)") +
  theme(plot.title = element_text(size = 38, face = "bold"),
        plot.subtitle = element_text(size = 34),
        axis.text = element_text(size = 16),
        axis.title = element_text(size = 30, face = "bold"),
        legend.text = element_text(size = 16),
        legend.title = element_text(size = 20))
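The data preparation above drops aggregate rows, computes each country's share of world GDP, keeps countries at or above 0.32% and collapses the remainder into a single Rest of the World row. The sketch below runs the same sequence of steps on a small made-up table so the logic can be checked by hand. The country names, codes, regions and GDP figures are hypothetical, and the steps are written as one dplyr pipeline rather than as the exact code used above.

```r
library(dplyr)

# Hypothetical GDP table (values in $USD); names and figures are made up
gdp_toy <- tibble(
  `Country Name` = c("Country A", "Country B", "Country C", "Country D", "Country E", "World"),
  `Country Code` = c("AAA", "BBB", "CCC", "DDD", "EEE", "WLD"),
  `2019`         = c(9000, 950, 40, 6, 4, 10000)
)

# Hypothetical metadata: aggregates such as "World" have no Region
meta_toy <- tibble(
  `Country Code` = c("AAA", "BBB", "CCC", "DDD", "EEE", "WLD"),
  Region         = c("Region 1", "Region 2", "Region 1", "Region 2", "Region 1", NA)
)

threshold <- 0.32  # percent of world GDP, as in the report

shares <- gdp_toy %>%
  left_join(meta_toy, by = "Country Code") %>%
  # Drop aggregates (NA Region) and countries with no 2019 value
  filter(!is.na(Region), !is.na(`2019`)) %>%
  mutate(Percent = round(100 * `2019` / sum(`2019`), 2))

# Countries at or above the threshold keep their own row
major <- shares %>% filter(Percent >= threshold)

# Everything below the threshold is collapsed into one row
rest <- shares %>%
  filter(Percent < threshold) %>%
  summarise(`Country Name` = "Rest of the World",
            Region = "Rest of the World",
            `2019` = sum(`2019`),
            Percent = round(sum(Percent), 2))

final <- bind_rows(select(major, `Country Name`, Region, `2019`, Percent), rest)
print(final)
```

With these numbers, Country D and Country E (0.06% and 0.04%) fall below the threshold and appear only within the Rest of the World row. The World aggregate is removed before any shares are computed, so it does not inflate the denominator.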

Data Reference

Irena (2020). The World Economy in One Chart: GDP by Country. Retrieved November 10, 2022, from Howmuch.net website: https://howmuch.net/articles/the-world-economy-2019
The World Bank (2022). GDP (current US$) [Data set]. Retrieved December 10, 2021, from The World Bank website: https://data.worldbank.org/indicator/NY.GDP.MKTP.CD?most_recent_value_desc=true

Reconstruction

The following plot fixes the main issues in the original.


Deconstruct, Reconstruct Web Report

Ben Frazer (S3958204)
