Objective
The analysed visualisation used World Bank data and was published on howmuch.net. The website aims to present financial information through reports and data, and as a result it curates a collection of data visualisations. In that context, the report and its visualisation aimed to show world GDP before the coronavirus pandemic, and in particular the dominance of the US economy. This would provide a baseline for further investigation and for comparison with data from later years, tracking the course of GDP through the pandemic. The target audience appears to be economists and anyone interested in the financial impact of the pandemic.
The visualisation chosen had the following three main issues:
Reference
The following code was used to fix the issues identified in the original.
library(here)
library(magrittr)
library(readr)
library(dplyr)
library(ggplot2)
library(RColorBrewer)
library(knitr)
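# Importing the World Bank GDP dataset into R (skip = 4 skips the metadata lines at the top of the file)
gdp_df <- read_csv(here("data", "API_NY.GDP.MKTP.CD_DS2_en_csv_v2_4701247.csv"), skip = 4)
# Importing the second dataset: country metadata, including each country's Region
countrycode <- read_csv(here("data", "Metadata_Country_API_NY.GDP.MKTP.CD_DS2_en_csv_v2_4701247.csv"))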
# Viewing variable names
names(gdp_df)
## [1] "Country Name" "Country Code" "Indicator Name" "Indicator Code"
## [5] "1960" "1961" "1962" "1963"
## [9] "1964" "1965" "1966" "1967"
## [13] "1968" "1969" "1970" "1971"
## [17] "1972" "1973" "1974" "1975"
## [21] "1976" "1977" "1978" "1979"
## [25] "1980" "1981" "1982" "1983"
## [29] "1984" "1985" "1986" "1987"
## [33] "1988" "1989" "1990" "1991"
## [37] "1992" "1993" "1994" "1995"
## [41] "1996" "1997" "1998" "1999"
## [45] "2000" "2001" "2002" "2003"
## [49] "2004" "2005" "2006" "2007"
## [53] "2008" "2009" "2010" "2011"
## [57] "2012" "2013" "2014" "2015"
## [61] "2016" "2017" "2018" "2019"
## [65] "2020" "2021" "...67"
# Joining the 2 datasets together
gdp_df %<>% left_join(countrycode, by = "Country Code")
# Inspecting the rows that have an NA for Region
gdp_dftest <- gdp_df[is.na(gdp_df$Region), ]
# Excluding aggregates (e.g. regional and income groups) to check whether any important observations would be lost by dropping every row with an NA for Region
gdp_dftest <- gdp_dftest[!grepl("aggregate", gdp_dftest$SpecialNotes), ]
# Removing every row with an NA for Region or for 2019 GDP
gdp_df <- gdp_df[!is.na(gdp_df$Region) & !is.na(gdp_df$`2019`), ]
# Creating a variable for each country's percentage of total world GDP
gdp_df %<>% mutate(Percent = round(100 * `2019` / sum(`2019`), 2))
# Selecting only the variables used in the visualisation
gdp_df %<>% select(`Country Name`, Region, `2019`, Percent)
# Keeping the countries with at least 0.32% of world GDP
gdp_dffinal <- filter(gdp_df, Percent >= 0.32)
# Summing the GDP and percentage of all countries below 0.32% into a single "Rest of the World" row (used as both Country Name and Region)
othercountries <- c("Rest of the World", "Rest of the World", round(sum(gdp_df$'2019'[gdp_df$'Percent' < 0.32]), 2), round(sum(gdp_df$'Percent'[gdp_df$'Percent' < 0.32]), 2))
# Appending the Rest of the World row to the countries at or above 0.32%
gdp_dffinal %<>% rbind(othercountries)
# Converting 2019 and Percent back to numeric, because appending the character vector othercountries with rbind() leaves these columns as character
gdp_dffinal[c("2019","Percent")] %<>% lapply(as.numeric)
# Converting Region and Country Name variables to factor, to make them easier to deal with in ggplot
gdp_dffinal[c("Region", "Country Name")] %<>% lapply(as.factor)
# Converting GDP to trillions of US dollars, rounded to 2 decimal places
gdp_dffinal %<>% mutate(`2019` = round(`2019` / 1e12, 2))
# Changing variable names to be more reflective of the data
names(gdp_dffinal) <- c("Country", "Region", "GDP", "Percentage")
# Creating ggplot: horizontal bars ordered by GDP, filled by region and labelled with each country's share of world GDP
gdp <- ggplot(gdp_dffinal, aes(x = reorder(Country, GDP), y = GDP, fill = Region,
                               label = paste0(Percentage, "%"))) +
  geom_col() +
  coord_flip() +
  scale_fill_brewer(palette = "Dark2") +
  geom_text(hjust = -0.1, size = 3.5) +
  labs(x = "Country", y = "GDP ($USD Trillion)",
       title = "Countries and their Gross Domestic Product in $USD Trillion",
       subtitle = "Ranking Countries' GDP and their Percentage of the World's total GDP") +
  theme(plot.title = element_text(size = 38, face = "bold"),
        plot.subtitle = element_text(size = 34),
        axis.text = element_text(size = 16),
        axis.title = element_text(size = 30, face = "bold"),
        legend.text = element_text(size = 16),
        legend.title = element_text(size = 20))
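Building the "Rest of the World" row as a character vector works, but it forces the later conversion back to numeric. The base-R sketch below shows one way to perform the same grouping step while keeping the columns numeric. It assumes a data frame with `Country` and `Region` columns and one numeric value column. Because the aggregate row is built as a one-row data frame, `rbind()` keeps the numeric types intact. The function name `lump_small()` and the toy data are hypothetical and do not come from the original analysis. Unlike the original code, this sketch rounds percentages only at the end, so a country sitting exactly on the 0.32% boundary could be classified differently.

```r
# Hypothetical helper: lump every row whose share of the total is below
# `threshold` (in percent) into one "Rest of the World" row, keeping the
# value and percentage columns numeric. Assumes `Country` and `Region` columns.
lump_small <- function(df, value_col, threshold = 0.32,
                       other_label = "Rest of the World") {
  pct <- 100 * df[[value_col]] / sum(df[[value_col]])
  keep <- pct >= threshold

  # Rows at or above the threshold, with their unrounded percentage
  out <- df[keep, c("Country", "Region", value_col)]
  out$Percent <- pct[keep]

  # The aggregate row is a one-row data frame, not a character vector,
  # so rbind() does not coerce the numeric columns to character
  other <- data.frame(Country = other_label, Region = other_label,
                      stringsAsFactors = FALSE)
  other[[value_col]] <- sum(df[[value_col]][!keep])
  other$Percent <- sum(pct[!keep])

  out <- rbind(out, other)
  out$Percent <- round(out$Percent, 2)  # round only at the end
  rownames(out) <- NULL
  out
}

# Toy data (hypothetical values, not World Bank figures)
toy <- data.frame(
  Country = c("A", "B", "C", "D"),
  Region  = c("R1", "R1", "R2", "R2"),
  GDP     = c(600, 398, 1, 1),
  stringsAsFactors = FALSE
)
res <- lump_small(toy, "GDP")
print(res)
```

Because the GDP and percentage columns stay numeric throughout, a step like `lapply(as.numeric)` would no longer be needed.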
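The bar labels are built by pasting each rounded percentage onto a "%" sign. Since `round()` does not pad trailing zeros, a share of 0.50 is printed as "0.5%", while neighbouring bars show two decimals. A small sketch of a fixed-width alternative uses base R's `sprintf()`. The helper name `format_pct()` is hypothetical.

```r
# Hypothetical helper: format shares as percentage labels with exactly
# two decimal places. sprintf("%.2f%%", x) keeps trailing zeros, whereas
# paste0(round(x, 2), "%") would turn 0.5 into "0.5%".
format_pct <- function(x) sprintf("%.2f%%", x)

labels <- format_pct(c(1.234, 0.5, 12))
print(labels)  # "1.23%" "0.50%" "12.00%"
```

In the plot, `format_pct()` applied to the Percentage column could replace the `paste0()` call that builds the `label` aesthetic.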
Data Reference
The following plot fixes the main issues in the original.