Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: This Chart Shows How GDP Determines Unemployment & Wages Over the Past 20 Years.


Objective

The visualisation analyses the trend in the U.S. GDP growth rate to judge whether a 6% growth rate is reachable, drawing on the possible correlation between GDP growth and two lagging indicators: the unemployment rate and real median household income. It also informs the audience about the health and trajectory of the U.S. economy using historical data from 1997 to 2016.

Because the U.S. is the world's largest economy and the rest of the world depends heavily on it, the analysis targets a wide range of audiences. These include:

Table: Targeted audience

Audience | Interests
Stock traders and businesses who invest in the stock market (American & international) | Investing in or withdrawing investment from U.S. stocks
Investors / venture capitalists | Deciding whether to invest in U.S. businesses
International businesses with trading relationships with American businesses | Monitoring and managing supply / demand risks
U.S. politicians | Crediting or discrediting the government's economic performance
Leaders and finance / trade ministers of other countries | Developing a trade relationship with the U.S., monitoring and managing supply / demand risks and impacts on their own country's economy
Economists, researchers | Professional and educational interests
IMF, World Bank, other investment and commercial banks, currency traders | The U.S. dollar is the world's reserve currency, so the performance of the U.S. economy affects the dollar's value and hence currency trading

The visualisation chosen had the following three main issues:

  • The variables chosen were inappropriate for the objective, so the visualisation fails to answer the practical question. To analyse whether the U.S. can achieve a 6% GDP growth rate (a coincident economic indicator), it uses the unemployment rate and real median household income, which are both lagging indicators. A coincident indicator moves in real time and describes the current state of the economy. A lagging indicator reacts to a change in the economic cycle with a delay: it shifts after the economy changes, so it reflects historical performance and is not useful for predicting future changes. A leading indicator, in contrast, tends to predict future changes in the economic cycle; examples include the stock market, the housing market and the Consumer Confidence Index (CCI). The visualisation therefore omits the critical variables for this analysis: leading indicators. Instead of real median income, a leading indicator such as the CCI could have been used to show how the GDP growth rate follows the CCI (including whether 6% GDP growth is achievable) and how the unemployment rate, a lagging indicator, follows the GDP growth rate.
  • Plotting three data series on a dual-axis graph introduces points of confusion, for example where the real median income line (thousands of U.S. dollars) crosses the jobless rate (%) or GDP growth rate (%) lines in the graph above. Normally, cross-over points carry relevant graphical information. In this visualisation, however, the variables have different units and scales, and the scales themselves are arbitrary, so the cross-over points have hardly any meaning: with different scales the lines would cross elsewhere or not at all.
  • It is hard to resist reading one line against the other line's scale in a dual-axis graph. The content creator partially mitigated this by using colours that visually associate each line with its scale, but the secondary scale can still go unnoticed.
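
The point about arbitrary cross-overs can be checked numerically. The following is a minimal sketch in base R, assuming the secondary (income) axis is mapped linearly onto a primary axis running from 0 to 10%. It uses the first seven yearly values of the unemployment rate and median income from the dataset described in the Code tab; the helper names `rescale_to_primary` and `crossing_years` and the three candidate axis ranges are illustrative assumptions, not taken from the original chart.

```r
# First seven yearly values (1997-2003) of two series from the dataset
year    <- 1997:2003
jobless <- c(4.7, 4.4, 4.0, 3.9, 5.7, 6.0, 5.7)                        # %
income  <- c(57911, 60040, 61526, 61399, 60038, 59360, 59286) / 1000   # USD thousand

# Hypothetical helper: linearly map a secondary-axis value onto the primary axis,
# which is what a dual-axis chart does implicitly
rescale_to_primary <- function(x, sec_range, prim_range = c(0, 10)) {
  (x - sec_range[1]) / diff(sec_range) * diff(prim_range) + prim_range[1]
}

# Hypothetical helper: years after which the two drawn lines swap order,
# i.e. the intervals in which a visual cross-over appears
crossing_years <- function(sec_range) {
  gap <- jobless - rescale_to_primary(income, sec_range)
  year[which(diff(sign(gap)) != 0)]
}

crossing_years(c(55, 65))   # income axis from 55 to 65
crossing_years(c(0, 100))   # income axis from 0 to 100
crossing_years(c(0, 70))    # income axis from 0 to 70
```

With these three axis ranges the same data cross after 1997 and 2000, after 2001 and 2002, or not at all, so the cross-overs say more about the axis choice than about the economy.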

Reference

Code

The following code was used to fix the issues identified in the original.

library(readxl)        # read_excel()
library(tidyverse)     # attaches dplyr, tidyr and ggplot2
library(scales)
library(colourpicker)
library(cowplot)
library(plotly)

setwd("C:/Users/tanto/Desktop/Graduate cert-Data Science/Data Visualisation/Assignments/Assignment2")

US_eco <- read_excel("Dataset/US_eco.xlsx")
glimpse(US_eco)
## Rows: 22
## Columns: 6
## $ Year            <dbl> 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 200...
## $ `Median Income` <dbl> 57911, 60040, 61526, 61399, 60038, 59360, 59286, 59...
## $ DJIA            <dbl> 7437.576, 8610.202, 10474.777, 10688.042, 10139.927...
## $ CCI             <dbl> 102.07988, 102.23355, 102.39961, 102.57573, 100.329...
## $ GDP_growth      <dbl> 4.4, 4.5, 4.8, 4.1, 1.0, 1.7, 2.9, 3.8, 3.5, 2.9, 1...
## $ Jobless_rate    <dbl> 4.7, 4.4, 4.0, 3.9, 5.7, 6.0, 5.7, 5.4, 4.9, 4.4, 5...
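# Reshape CCI, GDP_growth and Jobless_rate (columns 4 to 6) into long format
US_gather <- gather(US_eco,
                    key = variables, value = value,
                    CCI, GDP_growth, Jobless_rate)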
glimpse(US_gather)
## Rows: 66
## Columns: 5
## $ Year            <dbl> 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 200...
## $ `Median Income` <dbl> 57911, 60040, 61526, 61399, 60038, 59360, 59286, 59...
## $ DJIA            <dbl> 7437.576, 8610.202, 10474.777, 10688.042, 10139.927...
## $ variables       <chr> "CCI", "CCI", "CCI", "CCI", "CCI", "CCI", "CCI", "C...
## $ value           <dbl> 102.07988, 102.23355, 102.39961, 102.57573, 100.329...
US_gather$variables %>% unique()
## [1] "CCI"          "GDP_growth"   "Jobless_rate"
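
`gather()` still works but is superseded in current tidyr releases. As a hedged aside, the same wide-to-long reshape could be sketched with `pivot_longer()`; the three-row `toy` data frame below is an illustrative stand-in built from the first values shown in the `glimpse()` output, not the full dataset.

```r
library(tidyr)

# Illustrative stand-in: first three rows of the relevant columns
toy <- data.frame(
  Year         = c(1997, 1998, 1999),
  CCI          = c(102.07988, 102.23355, 102.39961),
  GDP_growth   = c(4.4, 4.5, 4.8),
  Jobless_rate = c(4.7, 4.4, 4.0)
)

# One row per Year and indicator, with the indicator name in `variables`
toy_long <- pivot_longer(toy,
                         cols      = c(CCI, GDP_growth, Jobless_rate),
                         names_to  = "variables",
                         values_to = "value")

unique(toy_long$variables)
```

The two functions order rows differently: `gather()` stacks one whole column after another, while `pivot_longer()` keeps the rows of each year together. The plot is unaffected because ggplot2 groups the lines by `variables`.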
US_gather$variables <- factor(US_gather$variables,
                            levels = c("CCI","GDP_growth", "Jobless_rate"), 
                            labels = c("Consumer\nConfidence\nIndex",
                                "GDP Growth\nRate %",
                                "Unemployment\nRate %"))
                                       
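p1 <- ggplot(data = US_gather, aes(x = Year, y = value, color = variables))

# Both helpers pick a facet-specific range from the minimum value they receive
my_breaks <- function(x) {
  if (min(x) < 0) return(seq(-3, 5, 2))                 # GDP growth rate
  if (min(x) > 0 && min(x) < 10) return(seq(3, 11, 2))  # unemployment rate
  seq(97, 103, 2)                                       # CCI
}
my_limit <- function(x) {
  if (min(x) < 0) return(c(-3, 5))                      # GDP growth rate
  if (min(x) > 0 && min(x) < 10) return(c(3, 11))       # unemployment rate
  c(97, 103)                                            # CCI
}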

# One facet per indicator, each with its own y-axis, instead of a dual-axis chart
p2 <- p1 +
  geom_line(linewidth = 1) +   # linewidth needs ggplot2 >= 3.4.0; use size = 1 on older versions
  geom_point() +
  facet_grid(variables ~ ., switch = "y", scales = "free", labeller = label_value) +
  labs(title = "The Relationship Among Consumer Confidence Index, GDP Growth Rate &\nUnemployment Rate of U.S. in Past 22 Years Since 1997",
       subtitle = "GDP measured in billions of chained 2012 U.S. dollars.",
       caption = "Data Sources:\nOECD, 2020.\nBureau of Economic Analysis, 2020.\nU.S. Bureau of Labor Statistics, 2020.",
       y = "") +
  scale_x_continuous(breaks = 1997:2018,
                     limits = c(1997, 2018)) +
  scale_y_continuous(breaks = my_breaks,
                     limits = my_limit) +
  scale_color_manual(values = c("Consumer\nConfidence\nIndex" = "#0000CD",
                                "GDP Growth\nRate %" = "#31a354",
                                "Unemployment\nRate %" = "#EE1289")) +
  theme_light() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1),
        plot.title = element_text(size = 13, face = "bold"),
        plot.subtitle = element_text(size = 10),
        legend.position = "none",
        strip.text.y = element_text(size = 8, face = "bold"),
        plot.caption = element_text(hjust = 0, size = 8),
        plot.title.position = "plot",
        plot.caption.position = "plot")
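
The per-facet y-axis ranges in the plot come from `my_breaks` and `my_limit`, which branch on the smallest value they are given. The sketch below repeats the two helpers (written with scalar `&&`) so that it runs on its own, and calls them on made-up input ranges chosen to resemble each series. The `*_like` vectors are illustrative assumptions, not values read from the dataset.

```r
# Helpers repeated from the plotting code so this sketch is self-contained
my_breaks <- function(x) {
  if (min(x) < 0) return(seq(-3, 5, 2))
  if (min(x) > 0 && min(x) < 10) return(seq(3, 11, 2))
  seq(97, 103, 2)
}
my_limit <- function(x) {
  if (min(x) < 0) return(c(-3, 5))
  if (min(x) > 0 && min(x) < 10) return(c(3, 11))
  c(97, 103)
}

# Illustrative ranges (assumed, not taken from the data)
gdp_like     <- c(-2.5, 4.8)    # contains a negative value
jobless_like <- c(3.9, 9.6)     # positive and below 10
cci_like     <- c(97.2, 102.6)  # index values near 100

my_limit(gdp_like)       # c(-3, 5)
my_limit(jobless_like)   # c(3, 11)
my_limit(cci_like)       # c(97, 103)
my_breaks(my_limit(gdp_like))
```

The thresholds are hard-coded to this dataset: a series whose minimum is exactly 0, or at least 10, falls through to the CCI range, so the helpers would need adjusting before reuse with other indicators.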

Data Reference

Reconstruction

The following plot fixes the main issues in the original.