Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: BullionVault (2019).


Objective

The objective of the data visualisation was to compare the relative performance of a number of asset classes over a 40-year period, from 1978 to 2017.

The target audience was retail investors, in particular those considering investing in gold or other precious metals as well as those interested in the relative performance of various asset classes over a long timespan.

The visualisation chosen had the following three main issues:

  • The position of an asset from one year to the next does not indicate its performance across those two years; its position is solely a measure of its performance relative to the other assets in that single year. Accordingly, when comparing one year with the next, an asset can move up the ranking while making a significantly lower return, and vice versa. For example, cash was ranked 8th in 1986 and 5th in 1987, yet its return actually fell slightly, from 5.98% to 5.78%. The opportunity to clearly show the historical performance of each asset and its volatility, along with its performance relative to other asset classes, is therefore missed in this visualisation.
  • Inflation is included in the visualisation as an asset class. While this provides a consistent baseline to compare against, inflation is not an asset class that one can invest in; it only measures the reduction in buying power of cash or of an asset that can be converted to cash. Including it as an asset class only shows how the other asset classes ranked against it, not the net performance of each asset class. As inflation varies from year to year, the performance of each asset net of inflation cannot easily be seen. It also adds another variable to a visualisation that is already attempting to compare a significant number of asset classes.
  • When the original data sources for this visualisation were checked, a number of values were found to be incorrect. The differences were generally small and did not consistently favour any particular asset class, so they appear to be due either to the source data being updated since the visualisation was published or to mistakes made in its production, rather than to a deliberate attempt to deceive the reader. For example, some inconsistencies with the published inflation data (Federal Reserve Bank of Minneapolis 2021) were observed. It also appears that data has been estimated for assets that did not exist for part of the time period: the ETF used for Non-US Stocks has an inception date of August 14, 2001 (BlackRock Investments 2021), the index used for Commodities began trading in 1986 (StoneX Financial 2021) and the index used for Home Prices only includes values from January 31, 1987 (S&P Dow Jones Indices 2021). There are also cases in the visualisation where a return was negative but the asset class has been ranked as if the return was positive. For example, in 1990, US stocks, gold and US housing were all ranked as if their returns were positive when in fact they were negative. The issue appears to have arisen because the negative sign is on the line above the number and was missed when the visualisation was built.
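
The first issue can be checked with a short R sketch. It uses only the cash figures quoted above; the data frame and variable names are illustrative and are not part of the original analysis.

```r
# Cash figures quoted in the first issue above
cash <- data.frame(
  Year   = c(1986, 1987),
  Return = c(5.98, 5.78),  # annual return, %
  Rank   = c(8, 5)         # 1 = best-performing asset that year
)

# Year-on-year change in return and in rank
return_change <- diff(cash$Return)  # negative means the return fell
rank_change   <- diff(cash$Rank)    # negative means the asset moved up the ranking

improved_rank <- rank_change < 0
lower_return  <- return_change < 0

# Both are TRUE for cash: a better rank alongside a lower return
cat("Rank improved:", improved_rank, "\n")
cat("Return fell:  ", lower_return, "\n")
```

A chart that encodes only rank would show cash rising between these two years, which is why the reconstruction plots the return itself and uses rank only as a secondary encoding.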

References

Code

The following code was used to fix the issues identified in the original.

library(ggplot2)
library(readxl)
library(tidyr)

# Load data files
USData = read_excel("USData.xlsx")
Rankings = read_excel("USData-Ranking.xlsx")

# Adjust all values for inflation that year
for (j in c(2:41)) {
  for (i in c(1:4, 6:10)) {
    USData[i, j] <- (USData[i, j] - USData[5, j])
  }
}

# Remove the inflation variable as it is no longer needed
USData_corrected <- USData[-c(5), ]

# Convert data to long format
longData1 <- gather(USData_corrected, key = "Year", value = "Return", "1978":"2017")
longRankings <- gather(Rankings, key = "Year", value = "Rank", "1978":"2017")

# Add rankings to data (both long tables share the same asset and year order)
longData1$Rank <- longRankings$Rank
longData1$Rank <- factor(longData1$Rank, levels = c(1:9), ordered = TRUE)

# Convert the Year variable to numeric type
longData1$Year <- as.numeric(longData1$Year)

# Produce plot
p <- ggplot(data = longData1, aes(x = Year, y = Return, fill = Rank))
p <- p + geom_col(data = longData1, aes(x = Year, y = Return)) +
  scale_fill_brewer(type = "seq", palette = 1, direction = -1, aesthetics = "fill") +
  labs(title = "US Dollar Asset Class Performance",
       subtitle = "1978-2017 annual returns (adjusted for inflation) and asset ranking each year",
       x = "", y = "Return %") +
  facet_grid(. ~ Asset) +
  theme_dark() +
  theme(plot.title = element_text(size = rel(2)),
        plot.subtitle = element_text(size = rel(1.4)),
        legend.title = element_text(face = "bold", size = rel(1.1)),
        legend.margin = margin(0, 0, 0, 0, unit = "mm"),
        strip.background = element_rect(colour = "white", fill = "white"),
        strip.text = element_text(colour = "black", face = "bold", size = rel(1.1)),
        panel.grid.minor = element_blank())
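
The inflation adjustment above subtracts each year's inflation rate from the nominal return. As a rough sketch, the same step can be written without loops, and compared with the exact real return, (1 + r) / (1 + i) - 1, which differs slightly from simple subtraction. The figures below are made up for illustration only and are not taken from USData.xlsx; the asset names and object names are likewise hypothetical.

```r
# Hypothetical nominal returns (%), for illustration only; not the real data
returns <- data.frame(
  Asset  = c("Asset A", "Asset B", "Inflation"),
  `1978` = c(10.0, -2.0, 4.0),
  `1979` = c(25.0, 6.0, 12.0),
  check.names = FALSE
)

year_cols    <- setdiff(names(returns), "Asset")
is_inflation <- returns$Asset == "Inflation"

# Nominal returns as a matrix (assets x years) and inflation as a vector by year
nominal   <- as.matrix(returns[!is_inflation, year_cols])
inflation <- unlist(returns[is_inflation, year_cols])
rownames(nominal) <- returns$Asset[!is_inflation]

# Simple subtraction, as in the loop above: nominal return minus inflation
real_simple <- sweep(nominal, 2, inflation, "-")

# Exact real return: (1 + r) / (1 + i) - 1, converted back to a percentage
real_exact <- (sweep(1 + nominal / 100, 2, 1 + inflation / 100, "/") - 1) * 100

print(round(real_simple, 2))
print(round(real_exact, 2))
```

The two methods agree closely when inflation is low and diverge as it rises, so the simple subtraction used in the reconstruction is best read as an approximation of the real return.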

Data References

Reconstruction

The following plot fixes the main issues in the original.