Recreation of Original Plot

Below is my recreation of this plot from a 2020 HelpMeViz post. It is a time series plot of the average household income among different age groups in the Brazilian state of Minas Gerais. The line marked ‘Total’ in the legend represents the average household income across all age groups. Note that although I used a similar color scale, the line for the 0 to 13 age group is very light in my recreation; it is not so light in the original, though the original has other visibility problems that I discuss in my critique. Also note that the axis labels and legend elements have been translated from Portuguese to English using DeepL.
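To make the ‘Total’ line concrete: in the plotting code it comes from a mean taken over all age-group rows within each year. The sketch below reproduces that calculation in base R on a small made-up data frame. The `toy` values and group names are placeholders, not figures from the PNAD data, and the sketch assumes one row per age group per year.

```r
# Made-up example data (placeholder values, not from the original dataset)
toy <- data.frame(
  Year   = c(2012, 2012, 2013, 2013),
  Age    = c("Group A", "Group B", "Group A", "Group B"),
  Income = c(1000, 2000, 1200, 2200)
)

# Mean income across all age groups within each year
total <- tapply(toy$Income, toy$Year, mean)
print(total)
```

Under that assumption, this is an unweighted mean of the group values: each age group counts equally regardless of how many households it contains.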

My Version and Critique

I found there to be much room for improvement on the original plot. Below is my version:

The original plot has a few positives, but unfortunately uses some design practices that take away from its usefulness as a visual.

The main issue with the original representation is that the author used text objects to label the value at each point. While it is understandable to want a high level of specificity in a visualization, the data points are close together in some places, which makes the labels hard to read. On top of this, the original plot uses a color palette made up of blues. That choice makes the increasing order of the age groups easy to notice, but it also makes the lines and text objects blend into each other, further reducing readability. Chapter 19.1 of the Wilke textbook states: “We should use color to enhance figures and make them easier to read, not to obscure the data by creating visual puzzles”. The original visualization produces one such “visual puzzle”.

To alleviate these problems, I removed the text objects and, in their place, restored the tick values along the y-axis as well as the major horizontal gridlines. A little specificity was lost, but readability was greatly improved. In addition, I replaced the color palette with one of much more distinct colors that are also colorblind- and greyscale-friendly, so there is no more “visual puzzle”. The “Total” line is now dashed for further accessibility, because I chose to retain its blue color, which is similar to that of the line below it. I have also applied a simple theme that makes the plot more pleasing to look at without distracting from the data at hand.
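One rough way to check how well a set of colors separates in greyscale is to convert each color to a single brightness value and see how far apart the values are. The sketch below does this in base R with the common 0.299/0.587/0.114 luma weights. The palette here is base R's Okabe-Ito set (available in R 4.0.0 and later), used only as a stand-in; it is an assumption for illustration and is not the palette used in my plot.

```r
# Stand-in palette (assumption): base R's Okabe-Ito colors, NOT the plot's palette
pal <- palette.colors(6, palette = "Okabe-Ito")

# col2rgb() returns a 3 x n matrix with rows red, green, blue on a 0-255 scale
rgb_vals <- col2rgb(pal)

# Approximate greyscale brightness (luma) of each color
luma <- drop(c(0.299, 0.587, 0.114) %*% rgb_vals)
names(luma) <- unname(pal)

# Brightness values from darkest to lightest
print(sort(round(luma)))

# Smallest gap between neighboring brightness values;
# a small gap suggests two colors may be hard to tell apart in greyscale
print(min(diff(sort(luma))))
```

This only approximates how a printer or screen renders grey, so it is best treated as a quick screen rather than a guarantee.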

The original plot attempts to be very minimalist, which is no problem in itself, but its removal of gridlines and addition of text objects in their place is distracting and hard to read, which goes against the Principle of Proportional Ink. My redesign is as minimal as possible while still conveying the data and its message effectively.

R Code Appendix

# Setup chunk
library(ggplot2)
library(tidyverse)

# Code for my recreation of the original plot
avgInc <- read_csv("Average-income-by-Age.csv")

ggplot(avgInc, aes(x = Year, y = `Average Income`)) + 
  geom_line(aes(color = Age)) + 
  stat_summary(aes(group = 1, color = 'Total'), fun = 'mean', geom = 'line', linewidth = 1) +  # Obj. for 'Total'; linewidth replaces size for lines (ggplot2 >= 3.4.0)
  geom_text(aes(color = Age, label = round(`Average Income` * .001, 2)), vjust = -.5) +        # Text values on points, converted to thousands and rounded; vjust is a constant, so it sits outside aes()
  scale_color_brewer(palette = 'Blues', labels = c('0 to 13 years', '14 to 17 years',          # Applies color and labels to be used in legend
                                                   '18 to 24 years', '25 to 59 years',         # 'Total' on the end applies to the stat_summary line 
                                                   'More than 60 years', 'Total')) +
  scale_x_continuous(breaks = seq(2012, 2019, by = 1)) +                                       # Making x scale consistent with original
  theme_linedraw() + 
  theme(
    legend.position = 'bottom',
    legend.box = 'horizontal', 
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.text.y = element_blank()                                                              # Removing y-axis value labels to be consistent with original
  ) +
  labs(
    y = 'Avg. Household Income Per Capita (tsd. R$)', x = 'Year',
    color = '', 
    caption = 'Source: Microdados de PNAD Contínua (2012 to 2019)'
  )

# Code for my own version of the plot
library(ggthemes) # Want to use a special theme
library(paletteer) # And find a color palette that's more discernible while also being colorblind friendly
library(showtext) # Also using different fonts

font_add_google('Alegreya Sans')
showtext_auto()

avgInc <- avgInc %>% mutate(incomeTsds = `Average Income` * .001)  # A simpler way of getting values into the thousands than I used on the first plot: store them as a column
ggplot(avgInc, aes(x = Year, y = incomeTsds)) + 
  geom_line(aes(color = Age)) + 
  stat_summary(aes(group = 1, color = 'Total'), fun = 'mean', geom = 'line', linetype = 'dashed', linewidth = 1) + 
  scale_color_paletteer_d('khroma::mediumcontrast', labels = c('0 to 13 years', '14 to 17 years',
                                                   '18 to 24 years', '25 to 59 years',   
                                                   'More than 60 years', 'Total')) +
  scale_x_continuous(breaks = seq(2012, 2019, by = 1)) + 
  scale_y_continuous(breaks = seq(.75, 2.0, by = .25)) + # Setting y-axis tick values every 0.25 (breaks alone do not change the axis limits)
  theme_stata() + 
  theme(
    legend.position = 'bottom',
    legend.box = 'horizontal', 
    panel.grid.major.x = element_blank(),
    text = element_text(size = 14, family = 'Alegreya Sans')  # Set directly here; a theme() call nested inside theme() is not a valid element
  ) +
  labs(
    title = 'The Average Household Income in Minas Gerais Increases with Age',
    subtitle = 'From 2012 to 2019',
    y = 'Avg. Household Income Per Capita (tsd. R$)', x = 'Year',
    color = '', 
    caption = 'Source: Microdados de PNAD Contínua (2012 to 2019)'
  )