Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: AODStats By Turning Point, Treatment Services (2022).


Objective

According to Kirk (2019), a data visualisation is “The visual representation and presentation of data to facilitate understanding”. Presenting data visually rather than as raw values offers an opportunity to communicate it more effectively, aiding and simplifying the viewer's interpretation.

The data visualisation presented is a count of ‘Episodes of Care’ within Victorian Alcohol and Other Drug (AOD) Treatment Services by financial year. For the purposes of this critique, the data presented is the ‘Time Series’ view for financial years 2015 to 2019. The objective of the chart is to provide a range of statistics across the treatment domain.

The target audiences for the information are:

  • AOD service providers - with the requirement to gain a broader understanding of treatment trends within the state.
  • Health agencies - with the requirement for strategic decision making and for advising services, ensuring alignment with industry trends.
  • Research bodies within the AOD field - with the requirement to conduct or contribute to research.

The AODStats (2022) visualisation takes an interactive dashboard approach and therefore addresses a non-specific, high-level question about the data domain. This approach can cater for multiple audiences, but it can also pose difficulties given the variety of audience roles and proficiencies, and the heterogeneous objectives, characteristics and preferences for obtaining data (Vázquez-Ingelmo et al. 2021).

One aspect of the data visualisation is the ability to filter and interact with the data using the fields at the top of the chart. Assuming the persona of an AOD service provider, my requirement would be to better understand the trends for specific substances and compare them with the trends I may be experiencing in my service. As a subject matter expert within the AOD field, I know that Alcohol, Amphetamines, Cannabis and Heroin are among the most prevalent primary substances seen within the treatment field.

To obtain the trends for these substances, I need to filter and display four different visualisations, and then mentally reconstruct a side-by-side comparison of how each substance has trended over time. This makes the visualisation ineffective for this purpose.


Source: AODStats By Turning Point, Treatment Services (2022).


When critiquing data visualisations, the Trifecta Checkup (Junk Charts 2019) provides a useful framework for initial evaluation. The checkup asks the following questions:

  1. What is the practical question?
  2. What does the chart say?
  3. What does the data say?

The selected visualisation (despite the interactive nature of the dashboard overall) aims to answer a question about the number of AOD treatment episodes by financial year. The question is answered, but not directly enough: the chart is missing elements needed to give a quick and accurate answer. As we switch context within the chart, in this example between different substances, the y-axis and the plotted values change, which makes further interpretation difficult. It is therefore hard to easily comprehend both what the chart says and what the data says.

The selected visualisation has numerous issues. This critique focuses on the following three:

  1. Truncated y-axis - the y-axis does not start from zero. When comparing visualisations, this gives a distorted view of the amounts and of the differences between financial years. In particular, when switching between substance types, the starting point of the y-axis keeps re-calibrating, giving the audience no context for the actual amounts, only for the trends and changes over time.
  2. Poor labelling - there are no data labels on the data points, making it difficult to read the amount for each financial year from the y-axis. If I were to present this visualisation without labels and only communicate the amounts verbally, it would weaken the explanation and the audience's ability to quickly assimilate what is presented.
  3. No context - when viewing individual substances, the y-axis changes, and without a side-by-side comparison with other substances, the context of the actual amounts is lost. Without this context, the visualisation gives an unclear perspective on the measure it is trying to convey.
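The distortion described in the first issue can be put into numbers. A minimal base R sketch, using the Alcohol sums for FY2016 and FY2019 printed in the Code section and an assumed (hypothetical) axis starting point of 40,000, compares the actual relative change with the change measured from the truncated baseline:

```r
# Alcohol sums for FY2016 and FY2019, as printed in Sum_DrugTypes_ByFY
fy2016 <- 54490
fy2019 <- 44003

# Actual relative change between the two years (zero-based axis)
actual_change <- (fy2019 - fy2016) / fy2016

# Hypothetical truncated axis starting at 40,000: the vertical position of
# each point is measured from this baseline rather than from zero
baseline <- 40000
apparent_change <- ((fy2019 - baseline) - (fy2016 - baseline)) / (fy2016 - baseline)

# How many times larger the drawn change looks than the real change
exaggeration <- apparent_change / actual_change

cat(sprintf("Actual change:   %.1f%%\n", 100 * actual_change))
cat(sprintf("Apparent change: %.1f%%\n", 100 * apparent_change))
cat(sprintf("Exaggeration:    %.1fx\n", exaggeration))
```

With these numbers, a fall of about 19% would look like a fall of about 72%, because the distance from the baseline shrinks far more than the values themselves.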

From an ethical and data integrity perspective, the dashboard already outlines the limitations of the dataset in notes at the end of the page. These include missing data and inadequate coding, and the notes recommend using the data only as intended, which is to track trends of acute harms within a community population. The dataset also only covers 2010 to 2019, which is a limitation when compared with similar data published on the websites of other states.
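Missing data also matters for the `as.double()` step in the Code section: any non-numeric entry in the value column becomes `NA`, and `sum()` then returns `NA` for the whole group. A minimal sketch, assuming a hypothetical placeholder string `"n.p."` for a suppressed value (the extract's actual coding of missing values is not shown here):

```r
# Hypothetical column as read from the spreadsheet, with one suppressed value
raw_values <- c("13990", "n.p.", "611")

# as.double() turns non-numeric strings into NA (its warning is silenced here)
values <- suppressWarnings(as.double(raw_values))
print(values)

# By default a single NA makes the whole sum NA...
print(sum(values))

# ...so either drop missing values explicitly or count them first
print(sum(values, na.rm = TRUE))
print(sum(is.na(values)))
```

Whether to drop or impute missing values is a judgment call; dropping them silently understates the counts for the affected years.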

Reference

Code

The following code was used to fix the issues identified in the original.

# Load the required libraries
library(readxl)      # read_excel()
library(data.table)  # data.table(), setnames()
library(tidyverse)   # dplyr verbs, ggplot2 and the %>% pipe
library(scales)      # comma() for number labels

# Load the AOD data
# skip = 7 skips the formatting rows above the column headers;
# n_max = 3406 stops before the blank rows and explanations at the bottom of the sheet
VicAODData <- read_excel("VADC_data_extract_State_Mar2022.xlsx",
    sheet = "Table 1.1_Victoria", col_names = TRUE,
    skip = 7, n_max = 3406)
# Convert to a data.table
VicAODData <- data.table(VicAODData)

# Check the data load
# Check the first and last rows of the dataset to make sure we loaded everything
head(VicAODData, n = 1)
##    DRUG CATEGORY DATASET INDICATOR FINANCIAL YEAR CATEGORY
## 1:       Alcohol    VADC   Session           2019    Total
##                                           DESCRIPTION   UNIT Victoria
## 1: Number of VADC sessions for alcohol-related events Number    13990
tail(VicAODData, n = 1)
##      DRUG CATEGORY DATASET INDICATOR FINANCIAL YEAR CATEGORY
## 1: Pharmacotherapy    VADC   Session           2010   65+yrs
##                                     DESCRIPTION UNIT Victoria
## 1: Rate of VADC sessions per 100,000 population Rate        0
# Perform some tidy up
# Rename the fields we want to use (by name, so the code does not depend on column order)
setnames(VicAODData,
         old = c("DRUG CATEGORY", "FINANCIAL YEAR", "Victoria"),
         new = c("SUBSTANCE", "FY", "VALUE"))

# Make sure the 'VALUE' field is numeric as we want to SUM it
VicAODData$VALUE <- as.double(VicAODData$VALUE)

# Check data
head(VicAODData)
##    SUBSTANCE DATASET INDICATOR   FY CATEGORY
## 1:   Alcohol    VADC   Session 2019    Total
## 2:   Alcohol    VADC   Session 2019     Male
## 3:   Alcohol    VADC   Session 2019   Female
## 4:   Alcohol    VADC   Session 2019  0-19yrs
## 5:   Alcohol    VADC   Session 2019 20-24yrs
## 6:   Alcohol    VADC   Session 2019 25-34yrs
##                                           DESCRIPTION   UNIT VALUE
## 1: Number of VADC sessions for alcohol-related events Number 13990
## 2: Number of VADC sessions for alcohol-related events Number  8486
## 3: Number of VADC sessions for alcohol-related events Number  5307
## 4: Number of VADC sessions for alcohol-related events Number   611
## 5: Number of VADC sessions for alcohol-related events Number   961
## 6: Number of VADC sessions for alcohol-related events Number  2681
# Filter the dataset to contain only the substances we want to view for the comparable financial years
substances <- c("Alcohol", "Amphetamines (Any)", "Cannabis", "Heroin (Any)")
financial_years <- 2015:2019 # FY is numeric in the loaded data
VicAODData_substances <- VicAODData %>%
  filter(SUBSTANCE %in% substances, FY %in% financial_years)
# Check the data
head(VicAODData_substances)
##    SUBSTANCE DATASET INDICATOR   FY CATEGORY
## 1:   Alcohol    VADC   Session 2019    Total
## 2:   Alcohol    VADC   Session 2019     Male
## 3:   Alcohol    VADC   Session 2019   Female
## 4:   Alcohol    VADC   Session 2019  0-19yrs
## 5:   Alcohol    VADC   Session 2019 20-24yrs
## 6:   Alcohol    VADC   Session 2019 25-34yrs
##                                           DESCRIPTION   UNIT VALUE
## 1: Number of VADC sessions for alcohol-related events Number 13990
## 2: Number of VADC sessions for alcohol-related events Number  8486
## 3: Number of VADC sessions for alcohol-related events Number  5307
## 4: Number of VADC sessions for alcohol-related events Number   611
## 5: Number of VADC sessions for alcohol-related events Number   961
## 6: Number of VADC sessions for alcohol-related events Number  2681
# Sum VALUE for each SUBSTANCE and FY combination
Sum_DrugTypes_ByFY <- VicAODData_substances %>%
  group_by(SUBSTANCE, FY) %>%
  summarise(max_sum = sum(VALUE))
# Convert the sum to an integer (as.integer() drops any decimal places by truncation)
Sum_DrugTypes_ByFY$max_sum <- as.integer(Sum_DrugTypes_ByFY$max_sum)

# Check the dataset we want to plot
head(Sum_DrugTypes_ByFY, n=10)
## # A tibble: 10 × 3
## # Groups:   SUBSTANCE [2]
##    SUBSTANCE             FY max_sum
##    <chr>              <dbl>   <int>
##  1 Alcohol             2015   52228
##  2 Alcohol             2016   54490
##  3 Alcohol             2017   46455
##  4 Alcohol             2018   44894
##  5 Alcohol             2019   44003
##  6 Amphetamines (Any)  2015   35347
##  7 Amphetamines (Any)  2016   39812
##  8 Amphetamines (Any)  2017   32921
##  9 Amphetamines (Any)  2018   36683
## 10 Amphetamines (Any)  2019   39157
# Total across the selected substances for each FY, for the main 'All Substances' plot
Sum_Treatments_ByFY <- VicAODData_substances %>%
  group_by(FY) %>%
  summarise(total_sum = sum(VALUE))
# Convert the sum to an integer (as.integer() drops any decimal places by truncation)
Sum_Treatments_ByFY$total_sum <- as.integer(Sum_Treatments_ByFY$total_sum)
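One caveat with the grouping steps above: as the printed rows show, each SUBSTANCE and FY combination contains a `Total` row alongside `Male`, `Female` and age-band rows, and the extract also appears to contain rows with UNIT `Rate` (as in the last row printed). Because the filter does not restrict `CATEGORY`, `UNIT` or `INDICATOR`, `sum(VALUE)` adds these breakdowns together with the total. The base R sketch below uses the six Alcohol FY2019 rows printed earlier to show the difference. Restricting to `CATEGORY == "Total"` and `UNIT == "Number"` is one possible fix, offered as a suggestion rather than the author's method; the full set of `INDICATOR` values in the extract is not visible here.

```r
# The six Alcohol FY2019 rows printed by head(VicAODData_substances)
rows <- data.frame(
  SUBSTANCE = "Alcohol",
  FY        = 2019,
  CATEGORY  = c("Total", "Male", "Female", "0-19yrs", "20-24yrs", "25-34yrs"),
  UNIT      = "Number",
  VALUE     = c(13990, 8486, 5307, 611, 961, 2681)
)

# Summing every row, as the grouping step does, mixes the total with its breakdowns
sum_all_rows <- sum(rows$VALUE)

# Keeping only the 'Total' count rows gives one value per substance and year
totals <- subset(rows, CATEGORY == "Total" & UNIT == "Number")
sum_total_only <- sum(totals$VALUE)

cat("All rows:  ", sum_all_rows, "\n")
cat("Total only:", sum_total_only, "\n")
```

In the existing pipeline, the equivalent change would be a further `filter(CATEGORY == "Total", UNIT == "Number")` before `group_by()`, after which the summaries and charts would need to be regenerated. The printed FY2019 Alcohol sum of 44,003, compared with 13,990 in the Total row, is consistent with this double counting.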

Data Reference

Reconstruction

The following plot fixes the main issues in the original.

# Set some shared aesthetics for both plots
line_size <- 1.5
point_size <- 3

# Generate the plot for the main 'All Substances' chart
p1 <- ggplot(Sum_Treatments_ByFY, aes(x = FY, y = total_sum, label = comma(total_sum))) +
  geom_line(linewidth = line_size) +
  geom_point(size = point_size) +
  # Dashed linear (least-squares) trend line
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, linetype = "dashed", linewidth = 1) +
  # Data labels just above each point
  geom_label(vjust = -0.3, colour = "black") +
  theme_light() +
  theme(axis.title.y = element_text(angle = 0)) +
  # Zero-based y-axis so that changes between years are not exaggerated
  scale_y_continuous(limits = c(0, 150000), breaks = seq(0, 150000, by = 25000), labels = comma) +
  labs(
    title = "Victorian Alcohol and Drug related Episodes of Care (by Financial Year)",
    y = "Count of\nEpisodes\nof Care\nAll Substances",
    x = "Financial Year")
p1

# Generate the plot for the 'Individual Substances' chart
p2 <- ggplot(Sum_DrugTypes_ByFY, aes(x = FY, y = max_sum, group = SUBSTANCE, label = comma(max_sum))) +
  geom_line(aes(colour = SUBSTANCE), linewidth = line_size) +
  geom_point(size = point_size) +
  # One dashed linear trend line per substance (grouped by SUBSTANCE)
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, linetype = "dashed", linewidth = 1) +
  geom_label(vjust = -0.3, colour = "black") +
  scale_color_brewer(palette = "BrBG") +
  theme_light() +
  theme(axis.title.y = element_text(angle = 0)) +
  # Shared zero-based y-axis so the substances can be compared directly
  scale_y_continuous(limits = c(0, 60000), breaks = seq(0, 60000, by = 10000), labels = comma) +
  labs(
    title = "Victorian Alcohol and Drug related Episodes of Care (by Substance)",
    y = "Count of\nEpisodes\nof Care by\nSubstance",
    x = "Financial Year")
p2
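The y-axis limits in both plots are hard-coded (150,000 and 60,000). In ggplot2, values outside `limits` are dropped from the plot with a warning, so a fixed ceiling could silently hide points if the data changes. A minimal sketch of deriving a zero-based ceiling and breaks from the data instead, using the Alcohol values printed in the Code section and the 10,000 step used in `p2`; the helper name `zero_based_axis` is hypothetical:

```r
# Hypothetical helper: zero-based axis limits and evenly spaced breaks
# that always cover the largest value in the data
zero_based_axis <- function(values, step) {
  upper <- ceiling(max(values, na.rm = TRUE) / step) * step
  list(limits = c(0, upper), breaks = seq(0, upper, by = step))
}

# Alcohol sums FY2015 to FY2019, as printed in Sum_DrugTypes_ByFY
alcohol <- c(52228, 54490, 46455, 44894, 44003)

axis <- zero_based_axis(alcohol, step = 10000)
print(axis$limits)
print(axis$breaks)
```

The result could then be passed as `scale_y_continuous(limits = axis$limits, breaks = axis$breaks, labels = comma)`, computed over all plotted values rather than a single substance.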

Conclusion

From the reconstructed visualisations, the audience can now immediately see what each chart represents, and the actual values are shown via the data labels. This improves the overall ‘Reading Tone’ (Kirk 2019) of the visualisation by emphasising the values, particularly in the substance comparison chart.

The distortion in the original visualisation, which exaggerated the change over time, is removed by starting the y-axis at zero.

The substances chart also provides context by showing the amounts for each substance side by side, and it likewise eliminates the distortion of change.

With the addition of trend lines, we can immediately see that all substances except ‘Amphetamines (Any)’ trend downwards over the selected years.
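The direction of each dashed trend line can be checked numerically: `geom_smooth(method = "lm")` fits an ordinary least-squares line for each group, so the sign of the slope from `lm()` matches the direction drawn. A minimal sketch using the Alcohol and Amphetamines (Any) sums printed in the Code section (the Cannabis and Heroin values are not printed there, so they are not included):

```r
# Summed episode counts by FY, as printed in Sum_DrugTypes_ByFY
fy <- 2015:2019
alcohol <- c(52228, 54490, 46455, 44894, 44003)
amphetamines <- c(35347, 39812, 32921, 36683, 39157)

# Ordinary least-squares slope (change in count per financial year)
slope <- function(y, x) unname(coef(lm(y ~ x))[2])

alcohol_slope <- slope(alcohol, fy)
amphetamine_slope <- slope(amphetamines, fy)

cat(sprintf("Alcohol:            %.1f per year\n", alcohol_slope))
cat(sprintf("Amphetamines (Any): %.1f per year\n", amphetamine_slope))
```

A negative slope corresponds to a downward trend line and a positive slope to an upward one, which is a quick way to confirm the visual reading.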

References

AODStats By Turning Point, Treatment Services (2022), Victoria State, AODstats - Victorian alcohol and drug statistics website, viewed 22 July 2023, https://aodstats.org.au/explore-data/treatment-services-vadc/

Junk Charts (2019), Junk Charts Trifecta Checkup: The Definitive Guide, https://junkcharts.typepad.com/junk_charts/junk-charts-trifecta-checkup-the-definitive-guide.html

Kirk, A. (2019), Data Visualisation: A Handbook for Data Driven Design, 2nd edn, Sage, London.

Vázquez-Ingelmo, A., García-Peñalvo, F.J. & Therón, R. (2021), “Towards a Technological Ecosystem to Provide Information Dashboards as a Service: A Dynamic Proposal for Supplying Dashboards Adapted to Specific Scenarios”, Applied Sciences, vol. 11, no. 7, art. 3249.