Objective
According to Kirk (2019), a data visualisation is “The visual representation and presentation of data to facilitate understanding”. Using a visualisation rather than displaying raw data provides an opportunity to communicate the data more clearly and to simplify its interpretation by the viewer.
The data visualisation presented is a count of ‘Episodes of Care’ within Victorian Alcohol and Other Drug (AOD) Treatment Services by financial year. For the purposes of this example, the data presented is the ‘Time Series’ visualisation for financial years 2015 to 2019. The objective of the chart is to provide a number of statistics across the treatment domain.
The target audiences for the information are:
The visualisation from AODStats (2022) takes an interactive dashboard approach and therefore addresses a non-specific, high-level question about the data domain. This approach can cater for multiple audiences, but it can also pose difficulties given the variety of audience roles and proficiencies, and the heterogeneous nature of their objectives, characteristics and preferences for obtaining data (Vázquez-Ingelmo et al. 2021).
One aspect of the data visualisation is the ability to filter and interact with the data, seen by the fields presented at the top of the chart. Assuming the persona of an AOD Service Provider, my requirement would be to better understand the trends for specific substances to compare to trends I may be experiencing in my service. For example, given I am a subject matter expert within the AOD field, I know that Alcohol, Amphetamine, Cannabis and Heroin are the more prevalent primary substances seen within the treatment field.
To obtain the trends for these substances, I need to filter and display four different visualisations. I would then need to deconstruct each of them to build a side-by-side comparison of the substances over time, which makes the visualisation ineffective for this purpose.
When critiquing data visualisations, the Trifecta Checkup (Junk Charts 2019) provides a useful framework for initial evaluation. The checkup asks three questions: what is the question, what does the data say, and what does the visual say?
Despite the interactive nature of the dashboard overall, the selected visualisation aims to answer a question about the number of AOD Treatment Episodes by Financial Year. The question is answered, but the visualisation is not direct enough in providing the detail: it is missing vital elements needed for a quick and accurate answer. As we switch context within the chart, in this example between different Substances, the displayed data changes, which makes further interpretation difficult. It is therefore difficult to comprehend both what the chart says and what the data says.
The selected visualisation has numerous issues. This critique focuses on the following three:
From an ethical and data integrity perspective, the visualisation already outlines the limitations of the dataset, which are described at the end of the page. Issues such as missing data and inadequate coding exist, and a recommendation is made to use the data only as intended, which is to track trends of acute harms within a community population. The dataset also covers only 2010 to 2019, which is a limitation when compared with similar data published on other state websites.
Reference
The following code was used to fix the issues identified in the original.
# Load the required libraries
library(dplyr)
library(magrittr)
library(ggplot2)
library(readxl)
library(data.table)
library(tidyverse)
library(scales)
# Load the AOD Data
# Cater for the formatting and also some blank rows and explanations at the bottom of the file
VicAODData <- read_excel("VADC_data_extract_State_Mar2022.xlsx",
                         sheet = "Table 1.1_Victoria", col_names = TRUE,
                         skip = 7, n_max = 3406)
VicAODData <- data.table(VicAODData)
# Check the data load
# Check the first and last rows of the dataset to make sure we loaded everything
firstrow <- head(VicAODData, n = 1)
head(firstrow)
## DRUG CATEGORY DATASET INDICATOR FINANCIAL YEAR CATEGORY
## 1: Alcohol VADC Session 2019 Total
## DESCRIPTION UNIT Victoria
## 1: Number of VADC sessions for alcohol-related events Number 13990
lastrow <- tail(VicAODData, n = 1)
head(lastrow)
## DRUG CATEGORY DATASET INDICATOR FINANCIAL YEAR CATEGORY
## 1: Pharmacotherapy VADC Session 2010 65+yrs
## DESCRIPTION UNIT Victoria
## 1: Rate of VADC sessions per 100,000 population Rate 0
# Perform some tidy up
# Rename the fields we want to use
colnames(VicAODData)[c(1)] <- c("SUBSTANCE") # Was DRUG CATEGORY
colnames(VicAODData)[c(4)] <- c("FY") # Was FINANCIAL YEAR
colnames(VicAODData)[c(8)] <- c("VALUE") # Was Victoria
# Make sure the 'VALUE' field is numeric as we want to SUM it
VicAODData$VALUE <- as.double(VicAODData$VALUE)
# Check data
head(VicAODData)
## SUBSTANCE DATASET INDICATOR FY CATEGORY
## 1: Alcohol VADC Session 2019 Total
## 2: Alcohol VADC Session 2019 Male
## 3: Alcohol VADC Session 2019 Female
## 4: Alcohol VADC Session 2019 0-19yrs
## 5: Alcohol VADC Session 2019 20-24yrs
## 6: Alcohol VADC Session 2019 25-34yrs
## DESCRIPTION UNIT VALUE
## 1: Number of VADC sessions for alcohol-related events Number 13990
## 2: Number of VADC sessions for alcohol-related events Number 8486
## 3: Number of VADC sessions for alcohol-related events Number 5307
## 4: Number of VADC sessions for alcohol-related events Number 611
## 5: Number of VADC sessions for alcohol-related events Number 961
## 6: Number of VADC sessions for alcohol-related events Number 2681
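Converting a column with `as.double()` turns anything it cannot parse into `NA`, so it can be worth counting how many values are lost before summing. The sketch below is a minimal illustration in base R; the `raw_values` vector is hypothetical and is not taken from the extract.

```r
# Hypothetical raw values, as they might arrive from a spreadsheet column read as text
raw_values <- c("13990", "8486", "n/a", "")

# as.double() returns NA for strings it cannot parse (typically with a warning)
converted <- suppressWarnings(as.double(raw_values))

# Count entries that were present in the raw data but became NA after conversion
n_lost <- sum(is.na(converted) & !is.na(raw_values))
n_lost

# sum() returns NA if any NA is present unless na.rm = TRUE is set
sum(converted)
sum(converted, na.rm = TRUE)
```

If `n_lost` is greater than zero for the real column, the affected rows would need to be inspected before the totals are trusted.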
# Filter the data set to contain only the Substances we want to view for the comparable Financial Years
substances <- c("Alcohol", "Amphetamines (Any)", "Cannabis", "Heroin (Any)")
financial_years <- c("2015", "2016", "2017", "2018", "2019")
VicAODData_substances <- VicAODData %>%
  filter(SUBSTANCE %in% substances) %>%
  filter(FY %in% financial_years)
# Check the data
head(VicAODData_substances)
## SUBSTANCE DATASET INDICATOR FY CATEGORY
## 1: Alcohol VADC Session 2019 Total
## 2: Alcohol VADC Session 2019 Male
## 3: Alcohol VADC Session 2019 Female
## 4: Alcohol VADC Session 2019 0-19yrs
## 5: Alcohol VADC Session 2019 20-24yrs
## 6: Alcohol VADC Session 2019 25-34yrs
## DESCRIPTION UNIT VALUE
## 1: Number of VADC sessions for alcohol-related events Number 13990
## 2: Number of VADC sessions for alcohol-related events Number 8486
## 3: Number of VADC sessions for alcohol-related events Number 5307
## 4: Number of VADC sessions for alcohol-related events Number 611
## 5: Number of VADC sessions for alcohol-related events Number 961
## 6: Number of VADC sessions for alcohol-related events Number 2681
# Create a GROUP that adds up all by SUBSTANCE (DRUG CATEGORY) and FINANCIAL YEAR
Sum_DrugTypes_ByFY <- VicAODData_substances %>%
  group_by(SUBSTANCE, FY) %>%
  summarise(max_sum = sum(VALUE))
# Convert the SUM to an Integer so we get rid of the decimal places
Sum_DrugTypes_ByFY$max_sum <- as.integer(Sum_DrugTypes_ByFY$max_sum)
# Check the dataset we want to plot
head(Sum_DrugTypes_ByFY, n=10)
## # A tibble: 10 × 3
## # Groups: SUBSTANCE [2]
## SUBSTANCE FY max_sum
## <chr> <dbl> <int>
## 1 Alcohol 2015 52228
## 2 Alcohol 2016 54490
## 3 Alcohol 2017 46455
## 4 Alcohol 2018 44894
## 5 Alcohol 2019 44003
## 6 Amphetamines (Any) 2015 35347
## 7 Amphetamines (Any) 2016 39812
## 8 Amphetamines (Any) 2017 32921
## 9 Amphetamines (Any) 2018 36683
## 10 Amphetamines (Any) 2019 39157
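A quick way to see what each sum contains is to compare the sum over every row with the 'Total' row alone. The sketch below uses only the six Alcohol 2019 rows printed earlier, rebuilt by hand as a toy data frame (the names `toy`, `sum_all_rows` and `sum_total_only` are illustrative). It is not the full extract, and it assumes the `UNIT` of these rows is "Number" as shown in the output.

```r
# Toy data: the six Alcohol / 2019 rows shown in the head() output above
toy <- data.frame(
  SUBSTANCE = "Alcohol",
  FY        = 2019,
  CATEGORY  = c("Total", "Male", "Female", "0-19yrs", "20-24yrs", "25-34yrs"),
  UNIT      = "Number",
  VALUE     = c(13990, 8486, 5307, 611, 961, 2681)
)

# Sum over every row, which is what sum(VALUE) does within a SUBSTANCE/FY group
sum_all_rows <- sum(toy$VALUE)

# Sum restricted to the 'Total' category and the 'Number' unit
sum_total_only <- sum(toy$VALUE[toy$CATEGORY == "Total" & toy$UNIT == "Number"])

c(all_rows = sum_all_rows, total_only = sum_total_only)
```

In this toy subset the all-row sum (32036) is more than double the 'Total' row (13990), because the sex and age rows repeat sessions already counted in 'Total'. If the same pattern holds in the full extract, filtering to `CATEGORY == "Total"` and `UNIT == "Number"` before summarising may give counts closer to the published figures.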
# Format the data for the main plot
Sum_Treatments_ByFY <- VicAODData_substances %>%
  group_by(FY) %>%
  summarise(total_sum = sum(VALUE))
# Convert the SUM to an Integer so we get rid of the decimal places
Sum_Treatments_ByFY$total_sum <- as.integer(Sum_Treatments_ByFY$total_sum)
Data Reference
The following plot fixes the main issues in the original.
# Set some variables for the plot
line_size <- 1.5
point_size <- 3
# Generate the plot for the main 'All Substances' chart
# (label position is set in geom_label; ggplot() itself takes no hjust argument)
p1 <- ggplot(Sum_Treatments_ByFY, aes(x = FY, y = total_sum, label = comma(total_sum))) +
  geom_line(size = line_size) +
  geom_point(size = point_size) +
  geom_smooth(method = lm, se = FALSE, linetype = "dashed", size = 1) +
  geom_label(hjust = 0.5, vjust = -0.3, colour = "black") +
  scale_color_brewer(palette = "BrBG") +
  theme_light() +
  theme(axis.title.y = element_text(angle = 0)) +
  scale_y_continuous(limits = c(0, 150000), breaks = c(0, 25000, 50000, 75000, 100000, 125000, 150000)) +
  labs(
    title = "Victorian Alcohol and Drug related Episodes of Care (by Financial Year)",
    y = "Count of\nEpisodes\nof Care\nAll Substances",
    x = "Financial Year")
p1
# Generate the plot for the 'Individual Substances' chart
# (label position is set in geom_label; ggplot() itself takes no hjust argument)
p2 <- ggplot(Sum_DrugTypes_ByFY, aes(x = FY, y = max_sum, group = SUBSTANCE, label = comma(max_sum))) +
  geom_line(aes(colour = SUBSTANCE), size = line_size) +
  geom_point(size = point_size) +
  geom_smooth(method = lm, se = FALSE, linetype = "dashed", size = 1) +
  geom_label(hjust = 0.5, vjust = -0.3, colour = "black") +
  scale_color_brewer(palette = "BrBG") +
  theme_light() +
  theme(axis.title.y = element_text(angle = 0)) +
  scale_y_continuous(limits = c(0, 60000), breaks = c(0, 10000, 20000, 30000, 40000, 50000, 60000)) +
  labs(
    title = "Victorian Alcohol and Drug related Episodes of Care (by Substance)",
    y = "Count of\nEpisodes\nof Care by\nSubstance",
    x = "Financial Year")
p2
Conclusion
From the reconstructed visualisations, the audience can now immediately see what the chart represents and read the actual values from the data labels. This improves the overall ‘Reading Tone’ (Kirk 2019) of the visualisation by emphasising the values, particularly in the Substance comparison chart.
The original visualisation exaggerated the change over time. Starting the y-axis at zero removes this distortion, so the change is now represented accurately.
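The effect of the axis baseline can be put into numbers. The sketch below takes the 2015 and 2019 Alcohol values from the summary output shown earlier and compares the relative change in plotted height when the axis starts at zero with the change when it starts at a truncated baseline. The baseline of 40000 and the helper `apparent_change` are assumptions made for illustration; the original chart's actual axis minimum is not reproduced here.

```r
# Two values taken from the summary output (Alcohol, 2015 and 2019)
first_value <- 52228
last_value  <- 44003

# Hypothetical truncated baseline, chosen only to illustrate the effect
truncated_baseline <- 40000

# Relative change in plotted height when the axis starts at a given baseline
apparent_change <- function(first, last, baseline) {
  ((last - baseline) - (first - baseline)) / (first - baseline)
}

change_from_zero <- apparent_change(first_value, last_value, 0)
change_truncated <- apparent_change(first_value, last_value, truncated_baseline)

# Express both as percentages
round(100 * c(zero_baseline = change_from_zero,
              truncated_baseline = change_truncated), 1)
```

With a zero baseline the plotted height falls by about 16%, which matches the change in the data. With the assumed truncated baseline the same data appears to fall by about 67%.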
The context provided by the Substances chart better conveys the amounts represented by each individual substance, and the distortion of change is eliminated there as well.
With the addition of trend lines we can immediately see that all substances except ‘Amphetamines (Any)’ are trending down between the years selected.
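The direction of each trend line can also be checked numerically. The sketch below fits the same kind of linear model that `geom_smooth(method = lm)` draws, using only the ten values printed in the summary output earlier (Alcohol and Amphetamines). Cannabis and Heroin are omitted because their values are not shown in that output, and the names `trend_data` and `slopes` are illustrative.

```r
# Values copied from the head(Sum_DrugTypes_ByFY, n=10) output
trend_data <- data.frame(
  SUBSTANCE = rep(c("Alcohol", "Amphetamines (Any)"), each = 5),
  FY        = rep(2015:2019, times = 2),
  max_sum   = c(52228, 54490, 46455, 44894, 44003,
                35347, 39812, 32921, 36683, 39157)
)

# Fit max_sum ~ FY separately for each substance and keep the slope (change per year)
slopes <- sapply(split(trend_data, trend_data$SUBSTANCE), function(d) {
  unname(coef(lm(max_sum ~ FY, data = d))["FY"])
})

round(slopes, 1)
```

The slope is negative for Alcohol and positive for 'Amphetamines (Any)', which agrees with the direction of the dashed trend lines for those two substances.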
AODStats by Turning Point (2022) Treatment Services: Victoria State. Retrieved July 22, 2023, from AODstats - Victorian alcohol and drug statistics website: https://aodstats.org.au/explore-data/treatment-services-vadc/
Junk Charts (2019) Junk Charts Trifecta Checkup: The Definitive Guide, https://junkcharts.typepad.com/junk_charts/junk-charts-trifecta-checkup-the-definitive-guide.html
Kirk, A. (2019) Data Visualisation: A Handbook for Data Driven Design, 2nd edn, Sage, London.
Vázquez-Ingelmo, A., García-Peñalvo, F.J. & Therón, R. (2021) “Towards a Technological Ecosystem to Provide Information Dashboards as a Service: A Dynamic Proposal for Supplying Dashboards Adapted to Specific Scenarios”, Applied Sciences, vol. 11, no. 7, 3249.