Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Reddit / Data is ugly. (https://www.reddit.com/r/dataisugly/comments/plb2xz/because_28_261_14246_totally_nonbiased/) Source: Citi Research , Public health Scotland.


Objective

The objective of the graph is to show that vaccines don’t offer immunity.

The visualisation chosen had the following three main issues:

  • The y axis is a percentage and (either intentianally or unintentionally) makes the data look like vaccinated persons are more likely to die from covid.
  • The title of the graph is a conclusion, rather than a neutral description.
  • The proportion of vaccinated people is much higher (~80%) So percentage comparision is unfair if the rate of infection is much lower amongst the vaccinated (ie data not normalised).

Reference

Code

We can reverse engineer the data and use total instead of percentage to give a more realistic scale

#install.packages("ggpubr")
library(ggplot2)
library(dplyr)
library(ggpubr)

status <- c("Unvaccinated", "One Dose", "Two Dose")

scotland_covid <- data.frame(
  Status = factor(status, levels=c("Unvaccinated", "One Dose", "Two Dose"), ordered = T),
  NewInfections = c(15443L, 7599L, 14346L),
  Hospitalisations = c(153L, 34L, 261L),
  Deaths = c(9L, 3L, 28L)
)

#calculate approximate total population from 2 dose data
total_second_dose <- 3802183
total_second_dose_percentage <- 0.836
total_population <- floor(total_second_dose / total_second_dose_percentage)

#calculate amount of people those who only have 1 dose
total_first_dose <- 4151735
total_first_dose_only <- total_first_dose - total_second_dose

#find the total remaining unvaccinated people
total_unvaccinated <- total_population - total_first_dose


#Using vaccination data from Public Health Scotland | United Kingdom, September 2021
to_percent_of_total <- function(people, total_people_status) {
  percent <- people / total_people_status * 100
  return(round(percent, digits = 4))
}

#add new rows that represent percentage of people infected / hospital / death 
#in each group of vaccinatied 1, 2 and unvaccinatied

infections_proportion = data.frame(
  NewInfectionsProportion=c(
    to_percent_of_total(15443, total_unvaccinated),
    to_percent_of_total(7599, total_first_dose_only),
    to_percent_of_total(14346, total_second_dose)
  ) 
)

hospital_proportion = data.frame(
  HospitalisationsProportion=c(
    to_percent_of_total(153, total_unvaccinated),
    to_percent_of_total(34, total_first_dose_only),
    to_percent_of_total(261, total_second_dose)
  ) 
)

deaths_proportion = data.frame(
  DeathsProportion=c(
    to_percent_of_total(9, total_unvaccinated),
    to_percent_of_total(3, total_first_dose_only),
    to_percent_of_total(28, total_second_dose)
  ) 
)

combined <- cbind(scotland_covid, infections_proportion, hospital_proportion, deaths_proportion)


ggp1 <- ggplot(
  data = combined, 
  aes(
    x = Status, 
    y = NewInfectionsProportion,
    fill = Status
  )
) +
geom_bar(stat = "identity", width = 0.5) +
theme(axis.title.x=element_blank()) +
ylab("New Infections %")

ggp2 <- ggplot(
  data = combined, 
  aes(
    x=reorder(Status,Status,length),
    y = HospitalisationsProportion,
    fill = Status
    
  )
) +
geom_bar(stat = "identity") +
theme(legend.position="none", axis.title.x=element_blank()) +
ylab("Hospitalisations %") +
coord_cartesian(ylim=c(0, 0.05))

ggp3 <- ggplot(
  data = combined, 
  aes(
    x = Status, 
    y = DeathsProportion,
    fill = Status
    )
) +
geom_bar(stat = "identity") +
theme(legend.position="none", axis.title.x=element_blank()) +
ylab("Deaths %") + 
coord_cartesian(ylim=c(0, 0.05)) +
geom_text(aes(label = DeathsProportion), vjust = -0.2)

arranged <- ggarrange(
  ggp1,
  #labels = c("New Infections", "Hospitalisations", "Deaths"),
  common.legend = T,
  legend = "bottom",
  ggarrange(ggp2, ggp3, ncol = 2),
  nrow = 2
)

annotated <- annotate_figure(arranged,
  top = text_grob("Scotland 1 week Covid data, normalised by total population percentage.\nInfections, hospitalisation and deaths by vaccination status", color = "Black", face = "bold", size = 14)
)

Data Reference

Reconstruction

The following plot fixes the main issues in the original.