Original


Source: Florida COVID19 Case (2020).


Objective

The coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus. Although for most people it only causes mild illness, it can make other very ill or more rarely, it can be fatal. The visualisation above is depicting the total hospitalized cases of patients contracted with the COVID-19 and the rate of hospitalization from March to July 2020.

Target Audience

  • This visualization is intended for healthcare workers and the general public.

Issues

  • Visual bombardment - The presence of the dual axes, each with its own scale, along with barplots of 5 months of different shapes with age group represented by different colour and the large number of text written is overwhelming for the readers and takes them a long time to interpret the visualisation.
  • Dual Axes - The presence of dual axes makes it confusing for the viewers to associate and interpret 3 different categories of visualization on the same plot.
  • Poor Scaling - Mislead the reader about the width on the visualisation and does not give an appropriate representation of the number of cases to the width in the barplot.

Reference

Code

The following code was used to fix the issues identified in the original.

library(extrafont)
library(readr)
library(magrittr)
library(tidyr)
library(Hmisc)
library(dplyr)
library(outliers)
library(lubridate)
library(forecast)
library(colourpicker)


florida <- read.csv("Florida_Cases.csv", sep = ",")
florida1 <- florida %>% select(-Age, -(Jurisdiction:EDvisit), -Died,-Contact,-Case1, -ChartDate, -ObjectId)
florida1$EventDate <- as.Date(florida$EventDate)

## This Block Cleans the data for the visualisation

#Filter by total confirmed cases and both hospitalized and non hospitalized
florida_YES <-florida1 %>% filter(`Case_` == 'Yes') %>% filter(`Hospitalized` == 'YES' | `Hospitalized` == 'NO')
f_YES <- florida_YES %>% mutate(Month = format(florida_YES$EventDate,'%B')) %>%
  filter(`Month`!= 'January', `Month`!= 'February')
f_YES$Month <- factor(f_YES$Month, levels = c('March', 'April', 'May', 'June', 'July'))
f_YES$Age_group <- factor(f_YES$Age_group, levels = c('0-4 years', '5-14 years', "15-24 years", "25-34 years", "35-44 years", "45-54 years","55-64 years", "65-74 years", "75-84 years", "85+ years"))
f_YES <- f_YES[complete.cases(f_YES),]



new_f <- f_YES %>% group_by(Month) %>% filter(Hospitalized == 'YES') %>%
  count(YES=(Hospitalized == 'YES'))
new_f <- new_f %>% select(-"YES")
total <- f_YES %>% group_by(Month) %>% count()



rate <- full_join(new_f, total, by="Month")
colnames(rate)[colnames(rate) == "n.x"] <- "HospitalizedCases"
colnames(rate)[colnames(rate) == "n.y"] <- "TotalCases"
rate <- rate %>% mutate(Rate_of_Hospitalized = HospitalizedCases/TotalCases, 
                        Percentage = round(Rate_of_Hospitalized*100,1))
rate <- rate[c("Month", "TotalCases", "Rate_of_Hospitalized", "Percentage")]
# This Block is to solve issues appeared on the original and corrected with the following code

# looking at the vizualisation for confirm cases.
p <- ggplot(f_YES , aes(x=Age_group, fill = Month))
p + geom_bar(position="dodge", colour="black") + 
  facet_grid(cols = vars(Month)) +
  labs(
  title="Total Cases in Florida (in Thousands)",
  x="Age Group")  +
  theme_gray() +
  theme(legend.position="none", axis.text.x=element_text(hjust=1), 
        text=element_text(family="Georgia"),
        title = element_text(face = "bold"),) +
  scale_y_continuous(name = "Count of Cases", labels = function(y) y / 1000) +
  coord_flip() 


# looking at the vizualisation for hospitalized cases.
p1 <- ggplot(f_HOSYES, aes(x=Age_group, fill = Month))
p1 + geom_bar(position="dodge", colour="black") +
  labs(
  title="Hospitalized Cases in Florida (in Hundreds)",
  x="Age Group") + 
  facet_grid(cols = vars(Month)) + theme_gray() +
  theme(legend.position="none", axis.text.x=element_text(hjust=1), 
        text=element_text(family="Georgia"),
        title = element_text(face = "bold"),) +
  scale_y_continuous(name = "Counts of Hospitalized", labels = function(y) y / 100) +
  coord_flip()


p2 <- ggplot(rate,aes(x=Month, y=Percentage))
p2 + geom_line(stat = "identity", colour = "turquoise3", aes(group = 1)) + 
  labs(
    title="Rate of Hospitalization in Florida",
    x="Month",
    y="Percentage Hospitalized") + theme_gray() +
  theme(legend.position="none", axis.text.x=element_text(hjust=1), 
        text=element_text(family="Georgia"),
        title = element_text(face = "bold"),) +
  geom_smooth(se = FALSE, span = 0.4) + 
  geom_point(colour = "turquoise3") + 
  geom_text(aes(label = paste(Percentage,"%",sep="")),nudge_y = -2, nudge_x = .05) +
  scale_y_continuous(limits = c(0, 30))

Data Reference

Reconstruction

The following plot fixes the main issues in the original.