Original


Source: Medical Data Exploration and Visualization - Heart Disease.


Objective

The objective of the visualization is to show how Age and Trestbps (resting blood pressure) are distributed across the Disease and No Disease categories. The target audience is the general public: the article was published on Medium, an online publishing platform, to give readers an understanding of data exploration.

The chosen visualisation had the following three main issues:

  1. Confusing labels: Two distinct colours are used for the same category (Disease and No Disease), which creates confusion.
  2. Deceptive convention and unclear representation: The distributions of both Age and Trestbps are plotted on the same axis, which is misleading because the two variables have different units and ranges. A blood pressure of 25, for example, is implausible and clearly misrepresents the data. The histogram also extends beyond the y-axis limit, which makes the plot deceptive.
  3. Wrong colour and visual perception: The colour scheme does not highlight the main objective of the visualisation. Overlapping areas are drawn without transparency, which can be misleading.

Reference

  * Basic Medical Data Exploration / Visualization — Heart Diseases by Jae Duk Seo
  * Link: https://towardsdatascience.com/basic-medical-data-exploration-visualization-heart-diseases-6ab12bc0a8b7

Code

The following code was used to fix the issues identified in the original.

library(plotly)
library(ggplot2)
library(readr)
library(dplyr)
library(Hmisc)
library(GGally)
library(magrittr)
library(grid)
library(colorblindr)
library(gridExtra)
# Load the Cleveland heart disease data; the file has no header row
heart <- read.csv("processed.cleveland.csv", header = FALSE)
colnames(heart) <- c('Age', 'Gender', 'ChestPain', 'Trestbps', 'Cholestoral',
                     'FBS', 'Restecg', 'MaximumHeartRate', 'Exang', 'OldPeak',
                     'Slope', 'Ca', 'Thal', 'Target')

# Collapse Target to two classes: 0 = no disease, anything above 0 = disease
heart$Target <- ifelse(heart$Target > 0, 1, 0)
heart$Target <- factor(heart$Target, levels = c(0, 1), labels = c("No Disease", "Disease"))
# Label Gender: 1 = male, 0 = female
heart$Gender <- factor(heart$Gender, levels = c(1, 0), labels = c("Male", "Female"))

# One base plot per variable so that Age and Trestbps each get their own x-axis
p9 <- ggplot(heart, aes(x = Age))
p10 <- ggplot(heart, aes(x = Trestbps))

# Trestbps: density-scaled histogram with a density curve, coloured by Target
Trestbps <- p10 +
  geom_histogram(aes(y = after_stat(density), color = Target, fill = Target),
                 position = "identity", alpha = 0.2) +
  ylim(0, 0.06) +
  geom_density(aes(color = Target), size = 1, alpha = .2) +
  scale_color_manual(values = c("#0073C2FF", "#FC4E07")) +
  scale_fill_manual(values = c("#0073C2FF", "#FC4E07")) +
  ggtitle("Data Distribution of Trestbps") +
  theme(
    plot.title = element_text(color = "Black", size = 10, face = "bold.italic"),
    axis.title.x = element_text(color = "Black", size = 7, face = "bold"),
    axis.title.y = element_text(color = "Black", size = 7, face = "bold")
  )

# Age: same layers, with a higher y-axis limit
Age <- p9 +
  geom_histogram(aes(y = after_stat(density), color = Target, fill = Target),
                 position = "identity", alpha = 0.2) +
  ylim(0, 0.08) +
  geom_density(aes(color = Target), size = 1, alpha = .2) +
  scale_color_manual(values = c("#0073C2FF", "#FC4E07")) +
  scale_fill_manual(values = c("#0073C2FF", "#FC4E07")) +
  ggtitle("Data Distribution of Age") +
  theme(
    plot.title = element_text(color = "Black", size = 10, face = "bold.italic", margin = margin(10, 0, 10, 0)),
    axis.title.x = element_text(color = "Black", size = 7, face = "bold"),
    axis.title.y = element_text(color = "Black", size = 7, face = "bold")
  )
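
The code above builds the Age and Trestbps plot objects but does not show how they are drawn. One way to place two such plots side by side is sketched below, assuming ggplot2 and gridExtra are installed. The `demo` data frame holds arbitrary synthetic values rather than the Cleveland data, and `dist_plot` is a hypothetical helper introduced only for this illustration.

```r
library(ggplot2)
library(gridExtra)

# Synthetic stand-in for the heart data (arbitrary values, illustration only)
set.seed(1)
demo <- data.frame(
  Age      = round(rnorm(200, mean = 54, sd = 9)),
  Trestbps = round(rnorm(200, mean = 130, sd = 17)),
  Target   = factor(sample(c("No Disease", "Disease"), 200, replace = TRUE),
                    levels = c("No Disease", "Disease"))
)

# Hypothetical helper: density-scaled histogram plus density curve for one variable
dist_plot <- function(data, var, title) {
  ggplot(data, aes(x = .data[[var]], color = Target, fill = Target)) +
    geom_histogram(aes(y = after_stat(density)),
                   position = "identity", alpha = 0.2, bins = 30) +
    geom_density(alpha = 0) +  # alpha = 0 keeps only the outline of the curve
    scale_color_manual(values = c("#0073C2FF", "#FC4E07")) +
    scale_fill_manual(values = c("#0073C2FF", "#FC4E07")) +
    labs(title = title, x = var, y = "Density")
}

age_plot      <- dist_plot(demo, "Age", "Data Distribution of Age")
trestbps_plot <- dist_plot(demo, "Trestbps", "Data Distribution of Trestbps")

# arrangeGrob() builds the two-column layout without drawing it;
# grid.arrange(age_plot, trestbps_plot, ncol = 2) would build and draw in one step.
panel <- arrangeGrob(age_plot, trestbps_plot, ncol = 2)
grid::grid.newpage()
grid::grid.draw(panel)
```

Because each variable sits in its own panel, each keeps its own x-axis, which is the point of separating Age from Trestbps.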

Data Reference

Reconstruction

We reconstruct the graph by:

  1. Proper Labelling and Axis limits.
  2. Showing the distributions of Age and Trestbps separately.
  3. Adjusting the colours and highlighting the density curve to make the objective clear. Blue now marks only No Disease and orange only Disease. Transparency is applied to overlapping elements.
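
On the axis-limit point, one caveat may be worth checking: in ggplot2, `ylim()` sets a scale limit, so histogram bars taller than the limit are dropped with a warning, whereas `coord_cartesian()` only crops the view. The sketch below contrasts the two on synthetic values (not the Cleveland data). The limit of 0.3 is an arbitrary choice for the illustration.

```r
library(ggplot2)

# Synthetic values for illustration only
set.seed(1)
demo <- data.frame(x = rnorm(500))

base <- ggplot(demo, aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30,
                 color = "#0073C2FF", fill = "#0073C2FF", alpha = 0.2)

# Scale limit: bars taller than 0.3 fall outside the scale and are removed
p_scale <- base + ylim(0, 0.3)

# Coordinate limit: the view is cropped at 0.3 but no bar is removed
p_zoom <- base + coord_cartesian(ylim = c(0, 0.3))
```

Printing `p_scale` and `p_zoom` in turn shows the difference: the first leaves gaps where the tallest bars were, while the second shows them cut off at the top of the panel.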