Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: Ministry of Road Transport and Highways (Community.data.gov.in).


Objective

The main objective of the original data visualisation is to show the total number of people injured in road accidents in India, together with the number of people injured per lakh population, for each year from 2000 to 2017.

This data visualisation was published on the website community.data.gov.in, using data from the Ministry of Road Transport and Highways, Government of India.

The target audience for this data visualisation is the general public of India, so that people take safety precautions while travelling or riding on the road.

The visualization chosen had the following three main issues:

  • Use of Colours and Numbers:- The colour used to fill the bars and the colour used to draw the line are both highly saturated and too close to each other, so the chart lacks visual clarity. The colours could be improved, as the current choices are hard to tell apart for viewers with colour vision deficiency. In addition, the chart is crowded with value labels that overlap each other, which easily confuses the viewer and can lead to wrong interpretation.

  • Use of Multiple Values for Y-axis (Dual Axes):- The graph has two scales for the Y-axis, one at the extreme left of the graph and another at the extreme right. The two scales use different units and ranges, which makes this a dual-axis chart. Few viewers read dual axes correctly, and dual axes easily lead to misinterpretation. Because of the visual clutter, the values are also hard to read. This could have been avoided by using multiple visualisations.

  • Inappropriate Use of Grammar and Vocabulary:- The data consist of the number of people injured over a period of 17 years. According to the rules of visualisation grammar, a time-series chart (line plot) should have been used. Because the data are recorded and compared over time, a line plot is a better choice than a bar plot.
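The dual-axis and chart-type issues above can be illustrated with a small sketch. It is not the author's reconstruction: it only uses the six road-accident values (2000 to 2005) that appear in the `head()` output in the Code section, and the data frame name `sketch` and the panel labels are assumptions made for illustration. Each measure gets its own panel and its own y-axis, drawn as a line plot.

```r
library(ggplot2)

# Values copied from the head() output shown in the Code section (2000 to 2005 only).
# The name `sketch` and the label strings are illustrative, not from the original code.
sketch <- data.frame(
  Years    = rep(2000:2005, times = 2),
  variable = rep(c("Road accidents", "Road accidents per lakh population"), each = 6),
  value    = c(391449, 405637, 407497, 406726, 429910, 439255,
               38.6, 39.4, 39.0, 38.3, 39.8, 40.1)
)

# One panel per measure, each with its own y-axis, instead of a single dual-axis chart.
p <- ggplot(sketch, aes(x = Years, y = value)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ variable, ncol = 1, scales = "free_y") +
  labs(x = "Year", y = NULL) +
  theme_minimal()

print(p)
```

Using `scales = "free_y"` lets each panel choose a range suited to its own values, so counts in the hundreds of thousands and rates around 40 never share one scale.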

Reference

  • Community.data.gov.in (2019). Persons Injured in Road Accidents in India from 2000 to 2017. Retrieved September 9, 2019, from https://community.data.gov.in/persons-injured-in-road-accidents-in-india-from-2000-to-2017/

Code

The following code was used to fix the issues identified in the original.

library(ggplot2)
library(readr)
library(dplyr)
library(tidyr)
library(grid)
library(knitr)



#Setting The Directory
setwd("C:/Users/ravindra/Desktop/SEMESTER 2/DATA VISUALISATION/ASSIGNMENT/ASSIGNMENT 2")
getwd()
## [1] "C:/Users/ravindra/Desktop/SEMESTER 2/DATA VISUALISATION/ASSIGNMENT/ASSIGNMENT 2"
#Reading The Csv File
road_accidents <- read.csv("Road_Accidents.csv")



#Filtering The Data (keep the years from 2000 onwards)
road_accidents <- filter(road_accidents, Years >= 2000)


#Creating a Dataframe

df <- road_accidents %>%
  select(Years,Road.Accidents , Persons.Injured) %>%
  gather(key = "variable", value = "value", -Years)
head(df)
##   Years       variable  value
## 1  2000 Road.Accidents 391449
## 2  2001 Road.Accidents 405637
## 3  2002 Road.Accidents 407497
## 4  2003 Road.Accidents 406726
## 5  2004 Road.Accidents 429910
## 6  2005 Road.Accidents 439255
#Plot 1: number of road accidents and number of persons injured in road accidents

Plot1 <-ggplot(df, aes(x = Years, y = value)) + 
  geom_line(aes(color = variable), size = 1) +
  geom_point(aes(y = value),color = "#000000",size=2) +
  scale_color_manual(values = c("black", "red")) +
  theme_minimal()+
  ggtitle("Road Accidents & Injuries in India from 2000 to 2017") + 
  theme(axis.text.y   = element_text(size=10),
        axis.text.x   = element_text(size=10),
        axis.title.y  = element_text(size=10),
        axis.title.x  = element_text(size=10),
        panel.background = element_blank(),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black"),
        panel.border = element_rect(colour = "BLACK", fill=NA, size=2))+
  theme(plot.title = element_text(hjust  = 0.2))+
  theme(plot.background = element_rect(fill = "white"))


#Creating a Dataframe for the per lakh population values

df1 <- road_accidents %>%
  select(Years,Road.Accidents.per.Lakh.Population , Persons.Injured.per.Lakh.Population) %>%
  gather(key = "variable", value = "value", -Years)
head(df1)
##   Years                           variable value
## 1  2000 Road.Accidents.per.Lakh.Population  38.6
## 2  2001 Road.Accidents.per.Lakh.Population  39.4
## 3  2002 Road.Accidents.per.Lakh.Population  39.0
## 4  2003 Road.Accidents.per.Lakh.Population  38.3
## 5  2004 Road.Accidents.per.Lakh.Population  39.8
## 6  2005 Road.Accidents.per.Lakh.Population  40.1
#Plot 2: number of road accidents and number of persons injured in road accidents, per lakh population

Plot2 <- ggplot(df1, aes(x = Years, y = value)) + 
  geom_line(aes(color = variable), size = 1) +
  geom_point(aes(y = value),color = "#000000", size=2) +
  scale_color_manual(values = c("black", "red")) +
  theme_minimal()+
  ggtitle("Accidents & Injuries per Lakh Population in India from 2000 to 2017") + 
  theme(axis.text.y   = element_text(size=10),
        axis.text.x   = element_text(size=10),
        axis.title.y  = element_text(size=10),
        axis.title.x  = element_text(size=10),
        panel.background = element_blank(),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black"),
        panel.border = element_rect(colour = "BLACK", fill=NA, size=2))+
  theme(plot.title = element_text(hjust  = 0.2))+
  theme(plot.background = element_rect(fill = "white"))
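The listing builds `Plot1` and `Plot2` but does not show how they are drawn. The following is a minimal sketch of one way to display two ggplot objects one above the other, using the `grid` package already loaded above. The helper name `show_stacked` is hypothetical and not part of the original code.

```r
library(ggplot2)
library(grid)

# Hypothetical helper: draw two ggplot objects stacked vertically on one page.
show_stacked <- function(top, bottom) {
  grid.newpage()
  # Split the page into a layout of two rows and one column.
  pushViewport(viewport(layout = grid.layout(nrow = 2, ncol = 1)))
  # print() for ggplot objects accepts a viewport to draw into.
  print(top,    vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
  print(bottom, vp = viewport(layout.pos.row = 2, layout.pos.col = 1))
  popViewport()
  invisible(NULL)
}
```

With the objects created above, the call would be `show_stacked(Plot1, Plot2)`. This keeps the absolute counts and the per lakh population rates in separate charts, each with a single y-axis.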

Data Reference

Reconstruction

The following plot fixes the main issues in the original.