Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original



Objective

  • The purpose of the visualisation considered here was to represent the number of persons injured in road accidents in India from 2000 to 2017. The data was published by the Ministry of Road Transport and Highways, Government of India.
  • In the original visualisation, the line graph represents the number of people injured per lakh population from 2000 to 2017, whereas the bar graph depicts the total number of people injured in road accidents.
  • The main objective of the visualisation is to help reduce road accidents and to spread awareness of road safety.
  • The target audience is therefore the general public of India, as the data is published on the Open Government Data (OGD) Platform India website.
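The per-lakh measure shown by the line graph is a rate: the number of injured persons divided by the population, scaled to one lakh (100,000) people. A minimal sketch of that arithmetic follows. The function name `injured_per_lakh` and both input figures are hypothetical and are not taken from the published data.

```r
# Hypothetical helper: converts a raw count into a rate per lakh (100,000) people.
injured_per_lakh <- function(injured, population) {
  injured / population * 1e5
}

# Illustrative inputs only; these are NOT the Ministry's figures.
injured    <- 400000       # assumed total persons injured in one year
population <- 1000000000   # assumed population in the same year

per_lakh <- injured_per_lakh(injured, population)
per_lakh  # 40 injured persons per lakh population
```

Because the two measures differ by several orders of magnitude, they cannot share one y-axis without rescaling, which is why the original resorted to a second axis.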

The visualisation chosen had the following three main issues:

  • The most prominent issue with the original visualisation is the incorrect choice of a bar graph to depict chronological data for the total count of injured people. Line charts suit time series because they connect data plotted at regular intervals, whereas bar charts are preferred for comparing values across categories.
  • Secondly, the original graph uses dual axes. Dual axes should be avoided as they may lead to misinterpretation: how the two series appear to relate depends on how each axis is scaled, which makes the data easy to manipulate. Here the second y-axis may also go unnoticed.
  • Thirdly, the colour combination is inappropriate for people with protanopia. Protanopes are more likely to confuse shades of black and red.
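One way to address the colour issue is to draw the series colours from a palette designed for colour-vision deficiency. The sketch below uses the Okabe-Ito palette that ships with base R (version 4.0.0 or later). This is an illustrative alternative, not the palette used in the reconstruction code, and the variable names are assumptions.

```r
# Base R (>= 4.0.0) provides the Okabe-Ito palette through grDevices.
okabe_ito <- unname(palette.colors(palette = "Okabe-Ito"))

# Pick two hues for two series. In R's ordering of this palette,
# position 6 is blue and position 2 is orange; the choice is illustrative.
series_colours <- okabe_ito[c(6, 2)]
series_colours
```

The resulting hex strings can be passed to the `colour` argument of `geom_line()` and `geom_point()` in place of named colours.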

Reference

Code

The following code was used to fix the issues identified in the original.

library(ggplot2)
library(dplyr)
# Reading the data from the CSV file
road_accident <- read.csv("C:/Users/aad19/Documents/Archita/Road_Accidents.csv")

# Filtering the data for years 2000 to 2017
# (columns are referenced by name inside filter(); no need for road_accident$)
road_accident_2000_2017 <- road_accident %>% filter(Years >= 2000, Years <= 2017)

# Changing the column names
names(road_accident_2000_2017)[names(road_accident_2000_2017)=="Total.Number.of.Persons.Injured..in.numbers."] <- "Total Number of Persons Injured (in numbers)"
names(road_accident_2000_2017)[names(road_accident_2000_2017)=="Number.of.Persons.Injured.per.Lakh.Population"] <- "Number of Persons Injured (Per Lakh Population)"

# Converting int to factor datatype for Years
road_accident_2000_2017$Years <- factor(road_accident_2000_2017$Years,levels = road_accident_2000_2017$Years)

# To prevent scientific notation on the axis labels
options(scipen = 999)

# Persons Injured in Road Accidents in India from 2000 to 2017
# group = 1 joins all points into a single line, since Years is a factor
plot_1 <- ggplot(data = road_accident_2000_2017,
                 aes(group = 1, x = Years, y = `Total Number of Persons Injured (in numbers)`)) +
  geom_line(colour = "maroon") +
  geom_point(colour = "maroon") +
  labs(title = "Persons Injured in Road Accidents in India from 2000 to 2017",
       y = "Number of Persons Injured (in numbers)") +
  theme(plot.title = element_text(face = "bold", size = 10, hjust = 0.5)) +
  scale_y_continuous(limits = c(350000, 550000))

# Persons Injured (Per Lakh Population) in India from 2000 to 2017
plot_2 <- ggplot(data = road_accident_2000_2017,
                 aes(group = 1, x = Years, y = `Number of Persons Injured (Per Lakh Population)`)) +
  geom_line(colour = "cyan") +
  geom_point(colour = "cyan") +
  labs(title = "Persons Injured (Per Lakh) in India from 2000 to 2017",
       y = "Number of Persons Injured (Per Lakh)") +
  theme(plot.title = element_text(face = "bold", size = 10, hjust = 0.5)) +
  scale_y_continuous(limits = c(36, 46))
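An alternative to two separate plot objects is a single faceted plot, which gives each measure its own panel and y-scale and so avoids dual axes. The sketch below is a possible layout, not the one used in the reconstruction. The data frame `wide`, its column names and all of its values are made up for illustration.

```r
library(ggplot2)

# Hypothetical values for illustration only; NOT the published figures.
wide <- data.frame(
  Years    = 2000:2004,
  total    = c(400000, 405000, 410000, 435000, 465000),
  per_lakh = c(39.0, 39.2, 39.4, 40.9, 43.0)
)

# Stack the two measures into long format with base R,
# so that each measure can be drawn in its own panel.
long <- rbind(
  data.frame(Years = wide$Years,
             measure = "Total persons injured",
             value = wide$total),
  data.frame(Years = wide$Years,
             measure = "Persons injured per lakh population",
             value = wide$per_lakh)
)

# scales = "free_y" gives every panel an independent y-axis.
faceted <- ggplot(long, aes(x = Years, y = value)) +
  geom_line(colour = "maroon") +
  geom_point(colour = "maroon") +
  facet_wrap(~ measure, ncol = 1, scales = "free_y") +
  labs(x = "Year", y = NULL)
```

Printing `faceted` draws the two panels stacked vertically with a shared year axis, so the trends can be compared without reading two scales off one chart.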

Data Reference

Reconstruction

The following plot fixes the main issues in the original.