Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: OGD India, 2019.


Objective

The main objective of this data visualisation is to showcase trends in the quantity of major chemicals exported from India between the financial years 2010-11 and 2017-18, exports that contribute to the growth of the Indian economy.

The target audience of this visualisation is as follows:

  • Policy makers of the country (to decide on future policy and strategy changes that help the sector flourish)
  • Economists (to understand the economy of the chemical sector)
  • Current and future investors of the chemicals production sector (to decide whether to invest in Indian chemical manufacturing or related companies)
  • Various agencies which determine the GDP, economy and other related data of countries.

The Data Visualisation that was chosen had the following three main issues:

  • Missing grids or comparison lines: There are no grid lines in the background, which makes it hard to compare points that are far apart (e.g. the 2010-11 and 2017-18 observations).
  • Individual sub-category data unavailable: The plot is meant to display exports of major chemicals, but it does not break the data down by chemical type (even though that data is available).
  • Incorrect X and Y axis labels: The X-axis label does not state that it shows the quantity of exports (it only mentions the unit). The Y-axis label displays Year instead of Financial Year, and each year is followed by the unnecessary text “QTY”. The label “QTY” might mean different things to different audiences, which could mislead them.

Reference

Code

The following code was used to fix the identified issues with the original Data Visualisation.

Setup

# Import necessary packages
library(readr)
library(ggplot2)
library(tidyr)
library(dplyr)

# Set working directory
setwd("C:\\Users\\abhis\\OneDrive\\Desktop\\Master of Data Science\\Sem 2\\Data Visualisation MATH2270\\Assignments\\Assignment 2")

Read Data

# Read the CSV file
chemical_exports <- read_csv("Exports_of_Major_Chemicals_ProductWise_or_GroupWise_2010-11_to_2018-19-up_to_september_2018.csv")

Process Data

# Subset the data to keep only the necessary columns
chemical_exports_subset <- chemical_exports[-c(68),-c(4,6,8,10,12,14,16,18:20)]

# Rename the columns
names(chemical_exports_subset) <- c("Group", "Product", "2010-11", "2011-12", "2012-13", "2013-14", "2014-15", "2015-16", "2016-17", "2017-18")

# Rename the Chemical Group names
chemical_exports_subset$Group[chemical_exports_subset$Group == "ALKALI CHEMICALS"] <- "Alkali"
chemical_exports_subset$Group[chemical_exports_subset$Group == "INORGANIC CHEMICALS"] <- "Inorganic"
chemical_exports_subset$Group[chemical_exports_subset$Group == "ORGANIC CHEMICALS"] <- "Organic"
chemical_exports_subset$Group[chemical_exports_subset$Group == "PESTICIDES & INSECTICIDES"] <- "Pesticides & Insecticides"
chemical_exports_subset$Group[chemical_exports_subset$Group == "DYES & DYESTUFF"] <- "Dyes & Dyestuff"

# Reshape the year columns from wide to long format using the gather function
chemical_exports_subset_gather <- chemical_exports_subset %>% gather("2010-11", "2011-12", "2012-13", "2013-14", "2014-15", "2015-16", "2016-17", "2017-18", key = "Year", value = "Quantity")

# Group the table by chemical group and year, then sum the quantities
chemical_exports_subset_gather_group <- chemical_exports_subset_gather %>% group_by(Group, Year) %>% summarise(Total.Quantity = sum(Quantity))
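The reshaping and aggregation steps above can be sketched on a small made-up table. This is only an illustrative sketch: the data frame `toy_exports`, its product names and its values are hypothetical, and `pivot_longer()` (the current tidyr replacement for `gather()`, assuming tidyr 1.0.0 or later) is swapped in for the function used in the original code. The `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(tidyr)
library(dplyr)

# Hypothetical toy data (values invented for illustration only)
toy_exports <- tibble(
  Group     = c("Alkali", "Alkali", "Organic"),
  Product   = c("Soda Ash", "Caustic Soda", "Methanol"),
  `2010-11` = c(10, 20, 5),
  `2011-12` = c(15, 25, 7)
)

# Wide to long: one row per product and financial year
toy_long <- toy_exports %>%
  pivot_longer(cols = c(`2010-11`, `2011-12`),
               names_to = "Year", values_to = "Quantity")

# Total quantity per chemical group and financial year
toy_totals <- toy_long %>%
  group_by(Group, Year) %>%
  summarise(Total.Quantity = sum(Quantity, na.rm = TRUE), .groups = "drop")

print(toy_totals)
```

Note that `na.rm = TRUE` is an assumption made for this sketch. Without it, as in the code above, a single missing quantity makes the total for that group and year `NA`.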

Reconstruct the Data Visualisation using ggplot2

# Plot GGPlot and save it as `plot`
plot <- ggplot(data = chemical_exports_subset_gather_group, 
               aes(x = Year, y = Total.Quantity, fill = Group)) + 
  
  # Plot bar chart
  geom_bar(stat = "identity", position = "stack") + 
  
  # Set the x-axis label
  xlab("Financial Year (India)") + 
  
  # Set the Y-axis label
  ylab("Quantity (in metric tonne)") + 
  
  # Add the labels (title, subtitle and caption)
  labs(title = "Export of Major Chemicals (India)", 
       subtitle = "2010-11 to 2017-18", 
       caption = "Data Source: OGD India, 2019") + 
  
  # Add the quantity of each stacked bar in Bar Chart
  geom_text(aes(label = Total.Quantity), vjust = 1.13, position = "stack") + 
  
  # Set the GGPlot background theme
  theme_light() + 
  
  # Make changes to the applied theme (change x-axis text angle, place legend to bottom, remove legend title, adjust title and subtitle position)
  theme(axis.text.x = element_text(angle = 90), 
        legend.position = "bottom", legend.title = element_blank(), 
        plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5)) + 
  
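  # Add the total quantity on top of each bar in Bar Chart
  # (`fun` and `after_stat()` replace the deprecated `fun.y` and `..y..`; requires ggplot2 3.3.0 or later)
  stat_summary(aes(label = after_stat(y), group = Year), fun = sum, geom = "text", vjust = -0.35)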
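The labelling approach used in the plot (one label per stacked segment plus a total above each bar) can be tried in isolation with a minimal sketch. The data frame `toy_totals` and its values are hypothetical. The sketch also centres the segment labels with `position_stack(vjust = 0.5)`, which is a variation on the positioning used in the code above, and it assumes ggplot2 3.3.0 or later for `after_stat()` and `fun`.

```r
library(ggplot2)

# Hypothetical toy data (values invented for illustration only)
toy_totals <- data.frame(
  Group          = rep(c("Alkali", "Organic"), times = 2),
  Year           = rep(c("2010-11", "2011-12"), each = 2),
  Total.Quantity = c(30, 5, 40, 7)
)

toy_plot <- ggplot(toy_totals, aes(x = Year, y = Total.Quantity, fill = Group)) +
  # Stacked bars, one segment per chemical group
  geom_col(position = "stack") +
  # Label each segment, centred vertically within it
  geom_text(aes(label = Total.Quantity), position = position_stack(vjust = 0.5)) +
  # Label each bar with its total across all groups
  stat_summary(aes(label = after_stat(y), group = Year),
               fun = sum, geom = "text", vjust = -0.35) +
  theme_light()
```

Calling `print(toy_plot)` renders the chart. Centring the segment labels is one way to keep them inside thin segments, although very small segments may still be too narrow to hold a label.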

Data Reference

Reconstruction

The following plot fixes the main issues with the original Data Visualisation.