Assignment 2

Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original

Source: The Most Dangerous Places to Work in the USA, Pranav Pandya (Kaggle).

Objective

The original data visualisation was a pie chart showing the top 30 sources of injuries in the USA from 2015 to 2017.

Judging purely from the visualisation, its objective is to show the top 30 sources of injuries in the USA from 2015 onwards.

The Occupational Safety and Health Administration (OSHA) requires employers to report all severe work-related injuries, defined as an amputation, in-patient hospitalization, or loss of an eye. The requirement began on January 1, 2015. The target audience can be determined from the media type and the content of the article and visualisation.

The article is mainly about injury data for US workers. The data covers over 22,000 incidents from January 1, 2015 to February 28, 2017. There are 26 columns that describe the incident, the parties involved, the employer, geographical data, the injury sustained, and the final outcome. The visualisation contains no complex concepts and can be understood by the lay public.

The visualisation chosen had the following main issues:

* Slices with very small proportions are hard to see and label, and with so many slices in the pie chart it is difficult to differentiate between small values.
* A pie chart is a poor way to compare and contrast values when the proportions are similar.
* There is a disconnect between the message conveyed by the visualisation and what the article writes about. They do not contradict each other, but they state only vaguely connected facts.
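The first issue can be illustrated with a small sketch in base R. The counts and the 5% threshold below are made up for illustration and are not taken from the OSHA data. The sketch only shows how quickly slices fall below a size that can be read or labelled comfortably.

```r
# Hypothetical counts for six categories (illustration only, not OSHA data)
counts <- c(A = 500, B = 300, C = 120, D = 40, E = 25, F = 15)

# Share of the pie taken by each slice, in percent
shares <- round(100 * counts / sum(counts), 1)

# Slices below an assumed 5% threshold are hard to see and label
small <- shares[shares < 5]

print(shares)
print(names(small))
```

With 30 slices instead of six, most categories would probably fall into the small group, which is why a different chart type was chosen for the reconstruction.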

Reference

The Most Dangerous Places to Work in the USA by Pranav Pandya (https://www.kaggle.com/pranav84/the-most-dangerous-places-to-work-in-the-usa/report)
Source of the chosen donut/pie chart: https://www.kaggle.com/pranav84/the-most-dangerous-places-to-work-in-the-usa/data#main-source

Reconstruction

Source: Occupational Safety and Health Administration (OSHA) severe injury data, via Kaggle.

The original visualisation was converted from a pie chart into a bar chart. Bar charts make it easier to compare values between groups or to track changes over time. Each bar represents one value, and when the bars are placed next to one another the viewer can compare them at a glance. The mean count of the top 20 sources causing injuries was also added as a vertical line.
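The idea behind the reconstruction can be sketched with base R graphics alone. This is a minimal illustration and not the code used for the final chart: the helper `top_sources` is a hypothetical name, and the example values are made up.

```r
# Hypothetical helper: count records per source and keep the n most frequent
top_sources <- function(source, n = 20) {
  tab <- table(source)
  counts <- setNames(as.vector(tab), names(tab))
  head(sort(counts, decreasing = TRUE), n)
}

# Made-up example values (illustration only, not OSHA data)
src <- c(rep("Floors", 5), rep("Ladders", 3), rep("Trucks", 2), "Knives")
top <- top_sources(src, n = 3)

# Horizontal bars with the largest at the top,
# and the mean count of the plotted sources as a vertical red line
barplot(rev(top), horiz = TRUE, las = 1, xlab = "Count")
abline(v = mean(top), col = "red", lwd = 2)
```

Sorting the bars by count lets the viewer read the ranking directly, and the mean line gives a reference point for judging which sources are above or below average.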

Code

The following code was used to fix the issues identified in the original.

library(ggplot2)

#Data comes from the OSHA severe injury dataset published on Kaggle (see Data Reference).
#It is read from severeinjury.csv below.

#Packages needed for loading, cleaning and plotting the data
pkgs <- c("data.table", "dplyr", "lubridate", "highcharter")

#Install any missing package, then attach it
for (pkg in pkgs) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}
rm(pkgs, pkg)

#Load data 
df <- fread("severeinjury.csv", na.strings = "" ,stringsAsFactors = FALSE, strip.white = TRUE, data.table = FALSE)


#Fix column names: replace spaces with underscores
vars <- names(df)

cleanvars <- gsub(' ', '_', vars)
colnames(df) <- cleanvars

df<- df %>% 
  select(c(EventDate, Employer, Zip, City, State, Longitude, Latitude, NatureTitle, Part_of_Body_Title, Hospitalized, Amputation, EventTitle, SourceTitle, Secondary_Source_Title, Final_Narrative )) %>%
  mutate(EventDate = mdy(EventDate)) %>%
  rename(Injury = NatureTitle, Part_of_Body = Part_of_Body_Title, 
         count_Hospitalized = Hospitalized, count_Amputation =Amputation, 
         Event = EventTitle, Source = SourceTitle, Sec_Source = Secondary_Source_Title)

#Count severe injuries (hospitalisation or amputation) per source, most frequent first.
#The element-wise | is needed here; || only compares single values.
df0 <- df %>% 
  filter(count_Hospitalized != 0 | count_Amputation != 0) %>% 
  group_by(Source) %>% 
  summarise(count = n()) %>% 
  arrange(desc(count))

#Keep the 20 most frequent sources
df1 <- head(df0, 20)

#Bar chart of the top 20 sources, with their mean count as a reference line
df1 %>%
  hchart("bar", showInLegend = FALSE, 
        hcaes(x = Source, y = count, colour = -count)) %>%
  hc_add_theme(hc_theme_ffx()) %>% 
  hc_title(text = "Top 20 sources causing injuries from 2015 onward") %>%
  hc_credits(enabled = TRUE, text = "Source: Occupational Safety and Health Administration (OSHA)") %>%
  hc_yAxis(plotLines = list(list(label = list(text = "Average"),
                  color = "#FF0000",
                  width = 2,
                  value = mean(df1$count))))

Data Reference

https://www.kaggle.com/pranav84/the-most-dangerous-places-to-work-in-the-usa/data
This dataset was created by OSHA and released to the public.

Deconstruct, Reconstruct Web Report

Bodiyabaduge Dewsri Lalithi Perera (s3762890) & Harini Mylanahally Sannaveeranna (s3755660)
