Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
The data used in this assessment were sourced from the Occupational Safety and Health Administration (OSHA), an agency of the US government, and cover the years 2015 to 2017. The data were collected from employers, who have reported all severe work-related injuries since January 2015. The dataset covers 22,000 incidents and contains 26 columns, including geographical data, employers, and outcomes.
Objective
The primary goal of the data visualisation is to draw the audience's attention to the secondary sources of worker injuries in the US from 2015 to 2017.
The target audience for this visualisation is the government, employers and workers who are trying to reduce accidents that occur during working hours in the US. Workers, who are the victims of these injuries, are responsible for working with more precaution. Employers are responsible for taking care of their employees' safety and providing a safe working environment. The government is responsible for analysing root causes to find better solutions and for tightening rules and regulations so that the working environment is safe for everyone.
The visualisation chosen had the following three main issues:
Reference
Pandya, P., 2017. The Most Dangerous Places to Work in the USA. [Online] Available at: https://www.kaggle.com/pranav84/the-most-dangerous-places-to-work-in-the-usa/data#main-source [Accessed 7 May 2020].
United States Department of Labor, 2017. Severe Injury Reports. [Online] Available at: https://www.osha.gov/severeinjury/index.html [Accessed 7 May 2020].
The following code was used to fix the issues identified in the original.
library(ggplot2)
library(readr)
library(dplyr)
library(lubridate)
library(highcharter)
library(data.table)
#Load the data
df <- read_csv("severeinjury.csv")
#Replace the spaces in the column names with underscores
name <- names(df)
clean_name <- gsub(' ', '_', name)
colnames(df) <- clean_name
#Keep the columns of interest and parse EventDate (month/day/year) as a date
df <- df %>%
  select(EventDate, Employer, Zip, City, State, Longitude, Latitude,
         NatureTitle, Part_of_Body_Title, Hospitalized, Amputation,
         EventTitle, SourceTitle, Secondary_Source_Title, Final_Narrative) %>%
  mutate(EventDate = mdy(EventDate))
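#Count the incidents per secondary source that led to hospitalisation or amputation
#Note: the element-wise | is required here; || does not test the condition row by row
df2 <- df %>%
  filter(Hospitalized != 0 | Amputation != 0) %>%
  na.omit() %>%
  group_by(Secondary_Source_Title) %>%
  summarise(count = n()) %>%
  arrange(desc(count)) %>%
  head(30)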
#Bar chart of the top 30 secondary sources; the plot line marks the mean of the 30 counts shown
p1 <- df2 %>%
  hchart("bar", showInLegend = FALSE,
         hcaes(x = Secondary_Source_Title, y = count, color = -count)) %>%
  hc_add_theme(hc_theme_flat()) %>%
  hc_title(text = "Top 30 secondary sources of worker injuries from 2015 onward") %>%
  hc_credits(enabled = TRUE, text = "Source: U.S. Department of Labor") %>%
  hc_yAxis(plotLines = list(list(label = list(text = "Average"),
                                 color = "#A93226",
                                 width = 2,
                                 value = mean(df2$count))))
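The column cleaning, filtering and counting steps above can be checked on a small example before running them on the full file. The following is a minimal sketch in base R only, so it runs without the packages loaded above. The data frame `toy` and all of its values are invented for illustration and are not taken from the OSHA data. Unlike `na.omit()` in the code above, which drops rows with a missing value in any column, this sketch only drops rows where the secondary source is missing.

```r
# Toy data standing in for severeinjury.csv (hypothetical values, illustration only)
toy <- data.frame(
  "Secondary Source Title" = c("Floor", "Ladder", "Floor", NA, "Truck", "Floor"),
  Hospitalized = c(1, 0, 1, 1, 0, 2),
  Amputation = c(0, 1, 0, 0, 0, 0),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Replace the spaces in the column names with underscores
names(toy) <- gsub(" ", "_", names(toy), fixed = TRUE)

# Element-wise | keeps every row with a hospitalisation or an amputation
severe <- toy[toy$Hospitalized != 0 | toy$Amputation != 0, ]

# Drop the rows where the secondary source is missing (assumption: only this column matters)
severe <- severe[!is.na(severe$Secondary_Source_Title), ]

# Count the incidents per secondary source, sort in descending order, keep at most 30
counts <- sort(table(severe$Secondary_Source_Title), decreasing = TRUE)
top <- head(counts, 30)
print(top)
```

In this example the "Truck" row is removed by the filter because it has neither a hospitalisation nor an amputation, and the row with a missing secondary source is removed by the `is.na` check, which leaves "Floor" and "Ladder" in the result.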
Data Reference
The following plot fixes the main issues in the original.