Scope 1: Data & Challenges

Describe the major data and design challenges faced in accomplishing the task, and how you plan to overcome these challenges with a proposed sketched design. (3 marks)

Proposed Sketched Design:

Scope 2: Step-By-Step Description

Provide step-by-step description on how the data visualization was prepared by using ggplot2 and other related R packages. (3 marks)

Loading Packages:

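# tidyverse already attaches ggplot2 and dplyr, so they are not listed separately
packages <- c('tidyverse', 'shiny', 'plotly')
for (p in packages) {
  # Install the package only if it is not already available
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}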

Importing Data:

plot_data <- read_csv("workplace-injuries-by-industry-and-incident-types.csv")

Examining Data:

plot_data
## # A tibble: 26,332 x 8
##     year degree_of_injury industry sub_industry incident_type incident_agent
##    <dbl> <chr>            <chr>    <chr>        <chr>         <chr>         
##  1  2011 Fatal            Communi… Repair & Ma… Caught in/ b… Vehicles      
##  2  2011 Fatal            Communi… Repair & Ma… Falls - Slip… Vehicles      
##  3  2011 Fatal            Constru… Civil Engin… Collapse/Fai… Others        
##  4  2011 Fatal            Constru… Civil Engin… Struck by Mo… Lifting Equip…
##  5  2011 Fatal            Constru… Civil Engin… Struck by Mo… Pressurised E…
##  6  2011 Fatal            Constru… Constructio… Caught in/ b… Lifting Equip…
##  7  2011 Fatal            Constru… Constructio… Caught in/ b… Vehicles      
##  8  2011 Fatal            Constru… Constructio… Cave-in of e… Others        
##  9  2011 Fatal            Constru… Constructio… Collapse of … Physical Work…
## 10  2011 Fatal            Constru… Constructio… Crane-related Lifting Equip…
## # … with 26,322 more rows, and 2 more variables: incident_agent_sub_type <chr>,
## #   no._of_injuries <dbl>

Visualization 1: Trend of Workplace Injuries per Degree of Injury (2011 - 2018)

Data Manipulation:

trend_data <- plot_data %>%
  group_by(year, degree_of_injury) %>%
  summarise(injury_count = sum(no._of_injuries))
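The aggregation step can be tried on a small example. The sketch below uses a toy table with made-up values (not the real data) and assumes dplyr 1.0.0 or later, where `summarise()` accepts `.groups = "drop"` to return an ungrouped result. The original code leaves the result grouped by year, which also works for plotting.

```r
library(dplyr)

# Toy stand-in for plot_data; the values are invented for illustration only
toy <- tibble::tibble(
  year = c(2011, 2011, 2011, 2012),
  degree_of_injury = c("Fatal", "Fatal", "Minor", "Minor"),
  no._of_injuries = c(1, 2, 10, 7)
)

# One row per year and degree of injury, with the injuries summed
toy_trend <- toy %>%
  group_by(year, degree_of_injury) %>%
  summarise(injury_count = sum(no._of_injuries), .groups = "drop")

toy_trend
```

The two "Fatal" rows for 2011 collapse into a single row, so the toy result has three rows instead of four.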

Trial 1 Graph Plotting:

ggplot(data=trend_data, aes(x = year, y = injury_count, color = degree_of_injury)) + 
  geom_line(show.legend = TRUE) +
  theme_minimal() +
  xlab("Year") +
  ylab("No. of Injuries") +
  ggtitle("Trend of Workplace Injuries per Degree of Injury \n Fatal vs Major vs Minor") +
  theme(plot.title = element_text(size = 15, face = "bold")) +
  labs(col="Degree of Injury")

Comments: The three categories differ greatly in scale (tens of fatal injuries against thousands of minor injuries per year), so the Fatal and Major lines are flattened near the bottom of the chart and their trends are hard to read. I therefore used R Shiny to add checkboxes that let readers choose which categories to display. The plot is reactive, so the y-axis rescales to the selected categories. To see this, please run the app in R Shiny; the link will be submitted via elearn.
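A static alternative, which I did not use in this report, would be to draw one panel per degree of injury with independent y-axes. The sketch below is only an illustration: `trend_subset` and `faceted_plot` are names made up for this example, and the values are the 2011 to 2013 rows of trend_data printed in this report.

```r
library(ggplot2)

# First three years of trend_data, copied from the printed tibble
trend_subset <- data.frame(
  year = rep(2011:2013, each = 3),
  degree_of_injury = rep(c("Fatal", "Major", "Minor"), times = 3),
  injury_count = c(61, 556, 9504, 56, 588, 10469, 73, 640, 11740)
)

# scales = "free_y" gives each panel its own y-axis range,
# so the Fatal and Major trends are no longer flattened by Minor
faceted_plot <- ggplot(trend_subset, aes(x = year, y = injury_count)) +
  geom_line() +
  facet_wrap(~ degree_of_injury, ncol = 1, scales = "free_y") +
  theme_minimal() +
  labs(x = "Year", y = "No. of Injuries")

faceted_plot
```

The trade-off is that the panels can no longer be compared by height alone, since each y-axis has a different range.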

Improved Visualization 1 with RShiny

Building the UI:

#shiny

ui <- bootstrapPage(
  checkboxGroupInput("degree_of_injury_list", label = "Select the Injury Type",
                     choices = list("Fatal" = "Fatal",
                                    "Major" = "Major",
                                    "Minor" = "Minor")),
  mainPanel(
   plotOutput("myplot"),
   textOutput("selected_var")
  )
)

RShiny Server:

trend_data
## # A tibble: 24 x 3
## # Groups:   year [8]
##     year degree_of_injury injury_count
##    <dbl> <chr>                   <dbl>
##  1  2011 Fatal                      61
##  2  2011 Major                     556
##  3  2011 Minor                    9504
##  4  2012 Fatal                      56
##  5  2012 Major                     588
##  6  2012 Minor                   10469
##  7  2013 Fatal                      73
##  8  2013 Major                     640
##  9  2013 Minor                   11740
## 10  2014 Fatal                      60
## # … with 14 more rows
server <- function(input, output, session) {

  # Keep only the degrees of injury ticked in the checkbox group
  filteredData <- reactive({
    trend_data %>% filter(degree_of_injury %in% input$degree_of_injury_list)
  })

  # Redraw the line chart whenever the selection changes
  output$myplot <- renderPlot({
    ggplot(data = filteredData(), aes(x = year, y = injury_count, color = degree_of_injury)) +
      geom_line(show.legend = TRUE) +
      theme_minimal() +
      xlab("Year") +
      ylab("No. of Injuries") +
      ggtitle("Trend of Workplace Injuries per Degree of Injury \n Fatal vs Major vs Minor") +
      theme(plot.title = element_text(size = 15, face = "bold")) +
      labs(col = "Degree of Injury")
  })
}

shinyApp(ui, server)
Shiny applications not supported in static R Markdown documents
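
When no checkbox is ticked, input$degree_of_injury_list is NULL, the filter returns zero rows and the plot is empty. One possible way to handle this, shown here only as a sketch, is to fall back to all categories. `filter_by_degree` is a hypothetical helper written for this example, not part of Shiny or of the app above, and the toy rows are copied from the printed trend_data.

```r
library(dplyr)

# Hypothetical helper: return all rows when nothing is selected,
# otherwise keep only the selected degrees of injury
filter_by_degree <- function(data, selected) {
  if (is.null(selected) || length(selected) == 0) {
    return(data)
  }
  dplyr::filter(data, degree_of_injury %in% selected)
}

# 2011 rows of trend_data, copied from the printed tibble
toy_trend <- tibble::tibble(
  year = c(2011, 2011, 2011),
  degree_of_injury = c("Fatal", "Major", "Minor"),
  injury_count = c(61, 556, 9504)
)

filter_by_degree(toy_trend, NULL)                 # nothing ticked: all rows
filter_by_degree(toy_trend, c("Fatal", "Major"))  # two boxes ticked
```

Inside the server function, the reactive could then call filter_by_degree(trend_data, input$degree_of_injury_list). Whether an empty selection should show everything or nothing is a design choice; an empty plot is also a reasonable behaviour.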

Visualization 2: Degree of Injury Per Industry (2011 - 2018)

Data Manipulation:

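indust_data <- plot_data %>%
  group_by(industry) %>%
  # total_count sums the injuries; the three per-degree columns count the
  # number of matching records (rows), not the number of injuries
  summarise(total_count = sum(no._of_injuries),
            fatal_count = sum(degree_of_injury == "Fatal"),
            major_count = sum(degree_of_injury == "Major"),
            minor_count = sum(degree_of_injury == "Minor")) %>%
  arrange(desc(total_count))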
indust_data
## # A tibble: 17 x 5
##    industry                      total_count fatal_count major_count minor_count
##    <chr>                               <dbl>       <int>       <int>       <int>
##  1 Manufacturing                       21469          61         764        5817
##  2 Construction                        17832         139         693        3062
##  3 Others                              13804           2         306        1426
##  4 Community, Social & Personal…        8899          14         267        2745
##  5 Transportation & Storage             8009          69         269        2013
##  6 Accommodation & Food Services        8005           4         123        1082
##  7 Wholesale & Retail Trade             5044          19         187        1229
##  8 Administrative & Support Ser…        3410          26         109        1276
##  9 Marine                               3103          32         172         803
## 10 Professional, Scientific & T…        3080           4         101        1290
## 11 Real Estate Activities               2614           7          67         568
## 12 Financial & Insurance Servic…        1072           1          40         470
## 13 Water Supply, Sewerage & Was…        1001           8          61         514
## 14 Information & Communications          370           4           8         262
## 15 Agriculture & Fishing                 120           2           7          96
## 16 Electricity, Gas and Air-Con…         100           1           8          81
## 17 Mining & Quarrying                     23           0           2          21
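
In the summary above, total_count sums no._of_injuries, while the per-degree columns use sum(degree_of_injury == "Fatal"), which counts matching rows. The two measures differ whenever a row records more than one injury. The sketch below shows the difference on a toy table with made-up values; the `fatal_injuries` column is an alternative that was not used for the figures in this report.

```r
library(dplyr)

# Toy data; industries "A" and "B" and all values are invented
toy <- tibble::tibble(
  industry = c("A", "A", "A", "B"),
  degree_of_injury = c("Fatal", "Fatal", "Minor", "Fatal"),
  no._of_injuries = c(1, 3, 20, 2)
)

toy_summary <- toy %>%
  group_by(industry) %>%
  summarise(
    # Number of records tagged as Fatal
    fatal_rows = sum(degree_of_injury == "Fatal"),
    # Number of injuries in records tagged as Fatal
    fatal_injuries = sum(no._of_injuries[degree_of_injury == "Fatal"])
  )

toy_summary
```

For industry "A" the toy data has 2 fatal records but 4 fatal injuries, so the choice of measure changes the bar lengths in the chart.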

Graph Plotting:

final_plot <- plot_ly(indust_data, y = ~industry, x = ~fatal_count, type = 'bar', name = 'Fatal Injuries')
final_plot <- final_plot %>% add_trace(x = ~major_count, name = 'Major Injuries')
final_plot <- final_plot %>% add_trace(x = ~minor_count, name = 'Minor Injuries')

# barmode is a layout-level setting, so it sits beside xaxis/yaxis rather than inside yaxis
final_plot <- final_plot %>% layout(title = "Degree of Injury Per Industry (2011 - 2018)",
         barmode = 'group',
         xaxis = list(title = ""),
         yaxis = list(title = ""))

# Print the object so that the chart is rendered
final_plot

Because the three categories differ greatly in scale, please double-click a category in the legend to isolate it; the x-axis then rescales to that category.
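
Another option for the scale problem, not used in this report, is a logarithmic x-axis, which plotly supports through `type = "log"` in the axis settings. The sketch below uses three rows copied from the indust_data table; `industry_subset` and `log_plot` are names made up for this example.

```r
library(plotly)

# Three rows copied from the printed indust_data table
industry_subset <- data.frame(
  industry = c("Manufacturing", "Construction", "Marine"),
  fatal_count = c(61, 139, 32),
  major_count = c(764, 693, 172),
  minor_count = c(5817, 3062, 803)
)

# Same grouped bar chart, but with a log-scaled x-axis
log_plot <- plot_ly(industry_subset, y = ~industry, x = ~fatal_count,
                    type = "bar", name = "Fatal Injuries") %>%
  add_trace(x = ~major_count, name = "Major Injuries") %>%
  add_trace(x = ~minor_count, name = "Minor Injuries") %>%
  layout(barmode = "group",
         xaxis = list(type = "log", title = "Count (log scale)"),
         yaxis = list(title = ""))

log_plot
```

A log axis cannot show a count of zero (Mining & Quarrying has no fatal records), and bar lengths on a log scale are no longer proportional to the counts, so this is a trade-off rather than a clear improvement.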

Data Source: http://www.mom.gov.sg/workplace-safety-and-health/wsh-reports-and-statistics

Scope 3: Description & Insights

The final data visualization and a short description of not more than 350 words. The description must provide at least two useful information revealed by the data visualization. (4 marks)

Description:

Visualization 1 explores the trend in workplace injuries from 2011 to 2018 by degree of injury. Injuries are categorized as fatal, major or minor. The x-axis represents the year and the y-axis represents the number of injuries.

Visualization 2 shows workplace injuries per industry, with the count on the x-axis and the industry on the y-axis. I built this chart with plotly so that readers can isolate individual categories, which works around the large differences in scale.

Useful Information:

  1. The number of fatal injuries has greatly decreased over the years, which suggests that workplaces have put better safety measures in place to protect their workers.

  2. Both Manufacturing and Construction are among the top 3 industries in all three categories of injury. However, injuries in the Construction industry are more likely to be fatal than those in Manufacturing: Construction has more than twice as many fatal injuries as Manufacturing (139 vs 61). Conversely, injuries in the Manufacturing industry are more likely to be minor than those in Construction, with almost twice as many minor injuries in Manufacturing as in Construction (5,817 vs 3,062).
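
The comparison in point 2 can be checked with a short calculation. This is a sketch using the counts printed in the indust_data table for the two industries; `fatal_share` and the ratio names are introduced only for this example.

```r
# Counts copied from the indust_data table
fatal <- c(Manufacturing = 61, Construction = 139)
major <- c(Manufacturing = 764, Construction = 693)
minor <- c(Manufacturing = 5817, Construction = 3062)

# Share of each industry's records that are fatal, in percent
fatal_share <- 100 * fatal / (fatal + major + minor)
round(fatal_share, 2)

# Construction-to-Manufacturing ratio for fatal counts,
# and Manufacturing-to-Construction ratio for minor counts
fatal_ratio <- fatal[["Construction"]] / fatal[["Manufacturing"]]
minor_ratio <- minor[["Manufacturing"]] / minor[["Construction"]]
round(c(fatal_ratio = fatal_ratio, minor_ratio = minor_ratio), 2)
```

The fatal share is about 0.92% for Manufacturing and about 3.57% for Construction, and the two ratios are about 2.28 and 1.90.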

Thank you.