Describe the major data and design challenges faced in accomplishing the task, and how you plan to overcome these challenges with a proposed sketched design. (3 marks)
Huge dataset: there are many subcategories and a lot of information, so it is difficult to decide exactly what to visualize. The values in "incident_agent_sub_type" repeat words from "incident_agent", which makes it difficult to filter the incident agent sub type uniquely.
Because of the nature of the data, some categories (minor injuries) have far more accidents than others (fatal injuries), which makes them difficult to show on a common scale. To address this, I used R Shiny checkboxes as well as plotly to make the visualizations interactive.
Many categories overlap, e.g. the same incident type can belong to different sub-industries at the same time, so comparisons at that level would not bring insightful results.
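One possible way to handle the first challenge is to read each sub type together with its parent agent. The sketch below assumes the difficulty is that the same sub type text can appear under more than one agent. The rows are made up for illustration and the column name `agent_label` is hypothetical.

```r
library(dplyr)

# Illustrative rows only; they are not taken from the dataset.
toy <- tibble(
  incident_agent          = c("Vehicles", "Others", "Vehicles"),
  incident_agent_sub_type = c("Others", "Others", "Others")
)

# Combine the parent agent and the sub type into one label (hypothetical
# column `agent_label`) so that identical sub type text under different
# agents stays distinguishable, then keep one row per label.
labelled <- toy %>%
  mutate(agent_label = paste(incident_agent, incident_agent_sub_type, sep = " / ")) %>%
  distinct(agent_label, .keep_all = TRUE)

labelled
```

Filtering on `agent_label` instead of `incident_agent_sub_type` then selects exactly one agent and sub type pair.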
Provide step-by-step description on how the data visualization was prepared by using ggplot2 and other related R packages. (3 marks)
Loading Packages:
packages = c('tidyverse', 'ggplot2', 'dplyr')
for (p in packages) {
  # Install the package only if it is not already available
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
library(shiny)
library(plotly)
Importing Data:
plot_data <- read_csv("workplace-injuries-by-industry-and-incident-types.csv")
Examining Data:
plot_data
## # A tibble: 26,332 x 8
## year degree_of_injury industry sub_industry incident_type incident_agent
## <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 2011 Fatal Communi… Repair & Ma… Caught in/ b… Vehicles
## 2 2011 Fatal Communi… Repair & Ma… Falls - Slip… Vehicles
## 3 2011 Fatal Constru… Civil Engin… Collapse/Fai… Others
## 4 2011 Fatal Constru… Civil Engin… Struck by Mo… Lifting Equip…
## 5 2011 Fatal Constru… Civil Engin… Struck by Mo… Pressurised E…
## 6 2011 Fatal Constru… Constructio… Caught in/ b… Lifting Equip…
## 7 2011 Fatal Constru… Constructio… Caught in/ b… Vehicles
## 8 2011 Fatal Constru… Constructio… Cave-in of e… Others
## 9 2011 Fatal Constru… Constructio… Collapse of … Physical Work…
## 10 2011 Fatal Constru… Constructio… Crane-related Lifting Equip…
## # … with 26,322 more rows, and 2 more variables: incident_agent_sub_type <chr>,
## # no._of_injuries <dbl>
Data Manipulation:
trend_data <- plot_data %>%
  group_by(year, degree_of_injury) %>%
  summarise(injury_count = sum(no._of_injuries))
ggplot(data = trend_data, aes(x = year, y = injury_count, color = degree_of_injury)) +
  geom_line(show.legend = TRUE) +
  theme_minimal() +
  xlab("Year") +
  ylab("No. of Injuries") +
  ggtitle("Trend of Workplace Injuries per Degree of Injury \n Fatal vs Major vs Minor") +
  theme(plot.title = element_text(size = 15, face = "bold")) +
  labs(col = "Degree of Injury")
Comments: The difference in scale between the categories is so large that the fatal and major trends are hard to read on a shared axis. I therefore used R Shiny to add checkboxes so that readers can choose which degrees of injury to display. The plot is reactive, so its scale adjusts to the selection. To view it, please run the app in R Shiny; the link will be submitted via elearn.
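As a static alternative to the checkboxes (not the approach used in this report), each degree of injury could be given its own panel with an independent y-axis. The sketch below uses the 2011 to 2013 values from the trend_data printout in this report; the object names `toy_trend` and `p_facets` are illustrative.

```r
library(ggplot2)

# 2011-2013 values copied from the trend_data printout in this report.
toy_trend <- data.frame(
  year = rep(2011:2013, each = 3),
  degree_of_injury = rep(c("Fatal", "Major", "Minor"), times = 3),
  injury_count = c(61, 556, 9504, 56, 588, 10469, 73, 640, 11740)
)

# One panel per degree of injury; scales = "free_y" lets each panel
# choose its own y-axis range, so the small Fatal counts stay readable.
p_facets <- ggplot(toy_trend, aes(x = year, y = injury_count)) +
  geom_line() +
  facet_wrap(~ degree_of_injury, ncol = 1, scales = "free_y") +
  labs(x = "Year", y = "No. of Injuries")

p_facets
```

The trade-off is that the panels no longer share a scale, so magnitudes cannot be compared across panels at a glance.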
Building the UI:
#shiny
ui <- bootstrapPage(
  checkboxGroupInput("degree_of_injury_list", label = "Select the Injury Type",
                     choices = list("Fatal" = "Fatal",
                                    "Major" = "Major",
                                    "Minor" = "Minor"),
                     # Tick every box at start-up so the first plot is not empty
                     selected = c("Fatal", "Major", "Minor")),
  mainPanel(
    plotOutput("myplot"),
    textOutput("selected_var")
  )
)
RShiny Server:
trend_data
## # A tibble: 24 x 3
## # Groups: year [8]
## year degree_of_injury injury_count
## <dbl> <chr> <dbl>
## 1 2011 Fatal 61
## 2 2011 Major 556
## 3 2011 Minor 9504
## 4 2012 Fatal 56
## 5 2012 Major 588
## 6 2012 Minor 10469
## 7 2013 Fatal 73
## 8 2013 Major 640
## 9 2013 Minor 11740
## 10 2014 Fatal 60
## # … with 14 more rows
server <- function(input, output, session) {
  # Keep only the degrees of injury that are ticked in the checkbox group
  filteredData <- reactive({
    trend_data %>% filter(degree_of_injury %in% input$degree_of_injury_list)
  })
  # Redraw the line chart whenever the selection changes
  output$myplot <- renderPlot({
    ggplot(data = filteredData(), aes(x = year, y = injury_count, color = degree_of_injury)) +
      geom_line(show.legend = TRUE) +
      theme_minimal() +
      xlab("Year") +
      ylab("No. of Injuries") +
      ggtitle("Trend of Workplace Injuries per Degree of Injury \n Fatal vs Major vs Minor") +
      theme(plot.title = element_text(size = 15, face = "bold")) +
      labs(col = "Degree of Injury")
  })
}
shinyApp(ui, server)
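The filtering step inside the reactive can be tried outside Shiny. The sketch below uses a hypothetical helper, `filter_by_degree`, that mirrors the `%in%` filter with base R subsetting, applied to the 2011 rows of trend_data printed above. When no box is ticked, Shiny passes `NULL` for the checkbox input, and the filter then keeps no rows.

```r
# Hypothetical helper that mirrors the filter used in the reactive.
filter_by_degree <- function(data, selected) {
  data[data$degree_of_injury %in% selected, , drop = FALSE]
}

# 2011 rows copied from the trend_data printout above.
toy_trend <- data.frame(
  year = c(2011, 2011, 2011),
  degree_of_injury = c("Fatal", "Major", "Minor"),
  injury_count = c(61, 556, 9504)
)

# No boxes ticked: `x %in% NULL` is FALSE for every row, so nothing is kept.
none_selected <- filter_by_degree(toy_trend, NULL)

# Fatal and Major only: the Minor row is dropped.
fatal_major <- filter_by_degree(toy_trend, c("Fatal", "Major"))

nrow(none_selected)
max(fatal_major$injury_count)
```

With Minor excluded, the largest value left for 2011 is 556 rather than 9504, which is why the y-axis rescales and the smaller series become readable.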
Data Manipulation:
indust_data <- plot_data %>%
  group_by(industry) %>%
  summarise(
    total_count = sum(no._of_injuries),
    # The next three columns count records (rows) per degree of injury,
    # not the sum of no._of_injuries
    fatal_count = sum(degree_of_injury == "Fatal"),
    major_count = sum(degree_of_injury == "Major"),
    minor_count = sum(degree_of_injury == "Minor")
  ) %>%
  arrange(desc(total_count))
indust_data
## # A tibble: 17 x 5
## industry total_count fatal_count major_count minor_count
## <chr> <dbl> <int> <int> <int>
## 1 Manufacturing 21469 61 764 5817
## 2 Construction 17832 139 693 3062
## 3 Others 13804 2 306 1426
## 4 Community, Social & Personal… 8899 14 267 2745
## 5 Transportation & Storage 8009 69 269 2013
## 6 Accommodation & Food Services 8005 4 123 1082
## 7 Wholesale & Retail Trade 5044 19 187 1229
## 8 Administrative & Support Ser… 3410 26 109 1276
## 9 Marine 3103 32 172 803
## 10 Professional, Scientific & T… 3080 4 101 1290
## 11 Real Estate Activities 2614 7 67 568
## 12 Financial & Insurance Servic… 1072 1 40 470
## 13 Water Supply, Sewerage & Was… 1001 8 61 514
## 14 Information & Communications 370 4 8 262
## 15 Agriculture & Fishing 120 2 7 96
## 16 Electricity, Gas and Air-Con… 100 1 8 81
## 17 Mining & Quarrying 23 0 2 21
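In the summary above, total_count sums no._of_injuries while the per-degree columns count rows, which is why the three degree counts do not add up to total_count (for Manufacturing, 61 + 764 + 5817 = 6642 against 21469). If injury totals per degree were wanted instead, one option is to sum no._of_injuries over the matching rows. The sketch below shows both variants side by side on made-up rows; the industries "A" and "B" and all values are illustrative.

```r
library(dplyr)

# Made-up rows for illustration; not taken from the dataset.
toy <- tibble(
  industry = c("A", "A", "A", "B"),
  degree_of_injury = c("Fatal", "Minor", "Minor", "Minor"),
  no._of_injuries = c(2, 10, 5, 7)
)

toy_summary <- toy %>%
  group_by(industry) %>%
  summarise(
    # Number of records with this degree of injury
    fatal_rows = sum(degree_of_injury == "Fatal"),
    minor_rows = sum(degree_of_injury == "Minor"),
    # Number of injuries with this degree of injury
    fatal_injuries = sum(no._of_injuries[degree_of_injury == "Fatal"]),
    minor_injuries = sum(no._of_injuries[degree_of_injury == "Minor"])
  )

toy_summary
```

For industry "A" the two Minor records carry 15 injuries in total, so the row count and the injury count differ.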
Graph Plotting:
final_plot <- plot_ly(indust_data, y = ~industry, x = ~fatal_count, type = 'bar', name = 'Fatal Injuries')
final_plot <- final_plot %>% add_trace(x = ~major_count, name = 'Major Injuries')
final_plot <- final_plot %>% add_trace(x = ~minor_count, name = 'Minor Injuries')
# barmode is a layout-level option, not an axis option
final_plot <- final_plot %>% layout(title = "Degree of Injury Per Industry (2011 - 2018)",
                                    xaxis = list(title = ""),
                                    yaxis = list(title = ""),
                                    barmode = 'group')
final_plot
Due to the huge differences in scale, please double-click a category in the legend to isolate it for a clearer view.
Data Source: http://www.mom.gov.sg/workplace-safety-and-health/wsh-reports-and-statistics
The final data visualization and a short description of not more than 350 words. The description must provide at least two useful information revealed by the data visualization. (4 marks)
Description:
Visualization 1 explores the trend in workplace injuries from 2011 to 2018 by degree of injury: fatal, major and minor. The x-axis represents the year and the y-axis represents the number of injuries.
Visualization 2 shows workplace injuries per industry, with the count on the x-axis and the industry on the y-axis. I used plotly to build this graph and to work around the large differences in scale.
Useful Information:
The number of fatal injuries has greatly decreased over the years, which suggests that workplaces have put better safety measures in place to protect their workers.
Both Manufacturing and Construction are within the top 3 industries in all three categories of injuries. However, injuries sustained in the Construction industry are more likely to be fatal than those in Manufacturing: the fatal count in Construction (139) is more than twice that of Manufacturing (61). On the other hand, injuries sustained in the Manufacturing industry are more likely to be minor than those in Construction: the minor count in Manufacturing (5,817) is nearly twice that of Construction (3,062).
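The comparison between the two industries can be checked with a few lines of arithmetic. The sketch below copies the fatal, major and minor counts for Manufacturing and Construction from the indust_data table in this report; the names `counts`, `fatal_share`, `fatal_ratio` and `minor_ratio` are illustrative.

```r
# Counts copied from the indust_data table in this report.
counts <- data.frame(
  industry = c("Manufacturing", "Construction"),
  fatal = c(61, 139),
  major = c(764, 693),
  minor = c(5817, 3062)
)

# Share of each industry's counted records that are fatal.
counts$fatal_share <- counts$fatal / (counts$fatal + counts$major + counts$minor)

# Construction relative to Manufacturing for fatal counts,
# and Manufacturing relative to Construction for minor counts.
fatal_ratio <- counts$fatal[2] / counts$fatal[1]
minor_ratio <- counts$minor[1] / counts$minor[2]

round(fatal_ratio, 2)               # 2.28
round(minor_ratio, 2)               # 1.9
round(counts$fatal_share * 100, 1)  # 0.9 3.6 (percent)
```

On these counts the fatal share is roughly 0.9% for Manufacturing against 3.6% for Construction, which is consistent with the observation above.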
Thank you.