library(tidyverse)
library(knitr)
library(dplyr)
library(ggplot2)
In the following code block, I load the csv of Sean’s work order data and glimpse the data frame. I am interested in determining if there is a backlog in any one step of the work order process, and specifically, if this varies by whether or not the work order is preventative or corrective.
work_order_url <- "https://raw.githubusercontent.com/mollysiebecker/DATA-607/main/work_order_csv_untidy.csv"
work_order <- read.csv(url(work_order_url))
head(work_order)
## Work.Orders X Work.Order.Status X.1 X.2 X.3
## 1 Building Type NOASSIGN WORKASSIGN WORKPLAN WORKSCHED
## 2 B7 Prev Maintenance 16 44 47 93
## 3 Corr Maintenacnce 20 7 8 4
## 4 B23 Prev Maintenance 75 47 24 77
## 5 Corr Maintenacnce 8 20 8 3
## 6 B30 Prev Maintenance 59 10 15 54
## X.4 X.5
## 1 INPROG COMP
## 2 46 2367
## 3 16 326
## 4 41 1482
## 5 18 526
## 6 51 2345
In the following code block, I drop the first row, rename the columns, fill in the missing values in the Building column, and pivot longer.
work_order <- work_order[-1, ]
work_order <- work_order %>%
rename("building" = "Work.Orders", "maintenance_type" = "X", "no_assign" = "Work.Order.Status", "work_assign" = "X.1", "work_plan" = "X.2", "work_sched" = "X.3", "in_progress" = "X.4", "complete" = "X.5") %>%
mutate(building = na_if(building, "")) %>%
fill(building) %>%
pivot_longer(cols = 3:8, names_to = "status", values_to = "count")
Below, I replace values for greater clarity, and display the resulting tidy data frame using kable.
work_order$status[work_order$status == "no_assign"] <- "not assigned"
work_order$status[work_order$status == "work_assign"] <- "assigned"
work_order$status[work_order$status == "work_plan"] <- "planned"
work_order$status[work_order$status == "work_sched"] <- "scheduled"
work_order$status[work_order$status == "in_progress"] <- "in progress"
work_order$status[work_order$status == "complete"] <- "completed"
work_order$maintenance_type[work_order$maintenance_type == "Prev Maintenance"] <- "preventative"
work_order$maintenance_type[work_order$maintenance_type == "Corr Maintenacnce"] <- "corrective"
kable(work_order, format = "pipe", col.names = c("Building", "Maintenance Type", "Status", "Count"), caption = "Work Order Status by Building and Maintenance Type", align = "c")
Building | Maintenance Type | Status | Count |
---|---|---|---|
B7 | preventative | not assigned | 16 |
B7 | preventative | assigned | 44 |
B7 | preventative | planned | 47 |
B7 | preventative | scheduled | 93 |
B7 | preventative | in progress | 46 |
B7 | preventative | completed | 2367 |
B7 | corrective | not assigned | 20 |
B7 | corrective | assigned | 7 |
B7 | corrective | planned | 8 |
B7 | corrective | scheduled | 4 |
B7 | corrective | in progress | 16 |
B7 | corrective | completed | 326 |
B23 | preventative | not assigned | 75 |
B23 | preventative | assigned | 47 |
B23 | preventative | planned | 24 |
B23 | preventative | scheduled | 77 |
B23 | preventative | in progress | 41 |
B23 | preventative | completed | 1482 |
B23 | corrective | not assigned | 8 |
B23 | corrective | assigned | 20 |
B23 | corrective | planned | 8 |
B23 | corrective | scheduled | 3 |
B23 | corrective | in progress | 18 |
B23 | corrective | completed | 526 |
B30 | preventative | not assigned | 59 |
B30 | preventative | assigned | 10 |
B30 | preventative | planned | 15 |
B30 | preventative | scheduled | 54 |
B30 | preventative | in progress | 51 |
B30 | preventative | completed | 2345 |
B30 | corrective | not assigned | 15 |
B30 | corrective | assigned | 11 |
B30 | corrective | planned | 18 |
B30 | corrective | scheduled | 4 |
B30 | corrective | in progress | 8 |
B30 | corrective | completed | 123 |
Below, I coerce the count variable to be numeric, and specify the correct order of the work status to aid in visualizing the data later.
work_order$count <- as.numeric(work_order$count)
work_flow_order <- c("not assigned", "assigned", "planned", "scheduled", "in progress", "completed")
work_order$status <- factor(work_order$status, levels = work_flow_order)
Below, I create a new data frame that finds the total number of projects in each maintenance category, grouped by status, then display this data in a bar graph.
work_order_summary <- work_order %>%
group_by(maintenance_type, status) %>%
summarize(total_orders = sum(count), .groups = "drop")
ggplot(work_order_summary, aes(x = maintenance_type, y = total_orders, fill = status)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Number of Work Projects by Maintenance Type",
x = "Maintenance Type", y = "Count") +
scale_fill_brewer(palette = "Set3")
Since it is clear that there are vastly more projects that are completed, I create a new data frame filtering these out, to aid in visualizing how the remaining work order statuses compare.
work_order_summary_incomplete <- work_order_summary %>%
filter(!(status == "completed"))
ggplot(work_order_summary_incomplete, aes(x = maintenance_type, y = total_orders, fill = status)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Number of Work Projects by Maintenance Type (Incomplete Projects)",
x = "Maintenance Type", y = "Count") +
scale_fill_brewer(palette = "Set3")
Many more preventative than corrective orders have been placed, and for both types, most of the orders are completed. Out of those that are not completed, preventative and corrective maintenance follow a similar pattern for all except “scheduled” work orders. Scheduled work orders are the most common status for preventative maintenance, and the least common for corrective maintenance. One possible interpretation is that corrective maintenance is likely to be more urgent than preventative maintenance, and therefore when corrective maintenance is scheduled, it is likely to be scheduled to happen very soon, and quickly switch from “scheduled” to “in progress,” whereas preventative maintenance may be more likely to be scheduled further in advance, resulting in the higher number of scheduled projects. Further data collection and analysis could confirm this by gathering the dates on which each project is moved into each status, to find the average time that corrective maintanence and preventative maintenance projects stay in “scheduled.”