library(tidyverse)
library(knitr)
library(dplyr)
library(ggplot2)

Tidying the Data

In the following code block, I load the csv of Sean’s work order data and glimpse the data frame. I am interested in determining if there is a backlog in any one step of the work order process, and specifically, if this varies by whether or not the work order is preventative or corrective.

work_order_url <- "https://raw.githubusercontent.com/mollysiebecker/DATA-607/main/work_order_csv_untidy.csv"
work_order <- read.csv(url(work_order_url))
head(work_order)
##   Work.Orders                 X Work.Order.Status        X.1      X.2       X.3
## 1    Building              Type          NOASSIGN WORKASSIGN WORKPLAN WORKSCHED
## 2          B7  Prev Maintenance                16         44       47        93
## 3             Corr Maintenacnce                20          7        8         4
## 4         B23  Prev Maintenance                75         47       24        77
## 5             Corr Maintenacnce                 8         20        8         3
## 6         B30  Prev Maintenance                59         10       15        54
##      X.4  X.5
## 1 INPROG COMP
## 2     46 2367
## 3     16  326
## 4     41 1482
## 5     18  526
## 6     51 2345

In the following code block, I drop the first row, rename the columns, fill in the missing values in the Building column, and pivot longer.

work_order <- work_order[-1, ]
work_order <- work_order %>%
  rename("building" = "Work.Orders", "maintenance_type" = "X", "no_assign" = "Work.Order.Status", "work_assign" = "X.1", "work_plan" = "X.2", "work_sched" = "X.3", "in_progress" = "X.4", "complete" = "X.5") %>%
  mutate(building = na_if(building, "")) %>%
  fill(building) %>%
  pivot_longer(cols = 3:8, names_to = "status", values_to = "count")

Below, I replace values for greater clarity, and display the resulting tidy data frame using kable.

work_order$status[work_order$status == "no_assign"] <- "not assigned"
work_order$status[work_order$status == "work_assign"] <- "assigned"
work_order$status[work_order$status == "work_plan"] <- "planned"
work_order$status[work_order$status == "work_sched"] <- "scheduled"
work_order$status[work_order$status == "in_progress"] <- "in progress"
work_order$status[work_order$status == "complete"] <- "completed"
work_order$maintenance_type[work_order$maintenance_type == "Prev Maintenance"] <- "preventative"
work_order$maintenance_type[work_order$maintenance_type == "Corr Maintenacnce"] <- "corrective"

kable(work_order, format = "pipe", col.names = c("Building", "Maintenance Type", "Status", "Count"), caption = "Work Order Status by Building and Maintenance Type", align = "c")
Work Order Status by Building and Maintenance Type
Building Maintenance Type Status Count
B7 preventative not assigned 16
B7 preventative assigned 44
B7 preventative planned 47
B7 preventative scheduled 93
B7 preventative in progress 46
B7 preventative completed 2367
B7 corrective not assigned 20
B7 corrective assigned 7
B7 corrective planned 8
B7 corrective scheduled 4
B7 corrective in progress 16
B7 corrective completed 326
B23 preventative not assigned 75
B23 preventative assigned 47
B23 preventative planned 24
B23 preventative scheduled 77
B23 preventative in progress 41
B23 preventative completed 1482
B23 corrective not assigned 8
B23 corrective assigned 20
B23 corrective planned 8
B23 corrective scheduled 3
B23 corrective in progress 18
B23 corrective completed 526
B30 preventative not assigned 59
B30 preventative assigned 10
B30 preventative planned 15
B30 preventative scheduled 54
B30 preventative in progress 51
B30 preventative completed 2345
B30 corrective not assigned 15
B30 corrective assigned 11
B30 corrective planned 18
B30 corrective scheduled 4
B30 corrective in progress 8
B30 corrective completed 123

Analysis

Below, I coerce the count variable to be numeric, and specify the correct order of the work status to aid in visualizing the data later.

work_order$count <- as.numeric(work_order$count)
work_flow_order <- c("not assigned", "assigned", "planned", "scheduled", "in progress", "completed")
work_order$status <- factor(work_order$status, levels = work_flow_order)

Below, I create a new data frame that finds the total number of projects in each maintenance category, grouped by status, then display this data in a bar graph.

work_order_summary <- work_order %>%
  group_by(maintenance_type, status) %>%
  summarize(total_orders = sum(count), .groups = "drop")

ggplot(work_order_summary, aes(x = maintenance_type, y = total_orders, fill = status)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Number of Work Projects by Maintenance Type",
       x = "Maintenance Type", y = "Count") +
  scale_fill_brewer(palette = "Set3")

Since it is clear that there are vastly more projects that are completed, I create a new data frame filtering these out, to aid in visualizing how the remaining work order statuses compare.

work_order_summary_incomplete <- work_order_summary %>%
  filter(!(status == "completed")) 

ggplot(work_order_summary_incomplete, aes(x = maintenance_type, y = total_orders, fill = status)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Number of Work Projects by Maintenance Type (Incomplete Projects)",
       x = "Maintenance Type", y = "Count") +
  scale_fill_brewer(palette = "Set3")

Findings and Recommendations

Many more preventative than corrective orders have been placed, and for both types, most of the orders are completed. Out of those that are not completed, preventative and corrective maintenance follow a similar pattern for all except “scheduled” work orders. Scheduled work orders are the most common status for preventative maintenance, and the least common for corrective maintenance. One possible interpretation is that corrective maintenance is likely to be more urgent than preventative maintenance, and therefore when corrective maintenance is scheduled, it is likely to be scheduled to happen very soon, and quickly switch from “scheduled” to “in progress,” whereas preventative maintenance may be more likely to be scheduled further in advance, resulting in the higher number of scheduled projects. Further data collection and analysis could confirm this by gathering the dates on which each project is moved into each status, to find the average time that corrective maintanence and preventative maintenance projects stay in “scheduled.”