Overview

This is the third part of my Project 2 assignment for DATA607 in the Fall 2023 Term at CUNY SPS. In this assignment I import a wide data set, tidy it, and then analyze it. This third data set contains data for a work order process at a manufacturing site.

Tidying Data

In this code block, I load the necessary libraries and import the data from my github repository.

library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.4
## ✔ ggplot2   3.4.4     ✔ stringr   1.5.0
## ✔ lubridate 1.9.2     ✔ tibble    3.2.1
## ✔ purrr     1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(knitr)

raw_data <- read.csv("https://raw.githubusercontent.com/Marley-Myrianthopoulos/Data607Project2/main/DATA607_work_order.csv")

kable(raw_data, format = "pipe", caption = "Initial Work Order Data", align = "cccccccc")
Initial Work Order Data
Work.Orders X Work.Order.Status X.1 X.2 X.3 X.4 X.5
Building Type NOASSIGN WORKASSIGN WORKPLAN WORKSCHED INPROG COMP
B7 Prev Maintenance 16 44 47 93 46 2367
Corr Maintenacnce 20 7 8 4 16 326
B23 Prev Maintenance 75 47 24 77 41 1482
Corr Maintenacnce 8 20 8 3 18 526
B30 Prev Maintenance 59 10 15 54 51 2345
Corr Maintenacnce 15 11 18 4 8 123

In this code block, I prepare the data for tidying by renaming the columns and removing the first row (which does not contain any useful data).

prep_data <- raw_data[-1,]

colnames(prep_data) <- c("Building", "Type", "None", "Assigned", "Planned", "Scheduled", "In Progress", "Complete")

kable(prep_data, format = "pipe", caption = "Data Prepped for Tidying", align = "cccccccc")
Data Prepped for Tidying
Building Type None Assigned Planned Scheduled In Progress Complete
2 B7 Prev Maintenance 16 44 47 93 46 2367
3 Corr Maintenacnce 20 7 8 4 16 326
4 B23 Prev Maintenance 75 47 24 77 41 1482
5 Corr Maintenacnce 8 20 8 3 18 526
6 B30 Prev Maintenance 59 10 15 54 51 2345
7 Corr Maintenacnce 15 11 18 4 8 123

In this code block, I “tidy” the data by using pivot longer to convert the data into a format that includes a variable for the work order status, rather than having each possible status be a separate column. The data is now “tidy”.

tidy_data <- prep_data %>%
  mutate(Building = na_if(Building, "")) %>%
  fill(Building) %>%
  pivot_longer(
    cols = -c("Building", "Type"),
    names_to = ("Status"),
    values_to = ("Tally")
  )

kable(tidy_data, format = "pipe", caption = "Tidy Work Order Data", align = "cllc")
Tidy Work Order Data
Building Type Status Tally
B7 Prev Maintenance None 16
B7 Prev Maintenance Assigned 44
B7 Prev Maintenance Planned 47
B7 Prev Maintenance Scheduled 93
B7 Prev Maintenance In Progress 46
B7 Prev Maintenance Complete 2367
B7 Corr Maintenacnce None 20
B7 Corr Maintenacnce Assigned 7
B7 Corr Maintenacnce Planned 8
B7 Corr Maintenacnce Scheduled 4
B7 Corr Maintenacnce In Progress 16
B7 Corr Maintenacnce Complete 326
B23 Prev Maintenance None 75
B23 Prev Maintenance Assigned 47
B23 Prev Maintenance Planned 24
B23 Prev Maintenance Scheduled 77
B23 Prev Maintenance In Progress 41
B23 Prev Maintenance Complete 1482
B23 Corr Maintenacnce None 8
B23 Corr Maintenacnce Assigned 20
B23 Corr Maintenacnce Planned 8
B23 Corr Maintenacnce Scheduled 3
B23 Corr Maintenacnce In Progress 18
B23 Corr Maintenacnce Complete 526
B30 Prev Maintenance None 59
B30 Prev Maintenance Assigned 10
B30 Prev Maintenance Planned 15
B30 Prev Maintenance Scheduled 54
B30 Prev Maintenance In Progress 51
B30 Prev Maintenance Complete 2345
B30 Corr Maintenacnce None 15
B30 Corr Maintenacnce Assigned 11
B30 Corr Maintenacnce Planned 18
B30 Corr Maintenacnce Scheduled 4
B30 Corr Maintenacnce In Progress 8
B30 Corr Maintenacnce Complete 123

Data Analysis

Sean shared this data set. Although he did not provide an analysis question, the most logical question to me is whether buildings that conduct more preventative maintenance have to conduct less corrective maintenance. To accomplish this, I will create a bar plot of completed preventative maintenance by building and active corrective maintenance (assigned, planned, scheduled, or in progress) by building.

In this code block, I convert the tally column to integers so I can add them and then create a new data frame with each building’s current active corrective maintenance projects. In order to accomplish this, I had to correct the misspelling of “corr maintenance” from the “Type” column (the misspelling is present in the original data). I could have done both of these adjustments as part of the tidying process but hadn’t realized I would need them at the time and wanted to preserve the order of my process in this project.

tidy_data <- mutate_if(tidy_data, is.character, str_replace_all, pattern = "Corr Maintenacnce", replacement = "Corr Maintenance")

tidy_data$Tally <- as.integer(tidy_data$Tally)

active_status <- c("Assigned", "Planned", "Scheduled", "In Progress")

corr_maintenance_summary <- tidy_data %>%
  filter(Status %in% active_status & Type == "Corr Maintenance") %>%
  group_by(Building) %>%
  mutate(corr_active = sum(Tally)) %>%
  select(Building, corr_active) %>%
  distinct()

kable(corr_maintenance_summary, format = "pipe", caption = "Active Corrective Maintenance Projects by Building", align = "lc")
Active Corrective Maintenance Projects by Building
Building corr_active
B7 35
B23 49
B30 41

In this code block, I create a new data frame with each building’s completed preventative maintenance projects and then join the two data frames I’ve created, then pivot them again to get the data ready for a bar plot.

completed_prev <- tidy_data %>%
  filter(Status == "Complete" & Type == "Prev Maintenance") %>%
  select(Building, Tally) %>%
  rename(comp_prev = Tally)

building_data <- inner_join(completed_prev, corr_maintenance_summary, join_by(Building))

building_data <- building_data %>%
  pivot_longer(
    cols = -"Building",
    names_to = "Type",
    values_to = "Count"
  )

building_data <- mutate_if(building_data, is.character, str_replace_all, pattern = "corr_active", replacement = "Active Corrective")

building_data <- mutate_if(building_data, is.character, str_replace_all, pattern = "comp_prev", replacement = "Completed Preventative")

kable(building_data, format = "pipe", caption = "Comparison of Completed Preventative Maintenance and Active Corrective Maintenance by Building", align = "llc")
Comparison of Completed Preventative Maintenance and Active Corrective Maintenance by Building
Building Type Count
B7 Completed Preventative 2367
B7 Active Corrective 35
B23 Completed Preventative 1482
B23 Active Corrective 49
B30 Completed Preventative 2345
B30 Active Corrective 41

In this code block, I use ggplot to create bar graphs comparing the completed preventative maintenance projects and the active corrective maintenance projects for each building.

ggplot(building_data, aes(x = Type, y = Count, fill = Building)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Building Maintenance")

Findings and Recommendations

As I suspected would be the case, the building with the greatest number of completed preventative maintenance projects (B7) has the smallest number of active corrective maintenance projects, and the building with the smallest number of completed preventative maintenance projects (B23) has the greatest number of active corrective maintenance projects. I’m reminded of the old line, “If you don’t schedule time for maintenance, your equipment will schedule it for you.”