This is the third part of my Project 2 assignment for DATA607 in the Fall 2023 Term at CUNY SPS. In this assignment I import a wide data set, tidy it, and then analyze it. This third data set contains data for a work order process at a manufacturing site.
In this code block, I load the necessary libraries and import the data from my github repository.
library(tidyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.4
## ✔ ggplot2 3.4.4 ✔ stringr 1.5.0
## ✔ lubridate 1.9.2 ✔ tibble 3.2.1
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(knitr)
raw_data <- read.csv("https://raw.githubusercontent.com/Marley-Myrianthopoulos/Data607Project2/main/DATA607_work_order.csv")
kable(raw_data, format = "pipe", caption = "Initial Work Order Data", align = "cccccccc")
Work.Orders | X | Work.Order.Status | X.1 | X.2 | X.3 | X.4 | X.5 |
---|---|---|---|---|---|---|---|
Building | Type | NOASSIGN | WORKASSIGN | WORKPLAN | WORKSCHED | INPROG | COMP |
B7 | Prev Maintenance | 16 | 44 | 47 | 93 | 46 | 2367 |
Corr Maintenacnce | 20 | 7 | 8 | 4 | 16 | 326 | |
B23 | Prev Maintenance | 75 | 47 | 24 | 77 | 41 | 1482 |
Corr Maintenacnce | 8 | 20 | 8 | 3 | 18 | 526 | |
B30 | Prev Maintenance | 59 | 10 | 15 | 54 | 51 | 2345 |
Corr Maintenacnce | 15 | 11 | 18 | 4 | 8 | 123 |
In this code block, I prepare the data for tidying by renaming the columns and removing the first row (which does not contain any useful data).
prep_data <- raw_data[-1,]
colnames(prep_data) <- c("Building", "Type", "None", "Assigned", "Planned", "Scheduled", "In Progress", "Complete")
kable(prep_data, format = "pipe", caption = "Data Prepped for Tidying", align = "cccccccc")
Building | Type | None | Assigned | Planned | Scheduled | In Progress | Complete | |
---|---|---|---|---|---|---|---|---|
2 | B7 | Prev Maintenance | 16 | 44 | 47 | 93 | 46 | 2367 |
3 | Corr Maintenacnce | 20 | 7 | 8 | 4 | 16 | 326 | |
4 | B23 | Prev Maintenance | 75 | 47 | 24 | 77 | 41 | 1482 |
5 | Corr Maintenacnce | 8 | 20 | 8 | 3 | 18 | 526 | |
6 | B30 | Prev Maintenance | 59 | 10 | 15 | 54 | 51 | 2345 |
7 | Corr Maintenacnce | 15 | 11 | 18 | 4 | 8 | 123 |
In this code block, I “tidy” the data by using pivot longer to convert the data into a format that includes a variable for the work order status, rather than having each possible status be a separate column. The data is now “tidy”.
tidy_data <- prep_data %>%
mutate(Building = na_if(Building, "")) %>%
fill(Building) %>%
pivot_longer(
cols = -c("Building", "Type"),
names_to = ("Status"),
values_to = ("Tally")
)
kable(tidy_data, format = "pipe", caption = "Tidy Work Order Data", align = "cllc")
Building | Type | Status | Tally |
---|---|---|---|
B7 | Prev Maintenance | None | 16 |
B7 | Prev Maintenance | Assigned | 44 |
B7 | Prev Maintenance | Planned | 47 |
B7 | Prev Maintenance | Scheduled | 93 |
B7 | Prev Maintenance | In Progress | 46 |
B7 | Prev Maintenance | Complete | 2367 |
B7 | Corr Maintenacnce | None | 20 |
B7 | Corr Maintenacnce | Assigned | 7 |
B7 | Corr Maintenacnce | Planned | 8 |
B7 | Corr Maintenacnce | Scheduled | 4 |
B7 | Corr Maintenacnce | In Progress | 16 |
B7 | Corr Maintenacnce | Complete | 326 |
B23 | Prev Maintenance | None | 75 |
B23 | Prev Maintenance | Assigned | 47 |
B23 | Prev Maintenance | Planned | 24 |
B23 | Prev Maintenance | Scheduled | 77 |
B23 | Prev Maintenance | In Progress | 41 |
B23 | Prev Maintenance | Complete | 1482 |
B23 | Corr Maintenacnce | None | 8 |
B23 | Corr Maintenacnce | Assigned | 20 |
B23 | Corr Maintenacnce | Planned | 8 |
B23 | Corr Maintenacnce | Scheduled | 3 |
B23 | Corr Maintenacnce | In Progress | 18 |
B23 | Corr Maintenacnce | Complete | 526 |
B30 | Prev Maintenance | None | 59 |
B30 | Prev Maintenance | Assigned | 10 |
B30 | Prev Maintenance | Planned | 15 |
B30 | Prev Maintenance | Scheduled | 54 |
B30 | Prev Maintenance | In Progress | 51 |
B30 | Prev Maintenance | Complete | 2345 |
B30 | Corr Maintenacnce | None | 15 |
B30 | Corr Maintenacnce | Assigned | 11 |
B30 | Corr Maintenacnce | Planned | 18 |
B30 | Corr Maintenacnce | Scheduled | 4 |
B30 | Corr Maintenacnce | In Progress | 8 |
B30 | Corr Maintenacnce | Complete | 123 |
Sean shared this data set. Although he did not provide an analysis question, the most logical question to me is whether buildings that conduct more preventative maintenance have to conduct less corrective maintenance. To accomplish this, I will create a bar plot of completed preventative maintenance by building and active corrective maintenance (assigned, planned, scheduled, or in progress) by building.
In this code block, I convert the tally column to integers so I can add them and then create a new data frame with each building’s current active corrective maintenance projects. In order to accomplish this, I had to correct the misspelling of “corr maintenance” from the “Type” column (the misspelling is present in the original data). I could have done both of these adjustments as part of the tidying process but hadn’t realized I would need them at the time and wanted to preserve the order of my process in this project.
tidy_data <- mutate_if(tidy_data, is.character, str_replace_all, pattern = "Corr Maintenacnce", replacement = "Corr Maintenance")
tidy_data$Tally <- as.integer(tidy_data$Tally)
active_status <- c("Assigned", "Planned", "Scheduled", "In Progress")
corr_maintenance_summary <- tidy_data %>%
filter(Status %in% active_status & Type == "Corr Maintenance") %>%
group_by(Building) %>%
mutate(corr_active = sum(Tally)) %>%
select(Building, corr_active) %>%
distinct()
kable(corr_maintenance_summary, format = "pipe", caption = "Active Corrective Maintenance Projects by Building", align = "lc")
Building | corr_active |
---|---|
B7 | 35 |
B23 | 49 |
B30 | 41 |
In this code block, I create a new data frame with each building’s completed preventative maintenance projects and then join the two data frames I’ve created, then pivot them again to get the data ready for a bar plot.
completed_prev <- tidy_data %>%
filter(Status == "Complete" & Type == "Prev Maintenance") %>%
select(Building, Tally) %>%
rename(comp_prev = Tally)
building_data <- inner_join(completed_prev, corr_maintenance_summary, join_by(Building))
building_data <- building_data %>%
pivot_longer(
cols = -"Building",
names_to = "Type",
values_to = "Count"
)
building_data <- mutate_if(building_data, is.character, str_replace_all, pattern = "corr_active", replacement = "Active Corrective")
building_data <- mutate_if(building_data, is.character, str_replace_all, pattern = "comp_prev", replacement = "Completed Preventative")
kable(building_data, format = "pipe", caption = "Comparison of Completed Preventative Maintenance and Active Corrective Maintenance by Building", align = "llc")
Building | Type | Count |
---|---|---|
B7 | Completed Preventative | 2367 |
B7 | Active Corrective | 35 |
B23 | Completed Preventative | 1482 |
B23 | Active Corrective | 49 |
B30 | Completed Preventative | 2345 |
B30 | Active Corrective | 41 |
In this code block, I use ggplot to create bar graphs comparing the completed preventative maintenance projects and the active corrective maintenance projects for each building.
ggplot(building_data, aes(x = Type, y = Count, fill = Building)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Building Maintenance")
As I suspected would be the case, the building with the greatest number of completed preventative maintenance projects (B7) has the smallest number of active corrective maintenance projects, and the building with the smallest number of completed preventative maintenance projects (B23) has the greatest number of active corrective maintenance projects. I’m reminded of the old line, “If you don’t schedule time for maintenance, your equipment will schedule it for you.”