Business Process Analytics with R (bupaR)
Introducing the dataset
For this exercise, I have used production data from the 4TU data repository. The data is available here
[1] "Case.ID" "Activity" "Resource" "Start.Timestamp"
[5] "Complete.Timestamp" "Span" "Work.Order..Qty" "Part.Desc."
[9] "Worker.ID" "Report.Type" "Qty.Completed" "Qty.Rejected"
[13] "Qty.for.MRB" "Rework"
The column names in the above data need to be mapped to the standard event log nomenclature. This is done with the event log creation command. The standard nomenclature includes the following column inputs:
- case_id - a unique identifier for one complete run of the process (a case),
- activity_id - the name of the individual activity within the process,
- activity_instance_id - an identifier that distinguishes one execution of an activity from another,
- lifecycle_id - the status or outcome of the activity instance,
- timestamp - the time at which the event was logged,
- resource_id - the machine or individual responsible for that instance of the activity.
Before that, however, the date-time columns need to be converted to a proper date-time format. The lubridate library is used for this.
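A minimal sketch of that conversion, assuming the timestamps are stored as text in a year/month/day hour:minute:second layout. The sample strings are made up, and the format of the real columns is an assumption. `ymd_hms()` is lubridate's parser for this layout.

```r
library(lubridate)

# Hypothetical sample values; the real column format is an assumption.
raw_timestamps <- c("2012/01/29 23:24:00", "2012/01/30 05:43:00")

# Parse the text into POSIXct so events can be ordered in time.
parsed <- ymd_hms(raw_timestamps, tz = "UTC")

class(parsed)
difftime(parsed[2], parsed[1], units = "mins")
```

Under that assumption, the same call would be applied to both `Start.Timestamp` and `Complete.Timestamp` before building the event log.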
At this point, an activity instance is added to the logs and an event log is created.
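library(bupaR)
library(dplyr)

# Number the events within each case; every logged row becomes its own activity instance
prod_data_with_instance <- prod_data %>%
  group_by(Case.ID) %>%
  mutate(activity_instance = as.character(row_number()))

# Map the raw columns to the standard event log fields
prod_event <- prod_data_with_instance %>%
  eventlog(
    case_id = "Case.ID",
    activity_id = "Activity",
    activity_instance_id = "activity_instance",
    lifecycle_id = "Rework",
    timestamp = "Complete.Timestamp",
    resource_id = "Worker.ID"
  )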
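To make the activity instance step concrete, here is a small base R sketch of the same per-case row numbering on a made-up data frame. The toy values are hypothetical, and `ave()` stands in for the `group_by()` plus `row_number()` combination. It is an illustration, not the code used for the real data.

```r
# Toy data: two cases with interleaved events (values are made up).
toy <- data.frame(
  Case.ID  = c("Case 1", "Case 1", "Case 2", "Case 1", "Case 2"),
  Activity = c("Turning", "Turning Q.C", "Turning", "Packing", "Packing"),
  stringsAsFactors = FALSE
)

# Number the rows within each case, in their order of appearance.
toy$activity_instance <- as.character(
  ave(seq_along(toy$Case.ID), toy$Case.ID, FUN = seq_along)
)

toy
```

Because every row gets its own number, each logged event is treated as a separate activity instance.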
prod_event %>% n_activities
[1] 55
prod_event %>% n_cases
[1] 225
prod_event %>% n_traces
[1] 221
prod_event %>% n_resources
[1] 49
There are a total of 55 activities, carried out for 225 cases, in 221 unique ways (traces) by 49 resources.
Activity Understanding & Analysis
Let us start by looking at which activities are present and their relative frequencies.
activity_data <- prod_event %>%
  activity_frequency(level = "activity")
# Keep only activities that account for more than 2% of events
activity_data_reduced <- activity_data[activity_data$relative > 0.02, ]
plot(activity_data_reduced)

This shows that final inspection is the most frequent activity, which is to be expected. The turning and milling quality checks are also frequent, followed by lapping and packing.
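As a rough illustration of what a relative activity frequency measures, the sketch below counts events per activity in a made-up vector and divides by the total number of events. The data and the 20% cut-off are hypothetical; the plot above uses a 2% cut-off on the real log.

```r
# Toy activity column (values are made up).
activities <- c("Final Inspection", "Final Inspection", "Final Inspection",
                "Packing", "Packing", "Lapping")

absolute <- table(activities)         # events per activity
relative <- absolute / sum(absolute)  # share of all events

# Keep only activities above a chosen share of events.
frequent <- relative[relative > 0.2]
frequent
```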
Visualizing process maps
Let us start by viewing one of the process maps.
event_reduced<- prod_event[prod_event$Case.ID %in% c("Case 1"),]
event_reduced %>% process_map(type = frequency("relative"))
This shows how the steps of the process play out for one case. As we add more data, we can start to see some of the possible variations in the process. The steps for this case appear to be:
1. Turning and milling
2. Turning and milling Q.C
3. Laser Marking
4. Lapping
5. Round grinding
6. Final Inspection
7. Packing
Self-loops mean that the same activity was logged more than once in a row, which might be due to error messages or logging discrepancies.
If we add two more cases, more insights start to appear. See the plot below for three cases.
event_reduced<- prod_event[prod_event$Case.ID %in% c("Case 1","Case 111","Case 104"),]
event_reduced %>% process_map(type = frequency("relative"))
It is clear that one of the three cases went for turning rather than turning and milling, which also added a turning Q.C step.
The plot above also shows that about 20% of the components go directly from laser marking to the end and 8% go from turning to the end, while about 20% go through the lapping and grinding operations before going to packing and then ending.
If we add a few more cases to the plot above, it becomes cumbersome to read.
Like process maps, resource maps are another way of understanding flow.
event_reduced<- prod_event[prod_event$Case.ID %in% c("Case 1","Case 111","Case 104"),]
event_reduced %>% filter_trace_frequency(percentage = 0.2) %>% resource_map(type = frequency("absolute"))
There are further ways available for exploring processes.
Precedence Matrix
Another way to visualize the process is the precedence matrix, which shows how often one activity directly follows another. In this case, since the logging seems to duplicate activities, the plot is not very insightful.
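The idea behind a precedence matrix can be sketched in base R on a made-up log: pair every event with the next event of the same case and tabulate the pairs. The data and column names are hypothetical, and this is only an illustration of the concept, not the package's implementation.

```r
# Toy log, already sorted by time within each case (values are made up).
toy <- data.frame(
  case     = c("A", "A", "A", "B", "B", "B"),
  activity = c("Turning", "Turning Q.C", "Packing",
               "Turning", "Turning", "Packing"),
  stringsAsFactors = FALSE
)

# Pair each event with the next event of the same case.
pairs <- do.call(rbind, lapply(split(toy$activity, toy$case), function(a) {
  data.frame(antecedent = head(a, -1), consequent = tail(a, -1),
             stringsAsFactors = FALSE)
}))

# Rows are the preceding activity, columns the following one.
precedence <- table(pairs$antecedent, pairs$consequent)
precedence
```

The `Turning` to `Turning` cell in this toy example is the kind of self-loop that duplicated logging produces.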

Resource & activity Analysis
Resource specialization and utilization are another key area that process analytics can help with.
prod_event %>%
  resource_specialisation("resource") %>%
  plot()

The plot above shows, for instance, that ID4932 and ID0937 are generalists, performing up to 15 activity types, while ID3641 and ID3719 are specialists.
prod_event %>%
  resource_specialisation("activity") %>%
  plot()

Final inspection, packing, lapping, round grinding and turning Q.C are specialized activities, each performed by only one resource.
In terms of activities, it would also be useful to understand which activities are always performed, and which are rare.
prod_event %>% activity_presence() %>% plot()

Trace analysis
Trace lengths and their plots show how much variation there is across cases.
prod_event %>% trace_length() %>% plot()

prod_event %>% trace_length()
min q1 median mean q3 max st_dev iqr
1.00000 8.00000 14.00000 20.19111 23.00000 175.00000 20.93024 15.00000
The plot and summary show that the middle 50% of cases have between 8 and 23 steps (the first and third quartiles), although a maximum of 175 steps has been observed as well. The median number of steps is 14, and the mean is about 20.
Do all cases start and end with the same activities? This can be visualized using bar plots as well.
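To see why the quartiles bracket the middle half of the cases, and why a single long case pulls the mean above the median, here is a short sketch on made-up trace lengths. The numbers are hypothetical and chosen only to echo the shape of the summary above.

```r
# Toy trace lengths, one value per case (values are made up).
steps <- c(1, 8, 10, 14, 14, 20, 23, 175)

# First and third quartiles of the trace lengths.
q <- quantile(steps, probs = c(0.25, 0.75))

# Share of cases that fall between the two quartiles.
inside <- mean(steps >= q[1] & steps <= q[2])

q
inside
median(steps)
mean(steps)  # the long case of 175 steps pulls the mean upward
```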
start_activities(prod_event, level = "activity") %>% plot()

While turning and milling seems to be the first operation, machine 6 sees the highest rate of starting points.

As expected, most operations end with final inspection or packing.

The plot above shows how many cases can be described by a relatively small number of traces, which indicates how consistent the process is. Here, 225 cases produce as many as 221 distinct traces, so there is very little consistency. However, this is because the machine number is part of the activity description, which means that otherwise similar steps are treated as different activities.
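The effect can be reproduced on a made-up log: a trace is the ordered sequence of activities in a case, so two cases that differ only in the machine used count as two distinct traces. The labels and their " - Machine N" format below are hypothetical.

```r
# Toy log, sorted by time within each case (values are made up).
toy <- data.frame(
  case     = c("A", "A", "B", "B", "C", "C"),
  activity = c("Turning - Machine 4", "Packing",
               "Turning - Machine 6", "Packing",
               "Turning - Machine 4", "Packing"),
  stringsAsFactors = FALSE
)

# A trace is the ordered activity sequence of one case.
traces <- sapply(split(toy$activity, toy$case), paste, collapse = " > ")

n_cases_toy  <- length(traces)          # number of cases
n_traces_toy <- length(unique(traces))  # number of distinct traces
c(cases = n_cases_toy, traces = n_traces_toy)
```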
Consolidation of activities
Consolidating activities allows relabeling and gives a higher-level view of the process.
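How `prod_event_united` was built is not shown, so the following is only a sketch of the idea on plain strings. It assumes, hypothetically, that activity labels end in a " - Machine N" suffix and strips that suffix so that equal steps share one label.

```r
# Hypothetical labels; the " - Machine N" suffix is an assumed format.
activity <- c("Turning - Machine 4", "Turning - Machine 6",
              "Packing", "Final Inspection Q.C")

# Drop the machine suffix so equal steps share one label.
consolidated <- sub(" - Machine [0-9]+$", "", activity)

unique(activity)      # four distinct labels before consolidation
unique(consolidated)  # three distinct labels after consolidation
```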
event_reduced<- prod_event_united[prod_event_united$Case.ID %in% c("Case 1","Case 111","Case 104"),]
event_reduced %>% filter_trace_frequency(percentage = 0.8) %>% process_map(type = frequency("absolute"))
The 10% most infrequent traces are plotted below with united data.
prod_event_united %>% trace_explorer(coverage = 0.1, type = "infrequent")

