bupaR is an open-source, integrated suite of R packages for handling and analyzing business process data.
library(bupaR) # Core package of the framework. Implements an S3 class for event log objects and provides dplyr functions for them
library(edeaR) # Exploratory & Descriptive Event-Data Analyses
#The process metrics are based on Lean Six Sigma literature and can be analyzed and visualized at different levels of granularity. Additionally, edeaR contains an extensive collection of event-data-specific filters.
library(eventdataR) # Contains the sepsis and patients datasets
#eventdataR is a data package which provides easy access to event logs for testing and experiments. It currently contains both artificial event data (e.g. patients) and real-life event data (e.g. the sepsis dataset).
library(processmapR) # Process Data Specific Visualizations
#The provided visualizations are highly customizable and can be used to give insight into different aspects of the process.
library(processanimateR) #Extension of processmapR, allows process maps to be animated
library(petrinetR) # Building, Visualizing, Exporting, Replaying Petri Nets
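#Petri nets can be visualized and adjusted; one can also perform manual token replay and parse transition sequences.
#A Petri net is a directed bipartite graph of places and transitions. Tokens held in places mark the current state, and firing a transition moves tokens from its input places to its output places.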
library(processmonitR) #Provides a limited set of process dashboards which can be used in a permanent, real-time fashion and for interactive analysis
library(heuristicsmineR) # Heuristics Miner: discovery algorithm that acts on the directly-follows graph. Makes noise visible and finds common constructs
# Basic Packages
library(dplyr) ##pipes
library(tidyr) ##tidy data, particularly the crossing() function
library(lubridate) ##date time manipulation
#Build a Data Frame
t=Sys.time()
data <- data.frame(case = rep("A",5),
activity_id = c("A","B","C","D","E"),
activity_instance_id = 1:5,
lifecycle_id = rep("complete",5),
timestamp = c(t,t+1000,t+2000,t+3000,t+4000),
resource = rep("resource 1", 5))
data
## case activity_id activity_instance_id lifecycle_id timestamp
## 1 A A 1 complete 2020-06-06 15:07:07
## 2 A B 2 complete 2020-06-06 15:23:47
## 3 A C 3 complete 2020-06-06 15:40:27
## 4 A D 4 complete 2020-06-06 15:57:07
## 5 A E 5 complete 2020-06-06 16:13:47
## resource
## 1 resource 1
## 2 resource 1
## 3 resource 1
## 4 resource 1
## 5 resource 1
#How to create an Event Log
first_log <- eventlog(data,case_id = "case",
activity_id = "activity_id",
activity_instance_id = "activity_instance_id",
lifecycle_id = "lifecycle_id",
timestamp = "timestamp",
resource_id = "resource")
first_log
## Log of 5 events consisting of:
## 1 trace
## 1 case
## 5 instances of 5 activities
## 1 resource
## Events occurred from 2020-06-06 15:07:07 until 2020-06-06 16:13:47
##
## Variables were mapped as follows:
## Case identifier: case
## Activity identifier: activity_id
## Resource identifier: resource
## Activity instance identifier: activity_instance_id
## Timestamp: timestamp
## Lifecycle transition: lifecycle_id
##
## # A tibble: 5 x 7
## case activity_id activity_instan~ lifecycle_id timestamp resource
## <chr> <fct> <chr> <fct> <dttm> <fct>
## 1 A A 1 complete 2020-06-06 15:07:07 resourc~
## 2 A B 2 complete 2020-06-06 15:23:47 resourc~
## 3 A C 3 complete 2020-06-06 15:40:27 resourc~
## 4 A D 4 complete 2020-06-06 15:57:07 resourc~
## 5 A E 5 complete 2020-06-06 16:13:47 resourc~
## # ... with 1 more variable: .order <int>
#Summary of Data
summary(patients)
## Number of events: 5442
## Number of cases: 500
## Number of traces: 7
## Number of distinct activities: 7
## Average trace length: 10.884
##
## Start eventlog: 2017-01-02 11:41:53
## End eventlog: 2018-05-05 07:16:02
## handling patient employee handling_id
## Blood test : 474 Length:5442 r1:1000 Length:5442
## Check-out : 984 Class :character r2:1000 Class :character
## Discuss Results : 990 Mode :character r3: 474 Mode :character
## MRI SCAN : 472 r4: 472
## Registration :1000 r5: 522
## Triage and Assessment:1000 r6: 990
## X-Ray : 522 r7: 984
## registration_type time .order
## complete:2721 Min. :2017-01-02 11:41:53 Min. : 1
## start :2721 1st Qu.:2017-05-06 17:15:18 1st Qu.:1361
## Median :2017-09-08 04:16:50 Median :2722
## Mean :2017-09-02 20:52:34 Mean :2722
## 3rd Qu.:2017-12-22 15:44:11 3rd Qu.:4082
## Max. :2018-05-05 07:16:02 Max. :5442
##
slice(patients, 1)
## Log of 12 events consisting of:
## 1 trace
## 1 case
## 6 instances of 6 activities
## 6 resources
## Events occurred from 2017-01-02 11:41:53 until 2017-01-09 19:45:45
##
## Variables were mapped as follows:
## Case identifier: patient
## Activity identifier: handling
## Resource identifier: employee
## Activity instance identifier: handling_id
## Timestamp: time
## Lifecycle transition: registration_type
##
## # A tibble: 12 x 7
## handling patient employee handling_id registration_ty~ time
## <fct> <chr> <fct> <chr> <fct> <dttm>
## 1 Registr~ 1 r1 1 start 2017-01-02 11:41:53
## 2 Triage ~ 1 r2 501 start 2017-01-02 12:40:20
## 3 Blood t~ 1 r3 1001 start 2017-01-05 08:59:04
## 4 MRI SCAN 1 r4 1238 start 2017-01-05 21:37:12
## 5 Discuss~ 1 r6 1735 start 2017-01-07 07:57:49
## 6 Check-o~ 1 r7 2230 start 2017-01-09 17:09:43
## 7 Registr~ 1 r1 1 complete 2017-01-02 12:40:20
## 8 Triage ~ 1 r2 501 complete 2017-01-02 22:32:25
## 9 Blood t~ 1 r3 1001 complete 2017-01-05 14:34:27
## 10 MRI SCAN 1 r4 1238 complete 2017-01-06 01:54:23
## 11 Discuss~ 1 r6 1735 complete 2017-01-07 10:18:08
## 12 Check-o~ 1 r7 2230 complete 2017-01-09 19:45:45
## # ... with 1 more variable: .order <int>
n_activities(patients)
## [1] 7
activity_labels(patients)
## [1] Registration Triage and Assessment Blood test
## [4] MRI SCAN X-Ray Discuss Results
## [7] Check-out
## 7 Levels: Blood test Check-out Discuss Results MRI SCAN ... X-Ray
activities(patients)
## # A tibble: 7 x 3
## handling absolute_frequency relative_frequency
## <fct> <int> <dbl>
## 1 Registration 500 0.184
## 2 Triage and Assessment 500 0.184
## 3 Discuss Results 495 0.182
## 4 Check-out 492 0.181
## 5 X-Ray 261 0.0959
## 6 Blood test 237 0.0871
## 7 MRI SCAN 236 0.0867
activity_presence(patients)
## # A tibble: 7 x 3
## handling absolute relative
## <fct> <int> <dbl>
## 1 Registration 500 1
## 2 Triage and Assessment 500 1
## 3 Discuss Results 495 0.99
## 4 Check-out 492 0.984
## 5 X-Ray 261 0.522
## 6 Blood test 237 0.474
## 7 MRI SCAN 236 0.472
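The two tables above answer different questions: activities() counts activity instances, while activity_presence() counts the cases in which an activity occurs at least once. In the patients log the absolute numbers coincide because no activity is repeated within a case. The base R sketch below shows the difference on a hypothetical toy log (the data frame and its column names are made up for illustration; this is not how edeaR computes the metrics internally).

```r
# Hypothetical toy log: one row per activity instance (not the patients data).
toy <- data.frame(
  case     = c("c1", "c1", "c1", "c2", "c2", "c3"),
  activity = c("Register", "Test", "Test", "Register", "Test", "Register"),
  stringsAsFactors = FALSE
)

# Frequency: number of instances of each activity in the whole log.
abs_frequency <- table(toy$activity)

# Presence: number of distinct cases in which each activity occurs at least once.
# unique() drops the second "Test" row of case c1 before counting.
abs_presence <- table(unique(toy)$activity)
rel_presence <- abs_presence / length(unique(toy$case))

abs_frequency
abs_presence
rel_presence
```

Here "Test" has a frequency of 3 but is present in only 2 of the 3 cases, because case c1 executes it twice.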
end_act_patients <- end_activities(patients, level = "activity")
plot(end_act_patients)
patients %>%
idle_time("resource", units = "days") %>%
plot()
patients %>%
processing_time("activity") %>%
plot
patients %>%
throughput_time("log") %>%
plot()
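Throughput time is the time between the first and the last event of a case. As a rough illustration of what throughput_time("log") summarizes, the following base R sketch computes it per case on a hypothetical toy log (column names and timestamps are assumptions, and edeaR's own implementation is not reproduced here).

```r
# Hypothetical toy log with one timestamp per event.
toy <- data.frame(
  case = c("c1", "c1", "c1", "c2", "c2"),
  timestamp = as.POSIXct(
    c("2017-01-02 08:00:00", "2017-01-02 12:00:00", "2017-01-03 08:00:00",
      "2017-01-05 09:00:00", "2017-01-05 15:00:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Work in seconds since the epoch so that tapply() returns plain numbers.
secs <- as.numeric(toy$timestamp)

# Case level: last event minus first event, expressed in hours.
throughput_hours <- tapply(secs, toy$case, function(x) (max(x) - min(x)) / 3600)

# Log level: summary statistics over all cases.
summary(as.vector(throughput_hours))
```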
patients %>%
resource_frequency("resource") %>% plot
patients %>%
resource_involvement("resource") %>% plot
patients %>%
resource_specialisation("resource") %>% plot
traces(patients)
## # A tibble: 7 x 3
## trace absolute_frequen~ relative_frequen~
## <chr> <int> <dbl>
## 1 Registration,Triage and Assessment,X-Ray,~ 258 0.516
## 2 Registration,Triage and Assessment,Blood ~ 234 0.468
## 3 Registration,Triage and Assessment,Blood ~ 2 0.004
## 4 Registration,Triage and Assessment,X-Ray 2 0.004
## 5 Registration,Triage and Assessment 2 0.004
## 6 Registration,Triage and Assessment,X-Ray,~ 1 0.002
## 7 Registration,Triage and Assessment,Blood ~ 1 0.002
n_traces(patients)
## [1] 7
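A trace is the ordered sequence of activities of a case, and cases with the same sequence share one trace. The sketch below derives trace variants and their frequencies in base R from a hypothetical toy log (names and values are illustrative assumptions, not the patients data, and not bupaR's internal code).

```r
# Hypothetical toy log: three cases, two of which follow the same path.
toy <- data.frame(
  case      = c("c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3"),
  activity  = c("Register", "Triage", "X-Ray",
                "Register", "Triage", "X-Ray",
                "Register", "Triage"),
  timestamp = as.POSIXct("2017-01-02 08:00:00", tz = "UTC") +
    3600 * c(0, 1, 2, 5, 6, 7, 9, 10),
  stringsAsFactors = FALSE
)

# Order events in time within each case, then collapse each case to one string.
toy <- toy[order(toy$case, toy$timestamp), ]
trace_per_case <- tapply(toy$activity, toy$case, paste, collapse = ",")

# Count how many cases follow each distinct trace.
variant_counts   <- sort(table(trace_per_case), decreasing = TRUE)
variant_relative <- variant_counts / sum(variant_counts)

variant_counts
```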
trace_explorer(patients, coverage = 1)
patients %>%
trace_coverage("trace") %>%
plot()
The trace length metric describes the length of traces, i.e. the number of activity instances in each case. It can be computed at the case, trace and log levels.
patients %>%
trace_length("log") %>%
plot
trace_explorer(sepsis, coverage = .10)
trace_length(sepsis)
## min q1 median mean q3 max st_dev iqr
## 3.00000 9.00000 13.00000 14.48952 16.00000 185.00000 11.47594 7.00000
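To make the levels concrete, here is a minimal base R sketch of trace length at the case level and at the log level, using a hypothetical toy log with one row per activity instance (the column names are assumptions; the trace level, which groups cases by variant, is left out for brevity).

```r
# Hypothetical toy log: one row per activity instance.
toy <- data.frame(
  case              = c("c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3", "c3"),
  activity_instance = 1:9,
  stringsAsFactors  = FALSE
)

# Case level: number of activity instances per case.
length_per_case <- table(toy$case)

# Log level: summary statistics over the case-level lengths.
log_summary <- summary(as.vector(length_per_case))

length_per_case
log_summary
```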
sepsis %>%
activity_presence() %>%
plot()
patients %>% activity_presence() %>%
plot
patients %>%
activity_frequency("activity")
## # A tibble: 7 x 3
## handling absolute relative
## <fct> <int> <dbl>
## 1 Registration 500 0.184
## 2 Triage and Assessment 500 0.184
## 3 Discuss Results 495 0.182
## 4 Check-out 492 0.181
## 5 X-Ray 261 0.0959
## 6 Blood test 237 0.0871
## 7 MRI SCAN 236 0.0867
patients %>%
start_activities("resource-activity")
## # A tibble: 1 x 5
## employee handling absolute relative cum_sum
## <fct> <fct> <int> <dbl> <dbl>
## 1 r1 Registration 500 1 1
patients %>%
end_activities("resource-activity")
## # A tibble: 5 x 5
## employee handling absolute relative cum_sum
## <fct> <fct> <int> <dbl> <dbl>
## 1 r7 Check-out 492 0.984 0.984
## 2 r6 Discuss Results 3 0.006 0.99
## 3 r2 Triage and Assessment 2 0.004 0.994
## 4 r5 X-Ray 2 0.004 0.998
## 5 r3 Blood test 1 0.002 1
In the patients dataset, each activity is performed by a single resource and occurs at most once per trace, so it shows no rework. The sepsis dataset is used for the rework examples instead.
Redo repetitions are executions of an activity type that do not immediately follow the previous execution of that type and that are performed by a different resource than the first occurrence of this activity type.
n_reps_per_resource <- number_of_repetitions(sepsis, level = "resource")
plot(n_reps_per_resource)
n_reps_per_activity <- number_of_repetitions(sepsis, level = "activity")
plot(n_reps_per_activity)
n_reps_per_eventlog <- number_of_repetitions(sepsis, level = "log")
plot(n_reps_per_eventlog)
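The definition above can be sketched in base R for a single case. The helper classify_repetitions() is hypothetical and deliberately simplified: it treats every row as one activity instance, assumes the rows are already ordered in time, and labels a repetition by the same resource as "repeat" and one by a different resource as "redo". It is not edeaR's implementation.

```r
# Hypothetical single-case log, ordered in time.
toy <- data.frame(
  activity = c("A", "B", "A", "C", "A", "B"),
  resource = c("r1", "r2", "r1", "r3", "r2", "r2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: label each row as NA (no repetition), "repeat" or "redo".
classify_repetitions <- function(activity, resource) {
  type <- rep(NA_character_, length(activity))
  for (i in seq_along(activity)) {
    earlier <- which(activity[seq_len(i - 1)] == activity[i])
    if (length(earlier) == 0) next              # first occurrence of this activity
    if (activity[i - 1] == activity[i]) next    # immediately follows itself
    first <- earlier[1]
    # Compare with the resource of the first occurrence of this activity.
    type[i] <- if (resource[i] == resource[first]) "repeat" else "redo"
  }
  type
}

toy$type <- classify_repetitions(toy$activity, toy$resource)
table(toy$type)
```

In this toy case the third row repeats "A" with the same resource (a repeat), while the fifth row executes "A" with a different resource than the first occurrence (a redo).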
bupaR has built-in functionality for generating, visualizing and interpreting process maps.
Basic Process Map
# Draw process map
process_map(patients)
animate_process(patients)
animate_process(patients, mode = "relative", jitter = 10, legend = "color",
mapping = token_aes(color = token_scale("employee",
scale = "ordinal",
range = RColorBrewer::brewer.pal(7, "Paired"))))
animate_process(patients,
legend = "color",
mapping = token_aes(color = token_scale("employee",
scale = "ordinal",
range = RColorBrewer::brewer.pal(8, "Paired"))))
#convert numeric value into days
my_flags <- data.frame(value = c(0,2,4,8,16)) %>%
mutate(day = days(value))
#The crossing() function joins the cases of ‘patients’ to ‘my_flags’ and creates all possible combinations.
my_timeflags <- patients %>%
cases %>%
crossing(my_flags) %>% ##similar to a SQL cross join
mutate(time = start_timestamp + day) %>%
filter(time <= complete_timestamp) %>%
select("case" = patient,time,value) ##must be case, time, value
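The crossing step pairs every case with every flag value and then keeps only the flags that fall within the case's duration. The same idea can be sketched in base R with merge(..., by = NULL), which returns the Cartesian product. The two data frames below are hypothetical stand-ins for the cases table and my_flags, and days are approximated as 86400 seconds rather than lubridate periods.

```r
# Hypothetical stand-in for the case table (one row per case).
cases_toy <- data.frame(
  patient            = c("1", "2"),
  start_timestamp    = as.POSIXct(c("2017-01-02 08:00:00", "2017-01-03 08:00:00"), tz = "UTC"),
  complete_timestamp = as.POSIXct(c("2017-01-05 08:00:00", "2017-01-20 08:00:00"), tz = "UTC"),
  stringsAsFactors   = FALSE
)

# Flag values in days, as in my_flags.
flags_toy <- data.frame(value = c(0, 2, 4, 8, 16))

# Cartesian product: every case combined with every flag (2 x 5 rows).
all_pairs <- merge(cases_toy, flags_toy, by = NULL)

# Time at which each flag would be reached, then keep only reachable flags.
all_pairs$time <- all_pairs$start_timestamp + all_pairs$value * 86400
kept <- all_pairs[all_pairs$time <= all_pairs$complete_timestamp, ]

kept[order(kept$patient, kept$value), c("patient", "time", "value")]
```

Patient 1 lasts three days and keeps only the flags 0 and 2, while patient 2 lasts seventeen days and keeps all five.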
patients %>%
animate_process(mode ="absolute",
jitter=10,
legend = "color",
mapping = token_aes(
color = token_scale(my_timeflags
, scale = "ordinal"
, domain = my_flags$value
, range = rev(RColorBrewer::brewer.pal(5,"Spectral"))
)))
animate_process(sample_n(traffic_fines, 1000) %>% filter_trace_frequency(percentage = 0.95),
mode = "relative",
legend = "color",
mapping = token_aes(color = token_scale("amount",
scale = "linear",
range = c("yellow","red"))))
animate_process(patients,
mapping = token_aes(color = token_scale("time",
scale = "time",
range = c("blue","red"))))
library(processcheckR)
sepsis %>%
# check if cases start with "ER Registration"
check_rule(starts("ER Registration"), label = "r1") %>%
# check if activities "CRP" and "LacticAcid" occur together
check_rule(and("CRP","LacticAcid"), label = "r2") %>%
group_by(r1, r2) %>%
n_cases()
## # A tibble: 4 x 3
## # Groups: r1 [2]
## r1 r2 n_cases
## <lgl> <lgl> <int>
## 1 FALSE FALSE 10
## 2 FALSE TRUE 45
## 3 TRUE FALSE 137
## 4 TRUE TRUE 858
sepsis %>%
filter_rules(
r1 = starts("ER Registration"),
r2 = and("CRP","LacticAcid")) %>%
n_cases()
## [1] 858
sepsis %>%
check_rule(contains("Leucocytes", n = 3)) %>%
group_by(contains_Leucocytes_3) %>%
n_cases()
## # A tibble: 2 x 2
## contains_Leucocytes_3 n_cases
## <lgl> <int>
## 1 FALSE 590
## 2 TRUE 460
sepsis %>%
check_rule(contains_exactly("Leucocytes", n = 4), label = "r1") %>%
group_by(r1) %>%
n_cases()
## # A tibble: 2 x 2
## r1 n_cases
## <lgl> <int>
## 1 FALSE 960
## 2 TRUE 90
sepsis %>%
check_rule(contains_between("Leucocytes", min = 0, max = 10), label = "r1") %>%
group_by(r1) %>%
n_cases()
## # A tibble: 2 x 2
## r1 n_cases
## <lgl> <int>
## 1 FALSE 38
## 2 TRUE 1012
sepsis %>%
check_rule(starts("ER Registration"), label = "r1") %>%
group_by(r1) %>%
n_cases()
## # A tibble: 2 x 2
## r1 n_cases
## <lgl> <int>
## 1 FALSE 55
## 2 TRUE 995
sepsis %>%
check_rule(ends("Release A"), label = "r1") %>%
group_by(r1) %>%
n_cases()
## # A tibble: 2 x 2
## r1 n_cases
## <lgl> <int>
## 1 FALSE 657
## 2 TRUE 393
*In how many cases is “ER Sepsis Triage” succeeded by “CRP”?
sepsis %>%
check_rule(succession("ER Sepsis Triage","CRP"), label = "r1") %>%
group_by(r1) %>%
n_cases()
## # A tibble: 1 x 2
## r1 n_cases
## <lgl> <int>
## 1 FALSE 1050
sepsis %>%
check_rule(response("ER Sepsis Triage","CRP"), label = "r1") %>%
group_by(r1) %>%
n_cases()
## # A tibble: 2 x 2
## r1 n_cases
## <lgl> <int>
## 1 FALSE 1049
## 2 TRUE 1
sepsis %>%
check_rule(precedence("ER Sepsis Triage","CRP"), label = "r1") %>%
group_by(r1) %>%
n_cases()
## # A tibble: 2 x 2
## r1 n_cases
## <lgl> <int>
## 1 FALSE 1007
## 2 TRUE 43
*Responded existence: if activity a occurs, activity b must also occur (but not vice versa). In how many cases does “ER Sepsis Triage” also occur if “CRP” occurs?
sepsis %>%
check_rule(responded_existence("CRP", "ER Sepsis Triage"), label = "r1") %>%
group_by(r1) %>%
n_cases()
## # A tibble: 2 x 2
## r1 n_cases
## <lgl> <int>
## 1 FALSE 1
## 2 TRUE 1049
sepsis %>%
check_rule(and("CRP", "ER Sepsis Triage"), label = "r1") %>%
group_by(r1) %>%
n_cases()
## # A tibble: 2 x 2
## r1 n_cases
## <lgl> <int>
## 1 FALSE 44
## 2 TRUE 1006
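The difference between responded_existence(a, b) and and(a, b) is easiest to see on a few hand-made traces. The base R sketch below follows the reading given above (if a occurs, b must occur too) and is only an illustration on hypothetical traces; it does not call processcheckR and makes no claim about its internals.

```r
# Hypothetical traces, one character vector per case.
toy_traces <- list(
  c1 = c("ER Registration", "ER Sepsis Triage", "CRP"),
  c2 = c("ER Registration", "ER Sepsis Triage"),
  c3 = c("ER Registration", "CRP"),
  c4 = c("ER Registration")
)

# Does a given activity occur in each case?
has <- function(activity) {
  vapply(toy_traces, function(tr) activity %in% tr, logical(1))
}

has_crp    <- has("CRP")
has_triage <- has("ER Sepsis Triage")

# and(a, b): both activities occur in the case.
and_rule <- has_crp & has_triage

# responded_existence(a, b): if a occurs, b occurs as well.
# Cases without a satisfy the rule trivially.
resp_exist <- !has_crp | has_triage

data.frame(has_crp, has_triage, and_rule, resp_exist)
```

Only case c3, which contains "CRP" without "ER Sepsis Triage", violates responded existence, whereas the `and` rule holds for case c1 alone.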
sepsis %>%
check_rule(xor("CRP", "ER Sepsis Triage"), label = "r1") %>%
group_by(r1) %>%
n_cases()
## # A tibble: 2 x 2
## r1 n_cases
## <lgl> <int>
## 1 FALSE 1006
## 2 TRUE 44
bupaR supports the following discovery algorithms: Alpha Miner, Inductive Miner and Heuristics Miner.
#library(pm4py) # Bridge between bupaR and PM4Py, the Python process mining library
#use only complete timestamp
#patients_completes <- patients %>% filter_lifecycle("complete")
#discovery_alpha(patients_completes) -> PN
#PN %>% str
#PN$petrinet %>% render_PN()
#discovery_alpha(patients_completes, variant = variant_alpha_plus()) -> PN
#PN$petrinet %>% render_PN()
#use only complete timestamp
#patients_completes <- patients %>% filter_lifecycle("complete")
#discovery_inductive(patients_completes, variant = variant_inductive_only_dfg()) -> PN
#PN %>% str
#PN$petrinet %>% render_PN()
Heuristics Miner is an algorithm which acts on the directly-follows graph. It makes noise visible and finds common constructs (e.g. the dependency between two activities, XOR/AND). The output here is a Heuristics Net, an object that contains the activities and the relationships between them. This algorithm is best used either for real-life data without too many different events or for generating a Petri net.
Dependency graph / matrix
dependency_matrix(patients) %>% render_dependency_matrix()
m <- precedence_matrix_absolute(patients)
as.matrix(m)
## consequent
## antecedent Blood test Check-out Discuss Results End MRI SCAN
## Blood test 0 0 0 1 236
## Check-out 0 0 0 492 0
## Discuss Results 0 492 0 3 0
## End 0 0 0 0 0
## MRI SCAN 0 0 236 0 0
## Registration 0 0 0 0 0
## Start 0 0 0 0 0
## Triage and Assessment 237 0 0 2 0
## X-Ray 0 0 259 2 0
## consequent
## antecedent Registration Start Triage and Assessment X-Ray
## Blood test 0 0 0 0
## Check-out 0 0 0 0
## Discuss Results 0 0 0 0
## End 0 0 0 0
## MRI SCAN 0 0 0 0
## Registration 0 0 500 0
## Start 500 0 0 0
## Triage and Assessment 0 0 0 261
## X-Ray 0 0 0 0
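Directly-follows counts like the ones above are the input to the dependency measure. In the classic Heuristics Miner, the dependency of b on a is (|a>b| - |b>a|) / (|a>b| + |b>a| + 1), where |a>b| is the number of times a is directly followed by b. The base R sketch below applies that formula to three counts copied from the matrix above; it is an illustration of the textbook formula, and heuristicsmineR's dependency_matrix() may differ in its defaults and details.

```r
# A subset of the directly-follows counts shown above.
acts <- c("Registration", "Triage and Assessment", "Blood test", "X-Ray")
df_counts <- matrix(0, nrow = 4, ncol = 4,
                    dimnames = list(antecedent = acts, consequent = acts))
df_counts["Registration", "Triage and Assessment"] <- 500
df_counts["Triage and Assessment", "Blood test"]   <- 237
df_counts["Triage and Assessment", "X-Ray"]        <- 261

# Classic dependency measure for a != b.
dependency <- (df_counts - t(df_counts)) / (df_counts + t(df_counts) + 1)

# Self-loops use |a>a| / (|a>a| + 1); all zero in this subset.
diag(dependency) <- diag(df_counts) / (diag(df_counts) + 1)

round(dependency, 3)

# Relations that would survive a threshold of 0.7.
which(dependency >= 0.7, arr.ind = TRUE)
```

Values close to 1 indicate a strong one-directional dependency, and the reverse direction receives the same value with a negative sign.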
cn <- causal_net(patients, threshold = .7)
pn <- as.petrinet(cn)
render_PN(pn)
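A Petri net consists of places, transitions and arcs, and replaying a trace means firing transitions while moving tokens between places. The base R sketch below replays two activities on a hypothetical three-place net. The place names, the matrices and the helpers enabled() and fire() are assumptions made for illustration and are not part of petrinetR.

```r
places      <- c("p_start", "p_mid", "p_end")
transitions <- c("Registration", "Triage and Assessment")

# pre[p, tr]: tokens that transition tr consumes from place p.
pre <- matrix(c(1, 0, 0,
                0, 1, 0),
              nrow = 3, dimnames = list(places, transitions))

# post[p, tr]: tokens that transition tr produces in place p.
post <- matrix(c(0, 1, 0,
                 0, 0, 1),
               nrow = 3, dimnames = list(places, transitions))

# A transition is enabled when every input place holds enough tokens.
enabled <- function(marking, tr) all(marking >= pre[, tr])

# Firing consumes the input tokens and produces the output tokens.
fire <- function(marking, tr) {
  stopifnot(enabled(marking, tr))
  marking - pre[, tr] + post[, tr]
}

m0 <- c(p_start = 1, p_mid = 0, p_end = 0)   # initial marking
enabled(m0, "Triage and Assessment")         # not enabled before Registration
m1 <- fire(m0, "Registration")
m2 <- fire(m1, "Triage and Assessment")
m2
```

After both transitions have fired, the single token has moved from p_start to p_end, which is what a successful replay of this two-step trace looks like.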
animate_process(patients,
mapping = token_aes(shape = "image",
size = token_scale(10),
image = token_scale("https://upload.wikimedia.org/wikipedia/en/5/5f/Pacman.gif")))
The bupaR package is developed by volunteers and academic researchers. Shout out to: Janssenswillen, G., Depaire, B., Swennen, M., Jans, M., & Vanhoof, K. (2019). bupaR: Enabling reproducible business process analysis. Knowledge-Based Systems, 163, 927-930.
###########Use purrr to apply to all packages - WIP
citation("processmapR")
##
## To cite package 'processmapR' in publications use:
##
## Gert Janssenswillen (2020). processmapR: Construct Process Maps Using
## Event Data. R package version 0.3.4.
## https://CRAN.R-project.org/package=processmapR
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {processmapR: Construct Process Maps Using Event Data},
## author = {Gert Janssenswillen},
## year = {2020},
## note = {R package version 0.3.4},
## url = {https://CRAN.R-project.org/package=processmapR},
## }