Install
Create Event Log Object
Basic Event Log Functionalities
Reading and Writing XES-files
Data sets in bupaR
Exploratory and Descriptive Event Data Analysis
Event data subsetting
- EVENT FILTERS
- CASE FILTERS
Process Visualization
Process Dashboard

Install

install.packages("bupaR")
install.packages("edeaR")
install.packages("eventdataR")
install.packages("processmapR")
install.packages("processmonitR")
install.packages("xesreadR")
install.packages("petrinetR")

Then load the package:

library(bupaR)

## 
## Attaching package: 'bupaR'

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:utils':
## 
##     timestamp

Create Event Log Object

An eventlog object can be created with the eventlog function. It takes a data.frame and the names of the columns that hold the case identifier, activity identifier, timestamp, lifecycle transition, resource and activity instance identifier.

An event log with only the minimal requirements (timestamp, case identifier and activity identifier) can be created with the simple_eventlog function.

For example:

library(lubridate)

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

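# Toy data: one case ("A") with five activities.
# now() + ddays(5) is a single value, so it is recycled and
# all five events share the same timestamp.
data <- data.frame(case = rep("A", 5),
                   activity_id = c("A", "B", "C", "D", "E"),
                   activity_instance_id = 1:5,
                   lifecycle_id = rep("complete", 5),
                   timestamp = now() + ddays(5),
                   resource = rep("resource 1", 5))

# Map each column of the data.frame to its role in the event log.
Meventlog <- eventlog(data,
                      case_id = "case",
                      activity_id = "activity_id",
                      activity_instance_id = "activity_instance_id",
                      lifecycle_id = "lifecycle_id",
                      timestamp = "timestamp",
                      resource_id = "resource")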
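The minimal variant mentioned above, simple_eventlog, can be sketched in the same way. The data frame below is made up for illustration (the column names `case`, `activity` and `timestamp` and the three events are assumptions, not part of the original example):

```r
library(bupaR)

# Hypothetical toy data: two cases, three events in total.
events <- data.frame(
  case      = c("A", "A", "B"),
  activity  = c("register", "check-out", "register"),
  timestamp = as.POSIXct(c("2022-03-17 09:00:00",
                           "2022-03-17 10:30:00",
                           "2022-03-17 09:15:00"), tz = "UTC")
)

# Only the three mandatory identifiers are mapped.
minimal_log <- simple_eventlog(events,
                               case_id     = "case",
                               activity_id = "activity",
                               timestamp   = "timestamp")

n_cases(minimal_log)
n_events(minimal_log)
```

Because no lifecycle or activity instance columns are supplied here, this form is mainly suited to logs in which every row is one completed activity.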

Basic Event Log Functionalities

Create a summary of an eventlog-object.

summary(Meventlog)

## Number of events:  5
## Number of cases:  1
## Number of traces:  1
## Number of distinct activities:  5
## Average trace length:  5
## 
## Start eventlog:  2022-03-22 04:26:53
## End eventlog:  2022-03-22 04:26:53

##      case           activity_id activity_instance_id   lifecycle_id
##  Length:5           A:1         Length:5             complete:5    
##  Class :character   B:1         Class :character                   
##  Mode  :character   C:1         Mode  :character                   
##                     D:1                                            
##                     E:1                                            
##                                                                    
##    timestamp                         resource     .order 
##  Min.   :2022-03-22 04:26:53   resource 1:5   Min.   :1  
##  1st Qu.:2022-03-22 04:26:53                  1st Qu.:2  
##  Median :2022-03-22 04:26:53                  Median :3  
##  Mean   :2022-03-22 04:26:53                  Mean   :3  
##  3rd Qu.:2022-03-22 04:26:53                  3rd Qu.:4  
##  Max.   :2022-03-22 04:26:53                  Max.   :5

bupaR::activities(Meventlog)

## # A tibble: 5 × 3
##   activity_id absolute_frequency relative_frequency
##   <fct>                    <int>              <dbl>
## 1 A                            1                0.2
## 2 B                            1                0.2
## 3 C                            1                0.2
## 4 D                            1                0.2
## 5 E                            1                0.2

bupaR::cases(Meventlog)

## Warning: `tbl_df()` was deprecated in dplyr 1.0.0.
## Please use `tibble::as_tibble()` instead.

## # A tibble: 1 × 10
##   case  trace_length number_of_activities start_timestamp     complete_timestamp 
##   <chr>        <int>                <int> <dttm>              <dttm>             
## 1 A                5                    5 2022-03-22 04:26:53 2022-03-22 04:26:53
## # … with 5 more variables: trace <chr>, trace_id <dbl>, duration_in_days <dbl>,
## #   first_activity <fct>, last_activity <fct>

bupaR::resources(Meventlog)

## # A tibble: 1 × 3
##   resource   absolute_frequency relative_frequency
##   <fct>                   <int>              <dbl>
## 1 resource 1                  5                  1

bupaR::traces(Meventlog)

## # A tibble: 1 × 3
##   trace     absolute_frequency relative_frequency
##   <chr>                  <int>              <dbl>
## 1 A,B,C,D,E                  1                  1

Obtain the mapping of an eventlog (set of identifiers) or obtain single identifiers.

bupaR::mapping(Meventlog)

## Case identifier:     case 
## Activity identifier:     activity_id 
## Resource identifier:     resource 
## Activity instance identifier:    activity_instance_id 
## Timestamp:           timestamp 
## Lifecycle transition:        lifecycle_id

bupaR::activity_id(Meventlog)

## [1] "activity_id"

bupaR::activity_instance_id(Meventlog)

## [1] "activity_instance_id"

bupaR::case_id(Meventlog)

## [1] "case"

bupaR::lifecycle_id(Meventlog)

## [1] "lifecycle_id"

bupaR::resource_id(Meventlog)

## [1] "resource"

bupaR::timestamp(Meventlog)

## [1] "timestamp"

Calculate the number of distinct activities, activity instances, cases, events, resources and traces.

n_activities(Meventlog)

## [1] 5

bupaR::n_activity_instances(Meventlog)

## [1] 5

bupaR::n_cases(Meventlog)

## [1] 1

bupaR::n_events(Meventlog)

## [1] 5

bupaR::n_resources(Meventlog)

## [1] 1

bupaR::n_traces(Meventlog)

## [1] 1

Take a slice of the event log, from row n to row m.

bupaR::slice(eventlog, n:m)

Sample n cases from the event log.

bupaR::sample_n(eventlog, n)

Add new variables to the event log

bupaR::mutate(eventlog, ...)

Group an event log on one or more event or case attributes.

bupaR::group_by(eventlog, ...)
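The signatures above use placeholders. A concrete sketch on the patients log from eventdataR might look as follows; the derived column name `weekday`, the sample size and the seed are arbitrary choices made for this illustration:

```r
library(bupaR)
library(eventdataR)  # provides the patients event log

# Sample 10 cases (not 10 rows) from the log.
set.seed(42)
sampled <- sample_n(patients, size = 10)
n_cases(sampled)

# Add a derived variable; `time` is the timestamp column of patients.
with_weekday <- mutate(patients, weekday = weekdays(time))

# Group the log by the resource column of patients.
by_employee <- group_by(patients, employee)
```

These are the dplyr verbs as provided for event logs by bupaR, so the result of mutate should still be usable as an event log in later steps.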

Reading and Writing XES-files

xesreadR::read_xes(xesfile)
xesreadR::read_xes_cases(xesfile)
xesreadR::write_xes(eventlog,
                    xesfile,
                    case_attributes)

Data sets in bupaR

eventdataR::BPIC_14_incident_log
eventdataR::BPIC_14_incident_case_attributes
eventdataR::BPIC_15_1
eventdataR::sepsis
eventdataR::patients

Exploratory and Descriptive Event Data Analysis

edeaR provides a varied set of metrics to analyse event logs.

edeaR::activity_frequency
edeaR::activity_presence
edeaR::end_activities
edeaR::idle_time
edeaR::number_of_repetitions
edeaR::number_of_selfloops
edeaR::number_of_traces
edeaR::processing_time
edeaR::resource_frequency
edeaR::resource_involvement
edeaR::resource_specialisation
edeaR::size_of_repetitions
edeaR::size_of_selfloops
edeaR::start_activities
edeaR::throughput_time
edeaR::trace_coverage
edeaR::trace_length

For example:

library(edeaR)
library(eventdataR)  # provides the patients event log
patients %>% activity_frequency(level = "activity")

## # A tibble: 7 × 3
##   handling              absolute relative
##   <fct>                    <int>    <dbl>
## 1 Registration               500   0.184 
## 2 Triage and Assessment      500   0.184 
## 3 Discuss Results            495   0.182 
## 4 Check-out                  492   0.181 
## 5 X-Ray                      261   0.0959
## 6 Blood test                 237   0.0871
## 7 MRI SCAN                   236   0.0867

The result of a metric can also be plotted:

patients %>% activity_frequency(level = "activity") %>% plot()

patients %>% trace_length()

##       min        q1    median      mean        q3       max    st_dev       iqr 
## 2.0000000 5.0000000 5.0000000 5.4420000 6.0000000 6.0000000 0.5790567 1.0000000
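Other metrics from the list follow the same calling pattern. As a small sketch, the two calls below use start_activities and throughput_time; the choice of levels and of days as the time unit is an assumption made for illustration:

```r
library(bupaR)
library(edeaR)
library(eventdataR)  # provides the patients event log

# Which activities do cases start with, and how often?
starts <- patients %>% start_activities(level = "activity")
starts

# Summary of the time between the first and last event of a case,
# computed over the whole log and expressed in days.
tp <- patients %>% throughput_time(level = "log", units = "days")
tp
```

The level argument controls the granularity at which a metric is computed, so the same function can give a log-wide summary or a per-activity table.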

Event data subsetting

EVENT FILTERS

edeaR::ifilter_activity
edeaR::ifilter_activity_frequency
edeaR::filter_attributes
edeaR::ifilter_resource
edeaR::ifilter_resource_frequency
edeaR::ifilter_time_period
edeaR::ifilter_trim

CASE FILTERS

edeaR::ifilter_activity_presence
edeaR::ifilter_case
edeaR::ifilter_endpoints
edeaR::ifilter_precedence
edeaR::ifilter_processing_time
edeaR::ifilter_throughput_time
edeaR::ifilter_time_period
edeaR::ifilter_trace_frequency
edeaR::ifilter_trace_length

For example:

patients %>% filter_activity_frequency(percentage = 0.8)

## Log of 4496 events consisting of:
## 6 traces 
## 500 cases 
## 2248 instances of 5 activities 
## 5 resources 
## Events occurred from 2017-01-02 11:41:53 until 2018-05-05 07:16:02 
##  
## Variables were mapped as follows:
## Case identifier:     patient 
## Activity identifier:     handling 
## Resource identifier:     employee 
## Activity instance identifier:    handling_id 
## Timestamp:           time 
## Lifecycle transition:        registration_type 
## 
## # A tibble: 4,496 × 7
##    handling     patient employee handling_id registration_type time               
##    <fct>        <chr>   <fct>    <chr>       <fct>             <dttm>             
##  1 Registration 1       r1       1           start             2017-01-02 11:41:53
##  2 Registration 2       r1       2           start             2017-01-02 11:41:53
##  3 Registration 3       r1       3           start             2017-01-04 01:34:05
##  4 Registration 4       r1       4           start             2017-01-04 01:34:04
##  5 Registration 5       r1       5           start             2017-01-04 16:07:47
##  6 Registration 6       r1       6           start             2017-01-04 16:07:47
##  7 Registration 7       r1       7           start             2017-01-05 04:56:11
##  8 Registration 8       r1       8           start             2017-01-05 04:56:11
##  9 Registration 9       r1       9           start             2017-01-06 05:58:54
## 10 Registration 10      r1       10          start             2017-01-06 05:58:54
## # … with 4,486 more rows, and 1 more variable: .order <int>

patients %>% filter_activity_presence(
  c("Registration", "Check-out"), method = "all")

## Log of 5388 events consisting of:
## 2 traces 
## 492 cases 
## 2694 instances of 7 activities 
## 7 resources 
## Events occurred from 2017-01-02 11:41:53 until 2018-05-03 03:34:55 
##  
## Variables were mapped as follows:
## Case identifier:     patient 
## Activity identifier:     handling 
## Resource identifier:     employee 
## Activity instance identifier:    handling_id 
## Timestamp:           time 
## Lifecycle transition:        registration_type 
## 
## # A tibble: 5,388 × 7
##    handling     patient employee handling_id registration_type time               
##    <fct>        <chr>   <fct>    <chr>       <fct>             <dttm>             
##  1 Registration 1       r1       1           start             2017-01-02 11:41:53
##  2 Registration 2       r1       2           start             2017-01-02 11:41:53
##  3 Registration 3       r1       3           start             2017-01-04 01:34:05
##  4 Registration 4       r1       4           start             2017-01-04 01:34:04
##  5 Registration 5       r1       5           start             2017-01-04 16:07:47
##  6 Registration 6       r1       6           start             2017-01-04 16:07:47
##  7 Registration 7       r1       7           start             2017-01-05 04:56:11
##  8 Registration 8       r1       8           start             2017-01-05 04:56:11
##  9 Registration 9       r1       9           start             2017-01-06 05:58:54
## 10 Registration 10      r1       10          start             2017-01-06 05:58:54
## # … with 5,378 more rows, and 1 more variable: .order <int>
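The difference between the two filter families can be made concrete with a small sketch. It applies an event filter and a case filter to the same activity; "X-Ray" is picked arbitrarily from the activities in patients:

```r
library(bupaR)
library(edeaR)
library(eventdataR)  # provides the patients event log

# Event filter: keeps only the events of the X-Ray activity.
xray_events <- patients %>% filter_activity("X-Ray")

# Case filter: keeps every event of the cases that contain an X-Ray.
xray_cases <- patients %>% filter_activity_presence("X-Ray")

n_activities(xray_events)   # a single activity remains
n_activities(xray_cases)    # the other activities of those cases remain too
n_cases(xray_events) == n_cases(xray_cases)
```

Both results cover the same set of cases, but only the case filter leaves the traces of those cases intact.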

Process Visualization

The processmapR package provides functions to visualize processes, both from a control-flow perspective and from a resource perspective. The process_map function lets the user analyse control-flow from a frequency perspective and from a performance perspective. The precedence_matrix function provides a more compact overview of the process flows. The package also provides functions to explore (in)frequent traces and to draw a dotted chart (including an interactive version). Resource maps and resource matrices can be made as well.

library(processmapR)
process_map(patients)

patients %>% processmapR::precedence_matrix() %>% plot()

patients %>% trace_explorer()

## Warning in trace_explorer(.): No coverage or number of traces set. Defaulting to
## 0.2 for frequent traces.

## Warning: `rename_()` was deprecated in dplyr 0.7.0.
## Please use `rename()` instead.

patients %>% idotted_chart()

patients %>% resource_map()

patients %>% processmapR::resource_matrix()

## # A tibble: 7 × 3
##   antecedent consequent     n
##   <fct>      <fct>      <int>
## 1 r1         r2           500
## 2 r2         r3           237
## 3 r2         r5           261
## 4 r3         r4           236
## 5 r4         r6           236
## 6 r5         r6           259
## 7 r6         r7           492
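The frequency and performance perspectives of process_map mentioned at the start of this section are selected through its type argument. A minimal sketch, assuming relative frequencies and the median duration in hours as the quantities of interest:

```r
library(bupaR)
library(processmapR)
library(eventdataR)  # provides the patients event log

# Frequency perspective: nodes and edges annotated with relative frequencies.
freq_map <- process_map(patients,
                        type = processmapR::frequency("relative"))

# Performance perspective: nodes and edges annotated with median durations in hours.
perf_map <- process_map(patients,
                        type = processmapR::performance(median, "hours"))

freq_map
perf_map
```

The processmapR:: prefix on frequency is used here only to avoid any confusion with stats::frequency.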

Process Dashboard

The processmonitR package provides predefined dashboards to monitor processes interactively from different perspectives. Currently, four dashboards are provided: 1) an activity dashboard, focusing on activities, 2) a resource dashboard, focusing on resources, 3) a rework dashboard, focusing on rework and waste, such as self-loops and repetitions, and 4) a performance dashboard, focusing on the time perspective, i.e. throughput time, processing time and idle time. Each dashboard combines several metrics and visualizations from the other bupaR packages into a single view that is easy to use and navigate. The dashboards, implemented in Shiny, can be used standalone or incorporated into larger, tailor-made process monitoring dashboards.

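library(processmonitR)
# Each dashboard takes the event log as input and opens a Shiny app,
# so run these calls one at a time.
patients %>% activity_dashboard()
patients %>% resource_dashboard()
patients %>% rework_dashboard()
patients %>% performance_dashboard()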

Process Mining With R

MiLin

3/17/2022
